Summary: Added a CUDA implementation of the PiecewiseLinearTransformOp.
Differential Revision: D5378537
fbshipit-source-id: 38857f59f5cc52e16e1ecc97983a0b0b82a46c74
Summary:
# Added the gradients of the operation for both CPU and CUDA kernels.
# Unified variable names across all ops.
# Added reference implementation in numpy (the idea is sketched below).
# The gradient check needs a larger stepsize to succeed; is that normal?
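For context, a minimal numpy sketch of the piecewise linear transform idea (the array names and the clip-to-outer-bounds behavior are assumptions, not the exact op semantics):
```py
import numpy as np

def piecewise_linear_transform(x, bounds, slopes, intercepts):
    # bounds: sorted piece boundaries, len(bounds) == num_pieces + 1
    # slopes/intercepts: per-piece line parameters (assumed layout)
    x = np.clip(x, bounds[0], bounds[-1])
    # Index of the piece each element falls into.
    idx = np.clip(np.searchsorted(bounds, x, side='right') - 1,
                  0, len(slopes) - 1)
    return slopes[idx] * x + intercepts[idx]
```
Within a piece the analytic gradient w.r.t. x is just that piece's slope, which may be why a numerical gradient check is sensitive to the stepsize near piece boundaries.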
Reviewed By: akyrola
Differential Revision: D5313682
fbshipit-source-id: aceb92649e01c5caeba8774e678f9095502d396c
Summary: Replace params with sp; otherwise it will report an empty list.
Reviewed By: akyrola
Differential Revision: D5382716
fbshipit-source-id: 34d8e6ee00cbe1718702e3d1f23ea12f8d65063e
Summary:
- Integrated RFF into the preprocessing workflow for dense features
- Developed Flow interface to input RFF parameters
- Created unit test for using RFF with sparseNN
Reviewed By: chocjy
Differential Revision: D5367534
fbshipit-source-id: 07307259c501a614d9ee68a731f0cc8ecd17db68
Summary:
To be used with the predictor "online": a C++ version of memonger for simple nets. Very simple greedy algorithm. Works well at least on the ResNet-50 inference graph: only 3 shared blobs are used.
Next I will integrate this with predictor and run canary (separate diff).
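Not the C++ code itself, but the greedy idea is roughly the following sketch (op/blob structures and names are hypothetical, and in-place ops are ignored):
```py
def greedy_share(ops, external_outputs):
    # Last op index that reads each blob.
    last_use = {}
    for i, op in enumerate(ops):
        for blob in op.inputs:
            last_use[blob] = i
    pool, mapping, counter = [], {}, 0
    for i, op in enumerate(ops):
        for blob in op.outputs:
            if blob in external_outputs:
                continue  # never recycle blobs the caller needs
            if pool:
                mapping[blob] = pool.pop()
            else:
                mapping[blob] = "shared_%d" % counter
                counter += 1
        # Release inputs whose last reader was this op.
        for blob in op.inputs:
            if last_use[blob] == i and blob in mapping:
                pool.append(mapping[blob])
    return mapping  # original blob name -> shared blob name
```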
Reviewed By: asaadaldien
Differential Revision: D5375392
fbshipit-source-id: d36e419e39a32e568e105657c27fb00c85a2535d
Summary:
As the title says.
Closes https://github.com/caffe2/caffe2/pull/879
Differential Revision: D5372787
Pulled By: akyrola
fbshipit-source-id: 0ff469c0d227f1b2252c1a0c4f6f8bebaac5580f
Summary: Add synchronization barrier API with configurable timeout. Users can call Synchronize() to join variable-length execution before resuming multi-machine communication steps, e.g., resuming distributed training iterations after validation on a single machine.
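Conceptually it behaves like a barrier with a timeout; the following only illustrates those semantics with Python's threading.Barrier, not the actual API:
```py
import random
import threading
import time

barrier = threading.Barrier(parties=4)

def worker(rank):
    time.sleep(random.random())  # variable-length work, e.g. validation
    # All workers block here; raises threading.BrokenBarrierError on timeout.
    barrier.wait(timeout=30.0)
    print("rank %d resumes communication steps" % rank)

threads = [threading.Thread(target=worker, args=(r,)) for r in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```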
Reviewed By: akyrola
Differential Revision: D5348387
fbshipit-source-id: 5826da10e6a60c50394c36c7cf47624f10191d11
Summary: Memonger had a bug where it crashed if a blob was the input to multiple ops. This fixes that and adds a test.
Reviewed By: asaadaldien
Differential Revision: D5374860
fbshipit-source-id: 1d5044001eacdbe6db43f69727da9297558f5c5c
Summary: Huge improvement in my tests, and it does not really hurt either.
Reviewed By: wesolwsk
Differential Revision: D5374925
fbshipit-source-id: c96a4ed2ca653120a82233c0037cbfded8a2d2a1
Summary:
b33894e95d removed this line:
```py
unittest.skipIf(workspace.NumCudaDevices() < 2, "Need at least 2 GPUs.")
```
but forgot to add it back later.
```
_________________________________ DataParallelModelTest.test_equiv __________________________________
...
if p2p_access_pattern is not None and not p2p_access_pattern[
> devices[0], peer
]:
E IndexError: index 1 is out of bounds for axis 1 with size 1
...
WARNING:data_parallel_model:** Only 1 GPUs available, GPUs [0, 1] requested
```
/cc akyrola
Closes https://github.com/caffe2/caffe2/pull/888
Reviewed By: akyrola
Differential Revision: D5341310
Pulled By: harouwu
fbshipit-source-id: 8d7f06913c7b5a42009a4033dbb6a48a8e812822
Summary:
- Created the random fourier features layer
- Generated a unit test to test the random fourier features layer is built correctly
- Inspired by the paper [[ https://people.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf | Random Features for Large-Scale Kernel Machines]] (construction sketched below)
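For context, the standard construction from that paper, sketched in numpy (the RBF-kernel bandwidth parameterization here is an assumption):
```py
import numpy as np

def random_fourier_features(X, D, sigma, rng=np.random):
    # Approximates the RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    # with z(x) = sqrt(2/D) * cos(x W + b), W ~ N(0, 1/sigma^2), b ~ U[0, 2pi],
    # so that z(x).dot(z(y)) ~= k(x, y).
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / sigma, size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X.dot(W) + b)
```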
Reviewed By: chocjy
Differential Revision: D5318105
fbshipit-source-id: c3885cb5ad1358853d4fc13c780fec3141609176
Summary:
Otherwise the op was always added to the main net instead of param_init_net when
desired (i.e., for the initial param sync).
Closes https://github.com/caffe2/caffe2/pull/894
Differential Revision: D5367451
Pulled By: akyrola
fbshipit-source-id: 3d82be6da687c736bd15f4852dbd272266eb4811
Summary: Allows overriding the input/output record as long as the field blobs are the same.
Reviewed By: yangyangyyy
Differential Revision: D5362132
fbshipit-source-id: 3ac2ac22802902b7eed5c226b00a7e1971ad264c
Summary:
It is a quite common question when users get some variant of "blob has version 2 but gradient expects version 1" in their backward pass. The error message is completely unhelpful.
To remedy this, I added proper debug information that tells the user how the version number of a blob was incremented over time, i.e., which ops caused the version to go up. This should help understand the issue.
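Schematically, the added bookkeeping is along these lines (a sketch; names are hypothetical):
```py
from collections import defaultdict

versions = defaultdict(int)   # current SSA version of each blob
history = defaultdict(list)   # how each blob's version was incremented

def on_op(op_type, outputs):
    for blob in outputs:
        versions[blob] += 1
        history[blob].append("version %d set by %s" % (versions[blob], op_type))

on_op("FC", ["y"])
on_op("Relu", ["y"])  # in-place op bumps y from version 1 to 2
print("\n".join(history["y"]))  # printed alongside the gradient error
```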
Reviewed By: dzhulgakov
Differential Revision: D5358227
fbshipit-source-id: bc09d048ac33200c35d56460e44e86c2f2888f3f
Summary: Added two operators that can be used to transfer data into the input format of RNN and back.
Reviewed By: kittipatv
Differential Revision: D5329886
fbshipit-source-id: 07eac29416427b08c49989d4eeed50a6f18493a1
Summary: This was broken in a previous diff, fixing it to use model device type.
Reviewed By: asaadaldien
Differential Revision: D5356005
fbshipit-source-id: a4fcc932bae772076b57625a5fcc0d38eb702cc9
Summary:
This works as a standalone Python script because args are
global. When used from Flow for monitoring purposes it doesn't
work. This diff fixes that.
Reviewed By: zem7
Differential Revision: D5349996
fbshipit-source-id: f73842901d975b783e09e9db0565eb81880bbea1
Summary:
A couple of fixes for the broken reporting of lstm_benchmark:
- last_time must be recorded after warm-up
- entry count was incorrectly removed
Reviewed By: salexspb
Differential Revision: D5349890
fbshipit-source-id: 5dd5bdf46594c520b61bc3b57b153f90a6a17903
Summary:
Eliminates failures caused by overloaded machines running only a few
examples before being timed out.
Reviewed By: tomdz
Differential Revision: D5349555
fbshipit-source-id: 89d1db063f58c72656b37157225a586c9e3f24bc
Summary: Let's try this again: verify graphs every time memonger is run. Will definitely keep an eye on the running time, though.
Reviewed By: akyrola
Differential Revision: D5308188
fbshipit-source-id: 512a76c759b670d31c49d1d492dd8ee1eaf3bafd
Summary:
This adds a CollectivesConcurrencyControl class to manage creating common contexts and cyclic controls to execute Gloo collectives,
and refactors AllReduce and _AddDistributedParameterSync to use it.
Reviewed By: akyrola
Differential Revision: D5335795
fbshipit-source-id: 5084e0a65cdb989cd949be3868b77a680561022d
Summary:
This makes it easy to remove the common fields of one struct from another.
For example,
```py
s1 = Struct(
    ('a', Scalar()),
    ('b', Scalar()),
)
s2 = Struct(('a', Scalar()))
s1 - s2 == Struct(('b', Scalar()))
```
More examples are provided in the code comments.
Differential Revision: D5299277
fbshipit-source-id: 7008586ffdc8e24e1eccc8757da70330c4d90370
Summary:
In some cases we don't want to compute the full FC during eval.
These layers allow us to compute the dot product between
X and W[idx, :], where idx is an input, e.g., the label.
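In numpy terms, the intended computation is roughly the following (shapes here are assumptions):
```py
import numpy as np

def sampled_dot_product(X, W, idx):
    # Full FC would be X.dot(W.T) of shape (N, num_rows); instead compute
    # only the dot product of each example with its selected row of W.
    return np.sum(X * W[idx, :], axis=1)  # shape (N,)

X = np.random.randn(4, 8)     # N=4 examples, d=8 features
W = np.random.randn(10, 8)    # 10 rows, e.g. one per label
idx = np.array([3, 0, 7, 3])  # selected row per example, e.g. the label
out = sampled_dot_product(X, W, idx)
```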
Reviewed By: kittipatv
Differential Revision: D5305364
fbshipit-source-id: 0b6a1b61cc8fcb26c8def8bcd037a4a35d223078
Summary:
Similar to sparse_nn all gpu, this is our first step towards an offline full-GPU experiment.
**Compare Run**
cat(128, 32)512-512 :
GPU 21138598 https://fburl.com/jpeod1pi
CPU 21138787 https://fburl.com/vma7225l
Reviewed By: dzhulgakov
Differential Revision: D5308789
fbshipit-source-id: 413819bf9c5fff125d6967ed48faa5c7b3d6fa85
Summary: Combine _AddDistributedParameterSync() and _SyncParams() into a single function to broadcast across distributed machines and all local GPUs simultaneously. This is similar to how calls to Allreduce have already been optimized using the functionality of Gloo. All the refactoring work is contained in data_parallel_model.py.
Reviewed By: akyrola, andrewwdye
Differential Revision: D5329277
fbshipit-source-id: 4407b88980cf396f2e0f994d796294fa79fd39ed
Summary:
This bug in the test was exposed by https://github.com/caffe2/caffe2/pull/861 (previously, the test was always using the cuDNN engine, regardless of the value of `engine`). This bug is now blocking https://github.com/caffe2/caffe2/pull/817.
```
____________________ TestConvolution.test_convolution_sync _____________________
...
if use_cudnn and requested_engine != 'CUDNN':
raise ValueError(
> 'When use_cudnn=True, the only engine you can specify is '
E ValueError: When use_cudnn=True, the only engine you can specify is "CUDNN"
```
https://travis-ci.org/caffe2/caffe2/jobs/247605579
Closes https://github.com/caffe2/caffe2/pull/881
Differential Revision: D5332619
Pulled By: akyrola
fbshipit-source-id: 63737768a155359ddbbef1da424fcbb94f86bd4e
Summary: This should make it so we no longer have super hacky DAG chains just to generate vectors of indices that could be specified at model creation time.
Reviewed By: akyrola
Differential Revision: D5316707
fbshipit-source-id: 97bb3868b69e0c5a7f465c95f2e16ae0485dcc56
Summary:
Fixes a memonger bug where it could recycle a blob that was released by the same op being processed.
Added a verification step to ensure in-place assignments are not changed.
Reviewed By: asaadaldien
Differential Revision: D5331495
fbshipit-source-id: 20b08f6de5b973e8c9868aa048c142cac1eb6c58
Summary: Implement the slice gradient for CPU. Will soon port this over to GPU so NMT can use it.
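The gradient is essentially a scatter of the output gradient into a zero tensor of the input's shape; roughly, in numpy (a sketch, not the op's exact signature):
```py
import numpy as np

def slice_grad(input_shape, starts, ends, grad_output):
    # d(slice)/d(input): zeros everywhere except the sliced region,
    # which receives grad_output unchanged.
    grad_input = np.zeros(input_shape, dtype=grad_output.dtype)
    grad_input[tuple(slice(s, e) for s, e in zip(starts, ends))] = grad_output
    return grad_input

g = slice_grad((4, 5), starts=(1, 0), ends=(3, 5), grad_output=np.ones((2, 5)))
```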
Reviewed By: akyrola
Differential Revision: D5309305
fbshipit-source-id: 8fb5f4e665f236ecce9227c5c0c302f5076b01ad
Summary:
Made them faster.
This should be equivalent to the algorithm akyrola suggested, just with a list (of parents) as an intermediate representation instead of a string.
Reviewed By: akyrola
Differential Revision: D5308133
fbshipit-source-id: c976a513d10e79c157ea803afb99b147e9ea3357
Summary: The data workers test times out randomly (very seldom), and it looks like the reason is that we call FeedBlob in a thread (the enqueue thread); the first time that is called, it will call workspace.CreateBlob(), which is not thread safe. Fix this by initializing the scratch blobs explicitly.
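The fix amounts to something like this (a sketch; the blob names are hypothetical):
```py
from caffe2.python import workspace

# Create the scratch blobs on the main thread before any enqueue thread
# starts, so FeedBlob never races through workspace.CreateBlob().
scratch_blob_names = ["fetcher_scratch_0", "fetcher_scratch_1"]  # hypothetical
for name in scratch_blob_names:
    workspace.CreateBlob(name)
```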
Reviewed By: panshen1
Differential Revision: D5292426
fbshipit-source-id: d7dad68f3ccc636c60bd82b2527f00f20da298b5