Summary:
Working towards https://github.com/caffe2/caffe2/pull/817.
`E InvalidArgument: Insufficient bytes of entropy to draw requested array. shape=(4, 2, 5, 1, 3, 5, 5, 1), dtype=float32. Can you reduce the size or dimensions of the array? What about using a smaller dtype? If slow test runs and minimisation are acceptable, you could increase settings().buffer_size from 8192 to at least 24576000.`
https://travis-ci.org/caffe2/caffe2/jobs/243867951
Closes https://github.com/caffe2/caffe2/pull/828
Differential Revision: D5276723
Pulled By: akyrola
fbshipit-source-id: f7d0e2dd8ef8b6a2354bd4ff7c7446c377c954b4
Summary:
Working towards https://github.com/caffe2/caffe2/pull/817.
`E InvalidArgument: Insufficient bytes of entropy to draw requested array. shape=(20, 12, 22), dtype=float32. Can you reduce the size or dimensions of the array? What about using a smaller dtype? If slow test runs and minimisation are acceptable, you could increase settings().buffer_size from 8192 to at least 43253760.`
https://travis-ci.org/caffe2/caffe2/jobs/243867951
/cc kittipatv
Closes https://github.com/caffe2/caffe2/pull/830
Differential Revision: D5276639
Pulled By: akyrola
fbshipit-source-id: 0c21be25ecd931837dc8b0c2cc17048f531350d1
Summary:
We want to make sure that a graph optimized by memonger doesn't have any possibility of two threads writing into the same output blob at the same time, when blobs are renamed.
Creates a graph where an edge is added whenever a parent node's output blob is a child node's input blob and no node between the parent and the child writes to the same blob. If two nets generate the same such graph, then the "path" of the data is the same.
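A minimal sketch of the idea (the helper names below are illustrative, not the actual test code): derive a data-flow graph from a NetDef by connecting each op to the last op that wrote each of its inputs, then compare the graphs of the original and the memonger-optimized net.
```
def blob_flow_edges(net_def):
    # For every input of every op, add an edge from the last op that wrote
    # that blob to the op consuming it. Ops are identified by position, so
    # blob renaming does not change the shape of the graph.
    last_writer = {}
    edges = set()
    for i, op in enumerate(net_def.op):
        for blob in op.input:
            if blob in last_writer:
                edges.add((last_writer[blob], i))
        for blob in op.output:
            last_writer[blob] = i
    return edges


def same_data_path(net_a, net_b):
    # If the original and the memonger-optimized net induce the same graph,
    # the "path" of the data through the ops is the same.
    return blob_flow_edges(net_a.Proto()) == blob_flow_edges(net_b.Proto())
```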
Reviewed By: akyrola
Differential Revision: D5210385
fbshipit-source-id: 6317fc4e16289339b50c2dcd86ec8b32d2d544a5
Summary:
This is a real implementation (not GPUFallbackOp) of the TopKOp for GPU.
There are two algorithm implementations:
- For k <= 512, it maps to a warp-wide min-heap implementation, which requires only a single scan of the input data.
- For k > 512, it maps to a multi-pass radix selection algorithm that I originally wrote in cutorch. I took the recent cutorch code and removed some cutorch-specific things where it made sense.
Also added several utility files that one or the other implementation uses, some from the Faiss library and some from the cutorch library.
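A hedged usage sketch (assuming the standard `core`/`workspace` Python API; the blob names and the `k` argument follow the usual TopK convention): running the op under a CUDA device option exercises the GPU paths described above.
```
import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace

# k <= 512 should hit the warp-wide min-heap path; larger k the radix selection path.
gpu_opt = core.DeviceOption(caffe2_pb2.CUDA, 0)
with core.DeviceScope(gpu_opt):
    workspace.FeedBlob("X", np.random.rand(32, 1000).astype(np.float32))
    workspace.RunOperatorOnce(
        core.CreateOperator("TopK", ["X"], ["Values", "Indices"], k=10))
values = workspace.FetchBlob("Values")    # shape (32, 10)
indices = workspace.FetchBlob("Indices")  # per-row indices into X
```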
Reviewed By: jamesr66a
Differential Revision: D5248206
fbshipit-source-id: ae5fa3451473264293516c2838f1f40688781cf3
Summary: The old version used one block with 128 threads. Throughput was too low for the NMT use case (calculating squared gradient norms for every parameter), so this change increases the throughput. It shaves 7% off CNN model training time per step.
Reviewed By: wickedfoo
Differential Revision: D5263748
fbshipit-source-id: adc3bacd11e49ea00c60381d613d993050e899be
Summary:
While this is not intended to be the most performant and general solution, we can see from the test plan that in some cases a static DAG RNN can perform better than our own implementation. Hopefully we will get dynamic RNN DAG execution to be at least as fast as this one; then we will not need this one in production, only for testing.
Still putting it into our benchmark for comparison purposes.
Reviewed By: akyrola
Differential Revision: D5210038
fbshipit-source-id: fa44baf51c455872abd6ec5f5d151cf06e15b1fa
Summary: I accidentally noticed that we were calling the non-CUDNN version of Transpose when using attention, and it is super slow. This broke when rnn_cell was changed to use ModelHelper instead of CNNModelHelper in D5062963, but the calls to transpose were not "brewed".
Reviewed By: jamesr66a
Differential Revision: D5264248
fbshipit-source-id: b61494ae210f34597245f1195d20547f5b5cd8b5
Summary: Don't want to assert, since it can be useful to sometimes create models that are never run (for example, in unit tests).
Reviewed By: pietern
Differential Revision: D5258905
fbshipit-source-id: f1beee0605bfef235ed0f23f7e78259109720254
Summary: This makes it easier to gather top-K by group of rows. This is useful when we want to pick the top-K from a batch of fixed-length sessions. Let `N` be the number of sessions and `M` the number of examples in a session. We would have a batch of `N * M` rows. We can reshape the score blob to `N x M` and use it as input to `TopK` to select the top score for each session. However, without the new output, it would be inconvenient to gather the rows corresponding to the top scores. The indices are in the `[0, K-1)` range. The new output can be used directly as input to `Gather`.
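A small numpy sketch of the pattern (purely illustrative; `rows` and the sizes are made up): the per-session top-K indices are offset into global row indices, which is exactly what the new output provides for `Gather`.
```
import numpy as np

N, M, K = 4, 10, 3                                     # sessions, examples per session, top-K
scores = np.random.rand(N * M).astype(np.float32)
rows = np.random.rand(N * M, 16).astype(np.float32)    # rows we want to gather

per_session = scores.reshape(N, M)
topk_idx = np.argsort(-per_session, axis=1)[:, :K]     # what TopK's Indices gives, per session
# The new output is effectively the global row index, directly usable as the
# indices input of Gather on the original N * M batch:
flat_idx = topk_idx + np.arange(N)[:, np.newaxis] * M
top_rows = rows[flat_idx.reshape(-1)]                  # == Gather(rows, flat_idx)
```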
Reviewed By: chocjy
Differential Revision: D5171459
fbshipit-source-id: 69f7b41456c3f9670650ae07afc8fef8328485e9
Summary:
The global StatRegistry doesn't get reset when the workspace is reset.
```
> self.assertTrue(len(workspace.FetchBlob('k3')) == 2)
E AssertionError: False is not true
```
https://travis-ci.org/lukeyeager/caffe2/jobs/240162665
/cc azzolini
NOTE: this error doesn't show up if you just run `stats_ops_test.py` directly. It shows up when you run other tests in the same session before this test:
```
pytest -v caffe2/python/
```
Closes https://github.com/caffe2/caffe2/pull/788
Differential Revision: D5259232
Pulled By: salexspb
fbshipit-source-id: 3c72633af6bb61c4fda62195298b1e9574b4cbef
Summary: Upgrades this file to use brew instead of CNNModelHelper
Reviewed By: harouwu
Differential Revision: D5252089
fbshipit-source-id: 6df4350717c1d42bc4bcc63d255cd422f085ee05
Summary: Implementation of the SliceOp for CUDA
Reviewed By: akyrola
Differential Revision: D5254287
fbshipit-source-id: 0a1660e1aa161fd088a2d8f886e019c05a1919a2
Summary:
```
File "/data/caffe2/install/caffe2/python/hypothesis_test.py", line 1911, in test_batch_to_space
(w + 2 * pad) / block_size).astype(np.float32)
File "mtrand.pyx", line 1404, in mtrand.RandomState.randn (numpy/random/mtrand/mtrand.c:19843)
File "mtrand.pyx", line 1534, in mtrand.RandomState.standard_normal (numpy/random/mtrand/mtrand.c:20368)
File "mtrand.pyx", line 167, in mtrand.cont0_array (numpy/random/mtrand/mtrand.c:6127)
TypeError: 'float' object cannot be interpreted as an index
```
```
File "/data/caffe2/install/caffe2/python/operator_test/tile_op_test.py", line 101, in tile_ref
tiled_data = np.tile(X, tuple(dims))
File "/data/caffe2/venv/local/lib/python2.7/site-packages/numpy/lib/shape_base.py", line 881, in tile
return c.reshape(shape_out)
TypeError: only integer scalar arrays can be converted to a scalar index
```
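Both failures come from newer numpy rejecting float-valued shapes and indices; the fix is, presumably, to keep the shape arithmetic integral (a sketch, not the exact patch):
```
import numpy as np

w, pad, block_size = 8, 1, 2

# Fails on newer numpy: / yields a float, which cannot be used as a shape.
#   np.random.randn((w + 2 * pad) / block_size)

# Works on both old and new numpy: use integer division (or an explicit int()).
x = np.random.randn((w + 2 * pad) // block_size).astype(np.float32)
```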
I also tested to make sure this still works with 0.11.
Closes https://github.com/caffe2/caffe2/pull/787
Differential Revision: D5248087
Pulled By: salexspb
fbshipit-source-id: eff69482a8eabb8ace330003fa326c832b53865f
Summary: Deprecate CNNModelHelper in python/workspace_test.py by using ModelHelper instead of CNNModelHelper
Reviewed By: harouwu
Differential Revision: D5251778
fbshipit-source-id: d634f1c76e41a95b0247ebf5d5a48aef6f8e232e
Summary:
This diff deprecates `CNNModelHelper` in the `AlexNet()` function. More diffs will be coming to deprecate the helper in other functions.
Depends on D5241738
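For reference, the replacement pattern looks roughly like this (a sketch with made-up layer sizes, assuming the standard brew helpers; it is not the actual AlexNet() code):
```
from caffe2.python import brew, model_helper

# Before: model = cnn.CNNModelHelper(...); model.Conv(...); model.Relu(...); ...
# After: a plain ModelHelper plus brew.* helper calls.
model = model_helper.ModelHelper(name="alexnet")
conv1 = brew.conv(model, "data", "conv1", dim_in=3, dim_out=64,
                  kernel=11, stride=4, pad=2)
relu1 = brew.relu(model, conv1, "relu1")
pool1 = brew.max_pool(model, relu1, "pool1", kernel=3, stride=2)
```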
Reviewed By: harouwu
Differential Revision: D5247004
fbshipit-source-id: eec5c5ef916a85de8289cb92d2174a6a4b8075bf
Summary: Hard-to-debug problems arise when a gradient creator fails because the forward op itself is incorrect. Add checking of the schema before calling the creator. Also clarify the error messages.
Reviewed By: Yangqing
Differential Revision: D5256016
fbshipit-source-id: 78550f7e2ce5b88e26b69fdae4be0eece52edfea
Summary:
The current version of schema.py has a Metadata class with three fields. The default for it is set to
four Nones. This is just changing that to three Nones so that the number of default values matches the number
of actual fields.
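A minimal illustration, assuming Metadata follows the usual namedtuple-with-defaults pattern (the field names below are placeholders, not the real ones): the defaults tuple should have exactly one entry per field.
```
from collections import namedtuple

# Placeholder field names; the real ones live in caffe2/python/schema.py.
Metadata = namedtuple('Metadata', ['field_a', 'field_b', 'field_c'])

# Three fields, so three defaults (previously there were four Nones here).
Metadata.__new__.__defaults__ = (None, None, None)

m = Metadata()  # all three fields default to None
```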
Reviewed By: kennyhorror
Differential Revision: D5250463
fbshipit-source-id: 42e5650d270f5f63662614d8445b4819ed370dec
Summary: Also fixed a small bug in the ModelHelper constructor
Reviewed By: harouwu
Differential Revision: D5246799
fbshipit-source-id: 3719ca078f0e2b5e463fc93da9c8215f5583bd9a
Summary:
We need to support RNNs explicitly in ExtractPredictorNet, because they store sub-nets as strings in special arguments. Once NetDef arguments arrive, we can generalize this a bit.
Added a test under rnn_cell_test to test that extracting an LSTM predictor net works correctly and sets the device option properly for the step net ops.
Reviewed By: yqwangustc
Differential Revision: D5236334
fbshipit-source-id: cd653427f8c440a14d94195a532d18276f94749a
Summary: A quite common problem is that it is hard to load blobs with pe.load_from_db onto a specific device. One must set the device options of the returned init_net and predict_init_net, which is quite magical. So I made load_from_db() able to set these device options automatically, based on the device scope or a device_option parameter. Added a unit test.
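A hedged usage sketch (the db path is made up, and the exact return value of load_from_db is elided; the point is that the active device scope, or an explicit device_option argument, now determines the device options of the loaded nets):
```
from caffe2.proto import caffe2_pb2
from caffe2.python import core
from caffe2.python.predictor import predictor_exporter as pe

gpu_opt = core.DeviceOption(caffe2_pb2.CUDA, 0)
with core.DeviceScope(gpu_opt):
    # The device options of the returned init_net / predict_init_net are now
    # filled in from the active device scope instead of being patched by hand.
    loaded = pe.load_from_db("model.minidb", "minidb")
```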
Reviewed By: asaadaldien
Differential Revision: D5249202
fbshipit-source-id: 7b9d91476cb8d1b0ec0d9772e50b9148b8b184fa
Summary:
salexspb This fixes a major perf issue (40% boost on alexnet end-to-end perf) in the multi-precision SGD optimizer - it was causing repeated cudaMalloc / cudaFree calls during training iterations due to the changing size of the `grad` blob as it moved from fp16 <-> fp32.
Closes https://github.com/caffe2/caffe2/pull/797
Differential Revision: D5246978
Pulled By: salexspb
fbshipit-source-id: ec3d7ef18445e19eaf5aac908d0a7bcd5957eb60
Summary: This was only needed in order to initialize stateful PythonOps. Now PythonOp has support for initialization at Op creation time, so this is not used anymore.
Reviewed By: dzhulgakov
Differential Revision: D5242908
fbshipit-source-id: dbaa249466dd0f37f25d204d387b1f99c6dd4fed
Summary: This is going to show a Python Caffe2 user where a failed operator was created. The motivation for not putting this information directly in the protobuf is to avoid making it too verbose and to keep the ability to read a net's protobufs after a simple print() call.
Reviewed By: jamesr66a
Differential Revision: D5226047
fbshipit-source-id: 7edfe850e05a2ec209577142aa3368664a57a108
Summary:
This allows constructing a Python op by passing a pickled "builder function call" as an argument to the op.
The builder function is called at PythonOp construction time and returns a function that will be called when the op is run.
This way we can drop the dependency on 'tokens', which didn't work properly for protobufs that get distributed to other processes. Now the PythonOp definition is self-contained: as long as the build dependencies are right, sharding the protobuf is enough to execute the net remotely.
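A plain-Python sketch of the builder mechanism being described (names are illustrative; the real plumbing embeds the pickled builder call in the op definition and unpickles it inside the PythonOp):
```
import pickle

def make_scaler(factor):
    # "Builder": called once at op construction time. Any state it needs is
    # rebuilt from the pickled call, so no process-local 'token' registry is
    # required in the remote worker.
    def run(inputs, outputs):
        outputs[0].reshape(inputs[0].shape)
        outputs[0].data[...] = inputs[0].data * factor
    return run

# Conceptually, the op argument carries the pickled (builder, args, kwargs)
# triple rather than a reference to live state in the driver process.
builder_call = pickle.dumps((make_scaler, (2.0,), {}))

# At construction time in the worker: unpickle, call the builder, and keep the
# returned function to run on every op invocation.
builder, args, kwargs = pickle.loads(builder_call)
op_fn = builder(*args, **kwargs)
```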
Reviewed By: dzhulgakov
Differential Revision: D5080833
fbshipit-source-id: a5deaca5d3143024cdb121519689224e9dbec5ce
Summary:
Truncate the id list using the max length computed in compute meta, so that it has a deterministic length, which is useful for the position-weighted pooling method.
Reviewed By: sunwael
Differential Revision: D5233739
fbshipit-source-id: f73deec1bb50144ba14c4f8cfa545e1ced5071ce
Summary: Recently people have found that this test is too strict because of proto string matching. Thus, I changed it to compare fields so that this test will not complain even if the protobuf changes in the future.
Reviewed By: dzhulgakov
Differential Revision: D5229855
fbshipit-source-id: 54efcd7a0f9e5dbba1ddeb480801abcb859e07bd
Summary: Added an operator that converts key/value blobs into a blob containing a map pointer; the unit test passes.
Differential Revision: D5224449
fbshipit-source-id: 2f60754ed3ba6ed16039c09019117ae3c3646ab2
Summary:
Diff D5224410 initializes the should_stop_blob explicitly. With that, we have one more blob when executing the job. Adjust the check accordingly.
Reviewed By: azzolini
Differential Revision: D5228398
fbshipit-source-id: 439b186c30b0b1d0e41e513babbcccd85e7a1b4a
Summary:
We waste extra memory by creating two autosplit gradient blobs and then accumulating them into the main one. Sometimes, when Sum / Sub ops are involved, we can avoid wasting the extra memory at all.
Ideally we would not waste any memory and would make ops add to the same blob rather than calculating separate results and then merging them, but that would require a substantial change to the framework and rewriting a lot of operators.
Reviewed By: dzhulgakov
Differential Revision: D5157667
fbshipit-source-id: 8293824d6cdd971d8853ae90aee68e4a6d1e132b
Summary:
It's very useful for simple cases, like benchmarking nets, where we want to encode the input/output record in the net and don't want to go through the hurdles of storing the input/output record in a MetaNetDef.
For those cases I propose remapping the input/output record blobs to 'input_record/{field_name}' before saving. Then we can recover the input/output record just from the names of the blobs.
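A rough sketch of the proposed remapping (illustrative only; the record layout and helper calls are assumptions, not the code in this diff):
```
import numpy as np
from caffe2.python import core, schema

net = core.Net("bench_net")
input_record = schema.NewRecord(net, schema.Struct(
    ('dense', schema.Scalar((np.float32, (16,)))),
))

# Before saving, copy each record blob to a canonical 'input_record/{field_name}'
# name, so the record can later be reconstructed purely from blob names without
# storing it in a MetaNetDef.
for field_name, blob in zip(input_record.field_names(),
                            input_record.field_blobs()):
    net.Copy(blob, 'input_record/{}'.format(field_name))
```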
Differential Revision: D5170473
fbshipit-source-id: ac5daa60051605ed93022aec1377a49f08f15663
Summary: This diff fixes an issue with running the same reader in the same workspace multiple times. In order to achieve correct behavior of the execution step, we have to explicitly initialize should_stop_blob with False.
Reviewed By: kennyhorror
Differential Revision: D5224410
fbshipit-source-id: 4ad2740e187b62b0a1f5612ea3eef223dcc8a799