Summary:
/cc akyrola is it possible this test has been broken ever since 5614816fce?
More generally, why do we still have `hypothesis_test.py` at all? In the case of this test, surely one of these files already covers more than this one old test does:
* `operator_test/cudnn_recurrent_test.py`
* `operator_test/recurrent_network_test.py`
* `operator_test/rnn_cell_test.py`
Closes https://github.com/caffe2/caffe2/pull/843
Differential Revision: D5292109
Pulled By: akyrola
fbshipit-source-id: 6df5df6353a9741d1ae1b796adaab98382857527
Summary:
Working towards https://github.com/caffe2/caffe2/pull/817.
`E InvalidArgument: Insufficient bytes of entropy to draw requested array. shape=(4, 2, 5, 1, 3, 5, 5, 1), dtype=float32. Can you reduce the size or dimensions of the array? What about using a smaller dtype? If slow test runs and minimisation are acceptable, you could increase settings().buffer_size from 8192 to at least 24576000.`
https://travis-ci.org/caffe2/caffe2/jobs/243867951
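As a minimal sketch (not the exact change in this PR) of the "reduce the size or dimensions of the array" advice in the error above, one can bound the shapes a strategy may draw so it never exhausts hypothesis' entropy buffer; the bounds below are illustrative only.
```python
# Bound the drawn array shapes instead of raising buffer_size.
import numpy as np
from hypothesis import given
from hypothesis.extra import numpy as hnp

small_float_arrays = hnp.arrays(
    dtype=np.float32,
    shape=hnp.array_shapes(min_dims=1, max_dims=4, min_side=1, max_side=5),
)

@given(X=small_float_arrays)
def test_op_handles_small_inputs(X):
    # at most 5**4 elements are ever requested per example
    assert X.size <= 5 ** 4
```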
Closes https://github.com/caffe2/caffe2/pull/828
Differential Revision: D5276723
Pulled By: akyrola
fbshipit-source-id: f7d0e2dd8ef8b6a2354bd4ff7c7446c377c954b4
Summary: Upgrades this file to use brew instead of CNNModelHelper
Reviewed By: harouwu
Differential Revision: D5252089
fbshipit-source-id: 6df4350717c1d42bc4bcc63d255cd422f085ee05
Summary:
```
File "/data/caffe2/install/caffe2/python/hypothesis_test.py", line 1911, in test_batch_to_space
(w + 2 * pad) / block_size).astype(np.float32)
File "mtrand.pyx", line 1404, in mtrand.RandomState.randn (numpy/random/mtrand/mtrand.c:19843)
File "mtrand.pyx", line 1534, in mtrand.RandomState.standard_normal (numpy/random/mtrand/mtrand.c:20368)
File "mtrand.pyx", line 167, in mtrand.cont0_array (numpy/random/mtrand/mtrand.c:6127)
TypeError: 'float' object cannot be interpreted as an index
```
```
File "/data/caffe2/install/caffe2/python/operator_test/tile_op_test.py", line 101, in tile_ref
tiled_data = np.tile(X, tuple(dims))
File "/data/caffe2/venv/local/lib/python2.7/site-packages/numpy/lib/shape_base.py", line 881, in tile
return c.reshape(shape_out)
TypeError: only integer scalar arrays can be converted to a scalar index
```
I also tested to make sure this still works with 0.11.
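The tracebacks above are the kind of failure newer numpy versions raise when a shape or repetition count is a float. A small illustrative sketch of the corresponding fix (integer division and integer tile counts; the values below are made up):
```python
import numpy as np

w, pad, block_size = 8, 1, 2

# Before: (w + 2 * pad) / block_size is a float under true division.
# After: use integer division so randn() gets an int dimension.
data = np.random.randn((w + 2 * pad) // block_size).astype(np.float32)

# Similarly, np.tile needs integer repetition counts.
dims = np.array([2.0, 3.0])
tiled = np.tile(data, tuple(int(d) for d in dims))
```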
Closes https://github.com/caffe2/caffe2/pull/787
Differential Revision: D5248087
Pulled By: salexspb
fbshipit-source-id: eff69482a8eabb8ace330003fa326c832b53865f
Summary: Use `CopyItems` so that it accepts any type of tensor. Also, move the cursor to the input blob so that it's checkpoint friendly. The output is now also part of the input so that inference can work correctly.
Reviewed By: xianjiec
Differential Revision: D4920987
fbshipit-source-id: da532736225ec27f409ff763ff69a0629235151c
Summary:
Implement NormalizeOp for GPU using CUDA, and rewrite the gradient to be a function of the output
so it is more efficient, especially for the CUDA implementation.
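A numpy sketch of how an L2-normalize gradient can be written in terms of the forward output, so a CUDA kernel can reuse it instead of recomputing the normalization; the per-row, last-axis convention here is an assumption about the op.
```python
import numpy as np

def normalize_grad(x, dy):
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    y = x / norm
    # dL/dx = (dy - y * sum(y * dy)) / ||x||  -- only y and the norm appear,
    # so the gradient is a function of the output.
    return (dy - y * np.sum(y * dy, axis=-1, keepdims=True)) / norm

x = np.random.rand(4, 3).astype(np.float32) + 0.1
dy = np.random.rand(4, 3).astype(np.float32)
dx = normalize_grad(x, dy)
```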
Reviewed By: akyrola
Differential Revision: D4971300
fbshipit-source-id: e0ab66462000988aaf1f26010ea550533d107167
Summary: Both SquaredL2Distance and SquaredL2DistanceGradient had bad CUDA implementations. Use proper reductions and batched kernels.
Reviewed By: asaadaldien
Differential Revision: D4968527
fbshipit-source-id: f7cf82072d38bc127c757c5751863a9439aca8b5
Summary:
Similar to SafeDequeueBlobsOp, but adds weight-based sampling for reading from multiple input BlobsQueues.
WeightedSampleDequeueBlobsOp takes a vector of weights (each weight maps to one input blob queue).
Based on those probabilities, we choose which BlobsQueue to fetch from.
WeightedSampleDequeueBlobsOp stops when any of the input BlobsQueues is empty.
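A rough Python sketch of the sampling behavior described above, with plain lists standing in for BlobsQueues; names and details are illustrative, not the operator's implementation.
```python
import numpy as np

def weighted_sample_dequeue(queues, weights, rng=np.random):
    probs = np.asarray(weights, dtype=np.float64)
    probs = probs / probs.sum()
    # Stop as soon as any queue is empty, as the op description says.
    while all(len(q) > 0 for q in queues):
        idx = rng.choice(len(queues), p=probs)
        yield idx, queues[idx].pop(0)

q0, q1 = [1, 2, 3], ["a", "b"]
for which, item in weighted_sample_dequeue([q0, q1], weights=[0.7, 0.3]):
    print(which, item)
```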
Reviewed By: dzhulgakov
Differential Revision: D4905160
fbshipit-source-id: 5b1551e2250569f933a6c01ed04442843c5e0cb6
Summary:
Add the necessary ops for feature processing (see the numpy sketch after this list):
* logit op
* replace nan
* batch one hot op
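Numpy reference sketches for the three ops, with semantics assumed from their names; the real operators may clamp inputs differently or take extra arguments.
```python
import numpy as np

def logit(p, eps=1e-6):
    # log-odds of a probability, clipped away from 0 and 1
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

def replace_nan(x, value=0.0):
    return np.where(np.isnan(x), value, x)

def batch_one_hot(indices, num_classes):
    out = np.zeros((len(indices), num_classes), dtype=np.float32)
    out[np.arange(len(indices)), indices] = 1.0
    return out

print(logit(np.array([0.1, 0.5, 0.9])))
print(replace_nan(np.array([1.0, np.nan, 3.0])))
print(batch_one_hot(np.array([0, 2, 1]), num_classes=3))
```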
Reviewed By: kittipatv
Differential Revision: D4840869
fbshipit-source-id: 197123ea5608d54f0b5ac7899973a077a6a86775
Summary:
Quite a large diff to make cuDNN LSTM and our LSTM produce the same results and to provide a Python API for the cuDNN LSTM.
* Added operators RecurrentParamGet and RecurrentParamSet to access weights and biases for the different gates, input/recurrent.
* Removed RecurrentInit as not needed
* recurrent.cudnn_LSTM() returns a special net and mapping that can be used to retrieve the parameters from the LSTM
* recurrent.cudnn_LSTM() can be passed blobs that have the parameters for the individual gate weights and biases
* recurrent.InitFromLSTMParams() can be used to initialize our own LSTM from the cuDNN params. This way we can test whether cuDNN and our own implementation produce the same result.
recurrent_test.py tests for the equivalence.
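A conceptual numpy sketch of what per-gate parameter access means: a packed LSTM weight blob viewed as stacked per-gate matrices that can be read or written individually. The gate order and packing shown here are assumptions for illustration, not cuDNN's actual layout or the RecurrentParamGet/Set implementation.
```python
import numpy as np

hidden, input_dim = 4, 3
gates = ["input", "forget", "cell", "output"]          # assumed order
packed_w = np.random.rand(4 * hidden, input_dim).astype(np.float32)

def get_gate_weight(packed, gate):
    i = gates.index(gate)
    return packed[i * hidden:(i + 1) * hidden]

def set_gate_weight(packed, gate, value):
    i = gates.index(gate)
    packed[i * hidden:(i + 1) * hidden] = value

# e.g. copy forget-gate weights from one parameterization into another
set_gate_weight(packed_w, "forget", get_gate_weight(packed_w, "forget"))
```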
Reviewed By: salexspb
Differential Revision: D4654988
fbshipit-source-id: 6c1547d873cadcf33e03b0e0110248f0a7ab8cb0
Summary:
Actually adds stuff on duplicated indices. I didn't use UnorderedSegmentSum because it'd need more modifications for figuring out the first dimension and I don't want to make that function more complex than it already is :)
We theoretically can have a version that does CopyItems and fails on duplicate indices as a fallback. But I haven't implemented it yet as it wouldn't be that useful for now.
Also fixes the hypothesis test: calling rand() inside the test body is not a good idea, as it makes hypothesis run forever.
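A sketch of the hypothesis pitfall mentioned above: randomness should come from the strategy (so failures are reproducible and shrinkable), not from rand() calls inside the test body. The test names and shapes are made up.
```python
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra import numpy as hnp

# Problematic pattern (paraphrased):
#   @given(n=st.integers(1, 10))
#   def test_op(n):
#       x = np.random.rand(n)   # hidden nondeterminism inside the body
#       ...
# Preferred pattern: let hypothesis draw the data itself.
@given(x=hnp.arrays(np.float64, shape=st.integers(1, 10)))
def test_op(x):
    assert x.ndim == 1
```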
Differential Revision: D4814574
fbshipit-source-id: 1851ec5f5df8fc4bf4844585076b8af23a06b0b2
Summary:
Uses the cudnnTransformTensor function. It works by shuffling the strides according to the transpose axes. Significant speedup over the current GPU version.
+ moves the transpose test under utility_ops, because hypothesis_test is too big
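A numpy illustration of the stride-shuffling idea: a transpose only permutes the shape and strides, so no data needs to move to describe the result; a transform kernel then materializes the permuted layout into contiguous memory. The analogy to cudnnTransformTensor is conceptual only.
```python
import numpy as np

x = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
axes = (2, 0, 1)

y = np.transpose(x, axes)            # a view: strides are permuted, no copy
print(x.strides, y.strides)          # e.g. (48, 16, 4) -> (4, 48, 16)
assert y.strides == tuple(x.strides[a] for a in axes)

# Materialize the permuted layout (what the GPU transform kernel does).
y_contig = np.ascontiguousarray(y)
```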
Reviewed By: jamesr66a
Differential Revision: D4810993
fbshipit-source-id: 82577c4ced1389e70bd5992820ae4d8297a3817f
Summary:
All of these tests fail with some variant of `Cannot create operator of type 'X' on the device 'CUDA'` (see commit messages).
Closes https://github.com/caffe2/caffe2/pull/227
Differential Revision: D4797060
Pulled By: Yangqing
fbshipit-source-id: 5feaa8e949098bfc1254d4c7449a2744e552f925
Summary: PadImage has no kernel parameters, which resulted in the pads_ parameters not being set (left at 0). I added a test case too.
Differential Revision: D4785230
fbshipit-source-id: fd475e7c41208e07fa7a363def9a45c6f82cddfe
Summary: Creates SparseMomentumSGDUpdate, a sparse version of MomentumSGDUpdate, to make that optimization method (via in-place updating operator) compatible with GradientSlices.
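A hedged numpy sketch of a sparse momentum-SGD update: the dense rule is applied only to the rows named by the gradient slices. The exact dense formula and nesterov handling in the real operator may differ.
```python
import numpy as np

def sparse_momentum_sgd_update(param, moment, indices, grad, lr, mu):
    # indices/grad form a GradientSlice: grad[k] is the gradient for
    # row indices[k] of param.
    adjusted = lr * grad + mu * moment[indices]
    moment[indices] = adjusted
    param[indices] -= adjusted
    return param, moment

w = np.zeros((5, 3), dtype=np.float32)
m = np.zeros_like(w)
idx = np.array([0, 2])
g = np.ones((2, 3), dtype=np.float32)
sparse_momentum_sgd_update(w, m, idx, g, lr=0.1, mu=0.9)
```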
Differential Revision: D4784973
fbshipit-source-id: e6330f471a4d5f53589a6ac245e38f256ca7f354
Summary: AccumulateHistogramOp, for computing the histogram of all values in input tensors
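A numpy sketch of accumulating a histogram of all values across successive inputs; the bin-edge arguments and out-of-range handling are assumptions, not the operator's exact interface.
```python
import numpy as np

class AccumulateHistogram:
    def __init__(self, lower, upper, num_bins):
        self.edges = np.linspace(lower, upper, num_bins + 1)
        self.counts = np.zeros(num_bins, dtype=np.int64)

    def __call__(self, *tensors):
        # Add every value from every input tensor into the running counts.
        for t in tensors:
            hist, _ = np.histogram(np.asarray(t).ravel(), bins=self.edges)
            self.counts += hist
        return self.counts

acc = AccumulateHistogram(lower=0.0, upper=1.0, num_bins=10)
acc(np.random.rand(4, 4))
acc(np.random.rand(8))
```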
Differential Revision: D4654417
fbshipit-source-id: dea92346004c772af16e1eb41306287d81dc5a02
Summary:
1. Allow EnsureDense Op to do either an in-place pass or a copy
2. In MTML, add EnsureDense Op before gather
3. Change the unittest values (adding another operator changes the random seed,
which causes the model initialization to change as well)
Reviewed By: xianjiec
Differential Revision: D4625219
fbshipit-source-id: b3c748c3651d1dedd75420912a9698b7e46187c5
Summary:
Update the cuDNN RNN interface (mostly fixing the ordering of arguments). Set the seed so that the test passes consistently.
Closes https://github.com/caffe2/caffe2/pull/62
Reviewed By: Yangqing
Differential Revision: D4348966
fbshipit-source-id: f9b56be37739e5bffabec130e3407492b2aef656
Summary:
Add two arguments to the DotProductOp operator: `force_same_dim` (1 if we want
DotProductOp to accept only two tensors with equal dimensions, 0 otherwise) and
`pad_value` (only used when force_same_dim = 0; pads the smaller tensor to the
same size as the other one).
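A numpy sketch of the padded row-wise dot product described above; the exact validation and broadcasting rules of the real operator may differ.
```python
import numpy as np

def dot_product(x, y, force_same_dim=1, pad_value=0.0):
    if force_same_dim:
        assert x.shape == y.shape, "tensors must have the same shape"
    else:
        d = max(x.shape[1], y.shape[1])
        x = np.pad(x, ((0, 0), (0, d - x.shape[1])),
                   mode="constant", constant_values=pad_value)
        y = np.pad(y, ((0, 0), (0, d - y.shape[1])),
                   mode="constant", constant_values=pad_value)
    return np.sum(x * y, axis=1)   # one dot product per row

a = np.array([[1., 2., 3.]])
b = np.array([[4., 5.]])
print(dot_product(a, b, force_same_dim=0, pad_value=0.0))  # [14.]
```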
Differential Revision: D4502619
fbshipit-source-id: 46f7da710c6f6365f76a7af6234c34c7f656be62
Summary:
(Caffe2) Modified the RecurrentNetworkGradient operator so that training is possible with any of the output blob(s) receiving gradient during the backward pass. This is realized through a new argument for the RecurrentNetwork op, outputs_with_grads, which takes a list of the indices of the output blobs that will receive gradient. The previous behavior (only receiving gradient from the first output blob) remains the default.
A new unit test covers the case where outputs_with_grads = [1, 2] using the Python LSTM wrapper.
Reviewed By: urikz
Differential Revision: D4518516
fbshipit-source-id: 5c531582b20f3cf727d1aa91239b4d5a2b8a7c1f
Summary:
The existing op transforms the input in a general way. It needs M transform mappings to transform an NxM input tensor.
But for binary predictions X (an Nx2 tensor), we know that X[:, 0] = 1 - X[:, 1].
So we just need one mapping for X[:, 1]. After it is transformed, we can compute X[:, 0].
This diff handles that case.
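A small numpy illustration of the binary shortcut described above: only the positive-class column needs a calibration mapping, and the other column is its complement. The calibration function here is a made-up stand-in.
```python
import numpy as np

X = np.array([[0.8, 0.2],
              [0.3, 0.7]], dtype=np.float32)   # X[:, 0] == 1 - X[:, 1]

def calibrate_positive(p):
    # stand-in for the single learned mapping applied to X[:, 1]
    return np.clip(p * 0.9 + 0.05, 0.0, 1.0)

pos = calibrate_positive(X[:, 1])
X_calibrated = np.stack([1.0 - pos, pos], axis=1)
```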
Differential Revision: D4550441
fbshipit-source-id: 42d8c6e88d830c97628ee930b543740a32acf904
Summary:
1. The existing Gather op outputs gradients in sparse format. We add GatherDense that does the same thing
as Gather but outputs gradients in dense format. This relies on the SparseToDenseOp.
2. SparseToDenseOp converts sparse representation (indices, values) into a dense format (missing values are
filled with zeros). There is an existing SparseToDenseMaskOp. It is mainly for converting sparse features
into dense format. Modifying it to achieve our purpose is too complicated and messy. Better to create a new one.
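A numpy reference for the SparseToDense behavior described in (2): scatter values into a zero-filled dense tensor, summing rows with repeated indices. Passing the output's first dimension explicitly is an assumption about how the output size is determined.
```python
import numpy as np

def sparse_to_dense(indices, values, first_dim):
    dense = np.zeros((first_dim,) + values.shape[1:], dtype=values.dtype)
    np.add.at(dense, indices, values)   # accumulates duplicate indices
    return dense

idx = np.array([1, 3, 1])
vals = np.array([[1., 1.], [2., 2.], [3., 3.]], dtype=np.float32)
print(sparse_to_dense(idx, vals, first_dim=5))
```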
Reviewed By: dzhulgakov
Differential Revision: D4508879
fbshipit-source-id: f4a50efa1c08586d94040f93195661c41cd414da
Summary:
I had forgotten to remove this one. The rest of the switch to indexing
instead of string names is coming after D4446813 lands, as scratches
aren't inputs or outputs and thus can't be indexed.
Reviewed By: urikz
Differential Revision: D4465748
fbshipit-source-id: 2ccbedfb35541ef4a2231d1480eef59025bd5290
Summary: looks like we don't do a good job with initial recurrent input gradients yet. Here is a partial fix, but the gradient check still doesn't pass. The shape is correct now, though.
Reviewed By: salexspb
Differential Revision: D4475447
fbshipit-source-id: 280f1f59f19e487fd0dce0d440609c50ddce294a
Summary: This diff uses stack workspaces in RecurrentNetwork, which allows us to simplify the implementation and get rid of scratches.
Reviewed By: salexspb
Differential Revision: D4446813
fbshipit-source-id: 514eec7e4300bdf492a9cb192b40cf4f89acf656
Summary:
It's broken because it relies on add_sparse_bias.
It's not easy to add_sparse_bias after the switch to loader_param.
DPA would like to try it out :)
Differential Revision: D4447275
fbshipit-source-id: 631cb4995f35383070e44387dc86692ba64b91eb
Summary: Remove usage of recurrent_sizes, so recurrent states' sizes can depend on the input (e.g., the attention matrix for the beam decoder). I removed recurrent_sizes from the forward and backward steps.
Reviewed By: salexspb
Differential Revision: D4427688
fbshipit-source-id: 580420a294d309c86ec5cb4e677058623b7228e1
Summary:
In this diff I stop passing parameters by name and also remove hardcoded output ids which were there specifically for LSTM to work. It also allows us to avoid using recurrent_sizes in the backward pass (for the forward pass this is done in D4427688).
Using a similar technique it should be simple enough to eliminate blob-name passing altogether. Then we can fix scoping. These can be done in a follow-up diff.
Reviewed By: urikz
Differential Revision: D4444614
fbshipit-source-id: 3580a76365502b9f2f09e3d8b7e78084ca739f00
Summary:
A new operator is added for model calibration. Given a piecewise linear function and a raw prediction as input, it generates the mapped prediction as output.
Details can be found in the operator doc.
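A numpy sketch of a piecewise-linear calibration mapping; np.interp stands in for the operator, and the knot values below are made up for illustration.
```python
import numpy as np

bounds = np.array([0.0, 0.2, 0.5, 1.0])      # x-coordinates of the pieces
mapped = np.array([0.0, 0.1, 0.6, 1.0])      # calibrated value at each knot

raw_predictions = np.array([0.05, 0.35, 0.9])
calibrated = np.interp(raw_predictions, bounds, mapped)
```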
Differential Revision: D4418640
fbshipit-source-id: f8ff3ea786b0fe233a4ddcb709e5dbf0861ca484
Summary:
This is a handy tool for amortizing expensive operators (e.g.
distributed communication, some heavier kernel launches, etc) over a
lot of small blobs (e.g. all the biases in a network). We can just
coalesce these small blobs in-place into a single blob, act on them in
operators, etc as if they are non-coalesced (passing them as inputs to
operators, etc), and then finally for heavier operators, just work on
the coalesced blob that contains each of these units.
I named it UnsafeCoalesce since it introduces blob aliasing, which
needs care for work like memory management, graph rewriting as in
memonger, etc.
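A conceptual numpy sketch of the aliasing this introduces: many small "blobs" become views into one flat buffer, so a single heavy operation on the buffer touches all of them at once. This is only an analogy for the coalescing idea, not the operator's implementation.
```python
import numpy as np

sizes = [3, 5, 2]                                  # e.g. bias blob sizes
flat = np.zeros(sum(sizes), dtype=np.float32)      # the coalesced blob

views, offset = [], 0
for s in sizes:
    views.append(flat[offset:offset + s])          # an alias, not a copy
    offset += s

views[1][:] = 1.0        # act on an individual blob as before
flat *= 0.5              # one cheap pass over the coalesced storage
print(views[1])          # reflects both updates: [0.5 0.5 0.5 0.5 0.5]
```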
Reviewed By: Yangqing
Differential Revision: D3557149
fbshipit-source-id: 09cff4459b84270fe9e1da3b4a168fd66d01f795
Summary:
This is a first step in improving our RNN story. It provides a wrapper around the current RecurrentNetworkOp implementation which infers most of the redundant parameters and makes the API much simpler.
Also in order to support general step nets I added an extra argument to the RecurrentNetworkOp.
Future work:
1. Inferring step net output and internal blob (scratch) sizes and types
2. Avoid accessing blobs by names in c++ part
3. Remove requirement for inputs / output 1:1 correspondence in the step net
4. Make the Python API support networks with operators like Sum being on the border of the Cell net (currently there is an issue with such networks where gradient blobs which are on the side are not explicitly created).
Differential Revision: D4268503
fbshipit-source-id: f8a66491c2b55daa730caeed7e9f2b3921541b49
Summary:
This adds Caffe2 support for MKL operators directly with MKLMemory. Included a
Relu layer that shows how to use it.
Reviewed By: salexspb
Differential Revision: D4322144
fbshipit-source-id: 8b3392c4fd024ab1a7ba7135c349ebd3e1976799
Summary:
The float64 test breaks things on the CUDA side. I am deleting it for now, and if
we add it back, let's make sure we run the test on a GPU machine first :)
Reviewed By: azzolini
Differential Revision: D4324427
fbshipit-source-id: 0246fe9dd28a286422ca94c90f5b0fc33a162e74
Summary: Allows collecting samples over multiple batches. The method uses a circular array, so there is no guarantee about the order of the samples. The goal is to get a view of the data across multiple batches.
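A sketch of collecting samples across batches into a fixed-size circular array; as the description says, sample order is not preserved once the buffer wraps around. Class and argument names are illustrative.
```python
import numpy as np

class SampleCollector:
    def __init__(self, capacity, feature_dim):
        self.buf = np.zeros((capacity, feature_dim), dtype=np.float32)
        self.seen = 0

    def add_batch(self, batch):
        for row in batch:
            self.buf[self.seen % len(self.buf)] = row   # overwrite oldest slot
            self.seen += 1

    def samples(self):
        return self.buf[:min(self.seen, len(self.buf))]

c = SampleCollector(capacity=4, feature_dim=2)
c.add_batch(np.random.rand(3, 2))
c.add_batch(np.random.rand(3, 2))
print(c.samples())
```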
Reviewed By: salexspb
Differential Revision: D4216181
fbshipit-source-id: bb9e1fa84ac7e04006dcddb53c9347a42ec83dc8
Summary: Used in the NNPreProc layers. It fails the online training when there is an empty batch.
Reviewed By: dzhulgakov
Differential Revision: D4235498
fbshipit-source-id: bde00a011831762e44a3f9bf2190d4b241a06ccc
Summary: Each sparse feature is an ID list, and usually the position of an ID in the list is meaningful: the earlier the ID appears in the list, the more important it is. In this diff, we multiply each embedding by a weight, where the weight corresponds to the position. With this change, the same ID appearing at different positions will have a different norm/length/importance after aggregation. The firstX transformation in sigrid is a special case of this model where the weights before n are 1, and 0 after n, where n is the argument of firstX.
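A numpy sketch of position-weighted pooling for a single ID list; the per-position weights below are made-up numbers standing in for learned parameters.
```python
import numpy as np

embeddings = np.random.rand(4, 8).astype(np.float32)   # one row per ID, in list order
pos_weights = np.array([1.0, 0.8, 0.5, 0.2], dtype=np.float32)

# Weight each embedding by its position before summing.
pooled = (embeddings * pos_weights[:, None]).sum(axis=0)

# firstX with n=2 is the special case with weights [1, 1, 0, 0]:
firstx = (embeddings * np.array([1, 1, 0, 0], dtype=np.float32)[:, None]).sum(axis=0)
```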
Reviewed By: xianjiec
Differential Revision: D4181251
fbshipit-source-id: 2a6f8b7240af445b6bd2052fd24c2d99f39ee7ff