Commit Graph

110 Commits

Author SHA1 Message Date
Luke Yeager
0ade0578b1 Reset workspace after each test in copy_ops_test
Summary:
This was a nasty one to track down. This was the error message:
```
E0323 14:47:46.138900  2870 context_gpu.h:126] Encountered CUDA error: an illegal memory access was encountered
F0323 14:47:46.139143  2870 operator.h:176] Computation on device returned error in operator
input: "x_gpu_2" output: "loss" name: "" type: "AveragedLoss" device_option { device_type: 1 cuda_gpu_id: 1 }
```
Closes https://github.com/caffe2/caffe2/pull/220

Differential Revision: D4771086

Pulled By: Yangqing

fbshipit-source-id: f2d0f39f1647c84d97d9745f8a0305a389bfbc41
2017-03-24 12:20:34 -07:00
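The fix pattern (resetting shared global state between tests, so blobs created by one test cannot poison the next) can be sketched in plain Python; the `Workspace` class below is a hypothetical stand-in, not the caffe2 API:

```python
import unittest


class Workspace:
    """Hypothetical stand-in for a global blob store shared across tests."""

    def __init__(self):
        self.blobs = {}

    def reset(self):
        self.blobs.clear()


workspace = Workspace()


class CopyOpsTest(unittest.TestCase):
    def tearDown(self):
        # Reset the shared workspace after every test, so state such as
        # blobs pinned to a particular GPU cannot leak into later tests.
        workspace.reset()

    def test_fill(self):
        workspace.blobs["x_gpu_2"] = [1.0, 2.0]
        self.assertIn("x_gpu_2", workspace.blobs)
```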
Ahmed Aly
99bfd36a04 CRF layer in caffe2
Summary:
This is implementation of a CRF layer in caffe2 according to this paper: https://arxiv.org/abs/1603.01360
Currently this implementation works only for batch_size = 1.

Reference implementations:

- Tensorflow:
 63a21e0540/tensorflow/contrib/crf/python/ops/crf.py

- Theano:
https://github.com/glample/tagger/blob/master/model.py#L286

Differential Revision: D4644004

fbshipit-source-id: bf0801fd8562d11dca3fefe371c3d85e1dd69ccc
2017-03-23 22:02:02 -07:00
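For intuition, the linear-chain CRF sequence score from the paper (emission scores for each tag plus transition scores between consecutive tags) can be sketched in plain Python; the names below are illustrative and are not the caffe2 layer's interface:

```python
def crf_sequence_score(emissions, transitions, tags):
    """Score one tag sequence under a linear-chain CRF.

    emissions:   per-step score lists, shape [T][num_tags]
    transitions: transitions[i][j] = score of moving from tag i to tag j
    tags:        the tag sequence to score, length T
    """
    # Emission term: score of the chosen tag at each timestep.
    score = sum(emissions[t][tag] for t, tag in enumerate(tags))
    # Transition term: score of each consecutive tag pair.
    score += sum(transitions[a][b] for a, b in zip(tags, tags[1:]))
    return score
```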
Alexander Sidorov
d7b2aebf2c Support for Sum in cell net as first operator
Summary: This didn't work for a reason specified in the comments. Also some cleanup in the unit tests; inference now uses a custom workspace to run the cell net.

Reviewed By: urikz

Differential Revision: D4742670

fbshipit-source-id: 04165c029fddec5ae31b20b207faf06d2fa20816
2017-03-21 18:32:18 -07:00
Ahmed Taei
e41d35909a Conv-ND NCHW CPU/CUDA implementation
Summary: Migrate the caffe1 ConvNd implementation to caffe2.

Reviewed By: Yangqing

Differential Revision: D4659868

fbshipit-source-id: 14b178af3faa2c0b12e5a9f7aa76c1d8945419ea
2017-03-20 14:01:07 -07:00
James Reed
33f41c06c0 Remove more instances of batch_size
Summary: D4734505 part 2. Remove more instances of the batch_size parameter

Reviewed By: urikz

Differential Revision: D4736906

fbshipit-source-id: fc9d374e9308017d61c427890364c5ab9cec2edf
2017-03-19 22:31:30 -07:00
James Reed
17da5856ed Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter

Reviewed By: urikz

Differential Revision: D4734505

fbshipit-source-id: d9c23d85be84f61124106e752ef2b4f6945e2a07
2017-03-19 18:16:28 -07:00
Yury Zemlyanskiy
d1424c3265 Revert D4702086: Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: This reverts commit c4c1d8425cd36c1e86695918eaba2667c27e9601

Differential Revision: D4702086

fbshipit-source-id: 4620610b182bb84b9297b5de32782761ae89d20b
2017-03-17 17:36:47 -07:00
Alexander Sidorov
f97d7949d0 Remove legacy LSTM, cleanup tests
Summary: we don't use this one any more except in a few tests

Reviewed By: urikz

Differential Revision: D4731401

fbshipit-source-id: c5c28b7594e3251f501fc28455dfc9bd2093a836
2017-03-17 16:33:53 -07:00
James Cross
79c3a3af54 add gpu support for caffe2-seq2seq
Summary: Adding synchronous optimization on GPUs to the translation training pipeline via data_parallel_model.Parallelize_GPU. That function still needs to be updated to provide some way of performing sparse parameter updates (e.g., on embedding tables), whether on GPU or CPU.

Reviewed By: urikz

Differential Revision: D4631914

fbshipit-source-id: 9cdd655f7dbda3f9b2733d459228b3e097892441
2017-03-17 05:19:14 -07:00
Jon Morton
1513b1de6b Add ResizeNearest operator
Summary: This adds a nearest neighbor interpolation resizing operator to caffe2. CPU only, NCHW only, no gradients. Also adds torch2caffe support. This is probably not optimal in terms of performance, but it works.

Reviewed By: ajtulloch

Differential Revision: D4724244

fbshipit-source-id: b8295061141fb513da84acf91fdfd67264119059
2017-03-16 18:49:01 -07:00
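A minimal sketch of nearest-neighbor resizing on one H x W channel (plain Python on nested lists, assuming the usual floor-sampling convention; not the operator's actual code):

```python
def resize_nearest(image, scale_h, scale_w):
    """Nearest-neighbor resize of a 2D grid (one H x W channel).

    Output pixel (y, x) samples input pixel
    (floor(y / scale_h), floor(x / scale_w)).
    """
    in_h, in_w = len(image), len(image[0])
    out_h, out_w = int(in_h * scale_h), int(in_w * scale_w)
    return [
        [image[min(int(y / scale_h), in_h - 1)][min(int(x / scale_w), in_w - 1)]
         for x in range(out_w)]
        for y in range(out_h)
    ]
```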
James Reed
cc2e915461 Implement TopK op in caffe2
Reviewed By: salexspb, urikz

Differential Revision: D4718439

fbshipit-source-id: e6866eb7bb586f2716662cd4b65961bdd9914525
2017-03-16 17:32:20 -07:00
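The TopK contract (the k largest values plus their indices, largest first, ties broken toward the lower index) can be sketched with `heapq`; this reflects the common convention for TopK ops and is not taken from the diff itself:

```python
import heapq


def top_k(values, k):
    """Return the k largest values and their indices, largest first.

    Ties are broken in favor of the lower index via the (value, -index) key.
    """
    order = heapq.nlargest(k, range(len(values)),
                           key=lambda i: (values[i], -i))
    return [values[i] for i in order], order
```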
James Reed
10d95bd0f0 Remove batch_size parameter from attention and LSTMWithAttention interfaces
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter

Reviewed By: urikz

Differential Revision: D4702086

fbshipit-source-id: c4c1d8425cd36c1e86695918eaba2667c27e9601
2017-03-16 11:47:52 -07:00
Luke Yeager
7773a2d643 Bugfix: type not being set when inferring types+shapes
Summary:
/cc akyrola

I basically just copied all the `ShapeCall` stuff as `TypeCall`. Is there a better way?
Closes https://github.com/caffe2/caffe2/pull/187

Differential Revision: D4699312

Pulled By: Yangqing

fbshipit-source-id: 92f736ffe4127b00b5821acb1eb359771975fdd7
2017-03-15 18:48:40 -07:00
Luke Yeager
014d1fe5c4 Allow test discovery in caffe2/python/
Summary:
These are all essentially no-op changes which allow for nose-style (or pytest-style) test discovery.

With this patch, you can use any of these methods to discover and run tests under `caffe2/python`:
```
python -m unittest discover -p '*test*.py' caffe2/python/
python -m nose caffe2/python/
python -m pytest caffe2/python/
```

Future work:

* Get all of the tests to pass
  * Some seem to be testing operations which don't have GPU implementations
  * I get a segfault unless I set `CUDA_VISIBLE_DEVICES=0`
  * Some tests are flaky
* Allow test discovery throughout the whole project (e.g. the `experiments/` dir)
Closes https://github.com/caffe2/caffe2/pull/199

Reviewed By: pietern

Differential Revision: D4704504

Pulled By: Yangqing

fbshipit-source-id: 8f5687ec9c8aa873dfaff30dbf44272bc38a206b
2017-03-14 18:16:41 -07:00
Ahmed Taei
a745981c94 ReduceBack{Sum|Mean}Op CPU & GPU implementation
Summary:
Implement ReduceBackSum & ReduceBackMean with gradients for CPU & GPU contexts.
The reduction happens over the last dimensions; for example, if the input is an
M x N matrix, ReduceBackSum will result in a vector of dim M x 1 containing the
row-wise sums.

Differential Revision: D4689768

fbshipit-source-id: 5b0482d4341867ecf23526dc6c4d544420e7d8f7
2017-03-13 16:19:58 -07:00
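The matrix case described above can be sketched in plain Python (illustrative only, not the operator's implementation):

```python
def reduce_back_sum(matrix):
    """Sum over the last dimension: an M x N input yields M row sums."""
    return [sum(row) for row in matrix]


def reduce_back_mean(matrix):
    """Mean over the last dimension."""
    return [sum(row) / len(row) for row in matrix]
```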
Kairan Sun
ee2bc06926 Add Shape Inference for Reshape Operator
Summary: Add shape inference for Reshape. Because the output shape cannot be inferred when it comes from runtime tensor data, set `out[0].set_unknown_shape(true)` if no `shape` argument is used.

Differential Revision: D4671125

fbshipit-source-id: 685a9198f9b08e3336014c792f20051b381d8619
2017-03-13 14:31:27 -07:00
Aapo Kyrola
adb3f0ec22 add exception for empty shape param
Summary: Following krp's suggestion, check if the shape parameter is empty.

Reviewed By: dzhulgakov

Differential Revision: D4686698

fbshipit-source-id: 3f9fb1e3215dd2a4a726442531201eeb18224bc6
2017-03-10 00:33:59 -08:00
Karthik Prasad
965a7daf9b Implement MILSTM in caffe2
Summary:
Created a new function with specifics related to MI LSTM implementation in caffe2
See https://arxiv.org/pdf/1606.06630.pdf for details.
See D4478877 for the implementation of the same in tensorflow

Reviewed By: jhcross

Differential Revision: D4669882

fbshipit-source-id: 095bbcf187dbdac2cd79558ff0c8f9f67d8af639
2017-03-09 16:32:47 -08:00
James Cross
c5621ded31 Allow use of ReversePackedSegs operator in CUDA context
Summary: ReversePackedSegs operator for CUDA. Input "lengths" (static integers) required to be in CPU memory.

Differential Revision: D4661281

fbshipit-source-id: c800c316c34015ba8e732dcbcaa8c4edaffdfeab
2017-03-09 15:03:55 -08:00
James Reed
8de1db9eb6 Implement recurrent attention in C2
Summary: Super rough implementation of recurrent attention. Planning to factor out the common code between the two functions as well as train and eval. I want to get this out and get eyes on it sooner rather than later

Differential Revision: D4647837

fbshipit-source-id: 54bc4e8ed0df6f04c86c425926decbe89f73b068
2017-03-08 11:21:28 -08:00
James Cross
8de2027d9b Add gradient operator for SumElements
Summary: Add gradient support for Caffe2 operator SumElements (for use in Translation RNN training pipeline).

Differential Revision: D4669036

fbshipit-source-id: 502760a2a624b20b3241e83a2f208f450b6ff36f
2017-03-07 20:03:07 -08:00
Aapo Kyrola
d8588d8007 CUDA version of elementwise power + rename to Pow + gradient
Summary: Renamed ElementwisePower to Pow for better discoverability. Added CUDA version and Gradient + tests.

Reviewed By: kennyhorror

Differential Revision: D4665550

fbshipit-source-id: dd33d8ad3917d71504e363ab397af50d38a63b1f
2017-03-07 10:20:40 -08:00
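The elementwise Pow forward and its gradient follow the textbook rule dL/dx = dL/dy * p * x^(p-1); a minimal sketch (illustrative, not the CUDA kernel):

```python
def pow_forward(xs, exponent):
    """Elementwise power: y_i = x_i ** exponent."""
    return [x ** exponent for x in xs]


def pow_gradient(xs, grad_out, exponent):
    """Elementwise gradient: dL/dx_i = dL/dy_i * exponent * x_i**(exponent-1)."""
    return [g * exponent * x ** (exponent - 1) for x, g in zip(xs, grad_out)]
```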
Aapo Kyrola
695ea6c7a1 SumElementsOp
Summary: Add a simple op to sum the elements, with optional averaging. This is basically a copy of AveragedLossOp, which we should alias to this, and maybe develop towards a generic norm op.

Reviewed By: jhcross

Differential Revision: D4664591

fbshipit-source-id: 0e0c0efe9e415e2ad2feecfa42b03db2c83bee70
2017-03-07 05:23:53 -08:00
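The op's behavior (sum all elements, optionally averaged) is simple enough to state as a one-liner; a sketch, not the caffe2 source:

```python
def sum_elements(xs, average=False):
    """Sum all elements of a (flattened) tensor, optionally averaging."""
    total = sum(xs)
    return total / len(xs) if average else total
```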
Aapo Kyrola
8fab453863 Sqr op and gradient
Summary: Due to popular demand, added an op to compute element-wise square + gradient for it (just for the fun of it).

Reviewed By: Yangqing

Differential Revision: D4664797

fbshipit-source-id: 0a29c7c249fdc72f51412bebd6ae352a7801cf05
2017-03-07 03:03:07 -08:00
Kairan Sun
9f588aa8a2 Add Inference for Flatten
Summary: Implementing shape inference for Flatten operator and adding unit tests.

Differential Revision: D4664073

fbshipit-source-id: c54a269fc7633908fe4197682d27076ef97d9c22
2017-03-07 01:21:40 -08:00
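Flatten's shape rule (keep the first dimension, collapse the rest into one) makes its inference a one-liner; a sketch under that assumption:

```python
import math


def flatten_shape(shape):
    """Shape inference for Flatten: (d0, d1, ..., dn) -> (d0, d1*...*dn)."""
    # math.prod of an empty tuple is 1, so a 1-D input maps to (d0, 1).
    return (shape[0], math.prod(shape[1:]))
```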
Aapo Kyrola
2333ccadfb MaxOp for CUDA
Summary: Simple elementwise Max implementation for CUDA. Given N inputs, it will do N-1 pairwise maxes. I am not sure if it would be much better to iterate through all the inputs in the kernel, since this has better locality. We can also optimize later.

Reviewed By: Yangqing

Differential Revision: D4659953

fbshipit-source-id: 3a23b7fb3dbdf1d43bf3134ece03af4a791844dd
2017-03-06 16:46:53 -08:00
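The N-1 pairwise-max strategy described above can be sketched as a reduction (plain Python, not the CUDA kernel):

```python
from functools import reduce


def elementwise_max(*inputs):
    """Elementwise max of N equal-length inputs via N-1 pairwise maxes."""
    pairwise = lambda a, b: [max(x, y) for x, y in zip(a, b)]
    return reduce(pairwise, inputs)
```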
Pooya Davoodi
c61a7ca777 Make counts datatype int. Used as index.
Summary:
To avoid Numpy warning: using a non-integer number instead of an integer will result in an error in the future
Closes https://github.com/caffe2/caffe2/pull/64

Differential Revision: D4658348

Pulled By: Yangqing

fbshipit-source-id: 3a1b33cbb27849bc167b08147d078e8d487567f4
2017-03-06 10:46:36 -08:00
Aapo Kyrola
8caa7cec8d CUDA version of Log
Summary: As in the title. Simple registration issue.

Reviewed By: Yangqing, jhcross

Differential Revision: D4655691

fbshipit-source-id: 661e4d5f1226ec05e099c84f4454aa07c6be4449
2017-03-04 00:32:03 -08:00
Huazhong Ning
6c9105447c support fill bool tensors in GivenTensorFill
Summary:
The existing code uses vector<T> to store the given tensor and then copies it to the output.
If T=bool, vector<bool> stores the data as bits, and the copy does not work,
so we use TensorCPU to store it instead.
Also adds a unit test.

Reviewed By: kennyhorror

Differential Revision: D4622325

fbshipit-source-id: 95c27b5d1cfbc836d2419d01cacde5a3172f4d7e
2017-03-02 20:18:59 -08:00
Aapo Kyrola
ec56737190 fix shape inference for spatial softmax with loss
Summary: The shape inference did not check for spatial mode.

Reviewed By: andrewwdye

Differential Revision: D4638218

fbshipit-source-id: f15419738587013dea39e04a3da086890938c4e2
2017-03-01 19:32:32 -08:00
Aapo Kyrola
02937903cc add inference for gradient ops + a couple of missing shape inference functions + fix to scalars
Summary:
A bit too much stuff in one diff, so sorry:

1. Add inference for gradient types by using the fact that x_grad is the gradient of x and must be of the same shape. Using string matching here is kind of awkward, but in addition I rely on the operator actually being a gradient op.
2. dzhulgakov was right, scalar shape is () and not (1). Sorry, my earlier claim was #fakenews.
3. Added inference functions for MakeTwoClass, MomentumSGDUpdate and cross-entropy ops.

Reviewed By: dzhulgakov

Differential Revision: D4569758

fbshipit-source-id: 0db13f33819777fdddefe21d4b1ebf906fcaf98c
2017-02-28 23:33:32 -08:00
Jerry Pan
8a0ebed4c9 Caffe2: Tile operator
Summary: Caffe2: Tile operator

Differential Revision: D4630698

fbshipit-source-id: 1aa5c3c9d7fcfc17f78c80fd4b752595280266a0
2017-02-28 23:17:26 -08:00
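Tile repeats an input a given number of times along one axis; a 2D sketch in plain Python (illustrative names, not the operator's signature):

```python
def tile(matrix, tiles, axis):
    """Repeat a 2D input `tiles` times along `axis` (0 = rows, 1 = columns)."""
    if axis == 0:
        return [row[:] for _ in range(tiles) for row in matrix]
    return [row * tiles for row in matrix]
```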
Luke Yeager
69fa85be26 Fix some typos
Summary:
Found while reading through d522693cc8
Closes https://github.com/caffe2/caffe2/pull/176

Differential Revision: D4630275

Pulled By: Yangqing

fbshipit-source-id: 0a8e85d317d427a39467ebcb5e9a70594075bae2
2017-02-28 18:36:12 -08:00
Simon Layton
fbf47a8825 Cudnn v6
Summary:
Add cudnn v6 support, including testing support for dilated convolution.
Add a check to ensure that the versions of cuDNN used to compile Caffe2 and run it are compatible
Closes https://github.com/caffe2/caffe2/pull/85

Reviewed By: bwasti

Differential Revision: D4387690

Pulled By: Yangqing

fbshipit-source-id: 312960134398dd4afe6ee0c01cdc160046c904e8
2017-02-28 17:46:33 -08:00
Artem Volkhin
000db87bc7 Half-floats support for the rest of segment ops
Summary:
Previously the fp16 type was supported in the SparseLengthsSum operator; now it
works in all other segment operators as well.

Reviewed By: dzhulgakov

Differential Revision: D4624312

fbshipit-source-id: c9d72110e3762167270bb088405eaf9c56e88493
2017-02-28 11:19:15 -08:00
Kun Huang
07623e24c9 Implement shape inference function for Im2Colop
Summary: Inference function for the Im2ColOp: caffe2/caffe2/operators/im2col_op.cc.

Differential Revision: D4608663

fbshipit-source-id: d26ffb403c2acb7a5ead5f58f044ee3340c8311a
2017-02-27 10:46:54 -08:00
Kevin Matzen
04d02632e9 instance norm test fix
Summary:
Reduce the test input size for the instance norm gradient check. The larger size is currently timing out on stress tests,
e.g.: Timeout: Ran out of time before finding a satisfying example for test_instance_norm_gradients. Only found 2 examples in 125.39s.

Reviewed By: Yangqing

Differential Revision: D4608828

fbshipit-source-id: ce17a3ad28752d808efcbf79f1ea4238e63fb005
2017-02-25 14:31:42 -08:00
Peng Yang
8ab13eea6f delete redundant comment lines.
Summary: delete redundant comment lines.

Differential Revision: D4600596

fbshipit-source-id: 4bb619f9ff99d6f799e87970b6b6d5ea7de02c98
2017-02-24 11:04:36 -08:00
Deepak Gopinath
cd4ea42048 Allowing creation of random odd length arrays in RandGaussian
Summary: curandGenerateNormal can only generate arrays whose lengths are multiples of 2. The MSRAFill and GaussianFill operators use the RandGaussian utility method, which in turn uses curandGenerateNormal. This adds a test which runs the operators on both devices to generate odd-sized random arrays.

Differential Revision: D4602819

fbshipit-source-id: e65f5c731e925886cfa14afff482f7053bd020a0
2017-02-23 15:03:22 -08:00
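One plausible way to serve odd-length requests from an even-length-only backend is to round the request up to an even count and truncate; a sketch in plain Python, with `random.gauss` standing in for curand (this is an assumed workaround, not the diff's actual code):

```python
import random


def rand_gaussian(n):
    """Return n Gaussian samples from a backend that only fills even-length
    buffers: round up to an even count, generate, then truncate."""
    even_n = n + (n % 2)
    buf = [random.gauss(0.0, 1.0) for _ in range(even_n)]
    return buf[:n]
```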
Yury Zemlyanskiy
4a53ab3cb6 LSTMWithAttention implementation in Caffe2
Summary:
Implementation of `LSTMWithAttention`

Still TBD:
1. There are problems with back propagation, because gradient is not implemented for ops with broadcasting
2. I need to make initial_recurrent_state to be of shape [dim] rather than [1, batch_size, dim], so one doesn't need to provide batch_size to LSTMWithAttention

Differential Revision: D4298735

fbshipit-source-id: 8903fcff4d6a66647ee6d45a6ef28803fc3091e5
2017-02-23 04:08:34 -08:00
Andrew Tulloch
312821d36c Allow in-place instance norm.
Summary:
In-place is ~30% speedup, but needs a change to torch2caffe
or a graph rewrite on the client.

Differential Revision: D4577582

fbshipit-source-id: c31bf8ba97f4fa4cedf355cf2475eb7bab48b304
2017-02-22 14:03:55 -08:00
Artem Volkhin
45e1905722 add support of fp16 to SparseLengthsSum and SparseLengthsMean
Summary: Another part of making DPER compatible with half-floats. This diff adds support of fp16 to the segment reduction operators used in DPER.

Reviewed By: dzhulgakov

Differential Revision: D4587560

fbshipit-source-id: 0ae10648a7286a820bffaee802464dd9464584bc
2017-02-22 11:05:55 -08:00
Peng Yang
26be1977bf fix CrossEntropyOp bug for batch input
Summary: this fixes a bug in the Eigen implementation that calculates cross-entropy for batch inputs.

Reviewed By: salexspb

Differential Revision: D4582078

fbshipit-source-id: 4c92047e9dbbe219fcbef618a45c584c2fbfaad5
2017-02-21 17:34:31 -08:00
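For reference, per-row cross-entropy over a batch is -sum_j label_ij * log(prob_ij); a plain-Python sketch of the math (not the Eigen code being fixed):

```python
import math


def cross_entropy(probs, labels):
    """Per-row cross-entropy for a batch of probability rows.

    The `l > 0` guard skips zero-label terms, avoiding log(0) for
    probabilities the label never selects.
    """
    return [-sum(l * math.log(p) for p, l in zip(prow, lrow) if l > 0)
            for prow, lrow in zip(probs, labels)]
```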
Alisson Gusatti Azzolini
04eccb8ebe Performance counters
Summary:
- Key-value store for counters.
- Counters are updated via macros that also export USTD probes.
- Counter values can be exported using caffe2 operators.
- Snapshot mechanism for tracking time-window counter values.

Reviewed By: dzhulgakov, pietern

Differential Revision: D4553761

fbshipit-source-id: 25a1a91a3168dcff2159c6fba7b357d3fd3aa9bf
2017-02-21 16:31:24 -08:00
Qichao Que
7f4d5e9900 Add feed label parser operator.
Summary: Add feed label parser operator; this layer depends on D4520993.

Reviewed By: kennyhorror

Differential Revision: D4538797

fbshipit-source-id: 8efcd7b2f6962c30023c7464a13c125ba1a99dc4
2017-02-21 14:17:00 -08:00
Ahmed Taei
5bc3d2ef03 Add ReduceFront GPU Op's
Summary: Add GPU implementation for ReduceFront{Sum|Mean} Ops

Differential Revision: D4577270

fbshipit-source-id: 697f498531af6b9da4a0138d2a9beb39234f9756
2017-02-17 16:46:42 -08:00
Xianjie Chen
d0621a2449 NextScopedBlob with well-defined behavior and respect namescope
Summary:
Remove the use of `NextName` in the layer model helper, so that the same function returns a `model_helper` that constructs an identical `Net` when under the same NameScope.

`NextScopedBlob` should only take effect when there is a real name conflict; otherwise it returns a ScopedBlobReference.

This is critical for parameter blobs. In the long run, we need to be able to specify parameter blobs more explicitly (kennyhorror is working on this). This solution works in the short term for, e.g., two-tower sparse nn models.

Reviewed By: kennyhorror

Differential Revision: D4555423

fbshipit-source-id: 2c4b99a61392e5d51aa878f7346466a8f14be187
2017-02-16 17:16:36 -08:00
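The "only rename on a real conflict" behavior can be sketched as a suffix search over existing blob names; the helper below is hypothetical, not the caffe2 function:

```python
def next_scoped_blob(existing_blobs, namescope, base):
    """Return namescope/base if free; otherwise append _1, _2, ... until
    the name no longer conflicts with an existing blob."""
    name = f"{namescope}/{base}"
    if name not in existing_blobs:
        return name
    i = 1
    while f"{name}_{i}" in existing_blobs:
        i += 1
    return f"{name}_{i}"
```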
James Cross
b436788b16 LSTMUnit: pass through H values
Summary:
Pass through the h-value recurrent output unchanged at each LSTM step beyond the valid part of a sequence (computed based on seqLengths, allowing batching of sequences of different length). This enables using the final-step output of each sequence as the output when one vector is desired for the entire sequence. Gradient also passed back unchanged.

Also made some cosmetic changes to recurrent_network_test.py (seq_lengths offset corrected, should be in [1, T] rather than [0, T-1]).

Reviewed By: urikz

Differential Revision: D4540307

fbshipit-source-id: 73a9f6326069d713dcb0cdc8d17869317c6dbe96
2017-02-16 15:31:38 -08:00
Steven Strijakov
5429031917 Adding SoftmaxWithLoss operator to Shape Inference
Summary: This diff adds shape inference for the SoftmaxWithLoss Operator

Differential Revision: D4565835

fbshipit-source-id: 1c2db398524c765977ec4d8a22c9b986bf9faf82
2017-02-16 12:32:51 -08:00
Yury Zemlyanskiy
40534de705 Gradient for Copy operator
Summary:
One can find a reason, why I need gradient for CopyOp in this post - https://fb.facebook.com/groups/1405155842844877/permalink/1639683782725414/

Gradient for CopyOp is trivial when the copy stays on the same device (CPU, or the same GPU), but gets a little harder when the copy was made across two different GPUs.
I introduce a new operator, CopyOnDeviceLike, which has an additional second input. The op copies the first input to the same device as the second one. The default implementation is exactly the same as CopyOp, but I specialize it for CUDAContext.

Please, let me know if I'm doing anything wrong here! That's my first caffe2 diff, related to operators definitions.

Reviewed By: Yangqing

Differential Revision: D4557258

fbshipit-source-id: 9494be589cc1e5696bbbfe25b7622aaa4c9efe4a
2017-02-16 06:11:27 -08:00