Summary: Add synchronous optimization on GPUs to the translation training pipeline via data_parallel_model.Parallelize_GPU, which needs to be updated to support sparse parameter updates (e.g., on embedding tables), whether on GPU or CPU.
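A minimal sketch of how the pipeline would invoke it (the builder functions here are hypothetical placeholders; only the call shape follows data_parallel_model):
```
from caffe2.python import data_parallel_model

# Hypothetical builder functions for the translation model.
data_parallel_model.Parallelize_GPU(
    train_model,                                    # a ModelHelper
    input_builder_fun=add_inputs,                   # per-GPU input readers
    forward_pass_builder_fun=add_forward_pass,      # builds the net, returns losses
    param_update_builder_fun=add_parameter_update,  # per-GPU SGD/Adagrad etc.
    devices=[0, 1, 2, 3],
)
```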
Reviewed By: urikz
Differential Revision: D4631914
fbshipit-source-id: 9cdd655f7dbda3f9b2733d459228b3e097892441
Summary: This adds a nearest neighbor interpolation resizing operator to caffe2. CPU only, NCHW only, no gradients. Also adds torch2caffe support. This is probably not optimal in terms of performance, but it works.
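A minimal usage sketch, assuming the operator is registered as ResizeNearest with float width_scale/height_scale arguments:
```
import numpy as np
from caffe2.python import core, workspace

# NCHW input; CPU only per this diff.
workspace.FeedBlob('X', np.random.rand(1, 3, 8, 8).astype(np.float32))
op = core.CreateOperator(
    'ResizeNearest', ['X'], ['Y'], width_scale=2.0, height_scale=2.0)
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob('Y').shape)  # expected: (1, 3, 16, 16)
```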
Reviewed By: ajtulloch
Differential Revision: D4724244
fbshipit-source-id: b8295061141fb513da84acf91fdfd67264119059
Summary:
1. migrate the basic mtml model to dper 2
2. test dper 2 mtml model
3. test all optimizers
Reviewed By: kittipatv
Differential Revision: D4680215
fbshipit-source-id: 7aac5c59bdac22fcad8ed869b98e9e62dca1d337
Summary: Add a layer that takes a (label, prediction) pair and outputs the L2 loss.
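In math terms, the layer computes the squared error between prediction and label, L(y, y') = sum_i (y'_i - y_i)^2; whether a 1/2 factor or batch averaging is applied is not stated in this diff.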
Reviewed By: kittipatv
Differential Revision: D4702111
fbshipit-source-id: 09f2ede44d1b548e61096de741f1b2aa0b66bbcb
Summary:
It was broken in trunk and I fixed it locally, but then had a
wrong merge in D4672026. This is just a revert of those changes.
Reviewed By: ajtulloch
Differential Revision: D4723138
fbshipit-source-id: 14757d9c8ae5135bd7c084003a64e25efc74b54f
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter
Reviewed By: urikz
Differential Revision: D4702086
fbshipit-source-id: c4c1d8425cd36c1e86695918eaba2667c27e9601
Summary:
/cc akyrola
I basically just copied all the `ShapeCall` stuff as `TypeCall`. Is there a better way?
Closes https://github.com/caffe2/caffe2/pull/187
Differential Revision: D4699312
Pulled By: Yangqing
fbshipit-source-id: 92f736ffe4127b00b5821acb1eb359771975fdd7
Summary: For some embedding tasks, we don't want to include a bias term in the embedding computation.
Reviewed By: xianjiec
Differential Revision: D4689620
fbshipit-source-id: 4168584681d30c0eaa1d17ceaf68edda11924644
Summary:
Make it use Gloo and optionally use Redis for rendezvous (where a
shared filesystem is not available).
Differential Revision: D4709943
fbshipit-source-id: 59cc7a14316c7b634417ea5161a75fab3c19f2fa
Summary:
We have more and more nested Struct schemas. There is an increasing need to get/add a field by nested name, e.g., for the following nested Struct schema:
st = Struct(
    ('a', Scalar()),
    ('b', Struct(
        ('c', Scalar()),
    )),
)
We may want to get the field "b:c" and/or insert a new field "b:x". The immediate need is for dper2 metrics.
This diff is to achieve this.
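A hypothetical usage sketch, assuming ':' is the nested-name separator as in the example above:
```
from caffe2.python import schema

st = schema.Struct(
    ('a', schema.Scalar()),
    ('b', schema.Struct(
        ('c', schema.Scalar()),
    )),
)

# Hypothetical nested get added by this diff:
c_field = st['b:c']
```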
Reviewed By: kittipatv
Differential Revision: D4690225
fbshipit-source-id: 71d4a74b36bd1228a2fefd901db2f200602152b7
Summary: For example, test and train nets could have shared workspaces, leading to a race condition. This adds an assertion and a running counter to the workspace-blob name.
Reviewed By: jhcross
Differential Revision: D4712152
fbshipit-source-id: 808d7069095bac24ebfe0c9d31ebd134f4cf0956
Summary:
No longer need GPU-to-CPU copies. The allreduce operator no longer
uses the 'local allreduce - global allreduce - local broadcast' sequence
when Gloo is used, but passes all input blobs directly.
Depends on D4708860.
Differential Revision: D4709897
fbshipit-source-id: 4d745d5d8bac9c2fcca081dd5d812c902808c3b6
Summary:
This is going to allow experimenting with various training-from-scratch / fine-tuning techniques. The code itself for the new model is not intended to be used as is. Instead, one could train a full precision model first, then add quantization for the last layer, then for the next one, and so on.
In my experiments I took a pretrained model and quantized all inception layers with 4 bits. This restored the original accuracy after several dozen iterations.
Also in this diff I added a common prefix to the model checkpoint and added this prefix to git/hg ignore.
And also some extra logs which are useful for quickly seeing how things changed right after enabling quantization.
Differential Revision: D4672026
fbshipit-source-id: b022c8ccf11dd8a2af1a7b2e92673483bc741a11
Summary:
These are all essentially no-op changes which allow for nose-style (or pytest-style) test discovery.
With this patch, you can use any of these methods to discover and run tests under `caffe2/python`:
```
python -m unittest discover -p '*test*.py' caffe2/python/
python -m nose caffe2/python/
python -m pytest caffe2/python/
```
Future work:
* Get all of the tests to pass
* Some seem to be testing operations which don't have GPU implementations
* I get a segfault unless I set `CUDA_VISIBLE_DEVICES=0`
* Some tests are flaky
* Allow test discovery throughout the whole project (e.g. the `experiments/` dir)
Closes https://github.com/caffe2/caffe2/pull/199
Reviewed By: pietern
Differential Revision: D4704504
Pulled By: Yangqing
fbshipit-source-id: 8f5687ec9c8aa873dfaff30dbf44272bc38a206b
Summary:
First, this diff includes a full test of data-parallel LSTM, which confirms it works correctly. To make it work, some changes had to be made:
- cell net/step net external inputs must be namespace-scoped
- prevent double-namescoping of cell net inputs
- make data parallel model understand recurrent nets so the device mapping works
Reviewed By: salexspb
Differential Revision: D4708840
fbshipit-source-id: 4b0ddc43642d449076a2b6f67ad1c47f84138ff4
Summary: Some operators, e.g., SoftmaxWithLoss, return scalar-typed tensors. This would allow us to use those ops without having to write a layer manually.
Reviewed By: xianjiec, kennyhorror
Differential Revision: D4703982
fbshipit-source-id: f33969971c57fc037c9b44adb37af1caba4084b6
Summary: When cloning a recurrent net op, we do a remapping of the lengths-blobs. But if they don't exist (as with CRF), we should not do that.
Differential Revision: D4702123
fbshipit-source-id: 37a22d11e709011b8b98b2cc3d9f08eb9fda06c4
Summary: These Python helpers are going to provide sufficient bookkeeping when adding quantization for conv layers.
Reviewed By: Yangqing
Differential Revision: D4671478
fbshipit-source-id: 292e2f633dd30969c0afbe7a8075b340ce9a6d12
Summary: UNK needs to be indexed in the vocabulary for validation to work. Default args now result in the training loss decreasing.
Reviewed By: urikz
Differential Revision: D4703393
fbshipit-source-id: e4d6ad100daf8392f8ba1e502f9ecf39bb8ce24a
Summary:
It has been a pain to save predictor-compatible models from Caffe2. This diff adds a function, ExtractPredictorNet, that takes a training model and outputs a predictor model by removing all operators that are not relevant for prediction, such as the backward pass and dequeue ops for input loading (in a predictor, the input data is an external input).
We can also consider including this directly in the predictor exporter for FB usage.
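A rough usage sketch; the exact signature and blob names are assumptions, not confirmed by this diff:
```
from caffe2.python import model_helper

# Strip backward-pass and input-loading operators from the training net,
# keeping only what is needed to go from 'data' to 'softmax'.
predict_net = model_helper.ExtractPredictorNet(
    net_proto=train_model.net.Proto(),
    input_blobs=['data'],
    output_blobs=['softmax'],
)
```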
Reviewed By: rpenggithub
Differential Revision: D4693264
fbshipit-source-id: e81abbbec0bd4d717159cf36488d0baaf0130090
Summary:
Implement ReduceBackSum & ReduceBackMean with gradients for CPU & GPU contexts.
The reduction happens over the last dimensions; for example, if the input is an
M x N matrix, ReduceBackSum will produce a vector of dim M x 1 containing the
rowwise sums.
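A minimal sketch of the new op on CPU:
```
import numpy as np
from caffe2.python import core, workspace

X = np.random.rand(4, 5).astype(np.float32)  # M x N
workspace.FeedBlob('X', X)
workspace.RunOperatorOnce(
    core.CreateOperator('ReduceBackSum', ['X'], ['Y']))
Y = workspace.FetchBlob('Y')  # rowwise sums; should match X.sum(axis=1)
```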
Differential Revision: D4689768
fbshipit-source-id: 5b0482d4341867ecf23526dc6c4d544420e7d8f7
Summary: Add shape inference for Reshape. Because the output shape cannot be inferred when the new shape comes from runtime tensor data, set `out[0].set_unknown_shape(true)` if no `shape` argument is used.
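The two cases, sketched with the Python op interface (blob names are illustrative):
```
from caffe2.python import core

# Static shape argument: the output shape can be inferred.
op_static = core.CreateOperator(
    'Reshape', ['X'], ['Y', 'old_shape'], shape=[2, 6])

# Shape supplied as a runtime input: out[0] gets marked unknown.
op_runtime = core.CreateOperator(
    'Reshape', ['X', 'new_shape'], ['Y', 'old_shape'])
```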
Differential Revision: D4671125
fbshipit-source-id: 685a9198f9b08e3336014c792f20051b381d8619
Summary: We should be using the vocabulary built on the training data, and corpus_eval as the data for the evaluation phase.
Reviewed By: urikz
Differential Revision: D4700382
fbshipit-source-id: ca1dd043a28f9bb585faad050c82fb12c1cdf6cc
Summary: Fixed a bug (AttributeError: ModelTrainerLog instance has no attribute 'external_loggers', at File "caffe2/python/experiment_util.py", line 101) when no external_loggers is passed to ModelTrainerLog().
Differential Revision: D4697197
fbshipit-source-id: 1c770c366d87ea474bcf40ab289b67c76648d48b
Summary:
Otherwise the blob will be in a different namescope, e.g., `_nested`: https://fburl.com/ntlsaezv.
This makes TensorBoard ugly.
Reviewed By: dzhulgakov
Differential Revision: D4696946
fbshipit-source-id: 73627feccd7c4896964e6c549b7241bcce4f49a7
Summary:
TSIA
This change also fixes an undefined attribute error after running 20
iterations of the resnet50 example trainer.
Differential Revision: D4692794
fbshipit-source-id: b98efdfeb078c5ba89d2a86837f3c672e1eade5f
Summary: A lot of people get confused if the file can't be loaded.
Reviewed By: rpenggithub
Differential Revision: D4686572
fbshipit-source-id: 519ff68a3d4f04cf8ce893f255f7814e043383b6
Summary: We ran InferToDeviceMapping too early; we should have done it also after running the parameter update function, since that can create new blobs, like the momentum blobs. This fix is maybe not optimal, but it works and is fast enough.
Differential Revision: D4693450
fbshipit-source-id: 4c4cc2396dad371b3fbcd1d8da51133ea09a57e0
Summary:
Before, we didn't propagate the 'out-of-data' signal if splits_per_epoch wasn't specified.
Right now it's a hacky fix (just reusing ReaderWithLimit). azzolini - any suggestions for a more elegant solution? I could create an extra reader that just exports an 'is empty' signal.
Overall, I guess we need to turn global_queue into a more sustainable unit test that verifies all possible combinations - I'm still not sure it's correct :-\
Reviewed By: xianjiec
Differential Revision: D4665677
fbshipit-source-id: fe44d10ee82c3383145635e67dea1d9b666e061f
Summary: When debugging using LayerModelHelper, adding Print to the model will trigger this assert.
Reviewed By: xianjiec
Differential Revision: D4687859
fbshipit-source-id: 6932e38f8dd17ba0b80da18a20943ecdb2e8af0a
Summary: Thanks to shenpan for detecting this bug. The problem is that FinalizeAfterCheckpoint() can be passed a list of strings, not blob references, which fails in stripParam() after the assertion I added in D4649208. It is OK to pass strings to that function as well.
Reviewed By: jhcross
Differential Revision: D4691028
fbshipit-source-id: 0bca80d44a5ab641438cc5b26482bca0b1527d69
Summary: Following krp's suggestion, check if the shape parameter is empty.
Reviewed By: dzhulgakov
Differential Revision: D4686698
fbshipit-source-id: 3f9fb1e3215dd2a4a726442531201eeb18224bc6
Summary:
Created a new function with the specifics of the MI LSTM implementation in Caffe2.
See https://arxiv.org/pdf/1606.06630.pdf for details.
See D4478877 for the implementation of the same in tensorflow
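For reference, per the paper, multiplicative integration replaces each gate's additive pre-activation $W x_t + U h_{t-1} + b$ with
$\alpha \odot (W x_t) \odot (U h_{t-1}) + \beta_1 \odot (U h_{t-1}) + \beta_2 \odot (W x_t) + b$,
where $\alpha$, $\beta_1$, $\beta_2$ are learned vectors and $\odot$ denotes elementwise multiplication.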
Reviewed By: jhcross
Differential Revision: D4669882
fbshipit-source-id: 095bbcf187dbdac2cd79558ff0c8f9f67d8af639
Summary:
OSS implementation of the seq2seq model in Caffe2. The script uses the Seq2SeqModelCaffe2 class to build and run the model. It takes training data in the form of a text file with one sentence per line, builds a vocabulary, generates batches based on batch size, and runs the net for a configurable number of epochs. It prints the total scalar loss at the end of each epoch.
All FBLearner and neural_mt type-system dependencies have been removed. Unimplemented and unnecessary methods have been removed to make the script simpler.
fblearner/flow/projects/langtech/translation/neural_mt/model_util_caffe2.py has been moved to caffe2/caffe2/python/examples/seq2seq_util.py and remains unchanged.
Potential TODOs:
- Get the model running on GPU. Only GatherOp does not have a corresponding GPU implementation. Try adding CopyGPUToCPU before and CopyCPUToGPU after Gather, and use a CUDA DeviceOption.
- Add evaluation on test data with a suitable metric (perplexity? BLEU?)
Reviewed By: urikz
Differential Revision: D4653333
fbshipit-source-id: 1c7d970ebc86afe23fad4d48854296bf54eb0f77
Summary: ReversePackedSegs operator for CUDA. The "lengths" input (static integers) is required to be in CPU memory.
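A minimal sketch on CPU; the (T, N, D) data layout is an assumption based on the usual Caffe2 RNN convention:
```
import numpy as np
from caffe2.python import core, workspace

# data: (T, N, D); lengths: one entry per batch element, kept on CPU.
workspace.FeedBlob('data', np.random.rand(5, 2, 3).astype(np.float32))
workspace.FeedBlob('lengths', np.array([5, 3], dtype=np.int32))
workspace.RunOperatorOnce(core.CreateOperator(
    'ReversePackedSegs', ['data', 'lengths'], ['reversed']))
```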
Differential Revision: D4661281
fbshipit-source-id: c800c316c34015ba8e732dcbcaa8c4edaffdfeab
Summary:
Data parallel model did not support sparse operations, nor gradients computed on CPU ops.
Currently sparse operations are done on CPU, so there is no point in "data parallelizing" them. I had to make a few changes to data_parallel_model to support this:
1. Model can have params that are added prior to adding the data parallel part. For example, a lookup table of word vectors would be a parameter that is non-parallel.
2. Thus, when data parallel model is called, it will separate the non-parallel params and avoid working on them. Note: when we add a distributed version, we need to explicitly handle them with AllGather!
This works nicely since Caffe2 automatically adds the backward concat-operator when multiple ops gather from the same blob.
I also added support for data parallel CPU ops, which might be necessary in cases where we don't have a GPU implementation of some ops.
The test in data_parallel_model_test validates the correctness of the code by running the same trainer on different numbers of GPUs and checking that the end result is the same.
Reviewed By: jhcross
Differential Revision: D4649208
fbshipit-source-id: e3b7ae701ead468dc94c52a976eafec5c9831097
Summary: This diff is getting rid of old metrics interface in realtime training.
Reviewed By: xianjiec
Differential Revision: D4649734
fbshipit-source-id: de4af85eb5476df9790ebd3915625bf8beee65af
Summary: The sum processor and sqrt pooling mimic the DoubleHelix model.
Differential Revision: D4678413
fbshipit-source-id: fc1ccfe3c92c540ce5914dfd8ff1a040805c48db
Summary: AccumulateHistogramOp, for computing the histogram of all values in input tensors
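A hypothetical invocation; the argument and output names here (lower_bound, upper_bound, num_buckets, cur_hist, acc_hist) are assumptions, not confirmed by this diff:
```
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob('X', np.random.randn(1000).astype(np.float32))
# Bucket the values of X into a fixed-range histogram, accumulating across runs.
workspace.RunOperatorOnce(core.CreateOperator(
    'AccumulateHistogram', ['X'], ['cur_hist', 'acc_hist'],
    lower_bound=-3.0, upper_bound=3.0, num_buckets=64))
```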
Differential Revision: D4654417
fbshipit-source-id: dea92346004c772af16e1eb41306287d81dc5a02
Summary: Take user inputs for the introspection visualization: convolution output-layer activations, filters selected by containing phrases, and the number of samples.
Reviewed By: Mortimerp9
Differential Revision: D4603797
fbshipit-source-id: dc972dcb8ad36e30defab266d710e047b11cff73
Summary: Super rough implementation of recurrent attention. Planning to factor out the common code between the two functions, as well as between train and eval. I want to get this out and get eyes on it sooner rather than later.
Differential Revision: D4647837
fbshipit-source-id: 54bc4e8ed0df6f04c86c425926decbe89f73b068
Summary: In the case of a distributed task, load_from_db() loads into the wrong workspace when used inside a Python op. Pass the workspace to use explicitly, so that it loads into the one the Python op is being run in.
Reviewed By: kennyhorror
Differential Revision: D4653692
fbshipit-source-id: 94585c012b05ee38b9ce5e8ef0efdd50aa41dd2b
Summary: The evaluation part of the two-tower workflow is missing. This diff completes it. Some of the newly added functions can be used for other workflows, e.g., feed. As the eval parts of different workflows will overlap, a generic eval workflow will be added in a separate diff.
Reviewed By: kennyhorror
Differential Revision: D4646880
fbshipit-source-id: 4d6eb35df10f6f613533d442f2a04dc0332386f8