Summary: Currently, MultiNodeCheckpointManager returns None in this case, yet in JobRunner we assume this function returns a valid task group, i.e. we call session.run(self.checkpoint_manager.init(...)) directly. This fails when we use LocalHostScheduler and reuse a MultiNodeCheckpointManager.
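A minimal sketch of the guard implied above; `session`, `checkpoint_manager`, and `node_names` are illustrative stand-ins, not the exact JobRunner internals:
```
def run_checkpoint_init(session, checkpoint_manager, node_names):
    # Illustrative only: MultiNodeCheckpointManager may return None here
    # (e.g. when reused with LocalHostScheduler), so guard before running.
    init_group = checkpoint_manager.init(node_names)
    if init_group is not None:
        session.run(init_group)
```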
Reviewed By: azzolini
Differential Revision: D6843450
fbshipit-source-id: a7ec942cfe692f19e8751b0078ae6a6108f29e54
Summary: To match the semantics of ONNX, change the default value of LeakyRelu's alpha to 0.01.
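For illustration, a minimal operator construction showing what the new default means in practice (blob names are placeholders):
```
from caffe2.python import core

# With this change, omitting `alpha` behaves like the explicit 0.01 below,
# matching the ONNX default.
op_default = core.CreateOperator("LeakyRelu", ["X"], ["Y"])
op_explicit = core.CreateOperator("LeakyRelu", ["X"], ["Y"], alpha=0.01)
```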
Reviewed By: dzhulgakov
Differential Revision: D6840975
fbshipit-source-id: 08543f80fd86cbe96a0eee8d725ef137a5bf4ab8
Summary:
Commonly, net observers attach operator observers at construction. This diff separates the logic into a base class to inherit from.
Closes https://github.com/caffe2/caffe2/pull/1806
Reviewed By: salexspb
Differential Revision: D6808623
Pulled By: mdschatz
fbshipit-source-id: 75ef0eea913ef30943541c829c0a976965f42736
Summary:
In this case, each sequence is treated as having a length equal to the
first dimension of the input tensor. This matches the semantics of
ONNX when the sequence length input is left out.
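A small illustration of that default, assuming a [T, B, D] input layout (shapes and values here are made up):
```
import numpy as np

# With no sequence-length input, every one of the B sequences is treated as
# having length T = inputs.shape[0].
inputs = np.zeros((7, 4, 16), dtype=np.float32)   # T=7 steps, B=4 sequences
implied_lengths = np.full((4,), inputs.shape[0], dtype=np.int32)  # [7, 7, 7, 7]
```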
Closes https://github.com/caffe2/caffe2/pull/1764
Reviewed By: dzhulgakov
Differential Revision: D6751219
Pulled By: anderspapitto
fbshipit-source-id: 89e0efd12339157627494e2b8c83e952bdd8a9f8
Summary:
Main changes:
1. Move reader creation to Brew to stay consistent and avoid ad-hoc use of param_init_net (see the sketch after this list)
2. Use the optimizer module for the training function, avoiding manual optimizer construction
3. Add an MLP mode (the default)
4. Trim overly verbose comments and add some new explanations
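A rough sketch of the two patterns from items 1 and 2, assuming a model_helper model and LMDB input named as below (both names are placeholders):
```
from caffe2.python import brew, model_helper, optimizer

model = model_helper.ModelHelper(name="mnist_sketch")

# 1. Create the reader through brew instead of touching param_init_net directly.
data, label = brew.db_input(
    model,
    blobs_out=["data", "label"],
    batch_size=64,
    db="mnist_train_lmdb",
    db_type="lmdb",
)

# 2. Let the optimizer module build the parameter updates instead of
#    constructing SGD by hand.
optimizer.build_sgd(model, base_learning_rate=0.1)
```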
Closes https://github.com/caffe2/caffe2/pull/1760
Differential Revision: D6749059
Pulled By: salexspb
fbshipit-source-id: 9dfbbb2d9772a74a0300c2e404a92e791f7cc593
Summary: Updates `sparse_lookup.py` for the new fused 8-bit rowwise quantization. Mostly just changing the same files as the original diffs (D5753626 and D5761202). I know very little about this code, so please let me know whether this is safe, also in terms of migrating away from the non-fused storage.
Reviewed By: kennyhorror
Differential Revision: D6710784
fbshipit-source-id: 185f147af52a094a937ba631b0351225e660d205
Summary:
As titled: after converting categorical to Ngram keys, use this op to extract the eids.
Differential Revision: D6794020
fbshipit-source-id: 4f9251a22d7a129da30b92845e312876e6510e7e
Summary: Adds CUDA support for the LC op.
Reviewed By: QueryConnectionException
Differential Revision: D6803659
fbshipit-source-id: 538bbf6fd202c79154132fda0e90e175eb09d025
Summary: The weighted sampling reader's dequeue randomly chooses a Hive reader to read a mini-batch. This diff allows dequeue to output the index of the randomly chosen table to a specific blob.
Reviewed By: kennyhorror
Differential Revision: D6621070
fbshipit-source-id: 754b981fc2bcfdb0146d2a0a5b677e7cfe74211b
Summary: Fix the flaky ngram-from-categorical test.
Reviewed By: dragonxlwang
Differential Revision: D6801152
fbshipit-source-id: dcbae17b1d3737a41fb2f5c794c1146a02c542bb
Summary:
Every call to the checkpoint_metadata_handler write() API currently requires passing all the params, like db_prefix, db_type, etc.
Introduce an init API in the checkpoint_metadata_handler so that such params can be saved once and need not be passed in every API call.
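A toy sketch of the intended call pattern; the class name, fields, and arguments below are illustrative, not the real handler:
```
class MetadataHandlerSketch(object):
    """Toy stand-in showing why init() helps: parameters are cached once."""

    def init(self, db_prefix, db_type):
        # Save the parameters so later calls do not need to repeat them.
        self._db_prefix = db_prefix
        self._db_type = db_type

    def write(self, epoch):
        # Uses the values cached by init() instead of taking them as args.
        return "write epoch %d to %s (%s)" % (
            epoch, self._db_prefix, self._db_type)


handler = MetadataHandlerSketch()
handler.init(db_prefix="/checkpoints/job_0", db_type="lmdb")
handler.write(epoch=0)
```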
Reviewed By: mraway, anshulverma
Differential Revision: D6792651
fbshipit-source-id: 059fa4309e8fce1ee5ab009af3e0570573c24245
Summary: This is the first in a series of diffs to enable batch normalization across multiple devices on the same node with data parallel model. The diff contains the ops for computing the per-channel statistics required to obtain the mean and variance across multiple devices on the same node on the forward pass, and the gradient of the bias and scale during backpropagation. The actual modifications to SpatialBN and SpatialBNGradient to make use of these results will be in a separate diff.
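For reference, the standard identity those per-device, per-channel statistics feed into, written as a small sketch (not the operator code itself):
```
import numpy as np

def combine_channel_stats(per_device_sums, per_device_sumsq, per_device_counts):
    # Each element of per_device_sums / per_device_sumsq is a per-channel
    # vector of sum(x) / sum(x^2) computed on one device.
    total = float(sum(per_device_counts))
    mean = sum(per_device_sums) / total
    var = sum(per_device_sumsq) / total - mean ** 2
    return mean, var

# Example with two devices and three channels:
mean, var = combine_channel_stats(
    [np.array([3.0, 6.0, 9.0]), np.array([1.0, 2.0, 3.0])],
    [np.array([5.0, 20.0, 45.0]), np.array([1.0, 4.0, 9.0])],
    [2, 2],
)
```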
Reviewed By: rbgirshick
Differential Revision: D6697336
fbshipit-source-id: 0de2750fe7e851795f238d9f625aeb4d74023dc2
Summary:
This is a first attempt at completing bootcamp task T24449916. This diff contains 3 major changes:
1) Change LayerModelHelper to allow exposing the output and parameters of any layer to metrics
2) Add a runner that allows metrics to draw arbitrary plots to a matplotlib axes object
3) Implement a metric that aggregates the distribution of values in a blob over training, and try it out in a notebook
Reviewed By: kennyhorror
Differential Revision: D6671273
fbshipit-source-id: b8961837395e89c957edbf5c7c862bdb845ccf4b
Summary: Add a test for SparseLookup with PositionWeighted.
Reviewed By: kennyhorror
Differential Revision: D6771612
fbshipit-source-id: b4b3bfd514f366f579b4192643330ae73843d4f9
Summary:
SqueezeOp supports dropping dims of size 1. MKLMemory now supports Reshape()
if the buffer is in plain layout, in which case just the dims and layout are
modified, similar to caffe2::Tensor. SqueezeOp takes care of converting the
input to plain layout if needed via an intermediate buffer before calling
Reshape().
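A quick CPU-side illustration of the Squeeze semantics involved (the MKL path adds the layout conversion described above but yields the same result):
```
from caffe2.python import core, workspace
import numpy as np

workspace.FeedBlob("X", np.zeros((4, 1, 8, 1), dtype=np.float32))
op = core.CreateOperator("Squeeze", ["X"], ["Y"], dims=[1, 3])
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("Y").shape)  # (4, 8)
```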
Differential Revision: D6735656
fbshipit-source-id: 953309498370e1b8986e8c593bc6963f38036255
Summary:
At the end of distributed training, the trainer needs to download the parameters back from the parameter servers in order to save the model. Currently, this parameter download happens at the end of the job's epoch task group, which creates several problems when checkpointing is enabled for distributed training:
1. When checkpointing is enabled, we run multiple training epochs. At the end of each epoch, the model download tasks will run to collect parameters, but we won't save the model until the true end of training, so there is a big waste of resources.
2. After trainer0 downloads the parameters, these parameters take a lot of memory, so trainer0 can easily run out of memory in the next epoch of training.
Our solution is to insert a parameter download task group between the job's training epoch_group and the job's exit_group.
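A simplified stand-in for the runner logic, just to show where the new group runs; `num_epochs` and the group attributes are illustrative, not the real Job API:
```
def run_job(session, job):
    session.run(job.init_group)
    for _ in range(job.num_epochs):
        session.run(job.epoch_group)
    session.run(job.download_group)  # new: pull params back from the PS once
    session.run(job.exit_group)      # save the model, clean up
```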
Reviewed By: azzolini
Differential Revision: D6765393
fbshipit-source-id: 5a4f556fc3c1cd7834a7c406a3c0de3fccd50c49
Summary:
This should translate to a 1% error margin. The gradient checker uses a 0.5% threshold.
Closes https://github.com/caffe2/caffe2/pull/1766
Differential Revision: D6774077
Pulled By: pietern
fbshipit-source-id: f97c7ffb2ef34fdd71d69320a7fdcf4a6a457715
Summary:
Just redirects to MKLSumOp. Doesn't support broadcasting, though, since dnnSumCreate
expects identical dims.
Differential Revision: D6729788
fbshipit-source-id: 3e189465ad9d026bec4954648562ffe4e67fc393
Summary:
As in name. The LATTE translation team, while moving some code from Python 2 to 3, uncovered a case where a comparison between unicode and str types causes NameScope('') to prepend a separator to the beginning of blob names. This fixes it.
Thank you so much to dzhulgakov for tracking down the cause of this so quickly!
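A toy illustration of the symptom and the robust check (this is not the actual scope.py code):
```
def scoped_name(namescope, name, sep='/'):
    # Robust check: any empty prefix, whether '' or u'', means "no scope",
    # so we never end up with names like '/fc1_w'.
    if not namescope:
        return name
    return namescope + sep + name

assert scoped_name(u'', 'fc1_w') == 'fc1_w'
assert scoped_name('model', 'fc1_w') == 'model/fc1_w'
```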
Reviewed By: dzhulgakov
Differential Revision: D6766866
fbshipit-source-id: fbe46cff581f425ba10e8668400915ea40baab94
Summary: Make test less computationally expensive
Reviewed By: Yangqing, dzhulgakov
Differential Revision: D6766236
fbshipit-source-id: 59e51faa1331d804b11da9f7237ee9ce0cb27df8
Summary:
Reasons for this change:
(1) Setting/getting the default GPU id doesn't seem to be used at all.
(2) It is actually confusing compared to the CUDA_VISIBLE_DEVICES option, etc.
(3) When cuda_gpu_id=-1 is set in the CUDAContext arg, it used to use the
default GPU id, but we should probably use the current GPU, so that the caller
will be able to control the device placement.
One use case is TensorRT: if we have a custom callback layer, it would
be easier for TRT, or whatever the caller is, to set the running device.
Reviewed By: dzhulgakov
Differential Revision: D6740357
fbshipit-source-id: 2ea710e434b10220d5a198e31c93847304636863
Summary: Building on D6710785 (float <-> fused_8bit_rowwise conversions) and D6710843 (`FusedEmbeddingLookup`), this diff implements the new reduction operations for the fused 8-bit rowwise storage. I mostly followed the [old 8-bit quantized code](diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/lengths_reducer_rowwise_8bit_ops.h) and [full-precision code](diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/lengths_reducer_ops.h).
Reviewed By: kennyhorror
Differential Revision: D6710844
fbshipit-source-id: b9e85db7437bd32dd44d01733c3749f35c00b06e
Summary: This first diff adds the conversion operators that go from float to our fused 8-bit rowwise quantized storage and back again. For now I've put the scale and bias in front of each row because it makes the pointer arithmetic nicer here and in the EmbeddingLookup perfkernel. If benchmarks or other reasons point out that this is a bad idea, we can change it easily.
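A rough numpy sketch of that layout, with scale and bias stored in front of each quantized row (rounding and edge cases in the real C++ operators may differ):
```
import numpy as np

def quantize_rowwise_fused(matrix):
    out_rows = []
    for row in np.asarray(matrix, dtype=np.float32):
        bias = row.min()
        scale = (row.max() - bias) / 255.0
        if scale == 0.0:
            scale = 1.0
        quantized = np.round((row - bias) / scale).astype(np.uint8)
        # 4-byte float scale + 4-byte float bias, then the uint8 row data.
        header = np.array([scale, bias], dtype=np.float32).view(np.uint8)
        out_rows.append(np.concatenate([header, quantized]))
    return np.stack(out_rows)
```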
Reviewed By: kennyhorror
Differential Revision: D6710785
fbshipit-source-id: 086ab91c12d3b472564a06eff6329be6cb9e680e
Summary:
This updates the video input op in Caffe2 so that it is up to date.
It adds additional support for:
1. optical flow and early fusion
2. different ways of sampling clips from a video
3. different ways of resizing the input video
Reviewed By: dutran
Differential Revision: D6752788
fbshipit-source-id: 0cbd4d4bbbe97b0ada4cba7a55adc91a7af60d5f
Summary:
This updates https://github.com/caffe2/caffe2/pull/1096/ to build doxygen docs with cmake and fixes operator catalog generation. See the new README.md for details, but you can run
```
mkdir build && cd build
cmake -DBUILD_DOCS=ON .. && make
```
and
```
python caffe2/python/docs/github.py ~/c2docs/_docs/operators-catalogue.md
```
to generate docs.
There was one odd issue in `generator.py`: we sometimes receive tuples and sometimes objects. I handled this just by testing `isinstance`, but we might want to be more principled in the future.
Closes https://github.com/caffe2/caffe2/pull/1758
Reviewed By: pietern
Differential Revision: D6752127
Pulled By: orionr
fbshipit-source-id: 9ba9ad8efc920b27a57327f8a7d3050f3650d4ce
Summary: Added the RowWise functionality for SparseAdam, which saves roughly 2/3 of the memory usage by keeping only one first- and second-moment term for each row of the parameter tensor, rather than one for each individual parameter.
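An illustrative sketch of the row-wise moment update described above (the real operator is in C++ and the exact averaging may differ):
```
import numpy as np

def rowwise_sparse_adam_moments(grad_rows, m, v, beta1=0.9, beta2=0.999):
    # m and v hold one scalar per *row* of the parameter tensor instead of
    # one per element, which is where the ~2/3 memory saving comes from:
    # two extra floats per row rather than two per parameter.
    for row_id, grad in grad_rows.items():   # only rows touched this step
        m[row_id] = beta1 * m[row_id] + (1.0 - beta1) * grad.mean()
        v[row_id] = beta2 * v[row_id] + (1.0 - beta2) * np.mean(grad ** 2)
    return m, v
```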
Differential Revision: D6679342
fbshipit-source-id: ce6fb27e35ce41a890c66f6089cd2748d10e7a44
Summary:
- fixed the spurious newline in the initialization of the crop layer translation, which caused the exceptions described in issue #1215
Closes https://github.com/caffe2/caffe2/pull/1746
Differential Revision: D6716228
Pulled By: Yangqing
fbshipit-source-id: dd93b06b3b903f96505d6e6f8e67caeb6981fe66
Summary:
The FC needs to be in the output_gate_t scope so it can find its input
weights correctly.
Closes https://github.com/caffe2/caffe2/pull/1739
Reviewed By: dzhulgakov
Differential Revision: D6705443
Pulled By: anderspapitto
fbshipit-source-id: 139e83ac77589a203ffe404fedab98eea5b1a51c
Summary: This diff enables setting the model initialization seed, instead of a random seed, when reproducible results are desired.
Reviewed By: xianjiec
Differential Revision: D6642971
fbshipit-source-id: 387b1ee2ecef4f8f66570c882498fb97d7007e17
Summary:
Instead of constructing db_name as a member of checkpoint_manager, generalize
this function.
Reviewed By: anshulverma
Differential Revision: D6671088
fbshipit-source-id: c528538def66933619f2fdf67820bca5d13571ea
Summary: The test fails in Jenkins because test_global_pooling_3d filtered out too many examples. We make use of the inferred value of global_pooling (pad and stride will be constant) to reduce the number of test samples generated.
Reviewed By: pietern
Differential Revision: D6686840
fbshipit-source-id: d316c0e9f9070b12770170ab9f36e33de68a9ab9
Summary:
In D5681122, when routing to global max pooling and average pooling, the condition is not correct.
See T24876217 for discussion.
Reviewed By: Yangqing
Differential Revision: D6665466
fbshipit-source-id: dcb5b4686249e6ee8e1e976ab66b003ef09b32fd