* [bootcamp] Improve "Shape" operator to support axes specification
Improve the Shape operator of Caffe2 to support x.shape(tensor, axes), which takes an optional int array "axes" as input. For example, x.shape(tensor, [1, 0]) returns the dimensions for axes 1 and 0, in the specified order. In the current version, "axes" may contain duplicates and may have arbitrary length.
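A minimal sketch of the intended semantics, in plain Python rather than the Caffe2 operator itself. The helper name `shape_with_axes` is hypothetical, and it takes a shape tuple instead of a tensor so that it stays self-contained:

```python
def shape_with_axes(tensor_shape, axes=None):
    """Return the dimensions of `tensor_shape`.

    If `axes` is given, return only those dimensions, in the given order.
    Duplicate axes and any list length are allowed, as described above.
    """
    if axes is None:
        return list(tensor_shape)
    return [tensor_shape[axis] for axis in axes]


# Illustration on a tensor of shape (2, 3, 4).
full = shape_with_axes((2, 3, 4))
reordered = shape_with_axes((2, 3, 4), [1, 0])
```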
* Back out "Add barrier net that runs before training nets"
Original commit changeset: b373fdc9c30f. Need additional changes to some callers to support barrier failures.
* Change warning to verbose log to reduce log spam
The `LOG(WARNING)` was a bit spammy for regular use, so let's just make it a `VLOG`.
* Extract the shared code from different caffe2_benchmark binaries
The OSS benchmark and Internal benchmark will share most functions in the benchmark.
* Support MFR in sequence training
As titled.
* Make knowledge distillation work using the logged prediction feature as teacher label.
1) Add loading of a raw dense feature as teacher label.
2) Optional calibration function for teacher label
3) Add teacher label into generic unit test
4) Deprecate the TTSN workflow version that uses feature_options to configure the teacher label
* [C2/CUDA]: unjoined cross entropy sigmoid
as desc
* Add async_scheduling executor into deferrable_net_exec_test
Add async_scheduling into tests and fix some exception cases
* Fix Event disabled error
When disabling events in RNN ops, make sure we don't call Finish on a disabled
event from the op's RunAsync
* cuda ensure cpu output op can handle both TensorCPU and TensorCUDA
as desc.
* [C2 Core] Infer input device option in C2 hypothesis_test checkers
Improve how we default input blob device options.
Previously they defaulted to the device where the op lives, but that is not necessarily correct.
For example:
CopyCPUToGPU
* [C2 Op]SplitByLengthsOp CPU/GPU implementation
* fix undefined symbol error
not sure why we're getting undefined symbol even with link_whole = True
Need to figure out why but need this workaround for now
* Add tools in DAIPlayground platform to help debugging models
Add additional tools that allow Playground to override individual methods defined in AnyExp. This lets users create modules that specifically change certain default method behavior. An example included in this diff is deactivating test model and checkpointing. When debugging any model problems, switching off components helps me quickly narrow down the location of the bug. The technique is extensively used in task T27038712 (Steady memory increase in EDPM, eventually resulting in gloo/cuda.cu:34: out of memory)
* add shape and type inference for int8 conversion operator
* Fix flaky test for group_norm
* Fix group_norm_op_test flaky
* Implementation of composite learning rate policy
In many state-of-the-art deep learning works, people use a simple trick to
schedule the learning rate: use a fixed learning rate until the error plateaus,
then switch to a different fixed learning rate, and so on. In this diff,
we implement a simple version of the composite learning rate. The user gives
a set of learning rate policies and corresponding iteration counts, and the
optimizer changes the learning rate policy based on the number of iterations so far.
For example, the user gives two learning rate policies, FixedLearningRate
and PolyLearningRate, with an iteration count of 1k. For the first 1k iterations,
we use FixedLearningRate. For the following iterations, we use PolyLearningRate.
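The switching logic described above can be sketched in plain Python. This is an illustration, not the Caffe2 implementation: the names `fixed_lr`, `poly_lr` and `CompositeLR` are hypothetical, the polynomial decay formula is an assumed form, and the sketch assumes each policy sees an iteration count relative to the start of its own stage.

```python
def fixed_lr(base_lr):
    """Policy that always returns the same learning rate."""
    return lambda it: base_lr


def poly_lr(base_lr, max_iter, power):
    """Assumed polynomial decay: base_lr * (1 - it / max_iter) ** power."""
    return lambda it: base_lr * (1.0 - min(it, max_iter) / max_iter) ** power


class CompositeLR:
    """Chain policies; each stage runs for a given number of iterations."""

    def __init__(self, stages):
        # stages: list of (num_iter, policy). The last stage handles all
        # remaining iterations, so its num_iter is ignored.
        self.stages = stages

    def __call__(self, it):
        start = 0
        for i, (num_iter, policy) in enumerate(self.stages):
            is_last = i == len(self.stages) - 1
            if is_last or it < start + num_iter:
                return policy(it - start)
            start += num_iter


# FixedLearningRate for the first 1k iterations, PolyLearningRate afterwards.
schedule = CompositeLR([
    (1000, fixed_lr(0.1)),
    (None, poly_lr(0.1, max_iter=1000, power=1.0)),
])
```

Calling `schedule(it)` with the global iteration count returns the learning rate of whichever stage is active at that point.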
* Split two use cases of CachedReader into two classes, DBFileReader and CachedReader
# Use Cases:
1). input: DB file -> output: DatasetReader.
Use DBFileReader.
2). input: Reader -> build cache DB file -> output: DatasetReader.
Use CachedReader.
# Changes to CachedReader:
1). Move db_path to the constructor.
Because in the mock reader, the cache will always be built ahead of time.
# Changes to tests:
1). Make a separate TestCase class for CachedReader and DBFileReader.
2). Make it possible to add more test functions by adding setUp, tearDown and _make_temp_path.
3). Make deleting db_path more general. `db_path` could be a file for `log_file_db`, but could also be a directory for `leveldb`.
* Back out "On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization"
Original commit changeset: 4489c6133f11
* Fix LARS bug
Fixed a bug in the LARS implementation which caused all subsequent blobs not using LARS to have the LARS learning rate multiplier applied to them.
* [tum] support sparse init & add uniformFill option
as title
* Propagate exception for async nets
Capture the exception when an exception is thrown in async nets and re-throw it after wait(). This allows exceptions to be propagated up to the caller.
This diff was a part of D7752068. We split the diff so that C2 core files changes are in a separate diff.
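The capture-and-rethrow pattern can be illustrated with a small Python analogy using a worker thread. This is not the C2 core code; the class name `AsyncTask` and its methods are hypothetical.

```python
import threading


class AsyncTask:
    """Run a callable on a worker thread and re-raise its exception on wait()."""

    def __init__(self, fn):
        self._fn = fn
        self._exc = None
        self._thread = threading.Thread(target=self._run)

    def _run(self):
        try:
            self._fn()
        except Exception as exc:
            # Capture instead of letting the exception die with the worker.
            self._exc = exc

    def start(self):
        self._thread.start()

    def wait(self):
        self._thread.join()
        if self._exc is not None:
            # Propagate the captured exception to the caller.
            raise self._exc
```

A caller would invoke `start()`, continue with other work, and wrap `wait()` in its usual exception handling.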
* Automatic update of fbcode/onnx to 69894f207dfcd72d1e70497d387201cec327efbc
Previous import was 403ccfbd0161c38f0834413d790bad0874afbf9a
Included changes:
- **[69894f2](https://github.com/onnx/onnx/commit/69894f2)**: Use op schema.all tensor types in random like definitions (#865) <Scott McKay>
- **[b9d6b90](https://github.com/onnx/onnx/commit/b9d6b90)**: Clarify random like operators (#846) <Scott McKay>
- **[fc6b5fb](https://github.com/onnx/onnx/commit/fc6b5fb)**: Refactor shape inference implementation (#855) <anderspapitto>
- **[b7d8dc8](https://github.com/onnx/onnx/commit/b7d8dc8)**: fix cmake warning message (#863) <Eric S. Yu>
- **[f585c5d](https://github.com/onnx/onnx/commit/f585c5d)**: add pytorch-operator test for tile (#831) <Wenhao Hu>
- **[993fe70](https://github.com/onnx/onnx/commit/993fe70)**: add install step (#832) <Eric S. Yu>
- **[68bc26c](https://github.com/onnx/onnx/commit/68bc26c)**: add type inference for traditional ml ops except classifier ops. (#857) <Ke Zhang>
- **[9cc0cda](https://github.com/onnx/onnx/commit/9cc0cda)**: fix string representation of scalar types (#858) <G. Ramalingam>
- **[1078925](https://github.com/onnx/onnx/commit/1078925)**: fix y in pow test case to scalar (#852) <Wenhao Hu>
- **[c66fb6f](https://github.com/onnx/onnx/commit/c66fb6f)**: Add some math function shape inference (#845) <anderspapitto>
- **[ff667d1](https://github.com/onnx/onnx/commit/ff667d1)**: Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853) <Marat Dukhan>
- **[11c6876](https://github.com/onnx/onnx/commit/11c6876)**: clear initializer names when clear initializer (#849) <Wenhao Hu>
- **[73c34ae](https://github.com/onnx/onnx/commit/73c34ae)**: Clarify FeatureVectorizer description. (#843) <Scott McKay>
- **[1befb9b](https://github.com/onnx/onnx/commit/1befb9b)**: Remove useless text in docs (#850) <Lu Fang>
- **[e84788f](https://github.com/onnx/onnx/commit/e84788f)**: Fix SELU attributes' default values (#839) <Lu Fang>
- **[ebac046](https://github.com/onnx/onnx/commit/ebac046)**: Add tile test case (#823) <Wenhao Hu>
- **[8b7a925](https://github.com/onnx/onnx/commit/8b7a925)**: a few more shape inference functions (#772) <anderspapitto>
- **[9718f42](https://github.com/onnx/onnx/commit/9718f42)**: Make the coefficient non optional for LinearClassifier (#836) <Jaliya Ekanayake>
- **[ef083d0](https://github.com/onnx/onnx/commit/ef083d0)**: Add save_tensor and load_tensor functions for Protos (#770) <Lu Fang>
- **[45ceb55](https://github.com/onnx/onnx/commit/45ceb55)**: Check if CMAKE_BUILD_TYPE set before project(). (#812) <Sergii Dymchenko>
- **[4b3d2b0](https://github.com/onnx/onnx/commit/4b3d2b0)**: [WIP] reenable shape inference tests (#834) <anderspapitto>
- **[22d17ee](https://github.com/onnx/onnx/commit/22d17ee)**: RNN tests: LSTM, GRU, SimpleRNN (#739) <Peyman Manikashani>
- **[de65b95](https://github.com/onnx/onnx/commit/de65b95)**: dimension denotation (#443) <Tian Jin>
- **[eccc76e](https://github.com/onnx/onnx/commit/eccc76e)**: fix field number issue in onnx operator proto and enable its build (#829) <Ke Zhang>
- **[d582beb](https://github.com/onnx/onnx/commit/d582beb)**: disable shape inference test to unbreak ci (#830) <Lu Fang>
- **[485b787](https://github.com/onnx/onnx/commit/485b787)**: function proto for composite op. (#802) <Ke Zhang>
- **[cd58928](https://github.com/onnx/onnx/commit/cd58928)**: specify defaults for attributes of Affine op (#820) <G. Ramalingam>
- **[7ee2cf9](https://github.com/onnx/onnx/commit/7ee2cf9)**: merge the dummy backend back into the main one (#743) <anderspapitto>
- **[1c03a5a](https://github.com/onnx/onnx/commit/1c03a5a)**: [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551) <Marat Dukhan>
- **[3769a98](https://github.com/onnx/onnx/commit/3769a98)**: Rename real model test case from VGG-16 to ZFNet (#821) <Lu Fang>
* [C2]ReluN Op
relu n op.
tf reference: https://www.tensorflow.org/api_docs/python/tf/nn/relu6
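For reference, the element-wise function is min(max(x, 0), n); relu6 is the case n = 6. A plain Python sketch (the function name `relu_n` is illustrative, not the operator's registered name):

```python
def relu_n(values, n=6.0):
    """Clamp each value to the range [0, n]."""
    return [min(max(v, 0.0), n) for v in values]


clamped = relu_n([-1.0, 3.0, 8.0])
```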
* Call destructor when assigning a blob value
* Add executor overrides
Add executor overrides flag to enable migration to async_scheduling executor
* Add barrier net that runs before training nets - attempt #2
Add a synchronization barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before they start training. This reduces the chance of the faster shards timing out during GLOO AllReduce.
Removed explicit data_parallel_model.py.synchronize call in holmes workflow.
This change was landed previously but caused errors for some EDPM workflows - See https://fb.facebook.com/groups/1426530000692545/permalink/1906766366002237/ - because EDPM assumes any call to CreateOrCloneCommonWorld and Gloo ops is wrapped in exception handlers, but in this case the exception thrown in the barrier init net is not handled.
To address this issue, we add _CreateOrCloneCommonWorld to the param_init_net instead of a new barrier init net. Since errors in the param_init_net run are handled gracefully with a re-rendezvous, this should fix the problem.
* Handle empty nets in async_scheduling
Make sure we don't get stuck on empty nets
* use CUDA_ARCH for conditional compile
* [C2 fix] infer function for ensure_cpu_output_op
* Update group_norm test to reduce flaky test
* Fix lr_multiplier for GPU
* [fix] Re-enable events in RNN ops
We added event disabling in RNN ops earlier because back then we didn't use
events. With current use cases this is no longer true
(https://fburl.com/8vd0lp8y)
* use ops with cuda impl
* Revert D7729695: [caffe2][fix] Re-enable events in RNN ops
This reverts commit 4b215c7496fb724656ff4c776933a15bdbbcde5e
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* [observer] Clean up observer_config.h
#accept2ship
* [1/n] Refactor dataio_test.py
Replace code duplication with a common function
* Add barrier net that runs before training nets
Add a synchronization barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before they start training. This reduces the chance of the faster shards timing out during GLOO AllReduce.
Removed explicit data_parallel_model.py.synchronize call in holmes workflow. Similar change in speech/asr_training workflow will come in another diff.
* Support the dnnlowp backend in caffe2_benchmark
This is for SHARE operator latency evaluation
* Migrate integral_image_op to main caffe2
migrate integral_image_op(GPU version) given by https://fburl.com/yvqezigi
to caffe2/caffe2/operators and implement its CPU version. Write up a test
using the hypothesis_test mechanism
* [pos_disc, fbcode] Implement unjoined lr loss
As explained in https://our.intern.facebook.com/intern/wiki/Model_Based_Calibration/, when the dataset is an unjoined data set, where labels might change later, we need to use unjoined logloss.
The implementation is almost the same as in Sigrid (https://fburl.com/1trngsls), where
loss = y (log(p) - log(1-p)) + (1-y)(log(1-p)) = xy - (1-y)x - (1-y)log(1+exp(-x))
For x < 0, to ensure stability and avoid overflow, we use log(1+exp(-x)) = -x + log(1+exp(x)) and reformulate the above expression as
loss = xy - (1-y)x + (1-y)x - (1-y)log(1+exp(x)) = xy - (1-y)log(1+exp(x))
Then the final expression becomes
loss = xy + (y - 1) x (x >= 0) - (1 - y) log(1 + exp(x - 2 x (x >= 0)))
where y is the true label, x is the dot product and p = logistic(x).
This implementation is aligned with the current implementation of the original cross entropy in
https://phabricator.intern.facebook.com/diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/cross_entropy_op.cc;0bae3b5d0f825897c5e0dd0ff10f489d7271bf25$7-13
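As a sanity check on the final expression, here is a plain Python sketch that compares it against the direct definition in terms of p = logistic(x). The function names are hypothetical and the sign convention follows the formulas in this message. This is not the operator code.

```python
import math


def unjoined_lr_loss_naive(x, y):
    """Direct definition: y (log(p) - log(1-p)) + (1-y) log(1-p)."""
    p = 1.0 / (1.0 + math.exp(-x))
    return y * (math.log(p) - math.log(1.0 - p)) + (1.0 - y) * math.log(1.0 - p)


def unjoined_lr_loss_stable(x, y):
    """Final expression: the exp argument is always -|x|, so it cannot overflow."""
    ind = 1.0 if x >= 0 else 0.0  # the indicator (x >= 0)
    return (x * y
            + (y - 1.0) * x * ind
            - (1.0 - y) * math.log1p(math.exp(x - 2.0 * x * ind)))


# The two forms agree for moderate x; only the stable form handles large |x|.
diff = abs(unjoined_lr_loss_naive(-0.5, 0.0) - unjoined_lr_loss_stable(-0.5, 0.0))
```

In this sketch the naive form raises an overflow error for very negative x (for example x = -1000), while the stable form still evaluates.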
* Keep the array to fix the conflict
* [C2] Compute Adagrad effective LR
The AdagradWithLR op outputs an extra blob which contains the average effective learning rate across all weights in this blob.
* Open-source extractMetaNetDef & runGlobalInitialization, add new Predictor constructor from db file, and add run_map_outputs
1. Open-source extractMetaNetDef and runGlobalInitialization, for use in
2. new Predictor constructor from db file.
3. Add new run function that returns outputs as TensorMap
* Disable eigen cpu
Disable eigen cpu in transpose and reduce
* Introduce request_only/object_only property of ModelLayer
by default this is False
* A simple TC Caffe2 benchmark
We can run the tuner, get MappingOptions, and then use them to
compare against cuBLAS.
Currently broken due to LLVM issues. How to run:
hg checkout eec1ab31b59c03b8deded1c755a9abaf8c45be01
add D7401202
add D7434625
add D7506031
add D7540728
buck run @mode/dev-nosan tc/tc/benchmarks_python:caffe2_benchmark
* Move Caffe2 feature_maps_ops to open source
Need feature maps operators in open source project facebookresearch/BlueWhale
* Manually fix the conflicts in channel shuffle op
* Fix the inconsistency between different gh and fbcode
* Skip Adagrad GPU Test (Because some gpu implementation is missing)
* Fix another test to make sure it won't run on gpu when implementation is not available yet
* [GanH]: two_task_discriminator
as titled
and adding label smooth
* [Dper2] Simplified UI options needed for blob magnitude visualization
* [GanH]: fix tags
as titled
* Added type and shape inference for GatherRange operator
This helps with type / shape inference when using this operator in layers.
Also just a nice to have in general.
* Demonstrate Caffe2 exception handling with StoreHandlerTimeoutError in Python
We'd like to catch and recover from certain Caffe2 net exceptions. Use this diff to demonstrate a pattern of registering a pybind exception mapping and catching it in Python, using caffe2::StoreHandlerTimeoutException.
* Bind Gloo IoException to IoError in Python
Allow peer failure handling and recovery using an exception based mechanism. This diff registers gloo::IoException with pybind.
* [GanH]: add label smoothing to softmax with loss
as titled
* [C2] Enable LARS in Adagrad and hook it to DPER
* [DPER] Don't pass LayerModelHelper in create_trainer_nodes
Since we're planning to get rid of it eventually and I want to get access to
NetDef only interface ASAP - I'm looking towards removing all references to
LMH, where we don't really need them.
* fix bugs in LambdaRankNdcgOp
The loss and gradient in LambdaRankNdcgOp are incorrect. The loss should be the negative log of probs instead of the log.
* Restrict thread pool on iOS to only big cores
Historically, iPhones exposed only one type of cores, and Caffe2 thread pool used all of them.
However, iPhone 8/iPhone X exposes 2 big + 4 LITTLE cores. As our thread pool doesn't support work stealing or other forms of load balancing, fast cores end up waiting for the slow ones, and it may be better to restrict execution to only 2 fast cores, like we do on Android.
* Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine
* make clang happy and get fewer warnings
* [Personalization] Support add_output_schema() in layer_model_helper
Problem:
Currently the output_schema of sparse_nn can only be set once. https://fburl.com/efth5zer.
Solution:
For flexibility, we want to add fields to output_schema incrementally.
Plan:
Wrap the change of `model._output_schema` into a new function `add_output_schema()` for adding additional output_schema.
Callsite:
The add_output_schema() should be called instead at https://fburl.com/efth5zer
Reference:
The newly added `add_output_schema()` will be similar to `add_loss()` in https://fburl.com/t2ii8njh
* [C2] Implement Layer-wise Adaptive Rate Scaling (LARS)
* add unit test for Lars
* set default value for lars to be None
* remove lars for subclasses of SgdOptimizer
Summary: The original implementation averaged the momentum across the embedding dimensions, which doesn't make any sense. This meant all the embedding dimensions received the same update, becoming a very memory-expensive one-dimensional embedding.
Differential Revision: D7003135
fbshipit-source-id: ed54e3427bc13895a4e949e96b4b17f6ebfb6d53
Summary: Added the RowWise functionality for SparseAdam, which saves roughly 2/3 memory usage by only keeping one first and second moment term for each row of the parameter tensor, rather than one for each individual parameter.
Differential Revision: D6679342
fbshipit-source-id: ce6fb27e35ce41a890c66f6089cd2748d10e7a44
Summary: A while ago, we had to change some blob names in `optimizer.py` (more specifically, names of `iteration_mutex` and `optimizer_iteration`) to handle corner cases when preparing a net for parallel execution.
Reviewed By: azzolini
Differential Revision: D6480819
fbshipit-source-id: a03a7aa9fad322a50e7785914b0eb0f8654e6d90
Summary:
Add `RmsPropOptimizer` to `optimizer.py` so RMSProp can be used as an optimizer.
`RmsPropOptimizer` uses `RmsPropOp` to update the gradient and `MomentumSGDUpdateOp` to update the model parameters.
Differential Revision: D6118279
fbshipit-source-id: e38b8380ff74c1d1bb1e87fc300b6b55e32cd2e0
Summary:
The current version of the code does not support type and shape inference, which
is going to make all places that rely on it fail miserably.
I'm still leaving the option of doing init the old way in case some places
are already failing this inference logic.
Reviewed By: ffjiang
Differential Revision: D6241270
fbshipit-source-id: e9080ffe93d610b5ada58ebe66579acfa57c6b3c
Summary: Add support for SparseMomentumSGDUpdate and tests for momentum SGD in both dense and sparse cases
Reviewed By: akyrola
Differential Revision: D6234834
fbshipit-source-id: 9848c29ea06794ef35f1ebaff0f5e81eac4f4db9
Summary: Made the assertion message clearer to let people know that rowwise is not supported for dense adagrad.
Differential Revision: D6135363
fbshipit-source-id: d706135a335305627310c69a2a6d7721b0a47f0e
Summary:
Added FP16SgdOptimizer to optimizers. The optimizer updates the params using the FP16MomentumSGDUpdate and FP32MomentumSGDUpdate ops. To determine which update op to call the optimizer expects either the fp32_update flag to be set, or that the blobs are in a recognized format created by initializers.py.
These requirements can be loosened if the blob DataType can be queried in python, though I am unsure of how to do this.
It also forces FP32 updates to SpatialBN as CuDNN does not support FP16 params for SpatialBN.
Reviewed By: asaadaldien
Differential Revision: D5840806
fbshipit-source-id: 84ab8dc11a6e91a198ed72c00287f4809607079d
Summary: Model with rowwise RMSProp does not work in net-rewriting pipeline (fbl 29841194). This diff solves the issue by changing the way Slice op is used in the model and adds a rule to `parallelize.py` to cover for needed cases.
Reviewed By: azzolini
Differential Revision: D6096022
fbshipit-source-id: c4f615b2ba99da9f77a1d49c9fb898e0e59401f8
Summary: 1. iteration and LR must be node-name specific in optimizer
Reviewed By: azzolini
Differential Revision: D6001124
fbshipit-source-id: 0fa53fb3347e89401f62125865166356ac56796b
Summary: Implemented version of SparseAdagrad that only keeps track of an average sum of squared gradients term for each row of the parameter tensor, rather than a sum of squared gradients term for each individual parameter.
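A rough sketch of the row-wise idea in plain Python, assuming the common Adagrad form `w -= lr * g / (sqrt(moment) + eps)` and using the mean of squared gradients over each row as the per-row statistic. The function name and sign convention are illustrative assumptions; this is not the operator's code.

```python
import math


def rowwise_sparse_adagrad(params, moments, indices, grads, lr, eps=1e-8):
    """Update only the rows listed in `indices`, in place.

    params:  list of rows (each a list of floats)
    moments: one float per row, instead of one per parameter
    grads:   one gradient row per entry of `indices`
    """
    for idx, grad_row in zip(indices, grads):
        # Accumulate the average squared gradient of the whole row.
        moments[idx] += sum(g * g for g in grad_row) / len(grad_row)
        step = lr / (math.sqrt(moments[idx]) + eps)
        params[idx] = [w - step * g for w, g in zip(params[idx], grad_row)]


params = [[1.0, 1.0], [2.0, 2.0]]
moments = [0.0, 0.0]  # same length as the number of rows
rowwise_sparse_adagrad(params, moments, indices=[1], grads=[[3.0, 4.0]], lr=0.1)
```

The memory saving comes from `moments` holding one value per row rather than one per parameter.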
Differential Revision: D5881918
fbshipit-source-id: bd96ccf25554b457baaaca9309fc8048adbb37f7
Summary: When training on billions of examples, the Adagrad sum of squared gradients can become very large, which creates the problem of adding small numbers to big numbers. This diff allows decaying the Adagrad sum of squared gradients.
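One way to picture the decay, as a scalar Python sketch: the accumulator is multiplied by a decay factor before each new squared gradient is added. The function name, the parameter name `decay`, and the exact update form are assumptions for illustration.

```python
import math


def adagrad_step(w, h, g, lr, eps=1e-8, decay=1.0):
    """One scalar Adagrad step; decay=1.0 recovers the usual accumulator."""
    h = decay * h + g * g  # decayed sum of squared gradients
    w = w - lr * g / (math.sqrt(h) + eps)
    return w, h


# With decay < 1 the accumulator stays bounded for bounded gradients.
w, h = 1.0, 0.0
for _ in range(3):
    w, h = adagrad_step(w, h, g=1.0, lr=0.1, decay=0.5)
```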
Reviewed By: queqichao
Differential Revision: D5825932
fbshipit-source-id: 570224483b77d42ae53410fa2f767af86de167eb
Summary: While there is currently support for scaling the base learning rate when loading the model, there is no support for scaling the base learning rate during training. This is needed for LATTE's seq2seq translation models, as the learning schedule is not predefined and is modified at runtime.
Reviewed By: jhcross
Differential Revision: D5701391
fbshipit-source-id: ae3bec45f238db1a2be7af9c04d720067e9095d5
Summary: Moved code for global norm-based gradient clipping from fb specific workflows (seq2seq) to the open-source caffe2 optimizer library
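For readers unfamiliar with the technique, global norm-based clipping rescales all gradients by a common factor when their combined L2 norm exceeds a threshold. A plain Python sketch with a hypothetical function name, not the optimizer library's API:

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients so that their joint L2 norm is at most max_norm.

    grads: list of gradient vectors (lists of floats), one per parameter.
    Returns (clipped_grads, global_norm).
    """
    global_norm = math.sqrt(sum(v * v for grad in grads for v in grad))
    if global_norm <= max_norm:
        return [list(grad) for grad in grads], global_norm
    scale = max_norm / global_norm
    return [[v * scale for v in grad] for grad in grads], global_norm


clipped, norm = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
```

Because every gradient is multiplied by the same factor, the direction of the overall update is preserved.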
Reviewed By: jhcross
Differential Revision: D5637453
fbshipit-source-id: 7e73c9a1c97c28a152c188467b27a6449f79242e
Summary:
Fix case when optimizer isn't called within a device scope context.
Fix OptimizerContext lr blob names
Reviewed By: volkhin
Differential Revision: D5421046
fbshipit-source-id: 186a0d05f40d4442c5ba5736084626da73a0c0f1
Summary: this diff adds optimizer into param_info, and the associated implementations for modelhelper and brew to set optimizer for each individual parameter.
Reviewed By: kennyhorror
Differential Revision: D5385432
fbshipit-source-id: 5d682f9d1ab077e04a5d76a24d71470f4e64fc92
Summary:
salexspb This fixes a major perf issue (40% boost on alexnet end-to-end perf) in the multi-precision SGD optimizer - it was causing repeated cudaMalloc / cudaFree calls during training iterations due to the changing size of the `grad` blob as it moved from fp16 <-> fp32.
Closes https://github.com/caffe2/caffe2/pull/797
Differential Revision: D5246978
Pulled By: salexspb
fbshipit-source-id: ec3d7ef18445e19eaf5aac908d0a7bcd5957eb60
Summary:
Add add_weight_decay to optimizer + test.
In D5142973 I accidentally removed weight decay from resnet50 trainer, so this restores it.
Reviewed By: asaadaldien
Differential Revision: D5173594
fbshipit-source-id: c736d8955eddff151632ae6be11afde0883f7531
Summary:
A recent diff introduced a duplicate parameter to the model, which would hurt performance and also affect correctness (duplicate momentum updates, for example). We unfortunately had no checks for duplicate params outside of data_parallel_model, which fortunately brought this to our attention.
But it is better to have a Validate() function in model_helper, and call that before adding gradient ops and querying for parameters. Added to brew_test calls as well.
Reviewed By: kennyhorror
Differential Revision: D5163458
fbshipit-source-id: 35692e8bfcc359d4e8bc73e6f2358659f6e45ceb
Summary:
Adds support for generating and training pfp16 models. Added SGD optimizer for multi-precision trainers and a new callback to data_parallel_model in order to help multi-precision models keep their different copies of parameters in sync during training.
Closes https://github.com/caffe2/caffe2/pull/697
Differential Revision: D5159712
Pulled By: salexspb
fbshipit-source-id: 60a889494d2e2f4df1d720331e19f638c5eb95cc
Summary: Fix an issue where the parameter is not created in param_init_net, or net, and then we secondarily look at which device op outputs the gradient. This did not work if the gradient was a GradientSlice.
Reviewed By: harouwu
Differential Revision: D5153102
fbshipit-source-id: 20eae660ea32e5a9ea484bf93c04c8f8c71a51ed
Summary:
This diff does two things:
- add support for optimizer to data_parallel_model. The user can supply optimizer_builder_fun instead of param_update_builder_fun. The latter is called for each GPU separately with the proper namescope and devicescope, while the optimizer builder is called only once and adds optimizers to the whole model.
- use MomentumSGDUpdate instead of MomentumSGD + WeightedSum. This brings major perf benefits.
Changes resnet50 trainer to use optimizer.
This relies on D5133652
Reviewed By: dzhulgakov
Differential Revision: D5142973
fbshipit-source-id: 98e1114f5fae6c657314b3296841ae2dad0dc0e2
Summary:
hankun is using the optimizer, but with a mixed set of GPU and CPU operators. Currently this won't work with the optimizer since it adds optimizers for all parameters in the current device scope. But we can actually infer the device that a param belongs to by looking at the device option in the param_init_net.
Added a test as well.
Reviewed By: salexspb
Differential Revision: D5133652
fbshipit-source-id: ad8689d75ac1f5c78981bae1b6978fe91e40ef0f
Summary:
This is going to unblock Nvidia in their work on adding fp16
support to Caffe2. I discussed this with kennyhorror before to make
sure this fits into his work on parameter sharing.
Reviewed By: kennyhorror
Differential Revision: D5127797
fbshipit-source-id: 4db155d320b1862570c23b77c4252bdacbf2296f
Summary: Mutex is only supported on CPU. We need to make sure the mutex and the following atomicIter are both on CPU. This is critical for GPU SparseNN training.
Differential Revision: D5093184
fbshipit-source-id: 021e6ba699a3208449fa4761cad6b0ec4544957e
Summary:
In transfer learning, parameters initialized from a pretrained model might require
a different learning rate than those initialized otherwise. To this end, we
implement a Python solution where `base_learning_rate` is scaled by `scale`,
which is in turn set by `scale_learning_rate`. Alternatively, we can achieve
the same effect by rewriting the LearningRate operator in C++.
Reviewed By: kennyhorror
Differential Revision: D4992827
fbshipit-source-id: 8d7e87a61c95b3eb8ef733ec436f4060e865c0ac
Summary:
Adds a parameter cost estimation step before the actual training starts. The costs are later used in order to better shard the parameters across instances of the parameter server.
Things I needed to modify:
- A few changes to make ModelLayerHelper picklable
- Add support for stopping a distributed job after a number of stats reporting steps.
- Refactored run_dist_job to support collocating the reader with the trainer even when PS are present.
- Option to disable dense updates (when num_dense_servers=0).
Currently there's a huge overhead posed by having to launch a child workflow. I'll try to address this in a subsequent diff.
This is WIP because the other workflows need to be migrated as well.
I can break this down into smaller diffs if reviewers would prefer it.
Reviewed By: kennyhorror
Differential Revision: D4974752
fbshipit-source-id: 04c336acb2945f8f11324a221ffc6967818c0672
Summary:
1. Adds a function to return auxiliary parameters for each optimizer. This function can be used to serialize the optimizers so that they can be recovered.
2. Fixes the bug that the iteration blob is not incremented by one in each iteration. If k parameters use the Adam optimizer, the original implementation incremented the iteration blob by k per iteration.
Reviewed By: azzolini
Differential Revision: D4872397
fbshipit-source-id: d86711feedda2ba83af5f2a18141b06a6a473733
Summary:
The current optimizer code in c2/python has the following issues:
(1) the optimizers in sgd.py cannot config per param-blob optimizer;
(2) sgd.py is a bad file name. optimizer.py is a better name;
(3) layer_model_helper.py has another set of optimizer code (which supports per param-blob optimizer)
This diff does the following:
(1) create optimizer objects so that we can configure a per param-blob optimizer while staying compatible with the existing optimizer code
(2) make the new optimizer code much more modular
(3) move the optimizer code to a file with a better name (optimizer.py)
(4) replace the optimizer imports in the existing code
Will do in next diffs:
(1) optimizers with structured parameters for dper2
(2) get rid of the optimizer code in layer_model_helper.py
Reviewed By: salexspb
Differential Revision: D4609013
fbshipit-source-id: 2e2d6dfa8685d10498f89069157453d9feca3f27