* [fix] Re-enable events in RNN ops
We have earlier added event disabling in RNN ops as back then we didn't use
events, with current use cases this is no longer true
(https://fburl.com/8vd0lp8y)
* use ops with cude impl
* Revert D7729695: [caffe2][fix] Re-enable events in RNN ops
This reverts commit 4b215c7496fb724656ff4c776933a15bdbbcde5e
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* [observer] Clean up observer_config.h
#accept2ship
* [1/n] Refactor dataio_test.py
Replace code duplication with a common function
* Add barrier net that runs before training nets
Add a synchonize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before start training. This reduce chances of the faster shards timing out during GLOO AllReduce.
Removed explicit data_parallel_model.py.synchronize call in holmes workflow. Similar change in speech/asr_training workflow will come in another diff.
* Support the dnnlowp backend in caffe2_benchmark
This is for SHARE operator latency evaluation
* Migrate integral_image_op to main caffe2
migrate integral_image_op(GPU version) given by https://fburl.com/yvqezigi
to caffe2/caffe2/operators and implement its CPU version. Write up a test
using the hypothesis_test mechanism
* [pos_disc, fbcode] Implement unjoined lr loss
As explained in https://our.intern.facebook.com/intern/wiki/Model_Based_Calibration/, when the dataset is an joined data set, where labels might change later, we need to use unjoined logloss.
The implementation is almost the same as in Sigrid (https://fburl.com/1trngsls), where
loss = y (log(p) - log(1-p)) + (1-y)(log(1-p)) = xy - (1-y)x - (1-y)log(1+exp(-x))
For x < 0, to ensure stability and avoid overflow, we reformulate the above exp as
loss = xy - (1-y)x - (1-y)x + (1-y)log(1+exp(x)) = xy + (1-y)log(1+exp(x))
Then the final expression becomes
loss = xy + (y - 1) x (x >= 0) - (1 - y) log(1 + exp(x - 2 x (x >= 0)))
where y is the true label, x is the dot product and p = logistic(x).
This kind of implementation is align with the current implementation of the original cross entropy in
https://phabricator.intern.facebook.com/diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/cross_entropy_op.cc;0bae3b5d0f825897c5e0dd0ff10f489d7271bf25$7-13
* Keep the array to fix the conflict
* [C2] Compute Adagrad effective LR
The AdagradWithLR op outputs an extra blob which is contains the average effective learning rate across all weights in this blob.
* Open-source extractMetaNetDef & runGlobalInitialization, add new Predictor constructor from db file, and add run_map_outputs
1. Open-source extractMetaNetDef and runGlobalInitialization, for use in
2. new Predictor constructor from db file.
3. Add new run function that returns outputs as TensorMap
* Disable eigen cpu
Disable eigen cpu in transpose and reduce
* Introduce request_only/object_only property of ModelLayer
by default this is False
* A simple TC Caffe2 benchmark
We can run tunner, get MappingOptions and then use them to
compare against cuBLAS
currently broken due to LLVM issues. How to run:
hg checkout eec1ab31b59c03b8deded1c755a9abaf8c45be01
add D7401202
add D7434625
add D7506031
add D7540728
buck run @mode/dev-nosan tc/tc/benchmarks_python:caffe2_benchmark
* Move Caffe2 feature_maps_ops to open source
Need feature maps operators in open source project facebookresearch/BlueWhale
* Manually fix the conflicts in channel shuffle op
* Fix the inconsistency between different gh and fbcode
* Skip Adagrad GPU Test (Because some gpu implementation is missing)
* Fix another test to make sure it won't run on gpu when implementation is not available yet
with python3 np.int defaults to int64. This diff should fix it. I don't know if test exist for this function already, however following ASR test was breaking when i switch to py3
```
buck test caffe2/caffe2/fb/speech/asr_training/:tensor_parser_test
```
Summary:
There is a long lasting problem of scoping which was introduced in original python wrappers early in H1. Basically each RNNCell implemented has to manually scope outputs of each of the operators. If somebody forgets, then there could be weird bugs with layers etc.
Approach is the following. User has to explicitly specify current scope when using apply_over_sequence function and others if the function is going to be called several times (like for stacking layers). This way we use Caffe2 native scoping approach instead of inventing one extra API people have to use (i.e. passing scope name as an argument to the RNNCell constructor).
Closes https://github.com/caffe2/caffe2/pull/1681
Differential Revision: D6777536
Pulled By: salexspb
fbshipit-source-id: 73d860b8d4857589e04bdea5a6fcd3080d68427c
Summary:
Data type conversion between Numpy Array and Caffe2 Tensor currently only support 3 types: FLOAT, DOUBLE and INT32. Support 8bit and 16bit date types will help reduce the model size in some circumstance. I benefit from this to reduce size of a data set from 8GB to 1GB by using INT8.
Closes https://github.com/caffe2/caffe2/pull/930
Reviewed By: Yangqing
Differential Revision: D5440929
Pulled By: akyrola
fbshipit-source-id: 3762da1d845e62a13ba384d1c144328b19dd663b
Summary: Our existing serialization routines take a significant amount of time for large numpy arrays in order to verify the type of each element in the array as well as converting each element to a canonical type. For large floating-point tensors, such as model parameters, this checking and converting takes a significant amount of time. Adding a fast track path for just float32 arrays as this is the most common use case to worry about.
Reviewed By: akyrola
Differential Revision: D5389953
fbshipit-source-id: 26f44cb2426ea3efb849e7707b27d5485f69956c
Summary:
fixing missing future package issue.
Recently we found some of our users does not have future module support. So we might need a try/catch wrapper around all past import
Reviewed By: Yangqing
Differential Revision: D5183547
fbshipit-source-id: 262fdf2940ee1be4454bf0b0abb9e6a0f1a0ee82
Summary: This diff is one step towards enabling python 3 build by making it be more diligent in its handling of strings.
Reviewed By: salexspb
Differential Revision: D4893083
fbshipit-source-id: 28b8adf3280e8d1f0a7dc9b0fee5ad53f2fada57
Summary:
Free scratch blobs at data workers exit. Also add utility function that you can use to reset gradient blobs easily:
from caffe2.python import utils
grad_blobs = [b for b in workspace.Blobs() if b.endswith("_grad") or b.endswith("_shared")]
utils.ResetBlobs(grad_blobs)
Reviewed By: rpenggithub
Differential Revision: D4955531
fbshipit-source-id: d33b2bb2b5247dd2c4cff51c82b1257c871a4179
Summary:
If command line flag caffe2_gpu_memory_tracking is enabled, CUDAContext will keep track of total memory allocated on each GPU. This requires keeping tracking of the sizes of the pointers, thus it might add some overhead, and is thus optional. The overhead is minimal in practice since we don't do allocations after first iterations, usually, though.
Added an op GetGPUMemoryUsage() to fetch this data programmatically, and python function utils GetGPUMemoryUsageStats() to call this op and package the results. Modified LSTM benchmark to report these stats.
This tracking is only for GPU now. CPU allocations are less organized..
Reviewed By: asaadaldien
Differential Revision: D4877451
fbshipit-source-id: 857798fe499d8c78cc590783052cbb2d4db56ea0
Summary:
Use AddNet and AddBlobs to add net and blobs to meta_net_def.
This a codemod and does not change the functionality.
It is for preparation of the protobuf change.
Depends on: D4770648
Reviewed By: salexspb
Differential Revision: D4771110
fbshipit-source-id: 00cecb2105f2c332bd50c3c51b9a10e1004fa90f
Summary:
Codemod to use a separate function, for protobuf change later on
It does not change the functionality
Reviewed By: salexspb
Differential Revision: D4770648
fbshipit-source-id: d8090f45d31ffa5ca1dca47297fb7c196f34d8a6
Summary:
Now it takes two lines to get drop-in debugger: import it and
then decorate your function. Also got rid of enable / disable logic as
it doesn't seem usefull.
We can also try to enable this by default for our tests when running
locally as a next step.
Reviewed By: bwasti
Differential Revision: D4444299
fbshipit-source-id: 6e2006945d8ad640685b1017ca1bd63054728908
Summary:
It helps to develop scripts locally (when working outside of Flow). One doesn't have to rerun the script in order to catch exception in the debugger / add a print statement. (Flow does this kind of thing automatically)
Usage example:
```
if __name__ == '__main__':
workspace.GlobalInit(['caffe2', '--caffe2_log_level=2'])
from caffe2.python.utils import DebugMode
DebugMode.enable()
DebugMode.run(main)
```
Reviewed By: Yangqing
Differential Revision: D4424096
fbshipit-source-id: 73f418c80f581820e70139df7e166981e4d8c55f
Summary:
I got a weird error about NoneType not being iterable which made me think
it was some error in the C2 core, whereas it was an error in my code.
Reviewed By: Yangqing
Differential Revision: D4192799
fbshipit-source-id: 0122f13e205c1c6a0766545f0ad6296228d3a3d9