Commit Graph

1732 Commits

Author SHA1 Message Date
Yinghai Lu
582d47e986
[Caffe2] Scoped dummy name generator (#6458)
* Scoped dummy name generator

* Fix

* Fix

* Use class variable

* Fix build

* comment
2018-04-16 11:58:02 -07:00
bddppq
7ef14bf04c Follow the change of ONNX Cast operator "to" attribute (#6574)
* Follow the change of ONNX Cast operator "to" attribute

* Update Cast conversion in frontend and backend

* update pytorch onnx frontend
2018-04-16 14:24:42 -04:00
Xiaomeng Yang
cd2112717c
[caffe2] Update math functions with params on host. (#6602)
* Update ReduceMean

Add reduce mean to math

Add reduce mean to math

* sync reduce_ops_test

* Update math_gpu.cu
2018-04-14 21:41:41 -07:00
Yinghai Lu
434f710f3f
[Caffe2] Add support to TensorRT (#6150)
* Add support to TensorRT

* Removed License header

* Bind input/output by position

* Comments

* More comments

* Add benchmark

* Add warning for performance degradation on large batch

* Address comments

* comments
2018-04-11 17:03:54 -07:00
Yinghai Lu
ef8f556212
[Caffe2] Changes done inside Facebook (#6378)
* fix unit test for sqrt op

From the error logging:

[idx, grad, grad_estimate] are:
[[ 146.            0.5           0.45776367]
 [ 147.            0.5           0.45776367]

The gradient == 0.5 is correct, which means the SqrtOp and its gradient are doing the right job. (Because y = sqrt(x) and loss = y^2/2 = x/2, d(loss)/dx = 1/2 = 0.5.)

The test failed because of a numerical problem in grad_estimate (in the unit test). This can happen because the step_size is small and float precision is limited (when there are multiple elements in the tensor, we compute the loss as sum(y^2)).

This diff
- increases the step size, and also moves the test cases further away from 0 (where the gradient of sqrt(x) is not well behaved), to be safe :)
- also cleans up, and merges the test cases for in-place vs. non-in-place

Tested with:

`CAFFE2_HYPOTHESIS_PROFILE=debug ai_bt caffe2/caffe2/python/operator_test:elementwise_ops_test -- "test_sqrt"`
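For illustration, a minimal NumPy sketch of the kind of central-difference gradient check involved (not the actual hypothesis_test_util code; names and step sizes are illustrative only), showing how a small step size plus float32 precision on a summed loss can push grad_estimate away from the analytic 0.5:

```python
# Minimal sketch of a central-difference gradient check for y = sqrt(x),
# loss = sum(y^2) / 2. Illustrative only, not the actual test harness.
import numpy as np

def loss(x):
    y = np.sqrt(x)
    return np.sum(y ** 2) / 2.0  # equals sum(x) / 2, so d(loss)/dx_i = 0.5

def grad_estimate(x, i, step_size):
    # Central difference along element i.
    xp, xm = x.copy(), x.copy()
    xp[i] += step_size
    xm[i] -= step_size
    return (loss(xp) - loss(xm)) / (2.0 * step_size)

x = np.random.rand(200).astype(np.float32) + 0.1  # keep inputs away from 0
print(grad_estimate(x, 0, step_size=1e-4))  # float32 noise can pull this off 0.5
print(grad_estimate(x, 0, step_size=5e-2))  # a larger step is more robust here
```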

* CompositeReader & CompositeReaderBuilder

A new type of reader gluing multiple readers together.

* Back out "Revert D7394363: [GanH]: Log D Trick for Cross Entropy with Sigmoid"

Original commit changeset: 9325a4356dbe

* [dai][WIP] convert params to int8 on ps before sending to trainer

Add float->uint8 conversion in addition to float->fp16 conversion in model_saver.

* [easy] improve unit test for sparse length sum ops

as desc.

#accept2ship

* Update GitHub upstream to 771fcb3455

* move sparse hash unique ops to OSS and add unit tests

- move the SparseHash version to OSS, since 'sparsehash' is already a dependency of Caffe2 OSS: https://fburl.com/arssw4n1
- The 'SparseHash' engine is also used in OSS, so the SparseHash version should live in OSS to reduce confusion: https://fburl.com/o5ea7ah2

- fix the CUDA UniqueOp for the case when the batch is empty.
- add unit tests

* group_norm_op for caffe2

This is the cuda op for Group Normalization (GN): https://arxiv.org/abs/1803.08494

This code implements GN in one op that computes Y = gamma * (X - mu) / sigma + beta, along with its gradients. It is expected to have minimal memory consumption (similar to the BN op), avoiding the extra blobs that would be created if GN were implemented as several ops (e.g., reshape, norm_mean/std, affine_channel).
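For reference, a minimal NumPy sketch of the forward computation above (NCHW layout assumed); this is illustrative only and is not the fused CUDA op added in this commit:

```python
# Hedged NumPy sketch of the Group Normalization forward pass (NCHW layout
# assumed); illustrative only, not the fused CUDA implementation.
import numpy as np

def group_norm_forward(X, gamma, beta, num_groups, eps=1e-5):
    N, C, H, W = X.shape
    G = num_groups
    Xg = X.reshape(N, G, C // G, H, W)
    mu = Xg.mean(axis=(2, 3, 4), keepdims=True)
    sigma = np.sqrt(Xg.var(axis=(2, 3, 4), keepdims=True) + eps)
    Y = ((Xg - mu) / sigma).reshape(N, C, H, W)
    # gamma/beta are per-channel affine parameters.
    return Y * gamma.reshape(1, C, 1, 1) + beta.reshape(1, C, 1, 1)

X = np.random.randn(2, 8, 4, 4).astype(np.float32)
Y = group_norm_forward(X, np.ones(8, np.float32), np.zeros(8, np.float32), num_groups=4)
```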

* Resubmit D7405233: disappeared in D7464958

The OSS publish caused the op to go missing -- however, the test was still there

* [c2] add sparse hash engine for cuda unique op

The SparseHash version of UniqueOp copies the input tensor to CPU, uses a sparse hash map to compute the unique output, and then copies the result back to GPU.

* [dper][gpu] enable unit testing gpu trainer for sparse nn

to debug the GPU trainer using mock data in a unit test.

make it easier to develop the GPU trainer for new models.

* Reuse Gloo context for Synchronize() calls

Previously we were creating (and leaking) the Gloo context on each call to Synchronize(). Now we only run the common world op and create the barrier net once, then run the barrier net on each Synchronize() call. Since the timeout is associated with the Gloo context, we assert that the timeout is fixed instead of trying to handle the complexity of multiple timeouts (and associated contexts).
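The "build once, run many times" pattern looks roughly like the sketch below (hedged: the Barrier op arguments and the class are placeholders, not the actual data_parallel_model code):

```python
# Hedged sketch of reusing a single barrier net across Synchronize() calls;
# op arguments (engine, common world blob) are assumptions, not the DPM code.
from caffe2.python import core, workspace

class Synchronizer(object):
    def __init__(self, common_world):
        # Build the barrier net a single time, reusing the existing Gloo
        # common world instead of creating (and leaking) a new context.
        self._barrier_net = core.Net("sync_barrier")
        self._barrier_net.Barrier([common_world], [], engine="GLOO")
        workspace.CreateNet(self._barrier_net)

    def synchronize(self):
        # Each call only runs the cached barrier net.
        workspace.RunNet(self._barrier_net.Proto().name)
```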

* [GanH/WGAN][1/n]: add FC param clipping

as titled

* [mobile] minimizing changes between caffe2_benchmark and speed_benchmark

* [GanH]: enable diagnose within model

Instead of looking up blob names, enable it directly inside the model.

* Add `net_transformer_fun` option to DPM

This callback allows for various transformations to be made to the
model after gradient operators have been added. The immediate motivation for
this is to allow transformations such as "checkpoint-and-recompute" which
allow trading off memory for additional compute.

Adding several callbacks like this has made DPM's API less than ideal at this
stage. However, I could not find any reasonable alternative.
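A hedged sketch of the kind of callback this enables follows; the parameter list DPM actually passes is an assumption here, and the body only indicates where a real transformation would go:

```python
# Hedged sketch of a net transformer callback; the signature is assumed, not
# taken from DPM. The point is only that the callback sees the model after
# gradient operators have been added and may rewrite its nets in place.
def checkpoint_and_recompute(model, *args, **kwargs):
    ops = model.net.Proto().op
    print("transforming net with {} ops (gradients included)".format(len(ops)))
    # A real implementation would drop selected activation blobs here and
    # insert ops that recompute them during the backward pass.

# The callback would then be supplied through the new net_transformer_fun
# option on DPM (option name per this commit message).
```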

* [DT] [33/n] Compile flow task groups

Task groups need to be compiled in order to pickle the object in fblearner. However, I also changed the Job's compile function, since creating a new object is not necessary.

* Initial commit for sparse_normalize vectorization and benchmark

* [GanH]: LB Calibration for JSD

as titled

* Tracing event in async executor

Adding event tracing through the TRACE_EVENT macro in the async executor

* [Resubmit] D7409751 Resetting book-keeping blobs when the reservoir is reset

D7409751 got lost in D7464958

* Visualizing realtime weight values

We want to visualize the weight values as the optimizer iterates. This diff supports visualizing the weights at a given index.
Currently, we assume the blob is 2-dimensional.

* [GanH][Easy]: Fix Homotopy Weighting

Apparently, there was a bug in the homotopy weight (alpha, beta) update.

* [c2] move sparse hash unique op out of OSS

so that OSS does not need to depend on the Google hash map.

* Get rid of std::round as it's not supported on Android

* Revert changes on setup.py

* Skip shaky test on Dataio

* fix
2018-04-10 21:11:43 -07:00
Bram Wasti
7bd398b3db
Add fuseNNPACKConvRelu (#6439) 2018-04-10 16:51:16 -07:00
Qinqing Zheng
038b66ee07 [caffe2] use dictionary in Printer (#6443) 2018-04-10 10:37:07 -07:00
Qinqing Zheng
66791f54d5 Update the compile function of Job (#6323) 2018-04-09 22:44:23 -07:00
bddppq
df2e1d2962
Disallow using the OOP api workspace as context managers (#6456) 2018-04-09 22:13:54 -07:00
François Garillot
a91c88a348 Check mappings ONNX -> Caffe2 bear the same argument names (#6317)
* Check mappings ONNX -> Caffe2 bear the same argument names

When adding an extra arg to an input ONNX op, if it's not supported in Caffe2, the exporter would just silently pass it to the NetDef and ignore it in the implementation. That's pretty error-prone. Caffe2 also has an OpSchema description, so we can enforce that all arguments explicitly appear in the schema or are listed explicitly in Caffe2.

See also https://github.com/caffe2/caffe2/pull/2478

Add test for C2 argument checking

* Some operators do not log arguments, which prevents argument checks.
Invite users to file an issue to fix the schema.
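A hedged illustration of the kind of check described (a generic NetDef walk against a known-argument table; the table is a stand-in for the real OpSchema lookup, and this is not the actual onnx-caffe2 backend code):

```python
# Hedged illustration of the argument check: walk a Caffe2 NetDef and flag any
# op argument not declared for that op type. KNOWN_ARGS is a toy stand-in for
# the real OpSchema lookup.
from caffe2.proto import caffe2_pb2

KNOWN_ARGS = {"Conv": {"kernel", "stride", "pad", "order", "group"}}

def check_args(net_def):
    for op in net_def.op:
        allowed = KNOWN_ARGS.get(op.type)
        if allowed is None:
            # Some operators do not declare arguments; skip the check for them.
            continue
        for arg in op.arg:
            if arg.name not in allowed:
                raise ValueError("unexpected argument %s for operator %s"
                                 % (arg.name, op.type))

net = caffe2_pb2.NetDef()
op = net.op.add()
op.type = "Conv"
op.arg.add().name = "my_custom_flag"
try:
    check_args(net)
except ValueError as e:
    print(e)  # unexpected argument my_custom_flag for operator Conv
```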
2018-04-09 09:15:42 -07:00
Svetoslav Kolev
997acfd7fe [Caffe2] Some small changes to InferBlobShapesAndTypes definition and SameAsInput Schema (#6335)
* Change SameAsInput type deduction to work for ops with multiple outputs

* Change the InferBlobShapesAndTypes definition to take a vector of raw pointers instead of unique_ptrs. The function doesn't own the objects, so there is no need to pass smart pointers; doing so also prevents calling the function with an existing object, since the caller would have to create a unique_ptr, i.e. copy an existing object just to create the pointer.

* switching the order of std::move(unique_ptr) and unique_ptr.get()

* adding comma
2018-04-06 19:06:46 -07:00
Lu Fang
aab0bd3c13
Change onnx_optimizer API (#6290) 2018-04-06 13:46:53 -07:00
Lu Fang
876ad110af
Skip some unsupported onnx backend tests (#6247) 2018-04-05 21:33:35 -07:00
bddppq
8df2487de9
Properly skip the failing onnx conversion test (#6280) 2018-04-04 14:07:03 -07:00
kuttas
460e8cd376 change print to logger.warning in operator traceback code (#6216) 2018-04-03 08:01:25 -07:00
Qinqing Zheng
fd2e7cb487 Change JobRunner's __call__ function to train (#6205) 2018-04-02 21:04:36 -07:00
Paul Jesse Hellemn
771fcb3455 [caffe2] Fbcode to GitHub sync (#6208)
* [easy] allow empty tensor in cuda relu op

The diff has not enabled the empty-tensor unit test, because the MKL version of ReluOp needs extra work to support it

* Make blob norm plotting work with distributed trainer when the old framework is used
2018-04-02 16:35:27 -07:00
Orion Reblitz-Richardson
a409f959e8
Remove ShuffleNet from model zoo. (#6203)
* No longer supported.
2018-04-02 15:00:06 -07:00
Orion Reblitz-Richardson
cbe92abd7c Disable failing test_lengths_max_gpu 2018-03-30 21:00:45 -07:00
Ellie Wen
3d27095eec [easy] fix comments
nit: fix comments
2018-03-30 21:00:44 -07:00
Qinqing Zheng
365652229d Back out "Revert D7372460: [DT] [28/n] Lift epoch_limiter"
Original commit changeset: b0a986d16c3b
2018-03-30 21:00:44 -07:00
Andrey Malevich
b9d2ba1dbf Revert D7394363: [GanH]: Log D Trick for Cross Entropy with Sigmoid
This reverts commit d63266ccbc0c1390c58c2a71ae0b562fdec2fbc0

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
2018-03-30 21:00:44 -07:00
Ellie Wen
363a227d19 extend bucketize op to support duplicated boundaries
upgrade bucketize op to support duplicated boundaries
2018-03-30 21:00:44 -07:00
Jason Gauci
551d5fbf9a CUDA version of LengthsMax operator
CUDA version of LengthsMax operator

@override-unit-failures
2018-03-30 21:00:44 -07:00
Andrew Tulloch
0df662c67f [Caffe2] [Int8] More exhaustive unit tests for int8 ops (+ bug fix in Int8Add in-place case)
As title. This catches one bug in the Int8Add in-place case,
which wasn't tested in int8_test.cc
2018-03-30 21:00:44 -07:00
Xiaolong Wang
2b0e39f569 [GanH]: Log D Trick for Cross Entropy with Sigmoid
as titled
2018-03-30 21:00:44 -07:00
Andrey Malevich
f8eb8a66e2 Revert D7372460: [DT] [28/n] Lift epoch_limiter
This reverts commit 05bd9bec10fad5ff9dc40be88836fd7274d50ce9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
2018-03-30 21:00:44 -07:00
Bram Wasti
ee64200c64 [nomnigraph] Expose transformations to python
Adding a python interface to the transformations
2018-03-30 21:00:44 -07:00
Yiming Wu
03c5198331 [C2 Int8][C2 Core]fetch int8 blob
Provides a Python API to fetch Int8 tensors.

  data, scale, zero_point = workspace.FetchInt8Blob(blob_name)

now returns a tuple if the blob contains an Int8TensorCPU

     'data' = int8 data array
     'scale' = fake quantization scale
     'zero_point' = fake quantization offset

Although FetchBlob shares its back-end implementation with FetchInt8Blob, we raise an
error to prevent unexpected behavior from the same method
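For reference, a small NumPy sketch of the (data, scale, zero_point) convention behind the returned tuple (illustrative only; the uint8 range is an assumption, and this is not the FetchInt8Blob implementation):

```python
# Illustrative sketch of fake quantization: real_value ~= (q - zero_point) * scale.
# The [0, 255] range is an assumption for this example, not the op definition.
import numpy as np

def quantize(x, scale, zero_point):
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.rand(4).astype(np.float32)
q = quantize(x, scale=1.0 / 255, zero_point=0)
print(q, dequantize(q, 1.0 / 255, 0))  # roundtrip is approximate
```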
2018-03-30 21:00:44 -07:00
Lu Fang
8f3ba30266 Fix a typo
Fix a typo in optimize_onnx_test.py
2018-03-30 21:00:44 -07:00
James Reed
47a1fd208f Quick and dirty raw value substitution from zip file (#2454) 2018-03-29 19:18:58 -07:00
Lu Fang
344fa57680 Adjust the test since the op only has a CPU implementation 2018-03-27 18:10:39 -07:00
Lu Fang
0ac8495165 Fix the CMake issues caused by internal changes 2018-03-27 18:10:39 -07:00
Xiaolong Wang
af3dcdf6ae [D2]: Improve loss weight by allowing omitted weights
as titled
2018-03-27 18:10:39 -07:00
Xiaolong Wang
d6c30ee6af [GanH]: Unifying two discriminators
to improve flexibility and combine different discriminators in one model.
2018-03-27 18:10:39 -07:00
Jongsoo Park
3300e21d52 Add SparseLengthsPositionalWeightedSum operator that fuses SparseLengthsWeightedSum, LengthsRangeFill, and Gather
add SparseLengthsPositionalWeightedSum operator that fuses SparseLengthsWeightedSum, LengthsRangeFill, and Gather
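A hedged NumPy reference of the unfused composition this operator replaces (positional weights looked up by position within each segment, then a weighted sum over gathered rows); the semantics are inferred from the op names, not from the operator schema:

```python
# Hedged reference of the LengthsRangeFill -> Gather -> SparseLengthsWeightedSum
# pipeline; details may differ from the actual fused operator.
import numpy as np

def sparse_lengths_positional_weighted_sum(data, pos_weights, indices, lengths):
    out = np.zeros((len(lengths), data.shape[1]), dtype=data.dtype)
    offset = 0
    for i, seg_len in enumerate(lengths):
        for pos in range(seg_len):                # LengthsRangeFill: 0..seg_len-1
            w = pos_weights[pos]                  # Gather: weight by position
            out[i] += w * data[indices[offset + pos]]  # weighted segment sum
        offset += seg_len
    return out

data = np.random.randn(10, 4).astype(np.float32)
pos_w = np.array([1.0, 0.5, 0.25], dtype=np.float32)
print(sparse_lengths_positional_weighted_sum(
    data, pos_w, indices=np.array([0, 3, 7, 2, 5]), lengths=np.array([3, 2])))
```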
2018-03-27 18:10:39 -07:00
Xianjie Chen
e6b04ba121 fix lengths sum cuda op for empty batch
CUDA does not allow launching an empty kernel
2018-03-27 18:10:39 -07:00
Xianjie Chen
6ed9a0c3f2 fix cuda elementwise ops for empty batch
CUDA will fail to launch an empty kernel
2018-03-27 18:10:39 -07:00
Dehua Cheng
c6587597d8 Ignore backward step when there is no loss function;
Ignore backward step when there is no loss function;

For some customized models, we can encode the update directly in the forward step, so there is no backward step.
2018-03-27 18:10:39 -07:00
Xiaolong Wang
c909abd85f [GanH] Label Smooth: Add Layer and Integrate to SparseNN
as titled
2018-03-27 18:10:39 -07:00
Yan Zhu
107cb670b1 add typecast and assertion for histogram computing
as title
2018-03-27 18:10:39 -07:00
Xianjie Chen
078b6d5ad1 [layer model] remove duplicated init ops
It saves some model init time and reduces confusion.
2018-03-27 18:10:39 -07:00
Roxie He
d2453afb1e Add SumElementsInt operator
Added a Caffe2 math sum operator that takes integers (only int32).
Changed SumFloatIter to SumGenericIter so that it supports more than one type.
Added a SumElementsInt operator
2018-03-27 18:10:39 -07:00
James Cross
16312e8123 [fbtranslate/onnx] decoder step (pytorch -> caffe2) exporter for fbtranslate
This code introduces a new class for exporting decoder step (ensemble) models trained with fbtranslate pytorch to Caffe2 models via ONNX, for the purpose of use in "component beam search" being developed concurrently in C++ by @juancarabina.
2018-03-27 18:10:39 -07:00
Manoj Krishnan
a92a6233b5 Enable support for placeholder ops in InjectCrossDeviceCopies
This is required to support placeholder/decorator ops which do not have an operator schema. Note that the change is made in such a way that it is a no-op if placeholder ops are not used.

Changes:
1. Since the placeholder ops always run on CPU, added a utility to infer the blob devices of placeholder ops.
2. Placeholder ops' input/output blobs should be on CPU as well. This change takes care of output blobs, i.e. it uses blobs on CPU.
3. Added a Unit test - test_inject_copy_placeholder_ops
2018-03-27 18:10:39 -07:00
Jiyan Yang
8fa38f8dce Add gradient clipping (#2452)
As titled.
2018-03-27 15:10:15 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Jason Gauci
f93e820e7d Revert "[C2][GPU]LengthsMax CUDA version (#2209)" (#2444)
This reverts commit 71acc269bb573c8c04343e6d534b2557a456b29a.
2018-03-27 01:15:52 -07:00
harouwu
6740126f5c [C2][GPU]LengthsMax CUDA version (#2209)
LengthsMax CUDA version.

Will provide the gradient later.
2018-03-27 00:19:17 -07:00
Kutta Srinivasan
0e0918cb9a dpm synchronize 2018-03-26 19:54:31 -07:00