Commit Graph

28 Commits

Author SHA1 Message Date
Xuehai Pan
8d45f555d7 [BE] [1/3] Rewrite super() calls in caffe2 and benchmarks (#94587)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```
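
For reference, the core rewrite converts explicit two-argument `super()` calls to the zero-argument form; a minimal illustration (hypothetical module, not a diff from this PR):

```diff
class MyModule(nn.Module):
    def __init__(self):
-       super(MyModule, self).__init__()
+       super().__init__()
        self.linear = nn.Linear(8, 8)
```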

Cases where the rewrite would change the semantics are kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94587
Approved by: https://github.com/ezyang
2023-02-11 18:19:48 +00:00
Bugra Akyildiz
27c7158166 Remove __future__ imports for legacy Python2 supports (#45033)
Summary:
There is a tool called `2to3` whose `future` fixer can be targeted specifically to remove these; the `caffe2` directory has the most redundant imports:

```
2to3 -f future -w caffe2
```
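
For reference, the `future` fixer simply deletes the legacy imports; a typical change looks like this (illustrative file, not taken from this PR):

```diff
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-from __future__ import unicode_literals
-
 import numpy as np
```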

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033

Reviewed By: seemethere

Differential Revision: D23808648

Pulled By: bugra

fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
2020-09-23 17:57:02 -07:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
Swati Rallapalli
e63addfff6 Exponential decay of the weight of task loss (#27508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27508

Implemented a simple exponential decay of the weight of the lr loss function, with a lower bound.
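
A minimal sketch of the schedule described here, assuming a multiplicative decay per iteration clamped at a lower bound (names are illustrative, not the dper API):

```python
def task_loss_weight(initial_weight, decay_rate, iteration, min_weight):
    """Exponentially decay the task-loss weight, clamped at a lower bound."""
    return max(min_weight, initial_weight * decay_rate ** iteration)

# e.g. task_loss_weight(1.0, 0.99, 100, 0.1) -> max(0.1, 0.99**100) ~= 0.366
```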

Test Plan:
buck test //caffe2/caffe2/fb/dper/layer_models/tests:mtml_test -- test_task_weight_decay
https://our.intern.facebook.com/intern/testinfra/testrun/3377699729136308

canary: f140103452

Reviewed By: chenshouyuan

Differential Revision: D17524101

fbshipit-source-id: 9a653e21a4ecb74dfc4ac949c9e3388f36ef3a20
2019-10-08 09:15:41 -07:00
Yu Shi
43a2fd0e24 Support focal loss in MTML
Summary:
[Not in need of review at this time]
Support focal loss in MTML (effectively dper2 in general) as described in https://arxiv.org/pdf/1708.02002.pdf. Adopt an approach similar to Yuchen He's WIP diff D14008545.
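
For reference, the paper's focal loss scales the cross entropy by a modulating factor (1 - p_t)^gamma; a rough sketch of the LR-loss-based variant (illustrative NumPy, not the dper layer):

```python
import numpy as np

def lr_loss_based_focal_loss(logits, labels, gamma=2.0):
    """Cross entropy scaled by (1 - p_t)^gamma, where p_t is the probability
    assigned to the true class (labels in {0, 1})."""
    p = 1.0 / (1.0 + np.exp(-logits))
    p_t = np.where(labels == 1, p, 1.0 - p)
    ce = -np.log(np.clip(p_t, 1e-12, 1.0))
    return (1.0 - p_t) ** gamma * ce
```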

Test Plan:
Passed the following unit tests
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_lr_loss_based_focal_loss
buck test //caffe2/caffe2/fb/dper/layer_models/tests:mtml_test_2 -- test_mtml_with_lr_loss_based_focal_loss
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_lr_loss_based_focal_loss_with_stop_grad_in_focal_factor

Passed ./fblearner/flow/projects/dper/canary.sh; URL to track workflow runs: https://fburl.com/fblearner/446ix5q6

Model based on V10 of this diff
f133367092
Baseline model
f133297603

Protobuf of train_net_1 https://our.intern.facebook.com/intern/everpaste/?color=0&handle=GEq30QIFW_7HJJoCAAAAAABMgz4Jbr0LAAAz

Reviewed By: hychyc90, ellie-wen

Differential Revision: D16795972

fbshipit-source-id: 7bacae3e2255293d337951c896e9104208235f33
2019-08-25 01:42:25 -07:00
Xing Wang
b6f130aa70 try to enable uncertainty for lr loss (#17236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17236

Following the paper at https://papers.nips.cc/paper/7141-what-uncertainties-do-we-need-in-bayesian-deep-learning-for-computer-vision.pdf, approximate the classification case with the regression formulation. For the LRLoss, add a penalty based on the variance and a regularization term on the variance, with a tunable parameter lambda.
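
A rough sketch of that idea, where the model also predicts a log-variance that discounts the LR loss and is itself penalized with a tunable lambda (illustrative only, not the actual dper layer):

```python
import numpy as np

def uncertainty_weighted_lr_loss(lr_loss, log_variance, lam=1.0):
    """Discount the per-example LR loss by the predicted variance and
    regularize large variances with weight lam."""
    return np.exp(-log_variance) * lr_loss + lam * log_variance
```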

Reviewed By: chocjy

Differential Revision: D14077106

fbshipit-source-id: 4405d8995cebdc7275a0dd07857d32a8915d78ef
2019-04-11 07:35:19 -07:00
Lu Fang
664fe34e0a
[Caffe2][fbcode=>GH sync] Update from facebook 4323b18ce13c (#7116)
* [fix] Re-enable events in RNN ops

We earlier added event disabling in RNN ops because back then we didn't use
events; with current use cases this is no longer true
(https://fburl.com/8vd0lp8y)

* use ops with CUDA impl

* Revert D7729695: [caffe2][fix] Re-enable events in RNN ops

This reverts commit 4b215c7496fb724656ff4c776933a15bdbbcde5e

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [observer] Clean up observer_config.h

#accept2ship

* [1/n] Refactor dataio_test.py

Replace code duplication with a common function

* Add barrier net that runs before training nets

Add a synchronize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before starting training. This reduces the chance of the faster shards timing out during Gloo AllReduce.

Removed the explicit data_parallel_model.py.synchronize call in the holmes workflow. A similar change in the speech/asr_training workflow will come in another diff.

* Support the dnnlowp backend in caffe2_benchmark

This is for SHARE operator latency evaluation

* Migrate integral_image_op to main caffe2

Migrate integral_image_op (GPU version) given by https://fburl.com/yvqezigi
to caffe2/caffe2/operators and implement its CPU version. Write up a test
using the hypothesis_test mechanism.

* [pos_disc, fbcode] Implement unjoined lr loss

As explained in https://our.intern.facebook.com/intern/wiki/Model_Based_Calibration/, when the dataset is a joined dataset, where labels might change later, we need to use unjoined logloss.

The implementation is almost the same as in Sigrid (https://fburl.com/1trngsls), where
    loss = y (log(p) - log(1-p)) + (1-y)(log(1-p)) = xy - (1-y)x - (1-y)log(1+exp(-x))

For x < 0, to ensure stability and avoid overflow, we reformulate the above expression as
    loss = xy - (1-y)x + (1-y)x - (1-y)log(1+exp(x)) = xy - (1-y)log(1+exp(x))

Then the final expression becomes
    loss = xy + (y - 1) x (x >= 0) - (1 - y) log(1 + exp(x - 2 x (x >= 0)))

where y is the true label, x is the dot product and p = logistic(x).

This kind of implementation is aligned with the current implementation of the original cross entropy in
https://phabricator.intern.facebook.com/diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/cross_entropy_op.cc;0bae3b5d0f825897c5e0dd0ff10f489d7271bf25$7-13
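
A small sketch of the final stable expression above (illustrative NumPy, not the Caffe2 op):

```python
import numpy as np

def unjoined_lr_loss(x, y):
    """loss = xy + (y - 1) x (x >= 0) - (1 - y) log(1 + exp(x - 2 x (x >= 0)))"""
    pos = (x >= 0).astype(x.dtype)
    return x * y + (y - 1) * x * pos - (1 - y) * np.log1p(np.exp(x - 2 * x * pos))
```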

* Keep the array to fix the conflict

* [C2] Compute Adagrad effective LR

The AdagradWithLR op outputs an extra blob which contains the average effective learning rate across all weights in this blob.
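
For context, a rough sketch of what "effective learning rate" means here, assuming the usual Adagrad scaling (illustrative, not the AdagradWithLR op's code):

```python
import numpy as np

def average_effective_lr(base_lr, grad_sq_sum, epsilon=1e-5):
    """Average of base_lr / (sqrt(accumulated squared gradient) + epsilon) over a blob."""
    return float(np.mean(base_lr / (np.sqrt(grad_sq_sum) + epsilon)))
```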

* Open-source extractMetaNetDef & runGlobalInitialization, add new Predictor constructor from db file, and add run_map_outputs

1. Open-source extractMetaNetDef and runGlobalInitialization, for use in
2. the new Predictor constructor from a db file.
3. Add a new run function that returns outputs as a TensorMap.

* Disable eigen cpu

Disable eigen cpu in transpose and reduce

* Introduce request_only/object_only property of ModelLayer

by default this is False

* A simple TC Caffe2 benchmark

We can run the tuner, get MappingOptions and then use them to
compare against cuBLAS.

Currently broken due to LLVM issues. How to run:

hg checkout eec1ab31b59c03b8deded1c755a9abaf8c45be01
add D7401202
add D7434625
add D7506031
add D7540728

buck run @mode/dev-nosan tc/tc/benchmarks_python:caffe2_benchmark

* Move Caffe2 feature_maps_ops to open source

Need feature maps operators in open source project facebookresearch/BlueWhale

* Manually fix the conflicts in channel shuffle op

* Fix the inconsistency between different gh and fbcode

* Skip Adagrad GPU test (because some GPU implementation is missing)

* Fix another test to make sure it won't run on GPU when the implementation is not available yet
2018-05-01 20:49:00 -07:00
Yinghai Lu
ef8f556212
[Caffe2] Changes done inside Facebook (#6378)
* fix unit test for sqrt op

From the error logging:

[idx, grad, grad_estimate] are:
[[ 146.            0.5           0.45776367]
 [ 147.            0.5           0.45776367]

The gradient == 0.5 is correct, which means the SqrtOp and its gradient are doing the right job. (Because y = sqrt(x), loss = y^2/2 = x/2, and then d(loss)/dx = 1/2 = 0.5.)

The test failed because of a numerical problem in grad_estimate (in the unit test). This can happen because the step_size is small and float precision is not high (when there are multiple elements in the tensor, we do sum(y^2) to compute the loss).
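
A quick illustration of the finite-difference check described above (illustrative NumPy, not the actual hypothesis test):

```python
import numpy as np

def numeric_grad(x, i, step_size=1e-2):
    """Central-difference estimate of d(sum(sqrt(x)**2)/2)/dx_i (analytically 0.5)."""
    def loss(v):
        return np.sum(np.sqrt(v) ** 2) / 2.0
    xp, xm = x.copy(), x.copy()
    xp[i] += step_size
    xm[i] -= step_size
    return (loss(xp) - loss(xm)) / (2.0 * step_size)

# In float32, summing y^2 over many elements makes this estimate drift from 0.5,
# which is why the diff increases the step size and moves cases away from 0.
```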

This diff
- increase the step size, and also move the test cases to be further away from 0 (where sqrt(x) is not well defined) to be safe :)
- also clean up, and merge the test cases for inplace vs. non-inplace

Tested with:

`CAFFE2_HYPOTHESIS_PROFILE=debug ai_bt caffe2/caffe2/python/operator_test:elementwise_ops_test -- "test_sqrt"`

* CompositeReader & CompositeReaderBuilder

A new type of reader gluing multiple readers together.

* Back out "Revert D7394363: [GanH]: Log D Trick for Cross Entropy with Sigmoid"

Original commit changeset: 9325a4356dbe

* [dai][WIP] convert params to int8 on ps before sending to trainer

Add float->uint8 conversion in addition to float->fp16 conversion in model_saver.

* [easy] improve unit test for sparse length sum ops

as desc.

#accept2ship

* Update GitHub upstream to 771fcb3455

* move sparse hash unique ops to OOS and add unit tests

- move the SparseHash version to OOS, since 'sparsehash' is already a dependency of caffe2 OOS: https://fburl.com/arssw4n1
- The 'SparseHash' engine is also being used in OOS, so the SparseHash version should be in OOS to reduce confusion: https://fburl.com/o5ea7ah2

- fix the CUDA UniqueOp for the case when batch is empty.
- add unit test

* group_norm_op for caffe2

This is the CUDA op for Group Normalization (GN): https://arxiv.org/abs/1803.08494

This code implements GN in one op that computes Y = gamma * (X - mu) / sigma + beta and also its gradients. It is expected to have minimal memory consumption (similar to the BN op), without creating the extra blobs that would be needed if GN were implemented as several ops (e.g., reshape, norm_mean/std, affine_channel).
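
A compact sketch of the forward computation with per-group statistics (illustrative NumPy, NCHW layout assumed):

```python
import numpy as np

def group_norm(x, gamma, beta, num_groups, eps=1e-5):
    """x: (N, C, H, W); gamma, beta: (C,). Y = gamma * (X - mu) / sigma + beta per group."""
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mu = g.mean(axis=(2, 3, 4), keepdims=True)
    sigma = np.sqrt(g.var(axis=(2, 3, 4), keepdims=True) + eps)
    y = ((g - mu) / sigma).reshape(n, c, h, w)
    return gamma.reshape(1, c, 1, 1) * y + beta.reshape(1, c, 1, 1)
```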

* Resubmit D7405233: disappeared in D7464958

The OOS publish caused the op to go missing -- however, the test was still there.

* [c2] add sparse hash engine for cuda unique op

The SparseHash version of UniqueOp copies the input tensor to CPU, makes use of a sparse hash map to get the unique output, and then copies back to GPU.

* [dper][gpu] enable unit testing gpu trainer for sparse nn

To debug the GPU trainer using mock data in unit tests.

This makes it easier to develop the GPU trainer for new models.

* Reuse Gloo context for Synchronize() calls

Previously we were creating (and leaking) the Gloo context on each call to Synchronize(). Now only run the common world op and create the barrier net once, then run the barrier net on each Synchronize() call. Since timeout is associated with the Gloo context, assert that the timeout is fixed instead of trying to handle the complexity of multiple timeouts (and associated contexts).

* [GanH/WGAN][1/n]: add FC param clipping

as titled

* [mobile] minimizing changes between caffe2_benchmark and speed_benchmark

* [GanH]: enable diagnose within model

avoid finding blob names but to directly enable inside the model

* Add `net_transformer_fun` option to DPM

This callback allows for various transformations to be made to the
model after gradient operators have been added. The immediate motivation for
this is to allow transformations such as "checkpoint-and-recompute" which
allow trading off memory for additional compute.

Adding several callbacks like this has made DPM's API less than ideal at this
stage. However, I could not find any reasonable alternative.

* [DT] [33/n] Compile flow task groups

Task groups need to be compiled in order to pickle the object in fblearner. However, I also changed the Job's compile function, as creating a new object is not necessary.

* Initial commit for sparse_normalize vectorization and benchmark

* [GanH]: LB Calibration for JSD

as titled

* Tracing event in async executor

Adding event tracing through TRACE_EVENT macro in async executor

* [Resubmit] D7409751 Reseting book-keeping blobs when the reservoir is reset

D7409751 got lost in D7464958

* Visualizing realtime weights values

We want to visualize the weight values as the optimizer iterates. This diff supports visualizing the weights at an assigned index.
Currently, we assume the blob to be 2-dimensional.

* [GanH][Easy]: Fix Homotopy Weighting

Apparently, there was a bug in the homotopy weight (alpha, beta) update.

* [c2] move sparse hash unique op out of oss

so that OSS does not need to depend on the Google hash map.

* Get rid of std::round as it's not supported on Android

* Revert changes on setup.py

* Skip shaky test on Dataio

* fix
2018-04-10 21:11:43 -07:00
Andrey Malevich
b9d2ba1dbf Revert D7394363: [GanH]: Log D Trick for Cross Entropy with Sigmoid
This reverts commit d63266ccbc0c1390c58c2a71ae0b562fdec2fbc0

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
2018-03-30 21:00:44 -07:00
Xiaolong Wang
2b0e39f569 [GanH]: Log D Trick for Cross Entropy with Sigmoid
as titled
2018-03-30 21:00:44 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Xiaolong Wang
a5279dccd4 [GanH]: homotopy JSD
as titled
2018-03-20 13:34:22 -07:00
sf-wind
602a09dde7 Update caffe2 from facebook 4f527ef46abf (#2234)
* [GanH]: two_task_discriminator

as titled

and adding label smooth

* [Dper2] Simplified UI options needed for blob magnitude visualization

* [GanH]: fix tags

as titled

* Added type and shape inference for GatherRange operator

This helps with type / shape inference when using this operator in layers.
Also just a nice to have in general.

* Demonstrate Caffe2 exception handling with StoreHandlerTimeoutError in Python

We'd like to catch and recover from certain Caffe2 net exceptions. Use this diff to demonstrate a pattern of registering a pybind exception mapping and catching it in Python using caffe2::StoreHandlerTimeoutException.

* Bind Gloo IoException to IoError in Python

Allow peer failure handling and recovery using an exception based mechanism. This diff registers gloo::IoException with pybind.

* [GanH]: add label smoothing to softmax with loss

as titled

* [C2] Enable LARS in Adagrad and hook it to DPER

* [DPER] Don't pass LayerModelHelper in create_trainer_nodes

Since we're planning to get rid of it eventually and I want to get access to a
NetDef-only interface ASAP, I'm looking toward removing all references to
LMH where we don't really need them.

* fix bugs in LambdaRankNdcgOp

The loss and gradient in LambdaRankNdcgOp are incorrect. The loss should be the negative log of the probs instead of the log.

* Restrict thread pool on iOS to only big cores

Historically, iPhones exposed only one type of core, and the Caffe2 thread pool used all of them.
However, the iPhone 8/iPhone X expose 2 big + 4 LITTLE cores. As our thread pool doesn't support work stealing or other forms of load balancing, the fast cores end up waiting for the slow ones, and it may be better to restrict execution to only the 2 fast cores, like we do on Android.

* Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine

Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine

* make clang happy and get fewer warnings

make clang happy and get fewer warnings

* [Personalization] Support add_output_schema() in layer_model_helper

Problem:
Currently the output_schema of sparse_nn can only be set once. https://fburl.com/efth5zer.

Solution:
For flexibility, we want to add fields to output_schema incrementally.

Plan:
Wrap the change of `model._output_schema` into a new function `add_output_schema()` for adding additional output_schema.

Callsite:
The add_output_schema() should be called instead at https://fburl.com/efth5zer

Reference:
The newly added `add_output_schema()` will be similar to `add_loss()` in https://fburl.com/t2ii8njh
2018-03-12 12:22:59 -07:00
Dmytro Dzhulgakov
968ebb3b82 [GanH]fuse jsd with lr loss/xent
as titled
2018-03-06 00:33:11 -08:00
Tiangao Gou
bc50510016 use numerically stable version of BatchLRLoss
Summary: Change all use cases of BatchLRLoss to the numerically stable version. This includes the uses of the function build_loss defined in fbcode/caffe2/caffe2/fb/dper/layer_models/loss.py and the class BatchLRLoss defined in fbcode/caffe2/caffe2/python/layers/batch_lr_loss.py.

Reviewed By: xianjiec

Differential Revision: D6643074

fbshipit-source-id: b5678556b03cbdd380cab8a875974a87c33d7f12
2018-01-02 13:18:36 -08:00
Xue Feng
0c588a500b Replace sigmoid + xent loss with SigmoidCrossEntropyWithLogits for better numerical stability
Summary: Replaced sigmoid + xent loss with SigmoidCrossEntropyWithLogits. The combined op computes the logistic loss of the sigmoid of its inputs. It's conceptually identical to a sigmoid layer followed by a logistic loss layer, but provides a more numerically stable gradient.
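
A sketch of the standard numerically stable formulation such an op typically uses, which avoids exp overflow for large-magnitude logits (illustrative NumPy, not the actual Caffe2 kernel):

```python
import numpy as np

def sigmoid_cross_entropy_with_logits(x, y):
    """Stable elementwise loss: max(x, 0) - x*y + log(1 + exp(-|x|))."""
    return np.maximum(x, 0) - x * y + np.log1p(np.exp(-np.abs(x)))
```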

Reviewed By: xianjiec

Differential Revision: D6305455

fbshipit-source-id: 444c9f651fbdf13c3c52be5142769f8f98ed8770
2017-11-30 14:04:36 -08:00
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Jiyan Yang
a8695178aa Adding parameter sharing API to Dper2
Summary:
To achieve this, I modified the blob name scheme defined in a layer.
Before, it was scope/fc_w and scope/fc_w_auto_0 (if there is another fc
within the same scope).
Now I changed it to scope/fc/w and scope/fc_auto_0/w.
That is, we rely on the uniqueness of the scoped layer name to define
names for blobs.

I also overrode the create_param method in LayerModelHelper to let it
use the resolved name for blobs given the parameter-sharing context.

There are some details such as making the initializer more structured
that I need to finalize.

Reviewed By: kennyhorror

Differential Revision: D5435132

fbshipit-source-id: a0525f5ea0977e255dd5ea765b38913f5951d455
2017-08-03 00:33:18 -07:00
Fedor Borisyuk
686470a6b8 Feature importance in dper 2.0: build network representation
Summary: Changes to enable feature importance.

Reviewed By: kennyhorror

Differential Revision: D5075252

fbshipit-source-id: e5d46e129bcd5cbef77932c63b5a288dd57775d1
2017-06-05 18:03:34 -07:00
Huazhong Ning
e394b60a9c Support un-equal weight training for mtml models
Reviewed By: queqichao

Differential Revision: D5047939

fbshipit-source-id: 857d0d77e0413939e5774fa37d21b92a00d34bf0
2017-05-15 12:56:11 -07:00
Jiyan Yang
a458aa4b2a Fix tags to be based on EXCLUDE_FROM_{CONTEXT}
Summary: Cleaning up the tagging system. Introducing tags EXCLUDE_FROM_{CONTEXT}.

Reviewed By: kennyhorror

Differential Revision: D4974842

fbshipit-source-id: b0fa6772299bb70afa2192c39e45191c9f41336a
2017-05-02 09:32:27 -07:00
Jiyan Yang
795dc1c326 Remove loss ops from eval net
Summary: Current eval nets contain loss operators (see example: https://fburl.com/6otbe0n7), which is unnecessary. This diff removes them from the eval net.

Differential Revision: D4934589

fbshipit-source-id: 1ba96c20a3a7ef720414acb4124002fb54cabfc7
2017-04-26 12:46:25 -07:00
Xianjie Chen
9fc56793dd fix trunk for push and small cleanup
Summary:
multiple places broken, blocking the push :(

- fix the weighted training for ads and feeds
- fix the publishing if no exporter model is selected
- fix the feeds retrieval evaluation
- added the default config for retrieval workflows. plan to use for flow test (in next diff)
- clean up unused code
- smaller hash size for faster canary test

Reviewed By: chocjy

Differential Revision: D4817829

fbshipit-source-id: e3d407314268b6487c22b1ee91f158532dda8807
2017-04-02 23:35:49 -07:00
Aaron Markham
58f7f2b441 doxygen python block added
Summary: Closes https://github.com/caffe2/caffe2/pull/226

Differential Revision: D4793550

Pulled By: JoelMarcey

fbshipit-source-id: cc33e58186304fa8dcac2ee9115dcc271d785b1e
2017-03-29 06:46:16 -07:00
Huazhong Ning
ad4ae4528f migrate mtml to dper2
Summary:
1. migrate the basic mtml model to dper 2
2. test dper 2 mtml model
3. test all optimizers

Reviewed By: kittipatv

Differential Revision: D4680215

fbshipit-source-id: 7aac5c59bdac22fcad8ed869b98e9e62dca1d337
2017-03-16 17:48:05 -07:00
Xianjie Chen
d0621a2449 NextScopedBlob with well-defined behavior and respect namescope
Summary:
Remove the use of `NextName` in the layer model helper, so that the same function returns a `model_helper` that should construct an identical `Net` when under the same NameScope.

The `NextScopedBlob` should only take effect when there is a real name conflict; otherwise it returns a ScopedBlobReference.

This is critical for parameter blobs. In the long run, we need to be able to specify parameter blobs more explicitly (kennyhorror is working on this). This solution works in the short term for, e.g., two-tower sparse nn models.

Reviewed By: kennyhorror

Differential Revision: D4555423

fbshipit-source-id: 2c4b99a61392e5d51aa878f7346466a8f14be187
2017-02-16 17:16:36 -08:00
Yangqing Jia
589398950f fbsync at f5a877 2016-11-18 15:41:06 -08:00
Yangqing Jia
238ceab825 fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00