Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28620
All Tensors are Variables now; they just happen to have `requires_grad=False`. Tensors ALWAYS have `VariableTensorId` in their type set.
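For illustration, a minimal sketch of the user-visible invariant (plain PyTorch, nothing new):
```
import torch

t = torch.ones(2)                # a "plain" tensor is a Variable too
assert t.requires_grad is False  # it just defaults to requires_grad=False
t.requires_grad_()               # so autograd methods work on any tensor
assert t.requires_grad
```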
When constructing this patch, I had to make decisions about what I would fix in this patch, and what I would leave for follow up PRs. Here is the cleanup that happens in this patch:
- The `is_variable` property is removed from TensorOptions. I removed this immediately because, unlike Tensor::is_variable, TensorOptions::is_variable doesn't respect our VariableTensorId thread-local state. This means there were a bunch of places where TensorOptions::is_variable was false, which is obviously bogus in a world where tensor and variable are merged. Instead of keeping the method as a function that always returns true, I opted to remove it entirely (it's not public API). All places where we set `is_variable` are deleted.
- Knock-on effect: there is no longer a separate DeprecatedTypeProperties for the variable and non-variable versions of a type.
- Knock-on effect: instead of asserting on TensorOptions::is_variable, we just test `at::impl::variable_is_excluded()`
- There is now only one copy of the cuDNN RNN dropout cache, not two (I'm not sure why we had two to begin with)
Some cleanup that doesn't happen in this patch:
- Eliminating unnecessary uses of `make_variable`
- Eliminating `Tensor::is_variable`
The most subtle part of this patch is retaining tracing behavior: the fact that everything is a Variable means that more code gets routed to VariableType than before; this can change traces. I identified two places where we didn't appropriately turn off VariableType, mostly factory functions:
- `torch.tensor` must turn off VariableType before invoking `at::empty` to construct the tensor, as it subsequently does direct data access
- `tensor_slow` (invoked when you pass a Python scalar to a tensor argument) must turn off VariableType before calling `scalar_to_tensor`, so the scalar gets traced as a constant rather than as a call to `scalar_to_tensor`.
Honestly, these are all giant hacks, and should be replaced with a more specialized guard that just toggles tracing.
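As a hedged illustration of the tracing behavior being preserved: a Python scalar passed to an op should appear in the trace as a baked-in constant, not as a tensor-producing call.
```
import torch

def f(x):
    return x + 2.5  # Python scalar argument

traced = torch.jit.trace(f, torch.ones(2))
# the 2.5 should show up as a prim::Constant in the graph,
# not as a call that constructs a tensor at trace time
print(traced.graph)
```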
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: dreiss
Differential Revision: D18171156
Pulled By: ezyang
fbshipit-source-id: 5b6a045beba37492647e350190f495114e86504d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29045
Addressing an issue seen in GitHub https://github.com/pytorch/pytorch/issues/28958
It seems the workers in this test sometimes don't stop cleanly. The purpose of this test is to check that the init_fun in init_workers works as expected, which is captured by the assertEqual in the for loop in the test. The behavior of stop() is not really important here.
The fact that it's returning false is probably indicative that a worker is getting blocked, but that doesn't affect the correctness of the test.
Test Plan: Ran the test 100 times; it consistently succeeds.
Reviewed By: akyrola
Differential Revision: D18273064
fbshipit-source-id: 5fdff8cf80ec7ba04acf4666a3116e081d96ffec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28717
Make HasElements support multiple inputs: if any input has elements, return true.
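A hedged sketch of the new semantics (blob names illustrative):
```
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("empty", np.array([], dtype=np.float32))
workspace.FeedBlob("nonempty", np.array([1.0], dtype=np.float32))

# with multi-input support, HasElements is true if ANY input has elements
op = core.CreateOperator("HasElements", ["empty", "nonempty"], ["has_elements"])
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("has_elements"))  # expect True
```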
Test Plan: to be added
Reviewed By: BIT-silence
Differential Revision: D17972759
fbshipit-source-id: 3ecdea74a30fcfaaa6490fef1debc6cde68db922
Summary:
This PR makes Caffe2 compatible with TensorRT 6. To make sure it works well, a new unit test is added. This test checks the PyTorch->ONNX->TRT6 inference flow for all classification models from the TorchVision Zoo.
Note on CMake changes: they have to be done in order to import the onnx-tensorrt project. See https://github.com/pytorch/pytorch/issues/18524 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26426
Reviewed By: hl475
Differential Revision: D17495965
Pulled By: houseroad
fbshipit-source-id: 3e8dbe8943f5a28a51368fd5686c8d6e86e7f693
Summary:
Codemod to remove all thread.isAlive() calls, since isAlive() throws a deprecation warning that is breaking some tests that monitor the output of their CLIs.
is_alive() was added in Python 2.6, so this is super safe.
This is a codemod; I don't care whether the code supports Python 3, just that it's Python code.
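For reference, the mechanical change (is_alive() has existed since Python 2.6; isAlive() is the alias that now warns):
```
import threading

t = threading.Thread(target=lambda: None)
t.start()
alive = t.is_alive()   # after: no warning
# alive = t.isAlive()  # before: deprecated alias, emits the warning
t.join()
```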
Test Plan: unittests
Reviewed By: cooperlees
Differential Revision: D18069520
fbshipit-source-id: 4ca4dcb541c0b0debeb194aba5d060152ad0ef0e
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28327
Test Plan:
Failed as expected and the full protobuf is logged
f145060005
Reviewed By: ffjiang, wx1988
Differential Revision: D17975560
fbshipit-source-id: 5375acffc1f9dede16622b06eb58b6c3a26ebe5a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28265
Fix the difference between dper3 and dper2 when regressionLoss is used.
Test Plan:
test using dper2 model id f134632386
Comparison tool output before change:
```
FOUND OP DIFFERENT WITH DPER2!!!
OP is of type ExpandDims
OP inputs ['supervision:label']
OP outputs ['sparse_nn/regression_loss/mean_squared_error_loss/ExpandDims:0']
===============================
Finished all dper3 ops, number of good ops 11, bad ops 1, skipped 26
run_comparison for dper2 / dper3 nets running time: 0.0020143985748291016
result type: <class 'NoneType'> result: None
```
After change:
```
FOUND OP DIFFERENT WITH DPER2!!!
OP is of type ExpandDims
OP inputs ['sparse_nn_2/regression_loss_2/mean_squared_error_loss_8/Squeeze:0_grad']
OP outputs ['sparse_nn_2/over_arch_2/linear_2/FC_grad']
===============================
Finished all dper3 ops, number of good ops 19, bad ops 1, skipped 16
run_comparison for dper2 / dper3 nets running time: 0.0017991065979003906
result type: <class 'NoneType'> result: None
```
dper2 label part of net P111794577
dper3 label part of net after change P116817194
Reviewed By: kennyhorror
Differential Revision: D17795740
fbshipit-source-id: 9faf96f5140f5a1efdf2985820bda3ca400f61fa
Summary: Previously, loss_weight was not used correctly for the self-supervision branch.
Test Plan: buck test mode/dev-nosan //caffe2/caffe2/fb/dper/layer_models/models/experimental/tests:tum_test
Reviewed By: xianjiec
Differential Revision: D17862312
fbshipit-source-id: 554b793a5caa3886946c54333c81a0d8a10230d9
Summary:
We are seeing the error "[enforce fail at BlackBoxPredictor.cpp:134] ! !parameter_workspace->HasBlob(out). Net REMOTE of type predict_net writes to blob cat/NGRAM_QRT_VERSIONS_x_EVENT_TYPE_AUTO_FIRST_X/Pool_Option_0/Repeat_0/sparse_lookup/w which exists in the parameter workspace" in online testing for calibration models.
I suspect it's due to the CopyRowsToTensorOp op being used in prediction.
Test Plan:
f143080108 offline predict net does not contain CopyRowsToTensorNet, which looks right.
Waiting for Olga to test online behavior
dper2 canary:
https://fburl.com/fblearner/sv3o3yj1
Differential Revision: D17741823
fbshipit-source-id: 19721b632b5ea9ebfa1ef9ae0e99d3a10c926287
Summary: Currently, accelerators do not have a concept of fp32; they only understand fp16 and int8 in terms of data input. In order to fix the issue here, we want to make sure unaries are turned into fp16 when we have the int8 exporter turned on.
Reviewed By: kennyhorror
Differential Revision: D17743791
fbshipit-source-id: 7322d23eb12ac3f813b525fc0ddd066f95c8ca85
Test Plan:
The notebook showed no diff for id score list
https://our.intern.facebook.com/intern/anp/view/?id=154764
Reviewed By: alyssawangqq
Differential Revision: D17649974
fbshipit-source-id: 84cb4ae372fc215295c2d0b139d65f4eacafae4a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27756
Implement an approximate L0 norm for use in the dense feature regularizer that will be used for feature importance. The formula is as follows:
{F212246801}
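The exact formula is in the attachment above; as a loudly hedged illustration only, one common smooth surrogate for the L0 norm looks like the following (the Gaussian parameterization here is my assumption, not necessarily what this diff implements):
```
import numpy as np

def approx_l0_norm(w, sigma=1e-2):
    # smooth surrogate: sum_i (1 - exp(-w_i^2 / (2 * sigma^2)))
    # counts coordinates that are "far" from zero; as sigma -> 0
    # this approaches the true L0 norm ||w||_0
    return np.sum(1.0 - np.exp(-np.square(w) / (2.0 * sigma ** 2)))

print(approx_l0_norm(np.array([0.0, 0.0, 0.5, -1.2])))  # ~2.0
```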
Reviewed By: wx1988
Differential Revision: D17432708
fbshipit-source-id: 57d6c9c3dd1b4e210b9f10264075c57dbc9c8cb6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27592
The caffe2 data reader test `test_time_limit_reader_with_short_limit` is flaky as-written because it places an upper bound on how much can be read, but under stress it is possible for fewer records to be read. The fix is to make the assertion check a fuzzy/range check rather than exact equality, since there's not a straightforward way to precisely test a timer-based feature.
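A minimal sketch of the fuzzy check (names hypothetical; the real assertion lives in dataio_test):
```
def check_time_limited_read(num_read, cap):
    # the time-limit reader guarantees AT MOST `cap` records; under stress
    # the timer can fire early, so accept any count in [0, cap] instead
    # of requiring num_read == cap
    assert 0 <= num_read <= cap
```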
ghstack-source-id: 91543898
Test Plan:
`buck test mode/dev-tsan //caffe2/caffe2/python:dataio_test-2.7 -- --stress-runs 20` -> P117156924 (with fix, 100% pass)
P117158750 - without fix, lots of failures in this test
Reviewed By: boryiingsu
Differential Revision: D17816775
fbshipit-source-id: 2ab0d3304fbd9c9806d37a4fe2912c840616db61
Summary: This test was failing in 3.7; it turns out it was omitted by the test director in 3.6, so I added a skip for both versions.
Test Plan: The unit test is skipped in 3.7 and 3.6; all other tests pass.
Reviewed By: tomdz
Differential Revision: D17820967
fbshipit-source-id: 571f0ec7fe1b0cb50ead4e0d18c00151a701f36a
Summary:
Support attention weights as input to SparseLookup. In attention sum pooling, if attention weights can be pre-calculated before the embedding lookup, they can be passed to SparseLookup and processed by the SparseLengthsWeightedSum op. One example is id_score attention sum pooling.
Essentially the net is converted from:
LengthsSum(Mul(Gather(keys, w), att_weight))
to:
SparseLengthsWeightedSum(keys, w, att_weight)
It unblocks potential efficiency gains with distributed training.
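A hedged sketch of the fused form, using the standard Caffe2 input order (DATA, WEIGHT, INDICES, LENGTHS):
```
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("w", np.random.randn(10, 4).astype(np.float32))  # embedding table
workspace.FeedBlob("att_weight", np.array([0.5, 0.2, 0.3], dtype=np.float32))
workspace.FeedBlob("keys", np.array([1, 3, 2], dtype=np.int64))
workspace.FeedBlob("lengths", np.array([3], dtype=np.int32))        # one example, 3 ids

# one op now does gather + per-id weighting + sum pooling
op = core.CreateOperator(
    "SparseLengthsWeightedSum", ["w", "att_weight", "keys", "lengths"], ["pooled"]
)
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("pooled"))  # shape (1, 4)
```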
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26748
Test Plan: unit test
Reviewed By: chocjy
Differential Revision: D17553345
Pulled By: wheatkit
fbshipit-source-id: 60cc3c4b0bc1eade5459ac598e85286f3849a412
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27508
Implemented a simple exponential decay of the weight of the lr loss function, with a lower bound.
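A minimal sketch of the schedule (helper name and parameterization are hypothetical):
```
def decayed_loss_weight(initial_weight, decay_rate, step, min_weight):
    # exponential decay, floored at min_weight
    return max(min_weight, initial_weight * decay_rate ** step)

# e.g. the weight halves each step until it bottoms out at 0.1
print([decayed_loss_weight(1.0, 0.5, s, 0.1) for s in range(6)])
# [1.0, 0.5, 0.25, 0.125, 0.1, 0.1]
```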
Test Plan:
buck test //caffe2/caffe2/fb/dper/layer_models/tests:mtml_test -- test_task_weight_decay
https://our.intern.facebook.com/intern/testinfra/testrun/3377699729136308
canary: f140103452
Reviewed By: chenshouyuan
Differential Revision: D17524101
fbshipit-source-id: 9a653e21a4ecb74dfc4ac949c9e3388f36ef3a20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26376
* Create the new dense_feature_reg (FCInputLpNorm) for feature importance, applied to the input of the fully-connected layer.
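A hedged numpy sketch of an Lp norm over FC input columns (the actual FCInputLpNorm implementation and its reduction may differ):
```
import numpy as np

def fc_input_lp_norm(x, p=2.0):
    # x: (batch, num_features) input to the fully-connected layer;
    # one Lp norm per input feature, usable as a regularization term
    return np.sum(np.abs(x) ** p, axis=0) ** (1.0 / p)

x = np.random.randn(32, 8)
print(fc_input_lp_norm(x).shape)  # (8,)
```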
Test Plan: * Unit test located in: `caffe2/caffe2/fb/dper/layer_models/tests/split_1/sparse_nn_test.py`
Reviewed By: un-disclosed
Differential Revision: D17360361
fbshipit-source-id: 1a0e119eeb17199a13dfffe58b3036ea4255e301
Summary:
In some versions of Python, then_net and else_net may switch order. Let's make sure we are iterating over the right arg node.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26931
Reviewed By: hl475
Differential Revision: D17614829
Pulled By: houseroad
fbshipit-source-id: 3f1b4eb91ecf4d808f58c34896d3e628aa2e0af0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26654
As per the Python contract, __getattr__ can only throw AttributeError. Throwing something else breaks hasattr() and causes upstream issues.
A similar bug existed in PyTorch earlier.
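For illustration, a minimal sketch of the contract (the class is hypothetical, not the code in this diff):
```
class Record(object):
    def __init__(self, values):
        self.__dict__["_values"] = dict(values)

    def __getattr__(self, name):
        # __getattr__ must raise AttributeError on a miss; raising
        # KeyError here would leak through hasattr(), which only
        # swallows AttributeError
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name)

r = Record({"a": 1})
assert hasattr(r, "a") and not hasattr(r, "b")
```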
Test Plan: builds
Differential Revision: D17529471
fbshipit-source-id: bb6ac6c9e3be8b80fa2967e6a2e293afd1594cf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25426
Add embedding table 4-bit quantization support:
* add the conversion from fp32 to int4 (see the sketch below);
* use brew to pass the context so that the 4-bit operators are added when generating the predictor net.
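A hedged sketch of the fp32-to-int4 conversion step; the op name FloatToFused4BitRowwiseQuantized is my assumption about which Caffe2 fused-rowwise conversion this diff wires up:
```
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("emb_fp32", np.random.randn(8, 16).astype(np.float32))

# assumed op: rowwise 4-bit quantization with fused per-row scale/bias
op = core.CreateOperator(
    "FloatToFused4BitRowwiseQuantized", ["emb_fp32"], ["emb_int4"]
)
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("emb_int4").shape)
```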
Reviewed By: kennyhorror, chocjy
Differential Revision: D16859892
fbshipit-source-id: a06c3f0b56a7eabf9ca4a2b2cb6c63735030d70b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26227
In the previous implementation of composite lr, the lr_scale for each sub-policy was overwritten by the last lr_scale.
Due to another bug in the unit test (policy_lr_scale was the same for all sub-policies), this bug was not detected by the unit test...
Fix: add an additional field to CompositeLearningRateItem so that we store lr_scale values for all sub-policies (sketch below).
With the unit test fixed, the error in the previous implementation shows up:
https://fburl.com/testinfra/ikdbnmey
With the fix,
https://fburl.com/testinfra/m694ehl1
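As a minimal sketch of the intended semantics (structure and names hypothetical; the real logic lives in the LearningRate op):
```
def composite_lr(step, sub_policies):
    # sub_policies: list of (num_iters, policy_fn, lr_scale); with the fix,
    # each sub-policy keeps ITS OWN lr_scale instead of all sub-policies
    # silently sharing the last one
    for num_iters, policy_fn, lr_scale in sub_policies:
        if step < num_iters:
            return lr_scale * policy_fn(step)
        step -= num_iters
    return lr_scale * policy_fn(step)  # past the end: stick with the last policy

# two constant sub-policies with different scales
policies = [(100, lambda s: 0.1, 1.0), (100, lambda s: 0.1, 0.5)]
print(composite_lr(50, policies), composite_lr(150, policies))  # 0.1 0.05
```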
Test Plan:
unittest
buck test caffe2/caffe2/python/operator_test:learning_rate_op_test -- test_composite_learning_rate_op
Reviewed By: chocjy, alex1o1o7cloud
Differential Revision: D17380363
fbshipit-source-id: 161e9cb71bb2ea7f0734a3361e270616057a08e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26147
We may try to unpickle a byte string in py3 that was pickled from py2. Therefore we need to pass encoding='latin1'.
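A minimal sketch (the payload bytes are hypothetical py2 output):
```
import pickle

# a protocol-2 pickle produced by Python 2, containing a raw byte string
py2_payload = b"\x80\x02U\x04\xde\xad\xbe\xefq\x00."

# the default encoding='ascii' would raise on the high bytes; latin1 maps
# every byte 0x00-0xff to a code point, so the unpickle always succeeds
obj = pickle.loads(py2_payload, encoding="latin1")
print(repr(obj))
```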
Reviewed By: kennyhorror
Differential Revision: D17305677
fbshipit-source-id: c0c8a51909629a65eb72bb81cccfbabaee9f8d01
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25908
Original commit changeset: f6e961e88c01
device_option propagation is completely broken in Caffe2 for cases when pass-through operators are used. As an example, the Gather operator doesn't have a gradient and passes through its inputs, which results in incorrect detection of the components for sparse parameter aggregation (the component will be empty instead of the real device).
This diff is trying to fix this issue.
The original diff had a problem: Caffe2 does not handle cases where a device option is present but contains only metadata (for example, the one for auto-generated reduction ops in the backward pass). This diff addresses that issue by merging device options during the backward pass.
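A hedged sketch of the merge at the proto level (the real change operates inside the gradient builder):
```
from caffe2.proto import caffe2_pb2

def merge_device_options(real, metadata_only):
    # keep the concrete device from `real`, but let a metadata-only
    # option (e.g. one attached to an auto-generated reduction op in
    # the backward pass) contribute its extra fields
    merged = caffe2_pb2.DeviceOption()
    merged.CopyFrom(real)
    merged.MergeFrom(metadata_only)
    return merged
```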
Test Plan:
1. net_transform is finally working with the Gather + FloatToHalf transformed model instead of failing because of an incorrect number of components.
2. New unit-test.
3. Verify that the previously broken benchmark is now passing.
ezyang, do you have suggestions for what else I should test?
Reviewed By: ezyang
Differential Revision: D17281528
fbshipit-source-id: 4a1bc386f29f6a34fbf8008effde9d4890abebfa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26153
I suspect that our multithreaded test system causes issues with dyndep if two places try to call InitOpsLibrary concurrently. So perhaps we just guard this with a lock (sketch below). This is just a guess-fix, as it is impossible to repro.
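A hedged sketch of the guard (the actual lock lives inside dyndep in this diff; the wrapper name is illustrative):
```
import threading
from caffe2.python import dyndep

_init_lock = threading.Lock()

def init_ops_library_safely(path):
    # serialize concurrent InitOpsLibrary calls from multithreaded tests
    with _init_lock:
        dyndep.InitOpsLibrary(path)
```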
Test Plan: sandcastle
Reviewed By: bddppq
Differential Revision: D17361310
fbshipit-source-id: 596634a2098b18881abbd26a5a727a5ba0d03b6e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26080
Will be used in the c2 ctr_mbl_feed model-to-PyTorch conversion
Test Plan: Unit test
Reviewed By: yinghai
Differential Revision: D17337604
fbshipit-source-id: a90d9f5dc38301608d1562c6f2418e7f4616e753
Summary:
Just a tiny fix to make debugging easier (output errors to stderr and include them in the exception message)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25809
Reviewed By: zrphercule
Differential Revision: D17329957
Pulled By: houseroad
fbshipit-source-id: 0d73dd9f62c735fbc5096e6a7c0e5f58e4cd90ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25782
Enable variable-size embeddings for the dot processor. We split the embedding matrix into multiple towers based on embedding size, perform the dot product in a loop over each tower, and finally concatenate all the dot product outputs.
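A hedged numpy sketch of the tower-wise dot product (shapes illustrative):
```
import numpy as np

def towerwise_dot(towers):
    # towers: one embedding matrix per embedding size,
    # each of shape (num_features_in_tower, emb_dim)
    outputs = []
    for emb in towers:
        dots = emb @ emb.T                  # pairwise dots within the tower
        iu = np.triu_indices(dots.shape[0], k=1)
        outputs.append(dots[iu])            # keep distinct pairs only
    return np.concatenate(outputs)          # concat across towers

towers = [np.random.randn(3, 8), np.random.randn(2, 16)]  # dims 8 and 16
print(towerwise_dot(towers).shape)  # (4,): 3 pairs + 1 pair
```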
Test Plan:
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:
https://our.intern.facebook.com/intern/testinfra/testrun/3659174703037560
Specific unit tests --
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_per_feature_emb_dim
https://our.intern.facebook.com/intern/testinfra/testrun/3377699726358808
Reviewed By: chenshouyuan
Differential Revision: D16690811
fbshipit-source-id: 8f5bce5aa5b272f5f795d4ac32bba814cc55210b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25203
device_option propagation is completely broken in Caffe2 for cases when pass-through operators are used. As an example, the Gather operator doesn't have a gradient and passes through its inputs, which results in incorrect detection of the components for sparse parameter aggregation (the component will be empty instead of the real device).
This diff is trying to fix this issue.
Test Plan:
net_transform is finally working with the Gather + FloatToHalf transformed model instead of failing because of an incorrect number of components.
Reviewed By: dzhulgakov
Differential Revision: D16936041
fbshipit-source-id: 916551b933469f04e32ddf86ec4b2c07f76c9176
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24863
Add the sparse feature name to logging for ease of debugging
Test Plan:
./buck-out/gen/caffe2/caffe2/fb/dper/layer_models/sparse_nn/pooling_test#binary.par -r test_simple_sum_pooling_named_exception
Another test for id_score_list. The original sparse_key is equivalent to get_key(self.input_record)()
P98343716
./buck-out/gen/caffe2/caffe2/python/layers_test-2.7#binary.par -r test_get_key
Reviewed By: chocjy
Differential Revision: D16901964
fbshipit-source-id: 2523de2e290aca20afd0b909111541d3d152a588