Summary:
Resubmit of #20698, which got messed up.
The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a very simple, lightweight mechanism to do so - only the first invocation of a trigger point is logged. This is significantly more lightweight than #18235, so we can afford to put logging in e.g. TensorImpl.
Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcome.
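The mechanism itself lives in the C++ core; below is a minimal Python sketch of the log-once idea only (the event name and the `print` backend are illustrative, not the real API):
```python
import functools

@functools.lru_cache(maxsize=None)
def log_api_usage_once(event):
    # Only the first call per event name reaches the backend;
    # later calls hit the cache and are essentially free.
    print("API usage:", event)  # stand-in for a pluggable logging backend

log_api_usage_once("tensor.create")
log_api_usage_once("tensor.create")  # no-op: already logged once
```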
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745
Differential Revision: D15429196
Pulled By: dzhulgakov
fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
Summary:
As part of the Variable/Tensor merge work: https://github.com/pytorch/pytorch/issues/13638, we make the following changes in this PR:
1. Remove the `Variable::Impl` class and the `DifferentiableViewImpl` class
2. Change all `Variable.data()` call sites to either use `Variable` directly, or use `Variable.tensor_data()`
3. Remove the `Variable.data()` API
4. Add `Variable.variable_data()` that matches `tensor.data` in the Python API, which creates a new `Variable` that shares the same storage and tensor metadata with the original `Variable`, but with a completely new autograd history.
After this PR, Variable no longer wraps a Tensor internally; both Variable and Tensor use the same TensorImpl class as their `impl_`. The only difference is that a Variable always has AutogradMeta in its TensorImpl, while a Tensor doesn't.
**Note that this PR is BC-breaking in the following use cases:**
**Use Case 1:**
Previously, `x.data = y` works even if `x` and `y` are of different TensorImpl type (e.g. `x` is a CPU dense tensor whose impl is of type TensorImpl, while `y` is a CPU sparse tensor whose impl is of type SparseTensorImpl). However, after this PR, `x.data = y` doesn't work anymore if `x` and `y` are of different TensorImpl type, because the underlying implementation `variable.set_data(tensor)` no longer works if `variable` and `tensor` have different TensorImpl type.
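For illustration, a minimal repro of the pattern that stops working (variable names are ours):
```python
import torch

x = torch.randn(2, 2)              # dense CPU tensor, impl is TensorImpl
y = torch.randn(2, 2).to_sparse()  # sparse CPU tensor, impl is SparseTensorImpl
x.data = y  # worked before this PR; raises an error after it
```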
**Use Case 2:**
If a tensor `x`'s `grad` is sparse, accumulating dense gradients to `x` will change the tensor that `x.grad` is pointing to. This is better illustrated with the following example:
```python
params = torch.tensor([1.5, 1.5]).requires_grad_()
with torch.no_grad():
    # Change gradient to a sparse tensor
    params.grad = torch.sparse_coo_tensor(torch.tensor([[1, 1]]).long(), torch.tensor([1., 1.]))
grad_saved = params.grad
params.backward(torch.tensor([1.5, 1.5]))
assert id(grad_saved) == id(params.grad) # This will fail after this PR
```
The assertion in the last line will fail after this PR, because adding dense gradients to sparse gradients will change the `params.grad` tensor reference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17072
Differential Revision: D14075257
Pulled By: yf225
fbshipit-source-id: 0e681df641270dea586042dd26db59f2e76b5957
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20802
Need this for sequence model
Reviewed By: dzhulgakov
Differential Revision: D15448529
fbshipit-source-id: cd5abe3b689fc0e02feff10faf8cd61c99369f4f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20786
Add a method to LayerModelHelper to filter metrics_schema. A general model builder may add metric schemas that are not needed in some situations; this change adds the ability to skip the unneeded ones.
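A hedged sketch of the kind of filtering this enables; the function name, and the assumption that `schema.Struct.get_children()` yields (name, field) pairs, are ours rather than the exact API added by this diff:
```python
from caffe2.python import schema

def filter_metrics_schema(metrics_schema, keep_fields):
    # Rebuild the Struct, keeping only the whitelisted top-level fields.
    kept = [(name, field)
            for name, field in metrics_schema.get_children()
            if name in keep_fields]
    return schema.Struct(*kept)
```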
Reviewed By: alex1o1o7cloud
Differential Revision: D15418140
fbshipit-source-id: 520f5dffd9938cf206cb1352e2953a4d4d2b6ab1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20502
Following D15307410, remove more floating point exceptions in unit tests.
Reviewed By: hx89
Differential Revision: D15340930
fbshipit-source-id: 269fc75e0800bc9d39126767a0f3ca15cd8b0cad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20501
Fixing unit tests related to optimizer operators.
Reviewed By: hx89
Differential Revision: D15307410
fbshipit-source-id: e5400c26e08f26191ee542fe6b02e0a69bc4e1ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20020
Add shape inference for the LearningRate op. The output (lr) should have the same shape as the input (iteration), but not the same type (float vs. int).
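The rule itself is small enough to state directly (illustrative Python, not the actual C2 OpSchema code, which lives in C++):
```python
def infer_lr_output(iteration_shape):
    # lr mirrors the iteration blob's shape but is floating point, not integer.
    return list(iteration_shape), "float"

assert infer_lr_output([1]) == ([1], "float")
```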
Reviewed By: un-disclosed
Differential Revision: D15112300
fbshipit-source-id: 09969aefa15172a6f3c70cd9b2548e3020da5d7a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20108
Add cpp runs for c2, hooked up via pybinds, printing output to the terminal. This is not hooked up with the pep output yet because I'd like to verify the numbers first.
Note that this isn't quite the same mechanism as the PyTorch cpp hookup, which uses cpp_python_extensions. If I can use the same mechanism to pull all the inputs for c2 through cpp and do FeedBlobs in cpp, then I'll switch to that.
Reviewed By: zheng-xq
Differential Revision: D15155976
fbshipit-source-id: 708079dacd3e19aacfe43d70c5e5bc54da2cf9e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19803
There is no reason to set a specific logging level for this module. Removing it to just use the default logging level.
Differential Revision: D15098834
fbshipit-source-id: 1654c04500c19690ddde03343f2e84b04bb0f1ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19660
Implementation of aggregated Scale operator.
The operator takes a list of tensors as input and scales all of them by the argument float value.
The tensor sizes can be different, so bookkeeping of the sizes and pointers to the tensors is
necessary for the GPU version of the kernel.
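A hypothetical usage sketch; the operator name and argument below are assumptions based on the description above, not verified against the final op schema:
```python
from caffe2.python import core

# Assumed: the aggregated variant maps each input i to output i and
# scales every element by the same float value.
op = core.CreateOperator(
    "Scale",
    ["x0", "x1", "x2"],   # inputs may have different sizes
    ["y0", "y1", "y2"],
    scale=0.5,
)
```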
Reviewed By: BIT-silence
Differential Revision: D14984233
fbshipit-source-id: 37cc97159a4f2c38cd6fff4f5710ab7d3a773611
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20062
Previously, the batch counter was incremented even if none of the readers had data. In this diff,
1) the limiter is applied to the last reader, so that the batch counter is not incremented unless the first N-1 readers have data;
2) the stop blob of the last reader is used as the stop blob of the task, so that it's checked before the counter is incremented.
Reviewed By: xianjiec
Differential Revision: D15099761
fbshipit-source-id: 47ed6c728118fe453cf57ac3457085867939485b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20044
We do not have a gating functor; this diff adds one. I'm leveraging the existing learning rate op because there are other policies I'll need to combine with it as a union.
* Since there are other policies in LearningRateOp that will be used together as a union, I chose to add this as a LearningRateOp policy.
* The existing constantwarmup policy cannot produce a step function that is nonzero first and zero later.
* There are multiple uses for it:
  * e.g. as a gating blob generator that is useful for turning things off.
  * e.g. as a learning rate switcher at a certain iteration.
* For generalizability, no regulation or constraint is applied on the range of the values.
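A hedged usage sketch: the gating behaviour plugs in as just another LearningRate policy, and the policy name and arguments below are hypothetical placeholders, not the ones defined by this diff:
```python
from caffe2.python import core

gate = core.CreateOperator(
    "LearningRate",
    ["iteration"],
    ["gate_value"],
    base_lr=1.0,
    policy="gate",    # hypothetical policy name
    num_iter=10000,   # hypothetical switch point: nonzero before, zero after
)
```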
Reviewed By: ccheng16
Differential Revision: D15178229
fbshipit-source-id: 1e66e9a4bc1bfb946a57f8aefc97d8170f6be731
Summary:
When output blob names are specified while load_all=1, the output blob names are ignored. However, this behavior is not documented. In this diff, we simply disallow users from providing output blob names when load_all=1.
See discussion at https://fb.workplace.com/groups/1405155842844877/permalink/2714909788536136/
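For reference, a sketch of the now-enforced usage (the db path is a placeholder):
```python
from caffe2.python import core

# With load_all=1 the op loads every blob found in the db, so no output
# names may be listed; passing them now raises instead of being silently
# ignored.
load_op = core.CreateOperator(
    "Load", [], [],
    db="/path/to/checkpoint",
    db_type="minidb",
    load_all=1,
)
```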
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19133
Reviewed By: dzhulgakov
Differential Revision: D14883698
Pulled By: chandlerzuo
fbshipit-source-id: 6e4171e36c4ccc4f857e79da98b858a06b7d8ad6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18499
If the init op is not fp16 compatible, it should throw.
However, in the special case where the original init op is UniformFill, we replace it with Float16UniformFill.
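The rule reduces to something like the following (illustrative sketch, not the actual helper's code):
```python
def fp16_init_op_type(original_type):
    # Special case: UniformFill has a direct fp16 counterpart.
    if original_type == "UniformFill":
        return "Float16UniformFill"
    raise TypeError("Init op {} is not fp16 compatible".format(original_type))
```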
Reviewed By: kennyhorror
Differential Revision: D14627209
fbshipit-source-id: eb427772874a732ca8b3a25d06670d119ce8ac14
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19442
For cases like CV, some ops like transpose and tile will mangle the batch size, so we don't know how to adjust the output batch size. In this case, the current solution is to just fix the input batch size statically and not adjust the output batch size.
Reviewed By: zrphercule
Differential Revision: D15007237
fbshipit-source-id: a21b943a52ee5462d9d7804dfae44360f579f8cf
Summary:
In this PR, the fusion algorithms are improved to support DNNLOWP:
1. Enabled conv fusions for DNNLOWP
2. Fused order switch op into the following quantize op
3. Improved conv+sum fusion to parse a larger scope/window
4. Reorganized the fusion code to fix a random crash caused by changing the graph
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18843
Differential Revision: D15021030
Pulled By: yinghai
fbshipit-source-id: 88d2199d9fc69f392de9bfbe1f291e0ebf78ab08
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19083
As we have discussed, there are too many AdjustBatch ops; they incur reallocation overhead and affect performance. We will eliminate these ops by
- inlining the input adjust batch op into Glow
- inlining the output adjust batch op into OnnxifiOp, and doing that only conditionally.
This is the C2 part of the change and requires a change on the Glow side to work e2e.
Reviewed By: rdzhabarov
Differential Revision: D14860582
fbshipit-source-id: ac2588b894bac25735babb62b1924acc559face6