Summary:
Closes gh-42998
The issue is marked for 1.6.1, if there's anything I need to do for a backport please tell me what that is.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43053
Reviewed By: izdeby
Differential Revision: D23131708
Pulled By: malfet
fbshipit-source-id: 2744bacce6bdf6ae463c17411b672f09707e0887
Summary:
Related to https://github.com/pytorch/pytorch/issues/40397
Inspired by ezyang's comment at https://github.com/pytorch/pytorch/issues/40397#issuecomment-648233001, this PR leverages `__all__` to explicitly export private functions from `_VariableFunctions.pyi`, in order to make `mypy` aware of them after:
```
if False:
    from torch._C._VariableFunctions import *
```
The generation of the `__all__` template variable excludes some items from `unsorted_function_hints`, as it seems that those without hints end up not being explicitly included in the `.pyi` file: I erred on the side of caution and opted to keep `__all__` consistent with the definitions inside the file. Additionally, I added some pretty-printing to avoid an extremely long line.
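As a rough illustration of the pretty-printing described above (this is a hypothetical sketch, not the actual gen_pyi.py code; `render_all` and the sample names are made up):

```python
def render_all(names):
    # Emit a pretty-printed __all__ block for a .pyi stub, one entry
    # per line, so the file avoids a single extremely long line.
    lines = ["__all__ = ["]
    for name in sorted(names):
        lines.append("    {!r},".format(name))
    lines.append("]")
    return "\n".join(lines)

print(render_all(["_cast_Byte", "_add_relu"]))
```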
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40499
Differential Revision: D22240716
Pulled By: ezyang
fbshipit-source-id: 77718752577a82b1e8715e666a8a2118a9d3a1cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38490
A meta tensor is a tensor that is a lot like a normal tensor,
except it doesn't actually have any data associated with it.
You can use them to carry out shape/dtype computations without
actually having to run the actual code; for example, this could
be used to do shape inference in a JIT analysis pass.
Check out the description in DispatchKey.h for more information.
Meta tensors are part of a larger project to rationalize how we
write kernels so that we don't have to duplicate shape logic
in CPU kernel, CUDA kernel and meta kernel (this PR makes the
duplication problem worse!) However, that infrastructure can
be built on top of this proof of concept, which just shows how
you can start writing meta kernels today even without this
infrastructure.
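The data-free shape computation a meta kernel performs can be sketched in plain Python (an illustrative broadcast-shape helper for a binary op, not PyTorch's implementation):

```python
from itertools import zip_longest

def broadcast_shapes(a, b):
    # Walk both shapes right-to-left, NumPy/PyTorch broadcasting style:
    # trailing dimensions must match, or one of them must be 1.
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError("shapes %r and %r are not broadcastable" % (a, b))
        out.append(max(x, y))
    return tuple(reversed(out))

# Shape inference without allocating or touching any tensor data:
print(broadcast_shapes((2, 1, 3), (4, 3)))  # (2, 4, 3)
```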
There are a lot of things that don't work:
- I special-cased printing for dense tensors only; if you try to
allocate a meta sparse / quantized tensor, things aren't going
to work.
- The printing formula implies that torch.tensor() can take an
ellipsis, but I didn't add this.
- I wrote an example formula for binary operators, but it isn't
even right! (It doesn't do type promotion or memory layout
correctly.) The most future-proof way to do it right is to
factor the relevant computation out of TensorIterator,
as it is quite involved.
- Nothing besides torch.add works right now
- Meta functions are ALWAYS included in mobile builds (selective
build doesn't work on them). This isn't a big deal for now
but will become more pressing as more meta functions are added.
One reason I'm putting up this PR now is to check with Yinghai Lu
if we can unblock shape inference for accelerators, while we are
still working on a long term plan for how to unify all shape
computation across our kernels.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21935609
Pulled By: ezyang
fbshipit-source-id: f7d8636eeb8516b6bc296db99a16e56029972eee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38211
Just because the annotations are inline doesn't mean the files type
check; most of the newly annotated files have type errors and I
added exclusions for them in mypy.ini. The payoff of moving
all of these modules inline is that I can delete the relevant code
generation logic for the pyi files (which had been adding ignore
annotations that weren't actually relevant anymore).
For the most part the translation was completely mechanical, but there
were two hairy issues. First, I needed to work around a Python 3.6 and
earlier bug where Generic has a nontrivial metaclass. This fix is in
torch/jit/__init__.py. Second, in module.py, we need to apply the same
fix for avoiding contravariance checks that the pyi file used to have;
this is done by declaring forward as a variable (rather than a
function), which appears to be sufficient to get mypy to not
contravariantly check input arguments.
Because we aren't actually typechecking these modules in most
cases, it is inevitable that some of these type annotations are wrong.
I slavishly copied the old annotations from the pyi files unless there
was an obvious correction I could make. These annotations will probably
need fixing up later.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21497397
Pulled By: ezyang
fbshipit-source-id: 2b08bacc152c48f074e7edc4ee5dce1b77d83702
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38173
- Introduce torch.types.Device representing all "device-like" types
- Stubbed torch.device.__reduce__
- Stubbed all torch._C functions comprehensively
- Deleted _safe_call which is unused throughout the codebase
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21497399
Pulled By: ezyang
fbshipit-source-id: 1f534442b0ec9a70d556545d072f2c06a08b9d15
Summary:
Most test files have a ton of errors; there's not much point adding ignores for them though. The way of working is simply to run `mypy test/test_somefile.py`, fix up the errors, then add that file to the `files =` list in `mypy.ini`.
Can't add all of `test/*` by default, because the JIT test files have (on purpose) syntax errors that are meant to exercise the robustness of the JIT to bad annotations. Leave those alone for now.
_Depends on the ghstacked PRs in gh-38173, only the last 2 commits are new._
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38220
Differential Revision: D21503481
Pulled By: ezyang
fbshipit-source-id: 63026e73201c549d64647a03a20a4c6687720244
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37259, fixes https://github.com/pytorch/pytorch/issues/20156
This lazily calls `at::init_num_threads` once for each thread by adding a call to `lazy_init_num_threads` in `at::parallel_for` and `at::parallel_reduce`.
If this solution is okay, then we should add the same to guard other places that might use MKL or OpenMP.
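The lazy per-thread initialization pattern can be sketched in Python (a thread-local flag guarding a one-time init; names mirror the C++ ones but the code is illustrative):

```python
import threading

_tls = threading.local()

def lazy_init_num_threads(init_fn):
    # Run the expensive per-thread setup (e.g. configuring MKL/OpenMP
    # thread counts) at most once on each thread that enters it.
    if not getattr(_tls, "initialized", False):
        init_fn()
        _tls.initialized = True

def parallel_for(work, init_fn):
    # Every parallel entry point guards itself with the lazy init,
    # so threads created by any means get initialized before work runs.
    lazy_init_num_threads(init_fn)
    work()
```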
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37461
Reviewed By: ezyang
Differential Revision: D21472763
Pulled By: ilia-cher
fbshipit-source-id: 889d6664f5bd4080037ade02ee324b1233992915
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38157
This removes the error prone process of assembling `torch/__init__.pyi`
(and frequently forgetting to expose things), since now we can simply
rely on the true source file to get things done. Most of the old
codegen in gen_pyi.py is now rerouted to various files:
- `torch/_C/__init__.pyi` (the dumping pile of all misc bindings)
- `torch/_C/_nn.pyi` (NN function bindings)
- `torch/_C/_VariableFunctions.pyi` (torch function bindings)
`torch.types` grew a bunch more definitions that were previously
defined in `torch/__init__.pyi`.
Some miscellaneous changes:
- Fixed a bug where we treat single TensorList argument as implying
varargs are accepted. This is actually only supported on IntList.
This means we can correctly generate a stub for dequantize.
- Add missing manual stub for nonzero
- Switched torch/onnx/operators.py to directly refer to _C module,
since apparently mypy doesn't think that methods prefixed with
underscores get reexported. This may be a recurring theme; maybe
we need to find a better way to solve it.
Because I was really lazy, I dumped namedtuple definitions in both
`torch._C` and `torch._C._VariableFunctions`. This is definitely wrong.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21497400
Pulled By: ezyang
fbshipit-source-id: 07b126141c82efaca37be27c07255cb2b9b3f064
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36815
Pytorch does not have native channel shuffle op.
This diff adds that for both fp and quantized tensors.
The FP implementation is an inefficient one; for quantized tensors there is
a native QNNPACK op for this.
ghstack-source-id: 103267234
Test Plan:
buck run caffe2/test:quantization --
quantization.test_quantized.TestQuantizedOps.test_channel_shuffle
The x86 implementation in QNNPACK is SSE2, so this may not be the most
efficient for x86.
Reviewed By: dreiss
Differential Revision: D21093841
fbshipit-source-id: 5282945f352df43fdffaa8544fe34dba99a5b97e
Summary:
- added tests that showcase the problems
- fixed the problems
These changes would allow me to remove many "# type: ignore" comments in my codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36358
Differential Revision: D21230704
Pulled By: ezyang
fbshipit-source-id: e6d475a0aa1fb40258fa0231ade28c38108355fb
Summary:
This enables type checking for named tensors, and fixes the underlying problems.
The bulk of the fix is modifying `gen_pyi.py` to generate reasonable types in `torch/__init__.pyi`. I took two approaches: First, I tried to take a generic approach and added `DimnameList` to the magic list of variable argument lists. Unfortunately that was insufficient for many of the method signatures, so I also added manual definitions for `rename`, `refine_names`, and `unflatten` in `__init__.pyi.in`.
Finally there were a few problems in the doctests that had to be cleaned up so that `test/test_type_hints.py` will run successfully.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36890
Differential Revision: D21259192
Pulled By: zou3519
fbshipit-source-id: 2a9e7d7bec9be5ae3ae2995078c6abfa3eca103c
Summary:
Per title. See related https://github.com/pytorch/pytorch/pull/34570.
In PyTorch 1.7 the plan is for torch.div and Python's division operator to perform "true" division, like Python 3, JAX, and NumPy. To facilitate this change, this PR expands true_divide to be a method so it can cover all of torch.div's use cases.
New true_divide tests are added to test_torch.py, test_type_promotion.py, and test_sparse.py.
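The planned semantics mirror Python 3's own division operators; as a plain-Python reminder of the distinction:

```python
# "True" division always produces a float result, regardless of
# operand types -- this is what torch.div and / are planned to do.
assert 5 / 2 == 2.5
assert isinstance(4 / 2, float)

# Floor division rounds toward negative infinity; torch.floor_divide
# is the migration path for users who want integer division.
assert 5 // 2 == 2
assert -5 // 2 == -3
```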
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34794
Differential Revision: D20545507
Pulled By: mruberry
fbshipit-source-id: 55286f819716c8823d1930441a69008560ac2bd5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35231
Fixes #35213
(Note: this ignores all push blocking failures!)
Test Plan: `mypy -c "import torch; ten = torch.tensor([1.0, 2.0, 3.0]); print(7 + ten)"` should not produce any warnings
Differential Revision: D20604924
Pulled By: pbelevich
fbshipit-source-id: 53a293a99b3f2ab6ca5516b31f3a92f67eb67a39
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34468
This PR prepares `at::Generator` for pybind11's `type_caster<at::Generator>` which is required to implement custom RNG in python. The following changes are done:
1. `at::Generator` was moved to `c10::GeneratorImpl` (similar to `c10::TensorImpl`)
2. `at::Generator` was recreated as a holder of `std::shared_ptr<c10::GeneratorImpl>` (similar to `at::Tensor` that holds `c10::intrusive_ptr<c10::TensorImpl>`)
3. Most of `at::Generator*` usages were replaced with `at::Generator`
TBD: replacing `Generator generator = nullptr` with `{}` requires JIT changes (adding Generator to IValue?)
Differential Revision: D20549420
Pulled By: pbelevich
fbshipit-source-id: 4c92a40eab8f033b359bb6c93f4cd84b07ee8d4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34348
We need this function to swap dequantize for prim::ListConstruct, since
the output of prim::ListConstruct is a list of Tensors.
Test Plan:
.
Imported from OSS
Differential Revision: D20504454
fbshipit-source-id: e6155e37da98e2219a6f79737cd46fe32a509c9f
Summary:
(Updated per review feedback)
`torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to:
- have an out variant: `floor_divide(x, y, out=z)`
- be a method on a tensor: `x.floor_divide(y)`
- have an in-place variant: `x.floor_divide_(y)`
- work with sparse tensors
Tests are added to test_sparse.py and test_torch.py for these new behaviors.
In addition, this PR:
- cleans up the existing sparse division and true_division code and improves their error message
- adds testing of sparse true_division to test_sparse.py
- extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU
Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this is intentional. The BC issue is that the first parameter name to torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y).
The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors.
There are two potential follow-up issues suggested by this PR:
- the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes
- the test framework might benefit from a universal function test class. While methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough that it should be possible to generate tests for them.
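The three call patterns added by this PR can be illustrated with a toy stand-in class (`T` is hypothetical; this is not torch's implementation, just the calling conventions):

```python
class T:
    """Toy stand-in for a tensor holding a single int."""
    def __init__(self, v):
        self.v = v

    def floor_divide(self, other):
        # method variant: x.floor_divide(y) returns a new value
        return T(self.v // other.v)

    def floor_divide_(self, other):
        # in-place variant: x.floor_divide_(y) mutates and returns x
        self.v //= other.v
        return self

def floor_divide(x, y, out=None):
    # function variant with an out= kwarg: floor_divide(x, y, out=z)
    result = x.v // y.v
    if out is None:
        return T(result)
    out.v = result
    return out
```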
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552
Differential Revision: D20509850
Pulled By: mruberry
fbshipit-source-id: 2cd3c828aad67191c77f2ed8470411e246f604f8
Summary:
Per title.
Currently torch.full will always (attempt to) produce a float tensor. This is inconsistent with NumPy in (at least) two cases:
- When integral fill values (including bool) are given
- When complex fill values are given
For example:
```
np.full((1, 2), 1).dtype
: dtype('int64')
np.full((1, 2), (1 + 1j)).dtype
: dtype('complex128')
```
Whereas in PyTorch
```
torch.full((1, 2), 1).dtype
: torch.float32
torch.full((1, 2), (1 + 1j)).dtype
: RuntimeError: value cannot be converted to type float without overflow: (1,1)
```
This PR begins the process of deprecating our current behavior of returning float tensors (by default) when given integer fill values by warning the user that integer fill values will require explicitly specifying the dtype or out kwargs in 1.6, and in 1.7 the behavior will change to return a LongTensor by default (BoolTensor for bool values). The intermediate 1.6 release is to prevent changing the behavior silently and unexpectedly.
The PR also implements inference for complex types. So that with it:
```
torch.full((1, 2), (1 + 1j)).dtype
: torch.complex64
```
The complex type inference returns a ComplexFloat tensor when given a complex fill value (and no dtype or out kwarg is specified), unless the default dtype is Double, in which case a ComplexDouble tensor is returned.
A test for these behaviors is added to test_torch.py.
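The inference rule described above can be sketched as a small Python function (dtype names as strings; an illustration of the rule, not the actual dispatch code):

```python
def infer_full_dtype(fill_value, default_dtype="float32"):
    # bool must be checked before int, since bool subclasses int.
    if isinstance(fill_value, bool):
        return "bool"      # planned 1.7 behavior: BoolTensor
    if isinstance(fill_value, int):
        return "int64"     # planned 1.7 behavior: LongTensor
    if isinstance(fill_value, complex):
        # ComplexFloat, unless the default dtype is double.
        return "complex128" if default_dtype == "float64" else "complex64"
    return default_dtype   # float fill values keep the default dtype
```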
Implementation note:
This PR required customizing full's dispatch because currently in eager codegen the TensorOptions object passed to functions improperly sets has_dtype() to true, even if the user did not explicitly provide a dtype. torch.arange already worked around this issue with its own custom implementation. The JIT, however, does pass a properly constructed TensorOptions object.
Future Work:
This PR does not extend torch.full's complex type inference to ONNX. This seems unlikely to come up and will be a clear error if it does. When integer type inference is added to torch.full, however, then porting the behavior to ONNX may be warranted. torch.arange ported its complex type promotion logic to ONNX, for example.
Additionally, this PR mostly leaves existing call sites in PyTorch that would trigger this warning intact. This is to be more minimal (since the PR is BC breaking). I will submit a separate PR fixing PyTorch's call sites.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34709
Differential Revision: D20509387
Pulled By: mruberry
fbshipit-source-id: 129593ba06a1662032bbbf8056975eaa59baf933
Summary:
(Updated per review feedback)
`torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to:
- have an out variant: `floor_divide(x, y, out=z)`
- be a method on a tensor: `x.floor_divide(y)`
- have an in-place variant: `x.floor_divide_(y)`
- work with sparse tensors
Tests are added to test_sparse.py and test_torch.py for these new behaviors.
In addition, this PR:
- cleans up the existing sparse division and true_division code and improves their error message
- adds testing of sparse true_division to test_sparse.py
- extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU
Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this is intentional. The BC issue is that the first parameter name to torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y).
The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors.
There are two potential follow-up issues suggested by this PR:
- the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes
- the test framework might benefit from a universal function test class. While methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough that it should be possible to generate tests for them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552
Differential Revision: D20497453
Pulled By: mruberry
fbshipit-source-id: ac326f2007d8894f730d1278fef84d63bcb07b5d
Summary:
This PR fixed documentation for `torch.add` with alpha. It also fixed these deprecated python calls `torch.add` and `torch.addmm` in tests, which may affect performance in *test/test_sparse.py* and *test/test_nn.py*.
cc csarofeen ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33935
Differential Revision: D20313320
Pulled By: ngimel
fbshipit-source-id: fb08413d7e244865952e3fc0e1be7f1794ce4e9a
Summary:
I've been using PyTorch with type hints, and I found errors that can be easily fixed, so I'm creating this PR to fix type bugs.
I expected the code below to type-check without any errors.
```python
import torch
from torch.nn import Linear
from torch.autograd import Variable
from torch.optim import AdamW
from torch.utils import hooks
# nn.Module should have training attribute
module = Linear(10, 20)
module.training
# torch should have dtype bfloat16
tensor2 = torch.tensor([1,2,3], dtype=torch.bfloat16)
# torch.Tensor.cuda should accept int or str value
torch.randn(5).cuda(1)
torch.tensor(5).cuda('cuda:0')
# optimizer should have default attribute
module = Linear(10, 20)
print(AdamW(module.weight).default)
# torch.Tensor should have these boolean attributes
torch.tensor([1]).is_sparse
torch.tensor([1]).is_quantized
torch.tensor([1]).is_mkldnn
# Size class should be a tuple of int
a, b = torch.tensor([[1,2,3]]).size()
# check modules can be accessed
torch.nn.parallel
torch.autograd.profiler
torch.multiprocessing
torch.sparse
torch.onnx
torch.jit
torch.hub
torch.random
torch.distributions
torch.quantization
torch.__config__
torch.__future__
torch.ops
torch.classes
# Variable class's constructor should return Tensor
def fn_to_test_variable(t: torch.Tensor):
    return None
v = Variable(torch.tensor(1))
fn_to_test_variable(v)
# check RemovableHandle attributes can be accessed
handle = hooks.RemovableHandle({})
handle.id
handle.next_id
# check torch function hints
torch.is_grad_enabled()
```
But current master branch raises errors. (I checked with pyright)
```
$ pyright test.py
Searching for source files
Found 1 source file
test.py
12:45 - error: 'bfloat16' is not a known member of module
15:21 - error: Argument of type 'Literal[1]' cannot be assigned to parameter 'device' of type 'Optional[device]'
'int' is incompatible with 'device'
Cannot assign to 'None'
16:22 - error: Argument of type 'Literal['cuda:0']' cannot be assigned to parameter 'device' of type 'Optional[device]'
'str' is incompatible with 'device'
Cannot assign to 'None'
23:19 - error: Cannot access member 'is_sparse' for type 'Tensor'
Member 'is_sparse' is unknown
24:19 - error: Cannot access member 'is_quantized' for type 'Tensor'
Member 'is_quantized' is unknown
25:19 - error: Cannot access member 'is_mkldnn' for type 'Tensor'
Member 'is_mkldnn' is unknown
32:7 - error: 'autograd' is not a known member of module
33:7 - error: 'multiprocessing' is not a known member of module
34:7 - error: 'sparse' is not a known member of module
35:7 - error: 'onnx' is not a known member of module
36:7 - error: 'jit' is not a known member of module
37:7 - error: 'hub' is not a known member of module
38:7 - error: 'random' is not a known member of module
39:7 - error: 'distributions' is not a known member of module
40:7 - error: 'quantization' is not a known member of module
41:7 - error: '__config__' is not a known member of module
42:7 - error: '__future__' is not a known member of module
44:7 - error: 'ops' is not a known member of module
45:7 - error: 'classes' is not a known member of module
60:7 - error: 'is_grad_enabled' is not a known member of module
20 errors, 0 warnings
Completed in 1.436sec
```
The list below is not flagged as errors, but I think these are errors too.
* `nn.Module.training` is not boolean
* return type of `torch.Tensor.size()` is `Tuple[Unknown]`.
---
related issues.
https://github.com/pytorch/pytorch/issues/23731, https://github.com/pytorch/pytorch/issues/32824, https://github.com/pytorch/pytorch/issues/31753
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33762
Differential Revision: D20118884
Pulled By: albanD
fbshipit-source-id: 41557d66674a11b8e7503a48476d4cdd0f278eab
Summary:
Resolve https://github.com/pytorch/pytorch/issues/33699
`torch/__init__.pyi` will be generated like
```python
# TODO: One downside of doing it this way, is direct use of
# torch.tensor.Tensor doesn't get type annotations. Nobody
# should really do that, so maybe this is not so bad.
class Tensor:
    requires_grad: _bool = ...
    grad: Optional[Tensor] = ...
    # some methods here...
    @overload
    def bernoulli_(self, p: _float=0.5, *, generator: Generator=None) -> Tensor: ...
    def bfloat16(self) -> Tensor: ...
    def bincount(self, weights: Optional[Tensor]=None, minlength: _int=0) -> Tensor: ...
    # some methods here...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33747
Differential Revision: D20090316
Pulled By: ngimel
fbshipit-source-id: b9ce4c0d4ef720c94ccac0a0342a012e8cf3af0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31116
Changelist:
- remove BUILD_NAMEDTENSOR macro
- remove torch._C._BUILD_NAMEDTENSOR
- remove all python behavior that relies on torch._C._BUILD_NAMEDTENSOR
Future:
- In the next diff, I will remove all usages of
ATen/core/EnableNamedTensor.h since that header doesn't do anything
anymore
- After that, we'll be done with the BUILD_NAMEDTENSOR removal.
Test Plan: - run CI
Differential Revision: D18934951
Pulled By: zou3519
fbshipit-source-id: 0a0df0f1f0470d0a01c495579333a2835aac9f5d
Summary:
I've typed some attributes from ee920b92c4/torch/csrc/autograd/python_variable.cpp (L490) that were not included in the stubs so that MyPy will be aware of them. I made sure to only add those attributes that are mentioned somewhere in the documentation. If there are attributes mentioned in the documentation that are not meant to be part of the public API (or the opposite), please let me know. I've also made sure that attributes that can't be set are typed as read-only properties. If setting `dtype`, `shape`, `device` or `names` directly is not part of the public API, let me know and I'll make them properties as well.
I've also added `__len__`, `__iter__` and `__contains__`, which means MyPy will no longer complain about `len(t)`, `t1 in t2` and `for t1 in t2`.
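A minimal sketch of what the added stub entries might look like in the .pyi (illustrative, not the exact generated file):

```python
from typing import Iterator

class Tensor:
    # read-only properties for attributes that cannot be set
    @property
    def is_sparse(self) -> bool: ...
    @property
    def is_quantized(self) -> bool: ...

    # container protocol, so len(t), `for t1 in t2`, and
    # `t1 in t2` all type-check
    def __len__(self) -> int: ...
    def __iter__(self) -> Iterator["Tensor"]: ...
    def __contains__(self, item: object) -> bool: ...
```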
Shameless plug: I have another typing-related PR here that needs review: https://github.com/pytorch/pytorch/pull/27445
Fixes https://github.com/pytorch/pytorch/issues/28457
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28578
Reviewed By: lerks
Differential Revision: D18113954
Pulled By: fmassa
fbshipit-source-id: 0b69a2966d22054d8d87392f19ec5aa3918773bc
Summary:
Replaces fused TH kernels with a 2-liner of regular Tensor functions.
Benchmarking revealed that performance improves compared to PyTorch 1.2.
Refs: https://github.com/pytorch/pytorch/issues/24631, https://github.com/pytorch/pytorch/issues/24632, https://github.com/pytorch/pytorch/issues/24764, https://github.com/pytorch/pytorch/issues/24765
VitalyFedyunin
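The operator being benchmarked below is soft margin loss; the composite "2-liner" formulation referred to above amounts to mean(log1p(exp(-target * input))), sketched here in plain Python as an assumption about the math (not the actual kernel code):

```python
import math

def soft_margin_loss(input, target):
    # mean over elements of log(1 + exp(-target * input)),
    # computed with log1p for numerical stability
    terms = [math.log1p(math.exp(-t * x)) for x, t in zip(input, target)]
    return sum(terms) / len(terms)
```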
### Benchmarking results on my laptop:
## 1.4.0a0+f63c9e8 output
```
PyTorch version: 1.4.0a0+f63c9e8
CPU Operator sanity check:
tensor(0.5926, grad_fn=<MeanBackward0>)
tensor([-0.0159, -0.0170, -0.0011, -0.0083, -0.0140, -0.0217, -0.0290, -0.0262,
-0.0078, -0.0129])
double backward
tensor(-0.1540, grad_fn=<SumBackward0>)
ok
GPU Operator sanity check:
tensor(0.5601, device='cuda:0', grad_fn=<MeanBackward0>)
tensor([-0.0393, -0.0316, -0.0233, -0.0140, -0.0141, -0.0161, -0.0322, -0.0238,
-0.0054, -0.0151], device='cuda:0')
double backward
tensor(-0.2148, device='cuda:0', grad_fn=<SumBackward0>)
ok
CPU warmup 1000 took 9.025700273923576e-05
CPU warmup 10000 took 0.0009383050055475906
CPU warmup 100000 took 0.0015631120040779933
CPU warmup TOTAL time 0.0026368020044174045
CPU forward 1000 took 6.919399311300367e-05
CPU forward 10000 took 0.00014462800754699856
CPU forward 100000 took 0.0011234670091653243
CPU forward 1000000 took 0.014555767003912479
CPU forward 10000000 took 0.13409724000666756
CPU forward 100000000 took 1.246048310000333
CPU forward TOTAL time 1.3961777170043206
CPU for- & backward 1000 took 0.0003219560021534562
CPU for- & backward 10000 took 0.00037290599721018225
CPU for- & backward 100000 took 0.001975035003852099
CPU for- & backward 1000000 took 0.02621342398924753
CPU for- & backward 10000000 took 0.2944270490115741
CPU for- & backward 100000000 took 1.6856628700043075
CPU for- & backward TOTAL time 2.0091958299890393
GPU warmup 1000 took 0.0002462909906171262
GPU warmup 10000 took 9.991199476644397e-05
GPU warmup 100000 took 0.00034347400651313365
GPU warmup TOTAL time 0.0007382350013358518
GPU forward 1000 took 9.67290106927976e-05
GPU forward 10000 took 9.349700121674687e-05
GPU forward 100000 took 9.384499571751803e-05
GPU forward 1000000 took 0.0004975290066795424
GPU forward 10000000 took 0.0017606960027478635
GPU forward 100000000 took 0.003572814996005036
GPU forward TOTAL time 0.006185991995153017
GPU for- & backward 1000 took 0.00035818999458570033
GPU for- & backward 10000 took 0.0003240450023440644
GPU for- & backward 100000 took 0.0003223370003979653
GPU for- & backward 1000000 took 0.00036740700306836516
GPU for- & backward 10000000 took 0.0003690610028570518
GPU for- & backward 100000000 took 0.0003672500024549663
GPU for- & backward TOTAL time 0.002197896988946013
```
## 1.2 output
```
PyTorch version: 1.2.0
CPU Operator sanity check:
tensor(0.5926, grad_fn=<SoftMarginLossBackward>)
tensor([-0.0159, -0.0170, -0.0011, -0.0083, -0.0140, -0.0217, -0.0290, -0.0262,
-0.0078, -0.0129])
double backward
tensor(-0.1540, grad_fn=<SumBackward0>)
ok
GPU Operator sanity check:
tensor(0.5601, device='cuda:0', grad_fn=<SoftMarginLossBackward>)
tensor([-0.0393, -0.0316, -0.0233, -0.0140, -0.0141, -0.0161, -0.0322, -0.0238,
-0.0054, -0.0151], device='cuda:0')
double backward
tensor(-0.2148, device='cuda:0', grad_fn=<SumBackward0>)
ok
CPU warmup 1000 took 8.422900282312185e-05
CPU warmup 10000 took 0.00036992700188420713
CPU warmup 100000 took 0.003682684007799253
CPU warmup TOTAL time 0.004169487991021015
CPU forward 1000 took 5.521099956240505e-05
CPU forward 10000 took 0.00036948200431652367
CPU forward 100000 took 0.003762389998883009
CPU forward 1000000 took 0.03725024699815549
CPU forward 10000000 took 0.3614480490068672
CPU forward 100000000 took 3.6139175269927364
CPU forward TOTAL time 4.016912263003178
CPU for- & backward 1000 took 0.0002734809968387708
CPU for- & backward 10000 took 0.0006605249946005642
CPU for- & backward 100000 took 0.005437346000690013
CPU for- & backward 1000000 took 0.051245586000732146
CPU for- & backward 10000000 took 0.5291594529990107
CPU for- & backward 100000000 took 5.23841712900321
CPU for- & backward TOTAL time 5.8253340990049765
GPU warmup 1000 took 0.0005757809994975105
GPU warmup 10000 took 0.0004058420017827302
GPU warmup 100000 took 0.0003764610009966418
GPU warmup TOTAL time 0.0013992580061312765
GPU forward 1000 took 0.0003543390048434958
GPU forward 10000 took 0.0003633670130511746
GPU forward 100000 took 0.0004807310033356771
GPU forward 1000000 took 0.0005875999922864139
GPU forward 10000000 took 0.0016903509967960417
GPU forward 100000000 took 0.014400018990272656
GPU forward TOTAL time 0.0179396449966589
GPU for- & backward 1000 took 0.0006167769897729158
GPU for- & backward 10000 took 0.0006845899915788323
GPU for- & backward 100000 took 0.000631830989732407
GPU for- & backward 1000000 took 0.0010741150035755709
GPU for- & backward 10000000 took 0.0017265130009036511
GPU for- & backward 100000000 took 0.014847910992102697
GPU for- & backward TOTAL time 0.01965981800458394
```
### Code used for performance test
```
import torch
import torch.nn.functional as F
import torch.nn as nn
from timeit import default_timer

torch.manual_seed(0)
cpu = torch.device('cpu')
gpu = torch.device('cuda')
loss_fn = F.soft_margin_loss

def run_benchmark(name, depth, require_grad, device, fn):
    total_start = default_timer()
    for i in range(3, 3 + depth):
        start = default_timer()
        n = 10 ** i
        a = torch.rand(n, requires_grad=require_grad, device=device)
        b = torch.rand(n, device=device)
        fn(a, b)
        end = default_timer()
        print('{} {} took {}'.format(name, n, end-start))
    total_end = default_timer()
    print('{} TOTAL time {}'.format(name, total_end-total_start))

def fwd_only(a, b):
    out = loss_fn(a, b)

def fwd_bck(a, b):
    out = loss_fn(a, b)
    out.backward()

def sanity_check(name, device):
    print('{} Operator sanity check:'.format(name))
    a = torch.rand(10, requires_grad=True, device=device)
    b = torch.rand(10, device=device)
    out = loss_fn(a, b)
    print(out)
    out.backward()
    print(a.grad)
    print('double backward')
    loss = loss_fn(a, b)
    loss2 = torch.autograd.grad(loss, a, create_graph=True)
    z = loss2[0].sum()
    print(z)
    z.backward()
    print('ok')
    print()

print('PyTorch version:', torch.__version__)
sanity_check('CPU', cpu)
sanity_check('GPU', gpu)
print()
run_benchmark('CPU warmup', 3, False, cpu, fwd_only)
run_benchmark('CPU forward', 6, False, cpu, fwd_only)
run_benchmark('CPU for- & backward', 6, True, cpu, fwd_bck)
print()
run_benchmark('GPU warmup', 3, False, gpu, fwd_only)
run_benchmark('GPU forward', 6, False, gpu, fwd_only)
run_benchmark('GPU for- & backward', 6, True, gpu, fwd_bck)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27673
Differential Revision: D17889288
Pulled By: ezyang
fbshipit-source-id: 9ddffe4dbbfab6180847a8fec32443910f18f0a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27294
Fixes #27291
I'm a little annoyed that I have to reintroduce manual binding code. But it's
probably not a good idea to teach the codegen how to do fastpath functions
(is it?)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17763486
Pulled By: ezyang
fbshipit-source-id: 5793b53e2db80b044e57faae325a95c649d9d459
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25475
I got sucked into this rabbit hole when I was trying to understand
what I should do with TensorTypeId occurrences in
torch/csrc/utils/tensor_new.cpp. I eventually concluded that all of my problems
were because Tensor.new_empty was hand implemented and not actually a native
function. So I made it a native function.
There are a bunch of other new_* functions which should get this
treatment, but I'm sending out this PR just to show how it can
be done.
The general recipe:
1. Implement a concept of TensorOptions merging (TensorOptions::merge_in).
This represents the notion of taking a tensor, but "overriding" some
of its values with specific overrides. One subtlety here is how
devices get merged; see the comments for what our existing behavior is,
and how I preserve it.
2. Implement new_empty as a native function, using options merging.
3. Add another special case to Python binding generation to treat new_*
similar to *_like (i.e., handle TensorOptions correctly). The logic
here is probably wrong, actually; we should codegen TensorOptions
correctly no matter what happens, but new_empty follows the same
pattern as empty_like so I opted not to touch this code too much.
4. Delete the now defunct manual binding code.
5. Delete manual type annotations that are no longer necessary since
we're going through native.
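Step 1's merge semantics can be sketched in pure Python (illustrative only; the field names are hypothetical and the device-merging subtlety mentioned above is glossed over, so this is not the real TensorOptions API):

```python
def merge_in(base, override):
    """Sketch of TensorOptions::merge_in semantics: fields explicitly
    set in `override` win; unset fields (None) fall back to `base`."""
    merged = dict(base)
    for key, value in override.items():
        if value is not None:
            merged[key] = value
    return merged

# A new_empty-style call: start from the source tensor's options, but
# let the caller override e.g. the dtype while inheriting the device.
src_options = {'dtype': 'float32', 'device': 'cuda:0', 'layout': 'strided'}
call_options = {'dtype': 'int64', 'device': None, 'layout': None}
print(merge_in(src_options, call_options))
# {'dtype': 'int64', 'device': 'cuda:0', 'layout': 'strided'}
```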
I didn't handle memory format correctly here. I don't know if this function
should accept memory format; prior memory format patches didn't add support
for memory format to new_like. If we had put memory format in TensorOptions
this wouldn't have been a question.
ghstack-source-id: 89294185
Test Plan: sandcastle & ossci
Differential Revision: D17133000
fbshipit-source-id: 00f4e98bd5174f6fd54e8aba2910ea91824771d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23804
`output = tensor.align_to(names)` returns a view of `tensor` such that
`output.names = names`. Dimensions with the same names in `tensor` and
`output` have the same sizes; dimensions with new names have size 1.
The following must be true for this operation to succeed:
1) tensor.names must be a subsequence (not necessarily contiguous) of `names`
2) Aligning tensor.names to names must not change the absolute position from the
right of any unnamed dimension.
In practice, these constraints mean that aligning cannot transpose
names.
Some examples:
- Tensor[C].align_to(C) -> Tensor[C]
- Tensor[N].align_to([N, C]) -> Tensor[N, C]
- Tensor[H, W].align_to([N, H, W, C]) -> Tensor[N, H, W, C]
- Tensor[None].align_to([N, None]) -> Tensor[N, None]
- Tensor[N].align_to([N, None, None]) -> Tensor[N, None, None]
Examples of error cases:
- Tensor[W, H].align_to([N, H, W, C]) -> Error (not a subsequence)
- Tensor[None, H].align_to([None, H, W]) -> Error (would change the
absolute position from the right of a None dimension)
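The success rule above (subsequence check, size 1 for newly introduced names) can be sketched in plain Python; the `None`-position constraint is omitted for brevity, and this is not the real implementation:

```python
def aligned_sizes(names, sizes, target):
    """Compute output sizes for an align_to-style operation: `names`
    must be a (not necessarily contiguous) subsequence of `target`;
    existing dims keep their size, new names get size 1."""
    out = []
    i = 0  # cursor into (names, sizes)
    for name in target:
        if i < len(names) and names[i] == name:
            out.append(sizes[i])
            i += 1
        else:
            out.append(1)
    if i != len(names):
        raise ValueError('tensor names are not a subsequence of target')
    return out

# Tensor[H, W].align_to([N, H, W, C]) keeps H and W, adds size-1 N and C:
print(aligned_sizes(['H', 'W'], [32, 64], ['N', 'H', 'W', 'C']))
# [1, 32, 64, 1]
```

Note that `aligned_sizes(['W', 'H'], ..., ['N', 'H', 'W', 'C'])` raises, matching the "not a subsequence" error case above: alignment never transposes names.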
`torch.align_tensors(*tensors)` aligns the named dimensions of each
tensor according to the alignment rules so that they can be used in an
operation. More concretely, it aligns each tensor to the
longest names among the names of the tensors in `tensors`.
This allows users to emulate "broadcasting by names", which is one of
the things named tensors tries to enable. Here is an example:
```
imgs: Tensor[N, C, H, W]
scale: Tensor[N]
// Doesn't work because we don't do broadcasting by alignment by default
imgs * scale
// Does work
imgs, scale = torch.align_tensors(imgs, scale)
imgs * scale
```
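The "align everything to the longest names" rule can be sketched the same way (pure Python, not the real implementation):

```python
def alignment_target(*names_lists):
    """Sketch of torch.align_tensors' target choice: every tensor is
    aligned to the longest names list among the inputs, and each
    tensor's names must form a subsequence of that target."""
    target = max(names_lists, key=len)
    for names in names_lists:
        it = iter(target)
        if not all(name in it for name in names):  # subsequence check
            raise ValueError('{} cannot be aligned to {}'.format(names, target))
    return target

# imgs: [N, C, H, W], scale: [N] -> both get aligned to [N, C, H, W]
print(alignment_target(['N', 'C', 'H', 'W'], ['N']))
# ['N', 'C', 'H', 'W']
```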
Future:
- Consider allowing broadcasting by names by default.
Test Plan:
- The diff looks pretty large but more than half of it is testing.
- new tests [namedtensor ci]
Differential Revision: D16657927
Pulled By: zou3519
fbshipit-source-id: e2f958bf5146c8ee3b694aba57d21b08e928a4e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24184
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D16764168
Pulled By: ezyang
fbshipit-source-id: cc252a860fd7e4b7fb2b95c5d9fcdbf6935ffeb6
Summary:
I added default `weight` and `reduction` parameters to the `binary_cross_entropy_with_logits` function. These defaults already exist in Python and in the C++ `binary_cross_entropy` function.
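As a sketch of what those defaults mean numerically, here is a scalar Python re-implementation (not the PyTorch kernel) with the same default parameter values:

```python
import math

def bce_with_logits(x, y, weight=None, reduction='mean'):
    """Scalar sketch of binary_cross_entropy_with_logits with the
    default weight=None and reduction='mean' parameters."""
    losses = []
    for xi, yi in zip(x, y):
        # Numerically stable form: max(x, 0) - x*y + log(1 + exp(-|x|))
        losses.append(max(xi, 0) - xi * yi + math.log1p(math.exp(-abs(xi))))
    if weight is not None:
        losses = [w * l for w, l in zip(weight, losses)]
    if reduction == 'mean':
        return sum(losses) / len(losses)
    if reduction == 'sum':
        return sum(losses)
    return losses  # reduction='none'

print(round(bce_with_logits([0.0], [0.0]), 4))
# 0.6931  (i.e. log 2, the loss at a zero logit)
```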
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21336
Differential Revision: D15628917
Pulled By: ezyang
fbshipit-source-id: 38e5f53851125238842df1bd71cb6149c8603be1
Summary:
`Tensor.is_cuda` and `Tensor.is_leaf` are not predicate functions but `bool` attributes. This patch fixes the type hints in `torch/__init__.pyi` for those attributes.
```diff
- def is_cuda(self) -> bool: ...
+ is_cuda: bool
- def is_leaf(self) -> bool: ...
+ is_leaf: bool
```
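To illustrate why the distinction matters, here is a toy class (not PyTorch code) shaped like the old, incorrect stub. A bound method object is always truthy, so code that mistakes the attribute for a method, or vice versa, silently misbehaves:

```python
class MethodStyle:
    """Toy stand-in for the old stub shape, where is_cuda was a method."""
    def is_cuda(self) -> bool:
        return False

t = MethodStyle()
# The bound method object itself is truthy, so this branch is taken
# even though the "value" it would return is False:
assert t.is_cuda
assert t.is_cuda() is False
```

With the corrected stub, `t.is_cuda` type-checks as a plain `bool`, matching how the attribute is actually used at runtime.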
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21192
Differential Revision: D15592766
Pulled By: soumith
fbshipit-source-id: 8c4ecd6939df8b8a8a19e1c9db6d40193bca7e4a
Summary:
Make it possible to construct a pinned-memory tensor without creating a storage first and without calling the `pin_memory()` function. It is also faster, as the copy operation is unnecessary.
Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```
Part of the bigger `Remove Storage` plan.
Now compatible with both TorchScript forms:
` _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"), pin_memory=False)`
and
` _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"))`
The same was checked for all similar functions (`rand_like`, `empty_like`, and others).
This is a fixed version of #18455.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18952
Differential Revision: D14801792
Pulled By: VitalyFedyunin
fbshipit-source-id: 8dbc61078ff7a637d0ecdb95d4e98f704d5450ba
Summary:
Make it possible to construct a pinned-memory tensor without creating a storage first and without calling the `pin_memory()` function. It is also faster, as the copy operation is unnecessary.
Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```
Part of the bigger `Remove Storage` plan.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18455
Reviewed By: ezyang
Differential Revision: D14672084
Pulled By: VitalyFedyunin
fbshipit-source-id: 9d0997ec00f59500ee018f8b851934d334012124
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to noqa a number of imports in __init__. Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it. Left for future work.
Be careful! flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments. flake8-3 will
report an import unused; flake8-2 will not. For now, I just
noqa'd all these sites.
All the changes were done by hand.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16751
This was made more complicated by the fact that ivalue::IntList
is a thing. So I had to fix all of the sites where we were referring
to IValue post facto.
The following codemods were run, in this order:
```
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntList IntArrayRef
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntArrayRef::create IntList::create
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in ivalue::IntArrayRef ivalue::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in Tag::IntArrayRef Tag::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in isIntArrayRef isIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in toIntArrayRef toIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'Shared<IntArrayRef>' 'Shared<IntList>'
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'intrusive_ptr<IntArrayRef>' 'intrusive_ptr<IntList>'
```
Some manual fixups were done afterwards; they can be reviewed separately
at https://github.com/pytorch/pytorch/pull/16752
Reviewed By: dzhulgakov
Differential Revision: D13954363
fbshipit-source-id: b5c40aacba042402155a2f5a229fa6db7992ac64
Summary:
So that things like below can be JITable, and available in C++ API:
```python
import torch

@torch.jit.script
def f(x, y, z):
    x.index_add(0, y, z)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12413
Differential Revision: D13899948
Pulled By: suo
fbshipit-source-id: b0006b4bee2d1085c813733e1037e2dcde4ce626
Summary:
We have:
- This is an initial stab at creating a type stub `torch/__init__.pyi`.
- This is only tested on Python 3, since that's the only Python version mypy
works on.
- So far, we only aim at doing this for torch functions and torch.Tensor.
- Quite a few methods and functions have to be typed manually. These are
done in `torch/__init__.pyi.in`
For me, PyCharm (the free edition) didn't seem to flag errors in the .pyi when opening it, and it was able to show the type hints for the few functions I tried. However, I don't use PyCharm for my usual PyTorch activities, so I didn't test this extensively.
An example of a generated PYI is at [this gist](https://gist.github.com/ezyang/bf9b6a5fa8827c52152858169bcb61b1).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12500
Differential Revision: D13695553
Pulled By: ezyang
fbshipit-source-id: 4566c71913ede4e4c23ebc4a72c17151f94e8e21