Summary:
Previously, `at::native::embedding` implicitly assumed that the `weight` argument would be 1-D or greater. Given a 0-D tensor, it would segfault. This change makes it throw a RuntimeError instead.
Fixes https://github.com/pytorch/pytorch/issues/41780
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42550
Reviewed By: smessmer
Differential Revision: D23040744
Pulled By: albanD
fbshipit-source-id: d3d315850a5ee2d2b6fcc0bdb30db2b76ffffb01
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41656
For the CPU version, this is a regression introduced in https://github.com/pytorch/pytorch/issues/10980 which vectorized the `grid_sampler_2d` implementation. It uses the AVX2 gather intrinsic which for `float` requires 32-bit indexing to match the number of floats in the AVX register. There is also an `i64gather_ps` variant but this only utilizes half of the vector width so would be expected to give worse performance in the more likely case where 32-bit indexing is acceptable. So, I've left the optimised AVX version as-is and reinstated the old non-vectorized version as a fallback.
For the CUDA version, this operation has never supported 64-bit indexing, so this isn't a regression. I've templated the kernel on index type and added 64-bit variants. However, I gather that in some places a simple `TORCH_CHECK(canUse32BitIndexMath(...))` is used instead, so there is a decision to be made here.
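To make the 32-bit versus 64-bit indexing question concrete, here is a minimal pure-Python sketch that checks whether the largest element offset of a strided tensor fits in a signed 32-bit integer. The helper `can_use_32bit_index_math` is hypothetical and only named after the ATen check mentioned above; it is not that function's actual implementation.

```python
INT32_MAX = 2**31 - 1

def can_use_32bit_index_math(sizes, strides, max_elem=INT32_MAX):
    """Hypothetical helper: True if every element offset is below max_elem."""
    # An empty tensor has no offsets to address.
    if any(size == 0 for size in sizes):
        return True
    # Largest linear offset reached by any valid index.
    max_offset = sum((size - 1) * stride for size, stride in zip(sizes, strides))
    return max_offset < max_elem
```

A caller could dispatch to a 32-bit kernel when the check passes and fall back to a 64-bit variant otherwise, which is one of the two options discussed above.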
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41923
Reviewed By: glaringlee
Differential Revision: D22925931
Pulled By: zou3519
fbshipit-source-id: 920816107aae26360c5e7f4e9c729fa9057268bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42215
Specifically on https://github.com/pytorch/pytorch/pull/27477#discussion_r371402079
We would like include_last=True to be supported for the other reduction types as well, such as mean and max. The current gap causes further code fragmentation in DPER (https://www.internalfb.com/intern/diff/D22794469/).
More details: https://www.internalfb.com/intern/diff/D22794469/?dest_fbid=309597093427021&transaction_id=631457624153457
ghstack-source-id: 108733009
Test Plan:
```
buck test mode/dev-nosan //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu"
```
```
(base) [jianyuhuang@devbig281.ftw3.facebook.com: ~/fbsource/fbcode/caffe2/test] $ TORCH_SHOW_CPP_STACKTRACES=1 buck test mode/dev-nosan //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu" --print-passing-details
Parsing buck files: finished in 1.2 sec
Building: finished in 5.5 sec (100%) 10130/10130 jobs, 2 updated
Total time: 6.7 sec
More details at https://www.internalfb.com/intern/buck/build/dbdc2063-69d8-45cb-9146-308a9e8505ef
First unknown argument: --print-passing-details.
Falling back to TestPilot classic.
Trace available for this run at /tmp/testpilot.20200728-195414.1422748.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision cd2638f1f47250eac058b8c36561760027d16add fbpkg f88726c8ebde4ba288e1172a348c7f46 at Mon Jul 27 18:11:43 2020 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/887/t.par
Discovering tests
Running 1 test
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/844425097242375
✓ caffe2/test:nn - test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu (test_nn.TestNNDeviceTypeCPU) 0.162 1/1 (passed)
Test output:
> /data/users/jianyuhuang/fbsource/fbcode/buck-out/dev/gen/caffe2/test/nn#binary,link-tree/torch/_utils_internal.py:103: DeprecationWarning: This is a NOOP in python >= 3.7, its just too dangerous with how we write code at facebook. Instead we patch os.fork and multiprocessing which can raise exceptions if a deadlock would happen.
> threadSafeForkRegisterAtFork()
> /usr/local/fbcode/platform007/lib/python3.7/importlib/_bootstrap.py:219: ImportWarning: can't resolve package from __spec__ or __package__, falling back on __name__
and __path__
> return f(*args, **kwds)
> test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu (test_nn.TestNNDeviceTypeCPU) ... Couldn't download test skip set, leaving all tests enabled...
> ok
>
> ----------------------------------------------------------------------
> Ran 1 test in 0.162s
>
> OK
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/844425097242375
Summary (total time 5.54s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
Did _not_ run with tpx. See https://fburl.com/tpx for details.
```
Reviewed By: dzhulgakov
Differential Revision: D22801881
fbshipit-source-id: 80a624465727081bb9bf55c28419695a3d79c6e5
Summary:
Reland PR https://github.com/pytorch/pytorch/issues/40056
A new overload of upsample_linear1d_backward_cuda was added in a recent commit, so I had to add the nondeterministic alert to it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41538
Reviewed By: zou3519
Differential Revision: D22608376
Pulled By: ezyang
fbshipit-source-id: 54a2aa127e069197471f1feede6ad8f8dc6a2f82
Summary:
This PR implements a feature extension discussed in https://github.com/pytorch/pytorch/issues/41516.
I followed https://github.com/pytorch/pytorch/issues/22245 to add the new module. While I was at it, I also added the missing `extra_repr()` method to `Flatten`.
I see there are no unit tests for these modules. Should I add those too? If so, where is the best place to put them?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41564
Reviewed By: gchanan
Differential Revision: D22636766
Pulled By: albanD
fbshipit-source-id: f9efdefd3ffe7d9af9482087625344af8f990943
Summary:
This test function is confusing since our `assertEqual` behavior allows for tolerance to be specified, and this is a redundant mechanism.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41514
Reviewed By: ngimel
Differential Revision: D22569348
Pulled By: mruberry
fbshipit-source-id: 2b2ff8aaa9625a51207941dfee8e07786181fe9f
Summary:
BCELoss currently uses different broadcasting semantics than numpy. Since previous versions of PyTorch have thrown a warning in these cases telling the user that input sizes should match, and since the CUDA and CPU results differ when sizes do not match, it makes sense to upgrade the size mismatch warning to an error.
We can consider supporting numpy broadcasting semantics in BCELoss in the future if needed.
Closes https://github.com/pytorch/pytorch/issues/40023
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41426
Reviewed By: zou3519
Differential Revision: D22540841
Pulled By: ezyang
fbshipit-source-id: 6c6d94c78fa0ae30ebe385d05a9e3501a42b3652
Summary:
Closes https://github.com/pytorch/pytorch/issues/36977
This avoids the division by zero that was causing NaNs to appear in the output. `AvgPooling2d` and `AvgPooling3d` both had this issue on CPU and CUDA.
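As an illustration of the kind of guard involved, here is a minimal pure-Python sketch of averaging over a pooling window that has been clipped to the valid input range. It assumes the NaNs came from a window with no valid elements (0/0); the function name and the choice to return 0.0 are assumptions for illustration, and the actual kernels may differ.

```python
def window_average(values, start, end):
    """Average values[start:end] after clipping the window to the input.

    Returns 0.0 for an empty clipped window instead of dividing by zero.
    """
    start = max(start, 0)
    end = min(end, len(values))
    if start >= end:
        return 0.0  # assumed guard: nothing to average
    return sum(values[start:end]) / (end - start)
```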
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41368
Reviewed By: ailzhang
Differential Revision: D22520013
Pulled By: ezyang
fbshipit-source-id: 3ece7829f858f5bc17c2c1d905266ac510f11194
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39342
Many networks such as resnet have adds followed by relu. This op is the
first step in enabling this fused implementation.
Once we have the fused add_relu op, a JIT pass will be written to
replace add + relu patterns with add_relu.
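The semantics of a fused add followed by relu can be sketched in pure Python as a single pass over the inputs. This is only a stand-in on Python lists to show the intended result; the function name matches the op described above, but the body is an assumption and not the ATen implementation.

```python
def add_relu(a, b):
    """Elementwise max(a_i + b_i, 0) computed in one pass (illustrative only)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    # No intermediate list for the sum is materialized.
    return [max(x + y, 0.0) for x, y in zip(a, b)]
```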
Test Plan:
python test/test_nn.py TestAddRelu
Imported from OSS
Differential Revision: D21822397
fbshipit-source-id: 03df83a3e46ddb48a90c5a6f755227a7e361a0e8
Summary:
fix https://github.com/pytorch/pytorch/issues/40227
Removed the sorting operation in the ModuleDict class and updated the docstring.
Also removed a sort operation in the corresponding unit test, which would otherwise cause the test to fail.
BC Note: From Python 3.6 onward, a plain dict preserves the insertion order of its keys.
Example:
For a Python 3.6+ user who initializes a ModuleDict instance using a plain Python dict such as
{
"b": torch.nn.MaxPool2d(3),
"a": torch.nn.MaxPool2d(3)
}
the resulting ModuleDict preserves the order:
ModuleDict(
(b): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
(a): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
)
For a Python 3.5 user, the same input could instead produce:
ModuleDict(
(a): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
(b): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
)
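The ordering difference can be reproduced with a plain dict alone. In this small sketch the string values are placeholders standing in for the modules, and `sorted` stands in for the sorting step that was removed.

```python
# Placeholder strings stand in for torch.nn.MaxPool2d(3) instances.
modules = {"b": "MaxPool2d(3)", "a": "MaxPool2d(3)"}

insertion_order = list(modules)   # key order as written by the user
sorted_order = sorted(modules)    # key order the removed sort would have produced
```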
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40905
Differential Revision: D22357480
Pulled By: albanD
fbshipit-source-id: 0e2502769647bb64f404978243ca1ebe5346d573
Summary:
This PR aims at tackling https://github.com/pytorch/pytorch/issues/37823 by:
- ensuring that buffers will be used for normalization computation but won't be updated, when buffers are not None, and `track_running_stats=False`
- adding a corresponding unittest to ensure expected behaviour
Any feedback is welcome!
_Note: we might want to update the docstrings of `BatchNorm*d`, feel free to share any suggestion!_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38084
Differential Revision: D22047871
Pulled By: ezyang
fbshipit-source-id: 5acbcad9773e7901f26d625db71d43d7dc236d3e
Summary:
This allows registering hooks that will be executed for every module.
This idea arose in a discussion with tkerola, and niboshi kindly proposed this approach.
The use case for this is to avoid boilerplate code when registering the same hook for all the modules in a complex model. The internal use case was to allow every model to accept a NumPy array in the forward pass in a simpler way. Other use cases involve general mechanisms for plotting, tracing, and debugging.
Currently, this is shared for all the modules but this can be worked out to have the hooks shared only per type of module.
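A minimal pure-Python sketch of the idea: one registry shared by every module, consulted on each call. All names here (`register_global_forward_hook`, the toy `Module` class) are hypothetical and chosen for illustration; they are not the API added by this PR.

```python
_global_forward_hooks = []  # hypothetical registry shared by all modules

def register_global_forward_hook(hook):
    """Register hook(module, input, output); returns a function that removes it."""
    _global_forward_hooks.append(hook)
    return lambda: _global_forward_hooks.remove(hook)

class Module:
    def forward(self, x):
        raise NotImplementedError

    def __call__(self, x):
        out = self.forward(x)
        # Every module runs the shared hooks after its own forward.
        for hook in list(_global_forward_hooks):
            result = hook(self, x, out)
            if result is not None:
                out = result
        return out

class Double(Module):
    def forward(self, x):
        return 2 * x

class AddOne(Module):
    def forward(self, x):
        return x + 1
```

Registering a single hook is then enough to observe every module, with no per-module registration code.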
If this functionality is not needed feel free to close the PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38972
Differential Revision: D22091364
Pulled By: albanD
fbshipit-source-id: 204ff5f9e119eff5bdd9140c64cb5dc467bb23a2
Summary:
This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.
In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
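The "both or neither" rule can be sketched as a small validation helper. The helper name and the default tolerance values below are assumptions for illustration, not the values used by the test suite.

```python
def resolve_tolerances(atol=None, rtol=None, default_atol=1e-5, default_rtol=1e-6):
    """Return (atol, rtol), requiring that both or neither are given."""
    if (atol is None) != (rtol is None):
        raise ValueError("specify both atol and rtol, or neither")
    if atol is None:
        # Neither was given: fall back to the (illustrative) defaults.
        return default_atol, default_rtol
    return atol, rtol
```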
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872
Differential Revision: D21740237
Pulled By: mruberry
fbshipit-source-id: acbc027aa1d7877a49664d94db9a5fff91a07042
Summary:
Fix https://github.com/pytorch/pytorch/issues/38764
The current problem is that the `top_diff` and `top_mask` pointers are shifted cumulatively inside the for-n and for-c loops. This may cause overflow and illegal memory access when a loop runs more than once, that is, when n > 65535 or c > 65535 (the case in https://github.com/pytorch/pytorch/issues/38764). Since neither n > 65535 nor c > 65535 is common, the bug had not been seen before. The simple fix is to use new pointer variables for the n and c offsets instead of directly modifying `top_diff` or `top_mask`.
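The difference between the two patterns can be shown with integer offsets in pure Python. This is a sketch of the general bug class under simplified assumptions (a single strided loop, with `start` and `step` playing the role of a block index and grid size); the function names are hypothetical and the real kernel is CUDA.

```python
def offsets_accumulated(start, count, step, stride):
    """Buggy pattern: the base pointer itself is shifted on every iteration."""
    base, seen = 0, []
    for n in range(start, count, step):
        base += n * stride          # shift piles up across iterations
        seen.append(base)
    return seen

def offsets_fresh(start, count, step, stride):
    """Fixed pattern: a new variable holds the offset; the base is untouched."""
    base, seen = 0, []
    for n in range(start, count, step):
        ptr = base + n * stride     # recomputed from the unmodified base
        seen.append(ptr)
    return seen
```

With a single iteration the two agree, which is consistent with the bug staying hidden until a loop ran more than once.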
However, I think the current nchw max_pool2d GPU impl still has plenty of room for performance improvement. We can check that in a later PR if needed.
Slightly clean up the indentation. Also add tests to use CPU impl as a reference check.
cc skrah
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38953
Differential Revision: D21721930
Pulled By: ezyang
fbshipit-source-id: fef7d911d814f8ed9fd67c60cabe5d52f8fd3d57
Summary:
This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.
In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872
Differential Revision: D21717199
Pulled By: mruberry
fbshipit-source-id: 9feb856f94eee911b44f6c7140a1d07c1b026d3a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/38839. Previously, if the magnitude of the input values was large, the `log(sum)` value was essentially ignored when computing `max+log(sum)`. Now the result is computed as `x-max-log(sum)`, which has a better chance of preserving accuracy.
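The effect of the evaluation order can be reproduced in pure Python with double-precision floats. This sketch assumes the earlier form subtracted the precomputed `max+log(sum)` from `x`; the function names are illustrative.

```python
import math

def log_softmax_naive(xs):
    """Old ordering: x - (max + log(sum)); log(sum) is absorbed by a large max."""
    m = max(xs)
    log_sum = math.log(sum(math.exp(x - m) for x in xs))
    return [x - (m + log_sum) for x in xs]

def log_softmax_stable(xs):
    """New ordering: (x - max) - log(sum); the small term survives."""
    m = max(xs)
    log_sum = math.log(sum(math.exp(x - m) for x in xs))
    return [(x - m) - log_sum for x in xs]
```

For two equal inputs of magnitude 1e17, the naive form returns 0.0 for each element, while the reordered form returns -log(2), which is the correct value.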
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38945
Differential Revision: D21712483
Pulled By: ngimel
fbshipit-source-id: c1a3599ed981ba7a7fd130cbd7040a706b7eace0
Summary:
CC ezyang xw285cornell sunway513
Commit 59d92e442b (https://github.com/pytorch/pytorch/issues/38557) has caused this test to regularly fail on ROCm CI gfx900 hosts. Skipping test until root cause analysis can complete.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38724
Differential Revision: D21645815
Pulled By: xw285cornell
fbshipit-source-id: 4087e9565710c271ca5c026a5ae0c5132e56f44d
Summary:
Per title.
We move all the individual gradient norms to a single device before stacking (a no-op if all the gradients are already on a single device). `clip_coef` is copied to the device of each gradient, which may be suboptimal as there could be multiple copies, but it is no worse than when we were synchronizing for each parameter. In the simple case of all gradients on a single device, there should be no synchronization.
Also, we no longer error out if parameter list is empty or none of the parameters have gradients, and return 0 total_norm instead.
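The overall computation can be sketched in pure Python, with lists of floats standing in for gradient tensors. This is a simplified sketch for the 2-norm only; the function name, the `eps` value, and the in-place scaling loop are assumptions for illustration and ignore devices entirely.

```python
import math

def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Scale grads (list of lists of floats) in place; return the total 2-norm."""
    if not grads:
        return 0.0  # empty input: report a zero norm instead of raising
    # Norm of each gradient, then the norm of those norms.
    norms = [math.sqrt(sum(g * g for g in grad)) for grad in grads]
    total_norm = math.sqrt(sum(n * n for n in norms))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= clip_coef
    return total_norm
```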
Fixes https://github.com/pytorch/pytorch/issues/38605
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38615
Reviewed By: ailzhang
Differential Revision: D21634588
Pulled By: ngimel
fbshipit-source-id: ea4d08d4f3445438260052820c7ca285231a156b
Summary:
Edit: this has been updated to reflect the PR's current status, which has changed after review.
This PR updates the behavior of assertEqual, assertNotEqual, and assert_allclose to be consistent with each other and with torch.isclose. It also corrects several bugs in the current implementations and adds extensive testing and comments.
These updates follow from changes to assertEqual like https://github.com/pytorch/pytorch/pull/34258 and https://github.com/pytorch/pytorch/pull/37069, and from our discussion of torch.isclose for complex tensors (see https://github.com/pytorch/pytorch/issues/36462), where we decided to implement a NumPy-compatible mathematical notion of "closeness" for complex tensors that is not a great fit for our testing framework.
The detailed changelist is:
- New test framework functions for comparing tensors and scalars
  - Tensors are compared using isclose; the real and imaginary parts of complex tensors are compared independently
  - Scalars are compared using the same algorithm
  - assertEqual and assert_allclose now use this common comparison function, instead of each implementing their own with divergent behavior
  - assertEqual-like debug messages are now available for all tensor and scalar comparisons, with additional context when comparing the components of sparse, quantized, and complex tensors
  - Extensive testing of the comparison behavior and debug messages
- Small updates
  - assertEqual now takes an "exact_device" argument, analogous to "exact_dtype", which should be useful in multidevice tests
  - assertEqual now takes an "equal_nan" argument for argument consistency with torch.isclose
  - assertEqual no longer takes the "allow_inf" keyword, which misleadingly only applied to scalar comparisons, was only ever set (rarely) to true, and is not supported by torch.isclose
- Bug fixes
  - the exact_dtype attribute has been removed (no longer needed after https://github.com/pytorch/pytorch/pull/38103)
  - message arguments passed to assertEqual are now handled correctly
  - bool x other dtype comparisons are now supported
  - uint8 and int8 tensor comparisons now function properly
  - rtol for integer comparisons is now supported (default is zero)
  - rtol and atol for scalar comparisons are now supported
  - complex scalar comparisons are now supported, analogous to complex tensor comparisons
  - assertNotEqual is now equivalent to the logical negation of assertEqual
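The comparison rule described in this list can be sketched for scalars in pure Python. It uses the isclose-style test `|a - b| <= atol + rtol * |b|` and compares the real and imaginary parts of complex values independently. The function names are hypothetical, and this is not the test framework's actual code.

```python
import math

def is_close(a, b, rtol=0.0, atol=0.0, equal_nan=False):
    """isclose-style comparison of two real scalars."""
    if math.isnan(a) or math.isnan(b):
        return equal_nan and math.isnan(a) and math.isnan(b)
    if a == b:
        return True  # also covers equal infinities
    if math.isinf(a) or math.isinf(b):
        return False
    return abs(a - b) <= atol + rtol * abs(b)

def complex_is_close(a, b, **kwargs):
    """Compare real and imaginary parts independently."""
    a, b = complex(a), complex(b)
    return is_close(a.real, b.real, **kwargs) and is_close(a.imag, b.imag, **kwargs)
```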
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37294
Differential Revision: D21596830
Pulled By: mruberry
fbshipit-source-id: f2576669f7113a06f82581fc71883e6b772de19b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35620
Python 2 has reached end-of-life and is no longer supported by PyTorch.
`self.subTest` can be used directly in Python 3.
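For reference, direct use of `self.subTest` looks like the following small standard-library example. The test case and the `run_square_test` helper are made up for illustration.

```python
import io
import unittest

class SquareTest(unittest.TestCase):
    def test_square(self):
        for value, expected in [(0, 0), (2, 4), (-3, 9)]:
            # Each iteration is reported separately if it fails.
            with self.subTest(value=value):
                self.assertEqual(value * value, expected)

def run_square_test():
    """Run SquareTest quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SquareTest)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```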
Test Plan: CI
Differential Revision: D20842872
Pulled By: dreiss
fbshipit-source-id: 6ad42550c01e6959821ff07df767fc14b58c5a9e
Summary:
Add read/write vectorization to non-persistent softmax kernels only. At this point launch logic has minimal changes, and `ILP=vectorization=2` is always used (the code can handle other values, but `ILP=2` has been the most consistent performer).
Dispatch to persistent / non-persistent kernels is unchanged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36485
Differential Revision: D21477775
Pulled By: ngimel
fbshipit-source-id: 9ff7fd243695d7bbf4121390085b64db0bbdef35