Commit Graph

2119 Commits

Author SHA1 Message Date
Kurt Mohler
8bdce9bb74 Fix UntypedStorage.resize_ to keep same CUDA device index (#113386)
Fixes #113300

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113386
Approved by: https://github.com/albanD
2023-11-10 01:57:25 +00:00
Kurt Mohler
fd209543d5 Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377)
Part of #109802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111377
Approved by: https://github.com/albanD, https://github.com/aaronenyeshi
2023-11-01 16:10:09 +00:00
PyTorch MergeBot
ace2713d1e Revert "Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377)"
This reverts commit f1785373c0.

Reverted https://github.com/pytorch/pytorch/pull/111377 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111377#issuecomment-1784179040))
2023-10-29 17:41:55 +00:00
Nikita Shulga
b61efe1c2b Fix `torch.[size|stride](dim=None)` invocation (#111991)
Per documentation, one should be able to explicitly pass the dim argument as None to get the tensor size across all dimensions/strides, but before this change it was incorrectly interpreted as a named tensor call.

Modify `size` and `stride` signatures generated by `gen_pyi.py` to highlight that overload with `None` will return a Tuple, but one with `dim: _int` returns `int`.
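
To illustrate the two overloads described above, here is a minimal pure-Python sketch. `ShapeOnly` is a hypothetical stand-in class, not the actual stub generated by `gen_pyi.py`; it only shows how a `None` overload can return a tuple while an `int` overload returns a single `int`.

```python
from typing import Optional, Tuple, Union, overload


class ShapeOnly:
    """Hypothetical stand-in for a tensor that only stores a shape."""

    def __init__(self, *shape: int) -> None:
        self._shape = tuple(shape)

    @overload
    def size(self, dim: None = None) -> Tuple[int, ...]: ...

    @overload
    def size(self, dim: int) -> int: ...

    def size(self, dim: Optional[int] = None) -> Union[int, Tuple[int, ...]]:
        # An explicit dim=None behaves exactly like omitting the argument.
        if dim is None:
            return self._shape
        return self._shape[dim]


t = ShapeOnly(2, 3, 4)
print(t.size(), t.size(dim=None), t.size(dim=1))
```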

Add regression test to validate the behavior, and remove the check for asserts from two named tensors tests (NamedTensors are dead, aren't they?)

Fixes https://github.com/pytorch/pytorch/issues/111944
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111991
Approved by: https://github.com/zou3519
2023-10-26 04:14:35 +00:00
Kurt Mohler
f1785373c0 Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377)
Part of #109802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111377
Approved by: https://github.com/albanD
2023-10-26 02:39:06 +00:00
Nikita Shulga
7709382b50 Fix regression in torch.equal behavior for NaNs (#111699)
`torch.equal(x, x)` should return false if `x` is a tensor of floats, one of which is NaN.
So, it renders some of the optimizations proposed in https://github.com/pytorch/pytorch/pull/100024 invalid, though as a result `torch.equal` will become much slower for identical floating point tensors.
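
The underlying issue can be shown without PyTorch. The sketch below uses plain Python lists and hypothetical helper names; it only demonstrates why an "same object, so equal" shortcut gives the wrong answer once NaN is involved, and is not the code changed by this PR.

```python
import math


def elementwise_equal(a, b):
    """Compare two sequences element by element (IEEE semantics: NaN != NaN)."""
    if len(a) != len(b):
        return False
    return all(x == y for x, y in zip(a, b))


def equal_with_identity_fast_path(a, b):
    """Same comparison, but with a shortcut that skips the work when a is b."""
    if a is b:
        return True  # wrong whenever the data contains NaN
    return elementwise_equal(a, b)


values = [1.0, math.nan]
print(elementwise_equal(values, values))
print(equal_with_identity_fast_path(values, values))
```

The two functions disagree on `values`, which is exactly the regression the fast path introduced.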

Add regression test that calls torch.equal for tensor containing NaN

Fixes https://github.com/pytorch/pytorch/issues/111251

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111699
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-10-21 00:02:45 +00:00
CaoE
d1afb7d43d add Half support for multinomial on CPU (#104178)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104178
Approved by: https://github.com/jgong5, https://github.com/kulinseth, https://github.com/cpuhrsch
2023-10-20 19:16:04 +00:00
Evgeni Burovski
48989bc820 trace frames with np.ndarray (#110512)
Fixes #109604

Resubmit gh-109715 + several skips and small fixes to make tests pass.

The main fix here is by @ysiraichi : previously, dynamo did not resume tracing numpy ndarrays after a graph break.
While at it, fix several small issues Yukio's fix uncovers:

- graph break gracefully on numpy dtypes which do not map to torch.dtypes (uint16 etc)
- recognize array scalars in dynamo, treat them as 0D ndarrays
- make sure that iterating over torch.ndarray generates arrays not bare tensors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110512
Approved by: https://github.com/lezcano
2023-10-15 00:56:10 +00:00
CaoE
8713a1a363 add Half support for bernoulli on CPU (#104176)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104176
Approved by: https://github.com/mingfeima, https://github.com/cpuhrsch
2023-10-13 01:18:55 +00:00
Prachi Gupta
53a9ac534c Added decorator skipRocmIfTorchInductor and skipped failing tests (#107760)
This PR adds a skip decorator which will disable tests in CI for ROCm inductor workflow. This new workflow will be coming in via https://github.com/pytorch/pytorch/pull/110544

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107760
Approved by: https://github.com/jataylo, https://github.com/pruthvistony, https://github.com/atalman
2023-10-12 16:00:35 +00:00
Elias Ellison
cf1da9bd17 enable index add test (#111016)
Dynamo is swallowing a user exception when suppress_errors is set to True. There's an issue filed for that: https://github.com/pytorch/pytorch/issues/108798. In the meantime, we would still like the functionality in this test, which works without the default setting (don't suppress errors), to not regress.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111016
Approved by: https://github.com/yanboliang
2023-10-11 19:41:35 +00:00
eellison
fb4b9e9c8e Re-enable a couple of fixed tests (#110770)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110770
Approved by: https://github.com/yanboliang, https://github.com/int3, https://github.com/Skylion007
ghstack dependencies: #110651
2023-10-10 19:13:14 +00:00
eellison
c5f06b9753 Re-enable test_copy_transpose_math_view, neg_view/dce fix (#110651)
- neg view can just be lowered to neg() post functionalization
- we were treating all fallback kernels as not having side effects. we shouldn't dce mutating fallback kernels - either mutations induced by the reinplacing pass or clone_ with unsupported arguments (complex)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110651
Approved by: https://github.com/Chillee, https://github.com/jansel, https://github.com/malfet, https://github.com/Skylion007
2023-10-10 16:34:01 +00:00
jjsjann123
37567fdf31 Nvfuser cpp api deprecation attempt 2 (#110881)
attempting to re-try #110318 deprecating nvfuser c++ API

warning has been updated to TORCH_WARN_ONCE;
The warning thrown inside torch::jit::fuser::cuda::isEnabled() is turned off and will be deprecated when we pull out TorchScript integration in the follow-up PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110881
Approved by: https://github.com/davidberard98, https://github.com/NicolasHug
2023-10-10 08:07:03 +00:00
PyTorch MergeBot
bbdc8c7b05 Revert "deprecating nvfuser c++ API (#110318)"
This reverts commit bf0866fc16.

Reverted https://github.com/pytorch/pytorch/pull/110318 on behalf of https://github.com/davidberard98 due to too many warnings being thrown in torchvision https://github.com/pytorch/pytorch/issues/110857 ([comment](https://github.com/pytorch/pytorch/pull/110318#issuecomment-1753245449))
2023-10-09 15:41:50 +00:00
jjsjann123
bf0866fc16 deprecating nvfuser c++ API (#110318)
deprecating nvfuser c++ API

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110318
Approved by: https://github.com/davidberard98
2023-10-07 02:25:21 +00:00
eellison
3812f2e40c Preserve layout on like constructors (#110242)
Partially fixes `test_memory_format_factory_like_functions_preserve` with PYTORCH_TEST_WITH_INDUCTOR. Inductor preserves memory layouts for user-visible outputs as annotated on the fx graph that it is passed in. That graph is generated from running aot_autograd with decompositions. If the decompositions give incorrect strides, so will inductor.

This preserves the layout of `_like` operators when it corresponds to a `torch.memory_format`. It doesn't fix a) arbitrary permutations, b) striding of non-dense outputs. Both of these are lower-pri compared to preserving channels last. We would need either https://github.com/pytorch/pytorch/issues/92920 or a `to` variant that takes in a physical layout for arbitrary permutations. I converted the output of rand to the correct layout instead of passing the layout in so that this would compose with the `replace_random` pass, and because the two pointwise ops will get fused anyway.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110242
Approved by: https://github.com/int3
2023-10-02 23:53:55 +00:00
Moritz Hennen
09c598745c Rename torch._C._TensorBase to TensorBase (#109940)
I have gone ahead and implemented the renaming of the type `torch._C._TensorBase` to a non-private class name `TensorBase`.
The changes also include leaving `torch._C._TensorBase` as an alias to the new type: 70458768fb/torch/csrc/autograd/python_variable.cpp (L2196-L2197) both in the c++ code and in the corresponding `__init__.pyi.in` file:
70458768fb/torch/_C/__init__.pyi.in (L1522)

Fixes #109438

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109940
Approved by: https://github.com/ezyang
2023-09-25 19:10:22 +00:00
Jez Ng
063a62622b Add memory overlap check to meta_copy_ (#108989)
Fixes `test_copy_many_to_one`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108989
Approved by: https://github.com/eellison
2023-09-12 23:28:14 +00:00
Kurt Mohler
4c5e43574c Reland 2: Add PyObject preservation for UntypedStorage (#109039)
Relands #103907 after it was reverted. This PR makes the new `ignore_hermetic_tls` argument of `check_pyobj` optional to avoid causing a compilation error in torchdistx

Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109039
Approved by: https://github.com/ezyang
2023-09-12 22:26:05 +00:00
PyTorch MergeBot
41bd0fde7e Revert "Remove fixed skips (#108674)"
This reverts commit ab9fb03d6f.

Reverted https://github.com/pytorch/pytorch/pull/108674 on behalf of https://github.com/huydhn due to Sorry for picking this up a bit late, but with https://github.com/pytorch/pytorch/pull/108647 reverted, these tests are failing again. So we need to wait for the PR to reland before we can land this change ([comment](https://github.com/pytorch/pytorch/pull/108674#issuecomment-1715202692))
2023-09-12 08:04:32 +00:00
PyTorch MergeBot
59f605be57 Revert "Reland 2: Add PyObject preservation for UntypedStorage (#109039)"
This reverts commit 419e4e17a2.

Reverted https://github.com/pytorch/pytorch/pull/109039 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing linter job in trunk, probably due to a landrace ([comment](https://github.com/pytorch/pytorch/pull/109039#issuecomment-1715147020))
2023-09-12 07:26:11 +00:00
Kurt Mohler
419e4e17a2 Reland 2: Add PyObject preservation for UntypedStorage (#109039)
Relands #103907 after it was reverted. This PR makes the new `ignore_hermetic_tls` argument of `check_pyobj` optional to avoid causing a compilation error in torchdistx

Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109039
Approved by: https://github.com/ezyang
2023-09-12 01:19:40 +00:00
Li-Huai (Allan) Lin
b2cba439b4 Introduce Tensor overload to linspace and logspace (#104889)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104889
Approved by: https://github.com/zou3519
ghstack dependencies: #107958
2023-09-11 23:30:40 +00:00
PyTorch MergeBot
a7f5abeade Revert "Introduce Tensor overload to linspace and logspace (#104889)"
This reverts commit 57e5239321.

Reverted https://github.com/pytorch/pytorch/pull/104889 on behalf of https://github.com/clee2000 due to sorry have to revert this to revert https://github.com/pytorch/pytorch/pull/107958 ([comment](https://github.com/pytorch/pytorch/pull/104889#issuecomment-1714305768))
2023-09-11 17:33:48 +00:00
Li-Huai (Allan) Lin
57e5239321 Introduce Tensor overload to linspace and logspace (#104889)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104889
Approved by: https://github.com/zou3519
ghstack dependencies: #107958
2023-09-11 15:29:39 +00:00
Edward Z. Yang
137afe74e0 Don't fastpath conj copy when conj/neg bit mismatch (#108881)
Fixes https://github.com/pytorch/pytorch/issues/106051

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108881
Approved by: https://github.com/soulitzer
2023-09-08 20:44:43 +00:00
PyTorch MergeBot
68238606f3 Revert "Reland: Add PyObject preservation for UntypedStorage (#103907)"
This reverts commit 56b848157c.

Reverted https://github.com/pytorch/pytorch/pull/103907 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing torchdistx build which uses check_pyobj here 9c1b9f5cb2/src/python/torchdistx/_C/deferred_init.cc (L87) ([comment](https://github.com/pytorch/pytorch/pull/103907#issuecomment-1712121158))
2023-09-08 19:27:07 +00:00
Evgeni Burovski
1f20531939 fall back to eager on NotImplementedError (#107863)
Follow-up to https://github.com/pytorch/pytorch/pull/107710:

Help dynamo fall back to eager when compiling unimplemented numpy constructs:

- arrays of strings
- (arg){min, max} for complex types
- various arguments typed as NotImplemented (`np.ones(4, order="F")` etc)
- numpy functions which torch._numpy does not implement

To test, run (we do not implement arrays of strings)

```
import torch
import numpy as np

@torch.compile(fullgraph=False)
def fn():
    return np.asarray(["L", "U"])
```

and observe it compiles with fullgraph=False and fails with fullgraph=True

Fixes https://github.com/pytorch/pytorch/issues/107970

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107863
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-09-07 21:22:20 +00:00
eellison
ab9fb03d6f Remove fixed skips (#108674)
These no longer fail with TEST_WITH_TORCHINDUCTOR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108674
Approved by: https://github.com/desertfire
2023-09-07 17:36:56 +00:00
Kurt Mohler
56b848157c Reland: Add PyObject preservation for UntypedStorage (#103907)
This relands #97470 after #102553 reverted it. This PR attempts to fix the internal failure by avoiding an unnecessary intermediate storage buffer allocation in `c10::newStorageImplFromRefcountedDataPtr`.

Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103907
Approved by: https://github.com/ezyang
2023-09-07 04:24:11 +00:00
Kurt Mohler
3f88e3105f Reland: Remove remaining global set_default_dtype calls from tests (#108088)
Fixes #68972

Relands #107246

To avoid causing Meta-internal CI failures, this PR avoids always asserting that the default dtype is float in the `TestCase.setUp/tearDown` methods. Instead, the assert is only done if `TestCase._default_dtype_check_enabled == True`. `_default_dtype_check_enabled` is set to True in the `if __name__ == "__main__":` blocks of all the relevant test files that have required changes for this issue

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108088
Approved by: https://github.com/ezyang
2023-09-07 03:04:34 +00:00
PyTorch MergeBot
43527d41a2 Revert "Remove fixed skips (#108674)"
This reverts commit 518cfda2dd.

Reverted https://github.com/pytorch/pytorch/pull/108674 on behalf of https://github.com/huydhn due to Sorry for reverting this, but one test is failing on inductor 518cfda2dd, and it seems easier to revert this than disabling the test ([comment](https://github.com/pytorch/pytorch/pull/108674#issuecomment-1709310192))
2023-09-07 00:56:46 +00:00
eellison
518cfda2dd Remove fixed skips (#108674)
These no longer fail with TEST_WITH_TORCHINDUCTOR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108674
Approved by: https://github.com/desertfire
2023-09-06 22:33:43 +00:00
PyTorch MergeBot
161ea463e6 Revert "Remove remaining global set_default_dtype calls from tests (#107246)"
This reverts commit aa8ea1d787.

Reverted https://github.com/pytorch/pytorch/pull/107246 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/107246#issuecomment-1693838522))
2023-08-25 19:34:55 +00:00
Digant Desai
8a7a6867b9 [PyTorch][Tensor] Introduce tensor.dim_order (#106835)
Summary:
This is a stride based attribute for a tensor available in Python.

This can help inspect tensors generated using `torch.empty_permuted(.., physical_layout, ...)`, where physical_layout should match the dim_order returned here. `empty_permuted` will be renamed to use dim_order as the param name in the future. It also helps the Executorch export pipeline with implementing dim_order based tensors.
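
As a rough illustration of what a stride-based dim order means, the sketch below derives an order from strides in pure Python. `dim_order_from_strides` is a hypothetical helper, and its tie-breaking (lower dimension index first) is an assumption; the real `tensor.dim_order` may treat equal strides or size-1 dimensions differently.

```python
from typing import Sequence, Tuple


def dim_order_from_strides(strides: Sequence[int]) -> Tuple[int, ...]:
    """List dimensions from outermost (largest stride) to innermost (smallest)."""
    # sorted() is stable, so dimensions with equal strides keep index order.
    return tuple(sorted(range(len(strides)), key=lambda d: -strides[d]))


# Shape (2, 3, 4, 5), contiguous layout: strides are (60, 20, 5, 1).
print(dim_order_from_strides((60, 20, 5, 1)))
# Same shape in channels-last layout: strides are (60, 1, 15, 3).
print(dim_order_from_strides((60, 1, 15, 3)))
```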

Differential Revision: D48134476

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106835
Approved by: https://github.com/ezyang
2023-08-25 00:06:03 +00:00
Kurt Mohler
aa8ea1d787 Remove remaining global set_default_dtype calls from tests (#107246)
Fixes #68972

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107246
Approved by: https://github.com/ezyang
2023-08-24 16:10:48 +00:00
Aaron Gokaslan
660e8060ad [BE]: Update ruff to 0.285 (#107519)
This updates ruff to 0.285, which is faster and better, and fixes a bunch of false negatives with regard to f-strings.

I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :)
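
For readers unfamiliar with the pattern RUF017 targets, here is a small sketch with made-up data. Flattening with `sum(lists, [])` builds a new list at every step, so the total work grows quadratically with the number of elements; `itertools.chain.from_iterable` gives the same result in linear time.

```python
from itertools import chain

nested = [[1, 2], [3], [4, 5, 6]]

# Pattern flagged by the rule: each "+" copies everything accumulated so far.
quadratic = sum(nested, [])

# Linear alternative: iterate over every inner list exactly once.
linear = list(chain.from_iterable(nested))

print(quadratic == linear)
```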

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-22 23:16:38 +00:00
PyTorch MergeBot
d59a6864fb Revert "[BE]: Update ruff to 0.285 (#107519)"
This reverts commit 88ab3e4322.

Reverted https://github.com/pytorch/pytorch/pull/107519 on behalf of https://github.com/ZainRizvi due to Sorry, but this PR breaks internal tests. @ezyang, can you please help them get unblocked? It seems like one of the strings was prob accidentally modified ([comment](https://github.com/pytorch/pytorch/pull/107519#issuecomment-1688833480))
2023-08-22 19:53:32 +00:00
Aaron Gokaslan
88ab3e4322 [BE]: Update ruff to 0.285 (#107519)
This updates ruff to 0.285, which is faster and better, and fixes a bunch of false negatives with regard to f-strings.

I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-20 01:36:18 +00:00
Catherine Lee
bc053070f8 Mark test_gradient_extreme_cases as slow for inductor (#107189)
test_gradient_extreme_cases_* takes ~5 minutes on the inductor sm86 shard, and possibly even longer on the inductor workflow, where it is timing out right now (I'm not sure what the difference between the two is). Sometimes auto slow test detection isn't catching it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107189
Approved by: https://github.com/ZainRizvi
2023-08-15 22:03:00 +00:00
Sam Larsen
3d00170b20 [inductor] fix test_dim_function_empty (#106994)
Summary: Looks like the assert syntax was just wrong

Test Plan:
PYTORCH_TEST_WITH_INDUCTOR=1 python test/test_torch.py -k test_dim_function_empty
PYTORCH_TEST_WITH_AOT_EAGER=1 python test/test_torch.py -k test_dim_function_empty
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106994
Approved by: https://github.com/eellison
2023-08-11 21:38:53 +00:00
Kshiteej K
a899333ffc fix: nll_loss batch rule with negative ignore_idx (#106118)
We use python decompositions instead of writing our own for batching rules.

Fixes https://github.com/pytorch/pytorch/issues/105736

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106118
Approved by: https://github.com/lezcano, https://github.com/zou3519
2023-08-04 07:43:02 +00:00
Fuzzkatt
ae1c0f42a3 update tf32 thresholds for H100 (#105879)
Addresses tf32 threshold related failures from NVIDIA internal testing for following unit tests:

H100:
- test_nn.py: test_ConvTranspose2d_dilated_cuda_tf32, test_ConvTranspose2d_no_bias_cuda_tf32, test_Transformer_multilayer_coder_cuda_tf32
- test_torch.py: test_cdist_non_contiguous_batch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105879
Approved by: https://github.com/ezyang
2023-08-02 16:44:01 +00:00
Scott Wolchok
b435bff53a [PyTorch] Add tests for empty tensors w/storage null data_ptr (#101426)
Further investigation seems to show that changing this behavior (making empty tensors sometimes have non-null data_ptr) was the real problem with #98090 . Adding tests to lock down this behavior so we don't change it by accident again.

Differential Revision: [D45873002](https://our.internmc.facebook.com/intern/diff/D45873002/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101426
Approved by: https://github.com/zou3519
2023-07-27 05:19:42 +00:00
Nikita Karetnikov
eac9e1b35f [OpInfo] add reference and error inputs for multilabel_margin_loss (#105523)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105523
Approved by: https://github.com/ezyang
2023-07-23 02:16:29 +00:00
Justin Chu
4cc1745b13 [BE] f-stringify torch/ and scripts (#105538)
This PR is a follow up on the pyupgrade series to convert more strings to use f-strings using `flynt`.

- https://docs.python.org/3/reference/lexical_analysis.html#f-strings
- https://pypi.org/project/flynt/

Command used:

```
flynt torch/ -ll 120
flynt scripts/ -ll 120
flynt tools/ -ll 120
```

and excluded `collect_env.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-21 19:35:24 +00:00
Justin Chu
73e1455327 [BE] Enable ruff's UP rules and autoformat test/ (#105434)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434
Approved by: https://github.com/albanD
2023-07-19 20:36:06 +00:00
Kurt Mohler
fcb7d4b358 Mark bincount CUDA deterministic if weights are not given (#105244)
Fixes #98316

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105244
Approved by: https://github.com/mikaylagawarecki
2023-07-18 01:16:51 +00:00
Nikita Shulga
5837e95d30 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

These were reverted due to a conflict with the internal source repo.

Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export. deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert in `torch/optim/optimizer.py` that Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
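
The PEP 484 violation mentioned above is the "implicit Optional" pattern. The sketch below uses hypothetical function names to contrast the two spellings; both run identically, but recent mypy releases reject the first by default because the annotation does not admit `None`.

```python
from typing import Optional


def scale_implicit(x: float, factor: float = None) -> float:  # type: ignore[assignment]
    # Implicit Optional: the default is None but the annotation says float.
    return x * (1.0 if factor is None else factor)


def scale_explicit(x: float, factor: Optional[float] = None) -> float:
    # Explicit Optional: the annotation matches the default value.
    return x * (1.0 if factor is None else factor)


print(scale_implicit(2.0), scale_explicit(2.0, 3.0))
```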

Unrelated, to bypass CI failures due to the gcc9 dependency update in Ubuntu-18.04:
- Add hack to squash older libstdc++ from conda environment in favor of the one from the OS to `.ci/docker/install_conda.sh`
- Update bazel cuda builds to focal, as with libstdc++-6.0.32 bazel builds lose the ability to catch exceptions (probably because they link with cupti statically, but I could not find where it is done)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-15 20:30:20 +00:00
PyTorch MergeBot
15fd1ea118 Revert "[Reland] Update mypy to 1.4.1 (#105227)"
This reverts commit c9c4f8efc3.

Reverted https://github.com/pytorch/pytorch/pull/105227 on behalf of https://github.com/atalman due to trying to mitigate ci sev #105248 ([comment](https://github.com/pytorch/pytorch/pull/105227#issuecomment-1636510935))
2023-07-14 22:28:35 +00:00
Nikita Karetnikov
0c89596e4f [OpInfo] add reference and error inputs for multi_margin_loss (#104850)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104850
Approved by: https://github.com/ezyang
2023-07-14 21:16:09 +00:00
Nikita Shulga
c9c4f8efc3 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

These were reverted due to a conflict with the internal source repo.

Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export. deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert in `torch/optim/optimizer.py` that Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-14 20:45:12 +00:00
PyTorch MergeBot
3c5a494d7a Revert "Update mypy to 1.4.1 (#91983)"
This reverts commit 634659e262.

Reverted https://github.com/pytorch/pytorch/pull/91983 on behalf of https://github.com/malfet due to Its dependent change was reverted, so reverting this one as well, to keep CI clean ([comment](https://github.com/pytorch/pytorch/pull/91983#issuecomment-1636059709))
2023-07-14 15:59:16 +00:00
Kurt Mohler
f987d11fa7 Reland: Make torch.empty* deterministic by filling with NaN or max int (#104995)
Relands #101849 after #104302 reverted it.

torchrec PR https://github.com/pytorch/torchrec/pull/1269 fixes the torchrec failure that caused #101849 to be reverted

Part of #82004

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104995
Approved by: https://github.com/albanD
2023-07-13 22:18:03 +00:00
Nikita Shulga
634659e262 Update mypy to 1.4.1 (#91983)
Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export. deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91983
Approved by: https://github.com/kit1980, https://github.com/ZainRizvi, https://github.com/huydhn, https://github.com/thiagocrepaldi, https://github.com/aaronenyeshi
2023-07-13 16:30:36 +00:00
yanbing-j
053654b9cf Optimize scatter_add/scatter_reduce in BFloat16/Half data type in CPU backend (#103427)
### Description

This PR is to optimize scatter_add/scatter_reduce of BFloat16/Half data type in CPU backend, which is one task in https://github.com/pyg-team/pytorch_geometric/issues/7057. Main point is creating a buffer among threads to accumulate intermediate data as fp32 data type.
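
Why a wider accumulation buffer matters can be sketched in pure Python. Everything below is illustrative: `to_bf16` emulates bfloat16 rounding with `struct`, ordinary Python floats stand in for the fp32 buffer, and the code is single-threaded, so it does not reproduce the per-thread buffering of the actual kernel.

```python
import struct
from typing import List, Sequence


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (finite inputs only)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


def scatter_add_naive(dst: Sequence[float], index: Sequence[int],
                      src: Sequence[float]) -> List[float]:
    """Accumulate directly in bfloat16: every partial sum is rounded."""
    out = [to_bf16(v) for v in dst]
    for i, v in zip(index, src):
        out[i] = to_bf16(out[i] + to_bf16(v))
    return out


def scatter_add_wide(dst: Sequence[float], index: Sequence[int],
                     src: Sequence[float]) -> List[float]:
    """Accumulate in a wider buffer and round to bfloat16 once at the end."""
    acc = [to_bf16(v) for v in dst]
    for i, v in zip(index, src):
        acc[i] += to_bf16(v)
    return [to_bf16(v) for v in acc]


dst, index, src = [1.0], [0] * 1000, [0.001] * 1000
naive = scatter_add_naive(dst, index, src)
wide = scatter_add_wide(dst, index, src)
print(naive, wide)
```

In the naive version each small addend is lost to rounding, so the result never moves away from the starting value; the wide accumulator keeps the contributions until the final rounding.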

Next step:

 - [x] Add benchmarks
 - [x] Extend to Half
 - [x] Simplify code

### Performance test (Updated)

Test BFloat16 in Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
With jemalloc and iomp

Single socket (40C)
![image](https://github.com/pytorch/pytorch/assets/61222868/4b4342f1-8cc3-46f7-81f5-651becd9b1e3)

Single core
![image](https://github.com/pytorch/pytorch/assets/61222868/09e5f700-2c2e-4208-979e-74b85474dea6)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103427
Approved by: https://github.com/mingfeima, https://github.com/albanD
2023-07-13 09:34:29 +00:00
Aaron Gokaslan
2f95a3d0fc [BE]: Apply ruff PERF fixes to torch (#104917)
Applies automated ruff fixes in the PERF modules and enables all automatic ones. I also updated ruff which applied some additional fixes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104917
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-07-11 20:45:21 +00:00
Kurt Mohler
0ccdbbe233 Add deterministic path for Tensor.resize_ (#104300)
New elements added to a tensor by `torch.Tensor.resize_` are set to NaN/MAX_INT when deterministic mode is turned on.

When `torch.Tensor.resize_` is called on a quantized tensor and deterministic mode is turned on, a nondeterministic error is raised.
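
The fill behavior described above can be mimicked with the standard-library `array` module. This is only a sketch: `resized` is a hypothetical helper, it handles just float64 (`"d"`) and int64 (`"q"`) buffers, and it says nothing about how the real deterministic path is implemented.

```python
import math
from array import array

INT64_MAX = 2**63 - 1


def resized(buf: array, new_len: int) -> array:
    """Return a copy of buf with new_len items; new items get a sentinel value."""
    if buf.typecode == "d":
        fill = math.nan      # floating point: NaN
    elif buf.typecode == "q":
        fill = INT64_MAX     # signed 64-bit integer: maximum value
    else:
        raise TypeError(f"unsupported typecode {buf.typecode!r}")
    out = array(buf.typecode, buf[:new_len])
    out.extend([fill] * (new_len - len(out)))  # no-op when shrinking
    return out


floats = resized(array("d", [1.0]), 3)
ints = resized(array("q", [1, 2]), 3)
print(floats, ints)
```

Filling with a fixed sentinel means that reading an element that was never written gives the same value on every run, instead of whatever happened to be in memory.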

Part of #82004

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104300
Approved by: https://github.com/albanD
2023-07-07 00:22:13 +00:00
Nikita Shulga
ddd7da7546 Enable more tests (#104437)
Remove `test_segment_reductions` from list of blocklisted tests. Remove `@onlyCPU` qualifier from test_segment_reductions as it has CUDA specific parts.

Fixes https://github.com/pytorch/pytorch/issues/104410

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104437
Approved by: https://github.com/atalman, https://github.com/huydhn
2023-06-30 16:26:11 +00:00
Amr Elshennawy
a78bddac01 Revert D46920584: Multisect successfully blamed D46920584 for test or build failures (#104269) (#104302)
Summary:

This diff is reverting D46920584
D46920584: Make `torch.empty*` deterministic by filling with NaN or max int value (#101849) by generatedunixname499836121 has been identified to be causing the following test or build failures:

Tests affected:
- [torchrec/distributed/composable/tests:test_fsdp - torchrec.distributed.composable.tests.test_fsdp.FullyShardTest: test_composable_checkpoint](https://www.internalfb.com/intern/test/281475062923125/)

Here's the Multisect link:
https://www.internalfb.com/multisect/2341386
Here are the tasks that are relevant to this breakage:

We're generating a revert to back out the changes in this diff, please note the backout may land if someone accepts it.

If you believe this diff has been generated in error you may Commandeer and Abandon it.

Test Plan: NA

Reviewed By: huydhn, osalpekar

Differential Revision: D46997394

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104302
Approved by: https://github.com/osalpekar
2023-06-29 20:20:58 +00:00
Richard Barnes
8cad411d3d Fix UntypedStorage pin error (#104355)
Summary:
Fixes:
```
TypeError: cannot pin 'torch.storage.UntypedStorage' only CPU memory can be pinned
```

Test Plan: Sandcastle

Differential Revision: D47093797

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104355
Approved by: https://github.com/malfet
2023-06-29 16:06:52 +00:00
Kurt Mohler
2642f31e4c Make torch.empty* deterministic by filling with NaN or max int value (#101849)
Part of #82004

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101849
Approved by: https://github.com/lezcano, https://github.com/albanD, https://github.com/kulinseth
2023-06-21 02:53:22 +00:00
Elias Ellison
40d70ba7ed Remove a number of fixed skips (#103162)
Also adds `PYTORCH_TEST_WITH_AOT_EAGER` to distinguish errors coming from aot_autograd and not inductor (not tested in ci, but useful for local debugging)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103162
Approved by: https://github.com/desertfire
2023-06-08 17:37:59 +00:00
ts
d2d03f0f44 Make index_add_ error if input source shape is wrong (#100321)
Fixes #92576 , checking the following as described in the documentation:

"source.shape[dim] == len(index) and source.shape[i] == self.shape[i] for i != dim"

Would be happy to iterate on this if there are any issues, and would be happy to implement the checking for the CUDA and MPS implementations of index_add_.
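
The documented condition quoted above can be written out as a standalone check. The sketch below is pure Python with a hypothetical function name and error messages; it operates on shape tuples only and is not the check added to the CPU kernel.

```python
from typing import Sequence


def check_index_add_shapes(self_shape: Sequence[int], dim: int,
                           index_len: int, source_shape: Sequence[int]) -> None:
    """Raise ValueError unless source matches self except along dim."""
    if len(source_shape) != len(self_shape):
        raise ValueError("source and self must have the same number of dimensions")
    for i, (src, dst) in enumerate(zip(source_shape, self_shape)):
        # Along dim, source must have one slice per index entry;
        # every other dimension must match self exactly.
        expected = index_len if i == dim else dst
        if src != expected:
            raise ValueError(
                f"source.shape[{i}] is {src}, expected {expected}"
            )


check_index_add_shapes((5, 3), dim=0, index_len=2, source_shape=(2, 3))  # passes
```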
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100321
Approved by: https://github.com/lezcano
2023-06-08 06:51:10 +00:00
Lu Fang
1237502213 Introduce fast path for cuda_equal (#102714)
We introduce the same trick for cuda_equal. Assuming in cuda_equal, the flags are already handled correctly.

Added the tests for cuda part.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102714
Approved by: https://github.com/ezyang
2023-06-03 05:49:49 +00:00
Shiyan Deng
685505353a Back out "Add PyObject preservation for UntypedStorage (#97470)" (#102553)
Summary:
Original commit changeset: c24708d18ccb

Original Phabricator Diff: D46159983

Test Plan: SL tests and CI

Differential Revision: D46284986

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102553
Approved by: https://github.com/DanilBaibak
2023-06-01 17:23:43 +00:00
Edward Z. Yang
818d92f58c Support resize on meta storage (#101988)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101988
Approved by: https://github.com/albanD, https://github.com/bdhirsh
2023-05-25 04:41:45 +00:00
PyTorch MergeBot
210fc28d5e Revert "Support resize on meta storage (#101988)"
This reverts commit 7d1ba0a92a.

Reverted https://github.com/pytorch/pytorch/pull/101988 on behalf of https://github.com/osalpekar due to Need to revert and rebase this in order to unblock train import ([comment](https://github.com/pytorch/pytorch/pull/101988#issuecomment-1561970230))
2023-05-24 21:51:33 +00:00
Wang, Eikan
2e18dd2bdc Improve bf16 neg by bypassing the conversion between BF16 and FP32 (#99711)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99711
Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/desertfire
2023-05-24 03:25:23 +00:00
Kazuaki Ishizaki
be5e77ca4c Make _StorageBase.byteswap faster ( > 10000x) (#101925)
This PR addresses #101690. It implements faster swapping of data elements in `_StorageBase` by using C++ rather than Python.

This helps when a large model saved on a little-endian machine is loaded on a big-endian machine.
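
To make the operation concrete, here is a minimal pure-Python sketch of element-wise byte swapping on a raw buffer. It is not the C++ code from the PR; the function name `byteswap_elements` is an assumption made for illustration only.

```python
def byteswap_elements(data: bytes, element_size: int) -> bytes:
    """Reverse the byte order of every fixed-size element in `data`."""
    if element_size <= 0 or len(data) % element_size != 0:
        raise ValueError("buffer length must be a multiple of element_size")
    out = bytearray(len(data))
    for start in range(0, len(data), element_size):
        chunk = data[start:start + element_size]
        # Reversing the chunk converts little-endian to big-endian and back.
        out[start:start + element_size] = chunk[::-1]
    return bytes(out)


# Two 4-byte integers written in little-endian order.
little = (1).to_bytes(4, "little") + (258).to_bytes(4, "little")
big = byteswap_elements(little, 4)
print(int.from_bytes(big[:4], "big"), int.from_bytes(big[4:], "big"))  # 1 258
```

A Python loop like this processes one element per iteration, which is the kind of overhead a native implementation avoids.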

TODO:
- [x] Add test cases
- [x] Add performance comparison before and after the PR
- [ ] (Optional) Investigate further opportunities for performance improvements by [SIMDization](https://dev.to/wunk/fast-array-reversal-with-simd-j3p)

Fixes #101690

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101925
Approved by: https://github.com/mikaylagawarecki
2023-05-24 00:13:41 +00:00
Edward Z. Yang
7d1ba0a92a Support resize on meta storage (#101988)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101988
Approved by: https://github.com/albanD, https://github.com/bdhirsh
2023-05-23 16:49:17 +00:00
Kurt Mohler
5fe629e314 Add PyObject preservation for UntypedStorage (#97470)
Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97470
Approved by: https://github.com/ezyang
2023-05-23 01:27:30 +00:00
drisspg
6f13d6892a Add meta support for multinomial (#101324)
# Summary
Found this when trying to compile the text gen loop of nanogpt here: b33289942b/torchbenchmark/models/nanogpt_generate/model.py (L322)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101324
Approved by: https://github.com/ngimel
2023-05-19 00:04:26 +00:00
Edward Z. Yang
c567748e16 Make interpolate_bilinear deterministic using decomposition (#101115)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101115
Approved by: https://github.com/ngimel
2023-05-11 22:48:01 +00:00
Yu, Guangye
14964b3aa5 Add is_xpu to torch type (#101072)
# Motivation
Without this PR:
```python
>>> import torch
>>> torch.IntTensor.is_cuda
False
>>> torch.IntTensor.is_xpu
<attribute 'is_xpu' of 'torch._C._TensorBase' objects>
```

With this PR:
```python
>>> import torch
>>> torch.IntTensor.is_xpu
False
```
This aligns with CUDA. Some customer code uses `is_xpu` to check the backend; without this PR, the check always evaluates as true, which results in unexpected behavior.
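
The "always true" behavior can be reproduced without PyTorch. The sketch below uses a Python `property` as a stand-in for the C-level attribute descriptor shown above; the class names are hypothetical and only mimic the situation.

```python
class TensorBase:
    @property
    def is_xpu(self):
        return False


class IntTensor(TensorBase):
    # Analogous to the fix: the type itself exposes a plain boolean.
    is_xpu = False


# On the class, the property is not evaluated; the descriptor object
# itself is returned, and any such object is truthy.
descriptor = TensorBase.is_xpu
print(type(descriptor).__name__, bool(descriptor))  # property True
print(IntTensor.is_xpu)  # False
```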

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101072
Approved by: https://github.com/mikaylagawarecki
2023-05-11 17:50:59 +00:00
vfdev-5
622e582a2b Register get_cpu_capability for jit (#100723)
Description:

Context: In torchvision we ensure that functional ops are torchscriptable. Recently exposed `torch.backends.cpu.get_cpu_capability()` in https://github.com/pytorch/pytorch/pull/100164 is failing in torchvision CI
```
RuntimeError:
Python builtin <built-in function _get_cpu_capability> is currently not supported in Torchscript:
  File "/usr/local/lib/python3.10/dist-packages/torch/backends/cpu/__init__.py", line 17
    - "AVX512"
    """
    return torch._C._get_cpu_capability()
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
```
Ref: https://github.com/pytorch/vision/pull/7557

In this PR, `torch._C._get_cpu_capability()` is explicitly registered for JIT and tested.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100723
Approved by: https://github.com/albanD
2023-05-09 09:52:29 +00:00
Valentin Andrei
9bc68fcd25 [pytorch] Accelerate indexing_backward_kernel with duplicates (#99441 attempt 2) (#100505)
By knowing the stride value ahead of time, we can simplify the kernel code as follows:

If stride == 1 we can use the whole warp to reduce the gradients
If stride < warp_size we don't need the internal while (start_feature < stride) loop as blockDim.x is always 32

These changes improve the performance of the kernel when duplicates are present and do not affect performance when there are few duplicates. The implementation is deterministic.

The proposed implementation uses opmath_t to accumulate the gradient values in registers, so when using FP16/BF16 it may overflow if the number of elements is large. This differs from the initial implementation, which accumulates in scalar_t and does not overflow. In addition, when the stride is 1, warp shuffles are used to sum the gradients, so the order of addition differs slightly from that of a reference implementation, which causes minor numerical differences.
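
For readers unfamiliar with the operation, the following plain-Python sketch shows what the backward pass of `x[index]` has to compute when `index` contains duplicates, and one way to keep the accumulation order fixed. It is an illustration under simplifying assumptions (1-D data, Python lists), not the CUDA kernel from this PR; `index_backward` is a hypothetical name.

```python
def index_backward(grad_out, index, numel):
    """Accumulate grad_out[i] into grad_in[index[i]] in a fixed order."""
    grad_in = [0.0] * numel
    # Sorting (stably) by target index groups duplicates together, so each
    # target is reduced in one run and always in the same order.
    order = sorted(range(len(index)), key=lambda i: index[i])
    for i in order:
        grad_in[index[i]] += grad_out[i]
    return grad_in


print(index_backward([1.0, 1.0, 1.0, 1.0], [2, 0, 2, 2], 4))
# [1.0, 0.0, 3.0, 0.0]
```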

TEST CODE:

```
import torch
from time import time

# The first element is the number of iterations.
# The second represents the number of unique elements. If
# set to 0, the number of unique elements is equal to the
# number of elements.
# The remaining elements are the tensor dimensions.

basic_indexing_tests = [
    [10, 0, 12345],
    [10, 4, 12345],
    [10, 16, 512, 512, 32],
    [10, 0, 4, 4],
    [10, 0, 32, 32],
    [10, 8, 32, 32],
    [10, 8, 64, 32, 16],
    [10, 0, 64, 32, 16],
    [10, 16, 512, 512, 32],
    [10, 0, 675, 999, 13],
    [10, 0, 123, 456, 31],
    [10, 0, 512, 512, 32],
    [10, 4, 512, 512, 32],
    [10, 2, 512, 512, 32],
    [10, 0, 128, 128, 16, 16],
    [10, 8, 128, 126, 16, 16],
    [10, 4, 128, 126, 16, 16],
    [10, 0, 64, 64, 16, 16, 16],
    [10, 8, 64, 64, 16, 16, 16],
    [10, 2, 64, 64, 16, 16, 16],
    [10, 1, 64, 64, 16, 16, 16],
]

def run_basic_indexing_on_device(x, index, expected, device_string, iters):
    x_dev = x.to(device_string)
    x_dev = x_dev.detach().requires_grad_()
    index_dev = index.to(device_string)

    # Run backward pass; keep gradients and measure time
    torch.cuda.synchronize()
    t_bw_s = time()
    for _ in range(iters):
        y = x_dev[index_dev]
        z = y.sum()
        z.backward()
    torch.cuda.synchronize()
    t_bw_s = (time() - t_bw_s) / iters

    return (x_dev.grad, t_bw_s)

def run_basic_indexing_test(test_input):
    niters = test_input[0]
    num_unique = test_input[1]
    tensor_size = tuple(test_input[2:])

    numel = 1
    for dim in tensor_size:
        numel *= dim
    if num_unique == 0:
        num_unique = numel

    index = torch.randint(0, num_unique, tensor_size, dtype=torch.long, device="cpu")
    x = torch.randn((numel,), dtype=torch.float32, device="cuda")

    index = index.detach()
    x = x.detach().requires_grad_()

    (cpu_grad, t_bw_cpu) = run_basic_indexing_on_device(x, index, numel / 2, "cpu", 1)
    (gpu_grad, t_bw_gpu) = run_basic_indexing_on_device(x, index, numel / 2, "cuda", 1)

    max_delta = torch.max(torch.abs(cpu_grad - gpu_grad.to("cpu")))
    missmatches = torch.nonzero(torch.abs(cpu_grad - gpu_grad.to("cpu")))

    (gpu_grad_perf, t_gpu) = run_basic_indexing_on_device(
        x, index, numel / 2, "cuda", niters
    )

    print(
        "test = {}, delta = {:.5f}, missmatches = {} duration_ms = {:.3f}".format(
            tuple(test_input), max_delta, missmatches, t_gpu * 1000.0
        )
    )

    if torch.numel(missmatches) > 0:
        print("cpu grad = {}".format(cpu_grad[missmatches]))
        print("gpu grad = {}".format(gpu_grad[missmatches]))
```

RESULTS:

```
Default Implementation

test = (1, 0, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.726
test = (1, 4, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.867
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 80.514
test = (1, 0, 4, 4), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.689
test = (1, 0, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.547
test = (1, 8, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.537
test = (1, 8, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1.199
test = (1, 0, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.584
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 80.055
test = (1, 0, 675, 999, 13), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 8.411
test = (1, 0, 123, 456, 31), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2.419
test = (1, 0, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 8.048
test = (1, 4, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 307.633
test = (1, 2, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 606.403
test = (1, 0, 128, 128, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 4.099
test = (1, 8, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 76.813
test = (1, 4, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 148.760
test = (1, 0, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 16.547
test = (1, 8, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 317.583
test = (1, 2, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1204.800
test = (1, 1, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2412.133

Small Stride Kernel Version

test = (1, 0, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.904
test = (1, 4, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2.156
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 308.878
test = (1, 0, 4, 4), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.566
test = (1, 0, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.540
test = (1, 8, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.550
test = (1, 8, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2.868
test = (1, 0, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.656
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 307.856
test = (1, 0, 675, 999, 13), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 6.624
test = (1, 0, 123, 456, 31), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1.837
test = (1, 0, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 6.274
test = (1, 4, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1127.040
test = (1, 2, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2123.942
test = (1, 0, 128, 128, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 3.282
test = (1, 8, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 288.997
test = (1, 4, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 547.267
test = (1, 0, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 12.844
test = (1, 8, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1178.934
test = (1, 2, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 4262.042
test = (1, 1, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 8172.318

Stride 1 Kernel Version

test = (1, 0, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.692
test = (1, 4, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.834
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 81.023
test = (1, 0, 4, 4), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.631
test = (100, 0, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.491
test = (100, 8, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.477
test = (50, 8, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.561
test = (50, 0, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.516
test = (16, 10, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 126.455
test = (10, 0, 675, 999, 13), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 8.238
test = (10, 0, 123, 456, 31), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1.520
test = (10, 0, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 7.854
test = (10, 4, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 306.327
test = (10, 2, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 610.498
test = (5, 0, 128, 128, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 3.684
test = (5, 8, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 75.604
test = (5, 4, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 148.679
test = (1, 0, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 16.525
test = (1, 8, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 315.095
test = (1, 2, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1214.715
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100505
Approved by: https://github.com/ngimel
2023-05-03 23:52:58 +00:00
vfdev-5
6a12f10b08 Publicly exposing torch.backends.cpu.get_cpu_capability() (#100164)
Description:

- As suggested by Nikita, created `torch.backends.cpu` submodule and exposed `get_cpu_capability`.

- In torchvision's Resize method we want to know the current CPU capability in order to pick the appropriate code path.

The newly implemented vectorized resize of uint8 images on AVX2-capable CPUs is now faster than the older path (uint8->float->resize->uint8). However, on non-AVX hardware (e.g. Mac M1), certain configs are slower using native uint8.
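
A caller might use the exposed function roughly as sketched below. The selection policy and the names `pick_uint8_resize_path`, `native_uint8` and `via_float` are assumptions made for illustration; they are not torchvision's actual logic. Only `torch.backends.cpu.get_cpu_capability()` comes from this PR.

```python
def current_cpu_capability() -> str:
    """Return torch's reported CPU capability, or "DEFAULT" if unavailable."""
    try:
        import torch
        return torch.backends.cpu.get_cpu_capability()
    except (ImportError, AttributeError):
        # torch is missing or predates torch.backends.cpu.
        return "DEFAULT"


def pick_uint8_resize_path(capability: str) -> str:
    """Hypothetical policy: use the native uint8 path only with AVX2 or better."""
    if capability in ("AVX2", "AVX512"):
        return "native_uint8"
    return "via_float"


print(pick_uint8_resize_path("AVX2"))     # native_uint8
print(pick_uint8_resize_path("DEFAULT"))  # via_float
```

In real use the two helpers would be combined as `pick_uint8_resize_path(current_cpu_capability())`.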

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100164
Approved by: https://github.com/albanD, https://github.com/malfet
2023-05-03 19:02:07 +00:00
PyTorch MergeBot
1114673c90 Revert "[pytorch] Accelerate indexing_backward_kernel with duplicates (#99441)"
This reverts commit 97afbcbc80.

Reverted https://github.com/pytorch/pytorch/pull/99441 on behalf of https://github.com/ngimel due to breaks ROCM ([comment](https://github.com/pytorch/pytorch/pull/99441#issuecomment-1531804487))
2023-05-02 16:46:04 +00:00
Lu Fang
090ec55f8d
Only skip in torch inductor test
Differential Revision: D45464303

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100435
2023-05-01 22:21:37 -07:00
Lu Fang
429155b3c8
Disable some check to get the test pass
Differential Revision: D45437730

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100364
2023-05-01 16:28:12 -07:00
valentinandrei
97afbcbc80 [pytorch] Accelerate indexing_backward_kernel with duplicates (#99441)
By knowing the stride value ahead of time, we can simplify the kernel code as follows:

If `stride == 1` we can use the whole warp to reduce the gradients
If `stride < warp_size` we don't need the internal `while (start_feature < stride)` loop as `blockDim.x` is always 32

These changes improve the performance of the kernel when duplicates are present and do not affect performance when there are few duplicates. The implementation is deterministic.

The proposed implementation uses `opmath_t` to accumulate the gradient values in registers, so when using FP16/BF16 it may overflow if the number of elements is large. This differs from the initial implementation, which accumulates in `scalar_t` and does not overflow. In addition, when the stride is 1, warp shuffles are used to sum the gradients, so the order of addition differs slightly from that of a reference implementation, which causes minor numerical differences.
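
The remark about addition order can be seen with ordinary Python floats. This small example is independent of the kernel and only illustrates that floating-point addition is not associative, which is why a shuffle-based reduction may differ slightly from a sequential reference.

```python
values = [0.1, 0.2, 0.3]

# Left-to-right accumulation, as a simple reference loop would do.
sequential = (values[0] + values[1]) + values[2]

# A different grouping, as a tree- or shuffle-style reduction might produce.
regrouped = values[0] + (values[1] + values[2])

print(sequential)               # 0.6000000000000001
print(regrouped)                # 0.6
print(sequential == regrouped)  # False
```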

TEST CODE:

```
import torch
from time import time

# The first element is the number of iterations.
# The second represents the number of unique elements. If
# set to 0, the number of unique elements is equal to the
# number of elements.
# The remaining elements are the tensor dimensions.

basic_indexing_tests = [
    [10, 0, 12345],
    [10, 4, 12345],
    [10, 16, 512, 512, 32],
    [10, 0, 4, 4],
    [10, 0, 32, 32],
    [10, 8, 32, 32],
    [10, 8, 64, 32, 16],
    [10, 0, 64, 32, 16],
    [10, 16, 512, 512, 32],
    [10, 0, 675, 999, 13],
    [10, 0, 123, 456, 31],
    [10, 0, 512, 512, 32],
    [10, 4, 512, 512, 32],
    [10, 2, 512, 512, 32],
    [10, 0, 128, 128, 16, 16],
    [10, 8, 128, 126, 16, 16],
    [10, 4, 128, 126, 16, 16],
    [10, 0, 64, 64, 16, 16, 16],
    [10, 8, 64, 64, 16, 16, 16],
    [10, 2, 64, 64, 16, 16, 16],
    [10, 1, 64, 64, 16, 16, 16],
]

def run_basic_indexing_on_device(x, index, expected, device_string, iters):
    x_dev = x.to(device_string)
    x_dev = x_dev.detach().requires_grad_()
    index_dev = index.to(device_string)

    # Run backward pass; keep gradients and measure time
    torch.cuda.synchronize()
    t_bw_s = time()
    for _ in range(iters):
        y = x_dev[index_dev]
        z = y.sum()
        z.backward()
    torch.cuda.synchronize()
    t_bw_s = (time() - t_bw_s) / iters

    return (x_dev.grad, t_bw_s)

def run_basic_indexing_test(test_input):
    niters = test_input[0]
    num_unique = test_input[1]
    tensor_size = tuple(test_input[2:])

    numel = 1
    for dim in tensor_size:
        numel *= dim
    if num_unique == 0:
        num_unique = numel

    index = torch.randint(0, num_unique, tensor_size, dtype=torch.long, device="cpu")
    x = torch.randn((numel,), dtype=torch.float32, device="cuda")

    index = index.detach()
    x = x.detach().requires_grad_()

    (cpu_grad, t_bw_cpu) = run_basic_indexing_on_device(x, index, numel / 2, "cpu", 1)
    (gpu_grad, t_bw_gpu) = run_basic_indexing_on_device(x, index, numel / 2, "cuda", 1)

    max_delta = torch.max(torch.abs(cpu_grad - gpu_grad.to("cpu")))
    missmatches = torch.nonzero(torch.abs(cpu_grad - gpu_grad.to("cpu")))

    (gpu_grad_perf, t_gpu) = run_basic_indexing_on_device(
        x, index, numel / 2, "cuda", niters
    )

    print(
        "test = {}, delta = {:.5f}, missmatches = {} duration_ms = {:.3f}".format(
            tuple(test_input), max_delta, missmatches, t_gpu * 1000.0
        )
    )

    if torch.numel(missmatches) > 0:
        print("cpu grad = {}".format(cpu_grad[missmatches]))
        print("gpu grad = {}".format(gpu_grad[missmatches]))
```

RESULTS:

```
Default Implementation

test = (1, 0, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.726
test = (1, 4, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.867
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 80.514
test = (1, 0, 4, 4), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.689
test = (1, 0, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.547
test = (1, 8, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.537
test = (1, 8, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1.199
test = (1, 0, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.584
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 80.055
test = (1, 0, 675, 999, 13), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 8.411
test = (1, 0, 123, 456, 31), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2.419
test = (1, 0, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 8.048
test = (1, 4, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 307.633
test = (1, 2, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 606.403
test = (1, 0, 128, 128, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 4.099
test = (1, 8, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 76.813
test = (1, 4, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 148.760
test = (1, 0, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 16.547
test = (1, 8, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 317.583
test = (1, 2, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1204.800
test = (1, 1, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2412.133

Small Stride Kernel Version

test = (1, 0, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.904
test = (1, 4, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2.156
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 308.878
test = (1, 0, 4, 4), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.566
test = (1, 0, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.540
test = (1, 8, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.550
test = (1, 8, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2.868
test = (1, 0, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.656
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 307.856
test = (1, 0, 675, 999, 13), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 6.624
test = (1, 0, 123, 456, 31), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1.837
test = (1, 0, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 6.274
test = (1, 4, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1127.040
test = (1, 2, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 2123.942
test = (1, 0, 128, 128, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 3.282
test = (1, 8, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 288.997
test = (1, 4, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 547.267
test = (1, 0, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 12.844
test = (1, 8, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1178.934
test = (1, 2, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 4262.042
test = (1, 1, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 8172.318

Stride 1 Kernel Version

test = (1, 0, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.692
test = (1, 4, 12345), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.834
test = (1, 16, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 81.023
test = (1, 0, 4, 4), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.631
test = (100, 0, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.491
test = (100, 8, 32, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.477
test = (50, 8, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.561
test = (50, 0, 64, 32, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 0.516
test = (16, 10, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 126.455
test = (10, 0, 675, 999, 13), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 8.238
test = (10, 0, 123, 456, 31), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1.520
test = (10, 0, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 7.854
test = (10, 4, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 306.327
test = (10, 2, 512, 512, 32), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 610.498
test = (5, 0, 128, 128, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 3.684
test = (5, 8, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 75.604
test = (5, 4, 128, 126, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 148.679
test = (1, 0, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 16.525
test = (1, 8, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 315.095
test = (1, 2, 64, 64, 16, 16, 16), delta = 0.00000, missmatches = tensor([], size=(0, 1), dtype=torch.int64) duration_ms = 1214.715
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99441
Approved by: https://github.com/ngimel
2023-05-01 22:41:00 +00:00
Lu Fang
d7fa7fa8cf
Introduce fast path in the CPU equal op
Differential Revision: D45282119

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100024
2023-04-28 16:00:17 -07:00
kshitij12345
61dffa61c3 [fix] masked_scatter_: non-contiguous self (#100232)
Fixes https://github.com/pytorch/pytorch/issues/99638

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100232
Approved by: https://github.com/ngimel
2023-04-28 18:12:23 +00:00
dujinhang
9cd48b0575 Add warning information for dtypetensor. (#99521)

Without affecting the existing cpu/cuda logic, a separate interface is provided for custom backends, and users can choose whether to use the interface function, which provides 10 tensor types with custom backend variations.

Therefore, users can use torch.set_default_tensor_type to set the default device tensor type, or use torch.xxx.dtypetensor to create a tensor. For example, torch.set_default_tensor_type(torch.foo.DoubleTensor) or torch.foo.DoubleTensor([]).

@albanD , please review my changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99521
Approved by: https://github.com/albanD
2023-04-28 18:01:45 +00:00
Larry Liu
687afeb686 [dynamo][numpy] Add NumpyTensorVariable to translate ndarray attribute calls to tensor attributes (#95849)
Issue: #93684

# Problem

Reduce graph breaks when dynamo compiles python functions containing numpy functions and ndarray operations.

# Design (as I know it)

* Use torch_np.ndarray (a wrapper of tensor) to back a `VariableTracker`: `NumpyTensorVariable`.
* Translate all attribute and method calls on ndarray to their torch_np.ndarray equivalents.

This PR adds `NumpyTensorVariable` and supports:
1.  tensor to ndarray, ndarray to tensor
2. numpy functions such as numpy.meshgrid()
3. ndarray attributes such as `itemsize`, `stride`

Next PR will handle returning `np.ndarray` and add support for ndarray methods
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95849
Approved by: https://github.com/ezyang
2023-04-27 16:18:35 +00:00
Jiong Gong
e5c9a0fcf5 [dynamo] avoid graph break on repeat_interleave.self_int (#99528)
Address convit_base failure: https://github.com/pytorch/torchdynamo/issues/1886 mentioned in https://github.com/pytorch/pytorch/issues/93777
Also for models like EleutherAI/gpt-j-6B.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99528
Approved by: https://github.com/ezyang
2023-04-25 04:47:39 +00:00
BJ Hargrave
555ab310dc Add itemsize and nbytes properties to Tensor (#98322)
Adds properties for itemsize and nbytes to Tensor matching the properties in NumPy.
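
As in NumPy, the two properties are related for a dense array: the total byte count is the number of elements times the per-element size. The sketch below expresses that relation in plain Python; the helper `nbytes` is hypothetical and operates on a shape tuple rather than a real tensor.

```python
from math import prod


def nbytes(shape, itemsize):
    """Total bytes of a dense array: element count times bytes per element."""
    return prod(shape) * itemsize


# A 2x3 array of 8-byte floats occupies 48 bytes.
print(nbytes((2, 3), 8))  # 48
# A 0-dimensional (scalar) array has exactly one element.
print(nbytes((), 4))      # 4
```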

Fixes https://github.com/pytorch/pytorch/issues/12728

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98322
Approved by: https://github.com/ezyang
2023-04-05 12:11:55 +00:00
Jason Ansel
b96fe9b61c Fix issues related to ClassInstantier in HF models (#97997)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97997
Approved by: https://github.com/anijain2305
2023-04-04 00:01:08 +00:00
Jason Ansel
71d850a100 [inductor] Fallback on complex64 kernels (#98155)
Later PRs in this stack fix graph breaks in GoogleFnet, which triggers errors from inductor trying to compile torch.complex64; this PR fixes that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98155
Approved by: https://github.com/anijain2305, https://github.com/ngimel
2023-04-03 01:06:43 +00:00
Nikita Shulga
2af09393f9 masked_scatter should accept only bool masks (#97999)
Modify test_torch to check that assert is raised in this case

torch.uint8 usage has been deprecated for a few releases, and errors have been raised for other dtypes on CUDA devices, but not on CPU.
This PR finally restricts mask to just `torch.bool`.
See https://github.com/pytorch/pytorch/pull/96594 as an example doing it for `torch.masked_fill`

Fixes https://github.com/pytorch/pytorch/issues/94634

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97999
Approved by: https://github.com/ngimel
2023-04-01 23:25:25 +00:00
Nikita Shulga
a1dc2b1774 [BE] Remove bool dtype from masked_scatter (#98015)
### <samp>🤖 Generated by Copilot at a9fa438</samp>

Simplified a test function for `torch.masked_scatter` in `test/test_torch.py` by removing redundant and unnecessary code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98015
Approved by: https://github.com/ezyang
2023-03-31 01:45:57 +00:00
Aleksei Nikiforov
8289120ef0 Revert "test/test_torch.py: fix TestTorch::test_from_buffer test (#96952)" (#97759)
Tests were already fixed in https://github.com/pytorch/pytorch/pull/92834, so these changes, rather than fixing the tests, now break them again.

This reverts commit 7f94ea8492.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97759
Approved by: https://github.com/janeyx99
2023-03-28 18:43:08 +00:00
Nikita Shulga
542fb0b1fa Specify file encoding in test_torch.py (#97628)
Attempt to fix
```
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 5260: ordinal not in range(128)
```
in https://github.com/pytorch/pytorch/actions/runs/4522628359/jobs/7965372405

In general, it's good practice to specify the encoding explicitly; otherwise it depends on environment variables and makes test failures unpredictable.
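
A minimal sketch of that practice, assuming a hypothetical sample file in a temporary directory (the file name and text are illustrative, not taken from `test_torch.py`). Reading with an explicit `encoding` gives the same result regardless of locale, while decoding the same bytes as ASCII fails:

```python
import os
import tempfile

# Hypothetical sample text containing one non-ASCII character.
text = "gr\u00e4ph"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    # Write and read with an explicit encoding, independent of the environment.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        decoded = f.read()
    with open(path, "rb") as f:
        raw = f.read()

# The same bytes cannot be decoded with the 'ascii' codec.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```
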

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97628
Approved by: https://github.com/dagitses, https://github.com/kit1980
2023-03-26 20:03:25 +00:00
Edward Z. Yang
37faa48844 DCE inference graphs too (#97275)
I added a bunch of asserts to verify that I didn't accidentally kill copy_ in the graph, hopefully this combined with our existing tests is good enough.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97275
Approved by: https://github.com/bdhirsh
2023-03-23 07:02:52 +00:00
Kurt Mohler
fbc803df0c Only warn once for TypedStorage deprecation (#97379)
Fixes #97207

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97379
Approved by: https://github.com/ezyang
2023-03-23 05:40:23 +00:00
Aleksei Nikiforov
7f94ea8492 test/test_torch.py: fix TestTorch::test_from_buffer test (#96952)
Use opposite encoding on big endian systems
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96952
Approved by: https://github.com/ezyang
2023-03-17 14:36:33 +00:00
mingfeima
06054d7df0 fix random output issue on index_select when src is scalar and index is empty (#96408)
Fix https://github.com/pytorch/pytorch/issues/94340
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96408
Approved by: https://github.com/ngimel
2023-03-16 05:30:45 +00:00
Kurt Mohler
06b7285163 Add torch._check* functions analogous to C++ TORCH_CHECK* (#88725)
Adds `_check`, `_check_index`, `_check_value`, `_check_type`, `_check_not_implemented`, `_check_tensor_all`
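
As a rough illustration of the idea, the sketch below pairs each check helper with the Python exception type that the corresponding C++ macro family surfaces as. The helper names, the default message, and the lazily evaluated `message` callable are assumptions made for this sketch; it is not the `torch._check*` implementation.

```python
def check_with(error_type, cond, message=None):
    """Raise error_type when cond is false; message is a callable evaluated only on failure."""
    if cond:
        return
    text = "Expected cond to be True, but got False." if message is None else str(message())
    raise error_type(text)


# One thin wrapper per error category (hypothetical names for illustration).
def check(cond, message=None):
    check_with(RuntimeError, cond, message)


def check_index(cond, message=None):
    check_with(IndexError, cond, message)


def check_value(cond, message=None):
    check_with(ValueError, cond, message)


def check_type(cond, message=None):
    check_with(TypeError, cond, message)


def check_not_implemented(cond, message=None):
    check_with(NotImplementedError, cond, message)
```
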

Part of #72948
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88725
Approved by: https://github.com/albanD
2023-03-14 20:44:50 +00:00
kvathupo
2b9d9bcb85 Deprecate non-bool masks in masked_fill (#96594)
__What?__
Per discussion at #94634, deprecate `masked_fill` with non-bool masks. Deprecation warnings were previously added by #22261, but not for Apple MPS. I can revert the MPS changes if deprecation warnings are wanted first, though. See also #96112.

Fixes #85063 and #89320.

__Further Development?__
- Fixed the mask dtype checking for the cuda dispatch for `masked_fill` in `aten/src/ATen/native/cuda/Indexing.cu`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96594
Approved by: https://github.com/malfet, https://github.com/ngimel
2023-03-13 01:41:47 +00:00
Nikita Shulga
1cd0929bf7 [BC] Allow only bool tensors as mask in masked_select (#96112)
`byte` support was marked as deprecated in 1.8, so it's fine to remove this in 2.1 (or even 2.0)
Deprecation warning was added by https://github.com/pytorch/pytorch/pull/22261

Also, fix a bunch of syntactic errors in comments

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96112
Approved by: https://github.com/ezyang
2023-03-07 01:43:14 +00:00
puririshi98
8aa34602f7 Jetson Update for CI Redo (#94549)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94549
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-02-21 17:13:38 +00:00
Yuxin Wu
9bb2fe3eae fix numpy1.24 deprecations in unittests (#93997)
Fixes https://github.com/pytorch/pytorch/issues/91329

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93997
Approved by: https://github.com/ngimel, https://github.com/jerryzh168
2023-02-18 00:59:09 +00:00
Xuehai Pan
b005ec62b9 [BE] Remove dependency on six and future (#94709)
Remove the Python 2 and 3 compatibility library [six](https://pypi.org/project/six) and [future](https://pypi.org/project/future) and `torch._six`. We only support Python 3.8+ now. It's time to retire them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94709
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-02-14 09:14:14 +00:00
Brian Hirsh
ceb0f1576b turn functionalization on in aot_autograd inference (#92857)
still waiting for CI fallout
fixes #90759

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92857
Approved by: https://github.com/ezyang
2023-02-13 17:48:00 +00:00
Nikita Shulga
4869929f32 Update Triton hash (#94249)
That includes MLIR + latest packaging changes (that also download ptxas from CUDA-12)
Tweak CI to install gcc-9 to build triton

Disable a few tests to make everything be correct

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94249
Approved by: https://github.com/Skylion007, https://github.com/ngimel, https://github.com/weiwangmeta
2023-02-13 13:17:36 +00:00
Aaron Gokaslan
9171f7d4cd [BE] Modernize PyTorch even more for 3.8 with pyupgrade (#94520)
Applies some more pyupgrade fixits to PyTorch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94520
Approved by: https://github.com/ezyang
2023-02-10 18:02:50 +00:00
ganler
db6cfff827 fix: forbid multi-index for index_select over scalar (#94347)
Fixes #88940

According to the [doc](https://pytorch.org/docs/stable/generated/torch.index_select.html):
1. "The returned tensor has the same number of dimensions as the original tensor (`input`). "
2.  "The `dim`th dimension has the same size as the length of `index`; other dimensions have the same size as in the original tensor."

These two conditions cannot be satisfied at the same time if the `input` is a scalar && `index` has multiple values: because a scalar at most holds one element (according to property 1, the output is a scalar), it is impossible to satisfy "The `dim`th dimension has the same size as the length of `index`" when `index` has multiple values.

However, currently, if we do so we either get:

1. Buffer overflow with ASAN;
2. Or (w/o ASAN) silently returned outputs that are not consistent with the doc (`x.index_select(0, torch.Tensor([0, 0, 0]).int())` returns `x`).

As a result, we should explicitly reject such cases.
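
The conflict between the two documented properties can be sketched as a shape rule. This is an illustrative helper only: the function name, the error types, and the messages are assumptions, and the empty-index case on a scalar is deliberately left unmodeled.

```python
def index_select_output_shape(input_shape, dim, index_len):
    """Return the output shape implied by the two documented properties."""
    ndim = len(input_shape)
    if ndim == 0:
        if dim not in (0, -1):
            raise IndexError("dim out of range for a scalar input")
        if index_len > 1:
            # Property 1 says the output is a scalar (one element), while
            # property 2 would need room for index_len elements: reject.
            raise IndexError("a scalar input cannot be indexed with more than one index value")
        if index_len == 0:
            raise NotImplementedError("empty index on a scalar is not modeled in this sketch")
        return ()
    if not -ndim <= dim < ndim:
        raise IndexError("dim out of range")
    shape = list(input_shape)
    shape[dim % ndim] = index_len  # property 2: dim-th size equals len(index)
    return tuple(shape)
```
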
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94347
Approved by: https://github.com/malfet
2023-02-10 17:17:09 +00:00
min-jean-cho
900e09c872 [Dynamo] Support torch.Tensor.fn as TorchVariable, not UserDefinedObjectVariable, preventing graph break (#93243)
As found in #92709, thanks to @ngimel and @jansel, currently `torch.Tensor.fn` points to `UserDefinedObjectVariable` rather than `TorchVariable`. The root cause is due to https://github.com/pytorch/pytorch/pull/92709#pullrequestreview-1273357406. To prevent this, build `TorchVariable`  of `torch.Tensor.fn` pointing to `torch.ops.aten.fn`.

This issue propagates to `torch.Tensor.fn` causing graph break with `nopython=True`.
```python
import torch
import torch._dynamo as dynamo

#op = torch.ops.aten.abs_ # no graph break
op = torch.Tensor.abs_ # graph break
args = torch.empty(10)

def foo(args):
    return op(args)

opt_foo = dynamo.optimize("inductor", nopython=True)(foo)
y_ = opt_foo(args)

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93243
Approved by: https://github.com/jansel
2023-02-07 09:26:50 +00:00
min-jean-cho
6e1cfcdf4b cauchy_ few fixes (1) check gamma > 0 (2) better dtype error log (#93314)
Related #92047

(1) `torch.Tensor.cauchy_` is missing a check for `gamma > 0` (`torch.distributions.cauchy.Cauchy` correctly checks `gamma > 0`).
(2) Add a better dtype error message, similar to `exponential_`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93314
Approved by: https://github.com/jgong5, https://github.com/fritzo, https://github.com/lezcano
2023-02-03 11:56:28 +00:00
min-jean-cho
2f0b0c5dd7 exponential_ few fixes (1) lambda > 0 (2) mkl kernel to continuous (3) better error log on dtype (#92891)
Exponential distribution is continuous. Fixes CPU MKL exponential implementation to exclude integer dtypes.

```python
import torch
dtypes = [torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64]

for dtype in dtypes:
    x = torch.empty(10000, dtype=dtype).exponential_() # should fail !
    print("dtype: ", x.dtype, "sum: ", x.sum())
```

### Additional Context

Related to #92709. This issue propagates to OpInfo of exponential.

```
AssertionError: The supported dtypes for exponential on device type cpu are incorrect!
The following dtypes worked in forward but are not listed by the OpInfo: {torch.int64, torch.uint8, torch.int8, torch.int16, torch.int32}.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92891
Approved by: https://github.com/CaoE, https://github.com/jgong5, https://github.com/ngimel
2023-01-28 02:27:16 +00:00
Yanbo Liang
a6b51448f5 [Dynamo] Supports if condition on user defined object (#90892)
Fixes a Meta-internal use case; see the pattern in the unit test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90892
Approved by: https://github.com/jansel, https://github.com/mlazos
2023-01-26 04:19:32 +00:00
PyTorch MergeBot
9b23fd378f Revert "Logcumsumexp for complex in CPU and CUDA (#90847)"
This reverts commit 64985123e4.

Reverted https://github.com/pytorch/pytorch/pull/90847 on behalf of https://github.com/malfet due to Reverting to decrease build time, let's discuss the alternatives here
2023-01-24 20:49:08 +00:00
pierreHaslee
1c30844eaa where() function added as a Tensor method as well (#92849)
Fixes #88470

I added the "method" keyword in `aten/src/ATen/native/native_functions.yaml` for the function `where` with Scalar Overload.
This way, you can now use `Tensor.where()` with a scalar parameter the same way `torch.where()` can.

I added a test in `test/test_torch.py` as requested.
It uses the `where()` method on a tensor and then checks it has the same results as the `torch.where()` function.
The test is roughly the same as the one provided by the author of the issue.

PS: this is the second PR I have made to resolve this issue; the first one is #92747. I had trouble with commit signatures, so that PR was closed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92849
Approved by: https://github.com/albanD
2023-01-24 03:09:33 +00:00
mfkasim1
64985123e4 Logcumsumexp for complex in CPU and CUDA (#90847)
Another PR towards solving #89205.
What's in this PR:

* The implementation of forward `logcumsumexp` for complex numbers in CPU & CUDA
* The tests on forward call of `logcumsumexp` for complex numbers
* The implementation of backward `logcumsumexp` for complex numbers

What's missing:

* The test on the backward gradient of `logcumsumexp` (it complains `RuntimeError: logcumsumexp does not support automatic differentiation for outputs with complex dtype.` and I don't know how to solve the error or where to put the test for the backward computation). If possible, I'd like this to be done in this PR.

It's really tricky to handle the edge cases here (i.e. the ones involving `inf`), but I've tried my best to add comments explaining the reasoning behind my decisions in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90847
Approved by: https://github.com/albanD
2023-01-20 15:10:50 +00:00
Kurt Mohler
647b8f8e3e Add TORCH_CHECK_TENSOR_ALL (#89097)
`TORCH_CHECK_TENSOR_ALL(cond, ...)` is a wrapper around `TORCH_CHECK` which allows the condition argument to be a tensor, batched or unbatched. `cond` can be a boolean tensor of any size. If any element is False, or if `cond.numel() == 0`, then `TORCH_CHECK_TENSOR_ALL` raises an error
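
The semantics described above can be sketched in plain Python, modeling a boolean tensor of any shape as nested lists. The function names and the default message are hypothetical; this is not the C++ macro or its Python counterpart.

```python
def flatten(cond):
    """Flatten a bool or arbitrarily nested lists of bools into a flat list."""
    if isinstance(cond, bool):
        return [cond]
    out = []
    for item in cond:
        out.extend(flatten(item))
    return out


def check_tensor_all(cond, message="Expected all elements of cond to be True"):
    """Raise if any element is False or if cond has no elements at all."""
    elements = flatten(cond)
    if len(elements) == 0 or not all(elements):
        raise RuntimeError(message)
```
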

Part of #72948
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89097
Approved by: https://github.com/zou3519
2023-01-19 21:04:09 +00:00
BowenBao
a72bcb3388 Do not leak SkipFrame exception to parent frames (#91059)
Discovered via https://github.com/pytorch/torchdynamo/issues/2000: the `SkipFrame` exception, which is raised to avoid repeatedly compiling a frame that contains a loop with graph breaks, could leak to parent frames while inlining, which then prevented those frames from compiling.

This PR checks during inlining whether such an exception is raised and, if so, raises `Unsupported` to the outer frame instead. The original behavior and goal of #88857 are unaffected: the inner frame that contains the loop is still skipped.
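
The exception translation can be pictured with a small stand-alone sketch. The two exception classes below are local stand-ins that only borrow the names from the message, and `inline_call` is a hypothetical helper, not dynamo's inlining code.

```python
class SkipFrame(Exception):
    """Stand-in: signals that the current frame should be skipped."""


class Unsupported(Exception):
    """Stand-in: signals that the outer frame cannot inline this call."""


def inline_call(inner):
    """Run inner as an inlined call, translating SkipFrame at the boundary."""
    try:
        return inner()
    except SkipFrame as exc:
        # Do not let SkipFrame escape to the parent frame; report the
        # inlining attempt as unsupported instead.
        raise Unsupported(f"SkipFrame raised while inlining: {exc}") from exc
```
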

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91059
Approved by: https://github.com/jansel, https://github.com/thiagocrepaldi
2023-01-13 17:11:22 +00:00
XiaobingSuper
1892c75a45 fix norrow_copy correctness issue for non-contiguous input for cpu path(reland) (#91883)
This PR re-lands https://github.com/pytorch/pytorch/pull/91789.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91883
Approved by: https://github.com/lezcano
2023-01-10 10:56:18 +00:00
PyTorch MergeBot
d85f3c8237 Revert "fix norrow_copy correctness issue for non-contiguous input for cpu path (#91789)"
This reverts commit 136dadd689.

Reverted https://github.com/pytorch/pytorch/pull/91789 on behalf of https://github.com/huydhn due to This breaks trunk with XPASS test_vmap_exhaustive_narrow_copy_cpu_float32 136dadd689
2023-01-09 06:50:20 +00:00
XiaobingSuper
136dadd689 fix norrow_copy correctness issue for non-contiguous input for cpu path (#91789)
Fix https://github.com/pytorch/pytorch/issues/91690.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91789
Approved by: https://github.com/jgong5, https://github.com/lezcano
2023-01-09 00:55:03 +00:00
PyTorch MergeBot
b3603f8129 Revert "Deduplicate c10 error and PyTorchError hierarchy (#87855)"
This reverts commit 34f2d3e6ae.

Reverted https://github.com/pytorch/pytorch/pull/87855 on behalf of https://github.com/osalpekar due to perf regression in quantization tests
2023-01-06 19:56:35 +00:00
William Phetsinorath
34f2d3e6ae Deduplicate c10 error and PyTorchError hierarchy (#87855)
Fixes #53370

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87855
Approved by: https://github.com/albanD
2023-01-02 15:53:36 +00:00
ecao
274d3b24c3 use scatter_add for index_add when dim is the most inner dim (#88729)
### Motivation
When dim is -1 and the slice of source or result is noncontiguous, the original `index_add` is slow because it uses add on the sliced tensor, which is serial over the index and parallel over the sliced tensor to avoid write conflicts. Parallelizing over the sliced tensor is not optimal: the sliced tensor may be too small to parallelize well, and this approach also causes multiple parallel regions.

`scatter_add` is used to speed up this case because it parallelizes over the outer dimensions of the input and is serial over the inner dimension to avoid write conflicts. `scatter_add` needs only one parallel region, and the outer dimensions are larger, which suits parallel execution.
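
The equivalence this relies on can be sketched in plain Python on 2-D nested lists. This is a model of the semantics only (function names are hypothetical, and nothing here runs in parallel): `index_add` along the last dimension gives the same result as `scatter_add` with the 1-D index repeated for every row.

```python
def index_add_last_dim(result, index, source):
    """Reference semantics for dim=-1: out[i][index[j]] += source[i][j]."""
    out = [row[:] for row in result]
    for j, target in enumerate(index):   # serial over the index
        for i in range(len(out)):        # work over the slice source[:, j]
            out[i][target] += source[i][j]
    return out


def scatter_add_last_dim(result, index2d, source):
    """scatter_add semantics for dim=-1: out[i][index2d[i][j]] += source[i][j]."""
    out = [row[:] for row in result]
    for i in range(len(out)):            # outer dimension: rows are independent
        for j in range(len(source[i])):  # inner dimension: serial, so no write conflicts
            out[i][index2d[i][j]] += source[i][j]
    return out


def index_add_via_scatter_add(result, index, source):
    """Repeat the 1-D index for every row, then reuse scatter_add."""
    expanded = [list(index) for _ in source]
    return scatter_add_last_dim(result, expanded, source)


base = [[0, 0, 0], [10, 10, 10]]
idx = [2, 0, 2]
src = [[1, 2, 3], [4, 5, 6]]
direct = index_add_last_dim(base, idx, src)
via_scatter = index_add_via_scatter_add(base, idx, src)
```
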

### Testing

- Single core:

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.82E-03 | 2.11E-03
[10, 128, 50, 50] | 0.023604 | 0.023794

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 9.30E-04 | 1.66E-03
[10, 128, 50, 50] | 0.005995 | 0.010003

- Single socket (28 cores):

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.96E-03 | 2.52E-03
[10, 128, 50, 50] | 0.012208 | 0.012568

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 7.44E-05 | 1.33E-04
[10, 128, 50, 50] | 0.000333 | 0.000469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88729
Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/malfet
2022-12-28 12:04:17 +00:00
PyTorch MergeBot
eadd557266 Revert "use scatter_add for index_add when dim is the most inner dim (#88729)"
This reverts commit 68e9da68cb.

Reverted https://github.com/pytorch/pytorch/pull/88729 on behalf of https://github.com/atalman due to Break internal build
2022-12-22 18:06:45 +00:00
ecao
68e9da68cb use scatter_add for index_add when dim is the most inner dim (#88729)
### Motivation
When dim is -1 and the slice of source or result is noncontiguous, the original `index_add` is slow because it uses add on the sliced tensor, which is serial over the index and parallel over the sliced tensor to avoid write conflicts. Parallelizing over the sliced tensor is not optimal: the sliced tensor may be too small to parallelize well, and this approach also causes multiple parallel regions.

`scatter_add` is used to speed up this case because it parallelizes over the outer dimensions of the input and is serial over the inner dimension to avoid write conflicts. `scatter_add` needs only one parallel region, and the outer dimensions are larger, which suits parallel execution.

### Testing

- Single core:

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.82E-03 | 2.11E-03
[10, 128, 50, 50] | 0.023604 | 0.023794

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 9.30E-04 | 1.66E-03
[10, 128, 50, 50] | 0.005995 | 0.010003

- Single socket (28 cores):

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.96E-03 | 2.52E-03
[10, 128, 50, 50] | 0.012208 | 0.012568

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 7.44E-05 | 1.33E-04
[10, 128, 50, 50] | 0.000333 | 0.000469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88729
Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/malfet
2022-12-22 01:13:35 +00:00
PyTorch MergeBot
3194281ca7 Revert "use scatter_add for index_add when dim is the most inner dim (#88729)"
This reverts commit 13dbad6369.

Reverted https://github.com/pytorch/pytorch/pull/88729 on behalf of https://github.com/desertfire due to causing inductor test failure
2022-12-20 15:19:54 +00:00
ecao
13dbad6369 use scatter_add for index_add when dim is the most inner dim (#88729)
### Motivation
When dim is -1 and the slice of source or result is noncontiguous, the original `index_add` is slow because it uses add on the sliced tensor, which is serial over the index and parallel over the sliced tensor to avoid write conflicts. Parallelizing over the sliced tensor is not optimal: the sliced tensor may be too small to parallelize well, and this approach also causes multiple parallel regions.

`scatter_add` is used to speed up this case because it parallelizes over the outer dimensions of the input and is serial over the inner dimension to avoid write conflicts. `scatter_add` needs only one parallel region, and the outer dimensions are larger, which suits parallel execution.

### Testing

- Single core:

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.82E-03 | 2.11E-03
[10, 128, 50, 50] | 0.023604 | 0.023794

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 9.30E-04 | 1.66E-03
[10, 128, 50, 50] | 0.005995 | 0.010003

- Single socket (28 cores):

Before:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 2.96E-03 | 2.52E-03
[10, 128, 50, 50] | 0.012208 | 0.012568

After:

shape | fp32 / s | bf16 / s
-- | -- | --
[10, 128, 20, 20] | 7.44E-05 | 1.33E-04
[10, 128, 50, 50] | 0.000333 | 0.000469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88729
Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/malfet
2022-12-20 13:12:36 +00:00
Yanbo Liang
511fbad830 [Dynamo] Fix builder for class with metaclass (#90807)
Fixes a Meta-internal use case: a class with a metaclass can't be identified as ```UserDefinedClassVariable```.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90807
Approved by: https://github.com/jansel
2022-12-20 05:02:28 +00:00
Edward Z. Yang
e686a442b4 If a torch.* returns non-Tensor, make this unimplemented rather than assert. (#89918)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89918
Approved by: https://github.com/albanD
2022-12-15 21:53:54 +00:00
Edward Z. Yang
283cf718ed Fix _fix_weakref memory leak (#90823)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90823
Approved by: https://github.com/eellison, https://github.com/albanD
2022-12-15 01:07:29 +00:00
Edward Z. Yang
cc504ce292 Restore test_warn_types (#90810)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90810
Approved by: https://github.com/ngimel
2022-12-14 05:15:32 +00:00
Bin Bao
7035bcdd0f [inductor] Enable test_torch (#90518)
Summary: Skipping failures in those tests so that CI can guard other
passing cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90518
Approved by: https://github.com/jansel
2022-12-13 16:21:35 +00:00
Yuxin Wu
5d8618dfbd Some memory saving in large unittests (#90148)
Two tests, test_large_cumsum and test_large_cumprod, use a lot of memory. This PR:
* Reduces their memory usage by avoiding `self.assertEqual` and a temporary Python variable.
* Marks their memory requirement with a decorator.

related to https://github.com/pytorch/pytorch/issues/84944
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90148
Approved by: https://github.com/soumith
2022-12-11 21:04:38 +00:00
Edward Z. Yang
2ad6ed8ac9 Fix some typed storage is deprecated warnings. (#89867)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89867
Approved by: https://github.com/albanD
2022-12-07 20:09:57 +00:00
eqy
f7520cb51e Reduce memory usage requirement of test_pdist_norm_large in test_torch.py (#90075)
Basically the same fix as #85373, `/usr/bin/time` indicates that the memory requirement on the host-side was actually ~64GiB before the workaround and ~30GiB after.

CC @ptrblck @davidberard98

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90075
Approved by: https://github.com/davidberard98
2022-12-03 05:28:21 +00:00
Yu, Guangye
4144ad16af add XPU backend to support torch.save and torch.load (#89679)
# Motivation
We need to add an XPU backend to support torch.save and torch.load when the parameter _use_new_zipfile_serialization=False.

# Solution
Our design wraps the data as a tensor and:
>1. uses an in-place copy for H2D
>2. directly calls tensor.to() for D2H.

This helps us:
>1. unify the generic code for all backends.
>2. support all the non-CPU device backends.

# Additional Context
No additional unit tests are needed.
test/test_serialization.py will cover this code change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89679
Approved by: https://github.com/ezyang
2022-11-30 20:38:02 +00:00
albanD
8713119c89 Stream actually overrides __new__ so we need to patch it as well (#89592)
Avoids
```
$ python foo.py
Traceback (most recent call last):
  File "foo.py", line 3, in <module>
    a = torch.cuda.Stream()
  File "/home/albandes/local/pytorch/3.8_debug_source/torch/cuda/streams.py", line 34, in __new__
    return super(Stream, cls).__new__(cls, priority=priority, **kwargs)
TypeError: object.__new__() takes exactly one argument (the type to instantiate)
```
And now gets
```
$ python foo.py
Traceback (most recent call last):
  File "foo.py", line 3, in <module>
    a = torch.cuda.Stream()
  File "/home/albandes/local/pytorch/3.8_debug_source/torch/cuda/streams.py", line 34, in __new__
    return super(Stream, cls).__new__(cls, priority=priority, **kwargs)
  File "/home/albandes/local/pytorch/3.8_debug_source/torch/cuda/_utils.py", line 44, in err_fn
    raise RuntimeError(
RuntimeError: Tried to instantiate dummy base class Stream

```
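
The difference between the two tracebacks can be reproduced with a small stand-alone sketch. It assumes a hypothetical `dummy_type` factory and a toy `Stream` subclass; it mirrors the idea of patching `__new__` in addition to `__init__`, not the actual code in `torch/cuda/_utils.py`.

```python
def dummy_type(name, patch_new=True):
    """Build a placeholder base class that refuses to be instantiated."""
    def err_init(self, *args, **kwargs):
        raise RuntimeError(f"Tried to instantiate dummy base class {type(self).__name__}")

    def err_new(cls, *args, **kwargs):
        raise RuntimeError(f"Tried to instantiate dummy base class {cls.__name__}")

    namespace = {"__init__": err_init}
    if patch_new:
        # Needed when a subclass overrides __new__ and forwards extra arguments.
        namespace["__new__"] = err_new
    return type(name, (object,), namespace)


def make_stream_class(patch_new):
    base = dummy_type("_StreamBase", patch_new=patch_new)

    class Stream(base):
        def __new__(cls, priority=0, **kwargs):
            # Forwards keyword arguments to the base class __new__.
            return super().__new__(cls, priority=priority, **kwargs)

    return Stream
```

With `patch_new=False` the forwarded keyword argument reaches `object.__new__`, which raises `TypeError`; with `patch_new=True` the clearer `RuntimeError` is raised instead.
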
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89592
Approved by: https://github.com/soumith
2022-11-29 21:43:23 +00:00
David Berard
a029ec2c88 Move gpu slow tests to sm86 (#87880)
NVFuser tests (which are slow tests) would be better to run on more
modern GPU hardware.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87880
Approved by: https://github.com/malfet
2022-11-29 19:29:59 +00:00
Nikita Karetnikov
57af0c8245 Bug fix: make sure copy_impl doesn't read out of bounds (#88544)
Fixes #88543.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88544
Approved by: https://github.com/lezcano
2022-11-16 13:23:38 +00:00
PyTorch MergeBot
8441443132 Revert "Add nondeterministic error for scatter (#88244)"
This reverts commit e940a2f8e2.

Reverted https://github.com/pytorch/pytorch/pull/88244 on behalf of https://github.com/mehtanirav due to Internal test failures
2022-11-10 23:56:49 +00:00
Kurt Mohler
ee28b865ee Deprecate TypedStorage, its derived classes, and all of their public methods (#85303)
Part of #85302

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85303
Approved by: https://github.com/ezyang
2022-11-08 18:11:01 +00:00
Kurt Mohler
e940a2f8e2 Add nondeterministic error for scatter (#88244)
Fixes #88096

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88244
Approved by: https://github.com/ezyang, https://github.com/mruberry
2022-11-04 20:23:59 +00:00
Nikolay Korovaiko
0f6304ef1e disable the out variants in test_cumprod test for inductor (#88328)
`out=` variants aren't supported by autograd and it's not a must fix, so disabling the test (https://github.com/pytorch/torchdynamo/issues/1798) for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88328
Approved by: https://github.com/desertfire
2022-11-03 16:52:37 +00:00
Nikolay Korovaiko
529ba076c6 add an exclude for test_constructor for inductor (#88143)
This test (https://github.com/pytorch/torchdynamo/issues/1800) fails since none of the c-tor ops support `pin_memory=True`. Natalia suggests it's not a priority to fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88143
Approved by: https://github.com/desertfire
2022-11-03 16:21:18 +00:00
Edward Z. Yang
f884e817d4 Make Python op registration work with torchdeploy/multipy (#87162)
See strategy at PythonOpRegistrationTrampoline.cpp for the
big picture.

Along the way, I made OperatorHandle support == and hashing,
and slightly changed the low level python_dispatch impl API
to disallow empty strings for dispatch key, which had the knock
on effect of requiring us to explicitly make sure we pass in
CompositeImplicitAutograd if we would have passed in "" (I didn't apply
this to the rest of the file because I'm lazy.)

Test strategy is we delete the logic for preventing Python op
registrations in torch from being skipped in a torchdeploy context
and show CI still works.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87162
Approved by: https://github.com/anjali411, https://github.com/bdhirsh
2022-11-03 12:56:44 +00:00
Philip Meier
bc73affdad prepare removal of deprecated functionality in torch.testing (#87969)
_Redo of #86586 with all BC breaking changes granularly placed into separate commits._

---

Per title. Deprecation happened on Feb 25, 2022 in c6f1bbc0ac, which made it into the 1.12 release. Since it is now 245 days later and the next release will be 1.14, the removals later in the stack comply with the [BC policy](https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#minimizing-the-disruption-of-bc-breaking-changes).
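
The 245-day figure can be sanity-checked with a little date arithmetic. The resulting calendar date is derived here for illustration; it is not stated in the message itself.

```python
from datetime import date, timedelta

deprecation_date = date(2022, 2, 25)  # date given in the message
elapsed = timedelta(days=245)         # "245 days later"

# The day on which exactly 245 days have passed since the deprecation.
reference_date = deprecation_date + elapsed
print(reference_date.isoformat())
```
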
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87969
Approved by: https://github.com/mruberry
2022-11-02 14:04:48 +00:00
Kurt Mohler
1dbc8ad3b7 Add Warning class and refactor C++ warnings to use it (#84101)
Also adds `TORCH_WARN_WITH` and `TORCH_WARN_DEPRECATION` macros

Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84101
Approved by: https://github.com/albanD
2022-10-18 20:02:42 +00:00
Natalia Gimelshein
1704256b10 Enables where to have cpu scalar args (#87022)
This is for decompositions only, no attempt made to have good performance for this case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87022
Approved by: https://github.com/ezyang, https://github.com/eellison, https://github.com/mruberry
2022-10-17 17:08:47 +00:00
Mikayla Gawarecki
afaee00fec Add python nested_tensor and as_nested_tensor constructors in torch.nested (#85593)
Remove `torch.nested_tensor` which has erroneous behavior wrt gradients (could be either leaf or not leaf). Introduce `torch.nested.nested_tensor` and `torch.nested.as_nested_tensor` in the vein of `torch.tensor` and `torch.as_tensor`. Done in nested `__init__.py` for now but can move to pybind in future (when we want to load from numpy/nested lists ).

Discussed offline with @cpuhrsch and pybind constructor (https://github.com/pytorch/pytorch/pull/85536) was more gnarly than expected, so we can move to that when we do need loading from numpy etc.

Differential Revision: [D39806622](https://our.internmc.facebook.com/intern/diff/D39806622)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85593
Approved by: https://github.com/drisspg, https://github.com/cpuhrsch
2022-09-28 20:15:02 +00:00