175 Commits

Author SHA1 Message Date
Kurt Mohler
3908ebca86 Test COW materialization in backward ops (#123593)
Part of #97856

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123593
Approved by: https://github.com/ezyang
2024-04-09 22:31:50 +00:00
Pearu Peterson
d895192e87 Fix zeros_like on sparse compressed fake tensors (#123084)
Fixes https://github.com/pytorch/pytorch/pull/117907#issuecomment-2025769663

Adds block compressed sparse tensors support to zeros_like

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123084
Approved by: https://github.com/amjames, https://github.com/peterbell10
2024-04-03 16:11:11 +00:00
Yakai Wang
4d5cdc2e1e Fix empty_like bug for sparse tensors. (#121900)
Fixes #121671

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121900
Approved by: https://github.com/pearu
2024-04-01 22:40:38 +00:00
Kurt Mohler
ca9606f809 Update COW OpInfo test to include kwargs and expected materialization (#122437)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122437
Approved by: https://github.com/ezyang
2024-03-24 06:07:30 +00:00
PyTorch MergeBot
c80601f35a Revert "Avoid COW materialize in conv, log sigmoid, repeat, group_norm, batch_norm (#121537)"
This reverts commit a2a88f39ee.

Reverted https://github.com/pytorch/pytorch/pull/121537 on behalf of https://github.com/kurtamohler due to flaky CI failures ([comment](https://github.com/pytorch/pytorch/pull/121537#issuecomment-2010937226))
2024-03-21 00:03:30 +00:00
Kurt Mohler
a2a88f39ee Avoid COW materialize in conv, log sigmoid, repeat, group_norm, batch_norm (#121537)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121537
Approved by: https://github.com/ezyang
2024-03-19 06:15:00 +00:00
lezcano
8a5a377190 Move doc links to point to main (#121823)
The previous links were pointing to an outdated branch

Command: `find . -type f -exec sed -i "s:docs/master:docs/main:g" {} +`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121823
Approved by: https://github.com/albanD, https://github.com/malfet
2024-03-15 19:49:37 +00:00
blorange-amd
b27d76949b [ROCm] Enable several fake_crossref UTs on ROCm (#121112)
Enabled unit tests:

test_ops::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_norm_subgradients_at_zero_cuda_float32
test_ops::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_norm_subgradients_at_zero_cuda_float32
test_ops::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_nuc_cuda_float32
test_ops::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_nuc_cuda_float32
test_ops::TestFakeTensorCUDA::test_fake_crossref_backward_amp_svd_cuda_float32
test_ops::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_svd_cuda_float32

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121112
Approved by: https://github.com/ezyang
2024-03-06 17:36:47 +00:00
Kurt Mohler
77aea289ae Add test to check that COW inputs are not materialized (#119507)
Part of #97856

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119507
Approved by: https://github.com/ezyang
ghstack dependencies: #120455
2024-03-01 05:05:28 +00:00
PyTorch MergeBot
dbe0967a0a Revert "Add test to check that COW inputs are not materialized (#119507)"
This reverts commit 2ebf2c88ba.

Reverted https://github.com/pytorch/pytorch/pull/119507 on behalf of https://github.com/izaitsevfb due to breaks xla jobs ([comment](https://github.com/pytorch/pytorch/pull/119507#issuecomment-1970022840))
2024-02-28 22:26:59 +00:00
Kurt Mohler
2ebf2c88ba Add test to check that COW inputs are not materialized (#119507)
Part of #97856

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119507
Approved by: https://github.com/ezyang
ghstack dependencies: #120455
2024-02-28 00:37:33 +00:00
lezcano
b97fa6ac30 Make roll a decomposition and remove its lowering (#119857)
We use the fact that we now propagate indexing properly to avoid having
to maintain two different implementations of the op. Doing this we also remove
a spurious guard on this op.

We move the ref into a decomp as we now use advanced indexing.
The only difference we did in the implementation is that we now use
advanced indexing rather than `torch.cat`.
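
As a rough illustration of the two styles, here is a pure-Python sketch on plain lists. It is an analogy only, not the actual decomposition: `roll_by_concat` and `roll_by_indexing` are hypothetical names, and the real code works on tensors.

```python
def roll_by_concat(xs, shift):
    """Roll a list by splitting it in two and concatenating (the `torch.cat` style)."""
    n = len(xs)
    if n == 0:
        return []
    k = shift % n
    # Move the last k elements to the front.
    return xs[n - k:] + xs[:n - k]


def roll_by_indexing(xs, shift):
    """Roll a list by gathering through a computed index (the advanced-indexing style)."""
    n = len(xs)
    # Output position i reads from input position (i - shift) mod n.
    idx = [(i - shift) % n for i in range(n)]
    return [xs[j] for j in idx]


data = [0, 1, 2, 3, 4]
rolled = roll_by_indexing(data, 2)
```

Both helpers agree for positive and negative shifts; the indexing form expresses the whole op as a single gather.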

We also remove it from core. Let's see how this goes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119857
Approved by: https://github.com/peterbell10, https://github.com/larryliu0820
ghstack dependencies: #119863, #119864
2024-02-16 19:14:39 +00:00
Andrew M. James
4625ecb858 Add decomp for linalg.cross (#119809)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119809
Approved by: https://github.com/lezcano, https://github.com/peterbell10
2024-02-16 09:58:38 +00:00
blorange-amd
df9b44436a [ROCm] Enable float16/complex32 fft tests on ROCm (#117296)
This PR is to enable float16/complex32 fft tests on ROCm.
Sample results are attached here:
[test_spectral_ops_results.log](https://github.com/pytorch/pytorch/files/13908533/test_spectral_ops_results.log)

test_decomp::TestDecompCUDA::test_comprehensive_fft*
test_decomp::TestDecompCUDA::test_quick_fft*
test_jit_fuser_te::TestNNCOpInfoCUDA::test_nnc_correctness_fft*
test_meta::TestMetaCUDA::test_dispatch_meta_inplace_fft*
test_meta::TestMetaCUDA::test_dispatch_meta_outplace_fft*
test_meta::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft*
test_meta::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft*
test_meta::TestMetaCUDA::test_meta_inplace_fft*
test_meta::TestMetaCUDA::test_meta_outplace_fft*
test_ops::TestCommonCUDA::test_complex_half_reference_testing_fft*
test_ops::TestCommonCUDA::test_python_ref__refs_fft*
test_ops::TestCommonCUDA::test_python_ref_executor__refs_fft*
test_ops::TestCommonCUDA::test_python_ref_meta__refs*
test_ops::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft*
test_schema_check::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft*
test_spectral_ops::TestFFTCUDA::test_empty_fft__refs_fft*
test_spectral_ops::TestFFTCUDA::test_empty_fft_fft*
test_spectral_ops::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft*
test_spectral_ops::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft*
test_spectral_ops::TestFFTCUDA::test_fft_round_trip_cuda*
test_spectral_ops::TestFFTCUDA::test_fft_type_promotion_cuda*
test_spectral_ops::TestFFTCUDA::test_fftn_round_trip_cuda*
test_spectral_ops::TestFFTCUDA::test_hfftn_cuda_float16
test_spectral_ops::TestFFTCUDA::test_ihfftn_cuda_float16
test_utils::TestDeviceUtilsCUDA::test_device_mode_ops_fft

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117296
Approved by: https://github.com/pruthvistony, https://github.com/malfet
2024-02-13 22:35:32 +00:00
Pearu Peterson
2c91e13afc Add lowerings to special functions (#119187)
As in the title.

In addition, the PR introduces infrastructure for lowerings of pointwise functions that have both cpp and triton implementations available.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119187
Approved by: https://github.com/peterbell10
2024-02-11 16:35:40 +00:00
Edward Z. Yang
9bce208dfb Replace follow_imports = silent with normal (#118414)
This is a lot of files changed! Don't panic! Here's how it works:

* Previously, we set `follow_imports = silent` for our mypy.ini configuration. Per https://mypy.readthedocs.io/en/stable/running_mypy.html#follow-imports, what this does is whenever we have an import to a module which is not listed as a file to be typechecked in mypy, we typecheck it as normal but suppress all errors that occurred in that file.
* When mypy is run inside lintrunner, the list of files is precisely the files covered by the glob in lintrunner.toml, but with files in excludes excluded.
* The top-level directive `# mypy: ignore-errors` instructs mypy to typecheck the file as normal, but ignore all errors.
* Therefore, it should be equivalent to set `follow_imports = normal`, if we put `# mypy: ignore-errors` on all files that were previously excluded from the file list.
* Having done this, we can remove the exclude list from .lintrunner.toml, since excluding a file from typechecking is baked into the files themselves.
* torch/_dynamo and torch/_inductor were previously in the exclude list, because they were covered by MYPYINDUCTOR. It is not OK to mark these as `# mypy: ignore-errors` as this will impede typechecking on the alternate configuration. So they are temporarily being checked twice, but I am suppressing the errors in these files as the configurations are not quite the same. I plan to unify the configurations so this is only a temporary state.
* There were some straggler type errors after these changes somehow, so I fixed them as needed. There weren't that many.

In the future, to start type checking a file, just remove the ignore-errors directive from the top of the file.

The codemod was done with this script authored by GPT-4:

```
import glob

exclude_patterns = [
    ...
]

for pattern in exclude_patterns:
    for filepath in glob.glob(pattern, recursive=True):
        if filepath.endswith('.py'):
            with open(filepath, 'r+') as f:
                content = f.read()
                f.seek(0, 0)
                f.write('# mypy: ignore-errors\n\n' + content)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118414
Approved by: https://github.com/thiagocrepaldi, https://github.com/albanD
2024-01-27 02:44:11 +00:00
Khushi Agrawal
5d2d21a7be [bfloat16][easy] kthvalue, median (#117279)
Fixes #109991
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117279
Approved by: https://github.com/Skylion007
2024-01-11 22:44:07 +00:00
Pearu Peterson
4a37f57c69 Add batched sparse CSR/CSC/BSR/BSC to sparse COO conversion support (#116206)
As in the title.

Fixes https://github.com/pytorch/pytorch/issues/104868

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116206
Approved by: https://github.com/amjames, https://github.com/lezcano, https://github.com/cpuhrsch
2024-01-07 19:42:02 +00:00
Aaron Gokaslan
a7902571be Add bfloat16 CUDA support to gamma unary functions (#116929)
Add bfloat16 support to unary gamma functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116929
Approved by: https://github.com/malfet
2024-01-07 02:07:55 +00:00
Aaron Gokaslan
bd10fea79a [BE]: Enable F821 and fix bugs (#116579)
Fixes #112371

I fixed as many of the bugs as I could. For a few, I could not figure out what the proper fix was, so I left them with noqas.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116579
Approved by: https://github.com/ezyang
2024-01-01 08:40:46 +00:00
wgb
71ec3edbf7 Enhance Opinfo to support privateuse1 (#116417)
Fixes OpInfo not supporting third-party devices when the current test framework instantiation method is privateuse1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116417
Approved by: https://github.com/albanD
2023-12-29 13:43:29 +00:00
Isuru Fernando
1f1ff629a8 Use parent class attribute supports_out for foreach_zero opinfo (#112778)
Instead of introducing a new has_no_out_of_place attribute
Also fixes foreach_copy tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112778
Approved by: https://github.com/lezcano
2023-11-22 18:00:44 +00:00
Joel Schlosser
afdc528520 Print the index and summary of the SampleInput that failed an OpInfo test (#99444)
Related to the Reproducible Testing BE project. Goal is to print out the sample input that failed an OpInfo test.

Crazy idea: to avoid requiring widespread changes across tests that use OpInfo sample inputs, return a new special iterator type from `OpInfo.sample_inputs()`, etc. that tracks the most recent item seen. If a test fails later on, print out this info to identify the sample that failed the test.

This solves the problem that the test framework currently has no concept of which sample input is being operated on.

This PR contains the following changes:
* New `TrackedInputIter` that wraps a sample inputs func iterator and tracks the most recent input seen in a `TrackedInput` structure
    * The information is stored in a dictionary on the test function itself, mapping `full test ID -> most recent TrackedInput`
* To determine the test function that is being run, we do some stack crawling hackery in `extract_test_fn_and_id()`
* Above applies only when one of the following is called: `OpInfo.sample_inputs()`, `OpInfo.error_inputs()`, `OpInfo.reference_inputs()`, and `OpInfo.conjugate_sample_inputs()`. This could easily be extended to `ModuleInfo`s and the sparse sample input funcs as well
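
The tracking idea in the list above can be sketched roughly as follows. This is a simplified stand-in, not the code from the PR: `TrackingIter` and `run_with_tracking` are hypothetical names, and the state lives in a plain dict passed in by the caller rather than on the test function found by stack crawling.

```python
class TrackedInput:
    """Holds the index and value of the most recently yielded item."""

    def __init__(self, index, val):
        self.index = index
        self.val = val


class TrackingIter:
    """Wraps an iterable and records the latest item in a caller-supplied dict."""

    def __init__(self, iterable, store, key):
        self._it = enumerate(iterable)
        self._store = store
        self._key = key

    def __iter__(self):
        return self

    def __next__(self):
        index, val = next(self._it)  # StopIteration propagates and ends the loop
        self._store[self._key] = TrackedInput(index, val)
        return val


def run_with_tracking(test_fn, samples, test_id):
    """Run test_fn over tracked samples and name the failing sample on error."""
    store = {}
    try:
        test_fn(TrackingIter(samples, store, test_id))
    except Exception as e:
        tracked = store.get(test_id)
        if tracked is None:
            raise  # failed before any sample was consumed
        raise RuntimeError(
            f"Caused by sample input at index {tracked.index}: {tracked.val!r}"
        ) from e
```

The test body only iterates as usual; the wrapper is what remembers which sample was in flight when the failure happened.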

Example output when a sample input causes a failure:
```
======================================================================
ERROR: test_foo_add_cpu_uint8 (__main__.TestFakeTensorCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_device_type.py", line 911, in test_wrapper
    return test(*args, **kwargs)
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_device_type.py", line 1097, in only_fn
    return fn(slf, *args, **kwargs)
  File "/home/jbschlosser/branches/reproducible_testing/test/test_ops.py", line 2211, in test_foo
    self.fail('Example failure')
AssertionError: Example failure

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_utils.py", line 2436, in wrapper
    method(*args, **kwargs)
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_device_type.py", line 414, in instantiated_test
    result = test(self, **param_kwargs)
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_device_type.py", line 917, in test_wrapper
    raise Exception(
Exception: Caused by sample input at index 2: SampleInput(input=Tensor[size=(5, 1), device="cpu", dtype=torch.uint8], args=TensorList[Tensor[size=(5,), device="cpu", dtype=torch.uint8]], kwargs={}, broadcasts_input=True, name='')

To execute this test, run the following from the base repo dir:
     python test/test_ops.py -k test_foo_add_cpu_uint8

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------------------------------------------------
```

This notably doesn't print the actual `SampleInput` values, as that's hard without fully reproducible random sample generation. I went down this path for a while and it seems infeasible without adding an untenable amount of overhead to set the random seed per SampleInput (see https://github.com/pytorch/pytorch/issues/86694#issuecomment-1614943708 for more details). For now, I am settling for at least spitting out the index and some metadata of the `SampleInput`, as it seems better than nothing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99444
Approved by: https://github.com/janeyx99
2023-11-21 23:08:35 +00:00
PyTorch MergeBot
5f0d72124e Revert "Print the index and summary of the SampleInput that failed an OpInfo test (#99444)"
This reverts commit e7f12b1eb0.

Reverted https://github.com/pytorch/pytorch/pull/99444 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause memory leak on CUDA job e7f12b1eb0 ([comment](https://github.com/pytorch/pytorch/pull/99444#issuecomment-1820491298))
2023-11-21 08:58:54 +00:00
Joel Schlosser
e7f12b1eb0 Print the index and summary of the SampleInput that failed an OpInfo test (#99444)
Related to the Reproducible Testing BE project. Goal is to print out the sample input that failed an OpInfo test.

Crazy idea: to avoid requiring widespread changes across tests that use OpInfo sample inputs, return a new special iterator type from `OpInfo.sample_inputs()`, etc. that tracks the most recent item seen. If a test fails later on, print out this info to identify the sample that failed the test.

This solves the problem that the test framework currently has no concept of which sample input is being operated on.

This PR contains the following changes:
* New `TrackedInputIter` that wraps a sample inputs func iterator and tracks the most recent input seen in a `TrackedInput` structure
    * The information is stored in a dictionary on the test function itself, mapping `full test ID -> most recent TrackedInput`
* To determine the test function that is being run, we do some stack crawling hackery in `extract_test_fn_and_id()`
* Above applies only when one of the following is called: `OpInfo.sample_inputs()`, `OpInfo.error_inputs()`, `OpInfo.reference_inputs()`, and `OpInfo.conjugate_sample_inputs()`. This could easily be extended to `ModuleInfo`s and the sparse sample input funcs as well

Example output when a sample input causes a failure:
```
======================================================================
ERROR: test_foo_add_cpu_uint8 (__main__.TestFakeTensorCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_device_type.py", line 911, in test_wrapper
    return test(*args, **kwargs)
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_device_type.py", line 1097, in only_fn
    return fn(slf, *args, **kwargs)
  File "/home/jbschlosser/branches/reproducible_testing/test/test_ops.py", line 2211, in test_foo
    self.fail('Example failure')
AssertionError: Example failure

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_utils.py", line 2436, in wrapper
    method(*args, **kwargs)
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_device_type.py", line 414, in instantiated_test
    result = test(self, **param_kwargs)
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_device_type.py", line 917, in test_wrapper
    raise Exception(
Exception: Caused by sample input at index 2: SampleInput(input=Tensor[size=(5, 1), device="cpu", dtype=torch.uint8], args=TensorList[Tensor[size=(5,), device="cpu", dtype=torch.uint8]], kwargs={}, broadcasts_input=True, name='')

To execute this test, run the following from the base repo dir:
     python test/test_ops.py -k test_foo_add_cpu_uint8

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------------------------------------------------
```

This notably doesn't print the actual `SampleInput` values, as that's hard without fully reproducible random sample generation. I went down this path for a while and it seems infeasible without adding an untenable amount of overhead to set the random seed per SampleInput (see https://github.com/pytorch/pytorch/issues/86694#issuecomment-1614943708 for more details). For now, I am settling for at least spitting out the index and some metadata of the `SampleInput`, as it seems better than nothing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99444
Approved by: https://github.com/janeyx99
2023-11-21 00:11:20 +00:00
CaoE
455241bbd3 Add Half for atan2, logaddexp, logaddexp2, hypot, and nextafter on CPU (#112138)
Add Half for atan2, logaddexp, logaddexp2, hypot, and nextafter on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112138
Approved by: https://github.com/cpuhrsch
2023-11-06 06:01:29 +00:00
CaoE
26b5e27ace Add Half support for cummax, cummin, cumprod, logcumsumexp, and prod on CPU (#112132)
Add Half support for cummax, cummin, cumprod, logcumsumexp, and prod on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112132
Approved by: https://github.com/cpuhrsch
2023-11-05 12:31:38 +00:00
CaoE
a310cc8968 Add Half support for kthvalue, cross, hist, and logit on CPU (#112135)
Add Half support for kthvalue, cross, hist, and logit on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112135
Approved by: https://github.com/cpuhrsch
2023-10-31 09:12:47 +00:00
Nikita Shulga
328a4c5475 [BE] Enhance OpInfo.supported_dtype (#111995)
The current implementation is error-prone: it accepts any object, but does not report an error if device_type is not recognized.

Remediate it by accepting both device types and device identifiers (either a `torch.device` instance or a "{device_type}:{ordinal}" string).
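
A minimal sketch of this kind of normalization, handling strings only. `device_type_of` and `KNOWN_DEVICE_TYPES` are hypothetical names for illustration, not part of OpInfo, and `torch.device` instances are left out to keep the sketch dependency-free.

```python
# Hypothetical set of recognized device types, for illustration only.
KNOWN_DEVICE_TYPES = {"cpu", "cuda"}


def device_type_of(device):
    """Return the device type for a bare type ("cuda") or an identifier ("cuda:0")."""
    if not isinstance(device, str):
        raise TypeError(f"expected a device string, got {type(device).__name__}")
    # Split "{device_type}:{ordinal}" into its parts; sep is "" when no colon is present.
    device_type, sep, ordinal = device.partition(":")
    if device_type not in KNOWN_DEVICE_TYPES:
        raise ValueError(f"unrecognized device type: {device_type!r}")
    if sep and not ordinal.isdigit():
        raise ValueError(f"invalid device ordinal: {ordinal!r}")
    return device_type
```

The point is that an unrecognized argument fails loudly instead of being silently accepted.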

Fixes https://github.com/pytorch/pytorch/issues/111179

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111995
Approved by: https://github.com/albanD
2023-10-27 19:42:01 +00:00
Cao E
1c89ea7f72 Add Half support for softmax and log_softmax on CPU (#103315)
Add Half support for softmax and log_softmax on CPU.
Note: This introduces a correctness issue with MPS https://github.com/pytorch/pytorch/issues/111416 and https://github.com/pytorch/pytorch/issues/111479.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103315
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki, https://github.com/malfet
2023-10-26 08:38:54 +00:00
Aaron Gokaslan
cb856b08b2 [BE]: Attach cause to some exceptions and enable RUFF TRY200 (#111496)
Did some easy fixes from enabling TRY200. Most of these seem like oversights instead of intentional. The proper way to silence intentional errors is with `from None` to note that you thought about whether it should contain the cause and decided against it.
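
For reference, the two patterns look roughly like this. The functions are made-up examples, not code touched by this PR.

```python
def parse_port(text):
    """Chain the original error so the traceback shows what really went wrong."""
    try:
        return int(text)
    except ValueError as e:
        raise RuntimeError(f"invalid port: {text!r}") from e


def lookup(table, key):
    """Deliberately drop the cause: the KeyError adds nothing for the caller."""
    try:
        return table[key]
    except KeyError:
        raise LookupError(f"no entry for {key!r}") from None
```

With `from e`, the original exception is stored as `__cause__`; with `from None`, the implicit context is suppressed when the traceback is printed.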

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111496
Approved by: https://github.com/malfet
2023-10-19 21:56:36 +00:00
CaoE
2a40b7efcb Add Half support for addcmul, addcdiv, cumsum, and topk on CPU (#103319)
Add Half support for addcmul, addcdiv, cumsum, and topk on CPU.
Note: This PR will introduce the issue https://github.com/pytorch/pytorch/issues/111454.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103319
Approved by: https://github.com/jgong5, https://github.com/cpuhrsch
2023-10-19 17:47:45 +00:00
Philip Meier
973c87b320 raise instead of skip in test/test_meta.py (#110939)
Supersedes #109004.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110939
Approved by: https://github.com/lezcano, https://github.com/kurtamohler
2023-10-17 10:17:43 +00:00
CaoE
9399e0b1ff add fp16 support for gemm (#99498)
### Testing

Native matmul vs. mkldnn matmul on SPR (with avx512_fp16 support)

single core:

Input | Naïve impl (ms) | oneDNN (ms) | Speedup
-- | -- | -- | --
M: 128, N: 128, K: 128, trans_a: False, trans_b: False | 2010.387 | 64.700 | 31.072
M: 128, N: 256, K: 128, trans_a: False, trans_b: False | 4027.116 | 107.780 | 37.364
M: 8192, N: 768, K: 768, trans_a: False, trans_b: False | 28685868.488 | 90663.008 | 316.401

56 cores:

Input | Naïve impl (ms) | oneDNN (ms) | Speedup
-- | -- | -- | --
M: 128, N: 128, K: 128, trans_a: False, trans_b: False | 5.091 | 0.24 | 211.30
M: 128, N: 128, K: 128, trans_a: False, trans_b: True | 5.224 | 0.23 | 220.09
M: 128, N: 256, K: 128, trans_a: False, trans_b: False | 10.006 | 0.30 | 330.31
M: 8192, N: 768, K: 768, trans_a: False, trans_b: False | 29435.372 | 1.770 | 1662.80
M: 8192, N: 768, K: 768, trans_a: False, trans_b: True | 31464.961 | 1.728 | 18204.76
M: 8192, N: 768, K: 3072, trans_a: False, trans_b: False | 115035.849 | 7.990 | 14396.90
M: 8192, N: 768, K: 3072, trans_a: False, trans_b: True | 122981.023 | 7.725 | 15918.34
Batch: 768, M: 128, N: 64, K: 128 | 2032.523 | 0.705 | 2882.23

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99498
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-09-28 01:03:50 +00:00
Peter Bell
d796518485 [refs] Fix size check from #108360 (#109083)
PR #108360 uses the same default `last_dim_size` formula from complex-to-real (C2R) transforms for
complex-to-complex (C2C) and real-to-complex (R2C). However, this is not correct because for C2R
the input is only half the size of the full tensor, which is not the case for C2C and R2C.

This error is mostly benign since `last_dim_size` was only used for the `>= 1` condition which is
almost always met anyway.

For this PR I now use it as the argument to `_apply_norm` which makes it load-bearing for correctness
and so is thoroughly tested now.
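
To make the distinction concrete, here is a small sketch of the defaults. `default_last_dim_size` is a hypothetical helper, not the function in the refs; the C2R formula is the documented default output length of `irfft`, `2 * (n - 1)`.

```python
def default_last_dim_size(input_size, transform):
    """Default signal length along the transformed dim when none is given."""
    if transform == "c2r":
        # The one-sided complex input holds roughly half of the full signal.
        return 2 * (input_size - 1)
    if transform in ("c2c", "r2c"):
        # The input already has the full signal length.
        return input_size
    raise ValueError(f"unknown transform: {transform!r}")
```

For an input of size 5 along the last dim, this gives 8 for C2R but 5 for C2C and R2C, so reusing the C2R formula would pass the wrong size to the normalization.
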
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109083
Approved by: https://github.com/lezcano
2023-09-27 23:59:29 +00:00
Jane Xu
0a60219fe3 [foreach] Fix 0-size handling for real for real (#109402)
@crcrpar's last attempt to fix the 0-size problem unfortunately did not pass all cases. See my comment in https://github.com/pytorch/pytorch/issues/100701. When we have a tail tensor of size 0, the old code would mess with the chunk logic to check the previous tensor's length. This is flawed because:
1. if the previous tensor was also 0-sized (so a tensor list of [tensor, tensor, tensor, ..., 0-sized tensor, 0-sized tensor]), chunks would still be 0 and the nested for loop would be missed.
2. the nested for loop produces side effects on tensorListMeta that _shouldn't_ be there! This can mess up the compute in unexpected ways that I haven't really needed to reason through.

We noticed that the problem had not been fixed due to an internal report. This PR solves the issue by:
- removing the finagling of chunks when the tail tensor is 0-sized
- adding a surefire way for the kernel to be launched in the case where the last tensor is 0-sized AND there's content in the metadata, signifying there is stuff to compute still.

## test plan

As I went through the code, I also added some comments explaining what's up and modified our tensor inputs to ensure that this case is tested in the test_parity test in test_foreach.py. Yes, I do realize there is quite a bit of duplication and that this file could be due for a refactor. That said, the primary goal of this PR is to fix the pretty egregious bug and refactoring can be a followup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109402
Approved by: https://github.com/albanD
2023-09-26 17:38:20 +00:00
jjsjann123
0d3db1048a remove nvfuser test in upstream pytorch (#109918)
Removing nvfuser related tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109918
Approved by: https://github.com/msaroufim
2023-09-24 13:49:37 +00:00
Aaron Gokaslan
6d725e7d66 [BE]: enable ruff rules PLR1722 and PLW3301 (#109461)
Enables two ruff rules derived from pylint:
* PLR1722 replaces any exit() calls with sys.exit(). exit() is only designed to be used in REPL contexts, as it may not always be imported by default. The rule always uses the version in the sys module, which is better.
* PLW3301 replaces nested min / max calls with simplified versions (i.e. `min(a, min(b, c))` => `min(a, b, c)`). The new version is more idiomatic and more efficient.
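
A small made-up example of code that satisfies both rules (the function names are illustrative, not from the codebase):

```python
import sys


def smallest(a, b, c):
    # PLW3301: one flat call instead of min(a, min(b, c)).
    return min(a, b, c)


def main(argv):
    if len(argv) != 4:
        # PLR1722: sys.exit is always available, unlike the site-provided exit().
        sys.exit(2)
    return smallest(*(int(x) for x in argv[1:]))
```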

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109461
Approved by: https://github.com/ezyang
2023-09-18 02:07:21 +00:00
Masaki Kozuki
602413a0a0 Refactor test_foreach.py (#107869)
## Summary
- Change the default of `supports_autograd` and `supports_forward_ad` of `ForeachFuncInfo` to `True`
- Add `test_zero_size_tensor_inputs` to make sure that foreach functions can handle 0-size Tensor inputs
- Add `test_parity` to check the consistency between outputs of foreach and for-loop of native function.
- Add `test_autodiff` to check forward-mode and reverse-mode AD
- Keep the corner cases that are not covered by the newly introduced methods

rel:
- #58833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107869
Approved by: https://github.com/janeyx99
2023-09-14 19:39:26 +00:00
Kiarash Jamali
fb288aa99b Add Bfloat16 support to CrossKernel.cu (#108941)
Fixes #108940
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108941
Approved by: https://github.com/mikaylagawarecki
2023-09-11 19:05:01 +00:00
ekamiti
0f88d93b10 decomposition spectral ops fixes (#108360)
Fixes https://github.com/pytorch/pytorch/issues/105986, https://github.com/pytorch/pytorch/issues/108204, https://github.com/pytorch/pytorch/issues/108205

Fix all issues flagged when making changes for https://github.com/pytorch/pytorch/pull/107421

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108360
Approved by: https://github.com/ezyang
2023-09-09 04:48:09 +00:00
ekamiti
0ef2556351 Update sparse_funcs to include primtorch types (#107421)
Fixes #107335.

A few issues have been identified while enabling this test and filed:
https://github.com/pytorch/pytorch/issues/105986
https://github.com/pytorch/pytorch/issues/108204
https://github.com/pytorch/pytorch/issues/108205

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107421
Approved by: https://github.com/ezyang
2023-09-05 14:34:48 +00:00
lezcano
239fed7e1e Add reference for linalg.vecdot (#108188)
Was addressing https://github.com/pytorch/pytorch/issues/108127, but
then I realised that vecdot is already CompositeImplicit. Pushing anyway
as a short-and-sweet PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108188
Approved by: https://github.com/peterbell10
2023-08-31 15:30:23 +00:00
Ken Jin
7349e8c1a1 Don't use np.random for TorchDynamo (#108009)
Part of https://github.com/pytorch/pytorch/issues/107970

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108009
Approved by: https://github.com/lezcano
2023-08-28 17:18:40 +00:00
ekamiti
4a022e2185 Update unary_ufuncs groupings to include primtorch types. (#107345)
Fixes #107335. The skips were updated for the _ref ops to match those for eager mode where necessary. Part of breakdown of https://github.com/pytorch/pytorch/pull/104489.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107345
Approved by: https://github.com/ezyang
2023-08-23 22:45:19 +00:00
ekamiti
017499b078 Update reduction_ops groupings to include primtorch types (#107338)
Fixes https://github.com/pytorch/pytorch/issues/107335. The skips were updated for the _ref ops to match those for eager mode where necessary. Part of breakdown of https://github.com/pytorch/pytorch/pull/104489.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107338
Approved by: https://github.com/ezyang
2023-08-19 02:09:11 +00:00
Masaki Kozuki
b234b94760 Add in-place _foreach_copy (#107226)
Fixes #107162

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107226
Approved by: https://github.com/janeyx99
2023-08-17 00:11:18 +00:00
Ivan Yashchuk
c913f3857f Remove dynamo+nvfuser (#105789)
This PR removes unmaintained Dynamo+nvFuser.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105789
Approved by: https://github.com/jansel, https://github.com/jjsjann123, https://github.com/albanD
2023-08-08 22:29:32 +00:00
PyTorch MergeBot
891bb259f8 Revert "Remove dynamo+nvfuser (#105789)"
This reverts commit 6030151d37.

Reverted https://github.com/pytorch/pytorch/pull/105789 on behalf of https://github.com/DanilBaibak due to Break a lot of tests on main. ([comment](https://github.com/pytorch/pytorch/pull/105789#issuecomment-1669710571))
2023-08-08 14:20:32 +00:00
Ivan Yashchuk
6030151d37 Remove dynamo+nvfuser (#105789)
This PR removes unmaintained Dynamo+nvFuser.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105789
Approved by: https://github.com/jansel, https://github.com/jjsjann123, https://github.com/albanD
2023-08-08 13:29:31 +00:00