1436 Commits

Author SHA1 Message Date
Sun, Jiayi
c173a9d9b3 add Half support for layer_norm on CPU (#99590)
### Testing
Single socket (icx, 32cores):
| shape | fp32 forward (ms) | fp16 forward (ms) | mixed fp32 fp16 forward (ms) | fp32 backward (ms) | fp16 backward (ms) | mixed fp32 fp16 backward (ms) |
| -- | -- | -- | -- | -- | -- | -- |
| (1, 8, 16) | 0.012 | 0.011 | 0.011 | 0.051 | 0.051 | 0.050 |
| (8, 8, 16) | 0.013 | 0.013 | 0.013 | 0.054 | 0.053 | 0.051 |
| (32, 8, 16) | 0.015 | 0.014 | 0.014 | 0.059 | 0.054 | 0.052 |
| (64, 128, 56, 56) | 1.875 | 0.790 | 1.016 | 12.845 | 7.151 | 6.985 |
| (64, 128, 256, 256) | 50.226 | 25.462 | 35.736 | 328.957 | 179.615 | 175.618 |

Single core (icx):

| shape | fp32 forward (ms) | fp16 forward (ms) | mixed fp32 fp16 forward (ms) | fp32 backward (ms) | fp16 backward (ms) | mixed fp32 fp16 backward (ms) |
| -- | -- | -- | -- | -- | -- | -- |
| (1, 8, 16) | 0.012 | 0.011 | 0.011 | 0.040 | 0.041 | 0.041 |
| (8, 8, 16) | 0.012 | 0.012 | 0.012 | 0.042 | 0.042 | 0.042 |
| (32, 8, 16) | 0.027 | 0.014 | 0.014 | 0.048 | 0.048 | 0.046 |
| (64, 128, 56, 56) | 58.054 | 11.034 | 17.928 | 108.603 | 48.816 | 50.244 |
| (64, 128, 256, 256) | 1327.758 | 352.394 | 496.994 | 2846.182 | 1224.247 | 1218.422 |
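
To read the single-core forward timings as speedups rather than raw milliseconds, the fp32 time can be divided by the fp16 or mixed time per shape. This is a minimal sketch; the dictionary and the `speedup` helper are illustrative names and are not part of the PR.

```python
# Single-core forward timings in ms, copied from the table above:
# shape -> (fp32, fp16, mixed fp32/fp16)
single_core_forward_ms = {
    (1, 8, 16): (0.012, 0.011, 0.011),
    (8, 8, 16): (0.012, 0.012, 0.012),
    (32, 8, 16): (0.027, 0.014, 0.014),
    (64, 128, 56, 56): (58.054, 11.034, 17.928),
    (64, 128, 256, 256): (1327.758, 352.394, 496.994),
}


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


for shape, (fp32, fp16, mixed) in single_core_forward_ms.items():
    print(f"{shape}: fp16 {speedup(fp32, fp16):.2f}x, mixed {speedup(fp32, mixed):.2f}x")
```

The gain is largest for the big shapes; the two smallest shapes show little difference between dtypes.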

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99590
Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/cpuhrsch
2023-12-20 01:11:15 +00:00
eqy
d55365dc05 [CUDA] Workaround shmem limit for certain input sizes in AdaptiveAvgPool1D (#115231)
Reference issue #68248

CC @ptrblck @malfet @xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115231
Approved by: https://github.com/mikaylagawarecki
2023-12-19 22:40:10 +00:00
PyTorch MergeBot
c006c8b50e Revert "markDynamoStrictTest some more (#115885)"
This reverts commit 55ce4693ff.

Reverted https://github.com/pytorch/pytorch/pull/115885 on behalf of https://github.com/atalman due to OSSCI oncall, broke inductor ([comment](https://github.com/pytorch/pytorch/pull/115885#issuecomment-1858409669))
2023-12-15 19:51:24 +00:00
rzou
55ce4693ff markDynamoStrictTest some more (#115885)
Featuring
test_native_mha.py
test_nn.py
test_prims.py
test_schema_check.py
test_serialization.py
test_show_pickle.py
test_sort_and_select.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115885
Approved by: https://github.com/voznesenskym
ghstack dependencies: #115845, #115855, #115856, #115857, #115858, #115870, #115871, #115879
2023-12-15 13:19:52 +00:00
eqy
9056903b09 [CUDA] 64-bit indexing for avg_pool_backward (#114193)
Fixes #113833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114193
Approved by: https://github.com/malfet
2023-12-15 03:58:46 +00:00
Mikayla Gawarecki
f5919335db Fix _load_from_state_dict for num_batches_tracked in batchnorm (#115285)
I approved https://github.com/pytorch/pytorch/pull/110850 which did the following

Previously:
`num_batches_tracked` not in state_dict when doing `m.load_state_dict(state_dict)` --> always overwrite module's `num_batches_tracked` in `load_from_state_dict` with a 0 cpu tensor

Now:
`num_batches_tracked` not in state_dict loaded when doing `m.load_state_dict(state_dict)` --> only overwrite module's `num_batches_tracked`  in `load_from_state_dict` with a 0 cpu tensor if module does not have `num_batches_tracked`

This causes the following issue:

```
with torch.device('meta'):
     m = BatchNorm(...)
m.load_state_dict(state_dict, assign=True)
```

If `num_batches_tracked` is not in `state_dict`, since the module's `num_batches_tracked` is present on the meta device, it is not overwritten with a 0 cpu tensor. When compiling, this error is raised:

```
AssertionError: Does not support mixing cuda+meta
```

I am not sure whether the explicit check for meta device makes sense as a fix; I will add testing if this fix is OK.
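
The decision described above can be modelled without PyTorch. The sketch below is not the actual `_load_from_state_dict` code: `FakeTensor`, `resolve_num_batches_tracked` and the `check_meta` switch are hypothetical names, used only to contrast the behaviour after #110850 with the explicit meta-device check proposed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeTensor:
    """Stand-in for a tensor: only a value and a device string."""
    value: int
    device: str


def resolve_num_batches_tracked(
    loaded: Optional[FakeTensor],
    current: Optional[FakeTensor],
    check_meta: bool,
) -> FakeTensor:
    """Pick the tensor the module ends up with after loading.

    loaded:     entry from the state_dict, or None if the key is missing
    current:    the module's existing num_batches_tracked, or None
    check_meta: True models the proposed fix (explicit meta-device check)
    """
    if loaded is not None:
        return loaded
    if current is None:
        # Module has no counter yet: fall back to a zero CPU tensor.
        return FakeTensor(0, "cpu")
    if check_meta and current.device == "meta":
        # Meta tensors carry no data, so replace them as well.
        return FakeTensor(0, "cpu")
    return current


on_meta = FakeTensor(0, "meta")
print(resolve_num_batches_tracked(None, on_meta, check_meta=False).device)  # meta
print(resolve_num_batches_tracked(None, on_meta, check_meta=True).device)   # cpu
```

Without the check, the meta tensor survives loading, which is the situation that later produces the device-mixing assertion.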

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115285
Approved by: https://github.com/albanD
2023-12-07 22:48:26 +00:00
Jeff Daily
4c04ae2451 [ROCm] fix test_softmax_forward_64bit_indexing_cuda OOM (#113093)
TestNNDeviceTypeCUDA.test_softmax_forward_64bit_indexing_cuda started failing for ROCm after #112096 with the message

torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 13.35 GiB. GPU 0 has a total capacity of 31.98 GiB of which 3.89 GiB is free. Of the allocated memory 26.69 GiB is allocated by PyTorch, and 18.91 MiB is reserved by PyTorch but unallocated.

This amounts to approximately 41GB. The test is currently decorated with `largeTensorTest("30GB", "cuda")` but this is not sufficient for ROCm.
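
The estimate can be checked by adding up the figures from the error message. This is a rough sketch that treats the requested allocation, the memory already allocated by PyTorch, and the reserved-but-unallocated memory as additive; it lands at about 40 GiB, the same ballpark as the figure quoted above.

```python
MIB_PER_GIB = 1024

requested_gib = 13.35   # "Tried to allocate"
allocated_gib = 26.69   # "allocated by PyTorch"
reserved_mib = 18.91    # "reserved by PyTorch but unallocated"

total_gib = requested_gib + allocated_gib + reserved_mib / MIB_PER_GIB
print(f"peak requirement is roughly {total_gib:.2f} GiB")

# The decorator threshold quoted above, for comparison.
threshold_gb = 30
print(f"exceeds the {threshold_gb}GB threshold: {total_gib > threshold_gb}")
```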

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113093
Approved by: https://github.com/malfet
2023-11-07 03:00:37 +00:00
Eddie Yan
e39668770a [CUDA] 64-bit indexing fixes for cross-entropy kernels (#112096)
For #108345, #111484

Addresses the forward kernels implicated in the issues, but will take another look at the backward kernels (in follow-up PRs if necessary).

The spatial softmax kernel is changed to use signed integer indexing rather than unsigned as `ScalarType` only has signed integer types declared for now, but this should be a minor change.
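
Why 64-bit indexing matters can be shown with plain integers. The sketch below uses `ctypes` to mimic what happens when a linear element offset is stored in a signed 32-bit integer; the batch and class counts are hypothetical and chosen only so that the product exceeds the 32-bit range.

```python
import ctypes

INT32_MAX = 2**31 - 1


def as_int32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    return ctypes.c_int32(value).value


# Hypothetical problem size: batch x classes chosen to exceed INT32_MAX.
batch, classes = 70_000, 32_768
last_offset = batch * classes - 1  # linear offset of the final element

print(last_offset > INT32_MAX)                            # True: does not fit in 32 bits
print(as_int32(last_offset) < 0)                          # True: wraps to a negative offset
print(ctypes.c_int64(last_offset).value == last_offset)   # True: 64-bit is exact
```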

CC @ptrblck @crcrpar (who landed a few related PRs recently).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112096
Approved by: https://github.com/mikaylagawarecki
2023-11-06 17:37:08 +00:00
Tobias Ringwald
29716e865c Enforce both input tensor shapes of CosineEmbeddingLoss to be equal. (#112782)
…Added a test to prevent regressions.

Fixes #112732.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112782
Approved by: https://github.com/lezcano
2023-11-03 15:15:06 +00:00
Tristan Rice
013f622dd2 grid_sample: support bfloat16 (#112331)
This adds bfloat16 support to `torch.nn.functional.grid_sample`. This is particularly important when doing feature sampling, such as for rendering techniques used in PyTorch3d or for camera projections to voxel grids such as in SimpleBEV.

Related to #57707

Test plan:

```
pytest test/test_nn.py -k grid_sample
pytest test/test_ops.py -k grid_sample
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112331
Approved by: https://github.com/zou3519
2023-10-30 19:31:41 +00:00
Cao E
1c89ea7f72 Add Half support for softmax and log_softmax on CPU (#103315)
Add Half support for softmax and log_softmax on CPU.
Note: This introduces a correctness issue with MPS https://github.com/pytorch/pytorch/issues/111416 and https://github.com/pytorch/pytorch/issues/111479.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103315
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki, https://github.com/malfet
2023-10-26 08:38:54 +00:00
pbialecki
17b732eb04 increase CPU memory requirement for test_nll_loss_large (#110963)
Running `python test_nn.py -v -k test_nll_loss_large_tensor` on a machine with a small host RAM availability (e.g. ~50GB) fails with a `SIGKILL` even though the currently specified memory requirements for CPU (and GPU) are set to 48GB and are thus met.

Profiling the peak memory usage via:
```
\time -v python test_nn.py -v -k test_nll_loss_large_tensor
```
and adding `print(torch.cuda.memory_summary())` at the end of the test shows a higher host RAM usage of >100GB and a device memory usage of ~32GB.
```
	Command being timed: "python test_nn.py -v -k test_nll_loss_large_tensor"
	User time (seconds): 81.66
	System time (seconds): 229.02
	Percent of CPU this job got: 671%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:46.30
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 118150096
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 90280839
	Voluntary context switches: 1669
	Involuntary context switches: 1214548
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
```
```
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |  32769 MiB |  32769 MiB |  81923 MiB |  49154 MiB |
|       from large pool |  32768 MiB |  32768 MiB |  81921 MiB |  49152 MiB |
|       from small pool |      0 MiB |      0 MiB |      1 MiB |      1 MiB |
|---------------------------------------------------------------------------|
| Active memory         |  32769 MiB |  32769 MiB |  81923 MiB |  49154 MiB |
|       from large pool |  32768 MiB |  32768 MiB |  81921 MiB |  49152 MiB |
|       from small pool |      0 MiB |      0 MiB |      1 MiB |      1 MiB |
|---------------------------------------------------------------------------|
| Requested memory      |  32769 MiB |  32769 MiB |  81923 MiB |  49154 MiB |
|       from large pool |  32768 MiB |  32768 MiB |  81921 MiB |  49152 MiB |
|       from small pool |      0 MiB |      0 MiB |      1 MiB |      1 MiB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |  32774 MiB |  32774 MiB |  81938 MiB |  49164 MiB |
|       from large pool |  32772 MiB |  32772 MiB |  81930 MiB |  49158 MiB |
|       from small pool |      2 MiB |      2 MiB |      8 MiB |      6 MiB |
|---------------------------------------------------------------------------|
...
```
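
The `Maximum resident set size` line is reported in kbytes, so a quick conversion makes the ">100GB" observation easier to see. The sketch below also reads the peak RSS of the current process through the standard `resource` module as an in-process alternative to `time -v`; the helper name `kib_to_gib` is illustrative.

```python
import resource
import sys

KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib: int) -> float:
    """Convert kibibytes to gibibytes."""
    return kib / KIB_PER_GIB


# "Maximum resident set size (kbytes)" from the output above.
print(f"{kib_to_gib(118150096):.1f} GiB peak host RAM")

# Peak RSS of the current process; Linux reports KiB, macOS reports bytes.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
peak_kib = peak // 1024 if sys.platform == "darwin" else peak
print(f"this process peaked at {kib_to_gib(peak_kib):.3f} GiB")
```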

We haven't seen this issue before as the majority of our runners have sufficient host RAM and I just ran into it by chance.

CC @atalman @malfet @crcrpar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110963
Approved by: https://github.com/mikaylagawarecki, https://github.com/eqy, https://github.com/malfet
2023-10-25 23:45:47 +00:00
PyTorch MergeBot
5ce8002d24 Revert "Remove deprecated fbgemm operators (#104535)"
This reverts commit 57c7aa12db.

Reverted https://github.com/pytorch/pytorch/pull/104535 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/104535#issuecomment-1779650412))
2023-10-25 16:34:16 +00:00
Oleg Bulatov
192477b5ba Enable flake8-bugbear B020 lint (#110823)
Fixes part of https://github.com/pytorch/pytorch/issues/106571
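
For context, B020 flags loops whose control variable reuses the name of the iterable being looped over. The functions below are hypothetical and are not taken from the PyTorch codebase; they only illustrate the pattern and one way to fix it.

```python
# Pattern flagged by B020: the loop variable reuses the name of the iterable.
def total_flagged(values):
    total = 0
    for values in values:  # noqa: B020 - rebinding shown on purpose
        total += values
    # 'values' now refers to the last element, not the original list.
    return total, values


# Same loop with a distinct loop variable: the iterable stays intact.
def total_fixed(values):
    total = 0
    for value in values:
        total += value
    return total, values


print(total_flagged([1, 2, 3]))  # (6, 3)
print(total_fixed([1, 2, 3]))    # (6, [1, 2, 3])
```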

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110823
Approved by: https://github.com/Skylion007
2023-10-24 22:43:47 +00:00
FFFrog
0e0f6a248d Fix num_batches_tracked of BatchNorm when load_state_dict (#110850)
Fixes #110361

As the title shows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110850
Approved by: https://github.com/mikaylagawarecki
2023-10-24 04:20:38 +00:00
Peter Bell
57c7aa12db Remove deprecated fbgemm operators (#104535)
These operators are not used and have been deprecated since #72690 (Feb 2022). Additionally, the `torch.jit.quantized` interface has been deprecated since #40102 (June 2020).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104535
Approved by: https://github.com/ezyang
2023-10-22 06:10:09 +00:00
CaoE
54c28c564f add Half support for BatchNorm on CPU (#102070)
Fixes #106543

### Testing

Single core:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.7116 | 0.1427 | 0.1744 | 0.2638 | 0.2002 | 0.2556
(1, 32, 100, 100) | 0.8579 | 0.1725 | 0.2077 | 0.3023 | 0.2399 | 0.2995
(32, 16, 200, 200) | 57.3466 | 12.2179 | 13.1320 | 45.9524 | 24.1526 | 24.9882

28 cores:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.2571 | 0.0713 | 0.0846 | 0.1140 | 0.0883 |  0.1043
(1, 32, 100, 100) | 0.1077 | 0.0510 | 0.0548 | 0.0700 | 0.0645 | 0.0713
(32, 16, 200, 200) | 5.5060 | 1.4195 | 1.4663 | 6.773 | 3.0886 | 3.1343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki, https://github.com/mingfeima
2023-09-19 10:43:33 +00:00
lezcano
653c1564bf Fix broadcasting cosine_similarity (#109363)
Fixes https://github.com/pytorch/pytorch/issues/109333
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109363
Approved by: https://github.com/peterbell10
2023-09-15 17:12:35 +00:00
PyTorch MergeBot
b226373d16 Revert "add Half support for BatchNorm on CPU (#102070)"
This reverts commit b6a1d3fb97.

Reverted https://github.com/pytorch/pytorch/pull/102070 on behalf of https://github.com/clee2000 due to I'm very sorry but it looks like #106543 was not fixed, I still see it failing on main b6a1d3fb97 https://github.com/pytorch/pytorch/actions/runs/6185704949/job/16793975677 ([comment](https://github.com/pytorch/pytorch/pull/102070#issuecomment-1719747065))
2023-09-14 16:13:34 +00:00
CaoE
b6a1d3fb97 add Half support for BatchNorm on CPU (#102070)
Fixes #106543

### Testing

Single core:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.7116 | 0.1427 | 0.1744 | 0.2638 | 0.2002 | 0.2556
(1, 32, 100, 100) | 0.8579 | 0.1725 | 0.2077 | 0.3023 | 0.2399 | 0.2995
(32, 16, 200, 200) | 57.3466 | 12.2179 | 13.1320 | 45.9524 | 24.1526 | 24.9882

28 cores:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.2571 | 0.0713 | 0.0846 | 0.1140 | 0.0883 |  0.1043
(1, 32, 100, 100) | 0.1077 | 0.0510 | 0.0548 | 0.0700 | 0.0645 | 0.0713
(32, 16, 200, 200) | 5.5060 | 1.4195 | 1.4663 | 6.773 | 3.0886 | 3.1343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki
2023-09-14 12:23:59 +00:00
PyTorch MergeBot
04a765f95d Revert "add Half support for BatchNorm on CPU (#102070)"
This reverts commit 6065e7a97c.

Reverted https://github.com/pytorch/pytorch/pull/102070 on behalf of https://github.com/clee2000 due to sorry it looks like this is causing an unexpected success for `test_jit_fuser_te.py::TestNNCOpInfoCPU::test_nnc_correctness_nn_functional_batch_norm_cpu_float16` 6065e7a97c https://github.com/pytorch/pytorch/actions/runs/6178069462/job/16770849782 ([comment](https://github.com/pytorch/pytorch/pull/102070#issuecomment-1718402208))
2023-09-13 22:38:42 +00:00
CaoE
6065e7a97c add Half support for BatchNorm on CPU (#102070)
Fixes #106543

### Testing

Single core:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.7116 | 0.1427 | 0.1744 | 0.2638 | 0.2002 | 0.2556
(1, 32, 100, 100) | 0.8579 | 0.1725 | 0.2077 | 0.3023 | 0.2399 | 0.2995
(32, 16, 200, 200) | 57.3466 | 12.2179 | 13.1320 | 45.9524 | 24.1526 | 24.9882

28 cores:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.2571 | 0.0713 | 0.0846 | 0.1140 | 0.0883 |  0.1043
(1, 32, 100, 100) | 0.1077 | 0.0510 | 0.0548 | 0.0700 | 0.0645 | 0.0713
(32, 16, 200, 200) | 5.5060 | 1.4195 | 1.4663 | 6.773 | 3.0886 | 3.1343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki
2023-09-13 17:30:16 +00:00
Kurt Mohler
3f88e3105f Reland: Remove remaining global set_default_dtype calls from tests (#108088)
Fixes #68972

Relands #107246

To avoid causing Meta-internal CI failures, this PR avoids always asserting that the default dtype is float in the `TestCase.setUp/tearDown` methods. Instead, the assert is only done if `TestCase._default_dtype_check_enabled == True`. `_default_dtype_check_enabled` is set to True in the `if __name__ == "__main__":` blocks of all the relevant test files that have required changes for this issue
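
The opt-in pattern described above might look roughly like this. It is a sketch only: a module-level string stands in for the real default dtype (the actual tests query PyTorch), and `ExampleTest` is a hypothetical test class.

```python
import unittest

# Stand-in for the global default dtype; the real tests query torch instead.
_default_dtype = "float32"


class TestCase(unittest.TestCase):
    # Off by default so unrelated suites are unaffected.
    _default_dtype_check_enabled = False

    def _check_default_dtype(self):
        if self._default_dtype_check_enabled:
            assert _default_dtype == "float32", "default dtype was changed"

    def setUp(self):
        self._check_default_dtype()

    def tearDown(self):
        self._check_default_dtype()


class ExampleTest(TestCase):
    def test_nothing(self):
        self.assertTrue(True)


if __name__ == "__main__":
    # Opt in only when this file is run directly, as the PR describes.
    TestCase._default_dtype_check_enabled = True
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    unittest.TextTestRunner().run(suite)
```

Keeping the flag as a class attribute means a test file enables the check with a single assignment, while code that merely imports the base class sees no behaviour change.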

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108088
Approved by: https://github.com/ezyang
2023-09-07 03:04:34 +00:00
CaoE
8f02884569 add Half support for GroupNorm on CPU (#100234)
### Testing
Single socket (28cores):

* Contiguous:

shape | fp32 forward / s | mixed fp32 fp16 forward / s | fp32 backward / s | mixed fp32 fp16 backward / s
-- | -- | -- | -- | --
[10, 128, 10, 10] | 2.45E-05 | 3.26E-05 | 6.87E-05 | 7.40E-05
[10, 128, 80, 80] | 0.000726 | 0.000606 | 0.002183 | 0.001112

* Channels Last:

shape | fp32 forward / s | mixed fp32 fp16 forward / s | fp32 backward / s | mixed fp32 fp16 backward / s
-- | -- | -- | -- | --
[10, 128, 10, 10] | 2.88E-05 | 2.72E-05 | 6.56E-05 | 6.63E-05
[10, 128, 80, 80] | 0.00076 | 0.000256 | 0.002385 | 0.000735

Single core:

* Contiguous:

shape | fp32 forward / s | mixed fp32 fp16 forward / s | fp32 backward / s | mixed fp32 fp16 backward / s
-- | -- | -- | -- | --
[10, 128, 10, 10] | 9.47E-05 | 1.90E-04 | 2.03E-04 | 3.10E-04
[10, 128, 80, 80] | 6.25E-03 | 8.98E-03 | 0.016485 | 0.01369

* Channels Last:

shape | fp32 forward / s | mixed fp32 fp16 forward / s | fp32 backward / s | mixed fp32 fp16 backward / s
-- | -- | -- | -- | --
[10, 128, 10, 10] | 8.66E-05 | 7.89E-05 | 1.95E-04 | 1.43E-04
[10, 128, 80, 80] | 5.97E-03 | 3.13E-03 | 0.01626 | 8.70E-03

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100234
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki
2023-09-01 21:25:24 +00:00
Mikayla Gawarecki
3817de5d84 Fix layernorm cpu precision issues (#108089)
#108072

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108089
Approved by: https://github.com/mingfeima, https://github.com/albanD
2023-08-30 23:55:10 +00:00
Xia, Weiwen
97a291f6bd [ONEDNN][BC-breaking] update onednn from v2.7.3 to v3.1.1 (#97957)
**Summary**
Update onednn from v2.7.3 to v3.1.1.
It is bc-breaking as some APIs are changed on oneDNN side. Changes include:
- PyTorch code where oneDNN is directly called
- Submodule `third_party/ideep` to adapt to oneDNN's new API.
- CMAKE files to fix build issues.

**Test plan**
Building issues and correctness are covered by CI checks.
For performance, we have run TorchBench models to ensure there is no regression. Below is the comparison before and after oneDNN update.
![image](https://github.com/pytorch/pytorch/assets/12522207/415a4ff0-7566-40c6-aed0-24997a475b0e)

Note:
- Base commit of PyTorch: da322ea
- CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Ice Lake)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97957
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-08-25 12:13:18 +00:00
Aaron Gokaslan
660e8060ad [BE]: Update ruff to 0.285 (#107519)
This updates ruff to 0.285, which is faster and better, and fixes a bunch of false negatives with regard to f-strings.

I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :)
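
As an illustration of what RUF017 guards against, the snippet below contrasts `sum(..., [])` with two linear-time ways of flattening a list of lists. The `chunks` data is made up for the example.

```python
import functools
import itertools
import operator

chunks = [[1, 2], [3], [4, 5, 6]]

# Flagged by RUF017: each '+' builds a new list, so total work is quadratic.
flat_quadratic = sum(chunks, [])

# Linear alternatives: extend one accumulator in place, or chain the iterables.
flat_reduce = functools.reduce(operator.iadd, chunks, [])
flat_chain = list(itertools.chain.from_iterable(chunks))

print(flat_quadratic == flat_reduce == flat_chain)  # True
```

Because the `reduce` call starts from a fresh empty list, the in-place `iadd` never mutates the input lists.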

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-22 23:16:38 +00:00
PyTorch MergeBot
d59a6864fb Revert "[BE]: Update ruff to 0.285 (#107519)"
This reverts commit 88ab3e4322.

Reverted https://github.com/pytorch/pytorch/pull/107519 on behalf of https://github.com/ZainRizvi due to Sorry, but this PR breaks internal tests. @ezyang, can you please help them get unblocked? It seems like one of the strings was prob accidentally modified ([comment](https://github.com/pytorch/pytorch/pull/107519#issuecomment-1688833480))
2023-08-22 19:53:32 +00:00
Aaron Gokaslan
88ab3e4322 [BE]: Update ruff to 0.285 (#107519)
This updates ruff to 0.285, which is faster and better, and fixes a bunch of false negatives with regard to f-strings.

I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-20 01:36:18 +00:00
lcskrishna
bc662ffff9 [ROCm] Update ROCm skip decorators (#106138)
This PR adds a msg argument for skipIfRocm and skipCUDAIfRocm.
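
A decorator that accepts an optional message while still working in its bare form could be written along these lines. This is a sketch under assumptions, not the code from the PR: `TEST_WITH_ROCM` is a hard-coded stand-in flag, and the default message and test functions are illustrative.

```python
import functools
import unittest

# Stand-in flag; the real test suite derives this from the build.
TEST_WITH_ROCM = True


def skipIfRocm(func=None, *, msg="test doesn't currently work on the ROCm stack"):
    """Skip on ROCm; usable as @skipIfRocm or @skipIfRocm(msg="...")."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if TEST_WITH_ROCM:
                raise unittest.SkipTest(msg)
            return fn(*args, **kwargs)
        return wrapper

    # Bare use passes the function directly; parametrised use passes None.
    return decorator(func) if func is not None else decorator


@skipIfRocm
def test_default():
    return "ran"


@skipIfRocm(msg="hypothetical reason: kernel not ported yet")
def test_custom():
    return "ran"
```

Making `msg` keyword-only keeps existing `@skipIfRocm` call sites valid while letting new ones record why a test is skipped.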

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106138
Approved by: https://github.com/jataylo, https://github.com/jeffdaily, https://github.com/pruthvistony, https://github.com/albanD
2023-08-18 22:02:06 +00:00
Kurt Mohler
6af6b8f728 Reland: Remove set_default_dtype from nn tests (#107069)
Part of #68972
Relands #105775

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107069
Approved by: https://github.com/ezyang
2023-08-14 17:01:57 +00:00
PyTorch MergeBot
ec0f3fda7d Revert "Remove set_default_dtype from nn tests (#105775)"
This reverts commit 4d6a891baf.

Reverted https://github.com/pytorch/pytorch/pull/105775 on behalf of https://github.com/huydhn due to Sorry for reverting your change, it is failing one of the slow tests in trunk ([comment](https://github.com/pytorch/pytorch/pull/105775#issuecomment-1675460195))
2023-08-11 22:14:17 +00:00
Kurt Mohler
4d6a891baf Remove set_default_dtype from nn tests (#105775)
Part of #68972

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105775
Approved by: https://github.com/ezyang
2023-08-10 14:56:13 +00:00
Jason Lu
bc88028e8e Back out "Reland "Make adding buffers more like adding parameters (#104069)" (#106224)" (#106743)
Summary:
Original commit changeset: 81319beb97f3

Original Phabricator Diff: D47961182

Test Plan: revert to maintain backward compat with legacy ads_dper3 production package. Read details in: S357822

Reviewed By: atuljangra

Differential Revision: D48131623

@diff-train-skip-merge
(D48131623 landed internally)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106743
Approved by: https://github.com/malfet
2023-08-08 15:27:34 +00:00
Michael Gschwind
63d45275f4 is causal hints for transformer (#106143)
Summary:
make is_causal hint flags available for the top level transformer module.

It's debatable whether this is useful -- at present we autodetect causal masks for src and tgt masks in transformer encoder and decoder, respectively. Making is_causal flags available would enable users to short-cut this check by asserting whether their mask is causal or not.

I am putting this diff up for discussion, not as a solution.  Not doing anything may be the right solution, unless there is strong (data-driven) user demand. -- it appears the consensus is to move ahead with this, as per discussions below.

@cpuhrsch @mikaylagawarecki @jbschlosser @janEbert

Test Plan: sandcastle

Differential Revision: D47373260

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106143
Approved by: https://github.com/mikaylagawarecki
2023-08-04 14:16:48 +00:00
CaoE
f82e6ff29e add channel last 3d support for batch_norm on CPU (#97774)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97774
Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/mikaylagawarecki
2023-08-03 01:16:05 +00:00
Mikayla Gawarecki
c9be60cd0e Add error inputs to ModuleInfo (mirroring OpInfo) (#106325)
Add infra for error inputs to ModuleInfos, migrate first few error inputs tests from test_nn.py (more to come!)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106325
Approved by: https://github.com/albanD
2023-08-01 12:49:56 +00:00
Mikayla Gawarecki
d8e5f2aa6d Reland "Make adding buffers more like adding parameters (#104069)" (#106224)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106224
Approved by: https://github.com/atalman, https://github.com/albanD
2023-07-31 17:18:56 +00:00
Mikayla Gawarecki
ca7ece9b50 [easy] improve hint on error message in nn.Module.load_state_dict (#106042)
Fix #105963

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106042
Approved by: https://github.com/albanD
2023-07-27 19:56:02 +00:00
Nikita Karetnikov
eac9e1b35f [OpInfo] add reference and error inputs for multilabel_margin_loss (#105523)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105523
Approved by: https://github.com/ezyang
2023-07-23 02:16:29 +00:00
Aaron Gokaslan
6d43c89f37 [BE]: Update Ruff to 0.0.280 (#105724)
Removes unused loop values in Python dictionary iteration. Automated fix from Ruff master
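
The kind of rewrite this refers to can be illustrated with a small, made-up dictionary: when the value unpacked from `.items()` is never used, iterating over the keys directly is equivalent.

```python
sizes = {"weight": 128, "bias": 16}

# Before: the value is unpacked but never used.
names_before = [name for name, _size in sizes.items()]

# After: iterate the keys directly.
names_after = [name for name in sizes]

# When only values are needed, .values() avoids the unused key.
total = sum(sizes.values())

print(names_before == names_after, total)  # True 144
```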

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105724
Approved by: https://github.com/ezyang, https://github.com/janeyx99
2023-07-22 23:03:34 +00:00
Andrey Talman
c6653b65d8 Back out "Make adding buffers more like adding parameters (#104069)" (#105581)
Summary:
D47537831 is breaking pyper tests: https://fb.workplace.com/groups/802176577445480/posts/1018902842439518/

with `TypeError: register_buffer() takes 3 positional arguments but 4 were given`

Original commit changeset: d4b4069fbd38

Original Phabricator Diff: D47537831

Test Plan:
```
buck2 run //caffe2/torch/fb/training_toolkit/integration_tests/training_lifecycle/cogwheel_tests/pyper_release_v2:cogwheel_smallworld_inline_cvr_infer_pyper_pyper__canary_offline_training-launcher -- --run-harness-in-tupperware --build-fbpkg ads_dper3 --build-fbpkg training_platform
```

Reviewed By: atalman

Differential Revision: D47600140

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105581
Approved by: https://github.com/mikaylagawarecki
2023-07-20 03:39:53 +00:00
Justin Chu
73e1455327 [BE] Enable ruff's UP rules and autoformat test/ (#105434)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434
Approved by: https://github.com/albanD
2023-07-19 20:36:06 +00:00
Michael Gschwind
11b753af01 Refactor causal mask generation and detection for nn.transformer (#105265)
Summary:
* Create a private global-scope function _generate_subsequent because static class attribute member functions are not supported by TorchScript, resulting in torchscripting errors.
* Make TransformerEncoder and TransformerDecoder consistent w.r.t. is_causal handling by calling _detect_causal_mask
* Clarify documentation that is_causal is a hint
* Move causal mask detection into a method _detect_causal_mask
* only accept input-size compatible causal mask as causal mask
* update _generate_subsequent_causal_mask to include factory kwargs for dtype and device:
   avoid extra copies & conversions by passing directly to torch.full.
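
The generation and detection steps listed above can be sketched with plain nested lists standing in for tensors. The function names below are illustrative and do not match the private helpers in the PR; the point is only that a mask counts as causal when it equals the upper-triangular mask for the given input size.

```python
NEG_INF = float("-inf")


def generate_square_subsequent_mask(size):
    """Upper-triangular mask: -inf above the diagonal, 0.0 elsewhere."""
    return [[NEG_INF if col > row else 0.0 for col in range(size)]
            for row in range(size)]


def detect_causal_mask(mask, size):
    """Return True only if mask matches the causal mask for this input size."""
    if mask is None or len(mask) != size:
        return False
    return mask == generate_square_subsequent_mask(size)


seq_len = 3
mask = generate_square_subsequent_mask(seq_len)
print(detect_causal_mask(mask, seq_len))                                 # True
print(detect_causal_mask(generate_square_subsequent_mask(4), seq_len))   # False: size mismatch
print(detect_causal_mask([[0.0] * 3 for _ in range(3)], seq_len))        # False: nothing is masked
```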

Test Plan: sandcastle & github CICD
Continuation of #101487 (due to a tooling issue) which is a continuation-in-part of https://github.com/pytorch/pytorch/pull/98327 by @janEbert

Differential Revision: D47427117

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105265
Approved by: https://github.com/mikaylagawarecki
2023-07-19 01:26:50 +00:00
Danni Li
1b78f23a1a Allow nn.ChannelShuffle to run without erroring on CUDA tensors (#105351)
Summary: Include GPU support for `nn.ChannelShuffle` & update test.

Fix: #104603

Test Plan: Please see GitHub Actions.

Differential Revision: D47523764

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105351
Approved by: https://github.com/mikaylagawarecki
2023-07-18 16:24:30 +00:00
ekamiti
32d422f335 Make adding buffers more like adding parameters (#104069)
Add similar semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same as the `register_buffer` method has not been changed. The `persistent` parameter in the `Buffer` type is to indicate whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. Remaining changes are test changes to make sure that the `Buffer` type can be used as a drop in replacement for `register_buffer` as it just leads to `register_buffer` being called. The addition of this new functionality still allows for normal tensors to be used as buffers so these changes are intended to be backwards compatible.
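
The type-disambiguation idea can be shown without PyTorch. In the sketch below, `Module` and `Buffer` are simplified stand-ins, not the real `torch.nn` classes: assigning a `Buffer` to an attribute is routed to `register_buffer`, which itself is unchanged and can still be called directly.

```python
class Buffer:
    """Marker wrapper: tags a value as a buffer and records persistence."""
    def __init__(self, data, persistent=True):
        self.data = data
        self.persistent = persistent


class Module:
    def __init__(self):
        # Bypass __setattr__ while the bookkeeping containers are created.
        object.__setattr__(self, "_buffers", {})
        object.__setattr__(self, "_non_persistent", set())

    def register_buffer(self, name, data, persistent=True):
        self._buffers[name] = data
        if persistent:
            self._non_persistent.discard(name)
        else:
            self._non_persistent.add(name)

    def __setattr__(self, name, value):
        if isinstance(value, Buffer):
            # Assignment of a Buffer is routed to register_buffer.
            self.register_buffer(name, value.data, value.persistent)
        else:
            object.__setattr__(self, name, value)

    def __getattr__(self, name):
        # Only called when normal lookup fails; serve registered buffers.
        buffers = self.__dict__.get("_buffers", {})
        if name in buffers:
            return buffers[name]
        raise AttributeError(name)

    def state_dict(self):
        return {k: v for k, v in self._buffers.items()
                if k not in self._non_persistent}


m = Module()
m.running_mean = Buffer([0.0, 0.0])            # persistent by default
m.scratch = Buffer([1.0], persistent=False)    # excluded from state_dict
m.register_buffer("running_var", [1.0, 1.0])   # the existing API still works
print(sorted(m.state_dict()))  # ['running_mean', 'running_var']
```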

Fixes #35735

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069
Approved by: https://github.com/mikaylagawarecki
2023-07-17 17:59:05 +00:00
Nikita Karetnikov
0c89596e4f [OpInfo] add reference and error inputs for multi_margin_loss (#104850)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104850
Approved by: https://github.com/ezyang
2023-07-14 21:16:09 +00:00
yanbing-j
3fe2b73416 Update use_mkldnn in LSTM op to avoid input and parameter not in the same device (#102050)
This PR is to fix https://github.com/pytorch/pytorch/issues/101935.

Only when input, parameters and hidden states are all on the CPU device will LSTM go into the oneDNN fast path implementation. Otherwise, it will fall back to the original implementation.

Note here, if input and parameters are indeed not in the same device, it will encounter Error `Input and parameter tensors are not at the same device, found input tensor......` in `check_attributes`. Therefore, the proper usage of LSTM is `input.to(device)` and `model.to(device)` together.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102050
Approved by: https://github.com/XiaobingSuper, https://github.com/albanD
2023-07-13 01:13:59 +00:00
Masaki Kozuki
6929e9e947 Use int64_t accordingly in cunn_SoftMaxBackward to avoid int overflow (#104270)
Fixes #103501

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104270
Approved by: https://github.com/malfet, https://github.com/mikaylagawarecki
2023-06-30 21:39:46 +00:00
cyy
54cb61f7d9 enable ASAN on some tests (#103647)
Enables more tests under ASAN. The float-divide-by-zero and float-cast-overflow checks are disabled because they are also disabled by default in the latest clang.
The following cited doc explains the reasons.
```
-fsanitize=float-cast-overflow: Conversion to, from, or between floating-point types
which would overflow the destination. Because the range of representable values
for all floating-point types supported by Clang is [-inf, +inf], the only cases detected are
conversions from floating point to integer types.
-fsanitize=float-divide-by-zero: Floating point division by zero.
This is undefined per the C and C++ standards,
 but is defined by Clang (and by ISO/IEC/IEEE 60559 / IEEE 754) as producing
either an infinity or NaN value,
so is not included in -fsanitize=undefined.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103647
Approved by: https://github.com/kit1980
2023-06-28 02:17:14 +00:00