Commit Graph

67574 Commits

Author SHA1 Message Date
atalman
0bd5a3fed7 [releng] Docker release Refactor Push nightly tags step. Move cuda and cudnn version to docker tag rather than name (#116097)
Follow up after : https://github.com/pytorch/pytorch/pull/116070

This PR does 2 things.

1. Refactor the Push nightly tags step; extracting CUDA_VERSION is no longer needed. The new tag format is: ``${PYTORCH_VERSION}-cuda$(CUDA_VERSION_SHORT)-cudnn$(CUDNN_VERSION)-runtime``
2. Move cuda$(CUDA_VERSION_SHORT)-cudnn$(CUDNN_VERSION) from docker name to tag
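The tag composition above can be sketched as follows (the values below are hypothetical placeholders; the real workflow fills them in from the release build matrix):

```python
# Hypothetical placeholder values; the workflow derives the real ones
# from the release build matrix.
PYTORCH_VERSION = "2.1.2"
CUDA_VERSION_SHORT = "12.1"
CUDNN_VERSION = "8"

# CUDA/cuDNN versions now live in the tag, not the image name.
tag = f"{PYTORCH_VERSION}-cuda{CUDA_VERSION_SHORT}-cudnn{CUDNN_VERSION}-runtime"
print(tag)  # 2.1.2-cuda12.1-cudnn8-runtime
```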

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116097
Approved by: https://github.com/jeanschmidt
2023-12-19 13:53:08 +00:00
Carlos Mocholí
a31effa15f Update device_mesh.py docs imports (#116074)
These are not importable from `torch.distributed`, at least today.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116074
Approved by: https://github.com/wz337, https://github.com/fegin
2023-12-19 09:44:55 +00:00
eqy
2a44034895 [CUDA] Include <thrust/swap.h> in LinearAlgebra.cu (#116072)
Fixes build against the latest `NVIDIA/cccl`.

CC @malfet @xwang233 @ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116072
Approved by: https://github.com/malfet, https://github.com/xwang233
2023-12-19 05:56:52 +00:00
FFFrog
327bdcdb14 Some tiny modification about torch.set/get_default_device (#116014)
1. Fix a bug in torch.set_default_device under multi-threading
2. Add a new interface, torch.get_default_device
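The threading issue can be illustrated with a minimal thread-local sketch (illustrative only, not PyTorch's actual implementation):

```python
import threading

# A thread-local default device keeps one thread's set_default_device from
# leaking into another; without TLS, concurrent threads would race on a
# single global.
_state = threading.local()

def set_default_device(device):
    _state.device = device

def get_default_device():
    return getattr(_state, "device", "cpu")

set_default_device("cuda:0")
seen = []
t = threading.Thread(target=lambda: seen.append(get_default_device()))
t.start()
t.join()
print(get_default_device(), seen[0])  # cuda:0 cpu
```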

Fixes #115333
Fixes #115917

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116014
Approved by: https://github.com/malfet, https://github.com/jansel
2023-12-19 05:08:06 +00:00
wz337
b48abbc020 [DeviceMesh] Fix DeviceMesh docstring (#116053)
1. remove outdated comments
2. fix examples in docstring

Doc after fix:
<img width="706" alt="image" src="https://github.com/pytorch/pytorch/assets/31293777/19f4f03c-0fd7-4e88-bca1-1a6ce693fbb7">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116053
Approved by: https://github.com/wanchaol
2023-12-19 04:05:49 +00:00
Isuru Fernando
8b0122ad33 Add lowerings for reflection_pad{1, 3}d_backward (#115645)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115645
Approved by: https://github.com/lezcano, https://github.com/peterbell10
2023-12-19 04:05:10 +00:00
Nikita Shulga
9dda4b20a0 [MPS] Enable select/[broad]cast ops for complex dtypes (#115727)
By representing `torch.cfloat`/`torch.chalf` as `float2`/`half2` Metal types and modifying `SCATTER_OPS_TEMPLATE`/`GATHER_OPS_TEMPLATE` to accept a third argument: a fully specialized `cast` function that is a no-op for regular types but special-cased for float->complex and complex->float conversions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115727
Approved by: https://github.com/kulinseth
2023-12-19 02:25:28 +00:00
cyy
1544c37520 [7/N] Fixes clang-tidy warnings in c10/{core,util}/*.h (#115495)
This PR continues to fix clang-tidy warnings for headers in c10/core and c10/util.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115495
Approved by: https://github.com/malfet
2023-12-19 02:14:30 +00:00
CaoE
9b8f934068 Remove memory_format check for native_group_norm_backward (#115721)
To fix https://github.com/pytorch/pytorch/issues/115940.
Remove memory_format check for native_group_norm_backward.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115721
Approved by: https://github.com/mikaylagawarecki
2023-12-19 02:12:26 +00:00
Oguz Ulgen
01b979fc9a [Inductor] Fix constant folding and extern kernel mutation tracking bugs (#115908)
This PR fixes two bugs
1) Constant folding a triton kernel results in the kernel's inputs being returned back without any modification. Disable constant folding for triton kernels; needs more investigation.
2) NoneLayout buffers should not be deleted, as they do not exist

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115908
Approved by: https://github.com/aakhundov, https://github.com/jansel
2023-12-19 02:06:50 +00:00
Yanbo Liang
bb5a27052f [Dynamo][9/N] Make SkipFilesVariable wrap functions only (#115963)
Make ```SkipFilesVariable``` handle only function types, and route skipped classes to ```UserDefinedClassVariable```. The reasons behind this are:
* We'd like to remove ```is_allowed```, so the allowed/disallowed torch classes need a proper home. Under the current architecture we could put them in either ```SkipFilesVariable``` or ```UserDefinedClassVariable```, but it's confusing to have two places handle one thing.
   - Going forward, let's make ```SkipFilesVariable``` only handle functions, and probably I'll rename it to ```SkippedFunctionVariable``` in the following PRs.
   - Let's do dispatch by value's type, all torch classes stuff would go to ```UserDefinedClassVariable``` in the next PR.
* We'd like to merge the in_graph/skip/inline trace decisions into the same API ```trace_rules.lookup```, so we probably have to limit the input to functions only, to better organize the ```VariableBuilder._wrap``` logic.
   - Next step, I'll merge ```skipfiles.check``` into ```trace_rules.lookup```, and do the skipfile check before wrapping them into correct variable tracker.
   - Though the ```TorchCtxManagerClassVariable``` is decided by ```trace_rules.lookup```, I'll refactor it out in the following PRs.
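The dispatch-by-value's-type idea above can be sketched as follows (illustrative only; the tracker names mirror Dynamo's, but this is not the real ```VariableBuilder._wrap```):

```python
import types

def wrap(value):
    # Route plain functions and classes to different trackers, mirroring the
    # split between SkipFilesVariable (functions) and
    # UserDefinedClassVariable (classes) described above.
    if isinstance(value, types.FunctionType):
        return "SkipFilesVariable"
    if isinstance(value, type):
        return "UserDefinedClassVariable"
    return "OtherVariable"

def f():
    pass

class C:
    pass

print(wrap(f), wrap(C))  # SkipFilesVariable UserDefinedClassVariable
```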

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115963
Approved by: https://github.com/jansel
2023-12-19 02:01:47 +00:00
PyTorch MergeBot
47908a608f Revert "[ROCm] add hipblaslt support (#114329)"
This reverts commit b062ea3803.

Reverted https://github.com/pytorch/pytorch/pull/114329 on behalf of https://github.com/jeanschmidt due to Reverting due to inconsistencies on internal diff ([comment](https://github.com/pytorch/pytorch/pull/114329#issuecomment-1861933267))
2023-12-19 01:04:58 +00:00
PyTorch MergeBot
ed0c0c49ef Revert "[ROCm] fix nightly 5.6 build (#116029)"
This reverts commit 63e242b1e4.

Reverted https://github.com/pytorch/pytorch/pull/116029 on behalf of https://github.com/jeanschmidt due to Need to revert, in order to be able to revert #114329 ([comment](https://github.com/pytorch/pytorch/pull/116029#issuecomment-1861931736))
2023-12-19 01:01:42 +00:00
atalman
368a0c06d4 [releng] Docker Official release make sure cuda version is part of image name (#116070)
Follow up on https://github.com/pytorch/pytorch/pull/115949

Change docker build image name:
``pytorch:2.1.2-devel`` -> ``2.1.2-cuda12.1-cudnn8-devel`` and ``2.1.2-cuda11.8-cudnn8-devel``

Ref: https://github.com/orgs/pytorch/packages/container/package/pytorch-nightly

Naming will be the same as in https://hub.docker.com/r/pytorch/pytorch/tags
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116070
Approved by: https://github.com/huydhn, https://github.com/seemethere
2023-12-19 00:58:15 +00:00
Shubhraprakash Das
5894af83be Use dequantized weight and bias in conv2d quantized ops (#115615)
Summary:
Dequantize weight and bias for conv2d ops to improve performance. The weight and bias are usually small in size hence they do not increase memory footprint by a lot when dequantized.
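The dequantization step itself is the standard affine scheme (a sketch for illustration, not the Vulkan shader code):

```python
# Affine dequantization: a quantized value q is recovered as
# scale * (q - zero_point).
def dequantize(q_values, scale, zero_point):
    return [scale * (q - zero_point) for q in q_values]

# Weights and biases are small, so dequantizing them once up front adds
# little memory but removes per-inference dequantization work from the
# conv2d kernels.
weights = dequantize([130, 120, 128], scale=0.05, zero_point=128)
print(weights)
```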

With optimization (cunet-enc ops):
```
vulkan.quantized_conv2d  {96, 72, 2}                      3753204
vulkan.quantized_conv2d  {96, 72, 2}                      6977048
vulkan.quantized_conv2d_dw{96, 72, 2}                      2499640
vulkan.quantized_conv2d_pw_2x2{96, 72, 2}                       842088
vulkan.quantized_conv2d  {48, 36, 4}                      2388152
vulkan.quantized_conv2d  {48, 36, 4}                      4775940
vulkan.quantized_conv2d_dw{48, 36, 4}                       709800
vulkan.quantized_conv2d_pw_2x2{48, 36, 4}                       483236
vulkan.quantized_conv2d  {24, 18, 8}                      2562144
vulkan.quantized_conv2d  {24, 18, 8}                      5447624
vulkan.quantized_conv2d_dw{24, 18, 8}                       392756
vulkan.quantized_conv2d_pw_2x2{24, 18, 8}                       509080
```

Without optimization:
```
vulkan.quantized_conv2d  {96, 72, 2}                      4291768
vulkan.quantized_conv2d  {96, 72, 2}                      7871344
vulkan.quantized_conv2d_dw{96, 72, 2}                      2658500
vulkan.quantized_conv2d_pw_2x2{96, 72, 2}                       891020
vulkan.quantized_conv2d  {48, 36, 4}                      2966860
vulkan.quantized_conv2d  {48, 36, 4}                      5661812
vulkan.quantized_conv2d_dw{48, 36, 4}                       816556
vulkan.quantized_conv2d_pw_2x2{48, 36, 4}                       528632
vulkan.quantized_conv2d  {24, 18, 8}                      3139604
vulkan.quantized_conv2d  {24, 18, 8}                      6202820
vulkan.quantized_conv2d_dw{24, 18, 8}                       452660
vulkan.quantized_conv2d_pw_2x2{24, 18, 8}                       557388
```

Test Plan:
Ensure all vulkan quantize tests pass:
buck2 run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_quantized_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 --show-output
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
[==========] Running 78 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 78 tests from VulkanAPITest

...
[==========] 78 tests from 1 test suite ran. (1519 ms total)
[  PASSED  ] 78 tests.

buck2 run --target-platforms ovr_config//platform/macos:arm64-fbsource  //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 --show-output

Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
[==========] Running 395 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 395 tests from VulkanAPITest

...
[----------] 395 tests from VulkanAPITest (6515 ms total)

[----------] Global test environment tear-down
[==========] 395 tests from 1 test suite ran. (6515 ms total)
[  PASSED  ] 394 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log

  YOU HAVE 5 DISABLED TESTS

Reviewed By: yipjustin

Differential Revision: D50997532

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115615
Approved by: https://github.com/manuelcandales, https://github.com/yipjustin
2023-12-19 00:23:52 +00:00
Yue Dong
270ed13e87 [DTensor] Make DTensor from_local backward partial() to replicate() pass through (#115967)
Summary:
This change makes the `DTensor.from_local()` placements in backward pass from `Partial()` to `Replicate()` as pass through for following reasons:
1. When we run the backward pass of DTensor.from_local, if the target placement is `Partial()` (i.e., manually overwritten by user code rather than set via torch_dispatch), we keep the grad as `Replicate()`, because converting the gradients back to `Partial()` is meaningless.
2. The current div logic leads to incorrect numerical values in the above case.

Test Plan:
**CI**:
CI Tests

**Unit test**:
`buck2 test mode/dev-nosan //caffe2/test/distributed/_tensor:redistribute`
- Passed

**With model training**:
```
# We tested the case where the input tensor is manually overwritten as Partial()
# and the output tensor is manually overwritten to Shard(), then converted to local.

# Before the change: numerical value not correct
Forward pass:
    collective: ReduceScatter
backward pass:
    collective: AllGather + div by process group size

# After the change: div is removed as expected.
Forward pass:
    collective: ReduceScatter
Backward pass:
    collective: AllGather
```

Differential Revision: D52175709

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115967
Approved by: https://github.com/wanchaol
2023-12-19 00:16:10 +00:00
Philip Meier
3472a9200d expand subclass type tests in dynamo (#116024)
Following up on my own comments in https://github.com/pytorch/pytorch/pull/115323#pullrequestreview-1769491483.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116024
Approved by: https://github.com/mlazos
2023-12-19 00:08:55 +00:00
David Berard
054f9548b4 [dynamo] Store CompilationEvents in a buffer in torch._dynamo.utils (#115788)
Motivation: it would be nice to be able to test using the metrics in log_compilation_event; currently it dumps logs (or logs to a database in fbcode), and these are hard to use in unit tests.

This change:
* always record the information in torch._dynamo.utils.record_compilation_metrics; here, log into a limited-size deque to prevent the list of metrics from getting too long
* if config.log_compilation_metrics, then call back into the original log_compilation_event function

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115788
Approved by: https://github.com/yanboliang
2023-12-18 23:26:13 +00:00
drisspg
fc58909bab Fix allowed dtypes for mem_eff attention (#116026)
# Summary

Fix a bug in detecting mem-eff attention capability for CUDA devices older than sm80:
https://github.com/pytorch-labs/gpt-fast/issues/49

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116026
Approved by: https://github.com/janeyx99
2023-12-18 23:20:52 +00:00
drisspg
6b120c6cf9 Update the sdpa benchmark to measure forward backward time in isolation (#115986)
# Summary

The benchmarks were getting a little stale, and I think it makes more sense now to measure forward and backward in isolation rather than end-to-end in an MHA component.
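A minimal sketch of timing a callable in isolation (framework-agnostic; the real benchmark times the forward and backward of `scaled_dot_product_attention` separately):

```python
import time

def bench_us(fn, iters=100):
    # Average wall-clock time per call, in microseconds. Calling this once
    # with the forward closure and once with the backward closure yields
    # the two time columns separately.
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1e6
```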

This is a pre-req for getting the data for https://github.com/pytorch/pytorch/pull/115357

Output from run:
``` Shell
+------------+-----------+-----------+------------+-----------+-----------+----------------+--------------------+--------------------+
| batch_size | num_heads | q_seq_len | kv_seq_len | embed_dim | is_causal |     dtype      |    forward_time    |   backward_time    |
+------------+-----------+-----------+------------+-----------+-----------+----------------+--------------------+--------------------+
|     1      |    16     |    128    |    128     |   2048    |   True    | torch.bfloat16 | 23.86634959839284  | 66.21150835417211  |
|     1      |    16     |    128    |    128     |   2048    |   False   | torch.bfloat16 | 23.452017060481012 | 66.90612225793302  |
|     1      |    16     |    256    |    256     |   2048    |   True    | torch.bfloat16 | 24.478124547749758 |  76.4232068322599  |
|     1      |    16     |    256    |    256     |   2048    |   False   | torch.bfloat16 |  24.6928428998217  | 75.76151192188263  |
|     1      |    16     |    512    |    512     |   2048    |   True    | torch.bfloat16 | 28.69622849393636  | 114.73898496478796 |
|     1      |    16     |    512    |    512     |   2048    |   False   | torch.bfloat16 | 34.399422979913645 | 112.96746158041059 |
|     1      |    16     |   1024    |    1024    |   2048    |   True    | torch.bfloat16 |  65.4690912924707  | 216.26344555988908 |
|     1      |    16     |   1024    |    1024    |   2048    |   False   | torch.bfloat16 | 88.57532404363155  | 212.07790216431025 |
|     8      |    16     |    128    |    128     |   2048    |   True    | torch.bfloat16 | 11.582905380055308 | 70.09557797573505  |
|     8      |    16     |    128    |    128     |   2048    |   False   | torch.bfloat16 | 12.068384909071026 | 70.01491216942668  |
|     8      |    16     |    256    |    256     |   2048    |   True    | torch.bfloat16 | 31.671419646590945 | 203.54910241439939 |
|     8      |    16     |    256    |    256     |   2048    |   False   | torch.bfloat16 |  33.0585768679157  | 209.45609430782497 |
|     8      |    16     |    512    |    512     |   2048    |   True    | torch.bfloat16 | 87.43969700299202  | 469.8729298543185  |
|     8      |    16     |    512    |    512     |   2048    |   False   | torch.bfloat16 | 123.9265550393611  | 580.1084265112877  |
|     8      |    16     |   1024    |    1024    |   2048    |   True    | torch.bfloat16 | 561.1918237991632  | 1181.655174586922  |
|     8      |    16     |   1024    |    1024    |   2048    |   False   | torch.bfloat16 | 884.2707145959139  | 1662.4679416418073 |
+------------+-----------+-----------+------------+-----------+-----------+----------------+--------------------+--------------------+
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115986
Approved by: https://github.com/mikaylagawarecki
2023-12-18 22:40:47 +00:00
Joel Schlosser
bf62511e07 Reshape decomposition for jagged layout NT (#115191)
No more segfault from using `reshape()` on jagged NT :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115191
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
2023-12-18 22:34:41 +00:00
Jeff Daily
63e242b1e4 [ROCm] fix nightly 5.6 build (#116029)
ROCm 5.6 nightly wheel build broken by #114329.  This fixes it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116029
Approved by: https://github.com/huydhn, https://github.com/jithunnair-amd, https://github.com/atalman
2023-12-18 22:12:30 +00:00
Lucas Pasqualin
8452f41305 Adds allreduce to inductor remap (#115950)
Fixes #115728

Implements a rewrite path for allreduce

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115950
Approved by: https://github.com/wconstab
2023-12-18 22:00:22 +00:00
Tianyu Liu
2a5659a797 add length assertion to PrepareModuleInput and PrepareModuleOutput (#115957)
## summary

`zip(inputs, self.input_layouts, self.desired_input_layouts)` is used in `_prepare_input_fn`; similar for `_prepare_output_fn`. Without an assertion, unmatched dimensions in inputs/outputs are silently dropped, potentially causing unexpected behaviors.
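The failure mode is visible in plain Python, since `zip` truncates silently (a minimal sketch, not the actual PrepareModuleInput code):

```python
inputs = ["x", "y", "z"]
input_layouts = ["Shard(0)", "Replicate()"]  # one layout missing

# zip() stops at the shortest iterable, so "z" is silently dropped.
pairs = list(zip(inputs, input_layouts))
print(len(pairs))  # 2, not 3

def prepare(inputs, layouts):
    # The length assertion surfaces the mismatch instead of losing data.
    assert len(inputs) == len(layouts), "inputs and input_layouts must match"
    return list(zip(inputs, layouts))
```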

## test plan
`python test/distributed/tensor/parallel/test_tp_style.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115957
Approved by: https://github.com/wanchaol
2023-12-18 21:50:18 +00:00
Jacob Rodal
a699b10339 [buck2][win] fix caffe2 protobuf_rule (#115954)
Summary:
c2_protobuf_rule ([here](https://fburl.com/code/iyiulpmv)) is broken on buck2, ultimately due to the following error:

> .\./caffe2.proto: File does not reside within any path specified using --proto_path (or -I).  You must specify a --proto_path which encompasses this file.  Note that the proto_path must be an exact prefix of the .proto file names -- protoc is too dumb to figure out when two paths (e.g. absolute and relative) are equivalent (it's harder than you think).

The root cause is differences in how buck1 and buck2 handle `%SRCDIR%` (absolute versus relative paths). This diff fixes the build.

Test Plan:
# Before

```
buck2 build arvr/mode/win/opt //xplat/caffe2:caffe2.pb.h
```

```
More details at https://www.internalfb.com/intern/buck/build/c6550454-ae6d-479e-9d08-016e544ef050
BUILD SUCCEEDED
```

```
Action failed: fbsource//xplat/caffe2:caffe2.pb.h (genrule)
Remote command returned non-zero exit code <no exit code>
Reproduce locally: frecli cas download-action 5df17cf64b7e2fc5ab090c91e1129f2f3cad36dc72c7c182ab052af23d3f32aa:145
stdout:
stderr:
OUTMISS: Missing outputs: buck-out/v2/gen/fbsource/dd87aacb8683145b/xplat/caffe2/caffe2.pb.h/out/caffe2.pb.h
```

# After

Buck1 still works

```
buck1 build arvr/mode/win/opt //xplat/caffe2:caffe2.pb.h
```

Buck2 works

```
buck2 build arvr/mode/win/opt //xplat/caffe2:caffe2.pb.h
```

```
Buck UI: https://www.internalfb.com/buck2/e5dae607-325a-4eab-b0c9-66fe4e9a6254
BUILD SUCCEEDED
```

Differential Revision: D52218365

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115954
Approved by: https://github.com/mcr229
2023-12-18 21:41:10 +00:00
isdanni
2f7bb18def [Doc] Add padding size constraint in nn.ReflectionPad2d (#115995)
Fixes #115532

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115995
Approved by: https://github.com/mikaylagawarecki
2023-12-18 21:29:14 +00:00
angelayi
1e272fb6d6 [export] Undo "module: export" labeling (#116042)
Delete the auto-labeling of "module: export" as this is not really used, and we want to delete the "module: export" label.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116042
Approved by: https://github.com/clee2000
2023-12-18 21:23:17 +00:00
Catherine Lee
c4748b425e Add main in dynamo/test_compile.py (#115941)
Needed to verify that it uses dynamo's custom TestCase and run_tests instead of the general common_utils TestCase and run_tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115941
Approved by: https://github.com/msaroufim
2023-12-18 20:53:28 +00:00
Wanchao Liang
a1a0b290d2 [tp] further fix the docs (#115974)
Some typos resulted in the note section not being rendered properly; this wasn't visible from the last PR directly, as it only showed the first commit's documentation :(

Also make the parallelize_module doc example more concrete

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115974
Approved by: https://github.com/wz337
2023-12-18 20:41:53 +00:00
Jesse Cai
8868c1cfae [sparse][ci] Add cuSPARSELt to CI (#115369)
Summary:

This PR adds cuSPARSELt v0.4.07 to CI (CUDA 12.1 and 11.8) to run our cuSPARSELt-specific tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115369
Approved by: https://github.com/malfet
2023-12-18 20:33:30 +00:00
PyTorch UpdateBot
2b2ed52799 [xla hash update] update the pinned xla hash (#116003)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned xla hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116003
Approved by: https://github.com/clee2000
2023-12-18 20:31:49 +00:00
atalman
7b6210e8a4 Use matrix generate script for docker release workflows (#115949)
Enable builds for both supported CUDA versions for the docker release, rather than building only one version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115949
Approved by: https://github.com/huydhn
2023-12-18 20:20:59 +00:00
gs-olive
e30d436b01 [fx][split][testing] Add testing for #107981 (#108731)
- Follow-up to #107981, adding testing for metadata copying in placeholder nodes within the `split_by_tags` utility
- Validation included in the test from #107248, since both tests are relevant to the same aspect of the utility
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108731
Approved by: https://github.com/angelayi
2023-12-18 20:19:18 +00:00
vinithakv
bf20b56e9d Fix PyTorch build error on ppc64le (#115729)
The PyTorch build breaks when building from tip on ppc64le with the following error: pytorch/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp:863:46: error: no matching function for call to 'at::vec::DEFAULT::Vectorized<c10::qint8>::dequantize(at::vec::DEFAULT::Vectorized&, at::vec::DEFAULT::Vectorized&)'

Issue reported #115165

This patch fixes the build issue.

Fixes #115165

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115729
Approved by: https://github.com/albanD
2023-12-18 19:00:56 +00:00
Tobias Ringwald
77366ba637 Increased hardcoded limit for number of GPUs. (#115368)
Fixes #115331.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115368
Approved by: https://github.com/albanD
2023-12-18 18:39:19 +00:00
Michael Lazos
80b1ecc308 Run eager adam optimizer in benchmarks where possible (#115445)
Runs eager Adam (instead of SGD) on all models that don't fail accuracy.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115445
Approved by: https://github.com/desertfire
2023-12-18 18:28:23 +00:00
zdevito
8a445f7bd5 Serve multistream graph captures from correct pool (#114647)
This fixes #114320 by placing the logic for determining whether to allocate
to a pool inside a callback that is controlled by CUDAGraph.cpp, or by the
Python-bound API that allocates a stream directly to a pool.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114647
Approved by: https://github.com/ngimel, https://github.com/eellison
2023-12-18 18:24:15 +00:00
CK Luk
3b70bd3970 Take 2 of "Add an option to log the source of the Triton kernels generated by torch._inductor" (#115979)
Summary: This is useful for comparing the Triton kernels generated by two different invocations of torch.compile on the same model (e.g., checking whether serial compile and parallel compile generate identical Triton kernels).

Test Plan:
Unit test:
buck2 test mode/opt //caffe2/torch/fb/module_factory/sync_sgd/tests:test_torchdynamo_wrapper -- --print-passing-details >& ~/tmp/log.test
PyPer Mast job:
https://www.internalfb.com/mast/job/sw-951074659-OfflineTraining_87587a4e
See the *.py files generated in:
pyper_traces/tree/torchinductor_traces/sw-951074659-OfflineTraining_87587a4e/4623

Differential Revision: D52221500

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115979
Approved by: https://github.com/yanboliang
2023-12-18 18:16:44 +00:00
Behrang Javaherian
386776c49a [torch] Reduce memory usage by adding flags to clear the intermediate graphs used for optimization during inference. (#115657)
Summary: During inference, the intermediate graphs used for optimization are not needed, so with these two flags the Executor's graph is the only graph we need to keep around.

Test Plan:
The flags are all off by default.

baseline
```
buck run mode/opt-clang  sigrid/predictor/client/localnet:run_model -- --model_id_to_load=951679039 --model_snapshot_to_load=244 --torch_jit_do_not_store_optimized_graph=true
I1212 10:24:20.407408 401092 SigridPredictorLocalModelFactory.cpp:32] Memory usage for 951679039_244 is 182863 Kb
```
```
buck run mode/opt-clang  sigrid/predictor/client/localnet:run_model -- --model_id_to_load=951679039 --model_snapshot_to_load=244 --torch_jit_do_not_store_optimized_graph=true --torch_jit_release_profiling_graph_after_optimization=true
I1212 10:31:37.663487 464000 SigridPredictorLocalModelFactory.cpp:32] Memory usage for 951679039_244 is 186127 Kb
```
```
buck run mode/opt-clang  sigrid/predictor/client/localnet:run_model -- --model_id_to_load=951679039 --model_snapshot_to_load=244 --torch_jit_do_not_store_optimized_graph=true --torch_jit_release_profiling_graph_after_optimization=true --torch_jit_execution_plan_avoid_extra_graph_copy=true
I1212 10:29:42.848093 447218 SigridPredictorLocalModelFactory.cpp:32] Memory usage for 951679039_244 is 129451 Kb
```

Differential Revision: D52081631

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115657
Approved by: https://github.com/houseroad
2023-12-18 17:56:39 +00:00
Wanchao Liang
dd367b7c8f check tensor subclass when using torch.compile + SAC (#115960)
As titled: when using SAC + torch.compile, it currently only checks for
functional tensors, not other tensor subclasses, so SAC under torch.compile
would ignore tensor-subclass types. Fixed in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115960
Approved by: https://github.com/bdhirsh
2023-12-18 17:49:06 +00:00
angelayi
e43d33f4f7 [export] Support torch.sym* ops (#115854)
Fixes https://github.com/pytorch/pytorch/issues/108830 and https://github.com/pytorch/executorch/issues/1379#issuecomment-1853322866

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115854
Approved by: https://github.com/zhxchen17
2023-12-18 17:48:47 +00:00
Aaron Gokaslan
647f14e70b [BE]: Enable clang-tidy check for readability-string-compare (#115994)
Adds a clang-tidy check to ensure string compare is not used unnecessarily in a way that is less efficient and less readable if an equality overload exists.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115994
Approved by: https://github.com/albanD
2023-12-18 16:13:00 +00:00
Nikita Shulga
d7caef7996 [CI] Update clang-format (#116002)
To 17.0.6, built using https://github.com/pytorch/test-infra/blob/main/.github/workflows/clang-tidy-linux.yml

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116002
Approved by: https://github.com/suo
2023-12-18 14:58:46 +00:00
Mu-Chu Lee
c285ca7916 [AOTInductor] Add updating constant buffer to active buffer. (#116001)
Summary:
Refactor the inactive constant buffer update to also allow updating the active buffer.

Test Plan:
Existing test to test inactive buffer updates.
UpdateConstantsCuda in cpp test for active buffer updates.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116001
Approved by: https://github.com/chenyang78
2023-12-18 11:49:03 +00:00
Pearu Peterson
34fe850d00 SymInt'ify sparse_compressed_tensor (#107903)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107903
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #115586
2023-12-17 17:36:20 +00:00
Pearu Peterson
419f2ca3e3 Fix a crash in sparse compressed tensor invariants check when nnz == 0 (#115825)
Fixes python crash example from https://github.com/pytorch/pytorch/issues/115755

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115825
Approved by: https://github.com/cpuhrsch
2023-12-17 17:36:15 +00:00
Tej Singh
eafeba71c1 Adamw refactor (#115983)
Fixes #104899; refactors AdamW by abstracting out common code shared with Adam.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115983
Approved by: https://github.com/janeyx99
2023-12-17 06:58:39 +00:00
Yue Dong
87ea6fb844 Make input contiguous for DTensor reduce scatter to fix the incorrect numerical values (#115847)
Summary:
This change makes the input tensor contiguous for DTensor reduce-scatter in the case where no padding is needed.

There's no exception thrown during training, but we ran into a numerical correctness issue without this change.

Test Plan:
**CI**
CI test

**WHEN model test**:
- Verified loss for each iteration within the expected range.
- Verified NE on-par with this change with 4B training data.

Differential Revision: D52170822

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115847
Approved by: https://github.com/wanchaol
2023-12-17 01:35:09 +00:00
Menglu Yu
bc4115ffcf [Inductor][Observability] Change to log.debug to avoid excessively long logs (#115474)
Summary: As titled.

Test Plan: CI

Differential Revision: D52003825

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115474
Approved by: https://github.com/jackiexu1992, https://github.com/yanboliang
2023-12-17 00:25:54 +00:00
Nikita Shulga
4123cca859 [AARCH64] Fall back to GEMM if mkldnn_matmul fails (#115936)
- Add a call to `at::globalContext().userEnabledMkldnn()` to `apply_mkldnn_matmul_heur`
- Surround calls to `mkldnn_matmul` with `try {} catch {}`
- Print a warning and fall back to BLAS (by calling `at::globalContext().setUserEnabledMkldnn()`) if `mkldnn_matmul()` fails

Test plan: On Linux arm run:
```shell
$ sudo chmod 400 /sys; python -c "import torch;m=torch.nn.Linear(1, 32);print(torch.__version__);print(m(torch.rand(32, 1)))"
Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible
Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present
Error in cpuinfo: failed to parse both lists of possible and present processors
2.3.0.dev20231215
bad err=11 in Xbyak::Error
bad err=11 in Xbyak::Error
/home/ubuntu/miniconda3/envs/py311/lib/python3.11/site-packages/torch/nn/modules/linear.py:116: UserWarning: mkldnn_matmul failed, switching to BLAS gemm:internal error (Triggered internally at /pytorch/aten/src/ATen/native/LinearAlgebra.cpp:1509.)
  return F.linear(input, self.weight, self.bias)
tensor([[-0.5183,  0.2279, -0.4035,  ..., -0.3446,  0.0938, -0.2113],
        [-0.5111,  0.2362, -0.3821,  ..., -0.3536,  0.1011, -0.2159],
        [-0.6387,  0.0894, -0.7619,  ..., -0.1939, -0.0282, -0.1344],
        ...,
        [-0.6352,  0.0934, -0.7516,  ..., -0.1983, -0.0247, -0.1366],
        [-0.4790,  0.2733, -0.2862,  ..., -0.3939,  0.1338, -0.2365],
        [-0.5702,  0.1682, -0.5580,  ..., -0.2796,  0.0412, -0.1782]],
       grad_fn=<AddmmBackward0>)
```
Fixes https://github.com/pytorch/pytorch/issues/114750

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115936
Approved by: https://github.com/lezcano
2023-12-16 21:37:56 +00:00