This PR represents `torch.cfloat`/`torch.chalf` as the `float2`/`half2` Metal types and modifies `SCATTER_OPS_TEMPLATE`/`GATHER_OPS_TEMPLATE` to accept a third argument, a fully specialized `cast` function, which is a no-op for regular types but special-cased for float->complex and complex->float conversions.
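For reference, a minimal usage sketch of what this enables, assuming an MPS-capable machine; the shapes and values are illustrative only:
```python
import torch

if torch.backends.mps.is_available():
    src = torch.tensor([1 + 1j, 2 + 2j, 3 + 3j, 4 + 4j], dtype=torch.cfloat, device="mps")
    dst = torch.zeros(4, dtype=torch.cfloat, device="mps")
    idx = torch.tensor([3, 2, 1, 0], device="mps")
    # gather/scatter on complex dtypes now go through the updated Metal templates
    dst.scatter_(0, idx, src)
    print(dst.gather(0, idx))
```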
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115727
Approved by: https://github.com/kulinseth
This PR fixes two bugs:
1) Constant folding a Triton kernel results in the kernel's inputs being returned back without any modification, so constant folding is disabled for Triton kernels. This needs more investigation.
2) NoneLayout buffers should not be deleted as they do not exist
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115908
Approved by: https://github.com/aakhundov, https://github.com/jansel
Make `SkipFilesVariable` handle only function types, and route skipped classes to `UserDefinedClassVariable`. The reasons behind this are:
* We'd like to remove `is_allowed`, so the allowed/disallowed torch classes need a proper place to be handled. Under the current architecture they could go in either `SkipFilesVariable` or `UserDefinedClassVariable`, but it's confusing to have two places doing one thing.
  - Going forward, `SkipFilesVariable` will only handle functions, and it will probably be renamed to `SkippedFunctionVariable` in follow-up PRs.
  - Dispatch will be done by the value's type, so all torch class handling will move to `UserDefinedClassVariable` in the next PR.
* We'd like to merge the in-graph/skip/inline tracing decisions into a single API, `trace_rules.lookup`, so we probably have to limit its input to functions to better organize the `VariableBuilder._wrap` logic.
  - As a next step, `skipfiles.check` will be merged into `trace_rules.lookup`, and the skipfile check will happen before wrapping values into the correct variable tracker.
  - Although `TorchCtxManagerClassVariable` is currently decided by `trace_rules.lookup`, it will be refactored out in follow-up PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115963
Approved by: https://github.com/jansel
Summary:
This change makes the backward pass of `DTensor.from_local()` treat a `Partial()` target placement as a pass-through (the gradient stays `Replicate()`), for the following reasons (see the sketch below):
1. When we run the backward pass of `DTensor.from_local`, if the target placement is `Partial()` (i.e. from a user's manual overwrite rather than from torch_dispatch), we keep the grad as `Replicate()`, because converting the gradients back to `Partial()` is meaningless.
2. The current div logic leads to wrong numerical values in the above case.
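As a rough illustration of the behavior described above, here is a minimal sketch. It assumes a 2-GPU environment with `torch.distributed` already initialized, and that `Partial`/`Replicate` are importable from `torch.distributed._tensor` (the names used in this PR); exact import paths may differ across versions.
```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed._tensor import DTensor, Partial, Replicate  # assumed import path

mesh = init_device_mesh("cuda", (2,))
local = torch.randn(4, 4, device="cuda", requires_grad=True)

# Forward: the user manually marks the local shard as a pending (Partial) sum.
dt = DTensor.from_local(local, mesh, [Partial()])
out = dt.redistribute(mesh, [Replicate()]).to_local()
out.sum().backward()

# Backward: with this change the gradient flowing through from_local is kept as
# Replicate() (pass-through) instead of being divided to re-create a Partial(),
# which previously produced wrong numerical values.
print(local.grad)
```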
Test Plan:
**CI**:
CI Tests
**Unit test**:
`buck2 test mode/dev-nosan //caffe2/test/distributed/_tensor:redistribute`
- Passed
**With model training**:
```
# We tested the case where the input tensor is manually overwritten as Partial() and
# the output tensor is manually overwritten to Shard() and then converted to local.
# Before the change: numerical value not correct
Forward pass:
collective: ReduceScatter
Backward pass:
collective: AllGather + div by process group size
# After the change: div is removed as expected.
Forward pass:
collective: ReduceScatter
Backward pass:
collective: AllGather
```
Differential Revision: D52175709
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115967
Approved by: https://github.com/wanchaol
Motivation: it would be nice to be able to write tests against the metrics in log_compilation_event; currently it dumps logs (or logs to a database in fbcode), which are hard to use in unit tests.
This change:
* always records the information in torch._dynamo.utils.record_compilation_metrics, which logs into a limited-size deque to prevent the list of metrics from getting too long (a sketch follows below)
* if config.log_compilation_metrics is set, calls back into the original log_compilation_event function
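A minimal sketch of the recording scheme described above; the names `_compilation_metrics`, the `maxlen` value, and the config flag stand-in are illustrative assumptions, not the exact identifiers in torch._dynamo.utils.
```python
import collections

LOG_COMPILATION_METRICS = True   # stands in for config.log_compilation_metrics
_MAX_METRICS = 64
_compilation_metrics = collections.deque(maxlen=_MAX_METRICS)

def log_compilation_event(metrics):
    # Stand-in for the original logger, which dumps to logs (or a database in fbcode).
    print(metrics)

def record_compilation_metrics(metrics):
    # Always keep the most recent entries in memory so unit tests can inspect them;
    # the bounded deque prevents the list from growing without limit.
    _compilation_metrics.append(metrics)
    if LOG_COMPILATION_METRICS:
        log_compilation_event(metrics)
```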
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115788
Approved by: https://github.com/yanboliang
## summary
`zip(inputs, self.input_layouts, self.desired_input_layouts)` is used in `_prepare_input_fn`, and similarly in `_prepare_output_fn`. Without an assertion, any unmatched dimension in inputs/outputs is silently dropped by `zip`, potentially causing unexpected behaviors (see the sketch below).
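A minimal sketch of the length check described above; the attribute names follow the summary and the surrounding class is assumed, not shown.
```python
def _prepare_input_fn(self, inputs, device_mesh):
    # zip() silently drops trailing items when lengths differ, so validate first.
    assert len(inputs) == len(self.input_layouts) == len(self.desired_input_layouts), (
        "module inputs, input_layouts and desired_input_layouts should have the same length, "
        f"got {len(inputs)}, {len(self.input_layouts)}, {len(self.desired_input_layouts)}"
    )
    for inp, layout, desired in zip(inputs, self.input_layouts, self.desired_input_layouts):
        ...  # redistribute each input from `layout` to `desired`
```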
## test plan
`python test/distributed/tensor/parallel/test_tp_style.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115957
Approved by: https://github.com/wanchaol
Summary:
c2_protobuf_rule ([here](https://fburl.com/code/iyiulpmv)) is broken on buck2, ultimately due to the following error:
> .\./caffe2.proto: File does not reside within any path specified using --proto_path (or -I). You must specify a --proto_path which encompasses this file. Note that the proto_path must be an exact prefix of the .proto file names -- protoc is too dumb to figure out when two paths (e.g. absolute and relative) are equivalent (it's harder than you think).
The root cause is differences in how buck1 and buck2 handle `%SRCDIR%` (absolute versus relative paths). This diff fixes the build.
Test Plan:
# Before
```
buck2 build arvr/mode/win/opt //xplat/caffe2:caffe2.pb.h
```
```
More details at https://www.internalfb.com/intern/buck/build/c6550454-ae6d-479e-9d08-016e544ef050
BUILD SUCCEEDED
```
```
Action failed: fbsource//xplat/caffe2:caffe2.pb.h (genrule)
Remote command returned non-zero exit code <no exit code>
Reproduce locally: frecli cas download-action 5df17cf64b7e2fc5ab090c91e1129f2f3cad36dc72c7c182ab052af23d3f32aa:145
stdout:
stderr:
OUTMISS: Missing outputs: buck-out/v2/gen/fbsource/dd87aacb8683145b/xplat/caffe2/caffe2.pb.h/out/caffe2.pb.h
```
# After
Buck1 still works
```
buck1 build arvr/mode/win/opt //xplat/caffe2:caffe2.pb.h
```
Buck2 works
```
buck2 build arvr/mode/win/opt //xplat/caffe2:caffe2.pb.h
```
```
Buck UI: https://www.internalfb.com/buck2/e5dae607-325a-4eab-b0c9-66fe4e9a6254
BUILD SUCCEEDED
```
Differential Revision: D52218365
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115954
Approved by: https://github.com/mcr229
Some typos resulted in the note section not being rendered properly; this couldn't be seen from the last PR directly, as the last PR only showed the first commit's documentation :(
Also makes the `parallelize_module` doc example more concrete (a usage sketch follows below).
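For context, here is a small usage sketch in the spirit of that doc example; the module, mesh size, and plan keys are illustrative assumptions rather than the exact example in the docs.
```python
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel, parallelize_module

mesh = init_device_mesh("cuda", (8,))
mlp = nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 16))
# Shard the first linear column-wise and the second row-wise across the 8 ranks.
tp_mlp = parallelize_module(mlp, mesh, {"0": ColwiseParallel(), "2": RowwiseParallel()})
```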
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115974
Approved by: https://github.com/wz337
The PyTorch build breaks when building from tip on ppc64le with the following error:
```
pytorch/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp:863:46: error: no matching function for call to 'at::vec::DEFAULT::Vectorized<c10::qint8>::dequantize(at::vec::DEFAULT::Vectorized&, at::vec::DEFAULT::Vectorized&)'
```
This patch fixes the build issue reported in #115165.
Fixes #115165
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115729
Approved by: https://github.com/albanD
Summary: This is useful for comparing the Triton kernels generated by two different invocations of torch.compile on the same model (e.g., checking whether serial compile and parallel compile generate identical Triton kernels).
Test Plan:
Unit test:
buck2 test mode/opt //caffe2/torch/fb/module_factory/sync_sgd/tests:test_torchdynamo_wrapper -- --print-passing-details >& ~/tmp/log.test
PyPer Mast job:
https://www.internalfb.com/mast/job/sw-951074659-OfflineTraining_87587a4e
See the *.py files generated in:
pyper_traces/tree/torchinductor_traces/sw-951074659-OfflineTraining_87587a4e/4623
Differential Revision: D52221500
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115979
Approved by: https://github.com/yanboliang
Summary: During inference, the intermediate graphs produced for optimization are not used, so the executor's graph is the only graph we need to keep around; these two flags let the other graphs be released.
Test Plan:
The flags are all off by default.
baseline
```
buck run mode/opt-clang sigrid/predictor/client/localnet:run_model -- --model_id_to_load=951679039 --model_snapshot_to_load=244 --torch_jit_do_not_store_optimized_graph=true
I1212 10:24:20.407408 401092 SigridPredictorLocalModelFactory.cpp:32] Memory usage for 951679039_244 is 182863 Kb
```
```
buck run mode/opt-clang sigrid/predictor/client/localnet:run_model -- --model_id_to_load=951679039 --model_snapshot_to_load=244 --torch_jit_do_not_store_optimized_graph=true --torch_jit_release_profiling_graph_after_optimization=true
I1212 10:31:37.663487 464000 SigridPredictorLocalModelFactory.cpp:32] Memory usage for 951679039_244 is 186127 Kb
```
```
buck run mode/opt-clang sigrid/predictor/client/localnet:run_model -- --model_id_to_load=951679039 --model_snapshot_to_load=244 --torch_jit_do_not_store_optimized_graph=true --torch_jit_release_profiling_graph_after_optimization=true --torch_jit_execution_plan_avoid_extra_graph_copy=true
I1212 10:29:42.848093 447218 SigridPredictorLocalModelFactory.cpp:32] Memory usage for 951679039_244 is 129451 Kb
```
Differential Revision: D52081631
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115657
Approved by: https://github.com/houseroad
As titled: when using SAC + torch.compile, the check currently only looks for functional tensors, not for any tensor subclasses, so SAC under torch.compile would ignore tensor types like tensor subclasses. Fixed in this PR (a sketch of the broadened check follows below).
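A minimal sketch of the broadened check; the helper name and call site are assumptions about the checkpoint internals rather than the exact code in this PR.
```python
import torch
from torch._subclasses.functional_tensor import FunctionalTensor
from torch.utils._python_dispatch import is_traceable_wrapper_subclass

def _policy_should_see(t: torch.Tensor) -> bool:
    # Previously only the functional-tensor wrapper used under torch.compile was
    # recognized; other tensor subclasses were silently ignored by the SAC policy.
    return isinstance(t, FunctionalTensor) or is_traceable_wrapper_subclass(t)
```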
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115960
Approved by: https://github.com/bdhirsh
Summary:
Refactor the inactive constant buffer update to allow updating with the active buffer.
Test Plan:
Existing tests cover inactive buffer updates; UpdateConstantsCuda in the cpp test covers active buffer updates.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116001
Approved by: https://github.com/chenyang78
Summary:
This change makes the input tensor contiguous for DTensor reduce-scatter in the case where no padding is needed.
No exception is thrown during training, but without this change we ran into a numerical correctness issue (a sketch of the idea follows below).
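A minimal sketch of the idea, assuming a plain process-group reduce-scatter call; the function name and call path are illustrative, not the actual DTensor internals.
```python
import torch
import torch.distributed as dist

def reduce_scatter_no_padding(local_tensor, group):
    # Collectives expect a contiguous buffer; a non-contiguous view (e.g. produced
    # by an upstream transpose) can yield numerically wrong results without erroring.
    if not local_tensor.is_contiguous():
        local_tensor = local_tensor.contiguous()
    out = local_tensor.new_empty(local_tensor.shape[0] // dist.get_world_size(group),
                                 *local_tensor.shape[1:])
    dist.reduce_scatter_tensor(out, local_tensor, group=group)
    return out
```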
Test Plan:
**CI**
CI test
**WHEN model test**:
- Verified loss for each iteration within the expected range.
- Verified NE on-par with this change with 4B training data.
Differential Revision: D52170822
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115847
Approved by: https://github.com/wanchaol
- Add call to `at::globalContext().userEnabledMkldnn()` to `apply_mkldnn_matmul_heur`
- Surround calls to `mkldnn_matmul` with `try {} catch {}`
- Print a warning and fall back to BLAS (by calling `at::globalContext().setUserEnabledMkldnn(false)`) if `mkldnn_matmul()` fails
Test plan: On Linux arm run:
```shell
$ sudo chmod 400 /sys; python -c "import torch;m=torch.nn.Linear(1, 32);print(torch.__version__);print(m(torch.rand(32, 1)))"
Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible
Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present
Error in cpuinfo: failed to parse both lists of possible and present processors
2.3.0.dev20231215
bad err=11 in Xbyak::Error
bad err=11 in Xbyak::Error
/home/ubuntu/miniconda3/envs/py311/lib/python3.11/site-packages/torch/nn/modules/linear.py:116: UserWarning: mkldnn_matmul failed, switching to BLAS gemm:internal error (Triggered internally at /pytorch/aten/src/ATen/native/LinearAlgebra.cpp:1509.)
return F.linear(input, self.weight, self.bias)
tensor([[-0.5183, 0.2279, -0.4035, ..., -0.3446, 0.0938, -0.2113],
[-0.5111, 0.2362, -0.3821, ..., -0.3536, 0.1011, -0.2159],
[-0.6387, 0.0894, -0.7619, ..., -0.1939, -0.0282, -0.1344],
...,
[-0.6352, 0.0934, -0.7516, ..., -0.1983, -0.0247, -0.1366],
[-0.4790, 0.2733, -0.2862, ..., -0.3939, 0.1338, -0.2365],
[-0.5702, 0.1682, -0.5580, ..., -0.2796, 0.0412, -0.1782]],
grad_fn=<AddmmBackward0>)
```
Fixes https://github.com/pytorch/pytorch/issues/114750
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115936
Approved by: https://github.com/lezcano