pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
ydwu4	461ffaaaf3	[dynamo] support torchbind object input (#124978 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124978 Approved by: https://github.com/jansel	2024-05-07 03:02:00 +00:00
Zhirui Dai	3411d54811	fix loading optimizer options from archive (#125215 ) This PR makes libtorch behave the same as PyTorch when loading optimizer state from archive. With PyTorch, options of parameter groups are loaded from the archive, which is missing currently in libtorch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125215 Approved by: https://github.com/janeyx99	2024-05-06 23:58:40 +00:00
Shuqiang Zhang	bfd5bb0c44	[c10d] only PG0 should dump when monitoring thread timed out (#125356 ) Summary: We found that some dumps are missing when the monitoring thread times out. This is likely because multiple PGs could still dump the same records at the same time, so we should allow only PG0 to actually dump. Test Plan: unit test python test/run_test.py --cpp --verbose -i cpp/ProcessGroupNCCLErrorsTest Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/125356 Approved by: https://github.com/c-p-i-o	2024-05-04 00:43:20 +00:00
Stefan-Alin Pahontu	bebefcf845	Driver folder check (#117548 ) Added extra check for driver folders for Libtorch, as stat struct does not recognize driver folders, so torch.save should work for them as well. (e.g. save model.pt directly under C: ) Fixes [#111121](https://github.com/pytorch/pytorch/issues/111121) and #105488 Co-authored-by: Ozan Aydin <148207261+ozanMSFT@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/117548 Approved by: https://github.com/malfet	2024-05-03 09:10:11 +00:00
Wes Bland	6f5f405b05	[ncclx] Rename NCCL-EXP to NCCLX (#125238 ) Reviewed By: kryanchun Differential Revision: D56534548 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125238 Approved by: https://github.com/kwen2501	2024-05-01 23:29:55 +00:00
PyTorch MergeBot	724c7491d0	Revert " [Distributed] [7/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d (#124987 )" This reverts commit `b3fd94d15e`. Reverted https://github.com/pytorch/pytorch/pull/124987 on behalf of https://github.com/ezyang due to broke downstream extensions ([comment](https://github.com/pytorch/pytorch/pull/124987#issuecomment-2083956511))	2024-04-30 00:37:53 +00:00
cyy	b3fd94d15e	[Distributed] [7/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d (#124987 ) This PR continues to clean clang-tidy warnings in torch/csrc/distributed/c10d, following #124701. In addition, libfmt dependency is added in CMake code to enable using it in the headers. The libfmt has to be added as private dependency to torch_cuda and torch_hip because they include torch/csrc/distributed/c10d/Utils.hpp which uses libfmt. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124987 Approved by: https://github.com/malfet	2024-04-27 07:22:27 +00:00
Kurt Mohler	abcb42cdd2	Avoid COW materialize in various places (1) (#124984 ) Most, not all, of these cases were found automatically with `git grep -n '^\s\<const\>.\.=.*\<data_ptr\>'` Part of #97856 Pull Request resolved: https://github.com/pytorch/pytorch/pull/124984 Approved by: https://github.com/Skylion007	2024-04-26 19:06:28 +00:00
Shivam Raikundalia	63d4dc5a80	Remove TMP_LIBKINETO_NANOSECOND flag from Compilation (#124734 ) Summary: Now that we have reached nanosecond granularity, we can now remove the temporary guards that were previously required for nanosecond precision. Test Plan: Regression should cover this change Reviewed By: aaronenyeshi Differential Revision: D56444570 Pull Request resolved: https://github.com/pytorch/pytorch/pull/124734 Approved by: https://github.com/aaronenyeshi	2024-04-26 06:57:03 +00:00
Bin Bao	b2fd224f27	[AOTI] Add more ABI-compatiblity unit test (#123900 ) Summary: Follow https://github.com/pytorch/pytorch/pull/123848, and test more c10 util functions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123900 Approved by: https://github.com/chenyang78	2024-04-23 16:06:40 +00:00
Bin Bao	4946638f06	[AOTI] Add ABI-compatiblity tests (#123848 ) Summary: In AOTInductor generated CPU model code, there can be direct references to some aten/c10 utility functions and data structures, e.g. at::vec and c10::Half. These are performance critical and thus it doesn't make sense to create C shim for them. Instead, we make sure they are implemented in a header-only way, and use this set of tests to guard future changes. There are more header files to be updated, but we will do it in other followup PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123848 Approved by: https://github.com/jansel ghstack dependencies: #123847	2024-04-19 00:51:24 +00:00
Xuehai Pan	93e249969b	[BE] enable `ruff` rule `RSE` and remove useless parentheses in `raise` statements (#124261 ) Remove useless parentheses in `raise` statements if the exception type is raised with no argument. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124261 Approved by: https://github.com/albanD	2024-04-17 19:29:34 +00:00
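A minimal sketch of what the `RSE` change above amounts to, assuming the relevant check is ruff's `RSE102` (unnecessary parentheses on a raised exception). The function names are hypothetical and are not taken from the PyTorch code base; the point is only that both spellings raise the same exception, so dropping the empty parentheses is a style change, not a behavior change.

```python
def raise_with_parens():
    # Form flagged by the rule: empty call parentheses on the exception type.
    raise ValueError()


def raise_without_parens():
    # Preferred form: Python instantiates the exception class implicitly.
    raise ValueError


def caught(fn):
    """Run fn and return (exception type, exception args) of what it raised."""
    try:
        fn()
    except Exception as exc:
        return type(exc), exc.args
    return None


if __name__ == "__main__":
    print(caught(raise_with_parens))
    print(caught(raise_without_parens))
```

Parentheses are still needed whenever the exception is raised with arguments, e.g. `raise ValueError("bad input")`.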
Shivam Raikundalia	3ebbeb75fd	[Profiler] Make Kineto traces export ns granularity for finer timestamps (#122425 ) (#123650 ) Summary: Kineto traces use microsecond level granularity because of chrome tracing defaults to that precision. Fix by adding preprocessor flag to TARGETS and BUCK files. Also remove any unnecessary ns to us conversions made in the profiler itself. This diff contains profiler changes only. Libkineto changes found in D54964435. Test Plan: Check JSON and chrome tracing to make sure values are as expected. Tracing with flags enabled should have ns precision. Tracings without flags should be same as master. Zoomer: https://www.internalfb.com/intern/zoomer/?profiling_run_fbid=796886748550189 Ran key_averages() to make sure FunctionEvent code working as expected: -- ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ProfilerStep* 0.74% 3.976ms 64.40% 346.613ms 69.323ms 0.000us 0.00% 61.710ms 12.342ms 5 Optimizer.zero_grad#SGD.zero_grad 0.76% 4.109ms 0.76% 4.109ms 821.743us 0.000us 0.00% 0.000us 0.000us 5 ## forward ## 6.89% 37.057ms 27.19% 146.320ms 29.264ms 0.000us 0.00% 58.708ms 11.742ms 5 aten::conv2d 0.22% 1.176ms 7.74% 41.658ms 157.199us 0.000us 0.00% 27.550ms 103.962us 265 aten::convolution 0.79% 4.273ms 7.52% 40.482ms 152.762us 0.000us 0.00% 27.550ms 103.962us 265 aten::_convolution 0.69% 3.688ms 6.73% 36.209ms 136.637us 0.000us 0.00% 27.550ms 103.962us 265 aten::cudnn_convolution 6.04% 32.520ms 6.04% 32.520ms 122.719us 27.550ms 8.44% 27.550ms 103.962us 265 aten::add_ 2.42% 13.045ms 2.42% 13.045ms 30.694us 12.700ms 3.89% 12.700ms 29.882us 425 aten::batch_norm 0.19% 1.027ms 8.12% 43.717ms 164.971us 0.000us 0.00% 16.744ms 63.185us 265 aten::_batch_norm_impl_index 0.31% 1.646ms 7.93% 42.691ms 161.096us 0.000us 0.00% 16.744ms 63.185us 265 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Differential Revision: D55925068 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123650 Approved by: https://github.com/aaronenyeshi	2024-04-11 04:29:20 +00:00
PyTorch MergeBot	c66d503194	Revert "[Profiler][submodule] Make Kineto traces export ns granularity for finer timestamps (#122425 )" This reverts commit `6f7dd2f84a`. Reverted https://github.com/pytorch/pytorch/pull/122425 on behalf of https://github.com/malfet due to Breaks ROCM builds ([comment](https://github.com/pytorch/pytorch/pull/122425#issuecomment-2041129241))	2024-04-06 16:19:00 +00:00
Shivam Raikundalia	6f7dd2f84a	[Profiler][submodule] Make Kineto traces export ns granularity for finer timestamps (#122425 ) Summary: Kineto traces use microsecond level granularity because of chrome tracing defaults to that precision. Fix by adding preprocessor flag to TARGETS and BUCK files. Also remove any unnecessary ns to us conversions made in the profiler itself. This diff contains profiler changes only. Libkineto changes found in D54964435. Test Plan: Check JSON and chrome tracing to make sure values are as expected. Tracing with flags enabled should have ns precision. Tracings without flags should be same as master. Tracing with flags enabled: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Mar_18_14_37_22.4155151.pt.trace.json.gz&bucket=gpu_traces Tracing without flags enabled: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Mar_18_14_39_15.4166047.pt.trace.json.gz&bucket=gpu_traces Tracing on main: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Mar_18_14_42_43.4177559.pt.trace.json.gz&bucket=gpu_traces Ran key_averages() to make sure FunctionEvent code working as expected: -- ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ProfilerStep* 0.74% 3.976ms 64.40% 346.613ms 69.323ms 0.000us 0.00% 61.710ms 12.342ms 5 Optimizer.zero_grad#SGD.zero_grad 0.76% 4.109ms 0.76% 4.109ms 821.743us 0.000us 0.00% 0.000us 0.000us 5 ## forward ## 6.89% 37.057ms 27.19% 146.320ms 29.264ms 0.000us 0.00% 58.708ms 11.742ms 5 aten::conv2d 0.22% 1.176ms 7.74% 41.658ms 157.199us 0.000us 0.00% 27.550ms 103.962us 265 aten::convolution 0.79% 4.273ms 7.52% 40.482ms 152.762us 0.000us 0.00% 27.550ms 103.962us 265 aten::_convolution 0.69% 3.688ms 6.73% 36.209ms 136.637us 0.000us 0.00% 27.550ms 103.962us 265 aten::cudnn_convolution 6.04% 32.520ms 6.04% 32.520ms 122.719us 27.550ms 8.44% 27.550ms 103.962us 265 aten::add_ 2.42% 13.045ms 2.42% 13.045ms 30.694us 12.700ms 3.89% 12.700ms 29.882us 425 aten::batch_norm 0.19% 1.027ms 8.12% 43.717ms 164.971us 0.000us 0.00% 16.744ms 63.185us 265 aten::_batch_norm_impl_index 0.31% 1.646ms 7.93% 42.691ms 161.096us 0.000us 0.00% 16.744ms 63.185us 265 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Differential Revision: D55087993 Pull Request resolved: https://github.com/pytorch/pytorch/pull/122425 Approved by: https://github.com/aaronenyeshi	2024-04-06 06:04:28 +00:00
Arun Pa	f71e368969	UFMT formatting on test/autograd test/ao test/cpp test/backends (#123369 ) Partially addresses #123062 Ran lintrunner on - test/_test_bazel.py - test/ao - test/autograd test/backends test/benchmark_uitls test/conftest.py test/bottleneck_test test/cpp Pull Request resolved: https://github.com/pytorch/pytorch/pull/123369 Approved by: https://github.com/huydhn	2024-04-05 18:51:38 +00:00
Chun Cai	691054eeef	Fix error message of autograd (#123154 ) This PR updates the autograd error message raised when an input tensor is not set to `require_grad`. The original message does not contain the index info, making it hard for users to debug. The error message style is consistent with that on lines 105-109. Co-authored-by: Jeffrey Wan <soulitzer@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/123154 Approved by: https://github.com/soulitzer	2024-04-03 19:07:21 +00:00
ydwu4	c77352b5cc	Add torch._library.register_fake_class to fakify torchBind class (#122622 ) This PR only adds abstract class registration logic without touching existing tests so they still trace with real script object. The added tests are only for registration APIs and test error messages. Our design is that the abstract implementation should be in Python. This is much better in terms of usability. But this also has implications for custom op that takes script object as input, which is detailed later in this stack. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122622 Approved by: https://github.com/zou3519 ghstack dependencies: #122619, #122620, #122621	2024-04-02 23:52:17 +00:00
ydwu4	46c7235406	add tensor queue example (#122621 ) This PR adds a tensor queue example for later use. It doesn't touch any existing logic. It refactors the tests a little bit to avoid importing the library in unittest setUp. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122621 Approved by: https://github.com/zou3519 ghstack dependencies: #122619, #122620	2024-04-02 23:52:17 +00:00
blegouix	ccfc87b199	include scheduler_on_plateau in optim.h (#121722 ) Fixes #121593 Co-authored-by: Jane Xu <janeyx@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/121722 Approved by: https://github.com/albanD	2024-03-27 19:45:25 +00:00
Mu-Chu Lee	2367d0dacd	[AOTInductor] Add tensor_constantX to pass constant buffer update's check (#122562 ) (#122690 ) Summary: During tracing, some constants (tensor_constant{idx}) are being generated internally. Those constants are neither parameters nor buffers, and users have zero control over them. To accommodate this, we should allow users not to pass in those internally generated constants while still being able to update the constants in the model. Test Plan: Included in commit. ``` build/bin/test_aot_inductor ``` Reviewed By: zoranzhao Differential Revision: D55354548 Pull Request resolved: https://github.com/pytorch/pytorch/pull/122690 Approved by: https://github.com/khabinov	2024-03-26 23:25:15 +00:00
PyTorch MergeBot	55f36d1ada	Revert "[AOTInductor] Add tensor_constantX to pass constant buffer update's check (#122562 )" This reverts commit `57a3d00b06`. Reverted https://github.com/pytorch/pytorch/pull/122562 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/122562#issuecomment-2019262415))	2024-03-26 02:18:19 +00:00
Mu-Chu Lee	57a3d00b06	[AOTInductor] Add tensor_constantX to pass constant buffer update's check (#122562 ) Summary: During tracing, some constants (tensor_constant{idx}) are being generated internally. Those constants are neither parameters nor buffers, and users have zero control over them. To accommodate this, we should allow users not to pass in those internally generated constants while still being able to update the constants in the model. Test Plan: Included in commit. ``` build/bin/test_aot_inductor ``` Differential Revision: D55286634 Pull Request resolved: https://github.com/pytorch/pytorch/pull/122562 Approved by: https://github.com/chenyang78, https://github.com/khabinov	2024-03-25 22:05:20 +00:00
David Berard	2d9cee20a2	[jit] AliasDB type hash - don't always return 0 (#121874 ) This hash was missing an assignment, so for almost all types it was returning "0". c10::flat_hash_map turns out to have really bad behavior with a terrible hash like this, nearly exponential in memory usage. Differential Revision: [D54916424](https://our.internmc.facebook.com/intern/diff/D54916424) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121874 Approved by: https://github.com/eellison	2024-03-14 23:16:08 +00:00
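To illustrate why a hash that always returns 0 is harmful, here is a rough Python analogy. It is an assumption-laden sketch, not the AliasDB code: the classes are hypothetical, and a Python `dict` is used in place of `c10::flat_hash_map`. It shows only the extra equality comparisons caused by universal collisions, not the memory blow-up described in the commit message.

```python
class Key:
    """Hypothetical key type that counts how often equality is evaluated."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Key.comparisons += 1
        return isinstance(other, Key) and self.value == other.value


class ConstantHashKey(Key):
    # Degenerate hash: every key collides with every other key.
    def __hash__(self):
        return 0


class DistinctHashKey(Key):
    # Reasonable hash: derived from the value the key wraps.
    def __hash__(self):
        return hash(self.value)


def count_comparisons(key_cls, n):
    """Insert n distinct keys into a dict and return the number of __eq__ calls."""
    Key.comparisons = 0
    table = {key_cls(i): i for i in range(n)}
    assert len(table) == n  # all keys are distinct, so none is overwritten
    return Key.comparisons


if __name__ == "__main__":
    print("constant hash:", count_comparisons(ConstantHashKey, 500))
    print("distinct hash:", count_comparisons(DistinctHashKey, 500))
```

With the constant hash, every insertion has to be compared against previously stored keys, so the work grows much faster than the number of keys.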
angelayi	e8836759d0	[export] Add effect token to export (#121424 ) Following the creation of effect tokens (https://github.com/pytorch/pytorch/pull/120296), we want to now add support for these tokens in export because the calling/returning convention has changed. The inputs are now `(tokens, params, buffers, constants, user_inputs)` and the outputs are `(tokens, buffer_mutations, user_mutations, user_outputs)`. The graph looks something like: ``` graph(): %arg0_1 : [num_users=1] = placeholder[target=arg0_1] %attr : [num_users=2] = placeholder[target=attr] %arg1_1 : [num_users=2] = placeholder[target=arg1_1] %with_effects : [num_users=2] = call_function[target=torch._higher_order_ops.effects.with_effects](args = (%arg0_1, _TorchScriptTesting.takes_foo.default, %attr, %arg1_1), kwargs = {}) %getitem : [num_users=1] = call_function[target=operator.getitem](args = (%with_effects, 0), kwargs = {}) %getitem_1 : [num_users=1] = call_function[target=operator.getitem](args = (%with_effects, 1), kwargs = {}) %with_effects_1 : [num_users=2] = call_function[target=torch._higher_order_ops.effects.with_effects](args = (%getitem, _TorchScriptTesting.takes_foo.default, %attr, %getitem_1), kwargs = {}) %getitem_2 : [num_users=1] = call_function[target=operator.getitem](args = (%with_effects_1, 0), kwargs = {}) %getitem_3 : [num_users=1] = call_function[target=operator.getitem](args = (%with_effects_1, 1), kwargs = {}) %add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%arg1_1, %getitem_3), kwargs = {}) return (getitem_2, add) ``` During unlifting, we will first remove the tokens and with_effect calls using the `remove_effect_tokens` pass. (cc @SherlockNoMad on the pass to remove tokens). This is so that this won't change the calling conventions when retracing. The graph after unlifting looks something like: ``` graph(): %attr_1 : [num_users=2] = get_attr[target=attr] %arg1_1 : [num_users=2] = placeholder[target=arg1_1] %takes_foo_default_1 : [num_users=1] = call_function[target=torch.ops._TorchScriptTesting.takes_foo.default](args = (%attr_1, %arg1_1), kwargs = {}) %takes_foo_default : [num_users=1] = call_function[target=torch.ops._TorchScriptTesting.takes_foo.default](args = (%attr_1, %takes_foo_default_1), kwargs = {}) %add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%arg1_1, %takes_foo_default), kwargs = {}) return (add,) ``` Serialization support will be added in a followup. Note: tokens only affect custom ops that take in ScriptObjects, not ScriptObject methods yet. Differential Revision: [D54639390](https://our.internmc.facebook.com/intern/diff/D54639390) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121424 Approved by: https://github.com/tugsbayasgalan	2024-03-09 02:43:26 +00:00
Will Constable	f85d3a022c	[C10D] Fix pointToPoint op Flight Recording (#120270 ) Fix and test issues with both coalesced and individual send/recv ops Considered an alternate approach and then ditched it - alternate approach: #119757 - reason ditched: prefer recording individual collective events inside coalescing region instead of just the event at the end of the region, which also would not have tensor sizes or opnames without additional state variables added Another approach also ditched - record events on workEnqueue instead of initWork - reason ditched: too messy to get input/output shapes tagged on recording when recording in workEnqueue. Adding the info onto the Work obj would be possible, but adds to overhead of copying Works which we do on every collective. We can get info off the input/output tensors directly in initWork, but we don't want to keep refs to those tensors alive while the work is Enqueued, so we'd have to specifically copy size lists or something. This PR instead avoids creating a work inside pointToPoint when coalescing is active. Instead, only at endCoalescing() is a work finally initialized and enqueued. But it adds a record() call inside pointToPoint() instead of creating a work, during coalescing. This record() call picks up tensor shapes and op names. It ALSO changes initWork to accept a 'record' argument. This defaults to false, and should only be set to true if the caller ensures the work will be enqueued by workEnqueue, ensuring its cuda events are live when used by flight recorder's update_state(). The testing uncovers some odd pre-existing behavior and leaves them alone for now. We could change some of these - seq starts off at 1, not 0 for first op (but this is inconsistent) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120270 Approved by: https://github.com/shuqiangzhang ghstack dependencies: #120724	2024-02-29 01:03:31 +00:00
Shuqiang Zhang	39f0a5ecc9	[c10d] simplify the dump timeout logic and unify the async call (#120331 ) Summary: The current dump timeout logic is a bit cumbersome as it needs 2 time values: 1. timeout, 2. wake up time. In theory, the caller just needs to wait for at most the timeout value for the dump and then declare the dump either successful or not. Also we unify the async call using std::async instead of a customized async launch function for each operation. Test Plan: Unit tests Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/120331 Approved by: https://github.com/wconstab	2024-02-23 19:46:40 +00:00
cyy	1aad5c98b4	[structural binding][5/N] Replace std::tie with structural binding (#120142 ) This PR follows https://github.com/pytorch/pytorch/pull/119774, it is a continued work to clean up std::tie. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120142 Approved by: https://github.com/albanD	2024-02-21 22:32:55 +00:00
soulitzer	312ce35c1f	Rename singleton int to nested int (#119661 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119661 Approved by: https://github.com/ezyang	2024-02-16 19:21:17 +00:00
cyy	47a2e6b6b8	Fix C++20 build (#112333 ) Currently C++20 fails because of incorrect template initialization order. This PR adjusted the order of these classes and a constructor to address the issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112333 Approved by: https://github.com/albanD	2024-02-13 05:10:19 +00:00
Sahdev Zala	110919c984	Check QNNPACK support for the platform before running test (#119139 ) Do not run test ConstantPropagation.CustomClassesCanBePropagated on a platform where QNNPACK is not supported. For example, this test fails on M1 Mac because QNNPACK is not supported on M1 Mac: [----------] 1 test from ConstantPropagation [ RUN ] ConstantPropagation.CustomClassesCanBePropagated unknown file: Failure as described in more details in the issue #88613. After the PR, test passes successfully as below: [----------] 1 test from ConstantPropagation [ RUN ] ConstantPropagation.CustomClassesCanBePropagated [ OK ] ConstantPropagation.CustomClassesCanBePropagated (0 ms) [----------] 1 test from ConstantPropagation (0 ms total) Fixes #88613 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119139 Approved by: https://github.com/jcaip	2024-02-12 20:21:07 +00:00
suo	82248f0b1c	[export] improve FakeTensor serialization (#119531 ) Recently we made it possible to serialize ExportedPrograms with fake parameters/buffers/etc. The serialization regime was kind of whacky; basically we serialized a stub and reassembled the FakeTensor using metadata that we had stashed elsewhere in the Graph state. This was bad for a few reasons: - Storing the metadata separately from the actual serialized object caused situations where you could have one but not the other. An example case is if you had a FakeTensor contained inside a TorchBind object; there was no obvious place to store the metadata for this. This actually happens: TensorQueue in fbgemm does this. - It created an annoying cycle: we had to deserialize the Graph's tensor metadata in order to deserialize (potentially faked) constants, but we need constants in order to deserialize the Graph. This fixes all that. The basic idea is to patch the reducer function for FakeTensor at serialization time, and serialize a copy of the FakeTensor metadata. We already are policing BC for the TensorMeta schema struct so it's not a net increase in the BC surface. As a bonus, I fixed a weird bug with torchbind tracing where we were accidentally reinterpreting a torch.ScriptObject as a torch.ScriptModule (which was the root cause of some weird behavior @bahuang was seeing last week). Differential Revision: [D53601251](https://our.internmc.facebook.com/intern/diff/D53601251/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119531 Approved by: https://github.com/zhxchen17	2024-02-12 19:28:08 +00:00
Ke Wen	b2043c0543	[c10d] PGNCCL refactor part 2: Simplify ProcessGroupNCCL into single-device style (#119421 ) Part 2 and last part of #118674: Introduce actual "single-device" code change to ProcessGroupNCCL. assert size == 1 and test refactor have been done in #119099. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119421 Approved by: https://github.com/shuqiangzhang	2024-02-12 18:45:49 +00:00
PyTorch MergeBot	0342b227e5	Revert "[c10d] PGNCCL refactor part 2: Simplify ProcessGroupNCCL into single-device style (#119421 )" This reverts commit `f3e7d80993`. Reverted https://github.com/pytorch/pytorch/pull/119421 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/119421#issuecomment-1938169747))	2024-02-12 07:34:20 +00:00
Ke Wen	f3e7d80993	[c10d] PGNCCL refactor part 2: Simplify ProcessGroupNCCL into single-device style (#119421 ) Part 2 and last part of #118674: Introduce actual "single-device" code change to ProcessGroupNCCL. assert size == 1 and test refactor have been done in #119099. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119421 Approved by: https://github.com/shuqiangzhang	2024-02-09 20:23:20 +00:00
Ke Wen	029a16c41f	[c10d] PGNCCL refactor part 1: adds assert size==1 (#119099 ) Breaking #118674 into multiple smaller PRs. This is the first one. It adds `assert size==1` to PGNCCL, and refactors some old tests written in multi-device style (which would otherwise fail at the assert). Pull Request resolved: https://github.com/pytorch/pytorch/pull/119099 Approved by: https://github.com/wconstab, https://github.com/XilunWu	2024-02-07 22:29:29 +00:00
PyTorch MergeBot	9d46fe603d	Revert "[c10d] PGNCCL refactor part 1: adds assert size==1 (#119099 )" This reverts commit `4ab852b6c5`. Reverted https://github.com/pytorch/pytorch/pull/119099 on behalf of https://github.com/atalman due to Breaks internal tests ([comment](https://github.com/pytorch/pytorch/pull/119099#issuecomment-1930839754))	2024-02-06 22:14:36 +00:00
Ke Wen	4ab852b6c5	[c10d] PGNCCL refactor part 1: adds assert size==1 (#119099 ) Breaking #118674 into multiple smaller PRs. This is the first one. It adds `assert size==1` to PGNCCL, and refactors some old tests written in multi-device style (which would otherwise fail at the assert). Pull Request resolved: https://github.com/pytorch/pytorch/pull/119099 Approved by: https://github.com/wconstab	2024-02-06 06:59:47 +00:00
lancerts	576383c2eb	Add torch check for dtype within bilinear (#118900 ) Fixes https://github.com/pytorch/pytorch/issues/117237 Short-term fix, when dtype does not match, it will be reflected in the torch check. @ezyang a cpp test case is added Pull Request resolved: https://github.com/pytorch/pytorch/pull/118900 Approved by: https://github.com/ezyang, https://github.com/malfet	2024-02-03 00:02:00 +00:00
Mu-Chu Lee	2b48891e62	[AOTInductor] Add Runtime Constant-folding for AOTInductor (#118765 ) Summary: Add Runtime Constant-folding for AOTInductor. This also includes the invocation of constant folding at load time. The constant folding lowering is a 2-step process. First, we split the graph into 2 modules. One of them is the constant module, which doesn't depend on any input, so the whole module can be inferred (constant-folded) once and reused. The constant module is lowered, codegen-ed as usual, and cached (let's call this constant code). The constant code reuses the whole lowering/profiling/etc. process; the only difference is that we do not generate any headers or initialization for the constant code. Second, after handling the constant module, we take care of the main module (the part that depends on the user input). For the main module, we take in one additional component, the constant code, compared with a normal lowering. The additional step here is that we inject the constant code into the codegen-ed main module and create the caller for the main module to consume the result of the constant module. Test Plan: Unit tests included in commit. Differential Revision: D53274382 Pull Request resolved: https://github.com/pytorch/pytorch/pull/118765 Approved by: https://github.com/chenyang78	2024-02-01 04:54:25 +00:00
Boyuan Feng	b369888bec	Replace `constraints` with `dynamic_shapes` in caffe2/test/cpp & torchrec/distributed/tests/test_pt2 (#118026 ) Summary: `constraints` argument for `torch.export` has been deprecated in favor of the `dynamic_shapes` argument. This PR updates the use of the deprecated API in `caffe2/test/cpp` and `torchrec/distributed/test/test_pt2`. Test Plan: CI Differential Revision: D52977354 Pull Request resolved: https://github.com/pytorch/pytorch/pull/118026 Approved by: https://github.com/chenyang78	2024-01-23 22:15:15 +00:00
suo	4057d005ff	Initial torchbind support in PT2 (#117697 ) This PR adds the bare minimum functionality to get torchbind working in an e2e testable way on PT2. It implements: * ProxyTensor support * Simple torch.export support (proxytensor-only path, e.g. non-strict). * add some tests exercising the path. Because all this is not fully baked, I hide the functionality behind a feature flag (`enable_torchbind_tracing()`) so it does not affect regular users for now. Still on the agenda: * Dynamo support * Actual FakeMode support * Mutability support Hoping to get this first bit in as a standalone, as it will unblock some more extensive experimentation/testing going on internally. Differential Revision: [D51825372](https://our.internmc.facebook.com/intern/diff/D51825372/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/117697 Approved by: https://github.com/SherlockNoMad	2024-01-19 06:28:20 +00:00
Ke Wen	c16e6e4cf7	[ProcessGroup] Make watchdog check work queue more frequently (#117297 ) Today watchdog's sleep interval is 1s. That's a bit long compared to modern GPU link's (or network link's) speed. Take DDP and Ampere for example: DDP's bucket size = 25 MB Ampere's NVLink speed = 250 GB/s 25 MB / 250 GB/s = 100 ms. So we are updating the interval to 100 ms. Update: 25 MB / 250 GB/s = 0.1 ms But let's see how it goes so far between making the checking more aggressive. Pull Request resolved: https://github.com/pytorch/pytorch/pull/117297 Approved by: https://github.com/fduwjj	2024-01-19 02:33:31 +00:00
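The bandwidth arithmetic in the watchdog entry above can be checked with a short sketch. It assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes); the variable and function names are illustrative only. The result agrees with the "Update" figure in the message (0.1 ms), not the 100 ms stated first.

```python
import math

# Figures quoted in the commit message; decimal units are an assumption.
BUCKET_BYTES = 25 * 10**6            # DDP bucket size: 25 MB
LINK_BYTES_PER_SEC = 250 * 10**9     # Ampere NVLink speed: 250 GB/s


def transfer_time_ms(num_bytes, bytes_per_sec):
    """Time to move num_bytes over a link of the given bandwidth, in milliseconds."""
    return num_bytes / bytes_per_sec * 1000.0


if __name__ == "__main__":
    t_ms = transfer_time_ms(BUCKET_BYTES, LINK_BYTES_PER_SEC)
    print(f"one bucket takes about {t_ms:.3f} ms")
    # A 100 ms polling interval is therefore roughly 1000x one bucket transfer.
    print(f"100 ms interval / transfer time = {100.0 / t_ms:.0f}x")
    assert math.isclose(t_ms, 0.1)
```

So the new 100 ms interval is still far longer than a single bucket transfer, which is consistent with the message's note about seeing how it goes before making the check more aggressive.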
Ke Wen	6d96beb6be	[c10d] Remove health check (#117699 ) https://github.com/pytorch/pytorch/pull/114916 and https://github.com/pytorch/pytorch/pull/116222 added support for eager NCCL comm init (performed as soon as `init_process_group` is called). If any user cares about the time difference and want to see NCCL init errors early, they can use eager init now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/117699 Approved by: https://github.com/wconstab	2024-01-18 02:14:49 +00:00
fduwjj	ca4df16fdd	[c10d] Make DebugInfoWriter Singleton across all PG objects (#116489 ) Previously, the writer was registered to each NCCL PG (backend), and every PG has its own NCCL PG instance. So when a customized writer is used with multiple sub-PGs, the user has to register the writer for every backend, which is bad UX. Furthermore, the debug info is global, so it does not make sense to have a writer for each instance. We even have a static mutex in `dumpDebuggingInfo` to ensure we serialize the write, which makes it more obvious that we can make the writer a singleton so that there is only one writer instance for all PG instances. Although the rationale is clear, the implementation may vary a lot. So this PR is RFC for now to see if this implementation makes sense or not. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116489 Approved by: https://github.com/kwen2501	2024-01-03 03:42:54 +00:00
Bin Bao	feafbcf437	[AOTI][refactor] Refactor model runner API (#116047 ) Summary: 1) make proxy executor as a private member; 2) use std::string instead of char* Differential Revision: [D52301106](https://our.internmc.facebook.com/intern/diff/D52301106) Pull Request resolved: https://github.com/pytorch/pytorch/pull/116047 Approved by: https://github.com/khabinov	2023-12-21 01:05:37 +00:00
Xilun Wu	0b0b9b3275	[c10d][libuv] add partial read test for libuv backend and fix an error which only happens when partially reading a buffer (#116141 ) Test Plan 1. build pytorch 2. execute `TORCH_CPP_LOG_LEVEL=INFO build/bin/TCPStoreTest --gtest_filter=TCPStoreTest.testLibUVPartialRead` from the pytorch root directory. without the change: <img width="761" alt="image" src="https://github.com/pytorch/pytorch/assets/12968408/1942e3c2-a9c1-4fe4-87e8-7e21f4d8f9aa"> with the change: <img width="747" alt="image" src="https://github.com/pytorch/pytorch/assets/12968408/f3e96a5b-0ed1-49bd-9184-bb8a5ebebc33"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/116141 Approved by: https://github.com/wconstab	2023-12-20 18:37:55 +00:00
Bin Bao	fabf9433e7	[AOTI][refactor] Organize model runner files (#116022 ) Summary: Move runner util files into a subdirectory and put AOTIModelContainerRunnerCpu into a separate file Differential Revision: [D52300693](https://our.internmc.facebook.com/intern/diff/D52300693) Pull Request resolved: https://github.com/pytorch/pytorch/pull/116022 Approved by: https://github.com/khabinov	2023-12-20 15:35:34 +00:00
Nikita Shulga	d7caef7996	[CI] Update clang-format (#116002 ) To 17.0.6 build using https://github.com/pytorch/test-infra/blob/main/.github/workflows/clang-tidy-linux.yml Pull Request resolved: https://github.com/pytorch/pytorch/pull/116002 Approved by: https://github.com/suo	2023-12-18 14:58:46 +00:00
Mu-Chu Lee	c285ca7916	[AOTInductor] Add updaing constant buffer to active buffer. (#116001 ) Summary: Refactor update inactive constant buffer to allow updating with active buffer. Test Plan: Existing test to test inactive buffer updates. UpdateConstantsCuda in cpp test for active buffer updates. Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/116001 Approved by: https://github.com/chenyang78	2023-12-18 11:49:03 +00:00

2231 Commits