Commit Graph

65545 Commits

Author SHA1 Message Date
lezcano
acd02a60d5 Add a test making sure we are not importing SymPy when importing torch (#112038)
As per title
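A minimal sketch of the kind of check this adds (the subprocess-based approach below is an assumption for illustration, not the PR's actual test file):

```python
import subprocess
import sys

# Run a fresh interpreter, import torch, and assert SymPy was never pulled in.
check = "import sys, torch; assert 'sympy' not in sys.modules"
subprocess.run([sys.executable, "-c", check], check=True)
```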
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112038
Approved by: https://github.com/malfet, https://github.com/peterbell10
ghstack dependencies: #112035, #112036, #112037
2023-10-26 23:32:27 +00:00
lezcano
47ccf04885 Split SymNode into its own file (#112037)
This PR:

- Moves TrueDiv, LShift, RShift, IsNonOverlappingAndDenseIndicator to `_sympy.functions.py`
- Moves SymNode to `fx.experimental.sym_node`.
  - This file does not have any SymPy dependencies at import time
  - It installs the magic methods in Sym{Bool,Int,Float}.
  - N.b. With this split, we may be able to move Sym{Bool,Int,Float} to this file, and remove quite a few of the hacks around these classes
- Imports `sym_node` in `torch/__init__.py` rather than the whole `symbolic_shapes.py`.
  This breaks the import-time dependency between torch and SymPy
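A small sketch of what the new layout enables (module path taken from the PR description; the assertion is illustrative and assumes the whole stack has landed):

```python
import sys

# SymNode now lives in its own module, which has no SymPy dependency at import time.
from torch.fx.experimental.sym_node import SymNode  # installs Sym{Bool,Int,Float} magic methods

assert "sympy" not in sys.modules
```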

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112037
Approved by: https://github.com/peterbell10
ghstack dependencies: #112035, #112036
2023-10-26 23:32:27 +00:00
lezcano
deac5357db Make proxy_tensor.py not depend on SymPy (#112036)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112036
Approved by: https://github.com/malfet, https://github.com/peterbell10
ghstack dependencies: #112035
2023-10-26 23:32:19 +00:00
lezcano
4f7f46ee35 Move SymDispatchMode to its own file (#112035)
This is just code movement, plus a getter and a setter to break the dependency of SymDispatchMode (and, in turn, ProxySymDispatchMode) on SymPy.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112035
Approved by: https://github.com/peterbell10
2023-10-26 23:32:11 +00:00
PyTorch MergeBot
55ab9932f5 Revert "Constrain sdpa to fx strides (#111721)"
This reverts commit 8a7c3cec78.

Reverted https://github.com/pytorch/pytorch/pull/111721 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is breaking ROCm job in trunk 8a7c3cec78 ([comment](https://github.com/pytorch/pytorch/pull/111721#issuecomment-1782064133))
2023-10-26 23:27:57 +00:00
PyTorch MergeBot
4a94f77c8e Revert "Make numpy/lib vendored tests dynamo traceable (#112147)"
This reverts commit 190b6e4ba8.

Reverted https://github.com/pytorch/pytorch/pull/112147 on behalf of https://github.com/huydhn due to Sorry for reverting this again, but this is failing in trunk 190b6e4ba8 ([comment](https://github.com/pytorch/pytorch/pull/112147#issuecomment-1782056995))
2023-10-26 23:23:49 +00:00
Shunting Zhang
73cc5d1cdd [inductor] benchmark fusion (#108193)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108193
Approved by: https://github.com/jansel
2023-10-26 22:18:37 +00:00
Nikita Shulga
e660bd1422 Re-enable some embedded bag tests (#111712)
They were temporarily disabled in 2019 by https://github.com/pytorch/pytorch/pull/26599

As suggested, the relative tolerance was increased from 0 to 2% when tests use the float16 dtype

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111712
Approved by: https://github.com/huydhn
2023-10-26 22:16:38 +00:00
Evgeni Burovski
190b6e4ba8 Make numpy/lib vendored tests dynamo traceable (#112147)
Follow-up to https://github.com/pytorch/pytorch/pull/112146 and #112141: make the vendored numpy/lib tests dynamo traceable

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112147
Approved by: https://github.com/lezcano
2023-10-26 21:41:22 +00:00
PyTorch MergeBot
abe172e268 Revert "Cleanup error reporting for ProcessGroupNCCL (#111979)"
This reverts commit b29c658265.

Reverted https://github.com/pytorch/pytorch/pull/111979 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing multigpu test in trunk b29c658265 ([comment](https://github.com/pytorch/pytorch/pull/111979#issuecomment-1781919184))
2023-10-26 21:29:40 +00:00
rzou
d91a18c433 Grandfather in torchgen'ed aten ops to torch.Tag.pt2_compliant_tag (#112053)
In torchgen, we add the pt2_compliant_tag to all aten ops.
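A quick sketch of how the grandfathering can be observed (the op choices are arbitrary; the tag name comes from the commit title and the `.tags` attribute from the dispatcher API):

```python
import torch

# Every torchgen'ed aten overload should now carry the pt2_compliant_tag.
assert torch.Tag.pt2_compliant_tag in torch.ops.aten.sin.default.tags
assert torch.Tag.pt2_compliant_tag in torch.ops.aten.add.Tensor.tags
```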

Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112053
Approved by: https://github.com/soulitzer
2023-10-26 21:21:09 +00:00
Jon Chuang
27cf49549a [dynamo] ExecutorchCallDelegateHigherOrderVariable - add sanity check that input and output tensors are disjoint (#111960)
Fixes https://github.com/pytorch/pytorch/issues/111917

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111960
Approved by: https://github.com/zou3519
2023-10-26 21:13:05 +00:00
Bin Bao
73f36e44fb [aotinductor] Add a debug compile flag (#112021)
Summary: When the debug compile flag is specified, model.so is compiled with "-O0 -g".

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112021
Approved by: https://github.com/chenyang78
ghstack dependencies: #111823
2023-10-26 21:11:08 +00:00
Bin Bao
f66cc67562 [aotinductor] Fix duplicated unbacked symbol declarations (#111823)
Summary: For https://github.com/pytorch/pytorch/issues/111711

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111823
Approved by: https://github.com/ezyang, https://github.com/aakhundov
2023-10-26 21:11:08 +00:00
Lengyue
f839a5627b Add bf16 support to replicate padding (#112099)
Fixes #99433
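A minimal sketch of the newly supported path (the shapes and the functional entry point are illustrative assumptions, not this PR's test):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 2, 4, 4, dtype=torch.bfloat16)
out = F.pad(x, (1, 1, 1, 1), mode="replicate")  # replication padding on a bf16 tensor
print(out.dtype, out.shape)                     # torch.bfloat16 torch.Size([1, 2, 6, 6])
```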

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112099
Approved by: https://github.com/mikaylagawarecki
2023-10-26 20:30:49 +00:00
Elias Ellison
8a7c3cec78 Constrain sdpa to fx strides (#111721)
Fix for https://github.com/pytorch/pytorch/issues/109607. sdpa requires the last-dimension stride to be 1. Add a constraint so that we run the op with the strides observed during tracing.
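An illustration of the stride constraint described above (a sketch only, not the inductor change itself):

```python
import torch
import torch.nn.functional as F

q = torch.randn(2, 4, 8, 16).transpose(-1, -2)  # shape (2, 4, 16, 8), last-dim stride is 16
print(q.stride())
q = q.contiguous()  # the fused kernels want a last-dim stride of 1, which inductor now enforces
out = F.scaled_dot_product_attention(q, q, q)
```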

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111721
Approved by: https://github.com/drisspg, https://github.com/Chillee, https://github.com/jansel
ghstack dependencies: #111976
2023-10-26 20:21:55 +00:00
Justin Yip
1b702b185e [pytorch-vulkan] disable one zero-dim tensor test to fix test (#112087)
Summary:
D50347338 has a bug on Android (not Mac, not devserver).

This diff disables the test for the time being while I identify the actual cause.

Test Plan:
##  Compile on devserver

```
[yipjustin@129360.od ~/fbsource (e415d865c)]$ buck2 build -c ndk.static_linking=true -c pt.enable_qpl=0  --target-platforms=ovr_config//platform/android:arm32-fbsource //xplat/caffe2:pt_vulkan_api_test_binAndroid  --show-output
File changed: fbcode//caffe2/aten/src/ATen/test/vulkan_api_test.cpp
File changed: fbsource//xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp
Buck UI: https://www.internalfb.com/buck2/99d47e63-ed6e-4db9-bee2-24909d647b78
Network: Up: 3.2KiB  Down: 67KiB  (reSessionID-459e359b-773c-48a4-b129-81fde7c5e876)
Jobs completed: 4664. Time elapsed: 7.3s.
Cache hits: 100%. Commands: 38 (cached: 38, remote: 0, local: 0)
BUILD SUCCEEDED
fbsource//xplat/caffe2:pt_vulkan_api_test_binAndroid buck-out/v2/gen/fbsource/f1f3f9bed27e143c/xplat/caffe2/__pt_vulkan_api_test_binAndroid__/pt_vulkan_api_test_binAndroid
```

## Run test.
```
adb shell /data/local/tmp/pt_vulkan_api_test_binAndroid | pastry
```

Result: P864940908
```
...
[       OK ] VulkanAPITest.lstm_success (7 ms)
[ RUN      ] VulkanAPITest.lstm_mclareninputs_success
[       OK ] VulkanAPITest.lstm_mclareninputs_success (56 ms)
[ RUN      ] VulkanAPITest.lstm_prepack_success
[       OK ] VulkanAPITest.lstm_prepack_success (7 ms)
[ RUN      ] VulkanAPITest.querypool_flushed_shader_log
xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp:7568: Skipped
QueryPool is not available
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log (0 ms)
[----------] 391 tests from VulkanAPITest (30715 ms total)
[----------] Global test environment tear-down
[==========] 391 tests from 1 test suite ran. (30715 ms total)
[  PASSED  ] 390 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log
  YOU HAVE 7 DISABLED TESTS

```

Reviewed By: liuk22

Differential Revision: D50668570

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112087
Approved by: https://github.com/izaitsevfb, https://github.com/SS-JIA
2023-10-26 19:48:40 +00:00
Yang Chen
5e5329155e [aotinductor] only include -lc10 for non-fbcode case (#112125)
Summary: otherwise, we would break internal uses

Differential Revision: D50681467

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112125
Approved by: https://github.com/swolchok, https://github.com/desertfire, https://github.com/SherlockNoMad
2023-10-26 19:47:08 +00:00
PyTorch MergeBot
3a284dae30 Revert "Do not materialize entire randperm in RandomSampler (#103339)"
This reverts commit d80174e2db.

Reverted https://github.com/pytorch/pytorch/pull/103339 on behalf of https://github.com/kit1980 due to Cause issues on MPS, and also fails without numpy ([comment](https://github.com/pytorch/pytorch/pull/103339#issuecomment-1781705172))
2023-10-26 18:53:14 +00:00
Thiago Crepaldi
b7affa2ac3 Add unit test for ONNX models with torch.distributions.normal.Normal (#111498)
Fixes #111034
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111498
Approved by: https://github.com/justinchuby, https://github.com/BowenBao
2023-10-26 17:57:34 +00:00
ydwu4
8bc0b382fa [HigherOrderOp] Move map_impl to torch.ops.higher_order (#111404)
The purpose of this PR is as titled. Because of some misuse of ghstack, ghimport, and exporting to GitHub from internal, the stack of https://github.com/pytorch/pytorch/pull/111092 is a mess. I'll try to land them one by one. This is a replacement for #111092 and #111400.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111404
Approved by: https://github.com/tugsbayasgalan, https://github.com/zou3519
2023-10-26 16:59:10 +00:00
Huy Do
f6f81a5969 Update get-workflow-job-id to also return job name (#112103)
Then we can use this job name in `filter-test-configs` if it's available. This addresses the issue in which `filter-test-configs` on GitHub runners (MacOS x86) couldn't find the runner log to get the job name. This is expected because GitHub runners are isolated, so a job should not be able to access runner logs, which could contain information from other jobs.

This unblocks all the missing features that depend on running `filter-test-configs` on GitHub runners:
* Rerunning disabled tests and the memory leak check. For example, this would help avoid closing https://github.com/pytorch/pytorch/issues/110980#issuecomment-1779806466 early, since the disabled test can now run properly on MacOS x86
* MacOS x86 jobs can now be disabled or marked as unstable

I keep the current logic of parsing the log as a fallback because it's working fine on self-hosted runners; that also handles the case where `get-workflow-job-id` fails. I also move the rest of `get-workflow-job-id` up before the test step, as in https://github.com/pytorch/pytorch/pull/111483

### Testing

Spot-checked some jobs to confirm they have the correct names:

* MacOS M1 test job https://github.com/pytorch/pytorch/actions/runs/6648305319/job/18065275722?pr=112103#step:10:8
* MacOS x86 build job https://github.com/pytorch/pytorch/actions/runs/6648306305/job/18065138137?pr=112103#step:9:14
* Linux test job https://github.com/pytorch/pytorch/actions/runs/6648300991/job/18065354503?pr=112103#step:13:7
* Windows test job https://github.com/pytorch/pytorch/actions/runs/6648305319/job/18065599500?pr=112103#step:12:7
* MacOS x86 test job https://github.com/pytorch/pytorch/actions/runs/6648306305/job/18066312801#step:10:8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112103
Approved by: https://github.com/clee2000
2023-10-26 16:42:46 +00:00
PyTorch MergeBot
485cc0faae Revert "[inductor] benchmark fusion (#108193)"
This reverts commit ec0cdcdf6a.

Reverted https://github.com/pytorch/pytorch/pull/108193 on behalf of https://github.com/ZainRizvi due to This test is breaking trunk. In the future please make sure to add the ciflow/trunk label before force merging any PR to ensure your code doesn't break those tests ([comment](https://github.com/pytorch/pytorch/pull/108193#issuecomment-1781473282))
2023-10-26 16:41:20 +00:00
Edward Z. Yang
7da713bbaf Convert evaluate_expr GuardOnDataDependentSymNode into graph break (#111919)
Extracted this failure from
https://github.com/pytorch/pytorch/pull/110155

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111919
Approved by: https://github.com/lezcano
2023-10-26 16:28:00 +00:00
ydwu4
036abd43b3 [dynamo] Preserve node names in export (#111947)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111947
Approved by: https://github.com/ydwu4, https://github.com/zou3519
2023-10-26 16:11:35 +00:00
angelayi
b126adcdee [aotinductor] Pass TorchIR to AOTInductor (#110020)
Updates `_export.aot_compile` to pass a Torch IR graph to inductor, allowing inductor to run the pre_grad_passes and reuse more of inductor's code.
Also updates the API to only return the `so_path`, not the exported program. The pytree call spec is now serialized and placed inside the generated model code. When calling the model, because there is no C++ pytree implementation linked yet, we can access the call specs through `get_call_spec()` and call pytree flatten/unflatten in Python.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110020
Approved by: https://github.com/desertfire
2023-10-26 15:54:31 +00:00
Evgeni Burovski
ed2cc4dd59 TST: make torch_np added tests dynamo traceable (#112149)
Follow-up to https://github.com/pytorch/pytorch/pull/112146, https://github.com/pytorch/pytorch/pull/112141 and https://github.com/pytorch/pytorch/pull/112147: make the added torch_np tests dynamo traceable

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112149
Approved by: https://github.com/lezcano
2023-10-26 15:36:36 +00:00
Joel Schlosser
42e4c648a2 New @decorateIf decorator for param-specific conditional decoration (#112033)
Adds a new decorator `@decorateIf(decorator, predicate_fn)`. Examples:
```python
from torch.testing._internal.common_utils import decorateIf
...

@decorateIf(unittest.skip, lambda params: params["x"] == 2)
@parametrize("x", range(5))
def test_foo(self, x):
    ...

@parametrize("x,y", [(1, 'foo'), (2, 'bar'), (3, 'baz')])
@decorateIf(
    unittest.expectedFailure,
    lambda params: params["x"] == 3 and params["y"] == "baz"
)
def test_bar(self, x, y):
    ...

@decorateIf(
    unittest.expectedFailure,
    lambda params: params["op"].name == "add" and params["dtype"] == torch.float16
)
@ops(op_db)
def test_op_foo(self, device, dtype, op):
    ...

@decorateIf(
    unittest.skip,
    lambda params: params["module_info"].module_cls is torch.nn.Linear and \
        params["device"] == "cpu"
)
@modules(module_db)
def test_module_foo(self, device, dtype, module_info):
    ...
```

Follow-up for per-param decoration based on https://github.com/pytorch/pytorch/issues/79161#issuecomment-1152487359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112033
Approved by: https://github.com/clee2000, https://github.com/pmeier
2023-10-26 14:39:59 +00:00
Yang Chen
7671be8108 [aotinductor] allow generating default args in fbcode (#112085)
Summary:
Previously, we wanted to maintain forward compatibility by skipping
default args in the serialized artifacts in fbcode. However, some of our shim
interfaces require default values to be set. Discussed with Sherlock offline,
and we decided to allow serializing default args into the C++ wrapper code
for now. We will refine this part if we see a real FC requirement.

Test Plan: ci

Differential Revision: D50638663

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112085
Approved by: https://github.com/SherlockNoMad
2023-10-26 14:17:54 +00:00
lezcano
c8a5bb451e Do not import sympy within torch._prims_common (#112034)
This is the first of a few PRs that avoid importing SymPy at import time.
The pitch here is that we (almost!) do not have SymPy on our API, so
this should be feasible.

This should speed up torch imports by a good 15%, as per
https://dev-discuss.pytorch.org/t/delving-into-what-happens-when-you-import-torch/1589

In this PR we just move a few global imports into local imports.
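The pattern is roughly the following (an illustrative sketch with made-up function names, not the PR's exact diff):

```python
# Before: module-level import, paid on every `import torch`.
#   import sympy
#   def is_symbolic(x):
#       return isinstance(x, sympy.Basic)

# After: local import, only paid when the symbolic code path actually runs.
def is_symbolic(x):
    import sympy
    return isinstance(x, sympy.Basic)
```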
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112034
Approved by: https://github.com/ezyang
2023-10-26 12:53:25 +00:00
Jon Chuang
d6724a51f9 [dynamo] md5 hash non compile_ignored configs (#111298)
fixes: https://github.com/pytorch/pytorch/issues/111235

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111298
Approved by: https://github.com/ezyang
ghstack dependencies: #111303
2023-10-26 10:59:10 +00:00
Cao E
1c89ea7f72 Add Half support for softmax and log_softmax on CPU (#103315)
Add Half support for softmax and log_softmax on CPU.
Note: This introduces a correctness issue with MPS https://github.com/pytorch/pytorch/issues/111416 and https://github.com/pytorch/pytorch/issues/111479.
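A minimal sketch of the newly supported CPU path (not the PR's own test):

```python
import torch

x = torch.randn(8, dtype=torch.half)   # float16 tensor on CPU
print(torch.softmax(x, dim=0))         # now supported natively in Half
print(torch.log_softmax(x, dim=0))
```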

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103315
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki, https://github.com/malfet
2023-10-26 08:38:54 +00:00
dshi7
fbff99ffea Add regex matching to Inductor all2all collective unit tests (#112077)
Fixes #111776

Support `check_regex` in FileCheck() by adding `find_regex` to `struct TORCH_API StringCordView`.
The call site accepts std::regex syntax.

However, I haven't figured out submatch IDs yet.
For example, "buf5[0], buf6_inputs[0]" is still considered a match.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112077
Approved by: https://github.com/yf225
2023-10-26 08:29:30 +00:00
XiaobingSuper
395614c1a4 keep sync bn training flag same with converted bn's training flag (#111998)
When converting BN to SyncBN, we need to keep SyncBN's training flag the same as the original BN's flag. The motivation: the given model may have some BN layers set to training mode and others not, and after converting to SyncBN we do not want to change that behavior.
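A small sketch of the preserved behavior (the layer layout below is illustrative):

```python
import torch.nn as nn

m = nn.Sequential(nn.BatchNorm2d(4), nn.BatchNorm2d(4))
m[1].eval()                                        # mixed training/eval BN layers
sync = nn.SyncBatchNorm.convert_sync_batchnorm(m)
print(sync[0].training, sync[1].training)          # expected: True False
```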

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111998
Approved by: https://github.com/mikaylagawarecki
2023-10-26 08:18:08 +00:00
chilli
e38347f490 Readded device_assert skipping in index and index_put (and also added copy to noop pass) (#112093)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112093
Approved by: https://github.com/oulgen, https://github.com/lezcano
ghstack dependencies: #111990
2023-10-26 07:54:44 +00:00
Jon Chuang
d090c18fca [dynamo] annotate config with @compile_ignored (#111303)
Fixes: #111221

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111303
Approved by: https://github.com/ezyang
2023-10-26 05:41:29 +00:00
Jez Ng
89bd17552d [dynamo] Enable typechecking for funcname_cache.py (#112031)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112031
Approved by: https://github.com/Skylion007
ghstack dependencies: #111894, #111992
2023-10-26 04:54:16 +00:00
Jez Ng
413baa1b25 [dynamo] Enable typechecking for codegen.py (#111992)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111992
Approved by: https://github.com/Skylion007, https://github.com/eellison
ghstack dependencies: #111894
2023-10-26 04:54:16 +00:00
Jez Ng
e67d2c9825 [dynamo] Enable typechecking for allowed_functions.py (#111894)
Motivation: MYPYNOFOLLOW currently typechecks almost all inductor files
and some dynamo files as well. However, it has `follow_imports=skip`
enabled which greatly nerfs its effectiveness. I would like to enable
import following for all the files currently checked by MYPYNOFOLLOW.
But that leads to a lot of new errors in other files.

I can exclude errors from files in other directories, but it is somewhat
difficult to do that for dynamo and inductor files themselves. Thus I am
making sure all the dynamo files typecheck first.

Note on changes: I could not type the return value of
`make_function_id_set` since it was returning a class defined in the
function body. Thus I deleted `make_function_id_set` and replaced it
with a direct construction of the `FunctionIdSet` instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111894
Approved by: https://github.com/Skylion007, https://github.com/eellison
2023-10-26 04:54:16 +00:00
Nikita Shulga
b61efe1c2b Fix `torch.[size|stride](dim=None)` invocation (#111991)
Per the documentation, one should be able to explicitly pass the dim argument as None to get the tensor's sizes/strides across all dimensions, but before this change it was incorrectly interpreted as a named-tensor call.

Modify the `size` and `stride` signatures generated by `gen_pyi.py` to highlight that the overload with `None` returns a Tuple, while the one with `dim: _int` returns an `int`.

Add a regression test to validate the behavior, and remove the check for asserts from two named-tensor tests (NamedTensors are dead, aren't they?)
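A quick sketch of the fixed behavior:

```python
import torch

t = torch.randn(2, 3)
print(t.size(dim=None))    # torch.Size([2, 3]), no longer misread as a named-tensor call
print(t.stride(dim=None))  # (3, 1)
print(t.size(dim=1))       # 3
```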

Fixes https://github.com/pytorch/pytorch/issues/111944
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111991
Approved by: https://github.com/zou3519
2023-10-26 04:14:35 +00:00
Shunting Zhang
ec0cdcdf6a [inductor] benchmark fusion (#108193)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108193
Approved by: https://github.com/jansel
2023-10-26 04:14:22 +00:00
Jon Chuang
edafe2ddb9 [dynamo] Be stricter about HigherOrderOperator kwargs (#111938)
kwargs need to be handled carefully in speculate subgraph. We should be clearer about the contract of what the inputs are.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111938
Approved by: https://github.com/zou3519
2023-10-26 03:51:30 +00:00
Brian Hirsh
2aaa7e542c AOTAutograd: avoid intermediate_base logic when all aliased outputs came from a multi_output_view (#111411)
Partially addresses https://github.com/pytorch/pytorch/issues/111081

This fixes the majority of the slowness from https://fb.workplace.com/groups/1405155842844877/permalink/7491314274228973/. In particular, the type of example that suffers the most perf-wise in AOTAutograd looks like this:
```
import torch

@torch.compile
def f(x):
    intermediate = x.mul(2)
    outs = intermediate.unbind(0)
    return *outs,  # return the unbound views as a tuple

x = torch.randn(50, 50, requires_grad=True)
outs = f(x)
sum(outs).sum().backward()
```

There are 50 output tensors in the above function, that all alias each other. AOTAutograd will dutifully exercise its intermediate base [logic](https://github.com/pytorch/pytorch/blob/main/torch/_functorch/aot_autograd.py#L294), and try to regenerate the aliases outside of the compiled `autograd.Function` at runtime, to ensure that the autograd engine is aware of the aliasing.

In this case, this will result in **50 AsStridedBackward nodes in the backward**, because we will fall back to using as_strided to generate each of those 50 outputs. The current PR as is (somewhat unsafely) ensures that the backward graph consists of a single `UnbindBackward`, or a call to `aten.cat()`.

I left a long comment in the code describing the situation, but the core idea is that **autograd does not let you mutate grad_fn of tensor aliases that come from multi-output views**. So if we have `k` outputs that alias each other, but `k-1` of them are aliases that came from multi-output views, then in eager mode, it would not be possible to mutate one of the aliases in a way that would change the grad_fn of any of the other aliases, without causing an error in the backward. So the claim I'm making is that if we hide this aliasing from the autograd engine, then it is impossible for the user to perform any mutations that would cause autograd metadata to diverge between torch.compile and eager in a way that isn't an error in eager mode.

To be fair, I think that taking the approach outlined in https://docs.google.com/document/d/1DlfFq8TKbuAn2zyJxLfoW-X1qkkm5PLdHFtySo03QAk/edit would also help us avoid the as_strided calls in this particularly egregious case, **and** keep the autograd error messages. This relies on both pre-dispatch functionalization being fully hardened **and** adding some pretty invasive changes to AOTAutograd though, and is probably at least several months out.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111411
Approved by: https://github.com/ezyang
2023-10-26 02:54:50 +00:00
Jeff Daily
28c0b07d19 [ROCm] remove HCC references (#111975)
- rename `__HIP_PLATFORM_HCC__` to `__HIP_PLATFORM_AMD__`
- rename `HIP_HCC_FLAGS` to `HIP_CLANG_FLAGS`
- rename `PYTORCH_HIP_HCC_LIBRARIES` to `PYTORCH_HIP_LIBRARIES`
- workaround in tools/amd_build/build_amd.py until submodules are updated

These symbols have had a long deprecation cycle and will finally be removed in ROCm 6.0.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111975
Approved by: https://github.com/ezyang, https://github.com/hongxiayang
2023-10-26 02:39:10 +00:00
Kurt Mohler
f1785373c0 Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377)
Part of #109802
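A sketch of the new flag named in the title (its semantics are described in #109802; the usage below is an assumption, not taken from this PR):

```python
import torch

torch.use_deterministic_algorithms(True)
# The flag (default True) controls whether deterministic mode fills freshly
# allocated memory, e.g. from torch.empty, with a known value:
print(torch.utils.deterministic.fill_uninitialized_memory)
torch.utils.deterministic.fill_uninitialized_memory = False
```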

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111377
Approved by: https://github.com/albanD
2023-10-26 02:39:06 +00:00
Hongtao Yu
7a3a00bb0b [inductor] Remove redundant views (#111773)
As a follow-up to https://github.com/pytorch/pytorch/pull/110740, this patch enables removing redundant complex views to allow more operation fusing.

E.g., given

```
import torch

@torch.compile
def foo(X, Y):
    # X and Y are complex tensors, so each add goes through dtype views (see the generated code below)
    Z = X + Y
    A = X + Y
    return A + Z
```

the generated code is:

```
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tmp0 + tmp1
    tmp3 = tmp2 + tmp2
    tl.store(out_ptr0 + (x0), tmp3, xmask)
''')

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    assert_size_stride(arg0_1, (3, ), (1, ))
    assert_size_stride(arg1_1, (3, ), (1, ))
    with torch.cuda._DeviceGuard(0):
        torch.cuda.set_device(0) # no-op to ensure context
        # Source Nodes: [A], Original ATen: [aten.add]
        buf0 = aten.view.dtype(arg0_1, torch.float32)
        del arg0_1
        buf1 = buf0
        del buf0
        # Source Nodes: [A], Original ATen: [aten.add]
        buf2 = aten.view.dtype(arg1_1, torch.float32)
        del arg1_1
        buf3 = buf2
        del buf2
        buf4 = empty_strided((6, ), (1, ), device='cuda', dtype=torch.float32)
        # Source Nodes: [add_2], Original ATen: [aten.add]
        stream0 = get_cuda_stream(0)
        triton_poi_fused_add_0.run(buf1, buf3, buf4, 6, grid=grid(6), stream=stream0)
        del buf1
        del buf3
        # Source Nodes: [add_2], Original ATen: [aten.add]
        buf5 = aten.view.dtype(buf4, torch.complex64)
        del buf4
        buf6 = buf5
        del buf5
        return (buf6, )
```

whereas previously the generated code was:

```
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0), tmp2, xmask)

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    assert_size_stride(arg0_1, (3, ), (1, ))
    assert_size_stride(arg1_1, (3, ), (1, ))
    with torch.cuda._DeviceGuard(0):
        torch.cuda.set_device(0) # no-op to ensure context
        # Source Nodes: [A], Original ATen: [aten.add]
        buf0 = aten.view.dtype(arg0_1, torch.float32)
        buf1 = buf0
        del buf0
        # Source Nodes: [A], Original ATen: [aten.add]
        buf2 = aten.view.dtype(arg1_1, torch.float32)
        buf3 = buf2
        del buf2
        buf4 = empty_strided((6, ), (1, ), device='cuda', dtype=torch.float32)
        # Source Nodes: [A], Original ATen: [aten.add]
        stream0 = get_cuda_stream(0)
        triton_poi_fused_add_0.run(buf1, buf3, buf4, 6, grid=grid(6), stream=stream0)
        del buf1
        del buf3
        # Source Nodes: [A], Original ATen: [aten.add]
        buf5 = aten.view.dtype(buf4, torch.complex64)
        buf6 = buf5
        del buf5
        # Source Nodes: [add_2], Original ATen: [aten.add]
        buf7 = aten.view.dtype(buf6, torch.float32)
        del buf6
        buf8 = buf7
        del buf7
        # Source Nodes: [Z], Original ATen: [aten.add]
        buf9 = aten.view.dtype(arg0_1, torch.float32)
        del arg0_1
        buf10 = buf9
        del buf9
        # Source Nodes: [Z], Original ATen: [aten.add]
        buf11 = aten.view.dtype(arg1_1, torch.float32)
        del arg1_1
        buf12 = buf11
        del buf11
        buf13 = buf4; del buf4  # reuse
        # Source Nodes: [Z], Original ATen: [aten.add]
        triton_poi_fused_add_0.run(buf10, buf12, buf13, 6, grid=grid(6), stream=stream0)
        del buf10
        del buf12
        # Source Nodes: [Z], Original ATen: [aten.add]
        buf14 = aten.view.dtype(buf13, torch.complex64)
        buf15 = buf14
        del buf14
        # Source Nodes: [add_2], Original ATen: [aten.add]
        buf16 = aten.view.dtype(buf15, torch.float32)
        del buf15
        buf17 = buf16
        del buf16
        buf18 = buf13; del buf13  # reuse
        # Source Nodes: [add_2], Original ATen: [aten.add]
        triton_poi_fused_add_0.run(buf8, buf17, buf18, 6, grid=grid(6), stream=stream0)
        del buf17
        del buf8
        # Source Nodes: [add_2], Original ATen: [aten.add]
        buf19 = aten.view.dtype(buf18, torch.complex64)
        del buf18
        buf20 = buf19
        del buf19
        return (buf20, )
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111773
Approved by: https://github.com/jansel
2023-10-26 02:37:17 +00:00
Zhengxu Chen
64d75f72d4 [fx] Add a faster method for inserting positional argument. (#111974)
Summary:
Traditionally, when a user wants to update the arguments of an FX node, the only way is to call the setter of the .args property on the node. This may be problematic when we insert a lot of arguments: because of the semantics of the setter method, it has worst-case O(n) complexity.

Adding a new insert_arg gives us two benefits:
1. The operation is guaranteed to be O(1) cost.
2. Users can express the intention more directly, instead of writing code like `node.args = (arg,) + node.args`
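A sketch of the new API (the `insert_arg(idx, arg)` signature is assumed from the description above):

```python
import torch
from torch import fx

g = fx.Graph()
x = g.placeholder("x")
y = g.placeholder("y")
add = g.call_function(torch.add, (y,))
add.insert_arg(0, x)                 # O(1); previously: add.args = (x,) + add.args
g.output(add)

gm = fx.GraphModule(torch.nn.Module(), g)
print(gm(torch.ones(2), torch.ones(2)))   # tensor([2., 2.])
```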

Test Plan: caffe2/test:fx -- -r test_insert_arg

Reviewed By: suo

Differential Revision: D50574435

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111974
Approved by: https://github.com/angelayi
2023-10-26 02:30:42 +00:00
Pritam Damania
b29c658265 Cleanup error reporting for ProcessGroupNCCL (#111979)
Continuing some of the work from https://github.com/pytorch/pytorch/pull/108191, I realized that the majority of errors raised from ProcessGroupNCCL were just generic RuntimeErrors.

In this PR, I've added appropriate error types to all the exceptions raised from ProcessGroupNCCL.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111979
Approved by: https://github.com/fduwjj
2023-10-26 01:39:54 +00:00
chilli
74adb4cccc Updated flop counter to accept pytree inputs/outputs (#111990)
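A sketch of the counter used on a module whose forward takes and returns a dict, i.e. the pytree-shaped inputs/outputs this PR is about (the `FlopCounterMode` usage below is an assumption, not taken from this PR):

```python
import torch
from torch.utils.flop_counter import FlopCounterMode

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(16, 16)

    def forward(self, inputs):                 # pytree (dict) input
        return {"out": self.lin(inputs["x"])}  # pytree (dict) output

m = M()
with FlopCounterMode(m, display=False) as counter:
    m({"x": torch.randn(4, 16)})
print(counter.get_total_flops())
```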
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111990
Approved by: https://github.com/ezyang
2023-10-26 01:25:27 +00:00
PyTorch MergeBot
d641450180 Revert "[cpu][inductor] improve cpu vec implementations of log (#111898)"
This reverts commit b570320364.

Reverted https://github.com/pytorch/pytorch/pull/111898 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111898#issuecomment-1780263780))
2023-10-26 01:12:19 +00:00