pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Bin Bao	3058700f7f	[aotinductor] Add AOTIModelRunner as a utility class (#110891 ) Summary: Introduce a utility class AOTIModelRunner to take care of running an AOTInductor compiled model. It does things like dlopen a model, initialize the model container, setup inputs and outputs, and destroy the model container. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110891 Approved by: https://github.com/chenyang78 ghstack dependencies: #110652	2023-10-11 15:58:28 +00:00
Bin Bao	b17c247eb1	[aotindutor] Update the cpp test example (#110652 ) Summary: store inputs and outpus in python, and load them back to run the compiled model in c++ and compare the output Pull Request resolved: https://github.com/pytorch/pytorch/pull/110652 Approved by: https://github.com/chenyang78	2023-10-11 15:58:28 +00:00
ydwu4	3062e267b1	[cond] Add more tests for valid inputs of cond (#110727 ) This PR adds a parametrized test for cond. It tests cond can be traced with valid inputs. Specifically valid inputs is combination of: - pred (python boolean, boolean tensor, int tensor, scalar tensor) - true_fn/false_fn (func, obj, nn_module) - Operands (0 or more tensor inputs), tested with 0 and 2 - closures (0 or more tensor closures), tested with 0 and 2 - nested_level (no nesting or level-2 nested cond) What this test doesn't cover: - pred: symbolic boolean expression as predicate - true_fn/false_fn: that mutates indermediate tensors - operands: non-tensor operands such as float, int - closures: nn_module attribute closures, python constant closures - nested_level: 3+ Pull Request resolved: https://github.com/pytorch/pytorch/pull/110727 Approved by: https://github.com/zou3519	2023-10-11 15:56:13 +00:00
Nikita Shulga	ef19824db8	Suppress warnings in tensorpipe.h (#111012 ) To fix distributed compilation with clang-15 Fixes https://github.com/pytorch/pytorch/issues/110974 Pull Request resolved: https://github.com/pytorch/pytorch/pull/111012 Approved by: https://github.com/huydhn, https://github.com/drisspg, https://github.com/Skylion007	2023-10-11 15:41:30 +00:00
Nikita Shulga	f2d476843e	[MPS][BE] Avoid redispatch in `sign_out` (#110955 ) By calling `at::mps::sign_outf` rather than `at::sign_out` that calls dispatcher again. Also, do not copy output unnecessarily. <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at f942e74</samp> > _Metal tensors rise from the ashes_ > _`sign` and `sgn` unleash their flashes_ > _MPSFunctions reign supreme_ > _In the header of the metal dream_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/110955 Approved by: https://github.com/kulinseth, https://github.com/albanD	2023-10-11 15:10:21 +00:00
Sam Larsen	fc1105b282	[inductor] Implement Fx graph caching to improve warm compilation time. (#103453 ) Summary: Implement an on-disk cache to save and reuse compiled FX Graphs. This implementation does not handle tensors with symbolic shapes. This needs to be done in a follow-up PR. Test Plan: * New unit tests exercising saving and load from the cache. * New unit tests to exercise the cache key calculations. * Ran several benchmarks to see cache hit and resulting compilation times. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103453 Approved by: https://github.com/eellison, https://github.com/Chillee	2023-10-11 14:39:14 +00:00
rzou	2cf9782912	[generate_opcheck_tests] Add some reasonable defaults (#110977 ) Summary: Make it easier to add `generate_opcheck_tests` by adding defaults for the failures_dict location, the additional decorators, and the test utils. Test Plan: Existing tests Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/110977 Approved by: https://github.com/williamwen42 ghstack dependencies: #110951	2023-10-11 14:28:05 +00:00
Bin Bao	4abfa22812	[aotinductor] Add a perf smoke test for AOTInductor (#110972 ) Summary: To prevent perf regression like the one caused by #110510 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110972 Approved by: https://github.com/chenyang78	2023-10-11 13:30:05 +00:00
PyTorch MergeBot	98c329b19e	Revert "[core ATen IR] Add decompositions for max, min, var_mean (#110906 )" This reverts commit `9606cda64e`. Reverted https://github.com/pytorch/pytorch/pull/110906 on behalf of https://github.com/SS-JIA due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110906#issuecomment-1757490740))	2023-10-11 11:41:21 +00:00
igm503	95ff51d8ed	[MPS] Add support for Softshrink to MPS Backend (#110814 ) Adds the softshrink activation function to the mps backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110814 Approved by: https://github.com/kulinseth	2023-10-11 07:55:39 +00:00
Rohan Varma	de370eb313	[Distributed] Small nits to apply_optimizer_in_backward (#110903 ) Clarify a few things around the documentation Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/110903 Approved by: https://github.com/janeyx99	2023-10-11 07:45:45 +00:00
PyTorch MergeBot	0821868110	Revert "[export] Get export APIs ready for PTC (#110410 )" This reverts commit `b96ea9f361`. Reverted https://github.com/pytorch/pytorch/pull/110410 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/110410#issuecomment-1757017249))	2023-10-11 07:31:51 +00:00
Huy Do	74029fae9d	Fix broken period workflow after #110976 (#111013 ) My typo mistake after #110976 Pull Request resolved: https://github.com/pytorch/pytorch/pull/111013 Approved by: https://github.com/kit1980, https://github.com/malfet	2023-10-11 06:40:18 +00:00
Ramin Azarmehr	056d6247c7	[MPS] Use Metal Events to synchronize buffers in MPSAllocator (Part 1) (#106938 ) - This PR is the first part of a bigger change to use `MPSEvent` to synchronize shared-buffers between CPU/GPU. - Add APIs to record and wait for `MPSEvents` in `MPSAllocator`. - Use a container list for Buffer Pools to simplify iterating over them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106938 Approved by: https://github.com/kulinseth	2023-10-11 06:13:05 +00:00
Angela Yi	b96ea9f361	[export] Get export APIs ready for PTC (#110410 ) Summary: https://docs.google.com/document/d/1QJJEGnj2nHGPODlw38BEG3KLLCOTfdOVjPrNQbz_LM8/edit#bookmark=id.lp80wfshq130 Changes: * `torch.export` will return a functional ATen graph w/o decompositions * `exported_program.run_decompositions(decomposition_table)` will optionally take a decomposition table, and run decompositions on the exported program, returning a new exported program. By default we will run the Core ATen decomposition table. Calling convention for Executorch stays the same: ``` pre_autograd_graph = capture_pre_autograd_graph(f, args, ...) aten_graph_no_decomps = torch.export.export(pre_autograd_graph, args, ...) # Within to_edge we decompose to core aten and then convert to edge edge_graph = exir.to_edge(aten_graph_no_decomps) ``` Test Plan: CI Differential Revision: D49742989 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110410 Approved by: https://github.com/ydwu4	2023-10-11 06:10:07 +00:00
Michael Voznesensky	1e7947b3e0	Revert "Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323 )" + Forward fixes + test (#110964 ) This reverts commit `f786fbdebd`. Forward fixes Pull Request resolved: https://github.com/pytorch/pytorch/pull/110964 Approved by: https://github.com/ezyang, https://github.com/anijain2305	2023-10-11 05:16:47 +00:00
Nikita Shulga	e49ea87162	Fix socket.cpp compilation using gcc-9.4 (#111002 ) Otherwise following error is thrown when attempted to compile with WERROR enabled: ``` In file included from /home/nshulga/git/pytorch/pytorch/torch/csrc/distributed/c10d/socket.cpp:30: /home/nshulga/git/pytorch/pytorch/third_party/fmt/include/fmt/chrono.h:340:24: warning: redundant redeclaration of ‘constexpr’ static data member ‘fmt::v10::detail::codecvt_result<CodeUnit>::max_size’ [-Wdeprecated] 340 \| constexpr const size_t codecvt_result<CodeUnit>::max_size; \| ^~~~~~~~~~~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/third_party/fmt/include/fmt/chrono.h:335:33: note: previous declaration of ‘fmt::v10::detail::codecvt_result<CodeUnit>::max_size’ 335 \| static constexpr const size_t max_size = 32; \| ^~~~~~~~ ``` or following if using clang as host compiler ``` In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/distributed/c10d/socket.cpp:30: /Users/nshulga/git/pytorch/pytorch/third_party/fmt/include/fmt/chrono.h:340:50: warning: out-of-line definition of constexpr static data member is redundant in C++17 and is deprecated [-Wdeprecated] constexpr const size_t codecvt_result<CodeUnit>::max_size; ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/111002 Approved by: https://github.com/drisspg	2023-10-11 05:16:00 +00:00
wz337	a614281ea9	Add current_device() to torch.cpu (#110987 ) Better support device agnostic, add a "cpu" return for `current_device()` in torch.cpu so that we won't run into `AttributeError: module 'torch.cpu' has no attribute 'current_device'`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110987 Approved by: https://github.com/wanchaol	2023-10-11 05:13:10 +00:00
soulitzer	110382bacf	Make NestedTensor compilable with eager backend (#109171 ) In this PR: - Adds support for strides for jagged tensor (design doc for this coming soon) - NestedTensor skips automatic dynamic - Make use of @bdhirsh's subclass fakification logic by adding the __tensor_{un,}flatten__ functions. - Additional logic for fakification: since existing subclass fakification logic does not handle the case where the outer tensor has an additional dimension. We insert one-off logic to (1) insert an extra SingletonSymInt onto the fakified NestedTensor. (2) make sure we call track_symint on both the sizes on the inner and outer tensor during guard creation. Remaining things that are weird: - Still need to skip some logic in meta utils for some reason (I was going to write this up more, but decided not to since we're not able to do this anyway for a immediate reason: we cannot arbitrarily compare singleton ints. For now I'm just following Brian's advise from [here](https://github.com/pytorch/pytorch/pull/109171#discussion_r1328137070) ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109171 Approved by: https://github.com/ezyang, https://github.com/bdhirsh	2023-10-11 04:47:10 +00:00
drisspg	e0dbaa04d2	Fix the meta func for mem_eff_backward (#110893 ) Fixes #110832 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110893 Approved by: https://github.com/eellison	2023-10-11 02:58:54 +00:00
andrewor14	0e551bbcd7	[quant][pt2] Preserve source_fn_stack after QAT fusion (#110899 ) Test Plan: python test/test_quantization.py TestQuantizePT2EQAT.test_qat_preserve_source_fn_stack Reviewers: jerryzh168, kimishpatel Subscribers: jerryzh168, kimishpatel, supriyar Differential Revision: [D50101253](https://our.internmc.facebook.com/intern/diff/D50101253) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110899 Approved by: https://github.com/jerryzh168	2023-10-11 02:55:52 +00:00
Tugsbayasgalan Manlaibaatar	5aee22e0e0	Move export.constrain_as_* to torch._constrain_as_* (#110757 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110757 Approved by: https://github.com/avikchaudhuri ghstack dependencies: #109859	2023-10-11 02:37:55 +00:00
soulitzer	c9eb8d8d90	Add set_checkpoint_debug_enabled that overrides local setting (#110728 ) People access activation checkpoint through many layers of config and it is not always guaranteed that all the layers of wrapping around checkpoint properly propagate all the kwargs, e.g. debug mode. This context manager offers an alternative way to enable debug mode that bypasses the need for all layers to propagate kwargs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110728 Approved by: https://github.com/albanD ghstack dependencies: #110673, #110674, #110675, #110676	2023-10-11 02:12:31 +00:00
Michael Voznesensky	02f6a8126e	Support a simple subset of functions as backward hooks on intermediate tensors (#109537 ) The main thrust of the initial effort here was to capture `register_hook` calls on tensors in compile regions. The first part of this was done in https://github.com/pytorch/pytorch/pull/108903 wherein we added support for register_hook input tensors. The distinction between input and intermediary is due to implementation differences. There are 2 kinds of hooks: 1) Hooks on objects with sources (inputs, params) 2) Hooks on objects w/o sources (intermediaries, and outputs). Note: As outputs can be made simple by how dynamo handles residuals, they could actually be handled as if they were inputs, but, for the sake of this PR, we will refer to hooks as either hooks on inputs (sourced), or hooks on intermediaries (not sourced). The plan: For tensors w/ a source: (The PR above) We record registered hooks, store them as a global, and associate them with the tensor in residuals. This means that when dynamo goes to create the frame, where we produce bytecode to stitch together our PT2 modified bytecode with the original eager code, we call register_hook. This registration of hooks in residuals is sound because (a) it happens right after a Pt2 frame region ends and (b) we know that the tensor is alive in f_locals, f_globals, or a module in the users invoking frame. This means we can soundly know it will be around to invoke register_hook on. As long as we guard on the identity of the lifted function, this is sound to do. For tensors w/o a source: (This PR) Ostensibly, the most correct and complete solution would be to smuggle hooks into a runtime wrapper in aot_autograd, where all the items the hooks close over are lifted to inputs as necessary and passed alongside the user provided function. This is necessary so that we can properly trace out and capture all the mutations within the user defined hook at backwards time. This is too complicated - so, we limited the scope of this initial PR to a simple subset of hooks: - Hooks must have a source (be known to us already, not a lambda or intermediary defined function) - We must be tracing under compiled autograd The flow: We use the HOP added in https://github.com/pytorch/pytorch/pull/109690/files, referred to as the HOP below. 1) We intercept register_hook calls and wrap the user defined fn in the HOP 2) We write a `_register_hook_trampoline` to the graph that is a local no-arg function that is invoked as a call_function in the dynamo graph 3) aot_autograd inlines through it during its trace, and sees the HOP 4) the HOP preserves itself in the graph - it does not get traced into 5) During backwards, compiled_autograd installs the HOP under a hook call 6) When compiled_autograd enters compilation over its generated graph, dynamo traces the contents of the hook Pull Request resolved: https://github.com/pytorch/pytorch/pull/109537 Approved by: https://github.com/ezyang	2023-10-11 01:35:37 +00:00
Jon Chuang	79212430df	feat(inductor): fx graph debug should display device (#110346 ) Device mismatch issues are root cause of: https://github.com/pytorch/pytorch/issues/107006, hence make device-related scheduling issues easier to diagnose. Also format single-kwarg graphs to be more concise Example rendering: ![image](https://github.com/pytorch/pytorch/assets/9093549/1b59a994-f2df-45c9-8cb7-37eb3ba12654) CC code owners: @ngimel @jansel @shunting314 @mlazos @peterbell10 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110346 Approved by: https://github.com/eellison	2023-10-11 00:34:55 +00:00
Edward Z. Yang	24bf9aeb6b	Fix arange with dynamic end argument. (#110979 ) Fixes https://github.com/pytorch/pytorch/issues/93468 There's a few extra tests that are sort of unrelated, but I ended up writing them while working on the fix and decided to keep them. The big idea here is to split the `_check` so that `expect_true` works; I could have probably also improved the symbolic reasoning but I'm lazy. One small logging fix too. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/110979 Approved by: https://github.com/Skylion007	2023-10-11 00:32:34 +00:00
leslie-fang-intel	a11d4a8378	[Reland] [Inductor] Break the loop fusion when node2 depends on node1 mutations (#110677 ) Reland PR https://github.com/pytorch/pytorch/pull/109172 which has been reverted in https://github.com/pytorch/pytorch/pull/110622 Differential Revision: [D50097373](https://our.internmc.facebook.com/intern/diff/D50097373) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110677 Approved by: https://github.com/jgong5, https://github.com/ezyang	2023-10-11 00:26:45 +00:00
PyTorch MergeBot	314a502eb0	Revert "Reland "[C10] PG observability hooks. (#108815 )" (#110907 )" This reverts commit `7678cd22af`. Reverted https://github.com/pytorch/pytorch/pull/110907 on behalf of https://github.com/huydhn due to Sorry for reverting this, but macos job in trunk starts failing after this `7678cd22af` ([comment](https://github.com/pytorch/pytorch/pull/110907#issuecomment-1756497387))	2023-10-11 00:23:42 +00:00
Huy Do	2edc75a669	Add a workflow to release Android binaries (#110976 ) This adds 2 jobs to build PyTorch Android with and without lite interpreter: * Keep the list of currently supported ABI armeabi-v7a, arm64-v8a, x86, x86_64 * Pass all the test on emulator * Run an the test app on emulator and my Android phone `arm64-v8a` without any issue ![Screenshot_20231010-114453](https://github.com/pytorch/pytorch/assets/475357/57e12188-1675-44d2-a259-9f9577578590) * Run on AWS https://us-west-2.console.aws.amazon.com/devicefarm/home#/mobile/projects/b531574a-fb82-40ae-b687-8f0b81341ae0/runs/5fce6818-628a-4099-9aab-23e91a212076 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110976 Approved by: https://github.com/atalman	2023-10-11 00:19:33 +00:00
Jon Chuang	5aa96fd336	[dynamo] list index: add more list types to testing, support namedtuple, improve error handling (#110919 ) Follow up: #110817 Minor improvements as discussed in prev PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/110919 Approved by: https://github.com/ezyang	2023-10-11 00:16:39 +00:00
SS-JIA	9606cda64e	[core ATen IR] Add decompositions for max, min, var_mean (#110906 ) ## Context Add decompositions for `aten.max`, `aten.min`, and `aten.var_mean`. These operators follow a pattern of returning a tuple of outputs from two component operators: ``` aten.max(x) -> return aten.amax(x), aten.argmax(x) aten.min(x) -> return aten.amin(x), aten.argmin(x) aten.var_mean(x) -> return aten.var(x), aten.mean(x) ``` For `var_mean`, the `refs` implementation was doing something similar, so I changed it to call `torch.` ops instead like was done for other `refs` implementations previously. cc: @peterbell10 @lezcano Note that Inductor lowers all these directly, so they are excluded from the Inductor decomp table. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110906 Approved by: https://github.com/manuelcandales	2023-10-11 00:06:24 +00:00
PyTorch MergeBot	3100d3e661	Revert "[inductor] Implement Fx graph caching to improve warm compilation time. (#103453 )" This reverts commit `8a8668e1ae`. Reverted https://github.com/pytorch/pytorch/pull/103453 on behalf of https://github.com/kit1980 due to The newly added test fails on internal builds ([comment](https://github.com/pytorch/pytorch/pull/103453#issuecomment-1756449919))	2023-10-10 23:21:59 +00:00
cyy	f98d6ad8b3	[1/N] Apply clang-tidy to aten/src/ATen/core/ (#110861 ) It is time to cliang-tidy aten. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110861 Approved by: https://github.com/Skylion007	2023-10-10 23:20:58 +00:00
Will Constable	ca03f36233	Change ProcessGroupNCCL default timeout to 10 min (#110947 ) Avoid changing default for other backends as CPU backend (GLOO) may need longer timeouts. Motivated by trying to save cluster time when encountering collective hangs. Generally collectives should time out within seconds and 30 minutes (or 10 minutes) should provide ample headroom for edge cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110947 Approved by: https://github.com/xw285cornell, https://github.com/fduwjj	2023-10-10 22:28:39 +00:00
Tugsbayasgalan Manlaibaatar	cd275dc24f	Remove RangeConstraints in favor of ValueRanges (#109859 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109859 Approved by: https://github.com/avikchaudhuri	2023-10-10 22:22:05 +00:00
Jerry Zhang	7a69e3d30b	[fx][subgraph_matcher] Add a matcher that supports name to node map (#110743 ) Summary: We want the matcher to return a name -> node in target graph so that we can refer to the node by name, this is useful for downstream applications like quantization. and also we can use the torch API as source of truth instead of matching aten API directly. Test Plan: python test/fx/test_matcher_utils.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/110743 Approved by: https://github.com/SherlockNoMad	2023-10-10 22:21:24 +00:00
Ramil Nugmanov	91eeb77260	StackDataset batched sampling (#110694 ) Optimization of loading minibatches Pull Request resolved: https://github.com/pytorch/pytorch/pull/110694 Approved by: https://github.com/ejguan	2023-10-10 22:05:51 +00:00
Joel Schlosser	ac01304e22	pin_memory support for NT (#110404 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110404 Approved by: https://github.com/cpuhrsch, https://github.com/albanD ghstack dependencies: #110292	2023-10-10 21:58:19 +00:00
Joel Schlosser	43ea782af3	Multiprocessing support for NT (#110292 ) Fixes #110161 Allows NTs to be used in DataLoaders with `num_workers > 1`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292 Approved by: https://github.com/cpuhrsch, https://github.com/albanD	2023-10-10 21:58:19 +00:00
Aaron Bockover	7f2d25c547	[ONNX] bump onnx submodule to rel-1.15.0 (#110663 ) - onnx==1.15.0rc1 - onnxscript==0.1.0.dev20231006 - ort-nightly==1.17.0.dev20231005001 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110663 Approved by: https://github.com/ezyang, https://github.com/thiagocrepaldi	2023-10-10 21:44:09 +00:00
rzou	3a29cdc5e6	[optests] Add dontGenerateOpCheckTests and is_inside_opcheck_mode (#110951 ) This PR adds the following helper functions for generated opcheck tests: - dontGenerateOpCheckTests is a decorator that skips generation of the opcheck tests for the generated function - is_inside_opcheck_mode lets us query if we are in a generated test. Useful for fast debugging out-of-tree without needing to update PyTorch. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/110951 Approved by: https://github.com/williamwen42	2023-10-10 21:43:43 +00:00
wz337	d9eb5a57aa	[FSDP] Change _create_chunk_dtensor in fsdp/_shard_utils.py to use public API from DTensor (#110831 ) This PR: 1) updates _create_chunk_dtensor() in _shard_utils.py to use public APIs from DTensor. This will avoid the global_size calculation error from using DTensor.from_local() for uneven-sharded parameters, as described in https://github.com/pytorch/pytorch/issues/110762 2) updates test/distributed/fsdp/test_fsdp_dtensor_state_dict.py to include unit test for a model with uneven sharding. cc. @wanchaol, @fegin Pull Request resolved: https://github.com/pytorch/pytorch/pull/110831 Approved by: https://github.com/wanchaol, https://github.com/fegin	2023-10-10 21:04:27 +00:00
Jon Chuang	6e770c0dda	[dynamo] Add `itertools.repeat` via polyfill (#110953 ) Fixes https://github.com/pytorch/pytorch/issues/110286 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110953 Approved by: https://github.com/ezyang	2023-10-10 20:40:33 +00:00
PyTorch MergeBot	02a02a23ee	Revert "Move at::{Refcounted,}MapAllocator to c10 (#109881 )" This reverts commit `0341deb1c7`. Reverted https://github.com/pytorch/pytorch/pull/109881 on behalf of https://github.com/albanD due to It does break buck build ([comment](https://github.com/pytorch/pytorch/pull/109881#issuecomment-1756195823))	2023-10-10 20:39:12 +00:00
Khushi Agrawal	495f77be7a	[cpu] explicitly vectorize digamma (#110217 ) ### Benchmarking results ```python [-------------- torch.digamma(x) Benchmark -------------] \| implicitly vectorized \| explicitly vectorized 1 threads: ----------------------------------------------------------------------- dtype torch.float16 - n : 100 \| 3.8 \| 3.5 dtype torch.float16 - n : 200 \| 5.8 \| 5.3 dtype torch.float16 - n : 500 \| 11.8 \| 10.7 dtype torch.float16 - n : 1000 \| 22.0 \| 19.6 dtype torch.float16 - n : 10000 \| 203.6 \| 179.7 dtype torch.float32 - n : 100 \| 3.8 \| 3.6 dtype torch.float32 - n : 200 \| 5.7 \| 5.5 dtype torch.float32 - n : 500 \| 11.1 \| 11.1 dtype torch.float32 - n : 1000 \| 20.6 \| 20.5 dtype torch.float32 - n : 10000 \| 191.7 \| 189.6 dtype torch.float64 - n : 100 \| 3.8 \| 3.7 dtype torch.float64 - n : 200 \| 5.9 \| 5.7 dtype torch.float64 - n : 500 \| 11.9 \| 11.7 dtype torch.float64 - n : 1000 \| 22.1 \| 21.7 dtype torch.float64 - n : 10000 \| 203.6 \| 199.7 dtype torch.bfloat16 - n : 100 \| 3.7 \| 3.5 dtype torch.bfloat16 - n : 200 \| 5.6 \| 5.3 dtype torch.bfloat16 - n : 500 \| 11.2 \| 10.6 dtype torch.bfloat16 - n : 1000 \| 20.8 \| 19.5 dtype torch.bfloat16 - n : 10000 \| 190.0 \| 179.7 Times are in microseconds (us).` ``` ### Benchmarking config Machine: Intel(R) Core(TM) i7-10870H CPU @ 2.20GHz <p> ```python >>> import torch >>> print(f"Torch config: {torch.__config__.show()}") Torch config: PyTorch built with: - GCC 9.3 - C++ Version: 201703 - Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications - OpenMP 201511 (a.k.a. OpenMP 4.5) - LAPACK is enabled (usually provided by MKL) - CPU capability usage: AVX2 - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=/usr/local/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.2.0, USE_CUDA=0, USE_CUDNN=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=0, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=OFF, ``` </p> Script - ``` import torch import pickle from torch.utils import benchmark from itertools import product device = 'cpu' dtypes = (torch.float16, torch.float32, torch.float64, torch.bfloat16) n = (100, 200, 500, 1000, 10000) result = [] for dtype, num in product(dtypes, n): x = torch.rand(num, dtype=dtype, device='cpu') torch.digamma(x) stmt = 'torch.digamma(x)' measurement = benchmark.Timer( stmt=stmt, globals={'x': x}, label=stmt + " Benchmark", sub_label=f"dtype {dtype} - n : {num}", description="vectorized", ).blocked_autorange(min_run_time=10) result.append(measurement) fname_prefix = "benchmark_digamma_" benchmark.Compare(result).print() with open(fname_prefix+"vectorized", "wb") as f: pickle.dump(result, f) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/110217 Approved by: https://github.com/sanchitintel, https://github.com/vfdev-5, https://github.com/ezyang	2023-10-10 20:31:25 +00:00
Will Constable	7678cd22af	Reland "[C10] PG observability hooks. (#108815 )" (#110907 ) This reverts commit `ff0358b038`. (original PR https://github.com/pytorch/pytorch/pull/108815 desc copied below) Expose a set of observability hooks into C10D such that our users can detect collectives failure both faster and more easily. The design is similar to NCCL desync debug that it minimized the overhead by doing most of the work out of the main thread. This PR introduces a new module torch.distributed.hooks that exposes the following set of methods: register_collective_start_hook register_collective_end_hook register_process_group_hook The process group hook exposes PG creation on the member ranks and call them inline from the the PG creation code. This is fine since this happens during initialization and a limited number of times. The collective start/end hooks are fired from a single background thread. It reads events from a C++ queue and dispatches over. Queue notification is oddly done using a pipe, this is needed so python can abort the thread on shutdown and have it as background thread. This is not possible with more reasonable choices like a condvar. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110907 Approved by: https://github.com/fduwjj	2023-10-10 20:09:40 +00:00
Jon Chuang	84ad3ed7b2	[dynamo] add config for displaying all guard failures (#110927 ) Fixes https://github.com/pytorch/pytorch/issues/110879 Example output: ``` ('Recompiling function fn in /home/jonch/Desktop/Programming/mlsys/pytorch/test/dynamo/test_misc.py:4578', 'triggered by the following guard failures: ["___check_type_id(L[\'obj\'], 94834370481168)", "L[\'obj\'].x == -0.5"]') ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/110927 Approved by: https://github.com/lezcano	2023-10-10 19:57:44 +00:00
DanilBaibak	8cf1a02e80	Rever [Profiler] Improve the docstring for export_memory_timeline (#110978 ) Rever [Profiler] Improve the docstring for export_memory_timeline Pull Request resolved: https://github.com/pytorch/pytorch/pull/110978 Approved by: https://github.com/huydhn, https://github.com/aaronenyeshi	2023-10-10 19:57:25 +00:00
soulitzer	bc49b1e50b	[reland] Use is_symbolic instead of testing isinstance in some place (#110676 ) reland of https://github.com/pytorch/pytorch/pull/110372 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110676 Approved by: https://github.com/ezyang ghstack dependencies: #110673, #110674, #110675	2023-10-10 19:37:17 +00:00
soulitzer	df9a6bcaef	[reland] Symintify guards.cpp (#110675 ) reland of https://github.com/pytorch/pytorch/pull/110371 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110675 Approved by: https://github.com/ezyang ghstack dependencies: #110673, #110674	2023-10-10 19:37:17 +00:00

1 2 3 4 5 ...

64995 Commits