pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
PyTorch MergeBot	718bb9016f	Revert "[Memory Snapshot] Add recordAnnotations to capture record_function annotations (#124179 )" This reverts commit `187aeaeabf`. Reverted https://github.com/pytorch/pytorch/pull/124179 on behalf of https://github.com/clee2000 due to test_tensorexpr.py::TestTensorExprFuser::test_simple_add is causing a segfault https://github.com/pytorch/pytorch/actions/runs/9097383783/job/25007155440 `187aeaeabf`, test was skipped due to bad TD ([comment](https://github.com/pytorch/pytorch/pull/124179#issuecomment-2112948246))	2024-05-15 16:11:47 +00:00
Aaron Enye Shi	187aeaeabf	[Memory Snapshot] Add recordAnnotations to capture record_function annotations (#124179 ) Summary: Add new traceEvents into Memory Snapshot for record_function annotations. These will capture both the profiler's step annotation as well as user annotations. Test Plan: CI New Snapshot Generated: devvm2184.cco0.facebook.com.Apr_19_13_27_14.3072800.snapshot.pickle Snippet of Snapshot device_traces show `ProfilerStep#0`, and `## forward ##` annotations: ``` [[{'action': 'user_defined', 'addr': 0, 'size': 0, 'stream': 0, 'time_us': 1713558427168556, 'frames': [{'name': 'START', 'filename': 'ProfilerStep#0', 'line': 0}]}, {'action': 'user_defined', 'addr': 0, 'size': 0, 'stream': 0, 'time_us': 1713558427168738, 'frames': [{'name': 'END', 'filename': 'ProfilerStep#0', 'line': 0}]}, {'action': 'user_defined', 'addr': 0, 'size': 0, 'stream': 0, 'time_us': 1713558427168865, 'frames': [{'name': 'START', 'filename': 'ProfilerStep#1', 'line': 0}]}, {'action': 'user_defined', 'addr': 0, 'size': 0, 'stream': 0, 'time_us': 1713558427168920, 'frames': [{'name': 'START', 'filename': '## forward ##', 'line': 0}]}, {'action': 'alloc', 'addr': 140166073581568, 'size': 3211264, 'stream': 0, 'time_us': 1713558427172978, 'frames': [{'name': '_conv_forward', 'filename': '/mnt/xarfuse/uid-416185/235d4caf-seed-nspid4026531836_cgpid32884718-ns-4026531840/torch/nn/modules/conv ``` Differential Revision: D55941362 Pulled By: aaronenyeshi Pull Request resolved: https://github.com/pytorch/pytorch/pull/124179 Approved by: https://github.com/zdevito	2024-05-15 14:19:40 +00:00
Richard Barnes	ed327876f5	[codemod] `c10:optional` -> `std::optional` (#126135 ) Generated by running the following from PyTorch root: ``` find . -regex ".*\.$cpp\\|h\\|cu\\|hpp\\|cc\\|cxx$$" \| grep -v "build/" \| xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/' ``` `c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135 Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi	2024-05-14 19:35:51 +00:00
Richard Barnes	b55f57b7af	[codemod][lowrisk] Remove extra semi colon from caffe2/c10/core/SymNodeImpl.h (#123055 ) Summary: `-Wextra-semi` or `-Wextra-semi-stmt` If the code compiles, this is safe to land. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123055 Approved by: https://github.com/Skylion007	2024-05-14 19:35:29 +00:00
David Berard	82edc8b5d5	[NT] Make NestedTensor register as having symbolic sizes/strides (#124687 ) Fixes #123698 This PR makes TensorImpl::has_symbolic_sizes_strides return false for NestedTensors. 1. It passes in the actual sizes when we call `_make_wrapper_subclass` - this is the change that makes the subclass register as `has_symbolic_sizes_strides() == True` 2. It adds a field to `_make_wrapper_subclass` where an explicit `numel` can be provided. This allows us to skip the numel computation for the storage, which previously fails due to arithmetic on NestedInts. 3. Implements `aten::numel` for NJT - this is separate from the overridden numel in `make_wrapper_subclass` for now. Note also that this means that we leave `dispatch_sizes_strides_policy="sizes"`, so that we call into the custom `numel` implementation (as well as `sizes` and `strides`), because `numel` cannot currently be computed from `sizes` for NJT. Note also that this depends on #121361, because calling TensorImpl::set_sizes_and_strides() tries to clone the sizes into the tensor, which means that we need `clone` to be implemented on NestedInt. Differential Revision: [D57225736](https://our.internmc.facebook.com/intern/diff/D57225736) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124687 Approved by: https://github.com/albanD	2024-05-13 16:50:25 +00:00
Yu, Guangye	31372fa842	Support generic stream/event on CUDA/HIP backend (#125757 ) # Motivation According to [#123611](https://github.com/pytorch/pytorch/pull/123611), we support generic stream/event on CUDA backend. # Additional Context new method/attribute on `torch.Event` for cuda - torch.Event.event_id - torch.Event.elapsed_time - torch.Event.synchronize new method on `c10::Event` on cuda backend - c10.Event.event_id - c10.Event.elapsed_time - c10.Event.synchronize Pull Request resolved: https://github.com/pytorch/pytorch/pull/125757 Approved by: https://github.com/albanD, https://github.com/jgong5, https://github.com/EikanWang	2024-05-10 13:34:09 +00:00
Nikita Shulga	021ff7fd77	[BE] Explicitly handle all types `c10::isSigned` (#125637 ) By defining `CASE_ISSIGNED` macros that just returns `std::numeric_limits<dtype>::is_signed` for the types where it makes sense and explicitly code some types when it does not Remove `default:` case from the switch to avoid regressions like the one reported in https://github.com/pytorch/pytorch/issues/125124 , as [`-Wswitch-enum`](https://clang.llvm.org/docs/DiagnosticsReference.html#wswitch-enum) in combination with `-Werror` will raise an error in case of a missing entry, for example: ``` /Users/nshulga/git/pytorch/pytorch/c10/core/ScalarType.h:518:11: warning: enumeration value 'QInt32' not handled in switch [-Wswitch] switch (t) { ^ /Users/nshulga/git/pytorch/pytorch/c10/core/ScalarType.h:518:11: note: add missing switch cases switch (t) { ^ 1 warning generated. ``` Fixes https://github.com/pytorch/pytorch/issues/125124 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125637 Approved by: https://github.com/albanD	2024-05-07 22:51:03 +00:00
Nikita Shulga	65fc3c31bc	[BE] Delete unused `AT_FORALL_SCALAR_TYPES_AND[456]` (#125607 ) Check that they are not used by running the following ``` % grep -h "AT_FORALL_SCALAR_TYPES_AND" . -R\|grep -v #define\|cut -d\( -f1\|sort\|uniq AT_FORALL_SCALAR_TYPES_AND3 AT_FORALL_SCALAR_TYPES_AND3 AT_FORALL_SCALAR_TYPES_AND AT_FORALL_SCALAR_TYPES_AND2 AT_FORALL_SCALAR_TYPES_AND3 AT_FORALL_SCALAR_TYPES_AND7 AT_FORALL_SCALAR_TYPES_AND2 AT_FORALL_SCALAR_TYPES_AND3 AT_FORALL_SCALAR_TYPES_AND7 AT_FORALL_SCALAR_TYPES_AND2 AT_FORALL_SCALAR_TYPES_AND3 AT_FORALL_SCALAR_TYPES_AND7 // AT_FORALL_SCALAR_TYPES / AT_FORALL_SCALAR_TYPES_AND macros below, which are AT_FORALL_SCALAR_TYPES_AND AT_FORALL_SCALAR_TYPES_AND2 AT_FORALL_SCALAR_TYPES_AND3 AT_FORALL_SCALAR_TYPES_AND7 using at::Half; // for AT_FORALL_SCALAR_TYPES_AND3 ``` or by checking online using https://github.com/search?type=code&q=AT_FORALL_SCALAR_TYPES_AND4+repo%3Apytorch%2Fpytorch Pull Request resolved: https://github.com/pytorch/pytorch/pull/125607 Approved by: https://github.com/albanD	2024-05-07 00:01:35 +00:00
Aaron Orenstein	609c958281	Fix mypy issues in fake_tensor.py (#124428 ) fake_tensor.py had mypy error ignored. That seems less than desirable. Also added SafePyObjectT<T> which is a tagged wrapper around a SafePyObject but provides static type checking (with no other guarantees). Used `SafePyObjectT<TorchDispatchModeKey>` on some of the TorchDispatchModeTLS API to ensure that we don't accidentally inject a different type than expected into the stack. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124428 Approved by: https://github.com/malfet	2024-04-26 15:35:53 +00:00
PyTorch MergeBot	f131c2c199	Revert "Fix mypy issues in fake_tensor.py (#124428 )" This reverts commit `25c0d3f3f0`. Reverted https://github.com/pytorch/pytorch/pull/124428 on behalf of https://github.com/jeanschmidt due to Unfortunately, I needed to revert #123735 and this one depends on it. So please check if there are no merge conflicts or breakages and feel free to merge this PR again ([comment](https://github.com/pytorch/pytorch/pull/124428#issuecomment-2078699836))	2024-04-26 06:15:17 +00:00
Aaron Orenstein	25c0d3f3f0	Fix mypy issues in fake_tensor.py (#124428 ) fake_tensor.py had mypy error ignored. That seems less than desirable. Also added SafePyObjectT<T> which is a tagged wrapper around a SafePyObject but provides static type checking (with no other guarantees). Used `SafePyObjectT<TorchDispatchModeKey>` on some of the TorchDispatchModeTLS API to ensure that we don't accidentally inject a different type than expected into the stack. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124428 Approved by: https://github.com/malfet	2024-04-25 14:07:53 +00:00
egienvalue	408aa0182c	Build device generic torch.Stream and torch.Event based on c10::Stream/Event (#123611 ) This diff intends to build device generic torch.Stream and torch.Event for newly added accelerators in PyTorch. ------------ torch.Stream APIs ``` # Defined in torch/csrc/Stream.cpp class Stream(_StreamBase): stream_id: _int # Stream id device_index: _int device_type: _int device: _device # The device of the stream @overload def __new__(self, device: Optional[DeviceLikeType] = None, priority: _int = 0) -> Stream: ... @overload def __new__(self, stream_id: _int, device_index: _int, device_type: _int, priority: _int = 0) -> Stream: ... def wait_event(self, event: Event) -> None: ... def wait_stream(self, other: Stream) -> None: ... def record_event(self, event: Optional[Event] = None) -> Event: ... def query(self) -> None: ... def synchronize(self) -> None: ... def __hash__(self) -> _int: ... def __repr__(self) -> str: ... def __eq__(self, other: object) -> _bool: ... ``` ------------------ torch.Event APIs: - IPC related APIs are not implemented, since many device backends don't support it, but we leave interfaces there for future adaption of torch.cuda.Stream. - currently only the enable_timing is supported, since it is the most common one used in other device backends. We have to refactor the event flag system in PyTorch to support more fancy flag. - elapsedTime API is added to c10::Event ``` # Defined in torch/csrc/Event.cpp class Event(_EventBase): device: _device # The device of the Event event_id: _int # The raw event created by device backend def __new__(self, device: Optional[DeviceLikeType] = None, enable_timing: _bool = False, blocking: _bool = False, interprocess: _bool = False) -> Event: ... @classmethod def from_ipc_handle(self, device: DeviceLikeType, ipc_handle: bytes) -> Event: ... def record(self, stream: Optional[Stream] = None) -> None: ... def wait(self, stream: Optional[Stream] = None) -> None: ... def query(self) -> _bool: ... def elapsed_time(self, other: Event) -> _float: ... def synchronize(self) -> None: ... def ipc_handle(self) -> bytes: ... def __repr__(self) -> str: ... ``` ----------- c10::Event provides new APIs - calculate elapsedTime. - Get raw event id - Synchronize event. ``` double elapsedTime(const Event& event) const { return impl_.elapsedTime(event.impl_); } void* eventId() const { return impl_.eventId(); } void synchronize() const { return impl_.synchronize(); } ``` ---------- TODO: need to find a good way to test them in PyTorch with API mocks. Differential Revision: [D56443357](https://our.internmc.facebook.com/intern/diff/D56443357) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123611 Approved by: https://github.com/albanD, https://github.com/jeffdaily	2024-04-24 20:51:17 +00:00
Ashwin Hari	5f5778476a	rename ort to maia (#123265 ) Fixes #123264 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123265 Approved by: https://github.com/albanD	2024-04-23 00:33:25 +00:00
PyTorch MergeBot	277ab8a4c0	Revert "[Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449 )" This reverts commit `a56e057814`. Reverted https://github.com/pytorch/pytorch/pull/119449 on behalf of https://github.com/jeanschmidt due to Broken internal signals, @albanD please help get this sorted :) ([comment](https://github.com/pytorch/pytorch/pull/119449#issuecomment-2069716129))	2024-04-22 14:44:44 +00:00
PyTorch MergeBot	0feab7d6c3	Revert "Build device generic torch.Stream and torch.Event based on c10::Stream/Event (#123611 )" This reverts commit `cb17721899`. Reverted https://github.com/pytorch/pytorch/pull/123611 on behalf of https://github.com/jeffdaily due to This broke ROCm. see test_overrides.py ([comment](https://github.com/pytorch/pytorch/pull/123611#issuecomment-2067363780))	2024-04-19 22:44:26 +00:00
cyy	a56e057814	[Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449 ) This PR is the beginning of attempts to wrap thread-unsafe getenv and set_env functions inside a RW mutex. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119449 Approved by: https://github.com/malfet, https://github.com/albanD	2024-04-19 13:39:41 +00:00
PyTorch MergeBot	61bc188f42	Revert "[Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449 )" This reverts commit `b51f66c195`. Reverted https://github.com/pytorch/pytorch/pull/119449 on behalf of https://github.com/malfet due to Broke gcc9 builds ([comment](https://github.com/pytorch/pytorch/pull/119449#issuecomment-2064936414))	2024-04-18 18:53:59 +00:00
egienvalue	cb17721899	Build device generic torch.Stream and torch.Event based on c10::Stream/Event (#123611 ) This diff intends to build device generic torch.Stream and torch.Event for newly added accelerators in PyTorch. ------------ torch.Stream APIs ``` # Defined in torch/csrc/Stream.cpp class Stream(_StreamBase): stream_id: _int # Stream id device_index: _int device_type: _int device: _device # The device of the stream @overload def __new__(self, device: Optional[DeviceLikeType] = None, priority: _int = 0) -> Stream: ... @overload def __new__(self, stream_id: _int, device_index: _int, device_type: _int, priority: _int = 0) -> Stream: ... def query(self) -> _bool: ... def synchronize(self) -> None: ... def wait_event(self, event: Event) -> None: ... def wait_stream(self, other: Stream) -> None: ... def record_event(self, event: Optional[Event] = None) -> Event: ... def query(self) -> None: ... def synchronize(self) -> None: ... def __hash__(self) -> _int: ... def __repr__(self) -> str: ... def __eq__(self, other: object) -> _bool: ... ``` ------------------ torch.Event APIs: - IPC related APIs are not implemented, since many device backends don't support it, but we leave interfaces there for future adaption of torch.cuda.Stream. - currently only the enable_timing is supported, since it is the most common one used in other device backends. We have to refactor the event flag system in PyTorch to support more fancy flag. - elapsedTime API is added to c10::Event ``` # Defined in torch/csrc/Event.cpp class Event(_EventBase): device: _device # The device of the Event event_id: _int # The raw event created by device backend def __new__(self, device: Optional[DeviceLikeType] = None, enable_timing: _bool = False, blocking: _bool = False, interprocess: _bool = False) -> Event: ... @classmethod def from_ipc_handle(self, device: DeviceLikeType, ipc_handle: bytes) -> Event: ... def record(self, stream: Optional[Stream] = None) -> None: ... def wait(self, stream: Optional[Stream] = None) -> None: ... def query(self) -> _bool: ... def elapsed_time(self, other: Event) -> _float: ... def synchronize(self) -> None: ... def ipc_handle(self) -> bytes: ... def __repr__(self) -> str: ... ``` ----------- c10::Event provides new APIs - calculate elapsedTime. - Get raw event id - Synchronize event. ``` double elapsedTime(const Event& event) const { return impl_.elapsedTime(event.impl_); } void* eventId() const { return impl_.eventId(); } void synchronize() const { return impl_.synchronize(); } ``` ---------- TODO: need to find a good way to test them in PyTorch with API mocks. Differential Revision: [D55351839](https://our.internmc.facebook.com/intern/diff/D55351839/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123611 Approved by: https://github.com/albanD	2024-04-18 17:35:09 +00:00
cyy	b51f66c195	[Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449 ) This PR is the beginning of attempts to wrap thread-unsafe getenv and set_env functions inside a RW mutex. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119449 Approved by: https://github.com/albanD	2024-04-18 13:35:48 +00:00
rzou	648c39c47d	Add OpOverload.redispatch; use it in new custom ops API (#124089 ) A kernel has "dispatcher convention" if there is an additional keyset arg at the beginning of the argument list. This PR: - adds a way to register kernels with dispatcher_convention using Library.impl (pass dispatcher_convention = True) - adds OpOverload.redispatch We use both of the above in the new custom ops API: we register the autograd kernel in dispatcher convention so that we can actually call redispatch like how pytorch built-in ops do it. Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/124089 Approved by: https://github.com/albanD ghstack dependencies: #123937, #124064, #124065, #124066, #124071	2024-04-18 12:48:04 +00:00
William Wen	38bfe7bcd1	add link to custom ops troubleshooting page on tensor data_ptr error (#124240 ) Fix part of https://github.com/pytorch/pytorch/issues/123603. Example traceback on branch https://github.com/pytorch/vision/compare/main...wwen/custom_ops_test: ``` running my_custom_op! Traceback (most recent call last): File "/data/users/williamwen/torchvision/playground.py", line 13, in <module> print(opt_fn1(torch.randn(3, 3))) File "/data/users/williamwen/pytorch2/torch/_dynamo/eval_frame.py", line 387, in _fn return fn(args, kwargs) File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 977, in catch_errors return callback(frame, cache_entry, hooks, frame_state, skip=1) File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 818, in _convert_frame result = inner_convert( File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 411, in _convert_frame_assert return _compile( File "/data/users/williamwen/pytorch2/torch/_utils_internal.py", line 70, in wrapper_function return function(args, *kwargs) File "/data/users/williamwen/py310-env/lib/python3.10/contextlib.py", line 79, in inner return func(args, *kwds) File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 700, in _compile guarded_code = compile_inner(code, one_graph, hooks, transform) File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 266, in time_wrapper r = func(args, *kwargs) File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 568, in compile_inner out_code = transform_code_object(code, transform) File "/data/users/williamwen/pytorch2/torch/_dynamo/bytecode_transformation.py", line 1116, in transform_code_object transformations(instructions, code_options) File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 173, in _fn return fn(args, kwargs) File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 515, in transform tracer.run() File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 2237, in run super().run() File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 875, in run while self.step(): File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 790, in step self.dispatch_table[inst.opcode](self, inst) File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 492, in wrapper return inner_fn(self, inst) File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 1260, in CALL_FUNCTION self.call_function(fn, args, {}) File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 730, in call_function self.push(fn.call_function(self, args, kwargs)) File "/data/users/williamwen/pytorch2/torch/_dynamo/variables/torch.py", line 747, in call_function tensor_variable = wrap_fx_proxy( File "/data/users/williamwen/pytorch2/torch/_dynamo/variables/builder.py", line 1425, in wrap_fx_proxy return wrap_fx_proxy_cls(target_cls=TensorVariable, kwargs) File "/data/users/williamwen/pytorch2/torch/_dynamo/variables/builder.py", line 1510, in wrap_fx_proxy_cls example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True) File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1804, in get_fake_value raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1736, in get_fake_value ret_val = wrap_fake_exception( File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1251, in wrap_fake_exception return fn() File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1737, in <lambda> lambda: run_node(tx.output, node, args, kwargs, nnmodule) File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1872, in run_node raise RuntimeError(make_error_message(e)).with_traceback( File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1854, in run_node return node.target(args, kwargs) File "/data/users/williamwen/pytorch2/torch/_ops.py", line 870, in __call__ return self_._op(args, *(kwargs or {})) torch._dynamo.exc.TorchRuntimeError: Failed running call_function torchvision.my_custom_op1((FakeTensor(..., size=(3, 3)),), **{}): The tensor has a non-zero number of elements, but its data is not allocated yet. Caffe2 uses a lazy allocation, so you will need to call mutable_data() or raw_mutable_data() to actually allocate memory. If you're using torch.compile/export/fx, it is likely that we are erroneously tracing into a custom kernel. To fix this, please wrap the custom kernel into an opaque custom op. Please see the following for details: https://docs.google.com/document/d/1W--T6wz8IY8fOI0Vm8BF44PdBgs283QvpelJZWieQWQ from user code: File "/data/users/williamwen/torchvision/playground.py", line 5, in fn1 return torch.ops.torchvision.my_custom_op1(x) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/124240 Approved by: https://github.com/zou3519	2024-04-18 00:08:09 +00:00
PyTorch MergeBot	f5049de242	Revert "[Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449 )" This reverts commit `5bef127c2e`. Reverted https://github.com/pytorch/pytorch/pull/119449 on behalf of https://github.com/PaliC due to your using TORCH_INTERNAL_ASSERT incorrectly ([comment](https://github.com/pytorch/pytorch/pull/119449#issuecomment-2062696010))	2024-04-17 23:44:00 +00:00
cyy	5bef127c2e	[Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449 ) This PR is the beginning of attempts to wrap thread-unsafe getenv and set_env functions inside a RW mutex. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119449 Approved by: https://github.com/albanD	2024-04-16 04:39:20 +00:00
David Berard	6b24ec480c	[Tensor] Detect more cases of symbolic sizes/strides (#123696 ) Previously, we'd just check `has_symbolic_sizes_strides()` to know whether a tensor has symbolic sizes or strides; if does, we skip some profiler logic. But sometimes `has_symbolic_sizes_strides()` returns false, but we do actually have symbolic sizes or strides. So in this change, we add `may_have_symbolic_sizes_strides()` - which should never return false if the tensor has symbolic sizes and strides Why not change `has_symbolic_sizes_strides()`? It seems like there's preexisting logic that assumes that "if has_symbolic_sizes_strides(), then we can assume that this tensor is guaranteed to have symbolic sizes or strides". In this case, we have python-implemented sizes or strides, which should follow a different code path. Differential Revision: [D55947660](https://our.internmc.facebook.com/intern/diff/D55947660/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123696 Approved by: https://github.com/aaronenyeshi, https://github.com/soulitzer	2024-04-11 16:51:52 +00:00
Edward Z. Yang	8aad72b0d3	Support all unsigned int sizes on unique (#123643 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/123643 Approved by: https://github.com/albanD, https://github.com/kit1980	2024-04-11 06:50:12 +00:00
Aaron Orenstein	2bcc83dfbd	Preserve dispatch state across function tracing (#122073 ) If we throw an exception in the "wrong" place we can end up with the dispatch state being in a weird state which can cause all future dispatching to fail. Preserve and restore it as part of `preserve_global_state` so we know it's sane after that. Also fake_tensor's in_kernel_invocation_manager() was leaving a bit set in the dispatcher (DispatchKey.Dense) which affected follow-on code. Fixed that to reset after as well. Repro: before: ``` $ rm test/dynamo_skips/TestSparseCPU.test_to_dense_with_gradcheck_sparse_cpu_complex64 $ PYTORCH_TEST_WITH_DYNAMO=1 pytest -s test/dynamo/test_export.py test/test_sparse.py -k 'test_to_dense_with_gradcheck_sparse_cpu_complex64' ======== 1 passed, 6173 deselected in 5.21s ============= $ PYTORCH_TEST_WITH_DYNAMO=1 pytest -s test/dynamo/test_export.py test/test_sparse.py -k 'test_torch_inference_mode_ctx or test_to_dense_with_gradcheck_sparse_cpu_complex64' ========= 1 skipped, 6172 deselected, 1 error in 5.29s ========= ``` (note that test_to_dense_with_gradcheck_sparse_cpu_complex64 passes on its own but failed when including the skipped test_export.py tests) after: ``` $ rm test/dynamo_skips/TestSparseCPU.test_to_dense_with_gradcheck_sparse_cpu_complex64 $ PYTORCH_TEST_WITH_DYNAMO=1 pytest -s test/dynamo/test_export.py test/test_sparse.py -k 'test_to_dense_with_gradcheck_sparse_cpu_complex64' ===================== 1 passed, 6173 deselected in 5.42s ===================== $ PYTORCH_TEST_WITH_DYNAMO=1 pytest -s test/dynamo/test_export.py test/test_sparse.py -k 'test_torch_inference_mode_ctx or test_to_dense_with_gradcheck_sparse_cpu_complex64' ===================== 1 passed, 1 skipped, 6172 deselected in 7.30s ====================== ``` (note that test_to_dense_with_gradcheck_sparse_cpu_complex64 passes in both runs) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122073 Approved by: https://github.com/zou3519	2024-04-10 18:57:01 +00:00
rzou	d486cb7c1b	Deprecate calling FakeTensor.data_ptr in eager-mode (#123292 ) Today, we error out on FakeTensor.data_ptr under torch.compile. This PR moves to error out on FakeTensor.data_ptr under eager mode to avoid diverging behavior. We do this by adding another bit onto FakeTensor that we'll remove after the deprecation cycle. Test Plan: - tested locally Pull Request resolved: https://github.com/pytorch/pytorch/pull/123292 Approved by: https://github.com/eellison ghstack dependencies: #123261, #123282, #123291	2024-04-04 20:35:24 +00:00
Yu, Guangye	eb7adc3ae0	Refactor gpu trace to be device-agnostic (#121794 ) # Motivation Refactor gpu trace to be device-agnostic. gpu trace is usually used in runtime components, including Device, Stream, Event, Guard, and Allocator. It should be device-agnostic and can be shared among each device backend. # Solution move `_cuda_trace.py` to `_gpu_trace.py`, which makes each device backend owns their callback, respectively. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121794 Approved by: https://github.com/jgong5, https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui	2024-03-30 13:04:38 +00:00
Frank Lin	249e65b92d	Graph-Safe RNG State Exchange for Tensor Parallelism (#114068 ) See #113541 The PR allows for registering and controlling multiple RNG states using indices, ensuring cudagraph-safe operations, and includes both C++ and Python API changes to support this functionality. cc @eellison @anijain2305 @jansel @ezyang @ptrblck @csarofeen @mcarilli Pull Request resolved: https://github.com/pytorch/pytorch/pull/114068 Approved by: https://github.com/ezyang, https://github.com/eqy, https://github.com/xuzhao9	2024-03-27 01:14:38 +00:00
rzou	c81c9ba472	Disallow {FakeTensor,FunctionalTensor}.data_ptr (#122514 ) This PR: - disallows FakeTensor.data_ptr when it is called inside PT2 or fx tracing. - disallows FunctionalTensor.data_ptr (python FunctionalTensor is only used in PT2) The motivation behind this is that the leading cause of segfaults when using custom ops with PT2 is calling .data_ptr on FunctionalTensor or FakeTensor. This change is BC-breaking. If your code broke as a result of this, it's because there was a bug in it (these .data_ptr should never be accessed!). You can either fix the bug (recommended) or get the previous behavior back with: ``` from torch._subclasses.fake_tensor import FakeTensor from torch._subclasses.functional_tensor import FunctionalTensor data_ptr = 0 if isinstance(tensor, (FakeTensor, FunctionalTensor)) else tensor.data_ptr() ``` Test Plan: - existing tests Differential Revision: [D55366199](https://our.internmc.facebook.com/intern/diff/D55366199) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122514 Approved by: https://github.com/ezyang, https://github.com/albanD, https://github.com/yifuwang, https://github.com/kurtamohler	2024-03-26 23:55:42 +00:00
PyTorch MergeBot	4dc09d6aa4	Revert "Graph-Safe RNG State Exchange for Tensor Parallelism (#114068 )" This reverts commit `e9dcda5cba`. Reverted https://github.com/pytorch/pytorch/pull/114068 on behalf of https://github.com/ezyang due to memory leak in another ci ([comment](https://github.com/pytorch/pytorch/pull/114068#issuecomment-2018044527))	2024-03-25 13:49:04 +00:00
cyy	52e9049ffa	Remove unused variables (#122496 ) This PR removes several unused variables in the code base. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122496 Approved by: https://github.com/ezyang	2024-03-22 18:04:09 +00:00
PyTorch MergeBot	968c4c4154	Revert "Refactor gpu trace to be device-agnostic (#121794 )" This reverts commit `74deacbf31`. Reverted https://github.com/pytorch/pytorch/pull/121794 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it breaks ROCm jobs in trunk `74deacbf31`, please help take a look and reland the change ([comment](https://github.com/pytorch/pytorch/pull/121794#issuecomment-2013674083))	2024-03-21 20:33:17 +00:00
Frank Lin	e9dcda5cba	Graph-Safe RNG State Exchange for Tensor Parallelism (#114068 ) See #113541 The PR allows for registering and controlling multiple RNG states using indices, ensuring cudagraph-safe operations, and includes both C++ and Python API changes to support this functionality. cc @eellison @anijain2305 @jansel @ezyang @ptrblck @csarofeen @mcarilli Pull Request resolved: https://github.com/pytorch/pytorch/pull/114068 Approved by: https://github.com/ezyang	2024-03-21 01:57:08 +00:00
Yu, Guangye	74deacbf31	Refactor gpu trace to be device-agnostic (#121794 ) # Motivation Refactor gpu trace to be device-agnostic. gpu trace is usually used in runtime components, including Device, Stream, Event, Guard, and Allocator. It should be device-agnostic and can be shared among each device backend. # Solution move `_cuda_trace.py` to `_gpu_trace.py`, which makes each device backend owns their callback, respectively. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121794 Approved by: https://github.com/jgong5, https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui	2024-03-21 01:52:58 +00:00
feifan	d0153ca755	use make_storage_impl to create storages for COWStorage. (#121896 ) Thanks to https://github.com/pytorch/pytorch/pull/118459， `make_storage_impl` will use the func ,which register for other backends, to create StorageImpl. `make_storage_impl` completely overwrites the `make_intrusive<StorageImpl>`, so it makes sense to replace `make_intrusive<StorageImpl>` with `make_storage_impl` to create storage in cow. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121896 Approved by: https://github.com/ezyang	2024-03-19 23:40:15 +00:00
PyTorch MergeBot	f9ed1c432d	Revert "Refactor gpu trace to be device-agnostic (#121794 )" This reverts commit `0ff1109e26`. Reverted https://github.com/pytorch/pytorch/pull/121794 on behalf of https://github.com/jeanschmidt due to Reverting to see if rocm trunk errors are related ([comment](https://github.com/pytorch/pytorch/pull/121794#issuecomment-2007519408))	2024-03-19 15:40:26 +00:00
Yu, Guangye	0ff1109e26	Refactor gpu trace to be device-agnostic (#121794 ) # Motivation Refactor gpu trace to be device-agnostic. gpu trace is usually used in runtime components, including Device, Stream, Event, Guard, and Allocator. It should be device-agnostic and can be shared among each device backend. # Solution move `_cuda_trace.py` to `_gpu_trace.py`, which makes each device backend owns their callback, respectively. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121794 Approved by: https://github.com/jgong5, https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui	2024-03-19 06:02:28 +00:00
chentianyi16	0e68eb1505	Add privateuseone flags for c10::EventFlag (#121118 ) Fixes #117341 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121118 Approved by: https://github.com/albanD	2024-03-14 20:07:12 +00:00
Chen_Liqing	291ce86a6c	Modify StorageImplCreateHelper (#118459 ) I want to use tensor.untyped_storage()[a:b] for ``PrivateUse1`` backend but fail. The code will go into ``THPStorage_get``: `bb6eba189f/torch/csrc/Storage.cpp (L525-L540)` Here ``torch`` will create a new ``c10::StorageImpl`` but not consider about ``PrivateUse1`` backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118459 Approved by: https://github.com/albanD	2024-03-07 06:26:55 +00:00
cyy	507611f9ae	[CUDACachingAllocator] Turn Allocator::allocate into non-const (#120969 ) Ideally, the method should be non-const since it changes the allocator state. Some const_casts are also removed in the way. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120969 Approved by: https://github.com/albanD	2024-03-05 09:53:05 +00:00
Pearu Peterson	70d4d109f2	Make SparseCsr a functionality dispatch key (#120703 ) As in the title. To enable meta and fake tensor support for sparse compressed tensors in compliance with the meta/fake tensor support for sparse COO tensor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120703 Approved by: https://github.com/ezyang	2024-03-01 13:28:46 +00:00
Kurt Mohler	13a54ce279	Avoid COW materialization in `at::parallel_for/parallel_reduce` (#120455 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120455 Approved by: https://github.com/albanD	2024-03-01 05:05:28 +00:00
PyTorch MergeBot	86ff31c4a0	Revert "Avoid COW materialization in `at::parallel_for/parallel_reduce` (#120455 )" This reverts commit `cabc09a5f2`. Reverted https://github.com/pytorch/pytorch/pull/120455 on behalf of https://github.com/izaitsevfb due to breaks xla jobs ([comment](https://github.com/pytorch/pytorch/pull/120455#issuecomment-1970026100))	2024-02-28 22:30:18 +00:00
PyTorch MergeBot	a9d9077f12	Revert "Increased compile time max GPUs to 512. Switched to int16_t DeviceIndex. (#119639 )" This reverts commit `7c556428c7`. Reverted https://github.com/pytorch/pytorch/pull/119639 on behalf of https://github.com/kit1980 due to breaking internal builds, see D54286923 ([comment](https://github.com/pytorch/pytorch/pull/119639#issuecomment-1969634480))	2024-02-28 18:57:09 +00:00
Kurt Mohler	cabc09a5f2	Avoid COW materialization in `at::parallel_for/parallel_reduce` (#120455 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120455 Approved by: https://github.com/albanD	2024-02-28 00:37:33 +00:00
Tobias Ringwald	7c556428c7	Increased compile time max GPUs to 512. Switched to int16_t DeviceIndex. (#119639 ) Fixes #115331. This PR increases the number of valid GPU devices to 512 (from 64) in order to future-proof PyTorch for providers that offer [single nodes with a large device count](https://www.tensorwave.com/). Until now, `DeviceIndex` was an `int8_t`, thus multiple changes were necessary: - `DeviceIndex` changed to `int16_t`. Updated consumers that assume it to be an `int8_t`. - Updated bounds checking for `torch.device()` in the Python frontend. Right now, we allow funny things like `torch.device('cpu', 200).index == -56`, which is undefined behavior. I inserted some checks to only allow values between 0 and `c10::Device::MAX_NUM_DEVICES - 1`. - Updated the `ArgumentInfo` struct as it hardcodes the device index as 8 bit field [^1]. Might be a breaking change, not sure if users rely on this. - Introduced `c10::Device::MAX_NUM_DEVICES` as a replacement for the old `C10_COMPILE_TIME_MAX_GPUS` [^1]: This field was unsigned, so I guess this has also been undef behavior the whole time? Our default device index is -1, so this always wrapped around to 255 when written to the `ArgumentInfo` struct. When I switched the `DeviceIndex` to `int16_t`, it actually stayed 255 after unpacking from `ArgumentInfo` again, as the `DeviceIndex` was now wide enough that it didn't wrap back to -1. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119639 Approved by: https://github.com/cyyever, https://github.com/albanD, https://github.com/huydhn	2024-02-27 07:05:48 +00:00
PyTorch MergeBot	8a32a07856	Revert "Add meta device support to sparse compressed tensors (#120498 )" This reverts commit `5d71ba6885`. Reverted https://github.com/pytorch/pytorch/pull/120498 on behalf of https://github.com/zou3519 due to broke CI ([comment](https://github.com/pytorch/pytorch/pull/120498#issuecomment-1964491999))	2024-02-26 15:59:36 +00:00
Shan19900305	685d862c45	Add SparsePrivateUse1 in backend_to_string, layout_from_backend and check_base_legacy_new. (#119263 ) 1) Using items stored in torch._tensor_classes to check item passed from python side; 2) Add SparsePrivateUse1 in backend_to_string, layout_from_backend and check_base_legacy_new; 3) Using more general API to get python module name in get_storage_obj and get_name functions. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/119263 Approved by: https://github.com/ezyang	2024-02-26 01:54:30 +00:00
Pearu Peterson	5d71ba6885	Add meta device support to sparse compressed tensors (#120498 ) As in the title. Unblocks https://github.com/pytorch/pytorch/pull/117907#discussion_r1499251745 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120498 Approved by: https://github.com/ezyang	2024-02-25 16:50:17 +00:00
PyTorch MergeBot	fff9d98e58	Revert "Increased compile time max GPUs to 512. Switched to int16_t DeviceIndex. (#119639 )" This reverts commit `e0268821dd`. Reverted https://github.com/pytorch/pytorch/pull/119639 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think the Window failures are legit as they are failing now in trunk, i.e. `450339ab2d` ([comment](https://github.com/pytorch/pytorch/pull/119639#issuecomment-1958428416))	2024-02-22 00:12:54 +00:00
Tobias Ringwald	e0268821dd	Increased compile time max GPUs to 512. Switched to int16_t DeviceIndex. (#119639 ) Fixes #115331. This PR increases the number of valid GPU devices to 512 (from 64) in order to future-proof PyTorch for providers that offer [single nodes with a large device count](https://www.tensorwave.com/). Until now, `DeviceIndex` was an `int8_t`, thus multiple changes were necessary: - `DeviceIndex` changed to `int16_t`. Updated consumers that assume it to be an `int8_t`. - Updated bounds checking for `torch.device()` in the Python frontend. Right now, we allow funny things like `torch.device('cpu', 200).index == -56`, which is undefined behavior. I inserted some checks to only allow values between 0 and `c10::Device::MAX_NUM_DEVICES - 1`. - Updated the `ArgumentInfo` struct as it hardcodes the device index as 8 bit field [^1]. Might be a breaking change, not sure if users rely on this. - Introduced `c10::Device::MAX_NUM_DEVICES` as a replacement for the old `C10_COMPILE_TIME_MAX_GPUS` [^1]: This field was unsigned, so I guess this has also been undef behavior the whole time? Our default device index is -1, so this always wrapped around to 255 when written to the `ArgumentInfo` struct. When I switched the `DeviceIndex` to `int16_t`, it actually stayed 255 after unpacking from `ArgumentInfo` again, as the `DeviceIndex` was now wide enough that it didn't wrap back to -1. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119639 Approved by: https://github.com/cyyever, https://github.com/albanD	2024-02-21 21:10:49 +00:00
soulitzer	27c5bbe5cb	Add is_nested_int() (#119975 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119975 Approved by: https://github.com/jbschlosser ghstack dependencies: #119661, #119974	2024-02-21 21:10:02 +00:00
cyy	d5d13ab15e	Remove C10_FALLTHROUGH (#120157 ) Since [[fallthrough]] is supported in our C++17 compilers and no other repo is using it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120157 Approved by: https://github.com/Skylion007	2024-02-21 06:18:58 +00:00
soulitzer	312ce35c1f	Rename singleton int to nested int (#119661 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119661 Approved by: https://github.com/ezyang	2024-02-16 19:21:17 +00:00
cyy	87c6cd2f00	[1/N] Replace std::tie with structural binding (#119774 ) This PR replaces some std::tie calls with structural binding from C++17. This not only makes the code more compact, but also has some performance gain. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119774 Approved by: https://github.com/albanD, https://github.com/malfet	2024-02-14 09:25:04 +00:00
Edward Z. Yang	6665b96ebb	Rewrite maybe_reduce more carefully for unbacked SymInt (#119562 ) Fixes https://github.com/pytorch/pytorch/issues/119476 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/119562 Approved by: https://github.com/albanD ghstack dependencies: #119559	2024-02-13 21:40:06 +00:00
cyy	10f3abc6b8	[DeviceIndex][3/N] Use DeviceIndex in more places (#119635 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/119635 Approved by: https://github.com/ezyang	2024-02-12 21:31:27 +00:00
Edward Z. Yang	3f0fd36835	Introduce size oblivious guards (#118579 ) Fixes https://github.com/pytorch/pytorch/issues/117361 The implementation here slightly diverges from what was proposed in the issue, so I will recap what this PR is doing here. Today, when doing computations involving size-like unbacked SymInts, we assume for all operations that the compile time range of the integer is `[2, inf]`, even though at runtime we also accept zero and one. This PR removes the carte blanche assumption, and instead does the analysis in a much more limited and controlled fashion: only for guards which we have designated as "size oblivious" are we willing to do the analysis under the assumption that the range of all size-like unbacked SymInts is `[2, inf]`; otherwise, we will faithfully only do analysis with `[0, inf]` (or whatever the user provided) bounds. The infra pieces of this PR are: * Remove runtime_var_to_range from torch/fx/experimental/symbolic_shapes.py; modify `_constrain_range_for_size` to refine the range without clamping min to 2, and instead add the symbol to a `size_like` set in the ShapeEnv * When evaluating an expression, if the expression is requested to be evaluated in a `size_oblivious` way, we attempt to statically compute the value of the expression with the assumption that all symbols in `size_like` are updated to assume that they are `>= 2`. * Add Python and C++ APIs for guarding on a SymBool in a size-oblivious way. In C++, I also need to add some helpers for performing symbolic comparisons, since the stock comparisons immediately specialize in the "normal" way. The rest of the changes of the PR are marking various spots in PyTorch framework code as size oblivious, based on what our current test suite exercises. As you review the places where we have marked things as size oblivious, it may become clear why I ended up not opting for the "designate a branch as the default branch when it's not statically obvious which way to go": for some of the conditions, this answer is rather non-obvious. I think potentially there is another refinement on top of this PR, which is something like "I don't care if you can't figure it out with ValueRange analysis, go down this path anyway if there are unbacked sizes involved." But even if we add this API, I think we are obligated to attempt the ValueRange analysis first, since it can lead to better outcomes sometimes (e.g., we are able to figure out that something is contiguous no matter what the unbacked size is.) When is it permissible to mark something as size oblivious? Heuristically, it is OK anywhere in framework code if it gets you past a guard on unbacked SymInt problem. It is somewhat difficult to provide a true semantic answer, however. In particular, these annotations don't have any observational equivalence guarantee; for example, if I have `torch.empty(u0, 1).squeeze()`, we will always produce a `[u0]` size tensor, even though if `u0 == 1` PyTorch will actually produce a `[]` size tensor. The argument that I gave to Lezcano is that we are in fact defining an alternate semantics for a "special" size = 0, 1, for which we have these alternate eager mode semantics. In particular, suppose that we have a constant `special1` which semantically denotes 1, but triggers alternate handling rules. We would define `torch.empty(special1, 1).squeeze()` to always produce a `[special1]` size tensor, making its semantics coincide with unbacked SymInt semantics. In this model, the decision to designate guards as size oblivious is simply a user API question: you put them where ever you need some handling for special1! As we conservatively error out whenever it is not obvious what `special1` semantics should be, it is always valid to expand these semantics to cover more cases (although you can always choose the wrong semantics!) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/118579 Approved by: https://github.com/eellison, https://github.com/lezcano	2024-02-06 19:45:32 +00:00
cyy	4a019047ad	Enable nested namespace check in clang-tidy (#118506 ) It is time to enable nested namespaces in the code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118506 Approved by: https://github.com/albanD	2024-01-31 00:32:35 +00:00
dilililiwhy	b025e5984c	Get Device instance with correct type when privateuse1 backend is registered (#117966 ) Fixes #ISSUE_NUMBER If privateuse1 backend is registered. Let torch.device return corresponding instance of Device when only index is given. Pull Request resolved: https://github.com/pytorch/pytorch/pull/117966 Approved by: https://github.com/albanD, https://github.com/malfet	2024-01-24 19:03:18 +00:00
Nikita Shulga	90b3cf33ac	[C10] Make Scalar constructable from longs (#118149 ) On Linux and Mac `int64_t` is an alias to either `long` (Linux) or `long long` (Mac) Because of that, attempt to construct `c10::Scalar` from the other type will fail with `conversion from ‘long long int’ to ‘c10::Scalar’ is ambiguous`. I.e. attempt to compile: ```cpp int main() { c10::Scalar s = 1L; } ``` on MacOS failed with: ``` foo.cpp:3:15: error: conversion from 'long' to 'c10::Scalar' is ambiguous c10::Scalar s = 1L; ^ ~~ /Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:59:7: note: candidate constructor DEFINE_IMPLICIT_CTOR) ^ /Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:59:7: note: candidate constructor /Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:59:7: note: candidate constructor /Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:59:7: note: candidate constructor /Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:59:7: note: candidate constructor /Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:59:7: note: candidate constructor /Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:59:7: note: candidate constructor /Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:62:3: note: candidate constructor Scalar(uint16_t vv) : Scalar(vv, true) {} ^ /Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:63:3: note: candidate constructor Scalar(uint32_t vv) : Scalar(vv, true) {} ^ /Users/nshulga/git/pytorch/pytorch/torch/include/c10/core/Scalar.h:64:3: note: candidate constructor Scalar(uint64_t vv) { ^ ``` Prevent this by providing missing constructors when needed. Alas one can not use SFINAE, as template constructors on Scalar mess up a lot of implicit conversions, so I use `static_asserts` to detect early on if premise for constructing this class holds. Add ScalarTest::LongsAndLongLongs that is essentially a compile time test Discovered while trying to enable AOTI on MacOS Pull Request resolved: https://github.com/pytorch/pytorch/pull/118149 Approved by: https://github.com/ezyang, https://github.com/albanD ghstack dependencies: #118077, #118076	2024-01-24 17:32:29 +00:00
Aaron Orenstein	d280b6ae58	Ensure that deleter is called even for a no-data tensor. (#117418 ) Summary: When using a custom deleter InefficientStdFunctionContext was using a std::unique_ptr<> to store the pointer and call the deleter - but this failed to call the deleter if the pointer was null. Since we have a separate holder class anyway take out the std::unique_ptr<> and call the deleter directly. Fixes #117273 Test Plan: Pull Request resolved: https://github.com/pytorch/pytorch/pull/117418 Approved by: https://github.com/wjakob, https://github.com/yanboliang	2024-01-22 23:27:27 +00:00
Jeff Daily	01abb5af21	additional support for float8_e4m3fnuz and _e5m2fnuz (#115214 ) Follow up to #107586. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115214 Approved by: https://github.com/peterbell10, https://github.com/malfet	2024-01-22 18:33:41 +00:00
cyy	3baade4425	Remove calls of c10::guts::conjunction,c10::guts::disjunction,c10::guts::negation (#117926 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/117926 Approved by: https://github.com/Skylion007	2024-01-22 00:35:42 +00:00
PyTorch MergeBot	b637fdc8b3	Revert "additional support for float8_e4m3fnuz and _e5m2fnuz (#115214 )" This reverts commit `74e1362499`. Reverted https://github.com/pytorch/pytorch/pull/115214 on behalf of https://github.com/PaliC due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/115214#issuecomment-1900815152))	2024-01-19 17:35:04 +00:00
Jeff Daily	74e1362499	additional support for float8_e4m3fnuz and _e5m2fnuz (#115214 ) Follow up to #107586. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115214 Approved by: https://github.com/peterbell10	2024-01-19 00:50:18 +00:00
Jerry Zhang	3e397cefc5	Add uint1 to uint7 dtypes (#117208 ) Summary: These dtypes are added since we see more demand for these sub byte dtypes, especially with the popularity of LLMs (https://pytorch.org/blog/accelerating-generative-ai-2/#step-4-reducing-the-size-of-the-weights-even-more-with-int4-quantization-and-gptq-2021-toks) Note these are just placeholders, the operator support for these dtypes will be implemented with tensor subclass. e.g. torch.empty(..., dtype=torch.uint1) will return a tensor subclass of uint1, that supports different operations like bitwsise ops, add, mul etc. (will be added later) Also Note that these are not quantized data types, we'll implement quantization logic with tensor subclass backed up by these dtypes as well. e.g `Int4GroupedQuantization(torch.Tensor)` will be implemented with torch.uint4 Tensors (see https://github.com/pytorch-labs/ao/pull/13 as an example) Test Plan: CIs python test/test_quantization.py -k test_uint1_7_dtype Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/117208 Approved by: https://github.com/ezyang	2024-01-13 01:09:23 +00:00
Edward Yang	b4a35632f9	Add function to materialize COW storages (#117053 ) Summary: From Kurt Mohler, see https://github.com/pytorch/pytorch/pull/113396 (manually imported due to ghimport problems) Test Plan: sandcastle, OSS CI Differential Revision: D52610522 Pull Request resolved: https://github.com/pytorch/pytorch/pull/117053 Approved by: https://github.com/malfet, https://github.com/kurtamohler	2024-01-10 15:34:16 +00:00
Edward Z. Yang	2e983fcfd3	Support unsigned int for randint, item, equality, fill, iinfo, tensor (#116805 ) These are some basic utilities that are often used for testing. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/116805 Approved by: https://github.com/albanD	2024-01-10 02:17:23 +00:00
Nikita Shulga	fdfdba7c13	[BE] Use `__builtin_overflow_sub` when available (#117015 ) Which is faster then ternary. Following script ```python import torch from timeit import default_timer global_setup = """ """ setup = """ c10::SymInt a = c10::SymInt(123); """ code = """ -a; """ from torch.utils.benchmark import Timer t = Timer(stmt=code, setup=setup, global_setup=global_setup, language="c++", timer=default_timer) print(t.blocked_autorange()) ``` reports 4.17 ns median type before and 3.61 ns after on x86_64 Linux and 2.02 ns before and 1.91 ns after on Apple M1 Pull Request resolved: https://github.com/pytorch/pytorch/pull/117015 Approved by: https://github.com/albanD	2024-01-10 00:50:09 +00:00
Edward Z. Yang	f4e35e2c3d	Proposed mechanism for handling uint64_t in Scalar (#116595 ) Here's the problem: if we support unsigned integer types, and in particular if we support uint64_t, we need a way to represent these integers in Scalar. However, Scalar currently stores all integral values inside int64_t, which is not wide enough to accommodate all possible uint64_t values. So we need to do something to Scalar to support it. The obvious thing to do is add a uint64_t field to the union, and used it some situations. But when should we use it? The proposal is that we only use it if and only if the integer in question is not representable in int64_t. The historical precedent for this is our handling for uint8_t. Because this type is representable inside int64_t, we have historically stored it inside Scalar as an int64_t. In general, the concept behind Scalar is that it doesn't know the signedness/unsignedness/bitwidth of its input; in particular, we typically construct Scalar from Python int, which doesn't have any concept of how wide the integer is! So it doesn't make any sense to allow for a small integer like 255 to be representable under both the HAS_i tag and the HAS_u tag. So we forbid the latter case. Although I have proposed this, the PR as currently written just chokes when you pass it a uint64_t that's too big. There's some more logic that would have to be written out for this. I'm putting this out to start to get some agreement that this is the way to do it. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/116595 Approved by: https://github.com/albanD	2024-01-08 22:02:03 +00:00
Edward Z. Yang	8ddac14a15	Add unsigned integer dtypes to PyTorch (#116594 ) The dtypes are very useless right now (not even fill works), but it makes torch.uint16, uint32 and uint64 available as a dtype. Towards https://github.com/pytorch/pytorch/issues/58734 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/116594 Approved by: https://github.com/albanD ghstack dependencies: #116698, #116693	2024-01-07 07:40:49 +00:00
Edward Z. Yang	8e273e23b5	Refactor promoteType to no longer use shifting strategy (#116693 ) Instead of manually fixing the indices (extremely error prone when new dtypes are added) we just setup a lookup table to map ScalarType to the offsets table. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/116693 Approved by: https://github.com/albanD ghstack dependencies: #116698	2024-01-07 07:40:49 +00:00
Edward Z. Yang	a8a9695047	Move promoteTypes to cpp file (#116685 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/116685 Approved by: https://github.com/albanD	2024-01-04 04:42:14 +00:00
cyy	bb2a1e9941	Enable readability-redundant-smartptr-get in clang-tidy (#116381 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/116381 Approved by: https://github.com/Skylion007	2023-12-26 06:05:15 +00:00
cyy	9a0c217a0a	[9/N] Fixes clang-tidy warnings in c10/util/*.h (#116185 ) Continued work to clean headers in c10/util. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116185 Approved by: https://github.com/Skylion007	2023-12-22 09:35:44 +00:00
Nikita Shulga	f2c1fb3ee4	Fix crash in SymInt unary minus (#116160 ) Before this change `-SymInt(std::numeric_limits<int64_t>::min()) == 0` would reliably crash with null pointer dereference, as `data_` of the SymInt returned by `operator-` would be `0x8000000000000000`, because of the carry/overflow flags set by `negq`. Before the change x86_64 assembly generated for `4f02cc0670/c10/core/SymInt.cpp (L137)` looked as follows: ``` 0x7ffff7f2f490 <+115>: movq %rax, %rdx 0x7ffff7f2f493 <+118>: negq %rdx 0x7ffff7f2f496 <+121>: movq %rdx, (%rbp) 0x7ffff7f2f49a <+125>: movabsq $0x4000000000000000, %rdx ; imm = 0x4000000000000000 0x7ffff7f2f4a4 <+135>: cmpq %rdx, %rax 0x7ffff7f2f4a7 <+138>: jle 0x7ffff7f2f520 ; <+259> at SymInt.cpp:141:1 ``` `negq %rfx` correspond to unary minus and `cmpq %rdx, 0x4000000000000000` are inverted `check_range` `b6d0d0819a/c10/core/SymInt.h (L247-L249)` Flags raised by `negq` will affect the results of `cmpq`, and as result value would not be allocated on heap, but rather preserved as `nullptr`. Not sure if it's worth benchmarking, but perhaps using `__builtin_sub_overflow` would be faster as it does not require an extra comparison, just guarantees that overflow flags is cleared after the op. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116160 Approved by: https://github.com/Skylion007, https://github.com/colesbury	2023-12-20 20:12:57 +00:00
cyy	968b94bef2	[8/N] Fixes clang-tidy warnings in c10/{core,util}/.h (#116082 ) This patch enables clang-tidy coverage on c10//.h and contains other fixes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116082 Approved by: https://github.com/Skylion007	2023-12-20 12:22:21 +00:00
cyy	1544c37520	[7/N] Fixes clang-tidy warnings in c10/{core,util}/*.h (#115495 ) This PR continues to fix clang-tidy warnings for headers in c10/core and c10/util. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115495 Approved by: https://github.com/malfet	2023-12-19 02:14:30 +00:00
Sunita Nadampalli	88207b10ca	Enable thp(transparent huge pages) for buffer sizes >=2MB (#107697 ) The 2MB thp pages provide better allocation latencies compared to the standard 4KB pages. This change has shown substantial improvement for batch mode usecases where the tensor sizes are larger than 100MB. Only enabled if THP_MEM_ALLOC_ENABLE environment variable is set. Relanding https://github.com/pytorch/pytorch/pull/93888 with functionality disabled for Android Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/107697 Approved by: https://github.com/malfet	2023-12-16 18:16:19 +00:00
soulitzer	3fa3ed4923	Workaround to avoid MSVC std ambiguous symbol error (#115748 ) Don't know what the correct fix is, but it appears that this is the known workaround https://github.com/pytorch/pytorch/issues/18607 Failing windows build: https://hud.pytorch.org/pytorch/pytorch/pull/114897?sha=574a6f7cfe979f1bac62c6b0b51380ff67a31a09 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115748 Approved by: https://github.com/jbschlosser ghstack dependencies: #114895, #115739	2023-12-13 23:22:52 +00:00
soulitzer	67ce57ff66	Add pragma once to headers (#115739 ) This reverts commit 9b93c23b5e2d695c2fbd9c886cc0c8010edab717. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115739 Approved by: https://github.com/Skylion007, https://github.com/jbschlosser ghstack dependencies: #114895	2023-12-13 23:22:52 +00:00
soulitzer	4d8ad4fb82	Move SingletonSymNodeImpl from c10 to aten (#114895 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/114895 Approved by: https://github.com/jbschlosser	2023-12-13 20:01:18 +00:00
cyy	99f222372b	[5/N] Fixes clang-tidy warnings in c10/{core,util}/*.h (#115354 ) This PR continues to fix clang-tidy warnings for headers in c10/core and c10/util. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115354 Approved by: https://github.com/Skylion007	2023-12-09 17:16:04 +00:00
cyy	516bd4a72c	[1/N] Use std::in_place (#115170 ) It is time to gradually replace c10::in_place with std::in_place. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115170 Approved by: https://github.com/colesbury	2023-12-09 03:52:39 +00:00
cyy	7b8084d1c6	[5/N] Fixes clang-tidy warnings in c10/core/*.h (#115232 ) This PR continues to fix clang-tidy warnings for headers in c10/core. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115232 Approved by: https://github.com/Skylion007	2023-12-07 15:48:03 +00:00
cyyever	1224acc018	[3/N] Fixes clang-tidy warnings in header files (#114431 ) This PR series tries to enable clang-tidy for headers in torch/csrc and c10/util. Pull Request resolved: https://github.com/pytorch/pytorch/pull/114431 Approved by: https://github.com/Skylion007	2023-12-05 12:58:27 +00:00
Scott Wolchok	165f4f6ccf	[PyTorch] Redirect c10::optional to std::optional (#101995 ) We have C++17 now! I am intentionally dropping the `c10::optional<c10::ArrayRef>` size optimization. It was intended to improve dispatch, but thanks to D34602980 / #70864 we don't use `optional<ArrayRef>` in function arguments anymore anyway. Differential Revision: [D46079028](https://our.internmc.facebook.com/intern/diff/D46079028/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101995 Approved by: https://github.com/malfet, https://github.com/Skylion007, https://github.com/ezyang	2023-11-30 02:46:41 +00:00
cyy	07e00de8d7	Add missing member initialization in c10::ExtraMeta constructor (#114448 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/114448 Approved by: https://github.com/Skylion007	2023-11-24 03:54:11 +00:00
cyy	b76e2949f7	Fix pool_size type in TaskThreadPool (#114063 ) As negative values of pool_size mean calling defaultNumThreads() Pull Request resolved: https://github.com/pytorch/pytorch/pull/114063 Approved by: https://github.com/Skylion007	2023-11-23 21:20:45 +00:00
Edward Z. Yang	8c4812be80	Replace expect_int with guard_int (#113921 ) The idea is that instead of erroring, we will just specialize at these sites. Fixes https://github.com/pytorch/pytorch/issues/113142 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/113921 Approved by: https://github.com/zou3519	2023-11-20 21:27:48 +00:00
drisspg	2b97f5a9a1	Disallow fp8 type promotion (#113975 ) Fixes #113663 As well as updating the promotion logic to disallow automatic type promotion between fp8 types this PR also cleans up the table entries. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113975 Approved by: https://github.com/albanD, https://github.com/malfet	2023-11-20 19:47:43 +00:00
PyTorch MergeBot	f36d09fcb7	Revert "Add function to materialize COW storages (#113396 )" This reverts commit `e2f090086b`. Reverted https://github.com/pytorch/pytorch/pull/113396 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/113396#issuecomment-1818769090))	2023-11-20 10:26:01 +00:00
cyy	bae61ecb96	[Reland 1] Cleanup header inclusions in torch_cpu by iwyu (#112311 ) Reland https://github.com/pytorch/pytorch/pull/101178 to use IWYU on torch_cpu. The header file changes are excluded to avoid breaking internal jobs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112311 Approved by: https://github.com/ezyang	2023-11-19 04:06:36 +00:00
Edward Z. Yang	33f7c6638f	Guard when fetching non-symbolic value out of Scalar (#113911 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/113911 Approved by: https://github.com/voznesenskym ghstack dependencies: #113877	2023-11-18 06:39:09 +00:00
Kurt Mohler	e2f090086b	Add function to materialize COW storages (#113396 ) Part of #109833 Pull Request resolved: https://github.com/pytorch/pytorch/pull/113396 Approved by: https://github.com/ezyang	2023-11-17 01:58:51 +00:00
Nikita Shulga	05d949279c	[C10] cpuinfo error handling (#113771 ) If `cpuinfo_initalize` returns false, call to subsequent cpuinfo functions may result in `abort()` Also, `defaultNumThreads()` method is assumption if one method fails then try another, and finally return 1. Alas there are no good way to test it on x86 platform, but on ARM one can replicate it by running `sudo chmod 750 /sys` and then `python3 -c "import torch;torch._C.profiler.gather_traceback(True, True, True)"` <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at 4d942e8</samp> > _`cpuinfo` fails_ > _avoid undefined behavior_ > _check before you count_ Partially addresses https://github.com/pytorch/pytorch/issues/113568 Pull Request resolved: https://github.com/pytorch/pytorch/pull/113771 Approved by: https://github.com/atalman	2023-11-15 19:49:34 +00:00
George White	6c187246d6	Add support for float8_e4m3fnuz and _e5m2fnuz (#107586 ) This PR relates to the feature in [this feature submission](https://docs.google.com/document/d/1pF2T1xz54IPg1jG7FhykbrpbcJZVelQw0v8vBaoLkfs/edit). It has been based on #104242 which adds similar float8 types. These new types added in this PR are described in the paper at https://arxiv.org/abs/2206.02915. A brief description and comparison of the types with other float8 types can be also found in the [OpenXLA RFC](https://github.com/openxla/stablehlo/blob/main/rfcs/20230321-fp8_fnuz.md). Pull Request resolved: https://github.com/pytorch/pytorch/pull/107586 Approved by: https://github.com/seemethere, https://github.com/malfet	2023-11-15 15:01:11 +00:00
Chen_Liqing	b910d9eaa6	Add tensor.is_privateuseone (#113421 ) We found a scenario where ``tensor.device().is_privateuseone()`` is used to determine whether a tensor is privateuse1 but fails. In the code of ``Autograd``, for example： ``` ::std::tuple<at::Tensor,at::Tensor,at::Tensor> native_batch_norm(c10::DispatchKeySet ks, const at::Tensor & input, const c10::optional<at::Tensor> & weight, const c10::optional<at::Tensor> & bias, const c10::optional<at::Tensor> & running_mean, const c10::optional<at::Tensor> & running_var, bool training, double momentum, double eps) { auto& input_ = unpack(input, "input", 0); [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( input, weight, bias ); [[maybe_unused]] auto _any_has_forward_grad_result0 = (isFwGradDefined(input) \|\| isFwGradDefined(weight) \|\| isFwGradDefined(bias)); check_no_requires_grad(running_mean, "running_mean", "native_batch_norm"); check_no_requires_grad(running_var, "running_var", "native_batch_norm"); std::shared_ptr<NativeBatchNormBackward0> grad_fn; if (_any_requires_grad) { grad_fn = std::shared_ptr<NativeBatchNormBackward0>(new NativeBatchNormBackward0(), deleteNode); grad_fn->set_next_edges(collect_next_edges( input, weight, bias )); grad_fn->eps = eps; grad_fn->input_ = SavedVariable(input, false); grad_fn->running_mean_ = SavedVariable(running_mean, false); grad_fn->running_var_ = SavedVariable(running_var, false); grad_fn->training = training; grad_fn->weight_ = SavedVariable(weight, false); } ... } ``` When ``weight`` is ``None``, an empty tensor is automatically generated and will be transferred to the backward calculation: `c7e12c7427/torch/csrc/autograd/saved_variable.cpp (L121-L128)` At the beginning of the backward calculation in our scenario, we need to determine whether the input tensor is ``PrivateUse1`` . However, if we use ``tensor.device().is_privateuseone()``, we will get an error ``"tensor does not have a device"``: `c7e12c7427/c10/core/TensorImpl.h (L1223-L1235)` I think this part of the code can be optimized, what do you think? Pull Request resolved: https://github.com/pytorch/pytorch/pull/113421 Approved by: https://github.com/albanD	2023-11-13 01:51:27 +00:00
Peter Bell	15b61d6c1a	TensorImpl: Lazily compute numel and contiguity when symbolic (#112785 ) Currently whenever the sizes or strides are modified for a `TensorImpl` we eagerly recompute the numel and memory format flags. This is fine for static shapes as it's all fast C++ code, but for symbolic shapes it runs slow python code. This instead changes the `SymbolicShapeMeta` object to compute the derived quantities lazily at the first request. This has the added benefit that we can now pass assumptions in `empty_tensor_restride` which remove the need to compute some contiguity flags at all. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112785 Approved by: https://github.com/ezyang ghstack dependencies: #112689, #112890	2023-11-09 01:36:37 +00:00
Peter Bell	8c4bdac560	TensorImpl: Move symbolic refresh_numel and refresh_contiguous into their own class (#112890 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/112890 Approved by: https://github.com/lezcano ghstack dependencies: #112689	2023-11-09 01:36:37 +00:00
Edward Z. Yang	9e6e9587c1	Make numel/sym_numel PyInterpreter work symmetrically to others (#113065 ) Just some better engineering code cleanup. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/113065 Approved by: https://github.com/voznesenskym	2023-11-08 17:44:29 +00:00
Peter Bell	c6f435befd	Don't recompute numel and contiguous in detach (#112689 ) When symolic shapes are involved, `refresh_numel` and `refresh_contiguous` are fairly expensive since they dispatch to python for SymInt handling. However, in the case of detach we can just copy the existing values instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112689 Approved by: https://github.com/lezcano, https://github.com/ezyang	2023-11-07 10:55:48 +00:00
Edward Z. Yang	3be99012d4	Switch some more SymInt tests to TORCH_CHECK_ALWAYS_SHOW_CPP_STACKTRACE (#112626 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/112626 Approved by: https://github.com/bdhirsh	2023-11-03 03:15:26 +00:00
Edward Z. Yang	a1ab22b81d	Reland "Trigger specialization when you call size()/stride() from C++ (#111935 )" (#112605 ) This reverts commit `22221c6d60`. Differential Revision: [D50886564](https://our.internmc.facebook.com/intern/diff/D50886564) Pull Request resolved: https://github.com/pytorch/pytorch/pull/112605 Approved by: https://github.com/voznesenskym	2023-11-02 13:27:31 +00:00
PyTorch MergeBot	22221c6d60	Revert "Trigger specialization when you call size()/stride() from C++ (#111935 )" This reverts commit `5846705e36`. Reverted https://github.com/pytorch/pytorch/pull/111935 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111935#issuecomment-1782107024))	2023-10-27 00:23:03 +00:00
Edward Z. Yang	5846705e36	Trigger specialization when you call size()/stride() from C++ (#111935 ) This should be the last of the "it used to work with static shapes but it doesn't work with dynamic shapes" hard errors. Now we will just specialize if you hit it from C++. The strategy here is a bit clever. We shunt the size() call to Python binding if an error would have occurred. Importantly, we already have logic to make sure the newly allocated ints stay live for the duration of the ArrayRef access. storage_offset is intentionally omitted because there are some problems with it. I will fix them next. This should let us get rid of the aotautograd_static test configuration. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/111935 Approved by: https://github.com/zou3519	2023-10-25 16:17:55 +00:00
ydwu4	f3d02d9ae6	Add support for sym_ite (#111440 ) This PR supports sym_ite. This is useful for converting SymBool to SymInt in e.g. #109916. Internally, it uses sympy.Piecewise. We cannot use sympy.ITE because it expects the arguments and output all to be boolean type but we want return SymInt type when converting a SymBool to SymInt. So we use sympy.Piecewise to denote the symbolic relationship. Note that this pr uses the range analysis for sympy.Piecewise implemented in https://github.com/pytorch/pytorch/blob/main/torch/utils/_sympy/value_ranges.py. Test Plan: See added test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/111440 Approved by: https://github.com/ezyang	2023-10-23 16:17:43 +00:00
Kurt Mohler	a267d95c2a	Reland: Add `lazy_clone_storage` to create COW storages (#111579 ) Relands #110192 NOTE: COW storages do not actually copy on write yet, they just have the COW deleter and deleter context applied to them Part of #109833 Pull Request resolved: https://github.com/pytorch/pytorch/pull/111579 Approved by: https://github.com/ezyang	2023-10-20 15:49:59 +00:00
PyTorch MergeBot	9c7391ea36	Revert " [1/N] Apply clang-tidy to c10 cuda files (#111137 )" This reverts commit `43b023694e`. Reverted https://github.com/pytorch/pytorch/pull/111137 on behalf of https://github.com/malfet due to Was reverted internally due to the failures in torch.cuda.memory_stats(device=0) (presumably) ([comment](https://github.com/pytorch/pytorch/pull/111137#issuecomment-1769274103))	2023-10-18 20:32:53 +00:00
cyy	43b023694e	[1/N] Apply clang-tidy to c10 cuda files (#111137 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/111137 Approved by: https://github.com/zou3519, https://github.com/Skylion007	2023-10-17 04:52:50 +00:00
PyTorch MergeBot	97a513ed07	Revert "Add `lazy_clone_storage` to create COW storages (#110192 )" This reverts commit `1c30814417`. Reverted https://github.com/pytorch/pytorch/pull/110192 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, @ezyang please support the author providing further details ([comment](https://github.com/pytorch/pytorch/pull/110192#issuecomment-1765157285))	2023-10-16 19:43:20 +00:00
Kurt Mohler	1c30814417	Add `lazy_clone_storage` to create COW storages (#110192 ) This PR relands #110022 but accounts for the changes in #110191. Also, the function for creating COW storages is called `lazy_clone_storage` in this PR, instead of `try_ensure` NOTE: COW storages do not actually copy on write yet, they just have the COW deleter and deleter context applied to them Part of #109833 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110192 Approved by: https://github.com/ezyang	2023-10-14 00:53:21 +00:00
PyTorch MergeBot	482782406a	Revert "Add `lazy_clone_storage` to create COW storages (#110192 )" This reverts commit `33f1513486`. Reverted https://github.com/pytorch/pytorch/pull/110192 on behalf of https://github.com/kit1980 due to revert to work around some importing issues ([comment](https://github.com/pytorch/pytorch/pull/110192#issuecomment-1762430374))	2023-10-14 00:48:45 +00:00
PyTorch MergeBot	f68d6e8108	Revert "Move at::{Refcounted,}MapAllocator to c10 (#109881 )" This reverts commit `68a1219f74`. Reverted https://github.com/pytorch/pytorch/pull/109881 on behalf of https://github.com/kit1980 due to breaking internal builds, undefined symbol: _ZN3c1022RefcountedMapAllocator6decrefEv ([comment](https://github.com/pytorch/pytorch/pull/109881#issuecomment-1761950014))	2023-10-13 17:57:53 +00:00
Kazuaki Ishizaki	8162f4170b	Fix typo under c10 directory (#111155 ) This PR fixes typo in comments and messages in files under `c10` directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/111155 Approved by: https://github.com/Skylion007	2023-10-13 16:52:51 +00:00
Kurt Mohler	33f1513486	Add `lazy_clone_storage` to create COW storages (#110192 ) This PR relands #110022 but accounts for the changes in #110191. Also, the function for creating COW storages is called `lazy_clone_storage` in this PR, instead of `try_ensure` NOTE: COW storages do not actually copy on write yet, they just have the COW deleter and deleter context applied to them Part of #109833 Differential Revision: [D50265134](https://our.internmc.facebook.com/intern/diff/D50265134) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110192 Approved by: https://github.com/ezyang	2023-10-13 15:33:40 +00:00
Jesse Cai	4c01686027	Public API for constructing NT with jagged layout from tensor list (#111078 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/111078 Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer ghstack dependencies: #109123	2023-10-13 03:27:41 +00:00
Peter Bell	68a1219f74	Move at::{Refcounted,}MapAllocator to c10 (#109881 ) `libshm.so` depends on the torch library exclusively for `at::RefcountedMapAllocator`, so it makes sense to move it to c10 along with the other memory allocators. This means `libshm.so` only depends on `c10` and we don't need to relink `libshm.so` for every ATen change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109881 Approved by: https://github.com/albanD	2023-10-12 10:51:13 +00:00
cyy	f98d6ad8b3	[1/N] Apply clang-tidy to aten/src/ATen/core/ (#110861 ) It is time to cliang-tidy aten. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110861 Approved by: https://github.com/Skylion007	2023-10-10 23:20:58 +00:00
PyTorch MergeBot	02a02a23ee	Revert "Move at::{Refcounted,}MapAllocator to c10 (#109881 )" This reverts commit `0341deb1c7`. Reverted https://github.com/pytorch/pytorch/pull/109881 on behalf of https://github.com/albanD due to It does break buck build ([comment](https://github.com/pytorch/pytorch/pull/109881#issuecomment-1756195823))	2023-10-10 20:39:12 +00:00
soulitzer	fda0a965c7	[reland] Support SingletonSymNode mul with coefficient (#110673 ) reland of https://github.com/pytorch/pytorch/pull/110369 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110673 Approved by: https://github.com/ezyang	2023-10-10 19:37:17 +00:00
Peter Bell	0341deb1c7	Move at::{Refcounted,}MapAllocator to c10 (#109881 ) `libshm.so` depends on the torch library exclusively for `at::RefcountedMapAllocator`, so it makes sense to move it to c10 along with the other memory allocators. This means `libshm.so` only depends on `c10` and we don't need to relink `libshm.so` for every ATen change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109881 Approved by: https://github.com/albanD	2023-10-09 23:53:47 +00:00
Kazuaki Ishizaki	50bd252863	Fix typo `the the` (#110869 ) This PR fixes typo `the the` of comments and exception message in files. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110869 Approved by: https://github.com/soulitzer	2023-10-09 19:32:45 +00:00
vinithakv	36e6b0cfa2	Fix cpuinfo related crash on ppc64 (#110708 ) The "import torch" crashes with following cpuinfo error on powerpc64. ============================================================== >>> import torch Error in cpuinfo: processor architecture is not supported in cpuinfo Fatal error in cpuinfo: cpuinfo_get_processors_count called before cpuinfo is initialized Aborted (core dumped) ================================================================== The patch fixes this by excluding powerpc from using cpuinfo as it is not supported for ppc64. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110708 Approved by: https://github.com/ezyang	2023-10-08 13:31:54 +00:00
cyy	12f97bb2e9	[Reland][3/N] Add -Wdeprecated and related fixes (#110518 ) Fixes the string_view errors and reland the work. The previous changes in torch/csrc/utils/invalid_arguments.cpp were too aggressive and not tested thoroughly. They are discarded. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110518 Approved by: https://github.com/ezyang	2023-10-07 08:38:40 +00:00
soulitzer	69ea214cc2	[reland] Update singleton int to error when inequality relation is undefined (#110672 ) reland of https://github.com/pytorch/pytorch/pull/110044 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110672 Approved by: https://github.com/ezyang	2023-10-06 17:50:25 +00:00
PyTorch MergeBot	330db8278b	Revert "Update singleton int to error when inequality relation is undefined (#110044 )" This reverts commit `07331c65e6`. Reverted https://github.com/pytorch/pytorch/pull/110044 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110044#issuecomment-1749805209))	2023-10-05 23:55:37 +00:00
PyTorch MergeBot	1c3fae46ee	Revert "Support SingletonSymNode mul with coefficient (#110369 )" This reverts commit `eb8feb8ff8`. Reverted https://github.com/pytorch/pytorch/pull/110369 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110369#issuecomment-1749802899))	2023-10-05 23:51:28 +00:00
Amadeusz Skrzypczak	653f966df0	Fix type promotion of float8_e5m2 and float8_e4m3fn (#110279 ) There is an issue with float8 type promotion, because _promoteTypesLookup doesn't contain records for few types between bfloat16 and float8. I have simply moved float8 types just after bfloat16, however I'm not sure if it doesn't break serialization. Please, decide if it can stay like this, or should I insert missing records filled with "ud" into _promoteTypesLookup instead of moving types. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110279 Approved by: https://github.com/albanD	2023-10-05 01:28:48 +00:00
soulitzer	eb8feb8ff8	Support SingletonSymNode mul with coefficient (#110369 ) We want to be able to use SingletonSymNode to represent strides for Jagged layout tensor. The following is for 3D, but easily generalizable to higher dimensions. Constraints: - [B, x, D] (where x represents the "variably lengthed dim") can be strided in two ways [x, 1, sum(x)] and [dx, d, 1]. We need two different placeholder values depending on how the jagged tensor is strided. - When doing operations we need the strides of output tensors to be expressable in terms of the strides and sizes of the inner tensors. Given [B, x, D] @ [D, D'], the output strides is [x * D', D', 1] rather than some opaque [x2, D', 1]. This constraint exists because if I'm tracing, I need a symint to represent the output stride. This symint needs to come from somewhere; I get it in several ways: (1) create a constant, (2) unbacked symint, (3) create a new input using a source, (4) output of an operation on an existing symint. It is clear that (4) is what we want here, which brings us to the design below. Design: Given the two constraints, the most straightforward way to implement this is actually to update SingletonSymNode to include some scalar factor, i.e. Morally, SingletonSymNode represents `factor * [s_0, s_1, …, s_n]` This enables us to symbolically compute strides from sizes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110369 Approved by: https://github.com/ezyang ghstack dependencies: #110044	2023-10-04 22:56:15 +00:00
soulitzer	07331c65e6	Update singleton int to error when inequality relation is undefined (#110044 ) Previously, something like j0 >= 3, would return False. In sympy however, it is not possible to make it so that both j0 >= 3 and j0 < 3 return False. In sympy, you only get to dispatch on Ge, and the remaining are derived, e.g. defining Ge(j0 >= 3) to be False would force Lt(j0, 3) to be True, which is not what we want. In this PR, we make it so that both j0 >=3 and j0 < 3 error, so that in a future PR when we create the symbolic counterpart of this singleton, the behaviors can be the same. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110044 Approved by: https://github.com/ezyang	2023-10-04 22:55:53 +00:00
PyTorch MergeBot	156aefa89b	Revert "[3/N] Add -Wdeprecated and related fixes (#109698 )" This reverts commit `c31fcdaa4f`. Reverted https://github.com/pytorch/pytorch/pull/109698 on behalf of https://github.com/PaliC due to breaking quantization tests ( quantization/test_quantize_per_channel_sub_byte and quantization/test_quantize_per_channel_float_qparams) internally ([comment](https://github.com/pytorch/pytorch/pull/109698#issuecomment-1746999806))	2023-10-04 14:33:47 +00:00
cyy	c31fcdaa4f	[3/N] Add -Wdeprecated and related fixes (#109698 ) This PR follows #108626. Hopefully we can enable the warning in the next PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109698 Approved by: https://github.com/Skylion007, https://github.com/ezyang	2023-10-03 22:50:53 +00:00
cyy	168f516fae	[3/N] Move c10::variant to std::variant (#110141 ) This PR moves more c10::variant calls to std::variant Pull Request resolved: https://github.com/pytorch/pytorch/pull/110141 Approved by: https://github.com/Skylion007	2023-09-28 18:43:55 +00:00
Kurt Mohler	f2c360e3e5	Reorganize and rename COW files and APIs (#110191 ) This PR does the following: * Combine `cow/context.<h/cpp>` and `cow/deleter.<h/cpp>` into `cow/COWDeleter.<h/cpp>` * Rename `Context` to `COWDeleterContext` * Rename `delete_context` to `cow_deleter` * Remove the separate `impl_cow_context` bazel library, combining it with the base c10 core library * Rename `context_test.cpp` to `cow_test.cpp` Pull Request resolved: https://github.com/pytorch/pytorch/pull/110191 Approved by: https://github.com/ezyang	2023-09-28 17:50:44 +00:00
cyy	a81d083b1c	[Reland] Add -Wdeprecated and related fixes (#110019 ) This is reland of PRs #https://github.com/pytorch/pytorch/pull/108626 and #109564. We fixed the IOS build failure by changing ``` ((CHECK) ? (EXPR) : ([] { assert(!#CHECK); }(), (EXPR))) ``` to ``` ((CHECK) ? (EXPR) : ([] { assert(false); }(), (EXPR))) ``` in TR2_OPTIONAL_ASSERTED_EXPRESSION, since the former syntax was invalid on Apple Clang. Anyway, we could apply the simple fix hoping that c10::optional would be replaced by std::optional soon. We also enabled -Wdeprecated on c10. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110019 Approved by: https://github.com/clee2000	2023-09-28 03:34:29 +00:00
cyy	36eb1bb548	Use constexpr members in ConstantSymNodeImpl (#110142 ) A simple refactoring. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110142 Approved by: https://github.com/Skylion007	2023-09-27 18:31:33 +00:00
PyTorch MergeBot	1265400ba6	Revert "Reland: implement a function to convert a storage to copy-on-write (#110022 )" This reverts commit `dddf07e56a`. Reverted https://github.com/pytorch/pytorch/pull/110022 on behalf of https://github.com/atalman due to New tests are failing in internal CI ([comment](https://github.com/pytorch/pytorch/pull/110022#issuecomment-1737584693))	2023-09-27 15:05:41 +00:00
mikey dagitses	dddf07e56a	Reland: implement a function to convert a storage to copy-on-write (#110022 ) Relands #100819 In addition, the `impl_cow_context` library is combined into the base c10 core library, and COW unit tests are combined into just one binary. Part of #109833 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110022 Approved by: https://github.com/ezyang	2023-09-26 03:33:18 +00:00
Nikita Shulga	f87863335c	[BE]s/DEFINE_ENUM/DEFINE_ST_ENUM_VAL_/ (#109917 ) To avoid potential collisions with other libraries that can define such enum globally (which is a bad practice, but happens sometimes) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109917 Approved by: https://github.com/Skylion007	2023-09-25 22:19:09 +00:00
PyTorch MergeBot	83deaa16ed	Revert "[1/N] Cleanup header inclusions in torch_cpu by iwyu (#101178 )" This reverts commit `b7a95f4fdb`. Reverted https://github.com/pytorch/pytorch/pull/101178 on behalf of https://github.com/atalman due to Break internal CI ([comment](https://github.com/pytorch/pytorch/pull/101178#issuecomment-1734384645))	2023-09-25 20:05:25 +00:00
eellison	4734496a0c	Extend storage access error api for untyped_storage() (#109750 ) In cudagraph trees, we invalidate tensors at some point and drop their storage. Then, when they are accessed with .data_ptr(), a custom error message is thrown. Previously, this invalidation didn't also make untyped_storage()/storage() error which could result in a segfault. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109750 Approved by: https://github.com/zou3519	2023-09-25 17:51:27 +00:00
cyy	b7a95f4fdb	[1/N] Cleanup header inclusions in torch_cpu by iwyu (#101178 ) Following our previous IWYU work #100304 on C10, it makes more sense to try IWYU on torch_cpu. This PR does exactly that. Meanwhile, it fixes issue #48684. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101178 Approved by: https://github.com/ezyang	2023-09-24 05:01:20 +00:00
Brian Hirsh	63526a63f5	Make FunctionalTensor subclass to be more like functorch (interaction with ZeroTensor + Conjugate key) (#109023 ) I added some tests for Conj, Neg and ZeroTensor for both python and C++ functionalization. This also fixes a nasty segfult when running a functorch `jacfwd` test with `torch.compile`, once AOTAutograd is using `FunctionalTensor`. Changes: (1) I use Jeffrey's `make_wrapper_subclass(extra_dispatch_keys)` kwarg to plumb extra dispatch keys ontoto the wrapper, mirroring what C++ functionalization does (C++ functionalization will mirror all dispatch keys from the inner tensor to the wrapper, except for python and functorch keys). (2) FunctionalTensorMode will decompose CompositeImplicitAutograd ops, since (for example) ZeroTensor kernels can send ops like `.to()` directly to the Python key. We'll need a way to toggle this later for pre-dispatch functionalization (3) Bound `_ForceDispatchKeyGuard` and BatchedTensorImpl's dispatch keyset to python Pull Request resolved: https://github.com/pytorch/pytorch/pull/109023 Approved by: https://github.com/zou3519 ghstack dependencies: #108654, #109662, #109632	2023-09-22 07:09:04 +00:00
Brian Hirsh	dae9aa8925	fix subclass custom sizes dynamic shapes caching (#108654 ) This PR fixes the ownership/lifetime handling for tensor subclasses that override sizes/strides, when tensors get resized. This is needed now, because `FunctionalTensor` is a subclass that has a custom size/stride (so it can plumb requests to its inner tensor), and is also a core piece of infra (it's used during tracing in AOTAutograd, which means that metadata mutation and resizing that happens to work with torch.compile today needs to work with FunctionalTensor). After a bunch of discussion with @ezyang and @soulitzer, I updated `PyInterpreter::sym_sizes()` (and friends) so that: (1) They allocate a py::capsule buffer and stash it on the tensor on the first call to size/stride (2) On a size/stride call where we noticed that the number of dimensions on the tensor has changed (so our buffer it stale), we re-allocate the buffer (3) On a size/strude cal where we notice that the number of dimensions is the same, but the values are different (this happens whenever a tensor experiences a metadata mutation, like `.transpose_()`), we inplace-modify the buffer and put the new ints/symints into it I also ended up doing the SmallVector optimization, which was required to fix some tests in AOTAutograd. Ideally we should look into those tests, and nail down the parts of our codebase that rely on SmallVector not re-allocating on a resize... but I'm saving this for a followup. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108654 Approved by: https://github.com/ezyang	2023-09-22 07:09:04 +00:00
rzou	8124a6c40c	[TORCH_LIBRARY] Add impl_abstract_pystub (#109529 ) We want users to be able to define custom ops in C++ but put the abstract impl in Python (since it is easier to write them in Python and the abstract impl better models device semantics and data-dependent operators). `m.impl_abstract_pystub(opname, python_module, context)` declares the abstract_impl of the operator to exist in the given python module. When the abstract_impl needs to be accessed (either via FakeTensor or Meta), and it does not exist, the PyTorch Dispatcher will yell with a descriptive error message. Some details: - We construct a new global AbstractImplPyStub mapping in Dispatcher.cpp. Read/write to this map is protected by the Dispatcher lock. - We add a new Meta Tensor fallback kernel. The fallback errors out if there is no meta kernel, but also offers a nicer error message if we see that there is a pystub. - We create a `torch._utils_internal.throw_abstract_impl_not_imported_error` helper function to throw errors. This way, we can throw different error messages in OSS PyTorch vs internal PyTorch. To invoke this from C++, we added a PyInterpreter::throw_abstract_impl_not_imported_error. Differential Revision: [D49464753](https://our.internmc.facebook.com/intern/diff/D49464753/) Differential Revision: [D49464753](https://our.internmc.facebook.com/intern/diff/D49464753) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109529 Approved by: https://github.com/ezyang, https://github.com/bdhirsh	2023-09-22 04:55:36 +00:00
Edward Z. Yang	09622d8d49	Allow inferring size-nature from sizes passed to empty constructor (#109720 ) This removes the need for many constrain_as_size calls as we now infer them from error checking for sizes. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/109720 Approved by: https://github.com/aakhundov	2023-09-21 17:57:40 +00:00
Aleksei Nikiforov	b91ba226ce	Don't use cpuinfo on s390x (#109496 ) It doesn't support s390x and just crashes pytorch on init. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109496 Approved by: https://github.com/huydhn	2023-09-21 12:20:49 +00:00
soulitzer	5252fcb133	Handle constant SymBool in unary and binary operations (#109169 ) In this PR: - When Constant SymNode are detected in unary/binary ops demote them to plain int/bool before proceeding. Sometimes this means doing a unary op with a Constant SymNode would result in a plain bool. - Introduce an is_symbolic method, only available from Python. We need this because isinstance(x, SymInt) is no longer sufficient to check whether a given int/SymInt is symbolic or not. See later PR in the stack to see how this is used. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109169 Approved by: https://github.com/ezyang	2023-09-20 20:37:15 +00:00
PyTorch MergeBot	1cc052bcab	Revert "[1/N] Add -Wdeprecated and related fixes (#108626 )" This reverts commit `a53a677b4d`. Reverted https://github.com/pytorch/pytorch/pull/108626 on behalf of https://github.com/clee2000 due to I'm getting errors internally that look like the below on x86_64-apple-ios-simulator with clang 16 ([comment](https://github.com/pytorch/pytorch/pull/108626#issuecomment-1728102447))	2023-09-20 16:49:11 +00:00
cyy	a53a677b4d	[1/N] Add -Wdeprecated and related fixes (#108626 ) This PR adds -Wdeprecated to CMake warnings and fixes related issues. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108626 Approved by: https://github.com/ezyang, https://github.com/Skylion007	2023-09-19 09:24:04 +00:00
cyy	ac603bc2f8	[Reland] Eliminate invocations of c10::stoi,c10::stod,c10::stoull,c10::stoll (#109566 ) This is reland of #87603 with definitions of c10::stoXX kept for further investigation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109566 Approved by: https://github.com/huydhn	2023-09-19 07:15:25 +00:00
Edward Z. Yang	77d745666b	Add TORCH_CHECK_ALWAYS_SHOW_CPP_STACKTRACE (#109373 ) Unlike TORCH_CHECK, these always show C++ stacktrace on error. Put it on errors where you frequently seem to need this information. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/109373 Approved by: https://github.com/bdhirsh ghstack dependencies: #109372	2023-09-18 19:46:32 +00:00
Edward Z. Yang	8a1bbf383d	Out-of-line cannot call with symbolic error test (#109372 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/109372 Approved by: https://github.com/bdhirsh	2023-09-18 19:46:32 +00:00
PyTorch MergeBot	4d44d8c00a	Revert "Eliminate c10::stoi,c10::stod,c10::stoull,c10::stoll (#109179 )" This reverts commit `852f1b8417`. Reverted https://github.com/pytorch/pytorch/pull/109179 on behalf of https://github.com/huydhn due to Sorry for reverting your change but this is breaking periodic buck build, so please fix the issue and reland the change https://github.com/pytorch/pytorch/actions/runs/6207458526/job/16852695272 ([comment](https://github.com/pytorch/pytorch/pull/109179#issuecomment-1724168571))	2023-09-18 18:41:12 +00:00
Rockerz	05c31b3b69	typo in DispatchKeySet.h (#109431 ) Fixes #108641 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109431 Approved by: https://github.com/Skylion007	2023-09-18 17:34:36 +00:00
cyy	852f1b8417	Eliminate c10::stoi,c10::stod,c10::stoull,c10::stoll (#109179 ) We can remove these functions in favor of std ones. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109179 Approved by: https://github.com/colesbury	2023-09-16 07:22:50 +00:00
Brian Hirsh	f22b303f65	Add TorchDispatch version of functionalization (#106404 ) This PR adds a new `FunctionalTensor` subclass, and `FunctionalTensorMode` torch dispatch mode. Together, this class/mode are a lightweight wrapper around our existing C++ functionalization logic. This idea came from Ed - later in the stack, I want to be able to run functionalization underneath torch_dispatch, when performing tracing in AOTAutograd. I can't do this easily with vanilla C++ functionalization, because it has a dedicated dispatch key that always runs before TorchDispatch. However, by adding a torch_dispatch mode shim around functionalization, we can use functionalization as a torch_dispatch mode, which will make it easier to run underneath other modes later. This PR provides the basic new classes, and some light testing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106404 Approved by: https://github.com/ezyang	2023-09-15 20:19:25 +00:00
cyy	8cb96f5f2c	[Reland]Use cpuinfo to determine c10::ThreadPool thread number (#107339 ) Relands PR #107010 and fixes BUCK builds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107339 Approved by: https://github.com/ezyang	2023-09-14 23:44:23 +00:00
soulitzer	4667a5c948	Update SingletonSymNode to allow more comparisons (#108315 ) In this PR: - {in,}equality between singleton and plain ints returns false instead of erroring - Morally define the semantic of j0 > c to be as if j0 represented an array [s_0, s_1, ... s_n] and s_k > c for all k - Just like for equality, we don't actually want to do the comparison one by one, instead j0 is constrained to some range [min, max]. By default this range is [2, int64_t::max] so that it acts like a size and passes 0/1 specialization checks. - In the future, we can define some API to allow users to constrain the range of their singletons Pull Request resolved: https://github.com/pytorch/pytorch/pull/108315 Approved by: https://github.com/ezyang	2023-09-13 01:58:02 +00:00
Kurt Mohler	4c5e43574c	Reland 2: Add PyObject preservation for UntypedStorage (#109039 ) Relands #103907 after it was reverted. This PR makes the new `ignore_hermetic_tls` argument of `check_pyobj` optional to avoid causing a compilation error in torchdistx Part of #91395 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109039 Approved by: https://github.com/ezyang	2023-09-12 22:26:05 +00:00
PyTorch MergeBot	59f605be57	Revert "Reland 2: Add PyObject preservation for UntypedStorage (#109039 )" This reverts commit `419e4e17a2`. Reverted https://github.com/pytorch/pytorch/pull/109039 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing linter job in trunk, probably due to a landrace ([comment](https://github.com/pytorch/pytorch/pull/109039#issuecomment-1715147020))	2023-09-12 07:26:11 +00:00
Kurt Mohler	419e4e17a2	Reland 2: Add PyObject preservation for UntypedStorage (#109039 ) Relands #103907 after it was reverted. This PR makes the new `ignore_hermetic_tls` argument of `check_pyobj` optional to avoid causing a compilation error in torchdistx Part of #91395 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109039 Approved by: https://github.com/ezyang	2023-09-12 01:19:40 +00:00
cyy	f150f96255	[Reland] increase clang-tidy coverage in torch/csrc (#108875 ) Reland PR #103058 since there was a time gap between this PR and other PRs in terms of torch/csrc modifications Pull Request resolved: https://github.com/pytorch/pytorch/pull/108875 Approved by: https://github.com/Skylion007	2023-09-12 00:54:53 +00:00
PyTorch MergeBot	68238606f3	Revert "Reland: Add PyObject preservation for UntypedStorage (#103907 )" This reverts commit `56b848157c`. Reverted https://github.com/pytorch/pytorch/pull/103907 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing torchdistx build which uses check_pyobj here `9c1b9f5cb2/src/python/torchdistx/_C/deferred_init.cc (L87)` ([comment](https://github.com/pytorch/pytorch/pull/103907#issuecomment-1712121158))	2023-09-08 19:27:07 +00:00
PyTorch MergeBot	fa8bfe5ca2	Revert "increase clang-tidy coverage in torch/csrc (#103058 )" This reverts commit `cdf7f3e780`. Reverted https://github.com/pytorch/pytorch/pull/103058 on behalf of https://github.com/atalman due to Sorry for reverting your change, breaks lint ([comment](https://github.com/pytorch/pytorch/pull/103058#issuecomment-1711906915))	2023-09-08 16:07:41 +00:00
cyy	cdf7f3e780	increase clang-tidy coverage in torch/csrc (#103058 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/103058 Approved by: https://github.com/Skylion007	2023-09-08 15:07:32 +00:00
Joel Schlosser	b928e08f3d	Initial vmap + NT support with unbind fallback (#106786 ) PoC demonstrating vmap + NT based on the [design doc](https://docs.google.com/document/d/1dVVk6TOqz93PLTIneU2T3xaxCs9qZ0MaJyCvOAp_bC0). This PR: * Allows `BatchedTensorImpl`s to contain NTs * Introduces a `BatchedNestedTensor` dispatch key for NT-specific batching rules * Provides a batching rule fallback that unbinds the NTs -> performs computation on constituent -> rebinds results into NT Restrictions: * Only supports one level of vmap * Only supports vmapping over dim=0 for NTs * For operations with mixed NT / dense inputs, support is also limited to dim=0 for the dense inputs Pull Request resolved: https://github.com/pytorch/pytorch/pull/106786 Approved by: https://github.com/zou3519	2023-09-07 13:53:20 +00:00
Kurt Mohler	56b848157c	Reland: Add PyObject preservation for UntypedStorage (#103907 ) This relands #97470 after #102553 reverted it. This PR attempts to fix the internal failure by avoiding an unnecessary intermediate storage buffer allocation in `c10::newStorageImplFromRefcountedDataPtr`. Part of #91395 Pull Request resolved: https://github.com/pytorch/pytorch/pull/103907 Approved by: https://github.com/ezyang	2023-09-07 04:24:11 +00:00
cyy	01fc6466d1	[Reland] [1/N] fix clang-tidy warnings in torch/csrc (#108114 ) Reland of PR #107648 with auto replaced with Py_ssize_t in eval_frame.c. This PR applies fixes to some found issues by clang-tidy in torch/csrc. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108114 Approved by: https://github.com/Skylion007	2023-08-30 17:11:16 +00:00
Brian Hirsh	da54f3c519	reorder proxy / fake modes so they always run last (#104482 ) Update: Made refactor of the original PR. See the original description below, but here I'll describe the updates: (1) TLS changes in `TorchDispatchModeTLS.h/cpp`. I added a `TorchDispatchModeKey` enum, that (for now) just contains PROXY and FAKE. The ModeTLS used to just contain a `std::vector<std::shared_ptr<c10::SafePyObject>>` corresponding to the mode stack. It now also contains a separate array of "infra modes", indexed by mode key (PROXY and FAKE, with a new addition, FUNCTIONAL, coming later in the stack). `TorchDispatchModeTLS::push_onto_stack` and `TorchDispatchModeTLS::pop_stack` are now a bit more complicated. Pushing accepts an optional mode_key, which if set, tells us to add the given mode directly to our "infra_modes" array. Popping will first check the "user mode" stack, before trying to pop anything from the infra mode stack. It also optionally returns the mode key of the mode we popped if there was one - that way if we push that same mode back onto the TLS later, we know where it goes. `TorchDispatchModeTLS::dispatch_mode_enabled()` now accepts an optional `skip_infra_modes` param, so you can separately query if there are "any modes at all", or if there are "any user modes". `TorchDispatchModeTLS::get/set/unset_mode()` all take in a mode key, and get/set/unset the mode at that particular mode key (meaning they are only meant to be used for infra modes). There were also some mild codegen changes to support the new enum (2) `fake_tensor.py/proxy_tensor.py/_python_dispatch.py` The way I tell the infra that certain subclasses/modes are "infra" is through the enum: I gave `FakeTensor` and `FakeTensorMode` a `self._mode_key = torch._C.TorchDispatchModeKey.FAKE`. `TorchDispatchMode.__enter/exit__()` (in `_python_dispatch.py` now check if the current mode has a mode key, and if so they plumb it into any `push_onto_stack()` calls (which eventually instructs `TorchDispatchModeTLS` where to put the mode). Same thing for `ProxyTorchDispatchMode`. I also had to change both of these mode's enter/exit, to handle the fact that there can no longer be multiple proxy/fake modes on the mode stack at once. I updated them both to have a `self.enter_stack: List[Optional[TorchDispatchMode]]` - whenever we push a given mode in `__enter__`, we remove the current ambient fake/proxy mode from the mode stack, and save it in `enter_stack`, so that on exit we can reset the state properly. (2) dispatching logic in `python_arg_parser.cpp` This is where the core dispatching logic changes are. I added two helpers, `dispatch_on_subclass()` and `dispatch_on_mode()`. The overall dispatching order is now: ``` (a) dispatch_on_mode() # try user modes first (where the mode stack automatically considers infra modes last) (b) dispatch_on_subclass() # try user subclasses next (skipping infra subclasses) (c) dispatch_on_subclass() # try infra subclasses next (skipping user subclasses) ``` Note that we still want "user subclasses" to run before "infra modes". As Ed helped me realize, this will work today: If proxy/fake modes in step 1, they'll return NotImplemented if they see a user subclass, allowing us to redispatch to the user subclass. How do (b) and (c) distinguish between user and infra subclasses? Infra subclasses (FakeTensor, and later FunctionalTensor) are required to have a `_mode_key` hidden on the subclass - so we filter via arguments that do/don't have the _mode_key. (3) I also changed `DoubleTensor` to `TwoTensor` to minimize confusion (@albanD pointed out that DoubleTensor would be easily confused with `torch.FloatTensor` and friends). ----- original description below ----- The main purpose of this PR is to fix the "ordering problem" between torch_dispatch modes, where we want to ensure that our Fake and Proxy dispatch modes always run after any dispatch modes created by the user, regardless of where they are in the stack. See this doc for more details: https://docs.google.com/document/d/1COQ291nOZvtFnzGTQMJqoYZ3sttEYFw_7HbfSyL8gcA/edit Full set of changes below. I ended up including a few semi-related changes in this PR that I documented - but if folks would rather I separate them out, happy to try to do that. (1) Add dedicated TLS slots for FakeTensorMode and ProxyTensorMode This is the main component of this PR. There are two new slots, `TorchDispatchModeTLS.fake_mode_` and `TorchDispatchModeTLS.proxy_mode_`, which correspond to a single "global" fake and proxy mode. There is now an invariant that `torchDispatchModeState.stack_` can never contain either of these modes. I also added a `TorchDispatchModeTLS::maybe_highest_mode()` helper that consults the `stack_` as well as both the proxy and fake slots, and returns the highest priority mode - this is because there are a few places in the codebase where we legitimately want to get the highest priority mode, including fake or proxy, if one is set. This also made the implementations of the existing `disable_proxy_modes_tracing()` and `get_innermost_proxy_mode()` marginally simpler. (2) Updated the dispatching logic in handle_torch_function_no_python_arg_parser() This is the function that actually figures out which torch_dispatch implementation to call, given the current mode stack and tensor subclass inputs. This function got marginally more complicated as part of the refactor: First we inspect the mode stack and any non-fake subclass inputs. Then we check for the proxy mode slot. Then we check for the Fake mode slot, before finally checking for any fake subclass inputs. (3) new python `_get_fake_tensor_mode()` and `_get_proxy_tensor_mode()` API's Before, if you wanted to see if proxy or fake modes were active in python, you would have to consult the mode stack. Since these two modes are no longer part of the actual mode stack, I added two new API's to directly check if either proxy or fake modes are active. (4) Allow traceable tensor subclasses to access storages from python This is convenient later in the stack, where AOTAutograd needs to detect aliasing of inputs and outputs, where those inputs and outputs might be tensor subclasses. Previously, `x.untyped_storage()` would raise an error if `x` was a subclass. In this PR, I tried to relax this constraint as little as possible: `THPVariable_storage()` will only try to return a storage to python if the tensor subclass that you are passing in is "traceable" (5) Fixed subclass fakeification @wanchaol recently added support to be able to fakeify tensor subclasses. That fakeification logic works in most cases, but there is one case it doesn't handle: autograd metadata. In particular, since autograd sees our tensor subclasses and not their desugared tensors, we need to make sure that our fakeified subclass has the same autograd metadata as the original subclass. I updated `meta_utils.py` to make sure that the autograd metadata is correct. (6) make tensor subclasses resizeable Previously we didn't allow tensor subclasses to be resizeable. I ran into an issue where fakeifying a tensor subclass occasionally requires swapping out its storage, which can involve resizing the tensor. Mechanically, this required updating `at::for_blob()` to expose a way to request that the tensor that you create has resizeable storage, and then using this new API in `_make_wrapper_tensor()`. (7) Added a basic DoubleTensor subclass for testing I use this subclass more later in this stack in my AOTAutograd tests - but it serves as a simple subclass example to test the dispatch ordering in this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104482 Approved by: https://github.com/ezyang ghstack dependencies: #107415	2023-08-29 02:36:48 +00:00
PyTorch MergeBot	8cbf77585d	Revert "[1/N] fix clang-tidy warnings in torch/csrc (#107648 )" This reverts commit `49eeca00d1`. Reverted https://github.com/pytorch/pytorch/pull/107648 on behalf of https://github.com/osalpekar due to This causes breakages due to underspecified type ([comment](https://github.com/pytorch/pytorch/pull/107648#issuecomment-1696372588))	2023-08-28 20:35:12 +00:00
cyy	d9fb7166d6	[BE] use DeviceIndex instead of int64_t for related device interfaces (#103068 ) This PR unifies the device interfaces in aten/cpp and torch/csrc/cpp to use c10::DeviceIndex. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103068 Approved by: https://github.com/malfet	2023-08-25 20:16:14 +00:00
cyy	49eeca00d1	[1/N] fix clang-tidy warnings in torch/csrc (#107648 ) Apply fixes to some found issues by clang-tidy in torch/csrc. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107648 Approved by: https://github.com/Skylion007	2023-08-25 00:30:09 +00:00
soulitzer	d7130e9704	Add SingletonSymIntNode (#107089 ) Adds `SingletonSymNodeImpl` (alternatively, `SkolemSymNodeImpl`). This is a int-like object that only allows the`eq` operation; any other operation produces an error. The main complexity is that we require operations that dispatch to SymNode must take and return SymNodes, but when performing operations involving `SingletonSymNodeImpl`, operations involving SymNode can return non-SymNode bools. For more discussion see [here](https://docs.google.com/document/d/18iqMdnHlUnvoTz4BveBbyWFi_tCRmFoqMFdBHKmCm_k/edit) - Introduce `ConstantSymNodeImpl` a generalization of `LargeNegativeIntSymNodeImpl` and replace usage of `LargeNegativeIntSymNodeImpl` in SymInt. - Also use ConstantSymNodeImpl to enable SymBool to store its data on a SymNode. Remove the assumption that if SymBool holds a non-null SymNode, it must be symbolic. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107089 Approved by: https://github.com/ezyang ghstack dependencies: #107839	2023-08-24 21:38:47 +00:00
Yukio Siraichi	bcede143bd	Do not mutate `SymNode` expression. (#107492 ) This PR stops `SymNode` from mutating (i.e. simplifying) its expression. Instead, the simplification (without mutation) is deferred to the `SymNode.maybe_as_int` method. ```python - FakeTensor(size=(s0,), ...) - FakeTensor(size=(s1, s2, s3), ...) - Eq(s0, s1 + s2 + s3) - FakeTensor(size=(s0,), ...) - FakeTensor(size=(s1, s2, s3), ...) ``` In summary, this PR: - Replaces `SymNode._expr` by `SymNode.expr`, removing the old property function - This makes it so `SymNode` instances never update their expression - Creates `SymNode.simplified_expr()` method for actually calling `ShapeEnv.replace` on its expression. Note that this doesn't updates `SymNode.expr` - Changes how `tensor.size()` gets converted to its Python `torch.Size` type - Instead of calling `SymInt::maybe_as_int()` method, we create a new `SymInt::is_symbolic()` method for checking whether it is actually a symbolic value - This is needed so that when we call `tensor.size()` in the Python side, the returned sequence is faithful to the actual data, instead of possibly simplifying it and returning an integer - 2 files needs this modification: - _torch/csrc/Size.cpp_: for handling `torch.Tensor.size` Python calls - _torch/csrc/utils/pybind.cpp_: for handling `symint.cast()` C++ calls Pull Request resolved: https://github.com/pytorch/pytorch/pull/107492 Approved by: https://github.com/ezyang ghstack dependencies: #107523	2023-08-22 12:38:05 +00:00
Alan Ji	2d727c8c3f	remove the duplicate method `is_private_use1` in class Device (#107198 ) In the `Device` class, there are two methods with similar functions called `is_private_use1` and `is_privateuseone`. `ddf36c82b8/c10/core/Device.h (L84-L87)` `ddf36c82b8/c10/core/Device.h (L159-L162)` The former is not being utilized and therefore, this PR removes it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107198 Approved by: https://github.com/bdhirsh	2023-08-17 18:23:29 +00:00
PyTorch MergeBot	6c0bba3daf	Revert "Use cpuinfo to determine c10::ThreadPool thread number (#107010 )" This reverts commit `ad0476540d`. Reverted https://github.com/pytorch/pytorch/pull/107010 on behalf of https://github.com/izaitsevfb due to Breaks internal meta builds ([comment](https://github.com/pytorch/pytorch/pull/107010#issuecomment-1679866821))	2023-08-16 02:20:31 +00:00
Edward Z. Yang	e1ee10e6f5	Add expect_true for irrefutable guards (#106720 ) Here's what it does from the comments: ``` Assume that a boolean is true for the purposes of subsequent symbolic reasoning. This will keep track of corresponding runtime checks to verify that the result is upheld: either as a regular guard, or as a special set of asserts which are triggered when an unbacked SymInt is allocated. DO NOT use this function for these cases: - This is inappropriate for "branching" conditions (where both true and false result in valid programs). We will always assume the condition evaluates true, and so it will never be possible to trace the false condition when you use it. For true branching on unbacked SymInts, you must use torch.cond. - This is inappropriate for situations where you know some other system invariant guarantees that this property holds, since you don't really need to insert a runtime check in that case. Use something like constrain_range in that case. This API has a hitch. To avoid having to reimplement error reporting capabilities, this function CAN return False. The invariant is that the surrounding code must raise an error when this function returns False. This is quite low level, so we recommend using other functions like check() which enforce this in a more intuitive way. By the way, this name is a nod to the __builtin_expect likely macro, which is used similarly (but unlike __builtin_expect, you MUST fail in the unlikely branch.) ``` We don't do anything with this right now, except use it to discharge regular guards. Follow up PRs to (1) use it at important error checking sites, (2) actually ensure the runtime asserts make there way into the exported IR / inductor generated code. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/106720 Approved by: https://github.com/ysiraichi, https://github.com/voznesenskym	2023-08-15 18:42:22 +00:00
cyy	ad0476540d	Use cpuinfo to determine c10::ThreadPool thread number (#107010 ) This PR prefers "logical processor number" (the cpu cores shown in htop) returned by cpuinfo for determining c10 thread number. If that fails, it uses hardware_concurrency exactly. The motivation is that in a x86 host with 64 cores and Hyper-Threading disabled, the current behavior uses 32 threads, resulting half of cores being idle. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107010 Approved by: https://github.com/ezyang	2023-08-15 17:26:24 +00:00
Nikita Shulga	22a20d0850	Add `isFloat8Type` predicate (#106977 ) And make float8 dtypes part of `isReducedFloatingType()` predicate Pull Request resolved: https://github.com/pytorch/pytorch/pull/106977 Approved by: https://github.com/albanD	2023-08-11 20:13:57 +00:00
Nikita Shulga	97396cdbb2	Fix undefined behavior detected by clang-12 (#106354 ) Compiler behavior when non-zero offset is added to a null pointer is undefined and is a bad habit. - When `lapackEig` is called with to estimate a workspace size, do not add matrix size to the W pointer. - When `unpack_pivots_cpu_kernel` with zero `dim_size` exit early. - When `topk_impl_loop` is called with `k` is zero, exit right away as output tensors are empty anyway. - Ignore adding non-zero storage-offset in `TensorImpl::data_ptr_impl_impl`, which can be the case if tensor is created as `torch.empty(3)[4:]`. - In `s_addmm_out_sparse_dense_worker` do not call `axpy` over an empty vector. - In `_sparse_binary_op_intersection_kernel_impl` do skip computing `ptr_indices_dim` when `sparse_dim` is empty. - Exit `grid_sample` forward/backward kernels earlier if either `input` or `grid` are empty tensors. Found by asan in clang-12 Before the change UBSan report looks as follows: ``` ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-12/bin/llvm-symbolizer UBSAN_OPTIONS=print_stacktrace=1 LD_PRELOAD=/usr/lib/llvm-12/lib/clang/12.0.1/lib/linux/libclang_rt.asan-x86_64.so python test_fx_experimental.py -v -k test_normalize_operator_exhaustive_linalg_eig_cpu_float32 Test results will be stored in test-reports/python-unittest/test_fx_experimental Running tests... ---------------------------------------------------------------------- test_normalize_operator_exhaustive_linalg_eig_cpu_float32 (__main__.TestNormalizeOperatorsCPU) ... /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:111: UserWarning: 'has_cuda' is deprecated, please use 'torch.backends.cuda.is_built()' torch.has_cuda, /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:112: UserWarning: 'has_cudnn' is deprecated, please use 'torch.backends.cudnn.is_available()' torch.has_cudnn, /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:118: UserWarning: 'has_mps' is deprecated, please use 'torch.backends.mps.is_built()' torch.has_mps, /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:119: UserWarning: 'has_mkldnn' is deprecated, please use 'torch.backends.mkldnn.is_available()' torch.has_mkldnn, /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:937:17: runtime error: applying non-zero offset 20 to null pointer #0 0x7f2025794888 in void at::native::lapackEig<float, float>(char, char, int, float, int, float, float, int, float, int, float, int, float, int) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9945888) #1 0x7f20257da256 in void at::native::(anonymous namespace)::apply_linalg_eig<float>(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x998b256) #2 0x7f20257d902d in at::native::(anonymous namespace)::linalg_eig_kernel(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor const&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x998a02d) #3 0x7f20257b5b3d in at::native::linalg_eig_out_info(at::Tensor const&, at::Tensor&, at::Tensor&, at::Tensor&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9966b3d) #4 0x7f20257b4770 in at::native::linalg_eig_out(at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9965770) #5 0x7f20280710e6 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor&, at::Tensor&> (at::Tensor const&, at::Tensor&, at::Tensor&), &(at::(anonymous namespace)::(anonymous namespace)::wrapper_CPU_out_linalg_eig_out(at::Tensor const&, at::Tensor&, at::Tensor&))>, std::tuple<at::Tensor&, at::Tensor&>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor&, at::Tensor&> >, std::tuple<at::Tensor&, at::Tensor&> (at::Tensor const&, at::Tensor&, at::Tensor&)>::call(c10::OperatorKernel, c10::DispatchKeySet, at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xc2220e6) #6 0x7f202727a045 in at::_ops::linalg_eig_out::call(at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb42b045) #7 0x7f20257b7e29 in at::native::linalg_eig(at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9968e29) #8 0x7f2028070bf0 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor> (at::Tensor const&), &(at::(anonymous namespace)::(anonymous namespace)::wrapper_CPU__linalg_eig(at::Tensor const&))>, std::tuple<at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&> >, std::tuple<at::Tensor, at::Tensor> (at::Tensor const&)>::call(c10::OperatorKernel, c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xc221bf0) #9 0x7f2026b1f787 in std::tuple<at::Tensor, at::Tensor> c10::Dispatcher::redispatch<std::tuple<at::Tensor, at::Tensor>, at::Tensor const&>(c10::TypedOperatorHandle<std::tuple<at::Tensor, at::Tensor> (at::Tensor const&)> const&, c10::DispatchKeySet, at::Tensor const&) const (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xacd0787) #10 0x7f20273230a7 in at::_ops::linalg_eig::redispatch(c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb4d40a7) #11 0x7f202c3cc32d in torch::autograd::VariableType::(anonymous namespace)::linalg_eig(c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1057d32d) #12 0x7f202c3cba96 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&), &(torch::autograd::VariableType::(anonymous namespace)::linalg_eig(c10::DispatchKeySet, at::Tensor const&))>, std::tuple<at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&> >, std::tuple<at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&)>::call(c10::OperatorKernel, c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1057ca96) #13 0x7f20272798e0 in at::_ops::linalg_eig::call(at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb42a8e0) #14 0x7f2043d97ae3 in torch::autograd::THPVariable_linalg_eig(_object, _object, _object*) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_python.so+0x23feae3) #15 0x5072d6 in cfunction_call /usr/local/src/conda/python-3.9.17/Objects/methodobject.c:543:19 ... SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:937:17 in ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/106354 Approved by: https://github.com/huydhn, https://github.com/lezcano	2023-08-03 05:36:03 +00:00
Jun Luo	17a3141696	Support is_mtia() (#106396 ) Summary: As title. Test Plan: CI tests. Reviewed By: yuhc Differential Revision: D47937061 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106396 Approved by: https://github.com/yuhc, https://github.com/ezyang	2023-08-02 03:24:23 +00:00
cyy	b3e24c53eb	use performance-unnecessary-value-param in clang-tidy (#102615 ) performance-unnecessary-value-param has been disabled in clang-tidy for a long time. However, this check is actually useful and able to some interesting performance problems. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102615 Approved by: https://github.com/malfet, https://github.com/Skylion007	2023-07-28 17:37:03 +00:00
cyy	646fa36875	Add const reference in opportunities detected by clang-tidy (#105931 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105931 Approved by: https://github.com/Skylion007	2023-07-26 21:38:10 +00:00
Amadeusz Skrzypczak	b64bd4a5dd	Add torch.float8_e5m2 and torch.float8_e4m3 data types (#104242 ) Proposal of two float8 variants - e5m2 and e4m3 - based on https://arxiv.org/pdf/2209.05433.pdf Hide all Float8 operator implementations behind `#if !defined(C10_MOBILE)` guard to keep Android build size almost unchanged TODO: - Refactor duplicated code - Cleanup unbalanced pragma pop in dtype utils - Add native implementation on the CUDA size Co-authored-by: Nikita Shulga <nshulga@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/104242 Approved by: https://github.com/albanD	2023-07-20 16:09:11 +00:00
PyTorch MergeBot	f2b15772ff	Revert "Add torch.float8_e5m2 and torch.float8_e4m3 data types (#104242 )" This reverts commit `a9804130e5`. Reverted https://github.com/pytorch/pytorch/pull/104242 on behalf of https://github.com/PaliC due to breaks lint (run lintrunner and remerge) ([comment](https://github.com/pytorch/pytorch/pull/104242#issuecomment-1644150284))	2023-07-20 15:37:53 +00:00
Amadeusz Skrzypczak	a9804130e5	Add torch.float8_e5m2 and torch.float8_e4m3 data types (#104242 ) Proposal of two float8 variants - e5m2 and e4m3 - based on https://arxiv.org/pdf/2209.05433.pdf Hide all Float8 operator implementations behind `#if !defined(C10_MOBILE)` guard to keep Android build size almost unchanged TODO: - Refactor duplicated code - Cleanup unbalanced pragma pop in dtype utils - Add native implementation on the CUDA size Co-authored-by: Nikita Shulga <nshulga@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/104242 Approved by: https://github.com/albanD	2023-07-20 09:45:45 +00:00
Edward Z. Yang	1152e86da1	Transmute refined SymInt into int (#104828 ) Previously, x.size(0) could return a SymInt, even when the internal sympy expression was actually already constant (e.g., due to an introduced guard.) We now allow to query the Python object with maybe_as_int which allows us to transmute these objects back to int when possible. It is still possible to end up with a constant SymInt even after this change, e.g., if you get out a SymInt and while holding onto it specialize it, but casual users are more likely to get ints when they want to. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/104828 Approved by: https://github.com/Skylion007	2023-07-15 18:46:10 +00:00
PyTorch MergeBot	1c69f363c4	Revert "Transmute refined SymInt into int (#104828 )" This reverts commit `0f322a300e`. Reverted https://github.com/pytorch/pytorch/pull/104828 on behalf of https://github.com/ezyang due to executorch failure ([comment](https://github.com/pytorch/pytorch/pull/104828#issuecomment-1635997559))	2023-07-14 15:08:11 +00:00
Edward Z. Yang	0f322a300e	Transmute refined SymInt into int (#104828 ) Previously, x.size(0) could return a SymInt, even when the internal sympy expression was actually already constant (e.g., due to an introduced guard.) We now allow to query the Python object with maybe_as_int which allows us to transmute these objects back to int when possible. It is still possible to end up with a constant SymInt even after this change, e.g., if you get out a SymInt and while holding onto it specialize it, but casual users are more likely to get ints when they want to. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/104828 Approved by: https://github.com/Skylion007	2023-07-13 07:02:52 +00:00
PyTorch MergeBot	06a5df8d31	Revert "Transmute refined SymInt into int (#104828 )" This reverts commit `4694f54356`. Reverted https://github.com/pytorch/pytorch/pull/104828 on behalf of https://github.com/ezyang due to broke inductor ([comment](https://github.com/pytorch/pytorch/pull/104828#issuecomment-1633049980))	2023-07-12 18:57:58 +00:00
Edward Z. Yang	4694f54356	Transmute refined SymInt into int (#104828 ) Previously, x.size(0) could return a SymInt, even when the internal sympy expression was actually already constant (e.g., due to an introduced guard.) We now allow to query the Python object with maybe_as_int which allows us to transmute these objects back to int when possible. It is still possible to end up with a constant SymInt even after this change, e.g., if you get out a SymInt and while holding onto it specialize it, but casual users are more likely to get ints when they want to. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/104828 Approved by: https://github.com/Skylion007	2023-07-12 16:40:21 +00:00
Jason Ansel	dffcf999bd	Misc changes from compiled autograd branch (#104316 ) This PR pulls out some standalone changes from #103822 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104316 Approved by: https://github.com/ezyang	2023-07-08 20:59:20 +00:00
Anthony Alayo	8d65635378	Prefixing DeviceType with c10 namespace to avoid name collisions (#104364 ) Fixes #91338 Follow up from https://github.com/pytorch/pytorch/pull/91342 > 🚀 The feature, motivation and pitch > We have an existing DeviceType class all over the place in our code base, and it conflicts with the one that is used in torch. > Thankfully the pytorch DeciceType enum class is under the c10 namespace. ``` In file included from /xxx/build/_deps/torch-src/../../aten/src/ATen/ops/view.h:5: /xxx/_deps/torch-src/aten/src/ATen/Context.h:265:14: error: reference to 'DeviceType' is ambiguous if (p == DeviceType::HIP) { ^ /xxx/include/Common_types.h:178:8: note: candidate found by name lookup is 'DeviceType' struct DeviceType { ^ /xxx/build/_deps/torch-src/c10/../c10/core/DeviceType.h:32:12: note: candidate found by name lookup is 'c10::DeviceType' enum class DeviceType : int8_t { ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/104364 Approved by: https://github.com/albanD	2023-07-07 13:23:03 +00:00
Elias Ellison	7bb40be143	Fix fake tensor for private use backends (#103090 ) Fix for https://github.com/pytorch/pytorch/issues/101244 We need meta to be higher priority than PrivateUse1 (as it is for cpu and cuda) so that when meta is in tls we hit meta kernel. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103090 Approved by: https://github.com/bdhirsh	2023-06-27 21:17:40 +00:00
Xu Han	6c1ccccf21	Enable mimalloc on pytorch Windows (#102595 ) This PR is implemention of [#102534](https://github.com/pytorch/pytorch/issues/102534), option 2. Major changes: 1. Add mimalloc to the submodule. 2. Add build option "USE_MIMALLOC". 3. It is only enabled on Windows build, And it would improve pytorch memory allocation performance. Additional Test: <img width="953" alt="image" src="https://github.com/pytorch/pytorch/assets/8433590/4b2ec2dc-16f1-4ad9-b457-cfeb37e489d3"> This PR also build & static link mimalloc on Linux well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102595 Approved by: https://github.com/jgong5, https://github.com/malfet	2023-06-27 08:53:26 +00:00
Meghan	6ff4548b6e	[AMP] Support XLA:TPU (#96370 ) With https://github.com/pytorch/xla/pull/5148, https://github.com/pytorch/xla/pull/4740 With these changes XLA:GPU users should use `torch.cuda.amp.autocast()` for AMP with float16 XLA:TPU users should use `torch.amp.autocast('xla')` for AMP with bfloat16 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96370 Approved by: https://github.com/bdhirsh, https://github.com/malfet	2023-06-23 19:46:42 +00:00

... 2 3 4 5 6 ...

1359 Commits