Commit Graph

369 Commits

Author SHA1 Message Date
Yuanyuan Chen
f231be25c6 Mark unused parameters in C++ code (#164912)
This PR adds unused parameter name comments in C++ declarations to improve code readability.
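A rough illustration of the convention (invented names, not from the PR):

```cpp
// Invented example: the unused parameter keeps its name as a comment, which
// documents intent and silences -Wunused-parameter without [[maybe_unused]].
class Visitor {
 public:
  // Before: virtual void visit(int node_id, bool verbose) {}  // warns
  virtual void visit(int /*node_id*/, bool /*verbose*/) {}
  virtual ~Visitor() = default;
};
```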

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164912
Approved by: https://github.com/Skylion007
2025-10-09 06:23:25 +00:00
PyTorch MergeBot
331191ce4b Revert "[BE] Make PyObjectSlot use a global PyInterpreter (#162659)"
This reverts commit 29cbcbac42.

Reverted https://github.com/pytorch/pytorch/pull/162659 on behalf of https://github.com/izaitsevfb due to reverted internally, see [D83214133](https://www.internalfb.com/diff/D83214133) ([comment](https://github.com/pytorch/pytorch/pull/162659#issuecomment-3369348172))
2025-10-05 21:39:57 +00:00
PaliC
29cbcbac42 [BE] Make PyObjectSlot use a global PyInterpreter (#162659)
This PR gets rid of the pyobj_interpreter_ variable from PyObjectSlot and saves a word in the process.
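A minimal sketch of the space saving, assuming the member name from the PR text and an invented global accessor:

```cpp
#include <atomic>

// Opaque stand-ins for the real CPython/c10 types.
struct PyObject;
struct PyInterpreter;

// Before: every PyObjectSlot carried its own interpreter pointer.
struct PyObjectSlotBefore {
  std::atomic<PyInterpreter*> pyobj_interpreter_;
  PyObject* pyobj_;
};

PyInterpreter* globalPyInterpreter();  // hypothetical process-wide accessor

// After: the interpreter is global, so the slot shrinks by one word.
struct PyObjectSlotAfter {
  PyObject* pyobj_;
};

static_assert(sizeof(PyObjectSlotBefore) - sizeof(PyObjectSlotAfter) == sizeof(void*),
              "dropping the per-slot pointer saves exactly one word");
```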

Gonna ask for review from @huydhn as there are some changes to CI.

Testing: imported internally and the failed android build seems to work now!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162659
Approved by: https://github.com/albanD, https://github.com/huydhn
2025-09-25 08:53:19 +00:00
PyTorch MergeBot
edafc902d7 Revert "[BE] Make PyObjectSlot use a global PyInterpreter (#162659)"
This reverts commit d1993c27ae.

Reverted https://github.com/pytorch/pytorch/pull/162659 on behalf of https://github.com/wdvr due to reverted internally, please see D82771705 @PaliC ([comment](https://github.com/pytorch/pytorch/pull/162659#issuecomment-3317110247))
2025-09-22 06:22:37 +00:00
Sahan Paliskara
d1993c27ae [BE] Make PyObjectSlot use a global PyInterpreter (#162659)
This PR gets rid of the pyobj_interpreter_ variable from PyObjectSlot and saves a word in the process.

Gonna ask for review from @huydhn as there are some changes to CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162659
Approved by: https://github.com/albanD, https://github.com/huydhn
2025-09-17 16:40:55 +00:00
PyTorch MergeBot
4db203f875 Revert "[BE] Make PyObjectSlot use a global PyInterpreter (#162659)"
This reverts commit 05ee8114f8.

Reverted https://github.com/pytorch/pytorch/pull/162659 on behalf of https://github.com/jeanschmidt due to seems to have introduced errors in linting see https://github.com/pytorch/pytorch/actions/runs/17750689989/job/50444910643 ([comment](https://github.com/pytorch/pytorch/pull/162659#issuecomment-3298626136))
2025-09-16 12:52:57 +00:00
PaliC
05ee8114f8 [BE] Make PyObjectSlot use a global PyInterpreter (#162659)
This PR gets rid of the pyobj_interpreter_ variable from PyObjectSlot and saves a word in the process.

Gonna ask for review from @huydhn as there are some changes to CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162659
Approved by: https://github.com/albanD, https://github.com/huydhn
2025-09-16 00:37:09 +00:00
FFFrog
2beffb3311 Refactoring TensorImpl by using constexpr and std::is_same_v (#161043)
As the title stated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161043
Approved by: https://github.com/Skylion007
2025-08-22 10:49:49 +00:00
Isalia20
7f4cb4a3e0 [MPS] coalesce for sparse tensors (#159729)
MPS coalesce function for sparse tensors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159729
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-08-08 13:49:55 +00:00
Laith Sakka
7cfd054075 [attempt 2] Compute contiguity symbolically to avoid dde, and introduce c++ sym_is_contiguous (#157472)
Summary:
When we compute contiguity for a tensor with dynamic shapes we first:
1) Try to compute it without guarding.
2) If all shapes are hinted, compute it with potentially adding guards.
3) If any input is not hinted, compute it symbolically.

`sym_is_contiguous` returns a SymBool that is then either evaluated directly, or `guard_or_false` can be called on it to avoid data-dependent errors.

For example:
bool is_contiguous = input.sym_is_contiguous().guard_or_false(__FILE__, __LINE__);
`is_contiguous_or_false` is a helper function that wraps exactly this pattern.

In this PR I only handle default contiguity; changes for other formats like channel_last will follow. We use this pattern in several locations to avoid DDEs.
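A slightly fuller sketch of how the returned SymBool is consumed; treat the exact signatures as assumptions based on the calls named above:

```cpp
#include <ATen/core/Tensor.h>
#include <c10/core/SymBool.h>

// Signatures assumed from the PR text, not a definitive API reference.
bool contiguous_or_false(const at::Tensor& input) {
  c10::SymBool sb = input.sym_is_contiguous();
  // guard_or_false: answer "false" when the SymBool has no hint, instead of
  // raising a data-dependent error. When installing a shape guard is
  // acceptable, sb.guard_bool(__FILE__, __LINE__) forces a definite answer.
  return sb.guard_or_false(__FILE__, __LINE__);
}
```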

Test Plan:
contbuild & OSS CI,

Rollback Plan:

Reviewed By: malfet

Differential Revision: D77639021

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157472
Approved by: https://github.com/aorenste
2025-07-02 23:12:29 +00:00
PyTorch MergeBot
c6a27bae36 Revert "[do not revert] Compute contiguity symbolically to avoid dde, and introduce c++ sym_is_contiguous (#155590)"
This reverts commit d0a9629435.

Reverted https://github.com/pytorch/pytorch/pull/155590 on behalf of https://github.com/laithsakka due to being asked to land this internally ([comment](https://github.com/pytorch/pytorch/pull/155590#issuecomment-3025796794))
2025-07-01 22:58:14 +00:00
Laith Sakka
d0a9629435 [do not revert] Compute contiguity symbolically to avoid dde, and introduce c++ sym_is_contiguous (#155590)
When we compute contiguity for a tensor with dynamic shapes we first:
1) Try to compute it without guarding.
2) If all shapes are hinted, compute it with potentially adding guards.
3) If any input is not hinted, compute it symbolically.

`sym_is_contiguous` returns a SymBool that is then either evaluated directly, or `guard_or_false` can be called on it to avoid data-dependent errors.

For example:
bool is_contiguous = input.sym_is_contiguous().guard_or_false(__FILE__, __LINE__);
`is_contiguous_or_false` is a helper function that wraps exactly this pattern.

In this PR I only handle default contiguity; changes for other formats like channel_last will follow. We use this pattern in several locations to avoid DDEs.
Differential Revision: [D77183032](https://our.internmc.facebook.com/intern/diff/D77183032)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155590
Approved by: https://github.com/ezyang
2025-07-01 21:39:38 +00:00
PyTorch MergeBot
1586521461 Revert "Compute contiguity symbolically to avoid dde, and introduce c++ sym_is_contiguous (#155590)"
This reverts commit 2c76f31221.

Reverted https://github.com/pytorch/pytorch/pull/155590 on behalf of https://github.com/jeanschmidt due to Breaking 1000s of internal builds, it cant be properly landed internally, there are no options except revert and codev. ([comment](https://github.com/pytorch/pytorch/pull/155590#issuecomment-3023503929))
2025-07-01 11:23:00 +00:00
Laith Sakka
2c76f31221 Compute contiguity symbolically to avoid dde, and introduce c++ sym_is_contiguous (#155590)
When we compute contiguity for a tensor with dynamic shapes we first:
1) Try to compute it without guarding.
2) If all shapes are hinted, compute it with potentially adding guards.
3) If any input is not hinted, compute it symbolically.

`sym_is_contiguous` returns a SymBool that is then either evaluated directly, or `guard_or_false` can be called on it to avoid data-dependent errors.

For example:
bool is_contiguous = input.sym_is_contiguous().guard_or_false(__FILE__, __LINE__);
`is_contiguous_or_false` is a helper function that wraps exactly this pattern.

In this PR I only handle default contiguity; changes for other formats like channel_last will follow. We use this pattern in several locations to avoid DDEs.
Differential Revision: [D77183032](https://our.internmc.facebook.com/intern/diff/D77183032)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155590
Approved by: https://github.com/ezyang
2025-06-27 04:59:52 +00:00
Xuehai Pan
402ae09e41 [BE] fix typos in c10/ (#156078)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156078
Approved by: https://github.com/malfet, https://github.com/cyyever
2025-06-18 10:24:44 +00:00
Laith Sakka
2585960b47 remove redundant type_id (#155539)
Those were added in https://github.com/pytorch/pytorch/pull/92229 to prevent confusion between overloads,
but the variants that accept SymBool were all removed in https://github.com/pytorch/pytorch/pull/112890
with the introduction of SymbolicShapeMeta.
Hence that dummy arg is no longer needed.
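For context, a generic illustration of the dummy-parameter pattern being removed (invented names, not the actual c10 signatures):

```cpp
// Invented names: a tag parameter whose only job is to keep two
// easily-confused overloads from matching the same call via implicit
// conversions.
struct SymBoolLike {
  SymBoolLike(bool) {}  // implicitly constructible from bool
};
struct TypeIdTag {};

void set_flag(bool v);                    // plain overload
void set_flag(SymBoolLike v, TypeIdTag);  // must be spelled set_flag(x, TypeIdTag{})

// Once the SymBoolLike overload family is deleted, the tag carries no
// information and can be dropped -- which is what this commit does for type_id.
```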

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155539
Approved by: https://github.com/ezyang
2025-06-11 08:46:56 +00:00
Nikita Shulga
c4d1ff02f8 [Lint] Update clang-format to 19.1.4 (#153889)
All changes other than the one to `tools/linter/adapters/s3_init_config.json` are generated by newer clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153889
Approved by: https://github.com/cyyever, https://github.com/atalman
2025-05-20 14:12:46 +00:00
Dylan Maloy
8e0f9fbccf [c10] helpers for runtime c10::alias re-use (#151361)
Summary: we need these helpers to check whether the input tensor was resized or re-strided between executions when deciding whether to alias.

Test Plan: CI

Reviewed By: henryoier

Differential Revision: D73061676

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151361
Approved by: https://github.com/SherlockNoMad
2025-04-17 20:27:17 +00:00
cyy
f4f0f2995d Fix Wextra-semi warnings (#139000)
Fixes #ISSUE_NUMBER
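For reference, the kind of stray semicolon `-Wextra-semi` flags (illustrative):

```cpp
struct Foo {
  void bar() {};  // extra ';' after the function body triggers -Wextra-semi
  void baz() {}   // fixed: no trailing semicolon
};
```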

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139000
Approved by: https://github.com/ezyang
2024-10-28 21:48:51 +00:00
Richard Barnes
42994234a6 std::value/std::type -> std::_v/std::_t (#138746)
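The title is shorthand for replacing trait member access with the `_v`/`_t` alias forms, e.g.:

```cpp
#include <type_traits>

// Before the codemod: spell out ::value / ::type.
static_assert(std::is_same<int, int>::value, "same");
using A = typename std::remove_const<const int>::type;

// After: the C++14/17 alias templates.
static_assert(std::is_same_v<int, int>, "same");
using B = std::remove_const_t<const int>;
```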
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138746
Approved by: https://github.com/cyyever, https://github.com/malfet
2024-10-26 20:59:24 +00:00
cyy
38d3c27849 [1/N] Enable cppcoreguidelines-special-member-functions (#137405)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137405
Approved by: https://github.com/ezyang
2024-10-23 00:16:53 +00:00
Jessica Vandebon
baff86dafb [MTIA tensor] allow shallow copy between CPU and MTIA tensors (#135871)
Reviewed By: egienvalue, hanzlfs

Differential Revision: D61662214

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135871
Approved by: https://github.com/egienvalue, https://github.com/nautsimon
2024-09-13 22:13:58 +00:00
cyy
f83ef69b84 Fix typo in assignment operators (#131890)
Most typos were introduced in #131077
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131890
Approved by: https://github.com/Skylion007
2024-07-27 11:13:42 +00:00
cyyever
451462dbff [1/N] Add missing constructors or assignment operators (#131077)
Just mark them as deleted in most cases.
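What "mark them as deleted" looks like in practice, on an illustrative class (not code from the PR):

```cpp
// cppcoreguidelines-special-member-functions: once any special member is
// user-declared, declare them all. Deleting the rest is the simplest fix.
class Registry {
 public:
  Registry() = default;
  ~Registry();                                    // user-declared destructor...
  Registry(const Registry&) = delete;             // ...so the other four are
  Registry& operator=(const Registry&) = delete;  // declared explicitly
  Registry(Registry&&) = delete;
  Registry& operator=(Registry&&) = delete;
};
```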
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131077
Approved by: https://github.com/ezyang
2024-07-24 12:09:39 +00:00
cyy
1d1d074072 [3/N] Fix Wunused-parameter warnings (#131271)
Follows #131170

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131271
Approved by: https://github.com/ezyang
2024-07-20 23:31:03 +00:00
cyy
f4dcf2ae93 [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301
Approved by: https://github.com/ezyang, https://github.com/r-barnes
2024-07-08 07:03:53 +00:00
rzou
08b616281f [custom ops] Switch out references from old landing page to new landing page (#129178)
Test Plan:
- existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129178
Approved by: https://github.com/albanD
ghstack dependencies: #129177
2024-06-21 13:31:40 +00:00
PyTorch MergeBot
846bb30e13 Revert "[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)"
This reverts commit bd72e28314.

Reverted https://github.com/pytorch/pytorch/pull/128301 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it fails XLA build bd72e28314. Please rebase your PR before relanding because I think the failure is hidden by an unrelated broken trunk XLA failure from your current base commit ([comment](https://github.com/pytorch/pytorch/pull/128301#issuecomment-2169035822))
2024-06-15 01:58:20 +00:00
cyy
bd72e28314 [1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301
Approved by: https://github.com/ezyang
2024-06-14 23:21:01 +00:00
rzou
c9beea13ac Rewrite existing links to custom ops gdocs with the landing page (#127423)
NB: these links will be live after the docs build happens, which is once
a day.

Test Plan:
- existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127423
Approved by: https://github.com/jansel, https://github.com/williamwen42
ghstack dependencies: #127291, #127292, #127400
2024-05-30 14:54:29 +00:00
Richard Barnes
ed327876f5 [codemod] c10:optional -> std::optional (#126135)
Generated by running the following from PyTorch root:
```
find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/'
```

`c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi
2024-05-14 19:35:51 +00:00
David Berard
82edc8b5d5 [NT] Make NestedTensor register as having symbolic sizes/strides (#124687)
Fixes #123698

This PR makes TensorImpl::has_symbolic_sizes_strides return false for NestedTensors.

1. It passes in the actual sizes when we call `_make_wrapper_subclass` - this is the change that makes the subclass register as `has_symbolic_sizes_strides() == True`
2. It adds a field to `_make_wrapper_subclass` where an explicit `numel` can be provided. This allows us to skip the numel computation for the storage, which previously failed due to arithmetic on NestedInts.
3. Implements `aten::numel` for NJT - this is separate from the overridden numel in `make_wrapper_subclass` for now. Note also that this means that we leave `dispatch_sizes_strides_policy="sizes"`, so that we call into the custom `numel` implementation (as well as `sizes` and `strides`), because `numel` cannot currently be computed from `sizes` for NJT.

Note also that this depends on #121361, because calling TensorImpl::set_sizes_and_strides() tries to clone the sizes into the tensor, which means that we need `clone` to be implemented on NestedInt.

Differential Revision: [D57225736](https://our.internmc.facebook.com/intern/diff/D57225736)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124687
Approved by: https://github.com/albanD
2024-05-13 16:50:25 +00:00
Ashwin Hari
5f5778476a rename ort to maia (#123265)
Fixes #123264

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123265
Approved by: https://github.com/albanD
2024-04-23 00:33:25 +00:00
William Wen
38bfe7bcd1 add link to custom ops troubleshooting page on tensor data_ptr error (#124240)
Fix part of https://github.com/pytorch/pytorch/issues/123603.

Example traceback on branch https://github.com/pytorch/vision/compare/main...wwen/custom_ops_test:
```
running my_custom_op!
Traceback (most recent call last):
  File "/data/users/williamwen/torchvision/playground.py", line 13, in <module>
    print(opt_fn1(torch.randn(3, 3)))
  File "/data/users/williamwen/pytorch2/torch/_dynamo/eval_frame.py", line 387, in _fn
    return fn(*args, **kwargs)
  File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 977, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state, skip=1)
  File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 818, in _convert_frame
    result = inner_convert(
  File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 411, in _convert_frame_assert
    return _compile(
  File "/data/users/williamwen/pytorch2/torch/_utils_internal.py", line 70, in wrapper_function
    return function(*args, **kwargs)
  File "/data/users/williamwen/py310-env/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 700, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 266, in time_wrapper
    r = func(*args, **kwargs)
  File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 568, in compile_inner
    out_code = transform_code_object(code, transform)
  File "/data/users/williamwen/pytorch2/torch/_dynamo/bytecode_transformation.py", line 1116, in transform_code_object
    transformations(instructions, code_options)
  File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 173, in _fn
    return fn(*args, **kwargs)
  File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 515, in transform
    tracer.run()
  File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 2237, in run
    super().run()
  File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 875, in run
    while self.step():
  File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 790, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 492, in wrapper
    return inner_fn(self, inst)
  File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 1260, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 730, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/data/users/williamwen/pytorch2/torch/_dynamo/variables/torch.py", line 747, in call_function
    tensor_variable = wrap_fx_proxy(
  File "/data/users/williamwen/pytorch2/torch/_dynamo/variables/builder.py", line 1425, in wrap_fx_proxy
    return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs)
  File "/data/users/williamwen/pytorch2/torch/_dynamo/variables/builder.py", line 1510, in wrap_fx_proxy_cls
    example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True)
  File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1804, in get_fake_value
    raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None
  File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1736, in get_fake_value
    ret_val = wrap_fake_exception(
  File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1251, in wrap_fake_exception
    return fn()
  File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1737, in <lambda>
    lambda: run_node(tx.output, node, args, kwargs, nnmodule)
  File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1872, in run_node
    raise RuntimeError(make_error_message(e)).with_traceback(
  File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1854, in run_node
    return node.target(*args, **kwargs)
  File "/data/users/williamwen/pytorch2/torch/_ops.py", line 870, in __call__
    return self_._op(*args, **(kwargs or {}))
torch._dynamo.exc.TorchRuntimeError: Failed running call_function torchvision.my_custom_op1(*(FakeTensor(..., size=(3, 3)),), **{}):
The tensor has a non-zero number of elements, but its data is not allocated yet. Caffe2 uses a lazy allocation, so you will need to call mutable_data() or raw_mutable_data() to actually allocate memory.
If you're using torch.compile/export/fx, it is likely that we are erroneously tracing into a custom kernel. To fix this, please wrap the custom kernel into an opaque custom op. Please see the following for details: https://docs.google.com/document/d/1W--T6wz8IY8fOI0Vm8BF44PdBgs283QvpelJZWieQWQ

from user code:
   File "/data/users/williamwen/torchvision/playground.py", line 5, in fn1
    return torch.ops.torchvision.my_custom_op1(x)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124240
Approved by: https://github.com/zou3519
2024-04-18 00:08:09 +00:00
David Berard
6b24ec480c [Tensor] Detect more cases of symbolic sizes/strides (#123696)
Previously, we'd just check `has_symbolic_sizes_strides()` to know whether a tensor has symbolic sizes or strides; if it does, we skip some profiler logic. But sometimes `has_symbolic_sizes_strides()` returns false even though the tensor actually has symbolic sizes or strides.

So in this change, we add `may_have_symbolic_sizes_strides()` - which should never return false if the tensor has symbolic sizes and strides

Why not change `has_symbolic_sizes_strides()`? It seems like there's preexisting logic that assumes that "if has_symbolic_sizes_strides(), then we can assume that this tensor is guaranteed to have symbolic sizes or strides". In this case, we have python-implemented sizes or strides, which should follow a different code path.
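A self-contained sketch of the distinction (field names invented from the description, not the real TensorImpl):

```cpp
// Field names invented from the description above.
struct TensorImplSketch {
  bool has_symbolic_sizes_strides_ = false;  // C++-side SymInt sizes/strides
  bool python_custom_sizes_ = false;         // sizes/strides implemented in Python

  // Pre-existing query: true only for the C++ symbolic path.
  bool has_symbolic_sizes_strides() const { return has_symbolic_sizes_strides_; }

  // New, conservative query: also covers the Python-implemented path, so it
  // never reports false for a tensor that actually has symbolic sizes/strides.
  bool may_have_symbolic_sizes_strides() const {
    return has_symbolic_sizes_strides_ || python_custom_sizes_;
  }
};
```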

Differential Revision: [D55947660](https://our.internmc.facebook.com/intern/diff/D55947660/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123696
Approved by: https://github.com/aaronenyeshi, https://github.com/soulitzer
2024-04-11 16:51:52 +00:00
cyy
507611f9ae [CUDACachingAllocator] Turn Allocator::allocate into non-const (#120969)
Ideally, the method should be non-const since it changes the allocator state. Some const_casts are also removed along the way.
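The gist of the signature change, simplified (the real c10::Allocator has more surface):

```cpp
#include <cstddef>

struct DataPtr { void* ptr = nullptr; };  // simplified stand-in

struct Allocator {
  // Before: `virtual DataPtr allocate(size_t n) const = 0;`, which forced
  // stateful (e.g. caching) allocators to const_cast their own state.
  virtual DataPtr allocate(std::size_t n) = 0;  // non-const: allocation mutates state
  virtual ~Allocator() = default;
};
```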

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120969
Approved by: https://github.com/albanD
2024-03-05 09:53:05 +00:00
Pearu Peterson
70d4d109f2 Make SparseCsr a functionality dispatch key (#120703)
As in the title.

To enable meta and fake tensor support for sparse compressed tensors in compliance with the meta/fake tensor support for sparse COO tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120703
Approved by: https://github.com/ezyang
2024-03-01 13:28:46 +00:00
PyTorch MergeBot
a9d9077f12 Revert "Increased compile time max GPUs to 512. Switched to int16_t DeviceIndex. (#119639)"
This reverts commit 7c556428c7.

Reverted https://github.com/pytorch/pytorch/pull/119639 on behalf of https://github.com/kit1980 due to breaking internal builds, see D54286923 ([comment](https://github.com/pytorch/pytorch/pull/119639#issuecomment-1969634480))
2024-02-28 18:57:09 +00:00
Tobias Ringwald
7c556428c7 Increased compile time max GPUs to 512. Switched to int16_t DeviceIndex. (#119639)
Fixes #115331.

This PR increases the number of valid GPU devices to 512 (from 64) in order to future-proof PyTorch for providers that offer [single nodes with a large device count](https://www.tensorwave.com/). Until now, `DeviceIndex` was an `int8_t`, thus multiple changes were necessary:

- `DeviceIndex` changed to `int16_t` (see the sketch after this list). Updated consumers that assumed it to be an `int8_t`.
- Updated bounds checking for `torch.device()` in the Python frontend. Right now, we allow funny things like `torch.device('cpu', 200).index == -56`, which is undefined behavior. I inserted some checks to only allow values between 0 and `c10::Device::MAX_NUM_DEVICES - 1`.
- Updated the `ArgumentInfo` struct as it hardcodes the device index as an 8-bit field [^1]. Might be a breaking change, not sure if users rely on this.
- Introduced `c10::Device::MAX_NUM_DEVICES` as a replacement for the old `C10_COMPILE_TIME_MAX_GPUS`

[^1]: This field was unsigned, so I guess this has also been undefined behavior the whole time? Our default device index is -1, so this always wrapped around to 255 when written to the `ArgumentInfo` struct. When I switched the `DeviceIndex` to `int16_t`, it actually stayed 255 after unpacking from `ArgumentInfo` again, as the `DeviceIndex` was now wide enough that it didn't wrap back to -1.
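A sketch of the widening; the constant's spelling comes from the bullets above and its value from the title, so treat the details as assumptions:

```cpp
#include <cstdint>
#include <limits>

using DeviceIndex = std::int16_t;  // was int8_t

constexpr DeviceIndex MAX_NUM_DEVICES = 512;
static_assert(MAX_NUM_DEVICES - 1 <= std::numeric_limits<DeviceIndex>::max(),
              "indices 0..MAX_NUM_DEVICES-1 plus the -1 sentinel must fit");
```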
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119639
Approved by: https://github.com/cyyever, https://github.com/albanD, https://github.com/huydhn
2024-02-27 07:05:48 +00:00
PyTorch MergeBot
fff9d98e58 Revert "Increased compile time max GPUs to 512. Switched to int16_t DeviceIndex. (#119639)"
This reverts commit e0268821dd.

Reverted https://github.com/pytorch/pytorch/pull/119639 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I think the Window failures are legit as they are failing now in trunk, i.e. 450339ab2d ([comment](https://github.com/pytorch/pytorch/pull/119639#issuecomment-1958428416))
2024-02-22 00:12:54 +00:00
Tobias Ringwald
e0268821dd Increased compile time max GPUs to 512. Switched to int16_t DeviceIndex. (#119639)
Fixes #115331.

This PR increases the number of valid GPU devices to 512 (from 64) in order to future-proof PyTorch for providers that offer [single nodes with a large device count](https://www.tensorwave.com/). Until now, `DeviceIndex` was an `int8_t`, thus multiple changes were necessary:

- `DeviceIndex` changed to `int16_t`. Updated consumers that assumed it to be an `int8_t`.
- Updated bounds checking for `torch.device()` in the Python frontend. Right now, we allow funny things like `torch.device('cpu', 200).index == -56`, which is undefined behavior. I inserted some checks to only allow values between 0 and `c10::Device::MAX_NUM_DEVICES - 1`.
- Updated the `ArgumentInfo` struct as it hardcodes the device index as an 8-bit field [^1]. Might be a breaking change, not sure if users rely on this.
- Introduced `c10::Device::MAX_NUM_DEVICES` as a replacement for the old `C10_COMPILE_TIME_MAX_GPUS`

[^1]: This field was unsigned, so I guess this has also been undefined behavior the whole time? Our default device index is -1, so this always wrapped around to 255 when written to the `ArgumentInfo` struct. When I switched the `DeviceIndex` to `int16_t`, it actually stayed 255 after unpacking from `ArgumentInfo` again, as the `DeviceIndex` was now wide enough that it didn't wrap back to -1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119639
Approved by: https://github.com/cyyever, https://github.com/albanD
2024-02-21 21:10:49 +00:00
cyy
10f3abc6b8 [DeviceIndex][3/N] Use DeviceIndex in more places (#119635)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119635
Approved by: https://github.com/ezyang
2024-02-12 21:31:27 +00:00
cyy
1544c37520 [7/N] Fixes clang-tidy warnings in c10/{core,util}/*.h (#115495)
This PR continues to fix clang-tidy warnings for headers in c10/core and c10/util.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115495
Approved by: https://github.com/malfet
2023-12-19 02:14:30 +00:00
cyy
7b8084d1c6 [5/N] Fixes clang-tidy warnings in c10/core/*.h (#115232)
This PR continues to fix clang-tidy warnings for headers in c10/core.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115232
Approved by: https://github.com/Skylion007
2023-12-07 15:48:03 +00:00
cyyever
1224acc018 [3/N] Fixes clang-tidy warnings in header files (#114431)
This PR series tries to enable clang-tidy for headers in torch/csrc and c10/util.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114431
Approved by: https://github.com/Skylion007
2023-12-05 12:58:27 +00:00
cyy
07e00de8d7 Add missing member initialization in c10::ExtraMeta constructor (#114448)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114448
Approved by: https://github.com/Skylion007
2023-11-24 03:54:11 +00:00
Chen_Liqing
b910d9eaa6 Add tensor.is_privateuseone (#113421)
We found a scenario where ``tensor.device().is_privateuseone()`` is used to determine whether a tensor is PrivateUse1, but it fails.
In the code of ``Autograd``, for example:
```
::std::tuple<at::Tensor,at::Tensor,at::Tensor> native_batch_norm(c10::DispatchKeySet ks, const at::Tensor & input, const c10::optional<at::Tensor> & weight, const c10::optional<at::Tensor> & bias, const c10::optional<at::Tensor> & running_mean, const c10::optional<at::Tensor> & running_var, bool training, double momentum, double eps) {
  auto& input_ = unpack(input, "input", 0);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( input, weight, bias );

  [[maybe_unused]] auto _any_has_forward_grad_result0 = (isFwGradDefined(input) || isFwGradDefined(weight) || isFwGradDefined(bias));
  check_no_requires_grad(running_mean, "running_mean", "native_batch_norm");
  check_no_requires_grad(running_var, "running_var", "native_batch_norm");
  std::shared_ptr<NativeBatchNormBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<NativeBatchNormBackward0>(new NativeBatchNormBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( input, weight, bias ));
    grad_fn->eps = eps;
    grad_fn->input_ = SavedVariable(input, false);
    grad_fn->running_mean_ = SavedVariable(running_mean, false);
    grad_fn->running_var_ = SavedVariable(running_var, false);
    grad_fn->training = training;
    grad_fn->weight_ = SavedVariable(weight, false);
  }
  ...
}
```
When ``weight`` is ``None``, an empty tensor is automatically generated and will be transferred to the backward calculation:
c7e12c7427/torch/csrc/autograd/saved_variable.cpp (L121-L128)
At the beginning of the backward calculation in our scenario, we need to determine whether the input tensor is ``PrivateUse1``. However, if we use ``tensor.device().is_privateuseone()``, we will get an error ``"tensor does not have a device"``:
c7e12c7427/c10/core/TensorImpl.h (L1223-L1235)
I think this part of the code can be optimized, what do you think?
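A minimal sketch of the accessor this PR adds, assuming it answers from the dispatch key set rather than from `device()` so that undefined tensors cannot throw:

```cpp
#include <c10/core/DispatchKeySet.h>

// Assumed implementation: consult the dispatch keys instead of materializing
// a Device, so the "tensor does not have a device" error above is avoided.
bool is_privateuseone(const c10::DispatchKeySet& key_set) {
  return key_set.has(c10::DispatchKey::PrivateUse1);
}
```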

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113421
Approved by: https://github.com/albanD
2023-11-13 01:51:27 +00:00
Peter Bell
15b61d6c1a TensorImpl: Lazily compute numel and contiguity when symbolic (#112785)
Currently whenever the sizes or strides are modified for a `TensorImpl` we
eagerly recompute the numel and memory format flags. This is fine for static
shapes as it's all fast C++ code, but for symbolic shapes it runs slow python code.

This instead changes the `SymbolicShapeMeta` object to compute the derived
quantities lazily at the first request. This has the added benefit that we can
now pass assumptions in `empty_tensor_restride` which remove the need to compute
some contiguity flags at all.
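A hedged sketch of the compute-on-first-read behavior (field names invented; the real SymbolicShapeMeta also covers strides and memory-format flags, and works on SymInt rather than int64_t):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

class SymbolicShapeMetaSketch {
 public:
  std::int64_t numel() const {
    if (!numel_valid_) {      // first request: run the (potentially slow) math
      numel_ = 1;
      for (auto s : sizes_) numel_ *= s;
      numel_valid_ = true;    // cache for subsequent requests
    }
    return numel_;
  }
  void set_sizes(std::vector<std::int64_t> sizes) {
    sizes_ = std::move(sizes);
    numel_valid_ = false;     // invalidate lazily instead of recomputing eagerly
  }

 private:
  std::vector<std::int64_t> sizes_;
  mutable std::int64_t numel_ = 1;
  mutable bool numel_valid_ = false;
};
```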

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112785
Approved by: https://github.com/ezyang
ghstack dependencies: #112689, #112890
2023-11-09 01:36:37 +00:00
Peter Bell
8c4bdac560 TensorImpl: Move symbolic refresh_numel and refresh_contiguous into their own class (#112890)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112890
Approved by: https://github.com/lezcano
ghstack dependencies: #112689
2023-11-09 01:36:37 +00:00
Peter Bell
c6f435befd Don't recompute numel and contiguous in detach (#112689)
When symbolic shapes are involved, `refresh_numel` and `refresh_contiguous` are
fairly expensive since they dispatch to python for SymInt handling. However, in
the case of detach we can just copy the existing values instead.
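Roughly, the detach fast path amounts to copying cached fields instead of recomputing them (names invented):

```cpp
#include <cstdint>

// Invented names: on detach the destination shares sizes/strides with the
// source, so cached derived values can be copied verbatim rather than
// recomputed via refresh_numel()/refresh_contiguous().
struct DerivedMeta {
  std::int64_t numel;
  bool is_contiguous;
};

inline void copy_derived_on_detach(DerivedMeta& dst, const DerivedMeta& src) {
  dst = src;  // copy, don't recompute (no Python round-trip for SymInts)
}
```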
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112689
Approved by: https://github.com/lezcano, https://github.com/ezyang
2023-11-07 10:55:48 +00:00