Change our representation of sizes and strides to contain SymInts
instead of int64_t.
Right now it's not actually possible to create a Tensor with symbolic
shape, so this change is intended to be a no-op.
But the intended behavior is:
- If you create a Tensor with symbolic shape, a `CustomSizes` policy
will be set, and the `has_symbolic_sizes_strides_` bit will be set. (not
currently implemented)
- Calling any TensorImpl function that naively interacts with sizes and
strides will throw. For hot-path functions (`sizes()`, `strides()`), we
make use of the existing policy check to throw. For others, we just have
a regular `TORCH_CHECK(!has_symbolic_sizes_strides_)`; both guard styles are sketched below.
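A minimal sketch of the two guard styles, using toy stand-ins rather than the real `TensorImpl` (the `ToyTensorImpl` name and the `numel_slow` helper are hypothetical, for illustration only):
```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Stand-in for the real TORCH_CHECK macro.
#define TOY_TORCH_CHECK(cond, msg) \
  do { if (!(cond)) throw std::runtime_error(msg); } while (0)

struct ToyTensorImpl {
  enum class SizesStridesPolicy : uint8_t { Default, CustomStrides, CustomSizes };

  std::vector<int64_t> sizes_;
  SizesStridesPolicy policy_ = SizesStridesPolicy::Default;
  bool has_symbolic_sizes_strides_ = false;

  // Hot path: reuse the existing policy branch, so the common case
  // (no custom policy) stays a single predictable check.
  const std::vector<int64_t>& sizes() const {
    if (policy_ >= SizesStridesPolicy::CustomSizes) {
      return sizes_custom();  // throws when sizes are symbolic
    }
    return sizes_;
  }

  // Cold path: a plain check is fine for functions off the hot path.
  int64_t numel_slow() const {
    TOY_TORCH_CHECK(!has_symbolic_sizes_strides_,
                    "not supported for tensors with symbolic shape");
    int64_t n = 1;
    for (auto s : sizes_) n *= s;
    return n;
  }

 private:
  const std::vector<int64_t>& sizes_custom() const {
    TOY_TORCH_CHECK(!has_symbolic_sizes_strides_,
                    "sizes() not supported for tensors with symbolic shape");
    return sizes_;
  }
};
```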
This also undoes the explicit constructor I made in
https://github.com/pytorch/pytorch/pull/77666; it ended up being more
annoying than useful when making these changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78272
Approved by: https://github.com/Krovatkin, https://github.com/Chillee
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77994
Approved by: https://github.com/Krovatkin
With SymInt we are using the negative space of `int64_t` in our internal
representation. `SizesAndStridesTest` breaks this because it initializes
`SizesAndStrides` with negative sizes/strides. This PR fixes that.
As an aside: feels like `SizesAndStrides` (and `SymInt`) should really
take a uint64_t, but that would be BC-breaking so I don't do it here.
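To make the invariant concrete, here is a minimal sketch of the negative-space encoding, assuming a toy `ToySymInt` rather than the real `c10::SymInt` layout:
```cpp
#include <cassert>
#include <cstdint>

class ToySymInt {
 public:
  static ToySymInt fromConcrete(int64_t v) {
    // Ordinary sizes/strides must be non-negative; the negative range
    // is reserved for the symbolic representation.
    assert(v >= 0 && "negative values collide with the symbolic space");
    return ToySymInt(v);
  }
  static ToySymInt fromSymbolIndex(int64_t idx) {
    // Set the sign bit to tag the value as symbolic; the low bits act
    // as a handle (e.g. an index into a side table of symbols).
    return ToySymInt(INT64_MIN | idx);
  }
  bool is_symbolic() const { return data_ < 0; }
  int64_t concrete() const {
    assert(!is_symbolic());
    return data_;
  }

 private:
  explicit ToySymInt(int64_t d) : data_(d) {}
  int64_t data_;
};
```
This is why initializing `SizesAndStrides` with negative values now breaks: a negative `int64_t` is indistinguishable from a tagged symbolic value.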
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77820
Approved by: https://github.com/ezyang
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75808
Just as it is often difficult to write a single kernel that handles both CPU and CUDA, it can be difficult to do the same for NestedTensor.
ghstack-source-id: 154171542
(Note: this ignores all push blocking failures!)
Test Plan: CI?
Reviewed By: bdhirsh
Differential Revision: D35603836
fbshipit-source-id: fb0ebb19d34531ed96ce176aca325f8e2b5f90e6
(cherry picked from commit 0bcd753f93c04256c1b745f84a74ecccf0dceef5)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74963
This is a re-land of D35192346 (9872a06d77) and D35192317 (a9216cde6c), which together change the internal representation of `DispatchKeySet` in pytorch core to free up the number of dispatch keys that we have available. See a more detailed description of the design in the original PR: https://github.com/pytorch/pytorch/pull/69633.
The original PR broke Milan workflows, which use a pytorch mobile build, and manifested as a memory corruption bug inside of `liboacrmerged.so`.
**Background: Existing Mobile Optimization**
Pytorch mobile builds have an existing optimization (see `c10/core/DispatchKey.h` (L382) and `aten/src/ATen/core/dispatch/OperatorEntry.h` (L214) at commit cc23725e89), which works as follows:
Every operator in pytorch has a "dispatch table" of function pointers, corresponding to all of the (up to 64) different kernels that we might dispatch to when we run an operator in pytorch (autograd, cpu, cuda, complex number support, etc).
In mobile builds, the size of that table is shrunk from 64 to 8 to save a bunch of space, because mobile doesn't end up using the functionality associated with most dispatch keys.
The dispatcher also has a notion of "fallback kernels", which are kernels that you can register to a particular dispatch key but that should work for "any operator". The array of fallback kernels is defined in `aten/src/ATen/core/dispatch/Dispatcher.h` (L294) at commit cc23725e89.
The mobile optimization currently does **not** extend to this array (it wouldn't be that useful anyway, because there is only one global array of fallback kernels, vs. a separate dispatch table of function pointers per operator). So the per-operator tables on mobile are size 8, while the fallback table is size 64.
**The Bug**
The original PR made it difficult to enable that optimization separately for the per-operator arrays vs. the fallback array, and it incidentally shrunk the size of the fallback array from 64 to 8 for mobile (that happened on this line: https://github.com/pytorch/pytorch/pull/69633/files#diff-f735cd7aa68f15b624100cbc4bb3b5ea76ffc7c9d3bec3b0ccabaa09609e5319R294).
That isn't a problem by itself (since mobile doesn't actually use any of the fallbacks that can no longer be stored). However, pytorch core will still register all of those fallback kernels on startup in mobile builds, even if they aren't used. When we tried to register one of those fallbacks on startup, it would try to dump the kernel somewhere in memory past the bounds of the (now smaller) array inside of the `Dispatcher` object, `backendFallbackKernels_`.
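A self-contained sketch of the setup and the failure mode, with hypothetical `Toy*` names and sizes standing in for the real `Dispatcher` internals:
```cpp
#include <array>
#include <cstddef>

using KernelFunction = void (*)();

constexpr size_t kFullKeySpace = 64;  // one slot per dispatch key
constexpr size_t kMobileEntries = 8;  // shrunk per-operator table on mobile

struct ToyOperatorEntry {
  // Per-operator dispatch table: safe to shrink on mobile, because the
  // keys mobile actually dispatches through fit in the smaller table.
  std::array<KernelFunction, kMobileEntries> dispatchTable_{};
};

struct ToyDispatcher {
  // Global fallback table. The broken change sized this like the shrunk
  // per-operator tables instead of keeping it at kFullKeySpace.
  std::array<KernelFunction, kMobileEntries> backendFallbackKernels_{};

  void registerFallback(size_t key_index, KernelFunction k) {
    // key_index is still computed against the full 64-key space, so on
    // startup this write can land past the end of the array and corrupt
    // adjacent memory inside the Dispatcher object, as described above.
    backendFallbackKernels_[key_index] = k;  // no bounds check
  }
};
```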
**Why didn't this problem show up in OSS CI? Why didn't it break other internal mobile workflows aside from Milan?**
Ideally, this failure would show up as part of the OSS signal on GitHub, since we already have mobile OSS builds. Given that it was another memory corruption issue that only affected Milan (subset of mobile), I'm not sure what's specific about Milan's builds that caused it only to manifest there. dreiss I wonder if there's another flavor of mobile builds we could run in OSS CI that could potentially help catch this?
**The debugging experience was pretty difficult**
Debugging the Milan-specific failure was made difficult by the following:
(1) Lack of CI.
- the original Milan failure didn't surface on my original diff, because the Milan job(s) that failed weren't triggered to run on pytorch changes. There's probably a balance to strike here, since those jobs will only be useful if they aren't flaky, and if they can produce reliable failure logs for debugging.
(2) It's difficult to get a repro.
- my work laptop doesn't have the right specs to run the Milan development workflow (not enough disk space)
- There is an existing OnDemand workflow for Milan, but it appears to be relatively new, and after a bunch of help from MarcioPorto, we ran into issues forwarding the log output from Milan tests on the emulator back to the terminal (see the original discussion here: https://fb.workplace.com/groups/OnDemandFRL/permalink/1424937774645433/)
(3) Lack of stack-traces.
- Most Milan failures didn't include actionable stack traces. phding generously helped me debug by running my suggested patches locally, and reporting back if there were any failures. The failing test didn't include a stack trace though (just the line where the crash appeared), so I ended up making some educated guesses about what the issue was based on the area of the crash.
ghstack-source-id: 152688542
Test Plan: Confirmed with phding that the broken Milan workflow from the previous version of this diff is now passing.
Reviewed By: phding, albanD
Differential Revision: D35222806
fbshipit-source-id: 0ad115a0f768bc8ea5d4c203b2990254c7092d30
(cherry picked from commit 002b91966f11fd55ab3fa3801b636fa39a6dd12c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74110
This class has been missing a unit test since it was added. dbort first noticed this, so this commit adds one.
ghstack-source-id: 151471746
Test Plan: `buck test //xplat/caffe2/c10:c10_test`
Reviewed By: dbort
Differential Revision: D34822911
fbshipit-source-id: 919a125081a2093d6f4e5a2cdb008145c05ec803
(cherry picked from commit 358b7dacced866c54b8c1972393d042ebbd93d9e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74062
This class has been missing a unit test since it was added. dbort first noticed this, so this commit adds one.
ghstack-source-id: 151122696
Test Plan: `buck test //xplat/caffe2/c10:c10_test`
Reviewed By: dbort
Differential Revision: D34800969
fbshipit-source-id: e665ab0df2faf505536bf27bdf29fcd3e70fe699
(cherry picked from commit e060756d2772dbcbe59a6422de786e338807afa6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71909
This reduces the dependencies for these tests to their corresponding
libraries and reduces the distance from file to test.
ghstack-source-id: 150235102
Test Plan: This ought to be a no-op: rely on CI to validate.
Reviewed By: malfet
Differential Revision: D33815406
fbshipit-source-id: 7097e9dcfec2fc27fedae91637ba1ebda670198c
(cherry picked from commit 66b1640f2cb6faf1f17f0392f1f3242871ade16f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71908
This reduces the dependencies of these tests and also the distance
from each test to its primary input.
ghstack-source-id: 150235100
Test Plan: This ought to be a no-op, rely on CI to validate.
Reviewed By: malfet
Differential Revision: D33815404
fbshipit-source-id: 8f69ebabe5f7bacba056b0f31e55161fc431a45e
(cherry picked from commit 3906723bd5b9e9d1eb6f8e37b4173ad695658cd9)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71907
This allows us to refactor the c10 tests without anything downstream
needing to be concerned about it.
ghstack-source-id: 150235098
Test Plan: This ought to be a no-op, rely on CI to validate.
Reviewed By: malfet
Differential Revision: D33815403
fbshipit-source-id: d358d6e8b1b45b62cef73bdbfd9c7709a7075c42
(cherry picked from commit a554dbe55a28516c8db2287552194860be87f2f0)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72402
The original PR had an array-out-of-bounds access in `DispatchKeyExtractor.cpp`, that wasn't caught by ASAN and appeared to only manifest in a subset of android internal tests. After fixing the OOB access (and adding more asserts), I confirmed that the android internal test passes.
Reland of D33255193 (20b8653dfa)
ghstack-source-id: 148830728
Test Plan:
Steps to test:
(1) connect to a mobile OD
(2) run `one_world android emulator android-29` in a terminal to start the android emulator
(3) In a separate terminal, run the test: `buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled`
I also ran `buck test fbandroid/mode/dbg //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test`, which failed before and passed after the PR.
Reviewed By: albanD
Differential Revision: D34034848
fbshipit-source-id: 9677ee2c0a1afd1183896f7055009445712523c5
(cherry picked from commit 9ab9b12d35)
Summary: I think this diff stack broke all the related tasks below.
Test Plan:
For our failing tests:
buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled
For the ubn:
Not really sure what to do, trying to build the app and see if I can use an effect?
Reviewed By: shoumikhin
Differential Revision: D34018849
fbshipit-source-id: 3571718cb6621931af931b494e0a70d6e0164e65
(cherry picked from commit 3cc63cb2ea)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70928
ghstack-source-id: 148159366
Test Plan: Ensured that the same number of tests are found and run.
Reviewed By: malfet
Differential Revision: D33455272
fbshipit-source-id: fba1e3409b14794be3e6fe4445c56dd5361cfe9d
(cherry picked from commit b45fce500a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69110
I pasted the current LLVM code, reapplied the modifications listed in the code comments, and caught a few more in the diff/build process. The trivially-copyable detection is different now; if gcc builds fail, I will try reverting to C10_IS_TRIVIALLY_COPYABLE or copying what LLVM is doing.
The motivation for this change is that, as noted in an existing comment, C10_IS_TRIVIALLY_COPYABLE did the wrong thing for std::unique_ptr, which caused problems with D32454856 / #68412.
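A minimal illustration of the trait mismatch (the container names are generic; LLVM's `SmallVector` uses a trivially-copyable check to pick a `memcpy` fast path):
```cpp
#include <memory>
#include <type_traits>

// std::unique_ptr is NOT trivially copyable: memcpy'ing one would
// duplicate ownership and double-free on destruction.
static_assert(!std::is_trivially_copyable_v<std::unique_ptr<int>>,
              "unique_ptr must not take the memcpy fast path");

// Raw pointers are trivially copyable, so they may be memcpy'd.
static_assert(std::is_trivially_copyable_v<int*>,
              "raw pointers can take the memcpy fast path");
```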
ghstack-source-id: 145327773
Test Plan: CI
Reviewed By: bhosmer, mruberry
Differential Revision: D32733017
fbshipit-source-id: 9452ab90328e3fdf457aad23a26f2f6835b0bd3d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66746
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for (TYPE var = x0; var < x_max; var++)`
to the format
`for (const auto var : irange(x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modifications to exclude all files under /torch/jit; a number of reversions and unused-variable warning suppressions were added by hand. The rewrite is sketched below.
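A sketch of the rewrite, using a minimal stand-in for `c10::irange` (the real one lives in `c10/util/irange.h`):
```cpp
#include <cstdint>

// Minimal stand-in for c10::irange, for illustration only.
struct ToyRange {
  struct iterator {
    int64_t v;
    int64_t operator*() const { return v; }
    iterator& operator++() { ++v; return *this; }
    bool operator!=(const iterator& o) const { return v != o.v; }
  };
  int64_t n;
  iterator begin() const { return {0}; }
  iterator end() const { return {n}; }
};
inline ToyRange irange(int64_t n) { return {n}; }

// Before: classic counting loop.
int64_t sum_before(int64_t n) {
  int64_t total = 0;
  for (int64_t i = 0; i < n; i++) total += i;
  return total;
}

// After: the upgraded range-based form.
int64_t sum_after(int64_t n) {
  int64_t total = 0;
  for (const auto i : irange(n)) total += i;
  return total;
}
```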
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D31705361
fbshipit-source-id: 33fd22eb03086d114e2c98e56703e8ec84460268
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for (TYPE var = x0; var < x_max; var++)`
to the format
`for (const auto var : irange(x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modifications to exclude all files under /torch/jit; a number of reversions and unused-variable warning suppressions were added by hand.
bypass_size_limit
allow-large-files
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D30652629
fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
Summary:
- Delete `-Wno-unused-variable` from the top-level `CMakeLists.txt`
- Still suppress those warnings for tests and `torch_python`
- Delete a number of unused variables from caffe2 code
- Use `(void)var;` to suppress unused variables in range loops
- Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants
- Do not delete `caffe2::OperatorBase::Output` calls, as they have side effects
The suppression patterns are sketched below.
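Sketches of those suppression patterns (using a local stand-in for `C10_UNUSED`, which on gcc/clang wraps `__attribute__((unused))`):
```cpp
#include <vector>

#define TOY_UNUSED __attribute__((__unused__))  // stand-in for C10_UNUSED

// (void)var; in a range loop that only needs the iteration count:
int count(const std::vector<int>& xs) {
  int n = 0;
  for (const auto& x : xs) {
    (void)x;  // suppresses -Wunused-variable
    ++n;
  }
  return n;
}

// A global object kept alive only for its constructor's side effect:
struct Registerer {
  Registerer() { /* registration side effect */ }
};
TOY_UNUSED static Registerer g_registerer;

// Prefer constexpr over static for global constants:
constexpr int kDefaultBatchSize = 8;  // was: static const int ...
```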
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66041
Reviewed By: ngimel
Differential Revision: D31360142
Pulled By: malfet
fbshipit-source-id: 6fdfb9f91efdc49ca984a2f2a17ee377d28210c8
Summary:
- Delete `-Wno-unused-variable` from the top-level `CMakeLists.txt`
- Still suppress those warnings for tests and `torch_python`
- Delete a number of unused variables from caffe2 code
- Use `(void)var;` to suppress unused variables in range loops
- Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65954
Reviewed By: ngimel
Differential Revision: D31326599
Pulled By: malfet
fbshipit-source-id: 924155f1257a2ba1896c50512f615e45ca1f61f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610
- Replace HIP_PLATFORM_HCC with USE_ROCM (the migration is sketched below)
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead.
- In the next PR:
  - Remove the mapping from CUDA_VERSION to HIP_VERSION and from CUDA to HIP in hipify.
  - HIP_PLATFORM_HCC is deprecated, so add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.
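A sketch of the macro migration (the version threshold below is hypothetical):
```cpp
// Before (deprecated platform macro):
//
//   #ifdef __HIP_PLATFORM_HCC__
//     // ROCm-specific code
//   #endif
//
// After (build-system flag plus an explicit ROCm version gate):
#if defined(USE_ROCM)
  // ROCm build: gate version-specific paths on ROCM_VERSION rather than
  // reusing CUDA_VERSION through the hipify mapping.
  #if defined(ROCM_VERSION) && ROCM_VERSION >= 40200  // hypothetical threshold
    // code that needs a newer ROCm
  #endif
#endif
```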
cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd
Reviewed By: jbschlosser
Differential Revision: D30909053
Pulled By: ezyang
fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64670
Bounds checking is not required for `std::string_view`, and the checking hoses performance for the following performance prototype diff.
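A sketch of the contract being matched (the toy type mirrors `std::string_view`; it is not c10's actual implementation):
```cpp
#include <cstddef>
#include <stdexcept>

struct ToyStringView {
  const char* data_;
  size_t size_;

  // Unchecked, like std::string_view::operator[]: out-of-range access is
  // UB by contract, so the hot path compiles down to a single load.
  char operator[](size_t i) const { return data_[i]; }

  // Checked accessor for callers that want the bounds test.
  char at(size_t i) const {
    if (i >= size_) throw std::out_of_range("ToyStringView::at");
    return data_[i];
  }
};
```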
ghstack-source-id: 138037531
Test Plan: CI
Reviewed By: ezyang, bhosmer
Differential Revision: D30747515
fbshipit-source-id: 1f4374415a82dfdccce76ea2c6885c13cb93d369
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63203
Currently, `c10::Device` isn't tested at all; i.e., there's no test to ensure that device string parsing works as expected. This diff adds very basic tests asserting that the strings we expect to parse do parse, and the ones we expect to be rejected are rejected.
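A minimal sketch of the kind of assertions added (names follow `c10::Device`'s public API; the exact cases in the diff may differ):
```cpp
#include <c10/core/Device.h>
#include <gtest/gtest.h>

TEST(DeviceTest, BasicParsing) {
  // Strings we expect to parse:
  EXPECT_EQ(c10::Device("cpu").type(), c10::DeviceType::CPU);
  EXPECT_EQ(c10::Device("cuda:1").index(), 1);

  // Strings we expect to be rejected:
  EXPECT_ANY_THROW(c10::Device("cuda:"));
  EXPECT_ANY_THROW(c10::Device("not_a_device"));
}
```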
ghstack-source-id: 136006962
Test Plan:
New test. Ran as:
```
cd fbsource/fbcode/
buck test //caffe2/c10:c10_test_0 -- -r '.*DeviceTest.*'
```
Reviewed By: dreiss, raziel
Differential Revision: D30286910
fbshipit-source-id: b5699068dcbba89d5d224dbaf74b175f3f785a00
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62527
If NDEBUG is applied inconsistently during compilation, we might get an 'ambiguous declaration' error. Let's make sure that the forward declaration matches glibc, including all specifiers.
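A sketch of what "matching glibc, including all specifiers" looks like; glibc declares `__assert_fail` with `__THROW` (i.e. `noexcept` in C++11 and later) and `__attribute__((__noreturn__))`:
```cpp
#if defined(__GLIBC__)
// The forward declaration must carry the same specifiers as <assert.h>,
// or inconsistent NDEBUG settings across translation units can surface
// the mismatch as an ambiguous/conflicting declaration.
extern "C" void __assert_fail(const char* assertion, const char* file,
                              unsigned int line, const char* function)
    noexcept __attribute__((__noreturn__));
#endif
```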
Test Plan: sandcastle
Reviewed By: mdschatz
Differential Revision: D30030051
fbshipit-source-id: 9f4d5f1d4e74f0a4eaeeaaaad76b93ee485d8bcd
Summary:
These cases were found by compiling with clang on Windows. Without this change, those functions would still be exported, which wastes space in the symbol table.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62952
Reviewed By: gchanan
Differential Revision: D30191291
Pulled By: ezyang
fbshipit-source-id: 3319b0ec4f5fb02e0fe1b81dbbcedcf12a0c795e
Summary:
The GoogleTest `TEST` macro is non-compliant with the `cppcoreguidelines-avoid-non-const-global-variables` check, as is `DEFINE_DISPATCH`, so the corresponding `NOLINT` markers are dropped.
All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in $(find . -type f -iname "*.c*" -or -iname "*.h" \
           | xargs grep cppcoreguidelines-avoid-non-const-global-variables \
           | cut -f1 -d: | sort | uniq); do
  sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" "$i"
done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59333
The code comment should explain this in sufficient detail; in brief, making it 16 bytes should get it passed in registers.
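A sketch of the property being relied on, assuming the x86-64 SysV calling convention (a hypothetical 16-byte trivially-copyable aggregate, not c10's actual layout):
```cpp
#include <cstdint>
#include <type_traits>

struct ToyOptionalInt64 {
  int64_t value;   // payload
  bool has_value;  // flag; padding rounds the struct up to 16 bytes
};

static_assert(sizeof(ToyOptionalInt64) == 16,
              "small enough for a register pair");
static_assert(std::is_trivially_copyable_v<ToyOptionalInt64>,
              "trivially copyable, so the ABI may pass it in registers");
```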
ghstack-source-id: 130631329
Test Plan: Updated optional_test and added static_assert in Optional.cpp.
Reviewed By: ezyang
Differential Revision: D28843027
fbshipit-source-id: 3029f05e03a9f04ca7337962e7770cdeb9a608d9