Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66109
This refactor is no longer necessary for ufunc codegen, as I changed
the format of ufuncs so they are not directly inserted into the 'dispatch'
key, but I think the refactored code here is better. The basic concept
is to construct BackendMetadata directly as we parse entries of
the dispatch dictionary, rather than creating it post facto. This
centralizes the computation and means that creating the backend index
is just a simple reindexing by operator name (nothing nontrivial).
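A hedged sketch of the idea in Python (the dataclass fields and helper name below are simplified illustrations, not the exact torchgen model):
```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class BackendMetadata:
    kernel: str       # name of the kernel the backend dispatches to
    structured: bool  # whether this is a structured kernel implementation

def parse_dispatch(dispatch: Dict[str, str], structured: bool) -> Dict[str, BackendMetadata]:
    # Build BackendMetadata directly while walking the dispatch dict,
    # instead of storing raw strings and converting them in a later pass.
    return {
        dispatch_key: BackendMetadata(kernel=kernel_name, structured=structured)
        for dispatch_key, kernel_name in dispatch.items()
    }

# Example: parse_dispatch({"CPU": "add_cpu", "CUDA": "add_cuda"}, structured=False)
```
The backend index is then just this per-operator dict regrouped by dispatch key.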
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D31385760
Pulled By: ezyang
fbshipit-source-id: 4fcb491ba025d2aa6fd356586b57affb97a507fc
(cherry picked from commit 21c93d4199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72730
This diff contains changes from several PRs landed on the lazy_tensor_staging branch.
- generates 'fallback' overrides for each codegenned op, which is useful for debugging
- supports operators that are missing aten:: symbols for their op names, using their string counterparts instead
- makes the IR class a base class instead of hardcoding the assumption of TS
Test Plan: tested on lazy_tensor_staging branch
Reviewed By: desertfire
Differential Revision: D34178476
fbshipit-source-id: 7190b2e0d82b4eb1f4510c858c24446c6df3f9d0
(cherry picked from commit 6713d3f0ef)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72402
The original PR had an array out-of-bounds access in `DispatchKeyExtractor.cpp` that wasn't caught by ASAN and appeared to manifest only in a subset of Android internal tests. After fixing the OOB access (and adding more asserts), I confirmed that the Android internal test passes.
Reland of D33255193 (20b8653dfa)
ghstack-source-id: 148830728
Test Plan:
Steps to test:
(1) connect to a mobile OD
(2) run `one_world android emulator android-29` in a terminal to start the android emulator
(3) In a separate terminal, run the test: `buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled`
I also ran `buck test fbandroid/mode/dbg //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test`, which failed before and passed after the PR.
Reviewed By: albanD
Differential Revision: D34034848
fbshipit-source-id: 9677ee2c0a1afd1183896f7055009445712523c5
(cherry picked from commit 9ab9b12d35)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71925
This patch fixes a few minor issues and introduces a few changes that came up while enabling ROCm compilation for ATen.
1. Minor type-related changes in `ATen/miopen/*`
This is to suppress compiler warnings.
2. [EXCLUDED] Hipify `ATen/native/miopen/*.cpp`
`ATen/native/miopen/*.cpp` includes "cuda/CUDAConfig.h", which should be `hip/HIPConfig.h` (though compilation succeeds without this change since currently `CUDAConfig.h` = `HIPConfig.h`).
3. Update `gen.py` to include `hip/EmptyTensor.h` instead of `cuda/EmptyTensor.h` for HIP compilation
`RegisterCUDA.cpp` (for HIP) should include `hip/EmptyTensor.h` (though compilation succeeds without this change since currently `cuda/EmptyTensor.h` does not contain CUDA-specific logic).
4. Exclude the `USE_DIRECT_NVRTC` code when `USE_ROCM=0`.
Note that `USE_DIRECT_NVRTC` is always undefined for OSS compilation. It seems that this flag exists only for an internal purpose.
5. [EXCLUDED] Exclude `frexp()` for ROCm <= 3.10
A newer ROCm (i.e., the officially supported ROCm versions) has `frexp()`, but an old ROCm (e.g., ROCm <= 3.10) doesn't. This preprocessor branch avoids a compilation error for old ROCm (though such an old ROCm is not officially supported).
6. Change an include path from `aten/src/ATen/` to `ATen/` in `SharedReduceOps.h`
This is, as far as I checked, the only place that includes `ATen` from `aten/src`. This change unifies the include format.
Test Plan: CI (including GitHub CI for ROCm)
Reviewed By: xw285cornell
Differential Revision: D33441758
fbshipit-source-id: 0853806c60de050d329b5ddddb8d51948f8f2788
(cherry picked from commit c2b8c16308)
Summary:
I've added parsing of an optional first line after the precomputed keyword in native_functions.yaml, for arguments that will be precomputed without replacing an existing argument. This line is optional, must come first, and does not contain any arrow.
These new fields are precomputed as before in the meta function and added to the precompute struct returned by the meta function. For now I've put them as the last args of the impl function, where they can be reused.
example:
native_functions.yaml:
```
...
precomputed:
- int numBatch, int numPlanes, int inputT, int inputH, int inputW <- new
- kernel_size -> int poolSizeT, int poolSizeH, int poolSizeW
- output_size -> int outputT, int outputH, int outputW
```
meta:
```
TORCH_PRECOMPUTE_META_FUNC(fractional_max_pool3d)(
const at::Tensor& input_,
IntArrayRef pool_size,
IntArrayRef output_size,
const at::Tensor& randomSamples
) {
...
return TORCH_PRECOMPUTE_STRUCT(fractional_max_pool3d)()
    .set_numBatch(numBatch)
    .set_numPlanes(numPlanes)
    .set_inputT(inputT)
    .set_inputH(inputH)
    .set_inputW(inputW)
    .set_poolSizeT(poolSizeT) ...
}
```
impl:
```
TORCH_IMPL_FUNC(fractional_max_pool3d_out_cpu)(
const at::Tensor& input_,
int64_t poolSizeT,
int64_t poolSizeH,
int64_t poolSizeW,
int64_t outputT,
int64_t outputH,
int64_t outputW,
const at::Tensor& randomSamples,
const at::Tensor& output,
const at::Tensor& indices,
int64_t numBatch, <- for now I've put them here
int64_t numPlanes,
int64_t inputT,
int64_t inputH,
int64_t inputW) {
```
Fixes https://github.com/pytorch/pytorch/issues/71314
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71368
Reviewed By: zou3519
Differential Revision: D33683984
Pulled By: bdhirsh
fbshipit-source-id: 33066dd92b8743aadf0dc8102f6bf0689f843242
(cherry picked from commit 64e46af6a4)
Summary: I think this diff stack broke all the related tasks below.
Test Plan:
For our failing tests:
buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled
For the ubn:
Not really sure what to do, trying to build the app and see if I can use an effect?
Reviewed By: shoumikhin
Differential Revision: D34018849
fbshipit-source-id: 3571718cb6621931af931b494e0a70d6e0164e65
(cherry picked from commit 3cc63cb2ea)
Summary:
When the constants list is empty, the previous codegen generates something like
```
std::vector<c10::IValue>({
}), // constants list,
```
However, this fails quick-check because it includes trailing spaces. This PR generates the following instead.
```
std::vector<c10::IValue>(), // constants list,
```
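For illustration, the emission logic roughly amounts to special-casing the empty list; a minimal sketch (the helper name and exact formatting are assumptions, not the actual codegen):
```python
from typing import List

def emit_constants_list(constants: List[str]) -> str:
    # Empty list: emit a single line with no braces, so no trailing
    # whitespace is produced.
    if not constants:
        return "std::vector<c10::IValue>(), // constants list"
    body = ",\n  ".join(constants)
    return "std::vector<c10::IValue>({\n  " + body + "\n}), // constants list"

# Example: print(emit_constants_list([]))
#          print(emit_constants_list(["1", "2.5"]))
```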
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72199
ghstack-source-id: 148231023
Test Plan: CI
Reviewed By: tugsbayasgalan
Differential Revision: D33952046
fbshipit-source-id: 359b8a418928c89bbeb446b44774b312c94f03bc
(cherry picked from commit 060490f667)
Summary:
This improves a dry-run of `gen.py` from 0.80s to 0.45s.
`FileManager` in `dry_run` mode doesn't actually need to compute the
environment; it just records the filenames that would have been
written.
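A rough sketch of the mechanism (heavily simplified; the real `FileManager` has a different substitution step and more options):
```python
import os
from typing import Callable, Dict, Set

class FileManager:
    def __init__(self, install_dir: str, dry_run: bool = False) -> None:
        self.install_dir = install_dir
        self.dry_run = dry_run
        self.filenames: Set[str] = set()

    def write(self, filename: str, env_callable: Callable[[], Dict[str, object]]) -> None:
        # Always record the filename so dry runs can report what would be written.
        self.filenames.add(filename)
        if self.dry_run:
            # Skip computing the (expensive) template environment entirely.
            return
        env = env_callable()
        # Stand-in for template substitution in the real implementation.
        content = "\n".join(f"// {key}: {value}" for key, value in env.items())
        with open(os.path.join(self.install_dir, filename), "w") as f:
            f.write(content)
```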
cc ezyang bhosmer bdhirsh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69805
Reviewed By: ngimel
Differential Revision: D33944912
Pulled By: albanD
fbshipit-source-id: 74f22af3f2bd5afdef7105961270198566fa91e5
(cherry picked from commit 6fcdc15954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71930
Previously `fbcode/caffe2/test/mobile/test_upgrader_bytecode_table_example.cpp` was checked in as an intermediate step to make sure the upgrader codegen works properly, before the upgrader codegen was actually being used.
This change uses `buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_codegen` to codegen `upgrader_mobile.cpp`, so we no longer need the checked-in file `test_upgrader_bytecode_table_example.cpp` for the codegen unit test.
ghstack-source-id: 147957826
Test Plan:
```
buck test mode/opt //caffe2/test:upgrader_codegen
```
Reviewed By: tugsbayasgalan
Differential Revision: D33746264
fbshipit-source-id: 18de3cae53aed966e67f8dc42976a2d10d3788b3
(cherry picked from commit 661ffa7860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71938
The `generated` keyword marks the file as generated and hides its changes in review. It's also misleading, because `gen_mobile_upgraders.py` itself is not autogenerated. Remove the keyword from `gen_mobile_upgraders.py` so that changes to it are easier to see.
ghstack-source-id: 147957825
Test Plan:
```
buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_codegen
```
Reviewed By: tugsbayasgalan
Differential Revision: D33826982
fbshipit-source-id: 593c19f8ef4c9da776b11650863dc43c0b171cd5
(cherry picked from commit 43038d5bc7)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71578
Use a more robust way of extracting the upgrader min and max versions
Test Plan: omgitsgreen
Reviewed By: cccclai
Differential Revision: D33690113
fbshipit-source-id: 79a964acb26d7ca1354e104710a285b8da3f46d1
(cherry picked from commit 9e316ee5c1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70617
This reduces the divergence between the code generated for
`create_out` across different devices, and means the `TensorOptions` don't
need to be unpacked.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D33623680
Pulled By: ngimel
fbshipit-source-id: 54f36774a8530be99c26a54270d4d95f3e38d684
(cherry picked from commit b22ba92e27)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69407
This generates aten_interned_strings.h from `native_functions.yaml`,
which is more like how it was originally done. The items deleted from
`interned_strings.h` are duplicates that need to be removed in order
for the code to compile. Some of the remaining items may still be out
of date, but that is fairly benign even if it's the case.
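Conceptually the generation step just collects unique base operator names from the parsed YAML and emits one macro entry per name; a hedged sketch (the helper and exact macro layout are illustrative):
```python
from typing import Iterable

def gen_interned_strings(operator_names: Iterable[str]) -> str:
    # Deduplicate base names, dropping overload suffixes such as "add.Tensor".
    base_names = sorted({name.split(".")[0] for name in operator_names})
    entries = " \\\n".join(f"_(aten, {name})" for name in base_names)
    return "#define FORALL_ATEN_BASE_SYMBOLS(_) \\\n" + entries

# Example:
# print(gen_interned_strings(["add.Tensor", "add.Scalar", "sum", "sum.dim_IntList"]))
```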
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D32923636
Pulled By: albanD
fbshipit-source-id: a0fd6b3714e70454c5f4ea9b19da5e047d2a4687
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70338
Today Unpickler is used by both server and mobile for deserializing models, and it always falls back to the mobile type parser when the user provides no type resolver. However, this is not intended, as the server and mobile type parsers support different things. In this diff we provide a default fallback that uses the script parser, and opt out of it for all mobile cases.
ghstack-source-id: 146727330
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: iseeyuan
Differential Revision: D33284352
fbshipit-source-id: 997c4f110b36eee6596e8f23f6a87bf91a4197ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70385
This commit syncs gen_lazy_tensor.py from the lazy_tensor_staging branch
to master.
Test Plan: CI in the lazy_tensor_staging branch.
Reviewed By: wconstab
Differential Revision: D33306232
Pulled By: alanwaketan
fbshipit-source-id: a15c72b22418637f851a6cd4901a9f5c4be75449
Summary:
Removes the internal typeshed for PyTorch and replaces it with PyTorch's own type annotations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69926
Generated files are in P471601595, P471601643, P471601662
Based on an example in D26410012
Test Plan: Sandcastle
Reviewed By: malfet, pradeep90
Differential Revision: D32292834
fbshipit-source-id: 5223f514cbdccd02c08ef0a027a48d92cdebed2c
Summary:
From the operator version map and the upgrader TorchScript, generate the upgrader_mobile.cpp file. This also includes a unit test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69194
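At a high level, the generator walks the operator version map and emits one table entry per upgrader; a hedged sketch of that step (the entry fields shown are illustrative, not the exact generated struct):
```python
from typing import Dict, List

def emit_version_map_entries(version_map: Dict[str, List[dict]]) -> str:
    # One C++ initializer entry per (operator, upgrader) pair.
    lines = []
    for op_name in sorted(version_map):
        for upgrader in version_map[op_name]:
            lines.append(
                f'  {{"{op_name}", {upgrader["min_version"]}, "{upgrader["upgrader_name"]}"}},'
            )
    return "\n".join(lines)

# Example:
# print(emit_version_map_entries({"aten::div.Tensor": [{"min_version": 3, "upgrader_name": "div_Tensor_0_3"}]}))
```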
ghstack-source-id: 145819351
Test Plan:
```
buck test mode/opt //caffe2/test:upgrader_codegen
```
```
buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_codegen
```
```
python /Users/chenlai/pytorch/tools/codegen/operator_versions/gen_mobile_upgraders.py
```
Reviewed By: iseeyuan
Differential Revision: D32748985
fbshipit-source-id: f8437766edaba459bfc5e7fc7a3ca0520c4edb9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68690
RegisterFunctionalization.cpp is a sharded file, so including only the
required operators means a single operator change requires only one
shard to be rebuilt instead of all of them.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32596275
Pulled By: albanD
fbshipit-source-id: 8b56f48872156b96fbc0a16b542b8bab76b73fd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68689
Currently Register{DispatchKey}.cpp includes all of
`NativeFunctions.h`, so any operator signature change requires every
backend registration to be recompiled. However, most backends only
have registrations for a small fraction of operators, so it makes sense
to include only the specific functions required.
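On the codegen side, this boils down to emitting one narrow include per registered operator instead of the umbrella header; a hedged sketch (the header-naming scheme below is an assumption for illustration):
```python
from typing import Iterable

def per_operator_includes(registered_ops: Iterable[str]) -> str:
    # One include per base operator name that the backend actually registers.
    base_names = sorted({op.split(".")[0] for op in registered_ops})
    return "\n".join(f"#include <ATen/ops/{name}_native.h>" for name in base_names)

# Example: a backend registering only add and mul pulls in two headers
# rather than every declaration in NativeFunctions.h.
# print(per_operator_includes(["add.Tensor", "add.out", "mul.Tensor"]))
```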
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32596273
Pulled By: albanD
fbshipit-source-id: 11d511f47937fbd5ff9f677c9914277b5d015c25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68714
This splits the static dispatch headers (e.g. `CPUFunctions.h`)
into per-operator headers (e.g. `ops/empty_cpu_dispatch.h`), which is
needed when `Tensor.h` is compiled with static dispatch enabled.
There are also several places in ATen where the static dispatch
headers are used as an optimization even in dynamic dispatch builds.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32596265
Pulled By: albanD
fbshipit-source-id: 287783ef4e35c7601e9d2714ddbc8d4a5b1fb9e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68687
This adds `NativeFunction.root_name`, which is the canonical name
for the operator group, i.e. the BaseOperatorName without inplace or
double-underscores. In the previous PR I referred to this as
`base_name`, but confusingly `BaseOperatorName` can itself
include inplace or double-underscores.
I also add the property to `NativeFunctionsGroup` so that grouped
functions of type `Union[NativeFunction, NativeFunctionsGroup]`
can have the property queried without needing `isinstance` checks.
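A minimal sketch of how the property can live on both classes so callers avoid `isinstance` checks (the fields are simplified, not the exact torchgen dataclasses):
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NativeFunction:
    base_name: str               # e.g. "add" for add, add_, __add__
    inplace: bool = False
    dunder_method: bool = False

    @property
    def root_name(self) -> str:
        # Canonical group name: the base name with no inplace underscore
        # or double-underscore decoration.
        return self.base_name

@dataclass(frozen=True)
class NativeFunctionsGroup:
    functional: NativeFunction

    @property
    def root_name(self) -> str:
        # Delegate to the functional variant so Union[NativeFunction,
        # NativeFunctionsGroup] callers can query root_name uniformly.
        return self.functional.root_name
```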
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32596271
Pulled By: albanD
fbshipit-source-id: 8b6dad806ec8d796dcd70fc664604670d668cae7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68247
This splits `Functions.h`, `Operators.h`, `NativeFunctions.h` and
`NativeMetaFunctions.h` into separate headers per operator base name.
With `at::sum` as an example, we can include:
```cpp
<ATen/ops/sum.h>        // Like Functions.h
<ATen/ops/sum_ops.h>    // Like Operators.h
<ATen/ops/sum_native.h> // Like NativeFunctions.h
<ATen/ops/sum_meta.h>   // Like NativeMetaFunctions.h
```
The umbrella headers are still being generated, but all they do is
include from the `ATen/ops` folder.
Further, `TensorBody.h` now only includes the operators that have
method variants, which means files that only include `Tensor.h` don't
need to be rebuilt when you modify function-only operators. Currently
there are about 680 operators that don't have method variants, so this
is potentially a significant win for incremental builds.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D32596272
Pulled By: albanD
fbshipit-source-id: 447671b2b6adc1364f66ed9717c896dae25fa272
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68246
Currently the codegen produces a list of output files at CMake
configuration time and the build system has no way of knowing if the
outputs change. So if that happens, you basically need to delete the
build folder and re-run from scratch.
Instead, this generates the output list every time the code generation
is run and changes the output to be a `.cmake` file that gets included
in the main cmake configuration step. That means the build system
knows to re-run cmake automatically if a new output is added. So, for
example, you could change the number of shards that `Operators.cpp` is
split into and it all just works transparently to the user.
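The mechanism can be sketched as the generator writing its output list into a `.cmake` fragment that the main configuration includes (file and variable names here are illustrative assumptions):
```python
from typing import Iterable

def write_outputs_cmake(outputs: Iterable[str], path: str = "generated_sources.cmake") -> None:
    # Emit set(GENERATED_SOURCES ...) so the main CMakeLists.txt can
    # include() this file and re-run configuration whenever it changes.
    lines = ["set(GENERATED_SOURCES"]
    lines += [f'    "{filename}"' for filename in sorted(outputs)]
    lines.append(")")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

# The build then does roughly: include(generated_sources.cmake), so adding a
# new shard of Operators.cpp re-triggers CMake configuration automatically.
```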
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D32596268
Pulled By: albanD
fbshipit-source-id: 15e0896aeaead90aed64b9c8fda70cf28fef13a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69296
remove a commented block of code that was accidentally checked in
Test Plan: no testable changes
Reviewed By: alanwaketan
Differential Revision: D32799197
fbshipit-source-id: d3eb05cbafb0f5a4a3f41c17f66ca6d0c2fc60b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69327
Original commit changeset: d44096d88265
Original Phabricator Diff: D32144240 (668574af4a)
Test Plan:
CI
original diff failed 175 builds in CI
Reviewed By: airboyang, anjali411
Differential Revision: D32809407
fbshipit-source-id: c7c8e69bcee0274992e2d5da901f035332e60071