Using the same `tools/generate_torch_version.py` script
It is already available at the Python level, but not at the C++ level.
Please note that updating the commit hash will force recompilation of fewer than 10 files, according to:
```
% touch caffe2/core/macros.h; ninja -d explain -j1 -v -n torch_python
ninja explain: output caffe2/torch/CMakeFiles/gen_torch_version doesn't exist
ninja explain: caffe2/torch/CMakeFiles/gen_torch_version is dirty
ninja explain: /Users/malfet/git/pytorch/pytorch/torch/version.py is dirty
ninja explain: output third_party/kineto/libkineto/CMakeFiles/libkineto_defs.bzl of phony edge with no inputs doesn't exist
ninja explain: third_party/kineto/libkineto/CMakeFiles/libkineto_defs.bzl is dirty
ninja explain: output caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Version.cpp.o older than most recent input /Users/malfet/git/pytorch/pytorch/build/caffe2/core/macros.h (1732301546390618881 vs 1732301802196214000)
ninja explain: caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Version.cpp.o is dirty
ninja explain: output caffe2/CMakeFiles/torch_cpu.dir/core/common.cc.o older than most recent input /Users/malfet/git/pytorch/pytorch/build/caffe2/core/macros.h (1732301546233600752 vs 1732301802196214000)
ninja explain: caffe2/CMakeFiles/torch_cpu.dir/core/common.cc.o is dirty
ninja explain: output caffe2/CMakeFiles/torch_cpu.dir/serialize/inline_container.cc.o older than most recent input /Users/malfet/git/pytorch/pytorch/build/caffe2/core/macros.h (1732301546651089243 vs 1732301802196214000)
ninja explain: caffe2/CMakeFiles/torch_cpu.dir/serialize/inline_container.cc.o is dirty
ninja explain: output caffe2/CMakeFiles/torch_cpu.dir/serialize/file_adapter.cc.o older than most recent input /Users/malfet/git/pytorch/pytorch/build/caffe2/core/macros.h (1732301546224176845 vs 1732301802196214000)
ninja explain: caffe2/CMakeFiles/torch_cpu.dir/serialize/file_adapter.cc.o is dirty
ninja explain: output caffe2/CMakeFiles/torch_cpu.dir/utils/threadpool/ThreadPool.cc.o older than most recent input /Users/malfet/git/pytorch/pytorch/build/caffe2/core/macros.h (1732301546464535054 vs 1732301802196214000)
ninja explain: caffe2/CMakeFiles/torch_cpu.dir/utils/threadpool/ThreadPool.cc.o is dirty
ninja explain: output caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/impl.cpp.o older than most recent input /Users/malfet/git/pytorch/pytorch/build/caffe2/core/macros.h (1732301550062608920 vs 1732301802196214000)
ninja explain: caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/impl.cpp.o is dirty
ninja explain: output caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/mps/MPSFallback.mm.o older than most recent input /Users/malfet/git/pytorch/pytorch/build/caffe2/core/macros.h (1732301547538843492 vs 1732301802196214000)
ninja explain: caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/mps/MPSFallback.mm.o is dirty
```
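To make the "fewer than 10 files" claim easy to re-check, here is a minimal sketch that counts the object files reported as dirty in `ninja -d explain` output. The helper name `dirty_objects` is hypothetical, and it assumes the `ninja explain: <path> is dirty` line format seen in the log above. On that log it would report seven object files.

```python
import re

# Matches lines such as "ninja explain: path/to/foo.cc.o is dirty".
# Only object files (paths ending in ".o") are counted.
_DIRTY_OBJECT = re.compile(r"^ninja explain: (?P<path>\S+\.o) is dirty$")


def dirty_objects(explain_output):
    """Return the object files ninja reports as dirty, in order of appearance."""
    found = []
    for line in explain_output.splitlines():
        match = _DIRTY_OBJECT.match(line.strip())
        if match:
            found.append(match.group("path"))
    return found


# A shortened excerpt of the log above, used as sample input.
sample = """\
ninja explain: caffe2/torch/CMakeFiles/gen_torch_version is dirty
ninja explain: caffe2/CMakeFiles/torch_cpu.dir/core/common.cc.o is dirty
ninja explain: caffe2/CMakeFiles/torch_cpu.dir/serialize/file_adapter.cc.o is dirty
"""
print(len(dirty_objects(sample)))
```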
Differential Revision: [D66468257](https://our.internmc.facebook.com/intern/diff/D66468257)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141313
Approved by: https://github.com/ezyang
Related to #103973, #110532, #108404, #94891
**Context:**
As commented in 6ae0554d11/cmake/Dependencies.cmake (L1198)
Kernel asserts are enabled by default for CUDA and disabled for ROCm.
However, this logic was somewhat broken, and kernel asserts were still enabled for ROCm.
Disabling kernel asserts is also needed for users who do not have PCIe atomics support. These community users have verified that disabling kernel asserts on the PyTorch/ROCm platform fixed their PyTorch workflows, such as a torch.sum script and stable-diffusion (see the related issues).
**Changes:**
This pull request serves the following purposes:
* Refactor and clean up the logic, making it simpler for ROCm to enable and disable kernel asserts.
* Fix the bug where kernel asserts for ROCm were not disabled by default.
Specifically,
- Renamed `TORCH_DISABLE_GPU_ASSERTS` to `C10_USE_ROCM_KERNEL_ASSERT` for the following reasons:
(1) This variable only applies to ROCm.
(2) The new name is more aligned with the `#define CUDA_KERNEL_ASSERT` function.
(3) With USE_ in front of the name, we can easily control it with environment variable to turn on and off this feature during build (e.g. `USE_ROCM_KERNEL_ASSERT=1 python setup.py develop` will enable kernel assert for ROCm build).
- Got rid of `ROCM_FORCE_ENABLE_GPU_ASSERTS` to simplify the logic and make it easier to understand and maintain
- Added `#cmakedefine` to carry over the CMake variable to C++
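As a rough illustration of what `#cmakedefine` does, the sketch below emulates in Python how CMake's `configure_file()` expands such a line: a true-valued variable yields `#define NAME`, otherwise `/* #undef NAME */`. The function `render_cmakedefine` is hypothetical, the list of false values is simplified, and the exact template line used in this PR is an assumption.

```python
import re

# Values CMake treats as false (simplified: the real rules also cover
# strings ending in "-NOTFOUND").
_FALSE_VALUES = {"", "0", "OFF", "NO", "FALSE", "N", "IGNORE", "NOTFOUND"}

_CMAKEDEFINE = re.compile(r"^#cmakedefine\s+(\w+)(.*)$")


def render_cmakedefine(template, variables):
    """Emulate how configure_file() expands `#cmakedefine NAME` lines."""
    out = []
    for line in template.splitlines():
        match = _CMAKEDEFINE.match(line)
        if match is None:
            out.append(line)  # ordinary lines pass through unchanged
            continue
        name, rest = match.group(1), match.group(2)
        value = str(variables.get(name, "")).upper()
        if value in _FALSE_VALUES:
            out.append(f"/* #undef {name} */")
        else:
            out.append(f"#define {name}{rest}")
    return "\n".join(out)


# Assumed template line, for illustration only.
template = "#cmakedefine C10_USE_ROCM_KERNEL_ASSERT"
print(render_cmakedefine(template, {"C10_USE_ROCM_KERNEL_ASSERT": "ON"}))
print(render_cmakedefine(template, {"C10_USE_ROCM_KERNEL_ASSERT": "0"}))
```

C++ code can then test the macro with `#ifdef`, which is how a build-time option ends up selecting whether the kernel assert expands to anything.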
**Tests:**
(1) Build in default mode and verify that USE_ROCM_KERNEL_ASSERT is OFF (0) and kernel assert is disabled:
```
python setup.py develop
```
Verify CMakeCache.txt has correct value.
```
/xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt
USE_ROCM_KERNEL_ASSERT:BOOL=0
```
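The same check can be scripted instead of using `grep`. This is a minimal sketch, assuming the usual `NAME:TYPE=VALUE` line format of `CMakeCache.txt`; the helper `read_cache_entry` is hypothetical and not part of PyTorch's build tooling.

```python
def read_cache_entry(cache_text, name):
    """Return (type, value) for `name` in CMakeCache.txt contents, or None."""
    for line in cache_text.splitlines():
        line = line.strip()
        # Skip blank lines and the two comment styles used in CMakeCache.txt.
        if not line or line.startswith("#") or line.startswith("//"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        entry_name, _, entry_type = key.partition(":")
        if entry_name == name:
            return entry_type, value
    return None


# Sample contents; the comment line is made up for illustration.
sample_cache = "// hypothetical description\nUSE_ROCM_KERNEL_ASSERT:BOOL=0\n"
print(read_cache_entry(sample_cache, "USE_ROCM_KERNEL_ASSERT"))
```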
Tested the following code in a ROCm build and a CUDA build, expecting different return codes.
```
subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
```
This snippet is adapted from the unit test below to work around the fact that the unit test is currently skipped for ROCm. (We will look into enabling this unit test in the future.)
```
python test/test_cuda_expandable_segments.py -k test_fixed_cuda_assert_async
```
Ran the following script, expecting r == 0 since CUDA_KERNEL_ASSERT is defined as nothing:
```
>>> import sys
>>> import subprocess
>>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
>>> r
0
```
(2) Enable the kernel assert by building with USE_ROCM_KERNEL_ASSERT=1, or USE_ROCM_KERNEL_ASSERT=ON
```
USE_ROCM_KERNEL_ASSERT=1 python setup.py develop
```
Verify `USE_ROCM_KERNEL_ASSERT` is `1`
```
/xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt
USE_ROCM_KERNEL_ASSERT:BOOL=1
```
Run the assert test and expect a return code not equal to 0.
```
>>> import sys
>>> import subprocess
>>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
>>>/xxxx/pytorch/aten/src/ATen/native/hip/TensorCompare.hip:108: _assert_async_cuda_kernel: Device-side assertion `input[0] != 0' failed.
:0:rocdevice.cpp :2690: 2435301199202 us: [pid:206019 tid:0x7f6cf0a77700] Callback: Queue 0x7f64e8400000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016
>>> r
-6
```
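On POSIX, `subprocess.call` reports a child killed by a signal as the negated signal number, so the `-6` above corresponds to `SIGABRT` on Linux. A minimal sketch for turning such return codes into readable text follows; the helper name `describe_returncode` is hypothetical.

```python
import signal


def describe_returncode(returncode):
    """Describe a subprocess return code in words (POSIX conventions)."""
    if returncode == 0:
        return "exited cleanly"
    if returncode < 0:
        # A negative value means the child was terminated by signal -returncode.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


print(describe_returncode(0))
print(describe_returncode(-signal.SIGABRT))
```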
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114660
Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/jithunnair-amd
remove unused CAFFE2_VERSION macros
Summary:
Nothing reads these and they are completely subsumed by TORCH_VERSION.
Getting rid of these will be helpful for build unification, since they
are also not used internally.
Test Plan: Rely on CI.
Reviewers: sahanp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97337
Approved by: https://github.com/malfet
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610
- Replace HIP_PLATFORM_HCC with USE_ROCM
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead.
- In the next PR
- Will be removing the mapping from CUDA_VERSION to HIP_VERSION and CUDA to HIP in hipify.
- HIP_PLATFORM_HCC is deprecated, so will add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.
cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd
Reviewed By: jbschlosser
Differential Revision: D30909053
Pulled By: ezyang
fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
Summary:
Many newly added build settings are not saved in torch.__config__. This adds them to the mix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48380
Reviewed By: samestep
Differential Revision: D25161951
Pulled By: walterddr
fbshipit-source-id: 1d3dee033c93f2d1a7e2a6bcaf88aedafeac8d31
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564
Static dispatch was originally introduced for mobile selective build.
Since we have added selective build support for dynamic dispatch and
tested it in FB production for months, we can deprecate static dispatch
to reduce the complexity of the codebase.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D23324452
Pulled By: ljk53
fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31116
Changelist:
- remove BUILD_NAMEDTENSOR macro
- remove torch._C._BUILD_NAMEDTENSOR
- remove all python behavior that relies on torch._C._BUILD_NAMEDTENSOR
Future:
- In the next diff, I will remove all usages of
ATen/core/EnableNamedTensor.h since that header doesn't do anything
anymore
- After that, we'll be done with the BUILD_NAMEDTENSOR removal.
Test Plan: - run CI
Differential Revision: D18934951
Pulled By: zou3519
fbshipit-source-id: 0a0df0f1f0470d0a01c495579333a2835aac9f5d
Summary:
Currently the build system accepts USE_NAMEDTENSOR from the environment
variable and turns it into NAMEDTENSOR_ENABLED when passing to CMake.
This discrepancy does not seem necessary and complicates the build
system. The naming of this build option is also semantically incorrect
("BUILD_" vis-a-vis "USE_"). This commit eradicates this issue before it
is made into a stable release.
The support of NO_NAMEDTENSOR is also removed, since PyTorch has been
quite inconsistent about "NO_*" build options.
---
Note: All environment variables with their names starting with `BUILD_` are currently automatically passed to CMake with no need of an additional wrapper.
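The forwarding described in the note can be pictured with a short sketch. This is not the actual build-helper code: `cmake_args_from_env` and `BUILD_EXAMPLE` are hypothetical names, and the `-DNAME=VALUE` form is simply the standard way to pass cache entries to CMake on the command line.

```python
def cmake_args_from_env(environ, prefixes=("BUILD_",)):
    """Turn matching environment variables into sorted -DNAME=VALUE arguments."""
    return [
        f"-D{name}={value}"
        for name, value in sorted(environ.items())
        if name.startswith(tuple(prefixes))
    ]


# BUILD_EXAMPLE is a made-up variable; PATH is ignored because of its prefix.
env = {"BUILD_EXAMPLE": "0", "PATH": "/usr/bin", "BUILD_NAMEDTENSOR": "1"}
print(cmake_args_from_env(env))
```

Keeping the environment name and the CMake name identical is what removes the need for a per-option wrapper.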
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22360
Differential Revision: D16074509
Pulled By: zou3519
fbshipit-source-id: dc316287e26192118f3c99b945454bc50535b2ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14664
This diff just adds a framework to add avx512 kernels.
Please be very careful about using avx512 kernels unless you are convinced that avx512 will bring good enough *overall* speedups: it can backfire because the CPU frequency goes down.
Reviewed By: duc0
Differential Revision: D13281944
fbshipit-source-id: 04fce8619c63f814944b727a99fbd7d35538eac6
Summary:
All usages of the `ndarray` construct have now been guarded with `USE_NUMPY`. This eliminates the requirement of NumPy while building PyTorch from source.
Fixes #11757
Reviewed By: Yangqing
Differential Revision: D10031862
Pulled By: SsnL
fbshipit-source-id: 32d84fd770a7714d544e2ca1895a3d7c75b3d712
Summary:
CAFFE2_UNIQUE_LONG_TYPEMETA has been a tricky variable defined only from cmake. This is an experiment to remove it and see exactly which compilers need it set.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12311
Reviewed By: dzhulgakov
Differential Revision: D10187777
Pulled By: Yangqing
fbshipit-source-id: 03e4ede4eafc291e947e0449382bc557cb624b34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12046
This /sounds/ like a good idea in theory, but a feature
like this must be implemented very carefully, because if
you just plop the Git version in a header (that is included
by every file in your project, as macros.h is), then every
time you do a 'git pull', you will do a FULL rebuild, because
macros.h is going to regenerate to a new version and of course
you have to rebuild a source file if a header file changes.
I don't have time to implement it correctly, so I'm axing
the feature instead. If you want git versions in, e.g.,
nightly builds, please explicitly specify that when you feed
in the version.
Reviewed By: pjh5
Differential Revision: D10030556
fbshipit-source-id: 499d001c7b8ccd4ef15ce10dd6591c300c7df27d
* Make ATen buildable without all Caffe2 by root cmake
* Fix typo in aten cmake
* Set BUILD_ATEN from USE_ATEN as compat
* Only set BUILD_ATEN from USE_ATEN when on
* Have USE_GLOO only set when BUILD_CAFFE2
* Add support to TensorRT
* Removed License header
* Bind input/output by position
* Comments
* More comments
* Add benchmark
* Add warning for performance degradation on large batch
* Address comments
* comments
Summary:
This will help with releasing models that use Caffe2 but have their own operator implementations and extensions. More detailed docs to arrive later. Let's see what contbuild says.
Closes https://github.com/caffe2/caffe2/pull/1378
Differential Revision: D6155045
Pulled By: Yangqing
fbshipit-source-id: 657a4c8de2f8e095bad5ed5db5b3e476b2a877e1
Summary:
Useful for figuring out with people which version they built with. We can just ask for --caffe2_version gflag or get core.build_options from python.
Also adds CMAKE_INSTALL_RPATH_USE_LINK_PATH; without it, the build was failing on my Mac. How should it be tested?
Closes https://github.com/caffe2/caffe2/pull/1271
Reviewed By: bddppq
Differential Revision: D5940750
Pulled By: dzhulgakov
fbshipit-source-id: 45b4c94f67e79346a10a65b34f40fd258295dad1
Summary:
This brings proper versioning in Caffe2: instead of manual version macros, this puts the version information in CMake (replacing the TODO bwasti line) and uses macros.h.in to then generate the version in the C++ header.
A few misc updates:
- Removed the mac os rpath, verified on local macbook that it is no longer needed.
- Misc updates for caffe2 ready:
- Mapped cmake/Cuda.cmake with gloo's setting.
- upstreamed third_party/nccl so it builds with cuda 9.
- Separated the Caffe2 cpu dependencies and cuda dependencies
- now libCaffe2_CPU.so does not depend on any cuda libs.
- caffe2 python extensions now depend on cpu and gpu separately too.
- Reduced the number of unused functions in Utils.cmake
Closes https://github.com/caffe2/caffe2/pull/1256
Reviewed By: dzhulgakov
Differential Revision: D5899210
Pulled By: Yangqing
fbshipit-source-id: 36366e47366c3258374d646cf410b5f49f95767b
Summary:
Using file(WRITE) caused the file to be rewritten for every CMake
reconfigure, which was causing unnecessary full rebuilds of the project
even when no source files changed.
The new strategy has the added benefit of enforcing that the macros.h file
is always generated correctly. When the main project relies on this
header for macro definitions (instead of relying on add_definitions()),
we can be more confident that the project will build correctly when used
as a library (which is the whole point of the macros.h file).
Upsides:
* No more unnecessary rebuilds
* Higher confidence that the project will compile properly as a third-party library
Downsides:
* Developers need to add an entry to `macros.h.in` whenever they would have added a new definition with `add_definitions()`
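The rebuild problem comes down to timestamps: rewriting a header with identical content still bumps its modification time, which dirties every file that includes it. A minimal Python sketch of the "only write when the content changes" idea follows. It illustrates the principle only, assuming the new strategy behaves this way; `write_if_changed` and `EXAMPLE_VERSION` are hypothetical names, not part of the build system.

```python
import os
import tempfile


def write_if_changed(path, content):
    """Write `content` to `path` only when it differs; return True if written."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            if f.read() == content:
                return False  # leave the file and its mtime untouched
    except FileNotFoundError:
        pass  # first generation: fall through and create the file
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return True


with tempfile.TemporaryDirectory() as tmp:
    header = os.path.join(tmp, "macros.h")
    first = write_if_changed(header, "#define EXAMPLE_VERSION 1\n")
    second = write_if_changed(header, "#define EXAMPLE_VERSION 1\n")
    third = write_if_changed(header, "#define EXAMPLE_VERSION 2\n")
    print(first, second, third)
```

Only the first and third calls touch the file, so a reconfigure that produces the same header leaves dependent object files up to date.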
Closes https://github.com/caffe2/caffe2/pull/1103
Differential Revision: D5680367
Pulled By: Yangqing
fbshipit-source-id: 4db29c28589efda1b6a3f5f88752e3984260a0f2