Summary:
When enabled, it will generate the `torch_cuda_linalg` library, which depends on cuSOLVER and MAGMA and registers dynamic bindings to it from `LinearAlgebraStubs`.
Avoid symbol clashes that could result in infinite recursion by moving all symbols in the library into their own namespace.
Add checks to `LinearAlgebraStubs.cpp` that should prevent it from recursively calling itself.
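For illustration, the lazy-binding pattern this describes looks roughly like the sketch below; all names here are hypothetical, and the real stubs live in `LinearAlgebraStubs.cpp`:
```
// Hypothetical sketch only: load the separate linalg library on first use so its
// registration code can replace the stub entries, and guard against the stub
// recursing into itself if that registration never happens.
#include <atomic>
#include <stdexcept>

namespace lazy_linalg {

std::atomic<bool> loading{false};

void load_torch_cuda_linalg() {
  // The real code would dlopen() libtorch_cuda_linalg here, whose static
  // initializers register the cuSOLVER/MAGMA-backed kernels.
}

void cholesky_stub() {
  if (loading.exchange(true)) {
    // Re-entered after a load attempt: fail loudly instead of recursing forever.
    throw std::runtime_error("torch_cuda_linalg did not register its kernels");
  }
  load_torch_cuda_linalg();
  // Re-dispatch to the now-registered implementation (omitted in this sketch).
}

}  // namespace lazy_linalg
```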
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73447
Reviewed By: albanD
Differential Revision: D34538827
Pulled By: malfet
fbshipit-source-id: f2535b471d3524768a84b2e169b6aa24c26c03bf
(cherry picked from commit 4ec24b079c861c1122f0fa86e280b977c3c2f7ac)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72306
When enabled, it will generate the `torch_cuda_linalg` library, which depends on cuSOLVER and MAGMA and registers dynamic bindings to it from `LinearAlgebraStubs`.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D33992795
Pulled By: malfet
fbshipit-source-id: d1fa351a320659b29754997c20d754e69bfe36c0
(cherry picked from commit d5d6c69a988b9454538ecd28674206da2541de17)
Summary:
Make `TORCH_CUDABLAS_CHECK` and `TORCH_CUSOLVER_CHECK` available in custom extensions by exporting the internal functions called by both macros.
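As a hedged usage sketch, an extension can now wrap its own cuBLAS calls in the same macro ATen uses (the header path is an assumption based on where the macros are declared):
```
#include <ATen/cuda/Exceptions.h>
#include <cublas_v2.h>

// Failed cuBLAS calls raise a c10::Error instead of being silently ignored.
void create_and_destroy_handle() {
  cublasHandle_t handle = nullptr;
  TORCH_CUDABLAS_CHECK(cublasCreate(&handle));
  TORCH_CUDABLAS_CHECK(cublasDestroy(handle));
}
```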
Rel: https://github.com/pytorch/pytorch/issues/67073
cc xwang233 ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67161
Reviewed By: jbschlosser
Differential Revision: D31984694
Pulled By: ngimel
fbshipit-source-id: 0035ecd1398078cf7d3abc23aaefda57aaa31106
Summary:
This PR implements the necessary hooks/stubs/enums/etc for complete ONNX Runtime (ORT) Eager Mode integration. The actual extension will live out of tree at https://github.com/pytorch/ort.
We have been [working on this at Microsoft](https://github.com/microsoft/onnxruntime-pytorch/tree/eager-ort/torch_onnxruntime) for the last few months, and are finally ready to contribute the PyTorch core changes upstream (nothing major or exciting, just the usual boilerplate for adding new backends).
The ORT backend will allow us to ferry [almost] all torch ops into granular ONNX kernels that ORT will eagerly execute against any devices it supports (therefore, we only need a single ORT backend from a PyTorch perspective).
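For context, the per-backend registration boilerplate looks roughly like the sketch below; the dispatch key and kernel here are placeholders (PrivateUse1 and a no-op), not the actual ORT key and hooks this PR wires up:
```
#include <torch/library.h>

// Placeholder kernel: the real extension would hand the op to an ONNX Runtime kernel.
at::Tensor ort_add(const at::Tensor& self, const at::Tensor& other,
                   const at::Scalar& alpha) {
  return self;
}

TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) {
  m.impl("add.Tensor", ort_add);
}
```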
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58248
Reviewed By: astaff
Differential Revision: D30344992
Pulled By: albanD
fbshipit-source-id: 69082b32121246340d686e16653626114b7714b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61104
This patch adds a new test case for findDanglingImpls. The test case introduces a C++ extension with a dangling impl so that findDanglingImpls can find it and report its information.
Test Plan:
python test/test_dispatch.py TestDispatch.test_find_dangling_impls_ext
Imported from OSS
Reviewed By: ezyang
Differential Revision: D29512520
fbshipit-source-id: 6883fb8f065f2c0ae0e7a1adf6fd298591497e2b
Summary:
The function name and the return type are both called `class_`, so they are ambiguous; this is UB and does not work on NVCC. See the tests for the failure case.
Thanks for the help of Thibaut Lutz from NVIDIA's compiler team.
cc: yueyericardo ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57962
Reviewed By: mruberry
Differential Revision: D28359400
Pulled By: ezyang
fbshipit-source-id: c64ec89203f99f656611aba34f7424eed7bc9e7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53583
`Scalar` takes 32 bytes because `c10::complex<double>` requires 16-byte alignment. Passing `Scalar` by reference shows about a 1% improvement in instruction count.
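For illustration, the change amounts to the following (using `add` as an example; the real signatures come out of the codegen):
```
#include <ATen/ATen.h>

// Before: the 32-byte Scalar is copied at every call site.
at::Tensor add_by_value(const at::Tensor& self, const at::Tensor& other, at::Scalar alpha);

// After: only a reference crosses the call boundary.
at::Tensor add_by_ref(const at::Tensor& self, const at::Tensor& other, const at::Scalar& alpha);
```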
All the changes in this commit are codemoded except for the following four files (which generate the signatures):
```
tools/codegen/api/cpp.py
tools/codegen/api/native.py
tools/codegen/api/structured.py
caffe2/contrib/aten/gen_op.py
```
# Codemod
## Main Step
For the codemod part, here is the main command used:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
```
As you can tell, it codemods both `Scalar` and `optional<Scalar>`. Apply these commands iteratively until reaching a fixed point (since one method signature might contain multiple `Scalar` parameters).
In retrospect, excluding `third_party` and `torch/csrc/jit` would have been a good idea. (I reverted those manually later; see https://github.com/pytorch/pytorch/pull/53479 as a reference.)
## Pre-Step
Prior to applying the main command, since some `Scalar` occurrences are written as `at::Scalar` or `c10::Scalar`, I codemoded some of them in advance. Here is an incomplete list:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
```
## Fixup
There are a couple of post-codemod fixups. For example, `const Scalar` gets codemoded into `const const Scalar&`, and `at::Scalar` gets codemoded into `at::const Scalar&` (if the pre-step is not done comprehensively). Here is an incomplete list:
```
fastmod --extensions cpp 'const const Scalar' 'const Scalar'
fastmod --extensions h 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod --extensions cpp 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod 'at::const Scalar&' 'const at::Scalar&'
```
## Supplementary
`cu` and `mm` files also need to be codemoded, for example:
```
fastmod --extensions cu 'at::const Scalar&' 'const at::Scalar&'
fastmod --extensions mm '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
```
Function pointer signatures are not covered by the commands above and need their own passes. Here is an incomplete list:
```
# Cover case: using index_fill_fn = void(*)(TensorIterator & iter, int64_t dim, int64_t self_dim_size, int64_t self_dim_stride, Scalar source);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
# Cover case: using softplus_fn = void (*)(TensorIterator&, Scalar, Scalar);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions cpp '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)optional<Scalar>([, \)])' '${1}const optional<Scalar>&${2}'
```
Some corner cases need to be fixed manually.
ghstack-source-id: 123970306
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D26904445
fbshipit-source-id: 8d8a002af4b5125f153a32f03c6956be7ae5671d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53037
As remarked in #52277, it is easy to give an (inefficient, due to extra redispatches) DefaultBackend implementation of foo and foo_ in terms of foo_out. This patch enables code generation for DefaultBackend in these cases by default for all structured kernels. You can see the payoff in the MSNPU extension: it only has to register a kernel for add.out, and it gets the add and add_ kernels automatically.
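Conceptually the generated fallback behaves like the sketch below (not the actual generated code):
```
#include <ATen/ATen.h>

// A backend that only registered add.out gets the functional and in-place variants
// expressed in terms of it, at the cost of one extra redispatch each.
at::Tensor add_fallback(const at::Tensor& self, const at::Tensor& other,
                        const at::Scalar& alpha) {
  at::Tensor out = at::empty({0}, self.options());
  at::add_out(out, self, other, alpha);
  return out;
}

at::Tensor& add__fallback(at::Tensor& self, const at::Tensor& other,
                          const at::Scalar& alpha) {
  return at::add_out(self, self, other, alpha);
}
```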
The actual code changes are very modest:
- For DefaultBackend, call the dispatched functions (not direct `native::` calls) to allocate tensors, change the device guard, etc.
- Don't call impl() for DefaultBackend (as it doesn't exist); instead, directly generate a call to at::foo_out to do the actual work.
- Do NOT generate a DefaultBackend implementation for foo_out. Actually, there is a case to be made for this being a good idea with more infra; see the comments inside.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D26731225
Pulled By: ezyang
fbshipit-source-id: 939da7cb69f694722ec293e5e42e74a755dd0985
Summary:
## Rationale
While most of the `torch.Generator` properties and methods are implemented as thin wrappers around the corresponding `at::Generator` methods, `torch.Generator.get_state()` and `torch.Generator.set_state()` are implemented in legacy Torch code and are not dispatched through the `c10::GeneratorImpl` interface. This is not structured well and makes implementing generators for new backends (e.g. `XLAGeneratorImpl` for the XLA backend) inconvenient. As such, this pull request moves these generator state APIs to c10 and ATen.
## What is being refactored?
* Interfaces
- Added `c10::GeneratorImpl::set_state` and `c10::GeneratorImpl::state` for getting and setting the internal state of a random number generator.
- `at::Generator::set_state` and `at::Generator::state` wraps the above-mentioned APIs, as it's basically a PIMPL.
- Added helper function `at::detail::check_rng_state` for checking the validity of a new RNG state tensor.
* CPU Generator
- Renamed and moved `THTensor_(setRNGState)` and `THTensor_(getRNGState)` to `CPUGeneratorImpl::set_state` and `CPUGeneratorImpl::state`.
- Renamed and moved `THGeneratorState` and `THGeneratorStateNew` to `CPUGeneratorStateLegacy` and `CPUGeneratorState`.
* CUDA Generator
- Renamed and moved `THCRandom_setRNGState` and `THCRandom_getRNGState` to `CUDAGeneratorImpl::set_state` and `CUDAGeneratorImpl::state`.
* PyTorch Bindings
- `THPGenerator_setState` and `THPGenerator_getState` now simply forward to `at::Generator::set_state` and `at::Generator::state`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49589
Reviewed By: H-Huang
Differential Revision: D25785774
Pulled By: pbelevich
fbshipit-source-id: 8ed79209c4ffb1a0ae8b19952ac8871ac9e0255f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49220
Since all ops are c10-full, we can remove .impl_UNBOXED now.
This also removes the ability of KernelFunction or CppFunction to store unboxedOnly kernels.
ghstack-source-id: 119450489
Test Plan: waitforsandcastle
Reviewed By: ezyang
Differential Revision: D25490225
fbshipit-source-id: 32de9d591e6a842fe18abc82541580647e9cfdad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49145
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49105
(1) Add a safety check `C10_CUDA_KERNEL_LAUNCH_CHECK()` after each kernel launch. This diff only changes the files inside the directories /fbsource/fbcode/caffe2/modules/, /fbsource/fbcode/caffe2/fb/, and /fbsource/fbcode/caffe2/test/.
(2) Get rid of the old `AT_CUDA_CHECK(cudaGetLastError())` check where necessary.
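The pattern being applied is roughly the following (kernel and launch parameters are placeholders):
```
#include <c10/cuda/CUDAException.h>

__global__ void my_kernel(float* out, int n) { /* ... */ }

void launch_my_kernel(float* out, int n, cudaStream_t stream) {
  my_kernel<<<(n + 255) / 256, 256, 0, stream>>>(out, n);
  // Replaces a trailing AT_CUDA_CHECK(cudaGetLastError()) and catches launch
  // errors even where no explicit check existed before.
  C10_CUDA_KERNEL_LAUNCH_CHECK();
}
```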
Test Plan:
Test build:
```
buck build mode/dev-nosan //caffe2/modules/detectron:
buck test mode/dev-nosan //caffe2/modules/detectron:
buck build mode/dev-nosan //caffe2/torch/fb/:
buck test mode/dev-nosan //caffe2/torch/fb/:
```
To check for launches without checks:
```
python3 caffe2/torch/testing/check_kernel_launches.py
```
Make sure none of the updated files are in the returned list.
Reviewed By: r-barnes
Differential Revision: D25452852
fbshipit-source-id: d6657edab612c9e0fa99b29c68460be8b1a20064
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49105
(1) Add a safety check `C10_CUDA_KERNEL_LAUNCH_CHECK()` after each kernel launch. This diff only changes the files inside the directories /fbsource/fbcode/caffe2/modules/, /fbsource/fbcode/caffe2/fb/, and /fbsource/fbcode/caffe2/test/.
(2) Get rid of the old `AT_CUDA_CHECK(cudaGetLastError())` check where necessary.
Test Plan:
Test build:
```
buck build //caffe2/modules/detectron:
buck build //caffe2/torch/fb/:
```
To check for launches without checks:
```
python3 caffe2/torch/testing/check_kernel_launches.py
```
Make sure none of the updated files are in the returned list.
Reviewed By: r-barnes
Differential Revision: D25325039
fbshipit-source-id: 2043d6e63c7d029c35576d3101c18247ffe92f01
Summary:
[Refiled version of earlier PR https://github.com/pytorch/pytorch/issues/45451]
This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, not for PyTorch or Caffe2 itself.
Correspondingly, changes are made to cpp_extension.py to match these improvements.
The list of improvements to hipify is as follows:
1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with a minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether "" or <> is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from hipify function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update cuda_to_hip_mappings.py to account for the ROCm component subdirectories inside /opt/rocm/include. This also results in cleanup of the Caffe2_HIP_INCLUDE path to remove unnecessary additions to the include path.
The list of changes to cpp_extension.py is as follows:
1. Call hipify when building a CUDAExtension for ROCm.
2. Prune the list of source files passed to CUDAExtension to include only the hipified versions of any source files in the list (if both the original and hipified versions of a source file are in the list).
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically.
cc jeffdaily sunway513 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48715
Reviewed By: bdhirsh
Differential Revision: D25272824
Pulled By: ezyang
fbshipit-source-id: 8bba68b27e41ca742781e1c4d7b07c6f985f040e
Summary:
This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, **not for PyTorch or Caffe2 itself**.
Correspondingly, changes are made to `cpp_extension.py` to match these improvements.
The list of improvements to hipify is as follows:
1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with a minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether `""` or `<>` is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from `hipify` function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update `cuda_to_hip_mappings.py` to account for the ROCm component subdirectories inside `/opt/rocm/include`. This also results in cleanup of the `Caffe2_HIP_INCLUDE` path to remove unnecessary additions to the include path.
The list of changes to `cpp_extension.py` is as follows:
1. Call `hipify` when building a CUDAExtension for ROCm.
2. Prune the list of source files passed to CUDAExtension to include only the hipified versions of any source files in the list (if both the original and hipified versions of a source file are in the list).
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically.
cc jeffdaily sunway513 hgaspar lcskrishna ashishfarmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45451
Reviewed By: ezyang
Differential Revision: D24924736
Pulled By: malfet
fbshipit-source-id: 4af42b8ff4f21c3782dedb8719b8f9f86b34bd2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46092
Make `empty` c10-full without using the hacky wrapper, i.e. port the kernel to the new-style signature.
This PR also changes the signature of some helpers called by empty to the new style.
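For reference, "new style" here means the scattered-optional signature rather than a gathered TensorOptions; a rough sketch (function names are placeholders):
```
#include <ATen/ATen.h>

// Old style (hacky-wrapper era): options arrive gathered in TensorOptions.
at::Tensor empty_old_style(at::IntArrayRef size, const at::TensorOptions& options,
                           c10::optional<at::MemoryFormat> memory_format);

// New style (c10-full): every option is a separate c10::optional argument.
at::Tensor empty_new_style(at::IntArrayRef size,
                           c10::optional<at::ScalarType> dtype,
                           c10::optional<at::Layout> layout,
                           c10::optional<at::Device> device,
                           c10::optional<bool> pin_memory,
                           c10::optional<at::MemoryFormat> memory_format);
```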
ghstack-source-id: 116544203
(Note: this ignores all push blocking failures!)
Test Plan:
vs prev diff (outdated, before c10::optional fix): https://www.internalfb.com/intern/fblearner/details/224735103/
after c10::optional fix:
https://www.internalfb.com/intern/fblearner/details/231391773/
Also, after the c10::optional fix, the instruction counting benchmark shows a 2% regression for calling empty from Python. We decided this is acceptable and decided against landing D24425836 which would fix the regression.
Reviewed By: ezyang
Differential Revision: D24219944
fbshipit-source-id: e554096e90ce438c75b679131c3151ff8e5c5d50
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45181
`init_process_group` and `new_group` update a bunch of global variables after initializing the actual process group. As a result, there is a race: after initializing the process group on, say, rank 0, if we immediately check the default process group on rank 1 (say via RPC), we might actually get an error since rank 1 hasn't yet updated its _default_pg variable.
To resolve this issue, I've added barrier() at the end of both of these calls.
This ensures that once these calls return we are guaranteed about correct
initialization on all ranks.
Since these calls are usually done mostly during initialization, it should be
fine to add the overhead of a barrier() here.
Closes: https://github.com/pytorch/pytorch/issues/40434, https://github.com/pytorch/pytorch/issues/40378
ghstack-source-id: 112923112
Test Plan:
Reproduced the failures in
https://github.com/pytorch/pytorch/issues/40434 and
https://github.com/pytorch/pytorch/issues/40378 and verified that this PR fixes
the issue.
Reviewed By: mrshenli
Differential Revision: D23858025
fbshipit-source-id: c4d5e46c2157981caf3ba1525dec5310dcbc1830
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41610
Previously, operators that have a `Tensor?` (i.e. optional tensor) in their schema implemented it using `Tensor` in C++ and filled in an undefined tensor for the None case.
The c10 operator library, however, expects `Tensor?` to be represented as `optional<Tensor>`, so those operators couldn't be c10-full yet and still had to use codegenerated unboxing instead of templated unboxing.
This PR changes that. It extends the `hacky_wrapper_for_legacy_signatures` to not only take care of TensorOptions, but now also map between signatures taking `Tensor` and `optional<Tensor>`.
For this, it requires an additional template parameter, the expected signature, and it uses that to go argument-by-argument and unwrap any optionals it finds.
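A sketch of the two conventions (the kernel names and the op itself are made up for illustration):
```
#include <ATen/ATen.h>

// Legacy convention: `Tensor?` arrives as a possibly-undefined Tensor.
at::Tensor legacy_kernel(const at::Tensor& self, const at::Tensor& maybe_weight) {
  return maybe_weight.defined() ? self * maybe_weight : self;
}

// c10-full convention: `Tensor?` arrives as c10::optional<Tensor>; the extended
// hacky wrapper maps between the two so old kernels keep working.
at::Tensor c10_full_kernel(const at::Tensor& self,
                           const c10::optional<at::Tensor>& maybe_weight) {
  return maybe_weight.has_value() ? self * *maybe_weight : self;
}
```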
ghstack-source-id: 108873701
Test Plan: waitforsandcastle
Reviewed By: bhosmer
Differential Revision: D22607879
fbshipit-source-id: 57b2fb01a294b804f82cd55cd70f0ef4a478e14f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40944
This stack adds a Work-level timeout for blocking wait.
This PR just changes the API so that the wait function in each ProcessGroup backend accepts a timeout argument with a default value. The ProcessGroup superclass correctly honors the given timeout by changing the CV wait to wait_for.
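The gist of the superclass change, as a standalone sketch (the class, member names, and the zero-means-no-timeout convention are assumptions, not the exact ProcessGroup code):
```
#include <chrono>
#include <condition_variable>
#include <mutex>

struct WorkSketch {
  // Blocking wait that honors a timeout by using wait_for instead of an
  // unbounded condition-variable wait.
  bool wait(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (timeout == std::chrono::milliseconds::zero()) {
      cv_.wait(lock, [this] { return completed_; });
      return true;
    }
    return cv_.wait_for(lock, timeout, [this] { return completed_; });
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  bool completed_ = false;
};
```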
Closes: https://github.com/pytorch/pytorch/issues/37571
ghstack-source-id: 107835735
Test Plan: Tests in 4th PR in this stack
Reviewed By: jiayisuse
Differential Revision: D22107135
fbshipit-source-id: b38c07cb5e79e6c86c205e580336e7918ed96501
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39459
Update to this PR: this code isn't going to fully solve https://github.com/pytorch/pytorch/issues/37010. The changes required for 37010 are more than this PR initially planned. Instead, this PR switches the op registration of RNG-related tests to the new API (similar to what was done in #36925).
Test Plan:
1) unit tests
Imported from OSS
Reviewed By: ezyang
Differential Revision: D22264889
fbshipit-source-id: 82488ac6e3b762a756818434e22c2a0f9cb9dd47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39492
This PR adds use_c10_dispatcher: full to ops taking TensorOptions. To allow this, since the c10 operator library doesn't know about TensorOptions, we need to register the operator kernels as taking optional<ScalarType>, optional<Device>, optional<Layout>, optional<bool> instead, and also call them this way.
Changes:
- Add use_c10_dispatcher: full to those ops.
- Write hacky_wrapper_for_legacy_signatures, which takes an old-style kernel (i.e. one written to take TensorOptions) and creates a wrapper kernel for it that takes the scattered optional<ScalarType>, optional<Device>, optional<Layout>, optional<bool> instead (see the sketch after this list).
- Change codegen so that all op registrations are wrapped into hacky_wrapper_for_legacy_signatures. This is added to all ops but is a no-op if the op doesn't take TensorOptions. This allows us in the future to just change a kernel signature from TensorOptions to the scattered version and have it work without having to touch codegen.
- Change codegen so that the frontend calls those operators with expanded arguments instead of with a TensorOptions object. This is required because the kernels are now written this way.
This PR does not remove TensorOptions special cases from codegen, but instead separates kernels from the codegen/frontend issues. After this, kernels can be worked on separately without having to touch codegen, and codegen can be worked on without having to touch kernels.
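A conceptual sketch of what the wrapper does (the real implementation is template machinery; names here are placeholders):
```
#include <ATen/ATen.h>

// Legacy kernel written against TensorOptions.
at::Tensor legacy_empty(at::IntArrayRef size, const at::TensorOptions& options) {
  return at::empty(size, options);
}

// What the wrapper effectively exposes to the dispatcher: scattered optionals,
// re-gathered into a TensorOptions before calling the legacy kernel.
at::Tensor wrapped_empty(at::IntArrayRef size,
                         c10::optional<at::ScalarType> dtype,
                         c10::optional<at::Layout> layout,
                         c10::optional<at::Device> device,
                         c10::optional<bool> pin_memory) {
  return legacy_empty(size, at::TensorOptions()
                                .dtype(dtype)
                                .layout(layout)
                                .device(device)
                                .pinned_memory(pin_memory));
}
```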
Codegen diff: P133121032
ghstack-source-id: 106426630
Test Plan: waitforsandcastle
Differential Revision: D21581908
fbshipit-source-id: 6d4a9f526fd70fae40581bf26f3ccf794ce6a89e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40251
Rather than segfaulting, we should show a good error message when the Return type or Args types in op.call<Return, Args...>(...) mismatch the kernel.
This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.
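The check amounts to something like this standalone sketch (not the dispatcher's actual code):
```
#include <stdexcept>
#include <typeindex>
#include <typeinfo>

// Compare the function type the caller asked for against the type the kernel was
// registered with, and error out instead of calling through a mismatched cast.
template <class RequestedFuncType>
void check_kernel_signature(std::type_index registered) {
  if (std::type_index(typeid(RequestedFuncType)) != registered) {
    throw std::runtime_error(
        "Tried to call an operator with a signature that doesn't match its kernel");
  }
}
```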
ghstack-source-id: 106194240
Test Plan: waitforsandcastle
Differential Revision: D22126701
fbshipit-source-id: 6c908a822e295757bcc0014f78f51e6a560f221f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38361
Rather than segfaulting, we should show a good error message when the Return type or Args types in op.call<Return, Args...>(...) mismatch the kernel.
This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.
supersedes D17485438
ghstack-source-id: 106178820
Test Plan: waitforsandcastle
Differential Revision: D21534052
fbshipit-source-id: 6be436a3f20586277a051d764af29e21d5567da0