Test the generic torch.Stream/Event with the fake device guard and hooks. Since we added a fake device backend, it is mutually exclusive with other backends. Tests will be skipped if TEST_CUDA or TEST_ROCM is true.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123614
Approved by: https://github.com/albanD
ghstack dependencies: #123611, #123612
This PR proposes to use std::optional<Generator>& for underlying functions to avoid unnecessary copy and move operations. The torchgen code was changed to generate the new type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120076
Approved by: https://github.com/malfet
1) add operand and get_dim_names APIs;
2) set will_resize to true when the output tensor is undefined;
3) add abs_stub for the dummy device, with the computation performed on the CPU (see the sketch below);
4) support strided copy for the dummy device.
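As a rough illustration of item 3, here is a minimal, hypothetical sketch of an abs kernel for a PrivateUse1 ("dummy") device that falls back to the CPU for the actual computation. The kernel name and fallback strategy are illustrative only, not the code from this PR; the registration macros are the standard torch/library.h API.
```
// Hypothetical abs kernel for a dummy PrivateUse1 device: compute on CPU, copy back.
#include <ATen/ATen.h>
#include <torch/library.h>

at::Tensor dummy_abs(const at::Tensor& self) {
  at::Tensor cpu_result = at::abs(self.cpu());  // do the math on the CPU
  return cpu_result.to(self.device());          // move the result back to the dummy device
}

TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) {
  m.impl("abs", dummy_abs);
}
```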
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120792
Approved by: https://github.com/ezyang
Fixes https://github.com/pytorch/pytorch/issues/102970. See the comment [here](https://github.com/pytorch/pytorch/issues/102970#issuecomment-1577223773) for details.
We normally treat "outputs that alias inputs" specially in AOTAutograd, by replaying the views at runtime, instead of baking them into the graph. For views that are part of custom autograd functions though, we can't do that view-replay, since it will clobber the backwards function that the user specified in their custom autograd.Function.
Right now in this PR, I distinguish between "aliased inputs that are normal views" vs. "aliased inputs that are views that came from an autograd.Function call" by checking the output's `.grad_fn` field, to see if it inherits from our custom CBackward function class. Then I added a new `OutputType` enum value that we effectively treat the "normal" way (the same way that we treat ordinary, non-aliased outputs). The new enum value is mostly for debugging, so we can print it and know that our graph had custom autograd.Function aliased outputs in it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102992
Approved by: https://github.com/ezyang, https://github.com/zou3519
As the title says, add context support for custom devices, along with a test case.
In the future, we may want to refactor these hooks for different devices to unify the APIs; would you agree with this idea? @albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105056
Approved by: https://github.com/albanD
Add serialization logic for backend metadata to tensor serialization, implemented through custom registration functions.
In #97429, the BackendMeta structure was added to TensorImpl, and we think this information may also need to be serialized for custom backends.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99808
Approved by: https://github.com/ezyang, https://github.com/huydhn
For the scenario where users inherit from StorageImpl to implement their own subclasses, the current storage creation path cannot correctly create storage objects.
Following the registration approach used for Allocator, extend the StorageImpl creation path so that users can register their own custom StorageImpl creation function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100237
Approved by: https://github.com/albanD
Fixes #99326
Support storage pin_memory and is_pinned for custom devices by calling the dispatched tensor operations.
@ezyang this PR is what we discussed in issue #99326; would you please take a moment to review it? Thanks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99712
Approved by: https://github.com/ezyang
Why?
* To reduce the latency of hot path in https://github.com/pytorch/pytorch/pull/97377
Concern - I had to add `set_offset` in all instances of `GeneratorImpl`. I don't know if there is a better way.
~~~~
import torch
torch.cuda.manual_seed(123)
print(torch.cuda.get_rng_state())
torch.cuda.set_rng_state_offset(40)
print(torch.cuda.get_rng_state())
tensor([123, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0], dtype=torch.uint8)
tensor([123, 0, 0, 0, 0, 0, 0, 0, 40, 0, 0, 0, 0, 0,
0, 0], dtype=torch.uint8)
~~~~
Reland of https://github.com/pytorch/pytorch/pull/98965
(cherry picked from commit 8214fe07e8)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99565
Approved by: https://github.com/anijain2305
Currently, storage only supports a subset of backends. We want storage to be creatable on a custom backend via the PrivateUse1 key.
This PR also provides easy automatic generation of storage-related attributes.
When the user registers a new backend, the corresponding methods and attributes can be generated automatically.
Run the following code:
`torch.utils.rename_privateuse1_backend('foo')`
`torch.utils.generate_storage_for_privateuse1_backend()`
Then the following methods and attributes become available:
`torch.TypedStorage.is_foo`
`torch.TypedStorage.foo()`
`torch.UntypedStorage.is_foo`
`torch.UntypedStorage.foo()`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98478
Approved by: https://github.com/albanD
The strategy is that we will heap allocate a LargeNegativeIntSymNodeImpl whenever we have a large negative int, so that we can keep the old `is_symbolic` test (now called `is_heap_allocated`) on SymInt. Whenever we need to do something with these ints, though, we convert them back into a plain `int64_t` (and then, e.g., wrap it in whatever user specificed SymNodeImpl they need.) We cannot wrap directly in the user specified SymNodeImpl as we generally do not know what the "tracing context" is from C++. We expect large negative ints to be rare, so we don't apply optimizations like singleton-ifying INT_MIN. Here's the order to review:
* c10/core/SymInt.h and cpp
* `is_symbolic` renamed to `is_heap_allocated` as I needed to audit all use sites: the old `is_symbolic` test would return true for large negative int, but it would be wrong to then try to dispatch on the LargeNegativeIntSymNodeImpl which supports very few operations. In this file, I had to update expect_int,
* If you pass in a large negative integer, we instead heap allocate it in `promote_to_negative`. The function is written in a funny way to keep compact constructor code for SymInt (the heap allocation happens out of line)
* clone is now moved out-of-line
* New method maybe_as_int which will give you a constant int if it is possible, either because it's stored inline or in LargeNegativeIntSymNodeImpl. This is the preferred replacement for the previous pattern of is_symbolic() followed by as_int_unchecked(); a usage sketch follows after this list.
* Rename toSymNodeImpl to toSymNode, which is more correct (since it returns a SymNode)
* Complete rewrite of `normalize_symints.cpp` to use the new `maybe_as_int`. I could not easily reuse the old code structure, so it's now done with a macro, typing out each case manually (it's actually not that bad.)
* Reimplementations of all the unary operators by hand to use `maybe_as_int`, relatively simple.
* c10/core/LargeNegativeIntSymNodeImpl.h - Just stores a int64_t value, but it has to be big and negative. Most methods are not implemented, since we will rewrap the large negative int in the real SymNodeImpl subclass before doing operations with it
* The rest of the files are just rewriting code to use `maybe_as_int`. There is a nontrivial comment in c10/core/SymIntArrayRef.h
Very minor test adjustment in c10/test/core/SymInt_test.cpp . Plan to exercise this properly in next PR.
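To make the `maybe_as_int` pattern mentioned above concrete, here is a minimal sketch; the helper function is illustrative, not part of this PR.
```
#include <c10/core/SymInt.h>

// Illustrative helper: prefer maybe_as_int() over is_symbolic()/as_int_unchecked().
int64_t constant_or(const c10::SymInt& s, int64_t fallback) {
  if (auto constant = s.maybe_as_int()) {  // c10::optional<int64_t>
    return *constant;                      // stored inline or in LargeNegativeIntSymNodeImpl
  }
  return fallback;                         // truly symbolic: no constant available
}
```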
Companion XLA PR: https://github.com/pytorch/xla/pull/4882
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99157
Approved by: https://github.com/albanD
Fixes #97593
A new extension mechanism has been added.
When the user registers a new backend, the corresponding methods and attributes can be automatically generated.
Run the following code:
`torch.utils.rename_privateuse1_backend('foo')`
`torch.utils.generate_for_privateuse1_backend()`
Then the following methods and attributes become available:
`torch.Tensor.is_foo`
`torch.Tensor.foo()`
`torch.nn.Module.foo()`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98066
Approved by: https://github.com/albanD
Add a PrivateUse1 folder to contain all the feature adaptations for PrivateUse1 under ATen. For example, GetGeneratorPrivate is used by third-party backends to register their own Generator implementation. This makes it easier for us to centrally manage these features, and it will make adaptation more convenient for different backend vendors. For more info: https://github.com/pytorch/pytorch/issues/98073
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98127
Approved by: https://github.com/bdhirsh
Headers under torch/csrc/distributed may be referenced with a relative path, e.g., "<c10d/...>". However, relative paths cannot be gracefully handled by the Meta internal build when the NCCL PG is hipified to support AMD/RCCL, because the hipified header files are generated in other directories. Moreover, using absolute paths for header inclusion is the state of the art in most components of PyTorch. Thus, this patch refactors all header paths in torch/csrc/distributed to be absolute.
See D39835774 for more details about Meta internal complication.
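For illustration, the refactor amounts to rewriting includes along these lines (a hedged before/after sketch; the particular header is just an example):
```
// Before: relative include, which breaks when the header is hipified into another directory
// #include <c10d/ProcessGroup.hpp>

// After: absolute include rooted at the repository
#include <torch/csrc/distributed/c10d/ProcessGroup.hpp>
```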
**How to test**: commit 9e5d199 removes -I./torch/csrc/distributed in compile options. Thus use it to verify we don't miss any relative path use of torch/csrc/distributed headers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85780
Approved by: https://github.com/kumpera, https://github.com/huydhn
### Changes
- Move ProcessGroup::Work into its own class and update all the references to it / header includes.
#### Motivation
In future PRs we will repurpose ProcessGroup to instead contain a list of Backends (ProcessGroupNCCL/Gloo/UCC) and perform dispatching to them based on tensor type. This change prevents a circular dependency, with ProcessGroup depending on Backend and Backend depending on ProcessGroup::Work.
Differential Revision: [D38839212](https://our.internmc.facebook.com/intern/diff/D38839212)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83680
Approved by: https://github.com/kwen2501
Previously, when we SymInt-ified a schema, it was a BC-breaking change
for everyone who had registered functions for that operator; they
had to accept c10::SymInt where they previously accepted int64_t.
This is not great.
With this change, I accept old type registrations transparently. The
idea is in several parts:
- At the registration site, at compile time I have no idea whether or not
if the function being registered has a SymInt schema or not. So I
must defer the exact compatibility check. What I do instead is
check if the function pointer registered to me has SymInt in the
argument or not. If it does, I assume it is new-style and ensure
it is also registered to a special sym_ slot on KernelFunction.
If not, it only goes in the conventional slot.
- At the dispatcher site, I know at compile time whether or not this
is a SymInt function. If it is, I check for a sym_ slot on the
KernelFunction, and preferentially use that. If no such slot
exists, I then fall back to the regular slot... but I convert
all SymInt arguments to int64_t arguments (doing assertions that
no true symbolic integer was passed.) I can skip this test entirely
if the function doesn't have any SymInts in it; in that case I know
that only the original slot could have been registered. Fortunately,
both branches of the short circuit typecheck, so I didn't have to
use SFINAE or if-constexpr to make it work; just a plain if statement
that I expect the compiler to optimize away.
- Schema validation is now modestly more complicated. There are two parts. First, function schema validation proceeds by checking if the signature in question has any SymInt-like types in it or not. If it does, we do function schema validation against the real types; if it doesn't, we do validation against the fake types (but only for symint; MemoryFormat is always MemoryFormat). Second, cpp signature validation also keeps track of a "symint" cpp signature and a "non-symint" cpp signature. We only compare symint with symint, and non-symint with non-symint. I did not implement checking a conflict between a symint and non-symint cpp signature, though in principle you could try converting the SymInt types to non-SymInt types and doing the comparison that way.
To show it is working, I remove a bunch of c10::asIntArrayRefSlow shims, as the dispatcher is able to insert them automatically now.
I didn't update the Metal registrations (though they can get similar treatment) as OSS CI coverage is insufficient for this case.
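To make the mechanism above concrete, here is a hedged sketch using a made-up operator: a schema declared with SymInt now accepts either registration style. The int64_t kernel fills the conventional slot, while a c10::SymInt kernel would fill the sym_ slot; the op name, kernel bodies, and the expect_int() shortcut are illustrative only.
```
#include <torch/library.h>

// Old-style kernel: plain int64_t.
at::Tensor repeat_rows_old(const at::Tensor& self, int64_t n) {
  return self.repeat({n, 1});
}

// New-style kernel: c10::SymInt. (Sketch only; a real kernel would keep n symbolic.)
at::Tensor repeat_rows_sym(const at::Tensor& self, c10::SymInt n) {
  return repeat_rows_old(self, n.expect_int());
}

TORCH_LIBRARY(myns, m) {
  m.def("repeat_rows(Tensor self, SymInt n) -> Tensor");
}
TORCH_LIBRARY_IMPL(myns, CPU, m) {
  m.impl("repeat_rows", repeat_rows_old);  // old-style registration, now accepted transparently
}
```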
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: [D39280965](https://our.internmc.facebook.com/intern/diff/D39280965)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84557
Approved by: https://github.com/wconstab
Also Back out "Revert D39075159: [acc_tensor] Use SymIntArrayRef for overloaded empty.memory_format's signature"
Original commit changeset: dab4a9dba4fa
Original commit changeset: dcaf16c037a9
Original Phabricator Diff: D38984222
Original Phabricator Diff: D39075159
Also update Metal registrations for C++ registration changes.
Also update NNPI registration to account for tightened schema checking
Differential Revision: [D39084762](https://our.internmc.facebook.com/intern/diff/D39084762/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39084762/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84173
Approved by: https://github.com/Krovatkin
Previously, we introduced new SymInt overloads for every function we wanted. This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.
This PR takes a simpler but more risky approach: just take the original function and changes its ints to SymInts.
This is BC-breaking in the following ways:
* The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change. Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually. This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this.
This is not BC-breaking in the following ways:
* The user facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints (e.g., at::empty(IntArrayRef, ...)). To call with SymInts, you must call at::empty_symint instead (see the sketch after this list). This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints, so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types); as long as you're not doing string equality (which you shouldn't be), these parse to the same underlying type.
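A minimal sketch of the calling convention described above (the constant sizes are purely for illustration):
```
#include <ATen/ATen.h>

void example() {
  // The default C++ binding still takes plain ints:
  at::Tensor a = at::empty({2, 3}, at::kFloat);

  // To pass SymInts explicitly, call the _symint variant:
  c10::SymInt rows{2}, cols{3};
  at::Tensor b = at::empty_symint({rows, cols}, at::kFloat);
}
```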
Structure of the PR:
* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other:
* The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
* When we do schema validation of C++ operator registration, we must compare against true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`. This is handled with cloneWithRealTypes before we check for schema differences.
* In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
* In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where this is work to do. Finally, because the signature of `native::` API changed from int to SymInt, I need to find alternative APIs for people who were directly calling these functions to call. Typically, I insert a new dispatch call when perf doesn't matter, or use `at::compositeexplicitautograd` namespace to handle other cases.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I change how unboxing logic works slightly. Previously, we interpret the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it.
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload)
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh
Summary:
When enabled, it will generate the `torch_cuda_linalg` library, which depends on cuSOLVER and MAGMA and registers dynamic bindings to it from LinearAlgebraStubs
Avoid symbol clashes that can result in infinite recursion by moving all symbols in the library to its own namespace.
Add checks that should prevent calling self in recursion to `LinearAlgebraStubs.cpp`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73447
Reviewed By: albanD
Differential Revision: D34538827
Pulled By: malfet
fbshipit-source-id: f2535b471d3524768a84b2e169b6aa24c26c03bf
(cherry picked from commit 4ec24b079c861c1122f0fa86e280b977c3c2f7ac)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72306
When enabled, it will generate the `torch_cuda_linalg` library, which depends on cuSOLVER and MAGMA and registers dynamic bindings to it from LinearAlgebraStubs
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D33992795
Pulled By: malfet
fbshipit-source-id: d1fa351a320659b29754997c20d754e69bfe36c0
(cherry picked from commit d5d6c69a988b9454538ecd28674206da2541de17)
Summary:
Make `TORCH_CUDABLAS_CHECK` and `TORCH_CUSOLVER_CHECK` available in custom extensions by exporting the internal functions called by both macros.
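A hedged sketch of what an out-of-tree extension can now do: wrap raw cuBLAS calls in the same error-checking macro that ATen uses internally. The function below is just an example, not code from this PR.
```
#include <ATen/cuda/Exceptions.h>
#include <cublas_v2.h>

void create_and_destroy_handle() {
  cublasHandle_t handle = nullptr;
  TORCH_CUDABLAS_CHECK(cublasCreate(&handle));   // throws a c10::Error on failure
  TORCH_CUDABLAS_CHECK(cublasDestroy(handle));
}
```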
Rel: https://github.com/pytorch/pytorch/issues/67073
cc xwang233 ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67161
Reviewed By: jbschlosser
Differential Revision: D31984694
Pulled By: ngimel
fbshipit-source-id: 0035ecd1398078cf7d3abc23aaefda57aaa31106
Summary:
This PR implements the necessary hooks/stubs/enums/etc for complete ONNX Runtime (ORT) Eager Mode integration. The actual extension will live out of tree at https://github.com/pytorch/ort.
We have been [working on this at Microsoft](https://github.com/microsoft/onnxruntime-pytorch/tree/eager-ort/torch_onnxruntime) for the last few months, and are finally ready to contribute the PyTorch core changes upstream (nothing major or exciting, just the usual boilerplate for adding new backends).
The ORT backend will allow us to ferry [almost] all torch ops into granular ONNX kernels that ORT will eagerly execute against any devices it supports (therefore, we only need a single ORT backend from a PyTorch perspective).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58248
Reviewed By: astaff
Differential Revision: D30344992
Pulled By: albanD
fbshipit-source-id: 69082b32121246340d686e16653626114b7714b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61104
This patch added a new test case for findDanglingImpls. The test case introduces a C++ extension which has a dangling impl such that findDanglingImpls can find it and output its information.
Test Plan:
python test/test_dispatch.py TestDispatch.test_find_dangling_impls_ext
Imported from OSS
Reviewed By: ezyang
Differential Revision: D29512520
fbshipit-source-id: 6883fb8f065f2c0ae0e7a1adf6fd298591497e2b
Summary:
The function name and return type both are called `class_`, therefore they are ambiguous and this is UB and does not work on NVCC. See the tests for the failure case.
Thanks for the help of Thibaut Lutz from NVIDIA's compiler team.
cc: yueyericardo ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57962
Reviewed By: mruberry
Differential Revision: D28359400
Pulled By: ezyang
fbshipit-source-id: c64ec89203f99f656611aba34f7424eed7bc9e7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53583
`Scalar` takes 32 bytes because `c10::complex<double>` requires 16-byte alignment. Passing Scalar by reference shows about a 1% improvement in instruction count.
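For illustration, the signature change looks like this (a hypothetical function, not taken from the diff):
```
#include <ATen/ATen.h>

// Before the codemod: Scalar passed by value (32 bytes copied per call).
at::Tensor add_scaled_old(const at::Tensor& self, at::Scalar alpha);

// After the codemod: Scalar passed by const reference.
at::Tensor add_scaled_new(const at::Tensor& self, const at::Scalar& alpha);
```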
All the changes in this commit are codemoded except for
the following 4 files (which code-gen signatures):
```
tools/codegen/api/cpp.py
tools/codegen/api/native.py
tools/codegen/api/structured.py
caffe2/contrib/aten/gen_op.py
```
# Codemod
## Main Step
For the codemod part, here is the main command used:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
```
As you can tell, it codemods both `Scalar` and `optional<Scalar>`. Apply these commands iteratively until reaching a fix-point (since one method signature might contain multiple `Scalar` parameters).
In retrospect, excluding `third_party` and `torch/csrc/jit` would have been a good idea. (I reverted those changes manually later; see https://github.com/pytorch/pytorch/pull/53479 as a reference.)
## Pre-Step
Prior to applying the main command, since some `Scalar` occurrences are written as `at::Scalar` or `c10::Scalar`, I codemodded some of them in advance. Here is an incomplete list:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
```
## Fixup
There are a couple of post-codemod fixups. For example, `const Scalar` will be codemoded into `const const Scalar&`, and `at::Scalar` will be codemoded into `at::const Scalar&` (if the `Pre-Step` is not done comprehensively). Here is an incomplete list:
```
fastmod --extensions cpp 'const const Scalar' 'const Scalar'
fastmod --extensions h 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod --extensions cpp 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod 'at::const Scalar&' 'const at::Scalar&'
```
## Supplementary
`cu` and `mm` files also need to be codemoded, for example:
```
fastmod --extensions cu 'at::const Scalar&' 'const at::Scalar&'
fastmod --extensions mm '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
```
Function pointers are not codemoded. Here is an incomplete list:
```
# Cover case: using index_fill_fn = void(*)(TensorIterator & iter, int64_t dim, int64_t self_dim_size, int64_t self_dim_stride, Scalar source);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
# Cover case: using softplus_fn = void (*)(TensorIterator&, Scalar, Scalar);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions cpp '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)optional<Scalar>([, \)])' '${1}const optional<Scalar>&${2}'
```
Some corner cases needs to be manually fixed.
ghstack-source-id: 123970306
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D26904445
fbshipit-source-id: 8d8a002af4b5125f153a32f03c6956be7ae5671d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53037
As remarked in #52277 it is easy to give an (inefficient, due to extra
redispatches) DefaultBackend implementation of foo and foo_ in terms of
foo_out. This patch enables code generation for DefaultBackend in these
cases by default for all structured kernels. You can see the payoff
in MSNPU extension: it only has to register a kernel for add.out, and it
gets add and add_ kernels automatically.
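A hedged sketch of what the generated DefaultBackend wrappers effectively do for a backend that only registers add.out; the names are illustrative, and the extra dispatches are the inefficiency mentioned above.
```
#include <ATen/ATen.h>

// Functional variant derived from the out variant.
at::Tensor add_default(const at::Tensor& self, const at::Tensor& other, const at::Scalar& alpha) {
  at::Tensor out = at::empty({0}, self.options());  // allocate via a dispatched factory
  at::add_out(out, self, other, alpha);             // redispatches to the backend's add.out
  return out;
}

// In-place variant reuses self as the output.
at::Tensor& add__default(at::Tensor& self, const at::Tensor& other, const at::Scalar& alpha) {
  return at::add_out(self, self, other, alpha);
}
```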
The actual code changes are very modest:
- When DefaultBackend, call the dispatched (not direct native::)
functions to allocate tensors, change device guard, etc
- Don't call impl() for DefaultBackend (as it doesn't exist); instead,
directly generate a call to at::foo_out to do the actual work.
- Do NOT generate DefaultBackend implementation for foo_out. Actually,
there is a case to be made for this being a good idea with more infra;
see comments inside.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D26731225
Pulled By: ezyang
fbshipit-source-id: 939da7cb69f694722ec293e5e42e74a755dd0985
Summary:
## Rationale
While most of the `torch.Generator` properties and methods are implemented as a thin wrapper of the corresponding `at::Generator` methods, `torch.Generator.get_state()` and `torch.Generator.set_state()` are implemented in legacy Torch code and are not dispatched through the `c10::GeneratorImpl` interface. This is not structured well and makes implementing generators for new backends (e.g. `XLAGeneratorImpl` for the XLA backend) inconvenient. As such, this pull request seeks to move these generator state APIs to c10 and ATen.
## What is being refactored?
* Interfaces
- Added `c10::GeneratorImpl::set_state` and `c10::GeneratorImpl::state` for getting and setting the internal state of a random number generator.
- `at::Generator::set_state` and `at::Generator::state` wraps the above-mentioned APIs, as it's basically a PIMPL.
- Added helper function `at::detail::check_rng_state` for checking the validity of new RNG state tensor.
* CPU Generator
- Renamed and moved `THTensor_(setRNGState)` and `THTensor_(getRNGState)` to `CPUGeneratorImpl::set_state` and `CPUGeneratorImpl::state`.
- Renamed and moved `THGeneratorState` and `THGeneratorStateNew` to `CPUGeneratorStateLegacy` and `CPUGeneratorState`.
* CUDA Generator
- Renamed and moved `THCRandom_setRNGState` and `THCRandom_getRNGState` to `CUDAGeneratorImpl::set_state` and `CUDAGeneratorImpl::state`.
* PyTorch Bindings
- `THPGenerator_setState` and `THPGenerator_getState` now simply forward to `at::Generator::set_state` and `at::Generator::state`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49589
Reviewed By: H-Huang
Differential Revision: D25785774
Pulled By: pbelevich
fbshipit-source-id: 8ed79209c4ffb1a0ae8b19952ac8871ac9e0255f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49220
Since all ops are c10-full, we can remove .impl_UNBOXED now.
This also removes the ability of KernelFunction or CppFunction to store unboxedOnly kernels.
ghstack-source-id: 119450489
Test Plan: waitforsandcastle
Reviewed By: ezyang
Differential Revision: D25490225
fbshipit-source-id: 32de9d591e6a842fe18abc82541580647e9cfdad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49145
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49105
(1) Add a safety check `C10_CUDA_KERNEL_LAUNCH_CHECK()` after each kernel launch. This diff only changes the files inside the directory /fbsource/fbcode/caffe2/modules/, /fbsource/fbcode/caffe2/fb/, /fbsource/fbcode/caffe2/test/.
(2) Get rid of old check `AT_CUDA_CHECK(cudaGetLastError())` when necessary.
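A hedged sketch of the pattern being enforced; the kernel launch is shown as a comment since it lives in a .cu file, and the names are illustrative.
```
#include <c10/cuda/CUDAException.h>

void launch_and_check() {
  // my_kernel<<<grid, block, 0, stream>>>(/* args */);  // hypothetical kernel launch
  C10_CUDA_KERNEL_LAUNCH_CHECK();  // new pattern: surfaces launch errors immediately
  // Old pattern being replaced: AT_CUDA_CHECK(cudaGetLastError());
}
```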
Test Plan:
Test build:
```
buck build mode/dev-nosan //caffe2/modules/detectron:
buck test mode/dev-nosan //caffe2/modules/detectron:
buck build mode/dev-nosan //caffe2/torch/fb/:
buck test mode/dev-nosan //caffe2/torch/fb/:
```
To check for launches without checks:
```
python3 caffe2/torch/testing/check_kernel_launches.py
```
Make sure none of the updated files are in the returned list.
Reviewed By: r-barnes
Differential Revision: D25452852
fbshipit-source-id: d6657edab612c9e0fa99b29c68460be8b1a20064
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49105
(1) Add a safety check `C10_CUDA_KERNEL_LAUNCH_CHECK()` after each kernel launch. This diff only changes the files inside the directory /fbsource/fbcode/caffe2/modules/, /fbsource/fbcode/caffe2/fb/, /fbsource/fbcode/caffe2/test/.
(2) Get rid of old check `AT_CUDA_CHECK(cudaGetLastError())` when necessary.
Test Plan:
Test build:
```
buck build //caffe2/modules/detectron:
buck build //caffe2/torch/fb/:
```
To check for launches without checks:
```
python3 caffe2/torch/testing/check_kernel_launches.py
```
Make sure none of the updated files are in the returned list.
Reviewed By: r-barnes
Differential Revision: D25325039
fbshipit-source-id: 2043d6e63c7d029c35576d3101c18247ffe92f01
Summary:
[Refiled version of earlier PR https://github.com/pytorch/pytorch/issues/45451]
This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, not for PyTorch or Caffe2 itself.
Correspondingly, changes are made to cpp_extension.py to match these improvements.
The list of improvements to hipify is as follows:
1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether "" or <> is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from hipify function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update cuda_to_hip_mappings.py to account for the ROCm component subdirectories inside /opt/rocm/include. This also results in cleanup of the Caffe2_HIP_INCLUDE path to remove unnecessary additions to the include path.
The list of changes to cpp_extension.py is as follows:
1. Call hipify when building a CUDAExtension for ROCm.
2. Prune the list of source files to CUDAExtension to include only the hipified versions of any source files in the list (if both original and hipified versions of the source file are in the list)
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically
cc jeffdaily sunway513 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48715
Reviewed By: bdhirsh
Differential Revision: D25272824
Pulled By: ezyang
fbshipit-source-id: 8bba68b27e41ca742781e1c4d7b07c6f985f040e
Summary:
This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, **not for PyTorch or Caffe2 itself**.
Correspondingly, changes are made to `cpp_extension.py` to match these improvements.
The list of improvements to hipify is as follows:
1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether `""` or `<>` is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from `hipify` function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update `cuda_to_hip_mappings.py` to account for the ROCm component subdirectories inside `/opt/rocm/include`. This also results in cleanup of the `Caffe2_HIP_INCLUDE` path to remove unnecessary additions to the include path.
The list of changes to `cpp_extension.py` is as follows:
1. Call `hipify` when building a CUDAExtension for ROCm.
2. Prune the list of source files to CUDAExtension to include only the hipified versions of any source files in the list (if both original and hipified versions of the source file are in the list)
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically
cc jeffdaily sunway513 hgaspar lcskrishna ashishfarmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45451
Reviewed By: ezyang
Differential Revision: D24924736
Pulled By: malfet
fbshipit-source-id: 4af42b8ff4f21c3782dedb8719b8f9f86b34bd2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46092
Make empty c10-full without using hacky-wrapper, i.e. port the kernel to the new style signature.
This PR also changes the signature of some helpers called by empty to the new style.
ghstack-source-id: 116544203
(Note: this ignores all push blocking failures!)
Test Plan:
vs prev diff (outdated, before c10::optional fix): https://www.internalfb.com/intern/fblearner/details/224735103/
after c10::optional fix:
https://www.internalfb.com/intern/fblearner/details/231391773/
Also, after the c10::optional fix, the instruction counting benchmark shows a 2% regression for calling empty from Python. We decided this is acceptable and decided against landing D24425836 which would fix the regression.
Reviewed By: ezyang
Differential Revision: D24219944
fbshipit-source-id: e554096e90ce438c75b679131c3151ff8e5c5d50
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45181
`init_process_group` and `new_group` update a bunch of global
variables after initializing the actual process group. As a result, there is a
race that after initializing the process group on say rank 0, if we immediately
check the default process group on rank 1 (say via RPC), we might actually get
an error since rank 1 hasn't yet updated its _default_pg variable.
To resolve this issue, I've added barrier() at the end of both of these calls.
This ensures that once these calls return we are guaranteed about correct
initialization on all ranks.
Since these calls are usually done mostly during initialization, it should be
fine to add the overhead of a barrier() here.
#Closes: https://github.com/pytorch/pytorch/issues/40434, https://github.com/pytorch/pytorch/issues/40378
ghstack-source-id: 112923112
Test Plan:
Reproduced the failures in
https://github.com/pytorch/pytorch/issues/40434 and
https://github.com/pytorch/pytorch/issues/40378 and verified that this PR fixes
the issue.
Reviewed By: mrshenli
Differential Revision: D23858025
fbshipit-source-id: c4d5e46c2157981caf3ba1525dec5310dcbc1830
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41610
Previously, operators that have a `Tensor?` (i.e. optional tensor) in their schema implemented it using `Tensor` in C++ and filled in an undefined tensor for the None case.
The c10 operator library, however, expects `Tensor?` to be represented as `optional<Tensor>`, so those operators couldn't be c10-full yet and still had to use codegenerated unboxing instead of templated unboxing.
This PR changes that. It extends the `hacky_wrapper_for_legacy_signatures` to not only take care of TensorOptions, but now also map between signatures taking `Tensor` and `optional<Tensor>`.
For this, it requires an additional template parameter, the expected signature, and it uses that to go argument-by-argument and unwrap any optionals it finds.
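A hedged sketch with a made-up operator: a kernel for a schema with a `Tensor?` argument is now written against `optional<Tensor>` rather than receiving an undefined `Tensor`.
```
#include <ATen/ATen.h>
#include <c10/util/Optional.h>

at::Tensor scale_maybe(const at::Tensor& self, const c10::optional<at::Tensor>& weight) {
  if (weight.has_value() && weight->defined()) {
    return self * *weight;
  }
  return self;
}
```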
ghstack-source-id: 108873701
Test Plan: waitforsandcastle
Reviewed By: bhosmer
Differential Revision: D22607879
fbshipit-source-id: 57b2fb01a294b804f82cd55cd70f0ef4a478e14f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40944
This stack adds Work-level timeout for blocking wait.
This PR just changes the API to accept a default wait arg for the wait function in each ProcessGroup backend. The ProcessGroup superclass correctly waits for the given timeout by changing the CV wait to wait_for.
Closes: https://github.com/pytorch/pytorch/issues/37571
ghstack-source-id: 107835735
Test Plan: Tests in 4th PR in this stack
Reviewed By: jiayisuse
Differential Revision: D22107135
fbshipit-source-id: b38c07cb5e79e6c86c205e580336e7918ed96501
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39459
Update to this PR: this code isn't going to fully solve https://github.com/pytorch/pytorch/issues/37010. The changes required for 37010 are more than this PR initially planned for. Instead, this PR switches op registration of rng-related tests to use the new API (similar to what was done in #36925)
Test Plan:
1) unit tests
Imported from OSS
Reviewed By: ezyang
Differential Revision: D22264889
fbshipit-source-id: 82488ac6e3b762a756818434e22c2a0f9cb9dd47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39492
This PR adds use_c10_dispatcher: full to ops taking TensorOptions. To allow this, since the c10 operator library doesn't know about TensorOptions, we need to register the operator kernels as optional<ScalarType>, optional<Device>, optional<Layout>, optional<bool> instead, and also call them this way.
Changes:
Add use_c10_dispatcher: full to those ops
Write hacky_wrapper_for_legacy_signatures which takes an old-style kernel (i.e. one written to take TensorOptions) and creates a wrapper kernel for it that takes the scattered optional<ScalarType>, optional<Device>, optional<Layout>, optional<bool> instead.
Change codegen so that all op registrations are wrapped into hacky_wrapper_for_legacy_signatures. This is added to all ops but is a no-op if the op doesn't take TensorOptions. This allows us in the future to just change a kernel signature from TensorOptions to the scattered version and have it work without having to touch codegen.
Change codegen so that the frontend calls those operators with expanded arguments instead of with a TensorOptions object. This is required because now the kernels are written in this way.
This PR does not remove TensorOptions special cases from codegen, but instead it separates kernels from the codegen/frontend issues. After this, kernels can be worked on separately without having to touch codegen and codegen can be worked on without having to touch kernels.
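A hedged sketch with a hypothetical factory op: the c10-full kernel takes the scattered optionals that used to be gathered into a TensorOptions object, and may regather them internally.
```
#include <ATen/ATen.h>

at::Tensor my_full(at::IntArrayRef size, const at::Scalar& fill_value,
                   c10::optional<at::ScalarType> dtype,
                   c10::optional<at::Layout> layout,
                   c10::optional<at::Device> device,
                   c10::optional<bool> pin_memory) {
  // Regather into TensorOptions inside the kernel where convenient.
  at::TensorOptions options =
      at::TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory);
  return at::full(size, fill_value, options);
}
```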
Codegen diff: P133121032
ghstack-source-id: 106426630
Test Plan: waitforsandcastle
Differential Revision: D21581908
fbshipit-source-id: 6d4a9f526fd70fae40581bf26f3ccf794ce6a89e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40251
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.
This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.
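For context, a hedged sketch of the kind of call site being protected, written with the typed operator handle API; the operator and types are illustrative.
```
#include <ATen/ATen.h>
#include <ATen/core/dispatch/Dispatcher.h>

at::Tensor call_add(const at::Tensor& a, const at::Tensor& b) {
  auto op = c10::Dispatcher::singleton()
                .findSchemaOrThrow("aten::add", "Tensor")
                .typed<at::Tensor(const at::Tensor&, const at::Tensor&, const at::Scalar&)>();
  // If the signature above mismatched the registered kernel, the new assertion
  // produces a readable error instead of a segfault.
  return op.call(a, b, 1);
}
```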
ghstack-source-id: 106194240
Test Plan: waitforsandcastle
Differential Revision: D22126701
fbshipit-source-id: 6c908a822e295757bcc0014f78f51e6a560f221f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38361
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.
This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.
supersedes D17485438
ghstack-source-id: 106178820
Test Plan: waitforsandcastle
Differential Revision: D21534052
fbshipit-source-id: 6be436a3f20586277a051d764af29e21d5567da0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37776
* Remove type-specific size tracking in favor of byte size tracking in Storage and StorageImpl
* Changed numel() and set_numel() to nbytes() and set_nbytes()
* Added enum argument to Storage/StorageImpl constructor to indicate new meaning of the size parameter
* Update all callers of the changed API
Part of issue https://github.com/pytorch/pytorch/issues/33950
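A hedged sketch of the new byte-size-based construction described above; the allocator choice and resizability flag are illustrative, not prescribed by this PR.
```
#include <c10/core/CPUAllocator.h>
#include <c10/core/StorageImpl.h>

c10::intrusive_ptr<c10::StorageImpl> make_storage(size_t nbytes) {
  return c10::make_intrusive<c10::StorageImpl>(
      c10::StorageImpl::use_byte_size_t(),  // size is now expressed in bytes
      nbytes,
      c10::GetCPUAllocator(),
      /*resizable=*/true);
}
```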
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37028
Differential Revision: D21171334
Pulled By: ezyang
fbshipit-source-id: 37329a379de9a3a83cc5e9007e455a3e1c2d10b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36631
Summary of changes
1. Moved random transformation functions to DistributionHelper.h (`uniform_int_from_to_distribution`, `uniform_int_full_range_distribution`, `uniform_int_distribution`) to avoid code duplication between default CPU, CUDA rngs and custom rng extensions
2. Made GeneratorImpl fields protected instead of private
3. Introduced `TORCH_CHECK_IF_NOT_ON_CUDA` that does the same as `TORCH_CHECK` if it is not CUDA/ROCm device
4. To test multiple rng extensions I had to move ops registration to the method `registerOps()`, expose it to Python, and call it in `def setUp(self)`
Test Plan: Imported from OSS
Differential Revision: D21229202
Pulled By: pbelevich
fbshipit-source-id: 6aa3280f2fc3324cf3e748388b5087e3a1e49f23
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36742
Now, you can define a custom class inside a TORCH_LIBRARY block.
It looks very similar to what you did before. Instead of
```
static auto m = torch::class_<Class>("Namespace", "Class").def("foo", foo);
```
you write
```
TORCH_LIBRARY(Namespace, m) {
m.class_<Class>("Class")
.def("foo", foo);
}
```
All the old usages still work, but at some point we should start
updating the tutorials when we're ready to go 100% live with the
new pybind11 style API.
custom class API previously lived in torch/ folder and in torch
namespace, so for consistency, the new TORCH_LIBRARY also got
moved to torch/library.h The definition of Library::class_ is in the
bottom of that header because I need all of the class_ constructors
available, but there is a circular dependency between the two headers.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D21089648
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: 8d54329c125242605336c22fa1642aae6940b507
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36258
Previously we had a && chaining style API. There are some downsides to
this API:
- It's easy to forget the 'static' qualifier in front, leading to
subtle ODR bugs.
- It is not compatible with torchbind class_ definitions, as these
need multiple levels of chaining. So in practice people end
up having to define multiple static initializers, one per class.
- It's not like pybind11.
- There's no way to conveniently get the file and line number of
the registration, as there is no macro point in the API.
- The old API doesn't really encourage people to put all of their
definitions for a library in one place, and to give a custom
namespace for it. Similarly, the old API wasn't very DRY, because
you had to keep repeating the namespace/dispatch key you
were writing implementations for.
The new API is modeled exactly off of the PYBIND11_MODULE macro:
you write:
```
TORCH_LIBRARY(aten, m) {
m.def("aten::add(Tensor self, Tensor other) -> Tensor");
...
}
```
in a non-chaining fashion, and under the hood the macro expands to
define a function, and define a static initializer that allocates
c10::Library (previously called c10::Module, but we renamed it
to avoid confusion with the existing NN module concept), passes
it to your function, and then retains it for the rest of the lifetime
of the program. Specification of the namespace is mandatory,
and in a later commit I plan to make it a hard error to TORCH_LIBRARY
the same library name twice.
If you are specifying an implementation for an existing operator
(e.g., you're the XLA backend, or even if you're just putting
registrations for implementations at the implementation site),
you should use TORCH_LIBRARY_IMPL, which instead takes a backend
argument (instead of namespace) and can be used to specify an
implementation for a backend. Unlike TORCH_LIBRARY, you can do
as many of these as you want for a backend.
This needs updates to the mobile code analyzer.
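A minimal sketch of the two macros side by side (the myops namespace and myadd kernels are hypothetical stand-ins, not taken from this PR):
```
#include <ATen/ATen.h>
#include <torch/library.h>

at::Tensor myadd_cpu(const at::Tensor& a, const at::Tensor& b) { return a + b; }
at::Tensor myadd_xla(const at::Tensor& a, const at::Tensor& b) { return a + b; }

// Owns the schema: one TORCH_LIBRARY block per namespace.
TORCH_LIBRARY(myops, m) {
  m.def("myadd(Tensor self, Tensor other) -> Tensor");
}

// Backend-specific implementations; unlike TORCH_LIBRARY, you can have as
// many of these blocks per backend as you like, spread across files.
TORCH_LIBRARY_IMPL(myops, CPU, m) {
  m.impl("myadd", myadd_cpu);
}
TORCH_LIBRARY_IMPL(myops, XLA, m) {
  m.impl("myadd", myadd_xla);
}
```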
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20929257
Pulled By: ezyang
fbshipit-source-id: ba04d78492e8c93ae7190165fb936f6872896ada
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36232
The purpose of this PR is to replace `at::Generator generator = nullptr` with `c10::optional<at::Generator> = c10::nullopt` all over the code
* #36230 Replace std::shared_ptr with c10::intrusive_ptr in at::Generator
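A hedged sketch of what the new calling convention looks like for an rng-taking wrapper (my_randn is a hypothetical helper; only the `c10::optional<at::Generator>` parameter illustrates the change):
```
#include <utility>

#include <ATen/ATen.h>
#include <c10/util/Optional.h>

// Callers may simply omit the generator instead of passing a null pointer.
at::Tensor my_randn(at::IntArrayRef size,
                    c10::optional<at::Generator> gen = c10::nullopt) {
  return at::randn(size, std::move(gen),
                   at::TensorOptions().dtype(at::kFloat));
}
```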
Test Plan: Imported from OSS
Differential Revision: D20943603
Pulled By: pbelevich
fbshipit-source-id: 65d335990f01fcc706867d5344e73793fad68ae6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36223
Previously #35714
There are a lot of unboxed only defs. We're committed to removing
them at the end of the half but as I am about to do a lot of porting
to the new API, let's get them into a form where they're easy to
remove. This is a new overload impl_UNBOXED that will pass
the function pointer straight to CppFunction::makeUnboxedOnly.
I don't attempt to make the _UNBOXED API complete; in particular,
catchall declarations don't get this sugar (as there are very few
of them).
To get some coverage of _UNBOXED API for code analysis, I switched
one of our unboxed tests to be an impl rather than a def. This
shouldn't materially affect coverage.
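A hedged, historical-only illustration of the sugar described above; `impl_UNBOXED` has long since been removed, and the op name `foo` / kernel `foo_impl` are placeholders, so this is a sketch of the commit rather than current API:
```
// long form: wrap the function pointer by hand before registering it
//   m.impl("foo", CppFunction::makeUnboxedOnly(&foo_impl));
//
// sugar added in this commit: pass the pointer straight through
//   m.impl_UNBOXED("foo", &foo_impl);
```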
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20929259
Pulled By: ezyang
fbshipit-source-id: 72d2061b6c8a6afbcd392b47f53ade18de2f9184
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36222
Reland of #35706, with fixes to code analyzer.
It is extremely common to define implementations of operators at a
specific dispatch key, so we add an overload to impl specifically for
this case. I then delete most uses of torch::dispatch.
dispatch_autograd call sites can't make use of this overload. So
instead the new preferred way to specify something as autograd is to
pass kAutograd as the dispatch key (short form, analogous to kCPU/kCUDA
which we support today).
I flip flopped about whether or not kAutograd should have the type
DispatchKey or some other type (to help better encapsulate the
DispatchKey enum); this is more direct and I can't think of any
BC problems from this usage.
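A sketch of the shorthand, assuming the torch::kCPU / torch::kAutograd spellings described above and a hypothetical myrelu op (the autograd kernel is only a placeholder; a real one would set up an autograd::Function):
```
#include <ATen/ATen.h>
#include <torch/library.h>

at::Tensor myrelu_cpu(const at::Tensor& self) { return self.clamp_min(0); }

// Placeholder autograd kernel, here just forwarding to the CPU kernel.
at::Tensor myrelu_autograd(const at::Tensor& self) { return myrelu_cpu(self); }

TORCH_LIBRARY(myrelu_ops, m) {
  m.def("myrelu(Tensor self) -> Tensor");
  // Dispatch key passed directly -- no torch::dispatch(...) wrapper needed.
  m.impl("myrelu", torch::kCPU, &myrelu_cpu);
  m.impl("myrelu", torch::kAutograd, &myrelu_autograd);
}
```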
Some other reorganization I did:
- I renamed all of the worker functions in op_registration to have
a leading underscore and made them private, just to make it more
clear what the public versus private API is (the private API
shouldn't be used by users because it doesn't come with && overloads).
Note that this means I needed to adjust the regex in the
code analyzer.
- In a few places where I was touching lines already, I replaced
full DispatchKey typed out enums with shorter kFoo names, similar
to kAutograd but I didn't publish these globally.
- Code analyzer now prints a unified diff, and in the other order
(because I tend to think of the diff as reporting how the /new/ result
is different)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20929256
Pulled By: ezyang
fbshipit-source-id: c69b803d2b3a1a8aff70e14da33d3adec5239f13
Summary:
The original behavior of pytorch c10d only supports built-in c10d backends, such as
nccl/gloo/mpi. This patch is used to extend the c10d capability to support dynamically
loading 3rd party communication libraries which are derived from ProcessGroup base class.
related RFC is in: https://github.com/pytorch/pytorch/issues/27955
With this change, a user just needs to specify a 3rd party c10d backend name when invoking
torch.distributed.init_process_group(). The proposed logic will try to load the corresponding
c10d backend cpp extension automatically. As for how to develop a new 3rd party c10d backend
through a cpp extension, please refer to test/cpp_extensions/cpp_c10d_extension.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28068
Differential Revision: D19174838
Pulled By: agolynski
fbshipit-source-id: 3409a504a43ce7260e6f9d1207c00e87471fac62
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35714
There are a lot of unboxed only defs. We're committed to removing
them at the end of the half but as I am about to do a lot of porting
to the new API, let's get them into a form where they're easy to
remove. This is a new overload impl_UNBOXED that will pass
the function pointer straight to CppFunction::makeUnboxedOnly.
I don't attempt to make the _UNBOXED API complete; in particular,
catchall declarations don't get this sugar (as there are very few
of them).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20775782
Pulled By: ezyang
fbshipit-source-id: c5e804c69f5961c9d4862f6c5dbbe4c524cc32cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35706
It is extremely common to define implementations of operators at a
specific dispatch key, so we add an overload to impl specifically for
this case. I then delete most uses of torch::dispatch.
dispatch_autograd call sites can't make use of this overload. So
instead the new preferred way to specify something as autograd is to
pass kAutograd as the dispatch key (short form, analogous to kCPU/kCUDA
which we support today).
I flip flopped about whether or not kAutograd should have the type
DispatchKey or some other type (to help better encapsulate the
DispatchKey enum); this is more direct and I can't think of any
BC problems from this usage.
Some other reorganization I did:
- I renamed all of the worker functions in op_registration to have
a leading underscore and made them private, just to make it more
clear what the public versus private API is (the private API
shouldn't be used by users because it doesn't come with && overloads).
- In a few places where I was touching lines already, I replaced
full DispatchKey typed out enums with shorter kFoo names, similar
to kAutograd but I didn't publish these globally.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20775783
Pulled By: ezyang
fbshipit-source-id: e45b289e5d1f86c180b24cf14c63cf4459ab5337
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35398
This disables namespaced c10::import which is broken with custom
mobile op builds. This is to help prevent people from accidentally
breaking the custom mobile build in a mysterious way; if they use
the longform version it will work. Fixing the analyzer is tracked
in https://github.com/pytorch/pytorch/issues/35397
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20680519
Pulled By: ezyang
fbshipit-source-id: a18ac8df7e72bf399807870beedb828131273e48
Summary:
Reland of https://github.com/pytorch/pytorch/pull/35061; removed
the get-qualified-type-name magic from debug strings to work around
an MSVC 2017 bug.
Main points of the new API:
- You can register implementations (impl) without having to specify a schema.
- Registrations are commutative, so no matter what order your static
initializers run, you end up with the same end result.
op_registration_test.cpp contains a reasonably comprehensive accounting
for the available API surface.
How does this implementation proceed? The basic concept is to relax the
internal invariants of Dispatcher data structures to allow the
possibility that a FunctionSchema is not specified in an Operator.
- DispatchKeyExtractor has an uninitialized state where it doesn't look
for dispatch keys in any arguments of the stack. It can have a
schema (de)registered to itself post facto with
registerSchema/unregisterSchema.
- DispatchTable has a new constructor taking only an OperatorName for
the uninitialized state. It can have a schema (de)registered to itself
post facto with registerSchema/unregisterSchema
- OperatorDef maintains counts of both defs as well as defs_and_impls.
defs_and_impls keeps track of the outstanding impl registrations; you
may have impl registrations but no defs. If there are no defs (no
schema), the operator is not returned by findSchema. A new
findOperatorByName function unconditionally returns the OperatorHandle
even if there's no schema. OperatorHandle::hasSchema can be used
to check if the operator has schema.
- Replaced 'registerKernel' with 'registerImpl', which is the new
interface for directly registering kernel implementations.
- Because 'registerImpl' no longer requires an OperatorHandle, change
'registerDef' to only return a RegistrationHandleRAII. This is marginally
less efficient (since we're doing two hash table lookups on a registration
now), but this won't matter in the long term, and probably doesn't
matter now either.
- Rename registerBackendFallbackKernel to registerFallback (this exposed
a bunch of places where we're improperly directly interfacing with Dispatcher;
we need to add this capability to the true public API)
- All code generated internal registrations are switched to use the new
API. This includes VariableType registrations (which previously
weren't converted) and the mobile autograd stuff
- Switch the new-style def()/impl() APIs to interact directly with Dispatcher,
rather than indirecting through the old API
- We deleted alias analysis kind merging entirely. As a nod to BC, it's
possible to define a full schema with alias analysis kind, and then
later do another full schema def with missing alias analysis kind, but
the opposite direction is not allowed. We can remove this entirely
following the plan at https://github.com/pytorch/pytorch/issues/35040
- Schema matching is moved inside the dispatcher, because we might not
be able to immediately schema match at the point of an impl() (because
we don't have the schema yet). To do this, we store the inferred
function schema inside a KernelEntry, so we can check it when we get
the real schema.
- Registered kernel functions now store a debug string which
can be used to more easily identify them. Tests use this to
distinguish between multiple distinct registrations; regular
invocations get only very basic information.
Because we need our static initializers to work no matter what order
they're run, the testing strategy on this PR is quite involved.
The general concept:
- Bind a (very gimped) version of the dispatcher API from Python,
so that we can easily write a more complex testing harness
using expect tests.
- For series of registrations we want to test, exhaustively
test every possible permutation of registrations (and
deregistrations), and show that the intermediate states
agree no matter what path is taken.
- Intermediate states are rendered using a new dumpState()
debugging method that prints the internal state of the
dispatcher. This method may be generally useful for people
who want to see what's in the dispatcher.
- Simultaneously, add a new invariant testing function which
checks that the internal invariants of the dispatcher are
upheld (so we don't have to print internal implementation
details of the dispatcher)
The testing framework found a few bugs in development. For example,
here is a case where we registered schema too early, before checking
if it was valid:
```
Traceback (most recent call last):
File "test/test_dispatch.py", line 164, in test_def_impl_schema_mismatch
], raises=True)
File "test/test_dispatch.py", line 135, in commute
results=results, raises=raises)
File "test/test_dispatch.py", line 83, in run_permutation
.format(ctor_order[:i], op_ix))
File "test/test_dispatch.py", line 59, in check_invariants
.format(expected_provenance, actual_provenance)
AssertionError: 'name[16 chars]ema: (none)\ncatchall: boxed unboxed :: (Tenso[18 chars]0)\n' != 'name[16 chars]ema: test::foo(Tensor x, Tensor y) -> (Tensor)[53 chars]0)\n'
name: test::foo
- schema: (none)
+ schema: test::foo(Tensor x, Tensor y) -> (Tensor)
catchall: boxed unboxed :: (Tensor _0) -> (Tensor _0)
: expected from running ctors (1,); actual from running ctors (1,) and then failing to run ctor 0 (did this failure leave the dispatcher in a wedged state? it shouldn't!)
```
There are also C++ smoketests for the API. These tests comprehensively
cover the C++ API surface of the new operator registration API, but
don't check very hard if the API does the right thing (that's what
test_dispatch.py is for)
Some miscellaneous changes which could have been split into other
PRs, but I was too lazy to do so:
- Add torch::jit::parseName (mirroring parseSchema/parseSchemaOrName)
- Add cloneWithName functionality to FunctionSchema
- Unconditionally generate schema registration, even when type_method_dispatch
is a dict. The one exception is for manual registrations....
- Add fallback, CppFunction::makeFallthrough and
CppFunction::makeFromBoxedFunction to the public API of op_registration, so we can
stop calling internal registerImpl directly (see the sketch after this list)
- Add new syntax sugar dispatch_autograd for registering autograd kernels
- Minor OperatorName cleanup, storing OperatorName in DispatchTable
and defining operator<< on OperatorName
- Refactored the op registration API to take FunctionSchema directly.
We now do namespacing by post facto fixing up the OperatorName
embedded in FunctionSchema. This also means that you can
now do torch::import("ns1").def("ns2::blah") and have the ns2
override ns1 (although maybe this is not the correct behavior.)
- New torch::schema public API, for attaching alias analysis kind
annotation kinds. This meant we had to template up some function
signatures which previously took const char*. There's now a nice
comment explaining this strategy.
- torch::import now takes std::string which means we can use
the namespacing from Python
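A short sketch of the fallback / makeFallthrough item from the list above, written in the TORCH_LIBRARY_IMPL spelling this stack converges on (the choice of AutogradPrivateUse1 as the dispatch key is arbitrary, purely for illustration):
```
#include <torch/library.h>

// Register a fallthrough for every operator under one dispatch key through
// the public API, instead of reaching into the internal registerImpl.
TORCH_LIBRARY_IMPL(_, AutogradPrivateUse1, m) {
  m.fallback(torch::CppFunction::makeFallthrough());
}
```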
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35629
Differential Revision: D20724551
Pulled By: ezyang
fbshipit-source-id: befa46a1affb4ec4ae1fb39e3564a63695a6ca41
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35061
Main points of the new API:
- You can register implementations (impl) without having to specify a schema.
- Registrations are commutative, so no matter what order your static
initializers run, you end up with the same end result.
op_registration_test.cpp contains a reasonably comprehensive accounting
for the available API surface.
How does this implementation proceed? The basic concept is to relax the
internal invariants of Dispatcher data structures to allow the
possibility that a FunctionSchema is not specified in an Operator.
- DispatchKeyExtractor has an uninitialized state where it doesn't look
for dispatch keys in any arguments of the stack. It can have a
schema (de)registered to itself post facto with
registerSchema/unregisterSchema.
- DispatchTable has a new constructor taking only an OperatorName for
the uninitialized state. It can have a schema (de)registered to itself
post facto with registerSchema/unregisterSchema
- OperatorDef maintains counts of both defs as well as defs_and_impls.
defs_and_impls keeps track of the outstanding impl registrations; you
may have impl registrations but no defs. If there are no defs (no
schema), the operator is not returned by findSchema. A new
findOperatorByName function unconditionally returns the OperatorHandle
even if there's no schema. OperatorHandle::hasSchema can be used
to check if the operator has schema.
- Replaced 'registerKernel' with 'registerImpl', which is the new
interface for directly registering kernel implementations.
- Because 'registerImpl' no longer requires an OperatorHandle, change
'registerDef' to only return a RegistrationHandleRAII. This is marginally
less efficient (since we're doing two hash table lookups on a registration
now), but this won't matter in the long term, and probably doesn't
matter now either.
- Rename registerBackendFallbackKernel to registerFallback (this exposed
a bunch of places where we're improperly directly interfacing with Dispatcher;
we need to add this capability to the true public API)
- All code generated internal registrations are switched to use the new
API. This includes VariableType registrations (which previously
weren't converted) and the mobile autograd stuff
- Switch the new-style def()/impl() APIs to interact directly with Dispatcher,
rather than indirecting through the old API
- We deleted alias analysis kind merging entirely. As a nod to BC, it's
possible to define a full schema with alias analysis kind, and then
later do another full schema def with missing alias analysis kind, but
the opposite direction is not allowed. We can remove this entirely
following the plan at https://github.com/pytorch/pytorch/issues/35040
- Schema matching is moved inside the dispatcher, because we might not
be able to immediately schema match at the point of an impl() (because
we don't have the schema yet). To do this, we store the inferred
function schema inside a KernelEntry, so we can check it when we get
the real schema.
- Registered kernel functions now store a debug string which
can be used to more easily identify them. There's some best
effort stuff based on __FUNCSIG__ but this is only really
capable of reporting types and not function symbols. Tests
use this to distinguish between multiple distinct registrations.
Because we need our static initializers to work no matter what order
they're run, the testing strategy on this PR is quite involved.
The general concept:
- Bind a (very gimped) version of the dispatcher API from Python,
so that we can easily write a more complex testing harness
using expect tests.
- For series of registrations we want to test, exhaustively
test every possible permutation of registrations (and
deregistrations), and show that the intermediate states
agree no matter what path is taken.
- Intermediate states are rendered using a new dumpState()
debugging method that prints the internal state of the
dispatcher. This method may be generally useful for people
who want to see what's in the dispatcher.
- Simultaneously, add a new invariant testing function which
checks that the internal invariants of the dispatcher are
upheld (so we don't have to print internal implementation
details of the dispatcher)
The testing framework found a few bugs in development. For example,
here is a case where we registered schema too early, before checking
if it was valid:
```
Traceback (most recent call last):
File "test/test_dispatch.py", line 164, in test_def_impl_schema_mismatch
], raises=True)
File "test/test_dispatch.py", line 135, in commute
results=results, raises=raises)
File "test/test_dispatch.py", line 83, in run_permutation
.format(ctor_order[:i], op_ix))
File "test/test_dispatch.py", line 59, in check_invariants
.format(expected_provenance, actual_provenance)
AssertionError: 'name[16 chars]ema: (none)\ncatchall: boxed unboxed :: (Tenso[18 chars]0)\n' != 'name[16 chars]ema: test::foo(Tensor x, Tensor y) -> (Tensor)[53 chars]0)\n'
name: test::foo
- schema: (none)
+ schema: test::foo(Tensor x, Tensor y) -> (Tensor)
catchall: boxed unboxed :: (Tensor _0) -> (Tensor _0)
: expected from running ctors (1,); actual from running ctors (1,) and then failing to run ctor 0 (did this failure leave the dispatcher in a wedged state? it shouldn't!)
```
There are also C++ smoketests for the API. These tests comprehensively
cover the C++ API surface of the new operator registration API, but
don't check very hard if the API does the right thing (that's what
test_dispatch.py is for)
Some miscellaneous changes which could have been split into other
PRs, but I was too lazy to do so:
- Add torch::jit::parseName (mirroring parseSchema/parseSchemaOrName)
- Add cloneWithName functionality to FunctionSchema
- Unconditionally generate schema registration, even when type_method_dispatch
is a dict. The one exception is for manual registrations....
- Add fallback, CppFunction::makeFallthrough and
CppFunction::makeFromBoxedFunction to public API of op_registration, so we can
stop calling internal registerImpl directly
- Add new syntax sugar dispatch_autograd for registering autograd kernels
- Minor OperatorName cleanup, storing OperatorName in DispatchTable
and defining operator<< on OperatorName
- Refactored the op registration API to take FunctionSchema directly.
We now do namespacing by post facto fixing up the OperatorName
embedded in FunctionSchema. This also means that you can
now do torch::import("ns1").def("ns2::blah") and have the ns2
override ns1 (although maybe this is not the correct behavior.)
- New torch::schema public API, for attaching alias analysis kind
annotation kinds. This meant we had to template up some function
signatures which previously took const char*. There's now a nice
comment explaining this strategy.
- torch::import now takes std::string which means we can use
the namespacing from Python
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20680520
Pulled By: ezyang
fbshipit-source-id: 5d39a28e4ec7c73fe4b1fb2222e865ab65e188f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34774
This PR provides pybind11's `type_caster<at::Generator>`, which allows mapping an `at::Generator` instance returned from a user-defined method to the Python `torch.Generator`, defined as the `THPGenerator` C++ class.
This allows 1) defining a custom RNG in a C++ extension and 2) using the custom RNG in Python code.
`TestRNGExtension.test_rng` shows how to use the custom RNG defined in `rng_extension.cpp`.
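A hedged sketch of what the extension side of this can look like (create_custom_generator is a hypothetical factory; `at::CPUGeneratorImpl` stands in for a custom `GeneratorImpl` subclass such as the one in `rng_extension.cpp`):
```
#include <torch/extension.h>

#include <ATen/CPUGeneratorImpl.h>
#include <ATen/core/Generator.h>

// Returns an at::Generator; the type_caster<at::Generator> added in this PR
// is what lets pybind11 hand it back to Python as a torch.Generator.
at::Generator create_custom_generator(uint64_t seed) {
  return at::make_generator<at::CPUGeneratorImpl>(seed);
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("create_custom_generator", &create_custom_generator);
}
```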
Test Plan: Imported from OSS
Differential Revision: D20549451
Pulled By: pbelevich
fbshipit-source-id: 312a6deccf8228f7f60695bbf95834620d52f5eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33093
In #30187 the aliasAnalysis field on operator registration was updated
so that alias analysis could be specified in only some registration call
sites, rather than requiring it be consistently specified in all call
sites. With this change, we can eliminate the requirement that all
registrations specify aliasAnalysis; as long as we know *one* site
specifies the correct aliasAnalysis, we don't have to specify it
at any of the other sites.
In this patch, the "one site" is TypeDefault.cpp (previously we only
generated these stub declarations for manually registered functions,
but now we generate the stubs for everything). Then I delete aliasAnalysis
anywhere we register an op for an existing function (which is a lot
of places).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19837897
Pulled By: ezyang
fbshipit-source-id: 26a7fbc809ec1553da89ea5c0361f3e81526d4c2
Summary:
This pull request has changes for:
1. Enabling a torch module with HIP code to be compiled by cpp_extensions.py
2. Fixes for hipify module to be able to be used by a torch extension
cc: ezyang iotamudelta jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32669
Differential Revision: D20033893
Pulled By: zou3519
fbshipit-source-id: fd6ddc8cdcd3930f41008636bb2bc9dd26cdb008
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32495
Background
------------------------------
Previously, ninja was used to compile+link inline cpp_extensions and
ahead-of-time cpp_extensions were compiled with distutils. This PR adds
the ability to compile (but not link) ahead-of-time cpp_extensions with ninja.
The main motivation for this is to speed up cpp_extension builds: distutils
does not make use of parallelism. With this PR, using the new option, on my machine,
- torchvision compilation goes from 3m43s to 49s
- nestedtensor compilation goes from 2m0s to 28s.
User-facing changes
------------------------------
I added a `use_ninja` flag to BuildExtension. This defaults to
`True`. When `use_ninja` is True:
- it will attempt to use ninja.
- If we cannot use ninja, then this throws a warning and falls back to
distutils.
- Situations where we cannot use ninja: Windows (NYI, I'll open a new issue
for this), if ninja cannot be found on the system.
Implementation Details
------------------------------
This PR makes this change in two steps. Please let me know if it would be
easier to review this if I split this up into a stacked diff.
Those changes are:
1) refactor _write_ninja_file to separate the policy (what compiler flags
to pass) from the mechanism (how to write the ninja file and do compilation).
2) call _write_ninja_file and _run_ninja_build while building
ahead-of-time cpp_extensions. These are only used to compile objects;
distutils still handles the linking.
Change 1: refactor _write_ninja_file to separate policy from mechanism
- I split _write_ninja_file into: _write_ninja_file and
_write_ninja_file_to_build_library
- I renamed _build_extension_module to _run_ninja_build
Change 2: Call _write_ninja_file while building ahead-of-time
cpp_extensions
- _write_ninja_file_and_compile_objects calls _write_ninja_file to only
build object files.
- We monkey-patch distutils.CCompiler.compile to call
_write_ninja_files_and_compile_objects
- distutils still handles the linking step. The linking step is not a
bottleneck so it was not a concern.
- This change only works on unix-based systems. Our code for windows
goes down a different codepath and I did not want to mess with that.
- If a system does not support ninja, we raise a warning and fall back
to the original compilation path.
Test Plan
------------------------------
Adhoc testing
- I built torchvision using pytorch master and printed out the build
commands. Next, I used this branch to build torchvision and looked at
the ninja file. I compared the ninja file with the build commands and
asserted that they were functionally the same.
- I repeated the above for pytorch/nestedtensor.
PyTorch test suite
- I split `test_cpp_extensions` into `test_cpp_extensions_aot` and
`test_cpp_extensions_jit`. The AOT (ahead-of-time) version tests
ahead-of-time and the JIT version tests just-in-time (not to be confused
with TorchScript)
- `test_cpp_extensions_aot` gets run TWICE by run_test.py, once with
a module that was built with ninja, and once with a module that was
built without ninja.
- run_test.py asserts that when we are building with use_ninja=True,
ninja is actually available on the system.
Test Plan: Imported from OSS
Differential Revision: D19730432
Pulled By: zou3519
fbshipit-source-id: 819590d01cf65e8da5a1e8019b8b3084792fee90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32704
-Werror is too aggressive a check for the cpp extensions tests because it fails even on deprecation warnings that are included from the core codebase.
Fixes #32136
Test Plan: Imported from OSS
Differential Revision: D19620190
Pulled By: pbelevich
fbshipit-source-id: 0e91566eb5de853559bb59e68a02b0bb15e7341b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29161.
I looked a bit at the code changes related to this and think I have all of the use cases of `DeprecatedTypeProperties` covered in the message, but suggestions from someone with more context on this would be very much appreciated :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30281
Differential Revision: D18830818
Pulled By: ezyang
fbshipit-source-id: 1a7fcee15354ae09e6644577e7fa33bd26acfe20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28468
We don't need this anymore.
ghstack-source-id: 92595388
Test Plan: unit tests
Differential Revision: D18073339
fbshipit-source-id: d0ef1332c83e47117fe0a5eadc8faedb259cfba0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28208
Backend extensions should call torch::RegisterOperators, not globalATenDispatch().
If the op is still on globalATenDispatch, then torch::RegisterOperators will do the right thing and forward it to globalATenDispatch.
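A hedged sketch of the recommended registration call at the time, following the then-current custom-op pattern (the myext::my_backend_add op and its kernel are hypothetical):
```
#include <torch/script.h>

at::Tensor my_backend_add(const at::Tensor& a, const at::Tensor& b) {
  return a + b;
}

// Register through torch::RegisterOperators; if the op is still on
// globalATenDispatch, the registration is forwarded there automatically.
static auto registry =
    torch::RegisterOperators("myext::my_backend_add", &my_backend_add);
```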
ghstack-source-id: 92436988
Test Plan: waitforsandcastle
Differential Revision: D17975369
fbshipit-source-id: 0d4bd5e4e5b86e6dcfba527a7d11c25508896ac1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25914
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17284083
Pulled By: ezyang
fbshipit-source-id: 430ac7ea2bd042b1f4bb874e53679d0fde326dec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26131
Changes in this PR:
- For each operator with use_c10_dispatcher: True, additionally generate a c10 registration line in TypeDefault.cpp, CPUType.cpp, and other backend files.
- This doesn't change globalATenDispatch yet, the c10 registration is purely additional and the operator calling path doesn't change. A diff further up the stack will change these things.
- Enable the use_c10_dispatcher: True flag for about ~70% of operators
- This also changes the c10->jit operator export because ATen ops are already exported to JIT directly and we don't want to export the registered c10 ops because they would clash
- For this, we need a way to recognize whether a certain operator has already been moved from ATen to c10; this is done by generating an OpsAlreadyMovedToC10.cpp file with the list. A diff further up in the stack will also need this file to make sure we don't break the backend extension API for these ops.
Reasons for some ops to be excluded (i.e. not have the `use_c10_dispatcher` flag set to true):
- `Tensor?(a!)` (i.e. optional tensor with annotations) not supported in c++ function schema parser yet
- `-> void` in native_functions.yaml vs `-> ()` expected by function schema parser
- out functions have a different argument order in C++ than in the jit schema
- `Tensor?` (i.e. optional tensor) doesn't work nicely with the undefined-tensor convention, where an optional tensor is sometimes represented as an undefined tensor and sometimes as None.
- fixed-size arrays like `int[3]` not supported in c10 yet
These will be fixed in separate diffs and then the exclusion tag will be removed.
ghstack-source-id: 90060748
Test Plan: a diff stacked on top uses these registrations to call these ops from ATen
Differential Revision: D16603131
fbshipit-source-id: 315eb83d0b567eb0cd49973060b44ee1d6d64bfb