Commit Graph

131 Commits

Author SHA1 Message Date
Sebastian Messmer
edf751ca2f Make empty c10-full (#46092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46092

Make empty c10-full without using hacky-wrapper, i.e. port the kernel to the new style signature.

This PR also changes the signature of some helpers called by empty to the new style.
ghstack-source-id: 116544203

(Note: this ignores all push blocking failures!)

Test Plan:
vs prev diff (outdated, before c10::optional fix): https://www.internalfb.com/intern/fblearner/details/224735103/

after c10::optional fix:
https://www.internalfb.com/intern/fblearner/details/231391773/

Also, after the c10::optional fix, the instruction counting benchmark shows a 2% regression for calling empty from Python. We decided this is acceptable and decided against landing D24425836 which would fix the regression.

Reviewed By: ezyang

Differential Revision: D24219944

fbshipit-source-id: e554096e90ce438c75b679131c3151ff8e5c5d50
2020-11-12 17:08:21 -08:00
Wanchao Liang
553ccccc54 [c10d] switch ProcessGroup to be managed by intrusive_ptr (#47343)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47343

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24723418

Pulled By: wanchaol

fbshipit-source-id: 0463819b96c53b12bdbb3905431110d7b21beb77
2020-11-12 07:36:23 -08:00
Wanchao Liang
665ac2f7b0 [reland] [c10d] switch Store to be managed by intrusive_ptr (#47808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47808

reland https://github.com/pytorch/pytorch/pull/47074

Test Plan: wait for ci

Reviewed By: gmagogsfm

Differential Revision: D24905246

fbshipit-source-id: edeb7e6e486570ce889f12512e9dc02061d6cc03
2020-11-11 22:53:20 -08:00
Wanchao Liang
70ae5685f9 [reland][c10d] switch ProcessGroup::Work to be managed by intrusive_ptr (#47806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47806

reland https://github.com/pytorch/pytorch/pull/44046

Test Plan: wait for ci

Reviewed By: gmagogsfm

Differential Revision: D24905245

fbshipit-source-id: ad75ace5432fcfd22d513878f5a73c4bb017324e
2020-11-11 22:51:03 -08:00
Wanchao Liang
dac0192148 Revert D23632280: [c10d] switch ProcessGroup::Work to be managed by intrusive_ptr
Test Plan: revert-hammer

Differential Revision:
D23632280 (0650a6166f)

Original commit changeset: 0a4642a8ffab

fbshipit-source-id: 2aa8ddb874fab11f773f4c08d740afcd865482e9
2020-11-11 10:54:08 -08:00
Wanchao Liang
1f946e942d Revert D24667128: [c10d] switch Store to be managed by intrusive_ptr
Test Plan: revert-hammer

Differential Revision:
D24667128 (0cfe3451d4)

Original commit changeset: 9b6024c31c85

fbshipit-source-id: d8ddf9eb2fccef5023e05698e0c4662708fe4945
2020-11-11 10:49:58 -08:00
Wanchao Liang
0cfe3451d4 [c10d] switch Store to be managed by intrusive_ptr (#47074)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47074

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D24667128

Pulled By: wanchaol

fbshipit-source-id: 9b6024c31c851b7c3243540f460ae57323da523b
2020-11-10 23:36:44 -08:00
Wanchao Liang
0650a6166f [c10d] switch ProcessGroup::Work to be managed by intrusive_ptr (#44046)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44046

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D23632280

Pulled By: wanchaol

fbshipit-source-id: 0a4642a8ffabdd26c52c1baabfa30c0f446c3c85
2020-11-10 23:30:22 -08:00
Pritam Damania
a2b4177c5b Add barrier() at the end of init_process_group and new_group. (#45181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45181

`init_process_group` and `new_group` update a bunch of global
variables after initializing the actual process group. As a result, there is a
race that after initializing the process group on say rank 0, if we immediately
check the default process group on rank 1 (say via RPC), we might actually get
an error since rank 1 hasn't yet updated its _default_pg variable.

To resolve this issue, I've added barrier() at the end of both of these calls.
This ensures that once these calls return we are guaranteed about correct
initialization on all ranks.

Since these calls are usually done mostly during initialization, it should be
fine to add the overhead of a barrier() here.

#Closes: https://github.com/pytorch/pytorch/issues/40434, https://github.com/pytorch/pytorch/issues/40378
ghstack-source-id: 112923112

Test Plan:
Reproduced the failures in
https://github.com/pytorch/pytorch/issues/40434 and
https://github.com/pytorch/pytorch/issues/40378 and verified that this PR fixes
the issue.

Reviewed By: mrshenli

Differential Revision: D23858025

fbshipit-source-id: c4d5e46c2157981caf3ba1525dec5310dcbc1830
2020-09-25 15:46:59 -07:00
Basil Hosmer
79e6aaeb4c pull empty() out of use_c10_dispatcher: full (#43572)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43572

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D23326019

Pulled By: bhosmer

fbshipit-source-id: 10a4d7ffe33b4be4ae45396725456c6097ce1757
2020-08-26 22:51:06 -07:00
Sebastian Messmer
3a19af2427 Make operators with optional Tensor? arguments c10-full (#41610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41610

Previously, operators that have a `Tensor?` (i.e. optional tensor) in their schema implemented it using `Tensor` in C++ and filled in an undefined tensor for the None case.
The c10 operator library, however, expects `Tensor?` to be represented as `optional<Tensor>`, so those operators couldn't be c10-full yet and still had to use codegenerated unboxing instead of templated unboxing.

This PR changes that. It extends the `hacky_wrapper_for_legacy_signatures` to not only take case of TensorOptions, but now also map between signatures taking `Tensor` and `optional<Tensor>`.
For this, it requires an additional template parameter, the expected signature, and it uses that to go argument-by-argument and unwrap any optionals it finds.
ghstack-source-id: 108873701

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D22607879

fbshipit-source-id: 57b2fb01a294b804f82cd55cd70f0ef4a478e14f
2020-07-31 16:09:08 -07:00
Stanislau Hlebik
b774ce54f8 remediation of S205607
fbshipit-source-id: 798decc90db4f13770e97cdce3c0df7d5421b2a3
2020-07-17 17:19:47 -07:00
Stanislau Hlebik
8fdea489af remediation of S205607
fbshipit-source-id: 5113fe0c527595e4227ff827253b7414abbdf7ac
2020-07-17 17:17:03 -07:00
Omkar Salpekar
9d92fa2679 [NCCL] Add timeout to ProcessGroup Work Wait (#40944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40944

This stack adds Work-level timeout for blocking wait.

This PR just changes the API to accept a default wait arg for the wait function in each ProcessGroup backend. The ProcessGroup superclass correctly waits for the given timeout by changing the CV wait to wait_for.

Closes: https://github.com/pytorch/pytorch/issues/37571
ghstack-source-id: 107835735

Test Plan: Tests in 4th PR in this stack

Reviewed By: jiayisuse

Differential Revision: D22107135

fbshipit-source-id: b38c07cb5e79e6c86c205e580336e7918ed96501
2020-07-16 10:56:58 -07:00
Changji Shi
47c72be3d7 Port /test/cpp_extensions/rng_extension.cpp to new operator registration API (#39459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39459

Update to this PR: this code isn't going to fully solve https://github.com/pytorch/pytorch/issues/37010. The changes required for 37010 is more than this PR initially planned. Instead, this PR switches op registration of rng related tests to use the new API (similar to what was done in #36925)

Test Plan:
1) unit tests

Imported from OSS

Reviewed By: ezyang

Differential Revision: D22264889

fbshipit-source-id: 82488ac6e3b762a756818434e22c2a0f9cb9dd47
2020-06-26 16:12:54 -07:00
Sebastian Messmer
0494e0ad70 Back out "Revert D21581908: Move TensorOptions ops to c10" (#40595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40595

ghstack-source-id: 106691774

Test Plan: waitforsandcastle

Differential Revision: D22247729

fbshipit-source-id: 14745588cae267c1e0cc51cd9541a9b8abb830e5
2020-06-26 12:57:09 -07:00
Sebastian Messmer
581ad48806 Revert D21581908: Move TensorOptions ops to c10
Test Plan: revert-hammer

Differential Revision:
D21581908

Original commit changeset: 6d4a9f526fd7

fbshipit-source-id: fe1e6368a09120ea40dea405e8409983541e3cb5
2020-06-23 16:10:07 -07:00
Sebastian Messmer
b623bdeabb Move TensorOptions ops to c10 (#39492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39492

This PR adds use_c10_dispatcher: full to ops taking TensorOptions. To allow this, since the c10 operator library doesn't know about TensorOptions, we need to register the operator kernels as optional<ScalarType>, optional<Device>, optional<Layout>, optional<bool> instead, and also call them this way.

Changes:

Add use_c10_dispatcher: full to those ops
Write hacky_wrapper_for_legacy_signatures which takes an old-style kernel (i.e. one written to take TensorOptions) an creates a wrapper kernel for it that takes the scattered optional<ScalarType>, optional<Device>, optional<Layout>, optional<bool> instead.
Change codegen so that all op registrations are wrapped into hacky_wrapper_for_legacy_signatures. This is added to all ops but is a no-op if the op doesn't take TensorOptions. This allows us in the future to just change a kernel signature from TensorOptions to the scattered version and have it work without having to touch codegen.
Change codegen so that the frontend calls those operators with expanded arguments instead of with a TensorOptions object. This is required because now the kernels are written in this way.
This PR does not remove TensorOptions special cases from codegen, but instead it separates kernels from the codegen/frontend issues. After this, kernels can be worked on separately without having to touch codegen and codegen can be worked on without having to touch kernels.

Codegen diff: P133121032

ghstack-source-id: 106426630

Test Plan: waitforsandcastle

Differential Revision: D21581908

fbshipit-source-id: 6d4a9f526fd70fae40581bf26f3ccf794ce6a89e
2020-06-23 14:13:34 -07:00
Sebastian Messmer
86b1afa039 Assert that kernels are called with the right signature (#40251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40251

Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.
ghstack-source-id: 106194240

Test Plan: waitforsandcastle

Differential Revision: D22126701

fbshipit-source-id: 6c908a822e295757bcc0014f78f51e6a560f221f
2020-06-18 21:54:05 -07:00
Sebastian Messmer
cb8b2f0636 Revert D21534052: Assert that kernels are called with the right signature
Test Plan: revert-hammer

Differential Revision:
D21534052

Original commit changeset: 6be436a3f205

fbshipit-source-id: a149c5ca7f9e78947ae3059ac4470712f291660b
2020-06-18 15:00:13 -07:00
Sebastian Messmer
55cdd31bd0 Assert that kernels are called with the right signature (#38361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38361

Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.

This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.

supersedes D17485438

ghstack-source-id: 106178820

Test Plan: waitforsandcastle

Differential Revision: D21534052

fbshipit-source-id: 6be436a3f20586277a051d764af29e21d5567da0
2020-06-18 14:22:48 -07:00
Kurt Mohler
f9eb8824f1 Remove datatype from Storage and StorageImpl (#38870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38870

* Removed dtype data member from StorageImpl
* Removed any methods or method arguments in Storage/StorageImpl that deal with dtypes
* Update all callers of the changed API

Part of issue https://github.com/pytorch/pytorch/issues/33950
Original PR: https://github.com/pytorch/pytorch/pull/38038

Reviewed By: albanD

Differential Revision: D21549645

Pulled By: ezyang

fbshipit-source-id: 4289b356c55ff6b9530376a79343b99b540ee3de
2020-05-21 15:26:08 -07:00
lixinyu
5a979fcb99 allow user passing relative paths in include_dirs within setuptools.setup (#38264)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38264

Test Plan: Imported from OSS

Differential Revision: D21509277

Pulled By: glaringlee

fbshipit-source-id: b0bc17d375a89b96b1bdacde5987b4f4baa9468e
2020-05-13 20:00:12 -07:00
Edward Yang
fe88806784 Back out "Revert D21171334: [pytorch][PR] Change StorageImpl to track byte count rather than element count" (#37893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37893

Original commit changeset: 50746043acf3

Test Plan: sandcastle and ossci

Reviewed By: malfet, seemethere, ngimel

Differential Revision: D21416509

fbshipit-source-id: 735ec4e61f9d36d4537f52dd2dc6267751aeb94b
2020-05-05 22:43:15 -07:00
Edward Yang
a2fc7f787a Revert D21171334: [pytorch][PR] Change StorageImpl to track byte count rather than element count
Test Plan: revert-hammer

Differential Revision:
D21171334

Original commit changeset: 37329a379de9

fbshipit-source-id: 50746043acf3c76754688de0fe6f1cc12437ea2f
2020-05-05 16:36:15 -07:00
Kurt Mohler
3706803b60 Change StorageImpl to track byte count rather than element count (#37776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37776

* Remove type-specific size tracking in favor of byte size tracking in Storage and StorageImpl
* Changed numel() and set_numel() to nbytes() and set_nbytes()
* Added enum argument to Storage/StorageImpl constructor to indicate new meaning of the size parameter
* Update all callers of the changed API

Part of issue https://github.com/pytorch/pytorch/issues/33950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37028

Differential Revision: D21171334

Pulled By: ezyang

fbshipit-source-id: 37329a379de9a3a83cc5e9007e455a3e1c2d10b8
2020-05-05 14:20:51 -07:00
Pavel Belevich
1beca4ac6a Prerequisites for CSPRNG (#36631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36631

Summary of changes

1. Moved random transformation functions to DistributionHelper.h (`uniform_int_from_to_distribution`, `uniform_int_full_range_distribution`, `uniform_int_distribution`) to avoid code duplication between default CPU, CUDA rngs and custom rng extensions
2. Made GeneratorImpl fields protected instead of private
3. Introduced `TORCH_CHECK_IF_NOT_ON_CUDA` that does the same as `TORCH_CHECK` if it is not CUDA/ROCm device
4. To test multiple rng extensions I had to move ops registration to the method `registerOps()`, expose it to python and call it `def setUp(self)`

Test Plan: Imported from OSS

Differential Revision: D21229202

Pulled By: pbelevich

fbshipit-source-id: 6aa3280f2fc3324cf3e748388b5087e3a1e49f23
2020-04-24 12:25:37 -07:00
Edward Yang
a894fff265 Back out "Revert D21089648: Put TORCH_LIBRARY in torch/library.h; add custom class API"
Summary: Original commit changeset: 636e8a11afc6

Test Plan: export to OSS

Reviewed By: malfet

Differential Revision: D21170502

fbshipit-source-id: e8f35f103c4924aedbcaaf868475008d24bdeeab
2020-04-22 09:18:23 -07:00
James Reed
2ccdc39dce Revert D21089648: Put TORCH_LIBRARY in torch/library.h; add custom class API
Test Plan: revert-hammer

Differential Revision:
D21089648

Original commit changeset: 8d54329c1252

fbshipit-source-id: 636e8a11afc628a4cdae9d44824985c10c70555e
2020-04-21 12:21:45 -07:00
Edward Yang
01100cb477 Put TORCH_LIBRARY in torch/library.h; add custom class API (#36742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36742

Now, you can define a custom class inside a TORCH_LIBRARY block.
It looks very similar to what you did before.  Instead of

```
static auto m = torch::class_<Class>("Namespace", "Class").def("foo", foo);
```

you write

```
TORCH_LIBRARY(Namespace, m) {
  m.class_<Class>("Class")
    .def("foo", foo);
}
```

All the old usages still work, but at some point we should start
updating the tutorials when we're ready to go 100% live with the
new pybind11 style API.

custom class API previously lived in torch/ folder and in torch
namespace, so for consistency, the new TORCH_LIBRARY also got
moved to torch/library.h The definition of Library::class_ is in the
bottom of that header because I need all of the class_ constructors
available, but there is a circular dependency between the two headers.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D21089648

Test Plan: Imported from OSS

Pulled By: ezyang

fbshipit-source-id: 8d54329c125242605336c22fa1642aae6940b507
2020-04-21 10:05:21 -07:00
Edward Yang
e29348f828 Switch to pybind11 style registration function API. (#36258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36258

Previous we had a && chaining style API.  There are some downsides to
this API:

- It's easy to forget the 'static' qualifier in front, leading to
  subtle ODR bugs.
- It is not compatible with torchbind class_ definitions, as these
  need multiple levels of chaining.  So in practice people end
  up having to define multiple static initializers, one per class.
- It's not like pybind11.
- There's no way to conveniently get the file and line number of
  the registration, as there is no macro point in the API.
- The old API doesn't really encourage people to put all of their
  definitions for a library in one place, and to give a custom
  namespace for it.  Similarly, the old API wasn't very DRY, because
  you had to keep repeating the namespace/dispatch key you
  were writing implementations for.

The new API is modeled exactly off of the PYBIND11_MODULE macro:
you write:

```
TORCH_LIBRARY(aten, m) {
  m.def("aten::add(Tensor self, Tensor other) -> Tensor");
  ...
}
```

in a non-chaining fashion, and under the hood the macro expands to
define a function, and define a static initializer that allocates
c10::Library (previously called c10::Module, but we renamed it
to avoid confusion with the existing NN module concept), passes
it to your function, and then retains it for the rest of the lifetime
of the program.  Specification of the namespace is mandatory,
and in later commit I plan to make it a hard error to TORCH_LIBRARY
the same library name twice.

If you are specifying an implementation for an existing operator
(e.g., you're the XLA backend, or even if you're just putting
registrations for implementations at the implementation site),
you should use TORCH_LIBRARY_IMPL, which instead takes a backend
argument (instead of namespace) and can be used to specify an
implementation for a backend.  Unlike TORCH_LIBRARY, you can do
as many of these as you want for a backend.

This needs updates to the mobile code analyzer.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20929257

Pulled By: ezyang

fbshipit-source-id: ba04d78492e8c93ae7190165fb936f6872896ada
2020-04-16 10:44:21 -07:00
Edward Yang
dd64e738c5 Expunge TensorId from all DispatchKey names. (#36240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36240

It's annoying, historical, and unnecessary (enum class is already
namespaced).  I did this codemod with:

```
git grep -l 'CPUTensorId' | xargs sed -i 's/CPUTensorId/CPU/g'
git grep -l 'CUDATensorId' | xargs sed -i 's/CUDATensorId/CUDA/g'
git grep -l 'VariableTensorId' | xargs sed -i 's/VariableTensorId/Autograd/g'
git grep -l 'HIPTensorId' | xargs sed -i 's/HIPTensorId/HIP/g'
git grep -l 'MSNPUTensorId' | xargs sed -i 's/MSNPUTensorId/MSNPU/g'
git grep -l 'XLATensorId' | xargs sed -i 's/XLATensorId/XLA/g'
git grep -l 'PrivateUse1_TensorId' | xargs sed -i 's/PrivateUse1_TensorId/PrivateUse1/g'
git grep -l 'PrivateUse2_TensorId' | xargs sed -i 's/PrivateUse2_TensorId/PrivateUse2/g'
git grep -l 'PrivateUse3_TensorId' | xargs sed -i 's/PrivateUse3_TensorId/PrivateUse3/g'
git grep -l 'AutocastTensorId' | xargs sed -i 's/AutocastTensorId/Autocast/g'
git grep -l '_PreAutogradTensorId' | xargs sed -i 's/_PreAutogradTensorId/_PreAutograd/g'
git grep -l 'TESTING_ONLY_GenericWrapperTensorId' | xargs sed -i 's/TESTING_ONLY_GenericWrapperTensorId/TESTING_ONLY_GenericWrapper/g'
git grep -l 'TESTING_ONLY_GenericModeTensorId' | xargs sed -i 's/TESTING_ONLY_GenericModeTensorId/TESTING_ONLY_GenericMode/g'
```

Then I did a git grep for remaining TensorId occurrences, and manually
killed those (mostly in codegen, and some docs that needed updating).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20929255

Pulled By: ezyang

fbshipit-source-id: dc371b6aa6e6ea7c0a5660137c14debde806a09d
2020-04-13 23:33:44 -07:00
Pavel Belevich
c9a1fc2b31 replace Generator arguments with c10::optional<Generator> (#36232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36232

The purpose of this PR is to replace `at::Generator generator = nullptr` with `c10::optional<at::Generator> = c10::nullopt` all over the code

* #36230 Replace std::shared_ptr with c10::intrusive_ptr in at::Generator

Test Plan: Imported from OSS

Differential Revision: D20943603

Pulled By: pbelevich

fbshipit-source-id: 65d335990f01fcc706867d5344e73793fad68ae6
2020-04-13 16:26:57 -07:00
Edward Yang
2de3e491a8 [RELAND] Add temporary impl_UNBOXED syntax sugar for unboxed-only defs. (#36223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36223

Previously #35714

There are a lot of unboxed only defs.  We're committed to removing
them at the end of the half but as I am about to do a lot of porting
to the new API, let's get them into a form where they're easy to
remove.  This is a new overload impl_UNBOXED that will pass
the function pointer straight to CppFunction::makeUnboxedOnly

I don't attempt to make the _UNBOXED API complete; in particular,
catchall declarations don't get this sugar (as there are very few
of them).

To get some coverage of _UNBOXED API for code analysis, I switched
one of our unboxed tests to be an impl rather than a def.  This
shouldn't materially affect coverage.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20929259

Pulled By: ezyang

fbshipit-source-id: 72d2061b6c8a6afbcd392b47f53ade18de2f9184
2020-04-09 14:58:33 -07:00
Edward Yang
ef07bb65e9 [RELAND] Add DispatchKey impl overload; remove use of torch::dispatch (#36222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36222

Reland of #35706, with fixes to code analyzer.

It is extremely common to define implementations of operators at a
specific dispatch key, so we add an overload to impl specifically for
this case.  I then delete most uses of torch::dispatch

dispatch_autograd call sites can't make use of this overload.  So
instead the new preferred way to specify something as autograd is to
pass kAutograd as the dispatch key (short form, analogous to kCPU/kCUDA
which we support today).

I flip flopped about whether or not kAutograd should have the type
DispatchKey or some other type (to help better encapsulate the
DispatchKey enum); this is more direct and I can't think of any
BC problems from this usage.

Some other reorganization I did:
- I renamed all of the worker functions in op_registration to have
  a leading underscore and made them private, just to make it more
  clear what the public versus private API were (the private API
  shouldn't be used by users because it doesn't come with && overloads)
  Note that this means I needed to adjust the regex in the
  code analyzer, because
- In a few places where I was touching lines already, I replaced
  full DispatchKey typed out enums with shorter kFoo names, similar
  to kAutograd but I didn't publish these globally.
- Code analyzer now prints a unified diff, and in the other order
  (because I tend to think of the diff as reporting how the /new/ result
  is different)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20929256

Pulled By: ezyang

fbshipit-source-id: c69b803d2b3a1a8aff70e14da33d3adec5239f13
2020-04-09 14:56:55 -07:00
Feng Tian
762270c51f add c10d dynamic loading mechanism and unit test (#28068)
Summary:
The original behavior of pytorch c10d only supports built-in c10d backends, such as
nccl/gloo/mpi. This patch is used to extend the c10d capability to support dynamically
loading 3rd party communication libraries which are derived from ProcessGroup base class.

related RFC is in: https://github.com/pytorch/pytorch/issues/27955

Through this way, user just need specify a 3rd party c10d backend name when invoking
torch.distributed.init_process_group(). The proposed logic will try to load corresponding
c10d backend cpp extension automatically. as for how to develop a new 3rd party c10d backend
through cpp extension, pls refer to test/cpp_extensions/cpp_c10d_extension.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28068

Differential Revision: D19174838

Pulled By: agolynski

fbshipit-source-id: 3409a504a43ce7260e6f9d1207c00e87471fac62
2020-04-02 15:46:51 -07:00
Karl Ostmo
0f99b28431 Revert D20775783: Add DispatchKey impl overload; remove use of torch::dispatch
Test Plan: revert-hammer

Differential Revision:
D20775783

Original commit changeset: e45b289e5d1f

fbshipit-source-id: 08551428fa886e93cfda14eb51a0f920c335df34
2020-04-02 10:51:50 -07:00
Karl Ostmo
e67951af63 Revert D20775782: Add temporary impl_UNBOXED syntax sugar for unboxed-only defs.
Test Plan: revert-hammer

Differential Revision:
D20775782

Original commit changeset: c5e804c69f59

fbshipit-source-id: 2198e715eb3a24d198a949a44ec192bec523ffb4
2020-04-02 10:48:51 -07:00
Edward Yang
8e951c5793 Add temporary impl_UNBOXED syntax sugar for unboxed-only defs. (#35714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35714

There are a lot of unboxed only defs.  We're committed to removing
them at the end of the half but as I am about to do a lot of porting
to the new API, let's get them into a form where they're easy to
remove.  This is a new overload impl_UNBOXED that will pass
the function pointer straight to CppFunction::makeUnboxedOnly

I don't attempt to make the _UNBOXED API complete; in particular,
catchall declarations don't get this sugar (as there are very few
of them).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20775782

Pulled By: ezyang

fbshipit-source-id: c5e804c69f5961c9d4862f6c5dbbe4c524cc32cc
2020-04-02 08:52:54 -07:00
Edward Yang
2db61193bb Add DispatchKey impl overload; remove use of torch::dispatch (#35706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35706

It is extremely common to define implementations of operators at a
specific dispatch key, so we add an overload to impl specifically for
this case.  I then delete most uses of torch::dispatch

dispatch_autograd call sites can't make use of this overload.  So
instead the new preferred way to specify something as autograd is to
pass kAutograd as the dispatch key (short form, analogous to kCPU/kCUDA
which we support today).

I flip flopped about whether or not kAutograd should have the type
DispatchKey or some other type (to help better encapsulate the
DispatchKey enum); this is more direct and I can't think of any
BC problems from this usage.

Some other reorganization I did:
- I renamed all of the worker functions in op_registration to have
  a leading underscore and made them private, just to make it more
  clear what the public versus private API were (the private API
  shouldn't be used by users because it doesn't come with && overloads)
- In a few places where I was touching lines already, I replaced
  full DispatchKey typed out enums with shorter kFoo names, similar
  to kAutograd but I didn't publish these globally.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20775783

Pulled By: ezyang

fbshipit-source-id: e45b289e5d1f86c180b24cf14c63cf4459ab5337
2020-04-02 08:51:22 -07:00
Edward Yang
4f4ed5c108 Disable c10::import(ns) (#35398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35398

This disables namespaced c10::import which is broken with custom
mobile op builds.  This is to help prevent people from accidentally
breaking the custom mobile build in a mysterious way; if they use
the longform version it will work.  Fixing the analyzer is tracked
in https://github.com/pytorch/pytorch/issues/35397

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20680519

Pulled By: ezyang

fbshipit-source-id: a18ac8df7e72bf399807870beedb828131273e48
2020-03-30 21:12:49 -07:00
Edward Yang
9e3605de98 [RELAND] New operator registration API (#35061) (#35629)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/35061 ; removed
the get qualified type name magic from debug strings to work around
MSVC 2017 bug.

Main points of the new API:

- You can register implementations (impl) without having to specify a schema.
- Registrations are commutative, so no matter what order your static
  initializers run, you end up with the same end result.

op_registration_test.cpp contains a reasonably comprehensive accounting
for the available API surface

How does this implementation proceed?  The basic concept is to relax the
internal invariants of Dispatcher data structures to allow the
possibility that a FunctionSchema is not specified in an Operator.

- DispatchKeyExtractor has an uninitialized state where it doesn't look
  for dispatch keys in any arguments of the stack.  It can have a
  schema (de)registered to itself post facto with
  registerSchema/unregisterSchema.
- DispatchTable has a new constructor taking only an OperatorName for
  the uninitialized state.  It can have a schema (de)registered to itself
  post facto with registerSchema/unregisterSchema
- OperatorDef maintains counts of both defs and well as defs_and_impls.
  defs_and_impls keeps track of the outstanding impl registrations; you
  may have impl registrations but no defs.  If there are no defs (no
  schema), the operator is not returned by findSchema.  A new
  findOperatorByName fucntion unconditionally returns the OperatorHandle
  even if there's no schema.  OperatorHandle::hasSchema can be used
  to check if the operator has schema.
- Replaced 'registerKernel' with 'registerImpl', which is the new
  interface for directly registering kernels without implementations.
- Because 'registerImpl' no longer requires an OperatorHandle, change
  'registerDef' to only return a RegistrationHandleRAII.  This is marginally
  less efficient (since we're doing two hash table lookups on a registration
  now), but this won't matter in the long term, and probably doesn't
  matter now either.
- Rename registerBackendFallbackKernel to registerFallback (this exposed
  a bunch of places where we're improperly directly interfacing with Dispatcher;
  we need to add this capability to the true public API)
- All code generated internal registrations are switched to use the new
  API.  This includes VariableType registrations (which previously
  weren't converted) and the mobile autograd stuff
- Switch the new-style def()/impl() APIs to interact directly with Dispatcher,
  rather than indirecting through the old API
- We deleted alias analysis kind merging entirely.  As a nod to BC, it's
  possible to define a full schema with alias analysis kind, and then
  later do another full schema def with missing alias analysis kind, but
  the opposite direction is not allowed.  We can remove this entirely
  following the plan at https://github.com/pytorch/pytorch/issues/35040
- Schema matching is moved inside the dispatcher, because we might not
  be able to immediately schema match at the point of an impl() (because
  we don't have the schema yet).  To do this, we store the inferred
  function schema inside a KernelEntry, so we can check it when we get
  the real schema.
- Registered kernel functions now store a debug string which
  can be used to more easily identify them.  Tests use this to
  distinguish between multiple distinct registrations; regular
  invocations get only very basic information.

Because we need our static initializers to work no matter what order
they're run, the testing strategy on this PR is quite involved.

The general concept:
- Bind a (very gimped) version of the dispatcher API from Python,
  so that we can easily write a more complex testing harness
  using expect tests.
- For series of registrations we want to test, exhaustively
  test every possible permutation of registrations (and
  deregistrations), and show that the intermediate states
  agree no matter what path is taken.
- Intermediate states are rendered using a new dumpState()
  debugging method that prints the internal state of the
  dispatcher.  This method may be generally useful for people
  who want to see what's in the dispatcher.
- Simultaneously, add a new invariant testing function which
  checks that the internal invariants of the dispatcher are
  upheld (so we don't have to print internal implementation
  details of the dispatcher)

The testing framework found a few bugs in development.  For example,
here is a case where we registered schema too early, before checking
if it was valid:

```
Traceback (most recent call last):
  File "test/test_dispatch.py", line 164, in test_def_impl_schema_mismatch
    ], raises=True)
  File "test/test_dispatch.py", line 135, in commute
    results=results, raises=raises)
  File "test/test_dispatch.py", line 83, in run_permutation
    .format(ctor_order[:i], op_ix))
  File "test/test_dispatch.py", line 59, in check_invariants
    .format(expected_provenance, actual_provenance)
AssertionError: 'name[16 chars]ema: (none)\ncatchall: boxed unboxed :: (Tenso[18 chars]0)\n' != 'name[16 chars]ema: test::foo(Tensor x, Tensor y) -> (Tensor)[53 chars]0)\n'
  name: test::foo
- schema: (none)
+ schema: test::foo(Tensor x, Tensor y) -> (Tensor)
  catchall: boxed unboxed :: (Tensor _0) -> (Tensor _0)
 : expected from running ctors (1,); actual from running ctors (1,) and then failing to run ctor 0 (did this failure leave the dispatcher in a wedged state? it shouldn't!)
```

There are also C++ smoketests for the API.  These tests comprehensively
cover the C++ API surface of the new operator registration API, but
don't check very hard if the API does the right thing (that's what
test_dispatch.py is for)

Some miscellaneous changes which could have been split into other
PRs, but I was too lazy to do so:

- Add torch::jit::parseName (mirroring parseSchema/parseSchemaOrName)
- Add cloneWithName functionality to FunctionSchema
- Unconditionally generate schema registration, even when type_method_dispatch
  is a dict.  The one exception is for manual registrations....
- Add fallback, CppFunction::makeFallthrough and
  CppFunction::makeFromBoxedFunction to public API of op_registration, so we can
  stop calling internal registerImpl directly
- Add new syntax sugar dispatch_autograd for registering autograd kernels
- Minor OperatorName cleanup, storing OperatorName in DispatchTable
  and defining operator<< on OperatorName
- Refactored the op registration API to take FunctionSchema directly.
  We now do namespacing by post facto fixing up the OperatorName
  embedded in FunctionSchema.  This also means that you can
  now do torch::import("ns1").def("ns2::blah") and have the ns2
  override ns1 (although maybe this is not the correct behavior.)
- New torch::schema public API, for attaching alias analysis kind
  annotation kinds.  This meant we had to template up some function
  signatures which previously took const char*.  There's now a nice
  comment explaining this strategy.
- torch::import now takes std::string which means we can use
  the namespacing from Python

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35629

Differential Revision: D20724551

Pulled By: ezyang

fbshipit-source-id: befa46a1affb4ec4ae1fb39e3564a63695a6ca41
2020-03-29 19:48:29 -07:00
Edward Yang
227beb9095 Revert D20680520: New operator registration API
Test Plan: revert-hammer

Differential Revision:
D20680520

Original commit changeset: 5d39a28e4ec7

fbshipit-source-id: 5b2497ffc24db9a05b01d526f161bc0164f9f707
2020-03-28 14:49:56 -07:00
Edward Yang
28ab8c6ff8 New operator registration API (#35061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35061

Main points of the new API:

- You can register implementations (impl) without having to specify a schema.
- Registrations are commutative, so no matter what order your static
  initializers run, you end up with the same end result.

op_registration_test.cpp contains a reasonably comprehensive accounting
for the available API surface

How does this implementation proceed?  The basic concept is to relax the
internal invariants of Dispatcher data structures to allow the
possibility that a FunctionSchema is not specified in an Operator.

- DispatchKeyExtractor has an uninitialized state where it doesn't look
  for dispatch keys in any arguments of the stack.  It can have a
  schema (de)registered to itself post facto with
  registerSchema/unregisterSchema.
- DispatchTable has a new constructor taking only an OperatorName for
  the uninitialized state.  It can have a schema (de)registered to itself
  post facto with registerSchema/unregisterSchema
- OperatorDef maintains counts of both defs and well as defs_and_impls.
  defs_and_impls keeps track of the outstanding impl registrations; you
  may have impl registrations but no defs.  If there are no defs (no
  schema), the operator is not returned by findSchema.  A new
  findOperatorByName fucntion unconditionally returns the OperatorHandle
  even if there's no schema.  OperatorHandle::hasSchema can be used
  to check if the operator has schema.
- Replaced 'registerKernel' with 'registerImpl', which is the new
  interface for directly registering kernels without implementations.
- Because 'registerImpl' no longer requires an OperatorHandle, change
  'registerDef' to only return a RegistrationHandleRAII.  This is marginally
  less efficient (since we're doing two hash table lookups on a registration
  now), but this won't matter in the long term, and probably doesn't
  matter now either.
- Rename registerBackendFallbackKernel to registerFallback (this exposed
  a bunch of places where we're improperly directly interfacing with Dispatcher;
  we need to add this capability to the true public API)
- All code generated internal registrations are switched to use the new
  API.  This includes VariableType registrations (which previously
  weren't converted) and the mobile autograd stuff
- Switch the new-style def()/impl() APIs to interact directly with Dispatcher,
  rather than indirecting through the old API
- We deleted alias analysis kind merging entirely.  As a nod to BC, it's
  possible to define a full schema with alias analysis kind, and then
  later do another full schema def with missing alias analysis kind, but
  the opposite direction is not allowed.  We can remove this entirely
  following the plan at https://github.com/pytorch/pytorch/issues/35040
- Schema matching is moved inside the dispatcher, because we might not
  be able to immediately schema match at the point of an impl() (because
  we don't have the schema yet).  To do this, we store the inferred
  function schema inside a KernelEntry, so we can check it when we get
  the real schema.
- Registered kernel functions now store a debug string which
  can be used to more easily identify them.  There's some best
  effort stuff based on __FUNCSIG__ but this is only really
  capable of reporting types and not function symbols.  Tests
  use this to distinguish between multiple distinct registrations.

Because we need our static initializers to work no matter what order
they're run, the testing strategy on this PR is quite involved.

The general concept:
- Bind a (very gimped) version of the dispatcher API from Python,
  so that we can easily write a more complex testing harness
  using expect tests.
- For series of registrations we want to test, exhaustively
  test every possible permutation of registrations (and
  deregistrations), and show that the intermediate states
  agree no matter what path is taken.
- Intermediate states are rendered using a new dumpState()
  debugging method that prints the internal state of the
  dispatcher.  This method may be generally useful for people
  who want to see what's in the dispatcher.
- Simultaneously, add a new invariant testing function which
  checks that the internal invariants of the dispatcher are
  upheld (so we don't have to print internal implementation
  details of the dispatcher)

The testing framework found a few bugs in development.  For example,
here is a case where we registered schema too early, before checking
if it was valid:

```
Traceback (most recent call last):
  File "test/test_dispatch.py", line 164, in test_def_impl_schema_mismatch
    ], raises=True)
  File "test/test_dispatch.py", line 135, in commute
    results=results, raises=raises)
  File "test/test_dispatch.py", line 83, in run_permutation
    .format(ctor_order[:i], op_ix))
  File "test/test_dispatch.py", line 59, in check_invariants
    .format(expected_provenance, actual_provenance)
AssertionError: 'name[16 chars]ema: (none)\ncatchall: boxed unboxed :: (Tenso[18 chars]0)\n' != 'name[16 chars]ema: test::foo(Tensor x, Tensor y) -> (Tensor)[53 chars]0)\n'
  name: test::foo
- schema: (none)
+ schema: test::foo(Tensor x, Tensor y) -> (Tensor)
  catchall: boxed unboxed :: (Tensor _0) -> (Tensor _0)
 : expected from running ctors (1,); actual from running ctors (1,) and then failing to run ctor 0 (did this failure leave the dispatcher in a wedged state? it shouldn't!)
```

There are also C++ smoketests for the API.  These tests comprehensively
cover the C++ API surface of the new operator registration API, but
don't check very hard if the API does the right thing (that's what
test_dispatch.py is for)

Some miscellaneous changes which could have been split into other
PRs, but I was too lazy to do so:

- Add torch::jit::parseName (mirroring parseSchema/parseSchemaOrName)
- Add cloneWithName functionality to FunctionSchema
- Unconditionally generate schema registration, even when type_method_dispatch
  is a dict.  The one exception is for manual registrations....
- Add fallback, CppFunction::makeFallthrough and
  CppFunction::makeFromBoxedFunction to public API of op_registration, so we can
  stop calling internal registerImpl directly
- Add new syntax sugar dispatch_autograd for registering autograd kernels
- Minor OperatorName cleanup, storing OperatorName in DispatchTable
  and defining operator<< on OperatorName
- Refactored the op registration API to take FunctionSchema directly.
  We now do namespacing by post facto fixing up the OperatorName
  embedded in FunctionSchema.  This also means that you can
  now do torch::import("ns1").def("ns2::blah") and have the ns2
  override ns1 (although maybe this is not the correct behavior.)
- New torch::schema public API, for attaching alias analysis kind
  annotation kinds.  This meant we had to template up some function
  signatures which previously took const char*.  There's now a nice
  comment explaining this strategy.
- torch::import now takes std::string which means we can use
  the namespacing from Python

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20680520

Pulled By: ezyang

fbshipit-source-id: 5d39a28e4ec7c73fe4b1fb2222e865ab65e188f5
2020-03-28 10:52:49 -07:00
Pavel Belevich
11a40410e7 pybind11 type_caster for at::Generator and custom RNG python test (#34774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34774

This PR provides pybind11's `type_caster<at::Generator>` that allows mapping `at::Generator` instance returned from user-defined method to python `torch::Generator`, defined as `THPGenerator ` c++ class.

This allows 1) defining custom RNG in c++ extension 2) using custom RNG in python code.

`TestRNGExtension.test_rng` shows how to use custom RNG defined in `rng_extension.cpp`

Test Plan: Imported from OSS

Differential Revision: D20549451

Pulled By: pbelevich

fbshipit-source-id: 312a6deccf8228f7f60695bbf95834620d52f5eb
2020-03-22 10:57:35 -07:00
Edward Yang
22963f42ec Delete unnecessary aliasAnalysis specification from operator registrations. (#33093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33093

In #30187 the aliasAnalysis field on operator registration was updated
so that alias analysis could be specified in only some registration call
sites, rather than requiring it be consistently specified in all call
sites.  With this change, we can eliminate the requirement that all
registrations specify aliasAnalysis; as long as we know *one* site
specifies the correct aliasAnalysis, we don't have to specify it
any of the other sites.

In this patch, the "one site" is TypeDefault.cpp (previously we only
generated these stub declarations for manually registered functions,
but now we generate the stubs for everything).  Then I delete aliasAnalysis
anywhere we register an op for an existing function (which is a lot
of places).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19837897

Pulled By: ezyang

fbshipit-source-id: 26a7fbc809ec1553da89ea5c0361f3e81526d4c2
2020-02-21 14:24:44 -08:00
ashish
616beb1412 [ROCm] Added support for pytorch extensions to use HIP (#32669)
Summary:
This pull request has changes for:
1. Enabling a torch module with HIP code to be compiled by cpp_extensions.py
2. Fixes for hipify module to be able to be used by a torch extension

cc: ezyang iotamudelta jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32669

Differential Revision: D20033893

Pulled By: zou3519

fbshipit-source-id: fd6ddc8cdcd3930f41008636bb2bc9dd26cdb008
2020-02-21 12:10:02 -08:00
Edward Yang
9c8b67b179 Revert D19905015: Revert D19858239: [pytorch][PR] Refactor and add VS 14.16 and 2019 CI for Windows
Test Plan: revert-hammer

Differential Revision:
D19905015

Original commit changeset: b117e44d5552

fbshipit-source-id: a10c78aed953434f69f466bdd36f914334ba82f3
2020-02-14 13:42:29 -08:00
George Guanheng Zhang
ff5f38f53b Revert D19858239: [pytorch][PR] Refactor and add VS 14.16 and 2019 CI for Windows
Test Plan: revert-hammer

Differential Revision:
D19858239

Original commit changeset: f068d8505886

fbshipit-source-id: b117e44d5552e157747920d8098ce3b86a29c6bf
2020-02-14 07:35:08 -08:00
peter
946f3a9ed7 Refactor and add VS 14.16 and 2019 CI for Windows (#33117)
Summary:
Changes according to https://github.com/pytorch/pytorch/issues/18319.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33117

Differential Revision: D19858239

Pulled By: ezyang

fbshipit-source-id: f068d8505886b92c9388c9c636eab5bd20377ceb
2020-02-13 11:45:41 -08:00