Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48840
The CUDAFuture class needs to inspect the values it contains in order to extract its tensors (in fact, the DataPtrs backing those). These are needed first to determine what CUDA devices back those tensors, so that an event for each such device can be recorded; and later to record these DataPtrs with the CUDA caching allocator if they are used in other streams.
This became complicated when Python was added to the mix, because to inspect a Python object we need to acquire the GIL, but we couldn't do so from code that was supposed to also work in C++-only mode. The solution was for users to provide a custom way to extract DataPtrs, so that the PythonFutureWrapper could install such a custom Python-aware one. This was the DataPtr extractor.
In https://github.com/pytorch/pytorch/pull/48502 a different suggestion was proposed. At its root, it consists in adding support for IValues of type PyObject to the visit() and getSubValues() methods. In order to deal with the GIL, we do this through a virtual method: PyObjectHolder, which is the base class, is available also in C++-only mode, and thus defines this method but leaves it unimplemented; ConcretePyObjectHolder, which is the subclass, is only included in Python mode, and thus it can implement that method, acquire the GIL, and do what it's supposed to.
In my opinion, this approach is just brilliant! Thank wanchaol for proposing it! It hides the complexity of dealing with Python inside getSubValues(), where it can be done properly, thus simplifying enormously the CUDAFuture and the PythonFutureWrapper classes.
ghstack-source-id: 118704935
Test Plan: Unit tests
Reviewed By: wanchaol
Differential Revision: D25334355
fbshipit-source-id: 3f1d3bf6e6e8505a114c877fb9a6fcc3f68d91d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49009
As per the title, we should generally not have exception swalling and
this commit makes it so that if there is a true error in JIT operator
resolution, it is propagated back to the RPC callee and we don't silently
swallow any other exceptions that may happen. Swallowing the exceptions
previously resulted in hard to debug issues such as unexpected ops showing up
in profiler, and flaky tests which were fixed by
https://github.com/pytorch/pytorch/pull/41287
Added a unittest that validates the error that comes from `jit/pybind_utils.h`.
ghstack-source-id: 118794661
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D25392905
fbshipit-source-id: 6f93251635740bcf902824548b2bc6f9249be5f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49146
Add support for Storage arguments to IValue and the JIT typing system, and make ops that were blocked on that c10-full.
ghstack-source-id: 118710665
(Note: this ignores all push blocking failures!)
Test Plan: waitforsandcastle
Reviewed By: ezyang
Differential Revision: D25456799
fbshipit-source-id: da14f125af352de5fcf05a83a69ad5a69d5a3b45
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48502
This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).
---
FutureNCCL restricted the values to be tensors, or (singleton) lists of tensors, or Python object that could be converted to either of those types. We need a CUDA future that can handle more generic types though.
The main challenge is extracting all DataPtrs from an arbitrary object. I think I found some ways of doing so, but I'd like some JIT experts to look into this and tell me if there are better ways. I'll add inline comments for where their input would be appreciated.
ghstack-source-id: 118180026
Test Plan: Unit tests (I should probably add new ones)
Reviewed By: wanchaol
Differential Revision: D25177562
fbshipit-source-id: 1ef18e67bf44543c70abb4ca152f1610dea4e533
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48495
This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).
---
PythonFutureWrapper needs to provide a GIL-aware way to extract tensors from an IValue of type PyObject. Since this was only used by FutureNCCL it was guarded by #ifdef USE_C10D_NCCL. However, we will need to use it with CUDA-aware futures other than the NCCL one. This might have been achieved simply by replacing USE_C10D_NCCL with USE_CUDA, but I wanted to clean this up better.
We're dealing with two independent dimensions: C++-vs-Python and CPU-vs-CUDA. To make the code more modular, the two dimensions should be dealt with by orthogonal solutions: the user setting a custom callback to handle Python, and the subclass being CUDA-aware. Mixing these two axes makes it more complicated.
Another reason for changing how this works is that later on, when we'll introduce multi-device support, we'll need to extract dataptrs for other reasons too (rather than just recording streams with the caching allocator), namely to inspect the value to determine which devices it resides on.
ghstack-source-id: 118180038
Test Plan: Unit tests
Reviewed By: mrshenli
Differential Revision: D25177560
fbshipit-source-id: 3a424610c1ea191e8371ffee0a26d62639895884
Summary:
Fix for https://github.com/pytorch/pytorch/issues/46122
For `Any`, we infer the type of the ivalue to set the ivalue's type tag. When we saw a Tensor, we would use a specialized Tensor type, so when `Dict[str, Tensor]` was passed in as any `Any` arg it would be inferred as `Dict[str, Float(2, 2, 2, 2)]` which breaks runtime `isinstance` checking.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46130
Reviewed By: glaringlee
Differential Revision: D24261447
Pulled By: eellison
fbshipit-source-id: 8a2bb26ce5b6c56c8dcd8db79e420f4b5ed83ed5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47586
relanding PR of https://github.com/pytorch/pytorch/pull/44492, and add
additional Capsule related wrapping to ensure we still have the correct
type in pybind11 to resolve Capsule as torch._C.CapsuleType
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D24822519
Pulled By: wanchaol
fbshipit-source-id: eaaea446fb54b56ed3b0d04c31481c64096e9459
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45318
When calling `then()` from WorkNCCL, record the input data pointers in futureNCCLCallbackStream_ before the execution of the input callback.
Note that the recording cannot be directly added to the lambda used by addCallback in ProcessGroupNCCL.hpp. This is because the type of future value in that context is pyobject rather than TensorList, but a type casting will require pybind and introduce Python dependency, which should not be allowed in c10d library.
I have considered creating a util function in a separate file to support this type casting, and then placing it under torch/csrc directory where python dependency is allowed. However, torch/csrc has a dependency on c10d, so this will create a circular dependency.
Finally, a `record_stream_cb_` member is added to FutureNCCL, and the default value is nullptr. A default `record_stream_cb_` implementation is added to `PythonFutureWrapper,` where Python dependency is allowed.
In addition, a few lines are reformatted by lint.
caffe2/torch/csrc/distributed/c10d/init.cpp is only reformatted.
#Closes: https://github.com/pytorch/pytorch/issues/44203
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- ProcessGroupNCCLTest
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_accumulate_gradients_no_sync_allreduce_with_then_hook
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_with_then_hook_nccl
Reviewed By: pritamdamania87
Differential Revision: D23910257
fbshipit-source-id: 66920746c41f3a27a3689f22e2a2d9709d0faa15
Summary:
The record_stream method was hard coded for CUDA device. Define the record_stream in the native_functions.yaml to enable the dynamic dispatch to different end device.
Fixes https://github.com/pytorch/pytorch/issues/36556
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44301
Reviewed By: glaringlee
Differential Revision: D23763954
Pulled By: ezyang
fbshipit-source-id: e6d24f5e7892b56101fa858a6cad2abc5cdc4293
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42869
We realized that when we invoke a simple callback that divides the tensors by `world_size` after `allreduce`, the performance was almost 50% lower in terms of QPS compared to the case where a simple `allreduce` hook is used with no `then` callback.
The main problem was as we call `work.wait()` before invoking `then` callback, we were synchronizing `work`'s stream with the default PyTorch stream inside [`runHook`](https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/c10d/reducer.cpp#L609) and stalling the backward computation.
In that PR, we ensure that FutureNCCL's `then` callback is not stalling the backward computation. Assuming single-process single-device, `FutureNCCL` gets a new stream from device's pool using `at::cuda::getStreamFromPool` to run `callback` and before invoking the `callback` inline it synchronizes `WorkNCCL`'s stream by callback's stream not the default stream.
ghstack-source-id: 110208431
Test Plan: Run performance benchmark tests to validate performance issue is resolved. Also, `python test/distributed/test_c10d.py` to avoid any odd issues.
Reviewed By: pritamdamania87
Differential Revision: D23055807
fbshipit-source-id: 60e50993f1ed97497514eac5cb1018579ed2a4c5
Summary:
[3/N] Implement Enum JIT support
* Add enum value as constant support
* Add sugared value for EnumClass
Supported:
Enum-typed function arguments
using Enum type and comparing them
Support getting name/value attrs of enums
Using Enum value as constant
TODO:
Add PyThon sugared value for Enum
Support Enum-typed return values
Support serialization and deserialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42085
Reviewed By: eellison
Differential Revision: D22758042
Pulled By: gmagogsfm
fbshipit-source-id: 5c6e571686c0b60d7fbad59503f5f94b3b3cd125
Summary:
* Add EnumType and AnyEnumType as first-class jit type
* Add Enum-typed IValue
* Enhanced aten::eq to support Enum
Supported:
Enum-typed function targuments
using Enum type and comparing them
TODO:
Add PyThon sugared value for Enum
Support getting name/value attrs of enums
Support Enum-typed return values
Support enum values of different types in same Enum class
Support serialization and deserialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41390
Reviewed By: eellison
Differential Revision: D22524388
Pulled By: gmagogsfm
fbshipit-source-id: 1627154a64e752d8457cd53270f3d14aea4b1150
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40902
See the bottom of this stack for context.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D22360210
Pulled By: suo
fbshipit-source-id: 4275127173a36982ce9ad357aa344435b98e1faf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37034
c10 takes a Stack* in boxed functions while JIT took Stack&.
c10 doesn't return anything while JIT returns an int which is always zero.
This changes JIT to follow the c10 behavior.
ghstack-source-id: 106834069
Test Plan: unit tests
Differential Revision: D20567950
fbshipit-source-id: 1a7aea291023afc52ae706957e9a5ca576fbb53b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40270
Original commit changeset: 1227e243ab94
D22082806 (1e03d603c6) broke the model generation of pyper models. We trace the namedtuple as input. To unblock the development of PyPer project, let's revert the diff first.
Sorry about the inconvenience, SplitInfinity
ghstack-source-id: 106217609
Test Plan: buck run dper3/dper3_models/experimental/pytorch/feed:feed_generation_script -- --model_files_dir=/tmp/
Reviewed By: alyssawangqq
Differential Revision: D22132960
fbshipit-source-id: ce9278c8462602a341e231ea890e46f74e743ddf
Summary:
**Summary**
This commit modifies type inference for `nn.Module` instance attributes
such that the type of a `NamedTuple` attribute is inferred correctly and
such that the field names of this `NamedTuple` instance can be used in
scripted methods. At present, the type of this attribute is inferred to be
`Tuple[T, U, ..., V]`, so the field must be referred to by index and
cannot be referred to by name.
**Test Plan**
This commit adds a unit test to test that a field of a `NamedTuple`
attribute can be referred to by name in a scripted method.
**Fixes**
This commit fixes https://github.com/pytorch/pytorch/issues/37668.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39116
Differential Revision: D22082806
Pulled By: SplitInfinity
fbshipit-source-id: 1227e243ab941376cd5e382fb093751e88dc8846
Summary:
Clearly expressing a type is inferred by PyTorch instead of explicitly annotated by user makes many error messages more user-friendly
Currently Type has two string conversion methods. str() for IR printing and python_str() for serialization and error message generation. If we want to include more information in type printing while maintaining serialization/deserialization correctness, we need to split python_str() into annotation_str() and repr_str().
annotation_str is solely responsible for serialization, it strictly matches format of python type annotation. repr_str() is responsible for generating a human-readable error message that includes information like "this type is inferred, not explicitly annotated"
Closes https://github.com/pytorch/pytorch/issues/39449
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39544
Differential Revision: D21978759
Pulled By: gmagogsfm
fbshipit-source-id: 733566f5a62e748b5ca4bb3c5943ebb6d5b664d0
Summary:
Previously, on conversion from python -> c++ it was casted to double list through bad copy pasta. It's pretty unusual for someone to script a broadcasting list function directly since it's an internal api, so it was unlikely to affect anyone.
Fix for https://github.com/pytorch/pytorch/issues/39450
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39481
Reviewed By: jamesr66a
Differential Revision: D21870557
Pulled By: eellison
fbshipit-source-id: e704e5e87d2702a270b7d65c4df444246a134480
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39008
This commit adds a `torch.futures.Future` type and exposes its ctor,
`wait`, `then`, and `set_result` APIs. This type is currently a
wrapper of `c10::ivalue::Future` and mainly used by RPC for now. Later,
we could revamp c10d APIs to return this `Future` type as well. More
utils will be added into `torch.futures` package in followup PRs.
Test Plan: Imported from OSS
Differential Revision: D21723022
Pulled By: mrshenli
fbshipit-source-id: 92e56160544e9bf00d11db3e8347a1b9707882c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37474
Previously we would segfault
Test Plan: Imported from OSS
Differential Revision: D21297542
Pulled By: suo
fbshipit-source-id: c7e2f828a250c490ec23fb51c6a4a642d3370e52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35154
This is for issue https://github.com/pytorch/pytorch/issues/34999.
close https://github.com/pytorch/pytorch/issues/34999.
https://github.com/pytorch/pytorch/issues/34997 need more work.
This will make a few work items easier, like 1) Dist autograd profiler, 2) JIT annotation for Future.
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_rref_forward_chain --stress-runs 100
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par \
-r test_call_method_on_rref
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- 'test_rref_proxy_class \(fb\.test_rpc_fork\.RpcTestWithFork\)' --stress-runs 100
test_rref_proxy_reuse
test_handle_send_exceptions
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \
-r test_script_call_python_return_future
```
Differential Revision: D7722184
fbshipit-source-id: bd92b855bfea4913d6672700590c57622fa86e0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37472
Our convention is for `findX` to return an optional version and `getX`
to assert that the X is there. Fix up `getMethod` to be consistent with
this convention.
Test Plan: Imported from OSS
Differential Revision: D21297543
Pulled By: suo
fbshipit-source-id: b40f56231cc8183e61bbb01fe5c0c113bcb6464d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35115
This commit runs the newly added tools/clang_format.py on the JIT
codebase and includes all of the formatting changes thus produced.
Testing:
Ran the script, CI.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D20568523
Pulled By: SplitInfinity
fbshipit-source-id: e09bdb982ccf090eecfb7c7b461b8d0681eef82b
Summary:
add `id` function so to give uses a way of keeping a `seen` set of nn modules.
n practice, this is only used between values of `T` and `T` or `T` and `Optional[T]`, so in this implementation I made it so that None is the only value that can be zero. Python also only guarantees `id()` gives semantically meaningful results for pointer types.
EDIT: now only allowing id on class types
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34975
Reviewed By: driazati
Differential Revision: D20599564
Pulled By: eellison
fbshipit-source-id: 3c6666a9b9b0258198adc70969dd6332e3375e4f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35039
This is the initial step towards merging ivalue future and rpc future
Test Plan: Imported from OSS
Differential Revision: D20537164
Pulled By: wanchaol
fbshipit-source-id: d4f148c88e49ed6b0881ca4b4dd945ea24166183
Summary:
add `id` function so to give uses a way of keeping a `seen` set of nn modules.
n practice, this is only used between values of `T` and `T` or `T` and `Optional[T]`, so in this implementation I made it so that None is the only value that can be zero. Python also only guarantees `id()` gives semantically meaningful results for pointer types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34975
Differential Revision: D20549677
Pulled By: eellison
fbshipit-source-id: cca5ed4ef013f7540f93abf49f91f9830dfdca14
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34671
Like the python arg parser, this tries to convert to the schema in order.
It introduces schema_match_exception which gets thrown when the schema doesn't match,
allowing the overload handler to try the next option.
Behavior will not 100% match the schema argument parser but should work for
simple cases using custom binding.
Test Plan: Imported from OSS
Differential Revision: D20432206
Pulled By: zdevito
fbshipit-source-id: 280839a2205ea3497db3a9b5741fccc1e2bff9a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34515
Once upon a time we thought this was necessary. In reality it is not, so
removing it.
For backcompat, our public interface (defined in `api/`) still has
typedefs to the old `script::` names.
There was only one collision: `Pass` as a `Stmt` and `Pass` as a graph
transform. I renamed one of them.
Test Plan: Imported from OSS
Differential Revision: D20353503
Pulled By: suo
fbshipit-source-id: 48bb911ce75120a8c9e0c6fb65262ef775dfba93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33992
resubmit of https://github.com/pytorch/pytorch/pull/33369 with tweaks on when the rref type being created to ensure ivalue->type() hold the correct RRef type inside of inner element type.
Test Plan: Imported from OSS
Differential Revision: D20175043
Pulled By: wanchaol
fbshipit-source-id: a08b178e989c995632374e6c868d23c5a85526ae