Commit Graph

788 Commits

Author SHA1 Message Date
Basil Hosmer
1ed1a2f5b0 [wip] fast typeMeta/ScalarType conversion approach 2 (#44965)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44965

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23789657

Pulled By: bhosmer

fbshipit-source-id: 5afdd52d24bd097891ff4a7313033f7bd400165e
2020-09-29 02:39:36 -07:00
Bert Maher
03342af3a3 Add env variable to bypass CUDACachingAllocator for debugging (#45294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45294

While tracking down a recent memory corruption bug we found that
cuda-memcheck wasn't finding the bad accesses, and ngimel pointed out that
it's because we use a caching allocator so a lot of "out of bounds" accesses
land in a valid slab.

This PR adds a runtime knob (`PYTORCH_NO_CUDA_MEMORY_CACHING`) that, when set,
bypasses the caching allocator's caching logic so that allocations go straight
to cudaMalloc.  This way, cuda-memcheck will actually work.

Test Plan:
Insert some memory errors and run a test under cuda-memcheck;
observe that cuda-memcheck flags an error where expected.

Specifically I removed the output-masking logic here:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/cuda_codegen.cpp#L819-L826

And ran:
```
PYTORCH_NO_CUDA_MEMORY_CACHING=1 cuda-memcheck pytest -k test_superslomo test_jit_fuser_te.py
```

Reviewed By: ngimel

Differential Revision: D23964734

Pulled By: bertmaher

fbshipit-source-id: 04efd11e8aff037b9edde80c70585cb820ee6e39
2020-09-28 11:40:04 -07:00
Sebastian Messmer
78fcde9c50 Trace scattered tensor options arguments (#44071)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44071

Previously, tracing re-gathered ScalarType, Layout, Device, bool into a TensorOptions object and called `tracer::addInput()` on the gathered TensorOptions argument. `tracer::addInput()` then scattered them again and added the individual scattered arguments to the traced graph. This PR avoids the extraneous gathering and re-scattering step and calls `tracer::addInput()` on the individual arguments directly. This avoid the perf hit for an unnecessary gathering step.

This applies to both c10-full and non-c10-full ops. In the case of c10-full ops, the tracing kernels takes scattered arguments and we can directly pass them to `tracer::addInput()`. In the case of non-c10-full ops, the kernel takes a `TensorOptions` argument but we still call `tracer::addInput()` on the scattered arguments.
ghstack-source-id: 112825793

Test Plan:
waitforsandcastle

vs master: https://www.internalfb.com/intern/fblearner/details/216129483/

vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170069/

Reviewed By: ezyang

Differential Revision: D23486638

fbshipit-source-id: e0b53e6673cef8d7f94158e718301eee261e5d22
2020-09-25 09:04:06 -07:00
Sebastian Messmer
2ac7de7d53 Remove hacky_wrapper from BackendSelect kernels (#44062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44062

Previously, BackendSelect kernels were still written in the legacy way, i.e. they took one TensorOptions argument instead of scattered dtype, layout, device, pin_memory,  and they used hacky_wrapper to be callable. This caused a re-wrapping step. Calling into a BackencSelect kernel required taking the individual scattered arguments, packing them into a TensorOptions, and the kernel itself then gathered them again for redispatch.

Now with this PR, BackendSelect kernels are written in the new way and no hacky_wrapper or rewrapping is needed for them.
ghstack-source-id: 112825789

Test Plan:
vs master: https://www.internalfb.com/intern/fblearner/details/216117032/

vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170194/

Reviewed By: ezyang

Differential Revision: D23484192

fbshipit-source-id: e8fb49c4692404b6b775d18548b990c4cdddbada
2020-09-25 09:04:03 -07:00
Hong Xu
b470fa4500 Add complex number support for binary logical operators (#43174)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43174

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23684425

Pulled By: mruberry

fbshipit-source-id: 4857b16e18ec4c65327136badd7f04c74e32d330
2020-09-23 23:03:00 -07:00
Rohan Varma
70d2e4d1f6 [RPC profiling] Allow disableProfiler() to be called from another thread. (#44653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44653

This changes the profiler per a discussion with ilia-cher offline that enables `disableProfiler()` event consolidation logic to be called from different threads (i.e. threads where the profiler was not explicitly enabled). This is needed to support the functionality enabled by D23638387 where we defer profiling event collection until executing an async callback that can execute on a different thread, to support RPC async function profiling.

This is done by introducing 2 flags `cleanupTLSState` and `consolidate` which controls whether we should clean up thread local settings (we don't do this when calling `disableProfiler()` on non-main threads) and whether we should consolidate all profiled events. Backwards compatiblity is ensured since both options are true by default.

Added a test in `test_misc.cpp` to test this.
ghstack-source-id: 112605620

Reviewed By: mrshenli

Differential Revision: D23638499

fbshipit-source-id: f5bbb0d41ef883c5e5870bc27e086b8b8908f46b
2020-09-22 21:16:58 -07:00
Ailing Zhang
92f8f75c59 Add alias dispatch key Math. (#44354)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44354

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23591481

Pulled By: ailzhang

fbshipit-source-id: 6e93c4ec99a07f3fc920ba2d09dc222e6ced5adf
2020-09-21 11:10:39 -07:00
Ailing Zhang
a47e3697ab Use iterator of DispatchKeySet. (#44682)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44682

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23698387

Pulled By: ailzhang

fbshipit-source-id: 4fa140db9254c2c9c342bf1c8dfd952469b0b779
2020-09-18 13:34:27 -07:00
Kimish Patel
f605d7581e Implement better caching allocator for segmentation usecase. (#44618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44618

This diff refactors caching allocator to allow for overriding behavior by
making it a virtual class.

Test Plan: https://www.internalfb.com/intern/fblearner/details/218419618?tab=Experiment%20Results

Reviewed By: dreiss

Differential Revision: D23672902

fbshipit-source-id: 976f02922178695fab1c87f453fcb59142c258ec
2020-09-17 08:56:14 -07:00
Louis Feng
eb75cfb9c0 Back out "Revert D23323486: DPP Async Tracing" plus windows build fix. (#44702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44702

Original commit changeset: c6bd6d277aca

This diff caused windows build to fail due to a compiler bug in VS2019 (lambda capture constant int value). This back out works around the issue with explicit capture of const int value.

Test Plan: Tested and previously landed.

Reviewed By: mruberry

Differential Revision: D23703215

fbshipit-source-id: f9ef23be97540bc9cf78a855295fb8c69f360459
2020-09-16 11:32:11 -07:00
Mike Ruberry
7036e91abd Revert D23323486: DPP Async Tracing
Test Plan: revert-hammer

Differential Revision:
D23323486 (71673b31f9)

Original commit changeset: 4b6ca6c0e320

fbshipit-source-id: c6bd6d277aca070bef2de3522c2a60e23b4395ad
2020-09-15 01:19:23 -07:00
Louis Feng
71673b31f9 DPP Async Tracing (#44252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44252

Add tracing to DPP client. Because DPP requests are async, we need to be able to start a trace event in one thread and potentially end in a different thread. RecordFunction and LibgpumonObserver previously assume each trace event starts and finishes in the same thread. So they use a thread local context to track enter and exit call backs. Async events breaks this assumption. This change attaches the event context to the RecordFunction object so we do not need to use thread local context.

Test Plan:
Tested with dpp perf test and able to collect trace.

{F307824044}

Reviewed By: ilia-cher

Differential Revision: D23323486

fbshipit-source-id: 4b6ca6c0e32028fb38a476cd1f44c17a001fc03b
2020-09-14 18:43:14 -07:00
kshitij12345
c68a99bd61 [numpy] Add torch.exp2 (#44184)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

TODO
* [x] Add tests
* [x] Add docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44184

Reviewed By: ngimel

Differential Revision: D23674237

Pulled By: mruberry

fbshipit-source-id: 7f4fb1900fad3051cd7fc9d3d7f6d985c5fb093c
2020-09-14 04:05:37 -07:00
Caleb Thomas
dd4bbe1a79 Add iterator like functionality for DispatchKeySet (#44066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44066

Add STL Input iterator to DispatchKeySet:
* Iterator is able to iterate from first not undefined DispatchKey
to NumDispatchKeys.
* Iterator is invalidated once underlying DispatchKeySet is invalidated

Note see http://www.cplusplus.com/reference/iterator/ for comparisons of
different iterators.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23611405

Pulled By: linux-jedi

fbshipit-source-id: 131b287d60226a1d67a6ee0f88571f8c4d29f9c3
2020-09-11 15:08:15 -07:00
Ailing Zhang
39bb455e36 Update fallback kernel for Autograd keys. (#44349)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44349

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23589807

Pulled By: ailzhang

fbshipit-source-id: 0e4b0bf3e07bb4e35cbf1bda22f7b03193eb3dc4
2020-09-11 12:04:52 -07:00
Giuseppe Ottaviano
6324ef4ced [caffe2] Speed up compilation of aten-op.cc (#44440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44440

`aten-op.cc` takes a long time to compile due to the large generated constructor. For each case, the `std::function` constructor and the initialization functions are inlined, producing a huge amount of intermediate code that takes a long time to optimize, given that many compiler optimization passes are superlinear in the function size.

This diff moves each case to a separate function, so that each one is cheap to optimize, and the constructor is just a large jump table, which is easy to optimize.

Reviewed By: dzhulgakov

Differential Revision: D23593741

fbshipit-source-id: 1ce7a31cda10d9b0c9d799716ea312a291dc0d36
2020-09-09 21:21:48 -07:00
Ailing Zhang
4aacfab221 Resolve Autograd key for disable_variable_dispatch flag. (#44268)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44268

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23561042

Pulled By: ailzhang

fbshipit-source-id: 6f35cd9a543bea3f9e294584f1db7c3622ebb741
2020-09-08 21:27:52 -07:00
Ailing Zhang
1b2da9ed82 Expose alias key info in dumpState and update test_dispatch. (#44081)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44081

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23492794

Pulled By: ailzhang

fbshipit-source-id: 27a2978591900463bda2e92e0201c9fd719f9792
2020-09-06 18:43:05 -07:00
Xiang Gao
263412e536 Rename is_complex_t -> is_complex (#39906)
Summary:
`is_complex_t` is a bad name. For example in std, there are `std::is_same` but not `std::is_same_t`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39906

Reviewed By: mrshenli

Differential Revision: D22665013

Pulled By: anjali411

fbshipit-source-id: 4b71745f5e2ea2d8cf5845d95ada4556c87e040d
2020-09-01 21:04:19 -07:00
Ailing Zhang
224232032c Move Autograd to an alias dispatch key (#43070)
Summary:
This PR moves `DispatchKey::Autograd` to an alias dispatch key mapping to `AutogradCPU, AutogradCUDA, AutogradXLA, AutogradOther, AutogradPrivate*` keys.

A few things are handled in this PR:
- Update alias dispatch key mapping and precompute dispatchTable logic
- Move `Autograd` key from `always_included` set to TensorImpl constructor.
- Update `dummyTensor` constructor to take `requires_grad` as optional argument so that it's closer to the real application in op_registration_test.
- Use `BackendSelect` key for both backend select before and after autograd layer. (1 liner in backend_select codegen)

A few planned followups ordered by priority:
- [cleanup] Update `test_dispatch.py` to include testing `Autograd`.
- [cleanup] Add Math alias key and move catchAll to Math. (to remove 2.2 in `computeDispatchTableEntryWithDebug`)
- [new feature] Add support for Math in native_functions.yaml
- [cleanup] Add iterator like functionality to DispatchKeySet
- [cleanup/large] Only add Autograd backend keys when tensor requires grad. (cc: ljk53 ?)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43070

Reviewed By: ezyang

Differential Revision: D23281535

Pulled By: ailzhang

fbshipit-source-id: 9ad00b17142e9b83304f63cf599f785500f28f71
2020-09-01 09:05:29 -07:00
Ashkan Aliabadi
4e39c310eb Move torch/csrc/utils/hash.h to c10/util/hash.h. (#42503)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42503

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D23252331

Pulled By: AshkanAliabadi

fbshipit-source-id: 3c4c0e27b9a7eec8560e374c2a3ba5f1c65dae48
2020-08-29 17:47:00 -07:00
Nikita Shulga
d10056652b Enable torch.half for lt and masked_select (#43704)
Summary:
Enable testing of those options in `TestTorchDeviceTypeCPU.test_logical_cpu` and `TestTorchDeviceTypeCPU.test_masked_select_cpu_float16`
Add `view_as_real` testing for `torch.complex32` type

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43704

Reviewed By: albanD

Differential Revision: D23373070

Pulled By: malfet

fbshipit-source-id: 00f17f23b48513379a414227aea91e2d3c0dd5f9
2020-08-29 02:37:26 -07:00
Kimish Patel
dc5d365514 Fix bug in caching allocator. (#43719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43719

Accidentally this slipped through: with guard did not update the current
context

Test Plan: cpu_caching_allocator_test

Reviewed By: linbinyu

Differential Revision: D23374453

fbshipit-source-id: 1d3ef21cc390d0a8bde98fb1b5c2175b40ab571b
2020-08-28 11:56:23 -07:00
Jiakai Liu
3a0e35c9f2 [pytorch] deprecate static dispatch (#43564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564

Static dispatch was originally introduced for mobile selective build.

Since we have added selective build support for dynamic dispatch and
tested it in FB production for months, we can deprecate static dispatch
to reduce the complexity of the codebase.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23324452

Pulled By: ljk53

fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7
2020-08-27 14:52:48 -07:00
Yi Zhang
47e1b7a8f1 Set CONSTEXPR_EXCEPT_WIN_CUDA as const while it is not constexpr (#43380)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42467

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43380

Reviewed By: albanD

Differential Revision: D23278930

Pulled By: pbelevich

fbshipit-source-id: 6ce0bc9fd73cd0ead46c414fdea5f6fb7e9fec3e
2020-08-22 03:25:37 -07:00
Basil Hosmer
915fd1c8fc centralize autograd dispatch key set (#43387)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43387

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D23258687

Pulled By: bhosmer

fbshipit-source-id: 3718f74fc7324db027f87eda0b90893a960aa56e
2020-08-22 00:46:02 -07:00
Kimish Patel
2a08566b8f Simple caching allocator for CPU. (#42006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42006

This PR introduces a simple CPU caching allocator. This is specifically
intended for mobile use cases and for inference. There is nothing
specific to the implementation that can prevent it from other use cases,
however its simplicity may not be suitable everywhere.
It simply tracks allocation by sizes and relies on deterministic
repeatable behavior where allocation of same sizes are made on every
inference.
Thus after the first allocation when the pointer is returned, instead of
returning it to system, allocator caches it for subsequent use.
Memory is freed automatically at the end of the process, or it can be
explicitly freed.
This is enabled at the moment in DefaultMobileCPUAllocator only.

Test Plan:
android test: cpu_caching_allocator_test

Imported from OSS

Reviewed By: dreiss

Differential Revision: D22726976

fbshipit-source-id: 9a38b1ce34059d5653040a1c3d035bfc97609e6c
2020-08-21 19:09:22 -07:00
Mike Ruberry
3aec1185e0 Enables bfloat16 x [float16, complex64, complex128] type promotion (#43324)
Summary:
Implements bfloat16 type promotion consistent with JAX (see https://jax.readthedocs.io/en/latest/type_promotion.html), addressing issue https://github.com/pytorch/pytorch/issues/43049.

- bfloat16 x float16 -> float32
- bfloat16 x complex64 -> complex64
- bfloat16 x complex128 -> complex128

Existing tests, after updates, are sufficient to validate the new behavior.

cc xuhdev

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43324

Reviewed By: albanD

Differential Revision: D23259823

Pulled By: mruberry

fbshipit-source-id: ca9c2c7d0325faced1f884f3c37edf8fa8c8b089
2020-08-21 10:48:04 -07:00
Hong Xu
7b520297dc Remove erroneous trailing backslashes (#43318)
Summary:
They were likely copied from some macro definition, but they do not
belong to macro definitions here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43318

Reviewed By: pbelevich

Differential Revision: D23241526

Pulled By: mrshenli

fbshipit-source-id: e0b5eddfde2c882bb67f56d84ee79281cc5fc941
2020-08-21 08:21:56 -07:00
Nikita Shulga
e10aa47615 Fix at::native::view_as_real() for ComplexHalf Tensors (#43279)
Summary:
Add ComplexHalf case to toValueType, which fixes the logic how view_as_real and view_as_complex slices complex tensor to the floating point one, as it is used to generate tensor of random complex values, see:
018b4d7abb/aten/src/ATen/native/DistributionTemplates.h (L200)
Also add ability to convert python complex object to `c10::complex<at::Half>`

Add `torch.half` and `torch.complex32` to the list of `test_randn` dtypes

Fixes https://github.com/pytorch/pytorch/issues/43143

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43279

Reviewed By: mrshenli

Differential Revision: D23230296

Pulled By: malfet

fbshipit-source-id: b4bb66c4c81dd867e72ab7c4563d73f6a4d80a44
2020-08-20 17:38:06 -07:00
Xiang Gao
d5bc2a8058 Remove std::complex from c10::Half (#39833)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39833

Reviewed By: mrshenli

Differential Revision: D22644987

Pulled By: anjali411

fbshipit-source-id: 5ae5db10b12d410560eca43234efa04b711a639c
2020-08-18 15:22:36 -07:00
Ailing Zhang
7cb8d68ae1 Rename XLAPreAutograd to AutogradXLA. (#43047)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43047

Reviewed By: ezyang

Differential Revision: D23134326

Pulled By: ailzhang

fbshipit-source-id: 5fcbc23755daa8a28f9b03af6aeb3ea0603b5c9a
2020-08-17 10:47:43 -07:00
Muthu Arivoli
b8102b1550 Implement torch.nextafter (#42580)
Summary:
Related to https://github.com/pytorch/pytorch/issues/38349.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42580

Reviewed By: smessmer

Differential Revision: D23012260

Pulled By: mruberry

fbshipit-source-id: ce82a63c4ad407ec6ffea795f575ca7c58cd6137
2020-08-14 00:35:30 -07:00
Supriya Rao
6f8446840e [quant] Create PerRowQuantizer for floating point scale and zero_point (#42612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42612

Add a new Quantizer that supports an input zero point (bias) that can be float.
The quantization equation in this case is

Xq = (Xf - bias) * inv_scale, where bias is float zero_point value
We start with per-row implementation and can extend to per-tensor in the future, if necessary

Test Plan:
python test/test_quantization.py TestQuantizedTensor

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D22960142

fbshipit-source-id: ca9ab6c5b45115d3dcb1c4358897093594313706
2020-08-13 11:20:53 -07:00
Muthu Arivoli
92885ebe16 Implement hypot (#42291)
Summary:
Related to https://github.com/pytorch/pytorch/issues/38349
Closes https://github.com/pytorch/pytorch/issues/22764

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42291

Reviewed By: malfet

Differential Revision: D22951859

Pulled By: mruberry

fbshipit-source-id: d0118f2b6437e5c3f775f699ec46e946a8da50f0
2020-08-12 13:18:26 -07:00
Richard Zou
e8f4b04d9a vmap: temporarily disable support for random functions (#42617)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42617

While we figure out the random plan, I want to initially disable
support for random operations. This is because there is an ambiguity in
what randomness means. For example,

```
tensor = torch.zeros(B0, 1)
vmap(lambda t: t.normal_())(tensor)
```

in the above example, should tensor[0] and tensor[1] be equal (i.e.,
use the same random seed), or should they be different?

The mechanism for disabling random support is as follows:
- We add a new dispatch key called VmapMode
- Whenever we're inside vmap, we enable VmapMode for all tensors.
This is done via at::VmapMode::increment_nesting and
at::VmapMode::decrement_nesting.
- DispatchKey::VmapMode's fallback kernel is the fallthrough kernel.
- We register kernels that raise errors for all random functions on
DispatchKey::VmapMode. This way, whenever someone calls a random
function on any tensor (not just BatchedTensors) inside of a vmap block,
an error gets thrown.

Test Plan: - pytest test/test_vmap.py -v -k "Operators"

Reviewed By: ezyang

Differential Revision: D22954840

Pulled By: zou3519

fbshipit-source-id: cb8d71062d4087e10cbf408f74b1a9dff81a226d
2020-08-11 07:19:51 -07:00
Basil Hosmer
b6810c1064 Include/ExcludeDispatchKeySetGuard API (#42658)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42658

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D22971426

Pulled By: bhosmer

fbshipit-source-id: 4d63e0cb31745e7b662685176ae0126ff04cdece
2020-08-08 16:27:05 -07:00
Basil Hosmer
c889de7e25 update DispatchKey::toString() (#42619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42619

Added missing entries to `DispatchKey::toString()` and reordered to match declaration order in `DispatchKey.h`

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D22963407

Pulled By: bhosmer

fbshipit-source-id: 34a012135599f497c308ba90ea6e8117e85c74ac
2020-08-07 22:39:23 -07:00
aviloria
6755e49cad Set proper return type (#42454)
Summary:
This function was always expecting to return a `size_t` value

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42454

Reviewed By: ezyang

Differential Revision: D22993168

Pulled By: ailzhang

fbshipit-source-id: 044df8ce17983f04681bda8c30cd742920ef7b1e
2020-08-07 19:22:35 -07:00
Sebastian Messmer
95f4f67552 Restrict conversion to SmallVector (#42694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42694

The old implementation allowed calling SmallVector constructor and operator= for any type without restrictions,
but then failed with a compiler error when the type wasn't a collection.

Instead, we should only use it if Container follows a container concept and just not match the constructor otherwise.

This fixes an issue kimishpatel was running into.
ghstack-source-id: 109370513

Test Plan: unit tests

Reviewed By: kimishpatel, ezyang

Differential Revision: D22983020

fbshipit-source-id: c31264f5c393762d822f3d64dd2a8e3279d8da44
2020-08-07 15:47:29 -07:00
Basil Hosmer
1f689b6ef9 suppress all Autograd keys in AutoNonVariableTypeMode (#42610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42610

Fix for https://github.com/pytorch/pytorch/issues/42609: `AutoNonVariableTypeMode` should suppress all autograd dispatch keys, not just `Autograd` (e.g. `XLAPreAutograd`, `PrivateUse<N>_PreAutograd`)

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D22963408

Pulled By: bhosmer

fbshipit-source-id: 2f3516580ce0c9136aff5e025285d679394f2f18
2020-08-06 13:15:42 -07:00
Ilia Cherniavskii
a53fdaa23f Remove ProfiledType (#42570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42570

ProfiledType doesn't do anything and is not used atm, removing

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D22938664

Pulled By: ilia-cher

fbshipit-source-id: 037c512938028f44258b702bbcde3f8c144f4aa0
2020-08-06 01:52:08 -07:00
Dmytro Dzhulgakov
06d978a9ad [c10/cuda] Reorganize device_count() and robustly surface ASAN warnings (#42249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42249

Main change is to bring Caffe2's superior error messages for cuda initialization into c10 and use them in all code paths.

Basic logic:

| Case | Call to device_count() | init_cuda, e.g. allocating tensor |
| -- | -- | -- |
| all good | non-zero | just works |
| no gpus | 0, no warning | throw exception with good message |
| driver issues | 0, produce warning | throw exception with good message |
| out of memory with ASAN | 0, produce warning| throw exception with ASAN message |

Previously, the error thrown from init_cuda was very generic and the ASAN warning (if any) was buried in the logs.

Other clean up changes:
* cache device_count() always in a static variable
* move all asan macros in c10

Test Plan:
Hard to unittest because of build modes. Verified manually that the behavior from the table above holds by running the following script in different modes (ASAN/no-ASAN, CUDA_VISIBLE_DEVICES=):

```
print('before import')
import torch
print('after import')
print('devices: ', torch.cuda.device_count())
x = torch.tensor([1,2,3])
print('tensor creation')
x = x.cuda()
print('moved to cuda')
```

Reviewed By: ngimel

Differential Revision: D22824329

fbshipit-source-id: 5314007313a3897fc955b02f8b21b661ae35fdf5
2020-08-05 11:39:31 -07:00
Christian Puhrsch
24199e0768 tuple_map / tuple_concat (#42326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42326

ghstack-source-id: 108868289

Test Plan: Unit tests

Reviewed By: smessmer

Differential Revision: D22846504

fbshipit-source-id: fa9539d16e21996bbd80db3e3c524b174b22069e
2020-08-03 19:19:47 -07:00
Sebastian Messmer
3a19af2427 Make operators with optional Tensor? arguments c10-full (#41610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41610

Previously, operators that have a `Tensor?` (i.e. optional tensor) in their schema implemented it using `Tensor` in C++ and filled in an undefined tensor for the None case.
The c10 operator library, however, expects `Tensor?` to be represented as `optional<Tensor>`, so those operators couldn't be c10-full yet and still had to use codegenerated unboxing instead of templated unboxing.

This PR changes that. It extends the `hacky_wrapper_for_legacy_signatures` to not only take case of TensorOptions, but now also map between signatures taking `Tensor` and `optional<Tensor>`.
For this, it requires an additional template parameter, the expected signature, and it uses that to go argument-by-argument and unwrap any optionals it finds.
ghstack-source-id: 108873701

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D22607879

fbshipit-source-id: 57b2fb01a294b804f82cd55cd70f0ef4a478e14f
2020-07-31 16:09:08 -07:00
ziab
1c8217a7a6 Abstract cuda calls made from torch_python (#42251)
Summary:
* Make c10::cuda functions regular non-inlined functions
* Add driver_version() and device_synchronize() functions

With this change I don't see anymore direct calls to CUDA API when look at Modules.cpp.obj

FYI malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42251

Reviewed By: malfet

Differential Revision: D22826505

Pulled By: ziab

fbshipit-source-id: 8dc2f3e209d3710e2ce78411982a10e8c727573c
2020-07-30 19:18:33 -07:00
Hong Xu
344defc973 Let bfloat16 support promotion with other types (#41698)
Summary:
Fix https://github.com/pytorch/pytorch/issues/40580

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41698

Reviewed By: albanD

Differential Revision: D22824042

Pulled By: mruberry

fbshipit-source-id: 7dad9c12dc51d8f88c3ca963ae9c5f8aa2f72277
2020-07-30 12:28:09 -07:00
Basil Hosmer
029007c8b6 Improved coverage for unboxed->boxed kernel wrappers (#38999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38999

Adds boxing for inplace and outplace kernels, itemizes
remaining unsupported cases, and fails compilation when
new unsupported types are introduced in op signatures.

Test Plan: Imported from OSS

Differential Revision: D21718547

Pulled By: bhosmer

fbshipit-source-id: 03295128b21d1843e86789fb474f38411b26a8b6
2020-07-29 11:31:16 -07:00
albanD
45c5bac870 [WIP] Fix cpp grad accessor API (#40887)
Summary:
Update the API to access grad in cpp to avoid unexpected thread safety issues.
In particular, with the current API, a check like `t.grad().defined()` is not thread safe.

- This introduces `t.mutable_grad()` that should be used when getting a mutable version of the saved gradient. This function is **not** thread safe.
- The `Tensor& grad()` API is now removed. We could not do a deprecation cycle as most of our call side use non-const Tensors that use the non-const overload. This would lead to most calls hitting the warning. This would be too verbose for all the users.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40887

Reviewed By: ezyang

Differential Revision: D22343932

Pulled By: albanD

fbshipit-source-id: d5eb909bb743bc20caaf2098196e18ca4110c5d2
2020-07-16 09:11:12 -07:00
Nikita Shulga
702140758f Move GLOG_ constants into c10 namespace (#41504)
Summary:
Declaring GLOG_ constants in google namespace causes a conflict in C++ project that uses GLOG and links with LibPyTorch compiled without GLOG.
For example, see https://github.com/facebookresearch/ReAgent/issues/288

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41504

Reviewed By: kaiwenw

Differential Revision: D22564308

Pulled By: malfet

fbshipit-source-id: 2167bd2c6124bd14a67cc0a1360521d3c375e3c2
2020-07-15 21:56:00 -07:00