Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45294
While tracking down a recent memory corruption bug, we found that
cuda-memcheck wasn't finding the bad accesses; ngimel pointed out that
this is because we use a caching allocator, so many "out of bounds" accesses
land in a valid slab.
This PR adds a runtime knob (`PYTORCH_NO_CUDA_MEMORY_CACHING`) that, when set,
bypasses the caching allocator's caching logic so that allocations go straight
to cudaMalloc. This way, cuda-memcheck will actually work.
Test Plan:
Insert some memory errors and run a test under cuda-memcheck;
observe that cuda-memcheck flags an error where expected.
Specifically, I removed the output-masking logic here:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/cuda_codegen.cpp#L819-L826
And ran:
```
PYTORCH_NO_CUDA_MEMORY_CACHING=1 cuda-memcheck pytest -k test_superslomo test_jit_fuser_te.py
```
Reviewed By: ngimel
Differential Revision: D23964734
Pulled By: bertmaher
fbshipit-source-id: 04efd11e8aff037b9edde80c70585cb820ee6e39
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44071
Previously, tracing re-gathered ScalarType, Layout, Device, bool into a TensorOptions object and called `tracer::addInput()` on the gathered TensorOptions argument. `tracer::addInput()` then scattered them again and added the individual scattered arguments to the traced graph. This PR avoids the extraneous gathering and re-scattering step and calls `tracer::addInput()` on the individual arguments directly, avoiding the perf hit of the unnecessary gathering step.
This applies to both c10-full and non-c10-full ops. In the case of c10-full ops, the tracing kernel takes scattered arguments and we can pass them directly to `tracer::addInput()`. In the case of non-c10-full ops, the kernel takes a `TensorOptions` argument, but we still call `tracer::addInput()` on the scattered arguments.
ghstack-source-id: 112825793
Test Plan:
waitforsandcastle
vs master: https://www.internalfb.com/intern/fblearner/details/216129483/
vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170069/
Reviewed By: ezyang
Differential Revision: D23486638
fbshipit-source-id: e0b53e6673cef8d7f94158e718301eee261e5d22
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44062
Previously, BackendSelect kernels were still written in the legacy way, i.e. they took one TensorOptions argument instead of scattered dtype, layout, device, pin_memory, and they used hacky_wrapper to be callable. This caused a re-wrapping step: calling into a BackendSelect kernel required taking the individual scattered arguments and packing them into a TensorOptions, and the kernel itself then gathered them again for redispatch.
Now with this PR, BackendSelect kernels are written in the new way and no hacky_wrapper or rewrapping is needed for them.
ghstack-source-id: 112825789
Test Plan:
vs master: https://www.internalfb.com/intern/fblearner/details/216117032/
vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170194/
Reviewed By: ezyang
Differential Revision: D23484192
fbshipit-source-id: e8fb49c4692404b6b775d18548b990c4cdddbada
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44653
Per an offline discussion with ilia-cher, this changes the profiler so that the `disableProfiler()` event consolidation logic can be called from different threads (i.e. threads where the profiler was not explicitly enabled). This is needed to support the functionality enabled by D23638387, where we defer profiling event collection until executing an async callback that can run on a different thread, to support RPC async function profiling.
This is done by introducing 2 flags, `cleanupTLSState` and `consolidate`, which control whether we should clean up thread-local settings (we don't do this when calling `disableProfiler()` on non-main threads) and whether we should consolidate all profiled events. Backwards compatibility is ensured since both options are true by default.
Added a test in `test_misc.cpp` to test this.
ghstack-source-id: 112605620
Reviewed By: mrshenli
Differential Revision: D23638499
fbshipit-source-id: f5bbb0d41ef883c5e5870bc27e086b8b8908f46b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44702
Original commit changeset: c6bd6d277aca
This diff caused the Windows build to fail due to a compiler bug in VS2019 (lambda capture of a constant int value). This back-out works around the issue by explicitly capturing the const int value.
Test Plan: Tested and previously landed.
Reviewed By: mruberry
Differential Revision: D23703215
fbshipit-source-id: f9ef23be97540bc9cf78a855295fb8c69f360459
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44252
Add tracing to the DPP client. Because DPP requests are async, we need to be able to start a trace event in one thread and potentially end it in a different thread. RecordFunction and LibgpumonObserver previously assumed each trace event starts and finishes in the same thread, so they used a thread-local context to track enter and exit callbacks. Async events break this assumption. This change attaches the event context to the RecordFunction object so we do not need to use a thread-local context.
Test Plan:
Tested with the DPP perf test and was able to collect a trace.
{F307824044}
Reviewed By: ilia-cher
Differential Revision: D23323486
fbshipit-source-id: 4b6ca6c0e32028fb38a476cd1f44c17a001fc03b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44066
Add an STL input iterator to DispatchKeySet:
* The iterator iterates from the first non-undefined DispatchKey
up to NumDispatchKeys.
* The iterator is invalidated once the underlying DispatchKeySet is invalidated.
Note: see http://www.cplusplus.com/reference/iterator/ for a comparison of
the different iterator categories. A usage sketch follows below.
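To illustrate the idea (not the actual c10::DispatchKeySet code), here is a minimal, self-contained sketch of an STL-style input iterator over a bitset-backed key set; `MiniKeySet` and the integer keys are stand-ins for `DispatchKeySet` and `DispatchKey`:
```
#include <cstdint>
#include <iostream>
#include <iterator>

// MiniKeySet is a stand-in for DispatchKeySet: one bit per key, keys are small ints.
struct MiniKeySet {
  uint64_t repr = 0;

  // An STL input iterator that visits only the keys whose bits are set.
  struct iterator {
    using iterator_category = std::input_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = int;

    uint64_t repr;
    int key;

    iterator(uint64_t r, int k) : repr(r), key(k) { skip_unset(); }

    int operator*() const { return key; }
    iterator& operator++() { ++key; skip_unset(); return *this; }
    bool operator==(const iterator& other) const { return key == other.key; }
    bool operator!=(const iterator& other) const { return !(*this == other); }

   private:
    // Advance to the next set bit; 64 acts as the end() sentinel.
    void skip_unset() {
      while (key < 64 && !(repr & (uint64_t(1) << key))) ++key;
    }
  };

  iterator begin() const { return iterator(repr, 0); }
  iterator end() const { return iterator(repr, 64); }
  void add(int key) { repr |= uint64_t(1) << key; }
};

int main() {
  MiniKeySet ks;
  ks.add(3);   // imagine "CPU"
  ks.add(10);  // imagine "AutogradCPU"
  for (int k : ks) std::cout << "key " << k << "\n";  // prints 3, then 10
}
```
The real iterator additionally starts past the Undefined key and stops at NumDispatchKeys, as noted above.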
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D23611405
Pulled By: linux-jedi
fbshipit-source-id: 131b287d60226a1d67a6ee0f88571f8c4d29f9c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44440
`aten-op.cc` takes a long time to compile due to the large generated constructor. For each case, the `std::function` constructor and the initialization functions are inlined, producing a huge amount of intermediate code that takes a long time to optimize, given that many compiler optimization passes are superlinear in the function size.
This diff moves each case to a separate function, so that each one is cheap to optimize, and the constructor is just a large jump table, which is easy to optimize.
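As a rough, self-contained illustration of the pattern (all names here are hypothetical; the real code is the generated `aten-op.cc`), moving each case body into its own small function leaves the constructor as a simple table fill that the compiler can optimize cheaply:
```
#include <functional>
#include <string>
#include <unordered_map>

// Illustrative operator implementations (the real code wraps ATen operators).
int add_impl(int a, int b) { return a + b; }
int mul_impl(int a, int b) { return a * b; }

// Each former switch case now lives in its own small factory function, so the
// compiler optimizes many tiny functions instead of one enormous constructor.
std::function<int(int, int)> make_add() { return add_impl; }
std::function<int(int, int)> make_mul() { return mul_impl; }

struct OpTable {
  std::unordered_map<std::string, std::function<int(int, int)>> ops;
  // The constructor itself is now just a cheap-to-optimize jump table.
  OpTable() {
    ops["add"] = make_add();
    ops["mul"] = make_mul();
  }
};

int main() {
  OpTable table;
  return table.ops.at("add")(2, 3) == 5 ? 0 : 1;
}
```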
Reviewed By: dzhulgakov
Differential Revision: D23593741
fbshipit-source-id: 1ce7a31cda10d9b0c9d799716ea312a291dc0d36
Summary:
`is_complex_t` is a bad name. For example, in std there is `std::is_same` but no `std::is_same_t`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39906
Reviewed By: mrshenli
Differential Revision: D22665013
Pulled By: anjali411
fbshipit-source-id: 4b71745f5e2ea2d8cf5845d95ada4556c87e040d
Summary:
This PR moves `DispatchKey::Autograd` to an alias dispatch key mapping to `AutogradCPU, AutogradCUDA, AutogradXLA, AutogradOther, AutogradPrivate*` keys.
A few things are handled in this PR:
- Update alias dispatch key mapping and precompute dispatchTable logic
- Move the `Autograd` key from the `always_included` set to the TensorImpl constructor.
- Update the `dummyTensor` constructor to take `requires_grad` as an optional argument so that it's closer to the real usage in op_registration_test.
- Use the `BackendSelect` key for backend select both before and after the autograd layer. (a one-liner in the backend_select codegen)
A few planned followups ordered by priority:
- [cleanup] Update `test_dispatch.py` to include testing `Autograd`.
- [cleanup] Add Math alias key and move catchAll to Math. (to remove 2.2 in `computeDispatchTableEntryWithDebug`)
- [new feature] Add support for Math in native_functions.yaml
- [cleanup] Add iterator like functionality to DispatchKeySet
- [cleanup/large] Only add Autograd backend keys when tensor requires grad. (cc: ljk53 ?)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43070
Reviewed By: ezyang
Differential Revision: D23281535
Pulled By: ailzhang
fbshipit-source-id: 9ad00b17142e9b83304f63cf599f785500f28f71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43719
Accidentally this slipped through: the with-guard did not update the current
context.
Test Plan: cpu_caching_allocator_test
Reviewed By: linbinyu
Differential Revision: D23374453
fbshipit-source-id: 1d3ef21cc390d0a8bde98fb1b5c2175b40ab571b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564
Static dispatch was originally introduced for mobile selective build.
Since we have added selective build support for dynamic dispatch and
tested it in FB production for months, we can deprecate static dispatch
to reduce the complexity of the codebase.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D23324452
Pulled By: ljk53
fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42006
This PR introduces a simple CPU caching allocator. This is specifically
intended for mobile use cases and for inference. Nothing specific to the
implementation prevents it from being used elsewhere; however, its
simplicity may not make it suitable everywhere.
It simply tracks allocations by size and relies on deterministic, repeatable
behavior where allocations of the same sizes are made on every inference.
Thus, when the pointer from the first allocation is returned to the allocator,
instead of freeing it back to the system, the allocator caches it for
subsequent use.
Memory is freed automatically at the end of the process, or it can be
explicitly freed.
This is enabled at the moment in DefaultMobileCPUAllocator only.
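A minimal, standalone sketch of the caching scheme described above, assuming size-keyed free lists; `SimpleCachingAllocator` is an illustrative stand-in, not the actual DefaultMobileCPUAllocator implementation:
```
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Freed blocks are kept in per-size free lists and handed back for later
// allocations of the same size, instead of going back to the system.
class SimpleCachingAllocator {
 public:
  void* allocate(std::size_t size) {
    auto& bucket = free_blocks_[size];
    if (!bucket.empty()) {
      void* ptr = bucket.back();    // reuse a previously freed block
      bucket.pop_back();
      allocated_sizes_[ptr] = size;
      return ptr;
    }
    void* ptr = std::malloc(size);  // first time we see this size: go to the system
    allocated_sizes_[ptr] = size;
    return ptr;
  }

  void release(void* ptr) {
    // Instead of returning the block to the system, cache it for reuse.
    std::size_t size = allocated_sizes_.at(ptr);
    allocated_sizes_.erase(ptr);
    free_blocks_[size].push_back(ptr);
  }

  // Release everything back to the system (e.g. at process exit or explicitly).
  void free_cached() {
    for (auto& kv : free_blocks_)
      for (void* ptr : kv.second) std::free(ptr);
    free_blocks_.clear();
  }

 private:
  std::unordered_map<std::size_t, std::vector<void*>> free_blocks_;
  std::unordered_map<void*, std::size_t> allocated_sizes_;
};

int main() {
  SimpleCachingAllocator alloc;
  void* a = alloc.allocate(1024);
  alloc.release(a);
  void* b = alloc.allocate(1024);  // same size on the next "inference": cached block reused
  alloc.release(b);
  alloc.free_cached();
}
```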
Test Plan:
android test: cpu_caching_allocator_test
Imported from OSS
Reviewed By: dreiss
Differential Revision: D22726976
fbshipit-source-id: 9a38b1ce34059d5653040a1c3d035bfc97609e6c
Summary:
They were likely copied from some macro definition, but they do not
belong to macro definitions here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43318
Reviewed By: pbelevich
Differential Revision: D23241526
Pulled By: mrshenli
fbshipit-source-id: e0b5eddfde2c882bb67f56d84ee79281cc5fc941
Summary:
Add a ComplexHalf case to toValueType, which fixes the logic for how view_as_real and view_as_complex slice a complex tensor into a floating point one, as it is used to generate tensors of random complex values, see:
018b4d7abb/aten/src/ATen/native/DistributionTemplates.h (L200)
Also add the ability to convert a Python complex object to `c10::complex<at::Half>`.
Add `torch.half` and `torch.complex32` to the list of `test_randn` dtypes.
Fixes https://github.com/pytorch/pytorch/issues/43143
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43279
Reviewed By: mrshenli
Differential Revision: D23230296
Pulled By: malfet
fbshipit-source-id: b4bb66c4c81dd867e72ab7c4563d73f6a4d80a44
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42612
Add a new Quantizer that supports an input zero point (bias) that can be float.
The quantization equation in this case is
Xq = (Xf - bias) * inv_scale, where bias is the float zero_point value.
We start with a per-row implementation and can extend it to per-tensor in the future, if necessary.
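A small standalone sketch of the per-row equation above, with an added round-and-clamp to the quantized type; the function names are illustrative and not the actual Quantizer API:
```
#include <cmath>
#include <cstdint>
#include <iostream>
#include <vector>

// Quantize one row with a float zero point (bias): Xq = round((Xf - bias) * inv_scale).
std::vector<uint8_t> quantize_row(const std::vector<float>& row, float bias, float inv_scale) {
  std::vector<uint8_t> out;
  out.reserve(row.size());
  for (float x : row) {
    float q = std::nearbyint((x - bias) * inv_scale);
    q = std::fmin(std::fmax(q, 0.0f), 255.0f);  // clamp to the quantized range
    out.push_back(static_cast<uint8_t>(q));
  }
  return out;
}

// Dequantize back: Xf = Xq / inv_scale + bias.
std::vector<float> dequantize_row(const std::vector<uint8_t>& row, float bias, float inv_scale) {
  std::vector<float> out;
  out.reserve(row.size());
  for (uint8_t q : row) out.push_back(static_cast<float>(q) / inv_scale + bias);
  return out;
}

int main() {
  std::vector<float> row = {0.1f, 0.5f, 1.0f};
  auto q = quantize_row(row, /*bias=*/0.05f, /*inv_scale=*/100.0f);
  auto d = dequantize_row(q, 0.05f, 100.0f);
  for (float x : d) std::cout << x << " ";  // approximately the original values
}
```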
Test Plan:
python test/test_quantization.py TestQuantizedTensor
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D22960142
fbshipit-source-id: ca9ab6c5b45115d3dcb1c4358897093594313706
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42617
While we figure out the random plan, I want to initially disable
support for random operations. This is because there is an ambiguity in
what randomness means. For example,
```
tensor = torch.zeros(B0, 1)
vmap(lambda t: t.normal_())(tensor)
```
in the above example, should tensor[0] and tensor[1] be equal (i.e.,
use the same random seed), or should they be different?
The mechanism for disabling random support is as follows:
- We add a new dispatch key called VmapMode
- Whenever we're inside vmap, we enable VmapMode for all tensors.
This is done via at::VmapMode::increment_nesting and
at::VmapMode::decrement_nesting.
- DispatchKey::VmapMode's fallback kernel is the fallthrough kernel.
- We register kernels that raise errors for all random functions on
DispatchKey::VmapMode (see the sketch below). This way, whenever someone calls a random
function on any tensor (not just BatchedTensors) inside of a vmap block,
an error gets thrown.
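A rough sketch of what registering such error kernels can look like, assuming the `TORCH_LIBRARY_IMPL` registration macro and a boxed kernel signature of `(const c10::OperatorHandle&, torch::jit::Stack*)`; the kernel name and the specific ops shown are illustrative, not the literal code in this PR:
```
#include <torch/library.h>

// A boxed kernel that refuses to run: registered for random ops on DispatchKey::VmapMode.
void unsupportedRandomOp(const c10::OperatorHandle& op, torch::jit::Stack* stack) {
  (void)stack;  // unused: we error out before touching the arguments
  TORCH_CHECK(false, "vmap: calling random operation '", op.schema().name(),
              "' inside of vmap is not yet supported");
}

TORCH_LIBRARY_IMPL(aten, VmapMode, m) {
  // Everything falls through by default, so non-random ops behave as usual...
  m.fallback(torch::CppFunction::makeFallthrough());
  // ...but random ops hit the error kernel, regardless of whether the tensor is batched.
  m.impl("normal_", torch::CppFunction::makeFromBoxedFunction<&unsupportedRandomOp>());
  m.impl("uniform_", torch::CppFunction::makeFromBoxedFunction<&unsupportedRandomOp>());
}
```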
Test Plan: - pytest test/test_vmap.py -v -k "Operators"
Reviewed By: ezyang
Differential Revision: D22954840
Pulled By: zou3519
fbshipit-source-id: cb8d71062d4087e10cbf408f74b1a9dff81a226d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42619
Added missing entries to `DispatchKey::toString()` and reordered to match declaration order in `DispatchKey.h`
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D22963407
Pulled By: bhosmer
fbshipit-source-id: 34a012135599f497c308ba90ea6e8117e85c74ac
Summary:
This function was always expected to return a `size_t` value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42454
Reviewed By: ezyang
Differential Revision: D22993168
Pulled By: ailzhang
fbshipit-source-id: 044df8ce17983f04681bda8c30cd742920ef7b1e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42694
The old implementation allowed calling the SmallVector constructor and operator= for any type without restrictions,
but then failed with a compiler error when the type wasn't a collection.
Instead, we should only enable them if Container satisfies a container concept, and simply not match the constructor otherwise.
This fixes an issue kimishpatel was running into.
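A standalone sketch of the constraint, using a hypothetical `MiniVector` in place of `c10::SmallVector`: the converting constructor only participates in overload resolution when the argument actually has `begin()`/`end()`:
```
#include <list>
#include <type_traits>
#include <vector>

// Detect whether T looks like a container (has begin()/end()).
template <class T, class = void>
struct is_container : std::false_type {};
template <class T>
struct is_container<T,
    std::void_t<decltype(std::declval<T>().begin()), decltype(std::declval<T>().end())>>
    : std::true_type {};

// Illustrative stand-in for SmallVector: the container constructor only matches
// when Container really is a container, so non-container arguments simply don't
// match instead of blowing up with an error deep inside the constructor body.
template <class T>
class MiniVector {
 public:
  MiniVector() = default;
  template <class Container,
            std::enable_if_t<is_container<Container>::value, int> = 0>
  explicit MiniVector(const Container& c) : data_(c.begin(), c.end()) {}

 private:
  std::vector<T> data_;
};

int main() {
  std::list<int> l = {1, 2, 3};
  MiniVector<int> a(l);  // OK: std::list is a container
  (void)a;
  // MiniVector<int> b(42);  // would not compile: the constructor doesn't match for int
  static_assert(!std::is_constructible<MiniVector<int>, int>::value,
                "non-containers don't match the constructor");
}
```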
ghstack-source-id: 109370513
Test Plan: unit tests
Reviewed By: kimishpatel, ezyang
Differential Revision: D22983020
fbshipit-source-id: c31264f5c393762d822f3d64dd2a8e3279d8da44
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42570
ProfiledType doesn't do anything and is not currently used, so remove it.
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D22938664
Pulled By: ilia-cher
fbshipit-source-id: 037c512938028f44258b702bbcde3f8c144f4aa0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42249
The main change is to bring Caffe2's superior error messages for CUDA initialization into c10 and use them in all code paths.
Basic logic:
| Case | Call to device_count() | init_cuda, e.g. allocating tensor |
| -- | -- | -- |
| all good | non-zero | just works |
| no gpus | 0, no warning | throw exception with good message |
| driver issues | 0, produce warning | throw exception with good message |
| out of memory with ASAN | 0, produce warning | throw exception with ASAN message |
Previously, the error thrown from init_cuda was very generic and the ASAN warning (if any) was buried in the logs.
Other clean up changes:
* always cache device_count() in a static variable
* move all ASAN macros into c10
Test Plan:
Hard to unit test because of build modes. Verified manually that the behavior from the table above holds by running the following script in different modes (ASAN/no-ASAN, CUDA_VISIBLE_DEVICES=):
```
print('before import')
import torch
print('after import')
print('devices: ', torch.cuda.device_count())
x = torch.tensor([1,2,3])
print('tensor creation')
x = x.cuda()
print('moved to cuda')
```
Reviewed By: ngimel
Differential Revision: D22824329
fbshipit-source-id: 5314007313a3897fc955b02f8b21b661ae35fdf5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41610
Previously, operators that have a `Tensor?` (i.e. optional tensor) in their schema implemented it using `Tensor` in C++ and filled in an undefined tensor for the None case.
The c10 operator library, however, expects `Tensor?` to be represented as `optional<Tensor>`, so those operators couldn't be c10-full yet and still had to use codegenerated unboxing instead of templated unboxing.
This PR changes that. It extends the `hacky_wrapper_for_legacy_signatures` to not only take care of TensorOptions, but now also map between signatures taking `Tensor` and `optional<Tensor>`.
For this, it requires an additional template parameter, the expected signature, and it uses that to go argument-by-argument and unwrap any optionals it finds.
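A minimal standalone sketch of the unwrapping idea (mapping `optional<Tensor>` in the exposed signature onto the legacy undefined-tensor convention); the `Tensor` struct and function names below are stand-ins, not the real `hacky_wrapper_for_legacy_signatures` code:
```
#include <optional>

// Hypothetical stand-in for at::Tensor: an "undefined" instance represents None.
struct Tensor {
  bool defined = false;
};

// Legacy kernel: expects an undefined Tensor for the None case.
Tensor legacy_kernel(Tensor self, Tensor weight /* may be undefined */) {
  return weight.defined ? weight : self;
}

// Unwrap an optional argument into the legacy representation; pass others through.
Tensor unwrap(std::optional<Tensor> arg) {
  if (arg.has_value()) return *arg;
  return Tensor{};  // nullopt -> undefined Tensor
}
template <class T>
T unwrap(T arg) { return arg; }

// Wrapper exposing the c10-full signature (optional<Tensor>) on top of the legacy kernel.
Tensor wrapped_kernel(Tensor self, std::optional<Tensor> weight) {
  return legacy_kernel(unwrap(self), unwrap(weight));
}

int main() {
  Tensor t;
  t.defined = true;
  wrapped_kernel(t, std::nullopt);  // None becomes an undefined Tensor internally
}
```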
ghstack-source-id: 108873701
Test Plan: waitforsandcastle
Reviewed By: bhosmer
Differential Revision: D22607879
fbshipit-source-id: 57b2fb01a294b804f82cd55cd70f0ef4a478e14f
Summary:
* Make c10::cuda functions regular non-inlined functions
* Add driver_version() and device_synchronize() functions
With this change I no longer see direct calls to the CUDA API when looking at Modules.cpp.obj.
FYI malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42251
Reviewed By: malfet
Differential Revision: D22826505
Pulled By: ziab
fbshipit-source-id: 8dc2f3e209d3710e2ce78411982a10e8c727573c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38999
Adds boxing for inplace and outplace kernels, itemizes
remaining unsupported cases, and fails compilation when
new unsupported types are introduced in op signatures.
Test Plan: Imported from OSS
Differential Revision: D21718547
Pulled By: bhosmer
fbshipit-source-id: 03295128b21d1843e86789fb474f38411b26a8b6
Summary:
Update the API for accessing grad in C++ to avoid unexpected thread safety issues.
In particular, with the current API, a check like `t.grad().defined()` is not thread safe.
- This introduces `t.mutable_grad()`, which should be used when getting a mutable version of the saved gradient. This function is **not** thread safe (see the usage sketch below).
- The `Tensor& grad()` API is now removed. We could not do a deprecation cycle, as most of our call sites use non-const Tensors that would pick the non-const overload; this would lead to most calls hitting the warning, which would be too verbose for users.
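A small usage sketch of the resulting C++ API, assuming `grad()` now returns a const reference and `mutable_grad()` is the mutable accessor described above; illustrative rather than exhaustive:
```
#include <torch/torch.h>
#include <iostream>

int main() {
  auto t = torch::ones({2, 2}, torch::requires_grad());
  auto loss = (t * t).sum();
  loss.backward();

  // Read-only access: grad() returns a const reference, safe for checks like defined().
  if (t.grad().defined()) {
    std::cout << t.grad() << std::endl;
  }

  // Mutable access: use mutable_grad() when you intend to modify the saved gradient.
  // This accessor is NOT thread safe.
  t.mutable_grad() = torch::zeros({2, 2});
}
```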
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40887
Reviewed By: ezyang
Differential Revision: D22343932
Pulled By: albanD
fbshipit-source-id: d5eb909bb743bc20caaf2098196e18ca4110c5d2
Summary:
Declaring GLOG_ constants in the google namespace causes a conflict in C++ projects that use GLOG and link with LibPyTorch compiled without GLOG.
For example, see https://github.com/facebookresearch/ReAgent/issues/288
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41504
Reviewed By: kaiwenw
Differential Revision: D22564308
Pulled By: malfet
fbshipit-source-id: 2167bd2c6124bd14a67cc0a1360521d3c375e3c2