Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46730
A narrowing conversion on `last_idx` raises a compiler warning. This fixes that.
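For reference, the usual fix for this class of warning is an explicit cast at the narrowing site; a minimal sketch (the variable name is taken from the summary, the surrounding code is illustrative only):
```
#include <cstdint>

void example(int64_t last_idx) {
  // int32_t idx = last_idx;                    // implicit int64_t -> int32_t narrowing: compiler warning
  int32_t idx = static_cast<int32_t>(last_idx); // explicit cast documents the intent and silences it
  (void)idx;
}
```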
Test Plan: Standard pre-commit test rig.
Reviewed By: EscapeZero
Differential Revision: D24481497
fbshipit-source-id: f3e913b586738add59c422c3cf65035d87fc9e34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46455
After https://github.com/pytorch/pytorch/pull/46236 landed, `aten::copy_` can no longer be dispatched to Metal kernels.
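For context, a hedged sketch of how a backend keeps `aten::copy_` routed to its kernels through the dispatcher; `metal_copy_` is a hypothetical function name, not necessarily how this diff restores the dispatch:
```
#include <ATen/ATen.h>
#include <torch/library.h>

at::Tensor& metal_copy_(at::Tensor& self, const at::Tensor& src, bool non_blocking) {
  // ... launch the Metal copy kernel here ...
  return self;
}

TORCH_LIBRARY_IMPL(aten, Metal, m) {
  // Register the kernel for the Metal dispatch key so copy_ on Metal tensors reaches it.
  m.impl("copy_", TORCH_FN(metal_copy_));
}
```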
ghstack-source-id: 114499399
Test Plan:
- Sandcastle CI
- Circle CI
Reviewed By: IvanKobzarev, ailzhang
Differential Revision: D24356769
fbshipit-source-id: 8660ca5be663fdc8985d9eb710ddaadbb43b0ddd
Summary:
In `assertValidDevice()`, compare the device index against `caching_allocator.device_allocator` rather than against `device_no`.
This fixes potential crashes when the caching allocator is accessed before being initialized, for example by running:
`python -c "import torch;print(torch.cuda.memory_stats(0))"`
Fixes https://github.com/pytorch/pytorch/issues/46437
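A minimal sketch of the kind of check described above (names follow the summary; the actual code in the CUDA caching allocator may differ):
```
void assertValidDevice(int device) {
  // Validate against the number of initialized per-device allocators,
  // not against a cached device count that may not be set up yet.
  const auto device_num = caching_allocator.device_allocator.size();
  TORCH_CHECK(
      0 <= device && device < static_cast<int>(device_num),
      "Invalid device argument ", device, ": did you call init?");
}
```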
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46439
Reviewed By: ngimel
Differential Revision: D24350717
Pulled By: malfet
fbshipit-source-id: 714e6e74f7c2367a9830b0292478270192f07a7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46327
### Summary
Update `COMPILE_TIME_MAX_DEVICE_TYPES` to 12 now that the new Metal backend has landed.
### Test Plan
- Circle CI
Test Plan: Imported from OSS
Reviewed By: IvanKobzarev
Differential Revision: D24309189
Pulled By: xta0
fbshipit-source-id: eec076b7e4fc94bab11840318821aa554447e541
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45993
Fixes a bug exposed by the updated test and validation code.
Also enables this test to run on CI instead of only as a mobile-only test.
Test Plan:
cpu_profiling_allocator_test
Imported from OSS
Reviewed By: dzhulgakov
Differential Revision: D24172599
fbshipit-source-id: da0d2e1d1dec87b476bf39a1c2a2ffa0e4b5df66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46112
### Summary
This PR adds support for running TorchScript models on iOS GPUs via Metal (inference only). The feature is currently in a prototype state; API changes are expected. A tutorial and documentation will be added once it moves to beta.
allow-large-files
- User API
```
// Load the TorchScript model and put it in inference mode.
auto module = torch::jit::load(model);
module.eval();
// Move the input to the GPU with .metal(), run the forward pass, and copy the result back to the CPU.
at::Tensor input = at::ones({1, 3, 224, 224}, at::ScalarType::Float).metal();
auto output = module.forward({input}).toTensor().cpu();
```
- Supported Models
- Person Segmentation v106 (FB Internal)
- Mobilenetv2
- Supported Operators
- aten::conv2d
- aten::addmm
- aten::add.Tensor
- aten::sub.Tensor
- aten::mul.Tensor
- aten::relu
- aten::hardtanh
- aten::hardtanh_
- aten::sigmoid
- aten::max_pool2d
- aten::adaptive_avg_pool2d
- aten::reshape
- aten::t
- aten::view
- aten::log_softmax.int
- aten::upsample_nearest2d.vec
- Supported Devices
- Apple A9 and above
- iOS 10.2 and above
- CMake scripts
- `IOS_ARCH=arm64 ./scripts/build_ios.sh -DUSE_METAL=ON`
### Test Plan
- Circle CI
ghstack-source-id: 114155638
Test Plan:
1. Sandcastle CI
2. Circle CI
Reviewed By: dreiss
Differential Revision: D23236555
fbshipit-source-id: 98ffc48b837e308bc678c37a9a5fd8ae72d11625
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45952
Pull Request resolved: https://github.com/pytorch/glow/pull/4967
When Glow compilation hits a nonrecoverable fatal error (the hardware is busted), we would like to throw a special exception other than the normal `caffe2::EnforceNotMet`, so that we can signal the upper-layer application to handle it differently.
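A minimal sketch of the idea, assuming a hypothetical exception type and call site (the actual class and integration point in the diff are internal):
```
#include <stdexcept>
#include <string>

// Hypothetical exception type for nonrecoverable hardware failures.
class GlowNonRecoverableError : public std::runtime_error {
 public:
  explicit GlowNonRecoverableError(const std::string& msg)
      : std::runtime_error(msg) {}
};

void onCompilationError(bool hardware_busted, const std::string& msg) {
  if (hardware_busted) {
    // Signal the upper-layer application that retrying is pointless.
    throw GlowNonRecoverableError(msg);
  }
  // Recoverable errors keep using the normal error path
  // (caffe2::EnforceNotMet in the real code; runtime_error stands in here).
  throw std::runtime_error(msg);
}
```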
Test Plan: Manually inject an error, add LOG(FATAL) in the special exception path, and wait for the application to fatal.
Reviewed By: ipiszy
Differential Revision: D24156792
fbshipit-source-id: 4ae21bb0d36c89eac331fc52dd4682826b3ea180
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43951
AllocationPlan: stores the sequence of allocations, their sizes, and the lifetime of each allocation. Along with this it stores the total size of a single memory blob, total_size, required to satisfy all the allocations, as well as the offset into that blob for each allocation. Thus the allocation plan contains:
- allocation sizes
- allocation lifetimes
- allocation offsets
- total size
AllocationPlanner: takes a pointer to an AllocationPlan and fills it up with the plan, i.e. sizes, lifetimes, offsets, and total size. This is done via WithProfileAllocationsGuard, which takes an AllocationPlan*, constructs an AllocationPlanner, and sets the thread-local allocation_planner to it. MobileCPUAllocator profiles allocations via allocation_planner. In WithValidateAllocationsGuard, the allocations profiled in the allocation plan are validated.
CPUProfilingAllocator: the application owns the CPUProfilingAllocator. Using WithProfilingAllocatorGuard, it passes in both the CPUProfilingAllocator and the AllocationPlan created earlier. The CPUProfilingAllocator then manages allocations and frees according to the plan. Allocations that are not managed by the CPUProfilingAllocator are routed through c10::alloc_cpu and c10::free_cpu.
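A minimal usage sketch based on the description above; the guard class names come from the summary, while the header path, constructor signatures, and the run_inference() helper are assumptions for illustration:
```
#include <c10/mobile/CPUProfilingAllocator.h> // assumed header location

void run_inference(); // hypothetical helper that performs one forward pass

void plan_and_run() {
  c10::AllocationPlan plan;
  {
    // 1. Profile one inference run to record allocation sizes, lifetimes and offsets.
    c10::WithProfileAllocationsGuard profile_guard(&plan);
    run_inference();
  }
  bool valid = false;
  {
    // 2. Validate that a second run matches the recorded plan.
    c10::WithValidateAllocationsGuard validate_guard(&plan, &valid);
    run_inference();
  }
  // 3. The application owns the profiling allocator; subsequent runs are served from the plan.
  c10::CPUProfilingAllocator profiling_allocator;
  {
    c10::WithProfilingAllocatorGuard allocator_guard(&profiling_allocator, &plan);
    run_inference();
  }
}
```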
Test Plan:
cpu_profiling_allocator_test on mobile.
Imported from OSS
Reviewed By: dreiss
Differential Revision: D23451019
fbshipit-source-id: 98bf1dbcfa8fcfb83d505ac01095e84a3f5b778d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45485
Essentially this is the problem reported by ezyang: https://fb.workplace.com/groups/llvm.gcc/permalink/4053565044692080. There are two proposed fixes:
* https://github.com/pytorch/pytorch/pull/44883: this doesn't work because it fails a static assert at build time
```
caffe2/c10/core/TensorOptions.h:553:1: error: static_assert failed due to requirement 'sizeof(c10::TensorOptions) <= sizeof(long) * 2' "TensorOptions must fit in 128-bits"
static_assert( sizeof(TensorOptions) <= sizeof(int64_t) * 2,
^
```
* https://github.com/pytorch/pytorch/pull/44885: to be tested
This diff is a temporary hack to work around the problem. Without this patch, the following debugging snippet shows the miscompiled cast:
```
volatile size_t device_type = static_cast<size_t>(type);
auto p = device_guard_impl_registry[device_type].load();
C10_LOG_FIRST_N(WARNING, 10) << "XDW-fail: " << cntr << ", Device type: " << type << ", type cast: " << device_type << ", guard: " << p;
// Output: the cast yields 65537 instead of the small enum value for CUDA, and the guard lookup returns null
XDW-fail: 1129, Device type: cuda, type cast: 65537, guard: 0
```
Another workaround is D23788441, which changes -O3 to -O2, so this appears to be a miscompilation by nvcc or the host compiler.
Reviewed By: ezyang
Differential Revision: D23972356
fbshipit-source-id: ab91fbbfccb6389052de216f95cf9a8265445aea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44678
This is a prototype PR that introduces 4-bit qtensors. The new dtype added for this is `c10::quint4x2`.
The underlying storage is still `uint8_t`, so we pack two 4-bit values into a byte during quantization.
This change uses most of the existing scaffolding for qtensor storage; we allocate storage based on the dtype before creating a new qtensor.
It also adds a dispatch mechanism for this dtype so we can query the bit width, qmin, and qmax while quantizing and packing the qtensor (and reuse this when we add a 2-bit qtensor).
Kernels that use this dtype should be aware of the packing format.
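An illustrative sketch of the packing convention described above (two 4-bit values per byte); the helper names are hypothetical and the actual kernels may order the nibbles differently:
```
#include <cstdint>

// Pack two 4-bit quantized values into one byte, low nibble first.
inline uint8_t pack_quint4x2(uint8_t q0, uint8_t q1) {
  return static_cast<uint8_t>(((q1 & 0x0F) << 4) | (q0 & 0x0F));
}

// Kernels reading quint4x2 storage must unpack the two nibbles again.
inline void unpack_quint4x2(uint8_t packed, uint8_t& q0, uint8_t& q1) {
  q0 = packed & 0x0F;
  q1 = (packed >> 4) & 0x0F;
}
```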
Test Plan:
Locally tested
```
import os
import torch

x = torch.ones((100, 100), dtype=torch.float)
qx_8bit = torch.quantize_per_tensor(x, scale=1.0, zero_point=2, dtype=torch.quint8)
qx = torch.quantize_per_tensor(x, scale=1.0, zero_point=2, dtype=torch.quint4x2)
torch.save(x, "temp.p")
print('Size float (B):', os.path.getsize("temp.p"))
os.remove('temp.p')
torch.save(qx_8bit, "temp.p")
print('Size quantized 8bit(B):', os.path.getsize("temp.p"))
os.remove('temp.p')
torch.save(qx, "temp.p")
print('Size quantized 4bit(B):', os.path.getsize("temp.p"))
os.remove('temp.p')
```
Size float (B): 40760
Size quantized 8bit(B): 10808
Size quantized 4bit(B): 5816
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D23993134
fbshipit-source-id: 073bf262f9680416150ba78ed2d932032275946d
Summary:
We are trying to build libtorch statically (`BUILD_SHARED_LIBS=OFF`) and then link it into a DLL. Our setup hits the infinite loop mentioned [here](54c05fa34e/torch/csrc/autograd/engine.cpp (L228)) because we build with `BUILD_SHARED_LIBS=OFF` but still link it all into a DLL in the end.
This PR fixes the issue by changing the condition to guard on which Windows runtime the build links against, using the `CAFFE2_USE_MSVC_STATIC_RUNTIME` flag. `CAFFE2_USE_MSVC_STATIC_RUNTIME` defaults to ON when `BUILD_SHARED_LIBS=OFF`, so backwards compatibility is maintained.
I'm not entirely confident I understand the subtleties of the Windows runtime versus linking setup, but this setup works for us and should not affect the existing builds.
Fixes https://github.com/pytorch/pytorch/issues/44470
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43532
Reviewed By: mrshenli
Differential Revision: D24053767
Pulled By: albanD
fbshipit-source-id: 1127fefe5104d302a4fc083106d4e9f48e50add8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45364
Also adds some more comments about the usage, limitations, and drawbacks.
Test Plan: Build and run benchmark binary.
Reviewed By: gchanan
Differential Revision: D23944193
fbshipit-source-id: 30d4f4991d2185a0ab768d94c846d73730fc0835
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45294
While tracking down a recent memory corruption bug we found that
cuda-memcheck wasn't finding the bad accesses, and ngimel pointed out that
it's because we use a caching allocator so a lot of "out of bounds" accesses
land in a valid slab.
This PR adds a runtime knob (`PYTORCH_NO_CUDA_MEMORY_CACHING`) that, when set,
bypasses the caching allocator's caching logic so that allocations go straight
to cudaMalloc. This way, cuda-memcheck will actually work.
Test Plan:
Insert some memory errors and run a test under cuda-memcheck;
observe that cuda-memcheck flags an error where expected.
Specifically I removed the output-masking logic here:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/cuda_codegen.cpp#L819-L826
And ran:
```
PYTORCH_NO_CUDA_MEMORY_CACHING=1 cuda-memcheck pytest -k test_superslomo test_jit_fuser_te.py
```
Reviewed By: ngimel
Differential Revision: D23964734
Pulled By: bertmaher
fbshipit-source-id: 04efd11e8aff037b9edde80c70585cb820ee6e39
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44071
Previously, tracing re-gathered ScalarType, Layout, Device, and bool arguments into a TensorOptions object and called `tracer::addInput()` on the gathered TensorOptions argument. `tracer::addInput()` then scattered them again and added the individual scattered arguments to the traced graph. This PR avoids the extraneous gathering and re-scattering step and calls `tracer::addInput()` on the individual arguments directly, avoiding the perf hit of an unnecessary gathering step.
This applies to both c10-full and non-c10-full ops. In the case of c10-full ops, the tracing kernel takes scattered arguments and we can pass them directly to `tracer::addInput()`. In the case of non-c10-full ops, the kernel takes a `TensorOptions` argument but we still call `tracer::addInput()` on the scattered arguments.
ghstack-source-id: 112825793
Test Plan:
waitforsandcastle
vs master: https://www.internalfb.com/intern/fblearner/details/216129483/
vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170069/
Reviewed By: ezyang
Differential Revision: D23486638
fbshipit-source-id: e0b53e6673cef8d7f94158e718301eee261e5d22
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44062
Previously, BackendSelect kernels were still written in the legacy way, i.e. they took one TensorOptions argument instead of scattered dtype, layout, device, pin_memory, and they used hacky_wrapper to be callable. This caused a re-wrapping step: calling into a BackendSelect kernel required taking the individual scattered arguments and packing them into a TensorOptions, and the kernel itself then gathered them again for redispatch.
Now with this PR, BackendSelect kernels are written in the new way and no hacky_wrapper or rewrapping is needed for them.
ghstack-source-id: 112825789
Test Plan:
vs master: https://www.internalfb.com/intern/fblearner/details/216117032/
vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170194/
Reviewed By: ezyang
Differential Revision: D23484192
fbshipit-source-id: e8fb49c4692404b6b775d18548b990c4cdddbada
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44653
Per an offline discussion with ilia-cher, this changes the profiler so that the `disableProfiler()` event-consolidation logic can be called from different threads (i.e. threads where the profiler was not explicitly enabled). This is needed to support the functionality enabled by D23638387, where we defer profiling-event collection until executing an async callback that can run on a different thread, to support RPC async function profiling.
This is done by introducing two flags, `cleanupTLSState` and `consolidate`, which control whether we should clean up thread-local settings (we don't do this when calling `disableProfiler()` on non-main threads) and whether we should consolidate all profiled events. Backwards compatibility is ensured since both options are true by default.
Added a test in `test_misc.cpp` to test this.
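A sketch of the resulting call pattern; the namespace and exact signature are assumptions based on the summary (both flags defaulting to true preserves the old behaviour):
```
#include <torch/csrc/autograd/profiler.h>

void finish_on_callback_thread() {
  // Async RPC callback thread: keep the enabling thread's TLS settings alive
  // and don't consolidate events yet.
  torch::autograd::profiler::disableProfiler(
      /*cleanupTLSState=*/false, /*consolidate=*/false);
}

void finish_on_enabling_thread() {
  // Thread that enabled profiling: clean up TLS state and consolidate all
  // events collected across threads (the pre-existing default behaviour).
  auto events = torch::autograd::profiler::disableProfiler(
      /*cleanupTLSState=*/true, /*consolidate=*/true);
  (void)events;
}
```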
ghstack-source-id: 112605620
Reviewed By: mrshenli
Differential Revision: D23638499
fbshipit-source-id: f5bbb0d41ef883c5e5870bc27e086b8b8908f46b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44702
Original commit changeset: c6bd6d277aca
This diff caused the Windows build to fail due to a compiler bug in VS2019 (lambda capture of a constant int value); this back-out works around the issue with an explicit capture of the const int value.
Test Plan: Tested and previously landed.
Reviewed By: mruberry
Differential Revision: D23703215
fbshipit-source-id: f9ef23be97540bc9cf78a855295fb8c69f360459
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44252
Add tracing to the DPP client. Because DPP requests are async, we need to be able to start a trace event in one thread and potentially end it in a different thread. RecordFunction and LibgpumonObserver previously assumed each trace event starts and finishes in the same thread, so they used a thread-local context to track enter and exit callbacks. Async events break this assumption. This change attaches the event context to the RecordFunction object so we no longer need the thread-local context.
Test Plan:
Tested with dpp perf test and able to collect trace.
{F307824044}
Reviewed By: ilia-cher
Differential Revision: D23323486
fbshipit-source-id: 4b6ca6c0e32028fb38a476cd1f44c17a001fc03b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44066
Add an STL input iterator to DispatchKeySet:
* The iterator iterates from the first non-undefined DispatchKey up to NumDispatchKeys.
* The iterator is invalidated once the underlying DispatchKeySet is invalidated.
Note: see http://www.cplusplus.com/reference/iterator/ for a comparison of the different iterator categories.
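A minimal usage sketch of the new iterator, assuming the range-based interface implied above:
```
#include <iostream>
#include <c10/core/DispatchKeySet.h>

int main() {
  c10::DispatchKeySet ks({c10::DispatchKey::CPU, c10::DispatchKey::CUDA});
  // Visits each defined key present in the set, skipping Undefined.
  for (c10::DispatchKey key : ks) {
    std::cout << c10::toString(key) << std::endl;
  }
  return 0;
}
```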
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D23611405
Pulled By: linux-jedi
fbshipit-source-id: 131b287d60226a1d67a6ee0f88571f8c4d29f9c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44440
`aten-op.cc` takes a long time to compile due to the large generated constructor. For each case, the `std::function` constructor and the initialization functions are inlined, producing a huge amount of intermediate code that takes a long time to optimize, given that many compiler optimization passes are superlinear in the function size.
This diff moves each case to a separate function, so that each one is cheap to optimize, and the constructor is just a large jump table, which is easy to optimize.
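A generic illustration of the restructuring (not the actual generated `aten-op.cc`): each case body becomes its own small function, so the constructor only fills a table of function pointers:
```
#include <functional>
#include <vector>

using OpFn = std::function<void()>;

// One small, separately optimizable function per case.
void case0() { /* ... body of case 0 ... */ }
void case1() { /* ... body of case 1 ... */ }

struct OpTable {
  std::vector<OpFn> table;
  OpTable() {
    // The constructor is now just a cheap-to-compile jump table.
    table = {case0, case1};
  }
};
```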
Reviewed By: dzhulgakov
Differential Revision: D23593741
fbshipit-source-id: 1ce7a31cda10d9b0c9d799716ea312a291dc0d36
Summary:
`is_complex_t` is a bad name: in std, for example, there is `std::is_same` but no `std::is_same_t` (the `_t` suffix is conventionally reserved for alias templates that yield types).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39906
Reviewed By: mrshenli
Differential Revision: D22665013
Pulled By: anjali411
fbshipit-source-id: 4b71745f5e2ea2d8cf5845d95ada4556c87e040d
Summary:
This PR turns `DispatchKey::Autograd` into an alias dispatch key that maps to the `AutogradCPU`, `AutogradCUDA`, `AutogradXLA`, `AutogradOther`, and `AutogradPrivate*` keys.
A few things are handled in this PR:
- Update alias dispatch key mapping and precompute dispatchTable logic
- Move `Autograd` key from `always_included` set to TensorImpl constructor.
- Update `dummyTensor` constructor to take `requires_grad` as optional argument so that it's closer to the real application in op_registration_test.
- Use `BackendSelect` key for both backend select before and after autograd layer. (1 liner in backend_select codegen)
A few planned followups ordered by priority:
- [cleanup] Update `test_dispatch.py` to include testing `Autograd`.
- [cleanup] Add Math alias key and move catchAll to Math. (to remove 2.2 in `computeDispatchTableEntryWithDebug`)
- [new feature] Add support for Math in native_functions.yaml
- [cleanup] Add iterator like functionality to DispatchKeySet
- [cleanup/large] Only add Autograd backend keys when tensor requires grad. (cc: ljk53 ?)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43070
Reviewed By: ezyang
Differential Revision: D23281535
Pulled By: ailzhang
fbshipit-source-id: 9ad00b17142e9b83304f63cf599f785500f28f71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43719
This accidentally slipped through: the guard did not update the current context.
Test Plan: cpu_caching_allocator_test
Reviewed By: linbinyu
Differential Revision: D23374453
fbshipit-source-id: 1d3ef21cc390d0a8bde98fb1b5c2175b40ab571b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43564
Static dispatch was originally introduced for mobile selective build.
Since we have added selective build support for dynamic dispatch and
tested it in FB production for months, we can deprecate static dispatch
to reduce the complexity of the codebase.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D23324452
Pulled By: ljk53
fbshipit-source-id: d2970257616a8c6337f90249076fca1ae93090c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42006
This PR introduces a simple CPU caching allocator. It is specifically intended for mobile use cases and for inference. Nothing in the implementation prevents it from being used in other cases, however its simplicity may not be suitable everywhere.
It simply tracks allocations by size and relies on deterministic, repeatable behavior where allocations of the same sizes are made on every inference.
Thus, after the first allocation, when a pointer is returned by the client, instead of returning it to the system the allocator caches it for subsequent use.
Memory is freed automatically at the end of the process, or it can be explicitly freed.
This is enabled at the moment in DefaultMobileCPUAllocator only.
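A minimal sketch of the caching idea described above, assuming a hypothetical standalone class (the real DefaultMobileCPUAllocator integration and thread-safety details differ):
```
#include <cstdlib>
#include <map>
#include <vector>

// Toy size-bucketed caching allocator: freed blocks are cached by size and
// reused on the next inference instead of being returned to the system.
class SimpleCachingAllocator {
 public:
  void* allocate(std::size_t nbytes) {
    auto& bucket = free_blocks_[nbytes];
    if (!bucket.empty()) {            // reuse a cached block of the same size
      void* ptr = bucket.back();
      bucket.pop_back();
      return ptr;
    }
    void* ptr = std::malloc(nbytes);  // first time: fall back to the system allocator
    sizes_[ptr] = nbytes;
    return ptr;
  }

  void release(void* ptr) {
    free_blocks_[sizes_[ptr]].push_back(ptr); // cache instead of freeing
  }

  ~SimpleCachingAllocator() {         // cached blocks are freed when the allocator is destroyed
    for (auto& kv : free_blocks_)
      for (void* p : kv.second) std::free(p);
  }

 private:
  std::map<std::size_t, std::vector<void*>> free_blocks_;
  std::map<void*, std::size_t> sizes_;
};
```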
Test Plan:
android test: cpu_caching_allocator_test
Imported from OSS
Reviewed By: dreiss
Differential Revision: D22726976
fbshipit-source-id: 9a38b1ce34059d5653040a1c3d035bfc97609e6c