Commit Graph

95 Commits

Gregory Chanan
bd1271338a Add python_variable._is_view for debugging. (#18197)
Summary:
I don't know if we actually want to expose this or not, but it's useful for debugging.
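For illustration, a minimal sketch of how such a helper might be used from Python, assuming it is exposed as `Tensor._is_view()`:

```python
import torch

x = torch.randn(4, requires_grad=True)
v = x.view(2, 2)        # v is an autograd view of x

print(x._is_view())     # False: x owns its own storage/history
print(v._is_view())     # True: v was created as a view of x
```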
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18197

Reviewed By: ezyang

Differential Revision: D14530712

Pulled By: gchanan

fbshipit-source-id: 98fdba9cf113738f0db3a198c49365de536b9919
2019-03-20 08:43:02 -07:00
Edward Yang
18949c8e00 Add nbytes, itemsize, element_size to at::Tensor. (#17810)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17810

Partially addresses #12728. Also, switch the element_size bindings
to use the new function, rather than the method on Type.

We don't add Python bindings yet, as they need to be special
(they will be properties).
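For orientation, the relationship these C++ accessors expose already holds at the Python level via the existing `element_size()`/`numel()` methods (a sketch, not part of this change):

```python
import torch

t = torch.zeros(3, 4, dtype=torch.float32)
print(t.element_size())               # 4: bytes per float32 element (itemsize)
print(t.numel() * t.element_size())   # 48: total bytes, i.e. what nbytes reports
```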

Differential Revision: D14388790

fbshipit-source-id: 294183d0c8a59b0c13f2bf21d6f1cd557333e83b
2019-03-12 09:48:54 -07:00
Roy Li
7aae51cded Replace tensor.type().scalarType() calls with tensor.scalar_type()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17515

Reviewed By: ezyang

Differential Revision: D14233250

fbshipit-source-id: 6c7af8d2291c0c2b148001b30cf03834f34366c0
2019-03-08 14:08:18 -08:00
Xiang Gao
2e5a8cee82 Customize the printing of namedtuple return (#17136)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17112
```python
print("good", torch.randn(5,5,5).max(1))
print("terrible", torch.randn(5,5,10).max(1))
print("not as good", torch.randn(5,5,500).max(1))
print ("old behaviour = gold standard")
print(tuple(torch.randn(5,5,5).max(1)))
print(tuple(torch.randn(5,5,10).max(1)))
print(tuple(torch.randn(5,5,500).max(1)))
```
now gives
```
>>> import torch
>>> print("good", torch.randn(5,5,5).max(1))
good torch.return_types.max(
values=tensor([[ 1.2821,  1.8063,  1.8075,  1.3082, -0.1267],
        [ 0.3437,  0.7353,  1.2619,  0.7557,  1.6662],
        [ 0.8583,  1.8906,  1.0246,  1.7598,  1.1184],
        [ 1.7821,  0.0230,  0.9452,  1.0318,  1.0823],
        [ 0.4116, -0.0379, -0.1843,  1.4129,  1.8796]]),
indices=tensor([[4, 4, 3, 2, 1],
        [1, 2, 4, 1, 1],
        [2, 4, 0, 2, 1],
        [0, 2, 0, 3, 1],
        [0, 4, 4, 4, 4]]))
>>> print("terrible", torch.randn(5,5,10).max(1))
terrible torch.return_types.max(
values=tensor([[ 2.1272,  1.3664,  2.2067,  1.3974, -0.0883,  1.2505,  1.0074,  1.1217,
          0.3849,  0.6936],
        [ 0.6288, -0.4560,  1.2748,  1.5482,  1.2777,  1.6874,  0.7151,  0.6041,
          1.3572,  1.6232],
        [ 1.6703,  1.0075,  1.6480,  2.2839,  1.3390,  0.4938,  1.6449,  1.7628,
          0.8141,  2.5714],
        [ 0.7079,  1.8677,  3.2478,  1.5591,  2.4870,  0.8635, -0.1450,  1.6923,
          1.4924,  1.6298],
        [ 2.4056,  0.8002,  0.9317,  0.7455,  0.7866,  2.1191,  0.3492,  1.2095,
          1.8637,  1.7470]]),
indices=tensor([[1, 1, 0, 0, 0, 0, 3, 4, 4, 4],
        [4, 2, 2, 1, 2, 2, 3, 1, 1, 3],
        [0, 3, 3, 0, 2, 1, 4, 1, 0, 1],
        [4, 1, 3, 0, 3, 2, 0, 1, 4, 3],
        [1, 0, 3, 2, 1, 0, 0, 1, 0, 1]]))
>>> print("not as good", torch.randn(5,5,500).max(1))
not as good torch.return_types.max(
values=tensor([[ 0.3877,  0.7873,  1.8701,  ...,  0.5971,  1.6103, -0.3435],
        [ 1.1300,  2.2418,  1.4239,  ...,  1.3943,  0.3872,  1.6475],
        [ 2.0656,  1.3136,  0.9896,  ...,  2.3918,  0.8226,  1.0517],
        [ 1.1054,  0.9945,  1.0561,  ...,  2.1039,  1.1524,  3.0304],
        [ 1.5041,  2.2809,  1.0883,  ...,  0.8504,  2.4774,  1.1041]]),
indices=tensor([[4, 3, 1,  ..., 1, 4, 0],
        [4, 4, 4,  ..., 3, 0, 3],
        [3, 0, 1,  ..., 2, 2, 4],
        [0, 1, 1,  ..., 4, 2, 2],
        [1, 0, 4,  ..., 2, 0, 2]]))
>>> print ("old behaviour = gold standard")
old behaviour = gold standard
>>> print(tuple(torch.randn(5,5,5).max(1)))
(tensor([[ 1.1908,  1.1807,  1.3151,  1.7184,  0.3556],
        [ 0.3798,  0.9213,  0.3001,  1.3087,  2.2419],
        [ 1.4233,  1.4814,  1.9900,  1.7744,  1.3059],
        [ 1.0026, -0.0330,  1.3061,  1.8730,  2.0685],
        [ 1.3041,  1.6458,  1.3449,  1.8948,  3.6206]]), tensor([[0, 4, 3, 4, 0],
        [1, 1, 4, 0, 4],
        [4, 1, 0, 3, 3],
        [1, 2, 1, 4, 0],
        [3, 3, 0, 3, 3]]))
>>> print(tuple(torch.randn(5,5,10).max(1)))
(tensor([[-0.1232,  0.8275,  0.6732,  1.1223,  0.8247,  1.2851,  1.6009,  1.9979,
          1.9109,  0.7313],
        [ 0.2260,  0.5922,  1.6928,  0.6024,  2.1158,  3.0619,  0.5653,  0.7426,
          0.8316,  0.6346],
        [ 0.4319,  0.2231,  0.5255,  1.7620,  1.1657,  0.8875,  0.5782,  0.6506,
          0.5032,  1.7097],
        [ 0.4137,  1.7265,  1.4260,  2.0301,  1.2244,  0.7128,  2.6345,  0.7230,
          1.3553,  1.6508],
        [ 1.0684,  1.7195,  1.4068,  0.7076, -0.0242,  0.8474,  0.8754,  1.7108,
          0.2188,  1.1584]]), tensor([[0, 1, 3, 4, 2, 3, 4, 2, 1, 0],
        [1, 4, 0, 0, 3, 2, 0, 0, 3, 3],
        [2, 3, 1, 1, 4, 0, 1, 4, 4, 4],
        [0, 4, 1, 3, 2, 0, 2, 0, 3, 1],
        [1, 0, 0, 0, 0, 3, 3, 3, 2, 0]]))
>>> print(tuple(torch.randn(5,5,500).max(1)))
(tensor([[0.9395, 1.5572, 1.8797,  ..., 2.0494, 0.8202, 0.9623],
        [1.7937, 0.7225, 1.8836,  ..., 0.7927, 1.4976, 1.1813],
        [0.8558, 1.6943, 1.4192,  ..., 0.8327, 1.9661, 0.4197],
        [1.2993, 1.4995, 0.9357,  ..., 0.7810, 1.3030, 2.6216],
        [1.4206, 1.8315, 1.0338,  ..., 1.4312, 1.3198, 1.5233]]), tensor([[0, 4, 3,  ..., 3, 0, 2],
        [0, 1, 0,  ..., 0, 4, 3],
        [3, 4, 3,  ..., 3, 0, 0],
        [3, 2, 3,  ..., 1, 2, 1],
        [1, 2, 4,  ..., 3, 1, 3]]))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17136

Differential Revision: D14250021

Pulled By: VitalyFedyunin

fbshipit-source-id: aae72f03b35980063b1ac1f07b8353eddb0c8b93
2019-02-28 13:07:26 -08:00
David Riazati
18edd3ab08 Warn when tracing legacy constructors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16770

Differential Revision: D13963581

Pulled By: driazati

fbshipit-source-id: 8f8cdfc455ba65be370fd952fc5e5c233525d002
2019-02-05 18:32:59 -08:00
Edward Yang
4404762d7d Rename IntList to IntArrayRef. (#16751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16751

This was made more complicated by the fact that ivalue::IntList
is a thing.  So I had to fix all of the sites where we were referring
to IValue post facto.

The following codemods were run, in this order:

```
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntList IntArrayRef
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntArrayRef::create IntList::create
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in ivalue::IntArrayRef ivalue::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in Tag::IntArrayRef Tag::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in isIntArrayRef isIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in toIntArrayRef toIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'Shared<IntArrayRef>' 'Shared<IntList>'
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'intrusive_ptr<IntArrayRef>' 'intrusive_ptr<IntList>'
```

Some manual fixups were done afterwards; they can be reviewed separately
at https://github.com/pytorch/pytorch/pull/16752

Reviewed By: dzhulgakov

Differential Revision: D13954363

fbshipit-source-id: b5c40aacba042402155a2f5a229fa6db7992ac64
2019-02-05 14:54:34 -08:00
James Reed
d1ed0176df Trace fork and join calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16232

Differential Revision: D13772974

Pulled By: jamesr66a

fbshipit-source-id: b2db370271809e26d3301f8cc98eec567db5e62b
2019-01-26 14:42:45 -08:00
Edward Yang
e936a69085 Move THCCachingAllocator to c10_cuda. (#16119)
Summary:
Some renaming and renamespacing also took place. I was originally planning not to do anything, but it turns out that it was easier to make HIPify work by using a namespace CUDACachingAllocator:: rather than THCCachingAllocator_, since :: is a word boundary but _ is not.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/16119

Reviewed By: smessmer

Differential Revision: D13718768

fbshipit-source-id: 884a481d99027fd3e34471c020f826aa12225656
2019-01-24 12:06:56 -08:00
Edward Yang
24b50f1411 Remove unnecessary includes and headers from THCCachingAllocator, move to at::cuda:: namespace (#16117)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16117

This means I can move it to c10_cuda with minimal fuss.

Reviewed By: smessmer

Differential Revision: D13717836

fbshipit-source-id: a94c7dc649af64542480fc1c226b289588886c00
2019-01-24 12:06:54 -08:00
Shen Li
24f4d3987e Move all Stream and Event Python implementation to C++ (#15937)
Summary:
1. Added `torch/csrc/cuda/Event.h` and `torch/csrc/cuda/Event.cpp` to bind Python Event class to C++ implementation.
2. Move all CUDA runtime invocations from `torch/cuda/streams.py` to C++
3. Added tests to cover Stream and Event APIs. ~(event IPC handle tests are introduced in #15974)~
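A minimal sketch of the Python-level Stream/Event API that these bindings back (assuming a CUDA-capable machine):

```python
import torch

if torch.cuda.is_available():
    s = torch.cuda.Stream()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    with torch.cuda.stream(s):            # make s the current stream
        start.record()
        a = torch.randn(1024, 1024, device="cuda")
        b = a @ a
        end.record()

    end.synchronize()                     # wait for the work queued on s
    print(start.elapsed_time(end))        # elapsed time in milliseconds
```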
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15937

Differential Revision: D13649001

Pulled By: mrshenli

fbshipit-source-id: 84ca58f35f6ba679a4ba33150ceba678d760d240
2019-01-17 07:29:22 -08:00
Thomas Viehmann
2d56df7892 Use .to to convert new tensors in new_tensor (#14097)
Summary:
This would solve the tracing problems of #13969.
Fixes: #14732

I would appreciate it if this got good scrutiny before it is applied.
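For context, a minimal sketch of the `new_tensor` behavior affected here (data is copied; dtype and device are inherited from the base tensor unless overridden):

```python
import torch

base = torch.zeros(2, 2, dtype=torch.float64)
t = base.new_tensor([[1, 2], [3, 4]])   # copies the data, inherits float64 from base
print(t.dtype)                          # torch.float64
print(t.requires_grad)                  # False unless requested explicitly
```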
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14097

Differential Revision: D13323181

Pulled By: ezyang

fbshipit-source-id: dcd104b497c0bfddb751923c6166a3824b7a3702
2018-12-04 14:03:56 -08:00
Edward Yang
c5cc1e3ab2 Delete legacy THCStream (long live THCStream). (#14246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14246

This commit systematically eliminates THCStream entirely from THC, replacing it
with at::cuda::CUDAStream.  In places where the previous pointer type showed up
in a public API signature, those functions are now only available to C++
clients.  (It would not be too difficult to make a C-compatible version of
CUDAStream, as it's really just a simple struct, but we leave this for
future work.)

All functions in THC that referred to THCStream were expunged in favor of their
modern counterparts.

One annoyance was that I didn't feel like redoing how the torch.cuda.Stream
binding code worked, but I really wanted to get rid of the stored THCStream*
pointer.  So I repurposed the bit-packing code I implemented for Stream hashing,
and used that to (reversibly) store streams in a uint64_t cdata field.  A perhaps
more future proof solution would be to get rid of cdata entirely, and store the
device and stream ID directly.

Billing of changes:
- All CUDAStream_ pointer API functions are now hidden and anonymously
  namespaced (instead of being in the impl namespace).  All use sites
  rewritten to use the modern C++ API.  Since CUDAStreamInternals is no
  longer part of the public API, the CUDAStreamInternals constructor and
  internals() method have been removed, and replaced with anonymous
  functions in the C++ file.
- device_index() returns DeviceIndex rather than int64_t now
- Stream and CUDAStream now have pack/unpack methods.  (CUDAStream checks
  that the unpacked bit-pattern is for a CUDA device.)
- THCStream.h header is removed entirely
- Most THCStream handling functions in THC API are removed

Reviewed By: gchanan

Differential Revision: D13121531

fbshipit-source-id: 48873262cc0a37c3eec75a7ba1c93c800da40222
2018-11-27 08:32:09 -08:00
Edward Yang
e35418b3be New implementations of DeviceGuard, StreamGuard and MultiStreamGuard (with CUDA specializations) (#13342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13342

This PR introduces a few new concepts:

- DeviceGuardImplInterface, and implementations for CPU and CUDA, which
  provide a generic interface for interfacing with device and stream state,
  without requiring a direct dependency on the code in question.
- InlineDeviceGuard, a general template for generating both specialized
  and dynamically dispatched device guard implementations.  Dynamic
  dispatch is done by specializing it on a VirtualGuardImpl.
- Provide a device-independent DeviceGuard class, which can be used even
  from CPU code. It uses the aforementioned dynamic dispatch.
- CUDA-specialized CUDAGuard class, which doesn't have a dynamic dispatch
  but can only be used from CUDA.
- StreamGuard, which is the same as above, but for streams rather than
  devices.
- Optional variants of all the aforementioned guards, which are a no-op if
  no device/stream is specified
- CUDAMultiStreamGuard, specifically for the case when we want to set
  a device on every guard.

There are some subtle semantic changes, which have been thoroughly documented
in the class definition.

BC-breaking changes:

- Move constructor/assignment have been removed from all device guard
  implementations.
- In some cases where you previously wrote 'set_device' (or 'set_stream'), you now must write
  'reset_device', because if you switch devices/device types, the stream/device on the
  previous device is unset.  This is different from previous behavior.
- CUDAGuard no longer handles streams, or multiple streams.  Use CUDAStreamGuard
  or CUDAMultiStreamGuard as appropriate for your use case.

Reviewed By: dzhulgakov

Differential Revision: D12849620

fbshipit-source-id: f61956256f0b12be754b3234fcc73c2abc1be04e
2018-11-11 12:11:10 -08:00
Wanchao Liang
e552c04d53 Add proper comment for dispatch_to (#13783)
Summary:
Add proper comment to the fix in https://github.com/pytorch/pytorch/pull/13700
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13783

Differential Revision: D13009956

Pulled By: wanchaol

fbshipit-source-id: 34f5259204dab12f4159ab191e7b08e2f5226292
2018-11-09 15:48:15 -08:00
Gregory Chanan
a1b2f1710d Remove _th_is_contiguous, make is_set_to a function, not a method.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13725

Differential Revision: D12980246

Pulled By: gchanan

fbshipit-source-id: e5c5742a67e5a25062df736e28b44c133a635ca8
2018-11-09 07:02:38 -08:00
Wanchao Liang
411d89ca64 Fix the bug in dispatch_to when calling cpu() (#13700)
Summary:
When we added `to` in #13146, we did not emit the cast correctly in one of the dispatch overloads; as a result, when we call .cpu(), the dtype would always be the default float type, which is wrong.

CC jamesr66a eellison
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13700

Differential Revision: D12968699

Pulled By: wanchaol

fbshipit-source-id: c1aaf2bf6a163643ce5360797da61c68271d8bf8
2018-11-07 22:57:35 -08:00
Richard Zou
8c2d0c831f Speed up tensor.storage_offset (#13267)
Summary:
This PR special cases tensor.storage_offset to avoid dispatches in the
common case. tensor.storage_offset is important for torch.as_strided
performance, because as_strided(sizes, strides) shares an implementation
with as_strided(sizes, strides, storage_offset) and it might not be the
best if there were two separate implementations (including backward
implementations).

This PR reduces times on a tensor.storage_offset
microbenchmark from 22ns to 2ns (these numbers are pretty stable). For
a torch.as_strided benchmark, this PR reduces numbers from 1042 to
928ns, a 100ns improvement, but this number is noisy and goes up and
down.
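A small sketch of the `storage_offset`/`as_strided` relationship being optimized here:

```python
import torch

x = torch.arange(10.)
y = x.as_strided((3,), (1,), 2)   # view of x's storage starting at element 2
print(y)                          # tensor([2., 3., 4.])
print(y.storage_offset())         # 2
```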
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13267

Reviewed By: ezyang

Differential Revision: D12829828

Pulled By: zou3519

fbshipit-source-id: df907731e2398ce2baf1c8b1860a561ccc456f78
2018-10-30 07:36:21 -07:00
Richard Zou
efab8e8fdf Speed up tensor.get_device(), is_cuda(), is_sparse() by avoiding dispatches (#12841)
Summary:
`tensor.get_device()` went through two dispatches: once to the native function
`get_device()`, and another when `get_device` calls `_th_get_device()`.
This PR avoids the dispatch by directly implementing the `get_device` function
as a method on Tensor.
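For illustration, a minimal sketch of the accessors being sped up (requires a CUDA device for `get_device()`):

```python
import torch

if torch.cuda.is_available():
    t = torch.zeros(2, device="cuda:0")
    print(t.get_device())   # 0: the CUDA device ordinal
    print(t.is_cuda)        # True
    print(t.is_sparse)      # False
```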

Future Work:
- Investigate caching Device on TensorImpl. This will probably bring the
  tensor.get_device down to 2ns, but I'm not sure it's worth it.

before:
```
------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BM_TensorTypeId                           0 ns          0 ns 1000000000
BM_TensorType                             8 ns          8 ns   89407911
BM_TensorIsCuda                          24 ns         24 ns   29313017
BM_TensorIsSparse                        27 ns         27 ns   26083160
BM_TensorTypeIsCuda                      11 ns         11 ns   65128120
BM_TensorNumel                           11 ns         11 ns   68314492
BM_TensorGetDevice                       71 ns         71 ns    9633125
BM_DeviceGuardCtor                      173 ns        173 ns    4067173
BM_DeviceGuard                          232 ns        232 ns    3009690
```

after:
```
------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BM_TensorTypeId                           0 ns          0 ns 1000000000
BM_TensorType                            10 ns         10 ns   69803872
BM_TensorIsCuda                           2 ns          2 ns  321626683
BM_TensorIsSparse                         6 ns          6 ns  177045382
BM_TensorNumel                           12 ns         12 ns   58770533
BM_TensorGetDevice                        4 ns          4 ns  128113396
BM_DeviceGuardCtor                       52 ns         52 ns   14997278
BM_DeviceGuard                          158 ns        158 ns    5767248

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12841

Differential Revision: D10489353

Pulled By: zou3519

fbshipit-source-id: a596bc77352f21d5d35433c6de02c2f65aab5f9e
2018-10-25 19:57:52 -07:00
Thomas Viehmann
ba25e13782 Forbid Module.to with copy argument. (#12617)
Summary:
Module.to uses the Tensor.to parsing facility.
It should not, however, accept "copy" as a keyword/fourth positional
argument.

See #12571 for discussion.

Thank you SsnL for noticing.
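A small sketch of the resulting asymmetry (hypothetical usage, not from the commit): `Tensor.to` keeps its `copy=` keyword, while `Module.to` rejects it:

```python
import torch
import torch.nn as nn

t = torch.randn(2)
t2 = t.to(torch.float64, copy=True)   # fine: Tensor.to accepts copy=

m = nn.Linear(2, 2)
m.to(torch.float64)                   # fine: dtype-only move
# m.to(torch.float64, copy=True)      # rejected: Module.to does not accept copy
```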
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12617

Differential Revision: D10392053

Pulled By: ezyang

fbshipit-source-id: b67a5def7993189b4b47193abc7b741b7d07512c
2018-10-16 20:31:44 -07:00
Yangqing Jia
713e706618 Move exception to C10 (#12354)
Summary:
There are still a few work to be done:

- Move logging and unify AT_WARN with LOG(ERROR).
- A few header files are still being plumbed through, need cleaning.
- caffe2::EnforceNotMet aliasing is not done yet.
- need to unify the macros. See c10/util/Exception.h

This is mainly a codemod and does not cause functional changes. If you find your job failing and trace it back to this diff, it can usually be fixed by one of the following approaches:

(1) add //caffe2/c10:c10 to your dependency (or transitive dependency).
(2) change objects such as at::Error, at::Optional to the c10 namespace.
(3) change functions to the c10 namespace. Especially, caffe2::MakeString is not overridden by the unified c10::str function. Nothing else changes.

Please kindly consider not reverting this diff - it involves multiple rounds of rebasing and the fix is usually simple. Contact jiayq@ or AI Platform Dev for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12354

Reviewed By: orionr

Differential Revision: D10238910

Pulled By: Yangqing

fbshipit-source-id: 7794d5bf2797ab0ca6ebaccaa2f7ebbd50ff8f32
2018-10-15 13:33:18 -07:00
Thomas Viehmann
0cf3c1ce66 Add copy= keyword to Tensor.to (#12571)
Summary:
Fixes: #12454
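A minimal sketch of the `copy=` semantics added here: without it, `to` may return `self` when no conversion is needed; with `copy=True` it always allocates a new tensor:

```python
import torch

x = torch.randn(3)                    # already float32 on CPU
y = x.to(torch.float32)               # no conversion needed, may return x itself
z = x.to(torch.float32, copy=True)    # always a fresh copy
print(y is x, z is x)                 # True False
```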
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12571

Differential Revision: D10356994

Pulled By: SsnL

fbshipit-source-id: d87416078a5a8e5ffa690cd73c09fa6b4e16aa25
2018-10-12 02:10:44 -07:00
James Reed
2279299c6c Implement aten::contiguous (#12541)
Summary:
Implement contiguous as `aten::contiguous` so it can be recorded during tracing. This was causing issues both with the trace checker and when a `contiguous()`-ed tensor was used downstream in a view that expected certain strides.
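For illustration, a sketch of `contiguous()` being recorded when tracing (graph contents shown only approximately):

```python
import torch

def f(x):
    return x.t().contiguous().view(-1)   # contiguous() is now part of the trace

traced = torch.jit.trace(f, torch.randn(3, 4))
print(traced.graph)                      # the graph records the contiguous call
```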
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12541

Differential Revision: D10304028

Pulled By: jamesr66a

fbshipit-source-id: dc4c878771d052f5a0e9674f610fdec3c6782c41
2018-10-11 23:39:39 -07:00
Christian Puhrsch
a9e6a673ae Remove caffe2::Tensor::capacity_nbytes, at::Tensor::to##name##Data, (#11876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11876

Modern C++ API instead of macros; item() is aligned with the Python frontend. caffe2::Tensor::capacity_nbytes is effectively unused and confusing w.r.t. caffe2::Tensor::nbytes().

codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCByte   "item<uint8_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCLong   "item<int64_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCInt    "item<int32_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCDouble "item<double>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toByteData   "data<uint8_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toLongData   "data<int64_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toIntData    "data<int32_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toDoubleData "data<double>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toFloatData  "data<float>"

codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCByte   "item<uint8_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCLong   "item<int64_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCInt    "item<int32_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCDouble "item<double>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toByteData   "data<uint8_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toLongData   "data<int64_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toIntData    "data<int32_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toDoubleData "data<double>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toFloatData  "data<float>"

codemod -d caffe2 --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCComplexDouble "item<std::complex<double>>"

codemod -d tc           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

Reviewed By: ezyang

Differential Revision: D9948572

fbshipit-source-id: 70c9f5390d92b82c85fdd5f8a5aebca338ab413c
2018-09-24 10:40:10 -07:00
Adam Paszke
90e31f4896 Improve tracer warnings (#11545)
Summary:
Also, fix a performance bug in `ensureUnique`. Previously it formatted the warning string even though we weren't tracing, so all that work would *always* happen in the hot path and be for nothing.

A sample of what the new warnings look like:
```
tmp.py:4: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  int(x)
tmp.py:5: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  torch.tensor([1.])
tmp.py:6: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator add_. This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  torch.split(y, 2, dim=1)[0].add_(2)

```
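A reconstruction (under assumption, matching the tmp.py lines quoted above) of a script that triggers all three warnings:

```python
import torch

def fn(x, y):
    int(x)                               # tensor -> Python int: becomes a trace constant
    torch.tensor([1.])                   # registered as a constant in the trace
    torch.split(y, 2, dim=1)[0].add_(2)  # in-place op on a view with live references
    return y

# check_trace=False just to keep the example quiet about the in-place mutation
torch.jit.trace(fn, (torch.ones(1), torch.randn(4, 4)), check_trace=False)
```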
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11545

Differential Revision: D9782975

Pulled By: apaszke

fbshipit-source-id: 5b3abd31366e59c69e0b7ff278042b5563deb5a9
2018-09-11 22:10:32 -07:00
Adam Paszke
62c9d4ac96 Make .to() methods native functions (to fix JIT tracing)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11491

Differential Revision: D9771121

Pulled By: apaszke

fbshipit-source-id: 08d11101fb12093f8cf913b06359adddf3af9da7
2018-09-11 21:55:42 -07:00
Edward Yang
ac9268f25d Conversions to and from complex numbers. (#11420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11420

Surprisingly tricky!  Here are the major pieces:

- We grow an even more ludicrous macro
  AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF
  which does what it says on the tin.  This is because I was
  too lazy to figure out how to define the necessary conversions
  in and out of ComplexHalf without triggering ambiguity problems.
  It doesn't seem to be as simple as just Half.  Leave it for
  when someone actually wants this.

- Scalar now can hold std::complex<double>.  Internally, it is
  stored as double[2] because nvcc chokes on a non-POD type
  inside a union.

- overflow() checking is generalized to work with complex.
  When converting *to* std::complex<T>, all we need to do is check
  for overflow against T.  When converting *from* complex, we
  must check (1) if To is not complex, that imag() == 0
  and (2) for overflow componentwise.

- convert() is generalized to work with complex<->real conversions.
  Complex to real drops the imaginary component; we rely on
  overflow checking to tell if this actually loses fidelity. To get
  the specializations and overloads to work out, we introduce
  a new Converter class that actually is specializable.

- Complex scalars convert into Python complex numbers

- This probably fixes complex tensor printing, but there is no way
  to test this right now.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: cpuhrsch

Differential Revision: D9697878

Pulled By: ezyang

fbshipit-source-id: 181519e56bbab67ed1e5b49c691b873e124d7946
2018-09-08 16:39:43 -07:00
James Reed
03c06ec93d Traceable detach (#11038)
Summary:
This makes it so `detach` and `detach_` are traceable and also adds a pass to erase them before ONNX export
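A minimal sketch of detach now being traceable:

```python
import torch

def f(x):
    return x.detach() + 1

traced = torch.jit.trace(f, torch.randn(3))
print(traced.graph)   # detach is recorded as an ordinary node in the trace
```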
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11038

Differential Revision: D9588038

Pulled By: jamesr66a

fbshipit-source-id: 263dd3147e24fcb0c716743f37fdb9f84c0015e7
2018-08-31 16:40:42 -07:00
Adam Paszke
780d2792c5 Warn about non-traceable behavior when tracing (#11088)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11088

Differential Revision: D9585527

Pulled By: apaszke

fbshipit-source-id: 29a03cb152d83b626f748fff4501ac9e139994c2
2018-08-31 14:27:00 -07:00
Edward Yang
750ede7215 Rename getType to getVariableTypeFromBaseType / getVariableType (#11095)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11095

We used getType to mean a lot of things.

- getVariableTypeFromBaseType: given a base Type (non-Variable type)
  compute the Variable Type which corresponds to it.

- getVariableType: like at::getType, but return the Variable type
  rather than the plain type.

This rename makes it clearer at the use-site what things are what,
and will make a subsequent rename of at::getType easier.

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9583630

fbshipit-source-id: 2667ec98e7607bc466920c7415a8c651fd56dfca
2018-08-30 20:11:25 -07:00
Edward Yang
19031c68dc Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage (#10488)
Summary:
```
Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage

This patch does two major changes:

- It replaces the use of Retainable in Storage with a new implementation
  based on intrusive_ptr.  This will be necessary because Caffe2 will
  be using this class to implement intrusive_ptrs, and we need to
  line these up for the merge.  One good thing about the new implementation is
  that the default copy/move constructors/assignment operators and destructor
  work automatically, instead of needing to be hardcoded into Storage/Tensor.

- It replaces all places where we returned std::unique_ptr<Storage> with
  Storage, collapsing an unnecessary double indirection that is no longer
  necessary now that we have correctly working copy/move constructors.

I didn't initially want to do step (2), but it was very important to
eliminate all bare uses of new Storage and new StorageImpl, and this making
the API change was the most straightforward way to do this.

HOW TO FIX YOUR CODE IN THE NEW API

- You no longer need to dereference the result of tensor.storage() to pass
  it to set.  So, instead of:

      x.set_(*y.storage());

  just write:

      x.set_(y.storage());

- If you were accessing methods on StorageImpl via the pImpl() method, you
  must use the dot operator to run pImpl().  Even better; just drop pImpl,
  we now have method forwarding.  So, instead of:

      storage->pImpl()->data();

  just do:

      storage->data();
      // storage.pImpl()->data() works too but is not as recommended

- storage->getDevice() is no more; instead use storage->device().index()

MISC CODE UPDATES

- retain, release, weak_retain, weak_release and weak_lock are now
  reimplemented using the "blessed API", and renamed to make it
  clearer that their use is discouraged.

- nvcc OS X and general OS X portability improvements to intrusive_ptr

- A new comment in intrusive_ptr describing how stack allocated
  intrusive_ptr_targets work differently than heap allocated ones
  from c10::make_intrusive

CAVEAT EMPTOR

- THStorage_weakRetain used to work on strong pointers, but it NO LONGER
  works with intrusive_ptr.  You must reclaim the strong pointer into a
  real strong pointer, construct a weak pointer from it, and then release
  the strong and weak pointers.  See StorageSharing.cpp for an example.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10488

Reviewed By: gchanan

Differential Revision: D9306134

Pulled By: ezyang

fbshipit-source-id: 02d58ef62dab8e4da6131e1a24834a65c21048e2
2018-08-21 21:39:55 -07:00
Edward Yang
6bdbad93b9 Refactor Device to not depend on Backend. (#10478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10478

- Removed Backend constructor from Device, and fixed all
  use-sites to use DeviceType::CPU instead of kCPU, or
  use a new function backendToDeviceType to perform
  the conversion.
- New method device_type() on Type; it gives you the
  underlying device type, e.g., CPU for SparseCPU.
- We add backward compatibility for kCPU/kCUDA uses,
  by introducing a new special type which is implicitly
  convertible to both DeviceType and Backend.  As long as
  you don't define a function that's overloaded on both
  DeviceType and Backend (but not on BackendOrDeviceType),
  the implicit conversions will ensure that uses
  of at::Device(at::kCPU) keep working. We fixed use-sites in
  the library, but did NOT fix sites in the test code, so that
  we can exercise this BC code.

Reviewed By: Yangqing

Differential Revision: D9301861

fbshipit-source-id: 9a9d88620500715c7b37e655b4fd761f6dd72716
2018-08-18 17:39:14 -07:00
Sebastian Messmer
f51f15bb27 Update include paths for ATen/core (#10130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10130

Update some include paths to make them internally consistent

Reviewed By: ezyang

Differential Revision: D9119906

fbshipit-source-id: b44e5cab8e8e795ee18afe9ffc6caf1f2b413467
2018-08-03 11:57:02 -07:00
Wanchao Liang
47c1badf90 Fix the clamp special case and gradient problem on None, add None to JIT (#9596)
Summary:
Supersedes #8925

This PR fixes #8502: it fixes the gradient problem for clamp when None is passed to the function, and adds support for NoneLiteral and NoneType in script to enable clamp tests. Now we could have corner cases like:

```python
@torch.jit.script
def func():
    x = torch.randn(3, 3, requires_grad=True)
    y = torch.clamp(x, None, 0) # max = 0
    y = torch.clamp(x, min=None, max=0)
```

In both JIT and ATen, we use Scalar(NAN) as a sentinel value when None is passed to clamp; this is how we currently support the None type in JIT and solve the gradient problem when the user explicitly passes None into clamp.

On the JIT side, we create a tensor(NAN) and an undefined tensor if we encounter None when matching the function schema; later, in the interpreter, it is translated to Scalar(NAN) if needed.

Ideally we wouldn't need clamp_min and clamp_max in ATen native/autograd and could support only clamp after this change, but since a bunch of other operators (e.g. Activation.cpp, Loss.cpp) use clamp_min in several places, those functions remain available; all Python invocations, however, will only call clamp instead of clamp_min/max (with clamp calling the underlying th_max/th_min).

zdevito jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9596

Reviewed By: zdevito

Differential Revision: D8940839

Pulled By: wanchaol

fbshipit-source-id: c543a867b82e0ab8c99384773b173fdde2605d28
2018-07-27 22:54:33 -07:00
Adam Paszke
aa7af94656 Make JIT tracing a thread-local property (#9414)
Summary:
As in the title. Lets us simplify a lot of code.

Depends on #9363, so please review only the last commit.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9414

Reviewed By: zdevito

Differential Revision: D8836496

Pulled By: apaszke

fbshipit-source-id: 9b3c3d1f001a9dc522f8478abc005b6b86cfa3e3
2018-07-19 19:09:39 -07:00
Peter Goldsborough
372d1d6735
Create ATen tensors via TensorOptions (#7869)
* Created TensorOptions

Storing the type in TensorOptions to solve the Variable problem

Created convenience creation functions for TensorOptions and added tests

Converted zeros to TensorOptions

Converted rand to TensorOptions

Fix codegen for TensorOptions and multiple arguments

Put TensorOptions convenience functions into torch namespace too

All factory functions except *_like support TensorOptions

Integrated with recent JIT changes

Support *_like functions

Fix in place modification

Some cleanups and fixes

Support sparse_coo_tensor

Fix bug in Type.cpp

Fix .empty calls in C++ API

Fix bug in Type.cpp

Trying to fix device placement

Make AutoGPU CPU compatible

Remove some auto_gpu.h uses

Fixing some headers

Fix some remaining CUDA/AutoGPU issues

Fix some AutoGPU uses

Fixes to dispatch_tensor_conversion

Reset version of new variables to zero

Implemented parsing device strings

Random fixes to tests

Self review cleanups

flake8

Undo changes to variable.{h,cpp} because they fail on gcc7.2

Add [cuda] tag to tensor_options_cuda.cpp

Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks

Fix linker error in AutoGPU.cpp

Fix bad merge conflict in native_functions.yaml

Fixed caffe2/contrib/aten

Fix new window functions added to TensorFactories.cpp

* Removed torch::TensorOptions

Added code to generate wrapper functions for factory methods

Add implicit constructor from Backend to TensorOptions

Remove Var() from C++ API and use torch:: functions

Use torch:: functions more subtly in C++ API

Make AutoGPU::set_device more exception safe

Check status directly in DynamicCUDAHooksInterface

Rename AutoGPU to DeviceGuard

Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad

remove python_default_init: self.type()

Add back original factory functions, but with deprecation warnings

Disable DeviceGuard for a couple functions in ATen

Remove print statement

Fix DeviceGuard construction from undefined tensor

Fixing CUDA device compiler issues

Moved as many methods as possible into header files

Dont generate python functions for deprecated factories

Remove merge conflict artefact

Fix tensor_options_cuda.cpp

Fix set_requires_grad not being checked

Fix tensor_new.h

TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac

Fix bug in DeviceGuard.h

Missing includes

TEMPORARILY moving a few more methods into .cpp to see if it fixes windows

Fixing linker errors

* Fix up SummaryOps to use new factories

Undo device agnostic behavior of DeviceGuard

Use -1 instead of optional for default device index

Also move DeviceGuard methods into header

Fixes around device index after optional -> int32_t switch

Fix use of DeviceGuard in new_with_tensor_copy

Fix tensor_options.cpp

* Fix Type::copy(

* Remove test_non_float_params from ONNX tests

* Set requires_grad=False in ONNX tests that use ints

* Put layout/dtype/device on Tensor

* Post merge fixes

* Change behavior of DeviceGuard to match AutoGPU

* Fix C++ API integration tests

* Fix flip functions
2018-06-16 00:40:35 -07:00
Soumith Chintala
dc186cc9fe
Remove NO_* and WITH_* across codebase, except in setup.py (#8555)
* remove legacy options from CMakeLists

* codemod WITH_ to USE_ for WITH_CUDA, WITH_CUDNN, WITH_DISTRIBUTED, WITH_DISTRIBUTED_MW, WITH_GLOO_IBVERBS, WITH_NCCL, WITH_ROCM, WITH_NUMPY

* cover SYSTEM_NCCL, MKLDNN, NNPACK, C10D, NINJA

* removed NO_* variables and hotpatch them only in setup.py

* fix lint
2018-06-15 12:29:48 -04:00
Tongzhou Wang
c0a419e6ba
Add non_blocking to Tensor/Module.to (#7312)
* Add non_blocking to Tensor/Module.to

* flake8

* Add argparse tests

* cpp parse

* Use C++ parser

* use a commong parse function with Tensor.to

* fix test_jit

* use THPObjectPtr

* increase refcount for None, True, and False

* address comments

* address comments
2018-06-04 18:46:52 -04:00
Sam Gross
6c7a8318c4
Fix Tensor.type(dtype) not preserving device (#7474)
Note that Tensor.cuda() will still copy the tensor to the current device
if it's a CUDA tensor on a different device.

Fixes #7441
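A small sketch of the fixed behavior, assuming a CUDA device is available (the effect is most visible when the tensor lives on a non-default GPU):

```python
import torch

if torch.cuda.is_available():
    t = torch.randn(2, device="cuda:0")
    u = t.type(torch.float64)   # dtype changes, device is preserved
    print(u.dtype, u.device)    # torch.float64 cuda:0
```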
2018-05-10 18:22:13 -04:00
Adam Paszke
0829d4502d
Trace size-dependent expressions correctly (#6554)
This makes the JIT tracer much more robust, by allowing it to record
dependencies on tensor sizes. For example, if you were to trace this
function

def fn(x):
    return x.view(x.size(1), -1)

before this patch, then it would embed the actual value of x.size(1)
in the trace as a constant, making it very hard to have e.g. batch size
independent traces. Now, this will correctly record the dependency, and
will retrieve the size of x at every run.
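A sketch of the resulting behavior: the traced graph reads the size at run time instead of baking in a constant, so the view generalizes to other batch sizes:

```python
import torch

def fn(x):
    return x.view(x.size(1), -1)

traced = torch.jit.trace(fn, torch.randn(2, 3, 4))
print(traced(torch.randn(5, 3, 4)).shape)   # torch.Size([3, 20]), not hard-coded to the traced input
```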
2018-05-04 10:55:39 +02:00
Thomas Viehmann
8fbab83c2a only Tensors of floating point dtype can require gradients (see #7021) (#7034) 2018-04-30 10:20:00 +02:00
gchanan
d0b0edf27a
Add a requires_grad_() function to tensors. (#6771) 2018-04-19 13:47:24 -04:00
Tongzhou Wang
892be8b779
Make dtype in .to positional rather than kwarg only (#6628) 2018-04-16 14:03:40 -04:00
gchanan
46374ad5c8
Add tensor.to(device) method. (#6588)
* Add tensor.on(device) and tensor.on_device_as(tensor) methods.

* Rename {'on', 'on_device_as'} -> 'to'.

* Fix test ordinal.

* Fix device ordinal again.
2018-04-16 10:50:34 -04:00
gchanan
749d51414a
Separate cuda-ness from dtype. (#6470)
* Separate cuda-ness from dtype.

There are no longer torch.cuda.int64, etc; only torch.int64 that correspond to at::ScalarType.
At the python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device).

There is also currently unused code in here for supporting ScalarType in native_functions; this will be used for specifying aggregate types
on reduction functions.
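A minimal sketch of the split in terms of the resulting API: the dtype no longer encodes CPU vs CUDA, only the scalar type:

```python
import torch

t = torch.zeros(2, dtype=torch.int64)       # CPU tensor
print(t.dtype)                              # torch.int64
if torch.cuda.is_available():
    c = torch.zeros(2, dtype=torch.int64, device="cuda")
    print(c.dtype == t.dtype)               # True: same dtype, different device
```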

* Fix test_autograd.

* Add defaults to randint_like.

* Track is_cuda in py tensor types.

* Fix test_sparse.

* Fix multiprocessing.

* Fix rnn.

* Fix test_nn.

* Fix flake8.
2018-04-12 14:05:44 -04:00
gchanan
87e369111a
Add string-style devices to all tensors. (#6283)
* Add string-style devices to all tensors.

Previously, tensors only had a 'get_device' method which would throw an exception on a CPU tensor. This made it necessary to add if/else branches to code that was meant to be device agnostic.

This PR implements the following:
1) Adds a 'device' property to all tensors that returns a string representation of the device for all tensors.
For cpu tensors this is 'cpu'.  For cuda tensors this is 'cuda:X', where X is the cuda device ordinal.

2) Adds a DeviceSpec class.  This is just a helper class for separating device_type and device_index specification and to allow partial specification.
For example, you can call DeviceSpec('cuda'), DeviceSpec('cuda:0'), DeviceSpec('cuda', 1).
Also has backwards compatibility support for specifying integers, which are treated as cuda devices.

DeviceSpecs have the following properties:
a) device_type: string representation of the device type (i.e. 'cpu' or 'cuda')
b) device_index: integer for the device index (None if not specified)
c) cuda_device_index: for backwards compatibility; behaves roughly like `get_device` did previously.  I.e. if a function previously took integers for cuda devices,
it can now take DeviceSpecs (or strings), and can maintain the old functionality by calling `old_index = DeviceSpec(old).cuda_device_index`.

3) tensor methods and torch. functions that took integer devices can now take integers, strings, or DeviceSpecs.  For example:
torch.randn((2,3), dtype=torch.cuda.float32, device='cuda:1')

TODO in future PRs:
A) Split out cuda from dtype so you don't need to overspecify cuda-ness
B) We currently only support strings/DeviceSpecs in tensor methods and torch. functions.  We should have equivalents torch.cuda.device(...), torch.cuda.device_of, etc.
at the torch. level that work on strings/DeviceSpecs
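A small sketch in terms of the final API (DeviceSpec was later renamed to torch.device, as noted below):

```python
import torch

d = torch.device("cuda", 1)    # equivalent to torch.device("cuda:1")
print(d.type, d.index)         # cuda 1

t = torch.zeros(2, 3)
print(t.device)                # cpu: every tensor now reports a device
```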

* Add deviceInt64 to python arg parser.

* device_str.

* Remove device_str.

* remove device prefix from attributes.

* Use const char * instead of string.

* Move autogpu index out of Device.

* comment on is_default.

* Rename torch.DeviceSpec to torch.device.

* comment.

* Fix tests.

* Fix flake8.

* Fix sparse_coo_tensor parameter name.

* Improve error message.

* Remove device_ prefix from C++ device object.

* Allocate static strings.

* Return not implemented from rich compare.

* Move torch::Device to THPDevice.

* Remove cuda index.

* Py_RETURN_NOTIMPLEMENTED doesn't exist in python2.
2018-04-06 15:12:05 -04:00
Sam Gross
6b3a4637d6
Make the tensor type torch.Tensor instead of torch.autograd.Variable (#5785)
This changes type(tensor) to return `torch.Tensor` instead of
`torch.autograd.Variable`.

This requires a few implementation changes:

 - torch.Tensor is now a regular Python class instead of a
   pseudo-factory like torch.FloatTensor/torch.DoubleTensor
 - torch.autograd.Variable is just a shell with a __new__ function.
   Since no instances are constructed it doesn't have any methods.
 - Adds torch.get_default_dtype() since torch.Tensor.dtype returns
   <attribute 'dtype' of 'torch._C._TensorBase' objects>
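A minimal sketch of the user-visible change:

```python
import torch

t = torch.zeros(3)
print(type(t))                       # <class 'torch.Tensor'>
print(isinstance(t, torch.Tensor))   # True
print(torch.get_default_dtype())     # torch.float32
```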
2018-04-03 16:29:25 -04:00
gchanan
4c81282c33
Introduce torch.layout and split layout from dtypes. (#6145)
* Introduce torch.layout and split layout from dtypes.

Tensors (and tensor types) now have a 'layout' attribute that returns either 'torch.strided' or 'torch.sparse_coo'.

Previously, dtypes were 1-to-1 with ATen types/PyTensorTypes; the impetus behind this decision was to make things easy in the common case
(i.e. specifying a type in a factory function).  But this doesn't really follow for sparsity, which isn't a common case.

It also doesn't properly represent the concept of a dtype, which in numpy is a proper scalar type (i.e. roughly the type returned from indexing the
last dimension of an n-d array).  But this should be the same whether or not the tensor is represented via strides, sparsity, etc.

This is accomplished by:
1) having the dtype of tensor return the (device-type, scalar-type) combination, i.e. torch.cuda.float32, so both
   torch.cuda.FloatTensor and torch.cuda.sparse.FloatTensor have the same dtype
2) Adding a layout parameter to python functions, where the combination of (dtype, layout) maps to an ATen type that is used for dispatch.
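A small sketch of the resulting attribute, written against today's API (where cuda-ness was later split out of the dtype as well):

```python
import torch

dense = torch.zeros(2, 2)
sparse = torch.zeros(2, 2, layout=torch.sparse_coo)
print(dense.layout, sparse.layout)   # torch.strided torch.sparse_coo
print(dense.dtype == sparse.dtype)   # True: layout is independent of dtype
```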

* Formatting, make init throw python_error.

* Fix cuda not enabled error message.

* Fix test.
2018-04-02 14:07:50 -04:00
gchanan
c474136ee1
[REDO] Add torch.sparse_coo_tensor factory. (#5781)
* Add torch.sparse_coo_tensor factory.

Notes:
1) I didn't add Tensor.new_sparse_coo_tensor; it didn't seem particularly useful, but it's easy to add
2) This doesn't do the type inference, i.e. torch.sparse_coo_tensor(indices=LongTensor, values=IntTensor)
will return a sparse tensor corresponding to the default type rather than a sparse IntTensor.  We can add
type inference later when we add it to other factories.

* Fix merge.

* Use type_conversion function from python_variable_methods.
2018-03-16 13:58:02 -04:00
James Reed
55af142b44 Traceable dispatch for cast methods (#5629)
Previously, methods like int() and long() would fail tracing because they eventually dispatch down to toType, which takes a Type as a parameter. We don't (currently) support tracing ops with Type inputs[0], so this PR adds specializations for the ATen scalar types and dispatches to those directly. These specialized ops can be traced into the IR without needing a Type argument.

A more long-term solution would be to add support for Types in the IR.
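A minimal sketch of the cast methods now tracing (the exact node names in the graph are an implementation detail):

```python
import torch

def f(x):
    return x.int().float() + 1

traced = torch.jit.trace(f, torch.randn(3))
print(traced.graph)   # the int()/float() casts appear as nodes instead of failing
```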

* Traceable dispatch for Variable cast methods

* Add ONNX symbolics

* Fix test

* Fix cross-backend copy issue

* Prepend underscores to cast identifiers

* Metaprogram symbolics

* clang-format

* stupid lint

* Add comments for all code fragments
2018-03-12 19:01:14 -04:00
gchanan
ae0c04c773
Add torch.empty, torch.full and new_ size Tensor factory methods. (#5668)
* Add torch.empty, torch.full and new_ size Tensor factory methods.

This adds torch.full, torch.empty equivalents of np.full, np.empty.
In addition, this adds size-based Tensor factory methods new_empty, new_ones, new_full, new_zeros,
which is meant to complete the separation of the legacy "new" method into data-based and size-based
functions.

This also fixes an issue in sparse zeros_like when the dtype didn't match the argument dtype.
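A small usage sketch of the new factories and size-based new_* methods:

```python
import torch

print(torch.full((2, 3), 7.0))            # like np.full
print(torch.empty(2, 3).shape)            # uninitialized storage, like np.empty

base = torch.zeros(2, dtype=torch.float64)
print(base.new_full((2, 2), 1.0).dtype)   # torch.float64: new_* inherits dtype/device
print(base.new_zeros(3).shape)            # torch.Size([3])
```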

* Get rid of unnecessary zero in sparse tensor zeros_like.

* Fix test if only 1 cuda device.
2018-03-09 15:29:29 -05:00