pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-08 07:39:33 +01:00

Author	SHA1	Message	Date
David Reiss	e75fb4356b	Remove (most) Python 2 support from Python code (#35615 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615 Python 2 has reached end-of-life and is no longer supported by PyTorch. Now we can clean up a lot of cruft that we put in place to support it. These changes were all done manually, and I skipped anything that seemed like it would take more than a few seconds, so I think it makes sense to review it manually as well (though using side-by-side view and ignoring whitespace change might be helpful). Test Plan: CI Differential Revision: D20842886 Pulled By: dreiss fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed	2020-04-22 09:23:14 -07:00
Brian Wignall	f326045b37	Fix typos, via a Levenshtein-type corrector (#31523 ) Summary: Should be non-semantic. Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking. Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 . Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523 Differential Revision: D19216749 Pulled By: mrshenli fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea	2020-01-17 16:03:19 -08:00
Brian Wignall	e7fe64f6a6	Fix typos (#30606 ) Summary: Should be non-semantic. Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos. Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606 Differential Revision: D18763028 Pulled By: mrshenli fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c	2019-12-02 20:17:42 -08:00
Richard Zou	277d442d18	Rename torch.namedtensor -> torch._namedtensor_internals (#26349 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26349 The directory holds a lot of private helper functions that help implement named tensor functionality. Instead of naming each helper function with a leading underscore, I change the name of the import to `_namedtensor_internals` to signal it should not be used directly. Test Plan: - [namedtensor ci] Differential Revision: D17424178 Pulled By: zou3519 fbshipit-source-id: 8f7b74346765759303480e581038a661021acf53	2019-09-18 05:47:09 -07:00
Richard Zou	2513ca66ca	Add guards for using named tensor with serialization and multiprocessing (#25345 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25345 Test Plan - New tests [namedtensor ci] Test Plan: Imported from OSS Differential Revision: D17101486 Pulled By: zou3519 fbshipit-source-id: 58e803b042056ee6abab8551517f74078f2b81d5	2019-08-29 14:10:33 -07:00
SsnL	eb756746ab	Fix possible deadlock in SharedCache inside a forked child proc (#25158 ) Summary: Related: https://github.com/pytorch/pytorch/issues/24927#issuecomment-524608021 `fork` inherits lock state. So if we happen to unfortunately fork when the `SharedCache` lock is held. We could deadlock in the child process when some code tries to acquire it. Following pytorch multiprocessing library design, this patch resets the lock to a new object after fork. A similar example from python core lib for `multiprocessing.Queue` is : ```py class Queue(object): def __init__(self, ...): ... self._after_fork() if sys.platform != 'win32': register_after_fork(self, Queue._after_fork) def _after_fork(self): debug('Queue._after_fork()') self._notempty = threading.Condition(threading.Lock()) self._buffer = collections.deque() self._thread = None self._jointhread = None self._joincancelled = False self._closed = False self._close = None self._send_bytes = self._writer.send_bytes self._recv_bytes = self._reader.recv_bytes self._poll = self._reader.poll ``` `d4d60134b2/Lib/multiprocessing/queues.py (L54-L78)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/25158 Differential Revision: D17091227 Pulled By: soumith fbshipit-source-id: ee7130f47d7bbd42fc34a2598f1f6974d8d7cdb7	2019-08-28 13:34:03 -07:00
Tongzhou Wang	bc6281028c	rebuild_storage_fd retry on EINTR (#21723 ) Summary: Some data loader tests are flaky on py 2 with the following error ``` Jun 12 22:17:31 Traceback (most recent call last): Jun 12 22:17:31 File "test_dataloader.py", line 798, in test_iterable_dataset Jun 12 22:17:31 fetched = sorted([d.item() for d in dataloader_iter]) Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 697, in __next__ Jun 12 22:17:31 idx, data = self._get_data() Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 664, in _get_data Jun 12 22:17:31 success, data = self._try_get_data() Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 617, in _try_get_data Jun 12 22:17:31 data = self.data_queue.get(timeout=timeout) Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/multiprocessing/queues.py", line 135, in get Jun 12 22:17:31 res = self._recv() Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 22, in recv Jun 12 22:17:31 return pickle.loads(buf) Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1382, in loads Jun 12 22:17:31 return Unpickler(file).load() Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 858, in load Jun 12 22:17:31 dispatch[key](self) Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1133, in load_reduce Jun 12 22:17:31 value = func(*args) Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/reductions.py", line 274, in rebuild_storage_fd Jun 12 22:17:31 fd = multiprocessing.reduction.rebuild_handle(df) Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 157, in rebuild_handle Jun 12 22:17:31 new_handle = recv_handle(conn) Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 83, in recv_handle Jun 12 22:17:31 return _multiprocessing.recvfd(conn.fileno()) Jun 12 22:17:31 OSError: [Errno 4] Interrupted system call ``` Apparently, Python 2.7's `recvfd` calls `recvmsg` without EINTR retry: https://github.com/python/cpython/blob/2.7/Modules/_multiprocessing/multiprocessing.c#L174 So we should call it with an outer try-catch loop. Pull Request resolved: https://github.com/pytorch/pytorch/pull/21723 Differential Revision: D15806247 Pulled By: ezyang fbshipit-source-id: 16cb661cc0fb418fd37353a1fef7ceeb634f02b7	2019-06-14 09:10:00 -07:00
Soumith Chintala	2e029db2f9	fixes multiprocessing serialization for integer nn.Parameter (#18639 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/17345 Pull Request resolved: https://github.com/pytorch/pytorch/pull/18639 Differential Revision: D14711565 Pulled By: soumith fbshipit-source-id: 0063ed138a215b95d6571dcd68b18569714abe19	2019-04-01 17:15:42 -07:00
Edward Yang	173f224570	Turn on F401: Unused import warning. (#18598 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598 ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a Stack from [ghstack](https://github.com/ezyang/ghstack): * #18598 Turn on F401: Unused import warning. This was requested by someone at Facebook; this lint is turned on for Facebook by default. "Sure, why not." I had to noqa a number of imports in __init__. Hypothetically we're supposed to use __all__ in this case, but I was too lazy to fix it. Left for future work. Be careful! flake8-2 and flake8-3 behave differently with respect to import resolution for # type: comments. flake8-3 will report an import unused; flake8-2 will not. For now, I just noqa'd all these sites. All the changes were done by hand. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D14687478 fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3	2019-03-30 09:01:17 -07:00
Vitaly Fedyunin	5653a914f7	Implement reference counting for shared IPC CUDA tensors (#16854 ) Summary: This is to fix #16141 and similar issues. The idea is to track a reference to every shared CUDA Storage and deallocate memory only after a consumer process deallocates received Storage. ezyang Done with cleanup. Same (insignificantly better) performance as in file-per-share solution, but handles millions of shared tensors easily. Note [ ] documentation in progress. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16854 Differential Revision: D13994490 Pulled By: VitalyFedyunin fbshipit-source-id: 565148ec3ac4fafb32d37fde0486b325bed6fbd1	2019-03-25 10:24:38 -07:00
hysts	cbefd0323b	Fix typo Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17521 Differential Revision: D14237482 Pulled By: soumith fbshipit-source-id: 636e0fbe2c667d15fcb649136a65ae64937fa0cb	2019-02-26 20:23:34 -08:00
Shen Li	24f4d3987e	Move all Stream and Event Python implementation to C++ (#15937 ) Summary: 1. Added `torch/csrc/cuda/Event.h` and `torch/csrc/cuda/Event.cpp` to bind Python Event class to C++ implementation. 2. Move all CUDA runtime invocations from `torch/cuda/streams.py` to C++ 3. Added tests to cover Stream and Event APIs. ~(event IPC handle tests is introduced in #15974)~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/15937 Differential Revision: D13649001 Pulled By: mrshenli fbshipit-source-id: 84ca58f35f6ba679a4ba33150ceba678d760d240	2019-01-17 07:29:22 -08:00
Ailing Zhang	be47470c91	Fix cuda multiprocessing cached memory (#14736 ) Summary: This PR fixes #11422 In the old world of CUDA IPC, when we want to share a tensor T from A to B, we have to share the whole CUDA mem allocation where T's storage sit in. And we casted it to the same type of storage of T's. This causes problem when two different types of storage got allocated to the same CUDA mem block. When we try to reconstruct the second tensor, it will complain about wrong storage type. In this PR we reconstruct the storage only (not the entire mem block). However, CUDA only allows one open memHandle once per process, we have to save the device pointer in a global cache so that we can reconstruct tensors as they come. Thanks a ton to ezyang who helped design the solution and debugged the issue! Pull Request resolved: https://github.com/pytorch/pytorch/pull/14736 Differential Revision: D13335899 Pulled By: ailzhang fbshipit-source-id: cad69db392ed6f8fdc2b93a9dc2899f6d378c371	2018-12-05 10:55:43 -08:00
Edward Yang	3bfa7258b3	Don't serialize hooks (#11705 ) Summary: Fixes #11683. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/11705 Differential Revision: D9833057 Pulled By: ezyang fbshipit-source-id: 18af9bcd77b088326738d567100fbe4a4c869dd6	2018-10-16 20:11:03 -07:00
Sam Gross	0b63d12db6	Don't call into Python during Storage destruction. (#10407 ) Summary: ``` This removes PyObjectFinalizer. We were seeing SIGSEGV at exit in some programs that use multiprocessing. The backtrace pointed to StorageRef.__del__ being called from subtype_dealloc. My guess is that the Python interpreter was shutdown before all C++ Storage objects were deallocated. Deallocating the C++ Storage called the finalizer which called back into Python after it was no longer safe to do so. This avoids a callback from C++ into Python during Storage finalization. Instead, dead Storage objects (expired weak references) are collected periodically when shared_cache exceeds a limit. The limit is scaled with 2x the number of live references, which places an upper bound on the amount of extra memory held by dead Storage objects. In practice, this should be very small. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/10407 Differential Revision: D9272400 Pulled By: colesbury fbshipit-source-id: ecb14d9c6d54ffc91e134c34a4e770a4d09048a2	2018-08-13 11:20:07 -07:00
Edward Yang	674f7a9778	Correctly share CUDA Parameters. (#10220 ) Summary: ``` Correctly share CUDA Parameters, requires_grad and hooks. Previously, the following was true: - If you put a Parameter for a CUDA tensor in multiprocessing queue (or otherwise tried to transfer it), this failed, saying that we cannot pickle CUDA storage. This is issue #9996. - If you put a leaf Tensor that requires_grad=True through the multiprocessing queue, it would come out the other end as requires_grad=False (It should have come out the other end as requires_grad=True). Similarly, backwards hooks were lost. - If you put a non-leaf Tensor that requires_grad=True through the multiprocessing queue, it would come out the other end as requires_grad=False. The root cause for the first issue was that implementation of reductions for Parameter used the superclass implementation (tensor) in __reduce_ex__, but this always picks up the non-ForkingPickler reduction, which doesn't work with CUDA tensors. So, we registered a new ForkingPickler specifically for Parameter, and adjusted the code to correctly rewrap a Tensor in a Parameter if it was originally a parameter. While working on this, we realized that requires_grad and backwards hooks would not be preserved in the ForkingPickler reduction implementation. We fixed the reducer to save these parameters. However, Adam Paszke pointed out that we shouldn't allow sending requires_grad=True, non-leaf Tensors over a multiprocessing queue, since we don't actually support autograd over process boundar. We now throw an error in this case; this may cause previously working code to fail, but this is easy enough to fix; just detach() the tensor before sending it. The error message says so. Fixes #9996. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/10220 Differential Revision: D9160746 Pulled By: ezyang fbshipit-source-id: a39c0dbc012ba5afc7a9e646da5c7f325b3cf05c	2018-08-10 13:54:56 -07:00
Edward Yang	976f9253a5	Eliminate storage views. (#9466 ) Summary: Storage views were previously used to implement CUDA IPC sharing, but they weren't necessary. The new strategy is described in Note [CUDA IPC and the caching allocator]. This also fixes an unrelated bug, where we weren't actually using the Tensor forking pickler, because we didn't register a pickler for torch.Tensor. Fixes #9447. Fixes #46. Signed-off-by: Edward Z. Yang <ezyang@fb.com> CC apaszke Pull Request resolved: https://github.com/pytorch/pytorch/pull/9466 Reviewed By: apaszke Differential Revision: D8859698 Pulled By: ezyang fbshipit-source-id: 3362cb92f6ae4aa37084c57d79b31004bd0b4a97	2018-07-16 15:40:24 -07:00
Edward Yang	d0d1820814	Add weak pointer and finalizer support directly to THStorage. (#9148 ) Summary: The underlying use-case is the file descriptor to storage cache in torch.multiprocessing.reductions. Previously, this was implemented by wrapping an existing allocator with a "weak ref" allocator which also knew to null out the weak reference when the storage died. This is terribly oblique, and prevents us from refactoring the allocators to get rid of per-storage allocator state. So instead of going through this fiasco, we instead directly implement weak pointers and finalizers in THStorage. Weak pointers to THStorage retain the THStorage struct, but not the data_ptr. When all strong references die, data_ptr dies and the finalizers get invoked. There is one major hazard in this patch, which is what happens if you repeatedly call _weak_ref on a storage. For cleanliness, we no longer shove our grubby fingers into the finalizer struct to see if there is already a Python object for the weak reference and return it; we just create a new one (no one is checking these Python objects for identity). This means if you keep calling it, we'll keep piling on finalizers. That's bad! But I am not going to fix it until it is actually a problem for someone, because then we need to add another caching layer. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/9148 Differential Revision: D8729106 Pulled By: ezyang fbshipit-source-id: 69710ca3b7c7e05069090e1b263f8b6b9f1cf72f	2018-07-10 06:25:33 -07:00
Richard Zou	e831ad6204	Fix sharing of empty tensor in multiprocessing (#6229 ) Fixes #5719 Previously, the following would error out with an "Invalid file descriptor" error: ``` import torch import torch.multiprocessing as mp q = mp.Queue() t = torch.tensor([]) q.put(t) ``` on some OSes. The problem was that because one cannot mmap data of size 0, and that an empty tensor has a storage of size 0, the file descriptor for the storage (referencing shared memory) was not being set. The multiprocessing sharing code then calls DupFD on that uninitialized file descriptor, leading to an error. This PR special cases sharing an empty tensor on the CPU. CUDA does not have this problem. Unit tests for both cpu and cuda empty tensors	2018-04-03 11:49:40 -04:00
Adam Lerer	e71cf20192	improved serialization (no tar copy) (#713 )	2017-02-22 22:24:20 +01:00
Sam Gross	bd5303010d	Refactor autograd package to separate Python dependencies. (#662 ) The core autograd Variable, Function, and Engine no longer depend on the Python API. This let's us implement functions in C++. In the future, we can also multithread engine and release the GIL for most of the non-Python backwards.	2017-02-13 16:00:16 -08:00
Luke Yeager	3ed720079e	[pep8] Fix most remaining lint manually	2017-01-28 01:15:51 +01:00
Luke Yeager	e7c1e6a8e3	[pep8] Fix most lint automatically with autopep8 Here's the command I used to invoke autopep8 (in parallel!): git ls-files \| grep '\.py$' \| xargs -n1 -P`nproc` autopep8 -i Several rules are ignored in setup.cfg. The goal is to let autopep8 handle everything which it can handle safely, and to disable any rules which are tricky or controversial to address. We may want to come back and re-enable some of these rules later, but I'm trying to make this patch as safe as possible. Also configures flake8 to match pep8's behavior. Also configures TravisCI to check the whole project for lint.	2017-01-28 01:15:51 +01:00
Sam Gross	0c69fd559a	Fix CUDA sharing across processes (#530 )	2017-01-20 18:28:39 -05:00
Adam Paszke	f908432eb3	Ensure that Variable's grad is shared between processes	2016-12-31 16:25:39 -05:00
Sam Gross	24af02154c	Use ForkingPickler for sharing tensor/storages across processes (#344 ) This hooks into the (internal) ForkingPickler class in multiprocessing to reduce tensors, storages, and CUDA events instead of our queue from joblib. This makes it easier to use the standard multiprocessing classes in later versions of Python. This also exposes: - Tensor/Storage.share_memory_() - Module.share_memory() These methods move the CPU tensors and storages to shared memory. If you're using the "fork" method of multiprocessing, these objects can be directly inherited instead of serialized through a queue.	2016-12-28 20:34:23 -05:00

26 Commits