pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-08 07:39:33 +01:00

Author	SHA1	Message	Date
Kurt Mohler	ee28b865ee	Deprecate TypedStorage, its derived classes, and all of their public methods (#85303 ) Part of #85302 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85303 Approved by: https://github.com/ezyang	2022-11-08 18:11:01 +00:00
Kurt Mohler	14d0296e5c	Rename `_Typed/_UntypedStorage` to `Typed/UntypedStorage` and update docs (#82438 ) ### Description Since the major changes for `_TypedStorage` and `_UntypedStorage` are now complete, they can be renamed to be public. `TypedStorage._untyped()` is renamed to `TypedStorage.untyped()`. Documentation for storages is improved as well. ### Issue Fixes #82436 ### Testing N/A Pull Request resolved: https://github.com/pytorch/pytorch/pull/82438 Approved by: https://github.com/ezyang	2022-07-30 19:37:08 +00:00
ProGamerGov	357b7d589c	Fix docstring inconsistencies: string -> str, boolean -> bool (#82410 ) ### Description Throughout the PyTorch docs and codebase, the `string` type in docstrings is referred to by two separate names. This leads to inconsistent docs, like you can see here: https://pytorch.org/docs/stable/generated/torch.nn.Conv3d.html#torch.nn.Conv3d This PR fixes this issue by ensuring that all mentions of the string type in docstrings, are using the same format that Sphinx generates hyperlinks for. ### Testing No testing should be required for this change Pull Request resolved: https://github.com/pytorch/pytorch/pull/82410 Approved by: https://github.com/jbschlosser	2022-07-28 21:29:57 +00:00
PyTorch MergeBot	3c9479dc30	Revert "FIX make sure we import the correct object from multiprocessing (#53282 )" This reverts commit `e103d6af3d`. Reverted https://github.com/pytorch/pytorch/pull/53282 on behalf of https://github.com/janeyx99 due to Sorry, reverting as this breaks 10.2 tests on trunk `e103d6af3d`	2022-07-20 20:28:39 +00:00
tomMoral	e103d6af3d	FIX make sure we import the correct object from multiprocessing (#53282 ) Fixes #44687. The issue was that the Process object is not the one from the `_default_context` which should be `loky` when nesting `loky` calls. Pull Request resolved: https://github.com/pytorch/pytorch/pull/53282 Approved by: https://github.com/VitalyFedyunin	2022-07-20 16:52:03 +00:00
Elias Ellison	1058b47562	Weak-ref-ify MetaConverter and FakeTensorConverter (#80544 ) Make `MetaConverter` and `FakeTensorConverter` hold weak references to their memoized tensors, and also have `MetaConverter` hold weak reference to Tensor storage. Otherwise it can be tricky for users to make sure all existing FakeTensors or FakeTensorModes are deleted which otherwise will leak memory. I ran into https://github.com/pytorch/pytorch/issues/7733 which I was able to get around with the following (see comment from code): ``` # torch.Tensors cannot be used as a key in a dictionary # because they define a custom __eq__ function which when used # to resolve hash collisions will throw when comparing tensors: # "RuntimeError: bool value of Tensor with more than one value is ambiguous." # To avoid that, we use an object which will hold a Tensor and use # its id for both hashing and equality. # In order to use this as a weak key reference, we cannot # simply use weakref.WeakKeyDictionary because the newly constructed # WeakTensorRefKey only use would be a dictionary so it would have no strong # references. # To get around this issue, we can use it as a normal key, and then set # `weakref.finalize` to delete the key when its contained tensor dies. ``` While for the tensor memo we can set a `weakref.finalize` callback that will remove the corresponding `WeakTensorRefKey` from the tensor memo, at the point that this callback is invoked the tensor storage is not yet deallocated.. See comment from code: ``` # [expired-storages] # NB: even though the tensor has died, # the deallocation of its storage can take longer, # even when the storage has no other uses/views. # In this case, the StorageWeakRef object will be kept alive # longer than it needs to be, however the storage itself # will be deallocated. We retain the possibly dead storages # and periodically check if any of them are expired and # can be freed. ``` partial fix for https://github.com/pytorch/torchdynamo/issues/468 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80544 Approved by: https://github.com/ezyang	2022-06-29 23:36:35 +00:00
Kurt Mohler	cecb2ad95e	Restore old names for private funcs in legacy storages (#77861 ) Followup from #75459 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77861 Approved by: https://github.com/ezyang	2022-05-20 02:03:34 +00:00
Kurt Mohler	aea6e2c396	Merge torch.cuda._UntypedStorage into torch._UntypedStorage (#75459 ) Fixes #74933 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75459 Approved by: https://github.com/ezyang	2022-05-19 13:54:39 +00:00
Kurt Mohler	79ddc72b85	Virtualize `<type>Storage` classes (#66970 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/66228 cc ezyang bhosmer smessmer ljk53 bdhirsh Pull Request resolved: https://github.com/pytorch/pytorch/pull/66970 Reviewed By: bdhirsh Differential Revision: D33245612 Pulled By: ezyang fbshipit-source-id: 4c61c2cb029e2b94b0e68927c377d3e1c358dd7c (cherry picked from commit d29fcdfb4bc2cc17b1795d4349e4b56fa0d1cf12)	2022-03-22 23:44:48 +00:00
Kurt Mohler	8e7fe87630	Rename `Typed/UntypedStorage` to `_Typed/_UntypedStorage` (#72540 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72540 Reviewed By: jbschlosser Differential Revision: D34216823 Pulled By: bdhirsh fbshipit-source-id: 1bc9930ab582771ebf02308e035576cd1a0dbe47 (cherry picked from commit `329238f612`)	2022-02-15 23:53:01 +00:00
epwalsh	14d3d29b16	make ProcessException pickleable (#70118 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/70116 Happy to add tests if you let me know the best place to put them. cc VitalyFedyunin Pull Request resolved: https://github.com/pytorch/pytorch/pull/70118 Reviewed By: malfet Differential Revision: D33255899 Pulled By: ejguan fbshipit-source-id: 41d495374182eb28bb8bb421e890eca3bddc077b	2021-12-30 09:09:55 -08:00
Kurt Mohler	5883523c1d	Remove dtype from torch.Storage and use only torch.ByteStorage (#62030 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62030 Remove dtype tracking from Python Storage interface, remove all the different `<type>Storage` classes except for `ByteStorage`, and update serialization accordingly, while maintaining as much FC/BC as possible Fixes https://github.com/pytorch/pytorch/issues/47442 * THE SERIALIZATION FORMAT IS FULLY FC/BC. We worked very hard to make sure this is the case. We will probably want to break FC at some point to make the serialization structure of tensors make more sense, but not today. * There is now only a single torch.ByteStorage class. Methods like `Tensor.set_` no longer check that the dtype of storage is appropriate. * As we no longer know what dtype of a storage is, we've removed the size method from Storage, replacing it with nbytes. This is to help catch otherwise silent errors where you confuse number of elements with number of bytes. * `Storage._new_shared` takes a `nbytes` kwarg and will reject previous positional only calls. `Storage._new_with_file` and `_set_from_file` require explicit element size arguments. * It's no longer possible to convert storages to different types using the float/double/etc methods. Instead, do the conversion using a tensor. * It's no longer possible to allocate a typed storage directly using FloatStorage/DoubleStorage/etc constructors. Instead, construct a tensor and extract its storage. The classes still exist but they are used purely for unpickling. * The preexisting serialization format stores dtype with storage, and in fact this dtype is used to determine the dtype of the tensor overall. To accommodate this case, we introduce a new TypedStorage concept that exists only during unpickling time which is used to temporarily store the dtype so we can construct a tensor. If you overrode the handling of pickling/unpickling, you MUST add handling for TypedStorage or your serialization code will degrade to standard file-based serialization. Original pull request: https://github.com/pytorch/pytorch/pull/59671 Reviewed By: soulitzer, ngimel Differential Revision: D29466819 Pulled By: ezyang fbshipit-source-id: 4a14e5d3c2b08e06e558683d97f7378a3180b00e	2021-10-05 13:50:34 -07:00
Kaushik B	ba07aaf211	Fix typo in warning for spawn method (#57927 ) Summary: Fix typo in warning for spawn method Pull Request resolved: https://github.com/pytorch/pytorch/pull/57927 Reviewed By: suo Differential Revision: D28326390 Pulled By: bdhirsh fbshipit-source-id: b0c12b1020d713865687f94f28ab2873ae260c23	2021-05-10 13:12:38 -07:00
Vasiliy Alekseev	bac4cfd54d	Fix mp serialization for integer nn.Parameter on CUDA (#56529 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/56342 Pull Request resolved: https://github.com/pytorch/pytorch/pull/56529 Reviewed By: albanD Differential Revision: D27896094 Pulled By: ngimel fbshipit-source-id: fe817781eb7139ea57c78acfd56e7c11b61eb4ed	2021-04-22 16:21:04 -07:00
Sam Estep	4753100a3b	Un-ignore F403 in .flake8 (#55838 ) Summary: Generally wildcard imports are bad for the reasons described here: https://www.flake8rules.com/rules/F403.html This PR replaces wildcard imports with an explicit list of imported items where possible, and adds a `# noqa: F403` comment in the other cases (mostly re-exports in `__init__.py` files). This is a prerequisite for https://github.com/pytorch/pytorch/issues/55816, because currently [`tools/codegen/dest/register_dispatch_key.py` simply fails if you sort its imports](https://github.com/pytorch/pytorch/actions/runs/742505908). Pull Request resolved: https://github.com/pytorch/pytorch/pull/55838 Test Plan: CI. You can also run `flake8` locally. Reviewed By: jbschlosser Differential Revision: D27724232 Pulled By: samestep fbshipit-source-id: 269fb09cb4168f8a51fd65bfaacc6cda7fb87c34	2021-04-13 09:24:07 -07:00
Robert Gmyr	93bbbeccf7	Make SharedCache thread-safe (#53750 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/53731 Make SharedCache thread-safe by using explicit locks instead of relying on atomicity of certain Python operations Pull Request resolved: https://github.com/pytorch/pytorch/pull/53750 Reviewed By: malfet Differential Revision: D27304793 Pulled By: albanD fbshipit-source-id: 7c62babe4357bed57df3056fbda6801fb6168846	2021-03-25 06:35:03 -07:00
Richard Barnes	b89827b73f	Drop unused imports (#49972 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49972 From ``` ./python/libcst/libcst codemod remove_unused_imports.RemoveUnusedImportsWithGlean --no-format caffe2/ ``` Test Plan: Standard sandcastle tests Reviewed By: xush6528 Differential Revision: D25727352 fbshipit-source-id: 6b90717e161aeb1da8df30e67d586101d35d7d5f	2021-01-13 12:26:17 -08:00
Hugo van Kemenade	473e78c0fa	Remove redundant code for unsupported Python versions (#49486 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49486 Remove code for Python 3.5 and lower. There's more that can be removed/modernised, but sticking mainly to redundant version checks here, to keep the diff/PR smaller. Pull Request resolved: https://github.com/pytorch/pytorch/pull/46579 Reviewed By: zou3519 Differential Revision: D24453571 Pulled By: ezyang fbshipit-source-id: c2cfcf05d6c5f65df64d89c331692c9aec09248e	2021-01-06 12:45:46 -08:00
Samuel Marks	e6779d4357	[*.py] Rename "Arguments:" to "Args:" (#49736 ) Summary: I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings. ```sh (pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" \| paste -s -d+ -- \| bc)"; done Args: 1095 Arguments: 0336 ``` It is easy enough to extend my parsers to support both variants, however it looks like `Arguments:` is wrong anyway, as per: - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md) - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md) - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst) Therefore, only `Args:` is valid. This PR replaces them throughout the codebase. PS: For related PRs, see tensorflow/tensorflow/pull/45420 PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736 Reviewed By: albanD Differential Revision: D25710534 Pulled By: soumith fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619	2020-12-28 09:34:47 -08:00
Guilherme Leobas	cf92b0f3a0	add type annotations to multiprocessing module (#47756 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/47757 Pull Request resolved: https://github.com/pytorch/pytorch/pull/47756 Reviewed By: malfet Differential Revision: D24970773 Pulled By: ezyang fbshipit-source-id: b0b9edb9cc1057829c6320e78174c6d5f7a77477	2020-11-16 13:05:49 -08:00
Chester Liu	17a6bc7c1b	Cleanup unused code for Python < 3.6 (#47822 ) Summary: I think these can be safely removed since the min version of supported Python is now 3.6 Pull Request resolved: https://github.com/pytorch/pytorch/pull/47822 Reviewed By: smessmer Differential Revision: D24954936 Pulled By: ezyang fbshipit-source-id: 5d4b2aeb78fc97d7ee4abaf5fb2aae21bf765e8b	2020-11-13 21:37:01 -08:00
Aliaksandr Ivanou	3ffd2af8cd	Add exception classification to torch.multiprocessing.spawn (#45174 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45174 Introduce different types of exceptions that map to different failures of torch.multiprocessing.spawn. The change introduces three different exception types: ProcessRaisedException - occurs when the process initiated by spawn raises an exception ProcessExitedException - occurs when the process initiated by spawn exits The following logic will allow frameworks that use mp.spawn to categorize failures. This can be helpful for tracking metrics and enhancing logs. Test Plan: Imported from OSS Reviewed By: taohe Differential Revision: D23889400 Pulled By: tierex fbshipit-source-id: 8849624c616230a6a81158c52ce0c18beb437330	2020-10-09 12:59:41 -07:00
Xiang Gao	20ac736200	Remove py2 compatible future imports (#44735 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735 Reviewed By: mruberry Differential Revision: D23731306 Pulled By: ezyang fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f	2020-09-16 12:55:57 -07:00
David Reiss	e75fb4356b	Remove (most) Python 2 support from Python code (#35615 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615 Python 2 has reached end-of-life and is no longer supported by PyTorch. Now we can clean up a lot of cruft that we put in place to support it. These changes were all done manually, and I skipped anything that seemed like it would take more than a few seconds, so I think it makes sense to review it manually as well (though using side-by-side view and ignoring whitespace change might be helpful). Test Plan: CI Differential Revision: D20842886 Pulled By: dreiss fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed	2020-04-22 09:23:14 -07:00
Kiuk Chung	7314f1c281	[torch/multiprocessing] Update documentation indicating that start_method is ignored for mp.spawn() (#33070 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33070 `start_method` parameter is intentionally ignored for `mp.spawn()`. Document this fact and point the user to `start_processes` if they want to use a different `start_method`. Test Plan: Warning message looks like: ``` main.py:8: UserWarning: This method only supports start_method=spawn (got: fork). To use a different start_method use: torch.multiprocessing.start_process(...) warnings.warn(msg) ``` Reviewed By: ailzhang Differential Revision: D19780235 fbshipit-source-id: 4599cd18c3ba6cc401810efe4f390290ffa8023b	2020-02-07 15:26:00 -08:00
Brian Wignall	f326045b37	Fix typos, via a Levenshtein-type corrector (#31523 ) Summary: Should be non-semantic. Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking. Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 . Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523 Differential Revision: D19216749 Pulled By: mrshenli fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea	2020-01-17 16:03:19 -08:00
peterjc123	6486bdfb90	Fix `os.register_at_fork` not defined on Windows (#30809 ) Summary: According to https://docs.python.org/3.8/library/os.html#os.register_at_fork, this function is only available in Unix platforms. Pull Request resolved: https://github.com/pytorch/pytorch/pull/30809 Differential Revision: D18828777 Pulled By: bddppq fbshipit-source-id: 3325a984da488bb0a80a5c27131553fbcf78921f	2019-12-05 13:36:53 -08:00
Peter Bell	dcd1216efe	Force early initialization of OpenMP in forked children (#29006 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/28389 Intel's OpenMP implementation sets the thread affinity on the first call to an OpenMP function after a fork. By adding an atfork handler we can force this to happen before a user tries to set the affinity in their own DataLoader `worker_init_fn`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/29006 Differential Revision: D18782456 Pulled By: ezyang fbshipit-source-id: ce0b515256da0cf18ceb125e0cdec99a3311bbd3	2019-12-03 15:23:31 -08:00
Ailing Zhang	a997f224ac	Add torch.multiprocessing.create_processes Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28493 Differential Revision: D18766066 Pulled By: ailzhang fbshipit-source-id: 7f424c8fae3012be2416cf9bc72ee2dde40c1f89	2019-12-03 10:38:19 -08:00
Brian Wignall	e7fe64f6a6	Fix typos (#30606 ) Summary: Should be non-semantic. Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos. Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606 Differential Revision: D18763028 Pulled By: mrshenli fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c	2019-12-02 20:17:42 -08:00
なるみ	d83389d327	Ignore F401 in all __init__.py without putting noqa (#25823 ) Summary: By adding `per-file-ignores = __init__.py: F401` into `.flake8` with `flake8>=3.7`, we can ignore F410 in all `__init__.py` without putting `# noqa: F401` line by line. http://flake8.pycqa.org/en/latest/user/options.html?highlight=per-file-ignores#cmdoption-flake8-per-file-ignores Pull Request resolved: https://github.com/pytorch/pytorch/pull/25823 Differential Revision: D17252182 Pulled By: soumith fbshipit-source-id: 87b174075b79e4078953a7521bd1a8f82405646b	2019-10-23 15:28:13 -07:00
Richard Zou	277d442d18	Rename torch.namedtensor -> torch._namedtensor_internals (#26349 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26349 The directory holds a lot of private helper functions that help implement named tensor functionality. Instead of naming each helper function with a leading underscore, I change the name of the import to `_namedtensor_internals` to signal it should not be used directly. Test Plan: - [namedtensor ci] Differential Revision: D17424178 Pulled By: zou3519 fbshipit-source-id: 8f7b74346765759303480e581038a661021acf53	2019-09-18 05:47:09 -07:00
Richard Zou	2513ca66ca	Add guards for using named tensor with serialization and multiprocessing (#25345 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25345 Test Plan - New tests [namedtensor ci] Test Plan: Imported from OSS Differential Revision: D17101486 Pulled By: zou3519 fbshipit-source-id: 58e803b042056ee6abab8551517f74078f2b81d5	2019-08-29 14:10:33 -07:00
SsnL	eb756746ab	Fix possible deadlock in SharedCache inside a forked child proc (#25158 ) Summary: Related: https://github.com/pytorch/pytorch/issues/24927#issuecomment-524608021 `fork` inherits lock state. So if we happen to unfortunately fork when the `SharedCache` lock is held. We could deadlock in the child process when some code tries to acquire it. Following pytorch multiprocessing library design, this patch resets the lock to a new object after fork. A similar example from python core lib for `multiprocessing.Queue` is : ```py class Queue(object): def __init__(self, ...): ... self._after_fork() if sys.platform != 'win32': register_after_fork(self, Queue._after_fork) def _after_fork(self): debug('Queue._after_fork()') self._notempty = threading.Condition(threading.Lock()) self._buffer = collections.deque() self._thread = None self._jointhread = None self._joincancelled = False self._closed = False self._close = None self._send_bytes = self._writer.send_bytes self._recv_bytes = self._reader.recv_bytes self._poll = self._reader.poll ``` `d4d60134b2/Lib/multiprocessing/queues.py (L54-L78)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/25158 Differential Revision: D17091227 Pulled By: soumith fbshipit-source-id: ee7130f47d7bbd42fc34a2598f1f6974d8d7cdb7	2019-08-28 13:34:03 -07:00
SsnL	e982e46de3	Add multiprocessing_context= argument to DataLoader (#22990 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/22131 Pull Request resolved: https://github.com/pytorch/pytorch/pull/22990 Differential Revision: D16539052 Pulled By: colesbury fbshipit-source-id: b1c48ae2fb54065dd96a67be263254129e02eaa2	2019-07-29 12:58:40 -07:00
Tongzhou Wang	bc6281028c	rebuild_storage_fd retry on EINTR (#21723 ) Summary: Some data loader tests are flaky on py 2 with the following error ``` Jun 12 22:17:31 Traceback (most recent call last): Jun 12 22:17:31 File "test_dataloader.py", line 798, in test_iterable_dataset Jun 12 22:17:31 fetched = sorted([d.item() for d in dataloader_iter]) Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 697, in __next__ Jun 12 22:17:31 idx, data = self._get_data() Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 664, in _get_data Jun 12 22:17:31 success, data = self._try_get_data() Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 617, in _try_get_data Jun 12 22:17:31 data = self.data_queue.get(timeout=timeout) Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/multiprocessing/queues.py", line 135, in get Jun 12 22:17:31 res = self._recv() Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 22, in recv Jun 12 22:17:31 return pickle.loads(buf) Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1382, in loads Jun 12 22:17:31 return Unpickler(file).load() Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 858, in load Jun 12 22:17:31 dispatch[key](self) Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1133, in load_reduce Jun 12 22:17:31 value = func(*args) Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/reductions.py", line 274, in rebuild_storage_fd Jun 12 22:17:31 fd = multiprocessing.reduction.rebuild_handle(df) Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 157, in rebuild_handle Jun 12 22:17:31 new_handle = recv_handle(conn) Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 83, in recv_handle Jun 12 22:17:31 return _multiprocessing.recvfd(conn.fileno()) Jun 12 22:17:31 OSError: [Errno 4] Interrupted system call ``` Apparently, Python 2.7's `recvfd` calls `recvmsg` without EINTR retry: https://github.com/python/cpython/blob/2.7/Modules/_multiprocessing/multiprocessing.c#L174 So we should call it with an outer try-catch loop. Pull Request resolved: https://github.com/pytorch/pytorch/pull/21723 Differential Revision: D15806247 Pulled By: ezyang fbshipit-source-id: 16cb661cc0fb418fd37353a1fef7ceeb634f02b7	2019-06-14 09:10:00 -07:00
Soumith Chintala	2e029db2f9	fixes multiprocessing serialization for integer nn.Parameter (#18639 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/17345 Pull Request resolved: https://github.com/pytorch/pytorch/pull/18639 Differential Revision: D14711565 Pulled By: soumith fbshipit-source-id: 0063ed138a215b95d6571dcd68b18569714abe19	2019-04-01 17:15:42 -07:00
Edward Yang	173f224570	Turn on F401: Unused import warning. (#18598 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598 ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a Stack from [ghstack](https://github.com/ezyang/ghstack): * #18598 Turn on F401: Unused import warning. This was requested by someone at Facebook; this lint is turned on for Facebook by default. "Sure, why not." I had to noqa a number of imports in __init__. Hypothetically we're supposed to use __all__ in this case, but I was too lazy to fix it. Left for future work. Be careful! flake8-2 and flake8-3 behave differently with respect to import resolution for # type: comments. flake8-3 will report an import unused; flake8-2 will not. For now, I just noqa'd all these sites. All the changes were done by hand. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D14687478 fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3	2019-03-30 09:01:17 -07:00
Vitaly Fedyunin	5653a914f7	Implement reference counting for shared IPC CUDA tensors (#16854 ) Summary: This is to fix #16141 and similar issues. The idea is to track a reference to every shared CUDA Storage and deallocate memory only after a consumer process deallocates received Storage. ezyang Done with cleanup. Same (insignificantly better) performance as in file-per-share solution, but handles millions of shared tensors easily. Note [ ] documentation in progress. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16854 Differential Revision: D13994490 Pulled By: VitalyFedyunin fbshipit-source-id: 565148ec3ac4fafb32d37fde0486b325bed6fbd1	2019-03-25 10:24:38 -07:00
hysts	cbefd0323b	Fix typo Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17521 Differential Revision: D14237482 Pulled By: soumith fbshipit-source-id: 636e0fbe2c667d15fcb649136a65ae64937fa0cb	2019-02-26 20:23:34 -08:00
Shen Li	24f4d3987e	Move all Stream and Event Python implementation to C++ (#15937 ) Summary: 1. Added `torch/csrc/cuda/Event.h` and `torch/csrc/cuda/Event.cpp` to bind Python Event class to C++ implementation. 2. Move all CUDA runtime invocations from `torch/cuda/streams.py` to C++ 3. Added tests to cover Stream and Event APIs. ~(event IPC handle tests is introduced in #15974)~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/15937 Differential Revision: D13649001 Pulled By: mrshenli fbshipit-source-id: 84ca58f35f6ba679a4ba33150ceba678d760d240	2019-01-17 07:29:22 -08:00
Ailing Zhang	be47470c91	Fix cuda multiprocessing cached memory (#14736 ) Summary: This PR fixes #11422 In the old world of CUDA IPC, when we want to share a tensor T from A to B, we have to share the whole CUDA mem allocation where T's storage sit in. And we casted it to the same type of storage of T's. This causes problem when two different types of storage got allocated to the same CUDA mem block. When we try to reconstruct the second tensor, it will complain about wrong storage type. In this PR we reconstruct the storage only (not the entire mem block). However, CUDA only allows one open memHandle once per process, we have to save the device pointer in a global cache so that we can reconstruct tensors as they come. Thanks a ton to ezyang who helped design the solution and debugged the issue! Pull Request resolved: https://github.com/pytorch/pytorch/pull/14736 Differential Revision: D13335899 Pulled By: ailzhang fbshipit-source-id: cad69db392ed6f8fdc2b93a9dc2899f6d378c371	2018-12-05 10:55:43 -08:00
Pieter Noordhuis	220ce8046e	Binding for prctl(PR_SET_PDEATHSIG) (#14491 ) Summary: If torch.multiprocessing.spawn is used to launch non-daemonic processes (the default since #14391), the spawned children won't be automatically terminated when the parent terminates. On Linux, we can address this by setting PR_SET_PDEATHSIG, which delivers a configurable signal to child processes when their parent terminates. Fixes #14394. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14491 Differential Revision: D13270374 Pulled By: pietern fbshipit-source-id: 092c9d3c3cea2622c3766b467957bc27a1bd500c	2018-11-29 20:09:19 -08:00
Teng Li	ffbc3905a1	Fixed torch.multiprocessing.spawn for not being able to spawn like dataloader workers (#14391 ) Summary: Should fix: https://github.com/pytorch/pytorch/issues/14390 Now imagenet example works fine with multiprocessing and more than 1 dataloader worker Pull Request resolved: https://github.com/pytorch/pytorch/pull/14391 Reviewed By: calebho Differential Revision: D13209800 Pulled By: teng-li fbshipit-source-id: e8abc0fb38d4436cf3474dcbba0e28f4290e4d29	2018-11-27 12:37:41 -08:00
Teng Li	778e23606b	multiprocessing.spawn python version check (#14039 ) Summary: This will be super helpful to the user Pull Request resolved: https://github.com/pytorch/pytorch/pull/14039 Differential Revision: D13089200 Pulled By: teng-li fbshipit-source-id: 29e7507bd8fe5a0c58a85c52f976bfca282b4c1b	2018-11-16 18:53:23 -08:00
Pieter Noordhuis	1caa341c68	Add torch.multiprocessing.spawn docs Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13846 Differential Revision: D13029595 Pulled By: pietern fbshipit-source-id: b733b00f7070c18535c31801f20e6e717eec7748	2018-11-12 14:39:52 -08:00
Pieter Noordhuis	be424de869	Add torch.multiprocessing.spawn helper (#13518 ) Summary: This helper addresses a common pattern where one spawns N processes to work on some common task (e.g. parallel preprocessing or multiple training loops). A straightforward approach is to use the multiprocessing API directly and then consecutively call join on the resulting processes. This pattern breaks down in the face of errors. If one of the processes terminates with an exception or via some signal, and it is not the first process that was launched, the join call on the first process won't be affected. This helper seeks to solve this by waiting on termination from any of the spawned processes. When any process terminates with a non-zero exit status, it terminates the remaining processes, and raises an exception in the parent process. If the process terminated with an exception, it is propagated to the parent. If the process terminated via a signal (e.g. SIGINT, SIGSEGV), this is mentioned in the exception as well. Requires Python >= 3.4. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13518 Reviewed By: orionr Differential Revision: D12929045 Pulled By: pietern fbshipit-source-id: 00df19fa16a568d1e22f37a2ba65677ab0cce3fd	2018-11-06 14:08:37 -08:00
Edward Yang	3bfa7258b3	Don't serialize hooks (#11705 ) Summary: Fixes #11683. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/11705 Differential Revision: D9833057 Pulled By: ezyang fbshipit-source-id: 18af9bcd77b088326738d567100fbe4a4c869dd6	2018-10-16 20:11:03 -07:00
Sam Gross	0b63d12db6	Don't call into Python during Storage destruction. (#10407 ) Summary: ``` This removes PyObjectFinalizer. We were seeing SIGSEGV at exit in some programs that use multiprocessing. The backtrace pointed to StorageRef.__del__ being called from subtype_dealloc. My guess is that the Python interpreter was shutdown before all C++ Storage objects were deallocated. Deallocating the C++ Storage called the finalizer which called back into Python after it was no longer safe to do so. This avoids a callback from C++ into Python during Storage finalization. Instead, dead Storage objects (expired weak references) are collected periodically when shared_cache exceeds a limit. The limit is scaled with 2x the number of live references, which places an upper bound on the amount of extra memory held by dead Storage objects. In practice, this should be very small. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/10407 Differential Revision: D9272400 Pulled By: colesbury fbshipit-source-id: ecb14d9c6d54ffc91e134c34a4e770a4d09048a2	2018-08-13 11:20:07 -07:00
Edward Yang	674f7a9778	Correctly share CUDA Parameters. (#10220 ) Summary: ``` Correctly share CUDA Parameters, requires_grad and hooks. Previously, the following was true: - If you put a Parameter for a CUDA tensor in multiprocessing queue (or otherwise tried to transfer it), this failed, saying that we cannot pickle CUDA storage. This is issue #9996. - If you put a leaf Tensor that requires_grad=True through the multiprocessing queue, it would come out the other end as requires_grad=False (It should have come out the other end as requires_grad=True). Similarly, backwards hooks were lost. - If you put a non-leaf Tensor that requires_grad=True through the multiprocessing queue, it would come out the other end as requires_grad=False. The root cause for the first issue was that implementation of reductions for Parameter used the superclass implementation (tensor) in __reduce_ex__, but this always picks up the non-ForkingPickler reduction, which doesn't work with CUDA tensors. So, we registered a new ForkingPickler specifically for Parameter, and adjusted the code to correctly rewrap a Tensor in a Parameter if it was originally a parameter. While working on this, we realized that requires_grad and backwards hooks would not be preserved in the ForkingPickler reduction implementation. We fixed the reducer to save these parameters. However, Adam Paszke pointed out that we shouldn't allow sending requires_grad=True, non-leaf Tensors over a multiprocessing queue, since we don't actually support autograd over process boundar. We now throw an error in this case; this may cause previously working code to fail, but this is easy enough to fix; just detach() the tensor before sending it. The error message says so. Fixes #9996. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/10220 Differential Revision: D9160746 Pulled By: ezyang fbshipit-source-id: a39c0dbc012ba5afc7a9e646da5c7f325b3cf05c	2018-08-10 13:54:56 -07:00

1 2

71 Commits