Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615
Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace changes might be helpful).
Test Plan: CI
Differential Revision: D20842886
Pulled By: dreiss
fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
Summary:
Fixes https://github.com/pytorch/pytorch/issues/973
Common failure scenario:
* DataLoader creates workers and communicates with them through shared memory (SHM) segments
* Workers send file descriptors to the SHM segments containing data back to the main process through an AF_UNIX socket
* The open-file limit gets exhausted
* An FD gets stripped from a socket message coming back from a worker, without the worker knowing this
* This causes a `RuntimeError: received 0 items of ancdata` in the standard `multiprocessing` package
* The exception is not handled by PyTorch and so is presented to the user.
After this change the user will see
```
Traceback (most recent call last):
File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 761, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/home/wbaranowski/git/Quansight/pytorch/torch/multiprocessing/reductions.py", line 294, in rebuild_storage_fd
fd = df.detach()
File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/reduction.py", line 184, in recv_handle
return recvfds(s, 1)[0]
File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/reduction.py", line 162, in recvfds
len(ancdata))
RuntimeError: received 0 items of ancdata
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 787, in _try_get_data
fs = [tempfile.NamedTemporaryFile() for i in range(10)]
File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 787, in <listcomp>
fs = [tempfile.NamedTemporaryFile() for i in range(10)]
File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/tempfile.py", line 551, in NamedTemporaryFile
(fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/tempfile.py", line 262, in _mkstemp_inner
fd = _os.open(file, flags, 0o600)
OSError: [Errno 24] Too many open files: '/tmp/tmpnx_f6v_f'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test_shm_leak.py", line 56, in <module>
worker_init_fn=worker_init_fn
File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 861, in _next_data
idx, data = self._get_data()
File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 828, in _get_data
success, data = self._try_get_data()
File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 791, in _try_get_data
"Too many open files. Communication with the"
RuntimeError: Too many open files. Communication with the workers is no longer possible. Please increase the limit using `ulimit -n` in the shell or change the sharing strategy by calling `torch.multiprocessing.set_sharing_strategy('file_system')` at the beginning of your code
```
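For reference, a minimal sketch of the two remedies the new message suggests; the dataset and loader arguments are placeholders, not part of the PR:
```
import torch
import torch.multiprocessing
import torch.utils.data

# Option 1: switch to the file-system sharing strategy so storages are passed
# around by file name instead of by file descriptor. Call this before any
# DataLoader workers are created.
torch.multiprocessing.set_sharing_strategy('file_system')

# Option 2 (shell, not Python): raise the per-process open-file limit before
# launching the script, e.g. `ulimit -n 4096`, so FD-based sharing keeps working.

loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(128, 3)),
    batch_size=8,
    num_workers=4,
)
```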
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34768
Differential Revision: D20538053
Pulled By: ezyang
fbshipit-source-id: be4425cf2fa02aff61619b2b829c153cb1a867cb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36411
This PR removes the PyTorch-specific assertWarns and uses the standard unittest
one; it also formats some tests.
Test Plan: Imported from OSS
Differential Revision: D20998159
Pulled By: wanchaol
fbshipit-source-id: 1280ecff2dd293b95a639d13cc7417fc819c2201
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445
Create distributed and rpc directories under caffe/test for better management
of unit tests.
Differential Revision: D18702786
fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
Summary:
Copy-paste comment from code for reasoning:
```
# NOTE [ IterableDataset and __len__ ]
#
# For `IterableDataset`, `__len__` could be inaccurate when one naively
# does multi-processing data loading, since the samples will be duplicated.
# However, no real use case should be actually using that behavior, so
# it should count as a user error. We should generally trust user
# code to do the proper thing (e.g., configure each replica differently
# in `__iter__`), and give us the correct `__len__` if they choose to
# implement it (this will still throw if the dataset does not implement
# a `__len__`).
#
# To provide a further warning, we track if `__len__` was called on the
# `DataLoader`, save the returned value in `self._len_called`, and warn
# if the iterator ends up yielding more than this number of samples.
```
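For context, a minimal sketch of the per-replica configuration the note asks for, loosely following the pattern from the `IterableDataset` documentation; the class name and range bounds are illustrative:
```
import math
from torch.utils.data import IterableDataset, get_worker_info

class RangeIterableDataset(IterableDataset):
    """Yields the integers in [start, end), sharded across loader workers."""

    def __init__(self, start, end):
        self.start, self.end = start, end

    def __iter__(self):
        info = get_worker_info()
        if info is None:              # single-process data loading
            return iter(range(self.start, self.end))
        # Per-replica configuration: give each worker a disjoint slice.
        per_worker = int(math.ceil((self.end - self.start) / info.num_workers))
        lo = self.start + info.id * per_worker
        hi = min(lo + per_worker, self.end)
        return iter(range(lo, hi))

    def __len__(self):
        # Accurate even with workers, because __iter__ shards the range.
        return self.end - self.start
```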
Fixes https://github.com/pytorch/pytorch/issues/30184
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23587
Differential Revision: D18852625
Pulled By: ailzhang
fbshipit-source-id: aea8d4d70c7f21aaa69b35908a6f43026493d826
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28389
Intel's OpenMP implementation sets the thread affinity on the first call to an OpenMP function after a fork. By adding an atfork handler we can force this to happen before a user tries to set the affinity in their own DataLoader `worker_init_fn`.
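A hedged sketch of the kind of user code this change protects: a `worker_init_fn` that sets CPU affinity (using the Linux-only `os.sched_setaffinity`); the one-core-per-worker scheme and the dataset are placeholders:
```
import os
import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id):
    # Pin this worker process to a single core; without the atfork handler,
    # Intel OpenMP's first post-fork call could later clobber this setting.
    os.sched_setaffinity(0, {worker_id % os.cpu_count()})

loader = DataLoader(TensorDataset(torch.randn(64, 3)),
                    num_workers=2, worker_init_fn=worker_init_fn)
```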
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29006
Differential Revision: D18782456
Pulled By: ezyang
fbshipit-source-id: ce0b515256da0cf18ceb125e0cdec99a3311bbd3
Summary:
One fewer legacy decorator cluttering the test suite.
Functions relying on this decorator were updated or, in the case of test_sparse, the test suite was put back on double by default.
Note: this PR is blocked on https://github.com/pytorch/pytorch/issues/27599.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27628
Differential Revision: D17896254
Pulled By: mruberry
fbshipit-source-id: 13d460301f50ef4af7a660372432108164c0de1f
Summary:
This PR stops common_utils.py from setting the default tensor type when it's imported. See issue https://github.com/pytorch/pytorch/issues/27355. This is a frequent source of confusion for test writers.
Many tests relied on this setting (whether they knew it or not), and this PR also updates the test suite to pass without common_utils.py setting the default tensor type. Some larger test files now set the default floating dtype themselves, however. These test files are:
- test_autograd.py
- test_distributions.py
- test_jit.py
- test_nn.py
This is still a significant improvement from today, however. First, these files set the default floating dtype much more clearly than importing it from common_utils. Second, the rest of the test suite no longer sets this globally. Third, this PR is a springboard to updating those tests, too. In particular, as tests are made generic they can be moved away from relying on this global setting.
Notable technical changes in this PR are:
- Significant updates to test_torch.py to make it pass without setting the default floating dtype globally.
- The default_floating_dtype decorator is now defined in common_utils; a couple of versions of this decorator were previously defined in test files (a minimal sketch of the idea follows this list).
- test_torch-specific parts of common_utils were refactored into test_torch.
- tensor creation methods in common_utils were updated to accept an optional dtype and device.
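A minimal sketch of what such a decorator could look like, assuming only the public `torch.set_default_dtype`/`torch.get_default_dtype` APIs; the real helper in common_utils may differ in detail:
```
import functools
import torch

def default_floating_dtype(dtype):
    """Run the decorated test with `dtype` as the default floating dtype,
    restoring the previous default afterwards."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            old = torch.get_default_dtype()
            torch.set_default_dtype(dtype)
            try:
                return fn(*args, **kwargs)
            finally:
                torch.set_default_dtype(old)
        return wrapper
    return decorator

# Usage inside a test class:
#   @default_floating_dtype(torch.double)
#   def test_foo(self): ...
```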
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27444
Differential Revision: D17795235
Pulled By: mruberry
fbshipit-source-id: 7f77271c0c836e69f183ad9057a2c4b29f09d2e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25005
Seeing a bunch of failures in TSAN mostly with the following error:
```
ThreadSanitizer: starting new threads after multi-threaded fork is not
supported. Dying (set die_after_fork=0 to override)
```
TSAN is unsafe to use in a multi-threaded program after fork(), and setting
die_after_fork=0 can lead to deadlocks. As a result, I'm disabling TSAN.
ghstack-source-id: 88765698
Differential Revision: D16954347
fbshipit-source-id: 18895cd82b5052938284b46479d8470af2d74a06
Summary:
1. Prefixed underscores to any `DataLoaderIter` attribute that is not part of the data loader ctor argument list.
2. Prefixed `DataLoader.dataset_kind` with underscore because it only makes sense with the private enum `_DatasetKind`, and is an implementation detail.
3. Disallow setting `DataLoader.dataset` and `DataLoader.batch_sampler` after initializing a `DataLoader` because they affect other attributes in `__init__` (a generic sketch of this guard pattern is shown below).
These changes should not have a major BC-breaking effect since the big changes are on the iterator class and most users don't even store it. I searched GitHub for `pin_memory_thread` and (while I didn't look through all result pages) the results I saw were forks of PyTorch and blog posts on how the data loader works.
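A hedged, generic sketch of the guard pattern from item 3 above: disallowing assignment to selected attributes once `__init__` has finished. This illustrates the idea only; the class and error message here are not the actual DataLoader implementation.
```
class Loader:
    def __init__(self, dataset, batch_sampler=None):
        self.dataset = dataset
        self.batch_sampler = batch_sampler
        self._initialized = True

    def __setattr__(self, attr, val):
        # Once __init__ has run, reject changes to attributes that other
        # state was derived from.
        if getattr(self, '_initialized', False) and attr in ('dataset', 'batch_sampler'):
            raise ValueError('{} attribute should not be set after {} is '
                             'initialized'.format(attr, type(self).__name__))
        super().__setattr__(attr, val)
```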
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23744
Differential Revision: D16732507
Pulled By: ezyang
fbshipit-source-id: 9f04d000b4200b8047f31eaa3473780b66cebd26
Summary:
When an exception occurs in one of the modules passed to `parallel_apply()`, it is caught and re-raised in the main thread. This preserves the original exception type and message, but has the traceback point at the position where it's re-raised, rather than the original point of failure.
This PR saves the exception information required to generate the traceback, and includes the original traceback in the message of the exception raised in the main thread.
Before:
```
...
File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File ".../torch/nn/parallel/parallel_apply.py", line 84, in parallel_apply
raise output
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```
After:
```
...
File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File ".../torch/nn/parallel/parallel_apply.py", line 88, in parallel_apply
''.join(traceback.format_exception(*exc_info)))
RuntimeError: Caught exception in replica 0. Original traceback and message:
Traceback (most recent call last):
...
File "../models/foo.py", line 319, in bar
baz = asdf / ghij[:, np.newaxis]
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```
I took care to raise an exception of the original type (in case the main code checks for that), but replaced the message. It helped me find a bug that did not occur outside `data_parallel()`.
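A hedged sketch of the propagation pattern described above: each worker thread records `sys.exc_info()`, and the main thread re-raises an exception of the same type whose message embeds the formatted original traceback. The helper names are illustrative, not the actual `parallel_apply` code.
```
import sys
import threading
import traceback

class _ExcInfo:
    """Marker wrapper so the main thread can tell results from exceptions."""
    def __init__(self, exc_info):
        self.exc_info = exc_info

def _run(fn, results, idx):
    try:
        results[idx] = fn()
    except Exception:
        # Record the exception where it happened instead of losing it here.
        results[idx] = _ExcInfo(sys.exc_info())

def parallel_call(fns):
    results = [None] * len(fns)
    threads = [threading.Thread(target=_run, args=(fn, results, i))
               for i, fn in enumerate(fns)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for i, res in enumerate(results):
        if isinstance(res, _ExcInfo):
            exc_type, exc_value, tb = res.exc_info
            # Re-raise an exception of the original type, embedding the
            # formatted original traceback in the message.
            raise exc_type('Caught exception in replica {}. Original '
                           'traceback and message:\n{}'.format(
                               i, ''.join(traceback.format_exception(
                                   exc_type, exc_value, tb))))
    return results
```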
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18055
Differential Revision: D16444972
Pulled By: zhangguanheng66
fbshipit-source-id: ec436c9d4677fad18106a8046cfa835a20a101ce
Summary:
I learned from https://github.com/pytorch/pytorch/pull/22058 that `worker_kill` is just flaky, regardless of `hold_iter_reference`. So let's disable it altogether for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22208
Differential Revision: D15990307
Pulled By: soumith
fbshipit-source-id: d7d3f4fe7eaac4987f240cb8fd032c73a84157d7
Summary:
This is a modified version of https://github.com/pytorch/pytorch/pull/14705, since the commit structure for that PR is quite messy.
1. Add `IterableDataset`.
2. So we have two data loader modes: `Iterable` and `Map`.
   1. `Iterable` if the `dataset` is an instance of `IterableDataset`
   2. `Map` otherwise
3. Add better support for non-batch loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful in doing things like bulk loading (a short sketch follows this list).
4. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
5. Add `torch.utils.data.get_worker_info`, which returns worker information in a worker proc (e.g., worker id, dataset obj copy, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` to do per-worker configuration.
6. Add `ChainDataset`, which is the analog of `ConcatDataset` for `IterableDataset`.
7. Import torch.utils.data in `torch/__init__.py`.
8. Add data loader examples and documentation.
9. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`.
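For the non-batch loading item above, a hedged sketch of the bulk-loading use case (`batch_size=None` with a dataset that already yields batches); the class name and sizes are illustrative:
```
import torch
from torch.utils.data import DataLoader, IterableDataset

class BulkChunks(IterableDataset):
    """Pretends to bulk-read pre-batched chunks (e.g. from a database)."""

    def __iter__(self):
        for _ in range(5):
            yield torch.randn(32, 3)   # each yielded item is already a batch

# batch_size=None disables automatic batching; chunks pass through uncollated.
loader = DataLoader(BulkChunks(), batch_size=None, num_workers=0)
for chunk in loader:
    assert chunk.shape == (32, 3)
```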
Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228
Reviewed By: bddppq
Differential Revision: D15058152
fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde
Summary:
This doesn't have `strace` yet, but it still has `faulthandler` to print stack traces when hanging. This is also part of an attempt to isolate changes from #19228.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20166
Differential Revision: D15536504
Pulled By: ezyang
fbshipit-source-id: fe6e6e2e9899f30d8167436d7bc62b42883a3356
Summary:
This is an attempt to isolate unrelated changes from #19228 for easier review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20150
Differential Revision: D15314891
Pulled By: ezyang
fbshipit-source-id: 8c429747ba83ad5aca4cdd8f8086bcf65a326921
Summary:
cc nairbv
All the failures I have seen are of this combination, so let's just disable it for all cases. After #20063, I saw it fail once for py3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20172
Differential Revision: D15266527
Pulled By: nairbv
fbshipit-source-id: afb9389dfc54a0878d52975ffa37a0fd2aa3a735
Summary:
Also
1. Bump the multiprocessing test timeout, following the Python core tests.
2. Fix one type of flakiness in `test_proper_exit`.
3. Add trace reporting via `faulthandler` when the loader process hangs in `test_proper_exit` (a small sketch of this mechanism follows below).
4. Give `test_proper_exit` another try.
I'll heavily retest this.
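A hedged sketch of the kind of hang reporting item 3 refers to, using only the standard-library `faulthandler` module; the timeout value and test body are placeholders:
```
import faulthandler
import time

def run_loader_test():
    # Placeholder for the actual test body that might hang.
    time.sleep(0.1)

JOIN_TIMEOUT = 60.0  # illustrative value

# If the guarded section takes longer than JOIN_TIMEOUT seconds, dump the
# tracebacks of all threads to stderr instead of hanging silently.
faulthandler.dump_traceback_later(JOIN_TIMEOUT, exit=False)
try:
    run_loader_test()
finally:
    faulthandler.cancel_dump_traceback_later()
```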
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19421
Differential Revision: D15063728
Pulled By: ezyang
fbshipit-source-id: 4e0d992622e11053c44a9ec237b88b9a28a4472c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to noqa a number of imports in __init__. Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it. Left for future work.
Be careful! flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments. flake8-3 will
report an import unused; flake8-2 will not. For now, I just
noqa'd all these sites.
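A small illustration of the difference noted above, assuming a typical `# type:` comment: `Sequence` is only referenced from the type comment, so (per the description) flake8-3 reports the import as unused while flake8-2 does not, and the noqa silences both:
```
from typing import Sequence  # noqa: F401

def total(xs):
    # type: (Sequence[int]) -> int
    return sum(xs)
```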
All the changes were done by hand.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18291
ghimport-source-id: d6e95e899bd320407967df41435801e54864ba62
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18292 Add test for #17271 (torch.exp incorrect for 2**31 size tensor)
* **#18291 Correctly call superclass setUp in TestCase subclasses.**
This makes PYTORCH_TEST_SKIP_FAST work correctly for more
tests, reducing the wasted testing effort on our slow_test job.
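A hedged sketch of the fix pattern, with a stand-in base class: subclasses that define their own `setUp` chain to the superclass so environment-driven skips (like the `PYTORCH_TEST_SKIP_FAST` handling) still run:
```
import unittest

class TestCase(unittest.TestCase):
    # Stand-in for the common_utils base class; in the real one, setUp is
    # where environment-driven skipping (PYTORCH_TEST_SKIP_FAST) hooks in.
    def setUp(self):
        super().setUp()

class TestSomething(TestCase):
    def setUp(self):
        super().setUp()        # previously omitted in some subclasses
        self.value = 42

    def test_value(self):
        self.assertEqual(self.value, 42)
```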
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14567643
fbshipit-source-id: 40cf1d6556e0dd0a0550ff3d9ffed8b6000f8191
Summary:
Indices in `Subset` were stored as tensors earlier; they are now passed as a list in `random_split` to ensure integer indexing.
Fixes: #17466
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17649
Differential Revision: D14400250
Pulled By: soumith
fbshipit-source-id: cd20a959f33773c4babf8e861ea37ec61c2713a0
Summary:
Renewed attempt at https://github.com/pytorch/pytorch/pull/14171
From the original PR:
> Currently, the pin_memory_batch function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.
>
>This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom collate_fn returns a custom batch type.
The old PR allowed the user to implement batch pinning for custom batch and data types by passing a custom pin function to the dataloader. slayton58 suggested a cleaner approach: allow the user to define a `pin_memory` method on their custom types, and have `pin_memory_batch` [check for the presence of that method](https://github.com/pytorch/pytorch/pull/16743/files#diff-9f154cbd884fe654066b1621fad654f3R56) in the incoming batch as a fallback. I've updated the test and docstrings accordingly.
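A hedged sketch of the approach, close to the memory-pinning example in the PyTorch docs: a custom batch type defines its own `pin_memory` method, which the pinning logic calls when `pin_memory=True`; the shapes and collate wrapper are illustrative:
```
import torch
from torch.utils.data import DataLoader, TensorDataset

class SimpleCustomBatch:
    def __init__(self, samples):
        data, target = zip(*samples)
        self.inp = torch.stack(data, 0)
        self.tgt = torch.stack(target, 0)

    def pin_memory(self):
        # Called by the pin-memory logic instead of the built-in handling.
        self.inp = self.inp.pin_memory()
        self.tgt = self.tgt.pin_memory()
        return self

def collate_wrapper(batch):
    return SimpleCustomBatch(batch)

dataset = TensorDataset(torch.randn(20, 3), torch.randn(20, 1))
loader = DataLoader(dataset, batch_size=4, collate_fn=collate_wrapper,
                    pin_memory=True)
```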
The old PR was merged but then reverted due to weird CUDA OOM errors on Windows that may or may not have been related. I have no idea why my changes would cause such errors (then or now), but it's something to keep an eye out for.
cc fmassa and yf225, who were my POCs on the old PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16743
Differential Revision: D13991745
Pulled By: ezyang
fbshipit-source-id: 74e71f62a03be453b4caa9f5524e9bc53467fa17
Summary:
This is the first round of enabling unit tests that work on ROCm 2.1 in my tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16871
Differential Revision: D13997662
Pulled By: bddppq
fbshipit-source-id: d909a3f7dd5fc8f85f126bf0613751c8e4ef949f
Summary:
1. Improve error message for better debugging info
2. Increase timeout
3. Also apply the Windows worker failure detection mechanism on non-Windows platforms, for better robustness
Attempt to fix #14501
cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16249
Differential Revision: D13784702
Pulled By: ezyang
fbshipit-source-id: 09a7cff83ab9edce561ed69f9fb555ab35d1275f
Summary:
Use case:
Some data loader tests rely on `psutil` (a third-party lib), so they are guarded by `skipIf`. But we want to always test them in CI environments. With `IS_PYTORCH_CI`, we can raise if `psutil` is not found (a sketch of this guard pattern follows below).
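A hedged sketch of the guard pattern being described; the environment-variable detection and test class here are illustrative, and the real helpers live in the test common utilities:
```
import os
import unittest

IS_PYTORCH_CI = bool(os.getenv('IS_PYTORCH_CI'))  # illustrative detection

try:
    import psutil  # noqa: F401
    HAS_PSUTIL = True
except ImportError:
    HAS_PSUTIL = False
    if IS_PYTORCH_CI:
        # On PyTorch CI the dependency must be installed; fail loudly
        # instead of silently skipping the tests that need it.
        raise ImportError('psutil is required on PyTorch CI')

@unittest.skipIf(not HAS_PSUTIL, 'psutil not found')
class TestWorkerMemory(unittest.TestCase):
    def test_placeholder(self):
        self.assertTrue(HAS_PSUTIL)
```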
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16006
Reviewed By: ezyang
Differential Revision: D13673957
Pulled By: yf225
fbshipit-source-id: c63a7138093f45333c0b371fed0bcc88b67f2a22
Summary:
Since #1323, tensors are shared via shared memory, but this feature is not active for numpy arrays.
This PR fixes this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14534
Differential Revision: D13561649
Pulled By: soumith
fbshipit-source-id: b6bc9e99fb91e8b675c2ef131fba9fa11c1647c0
Summary:
Same as #14668, and was approved there.
ailzhang , please apply this patch to Horizon's `data_streamer.py`: https://gist.github.com/SsnL/020fdb3d6b7016d81b6ba1d04cc41459 Thank you!
Below is the original description at #14668:
As I am working on tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is because all functions to be run in multiprocessing must be at top global level. Adding more functionalities to `dataloader.py` will only make things worse.
So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes code much clearer. I will base my future changes to DataLoader on top of this.
No functionality is changed, except that I added `torch._six.queue`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15331
Reviewed By: yf225
Differential Revision: D13503120
Pulled By: ailzhang
fbshipit-source-id: 94df16b4d80ad1102c437cde0d5a2e62cffe1f8e
Summary:
As I am working on tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is because all functions to be run in multiprocessing must be at top global level. Adding more functionalities to `dataloader.py` will only make things worse.
So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes code much clearer. I will base my future changes to DataLoader on top of this.
No functionality is changed, except that I added `torch._six.queue`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14668
Reviewed By: soumith
Differential Revision: D13289919
Pulled By: ailzhang
fbshipit-source-id: d701bc7bb48f5dd7b163b5be941a9d27eb277a4c
Summary:
Currently, the `pin_memory_batch` function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.
This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom `collate_fn` returns a custom batch type.
The present PR adds the ability for the user to pass a `pin_fn` alongside any custom `collate_fn` to handle such custom types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14171
Differential Revision: D13166669
Pulled By: soumith
fbshipit-source-id: ca965f9841d4a259b3ca4413c8bd0d8743d433ab
Summary:
This should resolve the issue on ppc64le where test_proper_exit (__main__.TestDataLoader) fails. This only happens when the CI build machine is very busy, and it fails with a timeout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14107
Differential Revision: D13103859
Pulled By: soumith
fbshipit-source-id: 268be80b59840853c5025f3211af272f68608fe5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12794
common.py is used in base_module for almost all tests in test/. The
name of this file is so common that it can easily conflict with other dependencies
if they happen to have another common.py in the base module. Rename the file to
avoid conflict.
Reviewed By: orionr
Differential Revision: D10438204
fbshipit-source-id: 6a996c14980722330be0a9fd3a54c20af4b3d380
Summary:
test_proper_exit in the dataloader test bucket includes
(as its docstring) a reassuring message about complaints that
may appear during the test. The message is displayed
when the tests are run in verbose mode.
But the docstring includes a line break, and the unittest
framework only prints the first line of the docstring (see
shortDescription()). As a result, the second (more reassuring)
half of the message is not displayed.
Concatenate the docstring onto a single line so all of it is visible.
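A small illustration of the unittest behavior in question; `Demo` is a throwaway test case:
```
import unittest

class Demo(unittest.TestCase):
    def test_multiline(self):
        """First line is shown in verbose output.
        This second line is silently dropped by shortDescription()."""
        pass

print(Demo('test_multiline').shortDescription())
# -> 'First line is shown in verbose output.'
```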
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12612
Differential Revision: D10368786
Pulled By: ezyang
fbshipit-source-id: 14b259a6d6a3491d4290148eae56e6ab06f2a9b6
Summary:
* switches docker files over to white rabbit release - removed custom package installs
* skips five tests that regressed in that release
* fixes some case-sensitivity issues in ROCm supplied cmake files by sed'ing them in the docker
* includes first changes to the infrastructure to support upcoming hip-clang compiler
* prints ROCm library versions as part of the build (as discussed w/ ezyang )
* explicitly searches for miopengemm
* installs the new hip-thrust package to be able to remove the explicit Thrust checkout in a future revision
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12577
Differential Revision: D10350165
Pulled By: bddppq
fbshipit-source-id: 60f9c9caf04a48cfa90f4c37e242d944a175ab31
Summary:
Modifies the DistributedSampler logic. Now each process samples elements at a
fixed interval (stride), instead of taking a consecutive section.
This eliminates the possibility that the DataLoader uses padded data while
dropping real data. It happens when:
1. DistributedSampler padded the data; and
2. DataLoader's drop_last is effectively true, and it drops fewer samples than
the number of padded ones.
From the example below, we see that data points (10, 11, 12) are padded by
duplicating data samples (1, 2, 3).
The old sampler drops legitimate original data (3, 6, 9) and introduces duplicates
(10, 11) into the training set, while the new sampler logic samples correct data
points from the dataset.
This example has been added to the dataloader unit tests.
example:
```
data after shuffle: 1, 2, 3, 4, 5, 6, 7, 8, 9
padded data : 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
old sampler: -> DataLoader with (batch_size=2 and drop_last=True)
p 1: 1, 2, 3 1, 2
p 2: 4, 5, 6 4, 5
p 3: 7, 8, 9 7, 8
p 4:10,11,12 10,11
new sampler: ->
p 1: 1, 5, 9 1, 5
p 2: 2, 6,10 2, 6
p 3: 3, 7,11 3, 7
p 4: 4, 8,12 4, 8
```
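A hedged sketch of the two partitioning schemes from the example above (consecutive blocks vs. a fixed stride per rank), reproducing the rank-1 rows of the table:
```
def old_partition(indices, num_replicas, rank):
    # Consecutive block per rank (old behavior).
    per_rank = len(indices) // num_replicas
    return indices[rank * per_rank:(rank + 1) * per_rank]

def new_partition(indices, num_replicas, rank):
    # Fixed stride per rank (new behavior).
    return indices[rank::num_replicas]

# Padded data as in the example; entries 10, 11, 12 stand for the
# duplicated samples 1, 2, 3.
padded = list(range(1, 13))
print(old_partition(padded, 4, 0))   # [1, 2, 3]  -> row "p 1" of the old sampler
print(new_partition(padded, 4, 0))   # [1, 5, 9]  -> row "p 1" of the new sampler
```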
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12474
Differential Revision: D10260410
Pulled By: SsnL
fbshipit-source-id: 710856571260f42ce25955b81a5b8008e04938cf
Summary:
* first integration of MIOpen for batch norm and conv on ROCm
* workaround a ROCm compiler bug exposed by elementwise_kernel through explicit capture of variables in the densest packing
* workaround a ROCm compiler bug exposed by having `extern "C" __host__` as a definition and just `__host__` in the implementation through the hipify script
* use fabs() in accordance with C++11 for double absolute, not ::abs() which is integer-only on ROCm
* enable test_sparse set on CI, skip tests that don't work currently on ROCm
* enable more tests in test_optim after the elementwise_bug got fixed
* enable more tests in test_dataloader
* improvements to hipification and ROCm build
With this, resnet18 on CIFAR data trains without hang or crash in our tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10612
Reviewed By: bddppq
Differential Revision: D9423872
Pulled By: ezyang
fbshipit-source-id: 22c0c985217d65c593f35762b3eb16969ad96bdd
Summary:
* some small leftovers from the last PR review
* enable more unit test sets for CI
* replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
* use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
* use strided_batched gemm interface also from the batched internal interface
* re-enable Dropout.cu as we now have philox w/ rocRAND
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10406
Reviewed By: Jorghi12
Differential Revision: D9277093
Pulled By: ezyang
fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2
Summary:
In this changeset:
* improvements to `hipify-python.py`
* marking unit tests broken for ROCm
* reducing the number of jobs for the built to avoid out of memory issues
* switch to Thrust/cub-hip master for the CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9653
Differential Revision: D9117791
Pulled By: ezyang
fbshipit-source-id: a6c3c7b81f2bda9825974bf9bf89a97767244352
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9497
Fixes #7883 by using `rfft`.
It's worth noting that this is BC-breaking. And it's impossible to detect the change because the two signatures before and after this change support a common subset of calling patterns, e.g., `stft(Tensor, int, int)` (some other calling patterns will raise an error).
soumith and I plan to change the current `stft` interface because it is a bit messy and non-standard. rafaelvalle suggested us that `librosa` is a good reference API to align with. After discussing with soumith and ezyang , and given that `stft` is only out for 1 release, I decide to go with directly changing the signature. Also, my understanding is that most researchers in this field will welcome this change as `librosa` seems to be the golden-standard here. (it doesn't yet support all `pad_mode` but those will become available if added to `F.pad`.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9308
Reviewed By: ezyang
Differential Revision: D8806148
Pulled By: SsnL
fbshipit-source-id: f6e8777d0c34d4a4d7024e638dc9c63242e8bb58
Summary:
This will resolve some of the timeout issues in CPU and GPU tests internally.
Closes https://github.com/pytorch/pytorch/pull/9061
Reviewed By: ezyang
Differential Revision: D8707471
Pulled By: yf225
fbshipit-source-id: 9dc82a2c9da0c540ae015442f74b9b2b1a67a246
Reopening #6606 with fix for TEST_CUDA import issue on Windows and improvement to how we wait for manager exit in test_manager_unclean_exit. Loop tested on the Windows CI multiple times to make sure this actually fixes the CUDA OOM issue.
* Terminate dataloader workers properly when parent process is SIGKILL'ed
* Wait for worker processes to finish before shutting down manager process
* Add test for checking proper worker exit
* cosmetic change
* Test only if CUDA exists
* Don't call multiprocessing.set_start_method() in Python 2
* import TEST_CUDA only when we are in __main__
* Tune JOIN_TIMEOUT
* handle os.getppid() == 0 case
* Reset to original JOIN_TIMEOUT
* Use WaitForSingleObject() to check parent process status on Windows
* Fix TEST_CUDA import
* clean up
* Check main process only when index_queue.get() times out
* Change index_queues to multiprocessing.Queue
* Move manager checking logic to watchdog class
* Fix bugs in dataloader
* Fix TEST_CUDA import issue
* Don't import TEST_CUDA from common_nn
* Use event to signal manager exit in test
* fix lint
* Add comments
While keeping backward compatibility, enable TensorDataset to accept any number of tensors (a short usage sketch follows the commit list below).
* Enable TensorDataset to get any number of tensors
* Update dataset.py
Fix syntax error on python 2.7
* Add several test for tensordataset
* Fix whitespaces
* Simplify args
* Update dataset.py
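A short usage sketch of the change, assuming current `torch` APIs: `TensorDataset` accepts any number of tensors whose first dimensions match.
```
import torch
from torch.utils.data import TensorDataset

data    = torch.randn(100, 3)
labels  = torch.randint(0, 10, (100,))
weights = torch.rand(100)

ds = TensorDataset(data, labels, weights)
x, y, w = ds[0]            # indexing returns one element from each tensor
assert len(ds) == 100
```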
Added an ind_worker_queue parameter to data.DataLoader. It makes preprocessing deterministic.
DataLoader in multiprocessing mode may cause a non-determinism issue. Even if the random seed is frozen, each subprocess may get tasks in an unstable order. This is caused by varying I/O time while data loads. If you use augmentation during data loading, this makes results unreproducible. See https://discuss.pytorch.org/t/deterministic-non-deterministic-results-with-pytorch/9087
To fix this issue I have added an individual queue for each worker. In this case each worker gets tasks in a stable order, so each subprocess produces stable results.
To reproduce the issue you may change ind_worker_queue to False and run the script several times.
Code to reproduce issue is in the corresponding PR.
* TestIndividualWorkerQueue added to DataLoader tests
* Review fixes
* "Simplify" code by removing itertools
* Rebase conflicts fix
* Review fixes
* Fixed shutdown behavior
* Removed ind_worker_queue flag.
* Rebase on master
* Disable tests that use DataLoader with multiple workers (#5322)
* Add default PyTorch seeding and worker_init_fn to DataLoader
* generate seed using current RNG each time
* worker_seed <- main_proc_RNG_generated_seed + worker_id
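A hedged sketch of how the scheme above is typically used from user code: inside a worker, `torch.initial_seed()` already reflects `base_seed + worker_id`, and a `worker_init_fn` can forward that seed to other RNGs; numpy is assumed to be available.
```
import random
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    # torch.initial_seed() in a worker is already offset by worker_id.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

loader = DataLoader(TensorDataset(torch.randn(64, 3)),
                    num_workers=2, worker_init_fn=seed_worker)
```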
Here's the command I used to invoke autopep8 (in parallel!):
```
git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i
```
Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.
Also configures flake8 to match pep8's behavior.
Also configures TravisCI to check the whole project for lint.
* Fix error in ELU backward
* Add --seed flag for tests
* Add test for BatchNorm eval
* Fix autograd.backward docs
* Support cc flags in cuDNN search
* Fix IndexSelect backward formula
Depending on how PyTorch is compiled, the source code for DataLoader
might not be fully available, which can cause a spurious error in
test_dataloader.py
DataLoader now supports the constructor argument 'pin_memory'. When set
to true, tensors in the sample are copied to pinned memory. This happens
in a background thread when num_workers > 1.
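A hedged sketch of the new argument in use; the sizes are illustrative, and the CUDA copy uses `non_blocking=True` to overlap the host-to-device transfer.
```
import torch
from torch.utils.data import DataLoader, TensorDataset

loader = DataLoader(TensorDataset(torch.randn(256, 3)),
                    batch_size=32, num_workers=2, pin_memory=True)

if torch.cuda.is_available():
    for (batch,) in loader:
        # Batches arrive in pinned memory, so this copy can be asynchronous.
        batch = batch.cuda(non_blocking=True)
```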