Summary:
`pin_memory` has an optional `device` parameter to specify which device to pin memory for. Even with that change, the DataLoader works only for the CUDA backend. To add
support for other backends that provide pinned memory, the
DataLoader is updated with `device` as an optional parameter as well.
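A minimal usage sketch of the idea (assuming the option is exposed on the DataLoader as `pin_memory_device`, as in recent PyTorch releases, and a CUDA-capable build):
```py
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(16, dtype=torch.float32))
loader = DataLoader(dataset, batch_size=4, pin_memory=True,
                    pin_memory_device="cuda")  # pin for the CUDA backend

for (batch,) in loader:
    assert batch.is_pinned()  # batches come out in pinned (page-locked) memory
```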
Fixes #{issue number}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65402
Reviewed By: zou3519
Differential Revision: D32282204
Pulled By: VitalyFedyunin
fbshipit-source-id: e2e09876969af108d0db38af7c2d1b2f1cfa9858
(cherry picked from commit 3b76e151964fce442e27fe8fb5c37af930da4fa1)
Without this, `DataLoader2` will just add a `Shuffler` to the end of the datapipe if `shuffle=True`:
```py
from torch.utils.data.dataloader_experimental import DataLoader2
from torchdata.datapipes.iter import IterableWrapper, IterDataPipe, Shuffler
class Sorter(IterDataPipe):
    def __init__(self, datapipe):
        self.datapipe = datapipe

    def __iter__(self):
        return iter(sorted(self.datapipe))
data = list(range(1000))
dp = IterableWrapper(data)
dp = Shuffler(dp).set_shuffle(False)
dp = Sorter(dp)
dl2 = DataLoader2(dp, shuffle=True, batch_size=None)
assert list(dl2) == data # fails unless you hit a lucky random seed
```
This example is somewhat nonsensical, but it demonstrates that we cannot simply append a `Shuffler` to the end of the datapipe.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75014
Approved by: https://github.com/ejguan
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72896
Fixing the issue described here: https://github.com/pytorch/data/issues/214
There will be a follow-up PR in TorchData as well
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D34258669
Pulled By: NivekT
fbshipit-source-id: 6dd88250ed14ebe779915dc46139be7e012e9d1b
(cherry picked from commit 025b8ed98019e576bfef04c33a3f33ed1a426a66)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71579
Fixes #1551
As the comment in the code explains, register a function to terminate the persistent workers.
Keeping a reference to these workers in `atexit` prevents the Python interpreter from killing the persistent worker processes before the `pin_memory_thread` exits.
If the user explicitly shuts down the DataLoader iterator first, the function registered in `atexit` becomes a no-op.
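The general pattern looks roughly like the sketch below; names such as `_persistent_workers` are illustrative, not the actual DataLoader internals:
```py
import atexit

# Hypothetical registry of persistent worker processes kept alive across epochs.
_persistent_workers = []

def _register_persistent_worker(worker):
    _persistent_workers.append(worker)

def _terminate_persistent_workers():
    # No-op if the iterator already shut the workers down explicitly.
    for w in _persistent_workers:
        if w.is_alive():
            w.terminate()
            w.join()

# Registering the hook keeps a reference alive until interpreter shutdown,
# so the workers are not reaped before the pin_memory thread finishes.
atexit.register(_terminate_persistent_workers)
```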
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D33896537
Pulled By: ejguan
fbshipit-source-id: 36b57eac7523d8aa180180c2b61fc693ea4638ae
(cherry picked from commit 05add2ae0f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71579
Fixes #1551
As the comment in the code explains, register a function to terminate the persistent workers, using `atexit` to make sure termination always happens at the very end (after `pin_memory_thread` exits).
We need such a mechanism because in some rare cases the Python interpreter cleans up the worker processes before the DataLoader iterator.
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D33694867
Pulled By: ejguan
fbshipit-source-id: 0847f4d424a0cd6b3c0be8235d505415970254e8
(cherry picked from commit 18ad4621af)
Summary:
`default_collate`, `default_convert`, and `pin_memory` convert sequences into lists. I believe they should keep the original type when possible (e.g., I have a class that inherits from `list`, comes from a 3rd-party library that I can't change, and provides extra functionality).
Note that this is easy to do when the type can be constructed from an iterable, but that's not always the case (e.g., `range`).
Even though this can be accomplished with a custom `default_collate`/`default_convert`, 1) this is behavior they should support out of the box IMHO, and 2) `pin_memory` would still convert to a list.
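A small sketch of the motivating case; `TaggedList` is hypothetical and stands in for the 3rd-party list subclass:
```py
import torch
from torch.utils.data.dataloader import default_collate

class TaggedList(list):
    """Hypothetical 3rd-party list subclass with extra functionality."""

batch = [
    TaggedList([torch.tensor(1), torch.tensor(2)]),
    TaggedList([torch.tensor(3), torch.tensor(4)]),
]

out = default_collate(batch)
# Previously the container came back as a plain `list`; the intent here is to
# preserve the original sequence type when it can be rebuilt from an iterable.
print(type(out))
```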
cc VitalyFedyunin ejguan NivekT
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68779
Reviewed By: wenleix
Differential Revision: D32651129
Pulled By: ejguan
fbshipit-source-id: 17c390934bacc0e4ead060469cf15dde815550b4
Summary:
This test occasionally deadlocks while waiting for the child process to report its result.
The test is small, so the entire run should never take more than 1-2 seconds, but to be on the safe side set the timeout to 5 seconds.
Somewhat mitigates https://github.com/pytorch/pytorch/issues/65727
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65742
Reviewed By: janeyx99, ejguan
Differential Revision: D31235116
Pulled By: malfet
fbshipit-source-id: 0cdd2f7295f6f9fcefee954a14352e18fae20696
Summary:
The setaffinity test always fails when the number of CPUs is smaller
than 3. Changed the test to depend dynamically on the number of CPUs
available on the system.
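For reference, a minimal sketch of adapting an affinity check to the host (Linux-only, using the standard `os.sched_getaffinity`/`os.sched_setaffinity` calls; not the test's exact code):
```py
import os

available = os.sched_getaffinity(0)              # CPUs this process may run on
subset = set(list(available)[: max(1, len(available) - 1)])
os.sched_setaffinity(0, subset)                  # pin to a subset that exists here
assert os.sched_getaffinity(0) == subset
```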
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65042
Reviewed By: jbschlosser
Differential Revision: D30960554
Pulled By: ejguan
fbshipit-source-id: 55ac12714b4b0964b48c3617b79a7a345d40ebce
Summary:
`IS_PYTORCH_CI` and `IN_CI` are used inconsistently; in some cases `IN_CI` is not set at all because it is only exported in .circleci/scripts/setup_ci_environment.sh. This cleans up the two flags and uses only `IN_CI`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60279
Test Plan: CI
Reviewed By: seemethere
Differential Revision: D29239545
Pulled By: walterddr
fbshipit-source-id: a069424a2bb8790a3adfdaf0dc460301026bf8c7
Summary:
A Dataset can be indexed by a list, but a list cannot be indexed by a list. This raises an error when slicing a Subset that was initialised with another Subset instead of a Dataset.
Fixed the issue by converting the indices to a Tensor, which can be indexed by a list.
Fixes https://github.com/pytorch/pytorch/issues/59512
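A minimal reproduction of the pre-fix failure (using a `TensorDataset` purely for illustration):
```py
import torch
from torch.utils.data import TensorDataset, Subset

ds = TensorDataset(torch.arange(10))
inner = Subset(ds, [2, 3, 4, 5, 6])   # Subset of a Dataset: fine
outer = Subset(inner, [0, 1, 2, 3])   # Subset of a Subset

# Slicing `outer` hands a list of indices to `inner`, which previously tried to
# index its own plain-list `indices` with that list and raised a TypeError.
print(outer[1:3])
```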
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59513
Reviewed By: zou3519
Differential Revision: D29196891
Pulled By: ejguan
fbshipit-source-id: ccde6e474fbcbddd2e9c7c107bc8b5de1307cdb9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56797
After adding a default seeding strategy for the NumPy random module within each worker of DataLoader #56488, two concerns were raised:
- We dropped support for NumPy < 1.17 due to `SeedSequence`.
- In order to support seeding for NumPy < 1.17, how can we provide a seed for `numpy.random`?
  - The first option is to set the same seed as `random`. The problem is that the same algorithm is shared between `numpy.random` and `random`, so with the same seed they produce the exact same state sequence. Thanks to rkern, we noticed this so-called [bad thing](https://github.com/PyTorchLightning/pytorch-lightning/pull/6960#issuecomment-818393659).
  - Considering that most users are not aware of this problem, we can provide a better default seed for `numpy.random` using the same `SeedSequence` algorithm as NumPy. This is just a workaround, with a hard-coded function that generates an array of four int32 values as the seed.
To better cope with this problem (a number of 3rd-party libraries, not just NumPy, have their own random module), we may eventually need to implement a `SeedSequence` within the `torch.random` module, so users can `spawn` a new `SeedSequence` for each library.
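For users who want explicit control regardless of the default strategy, a minimal sketch of per-worker NumPy seeding via `worker_init_fn` (the function name is illustrative):
```py
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_numpy_worker(worker_id):
    # Derive a distinct per-worker seed from the worker's torch seed.
    np.random.seed(torch.initial_seed() % 2**32)

dataset = TensorDataset(torch.arange(8))
loader = DataLoader(dataset, num_workers=2, worker_init_fn=seed_numpy_worker)
```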
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D28000619
Pulled By: ejguan
fbshipit-source-id: 5701c8124a38ea5ded69eb8eee70f9680877ffa6
Summary:
## Problem summary
Fixes https://github.com/pytorch/pytorch/issues/54752 - when the number of threads is more than 3 and at least one `set_num_threads` invocation has taken place before the dataloader forks its child processes, `set_num_threads(1)` in a child process causes a segfault. During that invocation, the child process touches the data structures of the parent process's Caffe2 thread-pool, which it inherits via the copy-on-write semantics of `fork`; the threads themselves don't exist in the child process, but some of their data structures do.
## Solution
malfet [advised](https://github.com/pytorch/pytorch/issues/54752#issuecomment-810315302) & [authored code](https://github.com/pytorch/pytorch/pull/54895#pullrequestreview-625670122) for adding a `pthread_atfork` handler in `pytorch/caffe2/utils/threadpool/pthreadpool-cpp.cc`, that's invoked in the child process right after fork, to leak the Caffe2 thread-pool (the child inherits the thread-pool's data structures from its parent process, but doesn't actually have those threads, since after `fork` , a child process only has one thread).
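By way of analogy only (the actual fix is a `pthread_atfork` handler in C++), a sketch of the same idea using Python's at-fork hook, with a hypothetical `_pool` handle:
```py
import os

# Hypothetical stand-in for a thread pool created in the parent process.
_pool = object()

def _leak_parent_pool():
    # "Leak" the inherited pool in the child: drop the reference and never
    # try to join/destroy threads that only ever existed in the parent.
    global _pool
    _pool = None

os.register_at_fork(after_in_child=_leak_parent_pool)
```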
## Additional changes
Added a unittest, `test_no_segfault`, to `test_dataloader.py` to test for this issue.
Also enabled `test_segfault` (which actually makes sure that segfaults happen in worker processes in a particular case).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54895
Reviewed By: zhangguanheng66
Differential Revision: D27542253
Pulled By: malfet
fbshipit-source-id: 10f9c67ce1ff1aa37d3efebf405bd93f7f9d2489
Summary:
When compiled with OpenMP support, `ideep`'s `computational_cache` caches the max number of OpenMP workers.
This number can become stale after a `torch.set_num_threads` call, so clear the cache after the call.
Fixes https://github.com/pytorch/pytorch/issues/53565
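A sketch of the user-visible behavior this protects (illustrative only; the sizes and ops are arbitrary):
```py
import torch

torch.set_num_threads(4)
x = torch.randn(256, 256)
y = x @ x                      # uses up to 4 OpenMP threads

torch.set_num_threads(2)       # with the cache cleared, this now takes effect
z = x @ x                      # subsequent MKL-DNN/ideep work uses 2 threads
print(torch.get_num_threads())
```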
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53871
Reviewed By: albanD
Differential Revision: D27003265
Pulled By: malfet
fbshipit-source-id: 1d84c23070eafb3d444e09590d64f97f99ae9d36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52141
Remove BufferShuffleDataSet, as it's not being used anywhere within PyTorch (no usage on Github based on a search) and it's not included in the release of PyTorch 1.7.1.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D26710940
Pulled By: ejguan
fbshipit-source-id: 90023b4bfb105d6aa392753082100f9181ecebd0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49486
Remove code for Python 3.5 and lower.
There's more that can be removed/modernised, but sticking mainly to redundant version checks here, to keep the diff/PR smaller.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46579
Reviewed By: zou3519
Differential Revision: D24453571
Pulled By: ezyang
fbshipit-source-id: c2cfcf05d6c5f65df64d89c331692c9aec09248e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48865
If DistributedSampler was provided an invalid rank (ex:
https://discuss.pytorch.org/t/distributed-datasets-on-multi-machines/105113),
it failed with a cryptic assertion failure.
To fix this issue, I've added an additional check to DistributedSampler to
validate that the provided rank is within the valid range.
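A short sketch of the intended behavior (the dataset and error type shown are illustrative):
```py
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

ds = TensorDataset(torch.arange(8))

# num_replicas=2 only allows ranks 0 and 1, so an out-of-range rank should
# fail fast with a clear error instead of a cryptic assertion later on.
try:
    DistributedSampler(ds, num_replicas=2, rank=5)
except ValueError as e:
    print(e)
```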
ghstack-source-id: 117906769
Test Plan:
1) waitforbuildbot
2) Unit test added.
Reviewed By: malfet
Differential Revision: D25344945
fbshipit-source-id: 7685e00c8b2c200efbd2949fb32ee32ea7232a08
Summary:
`TCPStoreTest.test_numkeys_delkeys` takes 5+ min (mostly idle, waiting for a socket timeout)
`TestDataLoader.test_proper_exit` and `TestDataLoaderPersistentWorkers.test_proper_exit` take 2.5 min each
`TestXNNPACKConv1dTransformPass.test_conv1d_with_relu_fc` takes 2 min to finish
Add an option to `print_test_stats.py` to skip reporting test classes that run for less than a second, and speed up `TestTorchDeviceTypeCUDA.test_matmul_45724_cuda`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46068
Reviewed By: mruberry
Differential Revision: D24208660
Pulled By: malfet
fbshipit-source-id: 780e0d8be4f0cf69ea28de79e423291a1f3349b7