Commit Graph

228 Commits

Alexander Grund
2785818f5a Choose test affinity based on current affinity (#80327)
This avoids test failures in cgroup environments
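
The idea, as a minimal sketch (assuming a Linux system, where `os.sched_getaffinity` is available):

```py
import os

# Instead of hardcoding CPU IDs, pick them from the set the process
# already owns, which may be restricted by cgroups.
current = sorted(os.sched_getaffinity(0))
target = {current[-1]}              # e.g. pin to the last allowed CPU
os.sched_setaffinity(0, target)
assert os.sched_getaffinity(0) == target
```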

Fixes https://github.com/pytorch/pytorch/issues/44368

CC @VitalyFedyunin

New PR after #44369 got closed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80327
Approved by: https://github.com/VitalyFedyunin
2022-07-20 16:45:57 +00:00
erjia
270069cfb9 Fix DataLoader flaky tests that run out of shared memory (#81660)
Fixes #70517 #70547 #74598

I basically disable the `loader_kill` test, which would prevent proper clean-up. That clean-up would be handled by the Python interpreter in a normal Python script execution; in the pytest framework, however, the interpreter never exits to execute the shared-memory clean-up in `multiprocessing/resource_tracker.py`, so we might run out of shared memory in some cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81660
Approved by: https://github.com/NivekT
2022-07-19 15:38:20 +00:00
Robert
3064982fb8 Support percentages in random_split (#78877)
Fixes #78510

This PR adds support for using fractions with `random_split`. This should be completely backwards-compatible, as the fractional-style splitting is only applied when the sum of the input lengths does not exceed 1.0.
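
A short usage sketch of the fractional form (the dataset here is just a stand-in):

```py
import torch
from torch.utils.data import random_split

dataset = range(10)
# the lengths sum to 1.0, so they are interpreted as fractions of len(dataset)
train, val = random_split(dataset, [0.7, 0.3])
assert len(train) == 7 and len(val) == 3
```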
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78877
Approved by: https://github.com/ejguan
2022-06-16 02:00:25 +00:00
Michael Suo
c978b609f7 [ci] remove IN_CI env var
The conventional env var to set is `CI`. Both CircleCI and GHA set it, so
`IN_CI` is unnecessary.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79229

Approved by: https://github.com/janeyx99
2022-06-11 17:16:30 +00:00
Vitaly Fedyunin
883f8ef62e [DataLoader] DataLoader now automatically applies sharding to DataPipes
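A hedged sketch of what this enables; the pipeline below is illustrative, using the core `IterableWrapper` and the `sharding_filter` functional DataPipe:

```py
from torch.utils.data import DataLoader
from torch.utils.data.datapipes.iter import IterableWrapper

# With automatic sharding, each DataLoader worker sees a disjoint shard
# of the pipe instead of a duplicate copy of every sample.
dp = IterableWrapper(range(100)).shuffle().sharding_filter()
loader = DataLoader(dp, batch_size=10, num_workers=2)
```
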
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78631

Approved by: https://github.com/ejguan, https://github.com/NivekT
2022-06-02 17:40:29 +00:00
Sergii Dymchenko
e8bf3a9cd4 Remove Python 2-related code from dataloader (#78594)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78594
Approved by: https://github.com/seemethere
2022-06-01 05:25:23 +00:00
Alban Desmaison
7c3d3b759b Migrate x86 trunk build/test to macos12
This will enable MPS building but will NOT test MPS,
as the runners do not have AMD GPUs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77645

Approved by: https://github.com/malfet, https://github.com/seemethere
2022-05-18 11:59:19 +00:00
Jeeja
45bbc4c028 Update Dataloader with default parameter device (#65402)
Summary:
`pin_memory` has an optional device parameter to specify
which device you want to pin memory for, but the DataLoader
worked only with the CUDA backend. To add support for other
backends that support pinned memory, the dataloader is updated
with device as an optional parameter.
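
A sketch of the resulting usage; the keyword name `pin_memory_device` is an assumption based on this PR's description:

```py
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(64, 3))
# pin host memory for a specific backend instead of assuming CUDA
loader = DataLoader(ds, batch_size=32, pin_memory=True,
                    pin_memory_device="cuda")  # assumed kwarg from this PR
```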

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65402

Reviewed By: zou3519

Differential Revision: D32282204

Pulled By: VitalyFedyunin

fbshipit-source-id: e2e09876969af108d0db38af7c2d1b2f1cfa9858
(cherry picked from commit 3b76e151964fce442e27fe8fb5c37af930da4fa1)
2022-04-21 01:33:53 +00:00
Philip Meier
3c10987692 don't add extra shuffle in DataLoader2 if one is present
Without this, `DataLoader2` will just add a `Shuffler` to the end of the datapipe if `shuffle=True`:

```py
from torch.utils.data.dataloader_experimental import DataLoader2

from torchdata.datapipes.iter import IterableWrapper, IterDataPipe, Shuffler

class Sorter(IterDataPipe):
    def __init__(self, datapipe):
        self.datapipe = datapipe

    def __iter__(self):
        return iter(sorted(self.datapipe))

data = list(range(1000))
dp = IterableWrapper(data)
dp = Shuffler(dp).set_shuffle(False)
dp = Sorter(dp)

dl2 = DataLoader2(dp, shuffle=True, batch_size=None)

assert list(dl2) == data  # fails unless you hit a lucky random seed
```

This example is somewhat nonsensical, but it demonstrates that we cannot simply add a `Shuffler`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75014
Approved by: https://github.com/ejguan
2022-04-05 19:53:08 +00:00
Evren Tumer
7534525735 Reset worker cycle iterator for determinism across runs (#73675)
Summary:
Reset worker cycle iterator for determinism across runs

Fixes https://github.com/pytorch/pytorch/issues/73603

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73675

Reviewed By: bdhirsh

Differential Revision: D34688704

Pulled By: ejguan

fbshipit-source-id: 7bab11f0b9f59645d9b168fa11d92dc7c2c4d34e
(cherry picked from commit eb5fd559224988f9967528e154cf37c5031fe7c2)
2022-03-09 14:55:07 +00:00
Kevin Tse
cd4ecce1bb [DataPipe] Fix issue with DataPipe serialization with dill (#72896)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72896

Fixing the issue described here: https://github.com/pytorch/data/issues/214

There will be a follow-up PR in TorchData as well

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34258669

Pulled By: NivekT

fbshipit-source-id: 6dd88250ed14ebe779915dc46139be7e012e9d1b
(cherry picked from commit 025b8ed98019e576bfef04c33a3f33ed1a426a66)
2022-02-23 16:31:20 +00:00
Erjia Guan
67a275c293 Fix persistent worker exits before pin_memory thread (#71579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71579

Fixes #1551

As the comment in the code says, register a function to terminate the persistent workers.
Holding a reference to these workers in `atexit` prevents the Python interpreter from killing these persistent worker processes before the `pin_memory_thread` exits.
And, if users explicitly shut down the DataLoader iterator, such a function in `atexit` becomes a no-op.
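
A minimal sketch of the pattern, with illustrative names rather than the actual DataLoader internals:

```py
import atexit
import multiprocessing as mp
import time

def _work() -> None:
    time.sleep(60)  # stand-in for a persistent worker loop

def _clean_up_worker(w: mp.Process) -> None:
    # Holding a reference to `w` in the atexit callback keeps the worker
    # alive until interpreter shutdown, then terminates and joins it.
    w.terminate()
    w.join()

if __name__ == "__main__":
    worker = mp.Process(target=_work)
    worker.start()
    atexit.register(_clean_up_worker, worker)
```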

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D33896537

Pulled By: ejguan

fbshipit-source-id: 36b57eac7523d8aa180180c2b61fc693ea4638ae
(cherry picked from commit 05add2ae0f)
2022-02-01 23:57:17 +00:00
pyhuang97@gmail.com
16a9ffba4b Allow specifying num_samples to RandomSampler even when replacement=False (#71568)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/38032 and #39214

Hi, I modified the RandomSampler to satisfy the requirement of https://github.com/pytorch/pytorch/issues/38032. I also added and deleted some test cases in test/test_dataloader.py to match the new behavior.
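
With the change, the following works (previously `num_samples` was only accepted together with `replacement=True`):

```py
from torch.utils.data import RandomSampler

dataset = list(range(1000))
# draw 100 indices per epoch without replacement
sampler = RandomSampler(dataset, replacement=False, num_samples=100)
assert len(list(sampler)) == 100
```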

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71568

Reviewed By: mikaylagawarecki

Differential Revision: D33741776

Pulled By: ejguan

fbshipit-source-id: 2d25f5096b7b36ad9fb6455107182f387cf8ee43
(cherry picked from commit 9c7e1891c2)
2022-01-25 15:34:24 +00:00
Nikita Shulga
86aefdc082 Revert D33694867: Fix persistent worker exits before pin_memory thread
Test Plan: revert-hammer

Differential Revision:
D33694867 (e2191e7084)

Original commit changeset: 0847f4d424a0

Original Phabricator Diff: D33694867 (e2191e7084)

fbshipit-source-id: 5f28616700d8647cbe468a9e300724a7f0c6cc15
(cherry picked from commit 3d8125ba6d)
2022-01-22 00:09:28 +00:00
Erjia Guan
e2191e7084 Fix persistent worker exits before pin_memory thread (#71579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71579

Fixes #1551

As the comment in the code says, register a function to terminate the persistent workers, using `atexit` to make sure that termination of the persistent workers always happens at the end (after the pin_memory_thread exits).
We need such a mechanism because in some rare cases the Python interpreter would clean up the worker processes before the DataLoader iterator.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D33694867

Pulled By: ejguan

fbshipit-source-id: 0847f4d424a0cd6b3c0be8235d505415970254e8
(cherry picked from commit 18ad4621af)
2022-01-21 20:31:16 +00:00
Vitaly Fedyunin
d90012689f [DataPipe] Control shuffle settings from DataLoader2 (#65756)
Summary:
Makes `shuffle` DataPipe sensitive to DataLoader(2) `shuffle` kwarg.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65756

Reviewed By: albanD

Differential Revision: D31344867

Pulled By: VitalyFedyunin

fbshipit-source-id: e0084e0ac193ac784d6298328ca1222745681347
2021-12-14 07:35:26 -08:00
Kevin Tse
39fb855d91 [DataLoader] Implementing communication processes for Map-style DataPipes (#68549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68549

cc SsnL VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32922676

Pulled By: NivekT

fbshipit-source-id: fd918a342214d617a489ac5acffff15b55e9b255
2021-12-08 07:27:01 -08:00
Santiago Castro
f776f30780 Keep the sequence or mapping type in default_collate (#68779)
Summary:
`default_collate`, `default_convert`, and `pin_memory` convert sequences into lists. I believe they should keep the original type when possible (e.g., I have a class that inherits from `list`, which comes from a 3rd party library that I can't change, and provides extra functionality).

Note it's easy to do when the type supports an iterable in its creation but it's not always the case (e.g., `range`).

Even though this can be accomplished with a custom `default_collate`/`default_convert`, 1) this is behavior they should support out-of-the-box IMHO, and 2) `pin_memory` still does it.
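
A minimal sketch of the requested behavior, using the now-public `default_collate` (in older releases it lives under `torch.utils.data._utils.collate`):

```py
import torch
from torch.utils.data import default_collate

class MyList(list):  # e.g. a list subclass from a 3rd-party library
    pass

samples = [MyList([torch.tensor(0.0)]), MyList([torch.tensor(1.0)])]
batch = default_collate(samples)
assert isinstance(batch, MyList)  # the sequence type is preserved
```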

cc VitalyFedyunin ejguan NivekT

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68779

Reviewed By: wenleix

Differential Revision: D32651129

Pulled By: ejguan

fbshipit-source-id: 17c390934bacc0e4ead060469cf15dde815550b4
2021-11-29 13:14:20 -08:00
Jane Xu
39215ddf84 [skip ci] Set test owners for dataloader tests (#66839)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc SsnL VitalyFedyunin ejguan NivekT

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66839

Reviewed By: ejguan

Differential Revision: D31761722

Pulled By: janeyx99

fbshipit-source-id: 8315ac03352c11b3215d89856b3cfda6cd78fa0c
2021-10-19 08:31:16 -07:00
Michael Suo
9d13ae450a [oss/ci] skip all dataloader tests with asan (#66561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66561

See https://github.com/pytorch/pytorch/issues/66223 for context.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D31617142

Pulled By: suo

fbshipit-source-id: 16b280fc47a7c40fa19c5c72192d342dd33680bf
2021-10-13 11:39:41 -07:00
Michael Suo
213c3f45da [oss/ci] skip TestDataLoaderPersistentWorkers on ASAN (#66236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66236

it's flaky, see https://github.com/pytorch/pytorch/issues/66223

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31462056

Pulled By: suo

fbshipit-source-id: f4362a8020dc05ac8856706c0508d48be026eeb8
2021-10-06 17:56:19 -07:00
Erjia Guan
b777d790ea Convert Sampler back to lazy construction (#63646)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63646

Fixes #63609

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D30451774

Pulled By: ejguan

fbshipit-source-id: 550d77494326446d1a42b5da0559e0d384c47413
2021-09-30 07:32:06 -07:00
Nikita Shulga
4a7a0ea42e Skip flaky ASAN tests (#65792)
Summary:
See https://github.com/pytorch/pytorch/issues/65727

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65792

Reviewed By: janeyx99

Differential Revision: D31254490

Pulled By: malfet

fbshipit-source-id: 76714db30a5566fbab95179236ccdafab22cf551
2021-09-28 22:33:02 -07:00
Nikita Shulga
145202c45b Define timeout in TestIndividualWorkerQueue (#65742)
Summary:
This test occasionally deadlocks while waiting for the child process to report a result.
As the test is small, the entire test should never take more than 1-2 seconds, but to be on the safe side the timeout is set to 5 seconds.

Somewhat mitigates https://github.com/pytorch/pytorch/issues/65727

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65742

Reviewed By: janeyx99, ejguan

Differential Revision: D31235116

Pulled By: malfet

fbshipit-source-id: 0cdd2f7295f6f9fcefee954a14352e18fae20696
2021-09-28 10:01:19 -07:00
Hong Xu
fb8bdb8039 When testing set_affinity, don't hardcode the CPU ID (#65042)
Summary:
The set_affinity test always fails when the number of CPUs is smaller
than 3. Changed the test to be based dynamically on the number of CPUs
of the system.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65042

Reviewed By: jbschlosser

Differential Revision: D30960554

Pulled By: ejguan

fbshipit-source-id: 55ac12714b4b0964b48c3617b79a7a345d40ebce
2021-09-15 08:10:59 -07:00
Rishi Puri
2ae938e15e Fixes failure in test_dataloader.py that occurs on jetson boards (#64757)
Summary:
CUDA IPC is not supported on Jetson boards

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64757

Reviewed By: jbschlosser

Differential Revision: D30900593

Pulled By: ejguan

fbshipit-source-id: c6b2e8a9746276fdb4a009b6412e47cc8aac69f2
2021-09-13 10:11:04 -07:00
Erjia Guan
3cd0a4ac15 Fix test_ind_worker_queue by setting max_num_worker based on system resource (#63779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63779

Fixes #63657

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30494185

Pulled By: ejguan

fbshipit-source-id: d1bd24299b25d589889604aaf18ad347bdff4df4
2021-09-02 12:36:56 -07:00
Vitaly Fedyunin
82174330d0 [DataLoader2] Adding Messages, Protocols, Loop wrappers (#63882)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63882

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30627452

Pulled By: VitalyFedyunin

fbshipit-source-id: 561ea2df07f3572e04401171946154024126387b
2021-08-30 07:57:20 -07:00
Erjia Guan
ad47fb8858 Rename IterableAsDataPipe to IterableWrapper (#63981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63981

Rename `IterableAsDataPipe` to `IterableWrapper` based on our naming convention `Op-er`

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30554197

Pulled By: ejguan

fbshipit-source-id: c2eacb20df5645d83ca165d6a1591f7e4791990f
2021-08-26 10:23:25 -07:00
Vitaly Fedyunin
e1bdebf685 Adding DataLoader2 class as future replacement of DataLoader (#63742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63742

Supports sharding and batching at the loader level.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30494506

Pulled By: VitalyFedyunin

fbshipit-source-id: 6648e09d955055ac38e3a4e3973f701acefca762
2021-08-23 18:09:07 -07:00
Alban Desmaison
71da114412 Revert D30426527: Adding DataLoader2 class as future replacement of DataLoader
Test Plan: revert-hammer

Differential Revision:
D30426527 (5a7133b87f)

Original commit changeset: e5905d3364c4

fbshipit-source-id: 794d8a4e9256ccff8cf894aee10eff6adc30d502
2021-08-20 12:06:52 -07:00
Vitaly Fedyunin
5a7133b87f Adding DataLoader2 class as future replacement of DataLoader (#63523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63523

Supports sharding and batching at the loader level.
* #63522 Adding IterableAsDataPipe IterDataPipe, useful for tests and simple cases

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30426527

Pulled By: VitalyFedyunin

fbshipit-source-id: e5905d3364c4880e720dd62fb066f08881c71a6e
2021-08-20 09:01:55 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
DamonDeng
53489bc385 fix for #60319 , forcing to use fork as start method in test/test_dat… (#60868)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/60319, forcing fork as the start method in test/test_dataloader.py.

Fixes #60319

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60868

Reviewed By: mruberry

Differential Revision: D29432876

Pulled By: ejguan

fbshipit-source-id: 5da25f7cfaf8ea0803c0b1aacf2badd656799e16
2021-06-29 09:30:37 -07:00
Rong Rong (AI Infra)
510334f34b [BE] clean up IS_PYTORCH_CI and IN_CI (#60279)
Summary:
`IS_PYTORCH_CI` and `IN_CI` are used inconsistently; however, in some cases `IN_CI` is not currently set because it only exists in .circleci/scripts/setup_ci_environment.sh. This cleans up the two flags and uses only `IN_CI`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60279

Test Plan: CI

Reviewed By: seemethere

Differential Revision: D29239545

Pulled By: walterddr

fbshipit-source-id: a069424a2bb8790a3adfdaf0dc460301026bf8c7
2021-06-20 19:45:07 -07:00
TJ-coding
7c29ca7f2b Fix Subset of a Subset not sliceable issue (#59513)
Summary:
A Dataset can be indexed by a list, but a list cannot be indexed by a list. This raises an error when slicing a Subset that was initialised with another Subset instead of a dataset.

Fixed the issue by changing the indices to a Tensor, which can be indexed by a list.

Fixes https://github.com/pytorch/pytorch/issues/59512

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59513

Reviewed By: zou3519

Differential Revision: D29196891

Pulled By: ejguan

fbshipit-source-id: ccde6e474fbcbddd2e9c7c107bc8b5de1307cdb9
2021-06-18 07:07:34 -07:00
Erjia Guan
3b977a0d28 [DataLoader] Add generate_state for NumPy seeding (#56797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56797

After adding a default seeding strategy for the NumPy random module within each DataLoader worker (#56488), two concerns were raised:
- We dropped support for NumPy < 1.17 due to `SeedSequence`
- In order to support seeding for NumPy < 1.17, how can we provide a seed for `numpy.random`?
  - The first option is to set the same seed as `random`. The problem is that `numpy.random` and `random` share the same algorithm, so with the same seed they produce exactly the same state sequence. Thanks to rkern, we noticed these so-called [bad things](https://github.com/PyTorchLightning/pytorch-lightning/pull/6960#issuecomment-818393659).
  - Considering that most users are not aware of this problem, we can provide a better default seed for `numpy.random` using the same `SeedSequence` algorithm as NumPy. This is just a workaround with a hard-coded function that generates an array of four int32 values as the seed.

To cope better with this problem, since many third-party libraries (not just NumPy) have random modules, we may eventually need to implement a `SeedSequence` within the `torch.random` module, so that users can `spawn` a new `SeedSequence` for each library.
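
For reference, the commonly documented workaround for explicit per-worker NumPy seeding; this is the general pattern, not the code added by this PR:

```py
import numpy as np
import torch
from torch.utils.data import DataLoader

def seed_worker(worker_id: int) -> None:
    # Derive a NumPy seed from the per-worker torch seed so that
    # numpy.random and torch do not share an identical state sequence.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)

loader = DataLoader(range(10), num_workers=2, worker_init_fn=seed_worker)
```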

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D28000619

Pulled By: ejguan

fbshipit-source-id: 5701c8124a38ea5ded69eb8eee70f9680877ffa6
2021-04-27 08:14:02 -07:00
Yukio Siraichi
93bf0ae6fc Remove legacy constructor calls from pytorch codebase. (#54142)
Summary:
Follow up from https://github.com/pytorch/pytorch/issues/53889
Related to https://github.com/pytorch/pytorch/issues/47112

Removing every occurrence of the legacy constructor call present in PyTorch at:
- _docs_
- _benchmarks_
- _test_
- _caffe2_
- _CONTRIBUTING.md_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54142

Reviewed By: ngimel

Differential Revision: D27699450

Pulled By: mruberry

fbshipit-source-id: 530aa3f5746cc8bc1407d5d51b2bbd8075e30546
2021-04-11 15:45:17 -07:00
Winston Smith
8ed20b3f65 Leak Caffe2 threadpool in child processes right after fork to prevent segfault (#54895)
Summary:
## Problem summary
Fixes https://github.com/pytorch/pytorch/issues/54752: when the number of threads is more than 3 and at least one `set_num_threads` invocation has taken place before the dataloader forks child processes, `set_num_threads(1)` in a child process causes a segfault. During its invocation, the child process is made to handle the data structures of the parent process's Caffe2 thread-pool, which it inherits from the parent (those threads don't exist in the child process, but some of their data structures do, due to the copy-on-write semantics of `fork`).

## Solution
malfet [advised](https://github.com/pytorch/pytorch/issues/54752#issuecomment-810315302) & [authored code](https://github.com/pytorch/pytorch/pull/54895#pullrequestreview-625670122) for adding a `pthread_atfork` handler in `pytorch/caffe2/utils/threadpool/pthreadpool-cpp.cc` that is invoked in the child process right after fork, to leak the Caffe2 thread-pool (the child inherits the thread-pool's data structures from its parent process but doesn't actually have those threads, since after `fork` a child process has only one thread).

## Additional changes
Added unit test `test_no_segfault` in `test_dataloader.py` to check for this issue.
Also enabled `test_segfault` (which makes sure that segfaults do happen in worker processes in a particular case).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54895

Reviewed By: zhangguanheng66

Differential Revision: D27542253

Pulled By: malfet

fbshipit-source-id: 10f9c67ce1ff1aa37d3efebf405bd93f7f9d2489
2021-04-03 10:51:20 -07:00
Nikita Shulga
f2689b1e13 Make ideep honor torch.set_num_thread changes (#53871)
Summary:
When compiled with OpenMP support, `ideep`'s computational_cache would cache the max number of OpenMP workers.
This number could be wrong after a `torch.set_num_threads` call, so clean it after the call.

Fixes https://github.com/pytorch/pytorch/issues/53565

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53871

Reviewed By: albanD

Differential Revision: D27003265

Pulled By: malfet

fbshipit-source-id: 1d84c23070eafb3d444e09590d64f97f99ae9d36
2021-03-13 11:20:44 -08:00
Yi Zhang
fd582af06c enable coverage test for dataloader on Windows (#52550)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50661

Under coverage, the class's qualified name is `'SimpleCustomBatch': <class '__mp_main__.SimpleCustomBatch'>`, while under pytest it is `'SimpleCustomBatch': <class 'test_dataloader.SimpleCustomBatch'>`, so the class was initially moved to a separate file.

![image](https://user-images.githubusercontent.com/16190118/108611869-d6b51f80-741d-11eb-908e-be7a64da916d.png)

Per malfet's suggestion, `__import__` is used instead, to avoid adding a new file.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52550

Reviewed By: walterddr

Differential Revision: D26754023

Pulled By: malfet

fbshipit-source-id: 34b0fbe7336b9303cedc28ec6116ab752a2d3630
2021-03-02 18:40:47 -08:00
Kyle Chen
a9f7ae5357 [ROCm] Enable test cases in test/test_dataloader.py for ROCm (#52766)
Summary:
Enabling test cases in test_dataloader.py for ROCm because they are passing now.

Signed-off-by: Kyle Chen <kylechen@amd.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52766

Reviewed By: H-Huang

Differential Revision: D26706402

Pulled By: ngimel

fbshipit-source-id: 63d4ea6d9b16f6244eb0f0f8f7a957bac8469111
2021-03-01 13:32:35 -08:00
Erjia Guan
89b1053413 [DataLoader] Move BufferedShuffle from Dataset to DataPipe (#52141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52141

Remove BufferShuffleDataSet, as it's not being used anywhere within PyTorch (no usage on Github based on a search) and it's not included in the release of PyTorch 1.7.1.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D26710940

Pulled By: ejguan

fbshipit-source-id: 90023b4bfb105d6aa392753082100f9181ecebd0
2021-03-01 12:54:44 -08:00
Chester Liu
58eb23378f Clean up usage of torch._six partially (#49785)
Summary:
See https://github.com/pytorch/pytorch/issues/42919

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49785

Reviewed By: mruberry

Differential Revision: D25963833

Pulled By: bugra

fbshipit-source-id: 11c90d6b8d3f206c9d0a4d8621b773beb10c6ba2
2021-02-08 13:58:34 -08:00
Tongzhou Wang
54ce171f16 Fix persistent_workers + pin_memory (#48543)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48370 https://github.com/pytorch/pytorch/issues/47445

cc emcastillo who authored the original functionality.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48543

Reviewed By: bdhirsh

Differential Revision: D25277474

Pulled By: ejguan

fbshipit-source-id: 1967002124fb0fff57caca8982bc7df359a059a2
2021-01-08 07:04:10 -08:00
Hugo van Kemenade
473e78c0fa Remove redundant code for unsupported Python versions (#49486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49486

Remove code for Python 3.5 and lower.

There's more that can be removed/modernised, but sticking mainly to redundant version checks here, to keep the diff/PR smaller.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46579

Reviewed By: zou3519

Differential Revision: D24453571

Pulled By: ezyang

fbshipit-source-id: c2cfcf05d6c5f65df64d89c331692c9aec09248e
2021-01-06 12:45:46 -08:00
Pritam Damania
21c38e1799 Additional validation for DistributedSampler. (#48865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48865

If DistributedSampler was provided an invalid rank (ex:
https://discuss.pytorch.org/t/distributed-datasets-on-multi-machines/105113),
it failed with a cryptic assertion failure.

To fix this issue, I've added an additional check to DistributedSampler to
validate that a valid rank is provided.
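
A sketch of the failure mode this guards against (the error message is paraphrased):

```py
from torch.utils.data.distributed import DistributedSampler

# rank 5 is outside [0, num_replicas - 1]; with this change the sampler
# raises a clear ValueError instead of a cryptic assertion failure later
sampler = DistributedSampler(list(range(8)), num_replicas=2, rank=5)
```
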
ghstack-source-id: 117906769

Test Plan:
1) waitforbuildbot
2) Unit test added.

Reviewed By: malfet

Differential Revision: D25344945

fbshipit-source-id: 7685e00c8b2c200efbd2949fb32ee32ea7232a08
2020-12-11 17:22:22 -08:00
SsnL
4abca9067b Fix dataloader hang with large sampler (#48669)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48669

Reviewed By: zhangguanheng66

Differential Revision: D25255763

Pulled By: VitalyFedyunin

fbshipit-source-id: d06421f52bb1d00cdf8025f1a2ba0d1f9284731a
2020-12-02 09:07:30 -08:00
lixinyu
67b7e751e6 add warning if DataLoader is going to create an excessive number of threads (#46867)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46867

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24545540

Pulled By: glaringlee

fbshipit-source-id: a3bef0d417e535b8ec0bb33f39cfa2308aadfff0
2020-10-30 07:54:23 -07:00
Nikita Shulga
f363a2e106 Mark top 3 slowest tests as slow (#46068)
Summary:
`TCPStoreTest.test_numkeys_delkeys` takes 5+ min (mostly in idle wait for socket timeout)
`TestDataLoader.test_proper_exit` and `TestDataLoaderPersistentWorkers.test_proper_exit` take 2.5 min each
`TestXNNPACKConv1dTransformPass.test_conv1d_with_relu_fc` takes 2 min to finish

Add option to skip reporting test classes that run for less than a second to `print_test_stats.py` and speed up `TestTorchDeviceTypeCUDA.test_matmul_45724_cuda`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46068

Reviewed By: mruberry

Differential Revision: D24208660

Pulled By: malfet

fbshipit-source-id: 780e0d8be4f0cf69ea28de79e423291a1f3349b7
2020-10-08 21:10:03 -07:00
Erjia Guan
96540e918c Add ShuffleDataset with buffer (#45290)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45290

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D24001084

Pulled By: erjia-guan

fbshipit-source-id: d8a7455cf3f18e1f8c1edc53c42c1a99c8573c51
2020-09-30 07:58:15 -07:00
Akihiro Nitta
84949672bf Fix exception chaining in test/ (#44193)
Summary:
## Motivation
This PR fixes https://github.com/pytorch/pytorch/issues/43770 and is the continuation of https://github.com/pytorch/pytorch/issues/43836.

## Description of the change
This PR fixes exception chaining only in files under `test/` where appropriate.
To fix exception chaining, I used either:
1. `raise new_exception from old_exception` where `new_exception` itself seems not descriptive enough to debug or `old_exception` delivers valuable information.
2. `raise new_exception from None` where raising both of `new_exception` and `old_exception` seems a bit noisy and redundant.
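
A minimal, self-contained illustration of the two patterns:

```py
import traceback

def load(path):
    raise OSError("disk error")  # stand-in for a real failure

# 1. `raise new from old`: the original exception carries useful context
try:
    try:
        load("cfg.yaml")
    except OSError as exc:
        raise RuntimeError("failed to load config") from exc
except RuntimeError:
    traceback.print_exc()  # prints both tracebacks, chained

# 2. `raise new from None`: the original would only add noise
try:
    try:
        {}["missing"]
    except KeyError:
        raise ValueError("unknown key: 'missing'") from None
except ValueError:
    traceback.print_exc()  # prints only the new exception
```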

## List of lines containing `raise` in `except` clause:
I wrote [this simple script](https://gist.github.com/akihironitta/4223c1b32404b36c1b349d70c4c93b4d) using [ast](https://docs.python.org/3.8/library/ast.html#module-ast) to list the lines that `raise` inside an `except` clause.

- [x] f8f35fddd4/test/test_cpp_extensions_aot.py (L16)
- [x] f8f35fddd4/test/test_jit.py (L2503)
- [x] f8f35fddd4/test/onnx/model_defs/word_language_model.py (L22)
- [x] f8f35fddd4/test/onnx/verify.py (L73)
- [x] f8f35fddd4/test/onnx/verify.py (L110)
- [x] f8f35fddd4/test/onnx/test_verify.py (L31)
- [x] f8f35fddd4/test/distributed/test_c10d.py (L255)
- [x] f8f35fddd4/test/distributed/test_c10d.py (L2992)
- [x] f8f35fddd4/test/distributed/test_c10d.py (L3025)
- [x] f8f35fddd4/test/distributed/test_c10d.py (L3712)
- [x] f8f35fddd4/test/distributed/test_distributed.py (L3180)
- [x] f8f35fddd4/test/distributed/test_distributed.py (L3198)
- [x] f8f35fddd4/test/distributed/test_data_parallel.py (L752)
- [x] f8f35fddd4/test/distributed/test_data_parallel.py (L776)
- [x] f8f35fddd4/test/test_type_hints.py (L151)
- [x] f8f35fddd4/test/test_jit_fuser.py (L771)
- [x] f8f35fddd4/test/test_jit_fuser.py (L773)
- [x] f8f35fddd4/test/test_dispatch.py (L105)
- [x] f8f35fddd4/test/test_distributions.py (L4738)
- [x] f8f35fddd4/test/test_nn.py (L9824)
- [x] f8f35fddd4/test/test_namedtensor.py (L843)
- [x] f8f35fddd4/test/test_jit_fuser_te.py (L875)
- [x] f8f35fddd4/test/test_jit_fuser_te.py (L877)
- [x] f8f35fddd4/test/test_dataloader.py (L31)
- [x] f8f35fddd4/test/test_dataloader.py (L43)
- [x] f8f35fddd4/test/test_dataloader.py (L365)
- [x] f8f35fddd4/test/test_dataloader.py (L391)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44193

Reviewed By: albanD

Differential Revision: D23681529

Pulled By: malfet

fbshipit-source-id: 7c2256ff17334625081137b35baeb816c1e53e0b
2020-09-14 14:20:16 -07:00
Emilio Castillo
5472426b9f Reset DataLoader workers instead of creating new ones (#35795)
Summary:
This PR needs discussion as it changes the behavior of `DataLoader`. It can be closed if it's not considered good practice.

Currently, the `DataLoader` spawns a new `_BaseDataLoaderIter` object every epoch. In the case of the multiprocess DataLoader, the worker processes are re-created every epoch, and each makes a copy of the original `Dataset` object. If users want to cache data or do some tracking on their datasets, all that data is wiped out every epoch. Notice that this doesn't happen when the number of workers is 0, which gives some inconsistency between the multiprocess and serial data loaders.

This PR keeps the `_BaseDataLoaderIter` object alive and just resets it between epochs, so the workers remain active, and so do their own `Dataset` objects. People seem to file issues about this often.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/35795

Reviewed By: ailzhang

Differential Revision: D23426612

Pulled By: VitalyFedyunin

fbshipit-source-id: e16950036bae35548cd0cfa78faa06b6c232a2ea
2020-09-01 11:48:00 -07:00
Nikita Shulga
2b70f82737 fix typo in test_dataloader test_multiprocessing_contexts (take 2) (#43588)
Summary:
2nd attempt to land https://github.com/pytorch/pytorch/pull/43343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43588

Reviewed By: seemethere

Differential Revision: D23332284

Pulled By: malfet

fbshipit-source-id: d78faf468c56af2f176dbdd2ce4bd51f0b5df6fd
2020-08-25 21:11:53 -07:00
George Guanheng Zhang
9420c773d0 Revert D23299452: [pytorch][PR] fix typo in test_dataloader test_multiprocessing_contexts
Test Plan: revert-hammer

Differential Revision:
D23299452 (6a2d7a05c4)

Original commit changeset: 9489c48b83bc

fbshipit-source-id: e8c15d338dd89d8e92f3710e9cf149149bd2e763
2020-08-25 12:34:49 -07:00
Jeff Daily
6a2d7a05c4 fix typo in test_dataloader test_multiprocessing_contexts (#43343)
Summary:
https://github.com/pytorch/pytorch/issues/22990 added a multiprocessing_context argument to DataLoader, but a typo in the test causes the wrong DataLoader class to be used.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43343

Reviewed By: glaringlee

Differential Revision: D23299452

Pulled By: malfet

fbshipit-source-id: 9489c48b83bce36f46d350cad902f7ad96e1eec4
2020-08-25 09:36:56 -07:00
yl-to
1b55e2b043 add prefetch_factor for multiprocessing prefetching process (#41130)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40604
Add a parameter to the DataLoader to configure the per-worker prefetch count.
Before this change, the loader always prefetched 2 * num_workers data items; this commit makes that configurable, e.g. you can specify prefetching 10 * num_workers data items.
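
For example, with the new keyword (counted per worker):

```py
from torch.utils.data import DataLoader

# each of the 4 workers keeps 10 batches in flight instead of the default 2
loader = DataLoader(range(1000), num_workers=4, prefetch_factor=10)
```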

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41130

Reviewed By: izdeby

Differential Revision: D22705288

Pulled By: albanD

fbshipit-source-id: 2c483fce409735fef1351eb5aa0b033f8e596561
2020-07-24 08:38:13 -07:00
Daiming Yang
ad7133d3c1 Patch for #40026 RandomSampler generates samples one at a time when replacement=True (#41682)
Summary:
Fix https://github.com/pytorch/pytorch/issues/32530
Fix/Patch https://github.com/pytorch/pytorch/pull/40026

Resubmit this patch and fix the type error.

Force the input type to `manual_seed()` in `sampler.py` to be `int`.

ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41682

Reviewed By: izdeby

Differential Revision: D22665477

Pulled By: ezyang

fbshipit-source-id: 1725c8aa742c31e74321f20448f4b6a392afb38d
2020-07-22 13:45:09 -07:00
Shen Li
86590f226e Revert D22519869: [pytorch][PR] RandomSampler generates samples one at a time when replacement=True
Test Plan: revert-hammer

Differential Revision:
D22519869 (09647e1287)

Original commit changeset: be6585002586

fbshipit-source-id: 31ca5ceb24dd0b291f46f427a6f30f1037252a5d
2020-07-16 12:59:10 -07:00
Daiming Yang
09647e1287 RandomSampler generates samples one at a time when replacement=True (#40026)
Summary:
Fix https://github.com/pytorch/pytorch/issues/32530

I used the next() function to generate samples one at a time. To handle the replacement=False case, I added a variable called "sample_list" to RandomSampler for the random permutation.

cc SsnL

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40026

Reviewed By: zhangguanheng66

Differential Revision: D22519869

Pulled By: ezyang

fbshipit-source-id: be65850025864d659a713b3bc461b25d6d0048a2
2020-07-16 11:42:32 -07:00
Wojciech Baranowski
0e09511af9 type annotations for dataloader, dataset, sampler (#39392)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/38913

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39392

Reviewed By: anjali411

Differential Revision: D22102489

Pulled By: zou3519

fbshipit-source-id: acb68d9521145f0b047214d62b5bdc5a0d1b9be4
2020-07-07 07:16:18 -07:00
X Wang
063d5b0d3f Remove get_fail_msg in test_dataloader.test_proper_exit (#40745)
Summary:
Close https://github.com/pytorch/pytorch/issues/40744
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40745

Reviewed By: ezyang

Differential Revision: D22308972

Pulled By: colesbury

fbshipit-source-id: 4b4847e6b926b2614c8b14f17a9db3b0376baabe
2020-07-06 07:48:32 -07:00
Linyuan Gong
0a75234934 Allow np.memmap objects (numpy arrays based on files) to be processed… (#39847)
Summary:
Allow np.memmap objects to be processed by default_collate

np.memmap objects have the same behavior as numpy arrays; the only difference is that they are stored in a binary file on disk. However, the default_collate function used by the PyTorch DataLoader accepted only np.ndarray and rejected np.memmap via type checking. This commit allows np.memmap objects to be processed by default_collate, so users can use large on-disk arrays with the PyTorch DataLoader.
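
A minimal sketch, writing a small memmap to a temporary file so the example is self-contained (`default_collate` is importable from `torch.utils.data` in recent releases):

```py
import tempfile

import numpy as np
from torch.utils.data import default_collate

with tempfile.NamedTemporaryFile() as f:
    arr = np.memmap(f.name, dtype=np.float32, mode="w+", shape=(4, 3))
    arr[:] = 1.0
    batch = default_collate([arr[0], arr[1]])  # collated into a 2x3 tensor
    assert batch.shape == (2, 3)
```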
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39847

Reviewed By: ezyang

Differential Revision: D22284650

Pulled By: zou3519

fbshipit-source-id: 003e3208a2afd1afc2e4640df14b3446201e00b4
2020-06-30 15:00:20 -07:00
Tongzhou Wang
23db54acdf [DataLoader] add repr for WorkerInfo (#39975)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39975

Differential Revision: D22039414

Pulled By: ezyang

fbshipit-source-id: 230f68a91fca901bce652fdf88ba88167f39b978
2020-06-16 08:19:32 -07:00
ShawnZhong
c8c53c802e Add generator= kwarg for DataLoader & random samplers (#39737)
Summary:
Fix https://github.com/pytorch/pytorch/issues/39572

Add `generator=` kwarg for DataLoader & random samplers

cc: SsnL, deeppatel4557, albanD, mitar
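
A short usage sketch:

```py
import torch
from torch.utils.data import DataLoader, RandomSampler

g = torch.Generator()
g.manual_seed(42)

# seed only the shuffling, without touching the global RNG state
loader = DataLoader(range(10), shuffle=True, generator=g)
sampler = RandomSampler(range(10), generator=g)
```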
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39737

Differential Revision: D22019132

Pulled By: albanD

fbshipit-source-id: 835e08b86c5396bc0b0e41057661306b15394d6e
2020-06-15 07:01:20 -07:00
Daiming Yang
0b90b9cdd3 Allow shuffle when auto-batching disabled in DataLoader (#39865)
Summary:
Fix https://github.com/pytorch/pytorch/issues/35761
cc SsnL

Note: closed the other PR for this new branch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39865

Differential Revision: D22003612

Pulled By: ezyang

fbshipit-source-id: 26aecd1b298fe99d3924f4c8157cd6cae2561c7c
2020-06-11 15:17:46 -07:00
Hong Xu
283a3ff16d The exception raised when RandomSampler.replacement is non-boolean should be TypeError (#36547)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36547

Differential Revision: D21818752

Pulled By: ezyang

fbshipit-source-id: 7502a24a0df134c44ac72959ba992777c873f8e9
2020-06-02 06:54:02 -07:00
Donna Choi
3d2fce6bc3 Change len(DataLoader) for IterableDataset (#38925)
Summary:
Fix https://github.com/pytorch/pytorch/issues/36176

One-liner change to ensure that ```len(loader) == (len(dataset) // batch_size)``` for IterableDataset.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38925

Differential Revision: D21731587

Pulled By: ezyang

fbshipit-source-id: 59a086165a004c0c1c8a1ee0776b1444bd26de23
2020-05-27 11:56:41 -07:00
Donna Choi
8c07a98adc Error out of default_collate for lists of unequal size (#38492)
Summary:
Fix issue https://github.com/pytorch/pytorch/issues/23141

In the below example ```default_collate``` collates each element of the list. Since the second element isn't present in all samples, it is discarded:
```
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import numpy as np

class CustomDataset(Dataset):
    def __len__(self):
        return 2

    def __getitem__(self, idx):
        tmp = {
            "foo": np.array([1, 2, 3]),
            "bar": ["X"] * (idx+1),
        }

        return tmp

training = CustomDataset()

for batch in DataLoader(training, batch_size=2):
    print(batch)
```
Yields
```
{
  'foo': tensor(
    [
      [1, 2, 3],
      [1, 2, 3]
    ]
  ),
  'bar': [
      ('X', 'X'),
    ]
}
```

Based on discussion in the issue, it seems the best course of action is to error out in this case. This seems consistent with what is done for tensor elements, as seen in [TensorShape.cpp line 1066](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/TensorShape.cpp#L1060) which is called when ```torch.stack``` is called. In this PR, I introduce a similar message to error out for lists.

SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38492

Differential Revision: D21620396

Pulled By: ezyang

fbshipit-source-id: 17f59fbb1ed1f0d9b2185c95b9ebe55ece701b0c
2020-05-18 14:53:33 -07:00
SsnL
b5868b2833 Relax sampler check in BatchSampler (#38403)
Summary:
Since the check was added in https://github.com/pytorch/pytorch/pull/6249, one cannot pass an iterable as a sampler to the data loader anymore, which was a very handy feature (e.g., https://github.com/pytorch/pytorch/issues/1337). I think the check should be removed for two reasons:
1. It is too strict. There is no reason that it should not be a general iterable.
2. It is inconsistent. In `DataLoader` (the main place where people use samplers), you can pass a general iterable as `batch_sampler` but not `sampler` due to this check.
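
For example, with the check relaxed, a plain iterable works as a sampler (a sketch):

```py
from torch.utils.data import DataLoader

data = list(range(10))
# iterate the dataset in reverse order using a plain range as the sampler
loader = DataLoader(data, sampler=range(len(data) - 1, -1, -1))
```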
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38403

Differential Revision: D21555958

Pulled By: soumith

fbshipit-source-id: c7267bb99a31edd8f2750689205d6edc5dab5cff
2020-05-13 22:24:29 -07:00
Vitaly Fedyunin
57d01be92b Replacing assertEqual with assertEqualIgnoreType wherever types mismatch (#38102)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38102

Test Plan: Imported from OSS

Differential Revision: D21477060

Pulled By: VitalyFedyunin

fbshipit-source-id: 25e0fd837ca9bfccf0ce994c80f7790c894096d4
2020-05-09 14:48:55 -07:00
David Reiss
e75fb4356b Remove (most) Python 2 support from Python code (#35615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615

Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace change might be helpful).

Test Plan: CI

Differential Revision: D20842886

Pulled By: dreiss

fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
2020-04-22 09:23:14 -07:00
Wojciech Baranowski
69e3ee2d5f DataLoader: properly diagnose exceeding file descriptor limit (#34768)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/973

Common failure scenario:
* DataLoader creates workers and communicates with them through SHMs
* Workers send back through an AF_UNIX socket file descriptors to SHMs containing data
* The limit of open files gets fully used
* A FD gets stripped from a socket message coming back from a worker, without the worker knowing this.
* This causes a `RuntimeError: received 0 items of ancdata` in the standard `multiprocessing` package
* The exception is not handled by PyTorch and so is presented to the users.

After this change the user will see

```
Traceback (most recent call last):
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 761, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/home/wbaranowski/git/Quansight/pytorch/torch/multiprocessing/reductions.py", line 294, in rebuild_storage_fd
    fd = df.detach()
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/reduction.py", line 184, in recv_handle
    return recvfds(s, 1)[0]
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/reduction.py", line 162, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 787, in _try_get_data
    fs = [tempfile.NamedTemporaryFile() for i in range(10)]
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 787, in <listcomp>
    fs = [tempfile.NamedTemporaryFile() for i in range(10)]
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/tempfile.py", line 551, in NamedTemporaryFile
    (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/tempfile.py", line 262, in _mkstemp_inner
    fd = _os.open(file, flags, 0o600)
OSError: [Errno 24] Too many open files: '/tmp/tmpnx_f6v_f'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_shm_leak.py", line 56, in <module>
    worker_init_fn=worker_init_fn
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 861, in _next_data
    idx, data = self._get_data()
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 828, in _get_data
    success, data = self._try_get_data()
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 791, in _try_get_data
    "Too many open files. Communication with the"
RuntimeError: Too many open files. Communication with the workers is no longer possible. Please increase the limit using `ulimit -n` in the shell or change the sharing strategy by calling `torch.multiprocessing.set_sharing_strategy('file_system')` at the beginning of your code
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34768

Differential Revision: D20538053

Pulled By: ezyang

fbshipit-source-id: be4425cf2fa02aff61619b2b829c153cb1a867cb
2020-04-14 07:10:57 -07:00
Wanchao Liang
3526627f46 Use unittest assertWarns instead (#36411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36411

This PR removes the PyTorch-specific assertWarns and uses the unittest
one; it also formats some tests.

Test Plan: Imported from OSS

Differential Revision: D20998159

Pulled By: wanchaol

fbshipit-source-id: 1280ecff2dd293b95a639d13cc7417fc819c2201
2020-04-13 15:56:42 -07:00
Hong Xu
817e4f9ef1 Correct a ValueError in dataloader to TypeError (#36244)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36244

Differential Revision: D20963949

Pulled By: ezyang

fbshipit-source-id: 8c6aa4831021788052269e7aa8282d11eba4e085
2020-04-10 09:03:58 -07:00
Mathis Chenuet
17a01c7c7b feature: deterministic random_split (#34043)
Summary:
## 🚀 Feature
Option to provide a seed (random_state) for random_split() like the sklearn API https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html.

## Motivation
Useful for deterministic sampling & reproducible data generation (easily, without affecting the PRNG for other uses).
See https://github.com/pytorch/pytorch/issues/32467
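
The resulting API, as a short sketch:

```py
import torch
from torch.utils.data import random_split

# reproducible split that does not disturb the global PRNG
train, val = random_split(range(100), [80, 20],
                          generator=torch.Generator().manual_seed(42))
```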
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34043

Differential Revision: D20605678

Pulled By: ezyang

fbshipit-source-id: 12b10bf72cd8a0d4264ae4d326064f806945d011
2020-03-26 08:02:39 -07:00
Hong Xu
a6a72ac68f Fix all occurrences of C416. (#33429)
Summary:
C416: Unnecessary (list/set) comprehension - rewrite using list/set().

See https://pypi.org/project/flake8-comprehensions/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33429

Differential Revision: D19972858

Pulled By: ezyang

fbshipit-source-id: faac042a94c59d737bd5ae983121a0a029346e23
2020-02-21 08:32:22 -08:00
Pritam Damania
f050b16dd9 Move pytorch distributed tests to separate folder for contbuild. (#30445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445

Create distributed and rpc directories under caffe/test for better management
of unit tests.

Differential Revision: D18702786

fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
2020-01-22 21:16:59 -08:00
Brian Vaughan
945ce71b18 Correctly handle scalar types, fix parse of numpy ints (#30486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30486

Fixes: https://github.com/pytorch/pytorch/issues/29252

There is some incorrect code in the handling of parsing python numbers that led to issue #29252:

When we allow interpretation of a zero-dim numpy integer value
as a scalar in pytorch, we incorrectly parse the int as a float.

This PR also fixes the issue described in the "FIXME" here:
https://github.com/pytorch/pytorch/pull/27628/files#diff-f539198dd366265fb8dc2d661bc5d5bcR1487

Test Plan: Added a unit test based on the example given in the issue.

Differential Revision: D18932520

Pulled By: nairbv

fbshipit-source-id: f6416f28dfd73ac72c1042042851d76beb5fcf65
2019-12-11 15:35:57 -08:00
Tongzhou Wang
c37de32b23 Enable len(dataloader) for iterable dataset (#23587)
Summary:
Copy-paste comment from code for reasoning:

```
            # NOTE [ IterableDataset and __len__ ]
            #
            # For `IterableDataset`, `__len__` could be inaccurate when one naively
            # does multi-processing data loading, since the samples will be duplicated.
            # However, no real use case should be actually using that behavior, so
            # it should count as a user error. We should generally trust user
            # code to do the proper thing (e.g., configure each replica differently
            # in `__iter__`), and give us the correct `__len__` if they choose to
            # implement it (this will still throw if the dataset does not implement
            # a `__len__`).
            #
            # To provide a further warning, we track if `__len__` was called on the
            # `DataLoader`, save the returned value in `self._len_called`, and warn
            # if the iterator ends up yielding more than this number of samples.
```

Fixes https://github.com/pytorch/pytorch/issues/30184
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23587

Differential Revision: D18852625

Pulled By: ailzhang

fbshipit-source-id: aea8d4d70c7f21aaa69b35908a6f43026493d826
2019-12-06 15:38:05 -08:00
Peter Bell
dcd1216efe Force early initialization of OpenMP in forked children (#29006)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28389

Intel's OpenMP implementation sets the thread affinity on the first call to an OpenMP function after a fork. By adding an atfork handler we can force this to happen before a user tries to set the affinity in their own DataLoader `worker_init_fn`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29006

Differential Revision: D18782456

Pulled By: ezyang

fbshipit-source-id: ce0b515256da0cf18ceb125e0cdec99a3311bbd3
2019-12-03 15:23:31 -08:00
Michael Suo
4b0a6d299c test reporting (#29658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29658

This PR makes our test scripts output artifacts that CircleCI can
understand. This has a few benefits:
1. We can actually see failed tests and their output in the job screen
(instead of having to scroll through logs)
2. We can use the CircleCI test metadata API to track failed tests
programmatically.

it looks like this (old ui):
https://circleci.com/gh/pytorch/pytorch/3546584?pipelines-ui-opt-out
or this (new ui):
https://app.circleci.com/jobs/github/pytorch/pytorch/3546584/tests

Test Plan: Imported from OSS

Differential Revision: D18597261

Pulled By: suo

fbshipit-source-id: 07fc7d26bbb834e13cc4cc0e48178645ae6579f5
2019-11-19 11:15:31 -08:00
Mike Ruberry
f6bda1e07b Removes @default_floating_dtype decorator (#27628)
Summary:
One fewer legacy decorator cluttering the test suite.

Functions relying on this decorator were updated or, in the case of test_sparse, the test suite was put back on double by default.

Note: this PR is blocked on https://github.com/pytorch/pytorch/issues/27599.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27628

Differential Revision: D17896254

Pulled By: mruberry

fbshipit-source-id: 13d460301f50ef4af7a660372432108164c0de1f
2019-10-12 12:39:34 -07:00
Mike Ruberry
7f183a978f Stops common_utils.py from setting the default tensor type (to torch.DoubleTensor) (#27444)
Summary:
This PR stops common_utils.py from setting the default tensor type when it's imported. See issue https://github.com/pytorch/pytorch/issues/27355. This is a frequent source of confusion for test writers.

Many tests relied on this setting (whether they knew it or not), and this PR also updates the test suite to pass without common_utils.py setting the default tensor type. Some larger test files now set the default floating dtype themselves, however. These test files are:

- test_autograd.py
- test_distributions.py
- test_jit.py
- test_nn.py

This is still a significant improvement from today, however. First, these files set the default floating dtype much more clearly than importing it from common_utils. Second, the rest of the test suite no longer sets this globally. Third, this PR is a springboard to updating those tests, too. In particular, as tests are made generic they can be moved away from relying on this global setting.

Notable technical changes in this PR are:

- Significant updates to test_torch.py to make it pass without setting the default floating dtype globally.
- The default_floating_dtype decorator is now defined in common_utils, a couple versions of this operator were defined in test files previously.
- test_torch-specific parts of common_utils were refactored into test_torch.
- tensor creation methods in common_utils were updated to accept an optional dtype and device.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27444

Differential Revision: D17795235

Pulled By: mruberry

fbshipit-source-id: 7f77271c0c836e69f183ad9057a2c4b29f09d2e1
2019-10-08 09:52:44 -07:00
SsnL
df9d8f9032 Fix no auto batching bugs: cannot bulk load; not work with namedtuple (#26065)
Summary:
see title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26065

Differential Revision: D17392851

Pulled By: soumith

fbshipit-source-id: 468cd41c8e03d689ff2e0261d948e28daad6bfaf
2019-09-16 07:22:31 -07:00
Pritam Damania
f8611eaa7e Disable tsan for test_dataloader.py. (#25005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25005

Seeing a bunch of failures in TSAN mostly with the following error:

```
ThreadSanitizer: starting new threads after multi-threaded fork is not
supported. Dying (set die_after_fork=0 to override)
```

TSAN is unsafe to use in a multi-threaded program after fork() and setting
die_after_fork can lead to deadlocks. As a result, I'm disabling tsan.
ghstack-source-id: 88765698

Differential Revision: D16954347

fbshipit-source-id: 18895cd82b5052938284b46479d8470af2d74a06
2019-08-22 16:20:54 -07:00
Tongzhou Wang
928754b67d make more iterator attributes private (#23744)
Summary:
1. Prefixed underscores to any `DataLoaderIter` attribute that is not part of the data loader ctor argument list.
2. Prefixed `DataLoader.dataset_kind` with underscore because it only makes sense with the private enum `_DatasetKind`, and is an implementation detail.
3. Disallow setting `DataLoader.dataset` and `DataLoader.batch_sampler` after initializing a `DataLoader` because they affect other attributes in `__init__`.

These changes should not have a major BC-breaking effect, since the big changes are on the iterator class and most users don't even store it. I searched GitHub for `pin_memory_thread` and (while I didn't look through all result pages) the results I saw were forks of pytorch and blog posts on how the data loader works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23744

Differential Revision: D16732507

Pulled By: ezyang

fbshipit-source-id: 9f04d000b4200b8047f31eaa3473780b66cebd26
2019-08-09 11:43:00 -07:00
SsnL
e982e46de3 Add multiprocessing_context= argument to DataLoader (#22990)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22131
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22990

Differential Revision: D16539052

Pulled By: colesbury

fbshipit-source-id: b1c48ae2fb54065dd96a67be263254129e02eaa2
2019-07-29 12:58:40 -07:00
Jan Schlüter
0bc90194fb Catch and print exception traceback in parallel_apply() workers (#18055)
Summary:
When an exception occurs in one of the modules passed to `parallel_apply()`, it is caught and re-raised in the main thread. This preserves the original exception type and message, but has the traceback point at the position where it's re-raised, rather than the original point of failure.

This PR saves the exception information required to generate the traceback, and includes the original traceback in the message of the exception raised in the main thread.

Before:
```
  ...
  File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File ".../torch/nn/parallel/parallel_apply.py", line 84, in parallel_apply
    raise output
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```

After:
```
  ...
  File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File ".../torch/nn/parallel/parallel_apply.py", line 88, in parallel_apply
    ''.join(traceback.format_exception(*exc_info)))
RuntimeError: Caught exception in replica 0. Original traceback and message:
Traceback (most recent call last):
  ...
  File "../models/foo.py", line 319, in bar
    baz = asdf / ghij[:, np.newaxis]
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```

I took care to raise an exception of the original type (in case the main code checks for that), but replaced the message. It helped me find a bug that did not occur outside `data_parallel()`.
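A minimal standalone sketch of the save-and-re-raise technique (names are illustrative, not PyTorch's internal implementation; it assumes the original exception type accepts a single message argument):

```py
import sys
import threading
import traceback

def _worker(fn, arg, results, idx):
    try:
        results[idx] = ('ok', fn(arg))
    except Exception:
        # Capture type, value, and traceback here; re-raising in the
        # main thread alone would lose the worker-side traceback.
        results[idx] = ('error', sys.exc_info())

def parallel_run(fn, args):
    results = {}
    threads = [threading.Thread(target=_worker, args=(fn, a, results, i))
               for i, a in enumerate(args)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    outputs = []
    for i in range(len(args)):
        status, payload = results[i]
        if status == 'error':
            exc_type, exc_value, tb = payload
            # Re-raise the original type so callers checking for it still
            # match, with the worker traceback rendered into the message.
            msg = 'Caught exception in replica {}. Original traceback and message:\n{}'.format(
                i, ''.join(traceback.format_exception(exc_type, exc_value, tb)))
            raise exc_type(msg)
        outputs.append(payload)
    return outputs
```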
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18055

Differential Revision: D16444972

Pulled By: zhangguanheng66

fbshipit-source-id: ec436c9d4677fad18106a8046cfa835a20a101ce
2019-07-26 11:41:22 -07:00
Tongzhou Wang
25eae3ed08 Disable test_proper_exit flaky worker_kill (#22208)
Summary:
I learned from https://github.com/pytorch/pytorch/pull/22058 that `worker_kill` is just flaky, regardless of `hold_iter_reference`. So let's disable it altogether for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22208

Differential Revision: D15990307

Pulled By: soumith

fbshipit-source-id: d7d3f4fe7eaac4987f240cb8fd032c73a84157d7
2019-06-26 09:47:40 -07:00
Tongzhou Wang
71741ba115 rename test to be more consistent
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22057

Differential Revision: D15936870

Pulled By: soumith

fbshipit-source-id: ab6194219da2582efdf324b89b2bc87dfe4e5d69
2019-06-20 22:02:36 -07:00
Tongzhou Wang
058beae411 Add IterableDataset (#19228)
Summary:
This is a modified version of https://github.com/pytorch/pytorch/pull/14705, since the commit structure of that PR is quite messy.

1. Add `IterableDataset`.
2. So we now have two data loading modes: `Iterable` and `Map`.

    1. `Iterable` if the `dataset` is an instance of `IterableDataset`
    2. `Map` otherwise

3. Add better support for non-batch loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful for things like bulk loading.
4. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
5. Add `torch.utils.data.get_worker_info`, which returns worker information in a worker process (e.g., worker id, dataset obj copy, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` to do per-worker configuration (see the sketch after this list).
6. Add `ChainDataset`, the analog of `ConcatDataset` for `IterableDataset`.
7. Import torch.utils.data in `torch/__init__.py`.
8. Add data loader examples and documentation.
9. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`.
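A minimal sketch of points 1 and 5 together, the shard-by-worker pattern (the dataset and numbers are illustrative):

```py
import math
import torch
from torch.utils.data import DataLoader, IterableDataset, get_worker_info

class RangeIterableDataset(IterableDataset):
    """Streams integers in [start, end), sharded across workers."""
    def __init__(self, start, end):
        self.start, self.end = start, end

    def __iter__(self):
        info = get_worker_info()
        if info is None:  # single-process loading: yield the full range
            lo, hi = self.start, self.end
        else:             # in a worker: yield only this worker's shard
            per_worker = int(math.ceil((self.end - self.start) / info.num_workers))
            lo = self.start + info.id * per_worker
            hi = min(lo + per_worker, self.end)
        return iter(range(lo, hi))

loader = DataLoader(RangeIterableDataset(0, 10), num_workers=2)
print([int(x) for x in loader])  # each element comes from one worker's shard
```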

Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228

Reviewed By: bddppq

Differential Revision: D15058152

fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde
2019-06-20 20:12:44 -07:00
Iurii Zdebskyi
03617574d3 Change type of a tensor with bools (#19097)
Summary:
This is a **bc-breaking** change.
Change dtype of a tensor which was created from bool data.
Old behavior: torch.tensor([True, False]) -> uint8 tensor
Now: torch.tensor([True, False]) -> bool tensor

Tested via tests.
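A quick check of the new behavior:

```py
import torch

t = torch.tensor([True, False])
assert t.dtype == torch.bool   # previously this was torch.uint8

# bool tensors stay bool under logical ops
assert (t & torch.tensor([True, True])).dtype == torch.bool
```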
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19097

Reviewed By: ezyang

Differential Revision: D15632553

Pulled By: izdeby

fbshipit-source-id: b019150844c561a6845710a3c62b12f06b68bbe3
2019-06-05 10:19:27 -07:00
Tongzhou Wang
f051fbd4a8 Fix typo in test_dataloader
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21226

Differential Revision: D15592797

Pulled By: soumith

fbshipit-source-id: b9a83e574c7b10fb0d661332ab68e376409a4724
2019-06-01 10:30:14 -07:00
Tongzhou Wang
1d4685c20f Improve test_proper_exit error printing (#20166)
Summary:
This doesn't have `strace` yet, but it still has `faulthandler` to print stack traces when a hang is detected. Also part of an attempt to isolate changes from #19228.
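A minimal sketch of the `faulthandler` watchdog pattern (`run_possibly_hanging_test` is a hypothetical stand-in for the test body):

```py
import faulthandler
import sys

# If the test hangs past the timeout, dump all threads' stacks to
# stderr instead of failing silently; exit=False keeps the process alive.
faulthandler.dump_traceback_later(60, exit=False, file=sys.stderr)
try:
    run_possibly_hanging_test()  # hypothetical test body
finally:
    faulthandler.cancel_dump_traceback_later()
```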
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20166

Differential Revision: D15536504

Pulled By: ezyang

fbshipit-source-id: fe6e6e2e9899f30d8167436d7bc62b42883a3356
2019-05-29 07:51:31 -07:00
Tongzhou Wang
f496ea36b2 DataLoader: add error detection for worker_init_fn (#20150)
Summary:
This is an attempt to isolate unrelated changes from #19228 for easier review.
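For reference, a typical `worker_init_fn` whose failures this change surfaces; an exception raised in it is now caught in the worker and reported to the main process (the seeding scheme here is illustrative):

```py
import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id):
    # Per-worker setup; errors raised here now propagate to the
    # main process with a clear message instead of a silent hang.
    torch.manual_seed(1234 + worker_id)

loader = DataLoader(TensorDataset(torch.arange(8)),
                    num_workers=2, worker_init_fn=worker_init_fn)
```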
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20150

Differential Revision: D15314891

Pulled By: ezyang

fbshipit-source-id: 8c429747ba83ad5aca4cdd8f8086bcf65a326921
2019-05-12 18:28:56 -07:00
Tongzhou Wang
1ab33fce9a Disable worker_kill & hold_iter_reference combination in test_proper_exit (#20172)
Summary:
cc nairbv
All failures I have seen are of this combination, so let's just disable it for all cases. After #20063, I saw it fail once on py3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20172

Differential Revision: D15266527

Pulled By: nairbv

fbshipit-source-id: afb9389dfc54a0878d52975ffa37a0fd2aa3a735
2019-05-08 14:39:47 -07:00
Brian Vaughan
9005a2c0fc disable flaky test_proper_exit again, still occasionally failing (#20063)
Summary:
The test was disabled for being flaky and re-enabled in https://github.com/pytorch/pytorch/pull/19421, but it is still occasionally failing:

https://circleci.com/gh/pytorch/pytorch/1520165?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

```
Apr 29 19:51:58 ======================================================================
Apr 29 19:51:58 FAIL: test_proper_exit (__main__.TestDataLoader)
Apr 29 19:51:58 There might be ConnectionResetError or leaked semaphore warning (due to dirty process exit), but they are all safe to ignore
Apr 29 19:51:58 ----------------------------------------------------------------------
Apr 29 19:51:58 Traceback (most recent call last):
Apr 29 19:51:58   File "/var/lib/jenkins/workspace/test/common_utils.py", line 129, in wrapper
Apr 29 19:51:58     fn(*args, **kwargs)
Apr 29 19:51:58   File "test_dataloader.py", line 847, in test_proper_exit
Apr 29 19:51:58     self.fail(fail_msg + ', and had exception {}'.format(loader_p.exception))
Apr 29 19:51:58 AssertionError: test_proper_exit with use_workers=True, pin_memory=False, hold_iter_reference=False, exit_method=worker_kill: loader process did not terminate, and had exception Traceback (most recent call last):
Apr 29 19:51:58   File "test_dataloader.py", line 227, in run
Apr 29 19:51:58     super(ErrorTrackingProcess, self).run()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/process.py", line 114, in run
Apr 29 19:51:58     self._target(*self._args, **self._kwargs)
Apr 29 19:51:58   File "test_dataloader.py", line 424, in _test_proper_exit
Apr 29 19:51:58     for i, _ in enumerate(it):
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 545, in __next__
Apr 29 19:51:58     idx, batch = self._get_batch()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 522, in _get_batch
Apr 29 19:51:58     success, data = self._try_get_batch()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 480, in _try_get_batch
Apr 29 19:51:58     data = self.data_queue.get(timeout=timeout)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/queues.py", line 135, in get
Apr 29 19:51:58     res = self._recv()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 22, in recv
Apr 29 19:51:58     return pickle.loads(buf)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1382, in loads
Apr 29 19:51:58     return Unpickler(file).load()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 858, in load
Apr 29 19:51:58     dispatch[key](self)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1133, in load_reduce
Apr 29 19:51:58     value = func(*args)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/reductions.py", line 274, in rebuild_storage_fd
Apr 29 19:51:58     fd = multiprocessing.reduction.rebuild_handle(df)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 155, in rebuild_handle
Apr 29 19:51:58     conn = Client(address, authkey=current_process().authkey)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/connection.py", line 169, in Client
Apr 29 19:51:58     c = SocketClient(address)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/connection.py", line 304, in SocketClient
Apr 29 19:51:58     s.connect(address)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/socket.py", line 224, in meth
Apr 29 19:51:58     return getattr(self._sock,name)(*args)
Apr 29 19:51:58 error: [Errno 111] Connection refused

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20063

Differential Revision: D15218223

Pulled By: nairbv

fbshipit-source-id: 32018c4220f7cb9372ef138631fc3a79759265e1
2019-05-06 08:34:27 -07:00
Seungwon Park
6c7135decb fix typo: pytoch -> pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19719

Differential Revision: D15080095

Pulled By: ezyang

fbshipit-source-id: b731a0fde87d25c63c1e3d4b9a9c2244e5ad84af
2019-04-25 06:40:40 -07:00