150 Commits

Author SHA1 Message Date
erjia
ccccd0efec [DataLoader] Share seed via Distributed Store to get rid of CUDA dependency (#79829)
Fixes #79828

In a distributed environment, before this PR, DataLoader would create a Tensor holding the shared seed on RANK 0 and send the Tensor to the other processes. However, when `NCCL` is used as the distributed backend, the Tensor must be moved to CUDA before it is broadcast from RANK 0 to the other RANKs. This caused the issue: DataLoader did not move the Tensor to CUDA before sharing it via `NCCL`.

After offline discussion with @mrshenli, we think the distributed Store is a better solution as the shared seed is just an integer value. Then, we can get rid of the dependency on NCCL and CUDA when sharing info between distributed processes for DataLoader.
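
As a rough illustration of why a key-value store is enough for an integer seed, here is a minimal sketch. `InMemoryStore`, `share_seed`, and the key name are hypothetical stand-ins invented for this example; only the `set`/`get` shape loosely mirrors a distributed store, and this is not the code from the PR.

```python
class InMemoryStore:
    """Hypothetical single-process stand-in for a distributed key-value store."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Values are stored as bytes, the way a networked store would hold them.
        self._data[key] = str(value).encode()

    def get(self, key):
        return self._data[key]


SEED_KEY = "shared_seed"  # hypothetical key name


def share_seed(store, rank, seed=None):
    """Rank 0 publishes the seed; every rank reads the same integer back."""
    if rank == 0:
        store.set(SEED_KEY, seed)
    return int(store.get(SEED_KEY).decode())


store = InMemoryStore()
seed_on_rank0 = share_seed(store, rank=0, seed=42)
seed_on_rank1 = share_seed(store, rank=1)
```

No tensor is created at any point, so nothing in this exchange depends on the device or on the collective backend.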
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79829
Approved by: https://github.com/VitalyFedyunin, https://github.com/NivekT
2022-06-20 19:18:35 +00:00
erjia
04f87f2ab9 [DataLoader] Fix the world_size when distributed sharding MapDataPipe (#79524)
Fixes #79449

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79524
Approved by: https://github.com/NivekT, https://github.com/VitalyFedyunin
2022-06-14 19:03:57 +00:00
ErjiaGuan
5158a6b41a Forward fix sharding bug for DL (#79124)
This PR solves a bug introduced by #79041

`torch.utils.data.graph_settings.apply_sharding` changes the datapipe in-place and returns `None`

It would resolve the Error in TorchData. See: https://github.com/pytorch/data/actions/runs/2461030312
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79124
Approved by: https://github.com/VitalyFedyunin
2022-06-08 16:16:58 +00:00
erjia
b3ed65343d Fix sharding strategy for distributed DL (#79041)
1. Change the sharding strategy from sharding by worker first then by rank to sharding in the order of rank then workers.
2. Change to fetch Rank and World size in main process for the sake of `spawn`.

For the change 1:
Before this PR, when the dataset cannot be evenly divided by `worker_num * world_size`, workers in the first RANKs retrieve more data.
Using the following example:
- dataset size: 100
- world_size: 4
- num_worker: 2

The number of data retrieved by each rank before this PR
- Rank 0: 26
- Rank 1: 26
- Rank 2: 24
- Rank 3: 24

The number of data retrieved by each rank after this PR
- Rank 0: 25
- Rank 1: 25
- Rank 2: 25
- Rank 3: 25
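
The counts above can be reproduced with a small sketch. The two shard-ID formulas and the round-robin assignment of elements to shards are assumptions chosen because they match the numbers in this message; `items_per_rank` is a hypothetical helper, not a function from the PR.

```python
def items_per_rank(dataset_size, world_size, num_workers, strategy):
    """Count the elements each rank receives under round-robin sharding."""
    total_shards = world_size * num_workers
    counts = [0] * world_size
    for rank in range(world_size):
        for worker in range(num_workers):
            if strategy == "before":
                # Assumed old layout: each rank owns consecutive shard IDs.
                shard = rank * num_workers + worker
            else:
                # Assumed new layout: a rank's shard IDs are spread out.
                shard = worker * world_size + rank
            # Shard `shard` takes elements shard, shard + total_shards, ...
            counts[rank] += len(range(shard, dataset_size, total_shards))
    return counts


before = items_per_rank(100, 4, 2, "before")
after = items_per_rank(100, 4, 2, "after")
print(before, after)
```

With 8 shards and 100 elements, shards 0 to 3 hold 13 elements and shards 4 to 7 hold 12. Giving each rank one shard from each group is what evens out the totals.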

For the change 2:
Before this PR, `dist` functions were invoked inside worker processes. That is fine when the worker processes are forked from the parent process, because all environment variables are inherited and exposed to these `dist` functions. However, when the worker processes are spawned, they cannot access these environment variables, so the dataset is not sharded by rank.
After this PR, `_sharding_worker_init_fn` should work for both the `spawn` and `fork` cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79041
Approved by: https://github.com/VitalyFedyunin, https://github.com/NivekT
2022-06-07 20:56:32 +00:00
Vitaly Fedyunin
6fe6902f97 [DataLoader] Apply sharding settings in dist when num_workers is 0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78950

Approved by: https://github.com/ejguan, https://github.com/NivekT
2022-06-06 20:03:02 +00:00
erjia
9b6cb83b0c Make ShufflerDataPipe deterministic for persistent DL and distributed DL (#78765)
Fixes https://github.com/pytorch/data/issues/426

This PR introduces two main changes:
- It ensures the `ShufflerDataPipe` would share the same seed across distributed processes.
- Users can reset `shuffle` for persistent workers per epoch.

Detail:
- `shared_seed` is shared across distributed and worker processes. It will seed a `shared_rng` to provide seeds to each `ShufflerDataPipe` in the pipeline
- `worker_loop` now accepts a new argument of `shared_seed` to accept this shared seed.
- The `shared_seed` is attached to `_ResumeIteration` for resetting seed per epoch for `persistent worker`
- I choose not to touch `base_seed` simply for BC issue
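
The idea of one shared seed feeding a `shared_rng` that hands out per-shuffler seeds can be sketched with the standard library. `seeds_for_shufflers` and `shuffled` are hypothetical helpers, and `random.Random` stands in for whatever generator the real code uses.

```python
import random


def seeds_for_shufflers(shared_seed, num_shufflers):
    """Derive one seed per shuffler from a single shared seed."""
    shared_rng = random.Random(shared_seed)
    return [shared_rng.getrandbits(64) for _ in range(num_shufflers)]


def shuffled(data, seed):
    """Return a shuffled copy of `data`, fully determined by `seed`."""
    out = list(data)
    random.Random(seed).shuffle(out)
    return out


shared_seed = 12345
# Two processes that receive the same shared seed derive identical
# per-shuffler seeds, so their shuffle orders agree.
rank0_seeds = seeds_for_shufflers(shared_seed, num_shufflers=2)
rank1_seeds = seeds_for_shufflers(shared_seed, num_shufflers=2)
order_rank0 = shuffled(range(10), rank0_seeds[0])
order_rank1 = shuffled(range(10), rank1_seeds[0])
```

Because every process sees the same order, sharding that order by rank yields disjoint pieces, which is consistent with the "no duplicated/missing element" result reported in this message.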

I used this [script](https://gist.github.com/ejguan/d88f75fa822cb696ab1bc5bc25844f47) to test the result with `world_size=4`. Please check the result in: https://gist.github.com/ejguan/6ee2d2de12ca57f9eb4b97ef5a0e300b

You can see there isn't any duplicated/missing element for each epoch. And, with the same seed, the order of data remains the same across epochs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78765
Approved by: https://github.com/VitalyFedyunin
2022-06-06 17:24:00 +00:00
PyTorch MergeBot
129d9dbb15 Revert "Make ShufflerDataPipe deterministic for persistent DL and distributed DL (#78765)"
This reverts commit b769a0e18b.

Reverted https://github.com/pytorch/pytorch/pull/78765 on behalf of https://github.com/janeyx99 due to broke lint on trunk
2022-06-06 14:24:51 +00:00
erjia
b769a0e18b Make ShufflerDataPipe deterministic for persistent DL and distributed DL (#78765)
Fixes https://github.com/pytorch/data/issues/426

This PR introduces two main changes:
- It ensures the `ShufflerDataPipe` would share the same seed across distributed processes.
- Users can reset `shuffle` for persistent workers per epoch.

Detail:
- `shared_seed` is shared across distributed and worker processes. It will seed a `shared_rng` to provide seeds to each `ShufflerDataPipe` in the pipeline
- `worker_loop` now accepts a new argument of `shared_seed` to accept this shared seed.
- The `shared_seed` is attached to `_ResumeIteration` for resetting seed per epoch for `persistent worker`
- I choose not to touch `base_seed` simply for BC issue

I used this [script](https://gist.github.com/ejguan/d88f75fa822cb696ab1bc5bc25844f47) to test the result with `world_size=4`. Please check the result in: https://gist.github.com/ejguan/6ee2d2de12ca57f9eb4b97ef5a0e300b

You can see there isn't any duplicated/missing element for each epoch. And, with the same seed, the order of data remains the same across epochs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78765
Approved by: https://github.com/VitalyFedyunin
2022-06-06 13:36:37 +00:00
Vitaly Fedyunin
883f8ef62e [DataLoader] DataLoader now automatically applies sharding to DataPipes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78631

Approved by: https://github.com/ejguan, https://github.com/NivekT
2022-06-02 17:40:29 +00:00
Sergii Dymchenko
e8bf3a9cd4 Remove Python 2-related code from dataloader (#78594)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78594
Approved by: https://github.com/seemethere
2022-06-01 05:25:23 +00:00
erjia
365ce350cb Make ShufflerDataPipe deterministic for SP & MP DataLoader (#77741)
This is the first PR to make DataPipe deterministic.

Users should be able to use `torch.manual_seed(seed)` to control the shuffle order for the following cases:
- Directly over `DataPipe`
- For single-process DataLoader
- Multiprocessing DataLoader

Unfortunately, for distributed training, users have to run `apply_shuffle_seed` manually to make sure all distributed processes have the same shuffle order.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77741
Approved by: https://github.com/VitalyFedyunin, https://github.com/NivekT
2022-05-18 23:32:07 +00:00
Vitaly Fedyunin
edffd595c2 [DataLoader] Adding ability to use dill to pass DataPipes in multiprocessing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77288

Approved by: https://github.com/ejguan, https://github.com/NivekT
2022-05-15 23:04:03 +00:00
Michael Suo
fb0f285638 [lint] upgrade mypy to latest version
Fixes https://github.com/pytorch/pytorch/issues/75927.

Had to fix some bugs and add some ignores.

To check if clean:
```
lintrunner --paths-cmd='git grep -Il .' --take MYPY,MYPYSTRICT
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76753

Approved by: https://github.com/malfet
2022-05-03 20:51:34 +00:00
PyTorch MergeBot
3d7428d9ac Revert "[lint] upgrade mypy to latest version"
This reverts commit 9bf18aab94.

Reverted https://github.com/pytorch/pytorch/pull/76753 on behalf of https://github.com/suo
2022-05-03 20:01:18 +00:00
Michael Suo
9bf18aab94 [lint] upgrade mypy to latest version
Fixes https://github.com/pytorch/pytorch/issues/75927.

Had to fix some bugs and add some ignores.

To check if clean:
```
lintrunner --paths-cmd='git grep -Il .' --take MYPY,MYPYSTRICT
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76753

Approved by: https://github.com/malfet
2022-05-03 19:43:28 +00:00
Erjia Guan
0289ab2cec Fix data-related public API (#368)
Summary:
X-link: https://github.com/pytorch/data/pull/368

This PR aims to expose the right data-related API.

This PR also converts two public APIs to private ones:
`check_lambda_fn` -> `_check_lambda_fn`
`deprecation_warning` -> `_deprecation_warning`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76143

Reviewed By: albanD, NivekT

Differential Revision: D35798311

Pulled By: ejguan

fbshipit-source-id: b13fded5c88a533c706702fb2070c918c839dca4
(cherry picked from commit 0b534b829a2e90e1e533951c6d334fdeaa9358b9)
2022-04-21 17:27:05 -07:00
Jeeja
45bbc4c028 Update Dataloader with default parameter device (#65402)
Summary:
`pin_memory` has an optional device parameter that specifies
which device to pin memory for. With that change alone,
the DataLoader works only with the CUDA backend. To
support other backends that support pinned memory,
the DataLoader now takes the device as an optional parameter.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65402

Reviewed By: zou3519

Differential Revision: D32282204

Pulled By: VitalyFedyunin

fbshipit-source-id: e2e09876969af108d0db38af7c2d1b2f1cfa9858
(cherry picked from commit 3b76e151964fce442e27fe8fb5c37af930da4fa1)
2022-04-21 01:33:53 +00:00
Philip Meier
04db1b874f prevent overriding shuffle settings in DataLoader for datapipes
Fixes https://github.com/pytorch/data/issues/295

Follow-up to https://github.com/pytorch/pytorch/pull/75014#issuecomment-1091921305. We only need to update locations where we actually check `shuffle` for identity with a boolean value, i.e. `shuffle is False`. For bool-ish checks like `if shuffle:`, `None` behaves just like `False`.
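
A small sketch of the distinction drawn here, using two hypothetical helper functions (they are not DataLoader code): a truthiness check cannot tell `None` from `False`, while an identity check can.

```python
def truthiness_check(shuffle):
    """Bool-ish check: None and False take the same branch."""
    return "shuffle" if shuffle else "no shuffle"


def identity_check(shuffle):
    """Identity check: None (unset) is kept apart from an explicit False."""
    if shuffle is False:
        return "explicitly disabled"
    if shuffle is None:
        return "unset"
    return "enabled"


for value in (True, False, None):
    print(value, truthiness_check(value), identity_check(value))
```

Only code written in the second style needs to change when the default moves from `False` to `None`.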

`IterDataPipe`s are currently not mentioned in the docstring. Since this change only applies to them, I didn't update it. LMK if I should do that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75505
Approved by: https://github.com/ejguan
2022-04-12 18:26:33 +00:00
Philip Meier
3c10987692 don't add extra shuffle in DataLoader2 if one is present
Without this, `DataLoader2` will just add a `Shuffler` to the end of the datapipe if `shuffle=True`:

```py
from torch.utils.data.dataloader_experimental import DataLoader2

from torchdata.datapipes.iter import IterableWrapper, IterDataPipe, Shuffler

class Sorter(IterDataPipe):
    def __init__(self, datapipe):
        self.datapipe = datapipe

    def __iter__(self):
        return iter(sorted(self.datapipe))

data = list(range(1000))
dp = IterableWrapper(data)
dp = Shuffler(dp).set_shuffle(False)
dp = Sorter(dp)

dl2 = DataLoader2(dp, shuffle=True, batch_size=None)

assert list(dl2) == data  # fails unless you hit a lucky random seed
```

This example is somewhat nonsensical, but it demonstrates that we cannot simply add a `Shuffler`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75014
Approved by: https://github.com/ejguan
2022-04-05 19:53:08 +00:00
amin-nejad
cce831c805 Fix misleading DataLoader docstring
Fixes description of `prefetch_factor` argument to `DataLoader` as discussed in #58030
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74558
Approved by: https://github.com/NivekT
2022-03-28 17:54:48 +00:00
Evren Tumer
7534525735 Reset worker cycle iterator for determinism across runs (#73675)
Summary:
Reset worker cycle iterator for determinism across runs

Fixes https://github.com/pytorch/pytorch/issues/73603

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73675

Reviewed By: bdhirsh

Differential Revision: D34688704

Pulled By: ejguan

fbshipit-source-id: 7bab11f0b9f59645d9b168fa11d92dc7c2c4d34e
(cherry picked from commit eb5fd559224988f9967528e154cf37c5031fe7c2)
2022-03-09 14:55:07 +00:00
Erjia Guan
67a275c293 Fix persistent worker exits before pin_memory thread (#71579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71579

Fixes #1551

As the comment in the code says, register a function to terminate persistent workers.
Keeping a reference to these workers in `atexit` prevents the Python interpreter from killing the persistent worker processes before `pin_memory_thread` exits.
If users explicitly kill the DataLoader iterator, the function registered in `atexit` is a no-op.
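
A minimal sketch of this pattern using only the standard library. `WorkerPool` is a hypothetical stand-in for the DataLoader iterator and its workers, not the actual implementation.

```python
import atexit


class WorkerPool:
    """Hypothetical owner of long-lived worker processes."""

    def __init__(self):
        self.alive = True
        self.shutdown_calls = 0
        # Registering the bound method keeps a reference to this object
        # until interpreter exit, so it is not collected early.
        atexit.register(self.shutdown)

    def shutdown(self):
        if not self.alive:
            # Already shut down explicitly: the atexit call does nothing.
            return
        self.alive = False
        self.shutdown_calls += 1


pool = WorkerPool()
pool.shutdown()  # explicit shutdown by the user
pool.shutdown()  # a later call, such as the atexit one, is a no-op
```

The guard at the top of `shutdown` is what makes the late `atexit` call harmless after an explicit shutdown.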

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D33896537

Pulled By: ejguan

fbshipit-source-id: 36b57eac7523d8aa180180c2b61fc693ea4638ae
(cherry picked from commit 05add2ae0f)
2022-02-01 23:57:17 +00:00
Nikita Shulga
86aefdc082 Revert D33694867: Fix persistent worker exits before pin_memory thread
Test Plan: revert-hammer

Differential Revision:
D33694867 (e2191e7084)

Original commit changeset: 0847f4d424a0

Original Phabricator Diff: D33694867 (e2191e7084)

fbshipit-source-id: 5f28616700d8647cbe468a9e300724a7f0c6cc15
(cherry picked from commit 3d8125ba6d)
2022-01-22 00:09:28 +00:00
Erjia Guan
e2191e7084 Fix persistent worker exits before pin_memory thread (#71579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71579

Fixes #1551

As the comment in the code says, register a function to terminate persistent workers. Using `atexit` makes sure that termination of persistent workers always happens at the end (after `pin_memory_thread` exits).
We need such a mechanism because, in some rare cases, the Python interpreter would clean up the worker processes before the DataLoader iterator.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D33694867

Pulled By: ejguan

fbshipit-source-id: 0847f4d424a0cd6b3c0be8235d505415970254e8
(cherry picked from commit 18ad4621af)
2022-01-21 20:31:16 +00:00
Erjia Guan
0721fc6474 Decouple MapDataPipe from Dataset (#70991)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70991

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D33477680

Pulled By: ejguan

fbshipit-source-id: d3e89492e921a96791319f35052a229684ddf7cf
2022-01-07 14:28:41 -08:00
Kevin Tse
b67eaec853 [DataLoader] more clearly expose 'default_collate' and 'default_convert' to users (#69862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69862

Fixes #69445

cc SsnL VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan, ngimel

Differential Revision: D33068792

Pulled By: NivekT

fbshipit-source-id: ef9791acdc23d014b8761fa7420062d454ce8969
2021-12-14 11:18:26 -08:00
Vitaly Fedyunin
d90012689f [DataPipe] Control shuffle settings from DataLoader2 (#65756)
Summary:
Makes `shuffle` DataPipe sensitive to DataLoader(2) `shuffle` kwarg.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65756

Reviewed By: albanD

Differential Revision: D31344867

Pulled By: VitalyFedyunin

fbshipit-source-id: e0084e0ac193ac784d6298328ca1222745681347
2021-12-14 07:35:26 -08:00
Erjia Guan
060e41eafa Forward fix type hint for DataLoader (#66001)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66001

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D31340565

Pulled By: ejguan

fbshipit-source-id: d05ae42ebf93f61d781dc5d81ef0222e24f5acb3
2021-10-01 15:48:45 -07:00
Michael Suo
21da6ae9ce suppress mypy error (#66003)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66003

Differential Revision: D31340874

Test Plan: Imported from OSS

Reviewed By: seemethere

Pulled By: suo

fbshipit-source-id: d9ef0f40625fe5ff21f8a5e044d5a75400367dc2
2021-10-01 09:17:42 -07:00
Roman Shapovalov
fc52f1293e Improve pytorch type hints (Dataloader, trig functions)
Summary:
This is to fix Pyre errors in our applications:
* calling `tensor.cos()` etc.
* creating a data loader with batch sampler that is `List[List[int]]`.

Test Plan: TODO: rebase the diffs and run Pyre.

Reviewed By: ejguan

Differential Revision: D31309564

fbshipit-source-id: 1c6f3070d7570260de170e2fe2153d277b246745
2021-10-01 06:53:57 -07:00
Adam J. Stewart
e5ab0d1013 DataLoader: allow non-integer Samplers (#63500)
Summary:
Not entirely sure how to use TypeVar but if someone could give me a hint it would be appreciated. Also let me know if you want me to add tests so we can make sure non-integer samplers actually work. It seems like `test/test_dataloader.py` is the correct location but that's a big file.

Fixes https://github.com/pytorch/pytorch/issues/63483

ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63500

Reviewed By: mruberry

Differential Revision: D30403689

Pulled By: ejguan

fbshipit-source-id: 464e09e5aad3215b94a29cc5e21cb4b10ec136e3
2021-08-19 14:55:46 -07:00
Victor Bittorf
91c076eadc Add TorchVitals for DataLoader (#60959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60959

Add TorchVitals for Dataloader, this indicates that the data loader was enabled.

This is a no-op if TORCH_VITALS environment variable is not set.

Test Plan: buck test mode/dbg caffe2/test:torch -- --regex vitals

Reviewed By: VitalyFedyunin

Differential Revision: D29445146

fbshipit-source-id: d5778fff3dafb3c0463fec7a498bff4905597518
2021-06-29 14:08:32 -07:00
Philip Meier
d5988c5eca remove unused type: ignore directives (#60006)
Summary:
During development it is common practice to put `type: ignore` comments on lines that are correct but that `mypy` doesn't recognize as such. This often happens because the `mypy` version in use wasn't able to handle the pattern.

With every new release `mypy` gets better at handling complex code. In addition to fixing all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see if they are still needed. Fortunately, we don't need to do it manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out when it encounters a `type: ignore` that is no longer needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006

Reviewed By: jbschlosser, malfet

Differential Revision: D29133237

Pulled By: albanD

fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a
2021-06-18 07:23:31 -07:00
Erjia Guan
8cf85a1152 [DataLoader][doc] Randomness for base_seed generator and NumPy seed (#56528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56528

Tried to search across internal and external usage of DataLoader. People haven't started to use `generator` for `DataLoader`.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27908487

Pulled By: ejguan

fbshipit-source-id: 14c83ed40d4ba4dc988b121968a78c2732d8eb93
2021-04-22 09:40:45 -07:00
Erjia Guan
aec83ff45e [DataLoader] Add Numpy seeding to worker of DataLoader (#56488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56488

Considering the number of requests for this feature, introduce NumPy seeding as the default within each DataLoader worker.

## BC-breaking Note:
- With the default numpy.random seeding strategy for DataLoader workers, users no longer need to set the worker seed manually via `worker_init_fn`. This PR doesn't affect users who currently use `worker_init_fn` to set a customized seed for workers.
- DataLoader will preserve reproducibility for users who use numpy.random within a Dataset.
- Multiprocessing (without `worker_init_fn` to define the seed for numpy)
  - Start method `spawn`: Each worker now gets a seed for numpy random, instead of the seed generated when the NumPy module is imported, which made the DataLoader lose reproducibility.
  - Start method `fork`: Each worker gets the same benefit as with `spawn`, and also gets a different numpy seed by default instead of inheriting the same seed.

Using the following Dataset and script as an example:
```py
import multiprocessing as mp

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class RandomDataset(Dataset):
    def __getitem__(self, ind):
        # Pair the index with a value drawn from NumPy's global RNG.
        item = [ind, np.random.randint(1, 10000)]
        return item

    def __len__(self):
        return 20

if __name__ == '__main__':
    ctx = mp.get_context('fork')
    ds = RandomDataset()
    g = torch.Generator()
    g.manual_seed(0)
    dl = DataLoader(ds, 2, shuffle=False, num_workers=4, multiprocessing_context=ctx, generator=g)

    epochs = 2
    for _ in range(epochs):
        for batch in dl:
            print(batch)
        print("====" * 10)
```

### 1.8.1:
Each worker generates the same random result per iteration, and the seed is reset to the same value for each epoch.
```py
tensor([[   0, 7449],
        [   1, 1519]])
tensor([[   2, 7449],
        [   3, 1519]])
tensor([[   4, 9645],
        [   5, 2387]])
tensor([[   6, 9645],
        [   7, 2387]])
tensor([[   8, 3118],
        [   9, 4552]])
=========================
tensor([[   0, 7449],
        [   1, 1519]])
tensor([[   2, 7449],
        [   3, 1519]])
tensor([[   4, 9645],
        [   5, 2387]])
tensor([[   6, 9645],
        [   7, 2387]])
tensor([[   8, 3118],
        [   9, 4552]])
=========================
```

### This PR:
Each worker starts with a different seed and is re-seeded for each epoch.
```py
tensor([[   0, 8715],
        [   1, 5555]])
tensor([[   2, 6379],
        [   3, 1432]])
tensor([[   4, 3271],
        [   5, 5132]])
tensor([[   6, 4287],
        [   7, 1104]])
tensor([[   8, 8682],
        [   9, 1699]])
=========================
tensor([[   0, 1374],
        [   1,  996]])
tensor([[   2,  143],
        [   3, 3507]])
tensor([[   4, 5887],
        [   5, 4730]])
tensor([[   6, 7274],
        [   7,  738]])
tensor([[   8, 6374],
        [   9, 1572]])
=========================
```

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27908486

Pulled By: ejguan

fbshipit-source-id: 5f313a30563bedeb88be214fa4beca0cefe9e4f4
2021-04-22 09:39:33 -07:00
Sam Estep
75024e228c Add lint for unqualified type: ignore (#56290)
Summary:
The other half of https://github.com/pytorch/pytorch/issues/56272.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290

Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed:

- https://github.com/pytorch/pytorch/runs/2384511062
- https://github.com/pytorch/pytorch/actions/runs/765036024

Reviewed By: seemethere

Differential Revision: D27867219

Pulled By: samestep

fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235
2021-04-21 08:07:23 -07:00
Zhiyuan Chen
7d4e9bdba1 Add type hint for SequentialSampler (#56374)
Summary:
Add type hint for SequentialSampler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56374

Reviewed By: heitorschueroff

Differential Revision: D27884528

Pulled By: ejguan

fbshipit-source-id: 68eb900643098565743245c843e76e464f981458
2021-04-20 14:45:52 -07:00
danielgordon10
7f1693d95e Fix type hints of the callable arguments for DataLoader (#52924)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52806

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52924

Reviewed By: malfet

Differential Revision: D26694894

Pulled By: ejguan

fbshipit-source-id: 55734ec9684caa90f1e599b65659b7c57047f802
2021-02-27 07:45:49 -08:00
Chester Liu
58eb23378f Clean up usage of torch._six partially (#49785)
Summary:
See https://github.com/pytorch/pytorch/issues/42919

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49785

Reviewed By: mruberry

Differential Revision: D25963833

Pulled By: bugra

fbshipit-source-id: 11c90d6b8d3f206c9d0a4d8621b773beb10c6ba2
2021-02-08 13:58:34 -08:00
Tongzhou Wang
54ce171f16 Fix persistent_workers + pin_memory (#48543)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48370 https://github.com/pytorch/pytorch/issues/47445

cc emcastillo who authored the original functionality.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48543

Reviewed By: bdhirsh

Differential Revision: D25277474

Pulled By: ejguan

fbshipit-source-id: 1967002124fb0fff57caca8982bc7df359a059a2
2021-01-08 07:04:10 -08:00
Hugo van Kemenade
473e78c0fa Remove redundant code for unsupported Python versions (#49486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49486

Remove code for Python 3.5 and lower.

There's more that can be removed/modernised, but sticking mainly to redundant version checks here, to keep the diff/PR smaller.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46579

Reviewed By: zou3519

Differential Revision: D24453571

Pulled By: ezyang

fbshipit-source-id: c2cfcf05d6c5f65df64d89c331692c9aec09248e
2021-01-06 12:45:46 -08:00
Samuel Marks
e6779d4357 [*.py] Rename "Arguments:" to "Args:" (#49736)
Summary:
I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings.

```sh
(pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do
    printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" | paste -s -d+ -- | bc)"; done
Args:      1095
Arguments: 0336
```

It is easy enough to extend my parsers to support both variants, however it looks like `Arguments:` is wrong anyway, as per:

  - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md)

  - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md)

  - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst)

Therefore, only `Args:` is valid. This PR replaces them throughout the codebase.

PS: For related PRs, see tensorflow/tensorflow/pull/45420

PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736

Reviewed By: albanD

Differential Revision: D25710534

Pulled By: soumith

fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619
2020-12-28 09:34:47 -08:00
Tom McClintock
a3aafea076 Fixed a typo in dataloader.py. (#49437)
Summary:
This small PR fixes a one character typo in the docstring for `DataLoader`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49437

Reviewed By: ngimel

Differential Revision: D25665971

Pulled By: mrshenli

fbshipit-source-id: b60f975f1e3bf0bb8f88e39f490f716c602f087e
2020-12-21 10:27:24 -08:00
Teng Gao
1c31f76297 Add high level profiling trace for dataloading and optimizer (#47655)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47441

To give users more information about Python-level functions in profiler traces, we propose to instrument the following functions:

```
_BaseDataLoaderIter.__next__
Optimizer.step
Optimizer.zero_grad
```

Because `record_function` already uses `if (!active)` to check whether the profiler is enabled, we don't explicitly call `torch.autograd._profiler_enabled()` before each instrumentation point.

Acknowledgement: nbcsm, guotuofeng, gunandrose4u , guyang3532 , mszhanyi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47655

Reviewed By: smessmer

Differential Revision: D24960386

Pulled By: ilia-cher

fbshipit-source-id: 2eb655789e2e2f506e1b8f95ad3d470c83281102
2020-12-09 00:13:56 -08:00
Tongzhou Wang
1112773cf5 Fix unintended error when worker force kill happens #43455 (#43462)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43455

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43462

Reviewed By: bdhirsh

Differential Revision: D25277759

Pulled By: VitalyFedyunin

fbshipit-source-id: 0bb0d87374c0403853d71aac2c242374bfc7acf2
2020-12-02 21:42:16 -08:00
SsnL
4abca9067b Fix dataloader hang with large sampler (#48669)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48669

Reviewed By: zhangguanheng66

Differential Revision: D25255763

Pulled By: VitalyFedyunin

fbshipit-source-id: d06421f52bb1d00cdf8025f1a2ba0d1f9284731a
2020-12-02 09:07:30 -08:00
lixinyu
67b7e751e6 add warning if DataLoader is going to create excessive number of threads (#46867)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46867

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24545540

Pulled By: glaringlee

fbshipit-source-id: a3bef0d417e535b8ec0bb33f39cfa2308aadfff0
2020-10-30 07:54:23 -07:00
Vitaly Fedyunin
31ee5d8d8b Adding information how to control randomness with DataLoader (#45749)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45749

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24088407

Pulled By: VitalyFedyunin

fbshipit-source-id: 398b73ec5e8c83000ebc692001da847fc0aaa48f
2020-10-12 16:57:58 -07:00
Emilio Castillo
5472426b9f Reset DataLoader workers instead of creating new ones (#35795)
Summary:
This PR needs discussion as it changes the behavior of `DataLoader`. It can be closed if it's not considered good practice.

Currently, the `DataLoader` spawns a new `_BaseDataLoaderIter` object every epoch.
In the case of the multiprocess DataLoader, the worker processes are re-created every epoch and they make a copy of the original `Dataset` object.
If users want to cache data or do some tracking on their datasets, all their data will be wiped out every epoch. Notice that this doesn't happen when the number of workers is 0, which makes the multiprocess and serial data loaders inconsistent.

This PR keeps the `_BaseDataLoaderIter` object alive and just resets it between epochs, so the workers remain active, and so do their own `Dataset` objects. People seem to file issues about this often.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/35795

Reviewed By: ailzhang

Differential Revision: D23426612

Pulled By: VitalyFedyunin

fbshipit-source-id: e16950036bae35548cd0cfa78faa06b6c232a2ea
2020-09-01 11:48:00 -07:00
Akihiro Nitta
f17d7a5556 Fix exception chaining in torch/ (#43836)
Summary:
## Motivation
Fixes https://github.com/pytorch/pytorch/issues/43770.

## Description of the change
This PR fixes exception chaining only in files under `torch/` where appropriate.
To fix exception chaining, I used either:
1. `raise new_exception from old_exception` where `new_exception` itself seems not descriptive enough to debug or `old_exception` delivers valuable information.
2. `raise new_exception from None` where raising both of `new_exception` and `old_exception` seems a bit noisy and redundant.
I subjectively chose which one to use from the above options.
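
The two options can be illustrated with a short, self-contained sketch. `parse_port` and `lookup` are hypothetical functions written for this example and do not come from the PR.

```python
def parse_port(text):
    """Option 1: keep the original exception as the explicit cause."""
    try:
        return int(text)
    except ValueError as err:
        raise RuntimeError(f"invalid port: {text!r}") from err


def lookup(table, key):
    """Option 2: suppress the original exception to reduce noise."""
    try:
        return table[key]
    except KeyError:
        raise ValueError(f"unknown option: {key!r}") from None


try:
    parse_port("abc")
except RuntimeError as exc:
    chained_cause = exc.__cause__  # the original ValueError

try:
    lookup({}, "verbose")
except ValueError as exc:
    suppressed_cause = exc.__cause__  # None
    context_hidden = exc.__suppress_context__  # True
```

With `from err` the traceback shows both exceptions, joined by "The above exception was the direct cause of the following exception". With `from None` only the new exception is printed.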

## List of lines containing raise in except clause:
I wrote [this simple script](https://gist.github.com/akihironitta/4223c1b32404b36c1b349d70c4c93b4d) using [ast](https://docs.python.org/3.8/library/ast.html#module-ast) to list lines that `raise` inside an `except` clause.

- [x] 000739c31a/torch/jit/annotations.py (L35)
- [x] 000739c31a/torch/jit/annotations.py (L150)
- [x] 000739c31a/torch/jit/annotations.py (L158)
- [x] 000739c31a/torch/jit/annotations.py (L231)
- [x] 000739c31a/torch/jit/_trace.py (L432)
- [x] 000739c31a/torch/nn/utils/prune.py (L192)
- [x] 000739c31a/torch/cuda/nvtx.py (L7)
- [x] 000739c31a/torch/utils/cpp_extension.py (L1537)
- [x] 000739c31a/torch/utils/tensorboard/_pytorch_graph.py (L292)
- [x] 000739c31a/torch/utils/data/dataloader.py (L835)
- [x] 000739c31a/torch/utils/data/dataloader.py (L849)
- [x] 000739c31a/torch/utils/data/dataloader.py (L856)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L186)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L189)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L424)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1279)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1283)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1356)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1388)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1391)
- [ ] 000739c31a/torch/testing/_internal/common_utils.py (L1412)
- [x] 000739c31a/torch/testing/_internal/codegen/random_topo_test.py (L310)
- [x] 000739c31a/torch/testing/_internal/codegen/random_topo_test.py (L329)
- [x] 000739c31a/torch/testing/_internal/codegen/random_topo_test.py (L332)
- [x] 000739c31a/torch/testing/_internal/jit_utils.py (L183)
- [x] 000739c31a/torch/testing/_internal/common_nn.py (L4789)
- [x] 000739c31a/torch/onnx/utils.py (L367)
- [x] 000739c31a/torch/onnx/utils.py (L659)
- [x] 000739c31a/torch/onnx/utils.py (L892)
- [x] 000739c31a/torch/onnx/utils.py (L897)
- [x] 000739c31a/torch/serialization.py (L108)
- [x] 000739c31a/torch/serialization.py (L754)
- [x] 000739c31a/torch/distributed/rpc/_testing/faulty_agent_backend_registry.py (L76)
- [x] 000739c31a/torch/distributed/rpc/backend_registry.py (L260)
- [x] 000739c31a/torch/distributed/distributed_c10d.py (L184)
- [x] 000739c31a/torch/_utils_internal.py (L57)
- [x] 000739c31a/torch/hub.py (L494)
- [x] 000739c31a/torch/contrib/_tensorboard_vis.py (L16)
- [x] 000739c31a/torch/distributions/lowrank_multivariate_normal.py (L100)
- [x] 000739c31a/torch/distributions/constraint_registry.py (L142)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43836

Reviewed By: ailzhang

Differential Revision: D23431212

Pulled By: malfet

fbshipit-source-id: 5f7f41b391164a5ad0efc06e55cd58c23408a921
2020-08-31 20:26:23 -07:00