Commit Graph

167 Commits

Xuehai Pan
b005ec62b9 [BE] Remove dependency on six and future (#94709)
Remove the Python 2/3 compatibility libraries [six](https://pypi.org/project/six) and [future](https://pypi.org/project/future), along with `torch._six`. We only support Python 3.8+ now; it's time to retire them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94709
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-02-14 09:14:14 +00:00
Xuehai Pan
5b1cedacde [BE] [2/3] Rewrite super() calls in functorch and torch (#94588)
Rewrite calls to the Python built-in `super()` to use the zero-argument form (sketched below). Only non-semantic changes are applied.

- #94587
- #94588
- #94592
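
For context, a sketch of the typical rewrite (hypothetical `MyModule`):

```diff
 class MyModule(nn.Module):
     def __init__(self):
-        super(MyModule, self).__init__()
+        super().__init__()
         self.linear = nn.Linear(2, 2)
```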

Methods whose body is only a `super()` call are also removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Cases where the rewrite would change semantics are kept unchanged, e.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94588
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-10 21:16:33 +00:00
Aaron Gokaslan
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch, starting with the safest changes.
This PR does only two things: remove explicit inheritance from `object` and remove unused `__future__` imports (see the sketch below).
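
A sketch of the two rewrites on a hypothetical class:

```diff
-from __future__ import unicode_literals
-
-
-class MyDataset(object):
+class MyDataset:
     def __len__(self):
         return 0
```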

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
Wenlei Xie
d6dec1a5cf Refactor sharding data pipe into a separate file (#94095)
Move `ShardingFilterIterDataPipe` into a dedicated file.

Also, propose a dedicated parent class (`_ShardingIterDataPipe`) for sharding DataPipes, since these seem more like "system/engine-level" DataPipes that give strong hints to RS on how to execute and need first-class-citizen treatment in RS (compared with other "user-level" DataPipes, which are mostly composable `Callable[[Iterable], Iterable]`s). This way, `graph_settings.py` no longer needs to rely on whether `is_shardable` and `apply_sharding` are present on a DataPipe (sketched below). But open to other discussions.

Open question: should
[ShardingRoundRobinDispatcherIterDataPipe](01fc762003/torchdata/datapipes/iter/util/sharding.py (L16-L17)) also be considered a `_ShardingIterDataPipe`? (E.g., this sharding is executed by replicating the metadata, while `ShardingRoundRobinDispatcherIterDataPipe` hints that replication is too expensive and therefore requires round-robin data exchange/dispatch.)
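
A minimal sketch of the proposed dispatch, assuming a hypothetical `_ShardingIterDataPipe` base class (names illustrative):

```py
from torch.utils.data import IterDataPipe

class _ShardingIterDataPipe(IterDataPipe):
    """Marker base class: engine-level DataPipes that know how to shard."""
    def apply_sharding(self, num_of_instances: int, instance_id: int) -> None:
        raise NotImplementedError

def apply_sharding_to_graph(datapipes, num_of_instances: int, instance_id: int):
    for dp in datapipes:
        # Dispatch on the class instead of duck-typing on method presence.
        if isinstance(dp, _ShardingIterDataPipe):
            dp.apply_sharding(num_of_instances, instance_id)
```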

Differential Revision: D43014692

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94095
Approved by: https://github.com/ejguan, https://github.com/NivekT
2023-02-07 09:12:02 +00:00
Dmitry Tomshin
11db12bd94 Issue 68576 prefetch factor docstring changes (#89874)
Fixes #68576

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89874
Approved by: https://github.com/kit1980
2022-11-30 23:42:56 +00:00
Dmitry Tomshin
57e05e822d Issue 68576 prefetch factor (#88972)
Fixes #68576
This PR allows setting `prefetch_factor=None`, making it truly optional, as the documentation describes.
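
A hypothetical usage sketch:

```py
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(10))
# prefetch_factor=None is now accepted; prefetching only applies when
# num_workers > 0, so None is the natural value for the single-process case.
loader = DataLoader(dataset, num_workers=0, prefetch_factor=None)
```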
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88972
Approved by: https://github.com/kit1980
2022-11-18 00:10:50 +00:00
Vitaly Fedyunin
9dadf8fcc2 [DataPipes] Add group support to the sharding_filter (#88424)
Differential Revision: [D41006747](https://our.internmc.facebook.com/intern/diff/D41006747)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88424
Approved by: https://github.com/ejguan
2022-11-07 22:07:01 +00:00
erjia
b90db4a78f [DataPipe] Fix type checking to accept both Iter and Map DataPipe (#87285)
Fixes https://github.com/pytorch/data/issues/841

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87285
Approved by: https://github.com/NivekT
2022-10-20 05:05:56 +00:00
leizhenyuan
c6187ea326 add support for pin memory on xpu device (#86545)
add support for pin memory on xpu device

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86545
Approved by: https://github.com/ezyang
2022-10-19 13:24:48 +00:00
Tongzhou Wang
7ff1ca4e33 Add type annotation to get_worker_info (#87017)
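A sketch of the signature in question, assuming the return type is `Optional[WorkerInfo]`:

```py
from typing import Optional

def get_worker_info() -> Optional["WorkerInfo"]:
    """Return the current DataLoader worker's info, or None in the main process."""
    ...
```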
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87017
Approved by: https://github.com/ejguan, https://github.com/NivekT
2022-10-19 00:25:04 +00:00
Erjia Guan
f1a6f32b72 [DataLoader] Make distributed lazily initialized & share seed via PG (#85279)
Fixes #84492 https://github.com/pytorch/data/issues/772

## Changes
- Move the logic of distributed sharding from the constructor of DataLoader to the constructor of the DataLoader iterator. This prevents the error caused by lazily initialized distributed process groups.
- Replace the distributed store with a process group (`gloo`) for sharing the random seed, because the `mpi` backend doesn't provide a distributed store (see the sketch below).
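
A minimal sketch of seed sharing over a `gloo` process group (illustrative, not the actual implementation):

```py
import torch
import torch.distributed as dist

def share_seed(pg) -> int:
    # Rank 0 draws the seed; every rank receives it via broadcast over gloo,
    # which works on CPU tensors (no CUDA dependency).
    seed = torch.empty(1, dtype=torch.int64)
    if dist.get_rank(pg) == 0:
        seed = torch.randint(0, 2**31, (1,), dtype=torch.int64)
    dist.broadcast(seed, src=0, group=pg)
    return int(seed.item())
```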

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85279
Approved by: https://github.com/NivekT, https://github.com/VitalyFedyunin
2022-09-23 18:52:52 +00:00
erjia
33bb8ae350 Set shuffle to DataPipes with set_shuffle API (#83741)
This PR requires https://github.com/pytorch/pytorch/pull/83202 to land first.

## Changes
- `apply_shuffle_setting` and `apply_shuffle_seed` now apply the shuffle setting to every DataPipe that has a `set_shuffle` or `set_seed` method.
- Rename the API from `apply_shuffle_seed` to `apply_random_seed`.
- Fix a bug where `apply_shuffle_seed` only accepted hashable DataPipes. After this PR, the function uses `id` to avoid seeding the same DataPipe multiple times per epoch (see the sketch below).
- Fix another `shuffler` bug where `reset` with `_enable=False` would also reset `_seed`.
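
A minimal sketch of the `id`-based bookkeeping (function and variable names are illustrative):

```py
def apply_random_seed(datapipes, seed: int) -> None:
    # id() works even for unhashable DataPipes and guarantees each shared
    # instance is seeded at most once per epoch.
    applied = set()
    for dp in datapipes:
        if id(dp) in applied:
            continue
        applied.add(id(dp))
        if hasattr(dp, "set_seed"):
            dp.set_seed(seed)
```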
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83741
Approved by: https://github.com/NivekT
2022-09-13 13:38:58 +00:00
erjia
56fef4e6ee fix NoneType object has no attribute python_exit_status (#83985)
Fixes #83791

Prevents the error raised when `_utils` has already been cleared by the Python interpreter before `__del__` is invoked.
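
A sketch of the guard, using illustrative names:

```py
import types

_utils = types.SimpleNamespace(python_exit_status=False)  # stand-in module

class _IteratorSketch:
    def __del__(self):
        # During interpreter shutdown, module globals like `_utils` may have
        # been set to None before __del__ runs; bail out instead of raising.
        if _utils is None or _utils.python_exit_status:
            return
        self._shutdown_workers()

    def _shutdown_workers(self):
        pass
```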
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83985
Approved by: https://github.com/NivekT
2022-08-25 16:05:14 +00:00
ProGamerGov
71d50f4f89 Change docstring type callable to Callable for consistency (#82487)
### Description

Across PyTorch's docstrings, both `callable` and `Callable` are used for variable types. `Callable` should be capitalized when we are referring to the `Callable` type rather than the Python `callable()` function.

### Testing

There shouldn't be any testing required.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82487
Approved by: https://github.com/albanD
2022-08-01 17:26:09 +00:00
erjia
aa1466d542 Raise proper timeout when sharing the distributed shared seed (#81666)
Fixes https://github.com/pytorch/data/issues/659

- This fixes the problem where a slow DataLoader on rank 0 caused a TimeoutError, as I have removed the `wait` operation on the other ranks.
- This PR also adds a [default timeout](f6a45f7984/torch/csrc/distributed/c10d/ProcessGroup.hpp (L26-L27)) of 30 * 60 seconds (taking reference from the distributed team's implementation). When sharing the distributed seed gets stuck on any rank, a proper timeout with a detailed message is raised (see the sketch below).
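
A sketch of a bounded wait on the store (key name and helper are illustrative):

```py
from datetime import timedelta

def wait_for_shared_seed(store, key: str = "_dl_shared_seed") -> None:
    # Raises instead of hanging forever if the key never appears.
    store.wait([key], timedelta(seconds=30 * 60))
```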

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81666
Approved by: https://github.com/NivekT
2022-07-19 17:21:02 +00:00
Vitaly Fedyunin
e9b3bc2ead [DataLoader] Locking lower ranks seed recipients (#81071)
Exit the seed-receiving section only once all ranks have received the seed; otherwise the current rank
risks re-entering the same section of code while rank zero is still in the previous iteration.

Fixes: #80845

Differential Revision: [D37702557](https://our.internmc.facebook.com/intern/diff/D37702557)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81071
Approved by: https://github.com/msaroufim, https://github.com/ejguan
2022-07-08 18:53:45 +00:00
erjia
3ec9d34f21 Fix distributed store to use add for the counter of DL shared seed (#80348)
In order to get the result of `_shared_seed_recv_cnt` properly, switch from `store.get` to `store.add(key, 0)`.

See the comment from distributed team for the reason:
590d3e5774/torch/distributed/distributed_c10d.py (L242-L246)
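
A sketch of the switch (key name taken from the message above; the helper is illustrative):

```py
def read_recv_count(store, key: str = "_shared_seed_recv_cnt") -> int:
    # add(key, 0) atomically creates-or-reads the counter and returns an int,
    # whereas get(key) returns raw bytes and requires the key to already exist.
    return store.add(key, 0)
```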
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80348
Approved by: https://github.com/VitalyFedyunin, https://github.com/NivekT
2022-06-27 21:59:17 +00:00
erjia
ccccd0efec [DataLoader] Share seed via Distributed Store to get rid of CUDA dependency (#79829)
Fixes #79828

In a distributed environment, before this PR, DataLoader created a Tensor holding the shared seed on RANK 0 and sent it to the other processes. However, when `NCCL` is used as the distributed backend, the Tensor must be moved to CUDA before being broadcast from RANK 0 to the other RANKs. This causes the Issue: DataLoader doesn't move the Tensor to CUDA before sharing via `NCCL`.

After offline discussion with @mrshenli, we think the distributed Store is a better solution, since the shared seed is just an integer value. This gets rid of the dependency on NCCL and CUDA when sharing info between distributed DataLoader processes (see the sketch below).
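
A minimal sketch of the store-based exchange (key name illustrative):

```py
def share_seed_via_store(store, rank: int, seed: int = 0) -> int:
    # Rank 0 publishes the integer seed; the other ranks block in get()
    # until the key appears. No tensors, so no NCCL/CUDA involvement.
    if rank == 0:
        store.set("_dl_shared_seed", str(seed))
        return seed
    return int(store.get("_dl_shared_seed"))
```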
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79829
Approved by: https://github.com/VitalyFedyunin, https://github.com/NivekT
2022-06-20 19:18:35 +00:00
erjia
04f87f2ab9 [DataLoader] Fix the world_size when distributed sharding MapDataPipe (#79524)
Fixes #79449

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79524
Approved by: https://github.com/NivekT, https://github.com/VitalyFedyunin
2022-06-14 19:03:57 +00:00
ErjiaGuan
5158a6b41a Forward fix sharding bug for DL (#79124)
This PR solves a bug introduced by #79041

`torch.utils.data.graph_settings.apply_sharding` changes the datapipe in-place and returns `None`.

This resolves the error in TorchData. See: https://github.com/pytorch/data/actions/runs/2461030312
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79124
Approved by: https://github.com/VitalyFedyunin
2022-06-08 16:16:58 +00:00
erjia
b3ed65343d Fix sharding strategy for distributed DL (#79041)
1. Change the sharding strategy from sharding by worker first and then by rank to sharding by rank first and then by worker.
2. Fetch the rank and world size in the main process, for the sake of `spawn`.

For change 1:
Before this PR, when the dataset could not be evenly divided by `worker_num * world_size`, workers on the first RANKs retrieved more data.
Using the following example:
- dataset size: 100
- world_size: 4
- num_worker: 2

The number of data retrieved by each rank before this PR
- Rank 0: 26
- Rank 1: 26
- Rank 2: 24
- Rank 3: 24

The number of data retrieved by each rank after this PR
- Rank 0: 25
- Rank 1: 25
- Rank 2: 25
- Rank 3: 25

For change 2:
Before this PR, `dist` functions were invoked inside the worker processes. That is fine when the workers are forked from the parent process: all environment variables are inherited and visible to these `dist` functions. However, when the workers are spawned, they cannot access those environment variables, so the dataset is not sharded by rank.
After this PR, `_sharding_worker_init_fn` should work for both the `spawn` and `fork` cases (see the sketch below).
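
A sketch reproducing the counts above, assuming round-robin assignment of element `i` to shard `i % (world_size * num_workers)`; the before/after formulas are my reconstruction of the ordering change:

```py
world_size, num_workers, n = 4, 2, 100
total = world_size * num_workers

def shard_len(shard_id: int) -> int:
    return len(range(shard_id, n, total))

for rank in range(world_size):
    before = sum(shard_len(rank * num_workers + w) for w in range(num_workers))
    after = sum(shard_len(w * world_size + rank) for w in range(num_workers))
    print(f"rank {rank}: before={before} after={after}")
# rank 0: before=26 after=25 ... rank 3: before=24 after=25
```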
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79041
Approved by: https://github.com/VitalyFedyunin, https://github.com/NivekT
2022-06-07 20:56:32 +00:00
Vitaly Fedyunin
6fe6902f97 [DataLoader] Apply sharding settings in dist when num_workers is 0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78950

Approved by: https://github.com/ejguan, https://github.com/NivekT
2022-06-06 20:03:02 +00:00
erjia
9b6cb83b0c Make ShufflerDataPipe deterministic for persistent DL and distributed DL (#78765)
Fixes https://github.com/pytorch/data/issues/426

This PR introduces two main changes:
- It ensures `ShufflerDataPipe` shares the same seed across distributed processes.
- Users can reset `shuffle` for persistent workers per epoch.

Detail:
- `shared_seed` is shared across distributed and worker processes. It seeds a `shared_rng` that provides seeds to each `ShufflerDataPipe` in the pipeline (sketched below).
- `worker_loop` now accepts a new `shared_seed` argument to receive this shared seed.
- The `shared_seed` is attached to `_ResumeIteration` to reset the seed per epoch for persistent workers.
- I chose not to touch `base_seed`, simply to avoid BC issues.

I used this [script](https://gist.github.com/ejguan/d88f75fa822cb696ab1bc5bc25844f47) to test the result with `world_size=4`. Please check the result in: https://gist.github.com/ejguan/6ee2d2de12ca57f9eb4b97ef5a0e300b

You can see there are no duplicated or missing elements in any epoch. And, with the same seed, the order of data remains the same across epochs.
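
A sketch of the seeding flow (illustrative, not the actual implementation):

```py
import torch

def seed_shufflers(shufflers, shared_seed: int) -> None:
    # The shared seed drives one shared RNG, which hands every Shuffler
    # in the pipeline its own derived seed.
    shared_rng = torch.Generator()
    shared_rng.manual_seed(shared_seed)
    for dp in shufflers:
        dp.set_seed(int(torch.randint(2**63 - 1, (1,), generator=shared_rng)))
```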
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78765
Approved by: https://github.com/VitalyFedyunin
2022-06-06 17:24:00 +00:00
PyTorch MergeBot
129d9dbb15 Revert "Make ShufflerDataPipe deterministic for persistent DL and distributed DL (#78765)"
This reverts commit b769a0e18b.

Reverted https://github.com/pytorch/pytorch/pull/78765 on behalf of https://github.com/janeyx99 due to broke lint on trunk
2022-06-06 14:24:51 +00:00
erjia
b769a0e18b Make ShufflerDataPipe deterministic for persistent DL and distributed DL (#78765)
Fixes https://github.com/pytorch/data/issues/426

This PR introduces two main changes:
- It ensures `ShufflerDataPipe` shares the same seed across distributed processes.
- Users can reset `shuffle` for persistent workers per epoch.

Detail:
- `shared_seed` is shared across distributed and worker processes. It seeds a `shared_rng` that provides seeds to each `ShufflerDataPipe` in the pipeline.
- `worker_loop` now accepts a new `shared_seed` argument to receive this shared seed.
- The `shared_seed` is attached to `_ResumeIteration` to reset the seed per epoch for persistent workers.
- I chose not to touch `base_seed`, simply to avoid BC issues.

I used this [script](https://gist.github.com/ejguan/d88f75fa822cb696ab1bc5bc25844f47) to test the result with `world_size=4`. Please check the result in: https://gist.github.com/ejguan/6ee2d2de12ca57f9eb4b97ef5a0e300b

You can see there are no duplicated or missing elements in any epoch. And, with the same seed, the order of data remains the same across epochs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78765
Approved by: https://github.com/VitalyFedyunin
2022-06-06 13:36:37 +00:00
Vitaly Fedyunin
883f8ef62e [DataLoader] DataLoader now automatically applies sharding to DataPipes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78631

Approved by: https://github.com/ejguan, https://github.com/NivekT
2022-06-02 17:40:29 +00:00
Sergii Dymchenko
e8bf3a9cd4 Remove Python 2-related code from dataloader (#78594)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78594
Approved by: https://github.com/seemethere
2022-06-01 05:25:23 +00:00
erjia
365ce350cb Make ShufflerDataPipe deterministic for SP & MP DataLoader (#77741)
This is the first PR to make DataPipe deterministic.

Users should be able to use `torch.manual_seed(seed)` to control the shuffle order for the following cases:
- Directly over `DataPipe`
- For single-process DataLoader
- For multiprocessing DataLoader

Unfortunately, for distributed training, users have to run `apply_shuffle_seed` manually to make sure all distributed processes share the same shuffle order.
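
A sketch of the single-process case (using the functional `shuffle` form):

```py
import torch
from torch.utils.data.datapipes.iter import IterableWrapper

torch.manual_seed(42)
dp = IterableWrapper(range(10)).shuffle()
print(list(dp))  # re-running with the same seed yields the same order
```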
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77741
Approved by: https://github.com/VitalyFedyunin, https://github.com/NivekT
2022-05-18 23:32:07 +00:00
Vitaly Fedyunin
edffd595c2 [DataLoader] Adding ability to use dill to pass DataPipes in multiprocessing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77288

Approved by: https://github.com/ejguan, https://github.com/NivekT
2022-05-15 23:04:03 +00:00
Michael Suo
fb0f285638 [lint] upgrade mypy to latest version
Fixes https://github.com/pytorch/pytorch/issues/75927.

Had to fix some bugs and add some ignores.

To check if clean:
```
lintrunner --paths-cmd='git grep -Il .' --take MYPY,MYPYSTRICT
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76753

Approved by: https://github.com/malfet
2022-05-03 20:51:34 +00:00
PyTorch MergeBot
3d7428d9ac Revert "[lint] upgrade mypy to latest version"
This reverts commit 9bf18aab94.

Reverted https://github.com/pytorch/pytorch/pull/76753 on behalf of https://github.com/suo
2022-05-03 20:01:18 +00:00
Michael Suo
9bf18aab94 [lint] upgrade mypy to latest version
Fixes https://github.com/pytorch/pytorch/issues/75927.

Had to fix some bugs and add some ignores.

To check if clean:
```
lintrunner --paths-cmd='git grep -Il .' --take MYPY,MYPYSTRICT
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76753

Approved by: https://github.com/malfet
2022-05-03 19:43:28 +00:00
Erjia Guan
0289ab2cec Fix data-related public API (#368)
Summary:
X-link: https://github.com/pytorch/data/pull/368

This PR aims to expose the right data-related APIs.

Two more changes in this PR convert public APIs to private ones:
`check_lambda_fn` -> `_check_lambda_fn`
`deprecation_warning` -> `_deprecation_warning`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76143

Reviewed By: albanD, NivekT

Differential Revision: D35798311

Pulled By: ejguan

fbshipit-source-id: b13fded5c88a533c706702fb2070c918c839dca4
(cherry picked from commit 0b534b829a2e90e1e533951c6d334fdeaa9358b9)
2022-04-21 17:27:05 -07:00
Jeeja
45bbc4c028 Update Dataloader with default parameter device (#65402)
Summary:
`pin_memory` has an optional device parameter to specify which device
to pin for. Even with that change, the DataLoader works only with the
CUDA backend. To add support for other backends that provide pinned
memory, the DataLoader is updated with `device` as an optional
parameter (see the sketch below).
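
A hypothetical usage sketch; the kwarg name follows the eventual public API (`pin_memory_device`), which may differ from this revision:

```py
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(8))
# Pin host memory for a non-CUDA backend that supports pinned memory.
loader = DataLoader(dataset, pin_memory=True, pin_memory_device="xpu")
```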

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65402

Reviewed By: zou3519

Differential Revision: D32282204

Pulled By: VitalyFedyunin

fbshipit-source-id: e2e09876969af108d0db38af7c2d1b2f1cfa9858
(cherry picked from commit 3b76e151964fce442e27fe8fb5c37af930da4fa1)
2022-04-21 01:33:53 +00:00
Philip Meier
04db1b874f prevent overriding shuffle settings in DataLoader for datapipes
Fixes https://github.com/pytorch/data/issues/295

Follow-up to https://github.com/pytorch/pytorch/pull/75014#issuecomment-1091921305. We only need to update locations where we actually check `shuffle` for identity with a boolean value, i.e. `shuffle is False`. For bool-ish checks like `if shuffle:`, `None` behaves just like `False`.

`IterDataPipe`s are currently not mentioned in the docstring. Since this change only applies to them, I didn't update it. Let me know if I should.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75505
Approved by: https://github.com/ejguan
2022-04-12 18:26:33 +00:00
Philip Meier
3c10987692 don't add extra shuffle in DataLoader2 if one is present
Without this, `DataLoader2` will just add a `Shuffler` to the end of the datapipe if `shuffle=True`:

```py
from torch.utils.data.dataloader_experimental import DataLoader2

from torchdata.datapipes.iter import IterableWrapper, IterDataPipe, Shuffler

class Sorter(IterDataPipe):
    def __init__(self, datapipe):
        self.datapipe = datapipe

    def __iter__(self):
        return iter(sorted(self.datapipe))

data = list(range(1000))
dp = IterableWrapper(data)
dp = Shuffler(dp).set_shuffle(False)
dp = Sorter(dp)

dl2 = DataLoader2(dp, shuffle=True, batch_size=None)

assert list(dl2) == data  # fails unless you hit a lucky random seed
```

This example is somewhat nonsensical, but it demonstrates that we cannot simply add a `Shuffler`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75014
Approved by: https://github.com/ejguan
2022-04-05 19:53:08 +00:00
amin-nejad
cce831c805 Fix misleading DataLoader docstring
Fixes the description of the `prefetch_factor` argument to `DataLoader`, as discussed in #58030
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74558
Approved by: https://github.com/NivekT
2022-03-28 17:54:48 +00:00
Evren Tumer
7534525735 Reset worker cycle iterator for determinism across runs (#73675)
Summary:
Reset worker cycle iterator for determinism across runs

Fixes https://github.com/pytorch/pytorch/issues/73603

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73675

Reviewed By: bdhirsh

Differential Revision: D34688704

Pulled By: ejguan

fbshipit-source-id: 7bab11f0b9f59645d9b168fa11d92dc7c2c4d34e
(cherry picked from commit eb5fd559224988f9967528e154cf37c5031fe7c2)
2022-03-09 14:55:07 +00:00
Erjia Guan
67a275c293 Fix persistent worker exits before pin_memory thread (#71579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71579

Fixes #1551

As the comment in the code says, register a function to terminate the persistent workers.
Holding a reference to these workers via `atexit` prevents the Python interpreter from killing the persistent worker processes before the `pin_memory_thread` exits.
If the user explicitly destroys the DataLoader iterator, the `atexit` function becomes a no-op (see the sketch below).
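
A minimal sketch of the mechanism (names illustrative):

```py
import atexit

def register_worker_cleanup(workers, shutdown_fn) -> None:
    # The closure keeps a live reference to the workers, so the interpreter
    # cannot reclaim them before atexit handlers have run; after an explicit
    # shutdown the workers list is empty and cleanup is a no-op.
    def cleanup():
        if workers:
            shutdown_fn()
    atexit.register(cleanup)
```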

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D33896537

Pulled By: ejguan

fbshipit-source-id: 36b57eac7523d8aa180180c2b61fc693ea4638ae
(cherry picked from commit 05add2ae0f)
2022-02-01 23:57:17 +00:00
Nikita Shulga
86aefdc082 Revert D33694867: Fix persistent worker exits before pin_memory thread
Test Plan: revert-hammer

Differential Revision:
D33694867 (e2191e7084)

Original commit changeset: 0847f4d424a0

Original Phabricator Diff: D33694867 (e2191e7084)

fbshipit-source-id: 5f28616700d8647cbe468a9e300724a7f0c6cc15
(cherry picked from commit 3d8125ba6d)
2022-01-22 00:09:28 +00:00
Erjia Guan
e2191e7084 Fix persistent worker exits before pin_memory thread (#71579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71579

Fixes #1551

As the comment in the code says, register a function to terminate the persistent workers. `atexit` ensures that termination of persistent workers always happens at the end (after the pin_memory thread exits).
We need such a mechanism because in some rare cases the Python interpreter cleans up the worker processes before the DataLoader iterator.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D33694867

Pulled By: ejguan

fbshipit-source-id: 0847f4d424a0cd6b3c0be8235d505415970254e8
(cherry picked from commit 18ad4621af)
2022-01-21 20:31:16 +00:00
Erjia Guan
0721fc6474 Decouple MapDataPipe from Dataset (#70991)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70991

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D33477680

Pulled By: ejguan

fbshipit-source-id: d3e89492e921a96791319f35052a229684ddf7cf
2022-01-07 14:28:41 -08:00
Kevin Tse
b67eaec853 [DataLoader] more clearly expose 'default_collate' and 'default_convert' to users (#69862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69862

Fixes #69445

cc SsnL VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan, ngimel

Differential Revision: D33068792

Pulled By: NivekT

fbshipit-source-id: ef9791acdc23d014b8761fa7420062d454ce8969
2021-12-14 11:18:26 -08:00
Vitaly Fedyunin
d90012689f [DataPipe] Control shuffle settings from DataLoader2 (#65756)
Summary:
Makes the `shuffle` DataPipe sensitive to the DataLoader(2) `shuffle` kwarg.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65756

Reviewed By: albanD

Differential Revision: D31344867

Pulled By: VitalyFedyunin

fbshipit-source-id: e0084e0ac193ac784d6298328ca1222745681347
2021-12-14 07:35:26 -08:00
Erjia Guan
060e41eafa Forward fix type hint for DataLoader (#66001)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66001

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D31340565

Pulled By: ejguan

fbshipit-source-id: d05ae42ebf93f61d781dc5d81ef0222e24f5acb3
2021-10-01 15:48:45 -07:00
Michael Suo
21da6ae9ce suppress mypy error (#66003)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66003

Differential Revision: D31340874

Test Plan: Imported from OSS

Reviewed By: seemethere

Pulled By: suo

fbshipit-source-id: d9ef0f40625fe5ff21f8a5e044d5a75400367dc2
2021-10-01 09:17:42 -07:00
Roman Shapovalov
fc52f1293e Improve pytorch type hints (Dataloader, trig functions)
Summary:
This is to fix Pyre errors in our applications:
* calling `tensor.cos()` etc.
* creating a data loader with batch sampler that is `List[List[int]]`.

Test Plan: TODO: rebase the diffs and run Pyre.

Reviewed By: ejguan

Differential Revision: D31309564

fbshipit-source-id: 1c6f3070d7570260de170e2fe2153d277b246745
2021-10-01 06:53:57 -07:00
Adam J. Stewart
e5ab0d1013 DataLoader: allow non-integer Samplers (#63500)
Summary:
Not entirely sure how to use TypeVar, but if someone could give me a hint, it would be appreciated. Also, let me know if you want me to add tests to make sure non-integer samplers actually work (see the sketch below). It seems like `test/test_dataloader.py` is the correct location, but that's a big file.

Fixes https://github.com/pytorch/pytorch/issues/63483

ejguan
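
A sketch of what this enables (hypothetical string-keyed sampler):

```py
from typing import Iterator, List
from torch.utils.data import Sampler

class KeySampler(Sampler[str]):
    """Yields string keys for a map-style dataset indexed by keys."""
    def __init__(self, keys: List[str]) -> None:
        self.keys = keys

    def __iter__(self) -> Iterator[str]:
        return iter(self.keys)

    def __len__(self) -> int:
        return len(self.keys)
```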

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63500

Reviewed By: mruberry

Differential Revision: D30403689

Pulled By: ejguan

fbshipit-source-id: 464e09e5aad3215b94a29cc5e21cb4b10ec136e3
2021-08-19 14:55:46 -07:00
Victor Bittorf
91c076eadc Add TorchVitals for DataLoader (#60959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60959

Add TorchVitals for DataLoader; this indicates that the data loader was enabled.

This is a no-op if TORCH_VITALS environment variable is not set.

Test Plan: buck test mode/dbg caffe2/test:torch -- --regex vitals

Reviewed By: VitalyFedyunin

Differential Revision: D29445146

fbshipit-source-id: d5778fff3dafb3c0463fec7a498bff4905597518
2021-06-29 14:08:32 -07:00
Philip Meier
d5988c5eca remove unused type: ignore directives (#60006)
Summary:
During development it is common practice to put `type: ignore` comments on lines that are correct but that `mypy` fails to recognize as such. This often stems from the fact that the `mypy` version in use wasn't able to handle the pattern.

With every new release, `mypy` gets better at handling complex code. In addition to fixing all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see whether they are still needed. Fortunately, we don't need to do this manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out whenever it encounters a `type: ignore` that is no longer needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006

Reviewed By: jbschlosser, malfet

Differential Revision: D29133237

Pulled By: albanD

fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a
2021-06-18 07:23:31 -07:00