Commit Graph

174 Commits

Author SHA1 Message Date
Erjia Guan
5c696443c7 [DataLoader] Modify construct_time_validation to argument_validation (#55836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55836

Change `construct_time_validation` to `argument_validation`, since users should have the flexibility to apply this decorator to any function that requires type validation.

It can still be used for construct-time validation:
```py
class ExampleDataPipe(IterDataPipe):
    @argument_validation
    def __init__(self, dp: IterDataPipe[int]):
        self.dp = dp

    ...
```
Notebook is also updated.
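
As a rough illustration of the idea behind such a decorator (not the PyTorch implementation), the sketch below binds the call's arguments to the function signature and compares each one against its annotation. The name `simple_argument_validation` is hypothetical, and the sketch only handles plain classes as hints, whereas the decorator described above validates DataPipe element types.

```python
import inspect
from functools import wraps
from typing import get_type_hints


def simple_argument_validation(f):
    """Hypothetical stand-in: check arguments against plain-class type hints."""
    signature = inspect.signature(f)
    hints = get_type_hints(f)

    @wraps(f)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            hint = hints.get(name)
            # Only plain classes are checked; unannotated or generic hints are skipped.
            if isinstance(hint, type) and not isinstance(value, hint):
                raise TypeError(
                    f"Expected argument '{name}' of type {hint.__name__}, "
                    f"got {type(value).__name__}"
                )
        return f(*args, **kwargs)

    return wrapper


class Example:
    # Used on __init__, the decorator acts as construct-time validation.
    @simple_argument_validation
    def __init__(self, size: int):
        self.size = size


# The same decorator also works on an ordinary function.
@simple_argument_validation
def scale(value: float, factor: int) -> float:
    return value * factor
```

Because the check runs on every call, the same decorator covers both constructors and ordinary functions, which is the flexibility the rename is meant to convey.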

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D27743478

Pulled By: ejguan

fbshipit-source-id: 49743152d121028cd7d72d89dc7df5c7c7b94c41
2021-05-12 11:58:05 -07:00
Erjia Guan
b58a7c95aa [DataLoader] Raise detailed Error for ForwardRef type (#57824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57824

Implement type checking for string (forward-reference) type hints, and re-raise a detailed exception at compile time.
```py
>>> class InvalidData(Generic[T_co], NamedTuple):  # Invalid generic namedtuple in Python typing
...     name: str
...     data: T_co

>>> class DP(IterDataPipe['InvalidData[int]']):
...     pass
TypeError: InvalidData[int] is not supported by Python typing
```

Add a `__type_class__` attribute to each class, which optimizes the static checking flow by reducing the number of checks.
```py
>>> class DP1(IterDataPipe[Union[int, str]]):
...     pass
>>> class DP2(DP1[int]):
...     pass
>>> list((cls, getattr(cls, '__type_class__', None)) for cls in DP2.__mro__)
[(<class '__main__.DP2'>, False), (<class 'abc.DP1[int]'>, True), (<class '__main__.DP1'>, False), (<class 'abc.IterableDataset[typing.Union[int, str]]'>, True), (<class 'torch.utils.data.dataset.IterableDataset'>, False), (<class 'torch.utils.data.dataset.Dataset'>, None), (<class 'typing.Generic'>, None), (<class 'object'>, None)]
```
Among the classes in `DP2`'s MRO, only `DP2` and `DP1` are statically checked, because their `__type_class__` is `False`. `abc.DP1[int]` and `abc.IterableDataset[typing.Union[int, str]]` are skipped since they are only classes carrying type information.

## Future
When Python 3.6 is deprecated, using TypeAlias rather than TypeMeta can eliminate the use of the `__type_class__` attribute.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D28289104

Pulled By: ejguan

fbshipit-source-id: 1da97460c8bfc48cea7396033fde484a24caba7c
2021-05-11 13:38:30 -07:00
Erjia Guan
ece15f6902 [DataLoader] Change Decoder signature and add MatHandler (#57391)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57391

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28151601

Pulled By: ejguan

fbshipit-source-id: 34814197d2f068cab0c7ca2330152ad588eb1ef0
2021-05-10 06:29:00 -07:00
Sam Estep
75024e228c Add lint for unqualified type: ignore (#56290)
Summary:
The other half of https://github.com/pytorch/pytorch/issues/56272.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290

Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed:

- https://github.com/pytorch/pytorch/runs/2384511062
- https://github.com/pytorch/pytorch/actions/runs/765036024

Reviewed By: seemethere

Differential Revision: D27867219

Pulled By: samestep

fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235
2021-04-21 08:07:23 -07:00
Erjia Guan
0b1c3dfae4 [DataLoader] Typing Enforcement for DataPipe at runtime (#54544)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54544

## Feature
- Add `subinstance(data, type)` to check whether `data` is an instance of a subtype of `type`.
- Add a `runtime_validation` decorator to validate that the data returned from `__iter__` is a subtype instance of the type hint.
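
The two features above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `simple_subinstance` and `simple_runtime_validation` are hypothetical names, only `Any`, `Union`, `List[...]`, and plain classes are handled, and `typing.get_origin`/`get_args` require Python 3.8 or later.

```python
from functools import wraps
from typing import Any, Iterator, List, Union, get_args, get_origin, get_type_hints


def simple_subinstance(data, hint) -> bool:
    """Hypothetical helper: is `data` an instance of `hint` (small subset of typing)?"""
    if hint is Any:
        return True
    origin = get_origin(hint)
    if origin is Union:
        return any(simple_subinstance(data, arg) for arg in get_args(hint))
    if origin is list:
        args = get_args(hint)
        elem_hint = args[0] if args else Any
        return isinstance(data, list) and all(
            simple_subinstance(item, elem_hint) for item in data
        )
    # Plain classes only; other generic hints are not handled by this sketch.
    return isinstance(data, hint)


def simple_runtime_validation(iter_fn):
    """Hypothetical decorator: check every item yielded by `__iter__`."""
    return_hint = get_type_hints(iter_fn).get("return", Iterator[Any])
    args = get_args(return_hint)
    elem_hint = args[0] if args else Any

    @wraps(iter_fn)
    def wrapper(self):
        for item in iter_fn(self):
            if not simple_subinstance(item, elem_hint):
                raise RuntimeError(
                    f"Expected an instance of {elem_hint}, got {type(item).__name__}"
                )
            yield item

    return wrapper


class Numbers:
    def __init__(self, items):
        self.items = items

    @simple_runtime_validation
    def __iter__(self) -> Iterator[int]:
        yield from self.items
```

Since the wrapper is itself a generator, a mismatched item is only reported when iteration reaches it, not when the object is constructed.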

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D27327234

Pulled By: ejguan

fbshipit-source-id: fb6a332762b0fe75284bb2b52a13ed171b42558c
2021-04-02 15:22:32 -07:00
Erjia Guan
1535520f08 [DataLoader] Typing Enforcement for DataPipe at construct-time (#54066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54066

## Feature
- Add a decorator `construct_time_validation` to validate each input datapipe according to the corresponding type hint.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D27327236

Pulled By: ejguan

fbshipit-source-id: a9d4c6edb5b05090bd5a369eee50a6fb4d7cf957
2021-04-02 15:22:29 -07:00
Erjia Guan
44edf8c421 [DataLoader] Typing Enforcement for DataPipe at Compile-time (#54020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54020

## Feature
- Add `issubtype` to check whether one type is a subtype of another.
- Add `_DataPipeMeta` (mimicking Python 3.6 typing)
  - Add `type` attribute for each DataPipe
  - Save original `__init__` function for each DataPipe
  - Validate return hint of `__iter__`
  - Replace `__init__` function based on `type`
    - Fixed type: Put the original `__init__` back if it exists, or use a plain `__init__`
    - Non-fixed type: Add a new `__init__` that copies `cls.type` for each instance. (Optimized for memory)

No errors for the main repo, `torchvision`, `torchaudio`, and `torchtext`.

## Future
- Add same thing for `__getitem__`.
- When DataFrame comes out, add another type for DataFrame with column names and types.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D27327232

Pulled By: ejguan

fbshipit-source-id: fd3a6029c16f5d814b1d7e1b1566fdcd8fd1ad9a
2021-04-02 15:22:27 -07:00
Erjia Guan
560e3be587 [DataLoader] Implement issubtype for type hints (#54299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54299

## Feature
- Check whether a type is a subtype of another type

Prerequisite for the DataPipe typing system.
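
A minimal sketch of what a subtype check over type hints might look like is given below. It is not the implementation from this PR: `simple_issubtype` is a hypothetical name, type parameters are treated as covariant as an assumption, and only `Any`, `Union`, plain classes, and simple parameterized generics such as `List[int]` are covered (TypeVars and `Tuple[int, ...]` are not). It relies on `typing.get_origin`/`get_args`, so it needs Python 3.8 or later.

```python
from typing import Any, List, Union, get_args, get_origin


def simple_issubtype(left, right) -> bool:
    """Hypothetical helper: is hint `left` a subtype of hint `right`?"""
    if right is Any or left == right:
        return True
    if left is Any:
        return False
    # A Union on the left is a subtype only if every member is.
    if get_origin(left) is Union:
        return all(simple_issubtype(member, right) for member in get_args(left))
    # A Union on the right accepts `left` if any member does.
    if get_origin(right) is Union:
        return any(simple_issubtype(left, member) for member in get_args(right))
    # Compare the underlying classes, e.g. list for List[int].
    left_cls = get_origin(left) or left
    right_cls = get_origin(right) or right
    if not issubclass(left_cls, right_cls):
        return False
    right_args = get_args(right)
    if not right_args:
        return True
    left_args = get_args(left)
    # Assumption: parameters are compared pairwise and covariantly.
    return len(left_args) == len(right_args) and all(
        simple_issubtype(l, r) for l, r in zip(left_args, right_args)
    )
```

For example, `simple_issubtype(List[int], List[Union[int, str]])` holds under the covariance assumption, while `simple_issubtype(Union[int, str], int)` does not because `str` is not a subtype of `int`.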

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D27327235

Pulled By: ejguan

fbshipit-source-id: 8f50a663a86540677c9e132ac7c5216fdac46f70
2021-04-02 15:20:55 -07:00
Erjia Guan
fff0a3f906 [DataLoader] ZipIterDataPipe (#53554)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53554

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26913406

Pulled By: ejguan

fbshipit-source-id: 24604b41d08eb6f7689add152229049a4c65c06e
2021-03-12 08:26:21 -08:00
Erjia Guan
1ba80264f4 [DataLoader] ConcatDataPipe (#53301)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53301

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D26829322

Pulled By: ejguan

fbshipit-source-id: eeea42fd9ab267d10f39ad7debc279eaded23570
2021-03-06 07:32:02 -08:00
Erjia Guan
c957e2ab42 Add more datapipe to functional API (#53123)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53123

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D26756638

Pulled By: ejguan

fbshipit-source-id: 6ff0eb6c7ee702056ff19eeb723949e4642f2784
2021-03-03 07:01:00 -08:00
Erjia Guan
89b1053413 [DataLoader] Move BufferedShuffle from Dataset to DataPipe (#52141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52141

Remove BufferShuffleDataSet, as it's not being used anywhere within PyTorch (no usage on GitHub based on a search) and it's not included in the release of PyTorch 1.7.1.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D26710940

Pulled By: ejguan

fbshipit-source-id: 90023b4bfb105d6aa392753082100f9181ecebd0
2021-03-01 12:54:44 -08:00
Erjia Guan
b534466f01 [DataLoader] TransformsIterDataPipe (#52604)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52604

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D26581511

Pulled By: ejguan

fbshipit-source-id: c927726b7afba14586f16cde0237f2cef9080079
2021-02-23 15:47:27 -08:00
Erjia Guan
4ee5bc74d3 [DataLoader] Change signature of Functional DataPipe (#52458)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52458

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D26523282

Pulled By: ejguan

fbshipit-source-id: c7358fc351f859617754a27b8a701d11ada5d61a
2021-02-18 23:30:58 -08:00
Erjia Guan
059c564ba4 [DataLoader] Fix module import (#52224)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52224

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D26429871

Pulled By: ejguan

fbshipit-source-id: fcf2e5435658ecb92af1079def953b08cebb1f7f
2021-02-16 16:12:33 -08:00
Erjia Guan
425a5dc3f7 [DataLoader] Modify SamplerIDP signature (#52104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52104

Make the API of `SamplerIterDataPipe` more reasonable with `sampler_args` and `sampler_kwargs`.
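
The general pattern of forwarding `sampler_args` and `sampler_kwargs` to a sampler class can be sketched as below. This is a plain-Python illustration, not the `SamplerIterDataPipe` code: `SamplerPipeSketch` and `ShuffledIndexSampler` are hypothetical names, and the exact signature in PyTorch may differ.

```python
import random
from typing import Iterator, Sequence


class ShuffledIndexSampler:
    """Hypothetical sampler: yields indices of `data_source` in a seeded random order."""

    def __init__(self, data_source: Sequence, seed: int = 0):
        self.data_source = data_source
        self.seed = seed

    def __iter__(self) -> Iterator[int]:
        indices = list(range(len(self.data_source)))
        random.Random(self.seed).shuffle(indices)
        return iter(indices)


class SamplerPipeSketch:
    """Hypothetical stand-in for a sampler-driven pipe."""

    def __init__(self, source, sampler=ShuffledIndexSampler,
                 sampler_args=None, sampler_kwargs=None):
        self.source = source
        # Use None defaults to avoid sharing mutable default arguments.
        args = () if sampler_args is None else tuple(sampler_args)
        kwargs = {} if sampler_kwargs is None else dict(sampler_kwargs)
        # The pipe builds the sampler itself, so callers pass a class plus its options.
        self.sampler = sampler(self.source, *args, **kwargs)

    def __iter__(self) -> Iterator[int]:
        return iter(self.sampler)
```

Passing the sampler class together with its arguments, rather than a ready-made instance, lets the pipe supply the data source to the sampler on the caller's behalf.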

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D26401494

Pulled By: ejguan

fbshipit-source-id: ee5b5c414782d0880b12968bc9c8aa470b753f6a
2021-02-11 09:29:52 -08:00
Erjia Guan
9eb70c3c78 [DataLoader] Rename Callable to Map IterDataPipe (#51879)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51879

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D26314775

Pulled By: ejguan

fbshipit-source-id: ee77909eae97092155ed6a6c794540e68a04d754
2021-02-09 17:09:06 -08:00
Erjia Guan
104371e1dc [DataLoader] Implement FilterIterDataPipe (#51783)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51783

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D26277688

Pulled By: ejguan

fbshipit-source-id: 25ed7da9da88c030b29627142c2f04fed26cdcda
2021-02-09 17:06:06 -08:00
lixinyu
015cabf82a move GroupByFilename Dataset to DataPipe (#51709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51709

Move GroupByFilename Dataset to DataPipe

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26263585

Pulled By: glaringlee

fbshipit-source-id: 00e3e13b47b89117f1ccfc4cd6239940a40d071e
2021-02-09 03:34:56 -08:00
lixinyu
482b94ae51 move RoutedDecoder Dataset to DataPipe (#51704)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51704

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26245910

Pulled By: glaringlee

fbshipit-source-id: 91e3c9f8a6c1209c1a1a752ba29a80dbd9bf4119
2021-02-09 03:31:07 -08:00
lixinyu
1ee0c42d6d move ZipDataset to Zip DataPipe (#51599)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51599

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26212859

Pulled By: glaringlee

fbshipit-source-id: 3fabcf8876d3c9c24339dbf6a12e0bb04b400108
2021-02-03 15:42:59 -08:00
Erjia Guan
52de407b4b [DataLoader] Rename Functional DataSet to DataPipe (#51488)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51488

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D26209888

Pulled By: ejguan

fbshipit-source-id: cb8bc852b1e4d72be81e0297308a43954cd95332
2021-02-03 07:01:09 -08:00
lixinyu
c0d58bce0d move Tar Dataset to Tar DataPipe (#51398)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51398

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26162319

Pulled By: glaringlee

fbshipit-source-id: a84879fe4ca044e34238d5e1d31a245d4b80ae8e
2021-02-02 07:46:53 -08:00
lixinyu
5ed0ad4b6a DataPipe naming convention update (#51262)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51262

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26120628

Pulled By: glaringlee

fbshipit-source-id: 6855a0dd6d4a93ff93adce1039960ffd7057a827
2021-01-28 17:44:36 -08:00