This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)
Both were reverted due to a conflict with the internal source repo.
Mostly fixes for PEP-484 violations (i.e., a default arg set to None without the type being annotated as Optional), as illustrated below.
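A minimal illustration of the pattern being fixed (the function names here are hypothetical):
```
from typing import Optional

# PEP 484: a parameter whose default is None must be annotated as Optional.
def before(x: int = None): ...           # violation: implicitly Optional
def after(x: Optional[int] = None): ...  # compliant
```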
Plus a few real fixes:
- Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
- Add missing return statement to `torch._export.deserialize_graph`
- Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
- Add assert in `torch/optim/optimizer.py` that an Optional list is not None
TODO (in a follow-up PR):
- Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Unrelated changes, to bypass CI failures due to the gcc-9 dependency update in Ubuntu 18.04:
- Add a hack to `.ci/docker/install_conda.sh` that squashes the older libstdc++ from the conda environment in favor of the one from the OS
- Update bazel CUDA builds to focal, as with libstdc++-6.0.32 bazel builds lose the ability to catch exceptions (probably because they link with cupti statically, but I could not find where that is done)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
Torch's list of dataset wrappers includes:
- `TensorDataset`
- `ConcatDataset`
- `ChainDataset`
`TensorDataset` is useful for stacking sets of tensors, but it can't work with objects that lack a `.size()` method.
This PR proposes `StackDataset`, similar to `TensorDataset` but for the general case, like `ConcatDataset`.
A possible use of `StackDataset` is multimodal networks with different inputs, such as image+text, or stacking a non-tensor input together with the property to predict; see the sketch below.
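A usage sketch (the data here is arbitrary; `StackDataset` accepts both positional and keyword datasets):
```
import torch
from torch.utils.data import StackDataset, TensorDataset

images = TensorDataset(torch.randn(4, 3, 8, 8))
labels = [0, 1, 0, 1]  # non-tensor data; only __getitem__/__len__ are needed

# Keyword arguments yield dict samples; positional arguments yield tuples.
paired = StackDataset(image=images, label=labels)
sample = paired[0]  # {'image': (tensor,), 'label': 0}
```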
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101338
Approved by: https://github.com/ejguan, https://github.com/NivekT
DataLoader supports batched loading from map-style datasets.
This is the fetcher's implementation of the auto-detection of batched-loading support, in `torch.utils.data._utils.fetch._MapDatasetFetcher`:
```
class _MapDatasetFetcher(_BaseDatasetFetcher):
    def fetch(self, possibly_batched_index):
        if self.auto_collation:
            # Batched loading: prefer a vectorized __getitems__ when the
            # dataset defines one, otherwise fall back to per-index access.
            if hasattr(self.dataset, "__getitems__") and self.dataset.__getitems__:
                data = self.dataset.__getitems__(possibly_batched_index)
            else:
                data = [self.dataset[idx] for idx in possibly_batched_index]
        else:
            data = self.dataset[possibly_batched_index]
        return self.collate_fn(data)
```
The Dataset API description now documents this feature.
Additionally, the `Subset` dataset now supports `__getitems__` if its parent dataset supports it, as sketched below.
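A sketch of how a map-style dataset could opt in (the class and field names here are hypothetical):
```
from torch.utils.data import Dataset

class VectorizedDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

    def __getitems__(self, indices):
        # Fetch the whole batch in one call instead of one __getitem__ per index.
        return [self.data[i] for i in indices]
```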
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100375
Approved by: https://github.com/ejguan, https://github.com/NivekT
Add a helpful context message to the `NotImplementedError`s raised by Dataset and IterableDataset, reminding users that they must implement `__getitem__`/`__iter__` in subclasses. Currently, users are presented with a bare `NotImplementedError` that doesn't describe the remedy.
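For reference, a minimal map-style subclass that avoids the error (a toy example):
```
from torch.utils.data import Dataset

class SquaresDataset(Dataset):
    # Map-style subclasses must implement __getitem__ (and usually __len__).
    def __getitem__(self, index):
        return index * index

    def __len__(self):
        return 10
```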
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100667
Approved by: https://github.com/NivekT
This is a new version of #15648 based on the latest master branch.
Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.
In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed and 201 passed tests. The next commits will disable those tests. (Unfortunately, I don't have a tool that will insert the `# xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
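For illustration, the directive goes inside the doctest itself (a toy function, using standard xdoctest directive syntax):
```
def add(a, b):
    """
    Example:
        >>> # xdoctest: +SKIP
        >>> add(1, 2)
        3
    """
    return a + b
```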
Fixes https://github.com/pytorch/pytorch/issues/71105
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
Fixes #78510
This PR adds support for using fractions with `random_split`. This should be completely backwards-compatible, as the fractional-style splitting is only applied when the sum of the input lengths does not exceed 1.0.
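A usage sketch (the seed is arbitrary):
```
import torch
from torch.utils.data import random_split

# Fractional lengths trigger the new splitting path.
train, val = random_split(
    range(10), [0.7, 0.3], generator=torch.Generator().manual_seed(42)
)
```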
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78877
Approved by: https://github.com/ejguan
Summary:
X-link: https://github.com/pytorch/data/pull/368
This PR aims to expose the right data-related APIs.
Two more changes in this PR convert public APIs to private ones:
`check_lambda_fn` -> `_check_lambda_fn`
`deprecation_warning` -> `_deprecation_warning`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76143
Reviewed By: albanD, NivekT
Differential Revision: D35798311
Pulled By: ejguan
fbshipit-source-id: b13fded5c88a533c706702fb2070c918c839dca4
(cherry picked from commit 0b534b829a2e90e1e533951c6d334fdeaa9358b9)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73396
Separating DataPipes from Dataset into different files. This makes the code more maintainable and simplifies some of the code generation.
I have also tried to move `datapipe.py` into `torch.utils.data.datapipes`, but that would lead to circular imports and rewriting many import statements. Should I put more time into going down that path?
Fixes https://github.com/pytorch/data/issues/213
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D34481962
Pulled By: NivekT
fbshipit-source-id: 42fb26fe7fc334636852cfd8719fc807bdaa7912
(cherry picked from commit 81e76a64e297cb5c58caa951c554e49526173936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72896
Fixing the issue described here: https://github.com/pytorch/data/issues/214
There will be a follow-up PR in TorchData as well
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D34258669
Pulled By: NivekT
fbshipit-source-id: 6dd88250ed14ebe779915dc46139be7e012e9d1b
(cherry picked from commit 025b8ed98019e576bfef04c33a3f33ed1a426a66)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72618
The major changes are in torch/utils/data/dataset.py
Let me know if anything is unclear. I'm open to suggestions.
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D34119492
Pulled By: NivekT
fbshipit-source-id: 358cb6d33d18501f9042431350f872ebaa9b4070
(cherry picked from commit 53b484f60a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67783
Add `getstate_hook` to exclude primitive objects and callables during serialization when `exclude_primitive` is enabled for `traverse`.
For graph traversal, we don't have to handle lambdas and similar objects.
This is used by `OnDiskCacheHolder` to trace the DataPipe graph.
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D32146697
Pulled By: ejguan
fbshipit-source-id: 03b2ce981bb21066e807f57c167b77b2d0e0ce61
Summary:
`datasets` needs to be iterable, but also sized, because its length is checked; yet immediately afterwards it is converted to a list. By swapping the order of these two lines (see the sketch below), it no longer needs to be sized.
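A sketch of the reordering, assuming the two lines live in a dataset's `__init__` (as in `ConcatDataset`):
```
from typing import Iterable
from torch.utils.data import Dataset

class ConcatLikeDataset(Dataset):  # hypothetical stand-in
    def __init__(self, datasets: Iterable[Dataset]) -> None:
        # Materialize first, then check: `datasets` now only needs to be
        # iterable, because len() is called on the resulting list.
        self.datasets = list(datasets)
        assert len(self.datasets) > 0, "datasets should not be an empty iterable"
```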
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64114
Reviewed By: H-Huang
Differential Revision: D30641480
Pulled By: ejguan
fbshipit-source-id: 7e16548c2123afa65b83845f9929271fa07fe1e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63422
Fixes #63095
Make `DataChunk` delegate to list methods (see the sketch after this list). Then it will support in-place operations:
- `sort`
- `reverse`
- `append`
- `extend`
- `random.shuffle`
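A usage sketch, assuming `DataChunk` is importable from `torch.utils.data.datapipes.datapipe` (its module location has moved between releases):
```
import random
from torch.utils.data.datapipes.datapipe import DataChunk

chunk = DataChunk([3, 1, 2])
chunk.sort()           # in-place, as on a plain list
chunk.append(4)
random.shuffle(chunk)  # works because mutating methods are delegated
```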
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D30379027
Pulled By: ejguan
fbshipit-source-id: d176bd0cc8b89b915c7bb184ff243ab1f605616d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62768
This is part of the TorchArrow DataFrame support preparation, split into multiple PRs to simplify the review process.
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D30149090
Pulled By: VitalyFedyunin
fbshipit-source-id: a36b5ff56e2ac6b06060014d4cd41b487754acb8
Summary:
During development it is common practice to put `type: ignore` comments on lines that are correct, but that `mypy` doesn't recognize as such. This often stems from the fact that the `mypy` version in use wasn't able to handle the pattern.
With every new release `mypy` gets better at handling complex code. In addition to fixing all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see if they are still needed. Fortunately, we don't need to do this manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out whenever it encounters a `type: ignore` that is no longer needed.
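For illustration, with that option set, a stale suppression is itself reported (a toy line):
```
# With warn_unused_ignores = True in the mypy config, an ignore comment
# that suppresses nothing becomes an error:
x: int = 1  # type: ignore  # error: Unused "type: ignore" comment
```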
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006
Reviewed By: jbschlosser, malfet
Differential Revision: D29133237
Pulled By: albanD
fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a
Summary:
A Dataset can be indexed by a list, but a list cannot be indexed by a list. This raises an error when slicing a Subset initialized with a Subset instead of a dataset.
Fixed the issue by changing the indices to a Tensor, which can be indexed by a list.
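A sketch of the pattern that used to fail (the data is arbitrary):
```
import torch
from torch.utils.data import Subset, TensorDataset

base = TensorDataset(torch.arange(10))
evens = Subset(base, [0, 2, 4, 6, 8])
inner = Subset(evens, [1, 3])  # a Subset of a Subset
batch = inner[[0, 1]]          # previously errored: list indexed by a list
```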
Fixes https://github.com/pytorch/pytorch/issues/59512
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59513
Reviewed By: zou3519
Differential Revision: D29196891
Pulled By: ejguan
fbshipit-source-id: ccde6e474fbcbddd2e9c7c107bc8b5de1307cdb9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54020
## Feature
- Add `issubtype` to check whether one type is a subtype of another
- Add `_DataPipeMeta` (mimicking Python 3.6 typing)
- Add `type` attribute for each DataPipe
- Save original `__init__` function for each DataPipe
- Validate return hint of `__iter__`
- Replace the `__init__` function based on `type`
  - Fixed type: put the original `__init__` back if it exists, or use a plain `__init__`
  - Non-fixed type: add a new `__init__` that copies `cls.type` for each instance (optimized for memory)
No errors for the main repo, `torchvision`, `torchaudio`, and `torchtext`.
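A sketch of what the validation covers (a toy pipe, assuming the `torch.utils.data.IterDataPipe` export):
```
from typing import Iterator
from torch.utils.data import IterDataPipe

# The return hint of __iter__ is checked against the pipe's declared type.
class NumbersPipe(IterDataPipe[int]):
    def __iter__(self) -> Iterator[int]:
        yield from range(3)
```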
## Future
- Add the same for `__getitem__`.
- When DataFrame comes out, add another type for DataFrame with column names and types.
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D27327232
Pulled By: ejguan
fbshipit-source-id: fd3a6029c16f5d814b1d7e1b1566fdcd8fd1ad9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53271
- [x] Add `set_determinism` context manager
- [x] Add `non_deterministic` decorator for `DataPipe`
- Raise an error at construction time for a non-deterministic DataPipe when `determinism` is set to `True`
- [ ] Support `non_deterministic` with an option
- When `GreedyJoin` contains only one datapipe, it should still be deterministic.
Note: the test is in this [PR](https://github.com/facebookexternal/torchdata/pull/15), as the main repo doesn't have a non-deterministic DataPipe yet.
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D26823023
Pulled By: ejguan
fbshipit-source-id: 51bb92fc3d18d1fc9536c1229363c536ad120876
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52141
Remove BufferShuffleDataSet, as it's not being used anywhere within PyTorch (no usage on GitHub based on a search) and it's not included in the PyTorch 1.7.1 release.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D26710940
Pulled By: ejguan
fbshipit-source-id: 90023b4bfb105d6aa392753082100f9181ecebd0
Summary:
While using `torch.utils.data.TensorDataset`, if the sizes of the tensors mismatch, there's now a proper error message.
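A sketch of the failing case (shapes are arbitrary):
```
import torch
from torch.utils.data import TensorDataset

# All tensors must share the same first dimension; a mismatch now raises
# an AssertionError with an informative message instead of a bare one.
TensorDataset(torch.zeros(4, 2), torch.zeros(3))  # size mismatch between tensors
```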
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46905
Reviewed By: ngimel
Differential Revision: D24565712
Pulled By: mrshenli
fbshipit-source-id: 98cdf189591c2a7a1b693627cc8464e8f553d9ee