Summary:
This Diff removes the requirement that a `DataPipe` be hashable for the `traverse` function. `traverse` now uses the `id` of each `DataPipe` instance, rather than the `DataPipe` itself, as the key for both the `cache` and the graph.
However, this changes the type of `DataPipeGraph` from `Dict[DataPipe, "DataPipeGraph"]` to `Dict[int, Tuple[DataPipe, "DataPipeGraph"]]`.
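A minimal sketch of the new keying scheme (the `_parents` attribute is a hypothetical stand-in for however upstream DataPipes are discovered; this is not the actual implementation):
```py
from typing import Dict, Tuple

# DataPipeGraph is now keyed by id(datapipe) instead of the DataPipe itself;
# "DataPipe" is left as a forward reference here.
DataPipeGraph = Dict[int, Tuple["DataPipe", "DataPipeGraph"]]

def traverse(datapipe, cache=None) -> "DataPipeGraph":
    if cache is None:
        cache = set()
    dp_id = id(datapipe)  # no hashing of the DataPipe instance required
    if dp_id in cache:
        return {}
    cache.add(dp_id)
    subgraph = {}
    for parent in getattr(datapipe, "_parents", []):  # hypothetical attribute
        subgraph.update(traverse(parent, cache))
    return {dp_id: (datapipe, subgraph)}
```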
Differential Revision: D37354153
Ref PR in TorchData: https://github.com/pytorch/data/pull/559
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80509
Approved by: https://github.com/VitalyFedyunin
This PR adds an attribute and logic to count the number of successful yields from `IterDataPipe`. This information can be useful to fast-forward a DataPipe (or the entire graph) back to a certain state.
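A rough sketch of the idea (the wrapper, helper, and attribute name below are illustrative assumptions, not the PR's actual implementation):
```py
from torch.utils.data import IterDataPipe

class _CountingWrapper(IterDataPipe):
    """Counts items handed to the consumer (illustrative only)."""
    def __init__(self, source_dp):
        self.source_dp = source_dp
        self._number_of_samples_yielded = 0  # attribute name is an assumption

    def __iter__(self):
        for item in self.source_dp:
            self._number_of_samples_yielded += 1  # one more successful yield
            yield item

def _fast_forward(datapipe, n):
    """Hypothetical helper: advance a fresh iterator past the first n yields."""
    it = iter(datapipe)
    for _ in range(n):
        next(it)
    return it
```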
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79657
Approved by: https://github.com/VitalyFedyunin
Summary:
X-link: https://github.com/pytorch/data/pull/547
Fixes https://github.com/pytorch/data/issues/538
- Improve the validation function to raise a warning about an unpicklable function when either a lambda or a local function is provided to a DataPipe.
- The inner function of a `functools.partial` object is extracted for validation as well.
- Mimic the behavior of the `pickle` module for a local lambda function: `pickle` raises an Error about the local function rather than the lambda, so we raise a warning about the local function, not the lambda.
```py
>>> import pickle
>>> def fn():
...     lf = lambda x: x
...     pickle.dumps(lf)
>>> fn()
AttributeError: Can't pickle local object 'fn.<locals>.<lambda>'
```
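A minimal sketch of the check described above (the helper name and warning text are assumptions; the real validation function in the PR may differ):
```py
import functools
import warnings

def _check_unpickable_fn(fn):
    # Unwrap functools.partial so the inner function is validated as well
    if isinstance(fn, functools.partial):
        fn = fn.func
    qualname = getattr(fn, "__qualname__", "")
    # Match pickle's behavior: a local lambda is reported as a local function
    if "<locals>" in qualname:
        warnings.warn(f"Local function is not supported by pickle: {qualname}")
    elif getattr(fn, "__name__", "") == "<lambda>":
        warnings.warn("Lambda function is not supported by pickle")
```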
This Diff also fixes the Error introduced by https://github.com/pytorch/pytorch/pull/79344
Test Plan:
CI on PyTorch and TorchData
Manually validated the tests from TorchVision
Differential Revision: D37417556
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80232
Approved by: https://github.com/NivekT
Fixes https://github.com/pytorch/data/issues/538
- Improve the validation function to raise a warning about an unpicklable function when either a lambda or a local function is provided to a `DataPipe`.
- The inner function of a `functools.partial` object is extracted for validation as well.
- Mimic the behavior of the `pickle` module for a local lambda function: `pickle` raises an Error about the local function rather than the `lambda`, so we raise a warning about the local function, not the lambda.
```py
>>> import pickle
>>> def fn():
...     lf = lambda x: x
...     pickle.dumps(lf)
>>> fn()
AttributeError: Can't pickle local object 'fn.<locals>.<lambda>'
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80140
Approved by: https://github.com/VitalyFedyunin, https://github.com/NivekT
Fixes https://github.com/pytorch/data/issues/426
This PR introduces two main changes:
- It ensures the `ShufflerDataPipe` shares the same seed across distributed processes.
- Users can reset `shuffle` for persistent workers per epoch.
Detail:
- `shared_seed` is shared across distributed and worker processes. It seeds a `shared_rng` that provides seeds to each `ShufflerDataPipe` in the pipeline.
- `worker_loop` now accepts a new `shared_seed` argument to receive this shared seed.
- The `shared_seed` is attached to `_ResumeIteration` to reset the seed per epoch for persistent workers.
- I chose not to touch `base_seed`, simply to avoid BC issues.
I used this [script](https://gist.github.com/ejguan/d88f75fa822cb696ab1bc5bc25844f47) to test the result with `world_size=4`. Please check the result in: https://gist.github.com/ejguan/6ee2d2de12ca57f9eb4b97ef5a0e300b
You can see there are no duplicated or missing elements in each epoch. And, with the same seed, the order of the data remains the same across epochs.
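A rough sketch of the seeding scheme (the broadcast step and the way shufflers are seeded are simplified assumptions, not the exact implementation):
```py
import torch
from torch.utils.data.datapipes.iter import IterableWrapper

# One seed value is shared across all distributed/worker processes (for example,
# generated on rank 0 and broadcast), so every process builds an identical RNG.
shared_seed = 12345  # placeholder; the real value is generated once and shared
shared_rng = torch.Generator()
shared_rng.manual_seed(shared_seed)

# Every process draws per-shuffler seeds from shared_rng in the same order, so
# corresponding ShufflerDataPipes receive identical seeds everywhere.
shufflers = [IterableWrapper(range(10)).shuffle()]  # stand-in for shufflers found in the graph
for shuffler in shufflers:
    seed = int(torch.empty((), dtype=torch.int64).random_(generator=shared_rng))
    torch.manual_seed(seed)  # simplified stand-in for seeding the shuffler directly
    print(list(shuffler))
```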
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78765
Approved by: https://github.com/VitalyFedyunin
This is the first PR to make DataPipe deterministic.
Users should be able to use `torch.manual_seed(seed)` to control the shuffle order in the following cases:
- Directly over `DataPipe`
- Single-process DataLoader
- Multiprocessing DataLoader
Unfortunately, for distributed training, users have to run `apply_shuffle_seed` manually to make sure all distributed processes have the same shuffle order.
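For instance, a minimal usage sketch (assuming the standard `IterableWrapper` and the `shuffle` functional; the exact order depends on the seed):
```py
>>> import torch
>>> from torch.utils.data.datapipes.iter import IterableWrapper
>>> dp = IterableWrapper(range(10)).shuffle()
>>> torch.manual_seed(42)
>>> order1 = list(dp)
>>> torch.manual_seed(42)
>>> order2 = list(dp)
>>> order1 == order2  # same seed, same shuffle order
True
```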
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77741
Approved by: https://github.com/VitalyFedyunin, https://github.com/NivekT
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76384
OSS issue discussion: https://github.com/pytorch/data/issues/346
This diff updates `mux` and `mux_longest` data pipe.
`mux`: Yields one element at a time from each of the input Iterable DataPipes (functional name: ``mux``). As in, one element from the 1st input DataPipe, then one element from the 2nd DataPipe in the next iteration, and so on. It ends when the shortest input DataPipe is exhausted.
`mux` example:
```
>>> from torchdata.datapipes.iter import IterableWrapper
>>> dp1, dp2, dp3 = IterableWrapper(range(3)), IterableWrapper(range(10, 15)), IterableWrapper(range(20, 25))
>>> list(dp1.mux(dp2, dp3))
[0, 10, 20, 1, 11, 21, 2, 12, 22]
```
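For comparison, `mux_longest` keeps yielding until every input is exhausted, skipping inputs that have run out. A sketch with the same inputs (assuming the `mux_longest` functional from TorchData behaves this way):
```
>>> list(dp1.mux_longest(dp2, dp3))
[0, 10, 20, 1, 11, 21, 2, 12, 22, 13, 23, 14, 24]
```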
Test Plan:
buck test mode/opt //caffe2/test:datapipe
https://www.internalfb.com/intern/testinfra/testrun/4785074706282345
Differential Revision: D36017945
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77145
Approved by: https://github.com/NivekT, https://github.com/ejguan
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76384
OSS issue discussion: https://github.com/pytorch/data/issues/346
This diff updates `mux` and `mux_longest` data pipe.
`mux`: Yields one element at a time from each of the input Iterable DataPipes (functional name: ``mux``). As in, one element from the 1st input DataPipe, then one element from the 2nd DataPipe in the next iteration, and so on. It ends when the shortest input DataPipe is exhausted.
`mux` example:
```
>>> from torchdata.datapipes.iter import IterableWrapper
>>> dp1, dp2, dp3 = IterableWrapper(range(3)), IterableWrapper(range(10, 15)), IterableWrapper(range(20, 25))
>>> list(dp1.mux(dp2, dp3))
[0, 10, 20, 1, 11, 21, 2, 12, 22]
```
Test Plan:
buck test mode/dev //pytorch/data/test:tests -- --exact 'pytorch/data/test:tests - test_mux_longest_iterdatapipe (test_datapipe.TestDataPipe)'
https://www.internalfb.com/intern/testinfra/testrun/3096224791148107
Reviewed By: ejguan
Differential Revision: D35799965
fbshipit-source-id: 320e71a342ec27e6e9200624aad42f4b99f97c3a
(cherry picked from commit 741ed595275df6c05026ed6f0e78d7052328fb7d)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73396
Separating DataPipes from Dataset into different files. This makes the code more maintainable and simplifies some of the code generation.
I have also tried to move `datapipe.py` into `torch.utils.data.datapipes`, but that would lead to circular imports and require rewriting many import statements. Should I put more time into going down that path?
Fixes https://github.com/pytorch/data/issues/213
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D34481962
Pulled By: NivekT
fbshipit-source-id: 42fb26fe7fc334636852cfd8719fc807bdaa7912
(cherry picked from commit 81e76a64e297cb5c58caa951c554e49526173936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73119
Test if a DataPipe is serializable after its contents are partially read and completely read. This is especially important for DataPipes with buffers.
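A minimal sketch of such a check, using `IterableWrapper` plus the `shuffle` functional as a buffered example (the actual test in the PR covers many DataPipes):
```py
import pickle
from torch.utils.data.datapipes.iter import IterableWrapper

dp = IterableWrapper(range(10)).shuffle()  # shuffle keeps an internal buffer

it = iter(dp)
next(it)              # partially read
pickle.dumps(dp)      # serialization should still work mid-iteration

_ = list(it)          # completely read
pickle.dumps(dp)      # and after exhaustion as well
```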
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D34354496
Pulled By: NivekT
fbshipit-source-id: 36971d68b9ca1de81fb254e9a459b8f54fe0f9ff
(cherry picked from commit e8f39a7aa364bd2b19145788f7e67c06f948f81b)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72896
Fixing the issue described here: https://github.com/pytorch/data/issues/214
There will be a follow-up PR in TorchData as well
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D34258669
Pulled By: NivekT
fbshipit-source-id: 6dd88250ed14ebe779915dc46139be7e012e9d1b
(cherry picked from commit 025b8ed98019e576bfef04c33a3f33ed1a426a66)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72123
There is a bug in the DataPipe typing system that would take more than a week to fix. I will follow up on it later this month. As the branch cut is today, this PR disables typing to make sure the release works.
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D33920610
Pulled By: ejguan
fbshipit-source-id: febff849ab2272fd3b1c5127a20f27eb82992d9c
(cherry picked from commit ee103e62e7)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70103
I added an argument so the deterministic ordering can be disabled. I called it `deterministic_order` because `sort` could be confusing: the output is in fact sorted, but by directory levels.
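A hypothetical usage sketch, taking the argument name from the description above (the landed API may differ):
```py
>>> from torch.utils.data.datapipes.iter import FileLister
>>> # `deterministic_order` is the name used in this description; treat it as hypothetical
>>> dp = FileLister(root=".", recursive=True, deterministic_order=True)
>>> paths = list(dp)  # paths come back in a stable order, sorted level by level
```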
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70435
Reviewed By: albanD
Differential Revision: D33899755
Pulled By: ejguan
fbshipit-source-id: e8a08f03a49120333b2d27f332cd21a3240a02a9
(cherry picked from commit 4616e43ec3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70215
A few renamings, formatting changes, and additional tests to improve the unit tests.
cc VitalyFedyunin ejguan NivekT
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D33344610
Pulled By: NivekT
fbshipit-source-id: bb36f7452bdc44964c9ce0650c7ae308ba2c5aa5
(cherry picked from commit 0aae20cb27)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71161
Users should import these DataPipes from [TorchData](https://github.com/pytorch/data) if they would like to use them. We will be checking for any downstream library usage before landing this PR.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D33532272
Pulled By: NivekT
fbshipit-source-id: 9dbfb21baf2d1183e0aa379049ad8304753e08a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70367
This PR renames the `FileLoaderIterDataPipe` to `FileOpenerIterDataPipe`. For the sake of not breaking many CI tests immediately, it still preserves `FileLoader` as an alias. This will allow downstream libraries/users to migrate their use cases before we fully remove all references to `FileLoader` from PyTorch.
Fixes https://github.com/pytorch/data/issues/103. More detailed discussion about this decision is also in the linked issue.
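A minimal sketch of the aliasing pattern described above (the deprecation mechanism in the actual PR may differ):
```py
import warnings

class FileOpenerIterDataPipe:
    """Stand-in for the renamed DataPipe (real implementation elided)."""
    def __init__(self, *args, **kwargs):
        self.args, self.kwargs = args, kwargs

def FileLoaderIterDataPipe(*args, **kwargs):
    # Deprecated alias kept so existing `FileLoader` call sites keep working.
    warnings.warn("FileLoader is deprecated; use FileOpener instead", FutureWarning)
    return FileOpenerIterDataPipe(*args, **kwargs)
```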
cc VitalyFedyunin ejguan NivekT pmeier Nayef211
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D33301648
Pulled By: NivekT
fbshipit-source-id: 59278dcd44e372df0ba2001a4eecbf9792580d0b