A user-defined `Mapping` type may carry extra metadata (e.g., pytorch/tensordict#679, https://github.com/pytorch/pytorch/pull/120195#issue-2141716712). Simply using `type(mapping)({k: v for k, v in mapping.items()})` does not take this metadata into account. This PR uses `copy.copy(mapping)` to create a clone of the original collection and then iteratively updates the elements of the clone. This preserves the metadata of the original collection via `copy.copy(...)` rather than relying on the `__init__` method of the user-defined class.
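A minimal sketch of the idea (the `MetaDict` class and the `collate_fn` parameter are hypothetical, for illustration only):
```
import copy

class MetaDict(dict):
    # Hypothetical Mapping subclass that carries extra metadata.
    def __init__(self, *args, meta=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.meta = meta

def collate_mapping(batch, collate_fn):
    elem = batch[0]
    # type(elem)({...}) would re-run __init__ without `meta` and lose it;
    # copy.copy preserves instance attributes, and each value is then
    # overwritten with the collated values from the whole batch.
    clone = copy.copy(elem)
    for key in elem:
        clone[key] = collate_fn([d[key] for d in batch])
    return clone

batch = [MetaDict(a=1, meta="m"), MetaDict(a=2, meta="m")]
out = collate_mapping(batch, collate_fn=list)
assert out.meta == "m" and out["a"] == [1, 2]
```
With `type(elem)({...})` instead, `out.meta` would have been reset to `None` by `__init__`.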
References:
- pytorch/tensordict#679
- Closes #120195
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120553
Approved by: https://github.com/vmoens
As per a request from the Vision team, this PR adds a `collate` function with an extra `collate_fn_map` argument to dispatch custom collate functions for non-collection objects and specific object types.
If the type of the batch element is not present in `collate_fn_map`, the function will iterate over all keys in insertion order and check whether the element's type is a subclass of the key. If so, it will invoke the corresponding collate function.
`default_collate` will then utilize the `collate` function with a few default collate functions registered for `int`, `float`, `str`, and NumPy objects.
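As an illustration, here is a minimal sketch of registering a custom collate function (the `Point` class is hypothetical; in recent PyTorch releases the `collate` helper and `default_collate_fn_map` live in the private module `torch.utils.data._utils.collate`):
```
from torch.utils.data._utils.collate import collate, default_collate_fn_map

class Point:
    # Hypothetical non-collection object to be batched.
    def __init__(self, x, y):
        self.x, self.y = x, y

def collate_point_fn(batch, *, collate_fn_map=None):
    # Collate a batch of Points by collating their coordinates.
    return Point(
        collate([p.x for p in batch], collate_fn_map=collate_fn_map),
        collate([p.y for p in batch], collate_fn_map=collate_fn_map),
    )

def custom_collate(batch):
    # Extend the default mapping instead of replacing it.
    return collate(
        batch,
        collate_fn_map={**default_collate_fn_map, Point: collate_point_fn},
    )
```
`custom_collate` can then be passed to `DataLoader(..., collate_fn=custom_collate)`.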
Benefits:
- Domain teams can register their own collate functions to handle their specific object types.
- It becomes easier for users to extend collation behavior by building on the `collate` function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85748
Approved by: https://github.com/NivekT, https://github.com/pmeier
This is a new version of #15648 based on the latest master branch.
Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.
In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip were causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will disable those tests. (Unfortunately I don't have a tool that will insert the `# xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
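For reference, the skip directive goes inline in the doctest itself; a minimal sketch (the `add` function is made up):
```
def add(a, b):
    """Add two numbers.

    Example:
        >>> # xdoctest: +SKIP
        >>> add(1, 2)
        3
    """
    return a + b
```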
Fixes https://github.com/pytorch/pytorch/issues/71105
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
Summary:
`default_collate`, `default_convert`, and `pin_memory` convert sequences into lists. I believe they should keep the original type when possible (e.g., I have a class that inherits from `list`, comes from a third-party library I can't change, and provides extra functionality).
Note that keeping the type is easy when it accepts an iterable in its constructor, but that's not always the case (e.g., `range`).
Even though this can be accomplished with a custom `default_collate`/`default_convert`, 1) IMHO this is behavior they should support out of the box, and 2) `pin_memory` would still convert to lists.
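A minimal sketch of the type-preserving conversion (the `convert_sequence` helper is illustrative, not the actual patch):
```
def convert_sequence(batch, convert_fn):
    converted = [convert_fn(x) for x in batch]
    try:
        # Keep the original sequence type when its constructor
        # accepts an iterable (e.g., a list subclass)...
        return type(batch)(converted)
    except TypeError:
        # ...and fall back to a plain list for types like `range`,
        # whose constructor does not take an iterable.
        return converted

class TaggedList(list):  # stands in for the third-party list subclass
    pass

assert type(convert_sequence(TaggedList([1, 2]), str)) is TaggedList
assert convert_sequence(range(3), str) == ['0', '1', '2']
```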
cc VitalyFedyunin ejguan NivekT
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68779
Reviewed By: wenleix
Differential Revision: D32651129
Pulled By: ejguan
fbshipit-source-id: 17c390934bacc0e4ead060469cf15dde815550b4
Summary:
This can be a non-trivial saving when there are many workers and a large batch size.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61424
Reviewed By: soulitzer
Differential Revision: D29635477
Pulled By: ejguan
fbshipit-source-id: 1fc48b5964e873bd8833ad81bed9d51b0b6d137e
Summary:
Allow `np.memmap` objects to be processed by `default_collate`.
`np.memmap` objects have the same behavior as NumPy arrays; the only difference is that they are stored in a binary file on disk. However, the `default_collate` function used by the PyTorch DataLoader only accepts `np.ndarray` and rejects `np.memmap` by type checking. This commit allows `np.memmap` objects to be processed by `default_collate`, so that users can use large on-disk arrays with the PyTorch DataLoader.
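For context, `np.memmap` behaves as a drop-in subclass of `np.ndarray`; a minimal illustration (the exact check used in the patch may differ):
```
import numpy as np

# A memory-mapped array backed by a binary file on disk.
arr = np.memmap("data.bin", dtype=np.float32, mode="w+", shape=(4,))
arr[:] = [1.0, 2.0, 3.0, 4.0]

# np.memmap subclasses np.ndarray, so it behaves like a regular array,
# but an exact type-name check ('ndarray') rejects it.
assert isinstance(arr, np.ndarray)
assert type(arr).__name__ == "memmap"
```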
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39847
Reviewed By: ezyang
Differential Revision: D22284650
Pulled By: zou3519
fbshipit-source-id: 003e3208a2afd1afc2e4640df14b3446201e00b4
Summary:
Fixes https://github.com/pytorch/pytorch/issues/23141
In the example below, `default_collate` collates each element of the list. Since the second element isn't present in all samples, it is discarded:
```
from torch.utils.data import Dataset, DataLoader
import numpy as np

class CustomDataset(Dataset):
    def __len__(self):
        return 2

    def __getitem__(self, idx):
        tmp = {
            "foo": np.array([1, 2, 3]),
            "bar": ["X"] * (idx + 1),
        }
        return tmp

training = CustomDataset()
for batch in DataLoader(training, batch_size=2):
    print(batch)
```
Yields
```
{
    'foo': tensor([[1, 2, 3],
                   [1, 2, 3]]),
    'bar': [('X', 'X')]
}
```
Based on the discussion in the issue, it seems the best course of action is to error out in this case. This is consistent with what is done for tensor elements, as seen in [TensorShape.cpp](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/TensorShape.cpp#L1060), which is hit when `torch.stack` is called. In this PR, I introduce a similar message to error out for lists.
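A sketch of the added check, simplified from what `default_collate` does for sequences:
```
from torch.utils.data.dataloader import default_collate

def collate_sequence(batch):
    # Every sample must contribute a sequence of the same length;
    # otherwise zip(*batch) silently drops the trailing elements.
    elem_size = len(batch[0])
    if not all(len(elem) == elem_size for elem in batch):
        raise RuntimeError("each element in list of batch should be of equal size")
    transposed = zip(*batch)
    return [default_collate(samples) for samples in transposed]
```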
SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38492
Differential Revision: D21620396
Pulled By: ezyang
fbshipit-source-id: 17f59fbb1ed1f0d9b2185c95b9ebe55ece701b0c
Summary:
Back in April, malmaud added type annotations for `dataloader.py`. At about the same time, SsnL in https://github.com/pytorch/pytorch/issues/19228 replaced `_DataLoaderIter` with `_BaseDataLoaderIter` and two subclasses, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Probably because these changes happened in parallel, the type stubs and several other references in the codebase were never updated to match this refactoring.
I've gone ahead and updated them to reflect the refactoring in https://github.com/pytorch/pytorch/issues/19228. This fixes the specific type stub/implementation mismatch pointed out in https://github.com/pytorch/pytorch/issues/26673, although not the broader problem that PyTorch doesn't have a test to ensure that the `.pyi` type stub files match the real API defined in the `.py` files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27105
Differential Revision: D17813641
Pulled By: ezyang
fbshipit-source-id: ed7ac025c8d6ad3f298dd073347ec83bb4b6600c
Summary:
This is a modified version of https://github.com/pytorch/pytorch/pull/14705, since the commit structure of that PR is quite messy.
1. Add `IterableDataset`.
2. So we now have two data loading modes: `Iterable` and `Map`.
    1. `Iterable` if the `dataset` is an instance of `IterableDataset`.
    2. `Map` otherwise.
3. Add better support for non-batch loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful for things like bulk loading.
4. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
5. Add `torch.utils.data.get_worker_info`, which returns worker information in a worker process (e.g., worker id, dataset object copy, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` to do per-worker configuration (see the sketch after this list).
6. Add `ChainDataset`, the analog of `ConcatDataset` for `IterableDataset`.
7. Import `torch.utils.data` in `torch/__init__.py`.
8. Add data loader examples and documentation.
9. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`.
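A minimal sketch of per-worker configuration with `get_worker_info`, along the lines of the added docs (the `RangeIterableDataset` name is made up):
```
import math
from torch.utils.data import DataLoader, IterableDataset, get_worker_info

class RangeIterableDataset(IterableDataset):
    # Streams [start, end), sharding the range across workers.
    def __init__(self, start, end):
        self.start, self.end = start, end

    def __iter__(self):
        info = get_worker_info()
        if info is None:
            # Single-process loading: yield the full range.
            return iter(range(self.start, self.end))
        # Worker process: carve out this worker's contiguous shard.
        per_worker = int(math.ceil((self.end - self.start) / info.num_workers))
        lo = self.start + info.id * per_worker
        return iter(range(lo, min(lo + per_worker, self.end)))

loader = DataLoader(RangeIterableDataset(0, 10), num_workers=2)
```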
Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228
Reviewed By: bddppq
Differential Revision: D15058152
fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde
Summary:
Since #1323, tensors are shared via shared memory, but this feature was not active for NumPy arrays.
This PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14534
Differential Revision: D13561649
Pulled By: soumith
fbshipit-source-id: b6bc9e99fb91e8b675c2ef131fba9fa11c1647c0
Summary:
Same as #14668, and was approved there.
ailzhang, please apply this patch to Horizon's `data_streamer.py`: https://gist.github.com/SsnL/020fdb3d6b7016d81b6ba1d04cc41459. Thank you!
Below is the original description at #14668:
While working on the tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is because all functions to be run in multiprocessing must live at the top (module-global) level. Adding more functionality to `dataloader.py` will only make things worse.
So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, the signal handling code in `data._utils.signal_handling`, the collating code in `data._utils.collate`, etc. This split, IMHO, makes the code much clearer. I will base my future changes to `DataLoader` on top of this.
No functionality is changed, except that I added `torch._six.queue`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15331
Reviewed By: yf225
Differential Revision: D13503120
Pulled By: ailzhang
fbshipit-source-id: 94df16b4d80ad1102c437cde0d5a2e62cffe1f8e
Summary:
While working on the tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is because all functions to be run in multiprocessing must live at the top (module-global) level. Adding more functionality to `dataloader.py` will only make things worse.
So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, the signal handling code in `data._utils.signal_handling`, the collating code in `data._utils.collate`, etc. This split, IMHO, makes the code much clearer. I will base my future changes to `DataLoader` on top of this.
No functionality is changed, except that I added `torch._six.queue`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14668
Reviewed By: soumith
Differential Revision: D13289919
Pulled By: ailzhang
fbshipit-source-id: d701bc7bb48f5dd7b163b5be941a9d27eb277a4c