pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Xiang Gao	23174ca71b	[reland] Enable TF32 support for cuBLAS (#41498 ) Summary: fix rocm Pull Request resolved: https://github.com/pytorch/pytorch/pull/41498 Reviewed By: mruberry Differential Revision: D22560572 Pulled By: ngimel fbshipit-source-id: 5ee79e96cb29e70d9180830d058efb53d1c6c041	2020-07-15 21:00:55 -07:00
Shen Li	3a63a939d4	Revert D22517785: [pytorch][PR] Enable TF32 support for cuBLAS Test Plan: revert-hammer Differential Revision: D22517785 (`288ece89e1`) Original commit changeset: 87334c893561 fbshipit-source-id: 0a0674f49c1bcfc98f7f88af5a8c7de93b76e458	2020-07-15 08:15:48 -07:00
Xiang Gao	288ece89e1	Enable TF32 support for cuBLAS (#40800 ) Summary: Benchmark on a fully connected network and torchvision models (time in seconds) on GA100:
| model | batch size | forward(TF32) | forward(FP32) | backward(TF32) | backward(FP32) |
|--------------------|------------|---------------|---------------|----------------|----------------|
| FC 512-128-32-8 | 512 | 0.000211 | 0.000321 | 0.000499 | 0.000532 |
| alexnet | 512 | 0.0184 | 0.0255 | 0.0486 | 0.0709 |
| densenet161 | 128 | 0.0665 | 0.204 | 0.108 | 0.437 |
| googlenet | 256 | 0.0925 | 0.110 | 0.269 | 0.326 |
| inception_v3 | 256 | 0.155 | 0.214 | 0.391 | 0.510 |
| mnasnet1_0 | 512 | 0.108 | 0.137 | 0.298 | 0.312 |
| mobilenet_v2 | 512 | 0.114 | 0.294 | 0.133 | 0.303 |
| resnet18 | 512 | 0.0722 | 0.100 | 0.182 | 0.228 |
| resnext50_32x4d | 256 | 0.170 | 0.237 | 0.373 | 0.479 |
| shufflenet_v2_x1_0 | 512 | 0.0463 | 0.0473 | 0.125 | 0.123 |
| squeezenet1_0 | 512 | 0.0870 | 0.0948 | 0.205 | 0.214 |
| vgg16 | 256 | 0.167 | 0.234 | 0.401 | 0.502 |
| wide_resnet50_2 | 512 | 0.186 | 0.310 | 0.415 | 0.638 |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40800 Reviewed By: mruberry Differential Revision: D22517785 Pulled By: ngimel fbshipit-source-id: 87334c8935616f72a6af5abbd3ae69f76923dc3e	2020-07-14 13:21:10 -07:00
Edward Leardi	6b50874cb7	Fix HTTP links in documentation to HTTPS (#40878 ) Summary: I ran `make linkcheck` using `sphinx.builders.linkcheck` on the documentation and noticed a few links weren't using HTTPS so I quickly updated them all. Pull Request resolved: https://github.com/pytorch/pytorch/pull/40878 Differential Revision: D22404647 Pulled By: ngimel fbshipit-source-id: 9c9756db59197304023fddc28f252314f6cf4af3	2020-07-06 20:05:21 -07:00
Xiang Gao	df8d6eeb19	Update docs about DP and DDP for CUDA (#35063 ) Summary: We should recommend DDP instead of DP. Hope we can also cherry-pick this for 1.5 Pull Request resolved: https://github.com/pytorch/pytorch/pull/35063 Differential Revision: D20549621 Pulled By: ngimel fbshipit-source-id: 86b1b2134664065cc6070ea4212895f993eaf543	2020-03-20 20:06:37 -07:00
Jerry Ma	1610ea8ef8	Comprehensive-ish instrumentation for CUDA memory allocator (#27361 ) Summary: Adds comprehensive memory instrumentation to the CUDA caching memory allocator.
# Counters
Added comprehensive instrumentation for the following stats:
- Allocation requests (`allocation`)
- Allocated memory (`allocated_bytes`)
- Reserved segments from cudaMalloc (`segment`)
- Reserved memory (`reserved_bytes`)
- Active memory blocks (`active`)
- Active memory (`active_bytes`)
- Inactive, non-releasable blocks (`inactive_split`)
- Inactive, non-releasable memory (`inactive_split_bytes`)
- Number of failed cudaMalloc calls that result in a cache flush and retry (`cuda_malloc_retries`)
- Number of OOMs (`num_ooms`)
Except for the last two, these stats are segmented between all memory, large blocks, and small blocks. Along with the current value of each stat, historical counts of allocs/frees as well as peak usage are tracked by the allocator.
# Snapshots
Added the capability to get a "memory snapshot" – that is, to generate a complete dump of the allocator block/segment state.
# Implementation: major changes
- Added `torch.cuda.memory_stats()` (and associated C++ changes) which returns all instrumented stats as a dictionary.
- Added `torch.cuda.snapshot()` (and associated C++ changes) which returns a complete dump of the allocator block/segment state as a list of segments.
- Added memory summary generator in `torch.cuda.memory_summary()` for ease of client access to the instrumentation stats. Potentially useful to dump when catching OOMs. Sample output here: https://pastebin.com/uKZjtupq
# Implementation: minor changes
- Add error-checking helper functions for Python dicts and lists in `torch/csrc/utils/`.
- Existing memory management functions in `torch.cuda` moved from `__init__.py` to `memory.py` and star-imported to the main CUDA module.
- Add various helper functions to `torch.cuda` to return individual items from `torch.cuda.memory_stats()`.
- `torch.cuda.reset_max_memory_cached()` and `torch.cuda.reset_max_memory_allocated()` are deprecated in favor of `reset_peak_stats`. It's a bit difficult to think of a case where only one of those stats should be reset, and IMO this makes the peak stats collectively more consistent.
- `torch.cuda.memory_cached()` and `torch.cuda.max_memory_cached()` are deprecated in favor of `*memory_reserved()`.
- Style (add access modifiers in the allocator class, random nit fixes, etc.)
# Testing
- Added consistency check for stats in `test_cuda.py`. This verifies that the data from `memory_stats()` is faithful to the data from `snapshot()`.
- Ran on various basic workflows (toy example, CIFAR)
# Performance
Running the following speed benchmark: https://pastebin.com/UNndQg50
- Before this PR: 45.98 microseconds per tensor creation
- After this PR: 46.65 microseconds per tensor creation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27361 Differential Revision: D17758747 Pulled By: jma127 fbshipit-source-id: 5a84e82d696c40c505646b9a1b4e0c3bba38aeb6	2019-10-08 15:42:48 -07:00
Tongzhou Wang	98d3d1659e	Document benchmarking practice for CUDA Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23910 Differential Revision: D16732365 Pulled By: ezyang fbshipit-source-id: 24e055602d479293da3e00a7143bba8f92bb7c4a	2019-08-13 15:07:23 -07:00
Tongzhou Wang	058beae411	Add IterableDataset (#19228 ) Summary: This is a modified version of https://github.com/pytorch/pytorch/pull/14705 since commit structure for that PR is quite messy. 1. Add `IterableDataset`. 3. So we have 2 data loader mods: `Iterable` and `Map`. 1. `Iterable` if the `dataset` is an instance of `IterableDataset` 2. `Map` o.w. 3. Add better support for non-batch loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful in doing things like bulk loading. 3. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`. 4. Add `torch.utils.data.get_worker_info` which returns worker information in a worker proc (e.g., worker id, dataset obj copy, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` to do per-worker configuration. 5. Add `ChainDataset`, which is the analog of `ConcatDataset` for `IterableDataset`. 7. Import torch.utils.data in `torch/__init__.py` 9. data loader examples and documentations 10. Use `get_worker_info` to detect whether we are in a worker process in `default_collate` Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023 Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228 Reviewed By: bddppq Differential Revision: D15058152 fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde	2019-06-20 20:12:44 -07:00
Tongzhou Wang	bb89827e1d	Update cuda pinned memory note to include tensor.to (#20977 ) Summary: separate bits of changes from #19228 Pull Request resolved: https://github.com/pytorch/pytorch/pull/20977 Differential Revision: D15511919 Pulled By: soumith fbshipit-source-id: 5015a29cdac6d6e160388c493182c330f0da63ec	2019-05-26 22:22:06 -07:00
Tongzhou Wang	6d307db5b4	Move cuFFT plan cache note outside Best Practices (#19538 ) Summary: I mistakenly put it there. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19538 Differential Revision: D15026500 Pulled By: soumith fbshipit-source-id: 0c13499571fdfd789c3bd1c4b58abd870725d422	2019-04-20 21:39:59 -07:00
Tongzhou Wang	973d51079b	Add device-specific cuFFT plan caches (#19300 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/19224 Pull Request resolved: https://github.com/pytorch/pytorch/pull/19300 Differential Revision: D14986967 Pulled By: soumith fbshipit-source-id: 8c31237db50d6924bba1472434c10326610d9255	2019-04-18 06:39:35 -07:00
SsnL	300dcc3b96	Add cuda.reset_max_memory_* (#15985 ) Summary: Addresses #15968 Pull Request resolved: https://github.com/pytorch/pytorch/pull/15985 Differential Revision: D13649916 Pulled By: soumith fbshipit-source-id: a207aea5709a79dba7a6fc541d0a70103f49efff	2019-01-14 07:31:51 -08:00
cclauss	b0248df72a	Docs: Change cuda(async) —> cuda(non_blocking) (#12158 ) Summary: goldsborough Modify the docs to match the changes made in #4999 Pull Request resolved: https://github.com/pytorch/pytorch/pull/12158 Differential Revision: D10103964 Pulled By: SsnL fbshipit-source-id: 1b8692da86aca1a52e8d2e6cea76a5ad1f71e058	2018-09-28 08:39:27 -07:00
Kaiyu Shi	0169ac5936	Fix sample code for cuda stream (#8319 )	2018-06-10 11:41:50 -04:00
Richard Zou	0430bfe40b	[docs] Update broadcasting and cuda semantics notes (#6904 ) * [docs] Update broadcasting and cuda semantics notes * Update multiprocessing.rst * address comments * Address comments	2018-04-24 13:41:24 -04:00
Kento NOZAWA	c00ee6da8f	Fix typos (#6348 ) * Fix typo * Fix typo * Update faq.rst	2018-04-06 11:06:42 -04:00
Tongzhou Wang	392fc8885c	add faq on cuda memory management and dataloder (#5378 )	2018-02-27 18:35:30 -05:00
Tongzhou Wang	6420c6b224	Improve `torch.cuda.empty_cache` documentation (#4879 ) * add doc about empty_cache wont increase amount of memory available * typo	2018-01-27 04:54:25 -05:00
Yongjik Kim	dd5c195646	More documentation for CUDA stream functions. (#4756 )	2018-01-21 12:58:51 +01:00
Tongzhou Wang	5918243b0c	Methods for checking CUDA memory usage (#4511 ) * gpu mem allocated * add test * addressed some of @apaszke 's comments * cache stats * add more comments about test	2018-01-09 11:47:48 -05:00
SsnL	bb1b826cdc	Exposing emptyCache from allocator (#3518 ) * Add empty_cache binding * cuda.empty_cache document * update docs	2017-11-07 17:00:38 -05:00
Kaixhin	5de7f9e731	Tidy up CUDA notes	2017-11-05 14:42:06 +01:00
Kai Arulkumaran	a7c5be1d45	Document CUDA best practices (#3227 )	2017-10-25 22:38:17 +02:00
Hungryof	73128f7b08	fix minor typos (#2051 ) * Update extending.rst fix typo * Update cuda.rst fix typo	2017-07-11 11:01:41 -04:00
Du Phan	86e40ed875	Fix a typo in docs about pinned memory buffers (#1023 ) * remove misleading guide for BCELoss * fix docs about pinned memory buffers	2017-03-17 05:08:03 -04:00
Eli Stevens	88275da5e8	CUDA documentation tweaks (#858 )	2017-02-26 20:37:43 +01:00
Eli Stevens	b87c113cf4	CUDA documentation enhancement and docs versioning (#848 ) * Add more detail to CUDA documentation Also adds better cross-linking to the pages that discuss relevant topics. * Adds recommendation to torch.save docs * Make the version numbers for the docs dynamic Might need tweaks for beta, 1.0, etc.	2017-02-26 08:33:26 -05:00
Alfredo Canziani	a38749d15f	Fix cuda notes Target GPU is consisten with source GPU	2017-01-27 19:30:49 +01:00
Adam Paszke	4cc11066b2	Add torch.utils.data docs and improve notes (#460 ) * Add torch.utils.data docs and improve notes	2017-01-17 14:51:05 -05:00
Adam Paszke	15c1dad340	Minor fixes and torch.cuda docs	2017-01-16 20:38:14 -05:00

30 Commits