Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51124
Original commit changeset: 1c7133627da2
Test Plan: Test locally with interpreter_test and on CI
Reviewed By: suo
Differential Revision: D26077905
fbshipit-source-id: fae83bf9822d79e9a9b5641bc5191a7f3fdea78d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50458
libinterpreter.so contains a frozen python distribution including
torch-python bindings.
Freezing refers to serializing bytecode of python standard library modules as
well as the torch python library and embedding them in the library code. This
library can then be dlopened multiple times in one process context, each
interpreter having its own python state and GIL. In addition, each python
environment is sealed off from the filesystem and can only import the frozen
modules included in the distribution.
This change relies on newly added frozenpython, a cpython 3.8.6 fork built for this purpose. Frozenpython provides libpython3.8-frozen.a which
contains frozen bytecode and object code for the python standard library.
Building on top of frozen python, the frozen torch-python bindings are added in
this diff, providing each embedded interpreter with a copy of the torch
bindings. Each interpreter is intended to share one instance of libtorch and
the underlying tensor libraries.
Known issues
- Autograd is not expected to work with the embedded interpreter currently, as it manages
its own python interactions and needs to coordinate with the duplicated python
states in each of the interpreters.
- Distributed and CUDA functionality is disabled in the libinterpreter.so build and needs to be revisited.
- __file__ is not supported in the context of embedded python, since there are no
files backing the library modules; code relying on __file__ will not work.
- __version__ is not properly supported in the embedded torch-python; only a
workaround is in place for now.
Test Plan: tested locally and on CI with cmake and buck builds running torch::deploy interpreter_test
Reviewed By: ailzhang
Differential Revision: D25850783
fbshipit-source-id: a4656377caff25b73913daae7ae2f88bcab8fd88
Summary:
And the underlying torch._C._cuda_canDeviceAccessPeer, which is a wrapper around cudaDeviceCanAccessPeer
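For reference, a hedged usage sketch; it assumes the binding is surfaced as `torch.cuda.can_device_access_peer`, the Python-level wrapper available in current releases:
```python
import torch

# Hedged sketch: torch.cuda.can_device_access_peer is assumed to be the public
# wrapper that ends up calling this binding.
if torch.cuda.device_count() >= 2:
    # True if device 0 can directly read/write memory on device 1 (peer-to-peer).
    print(torch.cuda.can_device_access_peer(0, 1))
```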
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50446
Reviewed By: mrshenli
Differential Revision: D25890405
Pulled By: malfet
fbshipit-source-id: ef09405f115bbe73ba301d608d56cd8f8453201b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49486
Remove code for Python 3.5 and lower.
There's more that can be removed/modernised, but sticking mainly to redundant version checks here, to keep the diff/PR smaller.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46579
Reviewed By: zou3519
Differential Revision: D24453571
Pulled By: ezyang
fbshipit-source-id: c2cfcf05d6c5f65df64d89c331692c9aec09248e
Summary:
Add a new function, torch.cuda.set_per_process_memory_fraction(fraction, device), to torch.cuda. Related: https://github.com/pytorch/pytorch/issues/18626
The fraction (a float between 0 and 1) limits the memory available to the caching allocator on the given GPU device. One can set it on any visible GPU. The allowed memory equals total memory * fraction. Trying to allocate more GPU memory than the allowed value raises an OOM error. This function is similar to TensorFlow's per_process_gpu_memory_fraction.
Note that this setting only limits the caching allocator within one process. If you are using multiprocessing, you need to apply the setting in each subprocess as well, because every subprocess has its own allocator.
## Usage
In some cases one needs to split a GPU device into two parts, which requires setting the limit before any GPU memory is used.
E.g. on device 0, to let each part take half of the memory:
```
torch.cuda.set_per_process_memory_fraction(0.5, 0)
```
Here is an example showing the behavior:
```python
import torch
torch.cuda.set_per_process_memory_fraction(0.5, 0)
torch.cuda.empty_cache()
total_memory = torch.cuda.get_device_properties(0).total_memory
# less than 0.5 will be ok:
tmp_tensor = torch.empty(int(total_memory * 0.499), dtype=torch.int8, device='cuda')
del tmp_tensor
torch.cuda.empty_cache()
# this allocation will raise an OOM:
torch.empty(total_memory // 2, dtype=torch.int8, device='cuda')
"""
It raises an error as follows:
RuntimeError: CUDA out of memory. Tried to allocate 5.59 GiB (GPU 0; 11.17 GiB total capacity; 0 bytes already allocated; 10.91 GiB free; 5.59 GiB allowed; 0 bytes reserved in total by PyTorch)
"""
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48172
Reviewed By: bdhirsh
Differential Revision: D25275381
Pulled By: VitalyFedyunin
fbshipit-source-id: d8e7af31902c2eb795d416b57011cc8a22891b8f
Summary:
Small quality-of-life improvement to NVTX Python bindings, that we're using internally and that would be useful to other folks using NVTX annotations via PyTorch. (And my first potential PyTorch contribution.)
Instead of needing to be careful with try/finally to make sure all your range_push'es are range_pop'ed:
```
nvtx.range_push("Some event")
try:
    # Code here...
finally:
    nvtx.range_pop()
```
you can simply do:
```
with nvtx.range("Some event"):
    # Code here...
```
or even use it as a decorator:
```
class MyModel(nn.Module):
    # Other methods here...
    @nvtx.range("MyModel.forward()")
    def forward(self, *input):
        # Forward pass code here...
```
A couple small open questions:
1. I also added the ability to call `msg.format()` inside `range()`, with the intention that, if there is nothing listening to NVTX events, we should skip the string formatting, to lower the overhead in that case. If you like that idea, I could add the actual "skip string formatting if nobody is listening to events" parts. We can also just leave it as is. Or I can remove that if you folks don't like it. (In the first two cases, should we add that to `range_push()` and `mark()` too?) Just let me know which one it is, and I'll update the pull request.
2. I don't think there are many places for bugs to hide in that function, but I can certainly add a quick test, if you folks want.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42925
Reviewed By: gchanan
Differential Revision: D24476977
Pulled By: ezyang
fbshipit-source-id: 874882818d958e167e624052e42d52fae3c4abf1
Summary:
Follow-up of https://github.com/pytorch/pytorch/issues/46461 with a similar goal
Makes them more readable and possibly faster. Care has to be taken because `map` applies the function immediately while `(x for x in xs)` is a generator expression which gets evaluated later. This is a benefit in some cases where it is not required to actually create the list of values in memory (e.g. when passing to `tuple` or `extend` or `join`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46462
Reviewed By: zou3519
Differential Revision: D24422343
Pulled By: ezyang
fbshipit-source-id: 252e33499c92ac0b15238f2df32681dbbda2b237
Summary:
Amp gradient unscaling is a great use case for multi tensor apply (in fact it's the first case I wrote it for). This PR adds an MTA unscale+infcheck functor. Really excited to have it for `torch.cuda.amp`. izdeby your interface was clean and straightforward to use, great work!
Labeled as bc-breaking because the native_functions.yaml exposure of unscale+infcheck changes from [`_amp_non_finite_check_and_unscale_` to `_amp_foreach_non_finite_check_and_unscale_`]( https://github.com/pytorch/pytorch/pull/44778/files#diff-f1e4b2c15de770d978d0eb77b53a4077L6289-L6293).
The PR also modifies Unary/Binary/Pointwise Functors to
- do ops' internal math in FP32 for FP16 or bfloat16 inputs, which improves precision ([and throughput, on some architectures!](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#arithmetic-instructions)) and has no downside for the ops we care about.
- accept an instantiated op functor rather than an op functor template (`template<class> class Op`). This allows calling code to pass lambdas.
Open question: As written now, the PR has MTA Functors take care of pre- and post-casting FP16/bfloat16 inputs to FP32 before running the ops. However, alternatively, the pre- and post-math casting could be deferred/written into the ops themselves, which gives them a bit more control. I can easily rewrite it that way if you prefer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44778
Reviewed By: gchanan
Differential Revision: D23944102
Pulled By: izdeby
fbshipit-source-id: 22b25ccad5f69b413c77afe8733fa9cacc8e766d
Summary:
Fix `torch._C._autocast_*_nesting` declarations in __init__.pyi
Fix iterable constructor logic: not every iterable can be constructed using the `type(val)(val)` trick; for example, it does not work for `val = range(10)` even though `isinstance(val, Iterable)` is True (see the sketch below)
Change optional resolution logic to meet mypy expectations
Fixes https://github.com/pytorch/pytorch/issues/45436
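A standalone sketch (plain Python, not PyTorch code) of the `type(val)(val)` failure mode described above:
```python
# Illustration of why type(val)(val) is not a general way to rebuild an iterable.
from collections.abc import Iterable

val = [1, 2, 3]
assert type(val)(val) == [1, 2, 3]  # works for list

val = range(10)
print(isinstance(val, Iterable))    # True
try:
    type(val)(val)                  # range(range(10)) -> TypeError
except TypeError as err:
    print("cannot rebuild via type(val)(val):", err)
```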
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45480
Reviewed By: walterddr
Differential Revision: D23982822
Pulled By: malfet
fbshipit-source-id: 6418a28d04ece1b2427dcde4b71effb67856a872
Summary:
NVIDIA GPUs are binary compatible within the same major compute capability revision.
This prevents "GeForce RTX 3080 with CUDA capability sm_86 is not compatible with the current PyTorch installation." messages from appearing, since CUDA-11 does not support code generation for sm_86.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45130
Reviewed By: ngimel
Differential Revision: D23841556
Pulled By: malfet
fbshipit-source-id: bcfc9e8da63dfe62cdec06909b6c049aaed6a18a
Summary:
No need for compatibility wrapper in Python3+ world
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43981
Reviewed By: seemethere
Differential Revision: D23458325
Pulled By: malfet
fbshipit-source-id: 00f822895625f4867c22376fe558c50316f5974d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43247
`torch.cuda.nccl` APIs didn't throw appropriate errors when called
with inputs/outputs that were of the wrong type and it resulted in some cryptic
errors instead.
Adding some error checks with explicit error messages for these APIs.
ghstack-source-id: 110683546
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D23206069
fbshipit-source-id: 8107b39d27f4b7c921aa238ef37c051a9ef4d65b
Summary:
A small clarity improvement to the cuda init docstring
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42923
Reviewed By: zhangguanheng66
Differential Revision: D23080693
Pulled By: mrshenli
fbshipit-source-id: aad5ed9276af3b872c1def76c6175ee30104ccb2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42249
Main change is to bring Caffe2's superior error messages for cuda initialization into c10 and use them in all code paths.
Basic logic:
| Case | Call to device_count() | init_cuda, e.g. allocating tensor |
| -- | -- | -- |
| all good | non-zero | just works |
| no gpus | 0, no warning | throw exception with good message |
| driver issues | 0, produce warning | throw exception with good message |
| out of memory with ASAN | 0, produce warning | throw exception with ASAN message |
Previously, the error thrown from init_cuda was very generic and the ASAN warning (if any) was buried in the logs.
Other clean up changes:
* cache device_count() always in a static variable
* move all asan macros in c10
Test Plan:
Hard to unittest because of build modes. Verified manually that the behavior from the table above holds by running the following script in different modes (ASAN/no-ASAN, CUDA_VISIBLE_DEVICES=):
```
print('before import')
import torch
print('after import')
print('devices: ', torch.cuda.device_count())
x = torch.tensor([1,2,3])
print('tensor creation')
x = x.cuda()
print('moved to cuda')
```
Reviewed By: ngimel
Differential Revision: D22824329
fbshipit-source-id: 5314007313a3897fc955b02f8b21b661ae35fdf5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41330
`torch.cuda.check_error` is annotated as taking an `int` as argument but when running `torch.cuda.check_error(34)` one would get:
```
TypeError: cudaGetErrorString(): incompatible function arguments. The following argument types are supported:
1. (arg0: torch._C._cudart.cudaError) -> str
Invoked with: 34
```
Even if one explicitly casted the argument, running `torch.cuda.check_error(torch._C._cudart.cudaError(34))` would give:
```
AttributeError: 'str' object has no attribute 'decode'
```
This PR fixes both issues (thus allowing `check_error` to be called with an un-casted int) and adds a test.
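A hedged sketch of the call after the fix; the `torch.cuda.CudaError` exception type is assumed from the current `torch.cuda` module:
```python
# A nonzero error code now raises torch.cuda.CudaError (a RuntimeError subclass)
# with the human-readable CUDA error string, even when passed as a plain int.
import torch

try:
    torch.cuda.check_error(34)
except torch.cuda.CudaError as err:
    print(err)
```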
ghstack-source-id: 107628709
Test Plan: Unit tests
Reviewed By: ezyang
Differential Revision: D22500549
fbshipit-source-id: 9170c1e466dd554d471e928b26eb472a712da9e1
Summary:
Add `torch._C._cuda_getArchFlags()`, which returns the list of architectures `torch_cuda` was compiled for.
Add `torch.cuda.get_arch_list()` and `torch.cuda.get_gencode_flags()` methods, which return the architecture list and gencode flags PyTorch was compiled with.
Print a warning if any of the GPUs is not compatible with any of the CUBINs.
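A hedged usage sketch of the new methods; the printed values depend entirely on how the local binary was built:
```python
import torch

print(torch.cuda.get_arch_list())      # e.g. ['sm_60', 'sm_70', 'sm_75', 'sm_80']
print(torch.cuda.get_gencode_flags())  # the matching '-gencode ...' flags as one string
```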
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41173
Differential Revision: D22459998
Pulled By: malfet
fbshipit-source-id: 65d40ae29e54a0ba0f3f2da11b821fdb4d452d95
Summary:
Decouple DataParallel/DistributedDataParallel from CUDA to support more device types.
- Move torch/cuda/comm.py to torch/nn/parallel/comm.py with minor changes for common device support. torch.cuda.comm is kept as-is for backward compatibility.
- Provide common APIs to arbitrary device types without changing existing CUDA APIs in torch.cuda space.
- Replace the torch.cuda calls in DataParellel/DistributedDataParallel with the new APIs.
Related RFC: [https://github.com/pytorch/pytorch/issues/36160](https://github.com/pytorch/pytorch/issues/36160)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38454
Differential Revision: D22051557
Pulled By: mrshenli
fbshipit-source-id: 7842dad0e5d3ca0f6fb760bda49182dcf6653af8
Summary:
I ran `make linkcheck` using `sphinx.builders.linkcheck` on the documentation and noticed a few links weren't using HTTPS so I quickly updated them all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40878
Differential Revision: D22404647
Pulled By: ngimel
fbshipit-source-id: 9c9756db59197304023fddc28f252314f6cf4af3
Summary:
Should close https://github.com/pytorch/pytorch/issues/35810.
I decided to keep sparse handling on the Python side for clarity, although it could be moved to the C++ side (into `_amp_non_finite_check_and_unscale_`) without much trouble.
For non-fp16 sparse grads the logic is simple (call `_amp_non_finite_check_and_unscale_` on `grad._values()` instead of `grad` itself). At least I hope it's that easy.
For fp16 sparse grads, it's trickier. Sparse tensors can be uncoalesced. From the [Note](https://pytorch.org/docs/master/sparse.html#torch.sparse.FloatTensor):
> Our sparse tensor format permits uncoalesced sparse tensors, where there may be duplicate coordinates in the indices; in this case, the interpretation is that the value at that index is the sum of all duplicate value entries.
An uncoalesced scaled fp16 grad may have values at duplicate coordinates that are all finite but large, such that adding them to make the coalesced version WOULD cause overflows.** If I checked `_values()` on the uncoalesced version, it might not report overflows, but I think it should.
So, if the grad is sparse, fp16, and uncoalesced, I still call `_amp_non_finite_check_and_unscale_` to unscale `grad._values()` in-place, but I also double-check the coalesced version by calling a second `_amp_non_finite_check_and_unscale_` on `grad.coalesce()._values()`. `coalesce()` is out-of-place, so this call doesn't redundantly affect `grad._values()`, but it does have the power to populate the same `found_inf` tensor. The `is_coalesced()` check and `coalesce()` probably aren't great for performance, but if someone needs a giant embedding table in FP16, they're better than nothing and memorywise, they'll only create a copy of nnz gradient values+indices, which is still way better than changing the whole table to FP32.
An `unscale` variant with liberty to create unscaled grads out-of-place, and replace `param.grad` instead of writing through it, could get away with just one `_amp_non_finite_check_and_unscale_`. It could say `coalesced = grad.coalesced()`, do only the stronger `_amp_non_finite_check_and_unscale_` on `coalesced._values()`, and set `param.grad = coalesced`. I could even avoid replacing `param.grad` itself by going one level deeper and setting `param.grad`'s indices and values to `coalesced`'s, but that seems brittle and still isn't truly "in place".
** you could whiteboard an uncoalesced fp32 grad with the same property, but fp32's range is big enough that I don't think it's realistic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36786
Reviewed By: ezyang
Differential Revision: D22202832
Pulled By: ngimel
fbshipit-source-id: b70961a4b6fc3a4c1882f65e7f34874066435735
Summary:
Currently, a custom autograd function written with
```
@torch.cuda.amp.custom_fwd(cast_inputs=dtype)
def forward(ctx, *args):
    ...
```
casts incoming floating-point CUDA tensors to `dtype` unconditionally, regardless of whether the function executes in an autocast-enabled region. I think I had the wrong idea there. Autocast-disabled regions should give the user control of input types. Also, `custom_fwd(cast_inputs=dtype)`-decorated functions' behavior should align with native fp32list/fp16list functions. C++-side casting wrappers have no effect when autocast is disabled, and `custom_fwd`'s casting should behave the same way.
The present PR changes `custom_fwd` so it only casts in autocast-enabled regions (also updates custom_fwd to ignore fp64 inputs, like the C++ wrappers).
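A hedged sketch of the intended behavior, modeled on the `custom_fwd`/`custom_bwd` pattern the amp docs later adopted; `custom_bwd` is assumed to come from the same module:
```python
import torch
from torch.cuda.amp import autocast, custom_fwd, custom_bwd

# Casting happens only inside autocast-enabled regions; outside them, inputs
# keep whatever dtypes the caller supplied.
class ForceFP32MM(torch.autograd.Function):
    @staticmethod
    @custom_fwd(cast_inputs=torch.float32)
    def forward(ctx, a, b):
        ctx.save_for_backward(a, b)
        return a.mm(b)

    @staticmethod
    @custom_bwd
    def backward(ctx, grad):
        a, b = ctx.saved_tensors
        return grad.mm(b.t()), a.t().mm(grad)

a = torch.randn(4, 4, device='cuda', requires_grad=True)
b = torch.randn(4, 4, device='cuda', requires_grad=True)

with autocast():
    out = ForceFP32MM.apply(a, b)    # floating CUDA inputs are cast to float32
out_no_ac = ForceFP32MM.apply(a, b)  # outside autocast: no casting is performed
out.sum().backward()
```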
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36171
Differential Revision: D22179511
Pulled By: ngimel
fbshipit-source-id: 5a93d070179a43206066bce19da0a5a19ecaabbd
Summary:
I.e. do not accept `bytes` as possible type of `device` argument in
`torch.cuda._get_device_index`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40322
Differential Revision: D22176885
Pulled By: malfet
fbshipit-source-id: 2f3a46174161f1cdcf6a6ad94a31e54b18ad6186
Summary:
Use it from both __init__ and streams to define dummy types when CUDA is missing
Fix accidental reference of global `storage_name` from `_dummy_type`
Add type annotations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40177
Differential Revision: D22106922
Pulled By: malfet
fbshipit-source-id: 52fbfd91d70a78eb14d7ffda109c02ad1231497e
Summary:
While working on https://github.com/pytorch/pytorch/issues/38911, I realized that `nccl.reduce` only needs a single output tensor, while our current implementation requires a list of output tensors. This, along with a TODO I fixed in reduce_add, should give some speed-up for data parallel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39688
Differential Revision: D22034547
Pulled By: mrshenli
fbshipit-source-id: e74d54d673ebbb062474b1bb5cc93a095a3a5f6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39483
I fixed all of the new errors that occurred because of the upgrade.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21884575
Pulled By: ezyang
fbshipit-source-id: 45c8e1f1ecb410c8d7c46dd3922ad70e982a0685
Summary:
Following up on this: https://github.com/pytorch/pytorch/pull/35851 cross dtype storage copy is not being used internally, so I have not included cross dtype copy for complex.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35771
Differential Revision: D21319650
Pulled By: anjali411
fbshipit-source-id: 07c72996ee598eba0cf401ad61534494d6f5b5b3
Summary:
- added tests that showcase the problems
- fixed the problems
These changes would allow me to remove many "# type: ignore" comments in my codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36358
Differential Revision: D21230704
Pulled By: ezyang
fbshipit-source-id: e6d475a0aa1fb40258fa0231ade28c38108355fb
Summary:
Several people have asked me about proper Amp usage with gradient accumulation. In particular, it's [unclear to people](https://github.com/NVIDIA/apex/issues/439#issuecomment-610351482) that you should only call `scaler.unscale_()` (if desired) and `scaler.update()` in iterations where you actually plan to step. This PR adds a minimal accumulation example.
I built the docs locally and it looks free from sphinx errors, at least.
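For reference, a minimal self-contained sketch of the accumulation pattern (my paraphrase, not the exact example added to the docs):
```python
import torch

model = torch.nn.Linear(10, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()
data = [(torch.randn(4, 10, device='cuda'), torch.randn(4, 10, device='cuda'))
        for _ in range(8)]
accum_steps = 4

scaler = torch.cuda.amp.GradScaler()
for i, (input, target) in enumerate(data):
    loss = loss_fn(model(input), target) / accum_steps
    scaler.scale(loss).backward()        # accumulate scaled gradients
    if (i + 1) % accum_steps == 0:
        scaler.unscale_(optimizer)       # optional; only in iterations that step
        scaler.step(optimizer)           # skipped internally if infs/NaNs are found
        scaler.update()                  # update the scale only when stepping
        optimizer.zero_grad()
```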
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36601
Differential Revision: D21082295
Pulled By: ngimel
fbshipit-source-id: b2faa6c02b9f7e1972618a0f1d5360a03f0450ac
Summary:
Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081
In-place ops and ops with user-supplied `out=...` can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/pull/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests. Support for these ops (much of which has already been written) will be broken into later PRs.
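A hedged sketch of out-of-place autocasting under the `torch.cuda.amp.autocast` name the feature shipped under:
```python
import torch

model = torch.nn.Linear(16, 16).cuda()
data = torch.randn(8, 16, device='cuda')

with torch.cuda.amp.autocast():
    out = model(data)          # matmul-heavy ops run in float16
    loss = out.float().sum()   # explicitly cast back for the fp32 reduction

loss.backward()                # backward runs outside the autocast region
```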
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32140
Differential Revision: D20346700
Pulled By: ezyang
fbshipit-source-id: 12d77b3917310186fbddf11c59b2794dc859131f
Summary:
This PR aims to improve the interoperability with [CuPy](https://github.com/cupy/cupy/pulls).
Instead of keeping two separate and conflicting memory pools, with this PR CuPy can allocate memory directly from the PyTorch allocator by means of this proposal: https://github.com/cupy/cupy/pull/3126
We would like to gather feedback on whether this approach makes sense for PyTorch, or whether alternative designs would be preferable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33860
Differential Revision: D20212788
Pulled By: ngimel
fbshipit-source-id: bc1e08a66da1992d26021147bf645dc65239581c
Summary:
Hard to get right locally... I can build the docs, but the local build never quite matches what it looks like live. The bullet-point indentation was just an oversight.
Removing the `Returns:` formatting tabs because they take up a lot of space when rendered and add no clarity. Some functions in PyTorch [do use them](https://pytorch.org/docs/master/torch.html#torch.eye), but [many don't bother](https://pytorch.org/docs/master/torch.html#torch.is_tensor), so apparently some people share my feelings (not using them is in line with existing practice).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33832
Differential Revision: D20135581
Pulled By: ngimel
fbshipit-source-id: bc788a7e57b142f95c4fa5baf3fe01f94c45abd8
Summary:
Also, the Windows memory failures responsible for the earlier reversion have been fixed.
This PR (initially) contains 2 commits:
* a revert of the revert
* all changes to implement the original Apex scale update heuristic, squashed into a single commit for easier diff review
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33366
Differential Revision: D20099026
Pulled By: ngimel
fbshipit-source-id: 339b9b6bd5134bf055057492cd1eedb7e4461529
Summary:
This PR implements the gradient scaling API that mruberry, jjsjann123, ngimel, zdevito, gchanan and I have been discussing. Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081.
Volume-wise, this PR is mostly documentation and tests. The Python API (found entirely in `torch/cuda/amp/amp_scaler.py`) is lightweight. The exposed functions are intended to make the implementation and control flow of gradient scaling convenient, intuitive, and performant.
The API is probably easiest to digest by looking at the documentation and examples. `docs/source/amp.rst` is the homepage for the Automatic Mixed Precision package. `docs/source/notes/amp_examples.rst` includes several examples demonstrating common but not-immediately-obvious use cases. Examples are backed by tests in `test_cuda.py` (and thankfully the tests pass :P).
Two small utility kernels have been added in `native/cuda/AmpKernels.cu` to improve performance and avoid host-device synchronizations wherever possible.
Existing optimizers, both in the wild and in Pytorch core, do not need to change to use the scaling API.
However, the API was also designed to establish a contract between user scripts and optimizers such that writers of _new_ custom optimizers have the control points they need to implement fast, optionally sync-free updates. User scripts that obey the scaling API can drop such custom optimizers in and reap performance benefits without having to change anything aside from the optimizer constructor itself. [I know what the contract with custom optimizers should be](35829f24ef/torch/cuda/amp/amp_scaler.py (L179-L184)), but I'm waiting for review on the rest of the API before I go about documenting it (it will be given a dedicated section in `docs/source/notes/amp_examples.rst`).
Currently, the gradient scaling examples do not include the auto-casting API as discussed in https://github.com/pytorch/pytorch/issues/25081. The gradient scaling API is intended to be orthogonal/modular relative to autocasting. Without auto-casting the gradient scaling API is fully use-_able_, but not terribly use-_ful_, so it's up to you guys whether you want to wait until auto-casting is ready before merging the scaling API as well.
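A hedged sketch of the core update loop the API is built around, assuming the scaler is exposed as `torch.cuda.amp.GradScaler` (the working file in this PR is `amp_scaler.py`):
```python
import torch

model = torch.nn.Linear(8, 8).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    input = torch.randn(4, 8, device='cuda')
    optimizer.zero_grad()
    loss = model(input).pow(2).mean()
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips the step on inf/NaN grads
    scaler.update()                 # grows/shrinks the scale for the next iteration
```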
### Todo
- [ ] How do I get c10 registered status for my two custom kernels? They're very simple.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26512
Differential Revision: D19859905
Pulled By: mruberry
fbshipit-source-id: bb8ae6966214718dfee11345db824389e4286923
Summary:
Fixes https://github.com/pytorch/pytorch/issues/23401
We cannot rely on `multiprocessing.util.register_after_fork` since it is only
called for processes created by the `multiprocessing` module and not `os.fork()`.
Moving to `pthread_atfork` ensures the handler is always called. However, I don't think it's safe to call python functions inside of the `atfork` handler, so the python code has to be a bit more careful when checking `_initialized`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29101
Differential Revision: D18355451
Pulled By: ezyang
fbshipit-source-id: 4d4253a3669796212c099dad4e5bdfdb0df40469
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27850
Many of these are real problems in the documentation (i.e., link or
bullet point doesn't display correctly).
Test Plan: - built and viewed the documentation for each change locally.
Differential Revision: D17908123
Pulled By: zou3519
fbshipit-source-id: 65c92a352c89b90fb6b508c388b0874233a3817a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27782
Warnings show up when running `make html` to build documentation. All of
the warnings are very reasonable and point to bugs in our docs. This PR
attempts to fix most of those warnings.
In the future we will add something to the CI that asserts that there
are no warnings in our docs.
Test Plan: - build and view changes locally
Differential Revision: D17887067
Pulled By: zou3519
fbshipit-source-id: 6bf4d08764759133b20983d6cd7f5d27e5ee3166
Summary:
Adds comprehensive memory instrumentation to the CUDA caching memory allocator.
# Counters
Added comprehensive instrumentation for the following stats:
- Allocation requests (`allocation`)
- Allocated memory (`allocated_bytes`)
- Reserved segments from cudaMalloc (`segment`)
- Reserved memory (`reserved_bytes`)
- Active memory blocks (`active`)
- Active memory (`active_bytes`)
- Inactive, non-releasable blocks (`inactive_split`)
- Inactive, non-releasable memory (`inactive_split_bytes`)
- Number of failed cudaMalloc calls that result in a cache flush and retry (`cuda_malloc_retries`)
- Number of OOMs (`num_ooms`)
Except for the last two, these stats are segmented between all memory, large blocks, and small blocks. Along with the current value of each stat, historical counts of allocs/frees as well as peak usage are tracked by the allocator.
# Snapshots
Added the capability to get a "memory snapshot" – that is, to generate a complete dump of the allocator block/segment state.
# Implementation: major changes
- Added `torch.cuda.memory_stats()` (and associated C++ changes) which returns all instrumented stats as a dictionary.
- Added `torch.cuda.snapshot()` (and associated C++ changes) which returns a complete dump of the allocator block/segment state as a list of segments.
- Added memory summary generator in `torch.cuda.memory_summary()` for ease of client access to the instrumentation stats. Potentially useful to dump when catching OOMs. Sample output here: https://pastebin.com/uKZjtupq
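A hedged sketch of reading the new instrumentation; the dotted key names (stat, pool, metric) follow the current `torch.cuda.memory_stats()` output and may differ in detail:
```python
import torch

x = torch.empty(1024, 1024, device='cuda')

stats = torch.cuda.memory_stats()
print(stats["allocated_bytes.all.current"])  # bytes currently allocated
print(stats["reserved_bytes.all.peak"])      # peak bytes reserved via cudaMalloc
print(torch.cuda.memory_summary())           # human-readable report of all stats
```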
# Implementation: minor changes
- Add error-checking helper functions for Python dicts and lists in `torch/csrc/utils/`.
- Existing memory management functions in `torch.cuda` moved from `__init__.py` to `memory.py` and star-imported to the main CUDA module.
- Add various helper functions to `torch.cuda` to return individual items from `torch.cuda.memory_stats()`.
- `torch.cuda.reset_max_memory_cached()` and `torch.cuda.reset_max_memory_allocated()` are deprecated in favor of `reset_peak_stats`. It's a bit difficult to think of a case where only one of those stats should be reset, and IMO this makes the peak stats collectively more consistent.
- `torch.cuda.memory_cached()` and `torch.cuda.max_memory_cached()` are deprecated in favor of `*memory_reserved()`.
- Style (add access modifiers in the allocator class, random nit fixes, etc.)
# Testing
- Added consistency check for stats in `test_cuda.py`. This verifies that the data from `memory_stats()` is faithful to the data from `snapshot()`.
- Ran on various basic workflows (toy example, CIFAR)
# Performance
Running the following speed benchmark: https://pastebin.com/UNndQg50
- Before this PR: 45.98 microseconds per tensor creation
- After this PR: 46.65 microseconds per tensor creation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27361
Differential Revision: D17758747
Pulled By: jma127
fbshipit-source-id: 5a84e82d696c40c505646b9a1b4e0c3bba38aeb6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25788
Previously, I thought that _lazy_init held the GIL throughout initialization, so
I could write the code in a single-threaded manner. This is not true; it
releases the GIL at various points, which makes it possible for another thread to
race with initialization.
The correct fix is to add locking for the initialization section, so other
threads wait until the first thread finishes initializing before being let
in. There is some subtlety with how to handle lazy calls, which will call
_lazy_init reentrantly; this is handled using TLS that lets you know if you
are the initializing thread (and therefore reentrant calls are OK.)
Fixes #16559
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17366348
Pulled By: ezyang
fbshipit-source-id: 99b982709323e2370d03c127c46d87be97495916
Summary:
Currently, set_rng_state and get_rng_state do not accept a string as their device parameter. This commit lets them accept strings.
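A hedged sketch of the new flexibility:
```python
import torch

# The device may now be passed as a plain string instead of a torch.device.
state = torch.cuda.get_rng_state('cuda:0')
torch.cuda.set_rng_state(state, 'cuda:0')
```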
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23448
Differential Revision: D16527172
Pulled By: soumith
fbshipit-source-id: 8f9a2129979706e16877cc110f104770fbbe952c
Summary:
Added stubs for:
* The `device` module
* The `cuda` module
* Parts of the `optim` module
* Began adding stubs for the `autograd` module. I'll annotate more later but `no_grad` and friends are probably the most used exports from it so it seemed like a good place to start.
This would close #16996, although comments on that issue reference other missing stubs so maybe it's worth keeping open as an umbrella issue.
The big remaining missing package is `nn`.
Also added a `py.typed` file so mypy will pick up on the type stubs. That closes #17639.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18511
Differential Revision: D14715053
Pulled By: ezyang
fbshipit-source-id: 9e4882ac997063650e6ce47604b3eaf1232c61c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to noqa a number of imports in __init__. Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it. Left for future work.
Be careful! flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments. flake8-3 will
report an import unused; flake8-2 will not. For now, I just
noqa'd all these sites.
All the changes were done by hand.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
Summary:
This is to fix #16141 and similar issues.
The idea is to track a reference to every shared CUDA Storage and deallocate memory only after a consumer process deallocates received Storage.
ezyang Done with cleanup. Same (insignificantly better) performance as in the file-per-share solution, but it handles millions of shared tensors easily. Note: documentation is still in progress.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16854
Differential Revision: D13994490
Pulled By: VitalyFedyunin
fbshipit-source-id: 565148ec3ac4fafb32d37fde0486b325bed6fbd1
Summary:
When switching back to `d0` from a stream on a different device `d1`, we need to restore the current streams on both `d0` and `d1`. The current implementation only does that for `d0`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17439
Differential Revision: D14208919
Pulled By: mrshenli
fbshipit-source-id: 89f2565b9977206256efbec42adbd789329ccad8
Summary:
This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this:
0. Storage Implementation (this change)
1. Tensor Creation.
2. Tensor Conversions.
3. Tensor Indexing.
4. Tensor Operations.
5. Back compatibility related changes.
This feature was requested by the community:
https://github.com/pytorch/pytorch/issues/4764
https://github.com/pytorch/pytorch/issues/4219
https://github.com/pytorch/pytorch/issues/4288
**Change**:
Added boolean type to the Storage class for CPU and CUDA backends.
**Tested via**:
1. unit tests
2. running this:
-> import torch
-> torch.BoolStorage
<class 'torch.BoolStorage'>
-> torch.cuda.BoolStorage
<class 'torch.cuda.BoolStorage'>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16810
Reviewed By: gchanan
Differential Revision: D14087246
Pulled By: izdeby
fbshipit-source-id: 042642ced1cb0fd1bb6bff05f9ca871a5c54ee5e
Summary:
1. Added `torch/csrc/cuda/Event.h` and `torch/csrc/cuda/Event.cpp` to bind Python Event class to C++ implementation.
2. Move all CUDA runtime invocations from `torch/cuda/streams.py` to C++
3. Added tests to cover Stream and Event APIs. ~(event IPC handle tests is introduced in #15974)~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15937
Differential Revision: D13649001
Pulled By: mrshenli
fbshipit-source-id: 84ca58f35f6ba679a4ba33150ceba678d760d240
Summary:
See #15682
Pushing up this small PR to check if I am doing the right thing. If correct, more will follow for other Stream APIs. Questions will be added inline.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15737
Differential Revision: D13581400
Pulled By: mrshenli
fbshipit-source-id: 24afed7847b89b62f0692c79a101ec7ff9d9ee4d
Summary:
see #15682
This is a quick fix implementing the simpler solution suggested by colesbury. As the benchmark result shows, it slows down `Stream.query()` by ~20%. I would be happy to further pursue a more complex solution by implementing this in C++/ATen, but I would still vote to merge this quick fix first just to get rid of the bug sooner.
~Test TBA~ Added
FYI jeffreyksmithjr
now
```python
In [1]: def f():
   ...:     d0 = torch.device('cuda:0')
   ...:     d1 = torch.device('cuda:1')
   ...:     with torch.cuda.device(d0):
   ...:         s0 = torch.cuda.current_stream()
   ...:     with torch.cuda.device(d1):
   ...:         s1 = torch.cuda.current_stream()
   ...:     s0.query()
   ...:     s1.query()
In [4]: %timeit f()
38.1 µs ± 4.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [5]: %timeit f()
37.6 µs ± 2.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
before
```python
In [4]: %timeit f()
28.5 µs ± 1.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [5]: %timeit f()
35.3 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15689
Differential Revision: D13571697
Pulled By: mrshenli
fbshipit-source-id: 4fe697f91248c6419136d37bb5b7147e612e2f4c
Summary:
Now that `cuda.get/set_rng_state` accept `device` objects, the default value should be a device object, and the doc should mention so.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14324
Reviewed By: ezyang
Differential Revision: D13528707
Pulled By: soumith
fbshipit-source-id: 32fdac467dfea6d5b96b7e2a42dc8cfd42ba11ee
Summary:
Addresses #918; interpolation results should be similar to TensorFlow's.
* Adds bicubic interpolation operator to `nn.functional.interpolate`
* Corresponding test in `test_nn.py`
The operator is added in legacy `TH` to be aligned with the other upsampling operators; they can be refactored/moved to ATen all at once when #10482 is resolved
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9849
Differential Revision: D9007525
Pulled By: driazati
fbshipit-source-id: 93ef49a34ce4e5ffd4bda94cd9a6ddc939f0a4cc
Summary: Record unit of time for torch.cuda.Event's elapsed_time
Differential Revision: D13467646
Pulled By: zou3519
fbshipit-source-id: 4f1f4ef5fa4bc5a1b4775dfcec6ab155e5bf8d6e
Summary:
In `broadcast_coalesced`, since multiple variables can be "views" of a big flattened tensor, they can share the same version counter. However, this base flat tensor is not exposed and they don't share any memory locations, so this is not necessary. Furthermore, it can cause problems, e.g., when two buffers are broadcast together in `DataParallel` and one of them is modified in-place during `forward` but the other is needed in backward, autograd engine will complain.
Fixing the bug discovered at https://github.com/pytorch/pytorch/pull/13350#issuecomment-436011370
edit: This is a very real problem. E.g., consider using Spectral Norm + Batch Norm together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13594
Differential Revision: D12967311
Pulled By: SsnL
fbshipit-source-id: 52998dbabe149f575cf0fb79e7016f0b95e4b9e5
Summary:
The pytorch.org site redirects all of the http:// requests to the https:// site anyway, so the comments and error messages might as well refer directly to the https:// site. The GitHub project description should also be updated to point to https://pytorch.org
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12636
Differential Revision: D10377099
Pulled By: soumith
fbshipit-source-id: f47eaba1dd3eecc5dbe62afaf7022573dc3fd039
Summary:
Background: we run PyTorch in embedded C++ pipelines, running in C++ GUIs in https://github.com/Kitware/VIAME, and without this addition the call was failing with the below error, but only on certain Windows platforms/configurations:
OSError: [WinError 6] The handle is invalid
At:
C:\Program Files\VIAME\Python36\site-packages\torch\cuda\__init__.py(162): _lazy_init
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(249): <lambda>
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(182): _apply
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(176): _apply
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(249): cuda
C:\Program Files\VIAME\lib\python3.6None\site-packages\kwiver\arrows\pytorch\pytorch_resnet_f_extractor.py(74): __init__
C:\Program Files\VIAME\lib\python3.6None\site-packages\kwiver\processes\resnet_descriptors.py(132): _configure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10379
Differential Revision: D9330772
Pulled By: ezyang
fbshipit-source-id: 657ae7590879004558158d3c4abef2ec11d9ed57
Summary:
As I try to replicate DP in C++, I need to move some functions into C++ from Python. This PR ports the scatter and gather primitives from Python in torch/cuda/comm.py to C++ in torch/csrc/cuda/comm.cpp. The basic infrastructure was already there, since apaszke had rewritten broadcast in C++ already.
I'm not very familiar with this code, so let me know if I'm doing something wrong. I largely just literally translated the code.
I don't know how "public" `torch.cuda.comm` is, but I feel like the `destination_index` parameter for `gather` should be changed from -1 indicating CPU to `None` indicating CPU, and `-1` indicating the default CUDA device. That would make the code clearer IMO.
apaszke colesbury teng-li pietern
Closes https://github.com/pytorch/pytorch/pull/9117
Differential Revision: D8721729
Pulled By: goldsborough
fbshipit-source-id: 1844a488079d21fa209b32e2c73e48632cbe9e68
* fix type mismatch while calling torch._C._cuda_setDevice
* fix type mismatch in scatter
* fix type mismatch in scatter
* fix type mismatch while calling torch._C._cuda_setDevice
* fix type mismatch while calling torch._C._cuda_setDevice
* fix type mismatch while calling torch._C._cuda_setDevice
Getting the CUDA device property struct with cudaGetDeviceProperties is expensive. THC caches CUDA device properties, which are available via THCState_getDeviceProperties, which is available via at::globalContext().getDeviceProperties(device), which is available via torch.cuda.get_device_properties. This PR changes the two methods that previously called cudaGetDeviceProperties to directly use torch.cuda.get_device_properties in Python (a usage sketch follows the timings below).
Also fixes an ATen compile error when it can't find CUDA.
Fixes #4908. Using the script from that issue, we get roughly an 18x speed-up.
[ssnl@ ~] python dev.py # master
0.2826697587966919
0.00034999847412109375
0.0003493785858154297
0.000356292724609375
0.00036025047302246094
0.0003629922866821289
0.00036084651947021484
0.00035686492919921874
0.00036056041717529296
0.0003606319427490234
[ssnl@ ~] python dev.py # this PR
0.27275662422180175
2.1147727966308594e-05
1.9598007202148438e-05
1.94549560546875e-05
1.9359588623046876e-05
1.938343048095703e-05
2.0074844360351563e-05
1.952648162841797e-05
1.9311904907226562e-05
1.938343048095703e-05
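A hedged sketch of the Python-side call both methods now route through; the returned properties object is cached, so repeated lookups skip cudaGetDeviceProperties:
```python
import torch

props = torch.cuda.get_device_properties(0)
print(props.name, props.major, props.minor, props.total_memory)
```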
This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp.
This also moves the THNN.cwrap/.cpp generation to generate_code which can use ninja if installed.
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.
To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.
There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:
https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
The Tensor and Variable classes are being merged.
autograd.Function.forward is now called on Variables, but with "no-grad"
mode (torch.no_grad()) enabled.
One benefit is that we no longer have to explicitly track shared
storages.
* Replace async with non_blocking for Python 3.7 upgrade
* Remove trailing whitespace
* Give _cuda and _type kwargs and accept async for compatibility
* Rename async to non_blocking in all C++ code
* Add entries for async in python_variable_methods
* Friendlier backward compatibility for cuda and type
Adds streams and comms as optional arguments to the NCCL calls in
torch.cuda.nccl. Also exposes ncclUniqueId and ncclCommInitRank for
multi-process mode.
Moves Py_RETURN_NONE statements after the GIL is re-acquired.
* Avoid casting integer params and buffers to float(), double() and half()
* Add test for immune integer buffers
* Fix documentation for float(), double() and half()
* Fix test