This PR:
- changes generate_vmap_rule to either be True or False. Previously it
could be True, False, or not set. This simplifies the implementation a
bit.
- changes the vmap staticmethod to always be defined on the autograd.Function
rather than only sometimes defined.
This is how the other staticmethods (forward, backward, jvp) are
implemented, and it allows us to document it.
There are 4 possible states for the autograd.Function w.r.t. the
above:
- generate_vmap_rule is True, vmap staticmethod overridden. This raises
an error when used with vmap.
- generate_vmap_rule is False, vmap staticmethod overridden. This is
valid.
- generate_vmap_rule is True, vmap staticmethod not overridden. This is
valid.
- generate_vmap_rule is False, vmap staticmethod not overridden. This
raises an error when used with vmap. (See the sketch below for the two valid configurations.)
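For illustration, a minimal sketch of the two valid configurations (hedged: the class names are made up, and the code assumes the new-style forward/setup_context API):
```python
import torch

# Valid: generate_vmap_rule is True and the vmap staticmethod is not overridden.
class SquareAuto(torch.autograd.Function):
    generate_vmap_rule = True

    @staticmethod
    def forward(x):
        return x ** 2

    @staticmethod
    def setup_context(ctx, inputs, output):
        x, = inputs
        ctx.save_for_backward(x)

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        return 2 * x * grad_out

# Valid: generate_vmap_rule is False (the default) and vmap is overridden.
class SquareManual(torch.autograd.Function):
    @staticmethod
    def forward(x):
        return x ** 2

    @staticmethod
    def setup_context(ctx, inputs, output):
        x, = inputs
        ctx.save_for_backward(x)

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        return 2 * x * grad_out

    @staticmethod
    def vmap(info, in_dims, x):
        # Elementwise op: the batch dimension passes through unchanged.
        return x ** 2, in_dims[0]
```
With either class, applying a vmap transform should work; combining both (or providing neither) raises an error as described above.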
Future:
- setup_context needs the same treatment, but that's a bit trickier to
implement.
Test Plan:
- new unittest
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91787
Approved by: https://github.com/soulitzer
This PR:
- Updates the autograd.Function.forward docs to reflect that you can either
define a forward that takes ctx or a separate forward and setup_context
(both styles are sketched below).
- Updates the "Extending Autograd" docs to suggest the usage of
autograd.Function with separate forward and setup_context. This should
be the default because there is a low barrier to go from this to
an autograd.Function that is fully supported by functorch transforms.
- Adds a new "Extending torch.func with autograd.Function" doc that
explains how to use autograd.Function with torch.func. It also
explains how to use generate_vmap_rule and how to manually write a
vmap staticmethod.
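A minimal sketch of the two forward styles mentioned above (illustrative names; the separate setup_context form is the one recommended for torch.func support):
```python
import torch

# Style 1: forward receives ctx directly.
class ScaleClassic(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, factor):
        ctx.factor = factor
        return x * factor

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out * ctx.factor, None

# Style 2: separate forward and setup_context.
class ScaleFunc(torch.autograd.Function):
    @staticmethod
    def forward(x, factor):
        return x * factor

    @staticmethod
    def setup_context(ctx, inputs, output):
        _, factor = inputs
        ctx.factor = factor

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out * ctx.factor, None
```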
While writing this, I noticed that the implementations of the
setup_context staticmethod, generate_vmap_rule, and the vmap staticmethod are a
bit inconsistent with the other methods/attributes on autograd.Function:
- https://github.com/pytorch/pytorch/issues/91451
- I'm happy to fix those if we think it is a problem, either in this PR
or a follow-up (this PR is getting long; I want some initial docs
out that I can point early adopters at, and fixing the problems in the
future isn't really BC-breaking).
Test Plan:
- view docs preview
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91452
Approved by: https://github.com/soulitzer
Essentially the same change as #67946, except that the default is to disallow reduced precision reductions in `BFloat16` GEMMs (for now). If performance is severely regressed, we can change the default, but this option appears to be necessary to pass some `addmm` `BFloat16` tests on H100.
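If the new default does regress a workload, the flag can be flipped back explicitly (a sketch; the flag name mirrors the FP16 one discussed further down in this log):
```python
import torch

# Re-enable reduced-precision reductions in BFloat16 GEMMs
# (the default after this PR is False, i.e. reductions are kept in higher precision).
torch.backends.cuda.matmul.allow_bf16_reduced_precision_reduction = True
```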
CC @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89172
Approved by: https://github.com/ngimel
Fixes #43144
This uses the Backend system added by [#82682](https://github.com/pytorch/pytorch/pull/82682) to change allocators dynamically during code execution. This will allow us to use RMM, to use CUDA managed memory for portions of the code that do not fit in GPU memory, to write static memory allocators that reduce fragmentation while training models, and to improve interoperability with external DL compilers/libraries.
For example, we could have the following allocator in c++
```c++
#include <sys/types.h>
#include <cuda_runtime_api.h>
#include <iostream>
extern "C" {
void* my_malloc(ssize_t size, int device, cudaStream_t stream) {
  void* ptr;
  std::cout << "alloc " << size << std::endl;
  cudaMalloc(&ptr, size);
  return ptr;
}

void my_free(void* ptr) {
  std::cout << "free" << std::endl;
  cudaFree(ptr);
}
}
```
Compile it as a shared library
```
nvcc allocator.cc -o alloc.so -shared --compiler-options '-fPIC'
```
And use it from PyTorch as follows
```python
import torch
# Allocating a tensor here would initialize the default caching allocator
# and prevent changing the allocator afterwards, e.g.:
# b = torch.zeros(10, device='cuda')
new_alloc = torch.cuda.memory.CUDAPluggableAllocator('alloc.so', 'my_malloc', 'my_free')
old = torch.cuda.memory.get_current_allocator()
torch.cuda.memory.change_current_allocator(new_alloc)
b = torch.zeros(10, device='cuda')
# This will error since the current allocator was already instantiated
torch.cuda.memory.change_current_allocator(old)
```
Things to discuss
- How to test this; it requires compiling external code...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86786
Approved by: https://github.com/albanD
Summary:
Improved the roundup_power2_divisions knob so it allows better control of rounding in the PyTorch CUDA Caching Allocator.
This new version allows setting the number of divisions per power-of-two interval, starting from 1MB and ending at 64GB and above. An example use case is when rounding is desirable for small allocations but there are also very large allocations which are persistent, would thus not benefit from rounding, and would take up extra space.
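A sketch of a per-interval configuration (the bracketed syntax shown here is my understanding of the knob's format and should be checked against the allocator docs):
```python
import os

# Hypothetical configuration: 1 division for intervals up to 256MB, 2 up to 512MB,
# 4 up to 1GB, and 8 divisions for everything larger.
# Must be set before the first CUDA allocation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "roundup_power2_divisions:[256:1,512:2,1024:4,>:8]"
```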
Test Plan: Tested locally
Differential Revision: D40103909
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87290
Approved by: https://github.com/zdevito
Fixes #83973 (This is a substitute PR for https://github.com/pytorch/pytorch/pull/85024)
First of all, thanks for your invaluable contributions to PyTorch everyone!
Given how extensively `torch.cuda.is_available` is used in the PyTorch ecosystem, IMHO it's worthwhile to provide downstream libraries/frameworks/users the ability to alter the default behavior of `torch.cuda.is_available` in the context of their PyTorch usage.
I'm confident there are many current and future such use cases which could benefit from leveraging a weakened, NVML-based `torch.cuda.is_available` assessment at a downstream framework's explicit direction (thanks @malfet 81da50a972 !). Though one could always patch out the `torch.cuda.is_available` function with another implementation in a downstream library, I think this environmental variable based configuration option is more convenient and the cost to including the option is quite low.
As discussed in https://github.com/pytorch/pytorch/pull/85024#issuecomment-1261542045, this PR gates new non-default NVML-based CUDA behavior with an environmental variable (PYTORCH_NVML_BASED_CUDA_CHK) that allows a user/framework to invoke non-default, NVML-based `is_available()` assessments if desired.
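A sketch of opting in from a downstream library (the environment variable should be set before `torch.cuda.is_available()` is first consulted):
```python
import os

# Opt in to the weaker, NVML-based assessment described above.
os.environ["PYTORCH_NVML_BASED_CUDA_CHK"] = "1"

import torch
print(torch.cuda.is_available())  # uses the NVML-based assessment when the env var is set
```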
Thanks again for your work everyone!
@ngimel @malfet @awaelchli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85951
Approved by: https://github.com/ngimel
Summary:
Added an additional roundup knob( ``roundup_bypass_threshold_mb``) to bypass rounding the requested allocation size, for allocation requests larger than the threshold value (in MB). This can help reduce the memory footprint when making large allocations that are expected to be persistent or have a large lifetime.
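A sketch of setting the knob (assuming it is passed through `PYTORCH_CUDA_ALLOC_CONF` like the allocator's other options; the 256 MB threshold is just an example):
```python
import os

# Hypothetical: skip power-of-two rounding for any single request larger than 256 MB.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "roundup_bypass_threshold_mb:256"
```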
Differential Revision: D39868104
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85940
Approved by: https://github.com/zdevito
In preparation for adopting future rocblas library options, it is necessary to track when the backward pass of training is executing. The scope-based helper class `BackwardPassGuard` is provided to toggle this state.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71881
Approved by: https://github.com/albanD
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74261
### Goal
Implement a cheap way to reclaim GPU memory (garbage collection) without incurring GPU sync.
### Why do we need this?
Currently, there are only two ways to reclaim GPU memory blocks already assigned to a particular stream.
- `release_available_cached_blocks(params)`: Free blocks exceeding the `CachingAllocatorConfig::max_split_size()` until we can satisfy the request.
Issue: If the `max_split_size` is unset (default), this function is a no-op. Even if this is set, the reclamation is quite conservative (e.g., never frees blocks under max_split_size).
- `release_cached_blocks()`: Waits for all the in-flight events and then reclaim blocks.
Issue: 'waiting for all events' is very expensive, as it will likely stall all GPU operations. Many GPU applications that do not properly handle potential GPU throttling would suffer/crash.
### Proposed idea
- If the garbage collection threshold is set, try to reclaim some memory blocks *without* synchronization. It should be safe to do so, as `release_available_cached_blocks` essentially does the same thing (but less aggressively).
- GC is triggered only when we fail to serve a `malloc` request from the block pool. No need to free blocks when the block pool is functioning just fine.
- Prioritize reclaiming blocks that weren't reused for a long time. Reclamation stops once the used memory capacity < threshold.
- This code path is totally optional; by default it won't be invoked (a sketch of enabling it is shown below).
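A sketch of turning the optional path on (assuming the threshold is exposed via `PYTORCH_CUDA_ALLOC_CONF` as `garbage_collection_threshold`, expressed as a fraction of the device's total memory):
```python
import os

# Hypothetical: start reclaiming long-unused cached blocks once ~80% of the
# device's memory is in use, without a GPU-wide synchronization.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "garbage_collection_threshold:0.8"
```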
Test Plan:
- Unit tests
- Manually checked that the GPU memory usage stays as indicated by the garbage collector. If not the caching allocator at least tries to keep freeing the blocks.
Reviewed By: jianyuh
Differential Revision: D34482514
fbshipit-source-id: d5eae62ac60b94b0bca851f9d233a092d086e3c2
(cherry picked from commit 05780f1ed4b176f05e765b2411c9eaa2eaeb48b0)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74213
In the current CUDACachingAllocator, sizes are rounded up in multiples of a block size of 512, which works for smaller sizes. However, for large sizes we can end up with lots of different block sizes in the larger pool. This is problematic when we have variable batch sizes (1001, 1021, 1023): each maps to a different block size and creates differently sized blocks. This creates lots of unused blocks and wastes GPU memory capacity.
This diff adds a rounding approach to the allocation size. It rounds the size up to the nearest power-of-2 division, and the number of divisions can be changed with an env variable setting.
For example, if we need to round up a size of 1200 and the number of divisions is 4,
the size 1200 lies between 1024 and 2048, and if we do 4 divisions between
them, the values are 1024, 1280, 1536, and 1792. So the function will
return 1280 as the nearest ceiling of a power-2 division.
env setting:
export PYTORCH_CUDA_ALLOC_CONF=roundup_power2_divisions:4
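A small Python sketch of the rounding described above (illustrative only; the allocator's actual implementation is in C++):
```python
def round_to_power2_division(size: int, divisions: int) -> int:
    """Round `size` up to the nearest power-of-2 division boundary."""
    low = 1 << (size.bit_length() - 1)   # largest power of two <= size
    if low == size:
        return size
    step = max(1, low // divisions)      # width of one division in [low, 2 * low)
    return low + -(-(size - low) // step) * step  # ceil to the next boundary

print(round_to_power2_division(1200, 4))  # -> 1280, matching the example above
```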
ghstack-source-id: 151446017
Reviewed By: ezyang
Differential Revision: D34868036
fbshipit-source-id: 494785add16e6b37c920dcb5a2b81d4c637b554a
(cherry picked from commit 548454ccacbd8700e7ffd2d762e40b4ba37abbae)
Summary:
It is probably most user-friendly to link to that (lesser-known?) feature.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72584
Reviewed By: soulitzer
Differential Revision: D34173999
Pulled By: albanD
fbshipit-source-id: 99fff7a55412faf54888f8317ab2388f4d7d30e4
(cherry picked from commit 2191ee7657)
Summary:
https://github.com/pytorch/pytorch/issues/67578 disabled reduced precision reductions for FP16 GEMMs. After benchmarking, we've found that this has substantial performance impacts for common GEMM shapes (e.g., those found in popular instantiations of multiheaded attention) on architectures such as Volta. As these performance regressions may come as a surprise to current users, this PR adds a toggle to disable reduced precision reductions,
`torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction`,
rather than making disabling them the default behavior.
CC ngimel ptrblck
stas00: Note that the behavior after the previous PR can be replicated with
`torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67946
Reviewed By: zou3519
Differential Revision: D32289896
Pulled By: ngimel
fbshipit-source-id: a1ea2918b77e27a7d9b391e030417802a0174abe
Summary:
Related to https://github.com/pytorch/pytorch/issues/30987. Fix the following task:
- [ ] Remove the use of `.data` in all our internal code:
- [ ] ...
- [x] `docs/source/scripts/build_activation_images.py` and `docs/source/notes/extending.rst`
In `docs/source/scripts/build_activation_images.py`, I used `nn.init` because the snippet already assumes `nn` is available (the class inherits from `nn.Module`).
cc albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65358
Reviewed By: malfet
Differential Revision: D31061790
Pulled By: albanD
fbshipit-source-id: be936c2035f0bdd49986351026fe3e932a5b4032
Summary:
Powers have decided this API should be listed as beta.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65247
Reviewed By: malfet
Differential Revision: D31057940
Pulled By: ngimel
fbshipit-source-id: 137b63cbd2c7409fecdc161a22135619bfc96bfa
Summary:
Puts memory sharing intro under Sharing memory... header, where it should have been all along.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64996
Reviewed By: mruberry
Differential Revision: D30948619
Pulled By: ngimel
fbshipit-source-id: 5d9dd267b34e9d3fc499d4738377b58a22da1dc2
Summary:
This PR expands the [note on modules](https://pytorch.org/docs/stable/notes/modules.html) with additional info for 1.10.
It adds the following:
* Examples of using hooks
* Examples of using apply()
* Examples for ParameterList / ParameterDict
* register_parameter() / register_buffer() usage
* Discussion of train() / eval() modes
* Distributed training overview / links
* TorchScript overview / links
* Quantization overview / links
* FX overview / links
* Parametrization overview / link to tutorial
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63963
Reviewed By: albanD
Differential Revision: D30606604
Pulled By: jbschlosser
fbshipit-source-id: c1030b19162bcb5fe7364bcdc981a2eb6d6e89b4
Summary:
CUDA_VERSION and HIP_VERSION follow very unrelated versioning schemes, so it does not make sense to use CUDA_VERSION to determine the ROCm path. This note explicitly addresses it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62850
Reviewed By: mruberry
Differential Revision: D30547562
Pulled By: malfet
fbshipit-source-id: 02990fa66a88466c2330ab85f446b25b78545150
Summary:
- Adds some code examples for `ctx` methods and makes the requirements of their arguments clearer (a small sketch follows this list)
- Type annotations for `save_for_backward`, `mark_dirty`, `mark_non_differentiable`, and `set_materialize_grads` (BC-breaking?)
- Refactor `torch.autograd.Function` doc
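For context, a minimal sketch of the `ctx` methods in question (illustrative, not copied from the PR's docs; 1-D input for simplicity):
```python
import torch

class SortWithIndices(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        sorted_x, indices = torch.sort(x)
        ctx.save_for_backward(indices)        # tensors needed in backward
        ctx.mark_non_differentiable(indices)  # integer indices carry no gradient
        ctx.set_materialize_grads(False)      # backward may receive None grads
        return sorted_x, indices

    @staticmethod
    def backward(ctx, grad_sorted, grad_indices):
        if grad_sorted is None:               # possible due to set_materialize_grads(False)
            return None
        indices, = ctx.saved_tensors
        grad_x = torch.empty_like(grad_sorted)
        grad_x[indices] = grad_sorted         # scatter back to the original positions
        return grad_x
```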
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60312
Reviewed By: VitalyFedyunin
Differential Revision: D30314961
Pulled By: soulitzer
fbshipit-source-id: a284314b65662e26390417bd2b6b12cd85e68dc8
Summary:
This FAQ has a section for CUDA OOMs with lots of don'ts. This limits modeling solutions. Deep nets can blow up memory due to output caching during training.
It's a known problem with a known solution: to trade-off compute for memory via checkpointing.
FAQ should mention it.
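For reference, the trade-off looks roughly like this with `torch.utils.checkpoint` (a sketch, not text proposed for the FAQ itself):
```python
import torch
from torch.utils.checkpoint import checkpoint

net = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
)
x = torch.randn(64, 1024, requires_grad=True)

# Activations inside `net` are not cached; they are recomputed during backward,
# trading extra compute for a smaller memory footprint.
out = checkpoint(net, x)
out.sum().backward()
```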
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62709
Reviewed By: nairbv
Differential Revision: D30103326
Pulled By: ezyang
fbshipit-source-id: 3a8b465a7fbe19aae88f83cc50fe82ebafcb56c9
Summary:
Before https://github.com/pytorch/pytorch/pull/57833, calls to backward() or grad() synced only the calling thread's default stream with autograd leaf streams at the end of backward. This made the following weird pattern safe:
```python
with torch.cuda.stream(s):
# imagine forward used many streams, so backward leaf nodes may run on many streams
loss.backward()
# no sync
use grads
```
but a more benign-looking pattern was unsafe:
```python
with torch.cuda.stream(s):
# imagine forward used a lot of streams, so backward leaf nodes may run on many streams
loss.backward()
# backward() syncs the default stream with all the leaf streams, but does not sync s with anything,
# so counterintuitively (even though we're in the same stream context as backward()!)
# it is NOT SAFE to use grads here, and there's no easy way to make it safe,
# unless you manually sync on all the streams you used in forward,
# or move "use grads" back to default stream outside the context.
use grads
```
mruberry, ngimel, and I decided backward() should have the [same user-facing stream semantics as any cuda op](https://pytorch.org/docs/master/notes/cuda.html#stream-semantics-of-backward-passes).** In other words, the weird pattern should be unsafe, and the benign-looking pattern should be safe. Implementation-wise, this meant backward() should sync its calling thread's current stream, not the default stream, with the leaf streams.
After https://github.com/pytorch/pytorch/pull/57833, backward syncs the calling thread's current stream AND default stream with all leaf streams at the end of backward. The default stream syncs were retained for temporary backward compatibility.
This PR finishes https://github.com/pytorch/pytorch/pull/57833's work by deleting syncs on the default stream.
With this PR, graph-capturing an entire backward() call should be possible (see the [test_graph_grad_scaling diffs](https://github.com/pytorch/pytorch/compare/master...mcarilli:streaming_backwards_remove_default_syncs?expand=1#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3641-R3642)).
** first paragraph has a formatting error which this PR should also fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60421
Reviewed By: albanD
Differential Revision: D29370344
Pulled By: ngimel
fbshipit-source-id: 3248bc5fb92fc517db0c15c897e5d7250f67d7fe
Summary:
Before https://github.com/pytorch/pytorch/pull/57833, calls to backward() or grad() synced only the calling thread's default stream with autograd leaf streams at the end of backward. This made the following weird pattern safe:
```python
with torch.cuda.stream(s):
# imagine forward used many streams, so backward leaf nodes may run on many streams
loss.backward()
# no sync
use grads
```
but a more benign-looking pattern was unsafe:
```python
with torch.cuda.stream(s):
# imagine forward used a lot of streams, so backward leaf nodes may run on many streams
loss.backward()
# backward() syncs the default stream with all the leaf streams, but does not sync s with anything,
# so counterintuitively (even though we're in the same stream context as backward()!)
# it is NOT SAFE to use grads here, and there's no easy way to make it safe,
# unless you manually sync on all the streams you used in forward,
# or move "use grads" back to default stream outside the context.
use grads
```
mruberry, ngimel, and I decided backward() should have the [same user-facing stream semantics as any cuda op](https://pytorch.org/docs/master/notes/cuda.html#stream-semantics-of-backward-passes).** In other words, the weird pattern should be unsafe, and the benign-looking pattern should be safe. Implementation-wise, this meant backward() should sync its calling thread's current stream, not the default stream, with the leaf streams.
After https://github.com/pytorch/pytorch/pull/57833, backward syncs the calling thread's current stream AND default stream with all leaf streams at the end of backward. The default stream syncs were retained for temporary backward compatibility.
This PR finishes https://github.com/pytorch/pytorch/pull/57833's work by deleting syncs on the default stream.
With this PR, graph-capturing an entire backward() call should be possible (see the [test_graph_grad_scaling diffs](https://github.com/pytorch/pytorch/compare/master...mcarilli:streaming_backwards_remove_default_syncs?expand=1#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3641-R3642)).
** first paragraph has a formatting error which this PR should also fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60421
Reviewed By: VitalyFedyunin, albanD
Differential Revision: D29342234
Pulled By: ngimel
fbshipit-source-id: 98e6be7fdd8550872f0a78f9a66cb8dfe75abf63
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35901
This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once split too finely, it is unlikely all pieces will be 'free' at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory failing to alloc a 50 MB block on a 32 GB card while the caching allocator is holding 13 GB of 'split free blocks'.
Approach:
- Large blocks above a certain size are designated "oversize". This limit is currently set one decade above "large", at 200 MB
- Oversize blocks cannot be split
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks there is a mechanism to quickly free a single oversize block (to the system allocator) to allow an appropriate size block to be allocated. This will be activated under memory pressure and will prevent _release_cached_blocks()_ from triggering
Initial performance tests show this is similar or quicker than the original strategy. Additional tests are ongoing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742
Reviewed By: zou3519
Differential Revision: D29186394
Pulled By: ezyang
fbshipit-source-id: c88918836db3f51df59de6d1b3e03602ebe306a9
Summary:
Adds a note explaining the difference between several often conflated mechanisms in the autograd note
Also adds a link to this note from the docs in `grad_mode` and `nn.module`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58513
Reviewed By: gchanan
Differential Revision: D28651129
Pulled By: soulitzer
fbshipit-source-id: af9eb1749b641fc1b632815634eea36bf7979156
Summary:
You can find the latest rendered version in the `python_doc_build` CI job below, in the artifacts tab of that build on CircleCI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55966
Reviewed By: H-Huang
Differential Revision: D28032446
Pulled By: albanD
fbshipit-source-id: 227ad37b03d39894d736c19cae3195b4d56fc62f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56528
Tried to search across internal and external usage of DataLoader. People haven't started to use `generator` for `DataLoader`.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27908487
Pulled By: ejguan
fbshipit-source-id: 14c83ed40d4ba4dc988b121968a78c2732d8eb93
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35901
This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once split too finely, it is unlikely all pieces will be 'free' at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory failing to alloc a 50 MB block on a 32 GB card while the caching allocator is holding 13 GB of 'split free blocks'.
Approach:
- Large blocks above a certain size are designated "oversize". This limit is currently set one decade above "large", at 200 MB
- Oversize blocks cannot be split
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks there is a mechanism to quickly free a single oversize block (to the system allocator) to allow an appropriate size block to be allocated. This will be activated under memory pressure and will prevent _release_cached_blocks()_ from triggering
Initial performance tests show this is similar or quicker than the original strategy. Additional tests are ongoing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742
Reviewed By: ngimel
Differential Revision: D23752058
Pulled By: ezyang
fbshipit-source-id: ccb7c13e3cf8ef2707706726ac9aaac3a5e3d5c8
Summary:
*Context:* https://github.com/pytorch/pytorch/issues/53406 added a lint for trailing whitespace at the ends of lines. However, in order to pass FB-internal lints, that PR also had to normalize the trailing newlines in four of the files it touched. This PR adds an OSS lint to normalize trailing newlines.
The changes to the following files (made in 54847d0adb9be71be4979cead3d9d4c02160e4cd) are the only manually-written parts of this PR:
- `.github/workflows/lint.yml`
- `mypy-strict.ini`
- `tools/README.md`
- `tools/test/test_trailing_newlines.py`
- `tools/trailing_newlines.py`
I would have liked to make this just a shell one-liner like the other three similar lints, but nothing I could find quite fit the bill. Specifically, all the answers I tried from the following Stack Overflow questions were far too slow (at least a minute and a half to run on this entire repository):
- [How to detect file ends in newline?](https://stackoverflow.com/q/38746)
- [How do I find files that do not end with a newline/linefeed?](https://stackoverflow.com/q/4631068)
- [How to list all files in the Git index without newline at end of file](https://stackoverflow.com/q/27624800)
- [Linux - check if there is an empty line at the end of a file [duplicate]](https://stackoverflow.com/q/34943632)
- [git ensure newline at end of each file](https://stackoverflow.com/q/57770972)
To avoid giving false positives during the few days after this PR is merged, we should probably only merge it after https://github.com/pytorch/pytorch/issues/54967.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54737
Test Plan:
Running the shell script from the "Ensure correct trailing newlines" step in the `quick-checks` job of `.github/workflows/lint.yml` should print no output and exit in a fraction of a second with a status of 0. That was not the case prior to this PR, as shown by this failing GHA workflow run on an earlier draft of this PR:
- https://github.com/pytorch/pytorch/runs/2197446987?check_suite_focus=true
In contrast, this run (after correcting the trailing newlines in this PR) succeeded:
- https://github.com/pytorch/pytorch/pull/54737/checks?check_run_id=2197553241
To unit-test `tools/trailing_newlines.py` itself (this is run as part of our "Test tools" GitHub Actions workflow):
```
python tools/test/test_trailing_newlines.py
```
Reviewed By: malfet
Differential Revision: D27409736
Pulled By: samestep
fbshipit-source-id: 46f565227046b39f68349bbd5633105b2d2e9b19
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
Minor doc fix clarifying that the input data is rounded, not truncated.
CC zasdfgbnm ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49625
Reviewed By: mruberry
Differential Revision: D25668244
Pulled By: ngimel
fbshipit-source-id: ac97e41e0ca296276544f9e9f85b2cf1790d9985
Summary:
Ref https://github.com/pytorch/pytorch/issues/42175
This removes the 4 deprecated spectral functions: `torch.{fft,rfft,ifft,irfft}`. `torch.fft` is also now imported by default.
The actual `at::native` functions are still used in `torch.stft`, so they can't be fully removed yet, but they will be once https://github.com/pytorch/pytorch/issues/47601 has been merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48594
Reviewed By: heitorschueroff
Differential Revision: D25298929
Pulled By: mruberry
fbshipit-source-id: e36737fe8192fcd16f7e6310f8b49de478e63bf0
Summary:
I have been asked several times how to toggle this flag on libtorch. I think it would be good to mention it in the docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47331
Reviewed By: glaringlee
Differential Revision: D24777576
Pulled By: mruberry
fbshipit-source-id: cc2a338c477bb57e0bb74b8960c47fde99665e41
Summary:
Currently, a GraphRoot instance doesn't have an associated stream. Streaming backward synchronization logic assumes the instance ran on the default stream, and tells consumer ops to sync with the default stream. If the gradient the GraphRoot instance passes to consumer backward ops was populated on a non-default stream, we have a race condition.
The race condition can exist even if the user doesn't give a manually populated gradient:
```python
with torch.cuda.stream(side_stream):
# loss.backward() implicitly synthesizes a one-element 1.0 tensor on side_stream
# GraphRoot passes it to consumers, but consumers first sync on default stream, not side_stream.
loss.backward()
# Internally to backward(), streaming-backward logic takes over, stuff executes on the same stream it ran on in forward,
# and the side_stream context is irrelevant. GraphRoot's interaction with its first consumer(s) is the spot where
# the side_stream context causes a problem.
```
This PR fixes the race condition by associating a GraphRoot instance, at construction time, with the current stream(s) on the device(s) of the grads it will pass to consumers. (I think this relies on GraphRoot executing in the main thread, before backward thread(s) fork, because the grads were populated on the main thread.)
The test demonstrates the race condition. It fails reliably without the PR's GraphRoot diffs and passes with the GraphRoot diffs.
With the GraphRoot diffs, manually populating an incoming-gradient arg for `backward` (or `torch.autograd.grad`) and the actual call to `autograd.backward` will have the same stream-semantics relationship as any other pair of ops:
```python
# implicit population is safe
with torch.cuda.stream(side_stream):
loss.backward()
# explicit population in side stream then backward in side stream is safe
with torch.cuda.stream(side_stream):
kickoff_grad = torch.ones_like(loss)
loss.backward(gradient=kickoff_grad)
# explicit population in one stream then backward kickoff in another stream
# is NOT safe, even with this PR's diffs, but that unsafety is consistent with
# stream-semantics relationship of any pair of ops
kickoff_grad = torch.ones_like(loss)
with torch.cuda.stream(side_stream):
loss.backward(gradient=kickoff_grad)
# Safe, as you'd expect for any pair of ops
kickoff_grad = torch.ones_like(loss)
side_stream.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side_stream):
loss.backward(gradient=kickoff_grad)
```
This PR also adds the last three examples above to cuda docs and references them from autograd docstrings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45787
Reviewed By: nairbv
Differential Revision: D24138376
Pulled By: albanD
fbshipit-source-id: bc4cd9390f9f0358633db530b1b09f9c1080d2a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45294
While tracking down a recent memory corruption bug we found that
cuda-memcheck wasn't finding the bad accesses, and ngimel pointed out that
it's because we use a caching allocator so a lot of "out of bounds" accesses
land in a valid slab.
This PR adds a runtime knob (`PYTORCH_NO_CUDA_MEMORY_CACHING`) that, when set,
bypasses the caching allocator's caching logic so that allocations go straight
to cudaMalloc. This way, cuda-memcheck will actually work.
Test Plan:
Insert some memory errors and run a test under cuda-memcheck;
observe that cuda-memcheck flags an error where expected.
Specifically I removed the output-masking logic here:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/cuda_codegen.cpp#L819-L826
And ran:
```
PYTORCH_NO_CUDA_MEMORY_CACHING=1 cuda-memcheck pytest -k test_superslomo test_jit_fuser_te.py
```
Reviewed By: ngimel
Differential Revision: D23964734
Pulled By: bertmaher
fbshipit-source-id: 04efd11e8aff037b9edde80c70585cb820ee6e39
Summary:
Added a new option in AutogradContext to tell autograd to not materialize output grad tensors, that is, don't expand undefined/None tensors into tensors full of zeros before passing them as input to the backward function.
This PR is the second part that closes https://github.com/pytorch/pytorch/issues/41359. The first PR is https://github.com/pytorch/pytorch/pull/41490.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41821
Reviewed By: albanD
Differential Revision: D22693163
Pulled By: heitorschueroff
fbshipit-source-id: a8d060405a17ab1280a8506a06a2bbd85cb86461
Summary:
According to pytorch/rfcs#3
From the goals in the RFC:
1. Support subclassing `torch.Tensor` in Python (done here)
2. Preserve `torch.Tensor` subclasses when calling `torch` functions on them (done here)
3. Use the PyTorch API with `torch.Tensor`-like objects that are _not_ `torch.Tensor`
subclasses (done in https://github.com/pytorch/pytorch/issues/30730)
4. Preserve `torch.Tensor` subclasses when calling `torch.Tensor` methods. (done here)
5. Propagating subclass instances correctly also with operators, using
views/slices/indexing/etc. (done here)
6. Preserve subclass attributes when using methods or views/slices/indexing. (done here)
7. A way to insert code that operates on both functions and methods uniformly
(so we can write a single function that overrides all operators). (done here)
8. The ability to give external libraries a way to also define
functions/methods that follow the `__torch_function__` protocol. (will be addressed in a separate PR)
This PR makes the following changes:
1. Adds the `self` argument to the arg parser.
2. Dispatches on `self` as well if `self` is not `nullptr`.
3. Adds a `torch._C.DisableTorchFunction` context manager to disable `__torch_function__`.
4. Adds a `torch::torch_function_enabled()` and `torch._C._torch_function_enabled()` to check the state of `__torch_function__`.
5. Dispatches all `torch._C.TensorBase` and `torch.Tensor` methods via `__torch_function__` (a small sketch of the resulting behavior is shown below).
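For orientation, a minimal sketch of what this enables (a hypothetical subclass following the `__torch_function__` protocol described above):
```python
import torch

class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Uniformly intercepts both torch functions and Tensor methods.
        print(f"calling {func.__name__}")
        return super().__torch_function__(func, types, args, kwargs or {})

t = torch.randn(3).as_subclass(LoggingTensor)
out = torch.sin(t) + 1   # both the function and the operator are intercepted
print(type(out))         # the subclass is preserved: LoggingTensor
```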
TODO:
- [x] Sequence Methods
- [x] Docs
- [x] Tests
Closes https://github.com/pytorch/pytorch/issues/28361
Benchmarks in https://github.com/pytorch/pytorch/pull/37091#issuecomment-633657778
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37091
Reviewed By: ngimel
Differential Revision: D22765678
Pulled By: ezyang
fbshipit-source-id: 53f8aa17ddb8b1108c0997f6a7aa13cb5be73de0
Summary:
A small PR fixing some formatting in lcm, gcd, and the serialization note. Adds a note to lcm and gcd explaining behavior that is not always defined.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41526
Reviewed By: ngimel
Differential Revision: D22569341
Pulled By: mruberry
fbshipit-source-id: 5f5ff98c0831f65e82b991ef444a5cee8e3c8b5a
Summary:
Doc update intended to clarify and expand our current serialization behavior, including explaining the difference between torch.save/torch.load, torch.nn.Module.state_dict/torch.nn.Module.load_state_dict, and torch.jit.save/torch.jit.load. Also explains, for the time, when historic serialized Torchscript behavior is preserved and our recommendation for preserving behavior (using the same PyTorch version to consume a model as produced it).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41395
Reviewed By: ngimel
Differential Revision: D22560538
Pulled By: mruberry
fbshipit-source-id: dbc2f1bb92ab61ff2eca4888febc21f7dda76ba1
Summary:
Some people have been confused by `retain_graph` in the snippet; they thought it was an additional requirement imposed by amp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41203
Differential Revision: D22463700
Pulled By: ngimel
fbshipit-source-id: e6fc8871be2bf0ecc1794b1c6f5ea99af922bf7e
Summary:
I ran `make linkcheck` using `sphinx.builders.linkcheck` on the documentation and noticed a few links weren't using HTTPS so I quickly updated them all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40878
Differential Revision: D22404647
Pulled By: ngimel
fbshipit-source-id: 9c9756db59197304023fddc28f252314f6cf4af3
Summary:
Currently, a custom autograd function written with
```
@staticmethod
@torch.cuda.amp.custom_fwd(cast_inputs=dtype)
def forward(ctx, *args):
    ...
```
casts incoming floating-point CUDA tensors to `dtype` unconditionally, regardless of whether the function executes in an autocast-enabled region. I think I had the wrong idea there. Autocast-disabled regions should give the user control of input types. Also, `custom_fwd(cast_inputs=dtype)`-decorated functions' behavior should align with native fp32list/fp16list functions. C++-side casting wrappers have no effect when autocast is disabled, and `custom_fwd`'s casting should behave the same way.
The present PR changes `custom_fwd` so it only casts in autocast-enabled regions (also updates custom_fwd to ignore fp64 inputs, like the C++ wrappers).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36171
Differential Revision: D22179511
Pulled By: ngimel
fbshipit-source-id: 5a93d070179a43206066bce19da0a5a19ecaabbd
Summary:
Removes line mentioning `ProcessGroupRoundRobin` since we don't intend it to be used as a public API just yet. We can add this back when we officially support the API
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40380
Differential Revision: D22165556
Pulled By: rohan-varma
fbshipit-source-id: 24d0477d881dc74f2ff579de61dfd1ced2b09e75
Summary:
Make the Linear layer work correctly when bias is False
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38002
Differential Revision: D21509679
Pulled By: malfet
fbshipit-source-id: c7077992cf414ecc557b39e5ed1e39ef01c8b347
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37548
Moving RecordFunction from torch::autograd::profiler into at namespace
Test Plan:
CI
Imported from OSS
Differential Revision: D21315852
fbshipit-source-id: 4a4dbabf116c162f9aef0da8606590ec3f3847aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37491
This PR modernizes RecordFunction API and adds thread local callbacks
in addition to the global ones
Changes:
- support for TLS callbacks; this is going to be the foundation of the profiler and other tools
- modernize the interface around a simple set of functions (add|remove|has|clear)(Global|ThreadLocal)(Callback) and add RecordFunctionCallback to easily construct callbacks to be passed
- we also add `.setShouldRun` to the callback interface to support cases when simple uniform sampling is not enough
- to properly support add/remove, introduce the idea of a callback handle returned by add
- the internal implementation still uses SmallVector to store intermediate state (as before) - in this case these are vectors of handles of callbacks that were picked to run
- to speed up runtime we keep these vectors sorted; this way we can quickly enumerate callbacks that need to be run
- added tests for new functionality
Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py develop install
./build/bin/test_jit
CI
record_function_benchmark: https://gist.github.com/ilia-cher/f1e094dae47fe23e55e7672ac4dcda2f
Imported from OSS
Differential Revision: D21300448
fbshipit-source-id: 6d55c26dbf20b33d35c3f1604dcc07bb063c8c43
Summary:
xref gh-32838, gh-34032
This is a major refactor of parts of the documentation to split it up using sphinx's `autosummary` feature, which will build out `autofunction` and `autoclass` stub files and link to them. The end result is that the top module pages like torch.nn.rst and torch.rst are now more like tables of contents for the actual single-class or single-function documentation pages.
Along the way, I modified many of the docstrings to eliminate sphinx warnings when building. I think the only thing I changed from a non-documentation perspective is to add names to `__all__` when adding them to `globals()` in `torch.__init__.py`
I do not know the CI system: are the documentation build artifacts available after the build, so reviewers can preview before merging?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37419
Differential Revision: D21337640
Pulled By: ezyang
fbshipit-source-id: d4ad198780c3ae7a96a9f22651e00ff2d31a0c0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37382
After adding c10::DispatchKey::Profiler, the behavior of RecordFunction
observers is also controlled by the dispatch key;
this PR moves the logic out of the profiler into the record function.
Reviewed By: jamesr66a
Differential Revision: D21268320
fbshipit-source-id: 93207e3b55325d20dcc5b1e8f448ab86933321da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37292
After adding c10::DispatchKey::Profiler, the behavior of RecordFunction
observers is also controlled by the dispatch key;
this PR moves the logic out of the profiler into the record function.
Reviewed By: jamesr66a
Differential Revision: D21245094
fbshipit-source-id: 595e41b18206d2ba4cf639cb320f630907868b3f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37195
After adding c10::DispatchKey::Profiler, the behavior of RecordFunction
observers is also controlled by the dispatch key;
this PR moves the logic out of the profiler into the record function.
Reviewed By: ngimel
Differential Revision: D21213786
fbshipit-source-id: e618254da74a4f1ce16c51a3869bbd75a4f561ad
Summary:
Several people have asked me about proper Amp usage with gradient accumulation. In particular, it's [unclear to people](https://github.com/NVIDIA/apex/issues/439#issuecomment-610351482) that you should only call `scaler.unscale_()` (if desired) and `scaler.update()` in iterations where you actually plan to step. This PR adds a minimal accumulation example.
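A minimal sketch of the accumulation pattern being documented (illustrative model/optimizer; not the exact snippet added to the docs):
```python
import torch

model = torch.nn.Linear(10, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()
accum_steps = 4

for i in range(16):
    inp = torch.randn(8, 10, device="cuda")
    target = torch.randn(8, 10, device="cuda")
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.mse_loss(model(inp), target) / accum_steps
    scaler.scale(loss).backward()          # gradients accumulate across iterations
    if (i + 1) % accum_steps == 0:
        scaler.unscale_(optimizer)         # unscale_ (if desired) only when stepping
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)
        scaler.update()                    # update() only in iterations where we step
        optimizer.zero_grad()
```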
I built the docs locally and it looks free from sphinx errors, at least.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36601
Differential Revision: D21082295
Pulled By: ngimel
fbshipit-source-id: b2faa6c02b9f7e1972618a0f1d5360a03f0450ac
Summary:
Full details in task: https://our.intern.facebook.com/intern/tasks/?t=64776265
With PyTorch 1.5+ we remove Python 2 support from PyTorch. All documentation under docs/ and on the pytorch.org website needs to remove Python 2 references.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36114
Differential Revision: D20901746
Pulled By: jlin27
fbshipit-source-id: 07f8dc8e6fab0b232e5048a63079cab0c433c85f
Summary: This diff fixes the issues with current handling of debug information passed along the execution of the model. (For example, it is possible that multiple calls to the debug guard may override each other)
Test Plan: CI test/cpp/jit
Reviewed By: dzhulgakov
Differential Revision: D20602775
fbshipit-source-id: 4683957954028af81a1a0f1f12b243650230c9bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34710
Extending RecordFunction API to support new recording scopes (such as TorchScript functions), as well as giving more flexibility to set sampling rate.
Test Plan: unit test (test_misc.cpp/testRecordFunction)
Reviewed By: gdankel, dzhulgakov
Differential Revision: D20158523
fbshipit-source-id: a9e0819d21cc06f4952d92d43246587c36137582
Summary:
## Motivation
This PR upgrades MKL-DNN from v0.20 to DNNL v1.2 and resolves https://github.com/pytorch/pytorch/issues/30300.
DNNL (Deep Neural Network Library) is the new brand of MKL-DNN, which improves performance, quality, and usability over the old version.
This PR focuses on the migration of all existing functionalities, including minor fixes, performance improvement and code clean up. It serves as the cornerstone of our future efforts to accommodate new features like OpenCL support, BF16 training, INT8 inference, etc. and to let the Pytorch community derive more benefits from the Intel Architecture.
<br>
## What's included?
Even though DNNL has many breaking changes to the API, we managed to absorb most of them in ideep. This PR contains minimal changes to the integration code in pytorch. Below is a summary of the changes:
<br>
**General:**
1. Replace op-level allocator with global-registered allocator
```
// before
ideep::sum::compute<AllocForMKLDNN>(scales, {x, y}, z);
// after
ideep::sum::compute(scales, {x, y}, z);
```
The allocator is now being registeted at `aten/src/ATen/native/mkldnn/IDeepRegistration.cpp`. Thereafter all tensors derived from the `cpu_engine` (by default) will use the c10 allocator.
```
RegisterEngineAllocator cpu_alloc(
  ideep::engine::cpu_engine(),
  [](size_t size) {
    return c10::GetAllocator(c10::DeviceType::CPU)->raw_allocate(size);
  },
  [](void* p) {
    c10::GetAllocator(c10::DeviceType::CPU)->raw_deallocate(p);
  }
);
```
------
2. Simplify group convolution
We had a scenario in convolution where the ideep tensor shape mismatched the aten tensor shape: when `groups > 1`, DNNL expects weight tensors to be 5-d with an extra group dimension, e.g. `goihw` instead of `oihw` in the 2d conv case.
As shown below, a lot of extra checks came with this difference in shape before. Now we've completely hidden this difference in ideep and all tensors are going to align with pytorch's definition. So we could safely remove these checks from both aten and c2 integration code.
```
// aten/src/ATen/native/mkldnn/Conv.cpp
if (w.ndims() == x.ndims() + 1) {
  AT_ASSERTM(
      groups > 1,
      "Only group _mkldnn_conv2d weights could have been reordered to 5d");
  kernel_size[0] = w.get_dim(0) * w.get_dim(1);
  std::copy_n(
      w.get_dims().cbegin() + 2, x.ndims() - 1, kernel_size.begin() + 1);
} else {
  std::copy_n(w.get_dims().cbegin(), x.ndims(), kernel_size.begin());
}
```
------
3. Enable DNNL built-in cache
Previously, we stored DNNL jitted kernels along with intermediate buffers inside ideep using an LRU cache. Now we are switching to the newly added DNNL built-in cache, and **no longer** caching buffers in order to reduce memory footprint.
This change will be mainly reflected in lower memory usage from memory profiling results. On the code side, we removed a couple of lines of `op_key_` that depended on the ideep cache before.
------
4. Use 64-bit integer to denote dimensions
We changed the type of `ideep::dims` from `vector<int32_t>` to `vector<int64_t>`. This renders ideep dims no longer compatible with 32-bit dims used by caffe2. So we use something like `{stride_.begin(), stride_.end()}` to cast parameter `stride_` into a int64 vector.
<br>
**Misc changes in each commit:**
**Commit:** change build options
Some build options were slightly changed, mainly to avoid name collisions with other projects that include DNNL as a subproject. In addition, DNNL built-in cache is enabled by option `DNNL_ENABLE_PRIMITIVE_CACHE`.
Old | New
-- | --
WITH_EXAMPLE | MKLDNN_BUILD_EXAMPLES
WITH_TEST | MKLDNN_BUILD_TESTS
MKLDNN_THREADING | MKLDNN_CPU_RUNTIME
MKLDNN_USE_MKL | N/A (not use MKL anymore)
------
**Commit:** aten reintegration
- aten/src/ATen/native/mkldnn/BinaryOps.cpp
Implement binary ops using new operation `binary` provided by DNNL
- aten/src/ATen/native/mkldnn/Conv.cpp
Clean up group convolution checks
Simplify conv backward integration
- aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp
Simplify prepacking convolution weights
- test/test_mkldnn.py
Fixed an issue in the conv2d unit test: it didn't check conv results between the mkldnn and aten implementations before. Instead, it compared mkldnn with mkldnn, as the default cpu path also goes into mkldnn. Now we use `torch.backends.mkldnn.flags` to fix this issue
- torch/utils/mkldnn.py
Prepack the weight tensor on module `__init__` to achieve significantly better performance
------
**Commit:** caffe2 reintegration
- caffe2/ideep/ideep_utils.h
Clean up unused type definitions
- caffe2/ideep/operators/adam_op.cc & caffe2/ideep/operators/momentum_sgd_op.cc
Unify tensor initialization with `ideep::tensor::init`. Obsolete `ideep::tensor::reinit`
- caffe2/ideep/operators/conv_op.cc & caffe2/ideep/operators/quantization/int8_conv_op.cc
Clean up group convolution checks
Revamp convolution API
- caffe2/ideep/operators/conv_transpose_op.cc
Clean up group convolution checks
Clean up deconv workaround code
------
**Commit:** custom allocator
- Register c10 allocator as mentioned above
<br><br>
## Performance
We tested inference on some common models based on user scenarios, and most performance numbers are either better than or on par with DNNL 0.20.
ratio: new / old | Latency (batch=1 4T) | Throughput (batch=64 56T)
-- | -- | --
pytorch resnet18 | 121.4% | 99.7%
pytorch resnet50 | 123.1% | 106.9%
pytorch resnext101_32x8d | 116.3% | 100.1%
pytorch resnext50_32x4d | 141.9% | 104.4%
pytorch mobilenet_v2 | 163.0% | 105.8%
caffe2 alexnet | 303.0% | 99.2%
caffe2 googlenet-v3 | 101.1% | 99.2%
caffe2 inception-v1 | 102.2% | 101.7%
caffe2 mobilenet-v1 | 356.1% | 253.7%
caffe2 resnet101 | 100.4% | 99.8%
caffe2 resnet152 | 99.8% | 99.8%
caffe2 shufflenet | 141.1% | 69.0% †
caffe2 squeezenet | 98.5% | 99.2%
caffe2 vgg16 | 136.8% | 100.6%
caffe2 googlenet-v3 int8 | 100.0% | 100.7%
caffe2 mobilenet-v1 int8 | 779.2% | 943.0%
caffe2 resnet50 int8 | 99.5% | 95.5%
_Configuration:
Platform: Skylake 8180
Latency Test: 4 threads, warmup 30, iteration 500, batch size 1
Throughput Test: 56 threads, warmup 30, iteration 200, batch size 64_
† Shufflenet is one of the few models that require temp buffers during inference. The performance degradation is an expected issue since we no longer cache any buffers in ideep. As a solution, we suggest users opt for a caching allocator like **jemalloc** as a drop-in replacement for the system allocator in such heavy workloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32422
Test Plan:
Perf results: https://our.intern.facebook.com/intern/fblearner/details/177790608?tab=Experiment%20Results
10% improvement for ResNext with avx512, neutral on avx2
More results: https://fb.quip.com/ob10AL0bCDXW#NNNACAUoHJP
Reviewed By: yinghai
Differential Revision: D20381325
Pulled By: dzhulgakov
fbshipit-source-id: 803b906fd89ed8b723c5fcab55039efe3e4bcb77
Summary:
We should recommend DDP instead of DP. Hope we can also cherry-pick this for 1.5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35063
Differential Revision: D20549621
Pulled By: ngimel
fbshipit-source-id: 86b1b2134664065cc6070ea4212895f993eaf543
Summary:
Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081
In-place ops and ops with user-supplied `out=...` can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/pull/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests. Support for these ops (much of which has already been written) will be broken into later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32140
Differential Revision: D20346700
Pulled By: ezyang
fbshipit-source-id: 12d77b3917310186fbddf11c59b2794dc859131f
Summary:
- Update API calls `backward` and `optim.step` now that we require `context_id`
- Add notes to clarify purpose of distributed autograd context (this was a source of confusion in some feedback)
- Add note that details why optimizer requires context_id
- Clearly specify that we don't have SMART mode yet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34657
Differential Revision: D20427667
Pulled By: rohan-varma
fbshipit-source-id: 5f8a3539ccf648a78e9e9a0dfdfe389c678b1606
Summary:
This is a redo of https://github.com/pytorch/pytorch/pull/33791, which was reverted because it introduced a flaky test. The test was flaky, and only flaky on Python 3.5, because of dict order randomization.
I've fixed the issue with tests clobbering each other in b539fec and, in e0d7402, removed the override tests for `torch.nn.functional.tanh` and `torch.nn.functional.sigmoid`, which are deprecated and shouldn't be overridable. I also verified that no more test clobbering is happening.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34240
Differential Revision: D20252442
Pulled By: cpuhrsch
fbshipit-source-id: 069568e342a41c90e1dc76cbf85ba4aed47f24be
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34515
Once upon a time we thought this was necessary. In reality it is not, so
removing it.
For backcompat, our public interface (defined in `api/`) still has
typedefs to the old `script::` names.
There was only one collision: `Pass` as a `Stmt` and `Pass` as a graph
transform. I renamed one of them.
Test Plan: Imported from OSS
Differential Revision: D20353503
Pulled By: suo
fbshipit-source-id: 48bb911ce75120a8c9e0c6fb65262ef775dfba93
Summary:
Improves explanation of non-determinism when running on GPUs. Adds info about `torch.nn.BCELoss` operating non-deterministically on GPUs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33795
Differential Revision: D20284880
Pulled By: ngimel
fbshipit-source-id: d543959636d261a80c234150304344b19a37ba5d
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33182
This adds private API functions that developers of types that implement `__torch_function__` can use to ensure full coverage of the subset of the PyTorch API that can be overrided.
I've refactored some of the code in the tests into a new `torch._overrides.get_overridable_functions` function. I've also changed `TENSOR_LIKE_TORCH_OVERRIDES` into `torch._overrides.get_testing_overrides` and `IGNORED_TORCH_FUNCTIONS` into `torch._overrides.get_ignored_functions`. Making these two static global variables in the tests into functions should allow rewriting their implementation to construct their return values instead of just statically defining the return value as is done here. Currently that is blocked on not being able to inspect function signatures of compiled kernels in PyTorch (see https://github.com/pytorch/pytorch/issues/28233). See the docs I've added for usage examples of these new functions. I also refactored the existing override tests to make use of these new functions, which should be a good forcing function to make sure they're kept up-to-date.
Finally, while working on this I discovered that `TestTorchFunctionOverrides.test_mean` and `TestTorchFunctionOverrides.test_mm` weren't ever being run because they were getting clobbered by the other dynamically generated override tests. I fixed that by renaming the tests and then fixing the actual test code. I've verified that all the subclassing semantics is correct and that the updated test answers are correct. I'm happy to put the fixes to the existing tests in as a separate pull request if that would be easier to review.
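A quick sketch of how the new helpers can be used (shown with the public `torch.overrides` names they later moved to; at the time of this PR they lived in `torch._overrides`):
```python
import torch
from torch.overrides import (
    get_overridable_functions,
    get_testing_overrides,
    get_ignored_functions,
)

overridable = get_overridable_functions()  # namespace -> functions that honor __torch_function__
dummies = get_testing_overrides()          # function -> lambda with a matching signature
ignored = get_ignored_functions()          # functions that cannot be overridden

print(len(overridable[torch]), len(dummies), len(ignored))
```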
ping cpuhrsch since the feature request originally came from them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33791
Differential Revision: D20195053
Pulled By: cpuhrsch
fbshipit-source-id: 1585f4e405f5223932b410eae03a288dc8eb627e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33711. Fixes #33480.
This makes `dist_autograd.backward` and `dist_optimizer.step` functional by making the user explicitly pass in the `context_id` as opposed to relying on the confusing thread_local context_id.
This diff incorporates these API changes and all places where these functions are called.
More concretely, this code:
```
with dist_autograd.context():
    # Forward pass.
    dist_autograd.backward([loss.sum()])
    dist_optim.step()
```
should now be written as follows:
```
with dist_autograd.context() as context_id:
    # Forward pass.
    dist_autograd.backward(context_id, [loss.sum()])
    dist_optim.step(context_id)
```
Test Plan: Ensuring all existing dist_autograd and dist_optimizer tests pass with the new API. Also added a new test case for input checking.
Differential Revision: D20011710
fbshipit-source-id: 216e12207934a2a79c7223332b97c558d89d4d65
Summary:
Also, windows memory failures responsible for the earlier reversion have been fixed.
This PR (initially) contains 2 commits:
* a revert of the revert
* all changes to implement the original Apex scale update heuristic, squashed into a single commit for easier diff review
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33366
Differential Revision: D20099026
Pulled By: ngimel
fbshipit-source-id: 339b9b6bd5134bf055057492cd1eedb7e4461529
Summary:
This PR implements the gradient scaling API that mruberry, jjsjann123, ngimel, zdevito, gchanan and I have been discussing. Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081.
Volume-wise, this PR is mostly documentation and tests. The Python API (found entirely in `torch/cuda/amp/amp_scaler.py`) is lightweight. The exposed functions are intended to make the implementation and control flow of gradient scaling convenient, intuitive, and performant.
The API is probably easiest to digest by looking at the documentation and examples. `docs/source/amp.rst` is the homepage for the Automatic Mixed Precision package. `docs/source/notes/amp_examples.rst` includes several examples demonstrating common but not-immediately-obvious use cases. Examples are backed by tests in `test_cuda.py` (and thankfully the tests pass :P).
Two small utility kernels have been added in `native/cuda/AmpKernels.cu` to improve performance and avoid host-device synchronizations wherever possible.
Existing optimizers, both in the wild and in Pytorch core, do not need to change to use the scaling API.
However, the API was also designed to establish a contract between user scripts and optimizers such that writers of _new_ custom optimizers have the control points they need to implement fast, optionally sync-free updates. User scripts that obey the scaling API can drop such custom optimizers in and reap performance benefits without having to change anything aside from the optimizer constructor itself. [I know what the contract with custom optimizers should be](35829f24ef/torch/cuda/amp/amp_scaler.py (L179-L184)), but I'm waiting for review on the rest of the API before I go about documenting it (it will be given a dedicated section in `docs/source/notes/amp_examples.rst`).
Currently, the gradient scaling examples do not include the auto-casting API as discussed in https://github.com/pytorch/pytorch/issues/25081. The gradient scaling API is intended to be orthogonal/modular relative to autocasting. Without auto-casting the gradient scaling API is fully use-_able_, but not terribly use-_ful_, so it's up to you guys whether you want to wait until auto-casting is ready before merging the scaling API as well.
### Todo
- [ ] How do I get c10 registered status for my two custom kernels? They're very simple.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26512
Differential Revision: D19859905
Pulled By: mruberry
fbshipit-source-id: bb8ae6966214718dfee11345db824389e4286923
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33083
Added more recommendations, some notes and warning
Test Plan: cd docs ; make html
Differential Revision: D19829133
Pulled By: ilia-cher
fbshipit-source-id: b9fbd89f5875b3ce35cc42ba75a3b44bb132c506
Summary:
"in_features" and "out_features" are not defined. Possibly a typo. They should be "input_features" and "output_features" instead
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31682
Differential Revision: D19251685
Pulled By: zou3519
fbshipit-source-id: ac9e524e792a1853a16e8876d76b908495d8f35e
Summary:
This is a re-do of https://github.com/pytorch/pytorch/issues/27064, which was reverted (b8792c0438). It landed at the same time as other work that added new operators to the `torch` namespace, so the check for whether the `torch` namespace is exhaustively checked for overridability was triggering test failures.
I've temporarily disabled that check and added an explanatory comment that the check will be re-enabled in a future PR that will be merged during a time when the commit velocity on PyTorch is lower.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30730
Differential Revision: D18813270
Pulled By: ezyang
fbshipit-source-id: 70477c4656dca8fea6e7bc59259555041fcfbf68