Summary:
Added a new option in AutogradContext to tell autograd not to materialize output grad tensors, that is, not to expand undefined/None tensors into tensors full of zeros before passing them as input to the backward function.
This PR is the second part that closes https://github.com/pytorch/pytorch/issues/41359. The first PR is https://github.com/pytorch/pytorch/pull/41490.
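A minimal sketch of the Python-side counterpart added in the first PR (`ctx.set_materialize_grads`); the function and variable names here are purely illustrative:
```
import torch

class MulAdd(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, y):
        # Ask autograd not to expand undefined output grads into zero tensors.
        ctx.set_materialize_grads(False)
        ctx.save_for_backward(x, y)
        return x * y, x + y

    @staticmethod
    def backward(ctx, grad_prod, grad_sum):
        x, y = ctx.saved_tensors
        # With materialization disabled, an unused output's gradient arrives
        # here as None instead of a tensor of zeros.
        grad_x = grad_y = None
        if grad_prod is not None:
            grad_x, grad_y = grad_prod * y, grad_prod * x
        if grad_sum is not None:
            grad_x = grad_sum if grad_x is None else grad_x + grad_sum
            grad_y = grad_sum if grad_y is None else grad_y + grad_sum
        return grad_x, grad_y

x = torch.randn(3, requires_grad=True)
y = torch.randn(3, requires_grad=True)
prod, _ = MulAdd.apply(x, y)  # only the product is used downstream
prod.sum().backward()         # grad_sum arrives as None in backward above
```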
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41821
Reviewed By: albanD
Differential Revision: D22693163
Pulled By: heitorschueroff
fbshipit-source-id: a8d060405a17ab1280a8506a06a2bbd85cb86461
Summary:
This follows pytorch/rfcs#3.
From the goals in the RFC:
1. Support subclassing `torch.Tensor` in Python (done here)
2. Preserve `torch.Tensor` subclasses when calling `torch` functions on them (done here)
3. Use the PyTorch API with `torch.Tensor`-like objects that are _not_ `torch.Tensor`
subclasses (done in https://github.com/pytorch/pytorch/issues/30730)
4. Preserve `torch.Tensor` subclasses when calling `torch.Tensor` methods. (done here)
5. Propagating subclass instances correctly also with operators, using
views/slices/indexing/etc. (done here)
6. Preserve subclass attributes when using methods or views/slices/indexing. (done here)
7. A way to insert code that operates on both functions and methods uniformly
(so we can write a single function that overrides all operators). (done here)
8. The ability to give external libraries a way to also define
functions/methods that follow the `__torch_function__` protocol. (will be addressed in a separate PR)
This PR makes the following changes (see the sketch after the list):
1. Adds the `self` argument to the arg parser.
2. Dispatches on `self` as well if `self` is not `nullptr`.
3. Adds a `torch._C.DisableTorchFunction` context manager to disable `__torch_function__`.
4. Adds a `torch::torch_function_enabled()` and `torch._C._torch_function_enabled()` to check the state of `__torch_function__`.
5. Dispatches all `torch._C.TensorBase` and `torch.Tensor` methods via `__torch_function__`.
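Taken together, the changes above make something like the following minimal sketch possible (the subclass name and attribute-propagation logic are purely illustrative, following the documented extending-PyTorch pattern):
```
import torch

class MetadataTensor(torch.Tensor):
    # Illustrative subclass that carries an extra attribute through ops.
    def __new__(cls, data, note=None):
        t = torch.Tensor._make_subclass(cls, torch.as_tensor(data))
        t.note = note
        return t

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        with torch._C.DisableTorchFunction():
            out = func(*args, **kwargs)
        # Re-wrap plain tensor results and propagate the custom attribute
        # (simplified: assumes the first argument carries the metadata).
        if isinstance(out, torch.Tensor) and not isinstance(out, cls):
            out = out.as_subclass(cls)
        if isinstance(out, cls):
            out.note = getattr(args[0], "note", None)
        return out

m = MetadataTensor([1.0, 2.0], note="calibration")
r = torch.sin(m) + m              # functions, operators and methods all dispatch here
print(type(r).__name__, r.note)   # MetadataTensor calibration

with torch._C.DisableTorchFunction():
    torch.sin(m)                  # __torch_function__ dispatch is skipped here
```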
TODO:
- [x] Sequence Methods
- [x] Docs
- [x] Tests
Closes https://github.com/pytorch/pytorch/issues/28361
Benchmarks in https://github.com/pytorch/pytorch/pull/37091#issuecomment-633657778
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37091
Reviewed By: ngimel
Differential Revision: D22765678
Pulled By: ezyang
fbshipit-source-id: 53f8aa17ddb8b1108c0997f6a7aa13cb5be73de0
Summary:
A small PR fixing some formatting in lcm, gcd, and the serialization note. Adds a note to lcm and gcd explaining behavior that is not always defined.
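For reference, a short sketch of the convention this presumably covers (gcd/lcm at zero); the values in the comments follow the standard integer conventions:
```
import torch

a = torch.tensor([0, 12, 9])
b = torch.tensor([0, 18, 6])
torch.gcd(a, b)  # tensor([0, 6, 3]); gcd(0, 0) is taken to be 0 by convention
torch.lcm(a, b)  # tensor([0, 36, 18]); lcm(0, 0) is likewise taken to be 0
```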
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41526
Reviewed By: ngimel
Differential Revision: D22569341
Pulled By: mruberry
fbshipit-source-id: 5f5ff98c0831f65e82b991ef444a5cee8e3c8b5a
Summary:
Doc update intended to clarify and expand our current serialization behavior, including explaining the difference between torch.save/torch.load, torch.nn.Module.state_dict/torch.nn.Module.load_state_dict, and torch.jit.save/torch.jit.load. Also explains, for the first time, when historic serialized TorchScript behavior is preserved, and our recommendation for preserving behavior (use the same PyTorch version to load a model as was used to save it).
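A minimal sketch of the three flavors the note distinguishes (file names are placeholders):
```
import torch
import torch.nn as nn

model = nn.Linear(4, 2)

# torch.save/torch.load: general serialization; the recommended pattern is to
# save only the state_dict and restore it into a freshly constructed module.
torch.save(model.state_dict(), "weights.pt")
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load("weights.pt"))

# torch.jit.save/torch.jit.load: serialize a scripted module (code + weights).
scripted = torch.jit.script(model)
torch.jit.save(scripted, "scripted.pt")
loaded = torch.jit.load("scripted.pt")
```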
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41395
Reviewed By: ngimel
Differential Revision: D22560538
Pulled By: mruberry
fbshipit-source-id: dbc2f1bb92ab61ff2eca4888febc21f7dda76ba1
Summary:
Some people have been confused by `retain_graph` in the snippet; they thought it was an additional requirement imposed by amp.
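For context, a standard amp step needs no `retain_graph` at all; a minimal sketch (model, data and hyperparameters are placeholders, and a CUDA device is assumed):
```
import torch

device = "cuda"
model = torch.nn.Linear(8, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    inputs = torch.randn(16, 8, device=device)
    targets = torch.randn(16, 1, device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
    # retain_graph is only needed when backpropagating through the same graph
    # more than once; it is not an amp requirement.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```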
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41203
Differential Revision: D22463700
Pulled By: ngimel
fbshipit-source-id: e6fc8871be2bf0ecc1794b1c6f5ea99af922bf7e
Summary:
I ran `make linkcheck` using `sphinx.builders.linkcheck` on the documentation and noticed a few links weren't using HTTPS so I quickly updated them all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40878
Differential Revision: D22404647
Pulled By: ngimel
fbshipit-source-id: 9c9756db59197304023fddc28f252314f6cf4af3
Summary:
Currently, a custom autograd function written with
```
@torch.cuda.amp.custom_fwd(cast_inputs=dtype)
def forward(ctx, *args):
    ...
```
casts incoming floating-point CUDA tensors to `dtype` unconditionally, regardless of whether the function executes in an autocast-enabled region. I think I had the wrong idea there. Autocast-disabled regions should give the user control of input types. Also, `custom_fwd(cast_inputs=dtype)`-decorated functions' behavior should align with native fp32list/fp16list functions. C++-side casting wrappers have no effect when autocast is disabled, and `custom_fwd`'s casting should behave the same way.
The present PR changes `custom_fwd` so it only casts in autocast-enabled regions (also updates custom_fwd to ignore fp64 inputs, like the C++ wrappers).
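A minimal sketch of the resulting behavior (the function name is illustrative, and a CUDA device is assumed):
```
import torch
from torch.cuda.amp import autocast, custom_bwd, custom_fwd

class ScaledMM(torch.autograd.Function):
    @staticmethod
    @custom_fwd(cast_inputs=torch.float32)
    def forward(ctx, a, b):
        ctx.save_for_backward(a, b)
        return a.mm(b)

    @staticmethod
    @custom_bwd
    def backward(ctx, grad):
        a, b = ctx.saved_tensors
        return grad.mm(b.t()), a.t().mm(grad)

a = torch.randn(4, 4, device="cuda", dtype=torch.float16, requires_grad=True)
b = torch.randn(4, 4, device="cuda", dtype=torch.float16, requires_grad=True)

with autocast():
    out = ScaledMM.apply(a, b)  # autocast enabled: inputs are cast to float32
print(out.dtype)                # torch.float32

out = ScaledMM.apply(a, b)      # autocast disabled: inputs are left untouched
print(out.dtype)                # torch.float16
```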
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36171
Differential Revision: D22179511
Pulled By: ngimel
fbshipit-source-id: 5a93d070179a43206066bce19da0a5a19ecaabbd
Summary:
Removes a line mentioning `ProcessGroupRoundRobin` since we don't intend it to be used as a public API just yet. We can add this back when we officially support the API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40380
Differential Revision: D22165556
Pulled By: rohan-varma
fbshipit-source-id: 24d0477d881dc74f2ff579de61dfd1ced2b09e75
Summary:
Make the Linear layer work correctly when bias is False.
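For reference, a tiny usage sketch:
```
import torch

layer = torch.nn.Linear(4, 2, bias=False)
print(layer.bias)               # None -- no bias parameter is created
out = layer(torch.randn(3, 4))  # y = x A^T, with no bias term added
```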
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38002
Differential Revision: D21509679
Pulled By: malfet
fbshipit-source-id: c7077992cf414ecc557b39e5ed1e39ef01c8b347
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37548
Moving RecordFunction from torch::autograd::profiler into the at namespace
Test Plan:
CI
Imported from OSS
Differential Revision: D21315852
fbshipit-source-id: 4a4dbabf116c162f9aef0da8606590ec3f3847aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37491
This PR modernizes the RecordFunction API and adds thread-local callbacks
in addition to the global ones.
Changes:
- support for TLS callbacks; this is going to be the foundation of the profiler and other tools
- modernize the interface around a simple set of functions (add|remove|has|clear)(Global|ThreadLocal)(Callback), and add RecordFunctionCallback to easily construct the callbacks to be passed
- add `.setShouldRun` to the callback interface to support cases when simple uniform sampling is not enough
- to properly support add/remove, introduce the idea of a callback handle returned by add
- the internal implementation still uses SmallVector to store intermediate state (as before); in this case these are vectors of handles of the callbacks that were picked to run
- to speed up runtime we keep these vectors sorted, so we can quickly enumerate the callbacks that need to be run
- added tests for the new functionality
Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py develop install
./build/bin/test_jit
CI
record_function_benchmark: https://gist.github.com/ilia-cher/f1e094dae47fe23e55e7672ac4dcda2f
Imported from OSS
Differential Revision: D21300448
fbshipit-source-id: 6d55c26dbf20b33d35c3f1604dcc07bb063c8c43
Summary:
xref gh-32838, gh-34032
This is a major refactor of parts of the documentation to split it up using sphinx's `autosummary` feature, which will build out `autofunction` and `autoclass` stub files and link to them. The end result is that the top module pages like torch.nn.rst and torch.rst are now more like tables of contents for the actual single-class or single-function documentation pages.
Along the way, I modified many of the docstrings to eliminate sphinx warnings when building. I think the only thing I changed from a non-documentation perspective is to add names to `__all__` when adding them to `globals()` in `torch/__init__.py`.
I do not know the CI system: are the documentation build artifacts available after the build, so reviewers can preview before merging?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37419
Differential Revision: D21337640
Pulled By: ezyang
fbshipit-source-id: d4ad198780c3ae7a96a9f22651e00ff2d31a0c0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37382
After adding c10::DispatchKey::Profiler, the behavior of RecordFunction
observers is also controlled by the dispatch key;
this PR moves that logic out of the profiler and into the record function
Reviewed By: jamesr66a
Differential Revision: D21268320
fbshipit-source-id: 93207e3b55325d20dcc5b1e8f448ab86933321da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37292
After adding c10::DispatchKey::Profiler, the behavior of RecordFunction
observers is also controlled by the dispatch key;
this PR moves that logic out of the profiler and into the record function
Reviewed By: jamesr66a
Differential Revision: D21245094
fbshipit-source-id: 595e41b18206d2ba4cf639cb320f630907868b3f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37195
After adding c10::DispatchKey::Profiler, the behavior of RecordFunction
observers is also controlled by the dispatch key;
this PR moves that logic out of the profiler and into the record function
Reviewed By: ngimel
Differential Revision: D21213786
fbshipit-source-id: e618254da74a4f1ce16c51a3869bbd75a4f561ad
Summary:
Several people have asked me about proper Amp usage with gradient accumulation. In particular, it's [unclear to people](https://github.com/NVIDIA/apex/issues/439#issuecomment-610351482) that you should only call `scaler.unscale_()` (if desired) and `scaler.update()` in iterations where you actually plan to step. This PR adds a minimal accumulation example.
I built the docs locally and it looks free from sphinx errors, at least.
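A minimal sketch of the accumulation pattern, close to the example added here (model, data and hyperparameters are placeholders, and a CUDA device is assumed):
```
import torch

device = "cuda"
model = torch.nn.Linear(8, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()
accum_steps = 4

for step in range(16):
    inputs = torch.randn(16, 8, device=device)
    targets = torch.randn(16, 1, device=device)
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        loss = loss / accum_steps           # normalize for accumulation
    scaler.scale(loss).backward()           # grads accumulate across iterations
    if (step + 1) % accum_steps == 0:
        # unscale_ (optional, e.g. for clipping), step and update only in
        # iterations where the optimizer actually steps.
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```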
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36601
Differential Revision: D21082295
Pulled By: ngimel
fbshipit-source-id: b2faa6c02b9f7e1972618a0f1d5360a03f0450ac
Summary:
Full details in task: https://our.intern.facebook.com/intern/tasks/?t=64776265
With PyTorch 1.5+ we removed Python 2 support from PyTorch. All documentation under docs/ and on the pytorch.org website needs to remove Python 2 references.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36114
Differential Revision: D20901746
Pulled By: jlin27
fbshipit-source-id: 07f8dc8e6fab0b232e5048a63079cab0c433c85f
Summary: This diff fixes the issues with the current handling of debug information passed along during execution of the model. (For example, it is possible that multiple calls to the debug guard may override each other.)
Test Plan: CI test/cpp/jit
Reviewed By: dzhulgakov
Differential Revision: D20602775
fbshipit-source-id: 4683957954028af81a1a0f1f12b243650230c9bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34710
Extending the RecordFunction API to support new recording scopes (such as TorchScript functions), as well as giving more flexibility to set the sampling rate.
Test Plan: unit test (test_misc.cpp/testRecordFunction)
Reviewed By: gdankel, dzhulgakov
Differential Revision: D20158523
fbshipit-source-id: a9e0819d21cc06f4952d92d43246587c36137582
Summary:
## Motivation
This PR upgrades MKL-DNN from v0.20 to DNNL v1.2 and resolves https://github.com/pytorch/pytorch/issues/30300.
DNNL (Deep Neural Network Library) is the new brand of MKL-DNN, which improves performance, quality, and usability over the old version.
This PR focuses on the migration of all existing functionality, including minor fixes, performance improvements and code cleanup. It serves as the cornerstone of our future efforts to accommodate new features like OpenCL support, BF16 training, INT8 inference, etc. and to let the PyTorch community derive more benefits from the Intel Architecture.
## What's included?
Even though DNNL has many breaking changes to the API, we managed to absorb most of them in ideep. This PR contains minimal changes to the integration code in PyTorch. Below is a summary of the changes:
**General:**
1. Replace op-level allocator with global-registered allocator
```
// before
ideep::sum::compute<AllocForMKLDNN>(scales, {x, y}, z);
// after
ideep::sum::compute(scales, {x, y}, z);
```
The allocator is now registered at `aten/src/ATen/native/mkldnn/IDeepRegistration.cpp`. Thereafter all tensors derived from the `cpu_engine` (by default) will use the c10 allocator.
```
RegisterEngineAllocator cpu_alloc(
  ideep::engine::cpu_engine(),
  [](size_t size) {
    return c10::GetAllocator(c10::DeviceType::CPU)->raw_allocate(size);
  },
  [](void* p) {
    c10::GetAllocator(c10::DeviceType::CPU)->raw_deallocate(p);
  }
);
```
------
2. Simplify group convolution
We had a scenario in convolution where the ideep tensor shape mismatched the aten tensor: when `groups > 1`, DNNL expects weight tensors to be 5-d with an extra group dimension, e.g. `goihw` instead of `oihw` in the 2d conv case.
As shown below, a lot of extra checks came with this difference in shape before. Now we've completely hidden this difference in ideep, and all tensors align with PyTorch's definition, so we could safely remove these checks from both the aten and c2 integration code.
```
// aten/src/ATen/native/mkldnn/Conv.cpp
if (w.ndims() == x.ndims() + 1) {
  AT_ASSERTM(
      groups > 1,
      "Only group _mkldnn_conv2d weights could have been reordered to 5d");
  kernel_size[0] = w.get_dim(0) * w.get_dim(1);
  std::copy_n(
      w.get_dims().cbegin() + 2, x.ndims() - 1, kernel_size.begin() + 1);
} else {
  std::copy_n(w.get_dims().cbegin(), x.ndims(), kernel_size.begin());
}
```
------
3. Enable DNNL built-in cache
Previously, we stored DNNL jitted kernels along with intermediate buffers inside ideep using an LRU cache. Now we are switching to the newly added DNNL built-in cache, and **no longer** caching buffers in order to reduce memory footprint.
This change will be mainly reflected in lower memory usage in memory profiling results. On the code side, we removed a couple of lines of `op_key_` handling that depended on the ideep cache.
------
4. Use 64-bit integer to denote dimensions
We changed the type of `ideep::dims` from `vector<int32_t>` to `vector<int64_t>`. This renders ideep dims no longer compatible with the 32-bit dims used by caffe2, so we use something like `{stride_.begin(), stride_.end()}` to cast the parameter `stride_` into an int64 vector.
**Misc changes in each commit:**
**Commit:** change build options
Some build options were slightly changed, mainly to avoid name collisions with other projects that include DNNL as a subproject. In addition, the DNNL built-in cache is enabled by the option `DNNL_ENABLE_PRIMITIVE_CACHE`.
Old | New
-- | --
WITH_EXAMPLE | MKLDNN_BUILD_EXAMPLES
WITH_TEST | MKLDNN_BUILD_TESTS
MKLDNN_THREADING | MKLDNN_CPU_RUNTIME
MKLDNN_USE_MKL | N/A (MKL is no longer used)
------
**Commit:** aten reintegration
- aten/src/ATen/native/mkldnn/BinaryOps.cpp
Implement binary ops using new operation `binary` provided by DNNL
- aten/src/ATen/native/mkldnn/Conv.cpp
Clean up group convolution checks
Simplify conv backward integration
- aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp
Simplify prepacking convolution weights
- test/test_mkldnn.py
Fixed an issue in the conv2d unit test: it didn't compare conv results between the mkldnn and aten implementations before. Instead, it compared mkldnn with mkldnn, since the default CPU path also goes into mkldnn. Now we use `torch.backends.mkldnn.flags` to fix this issue (a minimal sketch of this check follows the list)
- torch/utils/mkldnn.py
Prepack the weight tensor in module `__init__` to achieve significantly better performance
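A minimal sketch of the kind of check described above for test_mkldnn.py (shapes and tolerance are illustrative):
```
import copy
import torch
from torch.utils import mkldnn as mkldnn_utils

x = torch.randn(1, 3, 8, 8)
conv = torch.nn.Conv2d(3, 6, 3)

# Explicit mkldnn path: prepacked module operating on an mkldnn tensor.
mkldnn_conv = mkldnn_utils.to_mkldnn(copy.deepcopy(conv))
y_mkldnn = mkldnn_conv(x.to_mkldnn()).to_dense()

# Reference path: disable mkldnn so the default CPU implementation is used.
with torch.backends.mkldnn.flags(enabled=False):
    y_ref = conv(x)

assert torch.allclose(y_mkldnn, y_ref, atol=1e-5)
```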
------
**Commit:** caffe2 reintegration
- caffe2/ideep/ideep_utils.h
Clean up unused type definitions
- caffe2/ideep/operators/adam_op.cc & caffe2/ideep/operators/momentum_sgd_op.cc
Unify tensor initialization with `ideep::tensor::init`. Obsolete `ideep::tensor::reinit`
- caffe2/ideep/operators/conv_op.cc & caffe2/ideep/operators/quantization/int8_conv_op.cc
Clean up group convolution checks
Revamp convolution API
- caffe2/ideep/operators/conv_transpose_op.cc
Clean up group convolution checks
Clean up deconv workaround code
------
**Commit:** custom allocator
- Register c10 allocator as mentioned above
## Performance
We tested inference on some common models based on user scenarios, and most performance numbers are either better than or on par with DNNL 0.20.
ratio: new / old | Latency (batch=1 4T) | Throughput (batch=64 56T)
-- | -- | --
pytorch resnet18 | 121.4% | 99.7%
pytorch resnet50 | 123.1% | 106.9%
pytorch resnext101_32x8d | 116.3% | 100.1%
pytorch resnext50_32x4d | 141.9% | 104.4%
pytorch mobilenet_v2 | 163.0% | 105.8%
caffe2 alexnet | 303.0% | 99.2%
caffe2 googlenet-v3 | 101.1% | 99.2%
caffe2 inception-v1 | 102.2% | 101.7%
caffe2 mobilenet-v1 | 356.1% | 253.7%
caffe2 resnet101 | 100.4% | 99.8%
caffe2 resnet152 | 99.8% | 99.8%
caffe2 shufflenet | 141.1% | 69.0% †
caffe2 squeezenet | 98.5% | 99.2%
caffe2 vgg16 | 136.8% | 100.6%
caffe2 googlenet-v3 int8 | 100.0% | 100.7%
caffe2 mobilenet-v1 int8 | 779.2% | 943.0%
caffe2 resnet50 int8 | 99.5% | 95.5%
_Configuration:
Platform: Skylake 8180
Latency Test: 4 threads, warmup 30, iteration 500, batch size 1
Throughput Test: 56 threads, warmup 30, iteration 200, batch size 64_
† Shufflenet is one of the few models that require temp buffers during inference. The performance degradation is expected since we no longer cache any buffers in ideep. As a solution, we suggest users opt for a caching allocator like **jemalloc** as a drop-in replacement for the system allocator in such heavy workloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32422
Test Plan:
Perf results: https://our.intern.facebook.com/intern/fblearner/details/177790608?tab=Experiment%20Results
10% improvement for ResNext with avx512, neutral on avx2
More results: https://fb.quip.com/ob10AL0bCDXW#NNNACAUoHJP
Reviewed By: yinghai
Differential Revision: D20381325
Pulled By: dzhulgakov
fbshipit-source-id: 803b906fd89ed8b723c5fcab55039efe3e4bcb77