pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
chunyuan	8b11d81058	[Re-landing 68111] Add JIT graph fuser for oneDNN Graph API (Preview4.1) Re-landing https://github.com/pytorch/pytorch/pull/68111 ## Description Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444). On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included: - The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used - The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties. ### User API: The optimization pass is disabled by default. Users could enable it by: ``` torch.jit.enable_onednn_fusion(True) ``` ### Performance: [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance: - SkyLake 8180 (1 socket of 28 cores): ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png) - SkyLake 8180 (single thread): ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png) \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI) \** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops ### Directory structure of the integration code Fuser-related code are placed under: ``` torch/csrc/jit/codegen/onednn/ ``` Optimization pass registration is done in: ``` torch/csrc/jit/passes/onednn_graph_fuser.h ``` CMake for the integration code is: ``` caffe2/CMakeLists.txt ``` ## Limitations - In this PR, we have only supported the optimization on Linux platform. The support on Windows and MacOS will be enabled as the next step. - We have only optimized the inference use case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74596 Approved by: https://github.com/malfet	2022-04-29 01:01:33 +00:00
Ivan Yashchuk	8bb7203049	Add torch.linalg.ldl_factor_ex and torch.linalg.ldl_solve This PR adds a function for computing the LDL decomposition and a function that can solve systems of linear equations using this decomposition. The result of `torch.linalg.ldl_factor_ex` is in a compact form and it's required to use it only through `torch.linalg.ldl_solve`. In the future, we could provide `ldl_unpack` function that transforms the compact representation into explicit matrices. Fixes https://github.com/pytorch/pytorch/issues/54847. cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/69828 Approved by: https://github.com/Lezcano, https://github.com/mruberry, https://github.com/albanD	2022-04-28 19:23:37 +00:00
Jerry Zhang	30342f6ba6	[quant][docs] Fix formatting for quantization.rst (#76223 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/76223 Small formatting fixes that was missed because I didn't check the generated doc last time Test Plan: visual inspection of the generated docs for this PR Imported from OSS Reviewed By: HDCharles Differential Revision: D35853174 fbshipit-source-id: 4454a4bf5d0c998d866bbae1d6b5286827082033 (cherry picked from commit 125f60356ccc9cd6888c515889bd27ff9860ec74)	2022-04-26 03:16:39 +00:00
Elias Ellison	0d7be81c9c	[JIT] Add Context Manager to force strict fusion Fixes https://github.com/pytorch/pytorch/issues/75464 Adds a context manager that will throw if the ops in the context are not fused. API is : ``` with torch.jit.strict_fusion(): ... ``` A few TODOs: [+] Compose/figure out how to do with autodiff - right now it will run on autodiff as well [+] Support all of the nvfuser operators that are added in guarding [+] Figure out what to do with control flow that isn't taken (right now it will just error). this is probably a source of the original issue :/ - will just error [+] (After those are figured out) add to docs Pull Request resolved: https://github.com/pytorch/pytorch/pull/75777 Approved by: https://github.com/davidberard98	2022-04-25 16:08:57 +00:00
Jerry Zhang	056627ddce	[quant][docs] Add more docs for quantization.rst (#75998 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75998 Add more details to user facing docs quantization.rst, which will be displayed in the official quantization doc page: https://pytorch.org/docs/stable/quantization.html This includes: * docs for quantization stack (quantized tensor, quantized operator and modules, observer, fake_quantize, QConfig, quantization flow) * Added support table for quantization mode, quantization flow mode and backend, (also moved around operator support table) * restructured eager mode and fx mode docs as well Test Plan: inspect the doc that's built by github ci Imported from OSS Reviewed By: dzdang Differential Revision: D35739111 fbshipit-source-id: 3762d387479bdd37472cb17d5c49da2f520effbb (cherry picked from commit db5e6411c52c08dd9c45f841ab86713d36a75d51)	2022-04-22 06:42:39 -07:00
albanD	a6a5e6cecf	move the stateless util to public API! Pull Request resolved: https://github.com/pytorch/pytorch/pull/75834 Approved by: https://github.com/zou3519, https://github.com/jbschlosser	2022-04-21 13:42:24 +00:00
kshitij12345	aa51704ce5	[complex32] add chalf alias for complex32 and chalf method Reference: https://github.com/pytorch/pytorch/issues/74537 Adds chalf alias for complex32 and also adds method `chalf` similar to `cfloat, cdouble` TODO: * [x] Add docs * [x] Add override Pull Request resolved: https://github.com/pytorch/pytorch/pull/75320 Approved by: https://github.com/anjali411	2022-04-20 23:44:47 +00:00
Jerry Zhang	74454bdb46	[quant][fx] Move backend_config folder to torch.ao.quantization Summary: Following https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md we implemented the backend configuration for fbgemm/qnnpack backend, currently it was under fx folder, but we'd like to use this for all different workflows, including eager, fx graph and define by run quantization, this PR moves it to torch.ao.quantization namespace so that it can be shared by different workflows Also moves some utility functions specific to fx to fx/backend_config_utils.py and some files are kept in fx folder (quantize_handler.py and fuse_handler.py) Test Plan: python test/teset_quantization.py TestQuantizeFx python test/teset_quantization.py TestQuantizeFxOps python test/teset_quantization.py TestQuantizeFxModels python test/test_quantization.py TestAOMigrationQuantization python test/test_quantization.py TestAOMigrationQuantizationFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75823 Approved by: https://github.com/vkuzo	2022-04-19 15:38:57 +00:00
Alban Desmaison	bd7e99cbb9	Fix doc build Regression introduced in https://github.com/pytorch/pytorch/pull/73224 The caller for this script has never been updated to pass in main: `2ecc59086a/.github/workflows/_docs.yml (L81-L85)` So this change made it so that all PR doc is built as-if it was a release (for example https://github.com/pytorch/pytorch/runs/6031182009?check_suite_focus=true) and so the coverage test for the doc didn't run for a month :( Pull Request resolved: https://github.com/pytorch/pytorch/pull/75997 Approved by: https://github.com/musebc, https://github.com/seemethere	2022-04-19 04:07:47 +00:00
Brian Johnson	990d155c9c	Update Index.rst to add TorchRec to domain list. Adds TorchRec and TorchData to domain library list. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73229 Approved by: https://github.com/colin2328, https://github.com/jamesr66a	2022-04-15 02:39:12 +00:00
Nikita Shulga	348881deaf	Update doc copyrights to 2022 Also, s/Torch/PyTorch/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/75690 Approved by: https://github.com/kit1980, https://github.com/soumith	2022-04-13 00:25:23 +00:00
Yulv-git	ac2d2e3a3d	Fix some typos. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/75561 Approved by: https://github.com/albanD	2022-04-11 21:55:59 +00:00
Nuno-Mota	0bd3354547	Update onnx.rst Fixes #75508 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75509 Approved by: https://github.com/BowenBao	2022-04-08 20:07:01 +00:00
Mikayla Gawarecki	11f1fef981	Update documentation for scatter_reduce Pull Request resolved: https://github.com/pytorch/pytorch/pull/74608 Approved by: https://github.com/cpuhrsch	2022-04-07 15:41:23 +00:00
Thiago Crepaldi	89e79f844d	Add list of supported ATen ops by ONNX converter into torch.onnx page This PR introduces a new documentation page with a list of supported ATen operators by the ONNX converter. When `make html` (or similar) are called, a python script will generate a temporary CSV file inside the doc build folder with a list of operators/opsets currently supported by the PyTorch ONNX exporter. That CSV is used by Sphinx to build a HTML table using the same theme as the rest of the documentation. That page is linked to the existing `onnx.rst`, including its table of contents. @BowenBao @shubhambhokare1 Feel free to add more details on how the script cross reference onnx symbolics and aten operators list from torch jit api` Below is the workflow for the changed pages: The initial torch.onnx page was modified to add a link to the list of supported aten operators ![image](https://user-images.githubusercontent.com/5469809/159046387-c459bffc-c9b2-4fcb-8468-8181fdddf911.png) The screen below highlights the text structure changes to the `ATen operartors` section ![image](https://user-images.githubusercontent.com/5469809/159046730-ccd1e594-c8e6-4b8d-a9ec-8bf6ad58a435.png) Finally the new page with the list of supported operators is shown below ![image](https://user-images.githubusercontent.com/5469809/159046872-0d99b769-8b95-4c2b-99a9-a8cfdd0b6ecf.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/74397 Approved by: https://github.com/garymm, https://github.com/malfet	2022-04-07 00:05:44 +00:00
Vasiliy Kuznetsov	74b23b2066	quantization: autogenerate quantization backend configs for documentation (#75126 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75126 Quantization has a high volume of configurations of how to quantize an op for a reference model representation which is useful for a lowering step for a backend. An example of this is ``` {'dtype_configs': [{'input_dtype': torch.quint8, 'output_dtype': torch.quint8}], 'observation_type': <ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT: 0>, 'pattern': <class 'torch.nn.modules.conv.ConvTranspose1d'>}, ``` These configs are checked into master, and they are created with Python functions. Therefore, there is no easy way for the user to see what the configs actually are without running some Python code. This PR is one approach to document these configs. Here is what this is doing: 1. during documentation build, write a text file of the configs 2. render that text file on a quantization page, with some additional context In the future, this could be extended to autogenerate better looking tables such as: op support per backend and dtype, op support per valid quantization settings per backend, etc. Test Plan: ``` cd docs make html cd html python -m http.server 8000 // render http://[::]:8000/quantization-backend-configuration.html // it renders correctly ``` Reviewed By: ejguan Differential Revision: D35365461 Pulled By: vkuzo fbshipit-source-id: d60f776ccb57da9db3d09550e4b27bd5e725635a (cherry picked from commit 14865c0e23bc080120342c8f9278f0fae8eb8fbd)	2022-04-04 22:22:30 +00:00
Sherlockk Huang	bbf7e159e0	Implement torch.special.log_ndtr Implements torch.special.log_ndtr Issue: https://github.com/pytorch/pytorch/issues/50345 TODO: - [x] adding proper reference to scipy implementation - [x] double check if the changes in test/test_unary_ufuncs.py is really necessary - [x] check setting for UnaryUfuncInfo cc: @kshitij12345 @mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/74795 Approved by: https://github.com/anjali411	2022-03-29 23:13:37 +00:00
Smark	ab57876420	fix docs error in Autograd Mechanics Fixes #74682 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74807 Approved by: https://github.com/albanD	2022-03-29 18:32:16 +00:00
Janakan	923a922b1b	Grammatically updated quantization tech doc Improved PyTorch technical documentation consistency for the "quantization API summary" section. ![Screen Shot 2022-03-19 at 4 07 46 PM](https://user-images.githubusercontent.com/72175053/160317638-51e26ec0-903e-44ba-ba59-aa114d4fda93.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/74436 Approved by: https://github.com/albanD	2022-03-28 16:48:25 +00:00
Kurt Mohler	79ddc72b85	Virtualize `<type>Storage` classes (#66970 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/66228 cc ezyang bhosmer smessmer ljk53 bdhirsh Pull Request resolved: https://github.com/pytorch/pytorch/pull/66970 Reviewed By: bdhirsh Differential Revision: D33245612 Pulled By: ezyang fbshipit-source-id: 4c61c2cb029e2b94b0e68927c377d3e1c358dd7c (cherry picked from commit d29fcdfb4bc2cc17b1795d4349e4b56fa0d1cf12)	2022-03-22 23:44:48 +00:00
leslie-fang-intel	3a112ebb57	add autocast cpu doc As discussed in https://github.com/pytorch/pytorch/issues/55374#issuecomment-968333614, here we update the cpu autocast operation list in autocast API document. Pull Request resolved: https://github.com/pytorch/pytorch/pull/68567 Approved by: https://github.com/ezyang	2022-03-22 02:02:43 +00:00
Michael Suo	e5bf87963d	Revert D34584878: [pytorch][PR] Add JIT graph fuser for oneDNN Graph API (Preview4) Test Plan: revert-hammer Differential Revision: D34584878 (`7dd0823011`) Original commit changeset: ce817aa8cc90 Original Phabricator Diff: D34584878 (`7dd0823011`) fbshipit-source-id: a941aaad34f8fe5f0c51f719f9f5c29b811c4d5b (cherry picked from commit a43262ec7521b1665b02a64d3f279e72ee2344b9)	2022-03-21 23:07:14 +00:00
chunyuan	7dd0823011	Add JIT graph fuser for oneDNN Graph API (Preview4) (#68111 ) Summary: ## Description Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444). On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included: - The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used - The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties. ### User API: The optimization pass is disabled by default. Users could enable it by: ``` torch.jit.enable_onednn_fusion(True) ``` ### Performance: [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance: - SkyLake 8180 (1 socket of 28 cores): ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png) - SkyLake 8180 (single thread): ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png) \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI) \** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops ### Directory structure of the integration code Fuser-related code are placed under: ``` torch/csrc/jit/codegen/onednn/ ``` Optimization pass registration is done in: ``` torch/csrc/jit/passes/onednn_graph_fuser.h ``` CMake for the integration code is: ``` caffe2/CMakeLists.txt ``` ## Limitations - In this PR, we have only supported the optimization on Linux platform. The support on Windows and MacOS will be enabled as the next step. - We have only optimized the inference use case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/68111 Reviewed By: eellison Differential Revision: D34584878 Pulled By: malfet fbshipit-source-id: ce817aa8cc9052ee9ed930c9cf66be83449e61a4 (cherry picked from commit cd17683aa7d9c0947df45a1ab53627feff795587)	2022-03-21 22:12:19 +00:00
Jaewon Lee	11ea09effc	[CUDACachingAlloc/GPUInference] Implement garbage collection without GPU sync (#74261 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74261 ### Goal Implement a cheap way to reclaim GPU memory (garbage collection) without incurring GPU sync. ### Why do we need this? Currently, there are only two ways to reclaim GPU memory block already assigned to a particular stream. - `release_available_cached_blocks(params)`: Free blocks exceeding the `CachingAllocatorConfig::max_split_size()` until we can satisfy the request. Issue: If the `max_split_size` is unset (default), this function is a no-op. Even if this is set, the reclamation is quite conservative (e.g., never frees blocks under max_split_size). - `release_cached_blocks()`: Waits for all the in-flight events and then reclaim blocks. Issue: 'waiting for all event' is very expensive as it will likely stall all the GPU operations. Many GPU applications without a proper handling of potential GPU throttling would suffer/crash. ### Proposed idea - If the garbage collection threshold is set, try to reclaim some memory blocks without synchronization. It should be safe to do so, as `release_available_cached_blocks` essentially does the same thing (but less aggressively). - GC is triggered only when we fail to serve a `malloc` request from the block pool. No need to free blocks when the block pool is functioning just fine. - Prioritize reclaiming blocks that weren't reused for long time. Reclamation stops once the used memory capacity < threshold. - This code path is totally optional; by default it won't be invoked. Test Plan: - Unit tests - Manually checked that the GPU memory usage stays as indicated by the garbage collector. If not the caching allocator at least tries to keep freeing the blocks. Reviewed By: jianyuh Differential Revision: D34482514 fbshipit-source-id: d5eae62ac60b94b0bca851f9d233a092d086e3c2 (cherry picked from commit 05780f1ed4b176f05e765b2411c9eaa2eaeb48b0)	2022-03-21 18:46:02 +00:00
BowenBao	54a6942f8d	[ONNX] ONNX Exporter logging (#71342 ) Summary: Add ONNX exporter logging facility. Supporting both C++/Python logging api. Logging can be turned on/off. Logging output stream can be either set to `stdout` or `stderr`. A few other changes: * When exception is raised in passes, the current IR graph being processed will be logged. * When exception is raised from `_jit_pass_onnx` (the pass that converts nodes from namespace `ATen` to `ONNX`), both ATen IR graph and ONNX IR graph under construction will be logged. * Exception message for ConstantFolding is truncated to avoid being too verbose. * Update the final printed IR graph with node name in ONNX ModelProto as node attribute. Torch IR Node does not have name. Adding this to printed IR graph helps debugging. Pull Request resolved: https://github.com/pytorch/pytorch/pull/71342 Reviewed By: msaroufim Differential Revision: D34433473 Pulled By: malfet fbshipit-source-id: 4b137dfd6a33eb681a5f2612f19aadf5dfe3d84a (cherry picked from commit 67a8ebed5192c266f604bdcca931df6fe589699f)	2022-03-17 19:40:03 +00:00
Banit Agrawal	ac3effd150	[PyTorch GPU Allocator] Better use of blocks with rounding of allocation sizes (#74213 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74213 In the current CUDACachingAllocator, the sizes are rounded up in multiple of blocks size of 512, so this works for smaller sizes. However for large sizes, we can have lots of different size blocks in the larger pool. This is problematic when we have variable batch sizes 1001, 1021, 1023 -> all will go to different block size and will create different size of blocks. This will create lots of unused blocks and will waste GPU memory capacity. This diff adds a rounding approach to allocation size. It rounds up the size to nearest power-of-2 divisions and the power2-division can be changed with env variable setting. For example, if we need to round-up size of1200 and if number of divisions is 4, the size 1200 lies between 1024 and 2048 and if we do 4 divisions between them, the values are 1024, 1280, 1536, and 1792. So the function will return 1280 as the nearest ceiling of power-2 division. env setting: export PYTORCH_CUDA_ALLOC_CONF=roundup_power2_divisions:4 ghstack-source-id: 151446017 Reviewed By: ezyang Differential Revision: D34868036 fbshipit-source-id: 494785add16e6b37c920dcb5a2b81d4c637b554a (cherry picked from commit 548454ccacbd8700e7ffd2d762e40b4ba37abbae)	2022-03-16 02:53:53 +00:00
Ke Wen	1f04a00ccf	[PyTorch Distributed] Update documentation about NCCL environment variables (#74006 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74006 updated recommendations about environment variables to use during debug and performance tuning Test Plan: `make html` Reviewed By: rohan-varma Differential Revision: D34767454 fbshipit-source-id: 08cd58469bf72b58702e50e82020fa19b43b5911 (cherry picked from commit ac7e6630f8043f85d3d16be17c6a8ad1ebb2990c)	2022-03-11 23:57:17 +00:00
Alban Desmaison	734281c3d6	Cleanup all module references in doc (#73983 ) Summary: Working towards https://docs.google.com/document/d/10yx2-4gs0gTMOimVS403MnoAWkqitS8TUHX73PN8EjE/edit?pli=1# This PR: - Ensure that all the submodules are listed in a rst file (that ensure they are considered by the coverage tool) - Remove some long deprecated code that just error out on import - Remove the allow list altogether to ensure nothing gets added back there Pull Request resolved: https://github.com/pytorch/pytorch/pull/73983 Reviewed By: anjali411 Differential Revision: D34787908 Pulled By: albanD fbshipit-source-id: 163ce61e133b12b2f2e1cbe374f979e3d6858db7 (cherry picked from commit c9edfead7a01dc45bfc24eaf7220d2a84ab1f62e)	2022-03-10 22:26:29 +00:00
Alban Desmaison	238f7d9cbf	rename config module file to work with gh pages better Fixes https://github.com/pytorch/pytorch/issues/62018 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74038 Approved by: https://github.com/mruberry, https://github.com/seemethere	2022-03-10 20:41:44 +00:00
Rohit Goswami	979a78f8b2	Sphinx panel Fixes https://github.com/pytorch/pytorch/issues/73835. The full context for this is detailed in the issue, but briefly: - Adds `sphinx-panel` Other PRs will demonstrate usage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73836 Approved by: https://github.com/albanD	2022-03-07 14:50:09 +00:00
Pritam Damania	71aa3ab020	Add note in RPC docs about retries. (#73601 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73601 Some users had questions about how the RPC framework deals with failures and whether we retry. Adding a note about this to our docs to elaborate on our current behavior and why we chose that approach. ghstack-source-id: 150359866 Test Plan: view docs. Reviewed By: mrshenli Differential Revision: D34560199 fbshipit-source-id: ee33ceed7fa706270d4ca5c8fcff7535583490ff (cherry picked from commit 954a906240cc40aacf08ca13f6554a35303a678a)	2022-03-03 00:29:31 +00:00
Ren Pang	e8b10b6e34	fix wrong indexing of class names in docs Fixes #73631 Locally built and tested. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73632 Approved by: jbschlosser	2022-03-02 22:21:21 +00:00
Christian Puhrsch	484c0de670	Minimal NestedTensor (#72881 ) Summary: This PR adds a minimal version of a NestedTensor. It introduces the general harness future development can be built around. Pull Request resolved: https://github.com/pytorch/pytorch/pull/72881 Reviewed By: albanD Differential Revision: D34259177 Pulled By: cpuhrsch fbshipit-source-id: 0245c36f603424e20f3b09651043c207f526d760 (cherry picked from commit 10764e8d427f29b364567e4cbc86ed73c3933158)	2022-03-02 16:31:51 +00:00
Nikita Shulga	8ac7393565	Revert D33767740: [pytorch][PR] Sparse CSR CPU: cuSolverSP backend for `linalg.solve` Test Plan: revert-hammer Differential Revision: D33767740 (`199d9a992c`) Original commit changeset: a945f065210c Original Phabricator Diff: D33767740 (`199d9a992c`) fbshipit-source-id: b7934df18118f8d6d5f165deb5aae9887953ae43 (cherry picked from commit d3ddbb021b227e3638f6f7c22c6eadfa73695e31)	2022-03-01 18:33:23 +00:00
Kushashwa Ravi Shrimali	199d9a992c	Sparse CSR CPU: cuSolverSP backend for `linalg.solve` (#71399 ) Summary: This PR introduces the `cuSolverSP` backend for `linalg.solve` with sparse CSR input matrices. The motivation comes from the issue: https://github.com/pytorch/pytorch/issues/69538. `cuSolver` provides [`cusolverSp<t>csrlsvluHost`](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvlu) API, a few things to note: 1. As mentioned in the documentation: `only CPU (Host) path is provided.` From the profiling, there doesn't seem to be any GPU kernel launch for optimization, please see the profiling below. 2. Since only `host` path is provided, the CPU path uses `csrlsvluHost` (but requires PyTorch to be installed/built with CUDA support). 3. The documentation mentions reordering helps optimize stuff, but it isn't clear how it affects the performance. There are options for reordering, so we stick to `reorder = 0` as the default choice. `cuSolver` has [`csrlsvqr`](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvqr) function which provides a `device` path to solve the linear system. This function is used for the CUDA path in this PR. Gist: For CPU Path: we call [`csrlsvluHost` function of cuSolver](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvlu). For CUDA Path: we call [`csrlsvqr` function of cuSolver](https://docs.nvidia.com/cuda/cusolver/index.html#cusolver-lt-t-gt-csrlsvqr). Profiling: (On sparse input tensor of size 1000 x 1000, with a vector of shape length 1000), for `csrlsvlu` function (to show no GPU optimization) ```cpp ==3999651== Profiling result: Type Time(%) Time Calls Avg Min Max Name GPU activities: 100.00% 2.1440us 1 2.1440us 2.1440us 2.1440us [CUDA memcpy HtoD] API calls: 99.72% 1.07199s 9 119.11ms 500ns 1.07164s cudaFree 0.11% 1.2182ms 398 3.0600us 140ns 137.94us cuDeviceGetAttribute 0.06% 674.45us 4 168.61us 165.50us 173.64us cuDeviceTotalMem 0.03% 357.07us 4 89.268us 2.7800us 201.89us cudaMalloc 0.03% 309.29us 1 309.29us 309.29us 309.29us cudaGetDeviceProperties 0.01% 160.47us 332 483ns 350ns 3.3300us cudaFuncSetAttribute 0.01% 115.12us 4 28.780us 26.290us 33.410us cuDeviceGetName 0.00% 28.591us 5 5.7180us 440ns 16.921us cudaGetDevice 0.00% 22.061us 4 5.5150us 871ns 18.690us cudaDeviceSynchronize 0.00% 20.370us 18 1.1310us 410ns 6.9900us cudaEventDestroy 0.00% 16.390us 1 16.390us 16.390us 16.390us cudaMemcpy 0.00% 11.540us 2 5.7700us 1.4900us 10.050us cuDeviceGetPCIBusId 0.00% 10.510us 18 583ns 430ns 1.6200us cudaEventCreateWithFlags 0.00% 7.9100us 21 376ns 290ns 700ns cudaDeviceGetAttribute 0.00% 1.4300us 6 238ns 150ns 590ns cuDeviceGet 0.00% 1.2200us 4 305ns 190ns 500ns cuDeviceGetCount 0.00% 900ns 1 900ns 900ns 900ns cuInit 0.00% 860ns 4 215ns 180ns 260ns cuDeviceGetUuid 0.00% 240ns 1 240ns 240ns 240ns cuDriverGetVersion 0.00% 230ns 1 230ns 230ns 230ns cudaGetDeviceCount ``` Script: ```python import torch def solve(x, other, out): torch.linalg.solve(x, other, out=out) if __name__ == "__main__": dense_inp = torch.randn((1000, 1000), dtype=torch.float64) # Set 50% of the values to 0 randomly dense_inp = torch.nn.functional.dropout(dense_inp, p=0.5) sparse_inp = dense_inp.to_sparse_csr() other = torch.randint(100, (1000,), dtype=torch.float64) out = torch.randint(1, (1000,), dtype=torch.float64) solve(sparse_inp, other, out) ``` The following error is raised when the function is used on a CPU device with PyTorch built/installed without CUDA support: * When built without CUDA support: ```python /home/krshrimali/pytorch/torch/autograd/profiler.py:151: UserWarning: CUDA is not available, disabling CUDA profiling warn("CUDA is not available, disabling CUDA profiling") Traceback (most recent call last): File "/home/krshrimali/pytorch/test_sp.py", line 17, in <module> solve(x, other, out) File "/home/krshrimali/pytorch/test_sp.py", line 5, in solve torch.linalg.solve(x, other, out=out) RuntimeError: PyTorch was not built with CUDA support. Please use PyTorch built CUDA support ``` Performance Comparison (vs SciPy's [`scipy.sparse.linalg.spsolve`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.spsolve.html): Time taken by `scipy.sparse.linalg.spsolve` : 0.595 seconds On CPU: Time taken by `torch.linalg.solve` : 4.565 seconds On CUDA: Time taken by `torch.linalg.solve`: 1.838 seconds The inputs are of dimensions: (17281, 17281) and (17281, 1), and were taken from https://math.nist.gov/MatrixMarket/extreme.html. Thanks to IvanYashchuk for helping me with the PR, and guiding me through it. cc: IvanYashchuk pearu nikitaved cpuhrsch cc nikitaved pearu cpuhrsch Pull Request resolved: https://github.com/pytorch/pytorch/pull/71399 Reviewed By: VitalyFedyunin Differential Revision: D33767740 Pulled By: cpuhrsch fbshipit-source-id: a945f065210cd719096eb8d7cdbf8e8937c2fce9 (cherry picked from commit f4f35c17da414e1ca6c6d91402933521857aa1ea)	2022-03-01 05:32:35 +00:00
Vasiliy Kuznetsov	01bd6f4357	pytorch: fix typo in quantization docs (#73511 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73511 Fixes typo in describing the `torch.qint32` data type. Test Plan: CI Reviewed By: andrewor14 Differential Revision: D34522741 Pulled By: vkuzo fbshipit-source-id: f05f8440d9708281213a4b3736e8f59199dd7b1a (cherry picked from commit ca9e598d60cac016e58fda9cd0f329ca412ec36b)	2022-02-28 23:11:52 +00:00
Peter Bell	f437ca6e8e	Remove legacy tensor constructors for complex dtypes PR #72405 added four new types to the public python API: `torch.ComplexFloatTensor`, `torch.ComplexDoubleTensor`, `torch.cuda.ComplexFloatTensor` and `torch.cuda.ComplexDoubleTensor`. I believe this was unintentional and a clarifying comment as to the purpose of `all_declared_types` is needed to avoid this in future. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73370	2022-02-28 15:13:44 +00:00
Philip Meier	c6f1bbc0ac	promote torch.testing to stable (#73348 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73348 Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D34457727 Pulled By: mruberry fbshipit-source-id: 2cc812b643e0d1e753bead2751ee79b3f03fde20 (cherry picked from commit bcdaca1a019a679b8b274e2fb5f19bfd08874ce9)	2022-02-25 06:30:31 +00:00
Jacob Hepkema	91261feb7b	Add SoftplusTransform (#52300 ) Summary: This pull request introduces `SoftplusTransform` to `torch.distributions.transforms`. `SoftplusTransform` transforms via the mapping `Softplus(x) = log(1 + exp(x))`. Note that the transform is different to [`torch.nn.Softplus`](https://pytorch.org/docs/stable/generated/torch.nn.Softplus.html#torch.nn.Softplus), as that has additional `beta` and `threshold` parameters. Inverse and `log_abs_det_jacobian` for a more complex `SoftplusTransform` can be added in the future. vitkl fritzo Addresses the issue discussed here: [pyro issue 855](https://github.com/pyro-ppl/numpyro/issues/855) Pull Request resolved: https://github.com/pytorch/pytorch/pull/52300 Reviewed By: albanD, ejguan Differential Revision: D34082655 Pulled By: neerajprad fbshipit-source-id: 6114e74ee5d73c1527191bed612a142d691e2094 (cherry picked from commit a181a3a9e53a34214a503d38760ad7778d08a680)	2022-02-25 02:30:03 +00:00
Can Balioglu	0e7a7a5fe7	Add documentation for c10d log levels (#73361 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73361 This PR adds the documentation for the newly introduced `TORCH_CPP_LOG_LEVEL` and how it can be used along with `TORCH_DISTRIBUTED_DEBUG` to adjust the log level of c10d. ghstack-source-id: 149874995 Test Plan: Locally rendered and checked the documentation. Reviewed By: rohan-varma Differential Revision: D34452352 fbshipit-source-id: ecb54590f3030ddef9921a7152ca9f7fc9438345 (cherry picked from commit f4c7c6f3b27dbd3006686cf26a6e9e53cd2c8f09)	2022-02-24 20:38:15 +00:00
Edgar Andrés Margffoy Tuay	86deecd7be	Check clang++/g++ version when compiling CUDA extensions (#63230 ) Summary: See https://github.com/pytorch/pytorch/issues/55267 Pull Request resolved: https://github.com/pytorch/pytorch/pull/63230 Reviewed By: soulitzer Differential Revision: D34159119 Pulled By: malfet fbshipit-source-id: 6eef7582388bf6a42dcc1d82b6e4b1f40f418dd7 (cherry picked from commit 2056d0a0be7951602de22f8d3b4efc28dd71b6c2)	2022-02-24 08:32:32 +00:00
Can Balioglu	e1db2f13ce	Refactor TORCH_DISTRIBUTED_DEBUG implementation (#73166 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73166 This PR refactors, cleans up, and optimizes the implementation of `TORCH_DISTRIBUTED_DEBUG`. It also introduces three new user APIs: `get_debug_level()`, `set_debug_level()`, and `set_debug_level_from_env()` to retrieve and modify the debug level after a process has started. ghstack-source-id: 149778566 Test Plan: Run the existing unit tests. Reviewed By: rohan-varma Differential Revision: D34371226 fbshipit-source-id: e18443b411adcbaf39b2ec999178c198052fcd5b (cherry picked from commit 26d6bb1584b83a0490d8b766482656a5887fa21d)	2022-02-24 02:33:05 +00:00
Nikita Karetnikov	75db05c3fd	Check if the iterator is valid before dereferencing it (#72405 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72405 Fixes #71674. This shouldn't segfault now: ``` import torch d = torch.complex64 torch.set_default_dtype(d) ``` Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D34423660 Pulled By: anjali411 fbshipit-source-id: cac92a6f56846f2c0727a120b5f568aa75baa21e (cherry picked from commit eaab813a0fddced24303b3bd50e4fcdba1516e46)	2022-02-23 18:33:46 +00:00
Nikita Shulga	cfb6c942fe	`scatter_reduce` documentation (#73125 ) Summary: Reland of https://github.com/pytorch/pytorch/issues/68580 (which were milestoned for 1.11) plus partial revert of https://github.com/pytorch/pytorch/pull/72543 Pull Request resolved: https://github.com/pytorch/pytorch/pull/73125 Reviewed By: bdhirsh Differential Revision: D34355217 Pulled By: malfet fbshipit-source-id: 325ecdeaf53183d653b44ee5e6e8839ceefd9200 (cherry picked from commit `71db31748a`)	2022-02-22 19:33:46 +00:00
Gary Miguel	dbac0f5cdc	Update persons of interest for ONNX (#72072 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72072 Reviewed By: H-Huang Differential Revision: D34230534 Pulled By: malfet fbshipit-source-id: ed5abdfacf0d9628c6cc99957fa578d71a79d025 (cherry picked from commit `4669c346c4`)	2022-02-16 23:01:13 +00:00
Elias Ellison	f8a2efc190	Make fusion strategy api public (#72639 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72639 Test Plan: Imported from OSS Reviewed By: soulitzer Differential Revision: D34159123 Pulled By: eellison fbshipit-source-id: 27e4d9694a83e8d6829009882715be4308c96a9f (cherry picked from commit `1cadcd2f75`)	2022-02-16 03:45:15 +00:00
Kurt Mohler	8e7fe87630	Rename `Typed/UntypedStorage` to `_Typed/_UntypedStorage` (#72540 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72540 Reviewed By: jbschlosser Differential Revision: D34216823 Pulled By: bdhirsh fbshipit-source-id: 1bc9930ab582771ebf02308e035576cd1a0dbe47 (cherry picked from commit `329238f612`)	2022-02-15 23:53:01 +00:00
Nikita Shulga	cb00d9601c	Revert D33800694: [pytorch][PR] `scatter_reduce` documentation Test Plan: revert-hammer Differential Revision: D33800694 (`12a1df27c7`) Original commit changeset: 2e09492a29ce Original Phabricator Diff: D33800694 (`12a1df27c7`) fbshipit-source-id: 2a4775c0042551607fe3ab77f5bfe9f2e4b6b78e (cherry picked from commit `4bd6c0d2bb`)	2022-02-15 20:10:26 +00:00
rusty1s	12a1df27c7	`scatter_reduce` documentation (#68580 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/63780 (part 2) Pull Request resolved: https://github.com/pytorch/pytorch/pull/68580 Reviewed By: atalman Differential Revision: D33800694 Pulled By: malfet fbshipit-source-id: 2e09492a29cef115a7cca7c8209d1dcb6ae24eb9 (cherry picked from commit `696ff75940`)	2022-02-15 19:43:54 +00:00
Huamin Li	32dd4a8639	move fx_acc out of pytorch core (#72803 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72803 as title Reviewed By: jfix71 Differential Revision: D34101788 fbshipit-source-id: a9fd84671929af21405c049603e9895ec68de3d8 (cherry picked from commit `e98fd1c32d`)	2022-02-15 16:13:43 +00:00

1 2 3 4 5 ...

1661 Commits