Fixes #83973 (This is a substitute PR for https://github.com/pytorch/pytorch/pull/85024)
First of all, thanks for your invaluable contributions to PyTorch everyone!
Given how extensively `torch.cuda.is_available` is used in the PyTorch ecosystem, IMHO it's worthwhile to provide downstream libraries/frameworks/users the ability to alter the default behavior of `torch.cuda.is_available` in the context of their PyTorch usage.
I'm confident there are many current and future use cases that could benefit from leveraging a weakened, NVML-based `torch.cuda.is_available` assessment at a downstream framework's explicit direction (thanks @malfet 81da50a972!). Though one could always patch out the `torch.cuda.is_available` function with another implementation in a downstream library, I think this environment-variable-based configuration option is more convenient, and the cost of including the option is quite low.
As discussed in https://github.com/pytorch/pytorch/pull/85024#issuecomment-1261542045, this PR gates the new non-default NVML-based CUDA behavior behind an environment variable (`PYTORCH_NVML_BASED_CUDA_CHK`) that allows a user/framework to invoke non-default, NVML-based `is_available()` assessments if desired.
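A minimal usage sketch, assuming the flag is read from the environment before the first `is_available()` call (the accepted truthy value shown here is an assumption):
```
import os

# Opt in to the NVML-based assessment before touching torch.cuda.
os.environ["PYTORCH_NVML_BASED_CUDA_CHK"] = "1"

import torch

# With the flag set, is_available() can use the weaker NVML probe, which
# avoids initializing a CUDA context in the current process.
print(torch.cuda.is_available())
```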
Thanks again for your work everyone!
@ngimel @malfet @awaelchli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85951
Approved by: https://github.com/ngimel
`Sparsity` as a term doesn't reflect the tools that are developed by the AO team. `torch/ao/sparsity` also has utilities for structured pruning, which internally we always referred to as just "pruning". To avoid any confusion, we renamed `Sparsity` to `Prune`. We will not be introducing backwards compatibility, as so far this toolset was kept under silent development.
These changes will be reflected in the documentation as well.
**TODO:**
- [ ] Change the tutorials
- [ ] Confirm no bc-breakages
- [ ] Reflect the changes in the trackers and RFC docs
Fixes #ISSUE_NUMBER
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84867
Approved by: https://github.com/supriyar
This achieves the same things as https://github.com/pytorch/pytorch/pull/85908 but uses backends instead of kwargs (which, unfortunately, break torchscript). This also means we let go of numpy compatibility, BUT the win here is that users can control which opt_einsum strategy they want to use!
The backend allows for... well, you should just read the docs:
```
.. attribute:: torch.backends.opteinsum.enabled

    A :class:`bool` that controls whether opt_einsum is enabled (on by default). If so,
    torch.einsum will use opt_einsum (https://optimized-einsum.readthedocs.io/en/stable/path_finding.html)
    to calculate an optimal path of contraction for faster performance.

.. attribute:: torch.backends.opteinsum.strategy

    A :class:`str` that specifies which strategies to try when `torch.backends.opteinsum.enabled` is True.
    By default, torch.einsum will try the "auto" strategy, but the "greedy" and "optimal" strategies are
    also supported. Note that the "optimal" strategy is factorial on the number of inputs as it tries all
    possible paths. See more details in opt_einsum's docs
    (https://optimized-einsum.readthedocs.io/en/stable/path_finding.html).
```
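A rough usage sketch based on the attributes above (attribute names are taken from the docs block; treat the exact module path as provisional):
```
import torch

# Enable opt_einsum-based path optimization and pick a strategy.
torch.backends.opteinsum.enabled = True
torch.backends.opteinsum.strategy = "greedy"  # "auto" (default), "greedy", or "optimal"

a = torch.randn(8, 16)
b = torch.randn(16, 32)
c = torch.randn(32, 8)

# einsum consults the configured strategy when computing the contraction path.
out = torch.einsum("ij,jk,kl->il", a, b, c)
```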
In trying (and failing) to land 85908, I discovered that jit script does NOT actually pull from python's version of einsum (because it cannot support variadic args or kwargs). Thus I learned that jitted einsum does not subscribe to the new opt_einsum path calculation. Overall, this is fine since jit script is getting deprecated, but where is the best place to document this?
## Test plan:
- added tests to CI
- locally tested that trying to set the strategy to something invalid will error properly
- locally tested that tests will pass even if you don't have opt-einsum
- locally tested that setting the strategy when opt-einsum is not there will also error properly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86219
Approved by: https://github.com/soulitzer, https://github.com/malfet
# Summary
- This code creates the runtime dispatch system for choosing a performant fused SDP kernel. The only fused kernel currently available is flash_attention. It also creates python flags and a context manager that can be used to turn dispatch behavior on and off (see the sketch after this list).
- This also adds support for flash_attention with dense tensors.
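A hedged sketch of how the flags and context manager might be used; the exact entry points here (`torch.backends.cuda.sdp_kernel`, `scaled_dot_product_attention`) are assumptions rather than names confirmed by this PR:
```
import torch
import torch.nn.functional as F

# Half-precision CUDA tensors in (batch, heads, seq_len, head_dim) layout,
# the shape the flash_attention kernel expects.
q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)

# Assumed context manager: restrict runtime dispatch to the flash kernel.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False):
    out = F.scaled_dot_product_attention(q, k, v)
```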
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85984
Approved by: https://github.com/cpuhrsch
Summary:
Added an additional roundup knob (``roundup_bypass_threshold_mb``) to bypass rounding of the requested allocation size for allocation requests larger than the threshold value (in MB). This can help reduce the memory footprint when making large allocations that are expected to be persistent or have a long lifetime.
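A sketch of how the knob might be set, assuming it is exposed through ``PYTORCH_CUDA_ALLOC_CONF`` like the other caching-allocator roundup options (the exact configuration surface is an assumption here):
```
import os

# Assumption: the threshold is configured via PYTORCH_CUDA_ALLOC_CONF.
# With this setting, requests larger than 256 MB bypass size rounding.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "roundup_bypass_threshold_mb:256"

import torch

# A large, long-lived buffer: with the knob set, its requested size is not
# rounded up, reducing the reserved-memory footprint.
cache = torch.empty(512 * 1024 * 1024, dtype=torch.uint8, device="cuda")
```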
Differential Revision: D39868104
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85940
Approved by: https://github.com/zdevito
### Deprecation reasons:
- For most users, training is on one GPU per process, so these APIs are rarely used
- They added one more API dimension
- They can be expressed in a composed manner
- They are not abstracted; they are specific to GPUs
- They caused backend APIs and implementations to have nested `std::vector<std::vector<Tensor>>`, which is hard to read and maintain
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85961
Approved by: https://github.com/XilunWu, https://github.com/H-Huang
As per a request from the Vision team, this adds a `collate` function with an extra argument, `collate_fn_map`, to dispatch custom collate functions for non-collection objects and specific types.
If the type of a batch element is not present in `collate_fn_map`, the function will go through all keys in insertion order to check whether the type is a subclass of a key. If so, it will invoke the corresponding collate function.
And `default_collate` will utilize the `collate` function with a few default collate functions for `int`, `float`, `str`, and NumPy objects.
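A small sketch of the intended extension point; the import path and the `BoundingBox`/`collate_bbox_fn` names are illustrative assumptions, not part of this PR:
```
import torch
from torch.utils.data._utils.collate import collate, default_collate_fn_map


class BoundingBox:
    """Hypothetical domain-specific type a vision library might want to batch."""
    def __init__(self, coords):
        self.coords = coords


def collate_bbox_fn(batch, *, collate_fn_map=None):
    # Stack the per-sample coordinates into a single (N, 4) tensor.
    return torch.stack([torch.as_tensor(b.coords) for b in batch])


# Extend the default dispatch table with the custom type.
bbox_collate_fn_map = {**default_collate_fn_map, BoundingBox: collate_bbox_fn}

batch = [BoundingBox([0, 0, 2, 2]), BoundingBox([1, 1, 3, 3])]
out = collate(batch, collate_fn_map=bbox_collate_fn_map)  # tensor of shape (2, 4)
```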
Benefits:
- Domain teams can register their own collate functions to handle their specific types of objects
- It is easier for users to extend the `collate` function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85748
Approved by: https://github.com/NivekT, https://github.com/pmeier
Add unit tests and docstrings corresponding to PR https://github.com/pytorch/pytorch/pull/63289
UT:
1. `test_profiler_emit_itt` in `test/test_autograd.py`. This test is merely intended to catch if emit_itt breaks on construction.
2. Test `torch.profiler.itt` functions in `test/test_itt.py`
3. Only testing that emit_itt runs when `record_shapes` option is enabled in `test/test_profiler.py`.
Docstring:
1. Add ITT-related info into `docs/source/bottleneck.rst`
2. Add `torch.profiler.itt` functions to `docs/source/profiler.rst`
3. Add docstrings to `torch.profiler.itt` functions in `torch/profiler/itt.py`
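A hedged sketch of the APIs the tests above exercise, assuming `torch.profiler.itt` exposes `range_push`/`range_pop` helpers analogous to the existing `nvtx` ones:
```
import torch

x = torch.randn(64, 64)

# Emit ITT task annotations for autograd ops; record_shapes mirrors the
# option exercised in test/test_profiler.py.
with torch.autograd.profiler.emit_itt(record_shapes=True):
    torch.profiler.itt.range_push("matmul")
    y = x @ x
    torch.profiler.itt.range_pop()
```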
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84848
Approved by: https://github.com/malfet
### Description
- This PR renames `_all_gather_base` to `all_gather_into_tensor` so that it is clearer in meaning.
- The `all_gather_into_tensor` API differs from the `all_gather` API in the output it accepts -- a single, large tensor instead of a list of tensors.
- This PR also adds a deprecation warning to `_all_gather_base`.
### Issue
`_all_gather_base` was implemented in https://github.com/pytorch/pytorch/pull/33924 to avoid unnecessary flattening. There was a previous effort (#82639) to merge `_all_gather_base` with the existing `all_gather` API by detecting the parameter type passed in for the output.
There are, however, two "blockers" that make the merge difficult:
(i) The merge leads to a backward-compatibility break: we would need to change the parameter name `tensor_list` in `all_gather` to a general name, `output`, that can cover both a tensor and a tensor list.
(ii) Recently, the `all_gather` API added uneven tensor support, utilizing the tensor boundaries implied by the list. We are, however, not sure whether to add such support to `_all_gather_base`, because that would require users to pass in additional tensor-boundary information.
In view of the above, we decided to productize `_all_gather_base` as a separate function, but with a clearer name.
### Testing
Added tests:
- `test_all_gather_into_cat_tensor_cuda` -- output form as with `torch.cat`. For example:
```
>>> tensor_in
tensor([1, 2], device='cuda:0') # Rank 0
tensor([3, 4], device='cuda:1') # Rank 1
>>> tensor_out
tensor([1, 2, 3, 4], device='cuda:0') # Rank 0
tensor([1, 2, 3, 4], device='cuda:1') # Rank 1
```
- `test_all_gather_into_stack_tensor_cuda` -- output form as with `torch.stack`. For example:
```
>>> tensor_out2
tensor([[1, 2],
        [3, 4]], device='cuda:0') # Rank 0
tensor([[1, 2],
        [3, 4]], device='cuda:1') # Rank 1
```
The output form is determined by the shape of the output tensor passed by the user; no flag is used.
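For reference, a per-rank sketch that would produce the outputs above, assuming a 2-rank process group has already been initialized (e.g. via `torchrun`):
```
import torch
import torch.distributed as dist

rank = dist.get_rank()
world_size = dist.get_world_size()  # 2 in the examples above
tensor_in = torch.tensor([1, 2], device=f"cuda:{rank}") + 2 * rank

# Concatenated form (like torch.cat): 1-D output of size world_size * len(tensor_in).
tensor_out = torch.empty(world_size * 2, dtype=tensor_in.dtype, device=tensor_in.device)
dist.all_gather_into_tensor(tensor_out, tensor_in)

# Stacked form (like torch.stack): extra leading dimension of size world_size.
tensor_out2 = torch.empty(world_size, 2, dtype=tensor_in.dtype, device=tensor_in.device)
dist.all_gather_into_tensor(tensor_out2, tensor_in)
```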
Cc @rohan-varma @mrshenli @crcrpar @ptrblck @H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85686
Approved by: https://github.com/rohan-varma, https://github.com/crcrpar
Small rework of how the error message is formatted; it introduces a distinction between the arguments and the output of kernels. Verified manually on multiple examples that the message is printed as expected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85008
Approved by: https://github.com/lw
As per the title. Fixes: #81161
- [x] add ErrorInputs
- ~[ ] dtype argument?~
- ~[ ] casting argument?~
As discussed offline with @kshitij12345, we can currently ignore `dtype` and `casting` arguments.
cc: @kshitij12345!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82946
Approved by: https://github.com/mruberry
Fixes #83363
This is not a full update yet, but it fixes some obvious things: it adds missing modules (torchrec, sparse) and brings in a few people from merge_rules.json who are working on the respective modules. There are still discrepancies (e.g., Intel CPU work is split into many categories in merge_rules), but it's better to improve things incrementally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84772
Approved by: https://github.com/b0noI, https://github.com/malfet
Summary:
Some more clarifications for the arguments, including linking to object docs (QConfigMapping, BackendConfig) and adding types
in the doc
Test Plan:
```
cd docs
make html
```
and visual inspection of the generated docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84587
Approved by: https://github.com/vkuzo