Commit Graph

336 Commits

Author SHA1 Message Date
Xuehai Pan
55064a4ef9 [BE] add parentheses to kwargs unpacking func(*args, **(kwargs or {})) (#115026)
This PR adds parentheses to kwargs unpacking `func(*args, **(kwargs or {}))` for better code readability.

The forms with and without parentheses are semantically equivalent, as they produce the same bytecode.

```console
$ echo "func(*args, **kwargs or {})" | python3 -m dis -
  0           0 RESUME                   0

  1           2 PUSH_NULL
              4 LOAD_NAME                0 (func)
              6 LOAD_NAME                1 (args)
              8 BUILD_MAP                0
             10 LOAD_NAME                2 (kwargs)
             12 JUMP_IF_TRUE_OR_POP      1 (to 16)
             14 BUILD_MAP                0
        >>   16 DICT_MERGE               1
             18 CALL_FUNCTION_EX         1
             20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE

$ echo "func(*args, **(kwargs or {}))" | python3 -m dis -
  0           0 RESUME                   0

  1           2 PUSH_NULL
              4 LOAD_NAME                0 (func)
              6 LOAD_NAME                1 (args)
              8 BUILD_MAP                0
             10 LOAD_NAME                2 (kwargs)
             12 JUMP_IF_TRUE_OR_POP      1 (to 16)
             14 BUILD_MAP                0
        >>   16 DICT_MERGE               1
             18 CALL_FUNCTION_EX         1
             20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE
```
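As the identical bytecode above shows, `**kwargs or {}` already parses as `**(kwargs or {})`; the parentheses only make the precedence explicit. A minimal illustration of the pattern itself (not from the PR):

```python
def func(*args, **kwargs):
    return args, kwargs

# `kwargs or {}` substitutes an empty dict when kwargs is None (or empty),
# so the unpacking never fails on a missing mapping
kwargs = None
assert func(1, 2, **(kwargs or {})) == ((1, 2), {})

kwargs = {"x": 3}
assert func(1, **(kwargs or {})) == ((1,), {"x": 3})
```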
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115026
Approved by: https://github.com/Skylion007
2023-12-03 20:03:26 +00:00
Rohan Varma
3c78ea4c9d [DDP][Compile] Test to Ensure torch.compile works w/static_graph=True (#114621)
Resolves https://github.com/pytorch/pytorch/issues/93672. This was
actually fixed by https://github.com/pytorch/pytorch/pull/103487, but I didn't
realize at the time that that PR also fixed torch.compile.

Differential Revision: [D51596148](https://our.internmc.facebook.com/intern/diff/D51596148/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114621
Approved by: https://github.com/wconstab
2023-12-01 22:18:45 +00:00
Philip Meier
373f2060ba fix extending torch native API docs (#114863)
Couldn't think of a better `release notes:` label. Feel free to set a more fitting one
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114863
Approved by: https://github.com/mikaylagawarecki
2023-12-01 06:09:35 +00:00
Edward Z. Yang
09df6b771b Add a note about performant record_stream use. (#112526)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112526
Approved by: https://github.com/albanD
2023-11-02 15:50:22 +00:00
Kurt Mohler
fd209543d5 Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377)
Part of #109802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111377
Approved by: https://github.com/albanD, https://github.com/aaronenyeshi
2023-11-01 16:10:09 +00:00
PyTorch MergeBot
ace2713d1e Revert "Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377)"
This reverts commit f1785373c0.

Reverted https://github.com/pytorch/pytorch/pull/111377 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111377#issuecomment-1784179040))
2023-10-29 17:41:55 +00:00
Kurt Mohler
f1785373c0 Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377)
Part of #109802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111377
Approved by: https://github.com/albanD
2023-10-26 02:39:06 +00:00
Nikita Shulga
d22e5e4b52 Fix DDP notes (#111833)
To include `import os`; otherwise the sample is not syntactically correct. Reported in https://github.com/pytorch/pytorch.github.io/pull/1490

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111833
Approved by: https://github.com/wanchaol
2023-10-23 22:05:36 +00:00
eqy
894b9957c8 [DOCS][CUDA] Update TF32 docs for sm90 (#111337)
For #110252.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111337
Approved by: https://github.com/msaroufim
2023-10-19 09:36:13 +00:00
albanD
a0bbd075b2 Add the Mode section in the extending doc (#110073)
Covers the basic principles of Modes, an example of how to use them, and their behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110073
Approved by: https://github.com/janeyx99
2023-10-06 23:50:55 +00:00
Banit Agrawal
64583c4d04 [CUDA Host Allocator] Add support of CudaHostRegister (#108488)
Summary: This diff adds another option to create CUDA pinned memory, using `cudaHostRegister`.

Differential Revision: D45843715

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108488
Approved by: https://github.com/zdevito
2023-10-06 04:13:02 +00:00
Kazuaki Ishizaki
aa3629ee3e Fix typo under docs directory (#110359)
This PR fixes typos in `.rst` files under the docs directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110359
Approved by: https://github.com/kit1980
2023-10-03 16:36:05 +00:00
FFFrog
d4990ad5a1 Fix the example in the extending.func.rst (#109279)
As the title says, the `backward` function is missing the definitions of `ind` and `ind_inv`, which leads to an error when calling backward.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109279
Approved by: https://github.com/zou3519
2023-09-14 17:29:39 +00:00
Zachary DeVito
40cbda274b document memory snapshotting (#107660)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107660
Approved by: https://github.com/albanD
ghstack dependencies: #107171, #107399
2023-08-24 19:20:03 +00:00
Jane Xu
515aa993e3 Document post acc grad hooks in backward hooks execution (#107323)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107323
Approved by: https://github.com/soulitzer, https://github.com/albanD
2023-08-22 18:37:03 +00:00
David Radley
dbc2216800 Add autograd modes table to docs (#104774)
Fixes #104461

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104774
Approved by: https://github.com/soulitzer
2023-07-08 03:14:10 +00:00
Aleksei Nikiforov
c42fd73cf9 Add functions to get and set default endianness in load() functions (#101973)
By default, interpret tensor data as native-endian, but add an option to interpret it as little-endian or big-endian.

Related to #101688
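For background (not part of the PR's API), byte order decides how multi-byte values are reconstructed from raw bytes; a stdlib-only sketch of the two interpretations:

```python
import struct

raw = b"\x00\x01"  # two bytes of a 16-bit signed integer

little = struct.unpack("<h", raw)[0]  # little-endian: first byte is least significant
big = struct.unpack(">h", raw)[0]     # big-endian: first byte is most significant

assert little == 256  # 0x0100
assert big == 1       # 0x0001
```

The option added by this PR chooses which of these interpretations `load()` applies when a checkpoint's byte order does not match the native one.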

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101973
Approved by: https://github.com/mikaylagawarecki
2023-07-06 20:12:56 +00:00
Mikayla Gawarecki
981f24e806 Add docstring to torch.serialization.register_package (#104046)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104046
Approved by: https://github.com/albanD
2023-06-26 23:28:32 +00:00
ZhaoqiongZ
7cef7195f6 [draft] Update Multiprocessing best practices with CPU device (#103229)
Fixes [#102498](https://github.com/pytorch/pytorch/issues/102498)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103229
Approved by: https://github.com/mingfeima, https://github.com/svekars, https://github.com/jgong5
2023-06-25 06:26:40 +00:00
albanD
4143b6b89b Add torch_dispatch and modes to extending.rst note (#102087)
The following subjects are not in this PR and will be done in a follow up:
- Go through torch_function section and update to the latest phrasing and link to the proper new sections
- Go through torch.library and custom device docs to add links to the new sections as appropriate
- Top level explanations on which component should be used
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102087
Approved by: https://github.com/janeyx99
2023-06-22 12:56:35 +00:00
Rickey K. Liang
807d81155f [CUDA][CUBLAS] Fix BF16 reduced precision reduction note in Numerical accuracy docs (#101884)
Fixes #100966

Ref #101044

Aligns the implementation and documentation. (This is what was previously missed from the above issue and PR.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101884
Approved by: https://github.com/eqy, https://github.com/ezyang
2023-05-21 17:38:00 +00:00
Ran Ding
b5c8d0359c Update autograd.rst (#101007)
Typo fix and a small change to improve clarity.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101007
Approved by: https://github.com/lezcano, https://github.com/anjali411
2023-05-12 11:47:51 +00:00
eqy
33f3dca6b5 [CUDA][CUBLAS] Fix BF16 reduced precision reduction note in docs (#101044)
#100966

CC @ngimel @ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101044
Approved by: https://github.com/ngimel
2023-05-10 06:50:58 +00:00
eqy
6e2efd16d8 [CUDA][CUBLAS] Add cuBLAS workspace allocation behavior to docs (#100919)
Adding to the docs for now; hopefully we can move to `cudaMallocAsync`-backed cuBLAS workspaces soon, which should alleviate the recent confusion around cuBLAS "leaking" memory through workspaces.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100919
Approved by: https://github.com/ngimel
2023-05-10 06:40:26 +00:00
Richard Barnes
9c185b6b46 [codemod] Replace hasattr with getattr in caffe2/docs/source/notes/extending.rst (#100598)
Summary:
The pattern
```
X.Y if hasattr(X, "Y") else Z
```
can be replaced with
```
getattr(X, "Y", Z)
```

The [getattr](https://www.w3schools.com/python/ref_func_getattr.asp) function gives more succinct code than the [hasattr](https://www.w3schools.com/python/ref_func_hasattr.asp) function. Please use it when appropriate.
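A runnable version of the pattern (names are illustrative):

```python
class Config:
    timeout = 30

X = Config()
Z = 10

# before the codemod
before = X.timeout if hasattr(X, "timeout") else Z
# after the codemod: one lookup instead of two, same result for ordinary attributes
after = getattr(X, "timeout", Z)
assert before == after == 30

# a missing attribute falls back to the default in both forms
assert getattr(X, "retries", Z) == 10
```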

**This diff is very low risk. Green tests indicate that you can safely Accept & Ship.**

Test Plan: Sandcastle

Differential Revision: D44886464

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100598
Approved by: https://github.com/Skylion007
2023-05-04 16:36:15 +00:00
Svetlana Karslioglu
d425da8bf3 Replace master with main in links and docs/conf.py (#100176)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100176
Approved by: https://github.com/albanD, https://github.com/malfet
2023-05-02 18:20:32 +00:00
Richard Zou
6b9e22f3f6 Clarify the saving of intermediates in the "extending torch.func" docs (#98020)
Fixes https://github.com/pytorch/pytorch/issues/97260

We got some feedback that the page reads like "in order to save an input
for backward, you must return it as an output of the
autograd.Function.forward".

Doing so actually raises an error (on master and as of 2.1), but results
in an ambiguous situation on 2.0.0. To avoid more users running into
this, we clarify the documentation so it doesn't read like the above
and clearly mentions that you can save things from the inputs or
outputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98020
Approved by: https://github.com/soulitzer, https://github.com/kshitij12345
2023-03-31 13:57:37 +00:00
Kazuaki Ishizaki
50ed38a7eb Fix typo under docs directory (#97202)
This PR fixes typos in `.rst` files under the docs directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97202
Approved by: https://github.com/kit1980
2023-03-21 01:24:10 +00:00
Xuehai Pan
8d45f555d7 [BE] [1/3] Rewrite super() calls in caffe2 and benchmarks (#94587)
Rewrites Python built-in class `super()` calls. Only non-semantic changes are applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Cases where the rewrite would change semantics are kept unchanged, e.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94587
Approved by: https://github.com/ezyang
2023-02-11 18:19:48 +00:00
double7
685108b201 [docs] Fix incorrect wrapping of function (#94446)
The sample code in the documentation incorrectly wraps the function with the decorator. To fix this, update the attributes of `func` based on `torch_function`.

Fixes #94305
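The general fix looks roughly like the following torch-free sketch, where `functools.update_wrapper` copies the wrapped function's attributes onto the override (`mean` is a stand-in for a real `torch` function; this is not the exact doc text):

```python
import functools

HANDLED_FUNCTIONS = {}

def implements(torch_function):
    """Register an override and copy the wrapped function's metadata onto it."""
    def decorator(func):
        functools.update_wrapper(func, torch_function)
        HANDLED_FUNCTIONS[torch_function] = func
        return func
    return decorator

def mean(x):
    """Stand-in for torch.mean."""

@implements(mean)
def my_mean(x):
    return sum(x) / len(x)

# the override now carries the wrapped function's name and docstring
assert my_mean.__name__ == "mean"
assert my_mean([2, 4]) == 3.0
```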

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94446
Approved by: https://github.com/ezyang
2023-02-09 16:01:10 +00:00
soulitzer
77cbaedd5c [docs] Add section about tensor hooks on in-place in autograd note (#93116)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93116
Approved by: https://github.com/albanD
2023-02-01 17:35:21 +00:00
Felix Divo
219e9533f0 Improve autograd doc on complex numbers (#93065)
A tiny change to fix formatting and clarify a bit in [this section](https://pytorch.org/docs/stable/notes/autograd.html#what-are-complex-derivatives).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93065
Approved by: https://github.com/albanD
2023-01-27 09:36:38 +00:00
Richard Zou
98b78aa11c [autograd.Function] setup_context always appears on the Function (#92312)
Previously, we used the existence of setup_context to decide whether
forward should take a ctx object.

To be consistent with the other staticmethods (which always exist on the
autograd.Function), this PR changes it so that whether the user overrides
setup_context determines whether forward takes a ctx object.

Fixes https://github.com/pytorch/pytorch/issues/91451
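The override check can be sketched without torch: the staticmethod always exists on the base class, and dispatch keys off whether a subclass replaced it (illustrative names, not the real implementation):

```python
class Function:
    @staticmethod
    def setup_context(ctx, inputs, output):
        raise NotImplementedError

    @classmethod
    def takes_ctx_in_forward(cls):
        # forward receives ctx only when setup_context is NOT overridden
        return cls.setup_context is Function.setup_context

class Legacy(Function):
    pass  # no setup_context override: forward takes ctx

class Functorch(Function):
    @staticmethod
    def setup_context(ctx, inputs, output):
        pass  # overridden: forward does not take ctx

assert Legacy.takes_ctx_in_forward() is True
assert Functorch.takes_ctx_in_forward() is False
```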

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92312
Approved by: https://github.com/albanD, https://github.com/soulitzer
2023-01-18 02:55:42 +00:00
soulitzer
88366a9075 Document hooks ordering behavior in the autograd note (#91667)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91667
Approved by: https://github.com/albanD
2023-01-18 00:20:13 +00:00
Richard Zou
2f9166ef89 [autograd.Function] Cleanup asymmetry in generate_vmap_rule and vmap (#91787)
This PR:
- changes generate_vmap_rule to either be True or False. Previously it
  could be True, False, or not set. This simplifies the implementation a
  bit.
- changes the vmap staticmethod to always be on the autograd.Function
  rather than sometimes defined.
  This is how the other staticmethods (forward, backward, jvp) are
  implemented and allows us to document it.

There are 4 possible states for the autograd.Function w.r.t. to the
above:
- generate_vmap_rule is True, vmap staticmethod overridden. This raises
  an error when used with vmap.
- generate_vmap_rule is False, vmap staticmethod overridden. This is
  valid.
- generate_vmap_rule is True, vmap staticmethod not overridden. This is
  valid.
- generate_vmap_rule is False, vmap staticmethod not overridden. This
  raises an error when used with vmap.
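The four states collapse to an exclusive-or: exactly one of the two mechanisms must supply the vmap rule. A tiny sketch of that check (illustrative, not the real implementation):

```python
def vmap_config_valid(generate_vmap_rule: bool, vmap_overridden: bool) -> bool:
    # exactly one of the two mechanisms must provide the vmap rule
    return generate_vmap_rule != vmap_overridden

assert vmap_config_valid(True, True) is False    # both supplied: error
assert vmap_config_valid(False, True) is True
assert vmap_config_valid(True, False) is True
assert vmap_config_valid(False, False) is False  # neither supplied: error
```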

Future:
- setup_context needs the same treatment, but that's a bit trickier to
  implement.

Test Plan:
- new unittest
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91787
Approved by: https://github.com/soulitzer
2023-01-17 13:36:34 +00:00
Emilio Castillo
07e595e88a Add device_idx to free_fn in CUDAPluggableAllocator (#91398)
This was requested by NVIDIA folks; also track the device_id in the free function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91398
Approved by: https://github.com/albanD
2023-01-12 05:03:48 +00:00
Kazuaki Ishizaki
4f91b8e0ee Fix typo under docs directory (#91871)
This PR fixes typos in `.rst` files under the `docs` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91871
Approved by: https://github.com/ngimel
2023-01-10 22:33:36 +00:00
Will Constable
630ef6c711 Fix Dynamo+DDP documentation (#91832)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91832
Approved by: https://github.com/soumith, https://github.com/davidberard98
2023-01-09 17:35:49 +00:00
Richard Zou
264f5ed516 [autograd.Function] Add docs on the functorch interaction (#91452)
This PR:
- Updates autograd.Function.forward docs to reflect how you either
  define a forward with ctx or a separate forward and setup_context
- Updates the "Extending Autograd" docs to suggest the usage of
  autograd.Function with separate forward and setup_context. This should
  be the default because there is a low barrier to go from this to
  an autograd.Function that is fully supported by functorch transforms.
- Adds a new "Extending torch.func with autograd.Function" doc that
  explains how to use autograd.Function with torch.func. It also
  explains how to use generate_vmap_rule and how to manually write a
  vmap staticmethod.

While writing this, I noticed that the implementation of
setup_context staticmethod/generate_vmap_rule/vmap staticmethod are a
bit inconsistent with the other method/attributes on autograd.Function:
- https://github.com/pytorch/pytorch/issues/91451
- I'm happy to fix those if we think it is a problem, either in this PR
  or a followup (this PR is getting long, I want some initial docs
  out that I can point early adopters at, and fixing the problems in the
  future isn't really BC-breaking).

Test Plan:
- view docs preview
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91452
Approved by: https://github.com/soulitzer
2023-01-04 00:28:19 +00:00
bowen0701
e803d336eb Fix missing indentation in serialization.rst (#91253)
In serialization.rst, fix class ControlFlowModule's forward(), which was missing indentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91253
Approved by: https://github.com/kit1980
2022-12-21 20:14:44 +00:00
Eddie Yan
8b617f813d [cuBLAS] Add an option to disable reduced precision reductions for BF16 GEMM (#89172)
Essentially the same change as #67946, except that the default is to disallow reduced precision reductions in `BFloat16` GEMMs (for now). If performance is severely regressed, we can change the default, but this option appears to be necessary to pass some `addmm` `BFloat16` tests on H100.
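A torch-free sketch of why the option matters: bfloat16 keeps only 8 mantissa bits, so a reduction whose accumulator stays in bfloat16 silently drops small addends (simulated here by truncating float32 bits; real hardware rounds to nearest, but the effect is the same):

```python
import struct

def to_bf16(x: float) -> float:
    """Truncate a float to bfloat16 precision by zeroing the low 16 bits
    of its float32 representation (round-toward-zero simplification)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (trunc,) = struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))
    return trunc

vals = [1.0] + [1e-3] * 1000

# reduced-precision reduction: the running sum itself stays in bfloat16
acc_bf16 = 0.0
for v in vals:
    acc_bf16 = to_bf16(acc_bf16 + to_bf16(v))

# full-precision reduction: bfloat16 inputs, but a wide accumulator
acc_fp32 = sum(to_bf16(v) for v in vals)

assert abs(acc_fp32 - 2.0) < 0.01  # close to the true sum of ~2.0
assert acc_bf16 == 1.0             # every 1e-3 addend was rounded away
```

Disallowing reduced-precision reductions corresponds to the `acc_fp32` path: the products may be bfloat16, but the accumulation happens in full precision.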

CC @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89172
Approved by: https://github.com/ngimel
2022-12-21 18:58:28 +00:00
Arek Sredzki
44dac51c36 Improve Autograd Documentation Clarity (#89401)
This makes minor adjustments to the autograd docs, improving clarity and resolving grammatical errors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89401
Approved by: https://github.com/kit1980
2022-12-06 06:45:04 +00:00
Will Constable
447283752c Update DDP docs for Dynamo/DDPOptimizer (#89096)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89096
Approved by: https://github.com/msaroufim
2022-11-30 05:50:12 +00:00
eqy
8321066031 Tweak formatting of note on macros (#89598)
For readability when viewing the rendered file e.g., from the browser.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89598
Approved by: https://github.com/kit1980
2022-11-28 20:42:30 +00:00
Emilio Castillo
c9d4390d13 Add Pluggable CUDA allocator backend (#86786)
Fixes #43144

This uses the Backend system added by [82682](https://github.com/pytorch/pytorch/pull/82682) to change allocators dynamically during code execution. This will allow us to use RMM, use CUDA managed memory for portions of the code that do not fit in GPU memory, write static memory allocators to reduce fragmentation while training models, and improve interoperability with external DL compilers/libraries.

For example, we could have the following allocator in c++

```c++
#include <sys/types.h>
#include <cuda_runtime_api.h>
#include <iostream>

extern "C" {
void* my_malloc(ssize_t size, int device, cudaStream_t stream) {
   void* ptr;
   std::cout << "alloc " << size << std::endl;
   cudaMalloc(&ptr, size);
   return ptr;
}

void my_free(void* ptr) {
   std::cout << "free " << std::endl;
   cudaFree(ptr);
}
}
```

Compile it as a shared library
```
nvcc allocator.cc -o alloc.so -shared --compiler-options '-fPIC'
```

And use it from PyTorch as follows

```python
import torch

# Init caching
# b = torch.zeros(10, device='cuda')
new_alloc = torch.cuda.memory.CUDAPluggableAllocator('alloc.so', 'my_malloc', 'my_free')
old = torch.cuda.memory.get_current_allocator()
torch.cuda.memory.change_current_allocator(new_alloc)
b = torch.zeros(10, device='cuda')
# This will error since the current allocator was already instantiated
torch.cuda.memory.change_current_allocator(old)
```

Things to discuss
- How to test this, needs compiling external code ...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86786
Approved by: https://github.com/albanD
2022-11-23 17:54:36 +00:00
lezcano
d453b3c4d4 Add a note on the stability of linalg functions. (#88313)
This was long due, as it keeps coming up in issues.

Fixes https://github.com/pytorch/pytorch/issues/85950
Fixes https://github.com/pytorch/pytorch/issues/59720
Fixes https://github.com/pytorch/pytorch/issues/59782

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88313
Approved by: https://github.com/soumith, https://github.com/mruberry
2022-11-07 22:44:23 +00:00
Codrin Popa
5b767d404e Modified roundup_power2_divisions to specify the number of divisions for each power of two interval (#87290)
Summary:
Improved the roundup_power2_divisions knob so it allows better control of rounding in the PyTorch CUDA Caching Allocator.

This new version allows setting the number of divisions per power-of-two interval, starting from 1MB and ending at 64GB and above. An example use case is when rounding is desirable for small allocations, but there are also very large allocations which are persistent and thus would not benefit from rounding, instead taking up extra space.
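An illustrative model of the rounding scheme (not the allocator's actual code): each power-of-two interval `[2^n, 2^(n+1))` is split into a number of equal bins, and a request is rounded up to the next bin boundary.

```python
def round_up(size: int, divisions: int) -> int:
    """Round `size` up to the next bin boundary when its enclosing
    power-of-two interval is split into `divisions` equal bins.
    Illustrative model only; assumes divisions <= the interval base."""
    if size <= 1:
        return size
    low = 1 << (size.bit_length() - 1)  # largest power of two <= size
    if size == low:
        return size                     # already on an interval boundary
    step = low // divisions             # bin width inside [low, 2*low)
    return low + -(-(size - low) // step) * step  # ceil to next bin

# with 4 divisions, the interval [1024, 2048) has bins of width 256
assert round_up(1025, 4) == 1280
assert round_up(1300, 4) == 1536
# with 1 division, everything rounds up to the next power of two
assert round_up(1025, 1) == 2048
```

More divisions mean tighter rounding (less internal waste) at the cost of more distinct block sizes for the caching allocator to manage.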

Test Plan: Tested locally

Differential Revision: D40103909

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87290
Approved by: https://github.com/zdevito
2022-11-04 19:31:16 +00:00
Pruthvi Madugundu
fbd08fb358 Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)
- Asserts for CUDA are enabled by default
- Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON`
- Can be enabled for ROCm by setting the above variable to `OFF` during build, or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON`

This is follow up changes as per comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-11-04 04:43:05 +00:00
PyTorch MergeBot
0fa23663cc Revert "Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)"
This reverts commit 1e2c4a6e0e.

Reverted https://github.com/pytorch/pytorch/pull/84190 on behalf of https://github.com/malfet due to Needs internal changes, has to be landed via co-dev
2022-11-02 18:13:37 +00:00
Pruthvi Madugundu
1e2c4a6e0e Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)
- Asserts for CUDA are enabled by default
- Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON`
- Can be enabled for ROCm by setting the above variable to `OFF` during build, or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON`

This is follow up changes as per comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-11-02 17:41:57 +00:00