Commit Graph

1263 Commits

Author SHA1 Message Date
Sergii Dymchenko
f51f6aa387 Fix non-existing parameters in docstrings (#90505)
Continuation after https://github.com/pytorch/pytorch/pull/90163.

Here is a script I used to find all the non-existing arguments in the docstrings (the script can give false positives in the presence of `*args`/`**kwargs` or decorators):

_Edit:_
I've realized that the indentation was wrong for the last `break` in the script as originally posted, so it only produced output for a function if the first docstring argument was wrong; the version below carries the corrected indentation. I'll create a separate PR if I find more issues with the corrected script.

``` python
import ast
import os

import docstring_parser  # third-party: `pip install docstring-parser`

for root, dirs, files in os.walk('.'):
    for name in files:
        if root.startswith("./.git/") or root.startswith("./third_party/"):
            continue
        if name.endswith(".py"):
            full_name = os.path.join(root, name)
            with open(full_name, "r") as source:
                tree = ast.parse(source.read())
                for node in ast.walk(tree):
                    if isinstance(node, ast.FunctionDef):
                        # Copy, so the AST node's own list is not mutated below.
                        all_node_args = list(node.args.args)
                        if node.args.vararg is not None:
                            all_node_args.append(node.args.vararg)
                        if node.args.kwarg is not None:
                            all_node_args.append(node.args.kwarg)
                        # posonlyargs/kwonlyargs are always lists (possibly empty)
                        all_node_args.extend(node.args.posonlyargs)
                        all_node_args.extend(node.args.kwonlyargs)
                        args = [a.arg for a in all_node_args]
                        raw_docstring = ast.get_docstring(node)
                        if raw_docstring is None:
                            continue
                        docstring = docstring_parser.parse(raw_docstring)
                        doc_args = [a.arg_name for a in docstring.params]
                        # Keep only the identifier part of each documented name.
                        clean_doc_args = []
                        for a in doc_args:
                            if not a.split():
                                continue
                            clean_a = ""
                            for c in a.split()[0]:
                                if c.isalnum() or c == '_':
                                    clean_a += c
                            if clean_a:
                                clean_doc_args.append(clean_a)
                        doc_args = clean_doc_args
                        for a in doc_args:
                            if a not in args:
                                print(full_name, node.lineno, args, doc_args)
                                break  # one report per function is enough
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90505
Approved by: https://github.com/malfet, https://github.com/ZainRizvi
2022-12-09 21:43:09 +00:00
Frederik Gerzer
09ccda0d94 Fix: Make __len__ of datapipes dynamic (#88302)
Fixes #88074

Several datapipes cache their length the first time it is computed. However, source datapipes might change in length (most prominently, whenever `apply_sharding` is called). The behaviour is counter-intuitive because we do not expect `__len__` to have side effects.

This PR makes `__len__` dynamically computed (a sketch of the pattern follows the list of changes below).

Changes:
- Add note to the `datapipes` README that `__len__` should be dynamic and why.
- Remove caching of length computations in `ConcaterIterDataPipe`, `MultiplexerIterDataPipe`, `ZipperIterDataPipe`, `BatcherIterDataPipe`, `ConcaterMapDataPipe`, and `BatcherMapDataPipe`.
- This required removal of the `length` attribute in setstate/getstate of `MultiplexerIterDataPipe`. I am unsure whether to remove this completely and risk breaking saved checkpoints (as I did) or whether to just ignore the `length` of the loaded `state`.
- This also means the classes above no longer have a `length` attribute. I have found no uses of this, though.
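
A minimal sketch of the before/after pattern, using a toy pipe rather than the actual implementation:

```python
from torch.utils.data import IterDataPipe

class ToyConcater(IterDataPipe):
    """Toy concat pipe illustrating the pattern; not the real ConcaterIterDataPipe."""
    def __init__(self, *datapipes):
        self.datapipes = datapipes

    def __iter__(self):
        for dp in self.datapipes:
            yield from dp

    # Before: the sum was computed once, stored in `self.length`, and returned
    # from then on, so later changes to the sources (e.g. after `apply_sharding`)
    # went unnoticed. After: recompute on every call, keeping len() a pure query.
    def __len__(self):
        return sum(len(dp) for dp in self.datapipes)
```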

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88302
Approved by: https://github.com/NivekT
2022-12-09 19:15:53 +00:00
Yuxin Wu
c00b135adf Remove deprecated call to tf.io.gfile.get_filesystem (#89832)
Fixes #30966. Fixes #47139.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89832
Approved by: https://github.com/soumith
2022-12-08 08:53:27 +00:00
Yuxin Wu
ecd784667c Avoid overflow in tensorboard image summary (#90423)
Fix #90419

Added some code such that the test will update the expect files when `expecttest.ACCEPT` is True.
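
For context, a minimal sketch of the kind of overflow being guarded against, assuming the summary converts float images to `uint8` (names are illustrative, not the actual code):

```python
import numpy as np

img = np.float32([[0.5, 1.2], [0.0, 0.9]])  # values can stray outside [0, 1]

# Naive conversion: 1.2 * 255 = 306, which wraps around to 50 as uint8.
bad = (img * 255).astype(np.uint8)

# Clipping before the cast avoids the wraparound artifact.
good = (img * 255).clip(0, 255).astype(np.uint8)
```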
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90423
Approved by: https://github.com/soumith
2022-12-08 08:31:52 +00:00
Nikita Shulga
36ac095ff8 Migrate PyTorch to C++17 (#85969)
With CUDA-10.2 gone we can finally do it!

This PR mostly contains build system related changes, invasive functional ones are to be followed.
Among the many expected tweaks to the build system, here are a few unexpected ones:
 - Force the onnx_proto project to C++17 to avoid `duplicate symbols` errors when compiled by gcc-7.5.0, as the storage rule for `constexpr` changed in C++17, but gcc does not seem to follow it
 - Do not use `std::apply` on CUDA but rely on the built-in variant, as it results in test failures when the CUDA runtime picks the host rather than the device function when `std::apply` is invoked from CUDA code.
 - `std::decay_t` -> `::std::decay_t` and `std::move` -> `::std::move`, as VC++ for some reason claims that the `std` symbol is ambiguous
 - Disable use of `std::aligned_alloc` on Android, as its `libc++` does not implement it.

Some prerequisites:
 - https://github.com/pytorch/pytorch/pull/89297
 - https://github.com/pytorch/pytorch/pull/89605
 - https://github.com/pytorch/pytorch/pull/90228
 - https://github.com/pytorch/pytorch/pull/90389
 - https://github.com/pytorch/pytorch/pull/90379
 - https://github.com/pytorch/pytorch/pull/89570
 - https://github.com/facebookincubator/gloo/pull/336
 - https://github.com/facebookincubator/gloo/pull/343
 - 919676fb32

Fixes https://github.com/pytorch/pytorch/issues/56055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85969
Approved by: https://github.com/ezyang, https://github.com/kulinseth
2022-12-08 02:27:48 +00:00
Richard Barnes
ad188a227e Introduce CUDA Device Assertions Infrastructure (#84609)
Summary:
This diff introduces a set of changes that makes it possible for the host to get assertions from CUDA devices. This includes the introduction of

**`CUDA_KERNEL_ASSERT2`**

A preprocessor macro to be used within a CUDA kernel that, upon an assertion failure, writes the assertion message, file, line number, and possibly other information to UVM (Managed memory). Once this is done, the original assertion is triggered, which places the GPU in a Bad State requiring recovery. In my tests, data written to UVM appears there before the GPU reaches the Bad State and is still accessible from the host after the GPU is in this state.

Messages are written to a multi-message buffer which can, in theory, hold many assertion failures. I've done this as a precaution in case there are several, but I don't actually know whether that is possible and a simpler design which holds only a single message may well be all that is necessary.

**`TORCH_DSA_KERNEL_ARGS`**

This preprocessor macro is added as an _argument_ to a kernel function's signature. It expands to supply the standardized names of all the arguments needed by `C10_CUDA_COMMUNICATING_KERNEL_ASSERTION` to handle device-side assertions. This includes, e.g., the name of the pointer to the UVM memory the assertion would be written to. This macro abstracts the arguments so there is a single point of change if the system needs to be modified.

**`c10::cuda::get_global_cuda_kernel_launch_registry()`**

This host-side function returns a singleton object that manages the host's part of the device-side assertions. Upon allocation, the singleton allocates sufficient UVM (Managed) memory to hold information about several device-side assertion failures. The singleton also provides methods for getting the current traceback (used to identify when a kernel was launched). To avoid consuming all the host's memory the singleton stores launches in a circular buffer; a unique "generation number" is used to ensure that kernel launch failures map to their actual launch points (in the case that the circular buffer wraps before the failure is detected).

**`TORCH_DSA_KERNEL_LAUNCH`**

This host-side preprocessor macro replaces the standard
```
kernel_name<<<blocks, threads, shmem, stream>>>(args)
```
invocation with
```
TORCH_DSA_KERNEL_LAUNCH(blocks, threads, shmem, stream, args);
```
Internally, it fetches the UVM (Managed) pointer and generation number from the singleton and appends these to the standard argument list. It also checks that the kernel launches correctly. This abstraction on kernel launches can be modified to provide additional safety/logging.

**`c10::cuda::c10_retrieve_device_side_assertion_info`**
This host-side function checks, when called, that no kernel assertions have occurred. If one has, it raises an exception with:
1. Information (file, line number) about which kernel was launched.
2. Information (file, line number, message) about the device-side assertion.
3. Information (file, line number) about where the failure was detected.

**Checking for device-side assertions**

Device-side assertions are most likely to be noticed by the host when a CUDA API call such as `cudaDeviceSynchronize` is made and fails with a `cudaError_t` indicating
> CUDA error: device-side assert triggered

Therefore, we rewrite `C10_CUDA_CHECK()` to include a call to `c10_retrieve_device_side_assertion_info()`. To make the code cleaner, most of the logic of `C10_CUDA_CHECK()` is now contained within a new function `c10_cuda_check_implementation()` to which `C10_CUDA_CHECK` passes the preprocessor information about filenames, function names, and line numbers. (In C++20 we can use `std::source_location` to eliminate macros entirely!)
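
For illustration, a minimal Python-side sketch of how a device-side assertion surfaces to the host (requires a CUDA device; note that a triggered assert poisons the CUDA context for the rest of the process):

```python
import torch

x = torch.zeros(4, device="cuda")
idx = torch.tensor([10], device="cuda")  # out of bounds: fires a device-side assert

try:
    y = x[idx]
    # The failure may only surface at the next synchronizing call.
    torch.cuda.synchronize()
except RuntimeError as e:
    # With this infrastructure, the message can include which kernel was launched
    # and where the assertion fired, not just "device-side assert triggered".
    print(e)
```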

# Notes on special cases

* Multiple assertions from the same block are recorded
* Multiple assertions from different blocks are recorded
* Launching kernels from many threads on many streams seems to be handled correctly
* If two processes are using the same GPU and one of them fails with a device-side assertion, the other process continues without issue
* X Multiple assertions from separate kernels on different streams seem to be recorded, but we can't reproduce the test condition
* X Multiple assertions from separate devices should all be shown upon exit, but we've been unable to generate a test that produces this condition

Differential Revision: D37621532

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84609
Approved by: https://github.com/ezyang, https://github.com/malfet
2022-12-08 01:26:07 +00:00
Ram Rachum
351d73b97f Fix exception causes all over the codebase (#90271)
This is the continuation of #90134 and hopefully the final PR in this series.
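
The pattern fixed throughout: re-raising inside an `except` block without `from`, which hides the original cause. A minimal before/after (function and message are illustrative):

```python
def parse_port(text):
    try:
        return int(text)
    except ValueError as err:
        # Before: `raise RuntimeError(...)` alone, which prints the misleading
        # "During handling of the above exception, another exception occurred".
        # After: chain explicitly so the traceback shows the real origin.
        raise RuntimeError(f"invalid port: {text!r}") from err
```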

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90271
Approved by: https://github.com/kit1980
2022-12-07 04:29:00 +00:00
erjia
b1eb42bcfd [4/4][DataPipe] Remove iterator depletion in Zipper (#89974)
Fixes: https://github.com/pytorch/data/issues/865

I will add another PR in torchdata to validate that this change solves the infinite datapipe problem (I have tested it locally). This is one of the more annoying stacks of PRs, caused by the separation between TorchData and PyTorch.

There is a case where `file.close` is never called because the generator function never reaches its end. A simple example is `zip`-ping two datapipes of different lengths: the longer DataPipe never reaches the end of its generator, which is eventually cleaned up by `gc`, so the `file.close` line is never executed. (This is why Vitaly had to create this [hack](4451eb24e6/torch/utils/data/datapipes/iter/combining.py (L573-L583)) that drains all remaining data to make sure the generator function runs to completion.)

However, this hack introduces another problem: with an infinite datapipe, `zip` never ends, since it tries to deplete the infinite iterator. See: https://github.com/pytorch/data/issues/865

So, in this PR, I am adding a `try-finally` clause to make sure `file.close` is always executed during the destruction of the generator object. With that, the hack within `zip` is no longer needed. A sketch of the fix pattern follows.
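
A self-contained sketch of the failure mode and the `try-finally` fix (toy code, not the datapipe source):

```python
with open("data.txt", "w") as f:
    f.write("a\nb\nc\n")

def read_lines(path):
    file = open(path)
    try:
        yield from file
    finally:
        # Runs even when the generator is destroyed before exhaustion:
        # Python throws GeneratorExit into the frame and `finally` executes.
        file.close()

# zip() abandons the longer iterator midway; without the try/finally the
# file would stay open until gc happened to finalize the generator.
for _ in zip(read_lines("data.txt"), range(2)):
    pass
```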

Differential Revision: [D41699469](https://our.internmc.facebook.com/intern/diff/D41699469)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89974
Approved by: https://github.com/NivekT, https://github.com/wenleix
2022-12-05 16:45:34 +00:00
erjia
bda6ff0990 [1/4][DataPipe] Properly cleanup unclosed files within generator function (#89973)
There is a case where `file.close` is never called because the generator function never reaches its end. A simple example is `zip`-ping two datapipes of different lengths: the longer DataPipe never reaches the end of its generator, which is eventually cleaned up by `gc`, so the `file.close` line is never executed. (This is why Vitaly had to create this [hack](4451eb24e6/torch/utils/data/datapipes/iter/combining.py (L573-L583)) that drains all remaining data to make sure the generator function runs to completion.)

However, this hack introduces another problem: with an infinite datapipe, `zip` never ends, since it tries to deplete the infinite iterator. See: https://github.com/pytorch/data/issues/865

So, in this PR, I am adding a `try-finally` clause to make sure `file.close` is always executed during the destruction of the generator object. With that, the hack within `zip` is no longer needed.

Differential Revision: [D41699470](https://our.internmc.facebook.com/intern/diff/D41699470)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89973
Approved by: https://github.com/NivekT
2022-12-04 04:04:46 +00:00
Dmitry Tomshin
11db12bd94 Issue 68576 prefetch factor docstring changes (#89874)
Fixes #68576

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89874
Approved by: https://github.com/kit1980
2022-11-30 23:42:56 +00:00
Erjia Guan
991028cd9f Deprecating DataPipes (#89794)
Summary: per title

Test Plan:
`buck2 test //caffe2/test:datapipe` https://www.internalfb.com/intern/testinfra/testconsole/testrun/6473924589747074/
`buck2 test mode/opt //pytorch/data/test:tests`

Differential Revision: D41563765

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89794
Approved by: https://github.com/wenleix, https://github.com/NivekT
2022-11-29 19:21:53 +00:00
Emilio Castillo
c9d4390d13 Add Pluggable CUDA allocator backend (#86786)
Fixes #43144

This uses the Backend system added by [#82682](https://github.com/pytorch/pytorch/pull/82682) to change allocators dynamically during code execution. This will allow us to use RMM, to use CUDA managed memory for portions of the code that do not fit in GPU memory, to write static memory allocators that reduce fragmentation while training models, and to improve interoperability with external DL compilers/libraries.

For example, we could have the following allocator in c++

```c++
#include <sys/types.h>
#include <cuda_runtime_api.h>
#include <iostream>

extern "C" {
void* my_malloc(ssize_t size, int device, cudaStream_t stream) {
   void* ptr;
   std::cout << "alloc " << size << std::endl;
   cudaMalloc(&ptr, size);
   return ptr;
}

void my_free(void* ptr) {
   std::cout << "free" << std::endl;
   cudaFree(ptr);
}
}
```

Compile it as a shared library
```
nvcc allocator.cc -o alloc.so -shared --compiler-options '-fPIC'
```

And use it from PyTorch as follows

```python
import torch

# Note: allocating any CUDA tensor first (e.g. `b = torch.zeros(10, device='cuda')`)
# would instantiate the default caching allocator and make the swap below fail.
new_alloc = torch.cuda.memory.CUDAPluggableAllocator('alloc.so', 'my_malloc', 'my_free')
old = torch.cuda.memory.get_current_allocator()
torch.cuda.memory.change_current_allocator(new_alloc)
b = torch.zeros(10, device='cuda')
# This will error since the current allocator was already instantiated
torch.cuda.memory.change_current_allocator(old)
```

Things to discuss
- How to test this, needs compiling external code ...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86786
Approved by: https://github.com/albanD
2022-11-23 17:54:36 +00:00
Alexander Grund
5b51ca6808 Update CUDA compiler matrix (#86360)
Switch the GCC/Clang max versions to be exclusive, as `include/crt/host_config.h` checks only the major version for the upper bound. This makes the constraints less restrictive and matches the checks in the aforementioned header.
Also update the versions using that header in the CUDA SDKs.

Follow up to #82860

I noticed this as PyTorch 1.12.1 with CUDA 11.3.1 and GCC 10.3 was failing in the `test_cpp_extensions*` tests.

Example for CUDA 11.3.1 from the SDK header:

```
#if __GNUC__ > 11
// Error out
...
#if (__clang_major__ >= 12) || (__clang_major__ < 3) || ((__clang_major__ == 3) &&  (__clang_minor__ < 3))
// Error out
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86360
Approved by: https://github.com/ezyang
2022-11-23 03:07:22 +00:00
Shen Li
f5d18574a3 Allow Module forward-pre and forward hooks to take kwargs (#89389)
closes #35643

This PR is mostly borrowed from #82042. Thanks @Padarn for implementing
the first version and debugging the errors.

Based on the discussion in #82042, this PR adds a `with_kwargs`
argument to the `register_forward_pre_hook` and `register_forward_hook`
methods. When the argument is set to true, the provided hook must accept
keyword arguments. Under the hood, this PR adds a
`_forward_pre_hooks_with_kwargs` and a `_forward_hook_with_kwargs`
set to keep track of which hooks accept kwargs.
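
A sketch of the resulting usage (hook signatures as described above):

```python
import torch
import torch.nn as nn

def pre_hook(module, args, kwargs):
    # With with_kwargs=True the hook sees (and may rewrite) both the
    # positional and the keyword arguments passed to forward().
    return args, kwargs

def fwd_hook(module, args, kwargs, output):
    return output

m = nn.Linear(2, 2)
m.register_forward_pre_hook(pre_hook, with_kwargs=True)
m.register_forward_hook(fwd_hook, with_kwargs=True)
m(torch.randn(1, 2))
```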

Differential Revision: [D41431111](https://our.internmc.facebook.com/intern/diff/D41431111)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89389
Approved by: https://github.com/soulitzer
2022-11-23 02:43:32 +00:00
Dmitry Tomshin
57e05e822d Issue 68576 prefetch factor (#88972)
Fixes #68576
This PR allows setting `prefetch_factor=None`, making it truly optional, as the documentation describes. A sketch of the now-valid call follows.
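
A minimal sketch, assuming this change (the dataset and sizes are illustrative):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(8))
# num_workers=0 loads in-process and does no prefetching, so None is the
# only sensible value; previously an integer default got in the way.
dl = DataLoader(ds, batch_size=2, num_workers=0, prefetch_factor=None)
```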
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88972
Approved by: https://github.com/kit1980
2022-11-18 00:10:50 +00:00
erjia
3d8a853a87 [DataPipe] Add container template for _Fork and _Demux (#89216)
- Removes the hard-coded check within `_ChildDataPipe`.
- Adds `get_length_by_instance` to the parent class so that child DataPipes can have different lengths.
- Prevents an error when `__del__` runs after the object has already been removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89216
Approved by: https://github.com/NivekT
2022-11-17 23:06:41 +00:00
Kazuaki Ishizaki
1cd6ebe095 Fix typos in messages under torch (#89049)
This PR fixes typos in messages in `.py` files under the torch directory.
In `torch/onnx/symbolic_opset16.py` only, it also fixes a typo in a comment to make the operator name correct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89049
Approved by: https://github.com/lezcano
2022-11-17 04:18:14 +00:00
soulitzer
6b521bbf35 Prevent module full_backward_hook from erroring in double backward (#88357)
Also clarifies documentation to say "execute if and only if gradients wrt outputs are computed" (previously, "execute every time gradients wrt inputs are computed")

See https://docs.google.com/document/d/1tFZKYdsSzRBJ7Di7SWt8X8fSg-E3eiUPwomMF10UyhM/edit for more details regarding the question: 'should module full_backward_hooks be called every time the gradients wrt module inputs are called, or should module full_backward_hooks only be called when the "backward for the module" have been computed?'
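
A minimal sketch of the double-backward pattern this fixes (hook and module are illustrative):

```python
import torch
import torch.nn as nn

m = nn.Linear(2, 2)
m.register_full_backward_hook(lambda mod, grad_in, grad_out: None)

x = torch.randn(1, 2, requires_grad=True)
out = m(x).sum()
# Backward once with create_graph=True, then backward through the gradient:
# previously the full_backward_hook machinery could error here.
(g,) = torch.autograd.grad(out, x, create_graph=True)
g.sum().backward()
```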

Fixes https://github.com/pytorch/pytorch/issues/88312

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88357
Approved by: https://github.com/albanD
2022-11-16 19:27:30 +00:00
Kevin Tse
be8d88f8d0 [DataLoader] Removing DataLoader2 related code (#88848)
Removing these lines of code as `DataLoader2` has been added to [TorchData](https://github.com/pytorch/data). I'm importing this to confirm it will not impact internal code.

Differential Revision: [D41201578](https://our.internmc.facebook.com/intern/diff/D41201578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88848
Approved by: https://github.com/ejguan
2022-11-11 22:27:01 +00:00
Nikita Shulga
575e02df53 Fix CUDNN_PATH handling on Windows (#88898)
Fixes https://github.com/pytorch/pytorch/issues/88873
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88898
Approved by: https://github.com/kit1980
2022-11-11 21:19:26 +00:00
erjia
90cf14ddf6 [DataPipe] Deprecating drop_empty_batches from Filter and other functional APIs (#88693)
- Deprecating based on https://github.com/pytorch/data/issues/163

Corresponding PR from TorchData: https://github.com/pytorch/data/pull/890
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88693
Approved by: https://github.com/NivekT
2022-11-10 19:54:22 +00:00
kshitij12345
eb9b156019 [fix] MathBits: serialization (#88182)
Fixes #81690

TODO:

* [x] C++ Unpickler Fix (locally tested pickled in Python and unpickled in C++)
* [x] C++ Pickler Fix (locally tested pickled in C++ and unpickled in Python)
* [x] Do quant_tensor, sparse_tensor, etc require similar changes? (Sparse and Quant don't need this)
* [x] Add Comments
* [x] How to make sure C++ and Python are in sync? (Functions in `pickler.h` help in getting and setting Tensor Metadata (math-bits for now) on a tensor. They are the only place which should handle this.)

Notes:
Quantized tensors don't support complex dtypes, and for float they segfault with `_neg_view`: https://github.com/pytorch/pytorch/issues/88484

Sparse Tensor:
```python
>>> a = torch.tensor([[0, 2.], [3j, 0]]).to_sparse()
>>> a.conj().is_conj()
False
>>> a._neg_view()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: Cannot access storage of SparseTensorImpl
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88182
Approved by: https://github.com/ezyang, https://github.com/anjali411
2022-11-09 17:15:12 +00:00
Eddie Yan
a7420d2ccb Hopper (sm90) support (#87736)
Essentially a followup of #87436

CC @xwang233 @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87736
Approved by: https://github.com/xwang233, https://github.com/malfet
2022-11-09 01:49:50 +00:00
Kurt Mohler
ee28b865ee Deprecate TypedStorage, its derived classes, and all of their public methods (#85303)
Part of #85302

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85303
Approved by: https://github.com/ezyang
2022-11-08 18:11:01 +00:00
Kazuaki Ishizaki
4ea2310f1e Fix typos used in documents under torch directory (#88483)
This PR fixes typos in comments of Python files, found via the search box at https://pytorch.org/docs/master/search.html.
This is a follow-up to #88300.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88483
Approved by: https://github.com/kit1980
2022-11-08 01:33:36 +00:00
Vitaly Fedyunin
9dadf8fcc2 [DataPipes] Add group support to the sharding_filter (#88424)
Differential Revision: [D41006747](https://our.internmc.facebook.com/intern/diff/D41006747)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88424
Approved by: https://github.com/ejguan
2022-11-07 22:07:01 +00:00
Elias Ellison
2ce2fc133d Disable Current Modes when printing Tensor (#88344)
Fix for https://github.com/pytorch/pytorch/issues/88087

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88344
Approved by: https://github.com/ezyang, https://github.com/samdow
2022-11-04 00:45:35 +00:00
Kazuaki Ishizaki
2ddefbdc3c Fix typos used in documents under torch directory (#88300)
This PR fixes typos in comments of Python files, found via the search box at https://pytorch.org/docs/master/search.html.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88300
Approved by: https://github.com/lezcano
2022-11-02 09:38:13 +00:00
Salil Desai
bc68625151 [Vulkan] Add support for Optimization Blocklist to Vulkan Rewrite (#87431)
The optimization blocklist will be used in a future diff (D40315730) to make the rewrite that transfers input/output backends optional.

Differential Revision: [D40315729](https://our.internmc.facebook.com/intern/diff/D40315729/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87431
Approved by: https://github.com/mcr229, https://github.com/digantdesai
2022-10-31 14:15:51 +00:00
jpvillam
38dd4cbdf1 ROCm enable sparse_sampled_addmm (#86401)
Enables the following tests; a usage sketch of the underlying op follows the list:
test_comprehensive_sparse_sampled_addmm_cuda_complex128
test_comprehensive_sparse_sampled_addmm_cuda_complex64
test_comprehensive_sparse_sampled_addmm_cuda_float32
test_comprehensive_sparse_sampled_addmm_cuda_float64
test_dispatch_meta_sparse_sampled_addmm_cuda_complex128
test_dispatch_meta_sparse_sampled_addmm_cuda_complex64
test_dispatch_meta_sparse_sampled_addmm_cuda_float32
test_dispatch_meta_sparse_sampled_addmm_cuda_float64
test_meta_sparse_sampled_addmm_cuda_complex128
test_meta_sparse_sampled_addmm_cuda_complex64
test_meta_sparse_sampled_addmm_cuda_float32
test_meta_sparse_sampled_addmm_cuda_float64
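
For reference, a sketch of the operation these tests cover (on a ROCm build, `device="cuda"` maps to the HIP device; shapes are illustrative):

```python
import torch

# sampled_addmm computes mat1 @ mat2 only at the sparsity pattern of the
# sparse CSR input: alpha * (mat1 @ mat2) * spy(input) + beta * input
inp = torch.eye(3, device="cuda").to_sparse_csr()
mat1 = torch.randn(3, 4, device="cuda")
mat2 = torch.randn(4, 3, device="cuda")
out = torch.sparse.sampled_addmm(inp, mat1, mat2)
```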

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86401
Approved by: https://github.com/ngimel
2022-10-26 19:39:24 +00:00
erjia
63138fbec3 [DataLoader2] Change serialization wrapper to iterator (#87459)
This is a temporary fix for an internal SEV. We have run three different workflows to validate that this fix unblocks it.
A few follow-up tasks remain:
- [ ] Create a reproducible test for multithreading with generators
- [ ] Figure out how to make fullsynciterator work properly with generators
- [ ] Move Wrapper back to a generator if needed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87459
Approved by: https://github.com/NivekT
2022-10-25 01:27:56 +00:00
Greg Hogan
71fe069d98 ada lovelace (arch 8.9) support (#87436)
Changes required to be able to compile https://github.com/pytorch/vision and https://github.com/nvidia/apex for the `sm_89` architecture.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87436
Approved by: https://github.com/ngimel
2022-10-24 21:25:36 +00:00
Nikita Shulga
c28cdb53ea [BE] Delete BUILD_SPLIT_CUDA option (#87502)
As we already link cuDNN and cuBLAS dynamically for all configs (statically linked cuDNN is a different library than the dynamically linked one, increases the default memory footprint, etc.), and libtorch_cuda, even when compiled for all GPU architectures, no longer approaches the 2 GB binary-size limit, BUILD_SPLIT_CUDA can go away.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87502
Approved by: https://github.com/atalman
2022-10-22 06:00:59 +00:00
samdow
169ec120ef [Modes] refactor modes to only use a stack in cpp (#86458)
Refactors the mode code to only have the C++ mode stack, rather than the "C++ mode" we originally had. This also simplifies the mode logic in a number of places.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86458
Approved by: https://github.com/zou3519
2022-10-21 19:18:23 +00:00
Brian Hirsh
ce0c6e828e Reland "add an API for external backends to register custom device names (#86992)" (#87453)
Re-land of https://github.com/pytorch/pytorch/pull/86992

This reverts commit a895af9250.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87453
Approved by: https://github.com/ezyang, https://github.com/albanD
2022-10-21 16:51:36 +00:00
PyTorch MergeBot
a895af9250 Revert "add an API for external backends to register custom device names (#86992)"
This reverts commit fb6826bfd8.

Reverted https://github.com/pytorch/pytorch/pull/86992 on behalf of https://github.com/jeanschmidt due to breaking internal builds - D40534212 - arstudio-windows-tests-landcastle-0
2022-10-20 14:51:08 +00:00
erjia
b90db4a78f [DataPipe] Fix type checking to accept both Iter and Map DataPipe (#87285)
Fixes https://github.com/pytorch/data/issues/841

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87285
Approved by: https://github.com/NivekT
2022-10-20 05:05:56 +00:00
albanD
c141f28b64 Fix compilation warning and spurious print (#87297)
Fixes a compilation warning, makes this warning an error, and removes a spurious print.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87297
Approved by: https://github.com/malfet
2022-10-19 20:56:37 +00:00
Brian Hirsh
fb6826bfd8 add an API for external backends to register custom device names (#86992)
This API adds some improvements for external backends that are building C++ backends out of tree using the `PrivateUse1` dispatch key.

The docs and linked examples go over the API in more detail, but you should be able to use it like:
```python
# This should probably be in the __init__.py file of an external backend's python package
> torch.register_privateuse1_backend("foo")
# And it will allow the user to do this:
> a = torch.ones(2, device="foo")
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86992
Approved by: https://github.com/albanD
2022-10-19 16:44:17 +00:00
leizhenyuan
c6187ea326 add support for pin memory on xpu device (#86545)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86545
Approved by: https://github.com/ezyang
2022-10-19 13:24:48 +00:00
Tongzhou Wang
7ff1ca4e33 Add type annotation to get_worker_info (#87017)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87017
Approved by: https://github.com/ejguan, https://github.com/NivekT
2022-10-19 00:25:04 +00:00
hxu296
09a967d6c9 Make nested TreeSpec printing nicer (#46538) (#86546)
1. Made TreeSpec into a dataclass.
2. In `__repr__`, recursively transformed TreeSpec into dictionaries and then pretty-printed it (see the sketch below).
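
A sketch of the effect, using the (private) pytree API:

```python
from torch.utils._pytree import tree_flatten

leaves, spec = tree_flatten({"a": [1, 2], "b": (3,)})
print(leaves)  # [1, 2, 3]
print(spec)    # the nested TreeSpec, now rendered as a readable dict-like repr
```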

Fixes #46538. Hi @ezyang, this PR is for the TreeSpec `__repr__` refactor we discussed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86546
Approved by: https://github.com/ezyang
2022-10-18 16:50:39 +00:00
Kevin Tse
c01c7a5e2c [DataPipe] Fix missing functional name for FileLister (#86497)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86497
Approved by: https://github.com/ejguan
2022-10-17 18:13:37 +00:00
edward-io
4584d06e76 [data] add autocompletion to datapipes (#86960)
In REPLs (e.g. Jupyter notebooks), autocomplete now works:

<img width="750" alt="image" src="https://user-images.githubusercontent.com/53842584/195776448-f33180da-d1cd-4e47-b9a0-4fd9eb2f78b7.png">

even with custom data pipes:

<img width="804" alt="image" src="https://user-images.githubusercontent.com/53842584/195776957-5c51895e-f469-4b13-81ba-c9b507022555.png">

Unfortunately I wasn't able to figure out how to get autocomplete to work for non-REPLs (e.g. VSCode); that may need generated fake pyi stubs, which 1) won't work for custom datapipes and 2) is a larger project to tackle :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86960
Approved by: https://github.com/NivekT
2022-10-15 00:25:26 +00:00
Kshiteej K
54ee95c8ec [nn] module: full_backward_pre_hook (#86700)
Fixes https://github.com/pytorch/pytorch/issues/42824

* [x] Test
* [x] Doc
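
A sketch of the new hook (the pre-hook sees gradients w.r.t. the module's outputs before input gradients are computed; names are illustrative):

```python
import torch
import torch.nn as nn

def bw_pre_hook(module, grad_output):
    # Returning a (possibly modified) tuple replaces grad_output.
    return grad_output

m = nn.Linear(2, 2)
m.register_full_backward_pre_hook(bw_pre_hook)
m(torch.randn(1, 2, requires_grad=True)).sum().backward()
```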
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86700
Approved by: https://github.com/soulitzer
2022-10-13 17:36:39 +00:00
Syed Tousif Ahmed
77d94ac5ab Sets CUDA_MODULE_LOADING to LAZY when not set by the user (#85692)
This PR sets CUDA_MODULE_LOADING if it's not set by the user. By default, it sets it to "LAZY".

It was tested using the following commands:
```
python -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)"
```
which shows a memory usage of 287,047,680 bytes

vs

```
CUDA_MODULE_LOADING="DEFAULT" python -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)"
```
which shows 666,632,192 bytes.

A C++ implementation is needed for libtorch users (otherwise it could have been pure Python functionality).

cc: @ptrblck @ngimel @malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85692
Approved by: https://github.com/malfet
2022-10-13 14:03:01 +00:00
胡玮文
92562046e9 Optimize __dlpack_device__ performance (#86665)
This can be critical when processing a large number of tensors:

```bash
python -m timeit --setup 'import torch; t = torch.empty(1000, device="cuda")' 't.__dlpack_device__()'
```

Based on 1.12.1:
- before: 100000 loops, best of 5: 2.32 usec per loop
- after: 500000 loops, best of 5: 844 nsec per loop

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86665
Approved by: https://github.com/SunDoge, https://github.com/soulitzer
2022-10-11 19:03:46 +00:00
Rohan Varma
d93b1b9c4e Address feedback from previous PR (#86622)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86622
Approved by: https://github.com/albanD
2022-10-10 18:53:41 +00:00
Rohan Varma
7a411952fb CheckpointSequential support non-reentrant (#86331)
Closes https://github.com/pytorch/pytorch/issues/86328

Adds `use_reentrant` argument to `checkpoint_sequential`.
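
A minimal sketch of the new argument (model and shapes are illustrative):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 4))
x = torch.randn(2, 4, requires_grad=True)
# use_reentrant=False selects the non-reentrant checkpointing implementation,
# matching the option that checkpoint() already exposes.
out = checkpoint_sequential(model, 2, x, use_reentrant=False)
out.sum().backward()
```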

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86331
Approved by: https://github.com/zhaojuanmao, https://github.com/albanD
2022-10-06 23:10:18 +00:00
originates
dfde7cf3e2 ANTIALIAS updated to Resampling.LANCZOS in torch/utils/tensorboard/summary.py (#85679)
**Line 492: `ANTIALIAS` updated to `Resampling.LANCZOS`**

Removes the following deprecation warning:

`DeprecationWarning: ANTIALIAS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.`

---

```python
try:
    ANTIALIAS = Image.Resampling.LANCZOS
except AttributeError:
    ANTIALIAS = Image.ANTIALIAS
image = image.resize((scaled_width, scaled_height), ANTIALIAS)
```

Now `Resampling.LANCZOS` is used unless it raises an `AttributeError`, in which case the code falls back to `Image.ANTIALIAS`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85679
Approved by: https://github.com/albanD
2022-10-03 22:10:02 +00:00