Continuation after https://github.com/pytorch/pytorch/pull/90163.
Here is a script I used to find all the non-existent arguments in the docstrings (the script can give false positives in the presence of `*args`/`**kwargs` or decorators):
_Edit:_
I've realized that the indentation is wrong for the last `break` in the script, so the script only gives output for a function if the first docstring argument is wrong. I'll create a separate PR if I find more issues with the corrected script.
```python
import ast
import os

import docstring_parser

for root, dirs, files in os.walk('.'):
    for name in files:
        if root.startswith("./.git/") or root.startswith("./third_party/"):
            continue
        if name.endswith(".py"):
            full_name = os.path.join(root, name)
            with open(full_name, "r") as source:
                tree = ast.parse(source.read())
            for node in ast.walk(tree):
                if isinstance(node, ast.FunctionDef):
                    # Collect every argument name from the function signature.
                    all_node_args = node.args.args
                    if node.args.vararg is not None:
                        all_node_args.append(node.args.vararg)
                    if node.args.kwarg is not None:
                        all_node_args.append(node.args.kwarg)
                    if node.args.posonlyargs is not None:
                        all_node_args.extend(node.args.posonlyargs)
                    if node.args.kwonlyargs is not None:
                        all_node_args.extend(node.args.kwonlyargs)
                    args = [a.arg for a in all_node_args]
                    # Collect the argument names documented in the docstring.
                    docstring = docstring_parser.parse(ast.get_docstring(node))
                    doc_args = [a.arg_name for a in docstring.params]
                    clean_doc_args = []
                    for a in doc_args:
                        clean_a = ""
                        for c in a.split()[0]:
                            if c.isalnum() or c == '_':
                                clean_a += c
                        if clean_a:
                            clean_doc_args.append(clean_a)
                    doc_args = clean_doc_args
                    # Report documented arguments that do not exist in the signature.
                    for a in doc_args:
                        if a not in args:
                            print(full_name, node.lineno, args, doc_args)
                        break  # mis-indented (see the edit note above): only the first docstring arg is checked
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90505
Approved by: https://github.com/malfet, https://github.com/ZainRizvi
Fixes #88074
Several datapipes cache their length the first time it is computed. However, source datapipes might change in length (most prominently, whenever `apply_sharding` is called). This behaviour is counter-intuitive because we do not expect `__len__` to have side effects.
This PR makes `__len__` dynamically computed; a small sketch of the intended behaviour follows the change list below.
Changes:
- Add note to the `datapipes` README that `__len__` should be dynamic and why.
- Remove caching of length computations in `ConcaterIterDataPipe`, `MultiplexerIterDataPipe`, `ZipperIterDataPipe`, `BatcherIterDataPipe`, `ConcaterMapDataPipe`, and `BatcherMapDataPipe`.
- This required removal of the `length` attribute in setstate/getstate of `MultiplexerIterDataPipe`. I am unsure whether to remove this completely and risk breaking saved checkpoints (as I did) or whether to just ignore the `length` of the loaded `state`.
- This also means the classes above no longer have a `length` attribute. I have found no uses of this, though.
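Not part of the original change list, but a minimal sketch of the intended behaviour, using hypothetical simplified pipes rather than the real `DataPipe` classes:
```python
# Hypothetical simplified pipes (not the real DataPipe classes) illustrating why
# __len__ must be computed dynamically instead of cached.
class Source:
    def __init__(self, n):
        self.n = n

    def apply_sharding(self, num_shards):
        # Sharding changes the number of elements this pipe yields.
        self.n = self.n // num_shards

    def __len__(self):
        return self.n


class Concater:
    def __init__(self, *pipes):
        self.pipes = pipes

    def __len__(self):
        # Recompute on every call; a cached value would go stale after
        # apply_sharding() changes the sources.
        return sum(len(p) for p in self.pipes)


a, b = Source(10), Source(6)
cat = Concater(a, b)
print(len(cat))      # 16
a.apply_sharding(2)
print(len(cat))      # 11, reflecting the new source length
```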
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88302
Approved by: https://github.com/NivekT
Summary:
This diff introduces a set of changes that makes it possible for the host to get assertions from CUDA devices. This includes the introduction of
**`CUDA_KERNEL_ASSERT2`**
A preprocessor macro to be used within a CUDA kernel that, upon an assertion failure, writes the assertion message, file, line number, and possibly other information to UVM (Managed memory). Once this is done, the original assertion is triggered, which places the GPU in a Bad State requiring recovery. In my tests, data written to UVM appears there before the GPU reaches the Bad State and is still accessible from the host after the GPU is in this state.
Messages are written to a multi-message buffer which can, in theory, hold many assertion failures. I've done this as a precaution in case there are several, but I don't actually know whether that is possible and a simpler design which holds only a single message may well be all that is necessary.
**`TORCH_DSA_KERNEL_ARGS`**
This preprocessor macro is added as an _argument_ to a kernel function's signature. It expands to supply the standardized names of all the arguments needed by `C10_CUDA_COMMUNICATING_KERNEL_ASSERTION` to handle device-side assertions. This includes, e.g., the name of the pointer to the UVM memory the assertion would be written to. This macro abstracts the arguments so there is a single point of change if the system needs to be modified.
**`c10::cuda::get_global_cuda_kernel_launch_registry()`**
This host-side function returns a singleton object that manages the host's part of the device-side assertions. Upon allocation, the singleton allocates sufficient UVM (Managed) memory to hold information about several device-side assertion failures. The singleton also provides methods for getting the current traceback (used to identify when a kernel was launched). To avoid consuming all the host's memory the singleton stores launches in a circular buffer; a unique "generation number" is used to ensure that kernel launch failures map to their actual launch points (in the case that the circular buffer wraps before the failure is detected).
**`TORCH_DSA_KERNEL_LAUNCH`**
This host-side preprocessor macro replaces the standard
```
kernel_name<<<blocks, threads, shmem, stream>>>(args)
```
invocation with
```
TORCH_DSA_KERNEL_LAUNCH(blocks, threads, shmem, stream, args);
```
Internally, it fetches the UVM (Managed) pointer and generation number from the singleton and appends these to the standard argument list. It also checks that the kernel launched correctly. This abstraction on kernel launches can be modified to provide additional safety/logging.
**`c10::cuda::c10_retrieve_device_side_assertion_info`**
This host-side function checks, when called, that no kernel assertions have occurred. If one has, it raises an exception with:
1. Information (file, line number) of what kernel was launched.
2. Information (file, line number, message) about the device-side assertion.
3. Information (file, line number) about where the failure was detected.
**Checking for device-side assertions**
Device-side assertions are most likely to be noticed by the host when a CUDA API call such as `cudaDeviceSynchronize` is made and fails with a `cudaError_t` indicating
> CUDA error: device-side assert triggered CUDA kernel errors
Therefore, we rewrite `C10_CUDA_CHECK()` to include a call to `c10_retrieve_device_side_assertion_info()`. To make the code cleaner, most of the logic of `C10_CUDA_CHECK()` is now contained within a new function `c10_cuda_check_implementation()` to which `C10_CUDA_CHECK` passes the preprocessor information about filenames, function names, and line numbers. (In C++20 we can use `std::source_location` to eliminate macros entirely!)
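As an illustration (not from the PR itself), here is a hedged sketch of the kind of user-level failure this machinery is meant to diagnose: an out-of-bounds index fires a device-side assertion inside the indexing kernel, and a subsequent checked CUDA call surfaces the error on the host.
```python
import torch

# Assumes a CUDA device is available; the exact error text depends on the build.
if torch.cuda.is_available():
    t = torch.arange(4, device="cuda")
    try:
        bad = t[torch.tensor([10], device="cuda")]  # out-of-range index -> device-side assert
        torch.cuda.synchronize()  # device-side failures are reported asynchronously
    except RuntimeError as e:
        print(e)  # expected to mention "device-side assert triggered"
```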
# Notes on special cases
* Multiple assertions from the same block are recorded
* Multiple assertions from different blocks are recorded
* Launching kernels from many threads on many streams seems to be handled correctly
* If two processes are using the same GPU and one of the processes fails with a device-side assertion, the other process continues without issue
* X Multiple assertions from separate kernels on different streams seem to be recorded, but we can't reproduce the test condition
* X Multiple assertions from separate devices should be all be shown upon exit, but we've been unable to generate a test that produces this condition
Differential Revision: D37621532
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84609
Approved by: https://github.com/ezyang, https://github.com/malfet
Fixes: https://github.com/pytorch/data/issues/865
I will add another PR in torchdata to validate that this change solves the infinite datapipe problem (I have tested it locally). This is one of the most annoying stacks of PRs caused by the separation between TorchData and PyTorch.
There is a case where `file.close` is never called: when the generator function never reaches its end. A simple example would be `zip`-ing two datapipes with different lengths. The longer DataPipe never reaches the end of its generator and is eventually cleaned up by `gc`, so the `file.close` line is never executed. (This is the reason Vitaly had to create this [hack](4451eb24e6/torch/utils/data/datapipes/iter/combining.py (L573-L583)) that retrieves all remaining data to make sure the generator function is fully executed.)
However, this hack introduces another problem: an infinite datapipe would make `zip` never end, as it would try to deplete the infinite iterator. See: https://github.com/pytorch/data/issues/865
So, in this PR, I am adding a `try`-`finally` clause to make sure `file.close` is always executed when the `generator` object is destroyed. Then, we no longer need the hack within `zip`.
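For illustration only (a hypothetical simplified datapipe and placeholder filename, not the actual code), the pattern is roughly:
```python
# Hypothetical generator-style datapipe. The `finally` block runs even when the
# generator is closed early (e.g. garbage-collected after a partial `zip`),
# because closing a generator raises GeneratorExit inside it.
def read_lines(path):
    file = open(path)
    try:
        for line in file:
            yield line
    finally:
        file.close()  # always executed, even if the generator never reaches the end

# Usage: consuming only part of the generator still closes the file on destruction.
it = read_lines("some_file.txt")  # placeholder path
next(it)
del it  # triggers GeneratorExit -> finally -> file.close()
```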
Differential Revision: [D41699469](https://our.internmc.facebook.com/intern/diff/D41699469)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89974
Approved by: https://github.com/NivekT, https://github.com/wenleix
There is a case where `file.close` is never called: when the generator function never reaches its end. A simple example would be `zip`-ing two datapipes with different lengths. The longer DataPipe never reaches the end of its generator and is eventually cleaned up by `gc`, so the `file.close` line is never executed. (This is the reason Vitaly had to create this [hack](4451eb24e6/torch/utils/data/datapipes/iter/combining.py (L573-L583)) that retrieves all remaining data to make sure the generator function is fully executed.)
However, this hack introduces another problem: an infinite datapipe would make `zip` never end, as it would try to deplete the infinite iterator. See: https://github.com/pytorch/data/issues/865
So, in this PR, I am adding a `try`-`finally` clause to make sure `file.close` is always executed when the `generator` object is destroyed. Then, we no longer need the hack within `zip`.
Differential Revision: [D41699470](https://our.internmc.facebook.com/intern/diff/D41699470)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89973
Approved by: https://github.com/NivekT
Fixes #43144
This uses the Backend system added by [#82682](https://github.com/pytorch/pytorch/pull/82682) to change allocators dynamically during code execution. This will allow us to use RMM, to use CUDA managed memory for portions of the code that do not fit in GPU memory, to write static memory allocators that reduce fragmentation while training models, and to improve interoperability with external DL compilers/libraries.
For example, we could have the following allocator in C++:
```c++
#include <sys/types.h>
#include <cuda_runtime_api.h>
#include <iostream>

extern "C" {

void* my_malloc(ssize_t size, int device, cudaStream_t stream) {
  void* ptr;
  std::cout << "alloc " << size << std::endl;
  cudaMalloc(&ptr, size);
  return ptr;
}

void my_free(void* ptr) {
  std::cout << "free " << std::endl;
  cudaFree(ptr);
}

}
```
Compile it as a shared library
```
nvcc allocator.cc -o alloc.so -shared --compiler-options '-fPIC'
```
And use it from PyTorch as follows
```python
import torch
# Init caching
# b = torch.zeros(10, device='cuda')
new_alloc = torch.cuda.memory.CUDAPluggableAllocator('alloc.so', 'my_malloc', 'my_free')
old = torch.cuda.memory.get_current_allocator()
torch.cuda.memory.change_current_allocator(new_alloc)
b = torch.zeros(10, device='cuda')
# This will error since the current allocator was already instantiated
torch.cuda.memory.change_current_allocator(old)
```
Things to discuss
- How to test this, needs compiling external code ...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86786
Approved by: https://github.com/albanD
Switch the GCC/Clang max versions to be exclusive, as `include/crt/host_config.h` checks only the major version for the upper bound. This allows us to be less restrictive and matches the checks in the aforementioned header.
Also update the versions using that header in the CUDA SDKs.
Follow up to #82860
I noticed this as PyTorch 1.12.1 with CUDA 11.3.1 and GCC 10.3 was failing in the `test_cpp_extensions*` tests.
Example for CUDA 11.3.1 from the SDK header:
```
#if __GNUC__ > 11
// Error out
...
#if (__clang_major__ >= 12) || (__clang_major__ < 3) || ((__clang_major__ == 3) && (__clang_minor__ < 3))
// Error out
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86360
Approved by: https://github.com/ezyang
Closes #35643
This PR is mostly borrowed from #82042. Thanks @Padarn for implementing the first version and debugging the errors.
Based on the discussion in #82042, this PR adds a `with_kwargs` argument to the `register_forward_pre_hook` and `register_forward_hook` methods. When the arg is set to `True`, the provided hook must accept kwargs. Under the hood, this PR adds a `_forward_pre_hooks_with_kwargs` and a `_forward_hook_with_kwargs` set to keep track of which hooks accept kwargs.
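A minimal sketch of how the new argument is meant to be used, assuming the kwargs-aware signatures described above (`hook(module, args, kwargs)` for the pre-hook and `hook(module, args, kwargs, output)` for the forward hook):
```python
import torch
import torch.nn as nn

def pre_hook(module, args, kwargs):
    # May inspect (or modify) positional and keyword args before forward runs.
    print("pre-hook kwargs:", kwargs)

def forward_hook(module, args, kwargs, output):
    print("forward-hook kwargs:", kwargs)
    return output

m = nn.Linear(4, 2)
m.register_forward_pre_hook(pre_hook, with_kwargs=True)
m.register_forward_hook(forward_hook, with_kwargs=True)
m(torch.randn(3, 4))
```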
Differential Revision: [D41431111](https://our.internmc.facebook.com/intern/diff/D41431111)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89389
Approved by: https://github.com/soulitzer
- This would remove the hard-coded check within `_ChildDataPipe`.
- Add `get_length_by_instance` to the parent class so that child DataPipes have a chance to report different lengths (a rough sketch of the idea follows this list).
- Prevent an error when `__del__` is executed after the object has already been removed.
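Not from the PR itself, but a rough sketch of the shape of the API described above, with hypothetical class names standing in for the real fork/demux internals:
```python
# Hypothetical container/child pair illustrating the per-instance length lookup.
class _Container:
    def __init__(self, lengths):
        self.lengths = lengths  # e.g. one length per child produced by a demux-like op

    def get_length_by_instance(self, instance_id):
        # Children may legitimately have different lengths, so the parent is asked
        # for the length of a specific child instead of a single shared value.
        return self.lengths[instance_id]


class _Child:
    def __init__(self, container, instance_id):
        self.container = container
        self.instance_id = instance_id

    def __len__(self):
        return self.container.get_length_by_instance(self.instance_id)


container = _Container([3, 5])
children = [_Child(container, 0), _Child(container, 1)]
print([len(c) for c in children])  # [3, 5]
```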
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89216
Approved by: https://github.com/NivekT
Fixes #81690
TODO:
* [x] C++ Unpickler Fix (locally tested pickled in Python and unpickled in C++)
* [x] C++ Pickler Fix (locally tested pickled in C++ and unpickled in Python)
* [x] Do quant_tensor, sparse_tensor, etc require similar changes? (Sparse and Quant don't need this)
* [x] Add Comments
* [x] How to make sure C++ and Python are in sync? (Functions in `pickler.h` help in getting and setting Tensor Metadata (math-bits for now) on a tensor. They are the only place which should handle this.)
Notes:
Quantized tensors don't support complex dtypes, and for float they segfault with `_neg_view`: https://github.com/pytorch/pytorch/issues/88484
Sparse tensors:
```python
>>> a = torch.tensor([[0, 2.], [3j, 0]]).to_sparse()
>>> a.conj().is_conj()
False
>>> a._neg_view()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NotImplementedError: Cannot access storage of SparseTensorImpl
```
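Not in the PR text, but a minimal sketch of the behaviour the pickler/unpickler changes are meant to preserve for math-bits (assuming the conj bit round-trips through `torch.save`/`torch.load`):
```python
import io
import torch

# Round-trip check: the conj "math bit" should survive serialization.
a = torch.randn(3, dtype=torch.complex64).conj()
assert a.is_conj()

buf = io.BytesIO()
torch.save(a, buf)
buf.seek(0)
b = torch.load(buf)

# With the pickler/unpickler fix, the loaded tensor should still report the conj bit.
print(b.is_conj())
```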
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88182
Approved by: https://github.com/ezyang, https://github.com/anjali411
This is a temporary fix for an internal SEV. We have run three different workflows to validate that this fix would unblock the internal SEV.
And here are a few follow-up tasks:
- [ ] Create reproducible test for multithreading with generator
- [ ] Figure out how to make the fullsync iterator work properly with generators
- [ ] Move Wrapper back to generator if needed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87459
Approved by: https://github.com/NivekT
We are linking with cuDNN and cuBLAS dynamically for all configs anyway, since statically linked cuDNN is a different library than the dynamically linked one, increases the default memory footprint, etc. And libtorch_cuda, even when compiled for all GPU architectures, is no longer approaching the 2Gb binary size limit, so BUILD_SPLIT_CUDA can go away.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87502
Approved by: https://github.com/atalman
This API adds some improvements for external backends that build C++ backends out of tree using the `PrivateUse1` dispatch key.
The docs and linked examples go over the API in more detail, but you should be able to use it like:
```python
# This should probably be in the __init__.py file of an external backend's python package
torch.register_privateuse1_backend("foo")

# And it will allow the user to do this:
a = torch.ones(2, device="foo")
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86992
Approved by: https://github.com/albanD
1. Made TreeSpec into a dataclass.
2. In `__repr__`, recursively transformed TreeSpec into dictionaries and then pretty-printed it.
Fixes #46538. Hi @ezyang, this PR is for the TreeSpec `__repr__` refactor we discussed.
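For context (not from the PR), a small sketch of where the new repr shows up, using the internal `torch.utils._pytree` helpers:
```python
import torch.utils._pytree as pytree

# tree_flatten returns the leaves plus a TreeSpec describing the container structure.
leaves, spec = pytree.tree_flatten({"a": [1, 2], "b": (3,)})

# With this change, printing the spec yields a recursively pretty-printed,
# dictionary-like representation of the nested structure.
print(spec)
```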
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86546
Approved by: https://github.com/ezyang
This PR sets CUDA_MODULE_LOADING if it's not set by the user. By default, it sets it to "LAZY".
It was tested using the following commands:
```
python -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)"
```
which shows a memory usage of: 287,047,680 bytes
vs
```
CUDA_MODULE_LOADING="DEFAULT" python -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)"
```
which shows 666,632,192 bytes.
A C++ implementation is needed for libtorch users (otherwise it could have been pure Python functionality).
cc: @ptrblck @ngimel @malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85692
Approved by: https://github.com/malfet
This can be critical when processing a large number of tensors:
```bash
python -m timeit --setup 'import torch; t = torch.empty(1000, device="cuda")' 't.__dlpack_device__()'
```
Based on 1.12.1:
- before: 100000 loops, best of 5: 2.32 usec per loop
- after: 500000 loops, best of 5: 844 nsec per loop
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86665
Approved by: https://github.com/SunDoge, https://github.com/soulitzer
**Line 492: ANTIALIAS updated to Resampling.LANCZOS**
Removes the following deprecation warning:
`DeprecationWarning: ANTIALIAS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.`
---
```python
try:
    ANTIALIAS = Image.Resampling.LANCZOS
except AttributeError:
    ANTIALIAS = Image.ANTIALIAS

image = image.resize((scaled_width, scaled_height), ANTIALIAS)
```
Now `Resampling.LANCZOS` will be used unless it raises an `AttributeError`, in which case the code falls back to `Image.ANTIALIAS`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85679
Approved by: https://github.com/albanD