Commit Graph

667 Commits

Author SHA1 Message Date
PyTorch MergeBot
c0fd7894cc Revert "Fast standalone symbolize for unwinding (#123966)"
This reverts commit 772ae6da1e.

Reverted https://github.com/pytorch/pytorch/pull/123966 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, check D56522678 ([comment](https://github.com/pytorch/pytorch/pull/123966#issuecomment-2076821043))
2024-04-25 10:04:48 +00:00
Tiger Huo
94af62b000 Updated test_graph_grad_scaling to use new OptimizerInfo infrastructure (#123581)
This PR targets the issue mentioned in #123451 , and solves the specific task to update`test_graph_grad_scaling` in `test/test_cuda.py` to use the new OptimizerInfo infrastructure.

`test_graph_grad_scaling` is moved to a new `TestCase` class called `TestCudaOptims` in order to use `instantiate_device_type_tests`. The test content remained the same. `@onlyCUDA` is applied to the new test; the original use of the wrapper function is also changed to a `@parametrize` decorator for better style.

If we think that this migration is successful, we can delete the original test item under `TestCuda`. Currently it is left untouched to avoid any unexpected issues.

Local linter passed.
```
$ lintrunner test/test_cuda.py
ok No lint issues.
```

Local tests passed.
```
> python .\test\test_cuda.py -k test_graph_grad_scaling
Ran 7 tests in 0.458s
OK (skipped = 3)
```
Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123581
Approved by: https://github.com/janeyx99
2024-04-25 06:29:20 +00:00
Catherine Lee
4f29103749 [ez][CI] Move test_cuda off CI_SERIAL_LIST (#124649)
Tag test cases with large tensor with serial, also tag a few more that failed on a previous iteration of this PR

Move test_cuda and test_cuda_expandable_segments off the serial list
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124649
Approved by: https://github.com/ZainRizvi
2024-04-24 22:04:23 +00:00
zdevito
772ae6da1e Fast standalone symbolize for unwinding (#123966)
We've had issues using addr2line. On certain versions of
CentOS it is on a version that has a performance regression making it very slow,
and even normallly it is not that fast, taking several seconds even when parallelized
for a typical memory trace dump.

Folly Symbolize or LLVMSymbolize are fast but it requires PyTorch take a dependency on those libraries to do this, and given the number of environments we run stuff in, we end up hitting cases where we fallback to slow addr2line behavior.

This adds a standalone symbolizer to PyTorch similar to the unwinder which has
no external dependencies and is ~20x faster than addr2line for unwinding PyTorch frames.

I've tested this on some memory profiling runs using all combinations of {gcc, clang} x {dwarf4, dwarf5} and it seems to do a good job at getting line numbers and function names right. It is also careful to route all reads of library data through the `CheckedLexer` object, which ensure it is not reading out of bounds of the section. Errors are routed through UnwindError so that those exceptions get caught and we produce a ?? frame rather than crash. I also added a fuzz test which gives all our symbolizer options random addresses in the process to make sure they do not crash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123966
Approved by: https://github.com/ezyang
2024-04-23 15:27:18 +00:00
Aaron Gokaslan
5a1216bb2e [BE]: Update ruff to 0.4.1 (#124549)
Update ruff to 0.4.1 .
This version fixes a lot false negatives/false positives, is 20-40% faster, and has various other bug fixes.

Below is a before and after table showing the execution time of ruff lint and ruff format in milliseconds courtesy of https://astral.sh/blog/ruff-v0.4.0

| Repository                                         | Linter (v0.3) | Linter (v0.4) | Formatter (v0.3) | Formatter (v0.4) |
|----------------------------------------------------|---------------|---------------|------------------|------------------|
| [pytorch/pytorch](https://github.com/pytorch/pytorch) | 328.7         | 251.8         | 351.1            | 274.9            |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549
Approved by: https://github.com/ezyang
2024-04-21 14:06:23 +00:00
Michael Lazos
16771747c2 Add tensor step and capturable support to rprop (#122261)
Towards fixing https://github.com/pytorch/pytorch/issues/115679
Fixes Rprop step update while compiling

Also adds capturable support + testing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122261
Approved by: https://github.com/janeyx99
2024-03-28 23:31:18 +00:00
Edward Z. Yang
0284bca99b Don't cache device_count if we haven't initialized CUDA yet (#122815)
Before initializing CUDA, it can change by modifying CUDA_VISIBLE_DEVICES

Fixes https://github.com/pytorch/pytorch/issues/122085
Fixes https://github.com/pytorch/pytorch/issues/38616
Fixes https://github.com/pytorch/pytorch/issues/110000
Fixes https://github.com/pytorch/pytorch/issues/110971
Fixes https://github.com/pytorch/pytorch/issues/95073

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122815
Approved by: https://github.com/albanD
2024-03-28 13:23:45 +00:00
Michael Lazos
caa57e4fcd Add tensor step and capturable support to rmsprop (#122264)
Towards fixing https://github.com/pytorch/pytorch/issues/115679
Fixes RMSprop step update while compiling

Adds capturable support to RMSprop

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122264
Approved by: https://github.com/janeyx99
2024-03-28 03:39:28 +00:00
Frank Lin
249e65b92d Graph-Safe RNG State Exchange for Tensor Parallelism (#114068)
See #113541

The PR allows for registering and controlling multiple RNG states using indices, ensuring cudagraph-safe operations, and includes both C++ and Python API changes to support this functionality.

cc  @eellison @anijain2305 @jansel @ezyang @ptrblck @csarofeen @mcarilli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114068
Approved by: https://github.com/ezyang, https://github.com/eqy, https://github.com/xuzhao9
2024-03-27 01:14:38 +00:00
PyTorch MergeBot
4dc09d6aa4 Revert "Graph-Safe RNG State Exchange for Tensor Parallelism (#114068)"
This reverts commit e9dcda5cba.

Reverted https://github.com/pytorch/pytorch/pull/114068 on behalf of https://github.com/ezyang due to memory leak in another ci ([comment](https://github.com/pytorch/pytorch/pull/114068#issuecomment-2018044527))
2024-03-25 13:49:04 +00:00
Michael Lazos
365e89a591 Add tensor step to adadelta (#122252)
Towards fixing https://github.com/pytorch/pytorch/issues/115679
Fixes Adadelta step update while compiling

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122252
Approved by: https://github.com/janeyx99
2024-03-21 07:28:47 +00:00
Frank Lin
e9dcda5cba Graph-Safe RNG State Exchange for Tensor Parallelism (#114068)
See #113541

The PR allows for registering and controlling multiple RNG states using indices, ensuring cudagraph-safe operations, and includes both C++ and Python API changes to support this functionality.

cc  @eellison @anijain2305 @jansel @ezyang @ptrblck @csarofeen @mcarilli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114068
Approved by: https://github.com/ezyang
2024-03-21 01:57:08 +00:00
Andres Lugo-Reyes
e01b07e1e8 [ROCm] Autocast RNN Support (#121539)
Fixes #116361

Implements Autocast wrapper for miopen rnn's

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121539
Approved by: https://github.com/albanD, https://github.com/jeffdaily
2024-03-11 21:14:43 +00:00
Natalia Gimelshein
89add71168 fix synchronization behavior for copies with type change (#121341)
Fixes #121320

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121341
Approved by: https://github.com/albanD
2024-03-11 17:09:45 +00:00
Aidyn-A
ca9678405a [CUDA graphs] Pool argument for make_graphed_callables (#121475)
It is just a nice feature to have for the situations when users want multiple graphs captures and/or graphed callables to share the same memory pool.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121475
Approved by: https://github.com/eellison, https://github.com/eqy
2024-03-09 00:15:38 +00:00
Jane Xu
9d6c5be781 Add ASGD capturable API for forloop (#121264)
@tfsingh I got to it first--wanted to land this stack and close the gap ASAP.

This PR also fixes a discrepancy between `_init_group` and `__set_state__` because we have the constants live on params' device always.

There are some next steps though:
- ASGD can be made faster by making etas, mus, steps be on CPU when NOT capturable. (I had mistakenly thought foreachifying was faster and so we landed https://github.com/pytorch/pytorch/pull/107857, but it is slower). No one has complained yet though.  ¯\_(ツ)_/¯

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121264
Approved by: https://github.com/albanD
ghstack dependencies: #121260
2024-03-08 00:00:30 +00:00
Jane Xu
24821fec26 Add RAdam capturable API for forloop (#121260)
Implementation thanks to @MarouaneMaatouk in https://github.com/pytorch/pytorch/pull/118697, though I've since cleaned it up a lot to save perf on the rect < 5 eager case. It also just looks better now :) Added tests and the cudagraph health check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121260
Approved by: https://github.com/mlazos
2024-03-08 00:00:30 +00:00
Jane Xu
53bdae736d Add capturable single tensor Adamax (#121183)
Finishes the work started in https://github.com/pytorch/pytorch/pull/118697. Thanks @MarouaneMaatouk for the attempt, but due to inactivity I have opened this PR for Adamax. Note that the new capturable implementation is much simpler and I've modified the foreach capturable impl--it now calls fewer kernels and is more easily comparable to forloop.

Next steps:
* This PR discovered two bugs: #121178 and #121238.
* Move the now hefty graph optim tests in test_cuda to use OptimInfo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121183
Approved by: https://github.com/albanD
2024-03-07 17:57:02 +00:00
Aaron Enye Shi
aa36821615 [Memory Snapshot] Stop clearing history when changing context (#120436)
Summary:
This change will avoid clearing the memory event history, when changing the context from `record_memory_history(context=None)` to `record_memory_history(context="python")`.

Now it will continue recording memory events with changing context on the fly. Only `record_memory_history(enabled=None)` will clear the history.

Test Plan:
# Ran on the following local Resnet50 example:

- At iteration=0, record_memory_history(context=None, stacks="python")
- At iteration=3, record_memory_history(context="all", stacks="python")
- After iteration=4, export_memory_snapshot()

## Before:
 - Only collects the last 2 iterations with python call stacks.
![image](https://github.com/pytorch/pytorch/assets/17602366/86154532-9f73-4d10-9194-19e8c96ee4f3)

## After:
 - Collects all 5 iterations, where first 3 iterations have no call stacks, and last 2 iterations have python call stacks.
![image](https://github.com/pytorch/pytorch/assets/17602366/c2c277d6-b400-4da2-85c8-a7f119d409f8)
![image](https://github.com/pytorch/pytorch/assets/17602366/dc9da2f8-41cc-44b0-9c32-ec3cbe79d2c4)

Differential Revision: D54084017

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120436
Approved by: https://github.com/zdevito, https://github.com/leitian
2024-02-28 22:46:26 +00:00
CaoE
113138aa55 add test cases for GradScaler on CPU (#109994)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109994
Approved by: https://github.com/jgong5, https://github.com/ezyang
2024-02-02 21:49:07 +00:00
Michael Lazos
800e2e823f Add compilable foreach RAdam support (#117912)
Fixes https://github.com/pytorch/pytorch/issues/117807

This brings the number of supported optimizers with `torch.compile` to 11/13 (!)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117912
Approved by: https://github.com/janeyx99
2024-01-27 04:32:27 +00:00
Aaron Shi
6ac284122b [Memory Snapshot] Track context for SEGMENT_FREE and SEGMENT_UNMAP (#118055)
Summary: Show the stack when SEGMENT_FREE and SEGMENT_UNMAP occurs. This may be useful for debugging such as when empty_cache() may cause a segment to be freed. If the free context is unavailable, resort to the segment allocation stack.

Test Plan: CI

Differential Revision: D52984953

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118055
Approved by: https://github.com/zdevito
2024-01-23 21:48:57 +00:00
Michael Lazos
aaae2d8bb6 Add compilable and capturable foreach adamax with tests (#117835)
Based off of https://github.com/pytorch/pytorch/pull/110345

Fixes https://github.com/pytorch/pytorch/issues/117812

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117835
Approved by: https://github.com/janeyx99
2024-01-20 05:29:05 +00:00
Masaki Kozuki
1d14adfa66 [mta] Fused SGD (#116585)
depends on #116583

rel:
- #94791

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116585
Approved by: https://github.com/janeyx99
2024-01-16 23:54:38 +00:00
CaoE
29516bd2a0 add _amp_foreach_non_finite_check_and_unscale_cpu_ and _amp_update_scale_cpu_ kernels on CPU (#109281)
Step1 of https://github.com/pytorch/pytorch/issues/111559.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109281
Approved by: https://github.com/jgong5, https://github.com/ezyang
2024-01-16 15:25:08 +00:00
Ting Lu
c167c34396 Skip unsupported tests on arm (#117344)
add skips to tests that involve record_context_cpp on ARM as it is only supported on linux x86_64 arch. Error is reported as below:
```
Traceback (most recent call last):
  File "/usr/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/usr/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2674, in wrapper
    method(*args, **kwargs)
  File "/opt/pytorch/pytorch/test/test_cuda.py", line 3481, in test_direct_traceback
    c = gather_traceback(True, True, True)
RuntimeError: record_context_cpp is not support on non-linux non-x86_64 platforms
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117344
Approved by: https://github.com/malfet, https://github.com/drisspg
2024-01-12 21:12:11 +00:00
Doe Hyun Yoon
83c45a9931 Faster gc_count update for CUDACachingAllocator (and avoid nullptr de… (#117064)
…reference) (#109065)

Summary:

Modify the way we update gc_count in CUDACachingAlloctor to make it faster.

Originally D48481557, but reverted due to nullptr dereference in some cases (D49003756). This diff changed to use correct constructor for search key (so avoid nullptr dereference). Also, added nullptr check (and returns 0 if it is) in gc_count functions.

Differential Revision: D49068760

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117064
Approved by: https://github.com/zdevito
2024-01-11 19:47:05 +00:00
Nikita Shulga
a6325ad86c Fix cuInit test on Windows (#117055)
By changing library name from `libcuda.so.1` to `nvcuda.dll` on Windows

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117055
Approved by: https://github.com/Skylion007, https://github.com/huydhn, https://github.com/atalman
2024-01-10 00:45:18 +00:00
Nikita Shulga
81b7a09d27 [CI] Test that cuInit is not called during import (#117010)
By making a driver API call in subprocess and expecting it to return `CUDA_ERROR_NOT_INITIALIZED`

Test Plan: run it on nighties before https://github.com/pytorch/pytorch/pull/116201 got reverted and observe the failure

This is very important for lots of distributed launchers

Fixes https://github.com/pytorch/pytorch/issues/116276

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117010
Approved by: https://github.com/albanD
2024-01-09 14:44:22 +00:00
Aaron Gokaslan
95041829c8 Add bfloat16 CUDA support to RNN (#116927)
Fixes #116925
Fixes #116763

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116927
Approved by: https://github.com/malfet
2024-01-06 22:55:34 +00:00
Aaron Gokaslan
3fe437b24b [BE]: Update flake8 to v6.1.0 and fix lints (#116591)
Updates flake8 to v6.1.0 and fixes a few lints using sed and some ruff tooling.
- Replace `assert(0)` with `raise AssertionError()`
- Remove extraneous parenthesis i.e.
  - `assert(a == b)` -> `assert a == b`
  - `if(x > y or y < z):`->`if x > y or y < z:`
  - And `return('...')` -> `return '...'`

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116591
Approved by: https://github.com/albanD, https://github.com/malfet
2024-01-03 06:04:44 +00:00
Aaron Gokaslan
bd10fea79a [BE]: Enable F821 and fix bugs (#116579)
Fixes #112371

I tried to fix as many of the bugs as I could, a few I could not figure out what the proper fix for them was though and so I left them with noqas.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116579
Approved by: https://github.com/ezyang
2024-01-01 08:40:46 +00:00
zdevito
4afe2687d5 Reland "Serve multistream graph captures from correct pool (#114647)" (#116199)
Fixes a variable shadowing problem that broke internal builds.

This reverts commit fe15645619.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116199
Approved by: https://github.com/eellison
2023-12-20 21:22:34 +00:00
PyTorch MergeBot
fe15645619 Revert "Serve multistream graph captures from correct pool (#114647)"
This reverts commit 8a445f7bd5.

Reverted https://github.com/pytorch/pytorch/pull/114647 on behalf of https://github.com/jeanschmidt due to breaking multiple internal build jobs, please check internal diff in order to obtain more details ([comment](https://github.com/pytorch/pytorch/pull/114647#issuecomment-1864840724))
2023-12-20 17:11:42 +00:00
zdevito
8a445f7bd5 Serve multistream graph captures from correct pool (#114647)
This fixes #114320 by placing the logic for determining whether to allocate
to a pool inside a callback that is controlled by CUDAGraph.cpp or by the
python bound api to allocate a stream directly to a pool.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114647
Approved by: https://github.com/ngimel, https://github.com/eellison
2023-12-18 18:24:15 +00:00
rzou
8ddca5aeae markDynamoStrictTest some more tests (#115857)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115857
Approved by: https://github.com/voznesenskym
ghstack dependencies: #115845, #115855, #115856
2023-12-15 01:22:38 +00:00
atalman
43e3242490 [BE] Remove test corner cases for CUDA older than supported 11.8 (#114989)
Remove deprecated CUDA use cases from tests.
Similar to: https://github.com/pytorch/pytorch/pull/112873

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114989
Approved by: https://github.com/malfet
2023-12-04 21:41:03 +00:00
eqy
6a86cf00ad [CUDA][cuBLAS] Remove explicit cuBLAS workspace allocation for CUDA 12.2+ (#113994)
cuBLAS should be using `cudaMallocAsync` in CUDA 12.2+, which removes the need for explicit workspace allocation to avoid increasing memory usage with multiple graph captures.

CC @ptrblck @malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113994
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-11-22 23:23:51 +00:00
Banit Agrawal
cc776d2186 [PyTorch Pinned Allocator] Create per thread task pool for mapping memory space (#111545)
Differential Revision: D50443865

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111545
Approved by: https://github.com/zdevito
2023-10-22 00:23:49 +00:00
Kazuaki Ishizaki
a603dcc307 Fix typo under test directory (#110826)
This PR fixes typo `the the` of comments in files under `test` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110826
Approved by: https://github.com/Skylion007
2023-10-08 20:52:38 +00:00
Banit Agrawal
64583c4d04 [CUDA Host Allocator] Add support of CudaHostRegister (#108488)
Summary: This diff adds another option to create cuda pinned memory using cudaHostRegister.

Differential Revision: D45843715

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108488
Approved by: https://github.com/zdevito
2023-10-06 04:13:02 +00:00
Aidyn-A
e7bd9c5315 [CUDA][CUDA Graphs] Fix CUDAGraph::reset function (#108896)
The following two cases fail due to a small oversight `CUDAGraph::reset()` that causes failures in graph destructor
```Python
import torch

x = torch.zeros(4, device="cuda")
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    x = x + 1

g.reset()
del g
```
that fails with:
```
terminate called after throwing an instance of 'c10::Error'
  what():  uc >= 0 INTERNAL ASSERT FAILED at ".../pytorch/c10/cuda/CUDACachingAllocator.cpp":2157, please report a bug to PyTorch.
```

and reset and subsequent re-capture
```Python
import torch

x = torch.zeros(4, device="cuda")
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    x = x + 1

g.reset()

with torch.cuda.graph(g):
    x = x + 1
g.replay()
```
which fails with:
```
Traceback (most recent call last):
  File "test_graph.py", line 11, in <module>
    with torch.cuda.graph(g):
  File ".../pytorch/torch/cuda/graphs.py", line 192, in __enter__
    self.cuda_graph.capture_begin(
  File ".../pytorch/torch/cuda/graphs.py", line 77, in capture_begin
    super().capture_begin(pool=pool, capture_error_mode=capture_error_mode)
RuntimeError: This CUDAGraph instance already owns a captured graph. To capture a new graph, create a new instance.

```

This PR fixes `CUDAGraph::reset()` function for above to use cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108896
Approved by: https://github.com/ezyang
2023-09-11 19:49:31 +00:00
Michael Lazos
b193f295b6 Add capturable ASGD impl (#107857)
Add capturable ASGD impl + test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107857
Approved by: https://github.com/janeyx99
2023-09-07 06:30:30 +00:00
Banit Agrawal
b8af8ac784 [CUDACaching Allocator] Release the allocator lock on the slow path (#108367)
Summary: This diff is to release the global allocator lock on the slow path when we do synchronous cudaMalloc call.

Differential Revision: D48750077

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108367
Approved by: https://github.com/zdevito
2023-09-02 02:52:25 +00:00
Elias Ellison
0a9778a372 Expose cudaStreamCaptureMode in CUDA Graphs, use local setting in inductor (#107407)
>  capture_error_mode (str, optional): specifies the cudaStreamCaptureMode for the graph capture stream.
Can be "global", "thread_local" or "relaxed". During cuda graph capture, some actions, such as cudaMalloc,
 may be unsafe. "global" will error on actions in other threads, "thread_local" will only error for
 actions in the current thread, and "relaxed" will not error on these actions.

Inductor codegen is single-threaded, so it should be safe to enable "thread_local" for inductor's cuda graph capturing. We have seen errors when inductor cudagraphs has been used concurrently with data preprocessing in other threads.

Differential Revision: [D48656014](https://our.internmc.facebook.com/intern/diff/D48656014)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107407
Approved by: https://github.com/albanD, https://github.com/eqy
2023-08-25 01:44:26 +00:00
Zachary DeVito
cc54448a07 [memory snapshot] add 'address' key to block (#107171)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107171
Approved by: https://github.com/ngimel
2023-08-23 18:57:24 +00:00
Aaron Gokaslan
660e8060ad [BE]: Update ruff to 0.285 (#107519)
This updates ruff to 0.285 which is faster, better, and have fixes a bunch of false negatives with regards to fstrings.

I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-22 23:16:38 +00:00
PyTorch MergeBot
d59a6864fb Revert "[BE]: Update ruff to 0.285 (#107519)"
This reverts commit 88ab3e4322.

Reverted https://github.com/pytorch/pytorch/pull/107519 on behalf of https://github.com/ZainRizvi due to Sorry, but this PR breaks internal tests. @ezyang, can you please hep them get unblocked? It seems like one of the strings was prob accidentally modified ([comment](https://github.com/pytorch/pytorch/pull/107519#issuecomment-1688833480))
2023-08-22 19:53:32 +00:00
Aaron Gokaslan
88ab3e4322 [BE]: Update ruff to 0.285 (#107519)
This updates ruff to 0.285 which is faster, better, and have fixes a bunch of false negatives with regards to fstrings.

I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-20 01:36:18 +00:00
lcskrishna
bc662ffff9 [ROCm] Update ROCm skip decorators (#106138)
This PR adds a msg argument for skipIfRocm and skipCUDAIfRocm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106138
Approved by: https://github.com/jataylo, https://github.com/jeffdaily, https://github.com/pruthvistony, https://github.com/albanD
2023-08-18 22:02:06 +00:00