Xu Han
ba81c3c290
[inductor] add cpp builder code. (take 2) ( #125849 )
...
A fully manual rebase of the code from PR https://github.com/pytorch/pytorch/pull/124045 .
The old PR appears to have broken due to too many commits and too many rebases. See: https://github.com/pytorch/pytorch/pull/124045#issuecomment-2103744588
-------
This PR is the first step of RFC https://github.com/pytorch/pytorch/issues/124245 .
Changes:
1. Add cpp builder code; the new cpp_builder supports Windows (a minimal usage sketch follows below).
2. Add a cross-OS CPU ISA checker backed by cpuinfo.
3. Switch the compiler ISA checker to the new cpp builder.
4. Make CppCodeCache use the new ISA checker.
5. Add a temporary `test_new_cpp_build_logical` UT to help with the transition to the new code.
<img width="1853" alt="Image" src="https://github.com/pytorch/pytorch/assets/8433590/ce6519ab-ba92-4204-b1d6-7d15d2ba2cbe ">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125849
Approved by: https://github.com/jgong5 , https://github.com/desertfire
2024-06-07 20:49:58 +00:00
Animesh Jain
662a78f957
[dynamo] Inline the getattr of fx graph and proxy graph ( #128172 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128172
Approved by: https://github.com/yanboliang
ghstack dependencies: #128001 , #126578 , #128158
2024-06-07 17:14:58 +00:00
Xuehai Pan
c97e3ebb96
Fix wrongly exposed variables in torch/__init__.py ( #127795 )
...
<img width="609" alt="image" src="https://github.com/pytorch/pytorch/assets/16078332/964c6707-1856-4c2c-8cd8-ce1d96d38d36 ">
This PR removes temporary variables in `torch/__init__.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127795
Approved by: https://github.com/albanD
2024-06-06 08:31:41 +00:00
PyTorch MergeBot
d1fad416a8
Revert "Add aten._unsafe_masked_index ( #116491 )"
...
This reverts commit f03f8bc901 .
Reverted https://github.com/pytorch/pytorch/pull/116491 on behalf of https://github.com/PaliC due to breaking onnx tests ([comment](https://github.com/pytorch/pytorch/pull/116491#issuecomment-2145557724 ))
2024-06-03 15:51:50 +00:00
Isuru Fernando
f03f8bc901
Add aten._unsafe_masked_index ( #116491 )
...
Adds an op for generating masked indexing operations that produce
masked loads in Triton code.
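A minimal sketch of the intended semantics in eager PyTorch (the helper name is hypothetical): lanes where the mask is false ignore their possibly out-of-bounds index and take a fill value instead, which is what lets Inductor emit a masked load in Triton.
~~~python
import torch

# Hypothetical eager reference; indices are clamped so masked-off
# out-of-bounds lanes stay safe outside of the compiled path.
def masked_index_reference(x, mask, indices, fill):
    safe = indices.clamp(0, x.numel() - 1)
    gathered = x.flatten()[safe]
    return torch.where(mask, gathered, torch.full_like(gathered, fill))

x = torch.arange(8.0)
idx = torch.tensor([0, 3, 100])           # 100 is out of bounds
mask = torch.tensor([True, True, False])  # the masked lane ignores the OOB index
print(masked_index_reference(x, mask, idx, fill=0.0))  # tensor([0., 3., 0.])
~~~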
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116491
Approved by: https://github.com/lezcano , https://github.com/peterbell10
2024-06-03 14:44:03 +00:00
lezcano
48538d3d14
Implement svd_lowrank and pca_lowrank for complex numbers ( #125580 )
...
We fix a number of bugs previously present in the complex
implementation.
We also heavily simplify the implementation, using, among
other things, the fact that we now have conjugate views.
There is a comment noting how slow some checks on this function are,
so I removed quite a few of the input combinations to make the OpInfo
lighter, while leaving a couple of relevant examples to avoid regressing
coverage.
Fixes https://github.com/pytorch/pytorch/issues/122188
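A minimal usage sketch on a complex input (shapes and rank are arbitrary):
~~~python
import torch

A = torch.randn(20, 12, dtype=torch.complex64)
U, S, V = torch.svd_lowrank(A, q=6)
# For complex inputs the reconstruction uses the conjugate transpose.
approx = U @ torch.diag(S).to(A.dtype) @ V.mH
print((A - approx).norm() / A.norm())  # relative error of the rank-6 approximation
~~~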
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125580
Approved by: https://github.com/pearu , https://github.com/peterbell10
2024-05-30 14:45:58 +00:00
Andrew M. James
80a8fc07b2
[dynamo] Handle np.iinfo/finfo/dtype as input ( #124482 )
...
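A hedged illustration of the titled behavior (the function and inputs are hypothetical): Dynamo can now accept `np.iinfo`/`np.finfo`/`np.dtype` objects as inputs without graph-breaking.
~~~python
import numpy as np
import torch

@torch.compile(fullgraph=True)  # fullgraph assumes no graph break on the finfo input
def clamp_to_dtype_max(x, info):
    return x.clamp(max=float(info.max))

print(clamp_to_dtype_max(torch.tensor([1.0, 1e30]), np.finfo(np.float16)))
~~~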
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124482
Approved by: https://github.com/lezcano
ghstack dependencies: #124481
2024-05-29 16:00:15 +00:00
Animesh Jain
1507d5205a
[dynamo][fsdp] Skip Dynamo tracing of __getattr__ if its top-level frame ( #127263 )
...
The generated bytecode for the first frame is below, with inlined comments about the LOAD_ATTR instructions that cause Dynamo to trigger again on `__getattr__`.
~~~
[__bytecode] MODIFIED BYTECODE fn /data/users/anijain/pytorch2/test/dynamo/test_activation_checkpointing.py line 1129
[__bytecode] 1129 0 COPY_FREE_VARS 1
[__bytecode] 2 RESUME 0
[__bytecode] 4 PUSH_NULL
[__bytecode] 6 LOAD_GLOBAL 10 (__compiled_fn_1)
[__bytecode] 18 LOAD_FAST 0 (x)
[__bytecode] 20 LOAD_DEREF 1 (mod)
[__bytecode] 22 LOAD_ATTR 6 (_checkpoint_wrapped_module)
[__bytecode] 32 LOAD_CONST 1 (0)
[__bytecode] 34 BINARY_SUBSCR
[__bytecode] 44 LOAD_ATTR 7 (weight)
[__bytecode] 54 LOAD_DEREF 1 (mod)
[__bytecode] 56 LOAD_ATTR 6 (_checkpoint_wrapped_module)
[__bytecode] 66 LOAD_CONST 1 (0)
[__bytecode] 68 BINARY_SUBSCR
[__bytecode] 78 LOAD_ATTR 8 (bias)
# When this optimized bytecode is executed, these two lines call the __getattr__
# of the ActivationWrapper module, so Dynamo gets invoked on __getattr__.
# If we had inlined __getattr__ during tracing, we would have seen the LOAD_ATTR
# on lower-level data structures like _modules, obviating the need for CPython
# to call the Python-overridden __getattr__. But today, UnspecializedNNModuleVariable
# calls Python getattr at tracing time (instead of inlining it), resulting in LOAD_ATTR
# on the module itself.
# To make Dynamo skip tracing of __getattr__ on the optimized bytecode,
# we can check whether it is the top-level frame and just skip it (a sketch of
# the pattern follows the bytecode).
[__bytecode] 88 LOAD_DEREF 1 (mod)
[__bytecode] 90 LOAD_ATTR 0 (a)
[__bytecode] 100 PRECALL 4
[__bytecode] 104 CALL 4
[__bytecode] 114 UNPACK_SEQUENCE 1
[__bytecode] 118 RETURN_VALUE
~~~
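A hedged illustration of the pattern above (the wrapper is hypothetical): a module whose `__getattr__` forwards to an inner wrapped module, the way activation-checkpoint wrappers do. Accessing `w.weight` from the optimized bytecode invokes `__getattr__`, which Dynamo now skips when it is the top-level frame.
~~~python
import torch

class Wrapper(torch.nn.Module):
    def __init__(self, inner):
        super().__init__()
        self._wrapped_module = inner

    def __getattr__(self, name):
        try:
            return super().__getattr__(name)  # checks _parameters/_buffers/_modules
        except AttributeError:
            return getattr(self._wrapped_module, name)  # forward to the inner module

w = Wrapper(torch.nn.Linear(4, 4))
print(w.weight.shape)  # resolved through Wrapper.__getattr__
~~~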
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127263
Approved by: https://github.com/yf225
2024-05-28 08:16:53 +00:00
Yu, Guangye
e7a42702f9
generalize custom_fwd&custom_bwd to be device-agnostic ( #126531 )
...
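A minimal sketch of the generalized decorators (the `device_type` argument follows the PR title; the autograd.Function itself is illustrative):
~~~python
import torch
from torch.amp import custom_fwd, custom_bwd

class Scale(torch.autograd.Function):
    @staticmethod
    @custom_fwd(device_type="cuda", cast_inputs=torch.float32)
    def forward(ctx, x):
        return x * 2

    @staticmethod
    @custom_bwd(device_type="cuda")
    def backward(ctx, grad_out):
        return grad_out * 2
~~~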
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126531
Approved by: https://github.com/jgong5 , https://github.com/gujinghui , https://github.com/albanD , https://github.com/EikanWang
ghstack dependencies: #126527
2024-05-25 06:48:16 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
72f0bdcc22
Remove torch._constrain_as_value ( #127103 )
...
Summary: This API doesn't do anything useful and should be subsumed by torch._check.
Test Plan: CI
Differential Revision: D57786740
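A minimal sketch of the replacement (the function below is illustrative): `torch._check` asserts a condition, optionally with a lazy message, and also informs the symbolic-shapes machinery under export/compile.
~~~python
import torch

def take(x, idx: int):
    torch._check(idx >= 0, lambda: f"idx must be non-negative, got {idx}")
    torch._check(idx < x.size(0))
    return x[idx]

print(take(torch.arange(5), 2))  # tensor(2)
~~~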
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127103
Approved by: https://github.com/angelayi
2024-05-24 22:49:46 +00:00
Oguz Ulgen
a6155d23d1
[easy] Delete dead code global ( #126903 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126903
Approved by: https://github.com/aorenste
ghstack dependencies: #126083
2024-05-23 08:29:29 +00:00
Oguz Ulgen
cc61d03ac9
Do not trace into triton/backends ( #126083 )
...
Fixes #125807
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126083
Approved by: https://github.com/yanboliang , https://github.com/jansel
2024-05-23 08:29:29 +00:00
Jack Taylor
d30cdc4321
[ROCm] amdsmi library integration ( #119182 )
...
Adds monitoring support for ROCm using amdsmi in place of pynvml.
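A minimal usage sketch (a GPU and the relevant management library assumed present): these monitoring helpers are backed by pynvml on CUDA and, with this change, by amdsmi on ROCm.
~~~python
import torch

if torch.cuda.is_available():
    print(torch.cuda.utilization())  # GPU utilization, percent
    print(torch.cuda.temperature())  # degrees Celsius
    print(torch.cuda.power_draw())   # average power draw, milliwatts
~~~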
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119182
Approved by: https://github.com/jeffdaily , https://github.com/malfet , https://github.com/xw285cornell
2024-05-21 01:59:26 +00:00
Jason Ansel
f9de510121
[dynamo] Graph break on set_num_threads ( #126623 )
...
Fixes #125364
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126623
Approved by: https://github.com/yanboliang
2024-05-20 17:44:32 +00:00
PyTorch MergeBot
315389bfed
Revert "Remove deprecated _aminmax operator ( #125995 )"
...
This reverts commit 0116ffae7f .
Reverted https://github.com/pytorch/pytorch/pull/125995 on behalf of https://github.com/huydhn due to Sorry for reverting your change but we need to reland this after I get rid of all usage of _aminmax internally in Meta ([comment](https://github.com/pytorch/pytorch/pull/125995#issuecomment-2113769497 ))
2024-05-16 01:45:37 +00:00
cyy
0116ffae7f
Remove deprecated _aminmax operator ( #125995 )
...
It has been deprecated for a long time.
Co-authored-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125995
Approved by: https://github.com/ezyang
2024-05-12 17:50:17 +00:00
PyTorch MergeBot
0d4fdb0bb7
Revert "[ROCm] amdsmi library integration ( #119182 )"
...
This reverts commit 85447c41e3 .
Reverted https://github.com/pytorch/pytorch/pull/119182 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but the ROCm failed test is legit 85447c41e3 ([comment](https://github.com/pytorch/pytorch/pull/119182#issuecomment-2103433197 ))
2024-05-09 21:18:21 +00:00
Jack Taylor
85447c41e3
[ROCm] amdsmi library integration ( #119182 )
...
Adds monitoring support for ROCm using amdsmi in place of pynvml.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119182
Approved by: https://github.com/jeffdaily , https://github.com/malfet , https://github.com/xw285cornell
2024-05-09 18:21:38 +00:00
Michael Lazos
1b1b18a7a4
Add LRScheduler Composability E2E Tests ( #125653 )
...
Adds tests to verify that the LRSchedulers correctly update the compiled optimizers without recompiles.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125653
Approved by: https://github.com/yanboliang
ghstack dependencies: #123751 , #123752 , #123753 , #125383
2024-05-09 00:52:43 +00:00
PyTorch MergeBot
2e237fcd70
Revert "[inductor] add cpp builder code. ( #124045 )"
...
This reverts commit 469383755f .
Reverted https://github.com/pytorch/pytorch/pull/124045 on behalf of https://github.com/clee2000 due to broke inductor/test_codecache and inductor/test_max_autotune 469383755f https://github.com/pytorch/pytorch/actions/runs/8996772350/job/24724775182 ([comment](https://github.com/pytorch/pytorch/pull/124045#issuecomment-2100851419 ))
2024-05-08 15:33:20 +00:00
Xu Han
469383755f
[inductor] add cpp builder code. ( #124045 )
...
The previous full PR, https://github.com/pytorch/pytorch/pull/115248 , failed to merge because fb_code was hard to debug.
I also tried to submit it as two pieces, https://github.com/pytorch/pytorch/pull/118514 and https://github.com/pytorch/pytorch/pull/118515 , and they passed PreCI at the time.
Now I am splitting https://github.com/pytorch/pytorch/pull/115248 into smaller pieces; this is the first step of RFC https://github.com/pytorch/pytorch/issues/124245 .
Changes:
1. Add cpp builder code; the new cpp_builder supports Windows.
2. Add a cross-OS CPU ISA checker backed by cpuinfo.
3. Switch the compiler ISA checker to the new cpp builder.
4. Make CppCodeCache use the new ISA checker.
5. Add a temporary `test_new_cpp_build_logical` UT to help with the transition to the new code.
<img width="1853" alt="Image" src="https://github.com/pytorch/pytorch/assets/8433590/ce6519ab-ba92-4204-b1d6-7d15d2ba2cbe ">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124045
Approved by: https://github.com/jgong5 , https://github.com/jansel
2024-05-08 05:27:15 +00:00
PyTorch MergeBot
2f79a18324
Revert "[inductor] add cpp builder code. ( #124045 )"
...
This reverts commit 7864d287a1 .
Reverted https://github.com/pytorch/pytorch/pull/124045 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing trunk jobs 7864d287a1 including lint ([comment](https://github.com/pytorch/pytorch/pull/124045#issuecomment-2099306071 ))
2024-05-07 21:04:49 +00:00
Xu Han
7864d287a1
[inductor] add cpp builder code. ( #124045 )
...
The previous full PR, https://github.com/pytorch/pytorch/pull/115248 , failed to merge because fb_code was hard to debug.
I also tried to submit it as two pieces, https://github.com/pytorch/pytorch/pull/118514 and https://github.com/pytorch/pytorch/pull/118515 , and they passed PreCI at the time.
Now I am splitting https://github.com/pytorch/pytorch/pull/115248 into smaller pieces; this is the first step of RFC https://github.com/pytorch/pytorch/issues/124245 .
Changes:
1. Add cpp builder code; the new cpp_builder supports Windows.
2. Add a cross-OS CPU ISA checker backed by cpuinfo.
3. Switch the compiler ISA checker to the new cpp builder.
4. Make CppCodeCache use the new ISA checker.
5. Add a temporary `test_new_cpp_build_logical` UT to help with the transition to the new code.
<img width="1853" alt="Image" src="https://github.com/pytorch/pytorch/assets/8433590/ce6519ab-ba92-4204-b1d6-7d15d2ba2cbe ">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124045
Approved by: https://github.com/jgong5 , https://github.com/jansel
2024-05-07 20:07:41 +00:00
Aaron Gokaslan
1dd42e42c4
[BE]: Try TCH autofixes on torch/ ( #125536 )
...
Tries TCH autofixes to see what breaks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125536
Approved by: https://github.com/ezyang
2024-05-05 23:13:59 +00:00
Chien-Chin Huang
1eb7b8eb60
[PT2D] Ensure the trace rules are correct with distributed ( #125333 )
...
Summary:
1. Avoid using `torch._dynamo.disable`.
2. Clear the LRU cache of the trace rules. This won't do anything if the rules are not evaluated before PG initialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125333
Approved by: https://github.com/yanboliang
2024-05-02 16:28:38 +00:00
Brian Hirsh
5173cbe260
fix FakeTensor creation on noncontiguous subclasses ( #124399 )
...
Fixes https://github.com/pytorch/pytorch/issues/125287
Fixes https://github.com/pytorch/pytorch/issues/124090 , context on the issue
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124399
Approved by: https://github.com/soulitzer
ghstack dependencies: #124398
2024-05-01 21:56:01 +00:00
Yanbo Liang
ce503c1b40
Dynamo x autograd.Function supports setup_context ( #124802 )
...
Fixes part of #118397
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124802
Approved by: https://github.com/zou3519
2024-04-27 04:57:13 +00:00
Guilherme Leobas
763dc26e59
[Dynamo] Add dynamo support to torch.func.linearize ( #123118 )
...
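A hedged illustration (the function is illustrative): `torch.func.linearize` returns the primal output plus a jvp function, and per this PR Dynamo can now trace through it instead of graph-breaking.
~~~python
import torch
from torch.func import linearize

def f(x):
    return x.sin()

x = torch.randn(3)
out, jvp_fn = linearize(f, x)
print(jvp_fn(torch.ones(3)))  # equals cos(x) * tangent
~~~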
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123118
Approved by: https://github.com/zou3519
2024-04-23 21:31:49 +00:00
Peter Bell
7ecbbc40c3
[HOP][inductor] Add higher order associative scan operator ( #119430 )
...
Currently only supports single tensor scans, e.g. `cumsum`, `cumprod`, `logcumsumexp`
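A hedged sketch (import path and call signature are assumed for this PR's era; the op targets the compiled/Triton path, so a GPU is assumed): `cumsum` expressed as an associative scan.
~~~python
import torch
from torch._higher_order_ops.associative_scan import associative_scan

def combine(a, b):
    return a + b  # any associative pointwise combine function

if torch.cuda.is_available():
    x = torch.randn(8, device="cuda")
    out = associative_scan(combine, x, 0)
    print(torch.allclose(out, torch.cumsum(x, dim=0), atol=1e-5))
~~~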
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119430
Approved by: https://github.com/Chillee
2024-04-23 14:40:13 +00:00
Jeff Daily
6ede882c0b
preferred blas library; cublaslt gemm implementation ( #122106 )
...
Following the example of PyTorch supporting a preferred linalg library (cusolver or magma), this PR introduces a preferred BLAS library selector of either cublas or cublaslt for CUDA, and hipblas or hipblaslt for ROCm, via normal hipification of sources.
The default BLAS implementation remains cublas (or hipblas). cublaslt (or hipblaslt) can be enabled with the environment variable TORCH_BLAS_PREFER_CUBLASLT=1 (or its alias TORCH_BLAS_PREFER_HIPBLASLT=1), or by calling `torch.backends.cuda.preferred_blas_library(backend="cublaslt")` (alias: `backend="hipblaslt"`).
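A minimal usage sketch based on the description above (GPU assumed):
~~~python
import torch

torch.backends.cuda.preferred_blas_library("cublaslt")  # or "cublas"
if torch.cuda.is_available():
    a = torch.randn(128, 64, device="cuda")
    b = torch.randn(64, 32, device="cuda")
    c = a @ b  # GEMMs dispatch through the preferred backend where supported
~~~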
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122106
Approved by: https://github.com/lezcano
2024-04-22 15:38:22 +00:00
Aaron Gokaslan
29cc293725
[BE]: FURB142 - Remove set mutations. Use set update ( #124551 )
...
Uses bulk set mutation methods (`update`, `difference_update`, etc.) instead of manually reimplementing them.
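A minimal illustration of the pattern (hypothetical values):
~~~python
s = {1, 2}

# Before: element-wise mutation in a loop.
for x in (3, 4):
    s.add(x)

# After: bulk set mutation methods.
s.update((5, 6))
s.difference_update({1})
print(s)  # {2, 3, 4, 5, 6}
~~~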
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124551
Approved by: https://github.com/ezyang
2024-04-21 14:12:33 +00:00
Animesh Jain
febc4d8759
[dynamo][easy] forbid_in_graph check to use getattr_static ( #124445 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124445
Approved by: https://github.com/yanboliang , https://github.com/jansel
2024-04-20 14:11:05 +00:00
soulitzer
cf5ca58e7f
[NJT] Inline through torch.nested.nested_tensor_from_jagged instead of graph break ( #124343 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124343
Approved by: https://github.com/jbschlosser
2024-04-19 23:13:59 +00:00
PyTorch MergeBot
4a0900d04b
Revert "[NJT] Inline through torch.nested.nested_tensor_from_jagged instead of graph break ( #124343 )"
...
This reverts commit ef93402f61 .
Reverted https://github.com/pytorch/pytorch/pull/124343 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/124343#issuecomment-2064937192 ))
2024-04-18 18:55:48 +00:00
Jason Ansel
7a6edb0b66
Possible fix for einops warning ( #124084 )
...
See https://github.com/arogozhnikov/einops/issues/315
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124084
Approved by: https://github.com/peterbell10
2024-04-18 17:09:50 +00:00
soulitzer
ef93402f61
[NJT] Inline through torch.nested.nested_tensor_from_jagged instead of graph break ( #124343 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124343
Approved by: https://github.com/jbschlosser
2024-04-18 14:42:54 +00:00
Aleksandar Samardžić
f5331aade5
Simplify ATen sparse semi-structured operators based on CUTLASS ( #123473 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123473
Approved by: https://github.com/cpuhrsch
2024-04-14 06:57:41 +00:00
PyTorch MergeBot
97261be0a8
Revert "Simplify ATen sparse semi-structured operators based on CUTLASS ( #123473 )"
...
This reverts commit b2a0b8c446 .
Reverted https://github.com/pytorch/pytorch/pull/123473 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/123473#issuecomment-2053561077 ))
2024-04-13 07:47:32 +00:00
rzou
5d1f9bd2bc
Move the trace_rules.py docs up ( #123873 )
...
I always remember that these docs exist but can never actually find them in the
file because they are around line 3000. Moving them to the top of the file for
visibility.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123873
Approved by: https://github.com/yanboliang
2024-04-12 20:18:38 +00:00
Aleksandar Samardžić
b2a0b8c446
Simplify ATen sparse semi-structured operators based on CUTLASS ( #123473 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123473
Approved by: https://github.com/cpuhrsch
2024-04-11 11:56:27 +00:00
Guilherme Leobas
2a37793249
[Dynamo] Ensure that Higher Order Ops can be composed in dynamo ( #123357 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123357
Approved by: https://github.com/zou3519
ghstack dependencies: #122211
2024-04-09 18:50:17 +00:00
Guilherme Leobas
dbe0c474a9
Ensure all torch.func.* functions capture can be disabled ( #122212 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122212
Approved by: https://github.com/zou3519
ghstack dependencies: #122211
2024-04-05 03:29:11 +00:00
Joel Schlosser
721dcaff94
Revert usage of NJT views in SDPA ( #123215 )
...
For internal purposes, this PR reverts the use of real views in SDPA -> autograd.Function "views" (i.e. `ViewBufferFromNested` and `ViewNestedFromBuffer`). This is a temporary fix to get the FIRST model launched and working.
**Note: this breaks some other Dynamo tests related to SDPA that rely on real views, but the breakage there isn't expected to be likely in a real-world scenario.**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123215
Approved by: https://github.com/YuqingJ
2024-04-04 18:45:47 +00:00
PyTorch MergeBot
63d17d3c90
Revert "Revert usage of NJT views in SDPA ( #123215 )"
...
This reverts commit 0fcddb5625 .
Reverted https://github.com/pytorch/pytorch/pull/123215 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but I think it needs to be skipped on ROCm 0fcddb5625 ([comment](https://github.com/pytorch/pytorch/pull/123215#issuecomment-2036080570 ))
2024-04-04 02:57:09 +00:00
Joel Schlosser
0fcddb5625
Revert usage of NJT views in SDPA ( #123215 )
...
For internal purposes, this PR reverts the use of real views in SDPA -> autograd.Function "views" (i.e. `ViewBufferFromNested` and `ViewNestedFromBuffer`). This is a temporary fix to get the FIRST model launched and working.
**Note: this breaks some other Dynamo tests related to SDPA that rely on real views, but the breakage there isn't expected to be likely in a real-world scenario.**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123215
Approved by: https://github.com/YuqingJ
2024-04-03 23:25:31 +00:00
rzou
44c0c0fc0f
Add torch.library.custom_op ( #122344 )
...
This is the entrypoint for defining an opaque/blackbox (i.e., PyTorch will
never peek into it) custom op. In this PR, you can specify backend impls
and the abstract impl for this op.
NB: most of this PR is docstrings, please don't be intimidated by the
line count.
There are a number of interesting features:
- we infer the schema from type hints. In a followup I add the ability
to manually specify a schema.
- name inference. The user needs to manually specify an op name for now.
In a followup we add the ability to automatically infer a name (this
is a little tricky).
- custom_op registrations can override each other. This makes them
more pleasant to work with in environments like colab.
- we require that the outputs of the custom_op do not alias any inputs
or each other. We enforce this via a runtime check, but can relax this
into an opcheck test if it really matters in the future.
Test Plan:
- new tests
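A hedged sketch of the entrypoint (decorator and method names follow the API as released; the op name and NumPy body are illustrative). The schema is inferred from the type hints, as described above.
~~~python
import numpy as np
import torch
from torch.library import custom_op

@custom_op("mylib::numpy_sin", mutates_args=())
def numpy_sin(x: torch.Tensor) -> torch.Tensor:
    # Opaque body: PyTorch never traces into this.
    return torch.from_numpy(np.sin(x.numpy()))

@numpy_sin.register_fake  # abstract impl for meta/fake tensors
def _(x):
    return torch.empty_like(x)

print(numpy_sin(torch.randn(3)))
~~~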
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122344
Approved by: https://github.com/ezyang , https://github.com/albanD
2024-04-03 18:36:17 +00:00
willfengg
f1c4d0fb2c
[dynamo] show inlining reasons from trace_rules ( #123014 )
...
Show specific inlining reasons with ``TORCH_LOGS="+dynamo" TORCHDYNAMO_VERBOSE=1``:
* before: ``INLINING <code...>, inlined according trace_rules.lookup``
* after: ``INLINING <code...> inlined according trace_rules.lookup MOD_INLINELIST``
This distinguishes inlining by default from inlining via MOD_INLINELIST (a specific rule).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123014
Approved by: https://github.com/jansel
ghstack dependencies: #123013
2024-04-02 03:04:22 +00:00
willfengg
d765e223ac
[dynamo][PT2D] avoid skipping dynamo_resume_* in torch/testing/_internal ( #123013 )
...
This PR ensures that ``dynamo_resume_`` functions survive ``trace_rules.py``. As a ground truth, modules defined outside the ``pytorch/torch`` folder can survive ``trace_rules.py``.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123013
Approved by: https://github.com/jansel
2024-04-01 21:12:48 +00:00
Yu, Guangye
eb7adc3ae0
Refactor gpu trace to be device-agnostic ( #121794 )
...
# Motivation
Refactor gpu trace to be device-agnostic. GPU trace is usually used in runtime components, including Device, Stream, Event, Guard, and Allocator, so it should be device-agnostic and shareable among device backends.
# Solution
Move `_cuda_trace.py` to `_gpu_trace.py`, so that each device backend owns its callbacks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121794
Approved by: https://github.com/jgong5 , https://github.com/albanD , https://github.com/EikanWang , https://github.com/gujinghui
2024-03-30 13:04:38 +00:00
Mikayla Gawarecki
487b6d40ec
Add RMSNorm module ( #121364 )
...
Similar to `torchmultimodal/modules/layers/normalizations.py` (L51 at commit dbeed9724b).
**The implementation here is not optimized and we welcome pull requests to improve this**
- Use `normalized_shape` instead of singular integer `dim` to be aligned with the `nn.LayerNorm` implementation
- Remove the upcast to float and downcast (see `normalizations.py` L73 at the same commit)
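A minimal usage sketch (`normalized_shape` mirrors `nn.LayerNorm`, as noted above):
~~~python
import torch

rms = torch.nn.RMSNorm(normalized_shape=8)
x = torch.randn(2, 4, 8)
print(rms(x).shape)  # torch.Size([2, 4, 8])
~~~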
Differential Revision: [D55485840](https://our.internmc.facebook.com/intern/diff/D55485840 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121364
Approved by: https://github.com/albanD
2024-03-29 18:05:28 +00:00