Commit Graph

197 Commits

Author SHA1 Message Date
atalman
244b124bb8 Add linux cpu test for 3.12 (#117853)
This is continuation of work: https://github.com/pytorch/pytorch/pull/113987

Co-authored-by: albanD <desmaison.alban@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117853
Approved by: https://github.com/albanD
2024-02-14 20:52:23 +00:00
Jason Ansel
2de24c11f6 [inductor] Slightly faster memory allocation on CUDA (#118255)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118255
Approved by: https://github.com/peterbell10
ghstack dependencies: #118065, #118070, #118171
2024-01-25 20:49:14 +00:00
Jason Ansel
817debeb89 [inductor] Slightly faster memory allocation on CPU (#118171)
Based on `python benchmarks/dynamo/microbenchmarks/overheads.py`:
- Before `12.2us`
- After `10.5us`

This is inspired by a2c17a2b00 -- but in Python rather than C++

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118171
Approved by: https://github.com/jgong5, https://github.com/peterbell10
ghstack dependencies: #118065, #118070
2024-01-25 16:54:57 +00:00
Jason Ansel
a669319450 [inductor] Faster C++ kernel python bindings (#117500)
Calling C++ from Python via ctypes is notoriously slow.  This switches to generating our own C++ bindings directly, which is a >5x speedup on this kernel-launch-bound microbenchmark:
```python
from ctypes import c_void_p
import torch
from torch import empty
from torch._inductor.codecache import AsyncCompile
from torch._dynamo.testing import rand_strided
from torch._inductor.utils import print_performance
from torch._inductor.wrapper_benchmark import compiled_module_main

async_compile = AsyncCompile()

src = '''
#include "/tmp/torchinductor_jansel/gb/cgbau5vlj6cetmcjbjbtw6x4rrivaln6f45s5d72gy2bfx5foz3k.h"
extern "C" void kernel(const float* in_ptr0,
                       float* out_ptr0)
{
    {
        auto tmp0 = in_ptr0[static_cast<long>(0L)];
        auto tmp1 = static_cast<float>(1.0);
        auto tmp2 = decltype(tmp0)(tmp0 + tmp1);
        out_ptr0[static_cast<long>(0L)] = tmp2;
    }
}
'''

cpp_fused_add_ctypes = async_compile.cpp(src)
cpp_fused_add_cpython = async_compile.cpp_pybinding(["const float*", "float*"], src)

async_compile.wait(globals())
del async_compile

def call(arg0_1):
    buf0 = empty((1,), device='cpu', dtype=torch.float32)
    if use_ctypes:
        for _ in range(100):
            cpp_fused_add_ctypes(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    else:
        for _ in range(100):
            cpp_fused_add_cpython(arg0_1, buf0)
    del arg0_1
    return (buf0,)

def benchmark_compiled_module(times=1000, repeat=100):
    arg0_1 = rand_strided((1,), (1,), device='cpu', dtype=torch.float32)
    return print_performance(lambda: call(arg0_1), times=times, repeat=repeat)

print("old ctypes bindings: ", end='')
use_ctypes = True
compiled_module_main('None', benchmark_compiled_module)
print("new bindings:        ", end='')
use_ctypes = False
compiled_module_main('None', benchmark_compiled_module)
```
Output:
```
old ctypes bindings: 0.000073
new bindings:        0.000013
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117500
Approved by: https://github.com/desertfire
2024-01-18 16:20:12 +00:00
Nikita Shulga
a1afd1b195 Revert "[inductor] Faster C++ kernel python bindings (#117500)"
It should have never been landed, but was landed again, thanks to
ghstack grafting/ungrafting see discussion on https://github.com/pytorch/pytorch/pull/116910

This reverts commit e457b6fb18.
2024-01-17 17:06:32 -08:00
titaiwangms
e457b6fb18 [inductor] Faster C++ kernel python bindings (#117500)
Calling C++ from Python via ctypes is notoriously slow.  This switches to generating our own C++ bindings directly, which is a >5x speedup on this kernel-launch-bound microbenchmark:
```python
from ctypes import c_void_p
import torch
from torch import empty
from torch._inductor.codecache import AsyncCompile
from torch._dynamo.testing import rand_strided
from torch._inductor.utils import print_performance
from torch._inductor.wrapper_benchmark import compiled_module_main

async_compile = AsyncCompile()

src = '''
#include "/tmp/torchinductor_jansel/gb/cgbau5vlj6cetmcjbjbtw6x4rrivaln6f45s5d72gy2bfx5foz3k.h"
extern "C" void kernel(const float* in_ptr0,
                       float* out_ptr0)
{
    {
        auto tmp0 = in_ptr0[static_cast<long>(0L)];
        auto tmp1 = static_cast<float>(1.0);
        auto tmp2 = decltype(tmp0)(tmp0 + tmp1);
        out_ptr0[static_cast<long>(0L)] = tmp2;
    }
}
'''

cpp_fused_add_ctypes = async_compile.cpp(src)
cpp_fused_add_cpython = async_compile.cpp_pybinding(["const float*", "float*"], src)

async_compile.wait(globals())
del async_compile

def call(arg0_1):
    buf0 = empty((1,), device='cpu', dtype=torch.float32)
    if use_ctypes:
        for _ in range(100):
            cpp_fused_add_ctypes(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    else:
        for _ in range(100):
            cpp_fused_add_cpython(arg0_1, buf0)
    del arg0_1
    return (buf0,)

def benchmark_compiled_module(times=1000, repeat=100):
    arg0_1 = rand_strided((1,), (1,), device='cpu', dtype=torch.float32)
    return print_performance(lambda: call(arg0_1), times=times, repeat=repeat)

print("old ctypes bindings: ", end='')
use_ctypes = True
compiled_module_main('None', benchmark_compiled_module)
print("new bindings:        ", end='')
use_ctypes = False
compiled_module_main('None', benchmark_compiled_module)
```
Output:
```
old ctypes bindings: 0.000073
new bindings:        0.000013
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117500
Approved by: https://github.com/desertfire
ghstack dependencies: #117409, #116667, #117591
2024-01-17 23:03:15 +00:00
PyTorch MergeBot
da6abaeeac Revert "[inductor] Faster C++ kernel python bindings (#117500)"
This reverts commit bb0fd1bd3c.

Reverted https://github.com/pytorch/pytorch/pull/117500 on behalf of https://github.com/PaliC due to breaking internal discussed with author offline ([comment](https://github.com/pytorch/pytorch/pull/117500#issuecomment-1896516512))
2024-01-17 19:34:26 +00:00
titaiwangms
bb0fd1bd3c [inductor] Faster C++ kernel python bindings (#117500)
Calling C++ from Python via ctypes is notoriously slow.  This switches to generating our own C++ bindings directly, which is a >5x speedup on this kernel-launch-bound microbenchmark:
```python
from ctypes import c_void_p
import torch
from torch import empty
from torch._inductor.codecache import AsyncCompile
from torch._dynamo.testing import rand_strided
from torch._inductor.utils import print_performance
from torch._inductor.wrapper_benchmark import compiled_module_main

async_compile = AsyncCompile()

src = '''
#include "/tmp/torchinductor_jansel/gb/cgbau5vlj6cetmcjbjbtw6x4rrivaln6f45s5d72gy2bfx5foz3k.h"
extern "C" void kernel(const float* in_ptr0,
                       float* out_ptr0)
{
    {
        auto tmp0 = in_ptr0[static_cast<long>(0L)];
        auto tmp1 = static_cast<float>(1.0);
        auto tmp2 = decltype(tmp0)(tmp0 + tmp1);
        out_ptr0[static_cast<long>(0L)] = tmp2;
    }
}
'''

cpp_fused_add_ctypes = async_compile.cpp(src)
cpp_fused_add_cpython = async_compile.cpp_pybinding(["const float*", "float*"], src)

async_compile.wait(globals())
del async_compile

def call(arg0_1):
    buf0 = empty((1,), device='cpu', dtype=torch.float32)
    if use_ctypes:
        for _ in range(100):
            cpp_fused_add_ctypes(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    else:
        for _ in range(100):
            cpp_fused_add_cpython(arg0_1, buf0)
    del arg0_1
    return (buf0,)

def benchmark_compiled_module(times=1000, repeat=100):
    arg0_1 = rand_strided((1,), (1,), device='cpu', dtype=torch.float32)
    return print_performance(lambda: call(arg0_1), times=times, repeat=repeat)

print("old ctypes bindings: ", end='')
use_ctypes = True
compiled_module_main('None', benchmark_compiled_module)
print("new bindings:        ", end='')
use_ctypes = False
compiled_module_main('None', benchmark_compiled_module)
```
Output:
```
old ctypes bindings: 0.000073
new bindings:        0.000013
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117500
Approved by: https://github.com/desertfire
ghstack dependencies: #117409, #116667, #117591
2024-01-17 19:12:24 +00:00
PyTorch MergeBot
9da01affd3 Revert "[inductor] Faster C++ kernel python bindings (#117500)"
This reverts commit 3a52147cc5.

Reverted https://github.com/pytorch/pytorch/pull/117500 on behalf of https://github.com/PaliC due to breaking internal discussed with author offline ([comment](https://github.com/pytorch/pytorch/pull/117500#issuecomment-1896426304))
2024-01-17 18:42:39 +00:00
Jason Ansel
3a52147cc5 [inductor] Faster C++ kernel python bindings (#117500)
Calling C++ from Python via ctypes is notoriously slow.  This switches to generating our own C++ bindings directly, which is a >5x speedup on this kernel-launch-bound microbenchmark:
```python
from ctypes import c_void_p
import torch
from torch import empty
from torch._inductor.codecache import AsyncCompile
from torch._dynamo.testing import rand_strided
from torch._inductor.utils import print_performance
from torch._inductor.wrapper_benchmark import compiled_module_main

async_compile = AsyncCompile()

src = '''
#include "/tmp/torchinductor_jansel/gb/cgbau5vlj6cetmcjbjbtw6x4rrivaln6f45s5d72gy2bfx5foz3k.h"
extern "C" void kernel(const float* in_ptr0,
                       float* out_ptr0)
{
    {
        auto tmp0 = in_ptr0[static_cast<long>(0L)];
        auto tmp1 = static_cast<float>(1.0);
        auto tmp2 = decltype(tmp0)(tmp0 + tmp1);
        out_ptr0[static_cast<long>(0L)] = tmp2;
    }
}
'''

cpp_fused_add_ctypes = async_compile.cpp(src)
cpp_fused_add_cpython = async_compile.cpp_pybinding(["const float*", "float*"], src)

async_compile.wait(globals())
del async_compile

def call(arg0_1):
    buf0 = empty((1,), device='cpu', dtype=torch.float32)
    if use_ctypes:
        for _ in range(100):
            cpp_fused_add_ctypes(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    else:
        for _ in range(100):
            cpp_fused_add_cpython(arg0_1, buf0)
    del arg0_1
    return (buf0,)

def benchmark_compiled_module(times=1000, repeat=100):
    arg0_1 = rand_strided((1,), (1,), device='cpu', dtype=torch.float32)
    return print_performance(lambda: call(arg0_1), times=times, repeat=repeat)

print("old ctypes bindings: ", end='')
use_ctypes = True
compiled_module_main('None', benchmark_compiled_module)
print("new bindings:        ", end='')
use_ctypes = False
compiled_module_main('None', benchmark_compiled_module)
```
Output:
```
old ctypes bindings: 0.000073
new bindings:        0.000013
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117500
Approved by: https://github.com/desertfire
2024-01-16 22:30:04 +00:00
Michael Voznesensky
1e7947b3e0 Revert "Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323)" + Forward fixes + test (#110964)
This reverts commit f786fbdebd.

Forward fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110964
Approved by: https://github.com/ezyang, https://github.com/anijain2305
2023-10-11 05:16:47 +00:00
soulitzer
df9a6bcaef [reland] Symintify guards.cpp (#110675)
reland of https://github.com/pytorch/pytorch/pull/110371

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110675
Approved by: https://github.com/ezyang
ghstack dependencies: #110673, #110674
2023-10-10 19:37:17 +00:00
PyTorch MergeBot
585e2bd818 Revert "Symintify guards.cpp (#110371)"
This reverts commit e1cfcdfa06.

Reverted https://github.com/pytorch/pytorch/pull/110371 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110371#issuecomment-1749798063))
2023-10-05 23:42:35 +00:00
soulitzer
e1cfcdfa06 Symintify guards.cpp (#110371)
Separating this out so we can check perf more easily

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110371
Approved by: https://github.com/ezyang
ghstack dependencies: #110044, #110369, #110370
2023-10-04 22:56:42 +00:00
Yu Guo
2bf3ca1be7 [torchdynamo] preserve deterministic_algorithms_warn_only in convert_context (#110457)
Summary: preserve deterministic_algorithms_warn_only  in dynamo context

Test Plan: modified unit tests to test warn_only

Differential Revision: D49872622

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110457
Approved by: https://github.com/jansel
2023-10-04 07:12:32 +00:00
Animesh Jain
0673aa3d28 [dynamo][guards-log] Print nn module guard saved dict versions for debugging (#110028)
This is the output for nn module guards

~~~
[DEBUG] GUARDS:
[DEBUG] hasattr(L['x'], '_dynamo_dynamic_indices') == False           # _dynamo/variables/builder.py:1356 in wrap_fx_proxy_cls
[DEBUG] ___check_obj_id(L['self'], 139820807110912)                   # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] __nn_module_guard_0(L['self']) # versions(mod=9998, _parameters=1194395, _buffers=1194397, _modules=1194423, _forward_hooks=1194405, _forward_pre_hooks=1194411, _backward_hooks=1194402, _backward_pre_hooks=1194400)  # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] ___check_obj_id(L['self'].mods[0], 139817945727568)           # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] __nn_module_guard_1(L['self'].mods[0]) # versions(mod=10001, _parameters=1194428, _buffers=1194430, _modules=1194522, _forward_hooks=1194438, _forward_pre_hooks=1194444, _backward_hooks=1194435, _backward_pre_hooks=1194433)  # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] ___check_obj_id(L['self'].mods[1], 139817945560640)           # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] __nn_module_guard_2(L['self'].mods[1]) # versions(mod=10001, _parameters=1194660, _buffers=1194662, _modules=1194753, _forward_hooks=1194670, _forward_pre_hooks=1194676, _backward_hooks=1194667, _backward_pre_hooks=1194665)  # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] ___check_obj_id(L['self'].mods[0].linear, 139817945727856)    # return self.linear(a)  # examples/graph_break.py:24 in helper
[DEBUG] __nn_module_guard_3(L['self'].mods[0].linear) # versions(mod=10004, _parameters=1470004, _buffers=1194467, _modules=1194493, _forward_hooks=1194475, _forward_pre_hooks=1194481, _backward_hooks=1194472, _backward_pre_hooks=1194470)  # return self.linear(a)  # examples/graph_break.py:24 in helper
[DEBUG] ___check_obj_id(L['self'].mods[1].linear, 139817945561120)    # return self.linear(a)  # examples/graph_break.py:24 in helper
[DEBUG] __nn_module_guard_4(L['self'].mods[1].linear) # versions(mod=10004, _parameters=1470008, _buffers=1194699, _modules=1194725, _forward_hooks=1194707, _forward_pre_hooks=1194713, _backward_hooks=1194704, _backward_pre_hooks=1194702)  # return self.linear(a)  # examples/graph_break.py:24 in helper
[DEBUG] utils_device.CURRENT_DEVICE == None                           # _dynamo/output_graph.py:373 in init_ambient_guards
~~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110028
Approved by: https://github.com/ezyang
ghstack dependencies: #110023, #110039
2023-09-26 08:53:07 +00:00
Edward Z. Yang
518308a740 Trace through pytree API with dynamo. (#108533)
Fix: #107315

This PR enables dynamo to trace through the `pytree` API by inlining its functions. In
order to do so, a few details of `pytree` had to be changed.

In summary, this PR:

- Introduces `TreeSpecVariable` for representing `TreeSpec` instances
- Specializes `<type>.__bases__` call, returning a `TupleVariable`
- Enables the call to `id` builtin function for every variable that implements
  `as_python_constant` method
- Specializes `ConstantVariable.call_method` for its (un)flatten functions
- Implements `UserDefinedObjectVariable.as_python_constant`
- Modifies `pytree` by:
    - Make `SUPPORTED_NODES` a map of ids (instead of types) to `NodeDef`
    - Removed `functools.wraps` function, since it can't be inlined

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108533
Approved by: https://github.com/ezyang, https://github.com/voznesenskym
ghstack dependencies: #109201
2023-09-20 00:04:56 +00:00
Ken Jin
f9e72acc8f Guard default dtype in torchdynamo (#109459)
Fixes https://github.com/pytorch/pytorch/issues/109458

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109459
Approved by: https://github.com/ezyang
2023-09-17 22:51:33 +00:00
Animesh Jain
f786fbdebd Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109323
Approved by: https://github.com/huydhn, https://github.com/voznesenskym
2023-09-15 08:44:14 +00:00
PyTorch MergeBot
56c2386157 Revert "reland [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108883)"
This reverts commit d4230e5574.

Reverted https://github.com/pytorch/pytorch/pull/108883 on behalf of https://github.com/huydhn due to Per the discussion thread on D49122208, reverting this change ([comment](https://github.com/pytorch/pytorch/pull/108883#issuecomment-1712707853))
2023-09-10 04:40:02 +00:00
Animesh Jain
d4230e5574 reland [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108883)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108883
Approved by: https://github.com/voznesenskym, https://github.com/huydhn
2023-09-09 03:12:31 +00:00
Jason Ansel
4965fffeda [dynamo] Move global state guards to C++ (#108624)
This combines a bunch of python global state guards into a single C++ guard and switches to checking them 100% of the time.  It also adds a few new guards for things that change inductor's behavior.   Even though we are checking more things, I expect this to be much faster.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108624
Approved by: https://github.com/anijain2305
2023-09-08 04:07:08 +00:00
PyTorch MergeBot
72f24d0001 Revert "[dynamo][finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108528)"
This reverts commit 34bb74c4cf.

Reverted https://github.com/pytorch/pytorch/pull/108528 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it has some nasty merge conflicts after the revert of D48910794. I need to revert this so the conflict could be resolved. Please help rebase this tomorrow and reland the change ([comment](https://github.com/pytorch/pytorch/pull/108528#issuecomment-1711034781))
2023-09-08 03:49:41 +00:00
Animesh Jain
34bb74c4cf [dynamo][finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108528)
**This PR is a 99% copy paste of Sam Gross** (@colesbury) work at https://github.com/pytorch/pytorch/pull/100642. Copied from there

--------
The NN_MODULE guard now subsumes guards on Module attributes. The check_fn will fail if the module attributes are changed (such as Module.training), parameters, submodules, and buffers are added or removed, and if fields are changed on the type itself.

This gives up specificity in the guard check -- if any field is changed the check_fn fails -- for faster overall checks.

-----

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108528
Approved by: https://github.com/ezyang
2023-09-07 01:45:47 +00:00
cyy
054f3f1d8f [3/N] fix clang-tidy warnings in torch/csrc (#108024)
Apply fixes to some found issues by clang-tidy in torch/csrc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108024
Approved by: https://github.com/Skylion007, https://github.com/albanD, https://github.com/malfet
2023-08-28 18:00:00 +00:00
ydwu4
31b0445702 Fix torch.compile with FakeTensor that has SymInt sizes (#107662)
**Motivation:**
When input FakeTensor to torch.compile has SymInt sizes (e.g. make_fx(opt_f, tracing_mode="symbolic"):
1. We cannot create a FakeTensor from that input in dynamo due to the SymInts.
2. We cannot check input tensors in guard check function and will abort due to tensor check calls sizes/strides.

For 1, we specialize the FakeTensor's SymInts using their hints. This is mostly safe since inputs mostly have concrete shapes and not computed from some DynamicOutputShape ops. We'll throw a data dependent error if the symint is unbacked.

For 2, we replace size/stride calls with the sym_* variants in TENSOR_CHECK guards' check function.

**Test Plan:**
See added tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107662
Approved by: https://github.com/ezyang
2023-08-23 05:27:57 +00:00
Edward Z. Yang
68b9bf9671 Simplify verbose error guard printing (#107516)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107516
Approved by: https://github.com/anijain2305
ghstack dependencies: #107505
2023-08-20 06:50:27 +00:00
cyy
1157b4393b Add const reference and std::move in opportunities detected by clang-tidy (#105815)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105815
Approved by: https://github.com/Skylion007
2023-07-25 12:28:14 +00:00
Michael Voznesensky
485cad4a86 Dynamo tensor aliasing guards, dedup graphargs (#104921)
The story here is relatively simple - when we go to wrap a tensor, we (1) ensure that it is a real, not fake tensor (2) check if we have seen it before. (3) If we have seen it, we create a positive alias guard and return the associated variable. If not, we proceed.

By short circuiting here, we avoid lifting it to a graph input, and guarantee that the only names passed to tensors are unique. This allows us to guard on the unique relationships (pyboject addresses, aka IDs, cannot match) to give us guards for negative aliases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104921
Approved by: https://github.com/jansel, https://github.com/ezyang
2023-07-13 22:18:08 +00:00
PyTorch MergeBot
bfd995f0d6 Revert "Specialize storage_offset - Does not cover automatic dynamic (#104204)"
This reverts commit 803c14490b.

Reverted https://github.com/pytorch/pytorch/pull/104204 on behalf of https://github.com/ezyang due to also due to https://github.com/pytorch/pytorch/issues/104563 ([comment](https://github.com/pytorch/pytorch/pull/104204#issuecomment-1620653507))
2023-07-04 19:41:32 +00:00
Michael Voznesensky
803c14490b Specialize storage_offset - Does not cover automatic dynamic (#104204)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104204
Approved by: https://github.com/wconstab
2023-06-27 05:51:42 +00:00
Brian Hirsh
98ab11a2c3 separate out dynamo .requires_grad and .is_grad_enabled guards (#100570)
Fixes https://github.com/pytorch/pytorch/issues/100977

This will hopefully fix this error (from [issue](https://github.com/pytorch/pytorch/issues/99616))

This PR fixes an internal model: we were running an inductor inference graph, but `torch.is_grad_enabled()` was True, causing us to error inside of the inference graph when we encountered an out= operator.

I haven't been able to create a smaller repro - before landing this, I want to create a smaller repro to convince myself of why we need to separate out these guards.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100570
Approved by: https://github.com/ezyang
2023-05-24 14:58:40 +00:00
PyTorch MergeBot
7f3fed125e Revert "separate out dynamo .requires_grad and .is_grad_enabled guards (#100570)"
This reverts commit 1fabee399d.

Reverted https://github.com/pytorch/pytorch/pull/100570 on behalf of https://github.com/PaliC due to breaking inductor tests along with #101219 ([comment](https://github.com/pytorch/pytorch/pull/100570#issuecomment-1555271267))
2023-05-19 21:29:09 +00:00
Brian Hirsh
1fabee399d separate out dynamo .requires_grad and .is_grad_enabled guards (#100570)
Fixes https://github.com/pytorch/pytorch/issues/100977

This will hopefully fix this error (from [issue](https://github.com/pytorch/pytorch/issues/99616))

This PR fixes an internal model: we were running an inductor inference graph, but `torch.is_grad_enabled()` was True, causing us to error inside of the inference graph when we encountered an out= operator.

I haven't been able to create a smaller repro - before landing this, I want to create a smaller repro to convince myself of why we need to separate out these guards.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100570
Approved by: https://github.com/ezyang
2023-05-19 16:14:56 +00:00
Michael Voznesensky
4c2892944f Guard static shapes alongside tensors, instead of from shape_env, in dynamic_shapes=True (#99566)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99566
Approved by: https://github.com/ezyang
2023-04-22 16:46:52 +00:00
Joel Schlosser
35be579701 Refactor TENSOR_MATCH guards to check dim (for NT support) (#97896)
Tweaks the TENSOR_MATCH guard logic to avoid saving sizes / strides for the case of dynamic shapes. Instead, the dim() is stored, which is enough for both dense tensors and NTs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97896
Approved by: https://github.com/ezyang
2023-03-29 23:08:03 +00:00
cyy
f8ad64d5eb [dynamo] avoid truncation of python pointers (#95619)
This PR is separated from #94927 . It aims to fix to the MSVC warnings that passed python pointers are truncated to a smaller integer type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95619
Approved by: https://github.com/Skylion007
2023-02-28 19:38:34 +00:00
Michael Voznesensky
9ded087bac During export, generate Python TENSOR_MATCH guards (#94970)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94970
Approved by: https://github.com/ezyang
2023-02-24 05:37:31 +00:00
PyTorch MergeBot
254b161def Revert "During export, generate Python TENSOR_MATCH guards (#94970)"
This reverts commit 5a8092f058.

Reverted https://github.com/pytorch/pytorch/pull/94970 on behalf of https://github.com/voznesenskym due to Clowny comparison bug on edge cases for devices
2023-02-23 17:47:59 +00:00
Michael Voznesensky
5a8092f058 During export, generate Python TENSOR_MATCH guards (#94970)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94970
Approved by: https://github.com/ezyang
2023-02-22 17:28:17 +00:00
PyTorch MergeBot
6ae60b19b7 Revert "During export, generate Python TENSOR_MATCH guards (#94970)"
This reverts commit 5d2eb6d636.

Reverted https://github.com/pytorch/pytorch/pull/94970 on behalf of https://github.com/jeanschmidt due to Requires codev to land internal test changes
2023-02-22 16:49:37 +00:00
Michael Voznesensky
5d2eb6d636 During export, generate Python TENSOR_MATCH guards (#94970)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94970
Approved by: https://github.com/ezyang
2023-02-21 19:12:57 +00:00
Michael Voznesensky
eb39d990ce Guard on at::Tensor device index (#91779)
Fixes #91777

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91779
Approved by: https://github.com/ngimel
2023-01-19 00:58:04 +00:00
Aaron Gokaslan
3916d7a575 Apply modernize-use-emplace to aten, c10, torch (#91077)
Apply clang-tidy check modernize-use-emplace. This is slightly more efficient by using an inplace constructor and is the recommended style in parts of the codebase covered by clang-tidy. This just manually applies the check to rest of the codebase. Pinging @ezyang as this is related to my other PRs he reviewed like #89000

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91077
Approved by: https://github.com/ezyang
2022-12-19 07:49:56 +00:00
Jason Ansel
8b0cc9c752 [inductor] Fix copysign issue in old msvc build (#87117)
Should fix https://github.com/pytorch/pytorch/pull/87028#issuecomment-1281066036
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87117
Approved by: https://github.com/DanilBaibak
2022-10-18 06:06:31 +00:00
Jason Ansel
30f6f6903c [inductor] Move size asserts to C++, fix bug (#87028)
Inductor internally models any `size=1` dimension as having `stride=0` to simplify indexing formulas (sympy will remove these terms from the expression).

This caused a bug in our generate stride assert in detectron2_maskrcnn_r_50_fpn, where we asserted the wrong stride of a size==1 dimension.

This fixes that bug, and moves size/stride assert logic to C++ which should be a small perf gain.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87028
Approved by: https://github.com/anijain2305
2022-10-16 20:17:22 +00:00
Jason Ansel
f1fdb6efbd Manual changes for moving dynamo to core (#86621)
This is the subset of the changes in #86461 not auto-generated by `copy_to_core.sh`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86621
Approved by: https://github.com/albanD
2022-10-11 23:01:21 +00:00