Commit Graph

16 Commits

Author SHA1 Message Date
Jeff Daily
ba694520e5 [ROCm] fix JIT codegen (#57400)
Summary:
Prepares for upcoming changes that are part of ROCm 4.2 and affect the PyTorch JIT.

- The ROCM_VERSION macro must be available to both the device and host compilation passes.
- Unifies some of the CUDA and HIP differences in the generated code:
  - NAN / POS_INFINITY / NEG_INFINITY
  - Do not hipify `extern __shared__` into the deprecated `HIP_DYNAMIC_SHARED()` macro
- Differentiates bf16 codegen for HIP.
- Optionally provides missing macros when using the hiprtc precompiled header feature (see the sketch below).
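
For illustration, a minimal sketch of the macro unification (the spellings below are assumptions, not the PR's exact resource strings):

```
// One set of special-value macros usable by both the CUDA and HIP
// compilation paths of JIT-generated kernels (assumed spellings).
#ifndef POS_INFINITY
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)
#define NAN __int_as_float(0x7fffffff)
#endif

__global__ void example_kernel(float* out) {
  // Dynamic shared memory declared directly, rather than hipified into
  // the deprecated HIP_DYNAMIC_SHARED() macro.
  extern __shared__ float smem[];
  smem[threadIdx.x] = POS_INFINITY;
  out[threadIdx.x] = smem[threadIdx.x];
}
```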

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57400

Reviewed By: ejguan

Differential Revision: D28421065

Pulled By: malfet

fbshipit-source-id: 215f476773c61d8b0d9d148a4e5f5d016f863074
2021-05-27 11:45:07 -07:00
Nikita Shulga
3a66a1cb99 [clang-tidy] Exclude cppcoreguidelines-avoid-magic-numbers (#57841)
Summary:
Add a cppcoreguidelines-avoid-magic-numbers exclusion to clang-tidy.
Remove the existing NOLINT suppressions using the following script:
```
for file in `git ls-files | grep -v \.py`; do gsed '/^ *\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers)/d' -i  $file; done
```
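
For context, the exclusion makes per-line suppressions like the following unnecessary (illustrative example, not a line from the diff):

```
// NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers)
constexpr int kBufferSize = 4096;
```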

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57841

Reviewed By: samestep

Differential Revision: D28295045

Pulled By: malfet

fbshipit-source-id: 7c6e8d1213c9593f169ed3df6a916498f1a97163
2021-05-07 20:02:33 -07:00
Nikita Shulga
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
import json
import os
from subprocess import check_output, check_call

def get_compiled_files_list():
    # Collect the set of files covered by the compilation database.
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    # Run clang-tidy on a single file and commit any NOLINT stubs it adds.
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
Jeff Daily
e1752ffa04 [reland][ROCm] use hiprtc precompiled header (#55965)
Summary:
Revert "Revert D27449031 (2a7df657fe): [pytorch][PR] [ROCm] use hiprtc precompiled header".  Reland PR https://github.com/pytorch/pytorch/issues/54350.

This reverts commit 204ac21bf1.

The original PR was reverted under suspicion that it was causing CI instability, but the instability was actually due to a hardware failure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55965

Reviewed By: jbschlosser

Differential Revision: D27755907

Pulled By: malfet

fbshipit-source-id: 75bf0b9d888df3dee62f00a366b1123757e0474e
2021-04-15 15:47:56 -07:00
Alexander Golynski
204ac21bf1 Revert D27449031: [pytorch][PR] [ROCm] use hiprtc precompiled header
Test Plan: revert-hammer

Differential Revision:
D27449031 (2a7df657fe)

Original commit changeset: 81a8d7847a47

fbshipit-source-id: b7b970c8ea4110357fba3ad4d52a86fa5641d90c
2021-04-01 06:42:04 -07:00
Jeff Daily
2a7df657fe [ROCm] use hiprtc precompiled header (#54350)
Summary:
HIP's runtime compiler (hiprtc) is adding support for precompiled HIP headers in the ROCm 4.2 release.  Conditionally add support for this feature.  Using this feature will improve the ROCm torch wheel user experience; users will no longer need to install HIP headers separately to use torch JIT features.

Use of this feature is gated on a new ROCM_VERSION macro.
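
A minimal sketch of the implied version gating (the macro encoding and the feature flag below are assumptions, not the PR's exact code):

```
// Assumes ROCM_VERSION encodes major/minor/patch, e.g. 4.2.0 -> 40200.
#if defined(USE_ROCM) && ROCM_VERSION >= 40200
// hiprtc ships a precompiled HIP header, so JIT compilation no longer
// needs separately installed HIP headers.
constexpr bool kUseHiprtcPrecompiledHeader = true;
#else
constexpr bool kUseHiprtcPrecompiledHeader = false;
#endif
```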

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54350

Reviewed By: H-Huang

Differential Revision: D27449031

Pulled By: malfet

fbshipit-source-id: 81a8d7847a47ce2bb253d1ea58740ef66ed154a3
2021-03-31 13:36:50 -07:00
Bert Maher
9db4802184 [fuser] Support bfloat16 (#54571)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54571

Supports bfloat16 via a method similar to half's: upconvert inputs to fp32, do the math, then downconvert outputs to bf16.

Resource strings are mostly derived from cuda-11 headers.
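
A minimal sketch of that pattern with the cuda-11 bfloat16 intrinsics (illustrative; the actual resource strings live in the PR):

```
#include <cuda_bf16.h>

// Upconvert bf16 inputs to fp32, compute in fp32, downconvert the result.
__global__ void add_bf16(const __nv_bfloat16* a, const __nv_bfloat16* b,
                         __nv_bfloat16* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float x = __bfloat162float(a[i]);
    float y = __bfloat162float(b[i]);
    out[i] = __float2bfloat16(x + y);
  }
}
```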

Fixes #53918, for the legacy fuser at least.

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D27328987

Pulled By: bertmaher

fbshipit-source-id: 5c0eae44164623faa0c75cb818e8bf0211579fdc
2021-03-25 15:59:15 -07:00
Nikita Shulga
5499e839f1 [Fuser] Do not attempt to use OpenMP if build without OpenMP support (#51504)
Summary:
Clang from Xcode does not support the `-fopenmp` option, so there is no need to try compiling with it.
Infer whether OpenMP is supported by checking the _OPENMP define.
Also, use the clang compiler if the host app was compiled with clang rather than gcc.
Fix a few range-loop warnings and add static_asserts that range-loop variables are raw pointers.

This change makes fuser tests on OS X a bit faster.
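
A sketch of the detection idea (hypothetical helper, not the fuser's actual code): if the binary itself was built without OpenMP, `_OPENMP` is undefined and `-fopenmp` is never requested.

```
#include <string>

std::string fuserCompileFlags() {
  std::string flags = "-O3";
#ifdef _OPENMP
  // Only request OpenMP when the host toolchain demonstrably supports it.
  flags += " -fopenmp";
#endif
  return flags;
}
```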

Before:
```
% python3 test_jit.py -v  TestScript.test_batchnorm_fuser_cpu
Fail to import hypothesis in common_utils, tests are not derandomized
CUDA not available, skipping tests
test_batchnorm_fuser_cpu (__main__.TestScript) ... clang: error: unsupported option '-fopenmp'
clang: error: unsupported option '-fopenmp'
warning: pytorch jit fuser failed to compile with openmp, trying without it...
ok

----------------------------------------------------------------------
Ran 1 test in 0.468s

OK
```

After:
```
% python3 test_jit.py -v  TestScript.test_batchnorm_fuser_cpu
Fail to import hypothesis in common_utils, tests are not derandomized
CUDA not available, skipping tests
test_batchnorm_fuser_cpu (__main__.TestScript) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.435s

OK
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51504

Reviewed By: smessmer

Differential Revision: D26186875

Pulled By: malfet

fbshipit-source-id: 930b3bcf543fdfad0f493d687072aaaf5f9e2bfc
2021-02-02 15:31:59 -08:00
Scott Wolchok
4a0d17ba2d [PyTorch][codemod] Replace immediately-dereferenced expect calls w/expectRef (#50228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50228

`fastmod -m 'expect(<((at|c10)::)?\w+Type>\(\)\s*)->'
'expectRef${1}.'`
Presuming it builds, this is a safe change: the result of `expect()`
wasn't being saved anywhere, so we didn't need the new `shared_ptr` and
can take a reference instead.
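
An illustrative before/after for the codemod (hypothetical call site):

```
// Before: expect<TensorType>() materializes a new shared_ptr that is
// immediately dereferenced and discarded.
auto sizes = type->expect<TensorType>()->sizes();

// After: expectRef<TensorType>() returns a reference; no refcount bump.
auto sizesRef = type->expectRef<TensorType>().sizes();
```
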
ghstack-source-id: 119782961

Test Plan: CI

Reviewed By: SplitInfinity

Differential Revision: D25837374

fbshipit-source-id: 86757b70b1520e3dbaa141001e7976400cdd3b08
2021-01-13 16:13:55 -08:00
Bert Maher
6ec8fabc29 Fix frac in CUDA fuser (#44152)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44152

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23528506

fbshipit-source-id: bfd468d72fa55ce317f88ae83e1f2d5eee041aa0
2020-09-09 11:10:08 -07:00
Elias Ellison
5bd2902796 [JIT] Remove references to no longer generated _tanh_backward and _sigmoid_backward (#44138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44138

If you look at the sigmoid and tanh backward definitions, they are composed of other ops: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/runtime/symbolic_script.cpp#L786
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/runtime/symbolic_script.cpp#L164

So tanh_backward and sigmoid_backward are no longer generated; they are legacy ops.
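
For reference, those compositions reduce to the standard derivative identities (standard calculus, not code quoted from the patch):

```
// With y = sigmoid(x) or y = tanh(x) already computed in the forward pass:
//   sigmoid'(x) = y * (1 - y)
//   tanh'(x)    = 1 - y * y
template <typename T>
T sigmoid_backward(T grad, T y) { return grad * y * (T(1) - y); }

template <typename T>
T tanh_backward(T grad, T y) { return grad * (T(1) - y * y); }
```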

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23543603

Pulled By: eellison

fbshipit-source-id: ce8353e53043cf969b536aac47c9576d66d4ce02
2020-09-05 01:41:36 -07:00
Bert Maher
0bf27d64f4 Fix NaN propagation in fuser's min/max implementation (#43590)
Summary:
fmax/fmin return the non-NaN argument when exactly one argument is NaN, which doesn't match eager mode's NaN-propagating behavior.
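
A minimal sketch of a NaN-propagating wrapper (illustrative, not the PR's exact code):

```
// fmaxf(a, NAN) returns a, but eager-mode torch.max propagates the NaN,
// so check for NaN explicitly before deferring to fmaxf/fminf.
__device__ float maximum(float a, float b) {
  return (isnan(a) || isnan(b)) ? NAN : fmaxf(a, b);
}

__device__ float minimum(float a, float b) {
  return (isnan(a) || isnan(b)) ? NAN : fminf(a, b);
}
```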

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43590

Reviewed By: mruberry

Differential Revision: D23338664

Pulled By: bertmaher

fbshipit-source-id: b0316a6f01fcf8946ba77621efa18f339379b2d0
2020-08-26 17:31:06 -07:00
Nikolay Korovaiko
6ecb5bb1f0 match old fuser rem to eager (#37196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37196
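
For background on the op in the title (semantics context, not code from the patch): eager-mode `torch.remainder` uses floored division, so the result takes the sign of the divisor, unlike C's `fmod`, which takes the sign of the dividend. A sketch of the floored form:

```
// Floored modulo, matching torch.remainder: the result takes the sign
// of the divisor b, unlike fmodf (sign of the dividend a).
__device__ float floored_rem(float a, float b) {
  float r = fmodf(a, b);
  if (r != 0.0f && ((r < 0.0f) != (b < 0.0f))) {
    r += b;  // shift the result into b's sign half
  }
  return r;
}
```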

Reviewed By: zdevito

Differential Revision: D21223172

Pulled By: Krovatkin

fbshipit-source-id: 4d4ff1127d5dc69ab73f07ca79c1f5b0b4dd9732
2020-05-01 10:55:06 -07:00
Deyu Fu
346215caa4 [jit] Adding vectorized load/store support for JIT generated CUDA kernel (#36555)
Summary:
The JIT pointwise kernel currently does not do vectorized loads/stores, which may lead to suboptimal performance for shorter data types such as half and int8.

In this PR, a fixed width of 4 elements per load/store is added for supported tensor shapes, implemented as a runtime check inside the kernel.

Supported tensor shapes:
- all input/output data pointers are aligned to 4*sizeof(dtype)
- the last dimension is contiguous (stride 1) and its size is a multiple of 4
- all other dimensions have strides that are multiples of 4 (a sketch of the idea follows this list)
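
A minimal sketch of the 4-wide idea (illustrative; the real kernel is generated by the fuser and also handles the strided cases in the list above):

```
// A pack of four elements aligned to 4*sizeof(scalar_t).
template <typename scalar_t>
struct alignas(4 * sizeof(scalar_t)) Vec4 {
  scalar_t v[4];
};

// Computes out = x * 2 - y * 3 + 1; assumes n is a multiple of 4 and
// the pointers meet the alignment conditions listed above.
template <typename scalar_t>
__global__ void axpby_vec4(const scalar_t* x, const scalar_t* y,
                           scalar_t* out, int n) {
  int i = 4 * (blockIdx.x * blockDim.x + threadIdx.x);
  if (i + 4 <= n) {
    // One vector load per operand instead of four scalar loads.
    Vec4<scalar_t> a = *reinterpret_cast<const Vec4<scalar_t>*>(x + i);
    Vec4<scalar_t> b = *reinterpret_cast<const Vec4<scalar_t>*>(y + i);
    Vec4<scalar_t> c;
#pragma unroll
    for (int k = 0; k < 4; ++k) {
      c.v[k] = a.v[k] * scalar_t(2) - b.v[k] * scalar_t(3) + scalar_t(1);
    }
    *reinterpret_cast<Vec4<scalar_t>*>(out + i) = c;
  }
}
```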

All test_jit* tests passed; here are performance results for a simple `ax+by+c` fusion.
Result before the PR:
```
torch.float32 kernel time: 0.748 ms.
torch.float16 kernel time: 0.423 ms.
torch.int8 kernel time: 0.268 ms.
```
Result after the PR:
```
torch.float32 kernel time: 0.733 ms.
torch.float16 kernel time: 0.363 ms.
torch.int8 kernel time: 0.191 ms.
```
test code:
```
import torch
import time

# disable profiling to test all data types
torch._C._jit_set_profiling_mode(False)
torch._C._jit_set_profiling_executor(False)

@torch.jit.script
def axpby(x, y):
    return  x * 2 - y * 3 + 1

for test_dtype in [torch.float32, torch.float16, torch.int8]:
    a = torch.randn(12345,4096, device="cuda").to(test_dtype)
    b = torch.randn(12345,4096, device="cuda").to(test_dtype)

    # warm up
    for _ in range(100):
        c = axpby(a,b)

    torch.cuda.synchronize()
    start = time.time()

    for _ in range(1000):
        c = axpby(a,b)

    torch.cuda.synchronize()
    end = time.time()
    print("{} kernel time: {:.3f} ms.".format(test_dtype, end-start))
```
Generated code:
[log_with_generated_code.txt](https://github.com/pytorch/pytorch/files/4472813/log_with_generated_code.txt)

Additional note:
the double type is excluded from the vectorized code path.

We can later improve this with dynamic vectorization-length support and fewer in-kernel checks once tensor shape information can be used in codegen. For now, this implementation follows the existing TensorDesc caching mechanism, which does not carry enough compile-time information.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/36555

Differential Revision: D21142762

Pulled By: ngimel

fbshipit-source-id: 1cfdc5807a944c4670b040dc2d2dfa480377e7d7
2020-04-20 19:24:28 -07:00
Meghan Lele
6384c2d81b [JIT] clang-format JIT code (#35115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35115

This commit runs the newly added tools/clang_format.py on the JIT
codebase and includes all of the formatting changes thus produced.

Testing:
Ran the script, CI.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D20568523

Pulled By: SplitInfinity

fbshipit-source-id: e09bdb982ccf090eecfb7c7b461b8d0681eef82b
2020-03-26 11:24:51 -07:00
Michael Suo
dbe850af5b [jit] do the code reorg (#33851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851

Rationale and context described in #33828.

Script to reproduce the move:
https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9
ghstack-source-id: 99079645

Test Plan: Make sure CI passes

Reviewed By: jamesr66a

Differential Revision: D20133869

fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e
2020-02-27 13:02:51 -08:00