pytorch/torch/csrc/jit/codegen/fuser
Nikita Shulga 5499e839f1 [Fuser] Do not attempt to use OpenMP if built without OpenMP support (#51504)
Summary:
Clang from Xcode does not support the `-fopenmp` option, so there is no need to try compiling with it.
Infer whether OpenMP is supported by checking the `_OPENMP` define.
Also, use the clang compiler if the host app was compiled with clang rather than gcc.
Fix a few range-loop warnings and add `static_assert`s that range-loop variables are raw pointers.

This change makes fuser tests on OS X a bit faster.
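
As an illustration of the `_OPENMP` check, here is a minimal standalone sketch (the `kHostHasOpenMP` name is made up for the example, not taken from the fuser):

```cpp
// Every OpenMP-enabled compiler defines _OPENMP, so a translation unit
// can tell at compile time whether the host build had OpenMP support
// and avoid ever passing -fopenmp to the fuser's kernel compiler.
#include <iostream>

#ifdef _OPENMP
constexpr bool kHostHasOpenMP = true;   // host built with -fopenmp (or equivalent)
#else
constexpr bool kHostHasOpenMP = false;  // e.g. Xcode clang, which rejects -fopenmp
#endif

int main() {
  std::cout << "host OpenMP support: " << std::boolalpha << kHostHasOpenMP << '\n';
}
```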

Before:
```
% python3 test_jit.py -v  TestScript.test_batchnorm_fuser_cpu
Fail to import hypothesis in common_utils, tests are not derandomized
CUDA not available, skipping tests
test_batchnorm_fuser_cpu (__main__.TestScript) ... clang: error: unsupported option '-fopenmp'
clang: error: unsupported option '-fopenmp'
warning: pytorch jit fuser failed to compile with openmp, trying without it...
ok

----------------------------------------------------------------------
Ran 1 test in 0.468s

OK
```

After:
```
% python3 test_jit.py -v  TestScript.test_batchnorm_fuser_cpu
Fail to import hypothesis in common_utils, tests are not derandomized
CUDA not available, skipping tests
test_batchnorm_fuser_cpu (__main__.TestScript) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.435s

OK
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51504

Reviewed By: smessmer

Differential Revision: D26186875

Pulled By: malfet

fbshipit-source-id: 930b3bcf543fdfad0f493d687072aaaf5f9e2bfc
2021-02-02 15:31:59 -08:00
| Name | Last commit | Last commit date |
| --- | --- | --- |
| cpu | [Fuser] Do not attempt to use OpenMP if built without OpenMP support (#51504) | 2021-02-02 15:31:59 -08:00 |
| cuda | patch nvrtc API for cuda TK >= 11.1 (#50319) | 2021-01-27 23:58:20 -08:00 |
| arg_spec.h | Move torch/csrc/utils/hash.h to c10/util/hash.h. (#42503) | 2020-08-29 17:47:00 -07:00 |
| codegen.cpp | [Fuser] Do not attempt to use OpenMP if built without OpenMP support (#51504) | 2021-02-02 15:31:59 -08:00 |
| codegen.h | | |
| compiler.cpp | [PyTorch][codemod] Replace immediately-dereferenced expect calls w/expectRef (#50228) | 2021-01-13 16:13:55 -08:00 |
| compiler.h | | |
| executor.cpp | | |
| executor.h | | |
| fallback.cpp | | |
| fallback.h | | |
| fused_kernel.h | | |
| interface.cpp | [NNC] Add cpu fusion gflag (#48682) | 2020-12-02 19:47:18 -08:00 |
| interface.h | Force LLVM Compilation for CPU Tests (#46949) | 2020-11-12 11:12:08 -08:00 |
| kernel_cache.cpp | | |
| kernel_cache.h | | |
| kernel_spec.h | [Fuser] Do not attempt to use OpenMP if built without OpenMP support (#51504) | 2021-02-02 15:31:59 -08:00 |
| partition_desc.h | | |
| README.md | | |
| tensor_desc.h | Move torch/csrc/utils/hash.h to c10/util/hash.h. (#42503) | 2020-08-29 17:47:00 -07:00 |
| tensor_info.h | | |

PyTorch Fuser

The fuser accepts subgraphs wrapped in "fusion nodes" and tries to execute them by just-in-time (JIT) compiling kernels that run all the graph operations.
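
As a toy illustration (not fuser code) of what such a kernel buys, the sketch below runs two elementwise ops in a single pass instead of two, which is exactly the saving a fused kernel provides:

```cpp
// Toy illustration of kernel fusion: y = sin(x * 2) computed in one
// pass, with no intermediate buffer between the two elementwise ops.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  const std::vector<float> x = {1.f, 2.f, 3.f, 4.f};
  std::vector<float> y(x.size());

  // Unfused execution would run one loop for x * 2, materialize the
  // result, then run a second loop for sin(). The fused version below
  // does both in a single traversal.
  for (size_t i = 0; i < x.size(); ++i) {
    y[i] = std::sin(x[i] * 2.f);
  }

  std::printf("y[0] = %f\n", y[0]);
}
```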

Code Organization

The fuser is designed hierarchically, with device-independent logic eventually deferring to device-specific logic and implementation. The device-specific code is (mostly) found in each device's subdirectory. The device-independent logic has six components:

  • The Interface (interface.h/cpp) has functions to register and run fusions, interrogate fusion functionality, and perform debugging.
  • The Compiler (compiler.h/cpp) performs "upfront" and "runtime" compilation. When fusions are registered, upfront compilation produces fallback code and performs some shape inference. When a fusion is run, runtime compilation invokes code generation and the device-specific compilation logic.
  • The Code Generator (codegen.h/cpp) produces the string to be compiled on the device.
  • The Executor (executor.h/cpp) runs requested fusions. It performs shape inference, expands tensors as necessary, determines the device to run on, acquires a cached compiled kernel or requests that the Compiler produce a new one, invokes device-specific code to launch the kernel, and updates the stack.
  • The Fallback (fallback.h/cpp) runs subgraphs that can't be fused, either because shape inference didn't determine a common tensor size or because the device the tensors are on doesn't support fusion.
  • The Kernel Specification Cache (kernel_cache.h/cpp) is a thread-safe cache holding the device-independent specifications produced during upfront compilation. Each specification has its own thread-safe store of compiled kernels, which the Executor checks before requesting runtime compilation; a minimal sketch of this two-level layout follows the list.
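
A minimal sketch of that two-level cache, using hypothetical names (`KernelCache`, `KernelSpec`, `CompiledKernel`) rather than the actual kernel_cache.h API:

```cpp
// Sketch: a thread-safe cache mapping a fusion key to its
// specification, where each spec owns its own thread-safe store of
// compiled kernels keyed by argument configuration (shapes/devices).
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>

struct CompiledKernel { /* device-specific handle would live here */ };

struct KernelSpec {
  std::mutex mutex;  // guards this spec's private kernel store
  std::map<std::string, std::unique_ptr<CompiledKernel>> kernels;

  // Returns the kernel compiled for this argument configuration, or
  // nullptr if the Executor must request runtime compilation.
  CompiledKernel* find(const std::string& arg_spec) {
    std::lock_guard<std::mutex> lock(mutex);
    auto it = kernels.find(arg_spec);
    return it == kernels.end() ? nullptr : it->second.get();
  }
};

class KernelCache {
 public:
  // Upfront compilation stores the spec and returns the key that the
  // Executor later uses to look it up.
  int64_t store(std::unique_ptr<KernelSpec> spec) {
    std::lock_guard<std::mutex> lock(mutex_);
    const int64_t key = next_key_++;
    specs_[key] = std::move(spec);
    return key;
  }

  KernelSpec* lookup(int64_t key) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = specs_.find(key);
    return it == specs_.end() ? nullptr : it->second.get();
  }

 private:
  std::mutex mutex_;
  int64_t next_key_ = 0;
  std::map<int64_t, std::unique_ptr<KernelSpec>> specs_;
};

int main() {
  KernelCache cache;
  const int64_t key = cache.store(std::make_unique<KernelSpec>());
  KernelSpec* spec = cache.lookup(key);
  // A miss here is what triggers runtime compilation in the real fuser.
  return spec && spec->find("float[2x3]@cpu") == nullptr ? 0 : 1;
}
```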

The device-specific logic for compiling and running code lives in FusedKernelCPU (cpu/fused_kernel.h/cpp) and FusedKernelCUDA (cuda/fused_kernel.h/cpp).
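
For a concrete sense of the CPU path, here is a self-contained sketch of the general technique the CPU fuser relies on: write generated source to disk, invoke a host compiler, and dlopen the result. The file paths, compiler invocation, and kernel name below are illustrative, not the fuser's actual ones.

```cpp
// Sketch of compile-string-and-dlopen on POSIX.
// Build with: c++ sketch.cpp -ldl (the -ldl is needed on older glibc).
#include <dlfcn.h>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <string>

int main() {
  const std::string src_path = "/tmp/fused_kernel.cpp";
  const std::string so_path = "/tmp/fused_kernel.so";

  // The "generated" kernel source; the real code generator emits this
  // string from the fusion subgraph.
  std::ofstream(src_path) <<
      "extern \"C\" void kernel(const float* x, float* y, long n) {\n"
      "  for (long i = 0; i < n; ++i) y[i] = x[i] * 2.0f;\n"
      "}\n";

  // Invoke the host compiler; the real fuser picks flags (including
  // whether to pass -fopenmp) based on how the host itself was built.
  const std::string cmd =
      "c++ -O2 -shared -fPIC -o " + so_path + " " + src_path;
  if (std::system(cmd.c_str()) != 0) return 1;

  // Load the compiled shared object and call the kernel entry point.
  void* lib = dlopen(so_path.c_str(), RTLD_NOW);
  if (!lib) return 1;
  using Kernel = void (*)(const float*, float*, long);
  auto kernel = reinterpret_cast<Kernel>(dlsym(lib, "kernel"));
  if (!kernel) return 1;

  float x[3] = {1, 2, 3}, y[3];
  kernel(x, y, 3);
  std::printf("%f %f %f\n", y[0], y[1], y[2]);
  dlclose(lib);
}
```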