Commit Graph

206 Commits

Author SHA1 Message Date
Akshit Khurana
bb3e1f30a8 [Pytorch NNAPI] Add compilation_preference & relax_f32_to_f16 APIs (#78758)
Summary:
compilation_preference is one of:

ANEURALNETWORKS_PREFER_LOW_POWER = 0
ANEURALNETWORKS_PREFER_FAST_SINGLE_ANSWER = 1
ANEURALNETWORKS_PREFER_SUSTAINED_SPEED = 2

relax_f32_to_f16 calls Model_relaxComputationFloat32toFloat16
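
A hypothetical usage sketch — the summary above does not show the exact call surface, so the keyword-argument names on `convert_model_to_nnapi` below are assumptions, not the confirmed API:

```python
import torch
from torch.backends._nnapi.prepare import convert_model_to_nnapi

traced = torch.jit.trace(torch.nn.PReLU(), torch.randn(1, 16, 64, 64))

# Hypothetical: the new options passed through the converter entry point.
nnapi_model = convert_model_to_nnapi(
    traced,
    torch.randn(1, 16, 64, 64),
    compilation_preference=2,   # ANEURALNETWORKS_PREFER_SUSTAINED_SPEED (assumed kwarg)
    relax_f32_to_f16=True,      # calls Model_relaxComputationFloat32toFloat16 (assumed kwarg)
)
```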

Test Plan:
Tested on device with nnapi models

* Works with existing exported models
* Works with new exported models with options

Differential Revision: D36433236

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78758
Approved by: https://github.com/kimishpatel
2022-06-06 20:57:34 +00:00
Max Ren
93d5a722b1 [coreml] Introducing Quantization (#78108)
Summary: Adding Quantization mode to preprocess, which allows us to run through quantization for coreml models

Test Plan:
https://fburl.com/anp/r0ntsbq0

Notebook running through the quantization workflow:

created a custom bento kernel to run it through coreml

```
bento_kernel(
    name = "coreml",
    deps = [
        "fbsource//third-party/pypi/coremltools:coremltools",
        "//caffe2:coreml_backend",
        "//caffe2:coreml_backend_cpp",
        "//caffe2:torch",
        "//caffe2/torch/fb/mobile/model_exporter:model_exporter",
    ],
)
```

Initial benchmarks on iPhone 11:

FP32 Core ML Model:
https://our.intern.facebook.com/intern/aibench/details/203998485252700

Quantized Core ML Model:
https://our.intern.facebook.com/intern/aibench/details/927584023592505

High End Quantized Model:
https://our.intern.facebook.com/intern/aibench/details/396271714697929

Summarized Results
| Backend | Quantization | p50 net latency | Model Size |
|---------|--------------|-----------------|------------|
| Core ML | No           | 1.2200          | 1.2mb      |
| Core ML | Yes          | 1.2135          | 385kb      |
| CPU     | Yes          | 3.1720          | 426kb      |

Reviewed By: SS-JIA

Differential Revision: D36559966

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78108
Approved by: https://github.com/jmdetloff
2022-06-01 17:10:17 +00:00
PyTorch MergeBot
b994ce359e Revert "[cuDNN V8 API] (reopen) Allow the number of kernels profiled under torch.backends.cudnn.benchmark = True to be limited (#77002)"
This reverts commit c274f2ad52.

Reverted https://github.com/pytorch/pytorch/pull/77002 on behalf of https://github.com/malfet due to please, as it breaks internal CI, but also no CUDA headers should be included from `torch/csrc/Module.cpp`, but rather should be implemented/registered in `torch/csrc/cuda/Module.cpp`
2022-05-24 21:52:35 +00:00
Nikita Shulga
6244daa6a9 [MPS] Fix torch.mps.is_available() (#78121)
By introducing `at::mps::is_available()` and changing `torch._C._is_mps_available` from a property to a memoizable callable
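
On the Python side, the memoized check surfaces through `torch.backends.mps.is_available()`; a minimal guard (a sketch, not taken from this PR) looks like:

```python
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
x = torch.ones(3, device=device)  # runs on Metal when a device is present, CPU otherwise
```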

Also, if `_mtl_device` is released in the MPSDevice destructor, shouldn't it be retained in the constructor?

Looks like the GitHub Actions Mac runner does not have any Metal devices available, according to https://github.com/malfet/deleteme/runs/6560871657?check_suite_focus=true#step:3:15

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78121
Approved by: https://github.com/albanD
2022-05-24 05:10:38 +00:00
Eddie Yan
c274f2ad52 [cuDNN V8 API] (reopen) Allow the number of kernels profiled under torch.backends.cudnn.benchmark = True to be limited (#77002)
(reopening due to botched merge)
The cuDNN V8 API (main support merged in https://github.com/pytorch/pytorch/pull/60755) potentially exposes many more kernels with benchmark=True. While these additional kernels can improve performance, it is often unnecessary to run every kernel returned by the heuristic and doing so may degrade the user experience by causing the first model iteration to be very slow. To alleviate this issue, this PR introduces torch.backends.cudnn.benchmark_limit. benchmark_limit specifies the maximum number of working cuDNN kernels to try for a given workload, with the default being 10 (similar to what TensorFlow does). benchmark_limit = 0 yields the current behavior of trying every kernel returned by the heuristic.

CC @ptrblck @ngimel @xwang233
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77002
Approved by: https://github.com/ngimel
2022-05-24 00:11:47 +00:00
Kulin Seth
f348b1b2b5 Add the Runtime components for MPS backend. (#76725)
The PR adds the runtime components and a few basic operations, like copy and as_strided, for the MPS backend.

Current list of identified TODOs:

-  https://github.com/pytorch/pytorch/issues/77176
- Unify the logic with CUDACachingAllocator and remove redundant code.
-  https://github.com/pytorch/pytorch/issues/77170
- Look into using C++ smart pointers where possible with ObjC code
- Use empty_strided_generic() to implement the `empty_strided_mps` code
- https://github.com/pytorch/pytorch/issues/77144
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76725
Approved by: https://github.com/albanD
2022-05-11 17:19:45 +00:00
PyTorch MergeBot
1467e0dd5d Revert "Deprecate torch.lu"
This reverts commit a5bbfd94fb.

Reverted https://github.com/pytorch/pytorch/pull/73804 on behalf of https://github.com/malfet
2022-05-09 19:06:44 +00:00
lezcano
a5bbfd94fb Deprecate torch.lu
**BC-breaking note**:

This PR deprecates `torch.lu` in favor of `torch.linalg.lu_factor`.
An upgrade guide is added to the documentation for `torch.lu`.

Note this PR DOES NOT remove `torch.lu`.
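
For readers hitting the deprecation warning, a minimal before/after sketch (the full wording lives in the `torch.lu` docs):

```python
import torch

A = torch.randn(3, 3)

# Deprecated, but still available after this PR:
LU, pivots = torch.lu(A)

# Replacement:
LU, pivots = torch.linalg.lu_factor(A)
```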

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73804

Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry
2022-05-05 19:17:11 +00:00
Kurt Mohler
5375b2e994 Resolve int[]? arguments to new OptionalIntArrayRef class
This PR uses the `OptionalArrayRef` template class that was drafted in #64084.

Fixes #44409
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70864
Approved by: https://github.com/ezyang
2022-03-26 01:45:50 +00:00
Tao Xu
06ff4f570c [Core ML] Support enumerated input shapes (#74441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74441

For xirp based segmentation models, we want to support enumerated input shapes. This allows us to support both landscape and portrait mode images without sacrificing the performance. P488118264
ghstack-source-id: 151736964

Test Plan: `buck run coreml:xirp -- --model="/home/taox/xirp/xirp_20a.pt" --out="/home/taox/xirp/xirp_20a_coreml_enumerated.ptl"`

Reviewed By: mcr229

Differential Revision: D34803184

fbshipit-source-id: c462c0783846a1489ca7ce4d5a654aa6927c9c44
(cherry picked from commit 67d418c97531daaf3d03d1000ca4a4ff60de2a95)
2022-03-21 21:32:24 +00:00
Weiwen Xia
060f1b822a Add onednn quant backend (#74137)
Summary:
Resolve the conflicts in https://github.com/pytorch/pytorch/pull/69820
jerryzh168 Please review. Thanks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74137

Reviewed By: samdow

Differential Revision: D34840477

Pulled By: jerryzh168

fbshipit-source-id: 8aa60981ff7be211a1609644f273b16d18efd425
(cherry picked from commit de76bb808b315e9a2e45d8c5f1c1233a47d669c4)
2022-03-15 01:28:21 +00:00
Jerry Zhang
5a897536f3 Revert D33716039: [pytorch][PR] Add ONEDNN quantization backend
Test Plan: revert-hammer

Differential Revision:
D33716039 (989b24855e)

Original commit changeset: 6f7bb807e857

Original Phabricator Diff: D33716039 (989b24855e)

fbshipit-source-id: ed233c5b99d4edb7d5a9d6c600825c78555f16d0
(cherry picked from commit d3e1f825b06ef67adb13623ccb7cbf1b700c1dd5)
2022-03-11 22:06:25 +00:00
Xia Weiwen
989b24855e Add ONEDNN quantization backend (#69820)
Summary:
This PR adds a new quantization backend, ONEDNN, with quantized conv and linear kernels in the same code path as the FBGEMM backend.

The ONEDNN backend is an alternative to the FBGEMM and QNNPACK backends. It takes advantage of features of the latest Intel® CPU products. It supports VNNI on Cascade Lake and the AMX instruction set, which will be available on Sapphire Rapids and offers 8X int8 peak TOPS over VNNI.

ONEDNN demonstrates better performance on conv kernels of popular CNN models than FBGEMM. It also supports more fused ops, such as convolution-add-ReLU, than FBGEMM and QNNPACK.
To use this backend, users only need to set the quantization backend to 'onednn' before any calculation, without a single change to their models.
```python
torch.backends.quantized.engine = 'onednn'
```
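
For context, a rough sketch of how this slots into the eager-mode static quantization flow (assuming `get_default_qconfig` accepts the 'onednn' string after this change):

```python
import torch
import torch.ao.quantization as tq

torch.backends.quantized.engine = 'onednn'

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.conv = torch.nn.Conv2d(3, 16, 3)
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))

m = M().eval()
m.qconfig = tq.get_default_qconfig('onednn')  # assumption: 'onednn' qconfig is registered by this PR
tq.prepare(m, inplace=True)
m(torch.randn(1, 3, 32, 32))                  # calibrate
tq.convert(m, inplace=True)                   # quantized conv now dispatches to the onednn kernels
```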

## Design docs
https://github.com/pytorch/pytorch/issues/21120#issuecomment-562371983
https://github.com/pytorch/pytorch/pull/67177#issuecomment-963787096

## File changes
**Add ONEDNN to qengine list**
- aten/src/ATen/Context.cpp
- c10/core/QEngine.h
- torch/ao/quantization/qconfig.py
- torch/backends/quantized/\_\_init\_\_.py

**Implement qconv & qlinear for ONEDNN backend**
- aten/src/ATen/native/quantized/cpu/conv_serialization.h
- aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp
- aten/src/ATen/native/quantized/cpu/onednn_utils.h
- aten/src/ATen/native/quantized/cpu/qconv.cpp
- aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qconv_unpack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_unpack.cpp

**Skip tests that are not supported by ONEDNN**
- test/ao/sparsity/test_kernels.py
- test/quantization/core/test_quantized_module.py
- test/quantization/core/test_quantized_op.py

## Validation results
This PR has passed `test_quantization.py` and `test_mkldnn.py`.
Below are performance data of int8 2d convolution and linear on the Cascade Lake Xeon® platform:
(Note: Tested with single instance on single core. Using the latest oneDNN library.)

**Table 1. Performance comparison of int8 2d convolution operator**
|No.|	Shape|	FBGEMM|	ONEDNN|	Gain|
|-|-|-|-|-|
|1|	IC=128, OC=128, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0|	668.310us|	535.630us|	24.8%|
|2|	IC=128, OC=128, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0|	290.630us|	281.810us|	3.1%|
|3|	IC=128, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0|	1.045ms|	893.010us|	17.0%|
|4|	IC=128, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0|	385.320us|	373.720us|	3.1%|
|5|	IC=256, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0|	1.876ms|	1.641ms|	14.3%|
|6|	IC=256, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0|	660.460us|	638.470us|	3.4%|

**Table 2. Performance comparison of int8 linear operator**
|No.|	Shape (m, n, k)|	FBGEMM|	ONEDNN|	Gap|
|-|-|-|-|-|
|1|	64, 800, 320|	80.550us|	96.770us|	20.10%|
|2|	64, 768, 512|	101.230us|	130.720us|	29.10%|
|3|	16, 256, 512|	30.230us|	51.450us|	70.20%|
|4|	128, 128, 128|	33.810us|	50.480us|	49.30%|
|5|	256, 512, 256|	154.490us|	195.050us|	26.30%|
|6|	1024, 1024, 1024|	3.134ms|	3.514ms|	12.10%|

ONEDNN showed advantages over FBGEMM for convolution. However, it has a performance gap relative to FBGEMM for linear ops. The gap is a known issue and further optimization is in progress in the oneDNN library. On the latest platforms, ONEDNN achieves better performance for both conv and linear.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69820

Reviewed By: HDCharles

Differential Revision: D33716039

Pulled By: jerryzh168

fbshipit-source-id: 6f7bb807e85798142dfcffccfca8b8bd652fb3dd
(cherry picked from commit 91526b373560f42ba0ad307f9cccfc0eb5218b1f)
2022-03-11 20:31:49 +00:00
lkct
7d542a4f2b Fix type annotation for torch.backends.cudnn.allow_tf32 (#72757)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/72753

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72757

Reviewed By: samdow

Differential Revision: D34204436

Pulled By: ngimel

fbshipit-source-id: 3528efd7bdf72c1d9338806555ecb643ab94ffeb
(cherry picked from commit 7036c2e6e6)
2022-02-14 17:26:37 +00:00
Akshit Khurana
a70297e7cb NNAPI: quant logistic fix (#70847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70847

NNAPI needs a fixed zero point and scale for sigmoid (logistic)
ghstack-source-id: 146555935

Test Plan: LIBNEURALNETWORKS_PATH="/path/to/libneuralnetworks.so" pytest test/test_nnapi.py

Reviewed By: dreiss

Differential Revision: D33237918

fbshipit-source-id: 05ef3a81bf1589ad44b599a19bce4066531c432b
2022-01-07 13:36:33 -08:00
Akshit Khurana
44283c2766 NNAPI: Add qint16 support via int16 (#70621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70621

PyTorch doesn't have support for qint16 yet. Add an option to handle qint16 via int16 & qint32 data types.

* For qint16 tensors in NNAPI, the user sends a qint32 tensor. We convert the qint32 to int16 for the converter and set the zero point and scale for nnapi
    * inputs to the model have to have fixed scale and zero point and are only supported for testing
* Added a flag use_int16_for_qint16, which will be used to maintain backwards compatibility in the converter when true qint16 is supported in PyTorch
ghstack-source-id: 146507483

Test Plan: pytest test/test_nnapi.py

Reviewed By: dreiss

Differential Revision: D33285124

fbshipit-source-id: b6376fa1bb18a0b9f6a18c545f600222b650cb66
2022-01-04 23:12:38 -08:00
Akshit Khurana
1150046d29 NNAPI: Add runtime flexible shapes & return shapes (#70334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70334

* Use 0 for load time flexible shapes
* -1 for runtime flexible shapes
* NNAPI needs return shapes for flexible outputs

Test Plan: Tested via upcoming ops

Reviewed By: dreiss

Differential Revision: D33237922

fbshipit-source-id: 50afdd8e3c6401dfb79b4bc09513c9882a09e5d5
2022-01-04 08:37:09 -08:00
Akshit Khurana
d9106116aa nnapi: Add int32 type torchscript expressions (#70197)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70197

Test Plan:
* `pytest test/test_nnapi.py`
* Testing via ops following this commit

Reviewed By: anshuljain1, dreiss

Differential Revision: D33237917

fbshipit-source-id: f0493620f28a62ad9fe0b97b67d1e25059d50c24
2022-01-03 19:00:38 -08:00
Xiao Wang
bfe5ad28e6 [Linalg] Add a runtime switch to let pytorch prefer a backend impl in linalg functions on GPU (#67980)
Summary:
Per title.

This PR introduces a global flag that lets pytorch prefer one of the many backend implementations while calling linear algebra functions on GPU.

Usage:
```python
torch.backends.cuda.preferred_linalg_library('cusolver')
```

Available options (str): `'default'`, `'cusolver'`, `'magma'`.
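
A small sketch of how one might compare backends for a particular workload (assumes a CUDA build with both cuSOLVER and MAGMA available):

```python
import torch

A = torch.randn(1024, 1024, device='cuda')
A = A @ A.T + 1024 * torch.eye(1024, device='cuda')  # symmetric positive definite

for backend in ('default', 'cusolver', 'magma'):
    torch.backends.cuda.preferred_linalg_library(backend)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize(); start.record()
    torch.linalg.cholesky(A)
    end.record(); torch.cuda.synchronize()
    print(f"{backend}: {start.elapsed_time(end):.3f} ms")
```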

Issue https://github.com/pytorch/pytorch/issues/63992 inspired me to write this PR. No heuristic is perfect on all devices, library versions, matrix shapes, workloads, etc. We can obtain better performance if we can conveniently switch linear algebra backends at runtime.

Performance of linear algebra operators after this PR should be no worse than before. The flag is set to **`'default'`** by default, which makes everything the same as before this PR.

The implementation of this PR is basically following that of https://github.com/pytorch/pytorch/pull/67790.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67980

Reviewed By: mruberry

Differential Revision: D32849457

Pulled By: ngimel

fbshipit-source-id: 679fee7744a03af057995aef06316306073010a6
2021-12-03 19:06:30 -08:00
eqy
790763b0fe Add an option to disable reduced precision reductions for FP16 GEMM (#67946)
Summary:
https://github.com/pytorch/pytorch/issues/67578 disabled reduced precision reductions for FP16 GEMMs. After benchmarking, we've found that this has substantial performance impacts for common GEMM shapes (e.g., those found in popular instantiations of multiheaded-attention) on architectures such as Volta. As these performance regressions may come as a surprise to current users, this PR adds a toggle, `torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction`, to disable reduced precision reductions
rather than making disabled reductions the default behavior.

CC ngimel ptrblck
stas00 Note that the behavior after the previous PR can be replicated with
`torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False`
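
A minimal sketch of the two settings:

```python
import torch

a = torch.randn(4096, 4096, device='cuda', dtype=torch.float16)
b = torch.randn(4096, 4096, device='cuda', dtype=torch.float16)

# Default after this PR: reduced-precision reductions stay enabled for speed.
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = True
fast = a @ b

# Replicate the stricter accumulation behavior from #67578:
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False
accurate = a @ b
```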

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67946

Reviewed By: zou3519

Differential Revision: D32289896

Pulled By: ngimel

fbshipit-source-id: a1ea2918b77e27a7d9b391e030417802a0174abe
2021-11-09 17:27:20 -08:00
Akshit Khurana
1de8976e85 Add quantized::convtranspose2d (#63914)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63914

Test Plan: Imported from OSS

Reviewed By: dreiss

Differential Revision: D30531889

fbshipit-source-id: a65e389da2722efbc62e3fe1edf503732326350d
2021-09-24 17:07:29 -07:00
Akshit Khurana
ab5eb56983 add qmul (#63913)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63913

Test Plan: Imported from OSS

Reviewed By: dreiss

Differential Revision: D30531890

fbshipit-source-id: 29d88cc61bd1e328cc7ae7a91a2f8d4819803c8d
2021-09-24 17:06:17 -07:00
Tao Xu
7dc3858deb [CoreML][fbcode] Add the preprocess python APIs (#64521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64521

Add the preprocess part for the coreml delegate. Check out the `example.py` for the usage.
ghstack-source-id: 138324214

Test Plan:
```
(base) [taox@devvm2780.vll0 ~/fbsource/fbcode/caffe2/fb]  buck run coreml:example -- --model="/home/taox/mobilenetv2/mobilenetv2.pt" --out="/home/taox/mobilenetv2/mobilenetv2_coreml.pt"
Parsing buck files: finished in 0.5 sec
Downloaded 0/1 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 10.6 sec (100%) 12611/57623 jobs, 1/57623 updated
  Total time: 11.1 sec
Converting Frontend ==> MIL Ops: 100%|██████████████████████████████████████████▉| 382/383 [00:00<00:00, 692.58 ops/s]
Running MIL optimization passes: 100%|███████████████████████████████████████████| 18/18 [00:00<00:00, 45.55 passes/s]
Translating MIL ==> MLModel Ops: 100%|███████████████████████████████████████████| 704/704 [00:01<00:00, 468.56 ops/s]
input {
  name: "input_0"
  type {
    multiArrayType {
      shape: 1
      shape: 3
      shape: 224
      shape: 224
      dataType: FLOAT32
    }
  }
}
output {
  name: "645"
  type {
    multiArrayType {
      dataType: FLOAT32
    }
  }
}
metadata {
  userDefined {
    key: "com.github.apple.coremltools.source"
    value: "torch==1.10.0a0+fb"
  }
  userDefined {
    key: "com.github.apple.coremltools.version"
    value: "4.1"
  }
}

{'inputs': '[["input_0", "0", "[1, 3, 224, 224]"]]', 'outputs': '[["645", "0", "[1, 1000]"]]', 'config': '{"spec_ver": "4", "backend": "cpu", "allow_low_precision": "True"}', 'metadata': '{"coremltool_ver": "4.1", "torch_ver": "torch==1.10.0a0+fb"}'}
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0826 13:27:12.690302 2477051 backend_detail.cpp:376] Warning: Backend [coreml] is not available. Execution of this Module is still possible by saving and loading on a device where the backend is available. (function codegen_backend_module)
graph(%self.1 : torch.jit.LoweredModule.coreml.__torch__.torchvision.models.mobilenetv2.MobileNetV2,
      %x.1 : Tensor):
  %51 : str = prim::Constant[value="Exception: Backend is not available."]()
  %50 : str = prim::Constant[value="AssertionError: "]()
  %14 : str = prim::Constant[value="forward"]() # <string>:5:62
  %48 : Tensor = prim::Uninitialized()
  %44 : Tensor = prim::Uninitialized()
  %typed_inputs.1 : Any[] = prim::ListConstruct(%x.1)
  %__backend.3 : __torch__.torch.classes.__backends__.coreml = prim::GetAttr[name="__backend"](%self.1)
  %8 : bool = prim::CallMethod[name="is_available"](%__backend.3) # <string>:4:19
  %49 : Tensor = prim::If(%8) # <string>:4:16
    block0():
      %__backend : __torch__.torch.classes.__backends__.coreml = prim::GetAttr[name="__backend"](%self.1)
      %__handles : Dict(str, Any) = prim::GetAttr[name="__handles"](%self.1)
      %15 : Any = aten::__getitem__(%__handles, %14) # <string>:5:47
      %17 : Any[] = prim::CallMethod[name="execute"](%__backend, %15, %typed_inputs.1) # <string>:5:24
      %18 : Any = prim::ListUnpack(%17)
      %20 : bool = prim::isinstance[types=[Tensor]](%18)
      %39 : Tensor = prim::If(%20) # <string>:6:18
        block0():
          %22 : Tensor = prim::unchecked_cast(%18)
          -> (%22)
        block1():
           = prim::RaiseException(%50) # <string>:6:18
          -> (%44)
      -> (%39)
    block1():
       = prim::RaiseException(%51) # <string>:9:18
      -> (%48)
  return (%49)

```

Reviewed By: raziel

Differential Revision: D30585154

fbshipit-source-id: 66c7d2e931be6eaa3c43a0ee131ea8046452449d
2021-09-17 00:25:14 -07:00
Akshit Khurana
2d58f3f56d NNAPI: Support const values in binary ops
Summary:
The NNAPI converter previously failed when a binary op combined one constant value with one tensor.
Code suggestions from dreiss

Test Plan:
pytest test/test_nnapi.py::TestNNAPI::test_pointwise_binary

Imported from OSS

Reviewed By: anshuljain1

Differential Revision: D28893881

fbshipit-source-id: 59240373fb03c6fdafa4cb2fa4d8408dd20092f6
2021-08-20 21:10:26 -07:00
Amy He
73f1e2d1dc [8/N] Nnapi backend delegation preprocess: New refactored design (#62225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62225

Rewrote the preprocess function for Android NNAPI delegate.
Previously, `preprocess()` called `convert_model_to_nnapi()` using Pybind and returned a NnapiModule that is serialized for mobile. Now, `preprocess()` calls a sub-function of `convert_model_to_nnapi()` and returns several preprocessed items (that were previously components of NnapiModule).

Dictionary returned contains:
```
"shape_compute_module": torch::jit::Module,
"ser_model": torch::Tensor,
"weights": List[torch.Tensor],
"inp_mem_fmts": List[int],
"out_mem_fmts": List[int]
```

**Purpose and Future:**
The purpose of these changes is to move more of the implementation from bytecode and TorchScript to the delegate API, since bytecode is less efficient.
Now, only the shape computation uses bytecode. In the future, shape computation will be moved out of Torchscript as well.

**nnapi_backend_preprocess.cpp:** preprocess implementation
**prepare.py**: refactored a portion of `convert_model_to_nnapi()` to `process_for_nnapi()`, so preprocess can get components of NnapiModule

**Test:**
Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully
ghstack-source-id: 134444190

Test Plan: Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully

Reviewed By: raziel

Differential Revision: D29922279

fbshipit-source-id: cadcf8908d8a745dc7abbe286e97d6ead937d4ab
2021-07-27 18:52:48 -07:00
Akshit Khurana
8e71f48f0a Handle simple NNAPI flatten NHWC case (#61796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61796

We can easily handle NNAPI conversion for NHWC inputs
that have 1 channel or where H and W are both 1.

Test Plan:
pytest test/test_nnapi.py::TestNNAPI::test_flatten

Imported from OSS

Reviewed By: saketh-are

Differential Revision: D29827735

fbshipit-source-id: 65dee4b42fceef1b032bf5dd1c4cc6e020d01e14
2021-07-26 10:59:04 -07:00
Akshit Khurana
a3670ba377 Add option to specify custom NNAPI serializer (#61025)
Summary:
To add a serializer for custom ops, we can subclass the default serializer
and update ADDER_MAP

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61025

Test Plan:
* pytest test/test_nnapi.py::TestNNAPI for current serializer
* Custom serializers to be tested with custom ops

Imported from OSS

Reviewed By: anshuljain1

Differential Revision: D29480745

fbshipit-source-id: 37e3f8de3c97f6c8a486f9879ce11430ea89af34
2021-07-09 15:27:10 -07:00
Akshit Khurana
ae65f63971 Make nnapi flatten converter accept flex inputs (#61024)
Summary:
As title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61024

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_flatten

Reviewed By: anshuljain1

Differential Revision: D29480748

fbshipit-source-id: c334b09600a64d3e552cec843d6da3de28e7d27c
2021-07-09 15:27:02 -07:00
Akshit Khurana
76c0f223d3 Make nnapi cat converter accept flex inputs
Summary: As title

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_cat

Reviewed By: anshuljain1

Differential Revision: D29480747

fbshipit-source-id: 161803054ff1a4c2c750fc30a5f0fc6d8a24b2c9
2021-07-09 14:27:53 -07:00
Akshit Khurana
9e81d3d869 Make NNAPI linear converter accept flex inputs (#61022)
Summary:
As title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61022

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_linear

Reviewed By: anshuljain1

Differential Revision: D29480749

fbshipit-source-id: 35975861740298c9e16f866c939e7ee3c2151710
2021-07-09 14:27:51 -07:00
Akshit Khurana
9e533a62f6 Make conv2d nnapi converter accept flexible batch (#61021)
Summary:
Same as title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61021

Test Plan: pytest test/test_nnapi.py::TestNNAPI

Reviewed By: anshuljain1

Differential Revision: D29480746

fbshipit-source-id: 7217c8f3a811db8c3c373f3e7ca31caf9502ef22
2021-07-09 10:28:10 -07:00
Akshit Khurana
8bd3e52e00 Add conv2d transpose NNAPI converter (#59529)
Summary:
* Conv2d transpose support
* Quantize WIP

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59529

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_conv2d_transpose

Reviewed By: anshuljain1

Differential Revision: D28926335

fbshipit-source-id: 8f90182f96cee0a13c4f38331d421e1e8ac618de
2021-07-09 09:29:20 -07:00
Ivan Kobzarev
7b6ddb6793 [nnapi] add log_softmax (#61378)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61378

Test Plan: Imported from OSS

Reviewed By: axitkhurana

Differential Revision: D29597355

Pulled By: IvanKobzarev

fbshipit-source-id: 55124749f8eeffa2b2713f7cffd5ccf965561de1
2021-07-07 18:28:39 -07:00
Akshit Khurana
baa518e2f6 Add Int32 support for NNAPI (#59365)
Summary:
Support Int32 tensors in NNAPI converter

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59365

Test Plan: Local testing with FB prod models

Reviewed By: anshuljain1

Differential Revision: D28881040

fbshipit-source-id: 2dacceffd322a21d91bfefcf2fb2ea400d952d0d
2021-07-07 12:40:49 -07:00
Akshit Khurana
cf285d8eea Add aten::slice NNAPI converter (#59364)
Summary:
Add support for aten::slice op in the NNAPI model converter

* If start = 0; end = max -> identity
* Flexible shapes can be passed through
* Flexible shapes can't be sliced over

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59364

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_slice

Reviewed By: anshuljain1

Differential Revision: D28881039

fbshipit-source-id: 3c1c630ff27b5bba6eda403d87570c61d43ae90e
2021-07-07 12:40:47 -07:00
Akshit Khurana
d26372794a Add aten::detach NNAPI converter (#58543)
Summary:
* Add support for aten::detach op in the NNAPI model converter as a no-op
* Also add flexible op support for add_pointwise_simple_unary_op

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58543

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_detatch

Reviewed By: anshuljain1

Differential Revision: D28531942

fbshipit-source-id: 4387dbbbadd8ce6b690841f3a903e68a380b849d
2021-07-07 12:40:46 -07:00
Akshit Khurana
0be228dd5f Add aten::flatten NNAPI converter (#60885)
Summary:
Add support for the aten::flatten op in the NNAPI model converter. Load-time
variable size support isn't available, since shapes are passed as inputs to the NNAPI op.

Runtime variable size support is to be added soon.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60885

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_flatten

Reviewed By: anshuljain1

Differential Revision: D29451725

fbshipit-source-id: 8902745f7758c8cc88ad4b4ce02b8301ff894bd4
2021-07-07 12:40:44 -07:00
Akshit Khurana
b297f65b66 Add aten::div NNAPI converter (#58541)
Summary:
Add support for aten::div op in the NNAPI model converter. Add variable
size input test as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58541

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_div

Reviewed By: anshuljain1

Differential Revision: D28531943

fbshipit-source-id: e96342146f6de216f7b88443618edfc54963747c
2021-07-07 12:40:42 -07:00
Akshit Khurana
eab18a9a40 Add aten::to NNAPI converter (#58540)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58540

Add support for aten::to op in the NNAPI model converter for simple
cases like to("cpu"), to("gpu")

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_to

Reviewed By: anshuljain1

Differential Revision: D28531941

fbshipit-source-id: 0c934f7aceaff2669307c3426efe32046d8c44f3
2021-07-07 12:40:41 -07:00
Akshit Khurana
14d604a13e Add aten::softmax NNAPI converter (#58539)
Summary:
Add support for aten::softmax op in the NNAPI model converter with
flexible size

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58539

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_softmax

Reviewed By: anshuljain1

Differential Revision: D28531946

fbshipit-source-id: 8633f3e3f7f52795f9866ff16ad0867ea36a19e8
2021-07-07 12:39:31 -07:00
Akshit Khurana
369802a504 Add aten::avgpool2d NNAPI converter (#58538)
Summary:
Add support for aten::avgpool2d op in the NNAPI model converter with var
size support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58538

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_avgpool2d

Reviewed By: anshuljain1

Differential Revision: D28531944

fbshipit-source-id: 43ff8c9389365698c282f204042b49c7ec84d824
2021-07-01 14:07:14 -07:00
Akshit Khurana
c4bb6a5781 NNAPI: flex size support for upsample_nearest2d op (#57563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57563

Add flexible size support for upsample_nearest2d op in nnapi model conversion

Test Plan:
pytest test/test_nnapi.py

Imported from OSS

Reviewed By: dreiss

Differential Revision: D28200847

fbshipit-source-id: 901fe3f6e68e4c16ece730f3ffa68dc88c6ed6c3
2021-05-05 13:54:43 -07:00
Akshit Khurana
4c609a9782 NNAPI: Add qadd flexible size support (#57562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57562

Add flexible size support for qadd op in nnapi model conversion

Test Plan:
pytest test/test_nnapi.py

Imported from OSS

Reviewed By: dreiss

Differential Revision: D28200849

fbshipit-source-id: d5b2ea8e9eb8ae405ff2c960f7549cef60bc0991
2021-05-05 13:54:41 -07:00
Akshit Khurana
28cd04ea64 NNAPI: add flexible size support for conv2d (#57561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57561

Add flexible size support for conv2d op in nnapi model conversion

Test Plan:
pytest test/test_nnapi.py

Imported from OSS

Reviewed By: dreiss

Differential Revision: D28200848

fbshipit-source-id: d94ccf48a3d8453aa8e96c7cac02948c4cd870cc
2021-05-05 13:53:33 -07:00
Guilherme Leobas
e7c79cb158 Add type annotations to nnapi (#48142)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48141

~Mypy is complaining about a missing arg in a function call.~
```bash
torch/backends/_nnapi/serializer.py:806: error: Too few arguments for "_do_add_binary"  [call-arg]
Found 1 error in 1 file (checked 1140 source files)
```

9392137dbe/torch/backends/_nnapi/serializer.py (L804-L806)

~dreiss, would you mind taking a look when you have some cycles to spare and see what would be the appropriate value for `fuse_code` here? Thanks :)~

Edit: https://github.com/pytorch/pytorch/issues/48925 got merged a couple of days ago. The blocking part is now unblocked, and I just pushed the changes to make mypy happy again. This PR is ready for review.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48142

Reviewed By: ezyang

Differential Revision: D28006249

Pulled By: walterddr

fbshipit-source-id: 5e43eeba7143512a549efaad31541f86718add7c
2021-04-26 19:08:07 -07:00
Sam Estep
75024e228c Add lint for unqualified type: ignore (#56290)
Summary:
The other half of https://github.com/pytorch/pytorch/issues/56272.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290

Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed:

- https://github.com/pytorch/pytorch/runs/2384511062
- https://github.com/pytorch/pytorch/actions/runs/765036024

Reviewed By: seemethere

Differential Revision: D27867219

Pulled By: samestep

fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235
2021-04-21 08:07:23 -07:00
David Reiss
da7a27b847 [NNAPI] Initial flexible size support (#54701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54701

We need NNAPI models to support inputs (and, by extension, intermediate
values and outputs) whose shape is only determined at load time.  For
example, a vision models input shape might be dependent on the aspect
ratio of the device camera.  While NNAPI has full support for variable
shapes (by setting components of the operand shape to 0), the guidance
we have received is that vendor-provided drivers for real hardware are
not able to support this efficiently.  Therefore, we take a hybrid
approach where shapes are calculated at model load time to
semi-dynamically construct our NNAPI model.  While this doesn't let us
have truly dynamic input shapes, it does allow us to ensure that the
vendor driver only sees fixed shapes, so we get maximum performance.

In this initial commit, only PReLU supports dynamic shapes.  Additional
operators will be converted in separate diffs.

- In order to convert a flexible-shape model, the user supplies inputs
  with shapes containing dimensions of size 0 for the flexible
  dimensions.
- During conversion, we generate code to compute the shapes of all
  intermediates and outputs as a function of the input shapes.
- We no longer run the input model to produce the output templates.
  Instead, we generate code to return properly-sized templates, given
  the input shapes.
- All of this generated code goes into a "ShapeComputeModule" that is
  used by the NnapiModule during initialization.
- The ShapeComputeModule mutates the serialized model to fill in the
  computed sizes for each operand.  This requires us to change the dtype
  for the serialized model to int32, but this should be fine because
  everything in it is already 4-byte aligned.
- NnapiInitWrapper no longer exists.  Instead, initialization is
  performed on the first run, based on the real arguments.  We plan to
  provide an API for doing eager initialization.
- Unit test updated to allow separate arguments to be given for trace,
  conversion, and inference.  A flexible-shape test case was added for
  PReLU.
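
Putting the pieces above together, a hedged sketch of what a flexible-shape conversion could look like, assuming `convert_model_to_nnapi` remains the conversion entry point (shapes and the PReLU model are illustrative):

```python
import torch
from torch.backends._nnapi.prepare import convert_model_to_nnapi

model = torch.jit.trace(torch.nn.PReLU(), torch.randn(1, 16, 128, 128))

# Size-0 dimensions mark the flexible (load-time determined) dimensions;
# the generated ShapeComputeModule fills in the real sizes on the first run.
flexible_input = torch.zeros(1, 16, 0, 0)
nnapi_model = convert_model_to_nnapi(model, flexible_input)
```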

Test Plan: Unit test

Reviewed By: axitkhurana

Differential Revision: D27536796

Pulled By: dreiss

fbshipit-source-id: 105585f247987b1e6ec6946a6fe44401237cb0a0
2021-04-06 13:49:43 -07:00
David Reiss
1e3b3a4714 [NNAPI] Create get_next_operand_id (#54700)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54700

This is an internal method just to make it more clear what
len(self.operands) is doing.

Test Plan: Unit test

Reviewed By: axitkhurana

Differential Revision: D27536794

Pulled By: dreiss

fbshipit-source-id: 678cee8a47df6757dd2e6feabf2560fd82d32e26
2021-04-06 13:49:41 -07:00
David Reiss
ca67c17e46 [NNAPI] Add fixed-size assertions (#54699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54699

We'll soon be adding support for flexible-size tensors to the NNAPI
converter, but it won't be added to all ops at once.  Create
get_tensor_operand_by_jitval_fixed_size as a wrapper for
get_tensor_operand_by_jitval that verifies that the argument has a fixed
shape.  Update all call sites.  As flexible size support is added to
each op, the call sites can be converted back and proper size checks
added.

Test Plan: Unit test

Reviewed By: axitkhurana

Differential Revision: D27536791

Pulled By: dreiss

fbshipit-source-id: 6fb1fea814d767b6ff263fd8b88240a51be74777
2021-04-06 13:49:38 -07:00
David Reiss
5936faee7e [NNAPI] Rename local variable (#54698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54698

"mf" was short for memory format, but the concept that this variable
represents was renamed to "dim_order", so rename the variable.

Test Plan: Unit test

Reviewed By: axitkhurana

Differential Revision: D27536793

Pulled By: dreiss

fbshipit-source-id: 2b31c70da1ff221a7833e67486690fa606f01dea
2021-04-06 13:49:35 -07:00
David Reiss
1f1d26137b [NNAPI] Use code generation to better support list input/output (#54697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54697

Previously, models being converted to NNAPI were expected to take inputs
as separate arguments, but the generated NNAPI model could only take
multiple inputs as a list.  Now the generated model always takes inputs
(single or multiple) as separate tensor arguments.

Previously, models being converted to NNAPI were expected to return
outputs as a single tensor or tuple of tensors, but the generated NNAPI
model would return multiple outputs as a list. Now the generated model
returns a tuple as well (or single tensor).

Internally, we decide what output format to use (single tensor or tuple)
based on the conversion process, rather than by running the model.

Test Plan: Unit test

Reviewed By: axitkhurana

Differential Revision: D27536790

Pulled By: dreiss

fbshipit-source-id: c0f93c85d450757e568985947cc2f32043795859
2021-04-06 13:49:33 -07:00
David Reiss
d34d6244e7 [NNAPI] Use array instead of struct for serializing ints (#54696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54696

This was originally developed for a Python version where array was not
available.

Test Plan: Unit test

Reviewed By: axitkhurana

Differential Revision: D27536792

Pulled By: dreiss

fbshipit-source-id: 39e5507e37d4f91871113439fe752a4d5373eaba
2021-04-06 13:49:30 -07:00
David Reiss
476c597ae6 [NNAPI] Handle binary ops combining NHWC+NCHW in some cases (#48812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48812

This came up in a squeeze-and-excitation model.  Starting with an NHWC
tensor T, we perform a mean operation across H and W, giving an NxC
tensor, which (after some fully connected layers) is reshaped to
NxCx1x1, then multiplied with T.  To handle this, we detect the specific
case of a binary op with one NHWC input and one contiguous input with
H,W == 1,1 and allow the op to be applied (after transposing the
contiguous input).

Test Plan: Unit test.

Reviewed By: axitkhurana

Differential Revision: D25317939

Pulled By: dreiss

fbshipit-source-id: b4c17ab3b874d1a7defa04664010ba82115f1c20
2021-04-06 13:49:25 -07:00
David Reiss
b057d27b0b [NNAPI] Add support for unsqueeze, cat, and mean (#48811)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48811

Test Plan: Unit tests.

Reviewed By: axitkhurana

Differential Revision: D25317936

Pulled By: dreiss

fbshipit-source-id: 9b3a0a75b8157ae35ac13d52293a67800bad0ded
2021-04-06 13:49:22 -07:00
David Reiss
8fcf9ca341 [NNAPI] Update support for Linear (#54695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54695

Previously, torch.nn.Linear was calling aten::addmm internally.  Now
it's calling aten::linear, so add support for that.

Test Plan: Unit test

Reviewed By: axitkhurana

Differential Revision: D27536795

Pulled By: dreiss

fbshipit-source-id: 42c8d2a80b20ac12ed9bba599c5e0e874256bb13
2021-04-06 13:49:17 -07:00
David Reiss
8d960f7043 [NNAPI] Fix hardtanh (#47520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47520

NNAPI defines "RELU1" as clamping from [-1, 1], not [0, 1] as I
previously assumed.  Fix our implementation to match.

Test Plan: Upcoming unit test.

Reviewed By: axitkhurana

Differential Revision: D25317934

Pulled By: dreiss

fbshipit-source-id: 70efd5bb6092b0628ff6b765ce6f6274ef28d741
2021-04-06 13:49:14 -07:00
David Reiss
beca1fdbec [NNAPI] Fix MUL op (#47519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47519

This wasn't updated when _do_add_binary was refactored.

Test Plan: Upcoming unit test.

Reviewed By: axitkhurana

Differential Revision: D25317938

Pulled By: dreiss

fbshipit-source-id: 99212404c189481cfa692dd77d8f7c7865b6872b
2021-04-06 13:49:12 -07:00
David Reiss
38a3c28f17 [NNAPI] Remove solid weights support (#47518)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47518

This was left over from an old version of the code.  The idea was that
instead of indexing into separate tensors for each weight, you could
bundle them all into a single file and use different offsets into that
file.  With the current design, this is nontrivial to support, so drop
the code for now.

Test Plan: CI

Reviewed By: axitkhurana

Differential Revision: D25317935

Pulled By: dreiss

fbshipit-source-id: e26ab3a8d437cb1bbb50319209fa56d9c571ce61
2021-04-06 13:49:09 -07:00
David Reiss
1be909f074 [NNAPI] Fix models with no weights (#47517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47517

While we're unlikely to see this in practice, it comes up in unit tests.
This type annotation is necessary for `torch.jit.script` to figure out
the type of the list if it is empty.

Test Plan: Unit tests in a later diff.

Reviewed By: axitkhurana

Differential Revision: D25317937

Pulled By: dreiss

fbshipit-source-id: de8b6665c6fcd3cd2b39e3c696a39336c064e4c1
2021-04-06 13:49:06 -07:00
Akshit Khurana
d0fd41dcfe Add size op in nnapi serializer (#52026)
Summary:
serializer didn't support aten::size

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52026

Test Plan: Torchvision Mobilenetv2 [script](https://pytorch.org/tutorials/prototype/nnapi_mobilenetv2.html) works. [Test](ecfed07cc5) to be merged after [this PR](https://github.com/pytorch/pytorch/pull/47521/files) is merged

Reviewed By: dreiss

Differential Revision: D26363133

Pulled By: axitkhurana

fbshipit-source-id: 772a6bea62bca69f8bba19c25c582a1734a70eb1
2021-02-10 15:57:01 -08:00
David Reiss
9a9383ef2e PyTorch NNAPI integration prototype (#46780)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46780

This is in prototype status, but pretty functional.  There are two major
parts.

- Model converter.  This is a pure Python component that consumes a
  model in TorchScript format, converts the operations into NNAPI
  semantics, and serializes the model in a custom format.  It then wraps
  the result in a new TorchScript model that can invoke NNAPI under the
  hood.
- Runtime.  This is a TorchBind object that deserializes the model and
  sends the result to NNAPI.  This is fairly simple since the serialized
  format is basically just a list of NNAPI calls to make, so most of the
  code is spent on bounds checking.

A few notes on the design.
- Currently, all tensor sizes need to be fixed, and those fixed sizes
  are burned directly into the serialized model.  This will probably
  need to change.  NNAPI supports variable-sized tensors, but the
  important hardware backends do not.  However, we're seeing use cases
  crop up where the input size is not known until around the time that
  the model is loaded (for example, it might depend on the camera aspect
  ratio).  I think the proper fix here is to remove the code in the
  converter that eagerly calculates the sizes of the intermediate
  tensors and replace it with a code generator that will generate some
  TorchScript code that will perform those calculations at model load
  time.  This way, we will be able to support models that have
  variable-sized inputs while still only showing fixed-sized operands to
  NNAPI.
- The important hardware backends want operands to be in NHWC order, but
  PyTorch natively represents all tensors as NCHW.  The strategy for
  this is to keep NCHW during most of the conversion process, but track
  an additional value per operand representing the "dimension order".
  The dimension order gets propagated through convolutions and pointwise
  ops.  When we're ready to serialize the model, we reorder the
  dimensions for "channels last" operands to NHWC.

Test Plan:
Some local testing with FB prod models.  I'll need to add some examples
and automated tests.

Reviewed By: iseeyuan

Differential Revision: D24574040

Pulled By: dreiss

fbshipit-source-id: 6adc8571b234877ee3666ec0c0de24da35c38a1f
2020-11-05 21:31:01 -08:00
Jane (Yuan) Xu
1c996b7170 Enable typechecking for torch.testing._internal.common_quantized.* (#44805)
Summary:
Addresses a subproblem of [Issue 42969](https://github.com/pytorch/pytorch/issues/42969)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44805

Reviewed By: malfet

Differential Revision: D23742754

Pulled By: janeyx99

fbshipit-source-id: e916a6a0c049cac318549a485d47f19363087d15
2020-09-17 14:24:32 -07:00
Xiang Gao
e48201c5cf Mention TF32 on related docs (#44690)
Summary:
cc: ptrblck

![image](https://user-images.githubusercontent.com/1032377/93168022-cbbfcb80-f6d6-11ea-8f6e-f2c8a15c5bea.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44690

Reviewed By: ngimel

Differential Revision: D23727921

Pulled By: mruberry

fbshipit-source-id: db7cc8e74cde09c13d6a57683129fd839863b914
2020-09-16 19:18:30 -07:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
Nikita Shulga
c44e4878ae Enable torch.backends.quantized typechecks (#44794)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44793

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44794

Reviewed By: walterddr

Differential Revision: D23734353

Pulled By: malfet

fbshipit-source-id: 491bd7c8f147759715eb296d7537a172685aa066
2020-09-16 12:21:20 -07:00
Gao, Xiang
5e97f251a8 Enable TF32 support for cuDNN (#40737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40737

Reviewed By: mruberry

Differential Revision: D22801525

Pulled By: ngimel

fbshipit-source-id: ac7f7e728b4b3e01925337e8c9996f26a6433fd2
2020-09-01 15:34:24 -07:00
Xiang Gao
23174ca71b [reland] Enable TF32 support for cuBLAS (#41498)
Summary:
fix rocm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41498

Reviewed By: mruberry

Differential Revision: D22560572

Pulled By: ngimel

fbshipit-source-id: 5ee79e96cb29e70d9180830d058efb53d1c6c041
2020-07-15 21:00:55 -07:00
Shen Li
3a63a939d4 Revert D22517785: [pytorch][PR] Enable TF32 support for cuBLAS
Test Plan: revert-hammer

Differential Revision:
D22517785 (288ece89e1)

Original commit changeset: 87334c893561

fbshipit-source-id: 0a0674f49c1bcfc98f7f88af5a8c7de93b76e458
2020-07-15 08:15:48 -07:00
Xiang Gao
288ece89e1 Enable TF32 support for cuBLAS (#40800)
Summary:
Benchmark on a fully connected network and torchvision models (time in seconds) on GA100:

| model              | batch size | forward(TF32) | forward(FP32) | backward(TF32) | backward(FP32) |
|--------------------|------------|---------------|---------------|----------------|----------------|
| FC 512-128-32-8    | 512        | 0.000211      | 0.000321      | 0.000499       | 0.000532       |
| alexnet            | 512        | 0.0184        | 0.0255        | 0.0486         | 0.0709         |
| densenet161        | 128        | 0.0665        | 0.204         | 0.108          | 0.437          |
| googlenet          | 256        | 0.0925        | 0.110         | 0.269          | 0.326          |
| inception_v3       | 256        | 0.155         | 0.214         | 0.391          | 0.510          |
| mnasnet1_0         | 512        | 0.108         | 0.137         | 0.298          | 0.312          |
| mobilenet_v2       | 512        | 0.114         | 0.294         | 0.133          | 0.303          |
| resnet18           | 512        | 0.0722        | 0.100         | 0.182          | 0.228          |
| resnext50_32x4d    | 256        | 0.170         | 0.237         | 0.373          | 0.479          |
| shufflenet_v2_x1_0 | 512        | 0.0463        | 0.0473        | 0.125          | 0.123          |
| squeezenet1_0      | 512        | 0.0870        | 0.0948        | 0.205          | 0.214          |
| vgg16              | 256        | 0.167         | 0.234         | 0.401          | 0.502          |
| wide_resnet50_2    | 512        | 0.186         | 0.310         | 0.415          | 0.638          |
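
A short sketch of the runtime toggle this enables (assuming the flag surfaces as `torch.backends.cuda.matmul.allow_tf32`, as in the current docs):

```python
import torch

a = torch.randn(8192, 8192, device='cuda')
b = torch.randn(8192, 8192, device='cuda')

torch.backends.cuda.matmul.allow_tf32 = True   # use TF32 tensor cores for FP32 matmuls (Ampere+)
fast = a @ b

torch.backends.cuda.matmul.allow_tf32 = False  # full FP32 precision
precise = a @ b
```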

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40800

Reviewed By: mruberry

Differential Revision: D22517785

Pulled By: ngimel

fbshipit-source-id: 87334c8935616f72a6af5abbd3ae69f76923dc3e
2020-07-14 13:21:10 -07:00
Shawn Zhong
21ba3b4f40 Fix torch.backends.cudnn mypy error (#38947)
Summary:
Fix https://github.com/pytorch/pytorch/issues/38410

![image](https://user-images.githubusercontent.com/6421097/82724121-74b26880-9c99-11ea-9b63-e92de2dccdf2.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38947

Differential Revision: D21765290

Pulled By: ezyang

fbshipit-source-id: 5d2b25f039a653c609d60cdaac4a7ac5812ae291
2020-06-03 10:55:43 -07:00
guol-fnst
42b2dee6c2 verbose unused in torch.backends.cudnn (#39228)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39228

Differential Revision: D21818455

Pulled By: ezyang

fbshipit-source-id: abf158f2d745fd135cd0966ee30d559cefa456c0
2020-06-01 09:08:03 -07:00
Ailing Zhang
7c13a07286 [Reland] Remove uses of type() part 2 (#38288)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/38140. It got reverted since it broke slow tests which were only run on the master branch (thanks mruberry!). Enabling all CI tests in this PR to make sure they pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38288

Reviewed By: mruberry

Differential Revision: D21524923

Pulled By: ailzhang

fbshipit-source-id: 3a9ecc7461781066499c677249112434b08d2783
2020-05-12 13:37:14 -07:00
Mike Ruberry
f6b1c046b6 Revert D21483808: [pytorch][PR] Remove uses of type() part 2
Test Plan: revert-hammer

Differential Revision:
D21483808

Original commit changeset: 12f5de6151ba

fbshipit-source-id: 2755fa97ae3f342ae88b1531acfa790772a27c17
2020-05-09 00:42:39 -07:00
Ailing Zhang
86d28706e0 Remove uses of type() part 2 (#38140)
Summary:
I'm mostly done with cleaning up the test/ folder. There are a bunch of remaining callsites, but they're "valid" in testing `type()` functionality. We cannot remove them until it's fully deprecated.
The next PR will mainly focus on moving some callsites to an internal API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38140

Differential Revision: D21483808

Pulled By: ailzhang

fbshipit-source-id: 12f5de6151bae59374cfa0372e827651de7e1c0f
2020-05-08 19:30:46 -07:00
Kimish Patel
4c30fc7238 Integrate XNNPACK with custom class for packing weights. (#34047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34047

This PR integrates the added xnnpack conv2d and linear op via
custom class registration for packed weights. The packed struct
is serializable.

Test Plan:
python test test/test_xnnpack_integration.py

Imported from OSS

Differential Revision: D20185657

fbshipit-source-id: fc7e692d8f913e493b293b02d92f4e78536d7698
2020-03-14 12:51:56 -07:00
Peter Bell
5fc5cf6571 Stop using ctypes to interface with CUDA libraries. (#33678)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33016, Continuation of https://github.com/pytorch/pytorch/issues/31160
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33678

Differential Revision: D20249187

Pulled By: ezyang

fbshipit-source-id: 172ce4a0fee7fbe01436a421d1af22ef6173b6ed
2020-03-11 07:22:46 -07:00
Jithun Nair
718c538ff9 Add ability to enable/disable MIOpen at runtime (#33118)
Summary:
1. Set `torch._C.has_cudnn` to `True` for ROCm
2. Make MIOpen invocations respect value of `cudnn_enabled` or `at::globalContext().userEnabledCuDNN()`
3. `torch/backends/cudnn/__init__.py`: Add hip-specific changes (use "hide whitespace changes" option to view simpler diff)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33118

Differential Revision: D19977719

Pulled By: bddppq

fbshipit-source-id: 64d4dd1d78afcf96201360d85b8be5950f96dfad
2020-02-20 10:47:57 -08:00
peter
b77c25dec0 Fix dll load logic for Python 3.8 on Windows (#32215)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31181 and https://github.com/pytorch/pytorch/pull/31162#discussion_r362495611.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32215

Differential Revision: D19501869

Pulled By: ezyang

fbshipit-source-id: 363824e52d2592ad968ecf1df345aa4c0daff915
2020-01-22 08:33:34 -08:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
Dmytro Dzhulgakov
764bf826e3 Remove fbgemm_is_cpu_supported in favor of torch.backends.quantized.supported_qengines (#26840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26840

Cleaning up top-level namespace. Also cosmetic changes to torch.backends.quantized

Test Plan: Imported from OSS

Differential Revision: D17604403

Pulled By: dzhulgakov

fbshipit-source-id: c55af277ea7319d962a82a6120f65ccd47a60abc
2019-09-27 13:45:15 -07:00
Supriya Rao
45391ccecb Update qengine flag in python to string (#26620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26620

This change updates torch.backends.quantized.engine to accept a string ("fbgemm"/"qnnpack"/"none" for now).
set_qengine and get_qengine return an int which represents the at::QEngine enum
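
After this change the Python-facing surface is just a string property (a minimal sketch):

```python
import torch

torch.backends.quantized.engine = 'qnnpack'  # or 'fbgemm' / 'none'
print(torch.backends.quantized.engine)       # -> 'qnnpack'
```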

Test Plan:
python test/test_torch.py

Imported from OSS

Differential Revision: D17533582

fbshipit-source-id: 5103263d0d59ff37d43dec27243cb76ba8ba633f
2019-09-23 17:56:50 -07:00
Jerry Zhang
8f50ea0f5c Add NoQEngine to QEngine and refactor the name of set/get qengine (#26471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26471

att

Test Plan:
.

Imported from OSS

Differential Revision: D17491215

fbshipit-source-id: 5790aa0113bfdbeeb838f3d1530397606ccaa1e9
2019-09-19 17:42:09 -07:00
Ailing Zhang
b1ecf4bc82 Revert D17464904: Add NoQEngine to QEngine and refactor the name of set/get qengine
Test Plan: revert-hammer

Differential Revision:
D17464904

Original commit changeset: d8f2cebb978f

fbshipit-source-id: 8feb86f7347f455eb51538ce7893d4a096ba0ba4
2019-09-18 20:04:58 -07:00
Jerry Zhang
4f7292f7ee Add NoQEngine to QEngine and refactor the name of set/get qengine (#26330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26330

att

Test Plan:
.

Imported from OSS

Differential Revision: D17464904

fbshipit-source-id: d8f2cebb978fcbc478bc7e111ba24bc71a6f8915
2019-09-18 19:38:59 -07:00
Supriya Rao
24d5b5f5f9 Add Runtime flag for quantized backend. (#25680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25680

Add a runtime flag to choose between FBGEMM and QNNPACK when compiled with both.

The flag can be set by using torch.backends.quantized.engine = torch.fbgemm/torch.qnnpack or ctx::setPreferredQuantizedEngine(at::QEngine)
ghstack-source-id: 89935643

Test Plan: Verified torch.backends.quantized.engine works

Differential Revision: D17198233

fbshipit-source-id: e5449d06f4136385e0e6d18bd4237f8654a61672
2019-09-11 21:37:36 -07:00
jiayisun
b9bf91feb8 Add torch.backends.mkldnn.enabled flag (#25459)
Summary:
This PR adds the torch.backends.mkldnn.enabled flag described in https://github.com/pytorch/pytorch/issues/25186, which can be used to disable MKL-DNN at runtime, just like torch.backends.cudnn.enabled.
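
A minimal sketch of the new flag:

```python
import torch

torch.backends.mkldnn.enabled = False  # skip MKL-DNN kernels at runtime
y = torch.nn.functional.conv2d(torch.randn(1, 3, 8, 8), torch.randn(4, 3, 3, 3))
torch.backends.mkldnn.enabled = True   # back to the default
```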
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25459

Differential Revision: D17258926

Pulled By: ezyang

fbshipit-source-id: e179ad364cc608fdaa7d0f37e2e762ceb5eda598
2019-09-11 12:09:40 -07:00
peter
d6f62b70f3 Fix cuda and cudnn libraries search process on Windows (#20205)
Summary:
Fixes #20202
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20205

Differential Revision: D15258626

Pulled By: ezyang

fbshipit-source-id: 855ad457a8bb7a46accc7cf6ec5cb09e98f6e770
2019-05-08 06:08:47 -07:00
Tongzhou Wang
973d51079b Add device-specific cuFFT plan caches (#19300)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19224
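
For illustration, a sketch of the per-device cache handles (attribute names as exposed under torch.backends.cuda in later releases; requires a CUDA build with at least one device):

```
import torch

if torch.cuda.is_available():
    # Each CUDA device now gets its own cuFFT plan cache.
    cache0 = torch.backends.cuda.cufft_plan_cache[0]
    cache0.max_size = 32      # cap the number of cached plans on device 0
    print(cache0.size)        # plans currently cached on device 0
    cache0.clear()            # drop cached plans for device 0 only
```
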
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19300

Differential Revision: D14986967

Pulled By: soumith

fbshipit-source-id: 8c31237db50d6924bba1472434c10326610d9255
2019-04-18 06:39:35 -07:00
Edward Yang
50df3e5e2e Add ability to query if built with CUDA and MKL-DNN. (#18362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18362
ghimport-source-id: 374b7ab97e2d6a894368007133201f510539296f

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18242 Test running a CUDA build on CPU machine.
* **#18362 Add ability to query if built with CUDA and MKL-DNN.**

Fixes #18108.
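
A hedged sketch of the build-time queries this enables, using the names the torch.backends module exposes today (the exact entry points added in this commit may differ):

```
import torch

# Build-time capability checks: was this binary compiled with CUDA / MKL-DNN?
print(torch.backends.cuda.is_built())        # True if compiled with CUDA support
print(torch.backends.mkldnn.is_available())  # True if compiled with MKL-DNN

# Note: cuda.is_built() asks about the build, not about a GPU being present;
# runtime availability is still torch.cuda.is_available().
```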

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14584430

fbshipit-source-id: 7605a1ac4e8f2a7c70d52e5a43ad7f03f0457473
2019-03-25 10:39:09 -07:00
SsnL
13422fca32 Add torch.backends.openmp.is_available(); fix some cmake messages (#16425)
Summary:
1. add `torch.backends.openmp.is_available()` (a usage sketch follows this list)
2. Improve various `cmake` outputs
3. Fix LDFLAGS not respected by `caffe2_pybind11_state_*` targets
4. Fix `MKL` warning message, and QUIET flag.
5. Fix various typos
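
A minimal usage sketch for item 1 above (the remaining items are build-system changes with no Python-visible API):

```
import torch

# True when the PyTorch binary was built with OpenMP support.
if torch.backends.openmp.is_available():
    print("OpenMP build; intra-op threads:", torch.get_num_threads())
else:
    print("built without OpenMP")
```
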
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16425

Differential Revision: D13903395

Pulled By: soumith

fbshipit-source-id: d15c5d46f53e1ff1c27fca2887b9d23d0bd85b4d
2019-01-31 16:15:46 -08:00
Lu Fang
b1b00f329e Fix the flake8 linter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16549

Reviewed By: bddppq

Differential Revision: D13877435

Pulled By: houseroad

fbshipit-source-id: dbe575ba3f6dd30d27ac6aa5eec2eea025063540
2019-01-30 09:36:00 -08:00
David Riazati
bc74ec80d0 Add support for torch.backends.cudnn.enabled (#13057)
Summary:
This is used commonly in `nn` functions. This PR adds it as a weak
module (and also alters the conversion of weak modules to strong modules
to accept ordinary `object`s)
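
For context, a sketch of the flag that this change makes visible to scripted code, shown here in ordinary eager mode (toggling it steers nn functions away from cuDNN kernels on CUDA; on CPU the toggle is a no-op):

```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 16, 16)
w = torch.randn(4, 3, 3, 3)

torch.backends.cudnn.enabled = False   # nn functions fall back to non-cuDNN paths
y = F.conv2d(x, w)

torch.backends.cudnn.enabled = True    # restore the default
```
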
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13057

Differential Revision: D10846618

Pulled By: driazati

fbshipit-source-id: 028b9f852d40e2e53ee85b93282c98cef8cd336b
2018-10-31 09:31:09 -07:00
sclarkson
2b033332c8 Allow linking to backwards-compatible cuDNN at runtime (#12239)
Summary:
Fixes #12193
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12239

Differential Revision: D10321744

Pulled By: soumith

fbshipit-source-id: bf437f7f9b6231158a1585d2dabae8d937396478
2018-10-10 23:56:51 -07:00
Matt Dawkins
87b2f05a9c Also set stdin to subprocess pipe in FindCUDNN windows popen call (#11435)
Summary:
Same issue as https://github.com/pytorch/pytorch/pull/10379, just in a different place (adding this resolves it)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11435

Differential Revision: D9736396

Pulled By: soumith

fbshipit-source-id: 220a52b8009fc2bee9313c5a091443c68f85f62f
2018-09-09 11:40:25 -07:00
Peter Goldsborough
9ce15173fb Move _cudnn_init_dropout_state to TensorOptions and enable cuDNN dropout in C++ API RNNs (#9012)
Summary:
The goal of this PR was to add support for dropout descriptors in the C++ API's RNN class.
The end result is a 4x-5x speedup for our RNN integration tests since they can now use cuDNN instead of autograd when dropout is set.

To achieve this, I had to move `_cudnn_init_dropout_state` to the `TensorOptions` API.

I also fixed a bug around `RNN::cuda()` not flattening parameters for cuDNN.

ebetica ezyang
Closes https://github.com/pytorch/pytorch/pull/9012

Reviewed By: pjh5

Differential Revision: D8689786

Pulled By: goldsborough

fbshipit-source-id: 44fb191f5a38e41c4ded5417306b5bbc012cd56c
2018-06-29 17:25:23 -07:00
Tongzhou Wang
e6c7b38f94
Cache cufft plans (#8344)
* cache cufft plans

* use an LRU cache

* suffix CuFFTParams members with _

* import print_function for py2

* lint

* fix potential race; add dummy impl for CPU only builds

* cpp formatting; remove nccl makefile change

* Use CUDA hooks instead

* comments and doc

* update the error message

* move LRU cachae to a separate file and native::detail namespace

* update comment

* specify NOTE location in CuFFTPlanCache.h

* update disabled_features.yaml to make amd ci work

* another fix for AMD CI in disabled_features.yaml

* Wrap cufft_plan_cache_* methods in __HIP_PLATFORM_HCC__

* improve the notes

* lint

* revert onnx change

* put back inlining for CUFFT_CHECK
2018-06-22 13:02:34 -04:00
Peter Goldsborough
0acddd6cee
Add torch.cuda.cudnn_is_available (#8703) 2018-06-20 14:18:03 -07:00
Edward Z. Yang
64834f6fb8
Split libATen.so into libATen_cpu.so and libATen_cuda.so (#7275)
* Split libATen.so into libATen_cpu.so and libATen_cuda.so

Previously, ATen could be built with either CPU-only support, or
CPU/CUDA support, but only via a compile-time flag, requiring
two separate builds.  This means that if you have a program which
indirectly uses a CPU-only build of ATen, and a CPU/CUDA-build of
ATen, you're gonna have a bad time.  And you might want a CPU-only
build of ATen, because it is 15M (versus the 300M of a CUDA build).

This commit splits libATen.so into two libraries, CPU/CUDA, so
that it's not necessary to do a full rebuild to get CPU-only
support; instead, if you link against libATen_cpu.so only, you
are CPU-only; if you additionally link/dlopen libATen_cuda.so,
this enables CUDA support.  This brings ATen's dynamic library
structure more similar to Caffe2's.  libATen.so is no more
(this is BC BREAKING)

The general principle for how this works is that we introduce
a *hooks* interface, which introduces a dynamic dispatch indirection
between a call site and implementation site of CUDA functionality,
mediated by a static initialization registry.  This means that we can continue
to, for example, lazily initialize CUDA from Context (a core, CPU class) without
having a direct dependency on the CUDA bits.  Instead, we look up
in the registry if, e.g., CUDA hooks have been loaded (this loading
process happens at static initialization time), and if they
have been, we dynamically dispatch to this class.  We similarly use
the hooks interface to handle Variable registration.

We introduce a new invariant: if the backend of a type has not
been initialized (e.g., its library has not been dlopened; for
CUDA, this also includes CUDA initialization), then the Type
pointers in the context registry are NULL.  If you access the
registry directly you must maintain this invariant.

There are a few potholes along the way.  I document them here:

- Previously, PyTorch maintained a separate registry for variable
  types, because no provision for them was made in the Context's
  type_registry.  Now that we have the hooks mechanism, we can easily
  have PyTorch register variables in the main registry.  The code
  has been refactored accordingly.

- There is a subtle ordering issue between Variable and CUDA.
  We permit libATen_cuda.so and PyTorch to be loaded in either
  order (in practice, CUDA is always loaded "after" PyTorch, because
  it is lazily initialized.)  This means that, when CUDA types are
  loaded, we must subsequently also initialize their Variable equivalents.
  Appropriate hooks were added to VariableHooks to make this possible;
  similarly, getVariableHooks() is not referentially transparent, and
  will change behavior after Variables are loaded.  (This is different
  to CUDAHooks, which is "burned in" after you try to initialize CUDA.)

- The cmake is adjusted to separate dependencies into either CPU
  or CUDA dependencies.  The generator scripts are adjusted to either
  generate a file as a CUDA (cuda_file_manager) or CPU file (file_manager).

- I changed all native functions which were CUDA-only (the cudnn functions)
  to have dispatches for CUDA only (making it permissible to not specify
  all dispatch options.)  This uncovered a bug in how we were handling
  native functions which dispatch on a Type argument; I introduced a new
  self_ty keyword to handle this case.  I'm not 100% happy about it
  but it fixed my problem.

  This also exposed the fact that set_history incompletely handles
  heterogeneous return tuples combining Tensor and TensorList.  I
  swapped this codegen to use flatten() (at the possible cost of
  a slight perf regression, since we're allocating another vector now
  in this code path).

- thc_state is no longer a public member of Context; use getTHCState() instead

- This PR comes with Registry from Caffe2, for handling static initialization.
  I needed to make a bunch of fixes to Registry to make it more portable

  - No more ##__VA_ARGS__ token pasting; instead, it is mandatory to pass at
    least one argument to the var-args. CUDAHooks and VariableHooks pass a nullary
    struct CUDAHooksArgs/VariableHooksArgs to solve the problem. We must get rid of
    token pasting because it does not work with MSVC.

  - It seems MSVC is not willing to generate code for constructors of template
    classes at use sites which cross DLL boundaries. So we explicitly instantiate
    the class to get around the problem. This involved tweaks to the boilerplate
    generating macros, and also required us to shuffle around namespaces a bit,
    because you can't specialize a template unless you are in the same namespace as
    the template.
  - Insertion of AT_API to appropriate places where the registry must be exported

- We have a general problem which is that on recent Ubuntu distributions,
  --as-needed is enabled for shared libraries, which is a problem (cc @apaszke who was
  worrying about this in #7160; see also #7160 (comment)). For now, I've hacked
  this up in the PR to pass -Wl,--no-as-needed to all of the spots necessary to
  make CI work, but a more sustainable solution is to attempt to dlopen
  libATen_cuda.so when CUDA functionality is requested.

    - The JIT tests somehow manage to try to touch CUDA without loading libATen_cuda.so. So
      we pass -Wl,--no-as-needed when linking libATen_cuda.so to _C.so

- There is a very subtle linking issue with lapack, which is solved by making sure libATen_cuda.so links against LAPACK. There's a comment in aten/src/ATen/CMakeLists.txt about this as well as a follow-up bug at #7353

- autogradpp used AT_CUDA_ENABLED directly. We've expunged these uses and added
  a few more things to CUDAHooks (getNumGPUs)

- Added manualSeedAll to Generator so that we can invoke it polymorphically (it
  only does something different for CUDAGenerator)

- There's a new cuda/CUDAConfig.h header for CUDA-only ifdef macros (AT_CUDNN_ENABLED, most prominently)

- CUDAHooks/VariableHooks structs live in at namespace because Registry's
  namespace support is not good enough to handle it otherwise (see Registry
  changes above)

- There's some modest moving around of native functions in ReduceOps and
  UnaryOps to get the CUDA-only function implementations into separate files, so
  they are only compiled into libATen_cuda.so. sspaddmm needed a separate CUDA
  function due to object linkage boundaries.

- Some direct uses of native functions in CUDA code has to go away, since these
  functions are not exported, so you have to go through the dispatcher
  (at::native::empty_like to at::empty_like)

- Code in THC/THCS/THCUNN now properly use THC_API macro instead of TH_API
  (which matters now that TH and THC are not in the same library)

- Added code debt in torch/_thnn/utils.py and other THNN parsing code to handle
  both TH_API and THC_API

- TensorUtils.h is now properly exported with AT_API

- Dead uses of TH_EXPORTS and co expunged; we now use ATen_cpu_exports and
  ATen_cuda_exports (new, in ATenCUDAGeneral.h) consistently

- Fix some incorrect type annotations on _cudnn_rnn_backward, where we didn't
  declare a type as possibly undefined when we should have. We didn't catch this
  previously because optional annotations are not tested on "pass-through" native
  ATen ops (which don't have dispatch). Upstream issue at #7316

- There's a new cmake macro aten_compile_options for applying all of our
  per-target compile time options. We use this on the cpu and cuda libraries.

- test/test_cpp_extensions.py can be run directly by invoking it in Python,
  assuming you've set up your PYTHONPATH correctly

- type_from_string does some new funny business to only query for all valid CUDA
  types (which causes CUDA initialization) when we see "torch.cuda." in the
  requested string

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Last mile libtorch fixes

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* pedantic fix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-10 10:28:33 -07:00
Tongzhou Wang
1c01eabd3c
Codemod to update our codebase to 0.4 standard (#6641)
* Codemod to update our codebase to 0.4 standard

* Update some of the test scripts

* remove Variable in test_clip_grad_value

* fix _symbolic_override_wrapper_maker
2018-04-17 22:06:54 -04:00
gchanan
749d51414a
Separate cuda-ness from dtype. (#6470)
* Separate cuda-ness from dtype.

There are no longer torch.cuda.int64, etc.; only torch.int64, etc., which correspond to at::ScalarType.
At the python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device); see the sketch after this list.

There is also currently unused code in here to support ScalarType in native_functions; this will be used for specifying aggregate types
on reduction functions.

* Fix test_autograd.

* Add defaults to randint_like.

* Track is_cuda in py tensor types.

* Fix test_sparse.

* Fix multiprocessing.

* Fix rnn.

* Fix test_nn.

* Fix flake8.
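
As referenced above, a minimal sketch of the post-0.4 style in which dtype and device are picked independently (device strings assume a 0.4+ build):

```
import torch

# dtype and device are orthogonal; there is no torch.cuda.int64 any more.
a = torch.zeros(2, 3, dtype=torch.int64)                     # CPU int64
b = torch.zeros(2, 3, dtype=torch.float32, device="cpu")

if torch.cuda.is_available():
    c = torch.zeros(2, 3, dtype=torch.int64, device="cuda")  # same dtype, CUDA device
```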
2018-04-12 14:05:44 -04:00