Commit Graph

5620 Commits

Author SHA1 Message Date
eellison
29fe90e2a2
[release/1.6] [JIT] Don't include view ops in autodiff graphs (#42029)
* Don't include view ops in autodiff graphs

* skip view ops in autodiff testing

* two more tests

* Appease clang-format

* Pacify clang-format

Co-authored-by: eellison <eellison@fb.com>
Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
2020-07-24 13:41:32 -07:00
Nikita Shulga
35ad2d8586 Revert "[jit] fix tuple alias analysis (#41992)"
This reverts commit 8aa878fc93.
2020-07-24 13:32:00 -07:00
Michael Suo
8aa878fc93
[jit] fix tuple alias analysis (#41992)
Previously when analyzing a TupleConstruct, we ignored the aliasing
information of the inputs and simply marked all elements of the returned
tuple as wildcards. But since we can fully reason about the contents of
a tuple statically, we should be able to assign them aliasing
information.

This analysis was not only incomplete but produced incorrect results:
if `a` is not a wildcard, then `a noalias wildcard`. So if we analyzed
`tuple(a)` and reported its aliasing info as `tuple(wildcard)`, we would conclude
`tuple[0] noalias a`, which is...wrong.
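A hypothetical TorchScript snippet (not from the PR) showing why element-wise aliasing matters: the tuple element is the input tensor itself, so a write through it must be treated as a write to the input.
```
import torch

@torch.jit.script
def f(x: torch.Tensor) -> torch.Tensor:
    t = (x,)      # prim::TupleConstruct: t[0] aliases x, not a wildcard
    t[0].add_(1)  # a write through the tuple element is a write to x
    return x
```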
2020-07-24 08:05:20 -07:00
Nikita Shulga
43d746305c
Preserve CUDA gencode flags (#41212)
Summary:
Add `torch._C._cuda_getArchFlags()`, which returns the list of architectures `torch_cuda` was compiled for
Add `torch.cuda.get_arch_list()` and `torch.cuda.get_gencode_flags()` methods, which return the architecture list and gencode flags PyTorch was compiled with
Print a warning if any of the GPUs is not compatible with any of the CUBINs
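Usage of the new Python-level methods (example outputs are illustrative only):
```
import torch

if torch.cuda.is_available():
    # architectures torch_cuda was built for, e.g. ['sm_37', 'sm_60', 'sm_70', 'sm_75']
    print(torch.cuda.get_arch_list())
    # the corresponding NVCC gencode flags as a single string
    print(torch.cuda.get_gencode_flags())
```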

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41173

Differential Revision: D22459998

Pulled By: malfet

fbshipit-source-id: 65d40ae29e54a0ba0f3f2da11b821fdb4d452d95
2020-07-09 17:34:50 -07:00
Negin Raoof
9409e03903
[ONNX][1.6] Update interpolate recompute_scale_factor default (#41117)
* Update interpolate recompute_scale_factor default

* Update upsampling.h

* Update functional.py
2020-07-09 17:24:53 -07:00
Rohan Varma
77ffb25925
Add guard for non-default stream in DDP's autograd engine callback (#40115) (#41151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40115

Closes https://github.com/pytorch/pytorch/issues/37790
Closes https://github.com/pytorch/pytorch/issues/37944

A user may wish to run DDP's forward + backward step under a non-default CUDA stream, such as one created with `torch.cuda.Stream()` and entered via `with torch.cuda.stream(stream)`. In this case, the user is responsible for synchronizing events on this stream with other streams used in the program (per the documentation at https://pytorch.org/docs/stable/notes/cuda.html#cuda-semantics), but currently DDP has a bug which causes DDP under non-default streams to fail.

If a user does the following:
```
model = DDP(...)
loss = model(input).sum()
loss.backward()
grad = model.module.weight.grad
average = grad.clone()
dist.all_reduce(average)           # in-place all-reduce (sums across ranks)
average /= dist.get_world_size()
```

There is a chance that `average` and `grad` will not be equal. This is because the CUDA kernels corresponding to the `all_reduce` call may run before `loss.backward()`'s kernels have finished. Specifically, in DDP we copy the allreduced gradients back to the model parameter gradients in an autograd engine callback, but this callback runs on the default stream. Note that the application could also work around this by synchronizing with the stream the callback runs on, but it should not be expected to, since it does not otherwise use that stream.

This PR fixes the issue by passing the current stream into DDP's callback.
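A minimal sketch of the supported pattern, assuming `model` (a DDP instance), `inputs`, and the process group are already set up:
```
s = torch.cuda.Stream()
with torch.cuda.stream(s):
    loss = model(inputs).sum()
    loss.backward()
# the user still synchronizes this stream with any other stream that later
# consumes the gradients (per the CUDA semantics notes linked above)
torch.cuda.current_stream().wait_stream(s)
dist.all_reduce(model.module.weight.grad)
```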

Tested by adding a UT `test_DistributedDataParallel_non_default_stream` that fails without this PR
ghstack-source-id: 106481208

Differential Revision: D22073353

fbshipit-source-id: 70da9b44e5f546ff8b6d8c42022ecc846dff033e
2020-07-08 21:08:17 -07:00
Jerry Zhang
a857af50a4
[quant][graphmode][fix] cloning schema in insert_observers (#40624) (#40934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40624

Previously we didn't clone the schema, so the default schema was used; this was
causing issues for some models

Test Plan: Imported from OSS

Differential Revision: D22259519

fbshipit-source-id: e2a393a54cb18f55da0c7152a74ddc22079ac350
2020-07-07 13:27:36 -07:00
Jerry Zhang
d0045e5520
Some fixes for graph mode quantization (#40935)
* [quant] aten::repeat work for quantized tensor (#40644)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40644

Test Plan: Imported from OSS

Differential Revision: D22268558

fbshipit-source-id: 3bc9a129bece1b547c519772ecc6b980780fb904

* [quant][graphmode][fix] remove unsupported ops in the list (#40653)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40653

(Note: this ignores all push blocking failures!)

Test Plan: Imported from OSS

Differential Revision: D22271413

fbshipit-source-id: a01611b5d90849ac673fa5a310f910c858e907a3
2020-07-07 13:26:27 -07:00
Jerry Zhang
0406b69b79
[quant][graphmode][fix] Fold conv bn (#40865) (#40970)
* [quant][graphmode][fix] Fold conv bn (#40865)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40865

1. Applied a filter for the module types
2. Removed the assumption that the conv and bn are immediate children of the parent module

Test Plan:
python test/test_quantization.py TestQuantizeJitPasses

Imported from OSS

Differential Revision: D22338074

fbshipit-source-id: 64739a5e56c0a74249a1dbc2c8454b88ec32aa9e

* [quant][graphmode][fix] Print the node in error message (#40889)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40889

Test Plan: Imported from OSS

Differential Revision: D22348266

fbshipit-source-id: eed2ece5c94fcfaf187d6770bed4a7109f0c0b4a
2020-07-07 13:25:39 -07:00
Jerry Zhang
6220cc4380
[quant][graphmode][fix] dequantize propagation for {add/mul}_scalar + aten::repeat (#40933)
* [quant][graphmode][fix] dequantize propagation for {add/mul}_scalar (#40596)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40596

Previously the fusion patterns for {add/mul}_scalar were inconsistent, since the op pattern
produces a non-quantized tensor while the op replacement graph produces a quantized tensor

Test Plan: Imported from OSS

Differential Revision: D22251072

fbshipit-source-id: e16eb92cf6611578cca1ed8ebde961f8d0610137

* [quant][graphmode] Support quantization for `aten::append` (#40743)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40743

`aten::append` modifies its input in place and its output is ignored. Such ops are not
supported right now, so we first make `aten::append` non-inplace
by changing
```
ignored = aten::append(list, x)
```
to
```
x_list = aten::ListConstruct(x)
result = aten::add(list, x_list)
```
and then quantize the aten::add instead.

Test Plan:
TestQuantizeJitOps.test_general_shape_ops

Imported from OSS

Differential Revision: D22302151

fbshipit-source-id: 931000388e7501e9dd17bec2fad8a96b71a5efc5
2020-07-07 13:25:02 -07:00
eellison
c35b4c770b
Bucket of shape analysis fixes (#41044)
* [JIT] fix unfold shape analysis (#40749)

Summary:
unfold on a 0-dimensional tensor returns a 1-dimensional tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40749

Differential Revision: D22361481

Pulled By: eellison

fbshipit-source-id: 621597e5f97f6e39953eb86f8b85bb4142527a9f

* shape analysis fix for default dtype

ghstack-source-id: 723aa27c2685417715a0891f5ca1ae885d4c9832
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40938

* fix grad thrashing of shape analysis

ghstack-source-id: dd8742b1da52d17e9d6ab6c81ff0b27520b09417
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40939

Co-authored-by: Elias Ellison <eellison@fb.com>
2020-07-07 12:59:47 -07:00
Mikhail Zolotukhin
11b70b0041
[JIT] Switch executor from Simple to Legacy. (#41017)
* properly skip legacy tests regardless of the default executor (#40381)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40381

Differential Revision: D22173938

Pulled By: Krovatkin

fbshipit-source-id: 305fc4484977e828cc4cee6e053a1e1ab9f0d6c7

* [JIT] Switch executor from Simple to Legacy.

This is done for 1.6 only, in order to recover from the performance regressions
caused by the Legacy->Simple switch that was done in 1.5. On master we
still plan to use the Simple executor and fix the performance issues in 1.7
without falling back to the Legacy executor.

Co-authored-by: Nikolay Korovaiko <korovaikon@gmail.com>
2020-07-06 21:35:02 -07:00
Nick Korovaiko
3f13c9a2c8
infer tensor properties based on an input tensor rather than defaults for xxx_like ctors (#40895) (#41016)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40895

Reviewed By: eellison

Differential Revision: D22358878

Pulled By: Krovatkin

fbshipit-source-id: 2db2429aa89c180d8e52a6bb1265308483da46a2
2020-07-06 16:52:59 -07:00
Nick Korovaiko
63a94c021a
shape inference of undefined for prim::grad (#40866) (#41015)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40866

Reviewed By: pbelevich

Differential Revision: D22358988

Pulled By: Krovatkin

fbshipit-source-id: 7118d7f8d4eaf056cfb71dc0d588d38b1dfb0fc7
2020-07-06 16:51:37 -07:00
Nick Korovaiko
2b175ba909
update requires_grad on loop inputs correctly (master) (#40926) (#41014)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40926

Reviewed By: eellison

Differential Revision: D22359471

Pulled By: Krovatkin

fbshipit-source-id: 823e87674e2d2917f075255ec926e0485972f4e2
2020-07-06 16:30:14 -07:00
Jerry Zhang
e89c4f0dec
[quant] Fix fuse linear pass (#40549) (#40751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40549

Previously we didn't check whether `%weight_t` was produced by `aten::t`, so we would fuse some `matmul`/`addmm`
calls that are not 2-D into `aten::linear`, which is incorrect.
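For reference, the identity the fusion relies on, which only holds for 2-D weights whose transpose really comes from `aten::t`:
```
import torch
import torch.nn.functional as F

x = torch.randn(5, 3)
W = torch.randn(4, 3)
b = torch.randn(4)

# addmm(b, x, W.t()) computes x @ W.t() + b, which is exactly F.linear(x, W, b)
assert torch.allclose(torch.addmm(b, x, W.t()), F.linear(x, W, b))
```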

Test Plan: Imported from OSS

Differential Revision: D22225921

fbshipit-source-id: 9723e82fdbac6d8e1a7ade22f3a9791321ab12b6
2020-07-02 10:23:22 -07:00
Jerry Zhang
ea273c68f9
Inplace construct of TorchScript Module and inplace option for quantization (#40750)
* [WIP][JIT] Add ScriptModule._reconstruct (#39979)

Summary:
**Summary**
This commit adds an instance method `_reconstruct` that permits users
to reconstruct a `ScriptModule` from a given C++ `Module` instance.

**Testing**
This commit adds a unit test for `_reconstruct`.

**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/33912.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39979

Differential Revision: D22172323

Pulled By: SplitInfinity

fbshipit-source-id: 9aa6551c422a5a324b822a09cd8d7c660f99ca5c

* [quant][graphmode] Enable inplace option for top level API (#40414)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40414

Now that `_reconstruct` is supported in RecursiveScriptModule (https://github.com/pytorch/pytorch/pull/39979),
we can support an inplace option in the quantization API

Test Plan: Imported from OSS

Differential Revision: D22178326

fbshipit-source-id: c78bc2bcf2c42b06280c12262bb31aebcadc6c32

Co-authored-by: Meghan Lele <meghanl@fb.com>
2020-07-02 10:22:45 -07:00
Jerry Zhang
4dd37bfbf7
[jit] Remove unnecessary clone APIs for script::Module and RecursiveScriptModule (#40297) (#40748)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40297

Test Plan: Imported from OSS

Differential Revision: D22191660

fbshipit-source-id: 4b338ca82caaca04784bffe01fdae3d180c192f4
2020-07-02 10:22:27 -07:00
Nikita Shulga
b4b8f5b9d4
Release GIL during DDP construction. (#40877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40495

As part of debugging flaky ddp_under_dist_autograd tests, I realized
we were running into the following deadlock.

1) Rank 0 goes into DDP construction, holds the GIL, and waits for the broadcast in
DDP construction.
2) Rank 3 is a little slower and performs an RRef fetch call before its DDP
construction.
3) The RRef fetch call is serviced on Rank 0 and tries to acquire the GIL.
4) We now have a deadlock, since Rank 0 is waiting for Rank 3 to enter the
collective and Rank 3 is waiting for Rank 0 to release the GIL.
ghstack-source-id: 106534442

Test Plan:
1) Ran ddp_under_dist_autograd 500 times.
2) waitforbuildbot

Differential Revision: D22205180

fbshipit-source-id: 6afd55342e801b9edb9591ff25158a244a8ea66a

Co-authored-by: Pritam Damania <pritam.damania@fb.com>
2020-07-01 13:36:50 -07:00
Wanchao
41816dc97f
[1.6] Fix dictConstruct ordering and enable dict mix (#40797)
A combination of https://github.com/pytorch/pytorch/pull/39601 and
https://github.com/pytorch/pytorch/pull/40424; both were approved and
merged in master.
2020-07-01 09:30:16 -07:00
Mike Ruberry
ddea6c552f
Ports full dtype inference deprecation to 1.6 (#40799)
* ports full deprecation

* fixtures

* Fixes lint

* Trying to fix phantom lint issue

* nuclear lint option

* Paradoxical linter fix

Co-authored-by: Mike Ruberry <mruberry@devfair044.maas>
2020-07-01 09:27:27 -07:00
Mikhail Zolotukhin
091537a764
[JIT][1.6] Shape analysis fixes. (#40716)
* [JIT] Update the type of unsqueeze's output in shape analysis.

* [JIT] Fix shape analysis for aten::masked_select.

The reference says that this op always returns a 1-D tensor, even if
the input and the mask are 0-D.
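For reference, the behavior the fix encodes:
```
import torch

x = torch.tensor(5.0)      # 0-D input
mask = torch.tensor(True)  # 0-D mask
out = torch.masked_select(x, mask)
print(out, out.dim())      # tensor([5.]) 1 -- masked_select always returns a 1-D tensor
```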
2020-07-01 08:41:05 -07:00
peterjc123
415e499330
Fix zip serialization for files > 2GiB on Windows (#40852) 2020-07-01 08:36:40 -07:00
Mike Ruberry
75a074abdc
1.6 Port: Dynamic Versioning (#40542)
Co-authored-by: Mike Ruberry <mruberry@devfair044.maas>
2020-06-30 10:18:18 -07:00
James Reed
0c90b6da5c
[1.6 cherrypick] Fix zip serialization for files > 2GiB (#40757)
* [1.6 cherrypick] Fix zip serialization for files > 2GiB

* Update test/test_serialization.py

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
2020-06-30 07:10:02 -07:00
wconstab
fe45c2c986
Allow slicing sequential container (#40538)
- fixes #38034
- works around missing slice functionality in Sequential
  by casting to tuple and slicing that instead
- supports iterating over the resulting slice, but not calling it (see the sketch below)
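A minimal sketch of what this enables in TorchScript (module and shapes are illustrative):
```
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))

    def forward(self, x):
        # iterating over a slice of the container is supported;
        # calling the slice itself, e.g. self.layers[:2](x), is not
        for layer in self.layers[:2]:
            x = layer(x)
        return x

scripted = torch.jit.script(Net())
print(scripted(torch.randn(1, 4)).shape)
```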
2020-06-29 19:29:19 -07:00
peterjc123
ea1b0dba18
Remove constexpr for NVCC on Windows (#40676) 2020-06-29 13:48:50 -07:00
eellison
8682ac147b
Docs merge (#40569)
Co-authored-by: Elias Ellison <eellison@fb.com>
2020-06-26 12:24:08 -07:00
mrshenli
0dc93ac119
[v1.6.0 patch] Install method docstrings from PyRRef to RRef (#40620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40461

It turned out `:inherited-members:` (see [doc](https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#directive-autoclass)) is not really usable,
because pybind11 generates a docstring that writes `self` as the parent-class type, `rpc.PyRRef`.

As a workaround, I am pulling the docstrings of the parent class, `PyRRef`, into the subclass, `RRef`, and doing surgery on the docstrings generated by pybind11.

{F241283111}

ghstack-source-id: 106472496

P134031188

Differential Revision: D7933834

fbshipit-source-id: c03a8a4c9d98888b64492a8caba1591595bfe247

Co-authored-by: Shihao Xu <shihaoxu@fb.com>
2020-06-26 12:15:28 -07:00
Ilia Cherniavskii
d8c384544e Destroy CUDA events after profiling (#39962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39962

Adds a simple ref-counted wrapper for CUDA events and destroys the
CUDA event after the last copy is destroyed

Test Plan: CI cuda profiler tests

Differential Revision: D22027092

Pulled By: ilia-cher

fbshipit-source-id: e0810388aa60b2291eb010896e13af1fad92e472
2020-06-23 10:44:39 -07:00
Jerry Zhang
f652abc1dd [jit] Enable copy.deepcopy and copy.copy for RecursiveScriptModule (#32685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32685

att

Test Plan:
.

Imported from OSS

Differential Revision: D21220755

fbshipit-source-id: 5c71e9bb9f43032cf60563a9e67579118a8d7e33
2020-06-23 09:21:12 -07:00
Pritam Damania
54c05fa34e Add basic GPU support to distributed autograd. (#40312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40312

As part of https://github.com/pytorch/pytorch/issues/40255, we
realized that GPU support for distributed autograd had been broken by our
multithreaded autograd change.

To fix this in the short term for 1.6, this PR includes the following changes (see the sketch after this list):

1) A long-lived CPU thread in DistEngine to execute GPU->CPU continuations in the
autograd graph.
2) The long-lived CPU thread has its own ready_queue, and this queue is used for
all GraphTasks created by DistEngine.
3) In thread_main(), the CPU thread cannot exit once its GraphTask is done
processing, because of the new long-lived CPU thread added in 1).
4) To resolve this, thread_main() now has a parameter `device_thread` instead
of `reentrant_thread`. When device_thread is True, we expect this to be a long-lived
device thread that does not exit.
5) When device_thread is False, thread_main is expected to run a GraphTask and
return once done.
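A rough Python-only sketch of the threading pattern described above (the real implementation lives in the C++ engine; names are illustrative):
```
import queue
import threading

cpu_ready_queue = queue.Queue()  # dedicated queue for GraphTasks created by DistEngine

def thread_main(device_thread: bool):
    while True:
        task = cpu_ready_queue.get()
        if task is None:       # shutdown sentinel
            return
        task()                 # run a GPU->CPU continuation for the current GraphTask
        if not device_thread:  # non-device callers run one GraphTask and return
            return

# the long-lived CPU thread: device_thread=True, so it never exits on its own
threading.Thread(target=thread_main, args=(True,), daemon=True).start()
```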
ghstack-source-id: 106391329

Test Plan: waitforbuildbot

Differential Revision: D22146183

fbshipit-source-id: dd146b7a95f55db75f6767889b7255e9d62d5825
2020-06-23 07:49:00 -07:00
Luca Wehrstedt
78b3d5f878 [TensorPipe] Register multiplexing channel over UV (#40389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40389

The `mpt_uv` channel MultiPlexes over a Transport, namely the UV one. What this means is that it takes a tensor, chunks it into equal parts and sends each of them on a separate UV connection, each running in a separate UV loop. Thus they each have their own socket and thread. This allows them to reach bandwidths that go beyond what a simple single-threaded approach can do, which is necessary to reach the high bandwidths of some modern NICs.
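A toy sketch of the multiplexing idea, with plain Python threads standing in for the per-connection UV loops (`send_over_connection` is hypothetical):
```
import threading
import torch

def send_over_connection(conn_id: int, chunk: torch.Tensor) -> None:
    ...  # hypothetical: each connection has its own socket, thread, and UV loop

tensor = torch.randn(8 * 1024 * 1024)
threads = [
    threading.Thread(target=send_over_connection, args=(i, chunk))
    for i, chunk in enumerate(torch.chunk(tensor, 4))  # equal parts, one per connection
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```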
ghstack-source-id: 106375511

Test Plan: Ran a few manual tests myself, for the rest relied on the PyTorch RPC tests.

Differential Revision: D22144380

fbshipit-source-id: ef555fa04c6f13a4acf3bd5f7b03d04d02460d38
2020-06-23 00:24:17 -07:00
Jerry Zhang
ba89a89376 [quant][graphmode][refactor] InsertQuantDeQuantHelper (#40384)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40384

Test Plan: Imported from OSS

Differential Revision: D22164072

fbshipit-source-id: 0ca86265cfef1afa99dd860a452f3dd76e31792a
2020-06-22 21:30:17 -07:00
Vitaly Fedyunin
7bf1dd582a Fix Cuda IPC deadlock (#40347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40347

Fixes: #39541
Fixes: #25301

Differential Revision: D22152662

Test Plan: Imported from OSS

Pulled By: VitalyFedyunin

fbshipit-source-id: 82548aa4c937e0260932244e78cb132bcb3209b3
2020-06-22 20:50:25 -07:00
Jerry Zhang
18122facb9 [quant][graphmode] Add warning for debug option for add_scalar/mul_scalar (#40383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40383

The debug option is not supported for these cases, so we print a warning when it occurs

Test Plan: Imported from OSS

Differential Revision: D22164071

fbshipit-source-id: 90459530f4efdd6d255df4f015606cb0e9070cd3
2020-06-22 20:29:44 -07:00
Jerry Zhang
64f925eb0c [quant][graphmode] Add support for functional linear (#40331)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40331

Test Plan: Imported from OSS

Differential Revision: D22162905

fbshipit-source-id: 3e0320d5f5c267c778af8e2fe4224f8383aab2c8
2020-06-22 18:05:06 -07:00
Michael Carilli
8066fba226 [RELAND2] Change AccumulateGrad to yield .grads that match weights' memory layout (#40358)
Summary:
https://github.com/pytorch/pytorch/pull/40129 fixed the error responsible for the first revert, but exposed another error in the same test.

This PR is intended as the "master copy" for merge, and it runs on full CI.
Two other PRs (restricted to run on a small subset of CI) support debugging DDP failures/hangs with multiple devices per process (`test_c10d.py:DistributedDataParallelTest.test_grad_layout_1devicemodule_2replicaperprocess`):
- https://github.com/pytorch/pytorch/pull/40290 tries the test with purely rowmajor contiguous params on an untouched master.  In other words https://github.com/pytorch/pytorch/pull/40290 contains none of this PR's diffs aside from the test itself.
- https://github.com/pytorch/pytorch/pull/40178, for comparison, tries the test with this PR's diffs.

Both fail the same way, indicating the failure is unrelated to this PR's other diffs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40358

Differential Revision: D22165785

Pulled By: albanD

fbshipit-source-id: ac7cdd79af5c080ab74341671392dca8e717554e
2020-06-22 17:13:21 -07:00
anjali411
8ec2ae9a9f Add view_as_real, view_as_complex for complex tensors (#39099)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39099

Test Plan: Imported from OSS

Differential Revision: D22057886

Pulled By: anjali411

fbshipit-source-id: bad5ba7097ba0dd13f2c549b2463094dee9afa14
2020-06-22 15:15:27 -07:00
Alban Desmaison
02ae9a1583 add TypeError to c10 and fix segfault in error checking in Tensor constructor (#40106)
Summary:
As per title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40106

Differential Revision: D22137193

Pulled By: albanD

fbshipit-source-id: 11d059263c00a834211f016bd9a9e18fdc0437ef
2020-06-22 13:42:44 -07:00
Zhang, Xiaobing
87c5f02f3d jit: Conv3d + BatchNorm3d fusion (#40082)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40082

Differential Revision: D22120340

Pulled By: jerryzh168

fbshipit-source-id: fce6c5f03fe7ab6c60620cbdf547d5a466a470e3
2020-06-22 11:15:52 -07:00
Rohan Varma
14f7e95c1a Add prefix of remote events for RPC profiling (#40066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40066

Builds on top of the previous PR to ensure that all remotely profiled events are prefixed with the key for the RPC that generated them.

The key is generated by the result of `_build_rpc_profiling_key` in `rpc/internal.py` and prefixed onto the event name. In order to do this, we set the current-key when creating the RPC in Python, retrieve the currently-set key in C++ and save a GloballyUniqueId -> key mapping to an in-memory map. When we receive an RPC with profiling information, we expect to receive this ID back, and look up the corresponding profiling key in the map.

The key is then added to all the remote events.

Tested by adding tests to ensure the key is added to all the remote events. Also added a UT which tests this under a multi-threading scenario, to ensure that the mapping's correctness is maintained when several RPCs are in the process of being created at once.
ghstack-source-id: 106316106

Test Plan: Unit test

Differential Revision: D22040035

fbshipit-source-id: 9215feb06084b294edbfa6e03385e13c1d730c43
2020-06-22 11:01:07 -07:00
BowenBao
eaa91071ca [ONNX] Support large attribute and subgraph for large model (#38793)
Summary:
Previously, large tensor data in attributes and subgraphs was not stored externally, so ONNX could not serialize models whose total size sums to >= 2GB. This PR enables external storage for such data.
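A hedged usage sketch for PyTorch releases of this era, where external data is opted into via `use_external_data_format` (newer releases handle this differently):
```
import torch
import torch.nn as nn

# a toy model; in practice this matters when weights/attributes sum to >= 2GB
model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(8)])
dummy = torch.randn(1, 1024)

# with external data enabled, initializers (and, after this PR, large tensors in
# attributes and subgraphs) are written to side files next to the .onnx protobuf
torch.onnx.export(model, dummy, "model.onnx", use_external_data_format=True)
```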
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38793

Reviewed By: hl475

Differential Revision: D22111092

Pulled By: houseroad

fbshipit-source-id: 355234e50825d576754de33c86a9690161caaeaf
2020-06-22 10:34:37 -07:00
Edward Yang
e4766fb4d9 Meta tensors, but without code deduplication (#38490)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38490

A meta tensor is a tensor that is a lot like a normal tensor,
except it doesn't actually have any data associated with it.
You can use them to carry out shape/dtype computations without
actually having to run the actual code; for example, this could
be used to do shape inference in a JIT analysis pass.
Check out the description in DispatchKey.h for more information.
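A minimal sketch of the idea, using the `meta` device spelling exposed in current PyTorch (the exact spelling may have differed when this landed):
```
import torch

# no data is allocated; only shape/dtype/device metadata is carried around
a = torch.empty(2, 3, device="meta")
b = torch.empty(2, 3, device="meta")

c = torch.add(a, b)                # per this PR, add is the one op with a meta kernel
print(c.shape, c.dtype, c.device)  # torch.Size([2, 3]) torch.float32 meta
```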

Meta tensors are part of a larger project to rationalize how we
write kernels so that we don't have to duplicate shape logic
in CPU kernel, CUDA kernel and meta kernel (this PR makes the
duplication problem worse!)  However, that infrastructure can
be built on top of this proof of concept, which just shows how
you can start writing meta kernels today even without this
infrastructure.

There are a lot of things that don't work:
- I special cased printing for dense tensors only; if you try to
  allocate a meta sparse / quantized tensor things aren't going
  to work.
- The printing formula implies that torch.tensor() can take an
  ellipsis, but I didn't add this.
- I wrote an example formula for binary operators, but it isn't
  even right!  (It doesn't do type promotion or memory layout
  correctly).  The most future proof way to do it right is to
  factor out the relevant computation out of TensorIterator,
  as it is quite involved.
- Nothing besides torch.add works right now
- Meta functions are ALWAYS included in mobile builds (selective
  build doesn't work on them).  This isn't a big deal for now
  but will become more pressing as more meta functions are added.

One reason I'm putting up this PR now is to check with Yinghai Lu
if we can unblock shape inference for accelerators, while we are
still working on a long term plan for how to unify all shape
computation across our kernels.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D21935609

Pulled By: ezyang

fbshipit-source-id: f7d8636eeb8516b6bc296db99a16e56029972eee
2020-06-22 09:18:33 -07:00
Vasiliy Kuznetsov
ab8a99bd36 graph mode: add hardswish inplace handling (#40284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40284

Adds graph mode handling for inplace hardswish, and test coverage for functional hardswish.

Test Plan:
```
python test/test_quantization.py TestQuantizeScriptPTSQOps.test_hardswish
```

Imported from OSS

Differential Revision: D22140628

fbshipit-source-id: 55a514f7dc1130d510f69ee4e611d7cb5e08d02e
2020-06-21 09:40:50 -07:00
Vasiliy Kuznetsov
c6dbfcaf9e quantized elu: graph mode handling (#40111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40111

Adds graph mode handling for quantized elu.

Test Plan:
```
python test/test_quantization.py TestQuantizeScriptPTSQOps.test_elu
```

Imported from OSS

Differential Revision: D22075080

fbshipit-source-id: 37fb1b9e390f2a33d47cbd025157532379b6aa64
2020-06-21 09:40:48 -07:00
Vasiliy Kuznetsov
13d54c6471 quantized elu: require observation (#40100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40100

ELU has an output range of (-1, inf). In the original PR that added
the quantized operator, we decided to pass the quantization params
from the input. However, it makes more sense to require observation
for this op.

This PR changes the API to require observation. Next PRs in this stack
will add the eager and graph mode handling.
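A rough eager-mode sketch of what requiring observation means here, using `MinMaxObserver` for illustration (not the PR's test code):
```
import torch
from torch.quantization import MinMaxObserver

x = torch.randn(1000)
y = torch.nn.functional.elu(x)

# observe the op's *output* range instead of reusing the input's qparams;
# ELU maps inputs into (-1, inf), so the input's range need not match
obs = MinMaxObserver(dtype=torch.quint8)
obs(y)
scale, zero_point = obs.calculate_qparams()
qy = torch.quantize_per_tensor(y, float(scale), int(zero_point), torch.quint8)
```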

Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_qelu
```

Imported from OSS

Differential Revision: D22075083

fbshipit-source-id: 0ea0fd05a00cc7a5f122a2b1de09144bbd586f32
2020-06-21 09:38:28 -07:00
Ivan Kobzarev
3852215170 [vulkan] jit passes for vulkan conv2 prepack and fuse with clamp (#39282)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39282

Test Plan: Imported from OSS

Differential Revision: D21962424

Pulled By: IvanKobzarev

fbshipit-source-id: 2d20e827d2c3836b7e6b443293377c68dc1ffa5a
2020-06-20 14:12:21 -07:00
Zafar
9da277c635 [quant][graphmode] linear_relu (#40021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40021

This replaces #36889 due to significant merge conflicts

Test Plan: Imported from OSS

Differential Revision: D22087061

Pulled By: z-a-f

fbshipit-source-id: 6a65cdd3c0c0c957968a9d017902fb6d03b58150
2020-06-19 23:32:54 -07:00
Jerry Zhang
e04a611b91 [quant][graphmode] clang format changes (#40329)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40329

Test Plan: Imported from OSS

Differential Revision: D22149706

fbshipit-source-id: 3c07cb0c09a53a01fc69185943ddc409264a6ff5
2020-06-19 23:22:43 -07:00