Commit Graph

8526 Commits

Author SHA1 Message Date
Rohan Varma
dbc8b00816 Document WorkerInfo and RpcBackendOptions structures in RPC docs. (#31077)
Summary:
We mention `WorkerInfo` and `RpcBackendOptions` in a couple of different locations in our docs, and these are public classes that the user may use, so we should add these classes to the documentation.
Screenshot of the rendered docs: https://user-images.githubusercontent.com/8039770/70571759-47db2080-1b53-11ea-9d61-c83985a29dd9.png
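
As a rough usage sketch (assuming the torch.distributed.rpc API of this era; the single-process setup and environment variables below are purely illustrative):

```python
import os
import torch.distributed.rpc as rpc

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")

# Minimal single-process setup just to show the classes being documented.
rpc.init_rpc("worker0", rank=0, world_size=1)
info = rpc.get_worker_info()   # returns a WorkerInfo for the current worker
print(info.name, info.id)
rpc.shutdown()
```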
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31077

Differential Revision: D18928162

Pulled By: rohan-varma

fbshipit-source-id: 67f11eedd87523c469377b791a0ba23704ec3723
2019-12-11 11:39:57 -08:00
Iurii Zdebskyi
44ecc3a70b Add tracing support for optional Device and Layout (#30979)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30979

This stack is a first step toward an effort to fix, clean up and simplify code generation logic. Please see the master [task](https://github.com/pytorch/pytorch/issues/30405) to see related discussions and all the known issues.

Main focus of these changes is TensorOptions in code generation.
Goals:
- Remove TensorOptions from generated code wherever it's possible. Leave it only in python/C++ API layers.
- Refactor TensorOptions logic to a single place.
- Log all discovered issues.

Non goals:
- Fix Everything!
- Remove all the hacks in code generation scripts.
- Clean up and refactor all code generation scripts.

--------------
In this PR:
Add tracing support for optional Device and Layout types.

--------------

Test Plan: Imported from OSS

Differential Revision: D18912685

Pulled By: izdeby

fbshipit-source-id: 4a9514ce2eee0041f9bc96636d3ddb4f077675e1
2019-12-11 11:32:52 -08:00
David Riazati
1f87e823b8 Make nn.Transformer TorchScript compatible (#28561)
Summary:
This makes `nn.Transformer` usable from TorchScript. It preserves backwards compatibility via `__setstate__` on the encoder/decoder.
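
For example, a sketch of what this enables (model sizes are arbitrary):

```python
import torch
import torch.nn as nn

transformer = nn.Transformer(d_model=32, nhead=4,
                             num_encoder_layers=1, num_decoder_layers=1)
scripted = torch.jit.script(transformer)   # compiles instead of erroring

src = torch.rand(10, 2, 32)   # (seq, batch, d_model)
tgt = torch.rand(7, 2, 32)
out = scripted(src, tgt)
```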

Fixes https://github.com/pytorch/pytorch/issues/24173
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28561

Differential Revision: D18124753

Pulled By: driazati

fbshipit-source-id: 7314843e5aa9c9bf974c4672e4edb24ed8ef4a6f
2019-12-11 10:57:31 -08:00
Alban Desmaison
717274c001 Add useful warnings for t.grad when it won't be populated for known reasons (#30531)
Summary:
Fix https://github.com/pytorch/pytorch/issues/2362 and https://github.com/pytorch/pytorch/issues/19778

To avoid issues with frozen models, we only consider warning for Tensors that require gradients and are neither leaf Tensors nor Tensors that retain gradients.
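
A minimal sketch of the situation that now warns (the exact warning text is not quoted here):

```python
import torch

x = torch.ones(3, requires_grad=True)   # leaf
y = x * 2                                # non-leaf, does not retain grad
y.sum().backward()

print(x.grad)   # populated as usual
print(y.grad)   # None; accessing it now emits a warning explaining why
```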
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30531

Differential Revision: D18832767

Pulled By: albanD

fbshipit-source-id: 743e863dc14ab57713e66da78b2e4d759dfba0ff
2019-12-11 09:47:18 -08:00
Richard Zou
9305f44854 Remove BUILD_NAMEDTENSOR from codegen and .cu files (#31047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31047

Changelist:
- remove BUILD_NAMEDTENSOR from .cu files
- remove BUILD_NAMEDTENSOR special handling in function_wrapper.py
- remove BUILD_NAMEDTENSOR from cpp_extension.py. This code actually
did nothing because we always compile with BUILD_NAMEDTENSOR.

Test Plan: - run tests

Differential Revision: D18908442

Pulled By: zou3519

fbshipit-source-id: b239e24de58580adaf3cef573350773a38b1e4f0
2019-12-11 08:49:56 -08:00
BowenBao
8013ffd400 Fix weight_norm export for dim=0 (#31015)
Summary:
The exported weight_norm incorrectly reduces over axis 0 as well when dim is set to 0.
The previous test case only covered a weight with size(0) == 1, which yields the same result whether or not it is reduced over.
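
A sketch of the kind of export this fixes (module and sizes are illustrative; note size(0) > 1 so the dim=0 reduction actually matters):

```python
import io
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

m = weight_norm(nn.Linear(4, 3), dim=0)   # weight has size(0) == 3
x = torch.randn(2, 4)

buf = io.BytesIO()
torch.onnx.export(m, x, buf)   # exported graph should no longer reduce over dim 0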
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31015

Reviewed By: hl475

Differential Revision: D18900894

Pulled By: houseroad

fbshipit-source-id: 19004f51933b37f848dbe4138e617a7a8e35a9ec
2019-12-10 23:43:56 -08:00
Elias Ellison
9f3fe78239 peephole optimize type refinements (#31024)
Summary:
Peephole optimize out type refinements when they are no longer refining the type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31024

Differential Revision: D18920958

Pulled By: eellison

fbshipit-source-id: 6d05d9812b9f9dcf001de760a78a2042fb832773
2019-12-10 18:32:28 -08:00
Jeremy Lilley
e7e6d56b77 Allow async work in rpc RequestCallback processing. (#30637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30637

The RequestCallback api currently forces work to always be synchronous, which, as we scale, means we're going to need to throw a large number of (mostly blocked) threads at the rpc problem. For some activities like dependent autograd rpcs, there's no real need to block in these threads.

In this change, the RequestCallback api is updated to return a
shared_ptr<FutureMessage> rather than a Message:

   std::shared_ptr<FutureMessage> operator()(Message& request) const;

With a futures-style api, RPC ops that wish to be async can then be async,
while short-lived blocking functions (or Python UDFs) can just block.

In this change, we keep all of the current ops as synchronous (i.e. we block
and then return a completed FutureMessage). We also update the rpc_agents in
a manner compatible with this sort of parallelism.

Here, we only want to incur overhead when we use the async behavior.
Some modest extra cost seems unavoidable here (e.g. the allocation for the
std::make_shared<>), but we can trivially detect the synchronous/completed
case in the rpc_agent and avoid the extra thread-switches/etc. in that case.
ghstack-source-id: 95287026

Test Plan:
- Basic: buck test mode/dev-nosan caffe2/test/...
  - Additional testcase in ThriftRpcAgentTest for deferred work.

Differential Revision: D18774322

fbshipit-source-id: cf49922a71707cfb1726de16f93af23b160385d8
2019-12-10 16:11:05 -08:00
Supriya Rao
e42af97349 Add quantized concat conversion (#30887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30887

Support to convert quantized concat from pytorch to caffe2

Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_cat

Imported from OSS

Differential Revision: D18855676

fbshipit-source-id: 5d0cf3f03c61819e168b080afa368b1255d0419c
2019-12-10 15:46:16 -08:00
Ilia Cherniavskii
3de8584de8 Correct definition of nodes that work with Autograd (#30683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30683

Assume that a node can work with autograd only if it is not a fusion
group and in prim or aten namespaces.

Test Plan: CI

Reviewed By: lly-zero-one

Differential Revision: D18795171

Pulled By: ilia-cher

fbshipit-source-id: 301090557e330b58be70e956784f7f0dc343c684
2019-12-10 15:39:38 -08:00
TH3CHARLie
5edfe9cb80 add torch.square (#30719)
Summary:
fixes https://github.com/pytorch/pytorch/issues/30524
This adds a new operator `torch.square` to PyTorch.
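
A quick usage sketch:

```python
import torch

x = torch.tensor([-2.0, 3.0])
print(torch.square(x))   # tensor([4., 9.]), equivalent to x * x
```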

I think it is ready for the first-time review now albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30719

Differential Revision: D18909268

Pulled By: albanD

fbshipit-source-id: 5626c445d8db20471a56fc1d7a3490e77812662b
2019-12-10 15:22:46 -08:00
Michael Suo
e3d40f857b Make nn.Module forward() type annotation more permissive (#31057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31057

The current signature basically will always fail to type check, because
mypy enforces that the subclass method's input types must be "wider"
than their superclass method's input types (i.e. they can vary
contravariantly). And nothing is wider than `Any`.

This change makes it so that any input params are allowed in
`forward()`. Fixes #29099
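
A sketch of the kind of subclass that should now type-check (names are illustrative):

```python
import torch
import torch.nn as nn

class Doubler(nn.Module):
    # Narrowing the inputs relative to the permissive base annotation no
    # longer violates mypy's contravariance rule for overridden methods.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2
```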

Test Plan: Imported from OSS

Differential Revision: D18918034

Pulled By: suo

fbshipit-source-id: 9940e9f769b55d580d9d7f23abf6f88edb92627f
2019-12-10 14:36:13 -08:00
Pritam Damania
b01b05790e Fix memory leak due to circular dependency. (#31030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31030

DistAutogradContext held a shared_ptr reference to RecvRpcBackward and
RecvRpcBackward held a shared_ptr reference to the context. This circular
dependency caused significant memory leaks. As a result, I'm changing the
reference in RecvRpcBackward to be a weak_ptr.

Test Plan: waitforbuildbot

Differential Revision: D18896389

fbshipit-source-id: e5bc588b6f998885854e3a67de1e82452e8475ce
2019-12-10 12:20:43 -08:00
Pieter Noordhuis
78a00d72b4 Revert D18899127: resubmit polish up overloads on free functions
Test Plan: revert-hammer

Differential Revision:
D18899127

Original commit changeset: 9049b8718926

fbshipit-source-id: c70a8aa4120aa757dce0926a8ab3cc5c92cd6041
2019-12-10 10:51:07 -08:00
Hong Xu
394d2f7037 Fix the rendering of the doc of max. (#30779)
Summary:
Close https://github.com/pytorch/pytorch/issues/30731
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30779

Differential Revision: D18837317

Pulled By: zou3519

fbshipit-source-id: b9b5ba414756a68d4b39a7a7c2d89fee1e3c040f
2019-12-10 10:48:16 -08:00
hxia11
06c7420fa2 Raise error if a block can not be found from a CUDA tensor (#30870)
Summary:
After several discussions, we agreed not to put any extra safety check in recordStream, as either the check would cause failures in certain scenarios or there is no need to throw for user errors.

As a summary, it simply does what is described in https://github.com/pytorch/pytorch/issues/27405: check whether a tensor is indeed allocated by a CUDACachingAllocator instance; if it is, throw an internal error if a block cannot be retrieved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30870

Differential Revision: D18851669

Pulled By: yxia11

fbshipit-source-id: c2f01798cd24f1fd0f35db8764057d5d333dab95
2019-12-10 08:04:00 -08:00
Elias Ellison
af4040d808 resubmit polish up overloads on free functions (#31014)
Summary:
Resubmitting https://github.com/pytorch/pytorch/pull/30356

The second commit reintroduces the deleted function that caused the previous revert.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31014

Differential Revision: D18899127

Pulled By: eellison

fbshipit-source-id: 9049b8718926c329d9cb46bb96eac6c278e9b866
2019-12-10 07:57:47 -08:00
Richard Zou
e05ee4c421 Remove BUILD_NAMEDTENSOR macros (#30894)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30894

This PR begins the process of removing BUILD_NAMEDTENSOR macros. There
will be followups.

Reasons for removing the macros:
- BUILD_NAMEDTENSOR is always on and has been on since pytorch 1.3.0.
- Since we don't test building without it, it is useless to keep around.
- Code becomes nicer to read without the macros

Reasons for not removing the macros:
- potential for feature flagging

Now, I argue against needing to feature flag. The main reason why we
might want to feature flag is if we need to disable the feature.
We'd need a fast switch to disable the feature if someone discovers
in the future that named tensors caused some regression in some existing workflows.

In https://github.com/pytorch/pytorch/pull/25798, I did a variety of
macro- and micro- benchmarks to determine the performance impact of named
tensors on regular tensors.

[The
microbenchmarks](https://github.com/pytorch/pytorch/pull/25798#issuecomment-529014810)
were not very stable, and running the
microbenchmarks for more iterations doesn't actually help because the
noise is not distributed in a nice way. Instead of microbenchmarks I ran
a [profiler
(perf)](https://github.com/pytorch/pytorch/pull/25798#issuecomment-555707645)
to estimate how much overhead named tensors add to unnamed code. I
estimated the overhead to be less than 100ns for `add` and even smaller
for `mm`; there are ways to optimize even further if we find this to be a
problem.

[Initial
macrobenchmarks](https://github.com/pytorch/pytorch/pull/25798#issuecomment-530539104)
were also not very stable. I ran imagenet for some number of epochs. To
make them more stable, I got rid of the data loading (which seemed to
vary between runs). [In some benchmarks without data
loading](https://github.com/pytorch/pytorch/pull/25798#issuecomment-562214053),
we can see that the results are less noisy now. These results support
no noticeable regressions in speed.

Test Plan: - wait for CI

Differential Revision: D18858543

Pulled By: zou3519

fbshipit-source-id: 08bf3853a9f506c6b084808dc9ddd1e835f48c13
2019-12-10 07:54:05 -08:00
Elias Ellison
f48a8901c5 Add floor_divide function (#30493)
Summary:
Adds `torch.floor_divide`, following NumPy's `floor_divide` API. I only implemented the out-of-place version; I can add the in-place version if requested.
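
A quick usage sketch:

```python
import torch

a = torch.tensor([5.0, 7.0, 9.0])
print(torch.floor_divide(a, 2))   # tensor([2., 3., 4.])
```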

Also fixes  https://github.com/pytorch/pytorch/issues/27512
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30493

Differential Revision: D18896211

Pulled By: eellison

fbshipit-source-id: ee401c96ab23a62fc114ed3bb9791b8ec150ecbd
2019-12-10 07:51:39 -08:00
neginraoof
5205556782 Export custom ops (#29752)
Summary:
Update to the export API:
When calling this API, a dict containing the custom opsets (domain and version) used to export the model can be provided.
We allow registering one custom opset (domain, version) per ONNX opset. So, when exporting an operator from a custom domain, users need to pass this pair. The default custom opset version is 1.
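
A sketch of the call shape (the domain name, version, and stand-in model below are illustrative; the keyword follows this PR's description):

```python
import io
import torch
import torch.nn as nn

model = nn.Linear(3, 2)   # stand-in for a model that uses a custom-domain op
buf = io.BytesIO()
torch.onnx.export(model, torch.randn(1, 3), buf,
                  custom_opsets={"my_domain": 2})   # {custom domain: opset version}
```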
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29752

Reviewed By: hl475

Differential Revision: D18703662

Pulled By: houseroad

fbshipit-source-id: 84d22557d132b526169051193d730761798fce60
2019-12-09 18:48:50 -08:00
Jerry Zhang
04b9324476 Factor out getInvokedMethod in InsertQuantDeQuantHelper (#30860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30860

att

Test Plan:
.

Imported from OSS

Differential Revision: D18849021

fbshipit-source-id: e5ff260f2f4e88075b0c6b32ccfd8272053ccc41
2019-12-09 16:10:58 -08:00
Wanchao Liang
73dd8c005a Revert D18864774: polish up overloads on free functions
Test Plan: revert-hammer

Differential Revision:
D18864774

Original commit changeset: 6c566738bd6f

fbshipit-source-id: 669192605a3bc1a6ba06bbb5cae54f61637a45ae
2019-12-09 15:41:45 -08:00
Elias Ellison
446488960a polish up overloads on free functions (#30356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30356

This finishes up the `torch.jit.overload` api for free-functions.
- defaults now required on the implementation function itself
- fully follows [overload spec](https://mypy.readthedocs.io/en/latest/more_types.html#function-overloading) such that the following is supported

```
overload
def mouse_event(x1: int, y1: int) -> ClickEvent: ...
def mouse_event(x1: int,
                y1: int,
                x2: Optional[int] = None,
                y2: Optional[int] = None): ...
```

Note: `jit.overload` isn't supported yet for UDTs, but is supported for modules. This PR doesn't make the same changes for modules; if reviewers think I should include them, I could do so in a follow-up PR or wait to land this. Since that's still an internal api I think it's fine, and the changes here would allow us to expose `torch.jit.overload` on free functions.

Test Plan: Imported from OSS

Differential Revision: D18864774

Pulled By: eellison

fbshipit-source-id: 6c566738bd6f0551a000a9ea8d56e403636b7856
2019-12-09 15:12:18 -08:00
Elias Ellison
a03581b927 add tests that schemas are valid (#30749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30749

Add check to schemas that the schema is sane.

I removed the defaults from symbolic_script because they were in some cases wrong and don't actually do anything. At the point they're invoked the forward should already have matched all arguments.

Test Plan: Imported from OSS

Differential Revision: D18864775

Pulled By: eellison

fbshipit-source-id: 273d7e96d65b8a3d3de72e2d7bfcdf2417046c6b
2019-12-09 15:12:13 -08:00
Shen Li
e9ca13d7f5 Add glue code to collect debug info from all components
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30888

Test Plan: Imported from OSS

Differential Revision: D18857139

Pulled By: mrshenli

fbshipit-source-id: 5c1bfb83a21a4a57c4297bb94f14baa09520b791
2019-12-09 14:39:11 -08:00
Shen Li
8a57362000 Fix index out of bound error in Engine::ready_queue_size when called before start_threads
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30967

Test Plan: Imported from OSS

Differential Revision: D18887178

Pulled By: mrshenli

fbshipit-source-id: 67baeac9214a4749ce7e9b4d89862c93620b2d5e
2019-12-09 14:39:07 -08:00
Shen Li
a38c9b1ade Adding debugging metrics to process group agent
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30884

Test Plan: Imported from OSS

Differential Revision: D18857140

Pulled By: mrshenli

fbshipit-source-id: 4ec61d13778dd49467159d0db4b6dd51feaf282b
2019-12-09 14:39:03 -08:00
Elias Ellison
82268bf300 handle reassignment to inf and nan (#30877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30877

Previously, when the environment tried to reassign variables which had been assigned to "inf" or "nan", it would fail because they are not simple values. Constant prop exposed this; a test was failing internally because of it.
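
A sketch of the pattern that now compiles:

```python
import torch

@torch.jit.script
def best_or_inf(x: torch.Tensor, use_x: bool) -> float:
    best = float("inf")          # variable first bound to inf
    if use_x:
        best = float(x.sum())    # reassignment used to trip up the compiler
    return best
```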

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D18861016

Pulled By: eellison

fbshipit-source-id: b9b72978a26a0b00b13bf8ea7685825551f5a541
2019-12-09 14:20:17 -08:00
Elias Ellison
3eefc06feb add constant prop for immutable types (#30544)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30544

Run Constant Propagation upon compilation only on ops with non-aliasing inputs and outputs. This speeds up the first run of `torchvision.models.resnet18` by over 50% and speeds up compilation by about 25% (although the effects didn't seem additive with https://github.com/pytorch/pytorch/pull/30503, so I'm going to land this PR first and then see if caching still has a sizable impact).

Running constant prop only with non-aliasing types does a lot of graph cleanup by removing constant ifs and a bunch of other smaller ops. It also avoids all the jitter problems we had when we tried running full constant prop previously. Because it is idempotent it doesn't jitter, and it doesn't jitter graphs constructed from tracing because tracing doesn't emit any ops that only involve non-aliasing inputs.

Full constant prop isn't idempotent because what ops are run depends on the state of mutation in the alias db, which will often change upon successive iterations of constant propagation, and because it affects graphs constructed from tracing.

Edit: if we were okay with running constant propagation on graphs constructed from tracing (potentially making them hard to debug), an alternative would be to run constant propagation until the graph reaches a fixed point.
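
A rough illustration of the kind of folding this enables on immutable (non-aliasing) values:

```python
import torch

@torch.jit.script
def f(x):
    # 2 + 3 and the comparison involve only non-aliasing values, so constant
    # propagation can fold the condition and drop the dead branch at compile time.
    if 2 + 3 == 5:
        return x + 1
    return x - 1
```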

Test Plan: Imported from OSS

Differential Revision: D18833607

Pulled By: eellison

fbshipit-source-id: 92a0adb4882d67ed5a0db5c279f5e122aeeba54a
2019-12-09 14:20:12 -08:00
Elias Ellison
648bb501a1 rename shouldAnnotate api (#30543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30543

`shouldAnnotate` doesn't make a ton of sense as a public api

Test Plan: Imported from OSS

Differential Revision: D18833608

Pulled By: eellison

fbshipit-source-id: 460ee05d0fa91b1edc640c037be2a6ee8eaf50a6
2019-12-09 14:20:07 -08:00
Jerry Zhang
5bf58274cc getQParams return a dictionary of qparams (#30859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30859

We can return a dictionary of quantization parameters to simplify the code
that handles these things a bit

Test Plan:
.

Imported from OSS

Differential Revision: D18849023

fbshipit-source-id: 09e9860b2656a1affa8776016e16794529bcee3b
2019-12-09 13:42:21 -08:00
Sebastian Messmer
536481d9de Fix missing virtual destructor (#30927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30927

Classes that are used virtually (e.g. have virtual methods) must have a virtual destructor; otherwise deleting them through a base-class pointer is undefined behavior.
ghstack-source-id: 95144736

Test Plan: waitforsandcastle

Differential Revision: D18870351

fbshipit-source-id: 333af4e95469fdd9103aa9ef17b40cbc4a343f82
2019-12-09 12:25:26 -08:00
Rohan Varma
4f342a61c1 add the worker IDs outside of addSendRpcBackward to ensure they are (#30914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30914

When tensors don't require grad, we don't call `addSendRpcBackward`, where we record known workerIDs to clean up the dist autograd context later. But since  https://github.com/pytorch/pytorch/pull/29781, we always include the autograd context ID in RPCs, even if tensors do not require grad. So, it could be possible that we don't release the contexts on some nodes.

This can contribute to OOMs since the contexts will not be cleaned up in this case, which can be checked by running the unit test without this patch. We can fix this issue by moving the `addKnownWorkerIds` call to the `getMessageWithAutograd` function.
ghstack-source-id: 95178561

Test Plan: Added a unit test: `test_context_cleanup_tensor_no_grad`

Differential Revision: D18869191

fbshipit-source-id: b80f66bfd0dd7d01960abe1691d3f44095bb1b2b
2019-12-09 11:38:34 -08:00
Pritam Damania
776fdda753 Add debug info API for distributed autograd. (#30642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30642

Adding a couple of basic metrics for distributed autograd which would
help in determining stuckness.
ghstack-source-id: 95156189

Test Plan: waitforbuildbot

Differential Revision: D18776478

fbshipit-source-id: a0556ad6fe2b7c3cd0082ee2350c1c78cafaaec5
2019-12-07 13:56:51 -08:00
BowenBao
63f1b780ba Support exporting aten::copy_ and aten::index_put to ONNX opset 11 (#26941)
Summary:
- [x] Add more comments and refactor the logic of `ReshapeToAdvancedIndexingFormat`
- [x] Add more description here. Cases that are/aren't supported, and how they are supported.
- [x] Need to merge this PR https://github.com/pytorch/pytorch/issues/27186 to enable testing inplace operators.

We are now supporting exporting aten::copy_ and aten::index_put to ONNX.
Here's a breakdown of the different cases in PyTorch code.

```
# Case 1: Scalar Indices
x[0, 1, 2] = data

# Case 2: Slice Indices
x[1:3, :, ::2] = data

# Case 3: Ellipsis Indices
x[..., 0] = data

# Case 4: Tensor Indices
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
x[ind1, ind2] = data

# Case 5: Mixing all the above cases
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
x[1:3, ind1, ind2, ..., 3] = data
```

Limitations:

Tensor indices must be consecutive, and 1-d tensors.

```
# Supported
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
x[ind1, ind2] = data

# Not supported
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
ind3 = torch.tensor([[0], [1]])
x[ind1, :, ind2] = data
x[ind3] = data
```

Negative indices are not supported.
```
# Not supported
x[-1] = data
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26941

Differential Revision: D17951030

Pulled By: houseroad

fbshipit-source-id: 4357777072f53aa0bc4b297aa1ee53457a7f8dec
2019-12-06 22:48:46 -08:00
Junjie Bai
a26238da57 Enable using torch.autograd.profiler.record_function as decorator (#30861)
Summary:
```python
from torch.autograd.profiler import profile, record_function

@record_function('my_func')
def f(x, y):
    return x + y

with profile() as prof:
    f(1, 2)
print(prof.key_averages().table())
```

```
------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Name                                  Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     Number of Calls
------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
my_func                               85.42%           86.796us         87.27%           88.670us         88.670us         1
------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Self CPU time total: 101.606us
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30861

Differential Revision: D18857993

Pulled By: bddppq

fbshipit-source-id: eb6b8e2a8d4f3a7f8e5b4cb3da1ee3320acb1ae7
2019-12-06 21:38:35 -08:00
Pritam Damania
5c56986738 Attach autograd edges only for tensors requiring grad. (#30904)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30904

When we sent tensors over RPC, on the server side we would call
addRecvRpcBackward which would call `set_history` on all tensors. This was
incorrect and set the `requires_grad` flag on tensors that didn't actually need
grad.

To fix this, we only attach autograd edges to tensors that need grads.
ghstack-source-id: 95113672
ghstack-source-id: 95113999

Test Plan: waitforbuildbot

Differential Revision: D18828561

fbshipit-source-id: d8942b76e9e4c567f8f1821f125c00d275ea0f90
2019-12-06 18:05:57 -08:00
Michael Suo
62b10721fb Actually make flake8 do something (#30892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30892

Fixes all outstanding lints and actually installs a properly configured
flake8

Test Plan: Imported from OSS

Differential Revision: D18862825

Pulled By: suo

fbshipit-source-id: 08e9083338a7309272e17bb803feaa42e348aa85
2019-12-06 17:50:50 -08:00
Xingying Cheng
7b97eaeba5 Add module level qpl logging. (#30906)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30906

Add mobile module observer to measure performance of each method run.
ghstack-source-id: 95120194

Test Plan:
Run pytext model through BI cloaking flow on lite-interpreter and verify logs are sent:
1. buck install -r fb4a
2. Go to internal setting and find MobileConfig, search for android_bi_infra_cloaking_iab_models and set the following params:
a. sample_rate: 1.0
b. enabled: true
c. use_bytedoc_pytorch_model: true
d. use_bytedoc_caffe2_model: false
e. use_full_jit: false
3. Go back to News Feed and scroll down until you find an ad that directs you to an offsite webpage;
4. Click on the ad and wait for the offsite page to load;
5. Click back to news feed;
6. Go to scuba table: https://fburl.com/scuba/4fghwp0b and see all the operator runs have been logged:

{F223456981}

Reviewed By: ljk53

Differential Revision: D18702116

fbshipit-source-id: a9f07eee684e3022cef5ba3c5934f30f20192a85
2019-12-06 15:52:26 -08:00
Nikolay Korovaiko
118f1c633b refactor the way we are handling bailout counts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30410

Differential Revision: D18733370

Pulled By: Krovatkin

fbshipit-source-id: 0ea9dc0f3dd1a47bcc09f1d54745460f9bd71886
2019-12-06 15:45:38 -08:00
Tongzhou Wang
c37de32b23 Enable len(dataloader) for iterable dataset (#23587)
Summary:
Copy-paste comment from code for reasoning:

```
            # NOTE [ IterableDataset and __len__ ]
            #
            # For `IterableDataset`, `__len__` could be inaccurate when one naively
            # does multi-processing data loading, since the samples will be duplicated.
            # However, no real use case should be actually using that behavior, so
            # it should count as a user error. We should generally trust user
            # code to do the proper thing (e.g., configure each replica differently
            # in `__iter__`), and give us the correct `__len__` if they choose to
            # implement it (this will still throw if the dataset does not implement
            # a `__len__`).
            #
            # To provide a further warning, we track if `__len__` was called on the
            # `DataLoader`, save the returned value in `self._len_called`, and warn
            # if the iterator ends up yielding more than this number of samples.
```

Fixes https://github.com/pytorch/pytorch/issues/30184
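
A minimal sketch of what is now allowed:

```python
from torch.utils.data import DataLoader, IterableDataset

class RangeDataset(IterableDataset):
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n))

    def __len__(self):
        return self.n   # trusted as-is, per the note above

loader = DataLoader(RangeDataset(8))   # default batch_size=1
print(len(loader))                     # 8; previously this raised for iterable-style datasets
```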
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23587

Differential Revision: D18852625

Pulled By: ailzhang

fbshipit-source-id: aea8d4d70c7f21aaa69b35908a6f43026493d826
2019-12-06 15:38:05 -08:00
Shen Li
26c51468c5 Fix examples in RRef API doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30857

Test Plan: Imported from OSS

Differential Revision: D18847527

Pulled By: mrshenli

fbshipit-source-id: 7dc9d28277597f8fc3ef97fa9ac98a312e76e6fb
2019-12-06 13:14:11 -08:00
Shen Li
642469b706 Fix examples in API doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30856

Test Plan: Imported from OSS

Differential Revision: D18847528

Pulled By: mrshenli

fbshipit-source-id: 57f666d9d4b634fb77b1b65debd2b07e2bebd57a
2019-12-06 13:14:06 -08:00
Shen Li
5e6c3fb23b Add more details to explain rpc_backend_options arg in init_rpc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30855

Test Plan: Imported from OSS

Differential Revision: D18847529

Pulled By: mrshenli

fbshipit-source-id: b4f0d5797f3b41cce155b7821d6bd34b268bd24e
2019-12-06 13:14:02 -08:00
Jerry Zhang
6d06b925ba Remove values_to_quantize_ (#30858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30858

This is not needed since we have `values_to_qparams_`

Test Plan:
.

Imported from OSS

Differential Revision: D18848992

fbshipit-source-id: dc81f59967a93abdd5562f1010f02de4f4e60db0
2019-12-06 12:15:13 -08:00
Xingying Cheng
81e4739141 Move QScheme ops to c10 (#30134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30134

ghstack-source-id: 95055387

Test Plan: buck build mode/dev caffe2:generate-code

Differential Revision: D18609716

fbshipit-source-id: fec39359e0b97387a9b13f8179d72a731cc61808
2019-12-06 12:04:51 -08:00
Xingying Cheng
78254eab45 Add mobile operator observer for qpl logging.
Summary: Add mobile operator observer to measure performance of each operator run, the result will also log into QPL event: [MOBILE_OPERATOR_STATS ](https://fburl.com/quicklog/8773a00a).

Test Plan:
Run pytext model through BI cloaking flow on lite-interpreter and verify logs are sent:
1. buck install -r fb4a
2. Go to internal setting and find MobileConfig, search for android_bi_infra_cloaking_iab_models and set the following params:
a. sample_rate: 1.0
b. enabled: true
c. use_bytedoc_pytorch_model: true
d. use_bytedoc_caffe2_model: false
e. use_full_jit: false
3. Go back to News Feed and scroll down until you find an ad that directs you to an offsite webpage;
4. Click on the ad and wait for the offsite page to load;
5. Click back to news feed;
6. Go to scuba table: https://fburl.com/scuba/er7t4g9u and see all the operator runs have been logged:

{F223250762}

Reviewed By: ljk53

Differential Revision: D18131224

fbshipit-source-id: 23e2f6e2a9851c04b29511b45dc53f3cce03e8a0
2019-12-06 11:55:32 -08:00
Sebastian Messmer
e123d90a93 Back out "Back out "Back out "Revert D18542342: Boxed variable dispatch""" (#30650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30650

Original commit changeset: 51bb7aac7cb7
ghstack-source-id: 95082205

Test Plan: CI

Differential Revision: D18778190

fbshipit-source-id: 7e9577e88fd0492006b6ea836ec081aea9da6b0c
2019-12-06 11:45:09 -08:00
Sebastian Messmer
37435d36ed Refactor VariableTypeManual (#30649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30649

Operators in VariableTypeManual are now no longer registered against the VariableTypeId key, but they are registered as compound ops. See https://github.com/pytorch/pytorch/issues/30102 for background.

This also requires the non-variable codegen to ignore them and requires removal of VariableMethodStubs.cpp.

So, because function_wrapper.py now also needs to know which ops are manual, instead of having a hard-coded list in gen_variable_type.cpp for ops with manual implementation, we now have a `manual_kernel_registration` flag in native_functions.yaml that disables the registration of operator kernels for this operator (the schema is still registered). Then, we manually register the right kernels for the operator.
ghstack-source-id: 95082204

Test Plan: unit tests

Differential Revision: D18778191

fbshipit-source-id: 0af6f9e43ff4fb9800ce19b286dfccd0fd22cc41
2019-12-06 11:45:05 -08:00
Jerry Zhang
4ed2eae2d0 Add registerQParams function (#30552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30552

For upcoming changes to support quantizing shared class type

Test Plan:
.

Imported from OSS

Differential Revision: D18818653

fbshipit-source-id: 393a55db69b20a1c00ffa0157ab568cb097915b2
2019-12-06 11:17:35 -08:00
Anjali Chourdia
5687ee1d85 added a serialize function in SGD class to utilize the existing macro for serialization/deserialization calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30739

Differential Revision: D18842908

Pulled By: anjali411

fbshipit-source-id: 7dc13ff9c4fc126790b88b1b4b5d03425c349d38
2019-12-06 08:38:07 -08:00
Seiya Tokui
1d7b40f1c4 Fix reading __cuda_array_interface__ without strides (#24947)
Summary:
When converting a contiguous CuPy ndarray to Tensor via `__cuda_array_interface__`, an error occurs due to incorrect handling of default strides. This PR fixes this problem. It makes `torch.tensor(cupy_ndarray)` work for contiguous inputs.
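
A minimal sketch (assuming CuPy is installed and a CUDA device is available):

```python
import cupy
import torch

a = cupy.arange(6, dtype=cupy.float32)   # contiguous; the interface reports default strides
t = torch.tensor(a)                      # now succeeds via __cuda_array_interface__
print(t)
```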
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24947

Differential Revision: D18838986

Pulled By: ezyang

fbshipit-source-id: 2d827578f54ea22836037fe9ea8735b99f2efb42
2019-12-06 07:36:27 -08:00
Xintao Chen
9a858aba5f Moving checks related to options.aliasAnalysis and schema.hasAliasInfo to read callsite (#30671)
Summary:
**Context:**
In D18530964, we allow not set aliasAnalysis at previous registration call, and then update it to the correct one in following registration call.

But its not working E2E due to those existing checks.

So we want to remove or delay those TORCH_CHECKs.

Here is the existing three callsites for operator.aliasAnalysisKind():
https://our.intern.facebook.com/intern/diffusion/FBS/browse/master/fbcode/caffe2/torch/csrc/jit/ir.cpp?lines=994%2C995%2C996%2C1001%2C1004

https://our.intern.facebook.com/intern/diffusion/FBS/browse/master/fbcode/caffe2/torch/csrc/jit/operator.cpp?lines=147%2C155

https://our.intern.facebook.com/intern/diffusion/FBS/browse/master/fbcode/caffe2/torch/csrc/jit/passes/alias_analysis.cpp?lines=260%2C277%2C380

**Things to check**
1. Those two checks are different. But since in original op_registration code, if options.schemaOrName_->is_right() is FALSE, we kind of convert it to FunctionSchema type, so in the read callsites, we only need to check the following: options.aliasAnalysisKind_ == AliasAnalysisKind::FROM_SCHEMA ||  !schema.hasAnyAliasInfo()

2. If the three callsites above are indeed needed for those checks.

3. Here we made assumptions that for reads from jit or other places, it's always being called after all registration calls are done. We are trying to make sure it's a valid assumption.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30671

Test Plan: Will update and refactor the tests soon.

Differential Revision: D18784623

Pulled By: charliechen0401

fbshipit-source-id: 75edea140d0ae3e54820e1aeef010c81fe26416a
2019-12-06 01:36:22 -08:00
Shen Li
619e2ffe23 Replace deprecated AT_* with TORCH_* to reduce warnings in c10d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30795

Test Plan: Imported from OSS

Differential Revision: D18826310

Pulled By: mrshenli

fbshipit-source-id: 0041ac2e5788e874e0a566abd57a8a90e658da9b
2019-12-06 01:28:30 -08:00
Shen Li
b0cba8ceae Replace deprecated AT_ERROR with TORCH_CHECK to reduce warnings in rpc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30794

Test Plan: Imported from OSS

Differential Revision: D18826311

Pulled By: mrshenli

fbshipit-source-id: bfd58d30f386bbe9535264b2afce4acbe7ac5b0e
2019-12-06 01:28:26 -08:00
Satendra Gera
d32aec5ad6 Add get_metrics and get_debug_info to rpc agent (#30833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30833

[rpc] Add get_metrics and get_debug_info to rpc agent

Test Plan: UT and builds

Reviewed By: mrshenli

Differential Revision: D18835068

fbshipit-source-id: f552cf196bb6d54ccd38a44ba981e7d5b15513f0
2019-12-05 23:52:42 -08:00
Jerry Zhang
f1755d9aea Insert GetAttr for quantization parameters instead of Constant (#30551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30551

To enable quantizing with shared types, we need to insert GetAttr nodes for
quantization parameters since the code might be shared by multiple module instances
and we'd like to make quantized module instance also share the same code but with
different values of attributes.

Test Plan:
test_jit.py, test_quantization.py

Imported from OSS

Differential Revision: D18818652

fbshipit-source-id: fc95623cac59dcedd9e3f95397524eae515e7a11
2019-12-05 22:52:45 -08:00
Jerry Zhang
a7406516d1 Refactor bias and weight check and add aten::linear pattern (#30474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30474

There are some common parts in `isBiasOfConvOrLinear` and `isWeightOfConvOrLinear` that we can factor
out; the refactor will allow for easier extension to new patterns

Test Plan:
python test/test_jit.py
python test/test_quantization.py

Imported from OSS

Differential Revision: D18795725

fbshipit-source-id: 446463da5e3fa8464db441ed0d9651930487b3b7
2019-12-05 21:00:39 -08:00
Supriya Rao
a51c5f5cbf Add JIT pass to insert permutes for conv ops (#30679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30679

Caffe2 expects quantized ops to be in NHWC format while PyTorch inputs are in NCHW.
Add a JIT pass that inserts permutes to convert from NCHW to NHWC before each conv op and adds an NHWC-to-NCHW permute after the conv op.
The graph rewriter is used to find consecutive redundant permutes and remove them from the graph.

Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps

Imported from OSS

Differential Revision: D18790518

fbshipit-source-id: 4dd39cf0b31b21f5586c0edfdce2260d4e245112
2019-12-05 18:51:16 -08:00
peterjc123
6486bdfb90 Fix os.register_at_fork not defined on Windows (#30809)
Summary:
According to https://docs.python.org/3.8/library/os.html#os.register_at_fork, this function is only available on Unix platforms.
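
A sketch of the guard pattern used (the handler below is a placeholder):

```python
import os

def _reinit_in_child():
    pass   # placeholder: e.g. re-initialize per-process state after fork

# os.register_at_fork only exists on Unix (Python 3.7+), so guard the call:
if hasattr(os, "register_at_fork"):
    os.register_at_fork(after_in_child=_reinit_in_child)
```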
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30809

Differential Revision: D18828777

Pulled By: bddppq

fbshipit-source-id: 3325a984da488bb0a80a5c27131553fbcf78921f
2019-12-05 13:36:53 -08:00
Will Feng
244b0bd1a5 Add docs for how we expose declarations in at:: to torch:: (#30760)
Summary:
This PR adds docs for how we expose declarations in `at::` to `torch::`, to make the semantics more clear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30760

Differential Revision: D18833081

Pulled By: yf225

fbshipit-source-id: eff4d8815c67f681ce3a930ce99771cf2e55dbd9
2019-12-05 13:05:28 -08:00
Nathan Goldbaum
f531815526 Deprecate tensor.type() (#30281)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29161.

I looked a bit at the code changes related to this and think I have all of the use cases of `DeprecatedTypeProperties` covered in the message, but suggestions from someone with more context on this would be very much appreciated :)
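
A sketch of common non-deprecated alternatives (illustrative; the exact guidance lives in the warning message):

```python
import torch

x = torch.randn(2, 2)

# Instead of the deprecated x.type():
print(x.dtype, x.device)        # query dtype / device directly
y = x.to(torch.float64)         # convert dtype with .to()
z = x.to("cpu")                 # or move devices with .to()
```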
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30281

Differential Revision: D18830818

Pulled By: ezyang

fbshipit-source-id: 1a7fcee15354ae09e6644577e7fa33bd26acfe20
2019-12-05 10:55:34 -08:00
Heungsub Hans Lee
fa251cfd97 Fully deprecate variadic inputs of checkpoint_sequential (#25985)
Summary:
Support for variadic inputs to `checkpoint_sequential` was deprecated in https://github.com/pytorch/pytorch/issues/21006. This case was meant to emit a `DeprecationWarning` for PyTorch 1.2 and to simply fail with a `TypeError` since PyTorch 1.3. This patch removes the PyTorch 1.2 `DeprecationWarning`.
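
A sketch of the calling conventions involved (model and sizes are arbitrary):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 4))
x = torch.randn(2, 4, requires_grad=True)

out = checkpoint_sequential(model, 2, x)    # single input: supported
# checkpoint_sequential(model, 2, x, x)     # variadic inputs: now a TypeError
```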
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25985

Differential Revision: D18809875

Pulled By: albanD

fbshipit-source-id: e84dd8629c04979c4b2dc63e8ada94292e8cedd0
2019-12-05 09:23:28 -08:00
Jerry Zhang
1d20c32bf1 Make InsertQuantDeQuantHelper global (#30550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30550

Right now we have an `InsertQuantDeQuantHelper` for each module, but we need
it to be global because we need to know which graphs have been quantized before;
based on this information we can decide how to handle the module instance.

Test Plan:
test_jit.py, test_quantization.py

Imported from OSS

Differential Revision: D18818651

fbshipit-source-id: bfcaf37094ce20a257171a0c99b05b9348ebc13d
2019-12-04 20:03:00 -08:00
Jerry Zhang
c4c2e23385 Supporting making submodules unique (#30037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30037

Support quantization for modules with reused submodules, e.g. relu (automatically make unique)
We first do a pass on the graph to find all duplicate uses of the same module, and record the `Value`s of the
module instance, for each of these values we create a new module and change the access to that module.

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D18821483

fbshipit-source-id: 1698b981e9e9f0c728d9f03fcbcfbd260151f679
2019-12-04 19:26:56 -08:00
Zachary DeVito
7a2889b014 Stop producing op_version_set version numbers.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28122

Test Plan: Imported from OSS

Differential Revision: D17959565

Pulled By: zdevito

fbshipit-source-id: 701101bd870700eb0c9882c69e2cfdd2524b555e
2019-12-04 19:14:43 -08:00
Jerry Zhang
3c1bb21cf5 Invoke more passes in insertObservers (#30473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30473

Invoked the `ConstantPooling` and `FuseLinear` passes before
`insertObservers`.
`ConstantPooling` is for cleaning up the traced graph, e.g. when we
have two constant nodes that have the same value, this pass will merge them;
this allows us to have fewer quantization patterns.
`FuseLinear` merges the exploded linear function into `aten::linear` so
that we can quantize this function properly. We need to fuse it because right now
the way we recognize weight and bias is by matching the argument position in certain function
calls, e.g. the 1st argument of aten::conv2d is weight. Therefore we have to preserve
the boundary of the linear function to recognize the weight of linear, since in the exploded
linear code the input of addmm is the transposed weight rather than the original weight of linear.
ghstack-source-id: 94887831

Test Plan:
This is needed for quantizing traced model tests to pass

Imported from OSS

Differential Revision: D18795722

fbshipit-source-id: 192d9d1e56307e2e1d90e30dce0502e31cb4f829
2019-12-04 18:45:04 -08:00
Wanchao Liang
569ea63f3b fix anynonzero op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29423

Test Plan: Imported from OSS

Differential Revision: D18820523

fbshipit-source-id: 55c7a1911121f0aed008bd684b448151bbbf0a8a
2019-12-04 16:40:43 -08:00
Jerry Zhang
1707774417 AddConstant and findConstant for ClassType (#29217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29217

We want to preserve constant information in ClassType so that
users can access the constants in the module by name.
This is also used later for freezing some attribute(converting
attributes to constant)

Test Plan:
tbd

Imported from OSS

Differential Revision: D18799955

fbshipit-source-id: fbfbcd5d3f7f560368b96e2a87e270c822a3d03a
2019-12-04 14:17:13 -08:00
davidriazati
2308a0ec1b Improve documentation around builtin functions (#30347)
Summary:
This breaks the builtins page into some more sections and adds details about Python built-in functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30347

Pulled By: driazati

Reviewed By: wanchaol

Differential Revision: D18718166

fbshipit-source-id: bf43260ab7bcf92cccef684a5ce68cb16020771d
2019-12-04 13:50:40 -08:00
Nathan Goldbaum
9d3402e4cb Add the __torch_function__ API override mechanism (#30730)
Summary:
This is a re-do of https://github.com/pytorch/pytorch/issues/27064, which was reverted (b8792c0438). This was landed at the same time as other work that added new operators to the `torch` namespace so the check for whether the `torch` namespace is exhaustively checked for overridability was triggering test failures.

I've temporarily disabled that check and added an explanatory comment that the check will be re-enabled in a future PR that will be merged during a time when the commit velocity on PyTorch is lower.
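
A toy sketch of the protocol (illustrative only; the exact signature accepted at this point in history evolved across releases, and the spelling below follows the later documented form):

```python
import torch

class DiagonalTensor:
    """Stands in for an n x n diagonal matrix with `value` on the diagonal."""
    def __init__(self, n, value):
        self._n, self._value = n, value

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is torch.mean:
            d = args[0]
            return d._value / d._n   # mean of the implied n x n matrix
        return NotImplemented

print(torch.mean(DiagonalTensor(5, 2.0)))   # 0.4
```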
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30730

Differential Revision: D18813270

Pulled By: ezyang

fbshipit-source-id: 70477c4656dca8fea6e7bc59259555041fcfbf68
2019-12-04 13:19:07 -08:00
Elias Ellison
d38f9117fd Cache compilation of free functions (#30503)
Summary:
We don't have to recompile free functions if we've already compiled them.

Improved compilation of resnet18 by 27%.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30503

Differential Revision: D18796501

Pulled By: eellison

fbshipit-source-id: 2dee0fc5fcf9adc5b92213f8cb813730d71b376f
2019-12-04 12:45:35 -08:00
Jerry Zhang
756f279d95 Rename QuantizeHelper to InsertQuantDeQuantHelper (#30549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30549

Preparing for later refactoring

Test Plan:
.

Imported from OSS

Differential Revision: D18802464

fbshipit-source-id: 0b5afb143549d93eed4c429125d3d5fd253093a9
2019-12-04 10:40:22 -08:00
Jerry Zhang
f73cd28082 InsertObservers for shared class types (#30548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30548

ClassTypes can be shared among different module instances, but previously we assumed
they would be unique, this PR enables the insert_observers pass to work with shared class types

Test Plan:
python test/test_jit.py
python test/test_quantization.py

Imported from OSS

Differential Revision: D18802465

fbshipit-source-id: b782e71e44a043af45577ac2b5c83e695155bb8b
2019-12-04 09:34:47 -08:00
Edward Yang
a55f125e3b Check the error return of nvrtcGetProgramLogSize and nvrtcGetProgramLog (#30663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30663

Yes they can fail.  See https://github.com/ROCm-Developer-Tools/HIP/issues/1706

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18810088

Pulled By: ezyang

fbshipit-source-id: 96186e71c9a195bdbbed811e7ba8dc40bec09eae
2019-12-04 08:37:43 -08:00
Edward Yang
38986e1dea Split libtorch.so back into libtorch_{cpu,cuda,hip} (#30315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30315

The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
This is a reland of https://github.com/pytorch/pytorch/pull/29731 but
I've extracted all of the prep work into separate PRs which can be
landed before this one.

Some things of note:

* torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
* The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
* In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
* A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly
* I had to make torch_cpu/torch_cuda caffe2_interface_library targets so that they get whole-archive linked into torch when you statically link. And I had to do this in an *exported* fashion because torch needs to depend on torch_cpu_library. In the end I exported everything and removed the redefinition in the Caffe2Config.cmake. I am not too sure why the old code did it this way in the first place; however, it doesn't seem to have broken anything to switch it this way.
* There's some uses of `__HIP_PLATFORM_HCC__` still in `torch_cpu` code, so I had to apply it to that library too (UGH). This manifests as a failure when trying to run the CUDA fuser. This doesn't really matter substantively right now because we still in-place HIPify, but it would be good to fix eventually. This was a bit difficult to debug because of an unrelated HIP bug, see https://github.com/ROCm-Developer-Tools/HIP/issues/1706

Fixes #27215 (as our libraries are smaller), and executes on
part of the plan in #29235.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18790941

Pulled By: ezyang

fbshipit-source-id: 01296f6089d3de5e8365251b490c51e694f2d6c7
2019-12-04 08:04:57 -08:00
Will Price
1189595875 Fix Tensor.argsort -> torch.argsort documentation link
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30464

Differential Revision: D18717657

Pulled By: zou3519

fbshipit-source-id: 9894f63c6cb1b5311117441e78805230d1bc09f3
2019-12-04 07:49:38 -08:00
Edward Yang
b8792c0438 Revert D18645954: add __torch_function__ API override mechanism
Test Plan: revert-hammer

Differential Revision:
D18645954

Original commit changeset: 54b5e4344d7a

fbshipit-source-id: 4a7aebb483e6b001130d6f384ccc53c5a808ab13
2019-12-04 07:41:47 -08:00
Tongzhou Wang
a68b790293 fix ref to nonexistent torch.repeat
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30614

Differential Revision: D18808517

Pulled By: ezyang

fbshipit-source-id: 27f9bda6fbbd1c3c751a0e96fdc336bf724c0b31
2019-12-04 07:27:01 -08:00
Tongzhou Wang
ec7bb9de1c format tri[lu]_indices doc better
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30377

Differential Revision: D18689152

Pulled By: zou3519

fbshipit-source-id: 7fab1e39ecd39ef6a3869befcbe217f8d3b6a87e
2019-12-04 07:16:34 -08:00
Tongzhou Wang
d6ca93b353 add doc for F.softplus
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30055

Differential Revision: D18762624

Pulled By: zou3519

fbshipit-source-id: 61da88cbb8cd0f37ac26b0fb8aaacdbe85c724ba
2019-12-04 07:16:30 -08:00
Prasun Anand
d12786b24f add __torch_function__ API override mechanism (#27064)
Summary:
Closes https://github.com/pytorch/pytorch/issues/24015 (see description of that issue for more details).

For a toy example, see the `DiagonalTensor` and `SubDiagonalTensor` class in test/test_overrides.py.

This PR currently contains:

* tests for `__torch_function__` behavior
* modification to `gen_python_functions` and `parse` function signatures and dispatched to correct overloaded argument.

This feature is inspired by and analogous to NumPy's `__array_function__` protocol ([see NumPy Enhancement Proposal 18](https://numpy.org/neps/nep-0018-array-function-protocol.html#trying-array-function-methods-until-the-right-one-works)).

### Benchmarks:
See Nathan's comment below: https://github.com/pytorch/pytorch/pull/27064#issuecomment-554601189
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27064

Differential Revision: D18645954

Pulled By: ezyang

fbshipit-source-id: 54b5e4344d7afdbcf996bb57191b0bdadc7b1767
2019-12-04 05:56:46 -08:00
Martin Yuan
b26401f965 Dump operator names of a script module (#30467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30467

Introduce the function jit.export_opnames(module), which returns a list of all operator names used in the module and its submodules. One use is for a mobile custom build to link only the operators in the returned list, reducing the mobile binary size.

Example:
import torch
m = torch.jit.load("example.pt")
print(torch.jit.export_opnames(m))

The outputs are in alphabetical order:
['aten::_convolution', 'aten::add.Tensor', 'aten::add_.Tensor', 'aten::addmm', 'aten::append.Tensor', 'aten::cat', 'aten::dropout', 'aten::embedding', 'aten::matmul', 'aten::max.dim', 'aten::mul.Tensor', 'aten::permute', 'aten::relu', 'aten::t', 'aten::tanh', 'prim::ListConstruct', 'prim::TupleConstruct', 'prim::TupleUnpack']

Test Plan: Imported from OSS

Differential Revision: D18801619

Pulled By: iseeyuan

fbshipit-source-id: f9b198d3e82b095daf704ee595d8026ad889bb13
2019-12-03 20:20:33 -08:00
Shen Li
63a1542ed2 Adding Debug Info for RRef Context
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30610

Test Plan: Imported from OSS

Differential Revision: D18763592

Pulled By: mrshenli

fbshipit-source-id: ad8854bdb6250c29eaa0f582d66cfd31394312e5
2019-12-03 19:16:31 -08:00
Shen Li
6dda241ab8 Add RRef.__str__() API
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30609

Test Plan: Imported from OSS

Differential Revision: D18763593

Pulled By: mrshenli

fbshipit-source-id: 20f1eea2d6cfe9ab2a27a9677d97dde07c1dca9b
2019-12-03 19:16:26 -08:00
Hong Xu
bb5dcaf24f Add logical_and and logical_or (#30521)
Summary:
This re-lands the change with the CI failure caused by 8bbafa0b32 fixed (incorrect return type of the lambdas in the CUDA kernels).
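
A quick usage sketch:

```python
import torch

a = torch.tensor([True, True, False])
b = torch.tensor([True, False, False])
print(torch.logical_and(a, b))   # tensor([ True, False, False])
print(torch.logical_or(a, b))    # tensor([ True,  True, False])
```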
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30521

Differential Revision: D18770151

Pulled By: ailzhang

fbshipit-source-id: 02f0fe1d5718c34d24da6dbb5884ee8b247ce39a
2019-12-03 18:24:54 -08:00
Prasun Anand
3cf8382984 detect_anomaly() for SparseTensors (#29803)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28649

1. Modified detect_anomaly() to use isnan()
2. isnan() for SparseTensors returns a bool Tensor of _values.
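
A minimal sketch of a sparse-gradient case that anomaly mode can now handle (using sparse embedding gradients as the trigger):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10, 3, sparse=True)      # produces sparse gradients

with torch.autograd.detect_anomaly():
    emb(torch.tensor([1, 2, 3])).sum().backward()

print(emb.weight.grad.is_sparse)            # True
```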
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29803

Differential Revision: D18594299

Pulled By: ezyang

fbshipit-source-id: 3f4190c569f53219be330584fc604ca43c4a6c7a
2019-12-03 15:42:51 -08:00
Rohan Varma
fef4360536 remove default constructor in futureInfo (#30197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30197

This default constructor was added because std::map's operator[]
requires a default constructor. However, instead of using operator[], we can
use emplace and remove the constructor, to ensure that the FutureInfo struct
doesn't get constructed with garbage values.
ghstack-source-id: 94802453

Test Plan: Unit tests pass.

Differential Revision: D18627675

fbshipit-source-id: c4cb000e60081478c0fd7308e17103ebbc4dc554
2019-12-03 15:36:22 -08:00
Tristan Rice
59151d3e43 autograd/profiler: support merging FunctionEventAvg (#30677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30677

Currently you can only add FunctionEvents to FunctionEventAvg. This makes it so you can add multiple FunctionEventAvg objects together. This is useful for merging multiple profiles together such as when dealing with distributed training.

Test Plan:
added unit test

  buck test //caffe2/test:autograd -- test_profiler

Reviewed By: bddppq

Differential Revision: D18785578

fbshipit-source-id: 567a441dec885db7b0bd8f6e0ac9a60b18092278
2019-12-03 15:28:58 -08:00
Peter Bell
dcd1216efe Force early initialization of OpenMP in forked children (#29006)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28389

Intel's OpenMP implementation sets the thread affinity on the first call to an OpenMP function after a fork. By adding an atfork handler we can force this to happen before a user tries to set the affinity in their own DataLoader `worker_init_fn`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29006

Differential Revision: D18782456

Pulled By: ezyang

fbshipit-source-id: ce0b515256da0cf18ceb125e0cdec99a3311bbd3
2019-12-03 15:23:31 -08:00
Nikolay Korovaiko
d4c25add45 make sure the counter stays correct in between bailout transitions (#30186)
Summary:
This fixes the second issue reported in https://github.com/pytorch/pytorch/issues/29909 namely, a loop counter is assigned the wrong values after transitioning to a bailout graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30186

Differential Revision: D18646845

Pulled By: Krovatkin

fbshipit-source-id: 1f7c601dd9f35892979385ffa132fb0886a4f203
2019-12-03 14:59:08 -08:00
Will Feng
03a73cb9ac Remove namespace F = torch::nn::functional from torch/nn/modules/batchhnorm.h (#30684)
Summary:
This PR removes `namespace F = torch::nn::functional` from `torch/nn/modules/batchhnorm.h`, so that people don't have to define `torch::nn::functional` as `F` if they don't want to.

Fixes https://github.com/pytorch/pytorch/issues/30682.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30684

Differential Revision: D18795717

Pulled By: yf225

fbshipit-source-id: c9feffbeb632cc6b4ce3e6c22c0a78533bab69ad
2019-12-03 14:52:23 -08:00
Brian Vaughan
604a27361f remove tuple_parser (#30659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30659

I could only find one usage of TupleParser and it doesn't seem worth maintaining just for that one usage.

Test Plan: Imported from OSS

Differential Revision: D18795979

Pulled By: nairbv

fbshipit-source-id: 6e50d65fc8fade0944f36ab20d00f1539a3d4cb8
2019-12-03 14:49:59 -08:00
Supriya Rao
980aead1f8 Add support for quantized slice conversion (#30498)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30498

Updated Int8SliceOp to accept dim, start and end index similar to Pytorch.

Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_slice

Imported from OSS

Differential Revision: D18740519

fbshipit-source-id: 2313f37a4936edb150ce04911b241e591e191801
2019-12-03 14:37:59 -08:00
Sebastian Messmer
bc2e6d10fa Back out "Revert D17908478: Switch PyTorch/Caffe2 to C++14"
Summary: Original commit changeset: 775d2e29be0b

Test Plan: CI

Reviewed By: mruberry

Differential Revision: D18775520

fbshipit-source-id: a350b3f86b66d97241f208786ee67e9a51172eac
2019-12-03 14:33:43 -08:00
Yanli Zhao
40146eb48e Skip ProcessGroupGlooAyncTest if there is no CUDA available (#30345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30345

Skip ProcessGroupGlooAyncTest if there is no CUDA available; otherwise, on non-GPU Sandcastle hosts the test will abort because it fails to load the CUDA library.
ghstack-source-id: 94771241

Test Plan: test skipped on non GPU host

Differential Revision: D18665322

fbshipit-source-id: 8c7b89aeecc6ec007bee12d864a6058384254e61
2019-12-03 13:27:34 -08:00
Jerry Zhang
19cd90d303 Globally record observer nodes (#30547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30547

att

Test Plan:
test_jit.py test_quantization.py

Imported from OSS

Differential Revision: D18784752

fbshipit-source-id: 000e140aa86ff12a240d98da71871a5a5053401f
2019-12-03 12:16:00 -08:00
Jerry Zhang
7023e13fbb Fix mapping white list (#30636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30636

Currently DeQuantStub is still in the whitelist because set union has
lower precedence than set difference.
Fixes issue: https://github.com/pytorch/pytorch/issues/29646

Test Plan:
verified locally that we don't attach qconfig for DeQuantStub

Imported from OSS

Differential Revision: D18775275

fbshipit-source-id: 8da07e40963555671b3d4326c9291706103f858e
2019-12-03 11:34:28 -08:00
Ailing Zhang
a997f224ac Add torch.multiprocessing.create_processes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28493

Differential Revision: D18766066

Pulled By: ailzhang

fbshipit-source-id: 7f424c8fae3012be2416cf9bc72ee2dde40c1f89
2019-12-03 10:38:19 -08:00
Lara
4d30415f12 Add ONNX Scripting Conv Support (#30618)
Summary:
Convolution nodes are traced as aten::_convolution and are currently supported in ONNX.
Scripting convolution uses aten::conv<1,2,3>d, which are currently not supported in ONNX.
This PR adds the symbolics for aten::conv<1,2,3>d and aten::conv_transpose<1,2,3>d.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30618

Reviewed By: hl475

Differential Revision: D18778145

Pulled By: houseroad

fbshipit-source-id: 4af0379f29974a1ce8443024d1d87b3eb8d2dd36
2019-12-03 10:28:38 -08:00
Jerry Zhang
89be1a22d4 split getInvokedMethods (#30546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30546

factor out this function for later support of quantizing shared types

Test Plan:
test_jit.py, test_quantization.py

Imported from OSS

Differential Revision: D18776304

fbshipit-source-id: f5a736b0f69019cefe17ec4517da1ae5462f78e1
2019-12-03 10:11:57 -08:00
Rohan Varma
5a484245d9 Change test_invalid_names test to only test constructor of WorkerInfo (#30620)
Summary:
This test only checks that we throw exceptions in the `WorkerInfo` constructor when invalid names are passed in, so I don't think we need to complicate it by initializing RPC and exposing ourselves to potential flakiness.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30620

Differential Revision: D18766955

Pulled By: rohan-varma

fbshipit-source-id: 11643de4d57431e5f46e096c7766de3ab0b9b05a
2019-12-03 09:07:10 -08:00
Shen Li
f9f54201d3 Remove deprecated fromIvalue in RRefForkData
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30646

Test Plan: Imported from OSS

Differential Revision: D18777610

Pulled By: mrshenli

fbshipit-source-id: 7a749c1035e36bbb464332d3829fd53e2c6cf727
2019-12-03 09:01:40 -08:00
Brian Vaughan
e5b947a3a8 Raise an error for is_signed on quantized types (#30527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30527

When we introduced dtype.is_signed we allowed for support of
quantized types, but we're not sure what the correct result should be.

See discussion at https://github.com/pytorch/pytorch/pull/29511

Test Plan: Imported from OSS

Differential Revision: D18765410

Pulled By: nairbv

fbshipit-source-id: c87cfe999b604cfcbbafa561e04d0d5cdbf41e6d
2019-12-03 06:34:53 -08:00
Will Feng
18ec4632b3 Exclude undefined tensors in the result of Module::parameters() / named_paramters() / buffers() / named_buffers() (#30626)
Summary:
PR https://github.com/pytorch/pytorch/pull/30523 attempted to fix https://github.com/pytorch/pytorch/issues/30508 and https://github.com/pytorch/pytorch/issues/30462, but the fix wasn't complete. This PR makes the following improvements:
1. Fixes https://github.com/pytorch/pytorch/issues/30508 and https://github.com/pytorch/pytorch/issues/30462 properly by excluding undefined tensors in the result of `Module::parameters()` / `named_parameters()` / `buffers()` / `named_buffers()`, which mirrors the Python API behavior.
2. Audits all use sites of `Module::parameters_` / `buffers_` and change them to `Module::named_parameters(/*recurse=*/false)` / `named_buffers(/*recurse=*/false)` when appropriate, so that use sites of module parameters / buffers never need to worry about undefined tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30626

Differential Revision: D18777507

Pulled By: yf225

fbshipit-source-id: 55b64b69779e1186342efd3c44857f416334ed6b
2019-12-02 21:59:58 -08:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
Jianyu Huang
0bebfe2143 Add the explicit per-tensor/per-channel quant info when we print the module (#30591)
Summary:
As Title says. We would like to explicitly distinguish per-tensor/per-channel scheme when we print the module.

Here is an example for Lenet after applying the per-channel dynamic quantization:

Before this PR:
```
FloatModel(
  (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(20, 50, kernel_size=(5, 5), stride=(1, 1))
  (fc1): DynamicQuantizedLinear(
    in_features=800, out_features=500
    (_packed_params): LinearPackedParams()
  )
  (fc2): DynamicQuantizedLinear(
    in_features=500, out_features=10
    (_packed_params): LinearPackedParams()
  )
)
```

After this PR:
```
FloatModel(
  (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(20, 50, kernel_size=(5, 5), stride=(1, 1))
  (fc1): DynamicQuantizedLinear(
    in_features=800, out_features=500, qscheme=torch.per_channel_affine
    (_packed_params): LinearPackedParams()
  )
  (fc2): DynamicQuantizedLinear(
    in_features=500, out_features=10, qscheme=torch.per_channel_affine
    (_packed_params): LinearPackedParams()
  )
)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30591

Differential Revision: D18764366

Pulled By: jianyuh

fbshipit-source-id: e897ab42ace6b82b2a90729ba788313c7873de1a
2019-12-02 20:14:46 -08:00
Jeremy Lilley
4dab29a2bd Fix serialization memory lifetime issue. (#30603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30603

The Pickler object needs to be kept in scope until the data has been written out to the
final serialized string. tensorData in particular is a reference to memory
owned by the Pickler object, which would otherwise already be out of scope.

Noticed this by inspection. In practice, this potential read-after-free here
is limited to non-cpu tensors, and any such use was very soon after free.
ghstack-source-id: 94756036

Test Plan: existing test suite at buck test mode/dev-nosan caffe2/test:rpc_fork

Differential Revision: D18760463

fbshipit-source-id: 9de890d66626aa48f13ca376dd9bd50b92e0cb00
2019-12-02 20:10:28 -08:00
Pritam Damania
db81e13d6b Fix TCPStoreTest and improve tcputils::connect() (#30354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30354

TCPStoreTest would timeout since the TCPStore constructor for the
server would block the main thread waiting for workers. The workers themselves
were spawned later on once the server store is created. As a result, this test
would always timeout.

To fix the test, I moved the server store to a thread so that the workers can
register with the server in parallel.

In addition to this made a few improvements to tcputils::connect. When
tcputils::connect() encountered an exception, it always looked at `errno` for
the error code. In some cases `errno` could be overwritten and the real error
code would be stored in `std::system_error`. As a result, I've modified the
code to look at the error code in `std::system_error` if we catch an exception
of that type.
ghstack-source-id: 94758939

Test Plan: waitforbuildbot

Differential Revision: D18668454

fbshipit-source-id: d5a3c57b066b094bfecda9a79d9d31bfa32e17f0
2019-12-02 19:52:34 -08:00
Supriya Rao
968c0d4a46 Add support for converting quantized AvgPool2d and Reshape operations (#30490)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30490

Add symbolic mapping to Int8AvgPool2d and Int8Reshape op in C2

Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps

Imported from OSS

Differential Revision: D18740520

fbshipit-source-id: 1606125500c4b549fbc984e7929b7fd5204396a0
2019-12-02 18:15:01 -08:00
davidriazati
9c02b88791 Add pickler support for Device (#30131)
Summary:
This PR adds (un)pickling support for `c10::Device`. It also adds `torch.device` as a type annotation for device attributes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30131

Pulled By: driazati

Differential Revision: D18664421

fbshipit-source-id: 64378fb42b2d1bbe2bd86259e5ed10f24b5d1e49
2019-12-02 17:43:08 -08:00
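A hedged sketch of what the new support allows; the attribute-annotation mechanics shown here are an assumption for illustration, the PR itself only describes (un)pickling of `torch.device` attributes:
```
import torch

class ToDevice(torch.nn.Module):
    device: torch.device  # attribute annotated as torch.device

    def __init__(self):
        super().__init__()
        self.device = torch.device("cpu")

    def forward(self, x):
        return x.to(self.device)

m = torch.jit.script(ToDevice())
torch.jit.save(m, "to_device.pt")   # the device attribute is pickled with the module
m2 = torch.jit.load("to_device.pt")
```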
Mingbo Wan
3636cb0364 windows build (#30556)
Summary:
based on https://github.com/pytorch/pytorch/pull/28677
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30556

Differential Revision: D18764040

Pulled By: mingbowan

fbshipit-source-id: 53104636800f5887b74a82c154bc5e9603de9322
2019-12-02 14:54:22 -08:00
Edward Yang
1111a6b810 Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#30274)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/29095
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30274

Differential Revision: D18762293

Pulled By: ezyang

fbshipit-source-id: d3d50c2dd12bcb678ab25fa708eb6587cc4b66f9
2019-12-02 12:19:58 -08:00
Shen Li
dd52f50fc8 Add examples to RRef doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30516

Test Plan: Imported from OSS

Differential Revision: D18728183

Pulled By: mrshenli

fbshipit-source-id: af472ebed0e6dd0a85653b080abd3ac4d482bd26
2019-11-28 15:34:26 -08:00
Shen Li
30d70d5378 Make doc source format consistent in rpc/init.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30515

Test Plan: Imported from OSS

Differential Revision: D18728184

Pulled By: mrshenli

fbshipit-source-id: 7b643c7f8225943113fbd7130ff6aadb30c1d4e9
2019-11-28 15:34:22 -08:00
Jeremy Lilley
f4e7e9039d Improve process_group_agent() serialization speed (#29785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29785

TLDR: This change improves process_group's serialization speed:
  Serialize_Tensor64:     12.38us ->   1.99us  (~-84%)
  Deserialize_Tensor64:   33.89us ->   5.62us  (~-84%)
  Serialize_Tensor1M:    525.74us -> 285.43us  (~-45%)
  Deserialize_Tensor1M:  892.61us -> 273.68us  (~-70%)

After speaking with the jit team, we had consensus that torch::save()/load()
are somewhat high-overhead for RPC serialization, mostly intended for
persistent disk data.

(Particularly, for large tensors, 35% of the time is spent in CRC checking, even
with the fb-side changes to subsitute 40x faster SSE-accelerated crc checking;
Also, for small tensors, the zip container overhead is considerable, as is the
overhead of lexing/parsing an embedded text python program for each RPC).

The jit team encouraged us to use jit::pickler, with the WriteableTensorData
way of outputting result tensors (not the default side-tensor table, or
with pickling the actual tensors). This ends up just pickling some tensor
metadata, and giving us some tensor blobs that we can mindlessly
blit over the wire (they copy to cpu memory if needed).

There is yet no standardized container format for the pickled data
(there is jit::pickle_save() checked in, but but it's experimental,
no load function is yet provided), but they encouraged us to just use
something sensible for this, and possibly revisit later. For now, I made
the directory headers slightly http-inspired.

Note that serialization is just one component of the pipeline, but that
said, we also see reasonable reductions in end-to-end echo times (noisier):
   ProcessGroupAgent_Echo(Tensor_Small)   855.25us -> 492.65us  (~-42%)
   ProcessGroupAgent_Echo(Tensor_1M)       10.82ms -> 6.94ms    (~-35%)
   ProcessGroupAgent_Echo(Small_NoTensor) 688.82us -> 301.72us  (~-56%)
   ProcessGroupAgent_Echo(1MB_NoTensor)     4.65ms -> 3.71ms    (~-20%)

I moved the "wire serialization" logic to a separate file to assist with
unittesting.
ghstack-source-id: 94694682

Test Plan:
buck test mode/dev-nosan caffe2/test/cpp/api:serialize
  buck test mode/dev-nosan caffe2/test/...

Differential Revision: D18493938

fbshipit-source-id: 07ddfe87dbe56472bc944f7d070627052c94a8f4
2019-11-28 09:57:52 -08:00
Rohan Varma
1350b99de4 Add local shutdown to process group agent (#30330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30330

This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The API is changed so that the previous `wait_all_workers` does not destroy the agent, and this is now done in a new `shutdown` method. All callsites are updated appropriately.

ghstack-source-id: 94673884
ghstack-source-id: 94673884

Test Plan: Unit tests pass.

Reviewed By: mrshenli

Differential Revision: D18661775

fbshipit-source-id: 5aaa7c14603e18253394224994f6cd43234301c2
2019-11-27 22:34:08 -08:00
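A hedged sketch of the resulting call sequence (argument details elided; the split between `wait_all_workers` and the new `shutdown` call is taken from the summary above):
```
import torch.distributed.rpc as rpc

rpc.init_rpc("worker0", rank=0, world_size=2)
# ... issue rpc.rpc_sync(...) / rpc.rpc_async(...) calls ...
rpc.wait_all_workers()  # waits for all workers, but no longer destroys the agent
rpc.shutdown()          # new call that actually tears down the RPC agent
```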
Will Feng
7ac8efa689 Skip undefined tensors when moving torch::nn module to a different device (#30523)
Summary:
This fixes high-pri issues such as https://github.com/pytorch/pytorch/issues/30508 and https://github.com/pytorch/pytorch/issues/30462.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30523

Differential Revision: D18732904

Pulled By: yf225

fbshipit-source-id: fe5a7a43838000f5803bd9c01ecfba0c3f02df5d
2019-11-27 21:21:02 -08:00
Sebastian Messmer
a2ed50c920 Revert D17908478: Switch PyTorch/Caffe2 to C++14
Test Plan: revert-hammer

Differential Revision:
D17908478

Original commit changeset: 6e340024591e

fbshipit-source-id: 775d2e29be0bc3a0db64f164c8960c44d4877d5d
2019-11-27 14:57:05 -08:00
Tao Xu
a69be8123a Use gettimeofday on iOS (#30361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30361

### Summary

By default, the compiler will choose `clock_gettime` for the iOS build. However, that API is not available until iOS 10. Since the Facebook app still supports iOS 9.0,  we have to use `gettimeofday` instead.

```shell
xplat/caffe2/torch/csrc/autograd/profiler.h:86:3: error: 'clock_gettime' is only available on iOS 10.0 or newer [-Werror,-Wunguarded-availability]

xplat/caffe2/torch/csrc/autograd/profiler.h:86:17: error: '_CLOCK_MONOTONIC' is only available on iOS 10.0 or newer [-Werror,-Wunguarded-availability]
```

P.S. the open-sourced version is iOS 12.0 and above, so we don't have this problem.

### Test Plan

- buck build works
- Don't break CIs

Test Plan: Imported from OSS

Differential Revision: D18730262

Pulled By: xta0

fbshipit-source-id: fe6d954b8d3c23cbc9d1e25a2e72e0b0c1d4eaa9
2019-11-27 11:48:41 -08:00
Sebastian Messmer
d0acc9c085 Switch PyTorch/Caffe2 to C++14 (#30406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30406

ghstack-source-id: 94642238

Test Plan: waitforsandcastle

Differential Revision: D17908478

fbshipit-source-id: 6e340024591ec2c69521668022999df4a33b4ddb
2019-11-27 10:47:31 -08:00
Richard Zou
ec5c08de74 Revert D18580867: Add logical_and and logical_or
Test Plan: revert-hammer

Differential Revision:
D18580867

Original commit changeset: 7e4d7c37da4d

fbshipit-source-id: 81fb604c7aef8d847f518f5faa016e7bd0423016
2019-11-27 09:27:00 -08:00
Bowen Bao
1e8ed021c6 Support logsoftmax with dim != -1 (#30433)
Summary:
PyTorch dim and ONNX axis have different meanings.
ONNX only supports log_softmax with dim = -1. Transpose must be added before and after log_softmax to support other cases.
This requires input rank to be known at export time.
Fixes https://github.com/pytorch/pytorch/issues/17918
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30433

Reviewed By: hl475

Differential Revision: D18723520

Pulled By: houseroad

fbshipit-source-id: d0ed3b3f051d08d46495a7abfa854edd120dca3a
2019-11-27 08:34:38 -08:00
Pieter Noordhuis
0282c5ae69 Add helper to aggregate multiple process groups (#25768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25768

The round robin process group can be constructed from multiple other
process groups. Every collective call against this new process group
is delegated to the specified process groups in a round robin fashion.

Doing so may benefit performance when calling into multiple NCCL
process groups. Instead of adding support for round-robin usage of
NCCL communicators, we achieve the same without changing the NCCL
process group and adding this wrapper class.

The API to create this round robin process group is a bit harsh. If we
find it adds significant benefit we can revisit and make this a first
class citizen in the torch.distributed module.
ghstack-source-id: 94578376

Test Plan: The newly added test passes.

Reviewed By: chenyangyu1988

Differential Revision: D17226323

fbshipit-source-id: ec9f754b66f33b983fee30bfb86a1c4c5d74767d
2019-11-27 08:34:34 -08:00
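A purely conceptual sketch of the round-robin delegation described above (the class and method names are made up for illustration and are not the c10d API):
```
class RoundRobinGroup:
    """Delegates every collective call to the next underlying process group."""

    def __init__(self, groups):
        self.groups = groups
        self.next = 0

    def _pick(self):
        group = self.groups[self.next]
        self.next = (self.next + 1) % len(self.groups)
        return group

    def allreduce(self, tensors):
        return self._pick().allreduce(tensors)

    def broadcast(self, tensors):
        return self._pick().broadcast(tensors)
```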
Pieter Noordhuis
1d3f3a1a0c Add pybind11 trampoline class for c10d.Store (#30415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30415

This enables subclassing of c10d.Store and implementing its interface in Python.
ghstack-source-id: 94586627

Test Plan: New tests passes.

Reviewed By: vladbelous

Differential Revision: D18693018

fbshipit-source-id: fa1eba4bd11cc09a3d6bf3f35369c885033c63c0
2019-11-27 08:34:29 -08:00
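A hedged sketch of the kind of Python subclass this makes possible; the method names are assumed from the c10d.Store Python API, and a real subclass would need to implement the full interface:
```
import torch.distributed as dist

class DictStore(dist.Store):
    """Toy in-memory store implemented in Python."""

    def __init__(self):
        super().__init__()
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]

    def add(self, key, amount):
        self._data[key] = int(self._data.get(key, 0)) + amount
        return self._data[key]
```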
neginraoof
512c2a2df5 Enable constant folding (#29834)
Summary:
Set default do_constant_folding = True
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29834

Reviewed By: hl475

Differential Revision: D18588037

Pulled By: houseroad

fbshipit-source-id: b35c06161321629c886e177ea666eff31cebf06a
2019-11-27 08:34:20 -08:00
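A small sketch of the flag in question; `do_constant_folding` is the existing `torch.onnx.export` argument whose default this PR flips:
```
import torch

model = torch.nn.Linear(4, 2)
dummy = torch.randn(1, 4)

# Constant folding is now on by default; it can still be disabled explicitly.
torch.onnx.export(model, dummy, "linear.onnx", do_constant_folding=False)
```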
Junjie Bai
c1c8105de0 Make the warning of using SparseTensor in JIT less noisy
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30499

Test Plan: waitforsandcastle

Reviewed By: wanchaol

Differential Revision: D18705553

fbshipit-source-id: d6e16e3285a74a1c031a5312f7a690f1baf392f8
2019-11-27 08:34:16 -08:00
Daya Khudia
2d6b2f39e9 Fix docs so that the example works (#30120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30120

The example given for functional conv2d didn't work. This diff fixes the example in docs so that it works.

Fixes https://github.com/pytorch/pytorch/issues/29649
ghstack-source-id: 94601559

Test Plan: Tried the example locally

Differential Revision: D18604606

fbshipit-source-id: ff1a4f903e2843efe30d962d4ff00e5065cd1d7e
2019-11-26 17:38:40 -08:00
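A sketch along the lines of the corrected example: the filter tensor is (out_channels, in_channels/groups, kH, kW) and the input is (minibatch, in_channels, iH, iW):
```
import torch
import torch.nn.functional as F

filters = torch.randn(8, 4, 3, 3)
inputs = torch.randn(1, 4, 5, 5)
out = F.conv2d(inputs, filters, padding=1)
print(out.shape)  # torch.Size([1, 8, 5, 5])
```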
Pavel Belevich
6bd8937aee FunctionParameter::set_default_str replace || with &&
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30471

Test Plan: Imported from OSS

Differential Revision: D18710958

Pulled By: pbelevich

fbshipit-source-id: 7e5339175c7e16cd975a90bf6b123df728045e4d
2019-11-26 17:38:31 -08:00
Hong Xu
8bbafa0b32 Add logical_and and logical_or (#28162)
Summary:
Superseding https://github.com/pytorch/pytorch/issues/24379 as type promotion has been implemented.

Close https://github.com/pytorch/pytorch/issues/24379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28162

Differential Revision: D18580867

Pulled By: ailzhang

fbshipit-source-id: 7e4d7c37da4dc8df87314bd4f1f6a7539e46586a
2019-11-26 17:38:22 -08:00
James Reed
05a1644ce3 Fix BC for quantized linear
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30481

Test Plan: Imported from OSS

Differential Revision: D18714602

Pulled By: jamesr66a

fbshipit-source-id: d51206c22cf2446e98053446789c6324c0481321
2019-11-26 17:38:09 -08:00
Elias Ellison
634f370c63 Add comment to ops bound at python layer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30419

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D18714000

Pulled By: eellison

fbshipit-source-id: 22ccb941b2db24031921f378c600e68fe70e1346
2019-11-26 17:37:59 -08:00
albanD
b0871f211b Make all optimizers consistent so that they don't change gradients inplace
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30257

Test Plan: Imported from OSS

Differential Revision: D18665461

Pulled By: albanD

fbshipit-source-id: cfdafef919468a41007881b82fd288b7128baf95
2019-11-26 12:16:25 -08:00
vishwakftw
dcd9f49809 Specify ordering on singular values and eigenvalues output from torch… (#30389)
Summary:
….svd/symeig respectively

Changelog:
- Adds a note to docstrings of the both functions specifying the ordering

Fixes https://github.com/pytorch/pytorch/issues/30301
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30389

Differential Revision: D18707608

Pulled By: zou3519

fbshipit-source-id: b0f73631578f39a24fae9af4997c6491de8be9a8
2019-11-26 10:23:47 -08:00
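A quick check of the ordering the note documents: torch.svd returns singular values in descending order, while torch.symeig returns eigenvalues in ascending order:
```
import torch

a = torch.randn(5, 5)
_, s, _ = torch.svd(a)                             # singular values: descending
e, _ = torch.symeig(a @ a.t(), eigenvectors=True)  # eigenvalues: ascending
assert torch.all(s[:-1] >= s[1:])
assert torch.all(e[:-1] <= e[1:])
```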
BowenBao
0febff36ac Export dynamic unbind/split and __getitem__ (#29136)
Summary:
In ONNX opset 11, a series of sequence ops were added. Operators that are related to Tensor[] in PyTorch can be exported using these sequence ops.
In this PR, unbind/split that produces Tensor[], and __getitem__ that takes Tensor[] as input, are exported correctly to ONNX opset 11.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29136

Reviewed By: hl475

Differential Revision: D18309222

Pulled By: houseroad

fbshipit-source-id: be12c96bf8d0a56900683ef579f1c808c0a1af21
2019-11-26 06:54:06 -08:00
Supriya Rao
2599b9b551 Add output_size argument to caffe2 Int8ResizeNearest (#30202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30202

Pytorch Upsample operator has output_size as an argument.
For quantized tensor inputs we cannot get the input_size to calculate the width and height scale factor.
Instead we pass the output_size directly to caffe2 to calculate the scale factors.

Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_upsample

Imported from OSS

Differential Revision: D18631478

fbshipit-source-id: 38a39129bc863f4ecf2293acc068e40ab7edc825
2019-11-26 06:54:02 -08:00
Shen Li
efe1859ad9 By default ignore RRef leaks during shutdown (#30217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30217

Before this commit, RRefContext throws an error if it detects any
RRef leak during shutdown. However, this requires applications to
make sure that they have freed all references to RRefs in application
code, which can be a bad debugging experience for large
applications. Besides, this also relies on Python GC to free things
up in time, which might not always be true. After this commit,
RRefContext ignores leaking RRefs during shutdown, as shutdown
is called when the application has finished training and no longer
cares about local state. Hence, it should be OK to just ignore
those leaks and destroy OwnerRRefs. If an application would like to
enforce no leaks, just set torch.distributed.rpc.api._ignore_rref_leak
to False.

Test Plan: Imported from OSS

Differential Revision: D18632546

Pulled By: mrshenli

fbshipit-source-id: 2744b2401dafdd16de0e0a76cf8e07777bed0f38
2019-11-26 06:53:58 -08:00
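For reference, opting back into strict leak checking looks like this (the flag name is the one given in the summary above):
```
import torch.distributed.rpc.api as rpc_api

# Fail shutdown if any RRefs are still alive, instead of silently ignoring them.
rpc_api._ignore_rref_leak = False
```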
Spandan Tiwari
06db5ad707 Provide names for operator nodes in ONNX exported graph. (#27342)
Summary:
The PyTorch exporter does not add any names to the ONNX operators in the exported graph. A common request is to add names to op nodes by default. This helps the readability of the graph in visualization tools such as Netron, or when the ONNX graph is printed as a string. It also helps with the debuggability of the ONNX graph.

Therefore this PR adds names to operators in the exporter. The names follow a simple format, <op_type>_<index>. Expect files for tests in `test/onnx/test_operators.py` have been updated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27342

Reviewed By: hl475

Differential Revision: D17790979

Pulled By: houseroad

fbshipit-source-id: 1eaae88b5f51f152735a2ff96e22827837e34d9d
2019-11-26 06:53:53 -08:00
BowenBao
584be86c3f Try exporting ONNX with force_outplace=False (#29466)
Summary:
This should resolve https://github.com/pytorch/pytorch/issues/29008. This flag has two effects on the tracer.
- Remove the trailing underscore for in-place operators. E.g.: index_put_ ==> index_put. This is handled in utils.py separately as well.
- Add out as input for backward computation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29466

Reviewed By: hl475

Differential Revision: D18422815

Pulled By: houseroad

fbshipit-source-id: 317b6a3c8a5751fe6fe49d7543e429d281ed0d6d
2019-11-26 06:53:49 -08:00
Raghuraman Krishnamoorthi
eccf42fd15 Bug fix: Handle missing keys in observer state dict during load (#30357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30357

Fix issue https://github.com/pytorch/pytorch/issues/29032 in loading from state dict for observers and fake quant.
ghstack-source-id: 94468814

Test Plan: Ensures that load/save of fake quant and observers with missing keys works correctly.

Differential Revision: D18668517

fbshipit-source-id: 0eda6f47c39102e55977fc548b9a03664f123ad7
2019-11-26 06:53:45 -08:00
Jonathan Reynolds
085dde5965 Fix for when PyTorch model trace has RecursiveScriptModules (#30430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30430

When a module isn't a TracedModule, attempt to get name information with `original_name` property on module and default to 'Module' when no such property exists.

Test Plan:
### Change child module to scripted module:
```
model = torchvision.models.alexnet()
model.classifier = torch.jit.script(model.classifier)
```
### Add graph
```
w = SummaryWriter()
w.add_graph(model, torch.rand((2, 3, 224, 224)))
w.close()
```
### No errors
However, graph is disconnected at parts and hard to understand.
{F223327878}

Reviewed By: sanekmelnikov

Differential Revision: D18690836

fbshipit-source-id: 42295d06b7c1d48d5401776dca1e0d12cd64b49d
2019-11-26 06:53:35 -08:00
Jerry Zhang
661a6c8ef2 Add get_qparams and revert the changes to calculate_qparams (#30262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30262

`get_qparams` returns all parameters that are needed to call the quantize function

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D18645047

fbshipit-source-id: e57c11a66dac2d589778d412a996796ad5b6f86a
2019-11-26 06:53:26 -08:00
Zhang Zhi
ab2ec4d835 Fix inexistent parameter in document (#24335)
Summary:
There is no `out` argument to `argsort` according to the source code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24335

Differential Revision: D16829134

Pulled By: vincentqb

fbshipit-source-id: 8f91154984cd4a753ba1d6105fb8a9bfa0da22b3
2019-11-26 06:53:17 -08:00
Jerry Zhang
0b71e7e1fd Refactor QAT Conv module for better extensibility (#30362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30362

Right now the QAT modules (qat.ConvBn2d, qat.ConvBnReLU2d, qat.Conv2d)
are not convenient to extend to other Conv dimensions. This PR refactors
these modules so that we can support Conv1d/Conv3d better.

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D18691152

fbshipit-source-id: 5b561e6b054eadd31b98cabdf1ac67a61ee9b805
2019-11-26 06:53:12 -08:00
Lingyi Liu
b8f50d9cc8 Support to add dequant for each use of Value (#30145)
Summary:
In this PR, we mainly handle the case where there are multiple uses of a Value when inserting the quant-dequant pair. This change will add one dequant for each use of the Value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30145

Differential Revision: D18671600

Pulled By: lly-zero-one

fbshipit-source-id: 61324a98861da85b80dcf7e930381311118ae53b
2019-11-25 14:52:58 -08:00
Rohan Varma
5c6705e62c add default arg for init_method (#30208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30208

Adds default arg for init_method so users don't have to pass this in,
and moves it to `RpcBackendOptions` struct. Removes `init_method` arg from rpc.init_rpc. Also fixes some docs.
ghstack-source-id: 94500475

Test Plan: Unit tests pass.

Reviewed By: mrshenli

Differential Revision: D18630074

fbshipit-source-id: 04b7dd7ec96f4c4da311b71d250233f1f262135a
2019-11-25 14:52:48 -08:00
Xiaomeng Yang
c12f9a12a8 Fix quantized ConvReLU3d test (#30266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30266

Fix quantized ConvReLU3d test

Test Plan: buck test mode/dev-nosan //caffe2/test:quantized -- "conv"

Reviewed By: hl475

Differential Revision: D18645717

fbshipit-source-id: bbe93f9daf5046f2aa05363efc7d0e59eaff37bf
2019-11-25 14:52:32 -08:00
Sebastian Messmer
aa2862b843 Hide the OperatorKernel* argument from the stack based kernel API (#29337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29337

This argument is needed by boxing wrappers so they're able to get a pointer to the corresponding unboxed kernel and call into it.
But if a kernel is registered in a boxed way, we don't need it and should hide this from the API.
This is especially needed for the backend fallback API where users would only be left wondering why this argument is there and what it does.
Also, hiding it allows us to potentially totally remove it in a future refactoring if we find some way to do so.
ghstack-source-id: 94481316

Test Plan: unit tests

Differential Revision: D18361991

fbshipit-source-id: 5cef26c896fe3f2a5db730d3bc79dcd62e7ef492
2019-11-23 15:25:01 -08:00
Sebastian Messmer
583c288232 Add a OperatorHandle argument to boxed kernels (#29201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29201

This is required for boxed backend fallback kernels (e.g. lazy, AMP) because they need to know which op was actually called.
ghstack-source-id: 94481313

Test Plan: I will add unit tests in a diff stacked on top

Differential Revision: D18282746

fbshipit-source-id: 339a1bbabd6aff31a587b98f095c75104dfc6f99
2019-11-23 15:24:49 -08:00
Chris Gottbrath
7c4b9042ab Updates to quantization documentation (#30288)
Summary:
This pull request includes fixes for six quantization doc bugs.

https://github.com/pytorch/pytorch/issues/30283 - Rendering issue on QConfig
https://github.com/pytorch/pytorch/issues/26305 - Minor doc issue on fuse_modules()
https://github.com/pytorch/pytorch/issues/27451 - Issues with ConvReLU2d, ConvReLU3d, and LinearReLU doc issues
https://github.com/pytorch/pytorch/issues/26899 - Missing docstrings in torch.nn.intrinsic fused functions
https://github.com/pytorch/pytorch/issues/29735 - add discussion of QNNPack to quantization doc page
https://github.com/pytorch/pytorch/issues/27938 - some of the quantized functions lack documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30288

Differential Revision: D18653368

Pulled By: gottbrath

fbshipit-source-id: 410b3dd81ff10909a7f1a7736ca42d7cabf0beb1
2019-11-23 09:29:30 -08:00
Lingyi Liu
59ca9b7430 Graph-mode quantization for convolution from traced model (#30245)
Summary:
In the PR, we enhance the graph-mode quantization for aten::_convolution, which could be generated from tracing path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30245

Differential Revision: D18671597

Pulled By: lly-zero-one

fbshipit-source-id: 78a2470fbb0fe0def55d63c6bda7cbb5c89f7848
2019-11-23 01:24:50 -08:00
davidriazati
2a7a39c1af (de)serialization of values between C++ and Python (#30108)
Summary:
This PR updates `torch::pickle_save` to use the new zipfile format introduced in #29232 and adds `torch::pickle_load` which can decode the zipfile format. Now that `torch.save/load` use this format as well (if the `_use_new_zipfile_serialization` flag is `True`), raw values saved in Python can be loaded in C++ and vice versa.

Fixes #20356
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30108

Pulled By: driazati

Differential Revision: D18607087

fbshipit-source-id: 067cdd5b1cf9c30ddc7e2e5021a8cceee62d8a14
2019-11-23 00:06:07 -08:00
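On the Python side, the interoperable format is the new zipfile serialization mentioned above; a hedged sketch (the `_use_new_zipfile_serialization` flag is the one named in the summary):
```
import torch

value = {"weights": torch.randn(3), "step": 7}
torch.save(value, "value.pt", _use_new_zipfile_serialization=True)
# The same archive can be read back with torch.load in Python,
# or decoded with torch::pickle_load on the C++ side.
roundtripped = torch.load("value.pt")
```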
Lingyi Liu
328ec5460f refactor the observer removal and quantize tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30360

Differential Revision: D18670373

Pulled By: lly-zero-one

fbshipit-source-id: 1481d6e4d5ce40376577b8deb0a0f74d5559076e
2019-11-22 21:25:23 -08:00
Shihao Xu
6a00191fc2 Add RpcAgent::getWorkerInfos() (#30241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30241

We need an API to get all worker infos. This will be used by backend-agnostic `rpc.wait_all_workers()` API.
ghstack-source-id: 94454935

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test:rpc_fork -- test_get_worker_infos

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_get_worker_infos
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_get_worker_infos

buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_get_worker_infos
```

Differential Revision: D5693412

fbshipit-source-id: 5123c8248b6d44fd36b8a5f381dbabb2660e6f0f
2019-11-22 18:26:30 -08:00
Hongyi Jia
c7f988b8c6 transport open registration (#30167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30167

Pull Request resolved: https://github.com/pytorch/pytorch/pull/29164

- Created GlooDeviceFactory to hide device creation details
- Added a transport option on the Python interface

The reason for making the factory class is to make it easier to extend the gloo transport in the future.

Test Plan: Imported from OSS

Reviewed By: satgera, d4l3k

Differential Revision: D18596527

fbshipit-source-id: e8114162ee8d841c0e0769315b48356b37d6ca0a
2019-11-22 17:41:52 -08:00
Sebastian Messmer
ac103a5d78 Remove variable wrapping from register_c10_ops (#29207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29207

The logic calling c10 ops from JIT did some variable wrapping to make sure all results are always variables.
Thanks to ezyang, this is not needed anymore because everything is a variable now.
ghstack-source-id: 93345590

Test Plan: waitforsandcastle

Differential Revision: D18327507

fbshipit-source-id: 86512c5e19d6972d70f125feae172461c25e3cb6
2019-11-22 15:32:55 -08:00
David Riazati
8c6f0c0587 Detect TorchScript archives in torch.load (#29339)
Summary:
This PR looks for a `constants.pkl` file at the top level in a zip file
in `torch.load`. If found, it calls `torch.jit.load` instead and issues
a warning to call `torch.jit.load` directly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29339

Differential Revision: D18611095

Pulled By: driazati

fbshipit-source-id: f070a02f6b5509054fc3876b3e8356bbbcc183e1
2019-11-22 12:30:30 -08:00
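A hedged sketch of the behavior described above:
```
import torch

scripted = torch.jit.script(torch.nn.Linear(2, 2))
torch.jit.save(scripted, "scripted.pt")

# torch.load now spots the TorchScript archive (via its top-level constants.pkl),
# warns, and loads it through torch.jit.load instead of failing.
m = torch.load("scripted.pt")
```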
James Reed
97fae401f0 Use LinearPackedParams everywhere
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30198

Test Plan: Imported from OSS

Differential Revision: D18628003

Pulled By: jamesr66a

fbshipit-source-id: 76ff0248fd859e805a15cde555d26dd2138636fa
2019-11-22 11:31:17 -08:00
James Reed
1cc321deed Memoize parseIR calls in graph mode quantization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30188

Test Plan: Imported from OSS

Differential Revision: D18625743

Pulled By: jamesr66a

fbshipit-source-id: 88f9da8e79324ba91e3550a8fc1a05e85bb83a86
2019-11-22 11:31:13 -08:00
James Reed
65f465050b Dont use SubgraphRewriter in FoldQuantizeCallIntoBuffer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30264

Test Plan: Imported from OSS

Differential Revision: D18645531

Pulled By: jamesr66a

fbshipit-source-id: 44fc0f0a3c8cabe62924baae0d556e43bbf637ec
2019-11-22 11:31:08 -08:00
Shen Li
a9f3f48f88 Revert D5578006: Add local shutdown to process group agent
Test Plan: revert-hammer

Differential Revision:
D5578006

Original commit changeset: 6258879fb44c

fbshipit-source-id: 11b893b3a280a8383eeb20a0548626811616dca1
2019-11-22 11:31:04 -08:00
Christian Puhrsch
7903fb118f Move qkv_same, kv_same into branch (#30142)
Summary:
Perf improvements to multi_head_attention_forward

- qkv_same and kv_same were not used outside of that branch. Further, kv_same was calculated even though it is not used when qkv_same is true.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30142

Differential Revision: D18610938

Pulled By: cpuhrsch

fbshipit-source-id: 19b7456f20aef90032b0f42d7da8c8a2d5563ee3
2019-11-22 10:40:02 -08:00
Rohan Varma
c478a92b93 Add local shutdown to process group agent (#30020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30020
This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The destructor calls this same `localShutdown` method, but we ensure this is not called multiple times.

ghstack-source-id: 94415336

Test Plan: Unit tests pass.

Differential Revision: D5578006

fbshipit-source-id: 6258879fb44c9fca97fdfad64468c1488c16ac02
2019-11-22 10:03:00 -08:00
Martin Yuan
559b3b5a7a Use unboxed registration for most of operators used in lite interpreter. (#30239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30239

Use unboxed registration per smessmer's request. For some ops with an optional arg or a tensor list, for which unboxed registration is not supported, we still use boxed registration.

Test Plan: Imported from OSS

Differential Revision: D18653846

Pulled By: iseeyuan

fbshipit-source-id: c22ce8111dfff0ba63316a9bcfe2b712b2d31fc1
2019-11-22 10:00:30 -08:00
Rohan Varma
f41422121e default construct rpc agent options based on the backend type (#30201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30201

Provide a default constructor so that users don't have to construct
RPC agent options. Also rename this to RpcBackendOptions as suggested.
ghstack-source-id: 94411768

Test Plan: Unit tests pass.

Differential Revision: D18628698

fbshipit-source-id: 81fb45f124ad1006e628f6045162308093c9d446
2019-11-22 08:18:06 -08:00
Luke Yeager
183aa1534f Add --no_python flag (#29144)
Summary:
Allows you to use a bash script wrapper in-between launch and your
training script. e.g.
```
python -m torch.distributed.launch --nproc_per_node=8 --no_python --use_env \
    bash -c 'exec numactl --cpunodebind=$(( LOCAL_RANK / 4 )) "$@"' -- \
    python train.py ...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29144

Differential Revision: D18345647

Pulled By: pietern

fbshipit-source-id: f05849c38c82de782988d07d300e00cf9f37253a
2019-11-22 06:05:41 -08:00
Pieter Noordhuis
a074080d57 Mark c10d::~NCCLUtils as noexcept (#29118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29118

It's never a good idea to throw from a destructor and per #28288 we
can't use `std::make_shared` on a class with a `noexcept(false)`
destructor.

To fix this, we `abort` instead of throw from the `NCCLComm` destructor.

Closes #28288.
ghstack-source-id: 93182910

Test Plan: ProcessGroupNCCLErrorsTest runs successfully.

Reviewed By: pritamdamania87

Differential Revision: D18298271

fbshipit-source-id: ccac37753fef64fb63cb304433f4f97dc5621379
2019-11-22 04:06:12 -08:00
Natalia Lunova
23650671a8 add_hparams() NoneType error (#30286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30286

add_hparams() in torch.utils.tensorboard.writer produced the following error
python3.7/site-packages/torch/utils/tensorboard/writer.py", line 294, in add_hparams
    with SummaryWriter(log_dir=os.path.join(self.file_writer.get_logdir(), str(time.time()))) as w_hp:
AttributeError: 'NoneType' object has no attribute 'get_logdir'
Other methods such as add_scalar() and add_histogram() use self._get_file_writer() instead of self.file_writer directly.

Test Plan:
```
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
writer.add_hparams({"a": 0, "b": 0}, {"hparam/test_accuracy": 0.5})
writer.flush()
writer.close()
```

Reviewed By: J0Nreynolds, sanekmelnikov

Differential Revision: D18650610

fbshipit-source-id: 1039dd2067d37913a8a131c8b372491a63154899
2019-11-21 23:25:26 -08:00
neginraoof
a822a1d2a8 Avoid overwriting output type in onnx graph (#25906)
Summary:
When creating the onnx graph, we overwrite the output type with the output type of the PT graph.
In some special cases, when using scripting, the PT graph does not have type information. We want to avoid overwriting the output type in these cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25906

Reviewed By: hl475

Differential Revision: D18645903

Pulled By: houseroad

fbshipit-source-id: 56acc43e0c15c74ac8ebd689e04f7371054e362e
2019-11-21 21:30:12 -08:00
Jonathan Reynolds
0c04763d59 Changes to get inlined graph and proper names after JIT updates (#30244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30244

This makes several small changes to the tensorboard graph parsing methods to address the recent changes to the PyTorch JIT trace/graph.
- Inline graph to get information for all nodes
- Assign and propagate scope names to GetAttr nodes
- Prune all useless GetAttr nodes (any with a ClassType output type - tensors and primitives are kept)
- Create output nodes so output tensor shape can be examined

Reviewed By: sanekmelnikov

Differential Revision: D18556323

fbshipit-source-id: b73a809bacfa554c3fe9c4ae3563525f57539874
2019-11-21 16:59:28 -08:00
Shen Li
fea963d3ae Fix BackendType repr in doc (#30243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30243

Before this commit, rpc docs shows init_rpc as the following:

```
torch.distributed.rpc.init_rpc(
   name,
   backend=<BackendType.PROCESS_GROUP: BackendValue(
     construct_rpc_agent_options_handler=<function _process_group_construct_rpc_agent_options_handler>,
     init_backend_handler=<function _process_group_init_backend_handler>)>,
   init_method=None,
   rank=-1,
   world_size=None,
   rpc_agent_options=None
)
```

It unnecessarily leaks implementation details. This commit adds a
__repr__ function to BackendType Enum class to address this problem.

closes #29905

Test Plan: Imported from OSS

Differential Revision: D18641559

Pulled By: mrshenli

fbshipit-source-id: 19bf8a2d21c8207f026d097d8e3f077578d53106
2019-11-21 16:22:43 -08:00
Junjie Bai
352731bd6e Revert D18632773: Split libtorch.so back into libtorch_{cpu,cuda,hip}
Test Plan: revert-hammer

Differential Revision:
D18632773

Original commit changeset: ea717c81e0d7

fbshipit-source-id: 18601439f9f81c9f389020e5a0e4e04adb21772d
2019-11-21 15:01:09 -08:00
Mike Ruberry
eff4c4d7c1 Revert D18301806: Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL
Test Plan: revert-hammer

Differential Revision:
D18301806

Original commit changeset: 03da6a26c41e

fbshipit-source-id: c1324ee8d154e7e16f5dd4f1cf3625aaa566cd39
2019-11-21 14:50:07 -08:00
Alan Du
f4b9690f2d Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#29095)
Summary:
Given that pybind11 implements these gil functions, I don't think it makes sense for Pytorch to have its own bespoke versions.

Fixes https://github.com/pytorch/pytorch/issues/29065
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29095

Differential Revision: D18301806

Pulled By: ezyang

fbshipit-source-id: 03da6a26c41ee65aaadf7b67b9f0b14d2def2a5a
2019-11-21 13:44:40 -08:00
Jerry Zhang
1bba0eb35b Add clone_instance for Module (#30168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30168

The previous implementation of `clone` in `script::Module` copies both the module instance and the
class type. After we enabled type sharing in https://github.com/pytorch/pytorch/pull/26666, we also
need a function that clones the instance only and shares the underlying class type.

Test Plan:
tbd

Imported from OSS

Differential Revision: D18631324

fbshipit-source-id: dbadcf19695faee0f755f45093b24618c047b9d1
2019-11-21 13:00:34 -08:00
Mikhail Zolotukhin
2c1c6de122 Represent the original python name the same way in traced and scripted modules.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29912

Test Plan: Imported from OSS

Differential Revision: D18533135

Pulled By: ZolotukhinM

fbshipit-source-id: 080dbafa5dcd8c1fb12fec0c956e52fceec430e7
2019-11-21 11:55:40 -08:00
Edward Yang
ec30d9028a Split libtorch.so back into libtorch_{cpu,cuda,hip} (#29731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29731

The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.

Some subtleties about the patch:
- There were a few functions that crossed CPU-CUDA boundary without API macros. I just added them, easy enough. An inverse situation was aten/src/THC/THCTensorRandom.cu where we weren't supposed to put API macros directly in a cpp file.
- DispatchStub wasn't getting all of its symbols related to static members on DispatchStub exported properly. I tried a few fixes but in the end I just moved everyone off using DispatchStub to dispatch CUDA/HIP (so they just use normal dispatch for those cases.) Additionally, there were some mistakes where people incorrectly were failing to actually import the declaration of the dispatch stub, so added includes for those cases.
- torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
- The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
- In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
- A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly. This situation also happens with custom C++ extensions.
- There's a ROCm compiler bug where extern "C" on functions is not respected. There's a little workaround to handle this.
- Because I was too lazy to check if HIPify was converting TORCH_CUDA_API into TORCH_HIP_API, I just made it so HIP build also triggers the TORCH_CUDA_API macro. Eventually, we should translate and keep the nature of TORCH_CUDA_API constant in all cases.

Fixes #27215 (as our libraries are smaller), and executes on
part of the plan in #29235.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18632773

Pulled By: ezyang

fbshipit-source-id: ea717c81e0d7554ede1dc404108603455a81da82
2019-11-21 11:27:33 -08:00
Lingyi Liu
7d3afc4186 enable the per channel dynamic quantization (#30122)
Summary:
This PR enables per-channel (row-wise) dynamic quantization for the linear operator. Given that we have seen some accuracy drop due to per-tensor quantization, we expect that per-channel quantization could help improve the accuracy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30122

Differential Revision: D18630541

Pulled By: lly-zero-one

fbshipit-source-id: d52685deec5e7de46cd686ae649a8c8765b9cacf
2019-11-21 10:12:05 -08:00
Will Feng
3ba1456aee Fix clip_grad_norm_ / clip_grad_value_ to take input by value instead of by non-const ref (#30216)
Summary:
The original design of `torch::nn::utils::clip_grad_norm_` / `clip_grad_value_` takes input by non-const reference, which prevents users from passing rvalue reference as input into the functions. This PR changes the functions to take input by value, which matches the Python version's semantics, and also adheres to the C++ API convention that if a function modifies its input in-place, it should take that input by value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30216

Differential Revision: D18632543

Pulled By: yf225

fbshipit-source-id: 97a09d6467f982fe9c8120f483a9c07fcf13699e
2019-11-21 10:07:00 -08:00
Wen Zhang
6e4c23b02f Add RPC internal helper that overrides the default pickler. (#30185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30185

To enable share_memory over RPC, add an internal helper that overrides the default RPC pickler.
Replace D18598974
ghstack-source-id: 94299660

Test Plan:
`python test/test_rpc_spawn RpcTestWithSpawn.test_use_rpc_pickler`

`buck test mode/dev-nosan //caffe2/test:rpc_spawn -- test_use_rpc_pickler`

Reviewed By: mrshenli

Differential Revision: D18621372

fbshipit-source-id: c680ef711b2c42524c47a5266e911fa8e0cd45ae
2019-11-21 10:01:02 -08:00
Nikolay Korovaiko
e3334723b2 fix a crash due in nested bailouts (#30097)
Summary:
A prim::BailOut also needs to capture max trip counts as for some graphs they aren't constants and they are used in continuation graphs to figure out the remaining number of iterations to run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30097

Differential Revision: D18624446

Pulled By: Krovatkin

fbshipit-source-id: 085d25981c6669f65848996cd2d50066cc252048
2019-11-21 09:53:12 -08:00
Edward Yang
9e81616343 Merge Tensor and Variable types. (#28287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28287

This PR eliminates the static distinction between
Tensor and Variable.  Every Variable is a Tensor, no need to static_cast
or call the Variable constructor.

To do this, I need Tensor to have API parity with Variable. I have already
moved most of the methods I don't want in Tensor off Variable.
These implementations are all placed in Tensor.cpp.

One API difference is that all Variable methods now have const, so we no longer
have faux const-correctness (see https://github.com/zdevito/ATen/issues/27 for
back story)

This diff is BC breaking in a few ways:
- Because torch::autograd::Variable is now just an alias of at::Tensor, ADL for
  `torch::autograd` functions no longer works, you have to explicitly qualify
  them with `torch::autograd` (examples: `torch/nn/parallel/data_parallel.h`)
- Because Variable and Tensor are now the same type, code which assumes that
  they are different types (e.g., for the purposes of templating, or enable_if checks)
  will not work until you delete the (now) redundant overload/specialization.
  (examples: `torch/nn/modules/container/any.h`, `torch/csrc/utils/pybind.h`)

Some other notes:
- I'm not sure what was going with the old template implementation of `extract_vars`,
  but I couldn't get the sfinae version to work. Replacing it with an overloading based version
  made it work.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18571426

Pulled By: ezyang

fbshipit-source-id: 2ea8151e5f1d8512cdebf1345399642e68b707b8
2019-11-21 09:26:39 -08:00
Wanchao Liang
f7b12a9858 fix aten::grad to return optional list (#29577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29577

`torch.autograd.grad` can return None if one of the inputs is not in the
autograd graph or does not require grad. This fixes it so that it returns a
list of optional tensors instead of a list of tensors.

This might have a BC issue, unfortunately, but I think it's rare both
internally and externally (only training uses it, and most training
uses backward instead of autograd.grad), so whitelist it.

Test Plan: Imported from OSS

Differential Revision: D18491642

fbshipit-source-id: d32b2b3446cf9e8b9a98f6d203a21a75643d8991
2019-11-20 22:19:10 -08:00
Rohan Varma
cc16819028 Add abort API in gloo ProcessGroup Send/Recv Work (#29928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29928

Original author: Shihao Xu
- Add abort to `c10d::ProcessGroup::Work`.
- Change the return type of `c10d::ProcessGroup::Work::wait()` to boolean to indicate if the work is aborted after waiting.
- Add unit test for the correctness of abort.
ghstack-source-id: 94305515
ghstack-source-id: 94305515

Differential Revision: D5685727

fbshipit-source-id: 6e682bb563c2393a5c303c877331140417d3f607
2019-11-20 20:18:54 -08:00
lsrock1
0a77c090d5 C++ parity, convert_parameters (#29267)
Summary:
yf225 https://github.com/pytorch/pytorch/issues/25883
update parameters_to_vector and vector_to_parameters
check please!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29267

Differential Revision: D18628571

Pulled By: yf225

fbshipit-source-id: 03783e6b0f8183dd97ae48f3da4acb1d07083555
2019-11-20 19:59:11 -08:00
Lara
bbb3c415c9 ONNX Hardtanh Opset 11 Support (#30169)
Summary:
Add support for hardtanh that was blacklisted in opset 11.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30169

Reviewed By: hl475

Differential Revision: D18619552

Pulled By: houseroad

fbshipit-source-id: 0c1bfb0a53d1dd2327c5db7afd03a90482abb9fe
2019-11-20 18:59:00 -08:00
James Reed
449828378d Serialize ClassType as its qualname
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30058

Test Plan: Imported from OSS

Differential Revision: D18584269

Pulled By: jamesr66a

fbshipit-source-id: 5f1d0142bd7cd94eecbd2ed9250a0de47639040b
2019-11-20 16:17:26 -08:00
Rohan Varma
de05114618 polish examples in docstrings and update docs to reflect correct use of (#30052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30052

Some of the examples provided in `rpc/api.py` were not updated along
with the code changes, this PR updates them. Also removes the
`dist.ProcessGroup` information since `init_rpc` now initializes a default
process group.
ghstack-source-id: 94273004

Test Plan: Unit tests pass

Differential Revision: D18582596

fbshipit-source-id: a637683f0221f9600f7e50b74e9f7e5a1d331d8f
2019-11-20 15:30:38 -08:00
Jeremy Lilley
bebed492cf Make RRefContext singleton leaky, deal with module destruct order race. (#30172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30172

RRefContext is a conventional singleton, used by rref.cpp. At module teardown
time, it's not defined whether rref_context.cpp or rref.cpp will be destroyed first.

We were observing a SIGSEGV because RRefContext was destroyed before a dangling
~UserRRef() call was able to execute. In particular, the underlying
ctx.agent()->getWorkerInfo(ownerId_) call failed.

This change just avoids the SIGSEGV by forcing an intentional leak, though we still
need to figure out why there is a dangling UserRRef at module destruction time.
ghstack-source-id: 94287441

Test Plan:
Existing test suite.
test_elastic_averaging in the context of D18511430, where the segfault reproed reliably.

Differential Revision: D18620786

fbshipit-source-id: 17b6ccc0eb1724b579a68615e4afb8e9672b0662
2019-11-20 15:12:51 -08:00
Wanchao Liang
36aaa299f8 shut up clang-tidy on ir.h/cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30118

Test Plan: Imported from OSS

Differential Revision: D18620239

fbshipit-source-id: 5734d9d1f38a9b38ac4a1fc121fb246b783fa262
2019-11-20 13:19:25 -08:00
James Reed
c2b7b2cbf8 Make observed values actually flow through observers (#30140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30140

This seems more semantically correct to me, and it means we no longer have to iterate over the Uses of observed values.

Test Plan: Imported from OSS

Differential Revision: D18610676

Pulled By: jamesr66a

fbshipit-source-id: f835266f148bd8198b05cd9df95276e1112dd250
2019-11-20 12:48:16 -08:00
James Reed
2d534abb39 Modernize graph mode IR API calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30130

Test Plan: Imported from OSS

Differential Revision: D18608004

Pulled By: jamesr66a

fbshipit-source-id: 42e946ec96b1d26a364abe0a7eb71aa0aecc52ed
2019-11-20 12:48:12 -08:00
Rohan Varma
f304bd5062 rename join_rpc to wait_all_workers in public api (#30050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30050

Renames this API to wait_all_workers as discussed.
ghstack-source-id: 94273005

Test Plan: Unit tests pass

Differential Revision: D18581466

fbshipit-source-id: 4ff5d5fb2d528f17252d5b5f30c3047d2efb92bf
2019-11-20 12:38:35 -08:00
Will Feng
a460c856dd Fix naming for kl_div and binary_cross_entropy functional options (#30146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30146

This PR fixes naming for kl_div and binary_cross_entropy functional options, to be more consistent with the naming scheme of other functional options.

Test Plan: Imported from OSS

Differential Revision: D18618971

Pulled By: yf225

fbshipit-source-id: 2af62c1a0ace2cd0c36c2f1071639bf131d8fe61
2019-11-20 12:23:50 -08:00
Raghuraman Krishnamoorthi
67b77afcdf Fast histogram observer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29790

Test Plan:
```
import torch
import time
import numpy as np
from torch.quantization.observer import HistogramObserver

X = torch.randn(1, 1, 224, 224)

obs = HistogramObserver(2048)
acc_time = 0
# Time 100 observer forward passes, accumulating only the observer call time.
for i in range(100):
    X = torch.randn(10, 1, 320, 320)
    start = time.time()
    obs(X)
    # obs.forward_new(X)
    acc_time = acc_time + time.time() - start
print(acc_time)
```

Imported from OSS

Differential Revision: D18508562

fbshipit-source-id: 456e82360ce1b3f9d8b6e1832d23f1339655011a
2019-11-20 11:14:41 -08:00
Jerry Zhang
f2b851a9e5 Returning axis from calculate_qparams (#29494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29494

`calculate_qparams` for per-channel quantization should return the axis; this
PR adds that and also adds corresponding support in graph mode.
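
A hedged usage sketch of a per-channel observer (observer names follow the Python API; the exact qparams return layout is what this PR changes):
```
import torch
from torch.quantization.observer import PerChannelMinMaxObserver

obs = PerChannelMinMaxObserver(ch_axis=0, dtype=torch.qint8)
obs(torch.randn(4, 8))

# After this PR the channel axis is reported along with the per-channel
# scales and zero points computed from the observed statistics.
print(obs.calculate_qparams(), obs.ch_axis)
```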

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D18580905

fbshipit-source-id: f9691c1f043f8bca39f81716a4d0b10f60a65396
2019-11-20 11:06:48 -08:00
David Reiss
fbcb88e8b3 Split module.cpp and export.cpp to support saving on mobile (#29881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29881

Breaking these into separate files allows us to have three different builds:
- Mobile inference-only.
- Mobile with module saving.
- Server with module saving and other export functions like ONNX.

And this can be accomplished just by selecting which cpp files to compile,
without setting any preprocessor flags.

Test Plan: CI.  Local mobile+saving build.

Reviewed By: smessmer

Differential Revision: D18509296

fbshipit-source-id: 9438273bac4624df5c7f035b2bacb901cce43053
2019-11-20 10:47:21 -08:00
Will Feng
72bc7bf37b Revert D18612158: Fix naming for kl_div and binary_cross_entropy functional options
Test Plan: revert-hammer

Differential Revision:
D18612158

Original commit changeset: 8c403fa1c2a0

fbshipit-source-id: f22d7c4664119d4e7397fc017bacecf3e318af11
2019-11-20 10:26:31 -08:00
Will Feng
e84fcc1fd1 Fix naming for kl_div and binary_cross_entropy functional options (#30146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30146

This PR fixes naming for kl_div and binary_cross_entropy functional options, to be more consistent with the naming scheme of other functional options.

Test Plan: Imported from OSS

Differential Revision: D18612158

Pulled By: yf225

fbshipit-source-id: 8c403fa1c2a0a65734a3ec2387cc0937c46cab24
2019-11-20 09:44:21 -08:00
xiaobing.zhang
c2c835dd95 Port sigmoid backward to Aten(CPU+CUDA) (#29185)
Summary:
VitalyFedyunin, This PR is about port sigmoid backward to Aten:
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
if torch.cuda.is_available():
    device = "cuda"

#warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    for i in range(1000):
        output = input.sigmoid().sum()
        output.backward()

#get running time
for n in [100, 10000]:
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    for i in range(10000):
        output = input.sigmoid().sum()
        t1 = _time()
        output.backward()
        t2 = _time()
        bwd_t = bwd_t + (t2 - t1)
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d), backwad avg time is %.2f (ms)." % (n, bwd_avg))
```
Test Device: CPU: skx-8280, GPU: Tesla P40

**Performance**:
Before:
```
GPU:
input size(128, 100), backward avg time is 0.14 (ms).
input size(128, 10000), backward avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100), backward avg time is 0.06 (ms).
input size(128, 10000), backward avg time is 4.21 (ms).
OMP_NUM_THREADS=1
input size(128, 100), backward avg time is 0.06 (ms).
input size(128, 10000), backward avg time is 2.30 (ms).
```
After:
```
GPU:
input size(128, 100), backward avg time is 0.14 (ms).
input size(128, 10000), backward avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100), backward avg time is 0.05 (ms).
input size(128, 10000), backward avg time is 0.48 (ms).
OMP_NUM_THREADS=1
input size(128, 100), backward avg time is 0.04 (ms).
input size(128, 10000), backward avg time is 0.86 (ms).
```
To control the number of threads, save the following script as run.sh:
```
num_threads=$1
script=$2
last_core=`expr $num_threads - 1`
echo "using $num_threads OMP threads"
echo "bind cores to 0~$last_core"
export OMP_NUM_THREADS=$num_threads
export KMP_AFFINITY=granularity=fine,compact,1,0
numactl --physcpubind=0-$last_core --membind=0 python $script
```
and run **./run.sh num_threads test.py**.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29185

Differential Revision: D18587352

Pulled By: VitalyFedyunin

fbshipit-source-id: 8167ca261960399f795d35a83fa8c4be365bc4da
2019-11-20 07:31:42 -08:00