Commit Graph

39023 Commits

Author SHA1 Message Date
tktrungna
f875027713 Update test distribute path 2021-08-03 09:13:11 -07:00
Philip Meier
2cf4d8128d add OpInfo for torch.nn.functional.mse_loss (#62254)
Summary:
Addresses facebookresearch/functorch#78.
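
For context (my illustration, not code from the PR), the operator the new OpInfo covers is `torch.nn.functional.mse_loss`; a minimal forward/backward call of the kind OpInfo-based tests exercise looks like:

```
import torch
import torch.nn.functional as F

inp = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
loss = F.mse_loss(inp, target, reduction="mean")  # scalar mean-squared error
loss.backward()                                   # gradients are what OpInfo-based tests check
```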

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62254

Reviewed By: malfet

Differential Revision: D30013331

Pulled By: zou3519

fbshipit-source-id: e3242cb97d1f061b932e3e0ed589f1ee6a291512
2021-08-03 09:01:09 -07:00
Raghavan Raman
ab8af15545 [Static Runtime] Enabled building Static Runtime tests and benchmarks in OSS CI (#62226)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62226

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29923800

Pulled By: navahgar

fbshipit-source-id: 33cfe0e92a34c7140ea762e5715301cfbf401434
2021-08-03 08:52:36 -07:00
Andrew Gu
43327cc197 Refactor commonalities between two approaches (#62624)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62624

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30058543

Pulled By: andwgu

fbshipit-source-id: 73c794062b75e011868fae264f592549eed67482
2021-08-03 08:43:14 -07:00
Andrew Gu
e6a3967c2a Add invariant check (bucket indices: 0, 1, ..., k-1) (#62623)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62623

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30058544

Pulled By: andwgu

fbshipit-source-id: a56910f294c6a40118751eebe255b62700f42be9
2021-08-03 08:13:52 -07:00
Kevin Tse
87465a6e68 adding operator cumulative_trapezoid (#61615)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* https://github.com/pytorch/pytorch/issues/61616
* **https://github.com/pytorch/pytorch/issues/61615**
* https://github.com/pytorch/pytorch/issues/61475
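
A short usage sketch of the new operator (illustrative only, not taken from the PR): it returns the running trapezoidal-rule integral, so the last element matches `torch.trapezoid`.

```
import torch

y = torch.tensor([1.0, 2.0, 3.0, 4.0])
torch.cumulative_trapezoid(y)          # tensor([1.5000, 4.0000, 7.5000])
torch.cumulative_trapezoid(y, dx=0.5)  # half the sample spacing: tensor([0.7500, 2.0000, 3.7500])
```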

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61615

Reviewed By: malfet, mruberry

Differential Revision: D29975064

Pulled By: NivekT

fbshipit-source-id: 4d4e98f3efb720fdc44eb238ecbf0fa157ac13d7
2021-08-03 08:04:00 -07:00
Sergei Vorobev
b37578b3c0 Make bazel output less verbose in CI (#62601)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62600

Adds `bazel --config=no-tty`, which is useful for less verbose output in environments that don't implement a full TTY, such as CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62601

Reviewed By: soulitzer

Differential Revision: D30070154

Pulled By: malfet

fbshipit-source-id: 5b89af8441c3c6c7ca7e9a0ebdfddee00c9ab576
2021-08-03 07:59:01 -07:00
Victor Quach
3bda4ea842 Avoid unnecessary copying data in Saved Variable (#61927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61927

This is a refactor of `SavedVariable.cpp` to prevent ever defining the `data_` tensor if default hooks are set.

Before the refactor:

```c++
data_ = variable.tensor_data(); // this is wasteful if hooks are defined
register_hooks(Engine::get_default_engine().get_default_saved_variable_hooks());
```

After the refactor:
```c++
if (get_default_hooks_()) {
  save_metadata_(variable);
  register_hooks_(get_default_hooks_(), variable);
  return;
}
save_metadata_(variable);
data_ = variable.tensor_data(); // only needed if hooks are not defined
```

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29848524

Pulled By: Varal7

fbshipit-source-id: abca1eee37a17b47841e28d8a576490913fce1ce
2021-08-03 07:09:47 -07:00
Yukio Siraichi
7edb4f8761 Port cumprod kernel to structured kernels. (#61899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61899

Tracking issue: #55070

This PR also removes `at::_cumprod`, which was the "backend" for `at::cumprod`.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29939489

Pulled By: ezyang

fbshipit-source-id: d5e4a6dfa6c79e4b135508ea13c2d11bd0684f63
2021-08-03 06:58:13 -07:00
Yukio Siraichi
e52325655a Port cumprod kernel to structured kernels. (#61899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61899

Tracking issue: #55070

This PR also removes `at::_cumprod`, which was the "backend" for `at::cumprod`.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29939152

Pulled By: ezyang

fbshipit-source-id: b3379033a1ffe3c7bc8216d16d089d388ea559ba
2021-08-03 06:57:09 -07:00
yanbing-j
c7a7c2b62f Enable Gelu fp32/bf16 in CPU path using Mkldnn implementation (#58525)
Summary:
Enable Gelu bf16/fp32 in the CPU path using the Mkldnn implementation. Users don't need to call `to_mkldnn()` explicitly. The new Gelu fp32 kernel performs better than the original one.

Add Gelu backward for https://github.com/pytorch/pytorch/pull/53615.
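
A minimal sketch of the user-facing effect (my illustration, not code from the PR): a plain dense CPU tensor is used directly, with no explicit `to_mkldnn()` conversion.

```
import torch
import torch.nn.functional as F

x = torch.randn(128, 1024, dtype=torch.bfloat16, requires_grad=True)
y = F.gelu(x)        # dense CPU path; intended to hit the Mkldnn kernel with this change
y.sum().backward()   # backward is covered by this PR as well
```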

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58525

Reviewed By: ejguan

Differential Revision: D29940369

Pulled By: ezyang

fbshipit-source-id: df9598262ec50e5d7f6e96490562aa1b116948bf
2021-08-03 06:52:23 -07:00
kshitij12345
fd8004b42e add bfloat16 impl for nextafter (#61829)
Summary:
Add `BFloat16` support for `nextafter`.

* [x] Add OpInfo
* [x] Add Implementation Test (C++ tests)
* [x] Add credit
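
Illustrative usage with the newly supported dtype (my example, not part of the PR):

```
import torch

a = torch.tensor([1.0, -1.0, 0.0], dtype=torch.bfloat16)
b = torch.tensor([2.0, -2.0, 1.0], dtype=torch.bfloat16)
# Each element is the next representable bfloat16 value after a, stepping toward b.
torch.nextafter(a, b)
```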

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61829

Reviewed By: ejguan

Differential Revision: D29932498

Pulled By: mruberry

fbshipit-source-id: 89524531a4800569ba1addd08a4ace330a6f72a4
2021-08-02 23:16:58 -07:00
Richard Barnes
2888b7fec5 Fix sign comparison (#62483)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62483

Test Plan: Sandcastle

Reviewed By: albanD

Differential Revision: D30015385

fbshipit-source-id: eefc3208fb8c42ff46b9f4d910eb93c32595fa28
2021-08-02 22:50:39 -07:00
Nikita Shulga
a77be16538 TensorAccessor::bounds_check should be a CPU-only function (#62628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62628

This fixes the following error when the ROCm compiler is used:
```
caffe2/aten/src/ATen/core/TensorAccessor.h:160:5: error: throw is prohibited in AMP-restricted functions
    TORCH_CHECK_INDEX(
    ^
```

Test Plan: CI

Reviewed By: zhouzhuojie, seemethere

Differential Revision: D30059737

fbshipit-source-id: d094ee608768db41fcc91d044c2c6d7d29f33fe4
2021-08-02 22:46:24 -07:00
Adam Simpkins
e0364ccc33 [caffe2] break one circular dependency between Caffe2 and ATen-cpu (#62632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62632

Update the caffe2/core/context.h to directly use `at::mt19937` instead of the
`at::CPUGeneratorImpl` wrapper class from the ATen-cpu library.

Using `at::CPUGeneratorImpl` causes circular dependencies between the ATen and
caffe2 code.  In particular the `at::CPUGeneratorImpl::get_state()` logic
depends on CPU Tensor functionality that currently depends on code from
caffe2.

Test Plan:
The RNG behavior should be identical to the previous code (and perhaps even
faster, since we now avoid virtual function calls).

  buck test //caffe2/caffe2:caffe2_test_cpu \
    //caffe2/caffe2/python: //caffe2/caffe2/fb/operators:

Differential Revision: D29915701

fbshipit-source-id: f9b2eab8d3b21b2224d30bcf52be9c0e7eb7cd0a
2021-08-02 22:40:56 -07:00
Pritam Damania
88af4d8441 Initialize RRefs only when explicitly asked for. (#62618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62618

ShardedTensor implicitly initialized RRefs to remote shards if the
RPC framework was initialized. However, there are use cases where the RPC
framework might be initialized for a different purpose, and users would
prefer that ShardedTensor not initialize RRefs in that case.

As a result, I've made RRef initialization explicit in the ShardedTensor APIs.
ghstack-source-id: 134889287

Test Plan:
1) waitforbuildbot
2) unit tests.

Reviewed By: wanchaol

Differential Revision: D30056833

fbshipit-source-id: 9b2433a38dafa1888589c5b72ed93b6f0ee51639
2021-08-02 22:17:17 -07:00
Isuru Fernando
b58e04f156 Make sure FindLAPACK finds the same BLAS library (#49647)
Summary:
The BLAS library is found by cmake/Dependencies.cmake, and then the
LAPACK library is found by FindLAPACK.cmake, which in turn calls
FindBLAS.cmake. This means we search for BLAS twice, and the two
searches might find different libraries. Setting a few variables
avoids this.

cc seemethere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49647

Reviewed By: seemethere, ejguan

Differential Revision: D29943680

Pulled By: malfet

fbshipit-source-id: 3cbc350ea645a1a28dd92c19e5ee7f9eecdeff59
2021-08-02 20:41:00 -07:00
Nathan Lanza
2d038b5dc8 Cast a var to void that is unused
Summary: The comment above makes it seem intentional, so just ignore it.

Test Plan: NFC

Reviewed By: smeenai

Differential Revision: D30057632

fbshipit-source-id: 45929b4eeeefdf22f5c7c4dd603229635f9da31b
2021-08-02 19:56:41 -07:00
Santiago Castro
c4196bee93 Save some memory in scatter (#62516)
Summary:
Also removes some redundant parenthesis for clarity.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62516

Reviewed By: andwgu

Differential Revision: D30030546

Pulled By: SciPioneer

fbshipit-source-id: e106486f70b9590bf3dcffb76d23f5725737542f
2021-08-02 18:41:58 -07:00
Hui Guo
10d3a2c13a [tensorexpr] Added logging info for SimplifierUnderContext (#62138)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62138

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29891257

Pulled By: huiguoo

fbshipit-source-id: c36b3d615fa2fe971d022111bef61ee843a9dbea
2021-08-02 18:38:55 -07:00
Hui Guo
3a592730d5 [nnc] Simplify i%100 to i if i is less than 100; fixed #52580 (#60693)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60693

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29375938

Pulled By: huiguoo

fbshipit-source-id: 1388729c5b93805cb156efa53e8823d5462885bf
2021-08-02 18:38:54 -07:00
Hui Guo
8f7ae77040 [nnc] Add context-sensitive simplification for div/mod (#60688)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60688

Test Plan: Imported from OSS

Reviewed By: navahgar, ZolotukhinM

Differential Revision: D29373313

Pulled By: huiguoo

fbshipit-source-id: 90d7f2fbfce583b0ea3b0f1c7899e22b0210bd62
2021-08-02 18:37:39 -07:00
Pritam Damania
c07a123b26 Support saving and loading ShardedTensor. (#62242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62242

1) Add a state_dict hook to ensure ShardedTensors are
added to a state_dict.
2) Add a pre load state_dict hook to ensure ShardedTensor are added back to a
module at load time.
3) Add a `with_load_process_group` context manager for load time.
4) Added ser-de capability to ShardedTensor.
ghstack-source-id: 134860967

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D29927881

fbshipit-source-id: b1ef8872ed91e9cb0e2d5dd17d2764678ab89f0c
2021-08-02 18:33:19 -07:00
Eli Uriegas
dd23372aa5 .circleci: Prefix intermediate build image tags (#62610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62610

Prefixes intermediate build image tags with build- so that ECR lifecycle
policies can automatically clean them up

Policy to automatically clean up images prefixed with `build-`: b02dd818f9

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D30055952

Pulled By: seemethere

fbshipit-source-id: 328b9c94ffc02877d088d0118a19c732f580838b
2021-08-02 18:17:14 -07:00
Victor Quach
525fa2f0b6 [reland] Catch saved tensors default hooks race condition (#62564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62564

If the user runs code that registers default saved tensor hooks from
multiple threads, it will fail with a nice error message most of the
time. This commit handles the very rare case where a race condition
would have made it fail silently.

Relanding previous PR #61957

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30045406

Pulled By: Varal7

fbshipit-source-id: d04f74c99affbbf655e53cfc2acd42f7c5b4e6eb
2021-08-02 18:00:37 -07:00
Nikita Shulga
f5cf24a224 Fix lint in test_deploy_from_python.py (#62626)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62626

Reviewed By: walterddr, zhouzhuojie, seemethere

Differential Revision: D30059119

Pulled By: malfet

fbshipit-source-id: 2aff44c1585091d864ab7e02d69046204e5b5d17
2021-08-02 17:55:24 -07:00
Mustafa Bal
615ac8e573 Added logic for notifying PTE webapp for Nightly and PR builds (#62512)
Summary:
This PR adds the logic to notify the PTE webapp for DevOps PyTorch Nightly and PR builds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62512

Reviewed By: iramazanli

Differential Revision: D30046165

Pulled By: malfet

fbshipit-source-id: ef7e4848d4db9f38536a647fcd2d8e26cf64b960
2021-08-02 16:44:35 -07:00
Yi Wang
db071ef005 [Reland][DDP Communication Hook] Rename 4 Methods of GradBucket Class (#62592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62592

Reland #62510

`GradBucket` is an important class defined in both C++ and Python, used for PyTorch Distributed Training. We need to rename the following methods for simplicity:
1) get_index -> index
2) is_the_last_bucket_to_allreduce -> is_last,
3) get_per_parameter_tensors -> gradients,
4) get_model_params_for_bucket -> parameters.
ghstack-source-id: 134848352
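
A tiny illustration (mine, not from the PR) of how the renamed accessors read inside a DDP communication hook body, where `bucket` is the GradBucket the hook receives:

```
def log_bucket(bucket):
    print(
        f"bucket #{bucket.index()}",               # was get_index()
        f"is_last={bucket.is_last()}",             # was is_the_last_bucket_to_allreduce()
        f"num_grads={len(bucket.gradients())}",    # was get_per_parameter_tensors()
        f"num_params={len(bucket.parameters())}",  # was get_model_params_for_bucket()
    )
```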

Test Plan: unit test

Reviewed By: andwgu

Differential Revision: D30049431

fbshipit-source-id: 1bcac331aa30e529b7230e3891bc811c531b0ea9
2021-08-02 16:38:09 -07:00
Salil Desai
d228a8fc94 [Vulkan] Softmax Along Channel Dim (#62239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62239

Added a naive implementation of Vulkan softmax (not using shared memory).

Based on the naive implementation of mean, found here:

2565a33c98/aten/src/ATen/native/vulkan/glsl/mean.glsl

Test Plan:
After building:

```
build/bin/vulkan_api_test
```

{F637001190}

```
[ RUN      ] VulkanAPITest.softmax
[       OK ] VulkanAPITest.softmax (180 ms)
```

Reviewed By: SS-JIA

Differential Revision: D29793150

fbshipit-source-id: 4f9d8e1dae8a43cbcb7063b095fa4726df06c929
2021-08-02 16:20:44 -07:00
Peter Bell
940cbbce76 Add BFloat16 support to CPU nansum (#61083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61083

It's already supported on CUDA, so it seems reasonable to support it on CPU as
well. This also changes `test_nansum` to compare against `torch.sum`, since NumPy
doesn't support BFloat16. Note that `test_nansum_vs_numpy` checks against
NumPy as well, so that's still being tested.
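
A quick sketch of what the change enables (illustrative, not from the PR):

```
import torch

x = torch.tensor([1.0, float("nan"), 2.0], dtype=torch.bfloat16)
# NaNs are treated as zero; with this change the CPU kernel accepts bfloat16 too.
torch.nansum(x)  # tensor(3., dtype=torch.bfloat16)
```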

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30006227

Pulled By: heitorschueroff

fbshipit-source-id: 1449730e1936417e7de1f8b3cf8cdcc15518873c
2021-08-02 16:03:57 -07:00
Zachary DeVito
27d3d3a7d7 deploy in python fix to work in @opt mode
Summary: if we let torch_deploy get put in libomnibus, it hides the symbols we need to link against

Test Plan: buck test //caffe2/torch/csrc/deploy:test_deploy_from_python -- --exact 'caffe2/torch/csrc/deploy:test_deploy_from_python - test_deploy_from_python (caffe2.torch.csrc.deploy.test_deploy_from_python.TestDeployFromPython)' --run-disabled

Reviewed By: wconstab

Differential Revision: D30031134

fbshipit-source-id: e5c2f740f17abafec7d01c57c664bd71a00b6f61
2021-08-02 14:47:49 -07:00
Gao, Xiang
a4af91b2fe Cleanup CUDA 10.1 and 10.0 support on CI (#62597)
Summary:
CUDA 10.1 was removed in https://github.com/pytorch/pytorch/pull/56056

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62597

Reviewed By: walterddr

Differential Revision: D30053902

Pulled By: seemethere

fbshipit-source-id: deb148e5e44c12b08c267a36fbd4a1afa138e6e4
2021-08-02 14:42:25 -07:00
Jacob Szwejbka
305d5fcc05 [Pytorch Edge] get_model_bytecode int -> uint (#62201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62201

Change int to uint to match the type used by the runtime's bytecode. This only affects C++, since Python doesn't have uints IIRC. Also changed the behavior of the functions from returning -1 with a warning to throwing an exception. Wasn't sure what the proper behavior here would be (returning UINT_MAX seemed gross), so feedback is appreciated.

Test Plan: ci

Reviewed By: raziel

Differential Revision: D29914072

fbshipit-source-id: 1bb08702fc301d7c7612b5ad7205a6dbe855c890
2021-08-02 14:17:44 -07:00
Nikita Shulga
0c4c37b01e Disable libtorch testing on MacOS (#62599)
Summary:
Fixes regression introduced by https://github.com/pytorch/pytorch/issues/62402

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62599

Reviewed By: walterddr, driazati

Differential Revision: D30051914

Pulled By: malfet

fbshipit-source-id: a07184b21cc4b2d0ae31fe385bb58a0f665595c6
2021-08-02 13:41:18 -07:00
Bradley Davis
093495d3f0 [fx] prevent implicit submodule inlining when submodule is a GraphModule (#62436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62436

## Problem

Given two modules and a tracer that indiscriminately marks all modules as a leaf:
```
class InnerModule(torch.nn.Module):

    def forward(self, t):
        return t + t

class MyModule(torch.nn.Module):
    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def forward(self, t):
        x = self.inner(t)
        y = self.inner(t)
        return x + y

class MyTracer(torch.fx.Tracer):
    def is_leaf_module(self, module, name):
        return True
```

One might generally expect the following behavior (note call_module nodes):
```
print(">> Outer GraphModule (with inner module as nn.Module):")
inner = InnerModule()
m = MyModule(inner)
gm = torch.fx.GraphModule(m, MyTracer().trace(m))
print(gm.graph.print_tabular())

>> Outer GraphModule (with inner module as nn.Module):
opcode         name     target                   args              kwargs
-------------  -------  -----------------------  ----------------  --------
placeholder    t        t                        ()                {}
call_module    inner    inner                    (t,)              {}
call_module    inner_1  inner                    (t,)              {}
call_function  add      <built-in function add>  (inner, inner_1)  {}
output         output   output                   (add,)            {}
None
```

However, when the inner module is first symbolically traced, the symbolic trace of the outer module ignores `is_leaf_module` entirely, and traces through the whole module (note call_function nodes).
```
print(">> Inner module as GraphModule:")
inner = InnerModule()
inner_gm = torch.fx.GraphModule(inner, MyTracer().trace(inner))
print(inner_gm.graph.print_tabular())

print(">> Outer GraphModule (with inner module as GraphModule):")
m = MyModule(inner_gm)
gm = torch.fx.GraphModule(m, MyTracer().trace(m))
print(gm.graph.print_tabular())

>> Inner module as GraphModule:
opcode         name    target                   args    kwargs
-------------  ------  -----------------------  ------  --------
placeholder    t       t                        ()      {}
call_function  add     <built-in function add>  (t, t)  {}
output         output  output                   (add,)  {}
None

>> Outer GraphModule (with inner module as GraphModule):
opcode         name    target                   args          kwargs
-------------  ------  -----------------------  ------------  --------
placeholder    t       t                        ()            {}
call_function  add     <built-in function add>  (t, t)        {}
call_function  add_1   <built-in function add>  (t, t)        {}
call_function  add_2   <built-in function add>  (add, add_1)  {}
output         output  output                   (add_2,)      {}
None
```

This is surprising behavior and at first glance violates the tracer's intent. As I understand it, `torch.fx.symbolic_trace.Tracer.trace` intends to patch `torch.nn.Module.__call__` with a `module_call_wrapper()` that records a `call_module` node if the module is a leaf, and otherwise executes `torch.fx._symbolic_trace._orig_module_call` (i.e. `torch.nn.Module.__call__`), which is set at module load time.

**Every submodule should be a leaf, but no `call_module` nodes are created when that submodule is a `GraphModule`. Why?**

Upon further inspection, I found:

- The constructor for GraphModule includes a path to `GraphModule.recompile()` via the setter for a `fx.Graph`:
```
inner_gm = torch.fx.GraphModule(inner, MyTracer().trace(inner))

File "/torch/fx/graph_module.py", line 252, in __init__
self.graph = graph

File "/torch/nn/modules/module.py", line 1183, in __setattr__
object.__setattr__(self, name, value)

File "/torch/fx/graph_module.py", line 277, in graph
self.recompile()
```
- `recompile()` wraps the `__call__` method by holding a reference to the `__call__` method at the time of recompilation:
```
cls = type(self)
cls_call = cls.__call__
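# NOTE: __call__ is captured here, at recompile() time, so a later patch of
# torch.nn.Module.__call__ (e.g. by a tracer) is never seen by wrapped_call.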
...
def wrapped_call(self, *args, **kwargs):
    try:
        return cls_call(self, *args, **kwargs)
    except Exception as e:
        ...
cls.__call__ = wrapped_call
```
- Recompilation of the inner GraphModule happens on initialization, before creation or tracing of the outer module. Adding some old-fashioned print debug statements gives:
```
Inner Module:
_orig_module_call: <function Module._call_impl at 0x7faaebfee8b0>
recompile: cls.__call__ now wraps _orig_module_call, <function Module._call_impl at 0x7faaebfee8b0>

Outer Module:
_orig_module_call: <function Module._call_impl at 0x7faaebfee8b0>
tracing: patching method <class 'torch.nn.modules.module.Module'>.__call__ <function Module._call_impl at 0x7faaebfee8b0> with <function Module._call_impl at 0x7fa9d42bce50>

outer module MRO before tracing:
(0) <class '__main__.MyModule'>: <function Module._call_impl at 0x7faaebfee8b0>
(1) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7faaebfee8b0>
(2) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>

outer module MRO during tracing:
(0) <class '__main__.MyModule'>: <function Module._call_impl at 0x7fa9d42bce50>
(1) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7fa9d42bce50>
(2) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>

inner module MRO before tracing:
(0) <class 'torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl'>: <function x.y.z.wrapped_call at 0x7fa9d42a8670>
(1) <class 'torch.fx.graph_module.GraphModule'>: <function Module._call_impl at 0x7faaebfee8b0>
(2) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7faaebfee8b0>
(3) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>

inner module MRO during tracing:
(0) <class 'torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl'>: <function x.y.z.wrapped_call at 0x7fa9d42a8670>
(1) <class 'torch.fx.graph_module.GraphModule'>: <function Module._call_impl at 0x7fa9d42bce50>
(2) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7fa9d42bce50>
(3) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>
```

- The outer module is patched correctly, but the inner module's first element in its MRO is the `wrapped_call` from `recompile` that still invokes `<function Module._call_impl at 0x7faaebfee8b0>` directly. Therefore, no call_module nodes are created.

## In Practice

In practice, this behavior affects the ability of `torch.package` to package `GraphModules` whose submodules are `GraphModules`. In our case, the `GraphModule` submodules are not passed through a constructor, but created separately and installed on the root `GraphModule` via `setattr`. This means that prior to packaging, there appear to be no issues with the module, since the root's graph was created before any call_module targets were replaced with `GraphModules`.

When unpackaging such a model with `torch.package`, `torch.fx.graph_module._deserialize_graph_module` uses an inline `KeepModules` tracer that sets all submodules to leaves; the unpackaged module is implicitly and surprisingly inlined in the process.

## Potential Solution

This behavior was previously not understood by us, and so the current workaround is a gnarly process of wrapping each submodule in an `nn.Module` with a manually installed forward method.

Changing `wrapped_call` to return `super(type(self), self).__call__(*args, **kwargs)` whenever `__call__` is inherited at least appears to solve the issue. Does this seem like an acceptable approach?

## Other Thoughts
- Repeated calls to `recompile` create nested `wrapped_calls`, all for the purpose of error handling. This seems probably unnecessary ¯\\_(ツ)\_/¯
- If a root module with an overridden `__call__` method is symbolically traced, the override is ignored

Test Plan:
```
buck test:
    ✓ ListingSuccess: caffe2/test:fx - main (12.570)
    ✓ Pass: caffe2/test:fx - test_tracing_graphmodules_as_leaf_submodules (test_fx.TestFX) (11.982)
```

Reviewed By: ansley

Differential Revision: D29997935

fbshipit-source-id: 1988fbb025b14188da26a3e73e94fb789c3c1f74
2021-08-02 13:37:08 -07:00
Howard Huang
dc1bd6acee Remove PROCESS GROUP rpc backend (#62411)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62411

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29990408

Pulled By: H-Huang

fbshipit-source-id: 183d3b316767b12993cebbe32b73c2850fd1cc42
2021-08-02 12:26:22 -07:00
Yi Wang
2ec4f69b48 [DDP Comm Hook] Do not expose hook_then_optimizer as a public method (#62532)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62532

This method is not stable at this time, so avoid releasing it when the DDP communication hook feature is released as stable.
ghstack-source-id: 134787831

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_hook_with_optimizer_parity
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_hook_then_optimizer_nccl

Reviewed By: rohan-varma

Differential Revision: D30031222

fbshipit-source-id: e03a8e13fee5116a5ddd724eb76316ee98f2a676
2021-08-02 12:25:01 -07:00
Victor Quach
b161ac541d [reland] Add default Saved Variable hooks (#62563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62563

Expose a pair of functions to Python users: torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack) and torch.autograd.graph.reset_saved_tensors_default_hooks().
These functions control the hooks applied to saved tensors: all tensors saved in that context will be packed using the pack function, then unpacked accordingly when needed.

Currently, this works by simply calling register_hooks (cf #60975) directly at the end of the constructor of a SavedVariable. This could be optimized further by not performing the copy before registering default hooks, but this would require a small refactor. Edit: the refactor is done in #61927.

A current limitation is that if users create tensors in this context, they will not be able to register additional hooks on the saved tensor.

For instance, to perform something like #28997, one could define a pack function that saves the tensor to disk whenever it is too big and returns a filename; unpack then simply reads the content of the file and outputs a tensor, e.g.:

```
import os
import tempfile
import uuid

import torch

tmp_dir = tempfile.mkdtemp()  # scratch directory for the offloaded tensors

def pack(x):
    name = os.path.join(tmp_dir, str(uuid.uuid4()))
    torch.save(x, name)
    return name

def unpack(name):
    return torch.load(name)
```
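
Putting it together, a hedged usage sketch built only from the two functions named above (the tensors and the arithmetic are illustrative):

```
x = torch.randn(5, requires_grad=True)

torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack)
y = (x * x).sum()   # the tensors saved for backward are packed (written to disk) here
torch.autograd.graph.reset_saved_tensors_default_hooks()

y.backward()        # unpack() reloads each saved tensor when backward needs it
```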

Relanding previous PR: https://github.com/pytorch/pytorch/pull/61834

Original PR led to timeout error in: https://www.internalfb.com/mast/job/yuguo-release_canary_offline_training-inlinecvrp_a-canary_offline_train_28a7ecfc

Now passing: https://www.internalfb.com/mast/job/quach-release_canary_offline_training-inlinecvrp_a-canary_offline_train_9bb57e98

The difference with the new version is we don't need to acquire the GIL when calling `PyDefaultSavedVariableHooks::get_hooks`.

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30045405

Pulled By: Varal7

fbshipit-source-id: 7f6c07af3a56fe8835d5edcc815c15ea4fb4e332
2021-08-02 11:30:26 -07:00
Eli Uriegas
6f95850127 Revert D30024161: [DDP Communication Hook] Rename 4 Methods of GradBucket Class
Test Plan: revert-hammer

Differential Revision:
D30024161 (29c8b1db57)

Original commit changeset: 07e6072a2f7b

fbshipit-source-id: d571c2caadaf7b71fe2aba3c0597bd8074d153de
2021-08-02 10:26:54 -07:00
Philip Meier
2e4f566d30 add OpInfo for torch.nn.functional.softplus (#62317)
Summary:
Addresses facebookresearch/functorch#78.
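
For context (my illustration, not code from the PR), the operator the new OpInfo covers:

```
import torch
import torch.nn.functional as F

x = torch.randn(4, requires_grad=True)
y = F.softplus(x, beta=1, threshold=20)  # smooth approximation of ReLU
y.sum().backward()
```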

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62317

Reviewed By: malfet

Differential Revision: D30013322

Pulled By: zou3519

fbshipit-source-id: e80affd10b81534234694c9e4326cc68c7efc7fe
2021-08-02 09:46:13 -07:00
kshitij12345
cb626da145 [fix] mark non-differentiable ops (#62529)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62506
Fixes https://github.com/pytorch/pytorch/issues/62504

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62529

Reviewed By: albanD

Differential Revision: D30032665

Pulled By: malfet

fbshipit-source-id: 90254c50fb4a873e3eda59c8484626137e01cb31
2021-08-02 09:40:45 -07:00
Meghan Lele
562b555a2b [CUDA] Fix typo in Normalization.cu (#62515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62515

**Summary**
This commit fixes an obvious typo in `Normalization.cu` I found while
working on #62452. Since that PR will not be landed anytime soon, I
thought it would be prudent to land this fix.

**Test Plan**
Continuous integration.

Test Plan: Imported from OSS

Reviewed By: makslevental

Differential Revision: D30027324

Pulled By: SplitInfinity

fbshipit-source-id: 9d368a54c13f8e2bf6f6956dfb2bee974226f726
2021-08-02 09:38:46 -07:00
Qing Hu
29c8b1db57 [DDP Communication Hook] Rename 4 Methods of GradBucket Class (#62510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62510

`GradBucket` is an important class defined in both C++ and Python, used for PyTorch Distributed Training. We need to rename the following methods for simplicity:
1) get_index -> index
2) is_the_last_bucket_to_allreduce -> is_last,
3) get_per_parameter_tensors -> gradients,
4) get_model_params_for_bucket -> parameters.

Test Plan:
Ran the comprehensive tests locally with the following results:
https://pxl.cl/1Ml8b
The two timeout test-case failures are most likely environment related and fail on my devserver.

Reviewed By: SciPioneer

Differential Revision: D30024161

fbshipit-source-id: 07e6072a2f7b81f731425d9b71f8c8b60d383b0f
2021-08-02 09:33:32 -07:00
BoTorch website deployment script
34cb2b5d04 Update SobolEngine docstring w/ correct behavior (#62548)
Summary:
Sobol was modified to not drop the first point. This update reflects that behavior in the docstring.
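
A quick illustration of the documented behavior (my example, not part of the PR):

```
import torch

engine = torch.quasirandom.SobolEngine(dimension=2)
# With the corrected behavior, the first drawn point is the sequence's genuine
# first point, which for an unscrambled Sobol sequence is all zeros.
engine.draw(1)  # tensor([[0., 0.]])
```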

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62548

Reviewed By: qingfeng10

Differential Revision: D30035627

Pulled By: Balandat

fbshipit-source-id: 64c659ea30c0c929778da3b58041875834e25e9a
2021-08-02 09:04:38 -07:00
Marjan Fariborz
2445d5c60a Removed the hypothesis tests for adaptive_avg_pool (#60886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60886

Remove all the hypothesis tests from test_adaptive_avg_pool2d_nhwc, test_adaptive_avg_pool, and test_adaptive_avg_pool3d_ndhwc.

Test Plan: I tested it with buck test //caffe2/test:quantization and all three tests passed. The tests that failed are test_conv2d_api (test_quantized_functional.py) and test_conv3d_api (test_quantized_functional.py).

Reviewed By: wanchaol, jerryzh168

Differential Revision: D29432184

fbshipit-source-id: 2a4c540d2c169aec69cf8d143d5a155394885745
2021-08-02 08:57:14 -07:00
Yi Zhang
3dc588d577 Fix: no enough space for cu102 debug nightly build (#62465)
Summary:
Fixes #{issue number}
![image](https://user-images.githubusercontent.com/16190118/127632173-783630b7-c644-4239-b1dd-fb12b6bacf83.png)

verification:
https://app.circleci.com/pipelines/github/pytorch/pytorch/358483/workflows/a34a0123-54fe-418f-9211-4af75ee56a70/jobs/15120463

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62465

Reviewed By: iramazanli

Differential Revision: D30045280

Pulled By: janeyx99

fbshipit-source-id: f40090eb02fd1d86033971611d492c7b107cc4bd
2021-08-02 08:44:16 -07:00
Andrew Gu
51f687fd4b Add overlap with DDP to ZeRO (two approaches) (#62157)
Summary:
**Overview:**
This adds two approaches to overlapping `DistributedDataParallel.backward()` with `ZeroRedundancyOptimizer.step()` by providing two hook constructors: `hook_with_zero_step()` and `hook_with_zero_step_interleaved()`. The former waits for all backward computation to finish before starting optimizer computation, while the latter launches a partial optimizer computation using the contents of a gradient bucket once that bucket's all-reduce completes. The two approaches each suffer from their own weaknesses, and which one to use depends on the specific hardware configuration.

Both approaches can share changes to `ZeroRedundancyOptimizer`. A user should pass `overlap_with_ddp=True` to `ZeroRedundancyOptimizer`, construct a DDP communication hook using either `hook_with_zero_step()` or `hook_with_zero_step_interleaved()`, and register that communication hook. `ZeroRedundancyOptimizer.step()` should still be called in the training loop, though the optimizer computation and communication will be offloaded to originate from the communication hook. Currently, the first two iterations are vacuous, meaning they do not result in parameter updates and the inputs are ignored. This is required to finalize the DDP bucket strategy and to then initialize the `ZeroRedundancyOptimizer`'s local optimizer based on that bucketing.
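
The wiring described above, as a hedged sketch: the import paths and the argument order of `hook_with_zero_step()` are my assumptions rather than text from this PR, and process-group initialization is assumed to have happened already.

```
import torch
from torch.distributed.algorithms.ddp_comm_hooks.default_hooks import allreduce_hook
from torch.distributed.algorithms.ddp_comm_hooks.ddp_zero_hook import hook_with_zero_step
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP

net = torch.nn.Linear(1024, 1024)  # placeholder model; default process group assumed initialized
model = DDP(net)
zero = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.SGD,
    lr=0.01,
    overlap_with_ddp=True,
)
model.register_comm_hook(None, hook_with_zero_step(allreduce_hook, model, zero))
# zero.step() is still called every iteration; the first two iterations are
# vacuous while DDP's bucketing is finalized (see the summary above).
```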

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62157

Test Plan:
The existing `ZeroRedundancyOptimizer` tests pass, and new unit tests for both hooks pass:
- ~~`test_ddp_with_zero_step_parity_cpu`~~ (removed for now due to flakiness in CI -- under investigation, could possibly be similar Gloo issue as with `hook_with_zero_step_interleaved()`)
- `test_ddp_with_zero_step_parity_gpu`
- `test_ddp_with_zero_step_interleaved_parity_gpu`

These were tested on the AI AWS cluster.

An analogous `test_ddp_with_zero_step_interleaved_parity_cpu` is missing due to existing bugs with Gloo. See https://github.com/pytorch/pytorch/pull/62302.

Both approaches have been verified using an internal accuracy benchmark.

Reviewed By: mrshenli

Differential Revision: D29971046

Pulled By: andwgu

fbshipit-source-id: a7234c23c7ea253f144a698fd7e3c0fe039de5e8
2021-08-02 08:33:34 -07:00
Joel Schlosser
ee482edf0a Callable activation function support for Transformer modules (C++) (#62342)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60747

Enhances the C++ versions of `Transformer`, `TransformerEncoderLayer`, and `TransformerDecoderLayer` to support callables as their activation functions. The old way of specifying the activation function still works as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62342

Reviewed By: malfet

Differential Revision: D30022592

Pulled By: jbschlosser

fbshipit-source-id: d3c62410b84b1bd8c5ed3a1b3a3cce55608390c4
2021-08-02 08:06:39 -07:00
Rong Rong (AI Infra)
c9d5325c52 [BE] shorten the name part 1 (#62402)
Summary:
This should address part of https://github.com/pytorch/pytorch/issues/62357.

1. Rename all files to 'generated-*' to make it clear they are generated; the filename will not be in the CI workflow name.
2. Remove all 'pytorch-' prefixes from names.
3. Make sure the build/test shell scripts are adapted to the new names.

The next change should reduce more device-related naming.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62402

Reviewed By: malfet

Differential Revision: D30021959

Pulled By: walterddr

fbshipit-source-id: 64b21a2020e25a507101c09c010cb593d8ac4146
2021-08-02 07:56:55 -07:00
Can Balioglu
7565039ee9 Support system-provided Intel TBB (#61934)
Summary:
This PR: (1) enables the use of a system-provided Intel TBB for building PyTorch, (2) removes `tbb::task_scheduler_init` references since it was removed from TBB a while ago, and (3) marks the implementation of `_internal_set_num_threads` with a TODO, as it requires a revision that fixes its thread allocation logic.

Tested with `test/run_test`; no new tests are introduced since there are no behavioral changes (removal of `tbb::task_scheduler_init` has no impact on the runtime behavior).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61934

Reviewed By: malfet

Differential Revision: D29805416

Pulled By: cbalioglu

fbshipit-source-id: 22042b428b57b8fede9dfcc83878d679a19561dd
2021-08-02 07:39:00 -07:00