Commit Graph

11958 Commits

Mike Ruberry
95bdfbcca3
update 2020-09-29 06:28:11 -07:00
Mike Ruberry
f7f626adbc
Fixes 2020-09-29 06:23:34 -07:00
Mike Ruberry
baf5147870 update docs 2020-09-29 06:10:47 -07:00
Antonio Cuni
37f9af7f29 Missing tests for torch.xxx(out=...) (#44465)
Summary:
PR opened just to run the CI tests
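
For context, the `out=` variants under test write the result into a preallocated tensor; a minimal illustration (not taken from the PR):

```
import torch

a, b = torch.randn(3), torch.randn(3)
out = torch.empty(3)
torch.add(a, b, out=out)  # result is written into the preallocated tensor
```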

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44465

Reviewed By: ngimel

Differential Revision: D23907565

Pulled By: mruberry

fbshipit-source-id: 620661667877f1e9a2bab17d19988e2dc986fc0f
2020-09-29 04:54:46 -07:00
Mike Ruberry
56af122659 Revert D23966878: [pytorch][PR] This PR flips a switch to enable PE + TE
Test Plan: revert-hammer

Differential Revision:
D23966878 (dddb685c11)

Original commit changeset: 2010a0b07c59

fbshipit-source-id: 132556039730fd3e4babd0d7ca8daf9c8d14f728
2020-09-29 04:33:19 -07:00
Basil Hosmer
1ed1a2f5b0 [wip] fast typeMeta/ScalarType conversion approach 2 (#44965)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44965

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23789657

Pulled By: bhosmer

fbshipit-source-id: 5afdd52d24bd097891ff4a7313033f7bd400165e
2020-09-29 02:39:36 -07:00
Supriya Rao
489af4ddcb [quant] Add quant APIs to save/load observer state_dict (#44846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44846

The save function traverses the model state dict to pick out the observer stats, and the
load function traverses the module hierarchy to load that state dict into module attributes, depending on the observer type.
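
A minimal usage sketch; the helper names `get_observer_state_dict`/`load_observer_state_dict` are inferred from the test name below and should be treated as assumptions:

```
import torch
import torch.nn as nn
import torch.quantization as tq

m = nn.Sequential(nn.Linear(4, 4)).eval()
m.qconfig = tq.get_default_qconfig("fbgemm")
prepared = tq.prepare(m)
prepared(torch.randn(2, 4))  # calibration populates the observers

obs = tq.get_observer_state_dict(prepared)  # assumed helper name
torch.save(obs, "observers.pt")
tq.load_observer_state_dict(prepared, torch.load("observers.pt"))  # assumed helper name
```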

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_save_observer_state_dict

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23746821

fbshipit-source-id: 05c571b62949a2833602d736a81924d77e7ade55
2020-09-29 01:52:42 -07:00
Zafar
bb478810e0 [quant] torch.max_pool1d (#45152)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45152

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23846473

Pulled By: z-a-f

fbshipit-source-id: 38fd611e568e4f8b39b7a00adeb42c7b99576360
2020-09-29 01:45:22 -07:00
Mikhail Zolotukhin
b86008ab75 [TensorExpr] Remove buf_ field from class Tensor. (#45390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45390

Tensor objects should always refer to their Function's bufs. Currently
we never create a Tensor with a buffer different from that of its function,
but having it in two places seems incorrect and dangerous.

Differential Revision: D23952865

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: e63fc26d7078427514649d9ce973b74ea635a94a
2020-09-29 01:21:57 -07:00
Mikhail Zolotukhin
3c33695a6d [TensorExpr] Rename Buffer to Placeholder. (#45389)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45389

Differential Revision: D23952866

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: 17eedd3ac17897501403482ac1866c569d247c75
2020-09-29 01:21:54 -07:00
Mikhail Zolotukhin
92306b85d5 [TensorExpr] Consolidate {buffer,function,tensor}.{h.cpp} in tensor.{h,cpp}. (#45388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45388

Classes defined in these files are closely related, so it is reasonable
to have them all in one file. The change is purely a code move.

Differential Revision: D23952867

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: 12cfaa968bdfc4dff00509e34310a497c7b59155
2020-09-29 01:17:10 -07:00
Iurii Zdebskyi
8c309fc052 Add more tests for mt optimizers (#45475)
Summary:
Add more test cases for mt optimizers and fix Adam/AdamW

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45475

Reviewed By: soumith

Differential Revision: D23982727

Pulled By: izdeby

fbshipit-source-id: 4b24d37bd52a2fa3719d3e3a5dcf3b96990b0f5b
2020-09-28 23:59:58 -07:00
James Reed
6bdb871d47 [FX] Lint pass for Graphs (#44973)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44973

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D23792631

Pulled By: jamesr66a

fbshipit-source-id: d8faef0c311d8bd611ba0a7e1e2f353e3e5a1068
2020-09-28 23:00:32 -07:00
James Reed
b0bdc82a00 [FX][EZ] Fix bug where copying node made non-unique name (#45311)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45311

Test Plan: Imported from OSS

Reviewed By: dzhulgakov

Differential Revision: D23917864

Pulled By: jamesr66a

fbshipit-source-id: 10d0a4017ffe160bce4ba0d830e035616bbded74
2020-09-28 22:55:20 -07:00
lixinyu
417e3f85e5 Support tuple inputs in NN Module test (#44853)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44853

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23750441

Pulled By: glaringlee

fbshipit-source-id: 1b111a370a726b40521134b711c35f48dda99411
2020-09-28 22:05:05 -07:00
Nikolay Korovaiko
dddb685c11 This PR flips a switch to enable PE + TE (#45396)
Summary:
This PR flips a switch to enable PE + TE
next PR: https://github.com/pytorch/pytorch/pull/45397

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45396

Reviewed By: suo

Differential Revision: D23966878

Pulled By: Krovatkin

fbshipit-source-id: 2010a0b07c595992a88b3fe0792d6af315cf421e
2020-09-28 21:57:50 -07:00
Natalia Gimelshein
50b91103a9 add self cuda time to avoid double/quadruple counting (#45209)
Summary:
In the profiler, CUDA did not report self time, so for composite functions there was no way to determine which function was actually taking the time. In addition, the reported "total CUDA time" was frequently greater than the total wallclock time. This PR adds "self CUDA time" to the profiler and computes total CUDA time based on self CUDA time, similar to how it is done for CPU. There are also slight formatting changes to make the table more compact. Before:
```
--------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Name                  Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls
--------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
aten::matmul          0.17%            890.805us        99.05%           523.401ms        5.234ms          49.91%           791.184ms        7.912ms          100
aten::mm              98.09%           518.336ms        98.88%           522.511ms        5.225ms          49.89%           790.885ms        7.909ms          100
aten::t               0.29%            1.530ms          0.49%            2.588ms          25.882us         0.07%            1.058ms          10.576us         100
aten::view            0.46%            2.448ms          0.46%            2.448ms          12.238us         0.06%            918.936us        4.595us          200
aten::transpose       0.13%            707.204us        0.20%            1.058ms          10.581us         0.03%            457.802us        4.578us          100
aten::empty           0.14%            716.056us        0.14%            716.056us        7.161us          0.01%            185.694us        1.857us          100
aten::as_strided      0.07%            350.935us        0.07%            350.935us        3.509us          0.01%            156.380us        1.564us          100
aten::stride          0.65%            3.458ms          0.65%            3.458ms          11.527us         0.03%            441.258us        1.471us          300
--------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Self CPU time total: 528.437ms
CUDA time total: 1.585s

Recorded timeit time:  789.0814 ms

```
Note that the recorded timeit time (with proper CUDA syncs) is half the "CUDA time total" reported by the profiler.

After:
```
--------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
--------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
        aten::matmul         0.15%     802.716us        99.06%     523.548ms       5.235ms     302.451us         0.04%     791.151ms       7.912ms           100
            aten::mm        98.20%     519.007ms        98.91%     522.745ms       5.227ms     790.225ms        99.63%     790.848ms       7.908ms           100
             aten::t         0.27%       1.406ms         0.49%       2.578ms      25.783us     604.964us         0.08%       1.066ms      10.662us           100
          aten::view         0.45%       2.371ms         0.45%       2.371ms      11.856us     926.281us         0.12%     926.281us       4.631us           200
     aten::transpose         0.15%     783.462us         0.22%       1.173ms      11.727us     310.016us         0.04%     461.282us       4.613us           100
         aten::empty         0.11%     591.603us         0.11%     591.603us       5.916us     176.566us         0.02%     176.566us       1.766us           100
    aten::as_strided         0.07%     389.270us         0.07%     389.270us       3.893us     151.266us         0.02%     151.266us       1.513us           100
        aten::stride         0.60%       3.147ms         0.60%       3.147ms      10.489us     446.451us         0.06%     446.451us       1.488us           300
--------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 528.498ms
CUDA time total: 793.143ms

Recorded timeit time:  788.9832 ms

```
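
A minimal sketch of surfacing the new column; the `sort_by` key name is an assumption based on the new header, not taken from this log:

```
import torch
from torch.autograd import profiler

a = torch.randn(256, 256, device="cuda")
with profiler.profile(use_cuda=True) as prof:
    torch.matmul(a, a)
# sort key name assumed from the "Self CUDA" column above
print(prof.key_averages().table(sort_by="self_cuda_time_total"))
```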

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45209

Reviewed By: zou3519

Differential Revision: D23925491

Pulled By: ngimel

fbshipit-source-id: 7f9c49238d116bfd2db9db3e8943355c953a77d0
2020-09-28 21:51:13 -07:00
Shen Li
5be954b502 Fix WorkerInfo link format (#45476)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45476

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23982069

Pulled By: mrshenli

fbshipit-source-id: 6d932e77c1941dfd96592b388353f0fc8968dde6
2020-09-28 20:48:15 -07:00
Shen Li
8e47fcba5f Update docs for RPC async_execution (#45458)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45458

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D23973366

Pulled By: mrshenli

fbshipit-source-id: 3697f07fa972db21746aa25eaf461c1b93293f58
2020-09-28 20:48:12 -07:00
Shen Li
c5ade5f698 Fix no_sync docs (#45455)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45455

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D23973365

Pulled By: mrshenli

fbshipit-source-id: 87c9878cdc7310754670b83efa65ae6f877f86fb
2020-09-28 20:48:09 -07:00
Shen Li
6967e6295e Fix DDP docs (#45454)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45454

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D23973367

Pulled By: mrshenli

fbshipit-source-id: 11f20d51d0d0f92f199e4023f02b86623867bae0
2020-09-28 20:43:22 -07:00
Alex Suhan
52cbc9e4ec [TensorExpr] Always inline and DCE in the LLVM backend (#45445)
Summary:
Inline `pytorch` into the wrapper, which is especially helpful in combination
with dead code elimination to reduce IR size and compilation times when
many parameters are unused.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45445

Test Plan: CI

Reviewed By: ZolotukhinM

Differential Revision: D23969009

Pulled By: asuhan

fbshipit-source-id: a21509d07e4c130b6aa6eae5236bb64db2748a3d
2020-09-28 18:11:13 -07:00
Meghan Lele
7ac872b934 [JIT] Modify to_backend API so that it accepts wrapped modules (#43612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43612

**Summary**
This commit modifies the `torch._C._jit_to_backend` function so that it
accepts `ScriptModules` as inputs. It already returns `ScriptModules`
(as opposed to C++ modules), so this makes sense and makes the API more
intuitive.
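
A hedged sketch of the modified API; the backend name and compile-spec shape are placeholders, not taken from this commit:

```
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

scripted = torch.jit.script(M())  # a ScriptModule is now accepted directly
# "my_backend" is hypothetical; a backend with that name must be registered
lowered = torch._C._jit_to_backend("my_backend", scripted, {"forward": {}})
```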

**Test Plan**
Continuous integration, which includes unit tests and out-of-tree tests
for custom backends.

**Fixes**
This commit fixes #41432.

Test Plan: Imported from OSS

Reviewed By: suo, jamesr66a

Differential Revision: D23339854

Pulled By: SplitInfinity

fbshipit-source-id: 08ecef729c4e1e6bddf3f483276947fc3559ea88
2020-09-28 17:17:01 -07:00
Rong Rong
5855aa8dac Type check quasirandom (#45434)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42978.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45434

Reviewed By: walterddr

Differential Revision: D23967139

Pulled By: ajitmaths

fbshipit-source-id: bcee6627f367fd01aa9a5c10a7c24331fc1823ad
2020-09-28 16:49:38 -07:00
Rong Rong
49b198c454 type check for torch.testing._internal.common_utils (#45375)
Summary:
Part of the torch.testing._internal.* effort.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45375

Reviewed By: malfet

Differential Revision: D23964315

Pulled By: walterddr

fbshipit-source-id: efdd643297f5c7f75670ffe60ff7e82fc413d18d
2020-09-28 16:28:46 -07:00
Heitor Schueroff de Souza
96f8755034 Fixed handling of nan for evenly_distribute_backward (#45280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45280

Performance is the same on CPU, and on CUDA it is only 1-1.05x slower. This change is necessary for future nan ops, including nan(min|max|median).

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D23908796

Pulled By: heitorschueroff

fbshipit-source-id: c2b57acbe924cfa59fbd85216811f29f4af05088
2020-09-28 15:57:02 -07:00
Jan Schlüter
6a206df891 20000x faster audio conversion for SummaryWriter (#44201)
Summary:
Stumbled upon a little gem in the audio conversion for `SummaryWriter.add_audio()`: two Python `for` loops to convert a float array to little-endian int16 samples. On my machine, this took 35 seconds for a 30-second 22.05 kHz excerpt. The same can be done directly in numpy in 1.65 milliseconds. (No offense, I'm glad that the functionality was there!)

Would also be ready to extend this to support stereo waveforms, or should this become a separate PR?
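
A vectorized conversion along these lines (a sketch, not the exact code from the PR):

```
import numpy as np

def float_to_int16_le(samples):
    # clip to [-1, 1] and scale to little-endian int16 in one vectorized step
    return (np.clip(np.asarray(samples), -1.0, 1.0) * 32767.0).astype("<i2")
```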

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44201

Reviewed By: J0Nreynolds

Differential Revision: D23831002

Pulled By: edward-io

fbshipit-source-id: 5c8f1ac7823d1ed41b53c4f97ab9a7bac33ea94b
2020-09-28 15:44:29 -07:00
Zachary DeVito
e54e1fe51e [package] Add dependency viz (#45214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45214

When in verbose mode, the package exporter will produce an HTML visualization
of a module's dependencies to make it easier to trim out unneeded code
or to debug the inclusion of things that cannot be exported.
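
A usage sketch, assuming the verbose flag lives on the exporter constructor (not confirmed by this log):

```
from torch.package import PackageExporter

# the verbose flag and its placement are assumptions based on the summary
with PackageExporter("out.pkg", verbose=True) as exporter:
    exporter.save_module("my_module")  # hypothetical module name
```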

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D23873525

Pulled By: zdevito

fbshipit-source-id: 6801991573d8dd5ab8c284e09572b36a35e1e5a4
2020-09-28 15:38:41 -07:00
Omkar Salpekar
6b65b3cbd8 [Distributed] DeleteKey API for c10d TCP Store (#45401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45401

Added a DeleteKey API for the TCP Store
ghstack-source-id: 112997162

Test Plan:
Modified the existing get/set test to use delete. Verified that the
correct keys were deleted and that the numKeys API returned the right values.
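
A minimal sketch, assuming the Python bindings are named delete_key and num_keys:

```
from datetime import timedelta
import torch.distributed as dist

store = dist.TCPStore("127.0.0.1", 29500, 1, True, timedelta(seconds=30))
store.set("k", "v")
store.delete_key("k")    # assumed Python binding for the new API
print(store.num_keys())  # numKeys also counts internal bookkeeping keys
```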

Reviewed By: mrshenli

Differential Revision: D23955730

fbshipit-source-id: 5c9f82be34ff4521c59f56f8d9c1abf775c67f9f
2020-09-28 15:30:39 -07:00
Gregory Chanan
1097fe0088 Remove CriterionTest.test_cuda code for dtype None. (#45316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45316

It's never used.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23919449

Pulled By: gchanan

fbshipit-source-id: f9aaeeabf3940389156bfc01bc3118d348ca4cf6
2020-09-28 15:08:09 -07:00
lcskrishna
a4486fe7ba [ROCm] Print name irrespective of seq number assignment for roctx traces (#45229)
Summary:
Recent changes to the seq_num correlation behavior in the profiler (PR https://github.com/pytorch/pytorch/issues/42565) changed the behavior of emit_nvtx(record_shapes=True), which no longer prints the name of the operator properly.

This PR dumps out the name in roctx traces irrespective of the assigned sequence number, for ROCm only.
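
For reference, the affected entry point (a minimal sketch):

```
import torch
from torch.autograd import profiler

x = torch.randn(8, 8, device="cuda")
with profiler.emit_nvtx(record_shapes=True):
    torch.mm(x, x)  # operator names should now appear in the roctx ranges
```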

cc: jeffdaily sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45229

Reviewed By: zou3519

Differential Revision: D23932902

Pulled By: albanD

fbshipit-source-id: c782667ff002b70b51f1cc921afd1b1ac533b39d
2020-09-28 15:03:47 -07:00
Taylor Robie
c6b7eeb654 Gh/taylorrobie/timer cleanup (#45361)
Summary:
This PR cleans up some of the rough edges around `Timer` and `Compare` (a usage sketch follows the list):
* Moves `Measurement` to be dataclass based
* Adds a bunch of type annotations. MyPy is now happy.
* Allows missing entries in `Compare`. This is one of the biggest usability issues with `Compare` right now, both from an API perspective and because the current failure mode is really unpleasant.
* Greatly expands the testing of `Compare`
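
A minimal usage sketch under the module path of the time (`torch.utils._benchmark`); treat the details as assumptions:

```
import torch
import torch.utils._benchmark as benchmark_utils  # private module at this point

timer = benchmark_utils.Timer(
    stmt="torch.mm(a, a)",
    globals={"a": torch.ones(64, 64), "torch": torch},
    label="mm",
)
m = timer.timeit(100)  # returns a Measurement, now dataclass-based
print(m)
```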

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45361

Test Plan: Changes to Timer are covered under existing tests, changes to `Compare` are covered by the expanded `test_compare` method.

Reviewed By: bwasti

Differential Revision: D23966816

Pulled By: robieta

fbshipit-source-id: 826969f73b42f72fa35f4de3c64d0988b61474cd
2020-09-28 14:56:43 -07:00
Negin Raoof
a77d633db1 [ONNX] Fix view for dynamic input shape (#43558)
Summary:
Export of the view op with a dynamic input shape is broken when using tensors with a 0-dim.
This fix removes the symbolic's use of static input sizes to address the issue.
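
A repro-style sketch of the pattern in question (the model and axis names are illustrative assumptions):

```
import torch

class Flatten(torch.nn.Module):
    def forward(self, x):
        return x.view(x.size(0), -1)

x = torch.randn(4, 3, 8)
torch.onnx.export(Flatten(), x, "flatten.onnx",
                  input_names=["x"], dynamic_axes={"x": {0: "batch"}})
```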

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43558

Reviewed By: ailzhang

Differential Revision: D23965090

Pulled By: bzinodev

fbshipit-source-id: 628e9d7ee5d53375f25052340ca6feabf7ba7c53
2020-09-28 14:46:51 -07:00
Gregory Chanan
5d1fee23b3 Remove convert_target from NN tests. (#45291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45291

It's not necessary; you can just check whether the dtype is integral.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23911963

Pulled By: gchanan

fbshipit-source-id: 230139e1651eb76226f4095e31068dded30e03e8
2020-09-28 14:21:42 -07:00
Rong Rong
986af53be2 type check for torch.testing._internal.codegen.* (#45368)
Summary:
Part of the `torch.testing._internal.*` effort.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45368

Reviewed By: malfet

Differential Revision: D23950512

Pulled By: walterddr

fbshipit-source-id: 399f712d12cdd9795b0136328f512c3f86a15f24
2020-09-28 14:04:52 -07:00
Yi Wang
7a4c417ed3 Fix typo (#45379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45379

Registeres -> Registers in reducer.h.
ghstack-source-id: 112982279

Test Plan: N/A

Reviewed By: mrshenli

Differential Revision: D23951203

fbshipit-source-id: 96c7dc2e1e12c132339b9ac83ce1da52c812740c
2020-09-28 14:02:01 -07:00
BowenBao
57c18127dc [ONNX] Update div export to perform true divide (#44831)
Summary:
Related: https://github.com/pytorch/pytorch/issues/43787

Now that PyTorch div actually performs true division, update the ONNX export code to stay consistent.
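
A quick illustration of the PyTorch-side behavior being matched:

```
import torch

# div performs true division even for integer inputs
print(torch.div(torch.tensor(3), torch.tensor(2)))  # tensor(1.5000)
```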

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44831

Reviewed By: eellison

Differential Revision: D23880316

Pulled By: bzinodev

fbshipit-source-id: 3bb8db34142ac4fed4039295ad3c4cb79487987f
2020-09-28 13:53:43 -07:00
gunandrose4u
47debdca42 Document change for DDP enabled on Windows platform (#45392)
Summary:
Documentation changes for DDP support on the Windows platform.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45392

Reviewed By: gchanan

Differential Revision: D23962344

Pulled By: mrshenli

fbshipit-source-id: 8924c6ca36d68699871d8add3e0aab6542ea269c
2020-09-28 13:22:42 -07:00
Iurii Zdebskyi
722faeb2a4 [RELAND] Added optimizers based on multi tensor apply (#45408)
Summary:
The original PR is https://github.com/pytorch/pytorch/pull/45299; the present PR fixes the minor bugs that caused the revert.

This PR adds a new namespace `torch.optim._multi_tensor` with a bunch of updated optimizers. Those optimizers use the _foreach APIs, which improve performance significantly.
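
For a sense of what the _foreach APIs do, a minimal illustrative sketch of a fused SGD-style step (not the optimizer code itself):

```
import torch

params = [torch.randn(3) for _ in range(4)]
grads = [torch.randn(3) for _ in range(4)]
# a single horizontally-fused call replaces a Python loop over tensors
torch._foreach_add_(params, grads, alpha=-0.01)
```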

### Tests
- updated existing tests to use both optimizers
- added `test_multi_tensor_optimizers` test to verify correctness.

### Perf results

**Adam**
timeit: 42.69 ms --> 10.16 ms
autorange: 41.96 ms --> 10.28 ms

**AdamW**
timeit: 51.38 ms --> 15.63 ms
autorange: 50.82 ms --> 16.07 ms

**SGD**
timeit: 6.28 ms --> 4.40 ms
autorange: 6.13 ms --> 4.73 ms

**RMSprop**
timeit: 28.63 ms --> 5.89 ms
autorange: 28.27 ms -->  5.76 ms

**Rprop**
timeit: 213.30 ms --> 178.42 ms
autorange: 212.03 ms --> 178.03 ms

**ASGD**
timeit: 21.67 ms --> 9.33 ms
autorange: 21.64 ms --> 9.27 ms

**Adamax**
timeit: 55.60 ms --> 48.29 ms
autorange: 55.22 ms --> 49.13 ms

**Perf script used**

```
import torch
import time
import torch.optim as optim
from torch.autograd import Variable
from torch.optim.lr_scheduler import ExponentialLR, ReduceLROnPlateau, StepLR
import torch.nn as nn
import torchvision
import torch.utils._benchmark as benchmark_utils

device = "cuda"
model = torchvision.models.resnet.resnet101(pretrained=True).to(device)
targets = torch.randint(0, 1000, (100, 100), device=device)
criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(model.parameters(), lr=1e-3) # <----------------------- optimizer.
                                                          # would compare optim.SGD vs optim._multi_tensor.SGD
running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device=device).random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device=device , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )

    for i in range(1):
        print(f"Run: {i}\n{'-' * 40}")
        print(f"timeit:\n{timer.timeit(1000)}\n")
        print(f"autorange:\n{timer.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45408

Reviewed By: gchanan

Differential Revision: D23956680

Pulled By: izdeby

fbshipit-source-id: c5eab7bf5fce14a287c15cead1cdc26e42cfed94
2020-09-28 13:14:04 -07:00
Bram Wasti
87b356d093 [static runtime] Split out graph preparation from runtime (#44131)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44131

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D23604305

Pulled By: bwasti

fbshipit-source-id: 7b47da4961d99074199417ef1407a788c7d80ee6
2020-09-28 13:01:23 -07:00
Nikolay Korovaiko
993628c74a Build shape expressions and remove outputs that are only used by aten::sizes (#45080)
Summary:
Currently, TE materializes all intermediate results even if they are only used for computing shapes. This diff ports the approach the OF (Old Fuser) took to deal with this issue: given the structure of a fusion group, we infer all the sizes outside the fusion group based on the fusion group's inputs.

A simple example would be:

```
        def test_fuse(a, b):
            c = a + b
            d = c + b
            return d
```

Here we don't need to cache `c` as computing a gradient for `b` in `d = c + b` doesn't need it. We do need to compute sizes for all arguments here in case broadcasts happen.
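
The broadcast-size computation that `prim::BroadcastSizes` performs can be sketched in Python (NumPy-style, right-aligned broadcasting):

```
def broadcast_sizes(a, b):
    # pad the shorter size list with leading 1s, then take the max per dim
    n = max(len(a), len(b))
    a = [1] * (n - len(a)) + list(a)
    b = [1] * (n - len(b)) + list(b)
    assert all(x == y or x == 1 or y == 1 for x, y in zip(a, b))
    return [max(x, y) for x, y in zip(a, b)]

assert broadcast_sizes([2, 1, 4], [3, 1]) == [2, 3, 4]
```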

Without this optimization, TE would need to materialize `c` so we can get its size

```
[DUMP profiling_graph_executor_impl.cpp:499] Optimized Graph:
[DUMP profiling_graph_executor_impl.cpp:499] graph(%a.1 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %b.1 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %11 : Tensor = prim::DifferentiableGraph_0(%b.1, %a.1)
[DUMP profiling_graph_executor_impl.cpp:499]   return (%11)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::DifferentiableGraph_0 = graph(%11 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %13 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %59 : int[] = aten::size(%13) # <string>:3:44
[DUMP profiling_graph_executor_impl.cpp:499]   %62 : int[] = aten::size(%11) # <string>:3:93
[DUMP profiling_graph_executor_impl.cpp:499]   %83 : Double(1:1, requires_grad=0, device=cuda:0), %84 : Double(1:1, requires_grad=0, device=cuda:0), %85 : bool = prim::TypeCheck(%11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]   %86 : Tensor, %87 : Tensor = prim::If(%85)
[DUMP profiling_graph_executor_impl.cpp:499]     block0():
[DUMP profiling_graph_executor_impl.cpp:499]       %d.4 : Double(1:1, requires_grad=0, device=cuda:0), %c.4 : Double(1:1, requires_grad=0, device=cuda:0) = prim::TensorExprGroup_0(%83, %84)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%d.4, %c.4)
[DUMP profiling_graph_executor_impl.cpp:499]     block1():
[DUMP profiling_graph_executor_impl.cpp:499]       %94 : Function = prim::Constant[name="fallback_function", fallback=1]()
[DUMP profiling_graph_executor_impl.cpp:499]       %95 : (Tensor, Tensor) = prim::CallFunction(%94, %11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]       %96 : Tensor, %97 : Tensor = prim::TupleUnpack(%95)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%96, %97)
[DUMP profiling_graph_executor_impl.cpp:499]   %60 : int[] = aten::size(%87) # <string>:3:55
[DUMP profiling_graph_executor_impl.cpp:499]   %61 : int[]? = aten::_size_if_not_equal(%59, %60) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %64 : int[]? = aten::_size_if_not_equal(%62, %60) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   %67 : int[] = aten::size(%86) # <string>:3:55
[DUMP profiling_graph_executor_impl.cpp:499]   %68 : int[]? = aten::_size_if_not_equal(%60, %67) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %71 : int[]? = aten::_size_if_not_equal(%62, %67) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   return (%86, %61, %64, %68, %71)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::TensorExprGroup_0 = graph(%1 : Double(1:1, requires_grad=0, device=cuda:0),
[DUMP profiling_graph_executor_impl.cpp:499]       %4 : Double(1:1, requires_grad=0, device=cuda:0)):
[DUMP profiling_graph_executor_impl.cpp:499]   %5 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %c.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%4, %1, %5) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2872:16
[DUMP profiling_graph_executor_impl.cpp:499]   %2 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %d.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%c.3, %1, %2) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2873:16
[DUMP profiling_graph_executor_impl.cpp:499]   return (%d.3, %c.3)
```

With this optimization we use `prim::BroadcastSizes` to compute the size of `c`. No need to materialize it.

```
[DUMP profiling_graph_executor_impl.cpp:499] Optimized Graph:
[DUMP profiling_graph_executor_impl.cpp:499] graph(%a.1 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %b.1 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %11 : Tensor = prim::DifferentiableGraph_0(%b.1, %a.1)
[DUMP profiling_graph_executor_impl.cpp:499]   return (%11)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::DifferentiableGraph_0 = graph(%11 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %13 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %59 : int[] = aten::size(%13) # <string>:3:44
[DUMP profiling_graph_executor_impl.cpp:499]   %62 : int[] = aten::size(%11) # <string>:3:93
[DUMP profiling_graph_executor_impl.cpp:499]   %88 : Double(1:1, requires_grad=0, device=cuda:0), %89 : Double(1:1, requires_grad=0, device=cuda:0), %90 : bool = prim::TypeCheck(%11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]   %91 : Tensor = prim::If(%90)
[DUMP profiling_graph_executor_impl.cpp:499]     block0():
[DUMP profiling_graph_executor_impl.cpp:499]       %d.4 : Double(1:1, requires_grad=0, device=cuda:0) = prim::TensorExprGroup_0(%88, %89)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%d.4)
[DUMP profiling_graph_executor_impl.cpp:499]     block1():
[DUMP profiling_graph_executor_impl.cpp:499]       %97 : Function = prim::Constant[name="fallback_function", fallback=1]()
[DUMP profiling_graph_executor_impl.cpp:499]       %98 : (Tensor) = prim::CallFunction(%97, %11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]       %99 : Tensor = prim::TupleUnpack(%98)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%99)
[DUMP profiling_graph_executor_impl.cpp:499]   %85 : int[] = aten::size(%91)
[DUMP profiling_graph_executor_impl.cpp:499]   %86 : int[] = prim::BroadcastSizes(%59, %62)
[DUMP profiling_graph_executor_impl.cpp:499]   %61 : int[]? = aten::_size_if_not_equal(%59, %86) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %64 : int[]? = aten::_size_if_not_equal(%62, %86) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   %68 : int[]? = aten::_size_if_not_equal(%86, %85) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %71 : int[]? = aten::_size_if_not_equal(%62, %85) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   return (%91, %61, %64, %68, %71)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::TensorExprGroup_0 = graph(%1 : Double(1:1, requires_grad=0, device=cuda:0),
[DUMP profiling_graph_executor_impl.cpp:499]       %4 : Double(1:1, requires_grad=0, device=cuda:0)):
[DUMP profiling_graph_executor_impl.cpp:499]   %5 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %c.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%4, %1, %5) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2872:16
[DUMP profiling_graph_executor_impl.cpp:499]   %2 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %d.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%c.3, %1, %2) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2873:16
[DUMP profiling_graph_executor_impl.cpp:499]   return (%d.3)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45080

Reviewed By: bertmaher

Differential Revision: D23856410

Pulled By: Krovatkin

fbshipit-source-id: 2956286eb03a4894a5baa151c35e6092466322b1
2020-09-28 10:45:56 -07:00
Rong Rong
48d29c830d [hotfix] disable problematic cuda tests on rocm builds (#45435)
Summary:
Disable the three recent CUDA tests on AMD ROCm builds/tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45435

Reviewed By: malfet

Differential Revision: D23962881

Pulled By: walterddr

fbshipit-source-id: ad4ea1f835b4722cdbdce685806cfd64376cc16f
2020-09-28 10:02:12 -07:00
Nikita Vedeneev
e4950a093a Backward support for generalized eigenvalue solver with LOBPCG in forward [only k-rank SYMEIG case] (#43002)
Summary:
As per title. Fixes [#38948](https://github.com/pytorch/pytorch/issues/38948). Therein you can find some blueprints for the algorithm being used in this PR.
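
A minimal sketch of what now works (symmetric case; the construction details are illustrative):

```
import torch

A = torch.randn(10, 10)
A = (A @ A.T + 10 * torch.eye(10)).requires_grad_()  # symmetric positive definite
vals, vecs = torch.lobpcg(A, k=2)  # k largest eigenpairs
vals.sum().backward()              # differentiating through lobpcg now works
print(A.grad.shape)
```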

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43002

Reviewed By: zou3519

Differential Revision: D23931326

Pulled By: albanD

fbshipit-source-id: e6994af70d94145f974ef87aa5cea166d6deff1e
2020-09-28 07:22:35 -07:00
Mike Ruberry
6417a70465 Updates linalg warning + docs (#45415)
Summary:
Changes the deprecation of norm to a docs deprecation, since PyTorch components still rely on norm and some behavior, like automatically flattening tensors, may need to be ported to torch.linalg.norm. The documentation is also updated to clarify that torch.norm and torch.linalg.norm are distinct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45415

Reviewed By: ngimel

Differential Revision: D23958252

Pulled By: mruberry

fbshipit-source-id: fd54e807c59a2655453a6bcd9f4073cb2c12e8ac
2020-09-28 05:28:42 -07:00
generatedunixname89002005325676
7818a214c5 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D23959094

fbshipit-source-id: 6caa046d263114bff38a38d756099aac357e4f04
2020-09-28 05:08:46 -07:00
Negin Raoof
95a97e51b5 [ONNX] Improve scripting inplace indexing ops (#44351)
Summary:
Fix a couple of issues with scripting inplace indexing in the prepare_inplace_ops_for_onnx pass.
1- Tracing index copy (such as cases like x[1:3] = data) already applies broadcasting on the rhs if needed. The broadcasting node (aten::expand) is missing in scripting cases; see the sketch below.

2- Inplace indexing with ellipsis (aten::copy_) is replaced with aten::index_put and then handled with slice+select in this pass.
Support for negative indices for this op is added.
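
A sketch of case 1, where the rhs must be expanded to the slice:

```
import torch

x = torch.zeros(5, 4)
data = torch.arange(4.0)  # shape (4,)
x[1:3] = data             # rhs is broadcast (aten::expand) to the (2, 4) slice
```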

Shape inference is also enabled for scripting tests using the new JIT API.
A few more tests are enabled for scripting.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44351

Reviewed By: ezyang

Differential Revision: D23880267

Pulled By: bzinodev

fbshipit-source-id: 78b33444633eb7ae0fbabc7415e3b16001f5207f
2020-09-28 00:32:36 -07:00
Zino Benaissa
13f76f2be4 Fix preserve submodule attribute in freezing (#45143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45143

This PR prevents freezing cleaning up a submodule when user requests to
preserve a submodule.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23844969

Pulled By: bzinodev

fbshipit-source-id: 80e6db3fc12460d62e634ea0336ae2a3551c2151
2020-09-28 00:05:38 -07:00
liqunfu
c3bf402cbb handle onnx nll with default ignore index (#44816)
Summary:
In the ONNX NegativeLogLikelihoodLoss specification, ignore_index is optional and has no default value.
Therefore, when converting the nll op to ONNX, we need to set the ignore_index attribute even if it is not specified (e.g. ignore_index=-100).
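
For reference, PyTorch's own default (a minimal sketch):

```
import torch
import torch.nn.functional as F

log_probs = torch.log_softmax(torch.randn(3, 5), dim=1)
target = torch.tensor([1, -100, 4])
loss = F.nll_loss(log_probs, target)  # rows with target == -100 are ignored
```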

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44816

Reviewed By: ezyang

Differential Revision: D23880354

Pulled By: bzinodev

fbshipit-source-id: d0bdd58d0a4507ed9ce37133e68533fe6d1bdf2b
2020-09-27 23:26:19 -07:00
shubhambhokare1
5b839bca78 [ONNX] Optimize export_onnx api to reduce string and model proto exchange (#44332)
Summary:
Optimize the export_onnx API to reduce string and model proto exchange in export.cpp.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44332

Reviewed By: bwasti, eellison

Differential Revision: D23880129

Pulled By: bzinodev

fbshipit-source-id: 1d216d8f710f356cbba2334fb21ea15a89dd16fa
2020-09-27 16:29:08 -07:00
neginraoof
4005afe94b [ONNX] Update narrow for dynamic inputs (#44039)
Summary:
Update narrow for dynamic inputs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44039

Reviewed By: mruberry

Differential Revision: D23742215

Pulled By: bzinodev

fbshipit-source-id: 0d58d2fe996f91a124af988a9a21ee433e842d07
2020-09-27 15:52:57 -07:00