Commit Graph

11958 Commits

Mike Ruberry
95bdfbcca3
update 2020-09-29 06:28:11 -07:00
Mike Ruberry
f7f626adbc
Fixes 2020-09-29 06:23:34 -07:00
Mike Ruberry
baf5147870 update docs 2020-09-29 06:10:47 -07:00
Antonio Cuni
37f9af7f29 Missing tests for torch.xxx(out=...) (#44465)
Summary:
PR opened just to run the CI tests
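
For context, the `out=` variants under test write the result into a preallocated tensor; a minimal illustration (not taken from the PR):

```
import torch

a, b = torch.randn(3), torch.randn(3)
out = torch.empty(3)
torch.add(a, b, out=out)  # result is written into the preallocated tensor
```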

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44465

Reviewed By: ngimel

Differential Revision: D23907565

Pulled By: mruberry

fbshipit-source-id: 620661667877f1e9a2bab17d19988e2dc986fc0f
2020-09-29 04:54:46 -07:00
Mike Ruberry
56af122659 Revert D23966878: [pytorch][PR] This PR flips a switch to enable PE + TE
Test Plan: revert-hammer

Differential Revision:
D23966878 (dddb685c11)

Original commit changeset: 2010a0b07c59

fbshipit-source-id: 132556039730fd3e4babd0d7ca8daf9c8d14f728
2020-09-29 04:33:19 -07:00
Basil Hosmer
1ed1a2f5b0 [wip] fast typeMeta/ScalarType conversion approach 2 (#44965)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44965

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23789657

Pulled By: bhosmer

fbshipit-source-id: 5afdd52d24bd097891ff4a7313033f7bd400165e
2020-09-29 02:39:36 -07:00
Supriya Rao
489af4ddcb [quant] Add quant APIs to save/load observer state_dict (#44846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44846

The save function traverses the model state dict to pick out the observer stats, and the
load function traverses the module hierarchy to load that state dict into module attributes, depending on the observer type.
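
A minimal usage sketch; the helper names `get_observer_state_dict`/`load_observer_state_dict` are inferred from the test name below and should be treated as assumptions:

```
import torch
import torch.nn as nn
import torch.quantization as tq

m = nn.Sequential(nn.Linear(4, 4)).eval()
m.qconfig = tq.get_default_qconfig("fbgemm")
prepared = tq.prepare(m)
prepared(torch.randn(2, 4))  # calibration populates the observers

obs = tq.get_observer_state_dict(prepared)  # assumed helper name
torch.save(obs, "observers.pt")
tq.load_observer_state_dict(prepared, torch.load("observers.pt"))  # assumed helper name
```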

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_save_observer_state_dict

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23746821

fbshipit-source-id: 05c571b62949a2833602d736a81924d77e7ade55
2020-09-29 01:52:42 -07:00
Zafar
bb478810e0 [quant] torch.max_pool1d (#45152)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45152

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23846473

Pulled By: z-a-f

fbshipit-source-id: 38fd611e568e4f8b39b7a00adeb42c7b99576360
2020-09-29 01:45:22 -07:00
Mikhail Zolotukhin
b86008ab75 [TensorExpr] Remove buf_ field from class Tensor. (#45390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45390

Tensor objects should always refer to their Function's bufs. Currently
we never create a Tensor with a buffer different from that of its function,
but having it in two places seems incorrect and dangerous.

Differential Revision: D23952865

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: e63fc26d7078427514649d9ce973b74ea635a94a
2020-09-29 01:21:57 -07:00
Mikhail Zolotukhin
3c33695a6d [TensorExpr] Rename Buffer to Placeholder. (#45389)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45389

Differential Revision: D23952866

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: 17eedd3ac17897501403482ac1866c569d247c75
2020-09-29 01:21:54 -07:00
Mikhail Zolotukhin
92306b85d5 [TensorExpr] Consolidate {buffer,function,tensor}.{h.cpp} in tensor.{h,cpp}. (#45388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45388

Classes defined in these files are closely related, so it is reasonable
to have them all in one file. The change is purely a code move.

Differential Revision: D23952867

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: 12cfaa968bdfc4dff00509e34310a497c7b59155
2020-09-29 01:17:10 -07:00
Iurii Zdebskyi
8c309fc052 Add more tests for mt optimizers (#45475)
Summary:
Add more test cases for mt optimizers and fix Adam/AdamW

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45475

Reviewed By: soumith

Differential Revision: D23982727

Pulled By: izdeby

fbshipit-source-id: 4b24d37bd52a2fa3719d3e3a5dcf3b96990b0f5b
2020-09-28 23:59:58 -07:00
James Reed
6bdb871d47 [FX] Lint pass for Graphs (#44973)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44973

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D23792631

Pulled By: jamesr66a

fbshipit-source-id: d8faef0c311d8bd611ba0a7e1e2f353e3e5a1068
2020-09-28 23:00:32 -07:00
James Reed
b0bdc82a00 [FX][EZ] Fix bug where copying node made non-unique name (#45311)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45311

Test Plan: Imported from OSS

Reviewed By: dzhulgakov

Differential Revision: D23917864

Pulled By: jamesr66a

fbshipit-source-id: 10d0a4017ffe160bce4ba0d830e035616bbded74
2020-09-28 22:55:20 -07:00
lixinyu
417e3f85e5 Support tuple inputs in NN Module test (#44853)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44853

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23750441

Pulled By: glaringlee

fbshipit-source-id: 1b111a370a726b40521134b711c35f48dda99411
2020-09-28 22:05:05 -07:00
Nikolay Korovaiko
dddb685c11 This PR flips a switch to enable PE + TE (#45396)
Summary:
This PR flips a switch to enable PE + TE
next PR: https://github.com/pytorch/pytorch/pull/45397

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45396

Reviewed By: suo

Differential Revision: D23966878

Pulled By: Krovatkin

fbshipit-source-id: 2010a0b07c595992a88b3fe0792d6af315cf421e
2020-09-28 21:57:50 -07:00
Natalia Gimelshein
50b91103a9 add self cuda time to avoid double/quadruple counting (#45209)
Summary:
In the profiler, CUDA did not report self time, so for composite functions there was no way to determine which function was actually taking the time. In addition, the reported "total CUDA time" was frequently greater than the total wallclock time. This PR adds "self CUDA time" to the profiler and computes total CUDA time based on self CUDA time, similar to how it is done for CPU. There are also slight formatting changes to make the table more compact. Before:
```
--------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Name                  Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls
--------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
aten::matmul          0.17%            890.805us        99.05%           523.401ms        5.234ms          49.91%           791.184ms        7.912ms          100
aten::mm              98.09%           518.336ms        98.88%           522.511ms        5.225ms          49.89%           790.885ms        7.909ms          100
aten::t               0.29%            1.530ms          0.49%            2.588ms          25.882us         0.07%            1.058ms          10.576us         100
aten::view            0.46%            2.448ms          0.46%            2.448ms          12.238us         0.06%            918.936us        4.595us          200
aten::transpose       0.13%            707.204us        0.20%            1.058ms          10.581us         0.03%            457.802us        4.578us          100
aten::empty           0.14%            716.056us        0.14%            716.056us        7.161us          0.01%            185.694us        1.857us          100
aten::as_strided      0.07%            350.935us        0.07%            350.935us        3.509us          0.01%            156.380us        1.564us          100
aten::stride          0.65%            3.458ms          0.65%            3.458ms          11.527us         0.03%            441.258us        1.471us          300
--------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Self CPU time total: 528.437ms
CUDA time total: 1.585s

Recorded timeit time:  789.0814 ms

```
Note that the recorded timeit time (with proper CUDA syncs) is half the "CUDA time total" reported by the profiler.

After:
```
--------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
--------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
        aten::matmul         0.15%     802.716us        99.06%     523.548ms       5.235ms     302.451us         0.04%     791.151ms       7.912ms           100
            aten::mm        98.20%     519.007ms        98.91%     522.745ms       5.227ms     790.225ms        99.63%     790.848ms       7.908ms           100
             aten::t         0.27%       1.406ms         0.49%       2.578ms      25.783us     604.964us         0.08%       1.066ms      10.662us           100
          aten::view         0.45%       2.371ms         0.45%       2.371ms      11.856us     926.281us         0.12%     926.281us       4.631us           200
     aten::transpose         0.15%     783.462us         0.22%       1.173ms      11.727us     310.016us         0.04%     461.282us       4.613us           100
         aten::empty         0.11%     591.603us         0.11%     591.603us       5.916us     176.566us         0.02%     176.566us       1.766us           100
    aten::as_strided         0.07%     389.270us         0.07%     389.270us       3.893us     151.266us         0.02%     151.266us       1.513us           100
        aten::stride         0.60%       3.147ms         0.60%       3.147ms      10.489us     446.451us         0.06%     446.451us       1.488us           300
--------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 528.498ms
CUDA time total: 793.143ms

Recorded timeit time:  788.9832 ms

```
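
A minimal sketch of surfacing the new column; the `sort_by` key name is an assumption based on the new header, not taken from this log:

```
import torch
from torch.autograd import profiler

a = torch.randn(256, 256, device="cuda")
with profiler.profile(use_cuda=True) as prof:
    torch.matmul(a, a)
# sort key name assumed from the "Self CUDA" column above
print(prof.key_averages().table(sort_by="self_cuda_time_total"))
```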

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45209

Reviewed By: zou3519

Differential Revision: D23925491

Pulled By: ngimel

fbshipit-source-id: 7f9c49238d116bfd2db9db3e8943355c953a77d0
2020-09-28 21:51:13 -07:00
Shen Li
5be954b502 Fix WorkerInfo link format (#45476)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45476

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D23982069

Pulled By: mrshenli

fbshipit-source-id: 6d932e77c1941dfd96592b388353f0fc8968dde6
2020-09-28 20:48:15 -07:00
Shen Li
8e47fcba5f Update docs for RPC async_execution (#45458)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45458

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D23973366

Pulled By: mrshenli

fbshipit-source-id: 3697f07fa972db21746aa25eaf461c1b93293f58
2020-09-28 20:48:12 -07:00
Shen Li
c5ade5f698 Fix no_sync docs (#45455)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45455

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D23973365

Pulled By: mrshenli

fbshipit-source-id: 87c9878cdc7310754670b83efa65ae6f877f86fb
2020-09-28 20:48:09 -07:00
Shen Li
6967e6295e Fix DDP docs (#45454)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45454

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D23973367

Pulled By: mrshenli

fbshipit-source-id: 11f20d51d0d0f92f199e4023f02b86623867bae0
2020-09-28 20:43:22 -07:00
Alex Suhan
52cbc9e4ec [TensorExpr] Always inline and DCE in the LLVM backend (#45445)
Summary:
Inline `pytorch` into the wrapper, which is especially helpful in combination
with dead code elimination to reduce IR size and compilation times when
many parameters are unused.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45445

Test Plan: CI

Reviewed By: ZolotukhinM

Differential Revision: D23969009

Pulled By: asuhan

fbshipit-source-id: a21509d07e4c130b6aa6eae5236bb64db2748a3d
2020-09-28 18:11:13 -07:00
Meghan Lele
7ac872b934 [JIT] Modify to_backend API so that it accepts wrapped modules (#43612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43612

**Summary**
This commit modifies the `torch._C._jit_to_backend` function so that it
accepts `ScriptModules` as inputs. It already returns `ScriptModules`
(as opposed to C++ modules), so this makes sense and makes the API more
intuitive.
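
A hedged sketch of the modified API; the backend name and compile-spec shape are placeholders, not taken from this commit:

```
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

scripted = torch.jit.script(M())  # a ScriptModule is now accepted directly
# "my_backend" is hypothetical; a backend with that name must be registered
lowered = torch._C._jit_to_backend("my_backend", scripted, {"forward": {}})
```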

**Test Plan**
Continuous integration, which includes unit tests and out-of-tree tests
for custom backends.

**Fixes**
This commit fixes #41432.

Test Plan: Imported from OSS

Reviewed By: suo, jamesr66a

Differential Revision: D23339854

Pulled By: SplitInfinity

fbshipit-source-id: 08ecef729c4e1e6bddf3f483276947fc3559ea88
2020-09-28 17:17:01 -07:00
Rong Rong
5855aa8dac Type check quasirandom (#45434)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42978.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45434

Reviewed By: walterddr

Differential Revision: D23967139

Pulled By: ajitmaths

fbshipit-source-id: bcee6627f367fd01aa9a5c10a7c24331fc1823ad
2020-09-28 16:49:38 -07:00
Rong Rong
49b198c454 type check for torch.testing._internal.common_utils (#45375)
Summary:
Part of the torch.testing._internal.* effort.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45375

Reviewed By: malfet

Differential Revision: D23964315

Pulled By: walterddr

fbshipit-source-id: efdd643297f5c7f75670ffe60ff7e82fc413d18d
2020-09-28 16:28:46 -07:00
Heitor Schueroff de Souza
96f8755034 Fixed handling of nan for evenly_distribute_backward (#45280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45280

Performance is the same on CPU, and on CUDA it is only 1-1.05x slower. This change is necessary for future nan ops, including nan(min|max|median).

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D23908796

Pulled By: heitorschueroff

fbshipit-source-id: c2b57acbe924cfa59fbd85216811f29f4af05088
2020-09-28 15:57:02 -07:00
Jan Schlüter
6a206df891 20000x faster audio conversion for SummaryWriter (#44201)
Summary:
Stumbled upon a little gem in the audio conversion for `SummaryWriter.add_audio()`: two Python `for` loops to convert a float array to little-endian int16 samples. On my machine, this took 35 seconds for a 30-second 22.05 kHz excerpt. The same can be done directly in numpy in 1.65 milliseconds. (No offense, I'm glad that the functionality was there!)

Would also be ready to extend this to support stereo waveforms, or should this become a separate PR?
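
A vectorized conversion along these lines (a sketch, not the exact code from the PR):

```
import numpy as np

def float_to_int16_le(samples):
    # clip to [-1, 1] and scale to little-endian int16 in one vectorized step
    return (np.clip(np.asarray(samples), -1.0, 1.0) * 32767.0).astype("<i2")
```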

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44201

Reviewed By: J0Nreynolds

Differential Revision: D23831002

Pulled By: edward-io

fbshipit-source-id: 5c8f1ac7823d1ed41b53c4f97ab9a7bac33ea94b
2020-09-28 15:44:29 -07:00
Zachary DeVito
e54e1fe51e [package] Add dependency viz (#45214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45214

When in verbose mode, the package exporter will produce an HTML visualization
of a module's dependencies to make it easier to trim out unneeded code
or to debug the inclusion of things that cannot be exported.
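
A usage sketch, assuming the verbose flag lives on the exporter constructor (not confirmed by this log):

```
from torch.package import PackageExporter

# the verbose flag and its placement are assumptions based on the summary
with PackageExporter("out.pkg", verbose=True) as exporter:
    exporter.save_module("my_module")  # hypothetical module name
```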

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D23873525

Pulled By: zdevito

fbshipit-source-id: 6801991573d8dd5ab8c284e09572b36a35e1e5a4
2020-09-28 15:38:41 -07:00
Omkar Salpekar
6b65b3cbd8 [Distributed] DeleteKey API for c10d TCP Store (#45401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45401

Added a DeleteKey API for the TCP Store
ghstack-source-id: 112997162

Test Plan:
Modified the existing get/set test to use delete. Verified that the
correct keys were deleted and that the numKeys API returned the right values.
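
A minimal sketch, assuming the Python bindings are named delete_key and num_keys:

```
from datetime import timedelta
import torch.distributed as dist

store = dist.TCPStore("127.0.0.1", 29500, 1, True, timedelta(seconds=30))
store.set("k", "v")
store.delete_key("k")    # assumed Python binding for the new API
print(store.num_keys())  # numKeys also counts internal bookkeeping keys
```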

Reviewed By: mrshenli

Differential Revision: D23955730

fbshipit-source-id: 5c9f82be34ff4521c59f56f8d9c1abf775c67f9f
2020-09-28 15:30:39 -07:00
Gregory Chanan
1097fe0088 Remove CriterionTest.test_cuda code for dtype None. (#45316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45316

It's never used.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23919449

Pulled By: gchanan

fbshipit-source-id: f9aaeeabf3940389156bfc01bc3118d348ca4cf6
2020-09-28 15:08:09 -07:00
lcskrishna
a4486fe7ba [ROCm] Print name irrespective of seq number assignment for roctx traces (#45229)
Summary:
Recent changes to the seq_num correlation behavior in the profiler (PR https://github.com/pytorch/pytorch/issues/42565) changed the behavior of emit_nvtx(record_shapes=True), which no longer prints the name of the operator properly.

This PR dumps out the name in roctx traces irrespective of the assigned sequence number, for ROCm only.
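
For reference, the affected entry point (a minimal sketch):

```
import torch
from torch.autograd import profiler

x = torch.randn(8, 8, device="cuda")
with profiler.emit_nvtx(record_shapes=True):
    torch.mm(x, x)  # operator names should now appear in the roctx ranges
```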

cc: jeffdaily sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45229

Reviewed By: zou3519

Differential Revision: D23932902

Pulled By: albanD

fbshipit-source-id: c782667ff002b70b51f1cc921afd1b1ac533b39d
2020-09-28 15:03:47 -07:00
Taylor Robie
c6b7eeb654 Gh/taylorrobie/timer cleanup (#45361)
Summary:
This PR cleans up some of the rough edges around `Timer` and `Compare` (a usage sketch follows the list):
* Moves `Measurement` to be dataclass based
* Adds a bunch of type annotations. MyPy is now happy.
* Allows missing entries in `Compare`. This is one of the biggest usability issues with `Compare` right now, both from an API perspective and because the current failure mode is really unpleasant.
* Greatly expands the testing of `Compare`
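
A minimal usage sketch under the module path of the time (`torch.utils._benchmark`); treat the details as assumptions:

```
import torch
import torch.utils._benchmark as benchmark_utils  # private module at this point

timer = benchmark_utils.Timer(
    stmt="torch.mm(a, a)",
    globals={"a": torch.ones(64, 64), "torch": torch},
    label="mm",
)
m = timer.timeit(100)  # returns a Measurement, now dataclass-based
print(m)
```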

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45361

Test Plan: Changes to Timer are covered under existing tests, changes to `Compare` are covered by the expanded `test_compare` method.

Reviewed By: bwasti

Differential Revision: D23966816

Pulled By: robieta

fbshipit-source-id: 826969f73b42f72fa35f4de3c64d0988b61474cd
2020-09-28 14:56:43 -07:00
Negin Raoof
a77d633db1 [ONNX] Fix view for dynamic input shape (#43558)
Summary:
Export of the view op with a dynamic input shape is broken when using tensors with a 0-dim.
This fix removes the symbolic's use of static input sizes to address the issue.
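
A repro-style sketch of the pattern in question (the model and axis names are illustrative assumptions):

```
import torch

class Flatten(torch.nn.Module):
    def forward(self, x):
        return x.view(x.size(0), -1)

x = torch.randn(4, 3, 8)
torch.onnx.export(Flatten(), x, "flatten.onnx",
                  input_names=["x"], dynamic_axes={"x": {0: "batch"}})
```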

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43558

Reviewed By: ailzhang

Differential Revision: D23965090

Pulled By: bzinodev

fbshipit-source-id: 628e9d7ee5d53375f25052340ca6feabf7ba7c53
2020-09-28 14:46:51 -07:00
Gregory Chanan
5d1fee23b3 Remove convert_target from NN tests. (#45291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45291

It's not necessary; you can just check whether the dtype is integral.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23911963

Pulled By: gchanan

fbshipit-source-id: 230139e1651eb76226f4095e31068dded30e03e8
2020-09-28 14:21:42 -07:00
Rong Rong
986af53be2 type check for torch.testing._internal.codegen.* (#45368)
Summary:
Part of the `torch.testing._internal.*` effort.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45368

Reviewed By: malfet

Differential Revision: D23950512

Pulled By: walterddr

fbshipit-source-id: 399f712d12cdd9795b0136328f512c3f86a15f24
2020-09-28 14:04:52 -07:00
Yi Wang
7a4c417ed3 Fix typo (#45379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45379

Registeres -> Registers in reducer.h.
ghstack-source-id: 112982279

Test Plan: N/A

Reviewed By: mrshenli

Differential Revision: D23951203

fbshipit-source-id: 96c7dc2e1e12c132339b9ac83ce1da52c812740c
2020-09-28 14:02:01 -07:00
BowenBao
57c18127dc [ONNX] Update div export to perform true divide (#44831)
Summary:
Related: https://github.com/pytorch/pytorch/issues/43787

Now that PyTorch div actually performs true division, update the ONNX export code to stay consistent.
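
A quick illustration of the PyTorch-side behavior being matched:

```
import torch

# div performs true division even for integer inputs
print(torch.div(torch.tensor(3), torch.tensor(2)))  # tensor(1.5000)
```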

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44831

Reviewed By: eellison

Differential Revision: D23880316

Pulled By: bzinodev

fbshipit-source-id: 3bb8db34142ac4fed4039295ad3c4cb79487987f
2020-09-28 13:53:43 -07:00
gunandrose4u
47debdca42 Document change for DDP enabled on Windows platform (#45392)
Summary:
Documentation changes for DDP support on the Windows platform.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45392

Reviewed By: gchanan

Differential Revision: D23962344

Pulled By: mrshenli

fbshipit-source-id: 8924c6ca36d68699871d8add3e0aab6542ea269c
2020-09-28 13:22:42 -07:00
Iurii Zdebskyi
722faeb2a4 [RELAND] Added optimizers based on multi tensor apply (#45408)
Summary:
The original PR is https://github.com/pytorch/pytorch/pull/45299; the present PR fixes the minor bugs that caused the revert.

This PR adds a new namespace `torch.optim._multi_tensor` with a bunch of updated optimizers. Those optimizers use the _foreach APIs, which improve performance significantly.
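
For a sense of what the _foreach APIs do, a minimal illustrative sketch of a fused SGD-style step (not the optimizer code itself):

```
import torch

params = [torch.randn(3) for _ in range(4)]
grads = [torch.randn(3) for _ in range(4)]
# a single horizontally-fused call replaces a Python loop over tensors
torch._foreach_add_(params, grads, alpha=-0.01)
```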

### Tests
- updated existing tests to use both optimizers
- added `test_multi_tensor_optimizers` test to verify correctness.

### Perf results

**Adam**
timeit: 42.69 ms --> 10.16 ms
autorange: 41.96 ms --> 10.28 ms

**AdamW**
timeit: 51.38 ms --> 15.63 ms
autorange: 50.82 ms --> 16.07 ms

**SGD**
timeit: 6.28 ms --> 4.40 ms
autorange: 6.13 ms --> 4.73 ms

**RMSprop**
timeit: 28.63 ms --> 5.89 ms
autorange: 28.27 ms -->  5.76 ms

**Rprop**
timeit: 213.30 ms --> 178.42 ms
autorange: 212.03 ms --> 178.03 ms

**ASGD**
timeit: 21.67 ms --> 9.33 ms
autorange: 21.64 ms --> 9.27 ms

**Adamax**
timeit: 55.60 ms --> 48.29 ms
autorange: 55.22 ms --> 49.13 ms

**Perf script used**

```
import torch
import time
import torch.optim as optim
from torch.autograd import Variable
from torch.optim.lr_scheduler import ExponentialLR, ReduceLROnPlateau, StepLR
import torch.nn as nn
import torchvision
import torch.utils._benchmark as benchmark_utils

device = "cuda"
model = torchvision.models.resnet.resnet101(pretrained=True).to(device)
targets = torch.randint(0, 1000, (100, 100), device=device)
criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(model.parameters(), lr=1e-3) # <----------------------- optimizer.
                                                          # would compare optim.SGD vs optim._multi_tensor.SGD
running_loss = 0.0
target = torch.empty(128, dtype=torch.long, device=device).random_(5)

optimizer.zero_grad()
inputs = torch.rand(128, 3, 100, 100, device=device , requires_grad=True)
outputs = model(inputs)
loss = criterion(outputs, target)
loss.backward()
optimizer.step()
running_loss += loss.item()

def main():
    timer = benchmark_utils.Timer(
        stmt="optimizer.step()",
        globals=globals(),
        label="str(optimizer)",
    )

    for i in range(1):
        print(f"Run: {i}\n{'-' * 40}")
        print(f"timeit:\n{timer.timeit(1000)}\n")
        print(f"autorange:\n{timer.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45408

Reviewed By: gchanan

Differential Revision: D23956680

Pulled By: izdeby

fbshipit-source-id: c5eab7bf5fce14a287c15cead1cdc26e42cfed94
2020-09-28 13:14:04 -07:00
Bram Wasti
87b356d093 [static runtime] Split out graph preparation from runtime (#44131)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44131

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D23604305

Pulled By: bwasti

fbshipit-source-id: 7b47da4961d99074199417ef1407a788c7d80ee6
2020-09-28 13:01:23 -07:00
Nikolay Korovaiko
993628c74a Build shape expressions and remove outputs that are only used by aten::sizes (#45080)
Summary:
Currently, TE materializes all intermediate results even if they are only used for computing shapes. This diff ports the approach the OF (Old Fuser) took to deal with this issue: given the structure of a fusion group, we infer all the sizes outside the fusion group based on the fusion group's inputs.

A simple example would be:

```
        def test_fuse(a, b):
            c = a + b
            d = c + b
            return d
```

Here we don't need to cache `c` as computing a gradient for `b` in `d = c + b` doesn't need it. We do need to compute sizes for all arguments here in case broadcasts happen.
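
The broadcast-size computation that `prim::BroadcastSizes` performs can be sketched in Python (NumPy-style, right-aligned broadcasting):

```
def broadcast_sizes(a, b):
    # pad the shorter size list with leading 1s, then take the max per dim
    n = max(len(a), len(b))
    a = [1] * (n - len(a)) + list(a)
    b = [1] * (n - len(b)) + list(b)
    assert all(x == y or x == 1 or y == 1 for x, y in zip(a, b))
    return [max(x, y) for x, y in zip(a, b)]

assert broadcast_sizes([2, 1, 4], [3, 1]) == [2, 3, 4]
```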

Without this optimization, TE would need to materialize `c` so we can get its size

```
[DUMP profiling_graph_executor_impl.cpp:499] Optimized Graph:
[DUMP profiling_graph_executor_impl.cpp:499] graph(%a.1 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %b.1 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %11 : Tensor = prim::DifferentiableGraph_0(%b.1, %a.1)
[DUMP profiling_graph_executor_impl.cpp:499]   return (%11)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::DifferentiableGraph_0 = graph(%11 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %13 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %59 : int[] = aten::size(%13) # <string>:3:44
[DUMP profiling_graph_executor_impl.cpp:499]   %62 : int[] = aten::size(%11) # <string>:3:93
[DUMP profiling_graph_executor_impl.cpp:499]   %83 : Double(1:1, requires_grad=0, device=cuda:0), %84 : Double(1:1, requires_grad=0, device=cuda:0), %85 : bool = prim::TypeCheck(%11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]   %86 : Tensor, %87 : Tensor = prim::If(%85)
[DUMP profiling_graph_executor_impl.cpp:499]     block0():
[DUMP profiling_graph_executor_impl.cpp:499]       %d.4 : Double(1:1, requires_grad=0, device=cuda:0), %c.4 : Double(1:1, requires_grad=0, device=cuda:0) = prim::TensorExprGroup_0(%83, %84)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%d.4, %c.4)
[DUMP profiling_graph_executor_impl.cpp:499]     block1():
[DUMP profiling_graph_executor_impl.cpp:499]       %94 : Function = prim::Constant[name="fallback_function", fallback=1]()
[DUMP profiling_graph_executor_impl.cpp:499]       %95 : (Tensor, Tensor) = prim::CallFunction(%94, %11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]       %96 : Tensor, %97 : Tensor = prim::TupleUnpack(%95)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%96, %97)
[DUMP profiling_graph_executor_impl.cpp:499]   %60 : int[] = aten::size(%87) # <string>:3:55
[DUMP profiling_graph_executor_impl.cpp:499]   %61 : int[]? = aten::_size_if_not_equal(%59, %60) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %64 : int[]? = aten::_size_if_not_equal(%62, %60) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   %67 : int[] = aten::size(%86) # <string>:3:55
[DUMP profiling_graph_executor_impl.cpp:499]   %68 : int[]? = aten::_size_if_not_equal(%60, %67) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %71 : int[]? = aten::_size_if_not_equal(%62, %67) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   return (%86, %61, %64, %68, %71)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::TensorExprGroup_0 = graph(%1 : Double(1:1, requires_grad=0, device=cuda:0),
[DUMP profiling_graph_executor_impl.cpp:499]       %4 : Double(1:1, requires_grad=0, device=cuda:0)):
[DUMP profiling_graph_executor_impl.cpp:499]   %5 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %c.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%4, %1, %5) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2872:16
[DUMP profiling_graph_executor_impl.cpp:499]   %2 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %d.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%c.3, %1, %2) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2873:16
[DUMP profiling_graph_executor_impl.cpp:499]   return (%d.3, %c.3)
```

With this optimization we use `prim::BroadcastSizes` to compute the size of `c`. No need to materialize it.

```
[DUMP profiling_graph_executor_impl.cpp:499] Optimized Graph:
[DUMP profiling_graph_executor_impl.cpp:499] graph(%a.1 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %b.1 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %11 : Tensor = prim::DifferentiableGraph_0(%b.1, %a.1)
[DUMP profiling_graph_executor_impl.cpp:499]   return (%11)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::DifferentiableGraph_0 = graph(%11 : Tensor,
[DUMP profiling_graph_executor_impl.cpp:499]       %13 : Tensor):
[DUMP profiling_graph_executor_impl.cpp:499]   %59 : int[] = aten::size(%13) # <string>:3:44
[DUMP profiling_graph_executor_impl.cpp:499]   %62 : int[] = aten::size(%11) # <string>:3:93
[DUMP profiling_graph_executor_impl.cpp:499]   %88 : Double(1:1, requires_grad=0, device=cuda:0), %89 : Double(1:1, requires_grad=0, device=cuda:0), %90 : bool = prim::TypeCheck(%11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]   %91 : Tensor = prim::If(%90)
[DUMP profiling_graph_executor_impl.cpp:499]     block0():
[DUMP profiling_graph_executor_impl.cpp:499]       %d.4 : Double(1:1, requires_grad=0, device=cuda:0) = prim::TensorExprGroup_0(%88, %89)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%d.4)
[DUMP profiling_graph_executor_impl.cpp:499]     block1():
[DUMP profiling_graph_executor_impl.cpp:499]       %97 : Function = prim::Constant[name="fallback_function", fallback=1]()
[DUMP profiling_graph_executor_impl.cpp:499]       %98 : (Tensor) = prim::CallFunction(%97, %11, %13)
[DUMP profiling_graph_executor_impl.cpp:499]       %99 : Tensor = prim::TupleUnpack(%98)
[DUMP profiling_graph_executor_impl.cpp:499]       -> (%99)
[DUMP profiling_graph_executor_impl.cpp:499]   %85 : int[] = aten::size(%91)
[DUMP profiling_graph_executor_impl.cpp:499]   %86 : int[] = prim::BroadcastSizes(%59, %62)
[DUMP profiling_graph_executor_impl.cpp:499]   %61 : int[]? = aten::_size_if_not_equal(%59, %86) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %64 : int[]? = aten::_size_if_not_equal(%62, %86) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   %68 : int[]? = aten::_size_if_not_equal(%86, %85) # <string>:3:19
[DUMP profiling_graph_executor_impl.cpp:499]   %71 : int[]? = aten::_size_if_not_equal(%62, %85) # <string>:3:68
[DUMP profiling_graph_executor_impl.cpp:499]   return (%91, %61, %64, %68, %71)
[DUMP profiling_graph_executor_impl.cpp:499] with prim::TensorExprGroup_0 = graph(%1 : Double(1:1, requires_grad=0, device=cuda:0),
[DUMP profiling_graph_executor_impl.cpp:499]       %4 : Double(1:1, requires_grad=0, device=cuda:0)):
[DUMP profiling_graph_executor_impl.cpp:499]   %5 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %c.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%4, %1, %5) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2872:16
[DUMP profiling_graph_executor_impl.cpp:499]   %2 : int = prim::Constant[value=1]()
[DUMP profiling_graph_executor_impl.cpp:499]   %d.3 : Double(1:1, requires_grad=0, device=cuda:0) = aten::add(%c.3, %1, %2) # /scratch/villedepommes/pytorches/bench/test/test_jit.py:2873:16
[DUMP profiling_graph_executor_impl.cpp:499]   return (%d.3)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45080

Reviewed By: bertmaher

Differential Revision: D23856410

Pulled By: Krovatkin

fbshipit-source-id: 2956286eb03a4894a5baa151c35e6092466322b1
2020-09-28 10:45:56 -07:00
Rong Rong
48d29c830d [hotfix] disable problematic cuda tests on rocm builds (#45435)
Summary:
Disable the three recent CUDA tests on AMD ROCm builds/tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45435

Reviewed By: malfet

Differential Revision: D23962881

Pulled By: walterddr

fbshipit-source-id: ad4ea1f835b4722cdbdce685806cfd64376cc16f
2020-09-28 10:02:12 -07:00
Nikita Vedeneev
e4950a093a Backward support for generalized eigenvalue solver with LOBPCG in forward [only k-rank SYMEIG case] (#43002)
Summary:
As per title. Fixes [#38948](https://github.com/pytorch/pytorch/issues/38948). Therein you can find some blueprints for the algorithm being used in this PR.
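
A minimal sketch of what now works (symmetric case; the construction details are illustrative):

```
import torch

A = torch.randn(10, 10)
A = (A @ A.T + 10 * torch.eye(10)).requires_grad_()  # symmetric positive definite
vals, vecs = torch.lobpcg(A, k=2)  # k largest eigenpairs
vals.sum().backward()              # differentiating through lobpcg now works
print(A.grad.shape)
```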

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43002

Reviewed By: zou3519

Differential Revision: D23931326

Pulled By: albanD

fbshipit-source-id: e6994af70d94145f974ef87aa5cea166d6deff1e
2020-09-28 07:22:35 -07:00
Mike Ruberry
6417a70465 Updates linalg warning + docs (#45415)
Summary:
Changes the deprecation of norm to a docs deprecation, since PyTorch components still rely on norm and some behavior, like automatically flattening tensors, may need to be ported to torch.linalg.norm. The documentation is also updated to clarify that torch.norm and torch.linalg.norm are distinct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45415

Reviewed By: ngimel

Differential Revision: D23958252

Pulled By: mruberry

fbshipit-source-id: fd54e807c59a2655453a6bcd9f4073cb2c12e8ac
2020-09-28 05:28:42 -07:00
generatedunixname89002005325676
7818a214c5 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D23959094

fbshipit-source-id: 6caa046d263114bff38a38d756099aac357e4f04
2020-09-28 05:08:46 -07:00
Negin Raoof
95a97e51b5 [ONNX] Improve scripting inplace indexing ops (#44351)
Summary:
Fix a couple of issues with scripting inplace indexing in the prepare_inplace_ops_for_onnx pass.
1- Tracing index copy (such as cases like x[1:3] = data) already applies broadcasting on the rhs if needed. The broadcasting node (aten::expand) is missing in scripting cases; see the sketch below.

2- Inplace indexing with ellipsis (aten::copy_) is replaced with aten::index_put and then handled with slice+select in this pass.
Support for negative indices for this op is added.
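
A sketch of case 1, where the rhs must be expanded to the slice:

```
import torch

x = torch.zeros(5, 4)
data = torch.arange(4.0)  # shape (4,)
x[1:3] = data             # rhs is broadcast (aten::expand) to the (2, 4) slice
```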

Shape inference is also enabled for scripting tests using the new JIT API.
A few more tests are enabled for scripting.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44351

Reviewed By: ezyang

Differential Revision: D23880267

Pulled By: bzinodev

fbshipit-source-id: 78b33444633eb7ae0fbabc7415e3b16001f5207f
2020-09-28 00:32:36 -07:00
Zino Benaissa
13f76f2be4 Fix preserve submodule attribute in freezing (#45143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45143

This PR prevents freezing cleaning up a submodule when user requests to
preserve a submodule.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23844969

Pulled By: bzinodev

fbshipit-source-id: 80e6db3fc12460d62e634ea0336ae2a3551c2151
2020-09-28 00:05:38 -07:00
liqunfu
c3bf402cbb handle onnx nll with default ignore index (#44816)
Summary:
In the ONNX NegativeLogLikelihoodLoss specification, ignore_index is optional and has no default value.
Therefore, when converting the nll op to ONNX, we need to set the ignore_index attribute even if it is not specified (e.g. ignore_index=-100).
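
For reference, PyTorch's own default (a minimal sketch):

```
import torch
import torch.nn.functional as F

log_probs = torch.log_softmax(torch.randn(3, 5), dim=1)
target = torch.tensor([1, -100, 4])
loss = F.nll_loss(log_probs, target)  # rows with target == -100 are ignored
```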

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44816

Reviewed By: ezyang

Differential Revision: D23880354

Pulled By: bzinodev

fbshipit-source-id: d0bdd58d0a4507ed9ce37133e68533fe6d1bdf2b
2020-09-27 23:26:19 -07:00
shubhambhokare1
5b839bca78 [ONNX] Optimize export_onnx api to reduce string and model proto exchange (#44332)
Summary:
Optimize the export_onnx API to reduce string and model proto exchange in export.cpp.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44332

Reviewed By: bwasti, eellison

Differential Revision: D23880129

Pulled By: bzinodev

fbshipit-source-id: 1d216d8f710f356cbba2334fb21ea15a89dd16fa
2020-09-27 16:29:08 -07:00
neginraoof
4005afe94b [ONNX] Update narrow for dynamic inputs (#44039)
Summary:
Update narrow for dynamic inputs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44039

Reviewed By: mruberry

Differential Revision: D23742215

Pulled By: bzinodev

fbshipit-source-id: 0d58d2fe996f91a124af988a9a21ee433e842d07
2020-09-27 15:52:57 -07:00