Summary:
Fixes https://github.com/pytorch/pytorch/issues/62600
Adds a `bazel --config=no-tty` option, which is useful for less verbose output in environments that don't implement a full TTY, such as CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62601
Reviewed By: soulitzer
Differential Revision: D30070154
Pulled By: malfet
fbshipit-source-id: 5b89af8441c3c6c7ca7e9a0ebdfddee00c9ab576
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61927
This is a refactor of `SavedVariable.cpp` to prevent ever defining the `data_` tensor if default hooks are set.
Before the refactor:
```c++
data_ = variable.tensor_data(); // this is wasteful if hooks are defined
register_hooks(Engine::get_default_engine().get_default_saved_variable_hooks());
```
After the refactor:
```c++
if (get_default_hooks_()) {
  save_metadata_(variable);
  register_hooks_(get_default_hooks_(), variable);
  return;
}
save_metadata_(variable);
data_ = variable.tensor_data(); // only needed if hooks are not defined
```
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D29848524
Pulled By: Varal7
fbshipit-source-id: abca1eee37a17b47841e28d8a576490913fce1ce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61899
Tracking issue: #55070
This PR also removes `at::_cumprod`, which was the "backend" for `at::cumprod`.
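For reference, the user-facing op is unaffected; a minimal sketch of `at::cumprod`'s Python surface:
```
import torch

# Only the internal at::_cumprod helper is removed; torch.cumprod
# (backed by at::cumprod) behaves as before.
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
print(torch.cumprod(x, dim=0))  # tensor([ 1.,  2.,  6., 24.])
```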
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D29939489
Pulled By: ezyang
fbshipit-source-id: d5e4a6dfa6c79e4b135508ea13c2d11bd0684f63
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61899
Tracking issue: #55070
This PR also removes `at::_cumprod`, which was the "backend" for `at::cumprod`.
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D29939152
Pulled By: ezyang
fbshipit-source-id: b3379033a1ffe3c7bc8216d16d089d388ea559ba
Summary:
Enables Gelu bf16/fp32 in the CPU path using the MKL-DNN implementation; users don't need to call `to_mkldnn()` explicitly. The new Gelu fp32 path performs better than the original one.
Adds Gelu backward for https://github.com/pytorch/pytorch/pull/53615.
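A usage sketch of the enabled path (shapes are arbitrary; the dispatch to MKL-DNN is implicit, per the summary above):
```
import torch
import torch.nn.functional as F

# fp32 gelu on CPU, no explicit to_mkldnn() needed; this also exercises
# the newly added gelu backward.
x32 = torch.randn(64, 256, requires_grad=True)
F.gelu(x32).sum().backward()

# bf16 gelu on CPU.
x16 = torch.randn(64, 256, dtype=torch.bfloat16)
y16 = F.gelu(x16)
```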
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58525
Reviewed By: ejguan
Differential Revision: D29940369
Pulled By: ezyang
fbshipit-source-id: df9598262ec50e5d7f6e96490562aa1b116948bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62628
This fixes the following error when the ROCm compiler is used:
```
caffe2/aten/src/ATen/core/TensorAccessor.h:160:5: error: throw is prohibited in AMP-restricted functions
TORCH_CHECK_INDEX(
^
```
Test Plan: CI
Reviewed By: zhouzhuojie, seemethere
Differential Revision: D30059737
fbshipit-source-id: d094ee608768db41fcc91d044c2c6d7d29f33fe4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62632
Update the caffe2/core/context.h to directly use `at::mt19937` instead of the
`at::CPUGeneratorImpl` wrapper class from the ATen-cpu library.
Using `at::CPUGeneratorImpl` causes circular dependencies between the ATen and
caffe2 code. In particular the `at::CPUGeneratorImpl::get_state()` logic
depends on CPU Tensor functionality that currently depends on code from
caffe2.
Test Plan:
The RNG behavior should be identical to the previous code (perhaps even
faster, since we now avoid virtual function calls).
buck test //caffe2/caffe2:caffe2_test_cpu \
//caffe2/caffe2/python: //caffe2/caffe2/fb/operators:
Differential Revision: D29915701
fbshipit-source-id: f9b2eab8d3b21b2224d30bcf52be9c0e7eb7cd0a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62618
ShardedTensor implicitly initialized RRefs to remote shards if the RPC
framework was initialized. However, there are use cases where the RPC
framework is initialized for a different purpose, and users would prefer that
ShardedTensor not initialize RRefs as well.
As a result, I've made RRef initialization explicit in the ShardedTensor APIs.
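A hedged sketch of the explicit opt-in; the module path, the `ChunkShardingSpec` usage, and the `init_rrefs` keyword are assumptions based on this summary:
```
import torch.distributed._sharded_tensor as sharded_tensor
from torch.distributed._sharding_spec import ChunkShardingSpec

spec = ChunkShardingSpec(
    dim=0,
    placements=["rank:0/cuda:0", "rank:1/cuda:1"],
)
# RRefs to remote shards are no longer created implicitly just because
# RPC is initialized; the caller opts in explicitly (keyword name assumed):
st = sharded_tensor.empty(spec, 10, 20, init_rrefs=True)
```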
ghstack-source-id: 134889287
Test Plan:
1) waitforbuildbot
2) unit tests.
Reviewed By: wanchaol
Differential Revision: D30056833
fbshipit-source-id: 9b2433a38dafa1888589c5b72ed93b6f0ee51639
Summary:
The BLAS library is found by cmake/Dependencies.cmake, and then the LAPACK
library is found by FindLAPACK.cmake, which in turn calls FindBLAS.cmake.
This means we search for BLAS twice, and the two searches might find
different things. Setting a few variables avoids this.
cc seemethere
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49647
Reviewed By: seemethere, ejguan
Differential Revision: D29943680
Pulled By: malfet
fbshipit-source-id: 3cbc350ea645a1a28dd92c19e5ee7f9eecdeff59
Summary: The comment above makes it seem intentional, so just ignore it.
Test Plan: NFC
Reviewed By: smeenai
Differential Revision: D30057632
fbshipit-source-id: 45929b4eeeefdf22f5c7c4dd603229635f9da31b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62242
1) Add a state_dict hook to ensure ShardedTensors are
added to a state_dict.
2) Add a pre-load state_dict hook to ensure ShardedTensors are added back to a
module at load time.
3) Add a `with_load_process_group` context manager for load time (a sketch
follows this list).
4) Added ser-de capability to ShardedTensor.
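A hedged sketch of the load-time flow; the import path, the checkpoint name, and `model` are assumptions, and the context-manager name is taken from item 3:
```
import torch
import torch.distributed as dist
from torch.distributed._sharded_tensor import with_load_process_group

# Deserialize ShardedTensors (item 4) under an explicit process group,
# then let the pre-load hook (item 2) restore them onto the module.
pg = dist.new_group(ranks=[0, 1])
with with_load_process_group(pg):
    state_dict = torch.load("checkpoint.pt")
model.load_state_dict(state_dict)
```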
ghstack-source-id: 134860967
Test Plan:
1) unit tests
2) waitforbuildbot
Reviewed By: wanchaol
Differential Revision: D29927881
fbshipit-source-id: b1ef8872ed91e9cb0e2d5dd17d2764678ab89f0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62610
Prefixes intermediate build image tags with `build-` so that ECR lifecycle
policies can automatically clean them up.
Policy to automatically clean up images prefixed with `build-`: b02dd818f9
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D30055952
Pulled By: seemethere
fbshipit-source-id: 328b9c94ffc02877d088d0118a19c732f580838b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62564
If the user runs code that registers default saved tensor hooks from
multiple threads, it will fail with a nice error message most of the
time. This commit handles the very rare case where a race condition
would have made it fail silently.
Relanding previous PR #61957
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D30045406
Pulled By: Varal7
fbshipit-source-id: d04f74c99affbbf655e53cfc2acd42f7c5b4e6eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62592
Reland #62510
`GradBucket` is an important class defined in both C++ and Python, used for PyTorch Distributed Training. We need to rename the following methods for simplicity:
1) get_index -> index
2) is_the_last_bucket_to_allreduce -> is_last
3) get_per_parameter_tensors -> gradients
4) get_model_params_for_bucket -> parameters
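A small sketch exercising the renamed methods on a bucket, as they would appear inside a communication hook body (the surrounding hook plumbing is omitted):
```
def inspect_bucket(bucket):
    # New names on GradBucket, per the list above:
    idx = bucket.index()          # was: get_index()
    last = bucket.is_last()       # was: is_the_last_bucket_to_allreduce()
    grads = bucket.gradients()    # was: get_per_parameter_tensors()
    params = bucket.parameters()  # was: get_model_params_for_bucket()
    return idx, last, list(zip(params, grads))
```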
ghstack-source-id: 134848352
Test Plan: unit test
Reviewed By: andwgu
Differential Revision: D30049431
fbshipit-source-id: 1bcac331aa30e529b7230e3891bc811c531b0ea9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62239
Added a naive implementation of Vulkan softmax (not using shared memory).
Based on the naive implementation of mean, found here:
2565a33c98/aten/src/ATen/native/vulkan/glsl/mean.glsl
Test Plan:
After building:
```
build/bin/vulkan_api_test
```
```
[ RUN ] VulkanAPITest.softmax
[ OK ] VulkanAPITest.softmax (180 ms)
```
Reviewed By: SS-JIA
Differential Revision: D29793150
fbshipit-source-id: 4f9d8e1dae8a43cbcb7063b095fa4726df06c929
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61083
BFloat16 `nansum` is already supported on CUDA, so it seems reasonable to
support it on CPU as well. This also changes `test_nansum` to compare against
`torch.sum`, since NumPy doesn't support BFloat16. Note that
`test_nansum_vs_numpy` still checks against NumPy, so that path is still
being tested.
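A quick sketch of the newly supported path:
```
import torch

# BFloat16 nansum on CPU: NaNs are treated as zero, matching CUDA.
x = torch.tensor([1.0, float("nan"), 2.0], dtype=torch.bfloat16)
print(torch.nansum(x))  # tensor(3., dtype=torch.bfloat16)
```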
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30006227
Pulled By: heitorschueroff
fbshipit-source-id: 1449730e1936417e7de1f8b3cf8cdcc15518873c
Summary: If we let torch_deploy get put into libomnibus, it hides the symbols we need to link against.
Test Plan: buck test //caffe2/torch/csrc/deploy:test_deploy_from_python -- --exact 'caffe2/torch/csrc/deploy:test_deploy_from_python - test_deploy_from_python (caffe2.torch.csrc.deploy.test_deploy_from_python.TestDeployFromPython)' --run-disabled
Reviewed By: wconstab
Differential Revision: D30031134
fbshipit-source-id: e5c2f740f17abafec7d01c57c664bd71a00b6f61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62201
Changes `int` to `uint` to match the runtime's bytecode type. This only affects C++, since Python doesn't have uints, IIRC. Also changes the behavior of the functions from returning -1 with a warning to throwing an exception. I wasn't sure what the proper behavior here would be (returning UINT_MAX seemed gross), so feedback is appreciated.
Test Plan: ci
Reviewed By: raziel
Differential Revision: D29914072
fbshipit-source-id: 1bb08702fc301d7c7612b5ad7205a6dbe855c890
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62436
## Problem
Given two modules and a tracer that indiscriminately marks all modules as a leaf:
```
class InnerModule(torch.nn.Module):
    def forward(self, t):
        return t + t

class MyModule(torch.nn.Module):
    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def forward(self, t):
        x = self.inner(t)
        y = self.inner(t)
        return x + y

class MyTracer(torch.fx.Tracer):
    def is_leaf_module(self, module, name):
        return True
```
One might generally expect the following behavior (note call_module nodes):
```
print(">> Outer GraphModule (with inner module as nn.Module):")
inner = InnerModule()
m = MyModule(inner)
gm = torch.fx.GraphModule(m, MyTracer().trace(m))
print(gm.graph.print_tabular())
>> Outer GraphModule (with inner module as nn.Module):
opcode name target args kwargs
------------- ------- ----------------------- ---------------- --------
placeholder t t () {}
call_module inner inner (t,) {}
call_module inner_1 inner (t,) {}
call_function add <built-in function add> (inner, inner_1) {}
output output output (add,) {}
None
```
However, when the inner module is first symbolically traced, the symbolic trace of the outer module ignores `is_leaf_module` entirely and traces through the whole module (note the call_function nodes).
```
print(">> Inner module as GraphModule:")
inner = InnerModule()
inner_gm = torch.fx.GraphModule(inner, MyTracer().trace(inner))
print(inner_gm.graph.print_tabular())
print(">> Outer GraphModule (with inner module as GraphModule):")
m = MyModule(inner_gm)
gm = torch.fx.GraphModule(m, MyTracer().trace(m))
print(gm.graph.print_tabular())
>> Inner module as GraphModule:
opcode name target args kwargs
------------- ------ ----------------------- ------ --------
placeholder t t () {}
call_function add <built-in function add> (t, t) {}
output output output (add,) {}
None
>> Outer GraphModule (with inner module as GraphModule):
opcode name target args kwargs
------------- ------ ----------------------- ------------ --------
placeholder t t () {}
call_function add <built-in function add> (t, t) {}
call_function add_1 <built-in function add> (t, t) {}
call_function add_2 <built-in function add> (add, add_1) {}
output output output (add_2,) {}
None
```
This is surprising behavior and at first glance violates the tracer's intent. As I understand it, `torch.fx.symbolic_trace.Tracer.trace` intends to patch `torch.nn.Module.__call__` with a `module_call_wrapper()` that records a `call_module` node if the module is a leaf, and otherwise executes `torch.fx._symbolic_trace._orig_module_call` (which is set to `torch.nn.Module.__call__` at module load time).
**Every submodule should be a leaf, but no `call_module` nodes are created when that submodule is a `GraphModule`. Why?**
Upon further inspection, I found:
- The constructor for GraphModule includes a path to `GraphModule.recompile()` via the setter for a `fx.Graph`:
```
inner_gm = torch.fx.GraphModule(inner, MyTracer().trace(inner))
File "/torch/fx/graph_module.py", line 252, in __init__
self.graph = graph
File "/torch/nn/modules/module.py", line 1183, in __setattr__
object.__setattr__(self, name, value)
File "/torch/fx/graph_module.py", line 277, in graph
self.recompile()
```
- `recompile()` wraps the `__call__` method by holding a reference to the `__call__` method at the time of recompilation:
```
cls = type(self)
cls_call = cls.__call__
...
def wrapped_call(self, *args, **kwargs):
    try:
        return cls_call(self, *args, **kwargs)
    except Exception as e:
        ...
cls.__call__ = wrapped_call
```
- Recompilation of the inner GraphModule happens on initialization, before creation or tracing of the outer module. Adding some old-fashioned print debug statements gives:
```
Inner Module:
_orig_module_call: <function Module._call_impl at 0x7faaebfee8b0>
recompile: cls.__call__ now wraps _orig_module_call, <function Module._call_impl at 0x7faaebfee8b0>
Outer Module:
_orig_module_call: <function Module._call_impl at 0x7faaebfee8b0>
tracing: patching method <class 'torch.nn.modules.module.Module'>.__call__ <function Module._call_impl at 0x7faaebfee8b0> with <function Module._call_impl at 0x7fa9d42bce50>
outer module MRO before tracing:
(0) <class '__main__.MyModule'>: <function Module._call_impl at 0x7faaebfee8b0>
(1) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7faaebfee8b0>
(2) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>
outer module MRO during tracing:
(0) <class '__main__.MyModule'>: <function Module._call_impl at 0x7fa9d42bce50>
(1) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7fa9d42bce50>
(2) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>
inner module MRO before tracing:
(0) <class 'torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl'>: <function x.y.z.wrapped_call at 0x7fa9d42a8670>
(1) <class 'torch.fx.graph_module.GraphModule'>: <function Module._call_impl at 0x7faaebfee8b0>
(2) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7faaebfee8b0>
(3) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>
inner module MRO during tracing:
(0) <class 'torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl'>: <function x.y.z.wrapped_call at 0x7fa9d42a8670>
(1) <class 'torch.fx.graph_module.GraphModule'>: <function Module._call_impl at 0x7fa9d42bce50>
(2) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7fa9d42bce50>
(3) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>
```
- The outer module is patched correctly, but the inner module's first element in its MRO is the `wrapped_call` from `recompile` that still invokes `<function Module._call_impl at 0x7faaebfee8b0>` directly. Therefore, no call_module nodes are created.
## In Practice
In practice, this behavior affects the ability of `torch.package` to package `GraphModules` whose submodules are `GraphModules`. In our case, the `GraphModule` submodules are not passed through a constructor, but created separately and installed on the root `GraphModule` via `setattr`. This means that prior to packaging, there appear to be no issues with the module, since the root's graph was created before any call_module targets were replaced with `GraphModules`.
When unpackaging such a model with `torch.package`, `torch.fx.graph_module._deserialize_graph_module` uses an inline `KeepModules` tracer that sets all submodules to leaves; the unpackaged module is implicitly and surprisingly inlined in the process.
## Potential Solution
This behavior was previously not understood by us, so the current workaround is a gnarly process of wrapping each submodule in an `nn.Module` with a manually installed forward method.
Changing `wrapped_call` to `return super(type(self), self).__call__(*args, **kwargs)` whenever `__call__` is inherited at least appears to solve the issue. Does this seem like an acceptable approach?
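A sketch of that proposal applied to the `recompile()` snippet above; whether `__call__` was inherited is recorded before wrapping (this illustrates the idea, not necessarily the final fix):
```
cls = type(self)
# Record, before wrapping, whether this class defines its own __call__.
has_own_call = "__call__" in vars(cls)
cls_call = cls.__call__

def wrapped_call(self, *args, **kwargs):
    try:
        if has_own_call:
            return cls_call(self, *args, **kwargs)
        # __call__ was inherited: resolve through the MRO at call time, so
        # a tracer's patch of Module.__call__ is actually picked up.
        return super(cls, self).__call__(*args, **kwargs)
    except Exception:
        # (original error-enhancing logic elided)
        raise

cls.__call__ = wrapped_call
```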
## Other Thoughts
- Repeated calls to `recompile` create nested `wrapped_call`s, all for the purpose of error handling. This nesting seems unnecessary ¯\\_(ツ)\_/¯
- If a root module with an overridden `__call__` method is symbolically traced, the override is ignored
Test Plan:
```
buck test:
✓ ListingSuccess: caffe2/test:fx - main (12.570)
✓ Pass: caffe2/test:fx - test_tracing_graphmodules_as_leaf_submodules (test_fx.TestFX) (11.982)
```
Reviewed By: ansley
Differential Revision: D29997935
fbshipit-source-id: 1988fbb025b14188da26a3e73e94fb789c3c1f74
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62532
This method is not stable at this time, so avoid releasing it when the DDP communication hook feature is released as stable.
ghstack-source-id: 134787831
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_hook_with_optimizer_parity
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_hook_then_optimizer_nccl
Reviewed By: rohan-varma
Differential Revision: D30031222
fbshipit-source-id: e03a8e13fee5116a5ddd724eb76316ee98f2a676
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62563
Expose a pair of functions to Python users: torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack) and torch.autograd.graph.reset_saved_tensors_default_hooks().
These functions control the hooks applied to saved tensors: all tensors saved in that context will be packed using the pack function, then unpacked accordingly when needed.
Currently, this works by simply calling register_hooks (cf #60975) directly at the end of the constructor of a SavedVariable. This could be optimized further by not performing the copy before registering default hooks, but this would require a small refactor. Edit: the refactor is done in #61927.
A current limitation is that if users create tensors in this context, they will not be able to register additional hooks on the saved tensor.
For instance, to perform something like #28997, one could define a pack function that saves to disk whenever the tensor size is too big and returns a filename, then unpack simply reads the content of the file and outputs a tensor, e.g.:
```
import os
import uuid

import torch

def pack(x):
    name = os.path.join(tmp_dir, str(uuid.uuid4()))
    torch.save(x, name)
    return name

def unpack(name):
    return torch.load(name)
```
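A minimal registration sketch using `pack`/`unpack` from above (the `tmp_dir` setup is an assumption):
```
import tempfile

import torch

tmp_dir = tempfile.mkdtemp()  # consumed by pack() above

torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack)
a = torch.randn(5, requires_grad=True)
y = (a * a).sum()  # 'a' is saved for backward via pack()
y.backward()       # the saved tensor is restored via unpack()
torch.autograd.graph.reset_saved_tensors_default_hooks()
```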
Relanding previous PR: https://github.com/pytorch/pytorch/pull/61834
Original PR led to timeout error in: https://www.internalfb.com/mast/job/yuguo-release_canary_offline_training-inlinecvrp_a-canary_offline_train_28a7ecfc
Now passing: https://www.internalfb.com/mast/job/quach-release_canary_offline_training-inlinecvrp_a-canary_offline_train_9bb57e98
The difference in the new version is that we don't need to acquire the GIL when calling `PyDefaultSavedVariableHooks::get_hooks`.
Test Plan: Imported from OSS
Reviewed By: iramazanli
Differential Revision: D30045405
Pulled By: Varal7
fbshipit-source-id: 7f6c07af3a56fe8835d5edcc815c15ea4fb4e332
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62515
**Summary**
This commit fixes an obvious typo in `Normalization.cu` I found while
working on #62452. Since that PR will not be landed anytime soon, I
thought it would be prudent to land this fix.
**Test Plan**
Continuous integration.
Test Plan: Imported from OSS
Reviewed By: makslevental
Differential Revision: D30027324
Pulled By: SplitInfinity
fbshipit-source-id: 9d368a54c13f8e2bf6f6956dfb2bee974226f726
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62510
`GradBucket` is an important class defined in both C++ and Python, used for PyTorch Distributed Training. We need to rename the following methods for simplicity:
1) get_index -> index
2) is_the_last_bucket_to_allreduce -> is_last
3) get_per_parameter_tensors -> gradients
4) get_model_params_for_bucket -> parameters
Test Plan:
Ran the comprehensive tests locally with the following results:
https://pxl.cl/1Ml8b
The two timeout test failures are most likely environment-related and fail
only on my devserver.
Reviewed By: SciPioneer
Differential Revision: D30024161
fbshipit-source-id: 07e6072a2f7b81f731425d9b71f8c8b60d383b0f
Summary:
Sobol was modified to not drop the first point. This update reflects that behavior in the docstring.
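A sketch of the documented behavior (the first draw now includes the origin rather than dropping it):
```
import torch

engine = torch.quasirandom.SobolEngine(dimension=3)
print(engine.draw(1))  # tensor([[0., 0., 0.]]) -- the first point is kept
```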
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62548
Reviewed By: qingfeng10
Differential Revision: D30035627
Pulled By: Balandat
fbshipit-source-id: 64c659ea30c0c929778da3b58041875834e25e9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60886
Remove all the hypothesis tests from test_adaptive_avg_pool2d_nhwc, test_adaptive_avg_pool, and test_adaptive_avg_pool3d_ndhwc.
Test Plan: I tested it with `buck test //caffe2/test:quantization` and all three tests passed. The tests that failed are test_conv2d_api (test_quantized_functional.py) and test_conv3d_api (test_quantized_functional.py).
Reviewed By: wanchaol, jerryzh168
Differential Revision: D29432184
fbshipit-source-id: 2a4c540d2c169aec69cf8d143d5a155394885745
Summary:
**Overview:**
This adds two approaches to overlapping `DistributedDataParallel.backward()` with `ZeroRedundancyOptimizer.step()` by providing two hook constructors: `hook_with_zero_step()` and `hook_with_zero_step_interleaved()`. The former waits for all backward computation to finish before starting optimizer computation, while the latter launches a partial optimizer computation using the contents of a gradient bucket once that bucket's all-reduce completes. The two approaches each suffer from their own weaknesses, and which one to use depends on the specific hardware configuration.
Both approaches can share changes to `ZeroRedundancyOptimizer`. A user should pass `overlap_with_ddp=True` to `ZeroRedundancyOptimizer`, construct a DDP communication hook using either `hook_with_zero_step()` or `hook_with_zero_step_interleaved()`, and register that communication hook. `ZeroRedundancyOptimizer.step()` should still be called in the training loop, though the optimizer computation and communication actually originate from the communication hook. Currently, the first two iterations are vacuous, meaning they do not result in parameter updates and the inputs are ignored. This is required to finalize the DDP bucket strategy and to then initialize the `ZeroRedundancyOptimizer`'s local optimizer based on that bucketing.
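A hedged wiring sketch; the hook-constructor module path, `allreduce_hook`, and the training-loop names (`MyModel`, `rank`, `loader`, `loss_fn`) are assumptions, not taken from this PR:
```
import torch
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.distributed.algorithms.ddp_comm_hooks.default_hooks import allreduce_hook
from torch.distributed.algorithms.ddp_comm_hooks.ddp_zero_hook import hook_with_zero_step

model = DDP(MyModel().cuda(rank), device_ids=[rank])
zero = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.Adam,
    overlap_with_ddp=True,  # named in this summary
    lr=1e-3,
)
model.register_comm_hook(None, hook_with_zero_step(allreduce_hook, model, zero))

for inputs, targets in loader:
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    zero.step()  # optimizer work actually originates from the comm hook;
                 # the first two iterations are vacuous (see above)
```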
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62157
Test Plan:
The existing `ZeroRedundancyOptimizer` tests pass, and new unit tests for both hooks pass:
- ~~`test_ddp_with_zero_step_parity_cpu`~~ (removed for now due to flakiness in CI -- under investigation, could possibly be similar Gloo issue as with `hook_with_zero_step_interleaved()`)
- `test_ddp_with_zero_step_parity_gpu`
- `test_ddp_with_zero_step_interleaved_parity_gpu`
These were tested on the AI AWS cluster.
An analogous `test_ddp_with_zero_step_interleaved_parity_cpu` is missing due to existing bugs with Gloo. See https://github.com/pytorch/pytorch/pull/62302.
Both approaches have been verified using an internal accuracy benchmark.
Reviewed By: mrshenli
Differential Revision: D29971046
Pulled By: andwgu
fbshipit-source-id: a7234c23c7ea253f144a698fd7e3c0fe039de5e8
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60747
Enhances the C++ versions of `Transformer`, `TransformerEncoderLayer`, and `TransformerDecoderLayer` to support callables as their activation functions. The old way of specifying the activation function still works as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62342
Reviewed By: malfet
Differential Revision: D30022592
Pulled By: jbschlosser
fbshipit-source-id: d3c62410b84b1bd8c5ed3a1b3a3cce55608390c4
Summary:
This should address part of https://github.com/pytorch/pytorch/issues/62357.
1. Rename all generated files to 'generated-*' to make their origin clear; the filename will not be part of the CI workflow name
2. Remove all 'pytorch-' prefixes from names
3. Make sure the build/test shell scripts are adapted to the new names
The next change should reduce more device-related naming.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62402
Reviewed By: malfet
Differential Revision: D30021959
Pulled By: walterddr
fbshipit-source-id: 64b21a2020e25a507101c09c010cb593d8ac4146
Summary:
This PR: (1) enables the use of a system-provided Intel TBB for building PyTorch, (2) removes `tbb::task_scheduler_init` references, since it was removed from TBB a while ago, and (3) marks the implementation of `_internal_set_num_threads` with a TODO, as it requires a revision that fixes its thread-allocation logic.
Tested with `test/run_test`; no new tests are introduced since there are no behavioral changes (the removal of `tbb::task_scheduler_init` has no impact on runtime behavior).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61934
Reviewed By: malfet
Differential Revision: D29805416
Pulled By: cbalioglu
fbshipit-source-id: 22042b428b57b8fede9dfcc83878d679a19561dd