Summary:
This is a second attempt to use the graph executor to run forward on a gradient graph. This gives us a second chance to profile intermediate tensors introduced by autodiff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52136
Reviewed By: pbelevich
Differential Revision: D26693978
Pulled By: Krovatkin
fbshipit-source-id: 91dde8009a210950af8e5173668ada241e16dd52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52534
Currently, linear_dynamic_fp16 has a signature that's tied to fbgemm/qnnpack.
We'll need to produce a pattern equivalent to linear_dynamic_fp16 to support extensions
to other backends.
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_linear_dynamic_fp16
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D26557726
fbshipit-source-id: 270c9f781f73c79416a092b7831294cabca84b0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52619
Runs this test suite with nccl_async_error_handling enabled. It is the
default for many distributed training jobs, and can also help catch
errors/hangs in tests more easily. We don't expect any changes in the
existing tests since they shouldn't have any hangs.
Also removes a commented-out line.
ghstack-source-id: 122595646
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D26588108
fbshipit-source-id: a57bbe2ae5a0c86731d77be45756b17151618eb6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52779
1. makes the return type of the weight comparison APIs match the return
type of the activation comparison APIs:
```
# before
{layer_name: {model_name: weight_tensor}}
{layer_name: {model_name: [activation_tensor]}}
# after
{layer_name: {model_name: [weight_tensor]}}
{layer_name: {model_name: [activation_tensor]}}
```
2. makes a type alias for the type, so future changes are easier
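For illustration, a minimal sketch of what such an alias could look like (the alias name `NSComparisonResultType` is an assumption, not the actual code):
```python
from typing import Dict, List

import torch

# hypothetical alias for the shared return type of the weight and
# activation comparison APIs:
#   {layer_name: {model_name: [comparison_tensor, ...]}}
NSComparisonResultType = Dict[str, Dict[str, List[torch.Tensor]]]
```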
Test Plan:
```
mypy torch/quantization
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26652639
fbshipit-source-id: eb1f04d6913cedf88d628f362468875ae9ced928
Summary:
Addresses one item in https://github.com/pytorch/pytorch/issues/46321
## Background
This is a test version of the RL RPC example defined [here](https://github.com/pytorch/examples/blob/master/distributed/rpc/rl/main.py) and [here](https://pytorch.org/tutorials/intermediate/rpc_tutorial.html), with the following differences:
* It defines and uses a `DummyEnv` to avoid a dependency on `gym`. The `DummyEnv` simply returns random states & rewards for a small number of iterations.
* It removes the `ArgumentParser` and utilizes `RpcAgentTestFixture` + hard-coded constants for configuration and launching.
* It changes the worker names to match what the internal Thrift RPC tests expect.
The code is purposefully kept very similar to the original example code outside of these differences.
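As a rough sketch of the kind of `DummyEnv` described above (the state dimension, episode length, and return shape here are assumptions, not the actual test code):
```python
import torch

class DummyEnv:
    """Stand-in for gym: random states & rewards for a fixed number of steps."""
    def __init__(self, state_dim=4, num_iters=10):
        self.state_dim = state_dim
        self.num_iters = num_iters
        self.step_count = 0

    def reset(self):
        self.step_count = 0
        return torch.randn(self.state_dim)

    def step(self, action):
        self.step_count += 1
        state = torch.randn(self.state_dim)
        reward = torch.rand(1).item()
        done = self.step_count >= self.num_iters
        return state, reward, done
```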
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52393
Test Plan:
```
pytest test/distributed/rpc/test_tensorpipe_agent.py -k test_rl_rpc -vs
pytest test/distributed/rpc/test_process_group_agent.py -k test_rl_rpc -vs
```
Reviewed By: glaringlee
Differential Revision: D26515435
Pulled By: jbschlosser
fbshipit-source-id: 548548c4671fe353d83c04108580d807108ca76e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51730
I've added `scatter_add` and `scatter_add.dimname` to the promote list, as well as test cases for the former op.
However, it seems that `scatter_add` [doesn't support named tensors yet](8b0cb5ede3/aten/src/ATen/native/NamedTensor.cpp (L356-L358)) (thanks t-vi for the pointer):
```python
dev = 'cuda'
torch.scatter_add(torch.zeros(2, 2, 2, dtype=torch.float16, device=dev, names=('N', 'C', 'L')),
'C',
torch.randint(0, 2, (2, 2, 2), device=dev),
torch.randn((2, 2, 2), dtype=torch.float32, device=dev))
> RuntimeError: scatter_add: You passed a dimname (string) to this op in place of a dimension index but it does not yet support this behavior. Please pass a dimension index to work around this.
```
which raised this error after adding this test case.
I'm thus unsure whether I should also remove `scatter_add.dimname` from the promote list.
In any case, once named tensors are supported, a potential test could be added as:
```python
("scatter_add", (torch.zeros(2, 2, 2, dtype=torch.float16, device=dev, names=('N', 'C', 'L')),
'C',
torch.randint(0, 2, (2, 2, 2), device=dev),
torch.randn((2, 2, 2), dtype=torch.float32, device=dev))),
```
CC mcarilli ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52133
Reviewed By: ejguan
Differential Revision: D26440392
Pulled By: ngimel
fbshipit-source-id: f4ee2d0b9e1f81afb6f94261c497cf2bf79ec115
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52804
`rpc.get_worker_info` used to only take a string in v1.6. We recently
allowed it to accept `int` and `WorkerInfo` as well, but the previous check
on `worker_name` is no longer correct. This commit adds an explicit
`not None` check.
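One likely pitfall with the old check is truthiness: in Python the integer 0 (a valid rank) is falsy. A small illustration of why an explicit `not None` check matters (a sketch, not the actual diff):
```python
def is_given_truthy(worker_name=None):
    return bool(worker_name)        # old-style check (sketch)

def is_given_not_none(worker_name=None):
    return worker_name is not None  # new explicit check (sketch)

# rank 0 is a valid worker id but is falsy, so the old check misfires:
assert not is_given_truthy(0)
assert is_given_not_none(0)
```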
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D26655089
Pulled By: mrshenli
fbshipit-source-id: fa1545bd6dd2b33bc1e919de46b94e799ab9719c
Summary:
This way, we can have a mapping from the test files we directly execute (the tests [here](https://github.com/pytorch/pytorch/blob/master/test/run_test.py#L20)) to the test suites that we store data for in XML reports.
This will come in use later for categorizing the tests we run in CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52791
Reviewed By: samestep
Differential Revision: D26655086
Pulled By: janeyx99
fbshipit-source-id: 94be32f80d7bc0ea1a7a11d4c4b1d3d8e774c5ea
Summary:
Apple recently announced ML Compute, a new framework available in macOS Big Sur, which enables users to accelerate the training of neural networks on Mac hardware. This PR is the first in a series of PRs that will enable the integration with ML Compute. Most of the integration code will live in a separate subrepo named `mlc`.
The integration with `mlc` (ML Compute) will be very similar to that of xla. We rely on registering our ops through:
```cpp
TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) {
  m.impl_UNBOXED(<op_schema_name>, &customized_op_kernel)
  ...
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50634
Reviewed By: malfet
Differential Revision: D26614213
Pulled By: smessmer
fbshipit-source-id: 3b492b346c61cc3950ac880ac01a82fbdddbc07b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51807
Implemented `torch.linalg.multi_dot`, similar to [numpy.linalg.multi_dot](https://numpy.org/doc/stable/reference/generated/numpy.linalg.multi_dot.html).
This function does not support broadcasting or batched inputs at the moment.
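For reference, a small usage sketch (arbitrary shapes):
```python
import torch

A = torch.randn(10, 100)
B = torch.randn(100, 5)
C = torch.randn(5, 50)

# multi_dot chooses the cheapest multiplication order automatically
out = torch.linalg.multi_dot([A, B, C])
assert out.shape == (10, 50)
```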
**NOTE**
numpy.linalg.multi_dot allows the first and last tensors to have more than 2 dimensions despite their docs stating these must be either 1D or 2D. This PR diverges from NumPy in that it enforces this restriction.
**TODO**
- [ ] Benchmark against NumPy
- [x] Add OpInfo testing
- [x] Remove unnecessary copy for out= argument
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D26375734
Pulled By: heitorschueroff
fbshipit-source-id: 839642692424c4b1783606c76dd5b34455368f0b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52632
Distributed tests run in a multiprocessing environment, where a parent
process drives the tests through several child processes. As a result, when a
child process fails, the parent only prints the following:
```
Process 0 exited with error code 10
```
The child process also logs its own exception, but it is cumbersome to go
through the logs and track it down.
To alleviate this, I've added a bunch of pipes for each child process so that
the child process writes the error to the pipe before exiting and the parent
process can read the appropriate error from the pipe and display it.
The new output printed by the parent is as follows:
```
> RuntimeError: Process 0 exited with error code 10 and exception:
Traceback (most recent call last):
File "torch/testing/_internal/common_distributed.py", line 361, in _run
getattr(self, test_name)()
File "torch/testing/_internal/common_distributed.py", line 288, in wrapper
fn()
File "test_c10d.py", line 789, in test_broadcast_checks
pg.broadcast([t1], opts)
ValueError: ProcessGroupGloo::broadcast: invalid root rank: -1
Process 1 exited with error code 10 and exception:
Traceback (most recent call last):
File "torch/testing/_internal/common_distributed.py", line 361, in _run
getattr(self, test_name)()
File "torch/testing/_internal/common_distributed.py", line 288, in wrapper
fn()
File "test_c10d.py", line 789, in test_broadcast_checks
pg.broadcast([t1], opts)
ValueError: ProcessGroupGloo::broadcast: invalid root rank: -1
Process 2 exited with error code 10 and exception:
Traceback (most recent call last):
File "torch/testing/_internal/common_distributed.py", line 361, in _run
getattr(self, test_name)()
File "torch/testing/_internal/common_distributed.py", line 288, in wrapper
fn()
File "test_c10d.py", line 789, in test_broadcast_checks
pg.broadcast([t1], opts)
ValueError: ProcessGroupGloo::broadcast: invalid root rank: -1
Process 3 exited with error code 10 and exception:
Traceback (most recent call last):
File "torch/testing/_internal/common_distributed.py", line 361, in _run
getattr(self, test_name)()
File "torch/testing/_internal/common_distributed.py", line 288, in wrapper
fn()
File "test_c10d.py", line 789, in test_broadcast_checks
pg.broadcast([t1], opts)
ValueError: ProcessGroupGloo::broadcast: invalid root rank: -1
```
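Conceptually, the pipe mechanism looks something like this simplified sketch (the real code in torch/testing/_internal/common_distributed.py differs in detail):
```python
import multiprocessing as mp
import traceback

def _run_child(test_fn, error_pipe):
    try:
        test_fn()
    except Exception:
        # write the traceback into the pipe before exiting with an error code
        error_pipe.send(traceback.format_exc())
        raise SystemExit(10)

def run_test(test_fn):
    parent_conn, child_conn = mp.Pipe()
    proc = mp.Process(target=_run_child, args=(test_fn, child_conn))
    proc.start()
    proc.join()
    if proc.exitcode != 0 and parent_conn.poll():
        # surface the child's traceback from the parent process
        raise RuntimeError(
            f"Process exited with error code {proc.exitcode} and exception:\n"
            + parent_conn.recv()
        )
```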
ghstack-source-id: 122273793
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D26589274
fbshipit-source-id: 7b7a71ec790b216a89db7c157377f426531349a5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52391
There are 2 ways DDP can throw the exception refactored here:
1) Unused params in the forward pass. We provide `find_unused_parameters=True` for this.
2) All params used in fwd pass, but not all outputs used in loss computation. There are a few workarounds for this but we do not provide native support.
Previously, these 2 issues were combined into 1 error message, but that has historically resulted in confusion, with users reporting getting this error even when they enable `find_unused_parameters=True` (which they expect to fix this error). As a result, there is additional churn when debugging these issues because the true cause, (1) vs. (2), is not known.
This commit helps to fix the issue by separating out the 2 error messages depending on whether we ran with unused parameter detection or not. Hopefully this makes the error messages much clearer and more actionable.
error msg with `find_unused_params=True`:
```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. Since `find_unused_parameters=True` is enabled, this likely means that not all `forward` outputs participate in computing loss. You can fix this by making sure all `forward` function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
```
error msg without `find_unused_params` specified:
```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
making sure all `forward` function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
```
ghstack-source-id: 122097900
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D26496688
fbshipit-source-id: 4a9eeeda10293da13d94a692d10cb954e4506d7c
Summary:
This PR adds functionality to skip a test based on CUDA version.
This way, we can be more specific when skipping a test, such as when the test only fails for a particular CUDA version.
This allows us to re-enable, for other CUDA versions such as 10.1 and 11.1, the tests that were skipped for CUDA 11.2.
I tested this locally (by using 11.0 instead of 11.2), but will run all the CI to make sure it works.
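For illustration, a sketch of the kind of helper this adds (the decorator name and signature here are assumptions; the real helper may differ):
```python
import unittest

import torch

def skip_if_cuda_version(versions, reason="skipped for this CUDA version"):
    # torch.version.cuda is e.g. "11.2", or None on CPU-only builds
    cuda = torch.version.cuda
    return unittest.skipIf(cuda is not None and cuda in versions, reason)

# usage (hypothetical):
# @skip_if_cuda_version(["11.2"])
# def test_foo(self): ...
```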
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52359
Reviewed By: walterddr
Differential Revision: D26487951
Pulled By: janeyx99
fbshipit-source-id: 45c71cc6105ffd9985054880009cf68ea5ef3f6a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52179
Rename debug to reference. We'll use this to produce a reference quantized model
that can be used as a common interface between the PyTorch quantized model and backends.
Test Plan:
python test/test_quantization.py TestQuantizeFx
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D26424656
fbshipit-source-id: a0299b023f6ba7d98f5750724c517b0ecb987b35
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51386
add stats such as rebuilt bucket stats, unused parameter stats, and performance stats to the ddp logging data
1. GPU time stats are not collected for single process multiple devices in this diff, as that requires events to be created and recorded on multiple devices
2. use the at::cuda::event API for safer calls (see the sketch after this list)
3. events may not be created in the autograd hook if the hook is not triggered by the user's code, e.g., the user runs in non-sync mode in some iterations. So we check whether events were created before synchronizing, and skip invalid results.
4. users may not set the device upfront, so we explicitly set the proper device before creating events in our prepare_forward() and prepare_backward() calls
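For context, a Python-level sketch of the event-based timing pattern involved (the actual change uses the C++ at::cuda::event API; this is an illustration, not the diff's code):
```python
import torch

if torch.cuda.is_available():
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    start.record()
    torch.randn(1024, 1024, device="cuda") @ torch.randn(1024, 1024, device="cuda")
    end.record()

    # synchronize before reading timings; skip if events were never recorded
    torch.cuda.synchronize()
    print(f"elapsed: {start.elapsed_time(end):.3f} ms")
```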
ghstack-source-id: 121933566
Test Plan: unit tests
Reviewed By: SciPioneer
Differential Revision: D26158645
fbshipit-source-id: ce5f15187802eba76accb980449be68902c10178
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52384
Adds a simple UT with unittest that we can modify once we enable DDP backward without requiring all parameters to receive gradients.
ghstack-source-id: 122001930
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D26482479
fbshipit-source-id: c80bdeea7cf9db35390e385084ef28d64ed239eb
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50790.
Added `min()` & `max()` support for `Float16` & `BFloat16`.
CUDA already supported these ops on `Float16`, so the other three combinations had to be enabled.
`OpInfo`s for `min` & `max` were also added, and their sample inputs were removed from `method_tests()`.
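For reference, a quick illustration of the newly enabled combinations (a sketch, assuming a build containing this PR):
```python
import torch

# CPU min()/max() on BFloat16 and Float16, enabled by this PR
x = torch.randn(5).to(torch.bfloat16)
y = torch.randn(5).to(torch.float16)
print(x.min(), x.max())
print(y.min(), y.max())
```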
### MORE INFO
The (slightly) long-term goal is to add dispatch for `min()` & `max()` related operations on CPU & CUDA for `Float16` & `BFloat16`,
wherever they aren't present already:
1. `amin()`
2. `argmax()`
3. `amax()`
4. `argmin()`
5. `torch._aminmax()`
6. `torch.clamp()` on CPU. Was already supported on CUDA
7. `min()` (in this PR)
8. `max()` (in this PR)
9. `minimum()`
10. `maximum()`
I'll submit separate PRs for the other ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51244
Reviewed By: jbschlosser
Differential Revision: D26503455
Pulled By: anjali411
fbshipit-source-id: c32247f214e9272ca2e4322a23337874e737b140
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52264
When CPU fusion is enabled without LLVM support in PyTorch, it causes a huge slowdown (> 50x). This PR makes the LLVM backend the default backend for TE. An error is now reported if CPU fusion is enabled without LLVM support, to avoid this performance regression.
This PR also updates the tests to not use LLVM, so that the old flow continues to be exercised. This is necessary because the tests run in CI do not have LLVM.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52314
Reviewed By: ejguan
Differential Revision: D26491294
Pulled By: navahgar
fbshipit-source-id: 74561db1207da805d6d28039450db046ba2988fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52130
We have patterns like (F.linear, F.relu) which need to match
to (toq.linear_relu). So, we need to match subgraphs.
This PR does the following:
* defines a "subgraph" as (start_node, end_node). The current assumption
is that subgraphs are simple, there is always a path from start_node to
end_node, and we can ignore any non-input args/kwargs of these nodes
for the purposes of matching and copying things. An example one node
subgraph is (F.linear, F.linear). An example two node subgraph
is (F.linear, F.relu).
* changes the matching logic to iterate over subgraphs instead of nodes
* changes the NS core APIs to use subgraph pairs instead of node pairs:
1. for weights, we match on the start node
2. for unshadowed activations, we observe the end nodes
3. for shadowed activations, we copy the subgraph of a to graph c
TODO(before review) write up better, not ready for review yet
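For illustration, the (start_node, end_node) pair could be represented like this (hypothetical class name; the actual NS code may differ):
```python
from typing import NamedTuple

import torch.fx

# hypothetical representation of the subgraph pairs described above
class NSSubgraph(NamedTuple):
    start_node: torch.fx.Node
    end_node: torch.fx.Node

# a one-node subgraph has start_node == end_node (e.g. a lone F.linear call);
# a two-node subgraph spans e.g. an F.linear call into the following F.relu
```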
Test Plan:
TODO before land: better test plan
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D26403092
fbshipit-source-id: e49aaad4b02b8d60589435848bee422b8f41937a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52092
Adds a very simple toy sparsenn model, and enables
its inspection with the new NS APIs.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_sparsenn_compare_activations
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_sparsenn_shadow
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D26403095
fbshipit-source-id: 3c3650aca47186deb32f2b3f1d87a0716d1ad9d1
Summary:
Now that `masked_fill` CUDA is migrated, the skips on `masked_scatter` can be removed.
Reference: https://github.com/pytorch/pytorch/issues/33152
**Note**:
The tensor shapes for `masked_scatter` were decreased from (M, M) to (S, S) and so on.
With shapes of M: **96.53s**
```
test/test_ops.py ........................................ssssssssssss........................ssssssssssss........................ [100%]
=============================================================== 88 passed, 24 skipped, 7981 deselected in 96.53s (0:01:36) ================================================================
```
With shapes of S: **46.53s**
```
test/test_ops.py ........................................ssssssssssss........................ssssssssssss........................ [100%]
==================================================================== 88 passed, 24 skipped, 7981 deselected in 46.53s =====================================================================
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52035
Reviewed By: VitalyFedyunin
Differential Revision: D26369476
Pulled By: anjali411
fbshipit-source-id: 7a79d5a609b0019f8fe9ce6452924dd33390dce1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52377
Add QNNPACK specific packed params for sparse linear.
Add sparse linear dynamic op with appropriate registration.
Add python side LinearDynamic module for sparsity.
Add tests to validate sparse linear qnnpack kernels.
Note that since these tests are mostly run on x86 platforms, and
given that the 1x4 sparse kernels are implemented in both SSE and ARM,
LinearDynamic at the moment defaults to the 1x4 pattern.
The plan is to add another diff that will allow a global override for the 8x1 pattern
so that the prepare/convert flow can work for exporting models for mobile.
Test Plan: buck run caffe2/torch/fb/model_optimization:sparsity_test
Reviewed By: z-a-f
Differential Revision: D26491944
fbshipit-source-id: b98839b4c62664e1fabbb0cbeb2e5c1bd5903b4d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52031
Closes https://github.com/pytorch/pytorch/issues/52020
Ensures that we can profile collectives in DDP by propagating the profiler threadLocalState appropriately. As described in the above issue, this previously wouldn't work because the profiler would only be enabled on the main thread.
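For reference, the kind of end-to-end profiling this enables (a minimal single-process sketch, not this diff's test code):
```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# minimal world_size=1 setup so the example is self-contained
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(8, 8))
with torch.autograd.profiler.profile() as prof:
    model(torch.randn(4, 8)).sum().backward()  # collectives are now recorded too
print(prof.key_averages().table(sort_by="cpu_time_total"))

dist.destroy_process_group()
```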
ghstack-source-id: 121818080
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D26356192
fbshipit-source-id: 0158b5833a3f857a0b4b2943ae3037e9d998dfd1
Summary:
Add QNNPACK specific packed params for sparse linear.
Add sparse linear dynamic op with appropriate registration.
Add python side LinearDynamic module for sparsity.
Add tests to validate sparse linear qnnpack kernels.
Note that since these tests are mostly run on x86 platforms, and
given that the 1x4 sparse kernels are implemented in both SSE and ARM,
LinearDynamic at the moment defaults to the 1x4 pattern.
The plan is to add another diff that will allow a global override for the 8x1 pattern
so that the prepare/convert flow can work for exporting models for mobile.
Test Plan: buck run caffe2/torch/fb/model_optimization:sparsity_test
Reviewed By: z-a-f
Differential Revision: D26263480
fbshipit-source-id: 04ab60aec624d1ecce8cfb38b79c7e94f501cdf6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52302
Adds the basic functionality for the three Numeric Suite core APIs to work on FX models:
1. comparing weights
2. comparing activations, with same input fed to both models
3. comparing activations, with nodes of A shadowing nodes of B
Note: there are a lot of TODOs in the code, and some/most of the APIs and implementation details may change as we iterate. This is just the first PR.
Test Plan:
We have unit test coverage for all of the APIs; for now this is with toy models:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Reviewed By: raghuramank100
Differential Revision: D26463013
Pulled By: vkuzo
fbshipit-source-id: e454115099ad18e4037d3c54986951cdffcab367
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51648
The following code will throw during the call to `traced(5)`:
```python
import torch
import torch.fx as fx
from torch import nn

class M(nn.Module):
    def __init__(self):
        super(M, self).__init__()
        self.W = torch.nn.Parameter(torch.randn(5))

    def forward(self, x):
        return torch.dot(self.W, x)

traced = fx.symbolic_trace(M())
traced(5)
```
Traceback before:
```
Traceback (most recent call last):
File "test/tinytest.py", line 26, in <module>
traced(5)
File "/home/ansley/local/pytorch/torch/fx/graph_module.py", line 338, in wrapped_call
return self._cls_call(self, *args, **kwargs)
File "/home/ansley/local/pytorch/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "<eval_with_key_0>", line 4, in forward
TypeError: dot(): argument 'tensor' (position 2) must be Tensor, not int
```
Traceback after:
```
Traceback (most recent call last):
File "/home/ansley/local/pytorch/torch/fx/graph_module.py", line 338, in wrapped_call
return torch.nn.Module.__call__(self, *args, **kwargs)
File "/home/ansley/local/pytorch/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "<eval_with_key_1>", line 4, in forward
dot_1 = torch.dot(w, x); w = x = None
TypeError: dot(): argument 'tensor' (position 2) must be Tensor, not int
Call using an FX-traced Module, line 4 of the traced Module’s generated forward function:
w = self.W
dot_1 = torch.dot(w, x); w = x = None
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
relu_1 = dot_1.relu(); dot_1 = None
return relu_1
```
(Note that the same `TypeError` is thrown despite modifying the traceback.)
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D26424005
Pulled By: ansley
fbshipit-source-id: 368f46ba81fb3111bd09654825bb2ac5595207d1