Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46755
As reported in https://github.com/pytorch/pytorch/issues/41324, there is a bug in DDP when `find_unused_parameters=True` and 2 or more parameters share the same gradient accumulator.
In the reducer, we currently keep a mapping of grad accumulator to index and populate it with `map[accumulator] = index`, which overwrites the stored index whenever two or more parameters share the same accumulator. To fix this, switch the mapping values to a vector of indices so that all indices sharing an accumulator are retained.
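As an illustration only, here is a rough Python rendering of the mapping change (the actual fix lives in the C++ reducer; the names below are hypothetical):
```
from collections import defaultdict

# Collect every parameter index that shares a gradient accumulator,
# instead of letting the last writer win.
grad_accumulator_to_indices = defaultdict(list)

def register(accumulator, index):
    # Before: map[accumulator] = index  -- later indices overwrote earlier ones.
    # After: keep all indices that share the same accumulator.
    grad_accumulator_to_indices[accumulator].append(index)
```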
ghstack-source-id: 115453567
Test Plan: Added UT
Reviewed By: pritamdamania87
Differential Revision: D24497388
fbshipit-source-id: d32dfa9c5cd0b7a8df13c7873d5d28917b766640
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46773
Changed the constructor of RemoteModule to accept a `remote_device` arg in the following format:
"<workername>/<device>" (e.g., "trainer0/cpu", "ps0/cuda:0")
This arg merges the original `on` and `device` args.
Original PR issue: RemoteDevice Format #46554
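A rough usage sketch of the new format, assuming the import path and constructor signature `torch.distributed.nn.RemoteModule(remote_device, module_cls, args, kwargs)` and an already-initialized RPC framework:
```
import torch.nn as nn
from torch.distributed.nn import RemoteModule

# Assumes torch.distributed.rpc.init_rpc() has been called on all workers.
linear_on_cpu = RemoteModule("trainer0/cpu", nn.Linear, args=(20, 30))
linear_on_gpu = RemoteModule("ps0/cuda:0", nn.Linear, args=(20, 30))
```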
ghstack-source-id: 115448051
Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- RemoteModule
Reviewed By: pritamdamania87
Differential Revision: D24482562
fbshipit-source-id: 5acfc73772576a4b674df27625bf560b8f8e67c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46955
Initially we were thinking of adding an `invalidate_quantized_float_parameters` option to free the memory
of the floating point parameters of quantized modules, but it turns out we will do a module swap, just like in eager mode, for the modules
that are quantized, so the old floating point module will not be referenced after quantization. Therefore this feature
is only needed for functionals; since most people use quantization with modules, we may not need it.
We'll revisit if a need for this comes up.
Test Plan: Imported from OSS
Reviewed By: supriyar
Differential Revision: D24579400
fbshipit-source-id: fbb0e567405dc0604a2089fc001573affdade986
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46897
These APIs implicitly assumed that the GPU index for a rank equals the rank itself, but
that is not necessarily true. For example, the first GPU could be reserved for a
different purpose, and rank 0 could use GPU 1, rank 1 GPU 2, etc. Thus, we
mandate that the user specify the device to use via `torch.cuda.set_device()`
before making calls to this API. This expectation should be okay since we
clearly document it, and we expect the user to set this for
DistributedDataParallel as well.
Also adds/tidies up some documentation.
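A minimal sketch of the expected call order, assuming the NCCL backend and an environment-variable rendezvous; the GPU choice here (`rank + 1`) is an arbitrary illustration of a mapping that is not the rank itself:
```
import torch
import torch.distributed as dist

def setup(rank, world_size):
    # The device must be chosen explicitly; it is no longer inferred from the rank.
    torch.cuda.set_device(rank + 1)
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    dist.barrier()  # collectives now run on the device set above
```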
ghstack-source-id: 115359633
Test Plan: Modified unittests
Reviewed By: divchenko
Differential Revision: D24556177
fbshipit-source-id: 7e826007241eba0fde3019180066ed56faf3c0ca
Summary:
Caffe2 and Torch currently do not have a consistent mechanism for determining whether a kernel has launched successfully. The result is difficult-to-detect or silent errors. This diff provides functionality to fix that. Subsequent diffs on the stack fix the identified issues.
Kernel launch errors may arise if the launch parameters (number of blocks, number of threads, shared memory, or stream id) are invalid for the hardware, or for other reasons. Interestingly, unless these launch errors are specifically checked for, CUDA will silently fail and return garbage answers that can affect downstream computation. Therefore, catching launch errors is important.
Launches are currently checked by placing
```
AT_CUDA_CHECK(cudaGetLastError());
```
somewhere below the kernel launch. This is bad for two reasons.
1. The check may be performed at a site distant to the kernel launch, making debugging difficult.
2. The separation of the launch from the check means that it is difficult for humans and static analyzers to determine whether the check has taken place.
This diff defines a macro:
```
#define TORCH_CUDA_KERNEL_LAUNCH_CHECK() AT_CUDA_CHECK(cudaGetLastError())
```
which clearly indicates the check.
This diff also introduces a new test which analyzes code to identify kernel launches and determines whether the line immediately following the launch contains `TORCH_CUDA_KERNEL_LAUNCH_CHECK();`.
A search of the Caffe2 codebase identifies 104 instances of `AT_CUDA_CHECK(cudaGetLastError());` while the foregoing test identifies 1,467 launches which are not paired with a check. Visual inspection indicates that few of these are false positives, highlighting the need for some sort of static analysis system.
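As a rough Python sketch of the kind of scan the new test performs (a simplification for illustration, not the actual test code):
```
import re

LAUNCH = re.compile(r"<<<.*>>>")          # heuristic: a CUDA triple-chevron launch
CHECK = "TORCH_CUDA_KERNEL_LAUNCH_CHECK();"

def unchecked_launches(lines):
    """Yield 1-based line numbers of launches not immediately followed by the check."""
    for i, line in enumerate(lines):
        if LAUNCH.search(line) and (i + 1 >= len(lines) or CHECK not in lines[i + 1]):
            yield i + 1
```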
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46474
Test Plan:
The new test is run with:
```
buck test //caffe2/test:kernel_launch_checks -- --print-passing-details
```
And should be launched automatically with the other land tests. (TODO: Is it?)
The test is currently set up only to provide warnings but can later be adjusted to require checks.
Otherwise, I rely on the existing test frameworks to ensure that changes resulting from reorganizing existing launch checks don't cause regressions.
Reviewed By: ngimel
Differential Revision: D24309971
Pulled By: r-barnes
fbshipit-source-id: 0dc97984a408138ad06ff2bca86ad17ef2fdf0b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46772
When running `buck run caffe2/benchmarks/operator_benchmark/pt:qactivation_test -- --use_jit`, I encountered the following error: P146518683. The error was traced down to the fact that `torch.allclose` does not work with quantized tensors (the error was triggered by this particular multiplication https://fburl.com/diffusion/8vw647o6, since native mul cannot work with a float scalar and a quantized tensor).
Minimum example to reproduce:
```
(Pdb) input = torch.ones(5)
(Pdb) aa = torch.quantize_per_tensor(input, scale=1.0, zero_point=0, dtype=torch.quint8)
(Pdb) bb = torch.quantize_per_tensor(input, scale=1.0, zero_point=0, dtype=torch.quint8)
(Pdb) torch.allclose(aa, bb)
Comparison exception: promoteTypes with quantized numbers is not handled yet; figure out what the correct rules should be, offending types: QUInt8 Float
```
Here the proposed fix is to compare quantized tensors strictly within `_compare_tensors_internal`.
The other two possible fixes are (a sketch of option 1 and of the strict comparison follows the list):
1. convert quantized tensors to float tensors before sending them to `torch.allclose`
2. change `torch.allclose` to handle quantized tensors.
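A minimal sketch of the two comparison routes, not the exact code added to `_compare_tensors_internal`:
```
import torch

x = torch.ones(5)
aa = torch.quantize_per_tensor(x, scale=1.0, zero_point=0, dtype=torch.quint8)
bb = torch.quantize_per_tensor(x, scale=1.0, zero_point=0, dtype=torch.quint8)

# Option 1 above: dequantize first, then use torch.allclose.
assert torch.allclose(aa.dequantize(), bb.dequantize())

# Strict comparison of the underlying integer representations plus the
# quantization parameters, roughly what "compare strictly" means here.
assert torch.equal(aa.int_repr(), bb.int_repr())
assert aa.q_scale() == bb.q_scale() and aa.q_zero_point() == bb.q_zero_point()
```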
Test Plan: buck run caffe2/benchmarks/operator_benchmark/pt:qactivation_test -- --use_jit
Reviewed By: kimishpatel
Differential Revision: D24506723
fbshipit-source-id: 6426ea2a88854b4fb89abef0edd2b49921283796
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46568
This PR adds support for an RRef.backward() API. This would be useful
in applications like pipeline parallelism as described here:
https://github.com/pytorch/pytorch/issues/44827
This PR only adds support for local RRefs; remote RRef support will be added in
a follow-up PR.
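A minimal local-RRef sketch, assuming the API mirrors `Tensor.backward()` and that `torch.distributed.rpc.init_rpc` has already been called on this worker:
```
import torch
from torch.distributed.rpc import RRef

t = torch.rand(3, 3, requires_grad=True)
rref = RRef(t.sum())   # RRef holding a locally created value
rref.backward()        # expected to populate t.grad, like Tensor.backward()
```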
ghstack-source-id: 115100729
Test Plan:
1) unit tests.
2) waitforbuildbot
Reviewed By: mrshenli
Differential Revision: D24406311
fbshipit-source-id: fb0b4e185d9721bf57f4dea9847e0aaa66b3e513
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41807
Test Plan: Make sure CI tests pass, including the newly written test
Reviewed By: mrshenli
Differential Revision: D22640839
Pulled By: osandoval-fb
fbshipit-source-id: 3ff98d8e8c6e6d08575e307f05b5e159442d7216
Summary:
As per title. Limitations: only for batches of square full-rank matrices.
CC albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46284
Reviewed By: zou3519
Differential Revision: D24448266
Pulled By: albanD
fbshipit-source-id: d98215166268553a648af6bdec5a32ad601b7814
Summary:
Follow-up of https://github.com/pytorch/pytorch/issues/46461 with a similar goal
Makes them more readable and possibly faster. Care has to be taken when choosing the replacement: a list comprehension `[f(x) for x in xs]` builds the whole list immediately, while a generator expression `(f(x) for x in xs)` is evaluated lazily. The lazy form is a benefit in cases where the intermediate list never needs to exist in memory (e.g. when the result is passed to `tuple` or `extend` or `join`)
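A small illustration of the difference (plain Python, nothing PyTorch-specific):
```
xs = ["1", "2", "3"]

as_list = [int(x) for x in xs]    # list comprehension: built eagerly, held in memory
as_gen = (int(x) for x in xs)     # generator expression: items produced on demand

total = sum(as_gen)               # consumed without materializing a list
joined = ", ".join(x.strip() for x in xs)   # join accepts the generator directly
```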
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46462
Reviewed By: zou3519
Differential Revision: D24422343
Pulled By: ezyang
fbshipit-source-id: 252e33499c92ac0b15238f2df32681dbbda2b237
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46372
Currently, in `_run_function`, we catch an exception from the python
function which is run and report it back to the master. However, in some large-scale
training jobs, it would be valuable to also log the error on the trainer
itself for faster debugging.
Test Plan: Added unittest.
Reviewed By: pritamdamania87
Differential Revision: D24324578
fbshipit-source-id: 88460d7599ea69d2c38fd9c10eb6471f7edd4100
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46304
When a single process operates on only one GPU, we can avoid the scatter and
instead replace it with a recursive version of `to` which transfers the input
tensors to the correct device.
The implementation of `_recursive_to` is modeled after `scatter` in https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/scatter_gather.py, in order to keep parity with the previous conventions (i.e. custom types not having their tensors moved).
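A rough sketch of the recursion (not the actual `_recursive_to`); custom types are deliberately left untouched, following the scatter_gather conventions referenced above:
```
import torch

def recursive_to(obj, device):
    if isinstance(obj, torch.Tensor):
        return obj.to(device)
    if isinstance(obj, (list, tuple)):
        # Note: a real implementation also has to handle namedtuples specially.
        return type(obj)(recursive_to(o, device) for o in obj)
    if isinstance(obj, dict):
        return {k: recursive_to(v, device) for k, v in obj.items()}
    return obj  # custom types are returned as-is; tensors inside them are not moved
```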
ghstack-source-id: 114896677
Test Plan: Added unittest, and CI
Reviewed By: pritamdamania87
Differential Revision: D24296377
fbshipit-source-id: 536242da05ecabfcd36dffe14168b1f2cf58ca1d
Summary:
References https://github.com/pytorch/pytorch/issues/42515
> Enable integer -> float unary type promotion for ops like sin
Will follow-up for other such Ops once this PR is merged.
cc: mruberry
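A quick illustration of the promotion described above:
```
import torch

t = torch.tensor([0, 1, 2])   # integer (int64) input
out = torch.sin(t)            # promoted to a floating point result
print(out.dtype)              # torch.float32 under the default dtype
```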
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45733
Reviewed By: zou3519
Differential Revision: D24431194
Pulled By: mruberry
fbshipit-source-id: db600bc5de0e535b538d2aa301c3526b7c75ed17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46601
* except excluded tests and magic methods.
https://github.com/pytorch/pytorch/issues/38731
Previously, we'd only run these tests for inplace operations. Since this adds a lot more tests, the following issues that came up when running them were fixed:
- Updated the schema of conj() to reflect existing behaviour.
- Updated the deepEquals method in check_alias_annotation.cpp to re-use the overloaded == operator. The previous implementation did not cover all types of IValues.
- Corrected the order in which inputs are passed during autograd testing of 'view' & 'reshape'.
- Substituted `aten::ger` with the function it is aliased to, `aten::outer`, for testing. The alias annotation checking code doesn't handle aliased operators properly.
ghstack-source-id: 114830903
Test Plan: Ran all tests in test:jit and verified they pass.
Reviewed By: eellison
Differential Revision: D24424955
fbshipit-source-id: 382d7e2585911b81b1573f21fff1d54a5e9a2054
Summary:
Resolves one item in https://github.com/pytorch/pytorch/issues/46321
This PR sets up DistExamplesTest which will be used as the class to implement future tests for examples. This class is run as part of CI tests. It also creates a dist_examples folder and includes the [batch parameter server example](https://github.com/pytorch/examples/blob/master/distributed/rpc/batch/parameter_server.py), which is slightly modified to allow it to be tested.
Run tests:
```
pytest test/distributed/rpc/test_tensorpipe_agent.py -k test_batch_updating_parameter_server -vs
pytest test/distributed/rpc/test_process_group_agent.py -k test_batch_updating_parameter_server -vs
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46510
Reviewed By: mrshenli
Differential Revision: D24379296
Pulled By: H-Huang
fbshipit-source-id: 1c102041e338b022b7a659a51894422addc0e06f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46337
We plan to pass around the mappings instead of using the global registration API, to keep
the mappings local to the transformations the user is performing
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D24317436
fbshipit-source-id: 81569b88f05eeeaa9595447e482a12827aeb961f
Summary:
The `IN_CIRCLECI` variable is a misnomer since the flag really indicates that we enable XML reporting because we want to run the test in CI. Since this doesn't necessarily mean CircleCI in particular, `IN_CI` is more accurate and general.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46567
Reviewed By: walterddr
Differential Revision: D24407642
Pulled By: janeyx99
fbshipit-source-id: 5e141a0571b914310a174a58ac0fde58e9521c6b
Summary:
According to https://app.circleci.com/pipelines/github/pytorch/pytorch/228154/workflows/31951076-b633-4391-bd0d-b2953c940876/jobs/8290059
TestFFTCUDA.test_fftn_backward_cuda_complex128 takes 242 seconds to finish, with most of the time spent checking the 2nd gradient.
- Refactor the common part of test_fft_backward and test_fftn_backward into _fft_grad_check_helper
- Introduce a `slowAwareTest` decorator
- Split the test into fast and slow parts by checking the 2nd-degree gradient only in the slow part
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46509
Reviewed By: walterddr
Differential Revision: D24378901
Pulled By: malfet
fbshipit-source-id: 606670c2078480219905f63b9b278b835e760a66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45989
This test was failing internally for the Thrift-based RPC agent, since
it has a different error regex. Use `self.get_timeout_error_regex` which gets
the timeout error string for each backend to fix this.
ghstack-source-id: 114463458
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D24170394
fbshipit-source-id: 9b30945e3e30f36472268d042173f8175ad88098
Summary:
Related to https://github.com/pytorch/pytorch/issues/44592
This PR fixes the deprecation warning for the nan_to_num function.
Below is the warning message when building the latest code.
```
../aten/src/ATen/native/UnaryOps.cpp: In function ‘at::Tensor& at::native::nan_to_num_out(at::Tensor&,
const at::Tensor&, c10::optional<double>, c10::optional<double>, c10::optional<double>)’:
../aten/src/ATen/native/UnaryOps.cpp:397:45: warning: ‘bool c10::isIntegralType(c10::ScalarType)’
is deprecated: isIntegralType is deprecated.
Please use the overload with 'includeBool' parameter instead. [-Wdeprecated-declarations]
if (c10::isIntegralType(self.scalar_type())) {
```
The deprecated warning is defined in `ScalarType.h`.
d790ec6de0/c10/core/ScalarType.h (L255-L260)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46309
Reviewed By: mrshenli
Differential Revision: D24310248
Pulled By: heitorschueroff
fbshipit-source-id: 0f9f2ad304eb5a2da9d2b415343f2fc9029037af
Summary:
This PR adds support for complex-valued input for `torch.pinverse`.
It also fixes the CUDA SVD implementation to return singular values with a real dtype.
Fixes https://github.com/pytorch/pytorch/issues/45385.
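A small usage sketch of the new complex support:
```
import torch

a = torch.randn(2, 4, 5, dtype=torch.complex128)   # batch of complex matrices
a_pinv = torch.pinverse(a)

# Moore-Penrose property A @ A+ @ A == A, checked loosely.
assert torch.allclose(a @ a_pinv @ a, a, atol=1e-6)
```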
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45819
Reviewed By: heitorschueroff
Differential Revision: D24306539
Pulled By: anjali411
fbshipit-source-id: 2fe19bc630de528e0643132689e1bc5ffeaa162a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45940
**Summary**
In `try_ann_to_type`, if an annotation has an attribute named
`__torch_script_class__`, it is assumed to be a TorchScript class that
has already been scripted. However, if it is a class that extends
another class, this code path causes a crash because it looks up the
JIT type for the class by name in the compilation unit. This JIT type
obviously cannot exist because inheritance is not supported.
This commit fixes this by looking up the qualified name of a class
in torch.jit._state._script_class in order to ascertain whether it has
already been scripted (instead of looking for a `__torch_script_class__`
attribute on the class object).
**Test Plan**
This commit adds a unit test consisting of the code sample from the
issue that reported this problem.
**Fixes**
This commit fixes #45860.
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D24310027
Pulled By: SplitInfinity
fbshipit-source-id: 9f8225f3316fd50738d98e3544bf5562b16425b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46221
The RPC framework only allowed sending RPCs based on a provided
WorkerInfo or name. When using RPC with DDP, sometimes it might just be easier
to refer to everything in terms of ranks, since DDP doesn't support names yet.
As a result, supporting a `to` parameter in the RPC APIs that also allows
specifying a rank would be helpful.
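A sketch of the rank-based addressing this adds (assuming `rpc.init_rpc` has already been called on every participating worker; previously `to` had to be a worker name or WorkerInfo):
```
import torch
import torch.distributed.rpc as rpc

ret = rpc.rpc_sync(1, torch.add, args=(torch.ones(2), 3))   # `to` given as rank 1
fut = rpc.rpc_async(0, torch.add, args=(torch.ones(2), 3))  # async variant, rank 0
```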
ghstack-source-id: 114207172
Test Plan:
1) waitforbuildbot
2) Unit Tests
Reviewed By: mrshenli
Differential Revision: D24264989
fbshipit-source-id: 5edf5d92e2bd2f213471dfe7c74eebfa9efc9f70
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45994
Send/Recv tests were disabled because of the https://github.com/pytorch/pytorch/issues/42517. With that issue fixed, this diff enables those tests.
ghstack-source-id: 113970569
Test Plan: waitforsandcastle
Reviewed By: jiayisuse
Differential Revision: D24172484
fbshipit-source-id: 7492ee2e9bf88840c0d0086003ce8e99995aeb91
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45933
Occasionally users run DDP with models that have unused params; in this
case we would like to surface an error message telling them to run with
find_unused_parameters=True. However, a recent change to the rebuild_buckets logic (https://github.com/pytorch/pytorch/pull/44798) made
it so that we raise a size mismatch error when this happens, but the
information about unused parameters is more useful and is likely the most
common cause of failure. Prefer raising this error over the
subsequent size mismatch errors.
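A sketch of the situation, assuming the usual process group setup has been done elsewhere; the `unused` layer never contributes to the output, so DDP without `find_unused_parameters=True` would hit the error described above:
```
import torch
import torch.nn as nn

class PartiallyUsed(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 4)
        self.unused = nn.Linear(4, 4)   # never called in forward

    def forward(self, x):
        return self.used(x)

# model = nn.parallel.DistributedDataParallel(
#     PartiallyUsed().cuda(rank), device_ids=[rank], find_unused_parameters=True)
```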
ghstack-source-id: 113914759
Test Plan: Added unittest
Reviewed By: mrshenli
Differential Revision: D24151256
fbshipit-source-id: 5d349a988b4aac7d3e0ef7b3cd84dfdcbe9db675
Summary:
- `TCPStoreTest.test_numkeys_delkeys` takes 5+ min (mostly in an idle wait for a socket timeout)
- `TestDataLoader.test_proper_exit` and `TestDataLoaderPersistentWorkers.test_proper_exit` take 2.5 min each
- `TestXNNPACKConv1dTransformPass.test_conv1d_with_relu_fc` takes 2 min to finish
Add an option to `print_test_stats.py` to skip reporting test classes that run for less than a second, and speed up `TestTorchDeviceTypeCUDA.test_matmul_45724_cuda`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46068
Reviewed By: mruberry
Differential Revision: D24208660
Pulled By: malfet
fbshipit-source-id: 780e0d8be4f0cf69ea28de79e423291a1f3349b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45873
This diff adds support for sending/receiving to/from self. It also fixes a bug where p2p operations are not used by all processes.
ghstack-source-id: 113910526
Test Plan: waitforsandcastle
Reviewed By: jiayisuse
Differential Revision: D24124413
fbshipit-source-id: edccb830757ac64f569e7908fec8cb2b43cd098d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45783
After the previous device maps commits, `pipeWrite` might throw. In
this case, if we increment active calls before `pipeWrite` on the
caller, that active call won't be decremented properly when `pipeWrite`
throws. As a result, `shutdown` can silently time out. I noticed this
as some tests take more than 60s to finish.
This commit extracts the tensor device checking logic out of pipeWrite,
and makes sure the error is thrown before the active call count is
incremented.
Differential Revision: D24094803
Test Plan: Imported from OSS
Reviewed By: mruberry
Pulled By: mrshenli
fbshipit-source-id: d30316bb23d2afd3ba4f5540c3bd94a2ac10969b