Summary:
This PR
* adds the breakpad build to most of the remaining docker images (except the mobile + slim ones)
* pins to a [fork of breakpad](https://github.com/google/breakpad/compare/master...driazati:master?expand=1) to enable daisy chaining of signal handlers
* renames the API to be nicer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59236
Reviewed By: malfet
Differential Revision: D28792511
Pulled By: driazati
fbshipit-source-id: 83723e74b7f0a00e1695210ac2620a0c91ab4bf2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59242
Original PR issue: https://github.com/pytorch/pytorch/issues/58274
This can serve as a workaround: instead of passing a scripted `RemoteModule` over RPC, pass its `module_rref` field over RPC, and then construct a new `RemoteModule` on the receiving end.
ghstack-source-id: 130268018
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_send_remote_module_over_the_wire_script_not_supported
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_remote_module_py_pickle_not_supported_script
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_create_remote_module_by_module_rref
Reviewed By: vipannalla
Differential Revision: D28794905
fbshipit-source-id: 1a677ff0d4b47c078ad47b50d7102a198a1fc39b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59041
Static quantization support for custom modules was removed in a previous refactor
(https://github.com/pytorch/pytorch/pull/57519) since it was not covered by a test case.
This PR re-enables the test case and fixes the support.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724866
fbshipit-source-id: 1974675b88b56a2173daf86965d6f3fb7ebd783b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59040
To remove the Quantizer class and split the prepare and convert functions into separate files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724870
fbshipit-source-id: c0f748711b825cd46bdfcc05c054c77a41e8207a
Summary:
This PR fixes `torch.linalg.inv_ex` with MAGMA backend.
The `info` tensor was returned on the CPU device even for CUDA inputs;
now it is on the same device as the input.
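A small hedged usage example of the fixed behavior (assuming a CUDA build with MAGMA):
```python
import torch

A = torch.randn(3, 3, device="cuda")
inverse, info = torch.linalg.inv_ex(A)
assert info.device == A.device  # previously `info` came back on CPU for MAGMA inputs
```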
Fixes https://github.com/pytorch/pytorch/issues/58769
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59223
Reviewed By: ngimel
Differential Revision: D28814876
Pulled By: mruberry
fbshipit-source-id: f66c6f06fb8bc305cb2e22b08750a25c8888fb65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59039
To remove the Quantizer class and split the prepare and convert functions into separate files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724874
fbshipit-source-id: bd984716b2da1d6879c3e92fa827574783a41567
Summary:
Graphs tests are sometimes flaky in CI ([example](https://app.circleci.com/pipelines/github/pytorch/pytorch/328930/workflows/0311199b-a0be-4802-a286-cf1e73f96c70/jobs/13793451)). When the GPU runs near its max memory capacity (which is not unusual during a long test), the caching allocator may, to satisfy new allocations that don't match any existing unused blocks, call `synchronize_and_free_events` to wait on block end-of-life events and cudaFree unused blocks, then re-cudaMalloc a new block. For ungraphed ops this isn't a problem, but synchronizing or calling cudaFree while capturing is illegal, so `synchronize_and_free_events` raises an error if called during capture.
The graphs tests themselves don't use much memory, so calling torch.cuda.empty_cache() at some point before their captures should ensure memory is available and the captures never need `synchronize_and_free_events`.
I was already calling empty_cache() near the beginning of several graphs tests. This PR extends it to the ones I forgot.
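For illustration, a hedged sketch of the pattern using the current public CUDA Graphs API (the tests themselves may use internal helpers), showing where empty_cache() fits relative to capture:
```python
import torch

static_x = torch.randn(1024, device="cuda")

# Warm up on a side stream, as recommended before graph capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    static_y = torch.relu(static_x)
torch.cuda.current_stream().wait_stream(s)

# Free cached blocks so the capture never needs synchronize_and_free_events/cudaFree.
torch.cuda.empty_cache()

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_y = torch.relu(static_x)

static_x.copy_(torch.randn(1024, device="cuda"))
g.replay()  # static_y now holds relu of the new static_x contents
```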
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59233
Reviewed By: mruberry
Differential Revision: D28816691
Pulled By: ngimel
fbshipit-source-id: 5cd83e48e43b1107daed5cfa2efff0fdb4f99dff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59038
To remove the Quantizer class and split the prepare and convert functions into separate files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724869
fbshipit-source-id: e8501c9720b5ddb654e78bc8fa08de0466c1d52b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59018
Fixes #58044.
This PR:
- adds `ATEN_FN(op)` and `ATEN_FN2(op, overload)` macros that resolve to
a non-overloaded function in aten::_ops that calls the desired operator
(without default arguments).
The motivation for this is two-fold:
1) Using aten operators with templates is hard if the operator is
overloaded (e.g. add.Tensor and add.Scalar).
2) Method-only operators require special handling; pointers-to-method
are different from function pointers. `ATEN_FN2(add_, Tensor)` returns
a function instead of a method.
There is some interesting behavior for out= operations.
`ATEN_FN2(sin, "out")` gives a function that is *faithful* to the schema;
that is, the order of arguments is exactly what it looks like in the
schema. This makes it so that you can directly register
`ATEN_FN2(sin, "out")` (or a function wrapping it using the same signature)
as an override for a DispatchKey.
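For illustration, a hedged C++ sketch of why a non-overloaded callable helps in template code (the include path and exact usage are assumptions based on the codegen output linked below, not code from this PR):
```c++
#include <ATen/ATen.h>
#include <ATen/Operators.h>  // assumed home of the ATEN_FN/ATEN_FN2 macros

template <typename F>
at::Tensor apply_unary(F&& op, const at::Tensor& t) {
  return op(t);
}

at::Tensor example(const at::Tensor& t) {
  // ATEN_FN(sin) names a single concrete function, so it can be passed into a
  // template with no overload resolution or pointer-to-member handling.
  return apply_unary(ATEN_FN(sin), t);
}
```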
Test Plan:
- New tests that ATEN_FN2 works on function and method-only operators
- New test that ATEN_FN works
- New test that ATEN_FN macro returns a "faithful" function.
Codegen output:
Operators.h and Operators.cpp are both here:
https://gist.github.com/zou3519/c2c6a900410b571f0d7d127019ca5175
Reviewed By: bdhirsh
Differential Revision: D28721206
Pulled By: zou3519
fbshipit-source-id: a070017f98e8f4038cb0c64be315eef45d264217
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59254
index_add can take an int or long index tensor, whereas index_put only takes a long index tensor.
In the deterministic path of index_add_cuda we use index_put, so we need to convert the index tensor to long first.
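A hedged Python-level illustration of the constraint (not the internal C++ implementation):
```python
import torch

x = torch.zeros(5, device="cuda")
index = torch.tensor([0, 2, 2], dtype=torch.int32, device="cuda")  # int32 is legal for index_add_
src = torch.ones(3, device="cuda")

x.index_add_(0, index, src)                          # accepts int32 or int64 indices
x.index_put_((index.long(),), src, accumulate=True)  # index_put_ requires int64, hence the conversion
```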
Test Plan:
buck test mode/opt //caffe2/test:torch_cuda -- test_index_add_deterministic
✓ ListingSuccess: caffe2/test:torch_cuda - main (14.748)
✓ Pass: caffe2/test:torch_cuda - test_index_add_deterministic_cuda (test_torch.TestTorchDeviceTypeCUDA) (27.717)
✓ Pass: caffe2/test:torch_cuda - main (27.717)
Reviewed By: ngimel
Differential Revision: D28804038
fbshipit-source-id: de12932a7738f2805f3bceb3ec024497625bce6a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59037
To remove the Quantizer class and split the prepare and convert functions into separate files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724865
fbshipit-source-id: 6c6824d0af7dd47d4c111d6a08e373bc65f33e08
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59036
To remove the Quantizer class and split the prepare and convert functions into separate files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724862
fbshipit-source-id: 5900420127fcc14846bc34c9ac29ff7e6a703f1e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59035
To remove the Quantizer class and split the prepare and convert functions into separate files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724872
fbshipit-source-id: d32752c635917c9820e5e7cc414ba9d48a258a19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59034
To remove the Quantizer class and split the prepare and convert functions into separate files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724873
fbshipit-source-id: 870e0822843ad1d035f41eaa015bdde9ccf6ec23
Summary:
The current implementation of DistributedSampler generates a python list to hold all of the indices, and then returns a slice of this list for the given rank (creating a partial copy of the list). When the underlying dataset is large, both of these choices waste a large amount of memory. It is much more efficient to create a tensor to hold the indices, and then index into that tensor instead of creating slices.
In the case of a sampler with `shuffle=False`, it would be possible to avoid creating the `indices` tensor entirely (since the index will always match the value), but I have opted instead here to keep the implementation as similar to the existing version as possible. One possible benefit of this approach is that memory usage will not significantly change based on changing this parameter. Still, it might be better to simply return the indices directly without the underlying array.
Additionally, the logic around calculating the number of samples is unnecessarily complex. When dropping the last batch, this can be a simple floor division.
In a simple test script which creates a sampler for a dataset with 100,000,000 items, memory usage is reduced by 98% compared to the existing implementation.
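A hedged sketch of the idea (not the exact DistributedSampler code), assuming drop_last=True for simplicity:
```python
import torch

def rank_indices(dataset_len, num_replicas, rank, shuffle=True, seed=0, epoch=0):
    # Hold all indices in a single tensor instead of a Python list.
    if shuffle:
        g = torch.Generator()
        g.manual_seed(seed + epoch)
        indices = torch.randperm(dataset_len, generator=g)
    else:
        indices = torch.arange(dataset_len)
    # With drop_last, the per-rank sample count is a simple floor division.
    num_samples = dataset_len // num_replicas
    total_size = num_samples * num_replicas
    # Strided indexing into the tensor avoids building a partial list copy per rank.
    return indices[rank:total_size:num_replicas]
```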
Fixes https://github.com/pytorch/pytorch/issues/45427
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51841
Reviewed By: albanD
Differential Revision: D28240105
Pulled By: rohan-varma
fbshipit-source-id: 4c6aa493d0f75c07ec14c98791b3a531300fb1db
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57878.
This adds `NCCL_ASYNC_ERROR_HANDLING` as a DDP relevant environment variable and includes a check for that variable in the test `test_dump_DDP_relevant_env_vars()`. Notably, the modified test now checks for the new variable but does not check for any of the other previously-existing relevant environment variables that were not already tested for (e.g. `NCCL_BLOCKING_WAIT`).
The change was tested via the following on an AI AWS cluster:
`WORLD_SIZE=2 BACKEND=nccl gpurun pytest test/distributed/test_distributed_spawn.py -k test_dump_DDP_relevant_env_vars -vs`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59109
Reviewed By: H-Huang, SciPioneer
Differential Revision: D28761148
Pulled By: andwgu
fbshipit-source-id: 7be4820e61a670b001408d0dd273f65029b1d2fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59033
To remove the Quantizer class and split the prepare and convert functions into separate files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724861
fbshipit-source-id: 97b38e851b6bf581510a24636b1d8d6f1d977f5a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59032
To remove the Quantizer class and split the prepare and convert functions into separate files
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724868
fbshipit-source-id: 6df639f20076b480812b6dcf0fc7d2c87ca29d8b
Summary:
Related Issue: https://github.com/pytorch/pytorch/issues/57691
This PR introduces an API for checking environment variables:
```c++
optional<bool> check_env(const char *name)
```
Reads the environment variable `name` and returns
- `optional<true>` if it is set to "1"
- `optional<false>` if it is set to "0"
- `nullopt` otherwise
A warning is issued if the environment variable is set to any value other than 0 or 1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59052
Test Plan:
Manually run the following test case:
- Apply this diff to the repo
```
diff --git a/torch/csrc/Exceptions.cpp b/torch/csrc/Exceptions.cpp
index d008643f70..990d254f0d 100644
--- a/torch/csrc/Exceptions.cpp
+++ b/torch/csrc/Exceptions.cpp
@@ -9,6 +9,9 @@
#include <torch/csrc/THP.h>
+#include <c10/util/Optional.h>
+#include <c10/util/env.h>
+
// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
PyObject *THPException_FatalError;
@@ -23,18 +26,7 @@ bool THPException_init(PyObject *module)
namespace torch {
static bool compute_cpp_stack_traces_enabled() {
- auto envar = std::getenv("TORCH_SHOW_CPP_STACKTRACES");
- if (envar) {
- if (strcmp(envar, "0") == 0) {
- return false;
- }
- if (strcmp(envar, "1") == 0) {
- return true;
- }
- TORCH_WARN("ignoring invalid value for TORCH_SHOW_CPP_STACKTRACES: ", envar,
- " valid values are 0 or 1.");
- }
- return false;
+ return c10::utils::check_env("TORCH_SHOW_CPP_STACKTRACES").value_or(false);
}
bool get_cpp_stacktraces_enabled() {
```
This patch replaces the prior `std::getenv` usage in `torch/csrc/Exceptions.cpp` with the new API.
- Run the following python3 script
```python
import torch
print(torch.__version__) # should print local version (not release)
a1 = torch.tensor([1,2,3])
a2 = torch.tensor([2])
a1 @ a2
```
using the following commands
```bash
python3 test.py # should not output CPP trace
TORCH_SHOW_CPP_STACKTRACES=1 python3 test.py # should output CPP trace
```
Reviewed By: ngimel
Differential Revision: D28799873
Pulled By: 1ntEgr8
fbshipit-source-id: 3e23353f48679ba8ce0364c049420ba4ff86ff09
Summary:
There are two main changes here:
- THPVariable will actually visit its grad_fn if there are no other references to the C++ Tensor and no other references to the grad_fn. The critical observation compared to the existing comment (thanks Ed!) is that if we also check that the C++ Tensor object is not referenced anywhere else, we're sure that no one can change the grad_fn refcount between the traverse and the clear.
- THPVariable doesn't need a special clear for these new cases, as we're the only owner of the C++ Tensor, so cdata.reset() will necessarily free the Tensor and all of its resources.
The two tests are to ensure:
- That the cycles are indeed collectible by the gc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58271
Reviewed By: ngimel
Differential Revision: D28796461
Pulled By: albanD
fbshipit-source-id: 62c05930ddd0c48422c79b03118db41a73c1355d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59028
Previously we had an env and a quant_env in convert, which was a bit confusing;
in this PR we merge them into a single Dict[str, Tuple[Node, torch.dtype]].
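A hedged sketch of the merged mapping's shape (the names below are illustrative, not the exact internal variable names):
```python
from typing import Dict, Tuple

import torch
from torch.fx import Node

# Before: one mapping for fp32 nodes and a separate quant_env for quantized nodes.
# After: a single mapping from node name to (converted node, dtype it produces).
env: Dict[str, Tuple[Node, torch.dtype]] = {}

def load_arg(name: str, expected_dtype: torch.dtype) -> Node:
    node, dtype = env[name]
    assert dtype == expected_dtype, f"{name} produces {dtype}, expected {expected_dtype}"
    return node
```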
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28724863
fbshipit-source-id: 722a682c70d300a6ccd2b988786a1ac2d45e880e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59106
Should make debugging a bit easier
Test Plan:
Example error in https://www.internalfb.com/intern/aibench/details/884106485190261 (open log for Portal or Portal+):
```
The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__/torch/backends/_nnapi/prepare.py", line 29, in forward
_0 = uninitialized(__torch__.torch.classes._nnapi.Compilation)
if torch.__is__(self.comp, None):
_1 = (self).init(args, )
~~~~~~~~~~ <--- HERE
else:
pass
File "code/__torch__/torch/backends/_nnapi/prepare.py", line 97, in init
comp = __torch__.torch.classes._nnapi.Compilation.__new__(__torch__.torch.classes._nnapi.Compilation)
_22 = (comp).__init__()
_23 = (comp).init(self.ser_model, self.weights, )
~~~~~~~~~~ <--- HERE
self.comp = comp
return None
Traceback of TorchScript, original code (most recent call last):
File "/data/users/dhaziza/fbsource/fbcode/buck-out/dev/gen/mobile-vision/d2go/projects/facegen/tools/export_to_app#link-tree/torch/backends/_nnapi/prepare.py", line 47, in forward
def forward(self, args: List[torch.Tensor]) -> List[torch.Tensor]:
if self.comp is None:
self.init(args)
~~~~~~~~~ <--- HERE
comp = self.comp
assert comp is not None
File "/data/users/dhaziza/fbsource/fbcode/buck-out/dev/gen/mobile-vision/d2go/projects/facegen/tools/export_to_app#link-tree/torch/backends/_nnapi/prepare.py", line 42, in init
self.weights = [w.contiguous() for w in self.weights]
comp = torch.classes._nnapi.Compilation()
comp.init(self.ser_model, self.weights)
~~~~~~~~~ <--- HERE
self.comp = comp
RuntimeError: [enforce fail at nnapi_model_loader.cpp:171] result == ANEURALNETWORKS_NO_ERROR. NNAPI returned error: 4
```
Reviewed By: axitkhurana
Differential Revision: D28287450
fbshipit-source-id: ccd10301e1492f8879f9d6dd57b60c4e683ebb9e
Summary:
Closes https://github.com/pytorch/pytorch/issues/24754, closes https://github.com/pytorch/pytorch/issues/24616, closes https://github.com/pytorch/pytorch/issues/50874
This reuses `linalg_vector_norm` to calculate the norms. A new kernel turns the norm into a normalization factor, and the original tensor is then multiplied using a normal broadcasted `mul` operator. The result is less code and better performance to boot.
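A hedged Python-level sketch of the approach (the actual change lives in the C++ kernels):
```python
import torch

def normalize_sketch(x, p=2.0, dim=1, eps=1e-12):
    # Step 1: compute the vector norm along `dim` (reusing linalg_vector_norm).
    norm = torch.linalg.vector_norm(x, ord=p, dim=dim, keepdim=True)
    # Step 2: turn the norm into a normalization factor (the new kernel's job).
    factor = 1.0 / norm.clamp_min(eps)
    # Step 3: an ordinary broadcasted multiply finishes the normalization.
    return x * factor
```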
#### Benchmarks (CPU):
| Shape | Dim | Before | After (1 thread) | After (8 threads) |
|:------------:|:---:|--------:|-----------------:|------------------:|
| (10, 10, 10) | 0 | 11.6 us | 4.2 us | 4.2 us |
| | 1 | 14.3 us | 5.2 us | 5.2 us |
| | 2 | 12.7 us | 4.6 us | 4.6 us |
| (50, 50, 50) | 0 | 330 us | 120 us | 24.4 us |
| | 1 | 350 us | 135 us | 28.2 us |
| | 2 | 417 us | 130 us | 24.4 us |
#### Benchmarks (CUDA)
| Shape | Dim | Before | After |
|:------------:|:---:|--------:|--------:|
| (10, 10, 10) | 0 | 12.5 us | 12.1 us |
| | 1 | 13.1 us | 12.2 us |
| | 2 | 13.1 us | 11.8 us |
| (50, 50, 50) | 0 | 33.7 us | 11.6 us |
| | 1 | 36.5 us | 15.8 us |
| | 2 | 41.1 us | 15 us |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59108
Reviewed By: mrshenli
Differential Revision: D28767060
Pulled By: ngimel
fbshipit-source-id: 93dcbe5483f71cc6a6444fbd5b1aa1f29975d857
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57508
Earlier, a few CUDA `gradgrad` checks (see the list of ops below) were disabled because they were too slow. There have been improvements (see https://github.com/pytorch/pytorch/issues/57508 for reference), and this PR aimed at:
1. Measuring the time taken by the `gradgrad` checks on CUDA for the ops listed below.
2. Re-enabling the tests if the times are reasonable.
Ops considered: `addbmm, baddbmm, bmm, cholesky, symeig, inverse, linalg.cholesky, linalg.cholesky_ex, linalg.eigh, linalg.qr, lu, qr, solve, triangular_solve, linalg.pinv, svd, linalg.svd, pinverse, linalg.householder_product, linalg.solve`.
For timing numbers from a separate CI run, see https://github.com/pytorch/pytorch/pull/57802#issuecomment-836169691.
cc: mruberry albanD pmeier
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57802
Reviewed By: ngimel
Differential Revision: D28784106
Pulled By: mruberry
fbshipit-source-id: 9b15238319f143c59f83d500e831d66d98542ff8