Commit Graph

37279 Commits

Author SHA1 Message Date
Kushashwa Ravi Shrimali
44c20ce676 Alias for i0 to special namespace (#59141)
Summary:
See https://github.com/pytorch/pytorch/issues/50345

cc: mruberry kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59141

Reviewed By: ngimel

Differential Revision: D28784097

Pulled By: mruberry

fbshipit-source-id: 9b61a21906ef337292686fd40e328502a79e6f09
2021-06-01 23:04:09 -07:00
driazati
059a717c9e Fix breakpad build and add to more images (#59236)
Summary:
This PR
* adds the breakpad build to most of the remaining docker images (except the mobile + slim ones)
* pins to a [fork of breakpad](https://github.com/google/breakpad/compare/master...driazati:master?expand=1) to enable daisy-chaining of signal handlers
* renames the API to be nicer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59236

Reviewed By: malfet

Differential Revision: D28792511

Pulled By: driazati

fbshipit-source-id: 83723e74b7f0a00e1695210ac2620a0c91ab4bf2
2021-06-01 22:47:14 -07:00
Yi Wang
dbe629c51d [RPC Framework] Support creating a RemoteModule by RRef (#59242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59242

Original PR Issue: https://github.com/pytorch/pytorch/issues/58274

This can be a workaround: Instead of passing a script `RemoteModule` over RPC, pass its `module_rref` field over RPC, and then construct a new `RemoteModule` on the receiver end.
ghstack-source-id: 130268018

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_send_remote_module_over_the_wire_script_not_supported

buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_remote_module_py_pickle_not_supported_script

buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_create_remote_module_by_module_rref

Reviewed By: vipannalla

Differential Revision: D28794905

fbshipit-source-id: 1a677ff0d4b47c078ad47b50d7102a198a1fc39b
2021-06-01 22:35:03 -07:00
Jerry Zhang
3218d890dd [quant][graphmode][fx][fix] Fix support for custom module (#59041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59041

Static quantization support for custom modules was removed in a previous refactor
(https://github.com/pytorch/pytorch/pull/57519) since it was not covered by the test case.
This PR re-enables the test case and fixes the support.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724866

fbshipit-source-id: 1974675b88b56a2173daf86965d6f3fb7ebd783b
2021-06-01 22:31:15 -07:00
Jerry Zhang
06af7618e7 [quant][graphmode][fx][refactor] Remove Quantizer class from convert (QuantizeHandler) (#59040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59040

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724870

fbshipit-source-id: c0f748711b825cd46bdfcc05c054c77a41e8207a
2021-06-01 22:00:49 -07:00
Philip Meier
0a26781966 fix numpy compatibility in test for torch.kthvalue (#59214)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59201. Should be merged after https://github.com/pytorch/pytorch/issues/59067 to ensure this is actually working correctly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59214

Reviewed By: albanD

Differential Revision: D28792363

Pulled By: mruberry

fbshipit-source-id: 0cf613463139352906fb567f1efcc582c2c25de8
2021-06-01 21:57:09 -07:00
Ivan Yashchuk
e9e1bb1a4e Fix device of info tensor for torch.linalg.inv_ex with MAGMA backend (#59223)
Summary:
This PR fixes `torch.linalg.inv_ex` with the MAGMA backend.
The `info` tensor was returned on the CPU even for CUDA inputs.
Now it is on the same device as the input.
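A small CPU-side illustration of the contract the fix restores (a sketch; the original bug only manifested for CUDA inputs on the MAGMA path):

```python
import torch

# inv_ex returns (inverse, info); after the fix, `info` lives on the
# same device as the input instead of always coming back on the CPU.
A = torch.eye(3)
A_inv, info = torch.linalg.inv_ex(A)
assert info.device == A.device   # info matches the input's device
assert info.item() == 0          # info == 0: the inversion succeeded
```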

Fixes https://github.com/pytorch/pytorch/issues/58769

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59223

Reviewed By: ngimel

Differential Revision: D28814876

Pulled By: mruberry

fbshipit-source-id: f66c6f06fb8bc305cb2e22b08750a25c8888fb65
2021-06-01 21:49:57 -07:00
Jerry Zhang
50e6ee3ca2 [quant][graphmode][fx][refactor] Remove Quantizer class from quantize_node (#59039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59039

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724874

fbshipit-source-id: bd984716b2da1d6879c3e92fa827574783a41567
2021-06-01 21:40:08 -07:00
Alexander
2d8f0d966f CUDA support in the CSR layout: CUDA addmm/matvec (#59012)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59012

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28719631

Pulled By: bhosmer

fbshipit-source-id: 43e2004a61e114aeb0a7c6ad8a25fedda238c6da
2021-06-01 21:16:42 -07:00
Michael Carilli
3efefc4016 [CUDA graphs] Makes sure all graphs tests call empty_cache() at some point before capture (#59233)
Summary:
Graphs tests are sometimes flaky in CI ([example](https://app.circleci.com/pipelines/github/pytorch/pytorch/328930/workflows/0311199b-a0be-4802-a286-cf1e73f96c70/jobs/13793451)) because the GPU can run near its max memory capacity, which is not unusual during a long test. In that state, to satisfy new allocations that don't match any existing unused blocks, the caching allocator may call `synchronize_and_free_events` to wait on block end-of-life events and cudaFree unused blocks, then re-cudaMalloc a new block. For ungraphed ops this isn't a problem, but synchronizing or calling cudaFree while capturing is illegal, so `synchronize_and_free_events` raises an error if called during capture.

The graphs tests themselves don't use much memory, so calling torch.cuda.empty_cache() at some point before their captures should ensure memory is available and the captures never need `synchronize_and_free_events`.

I was already calling empty_cache() near the beginning of several graphs tests. This PR extends it to the ones I forgot.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59233

Reviewed By: mruberry

Differential Revision: D28816691

Pulled By: ngimel

fbshipit-source-id: 5cd83e48e43b1107daed5cfa2efff0fdb4f99dff
2021-06-01 21:05:46 -07:00
Jerry Zhang
1d37f41567 [quant][graphmode][fx][refactor] Remove _prepare from Quantizer class (#59038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59038

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724869

fbshipit-source-id: e8501c9720b5ddb654e78bc8fa08de0466c1d52b
2021-06-01 18:01:22 -07:00
Richard Zou
970096b624 [Reland] Adds an aten::_ops namespace with unambiguous function names (#59018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59018

Fixes #58044.

This PR:
- adds `ATEN_FN(op)` and `ATEN_FN2(op, overload)` macros that resolve to
a non-overloaded function in aten::_ops that calls the desired operator
(without default arguments).

The motivation for this is two-fold:
1) Using aten operators with templates is hard if the operator is
overloaded (e.g. add.Tensor and add.Scalar).
2) Method-only operators require special handling; pointers-to-method
are different from function pointers. `ATEN_FN2(add_, Tensor)` returns
a function instead of a method.

There is some interesting behavior for out= operations.
`ATEN_FN2(sin, "out")` gives a function that is *faithful* to the schema;
that is, the order of arguments is exactly what it looks like in the
schema. This makes it so that you can directly register
`ATEN_FN2(sin, "out")` (or a function wrapping it using the same signature)
as an override for a DispatchKey.

Test Plan:
- New tests that ATEN_FN2 works on function and method-only operators
- New test that ATEN_FN works
- New test that ATEN_FN macro returns a "faithful" function.

Codegen output:
Operators.h and Operators.cpp are both here:
https://gist.github.com/zou3519/c2c6a900410b571f0d7d127019ca5175

Reviewed By: bdhirsh

Differential Revision: D28721206

Pulled By: zou3519

fbshipit-source-id: a070017f98e8f4038cb0c64be315eef45d264217
2021-06-01 17:19:06 -07:00
Yu Guo
8805093ec5 use long index type for index_add_cuda deterministic path (#59254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59254

index_add can take an int or long index tensor, whereas index_put only takes long indices.

The deterministic path of index_add_cuda uses index_put, so we should convert the index tensor to long first.
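A small CPU sketch of the dtype contract described above (illustrative; the actual fix is on the CUDA deterministic path):

```python
import torch

# index_add_ accepts int32 or int64 indices, but index_put_ (which the
# deterministic CUDA path delegates to) requires int64, so int32 indices
# must be converted to long before the delegation.
x = torch.zeros(5)
src = torch.ones(2)
idx32 = torch.tensor([1, 3], dtype=torch.int32)
x.index_add_(0, idx32, src)            # int32 indices are fine here

y = torch.zeros(5)
idx64 = idx32.long()                   # the conversion the fix performs
y.index_put_((idx64,), src, accumulate=True)
assert torch.equal(x, y)               # both paths accumulate the same result
```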

Test Plan:
buck test mode/opt //caffe2/test:torch_cuda -- test_index_add_deterministic

    ✓ ListingSuccess: caffe2/test:torch_cuda - main (14.748)
    ✓ Pass: caffe2/test:torch_cuda - test_index_add_deterministic_cuda (test_torch.TestTorchDeviceTypeCUDA) (27.717)
    ✓ Pass: caffe2/test:torch_cuda - main (27.717)

Reviewed By: ngimel

Differential Revision: D28804038

fbshipit-source-id: de12932a7738f2805f3bceb3ec024497625bce6a
2021-06-01 16:28:18 -07:00
Jerry Zhang
20348fb32e [quant][graphmode][fx][refactor] Remove find_matches from Quantizer class (#59037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59037

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724865

fbshipit-source-id: 6c6824d0af7dd47d4c111d6a08e373bc65f33e08
2021-06-01 16:07:07 -07:00
Jerry Zhang
7d64fc675b [quant][graphmode][fx][refactor] Remove fold_weights from Quantizer class (#59036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59036

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724862

fbshipit-source-id: 5900420127fcc14846bc34c9ac29ff7e6a703f1e
2021-06-01 15:52:57 -07:00
Thomas J. Fan
8af6281201 DOC Adds register_module_full_backward_hook into docs (#58954)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54443

Adds `register_module_full_backward_hook` into the index so it is rendered in the html docs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58954

Reviewed By: ngimel

Differential Revision: D28801816

Pulled By: jbschlosser

fbshipit-source-id: a2e737fe983e5d7e4e26d7639183bca34b571cb8
2021-06-01 15:47:10 -07:00
Bert Maher
6e7dae9cec [nnc] Enable CPU fusion inside Facebook, take 3 (#59253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59253

Fixed a miscompilation exposed by multithreaded profiling collection; let's try again.
ghstack-source-id: 130286580

Test Plan: servicelab

Reviewed By: navahgar, huiguoo

Differential Revision: D28800692

fbshipit-source-id: d791c3b2ccd75fe5e6eca0859083d4cd67460147
2021-06-01 15:42:22 -07:00
Jerry Zhang
cc4891804c [quant][graphmode][fx][refactor] Remove save_state and restore_state from Quantizer class (#59035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59035

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724872

fbshipit-source-id: d32752c635917c9820e5e7cc414ba9d48a258a19
2021-06-01 15:38:36 -07:00
Elton Chen-Yu Ho
336ac9496f Fix mismatch in README.md Docker Image section (#59199)
Summary:
docker.Makefile has CUDNN_VERSION=8 as the default, but README.md states cuDNN v7.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59199

Reviewed By: mruberry

Differential Revision: D28808611

Pulled By: ngimel

fbshipit-source-id: 96cea32bfe33184b2bff69b7bb7f3e50a2b9c6aa
2021-06-01 15:22:30 -07:00
Jagadish Krishnamoorthy
95c26b2806 [ROCm] disable test test_Conv2d_groups_nobias for ROCm (#59158)
Summary:
Disabling the test since it's failing on ROCm 4.2.

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59158

Reviewed By: mruberry

Differential Revision: D28808953

Pulled By: ngimel

fbshipit-source-id: 134f147ead6dc559d2cde49cf8343cd976e6c224
2021-06-01 15:10:06 -07:00
Jerry Zhang
3d521e8b40 [quant][graphmode][fx][refactor] Remove prepare_custom_config from Quantizer class (#59034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59034

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724873

fbshipit-source-id: 870e0822843ad1d035f41eaa015bdde9ccf6ec23
2021-06-01 14:52:22 -07:00
Rohan Varma
a5dcd3c4b7 Revert D28240105: [pytorch][PR] Fix DistributedSampler mem usage on large datasets
Test Plan: revert-hammer

Differential Revision:
D28240105 (a0ce8da26e)

Original commit changeset: 4c6aa493d0f7

fbshipit-source-id: 8a0e17764c2f26c8316f88ad6c8772b08883ceee
2021-06-01 14:44:23 -07:00
Andrew McCollum
a0ce8da26e Fix DistributedSampler mem usage on large datasets (#51841)
Summary:
The current implementation of DistributedSampler generates a python list to hold all of the indices, and then returns a slice of this list for the given rank (creating a partial copy of the list). When the underlying dataset is large, both of these choices waste a large amount of memory. It is much more efficient to create a tensor to hold the indices, and then index into that tensor instead of creating slices.

In the case of a sampler with `shuffle=False`, it would be possible to avoid creating the `indices` tensor entirely (since the index will always match the value), but I have opted instead here to keep the implementation as similar to the existing version as possible. One possible benefit of this approach is that memory usage will not significantly change based on changing this parameter. Still, it might be better to simply return the indices directly without the underlying array.

Additionally, the logic around calculating the number of samples is unnecessarily complex. When dropping the last batch, this can be a simple floor division.

In a simple test script which creates a sampler for a dataset with 100,000,000 items, memory usage is reduced by 98% compared to the existing implementation.
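The per-rank selection and the simplified sample count can be sketched in plain Python (illustrative names; the actual implementation stores the indices in a tensor and indexes into it):

```python
# num_samples: with drop_last, a simple floor division; otherwise round up
# so that every rank receives the same number of samples (with padding).
def num_samples(dataset_len: int, num_replicas: int, drop_last: bool) -> int:
    if drop_last:
        return dataset_len // num_replicas
    return -(-dataset_len // num_replicas)  # ceiling division

# Each rank takes every num_replicas-th index starting at its own rank,
# indexing lazily instead of materializing and slicing a full Python list.
def rank_indices(dataset_len: int, num_replicas: int, rank: int) -> range:
    return range(rank, dataset_len, num_replicas)
```

For example, with 10 items and 4 replicas, rank 1 sees indices 1, 5, 9.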

Fixes https://github.com/pytorch/pytorch/issues/45427

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51841

Reviewed By: albanD

Differential Revision: D28240105

Pulled By: rohan-varma

fbshipit-source-id: 4c6aa493d0f75c07ec14c98791b3a531300fb1db
2021-06-01 14:15:14 -07:00
Andrew Gu
5a42a97c49 Add NCCL_ASYNC_ERROR_HANDLING as an environment variable (#59109)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57878.

This adds `NCCL_ASYNC_ERROR_HANDLING` as a DDP relevant environment variable and includes a check for that variable in the test `test_dump_DDP_relevant_env_vars()`. Notably, the modified test now checks for the new variable but does not check for any of the other previously-existing relevant environment variables that were not already tested for (e.g. `NCCL_BLOCKING_WAIT`).

The change was tested via the following on an AI AWS cluster:
`WORLD_SIZE=2 BACKEND=nccl gpurun pytest test/distributed/test_distributed_spawn.py -k test_dump_DDP_relevant_env_vars -vs`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59109

Reviewed By: H-Huang, SciPioneer

Differential Revision: D28761148

Pulled By: andwgu

fbshipit-source-id: 7be4820e61a670b001408d0dd273f65029b1d2fe
2021-06-01 14:02:41 -07:00
Thomas J. Fan
5f1117226f DOC Update register_buffer/parameter docstring explaining None (#59015)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/40977

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59015

Reviewed By: ngimel

Differential Revision: D28797948

Pulled By: jbschlosser

fbshipit-source-id: 3bf60af5c1cfc5f1786b4975b48f093391374503
2021-06-01 13:55:07 -07:00
Jerry Zhang
e4b2684331 [quant][graphmode][fx][refactor] Remove patterns from Quantizer class (#59033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59033

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724861

fbshipit-source-id: 97b38e851b6bf581510a24636b1d8d6f1d977f5a
2021-06-01 13:44:08 -07:00
Jerry Zhang
83892c1861 [quant][graphmode][fx][refactor] Remove node_name_to_scope from Quantizer (#59032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59032

To remove Quantizer class and split prepare and convert functions to different files

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724868

fbshipit-source-id: 6df639f20076b480812b6dcf0fc7d2c87ca29d8b
2021-06-01 13:26:09 -07:00
Jerry Zhang
3826f7e8e0 [quant][graphmode][fx][refactor] Remove quantized_graph from Quantizer (#59031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59031

Trying to remove Quantizer class and split prepare and convert code

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724871

fbshipit-source-id: dad0332ba271c4cfb6ec1e8f2036443149b5bea4
2021-06-01 13:01:54 -07:00
Jerry Zhang
1b4586ee20 [quant][gx][graphmode][refactor] Remove modules from Quantizer (#59030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59030

Trying to remove Quantizer class and split prepare and convert code

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724875

fbshipit-source-id: d6610c1d5eb7755331252be9e348a230abf4175c
2021-06-01 12:42:28 -07:00
Elton Leander Pinto
aa857850bb Add check_env, getenv api (#59052)
Summary:
Related Issue: https://github.com/pytorch/pytorch/issues/57691
This PR introduces an API for checking environment variables:

```c++
optional<bool> check_env(const char *name)
```
Reads the environment variable `name` and returns
- `optional<true>` if set equal to "1"
- `optional<false>` if set equal to "0"
- `nullopt` otherwise

Issues a warning if the environment variable was set to any value other than 0 or 1.
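The semantics can be mirrored in Python for illustration (the real API is the C++ `c10::utils::check_env`; this sketch is not part of the PR):

```python
import os
import warnings
from typing import Optional

def check_env(name: str) -> Optional[bool]:
    """Python sketch of the c10::utils::check_env semantics."""
    value = os.environ.get(name)
    if value is None:
        return None          # nullopt: variable not set
    if value == "1":
        return True
    if value == "0":
        return False
    warnings.warn(f"ignoring invalid value for {name}: {value},"
                  " valid values are 0 or 1.")
    return None
```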

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59052

Test Plan:
Manually run the following test case:

- Apply this diff to the repo
```
 diff --git a/torch/csrc/Exceptions.cpp b/torch/csrc/Exceptions.cpp
index d008643f70..990d254f0d 100644
 --- a/torch/csrc/Exceptions.cpp
+++ b/torch/csrc/Exceptions.cpp
@@ -9,6 +9,9 @@

 #include <torch/csrc/THP.h>

+#include <c10/util/Optional.h>
+#include <c10/util/env.h>
+
 // NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
 PyObject *THPException_FatalError;

@@ -23,18 +26,7 @@ bool THPException_init(PyObject *module)
 namespace torch {

 static bool compute_cpp_stack_traces_enabled() {
-  auto envar = std::getenv("TORCH_SHOW_CPP_STACKTRACES");
-  if (envar) {
-    if (strcmp(envar, "0") == 0) {
-      return false;
-    }
-    if (strcmp(envar, "1") == 0) {
-      return true;
-    }
-    TORCH_WARN("ignoring invalid value for TORCH_SHOW_CPP_STACKTRACES: ", envar,
-               " valid values are 0 or 1.");
-  }
-  return false;
+ return c10::utils::check_env("TORCH_SHOW_CPP_STACKTRACES").value_or(false);
 }

 bool get_cpp_stacktraces_enabled() {
```
This patch replaces the prior `std::getenv` usage in `torch/csrc/Exceptions.cpp` to use the new api.
- Run the following python3 script
```python
import torch

print(torch.__version__) # should print local version (not release)

a1 = torch.tensor([1,2,3])
a2 = torch.tensor([2])

a1 @ a2
```
using the following commands
```bash
python3 test.py # should not output CPP trace
TORCH_SHOW_CPP_STACKTRACES=1 python3 test.py # should output CPP trace
```

Reviewed By: ngimel

Differential Revision: D28799873

Pulled By: 1ntEgr8

fbshipit-source-id: 3e23353f48679ba8ce0364c049420ba4ff86ff09
2021-06-01 12:24:14 -07:00
aflah02
fd2a36369a Fixed torch.nn.MultiMarginLoss equation format error (#59188)
Summary:
Removed the extra parenthesis from the right side
Fixes https://github.com/pytorch/pytorch/issues/58634

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59188

Reviewed By: ngimel

Differential Revision: D28797720

Pulled By: jbschlosser

fbshipit-source-id: 47e3084526389e7d1cc17c1a01b253e666c58784
2021-06-01 12:04:34 -07:00
Jack Montgomery
06399d441d Create EngineHolder for serializing and running TRT Engines with PyTorch
Test Plan:
**python tests**
`buck test mode/opt -c python.package_style=inplace -c fbcode.platform=platform009 -c fbcode.enable_gpu_sections=true -j 20 deeplearning/trt/EngineHolder:engine_holder_test`

**python tests to generate test models** (this outputs the jit model files for use with cpp tests)
`buck run mode/opt -c python.package_style=inplace -c fbcode.platform=platform009 -c fbcode.enable_gpu_sections=true -j 20 deeplearning/trt/EngineHolder:engine_holder_generate_test_models`

**cpp tests**
`buck test mode/opt -c python.package_style=inplace -c fbcode.platform=platform009 -c fbcode.enable_gpu_sections=true -j 20 deeplearning/trt/EngineHolder:engine_holder_test_cpp`

**run service locally**

*build service*
`buck build mode/opt-split-dwarf -c fbcode.platform=platform009 -c fbcode.enable_gpu_sections=true -j 20 smart/inference_platform_sp/predictor_gpu:service`

*run service*
`buck-out/gen/smart/inference_platform_sp/predictor_gpu/service --model_dir="/home/jackmontgomery" --model_id=123_0 --pytorch_predictor_use_cuda`

*build requester*
`buck build mode/opt -c python.package_style=inplace -c fbcode.platform=platform009 -c fbcode.enable_gpu_sections=true -j 20 glow/fb/test:invoke_cv_pt_predictor`

*run requester*
`buck-out/gen/glow/fb/test/invoke_cv_pt_predictor.par --model_id=123_0 --port=33131 --host="2401:db00:eef0:1100:3560:0:1c02:2115" --num_parallel_requesters=1`

Reviewed By: 842974287

Differential Revision: D28581591

fbshipit-source-id: 7738b05543c2c840ee6b8f0d4818f21dc7f61b19
2021-06-01 11:41:33 -07:00
albanD
e9e5588588 Improve Tensor traverse to traverse its grad_fn when possible (#58271)
Summary:
There are two main changes here:
- THPVariable will actually visit its grad_fn if there is no other reference to the C++ Tensor and no other reference to the grad_fn. The critical observation compared to the existing comment (thanks Ed!) is that if we also check that the C++ Tensor object is not referenced anywhere else, we're sure that no one can change the grad_fn refcount between the traverse and the clear.
- THPVariable doesn't need a special clear for these new cases: as we're the only owner of the C++ Tensor, the cdata.reset() will necessarily free the Tensor and all its resources.

The two tests are to ensure:
- That the cycles are indeed collectible by the gc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58271

Reviewed By: ngimel

Differential Revision: D28796461

Pulled By: albanD

fbshipit-source-id: 62c05930ddd0c48422c79b03118db41a73c1355d
2021-06-01 10:27:52 -07:00
Your Name
65748f81c9 Un-verbose the build (#59235)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59235

Reviewed By: zou3519

Differential Revision: D28792468

Pulled By: driazati

fbshipit-source-id: 98f730ea0ee28b4b5c13198879bee8f586c0c14c
2021-06-01 10:14:26 -07:00
Jerry Zhang
7523728368 [quant][graphmode][fx] Factor out run_weight_observer (#59029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59029

Trying to remove Quantizer class and split prepare and convert code

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724864

fbshipit-source-id: 67ac5e7eb351970fdf46532c3c2ac6ac831bc697
2021-06-01 10:01:42 -07:00
Jerry Zhang
10fc42eacc [quant][graphmode][fx] Merge quant_env and env (#59028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59028

Previously we had an env and a quant_env in convert, which was a bit confusing;
in this PR we merge them into a single Dict[str, Tuple[Node, torch.dtype]]

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D28724863

fbshipit-source-id: 722a682c70d300a6ccd2b988786a1ac2d45e880e
2021-06-01 09:21:38 -07:00
Luca Wehrstedt
afdfd2288a Revert D28767060: [pytorch][PR] Migrate renorm to ATen (CPU and CUDA)
Test Plan: revert-hammer

Differential Revision:
D28767060 (74ec50893d)

Original commit changeset: 93dcbe5483f7

fbshipit-source-id: ae85d90212df4e6bb3a5da310e97ad1c06aa9a77
2021-06-01 05:15:21 -07:00
Daniel Haziza
0b040e17e5 More user-friendly error messages (#59106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59106

Should make debugging a bit easier

Test Plan:
Example error in https://www.internalfb.com/intern/aibench/details/884106485190261 (open log for Portal or Portal+):
```
The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/torch/backends/_nnapi/prepare.py", line 29, in forward
    _0 = uninitialized(__torch__.torch.classes._nnapi.Compilation)
    if torch.__is__(self.comp, None):
      _1 = (self).init(args, )
            ~~~~~~~~~~ <--- HERE
    else:
      pass
  File "code/__torch__/torch/backends/_nnapi/prepare.py", line 97, in init
    comp = __torch__.torch.classes._nnapi.Compilation.__new__(__torch__.torch.classes._nnapi.Compilation)
    _22 = (comp).__init__()
    _23 = (comp).init(self.ser_model, self.weights, )
           ~~~~~~~~~~ <--- HERE
    self.comp = comp
    return None

Traceback of TorchScript, original code (most recent call last):
  File "/data/users/dhaziza/fbsource/fbcode/buck-out/dev/gen/mobile-vision/d2go/projects/facegen/tools/export_to_app#link-tree/torch/backends/_nnapi/prepare.py", line 47, in forward
    def forward(self, args: List[torch.Tensor]) -> List[torch.Tensor]:
        if self.comp is None:
            self.init(args)
            ~~~~~~~~~ <--- HERE
        comp = self.comp
        assert comp is not None
  File "/data/users/dhaziza/fbsource/fbcode/buck-out/dev/gen/mobile-vision/d2go/projects/facegen/tools/export_to_app#link-tree/torch/backends/_nnapi/prepare.py", line 42, in init
        self.weights = [w.contiguous() for w in self.weights]
        comp = torch.classes._nnapi.Compilation()
        comp.init(self.ser_model, self.weights)
        ~~~~~~~~~ <--- HERE
        self.comp = comp
RuntimeError: [enforce fail at nnapi_model_loader.cpp:171] result == ANEURALNETWORKS_NO_ERROR. NNAPI returned error: 4
```

Reviewed By: axitkhurana

Differential Revision: D28287450

fbshipit-source-id: ccd10301e1492f8879f9d6dd57b60c4e683ebb9e
2021-06-01 02:05:24 -07:00
Oleg Khabinov
cab4849463 [caffe2][glow] Share info about current batch_size (#58902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58902

Pull Request resolved: https://github.com/pytorch/glow/pull/5681

Reviewed By: ChunliF

Differential Revision: D28665162

fbshipit-source-id: 39e173a24ee247bc6fee44009798c74dddb27648
2021-06-01 01:21:42 -07:00
Facebook Community Bot
7fb3385f4b Automated submodule update: FBGEMM (#59170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59170

This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: ffc2e1a91e

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58874

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: hx89

Differential Revision: D28648577

Pulled By: jspark1105

fbshipit-source-id: 0ad1a6fdf27cd3f05f9e342030461cb7caa9986b
2021-05-31 23:18:58 -07:00
Peter Bell
74ec50893d Migrate renorm to ATen (CPU and CUDA) (#59108)
Summary:
Closes https://github.com/pytorch/pytorch/issues/24754, closes https://github.com/pytorch/pytorch/issues/24616, closes https://github.com/pytorch/pytorch/issues/50874

This reuses `linalg_vector_norm` to calculate the norms. I just add a new kernel that turns the norm into a normalization factor, then multiply the original tensor using a normal broadcasted `mul` operator. The result is less code, and better performance to boot.
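The norm-then-broadcast-multiply approach can be sketched in plain Python for the row-wise (dim=0, 2-D) case; the names here are illustrative, not the ATen kernel:

```python
def renorm_rows(matrix, p, maxnorm):
    # Step 1: compute each row's p-norm (the linalg_vector_norm step).
    # Step 2: turn the norm into a normalization factor (the new kernel).
    # Step 3: multiply through, i.e. the broadcasted `mul`.
    out = []
    for row in matrix:
        norm = sum(abs(x) ** p for x in row) ** (1.0 / p)
        factor = maxnorm / norm if norm > maxnorm else 1.0
        out.append([x * factor for x in row])
    return out
```

Rows whose norm already fits under `maxnorm` pass through unchanged; only the over-norm rows are scaled down.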

#### Benchmarks (CPU):
|     Shape    | Dim |  Before | After (1 thread) | After (8 threads) |
|:------------:|:---:|--------:|-----------------:|------------------:|
| (10, 10, 10) | 0   | 11.6 us |           4.2 us |            4.2 us |
|              | 1   | 14.3 us |           5.2 us |            5.2 us |
|              | 2   | 12.7 us |           4.6 us |            4.6 us |
| (50, 50, 50) | 0   |  330 us |           120 us |           24.4 us |
|              | 1   |  350 us |           135 us |           28.2 us |
|              | 2   |  417 us |           130 us |           24.4 us |

#### Benchmarks (CUDA)
|     Shape    | Dim |  Before |   After |
|:------------:|:---:|--------:|--------:|
| (10, 10, 10) | 0   | 12.5 us | 12.1 us |
|              | 1   | 13.1 us | 12.2 us |
|              | 2   | 13.1 us | 11.8 us |
| (50, 50, 50) | 0   | 33.7 us | 11.6 us |
|              | 1   | 36.5 us | 15.8 us |
|              | 2   | 41.1 us |   15 us |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59108

Reviewed By: mrshenli

Differential Revision: D28767060

Pulled By: ngimel

fbshipit-source-id: 93dcbe5483f71cc6a6444fbd5b1aa1f29975d857
2021-05-31 22:38:16 -07:00
kshitij12345
223725cfb0 OpInfo: div - port pending method_tests entry (#59173)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Depends on: https://github.com/pytorch/pytorch/issues/59154

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59173

Reviewed By: ngimel

Differential Revision: D28785178

Pulled By: mruberry

fbshipit-source-id: 902310f2d77e499a2355a23b2d5a8c0b21b8c5bb
2021-05-31 17:32:27 -07:00
Kushashwa Ravi Shrimali
6d45d7a6c3 Enables previously "slow" gradgrad checks on CUDA (#57802)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57508

Earlier, a few CUDA `gradgrad` checks (see the list of ops below) were disabled because they were too slow. There have been improvements since (see https://github.com/pytorch/pytorch/issues/57508 for reference), so this PR:

1. Measures the time taken by `gradgrad` checks on CUDA for the ops listed below.
2. Re-enables the tests where the times sound reasonable.

Ops considered: `addbmm, baddbmm, bmm, cholesky, symeig, inverse, linalg.cholesky, linalg.cholesky_ex, linalg.eigh, linalg.qr, lu, qr, solve, triangular_solve, linalg.pinv, svd, linalg.svd, pinverse, linalg.householder_product, linalg.solve`.

For numbers (on time taken) on a separate CI run: https://github.com/pytorch/pytorch/pull/57802#issuecomment-836169691.

cc: mruberry albanD pmeier

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57802

Reviewed By: ngimel

Differential Revision: D28784106

Pulled By: mruberry

fbshipit-source-id: 9b15238319f143c59f83d500e831d66d98542ff8
2021-05-30 22:16:46 -07:00
krshrimali
ef40757de3 OpInfo: zero_ (#58731)
Summary:
See https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58731

Reviewed By: ngimel

Differential Revision: D28784083

Pulled By: mruberry

fbshipit-source-id: f06de8045afd3728b1fedc014c091d8fd1955a9f
2021-05-30 21:49:29 -07:00
kshitij12345
2aeb16c13a [fix] i1-i1e ROCm failure: mark array as const so that it is available for host and device (#59187)
Summary:
Fix failing ROCm build introduced by https://github.com/pytorch/pytorch/issues/56352

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59187

Reviewed By: ngimel

Differential Revision: D28784072

Pulled By: mruberry

fbshipit-source-id: 36a5bd11ad2fe80a81aae6eb8b21f0901c842ddc
2021-05-30 21:44:54 -07:00
kshitij12345
fea7a79e0b [special] Add ndtr (#58126)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Plot:
![image](https://user-images.githubusercontent.com/19503980/117942099-54efd680-b328-11eb-8948-c3080779ce19.png)
https://colab.research.google.com/drive/1Of67A042rOImj8wrLF_fUTgoy_wVEOZS?usp=sharing

TODO:
* [x] Add docs (https://13385714-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.ndtr)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58126

Reviewed By: anjali411

Differential Revision: D28700957

Pulled By: mruberry

fbshipit-source-id: 5b9991e97ec1e8fd01518cc9d9849108d35fe406
2021-05-30 21:12:04 -07:00
Peter Bell
2a78f6376c TensorIterator: Reduce serial_for_each static overhead (#58909)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58909

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D28776507

Pulled By: ngimel

fbshipit-source-id: 4f0283d03b26aa5785b687b78d77e6b0efcbaf65
2021-05-30 21:08:54 -07:00
kshitij12345
445e838210 OpInfo: resize_, resize_as_ (#59176)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59176

Reviewed By: ngimel

Differential Revision: D28780083

Pulled By: mruberry

fbshipit-source-id: 472584e8faa4cb1031908df097849d2d4167fdf5
2021-05-30 18:53:17 -07:00
kshitij12345
ea465f7378 OpInfo: true_divide and minor fix (#59154)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59154

Reviewed By: ngimel

Differential Revision: D28780115

Pulled By: mruberry

fbshipit-source-id: 91e254698597fa0c7d4df6053ec017a85e180304
2021-05-30 18:35:30 -07:00
Peter Bell
aaccdc3996 SparseCsr: Fix some uses of deprecated Tensor methods (#58990)
Summary:
This fixes some deprecation warnings in the build that were introduced by https://github.com/pytorch/pytorch/issues/58768.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58990

Reviewed By: ngimel

Differential Revision: D28776804

Pulled By: mruberry

fbshipit-source-id: 8abf75ea8f7adca537f9c808e68356829407665e
2021-05-30 03:58:19 -07:00