Commit Graph

40733 Commits

Author SHA1 Message Date
Luca Wehrstedt
201174cb91 Revert D31389480: [pytorch][PR] Allow external CUDA streams to be set as current
Test Plan: revert-hammer

Differential Revision:
D31389480 (61f0bb70c1)

Original commit changeset: 2b2f40e5452c

fbshipit-source-id: c6631e51abcf3819732f981f646cb77b91569c7d
2021-10-08 09:20:24 -07:00
Rohan Varma
b72a1782d8 [PG Wrapper][BE] Add collective information when monitored barrier error is (#66167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66167

Sometimes, due to desync, we see the PG wrapper monitored barrier fail. In
this case it is useful to print information about the collective that was
trying to run along with the actual error.
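As a rough illustration (not code from this diff), this is the kind of barrier whose failure now carries collective info; a minimal sketch, assuming a gloo process group has already been initialized (monitored_barrier is gloo-only):

```
import datetime
import torch.distributed as dist

# sketch: every rank calls the monitored barrier; if a rank fails to reach it
# in time, the error raised on the reporting rank names the missing ranks
dist.monitored_barrier(timeout=datetime.timedelta(seconds=10))
```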
ghstack-source-id: 140037653

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D31353021

fbshipit-source-id: e2a515326c9314c98119978d5566eb5431cca96c
2021-10-08 09:14:24 -07:00
Rohan Varma
b5b1d49a66 [PG Wrapper][BE] Make some methods private (#66166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66166

These methods should be private.
ghstack-source-id: 139782587

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D31353020

fbshipit-source-id: 583fb315cc2cacc37df3d29cd5793b42558930b3
2021-10-08 09:13:02 -07:00
Peter Bell
0cad2c0615 Move intraop_launch_future from Parallel.h (#64166)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64166

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728585

Pulled By: dagitses

fbshipit-source-id: 75a41418ae9218bec9bac27597051295222b6eee
2021-10-08 09:07:35 -07:00
Scott Wolchok
2d885ab73d [jit] Reduce refcounting of Types (#65345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65345

FooType::get() can return a const reference. Inconveniently, converting shared_ptr<FooType> to shared_ptr<Type> requires a copy & refcount bump, so to properly take advantage of this in unshapedType() we need to take a const Type& in isSubtypeOf(), which is good practice anyway -- don't require a shared_ptr if you don't need to take ownership.
ghstack-source-id: 140044165

Test Plan:
CI

perf says c10::unshapedType time decreased from 2.8% to 2.2% during static runtime startup, though I expect this to be generally beneficial.

Reviewed By: hlu1

Differential Revision: D31027361

fbshipit-source-id: 676feb81db9f74ad7b8651d8774f4ecb4cfa6ab8
2021-10-08 09:03:04 -07:00
Scott Wolchok
1ae468a484 [jit] Refcounting spot fixes (#65346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65346

Tidying up the top sources of reference count decrements seen during static runtime startup.
ghstack-source-id: 140027349

Test Plan:
CI

perf now shows under 2% of time spent in ~__shared_count instead of about 5%.

Reviewed By: suo

Differential Revision: D31057277

fbshipit-source-id: 9a16daf2e655fda80d4ec21290b30f02ba63d8da
2021-10-08 08:39:20 -07:00
Kevin Tse
8ebe1a924d [DataPipe] moving mux IterDataPipe test to the right location (#66277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66277

Previously, it was grouped together with tests related to `MapDataPipe`, but it should be with `IterDataPipe`.
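For context, `mux` interleaves elements from its input IterDataPipes round-robin; a small sketch (module path assumed for this era of torch.utils.data):

```
from torch.utils.data.datapipes.iter import IterableWrapper

dp1 = IterableWrapper([0, 1, 2])
dp2 = IterableWrapper([10, 11, 12])
# one element from each input in turn
print(list(dp1.mux(dp2)))  # [0, 10, 1, 11, 2, 12]
```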

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31485823

Pulled By: NivekT

fbshipit-source-id: d13d8c28cbfc305da0e3033d4109a0f971281a02
2021-10-08 08:32:29 -07:00
Kevin Tse
ed17851642 [DataPipe] adding test for IterableWrapperIterDataPipe (#66276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66276

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31485824

Pulled By: NivekT

fbshipit-source-id: c7b21636e4b17e264bfb5dbea69cd3c477472f0b
2021-10-08 08:32:26 -07:00
Kevin Tse
e808e3d3d6 [DataPipe] adding SequenceWrapperMapDataPipe (#66275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66275

Once this is added to Core, TorchData's PR will not need a custom class and can use this wrapper instead.
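A minimal sketch of how such a wrapper is typically used (class location assumed):

```
from torch.utils.data.datapipes.map import SequenceWrapper

# wraps any sequence into a MapDataPipe with __getitem__/__len__
dp = SequenceWrapper(['a', 'b', 'c'])
print(len(dp), dp[1])  # 3 b
```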

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31485822

Pulled By: NivekT

fbshipit-source-id: 790de27629c89c0ca7163a8ee5a09ee8b8233340
2021-10-08 08:32:24 -07:00
Vasiliy Kuznetsov
a7cc07f109 quantized embedding: make error message clearer (#66051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66051

Make the error message clearer when quantized embedding is converted
with an unsupported dtype. This is helpful when debugging quantization
errors on new models.

Test Plan:
```
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(1, 1)

m = M().eval()
m.qconfig = torch.quantization.QConfig(
    activation=torch.quantization.MinMaxObserver.with_args(dtype=torch.qint8),
    weight=torch.quantization.MinMaxObserver.with_args(dtype=torch.qint8))
m.embedding.qconfig = m.qconfig
mp = torch.quantization.prepare(m)
mq = torch.quantization.convert(mp)
# error message now includes the incorrect dtype
```

Imported from OSS

Reviewed By: dagitses

Differential Revision: D31472848

fbshipit-source-id: 86f6d90bc0ad611aa9d1bdae24497bc6f3d2acaa
2021-10-08 08:32:22 -07:00
Vasiliy Kuznetsov
c9aba3b128 make error message when trying to quantize non floats more specific (#66050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66050

Adds the dtype to an error message when trying to quantize something
other than a float.  This is useful for debugging quantization tools on
new models.

Test Plan:
```
import torch

x = torch.randn(1, 1, 1, 1, dtype=torch.double)
xq = torch.quantize_per_tensor(x, 0.01, 0, torch.quint8)
# error message now includes Double
```

Imported from OSS

Reviewed By: dagitses

Differential Revision: D31472849

fbshipit-source-id: 2331ffacefcbc6f8eca79694757d740de74a0f1d
2021-10-08 08:32:19 -07:00
Vasiliy Kuznetsov
81660c08f0 quantized add: enable broadcasting (#66049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66049

Enables quantized add with broadcasting. As pointed out by jamesr66a,
this was disabled but TensorIterator already supports it. Added a test
case to verify.
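A rough sketch of the newly supported pattern (output scale/zero_point here are arbitrary):

```
import torch

a = torch.randn(2, 3)
b = torch.randn(3)  # broadcastable against `a`
qa = torch.quantize_per_tensor(a, scale=0.1, zero_point=0, dtype=torch.quint8)
qb = torch.quantize_per_tensor(b, scale=0.1, zero_point=0, dtype=torch.quint8)
# broadcasting across the two quantized inputs now works
qc = torch.ops.quantized.add(qa, qb, 0.1, 0)
print(qc.shape)  # torch.Size([2, 3])
```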

Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_qadd_broadcast
```

Imported from OSS

Reviewed By: dagitses

Differential Revision: D31472850

fbshipit-source-id: a3b16d9000487918db743525d22db6864330762b
2021-10-08 08:31:07 -07:00
Edward Yang
ece0221854 Rename int to long, add more C++ types. (#66108)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66108

BC-breaking change: intT is now longT (which aligns it more accurately with how
the types are referred to in C++).  The benefit is that we can idiomatically
express all C++ dtypes (with intT now mapping to int32_t).  These types are needed
for ufunc codegen in a later patch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31385761

Pulled By: ezyang

fbshipit-source-id: ec6f3a0953794313470dbe14911f23ac116be425
2021-10-08 08:25:06 -07:00
Edward Yang
11bc435622 Allow registration of custom symbolics for prim namespace (#64460) (#66139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66139

[ONNX] Add prim::PythonOp check back in export.cpp (#64944)

Add prim::PythonOp check back in export.cpp

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31424102

fbshipit-source-id: 6d2eef767fab846ed79ea509e97b714072bac9f4

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-10-08 07:41:06 -07:00
Edward Yang
9b09a5f7ba [ONNX] Enable scripting tests (#64780) (#66138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66138

* Scripting tests

* Fixed scripting tests for lower opsets

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31424099

fbshipit-source-id: 67095b7ac67b9da986961788392aa92c95cf11f2
2021-10-08 07:41:03 -07:00
Edward Yang
53fefaa916 [ONNX] Fix duplicated output same name case (#64190) (#66137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66137

* Fix the issue where duplicated output nodes share the same output name.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31424100

fbshipit-source-id: b1b06a92c51744030788b651f3a597d987a8deda

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-10-08 07:41:01 -07:00
BowenBao
4af47eb3a7 [ONNX] Update slice process shape to support rank only inference (#65782) (#66149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66149

The updated logic can infer the rank of a slice output when only the rank of the slice input is known. This enables cases where `ConstantValueMap::HasRank(input)` is `True` while `ConstantValueMap::HasShape(input)` is `False`.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31423232

Pulled By: ezyang

fbshipit-source-id: 516e3916aa71afda2b10e44620636e42ed837236

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-10-08 07:39:40 -07:00
Richard Zou
dc37547c44 Opinfos for avg_pooling (#64214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64214

Added OpInfos for:
- F.adaptive_avg_pool{1, 3}d
- F.avg_pool{1, 3}d

The 2d variants already had OpInfos.
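For reference, a sketch of the functional calls the new OpInfos cover:

```
import torch
import torch.nn.functional as F

x1 = torch.randn(1, 4, 8)
print(F.avg_pool1d(x1, kernel_size=2).shape)           # (1, 4, 4)
print(F.adaptive_avg_pool1d(x1, output_size=2).shape)  # (1, 4, 2)

x3 = torch.randn(1, 4, 8, 8, 8)
print(F.avg_pool3d(x3, kernel_size=2).shape)           # (1, 4, 4, 4, 4)
print(F.adaptive_avg_pool3d(x3, output_size=2).shape)  # (1, 4, 2, 2, 2)
```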

Test Plan: - run tests

Reviewed By: albanD, mruberry

Differential Revision: D30667797

Pulled By: zou3519

fbshipit-source-id: 53f5cd02070de5b7db4abb017d727376b59288df
2021-10-08 07:26:08 -07:00
Jeeja KP
8d6d448238 Add HPU for Autograd Fallback (#65605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65605

Reviewed By: albanD

Differential Revision: D31373899

Pulled By: ezyang

fbshipit-source-id: 894f62dc44b0532f152dc97b839eecfbaed25e8c
2021-10-08 07:21:44 -07:00
Ankita Sharma
4af913a7cf fixed minor issues for index_add in docs (#65806)
Summary:
Hi, I'm looking forward to contributing to PyTorch, so starting with a minor fix in the documentation for `index_add`.

Currently, in the documentation for `index_add_` (please see https://pytorch.org/docs/master/generated/torch.Tensor.index_add_.html#torch.Tensor.index_add_):

1. The `tensor` attribute was pointing to the `torch.tensor` class, which IMO (though it may not be a big deal) is unintentional.
2. The `dim` attribute is pointing to `torch.Tensor.dim`, which again IMO is unintentional.

This PR suggests a correction for the first point above: rename the `tensor` attribute to `input` so that it doesn't point to the `torch.tensor` class. (I've verified that other ops like `scatter` use `input`, so this should not break consistency in the documentation.) I couldn't find an appropriate fix for the second point, since renaming `dim` to something else would break consistency (almost all other ops in PyTorch use `dim` as the attribute name).
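For reference, the documented call being discussed (standard `index_add_` usage, not part of this PR):

```
import torch

x = torch.ones(5, 3)
index = torch.tensor([0, 4, 2])
source = torch.arange(9, dtype=torch.float).reshape(3, 3)
# row i of `source` is added to row index[i] of `x`
x.index_add_(0, index, source)
```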

I may be wrong here, so please let me know if there is any feedback or an alternate fix for this.

_Note:_ I plan to fix this behavior for `index_copy_` (https://pytorch.org/docs/master/generated/torch.Tensor.index_copy_.html#torch.Tensor.index_copy_) once and if this PR is approved.

To the reviewers, please help me tag the correct person who could help review this PR.

cc: krshrimali mruberry zou3519

cc brianjo mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65806

Reviewed By: dagitses, mruberry

Differential Revision: D31431182

Pulled By: zou3519

fbshipit-source-id: 66ced9677ac3bc71d672d13366f9f567ecea0a2d
2021-10-08 07:17:15 -07:00
Luca Wehrstedt
61f0bb70c1 Allow external CUDA streams to be set as current (#65914)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65822.
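A rough sketch of what this enables; `raw_ptr` is a hypothetical cudaStream_t address that would in practice come from another library (e.g. CuPy):

```
import torch

# `raw_ptr` is a hypothetical raw CUDA stream pointer owned by another library
ext = torch.cuda.ExternalStream(raw_ptr)
with torch.cuda.stream(ext):  # making it the current stream is what this PR allows
    y = torch.ones(4, device='cuda') * 2
```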

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65914

Reviewed By: dagitses

Differential Revision: D31389480

Pulled By: lw

fbshipit-source-id: 2b2f40e5452c5b2a0b9f0f705750d2aa9deb2ead
2021-10-08 06:09:32 -07:00
Shiyan Deng
60fe854f9f [fx2trt] save and load TRTModule for OSS (#65958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65958

zhxchen17 added a `pickle` pybind for the trt engine which allows us to save and load an nn.Module with a trt engine in fbcode. This diff, though, explicitly serializes/deserializes the engine in `__set_state__` and `__get_state__` so that in OSS people can also save and load TRTModule directly.
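A minimal sketch of the round trip this enables, assuming `trt_mod` is an existing TRTModule:

```
import torch

torch.save(trt_mod, "trt_mod.pt")    # engine serialized via __get_state__
reloaded = torch.load("trt_mod.pt")  # engine rebuilt via __set_state__
```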

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fx2trt

Reviewed By: wushirong

Differential Revision: D31309429

fbshipit-source-id: 9068e2ae6375ed0e1bb55b0e9d582b8d9c049dbf
2021-10-07 22:27:40 -07:00
jiej
321345d7c9 Revert "Revert D31227448: [pytorch][PR] fixing sorting in stride indices" (#66176)
Summary:
enabling https://github.com/pytorch/pytorch/issues/63940

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66176

Reviewed By: ngimel

Differential Revision: D31423920

Pulled By: dzhulgakov

fbshipit-source-id: 06b1e0f757f4fb5b31ee1fa464bcd689df919b9c
2021-10-07 22:09:07 -07:00
Shiyan Deng
74477ba243 [fx2trt] More controls over output dtypes (#65959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65959

Give more control over the output dtype of a trt engine. Previously the output would be fp16 if we turned on fp16_mode. This diff allows the engine to generate fp32 output with fp16_mode=True.

Test Plan: CI

Reviewed By: kflu, wushirong

Differential Revision: D31243929

fbshipit-source-id: 09c752e6f382d6ad169da66878d9a9277c134869
2021-10-07 22:03:51 -07:00
CodemodService FBSourceClangFormatLinterBot
227f91e72d [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D31495160

fbshipit-source-id: b0a56003a6695989dff0d325cdc118182662ec61
2021-10-07 21:09:22 -07:00
Ben Koopman
a58ff186e8 [quant][embedding qat] Add basic EmbeddingBag QAT fakeQuant workflow (#65443)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65443

Test Plan: Imported from OSS

Reviewed By: dagitses, supriyar

Differential Revision: D31456445

Pulled By: b-koopman

fbshipit-source-id: 0edda6e272d9005fce65f2ba6a5e6abc831836de
2021-10-07 20:19:29 -07:00
Dhruv Matani
64caee1356 [PyTorch Edge] Leave out field for debug_handle if not being built with eager symbolication support (#66131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66131

Turns out that a model with 72k instructions causes about 0.5MiB of additional memory overhead (if there's an 8 byte memory overhead per instruction). This is not necessary if we're building w/o eager symbolication support. This change eliminates the 8 byte `debug_handle` if the build is w/o eager symbolication support.
ghstack-source-id: 140045478

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck build -c "pt.enable_eager_symbolication"=1 //xplat/caffe2/fb/lite_predictor:lite_predictor
buck build //xplat/caffe2/fb/lite_predictor:lite_predictor
```

Reviewed By: kimishpatel

Differential Revision: D31387784

fbshipit-source-id: af56787ad833b990a46b79ab021e512edaa22143
2021-10-07 20:01:18 -07:00
Nikita Shulga
ebe530a9cd Periodic jobs should not have CIFLOW_DEFAULT label (#66300)
Summary:
Noticed that the `periodic-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-slow-gradcheck` job has a `ciflow/default` label, but does not have a `ciflow/scheduled` label.
Added asserts to enforce that jobs with a non-trivial is_scheduled property do not have the default label and do have the scheduled label.

Rename `periodic-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-slow-gradcheck` to `periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66300

Reviewed By: seemethere

Differential Revision: D31493323

Pulled By: malfet

fbshipit-source-id: 194c1d7a4e659847d94a547b87a0d7d08e66406d
2021-10-07 19:57:32 -07:00
Peter Bell
bd9eee4e65 TBB: Use static partitioner to match OpenMP scheduling (#65327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65327

Should fix https://github.com/pytorch/pytorch/issues/64571

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31474116

Pulled By: malfet

fbshipit-source-id: 8c4264d4778c6caf58261e3f70d72decd134128d
2021-10-07 19:12:36 -07:00
Nikita Shulga
d5033410b1 Parallel: Deduplicate parallel functions in different backends (#65326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65326

parallel_for and parallel_reduce currently share some common code in
all backends, specifically for detecting if it should run in parallel
or not. This moves all the backend-specific code into a single
`internal::invoke_parallel` function and makes the `parallel_`
functions common to all backends.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D31124495

fbshipit-source-id: 65c3d2af42a8860cc4d6349566085c9fa8d8c6f0
2021-10-07 19:11:19 -07:00
Nikita Shulga
e1817d895f [BE] Cleanup python_function.cpp (#66296)
Summary:
- Delete unused `var_input_idx`
- Fix `uninitialized variable` clang-tidy warning by initializing `PyObject* input` to `Py_None`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66296

Reviewed By: janeyx99

Differential Revision: D31491016

Pulled By: malfet

fbshipit-source-id: 08267144be0cd049d122580cdf81cf586c3e30a6
2021-10-07 18:41:17 -07:00
Eli Uriegas
ca363d1e22 docker: Ensure libgnutls30 for all docker builds (#66258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66258

Installing libgnutls30 has been shown to help when confronted with the
certificate issue related to deb.nodesource.com.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31477789

Pulled By: seemethere

fbshipit-source-id: f87ae4c098771acc505db14e3982d8858cf7326f
2021-10-07 18:36:40 -07:00
Rohan Varma
38f5144eae Fix https://github.com/pytorch/pytorch/issues/61982 (#66015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66015

Fixes https://github.com/pytorch/pytorch/issues/61982 by cloning tensors in
DDPSink. This only applies once for static_graph, and generally for unused
params, which already carry overhead, so the perf hit should not be an issue. Will
verify with a benchmark.
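For context, a minimal static-graph DDP sketch in which outputs flow through DDPSink (assumes a process group is already initialized and `rank` is set; `_set_static_graph` is an internal API of this era):

```
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

model = DDP(torch.nn.Linear(4, 4).to(rank), device_ids=[rank])
model._set_static_graph()  # static-graph mode routes outputs through DDPSink
out = model(torch.randn(2, 4, device=rank)).sum()
out.backward()
```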

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D31346633

fbshipit-source-id: 5b9245ade628565cffe01731f6a0dcbb6126029b
2021-10-07 18:11:18 -07:00
Peter Bell
20f2e55d4f Rename cuda/Resize.cu to cuda/Resize.cpp (#65943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65943

These files don't require nvcc to compile.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31386277

Pulled By: ngimel

fbshipit-source-id: 1066ee87fa795e2c7969447fbce1fe2633fb9680
2021-10-07 16:37:51 -07:00
Ashish Solanki
86de09e49a Upgrade to ubuntu:trusty-20190515 (#63468)
Summary:
Security Upgrade to ubuntu:trusty-20190515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63468

Reviewed By: ngimel

Differential Revision: D31393552

Pulled By: malfet

fbshipit-source-id: 4e2399e3cddc1d549c08c82c08015e00569c19bc
2021-10-07 16:28:08 -07:00
Don Jang
416f593080 [Static Runtime] Group graph nodes into input aliases & output aliases (#65517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65517

This change retrofits `GetAlwaysAliveValues` into `ValueGroup` to group the values used by a graph into three groups as follows:

- input_aliases: values that are either inputs or contain aliases of inputs or constants.
- output_aliases: values that are either outputs or contain aliases of outputs, and are not in input_aliases.
- Values that don't show up in input_aliases or output_aliases are internally created and consumed within the graph.

`output_aliases` is the only new group introduced by this change; a following diff will use it to preallocate output Tensors to improve Static Runtime's performance.

Test Plan: Added `ValueGroup.Init` to cover the updated code path. Note that there was no test for `GetAlwaysAliveValues` before.

Reviewed By: hlu1

Differential Revision: D30940955

fbshipit-source-id: 2cb065ecda0f447a61e64a7cf70cc7c6947f7dfc
2021-10-07 14:35:12 -07:00
Mikayla Gawarecki
0e2d1b221a [Bootcamp][Pytorch Core] Add testing for complex non-vanilla SGD
Summary: Adding a test to ensure non-vanilla SGD behaves as if complex numbers were two real numbers in R^2, as per issue 65711 on GitHub.
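A minimal sketch of the behavior under test (momentum SGD on a complex parameter, treated like a point in R^2):

```
import torch

p = torch.randn(3, dtype=torch.cfloat, requires_grad=True)
opt = torch.optim.SGD([p], lr=0.1, momentum=0.9)
loss = (p.real ** 2 + p.imag ** 2).sum()  # real-valued loss
loss.backward()
opt.step()  # update should match running SGD on torch.view_as_real(p)
```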

Test Plan:
```buck test mode/dev caffe2/test:optim -- 'test_sgd_complex'```

https://pxl.cl/1QLxw

Reviewed By: albanD

Differential Revision: D31477212

fbshipit-source-id: 500678e561a05ac96759223b4c87a37cab26c6a6
2021-10-07 14:07:39 -07:00
Shunting Zhang
5e7d8ec846 Support Registering a Variable Length List of Builtin Modules for torch::deploy Builtin Libraries (#66021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66021

A builtin library consists of a list of frozen modules and a list of builtin modules. For tensorrt, it's quite simple since we only have a single builtin module, tensorrt.tensorrt. But it can be complex for libraries like numpy, which contain multiple builtin modules (np.core._multiarray_umath, np.random.mtrand, etc.), if we want to add them as torch::deploy builtins. We enhance the macro that registers builtin libraries to accept a variable-length list of builtin modules. We can use this macro to register frozentorch, frozenpython, and tensorrt for now, and can also use it to register libraries like numpy later on.

The enhanced macro now looks as follows. Although we don't need to worry about backward compatibility for now, this enhanced version is fully compatible with the previous one: the previous version is just a special case where the library contains no builtin modules.

 ```
REGISTER_TORCH_DEPLOY_BUILTIN(library_name_without_quote, frozen_modules_list,
    builtin_module_name_1, builtin_module_init_function_1, ...,
    builtin_module_name_N, builtin_module_init_function_N)
```
ghstack-source-id: 140007970

Test Plan:
1. Play around with interactive_embedded_interpreter.cpp to import torch._C, tensorrt.tensorrt etc inside the embedded interpreter.
2. Enhance test_builtin_registry.cpp
3. Run test_deploy.cpp and test_deploy_gpu.cpp

Reviewed By: suo

Differential Revision: D31349390

fbshipit-source-id: 70a1fcf660341180fc4d5195aed15ceb07c2bef7
2021-10-07 13:23:46 -07:00
Raghavan Raman
40dd2711b6 [Static Runtime] Cleanup LLVMCodeGen memory after code gen completes (#66218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66218

This stack of diffs reduces the memory used by LLVMCodeGen object.

Here are the numbers on model `294738512`: (this is the number reported as `Memory turnover after freeze_module:` in the output)

```
Before: 123343496
After : 121566008
```

So there is a reduction of about 1.77MB with this change of making `PytorchLLVMJIT` a singleton.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM, hlu1

Differential Revision: D31445798

Pulled By: navahgar

fbshipit-source-id: c860d36456b2c5d3e21010c1217e2948326f666d
2021-10-07 13:17:13 -07:00
Raghavan Raman
7e5ef5e517 [nnc] Added a cache to use singleton instances of PytorchLLVMJIT for every triple,cpu,attrs combination (#66217)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66217

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31445797

Pulled By: navahgar

fbshipit-source-id: 4e1450100928132ccce4ef3c6c20ad6661cfabed
2021-10-07 13:17:11 -07:00
Raghavan Raman
c30dc52739 [nnc] Use given kernel function name while emitting code (#66216)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66216

Test Plan: Imported from OSS

Reviewed By: dagitses, priyaramani

Differential Revision: D31445799

Pulled By: navahgar

fbshipit-source-id: 8d164209831339d364710b14f6a263a16e108281
2021-10-07 13:15:46 -07:00
Bin Wen
3cc40253d9 add gather to ShardedTensor (#65671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65671

Tentative implementation that uses dist.gather_object to collect shards from all ranks and then "merge" them. The merge is done on dst_rank by padding the sharded tensors to the size of the full tensor based on their metadata (offsets, lengths), and then summing these padded tensors together.

Also considered concatenating sharded tensors without padding to minimize memory footprint (assuming padding increases memory). But it may not be flexible enough for arbitrary sharding (e.g. sharding along multiple dimensions).

Another way would be to construct the padded tensor on each rank and reduce to rank 0. I feel this is the easiest implementation, but it would incur higher memory usage and comm payload. Please let me know if this alternative is preferred.
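A hypothetical sketch of the new call, with names taken from this diff's test (`st` is a ShardedTensor, `rank` is this process's rank):

```
import torch

# runs on every rank; only the destination rank needs an output buffer
full = torch.empty(st.size()) if rank == 0 else None
st.gather(dst=0, out=full)  # rank 0 now holds the re-assembled full tensor
```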

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23

Test Plan:
Imported from OSS

  python test/distributed/_sharded_tensor/test_sharded_tensor.py -v -k test_gather

Did not manage to test on OSS, but tested in fbcode by reserving an on-demand GPU.

  arc patch D31197611

Modified the test to use 2 GPUs, as the on-demand GPU only has 2 cores (D31227986).

   buck test -c fbcode.enable_gpu_sections=true mode/dev-nosan caffe2/test/distributed/_sharded_tensor:sharded_tensor -- test_gather

   buck-out/gen/caffe2/test/distributed/_sharded_tensor/sharded_tensor#binary.par  test_sharded_tensor.TestShardedTensorChunked.test_gather

{F667213605}

Reviewed By: dagitses, pritamdamania87

Differential Revision: D31197611

Pulled By: dracifer

fbshipit-source-id: cf98b4a2d7838b11b9582eb23f826bb0fa38a7f4
2021-10-07 13:01:12 -07:00
Peter Bell
f445ed19b2 OpInfo for 2d fft functions (#66128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66128

cc mruberry peterbell10

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31450217

Pulled By: mruberry

fbshipit-source-id: 1952fc60c5d5f454966c43f5710b8b97a9794d0e
2021-10-07 12:50:06 -07:00
Peter Bell
2213c463ba C++ API and docs for hfftn (#66127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66127

cc mruberry peterbell10

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31450216

Pulled By: mruberry

fbshipit-source-id: 2878aee294aa7d74482b66d536258bac0541408d
2021-10-07 12:48:36 -07:00
Peter Bell
e6a4f746c2 slow_conv3d: Use at::sum for grad_bias accumulation (#65758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65758

The same change has been made in conv2d; the proper algorithm is both
faster and more precise.
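The accumulation in question is equivalent to summing grad_output over every dimension except channels; a small sketch:

```
import torch

grad_output = torch.randn(2, 3, 4, 5, 6)       # N, C, D, H, W
grad_bias = grad_output.sum(dim=(0, 2, 3, 4))  # one value per output channel
print(grad_bias.shape)                         # torch.Size([3])
```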

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31257872

Pulled By: ngimel

fbshipit-source-id: 6ff3a7a00a05b66f83d45cc820bd0c230cb8de6d
2021-10-07 12:20:49 -07:00
Ivan Yashchuk
2e4e5b0264 Add inplace_variant for resize_ OpInfo (#66135)
Summary:
Enable testing of `torch.Tensor.resize_`.
The negative view test is skipped because it doesn't work with resize_; see
https://github.com/pytorch/pytorch/issues/65945.

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66135

Reviewed By: dagitses

Differential Revision: D31444263

Pulled By: mruberry

fbshipit-source-id: 00c7fe05df28fba01508b31adb3ed4fdcf4d0326
2021-10-07 12:00:30 -07:00
Samuel Salas
361b34eb81 Chunk: acc_ops (#66010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66010

Added chunk acc op and unit test.

Removed misleading return statements.

Test Plan: buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer

Reviewed By: 842974287

Differential Revision: D31326490

fbshipit-source-id: 81183ad8773eb7471566bec07cdd3dd6c4cee217
2021-10-07 11:41:00 -07:00
Patrick Spencer
9fb6ba24e7 Update torch.fx.passes.split_module docstring (#65542)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65542

Add docstring for torch.fx.passes.split_module that conforms to Google Python Style conventions.

Changed original example to the example from this diff:
https://www.internalfb.com/diff/D24925283 (9734c042b8)
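For reference, a small usage sketch of the documented function (the partitioning policy here is arbitrary):

```
import torch
from torch.fx import symbolic_trace
from torch.fx.passes.split_module import split_module

class M(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1

traced = symbolic_trace(M())

def split_callback(node):
    # arbitrary policy: put every node in partition 0
    return 0

split = split_module(traced, M(), split_callback)
```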

Test Plan:
Ran buck test //caffe2/test:fx. No errors detected
https://pxl.cl/1QCch

Reviewed By: jamesr66a

Differential Revision: D31145694

fbshipit-source-id: 8e54f3b1be3dca1c4d414fdeeab71b9f2b5d9f3e
2021-10-07 10:37:10 -07:00
Mike Iovine
d5f64afc38 [Static Runtime] Support aten::to.prim_dtype overload (#64928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64928

Added support this overload of `aten::to`:
```
aten::to.prim_dtype(Tensor(a) self, int? dtype, bool non_blocking=False, bool copy=False) -> Tensor(a|b)
```

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- IndividualOps_to`

Reviewed By: hlu1

Differential Revision: D30901398

fbshipit-source-id: 38ce807c30185e92dd472b404b362f22ac7e4efb
2021-10-07 10:22:44 -07:00
Will Constable
a8c0b362ce [pytorch][PR] Add hash and int128 utils for Lazy Tensor Core" (#66181)
Summary:
These utils are prerequisites for the Lazy Node base class.
- set up new torch/csrc/lazy, test/cpp/lazy dirs
- add source files to build_variables.bzl in new lazy_core_sources var
- create new test_lazy binary

Fixes https://github.com/pytorch/pytorch/issues/65636

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66181

Original commit changeset: 3d0d5377d71e

Test Plan:
Run PyTorch XLA corresponding PR in XLA CI:
https://github.com/pytorch/xla/pull/3148/files

Reviewed By: suo

Differential Revision: D31416438

fbshipit-source-id: 58a6a49c5bc30134bc6bae2e42778f359b9a8f40
2021-10-07 10:05:26 -07:00