Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62006
Closes gh-24646, gh-24647
There is no `TensorIterator` equivalent to these kernels so this is just
migrating the existing kernels over to the ATen style.
I've benchmarked for contiguous tensors with this script:
```
import torch
shape = (10, 10, 100, 100)
x = torch.randn(*shape, device='cuda')
w = torch.randn((10, 1, 5, 5), device='cuda')
for _ in range(100):
    torch.nn.functional.conv2d(x, w, groups=10)
```
and similarly for backwards. I see these as the same to within measurement error.
| | Master (us) | This PR (us) |
|------------------:|:-------------------:|:--------------------:|
| Forward | 133.5 | 133.6 |
| Backward (input) | 1,102 | 1,119 |
| Backward (weight) | 2,220 | 2,217 |
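For reference, a hedged sketch of how the same workload could be timed with CUDA events, reusing `x` and `w` from the script above (the numbers here were presumably collected with a profiler, not this exact code):
```
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
torch.cuda.synchronize()
start.record()
for _ in range(100):
    torch.nn.functional.conv2d(x, w, groups=10)
end.record()
torch.cuda.synchronize()
print(end.elapsed_time(start) / 100 * 1000, "us per iteration")  # elapsed_time is in ms
```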
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D29883676
Pulled By: ngimel
fbshipit-source-id: 9b2ac62cdd8a84e1a23ffcd66035b2b2fe2374d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61892
This PR changes is_reference=True for linear to produce a pattern consisting of dequant - float linear - quant instead of a reference linear module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of
convert in the future.
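The produced pattern corresponds to the following (a minimal sketch in eager terms; the actual transformation operates on the FX graph):
```
import torch

def reference_quantized_linear(x_q, float_linear, out_scale, out_zero_point):
    x = x_q.dequantize()               # dequant
    y = float_linear(x)                # float linear
    return torch.quantize_per_tensor(  # quant
        y, out_scale, out_zero_point, torch.quint8)
```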
Test Plan:
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D29810657
fbshipit-source-id: 949615bbc017bc454d81c8a6b2bdec53badaab19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62213
Added sanity checks in preprocess function for Android NNAPI delegate.
`preprocess()` requires some input metadata passed through its `method_compile_spec` function argument.
`preprocess()` now throws specific error messages if it cannot find the correct input arguments.
Example error message:
```
RuntimeError: method_compile_spec does not contain the "forward" key.
method_compile_spec should contain a Tensor or Tensor List which bundles input parameters: shape, dtype, quantization, and dimorder.
For input shapes, use 0 for run/load time flexible input.
method_compile_spec must use the following format: {"forward": {"inputs": at::Tensor}} OR {"forward": {"inputs": c10::List<at::Tensor>}}
```
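For reference, a hedged sketch of input metadata that would satisfy these checks (shapes and dtype are illustrative):
```
import torch

# Per the error message, a 0 in a shape marks a run/load time flexible dim.
inputs = torch.zeros((0, 3, 224, 224), dtype=torch.float32)
method_compile_spec = {"forward": {"inputs": inputs}}
```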
nnapi_backend_preprocess.cpp: contains sanity check implementation
test_backend_nnapi.py: sanity check unit tests
TODO: Using Tensors to pass input parameters is a temporary hack. When a dedicated object is implemented, update the sanity check error message.
ghstack-source-id: 134339282
Test Plan: Ran `python test/test_jit.py TestNnapiBackend` in OSS successfully.
Reviewed By: raziel, iseeyuan
Differential Revision: D29917004
fbshipit-source-id: 0d5c6b35889c556cda905ffc29c25c5422ae9ee4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61871
When set_static_graph=False, the only type of dynamism we really
support in DDP is a dynamic set of unused parameters, which must be explicitly
enabled with find_unused_parameters=True. However, some workflows have a static
set of unused parameters; it would be good to detect this and add it to logging to
identify workflows that are candidates for static graph optimization.
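A minimal sketch of the detection idea (not DDP's actual implementation): record the set of unused parameter indices each iteration and check whether it ever changes.
```
seen_unused_sets = set()

def record_unused(unused_param_indices):
    seen_unused_sets.add(frozenset(unused_param_indices))

def unused_set_is_static():
    # A single distinct set across all iterations means the workflow is a
    # candidate for static graph optimization.
    return len(seen_unused_sets) <= 1
```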
ghstack-source-id: 134371429
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D29773962
fbshipit-source-id: 1f741984c6e6f8e3e55cf69ca719b1e25a485b13
Summary:
Re-land of D29935444
We previously had lots of ops with implementations like this:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = create_empty_like(input_0);
}
...
auto& out = p_node->Output(0);
some_func_out(inputs, out);
```
This would make the output have the correct shape. But it would
also take the dtype of `input_0`, which is not always correct.
This change transforms these blocks to:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = some_func(inputs);
} else {
  ...
  auto& out = p_node->Output(0);
  some_func_out(inputs, out);
}
```
This gives the output the correct shape and dtype.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62267
Reviewed By: ejguan
Differential Revision: D29937253
Pulled By: malfet
fbshipit-source-id: d91ca5d5703490d7d349a1de2ad3bb09b0c33967
Summary:
This PR contains the initial version of `ModuleInfo` for use in testing modules. The design philosophy taken here is to start small and simple and build out / refactor as needed when more test coverage or `ModuleInfo` entries are added. As such, it's not intended for general usage yet. The PR contains the following:
* (new file) `torch/testing/_internal/common_modules.py`
* `ModuleInfo` definition - metadata for each module to use in testing
* `module_db` - the actual `ModuleInfo` database; currently contains entries for two modules
* `ModuleInput` - analogous to `SampleInput` from OpInfo; contains `FunctionInput`s for both constructor and forward pass inputs
* Constructor and forward pass inputs are tied together within a `ModuleInput` because they are likely correlated
* `FunctionInput` - just contains args and kwargs to pass to a function (is there a nicer way to do this?)
* `modules` decorator - analogous to `ops`; specifies a set of modules to run a test over
* Some constants used to keep track of all modules under torch.nn:
* `MODULE_NAMESPACES` - list of all namespaces containing modules
* `MODULE_CLASSES` - list of all module class objects
* `MODULE_CLASS_NAMES` - dict from module class object to nice name (e.g. torch.nn.Linear -> "nn.Linear")
* (new file) `test/test_modules.py`
* Uses the above to define tests over modules
* Currently, there is one test for demonstration, `test_forward`, which instantiates a module, runs its forward pass, and compares the output to a reference, if one is defined (a sketch follows below)
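For illustration, a hedged sketch of how these pieces compose in a test. The decorator and field names follow the description above; the exact `ModuleInfo` accessors (e.g. a `module_inputs` method and a `module_cls` attribute) are assumptions here:
```
from torch.testing._internal.common_utils import TestCase
from torch.testing._internal.common_modules import module_db, modules

class TestModule(TestCase):
    @modules(module_db)
    def test_forward(self, device, dtype, module_info):
        for module_input in module_info.module_inputs(device=device, dtype=dtype):
            ctor = module_input.constructor_input  # FunctionInput for __init__
            fwd = module_input.forward_input       # FunctionInput for forward
            m = module_info.module_cls(*ctor.args, **ctor.kwargs).to(device, dtype)
            out = m(*fwd.args, **fwd.kwargs)
```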
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61935
Reviewed By: mruberry
Differential Revision: D29881832
Pulled By: jbschlosser
fbshipit-source-id: cc05c7d85f190a3aa42d55d4c8b01847d1efd57f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61907
Removing the code for the faulty process group agent since it was replaced by the faulty TensorPipe agent.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D29794666
Pulled By: H-Huang
fbshipit-source-id: 0b35191cc07220b6774ecacc8d004f25fd2e87f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61859
BC-breaking note:
Previously we did not add an observer/fake_quant for the output of add/mul for tensor-scalar operations.
In this PR we add an observer/fake_quant instance (the same one as the input's) to correctly model
the behavior of the quantized add_scalar and mul_scalar ops (since quantized add/mul scalar assumes the
output quantized tensor has the same quantization parameters as the input).
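Concretely, the behavior being modeled (a small sketch; per the note above, the output is assumed to carry the input's quantization parameters):
```
import torch

x = torch.quantize_per_tensor(torch.randn(4), scale=0.1, zero_point=0,
                              dtype=torch.quint8)
y = torch.ops.quantized.add_scalar(x, 3.0)
# The new observer on the add/mul output mirrors the input's observer.
print(y.q_scale(), y.q_zero_point())
```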
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_add
python test/test_quantization.py TestQuantizeFxOps.test_mul
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D29770859
fbshipit-source-id: f43fcbfecd04c392467770b22c481bbbdaf43c25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62079
Adds support for kwarg arguments to the functional optimizer running as a
hook.
ghstack-source-id: 134330379
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D29838127
fbshipit-source-id: 2ab051ef5f0dff19c145ebe2260668b927ba47b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62078
Ensure that kwarg arguments such as momentum and weight decay maintain
parity between optimizer.step and step_param.
ghstack-source-id: 134330377
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D29837942
fbshipit-source-id: 1ae39648fc26aebd8aaef1a7ac0e03b598a8ed60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61756
DDP will support running the optimizer as a communication hook with
optimizers that support a per-parameter/gradient step function, `step_param`.
This adds parity tests, to be extended as more optimizers implementing step_param
are added, to ensure parity with regular optimizers.
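A hedged sketch of the `step_param` shape (illustrative signature; the real functional optimizers live under torch.distributed.optim):
```
import torch

class FunctionalSGD:
    def __init__(self, lr=0.01):
        self.lr = lr

    def step_param(self, param, grad):
        # Update a single parameter as soon as its (communicated) gradient
        # is ready, instead of stepping the whole parameter list at once.
        with torch.no_grad():
            param.add_(grad, alpha=-self.lr)
```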
ghstack-source-id: 134330378
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D29727549
fbshipit-source-id: 18977c896f12b8e478298488b298fd107affcf5f
Summary:
Increase `page_idx` in the loop rather than outside of it.
Break from the loop when receiving an empty response, as it means there are no more items to fetch via the pagination request.
Also, add an option to use a provided GitHub token (via the `GITHUB_TOKEN` environment variable).
Fixes failures with "Rate Limit Exceeded" when doing something like `torch.hub.list("pytorch/test-infra:dsf")`.
Fixes https://github.com/pytorch/pytorch/issues/61755
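The corrected loop shape, as a minimal sketch (hypothetical helper; the real code is in torch/hub.py):
```
import os, json
from urllib.request import Request, urlopen

def fetch_all_pages(base_url):
    token = os.environ.get("GITHUB_TOKEN")
    headers = {"Authorization": f"token {token}"} if token else {}
    results, page_idx = [], 1
    while True:
        with urlopen(Request(f"{base_url}?page={page_idx}", headers=headers)) as r:
            page = json.loads(r.read())
        if not page:       # empty response: nothing more to paginate
            break
        results.extend(page)
        page_idx += 1      # incremented inside the loop
    return results
```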
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62072
Reviewed By: jbschlosser
Differential Revision: D29868539
Pulled By: malfet
fbshipit-source-id: 206082a0ba1208e9b15ff6c9c6cb71d2da74f1c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61933
### Issue:
Submodules with the same name are not serialized correctly in bytecode format when using `_save_for_mobile`. These submodules are not distinguished as different modules, even though they have different forward, setstate, etc., if they share a name.
### Fix:
The Mangler creates unique names so that modules and submodules that share a name can be uniquely identified while saving the module. iseeyuan rightly pointed out the underlying issue: the mangler is not used in the process of saving bytecode, and hence unique references for the submodules are not created. Please refer to the notebook to repro the issue: N777224
### Diff:
The fix described above is implemented. The mangled names are used in bytecode, so the files in the `code/` directory now have the right references in `bytecode.pkl`.
Will this have backward compatibility?
iseeyuan please feel free to correct or update this.
Yes. This fix impacts only modules with same-name submodules, which were not serialized correctly before. Existing modules should have correct references, and `_load_for_mobile` must not see any change. To confirm this, the existing test cases need to pass for the diff to be approved and shipped.
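A hedged sketch of the kind of module that hits this (illustrative; the actual repro is in N777224): two submodules whose classes share the name `Sub`.
```
import torch

def make_sub(offset):
    class Sub(torch.nn.Module):   # both classes are named "Sub"
        def __init__(self):
            super().__init__()
            self.offset = offset
        def forward(self, x):
            return x + self.offset
    return Sub()

class Top(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.a = make_sub(1)
        self.b = make_sub(2)
    def forward(self, x):
        return self.b(self.a(x))

scripted = torch.jit.script(Top())
# Before this fix, saving via _save_for_mobile could conflate the two "Sub"
# types in bytecode; mangled names keep them distinct.
```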
ghstack-source-id: 134242696
Test Plan:
```
~/fbsource/fbcode > buck test caffe2/test/cpp/jit:jit -- BackendTest.TestCompositeWithSetStates
Downloaded 0/5 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 19.2 sec (100%) 17619/17619 jobs, 3/17619 updated
Total time: 19.5 sec
More details at https://www.internalfb.com/intern/buck/build/91542d50-25f2-434d-9e1a-b93117f4efe1
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: de9e27cf-4c6c-4980-8bc5-b830b7c9c534
Trace available for this run at /tmp/tpx-20210719-161607.659665/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425127206388
✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (8.140)
✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestCompositeWithSetStates (0.528)
Summary
Pass: 1
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425127206388
```
```
~/fbsource/fbcode > buck test caffe2/test/cpp/jit:jit -- BackendTest.TestConsistencyOfCompositeWithSetStates
Building: finished in 4.7 sec (100%) 6787/6787 jobs, 0/6787 updated
Total time: 5.0 sec
More details at https://www.internalfb.com/intern/buck/build/63d6d871-1dd9-4c72-a63b-ed91900c4dc9
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 81023cd2-c1a2-498b-81b8-86383d73d23b
Trace available for this run at /tmp/tpx-20210722-160818.436635/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8725724325952153
✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (7.867)
✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestConsistencyOfCompositeWithSetStates (0.607)
Summary
Pass: 1
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8725724325952153
```
To check the `bytecode.pkl` using the module inspector, please check:
N1007089
Reviewed By: iseeyuan
Differential Revision: D29669831
fbshipit-source-id: 504dfcb5f7446be5e1c9bd31f0bd9c986ce1a647
Summary:
CI built the documentation for the recent 1.9.0rc1 tag but left the git version in the `version` string, so (as of now) going to https://pytorch.org/docs/1.9.0/index.html and looking at the version in the upper-left corner shows "1.9.0a0+git5f0bbb3", not "1.9.0". This PR should change that to cut off everything after and including the "a".
It should be cherry-picked to the release/1.9 branch so that the next rc will override the current documentation with a "cleaner" version.
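The gist of the change, as a one-liner sketch (the actual edit is in the docs build scripts):
```
version = "1.9.0a0+git5f0bbb3"
version = version.split("a")[0]   # cut off everything after and including "a"
# -> "1.9.0"
```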
brianjo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58486
Reviewed By: zou3519
Differential Revision: D28640476
Pulled By: malfet
fbshipit-source-id: 9fd1063f4a2bc90fa8c1d12666e8c0de3d324b5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62202
Adds the acc_ops.flatten converter. Also migrates to the OSS acc tracer for the TRT interpreter.
Test Plan: unit test
Reviewed By: khabinov
Differential Revision: D29861555
fbshipit-source-id: dac88a703fdbf386f3f7fb27674a67951f3add49
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61791
During inlining we attach an InlinedCallStack to the nodes being inlined. In
the process we attach module information as well, such that if a
CallMethod is being inlined we know which class instance and class type
the method belongs to. However, CallMethod can be calling a method of
the same object to which the graph belongs, e.g.:
```
def forward(self, input):
    x = input + 10
    return self.forward_impl_(x, input)
```
Here forward_impl_ is a method defined on the same class in which forward
is defined. The existing module hierarchy annotation will mislabel this as an
unknown instance, since the method is not associated with the output of a
GetAttr node (it would be if we had called self.conv.forward_impl_, for
example).
The change in this PR reconciles this by creating a placeholder name "SELF"
for the module instance, indicating that you can traverse the InlinedCallStack
backwards to find the first node with name != SELF, which would be the name
of the object.
e.g.:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward
Test Plan:
Add test
Imported from OSS
Reviewed By: larryliu0820
Differential Revision: D29745443
fbshipit-source-id: 1525e41df53913341c4c36a56772454782a0ba93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62192
This support is hacky because it doesn't preserve meta tensor storage
sharing (e.g., if you serialize a model with shared storage, such as a
tensor and a view on that tensor, the viewing relationship is broken on
deserialization and they become different tensors). The
hack is also durable, in the sense that we will be on the hook for
supporting `_rebuild_meta_tensor_no_storage` in perpetuity in the
future, even if we change our mind about the serialization format.
This unblocks an FB production use case. I didn't add C++ support to minimize
blast area of this patch.
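A quick illustration of the newly supported path (note the caveat above about lost view relationships):
```
import io
import torch

t = torch.empty(4, 4, device="meta")
buf = io.BytesIO()
torch.save(t, buf)
buf.seek(0)
loaded = torch.load(buf)
assert loaded.is_meta and loaded.shape == (4, 4)
```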
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D29910535
Pulled By: ezyang
fbshipit-source-id: d98dcdd0108dfc3ae730a071d3c583b6d0281d21
Summary:
This PR disables the `cppcoreguidelines-non-private-member-variables-in-classes` check. PyTorch makes use of `protected` members throughout the codebase, and we do not want to perform this clang-tidy check in CI, to improve the signal-to-noise ratio.
Relevant failure: https://github.com/pytorch/pytorch/pull/61871/checks?check_run_id=3146453417
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62212
Reviewed By: driazati
Differential Revision: D29917882
Pulled By: 1ntEgr8
fbshipit-source-id: f607c3d050a122e95136f9915060c4cda6694c9d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62210
Fixes #62204
Test Plan: On #62211, clang-tidy should only error on the added lines (and not on context/removals)
Reviewed By: driazati
Differential Revision: D29917897
Pulled By: 1ntEgr8
fbshipit-source-id: de91dbf34c1ad8405507cad91ab3dd0d6c61d82e
Summary:
This PR enables installing our custom MacOS clang-tidy binaries. It also updates related documentation.
The binaries are produced by [this CI job](https://github.com/pytorch/test-infra/blob/master/.github/workflows/clang-tidy-macos.yml), and are published to S3.
This PR does not handle versioning of the downloaded binaries as this is being worked on separately. See https://github.com/pytorch/test-infra/issues/73
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62214
Test Plan:
On a MacOS machine, run
```bash
python3 -m tools.linter.install.clang_tidy
.clang-tidy-bin/clang-tidy --checks="*" --list-checks | grep "misc-max-tokens"
```
Reviewed By: janeyx99, mruberry
Differential Revision: D29917728
Pulled By: 1ntEgr8
fbshipit-source-id: 98d0d8b7a57bdebf0ebcdc83228ef391e8c6629e
Summary:
We previously had lots of ops with implementations like this:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = create_empty_like(input_0);
}
...
auto& out = p_node->Output(0);
some_func_out(inputs, out);
```
This would make the output have the correct shape. But it would
also take the dtype of `input_0`, which is not always correct.
This change transforms these blocks to:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = some_func(inputs);
} else {
  ...
  auto& out = p_node->Output(0);
  some_func_out(inputs, out);
}
```
This gives the output the correct shape and dtype.
Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D29887367
fbshipit-source-id: cef04bfa52ec082ad3a9a32aa27c44e275c6b24c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61458
Context
-------
functorch is unable to vmap(grad(f)) when f contains a .to
call. This is because .to (when it is not a no-op) decomposes
to .copy_ under grad and the .copy_ is not compatible with vmap.
Fix
---
The fix for this is to have all Tensor::to variants call a new operator,
`_to_copy`, that always copies and is a primitive w.r.t. autograd so
that autograd decomposes Tensor::to into a call to `_to_copy`.
(This is related to https://github.com/pytorch/pytorch/issues/60956,
please let me know if you want to bikeshed the naming).
In order to get this done I had to do a bit of refactoring. All of the
`::to` implementations now call `to_impl` which may call `_to_copy`.
Autograd codegen changes
------------------------
The second thing I had to do was modify the autograd codegen. Right now,
autograd assumes that every output is either statically known to be
differentiable or not differentiable at codegen time. `_to_copy` is a
little special because its differentiability depends on the output
dtype, e.g. `torch.randn(3, requires_grad=True).to(torch.long)` is
non-differentiable. To get this to work:
- I changed how `output_differentiability` in derivatives.yaml works.
- output_differentiability can now accept "conditions" for each of the
output arguments. A "condition" is some C++ code.
- We currently only support `output_differentiability` with conditions
if there is a single output. This is for convenience and can be changed
in the future.
- I added a new `output_differentiability_conditions` field to
DifferentiabilityInfo. This gets populated in load_derivatives.py.
- forward-mode and reverse-mode AD take
`output_differentiability_conditions` into account.
Here's what the generated code for `VariableType::_to_copy`
[looks
like](https://gist.github.com/zou3519/93462df4bda1837acee345205b7cc849)
No other autogenerated code gets modified by this PR.
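For concreteness, the dtype-dependent differentiability from the Python side:
```
import torch

x = torch.randn(3, requires_grad=True)
print(x.to(torch.float64).requires_grad)  # True: floating-point output
print(x.to(torch.long).requires_grad)     # False: integral output
```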
Performance benchmarking
------------------------
- I benchmarked [three
cases that demonstrate overhead](https://gist.github.com/zou3519/5b6985e6906b80eec5a0dd94ed5b6a1a).
- Case A: No-op .to(). Instruction count went from 50223 to 25623. I
have no clue why, but this is a good thing.
- Case B: not-no-op .to(). Instruction count went from 665291 to 671961.
This is expected; `_to_copy` adds an additional dispatch.
- Case C: not-no-op .to() forward pass and backward pass. Instruction count
went from 4022841 to 4030057. This PR adds
an additional dispatch to .to() (so there should be one additional
dispatch in the forward pass) so this number looks reasonable.
Test Plan
---------
- test_torch.py has a test_to
- test_cuda.py has test_to*
- test_autograd has tests (test_type_conversions) that exercise the
reverse-mode path
- test_ops.py has some tests (like log_softmax) that exercise the
reverse-mode and forward-mode AD path.
- test_quantization, test_namedtensor all exercise tensor.to as well.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D29801652
Pulled By: zou3519
fbshipit-source-id: bb01eb1acf3d79d84f284150d1be4be3b4ace351
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62103
This PR adds a squid proxy deployed dedicated to PyTorch CI. Initially we only roll it out to GHA; if things are OK, we will extend this to CircleCI tests as necessary.
`http_proxy` and `https_proxy` are compatible with the following HTTP clients (see the sketch after this list):
- curl
- wget
- python
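All three pick the proxy up from the standard environment variables (a sketch with a hypothetical proxy address):
```
import os
from urllib.request import getproxies

os.environ["http_proxy"] = "http://squid-proxy.internal:3128"   # hypothetical address
os.environ["https_proxy"] = "http://squid-proxy.internal:3128"  # hypothetical address
print(getproxies())  # urllib honors these by default, as do curl and wget
```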
Existing cache policy:
```
refresh_pattern -i \.(7z|deb|rpm|exe|zip|tar|tgz|gz|ram|rar|bin|tiff|bz2|run|csv|sh)$ 1440 80% 2880
```
It uses the standard squid refresh_pattern to cache requests. In our setup, we
cache for at least 1440 minutes (1 day) and at most 2880 minutes (2 days), with
a last-modified factor of 80% ([squid doc](http://www.squid-cache.org/Doc/config/refresh_pattern/)). Please refer to [pytorch/test-infra](https://github.com/pytorch/test-infra/tree/master/aws/websites/squid-proxy) for details.
Right now, it only applies to the `build` and `test` steps, to limit the scope and make sure build and test are more reliable with egress caching.
Test Plan: Imported from OSS
Reviewed By: jbschlosser, malfet, seemethere, janeyx99
Differential Revision: D29892919
Pulled By: zhouzhuojie
fbshipit-source-id: ac17227f2553ca62881711b3e9943488dfd8defd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61959
We no longer need to cache the input pointer, as XNNPACK has implemented a more robust approach where the indirection buffer does not need to be recalculated even if the activation tensor pointer changes, as long as the tensor dimensions are the same.
This reverses the changes in https://github.com/pytorch/pytorch/pull/42840/files
Reviewed By: kimishpatel
Differential Revision: D29777605
fbshipit-source-id: c1750538c17bce34f885c6f1bbb1f7164ebba25b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62177
Reland of https://github.com/pytorch/pytorch/pull/61678
Fixes the CI failure by gating inclusion of the torchvision model on whether torchvision is available.
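The gating follows the usual optional-dependency pattern (a sketch; the exact test file may differ):
```
import unittest

try:
    import torchvision
    HAS_TORCHVISION = True
except ImportError:
    HAS_TORCHVISION = False

skipIfNoTorchVision = unittest.skipIf(not HAS_TORCHVISION, "no torchvision")
```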
ghstack-source-id: 134282165
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D29904101
fbshipit-source-id: 47e799eb4a90acbbda91c5857ea00de3045d49f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62135
The initial implementation of Adam with Smart Decay had an off-by-one error, in the summation of the geometric series used to calculate how much built-up momentum would have been discharged in skipped minibatches.
The unit tests should have caught this, but the testing strategy missed it because k, the number of skipped minibatches, was always either 0 or so high that the impact of the bug was too small; the impact of the bug is proportional to 1/k. The testing strategy has been adjusted to cover this bug.
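For intuition, the flavor of the summation involved (purely illustrative; not the actual kernel code): discharging k skipped steps of momentum with decay beta sums a geometric series, and an off-by-one in the number of terms produces a relative error on the order of 1/k.
```
def discharged_momentum(beta, k):
    # Sum of beta + beta**2 + ... + beta**k: exactly k terms.
    # Summing k + 1 terms instead is the off-by-one described above.
    return beta * (1 - beta**k) / (1 - beta)
```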
Differential Revision: D29889309
fbshipit-source-id: b086c0efed5c27f621061e726533c73658daffc6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62197
For build configs with ATEN_CPU_STATIC_DISPATCH defined, quantization tests will fail since they
require QuantizedCPU dispatch to be enabled.
This will fix some internal test failures like https://www.internalfb.com/intern/test/844424941811803?ref_report_id=0 which are run under the `caffe2_aten_cpu_inference` project
Test Plan:
buck test mode/dev //caffe2/aten:quantized_test
Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D29912742
fbshipit-source-id: b117eb9f4afb51e0d0dd52fbe9d5c5be7dfafe85
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61796
We can easily handle NNAPI conversion for NHWC inputs
that have 1 channel, or whose H and W are 1.
Test Plan:
pytest test/test_nnapi.py::TestNNAPI::test_flatten
Imported from OSS
Reviewed By: saketh-are
Differential Revision: D29827735
fbshipit-source-id: 65dee4b42fceef1b032bf5dd1c4cc6e020d01e14
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62109
The `size` parameter only worked correctly for an *args-like invocation
(`10, 20`), and not for a list (`[10, 20]`) or a tuple (`(10, 20)`). This PR ensures it
works like `torch.empty`.
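For reference, `torch.empty`'s size handling, which this change matches:
```
import torch

a = torch.empty(10, 20)     # *args
b = torch.empty([10, 20])   # list
c = torch.empty((10, 20))   # tuple
assert a.shape == b.shape == c.shape == torch.Size([10, 20])
```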
ghstack-source-id: 134246166
Test Plan:
1) unit tests
2) waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D29884768
fbshipit-source-id: 7a4a3c5ed5d7c081344f6ead3170905b97fc652d