Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72866
https://github.com/pytorch/pytorch/pull/71597 added a wrapper, `torch.jit.LoweredWrapper`, which broke model dump. This fixes model_dump in the notebook.
ghstack-source-id: 149311636
Test Plan:
CI and test with N509022
Before:
{F701413403}
After:
{F701412963}
Reviewed By: iseeyuan
Differential Revision: D34247216
fbshipit-source-id: 695b02b03675fae596bb450441b327e4cdcffe9c
(cherry picked from commit d46a82a4c1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72592
`ProcessedNode::num_outputs_` is read only on code paths that are not perf-critical, and it is a static property of the op that a `ProcessedNode` instance executes.
Therefore, it's better to move `ProcessedNode::num_outputs_` into `ProcessedFunction::num_outputs_` and let `ProcessedNode` access it via `ProcessedNode::fn_` for its occasional use. Note that this avoids duplicating `num_outputs_` per node and per Static Runtime instance, since `ProcessedFunction` instances are shared across all runtime instances.
Local instrumentation confirms that this change reduces `sizeof(ProcessedNode)` by 14%:
- Before: sizeof(ProcessedNode): 56
- After: sizeof(ProcessedNode): 48
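A minimal sketch of the resulting layout, assuming hypothetical field types (the real classes carry many more members):
```
#include <cstdint>

// Shared across all Static Runtime instances for a given function, so
// num_outputs_ is now stored once per ProcessedFunction.
struct ProcessedFunction {
  uint32_t num_outputs() const { return num_outputs_; }
  uint32_t num_outputs_{0};
};

// Per node and per runtime instance. Instead of carrying its own copy,
// it forwards the occasional num_outputs() query to the shared function.
struct ProcessedNode {
  uint32_t num_outputs() const { return fn_->num_outputs(); }
  const ProcessedFunction* fn_{nullptr};
};
```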
Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: mikeiovine
Differential Revision: D33984792
fbshipit-source-id: e29ffc97b799e679215f42e1e85cd3fcd7e88983
(cherry picked from commit 0f7003f4df)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72770
This PR improves upon PR #70622 by removing the call to `_make_per_tensor_quantized_tensor` and instead directly creating a quantized int8 tensor that is passed into `raw_cudnn_convolution_forward`, as opposed to a non-quantized int8 tensor.
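A hedged sketch, using public ATen factory functions rather than the internal cudnn path, of the difference between wrapping an int8 tensor after the fact and creating the quantized tensor directly (sizes and function names here are illustrative):
```
#include <ATen/ATen.h>

// Before: materialize a plain int8 tensor, then wrap it after the fact.
at::Tensor wrap_after_the_fact(double scale, int64_t zero_point) {
  auto raw = at::zeros({1, 4, 8, 8}, at::kChar);
  return at::_make_per_tensor_quantized_tensor(raw, scale, zero_point);
}

// After: allocate the quantized int8 tensor directly.
at::Tensor create_directly(double scale, int64_t zero_point) {
  return at::_empty_affine_quantized(
      {1, 4, 8, 8},
      at::device(at::kCUDA).dtype(at::kQInt8),
      scale,
      zero_point);
}
```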
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D34243926
Pulled By: dzdang
fbshipit-source-id: 7725db27d0a276e8108086fecb7ecb18aa227102
(cherry picked from commit e20e99c7b9)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66109
This refactor is no longer necessary for ufunc codegen, as I changed the format of ufuncs so they are not inserted directly into the 'dispatch' key, but I think the refactored code here is better. The basic concept is to construct BackendMetadata directly as we parse entries of the dispatch dictionary, rather than creating them post facto. This centralizes the computation and means that creating the backend index is just a simple reindexing by operator name (nothing nontrivial).
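The refactored code itself is Python in the codegen, but the construct-during-parse pattern can be sketched with hypothetical, heavily simplified stand-in types:
```
#include <map>
#include <string>

// Stand-in for the codegen's per-(op, dispatch key) metadata.
struct BackendMetadata {
  std::string kernel;  // kernel symbol for this (op, dispatch key) pair
};

// op name -> (dispatch key -> metadata), built directly while parsing
// each entry of the dispatch dictionary.
using ParsedOps =
    std::map<std::string, std::map<std::string, BackendMetadata>>;

// Backend index: dispatch key -> (op name -> metadata). With the metadata
// already built during parsing, this is a plain reindex, nothing nontrivial.
std::map<std::string, std::map<std::string, BackendMetadata>>
make_backend_index(const ParsedOps& ops) {
  std::map<std::string, std::map<std::string, BackendMetadata>> index;
  for (const auto& [op, by_key] : ops) {
    for (const auto& [key, meta] : by_key) {
      index[key][op] = meta;
    }
  }
  return index;
}
```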
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D31385760
Pulled By: ezyang
fbshipit-source-id: 4fcb491ba025d2aa6fd356586b57affb97a507fc
(cherry picked from commit 21c93d4199)
Adds credentials to macOS workflows so that upload_test_statistics can run when needed.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72955
Summary:
Based on past PRs, here is a non-exhaustive list of files to consider for extension. This PR is not meant to be final; based on feedback and discussion, files could be dropped from the list, or the PR could be updated to move code around so that extension is no longer needed.
List of files below and description:
* These files are for converting from IR to ONNX proto. These should be used only for ONNX.
```
"torch/csrc/jit/serialization/export.*",
"torch/csrc/jit/serialization/onnx.*",
```
* This file is touched whenever a pass signature is updated.
```
"torch/_C/__init__.pyi.in",
```
* These files are touched whenever a pass signature is updated. It has somehow become convention that ONNX passes are also added here, but it may be possible to move them. Let me know what you think.
~~"torch/csrc/jit/python/init.cpp",~~
~~"torch/csrc/jit/python/script_init.cpp",~~
Update: Bowen will move onnx passes to files under onnx folder.
* ~~Touched when need new attr::xxx, or onnx::xxx.~~
~~"aten/src/ATen/core/interned_strings.h"~~
Update: Nikita will help separate this file.
malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72297
Reviewed By: H-Huang
Differential Revision: D34254666
Pulled By: malfet
fbshipit-source-id: 032cfa590cbedf4648b7335fe8f09a2380ab14cb
(cherry picked from commit 88653eadbf)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72943
Adds some documentation about our GitHub Actions setup, with examples of how to add and regenerate workflows.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: kit1980
Differential Revision: D34283475
Pulled By: seemethere
fbshipit-source-id: a4ac9711c19aaf9312361f46db681d4457ab790c
(cherry picked from commit f352bba8e9)
**This is a re-submit of this PR; the previous version broke forked pull requests by checking out the head ref as opposed to the head sha.**
There are two commits that we test sometimes in CI:
1. The merge commit (a test merge between the PR head ref and the latest base ref)
2. The head ref (the exact commit that was at the head of the user's branch when they pushed).
This distinction is fairly subtle; in case 1, you are effectively running against a "rebased" version of your PR's branch. The problem is that we use *both* of these commits today, with confusing results: depending on how you put up your PR and which workflows are running, we might be testing two different commits!
We should probably consolidate on one. This would eliminate a subtle but complex part of our CI (I am mildly horrified by the complexity of [this explanation](https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md#which-commit-is-used-in-ci), although it's heroic that someone went and documented it lol). This PR consolidates on using the head ref (option 2).
- This is the behavior of phabricator/fbcode, which many PT devs will be more familiar with.
- This is the behavior of ghstack.
- Our master branch moves quite quickly, so the chance that there is a substantial divergence between your local test runs and CI is high, with confusing results that are nondeterministic based on when you put up the PR.
- We use a linear history/squash-rebase-merge workflow, which is better modeled by option 2. Option 1 effectively emulates a merge-commit-style workflow.
The primary disadvantage is that now when re-running workflows, you will not be re-running against a "rebased" version of the PR, but the exact head ref that was pushed. Tbh I find it quite unintuitive that what you're testing changes depending on when you press the re-run button, but I know at least @malfet does this so it's worth mentioning.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71974
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72730
This diff contains changes from several PRs landed on the lazy_tensor_staging branch:
- generates 'fallback' overrides for each codegenned op, useful for debugging
- supports operators that are missing aten:: symbols for op names, using their string counterparts instead
- makes the IR class a base class instead of hardcoding the assumption of TS
Test Plan: tested on lazy_tensor_staging branch
Reviewed By: desertfire
Differential Revision: D34178476
fbshipit-source-id: 7190b2e0d82b4eb1f4510c858c24446c6df3f9d0
(cherry picked from commit 6713d3f0ef)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72671
The existing kernel did not handle cases where D % 4 != 0 or dim_per_head % 4 != 0. Now we have a non-vectorized kernel for these cases.
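A hypothetical launcher-side sketch of the resulting dispatch, assuming illustrative kernel names (the real CUDA kernels differ):
```
#include <cstdint>
#include <cstdio>

// The vectorized kernel assumes 4-wide loads, so it is only safe when both
// sizes are multiples of 4; everything else now routes to a scalar fallback.
void launch_vectorized(int64_t D) { std::printf("vec4 kernel, D=%ld\n", (long)D); }
void launch_scalar(int64_t D) { std::printf("scalar kernel, D=%ld\n", (long)D); }

void launch(int64_t D, int64_t dim_per_head) {
  if (D % 4 == 0 && dim_per_head % 4 == 0) {
    launch_vectorized(D);  // fast path: 4-element vector loads/stores
  } else {
    launch_scalar(D);      // fallback: correct for arbitrary sizes
  }
}
```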
ghstack-source-id: 149201477
Test Plan: Updated test_nn to cover these cases.
Reviewed By: zrphercule, ngimel
Differential Revision: D34119371
fbshipit-source-id: 4e9b4d9b636224ef2c433593f6f236df040de782
(cherry picked from commit f5393878e4)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72464
We had some trouble getting this component (and this test!) right, so let's test it.
ghstack-source-id: 149201478
Test Plan: new test passes
Reviewed By: zrphercule
Differential Revision: D33992477
fbshipit-source-id: cc377eed5d4a4412b42bdabf360601c6e52947cf
(cherry picked from commit 9832867b12)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72733
To reduce the communication cost incurred when initializing a sharded tensor, there are two changes in this PR/diff:
1. We add a new API named `_init_from_local_tensor` so that if we have only one local tensor, we can initialize a sharded tensor directly from it. (GH issue: https://github.com/pytorch/pytorch/issues/72092)
2. We add a new API to infer the sharding spec from global metadata, so we don't have to set the sharding spec manually when it's not an `EnumerableShardingSpec`. (GH issue: https://github.com/pytorch/pytorch/issues/67244)
ghstack-source-id: 149229259
Test Plan: CI
Reviewed By: wanchaol
Differential Revision: D34132739
fbshipit-source-id: 3a60135761bcc19d6020b6c45cb2979869645ce6
(cherry picked from commit af569325e2)
Summary:
It seemed strange to me that min_runtime_lib depended on the serialization headers but not on their .cc files. This puts them into their own target that contains both, and then updates deps.
(Note: this ignores all push blocking failures!)
Test Plan: ci
Reviewed By: iseeyuan
Differential Revision: D34159900
fbshipit-source-id: 57102414be2439f5f4e3ed8ccd2b0c375b9de9b2
(cherry picked from commit c9ff2d2d9d)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72547
`toTuple()` returns a new intrusive pointer, which bumps the underlying ref count, whereas `toTupleRef()` returns a reference, so we can save an unnecessary ref count bump.
Based on https://fb.workplace.com/groups/pytorch.edge.team/permalink/1021780808376658/
Similar to D34047666 (85d7e73a8a).
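A minimal sketch of the two call shapes, assuming an `IValue` that is known to hold a `Tuple`:
```
#include <ATen/core/ivalue.h>

void consume(const c10::IValue& iv) {
  auto owned = iv.toTuple();               // new intrusive_ptr: ref count bump
  const auto& borrowed = iv.toTupleRef();  // plain reference: no bump
  (void)owned;
  (void)borrowed;
}
```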
ghstack-source-id: 148665193
Test Plan:
```
> Executing task: buck: buck test //xplat/caffe2:test_lite_interpreter --config client.id=nuclide <
Executing in directory: /data/users/pavithran/fbsource
buck test //xplat/caffe2:test_lite_interpreter --config client.id=nuclide
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]
Parsing buck files: finished in 2.1 sec
Creating action graph: finished in 0.5 sec
[RE] Metadata: Session ID=[reSessionID-66858379-0761-4966-a933-bc7f0d0add95]
[RE] Waiting on 0 remote actions. Completed 523 actions remotely, action cache hit rate: 0.00%.
Downloaded 3947/5089 artifacts, 20.92 Mbytes, 12.5% cache miss (for updated rules)
Building: finished in 01:04.0 min (100%) 5438/5438 jobs, 5192/5438 updated
Total time: 01:06.6 min
Testing: finished in 06:53.7 min (71 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/caffe2:test_lite_interpreter
PASS 406.0s 71 Passed 0 Skipped 0 Failed //xplat/caffe2:test_lite_interpreter
TESTS PASSED
Terminal will be reused by tasks, press any key to close it.
```
Reviewed By: kimishpatel
Differential Revision: D34082609
fbshipit-source-id: 4bcbdb2d11dd4c3bc392010487dccd2270278222
(cherry picked from commit dd64eb386d)