Commit Graph

1733 Commits

Author SHA1 Message Date
PyTorch MergeBot
17c149ad9e Revert "[CI] Use prebuilt triton from nightly repo (#94732)"
This reverts commit 18d93cdc5d.

Reverted https://github.com/pytorch/pytorch/pull/94732 on behalf of https://github.com/kit1980 due to Reverting per offline discussion to try to fix dynamo test failures after triton update
2023-02-17 21:51:25 +00:00
Will Constable
a8cbf70ffc Inductor support for aten::all_reduce (#93111)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93111
Approved by: https://github.com/jansel, https://github.com/wanchaol
2023-02-17 04:42:04 +00:00
Nikita Shulga
d0fbed76c6 Test inductor with stock g++ (#90710)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90710
Approved by: https://github.com/jansel
2023-02-16 15:10:17 +00:00
AllenTiTaiWang
28e69954a1 [ONNX] Support aten::bit_wise_not in fx-onnx exporter (#94919)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94919
Approved by: https://github.com/justinchuby, https://github.com/wschin
2023-02-16 06:21:59 +00:00
Nikita Shulga
18d93cdc5d [CI] Use prebuilt triton from nightly repo (#94732)
No point in building from source if it was prebuilt already

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94732
Approved by: https://github.com/DanilBaibak, https://github.com/atalman, https://github.com/huydhn, https://github.com/jansel
2023-02-14 15:51:23 +00:00
Xuehai Pan
b005ec62b9 [BE] Remove dependency on six and future (#94709)
Remove the Python 2 and 3 compatibility libraries [six](https://pypi.org/project/six) and [future](https://pypi.org/project/future), along with `torch._six`. We only support Python 3.8+ now; it's time to retire them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94709
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-02-14 09:14:14 +00:00
BowenBao
055dc72dba [ONNX] Bump onnx to 1.13.1, onnxruntime to 1.14.0 (#94767)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94767
Approved by: https://github.com/abock
2023-02-14 03:53:05 +00:00
Huy Do
bdf9963e57 Cache linter S3 dependencies (#94745)
Fixes https://github.com/pytorch/pytorch/issues/94716
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94745
Approved by: https://github.com/seemethere
2023-02-13 19:44:23 +00:00
Nikita Shulga
4869929f32 Update Triton hash (#94249)
That includes MLIR + the latest packaging changes (which also download ptxas from CUDA-12).
Tweak CI to install gcc-9 to build triton.

Disable a few tests to keep everything correct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94249
Approved by: https://github.com/Skylion007, https://github.com/ngimel, https://github.com/weiwangmeta
2023-02-13 13:17:36 +00:00
Wei Wang
6fadd5e94a Checkout torchbench with only needed models (#94578)
Addresses https://github.com/pytorch/pytorch/pull/93395#issuecomment-1414231011. The perf smoke test is supposed to take around one minute, but the torchbench checkout process is taking more than 15 minutes. This PR explores a way to check out torchbench with only the models that are later used for the perf smoke test and the memory compression ratio check.

Torchbench installation supports `python install.py models model1 model2 model3` to install only model1, model2, and model3; not providing `models model1 model2 model3` installs all models by default.
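
A hedged sketch of how a CI step might drive that partial install (the model names and the `benchmark` checkout directory below are placeholders, not the ones used by the actual workflow):

```python
# Illustrative sketch: install only the torchbench models needed by the smoke
# test, relying on the "models <name> ..." support described above.
import subprocess

NEEDED_MODELS = ["model1", "model2", "model3"]  # placeholders for the real model list

subprocess.check_call(
    ["python", "install.py", "models", *NEEDED_MODELS],
    cwd="benchmark",  # assumed location of the torchbench checkout
)
```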

Before this PR, the inductor job took about 27 minutes (21 minutes spent in the testing phase): https://github.com/pytorch/pytorch/actions/runs/4149154553/jobs/7178024253
After this PR, the inductor job takes about 19 minutes (12 minutes spent in the testing phase); the pytorch checkout and docker image pull take about 5-6 minutes in total: https://github.com/pytorch/pytorch/actions/runs/4149155814/jobs/7178735494

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94578
Approved by: https://github.com/orionr, https://github.com/malfet, https://github.com/desertfire
2023-02-13 04:02:18 +00:00
Huy Do
371f587c92 Dockerize lint jobs (#94255)
This is to minimize network flakiness when running lint jobs. I create a new Docker image for the linter and install all linter dependencies there. After that, all linter jobs are converted to use the Nova generic Linux job https://github.com/pytorch/test-infra/blob/main/.github/workflows/linux_job.yml with the new image.

As a future task: I encountered this issue with the current mypy version we are using and Python 3.11: https://github.com/python/mypy/issues/13627. Fixing this requires upgrading mypy to a newer version, but that can be done separately (it requires reformatting/fixing `*.py` files with the newer mypy version).

The `collect_env` linter job is currently not included here, as it needs an older Python version (3.5). It could also be converted to use the same mechanism (probably with another Docker image). This one rarely fails, though.

### Testing

BEFORE
https://github.com/pytorch/pytorch/actions/runs/4130366955 took a total of ~14m

AFTER
https://github.com/pytorch/pytorch/actions/runs/4130712385 also takes a total of ~14m
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94255
Approved by: https://github.com/ZainRizvi
2023-02-11 21:56:19 +00:00
Justin Chu
a27bd42bb9 [ONNX] Use onnxruntime to run fx tests (#94638)
- Enable the mnist test
- Remove `max_pool2d` in the test because we don't have the op yet.
- Add aten::convolution
- Bump onnxscript version
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94638
Approved by: https://github.com/BowenBao, https://github.com/wschin, https://github.com/titaiwangms
2023-02-11 15:32:03 +00:00
BowenBao
88d0235b73 [ONNX] Update CI test environment; Add symbolic functions (#94564)
* Update the CI test environment to install onnx and onnx-script.
* Add symbolic function for `bitwise_or`, `convert_element_type` and `masked_fill_`.
* Update symbolic function for `slice` and `arange`.
* Update .pyi signature for `_jit_pass_onnx_graph_shape_type_inference`.

Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
Co-authored-by: Ti-Tai Wang <titaiwang@microsoft.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94564
Approved by: https://github.com/abock
2023-02-10 20:44:59 +00:00
Huy Do
2af89e96ec Lower libtorch build parallelization to avoid OOM (#94548)
Memory usage increased after https://github.com/pytorch/pytorch/pull/88575. Docker crashes with exit code 137, which clearly means out of memory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94548
Approved by: https://github.com/seemethere
2023-02-10 01:52:09 +00:00
pramenku
dddc0b41db [ROCm] centos update endpoint repo and fix sudo (#92034)
* Update ROCm centos Dockerfile
* Update install_user.sh for centos sudo issue

Fixes the ROCm CentOS Dockerfile, since the https://packages.endpoint.com/rhel/7/os/x86_64/endpoint-repo-1.9-1.x86_64.rpm file is not accessible.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92034
Approved by: https://github.com/malfet
2023-02-09 21:30:58 +00:00
Xuehai Pan
69e0bda999 [BE] Import Literal, Protocol, and Final from standard library typing as of Python 3.8+ (#94490)
Changes:

1. `typing_extensions -> typing-extensions` in the dependency list. Use a dash rather than an underscore to fit the [PEP 503: Normalized Names](https://peps.python.org/pep-0503/#normalized-names) convention.

```python
import re

def normalize(name):
    return re.sub(r"[-_.]+", "-", name).lower()
```
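
As a quick check of the rule above, applying the `normalize` helper just defined maps the underscore spelling to the dashed form:

```python
>>> normalize("typing_extensions")
'typing-extensions'
>>> normalize("typing-extensions")   # already normalized, unchanged
'typing-extensions'
```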

2. Import `Literal`, `Protocol`, and `Final` from the standard library `typing` module as of Python 3.8+.
3. Replace `Union[Literal[XXX], Literal[YYY]]` with `Literal[XXX, YYY]` (see the sketch below).
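
A minimal before/after sketch of items 2 and 3 (the `Mode`, `DEFAULT_MODE`, and `Runner` names are illustrative, not code from the PR):

```python
# Before (pre-3.8 compatibility):
#     from typing_extensions import Final, Literal, Protocol
#     Mode = Union[Literal["train"], Literal["eval"]]
# After (Python 3.8+ standard library):
from typing import Final, Literal, Protocol

Mode = Literal["train", "eval"]      # collapses the Union of Literals
DEFAULT_MODE: Final[str] = "train"

class Runner(Protocol):
    def run(self, mode: Mode) -> None:
        ...
```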

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94490
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-09 19:17:49 +00:00
Jack Taylor
75545798c6 test_inductor test.sh fix (#92833)
The inductor/test_torchinductor suite is not running as part of CI. I have triaged this down to a bug in the arguments supplied in test/run_test.py.

Currently test_inductor runs the test suites as:
`PYTORCH_TEST_WITH_INDUCTOR=0 python test/run_test.py --include inductor/test_torchinductor --include inductor/test_torchinductor_opinfo --verbose`

This only kicks off the test_torchinductor_opinfo suite.

Example from CI logs: https://github.com/pytorch/pytorch/actions/runs/3926246136/jobs/6711985831#step:10:45089
```
+ PYTORCH_TEST_WITH_INDUCTOR=0
+ python test/run_test.py --include inductor/test_torchinductor --include inductor/test_torchinductor_opinfo --verbose
Ignoring disabled issues:  []
/var/lib/jenkins/workspace/test/run_test.py:1193: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if torch.version.cuda is not None and LooseVersion(torch.version.cuda) >= "11.6":
Selected tests:
 inductor/test_torchinductor_opinfo
Prioritized test from test file changes.
reordering tests for PR:
prioritized: []
the rest: ['inductor/test_torchinductor_opinfo']
```
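
The overriding behaviour can be reproduced with a small argparse sketch (assuming `--include` is declared with `nargs="+"`; this is an illustration, not the exact run_test.py code):

```python
# Minimal repro of the bug described above: with a plain "store" action,
# repeating --include overwrites the earlier value instead of appending to it.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--include", nargs="+", default=[])

args = parser.parse_args(
    ["--include", "inductor/test_torchinductor",
     "--include", "inductor/test_torchinductor_opinfo"]
)
print(args.include)  # ['inductor/test_torchinductor_opinfo'] - only the last flag survives

# Passing both suites to a single --include keeps them both:
args = parser.parse_args(
    ["--include", "inductor/test_torchinductor", "inductor/test_torchinductor_opinfo"]
)
print(args.include)  # ['inductor/test_torchinductor', 'inductor/test_torchinductor_opinfo']
```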

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92833
Approved by: https://github.com/seemethere
2023-02-09 18:51:25 +00:00
AllenTiTaiWang
6d722dba0f [ONNX] Update CI onnx and ORT version (#94439)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94439
Approved by: https://github.com/BowenBao
2023-02-09 04:08:38 +00:00
Nikita Shulga
82401c6a69 [BE] Set PYTORCH_TEST_WITH_INDUCTOR only once (#94411)
Setting the same env-var twice should have no effect, unless one is trying mini rowhammer here

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94411
Approved by: https://github.com/jeanschmidt, https://github.com/huydhn, https://github.com/Skylion007
2023-02-08 21:00:40 +00:00
albanD
75e04f6dad Test enabling full testing on 3.11 for linux (#94056)
Testing what happens if we run everything right now.
Will remove the broken stuff to get a mergeable version next.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94056
Approved by: https://github.com/malfet
2023-02-07 23:02:13 +00:00
albanD
9b3277c095 Make sure to properly pull the right submodule in BC test (#94182)
To unblock https://github.com/pytorch/pytorch/pull/93219
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94182
Approved by: https://github.com/ezyang, https://github.com/malfet, https://github.com/Skylion007
2023-02-06 18:03:35 +00:00
Nikita Shulga
6c4dc98b9d [CI][BE] Move docker folder to .ci (#93104)
Follow up after https://github.com/pytorch/pytorch/pull/92569

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93104
Approved by: https://github.com/huydhn, https://github.com/seemethere, https://github.com/ZainRizvi
2023-02-03 12:25:33 +00:00
Jane Xu
0ecb071fc4 [BE][CI] change references from .jenkins to .ci (#92624)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92624
Approved by: https://github.com/ZainRizvi, https://github.com/huydhn
2023-01-30 22:50:07 +00:00
Bin Bao
2b267fa7f2 [inductor] Check memory compression ratio in model tests (#89305)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89305
Approved by: https://github.com/weiwangmeta
2023-01-30 22:01:06 +00:00
Catherine Lee
27ab1dfc28 Remove print_test_stats, test_history, s3_stat_parser (#92841)
Pritam Damania no longer uses it (and is no longer with FB), and I don't know who else has interest in this
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92841
Approved by: https://github.com/malfet, https://github.com/huydhn, https://github.com/ZainRizvi, https://github.com/seemethere
2023-01-27 18:11:42 +00:00
Huy Do
074f5ce0b7 Install Torchvision in all Linux shards (#93108)
Also skip `test_roi_align_dynamic_shapes` for cuda as introduced by https://github.com/pytorch/pytorch/pull/92667.  With Torchvision properly installed, the test fails with the following error:

```
2023-01-26T04:46:58.1532060Z   test_roi_align_dynamic_shapes_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:266: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-26T04:46:58.1532195Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-26T04:46:58.1532383Z     test_roi_align_dynamic_shapes_cuda errored - num_retries_left: 3
2023-01-26T04:46:58.1532479Z Traceback (most recent call last):
2023-01-26T04:46:58.1532725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1155, in run_node
2023-01-26T04:46:58.1532821Z     return node.target(*args, **kwargs)
2023-01-26T04:46:58.1533056Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 499, in __call__
2023-01-26T04:46:58.1533160Z     return self._op(*args, **kwargs or {})
2023-01-26T04:46:58.1533304Z RuntimeError: Cannot call sizes() on tensor with symbolic sizes/strides
```

https://github.com/pytorch/pytorch/issues/93054 reveals a blind spot in the CI where Torchvision was only installed in the first and second shards. The above test should have shown that failure as part of https://github.com/pytorch/pytorch/pull/92667, but it was skipped because Torchvision was not installed (in the 3rd shard) for `test_roi_align` to run. The test is still skipped here, but in a more explicit way.

Fixes https://github.com/pytorch/pytorch/issues/93054

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93108
Approved by: https://github.com/clee2000, https://github.com/jjsjann123, https://github.com/nkaretnikov
2023-01-27 03:15:18 +00:00
Huy Do
68a49322e7 [MacOS] Explicitly use cmake from cloned conda environment (#92737)
My first attempt to fix the `Library not loaded: @rpath/libzstd.1.dylib` issue on MacOS M1 in https://github.com/pytorch/pytorch/pull/91142 provided some additional logs about the flaky error but didn't fix the issue, as I still see some failures recently, for example:

* e4d83d54a6

Looking at the log, I can see that:

* CMAKE_EXEC correctly points to `CMAKE_EXEC=/Users/ec2-user/runner/_work/_temp/conda_environment_3971491892/bin/cmake`
* The library is there under the executable rpath
```
ls -la /Users/ec2-user/runner/_work/_temp/conda_environment_3971491892/bin/../lib
...
2023-01-20T23:22:03.9761370Z -rwxr-xr-x    2 ec2-user  staff    737776 Apr 22  2022 libzstd.1.5.2.dylib
2023-01-20T23:22:03.9761630Z lrwxr-xr-x    1 ec2-user  staff        19 Jan 20 22:47 libzstd.1.dylib -> libzstd.1.5.2.dylib
...
```

Then calling cmake after that suddenly uses the wrong cmake from the miniconda package cache:

```
2023-01-20T23:22:04.0636880Z + cmake ..
2023-01-20T23:22:04.1924790Z dyld[85763]: Library not loaded: @rpath/libzstd.1.dylib
2023-01-20T23:22:04.1925540Z   Referenced from: /Users/ec2-user/runner/_work/_temp/miniconda/pkgs/cmake-3.22.1-hae769c0_0/bin/cmake
```

This is weird, so my second attempt will be more explicit and use the correct cmake executable from `CMAKE_EXEC`. Maybe something manipulates the global PATH in between, making `/Users/ec2-user/runner/_work/_temp/miniconda/pkgs/cmake-3.22.1-hae769c0_0/bin/cmake` come first in the PATH.
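
The workaround can be sketched in a few lines (illustrative Python, not the actual CI shell script; the `CMAKE_EXEC` variable name comes from the log above, everything else is an assumption):

```python
# Which cmake gets picked depends purely on PATH ordering, so call the binary
# recorded in CMAKE_EXEC explicitly instead of whatever PATH resolves later.
import os
import shutil
import subprocess

path_resolved = shutil.which("cmake")            # whatever comes first on PATH right now
pinned = os.environ.get("CMAKE_EXEC", "cmake")   # executable captured when the env was set up

print(f"PATH would pick: {path_resolved}")
print(f"Using pinned:    {pinned}")
subprocess.check_call([pinned, ".."], cwd="build")  # hypothetical out-of-tree build directory
```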

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92737
Approved by: https://github.com/ZainRizvi
2023-01-26 21:07:41 +00:00
jjsjann123
c11b301bcd [NVFUSER] refactor nvfuser build (#89621)
This PR is the first step towards refactoring the build for nvfuser in order to make the codegen a standalone library.

Contents inside this PR:
1. The nvfuser code base has been moved from `./torch/csrc/jit/codegen/cuda/` to `./nvfuser`, except for the registration code for integration (interface.h/interface.cpp).
2. splits the build system so that nvfuser generates its own `.so` files. Currently there are:
    - `libnvfuser_codegen.so`, which contains the integration, codegen and runtime system of nvfuser
    - `nvfuser.so`, which is nvfuser's Python API via pybind. The Python frontend is now exposed via `nvfuser._C.XXX` instead of `torch._C._nvfuser` (see the import sketch after this list).
3. nvfuser cpp tests are currently compiled into `nvfuser_tests`
4. cmake is refactored so that:
    - nvfuser now has its own `CMakeLists.txt`, which is under `torch/csrc/jit/codegen/cuda/`.
    - nvfuser backend code is not compiled inside `libtorch_cuda_xxx` any more
    - nvfuser is added as a subdirectory under `./CMakeLists.txt` at the very end after torch is built.
    - since nvfuser has dependency on torch, the registration of nvfuser at runtime is done via dlopen (`at::DynamicLibrary`). This avoids circular dependency in cmake, which will be a nightmare to handle. For details, look at `torch/csrc/jit/codegen/cuda/interface.cpp::LoadingNvfuserLibrary`
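
A minimal compatibility sketch of the frontend move mentioned in item 2 (the module paths follow the commit text; the alias name and fallback logic are assumptions for illustration):

```python
# Illustrative only: prefer the new standalone extension (nvfuser.so), fall back
# to the old in-tree binding for builds that predate this refactor.
try:
    from nvfuser import _C as nvfuser_C         # new location: nvfuser._C.XXX
except ImportError:
    from torch._C import _nvfuser as nvfuser_C  # old location: torch._C._nvfuser

print(nvfuser_C)
```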

Future work that's scoped in following PR:
- Currently, since nvfuser codegen has a dependency on torch, we need to refactor that out so we can move nvfuser into a submodule and not rely on dlopen to load the library. @malfet
- Since we moved nvfuser into a cmake build, we effectively disabled the bazel build for nvfuser. This could impact internal workloads at Meta, so we need to put support back. cc'ing @vors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89621
Approved by: https://github.com/davidberard98
2023-01-26 02:50:44 +00:00
Jane Xu
b453adc945 [BE][CI] rename .jenkins (#92845)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92845
Approved by: https://github.com/clee2000
2023-01-25 23:47:38 +00:00
PyTorch MergeBot
afe6ea884f Revert "[BE][CI] rename .jenkins to .ci, add symlink (#92621)"
This reverts commit 8972a9fe6a.

Reverted https://github.com/pytorch/pytorch/pull/92621 on behalf of https://github.com/atalman due to breaks shipit
2023-01-23 15:04:58 +00:00
Nikita Shulga
b5f614c4cd Move ASAN and ONNX to Python 3.9 and 3.8 (#92712)
As 3.7 is getting deprecated
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92712
Approved by: https://github.com/weiwangmeta, https://github.com/kit1980, https://github.com/seemethere
2023-01-23 14:46:02 +00:00
Edward Z. Yang
de69cedf98 Run all of the timm models shards in the periodic (#92743)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92743
Approved by: https://github.com/kit1980
2023-01-21 18:39:17 +00:00
Jane Xu
8972a9fe6a [BE][CI] rename .jenkins to .ci, add symlink (#92621)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92621
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2023-01-21 02:40:18 +00:00