Today, we have two pieces that conspire to determine what workflows we run:
- `generate_ci_workflows.py`, which takes a declarative description of what we want the workflow to do and uses jinja to generate a workflow yaml file
- `generate-test-matrix`, which runs at CI time to dynamically generate test jobs.
This is bad:
- Having one layer of code generation is unfortunate; having two is confusing.
- You cannot tell from a workflow yaml file what test jobs will be run.
- We have to do a careful dance of plumbing the args to `generate-test-matrix` through env vars and other such ugliness.
- In cases where the build job fails and prevents `generate-test-matrix` from running, a ghost `test` job that doesn't actually exist adds noise to the HUD and our stats.
- A bunch of useless `generate-test-matrix` jobs (8 on PRs) add noise to our signal.
As far as I can tell, this complexity is unnecessary: we have all the information we need to generate the test matrix statically. There does not appear to be any advantage in retaining `generate-test-matrix`, so I am removing it to simplify the CI.
The *only* place where we were actually doing something dynamic is in our Windows GPU workflow, where we would check at runtime whether the workflow was triggered from a PR or master and behave accordingly. This is more simply done by having two separate workflows with different trigger conditions (sketched below), which avoids the madness of parsing labels and forking the behavior dynamically, something that has been a source of confusion in the past.
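As a rough illustration of that split (file names, workflow names, and branch filters here are hypothetical, not the actual generated workflows):
```yaml
# .github/workflows/win-gpu-pr.yml -- hypothetical file name, PR-only
name: win-gpu-pr
on:
  pull_request:
---
# .github/workflows/win-gpu-master.yml -- hypothetical file name, trunk-only
name: win-gpu-master
on:
  push:
    branches:
      - master
```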
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73001
Summary:
Remove fx2trt test from oss CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72595
Test Plan: CI
Reviewed By: houseroad
Differential Revision: D34112595
Pulled By: wushirong
fbshipit-source-id: 02376ef0f25381eff31b72dcbf964c1966af9793
(cherry picked from commit e3d698a942)
These were left out of the initial migration for some reason, so this just
transfers those tests over.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71644
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Summary:
This PR implements the workflow changes described in https://fb.quip.com/oi8wAvajpR4g. Combined with the bot logic in d928549336 (can be moved to probot but is easier to test there), it fully implements the proposal.
The CIFlow comment is slightly outdated now but is still technically correct (all the commands will continue to work as before, just through a different mechanism).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70321
Reviewed By: atalman, janeyx99
Differential Revision: D33690370
Pulled By: suo
fbshipit-source-id: 8d81ffeb249cdae53c5526798a4a504560d0204f
(cherry picked from commit 5ed8d0dfae)
Summary:
Also adds a mechanism for all workflows to do this
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71567
Reviewed By: malfet
Differential Revision: D33687713
Pulled By: seemethere
fbshipit-source-id: a3c7ef41ed04f9caa82c180961d2f4b7c24582dd
(cherry picked from commit eef2eafffd)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71431
Adds a path-based PR trigger to the binary build workflows to make it
easier to test and verify changes to those workflows without adding a
bunch of skipped checks to the majority of our workflows.
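The trigger has roughly this shape; the path globs below are illustrative assumptions, not the exact ones added:
```yaml
on:
  pull_request:
    paths:
      # hypothetical globs: run the binary build workflows only when the
      # files that define them (or their generator) change
      - '.github/workflows/generated-*binary*.yml'
      - '.github/scripts/generate_ci_workflows.py'
```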
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: atalman
Differential Revision: D33641276
Pulled By: seemethere
fbshipit-source-id: 0ed65cbcebf06dfe998f81d67df817250dd1a716
(cherry picked from commit 598b55fd18)
Summary:
Running it that many times a day was probably not intentional.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71255
Reviewed By: suo, atalman
Differential Revision: D33559155
Pulled By: janeyx99
fbshipit-source-id: c8703cea6f3188c9bcb0867b895261808d3164ee
Summary:
Our Docker builds have not been running under the previous cron schedule; this changes it so they should (hopefully) work.
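For reference, a GitHub Actions cron trigger looks like this; the expression below is illustrative, not the exact schedule this lands on:
```yaml
on:
  schedule:
    # fields: minute hour day-of-month month day-of-week, in UTC
    - cron: '30 5 * * *'   # once a day at 05:30 UTC
```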
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71232
Reviewed By: ejguan
Differential Revision: D33552231
Pulled By: janeyx99
fbshipit-source-id: 1a3e1607b03d37614eedf04093d73f1b96698840
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68388
Updates the GPU architectures and adds an on_pull_request trigger for
the binary build workflows so that we can iterate on this later.
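Sketching the kind of matrix entry these workflows carry; the CUDA versions and arch lists below are illustrative assumptions, not the values this PR sets:
```yaml
strategy:
  matrix:
    include:
      # hypothetical entries; the real lists live in the workflow generator
      - cuda_version: "11.3"
        gpu_arch_list: "5.0;6.0;7.0;7.5;8.0;8.6"
      - cuda_version: "10.2"
        gpu_arch_list: "3.7;5.0;6.0;7.0;7.5"
```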
TODO:
* Create follow-up PR to enable nightly Linux GHA builds / disable CircleCI nightly Linux builds
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D33462294
Pulled By: seemethere
fbshipit-source-id: 5fa30517550d36f504b491cf6c1e5c9da56d8191
Summary:
The CMake build defaults to `USE_PER_OPERATOR_HEADERS = 1` which
generates extra headers in the `ATen/ops` folder that don't exist
otherwise. In particular, fb-internal builds using buck don't support
these headers and so all includes must be guarded with
`#ifdef AT_PER_OPERATOR_HEADERS`.
This adds a CI run which builds with `USE_PER_OPERATOR_HEADERS = 0` so
open source contributions don't have to wait for their PR to be
imported to find out it doesn't work in fb-internal. This flag
shouldn't affect runtime behavior, though, so I don't run any tests.
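A minimal sketch of such a job, with hypothetical job and step names (the real build invocation may differ):
```yaml
jobs:
  build-no-per-operator-headers:
    runs-on: ubuntu-latest
    env:
      # CMake defaults this to 1; flip it off to mimic the buck builds
      USE_PER_OPERATOR_HEADERS: "0"
    steps:
      - uses: actions/checkout@v2
      # build only -- the flag shouldn't affect runtime behavior, so no tests
      - run: python setup.py build
```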
cc seemethere malfet pytorch/pytorch-dev-infra
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69907
Reviewed By: malfet, atalman
Differential Revision: D33411864
Pulled By: seemethere
fbshipit-source-id: 18b34d7a83dc81cf8a6c396ba8369e1789f936e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70453
Removes the current xla config; downstream `pytorch/xla` is broken for
clang compilation, so we are temporarily removing this config until the
xla team can fix it in their upstream CI.
Context: https://github.com/pytorch/xla/pull/3255/files#r775980035
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: zengk95
Differential Revision: D33338463
Pulled By: seemethere
fbshipit-source-id: 1ef332c685d5e2cc7e2eb038e93bd656847fd099
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66725
This removes the ci_flow_should_run job and moves its check into the build stage of the different job templates.
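Conceptually the gate moves from a standalone job onto the build job itself; the expression below is a hypothetical stand-in for the generated condition:
```yaml
jobs:
  build:
    # hypothetical condition replacing the old ci_flow_should_run job
    if: ${{ github.event_name == 'push' || contains(github.event.pull_request.labels.*.name, 'ciflow/all') }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "build runs only when the condition above holds"
```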
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70204
Reviewed By: malfet
Differential Revision: D33282338
Pulled By: zengk95
fbshipit-source-id: 327ff2bca9720d2a69083594ada5c7788b65adbd
Summary:
All four Android builds (arm32/64 and x86_32/64) are now migrated to GHA, away from CircleCI. Since this part of the workflow creates the final binary with all architectures in it, it was not possible to do the migration step by step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68843
Reviewed By: malfet
Differential Revision: D33257480
Pulled By: b0noI
fbshipit-source-id: dd280c8268bdd31763754c36f38e4ea12b23cd2e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35316
On master, the bazel cuda build is disabled due to the lack of a proper `cu_library` rule. This PR:
- Adds `rules_cuda` to the WORKSPACE and forwards `cu_library` to `rules_cuda`.
- Uses simple local cuda and cudnn repositories (adopted from TRTorch) for cuda 11.3.
- Fixes the currently broken cuda build.
- Enables the cuda build in CI, not just for the `:torch` target but for all the test binaries, to catch undefined symbols (rough sketch below).
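A rough sketch of the CI step; the Bazel flags and target patterns here are assumptions, not the repo's actual configuration:
```yaml
- name: Bazel CUDA build
  run: |
    # build the main library plus all test binaries so that undefined
    # symbols surface at link time; --config=gpu is a hypothetical flag
    bazel build --config=gpu //:torch //test/...
```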
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66241
Reviewed By: ejguan
Differential Revision: D31544091
Pulled By: malfet
fbshipit-source-id: fd3c34d0e8f80fee06f015694a4c13a8e9e12206
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68261
This PR changes the number of test shards from 2 to 3 for all ASAN tests, aiming to improve their run time.
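In workflow terms the change is roughly this (matrix keys are illustrative):
```yaml
strategy:
  fail-fast: false
  matrix:
    # hypothetical key: a third ASAN shard, where there were two before
    shard: [1, 2, 3]
```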
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69843
Reviewed By: janeyx99
Differential Revision: D33160771
Pulled By: xidachen
fbshipit-source-id: dba1d318cc49b923e18704839471d8753cc00eca
Summary:
This is a partial revert of bb522c9d7a, reverting the addition of the failing CUDA 11.5 Windows workflows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69365
Reviewed By: suo
Differential Revision: D32831418
Pulled By: atalman
fbshipit-source-id: 184346d22623f88594312a4ce2e4d29cc67e8338
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69172
Migrates the docs push jobs to GitHub Actions by implementing a simple
WITH_PUSH switch to do the actual push.
Adds 2 new workflows for GHA:
* linux-docs (on trunk)
* linux-docs-push (on schedule)
linux-docs-push is the only workflow that actually gets access to
credentials so it should be relatively safe.
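Schematically, only the scheduled workflow flips the switch; the names, schedule, and script below are hypothetical:
```yaml
# linux-docs-push: the only variant that sets WITH_PUSH, and therefore
# the only one that ever exercises the credentialed push path
name: linux-docs-push
on:
  schedule:
    - cron: '0 0 * * *'   # illustrative schedule
jobs:
  docs:
    runs-on: ubuntu-latest
    env:
      WITH_PUSH: "1"
    steps:
      - run: ./build_docs_and_maybe_push.sh   # hypothetical script
```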
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D32767239
Pulled By: seemethere
fbshipit-source-id: 5b100f986cf4023c323f4f96f0fe7942fec49ad2
Summary:
Do not run distributed tests as a separate shard; keep them inside one of the two shards (to limit concurrency problems).
Fixes https://github.com/pytorch/pytorch/issues/68260
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68784
Reviewed By: seemethere, janeyx99
Differential Revision: D32653440
Pulled By: malfet
fbshipit-source-id: ebe5bbc30bdf67e930f2c766c920932700f3a4e4
Summary:
This fixes a custom class registration issue when `typeid` is not guaranteed to be unique across multiple libraries, which is the case for the libc++ runtime on macOS 11, in particular on M1.
From [libcxx/include/typeinfo](78d6a7767e/include/typeinfo (L139)):
```
// -------------------------------------------------------------------------- //
// NonUniqueARMRTTIBit
// -------------------------------------------------------------------------- //
// This implementation of type_info does not assume always a unique copy of
// the RTTI for a given type inside a program. It packs the pointer to the
// type name into a uintptr_t and reserves the high bit of that pointer (which
// is assumed to be free for use under the ABI in use) to represent whether
// that specific copy of the RTTI can be assumed unique inside the program.
// To implement equality-comparison of type_infos, we check whether BOTH
// type_infos are guaranteed unique, and if so, we simply compare the addresses
// of their type names instead of doing a deep string comparison, which is
// faster. If at least one of the type_infos can't guarantee uniqueness, we
// have no choice but to fall back to a deep string comparison.
```
But the `std::type_index` hash is always computed assuming that the implementation is unique.
Adding a slow path fixes this problem in those scenarios.
Fixes https://github.com/pytorch/pytorch/issues/68039
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68717
Reviewed By: seemethere
Differential Revision: D32605187
Pulled By: malfet
fbshipit-source-id: 8d50e56885b8c97dad3bc34a69c47ef879456dd1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68180
Since we've open sourced the tracing-based selective build, we can deprecate the
op-dependency-graph-based selective build and the static analyzer tool that
produces the dependency graph.
ghstack-source-id: 143108377
Test Plan: CIs
Reviewed By: seemethere
Differential Revision: D32358467
fbshipit-source-id: c61523706b85a49361416da2230ec1b035b8b99c