Summary: Tests are frequently failing with "exceeded the deadline of 1000.00ms". We expect this to happen, so remove the deadline.
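For reference, a minimal sketch of how such a deadline is usually disabled, assuming these tests use Hypothesis (the "exceeded the deadline" wording matches Hypothesis' deadline failure); this is an illustration, not the exact change in this diff:
```
from hypothesis import given, settings, strategies as st

@settings(deadline=None)  # disable the per-example time limit entirely
@given(st.integers())
def test_something_slow(x):
    assert isinstance(x, int)
```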
Test Plan: N/A: Fix breakages
Reviewed By: robieta
Differential Revision: D28581051
fbshipit-source-id: 4825ada9af151fa5d57c45c549138c15ba613705
Summary: When run on very heavily loaded machines, some of these tests are timing out. It's not an issue with the tests; it's an issue with the environment. I've removed the timeout so we at least keep unit test coverage.
Test Plan: N/A: Fix breakages
Reviewed By: ngimel
Differential Revision: D28492334
fbshipit-source-id: aed3ee371763161aab2d356f5623c7df053fda6f
Summary:
This is the only line (not in `third_party`) matching the regex `^#!.*python2`, and [it is not the first line of its file](https://github.com/koalaman/shellcheck/wiki/SC1128), so it has no effect. As a followup to https://github.com/pytorch/pytorch/issues/58275, this PR removes that shebang to reduce confusion, so now all Python shebangs in this repo are `python3`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58409
Reviewed By: walterddr
Differential Revision: D28478469
Pulled By: samestep
fbshipit-source-id: c17684c8651e45d3fc383cbbc04a31192d10f52f
Summary:
Some machines don't have a versionless `python` on their PATH, which breaks these existing shebangs.
I'm assuming that all the existing versionless `python` shebangs are meant to be `python3` and not `python2`; please let me know if my assumption was incorrect for any of these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58275
Test Plan: CI.
Reviewed By: zhouzhuojie
Differential Revision: D28428143
Pulled By: samestep
fbshipit-source-id: 6562be3d12924db72a92a0207b060ef740f61ebf
Summary: Removed the deadline restriction, since the first run can take longer than the deadline while subsequent runs are shorter.
Reviewed By: ngimel
Differential Revision: D28260077
fbshipit-source-id: 8ed2f5c16bc184bf4fae0a59b662fa1da2d4dd0a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57296
It seems many trainers disable print(), so we cannot see the thread dumps from CompleteInTimeOrDie(). Also emit them via log.info().
Test Plan: sandcastle
Reviewed By: aalmah
Differential Revision: D28098738
fbshipit-source-id: dfdca8801bacf5c7bccecc2387cb7ef41dadfa46
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os


def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files


def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])


def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)


if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56717
The signal_handler was under the caffe2 namespace but was being used
by PyTorch as well.
I've fixed this by moving it to the c10 namespace, where both C2 and PyTorch
can now use it.
The signal_handler interface in caffe2/utils/signal_handler.h is kept the same
for backward compatibility with C2, but most of the common code is moved to c10.
ghstack-source-id: 127446929
Test Plan: waitforbuildbot
Reviewed By: ezyang
Differential Revision: D27946738
fbshipit-source-id: d6228d1a0108f4c807d405e7a0bb799c5375388f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56813
When the arg `pass_inputs_as_tensor_list` is True, the input tensors are wrapped into a TensorList and passed in as a single param.
Test Plan: buck test //caffe2/caffe2/python:workspace_test -- TestScriptModule
Reviewed By: dzhulgakov
Differential Revision: D27972928
fbshipit-source-id: 5a199649445b0306f3134086c85bd55da45e1a0b
Summary: `networkx 2.4+` renamed the `node` attribute to `nodes` on graph objects. This caused failures in `caffe2`'s `topological_sort_traversal_longest_path` function, which uses the networkx library for topological sort.
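A minimal illustration of the API change (a sketch on a toy graph, not code from this diff):
```
import networkx as nx

g = nx.path_graph(3, create_using=nx.DiGraph)
# networkx < 2.4:  g.node[0]
# networkx >= 2.4: the `node` alias is gone; use `nodes` instead
attrs = g.nodes[0]
print(list(nx.topological_sort(g)))
```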
Differential Revision: D27718857
fbshipit-source-id: 812fbb613946565d089cc84a20f3cdf7df046e19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55003
Using the `caffe2::setPrintStackTracesOnFatalSignal` utility in
distributed tests to set a signal handler that dumps the state of all threads
for all processes when it receives a FATAL signal. This would help in debugging
tests further.
I had to revert all the python faulthandler code since only one signal handler
function is supported, so running python faulthandler with
`setPrintStackTracesOnFatalSignal` doesn't work.
Sample output:
```
SIGSEGV(11), PID: 3492872, Thread 3492872:
[0] ???(0x7fa7b2d1d61b) in libcaffe2_caffe2_caffe2_cpu.so
[1] ???(0x7fa7b2d1d3fb) in libcaffe2_caffe2_caffe2_cpu.so
[2] ???(0x7fa7b2d1d33d) in libcaffe2_caffe2_caffe2_cpu.so
[3] ???(0x7fa7b2d1d167) in libcaffe2_caffe2_caffe2_cpu.so
[4] ???(0x7fa7ce683150) in libpthread.so.0
[5] ???(0x7fa7be2b233c) in libcaffe2__C_impl_cuda.so
[6] ???(0x7fa7be2ce80c) in libcaffe2__C_impl_cuda.so
[7] ???(0x7fa7be2a0512) in libcaffe2__C_impl_cuda.so
[8] torch::distributed::rpc::TensorPipeAgent::send(torch::distributed::rpc::WorkerInfo const&, torch::distributed::rpc::Message&&, float, std::unordered_map<signed char, signed char, std::hash<signed char>, std::equal_to<signed char>, std::allocator<std::pair<signed char const, signed char> > > const&)+0x24f(0x7fa7be29f71f) in libcaffe2__C_impl_cuda.so
[9] torch::distributed::autograd::sendMessageWithAutograd(torch::distributed::rpc::RpcAgent&, torch::distributed::rpc::WorkerInfo const&, torch::distributed::rpc::Message&&, bool, float, bool)+0x393(0x7fa7b602b203) in libcaffe2_libtorch.so
[10] torch::distributed::rpc::pyRpcPythonUdf(torch::distributed::rpc::WorkerInfo const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<at::Tensor, std::allocator<at::Tensor> >&, float, bool)+0x201(0x7fa7bd844971) in libcaffe2__C_impl_cuda.so
```
ghstack-source-id: 125630551
Test Plan: waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D27419714
fbshipit-source-id: 8aca9a14ef688004053d8798124d9c3a3fbe3489
Summary:
*Context:* https://github.com/pytorch/pytorch/issues/53406 added a lint for trailing whitespace at the ends of lines. However, in order to pass FB-internal lints, that PR also had to normalize the trailing newlines in four of the files it touched. This PR adds an OSS lint to normalize trailing newlines.
The changes to the following files (made in 54847d0adb9be71be4979cead3d9d4c02160e4cd) are the only manually-written parts of this PR:
- `.github/workflows/lint.yml`
- `mypy-strict.ini`
- `tools/README.md`
- `tools/test/test_trailing_newlines.py`
- `tools/trailing_newlines.py`
I would have liked to make this just a shell one-liner like the other three similar lints, but nothing I could find quite fit the bill. Specifically, all the answers I tried from the following Stack Overflow questions were far too slow (at least a minute and a half to run on this entire repository):
- [How to detect file ends in newline?](https://stackoverflow.com/q/38746)
- [How do I find files that do not end with a newline/linefeed?](https://stackoverflow.com/q/4631068)
- [How to list all files in the Git index without newline at end of file](https://stackoverflow.com/q/27624800)
- [Linux - check if there is an empty line at the end of a file [duplicate]](https://stackoverflow.com/q/34943632)
- [git ensure newline at end of each file](https://stackoverflow.com/q/57770972)
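For reference, a rough Python sketch of the kind of check `tools/trailing_newlines.py` performs (an illustration under the assumption that "correct" means the file is empty or ends in exactly one newline; not the actual implementation):
```
import sys

def ends_with_single_newline(path):
    with open(path, "rb") as f:
        data = f.read()
    if not data:
        return True
    # exactly one trailing newline: ends with '\n' but not with a blank line
    return data.endswith(b"\n") and not data.endswith(b"\n\n")

if __name__ == "__main__":
    bad = [p for p in sys.argv[1:] if not ends_with_single_newline(p)]
    for p in bad:
        print(p)
    sys.exit(1 if bad else 0)
```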
To avoid giving false positives during the few days after this PR is merged, we should probably only merge it after https://github.com/pytorch/pytorch/issues/54967.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54737
Test Plan:
Running the shell script from the "Ensure correct trailing newlines" step in the `quick-checks` job of `.github/workflows/lint.yml` should print no output and exit in a fraction of a second with a status of 0. That was not the case prior to this PR, as shown by this failing GHA workflow run on an earlier draft of this PR:
- https://github.com/pytorch/pytorch/runs/2197446987?check_suite_focus=true
In contrast, this run (after correcting the trailing newlines in this PR) succeeded:
- https://github.com/pytorch/pytorch/pull/54737/checks?check_run_id=2197553241
To unit-test `tools/trailing_newlines.py` itself (this is run as part of our "Test tools" GitHub Actions workflow):
```
python tools/test/test_trailing_newlines.py
```
Reviewed By: malfet
Differential Revision: D27409736
Pulled By: samestep
fbshipit-source-id: 46f565227046b39f68349bbd5633105b2d2e9b19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54042
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53881
1. Fix the position_weighted optimizer: the position-weighted layer uses the default optimizer but is actually a gradient_slice, which will cause problems if we do not handle it properly in the new optimizer. The solution is to use sparseadagrad when it is a gradient_slice.
2. Optimizer implementation of v1 and v2: using 1st momentum with/without bias_correction.
3. Also implemented decoupled weight decay in the new optimizer (see the sketch below).
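For item 3, a minimal sketch of the difference between coupled (L2) and decoupled weight decay in an Adagrad-style update; the argument names and defaults here are assumptions for illustration, not the DecayAdagrad API:
```
import numpy as np

def adagrad_step(param, grad, state, lr=0.01, eps=1e-8, weight_decay=0.0, decoupled=True):
    if not decoupled:
        grad = grad + weight_decay * param            # classic L2: decay folded into the gradient
    state = state + grad * grad                       # Adagrad accumulator
    update = lr * grad / (np.sqrt(state) + eps)
    if decoupled:
        update = update + lr * weight_decay * param   # decoupled: decay applied directly to the weights
    return param - update, state
```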
Test Plan:
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_2 -- test_mlp_optimization
buck test //caffe2/caffe2/python:optimizer_test -- TestDecayAdagrad
buck test //caffe2/caffe2/python/operator_test:decay_adagrad_test
ctr_mbl_feed work flow: f255731660
oc work flow: f255739503
Reviewed By: 0x10cxR1
Differential Revision: D26839668
fbshipit-source-id: 2b6881c1a88540ef5766be40f5e80001257e2199
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53735
Add an option to BlobSerializationOptions to request that float data be
serialized as bfloat16. This reduces the serialized data size at the expense
of some loss in precision.
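For intuition, a small sketch of why bfloat16 halves the payload: each float32 is truncated to its upper 16 bits (sign, exponent, and the top 7 mantissa bits). This is only an illustration of the format, not the serializer's code:
```
import numpy as np

def float32_to_bfloat16_bits(x):
    as_u32 = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (as_u32 >> 16).astype(np.uint16)   # 2 bytes per value instead of 4

def bfloat16_bits_to_float32(b):
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.array([3.14159, -1e-5, 42.0], dtype=np.float32)
print(bfloat16_bits_to_float32(float32_to_bfloat16_bits(x)))  # close to x, lower precision
```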
ghstack-source-id: 124317910
Test Plan: Included a new unit test.
Reviewed By: mraway
Differential Revision: D26658205
fbshipit-source-id: 74521ed161059066355a3f208488ed01a344dbb5
Summary: Add the ability to reset the optimizer counter.
Test Plan: Will wait for integration tests to run on the diff.
Differential Revision: D27248286
fbshipit-source-id: a608df1bd61b64eb317c9ffd9cfdd804c5288f6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54274
Some of the Python tests need to be aware of whether or not FBGEMM is
available, so expose this setting in the pybind extension.
ghstack-source-id: 124317732
Test Plan: Will use this variable in the tests on D26658205.
Reviewed By: mraway
Differential Revision: D27171780
fbshipit-source-id: 4c94144a959bf8bf0e1553b6e029e94a91794e29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53402
Add an `options` field to the `Save` operator which accepts options for how to
serialize different blobs. At the moment this simply allows controlling the
existing `chunk_size` behavior, but in the future we can add other options,
such as the ability to control compression settings or other serialization
formats.
ghstack-source-id: 123567034
Test Plan:
Added a new test to `load_save_test.py` that passes in options and verifies
that blobs were serialized with the expected number of chunks.
buck test caffe2/caffe2:caffe2_test_cpu \
caffe2/caffe2/core:serialization_test \
caffe2/caffe2/python/operator_test:load_save_test
Reviewed By: mraway
Differential Revision: D26502577
fbshipit-source-id: 6e302e530bb96990517c2e35c505db7f14a56284
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53401
This is a reland of D26641599 (cd9ac54ea7) after rebasing onto D26802576 (f595ba1bae).
Add some small utility functions to read the blob names back from the minidb
file so that we can verify how many chunks were written for each blob.
ghstack-source-id: 123567033
Test Plan: buck test caffe2/caffe2/python/operator_test:load_save_test
Reviewed By: mraway
Differential Revision: D26853942
fbshipit-source-id: 0b45078fdd279f547752c8fdb771e296374a00da
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
Add some small utility functions to read the blob names back from the minidb
file so that we can verify how many chunks were written for each blob.
Test Plan: buck test caffe2/caffe2/python/operator_test:load_save_test
Reviewed By: mraway
Differential Revision: D26641599
fbshipit-source-id: bccb0af157d85e585e95bc7be61c4584fba3cb04
Summary:
Add a test in `load_save_test.py` that passes in a chunk_size parameter,
to ensure that we exercise the logic that passes the chunk size to the C++
serialization code.
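For illustration, a rough sketch of how such a test might drive the Save operator with an explicit chunk size (the operator arguments below are assumptions based on the description above, not the test's actual code):
```
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("x", np.random.rand(1000, 16).astype(np.float32))
save_op = core.CreateOperator(
    "Save", ["x"], [],
    db="/tmp/chunk_demo.minidb", db_type="minidb", absolute_path=1,
    chunk_size=100,  # assumed argument name, per the description above
)
workspace.RunOperatorOnce(save_op)
```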
Test Plan:
Ran the tests with the vlog level set to 3 and manually verified the log
messages showed that we were serializing in the expected chunks.
There are existing C++ tests that confirm chunking behavior works as expected
in the pure C++ code.
Reviewed By: mraway
Differential Revision: D26502578
fbshipit-source-id: cd0074f2358da81c68b0fed2c2a94818d83a957d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52388
Pull Request resolved: https://github.com/pytorch/glow/pull/5364
This allows us to change global variables through onnxifi calls, and adds Python bindings along with it. Note that we supply a dummy backend_id, as it's not needed by glow since the setting is global.
#codemod
Test Plan:
```
buck test mode/dev //glow/fb/test:test_onnxifi_optionnnpi
```
Reviewed By: jfix71, khabinov
Differential Revision: D26481652
fbshipit-source-id: 19b8201c77f653cf7d93ad68760aa7fb5ec45ff4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51768
This updates python/core.py to explicitly define all of the `DataType`
values rather than dynamically defining them at runtime from the
`caffe2_pb2` values.
This allows type checkers like Pyre and Mypy to see the members of the
`DataType` class. Otherwise the type checkers report errors such as
`"core.DataType" has no attribute "INT64"`.
This code does keep a run-time check that all of the data types defined
by `caffe2_pb2.proto` are defined correctly in this file. This way if
someone does add a new type to `caffe2_pb2.proto` it should be very
quickly apparent that this file needs to be updated and kept in sync.
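A rough sketch of the pattern described above (illustrative only: these are not the exact definitions from core.py, and the enum values shown are assumptions that the runtime check below would catch if wrong):
```
from caffe2.proto import caffe2_pb2

class DataType:
    # Spelled out explicitly so Pyre/Mypy can see the members.
    UNDEFINED = 0
    FLOAT = 1
    INT32 = 2
    INT64 = 10

def _check_data_types_in_sync():
    # Runtime guard: fail loudly if these values ever diverge from caffe2_pb2.proto.
    for name in ("UNDEFINED", "FLOAT", "INT32", "INT64"):
        proto_value = caffe2_pb2.TensorProto.DataType.Value(name)
        assert getattr(DataType, name) == proto_value, f"DataType.{name} out of sync"
```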
ghstack-source-id: 121936201
Test Plan:
Confirmed that various caffe2/python tests still pass.
Verified that this allows many `pyre-fixme` comments to be removed in
downstream projects, and that Pyre is still clean for these projects.
Reviewed By: jeffdunn
Differential Revision: D26271725
Pulled By: simpkins
fbshipit-source-id: f9e95795de60aba67d7d3872d0c141ed82ba8e39
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51767
The `_import_c_extension.py` finds the right C extension library to use,
and then simply re-exports all of the symbols that it defines.
This adds a `_import_c_extension.pyi` file with type hints to let type
checkers like Pyre and Mypy know the names of the symbols that will be
re-exported from the C extension.
This does not define all of the symbols provided by the C extension,
but does define all of the symbols necessary to make type checkers happy
about other code in the `caffe2/python` directory.
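For a sense of the shape of such a stub, a hand-written sketch (the symbol names below are examples/assumptions, not a verified list of the extension's exports):
```
# _import_c_extension.pyi
from typing import Any, List

has_gpu_support: bool
def GlobalInit(args: List[str]) -> None: ...
def RegisteredOperators() -> List[str]: ...
def __getattr__(name: str) -> Any: ...  # remaining symbols fall back to Any
```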
ghstack-source-id: 121916324
Test Plan:
Was able to have Pyre successfully type check the `caffe2/python`
directory with this stub file plus a few other changes.
Confirmed that all of the dependent projects affected by this report no new
pyre issues in sandcastle.
Ran `python test/test_type_hints.py` in the PyTorch github repository and
confirmed it also passes.
Differential Revision: D26271726
Pulled By: simpkins
fbshipit-source-id: 6dbadcf02e0b2cc44a9e3cdabe9291c1250959b4
Summary: Previously there was no regularizer implemented for fp16 sparse features. Add regularizer support here using the `Float16SparseNormalize` operator implemented in this stack.
Test Plan:
buck test //caffe2/caffe2/python:regularizer_test
In f248648705, we can see there is the operator `Float16SparseNormalize`.
{F356635445}
Reviewed By: bigrabithong
Differential Revision: D24042567
fbshipit-source-id: 5e0065f8c10b8748daffa8a54a6bf8f461460b18
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51762
Update test_util.py to add a `make_tempdir()` function to the `TestCase`
class. The main advantage of this function is that the temporary
directory will be automatically cleaned up when the test case finishes,
so that the test case does not need to worry about manually cleaning up this
directory.
This also prefixes the directory name with `caffe2_test.` so that it is
more obvious where the temporary directories came from if they are ever
left behind after a crashed or killed test process.
This updates the tests in `operator_test/load_save_test.py` to use this
new function, so they no longer have to perform their own manual cleanup
in each test.
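A rough sketch of the helper's assumed shape (not the exact test_util.py code):
```
import shutil
import tempfile
import unittest

class TestCase(unittest.TestCase):
    def make_tempdir(self):
        # Prefix makes stray directories easy to attribute if a test is killed.
        path = tempfile.mkdtemp(prefix="caffe2_test.")
        # Removed automatically when the test finishes, pass or fail.
        self.addCleanup(shutil.rmtree, path, ignore_errors=True)
        return path
```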
Test Plan: python caffe2/python/operator_test/load_save_test.py
Reviewed By: mraway
Differential Revision: D26271178
Pulled By: simpkins
fbshipit-source-id: 51175eefed39d65c03484482e84923e5f39a4768
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51766
Check if we are on Windows using `sys.platform` rather than
`platform.system()`. Even though `platform.system()` is more modern, it
has a few downsides: it performs a runtime check of the platform type,
which has non-zero overhead. On Linux it can end up executing a separate
`/bin/uname` process. On the other hand, `sys.platform` is determined
when the Python interpreter is compiled, so this is a simple hard-coded
string.
Because it is a runtime check, `platform.system()` checks also cannot be
analyzed by static type checkers like Pyre and Mypy. These type
checkers do understand `sys.platform` checks, and can correctly avoid
complaining about code paths that use platform-specific modules and
functions. e.g., they can avoid complaining about `ctypes.WinDLL` not
existing on Linux if its use is guarded by a `sys.platform` check.
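A minimal sketch of the kind of guard described above (illustrative, not code from this diff):
```
import sys
import ctypes

if sys.platform == "win32":
    # Mypy/Pyre understand this guard, so the Windows-only API below
    # is not flagged when type-checking on Linux.
    _kernel32 = ctypes.WinDLL("kernel32")
else:
    _kernel32 = None
```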
ghstack-source-id: 121107705
Test Plan: Ran tests on Linux, and will check CI test results.
Reviewed By: mraway
Differential Revision: D26271724
Pulled By: simpkins
fbshipit-source-id: b86e427e4ceec0324464ba4bc88b95d5813172d0
Summary:
Increasing the deadline to avoid flakiness of the test on ROCm.
Signed-off-by: Roy, Arindam <rarindam@gmail.com>
Fixes #{issue number}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52013
Reviewed By: albanD
Differential Revision: D26360209
Pulled By: mrshenli
fbshipit-source-id: 1ddc7062c5ff7c980233d22844073de9fb7dcbb3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52083
This makes minor fixes in `caffe2/python` to address all errors currently
reported by Pyre.
I update the code to fix errors when doing so looked simple and safe,
and added `pyre-fixme` comments in other places.
ghstack-source-id: 121109695
Test Plan: Confirmed that Pyre no longer reports errors under `caffe2/python`
Differential Revision: D26272279
fbshipit-source-id: b1eb19d323b613f23280ce9c71e800e874ca1162
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51769
Remove some Python 2 compatibility code that otherwise causes errors to
be reported from static type checkers.
Static type checkers complain that the old Python 2 modules and
functions referenced by this code do not exist. Given that Python 2
support is entirely deprecated now we can simply remove the
compatibility code.
ghstack-source-id: 121313191
Test Plan:
Was able to get Pyre to successfully type check the `caffe2/python`
directory with this and some other changes.
Reviewed By: Tianshu-Bao
Differential Revision: D26271723
Pulled By: simpkins
fbshipit-source-id: fec8a09466be6867388832380480aafd36616aa1
Summary: Moving caffe2_core_gpu_python contbuild to use GPU/RE
Test Plan: CI
Reviewed By: malfet
Differential Revision: D26261826
fbshipit-source-id: a6f8c7bd8368c1cb69499ea0ea7d5add0956a7ad
Summary:
The test is flaky on ROCm when the deadline is set to 1 second. This is affecting builds, as it fails randomly.
Disabling for now.
Signed-off-by: Arindam Roy <rarindam@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50964
Reviewed By: houseroad
Differential Revision: D26049370
Pulled By: BIT-silence
fbshipit-source-id: 22337590a8896ad75f1281e56fbbeae897f5c3b2