In almost all cases `<iostream>` is included only to implement an output formatter, which
uses only `std::ostream`, so including `<ostream>` is sufficient.
The `<istream>` header is ~1000 lines, so the difference is non-trivial.
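For illustration, the pattern looks like this (`MyType` is a stand-in, not a real PyTorch type):
```
#include <ostream>  // sufficient: formatters only need std::ostream

struct MyType { int x; };

// An output formatter of the kind described above. Nothing here needs
// <iostream>, which would also pull in <istream> and the global streams.
inline std::ostream& operator<<(std::ostream& os, const MyType& t) {
  return os << "MyType(" << t.x << ")";
}
```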
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106914
Approved by: https://github.com/lezcano
We define specializations for pybind11-defined templates
(in particular, PYBIND11_DECLARE_HOLDER_TYPE), and consequently
it is important that these specializations *always* be #include'd
when making use of pybind11 templates whose behavior depends on
them; otherwise we can cause an ODR violation.
The easiest way to ensure that all the specializations are always
loaded is to designate a header (in this case, torch/csrc/util/pybind.h)
that ensures the specializations are defined, and then add a lint
to ensure this header is included whenever pybind11 headers are
included.
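To make the failure mode concrete, here is a minimal sketch (Foo is a hypothetical type, and the exact contents of torch/csrc/util/pybind.h are an assumption here, though declaring a c10::intrusive_ptr holder is the motivating case):
```
// foo.h (shared): a type managed by an intrusive refcount
struct Foo : c10::intrusive_ptr_target {};

// a.cpp: sees the holder specialization before any pybind11 template use
#include <torch/csrc/util/pybind.h>  // e.g. PYBIND11_DECLARE_HOLDER_TYPE(T, c10::intrusive_ptr<T>)
#include <pybind11/pybind11.h>

void bind_a(pybind11::module& m) {
  pybind11::class_<Foo, c10::intrusive_ptr<Foo>>(m, "Foo");
}

// b.cpp: instantiates the same pybind11 templates WITHOUT the specialization
#include <pybind11/pybind11.h>

void bind_b(pybind11::module& m) {
  m.def("touch_foo", [](const Foo& f) { (void)f; });
}

// a.cpp and b.cpp now contain differing definitions of the same pybind11
// template entities for Foo, which is an ODR violation.
```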
The existing grep linter didn't have enough knobs to do this
conveniently, so I added some features. I'm open to suggestions
for how to structure the features better. The main changes:
- Added an --allowlist-pattern flag, which turns off the grep lint
if some other (allowlisted) line is present in the file. This is
used to stop the grep lint from complaining about pybind11 includes
when the util include already exists (see the example after this list).
- Added a --match-first-only flag, which lets grep match only against
the first matching line. This is because, even if there are multiple
problematic includes, only one of them needs to be fixed.
We don't /really/ need this, but when I was running lintrunner -a
to fix up the preexisting codebase it was annoying without it,
as the overall lintrunner driver fails if there are multiple edits
on the same file.
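Concretely, the lint flags a bare pybind11 include but stays quiet once the allowlisted include is present (illustrative only; the real patterns live in the lint config):
```
// Flagged by the grep lint:
#include <pybind11/pybind11.h>

// Accepted: the allowlist pattern matches the designated header, which
// guarantees the specializations are loaded first
#include <torch/csrc/util/pybind.h>
#include <pybind11/pybind11.h>
```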
I excluded any files that didn't otherwise have a dependency on
torch/ATen; this was mostly caffe2 and the valgrind wrapper compat
bindings.
Note that the grep-based replacement is fairly crude, but the clang-tidy
lint cleaned it up in most cases.
See also https://github.com/pybind/pybind11/issues/4099
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552
Approved by: https://github.com/albanD
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os


def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files


def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])


def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)


if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36282
The reason to do this explicitly in the tool is that we don't want profiling to capture warmup (or input cloning). So instead we make the benchmarking code explicitly aware of the profiler.
Example output:
```
I0408 16:06:40.300040 85516 throughput_benchmark-inl.h:106] Using Autograd profiler. Trace will be saved to /tmp/tmpt0gsz85y
I0408 16:06:40.302232 85516 throughput_benchmark-inl.h:111] Starting threads
I0408 16:06:40.302258 85524 throughput_benchmark-inl.h:78] Starting forward thread 1
I0408 16:06:40.302259 85525 throughput_benchmark-inl.h:78] Starting forward thread 2
I0408 16:06:40.302261 85523 throughput_benchmark-inl.h:78] Starting forward thread 0
I0408 16:06:40.302259 85526 throughput_benchmark-inl.h:78] Starting forward thread 3
I0408 16:06:40.412879 85525 throughput_benchmark-inl.h:88] Shutting down forward thread 2. Total number of finished threads: 1
I0408 16:06:40.412971 85523 throughput_benchmark-inl.h:88] Shutting down forward thread 0. Total number of finished threads: 2
I0408 16:06:40.412989 85526 throughput_benchmark-inl.h:88] Shutting down forward thread 3. Total number of finished threads: 3
I0408 16:06:40.413033 85524 throughput_benchmark-inl.h:88] Shutting down forward thread 1. Total number of finished threads: 4
I0408 16:06:40.413056 85516 throughput_benchmark-inl.h:123] Finished benchmark
Average latency per example: 443.256us
Total number of iterations: 1000
Total number of iterations per second (across all threads): 9024.12
Total time: 110.814ms
```
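The control flow that produces a log like this is roughly the following self-contained sketch (an illustration of the pattern, not the actual PyTorch implementation):
```
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

int main() {
  constexpr int kNumThreads = 4;
  constexpr int kItersPerThread = 250;  // 1000 iterations total, as in the log
  auto forward = [] { std::this_thread::sleep_for(std::chrono::microseconds(100)); };

  // Warmup runs before the profiler would be enabled, so it never shows up
  // in the trace.
  for (int i = 0; i < 100; ++i) forward();

  // <-- a real implementation would enable the autograd profiler here
  auto start = std::chrono::steady_clock::now();
  std::atomic<int> finished{0};
  std::vector<std::thread> threads;
  for (int t = 0; t < kNumThreads; ++t) {
    threads.emplace_back([&, t] {
      for (int i = 0; i < kItersPerThread; ++i) forward();
      std::cout << "Shutting down forward thread " << t
                << ". Total finished: " << ++finished << '\n';
    });
  }
  for (auto& th : threads) th.join();
  // <-- ...and disable it here, saving the trace to the requested path
  std::chrono::duration<double, std::milli> total =
      std::chrono::steady_clock::now() - start;
  std::cout << "Total time: " << total.count() << "ms\n";
}
```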
Test Plan: Imported from OSS
Differential Revision: D20987125
Pulled By: ezyang
fbshipit-source-id: 1f8980c3a5a0abdc268c7a16c99aa9ea868689eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35848
So far this class was used only from its Python bindings. As a result, testing in a C++-only environment is not currently possible; more specifically, adding inputs requires using
py::args and py::kwargs. This PR fixes that by adding another addInput function to the ScriptModuleBenchmark class.
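A sketch of the C++-only usage this enables (the header path and exact addInput signature here are assumptions based on the description, not verified API):
```
#include <torch/csrc/utils/throughput_benchmark.h>

void benchmarkFromCpp(torch::jit::script::Module module) {
  torch::throughput_benchmark::ScriptModuleBenchmark bench(std::move(module));
  // The new overload accepts IValues directly, so neither py::args nor
  // py::kwargs (nor a live Python interpreter) is needed to add inputs.
  bench.addInput(std::vector<c10::IValue>{torch::randn({1, 16})});
}
```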
Test Plan: Imported from OSS
Differential Revision: D20820772
Pulled By: ilia-cher
fbshipit-source-id: f1ea1b7baa637b297cc0dec5ca6375f6caff21f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34515
Once upon a time we thought this was necessary. In reality it is not, so
removing it.
For backcompat, our public interface (defined in `api/`) still has
typedefs to the old `script::` names.
There was only one collision: `Pass` as a `Stmt` and `Pass` as a graph
transform. I renamed one of them.
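The backcompat shim amounts to aliases along these lines (illustrative sketch; Module stands in for the full set of exported names):
```
namespace torch {
namespace jit {
namespace script {
// Definitions now live directly in torch::jit; the old script:: spellings
// stay valid through aliases in the public api/ headers.
using Module = ::torch::jit::Module;
} // namespace script
} // namespace jit
} // namespace torch
```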
Test Plan: Imported from OSS
Differential Revision: D20353503
Pulled By: suo
fbshipit-source-id: 48bb911ce75120a8c9e0c6fb65262ef775dfba93
Summary:
This is useful for measuring the inference performance of your
models. This is a very basic benchmark for now: we don't support
batching on the benchmark side, and no inter- or intra-op parallelism is
supported yet, just caller-based parallelism.
The main philosophy here is that the user should be able to provide inputs
from Python and just stack them within the benchmark. The API should be
exactly the same as passing inputs to module.forward (see the sketch below).
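A rough sketch of that calling convention (the benchmark type and addInput spelling follow the later commits above; treat them as assumptions here):
```
#include <torch/script.h>

// Registering an input with the benchmark mirrors calling forward() directly.
void recordExample(
    torch::jit::script::Module& module,
    torch::throughput_benchmark::ScriptModuleBenchmark& bench) {
  std::vector<c10::IValue> example{torch::randn({1, 16})};
  module.forward(example);  // direct call
  bench.addInput(example);  // identical arguments, stacked in the benchmark
}
```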
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20766
Test Plan: Added a new unit test
Differential Revision: D15435461
Pulled By: salexspb
fbshipit-source-id: db08829dc3f4398bb1d8aa16cc4a58b6c72f16c6