Added a `--num_workers` option to `server.py` that allows more than one worker in the `ThreadPoolWorker` used for model predictions. Each worker uses its own `cuda.Stream()`, created when the worker thread is initialized. Ran the benchmark for 2-4 workers with `compile=False` (since compile is not thread-safe).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116190
Approved by: https://github.com/albanD
ghstack dependencies: #115286, #116187, #116188, #116189
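For illustration, a minimal sketch of the per-worker-stream pattern described above, not the actual `server.py` code: it assumes a standard `ThreadPoolExecutor` with an `initializer` hook and thread-local storage, and the helper names (`_init_worker`, `_predict`) are hypothetical. It requires a CUDA-capable device.

```python
import argparse
import threading
from concurrent.futures import ThreadPoolExecutor

import torch

# Hypothetical thread-local holder for each worker's private stream.
_worker_local = threading.local()


def _init_worker():
    # Create one cuda.Stream() per worker thread, once, when the thread starts.
    _worker_local.stream = torch.cuda.Stream()


def _predict(model, batch):
    # Run the forward pass on this worker's own stream so work from
    # different workers can overlap on the GPU.
    with torch.cuda.stream(_worker_local.stream):
        with torch.no_grad():
            out = model(batch)
    # Make sure the stream's work is done before returning the result.
    _worker_local.stream.synchronize()
    return out


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--num_workers", type=int, default=1)
    args = parser.parse_args()

    model = torch.nn.Linear(128, 128).cuda().eval()
    pool = ThreadPoolExecutor(max_workers=args.num_workers,
                              initializer=_init_worker)

    batches = [torch.randn(32, 128, device="cuda") for _ in range(8)]
    futures = [pool.submit(_predict, model, b) for b in batches]
    results = [f.result() for f in futures]
    pool.shutdown()
```

The `initializer` callback is what ties a stream to a thread's lifetime: it runs exactly once per worker thread, mirroring the "stream is created when the worker thread is initialized" behavior in the description.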
Benchmark output files in this directory:

| File |
|---|
| output_1_false.md |
| output_1_true.md |
| output_32_false.md |
| output_32_true.md |
| output_64_false.md |
| output_64_true.md |
| output_128_false.md |
| output_128_true.md |
| output_256_false.md |
| output_256_true.md |