pytorch/benchmarks/inference/results
Mikayla Gawarecki 19207b9183 Allow more backend worker threads with each using a separate cuda stream (#116190)
Added a `--num_workers` option to `server.py` that allows more than one worker in the `ThreadPoolWorker` used for model predictions. Each worker uses its own `cuda.Stream()`, created when the worker thread is initialized.
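
A minimal sketch of the per-worker-stream pattern (assuming a standard `ThreadPoolExecutor`; the names `_init_stream`, `predict`, and `worker_pool` are illustrative, not the actual `server.py` code):

```python
# Illustrative sketch, not the actual server.py implementation:
# each worker thread creates one CUDA stream at startup and runs
# all of its predictions on that stream.
import threading
from concurrent.futures import ThreadPoolExecutor

import torch

_local = threading.local()

def _init_stream():
    # Runs once per worker thread: give this thread its own stream.
    _local.stream = torch.cuda.Stream()

def predict(model, batch):
    # Run inference on this worker's private stream, so kernels issued
    # by different workers can overlap on the GPU.
    with torch.no_grad(), torch.cuda.stream(_local.stream):
        out = model(batch)
    # Make the output safe to read from other streams.
    _local.stream.synchronize()
    return out

num_workers = 4  # corresponds to the new --num_workers option
worker_pool = ThreadPoolExecutor(max_workers=num_workers,
                                 initializer=_init_stream)
```

Submitting work is then just `worker_pool.submit(predict, model, batch)`; each call runs on whichever worker thread picks it up, on that thread's private stream.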

Ran the benchmark with 2-4 workers and `compile=False` (since compile is not thread-safe).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116190
Approved by: https://github.com/albanD
ghstack dependencies: #115286, #116187, #116188, #116189
2023-12-20 22:08:29 +00:00
output_1_false.md Allow more backend worker threads with each using a separate cuda stream (#116190) 2023-12-20 22:08:29 +00:00
output_1_true.md Do H2D/D2H of input/result on separate threads/cuda.Streams (#116189) 2023-12-20 22:08:29 +00:00
output_32_false.md Allow more backend worker threads with each using a separate cuda stream (#116190) 2023-12-20 22:08:29 +00:00
output_32_true.md Do H2D/D2H of input/result on separate threads/cuda.Streams (#116189) 2023-12-20 22:08:29 +00:00
output_64_false.md Allow more backend worker threads with each using a separate cuda stream (#116190) 2023-12-20 22:08:29 +00:00
output_64_true.md Do H2D/D2H of input/result on separate threads/cuda.Streams (#116189) 2023-12-20 22:08:29 +00:00
output_128_false.md Allow more backend worker threads with each using a separate cuda stream (#116190) 2023-12-20 22:08:29 +00:00
output_128_true.md Do H2D/D2H of input/result on separate threads/cuda.Streams (#116189) 2023-12-20 22:08:29 +00:00
output_256_false.md Allow more backend worker threads with each using a separate cuda stream (#116190) 2023-12-20 22:08:29 +00:00
output_256_true.md Do H2D/D2H of input/result on separate threads/cuda.Streams (#116189) 2023-12-20 22:08:29 +00:00