pytorch/benchmarks/inference/results
Mikayla Gawarecki 19207b9183 Allow more backend worker threads with each using a separate cuda stream (#116190)
Added a `--num_workers` option to `server.py` that allows more than one worker in the `ThreadPoolWorker` used for model predictions. Each worker uses its own `cuda.Stream()`, created when the worker thread is initialized.
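
A minimal sketch of the per-worker-stream pattern (assuming a standard `ThreadPoolExecutor`; the names `_init_stream`, `predict`, and `worker_pool` are illustrative, not the actual `server.py` code):

```python
# Illustrative sketch, not the actual server.py implementation:
# each worker thread creates one CUDA stream at startup and runs
# all of its predictions on that stream.
import threading
from concurrent.futures import ThreadPoolExecutor

import torch

_local = threading.local()

def _init_stream():
    # Runs once per worker thread: give this thread its own stream.
    _local.stream = torch.cuda.Stream()

def predict(model, batch):
    # Run inference on this worker's private stream, so kernels issued
    # by different workers can overlap on the GPU.
    with torch.no_grad(), torch.cuda.stream(_local.stream):
        out = model(batch)
    # Make the output safe to read from other streams.
    _local.stream.synchronize()
    return out

num_workers = 4  # corresponds to the new --num_workers option
worker_pool = ThreadPoolExecutor(max_workers=num_workers,
                                 initializer=_init_stream)
```

Submitting work is then just `worker_pool.submit(predict, model, batch)`; each call runs on whichever worker thread picks it up, on that thread's private stream.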

Ran the benchmark with 2-4 workers and `compile=False` (since compile is not thread-safe).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116190
Approved by: https://github.com/albanD
ghstack dependencies: #115286, #116187, #116188, #116189
2023-12-20 22:08:29 +00:00
output_1_false.md Allow more backend worker threads with each using a separate cuda stream (#116190) 2023-12-20 22:08:29 +00:00
output_1_true.md Do H2D/D2H of input/result on separate threads/cuda.Streams (#116189) 2023-12-20 22:08:29 +00:00
output_32_false.md Allow more backend worker threads with each using a separate cuda stream (#116190) 2023-12-20 22:08:29 +00:00
output_32_true.md Do H2D/D2H of input/result on separate threads/cuda.Streams (#116189) 2023-12-20 22:08:29 +00:00
output_64_false.md Allow more backend worker threads with each using a separate cuda stream (#116190) 2023-12-20 22:08:29 +00:00
output_64_true.md Do H2D/D2H of input/result on separate threads/cuda.Streams (#116189) 2023-12-20 22:08:29 +00:00
output_128_false.md Allow more backend worker threads with each using a separate cuda stream (#116190) 2023-12-20 22:08:29 +00:00
output_128_true.md Do H2D/D2H of input/result on separate threads/cuda.Streams (#116189) 2023-12-20 22:08:29 +00:00
output_256_false.md Allow more backend worker threads with each using a separate cuda stream (#116190) 2023-12-20 22:08:29 +00:00
output_256_true.md Do H2D/D2H of input/result on separate threads/cuda.Streams (#116189) 2023-12-20 22:08:29 +00:00