2 MB transparent huge pages (THP) give lower allocation latency than the standard 4 KB pages. This change has shown substantial improvement for batch-mode use cases where tensor sizes are larger than 100 MB.
Only enabled if the `THP_MEM_ALLOC_ENABLE` environment variable is set.
Relanding https://github.com/pytorch/pytorch/pull/93888 with the functionality disabled for Android.
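
For readers unfamiliar with the mechanism, here is a rough sketch of an opt-in THP allocation path on Linux. It is not the code from this PR: `thp_enabled`, `alloc_maybe_thp`, `is_huge_page_aligned`, `kThpSupported`, and both constants are hypothetical names, the 64-byte default alignment is an assumption, and treating any value of `THP_MEM_ALLOC_ENABLE` as "set" is an assumption too. Only `posix_memalign`, `madvise`, and `MADV_HUGEPAGE` are real POSIX/Linux interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#if defined(__linux__) && !defined(__ANDROID__)
#include <sys/mman.h>
constexpr bool kThpSupported = true;
#else
constexpr bool kThpSupported = false;  // e.g. Android: THP path compiled out
#endif

constexpr std::size_t kHugePageSize = 2 * 1024 * 1024;  // 2 MB huge page
constexpr std::size_t kDefaultAlignment = 64;           // assumed default

// Assumption: the feature counts as enabled whenever the variable exists.
inline bool thp_enabled() {
  return kThpSupported && std::getenv("THP_MEM_ALLOC_ENABLE") != nullptr;
}

inline bool is_huge_page_aligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % kHugePageSize == 0;
}

// Allocates nbytes; large buffers are 2 MB aligned and marked as THP
// candidates when the feature is enabled. Release with std::free.
inline void* alloc_maybe_thp(std::size_t nbytes) {
  if (nbytes == 0) {
    return nullptr;
  }
  const bool use_thp = thp_enabled() && nbytes >= kHugePageSize;
  const std::size_t alignment = use_thp ? kHugePageSize : kDefaultAlignment;

  void* ptr = nullptr;
  if (posix_memalign(&ptr, alignment, nbytes) != 0) {
    return nullptr;
  }
  assert(!use_thp || is_huge_page_aligned(ptr));

#if defined(__linux__) && !defined(__ANDROID__)
  if (use_thp) {
    // Advisory only: the kernel may ignore the hint or reject it when THP
    // is unavailable, so the return value is deliberately not checked.
    (void)madvise(ptr, nbytes, MADV_HUGEPAGE);
  }
#endif
  return ptr;
}
```

With `THP_MEM_ALLOC_ENABLE=1` exported, a call such as `alloc_maybe_thp(256 * 1024 * 1024)` would take the huge-page path in this sketch; small buffers and unset environments fall back to the default alignment.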
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107697
Approved by: https://github.com/malfet
2 MB transparent huge pages (THP) give lower allocation latency than the standard 4 KB pages. This change has shown significant improvement for batch-mode use cases where tensor sizes are larger than 100 MB.
Only enabled if the `THP_MEM_ALLOC_ENABLE` environment variable is set.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93888
Approved by: https://github.com/jgong5, https://github.com/malfet
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70858
ghstack-source-id: 147642533
Test Plan: Extracted a constant to a new header, trusting CI build to validate.
Reviewed By: malfet
Differential Revision: D33329689
fbshipit-source-id: 8697bb81a5cc3366462ebdf1f214b62d478fa77c
(cherry picked from commit 16663847e1)