Mirror of https://github.com/zebrajr/pytorch.git, synced 2025-12-07 12:21:27 +01:00
Improves first memory compression on pytorch struct from 0.55 to 0.73. However, it doesn't fully eliminate the autotuning overhead. Any other pointers on where that overhead is coming from would be great.
Edit: I think it's just the Triton cache clearing.
| Name |
|---|
| __init__.py |
| autotuner.py |
| common.py |
| cpp_prefix.h |
| cpp.py |
| triton_conv_delta_x_hwc.j2 |
| triton_conv_delta_x.j2 |
| triton_mm.j2 |
| triton_template.py |
| triton.py |
| wrapper.py |