pytorch/torch/utils/_cpp_extension_versioner.py
Peter Goldsborough 6100c0ea14 Introduce ExtensionVersioner for C++ extensions (#11725)
Summary:
Python never closes shared library it `dlopen`s. This means that calling `load` or `load_inline` (i.e. building a JIT C++ extension) with the same C++ extension name twice in the same Python process will never re-load the library, even if the compiled source code and the underlying shared library have changed. The only way to circumvent this is to create a new library and load it under a new module name.

I fix this, of course, by introducing a layer of indirection. Loading a JIT C++ extension now goes through an `ExtensionVersioner`, which hashes the contents of the source files as well as build flags, and if this hash changed, bumps an internal version stored for each module name. A bump in the version will result in the ninja file being edited and a new shared library and effectively a new C++ extension to be compiled. For this the version name is appended as `_v<version>` to the extension name for all versions greater zero.

One caveat is that if you were to update your code many times and always re-load it in the same process, you may end up with quite a lot of shared library objects in your extension's folder under `/tmp`. I imagine this isn't too bad, since extensions are typically small and there isn't really a good way for us to garbage collect old libraries, since we don't know what still has handles to them.

Fixes https://github.com/pytorch/pytorch/issues/11398 CC The controller you requested could not be found.

ezyang gchanan soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11725

Differential Revision: D9948244

Pulled By: goldsborough

fbshipit-source-id: 695bbdc1f1597c5e4306a45cd8ba46f15c941383
2018-09-20 14:43:12 -07:00

55 lines
1.7 KiB
Python

import collections
Entry = collections.namedtuple('Entry', 'version, hash')
def update_hash(seed, value):
# Good old boost::hash_combine
# https://www.boost.org/doc/libs/1_35_0/doc/html/boost/hash_combine_id241013.html
return seed ^ (hash(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2))
def hash_source_files(hash_value, source_files):
for filename in source_files:
with open(filename) as file:
hash_value = update_hash(hash_value, file.read())
return hash_value
def hash_build_arguments(hash_value, build_arguments):
for group in build_arguments:
if group:
for argument in group:
hash_value = update_hash(hash_value, argument)
return hash_value
class ExtensionVersioner(object):
def __init__(self):
self.entries = {}
def get_version(self, name):
entry = self.entries.get(name)
return None if entry is None else entry.version
def bump_version_if_changed(self,
name,
source_files,
build_arguments,
build_directory,
with_cuda):
hash_value = 0
hash_value = hash_source_files(hash_value, source_files)
hash_value = hash_build_arguments(hash_value, build_arguments)
hash_value = update_hash(hash_value, build_directory)
hash_value = update_hash(hash_value, with_cuda)
entry = self.entries.get(name)
if entry is None:
self.entries[name] = entry = Entry(0, hash_value)
elif hash_value != entry.hash:
self.entries[name] = entry = Entry(entry.version + 1, hash_value)
return entry.version