7 Commits

Author SHA1 Message Date
银河渡舟
4d8090cabb Avoid file encoding issues when loading cpp extensions (#138565)
I've found that when using `torch.utils.cpp_extension.load` on my Windows system, decoding errors occur when my .cpp/.cu files contain certain non-English characters.

`test.py`:
```py
from torch.utils.cpp_extension import load
my_lib = load(name='my_cuda_kernel', sources=['my_cuda_kernel.cu'], extra_cuda_cflags=['-O2', '-std=c++17'])
# ......
```

`my_cuda_kernel.cu`:
```cpp
#include <torch/types.h>
#include <torch/extension.h>
// 向量化 ("vectorization") <-- Chinese characters that trigger the error

// ......
```

Errors will be reported as:
```
Traceback (most recent call last):
  File "E:\test\test.py", line 8, in <module>
    my_lib = load(
                 ^^^^^
  File "C:\Users\XXX\AppData\Roaming\Python\Python311\site-packages\torch\utils\cpp_extension.py", line 1314, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "C:\Users\XXX\AppData\Roaming\Python\Python311\site-packages\torch\utils\cpp_extension.py", line 1680, in _jit_compile
    version = JIT_EXTENSION_VERSIONER.bump_version_if_changed(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\XXX\AppData\Roaming\Python\Python311\site-packages\torch\utils\_cpp_extension_versioner.py", line 46, in bump_version_if_changed
    hash_value = hash_source_files(hash_value, source_files)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\XXX\AppData\Roaming\Python\Python311\site-packages\torch\utils\_cpp_extension_versioner.py", line 17, in hash_source_files
    hash_value = update_hash(hash_value, file.read())
                                         ^^^^^^^^^^^
UnicodeDecodeError: 'gbk' codec can't decode byte 0x96 in position 141: illegal multibyte sequence
```

The root cause is that Python's `open()` decodes files in text mode using a platform-dependent default encoding, so reading fails when a file contains bytes that are invalid in that encoding. PyTorch reads the file contents to compute the extension's version hash:
60c1433041/torch/utils/_cpp_extension_versioner.py (L16-L17)

On my Windows machine the default encoding is `gbk`, but all of my C++/CUDA source files are saved as UTF-8.
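To reproduce the failure on any operating system, the sketch below writes a small UTF-8 file and reads it back in text mode with `encoding="gbk"` passed explicitly. This stands in for the GBK default on the reporter's Windows machine (when no `encoding` is given, `open()` falls back to the locale's preferred encoding). The helper `try_read_text` is hypothetical and only for illustration. The last UTF-8 byte of 化 is `0x96`, the same byte named in the traceback.

```python
import os
import tempfile


def try_read_text(filename, encoding):
    """Hypothetical helper: read in text mode, return None on a decode error."""
    try:
        with open(filename, encoding=encoding) as file:
            return file.read()
    except UnicodeDecodeError as exc:
        print(f"{encoding}: {exc}")
        return None


# A UTF-8 source file containing Chinese characters, like my_cuda_kernel.cu.
with tempfile.NamedTemporaryFile("wb", suffix=".cu", delete=False) as f:
    f.write("// 向量化\n".encode("utf-8"))
    path = f.name

gbk_result = try_read_text(path, "gbk")     # emulates a GBK default: fails
utf8_result = try_read_text(path, "utf-8")  # matches how the file was saved
os.remove(path)
```

Passing `encoding="utf-8"` would also avoid the error, but it assumes every source file is UTF-8; reading bytes makes no assumption at all.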

I think there is a simple solution: open the file in binary mode, so its contents are hashed as raw bytes and never decoded, which sidesteps file-encoding issues entirely. It works perfectly on my computer.

```diff
- with open(filename) as file:
+ with open(filename, 'rb') as file:
    hash_value = update_hash(hash_value, file.read())
```
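A minimal sketch of hashing source files in binary mode, as the fix does. It uses `hashlib.sha256` as a stand-in for the versioner's own hash-combining helper (`update_hash`), so the digest values differ from PyTorch's; only the read mode matters here. One side effect worth knowing: in binary mode, line-ending differences (CRLF vs LF) also change the hash, because text-mode newline translation no longer applies.

```python
import hashlib
import os
import tempfile
from typing import Iterable


def hash_source_files(source_files: Iterable[str]) -> str:
    """Hash raw file bytes. Nothing is decoded, so the platform's default
    text encoding can no longer cause a UnicodeDecodeError here."""
    digest = hashlib.sha256()
    for filename in source_files:
        with open(filename, "rb") as file:  # binary mode, as in the fix
            digest.update(file.read())
    return digest.hexdigest()


# A UTF-8 source file with Chinese characters, like my_cuda_kernel.cu.
with tempfile.NamedTemporaryFile("wb", suffix=".cu", delete=False) as f:
    f.write("// 向量化\n".encode("utf-8"))
    path = f.name

first = hash_source_files([path])
second = hash_source_files([path])  # unchanged file: same hash

with open(path, "ab") as file:
    file.write(b"// edited\n")
third = hash_source_files([path])  # edited file: different hash
os.remove(path)

print(first == second, first != third)  # True True
```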

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138565
Approved by: https://github.com/malfet, https://github.com/janeyx99

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2024-10-28 14:06:34 +00:00
Aaron Orenstein
8db9dfa2d7 Flip default value for mypy disallow_untyped_defs [9/11] (#127846)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127846
Approved by: https://github.com/ezyang
ghstack dependencies: #127842, #127843, #127844, #127845
2024-06-08 18:50:06 +00:00
Aaron Gokaslan
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: removes the need to inherit from object and removes unused future imports.
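Both cleanups are safe on Python 3, which the short sketch below illustrates. The class names are hypothetical, and the list of `__future__` features is the usual Python 2 compatibility set, an assumption about which imports were actually removed. Every class already derives from `object`, and each of these features became mandatory in Python 3.0, so the imports are no-ops.

```python
import __future__
import sys


# Python 2 needed explicit inheritance from object for new-style classes.
class LegacyStyle(object):
    pass


# In Python 3 every class is new-style, so the explicit base can be dropped.
class ModernStyle:
    pass


# Assumed py2-compat imports; each is mandatory from Python 3.0 onward.
FEATURES = ["absolute_import", "division", "print_function", "unicode_literals"]
redundant = [
    name
    for name in FEATURES
    if sys.version_info >= getattr(__future__, name).getMandatoryRelease()
]
print(LegacyStyle.__mro__, ModernStyle.__mro__, redundant)
```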

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
Taylor Robie
07f038aa9d Add option for cpp_extensions to compile standalone executable (#47862)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47862

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25199265

Pulled By: robieta

fbshipit-source-id: eceb04dea60b82eb10434099639fa3afa61000ca
2020-12-01 20:03:08 -08:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
Yangqing Jia
c47f680086 arc lint torch/utils (#13141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13141

This is an example diff to show what lint rules are being applied.

Reviewed By: mingzhe09088

Differential Revision: D10858478

fbshipit-source-id: cbeb013f10f755b0095478adf79366e7cf7836ff
2018-10-25 14:59:03 -07:00
Peter Goldsborough
6100c0ea14 Introduce ExtensionVersioner for C++ extensions (#11725)
Summary:
Python never closes a shared library it `dlopen`s. This means that calling `load` or `load_inline` (i.e. building a JIT C++ extension) with the same C++ extension name twice in the same Python process will never re-load the library, even if the compiled source code and the underlying shared library have changed. The only way to circumvent this is to create a new library and load it under a new module name.

I fix this, of course, by introducing a layer of indirection. Loading a JIT C++ extension now goes through an `ExtensionVersioner`, which hashes the contents of the source files as well as the build flags and, if this hash has changed, bumps an internal version stored for each module name. A version bump causes the ninja file to be rewritten and a new shared library (effectively a new C++ extension) to be compiled. To make this work, `_v<version>` is appended to the extension name for every version greater than zero.
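The versioning scheme described above can be modeled in a few lines. This is a hypothetical, simplified sketch: `SimpleVersioner` and `versioned_name` are illustrative names, sources are passed as bytes rather than file paths, and versions live in an in-memory dict. Only the method name `bump_version_if_changed` is borrowed from the traceback in the newer commit; the real method's signature and storage may differ.

```python
import hashlib
from typing import Dict, Iterable, Tuple


class SimpleVersioner:
    """Hypothetical model: one (hash, version) entry per module name."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, int]] = {}

    def bump_version_if_changed(
        self, name: str, sources: Iterable[bytes], build_flags: Iterable[str]
    ) -> int:
        # Hash source contents and build flags together; a NUL separator
        # keeps e.g. ["-O2", "x"] and ["-O2x"] from hashing identically.
        digest = hashlib.sha256()
        for src in sources:
            digest.update(src + b"\0")
        for flag in build_flags:
            digest.update(flag.encode("utf-8") + b"\0")
        new_hash = digest.hexdigest()

        entry = self._entries.get(name)
        if entry is None:
            version = 0  # first build of this module name
        elif entry[0] == new_hash:
            return entry[1]  # nothing changed: reuse the current version
        else:
            version = entry[1] + 1  # sources or flags changed: bump
        self._entries[name] = (new_hash, version)
        return version


def versioned_name(name: str, version: int) -> str:
    # Version 0 keeps the plain name; later versions get a _v<version> suffix.
    return name if version == 0 else f"{name}_v{version}"


v = SimpleVersioner()
n1 = versioned_name("my_ext", v.bump_version_if_changed("my_ext", [b"int f();"], ["-O2"]))
n2 = versioned_name("my_ext", v.bump_version_if_changed("my_ext", [b"int f();"], ["-O2"]))
n3 = versioned_name("my_ext", v.bump_version_if_changed("my_ext", [b"int g();"], ["-O2"]))
n4 = versioned_name("my_ext", v.bump_version_if_changed("my_ext", [b"int g();"], ["-O3"]))
print(n1, n2, n3, n4)  # my_ext my_ext my_ext_v1 my_ext_v2
```

Because each changed build gets a fresh module name, the process loads a new shared library instead of hitting the one it already `dlopen`ed.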

One caveat is that if you were to update your code many times and always re-load it in the same process, you may end up with quite a lot of shared library objects in your extension's folder under `/tmp`. I imagine this isn't too bad, since extensions are typically small and there isn't really a good way for us to garbage collect old libraries, since we don't know what still has handles to them.

Fixes https://github.com/pytorch/pytorch/issues/11398

CC ezyang gchanan soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11725

Differential Revision: D9948244

Pulled By: goldsborough

fbshipit-source-id: 695bbdc1f1597c5e4306a45cd8ba46f15c941383
2018-09-20 14:43:12 -07:00