Summary:
To be able to get more info on serialization/deserialization events, adding these two files to the metadata logging.
- file_name
- file_size
Test Plan: buck2 test mode/dev caffe2/caffe2/serialize:inline_container_test
Reviewed By: davidberard98
Differential Revision: D51040426
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113077
Approved by: https://github.com/davidberard98
Summary:
Use concurrent multiple readers to access record from different start index. It can provide better performance when the data being accessed is large.
bypass-github-pytorch-ci-checks
Test Plan:
```
buck2 run @//mode/dev //caffe2/caffe2/serialize:inline_container_test
```
Reviewed By: YazhiGao
Differential Revision: D50957607
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112818
Approved by: https://github.com/houseroad, https://github.com/huydhn
Summary:
Zion-4s core has poor perf when it comes to reading the large tensor (e.g. 300G), no matter for manifold downloading or reading from files. In this diff, I changed the getRecord function from single thread to multiple threads by passing multiple readers to getRecord function and access the same record at different chunks with different readers.
We control the number of additional reader with the`sigrid_model_manager_additional_reader` flag. The default value is 0. When `additional_reader=2`, we allocate `2` extra read client threads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111426
Approved by: https://github.com/jiayisuse
For some weird reason, the batch file gets rid of the `exit /b 1` inside the for loop, so failures never actually get surfaced. Add skips for the tests that were failing.
Also don't run the windows cpu build on main since it's in trunk. This is what currently works for the rocm build.
The temp file failure originates from https://github.com/pytorch/pytorch/pull/108508 (got fixed before I merged this PR)
I'm not sure when the ChunkRecordIteratorTest started failing, but it was after the above.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109393
Approved by: https://github.com/malfet
Summary: The new logger allows passing metadata into the api usage logger. The immediate use case is to pass the serialization_id to the save and load events to be enable tracking serialized models in API events. It could be extended to add more metadata in the future.
Test Plan:
```
buck2 test @//mode/dev //caffe2/caffe2/serialize:inline_container_test
```
Reviewed By: davidberard98
Differential Revision: D45683697
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101762
Approved by: https://github.com/davidberard98
Summary:
serialization_id was added in a previous change to be written as a random GUID associated with each time saving of a module is called, for the purpose of adding tracking for saved artifacts. In order not to disturb existing systems that rely on the serialized bytes to be deterministic for serializing the same module, this change uses the combined hash of uncompressed content and file names instead of GUID for serialization id.
The use of this hashing reuses the same CRC32 that is already calculated for zip writing, so it doesn't incur additional computational overhead.
Data descriptor is one of the file headers inside the zip format https://en.wikipedia.org/wiki/ZIP_(file_format)#Data_descriptor. It contains the CRC32 of the uncompressed data. By inspecting the written data in PyTorchStreamWriter, the CRC32 is found for each written record.
In order to make serialization_id a unique and deterministic id for the
serialized files without computation overhead, the updated `serialization_id` is computed based on all files written, and is composed of:
1) a combined hash of record name hashes
2) a combined crc32 of the record uncompressed data
Example value: "15656915541136177431866432772"
Test Plan: buck2 test @//mode/dev //caffe2/caffe2/serialize:inline_container_test
Differential Revision: D46038973
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101964
Approved by: https://github.com/davidberard98
Summary:
In order to better track models after serialization, this change writes a serialization_id as a UUID to inline container. Having this ID enables traceability of model in saving and loading events.
serialization_id is generated as a new UUID everytime serialization takes place. It can be thought of as a model snapshot identifier at the time of serialization.
Test Plan:
```
buck2 test @//mode/dev //caffe2/caffe2/serialize:inline_container_test
```
Local tests:
```
buck2 run @//mode/opt //scripts/atannous:example_pytorch_package
buck2 run @//mode/opt //scripts/atannous:example_pytorch
buck2 run @//mode/opt //scripts/atannous:example_pytorch_script
```
```
$ unzip -l output.pt
Archive: output.pt
Length Date Time Name
--------- ---------- ----- ----
36 00-00-1980 00:00 output/.data/serialization_id
358 00-00-1980 00:00 output/extra/producer_info.json
58 00-00-1980 00:00 output/data.pkl
261 00-00-1980 00:00 output/code/__torch__.py
326 00-00-1980 00:00 output/code/__torch__.py.debug_pkl
4 00-00-1980 00:00 output/constants.pkl
2 00-00-1980 00:00 output/version
--------- -------
1045 7 files
```
```
unzip -p output.pt "output/.data/serialization_id"
a9f903df-cbf6-40e3-8068-68086167ec60
```
Differential Revision: D45683657
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100994
Approved by: https://github.com/davidberard98
Number of OSS PR were reverted, because new signed-unsigned comparison warnings, which are treated as errors in some internal builds.
Not sure how those selective rules are applied, but this PR removes `-Wno-sign-compare` from PyTorch codebase.
The only tricky part in this PR, as making sure that non-ASCII character detection works for both signed and unsigned chars here:
6e3d51b08a/torch/csrc/jit/serialization/python_print.cpp (L926)
Exclude several files from sign-compare if flash attention is used, due to the violation in cutlass, to be fixed by https://github.com/NVIDIA/cutlass/pull/869
Do not try to fix sign compare violations in caffe2 codebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96723
Approved by: https://github.com/albanD
This PR do two things:
1. It moves some Windows warning suppression from various CMake files into the main CMakeList.txt, following the conventions of gcc and clang.
2. It fixes some Windows warnings in the source code. Most importantly, it fixes lots of dll warnings by adjusting C10_API to TORCH_API or TORCH_PYTHON_API. There are still some dll warnings because some TORCH_API functions are actually built as part of libtorch_python
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94927
Approved by: https://github.com/malfet
Avoid double exception in destructor if attempting to serialize to
python object that does not have `write` method
Use `Finalizer` class in `PyTorchStreamWriter::writeEndOfFile()` to a
always set `finailized_` property even if excretion occurs. (as there
isn't much one can do at this point)
Add expicit check for the attribue to `_open_zipfile_writer_buffer` and
add unitests
Modernize code a bit by using Python-3 `super()` method
Fixes https://github.com/pytorch/pytorch/issues/87997
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88128
Approved by: https://github.com/albanD
Hi!
I was playing with libfuzzer and found bug when loading a model from file via `torch::jit::load` function.
There is an unhandled exception in caffe2/serialize when calling a `stoull` function on unsanitized version string.
The bug can be reproduced with `aot_model_compiler` binary:
```
aot_model_compiler --model=crash-stoull --model_name=name --model_version=1 --input_dims='1,3,224,224;2,2' --input_types='float;float'
```
Crash file is provided in [crash.zip](https://github.com/pytorch/pytorch/files/8701504/crash.zip).
gdb output:
```
Temporary breakpoint 1, main (argc=6, argv=0x7ffcd160f9f8) at /pytorch_master/binaries/aot_model_compiler.cc:87
87 "Run NNC AOT compiler for pytorch model. Example usage:\n"
(gdb) c
Continuing.
terminate called after throwing an instance of 'std::invalid_argument'
what(): stoull
Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007fa637f16859 in __GI_abort () at abort.c:79
#2 0x00007fa6381c1911 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007fa6381cd38c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007fa6381cd3f7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007fa6381cd6a9 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007fa6381c42ce in std::__throw_invalid_argument(char const*) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x000000000247d567 in __gnu_cxx::__stoa<unsigned long long, unsigned long long, char, int> (__str=0x7ffcd160f228 "ZZ", __idx=0x0, __base=10, __convf=<optimized out>, __name=<optimized out>)
at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/ext/string_conversions.h:83
#8 std::__cxx11::stoull (__str="ZZ", __idx=0x0, __base=10) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/basic_string.h:6577
#9 caffe2::serialize::PyTorchStreamReader::init (this=this@entry=0x8c11ce0) at /pytorch_master/caffe2/serialize/inline_container.cc:145
#10 0x000000000247d9c7 in caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader (this=0x8c11ce0, in=std::shared_ptr<class caffe2::serialize::ReadAdapterInterface> (empty) = {...})
at /pytorch_master/caffe2/serialize/inline_container.cc:88
#11 0x00000000035b7ba4 in __gnu_cxx::new_allocator<caffe2::serialize::PyTorchStreamReader>::construct<caffe2::serialize::PyTorchStreamReader, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (
__p=0x2, __args=..., this=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/ext/new_allocator.h:150
#12 std::allocator_traits<std::allocator<caffe2::serialize::PyTorchStreamReader> >::construct<caffe2::serialize::PyTorchStreamReader, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (__a=...,
__p=0x2, __p@entry=0x8c11ce0, __args=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/alloc_traits.h:512
#13 0x00000000035b1988 in std::_Sp_counted_ptr_inplace<caffe2::serialize::PyTorchStreamReader, std::allocator<caffe2::serialize::PyTorchStreamReader>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (this=0x8c11cd0, __a=..., __args=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:551
#14 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<caffe2::serialize::PyTorchStreamReader, std::allocator<caffe2::serialize::PyTorchStreamReader>, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (this=0x7ffcd160f3a8, __p=@0x7ffcd160f3a0: 0x10, __args=..., __a=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:683
#15 std::__shared_ptr<caffe2::serialize::PyTorchStreamReader, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<caffe2::serialize::PyTorchStreamReader>, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (this=0x7ffcd160f3a0, __args=..., __tag=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:1371
#16 std::shared_ptr<caffe2::serialize::PyTorchStreamReader>::shared_ptr<std::allocator<caffe2::serialize::PyTorchStreamReader>, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (this=0x7ffcd160f3a0,
__args=..., __tag=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr.h:408
#17 std::allocate_shared<caffe2::serialize::PyTorchStreamReader, std::allocator<caffe2::serialize::PyTorchStreamReader>, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (__args=..., __a=...)
at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr.h:859
#18 std::make_shared<caffe2::serialize::PyTorchStreamReader, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (__args=...)
at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr.h:875
#19 torch::jit::load (rai=std::shared_ptr<class caffe2::serialize::ReadAdapterInterface> (empty) = {...}, device=device@entry=..., Python Exception <class 'gdb.error'> No type named std::__detail::_Hash_node<struct std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, true>.:
extra_files=std::unordered_map with 0 elements)
at /pytorch_master/torch/csrc/jit/serialization/import.cpp:474
#20 0x00000000035b1ef6 in torch::jit::load (filename="crash-stoull", device=device@entry=..., Python Exception <class 'gdb.error'> No type named std::__detail::_Hash_node<struct std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, true>.:
extra_files=std::unordered_map with 0 elements) at /pytorch_master/torch/csrc/jit/serialization/import.cpp:444
#21 0x00000000035b1d22 in torch::jit::load (filename="", device=device@entry=...) at /pytorch_master/torch/csrc/jit/serialization/import.cpp:424
#22 0x00000000008f9be3 in main (argc=1, argv=0x7ffcd160f9f8) at /pytorch_master/binaries/aot_model_compiler.cc:128
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77557
Approved by: https://github.com/Gamrix
Summary:
Remove code dup in import.cpp / export_modules.cpp such that
1. Only one copy of switching logic (detect flatbuffer / is_flatbuffer);
2. Move detection of includeness of flatbuffer to runtime (so no more macros)
This also reverts the dependency of import.cpp -> flatbuffer_loader.cpp to flatbuffer_loader.cpp -> import.cpp.
Differential Revision: D36926217
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79184
Approved by: https://github.com/zhxchen17
Fixes#75177
Couldn't find any utility method to get directory name in pytorch repo, hence creating a function for that.
Let me know if a new function is not needed.
I also referred [this](https://github.com/pytorch/pytorch/blob/master/c10/test/util/tempfile_test.cpp#L15) for directory check.
Also I am using TORCH_CHECK to show the error. This is highly verbose with the entire stack visible. Is there any alternative for the same so that it is easier to read? This could happen a frequently, so small and concise error would be more helpful here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75681
Approved by: https://github.com/albanD
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75201
In this diff:
1. Bump supported version to 9, which will serve as a placeholder for upcoming version bump to v9 for flatbuffer format migration.
2. Implements backport from v9 flatbuffer file to v8 pickle file.
ghstack-source-id: 153225189
(Note: this ignores all push blocking failures!)
Test Plan:
fb:
```
cd ~/fbsource/fbcode/ && buck test -c fbcode.caffe2_enable_flatbuffer=1 caffe2/test/cpp/jit:jit -- LiteInterpreterTest.BackPortByteCodeModelAllVersions
Parsing buck files: finished in 0.7 sec
Downloaded 0/25 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 20.7 sec (100%) 21783/21783 jobs, 5/21783 updated
cd ~/fbsource/fbcode/ && buck test caffe2/test/cpp/jit:jit -- FlatbufferTest.FlatbufferBackPortTest
Parsing buck files: finished in 0.7 sec
Building: finished in 4.5 sec (100%) 12972/53298 jobs, 0/53298 updated
Total time: 5.3 sec
More details at https://www.internalfb.com/intern/buck/build/b658d597-d358-4293-97cb-28e7612b96e8
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 35d5542d-6ee3-4c28-be10-1d822c7a6fef
Trace available for this run at /tmp/tpx-20220308-090347.891303-35d5542d-6ee3-4c28-be10-1d822c7a6fef/trace.log
RemoteExecution session id: reSessionID-35d5542d-6ee3-4c28-be10-1d822c7a6fef-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000
✓ ListingSuccess: caffe2/test/cpp/jit:jit : 490 tests discovered (22.838)
✓ Pass: caffe2/test/cpp/jit:jit - FlatbufferTest.FlatbufferBackPortTest (0.289)
Summary
Pass: 1
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8444249379196000
```
Reviewed By: iseeyuan
Differential Revision: D34702597
fbshipit-source-id: 5c203c29d13360d7934ce6e57557739e7038c05e
(cherry picked from commit 6189e08a2bd968fdab636f77cb6bd73d6c36beb2)
Summary:
Explicitly state that users should upgrade PyTorch to mitigate issues of loading TS module that's outside of support window
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74228
Reviewed By: tugsbayasgalan
Differential Revision: D34887538
Pulled By: gmagogsfm
fbshipit-source-id: 7ebeb5ee5f5b2f388f8f8bb72b8eb12eadd7a613
(cherry picked from commit c584df2fc80e70b28e9d6008c84295305e4e19b6)
This patch enables Pytorch build from source with Ninja and
'Visual Studio 16 2019' CMake generator on Windows on Arm.
Tests:
- Build from source: 'python setup.py develop'.
- Run simple Pytorch example: passed
- python test\test_torch.py:
-- same results as on x64
-- Ran 1344 tests, failures=2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72424
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73097
As title
ghstack-source-id: 149500912
Test Plan: CI
Reviewed By: pavithranrao
Differential Revision: D34347005
fbshipit-source-id: 76f96c627983a81fa02701ab174d35cb9c891628
(cherry picked from commit 857de08b31)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71662
backport v8 to v7 to support promoted ops as instruction
a flag to help export as instruction from v8 and export as operators for v7 and below
Test Plan:
```
buck test caffe2/test/cpp/jit:jit -- LiteInterpreterTest.BackPortByteCodeModelAllVersions
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5629499620570927
✓ ListingSuccess: caffe2/test/cpp/jit:jit : 461 tests discovered (15.693)
✓ Pass: caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions (2.712)
Summary
Pass: 1
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5629499620570927
```
```
buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_codegen
buck test mode/opt //caffe2/test:upgrader_codegen -- mobile.test_upgrader_codegen.TestLiteScriptModule
Parsing buck files: finished in 0.8 sec
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 01:39.4 min (100%) 11031/11031 jobs, 2/11031 updated
Total time: 01:40.2 min
More details at https://www.internalfb.com/intern/buck/build/a8b0e417-019c-44ba-be6b-23379411a965
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 44fbfa66-cce8-4277-82ac-f89d79558581
Trace available for this run at /tmp/tpx-20220202-160956.915412/trace.log
RemoteExecution session id: reSessionID-44fbfa66-cce8-4277-82ac-f89d79558581-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/281475200877601
✓ ListingSuccess: caffe2/test:upgrader_codegen : 1 tests discovered (1.249)
✓ Pass: caffe2/test:upgrader_codegen - test_generate_bytecode (mobile.test_upgrader_codegen.TestLiteScriptModule) (1.365)
Summary
Pass: 1
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/281475200877601
```
Reviewed By: iseeyuan
Differential Revision: D33719098
fbshipit-source-id: e2d2b23d298f98e4d4fcdfc344f7b8c6f92cff26
(cherry picked from commit 81b956c23a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71486
This PR adds upgraders for linspace and linspace.out as the optional step size will be deprecated soon. Old models will be using steps size of 100 when nothing is provided.
Test Plan: buck-out/gen/caffe2/test/jit#binary.par -r TestUpgraders.test_aten_linspace
Reviewed By: cccclai, mruberry
Differential Revision: D33654308
fbshipit-source-id: 0e0138091da0b11d4f49156eeb6bcd7e46102a5b
(cherry picked from commit 931ae4af32)
Summary: Reland for D33282878 (911d527b87) . Land backend change first to maintain FC. Will wait for 2 weeks after this diff is in. And than land the front-end change in next diff.
Test Plan:
test in next diff
time buck test mode/dev-nosan fblearner/flow/projects/langtech/translation:tests -- test_e2e_base_training
Reviewed By: gmagogsfm
Differential Revision: D33342547
fbshipit-source-id: b3dee9a4bdfd78103848c12629e5fccafdd621e3
(cherry picked from commit ae1935f1af)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71461
After operator versioning work, the version in model file is used for operator versioning, while bytecode_version is used for bytecode versioning (for bytecode schema). They are two seperate things now and this comparison is not needed.
ghstack-source-id: 147209286
Test Plan: CI
Reviewed By: iseeyuan, tugsbayasgalan
Differential Revision: D33648592
fbshipit-source-id: beaa136a728f88435176a00c07b2d521210f107f
(cherry picked from commit e90e650e1a)