Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29421
Inline the graph before writing the bytecode file, so that all the instructions are emitted from the top-level methods.
Test Plan: Imported from OSS
Differential Revision: D18404180
fbshipit-source-id: 4759474a8dba3813616ebce8253bea09941f6bbb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28300
- Remove trivial stringstream from ScriptModuleSerializer::writeCode;
I didn't include this in earlier changes in order to avoid a merge conflict.
- Remove underscore from QualifiedName var ref; no difference in
current use, but more correct.
ghstack-source-id: 92206909
Test Plan:
Benchmark: buck build mode/opt experimental/jeremyl/c2:
Correctness: buck test mode/dev-nosan caffe2/test/...
Differential Revision: D18012511
fbshipit-source-id: 7db057d77741cf69c4f2fed560771c3201da19ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28230
This change improves the pickling small-data benchmark by roughly 30%
(25.8 usec -> 18.05 usec).
One of the main issues was that we were spending 25%+ of the CPU profile
time in std::[o]stringstream constructors alone.
Two main parts:
- Change some std::stringstream to std::ostringstream, when they
showed up on hot-ish paths, and it was trivial to convert them.
Roughly 27% of the std::stringstream constructor time is spent
building the constituent std::basic_istream. If the istream isn't
needed, don't construct it.
- For a couple of very hot paths (e.g. Pickler::pushGlobal), just
convert to traditional string::append(). std::ostringstream is
convenient, but not particularly efficient.
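To make the second bullet concrete, here is a minimal sketch of the difference (not the actual Pickler code; the helper names and the newline-delimited format are illustrative):
```cpp
#include <sstream>
#include <string>

// Hypothetical helper mirroring the kind of formatting done in a hot
// path like Pickler::pushGlobal: join a module path and a class name.
std::string makeGlobalKeyStream(const std::string& module, const std::string& cls) {
  std::ostringstream ss;              // still pays for stream construction
  ss << module << "\n" << cls << "\n";
  return ss.str();
}

std::string makeGlobalKeyAppend(const std::string& module, const std::string& cls) {
  std::string key;
  key.reserve(module.size() + cls.size() + 2);
  key.append(module).append("\n").append(cls).append("\n");  // no stream overhead
  return key;
}
```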
ghstack-source-id: 92153103
Test Plan:
Benchmarking: buck build mode/opt experimental/jeremyl/c2:SerializationBench
Correctness: buck test mode/dev-nosan caffe2/test/...
Differential Revision: D17982181
fbshipit-source-id: 7fd4d267293231244c10c1e5b8f4951a7a3d852f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28180
ScriptModuleSerializer::writeCode() is the only place during torch::save()
serialization where we attempt to zip compress records.
This change avoids compressing these string records if they are
sufficiently small. E.g. in the example I looked at:
- the strings were 123 and 28 bytes, respectively.
- the cost in the compression routines was 16.5% of the torch::save() cost
(we were building a Huffman table for a 28-byte string).
Adding these one-line conditional checks saves time without significantly
affecting space, compared with compressing unconditionally.
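A minimal sketch of the kind of guard described above (not the actual writeCode() code; the threshold constant and the compress callback are hypothetical stand-ins for the archive's codec):
```cpp
#include <functional>
#include <string>

// Hypothetical threshold; the point is simply that tiny records
// (e.g. 28 or 123 bytes) are not worth building a Huffman table for.
constexpr size_t kMinCompressSize = 200;

std::string maybeCompress(
    const std::string& data,
    const std::function<std::string(const std::string&)>& compress) {
  if (data.size() < kMinCompressSize) {
    return data;  // store small records uncompressed
  }
  return compress(data);
}
```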
ghstack-source-id: 92104517
Test Plan:
Benchmark: experimental/jeremyl/c2:SerializationBench
Correctness: buck test mode/dev-nosan caffe2/test/...
Differential Revision: D17967995
fbshipit-source-id: 7ff934388533645dc987e105c814ffe6324f4596
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28129
The previous PR in the stack removed the need to order classes/functions
or have correct import statements. This resolved circular dependency issues
that can arise when class constructors like ModuleList put new instances
of themselves in a common namespace.
This PR changes our export format to no longer produce this information.
By doing so we can make the logic significantly simpler, since we just
keep track of an individual PythonPrint object per file.
Notes:
* PythonPrint was changed to manage its own stream/list of ranges. It
was doing this internally anyway; this just makes the API clearer.
* Since we are changing the serialization format, I also removed op_version_set.
It is now replaced with the VERSION number that is written in the zip archive.
This further simplifies the code emission process.
* A test of op_version_set was removed since there is no longer any behavior
to test.
Test Plan: Imported from OSS
Differential Revision: D17961610
Pulled By: zdevito
fbshipit-source-id: ada362c4ca34d05393a1a7e799c94785ab9d9825
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28039
Right now, torch::save() uses std::ostream, which results in unnecessary
data copies in practice. Similar for torch::load().
Adding a std::function<size_t(const void*, size_t)> as an output option,
parallel to the existing filename and std::ostream APIs, gives users the
flexibility to emit directly to a backing store.
For a simple case of appending the output to a std::string, we observe
significant benchmark savings (on order of -50%), even with the
minor std::function<> dispatch overhead. The main reason is that
std::ostringstream effectively requires 2 extra copies of the data
beyond a simple string.append lambda.
We also provide a parallel API for load(), though this one is
slightly more complex due to the need to do arbitrary position reads.
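For example, writing directly into a std::string via the new callback might look like the following (a hedged sketch based on the description above; treat the exact overload as an assumption):
```cpp
#include <string>
#include <torch/torch.h>

int main() {
  torch::Tensor t = torch::rand({4, 4});

  // Append serialized bytes straight into a std::string via the
  // std::function<size_t(const void*, size_t)> output option.
  std::string buffer;
  torch::save(t, [&buffer](const void* data, size_t size) {
    buffer.append(static_cast<const char*>(data), size);
    return size;
  });

  // buffer now holds the archive bytes with no intermediate ostringstream.
  return buffer.empty() ? 1 : 0;
}
```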
Test Plan:
buck test mode/dev-nosan caffe2/test/...
(Basic serialization test in caffe2/test/cpp/api/serialize.cpp)
Benchmark in experimental/jeremyl/c2/SerializationBench.cpp, with D17823443
(1M time goes from 90ms -> 40ms, albeit with crc patch applied)
Differential Revision: D17939034
fbshipit-source-id: 344cce46f74b6438cb638a8cfbeccf4e1aa882d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26787
A follow-up PR will remove the need to issue import statements
or to write classes in order, since they are no longer needed.
This change allows the same PythonPrint class
to be used for an entire file, which will be needed in that patch.
Test Plan: Imported from OSS
Differential Revision: D17566440
Pulled By: zdevito
fbshipit-source-id: 1ee896da0cdfe6a003298e1d4b0238403b9ed6dd
Summary:
Right now, torch::save() uses std::ostream, which results in unnecessary
data copies in practice. Similar for torch::load().
Adding a std::function<size_t(const void*, size_t)> as an output option,
parallel to the existing filename and std::ostream APIs, gives users the
flexibility to emit directly to a backing store.
For a simple case of appending the output to a std::string, we observe
significant benchmark savings (on order of -50%), even with the
minor std::function<> dispatch overhead. The main reason is that
std::ostringstream effectively requires 2 extra copies of the data
beyond a simple string.append lambda.
We also provide a parallel API for load(), though this one is
slightly more complex due to the need to do arbitrary position reads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27586
Test Plan:
buck test mode/dev-nosan caffe2/test/...
(Basic serialization test in caffe2/test/cpp/api/serialize.cpp)
Benchmark in experimental/jeremyl/c2/SerializationBench.cpp, with D17823443
(1M time goes from 90ms -> 40ms, albeit with crc patch applied)
Differential Revision: D17822962
Pulled By: jjlilley
fbshipit-source-id: d344a7e59707f3b30d42280fbab78f87399e4d10
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26770
This PR adds interface/object serialization as a module attribute, to
allow initializing an object as an interface type during Python
initialization. Because an interface type can be backed by any class object
that implements that interface, if we declare it in
python/module.__init__, we need to collect the runtime types of the
value and serialize them to ensure complete code information.
Test Plan: Imported from OSS
Differential Revision: D17742707
fbshipit-source-id: 7f614ad4f982996d320a0e2dd3515bf47370e730
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27104
* The use case here is to replace prim::ListConstruct, which requires Node, but Node is not available in the mobile lite interpreter.
* In (OPN, X, N), X is the index into the vararg operator-name and operator tables, and N is the number of inputs. For the ListConstruct example, the operator name can be "aten::listconstruct" and the overloaded name is the output type ("int", "float", "bool", "tensor" and "generic").
* A vararg operator table is built with void(int input_size, Stack& stack) functions (see the sketch below).
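To make the last bullet concrete, a sketch of what one entry in such a vararg operator table could look like (illustrative only; the function name, include path, and Stack alias are assumptions, not the actual mobile runtime code):
```cpp
#include <vector>
#include <ATen/core/ivalue.h>  // c10::IValue, c10::List (include path assumed)

using Stack = std::vector<c10::IValue>;

// Illustrative vararg op in the void(int, Stack&) shape described above:
// pop input_size ints off the stack and push a single List[int],
// i.e. the "aten::listconstruct" / "int" overload.
void listconstruct_int(int input_size, Stack& stack) {
  c10::List<int64_t> elems;
  elems.reserve(input_size);
  for (auto it = stack.end() - input_size; it != stack.end(); ++it) {
    elems.push_back(it->toInt());
  }
  stack.erase(stack.end() - input_size, stack.end());
  stack.emplace_back(std::move(elems));
}
```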
## Unit test
LiteInterpreterConv covers OPN instruction and conv operator.
Test Plan: Imported from OSS
Differential Revision: D17762853
fbshipit-source-id: 475aa0c6678e3760cec805862a78510913a89c83
Summary:
Bumping up the `producer_version` in exported ONNX models in view of the next release. Updating tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26976
Reviewed By: hl475
Differential Revision: D17631902
Pulled By: houseroad
fbshipit-source-id: 6d58964657402ac23963c49c07fcc813386aabf0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26758
This PR changes the order in which we import classes and functions so
that it is no longer necessary for them to be defined in order in a file,
or for there to be proper import statements in the exported file.
Actually importing a function/class now is driven by the need to resolve
the entity during unpickling, type resolution, or value resolution.
While this should allow significant simplification to the code that
serializes classes, this work has not been done yet in order to avoid
inevitable forward compat issues in the transition period.
Notes:
* Individual functions have been replaced with a SourceImporter object
that exposes a resolveType method. This method loads the type if
it has not been loaded yet, potentially parsing (but not loading)
the file it exists in if that file hasn't been parsed yet.
* Some legacy functionality needed to be added as a method to this object
since the old format still used some of this logic for class resolution.
Test Plan: Imported from OSS
Differential Revision: D17558989
Pulled By: zdevito
fbshipit-source-id: 7eae3470bcbd388c4de463e3462d527776ed46c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25187
The bytecode export flow: dump the bytecode format for the light weighted interpreter.
* The bytecode is generated without input spec optimization. It would be more generic (input independent) with no obvious performance degradation (to be tested).
* Main API: torch::jit::script::Module::save(filename, extra_files, bool *bytecode_format* = false).
* Both bytecode and module object are exported in pickle format.
* The module object (in data.pkl) is the same as the original JIT model.
* The serializer is dependent on pickle only (no protobuf or JSON).
* The major functionality is forked in ScriptModuleSerializer2::serialize().
* The test loader is test_bc_export.cpp.
* Simple APIs are added in Code and its implementation to get necessary information (instructions, operators and constants).
* Since there's no dependency on graph/node, GetAttr is promoted from an operator to a first-class instruction (https://github.com/pytorch/pytorch/pull/25151) .
* Some definitions (instructions, writeArchive, etc) that are shared by full JIT and bytecode are pulled out of the local namespace (https://github.com/pytorch/pytorch/pull/25148).
The output layout looks like:
* folders of methods.
* In each method folder (for example, forward/):
* bytecode.pkl: instructions and operators
* constants{.pkl,/}: the constant list in constants.pkl. If there are tensors among the constants, their binary tensor files go in the constants/ folder.
* data{.pkl,/}: the module object, with binary tensor files in the data/ folder. The same as in TorchScript.
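A hedged usage sketch of the main API listed above (file names are illustrative, and the three-argument save() overload is taken from this summary rather than a stable public API):
```cpp
#include <string>
#include <unordered_map>
#include <torch/script.h>

int main() {
  // Load an existing TorchScript module and re-save it with the
  // bytecode_format flag described above, producing the lite-interpreter layout.
  torch::jit::script::Module m = torch::jit::load("model.pt");
  std::unordered_map<std::string, std::string> extra_files;  // ExtraFilesMap
  m.save("model_bytecode.pt", extra_files, /*bytecode_format=*/true);
  return 0;
}
```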
Test Plan: Imported from OSS
Differential Revision: D17076411
fbshipit-source-id: 46eb298e7320d1e585b0101effc0fcfd09219046
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25440
See the deleted comments for what this PR is all about.
Test Plan: Imported from OSS
Differential Revision: D17125690
Pulled By: suo
fbshipit-source-id: a4a2f541a3e161f9c15b51df475130e7bf683cf8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24284
This PR finishes the unification of all Tensor types into a single object.
ProfiledTensorType is renamed to TensorType and the old TensorType is
deleted.
Notes:
* Fixes bug in merge for VaryingShape by changing its representation to an
optional list of optional ints.
* Removes ProfiledTensorType::create(type) invocations that can now
simply be expect calls on tensor type.
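For the first note, a minimal sketch of the "optional list of optional ints" representation (the type aliases here are illustrative, not the actual header):
```cpp
#include <cstdint>
#include <vector>
#include <c10/util/Optional.h>  // c10::optional (include path assumed)

// The whole rank may be unknown, and when the rank is known each
// individual dimension may still be unknown.
using Dim = c10::optional<int64_t>;
using VaryingShapeRepr = c10::optional<std::vector<Dim>>;

// Example: a rank-3 shape where only the last dimension is known.
VaryingShapeRepr example() {
  return std::vector<Dim>{c10::nullopt, c10::nullopt, Dim(64)};
}
```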
Test Plan: Imported from OSS
Differential Revision: D16794034
Pulled By: zdevito
fbshipit-source-id: 10362398d0bb166d0d385d74801e95d9b87d9dfc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24282
This moves a test from Python to cpp, and in doing so lets us clean up a
bunch of otherwise unused code.
Test Plan: Imported from OSS
Differential Revision: D16800562
Pulled By: suo
fbshipit-source-id: ebc29bb81f4fb2538081fa309ead1739980f1093
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24281
These are not just classes anymore, rename
Test Plan: Imported from OSS
Differential Revision: D16800564
Pulled By: suo
fbshipit-source-id: 8b8d508944c26a8916fc7642df43f22583dfcf82
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24278
We had a lot of redundant methods. Killing them.
Test Plan: Imported from OSS
Differential Revision: D16800561
Pulled By: suo
fbshipit-source-id: 60acc1d5b0f34130a1f66a1e5bc7df364a5feb57
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23846
This moves a test from Python to cpp, and in doing so lets us clean up a
bunch of otherwise unused code.
Test Plan: Imported from OSS
Differential Revision: D16684390
Pulled By: suo
fbshipit-source-id: fca81ca14d1ac9e4d6b47ae5eecaa42b38d69147
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23845
These are not just classes anymore, rename
Test Plan: Imported from OSS
Differential Revision: D16684391
Pulled By: suo
fbshipit-source-id: af0024c0b7fbcca68785ec3fc6dc288ec46a1b84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23691
We had a lot of redundant methods. Killing them.
Test Plan: Imported from OSS
Differential Revision: D16611883
Pulled By: suo
fbshipit-source-id: a32c0a8b8b7e909b386a70abb0827c26cbd37e20
Summary:
Starting with ONNX IR version 4, the initializers in the ONNX graph no longer have to be inputs of the graph. This constraint existed in IR version 3 and earlier and was relaxed in IR version 4. This PR provides an API-level argument to allow ONNX export with the relaxed constraint of IR version 4, i.e. it provides the option to not include initializers as inputs. This allows backends/runtimes to do certain optimizations, such as constant folding, better.
*Edit*: After discussion with houseroad we have the following behavior. For any OperatorExportType except OperatorExportTypes.ONNX, this PR maintains the current export behavior by default; the user can override it by setting the `keep_initializers_as_inputs` argument to the export API. When exporting to ONNX, i.e. when the OperatorExportType is OperatorExportTypes.ONNX, the default changes so that initializers are NOT part of the inputs. Again, the default can be overridden by setting the `keep_initializers_as_inputs` argument.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23284
Differential Revision: D16459961
Pulled By: bddppq
fbshipit-source-id: b8f0270dfaba47cdb8e04bd4cc2d6294f1cb39cf
Summary:
Adds qtensor-specific fields to the proto file so that they get serialized into model.json
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23356
ghstack-source-id: 87263428
Differential Revision: D16473237
fbshipit-source-id: bf5b51d0863d036d30a1644a3c3b74516468224b
Summary:
Bumping up the producer_version in exported ONNX models in view of the next release. Updating tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23120
Reviewed By: zrphercule
Differential Revision: D16420917
Pulled By: houseroad
fbshipit-source-id: 6686b10523c102e924ecaf96fd3231240b4219a9
Summary:
`pickle` supports this and a lot of the quantized use cases for get/set
state follow this pattern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23119
Pulled By: driazati
Differential Revision: D16391234
fbshipit-source-id: 9f63e0a1679daa61b17aa64b5995e2be23b07b50