Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66742
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, along with a number of reversions and unused-variable warning suppressions applied by hand.
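For illustration, a minimal before/after sketch of the transformation (a hypothetical function, not taken from the diff itself; `irange` is `c10::irange`):
```
#include <c10/util/irange.h>
#include <cstdint>

// Hypothetical example function.
void Scale(float* data, int64_t n, float alpha) {
  // Before: for (int64_t i = 0; i < n; i++) { data[i] *= alpha; }
  // After: the index is const and its type is deduced from n.
  for (const auto i : c10::irange(n)) {
    data[i] *= alpha;
  }
}
```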
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D31705366
fbshipit-source-id: be58222426c192406a7f93c21582c3f6f2082401
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, along with a number of reversions and unused-variable warning suppressions applied by hand.
bypass_size_limit
allow-large-files
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D30652629
fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61551
We aim to enable a rate limiter in C2 load, with a fixed bandwidth limit.
This diff updates LoadOp to pass down the Manifold DB options.
Test Plan:
```
buck test mode/opt caffe2/caffe2/python/operator_test:load_save_test
```
Differential Revision: D29639102
fbshipit-source-id: cf69549adadf4c7f12a8a2b7f3ca39092cab4b99
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
When loading optional blobs from a large file into the workspace, for instance https://fburl.com/diffusion/l0mcnofg, we currently load the file multiple times. https://fburl.com/diffusion/qhbpyq0e
This diff optimizes the load time by loading the large model file only once, passing the allow_incomplete arg to LoadOp. The implementation of LoadOp with this arg previously did not delete the blobs that were not found, which is also fixed in this diff.
Test Plan:
Existing unit tests:
```
buck test //caffe2/caffe2/fb/distribute/tests:meta_net_def_storage_utils_test
```
Many sandcastle integration tests.
scuba logs: https://fburl.com/scuba/dai_modelstore/txdf3pjt
Reviewed By: TailofJune
Differential Revision: D27575622
fbshipit-source-id: 7c2b25ef603a378e87ebdbe349c94c2f1952493c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55935
Add a new `DB::SetOptions()` method to allow passing options to the DB as part
of Save operations. This can be used for passing in options to control the
serialization behavior, such as rate limits or other parameters. The
serialization options are passed as an opaque string, so that different DB
implementations may choose their own options and options format.
This also adds a new `db_options` parameter to the `Save` operator.
This allows users to pass in the DB options when saving data.
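As a rough sketch of what a DB implementation might do with the opaque options string (standalone sketch only; the "key=value;..." encoding below is an assumption, not any DB's actual format):
```
#include <string>
#include <unordered_map>

// Sketch; the real method would be an override on caffe2::db::DB.
class SketchDB {
 public:
  void SetOptions(const std::string& options) {
    size_t pos = 0;
    while (pos < options.size()) {
      const size_t eq = options.find('=', pos);
      size_t end = options.find(';', pos);
      if (end == std::string::npos) {
        end = options.size();
      }
      if (eq != std::string::npos && eq < end) {
        // e.g. "rate_limit_bytes=1048576" -> a bandwidth cap.
        opts_[options.substr(pos, eq - pos)] =
            options.substr(eq + 1, end - eq - 1);
      }
      pos = end + 1;
    }
  }

 private:
  std::unordered_map<std::string, std::string> opts_;
};
```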
ghstack-source-id: 126589771
Test Plan:
I don't have any tests in this diff since no DB implements options yet. The
next diff in the stack includes an options implementation, along with unit
tests that verify the options are passed in correctly.
Differential Revision: D27729461
fbshipit-source-id: 4d03250c389c66a049cdee1d05e082f5649ac0f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53402
Add an `options` field to the `Save` operator which accepts options for how to
serialize different blobs. At the moment this simply allows controlling the
existing `chunk_size` behavior, but in the future we can add other options,
such as the ability to control compression settings or other serialization
formats.
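As a rough sketch of what per-blob options enable (the field names blob_name_regex and chunk_size are assumptions for this sketch, not necessarily the exact schema):
```
#include <cstdint>
#include <regex>
#include <string>
#include <vector>

// Illustrative only: one entry per blob-name pattern.
struct BlobOptionsSketch {
  std::string blob_name_regex; // which blobs this entry applies to
  int64_t chunk_size = 0;      // 0 = fall back to the operator-wide default
};

int64_t ChunkSizeFor(
    const std::string& blob_name,
    const std::vector<BlobOptionsSketch>& options,
    int64_t default_chunk_size) {
  for (const auto& opt : options) {
    if (std::regex_match(blob_name, std::regex(opt.blob_name_regex))) {
      return opt.chunk_size > 0 ? opt.chunk_size : default_chunk_size;
    }
  }
  return default_chunk_size;
}
```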
ghstack-source-id: 123567034
Test Plan:
Added a new test to `load_save_test.py` that passes in options and verifies
that blobs were serialized with the expected number of chunks.
```
buck test caffe2/caffe2:caffe2_test_cpu \
  caffe2/caffe2/core:serialization_test \
  caffe2/caffe2/python/operator_test:load_save_test
```
Reviewed By: mraway
Differential Revision: D26502577
fbshipit-source-id: 6e302e530bb96990517c2e35c505db7f14a56284
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53434
Use `snprintf()` to avoid buffer overflows.
Also only throw an exception on error, instead of crashing the entire
application. A failure can occur if the caller supplies an invalid format
string.
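A minimal sketch of the approach, assuming a printf-style helper (not the exact caffe2 function): size the buffer from vsnprintf's return value and throw, rather than abort, on an invalid format string.
```
#include <cstdarg>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

std::string FormatString(const char* fmt, ...) {
  va_list args;
  va_start(args, fmt);
  // First pass: compute the required length; negative means a bad format.
  const int needed = vsnprintf(nullptr, 0, fmt, args);
  va_end(args);
  if (needed < 0) {
    throw std::runtime_error("invalid format string");
  }
  // Second pass: format into an exactly-sized buffer.
  std::vector<char> buf(static_cast<size_t>(needed) + 1);
  va_start(args, fmt);
  vsnprintf(buf.data(), buf.size(), fmt, args);
  va_end(args);
  return std::string(buf.data(), static_cast<size_t>(needed));
}
```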
ghstack-source-id: 123401582
Test Plan:
Ran the checkpoint tests:
buck test caffe2/caffe2/python/operator_test:checkpoint_test
Verified that the checkpoint file names logged in the output are the same
before and after this change.
I also manually changed the initial buffer size to 1 to confirm that the code works when the initial buffer is too small. I considered updating the checkpoint_test.py code to test with long db names that would exceed this limit, but long filenames would likely cause other problems on some platforms (Windows had a maximum path length of 260 characters until fairly recent releases).
Differential Revision: D26863355
fbshipit-source-id: 8fc24faa2a8dd145471067718d323fdc8ce055d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53298
This is a re-land of D26641600 (3969391c07), but with the `SaveOpImpl` class marked as
`TORCH_API` to ensure that its symbols get exported properly in shared library
builds.
This moves the `SaveOp` code from `load_save_op.h` to `load_save_op.cc`.
Previously this implementation was all in the templatized `SaveOp` class, even though most of the logic didn't depend on the template parameters. Keeping this code in the header file slows down the build and forces more files than necessary to be rebuilt whenever the SaveOp code changes. Keeping it in a template class also makes the generated code larger than needed, since separate copies get instantiated for each context type even though they aren't required.
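A condensed sketch of the resulting header pattern (signatures approximated from this summary; see the actual load_save_op.h/.cc):
```
#include "caffe2/core/operator.h"

namespace caffe2 {

// Non-template implementation, defined once in load_save_op.cc. TORCH_API
// exports its symbols so shared-library builds can link against them.
class TORCH_API SaveOpImpl {
 public:
  SaveOpImpl(OperatorBase* op, const OperatorDef& def, Workspace* ws);
  bool RunOnDevice();
};

// The template that remains in the header is now a thin wrapper, so changing
// the save logic no longer forces rebuilds of every including file.
template <class Context>
class SaveOp final : public Operator<Context> {
 public:
  SaveOp(const OperatorDef& def, Workspace* ws)
      : Operator<Context>(def, ws), impl_(this, def, ws) {}
  bool RunOnDevice() override { return impl_.RunOnDevice(); }

 private:
  SaveOpImpl impl_;
};

} // namespace caffe2
```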
ghstack-source-id: 123146018
Test Plan:
buck test //caffe2/caffe2/python/operator_test:load_save_test
Also tested performing the CMake-based build using shared libraries with CUDA
enabled, and confirmed that the build succeeded.
Reviewed By: mraway
Differential Revision: D26802576
fbshipit-source-id: fc2dbdc1cd20680b082c887366a6305d86688138
Summary:
Move the `SaveOp` code from `load_save_op.h` to `load_save_op.cc`.
Previously this implementation was all in the templatized `SaveOp` class, even though most of the logic didn't depend on the template parameters. Keeping this code in the header file slows down the build and forces more files than necessary to be rebuilt whenever the SaveOp code changes. Keeping it in a template class also makes the generated code larger than needed, since separate copies get instantiated for each context type even though they aren't required.
Test Plan: buck test //caffe2/caffe2/python/operator_test:load_save_test
Reviewed By: mraway
Differential Revision: D26641600
fbshipit-source-id: 84ebe8164ffac1e4a691be41147f0c5d8e890e09
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29888
Extract some common functions out of class LoadOp.
Reviewed By: yinghai, ipiszy
Differential Revision: D18456785
fbshipit-source-id: d0b8e86ad5709c35f1dc3821376000db1114dc95
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22829
Sending out the caffe2 load op changes separately, since we want to pick them to open source.
This change is needed because the shape information of the blobs is determined from the load operator, and that shape information is needed in our download_group.
Reviewed By: boryiingsu
Differential Revision: D16229465
fbshipit-source-id: f78b2df9a7f26968d70eca68dde75cd11ab6f7a2
Summary:
When output blob names are specified while load_all=1, they are ignored. However, this behavior is not documented. In this diff, we simply disallow providing output blob names when load_all=1 (sketch of the check below).
See discussion at https://fb.workplace.com/groups/1405155842844877/permalink/2714909788536136/
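A minimal sketch of the check (names are placeholders; the actual operator code would use CAFFE_ENFORCE):
```
#include <cstddef>
#include <stdexcept>

// Reject explicit output blob names when load_all=1 instead of silently
// ignoring them.
void ValidateLoadArgs(bool load_all, std::size_t num_output_names) {
  if (load_all && num_output_names != 0) {
    throw std::runtime_error(
        "Load: do not pass output blob names when load_all=1; "
        "they would be ignored.");
  }
}
```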
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19133
Reviewed By: dzhulgakov
Differential Revision: D14883698
Pulled By: chandlerzuo
fbshipit-source-id: 6e4171e36c4ccc4f857e79da98b858a06b7d8ad6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19042
show the model saving step in the log.
Reviewed By: kennyhorror
Differential Revision: D14809385
fbshipit-source-id: c7a1e50ff92bb45b16b1c501d9325b304b07fbd3
Summary:
CreateDB actually returns nullptr when the db type is unknown, and throws when the file is missing.
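For callers, that means a pattern like this sketch (caffe2::db API used loosely):
```
#include <memory>
#include <string>

#include "caffe2/core/db.h"
#include "caffe2/core/logging.h"

// An unknown db_type yields nullptr (so check it), while a missing file
// throws from inside CreateDB itself.
std::unique_ptr<caffe2::db::DB> OpenForRead(
    const std::string& db_type, const std::string& source) {
  std::unique_ptr<caffe2::db::DB> db(
      caffe2::db::CreateDB(db_type, source, caffe2::db::READ));
  CAFFE_ENFORCE(db, "Unknown db type: ", db_type);
  return db;
}
```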
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17795
Reviewed By: ezyang
Differential Revision: D14383226
Pulled By: dzhulgakov
fbshipit-source-id: 1dcf75a6b4ba8b64a24d4e5daf02db3189d56b7b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12949
Currently the default chunk size in the save operation is 1 MB, and there is no way to configure it at runtime. This adds a parameter to configure the chunk size in SaveOp.
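A hedged usage sketch, setting the new argument on a Save OperatorDef directly (the argument name chunk_size comes from this summary; the rest is illustrative):
```
#include "caffe2/proto/caffe2_pb.h"

// Build a Save op definition with an explicit chunk size (1 MB shown,
// matching the previous hard-coded default). Blob name "w" is a placeholder.
caffe2::OperatorDef MakeSaveDef() {
  caffe2::OperatorDef def;
  def.set_type("Save");
  def.add_input("w");
  auto* arg = def.add_arg();
  arg->set_name("chunk_size");
  arg->set_i(1 << 20);
  return def;
}
```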
Reviewed By: mraway, xsh6528
Differential Revision: D10454037
fbshipit-source-id: a5cd8f9846aea4b1e3612a3fcfa431b68bda8104
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11817
Blob::Serialize() and Blob::Deserialize() are now the free functions SerializeBlob() and DeserializeBlob().
This takes away access to Blob internals from them and makes future refactorings easier.
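Call-shape sketch of the change (the blob must hold a serializable type at runtime; error handling omitted):
```
#include <string>

#include "caffe2/core/blob.h"
#include "caffe2/core/blob_serialization.h"

// Before: std::string data = blob.Serialize("my_blob");
//         out->Deserialize(data);
// After, with the free functions:
void RoundTrip(const caffe2::Blob& blob, caffe2::Blob* out) {
  std::string data = caffe2::SerializeBlob(blob, "my_blob");
  caffe2::DeserializeBlob(data, out);
}
```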
Reviewed By: ezyang
Differential Revision: D9882726
fbshipit-source-id: 3251ebd4b53fc12f5e6924a6e4a8db3846ab3729
Summary:
Breaking out of #8338
This PR is a workaround for a bug with CUDA 9.2 + GCC 7. Here is the error it fixes:
```
.../pytorch/caffe2/operators/elementwise_ops.h: In constructor ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>::BinaryElementwiseWithArgsOp(const caffe2::OperatorDef&, caffe2::Workspace*)’:
.../pytorch/caffe2/operators/elementwise_ops.h:106:189: error: ‘GetSingleArgument<bool>’ is not a member of ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>’
BinaryElementwiseWithArgsOp(const OperatorDef& operator_def, Workspace* ws)
```
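For context, a generic illustration of the dependent-name lookup issue behind this error, with the usual qualification fix (an illustrative sketch, not necessarily the exact change in this PR):
```
#include <string>

template <typename T>
struct Base {
  template <typename U>
  U GetSingleArgument(const std::string& /*name*/, U default_value) {
    return default_value;
  }
};

template <typename T>
struct Derived : Base<T> {
  bool Broadcast() {
    // Unqualified lookup fails: the member lives in a dependent base, and
    // the explicit template argument list also needs the `template` keyword.
    // return GetSingleArgument<bool>("broadcast", false);   // error
    return this->template GetSingleArgument<bool>("broadcast", false);
  }
};
```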
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10510
Reviewed By: orionr
Differential Revision: D9319742
Pulled By: mingzhe09088
fbshipit-source-id: ce59e3db14539f071f3c20301e77ca36a6fc3f81
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13
Pull Request resolved: https://github.com/pytorch/translate/pull/166
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125
Closes https://github.com/pytorch/pytorch/pull/9125
Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later
Before this change, the Caffe2 Tensor class was fixed at compile time to bind to a particular device/context. With this change, we make it a runtime property (stored inside the tensor) while preserving the same semantics. For example, one still has to specify a device type in order to create a Tensor; there are no uninitialized tensors. More specifically, the changes are:
1. We added an extra *DeviceType* argument to most of the Tensor constructors, e.g. Tensor(DeviceType type).
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in to let us call the templated Copy function. Previously it could be in a different context than the source and target; now we enforce that, if provided, it has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter Blob::GetMutableTensor that verifies both that the Blob contains a Tensor and that it is of the correct type.
4. Tensor is no longer default-constructible (as we don't have unknown-device tensors), so some of the code handling STL containers needs to change.
Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.
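A sketch of the new calling convention from points 1 and 3 above, using the approximate API of that era (for illustration only):
```
#include "caffe2/core/blob.h"
#include "caffe2/core/tensor.h"

void Example(caffe2::Blob* blob) {
  // Before: Tensor<CPUContext> t;  (device bound at compile time)
  caffe2::Tensor t(caffe2::CPU); // device type is now a runtime argument

  // Get-or-construct a CPU tensor inside a Blob, with a type check.
  caffe2::Tensor* bt = blob->GetMutableTensor(caffe2::CPU);
  (void)t;
  (void)bt;
}
```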
Reviewed By: ezyang, houseroad
Differential Revision: D9024330
fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13
Pull Request resolved: https://github.com/pytorch/translate/pull/166
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125
Closes https://github.com/pytorch/pytorch/pull/9125
Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later
Before this change, the Caffe2 Tensor class was fixed at compile time to bind to a particular device/context. With this change, we make it a runtime property (stored inside the tensor) while preserving the same semantics. For example, one still has to specify a device type in order to create a Tensor; there are no uninitialized tensors. More specifically, the changes are:
1. We added an extra *DeviceType* argument to most of the Tensor constructors, e.g. Tensor(DeviceType type).
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in to let us call the templated Copy function. Previously it could be in a different context than the source and target; now we enforce that, if provided, it has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter Blob::GetMutableTensor that verifies both that the Blob contains a Tensor and that it is of the correct type.
4. Tensor is no longer default-constructible (as we don't have unknown-device tensors), so some of the code handling STL containers needs to change.
Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.
Reviewed By: xw285cornell
Differential Revision: D8121878
fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
Summary: The current Load op can only load blobs from one file. This makes the Load op support loading blobs from a list of dbs.
Reviewed By: boryiingsu
Differential Revision: D6596034
fbshipit-source-id: 906fa48b0ad61c83e247d497b6b079c04fed499f
Summary: Added functionality that allows users to store huge blobs of any type, not only Tensors. The blob has to be divided into chunks in the same way as a Tensor blob.
Reviewed By: kennyhorror
Differential Revision: D5432762
fbshipit-source-id: c171faacd99d209bfae6f9707ebde7c4e23ba3b9
Summary: The DBExists function was factored out of the DBExistsOp.
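Usage sketch, assuming the factored-out helper takes a db type and the full db path:
```
#include <string>

#include "caffe2/core/db.h"

// Returns whether a checkpoint db can be found, without constructing an op.
bool HasCheckpoint(const std::string& db_type, const std::string& path) {
  return caffe2::db::DBExists(db_type, path);
}
```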
Reviewed By: azzolini
Differential Revision: D5472587
fbshipit-source-id: 2a53375ffcccfb88e8f0af2ab55dad4c6a9586e3
Summary:
If strip_prefix_ was not found in the blob name, strip_prefix_.size() characters of the blob name were stripped anyway; the sketch below shows the corrected behavior.
Closes https://github.com/caffe2/caffe2/pull/924
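A minimal sketch of the corrected behavior (names are placeholders):
```
#include <string>

// Strip everything up to and including strip_prefix only when it is
// actually found; otherwise return the name unchanged.
std::string StripPrefix(
    const std::string& name, const std::string& strip_prefix) {
  const auto pos = name.find(strip_prefix);
  if (pos == std::string::npos) {
    return name; // the bug: strip_prefix.size() chars were dropped anyway
  }
  return name.substr(pos + strip_prefix.size());
}
```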
Differential Revision: D5440941
Pulled By: akyrola
fbshipit-source-id: 1db772fac4c74f2ce05105eec4bc7742a9067ebc
Summary:
A new argument `blob_name_overrides` is added to specify the destination names of loaded blobs (allowing them to differ from the names stored in the saved file/db).
This will be used for parameter initialization from pretrained models in Dper 2: when loading a blob, we need to avoid name collisions by assigning the loaded blob a new (temporary) name.
Reviewed By: xianjiec
Differential Revision: D4952485
fbshipit-source-id: 4ce79bf40223314bb94981c22cbe537ae3f3d27c
Summary:
To evaluate on checkpoints, we often need to load from multiple checkpoints.
However, it is inconvenient to always check for the existence of a checkpoint manually. This adds interfaces to check the existence of a DB so that available checkpoints can be found automatically.
Reviewed By: azzolini
Differential Revision: D4823876
fbshipit-source-id: e5a65b736ac2addd0447c4add81dbd0986f422e7
Summary:
We kind of have our hands tied here: we can't reference context_gpu since it needs to run under the _gpu TARGET to pick up the correct headers, and we can't change the interface of deserialize blob to return a size since not all blobs are tensors.
If this works then let's ship it.
Reviewed By: urikz
Differential Revision: D4826034
fbshipit-source-id: 631ba56386ccb91d9b19d780a3e012d0ceea2422
Summary:
- Fixed loading params into ensemble model
- Small fix for beam decoder
Differential Revision: D4807595
fbshipit-source-id: 0187fda7eb469401f1acd8e6108de54ab67ae922
Summary:
To evaluate from checkpoints, we need to load a model from the checkpoints.
However, the checkpoints store way more blobs than the blobs needed by the
model. This function enables the model builder to load only the blobs
associated with the model into the workspace. After that, the model builder
can evaluate the model from the populated workspace.
Reviewed By: azzolini
Differential Revision: D4751414
fbshipit-source-id: a7a420228d681fc2dcfd8573cf69a97b1abc2ef3
Summary:
Modified load_save_op to work with my training script:
- SaveOp now correctly strips a specified prefix of the form 'gpu_0/' when saving model blob names to the DB
- when translating DB blob names to model blob names, LoadOp can now optionally add a prefix of the same form
Reviewed By: Yangqing
Differential Revision: D4664134
fbshipit-source-id: a2512e79f0c5172c5111af3e9b6fd161f268f4df
Summary: Added validation to the Load op when doing load_all by refactoring the validation logic used for loading specific blobs.
Reviewed By: kennyhorror
Differential Revision: D4641986
fbshipit-source-id: e0075a12188ca09d7628add72c143b40d5d9f382
Summary:
- Replaces the strip_regex implementation in SaveOp. It deletes the prefix of blob names up to a given substring.
- Adds the same functionality to LoadOp, needed for loading checkpoints that are stored using the strip_prefix feature.
Closes https://github.com/caffe2/caffe2/pull/129
Differential Revision: D4512234
Pulled By: Yangqing
fbshipit-source-id: d926c1c5adcc7a711365cede11f21421bb7d4138
Summary:
In the current implementation of SaveOp we always use the blob names from the current workspace. But there is a use case for replacing names in the saved model: for example, to use half-floats in the prediction model but keep full floats for the training model, we might want to save a blob "w_fp16" as "w".
Differential Revision: D4567304
fbshipit-source-id: 87bc84fa6a45d8bfa33edb55ac1fb1cff542dbe3
Summary:
Created a simple benchmark to test model saving speed, plus a few possible optimizations on top of it.
Since we never want to end up with a partial LogFileDB, it makes sense to commit the transactions only after we've finished serialization.
As a result, serialization time in my dummy test drops from 480 seconds to:
Serialization time: 52.5134651661
Deserialization time: 60.5741639137
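A sketch of the batching described above, using the caffe2::db API loosely:
```
#include <map>
#include <memory>
#include <string>

#include "caffe2/core/db.h"

// Buffer every Put in one transaction and Commit() once after all
// serialization is finished, so a LogFileDB is never left partially written.
void WriteAllBlobs(
    caffe2::db::DB* db,
    const std::map<std::string, std::string>& serialized) {
  std::unique_ptr<caffe2::db::Transaction> txn = db->NewTransaction();
  for (const auto& kv : serialized) {
    txn->Put(kv.first, kv.second);
  }
  txn->Commit(); // single commit at the end
}
```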
One more really scary thing that I found: it looks like load_op with load_all might actually load corrupted DBs (if they get truncated), so we really need to fix this (record all the blobs we have in the DB, or even better, a checksum).
Reviewed By: dzhulgakov
Differential Revision: D4558216
fbshipit-source-id: 4145c07f29b9dda527a2e57842f3abd8023d71a3
Summary: Makes it much nicer to spot errors, especially in an IPython notebook.
Reviewed By: kennyhorror
Differential Revision: D4465726
fbshipit-source-id: c0adaf5168248a70987ff9d5dfce54a622ff2219
Summary:
It seems that a simple string("") conversion instead of a bare "" is enough.
Closes https://github.com/caffe2/caffe2/pull/105
Differential Revision: D4458626
Pulled By: Yangqing
fbshipit-source-id: 5072499516332ad1067779526523a3f10aade6ef
Summary: Added functions to "de-scope" the saved model files.
Reviewed By: Yangqing
Differential Revision: D4444966
fbshipit-source-id: f447c15754f8e0648459148fcc7fba410dc06f68
Summary: Some DBs don't support duplicate keys. Nvidia had problems with LMDB, where we could potentially set up duplicate keys; this won't be possible in some other cases. So instead let's just store different chunks under different keys in the DB, and strip the special suffix when reading them back.
Reviewed By: dzhulgakov
Differential Revision: D4446583
fbshipit-source-id: 6b345e342840c5fd476029166db131d343467d48
Summary:
This renames the "Snapshot" op to "Checkpoint", as we discussed earlier.
The old Snapshot name is still available, but we should move to the new name and eventually deprecate the old one.
The Python SnapshotManager should also be changed, cc azzolini
Reviewed By: dzhulgakov
Differential Revision: D4272021
fbshipit-source-id: 4b8e029354416530dfbf0d538bfc91a0f61e0296