Commit Graph

65 Commits

Richard Barnes
1433160a36 use irange for loops 6 (#66742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66742

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for (TYPE var = x0; var < x_max; var++)`

to the format

`for (const auto var : irange(x_max))`

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit; a number of reversions and unused-variable warning suppressions were added by hand.
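As an illustration, the rewrite can be reproduced with a minimal stand-in for `c10::irange` (a sketch only; the real implementation lives in `c10/util/irange.h` and also supports a `(begin, end)` form and checked integer types):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-in for c10::irange(n): iterates over 0 .. n-1.
template <typename T>
class IntRange {
 public:
  class iterator {
   public:
    explicit iterator(T v) : v_(v) {}
    T operator*() const { return v_; }
    iterator& operator++() { ++v_; return *this; }
    bool operator!=(const iterator& o) const { return v_ != o.v_; }
   private:
    T v_;
  };
  explicit IntRange(T end) : end_(end) {}
  iterator begin() const { return iterator(T(0)); }
  iterator end() const { return iterator(end_); }
 private:
  T end_;
};

template <typename T>
IntRange<T> irange(T end) { return IntRange<T>(end); }

// Before: for (int i = 0; i < n; i++) sum += data[i];
// After:  the range-based form below; the index is const and the
//         bound/increment cannot be mistyped.
int sum_range(const std::vector<int>& data) {
  int sum = 0;
  for (const auto i : irange(data.size())) {
    sum += data[i];
  }
  return sum;
}
```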

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31705366

fbshipit-source-id: be58222426c192406a7f93c21582c3f6f2082401
2021-12-07 16:07:50 -08:00
Xue Li
2f099c7555 Revert D30652629: use irange for loops
Test Plan: revert-hammer

Differential Revision:
D30652629 (687c2267d4)

Original commit changeset: 0ae6c4bbbb55

fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3
2021-10-15 15:23:10 -07:00
Richard Barnes
687c2267d4 use irange for loops (#66234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for (TYPE var = x0; var < x_max; var++)`

to the format

`for (const auto var : irange(x_max))`

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit; a number of reversions and unused-variable warning suppressions were added by hand.

bypass_size_limit
allow-large-files

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30652629

fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
2021-10-15 13:50:33 -07:00
Kaige Liu
58adaaba60 Enable C2 load rate limiter [2/n] (#61551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61551

We aim to enable the rate limiter in C2 load, with a fixed bandwidth limit.
This diff updates LoadOp to pass down the Manifold DB options.

Test Plan:
```
buck test mode/opt caffe2/caffe2/python/operator_test:load_save_test
```

Differential Revision: D29639102

fbshipit-source-id: cf69549adadf4c7f12a8a2b7f3ca39092cab4b99
2021-07-14 08:27:05 -07:00
Nikita Shulga
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
Georgia Hong
26046b9110 [caffe2][publish] Optimize metanetdef load
Summary:
When loading optional blobs from a large file into the workspace, for instance: https://fburl.com/diffusion/l0mcnofg, we currently load the file multiple times. https://fburl.com/diffusion/qhbpyq0e

This diff optimizes the load time by loading the large model file only once, using the `allow_incomplete` arg of LoadOp. The implementation of LoadOp with this arg previously did not delete the blobs that were not found; that is also fixed in this diff.

Test Plan:
Existing unit tests:
```
buck test //caffe2/caffe2/fb/distribute/tests:meta_net_def_storage_utils_test
```
Many sandcastle integration tests.

scuba logs: https://fburl.com/scuba/dai_modelstore/txdf3pjt

Reviewed By: TailofJune

Differential Revision: D27575622

fbshipit-source-id: 7c2b25ef603a378e87ebdbe349c94c2f1952493c
2021-04-16 11:35:53 -07:00
Adam Simpkins
aae1023bed [caffe2] allow passing options to the DB in Save operations (#55935)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55935

Add a new `DB::SetOptions()` method to allow passing options to the DB as part
of Save operations.  This can be used for passing in options to control the
serialization behavior, such as rate limits or other parameters.  The
serialization options are passed as an opaque string, so that different DB
implementations may choose their own options and options format.

This also adds a new `db_options` parameter to the `Save` operator.
This allows users to pass in the DB options when saving data.
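A minimal sketch of the opaque-options idea described above (the `bandwidth_mbps` key and the `key=value` format are illustrative assumptions, not the actual caffe2 API):

```cpp
#include <string>

// Sketch: DB::SetOptions() receives an uninterpreted string; each DB
// implementation parses it in whatever format it chooses. A rate-limiting
// DB might delegate to a parser like this (hypothetical option format).
int parse_bandwidth_option(const std::string& options) {
  const std::string prefix = "bandwidth_mbps=";
  if (options.compare(0, prefix.size(), prefix) == 0) {
    return std::stoi(options.substr(prefix.size()));
  }
  return 0;  // unrecognized or empty options: no rate limit
}
```

Because the string is opaque at the operator level, adding a new option never requires changing the `Save` operator's schema, only the DB implementation that consumes it.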
ghstack-source-id: 126589771

Test Plan:
I don't have any tests in this diff since no DB implements options yet.  The
next diff in the stack includes an options implementation, along with unit
tests that verify the options are passed in correctly.

Differential Revision: D27729461

fbshipit-source-id: 4d03250c389c66a049cdee1d05e082f5649ac0f0
2021-04-15 14:45:47 -07:00
Adam Simpkins
7e5ffbfa94 [caffe2] add a SerializationOptions field for the save operator (#53402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53402

Add an `options` field to the `Save` operator which accepts options for how to
serialize different blobs.  At the moment this simply allows controlling the
existing `chunk_size` behavior, but in the future we can add other options,
such as the ability to control compression settings or other serialization
formats.
ghstack-source-id: 123567034

Test Plan:
Added a new test to `load_save_test.py` that passes in options and verifies
that blobs were serialized with the expected number of chunks.

  buck test caffe2/caffe2:caffe2_test_cpu \
    caffe2/caffe2/core:serialization_test \
    caffe2/caffe2/python/operator_test:load_save_test

Reviewed By: mraway

Differential Revision: D26502577

fbshipit-source-id: 6e302e530bb96990517c2e35c505db7f14a56284
2021-03-11 13:02:58 -08:00
Adam Simpkins
efb1895f81 [caffe2] use snprintf() instead of sprintf() in the Checkpoint operator (#53434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53434

Use `snprintf()` to avoid buffer overflows.
Also only throw an exception on error, instead of crashing the entire
application.  A failure can occur if the caller supplies an invalid format
string.
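The pattern described can be sketched as follows (an illustrative reconstruction, not the actual Checkpoint operator code):

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// snprintf() never overruns the buffer, and its return value (the length it
// would have written) lets us retry with a larger buffer instead of
// overflowing the way sprintf() could.
std::string format_db_name(const std::string& format, int index) {
  std::vector<char> buf(64);  // deliberately small initial size
  int needed = snprintf(buf.data(), buf.size(), format.c_str(), index);
  if (needed < 0) {
    // An invalid format string is an error, not a crash.
    throw std::runtime_error("invalid format string");
  }
  if (static_cast<std::size_t>(needed) >= buf.size()) {
    buf.resize(needed + 1);  // +1 for the trailing NUL
    snprintf(buf.data(), buf.size(), format.c_str(), index);
  }
  return std::string(buf.data());
}
```

An undersized initial buffer only costs one retry, which is what makes the "set the buffer to 1 and re-run the tests" check in the Test Plan meaningful.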
ghstack-source-id: 123401582

Test Plan:
Ran the checkpoint tests:

  buck test caffe2/caffe2/python/operator_test:checkpoint_test

Verified that the checkpoint file names logged in the output are the same
before and after this change.

I also manually tested by changing the initial buffer size to 1 to confirm
that the code works when the initial buffer size is too small.  I considered
updating the checkpoint_test.py code to test db names long enough to exceed
this limit, but I figured that long filenames were likely to cause other
problems on some platforms (Windows had a maximum path length of 260
characters until fairly recent releases).

Differential Revision: D26863355

fbshipit-source-id: 8fc24faa2a8dd145471067718d323fdc8ce055d6
2021-03-09 12:54:15 -08:00
Adam Simpkins
f595ba1bae [caffe2] move the SaveOp implementation from a header to a .cc file (#53298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53298

This is a re-land of D26641600 (3969391c07), but with the `SaveOpImpl` class marked as
`TORCH_API` to ensure that its symbols get exported properly in shared library
builds.

This moves the `SaveOp` code from `load_save_op.h` to `load_save_op.cc`.

Previously this implementation was all in the templatized `SaveOp` class, even
though most of the logic didn't depend on the template parameters.  Having
this code be in the header file slows down the build, and forces more files to
be rebuilt than necessary when changing the SaveOp code.  Having this code be
in a template class can also increase the generated code size be larger than
needed, as we don't need separate copies instantiated for each context type.
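The refactoring pattern can be sketched like this (names and logic are illustrative; the real SaveOp/SaveOpImpl code is more involved):

```cpp
#include <string>

// The bulk of the logic moves into a non-template class whose methods can be
// defined once in a .cc file; the template class becomes a thin wrapper.

// In the header: declaration only.
class SaveOpImpl {
 public:
  explicit SaveOpImpl(std::string db_name) : db_name_(std::move(db_name)) {}
  bool Run();  // single out-of-line definition, compiled once
 private:
  std::string db_name_;
};

// In the .cc file: changing this body no longer forces every includer of the
// header to rebuild. (Placeholder logic for the sketch.)
bool SaveOpImpl::Run() { return !db_name_.empty(); }

// The templated operator keeps only context-specific glue, so no copy of the
// save logic is instantiated per Context type.
template <class Context>
class SaveOp {
 public:
  explicit SaveOp(std::string db_name) : impl_(std::move(db_name)) {}
  bool RunOnDevice() { return impl_.Run(); }
 private:
  SaveOpImpl impl_;
};
```

This is also why the re-land needed `TORCH_API` on `SaveOpImpl`: once the definitions live in a .cc file, shared-library builds must export its symbols explicitly.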
ghstack-source-id: 123146018

Test Plan:
buck test //caffe2/caffe2/python/operator_test:load_save_test

Also tested performing the CMake-based build using shared libraries with CUDA
enabled, and confirmed that the build succeeded.

Reviewed By: mraway

Differential Revision: D26802576

fbshipit-source-id: fc2dbdc1cd20680b082c887366a6305d86688138
2021-03-05 14:52:14 -08:00
Natalia Gimelshein
af1fb4e4ee Revert D26641600: [caffe2] move the SaveOp implementation from a header to a .cc file
Test Plan: revert-hammer

Differential Revision:
D26641600 (3969391c07)

Original commit changeset: 84ebe8164ffa

fbshipit-source-id: c3a85b7b15b8cdbf019abfabfd740a5b1d5e8775
2021-02-25 21:32:44 -08:00
Adam Simpkins
3969391c07 [caffe2] move the SaveOp implementation from a header to a .cc file
Summary:
Move the `SaveOp` code from `load_save_op.h` to `load_save_op.cc`.

Previously this implementation was all in the templatized `SaveOp` class, even
though most of the logic didn't depend on the template parameters.  Having
this code be in the header file slows down the build, and forces more files to
be rebuilt than necessary when changing the SaveOp code.  Having this code be
in a template class can also make the generated code larger than needed, since
separate copies are instantiated for each context type even though we don't need them.

Test Plan: buck test //caffe2/caffe2/python/operator_test:load_save_test

Reviewed By: mraway

Differential Revision: D26641600

fbshipit-source-id: 84ebe8164ffac1e4a691be41147f0c5d8e890e09
2021-02-25 20:21:55 -08:00
Chunli Fu
58ee61176c SeqBlobReader Implementation (#29888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29888

Extract some common functions out of class LoadOp.

Reviewed By: yinghai, ipiszy

Differential Revision: D18456785

fbshipit-source-id: d0b8e86ad5709c35f1dc3821376000db1114dc95
2019-11-16 01:18:54 -08:00
Pradeep Dorairaj
ead1193241 Transfer Learning: Caffe2 load op changes to return shape inference (#22829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22829

Sending out the caffe2 load op changes separately since we want to pick them to open source.

This change is needed because the shape information of the blobs is determined by the load operator, and that shape information is needed in our download_group.

Reviewed By: boryiingsu

Differential Revision: D16229465

fbshipit-source-id: f78b2df9a7f26968d70eca68dde75cd11ab6f7a2
2019-07-12 19:45:13 -07:00
Chandler Zuo
472be69a73 Avoid Output Uninitialized Blobs in Load with load_all=1 (#19133)
Summary:
When output blob names are specified while load_all=1, the output blob names are ignored. However, this behavior is not documented. In this diff, we simply disallow users from providing blob names when load_all=1.

See discussion at https://fb.workplace.com/groups/1405155842844877/permalink/2714909788536136/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19133

Reviewed By: dzhulgakov

Differential Revision: D14883698

Pulled By: chandlerzuo

fbshipit-source-id: 6e4171e36c4ccc4f857e79da98b858a06b7d8ad6
2019-04-27 10:45:44 -07:00
Liang Xiong
b1bea0b733 add logging to make the saving action visible (#19042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19042

show the model saving step in the log.

Reviewed By: kennyhorror

Differential Revision: D14809385

fbshipit-source-id: c7a1e50ff92bb45b16b1c501d9325b304b07fbd3
2019-04-09 09:35:43 -07:00
Dmytro Dzhulgakov
a60fadfb71 Change message on unknown db type to be friendly (#17795)
Summary:
CreateDB actually returns nullptr when the db type is unknown, and throws when the file is missing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17795

Reviewed By: ezyang

Differential Revision: D14383226

Pulled By: dzhulgakov

fbshipit-source-id: 1dcf75a6b4ba8b64a24d4e5daf02db3189d56b7b
2019-03-08 10:46:24 -08:00
Sebastian Messmer
42512242cc refactor caffe2 operator constructors - 4/9 (#17085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17085

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078515

fbshipit-source-id: aaa48ae10892e3f47063f2133e026fea46f3240b
2019-02-28 14:23:52 -08:00
Wendong Li
569a29b81a Make chunk size configurable in SaveOp (#12949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12949

Currently the default chunk size in the save operation is 1 MB, and there is no way to configure it at runtime. Add a parameter to configure the chunk size in SaveOp.

Reviewed By: mraway, xsh6528

Differential Revision: D10454037

fbshipit-source-id: a5cd8f9846aea4b1e3612a3fcfa431b68bda8104
2018-10-25 15:47:34 -07:00
Sebastian Messmer
b2b05b7c20 Move blob serialization to free functions (#11817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11817

Blob::Serialize() and Blob::Deserialize() are now free functions SerializeBlob(), DeserializeBlob() instead.
This takes away access to Blob internals from them and makes future refactorings easier.

Reviewed By: ezyang

Differential Revision: D9882726

fbshipit-source-id: 3251ebd4b53fc12f5e6924a6e4a8db3846ab3729
2018-09-20 23:27:34 -07:00
Mingzhe Li
964e30de1d Workaround for Cuda9.2 and GCC7 compilation errors (#10510)
Summary:
Breaking out of #8338

This PR is a workaround for a bug with CUDA9.2 + GCC7.

Here is the error this PR fixed:
.../pytorch/caffe2/operators/elementwise_ops.h: In constructor ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>::BinaryElementwiseWithArgsOp(const caffe2::OperatorDef&, caffe2::Workspace*)’:
.../pytorch/caffe2/operators/elementwise_ops.h:106:189: error: ‘GetSingleArgument<bool>’ is not a member of ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>’
   BinaryElementwiseWithArgsOp(const OperatorDef& operator_def, Workspace* ws)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10510

Reviewed By: orionr

Differential Revision: D9319742

Pulled By: mingzhe09088

fbshipit-source-id: ce59e3db14539f071f3c20301e77ca36a6fc3f81
2018-08-14 20:54:52 -07:00
Jerry Zhang
aebf3b47ae Remove template parameter from Tensor (#9939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939

Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later

Previously, the Caffe2 Tensor class was fixed at compile time to bind to a particular device/context. With this change, we make it a runtime property (stored inside the tensor) while preserving the same semantics. For example, one still has to specify a device type in order to create a Tensor; there are no uninitialized tensors. More specifically, the changes are:

1. We added an extra *DeviceType* argument to most of the Tensor constructors, e.g. `Tensor(DeviceType type)`.
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in to enable calling the templated Copy function. Previously it could be in a different context than the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter, Blob::GetMutableTensor, which verifies both that the Blob contains a Tensor and that it is of the correct type.
4. The Tensor type is no longer default-constructible (as there are no unknown-device tensors), so some of the code handling STL containers needs to change.

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.
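A hedged before/after sketch of the device-type change (illustrative only, not the actual caffe2 Tensor):

```cpp
// Before: the device was bound at compile time via a template parameter,
// roughly:  template <class Context> class Tensor { ... };
// After: the device is a runtime property stored in the tensor.
enum class DeviceType { CPU, CUDA };

class Tensor {
 public:
  // No default constructor: a device type must be supplied, so there
  // are no "unknown device" tensors.
  explicit Tensor(DeviceType type) : type_(type) {}
  DeviceType device_type() const { return type_; }
 private:
  DeviceType type_;
};
```

Call sites then dispatch on `device_type()` at runtime instead of instantiating a separate Tensor type per context.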

Reviewed By: ezyang, houseroad

Differential Revision: D9024330

fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
2018-07-27 10:56:39 -07:00
Jerry Zhang
969b62f276 Revert D8121878: Remove template parameter from Tensor
Differential Revision:
D8121878

Original commit changeset: 4a5e9a677ba4

fbshipit-source-id: d8e2c0bb145b52fbcca323b22d1d3346f0b3249e
2018-07-26 14:02:04 -07:00
Jerry Zhang
cd5adc7b5f Remove template parameter from Tensor (#13)
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later

Previously, the Caffe2 Tensor class was fixed at compile time to bind to a particular device/context. With this change, we make it a runtime property (stored inside the tensor) while preserving the same semantics. For example, one still has to specify a device type in order to create a Tensor; there are no uninitialized tensors. More specifically, the changes are:

1. We added an extra *DeviceType* argument to most of the Tensor constructors, e.g. `Tensor(DeviceType type)`.
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in to enable calling the templated Copy function. Previously it could be in a different context than the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter, Blob::GetMutableTensor, which verifies both that the Blob contains a Tensor and that it is of the correct type.
4. The Tensor type is no longer default-constructible (as there are no unknown-device tensors), so some of the code handling STL containers needs to change.

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.

Reviewed By: xw285cornell

Differential Revision: D8121878

fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
2018-07-26 10:25:23 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Lei Tian
56508566a1 Enhance Caffe2 Load op to support loading blobs from multiple files.
Summary: The current Load op can only load blobs from one file. We need the Load op to support loading blobs from a list of DBs.

Reviewed By: boryiingsu

Differential Revision: D6596034

fbshipit-source-id: 906fa48b0ad61c83e247d497b6b079c04fed499f
2018-01-02 18:02:19 -08:00
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Dmitrii Podoprikhin
7df859871e Added functionality that allows users to store huge blobs
Summary: Added functionality that allows users to store huge blobs of any type, not only Tensors. The blob has to be divided into chunks in the same way as a Tensor blob.

Reviewed By: kennyhorror

Differential Revision: D5432762

fbshipit-source-id: c171faacd99d209bfae6f9707ebde7c4e23ba3b9
2017-08-02 16:08:09 -07:00
Robert Eng
4195858614 factored out DBExists function
Summary: DBExists function was factored out of the DBExistsOp.

Reviewed By: azzolini

Differential Revision: D5472587

fbshipit-source-id: 2a53375ffcccfb88e8f0af2ab55dad4c6a9586e3
2017-07-24 11:21:27 -07:00
Junjie Bai
4e019dbb6f Rename def() to debug_def()
Summary: Also eliminated non-debug uses of debug_def

Reviewed By: akyrola

Differential Revision: D5441534

fbshipit-source-id: 9dab5fb74e25b4da504fa893ec1f3478e282d3f3
2017-07-17 23:50:01 -07:00
CSLJXing
cddb73899c fix strip prefix bug in SaveOp
Summary:
If strip_prefix_ is not found in the blob name, strip_prefix_.size() characters of the blob name are stripped anyway.
Closes https://github.com/caffe2/caffe2/pull/924
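The bug class can be reproduced with a small sketch (illustrative, not the actual SaveOp code):

```cpp
#include <string>

// Bug: strips prefix.size() characters even when `prefix` is absent
// from the name, mangling unrelated blob names.
std::string strip_prefix_buggy(const std::string& name,
                               const std::string& prefix) {
  return name.substr(prefix.size());
}

// Fix: only strip when the prefix substring is actually found.
std::string strip_prefix_fixed(const std::string& name,
                               const std::string& prefix) {
  const auto pos = name.find(prefix);
  if (pos == std::string::npos) {
    return name;  // prefix absent: leave the blob name untouched
  }
  return name.substr(pos + prefix.size());
}
```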

Differential Revision: D5440941

Pulled By: akyrola

fbshipit-source-id: 1db772fac4c74f2ce05105eec4bc7742a9067ebc
2017-07-17 19:08:23 -07:00
Xiaolong Wang
e9d5863860 Allow Load operator to load into overriden names
Summary:
A new argument, `blob_name_overrides`, is added to specify the destination
names of loaded blobs (allowing them to have different names than those in
the saved file/db).

This will be used for parameter initialization from a pretrained model in
Dper 2. When loading a blob, we need to avoid name collisions by assigning the
loaded blob a new (temporary) name.

Reviewed By: xianjiec

Differential Revision: D4952485

fbshipit-source-id: 4ce79bf40223314bb94981c22cbe537ae3f3d27c
2017-04-27 01:18:12 -07:00
Bor-Yiing Su
81a55f441c Adds interfaces to check the existence of a DB
Summary:
To evaluate on checkpoints, we often need to load from multiple checkpoints.
However, it is inconvenient if we always need to check the existence of
a checkpoint manually. Adds interfaces to check the existence of a DB
so that we can find available checkpoints automatically.

Reviewed By: azzolini

Differential Revision: D4823876

fbshipit-source-id: e5a65b736ac2addd0447c4add81dbd0986f422e7
2017-04-11 14:07:49 -07:00
Wael Abdelghani
65439e849b Fix mixed context loading validation
Summary:
Our hands are somewhat tied here: we can't reference context_gpu, since it needs to run under the _gpu TARGET to pick up the correct headers, and we can't change the interface of DeserializeBlob to return a size, since not all blobs are tensors.
If this works, then let's ship it.

Reviewed By: urikz

Differential Revision: D4826034

fbshipit-source-id: 631ba56386ccb91d9b19d780a3e012d0ceea2422
2017-04-05 08:20:03 -07:00
Yury Zemlyanskiy
b0a0c437dd Some fixes for load/saving and beam search
Summary:
- Fixed loading params into ensemble model
- Small fix for beam decoder

Differential Revision: D4807595

fbshipit-source-id: 0187fda7eb469401f1acd8e6108de54ab67ae922
2017-04-01 02:17:21 -07:00
Luke Yeager
2d7731a5d1 Fix typo "mistmatch"
Summary: Closes https://github.com/caffe2/caffe2/pull/239

Differential Revision: D4814359

Pulled By: Yangqing

fbshipit-source-id: 59e959fb97a1d4960626c11242dc9b828b5db25f
2017-03-31 17:06:21 -07:00
Bor-Yiing Su
7fa4acab9b Loads only the model blobs from the checkpoints.
Summary:
To evaluate from checkpoints, we need to load a model from the checkpoints.
However, the checkpoints store way more blobs than the blobs needed by the
model. This function enables the model builder to load only the blobs
associated with the model to the workspace. After that, the model builder
can evaluate the model from the populated workspace.

Reviewed By: azzolini

Differential Revision: D4751414

fbshipit-source-id: a7a420228d681fc2dcfd8573cf69a97b1abc2ef3
2017-03-27 10:02:11 -07:00
Matt Uyttendaele
046b467c9a added prefix to load op
Summary:
modified load_save_op to work with my training script

- SaveOp now correctly strips a specified prefix of the form 'gpu_0/' when saving model blob names to the DB
- when translating DB blob names to model blob names, LoadOp can now optionally add a prefix of the same form

Reviewed By: Yangqing

Differential Revision: D4664134

fbshipit-source-id: a2512e79f0c5172c5111af3e9b6fd161f268f4df
2017-03-08 12:48:50 -08:00
Wael Abdelghani
9ef35f4a0b Add validation checks to load op
Summary: Added validation for load op when doing load_all by refactoring validation logic for loading specific blobs.

Reviewed By: kennyhorror

Differential Revision: D4641986

fbshipit-source-id: e0075a12188ca09d7628add72c143b40d5d9f382
2017-03-06 09:46:35 -08:00
Pooya Davoodi
aef75ca5dd Strip prefix of strip_prefix in blob names before save and load.
Summary:
- Replaces the strip_regex implementation in SaveOp. It deletes the prefix of blob names up to a given substring.
- Adds the same functionality to LoadOp. Needed for loading checkpoints that are stored using the strip_prefix feature.
Closes https://github.com/caffe2/caffe2/pull/129

Differential Revision: D4512234

Pulled By: Yangqing

fbshipit-source-id: d926c1c5adcc7a711365cede11f21421bb7d4138
2017-03-04 15:46:47 -08:00
Artem Volkhin
0c03c8fca5 Add name_overrides argument to SaveOp
Summary:
In the current implementation of SaveOp we always use the blob names from the
current workspace. But there is a use case for replacing names in the saved
model: for example, to use half-floats in the prediction model but keep full
floats for the training model, we might want to save a blob "w_fp16" as "w".

Differential Revision: D4567304

fbshipit-source-id: 87bc84fa6a45d8bfa33edb55ac1fb1cff542dbe3
2017-02-16 12:32:51 -08:00
Andrey Malevich
a8d70f3552 Try to improve serialization speed for SparseNN.
Summary:
Created a simple benchmark to test model saving speed, plus a few possible
optimizations on top of it.

Since we don't really want to have partial LogFileDB ever, it makes sense to
commit the transactions only after we've finished serialization.

As a result in my test serialization time in my dummy test drops from
480 seconds, to:
Serialization time: 52.5134651661
Deserialization time: 60.5741639137

One more really scary thing that I've found:
it looks like load_op with load_all might actually load corrupted DBs (if they are truncated), so we really need to fix this (record all the blobs we have in the DB, or even better a checksum).

Reviewed By: dzhulgakov

Differential Revision: D4558216

fbshipit-source-id: 4145c07f29b9dda527a2e57842f3abd8023d71a3
2017-02-15 16:00:44 -08:00
Dmytro Dzhulgakov
864f561525 Make BlobDeserialization throw exceptions instead of returning bool
Summary: Makes it much nicer to spot errors, especially in iPython notebook.

Reviewed By: kennyhorror

Differential Revision: D4465726

fbshipit-source-id: c0adaf5168248a70987ff9d5dfce54a622ff2219
2017-01-26 09:44:19 -08:00
Yangqing Jia
f0996309d9 Fix Caffe2 gcc 4.8 regex issue
Summary:
It seems that a simple string("") conversion instead of "" is enough.
Closes https://github.com/caffe2/caffe2/pull/105

Differential Revision: D4458626

Pulled By: Yangqing

fbshipit-source-id: 5072499516332ad1067779526523a3f10aade6ef
2017-01-24 19:29:21 -08:00
Matt Uyttendaele
200ae58c35 modified save_op for multi-gpu training
Summary: added functions to "de-scope" the saved model files

Reviewed By: Yangqing

Differential Revision: D4444966

fbshipit-source-id: f447c15754f8e0648459148fcc7fba410dc06f68
2017-01-23 19:44:20 -08:00
Alexander Sidorov
ceb0c765b9 Avoid duplicate keys when chunking in serialization
Summary: Some DBs don't support duplicate keys. Nvidia had problems with LMDB, where we can potentially set up duplicate keys, and this won't be possible in some other DBs either. So instead, let's store different chunks under different keys in the DB, and remove the special suffix when reading back.
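A sketch of the keying scheme (the separator and the chunk-0 convention here are assumptions for illustration, not necessarily the exact caffe2 format):

```cpp
#include <string>

// Each chunk of a blob is written under a distinct key by appending a
// reserved suffix, so DBs that forbid duplicate keys (e.g. LMDB) still
// work; the reader strips the suffix back off to recover the blob name.
const std::string kChunkIdSeparator = "#%";  // hypothetical reserved separator

std::string chunk_key(const std::string& blob_name, int chunk_id) {
  // Chunk 0 keeps the plain name so single-chunk blobs are unaffected.
  if (chunk_id == 0) return blob_name;
  return blob_name + kChunkIdSeparator + std::to_string(chunk_id);
}

std::string base_name(const std::string& key) {
  const auto pos = key.rfind(kChunkIdSeparator);
  return pos == std::string::npos ? key : key.substr(0, pos);
}
```

The separator must be a string that can never occur in a legitimate blob name, otherwise `base_name` would truncate it.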

Reviewed By: dzhulgakov

Differential Revision: D4446583

fbshipit-source-id: 6b345e342840c5fd476029166db131d343467d48
2017-01-23 10:14:18 -08:00
Yangqing Jia
4858a6bc6f snapshot -> checkpoint
Summary:
This renames the "Snapshot" op name to "Checkpoint" as we discussed earlier.

The early Snapshot name is still available, but we should move to the new name and
eventually deprecate the old name.

The Python SnapshotManager should be also changed, cc azzolini

Reviewed By: dzhulgakov

Differential Revision: D4272021

fbshipit-source-id: 4b8e029354416530dfbf0d538bfc91a0f61e0296
2016-12-15 12:01:30 -08:00
Yangqing Jia
589398950f fbsync at f5a877 2016-11-18 15:41:06 -08:00
Yangqing Jia
238ceab825 fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00
Yangqing Jia
44509f9f91 fbsync: mostly lint changes, added mkl files 2016-10-11 22:45:06 -07:00