Commit Graph

82 Commits

Author SHA1 Message Date
PyTorch MergeBot
b5594f7df0 Revert "Use missing-prototypes in torch_cpu (#103725)"
This reverts commit 716b3b893d.

Reverted https://github.com/pytorch/pytorch/pull/103725 on behalf of https://github.com/osalpekar due to breaking caffe2 builds. More info at [D46920675](https://www.internalfb.com/diff/D46920675) ([comment](https://github.com/pytorch/pytorch/pull/103725#issuecomment-1603129273))
2023-06-22 18:30:31 +00:00
cyy
716b3b893d Use missing-prototypes in torch_cpu (#103725)
This PR enables `-Wmissing-prototypes` in torch_cpu, except for some generated cpp files and the mps and metal backends.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103725
Approved by: https://github.com/albanD
2023-06-21 13:19:55 +00:00
Richard Barnes
704af23ee4 Use a reference in GetSingleArgument (#71007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71007

A string copy at Line 417 is currently consuming 125,749,287,000 cycles/day. I suspect the issue is with a copy-on-return, but we can experiment with introducing a reference in the middle to see if that produces a good savings without changing the interface.

Reference
```
["Inline caffe2::ArgumentHelper::GetSingleArgument @ caffe2/caffe2/utils/proto_utils.cc:417"]
```
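
A minimal sketch of the reference-in-the-middle idea, with hypothetical names standing in for the actual caffe2 code:
```
// Sketch only: Argument/arg_map are illustrative stand-ins.
#include <map>
#include <string>

struct Argument {
  std::string s;  // string payload of the argument
};

std::map<std::string, Argument> arg_map;

std::string GetSingleArgument(const std::string& name,
                              const std::string& default_value) {
  auto it = arg_map.find(name);
  if (it == arg_map.end()) {
    return default_value;
  }
  const std::string& value = it->second.s;  // bind a reference: no copy here
  return value;                             // single copy at the return
}
```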

Test Plan: Sandcastle

Reviewed By: xw285cornell

Differential Revision: D33478883

fbshipit-source-id: e863e359c0c718fcd0d52fd4b3c7858067de0670
2022-01-07 20:18:56 -08:00
Scott Wolchok
03a58a2ba0 [Caffe2] Create fewer strings during argument fetching (#64285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64285

With C++14 heterogeneous ordered container lookup, it is no longer necessary to create a `std::string` in order to look up elements of a `CaffeMap` keyed by std::string. Accordingly, this diff reworks the argument-getting operator functions to avoid that in favor of `c10::string_view`.
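
A minimal sketch of heterogeneous lookup; std::string_view stands in here for c10::string_view (the codebase targeted C++14, hence its own string_view type):
```
#include <functional>
#include <map>
#include <string>
#include <string_view>

// std::less<> is a transparent comparator, so find() accepts any type
// comparable with the key -- no temporary std::string is constructed.
std::map<std::string, int, std::less<>> arg_index{{"alpha", 1}};

int Lookup(std::string_view name) {
  auto it = arg_index.find(name);  // heterogeneous lookup, no allocation
  return it == arg_index.end() ? -1 : it->second;
}
```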
ghstack-source-id: 137139818

Test Plan: buildsizebot iOS apps -- code size win. Fewer strings is probably marginally good for perf, but this only happens at setup time anyway.

Reviewed By: dzhulgakov

Differential Revision: D26826676

fbshipit-source-id: ee653b14dc2c528bae8c90f0fc6a7a419cbca1d6
2021-09-01 13:30:54 -07:00
Andrew Gallagher
20bda0057e [caffe2/utils] Add explicit rule to avoid package boundary violation
Summary:
Add a rule to wrap proto_utils.h and depend on that, rather than
relying on a glob which violates package boundaries.

Reviewed By: igorsugak

Differential Revision: D29273453

fbshipit-source-id: 08f198a03d06ee2fdf61f5dbe1d0087db22aec8b
2021-06-22 12:22:24 -07:00
Nikita Shulga
3a66a1cb99 [clang-tidy] Exclude cppcoreguidelines-avoid-magic-numbers (#57841)
Summary:
Add cppcoreguidelines-avoid-magic-numbers exclusion to clang-tidy
Remove existing nolint warnings using following script:
```
for file in `git ls-files | grep -v \.py`; do gsed '/^ *\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers)/d' -i  $file; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57841

Reviewed By: samestep

Differential Revision: D28295045

Pulled By: malfet

fbshipit-source-id: 7c6e8d1213c9593f169ed3df6a916498f1a97163
2021-05-07 20:02:33 -07:00
Nikita Shulga
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
davidriazati@fb.com
2f5c352162 Fix protobuf warnings in caffe2 (#56186)
Summary:
This guards some deprecated usages of the Protobuf API behind an `#ifdef` (this is how onnx does it as well)
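For illustration, a sketch of the guard pattern; the deprecated two-argument SetTotalBytesLimit is an assumed example, and the version cutoff is an assumption as well, not taken from this PR:
```
#include <google/protobuf/io/coded_stream.h>

void SetLimit(google::protobuf::io::CodedInputStream& stream, int limit) {
// Assumption: the one-argument overload appeared around protobuf 3.8.
#if GOOGLE_PROTOBUF_VERSION >= 3008000
  stream.SetTotalBytesLimit(limit);         // newer single-argument API
#else
  stream.SetTotalBytesLimit(limit, limit);  // older, now-deprecated signature
#endif
}
```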
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56186

Pulled By: driazati

Reviewed By: bertmaher, dzhulgakov

Differential Revision: D27803121

fbshipit-source-id: 2d3a348ec1ab9879a0d8f2dff17c5444fd4baf2c
2021-04-19 15:19:53 -07:00
Ying Zhang
8c1a70a7c9 [A*][Gen-1.5] Add shape inference func for PredictorCall.
Summary:
ATT, so that the shape inference works for a model with only distributed parts.

Previously, we relied on a full_predictor net to do shape inference. For very large models, the full_predictor net won't be generated, so we have to do shape inference based on distributed parts. Surprisingly, the PredictorCall op does tensor name mapping, so it needs its own shape inference function.

Test Plan: Added unittests.

Reviewed By: khabinov

Differential Revision: D27250956

fbshipit-source-id: 3ebd36ba1eb020bb5d00358cffb8f038a6a996e8
2021-04-06 21:18:40 -07:00
Scott Wolchok
4c9eb57914 [PyTorch] Narrow Device to 2 bytes by narrowing DeviceType and DeviceIndex (#47023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47023

DeviceType pretty clearly only needs 1 byte. DeviceIndex only needs 1 byte given that machines don't have anywhere near 255 GPUs in them as far as I know.
ghstack-source-id: 116901430
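
A simplified sketch of the resulting layout (not the actual c10 definitions):
```
#include <cstdint>

enum class DeviceType : int8_t { CPU = 0, CUDA = 1 /* ... */ };
using DeviceIndex = int8_t;  // -1 means "current device"; fits < 255 GPUs

struct Device {
  DeviceType type;
  DeviceIndex index;
};

static_assert(sizeof(Device) == 2, "Device should pack into two bytes");
```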

Test Plan: Existing tests, added assertion to catch if my assumption about DeviceIndex is incorrect

Reviewed By: dzhulgakov

Differential Revision: D24605460

fbshipit-source-id: 7c9a89027fcf8eebd623b7cdbf6302162c981cd2
2020-11-18 19:39:40 -08:00
Yinghai Lu
a92b49f7c8 [Onnxifi] Don't throw exception when we cannot write out debug files (#45979)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45979

For some reason, sometimes we cannot write out the debug files. This shouldn't block the whole service. Hence, we opt to log an error instead of throwing.

Test Plan: Run net_runner test at `/` and observe the error being printed out while the test still passes.

Reviewed By: ipiszy

Differential Revision: D24165081

fbshipit-source-id: a4e1d0479d54d741e615e3a00b3003f512394fd4
2020-10-08 00:18:24 -07:00
Hao Lu
39b4701d31 [caffe2][redo] Reimplement RemoveOpsByType with SSA (#41606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41606

The previous diff (D22220798 (59294fbbb9) and D22220797) was recently reverted (D22492356 (28291d3cf8), D22492355) because of a bug associated with the op AsyncIf. The AsyncIf op has net_defs as args, and the SSA rewriting didn't take that into account: it has a special path for the op If, but not for AsyncIf. I made several changes to fix the bug:
1) Add op AsyncIf to the special path for If op in SSA rewriting
2) clear inputs/outputs of the netdefs that are args in If/AsyncIf ops because they're no longer valid
3) revert renamed inputs/outputs in the arg netdefs that are in the external_outputs in the parent netdef

2) and 3) fix existing bugs in the `SsaRewrite` function that were just never exposed before.

The algorithm for `RemoveOpsByType` is the same as in my previous diff D22220798 (59294fbbb9). The only new changes in this diff are in `onnx::SsaRewrite` and a few newly added unit tests.

(Note: this ignores all push blocking failures!)

Reviewed By: yinghai

Differential Revision: D22588652

fbshipit-source-id: ebb68ecd1662ea2bae14d4be8f61a75cd8b7e3e6
2020-07-17 16:06:43 -07:00
Dmytro Dzhulgakov
49457a7be7 Logging for ATen op subtype
Summary: ATenOp should go away, but before it does it's important to understand what's going on inside of it. We already log `arguments`, but it's rather hard to parse in scuba as it's a list, not a dictionary. Let's extract the operator name explicitly so that grouping works well

Test Plan: unittest

Reviewed By: ngimel

Differential Revision: D21057966

fbshipit-source-id: 86be7cca39055620477a28bd5d8ab29e8edd2ff9
2020-04-19 23:02:50 -07:00
Dmytro Dzhulgakov
1f759936f0 Propagate model id used by Predictor to Caffe2 logging
Summary:
Does the same thing as D19658565 but for Caffe2 models.

From the investigation at https://fb.quip.com/PbgsAEmoJVuf, the model id that predictor uses and the model id saved inside the model don't match. A common reason is recurring fluent2 jobs, but there are others.

Since the model_id from predictor is what the rest of the datasets use, it's way more useful imho. I've considered adding both ids, but it'd require additional piping and I don't think it's that useful.

Test Plan: unittests added

Reviewed By: houseroad

Differential Revision: D20630599

fbshipit-source-id: 3e6d0cb0b6f8c8b6ae5935138f55ae7a2ff60653
2020-03-29 23:07:32 -07:00
Yinghai Lu
79e1305519 [net_runner] Get shape info from qtensors (#34321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34321

Mostly cosmetic as we can infer the shape anyway. It can remove a lot of the noise in the log though.

Note that weight sharing doesn't work yet. I'll add another diff to address this.

Reviewed By: houseroad

Differential Revision: D20290841

fbshipit-source-id: fe6f9b60d05dbe150af15b5d9d7a69fd902e12cc
2020-03-09 18:34:16 -07:00
Yinghai Lu
80404cb2f5 Add support for getting TensorProto argument (#18364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18364

att

Reviewed By: bddppq

Differential Revision: D14584784

fbshipit-source-id: 03f9207d5cf4f7f4b812428a931edbcdcb21ca8d
2019-04-02 20:58:28 -07:00
Duc Ngo
172ec4ace5 caffe2 - Util to cleanup external inputs and outputs from a NetDef (#18194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18194

Add a util method to cleanup external inputs and outputs from a NetDef

The following conditions will be met after the modification
- No duplicate external inputs
- No duplicate external outputs
- Going through list of ops in order, all op inputs must be outputs
from other ops, or registered as external inputs.
- All external outputs must be outputs of some operators.
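
A sketch of how such a cleanup could look, with simplified structs standing in for the NetDef/OperatorDef protos:
```
#include <set>
#include <string>
#include <vector>

struct Op {
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

struct Net {
  std::vector<Op> ops;
  std::vector<std::string> external_inputs;
  std::vector<std::string> external_outputs;
};

void CleanupExternalInputsAndOutputs(Net& net) {
  std::set<std::string> produced;  // outputs seen so far, walking ops in order
  std::set<std::string> ext_in;    // deduplicated external inputs
  for (const Op& op : net.ops) {
    for (const std::string& in : op.inputs) {
      // Anything not produced by an earlier op must be an external input.
      if (!produced.count(in)) {
        ext_in.insert(in);
      }
    }
    produced.insert(op.outputs.begin(), op.outputs.end());
  }
  net.external_inputs.assign(ext_in.begin(), ext_in.end());

  // Keep only external outputs that some op actually produces, deduplicated.
  std::set<std::string> seen;
  std::vector<std::string> ext_out;
  for (const std::string& out : net.external_outputs) {
    if (produced.count(out) && seen.insert(out).second) {
      ext_out.push_back(out);
    }
  }
  net.external_outputs = std::move(ext_out);
}
```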

Reviewed By: ZolotukhinM

Differential Revision: D14528589

fbshipit-source-id: c8d82fda1946aa3696abcbec869a4a8bb22f09b6
2019-03-22 11:23:03 -07:00
Sebastian Messmer
d408324350 Move files to/from c10/core and c10/util (#15316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15316

This starts cleaning up the files in c10 according to the module structure we decided on.

Move to c10/util:
- Half.h, Half-inl.h, Half.cpp, bitcasts.h

Move to c10/core:
- Device.h, Device.cpp
- DeviceType.h, DeviceType.cpp

i-am-not-moving-c2-to-c10

Reviewed By: dzhulgakov

Differential Revision: D13498493

fbshipit-source-id: dfcf1c490474a12ab950c72ca686b8ad86428f63
2019-01-10 16:22:22 -08:00
David Reiss
cbd1c519c4 Replace non-printable-ascii characters in ProtoDebugString (#14918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14918

When ProtoBuf-Lite is in use, ProtoDebugString just calls SerializeAsString.
This produces binary output, which is not a very suitable "debug" string.
Specifically, we've observed it causing problems when calling code tries to
add the debug string to a Java exception message (which requires valid UTF-8).
Now, we replace all non-ASCII bytes with "?".

This is not a very fast implementation, but generating debug strings shouldn't
be a performance-sensitive operation in any application.
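
A sketch of the sanitization step (which whitespace bytes to keep is an assumption; the real code may differ):
```
#include <string>

std::string ReplaceNonPrintableAscii(std::string s) {
  for (char& c : s) {
    const unsigned char b = static_cast<unsigned char>(c);
    const bool printable = (b >= 0x20 && b <= 0x7e);
    if (!printable && c != '\n' && c != '\t') {  // assumption: keep \n and \t
      c = '?';  // every other byte becomes plain ASCII
    }
  }
  return s;
}
```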

Reviewed By: dzhulgakov

Differential Revision: D13385540

fbshipit-source-id: 8868172baf20efaf53fecf7d666a6980f59b64f5
2018-12-13 13:16:24 -08:00
Junjie Bai
a682ce9144 Add back HIP support to async net (#13400)
Summary:
We lost HIP support in the last refactoring (620ece2668).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13400

Differential Revision: D12868211

Pulled By: bddppq

fbshipit-source-id: 72dbfda105b826bee28ddf480e88fca7d63f93d8
2018-10-31 17:52:36 -07:00
Ilia Cherniavskii
620ece2668 Simplify thread pool creation logic (#13114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13114

Using one thread pool creator for all device types

Reviewed By: manojkris, wesolwsk

Differential Revision: D10851533

fbshipit-source-id: 32ca51d7932ba7faa8137df26315f52ecb4c6157
2018-10-26 16:02:08 -07:00
Junjie Bai
e290a9d2fd Back out "Migrate DeviceOption.numa_node_id to DeviceOption.device_id"
Summary: Original commit changeset: 82583d0ad4b8

Reviewed By: enosair, ilia-cher

Differential Revision: D10560741

fbshipit-source-id: e289a37d441bd2243b369810abf451292891d9ee
2018-10-24 17:11:25 -07:00
Edward Yang
34cca9f05b Move Device and DeviceType to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12995

Reviewed By: Yangqing

Differential Revision: D10513246

fbshipit-source-id: 0c6d52e09166d7e8a786c1a0e21685ec9c35b12a
2018-10-24 08:27:44 -07:00
Junjie Bai
202893fe1a Migrate DeviceOption.numa_node_id to DeviceOption.device_id
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12717

Reviewed By: ilia-cher

Differential Revision: D10408325

fbshipit-source-id: 82583d0ad4b8db094ee4c5c607b52500826328f7
2018-10-19 12:45:48 -07:00
Yangqing Jia
713e706618 Move exception to C10 (#12354)
Summary:
There is still some work to be done:

- Move logging and unify AT_WARN with LOG(ERROR).
- A few header files are still being plumbed through, need cleaning.
- caffe2::EnforceNotMet aliasing is not done yet.
- need to unify the macros. See c10/util/Exception.h

This is mainly a codemod and does not cause functional changes. If you find your job failing and trace it back to this diff, it can usually be fixed by one of the following approaches:

(1) add //caffe2/c10:c10 to your dependency (or transitive dependency).
(2) change objects such as at::Error, at::Optional to the c10 namespace.
(3) change functions to the c10 namespace. In particular, caffe2::MakeString is not overridden by the unified c10::str function. Nothing else changes.
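
For example, a before/after along the lines of (2) and (3); the headers and exact spellings here are assumptions, not taken from this diff:
```
#include <string>
#include <c10/util/Optional.h>
#include <c10/util/StringUtil.h>

void MigrationExample() {
  c10::optional<int> maybe = 7;                    // was at::Optional<int>
  std::string msg = c10::str("value = ", *maybe);  // the unified c10::str helper
  (void)msg;
}
```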

Please kindly consider not reverting this diff - it involves multiple rounds of rebasing and the fix is usually simple. Contact jiayq@ or AI Platform Dev for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12354

Reviewed By: orionr

Differential Revision: D10238910

Pulled By: Yangqing

fbshipit-source-id: 7794d5bf2797ab0ca6ebaccaa2f7ebbd50ff8f32
2018-10-15 13:33:18 -07:00
Junjie Bai
89010d60f9 Migrate HIP to use DeviceOption.device_id and delete DeviceOption.hip_gpu_id
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12546

Reviewed By: hyuen, xw285cornell

Differential Revision: D10305222

fbshipit-source-id: 955e1d2878508a25fe4e9980ae66f8f54aaf7db9
2018-10-10 18:25:06 -07:00
Junjie Bai
f54ab540af Rename cuda_gpu_id to device_id in DeviceOption (#12456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12456

codemod with 'Yes to all'
codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id

Overload TextFormat::ParseFromString to do string replace when parsing from protobuf format
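
A hedged sketch of such an overload; the naive textual replace is illustrative (the real change lives in caffe2's proto utilities and may be more careful about partial matches):
```
#include <string>
#include <google/protobuf/message.h>
#include <google/protobuf/text_format.h>

bool ParseFromStringWithRename(std::string text,
                               google::protobuf::Message* message) {
  const std::string old_name = "cuda_gpu_id";
  const std::string new_name = "device_id";
  size_t pos = 0;
  while ((pos = text.find(old_name, pos)) != std::string::npos) {
    text.replace(pos, old_name.size(), new_name);  // rewrite the legacy field
    pos += new_name.size();
  }
  return google::protobuf::TextFormat::ParseFromString(text, message);
}
```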

Reviewed By: Yangqing

Differential Revision: D10240535

fbshipit-source-id: 5e6992bec961214be8dbe26f16f5794154a22b25
2018-10-09 15:54:04 -07:00
Junjie Bai
ff608a9ff3 Back out "Revert D10123245: Back out "codemod cuda_gpu_id to device_id"" (#12232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12232

Original commit changeset: fca91fea58b7

This adds proper modifications to the DeviceType <-> DeviceOption conversion code added in D10033396

Reviewed By: jerryzh168

Differential Revision: D10132473

fbshipit-source-id: 801ef777e2950982cb47b48051b1471a0a91e64b
2018-10-01 21:54:52 -07:00
Rick Ratmansky
3010dc4208 Revert D10123245: Back out "codemod cuda_gpu_id to device_id"
Differential Revision:
D10123245

Original commit changeset: d83da8e00a12

fbshipit-source-id: fca91fea58b7df208edc2e218a1d514f9821ec7b
2018-10-01 12:22:36 -07:00
Yang Liu
7d7d336c45 Back out "codemod cuda_gpu_id to device_id"
Summary:
Original commit changeset: f5614a5d2607

D9986213 is causing Multifeed Aggregator a [huge performance difference](https://our.intern.facebook.com/intern/ads/analyze_canary/412951953278781781/) and has been blocking aggregator push since last Friday night: https://fburl.com/feedtools/b6izvwjz
We need to land this revert ASAP to unblock aggregator push.

Reviewed By: orionr

Differential Revision: D10123245

fbshipit-source-id: d83da8e00a1250f5d09811a0a587c127e377aab2
2018-10-01 11:31:14 -07:00
Junjie Bai
3eb5940cf5 codemod cuda_gpu_id to device_id (#12022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12022

codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id

codemod with 'Yes to all'

Reviewed By: orionr

Differential Revision: D9986213

fbshipit-source-id: f5614a5d26078817aee8caf79a494abfd6a95ff1
2018-09-27 20:24:53 -07:00
Yangqing Jia
28dba2f928 Unify all *_EXPORT and *_IMPORT macros across c++ backend (#12019)
Summary:
TSIA. Right now we should basically use C10_EXPORT and C10_IMPORT for explicitly marking dllexport and dllimport, as a continued effort of the C10 unification.

This is a codemod that mechanically does the following change:

CAFFE2_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
AT_CORE_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12019

Reviewed By: ezyang, teng-li

Differential Revision: D10016276

Pulled By: Yangqing

fbshipit-source-id: a420d62c43d1110105fc88f9e9076e28a3203164
2018-09-25 17:41:05 -07:00
Jerry Zhang
9f4bcdf075 caffe2::DeviceType -> at::DeviceType (#11254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11254
Previously we used DeviceType from caffe2.proto directly, but it's an `enum` with implicit conversion to int, which does not provide type safety; e.g., we have to explicitly check that a device type is valid in event.h:
```
template <int d>
struct EventCreateFunctionRegisterer {
  explicit EventCreateFunctionRegisterer(EventCreateFunction f) {
    static_assert(d < MaxDeviceTypes, "");
    Event::event_creator_[d] = f;
  }
};
```
at::DeviceType is an `enum class`; it does not have implicit conversion to int and provides better type-safety guarantees. In this diff we have done the following refactor (taking CPU as an example):

    1. caffe2::DeviceType → caffe2::DeviceTypeProto
    2. caffe2::CPU → caffe2::PROTO_CPU
    3. caffe2::DeviceType = at::DeviceType
    4. caffe2::CPU = at::DeviceType::CPU

codemod -d caffe2/caffe2 --extensions h,cc,cpp 'device_type\(\), ' 'device_type(), PROTO_'
+ some manual changes

In short, after this diff, in c++, caffe2::CPU refers to the at::DeviceType::CPU and the old proto caffe2::CPU will be caffe2::PROTO_CPU.
On the Python side, we have a temporary workaround that aliases `caffe2_pb2.CPU = caffe2_pb2.PROTO_CPU` to make the change easier to review; this will be removed later.
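
A small demonstration of the type-safety difference (illustrative values):
```
enum ProtoDeviceType { PROTO_CPU = 0, PROTO_CUDA = 1 };  // plain proto enum
enum class DeviceType : int { CPU = 0, CUDA = 1 };       // at-style enum class

void Demo() {
  int a = PROTO_CUDA;                          // compiles: implicit conversion
  // int b = DeviceType::CUDA;                 // error: no implicit conversion
  int c = static_cast<int>(DeviceType::CUDA);  // must be explicit
  (void)a; (void)c;
}
```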

Reviewed By: ezyang

Differential Revision: D9545704

fbshipit-source-id: 461a28a4ca74e616d3ee183a607078a717fd38a7
2018-09-05 16:28:09 -07:00
Orion Reblitz-Richardson
6508db7421 Remove BUILD_CAFFE2 and build everything (#8338)
Summary:
This completely removes BUILD_CAFFE2 from CMake. There is still a little bit of "full build" stuff in setup.py that enables USE_CUDNN and BUILD_PYTHON, but otherwise everything should be enabled for PyTorch as well as Caffe2. This gets us a lot closer to full unification.

cc mingzhe09088, pjh5, ezyang, smessmer, Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8338

Reviewed By: mingzhe09088

Differential Revision: D9600513

Pulled By: orionr

fbshipit-source-id: 9f6ca49df35b920d3439dcec56e7b26ad4768b7d
2018-08-31 13:10:24 -07:00
Pengyao Chen
dec3ed7b49 Increase the limit for Proto size (#10745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10745

ParseProtoFromLargeString hits the limit when using recurring v2. To unblock the warmup project, we can increase the limit temporarily. More details in this post -- https://fb.facebook.com/groups/264913123977784/permalink/463566404112454/

Differential Revision: D9436368

fbshipit-source-id: 54488f27ef941cab679843cb0c502095dd056c1b
2018-08-23 13:55:50 -07:00
Edward Yang
36939417b2 Introduce at::DeviceType, which subsumes at::Device::Type and (partially) caffe2::DeviceType (#10175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10175

Previously, we had at::Device::Type and caffe2::DeviceType (from protobuf),
intended to help us distinguish between CPU, CUDA, etc. devices.

This replaces at::Device::Type entirely with at::DeviceType, which in turn
is a direct, 'enum class' version of the protobuf generated caffe2::DeviceType
'enum'.  We can't eliminate the 'enum' because this would a pretty drastic
API change (enum is interconvertible with integers, enum class is not) but
we can make the two line up exactly and share code for, e.g., printing.
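
A sketch of making the two line up (names and values here are illustrative):
```
// Stand-in for the protobuf-generated enum.
enum CaffeDeviceTypeProto { PROTO_CPU = 0, PROTO_CUDA = 1 };

// The 'enum class' mirror; each value must match the proto exactly.
enum class DeviceType : int { CPU = 0, CUDA = 1 };

static_assert(static_cast<int>(DeviceType::CPU) == PROTO_CPU,
              "DeviceType::CPU must match the protobuf value");
static_assert(static_cast<int>(DeviceType::CUDA) == PROTO_CUDA,
              "DeviceType::CUDA must match the protobuf value");
```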

Reviewed By: Yangqing

Differential Revision: D9137156

fbshipit-source-id: 566385cd6efb1ed722b25e6f7849a910b50342ab
2018-08-03 19:25:06 -07:00
Lu Fang
63233f98ad Bump up opset version to 7 in Caffe2 ONNX exporter (#8854)
Summary:
Will bump up to opset 8 in another PR to match the current opset version.

Already tested by generating the models in the current model zoo.
Closes https://github.com/pytorch/pytorch/pull/8854

Reviewed By: ezyang

Differential Revision: D8666437

Pulled By: houseroad

fbshipit-source-id: feffdf704dd3136aa59c0f1ff1830c14d1bd20aa
2018-06-28 07:39:02 -07:00
Duc Ngo
f52c2ca1c6 net_async tracing use enable_profile arg from NetDef (#8927)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8927

Closes https://github.com/pytorch/pytorch/pull/8855

- Add parameter `enable_tracing` to the Arg field of NetDef. `net_async_tracing` will only enable Tracer for Net instances that have this field set (unless the command line argument also includes the net name).
- Append a unique id to the json profiling result file because there could be multiple instances of the same net running.
- Dump the json profiling file regularly instead of only when the Tracer object is destroyed
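
A sketch of reading such an arg, with simplified stand-ins for the generated protobuf classes:
```
#include <string>
#include <vector>

struct Argument {   // stand-in for the caffe2 Argument proto
  std::string name;
  long long i = 0;  // integer payload
};

struct NetDef {     // stand-in for the caffe2 NetDef proto
  std::vector<Argument> args;
};

bool TracingEnabled(const NetDef& net) {
  for (const Argument& arg : net.args) {
    if (arg.name == "enable_tracing") {
      return arg.i != 0;
    }
  }
  return false;  // default: tracing stays off unless requested
}
```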

Reviewed By: ilia-cher

Differential Revision: D8372378

fbshipit-source-id: 8adc9d59f48b67456beed2e3a88235c298fdfd01
2018-06-27 16:24:57 -07:00
Orion Reblitz-Richardson
edd4e2c5d1
Expose proto utils and ONNX (#8073)
* Expose proto utils and ONNX from PyTorch libcaffe2.so

* Try to use protobuf from _C.so

* Fix ONNX proto header include

* Adjust order of imports for ONNX until nanopb goes away

* Set and use ONNX_NAMESPACE for PyTorch builds

* Show protobuf summary for all builds

* Add ONNX_NAMESPACE for cpp_build

* Statically link libprotobuf.a into libtorch.so

* Set ONNX_NAMESPACE on Windows build

* Move core/dispatch up as well

* Add /MD flag for Windows build of _C

* Potential Windows fix for ONNX and protobuf

* Add direct linkage from _C to ONNX on Windows

* Only include protobuf wrapper for PyTorch

* Pass extra_compile_args to _nvrtc ext build

* Remove installation of .a files
2018-06-13 10:25:32 -07:00
bddppq
e5b997223c
[Caffe2] Enabling AMD GPU Backend for Caffe2 (#7955)
* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess; it has been fixed in the latest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Resolve merge conflicts

* .

* Update GetAsyncNetHIPThreadPool

* Enable BUILD_CAFFE2 in pytorch build

* Unify USE_HIP and USE_ROCM

* always check USE_ROCM

* .

* remove unrelated change

* move all core hip files to separate subdirectory

* .

* .

* recurse glob core directory

* .

* correct include

* .
2018-06-04 09:04:30 -07:00
bddppq
5e35fbfaa3
Post process onnx proto (#8064)
* Post processing onnx generated protobuf files to hide global symbols

* .

* .
2018-06-02 10:46:48 -07:00
bddppq
966c65859d Revert "[Caffe2] Enabling AMD GPU Backend for Caffe2" (#7802)
* Revert "[auto] Update onnx to 4898c9e - Added TensorDenotation and metadata_props for images (onnx/onnx#879) 4898c9e925"

This reverts commit 9c679dab5f.

* Revert "Add BiasCHW fallback for GPU (#7738)"

This reverts commit 14ad2e74f1.

* Revert "[Caffe2] Enabling AMD GPU Backend for Caffe2 (#7566)"

This reverts commit 2ebcf4bb37.
2018-05-23 17:58:47 -07:00
Peter Yeh
2ebcf4bb37 [Caffe2] Enabling AMD GPU Backend for Caffe2 (#7566)
* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess; it has been fixed in the latest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.
2018-05-23 15:13:09 -07:00
Jinghui
26ddefbda1 [feature request] [Caffe2] Enable MKLDNN support for inference (#6699)
* Add operators based-on IDEEP interfaces

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Enable IDEEP as a caffe2 device

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add test cases for IDEEP ops

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add IDEEP as a caffe2 submodule

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Skip test cases if no IDEEP support

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Correct cmake options for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add dependences on ideep libraries

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix issues in IDEEP conv ops, etc.

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Move ideep from caffe2/ideep to caffe2/contrib/ideep

Signed-off-by: Gu Jinghui <jinghui.gu@intel.com>

* Update IDEEP to fix cmake issue

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix cmake issue caused by USE_MKL option

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Correct comments in MKL cmake file

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-04-22 21:58:14 -07:00
Bram Wasti
a73b3fd1f0
[caffe2][opencl] Add OpenCL context (#6777) 2018-04-20 11:31:21 -07:00
Yinghai Lu
ef8f556212
[Caffe2] Changes done inside Facebook (#6378)
* fix unit test for sqrt op

From the error logging:

[idx, grad, grad_estimate] are:
[[ 146.            0.5           0.45776367]
 [ 147.            0.5           0.45776367]

The gradient == 0.5 is correct, which means the SqrtOp and its gradient are doing the right job. (Because y = sqrt(x), loss = y^2/2 = x/2, and then d(loss)/dx = 1/2 = 0.5.)

The test failed because of a numerical problem in grad_estimate (in the unit test). It can be because the step_size is small and float precision is not high (when there are multiple elements in the tensor, we do sum(y^2) to compute the loss)

This diff
- increase the step size, and also move the test cases further away from 0 (where sqrt(x) is not well defined) to be safe :)
- also clean up, and merge the test case for inplace Vs. non-inplace

Tested with:

`CAFFE2_HYPOTHESIS_PROFILE=debug ai_bt caffe2/caffe2/python/operator_test:elementwise_ops_test -- "test_sqrt"`

* CompositeReader & CompositeReaderBuilder

A new type of reader gluing multiple readers together.

* Back out "Revert D7394363: [GanH]: Log D Trick for Cross Entropy with Sigmoid"

Original commit changeset: 9325a4356dbe

* [dai][WIP] convert params to int8 on ps before sending to trainer

Add float->uint8 conversion in addition to float->fp16 conversion in model_saver.

* [easy] improve unit test for sparse length sum ops

as desc.

#accept2ship

* Update GitHub upstream to 771fcb3455

* move sparse hash unique ops to OOS and add unit tests

- move the SparseHash version to OSS, since 'sparsehash' is already a dep of caffe2 OSS: https://fburl.com/arssw4n1
- The 'SparseHash' engine is also being used in OSS, so the SparseHash version shall be in OSS to reduce confusion: https://fburl.com/o5ea7ah2

- fix the CUDA UniqueOp for the case when batch is empty.
- add unit test

* group_norm_op for caffe2

This is the cuda op for Group Normalization (GN): https://arxiv.org/abs/1803.08494

This code implements GN in one op that computes Y=gamma * (X-mu) / sigma + beta and also its gradients. It is expected to have minimal memory consumption (similar to the BN op), without creating new blobs if GN were implemented as several ops (e.g., reshape, norm_mean/std, affine_channel).
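
A minimal single-sample sketch of that computation; the (C, HxW) layout and names are assumptions, not the actual op:
```
#include <cmath>
#include <vector>

// X, Y: one sample laid out as (C, HxW); gamma/beta: per-channel scale/shift.
void GroupNormForward(const std::vector<float>& X, int C, int HxW, int G,
                      const std::vector<float>& gamma,
                      const std::vector<float>& beta,
                      float eps, std::vector<float>& Y) {
  const int D = C / G;             // channels per group
  const int group_size = D * HxW;  // elements per group
  Y.resize(X.size());
  for (int g = 0; g < G; ++g) {
    const float* x = X.data() + g * group_size;
    float mu = 0.f, var = 0.f;
    for (int i = 0; i < group_size; ++i) mu += x[i];
    mu /= group_size;
    for (int i = 0; i < group_size; ++i) var += (x[i] - mu) * (x[i] - mu);
    var /= group_size;
    const float rstd = 1.f / std::sqrt(var + eps);
    for (int d = 0; d < D; ++d) {
      const int c = g * D + d;
      for (int j = 0; j < HxW; ++j) {
        const int idx = c * HxW + j;
        Y[idx] = gamma[c] * (X[idx] - mu) * rstd + beta[c];  // Y = g*(X-mu)/sigma + b
      }
    }
  }
}
```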

* Resubmit D7405233: disappeared in D7464958

OSS publish caused the op to go missing -- however, the test was still there

* [c2] add sparse hash engine for cuda unique op

The SparseHash version of UniqueOp copies the input tensor to CPU, uses a sparse hash map to get the unique output, and then copies back to GPU.

* [dper][gpu] enable unit testing gpu trainer for sparse nn

to debug the GPU trainer using mock data in unit test.

This makes it easier to develop the GPU trainer for new models.

* Reuse Gloo context for Synchronize() calls

Previously we were creating (and leaking) the Gloo context on each call to Synchronize(). Now only run the common world op and create the barrier net once, then run the barrier net on each Synchronize() call. Since timeout is associated with the Gloo context, assert that the timeout is fixed instead of trying to handle the complexity of multiple timeouts (and associated contexts).

* [GanH/WGAN][1/n]: add FC param clipping

as titled

* [mobile] minimizing changes between caffe2_benchmark and speed_benchmark

* [GanH]: enable diagnose within model

avoid finding blob names but to directly enable inside the model

* Add `net_transformer_fun` option to DPM

This callback allows for various transformations to be made to the
model after gradient operators have been added. The immediate motivation for
this is to allow transformations such as "checkpoint-and-recompute", which
allow trading off memory for additional compute.

Adding several callbacks like this has made DPM's API less than ideal at this
stage. However, I could not find any reasonable alternative.

* [DT] [33/n] Compile flow task groups

task groups need to be compiled in order to pickle the object in fblearner. However, I also changed the Job's compile function, as creating a new object is not necessary.

* Initial commit for sparse_normalize vectorization and benchmark

* [GanH]: LB Calibration for JSD

as titled

* Tracing event in async executor

Adding event tracing through TRACE_EVENT macro in async executor

* [Resubmit] D7409751 Reseting book-keeping blobs when the reservoir is reset

D7409751 got lost in D7464958

* Visualizing realtime weights values

We want to visualize the weight values as the optimizer iterates. This diff supports visualizing the weights at an assigned index.
Currently, we assume the blob to be 2-dimensional.

* [GanH][Easy]: Fix Homotopy Weighting

apparently, there was a bug in the homotopy weight (alpha, beta) update

* [c2] move sparse hash unique op out of oss

so that OSS does not need to depend on google hash map.

* Get rid of std::round as it's not supported on Android

* Revert changes on setup.py

* Skip shaky test on Dataio

* fix
2018-04-10 21:11:43 -07:00
Orion Reblitz-Richardson
dbac044759 Add protobuf wrapper functions to proto_utils.
* These will be used when we statically link libprotobuf.a inside libcaffe2.so
2018-03-28 10:05:20 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Yangqing Jia
2d03ae2f85 Move ParseProtobufFromLargeString to proto_utils (#2354)
* Move ParseProtobufFromLargeString to proto_utils

* ParseProtobuf -> ParseProto to be consistent in naming
2018-03-21 17:05:14 -07:00
Yangqing Jia
611a89c4b6 Remove more protobuf APIs. (#2348)
* Wrap ShutdownProtobufLibrary

* Remove text_format.h header and only put the function in proto_utils.h

* ParseFromString returns bool
2018-03-21 10:29:45 -07:00