Commit Graph

1443 Commits

Lin Yang
cc5befc461 [Format] format a few files (#35187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35187

When I touch these files, lint always introduces some unintended change. To prevent that from happening, we need to format the code first.
The change was generated by:
  arc f

Test Plan: integration test.

Differential Revision: D20587596

fbshipit-source-id: 512cf6b86bd6632a61c80ed53e3a9e229feecc2a
2020-04-17 14:30:01 -07:00
Edward Yang
dd64e738c5 Expunge TensorId from all DispatchKey names. (#36240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36240

It's annoying, historical, and unnecessary (enum class is already
namespaced).  I did this codemod with:

```
git grep -l 'CPUTensorId' | xargs sed -i 's/CPUTensorId/CPU/g'
git grep -l 'CUDATensorId' | xargs sed -i 's/CUDATensorId/CUDA/g'
git grep -l 'VariableTensorId' | xargs sed -i 's/VariableTensorId/Autograd/g'
git grep -l 'HIPTensorId' | xargs sed -i 's/HIPTensorId/HIP/g'
git grep -l 'MSNPUTensorId' | xargs sed -i 's/MSNPUTensorId/MSNPU/g'
git grep -l 'XLATensorId' | xargs sed -i 's/XLATensorId/XLA/g'
git grep -l 'PrivateUse1_TensorId' | xargs sed -i 's/PrivateUse1_TensorId/PrivateUse1/g'
git grep -l 'PrivateUse2_TensorId' | xargs sed -i 's/PrivateUse2_TensorId/PrivateUse2/g'
git grep -l 'PrivateUse3_TensorId' | xargs sed -i 's/PrivateUse3_TensorId/PrivateUse3/g'
git grep -l 'AutocastTensorId' | xargs sed -i 's/AutocastTensorId/Autocast/g'
git grep -l '_PreAutogradTensorId' | xargs sed -i 's/_PreAutogradTensorId/_PreAutograd/g'
git grep -l 'TESTING_ONLY_GenericWrapperTensorId' | xargs sed -i 's/TESTING_ONLY_GenericWrapperTensorId/TESTING_ONLY_GenericWrapper/g'
git grep -l 'TESTING_ONLY_GenericModeTensorId' | xargs sed -i 's/TESTING_ONLY_GenericModeTensorId/TESTING_ONLY_GenericMode/g'
```

Then I did a git grep for remaining TensorId occurrences, and manually
killed those (mostly in codegen, and some docs that needed updating).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20929255

Pulled By: ezyang

fbshipit-source-id: dc371b6aa6e6ea7c0a5660137c14debde806a09d
2020-04-13 23:33:44 -07:00
Hao Lu
4d1ccafb4b [caffe2] Enable copying for caffe2::Tensor (#36468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36468

Since `caffe2::Tensor` is now refcounted, enabling copy constructor and the copy assignment operator should be fine.

Test Plan:
```
buck test mode/dev //caffe2/caffe2:caffe2_test_cpu -- TensorTest
```

AI/AF canaries with changes up to D20959214:

https://our.intern.facebook.com/intern/experiment_store/experiment/3298538636995/#commit1-commit2
https://our.intern.facebook.com/intern/experiment_store/experiment/2199027015376/#commit1-commit2

AI/AF canaries on this diff:
https://our.intern.facebook.com/intern/ads/canary/425960191574068914/
https://our.intern.facebook.com/intern/ads/canary/425960179835413033/

Reviewed By: yinghai

Differential Revision: D20985924

fbshipit-source-id: ead5f5ceff23d0adc06d598128de16a5533d767b
2020-04-13 21:41:52 -07:00
Tristan Rice
ce54f0d411 Back out "Revert D20449887: [dt][caffe2] enable using smart exceptions in async nets" (#36172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36172

Original commit changeset: 3d7801613f86

D20449887 broke some OSS tests as the OSS export sync wasn't working correctly.

Test Plan:
Manually export latest version to OSS to trigger the tests

+ test plan in D20449887

verified onnx tests are passing in https://github.com/pytorch/pytorch/pull/36172

Reviewed By: andrewwdye

Differential Revision: D20902279

fbshipit-source-id: bc30fcc9f5cc8076f69a5d92675fd27455948372
2020-04-13 11:31:52 -07:00
Tristan Rice
90c7db8ae3 caffe2/core/plan_executor: add cancellation of async nets on error + propagate exceptions via std::exception_ptr for stack traces (#31966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31966

This has three parts:
* When `--caffe2_handle_executor_threads_exceptions` is set and a parallel execution step throws an exception, execution can hang waiting for async nets to finish. This adds cancellation code to cancel any async nets.
* Exceptions returned from parallel workers are now passed as a std::exception_ptr so the stack trace can be recorded with folly::SmartExceptionTracer.
* Defines a Cancel method at the NetBase level to avoid pulling in the unsupported AsyncSchedulingNet for fbandroid.

Test Plan:
Added unit tests for plan_executor

  buck test //caffe2/caffe2:caffe2_test_cpu
  buck test //caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100

Reviewed By: boryiingsu

Differential Revision: D19320177

fbshipit-source-id: d9939fcea1317751fa3de4172dfae7f781b71b75
2020-04-09 14:38:18 -07:00
Nikita Shulga
0f34d648c8 Fix signed-unsigned warnings (RELAND) (#36224)
Summary:
This is a reland of https://github.com/pytorch/pytorch/pull/36196
Before the fix, bazel spews the following multi-line warning for every single caffe2 operator:
```
In file included from ./c10/util/logging_is_google_glog.h:50,
                 from ./c10/util/Logging.h:26,
                 from ./caffe2/core/logging.h:2,
                 from ./caffe2/core/blob.h:13,
                 from ./caffe2/core/operator.h:18,
                 from ./caffe2/sgd/adadelta_op.h:1,
                 from caffe2/sgd/adadelta_op.cc:1:
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h: In instantiation of 'std::string* google::Check_LTImpl(const T1&, const T2&, const char*) [with T1 = int; T2 = long unsigned int; std::string = std::__cxx11::basic_string<char>]':
./caffe2/core/operator.h:192:5:   required from 'const T& caffe2::OperatorBase::Input(int, caffe2::DeviceType) [with T = caffe2::Tensor; caffe2::DeviceType = c10::DeviceType]'
./caffe2/core/operator.h:890:48:   required from 'const caffe2::Tensor& caffe2::Operator<Context>::Input(int, caffe2::DeviceType) [with Context = caffe2::CPUContext; caffe2::DeviceType = c10::DeviceType]'
./caffe2/sgd/adadelta_op.h:87:5:   required from 'bool caffe2::SparseAdadeltaOp<Context>::RunOnDevice() [with Context = caffe2::CPUContext]'
./caffe2/sgd/adadelta_op.h:85:8:   required from here
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:32: warning: comparison of integer expressions of different signedness: 'const int' and 'const long unsigned int' [-Wsign-compare]
  722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
      |                                ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:148:53: note: in definition of macro 'GOOGLE_PREDICT_TRUE'
  148 | #define GOOGLE_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
      |                                                     ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:1: note: in expansion of macro 'DEFINE_CHECK_OP_IMPL'
  722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
      | ^~~~~~~~~~~~~~~~~~~~
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36224

Test Plan: CI

Differential Revision: D20919506

Pulled By: malfet

fbshipit-source-id: b8b4b7c62dcbc109b30165b19635a6ef30033e73
2020-04-08 16:29:27 -07:00
Akshay Bhandary
83abd7ffbf Revert D20909696: [pytorch][PR] Fix signed-unsigned warnings
Test Plan: revert-hammer

Differential Revision:
D20909696

Original commit changeset: 16723355f473

fbshipit-source-id: e1cf6e9d42f852693549a94d7f5830196781f00e
2020-04-08 01:21:04 -07:00
Nikita Shulga
25fe27981f Fix signed-unsigned warnings (#36196)
Summary:
Otherwise, bazel spews the following multi-line warning for every single caffe2 operator:
```
In file included from ./c10/util/logging_is_google_glog.h:50,
                 from ./c10/util/Logging.h:26,
                 from ./caffe2/core/logging.h:2,
                 from ./caffe2/core/blob.h:13,
                 from ./caffe2/core/operator.h:18,
                 from ./caffe2/sgd/adadelta_op.h:1,
                 from caffe2/sgd/adadelta_op.cc:1:
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h: In instantiation of 'std::string* google::Check_LTImpl(const T1&, const T2&, const char*) [with T1 = int; T2 = long unsigned int; std::string = std::__cxx11::basic_string<char>]':
./caffe2/core/operator.h:192:5:   required from 'const T& caffe2::OperatorBase::Input(int, caffe2::DeviceType) [with T = caffe2::Tensor; caffe2::DeviceType = c10::DeviceType]'
./caffe2/core/operator.h:890:48:   required from 'const caffe2::Tensor& caffe2::Operator<Context>::Input(int, caffe2::DeviceType) [with Context = caffe2::CPUContext; caffe2::DeviceType = c10::DeviceType]'
./caffe2/sgd/adadelta_op.h:87:5:   required from 'bool caffe2::SparseAdadeltaOp<Context>::RunOnDevice() [with Context = caffe2::CPUContext]'
./caffe2/sgd/adadelta_op.h:85:8:   required from here
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:32: warning: comparison of integer expressions of different signedness: 'const int' and 'const long unsigned int' [-Wsign-compare]
  722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
      |                                ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:148:53: note: in definition of macro 'GOOGLE_PREDICT_TRUE'
  148 | #define GOOGLE_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
      |                                                     ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:1: note: in expansion of macro 'DEFINE_CHECK_OP_IMPL'
  722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
      | ^~~~~~~~~~~~~~~~~~~~
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36196

Differential Revision: D20909696

Pulled By: malfet

fbshipit-source-id: 16723355f473379ba9da6d3c33bd561b9724800a
2020-04-07 21:31:01 -07:00
Edward Yang
459163b8eb Revert D20449887: [dt][caffe2] enable using smart exceptions in async nets
Test Plan: revert-hammer

Differential Revision:
D20449887

Original commit changeset: 047fdf1bd52f

fbshipit-source-id: 3d7801613f86885c204f3946f3a52a855516faa3
2020-04-06 19:37:05 -07:00
Tristan Rice
8ef82fc2c9 [dt][caffe2] enable using smart exceptions in async nets (#34753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34753

This improves support for exceptions and capturing stack traces in caffe2 async nets. We generally want to use exceptions everywhere we can in order to preserve stack information. It also makes the exception timestamp more accurate so multiple exceptions at the same time can be correctly ordered.

Test Plan: Updated the tests to use the new error semantics + adds a test to ensure the stack is correctly propagated through deferrable async scheduling.

Reviewed By: andrewwdye

Differential Revision: D20449887

fbshipit-source-id: 047fdf1bd52fd7c7c1f3fde77df9a27ed9e288e7
2020-04-06 14:27:07 -07:00
Tristan Rice
676fc929b7 [caffe2] fix type and shape inference for common gradient ops (#35857)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35857

This fixes a lot of common ops for InferBlobShapesAndTypes as well as adds support for testing the inferred shapes and types of gradient ops.

Ops:
* Concat
* Split
* LeakyReLU
* Relu
* Prelu
* Gelu
* Elu
* Sinh, Tanh, Cosh
* Abs
* ... and a number of other simple element wise ops

Test Plan:
Added support to hypothesis test to check the shape and type of gradient ops.

Enabled it for all the ops I fixed the shape and type inference for.

  buck test caffe2/caffe2/python/operator_test:

Reviewed By: pradeepd24

Differential Revision: D20806284

fbshipit-source-id: 77f796d9ff208e09e871bdbadf9a0a7c196b77f2
2020-04-02 11:17:04 -07:00
Nikita Shulga
16774f7353 Increase TimerTest tolerance to 20% on Windows (#35818)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35818

Test Plan: CI

Differential Revision: D20798424

Pulled By: malfet

fbshipit-source-id: 57e8d9c6b93903a6632168a4a35bf946d8c518aa
2020-04-01 14:29:05 -07:00
Edward Yang
3f3b96b1f8 Revert D20735881: [pytorch][PR] [WIP] [reland][pytorch][PR] Fix some incorrect annotation…
Test Plan: revert-hammer

Differential Revision:
D20735881

Original commit changeset: d21e940380f0

fbshipit-source-id: fb50a099320bfac92c9b8e1ca12cdc50d302342f
2020-03-30 12:28:27 -07:00
peter
e7a37823b0 [WIP] [reland][pytorch][PR] Fix some incorrect annotation… (#35588)
Summary:
…s found by clang-cl" (truncated title; in full: "Fix some incorrect annotations found by clang-cl")

This reverts commit a9b540d109.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35588

Differential Revision: D20735881

Pulled By: ezyang

fbshipit-source-id: d21e940380f0c1b9b9b84e9cc892985fd3ad0ac3
2020-03-30 11:42:19 -07:00
Nikita Shulga
a9b540d109 Revert D20670031: [pytorch][PR] Fix some incorrect annotations found by clang-cl
Test Plan: revert-hammer

Differential Revision:
D20670031

Original commit changeset: cd8018dee703

fbshipit-source-id: 6900bf46346f0f415812607e5eff67259fc7b478
2020-03-27 18:26:01 -07:00
peter
45c9ed825a Formatting cmake (to lowercase without space for if/elseif/else/endif) (#35521)
Summary:
Running commands:
```bash
shopt -s globstar

sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i caffe2/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i torch/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i c10/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i cmake/**/*.cmake
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i cmake/**/*.cmake.in
```
We may further convert all the commands into lowercase according to the following issue: 77543bde41.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35521

Differential Revision: D20704382

Pulled By: malfet

fbshipit-source-id: 42186b9b1660c34428ab7ceb8d3f7a0ced5d2e80
2020-03-27 14:25:17 -07:00
peter
0c16cedafe Fix some incorrect annotations found by clang-cl (#35364)
Summary:
Fixes incorrect usages of symbol annotations including:
1. Exporting or importing a function/class in an anonymous namespace.
2. Exporting or importing a function/class implementation in a header file. However, by removing the symbol annotations, they are now local symbols. If they need to be remain global, I can move the implementations to the source file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35364

Differential Revision: D20670031

Pulled By: ezyang

fbshipit-source-id: cd8018dee703e2424482c27fe9608e040d8105b8
2020-03-27 10:40:04 -07:00
Linbin Yu
93065ff767 [1] add missing header for C10_EXPORT_CAFFE2_OP_TO_C10 (#35245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35245

Add the missing header file for the C10_EXPORT_CAFFE2_OP_TO_C10_CPU macro.

(Note: this ignores all push blocking failures!)

Test Plan: buck build -c caffe2.expose_op_to_c10=1 //xplat/caffe2:mask_rcnn_opsAndroid

Reviewed By: dreiss

Differential Revision: D20528761

fbshipit-source-id: 7cd186ba72964c2e193aca994f87a91a71c3c5d7
2020-03-24 22:16:03 -07:00
Nikita Shulga
6f737dd4a3 Fix signed-unsigned warnings (#34791)
Summary:
And a few typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34791

Test Plan: CI

Differential Revision: D20524879

Pulled By: malfet

fbshipit-source-id: 58fa03bd6356979e77cd1bffb6370d41a177c409
2020-03-19 00:29:56 -07:00
Nikita Shulga
a3de359464 Do not throw from CUDAContext destructor (#34756)
Summary:
Throwing from a destructor leads to undefined behavior (most often a segfault),
so it's better to leak memory than to segfault.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34756

Test Plan: Run `test_pytorch_onnx_caffe2`

Differential Revision: D20504228

Pulled By: malfet

fbshipit-source-id: 7a05776fea9036f602e95b8182f8493cb5886dab
2020-03-18 00:13:18 -07:00
Nikita Shulga
e70c28856f [Caffe2] Move more method implementations from tensor.h to tensor.cc (#34811)
Summary:
To speed up compilation time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34811

Test Plan: CI

Differential Revision: D20476992

Pulled By: malfet

fbshipit-source-id: 922cde93783fbfc04854851d7a05a635d5239792
2020-03-16 22:15:18 -07:00
Nikita Shulga
ef78fa8668 caffe2::OperatorBase do not need to be aware of at::Tensor functions (#34810)
Summary:
Replacing <ATen/core/Tensor.h> with <ATen/core/TensorBody.h> speeds up compilation of caffe2 operators by 15%.
For example, it reduces pool_op.cu compilation from 18.8s to 16s
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34810

Test Plan: CI

Differential Revision: D20472230

Pulled By: malfet

fbshipit-source-id: e1b261cc24ff577f09e2d5f6428be2063c6d4a8b
2020-03-16 12:58:05 -07:00
Linbin Yu
2fe7fc681d [PT] add macro to expose caffe2 ops to PyTorch mobile (#34578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34578

Right now C10_EXPORT_CAFFE2_OP_TO_C10_CPU doesn't work on mobile since we disabled some code paths. This diff adds a new macro to enable these code paths so we can register caffe2 ops in PT mobile.

Test Plan:
verified caffe2 ops are registered in PT mobile
(on the whole stack)

```
_caffe2::BBoxConcatBatchSplits(Tensor[] input_list, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor output)
_caffe2::BBoxTransform(Tensor rois, Tensor deltas, Tensor im_info, float[] weights, bool apply_scale, bool rotated, bool angle_bound_on, int angle_bound_lo, int angle_bound_hi, float clip_angle_thresh, bool legacy_plus_one, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor output_0, Tensor output_1)
_caffe2::BoxWithNMSLimit(Tensor scores, Tensor boxes, Tensor batch_splits, float score_thresh, float nms, int detections_per_im, bool soft_nms_enabled, str soft_nms_method, float soft_nms_sigma, float soft_nms_min_score_thres, bool rotated, bool cls_agnostic_bbox_reg, bool input_boxes_include_bg_cls, bool output_classes_include_bg_cls, bool legacy_plus_one, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor scores, Tensor boxes, Tensor classes, Tensor batch_splits, Tensor keeps, Tensor keeps_size)
_caffe2::GenerateProposals(Tensor scores, Tensor bbox_deltas, Tensor im_info, Tensor anchors, float spatial_scale, int pre_nms_topN, int post_nms_topN, float nms_thresh, float min_size, bool angle_bound_on, int angle_bound_lo, int angle_bound_hi, float clip_angle_thresh, bool legacy_plus_one, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor output_0, Tensor output_1)
_caffe2::HeatmapMaxKeypoint(Tensor heatmaps, Tensor bboxes_in, bool should_output_softmax=True, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor keypoints)
_caffe2::ResizeNearest(Tensor X, str order, float width_scale, float height_scale, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor Y)
_caffe2::RoIAlign(Tensor features, Tensor rois, str order, float spatial_scale, int pooled_h, int pooled_w, int sampling_ratio, bool aligned, Tensor[]? _caffe2_preallocated_outputs=None) -> (Tensor)
```

Reviewed By: dreiss

Differential Revision: D20128254

fbshipit-source-id: 49a837dddc431eb528b5c72ffdfe0d0131cd10b4
2020-03-11 19:15:14 -07:00
Rohith Menon
879a90b322 [ModelLoading] Use byte encoding for uint8, fp16 etc. instead of int32 (#34343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34343

Use byte encoding for uint8, fp16 etc. instead of int32 in TensorProto serialization/deserialization

tl;dr
- fp16 tensor deserialization 12x faster, serialized size 25% lower
- uint8 tensor deserialization 36x faster, serialized size 25% lower

Test Plan:
```
============================================================================
caffe2/caffe2/fb/predictor/ModelLoaderBenchmark.cpprelative  time/iter  iters/s
============================================================================
BlobProtoInt32DeserializationFloat16                        12.37ms    80.82
BlobProtoByteDeserializationFloat16             1125.46%     1.10ms   909.64
----------------------------------------------------------------------------
BlobProtoInt32DeserializationUInt8                          17.57ms    56.92
BlobProtoByteDeserializationUInt8               3629.45%   484.02us    2.07K
============================================================================
```

Reviewed By: yinghai

Differential Revision: D20137451

fbshipit-source-id: 8ed4be2286a6d4c7e134fcb0832f22bc645039a1
2020-03-06 11:58:30 -08:00
Artem Volkhin
75d29f8d3e Allow converting IValue to vector<string> (#34269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34269

follow up for https://github.com/pytorch/pytorch/pull/16519

Test Plan: unit tests

Reviewed By: houseroad

Differential Revision: D20261495

fbshipit-source-id: 947f3cbd469d9258ec2dbb36cb68efe15a3b19eb
2020-03-05 12:31:23 -08:00
Michael Ranieri
1702152ef9 fixup unit tests (#34105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34105

Make parallel_net_test.cc chronos-conforming.
Exclude gtest asserts that check thrown exceptions when exceptions are disabled.

Test Plan: CI green

Differential Revision: D20153525

fbshipit-source-id: 7371e559da948f46773fed09e3a23a77411d59e0
2020-03-03 10:33:21 -08:00
cyy
8a14b41617 fix warnings reported by PVS (#33868)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33868

Differential Revision: D20169059

Pulled By: ailzhang

fbshipit-source-id: ec12226ae27ddd89fa5bacdd35151981ebfedcfd
2020-03-02 18:51:38 -08:00
Michael Ranieri
b874c039f6 Allow checking for cached module before asserting (#33954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33954

Fixes caffe2/core/module_test.cc on Windows.
Misc lint fixes.

Test Plan: CI green

Reviewed By: malfet

Differential Revision: D20153512

fbshipit-source-id: aeae84a028e26edd65c7218611e3c49a8d9bb8c0
2020-03-02 15:43:50 -08:00
Michael Ranieri
9239608037 fix windows clang attributes (#33959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33959

Make sure clang on Windows uses the correct attributes.
Add support for cl.exe-style pragma attributes.

Test Plan: CI green

Differential Revision: D20153548

fbshipit-source-id: bfbfd374e8f5e7d7b8598453c3ca2b6693a425f1
2020-03-02 13:20:51 -08:00
Igor Sugak
5dde8cd483 [caffe2] fix no matching function min/max Clang errors (#33563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33563

When NVCC or Clang are driving CUDA compilation many math functions are declared by default, with a small difference: Clang marks them as `__device__` only, while NVCC uses both `__host__` and `__device__`. This makes every un-elaborated `min` or `max` function call from a `__host__` function generate a syntax error when Clang is used.

Fix the errors by using `std::min` and `std::max` from `<algorithm>`, since C++14 they are `constexpr` and can be used in the `__device__` code [1].

1. https://llvm.org/docs/CompileCudaWithLLVM.html#algorithm

Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```

Reviewed By: ngimel

Differential Revision: D20005795

fbshipit-source-id: 98a3f35e8a96c15d3ad3d2066396591f5cca1696
2020-02-28 11:33:24 -08:00
Michael Suo
dbe850af5b [jit] do the code reorg (#33851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851

Rationale and context described in #33828.

Script to reproduce the move:
https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9
ghstack-source-id: 99079645

Test Plan: Make sure CI passes

Reviewed By: jamesr66a

Differential Revision: D20133869

fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e
2020-02-27 13:02:51 -08:00
Yinghai Lu
a2f3c6c26f Call RandomNumberSeed() on-demand (#33539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33539

We rarely use the `random_seed_` in context but we always initialize it with `RandomNumberSeed()` which isn't trivial. This diff makes it that we only call `RandomNumberSeed()` once when we want to use `random_seed_`.

Test Plan:
unittests.

Canaries:
AF: https://our.intern.facebook.com/intern/ads/canary/424753437441438410
AI: https://our.intern.facebook.com/intern/ads/canary/424753467414318838
Prospector: https://our.intern.facebook.com/intern/ads/canary/424753976999968569

Reviewed By: ipiszy

Differential Revision: D19993190

fbshipit-source-id: 1d2606bd65476ff3b519c69f9cbfa3b80f75cdff
2020-02-22 01:22:18 -08:00
Dmytro Dzhulgakov
e10aa6b72f Fix flaky DagNetTest unittest
Summary: The first run of the net is noisy sometimes - just run it twice.

Reviewed By: cheshen1

Differential Revision: D20039274

fbshipit-source-id: 639e65646bf52f3efe1ecd4bbcd0e413d9389b29
2020-02-21 16:08:04 -08:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
Zachary DeVito
7e3c438913 Renaming IValue List functions (#32093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32093

toGenericListRef -> toListRef
isGenericList -> isList
toGenericList -> toList
toXListRef -> toXVector

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D19369767

Pulled By: zdevito

fbshipit-source-id: 4f0078f95b83e6586524c03f7bcf206722fdd9ae
2020-01-17 15:17:45 -08:00
Yanghan Wang
9b6ec61bfd exposing CPU/GPU Copy ops (#32248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32248

expose CPU/GPU copy ops

Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:torch_integration_test

Reviewed By: houseroad

Differential Revision: D19405856

fbshipit-source-id: 1df4aa202e26647cb81e9fe7e4478e594a5f7f3e
2020-01-17 12:40:43 -08:00
Yinghai Lu
df514fd8c0 C++ C2/Glow operator unittest
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32258

Test Plan:
```
 buck test glow/fb/test/numerics:fp16_op_test
```

Reviewed By: bddppq

Differential Revision: D19401786

fbshipit-source-id: 1382b5208be6172d3e6f768dedad7ebec31cffc9
2020-01-17 12:13:34 -08:00
Pavel Belevich
62b06b9fae Rename TensorTypeId to DispatchKey (#32154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32154

TensorTypeId -> DispatchKey
	c10/core/TensorTypeId.h -> c10/core/DispatchKey.h
	c10/core/TensorTypeId.cpp -> c10/core/DispatchKey.cpp
	TensorTypeId::* -> DispatchKey::*
	TensorTypeId type_id -> DispatchKey dispatch_key
		type_id -> dispatch_key
	TensorTypeId::NumTensorIds -> DispatchKey::NumDispatchKeys
	RealTensorTypeId -> RealDispatchKey
TensorTypeSet -> DispatchKeySet
	TensorTypeIds -> DispatchKeys
	c10/core/TensorTypeSet.h -> c10/core/DispatchKeySet.h
	c10/core/TensorTypeSet.cpp -> c10/core/DispatchKeySet.cpp
	type_set() -> key_set()
	type_set_ -> key_set_
	typeSet -> keySet
ExcludeTensorTypeIdGuard -> ExcludeDispatchKeyGuard
IncludeTensorTypeIdGuard -> IncludeDispatchKeyGuard
LocalTensorTypeSet -> LocalDispatchKeySet
	c10/core/impl/LocalTensorTypeSet.h -> c10/core/impl/LocalDispatchKeySet.h
	c10/core/impl/LocalTensorTypeSet.cpp -> c10/core/impl/LocalDispatchKeySet.cpp
	tls_local_tensor_type_set -> tls_local_dispatch_key_set
	tls_is_tensor_type_id_excluded -> tls_is_dispatch_key_excluded
	tls_set_tensor_type_id_excluded -> tls_set_dispatch_key_excluded
	tls_is_tensor_type_id_included -> tls_is_dispatch_key_included
	tls_set_tensor_type_id_included -> tls_set_dispatch_key_included
MultiDispatchTensorTypeSet -> MultiDispatchKeySet
	multi_dispatch_tensor_type_set -> multi_dispatch_key_set
tensorTypeIdToBackend -> dispatchKeyToBackend
backendToTensorTypeId -> backendToDispatchKey
initForTensorTypeSet -> initForDispatchKeySet
inferred_type_set -> inferred_key_set
computeTensorTypeId -> computeDispatchKey
PODLocalTensorTypeSet raw_local_tensor_type_set -> PODLocalDispatchKeySet raw_local_dispatch_key_set
get_default_tensor_type_id -> get_default_dispatch_key
inferred_type_id -> inferred_dispatch_key
actual_type_id -> actual_dispatch_key
typeSetToDispatchKey_ -> dispatchKeySetToDispatchKey_
get_type_id() -> get_dispatch_key()
legacyExtractTypeId -> legacyExtractDispatchKey
extractTypeId -> extractDispatchKey

Test Plan: Imported from OSS

Differential Revision: D19398900

Pulled By: pbelevich

fbshipit-source-id: 234ad19f93d33e00201b61e153b740a339035776
2020-01-15 11:16:08 -08:00
Zachary DeVito
14593f077f remove list specialization from ivalue (#30734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30734

What are specialized lists?

The IValues that hold List[int], List[Tensor], and List[AnythingElse] are different C++ types.
e.g. List[int] has a std::vector<int> while List[AnythingElse] holds a std::vector<IValue>.

Why do we have specialized lists?

When we first created the JIT we needed to bind the ATen C++ API which has std::vector<int>,
std::vector<Tensor> as inputs. The easiest way to match this API was to make our IValues contain
these same types. Conversion was just unwrapping the IValue, very easy and cheap.

What is the problem with specialized lists?

We end up with significant special cases through the compiler. Other types like Dict are not
specialized. So in the Pickler, for instance, there is a single piece of logic to handle
their serialization. For Lists, we end up with multiple cases. Furthermore, it doesn't
match Python, leading to problems along translation boundaries. Our pickle serialization
is slightly different than python, so it is harder to load objects from our IValue serialization
as Python values.

They also make it harder to provide an easy-to-use user API. We'd like to match pybind11 for C++
bindings to TorchScript. This would entail having a single torch::List class (untemplated)
that can be used to construct inputs. This is made much harder if the underlying ivalue needs
to be different depending on the type inside the list. The ideal case would be to have a constructor like

```
template<typename T>
List(std::vector<T> foo);
```

It would then set up the type tags correctly based on type T, without the need for passing tags.

Do specialized lists improve perf?

Not in a way we have been able to measure. Our major concern initially was having to translate
a std::vector<IValue> to std::vector<int> to call ATen functions. This was especially a concern
for aten::_convolution which takes a number of mostly-constant lists of integers. However,
when we measure the effect of actually having to do this conversion for an aten::_convolution,
it does not take measurable time (benchmark results below).
This is true even if you use a trivial convolution (e.g. 1x1x1), and comment out the actual convolution code.

What are the issues removing them?

This PR removes list specialization but keeps the serialization format and IValue APIs almost exactly
the same. The only visible change is that toTensorListRef and family have turned into toTensorVector
because they now return, by value, a copy of the list as a vector.

Further PRs can then clean up the complexity issues that arose from specialization. This will likely
involve removing the isTensorList/isIntList functions and refactoring the code that used them to
work generically. At some point we will also change serialization to no longer write specialized
lists in the pickle binary. This is forward incompatible, so will go in its own PR.

Benchmark:
```
import torch

import torch.nn as nn
import torch.nn.functional as F
import time

class MnistNet(nn.Module):
    def __init__(self):
        super(MnistNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 1, kernel_size=1)
        self.conv2 = nn.Conv2d(1, 1, kernel_size=1)

    def forward(self, x):
        for i in range(10):
            x = F.relu(self.conv1(x))
            x = F.relu(self.conv2(x))
        return x

model = MnistNet()
x = torch.rand(1, 1, 1, 1)
r = torch.jit.trace(model, x)
r(x)
r(x)
r(x)
r(x)
print(torch.jit.last_executed_optimized_graph())

while True:
    b = time.time()
    for i in range(100):
        r(x)
    e = time.time()
    print(e - b)
```

Results (no observable difference):

```
Before (actual conv)
0.13251137733459473
0.13260436058044434
0.13276338577270508
0.1327497959136963
0.13250041007995605
0.13270330429077148
0.13290190696716309
0.13265132904052734
0.13274288177490234
0.1326758861541748
0.13253355026245117
0.13254785537719727
0.13260746002197266
0.13285017013549805
0.13264012336730957
0.132490873336792
0.13280034065246582
0.13243484497070312
0.1325232982635498
0.1326127052307129
0.13264131546020508
0.13274383544921875
0.13298296928405762
0.1326909065246582
-------------------
After (actual conv)
0.13127517700195312
0.13150334358215332
0.13092470169067383
0.13102364540100098
0.13134360313415527
0.13155555725097656
0.13314104080200195
0.13151955604553223
0.13160037994384766
0.1315293312072754
0.13137340545654297
0.13148093223571777
0.131455659866333
0.1327371597290039
0.13134026527404785
0.13152337074279785
0.13151192665100098
0.13165974617004395
0.13403725624084473
0.13251852989196777
0.13135504722595215
0.1315624713897705
0.1317615509033203
0.1314380168914795
0.13157200813293457
--------------------

The following replaces the convolution operator with a no-op, to show
that even if the conv op were made faster, we still would not see
a difference:

Before (fake conv)
0.0069539546966552734
0.0069522857666015625
0.007120847702026367
0.007344722747802734
0.007689952850341797
0.007932662963867188
0.00761723518371582
0.007501363754272461
0.007532835006713867
0.007141828536987305
0.007174253463745117
0.007114410400390625
0.007071495056152344
------------------
After (fake conv)
0.007458209991455078
0.007337093353271484
0.007268190383911133
0.007313251495361328
0.007306575775146484
0.007468700408935547
0.0073091983795166016
0.007308483123779297
0.007538318634033203
0.007356882095336914
0.007464170455932617
0.007372140884399414
```

Test Plan: Imported from OSS

Differential Revision: D18814702

Pulled By: zdevito

fbshipit-source-id: 0371c73b63068fdc12f24b801371ea90f23531a6
2020-01-12 18:28:25 -08:00
Hong Xu
daf00beaba Remove duplicated Numa detection code. (#30628)
Summary:
cmake/Dependencies.cmake (1111a6b810/cmake/Dependencies.cmake (L595-L609)) already detects Numa. Duplicated
detection and duplicated variables may lead to incorrect results.

Closes https://github.com/pytorch/pytorch/issues/29968
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30628

Differential Revision: D18782479

Pulled By: ezyang

fbshipit-source-id: f74441f03367f11af8fa59b92d656c6fa070fbd0
2020-01-03 08:48:46 -08:00
peterjc123
c4121ed8db Fix is_fundamental template for MSVC (#30959)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/30932
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30959

Differential Revision: D18891797

Pulled By: mingbowan

fbshipit-source-id: e6c36ee80065e66117873e768f86f507c48aaef1
2019-12-19 12:10:22 -08:00
Tristan Rice
b0bd35ff13 caffe2/event: allow multiple errors such as when cancelled (#31335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31335

When an error occurs in a net, we end up cancelling all the async ops. If one error occurs, it's highly likely that other errors will occur as well.

Typically we see:
1. SendOp failed due to a network error
2. async scheduling cancels all other ops via `SetFinished("Cancelled");`
3. Another SendOp fails due to a network error and crashes the process when the exception is thrown.

This changes caffe2 ops to allow failing twice.
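A toy sketch of that behavior change (Python, with invented names; the real code is the C++ caffe2 event/op machinery): an op that has already finished with one error absorbs a second error instead of throwing.

```python
# Model of an async op result that tolerates repeated failure reports,
# matching the SendOp scenario above: first "Cancelled", then a network
# error arriving later must not crash the process.

class Event:
    def __init__(self):
        self.status = "running"
        self.error = None

    def set_finished(self, error=None):
        if self.status != "running":
            # Previously a second SetFinished raised; now it is a no-op.
            return
        self.status = "failed" if error else "done"
        self.error = error

e = Event()
e.set_finished("Cancelled")        # async scheduling cancels the op
e.set_finished("network error")    # a later second failure is absorbed
```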

Test Plan: buck test //caffe2/caffe2:caffe2_test_cpu

Reviewed By: andrewwdye

Differential Revision: D19106548

fbshipit-source-id: 4b7882258a240894cc16d061a563c83a3214d3d9
2019-12-18 13:10:57 -08:00
Sebastian Messmer
643ca5def2 Replace c10::guts::stuff with std::stuff (#30915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30915

Since we now have C++14, we don't need these c10::guts helpers anymore
ghstack-source-id: 95777609

Test Plan: waitforsandcastle

Differential Revision: D18869639

fbshipit-source-id: 97716f932297c64c6e814410ac47b444c33d4e2e
2019-12-16 13:57:19 -08:00
Sebastian Messmer
409151e1bb Use [[noreturn]] instead of C10_NORETURN or CAFFE_NORETURN (#30917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30917

This is a C++14 feature, we can use this now.
ghstack-source-id: 95255753

Test Plan: waitforsandcastle

Differential Revision: D18869637

fbshipit-source-id: dd02036b9faeaffa64b2d2d305725443054da31b
2019-12-15 23:54:16 -08:00
Richard Zou
9047d4df45 Remove all remaining usages of BUILD_NAMEDTENSOR (#31116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31116

Changelist:
- remove BUILD_NAMEDTENSOR macro
- remove torch._C._BUILD_NAMEDTENSOR
- remove all python behavior that relies on torch._C._BUILD_NAMEDTENSOR

Future:
- In the next diff, I will remove all usages of
ATen/core/EnableNamedTensor.h since that header doesn't do anything
anymore
- After that, we'll be done with the BUILD_NAMEDTENSOR removal.

Test Plan: - run CI

Differential Revision: D18934951

Pulled By: zou3519

fbshipit-source-id: 0a0df0f1f0470d0a01c495579333a2835aac9f5d
2019-12-12 09:53:03 -08:00
Shunting Zhang
7f5f2e8871 add ZERO_COLLISION_HASH to caffe2 data type (#30912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30912

Add a new data type ZERO_COLLISION_HASH .

Test Plan: ci

Reviewed By: boryiingsu

Differential Revision: D18843626

fbshipit-source-id: b2d8280f13c78b4a656cf95822198df59de7b64c
2019-12-10 21:36:24 -08:00
Edward Yang
38986e1dea Split libtorch.so back into libtorch_{cpu,cuda,hip} (#30315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30315

The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
This is a reland of https://github.com/pytorch/pytorch/pull/29731 but
I've extracted all of the prep work into separate PRs which can be
landed before this one.

Some things of note:

* torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
* The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
* In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
* A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly
* I had to make torch_cpu/torch_cuda caffe2_interface_library targets so that they get whole-archive linked into torch when you statically link. And I had to do this in an *exported* fashion because torch needs to depend on torch_cpu_library. In the end I exported everything and removed the redefinition in the Caffe2Config.cmake. However, I am not too sure why the old code did it this way in the first place; it doesn't seem to have broken anything to switch it this way.
* There are some uses of `__HIP_PLATFORM_HCC__` still in `torch_cpu` code, so I had to apply it to that library too (UGH). This manifests as a failure when trying to run the CUDA fuser. This doesn't really matter substantively right now because we still in-place HIPify, but it would be good to fix eventually. This was a bit difficult to debug because of an unrelated HIP bug, see https://github.com/ROCm-Developer-Tools/HIP/issues/1706

Fixes #27215 (as our libraries are smaller), and executes on
part of the plan in #29235.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18790941

Pulled By: ezyang

fbshipit-source-id: 01296f6089d3de5e8365251b490c51e694f2d6c7
2019-12-04 08:04:57 -08:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
Sebastian Messmer
aa2862b843 Hide the OperatorKernel* argument from the stack based kernel API (#29337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29337

This argument is needed by boxing wrappers so they're able to get a pointer to the corresponding unboxed kernel and call into it.
But if a kernel is registered in a boxed way, we don't need it and should hide this from the API.
This is especially needed for the backend fallback API where users would only be left wondering why this argument is there and what it does.
Also, hiding it allows us to potentially totally remove it in a future refactoring if we find some way to do so.
ghstack-source-id: 94481316

Test Plan: unit tests

Differential Revision: D18361991

fbshipit-source-id: 5cef26c896fe3f2a5db730d3bc79dcd62e7ef492
2019-11-23 15:25:01 -08:00
Sebastian Messmer
583c288232 Add a OperatorHandle argument to boxed kernels (#29201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29201

This is required for boxed backend fallback kernels (e.g. lazy, AMP) because they need to know which op was actually called.
ghstack-source-id: 94481313

Test Plan: I will add unit tests in a diff stacked on top

Differential Revision: D18282746

fbshipit-source-id: 339a1bbabd6aff31a587b98f095c75104dfc6f99
2019-11-23 15:24:49 -08:00
Junjie Bai
352731bd6e Revert D18632773: Split libtorch.so back into libtorch_{cpu,cuda,hip}
Test Plan: revert-hammer

Differential Revision:
D18632773

Original commit changeset: ea717c81e0d7

fbshipit-source-id: 18601439f9f81c9f389020e5a0e4e04adb21772d
2019-11-21 15:01:09 -08:00
Edward Yang
ec30d9028a Split libtorch.so back into libtorch_{cpu,cuda,hip} (#29731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29731

The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.

Some subtleties about the patch:
- There were a few functions that crossed CPU-CUDA boundary without API macros. I just added them, easy enough. An inverse situation was aten/src/THC/THCTensorRandom.cu where we weren't supposed to put API macros directly in a cpp file.
- DispatchStub wasn't getting all of its symbols related to static members on DispatchStub exported properly. I tried a few fixes but in the end I just moved everyone off using DispatchStub to dispatch CUDA/HIP (so they just use normal dispatch for those cases.) Additionally, there were some mistakes where people incorrectly were failing to actually import the declaration of the dispatch stub, so added includes for those cases.
- torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
- The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
- In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
- A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly. This situation also happens with custom C++ extensions.
- There's a ROCm compiler bug where extern "C" on functions is not respected. There's a little workaround to handle this.
- Because I was too lazy to check if HIPify was converting TORCH_CUDA_API into TORCH_HIP_API, I just made it so HIP build also triggers the TORCH_CUDA_API macro. Eventually, we should translate and keep the nature of TORCH_CUDA_API constant in all cases.

Fixes #27215 (as our libraries are smaller), and executes on
part of the plan in #29235.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18632773

Pulled By: ezyang

fbshipit-source-id: ea717c81e0d7554ede1dc404108603455a81da82
2019-11-21 11:27:33 -08:00
Edward Yang
65bb34d885 Remove TensorImpl::is_variable, deprecate Tensor::is_variable (#29653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29653

I didn't remove is_variable from Tensor for BC reasons, but I did
remove as many uses as I could from the codebase.
at::impl::variable_excluded_from_dispatch got moved to TensorBody.h
so that it's more widely accessible.

This diff is NOT semantics preserving.  Here are the major differences:

- In a number of native operator implementations, we tested that arguments
  are not variable.  I replaced these with asserts that variable is
  excluded from dispatch.  I actually don't think these asserts are really
  necessary now (they should certainly be true, but it's hard to get
  it wrong), but I've kept them for old time's sake.  At least, they'll detect
  if you call these functions before you've processed variable (indicating
  a bug in your kernel.)

- There are a number of places where we do a per-tensor test for being a
  variable, for better error reporting when someone commits Tensor/Variable
  confusion.  Although these tests are substantively the same as the
  tests above, in these cases I decided to *delete* the test entirely.
  The reasoning is that in these cases, we didn't really care about
  dispatch (also, see above; I'm not too sure we really need the dispatch
  asserts), we cared about Tensor/Variable confusion.  Since Tensor/Variable
  confusion is impossible now, we don't need the tests.  One of the key
  factors which pushed me one way or another was whether or not a function
  was doing per-tensor validation; if I kept the assert in such functions,
  I'd repeatedly access the TLS.  Even if we want to bring back the asserts,
  they would have to go somewhere else.

  Another similar idiom is the number of places we do !x.defined() ||
  x.is_variable(); I treated this equivalently.

- nuclear_norm's computation of compute_uv is a bit weird, but I think
  it's OK to just delete the is_variable case (I *suspect* that it is
  always the case that self.is_variable(), but it doesn't really matter.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18496168

Pulled By: ezyang

fbshipit-source-id: 5a1ded931e0c10a6b758ba64a8380d34110e0c3e
2019-11-14 11:41:02 -08:00
Christy Lee
b8dca04f73 Add error message if CUDA startup fails (#29670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29670

This is the entry point for loading CUDA code; improve the error message to prompt users to check that GPU code is included.

Test Plan: Build without gpu code.  Run the binary.  Check that the new error message exists.

Reviewed By: yfeldblum

Differential Revision: D18453798

fbshipit-source-id: 63d9ec50acdf57ef4baf3f7d99c836c56bc1435e
2019-11-13 16:48:40 -08:00
Junjie Bai
b0c245d52d Consolidate the places that find pybind11 include dirs (#29659)
Summary:
Also move the logic that installs the pybind11 headers from setup.py to cmake (to align with other headers).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29659

Differential Revision: D18458208

Pulled By: bddppq

fbshipit-source-id: cfd1e74b892d4a65591626ab321780c8c87b810d
2019-11-12 14:51:56 -08:00
Yavuz Yetim
2704af0970 AsyncIf op implementation
Summary:
This diff adds the following:
- An AsyncIf op to support conditional async execution. This op assumes that then_net and else_net are async scheduling nets. The op itself completes when every async op in the active net completes. Cancellation cancels the inner nets and their async ops.
- Unit tests targeting asynchronicity and error/cancellation handling.
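The described control flow can be sketched with Python's asyncio (illustrative only; the real op is a C++ caffe2 operator, and every name below is invented): pick the active branch, run its async ops, complete when they all complete, and let cancellation propagate into the branch.

```python
# asyncio model of AsyncIf: conditional async execution where the op
# finishes only when every async op in the chosen net has finished.
import asyncio

async def async_if(condition, then_ops, else_ops):
    ops = then_ops if condition else else_ops
    # gather() completes when all branch ops complete, and cancelling
    # the gather cancels the inner ops, mirroring net cancellation.
    await asyncio.gather(*(op() for op in ops))

async def main():
    ran = []
    async def op_a():
        ran.append("a")
    async def op_b():
        ran.append("b")
    await async_if(True, [op_a], [op_b])
    return ran

result = asyncio.run(main())
assert result == ["a"]
```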

Test Plan:
New unit tests

With --stress-runs=2000:
https://our.intern.facebook.com/intern/testinfra/testrun/4785074616784325

Reviewed By: ilia-cher

Differential Revision: D18051357

fbshipit-source-id: 1399a437b3ca63fd4ea0cf08d173f85b9242cc1f
2019-11-07 08:51:31 -08:00
Ilia Cherniavskii
7190789f58 Handling of failing and terminal async cpu ops (#29052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29052

Make sure we handle the case of multiple, async, terminal (no children)
and failing cpu ops.

Test Plan: AsyncIf tests

Reviewed By: yyetim

Differential Revision: D18276401

Pulled By: ilia-cher

fbshipit-source-id: 35b175dd025bc7e392056ac1331b159376a29e60
2019-11-04 12:01:21 -08:00
Yinghai Lu
c60bf2704a Support Offline Tensors through ONNXIFI layer
Summary:
Previous import was b2ec1a8041879b7be98d81387a14cae895f952f4

Included changes:
- **[97fe555](https://github.com/houseroad/foxi/commit/97fe555)**: Add deferred weight reader pointer when initializing the graph (#15) <Yinghai Lu>
- **[ba2faf7](https://github.com/houseroad/foxi/commit/ba2faf7)**: Add status and timeout to events (#14) <Jack Montgomery>

Test Plan: kicksandcastle

Reviewed By: ipiszy

Differential Revision: D18231697

fbshipit-source-id: 7566e2438d2b57f0feaadcd51f55a03552adeab9
2019-10-31 10:33:42 -07:00
Sebastian Messmer
bb0e46b65a Remove preallocation of type ids (#28024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28024

We preallocated type ids to align them with ScalarType. At that point, the maximum type id was 10 and we used 11 to specify undefined type id.
However, since then, ScalarType got more additions, 11 isn't undefined anymore, and numbers 11-15 have meaning.
caffe2::TypeIdentifier also got its separate additions, 12 and upwards have meaning that differs from ScalarType.

I'm going with the (CI-tested) assumption that caffe2::TypeIdentifier and ScalarType actually don't need to be aligned
and remove the functionality for preallocated type ids. This simplifies our type ids.
ghstack-source-id: 92051872

Test Plan: unit tests

Differential Revision: D17936165

fbshipit-source-id: 2c9df2b9b3f35b3e319641c96638321ac3433d5c
2019-10-16 23:08:11 -07:00
Sebastian Messmer
d9de2e0ba9 Back out "Revert D17936166: [wip] Constexpr type ids" (#28155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28155

Original commit changeset: 92c63a96dedd
ghstack-source-id: 92051874

Test Plan: unit tests

Differential Revision: D17964410

fbshipit-source-id: 1d989d28b3e1de6d43c915f122f2b65a77a332eb
2019-10-16 18:24:04 -07:00
Lu Fang
1819fade35 Revert D17936166: [wip] Constexpr type ids
Test Plan: revert-hammer

Differential Revision:
D17936166

Original commit changeset: 68cfa926c721

fbshipit-source-id: 92c63a96dedd8764e342c6437c6ea308d93d29b2
2019-10-16 06:47:10 -07:00
Sebastian Messmer
9cc4405dc9 Constexpr type ids (#28023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28023

ghstack-source-id: 91987335

Test Plan: waitforsandcastle

Differential Revision: D17936166

fbshipit-source-id: 68cfa926c721e5fbc96e083eb47e784bf34a9df4
2019-10-15 21:21:20 -07:00
Sebastian Messmer
ef8bcfe2c7 Revert D17488861: constexpr type ids
Test Plan: revert-hammer

Differential Revision:
D17488861

Original commit changeset: ce7b059d7c86

fbshipit-source-id: 426fca9abe7122190fc17ac6976bc6bcbd5718df
2019-10-15 09:59:21 -07:00
Sebastian Messmer
1865f31efa Revert D17490109: Remove preallocation of type ids
Test Plan: revert-hammer

Differential Revision:
D17490109

Original commit changeset: 800c340d9d35

fbshipit-source-id: a3e39bbce53c828fe553379d9f2b66dc8a07c982
2019-10-15 09:59:17 -07:00
Sebastian Messmer
cf01f53b5a Remove preallocation of type ids (#26509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26509

We preallocated type ids to align them with ScalarType. At that point, the maximum type id was 10 and we used 11 to specify undefined type id, see https://github.com/pytorch/pytorch/pull/10139.
However, since then, ScalarType got more additions, 11 isn't undefined anymore, and numbers 11-15 have meaning.
caffe2::TypeIdentifier also got its separate additions, 12 and upwards have meaning that differs from ScalarType.

I'm going with the (CI-tested) assumption that caffe2::TypeIdentifier and ScalarType actually don't need to be aligned
and remove the functionality for preallocated type ids. This simplifies our type ids.
ghstack-source-id: 91896918

Test Plan: unit tests

Differential Revision: D17490109

fbshipit-source-id: 800c340d9d3556a99f6e3ffc33af14ad68d7cc59
2019-10-15 08:47:13 -07:00
Sebastian Messmer
6f865c1e37 constexpr type ids (#26502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26502

Create type ids at compile time instead of incrementing a counter at runtime. This is done by computing a compile-time crc64 of the type name. We couldn't do this before, because we still used GCC4 and that compiler didn't support the use of `__PRETTY_FUNCTION__` in a constexpr context. However, since GCC5 this is possible and we can use this trick.

This does not change the semantics of preallocated type ids. I actually think we don't need to preallocate anymore, but I split the removal of preallocation into a separate diff to be able to test it separately.
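The key property can be illustrated in Python (the commit itself uses a compile-time crc64 in C++ over `__PRETTY_FUNCTION__`; FNV-1a is substituted here purely for brevity): the id is derived deterministically from the type name, so it no longer depends on runtime registration order.

```python
# Hash a type name to a stable 64-bit id, standing in for the
# compile-time crc64 the commit describes. Illustrative only.

def type_id(type_name: str) -> int:
    h = 0xCBF29CE484222325                      # FNV-1a 64-bit offset basis
    for byte in type_name.encode():
        h = ((h ^ byte) * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h

# Deterministic: the id depends only on the name, not on the order in
# which types happen to be registered at startup.
assert type_id("caffe2::Tensor") == type_id("caffe2::Tensor")
assert type_id("caffe2::Tensor") != type_id("int")
```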

ghstack-source-id: 91896920

Test Plan: unit tests

Differential Revision: D17488861

fbshipit-source-id: ce7b059d7c8686b69cb091a4a8beaf4b96391343
2019-10-15 08:47:09 -07:00
Edward Yang
0b6186d778 Remove Tensor.h, TensorMethods.h from src/core. (#27086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27086

This is a major source of merge conflicts, and AFAICT isn't necessary anymore (it may have been necessary for some mobile build stuff in the past).

This is a commandeer of #25031

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D17687345

Pulled By: ezyang

fbshipit-source-id: bf6131af835ed1f9e3c10699c81d4454a240445f
2019-10-06 09:37:50 -07:00
Hao Lu
1f0328c6d4 Add randomFill to test_utils.h
Summary: Add helper function randomFill to test_utils.h so we can use it in benchmark scripts as well as tests.

Test Plan:
```
buck run mode/opt //tvm/sparse:cblas_bench
```

Reviewed By: yinghai

Differential Revision: D17759193

fbshipit-source-id: e4909b04e83ca9382ab4718855fb63743d028de1
2019-10-04 18:29:22 -07:00
Sebastian Messmer
ed207b53ab c10::KernelFunction (#26337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26337

- Factor out boxing and unboxing functionality from the c10 dispatcher into a c10::KernelFunction class
- Move that class and everything else it depends on into ATen/core/boxing
- This also allows us to get rid of c10::KernelCache. Instead, we now store a pointer to the unboxed functor in c10::KernelFunction.
- We're also getting rid of the DispatchTableEntry struct and instead store KernelFunction directly.
- To make this work, we need to change the dispatcher calling API from Dispatcher::lookup().callBoxed/callUnboxed and OperatorEntry::lookup().callBoxed/callUnboxed to Dispatcher::callBoxed/callUnboxed and OperatorEntry::callBoxed/callUnboxed.

ghstack-source-id: 90459911

Test Plan: unit tests

Differential Revision: D17416607

fbshipit-source-id: fd221f1d70eb3f1b4d33092eaa7e37d25684c934
2019-09-20 18:55:25 -07:00
Ansha Yu
e44ea6cd5e tvm operator dynolog (#26295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26295

Log the following in scuba caffe2_tvm_operator_stats:
1. everything in caffe2_operator_stats
2. fallback netdef
3. tvm module graph_json
4. whether compilation triggered this round
5. number of compilations stored in tvm_runtime_map
6. (not yet logged) last compilation time if any
7. (not yet logged) total bytes occupied by compilation
8. whether this compilation is fallback
9. batch size as observed by tvm op

Test Plan:
```
buck run mode/dbg //tvm/sparse:tvm_bbpredictor_benchmark -- --init_net ~/tmp/ads/84480054_204/init_net.pb --input_init_net ~/tmp/ads/84480054_204/input_init_net.pb --pred_net ~/tmp/ads/84480054_204/pred_net.pb --warmup 1000 --iter 1000 --num_cycles 5 --caffe2_logging_operator_dyno_sampling_rate=1 --vmodule=Logger=2
```

Logs show up in the scuba:
https://our.intern.facebook.com/intern/scuba/query/?dataset=caffe2_tvm_operator_stats

https://fburl.com/scuba/lq2h22e4

Auto submitted adindexer canary:
https://our.intern.facebook.com/intern/ads/canary/421064436039494716
Additional adindexer canary:
https://our.intern.facebook.com/intern/ads/canary/421082681202831286/
Additional adfinder canary:
https://our.intern.facebook.com/intern/ads/canary/421082685084831037/

Reviewed By: yinghai

Differential Revision: D17358412

fbshipit-source-id: d2119c12ddeaa86217c163e32fb1e211952139f5
2019-09-18 18:37:17 -07:00
Andrey Malevich
28d3eb8156 Back out "Back out "[Caffe2] Fix device_option propagation"" (#25908)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25908

Original commit changeset: f6e961e88c01

device_option propagation is completely broken in Caffe2 for cases when pass-through operators are used. As an example, the Gather operator doesn't have a gradient and passes through its inputs, which results in incorrect detection of the components for sparse parameter aggregation (the component will be empty instead of the real device).
This diff is trying to fix this issue.

The original diff had a problem: Caffe2 was not handling cases when a device option is present but contains only metadata (for example, one for auto-generated reduction ops in the backward pass). This diff addresses that issue by merging device options during the backward pass.

Test Plan:
1. net_transform is finally working with Gather + FloatToHalf transformed model instead of failing because of incorrect number of components.
2. New unit-test.
3. Verify that previously broken benchmark is now passing

ezyang do you have suggestions what else I should test?

Reviewed By: ezyang

Differential Revision: D17281528

fbshipit-source-id: 4a1bc386f29f6a34fbf8008effde9d4890abebfa
2019-09-17 04:01:36 -07:00
Sebastian Messmer
0e30e6570d Call aten ops through c10 dispatcher (#23668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23668

- The eager mode frontend now calls operators that are defined in native_functions.yaml with `use_c10_dispatcher: True` through the c10 dispatcher, and no longer through globalATenDispatch().
- These operators aren't registered with globalATenDispatch anymore, only on c10 now.
- Backend extensions calling globalATenDispatch().registerOp() to add their own kernels still work; this function will forward the registration to the c10 dispatcher for them.

ghstack-source-id: 90130455

Test Plan: benchmarks at https://docs.google.com/document/d/1gpzKZcFf1JJameY1vKxF7Cloul9s6D8HKIK2_Pp1hFo/edit#

Differential Revision: D16603133

fbshipit-source-id: 991f17b355e9c78c5e86fee4fa381df7ab98ac82
2019-09-15 01:18:07 -07:00
Jiakai Liu
67c530851c get rid of protobuf dependencies (#25650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25650

This PR removes protobuf dependencies from mobile build altogether:
- caffe2/proto: protobuf files, including caffe2.proto and torch.proto;
- caffe2 components that depend on caffe2.proto, including most part of
caffe2/core, caffe2/utils;
- libprotobuf / libprotobuf-lite dependencies;
- protobuf compiler;
- some utils class, e.g.: netdef_converter.cpp;
- introduce a macro to disable third_party/onnx which depends on protobuf;

Test Plan:
- builds;
- link with demo app to make sure it can load and run a model in pickle format;

Differential Revision: D17183548

Pulled By: ljk53

fbshipit-source-id: fe60b48674f29c4a9b58fd1cf8ece44191491531
2019-09-06 08:48:20 -07:00
Jiakai Liu
a3d0abf729 move GetDimFromOrderString to caffe2/core/types.h (#25671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25671

To decouple string_utils.h from types.h and protobuf headers.
Logically GetDimFromOrderString seems to be more similar to
StringToStorageOrder compared to other string_utils functions.

Test Plan: - Will check all internal/external CI jobs.

Reviewed By: yinghai

Differential Revision: D17191912

Pulled By: ljk53

fbshipit-source-id: fe555feef27bfd74c92b6297c12fb668252ca9ff
2019-09-05 04:32:04 -07:00
Sebastian Messmer
791347642b Allow TensorMethods.h to include Dispatcher.h (alternative) (#23888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23888

This is an alternative to https://github.com/pytorch/pytorch/pull/23684.

Instead of splitting a bunch of headers into declaration and definition, we change tensor includes to only include the tensor declaration when the tensor definition isn't needed.
ghstack-source-id: 89357687

Test Plan: waitforsandcastle

Differential Revision: D16673569

fbshipit-source-id: fa1d92809b05de7910a8c2dc2f55abe071ca63bf
2019-09-04 01:35:19 -07:00
iotamudelta
4fe857187c switch to rocThrust for thrust/cub APIs (#25620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25620

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25602

Enable rocThrust with hipCUB and rocPRIM for ROCm. They are the ROCm implementations of the thrust and cub APIs and replace the older hip-thrust and cub-hip packages going forward. ROCm 2.5 is the first release to contain the new packages as an option, as of 2.6 they will be the only available option.

Add hipification rules to correctly hipify thrust::cuda to thrust::hip and cub:: to hipcub:: going forward. Add hipification rules to hipify specific cub headers to the general hipcub header.

Infrastructure work to correctly find, include and link against the new packages. Add the macro definition to choose the HIP backend to Thrust.

Since include chains are now a little different from CUDA's Thrust, add includes for functionality used where applicable.

Skip four tests that fail with the new rocThrust for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21864

Reviewed By: xw285cornell

Differential Revision: D16940768

Pulled By: bddppq

fbshipit-source-id: 3dba8a8f1763dd23d89eb0dd26d1db109973dbe5
2019-09-03 22:16:30 -07:00
Yinghai Lu
4edf77b6c0 Fuse to individual operators to GatherFuse8BitRowwiseQuantFloatMulLengthElim (#25519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25519

Fuse Gather-Fused8BitRowwiseQuantizedToFloat-Mul-LengthsSum opportunistically.
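A pure-Python sketch of the unfused pipeline's semantics (the helper functions below are illustrative stand-ins, not the actual Caffe2 kernels); the fused op computes the same result in one pass:

```python
def gather(rows, indices):
    # Gather: select quantized rows by index.
    return [rows[i] for i in indices]

def fused_8bit_rowwise_to_float(qrows):
    # Dequantize: each quantized row carries its own (values, scale, bias).
    return [[v * scale + bias for v in vals] for (vals, scale, bias) in qrows]

def mul(rows, weights):
    # Mul: scale each row by a per-row weight.
    return [[v * w for v in row] for row, w in zip(rows, weights)]

def lengths_sum(rows, lengths):
    # LengthsSum: sum consecutive segments of rows; segment sizes in `lengths`.
    out, pos = [], 0
    for n in lengths:
        segment = rows[pos:pos + n]
        out.append([sum(col) for col in zip(*segment)])
        pos += n
    return out

qrows = [([0, 2], 1.0, 0.0), ([1, 1], 2.0, 1.0)]  # (uint8 values, scale, bias)
result = lengths_sum(
    mul(fused_8bit_rowwise_to_float(gather(qrows, [1, 0, 1])),
        [1.0, 0.5, 2.0]),
    lengths=[2, 1])
print(result)  # [[3.0, 4.0], [6.0, 6.0]]
```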

Test Plan:
```
buck test caffe2/caffe2/opt/custom:concat_elim_test
```

Reviewed By: dreamingleo

Differential Revision: D17125045

fbshipit-source-id: 8ee50410eb13a82e1e5c8180f392fce2fe9cd728
2019-09-03 19:08:49 -07:00
Edward Yang
58a0dee749 Replace open registration TensorTypeId with closed enum. (#25252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25252

Our model going forward for extensions will be that you will have to
get an allocation of an ID in our system.  This is how things work
in practice today; we're just simplifying our underlying registration
since there is no need to have distributed registration.

There are some codemods in this diff:

```
codemod --extensions cpp,h,cc,cuh,py,in --exclude-paths=c10/core/TensorTypeId.h '([A-Za-z]+?)TensorId\(\)' 'TensorTypeId::\1TensorId'
codemod --extensions cpp,h,cc,cuh,py,in 'TensorTypeIds::undefined\(\)' 'TensorTypeId::UndefinedTensorId'
codemod --extensions cpp 'TensorType1\(\)' 'TensorTypeId::CPUTensorId'
codemod --extensions cpp 'TensorType2\(\)' 'TensorTypeId::CUDATensorId'
codemod --extensions cpp 'TensorType3\(\)' 'TensorTypeId::XLATensorId'
codemod --extensions cpp 'TensorType1' 'CPUTensorId'
codemod --extensions cpp 'TensorType2' 'CUDATensorId'
codemod --extensions cpp 'TensorType3' 'XLATensorId'
```

The main hand-written changes are in c10/core/TensorTypeId.h

Other manual fixes:

- aten/src/ATen/core/op_registration/op_registration.cpp - stop using
  std::string operator+
- aten/src/ATen/function_wrapper.py - handle a hardcoded TypeId() that
  wasn't caught by codemod
- torch/csrc/tensor/python_tensor.h - fix now incorrect forward declaration
  of TensorTypeId
- aten/src/ATen/core/op_registration/ - remove out-of-line registration

Differential Revision: D17072001

Test Plan: ossci and sandcastle

Pulled By: ezyang

fbshipit-source-id: c641515fd0604c045c54fbb1d6b1b950f45e89d1
2019-08-29 08:55:58 -07:00
Lucian Grijincu
9c9f14029d Revert D16929363: Revert D16048264: Add static dispatch mode to reduce mobile code size
Differential Revision:
D16929363

Original commit changeset: 69d302929e18

fbshipit-source-id: add36a6047e4574788eb127c40f6166edeca705f
2019-08-20 17:08:31 -07:00
Lucian Grijincu
bd6cf5099b Revert D16048264: Add static dispatch mode to reduce mobile code size
Differential Revision:
D16048264

Original commit changeset: ad1e50951273

fbshipit-source-id: 69d302929e183e2da26b64dcc24c69c3b7de186b
2019-08-20 16:26:18 -07:00
Roy Li
6824c9018d Add static dispatch mode to reduce mobile code size
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22335

Test Plan: Imported from OSS

Differential Revision: D16048264

Pulled By: li-roy

fbshipit-source-id: ad1e50951273962a51bac7c25c3d2e5a588a730e
2019-08-20 12:21:32 -07:00
Rui Zhu
5b0de85868 Register FC/Conv DNNLowp separately for supporting both tensor type (#24361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24361

Currently we only support Conv in the kernel, but the entry point for both tensor types goes through one shared class.
It is time to make this change.

Reviewed By: csummersea

Differential Revision: D16604713

fbshipit-source-id: b98d39a2c7960707cd50ba27e43dce73f741eeeb
2019-08-14 17:15:42 -07:00
Edward Yang
5ae909b443 Revert D15920763: Move TensorOptions to ATen/core
Differential Revision:
D15920763

Original commit changeset: c3429973180a

fbshipit-source-id: 0efb27722b371e1047f02240f071bc222b52e51d
2019-08-13 12:07:18 -07:00
Zachary DeVito
4a754dc3e3 cleanup warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24133

Test Plan: Imported from OSS

Differential Revision: D16746249

Pulled By: zdevito

fbshipit-source-id: 051f048b03043d6947544cd02ae44288bd439ef9
2019-08-12 16:12:30 -07:00
Richard Zou
bde73860c6 Move TensorOptions to ATen/core (#22020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22020
ghimport-source-id: 62766d49658ee84b8076c555432b50e13d104bc6

Test Plan: Imported from OSS

Differential Revision: D15920763

Pulled By: zou3519

fbshipit-source-id: c3429973180a65606da82face5c0ee377035e716
2019-08-12 07:41:12 -07:00
Supriya Rao
9223fa1c46 Add support to serialize qtensor in JIT. (#23356)
Summary:
Adds qtensor specific fields to the proto file so that they get serialized into the model.json

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23356
ghstack-source-id: 87263428

Differential Revision: D16473237

fbshipit-source-id: bf5b51d0863d036d30a1644a3c3b74516468224b
2019-07-26 15:52:15 -07:00
Yinghai Lu
b964bdb53a Fbgemm fp16 tensor support (#23101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23101

Support for
- Shape inference
- Tensor info extraction

Reviewed By: zrphercule

Differential Revision: D16345251

fbshipit-source-id: 53ef674b5b1581e6267e6d2070e34355280dae79
2019-07-19 17:08:03 -07:00
Yinghai Lu
2a8d5a132c Fix workspace destruction ordering (#23096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23096

Nets can have state that depends on the rest of the state in the Workspace. Hence, they should be destructed first.

Reviewed By: ajyu

Differential Revision: D16382987

fbshipit-source-id: 3fd030ba206e2d0e897abb9e31c95bdaeb9482b7
2019-07-19 16:49:50 -07:00
Will Feng
3a12520844 Pass Variable into Caffe2 ops, by requiring that the Variable doesn't require grad (#22473)
Summary:
As part of the Variable/Tensor merge, we want to be able to pass Variables into Caffe2 without doing an extra shallow copy, to improve performance and also to allow in-place mutations in Caffe2 ops. There are a few approaches outlined in https://github.com/pytorch/pytorch/pull/22418, and this PR is the chosen approach.

Specifically, we can assume that we won't be connecting autograd to C2 gradients at any point (as it's too tricky and not that useful). Therefore, we can pass Variables into Caffe2 ops by requiring that all Variables in Caffe2 don't require grad. For code paths in Caffe2 that might potentially track gradients (e.g. `ScriptModuleOp` and `call_caffe2_op_from_c10`), we use `torch::NoGradGuard` to make sure gradients are not tracked.
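The guard pattern can be sketched in Python (illustrative only; the real `torch::NoGradGuard` is a C++ RAII guard over dispatcher state):

```python
import threading

_grad_state = threading.local()

def grad_enabled():
    # Gradient tracking defaults to on for each thread.
    return getattr(_grad_state, "enabled", True)

class NoGradGuard:
    """Disable gradient tracking for the enclosing scope, then restore."""

    def __enter__(self):
        self._prev = grad_enabled()
        _grad_state.enabled = False
        return self

    def __exit__(self, *exc):
        _grad_state.enabled = self._prev
        return False


with NoGradGuard():
    # Any op executed here would skip gradient tracking.
    assert not grad_enabled()
assert grad_enabled()
```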

This supersedes https://github.com/pytorch/pytorch/pull/22418.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22473

Differential Revision: D16099042

Pulled By: yf225

fbshipit-source-id: 57efc3c7cfb3048d9abe90e63759acc14ebd2972
2019-07-08 11:31:10 -07:00
Jongsoo Park
9db7bc8bc7 fix uninitialized variable warning (#22477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22477

There is actually no use of an uninitialized variable, but some compilers are not smart enough to reason that the two if branches are always taken together.

Reviewed By: hx89

Differential Revision: D16100211

fbshipit-source-id: 25f01d668063603d7aaa776451afe8a10415d2ea
2019-07-06 00:36:45 -07:00
Sebastian Messmer
ed60d9fcf9 List/Dict remember and check their element type (#22005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22005

When a Dict or List is created with type information, it will remember that.
If, at any point later, this list is instantiated as a List&lt;T&gt; with a concrete type, it will assert that T is the correct type.
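The check described above can be sketched in Python (a hypothetical `TypedList`, purely illustrative of the behavior, not the c10::List API):

```python
class TypedList:
    """A list that remembers its element type and checks elements against it."""

    def __init__(self, elem_type, items=()):
        self.elem_type = elem_type
        self._items = []
        for item in items:
            self.append(item)

    def append(self, item):
        # Reject elements whose type does not match the remembered type.
        if not isinstance(item, self.elem_type):
            raise TypeError(
                f"expected {self.elem_type.__name__}, got {type(item).__name__}")
        self._items.append(item)

    def instantiate_as(self, elem_type):
        # Later instantiation to a concrete List<T> asserts T is correct.
        assert elem_type is self.elem_type, "element type mismatch"
        return self


lst = TypedList(int, [1, 2, 3])
lst.instantiate_as(int)   # fine: T matches the remembered type
```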

Differential Revision: D15914462

fbshipit-source-id: a8c3d91cb6d28d0c1ac0b57a4c4c6ac137153ff7
2019-07-05 15:17:51 -07:00
Sebastian Messmer
e68dc899d1 Fix compiler warnings (#22162)
Summary:
Fix various compiler warnings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22162

Differential Revision: D16085339

Pulled By: smessmer

fbshipit-source-id: d36a4b334315f1a5942cac46443a7d166ca36d0d
2019-07-02 14:12:55 -07:00
Hong Xu
693871ded3 Rename macros and build options NAMEDTENSOR_ENABLED to BUILD_NAMEDTENSOR (#22360)
Summary:
Currently the build system accepts USE_NAMEDTENSOR from the environment
variable and turns it into NAMEDTENSOR_ENABLED when passing to CMake.
This discrepancy does not seem necessary and complicates the build
system. The naming of this build option is also semantically incorrect
("BUILD_" vis-a-vis "USE_").  This commit eradicate this issue before it
is made into a stable release.

The support of NO_NAMEDTENSOR is also removed, since PyTorch has been
quite inconsistent about "NO_*" build options.

 ---

Note: All environment variables with their names starting with `BUILD_` are currently automatically passed to CMake with no need of an additional wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22360

Differential Revision: D16074509

Pulled By: zou3519

fbshipit-source-id: dc316287e26192118f3c99b945454bc50535b2ae
2019-07-02 11:46:13 -07:00
Haixin Liu
869ce89474 use feenableexcept when glibc is available (#22241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22241

Pull Request resolved: https://github.com/pytorch/pytorch/pull/20387

glibc has a non-standard function, feenableexcept, that triggers the floating-point exception handler. Compared to feclearexcept + fetestexcept, this approach lets us see precisely where the exception is raised from the stack trace.

Reviewed By: jspark1105

Differential Revision: D15301095

fbshipit-source-id: 94f6e72456b2280f78d7d01c2ee069ae46d609bb
2019-07-02 10:49:55 -07:00
Andrew Naguib
3cba9e8aaa Error Message Paraphrasing (#22369)
Summary:
Saying `I` in an error message is too subjective to be used in a framework.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22369

Differential Revision: D16067712

Pulled By: soumith

fbshipit-source-id: 2a390646bd5b15674c99f65e3c460a7272f508b6
2019-06-30 00:13:02 -07:00
Vitaly Fedyunin
516c7e4456 Adding memory_format to empty and empty_like operators (#20558)
Summary:
Original RFC https://github.com/pytorch/pytorch/issues/19092

To ensure that we are not introducing a BC-breaking change, empty_like returns a contiguous tensor by default.

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nCwh = torch.empty_like(nhwC)
new_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```

Now we need a way to preserve memory format in `empty_like`

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nhwC, memory_format=torch.preserve_format)
new_nhwC.is_contiguous(memory_format=torch.channels_last) == True

like_nCwh = torch.empty_like(nCwh, memory_format=torch.preserve_format)
like_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```

Usage of `torch.preserve_format` allows us to avoid `if` constructs.

We can also generate different memory format outputs

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nCwh, memory_format=torch.channels_last)
new_nhwC.is_contiguous(memory_format=torch.channels_last) == True

new_nCwh = torch.empty_like(nhwC, memory_format=torch.contiguous_format)
new_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20558

Differential Revision: D15502474

Pulled By: VitalyFedyunin

fbshipit-source-id: 2e120d57eefad6fb8e04b8322c79871392f64331
2019-06-26 11:48:27 -07:00
Sebastian Messmer
de85abf226 Allow default construction of Dict/List (#22084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22084

For DictPtr/ListPtr, default construction was disallowed because it was ambiguous whether it was supposed to create an empty list or a nullptr.
But since we renamed them to Dict/List, we can now allow default construction without ambiguity.

Differential Revision: D15948098

fbshipit-source-id: 942a9235b51608d1870ee4a2f2f0a5d0d45ec6e6
2019-06-25 17:40:48 -07:00
Sebastian Messmer
e425789286 Fix "missing return statement" warning (#22216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22216

-

Differential Revision: D15989670

fbshipit-source-id: d0534a3bf1eef29657738e271d35503a2f75a043
2019-06-25 16:57:42 -07:00
Ilia Cherniavskii
7b1d6c8912 Update intra_inter_benchmark (#22051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22051
ghimport-source-id: 70710b3866b1a5e21656b77d2695ada74d00254e

Test Plan:
PARALLEL_BACKEND=NATIVE_TBB USE_OPENMP=0 USE_TBB=1 MKL_SEQ=1
MKLDNN_THREADING=SEQ USE_CUDA=0 BLAS=MKL USE_MKLDNN=1 BUILD_BINARY=1
python setup.py develop --cmake

./build/bin/intra_inter_benchmark

Imported from OSS

Differential Revision: D15933951

Pulled By: ilia-cher

fbshipit-source-id: 88ad8f7a1634c1612ffaa68f22721ffc73d9b2ba
2019-06-21 23:06:27 -07:00
Sebastian Messmer
275087383b ListPtr->List DictPtr->Dict step 2 (#21937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21937

This changes call sites to use the new naming scheme

Reviewed By: zdevito

Differential Revision: D15892404

fbshipit-source-id: 8d32aa90a0ead1066688166478f299fde9c2c133
2019-06-19 18:02:05 -07:00
Sebastian Messmer
44128e09f0 Speed up op lookup and registration (#21806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21806

Dispatcher::findSchema(op_name) now uses a lookup table instead of iterating through the list of operators to find it.

This speeds up op lookup (as in finding the operator handle from the name, not as in finding a kernel when you already have the operator handle)
and it also speeds up op registration, since registration needs to check whether an op with the same name already exists.
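A sketch of the change (hypothetical names, not the actual Dispatcher code): the old path scanned the operator list; the new path keys a hash map by operator name, which makes both lookup and the duplicate check during registration cheap:

```python
class OperatorTable:
    """Name-keyed operator registry; lookup and duplicate checks are O(1)."""

    def __init__(self):
        self._by_name = {}

    def register(self, name, schema):
        # Registration needs to know whether the name already exists.
        if name in self._by_name:
            raise RuntimeError(f"operator {name} is already registered")
        self._by_name[name] = schema

    def find_schema(self, name):
        # Previously: a linear scan over all registered operators.
        return self._by_name.get(name)


table = OperatorTable()
table.register("aten::add", "add(Tensor, Tensor) -> Tensor")
print(table.find_schema("aten::add"))
```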

Differential Revision: D15834256

fbshipit-source-id: c3639d7b567e4ed5e3627c3ebfd01b7d08b55ac1
2019-06-19 12:05:14 -07:00
Will Feng
04f09d4235 Move unwrap logic from c10 to caffe2 (#21620)
Summary:
After https://github.com/pytorch/pytorch/pull/17072, we are allowed to pass Variables into ATen ops, thus there is no need to unwrap input variables in the c10 call path.

Note that since Caffe2 still expects inputs to be pure Tensors, we moved the unwrapping logic to the Caffe2 wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21620

Differential Revision: D15763560

Pulled By: yf225

fbshipit-source-id: 5375f0e51eb320f380ae599ebf98e6b259f0bff8
2019-06-14 22:02:43 -07:00
Sherman Wong
adc99efb46 Add batch id to tracer event (#21446)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21446

This is used for easier tracing of the iteration id when looking at the trace diagram.

Reviewed By: ilia-cher

Differential Revision: D15628950

fbshipit-source-id: ee75b3bdb14a36abc18c7bddc49d8ec9789b724d
2019-06-13 17:13:42 -07:00
Sungmann Cho
f59581218f Fix spelling errors (#21665)
Summary:
alloctor -> allocator
excutable -> executable
excution -> execution
foward -> forward
initiaize -> initialize
paralell -> parallel
preprocesor -> preprocessor
tranpose -> transpose
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21665

Differential Revision: D15806155

Pulled By: soumith

fbshipit-source-id: d92b21ec8650a2b32f05faf9af0b7d2b073e992c
2019-06-13 15:21:55 -07:00
Karl Ostmo
49481d576d Torch rename (#20774)
Summary:
This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other gpu target variants).  Many intermediate variables that don't manifest as artifacts of the build remain for now with the "caffe2" name; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR.

The shell `libtorch` library that had been introduced as a stopgap in https://github.com/pytorch/pytorch/issues/17783 is again flattened in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20774

Differential Revision: D15769965

Pulled By: kostmo

fbshipit-source-id: b86e8c410099f90be0468e30176207d3ad40c821
2019-06-12 20:12:34 -07:00
Sebastian Messmer
b527e48588 Use c10::List (#21177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21177

- Integrate c10::ListPtr into IValue and the c10 dispatcher.
- Streamline conversion to/from IValue. Before, we had IValue::to<> and kernel_functor.h had its own ivalue_to_arg_type and return_type_to_ivalue. They are now unified. Also, this means that nested types like Dicts of Lists of Optional of Dict of ... do work as expected now

Differential Revision: D15476433

fbshipit-source-id: bde9df80df20091aa8e6ae17ba7e90abd149b954
2019-06-12 13:58:24 -07:00
Sebastian Messmer
fe5ceea580 Rename caffe2<->c10 operator wrappers (#21322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21322

Naming is everything.

- Rename c10_operator.h -> export_caffe2_op_to_c10.h
- Rename operator_c10wrapper.h -> export_c10_op_to_caffe2.h
- Rename corresponding macros

This hugely improves readability and explains what these things are doing.

Reviewed By: dzhulgakov

Differential Revision: D15616816

fbshipit-source-id: d976aefcb43a0f55d85c3424fdd9aca7e71c3603
2019-06-07 13:48:10 -07:00
Rui Zhu
2b902e9738 Fix the offset numerical bug when casting (#21484)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21484

cast<int32_t*> => cast<int32_t>

Also fixed a reserve problem which might cause an incorrect pointer.

Reviewed By: yinghai

Differential Revision: D15699866

fbshipit-source-id: 374418476bddd60f5c5306c8c57319ccf28b9990
2019-06-07 12:33:18 -07:00
Peng Gong
78a376592d add cancelAsyncCallback method to OperatorBase (#21492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21492

If one async operator fails, the async_scheduling net currently only marks all scheduled async operators as finished without cancelling their callbacks.

The new behavior is to cancel the callbacks first, then set event status to finished.
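The ordering change can be sketched with a hypothetical event class (illustrative, not the Caffe2 async_scheduling API):

```python
class AsyncOpEvent:
    """Tracks callbacks scheduled against an async operator's completion."""

    def __init__(self):
        self._callbacks = []
        self.status = "scheduled"

    def then(self, callback):
        self._callbacks.append(callback)

    def finish_after_failure(self):
        # New behavior: cancel pending callbacks first, then mark finished,
        # so no callback can fire against an already-failed run.
        cancelled = self._callbacks
        self._callbacks = []
        self.status = "finished"
        return cancelled


ev = AsyncOpEvent()
ev.then(lambda: print("would run on success"))
cancelled = ev.finish_after_failure()
print(ev.status, len(cancelled))  # finished 1
```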

Reviewed By: ilia-cher

Differential Revision: D15702475

fbshipit-source-id: 55a1774d768b2e238bab859b83332f1877a001ca
2019-06-06 20:57:12 -07:00
Junjie Bai
4c19421f16 Register gradient op with engine (#21205)
Summary:
cc dreiss
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21205

Differential Revision: D15578948

Pulled By: bddppq

fbshipit-source-id: ef285174e8637daef624c8088ebd903a70582345
2019-05-31 18:48:47 -07:00
Sebastian Messmer
85777b92b2 Assert against using Operator methods not supported when exporting it to c10, part 2 (#17946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17946

Some of these are probably implementable for exported operators,
but aren't implemented yet and for now it's better to assert than to just return wrong results.

Reviewed By: ezyang

Differential Revision: D14430749

fbshipit-source-id: 2b0037a9ed227a22aa7376a90e6d3d09d3e04707
2019-05-29 13:16:00 -07:00
Jongsoo Park
0290897bca tracing for intra_op_parallel (#20603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20603

When we use intra_op_parallel operators, Caffe2 tracing generated a trace only for the master task, giving the false impression that a lot of threads were underutilized.
This diff also traces child tasks.

Reviewed By: ilia-cher

Differential Revision: D14820008

fbshipit-source-id: ff4ed203804d86d9231c21c99d869f1ddf1d1ef9
2019-05-28 17:39:23 -07:00
Kimish Patel
d6d192e0af Added engine information to the profiling result. (#20493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20493

This helps distinguish whether the op was a quantized op or not.

Reviewed By: salexspb

Differential Revision: D15337854

fbshipit-source-id: 43c7aef143085cfaeb4ec2102a7f36cc454e0e94
2019-05-28 16:41:12 -07:00
Kimish Patel
7afa75006e Enable operator profiling via command line (#20173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20173

Enabled op profiling even when net type is not dag or prof dag. Also added
engine type info to summary.

Reviewed By: salexspb, ilia-cher

Differential Revision: D15177813

fbshipit-source-id: 5be0efeaabc9a961cf1d73b0703749c08bb1adbb
2019-05-28 16:41:08 -07:00
Sebastian Messmer
6063ffd055 Specify dispatch key with kernel (#20821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20821

Change registration API. Instead of

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel>()
        .dispatchKey(CPUTensorId()));

it is now

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel>(CPUTensorId()));

This binds kernel and dispatch key together, allowing them to be separate from other future configuration options like alias analysis or autograd wrappers.

The semantic problem behind this is that the dispatch key is a *kernel config parameter* and not an *operator config parameter* while things like autograd wrappers, alias info, and actually the kernel itself are *operator config parameters*. And while previously, the different kind of config parameters have been mixed, this diff now separates them.

Before this change, it wouldn't have been well defined if you specified a dispatchKey together with an autogradWrapper or aliasInfo for example.

    // what is this supposed to do?
    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .aliasInfo(DEFAULT)
        .dispatchKey(CPUTensorId()));

If we get more kernel config parameters in the future, we could introduce something like this

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel>(torch::RegisterOperators::kernelOptions()
            .dispatchKey(CPUTensorId())
            .otherConfig());

but that's overkill as long as dispatch keys are the only kernel config parameter, and we can introduce that later without breaking backwards compatibility.

A nice side effect of this is that people can register multiple kernels to the same operator in the same `.op()` call:

    static auto registry = torch::RegisterOperators()
      .op("my::op", torch::RegisterOperators::options()
        .kernel<Kernel1>(CPUTensorId())
        .kernel<Kernel2>(CUDATensorId()));

Reviewed By: dzhulgakov

Differential Revision: D15455790

fbshipit-source-id: 1c46bfe676dcacf74cf36bd3f5df3d2c32b8fb11
2019-05-24 14:23:35 -07:00
Sebastian Messmer
4501dc305d Assert against using Operator methods not supported when exporting it to c10 (#17818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17818

Some of these are probably implementable for exported operators,
but aren't implemented yet and for now it's better to assert than to just return wrong results.

Reviewed By: ezyang

Differential Revision: D14392459

fbshipit-source-id: bf86e6cb0a7cfefd112a65dc85cc243e57a5ad52
2019-05-24 13:45:01 -07:00
Dmytro Dzhulgakov
c25e33789e Lightweight at-most-once logging for API usage (#20745)
Summary:
Resubmit #20698 which got messed up.

The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so - only the first invocation of a trigger point is logged. This is significantly more lightweight than #18235, so we can afford to put logging in e.g. TensorImpl.

Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcomed.
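The at-most-once idea can be sketched as follows (illustrative names; the real mechanism is a C++ API, not this Python code):

```python
import threading

_seen_events = set()
_seen_lock = threading.Lock()

def log_api_usage_once(event: str) -> bool:
    """Log only the first invocation of a trigger point.

    Returns True if this call logged (first occurrence), False otherwise.
    """
    with _seen_lock:
        if event in _seen_events:
            return False
        _seen_events.add(event)
    # In a custom build environment this would go to central logging.
    print(f"API usage: {event}")
    return True


log_api_usage_once("tensor.create")   # logs
log_api_usage_once("tensor.create")   # silently ignored
```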
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745

Differential Revision: D15429196

Pulled By: dzhulgakov

fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
2019-05-23 23:17:59 -07:00
Yinghai Lu
48bf7b9be8 Fix oscillation in coalesceInsertedDataDependencies (#20833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20833

Att. The algorithm is still "horrendously inefficient". But since we are sunsetting Nomnigraph, I just did the minimal fix here.

Reviewed By: tracelogfb

Differential Revision: D15463880

fbshipit-source-id: 413a1280a92c1923ba49031177816a2d5f888575
2019-05-23 14:04:20 -07:00
Yinghai Lu
cf7ef5e631 Add onnxifi support for Int8FCDNNLowPPackedWeightBlob (#20564)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20564

Reviewed By: bddppq

Differential Revision: D15106712

fbshipit-source-id: 428db9c23cfd36ddedc8d79121fbbb3bb484c993
2019-05-20 16:57:11 -07:00
Edward Z. Yang
9b1dbffba5
Re-sync with internal repository (#20702) 2019-05-20 09:22:57 -04:00
Dmytro Dzhulgakov
d3059b9c49 Lightweight logging for once-only API usage 2019-05-19 23:04:40 -07:00
Jongsoo Park
7b9ee598d6 separate option for FE_OVERFLOW (#20476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20476

There are overflow exceptions happening for legitimate computations: for
large negative x, exp(-x) overflows, so sigmoid(x) = 1 / (1 + exp(-x)) = 1 / (1 + inf) ≈ 0.
This diff separates the option for FE_OVERFLOW to make the caffe2_operator_throw_if_fp_exceptions=1 option less noisy.
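A Python sketch of why this class of overflow is benign (illustrative; Python raises OverflowError where C would set FE_OVERFLOW):

```python
import math

def sigmoid(x):
    # Numerically stable form: never feeds a large argument to exp().
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)   # x < 0, so exp(x) can only underflow, never overflow
    return z / (1.0 + z)

# The naive form overflows for very negative inputs even though the
# mathematical result 1 / (1 + inf) is simply 0.
try:
    naive = 1.0 / (1.0 + math.exp(-(-800.0)))
except OverflowError:
    naive = None

print(naive, sigmoid(-800.0))   # None 0.0
```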

Reviewed By: hx89

Differential Revision: D15332947

fbshipit-source-id: 9148233f5b84551a0900f0557ba22f2b1508ae0c
2019-05-19 16:05:27 -07:00
Sebastian Messmer
cb6be42403 Options based registration API (#20514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20514

Change API from

    static auto registry = c10::RegisterOperators()
      .op("my::op",
        c10::kernel(...),
        c10::dispatchKey(...)
      );

to

    static auto registry = c10::RegisterOperators()
      .op("my::op", c10::RegisterOperators::options()
        .kernel(...)
        .dispatchKey(...)
      );

because this allows better discoverability. People looking for which options are available will easier find it and IDE autocompletion will work better.

Reviewed By: zdevito

Differential Revision: D15346348

fbshipit-source-id: 4b74a33b75c2b9cda4a903639fb7abd2c7cff167
2019-05-17 20:54:42 -07:00
Vitaly Fedyunin
5b78a5eadb Memory format support for contiguous and is_contiguous (#20455)
Summary:
#19975 was separated by 2 PRs.

This one:

Introduce MemoryFormat argument to the `x.is_contiguous(memory_format=torch.channels_last)` and to the `y = x.contiguous(memory_format=torch.channels_last)` functions.

At this moment both functions just operate on strides and don't store any tensor state.

(Original RFC #19092)

-----

Expands functionality of two tensor functions `.is_contiguous` and `.contiguous` (both python and c++ api).

Note: We had several complaints about `.to(memory_format)` function, and decided not to support it.

1.  `.contiguous` now supports an optional keyword-only argument, `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - Using `torch.contiguous_format` will preserve existing `.contiguous()` behavior.

    - Calling `x.contiguous(memory_format=torch.channels_last)` returns a new tensor which maintains the same semantic layout (NCHW) but has a different memory allocation pattern.

        `x.contiguous(memory_format=torch.channels_last)` expects the input tensor to be 3d, 4d or 5d, and fails otherwise.

2. `.is_contiguous` now supports an optional keyword-only argument, `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - `x.is_contiguous(memory_format=torch.contiguous_format)` preserves same functionality as `x.is_contiguous()` and remains unchanged.

    - `x.is_contiguous(memory_format=torch.channels_last)` returns true if A) the input tensor is contiguous in memory AND B) it is allocated in memory in NHWC (or similar for 3d/5d) format.

Note: By the end of phase one, `x.is_contiguous(memory_format=torch.channels_last)` will calculate the state of the tensor on every call. This functionality is going to be updated later.
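The stride-only check can be sketched without torch (assumed layouts: `contiguous_format` is row-major NCHW, while `channels_last` stores a logical 4d NCHW tensor as NHWC in memory):

```python
def contiguous_strides(shape):
    # Row-major strides, e.g. (N, C, H, W) -> (C*H*W, H*W, W, 1).
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return tuple(strides)

def channels_last_strides(shape):
    # NHWC memory layout for a logical (N, C, H, W) tensor.
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)

def is_contiguous(shape, strides, memory_format="contiguous"):
    expected = (contiguous_strides(shape) if memory_format == "contiguous"
                else channels_last_strides(shape))
    return tuple(strides) == expected

shape = (2, 3, 4, 5)
print(contiguous_strides(shape))     # (60, 20, 5, 1)
print(channels_last_strides(shape))  # (60, 1, 15, 3)
```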
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20455

Differential Revision: D15341577

Pulled By: VitalyFedyunin

fbshipit-source-id: bbb6b4159a8a49149110ad321109a3742383185d
2019-05-16 07:18:24 -07:00
Rui Zhu
c129ab06e9 Change onnxifi workflow to support multi-group quantized & Add multi quantization info to caffe2.proto (#20439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20439

This is the QTensorProto workflow for multi-group quantization on the C2 side.
No DNNLOWP tensor related changes are included in this PR, so once we finish the Glow side, we should be able to test this PR using resnet50.

Reviewed By: yinghai

Differential Revision: D15096919

fbshipit-source-id: 741eecd59eb79d24d9fe2b035f6246d42422d25c
2019-05-15 19:24:08 -07:00
Edward Yang
73a97387c1 Replace AT_CHECK with TORCH_CHECK [shard 9/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20435

Reviewed By: jerryzh168

Differential Revision: D15318877

fbshipit-source-id: 4d83571187ea14a604fef83ac355d328b46d93e1
2019-05-15 08:05:59 -07:00
Kedar Pujara
254de9e8ec Removing cyclic dependency (#20511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20511

Removed cyclic dependency of caffe2/core/net.h and workspace.h

Differential Revision: D15303412

fbshipit-source-id: 6e772e372cd0cf2af05d7815f1df8ae20bc2a65e
2019-05-14 18:55:19 -07:00
Sebastian Messmer
9e7f22b223 Remove dependencies from Caffe2Go on PyTorch JIT (#20463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20463

Source file changes mostly involve ifdef'ing-out references to JIT code
from files that are part of Caffe2Go.  Update Internal build scripts to
remove those files from our globs.

After this, changes to most of the JIT files should not trigger mobile CI.

Reviewed By: dzhulgakov

Differential Revision: D15329407

fbshipit-source-id: 48f614c6b028eef0a03ce5161d083a3e078b0412
2019-05-14 14:36:08 -07:00
Ansha Yu
a9aaf698a4 add c2 benchmark runs in cpp (#20108)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20108

Add cpp runs for c2, hooked up via pybinds. Print output to terminal. This is not hooked up with the pep output yet because I'd like to verify the numbers first.

Note that this isn't quite the same mechanism as the pytorch cpp hookup, which uses cpp_python_extensions. If I can use the same mechanism to pull all the inputs for c2 through cpp and do FeedBlobs in cpp, then I'll switch to that.

Reviewed By: zheng-xq

Differential Revision: D15155976

fbshipit-source-id: 708079dacd3e19aacfe43d70c5e5bc54da2cf9e3
2019-05-13 17:01:08 -07:00
Richard Zou
e01a5bf28b Add USE_NAMEDTENSOR compilation flag. (#20162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20162
ghimport-source-id: 0efcd67f04aa087e1dd5faeee550daa2f13ef1a5

Reviewed By: gchanan

Differential Revision: D15278211

Pulled By: zou3519

fbshipit-source-id: 6fee981915d83e820fe8b50a8f59da22a428a9bf
2019-05-09 09:09:16 -07:00
Yanbo Liang
a8387b7779 Delete TensorImpl::GetDevice() (#20025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20025

Delete TensorImpl::GetDevice() and clean all its call sites.

Reviewed By: ezyang

Differential Revision: D15170917

fbshipit-source-id: b6862b74aa036198544f79d18a8c0f995cb0ca7b
2019-05-06 12:44:23 -07:00
Tongliang Liao
1dfeffbff5 Expose test utils (#20114)
Summary:
Some functions were not decorated with `CAFFE2_API`, which makes them unusable when creating unit tests for custom ops outside the Caffe2 repo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20114

Differential Revision: D15217490

Pulled By: ezyang

fbshipit-source-id: dda3910ad24e566567607deaac705a34ec8e7b8d
2019-05-06 07:06:04 -07:00
Tongliang Liao
f2c715cbe1 Fix the spelling of "context"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20055

Differential Revision: D15217488

Pulled By: ezyang

fbshipit-source-id: bb2b57b5e749357b47a01c6c3e73addf3c5418c7
2019-05-06 06:54:30 -07:00
Sebastian Messmer
fb8792e2b6 Remove torch/jit from xplat build (#19967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19967

-

Reviewed By: dreiss, dzhulgakov

Differential Revision: D15150843

fbshipit-source-id: af7d6902934883be9d8021b3601de2fe1f3bf806
2019-05-02 15:31:06 -07:00
Zachary DeVito
55c719b161 Remove operator.h's dependency on function_schema.h (#19817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19817

A lot of files were depending on the JIT's typesystem
because operator.h depends on function_schema.h. However,
this isn't fundamental to the design. This diff tries to
remove the direct depenency and only includes the c10
wrapper helpers in files where it is required.

Reviewed By: smessmer

Differential Revision: D15112247

fbshipit-source-id: 2c53d83e542c32d9a398c8b60dbf40ab7a1cb0f6
2019-04-29 19:50:43 -07:00
Xiaomeng Yang
2ce39de3fc Add elementwise_affine for layer_norm_op (#19713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19713

Add elementwise_affine for layer_norm_op

Reviewed By: houseroad

Differential Revision: D15075454

fbshipit-source-id: e8a7d3da1c81e49fa55323f5e74a68bc4ef8d83f
2019-04-26 17:20:01 -07:00
David Goodwin
c855e04d5f Caffe2 shouldn't fail if CUDA peer access is already enabled
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19586

Differential Revision: D15061544

Pulled By: dzhulgakov

fbshipit-source-id: 6a5f9f4fe45259d689671f58ad5206cdaf15c5bd
2019-04-24 13:22:27 -07:00
Yinghai Lu
b85edac16f Fix out-of-topological-order issue in Nomnigraph (#19458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19458

The algorithm in https://fburl.com/ggh9iyvc fails to really ensure topological ordering of nodes. The fix is ugly but effective. I think we need a real topological sort to fix this issue more nicely. Mikhail Zolotukhin, Bram Wasti.
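For reference, the "real topological sort" mentioned here is typically Kahn's algorithm; a minimal standalone sketch (not Nomnigraph's API) looks like:

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Kahn's algorithm: returns the nodes of a DAG in topological order.
// Nodes are 0..n-1; adj[u] lists the edges u -> v.
std::vector<int> topoSort(int n, const std::vector<std::vector<int>>& adj) {
  std::vector<int> indegree(n, 0);
  for (int u = 0; u < n; ++u)
    for (int v : adj[u]) ++indegree[v];

  std::queue<int> ready;  // nodes with no unprocessed predecessors
  for (int u = 0; u < n; ++u)
    if (indegree[u] == 0) ready.push(u);

  std::vector<int> order;
  while (!ready.empty()) {
    int u = ready.front();
    ready.pop();
    order.push_back(u);
    for (int v : adj[u])
      if (--indegree[v] == 0) ready.push(v);
  }
  assert(static_cast<int>(order.size()) == n && "graph has a cycle");
  return order;
}
```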

Differential Revision: D15011893

fbshipit-source-id: 130c3aa442f5d578adfb14fbe5f16aa722434942
2019-04-19 12:19:39 -07:00
Sebastian Messmer
17f05ad5e5 Moving at::Tensor into caffe2::Tensor without bumping refcount (#19388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19388

The old implementation forced a refcount bump when converting at::Tensor to caffe2::Tensor.
Now, it is possible to move it without a refcount bump.
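The mechanism is the usual move-vs-copy distinction for refcounted handles; a sketch with std::shared_ptr (standing in for the intrusive refcount shared by at::Tensor and caffe2::Tensor) shows the difference:

```cpp
#include <memory>
#include <utility>

// Returns {use_count after copy, use_count after move}: copying a shared
// handle bumps the refcount, while moving it only transfers ownership.
std::pair<long, long> copyVsMove() {
  auto a = std::make_shared<int>(42);
  auto copied = a;                 // refcount bump: 1 -> 2
  long after_copy = a.use_count();
  auto moved = std::move(copied);  // ownership transfer, no bump
  long after_move = a.use_count();
  return {after_copy, after_move};
}
```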

Reviewed By: dzhulgakov

Differential Revision: D14986815

fbshipit-source-id: 92b4b0a6f323ed38376ffad75f960cad250ecd9b
2019-04-18 14:13:26 -07:00
Sebastian Messmer
601f36bacc Use string based schema for exposing caffe2 ops (#19287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19287

Since we now have a string-schema-based op registration API, we can also use it when exposing caffe2 operators.

Reviewed By: dzhulgakov

Differential Revision: D14931925

fbshipit-source-id: ec162469d2d94965e8c99d431c801ae7c43849c8
2019-04-18 02:04:50 -07:00
Sebastian Messmer
db611b7caf Delete C10Tensor (#19328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19328

Plans changed and we don't want this class anymore.

Reviewed By: dzhulgakov

Differential Revision: D14966746

fbshipit-source-id: 09ea4c95b352bc1a250834d32f35a94e401f2347
2019-04-17 00:02:27 -07:00
Will Feng
c7b5a8a876 Change is_variable() to check existence of AutogradMeta, and remove is_variable_ (#19139)
Summary:
Currently, a TensorImpl's `is_variable_` is true if and only if the TensorImpl has AutogradMeta. This PR unifies these two concepts by removing `is_variable_` and change `is_variable()` to check existence of AutogradMeta instead.

Removing `is_variable_` is part of the work in Variable/Tensor merge.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19139

Differential Revision: D14893339

Pulled By: yf225

fbshipit-source-id: ceb5e22c3c01f79b5d21d5bdbf4a7d1bc397796a
2019-04-11 14:03:33 -07:00
Gregory Chanan
b6ee83a5b4 Materialize a non-default device for C2 legacy storage. (#18605)
Summary:
It's not intended that Storages have 'default' CUDA devices, but this is allowable via the Storage::create_legacy codepath.

This also messes with device caching, because the initial cache is obtained from the Storage, which may have a 'default' device.

Instead, we materialize a device by allocating 0 bytes via the allocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18605

Differential Revision: D14680620

Pulled By: gchanan

fbshipit-source-id: 6d43383d836e90beaf12bfe37c3f0506843f5432
2019-04-11 13:50:41 -07:00
Yinghai Lu
bbe648dffb Allow empty net type (#19154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19154

I recently saw a weird workflow error due to an empty but set net_type. Maybe we should just fall back to the simple net in this case.

Reviewed By: dzhulgakov

Differential Revision: D14890072

fbshipit-source-id: 4e9edf8232298000713bebb0bfdec61e9c5df17d
2019-04-11 12:43:07 -07:00
Alexander Sidorov
0ca8f7a15f Make BlackBoxPredictor handle networks throwing exceptions (#19080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19080

OSS: add a tiny unit test utility function to create tensors given shape and data outside of any workspace. I use it in an internal test.

Reviewed By: dzhulgakov

Differential Revision: D14814194

fbshipit-source-id: 6d53b235d99a97da812215f5c7f11fecad363c8c
2019-04-09 16:42:12 -07:00
Lu Fang
75d6d8833d remove interned_string.h dep (#19061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19061

remove the deps on interned_string.h

Reviewed By: BIT-silence

Differential Revision: D14850078

fbshipit-source-id: 07e6ad72a7de369049ea56f32b72276fb4c59b32
2019-04-09 09:59:15 -07:00
Lu Fang
443a58e03d Export C10 operator in PyTorch Model (#18210)
Summary:
Almost there, feel free to review.

these c10 operators are exported to _caffe2 domain.

TODO:

- [x] let the onnx checker pass
- [x] test tensor list as argument
- [x] test caffe2 backend and converter
- [x] check the c10 schema can be exported to onnx
- [x] refactor the test case to share some code
- [x] fix the problem in ONNX_ATEN_FALLBACK
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18210

Reviewed By: zrphercule

Differential Revision: D14600916

Pulled By: houseroad

fbshipit-source-id: 2592a75f21098fb6ceb38c5d00ee40e9e01cd144
2019-04-08 16:06:00 -07:00
Duc Ngo
e7b2669151 caffe2 - Expose tensor filler util to Python (#18886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18886

Expose tensor filler util to Python and add a unit test (both C++/Python)

Reviewed By: salexspb

Differential Revision: D14784470

fbshipit-source-id: bb8e013d1755c27c166e87d5a8491a97c65d3d8d
2019-04-08 11:54:10 -07:00
Jerry Zhang
40a54bf2f1 Change ReinitializeTensor to use C10_LOG_FIRST_N (#18531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18531

Currently we use C10_LOG_EVERY_MS to log the data type change, but it pollutes the logs of some services,
so we would like to change it to C10_LOG_FIRST_N to prevent that.
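For readers unfamiliar with the macros: first-N throttling caps the total number of messages a call site can ever emit, while an every-MS macro only rate-limits them. A rough standalone sketch of the first-N behavior (not the actual c10/glog implementation):

```cpp
#include <atomic>
#include <string>
#include <vector>

// Collected messages; stands in for the real logging sink.
std::vector<std::string> g_log;

// Emit a message only the first `n` times this call site is hit, so a hot
// path cannot keep polluting the log the way time-based throttling can.
void logFirstN(std::atomic<int>& site_counter, int n, const std::string& msg) {
  if (site_counter.fetch_add(1) < n) {
    g_log.push_back(msg);
  }
}
```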

Reviewed By: dzhulgakov

Differential Revision: D14647704

fbshipit-source-id: b84e4002bd4aa94d616133cd1049c3d4ab05386e
2019-04-02 21:03:37 -07:00
Rui Zhu
19fe2b9db4 Adding quantized tensor shape/type info support for caffe2=>glow in caffe2 side (#18621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18621

This diff added caffe2 support for onnxifi quantization.

Reviewed By: yinghai

Differential Revision: D14648767

fbshipit-source-id: 4ddb492cacbba6142305866e6dbb875880acaea3
2019-03-31 17:42:27 -07:00
Sebastian Messmer
9abc8a5b47 New operator registration MVP (#18161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18161

This introduces version 0 for the new operator registration.

For now, it only works with kernels that are defined as stack-based functions.
This is actually not the intended public API for defining kernels, but it's the basis which is going to be used to define the public APIs (see diffs on top for them),
and it's also the API used for exposing caffe2 operators.

This diff also switches the mechanism for exposing caffe2 operators to the new mechanism.

Reviewed By: dzhulgakov

Differential Revision: D14514231

fbshipit-source-id: 454ab7b5b46a10203aa27b175400d23f818dd1df
2019-03-30 00:07:16 -07:00
Sebastian Messmer
c6bfcb854b Expose c10 operators to caffe2 by operator name (#18160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18160

When exposing a c10 operator to the caffe2 frontend, don't use the operator schema but use the operator name instead.
This allows us to get rid of the existing mechanism for operator schema registration in a diff stacked on top.

Reviewed By: dzhulgakov

Differential Revision: D14513420

fbshipit-source-id: 6b08a9c6d9497eaf18b62361dd44bc07c7b4b76b
2019-03-26 12:36:11 -07:00
Gerard Goossen
46990c20fa Verify def before infer tensor (#18129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18129

A lot of tensor inference functions assume the operator passes the schema,
so call Verify to make sure this is actually the case.

Created a diff before to add checking in Concat (https://github.com/pytorch/pytorch/pull/17110), but I encountered a lot more places where this is assumed (for example ElementwiseOpShapeInference)

Reviewed By: mdschatz

Differential Revision: D14503933

fbshipit-source-id: cf0097b8c3e4beb1cded6b61e092a6adee4b8fcb
2019-03-22 06:36:25 -07:00
Hector Yuen
7bb36ada1f fix -Wsign-compare warnings for some files inside c2 (#18123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18123

the motivation of this fix is to resolve things like:
for(auto i = 0; i < N; i++) where N is bigger than int32

These instances of comparison were found by enabling -Wsign-compare

There are way too many things to fix, so issuing this as a series of fixes

The plan is to fix all these issues and then enable this flag into Caffe2 to catch future instances

Reviewed By: ZolotukhinM

Differential Revision: D14497094

fbshipit-source-id: bca3927a2188bd33a508fa503ba221c220cdaefe
2019-03-19 10:39:20 -07:00
Duc Ngo
da3cc6e7ee Caffe2 - Add flag to fail if floating point exceptions are detected in operator runs (#18040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18040

Add flag to fail if floating point exceptions are detected in operator runs

Sample exception

Exception [enforce fail at operator.h:837] !std::fetestexcept(FE_DIVBYZERO). Division by zero floating point exception (FE_DIVBYZERO) reported.
Error from operator:
input: "1" input: "0" output: "out" name: "" type: "Div"
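The check uses the standard &lt;cfenv&gt; facilities; a standalone sketch of the detection (behavior assumes the platform honors IEEE floating-point status flags):

```cpp
#include <cfenv>

// Clear the FP status flags, run the computation, then test whether a
// division-by-zero was raised. `volatile` keeps the compiler from folding
// the division at compile time (which would not set the runtime flag).
bool divRaisesDivByZero(volatile double a, volatile double b) {
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double r = a / b;
  (void)r;
  return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```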

Reviewed By: jspark1105

Differential Revision: D14467731

fbshipit-source-id: fad030b1d619a5a661ff2114edb947e4562cecdd
2019-03-16 12:28:05 -07:00
Sebastian Messmer
be364ac8d7 Specify overload name in function schema (#18037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18037

The FunctionSchema can now store an overload name and the parser knows how to parse it. Specify like this:

    my_func.overload1(arg1: Tensor) -> Tensor
    my_func.overload2(arg1: Tensor, arg2: Tensor) -> Tensor

Reviewed By: zdevito

Differential Revision: D14467497

fbshipit-source-id: 8832b32f07351bb61090357b17b77a6a2fed3650
2019-03-15 16:58:13 -07:00
Sebastian Messmer
7a3488e0fc Expose c10 cuda ops to caffe2 (#18036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18036

- Add macros to export c10 cuda operators to caffe2 frontend
- Instead of having a separate caffe2 registry for the c10 operator wrappers, use the existing caffe2 registries

Reviewed By: ezyang

Differential Revision: D14467495

fbshipit-source-id: 7715ed2e38d2bbe16f1446ae82c17193a3fabcb9
2019-03-15 16:58:12 -07:00
Sebastian Messmer
a41b6d7d1f Simplify macros for exposing c10 ops to c2 (#17781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17781

The wrapper for calling a c10 operator from caffe2 is now based on a runtime FunctionSchema instead of compile time information. This way, it can be created for any c10 operator schema with just one invocation to a simple macro instead of having to define arguments and more as compile time structures.

Furthermore, previously, the wrapper assumed there's an argument present for preallocated outputs, but that was only true for caffe2 operators exported to c10. So the wrapper only worked correctly for calling caffe2->c10->caffe2. Now with the new implementation, it works for any c10 operator.

Also, binary size for this should be much smaller.

Reviewed By: ezyang

Differential Revision: D14375054

fbshipit-source-id: bac7ab8e63929e6e2a148eacac41ed092009aa86
2019-03-14 08:54:16 -07:00
Sebastian Messmer
25d06eef7b Improve caffe2 operator wrapping (#17743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17743

- caffe2::Operator::SetOutputTensor() can now be used in operators that are called from c10/PyTorch.
- If the operator uses SetOutputTensor() instead of XOutput(), the wrapper doesn't preallocate an empty tensor for the operator anymore. Only outputs accessed in XOutput() will get an output tensor preallocated.
- Remove the copying of the vector with output tensors into a vector with pointer to output tensors.
- Preallocated outputs are now passed in as one TensorList argument on the stack. This TensorList argument has a well-defined name so other wrappers (i.e. the wrapper calling from c2 into c10) can recognize and use it.
- Macros for exporting caffe2 operators to c10 are simplified. Instead of having `c10_op_handle_for_c2_op`, we now pass in the operator handle as a template argument.
- `SetOutputTensor` and `OutputTensorOrUndefined` now work with operators exported to c10

Reviewed By: ezyang

Differential Revision: D14362434

fbshipit-source-id: 44a5e717204f21ea8e9728437429d9b84906f9f5
2019-03-14 08:54:15 -07:00
James Reed
1d26a3ae7e Open registration for c10 thread pool (#17788)
Summary:
1. Move ATen threadpool & open registration mechanism to C10
2. Move the `global_work_queue` to use this open registration mechanism, to allow users to substitute in their own
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17788

Reviewed By: zdevito

Differential Revision: D14379707

Pulled By: jamesr66a

fbshipit-source-id: 949662d0024875abf09907d97db927f160c54d45
2019-03-08 15:38:41 -08:00
Dmytro Dzhulgakov
a60fadfb71 Change message on unknown db type to be friendly (#17795)
Summary:
CreateDB actually returns nullptr when db type is unknown and throws when the file is missing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17795

Reviewed By: ezyang

Differential Revision: D14383226

Pulled By: dzhulgakov

fbshipit-source-id: 1dcf75a6b4ba8b64a24d4e5daf02db3189d56b7b
2019-03-08 10:46:24 -08:00
Sebastian Messmer
7f7d12854d Remove legacy way of exposing caffe2 operators to PyTorch (#17742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17742

This path isn't used anymore, and is incompatible with the changes stacked on top of this diff.
Removing it.
cc bwasti to check and confirm these can really be deleted

Reviewed By: ezyang

Differential Revision: D14362426

fbshipit-source-id: 32cdc19f28c2a981ae1e204901420998367ee588
2019-03-08 10:22:41 -08:00
Edward Yang
1e6acc676f Replace caffe2::DeviceGuard with c10::cuda::CUDAGuard (#17623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17623

Despite its generic-sounding name, caffe2::DeviceGuard actually
only worked on CUDA devices.  Rename it to something that more
clearly spells out its applicability.

I'm not sure if it's the right call, but in this patch I added
'using CUDAGuard = c10::cuda::CUDAGuard', as this seems to be more
in-line with how the Caffe2 codebase is currently written.  More
idiomatic c10 namespace style would be to say cuda::CUDAGuard.
Willing to change this if people shout.

This is a respin of D13156470 (#14284)

Reviewed By: dzhulgakov

Differential Revision: D14285504

fbshipit-source-id: 93b8ab938b064572b3b010c307e1261fde0fff3d
2019-03-06 10:48:15 -08:00
Duc Ngo
e9eb18a18c Remove nomscheduler (#17693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17693

Remove nomscheduler tool

Reviewed By: yinghai

Differential Revision: D14328168

fbshipit-source-id: 674d0e18596a4dc2bbb6b8d321f4066c4fc454ab
2019-03-06 10:48:13 -08:00
Sebastian Messmer
8569d9cbea Fix XOutput/XOutputTensor for ivalue based c2 operators (#17599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17599

XOutput/XOutputTensor was broken for ivalue based operators. This diff fixes that.

Reviewed By: ezyang

Differential Revision: D14274003

fbshipit-source-id: b99f020244c66c4e2551dbd32ae0f665cc91b338
2019-03-04 14:20:13 -08:00
Sebastian Messmer
c7db0b35d8 Fix InputSize/OutputSize for ivalue based operators (#17579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17579

These methods previously just returned 0 when it was not a legacy operator,
making it impossible to convert some operators.

Reviewed By: dzhulgakov

Differential Revision: D14253094

fbshipit-source-id: 72bfdcf6da291a4ab80d1e0ceb20984b86edc408
2019-03-04 14:20:12 -08:00
Sebastian Messmer
b004b31d06 Allow exposing caffe2 operators with variable number of input tensors to c10 (#17491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17491

Before, there was no way to expose a caffe2 operator that had a variable number of inputs.
Now, this is allowed by giving the operator one tensor list input.
Note that the tensor list must be the first input, and that any other tensor inputs will be ignored and inaccessible in this case.

Reviewed By: ezyang

Differential Revision: D14220705

fbshipit-source-id: 7f921bfb581caf46b229888c409bbcc40f7dda80
2019-02-28 16:31:59 -08:00
Sebastian Messmer
6706e9af19 Make C10_MOBILE consistent with how feature macros are usually used (#17481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17481

Usually, feature macros are either defined or undefined and checked accordingly.
C10_MOBILE was a weird special case that was always defined but either defined to 1 or to 0.

This caused a lot of confusion for me when trying to disable something in the mobile build: it also
got disabled in the server build (because I was using ifdef). Also, I found a place in the existing code base that made
that wrong assumption and used the macro wrongly, see https://fburl.com/y4icohts
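A minimal illustration of the pitfall, using a hypothetical FEATURE_X macro in place of C10_MOBILE:

```cpp
#define FEATURE_X 0  // 0/1-style macro: "disabled", but still defined

// With an always-defined 0/1 macro, #ifdef is the wrong check: it is true
// even when the feature is off. Only #if reads the macro's value.
bool takenByIfdef() {
#ifdef FEATURE_X
  return true;   // taken even though FEATURE_X is 0
#else
  return false;
#endif
}

bool takenByIf() {
#if FEATURE_X
  return true;
#else
  return false;  // correctly skipped when the macro is 0
#endif
}
```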

Reviewed By: dzhulgakov

Differential Revision: D14214825

fbshipit-source-id: f3a155b6d43d334e8839e2b2e3c40ed2c773eab6
2019-02-27 17:57:51 -08:00
Sebastian Messmer
7c5ffc4120 Disable c10 dispatcher on mobile (#17078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17078

This prevents caffe2 operators from being exposed to c10 on mobile,
which in turn causes the whole c10 dispatcher to be stripped away
and saves binary size.

We probably want to re-enable the c10 dispatcher for mobile,
but for now this is ok.

Reviewed By: ezyang

Differential Revision: D14077972

fbshipit-source-id: e4dd3e3b60cdfbde91fe0d24102c1d9708d3e5c4
2019-02-27 17:57:50 -08:00
Ilia Cherniavskii
348d1889ff Fix operator initialization order (#15445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15445

Initialize the task graph after operators (the task graph uses ops)

Reviewed By: yinghai

Differential Revision: D13530864

fbshipit-source-id: fdc91e9158c1b50fcc96fd1983fd000fdf20c7da
2019-02-26 15:41:16 -08:00
Ilia Cherniavskii
47263e48f4 Better handling of net errors in prof_dag counters (#17384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17384

Better handling of possible net run errors in prof_dag counters.

Reviewed By: yinghai

Differential Revision: D14177619

fbshipit-source-id: 51bc952c684c53136ce97e22281b1af5706f871e
2019-02-22 18:38:31 -08:00
Ilia Cherniavskii
0edc81136c Rethrow exceptions from RunAsync (#15034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15034

Rethrow exceptions that happened during RunAsync, and ensure that pending tasks
are not executed after the run is marked as finished

Reviewed By: andrewwdye

Differential Revision: D13409649

fbshipit-source-id: 3fd12b3dcf32af4752f8b6e55eb7a92812a5c057
2019-02-20 16:32:24 -08:00
Ilia Cherniavskii
0337494c6a Reinforce scheduling invariants (#17132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17132

The schedule() function is not supposed to throw exceptions and is supposed
to succeed in scheduling the full graph of tasks; potential errors (e.g. errors
from the underlying thread pool, out-of-memory exceptions, etc.) are considered not
recoverable.
The invariant: the graph of tasks is either not executed or
executed in full before the call to finishRun()

Reviewed By: andrewwdye

Differential Revision: D14092457

fbshipit-source-id: a3e5d65dfee5ff5e5e71ec72bb9e576180019698
2019-02-20 16:32:23 -08:00
Brian W. Hart
fbd690c1fe caffe2: fix PinnedCPUAllocator cudaHostRegister() leak (#16340)
Summary:
In the NUMA case, PinnedCPUAllocator's allocate() would return a
DataPtr constructed by DefaultCPUAllocator, which would reference
the Default... Delete() rather than the Pinned... Delete(). That
meant Pinned... Delete() would never run, so cudaHostUnregister()
would never be called when regions were freed.

See: https://github.com/pytorch/pytorch/issues/16280

This change adds a 'naked_allocate()' method to the Default allocator
that just returns a pointer to the allocated memory rather than
wrapping it in a DataPtr. The Pinned allocator uses that and then constructs
a DataPtr with a reference to its own Delete().
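The bug class is generic: a resource wrapped with the wrong deleter never runs its intended cleanup. A sketch with std::unique_ptr and hypothetical counters (not the actual allocator classes):

```cpp
#include <memory>

// Hypothetical counters standing in for the two Delete() paths.
int default_deletes = 0;
int pinned_deletes = 0;  // stands in for cudaHostUnregister + free

void defaultDelete(int* p) { ++default_deletes; delete p; }
void pinnedDelete(int* p)  { ++pinned_deletes;  delete p; }

using Ptr = std::unique_ptr<int, void (*)(int*)>;

// The fixed pattern: the pinned allocator wraps the raw allocation with
// its own deleter, instead of handing back the default allocator's
// already-wrapped pointer (which would run defaultDelete on free and
// never unregister the pinned region).
Ptr pinnedAllocate() { return Ptr(new int(0), pinnedDelete); }
```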
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16340

Reviewed By: dzhulgakov

Differential Revision: D13843206

Pulled By: ezyang

fbshipit-source-id: 9efb572e5a01b49ef2a4aceeccc13cd0b1066528
2019-02-15 07:02:33 -08:00
Michael Liu
5f866d0ea2 Apply modernize-use-override (2nd iteration)
Summary:
Use C++11’s override and remove virtual where applicable.
Change are automatically generated.

Reviewed By: Orvid

Differential Revision: D14086124

fbshipit-source-id: 2005227d095d776ca3b4309a57f54e25782b9b58
2019-02-14 16:52:57 -08:00
Sebastian Messmer
65e06df24a Use new constructor in USE_SIMPLE_CTOR_DTOR (#17080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17080

This changes all operators using this macro to the new format

Reviewed By: dzhulgakov

Differential Revision: D14078628

fbshipit-source-id: 67048e485e326765fd49567cc008633d3d500d5c
2019-02-14 15:54:16 -08:00
Dmytro Dzhulgakov
3408d9de20 Clean up Storage/StorageImpl constructors (#16948)
Summary:
Small cleanup while doing https://github.com/pytorch/pytorch/pull/16857:

- rename C2 constructors as create_legacy
- remove duplicated constructors
- make resizable flag non-default
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16948

Differential Revision: D14062755

Pulled By: dzhulgakov

fbshipit-source-id: 3b7b4ec9cdf67d2628cccc001156e040006b673e
2019-02-13 22:58:32 -08:00
Dmytro Dzhulgakov
51dd2000cd unify c2 and TH allocator (#16892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16892

Replaces https://github.com/pytorch/pytorch/pull/14517

Merged caffe2 and TH CPU Allocators. Mostly using the code from caffe2 allocators.
`memset` of caffe2 allocator is gone now. These two allocators should be almost the same.

Baseline:
```
Running ./tensor_allocation
Run on (48 X 2501 MHz CPU s)
CPU Caches:
  L1 Data 32K (x24)
  L1 Instruction 32K (x24)
  L2 Unified 256K (x24)
  L3 Unified 30720K (x2)
-------------------------------------------------------------------------
Benchmark                                  Time           CPU Iterations
-------------------------------------------------------------------------
BM_MakeStorageImpl                       148 ns        148 ns    4676594
BM_StorageImplCtor                        54 ns         54 ns   12957810
BM_MallocStorageImpl                      62 ns         62 ns   11254745
BM_TensorImplCtor                         22 ns         22 ns   31939472
BM_MallocTensorImpl                      105 ns        105 ns    6505661
BM_Malloc_1                               43 ns         43 ns   16464905
BM_MakeTensorFromStorage                 126 ns        126 ns    5586116
BM_MakeVariableFromTensor                236 ns        236 ns    2995528
BM_ATenCPUTensorAllocationSmall1         319 ns        319 ns    2268884
BM_ATenCPUTensorAllocationSmall2         318 ns        318 ns    2163332
BM_ATenCPUTensorAllocationMedium1        403 ns        403 ns    1663228
BM_ATenCPUTensorAllocationMedium2        448 ns        448 ns    1595004
BM_ATenCPUTensorAllocationBig1           532 ns        532 ns    1352634
BM_ATenCPUTensorAllocationBig2          4486 ns       4486 ns     160978
```
Changed:
```
Running ./tensor_allocation
Run on (48 X 2501 MHz CPU s)
CPU Caches:
  L1 Data 32K (x24)
  L1 Instruction 32K (x24)
  L2 Unified 256K (x24)
  L3 Unified 30720K (x2)
-------------------------------------------------------------------------
Benchmark                                  Time           CPU Iterations
-------------------------------------------------------------------------
BM_MakeStorageImpl                       141 ns        141 ns    4803576
BM_StorageImplCtor                        55 ns         55 ns   13129391
BM_MallocStorageImpl                      64 ns         64 ns   11088143
BM_TensorImplCtor                         23 ns         23 ns   31616273
BM_MallocTensorImpl                      101 ns        101 ns    7017585
BM_Malloc_1                               39 ns         39 ns   18523954
BM_MakeTensorFromStorage                 118 ns        118 ns    5877919
BM_MakeVariableFromTensor                452 ns        452 ns    1565722
BM_ATenCPUTensorAllocationSmall1         384 ns        384 ns    1819763
BM_ATenCPUTensorAllocationSmall2         389 ns        389 ns    1857483
BM_ATenCPUTensorAllocationMedium1        425 ns        425 ns    1646284
BM_ATenCPUTensorAllocationMedium2        430 ns        430 ns    1561319
BM_ATenCPUTensorAllocationBig1           508 ns        508 ns    1309969
BM_ATenCPUTensorAllocationBig2          3799 ns       3799 ns     173674
```

lstm benchmark:
Before:
```
INFO:lstm_bench:Iter: 1 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 21 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 41 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 61 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 81 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 101 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 121 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 141 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 161 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 181 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 201 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 221 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 241 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 261 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 281 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 301 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 321 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 341 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 361 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 381 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Done. Total EPS excluding 1st iteration: 0.8k
```

After:
```
INFO:lstm_bench:Iter: 1 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 21 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 41 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 61 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 81 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 101 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 121 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 141 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 161 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 181 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 201 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 221 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 241 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 261 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 281 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 301 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 321 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 341 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 361 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 381 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Done. Total EPS excluding 1st iteration: 0.8k
```

Reviewed By: ezyang

Differential Revision: D13202632

fbshipit-source-id: db6d2ec756ed15b0732b15396c82ad42302bb79d
2019-02-12 21:16:34 -08:00
Sebastian Messmer
9696fee635 Register CUDA kernels for caffe2 operators (#16691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16691

Previous diffs already introduced a macro that registers caffe2 CPU kernels with c10.
This now also registers the CUDA kernels with it.

Reviewed By: bwasti

Differential Revision: D13901619

fbshipit-source-id: c15e5b7081ff10e5219af460779b88d6e091a6a6
2019-02-12 17:24:01 -08:00
Yinghai Lu
f435fb8290 Allow customization of blob node in net_drawer (#16915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16915

TSIA

Reviewed By: ipiszy

Differential Revision: D14018010

fbshipit-source-id: df5ccc06fa37f08e7a02a8acc466c4ad47afe04e
2019-02-12 15:02:50 -08:00
Edward Yang
40528efeac More docs for methods in operator.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16826

Reviewed By: izdeby

Differential Revision: D13979891

fbshipit-source-id: df8391ffaff0d44845057bb839f05aea6fc5712c
2019-02-12 08:19:38 -08:00
Sebastian Messmer
64aa769ef9 Minimize templated code in caffe2 operator wrapper (#16965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16965

Instead of having one large templated function to wrap the caffe2 op, minimize the amount of templated code.
Non-templated code can be reused between different operators and decreases binary size.

Reviewed By: orionr

Differential Revision: D14018806

fbshipit-source-id: bedd4152eec21dd8c5778446963826316d210543
2019-02-11 14:15:43 -08:00
Xiaodong Wang
af0c79eed4 Catch cudaError_t return val (nodiscard in rocm) (#16399)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16399

Catching cudaError_t return values in a few places, because it's nodiscard in rocm. Unless we add -Wno-unused-result, it'll end up with a compilation error.

Also in c10/cuda/test, check whether a host has GPU or not. We were silently throwing out the error before (so not really testing the cuda api).

Reviewed By: bddppq

Differential Revision: D13828281

fbshipit-source-id: 587d1cc31c20b836ce9594e3c18f067d322b2934
2019-02-11 13:18:36 -08:00
Bram Wasti
12bace141b Revert D13970381: [caffe2] Add visibility to registry class to fix ubsan error
Differential Revision:
D13970381

Original commit changeset: 763db24b8a98

fbshipit-source-id: dda8672ed0bc6fecc4dde5ce73feb99e15205978
2019-02-08 16:21:10 -08:00
Bram Wasti
54c981d9a9 Add visibility to registry class to fix ubsan error (#16792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16792

fix

Reviewed By: ezyang

Differential Revision: D13970381

fbshipit-source-id: 763db24b8a98a2757a63b77c70c8c68ba47f31e6
2019-02-08 10:17:47 -08:00
Edward Yang
b9b0be7af2 Remove Legacy entry point. (#16721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16721

The very key line is we have to set the stream to the default
stream before calling the allocator.  This is very interesting.
It shouldn't be necessary, but seemingly is!

Reviewed By: dzhulgakov

Differential Revision: D13943193

fbshipit-source-id: c21014917d9fe504fab0ad8abbc025787f559287
2019-02-08 09:33:58 -08:00
Edward Yang
5c982622b0 Delete duplicate copy of THCCachingAllocator (round two). (#16615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16615

This is another go at landing https://github.com/pytorch/pytorch/pull/16226
Now that the caching allocator is moved to c10_cuda, we can
delete the duplicate copy from Caffe2.

The difference between this and the previous PR is that this
version faithfully maintains the binding code; in particular,
we end up with a SECOND copy of the caching allocator in
this patch.  I verified that this code does NOT cause a crash
in the workflow we canaried last time.

In further diffs, I plan to eliminate the second copy, and then
adjust the binding code.

Reviewed By: dzhulgakov

Differential Revision: D13901067

fbshipit-source-id: 66331fd4eadffd0a5defb3cea532d5cd07287872
2019-02-08 09:33:55 -08:00
Sebastian Messmer
2c713032a1 Don't automatically handle context parameter (#16867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16867

Some caffe2 operators (example: BBoxTransform) have not just one template parameter which is the context, but might have multiple template parameters.
Because of this, we can't handle the context parameter inside the macro.

Reviewed By: bwasti

Differential Revision: D13995696

fbshipit-source-id: f55c3be913c8b125445a8d486846fc2fab587a63
2019-02-07 20:53:17 -08:00
Sebastian Messmer
64339dbd51 Fix and re-enable test case (#16643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16643

The test was disabled in D13908117 because it conflicted with another diff that was about to land.
Now fixed the merge conflict and re-landing it.

Reviewed By: ezyang

Differential Revision: D13911775

fbshipit-source-id: b790f1c3a3f207916eea41ac93bc104d011f629b
2019-02-07 13:58:16 -08:00
Sebastian Messmer
6750e1e3e9 C10_REGISTER_CAFFE2_OPERATOR: Macro for registering c2 kernels (#16548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16548

With this macro, a caffe2 operator can now directly be registered with c10.
No need to write custom wrapper kernels anymore.

Differential Revision: D13877076

fbshipit-source-id: e56846238c5bb4b1989b79855fd44d5ecf089c9c
2019-02-07 13:58:14 -08:00
Edward Yang
4404762d7d Rename IntList to IntArrayRef. (#16751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16751

This was made more complicated by the fact that ivalue::IntList
is a thing.  So I had to fix all of the sites where we referring
to IValue post facto.

The following codemods were run, in this order:

```
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntList IntArrayRef
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntArrayRef::create IntList::create
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in ivalue::IntArrayRef ivalue::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in Tag::IntArrayRef Tag::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in isIntArrayRef isIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in toIntArrayRef toIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'Shared<IntArrayRef>' 'Shared<IntList>'
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'intrusive_ptr<IntArrayRef>' 'intrusive_ptr<IntList>'
```

Some manual fixups were done afterwards; they can be reviewed separately
at https://github.com/pytorch/pytorch/pull/16752

Reviewed By: dzhulgakov

Differential Revision: D13954363

fbshipit-source-id: b5c40aacba042402155a2f5a229fa6db7992ac64
2019-02-05 14:54:34 -08:00
Bram Wasti
af4d2b889c Enable undefined at::Tensor to be passed as Output (#16730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16730

With Jerry's new updates, Tensor must be defined; as a result I've needed to update the shim for caffe2 ops being used in PyTorch

Reviewed By: smessmer

Differential Revision: D13946950

fbshipit-source-id: 6f77877c61a743f82bdfc2ad04d6ab583000cc18
2019-02-05 12:56:46 -08:00
Dmytro Dzhulgakov
3796cbaf7a Try to turn off zero-out of tensors fully
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16601

Reviewed By: ezyang

Differential Revision: D13893776

fbshipit-source-id: 3190258f2591540dc54ad8504ac6ded998bef384
2019-02-04 23:59:11 -08:00
Jerry Zhang
cb9740a608 Tensor method rename size()->numel() - 1/3
Summary: Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: dzhulgakov

Differential Revision: D13944296

fbshipit-source-id: 67e97c2cf45889d25f2cb3e2203cecba03c8a3aa
2019-02-04 23:33:17 -08:00
Sebastian Messmer
aaa8ace486 Implement new c10 dispatcher (#16625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16625

This is a squash of multiple PRs that refactored the old c10 dispatcher into a new one that follows the c10 dispatcher design doc.
It is now unboxed and follows the Stack semantics from JIT. It also uses the runtime JIT schema instead of its own compile time schema definitions.

Reviewed By: ezyang

Differential Revision: D13907069

fbshipit-source-id: edcc4806ccd21474fdfb5a98516219b1956db13d
2019-02-01 13:52:01 -08:00
Bram Wasti
e4c1b51d82 Shim caffe2 GetRepeatedArgument helper for use with Ivalue (#16519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16519

GetRepeatedArguments is needed for some ops

Reviewed By: dzhulgakov

Differential Revision: D13864293

fbshipit-source-id: a39255cd391c28acd75a6f0e81d558542417e032
2019-01-31 17:33:57 -08:00
James Reed
dfb081a7e4 Fix a lot of C++ build warnings (#16411)
Summary:
I went through my build log and did what I thought were reasonable fixes to all the C++ compilation warnings that came up
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16411

Differential Revision: D13901006

Pulled By: jamesr66a

fbshipit-source-id: 02df4e3e5a5c8dd9e69ac9f065cd3f2a80645033
2019-01-31 14:35:56 -08:00
Dmytro Dzhulgakov
a061e3fd77 Back out "Revert D13596031: Improve c2-aten tensor interop and add proper testing" (#16514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16514

Original commit changeset: dc371697f14b
Relanding https://github.com/pytorch/pytorch/pull/15860 - the problem was that layer_norm was using at::empty which is not yet on mobile

Reviewed By: ezyang

Differential Revision: D13861480

fbshipit-source-id: e2116da32bc117175c96b9151b1beba9b31eff36
2019-01-31 13:38:55 -08:00
Bram Wasti
1ff46f03ed Fix SIOF in torch using caffe2 registry (#16473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16473

This resolves the issues associated with caffe2 initialization (specifically the REGISTER_FUNCTION_SCHEMA_OPERATOR calls) being run after Torch's static op registration calls.

The fix employs a Meyers singleton wrapped in the constructor of a type.  Everything is placed inside a macro for ease of use.

Reviewed By: smessmer

Differential Revision: D13854306

fbshipit-source-id: ecf60861f229532826fae254974e9af4389055df
2019-01-31 13:04:11 -08:00
Bram Wasti
1efad7f6be Swap Caffe2 operator constructor to pass arguments by value (#16576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16576

Allows instantiation of an operator with arguments passed by move rather than by explicit copy,

per Sebastian's suggestion

Reviewed By: smessmer

Differential Revision: D13882416

fbshipit-source-id: bc8d50e73f5a1ae87155b0cf96799b8573a7a8fa
2019-01-31 13:04:09 -08:00
Sebastian Messmer
12f92f453b Kernel gets Stack* instead of ArrayRef<IValue> (#16282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16282

This changes the core kernel abstraction to be a function taking a stack, popping its arguments from the stack and pushing results to the stack,
instead of getting arguments as ArrayRef<IValue> and returning an output IValue.

Caffe2 operators need to have a way to pass in preallocated output tensors.
The convention for them is to get all inputs *and* outputs on the stack and also return all of them, i.e. a caffe2 op will always have inputs == outputs.
This will probably change in later diffs towards making the outputs in-arguments optional in the JIT schema.

Reviewed By: ezyang

Differential Revision: D13792335

fbshipit-source-id: e9cc2b5e438cc4653e1f701633a154b92b604932
2019-01-29 18:22:51 -08:00
Edward Yang
279238f0b8 Back out "Delete duplicate copy of THCCachingAllocator." (#16510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16510

This diff was supposed to be memory usage neutral, but based on
some internal flows involving cuDNN, it was not. Reverting pending
further investigation.

Original commit changeset: 03f1ebf7f11c

Reviewed By: xw285cornell

Differential Revision: D13863610

fbshipit-source-id: 15517e255fd6b0c064b65fb99f0ef19742236cfd
2019-01-29 15:44:19 -08:00
Edward Yang
3b337e7892 Revert D13596031: Improve c2-aten tensor interop and add proper testing
Differential Revision:
D13596031

Original commit changeset: d20b601e06ba

fbshipit-source-id: dc371697f14b3893a9164380a39e7a49d8d68ecf
2019-01-29 07:14:57 -08:00
Dmytro Dzhulgakov
5e21e0fe75 Improve c2-aten tensor interop and add proper testing (#15860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15860

Few changes (which are harder to split in separate diffs, so together):
- make conversion explicit (as they can throw to avoid surprises)
- fix tensor legacy dispatch not initialized when tensor is created on C2 side
- add a bunch of invariants to enforce

Reviewed By: ezyang

Differential Revision: D13596031

fbshipit-source-id: d20b601e06ba47aeff2f6e8e15769840e2d46108
2019-01-28 23:41:50 -08:00
Sebastian Messmer
504bcb276c Remove state from schema and calling API (#16180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16180

Only the kernel knows about its state, the caller doesn't see it anymore.

Reviewed By: ezyang

Differential Revision: D13744071

fbshipit-source-id: cb00ff1a881508c1b36ac4123bee1f68ca02ca9c
2019-01-28 17:46:08 -08:00
Jerry Zhang
e866bc7c88 Remove dims() in caffe2::Tensor (#16356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16356

att

Reviewed By: dzhulgakov

Differential Revision: D13813197

fbshipit-source-id: 68c0fb43404536f622422c51949c819d8a037aa5
2019-01-28 12:42:42 -08:00
Sebastian Messmer
05678d0bfa Op-calling API can handle state (#16177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16177

Change the API for calling operators so that it can store state in an OpKernel object.
This diff doesn't store the state there yet, that comes in a follow up diff.

Reviewed By: ezyang

Differential Revision: D13742889

fbshipit-source-id: 20511a9a1b9f850074e50634d4b4acf87f8c6ecd
2019-01-28 11:46:05 -08:00
Edward Yang
d2cdffaf37 More documentation on caffe2::Operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16371

Reviewed By: dzhulgakov

Differential Revision: D13820472

fbshipit-source-id: efccea0e92c86d30ec2bdda50eb9aab8a3a1504d
2019-01-28 07:41:14 -08:00
Jerry Zhang
f4e54fd659 trying to fix testX (#16370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16370

Passed locally, but it seems testX has some problem.

Reviewed By: ezyang

Differential Revision: D13820250

fbshipit-source-id: e4ad9d1ec99508867d4ead46753a7fb7019c50bd
2019-01-25 17:02:21 -08:00
Bram Wasti
13fde345fb plug caffe2 into jit (#16388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16388

previous diff broke master -- this refactors out the custom_operator.cpp file into a separate header + cpp pair (caffe2_operator.{h,cpp})

Reviewed By: smessmer

Differential Revision: D13823550

fbshipit-source-id: 00e005e650336132d05aef97c1f0e5242ccad5ba
2019-01-25 16:52:32 -08:00
Jerry Zhang
539894d70a Remove caffe2::ShareData (#16139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16139

Original commit changeset: 4b15a4c62995

Reviewed By: dzhulgakov

Differential Revision: D13677464

fbshipit-source-id: 1a644a88fac02b44feebac48ccc01bc72cc47edb
2019-01-25 15:39:11 -08:00
Zachary DeVito
c42431bd7a Revert D13740752: [c10] plug caffe2 into jit
Differential Revision:
D13740752

Original commit changeset: 2d9383574d42

fbshipit-source-id: e9ff217a438720423340a10af7fa263b33f2ae24
2019-01-25 12:29:19 -08:00
Edward Yang
45602ce9a2 Delete Tensor::swap(), replace with pointer swap (#12730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12730

i-am-not-moving-c2-to-c10

Reviewed By: smessmer

Differential Revision: D10415430

fbshipit-source-id: 8a2ce8611c5fa77bbbd73fb6788c1baa3b370f07
2019-01-25 08:25:07 -08:00
Bram Wasti
6d2aee4a9b plug caffe2 into jit (#16331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16331

Temporary measure to enable caffe2 ops in pytorch

Reviewed By: smessmer

Differential Revision: D13740752

fbshipit-source-id: 2d9383574d42ce84ee471aba32eeb4f5a0cc7a4c
2019-01-24 22:28:21 -08:00
Bram Wasti
d4b60f4014 Add RunOperator for using FunctionSchema registered ops easily in caffe2 (#16173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16173

Helper to make it easy to run ops in caffe2

Reviewed By: smessmer

Differential Revision: D13468240

fbshipit-source-id: 2276c7870af6dcdf829957f005fd16ac1ef319b5
2019-01-24 22:28:19 -08:00
Bram Wasti
3b6b777a11 Add correct Input() shim to caffe2 operator impl (#16048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16048

This enables full shimming of the operator (previously it was only
Output() shimmed).

Reviewed By: smessmer

Differential Revision: D13468241

fbshipit-source-id: c853b775ab5cdcd968f4a6cc4766e91c3c6b1c45
2019-01-24 22:28:18 -08:00
Jerry Zhang
0b470d0a3b Add Test for ReinitializeTensor (#16338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16338

att

Reviewed By: ezyang

Differential Revision: D13806760

fbshipit-source-id: 322b9b7d314aeb0194f52b803ca35c0cb8efcdec
2019-01-24 15:05:21 -08:00
Will Feng
2a70f24cce Add thread-local guard: at::AutoNonVariableTypeMode (#15939)
Summary:
This PR adds thread-local guard (`at::AutoNonVariableTypeMode`) to make sure that in VariableType.cpp the operations on baseType still dispatch to non-Variable type, even if the parameters will become Variables after the Tensor/Variable merge. We achieve this by making `legacyTensorType()` and `getType()` check the `at::AutoNonVariableTypeMode` guard to decide whether to return non-Variable type for a variable.

This is part of the VariableImpl/TensorImpl merge work: https://github.com/pytorch/pytorch/issues/13638.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15939

Reviewed By: ezyang

Differential Revision: D13640980

Pulled By: yf225

fbshipit-source-id: d12c2543822958558d7d70d36c50999a5eb8783f
2019-01-24 14:33:03 -08:00
Edward Yang
792cb774f1 Delete duplicate copy of THCCachingAllocator. (#16226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16226

Now that the caching allocator is moved to c10_cuda, we can
delete the duplicate copy from Caffe2.

Reviewed By: dzhulgakov, smessmer

Differential Revision: D13762540

fbshipit-source-id: 03f1ebf7f11c68c19aa0d66110156fe228da6138
2019-01-24 12:06:57 -08:00
Jerry Zhang
0a3932acb2 Fix comparison in ReinitializeTensor (#16294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16294

In `ReinitializeTensor`, we compare `tensor->GetDevice()` with `options.device()`. At the callsite, however, we only provide an option with a `device_type`, so the `device_id` of `options` is always the default (-1). The tensor, although constructed with a default `device_id`, takes its actual `device` from its `Storage` once data is allocated, i.e. from the underlying `DataPtr`, which matches the `device` of the operator's `Context` and therefore has a non-default `device_id`.

Therefore, every `ReinitializeTensor` call finds that the devices do not match, and after the call they still do not match. That is why we allocate a new Tensor every time, causing perf regressions for ops that use `ReinitializeTensor` on multiple GPUs.

Reviewed By: BIT-silence

Differential Revision: D13795635

fbshipit-source-id: 24d6afa1a0196a32eb0134ee08b4280244cdb0c3
2019-01-23 19:29:29 -08:00
Benny Chen
f25322fb97 Fix issues under caffe round 1
Summary: Some automation to fix uninitialized members in caffe2 code. Ran a canary to make sure there is no regression in prod, but I'm not sure how to test comprehensively for caffe2.

Reviewed By: ezyang

Differential Revision: D13776185

fbshipit-source-id: fb2a479971cc0276d8784be1c44f01252410bd24
2019-01-23 19:04:59 -08:00
Edward Yang
07a090247a Change data() accessor in Caffe2 to return non-const pointer. (#16176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16176

This makes PyTorch and Caffe2's data() method line up.
Historically, PyTorch made no distinction between tensors
with const or non-const data, and thus provided a
non-const pointer with data() member.  Changing the API to
return a const-pointer would break all mutable code, whereas
changing the Caffe2 API to change a pointer doesn't break
any code, *except* for code which required an exact match
on const-ness (e.g., in template arguments).  Since the latter
is less disruptive, we've opted for it here.

The few places downstream that broke due to this are fixed
in this patch.

Reviewed By: smessmer

Differential Revision: D13742916

fbshipit-source-id: baa4b4544cfdf7c1f369f4d69a1e0d5953c1bd99
2019-01-23 13:55:24 -08:00
Shahzad Lone
53ae8bc64d Reserve vectors that we know the size in advance for. (#16201)
Summary:
Save reallocation costs by reserving vectors according to how many elements we expect to insert.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16201

Differential Revision: D13762594

Pulled By: ezyang

fbshipit-source-id: 7e3bfe421489dde48a2ddb0920dd155f69baecc0
2019-01-22 08:02:40 -08:00
Sebastian Messmer
c9044166a5 Make c10 dispatcher use boxed kernel function pointers (#16051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16051
This changes the kernels stored in the c10 dispatcher from plain C function pointers to IValue-based KernelFunction*.

Note that KernelFunction is currently taking an `ArrayRef<IValue>` as arguments. A later diff will change that to it taking a `Stack*`.

Reviewed By: ezyang

Differential Revision: D13684518

fbshipit-source-id: 1fa54f60cec2e967b92a4a043d6e3ac1627ed991
2019-01-18 16:02:15 -08:00
Jerry Zhang
3f4bb3d493 rest of uses for deprecation of dims() in Tensor (#16118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16118

att

Differential Revision: D13697211

fbshipit-source-id: 12bf6edd1794240ac748cc1b8fecb0c1e8eb9112
2019-01-18 11:52:12 -08:00
Jerry Zhang
da578b7dcf Add defined() to caffe2::Tensor (#16125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16125

Add defined() method to check whether the Tensor is defined.

Reviewed By: ezyang

Differential Revision: D13719222

fbshipit-source-id: ff8efef2159ed1026bd16acaea40c768a1e20a47
2019-01-18 11:03:36 -08:00
Edward Yang
b9b160d86f Remove ATen/Half.h and ATen/core/Half.h forwarding headers.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16115

Reviewed By: bddppq

Differential Revision: D13717049

fbshipit-source-id: fb1d690183a932a1fa1a2d235f3219520f51620a
2019-01-18 10:55:21 -08:00
Dmytro Dzhulgakov
aaff2fecda Remove caffe2::Tensor copy constructor (#15416)
Summary:
Based on offline discussion, it should be less surprising to users of the existing code. Thus caffe2::Tensor is now a move-only class (as it used to be); explicit calls to UnsafeSharedInstance() are necessary to get shared_ptr behavior.

This change also identified a few places that misused the copy constructor - those are fixed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/15416

Reviewed By: Yangqing

Differential Revision: D13524598

fbshipit-source-id: aea12d6dff77342606fa88ce4ddddbff266245a7
2019-01-18 00:31:56 -08:00
Sebastian Messmer
3e85a2bcbf Move c10 dispatcher back to ATen/core (#16050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16050

The c10 dispatcher will (soon) depend on IValue and IValue can't be moved to c10 yet because it depends on at::Tensor, which depends on legacy Type dispatch and we don't want the legacy dispatch in c10.

So instead, we move the c10 dispatcher back to ATen/core until we can actually move at::Tensor to c10.

Reviewed By: ezyang

Differential Revision: D13684517

fbshipit-source-id: 1125f4254223907c52f96ff73034f6d4ae9fd0a7
2019-01-17 15:56:52 -08:00
QingfengLi
ded4ff87af fix a little error in comments (#15922)
Summary:
There is a small error in the comment: given "A->B", it is task B that must start after task A finishes, not the other way around.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15922

Differential Revision: D13709579

Pulled By: ezyang

fbshipit-source-id: 735afe83f4532b7c7456da3e96209b3e07071f37
2019-01-17 00:25:23 -08:00
fulltopic
c7a48da493 Corresponding data type for BYTE (#15627)
Summary:
TensorProto.DataType in caffe2/proto/caffe2.proto defines BYTE = 3, but there is no corresponding TypeMeta defined in DataTypeToTypeMeta in caffe2/core/types.cc. This issue caused the C++ MNIST + LMDB tutorial to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15627

Differential Revision: D13709602

Pulled By: ezyang

fbshipit-source-id: d4826d0f9b3975e6a8478d4bad1abbbedcaea197
2019-01-17 00:17:56 -08:00
Edward Yang
5b2d30ec85 Revert D12812029: [pt1][tensor] Remove deprecated caffe2::Tensor APIs
Differential Revision:
D12812029

Original commit changeset: ea0c3dd882be

fbshipit-source-id: d5bb4cbb1d7c9be08789599a7db0fb3313f3dbc4
2019-01-16 14:53:20 -08:00
Jerry Zhang
5353847b19 Remove deprecated caffe2::Tensor APIs (#15814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15814

Plan is to remove the APIs we want to deprecate one by one and make sure it still builds in sandcastle and ossci

Reviewed By: ezyang

Differential Revision: D12812029

fbshipit-source-id: ea0c3dd882bec95fcd4507160ebc61f598b6d040
2019-01-15 18:42:04 -08:00
Jerry Zhang
5e72e99c86 Remaining Tensor API fixes - dims() -> sizes() (#15743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15743

Remaining fixes so that D12812029 will compile

Reviewed By: dzhulgakov

Differential Revision: D13535559

fbshipit-source-id: 2c8b3403570c8c35ac8efe2d827233abc0e6e0d1
2019-01-15 18:42:02 -08:00
Edward Yang
8b5894491c Comment about CuDNNWrapper (#15496)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15496

Differential Revision: D13544130

Pulled By: ezyang

fbshipit-source-id: 51bdd8312b482925b30a478774cdfa629c57ee4e
2019-01-15 18:01:12 -08:00
Duc Ngo
10b16953d1 nomnigraph - easy - use new test utils in converter_nomnigraph_test (#15751)
Summary:
Use new test utils in converter_nomnigraph_test , and add utils to set device option name, external inputs, outputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15751

Differential Revision: D13586228

Pulled By: duc0

fbshipit-source-id: ff809dd7bf9f30641ce2a6fef7e2810f005521c2
2019-01-14 18:38:38 -08:00
Jerry Zhang
6371bc76a9 Back out "[pt1][tensor] Remove caffe2::ShareData" (#15983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15983

Original commit changeset: 6e4275d02f4c

Reviewed By: supertopher, Yangqing

Differential Revision: D13644123

fbshipit-source-id: 4b15a4c62995c0e68aad58465600409e302e6504
2019-01-12 07:07:22 -08:00
Jerry Zhang
fd0ed2e298 Tensor reinitialization codemod - 4/5 (#15967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15967

Codemod generated with clangr shard mode, 25 files per diff.
To eliminate partially initialized Tensors, we split the initialization of local Tensor variables into two steps: first declare an uninitialized Tensor, then
call `ReinitializeTensor` to initialize it.
Motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13586735

fbshipit-source-id: eae2d79e1107a2e813ce3809e690af4706aaa9ca
2019-01-11 16:41:19 -08:00
Lin Yang
d042914221 FC shape inference should use int64_t (#15961)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15961

as title

Reviewed By: yinghai

Differential Revision: D13634427

fbshipit-source-id: ec7d168b6272f0dac8a693401cfd0bea368f929a
2019-01-11 14:28:39 -08:00
Dmytro Dzhulgakov
96ea2594d8 Don't call cudaStreamDestroy at destruction time (#15692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15692

It was leading to ocassional crashes with dynamically linked CUDA because runtime was already destroyed.

Also, unique_ptr<T[]> is more suitable than deque<T> for the purpose.

Reviewed By: Yangqing

Differential Revision: D13571988

fbshipit-source-id: 37eb26dfbe361c49160367b53f87bd037c6c0e46
2019-01-11 12:36:41 -08:00
Sebastian Messmer
da7468853a caffe2::Tensor::is_same() (#15407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15407

Don't ask the tensor for its intrusive pointer if we just want to check if two tensors are the same.
This mirrors ATen APIs.

Reviewed By: dzhulgakov

Differential Revision: D13520389

fbshipit-source-id: 681317f36f480ab60e532bb08a073f98f39770fd
2019-01-10 16:22:25 -08:00
Sebastian Messmer
d408324350 Move files to/from c10/core and c10/util (#15316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15316

This starts cleaning up the files in c10 according to the module structure we decided on.

Move to c10/util:
- Half.h, Half-inl.h, Half.cpp, bitcasts.h

Move to c10/core:
- Device.h, Device.cpp
- DeviceType.h, DeviceType.cpp

i-am-not-moving-c2-to-c10

Reviewed By: dzhulgakov

Differential Revision: D13498493

fbshipit-source-id: dfcf1c490474a12ab950c72ca686b8ad86428f63
2019-01-10 16:22:22 -08:00
Sebastian Messmer
6b64052e20 Remove Context from c10 operator schemas (#15312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15312

Context will soon be entirely obsolete. Remove it from the operator schema interface.

Reviewed By: dzhulgakov

Differential Revision: D13495323

fbshipit-source-id: caa0f8f092cd6284e510c3e1e3374fe2f8338364
2019-01-10 16:22:20 -08:00
Orion Reblitz-Richardson
4edc8273eb Allow for registration after GlobalInit (#15876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15876

Build changes made it so some .so libraries are now registered after GlobalInit is called. Although this shouldn't be common, it also shouldn't be explicitly excluded. These changes allow for late Caffe2 registration, but also warn in that case.

Reviewed By: kuttas

Differential Revision: D13608186

fbshipit-source-id: 0ca7bcd32516d374077db0c2548cf8c28ccdd5f6
2019-01-10 09:35:33 -08:00
Jerry Zhang
0c32e1b43e use C10_MOBILE/ANDROID/IOS (#15363)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15363

Didn't define C10_MOBILE in the numa file move diff: D13380559
move CAFFE2_MOBILE/ANDROID/IOS to c10

```
codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_MOBILE" "C10_MOBILE"
codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_ANDROID" "C10_ANDROID"
codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_IOS" "C10_IOS"

```

i-am-not-moving-c2-to-c10

Reviewed By: marcinkwiatkowski

Differential Revision: D13490020

fbshipit-source-id: c4f01cacbefc0f16d5de94155c26c92fd5d780e4
2019-01-09 15:08:20 -08:00
Sebastian Messmer
d562840910 Use C10Tensor in the dispatcher (#15195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15195

This removes the use of caffe2::Tensor or at::Tensor in the c10 dispatcher and only uses C10::Tensor.
It also changes output tensors to be passed as `const Tensor&` instead of `Tensor*` because we otherwise can't forward them in operator_c10wrapper.h.

Reviewed By: ezyang

Differential Revision: D13461640

fbshipit-source-id: 7f79925a7d60f01660a24bbfda47391af0c70ed3
2019-01-08 20:31:43 -08:00
Sebastian Messmer
8ac55a6812 Convert caffe2/aten Tensors to/from c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14820

Reviewed By: dzhulgakov

Differential Revision: D13348044

fbshipit-source-id: 95008e6ead3cfc478696b1c203769241d4cf6ca8
2019-01-08 20:31:42 -08:00
Sebastian Messmer
31d7c933af Implement c10::Tensor (#14819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14819

This is a minimal wrapper for a c10::TensorImpl,
maybe destined for greatness later when we move caffe2::Tensor or at::Tensor into c10.

Reviewed By: dzhulgakov

Differential Revision: D13348039

fbshipit-source-id: 874f515358e94f35dc7a4c3e55b35fde59c51ff1
2019-01-08 20:31:40 -08:00
Jerry Zhang
ede1f4ad05 Remove caffe2::ShareData (#15418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15418

Previously we are using Resize + ShareData.
Instead, we'll create a function on Tensor that clones itself with same storage.

Suppose we want `t` to `ShareData` with `t0`, Previous:
```
Tensor t(dims, CPU);
t.Resize(t0.sizes());
t.ShareData(t0);
```
Now:
```
Tensor t = t0.Alias();
```

Reviewed By: dzhulgakov

Differential Revision: D13507609

fbshipit-source-id: 6e4275d02f4c3356cbce91127f1b01111dc86b9f
2019-01-08 11:01:56 -08:00
Jerry Zhang
3ea5a9a66d Remove PythonOp non-CPU path and PytorchOp (#15417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15417

Right now, the way we test whether a Blob contains a CPU tensor in `PythonOpBase` is broken, which means the non-CPU path might never be taken.
Searching through the codebase, the non-GPU path is used in PythonDLPack, which in turn is used in PytorchOp, which is itself unused. So we remove the non-GPU path in this diff.

Reviewed By: dzhulgakov

Differential Revision: D13495011

fbshipit-source-id: 9fe9537f05026d2a2cf7051efa81d184de722710
2019-01-02 16:36:37 -08:00