Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35187
When I touch these files, lint always introduces unintended changes; to prevent that, we need to format the code first.
change is generated by:
arc f
Test Plan: integration test.
Differential Revision: D20587596
fbshipit-source-id: 512cf6b86bd6632a61c80ed53e3a9e229feecc2a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36172
Original commit changeset: 3d7801613f86
D20449887 broke some OSS tests as the OSS export sync wasn't working correctly.
Test Plan:
Manually export latest version to OSS to trigger the tests
+ test plan in D20449887
verified onnx tests are passing in https://github.com/pytorch/pytorch/pull/36172
Reviewed By: andrewwdye
Differential Revision: D20902279
fbshipit-source-id: bc30fcc9f5cc8076f69a5d92675fd27455948372
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31966
This has three parts:
* When `--caffe2_handle_executor_threads_exceptions` is set when a parallel execution step throws an exception it can hang waiting for async nets to finish. This adds cancellation code to cancel any async nets.
* This makes the exceptions returned from parallel workers pass a std::exception_ptr so the stack trace can be recorded with folly::SmartExceptionTracer (the capture pattern is sketched after this list).
* Define Cancel method at NetBase level to avoid pulling in unsupported AsyncSchedulingNet for fbandroid.
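For the second bullet, a minimal self-contained sketch of the capture-and-rethrow pattern (generic C++, not the caffe2 plan executor itself; the error container and the tracer hook are placeholders):
```
#include <exception>
#include <stdexcept>
#include <vector>

int main() {
  std::vector<std::exception_ptr> worker_errors;

  // A worker catches whatever was thrown and stores the exception object
  // itself, rather than flattening it into a string, so it can be rethrown later.
  try {
    throw std::runtime_error("worker failed");
  } catch (...) {
    worker_errors.push_back(std::current_exception());
  }

  // The coordinating thread can rethrow (or hand the pointer to a tracer such
  // as folly::SmartExceptionTracer) to recover the original exception.
  for (const auto& e : worker_errors) {
    try {
      std::rethrow_exception(e);
    } catch (const std::exception& ex) {
      (void)ex;  // inspect ex.what(), record the stack, etc.
    }
  }
  return 0;
}
```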
Test Plan:
Added unit tests for plan_executor
buck test //caffe2/caffe2:caffe2_test_cpu
buck test //caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100
Reviewed By: boryiingsu
Differential Revision: D19320177
fbshipit-source-id: d9939fcea1317751fa3de4172dfae7f781b71b75
Summary:
This is a reland of https://github.com/pytorch/pytorch/pull/36196
Before the fix, bazel spews the following multi-line warning for every single caffe2 operator:
```
In file included from ./c10/util/logging_is_google_glog.h:50,
from ./c10/util/Logging.h:26,
from ./caffe2/core/logging.h:2,
from ./caffe2/core/blob.h:13,
from ./caffe2/core/operator.h:18,
from ./caffe2/sgd/adadelta_op.h:1,
from caffe2/sgd/adadelta_op.cc:1:
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h: In instantiation of 'std::string* google::Check_LTImpl(const T1&, const T2&, const char*) [with T1 = int; T2 = long unsigned int; std::string = std::__cxx11::basic_string<char>]':
./caffe2/core/operator.h:192:5: required from 'const T& caffe2::OperatorBase::Input(int, caffe2::DeviceType) [with T = caffe2::Tensor; caffe2::DeviceType = c10::DeviceType]'
./caffe2/core/operator.h:890:48: required from 'const caffe2::Tensor& caffe2::Operator<Context>::Input(int, caffe2::DeviceType) [with Context = caffe2::CPUContext; caffe2::DeviceType = c10::DeviceType]'
./caffe2/sgd/adadelta_op.h:87:5: required from 'bool caffe2::SparseAdadeltaOp<Context>::RunOnDevice() [with Context = caffe2::CPUContext]'
./caffe2/sgd/adadelta_op.h:85:8: required from here
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:32: warning: comparison of integer expressions of different signedness: 'const int' and 'const long unsigned int' [-Wsign-compare]
722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
| ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:148:53: note: in definition of macro 'GOOGLE_PREDICT_TRUE'
148 | #define GOOGLE_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
| ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:1: note: in expansion of macro 'DEFINE_CHECK_OP_IMPL'
722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
| ^~~~~~~~~~~~~~~~~~~~
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36224
Test Plan: CI
Differential Revision: D20919506
Pulled By: malfet
fbshipit-source-id: b8b4b7c62dcbc109b30165b19635a6ef30033e73
Summary:
Otherwise, bazel spews the following multi-line warning for every single caffe2 operator:
```
In file included from ./c10/util/logging_is_google_glog.h:50,
from ./c10/util/Logging.h:26,
from ./caffe2/core/logging.h:2,
from ./caffe2/core/blob.h:13,
from ./caffe2/core/operator.h:18,
from ./caffe2/sgd/adadelta_op.h:1,
from caffe2/sgd/adadelta_op.cc:1:
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h: In instantiation of 'std::string* google::Check_LTImpl(const T1&, const T2&, const char*) [with T1 = int; T2 = long unsigned int; std::string = std::__cxx11::basic_string<char>]':
./caffe2/core/operator.h:192:5: required from 'const T& caffe2::OperatorBase::Input(int, caffe2::DeviceType) [with T = caffe2::Tensor; caffe2::DeviceType = c10::DeviceType]'
./caffe2/core/operator.h:890:48: required from 'const caffe2::Tensor& caffe2::Operator<Context>::Input(int, caffe2::DeviceType) [with Context = caffe2::CPUContext; caffe2::DeviceType = c10::DeviceType]'
./caffe2/sgd/adadelta_op.h:87:5: required from 'bool caffe2::SparseAdadeltaOp<Context>::RunOnDevice() [with Context = caffe2::CPUContext]'
./caffe2/sgd/adadelta_op.h:85:8: required from here
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:32: warning: comparison of integer expressions of different signedness: 'const int' and 'const long unsigned int' [-Wsign-compare]
722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
| ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:148:53: note: in definition of macro 'GOOGLE_PREDICT_TRUE'
148 | #define GOOGLE_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
| ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:1: note: in expansion of macro 'DEFINE_CHECK_OP_IMPL'
722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
| ^~~~~~~~~~~~~~~~~~~~
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36196
Differential Revision: D20909696
Pulled By: malfet
fbshipit-source-id: 16723355f473379ba9da6d3c33bd561b9724800a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34753
This improves support for exceptions and capturing stack traces in caffe2 async nets. We generally want to use exceptions everywhere we can in order to preserve stack information. It also makes the exception timestamp more accurate so multiple exceptions at the same time can be correctly ordered.
Test Plan: Updated the tests to use the new error semantics + adds a test to ensure the stack is correctly propagated through deferrable async scheduling.
Reviewed By: andrewwdye
Differential Revision: D20449887
fbshipit-source-id: 047fdf1bd52fd7c7c1f3fde77df9a27ed9e288e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35857
This fixes a lot of common ops for InferBlobShapesAndTypes and adds support for testing the inferred shapes and types of gradient ops.
Ops:
* Concat
* Split
* LeakyReLU
* Relu
* Prelu
* Gelu
* Elu
* Sinh, Tanh, Cosh
* Abs
* ... and a number of other simple element wise ops
Test Plan:
Added support to hypothesis test to check the shape and type of gradient ops.
Enabled it for all the ops I fixed the shape and type inference for.
buck test caffe2/caffe2/python/operator_test:
Reviewed By: pradeepd24
Differential Revision: D20806284
fbshipit-source-id: 77f796d9ff208e09e871bdbadf9a0a7c196b77f2
Summary:
Fixes incorrect usages of symbol annotations including:
1. Exporting or importing a function/class in an anonymous namespace.
2. Exporting or importing a function/class implementation in a header file. However, by removing the symbol annotations, they are now local symbols. If they need to remain global, I can move the implementations to the source file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35364
Differential Revision: D20670031
Pulled By: ezyang
fbshipit-source-id: cd8018dee703e2424482c27fe9608e040d8105b8
Summary:
And a few typos
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34791
Test Plan: CI
Differential Revision: D20524879
Pulled By: malfet
fbshipit-source-id: 58fa03bd6356979e77cd1bffb6370d41a177c409
Summary:
Throwing from a destructor leads to undefined behaviour (most often a segfault),
so it's better to leak memory than to segfault.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34756
Test Plan: Run `test_pytorch_onnx_caffe2`
Differential Revision: D20504228
Pulled By: malfet
fbshipit-source-id: 7a05776fea9036f602e95b8182f8493cb5886dab
Summary:
To reduce compilation time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34811
Test Plan: CI
Differential Revision: D20476992
Pulled By: malfet
fbshipit-source-id: 922cde93783fbfc04854851d7a05a635d5239792
Summary:
Replacing <ATen/core/Tensor.h> with <ATen/core/TensorBody.h> speeds up compilation of caffe2 operators by 15%
For example, it reduces pool_op.cu compilation from 18.8s to 16s
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34810
Test Plan: CI
Differential Revision: D20472230
Pulled By: malfet
fbshipit-source-id: e1b261cc24ff577f09e2d5f6428be2063c6d4a8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34105
make parallel_net_test.cc chronos conforming.
exclude gtest asserts that check thrown exceptions when exceptions are disabled.
Test Plan: CI green
Differential Revision: D20153525
fbshipit-source-id: 7371e559da948f46773fed09e3a23a77411d59e0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33954
fixes caffe2/core/module_test.cc on windows
misc lint fixes.
Test Plan: CI green
Reviewed By: malfet
Differential Revision: D20153512
fbshipit-source-id: aeae84a028e26edd65c7218611e3c49a8d9bb8c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33959
make sure clang on windows uses correct attributes.
add support for cl.exe style pragma attributes
Test Plan: CI green
Differential Revision: D20153548
fbshipit-source-id: bfbfd374e8f5e7d7b8598453c3ca2b6693a425f1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33563
When NVCC or Clang are driving CUDA compilation many math functions are declared by default, with a small difference: Clang marks them as `__device__` only, while NVCC uses both `__host__` and `__device__`. This makes every un-elaborated `min` or `max` function call from a `__host__` function generate a syntax error when Clang is used.
Fix the errors by using `std::min` and `std::max` from `<algorithm>`, since C++14 they are `constexpr` and can be used in the `__device__` code [1].
1. https://llvm.org/docs/CompileCudaWithLLVM.html#algorithm
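A minimal sketch of the fix pattern (a hypothetical host-side helper, not the actual caffe2 change; the real edits touch caffe2 CUDA sources):
```
// Host-side helper in a .cu file: under clang-driven CUDA compilation a bare
// `min(n, max_threads)` resolves to a __device__-only overload and fails to
// compile in __host__ code; qualifying with std:: picks the constexpr
// <algorithm> version, usable from both __host__ and __device__ code since C++14.
#include <algorithm>

int pick_block_size(int n, int max_threads) {
  return std::min(n, max_threads);
}
```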
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```
Reviewed By: ngimel
Differential Revision: D20005795
fbshipit-source-id: 98a3f35e8a96c15d3ad3d2066396591f5cca1696
Summary: The first run of the net is noisy sometimes - just run it twice.
Reviewed By: cheshen1
Differential Revision: D20039274
fbshipit-source-id: 639e65646bf52f3efe1ecd4bbcd0e413d9389b29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30734
What are specialized lists?
The IValues that hold List[int], List[Tensor], and List[AnythingElse] are different C++ types.
e.g. List[int] has a std::vector<int> while List[AnythingElse] holds a std::vector<IValue>.
Why do we have specialized lists?
When we first created the JIT we needed to bind the ATen C++ API which has std::vector<int>,
std::vector<Tensor> as inputs. The easiest way to match this API was to make our IValues contain
these same types. Conversion was just unwrapping the IValue, very easy and cheap.
What is the problem with specialized lists?
We end up with significant special-casing throughout the compiler. Other types like Dict are not
specialized. So in the Pickler, for instance, there is a single piece of logic to handle
their serialization. For Lists, we end up with multiple cases. Furthermore, it doesn't
match Python, leading to problems along translation boundaries. Our pickle serialization
is slightly different from Python's, so it is harder to load objects from our IValue serialization
as Python values.
They also make it harder to provide an easy-to-use user API. We'd like to match pybind11 for C++
bindings to TorchScript. This would entail having a single torch::List class (untemplated)
that can be used to construct inputs. This is made much harder if the underlying ivalue needs
to be different depending on the type inside the list. The ideal case would be to have a constructor like
```
template<typename T>
List(std::vector<T> foo);
```
It would then set up the type tags correctly based on type T, without the need for passing tags.
Do specialized lists improve perf?
Not in a way we have been able to measure. Our major concern initially was having to translate
a std::vector<IValue> to std::vector<int> to call ATen functions. This was especially a concern
for aten::_convolution which takes a number of mostly-constant lists of integers. However,
when we measure the effect of actually having to do this conversion for an aten::_convolution,
it does not take measurable time (benchmark results below).
This is true even if you use a trivial convolution (e.g. 1x1x1), and comment out the actual convolution code.
What are the issues removing them?
This PR removes list specialization but keeps the serialization format, and IValue APIs almost exactly
the same. The only visible change is that toTensorListRef and family have turned into toTensorVector
because they now return by value a copy of the list as a vector.
Further PRs can then clean up the complexity issues that arose from specialization. This will likely
involve removing the isTensorList/isIntList functions, and refactoring the code that used them to
work generically. At some point we will also change serialization to no longer write specialized
lists in the pickle binary. This is forward incompatible, so will go in its own PR.
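A minimal sketch of the visible API change (assuming the c10::IValue accessors named above; this only illustrates the call-site difference, not the internals):
```
#include <ATen/core/ivalue.h>
#include <vector>

void use_tensor_list(const c10::IValue& iv) {
  // Before: the specialized list could be exposed by reference,
  //   const std::vector<at::Tensor>& ts = iv.toTensorListRef();
  // After: the list is stored generically, so callers get a copy by value.
  std::vector<at::Tensor> ts = iv.toTensorVector();
  for (const auto& t : ts) {
    (void)t;  // consume each tensor
  }
}
```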
Benchmark:
```
import torch
import torch.nn as nn
import torch.nn.functional as F
import time
class MnistNet(nn.Module):
    def __init__(self):
        super(MnistNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 1, kernel_size=1)
        self.conv2 = nn.Conv2d(1, 1, kernel_size=1)
    def forward(self, x):
        for i in range(10):
            x = F.relu(self.conv1(x))
            x = F.relu(self.conv2(x))
        return x
model = MnistNet()
x = torch.rand(1, 1, 1, 1)
r = torch.jit.trace(model, x)
r(x)
r(x)
r(x)
r(x)
print(torch.jit.last_executed_optimized_graph())
while True:
    b = time.time()
    for i in range(100):
        r(x)
    e = time.time()
    print(e - b)
```
Results (no observable difference):
```
Before (actual conv)
0.13251137733459473
0.13260436058044434
0.13276338577270508
0.1327497959136963
0.13250041007995605
0.13270330429077148
0.13290190696716309
0.13265132904052734
0.13274288177490234
0.1326758861541748
0.13253355026245117
0.13254785537719727
0.13260746002197266
0.13285017013549805
0.13264012336730957
0.132490873336792
0.13280034065246582
0.13243484497070312
0.1325232982635498
0.1326127052307129
0.13264131546020508
0.13274383544921875
0.13298296928405762
0.1326909065246582
-------------------
After (actual conv)
0.13127517700195312
0.13150334358215332
0.13092470169067383
0.13102364540100098
0.13134360313415527
0.13155555725097656
0.13314104080200195
0.13151955604553223
0.13160037994384766
0.1315293312072754
0.13137340545654297
0.13148093223571777
0.131455659866333
0.1327371597290039
0.13134026527404785
0.13152337074279785
0.13151192665100098
0.13165974617004395
0.13403725624084473
0.13251852989196777
0.13135504722595215
0.1315624713897705
0.1317615509033203
0.1314380168914795
0.13157200813293457
--------------------
The following replace the convolution operator with a no-op, to show
that even if the conv op was made faster, then we still would not see
a difference:
Before (fake conv)
0.0069539546966552734
0.0069522857666015625
0.007120847702026367
0.007344722747802734
0.007689952850341797
0.007932662963867188
0.00761723518371582
0.007501363754272461
0.007532835006713867
0.007141828536987305
0.007174253463745117
0.007114410400390625
0.007071495056152344
------------------
After (fake conv)
0.007458209991455078
0.007337093353271484
0.007268190383911133
0.007313251495361328
0.007306575775146484
0.007468700408935547
0.0073091983795166016
0.007308483123779297
0.007538318634033203
0.007356882095336914
0.007464170455932617
0.007372140884399414
```
Test Plan: Imported from OSS
Differential Revision: D18814702
Pulled By: zdevito
fbshipit-source-id: 0371c73b63068fdc12f24b801371ea90f23531a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31335
When an error occurs in a net we end up cancelling all the async ops. If one error occurs it's highly likely other errors will occur as well.
Typically we see:
1. SendOp failed due to a network error
2. async scheduling cancels all other ops via `SetFinished("Cancelled");`
3. Another SendOp fails due to a network error and crashes the process when the exception is thrown.
This changes caffe2 ops to allow failing twice.
Test Plan: buck test //caffe2/caffe2:caffe2_test_cpu
Reviewed By: andrewwdye
Differential Revision: D19106548
fbshipit-source-id: 4b7882258a240894cc16d061a563c83a3214d3d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30915
Since we now have C++14, we don't need these c10::guts helpers anymore
ghstack-source-id: 95777609
Test Plan: waitforsandcastle
Differential Revision: D18869639
fbshipit-source-id: 97716f932297c64c6e814410ac47b444c33d4e2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30917
This is a C++14 feature, we can use this now.
ghstack-source-id: 95255753
Test Plan: waitforsandcastle
Differential Revision: D18869637
fbshipit-source-id: dd02036b9faeaffa64b2d2d305725443054da31b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31116
Changelist:
- remove BUILD_NAMEDTENSOR macro
- remove torch._C._BUILD_NAMEDTENSOR
- remove all python behavior that relies on torch._C._BUILD_NAMEDTENSOR
Future:
- In the next diff, I will remove all usages of
ATen/core/EnableNamedTensor.h since that header doesn't do anything
anymore
- After that, we'll be done with the BUILD_NAMEDTENSOR removal.
Test Plan: - run CI
Differential Revision: D18934951
Pulled By: zou3519
fbshipit-source-id: 0a0df0f1f0470d0a01c495579333a2835aac9f5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30912
Add a new data type ZERO_COLLISION_HASH .
Test Plan: ci
Reviewed By: boryiingsu
Differential Revision: D18843626
fbshipit-source-id: b2d8280f13c78b4a656cf95822198df59de7b64c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30315
The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
This is a reland of https://github.com/pytorch/pytorch/pull/29731 but
I've extracted all of the prep work into separate PRs which can be
landed before this one.
Some things of note:
* torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
* The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
* In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
* A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly
* I had to make torch_cpu/torch_cuda caffe2_interface_library so that they get whole-archived linked into torch when you statically link. And I had to do this in an *exported* fashion because torch needs to depend on torch_cpu_library. In the end I exported everything and removed the redefinition in the Caffe2Config.cmake. I am not too sure why the old code did it this way in the first place, but it doesn't seem to have broken anything to switch it this way.
* There are some uses of `__HIP_PLATFORM_HCC__` still in `torch_cpu` code, so I had to apply it to that library too (UGH). This manifests as a failure when trying to run the CUDA fuser. This doesn't really matter substantively right now because we still in-place HIPify, but it would be good to fix eventually. This was a bit difficult to debug because of an unrelated HIP bug, see https://github.com/ROCm-Developer-Tools/HIP/issues/1706
Fixes #27215 (as our libraries are smaller), and executes on
part of the plan in #29235.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18790941
Pulled By: ezyang
fbshipit-source-id: 01296f6089d3de5e8365251b490c51e694f2d6c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29337
This argument is needed by boxing wrappers so they're able to get a pointer to the corresponding unboxed kernel and call into it.
But if a kernel is registered in a boxed way, we don't need it and should hide this from the API.
This is especially needed for the backend fallback API where users would only be left wondering why this argument is there and what it does.
Also, hiding it allows us to potentially totally remove it in a future refactoring if we find some way to do so.
ghstack-source-id: 94481316
Test Plan: unit tests
Differential Revision: D18361991
fbshipit-source-id: 5cef26c896fe3f2a5db730d3bc79dcd62e7ef492
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29201
This is required for boxed backend fallback kernels (e.g. lazy, AMP) because they need to know which op was actually called.
ghstack-source-id: 94481313
Test Plan: I will add unit tests in a diff stacked on top
Differential Revision: D18282746
fbshipit-source-id: 339a1bbabd6aff31a587b98f095c75104dfc6f99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29731
The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
Some subtleties about the patch:
- There were a few functions that crossed CPU-CUDA boundary without API macros. I just added them, easy enough. An inverse situation was aten/src/THC/THCTensorRandom.cu where we weren't supposed to put API macros directly in a cpp file.
- DispatchStub wasn't getting all of its symbols related to static members on DispatchStub exported properly. I tried a few fixes but in the end I just moved everyone off using DispatchStub to dispatch CUDA/HIP (so they just use normal dispatch for those cases.) Additionally, there were some mistakes where people incorrectly were failing to actually import the declaration of the dispatch stub, so added includes for those cases.
- torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
- The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
- In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
- A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly. This situation also happens with custom C++ extensions.
- There's a ROCm compiler bug where extern "C" on functions is not respected. There's a little workaround to handle this.
- Because I was too lazy to check if HIPify was converting TORCH_CUDA_API into TORCH_HIP_API, I just made it so HIP build also triggers the TORCH_CUDA_API macro. Eventually, we should translate and keep the nature of TORCH_CUDA_API constant in all cases.
Fixes #27215 (as our libraries are smaller), and executes on
part of the plan in #29235.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18632773
Pulled By: ezyang
fbshipit-source-id: ea717c81e0d7554ede1dc404108603455a81da82
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29653
I didn't remove is_variable from Tensor for BC reasons, but I did
remove as many uses as I could from the codebase.
at::impl::variable_excluded_from_dispatch got moved to TensorBody.h
so that it's more widely accessible.
This diff is NOT semantics preserving. Here are the major differences:
- In a number of native operator implementations, we tested that arguments
are not variable. I replaced these with asserts that variable is
excluded from dispatch. I actually don't think these asserts are really
necessary now (they should certainly be true, but it's hard to get
it wrong), but I've kept them for old time's sake. At least, they'll detect
if you call these functions before you've processed variable (indicating
a bug in your kernel.)
- There are a number of places where we do a per-tensor test for being a
variable, for better error reporting when someone commits Tensor/Variable
confusion. Although these tests are substantively the same as the
tests above, in these cases I decided to *delete* the test entirely.
The reasoning is that in these cases, we didn't really care about
dispatch (also, see above; I'm not too sure we really need the dispatch
asserts), we cared about Tensor/Variable confusion. Since Tensor/Variable
confusion is impossible now, we don't need the tests. One of the key
factors which pushed me one way or another was whether or not a function
was doing per-tensor validation; if I kept the assert in such functions,
I'd repeatedly access the TLS. Even if we want to bring back the asserts,
they would have to go somewhere else.
Another similar idiom is the number of places we do !x.defined() ||
x.is_variable(); I treated this equivalently.
- nuclear_norm's computation of compute_uv is a bit weird, but I think
it's OK to just delete the is_variable case (I *suspect* that it is
always the case that self.is_variable(), but it doesn't really matter.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18496168
Pulled By: ezyang
fbshipit-source-id: 5a1ded931e0c10a6b758ba64a8380d34110e0c3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29670
This is the entry point for loading CUDA code; improve the error message to prompt users to check that GPU code is included.
Test Plan: Build without gpu code. Run the binary. Check that the new error message exists.
Reviewed By: yfeldblum
Differential Revision: D18453798
fbshipit-source-id: 63d9ec50acdf57ef4baf3f7d99c836c56bc1435e
Summary:
Also move the logic that installs the pybind11 headers from setup.py to cmake (to align with other headers).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29659
Differential Revision: D18458208
Pulled By: bddppq
fbshipit-source-id: cfd1e74b892d4a65591626ab321780c8c87b810d
Summary:
This diff adds the following:
- An AsyncIf to support conditional async execution. This op assumes that then_net and else_net are async scheduling nets. This op itself completes when every async op in the active net completes. Cancellation cancels the inner nets and the async ops.
- Unit tests targeting asynchronicity and error/cancellation handling.
Test Plan:
New unit tests
With --stress-runs=2000:
https://our.intern.facebook.com/intern/testinfra/testrun/4785074616784325
Reviewed By: ilia-cher
Differential Revision: D18051357
fbshipit-source-id: 1399a437b3ca63fd4ea0cf08d173f85b9242cc1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29052
Make sure we handle the case of multiple, async, terminal (no children)
and failing cpu ops.
Test Plan: AsyncIf tests
Reviewed By: yyetim
Differential Revision: D18276401
Pulled By: ilia-cher
fbshipit-source-id: 35b175dd025bc7e392056ac1331b159376a29e60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28024
We preallocated type ids to align them with ScalarType. At that point, the maximum type id was 10 and we used 11 to specify undefined type id.
However, since then, ScalarType got more additions, 11 isn't undefined anymore, and numbers 11-15 have meaning.
caffe2::TypeIdentifier also got its separate additions, 12 and upwards have meaning that differs from ScalarType.
I'm going with the (CI-tested) assumption that caffe2::TypeIdentifier and ScalarType actually don't need to be aligned
and remove the functionality for preallocated type ids. This simplifies our type ids.
ghstack-source-id: 92051872
Test Plan: unit tests
Differential Revision: D17936165
fbshipit-source-id: 2c9df2b9b3f35b3e319641c96638321ac3433d5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26509
We preallocated type ids to align them with ScalarType. At that point, the maximum type id was 10 and we used 11 to specify undefined type id, see https://github.com/pytorch/pytorch/pull/10139.
However, since then, ScalarType got more additions, 11 isn't undefined anymore, and numbers 11-15 have meaning.
caffe2::TypeIdentifier also got its separate additions, 12 and upwards have meaning that differs from ScalarType.
I'm going with the (CI-tested) assumption that caffe2::TypeIdentifier and ScalarType actually don't need to be aligned
and remove the functionality for preallocated type ids. This simplifies our type ids.
ghstack-source-id: 91896918
Test Plan: unit tests
Differential Revision: D17490109
fbshipit-source-id: 800c340d9d3556a99f6e3ffc33af14ad68d7cc59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26502
Create type ids at compile time instead of incrementing a counter at runtime. This is done by computing a compile time crc64 on the type name. We couldn't do this before, because we still used GCC4 and that compiler didn't support the use of `__PRETTY_FUNCTION__` in a constexpr context. However, since GCC5 this is possible and we can use this trick.
This does not change the semantics of preallocated type ids. I actually think we don't need to preallocate anymore, but I split the removal of preallocation into a separate diff to be able to test it separately.
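A minimal standalone sketch of the compile-time idea (using a simple FNV-1a hash for brevity instead of the crc64 mentioned above; this is not the actual caffe2 code and, as noted, requires GCC 5+ or Clang):
```
#include <cstdint>

// FNV-1a over a NUL-terminated string, evaluated at compile time.
constexpr uint64_t fnv1a(const char* s) {
  uint64_t h = 14695981039346656037ull;
  for (; *s != '\0'; ++s) {
    h = (h ^ static_cast<uint64_t>(*s)) * 1099511628211ull;
  }
  return h;
}

// __PRETTY_FUNCTION__ embeds T's name, so each instantiation hashes a
// different string and yields a distinct id without any runtime counter.
template <typename T>
constexpr uint64_t type_id() {
  return fnv1a(__PRETTY_FUNCTION__);
}

static_assert(type_id<int>() != type_id<double>(), "ids differ per type");
```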
ghstack-source-id: 91896920
Test Plan: unit tests
Differential Revision: D17488861
fbshipit-source-id: ce7b059d7c8686b69cb091a4a8beaf4b96391343
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27086
This is a major source of merge conflicts, and AFAICT isn't necessary anymore (it may have been necessary for some mobile build stuff in the past).
This is a commandeer of #25031
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D17687345
Pulled By: ezyang
fbshipit-source-id: bf6131af835ed1f9e3c10699c81d4454a240445f
Summary: Add helper function randomFill to test_utils.h so we can use it in benchmark scripts as well as tests.
Test Plan:
```
buck run mode/opt //tvm/sparse:cblas_bench
```
Reviewed By: yinghai
Differential Revision: D17759193
fbshipit-source-id: e4909b04e83ca9382ab4718855fb63743d028de1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26337
- Factor out boxing and unboxing functionality from the c10 dispatcher into a c10::KernelFunction class
- Move that class and everything else it depends on into ATen/core/boxing
- This also allows us to get rid of c10::KernelCache. Instead, we now store a pointer to the unboxed functor in c10::KernelFunction.
- We're also getting rid of the DispatchTableEntry struct and instead store KernelFunction directly.
- To make this work, we need to change the dispatcher calling API from Dispatcher::lookup().callBoxed/callUnboxed and OperatorEntry::lookup().callBoxed/callUnboxed to Dispatcher::callBoxed/callUnboxed and OperatorEntry::callBoxed/callUnboxed.
ghstack-source-id: 90459911
Test Plan: unit tests
Differential Revision: D17416607
fbshipit-source-id: fd221f1d70eb3f1b4d33092eaa7e37d25684c934
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25908
Original commit changeset: f6e961e88c01
device_option propagation is completely broken in Caffe2 for cases when pass-through operators are used. As an example, the Gather operator doesn't have a gradient and passes through its inputs, which results in incorrect detection of the components for sparse parameter aggregation (the component will be empty instead of the real device).
This diff is trying to fix this issue.
The original diff had a problem: Caffe2 does not handle cases when a device option is present but contains only metadata (for example the one for auto-generated reduction ops in the backward pass). This diff addresses that by merging device options during the backward pass.
Test Plan:
1. net_transform is finally working with Gather + FloatToHalf transformed model instead of failing because of incorrect number of components.
2. New unit-test.
3. Verify that previously broken benchmark is now passing
ezyang do you have suggestions what else I should test?
Reviewed By: ezyang
Differential Revision: D17281528
fbshipit-source-id: 4a1bc386f29f6a34fbf8008effde9d4890abebfa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23668
- The eager mode frontend now calls operators that are defined in native_functions.yaml with `use_c10_dispatcher: True` through the c10 dispatcher and no longer through globalATenDispatch().
- These operators aren't registered with globalAtenDispatch anymore, only on c10 now.
- Backend extensions calling globalATenDispatch().registerOp() to add their own kernels still work, this function will forward the registration to the c10 dispatcher for them.
ghstack-source-id: 90130455
Test Plan: benchmarks at https://docs.google.com/document/d/1gpzKZcFf1JJameY1vKxF7Cloul9s6D8HKIK2_Pp1hFo/edit#
Differential Revision: D16603133
fbshipit-source-id: 991f17b355e9c78c5e86fee4fa381df7ab98ac82
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25650
This PR removes protobuf dependencies from mobile build altogether:
- caffe2/proto: protobuf files, including caffe2.proto and torch.proto;
- caffe2 components that depend on caffe2.proto, including most part of
caffe2/core, caffe2/utils;
- libprotobuf / libprotobuf-lite dependencies;
- protobuf compiler;
- some utils class, e.g.: netdef_converter.cpp;
- introduce a macro to disable third_party/onnx which depends on protobuf;
Test Plan:
- builds;
- link with demo app to make sure it can load and run a model in pickle format;
Differential Revision: D17183548
Pulled By: ljk53
fbshipit-source-id: fe60b48674f29c4a9b58fd1cf8ece44191491531
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25671
To decouple string_utils.h from types.h and protobuf headers.
Logically, GetDimFromOrderString seems to be more similar to
StringToStorageOrder than to other string_utils functions.
Test Plan: - Will check all internal/external CI jobs.
Reviewed By: yinghai
Differential Revision: D17191912
Pulled By: ljk53
fbshipit-source-id: fe555feef27bfd74c92b6297c12fb668252ca9ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23888
This is an alternative to https://github.com/pytorch/pytorch/pull/23684.
Instead of splitting a bunch of headers into declaration and definition, we change tensor includes to only include the tensor declaration when the tensor definition isn't needed.
ghstack-source-id: 89357687
Test Plan: waitforsandcastle
Differential Revision: D16673569
fbshipit-source-id: fa1d92809b05de7910a8c2dc2f55abe071ca63bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25620
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25602
Enable rocThrust with hipCUB and rocPRIM for ROCm. They are the ROCm implementations of the thrust and cub APIs and replace the older hip-thrust and cub-hip packages going forward. ROCm 2.5 is the first release to contain the new packages as an option, as of 2.6 they will be the only available option.
Add hipification rules to correctly hipify thrust::cuda to thrust::hip and cub:: to hipcub:: going forward. Add hipification rules to hipify specific cub headers to the general hipcub header.
Infrastructure work to correctly find, include and link against the new packages. Add the macro definition to choose the HIP backend to Thrust.
Since include chains are now a little different from CUDA's Thrust, add includes for functionality used where applicable.
Skip four tests that fail with the new rocThrust for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21864
Reviewed By: xw285cornell
Differential Revision: D16940768
Pulled By: bddppq
fbshipit-source-id: 3dba8a8f1763dd23d89eb0dd26d1db109973dbe5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25252
Our model going forward for extensions will be that you will have to
get an allocation of an ID in our system. This is how things work
in practice today; we're just simplifying our underlying registration
since there is no need to have distributed registration.
There are some codemods in this diff:
```
codemod --extensions cpp,h,cc,cuh,py,in --exclude-paths=c10/core/TensorTypeId.h '([A-Za-z]+?)TensorId\(\)' 'TensorTypeId::\1TensorId'
codemod --extensions cpp,h,cc,cuh,py,in 'TensorTypeIds::undefined\(\)' 'TensorTypeId::UndefinedTensorId'
codemod --extensions cpp 'TensorType1\(\)' 'TensorTypeId::CPUTensorId'
codemod --extensions cpp 'TensorType2\(\)' 'TensorTypeId::CUDATensorId'
codemod --extensions cpp 'TensorType3\(\)' 'TensorTypeId::XLATensorId'
codemod --extensions cpp 'TensorType1' 'CPUTensorId'
codemod --extensions cpp 'TensorType2' 'CUDATensorId'
codemod --extensions cpp 'TensorType3' 'XLATensorId'
```
The main hand-written changes are in c10/core/TensorTypeId.h
Other manual fixes:
- aten/src/ATen/core/op_registration/op_registration.cpp - stop using
std::string operator+
- aten/src/ATen/function_wrapper.py - handle a hardcoded TypeId() that
wasn't caught by codemod
- torch/csrc/tensor/python_tensor.h - fix now incorrect forward declaration
of TensorTypeId
- aten/src/ATen/core/op_registration/ - remove out-of-line registration
Differential Revision: D17072001
Test Plan: ossci and sandcastle
Pulled By: ezyang
fbshipit-source-id: c641515fd0604c045c54fbb1d6b1b950f45e89d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24361
Currently we only support Conv in the kernel but have an entrance for both types using the same class.
It is time to make the change.
Reviewed By: csummersea
Differential Revision: D16604713
fbshipit-source-id: b98d39a2c7960707cd50ba27e43dce73f741eeeb
Summary:
Adds qtensor specific fields to the proto file so that they get serialized into the model.json
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23356
ghstack-source-id: 87263428
Differential Revision: D16473237
fbshipit-source-id: bf5b51d0863d036d30a1644a3c3b74516468224b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23096
Nets can have state that depends on the rest of the state in the Workspace. Hence, they should be destructed first.
Reviewed By: ajyu
Differential Revision: D16382987
fbshipit-source-id: 3fd030ba206e2d0e897abb9e31c95bdaeb9482b7
Summary:
As part of the Variable/Tensor merge, we want to be able to pass Variables into Caffe2 without doing extra shallow copy, to improve performance and also allow for in-place mutations in Caffe2 ops. There are a few approaches outlined in https://github.com/pytorch/pytorch/pull/22418, and this PR is the chosen approach.
Specifically, we can have the assumption that we won't be connecting autograd to C2 gradients at any point (as it's too tricky and not that useful). Therefore, we can pass Variable into Caffe2 ops by requiring that all Variables in Caffe2 don't require grad. For code paths in Caffe2 that might potentially track gradients (e.g. `ScriptModuleOp` and `call_caffe2_op_from_c10`), we use the `torch::NoGradGuard` to make sure gradients are not tracked.
This supersedes https://github.com/pytorch/pytorch/pull/22418.
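A minimal sketch of the guard usage described above (the function and the op call are hypothetical; torch::NoGradGuard is the guard named in the summary):
```
#include <torch/torch.h>

torch::Tensor call_into_caffe2(const torch::Tensor& input) {
  // Any code path that could record autograd history (e.g. ScriptModuleOp or
  // call_caffe2_op_from_c10 in the summary) runs under a NoGradGuard, so the
  // Variables handed to Caffe2 never end up requiring grad.
  torch::NoGradGuard no_grad;
  return input * 2;  // stand-in for the actual Caffe2 operator invocation
}
```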
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22473
Differential Revision: D16099042
Pulled By: yf225
fbshipit-source-id: 57efc3c7cfb3048d9abe90e63759acc14ebd2972
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22477
There is actually no use of an uninitialized variable, but some compilers are not smart enough to reason that the two if branches are always taken together.
Reviewed By: hx89
Differential Revision: D16100211
fbshipit-source-id: 25f01d668063603d7aaa776451afe8a10415d2ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22005
When a Dict or List is created with type information, it will remember that.
If at any point later, this list is instantiated to a List<T> with a concrete type, it will assert that T is the correct type.
Differential Revision: D15914462
fbshipit-source-id: a8c3d91cb6d28d0c1ac0b57a4c4c6ac137153ff7
Summary:
Currently the build system accepts USE_NAMEDTENSOR from the environment
variable and turns it into NAMEDTENSOR_ENABLED when passing to CMake.
This discrepancy does not seem necessary and complicates the build
system. The naming of this build option is also semantically incorrect
("BUILD_" vis-a-vis "USE_"). This commit eradicate this issue before it
is made into a stable release.
The support of NO_NAMEDTENSOR is also removed, since PyTorch has been
quite inconsistent about "NO_*" build options.
---
Note: All environment variables with their names starting with `BUILD_` are currently automatically passed to CMake with no need of an additional wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22360
Differential Revision: D16074509
Pulled By: zou3519
fbshipit-source-id: dc316287e26192118f3c99b945454bc50535b2ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22241
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20387
glibc has a non-standard function, feenableexcept, that triggers floating-point exception handler . Compared to feclearexcept + fetestexcept , this approach allows us to see precisely where the exception is raised from the stack trace.
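A minimal standalone sketch of the glibc mechanism (a toy main, not the caffe2 integration; `feenableexcept` is a glibc extension, not standard C++):
```
#include <fenv.h>   // feenableexcept (glibc extension)
#include <cstdio>

int main() {
  // Trap on divide-by-zero, invalid, and overflow; the process then receives
  // SIGFPE at the faulting instruction, so the stack trace points right at it.
  feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);

  volatile double zero = 0.0;
  double y = 1.0 / zero;   // raises SIGFPE here instead of silently producing inf
  std::printf("%f\n", y);  // never reached
  return 0;
}
```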
Reviewed By: jspark1105
Differential Revision: D15301095
fbshipit-source-id: 94f6e72456b2280f78d7d01c2ee069ae46d609bb
Summary:
Saying `I` in an error message is too subjective for a framework.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22369
Differential Revision: D16067712
Pulled By: soumith
fbshipit-source-id: 2a390646bd5b15674c99f65e3c460a7272f508b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22084
For DictPtr/ListPtr, default construction was disallowed because it was ambiguous whether it was supposed to create an empty list or a nullptr.
But since we renamed them to Dict/List, we can now allow default construction without ambiguity.
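A minimal sketch of what this enables (assuming the templated c10::List/c10::Dict containers referenced above):
```
#include <ATen/core/List.h>
#include <ATen/core/Dict.h>
#include <string>

int main() {
  // With the element type carried by the template parameter, a
  // default-constructed container is unambiguously an empty one.
  c10::List<int64_t> values;
  c10::Dict<std::string, int64_t> lookup;

  values.push_back(42);
  lookup.insert("answer", 42);
  return 0;
}
```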
Differential Revision: D15948098
fbshipit-source-id: 942a9235b51608d1870ee4a2f2f0a5d0d45ec6e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21937
This changes call sites to use the new naming scheme
Reviewed By: zdevito
Differential Revision: D15892404
fbshipit-source-id: 8d32aa90a0ead1066688166478f299fde9c2c133
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21806
Dispatcher::findSchema(op_name) now uses a lookup table instead of iterating through the list of operators to find it.
This speeds up op lookup (as in finding the operator handle from the name, not as in finding a kernel when you already have the operator handle)
and it also speeds up op registration, since that needs to check whether an op with the same name already exists.
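A generic sketch of the change (not the actual Dispatcher code; the class and member names here are placeholders): replace a linear scan over registered operators with a hash map keyed by operator name.
```
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

struct OperatorHandle { std::string name; };

class Registry {
 public:
  void registerOp(OperatorHandle op) {
    lookup_.emplace(op.name, ops_.size());  // also makes duplicate checks cheap
    ops_.push_back(std::move(op));
  }
  std::optional<OperatorHandle> findSchema(const std::string& name) const {
    auto it = lookup_.find(name);            // O(1) instead of O(#ops)
    if (it == lookup_.end()) return std::nullopt;
    return ops_[it->second];
  }

 private:
  std::vector<OperatorHandle> ops_;
  std::unordered_map<std::string, size_t> lookup_;
};
```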
Differential Revision: D15834256
fbshipit-source-id: c3639d7b567e4ed5e3627c3ebfd01b7d08b55ac1
Summary:
After https://github.com/pytorch/pytorch/pull/17072, we are allowed to pass Variables into ATen ops, thus there is no need to unwrap input variables in the c10 call path.
Note that since Caffe2 still expects inputs to be pure Tensors, we moved the unwrapping logic to the Caffe2 wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21620
Differential Revision: D15763560
Pulled By: yf225
fbshipit-source-id: 5375f0e51eb320f380ae599ebf98e6b259f0bff8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21446
This is used for easier tracing of the iter id when looking at the trace diagram.
Reviewed By: ilia-cher
Differential Revision: D15628950
fbshipit-source-id: ee75b3bdb14a36abc18c7bddc49d8ec9789b724d
Summary:
This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other gpu target variants). Many intermediate variables that don't manifest as artifacts of the build remain for now with the "caffe2" name; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR.
The shell `libtorch` library that had been introduced as a stopgap in https://github.com/pytorch/pytorch/issues/17783 is again flattened in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20774
Differential Revision: D15769965
Pulled By: kostmo
fbshipit-source-id: b86e8c410099f90be0468e30176207d3ad40c821
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21177
- Integrate c10::ListPtr into IValue and the c10 dispatcher.
- Streamline conversion to/from IValue. Before, we had IValue::to<> and kernel_functor.h had its own ivalue_to_arg_type and return_type_to_ivalue. They are now unified. Also, this means that nested types like Dicts of Lists of Optional of Dict of ... do work as expected now
Differential Revision: D15476433
fbshipit-source-id: bde9df80df20091aa8e6ae17ba7e90abd149b954
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21492
If one async operator failed, async_scheduling net currently only marks all scheduled async operators as finished without cancelling the callbacks.
The new behavior is to cancel the callbacks first, then set event status to finished.
Reviewed By: ilia-cher
Differential Revision: D15702475
fbshipit-source-id: 55a1774d768b2e238bab859b83332f1877a001ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17946
Some of these are probably implementable for exported operators,
but aren't implemented yet and for now it's better to assert than to just return wrong results.
Reviewed By: ezyang
Differential Revision: D14430749
fbshipit-source-id: 2b0037a9ed227a22aa7376a90e6d3d09d3e04707
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20603
When we use intra_op_parallel operators, Caffe2 tracing was generating a trace only for the master task, giving a false impression that a lot of threads are underutilized.
This diff also traces child tasks.
Reviewed By: ilia-cher
Differential Revision: D14820008
fbshipit-source-id: ff4ed203804d86d9231c21c99d869f1ddf1d1ef9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20493
This helps distinguish if the op was a quantized op or not.
Reviewed By: salexspb
Differential Revision: D15337854
fbshipit-source-id: 43c7aef143085cfaeb4ec2102a7f36cc454e0e94
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20173
Enabled op profiling even when net type is not dag or prof dag. Also added
engine type info to summary.
Reviewed By: salexspb, ilia-cher
Differential Revision: D15177813
fbshipit-source-id: 5be0efeaabc9a961cf1d73b0703749c08bb1adbb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20821
Change registration API. Instead of
static auto registry = torch::RegisterOperators()
.op("my::op", torch::RegisterOperators::options()
.kernel<Kernel>()
.dispatchKey(CPUTensorId()));
it is now
static auto registry = torch::RegisterOperators()
.op("my::op", torch::RegisterOperators::options()
.kernel<Kernel>(CPUTensorId()));
This binds kernel and dispatch key together, allowing them to be separate from other future configuration options like alias analysis or autograd wrappers.
The semantic problem behind this is that the dispatch key is a *kernel config parameter* and not an *operator config parameter*, while things like autograd wrappers, alias info, and actually the kernel itself are *operator config parameters*. And while previously the different kinds of config parameters were mixed, this diff now separates them.
Before this change, it wouldn't have been well defined if you specified a dispatchKey together with an autogradWrapper or aliasInfo for example.
// what is this supposed to do?
static auto registry = torch::RegisterOperators()
.op("my::op", torch::RegisterOperators::options()
.aliasInfo(DEFAULT)
.dispatchKey(CPUTensorId()));
If we get more kernel config parameters in the future, we could introduce something like this
static auto registry = torch::RegisterOperators()
.op("my::op", torch::RegisterOperators::options()
.kernel<Kernel>(torch::RegisterOperators::kernelOptions()
.dispatchKey(CPUTensorId())
.otherConfig());
but that's overkill as long as dispatch keys are the only kernel config parameter, and we can introduce that later without breaking backwards compatibility.
A nice side effect of this is that people can register multiple kernels to the same operator in the same `.op()` call:
static auto registry = torch::RegisterOperators()
.op("my::op", torch::RegisterOperators::options()
.kernel<Kernel1>(CPUTensorId())
.kernel<Kernel2>(CUDATensorId()));
Reviewed By: dzhulgakov
Differential Revision: D15455790
fbshipit-source-id: 1c46bfe676dcacf74cf36bd3f5df3d2c32b8fb11
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17818
Some of these are probably implementable for exported operators,
but aren't implemented yet and for now it's better to assert than to just return wrong results.
Reviewed By: ezyang
Differential Revision: D14392459
fbshipit-source-id: bf86e6cb0a7cfefd112a65dc85cc243e57a5ad52
Summary:
Resubmit #20698 which got messed up.
The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so - only the first invocation of a trigger point is logged. This is significantly more lightweight than #18235, so we can afford to put logging in e.g. TensorImpl.
Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcomed.
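A generic sketch of the log-only-once trigger point idea (a toy macro, not the actual c10 implementation):
```
#include <iostream>

// Each call site gets its own function-local static, so the lambda (and thus
// the log line) runs exactly once per trigger point, with negligible cost on
// subsequent calls.
#define LOG_API_USAGE_ONCE(event)                     \
  do {                                                \
    static bool logged = [] {                         \
      std::cerr << "API usage: " << (event) << "\n";  \
      return true;                                    \
    }();                                              \
    (void)logged;                                     \
  } while (0)

int main() {
  for (int i = 0; i < 3; ++i) {
    LOG_API_USAGE_ONCE("tensor.create");  // prints only on the first iteration
  }
  return 0;
}
```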
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745
Differential Revision: D15429196
Pulled By: dzhulgakov
fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20833
Att. The algorithm is still "horrendously inefficient". But since we are sunsetting Nomnigraph, I just did the minimal fix here.
Reviewed By: tracelogfb
Differential Revision: D15463880
fbshipit-source-id: 413a1280a92c1923ba49031177816a2d5f888575
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20514
Change API from
static auto registry = c10::RegisterOperators()
.op("my::op",
c10::kernel(...),
c10::dispatchKey(...)
);
to
static auto registry = c10::RegisterOperators()
.op("my::op", c10::RegisterOperators::options()
.kernel(...)
.dispatchKey(...)
);
because this allows better discoverability. People looking for which options are available will easier find it and IDE autocompletion will work better.
Reviewed By: zdevito
Differential Revision: D15346348
fbshipit-source-id: 4b74a33b75c2b9cda4a903639fb7abd2c7cff167
Summary:
#19975 was split into 2 PRs.
This one:
Introduce MemoryFormat argument to the `x.is_contiguous(memory_format=torch.channels_last)` and to the `y = x.contiguous(memory_format=torch.channels_last)` functions.
At this moment both functions just operate with strides and doesn't store any tensor state.
(Original RFC #19092)
-----
Expands functionality of two tensor functions `.is_contiguous` and `.contiguous` (both python and c++ api).
Note: We had several complaints about `.to(memory_format)` function, and decided not to support it.
1. `.contiguous` now support optional keyword-only argument - `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.
- Using `torch.contiguous_format` will preserve existing `.contiguous()` behavior.
- Calling `x.contiguous(memory_format=torch.channels_last)` returns new tensor which maintain same semantical layout (NCHW), but have different memory allocation pattern.
`x.contiguous(memory_format=torch.channels_last)` expects input tensor to be 3d, 4d or 5d; and fails otherwise.
2. `.is_contiguous` now support optional keyword-only argument - `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.
- `x.is_contiguous(memory_format=torch.contiguous_format)` preserves same functionality as `x.is_contiguous()` and remains unchanged.
- `x.is_contiguous(memory_format=torch.channels_last)` returns true if A) the input tensor is contiguous in memory AND B) it is allocated in memory in NHWC (or similar for 3d, 5d) format.
Note: Through the end of phase one, `x.is_contiguous(memory_format=torch.channels_last)` will calculate the state of the Tensor on every call. This functionality is going to be updated later.
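A minimal sketch of the same two entry points through the C++ API (assuming the at::MemoryFormat spelling of the Python keyword values):
```
#include <ATen/ATen.h>
#include <iostream>

int main() {
  at::Tensor x = at::randn({8, 3, 32, 32});  // 4d, NCHW semantics

  // Same semantic layout, different memory allocation pattern.
  at::Tensor y = x.contiguous(at::MemoryFormat::ChannelsLast);

  std::cout << x.is_contiguous() << "\n";                                // 1
  std::cout << y.is_contiguous(at::MemoryFormat::ChannelsLast) << "\n";  // 1
  return 0;
}
```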
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20455
Differential Revision: D15341577
Pulled By: VitalyFedyunin
fbshipit-source-id: bbb6b4159a8a49149110ad321109a3742383185d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20439
This is the QTensorProto workflow for multi group quantization in C2 side.
No DNNLOWP Tensor related thing is included in this pr, so once we finished glow side, we should be able to test this pr using resnet50.
Reviewed By: yinghai
Differential Revision: D15096919
fbshipit-source-id: 741eecd59eb79d24d9fe2b035f6246d42422d25c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20463
Source file changes mostly involve ifdef'ing-out references to JIT code
from files that are part of Caffe2Go. Update Internal build scripts to
remove those files from our globs.
After this, changes to most of the JIT files should not trigger mobile CI.
Reviewed By: dzhulgakov
Differential Revision: D15329407
fbshipit-source-id: 48f614c6b028eef0a03ce5161d083a3e078b0412
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20108
Add cpp runs for c2, hooked up via pybinds. Print output to terminal. This is not hooked up with the pep output yet because I'd like to verify the numbers first.
Note that this isn't quite the same mechanism as the pytorch cpp hookup, which uses cpp_python_extensions. If I can use the same mechanism to pull all the inputs for c2 through cpp and do FeedBlobs in cpp, then I'll switch to that.
Reviewed By: zheng-xq
Differential Revision: D15155976
fbshipit-source-id: 708079dacd3e19aacfe43d70c5e5bc54da2cf9e3
Summary:
Some functions were not decorated with `CAFFE2_API`, which makes them unusable when creating unit tests for custom ops outside the Caffe2 repo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20114
Differential Revision: D15217490
Pulled By: ezyang
fbshipit-source-id: dda3910ad24e566567607deaac705a34ec8e7b8d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19817
A lot of files were depending on the JIT's typesystem
because operator.h depends on function_schema.h. However,
this isn't fundamental to the design. This diff tries to
remove the direct depenency and only includes the c10
wrapper helpers in files where it is required.
Reviewed By: smessmer
Differential Revision: D15112247
fbshipit-source-id: 2c53d83e542c32d9a398c8b60dbf40ab7a1cb0f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19458
The algorithm in https://fburl.com/ggh9iyvc fails to really ensure topological ordering of nodes. The fix is ugly but effective. I think we need a real topological sort to fix this issue more nicely. Mikhail Zolotukhin, Bram Wasti.
Differential Revision: D15011893
fbshipit-source-id: 130c3aa442f5d578adfb14fbe5f16aa722434942
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19388
The old implementation forced a refcount bump when converting at::Tensor to caffe2::Tensor.
Now, it is possible to move it without a refcount bump.
Reviewed By: dzhulgakov
Differential Revision: D14986815
fbshipit-source-id: 92b4b0a6f323ed38376ffad75f960cad250ecd9b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19287
Since we now have a string-schema-based op registration API, we can also use it when exposing caffe2 operators.
Reviewed By: dzhulgakov
Differential Revision: D14931925
fbshipit-source-id: ec162469d2d94965e8c99d431c801ae7c43849c8
Summary:
Currently, a TensorImpl's `is_variable_` is true if and only if the TensorImpl has AutogradMeta. This PR unifies these two concepts by removing `is_variable_` and changing `is_variable()` to check for the existence of AutogradMeta instead.
Removing `is_variable_` is part of the work in Variable/Tensor merge.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19139
Differential Revision: D14893339
Pulled By: yf225
fbshipit-source-id: ceb5e22c3c01f79b5d21d5bdbf4a7d1bc397796a
Summary:
It's not intended that Storages have 'default' CUDA devices, but this is allowable via the Storage::create_legacy codepath.
This also messes with device_caching, because the initial cache is obtained from the Storage, which may have a 'default' device.
Instead, we materialize a device by allocating 0 bytes via the allocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18605
Differential Revision: D14680620
Pulled By: gchanan
fbshipit-source-id: 6d43383d836e90beaf12bfe37c3f0506843f5432
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19154
I recently saw some weird workflow errors due to an empty but set net_type. Maybe we should just fall back to the simple net in this case.
Reviewed By: dzhulgakov
Differential Revision: D14890072
fbshipit-source-id: 4e9edf8232298000713bebb0bfdec61e9c5df17d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19080
OSS: add a tiny unit test utility function to create tensors given shape and data, outside of any workspace. I use it in an internal test.
Reviewed By: dzhulgakov
Differential Revision: D14814194
fbshipit-source-id: 6d53b235d99a97da812215f5c7f11fecad363c8c
Summary:
Almost there, feel free to review.
These c10 operators are exported to the _caffe2 domain.
TODO:
- [x] let the onnx checker pass
- [x] test tensor list as argument
- [x] test caffe2 backend and converter
- [x] check the c10 schema can be exported to onnx
- [x] refactor the test case to share some code
- [x] fix the problem in ONNX_ATEN_FALLBACK
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18210
Reviewed By: zrphercule
Differential Revision: D14600916
Pulled By: houseroad
fbshipit-source-id: 2592a75f21098fb6ceb38c5d00ee40e9e01cd144
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18531
Currently we use C10_LOG_EVERY_MS to log the data type change, but it pollutes the logs of some services;
we would like to change it to C10_LOG_FIRST_N to prevent that.
Reviewed By: dzhulgakov
Differential Revision: D14647704
fbshipit-source-id: b84e4002bd4aa94d616133cd1049c3d4ab05386e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18161
This introduces version 0 for the new operator registration.
For now, it only works with kernels that are defined as stack-based functions.
This is actually not the intended public API for defining kernels, but it's the basis which is going to be used to define the public APIs (see diffs on top for them),
and it's also the API used for exposing caffe2 operators.
This diff also switches the mechanism for exposing caffe2 operators to the new mechanism.
Reviewed By: dzhulgakov
Differential Revision: D14514231
fbshipit-source-id: 454ab7b5b46a10203aa27b175400d23f818dd1df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18160
When exposing a c10 operator to the caffe2 frontend, don't use the operator schema but use the operator name instead.
This allows us to get rid of the existing mechanism for operator schema registration in a diff stacked on top.
Reviewed By: dzhulgakov
Differential Revision: D14513420
fbshipit-source-id: 6b08a9c6d9497eaf18b62361dd44bc07c7b4b76b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18129
A lot of tensor inference functions assume the operator passes the schema.
So call Verify to make sure this is actually the case.
I created a diff before to add checking in Concat (https://github.com/pytorch/pytorch/pull/17110), but I encountered a lot more places where this is assumed (for example ElementwiseOpShapeInference).
Reviewed By: mdschatz
Differential Revision: D14503933
fbshipit-source-id: cf0097b8c3e4beb1cded6b61e092a6adee4b8fcb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18123
The motivation of this fix is to resolve things like:
for (auto i = 0; i < N; i++) where N is bigger than int32.
These instances of comparison were found by enabling -Wsign-compare.
There are way too many things to fix, so this is being issued as a series of fixes.
The plan is to fix all these issues and then enable this flag in Caffe2 to catch future instances.
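As a small illustrative sketch (hypothetical code, not from this diff), the warning and one possible fix look like this:
```
#include <cstddef>
#include <vector>

void iterate(const std::vector<float>& v) {
  // Triggers -Wsign-compare: `i` is a signed int, `v.size()` is an unsigned size_t.
  // for (auto i = 0; i < v.size(); i++) { ... }

  // Fixed: use a size_t induction variable so both sides of the comparison are unsigned.
  for (size_t i = 0; i < v.size(); i++) {
    (void)v[i];
  }
}
```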
Reviewed By: ZolotukhinM
Differential Revision: D14497094
fbshipit-source-id: bca3927a2188bd33a508fa503ba221c220cdaefe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18040
Add a flag to fail if a floating point exception is detected during operator runs.
Sample exception:
Exception [enforce fail at operator.h:837] !std::fetestexcept(FE_DIVBYZERO). Division by zero floating point exception (FE_DIVBYZERO) reported.
Error from operator:
input: "1" input: "0" output: "out" name: "" type: "Div"
Reviewed By: jspark1105
Differential Revision: D14467731
fbshipit-source-id: fad030b1d619a5a661ff2114edb947e4562cecdd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18037
The FunctionSchema can now store an overload name and the parser knows how to parse it. Specify like this:
my_func.overload1(arg1: Tensor) -> Tensor
my_func.overload2(arg1: Tensor, arg2: Tensor) -> Tensor
Reviewed By: zdevito
Differential Revision: D14467497
fbshipit-source-id: 8832b32f07351bb61090357b17b77a6a2fed3650
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18036
- Add macros to export c10 cuda operators to caffe2 frontend
- Instead of having a separate caffe2 registry for the c10 operator wrappers, use the existing caffe2 registries
Reviewed By: ezyang
Differential Revision: D14467495
fbshipit-source-id: 7715ed2e38d2bbe16f1446ae82c17193a3fabcb9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17781
The wrapper for calling a c10 operator from caffe2 is now based on a runtime FunctionSchema instead of compile time information. This way, it can be created for any c10 operator schema with just one invocation to a simple macro instead of having to define arguments and more as compile time structures.
Furthermore, previously, the wrapper assumed there's an argument present for preallocated outputs, but that was only true for caffe2 operators exported to c10. So the wrapper only worked correctly for calling caffe2->c10->caffe2. Now with the new implementation, it works for any c10 operator.
Also, binary size for this should be much smaller.
Reviewed By: ezyang
Differential Revision: D14375054
fbshipit-source-id: bac7ab8e63929e6e2a148eacac41ed092009aa86
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17743
- caffe2::Operator::SetOutputTensor() can now be used in operators that are called from c10/PyTorch.
- If the operator uses SetOutputTensor() instead of XOutput(), the wrapper doesn't preallocate an empty tensor for the operator anymore. Only outputs accessed in XOutput() will get an output tensor preallocated.
- Remove the copying of the vector with output tensors into a vector with pointer to output tensors.
- Preallocated outputs are now passed in as one TensorList argument on the stack. This TensorList argument has a well-defined name so other wrappers (i.e. the wrapper calling from c2 into c10) can recognize and use it.
- Macros for exporting caffe2 operators to c10 are simplified. Instead of having `c10_op_handle_for_c2_op`, we now pass in the operator handle as a template argument.
- `SetOutputTensor` and `OutputTensorOrUndefined` now work with operators exported to c10
Reviewed By: ezyang
Differential Revision: D14362434
fbshipit-source-id: 44a5e717204f21ea8e9728437429d9b84906f9f5
Summary:
1. Move ATen threadpool & open registration mechanism to C10
2. Move the `global_work_queue` to use this open registration mechanism, to allow users to substitute in their own
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17788
Reviewed By: zdevito
Differential Revision: D14379707
Pulled By: jamesr66a
fbshipit-source-id: 949662d0024875abf09907d97db927f160c54d45
Summary:
CreateDB actually returns nullptr when the db type is unknown and throws when the file is missing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17795
Reviewed By: ezyang
Differential Revision: D14383226
Pulled By: dzhulgakov
fbshipit-source-id: 1dcf75a6b4ba8b64a24d4e5daf02db3189d56b7b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17742
This path isn't used anymore, and is incompatible with the changes stacked on top of this diff.
Removing it.
cc bwasti to check and confirm these can really be deleted
Reviewed By: ezyang
Differential Revision: D14362426
fbshipit-source-id: 32cdc19f28c2a981ae1e204901420998367ee588
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17623
Despite its generic-sounding name, caffe2::DeviceGuard actually
only worked on CUDA devices. Rename it to something that more
clearly spells out its applicability.
I'm not sure if it's the right call, but in this patch I added
'using CUDAGuard = c10::cuda::CUDAGuard', as this seems to be more
in-line with how the Caffe2 codebase is currently written. More
idiomatic c10 namespace style would be to say cuda::CUDAGuard.
Willing to change this if people shout.
This is a respin of D13156470 (#14284)
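For reference, a minimal usage sketch of the renamed guard (device index 1 is an arbitrary example):
```
#include <c10/cuda/CUDAGuard.h>

void run_on_device_one() {
  c10::cuda::CUDAGuard guard(1);  // switch the current CUDA device to device 1
  // ... allocate tensors / launch work on device 1 here ...
}  // the previous device is restored when guard goes out of scope
```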
Reviewed By: dzhulgakov
Differential Revision: D14285504
fbshipit-source-id: 93b8ab938b064572b3b010c307e1261fde0fff3d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17579
These methods previously just returned 0 when it was not a legacy operator,
making it impossible to convert some operators.
Reviewed By: dzhulgakov
Differential Revision: D14253094
fbshipit-source-id: 72bfdcf6da291a4ab80d1e0ceb20984b86edc408
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17491
Before, there was no way to expose a caffe2 operator that had a variable number of inputs.
Now, this is allowed by giving the operator one tensor list input.
Note that the tensor list must be the first input, and that any other tensor inputs will be ignored and inaccessible in this case.
Reviewed By: ezyang
Differential Revision: D14220705
fbshipit-source-id: 7f921bfb581caf46b229888c409bbcc40f7dda80
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17481
Usually, feature macros are either defined or undefined and checked accordingly.
C10_MOBILE was a weird special case that was always defined but either defined to 1 or to 0.
This caused a lot of confusion for me when trying to disable something from the mobile build, and it also disabled it
from the server build (because I was using ifdef). Also, I found a place in the existing code base that made
that wrong assumption and used the macro wrongly; see https://fburl.com/y4icohts
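A short illustrative sketch of the difference (hypothetical snippets, not the actual sources):
```
// Old convention: C10_MOBILE is always defined, to either 0 or 1.
#if C10_MOBILE
// mobile-only code: correct under the old convention
#endif

#ifdef C10_MOBILE
// intended to be mobile-only, but under the old convention this branch was
// also compiled on server builds, because the macro was defined (to 0) there too
#endif

// New convention: the macro is only defined on mobile, so #ifdef behaves as expected.
#ifdef C10_MOBILE
// mobile-only code
#endif
```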
Reviewed By: dzhulgakov
Differential Revision: D14214825
fbshipit-source-id: f3a155b6d43d334e8839e2b2e3c40ed2c773eab6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17078
This prevents caffe2 operators from being exposed to c10 on mobile,
which in turn causes the whole c10 dispatcher to be stripped away
and saves binary size.
We probably want to re-enable the c10 dispatcher for mobile,
but for now this is ok.
Reviewed By: ezyang
Differential Revision: D14077972
fbshipit-source-id: e4dd3e3b60cdfbde91fe0d24102c1d9708d3e5c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17384
Better handling of possible net run errors in prof_dag counters.
Reviewed By: yinghai
Differential Revision: D14177619
fbshipit-source-id: 51bc952c684c53136ce97e22281b1af5706f871e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15034
Rethrow exceptions that happened during RunAsync, and ensure that pending tasks
are not executed after the run is marked as finished.
Reviewed By: andrewwdye
Differential Revision: D13409649
fbshipit-source-id: 3fd12b3dcf32af4752f8b6e55eb7a92812a5c057
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17132
The schedule() function is not supposed to throw exceptions and is supposed
to succeed in scheduling the full graph of tasks; potential errors (e.g. errors
from the underlying thread pool, out-of-memory exceptions, etc.) are considered not
recoverable.
The invariant: the graph of tasks is either not executed or
executed in full before the call to finishRun().
Reviewed By: andrewwdye
Differential Revision: D14092457
fbshipit-source-id: a3e5d65dfee5ff5e5e71ec72bb9e576180019698
Summary:
In the NUMA case, PinnedCPUAllocator's allocate() would return a
DataPtr constructed by DefaultCPUAllocator, which would reference
the Default... Delete() rather than the Pinned... Delete(). That
meant Pinned... Delete() would never run, so cudaHostUnregister()
would never be called when regions were freed.
See: https://github.com/pytorch/pytorch/issues/16280
This change adds a 'naked_allocate()' method to the Default allocator
that just returns a pointer to the allocated memory rather than
wrapping it in a DataPtr. Pinned allocator uses that then constructs
a DataPtr with reference to its own Delete().
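A hedged sketch of the shape of the fix (the `naked_allocate` name comes from the description above; the rest is illustrative and not the actual allocator implementation):
```
#include <c10/core/Allocator.h>
#include <cstdlib>

// Deleter belonging to the pinned allocator; in the real code this is where
// cudaHostUnregister(ptr) would run before the memory is released.
static void pinned_delete(void* ptr) {
  std::free(ptr);
}

c10::DataPtr pinned_allocate_sketch(size_t nbytes) {
  // naked_allocate(): obtain raw memory from the default CPU allocator without
  // it being wrapped in a DataPtr that carries the default deleter.
  void* raw = std::malloc(nbytes);
  // Wrap the raw pointer ourselves so the pinned allocator's deleter runs on free.
  return c10::DataPtr(raw, raw, &pinned_delete, c10::Device(c10::DeviceType::CPU));
}
```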
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16340
Reviewed By: dzhulgakov
Differential Revision: D13843206
Pulled By: ezyang
fbshipit-source-id: 9efb572e5a01b49ef2a4aceeccc13cd0b1066528
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17080
This changes all operators using this macro to the new format
Reviewed By: dzhulgakov
Differential Revision: D14078628
fbshipit-source-id: 67048e485e326765fd49567cc008633d3d500d5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16691
Previous diffs already introduced a macro that registers caffe2 CPU kernels with c10.
This now also registers the CUDA kernels with it.
Reviewed By: bwasti
Differential Revision: D13901619
fbshipit-source-id: c15e5b7081ff10e5219af460779b88d6e091a6a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16965
Instead of having one large templated function to wrap the caffe2 op, minimize the amount of templated code.
Non-templated code can be reused between different operators and decreases binary size.
Reviewed By: orionr
Differential Revision: D14018806
fbshipit-source-id: bedd4152eec21dd8c5778446963826316d210543
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16399
Catching cudaError_t return values in a few places, because it's nodiscard in rocm. Unless we add -Wno-unused-result, it'll end up with a compilation error.
Also in c10/cuda/test, check whether a host has GPU or not. We were silently throwing out the error before (so not really testing the cuda api).
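A small sketch of the pattern (illustrative, assuming a build where cudaError_t is marked nodiscard):
```
#include <cuda_runtime.h>

// Capture and check the status instead of silently discarding it.
bool host_has_gpu() {
  int count = 0;
  cudaError_t err = cudaGetDeviceCount(&count);
  return err == cudaSuccess && count > 0;
}
```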
Reviewed By: bddppq
Differential Revision: D13828281
fbshipit-source-id: 587d1cc31c20b836ce9594e3c18f067d322b2934
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16721
The very key line is we have to set the stream to the default
stream before calling the allocator. This is very interesting.
It shouldn't be necessary, but seemingly is!
Reviewed By: dzhulgakov
Differential Revision: D13943193
fbshipit-source-id: c21014917d9fe504fab0ad8abbc025787f559287
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16615
This is another go at landing https://github.com/pytorch/pytorch/pull/16226
Now that the caching allocator is moved to c10_cuda, we can
delete the duplicate copy from Caffe2.
The difference between this and the previous PR is that this
version faithfully maintains the binding code; in particular,
we end up with a SECOND copy of the caching allocator in
this patch. I verified that this code does NOT cause a crash
in the workflow we canaried last time.
In further diffs, I plan to eliminate the second copy, and then
adjust the binding code.
Reviewed By: dzhulgakov
Differential Revision: D13901067
fbshipit-source-id: 66331fd4eadffd0a5defb3cea532d5cd07287872
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16867
Some caffe2 operators (example: BBoxTransform) have not just one template parameter (the context), but might have multiple template parameters.
Because of this, we can't handle the context parameter inside the macro.
Reviewed By: bwasti
Differential Revision: D13995696
fbshipit-source-id: f55c3be913c8b125445a8d486846fc2fab587a63
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16643
The test was disabled in D13908117 because it conflicted with another diff that was about to land.
Now fixed the merge conflict and re-landing it.
Reviewed By: ezyang
Differential Revision: D13911775
fbshipit-source-id: b790f1c3a3f207916eea41ac93bc104d011f629b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16548
With this macro, a caffe2 operator can now directly be registered with c10.
No need to write custom wrapper kernels anymore.
Differential Revision: D13877076
fbshipit-source-id: e56846238c5bb4b1989b79855fd44d5ecf089c9c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16751
This was made more complicated by the fact that ivalue::IntList
is a thing. So I had to fix all of the sites where we were referring
to IValue post facto.
The following codemods were run, in this order:
```
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntList IntArrayRef
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntArrayRef::create IntList::create
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in ivalue::IntArrayRef ivalue::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in Tag::IntArrayRef Tag::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in isIntArrayRef isIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in toIntArrayRef toIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'Shared<IntArrayRef>' 'Shared<IntList>'
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'intrusive_ptr<IntArrayRef>' 'intrusive_ptr<IntList>'
```
Some manual fixups were done afterwards; they can be reviewed separately
at https://github.com/pytorch/pytorch/pull/16752
Reviewed By: dzhulgakov
Differential Revision: D13954363
fbshipit-source-id: b5c40aacba042402155a2f5a229fa6db7992ac64
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16730
With Jerry's new updates, Tensor must be defined -- as a result, I've needed to update the shim for caffe2 ops being used in PyTorch.
Reviewed By: smessmer
Differential Revision: D13946950
fbshipit-source-id: 6f77877c61a743f82bdfc2ad04d6ab583000cc18
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16625
This is a squash of multiple PRs that refactored the old c10 dispatcher into a new one that follows the c10 dispatcher design doc.
It is now unboxed and follows the Stack semantics from JIT. It also uses the runtime JIT schema instead of its own compile time schema definitions.
Reviewed By: ezyang
Differential Revision: D13907069
fbshipit-source-id: edcc4806ccd21474fdfb5a98516219b1956db13d
Summary:
I went through my build log and did what I thought were reasonable fixes to all the C++ compilation warnings that came up
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16411
Differential Revision: D13901006
Pulled By: jamesr66a
fbshipit-source-id: 02df4e3e5a5c8dd9e69ac9f065cd3f2a80645033
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16514
Original commit changeset: dc371697f14b
Relanding https://github.com/pytorch/pytorch/pull/15860 - the problem was that layer_norm was using at::empty which is not yet on mobile
Reviewed By: ezyang
Differential Revision: D13861480
fbshipit-source-id: e2116da32bc117175c96b9151b1beba9b31eff36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16473
This resolves the issues associated with caffe2 initialization (specifically the REGISTER_FUNCTION_SCHEMA_OPERATOR calls) being run after Torch's static op registration calls.
The fix employs a Meyers singleton wrapped by the constructor of a type. Everything is placed inside a macro to make it easier for users to use.
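A minimal sketch of the Meyers-singleton pattern being used (hypothetical names, not the actual macro):
```
// A function-local static is constructed on first use, so the registration runs
// the first time any translation unit touches it, independent of static-init order.
struct Registerer {
  Registerer() {
    // perform the one-time registration here
  }
};

inline Registerer& globalRegisterer() {
  static Registerer r;  // Meyers singleton: constructed exactly once, on first call
  return r;
}

// A type whose constructor forces the singleton to exist; a macro can declare a
// static instance of this type wherever the registration must be guaranteed.
struct EnsureRegistered {
  EnsureRegistered() {
    globalRegisterer();
  }
};
```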
Reviewed By: smessmer
Differential Revision: D13854306
fbshipit-source-id: ecf60861f229532826fae254974e9af4389055df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16576
Allows instantiation of an operator with arguments passed by move rather than as explicit copies,
per Sebastian's suggestion.
Reviewed By: smessmer
Differential Revision: D13882416
fbshipit-source-id: bc8d50e73f5a1ae87155b0cf96799b8573a7a8fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16282
This changes the core kernel abstraction to be a function taking a stack, popping its arguments from the stack and pushing results to the stack,
instead of getting arguments as ArrayRef<IValue> and returning an output IValue.
Caffe2 operators need to have a way to pass in preallocated output tensors.
The convention for them is to get all inputs *and* outputs on the stack and also return all of them, i.e. a caffe2 op will always have inputs == outputs.
This will probably change in later diffs towards making the outputs in-arguments optional in the JIT schema.
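A hypothetical sketch of what a stack-based kernel looks like under this convention (illustrative names; not the real registration code):
```
#include <ATen/ATen.h>
#include <ATen/core/ivalue.h>
#include <utility>
#include <vector>

// Pops its two tensor inputs from the stack and pushes the result back,
// instead of taking ArrayRef<IValue> and returning a single output IValue.
void add_kernel(std::vector<c10::IValue>* stack) {
  c10::IValue b = std::move(stack->back()); stack->pop_back();
  c10::IValue a = std::move(stack->back()); stack->pop_back();
  stack->push_back(a.toTensor() + b.toTensor());
}
```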
Reviewed By: ezyang
Differential Revision: D13792335
fbshipit-source-id: e9cc2b5e438cc4653e1f701633a154b92b604932
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16510
This diff was supposed to be memory usage neutral, but based on
some internal flows involving cuDNN, it was not. Reverting pending
further investigation.
Original commit changeset: 03f1ebf7f11c
Reviewed By: xw285cornell
Differential Revision: D13863610
fbshipit-source-id: 15517e255fd6b0c064b65fb99f0ef19742236cfd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15860
A few changes (which are harder to split into separate diffs, so they go together):
- make conversions explicit (as they can throw, to avoid surprises)
- fix tensor legacy dispatch not initialized when tensor is created on C2 side
- add a bunch of invariants to enforce
Reviewed By: ezyang
Differential Revision: D13596031
fbshipit-source-id: d20b601e06ba47aeff2f6e8e15769840e2d46108
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16180
Only the kernel knows about its state, the caller doesn't see it anymore.
Reviewed By: ezyang
Differential Revision: D13744071
fbshipit-source-id: cb00ff1a881508c1b36ac4123bee1f68ca02ca9c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16177
Change the API for calling operators so that it can store state in an OpKernel object.
This diff doesn't store the state there yet, that comes in a follow up diff.
Reviewed By: ezyang
Differential Revision: D13742889
fbshipit-source-id: 20511a9a1b9f850074e50634d4b4acf87f8c6ecd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16173
Helper to make it easy to run ops in caffe2
Reviewed By: smessmer
Differential Revision: D13468240
fbshipit-source-id: 2276c7870af6dcdf829957f005fd16ac1ef319b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16048
This enables full shimming of the operator (previously only
Output() was shimmed).
Reviewed By: smessmer
Differential Revision: D13468241
fbshipit-source-id: c853b775ab5cdcd968f4a6cc4766e91c3c6b1c45
Summary:
This PR adds thread-local guard (`at::AutoNonVariableTypeMode`) to make sure that in VariableType.cpp the operations on baseType still dispatch to non-Variable type, even if the parameters will become Variables after the Tensor/Variable merge. We achieve this by making `legacyTensorType()` and `getType()` check the `at::AutoNonVariableTypeMode` guard to decide whether to return non-Variable type for a variable.
This is part of the VariableImpl/TensorImpl merge work: https://github.com/pytorch/pytorch/issues/13638.
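A minimal usage sketch of the guard (the header location is an assumption; it has historically lived under ATen/core):
```
#include <ATen/core/LegacyTypeDispatch.h>  // assumed location of AutoNonVariableTypeMode

void call_without_variable_dispatch() {
  at::AutoNonVariableTypeMode guard(true);  // thread-local: dispatch as non-Variable
  // operations here resolve to the non-Variable (base) type
}  // previous mode restored on scope exit
```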
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15939
Reviewed By: ezyang
Differential Revision: D13640980
Pulled By: yf225
fbshipit-source-id: d12c2543822958558d7d70d36c50999a5eb8783f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16226
Now that the caching allocator is moved to c10_cuda, we can
delete the duplicate copy from Caffe2.
Reviewed By: dzhulgakov, smessmer
Differential Revision: D13762540
fbshipit-source-id: 03f1ebf7f11c68c19aa0d66110156fe228da6138
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16294
In `ReinitializeTensor`, we compare `tensor->GetDevice()` and `options.device()`, but at the callsite we actually just provide an option with `device_type`, which means the `device_id` will always be the default (-1) for `options`. For the tensor, although it is passed a `device` with the default `device_id`, when we allocate the data the `device` of the `tensor` becomes the `device` of the `Storage`, which is the `device` of the underlying `DataPtr`, which is the same as the `device` of the `Context` of the operator, which has a non-default `device_id`.
Therefore every time we call `ReinitializeTensor`, we find that the `device` does not match, and after the `ReinitializeTensor` call the `device` still does not match. That's why we allocate a new Tensor every time, which causes perf regressions for ops that use `ReinitializeTensor` on multiple GPUs.
Reviewed By: BIT-silence
Differential Revision: D13795635
fbshipit-source-id: 24d6afa1a0196a32eb0134ee08b4280244cdb0c3
Summary: Some automation to fix uninitialized members in caffe2 code. Ran canary to make sure I don't have any regression in prod, but I'm not sure how to test comprehensively for caffe2.
Reviewed By: ezyang
Differential Revision: D13776185
fbshipit-source-id: fb2a479971cc0276d8784be1c44f01252410bd24
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16176
This makes PyTorch and Caffe2's data() method line up.
Historically, PyTorch made no distinction between tensors
with const or non-const data, and thus provided a
non-const pointer with data() member. Changing the API to
return a const-pointer would break all mutable code, whereas
changing the Caffe2 API to change a pointer doesn't break
any code, *except* for code which required an exact match
on const-ness (e.g., in template arguments). Since the latter
is less disruptive, we've opted for it here.
The few places downstream that broke due to this are fixed
in this patch.
Reviewed By: smessmer
Differential Revision: D13742916
fbshipit-source-id: baa4b4544cfdf7c1f369f4d69a1e0d5953c1bd99
Summary:
Save reallocation costs by reserving vectors according to how many elements we expect to put in.
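A tiny illustration of the idea (not the actual call sites):
```
#include <vector>

std::vector<int> build(int n) {
  std::vector<int> out;
  out.reserve(n);      // single allocation up front
  for (int i = 0; i < n; ++i) {
    out.push_back(i);  // no reallocation or element copies during the loop
  }
  return out;
}
```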
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16201
Differential Revision: D13762594
Pulled By: ezyang
fbshipit-source-id: 7e3bfe421489dde48a2ddb0920dd155f69baecc0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16051
This changes the kernels stored in the c10 dispatcher from plain C function pointers to IValue-based KernelFunction*.
Note that KernelFunction is currently taking an `ArrayRef<IValue>` as arguments. A later diff will change that to it taking a `Stack*`.
Reviewed By: ezyang
Differential Revision: D13684518
fbshipit-source-id: 1fa54f60cec2e967b92a4a043d6e3ac1627ed991
Summary:
Based on offline discussion, it should be less surprising to the users of existing code. Thus caffe2::Tensor is now a move-only class (as it used to be); explicit calls to UnsafeSharedInstance() are necessary to get shared_ptr behavior.
This change also identified a few places that misused the copy constructor; those are fixed.
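A minimal sketch of the resulting semantics (the constructor argument is illustrative):
```
#include "caffe2/core/tensor.h"
#include <utility>

void tensor_ownership_demo() {
  caffe2::Tensor a(caffe2::CPU);
  caffe2::Tensor b = std::move(a);              // OK: moves are allowed
  // caffe2::Tensor c = b;                      // does not compile: copying is disallowed
  caffe2::Tensor d = b.UnsafeSharedInstance();  // explicit shared-storage handle
}
```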
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15416
Reviewed By: Yangqing
Differential Revision: D13524598
fbshipit-source-id: aea12d6dff77342606fa88ce4ddddbff266245a7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16050
The c10 dispatcher will (soon) depend on IValue and IValue can't be moved to c10 yet because it depends on at::Tensor, which depends on legacy Type dispatch and we don't want the legacy dispatch in c10.
So instead, we move the c10 dispatcher back to ATen/core until we can actually move at::Tensor to c10.
Reviewed By: ezyang
Differential Revision: D13684517
fbshipit-source-id: 1125f4254223907c52f96ff73034f6d4ae9fd0a7
Summary:
There is a little error in the comment: for "A->B", task B must start after task A finishes, not after "B".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15922
Differential Revision: D13709579
Pulled By: ezyang
fbshipit-source-id: 735afe83f4532b7c7456da3e96209b3e07071f37
Summary:
TensorProto.DataType in caffe2/proto/caffe2.proto has BYTE = 3 defined, while there is no corresponding TypeMeta defined in caffe2/core/types.cc: DataTypeToTypeMeta. This issue caused the C++ MNIST + LMDB tutorial to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15627
Differential Revision: D13709602
Pulled By: ezyang
fbshipit-source-id: d4826d0f9b3975e6a8478d4bad1abbbedcaea197
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15814
The plan is to remove the APIs we want to deprecate one by one and make sure everything still builds in sandcastle and ossci.
Reviewed By: ezyang
Differential Revision: D12812029
fbshipit-source-id: ea0c3dd882bec95fcd4507160ebc61f598b6d040
Summary:
Use the new test utils in converter_nomnigraph_test, and add utils to set the device option name, external inputs, and outputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15751
Differential Revision: D13586228
Pulled By: duc0
fbshipit-source-id: ff809dd7bf9f30641ce2a6fef7e2810f005521c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15967
Codemod generated with clangr shard mode, 25 files per diff.
To eliminate partially initialized Tensors, we split the initialization of local Tensor variables into two steps: first declare an uninitialized Tensor, and then
call `ReinitializeTensor` to initialize it.
motivation: https://github.com/pytorch/pytorch/pull/12407
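A minimal sketch of the two-step pattern (the shape and options here are illustrative):
```
#include "caffe2/core/tensor.h"

void two_step_init(int64_t n) {
  caffe2::Tensor buffer;  // declared uninitialized; no storage is allocated yet
  caffe2::ReinitializeTensor(
      &buffer,
      {n, 16},
      at::dtype<float>().device(caffe2::CPU));
  // buffer is only (re)allocated when its sizes/options do not already match
}
```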
Reviewed By: smessmer
Differential Revision: D13586735
fbshipit-source-id: eae2d79e1107a2e813ce3809e690af4706aaa9ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15692
It was leading to occasional crashes with dynamically linked CUDA because the runtime was already destroyed.
Also, unique_ptr<T[]> is more suitable than deque<T> for the purpose.
Reviewed By: Yangqing
Differential Revision: D13571988
fbshipit-source-id: 37eb26dfbe361c49160367b53f87bd037c6c0e46
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15407
Don't ask the tensor for its intrusive pointer if we just want to check if two tensors are the same.
This mirrors ATen APIs.
Reviewed By: dzhulgakov
Differential Revision: D13520389
fbshipit-source-id: 681317f36f480ab60e532bb08a073f98f39770fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15316
This starts cleaning up the files in c10 according to the module structure we decided on.
Move to c10/util:
- Half.h, Half-inl.h, Half.cpp, bitcasts.h
Move to c10/core:
- Device.h, Device.cpp
- DeviceType.h, DeviceType.cpp
i-am-not-moving-c2-to-c10
Reviewed By: dzhulgakov
Differential Revision: D13498493
fbshipit-source-id: dfcf1c490474a12ab950c72ca686b8ad86428f63
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15876
Build changes made it so some .so libraries are now registered after GlobalInit is called. Although this shouldn't be common, it also shouldn't be explicitly excluded. These changes allow for late Caffe2 registration, but also warn in that case.
Reviewed By: kuttas
Differential Revision: D13608186
fbshipit-source-id: 0ca7bcd32516d374077db0c2548cf8c28ccdd5f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15195
This removes the use of caffe2::Tensor or at::Tensor in the c10 dispatcher and only uses c10::Tensor.
It also changes output tensors to be passed as `const Tensor&` instead of `Tensor*` because we otherwise can't forward them in operator_c10wrapper.h.
Reviewed By: ezyang
Differential Revision: D13461640
fbshipit-source-id: 7f79925a7d60f01660a24bbfda47391af0c70ed3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14819
This is a minimal wrapper for a c10::TensorImpl,
maybe destined for greatness later when we move caffe2::Tensor or at::Tensor into c10.
Reviewed By: dzhulgakov
Differential Revision: D13348039
fbshipit-source-id: 874f515358e94f35dc7a4c3e55b35fde59c51ff1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15418
Previously we were using Resize + ShareData.
Instead, we'll create a function on Tensor that clones itself with the same storage.
Suppose we want `t` to `ShareData` with `t0`. Previous:
```
Tensor t(dims, CPU);
t.Resize(t0.sizes());
t.ShareData(t0);
```
Now:
```
Tensor t = t0.Alias();
```
Reviewed By: dzhulgakov
Differential Revision: D13507609
fbshipit-source-id: 6e4275d02f4c3356cbce91127f1b01111dc86b9f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15417
Right now the way we test whether a Blob contains a CPU tensor in ```PythonOpBase``` is broken, which means the non-CPU path might never be taken.
Searching through the codebase, the non-GPU path is used in PythonDLPack, and it is used in PytorchOp, which is unused. So we'll remove the non-GPU path in this diff.
Reviewed By: dzhulgakov
Differential Revision: D13495011
fbshipit-source-id: 9fe9537f05026d2a2cf7051efa81d184de722710