pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Yury Gitman	9c8f5cb61d	Ensure IDEEP transpose operator works correctly Summary: I found that, without exporting to the public format, the IDEEP transpose operator in the middle of a convolution net produces incorrect results (probably by reading out-of-bounds memory). Exporting to the public format might not be the most efficient solution, but at least it ensures correct behavior. Test Plan: Running ConvFusion followed by transpose should give identical results on CPU and IDEEP. Reviewed By: bwasti Differential Revision: D22970872 fbshipit-source-id: 1ddca16233e3d7d35a367c93e72d70632d28e1ef	2020-08-11 12:58:31 -07:00
Edward Yang	a058e938f9	Refactor error msg stack handling, add TORCH_RETHROW (#37101 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37101 Fixes #36954. The basic concept is to streamline the process of rethrowing c10::Error with extra error information. This is done in a few steps:
- I completely remodeled the Error data type and the internal invariants. Instead of manually adding in newlines, the message stack formatting process is responsible for inserting newlines and spacing as necessary. Call sites are then modified to respect the new API model.
- TORCH_RETHROW macro is added, which adds context to an error message and then rethrows it.
New internal assert failure looks like:
```
0 INTERNAL ASSERT FAILED at ../c10/test/util/exception_test.cpp:64, please report a bug to PyTorch.
Exception raised from TestBody at ../c10/test/util/exception_test.cpp:64 (most recent call first):
frame #0: <unknown function> + 0x6aab9 (0x7ff611d3aab9 in /data/users/ezyang/pytorch-tmp/build/lib/libc10.so)
frame #1: ...
```
Error message with context looks like:
```
This is an error
This is context 1
This is context 2
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D21202891 Pulled By: ezyang fbshipit-source-id: 361cadd16bc52e5886dba08e79277771ada76169	2020-05-04 11:56:45 -07:00
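The message-stack idea behind TORCH_RETHROW can be sketched in standalone C++. This is a simplified model under stated assumptions, not c10's implementation. `ContextError` and `RETHROW_WITH_CONTEXT` are hypothetical names, and the real `c10::Error` also records source locations and backtraces.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for c10::Error (not the real class):
// the original message and each added context line are kept as separate
// entries, and newlines are inserted only when the full text is formatted.
class ContextError : public std::exception {
 public:
  explicit ContextError(std::string msg) { add_context(std::move(msg)); }

  // Append one more entry to the message stack and rebuild the text.
  void add_context(std::string ctx) {
    msg_stack_.push_back(std::move(ctx));
    what_.clear();
    for (std::size_t i = 0; i < msg_stack_.size(); ++i) {
      if (i > 0) what_ += '\n';
      what_ += msg_stack_[i];
    }
  }

  const char* what() const noexcept override { return what_.c_str(); }

 private:
  std::vector<std::string> msg_stack_;
  std::string what_;
};

// Hypothetical macro in the spirit of TORCH_RETHROW: attach context to the
// caught error (bound by non-const reference), then rethrow the same
// exception object with `throw;` so its dynamic type is preserved.
#define RETHROW_WITH_CONTEXT(e, ctx) \
  do {                               \
    (e).add_context(ctx);            \
    throw;                           \
  } while (false)

// Usage: two layers each add one line of context before the error escapes.
std::string message_with_context() {
  try {
    try {
      try {
        throw ContextError("This is an error");
      } catch (ContextError& e) {
        RETHROW_WITH_CONTEXT(e, "This is context 1");
      }
    } catch (ContextError& e) {
      RETHROW_WITH_CONTEXT(e, "This is context 2");
    }
  } catch (const ContextError& e) {
    return e.what();
  }
  return {};
}
```

Keeping the stack as separate entries and formatting on demand means call sites never insert newlines themselves, which is the invariant the commit describes.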
pinzhenx	bd604cb5b7	Upgrade MKL-DNN to DNNL v1.2 (#32422 ) Summary:
## Motivation
This PR upgrades MKL-DNN from v0.20 to DNNL v1.2 and resolves https://github.com/pytorch/pytorch/issues/30300. DNNL (Deep Neural Network Library) is the new brand name of MKL-DNN, and it improves performance, quality, and usability over the old version. This PR focuses on migrating all existing functionality, including minor fixes, performance improvements, and code cleanup. It serves as the cornerstone of our future efforts to accommodate new features like OpenCL support, BF16 training, and INT8 inference, and to let the PyTorch community derive more benefit from the Intel architecture.
## What's included?
Even though DNNL has many breaking API changes, we managed to absorb most of them in ideep, so this PR contains only minimal changes to the integration code in PyTorch. Below is a summary of the changes.
General:
1. Replace the op-level allocator with a globally registered allocator
```
// before
ideep::sum::compute<AllocForMKLDNN>(scales, {x, y}, z);
// after
ideep::sum::compute(scales, {x, y}, z);
```
The allocator is now registered in `aten/src/ATen/native/mkldnn/IDeepRegistration.cpp`. From then on, all tensors derived from the `cpu_engine` (the default) use the c10 allocator.
```
RegisterEngineAllocator cpu_alloc(
    ideep::engine::cpu_engine(),
    [](size_t size) {
      return c10::GetAllocator(c10::DeviceType::CPU)->raw_allocate(size);
    },
    [](void* p) {
      c10::GetAllocator(c10::DeviceType::CPU)->raw_deallocate(p);
    }
);
```
------
2. Simplify group convolution
In convolution, the ideep tensor shape used to mismatch the aten tensor shape: when `groups > 1`, DNNL expects weight tensors to be 5-d with an extra group dimension, e.g. `goihw` instead of `oihw` in the 2-d conv case. As shown below, this shape difference previously required many extra checks. Now the difference is completely hidden in ideep, and all tensors align with PyTorch's definition.
So we can safely remove these checks from both the aten and c2 integration code.
```
// aten/src/ATen/native/mkldnn/Conv.cpp
if (w.ndims() == x.ndims() + 1) {
  AT_ASSERTM(
      groups > 1,
      "Only group _mkldnn_conv2d weights could have been reordered to 5d");
  kernel_size[0] = w.get_dim(0) * w.get_dim(1);
  std::copy_n(
      w.get_dims().cbegin() + 2, x.ndims() - 1, kernel_size.begin() + 1);
} else {
  std::copy_n(w.get_dims().cbegin(), x.ndims(), kernel_size.begin());
}
```
------
3. Enable DNNL built-in cache
Previously, we stored DNNL jitted kernels along with intermediate buffers inside ideep using an LRU cache. Now we are switching to the newly added DNNL built-in cache and no longer cache buffers, in order to reduce the memory footprint. This change shows up mainly as lower memory usage in memory profiling results. On the code side, we removed a couple of lines of `op_key_` that previously depended on the ideep cache.
------
4. Use 64-bit integers to denote dimensions
We changed the type of `ideep::dims` from `vector<int32_t>` to `vector<int64_t>`. This makes ideep dims incompatible with the 32-bit dims used by caffe2, so we use something like `{stride_.begin(), stride_.end()}` to convert the parameter `stride_` into an int64 vector.
Misc changes in each commit:
Commit: change build options
Some build options were slightly changed, mainly to avoid name collisions with other projects that include DNNL as a subproject. In addition, the DNNL built-in cache is enabled by the option `DNNL_ENABLE_PRIMITIVE_CACHE`.
| Old | New |
| --- | --- |
| WITH_EXAMPLE | MKLDNN_BUILD_EXAMPLES |
| WITH_TEST | MKLDNN_BUILD_TESTS |
| MKLDNN_THREADING | MKLDNN_CPU_RUNTIME |
| MKLDNN_USE_MKL | N/A (MKL is no longer used) |
------
Commit: aten reintegration
- aten/src/ATen/native/mkldnn/BinaryOps.cpp: implement binary ops using the new `binary` operation provided by DNNL
- aten/src/ATen/native/mkldnn/Conv.cpp: clean up group convolution checks; simplify conv backward integration
- aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp: simplify prepacking of convolution weights
- test/test_mkldnn.py: fix an issue in the conv2d unit test. It previously did not compare conv results between the mkldnn and aten implementations; instead, it compared mkldnn with mkldnn, because the default CPU path also goes into mkldnn. Now we use `torch.backends.mkldnn.flags` to fix this.
- torch/utils/mkldnn.py: prepack the weight tensor in the module's `__init__` to significantly improve performance
------
Commit: caffe2 reintegration
- caffe2/ideep/ideep_utils.h: clean up unused type definitions
- caffe2/ideep/operators/adam_op.cc & caffe2/ideep/operators/momentum_sgd_op.cc: unify tensor initialization with `ideep::tensor::init`; obsolete `ideep::tensor::reinit`
- caffe2/ideep/operators/conv_op.cc & caffe2/ideep/operators/quantization/int8_conv_op.cc: clean up group convolution checks; revamp the convolution API
- caffe2/ideep/operators/conv_transpose_op.cc: clean up group convolution checks; clean up deconv workaround code
------
Commit: custom allocator
- Register the c10 allocator as mentioned above
## Performance
We tested inference on some common models based on user scenarios, and most performance numbers are either better than or on par with MKL-DNN v0.20.
| Ratio: new / old | Latency (batch=1, 4T) | Throughput (batch=64, 56T) |
| --- | --- | --- |
| pytorch resnet18 | 121.4% | 99.7% |
| pytorch resnet50 | 123.1% | 106.9% |
| pytorch resnext101_32x8d | 116.3% | 100.1% |
| pytorch resnext50_32x4d | 141.9% | 104.4% |
| pytorch mobilenet_v2 | 163.0% | 105.8% |
| caffe2 alexnet | 303.0% | 99.2% |
| caffe2 googlenet-v3 | 101.1% | 99.2% |
| caffe2 inception-v1 | 102.2% | 101.7% |
| caffe2 mobilenet-v1 | 356.1% | 253.7% |
| caffe2 resnet101 | 100.4% | 99.8% |
| caffe2 resnet152 | 99.8% | 99.8% |
| caffe2 shufflenet | 141.1% | 69.0% † |
| caffe2 squeezenet | 98.5% | 99.2% |
| caffe2 vgg16 | 136.8% | 100.6% |
| caffe2 googlenet-v3 int8 | 100.0% | 100.7% |
| caffe2 mobilenet-v1 int8 | 779.2% | 943.0% |
| caffe2 resnet50 int8 | 99.5% | 95.5% |
_Configuration: Platform: Skylake 8180. Latency test: 4 threads, warmup 30, 500 iterations, batch size 1. Throughput test: 56 threads, warmup 30, 200 iterations, batch size 64._
† ShuffleNet is one of the few models that require temporary buffers during inference. The performance degradation is expected, since we no longer cache any buffers in ideep. As a solution, we suggest that users opt for a caching allocator such as jemalloc as a drop-in replacement for the system allocator in such heavy workloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32422 Test Plan: Perf results: https://our.intern.facebook.com/intern/fblearner/details/177790608?tab=Experiment%20Results 10% improvement for ResNext with avx512, neutral on avx2. More results: https://fb.quip.com/ob10AL0bCDXW#NNNACAUoHJP Reviewed By: yinghai Differential Revision: D20381325 Pulled By: dzhulgakov fbshipit-source-id: 803b906fd89ed8b723c5fcab55039efe3e4bcb77	2020-03-26 22:07:59 -07:00
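Items 2 and 4 of the DNNL upgrade both concern dimension vectors. The sketch below illustrates the two shape relations in plain C++. `to_dims64` and `oihw_to_goihw` are hypothetical helpers, not ideep functions. The group split assumes PyTorch's usual grouped-conv weight shape of `{O, I/g, kH, kW}`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Item 4: caffe2 kept parameters such as strides as 32-bit ints, while
// ideep::dims became std::vector<int64_t>. Constructing the new vector from
// an iterator range widens every element, which is what a brace expression
// like {stride_.begin(), stride_.end()} does.
std::vector<int64_t> to_dims64(const std::vector<int>& dims32) {
  return std::vector<int64_t>(dims32.begin(), dims32.end());
}

// Item 2 (hypothetical helper, shapes only): a grouped conv weight in
// PyTorch's "oihw" layout is {O, I/g, kH, kW}; the 5-d "goihw" layout that
// DNNL expects splits the output channels as {g, O/g, I/g, kH, kW}.
std::vector<int64_t> oihw_to_goihw(const std::vector<int64_t>& oihw,
                                   int64_t groups) {
  assert(oihw.size() == 4 && groups > 0 && oihw[0] % groups == 0);
  return {groups, oihw[0] / groups, oihw[1], oihw[2], oihw[3]};
}
```

For example, a weight of `{64, 16, 3, 3}` with 4 groups maps to `{4, 16, 16, 3, 3}`. That extra leading dimension is what the removed `w.ndims() == x.ndims() + 1` checks were detecting.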
xiaobing.zhang	19bb496a0d	Enable mkldnn on windows (#31355 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/15982. Pull Request resolved: https://github.com/pytorch/pytorch/pull/31355 Differential Revision: D19428979 Pulled By: ezyang fbshipit-source-id: bee304c5913e70e8dead3098e9796051861cd666	2020-01-27 09:00:02 -08:00
peterjc123	c4121ed8db	Fix is_fundamental template for MSVC (#30959 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/30932 Pull Request resolved: https://github.com/pytorch/pytorch/pull/30959 Differential Revision: D18891797 Pulled By: mingbowan fbshipit-source-id: e6c36ee80065e66117873e768f86f507c48aaef1	2019-12-19 12:10:22 -08:00
Pieter Noordhuis	3556bea5aa	Build torch.distributed with Gloo backend on macOS (#25260 ) Summary: In facebookincubator/gloo#212, a libuv-based Gloo transport was introduced, which allows us to use Gloo on macOS (and later perhaps also Windows). This commit updates the CMake code to enable building with USE_DISTRIBUTED=1 on macOS. A few notes: * The Caffe2 ops are not compiled, because they depend on `gloo::transport::tcp`. * The process group implementation uses `gloo::transport::tcp` on Linux (because of `epoll(2)` on Linux) and `gloo::transport::uv` on macOS. * The TCP store works but sometimes crashes on process termination. * The distributed tests are not yet run. * The nightly builds don't use `USE_DISTRIBUTED=1`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/25260 Reviewed By: mrshenli Differential Revision: D17202381 Pulled By: pietern fbshipit-source-id: ca80a82e78a05b4154271d2fb0ed31c8d9f26a7c	2019-09-05 07:09:50 -07:00
Cheng,Penghui	7ee82d48a8	Removed workaround for convolution transpose op since the bug has been fixed in v0.18 (#22184 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22184 Differential Revision: D15982627 Pulled By: bddppq fbshipit-source-id: 8725d5b5e5b68e029ffb08af12b416bd310c9638	2019-06-25 14:34:34 -07:00
Jinghui	29c849ff34	implement transpose operator for MKLDNN (#19955 ) Summary: implement transpose operator for MKLDNN 1. upgrade mkldnn-bridge to support ND transpose 2. implement transpose operator in caffe2. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19955 Differential Revision: D15701832 Pulled By: bddppq fbshipit-source-id: e4337cd0ba6f8180a35c8c70cbb6830a0a84182f	2019-06-11 01:55:13 -07:00
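A plain reference ND transpose is useful for checks like the CPU-versus-IDEEP comparisons in the transpose test plans. This is a minimal sketch, not the mkldnn-bridge implementation. It assumes contiguous row-major float data and the NumPy-style convention that output axis `d` takes input axis `perm[d]`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference ND transpose of a contiguous row-major buffer: output axis d
// takes input axis perm[d], so out_shape[d] == shape[perm[d]].
std::vector<float> transpose_nd(const std::vector<float>& in,
                                const std::vector<int64_t>& shape,
                                const std::vector<int64_t>& perm) {
  const std::size_t ndim = shape.size();
  assert(perm.size() == ndim);

  // Row-major strides of the input.
  std::vector<int64_t> in_strides(ndim, 1);
  for (std::size_t d = ndim; d-- > 1;) in_strides[d - 1] = in_strides[d] * shape[d];

  std::vector<int64_t> out_shape(ndim);
  for (std::size_t d = 0; d < ndim; ++d) out_shape[d] = shape[perm[d]];

  std::vector<float> out(in.size());
  std::vector<int64_t> idx(ndim, 0);  // current multi-index into the output
  for (std::size_t linear = 0; linear < out.size(); ++linear) {
    // Output coordinate idx[d] runs along input axis perm[d].
    int64_t src = 0;
    for (std::size_t d = 0; d < ndim; ++d) src += idx[d] * in_strides[perm[d]];
    out[linear] = in[static_cast<std::size_t>(src)];
    // Advance the output multi-index in row-major order.
    for (std::size_t d = ndim; d-- > 0;) {
      if (++idx[d] < out_shape[d]) break;
      idx[d] = 0;
    }
  }
  return out;
}
```

For a 2x3 matrix holding `0..5` and `perm = {1, 0}`, the result is the 3x2 transpose `{0, 3, 1, 4, 2, 5}`.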
Cheng,Penghui	74f6c55f0f	support negative axis in concat and split operators Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17955 Differential Revision: D14476031 Pulled By: ezyang fbshipit-source-id: e0e57e8595ed2005ded9e923572a40fe62aca5a7	2019-06-10 15:26:29 -07:00
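Negative-axis support usually comes down to one normalization step before the operator runs. This is a minimal sketch that assumes the NumPy-style rule: an axis in `[-ndim, ndim)` is valid, and `-1` means the last dimension. `canonical_axis` is a hypothetical name, not the helper used in the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Normalize an axis given as an offset from the end (-1 = last dimension)
// into the range [0, ndim). Anything outside [-ndim, ndim) is rejected.
int64_t canonical_axis(int64_t axis, int64_t ndim) {
  if (axis < -ndim || axis >= ndim) {
    throw std::out_of_range("axis out of range");
  }
  return axis < 0 ? axis + ndim : axis;
}
```

For a 4-d input, concat or split along `axis = -1` then behaves exactly like `axis = 3`.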
Tim Khatkevich	a5cca4d342	add fallback for Sign operator (#21343 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21343 Needed to binarise features. Reviewed By: yinghai Differential Revision: D15625653 fbshipit-source-id: 52f48259a040dac35a7000bb1eea9feb5c7ef1ab	2019-06-07 10:56:22 -07:00
Gu, Jinghui	b675f07bb6	Remove useless input shape checker in conv (#19608 ) Summary: The input shape checkers in the conv/int8_conv operators aim to avoid an issue when running with mkldnn winograd: the weights have to be reordered each time the input shape changes. However, the checkers result in a big performance regression due to frequent reorders. Meanwhile, mkldnn-bridge has already fixed this case by correcting the prop_kind. Therefore, we remove the now-useless checkers to fix the performance regression. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19608 Differential Revision: D15061169 Pulled By: yinghai fbshipit-source-id: 649a43ae6fce989e84939210f6dffb143ec3d350	2019-04-24 11:39:43 -07:00
Gu, Jinghui	575aebc182	implement operators for DNNLOWP (#18656 ) Summary: Implement operators for DNNLOWP, including int8_conv, int8_FC, int8_pooling, int8_relu, int8_sum, quantize/dequantize, and order_switch operators. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18656 Differential Revision: D14767092 Pulled By: yinghai fbshipit-source-id: 1f3e24929a358a42214da333bd304c593ea4468f	2019-04-10 12:04:39 -07:00
Gu, Jinghui	a7b82a44c4	Upgrade mkldnn-bridge for dnnlowp support (#16308 ) Summary: The mkldnn-bridge is upgraded in this PR to support DNNLOWP operators. Meanwhile, APIs have been updated in caffe2 to use the latest version. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16308 Differential Revision: D14697018 Pulled By: yinghai fbshipit-source-id: ca952589098accb08295fd5aa92924c61e74d69c	2019-04-03 12:47:17 -07:00
Gregory Chanan	4c74cf7489	Move ideep singleton registration to ATen from C2. (#18335 ) Summary: Since we are going to add ideep to ATen, and ATen is always compiled, it makes sense to have the registration in ATen rather than C2. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18335 Reviewed By: bddppq Differential Revision: D14578652 Pulled By: gchanan fbshipit-source-id: 4d77fcfc21a362b21d5291a127498aa722548873	2019-04-01 08:00:33 -07:00
Cheng,Penghui	e13101e069	support pre-convert filter format for mkldnn training mode and change 'OptimizeForIdeep' to 'OptimizeForMkldnn' (#15171 ) Summary: For MKL-DNN, the filter data is reordered to the primitive format, which takes a lot of time, so this patch provides a method to convert the filter format before training. This patch also renames "OptimizeForIdeep" to "OptimizeForMkldnn". This patch depends on https://github.com/pytorch/pytorch/pull/12866 Pull Request resolved: https://github.com/pytorch/pytorch/pull/15171 Differential Revision: D14590741 Pulled By: yinghai fbshipit-source-id: 07971c9977edac3c8eec08ca2c39cda639683492	2019-03-29 19:00:48 -07:00
Hector Yuen	7bb36ada1f	fix -Wsign-compare warnings for some files inside c2 (#18123 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18123 The motivation of this fix is to resolve code like `for (auto i = 0; i < N; i++)` where N is bigger than int32. These instances of comparison were found by enabling -Wsign-compare. There are far too many to fix at once, so they are being issued as a series of fixes. The plan is to fix all these issues and then enable this flag in Caffe2 to catch future instances. Reviewed By: ZolotukhinM Differential Revision: D14497094 fbshipit-source-id: bca3927a2188bd33a508fa503ba221c220cdaefe	2019-03-19 10:39:20 -07:00
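The loop pattern quoted in this commit, and the kind of fix the series applies, can be illustrated generically. This sketch is not code from the diff, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The flagged pattern: in `for (auto i = 0; i < N; i++)` the counter is
// deduced as int. Comparing it with an unsigned N (e.g. a size()) triggers
// -Wsign-compare, and if N exceeds INT32_MAX the int counter overflows
// (undefined behavior) before it can reach N. The fix is to give the
// counter the same type as the bound.
int64_t sum_all(const std::vector<int64_t>& values) {
  int64_t total = 0;
  for (std::size_t i = 0; i < values.size(); ++i) {  // size_t matches size()
    total += values[i];
  }
  return total;
}

// Same idea with a signed 64-bit bound: use int64_t for the counter too.
int64_t count_multiples(int64_t n, int64_t k) {
  int64_t count = 0;
  for (int64_t i = 0; i < n; ++i) {
    if (i % k == 0) ++count;
  }
  return count;
}
```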
Tim Khatkevich	6aacc1b2dd	Support fallback for more operators in ideep (#17747 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17747 RMACRegions, Normalize and RoIPooling Reviewed By: dskhudia Differential Revision: D14365096 fbshipit-source-id: dafcb7077515e03c2880832a442015b70fc7140d	2019-03-08 05:48:22 -08:00
Michael Liu	f9ba3831ef	Apply modernize-use-override (4) Summary: Use C++11’s override and remove virtual where applicable. Changes are automatically generated. bypass-lint drop-conflicts Reviewed By: ezyang Differential Revision: D14191981 fbshipit-source-id: 1f3421335241cbbc0cc763b8c1e85393ef2fdb33	2019-02-25 08:31:27 -08:00
Gu, Jinghui	60de0b885f	fallback operators to CPU for onnx support (#15270 ) Summary: fallback operators to CPU for onnx support Pull Request resolved: https://github.com/pytorch/pytorch/pull/15270 Differential Revision: D14099496 Pulled By: yinghai fbshipit-source-id: 52b744aa5917700a802bdf19f7007cdcaa6e640a	2019-02-22 10:47:53 -08:00
Cheng,Penghui	376bb40379	Implementation convolutionTranspose operator for mkl-dnn (#12866 ) Summary: The speed-up of a single operation is up to 2-3X on BDW. This PR depends on https://github.com/pytorch/pytorch/pull/14308 Pull Request resolved: https://github.com/pytorch/pytorch/pull/12866 Differential Revision: D13936110 Pulled By: ezyang fbshipit-source-id: 34e3c2ca982a41e8bf556e2aa0477c999fc939d3	2019-02-20 17:26:10 -08:00
Michael Liu	5f866d0ea2	Apply modernize-use-override (2nd iteration) Summary: Use C++11’s override and remove virtual where applicable. Changes are automatically generated. Reviewed By: Orvid Differential Revision: D14086124 fbshipit-source-id: 2005227d095d776ca3b4309a57f54e25782b9b58	2019-02-14 16:52:57 -08:00
Gu, Jinghui	5ada54e0bc	Impl ExpandDims op and fallback to CPU if needed (#15264 ) Summary: Impl ExpandDims op and fallback to CPU if needed Pull Request resolved: https://github.com/pytorch/pytorch/pull/15264 Differential Revision: D13808797 Pulled By: yinghai fbshipit-source-id: 7795ec303a46e85f84e5490273db0ec76e8b9374	2019-02-08 12:04:53 -08:00
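ExpandDims changes only shape metadata, not the data itself. This is a minimal sketch of the shape computation. It assumes Caffe2-style semantics: a `dims` list of positions that refer to the output shape, applied in ascending order. `expand_dims_shape` is a hypothetical helper, and it simply drops duplicate positions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Shape computation for an ExpandDims-style op: insert a size-1 dimension at
// each requested position. Positions refer to the output shape, so sorting
// them and inserting in ascending order puts every 1 at its target index.
std::vector<int64_t> expand_dims_shape(std::vector<int64_t> shape,
                                       std::vector<int64_t> dims) {
  std::sort(dims.begin(), dims.end());
  dims.erase(std::unique(dims.begin(), dims.end()), dims.end());  // drop repeats
  for (int64_t d : dims) {
    assert(d >= 0 && d <= static_cast<int64_t>(shape.size()));
    shape.insert(shape.begin() + d, 1);
  }
  return shape;
}
```

For a `{3, 4}` input and `dims = {0, 3}`, the result is `{1, 3, 4, 1}`.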
Gu, Jinghui	887080e92a	Fallback sum/add to CPU if needed (#15267 ) Summary: Fallback sum/add to CPU if needed Pull Request resolved: https://github.com/pytorch/pytorch/pull/15267 Differential Revision: D13935064 Pulled By: yinghai fbshipit-source-id: eb228683d00a0462a1970f849d35365bc98340d6	2019-02-06 09:35:14 -08:00
Hui Wu	31ab03e34f	Add Winograd Conv method for CPU (#15196 ) Summary: Add winograd conv method. Users can select the direct conv or winograd conv in the model file. We closed the original PR https://github.com/pytorch/pytorch/pull/12154 and created this new one for easier rebasing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15196 Differential Revision: D13463721 Pulled By: yinghai fbshipit-source-id: c5cd5c8aa7622ae7e52aeabd3dbb8ffb99b9b4ee	2019-02-01 16:41:30 -08:00
Xiaomeng Yang	4ae9ab24b6	Update conv_base to support empty batch (#16603 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16603 Update conv_base to support empty batch Reviewed By: houseroad Differential Revision: D13894111 fbshipit-source-id: fc4370ff16ba6046f374e77bd845d28e6af05ea3	2019-01-31 23:46:18 -08:00
Tim Khatkevich	2ed5569bd6	Support fallback for more operators (#16566 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16566 it's a follow-up to https://github.com/pytorch/pytorch/pull/16456 Reviewed By: yinghai Differential Revision: D13881462 fbshipit-source-id: eff063580ac8f622477417ed4b25320299451811	2019-01-30 13:21:20 -08:00
Yinghai Lu	fa717cba63	Support int64_t shape data for ideep reshape op Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16533 Reviewed By: jerryzh168 Differential Revision: D13867402 fbshipit-source-id: ff53a851f142ef915ad69da3868bb3aab4d48987	2019-01-30 09:00:09 -08:00
Tim Khatkevich	7d7855ea31	Fallback support for more operators (#16456 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16456 Adding fallbacks for more operators and fixing ifndef for expand_op.h Reviewed By: yinghai Differential Revision: D13845382 fbshipit-source-id: b7c5b7f7f176707b9ddffade139562a8085967ed	2019-01-30 03:54:11 -08:00
Jerry Zhang	539894d70a	Remove caffe2::ShareData (#16139 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16139 Original commit changeset: 4b15a4c62995 Reviewed By: dzhulgakov Differential Revision: D13677464 fbshipit-source-id: 1a644a88fac02b44feebac48ccc01bc72cc47edb	2019-01-25 15:39:11 -08:00
Gu, Jinghui	0e6791b275	Impl Shape op for mkldnn (#15266 ) Summary: Impl Shape op for mkldnn Pull Request resolved: https://github.com/pytorch/pytorch/pull/15266 Differential Revision: D13804558 Pulled By: yinghai fbshipit-source-id: 8a35f608c23973d7a15c3d645aee4059eb55f245	2019-01-25 11:04:57 -08:00
Shane Li	620ff25bdb	Enhance cpu support on gloo based multi-nodes mode. (#11330 ) Summary: 1. Add some gloo communication operators to the related fallback list; 2. Work around compile errors when using a fallback operator whose CPU operator inherits directly from 'OperatorBase', like PrefetchOperator; 3. Add new CPU context support for some Python module files and the resnet50 training example file. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11330 Reviewed By: yinghai Differential Revision: D13624519 Pulled By: wesolwsk fbshipit-source-id: ce39d57ddb8cd7786db2e873bfe954069d972f4f	2019-01-15 11:47:10 -08:00
Jerry Zhang	6371bc76a9	Back out "[pt1][tensor] Remove caffe2::ShareData" (#15983 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15983 Original commit changeset: 6e4275d02f4c Reviewed By: supertopher, Yangqing Differential Revision: D13644123 fbshipit-source-id: 4b15a4c62995c0e68aad58465600409e302e6504	2019-01-12 07:07:22 -08:00
Cheng,Penghui	926e718d5f	Add/fallback some operators for mkl-dnn (#11696 ) Summary: Implement the LeakyRelu operator for mkl-dnn; the speed-up of a single operation is up to 10X on BDW. Implement the reshape operator for mkl-dnn; this resolves an occasional crash when using the fallback reshape operator. Implement the CreateBlobQueue and SafeEnqueueBlobs operators; this resolves a crash when using the fallback operators. Add CPU fallbacks for the CreateBlobsQueueDBOp, TensorProtosDBInput, and CloseBlobsQueue operators. Implement the Adam operator for mkl-dnn; the speed-up of a single operator is up to 6X on BDW. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11696 Reviewed By: yinghai Differential Revision: D10100438 Pulled By: wesolwsk fbshipit-source-id: 0b6e06897cc11e0a8e349d80a870b1e72e47f10d	2019-01-11 12:53:06 -08:00
Gu, Jinghui	07ea3e035e	Fix fallback issues to handle inplace case (#15726 ) Summary: Fix fallback issues to handle inplace case Pull Request resolved: https://github.com/pytorch/pytorch/pull/15726 Differential Revision: D13591243 Pulled By: yinghai fbshipit-source-id: 6897f1daacb36beabcdfc22c39242bbdfdd0e534	2019-01-10 19:47:09 -08:00
Jerry Zhang	ede1f4ad05	Remove caffe2::ShareData (#15418 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15418 Previously we were using Resize + ShareData. Instead, we'll create a function on Tensor that clones itself with the same storage. Suppose we want `t` to `ShareData` with `t0`.
Previous:
```
Tensor t(dims, CPU);
t.Resize(t0.sizes());
t.ShareData(t0);
```
Now:
```
Tensor t = t0.Alias();
```
Reviewed By: dzhulgakov Differential Revision: D13507609 fbshipit-source-id: 6e4275d02f4c3356cbce91127f1b01111dc86b9f	2019-01-08 11:01:56 -08:00
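The key difference between the two idioms is that `Alias()` produces the sharing handle in one step. Below is a toy model of that idea, assuming reference-counted storage. `ToyTensor` is a hypothetical struct, not `caffe2::Tensor`, and it ignores dtype, device and storage offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical toy tensor: sizes plus a reference-counted buffer.
struct ToyTensor {
  std::vector<int64_t> sizes;
  std::shared_ptr<std::vector<float>> storage;

  // One step instead of Resize + ShareData: copy the sizes and the storage
  // handle, so both tensors see the same underlying buffer.
  ToyTensor Alias() const { return ToyTensor{sizes, storage}; }
};
```

Writing through the alias is visible through the original, because only the `shared_ptr` is copied and the buffer is not.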
Gu, Jinghui	2ebeb33697	Fallback to CPU concat op to handle TensorCPU inputs (#15263 ) Summary: Fallback to CPU concat op to handle TensorCPU inputs Pull Request resolved: https://github.com/pytorch/pytorch/pull/15263 Differential Revision: D13587030 Pulled By: yinghai fbshipit-source-id: 010a8579d61c3beb8556eb92493a552b2ab0030c	2019-01-07 11:13:23 -08:00
Cheng,Penghui	1488c5dd03	support 0 size in any of the tensor dimensions in mkldnn (#15295 ) Summary: support 0 size in any of the tensor dimensions in mkldnn Pull Request resolved: https://github.com/pytorch/pytorch/pull/15295 Differential Revision: D13573747 Pulled By: yinghai fbshipit-source-id: 5bf7a0b9e2567e80f44981a7823be5407fc94e53	2019-01-04 22:33:18 -08:00
Edward Yang	54d8ce94ee	Revert D13383102: [pytorch][PR] Upgrade MKL-DNN to version 0.17 Differential Revision: D13383102 Original commit changeset: c434f0e0ddff fbshipit-source-id: 690f46ca0710954fa591a5ea77535e9759db4de5	2018-12-18 07:39:20 -08:00
Cheng,Penghui	1717ea1da0	Implementation of ChannelShuffle Op for MKLDNN (#15106 ) Summary: The speed-up of a single operation is up to 3X. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15106 Differential Revision: D13429596 Pulled By: bddppq fbshipit-source-id: f8d987cafeac9bef9c3daf7e43ede8c6a4ee2ce5	2018-12-12 20:25:12 -08:00
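Channel shuffle itself is a fixed permutation of channels. This is a reference sketch for NCHW data, assuming the ShuffleNet-style definition: view the C channels as a G x K grid with K = C / G, then transpose it. The function name and the flattened `HW` spatial extent are simplifications for illustration, not the MKLDNN kernel.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reference channel shuffle for NCHW data with H*W flattened into HW:
// view C as (G, K) with K = C / G, transpose to (K, G), and flatten back.
// Input channel g*K + k therefore moves to output channel k*G + g.
std::vector<float> channel_shuffle_nchw(const std::vector<float>& x,
                                        int64_t N, int64_t C, int64_t HW,
                                        int64_t G) {
  assert(G > 0 && C % G == 0);
  assert(static_cast<int64_t>(x.size()) == N * C * HW);
  const int64_t K = C / G;
  std::vector<float> y(x.size());
  for (int64_t n = 0; n < N; ++n) {
    for (int64_t g = 0; g < G; ++g) {
      for (int64_t k = 0; k < K; ++k) {
        const int64_t src_c = g * K + k;
        const int64_t dst_c = k * G + g;
        for (int64_t i = 0; i < HW; ++i) {
          y[(n * C + dst_c) * HW + i] = x[(n * C + src_c) * HW + i];
        }
      }
    }
  }
  return y;
}
```

With 6 channels of one pixel each and `G = 2`, the channels `{0, 1, 2, 3, 4, 5}` become `{0, 3, 1, 4, 2, 5}`.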
Jerry Zhang	83f32eebd9	Tensor construction codemod - 2/3 (#14836 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14836 Codemod generated with clangr shard mode, 25 files per diff, motivation: https://github.com/pytorch/pytorch/pull/12407 Reviewed By: bddppq Differential Revision: D13335176 fbshipit-source-id: 8d89510670e2cf70559d2f75e68f7181feb0b6d9	2018-12-10 19:30:56 -08:00
Gu, Jinghui	70598740ec	Upgrade MKL-DNN to version 0.17 (#14308 ) Summary: Upgrade MKL-DNN to version 0.17 and update mkldnn-bridge to the latest version. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14308 Differential Revision: D13383102 Pulled By: yinghai fbshipit-source-id: c434f0e0ddff2ee2c86db2d6c44a37298fd005a3	2018-12-07 16:44:50 -08:00
PenghuiCheng	939877bf4b	Implementation of WeightedSum op for mkl-dnn and fix FC op output shape issue. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14407 Reviewed By: yinghai Differential Revision: D13364364 Pulled By: wesolwsk fbshipit-source-id: e69bcd1bc52e35b2f0e45e5dc40184f1bd66605d	2018-12-07 12:35:19 -08:00
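For reference, WeightedSum computes an elementwise weighted sum of same-shaped inputs. This is a minimal sketch that assumes each input arrives paired with a scalar weight, mirroring the CPU operator's `X_0, weight_0, X_1, weight_1, ...` input order. `weighted_sum` is a hypothetical helper, not the mkl-dnn implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Reference semantics: Y = w_0 * X_0 + w_1 * X_1 + ... where every X_i has
// the same number of elements and every w_i is a scalar.
std::vector<float> weighted_sum(
    const std::vector<std::pair<std::vector<float>, float>>& inputs) {
  assert(!inputs.empty());
  std::vector<float> y(inputs.front().first.size(), 0.0f);
  for (const auto& input : inputs) {
    const std::vector<float>& x = input.first;
    const float w = input.second;
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < y.size(); ++i) y[i] += w * x[i];
  }
  return y;
}
```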
Dmytro Dzhulgakov	0cfbbceac3	Change Tensor::CopyFrom to a simple double dispatch (#14268 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14268 Removes the need for Context in Tensor by doing a simple dispatch for CopyBytes. It'd eventually be subsumed by Roy Li's changes for a proper copy_ op, but before that is done, let's get clear logic for how copies are implemented and clean up some cruft in the CopyFrom implementation. Note that with these changes, one can probably get rid of Context::CopyFromCPU/CopyToCPU, but that's a matter for follow-up diffs. This diff doesn't change the API of Tensor yet, but relies on the fact that passing `Context` to CopyFrom makes the copy async if the device is CUDA and has no effect otherwise (that's how Context methods are implemented). This doesn't change the semantics of the async copy implementation: as before, it blindly calls cudaMemcpyAsync, which probably means that it can be misused if invoked separately outside of an operator body. I'll leave that for the follow-up copy_ unification. For Extend() we always do an async copy; this makes sense because it's an in-place device-to-device operation and only a subsequent op could observe the result. Note: there are now three ways of invoking copy in C2 code: templated CopyBytes, virtual CopyFromCPU/etc, and the double-dispatch free method here. Hopefully we can get rid of the second one. Also, please advise whether it's c10-worthy :) Reviewed By: ezyang Differential Revision: D13117987 fbshipit-source-id: a6772d6dcf3effaf06717da3a656fc9873b310b5	2018-11-28 15:45:37 -08:00
Gu, Jinghui	60963c2ecb	Add "axis" and "axis_w" arguments in FC to support customized axis to reduce dim. (#12971 ) Summary: Add "axis" and "axis_w" arguments in FC to support a customized axis for reducing dims. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12971 Reviewed By: bddppq Differential Revision: D12850675 Pulled By: yinghai fbshipit-source-id: f1cde163201bd7add53b8475329db1f038a73019	2018-11-21 15:44:50 -08:00
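The shape logic behind `axis` and `axis_w` can be sketched as follows. The sketch assumes Caffe2's FC convention: X is viewed as an M x K matrix by splitting its dims at `axis`, and W as an N x K matrix by splitting at `axis_w`. The output keeps X's leading dims followed by N. `fc_shapes`, `FcShapes` and `dim_product` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Product of dims[begin, end); an empty range gives 1.
int64_t dim_product(const std::vector<int64_t>& dims, std::size_t begin,
                    std::size_t end) {
  int64_t p = 1;
  for (std::size_t i = begin; i < end; ++i) p *= dims[i];
  return p;
}

struct FcShapes {
  int64_t M = 0;  // rows of X viewed as a matrix (product of dims before axis)
  int64_t K = 0;  // reduced dimension (product of dims from axis on)
  int64_t N = 0;  // rows of W viewed as a matrix (output features)
  std::vector<int64_t> out_dims;
};

// Hypothetical helper: split X at `axis` into M x K and W at `axis_w` into
// N x K; the output keeps X's dims before `axis` and appends N.
FcShapes fc_shapes(const std::vector<int64_t>& x_dims,
                   const std::vector<int64_t>& w_dims, std::size_t axis,
                   std::size_t axis_w) {
  assert(axis <= x_dims.size() && axis_w <= w_dims.size());
  FcShapes s;
  s.M = dim_product(x_dims, 0, axis);
  s.K = dim_product(x_dims, axis, x_dims.size());
  s.N = dim_product(w_dims, 0, axis_w);
  // The part of W after axis_w must match the reduced dimension of X.
  assert(dim_product(w_dims, axis_w, w_dims.size()) == s.K);
  s.out_dims.assign(x_dims.begin(),
                    x_dims.begin() + static_cast<std::ptrdiff_t>(axis));
  s.out_dims.push_back(s.N);
  return s;
}
```

Take X of shape `{2, 3, 4, 5}` with `axis = 2` and W of shape `{10, 4, 5}` with `axis_w = 1`. This gives M = 6, K = 20, N = 10 and output dims `{2, 3, 10}`. With the defaults of 1, it reduces to the ordinary batch-by-features FC.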
Hui Wu	acd7811e33	Add sigmoid op based on MKL-DNN Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13097 Differential Revision: D13105366 Pulled By: yinghai fbshipit-source-id: d156e8fd519baeecf61c25dcd8fa2c2fa7351ef4	2018-11-19 22:56:35 -08:00
Cheng,Penghui	c76fc75292	Implementation copy operator for mkl-dnn (#12820 ) Summary: It is an operator that copies a blob from an ideep device to an ideep device. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12820 Reviewed By: ezyang Differential Revision: D10850956 Pulled By: yinghai fbshipit-source-id: f25bff6238cefe847eb98277979fa59139bff843	2018-10-31 19:35:53 -07:00
Jerry Zhang	dcbca53e58	Renaming size() to numel() - 1/6 Summary: Codemod generated with clangr shard mode, 50 files per diff Reviewed By: li-roy Differential Revision: D10866373 fbshipit-source-id: 589194164d4fea93b74d83fa7fc4c59558c41f4a	2018-10-29 11:11:19 -07:00
Gu, Jinghui	dbab9b73b6	separate mkl, mklml, and mkldnn (#12170 ) Summary: 1. Remove avx2 support in mkldnn 2. Separate mkl, mklml, and mkldnn 3. Fix convfusion test case Pull Request resolved: https://github.com/pytorch/pytorch/pull/12170 Reviewed By: yinghai Differential Revision: D10207126 Pulled By: orionr fbshipit-source-id: 1e62eb47943f426a89d57e2d2606439f2b04fd51	2018-10-29 10:52:55 -07:00
Jerry Zhang	314d95a5f2	Renaming dims() to sizes() (caffe2/caffe2) - 3/4 (#13096 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13096 Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes() Reviewed By: ezyang Differential Revision: D10842875 fbshipit-source-id: 1784859735ed4d1bd5ccd7ca56e289498374a68f	2018-10-25 12:14:21 -07:00
Jerry Zhang	07c0f4a097	Tensor dims() -> sizes() (caffe2/operators) - 1/5 (#13028 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13028 Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes() Reviewed By: ezyang Differential Revision: D10476220 fbshipit-source-id: 3c3b3d5e2082cd6a1f0ff4a3c8641b30e6f16896	2018-10-24 14:18:18 -07:00


103 Commits