pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Xiaomeng Yang	2ce39de3fc	Add elementwise_affine for layer_norm_op (#19713 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19713 Add elementwise_affine for layer_norm_op Reviewed By: houseroad Differential Revision: D15075454 fbshipit-source-id: e8a7d3da1c81e49fa55323f5e74a68bc4ef8d83f	2019-04-26 17:20:01 -07:00
Oleg Bogdanov	bf5a5c2a31	caffe2 | Use _aligned_free in WorkerPool destruction (#19751 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19751 This has probably never been tested on Windows but destruction of WorkersPool crashes because it uses _aligned_malloc to allocate and 'free' to deallocate, which is not symmetric. Fix is to use _aligned_free in deallocation Reviewed By: hlu1 Differential Revision: D15083472 fbshipit-source-id: 42243fce8f2dfea7554b52e6b289d9fea81d7681	2019-04-25 14:54:50 -07:00
Xiaomeng Yang	fb9fc42a0c	optimize BatchMatmulOp (#18612 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18612 optimize BatchMatmulOp Reviewed By: houseroad Differential Revision: D14681665 fbshipit-source-id: cf5ea4909ace58fd44fe6fa634531102ac84e851	2019-04-23 15:34:59 -07:00
Oleg Bogdanov	70b82d28b8	caffe2 | Windows compat fixes Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19531 Reviewed By: hlu1 Differential Revision: D15024541 fbshipit-source-id: cd8249a6d529afb65fa8afd74a05dbfe73eb1fb0	2019-04-23 14:30:19 -07:00
Gemfield	6ed57e052d	Fix the return value of ParseFromString (#19262 ) Summary: Fix the return value of ParseFromString. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19262 Differential Revision: D14937605 Pulled By: ezyang fbshipit-source-id: 3f441086517186a075efb3d74f09160463b696b3	2019-04-15 12:39:29 -07:00
Xiaomeng Yang	fd40c0eba0	Add gelu op (#18992 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18992 Add gelu op Reviewed By: houseroad Differential Revision: D14814811 fbshipit-source-id: 00f126b8b83763c57ebbf28fbd2de5a8fab6d491	2019-04-08 21:58:29 -07:00
Yinghai Lu	1d263ed92a	Add backward pass to infer single missing input shape for Concat opportunistically (#18911 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18911 Att. Reviewed By: bddppq Differential Revision: D14791295 fbshipit-source-id: 4b7a775924f0eadb0cb73aa6c434a6a5be8b92be	2019-04-05 10:11:58 -07:00
Yinghai Lu	80404cb2f5	Add support for getting TensorProto argument (#18364 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18364 att Reviewed By: bddppq Differential Revision: D14584784 fbshipit-source-id: 03f9207d5cf4f7f4b812428a931edbcdcb21ca8d	2019-04-02 20:58:28 -07:00
Xiaomeng Yang	265fa0ce4d	Move math::Axpy function to elementwise lib (#18316 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18316 Move math::Axpy function to elementwise lib i-am-not-moving-c2-to-c10 Reviewed By: houseroad Differential Revision: D14574697 fbshipit-source-id: 7cfbb2da295c8966c5328bd6b577cce2638eea62	2019-03-26 12:19:19 -07:00
nihui	ed8c462dc7	Fix caffe2 build with BLAS=OpenBLAS (#18422 ) Summary: g++ complains about failing to find the declaration of cblas_sscal and cblas_dscal BLAS function let's fix it :) fedora 29, gcc 8.3.1, openblas 0.3.5 build with cmake -DBLAS=OpenBLAS .. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18422 Differential Revision: D14598977 Pulled By: soumith fbshipit-source-id: bde77bfb359d2ff38226401caeed78c114ef7468	2019-03-25 11:59:10 -07:00
Duc Ngo	172ec4ace5	caffe2 - Util to cleanup external inputs and outputs from a NetDef (#18194 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18194 Add a util method to cleanup external inputs and outputs from a NetDef The following conditions will be met after the modification - No duplicate external inputs - No duplicate external outputs - Going through list of ops in order, all op inputs must be outputs from other ops, or registered as external inputs. - All external outputs must be outputs of some operators. Reviewed By: ZolotukhinM Differential Revision: D14528589 fbshipit-source-id: c8d82fda1946aa3696abcbec869a4a8bb22f09b6	2019-03-22 11:23:03 -07:00
Xiaomeng Yang	e04c9195b7	Update math::Transpose to support tensor with size > 2G (#17670 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17670 Update math::Transpose to support tensor with size > 2G i-am-not-moving-c2-to-c10 Differential Revision: D14313624 fbshipit-source-id: 0b4a85b913972e5a8981f0d40d0c539407b98f30	2019-03-20 18:22:21 -07:00
Xiaomeng Yang	0fd1dc45c0	Optimize LayerNormOp (#17604 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17604 Optimize LayerNormOp i-am-not-moving-c2-to-c10 Reviewed By: houseroad Differential Revision: D14274175 fbshipit-source-id: a7aa263a1b0eb109682d2be99306e7b2cdcc0faf	2019-03-08 17:38:14 -08:00
James Reed	1d26a3ae7e	Open registration for c10 thread pool (#17788 ) Summary: 1. Move ATen threadpool & open registration mechanism to C10 2. Move the `global_work_queue` to use this open registration mechanism, to allow users to substitute in their own Pull Request resolved: https://github.com/pytorch/pytorch/pull/17788 Reviewed By: zdevito Differential Revision: D14379707 Pulled By: jamesr66a fbshipit-source-id: 949662d0024875abf09907d97db927f160c54d45	2019-03-08 15:38:41 -08:00
Yinghai Lu	efed875b3f	Catch exceptions in bound_shape_inference (#17775 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17775 Handles use input shape hint properly. Reviewed By: zrphercule Differential Revision: D14368735 fbshipit-source-id: 504cd96589e47aa432617e56362aa6b01a25ba9b	2019-03-08 13:18:28 -08:00
Xiaomeng Yang	9709d5e787	Fix math::Set for large tensor (#17539 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17539 Fix math::Set for large tensor i-am-not-moving-c2-to-c10 Reviewed By: dzhulgakov, houseroad Differential Revision: D14240756 fbshipit-source-id: 0ade26790be41fb26d2cc193bfa3082c7bd4e69d	2019-02-27 12:34:58 -08:00
Xiaomeng Yang	2e67b34ea7	Separate gpu reduce functions (#17146 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17146 Separate gpu reduce functions i-am-not-moving-c2-to-c10 Reviewed By: houseroad Differential Revision: D14097564 fbshipit-source-id: a27de340997111a794b1d083c1673d4263afb9fb	2019-02-20 14:49:01 -08:00
Michael Liu	92a516b9ff	Apply modernize-use-override - 2/2 Summary: Use C++11’s override and remove virtual where applicable. Changes are automatically generated. Reviewed By: Orvid Differential Revision: D14054721 fbshipit-source-id: 15d266fa1779b1e3ea6270f00841d7fb1e4d44ee	2019-02-13 21:01:28 -08:00
Xiaomeng Yang	3a34f443c5	Separate reduce functions from math (#16929 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16929 Separate CPU reduce functions from math i-am-not-moving-c2-to-c10 Reviewed By: houseroad Differential Revision: D13999469 fbshipit-source-id: bd628b15a6e3c1f04cc62aefffb0110690e1c0d1	2019-02-13 17:50:47 -08:00
Xiaomeng Yang	2db847b3a7	Separate elementwise level2 math functions (#16753 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16753 Separate elementwise level2 math functions i-am-not-moving-c2-to-c10 Reviewed By: houseroad Differential Revision: D13954928 fbshipit-source-id: 1ca7a5d3da96e32510f502e5e4e79168854bee67	2019-02-07 18:38:26 -08:00
Johannes M Dieterich	448e0d78e9	Document hip-clang and its __HIP__ macro (#16771 ) Summary: In #16085 , we introduced initial hip-clang bring-up code. Document the use of the __HIP__ macro now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16771 Differential Revision: D13961538 Pulled By: ezyang fbshipit-source-id: 67f6226abcbe62e2f4efc291c84652199c464ca6	2019-02-05 15:13:52 -08:00
James Reed	ce15ae8f23	Add an API to set the number of threads in C10 thread pool (#16669 ) Summary: Tested locally on machine translation service Pull Request resolved: https://github.com/pytorch/pytorch/pull/16669 Differential Revision: D13927858 Pulled By: jamesr66a fbshipit-source-id: efcb8c21e0c2f76ac37967e6f52967da515595c3	2019-02-05 00:15:56 -08:00
Xiaomeng Yang	7d4a81cbb2	Use macro for reduce on 2d blocks (#16344 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16344 Use macro for reduce on 2d blocks i-am-not-moving-c2-to-c10 Reviewed By: houseroad Differential Revision: D13808988 fbshipit-source-id: b68c0fb6079c1b6e203a072083aba7a95c202bc2	2019-02-01 23:49:07 -08:00
Xiaomeng Yang	598b713660	Separate level1 elementwise functions from math (#16397 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16397 Separate level1 elementwise functions from math i-am-not-moving-c2-to-c10 Reviewed By: houseroad Differential Revision: D13830626 fbshipit-source-id: e6e672647076dab8b3b24be181f580a1486250c9	2019-01-30 00:04:12 -08:00
Owen Anderson	f204e3e624	Pass WERROR to CMake as an explicit parameter rather than an env var. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16465 Differential Revision: D13853949 Pulled By: resistor fbshipit-source-id: 71ccf90a2824ad21c9f26dd753b186f30435d82a	2019-01-28 20:57:18 -08:00
Xiaomeng Yang	0a2d14dd7c	Optimize SpatialBNOp on GPU (#16395 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16395 Optimize SpatialBNOp on GPU i-am-not-moving-c2-to-c10 Reviewed By: houseroad Differential Revision: D13829833 fbshipit-source-id: 04d2a63e8e9830c4c39a91cf87fcd7aa765dc55f	2019-01-28 09:36:45 -08:00
Edward Yang	45602ce9a2	Delete Tensor::swap(), replace with pointer swap (#12730 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12730 i-am-not-moving-c2-to-c10 Reviewed By: smessmer Differential Revision: D10415430 fbshipit-source-id: 8a2ce8611c5fa77bbbd73fb6788c1baa3b370f07	2019-01-25 08:25:07 -08:00
Benny Chen	f25322fb97	Fix issues under caffe round 1 Summary: Some automation to fix uninitialized members for caffe2 code. Ran canary to make sure I don't have any regression in prod, but not sure how to test comprehensively for caffe2 Reviewed By: ezyang Differential Revision: D13776185 fbshipit-source-id: fb2a479971cc0276d8784be1c44f01252410bd24	2019-01-23 19:04:59 -08:00
Yaxun (Sam) Liu	9521a15c88	hip-clang enablement (#16085 ) Summary: Initial enabling of the upcoming hip-clang compiler for the PyTorch source base. Changes: * update the Eigen submodule to a version including our upstreamed hip-clang enabling there * modify a few ifdef guards with the `__HIP__` macro used by hip-clang * use `__lane_id` instead of `hc::__lane_id` * add Debug flags for ROCm to the cmake infrastructure Pull Request resolved: https://github.com/pytorch/pytorch/pull/16085 Differential Revision: D13709459 Pulled By: ezyang fbshipit-source-id: 1b7b33fe810a0434766180580d4443ea177eb7c7	2019-01-22 09:09:48 -08:00
Xiaomeng Yang	866c4e3467	Separate Moments from math and optimize it (#16175 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16175 Separate Moments from math and optimize it i-am-not-moving-c2-to-c10 Reviewed By: houseroad Differential Revision: D13742472 fbshipit-source-id: 90757d908d38c98ca69818855aaf68315e525992	2019-01-20 08:53:25 -08:00
Xiaomeng Yang	b436f94b53	Separate affine_channel from math and optimize it (#16135 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16135 Separate affine_channel from math and optimize it i-am-not-moving-c2-to-c10 Reviewed By: houseroad Differential Revision: D13727606 fbshipit-source-id: 8980af4afadaf964a18a9da581106fe30896a7e9	2019-01-18 22:40:16 -08:00
Thomas Viehmann	b662a9b66a	add back NNPACK in PyTorch (#15924 ) Summary: This tests the water for adding back NNPACK in PyTorch, it's a lot better than the fallback THNN versions. In #6151, we (ezyang and soumith) removed NNPACK support from PyTorch. Of course Maratyszcza might have advice, too. (Or an opinion on the CMake changes.) The only functional changes are to use NNPack more aggressively on mobile and a .contiguous() to match NNPack's assumption (I stumbled over that while using NNPack for style transfer.) The CMake changes try to use the NNPack we already have in git. In terms of lines of code this is a large part of the diff of https://lernapparat.de/pytorch-jit-android/ . As far as I can tell, we don't have MKLDNN on mobile and the native THNN implementation are prohibitively expensive in terms of both CPU and memory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15924 Differential Revision: D13709576 Pulled By: ezyang fbshipit-source-id: f2e287739909451c173abf046588209a7450ca2c	2019-01-18 15:34:35 -08:00
bddppq	1a09a2a27f	Export PyTorch erf to ONNX Erf and add Caffe2 Erf operator Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16106 Differential Revision: D13709490 Pulled By: bddppq fbshipit-source-id: 1b5b32261f06543371f7bd7ac9b11957a5eb4ad0	2019-01-17 09:18:08 -08:00
Jerry Zhang	890568a018	Tensor reinitialization codemod - 5/5 (#15884 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15884 Codemod generated with clangr shard mode, 25 files per diff. To eliminate partially initialized Tensors, we split the initialization of local Tensor variables into two steps: first declare an uninitialized Tensor, then call `ReinitializeTensor` to initialize it. motivation: https://github.com/pytorch/pytorch/pull/12407 Reviewed By: hyuen Differential Revision: D13586737 fbshipit-source-id: dc8e49e9f29505b8898bb19f84c1a983f2d811ab	2019-01-10 16:32:26 -08:00
Sebastian Messmer	d408324350	Move files to/from c10/core and c10/util (#15316 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15316 This starts cleaning up the files in c10 according to the module structure we decided on. Move to c10/util: - Half.h, Half-inl.h, Half.cpp, bitcasts.h Move to c10/core: - Device.h, Device.cpp - DeviceType.h, DeviceType.cpp i-am-not-moving-c2-to-c10 Reviewed By: dzhulgakov Differential Revision: D13498493 fbshipit-source-id: dfcf1c490474a12ab950c72ca686b8ad86428f63	2019-01-10 16:22:22 -08:00
Jerry Zhang	0c32e1b43e	use C10_MOBILE/ANDROID/IOS (#15363 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15363 Didn't define C10_MOBILE in the numa file move diff: D13380559 move CAFFE2_MOBILE/ANDROID/IOS to c10 ``` codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_MOBILE" "C10_MOBILE" codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_ANDROID" "C10_ANDROID" codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_IOS" "C10_IOS" ``` i-am-not-moving-c2-to-c10 Reviewed By: marcinkwiatkowski Differential Revision: D13490020 fbshipit-source-id: c4f01cacbefc0f16d5de94155c26c92fd5d780e4	2019-01-09 15:08:20 -08:00
Hao Lu	58a7f2aed1	Add pthreadpool_create and pthreadpool_destroy (#15492 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15492 Add pthreadpool_create and pthreadpool_destroy, which are used by NNPACK tests. Reviewed By: Maratyszcza Differential Revision: D13540997 fbshipit-source-id: 628c599df87b552ca1a3703854ec170243f04d2e	2018-12-21 20:28:18 -08:00
Hao Lu	01be9b7292	Handling nullptr case Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15467 Reviewed By: Maratyszcza Differential Revision: D13536504 fbshipit-source-id: ab46ff6bb4b6ce881c3e29d7e6a095ea62289db4	2018-12-21 15:08:00 -08:00
David Reiss	cbd1c519c4	Replace non-printable-ascii characters in ProtoDebugString (#14918 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14918 When ProtoBuf-Lite is in use, ProtoDebugString just calls SerializeAsString. This produces binary output, which is not a very suitable "debug" string. Specifically, we've observed it causing problems when calling code tries to add the debug string to a Java exception message (which requires valid UTF-8). Now, we replace all non-ASCII bytes with "?". This is not a very fast implementation, but generating debug strings shouldn't be a performance-sensitive operation in any application. Reviewed By: dzhulgakov Differential Revision: D13385540 fbshipit-source-id: 8868172baf20efaf53fecf7d666a6980f59b64f5	2018-12-13 13:16:24 -08:00
rohithkrn	7e2b074219	Integrate rocBLAS fp16 api into Caffe2 (#14882 ) Summary: This PR integrates rocBLAS half and mixed precision APIs in to Caffe2. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14882 Differential Revision: D13407840 Pulled By: bddppq fbshipit-source-id: 75cb0d74da066776fa66575f1d255e879d36121e	2018-12-10 17:54:06 -08:00
James Sun	186341c5dc	Merge Caffe2 and PyTorch thread pool definitions (#14114 ) Summary: (1) Move Caffe2 thread pool to aten (2) Use the same thread pool definition for PyTorch interpreter (3) Make ivalue::Future thread-safe Pull Request resolved: https://github.com/pytorch/pytorch/pull/14114 Reviewed By: ilia-cher Differential Revision: D13110451 Pulled By: highker fbshipit-source-id: a83acb6a4bafb7f674e3fe3d58f7a74c68064fac	2018-11-28 18:10:20 -08:00
ArutyunovG	8e91da4cb3	Windows shared build (#13550 ) Summary: Hi guys, I'd like to build Caffe2 with more supported options in Windows with Microsoft Visual Studios. This is the first pull request. Running scripts/build_windows_shared.bat is able to build Caffe2 with both CMAKE_BUILD_TYPE=Debug and CMAKE_BUILD_TYPE=Release with Visual Studio 14 2015. CUDA is 9.0, cudnn is 7.0.5, glog, gflags and lmdb are supported on my system. Python is 3.5, Detectron works from python interface as well. It was even possible to debug detectron code and step into caffe2_gpu.dll with pdbs built. What is disappointing, that c10/experimental ops don't build with this Visual Studio generator, I added special option INCLUDE_EXPERIMENTAL_C10_OPS (default ON) to deal with it in build_windows_shared.bat. After this pull request the next step is to add Visual Studio 2017 support in the script. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13550 Reviewed By: ezyang Differential Revision: D13042597 Pulled By: orionr fbshipit-source-id: f313f909f599cd582a1d000eff766eef3a9fc4fc	2018-11-16 12:16:28 -08:00
Junjie Bai	0d7a986da1	Change hip filename extension to .hip (#14036 ) Summary: xw285cornell - To make hip files have a unique filename extension we change hip files from _hip.cc to .hip (it's the only blessed option other than .cu in hipcc `3d51a1fb01/bin/hipcc (L552)`). - Change to use host compiler to compile .cc|.cpp files. Previously we used hcc to compile them, which is unnecessary - Change the hipify script to not replace "gpu" with "hip" in the filename of the generated hipified files. Previously we did this because hcc has a bug when linking files that have the same filename. We have now changed to use the host linker to do linking, so this is no longer necessary. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14036 Reviewed By: xw285cornell Differential Revision: D13091813 Pulled By: bddppq fbshipit-source-id: ea3d887751d8abb39d75f5d5104aa66ce66b9ee0	2018-11-16 11:55:59 -08:00
Yinghai Lu	7c053b7e64	Add filler for SparseLengthsWeightedSum (#13949 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13949 This diff adds support to fillers for `SparseLengthsWeight` ops. It does 3 things: 1. Add the fillers for `SparseLengthsWeight` ops 2. Add filling heuristics to consider the path of `LengthsRangeFill` -> `Gather` -> `SparseLengthsWeightedSum`, where the length input is shared by `LengthsRangeFill` and `SparseLengthsWeightedSum`. Therefore, we need to carefully bound the value of that length input so that at `Gather`, it does not index out-of-bound for the weight input of `Gather`. 3. Fix and simplify the logic of `math::RandFixedSum`, where we just keep rejecting the generated value if it violates the invariants. Reviewed By: highker Differential Revision: D13048216 fbshipit-source-id: bfe402e07e6421b28548047d18b298c148e0ec87	2018-11-16 11:31:05 -08:00
Ashish	5ae3b44255	Added HIP top_k operator (#13747 ) Summary: This PR contains changes for: 1. Adding HIP top_k operator in Caffe2 2. Added HIP equivalent definitions of GPUDefs and GPUScanUtils 3. Removing the top_k operator test from ROCm test ignore list 4. Bug fixes in related code in THC/THCAsmUtils.cuh Differential Revision: D12986451 Pulled By: bddppq fbshipit-source-id: 6d5241fb674eaeb7cde42166426ac88043b83504	2018-11-08 20:14:53 -08:00
rohithkrn	afc7dbd586	Hipify caffe2/utils/math_gpu.cu (#13521 ) Summary: This PR adds caffe2/utils/math_gpu.cu to pyHipify bddppq petrex Pull Request resolved: https://github.com/pytorch/pytorch/pull/13521 Differential Revision: D12954843 Pulled By: bddppq fbshipit-source-id: a2bf367da07e49cb7807ba6876b42d0733fc8205	2018-11-07 11:34:15 -08:00
Sebastian Messmer	b1c57caaf9	Move flat_hash_map to c10/util Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13527 Reviewed By: ezyang Differential Revision: D12912239 fbshipit-source-id: bb44d3ff87c4ca94943ec2667acf1e7ce2b3c914	2018-11-05 17:39:18 -08:00
Jongsoo Park	54e8623d26	3D Conv in NHWC layout (#12733 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12733 Conv in NHWC layout only works for 2D images. This has been a pain point when implementing quantized 3D convolution because we need NHWC layout for best performance (note that NHWC layout in general gives better performance in CPU not just for quantized operators). For example, our quantized ops have a functionality to measure quantized error operator by operator but this needs running a shadow fp32 operator, but this is not easy when there's no 3D conv in NHWC layout is available (currently we're doing layout conversion on the fly for the shadow fp32 operator which is error prone). Some of Caffe2 frameworks like brew generates error when we try to create a 3D conv op in NHWC layout. This was also a blocker for using aibench because aibench is using brew. i-am-not-moving-c2-to-c10 Reviewed By: houseroad Differential Revision: D10333829 fbshipit-source-id: 2d203ee1db833cd3f9d39353219e3894b46c4389	2018-11-04 21:50:09 -08:00
Dmytro Dzhulgakov	fdf34c8da8	Kill more weird constructors on Tensor Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13433 Reviewed By: jerryzh168 Differential Revision: D12874599 fbshipit-source-id: 0c262fda72cbc4f3ea80df790cc8e95140bdc7e0	2018-11-04 16:54:49 -08:00
Jongsoo Park	f000101b81	add a few comments on layout after im2col (#12429 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12429 Comments to clarify layout after NHWC im2col for group convolution. i-am-not-moving-c2-to-c10 Reviewed By: houseroad Differential Revision: D10233284 fbshipit-source-id: 996a69f2f932e02c978abaade7571b00741b6ae8	2018-11-04 11:02:58 -08:00


417 Commits