Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34157
`[[noreturn]]` only conflicts with the CUDA `__assert_fail` definition if clang is used as the host compiler
Test Plan: CI
Reviewed By: EscapeZero
Differential Revision: D20232088
fbshipit-source-id: 7182c28a15278e03175865cd0c87410c5de5bf2c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34102
If nvcc is invoked with clang as the host compiler, the build fails with the following error, due to a mismatch between the attribute decorators defined in CUDA and in c10:
```
error: attribute "noreturn" did not appear on original declaration
```
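For context, a minimal sketch of the declaration mismatch that triggers this diagnostic (hypothetical function name; the real conflict is between CUDA's `__assert_fail` declaration and c10's):
```
// C++ requires [[noreturn]] to appear on the first declaration of a
// function; adding it only on a later redeclaration is rejected.
void fail(const char* msg);               // first declaration, no attribute
[[noreturn]] void fail(const char* msg);  // error: attribute "noreturn" did
                                          // not appear on original declaration
```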
Test Plan: Build pytorch with clang
Reviewed By: EscapeZero
Differential Revision: D20204951
fbshipit-source-id: ff7cef0db43436e50590cb4bbf1ae7302c1440fa
Summary:
This might lead to silent undefined behaviour (e.g., with out-of-bounds indices). This affects `test_multinomial_invalid_probs_cuda`, which is now removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32719
Test Plan:
* Build with VERBOSE=1 and manually inspect `less ndebug.build.log | grep 'c++' | grep -v -- -DNDEBUG` (only with ninja on Linux)
* CI
Fixes https://github.com/pytorch/pytorch/issues/22745
Differential Revision: D20104340
Pulled By: yf225
fbshipit-source-id: 2ebfd7ddae632258a36316999eeb5c968fb7642c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33632
* `inline_container.h` was unnecessarily exposing all includers to caffe2 headers via `caffe2/core/logging.h`.
* Add an MSVC version of the pragmas for hiding unused warnings.
* Make sure clang on Windows does not use the MSVC pragmas.
* Don't redefine the math macro.
Test Plan: CI green
Differential Revision: D20017046
fbshipit-source-id: 230a9743eb88aee08d0a4833680ec2f01b7ab1e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30916
These macros said "make it constexpr if we're in C++14". Since we're now always on C++14, we can just say "constexpr" instead.
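For illustration, the pattern being removed looks roughly like this (macro name hypothetical; the real macros lived in the C10/ATen headers):
```
// Before: constexpr only under C++14.
#if __cplusplus >= 201402L
#define CPP14_CONSTEXPR constexpr
#else
#define CPP14_CONSTEXPR
#endif
CPP14_CONSTEXPR int square(int x) { return x * x; }

// After: C++14 is guaranteed, so spell it directly.
constexpr int square2(int x) { return x * x; }
```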
ghstack-source-id: 96369584
Test Plan: waitforsandcastle
Differential Revision: D18869635
fbshipit-source-id: f41751e4e26fad6214ec3a98db2d961315fd73ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30917
This is a C++14 feature; we can use it now.
ghstack-source-id: 95255753
Test Plan: waitforsandcastle
Differential Revision: D18869637
fbshipit-source-id: dd02036b9faeaffa64b2d2d305725443054da31b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31276
Change `assert` to `CUDA_ASSERT_KERNEL` to avoid the undefined `__assert_fail()` under HIP.
This is similar to https://github.com/pytorch/pytorch/pull/13902 in caffe2 land.
Test Plan: wait for CI to clear
Reviewed By: bddppq
Differential Revision: D19047582
fbshipit-source-id: 34703b03786c8eee9c78d2459eb54bde8dc21a57
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26616
Implement C++17 `std::string_view` for C++11.
This is useful for compile-time type name retrieval, which I'm going to stack on top of this.
It is also useful for replacing `const std::string&` throughout our codebase.
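A sketch of that second use case, written against the standard type for illustration (the c10 implementation mirrors this interface):
```
#include <string_view>

// Taking string_view instead of const std::string& accepts string
// literals and substrings without forcing a std::string allocation.
bool has_prefix(std::string_view s, std::string_view prefix) {
  return s.substr(0, prefix.size()) == prefix;
}
// has_prefix("tensor.size", "tensor");  // no heap allocation
```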
ghstack-source-id: 92100314
Test Plan: unit tests
Differential Revision: D17518992
fbshipit-source-id: 48e31c677d51b0041f4b37e89a92bd176d4a0b08
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26757
This doesn't switch any open source builds or CI.
The internal fbcode build has been C++17 for quite some time, but in CUDA code, we had it restricted to C++11.
This diff changes that to C++14.
Because this doesn't change anything open source, the risk of this is low.
ghstack-source-id: 90728524
Test Plan: waitforsandcastle
Differential Revision: D17558142
fbshipit-source-id: 9cfd47e38e71d5a2fdae2f535c01f281bf007d9a
Summary:
Introduce a C10_WARP_SIZE define in Macros.h
For kernels that `#ifdef`'d WARP_SIZE for ROCm vs. CUDA, use said macro. This is not a functional change; we merely refactor to unify on one WARP_SIZE definition.
I hope we can encourage use of this macro instead of sprinkling more WARP_SIZE definitions across the code base (or hard-coding the number).
Some kernels remain that have their own WARP_SIZE definitions but did not satisfy the above condition. They will be fixed in follow-up PRs.
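The definition itself is a small conditional along these lines (platform guard as spelled in the ROCm toolchain of that era; see c10/macros/Macros.h):
```
#ifdef __HIP_PLATFORM_HCC__
#define C10_WARP_SIZE 64  // AMD wavefront size on ROCm
#else
#define C10_WARP_SIZE 32  // NVIDIA warp size
#endif
```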
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25884
Differential Revision: D17276662
Pulled By: bddppq
fbshipit-source-id: cef8e77a74ae2e5de10df816ea80b25cb2bab713
Summary:
This replaces the kernel helpers in Loops.h/cuh with the following:
* cpu_kernel
* cpu_kernel_vec
* gpu_kernel
* gpu_kernel_with_scalars

These work with functions of any number of input arguments, with the exception of `gpu_kernel_with_scalars`, which is limited to binary operations. Previously, we only supported functions of 0, 1, or 2 input arguments; adding support for 3- or 4-input functions would have required a significant amount of additional code.

This makes a few other changes:
* Remove `ntensors` from the for_each/serial_for_each loop. Most loops assume a fixed number of tensors, and the value is accessible from `TensorIterator::ntensors()`.
* Only lift CPU scalars to parameters in `gpu_kernel_with_scalars`. Previously, we performed this recursively in gpu_unary_kernel and gpu_binary_kernel, so something like `torch.add(3, 4, out=cuda_tensor)` would specialize to a "nullary" kernel. Now, only the first scalar input is lifted to a kernel parameter; any additional scalar inputs are copied to CUDA tensors. Note that operations like `x + 5` and `5 + x` still work efficiently. This avoids generating an exponential number of specializations in the number of input arguments.
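A hedged sketch of how the new helpers are used (assumes an already-built at::TensorIterator `iter` with one output and two inputs; `GPU_LAMBDA` marks a host/device lambda, and exact qualifiers may vary):
```
gpu_kernel(iter, [] GPU_LAMBDA (float a, float b) -> float {
  return a + b;  // applied elementwise over the iterator
});

// The CPU variant has the same shape:
cpu_kernel(iter, [](float a, float b) -> float { return a + b; });
```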
**Performance measurements**
Timing numbers are unchanged for basic elementwise operations. Linked below is a script to measure torch.add perf on this PR vs. master, on CPU and GPU (GCC 7.3):
[miniperf.py](https://gist.github.com/colesbury/4a61893a22809cb0931f08cd37127be4)
**Generated assembly**
cpu_kernel and cpu_kernel_vec still generate good vectorized code with both GCC 7.3 and GCC 4.8.5. Below is the assembly for the "hot" inner loop of torch.add, as well as an auto-vectorized torch.mul implementation using cpu_kernel/binary_kernel. (The real torch.mul uses cpu_kernel_vec, but I wanted to check that auto-vectorization still works well):
[torch.add GCC 7.3](https://gist.github.com/colesbury/927ddbc71dc46899602589e85aef1331)
[torch.add GCC 4.8](https://gist.github.com/colesbury/f00e0aafd3d1c54e874e9718253dae16)
[torch.mul auto vectorized GCC 7.3](https://gist.github.com/colesbury/3077bfc65db9b4be4532c447bc0f8628)
[torch.mul auto vectorized GCC 4.8](https://gist.github.com/colesbury/1b38e158b3f0aaf8aad3a76963fcde86)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21475
Differential Revision: D15745116
Pulled By: colesbury
fbshipit-source-id: 914277d7930dc16e94f15bf87484a4ef82890f91
Summary:
Resubmit #20698 which got messed up.
The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so - only the first invocation of a trigger point is logged. This is significantly more lightweight than #18235, and thus we can afford to put logging in e.g. TensorImpl.
Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcomed.
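The trigger points follow a log-once pattern along these lines (macro name as it appears in c10's logging header; treat the details of this sketch as assumptions):
```
#include <c10/util/Logging.h>

void create_module() {
  // Logged at most once per process; effectively a no-op unless the
  // hosting environment installs a logging handler.
  C10_LOG_API_USAGE_ONCE("torch.nn.module");
  // ... rest of the function ...
}
```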
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745
Differential Revision: D15429196
Pulled By: dzhulgakov
fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19779
This macro wasn't set correctly because the target macros weren't included from Apple's header.
Reviewed By: dzhulgakov
Differential Revision: D15090427
fbshipit-source-id: 43ca44f0f409e11718b7f60c3fdcd2aa02d7018e
Summary:
Fixing MSVC errors
```
D:\pytorch-scripts\caffe2_builders\v141\pytorch\aten\src\THC/THCReduce.cuh(144): error C4002: too many actual parameters for macro 'C10_LAUNCH_BOUNDS_1' [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2\caffe2_gpu.vcxproj]
D:\pytorch-scripts\caffe2_builders\v141\pytorch\aten\src\THC/THCReduce.cuh(259): error C4002: too many actual parameters for macro 'C10_LAUNCH_BOUNDS_1' [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2\caffe2_gpu.vcxproj]
D:/pytorch-scripts/caffe2_builders/v141/pytorch/aten/src/THCUNN/SpatialDilatedMaxPooling.cu(51): error C4002: too many actual parameters for macro 'C10_LAUNCH_BOUNDS_1' [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2\caffe2_gpu.vcxproj]
```
on the variadic C10_LAUNCH_BOUNDS, as well as Debug linking issues with at::Half in pool_op_cudnn.cc like this one:
```
pool_op_cudnn.obj : error LNK2019: unresolved external symbol "public: bool __cdecl caffe2::MaxPoolFunctor<class caffe2::CUDAContext>::GlobalPoolingBackward<struct c10::Half,2>(int,int,int,struct c10::Half const *,struct c10::Half const *,struct c10::Half const *,struct c10::Half *,class caffe2::CUDAContext *)const " (??$GlobalPoolingBackward@UHalf@c10@@$01@?$MaxPoolFunctor@VCUDAContext@caffe2@@caffe2@QEBA_NHHHPEBUHalf@c10@00PEAU23@PEAVCUDAContext@1@Z) referenced in function "public: bool __cdecl caffe2::`anonymous namespace'::CuDNNMaxPoolFunctor::GlobalPoolingBackward<struct c10::Half,2>(int,int,int,struct c10::Half const *,struct c10::Half const *,struct c10::Half const *,struct c10::Half *,class caffe2::CUDAContext *)const " (??$GlobalPoolingBackward@UHalf@c10@@$01@CuDNNMaxPoolFunctor@?A0xb936404a@caffe2@QEBA_NHHHPEBUHalf@c10@00PEAU34@PEAVCUDAContext@2@Z) [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2\caffe2_gpu.vcxproj]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17201
Differential Revision: D14165732
Pulled By: ezyang
fbshipit-source-id: 875fd9a5b2db6f83fc483f6d750d2c011260eb8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17481
Usually, feature macros are either defined or undefined and are checked accordingly.
C10_MOBILE was a weird special case that was always defined, but defined to either 1 or 0.
This caused a lot of confusion for me when trying to disable something from the mobile build: it also got disabled from the server build (because I was using `#ifdef`). I also found a place in the existing code base that made that wrong assumption and used the macro incorrectly, see https://fburl.com/y4icohts
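The pitfall, concretely (illustrative only):
```
// Old convention: C10_MOBILE always defined, to 0 or 1. This branch is
// then taken on *all* platforms, not just mobile:
#ifdef C10_MOBILE
// (wrongly) compiled into server builds too
#endif

// The old convention required a value check instead:
#if C10_MOBILE
// mobile-only code
#endif
```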
Reviewed By: dzhulgakov
Differential Revision: D14214825
fbshipit-source-id: f3a155b6d43d334e8839e2b2e3c40ed2c773eab6
Summary:
When compiling for `TORCH_CUDA_ARCH_LIST=7.5` we were getting ptxas warnings (https://github.com/pytorch/pytorch/issues/14310). This was because we had some hardcoded values when using launch_bounds in kernels. The maximum number of threads per multiprocessor is 1024 for Turing architecture (7.5) but 2048 for previous architectures. The hardcoded launch_bounds in the kernel were requesting for 2048 threads when compiling for Turing and hence were generating the warning.
This PR adds a macro that bounds-checks the launch_bounds value supplied. The max number of threads per block across all architectures is 1024. If a user supplies more than 1024, I just clamp it down to 512. Depending on this value, I set the minimum number of blocks per SM. This PR should resolve https://github.com/pytorch/pytorch/issues/14310. The incorrect gradient computation reported in that issue is probably due to the faulty card.
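A hedged sketch of the clamping logic just described (macro names assumed for illustration; the real definitions live in c10/macros/Macros.h):
```
// Clamp a requested threads-per-block to something every arch supports.
#define C10_MAX_THREADS_PER_BLOCK(val) ((val) <= 1024 ? (val) : 512)

// Derive min blocks per SM so (threads * blocks) stays within the
// 2048-threads-per-SM limit of pre-Turing architectures.
#define C10_MIN_BLOCKS_PER_SM(threads, blocks) \
  ((threads) * (blocks) <= 2048 ? (blocks) : (2048 / (threads)))

// Usage on a kernel:
// __global__ void __launch_bounds__(
//     C10_MAX_THREADS_PER_BLOCK(1024),
//     C10_MIN_BLOCKS_PER_SM(1024, 4)) my_kernel(...);
```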
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15461
Differential Revision: D13633952
Pulled By: soumith
fbshipit-source-id: 795aa151109f343ab5433bf3cb070cb6ec896fff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13920
I want to move CUDAStream and CUDAGuard to c10_cuda without also
bringing along CUDAContext or CUDAEvent for the ride (at least for
now). To do this, I need to eliminate those dependencies.
There are a few functions in CUDAContext.h which don't really need THCState, so they're separated out and put in the general-purpose c10/cuda/CUDAFunctions.h.
Reviewed By: smessmer
Differential Revision: D13047468
fbshipit-source-id: 7ed9d5e660f95805ab39d7af25892327edae050e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13838
According to Sebastian, the `detail` convention is specifically for header-private functionality. That's not what c10/detail is; it holds general, library-private headers which may be used in multiple places within PyTorch. Rename it to `impl` to avoid the confusion in nomenclature.
Reviewed By: smessmer
Differential Revision: D13024368
fbshipit-source-id: 050f2632d83a69e3ae53ded88e8f938c5d61f0ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13342
This PR introduces a few new concepts:
- DeviceGuardImplInterface, and implementations for CPU and CUDA, which
provide a generic interface for interfacing with device and stream state,
without requiring a direct dependency on the code in question.
- InlineDeviceGuard, a general template for generating both specialized
and dynamically dispatched device guard implementations. Dynamic
dispatch is done by specializing it on a VirtualGuardImpl.
- Provide a device-independent DeviceGuard class, which can be used even
from CPU code. It uses the aforementioned dynamic dispatch.
- CUDA-specialized CUDAGuard class, which doesn't have a dynamic dispatch
but can only be used from CUDA.
- StreamGuard, which is the same as above, but for streams rather than
devices.
- Optional variants of all the aforementioned guards, which are a no-op if
no device/stream is specified
- CUDAMultiStreamGuard, specifically for the case when we want to set
a stream on every device.
There are some subtle semantic changes, which have been thoroughly documented
in the class definition.
BC-breaking changes:
- Move constructor/assignment have been removed from all device guard
implementations.
- In some cases where you previously wrote 'set_device' (or 'set_stream'), you now must write
'reset_device', because if you switch devices/device types, the stream/device on the
previous device is unset. This is different from previous behavior.
- CUDAGuard no longer handles streams, or multiple streams. Use CUDAStreamGuard
or CUDAMultiStreamGuard as appropriate for your use case.
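A hedged usage sketch of the device-independent guard (header path and constructor shape assumed from the description above):
```
#include <c10/core/DeviceGuard.h>

void run_on(c10::Device dev) {
  // Sets the current device via the dynamically dispatched
  // DeviceGuardImplInterface; the previous device is restored when
  // `guard` goes out of scope.
  c10::DeviceGuard guard(dev);
  // ... work on `dev` ...
}
```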
Reviewed By: dzhulgakov
Differential Revision: D12849620
fbshipit-source-id: f61956256f0b12be754b3234fcc73c2abc1be04e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12950
For backwards compatibility, we want the c10 symbols to be reachable from caffe2 and aten.
When we move classes from at/caffe2 to c10, this
1. allows keeping backwards compatibility with third-party code we can't control, and
2. allows splitting diffs that move such classes into two diffs, where one only fixes the includes and the second one fixes the namespaces.
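One plausible shape of the aliasing (a sketch; the diff may alias individual symbols instead of using blanket using-directives):
```
// Makes c10 symbols reachable as caffe2::... and at::... as well.
namespace caffe2 {
using namespace c10;
}
namespace at {
using namespace c10;
}
```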
Reviewed By: ezyang
Differential Revision: D10496244
fbshipit-source-id: 914818688fad8c079889dfdc6242bc228b539f0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12932
I was looking at some assembly for some code I was working on and felt a desire to have likely()/unlikely() macros. I checked whether we already had them, and we didn't. This commit adds them and fixes up all known use sites to make use of them.
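The macros follow the usual `__builtin_expect` pattern (sketched here; treat the exact compiler guards as assumptions, the canonical definitions live in c10/macros/Macros.h):
```
#if defined(__GNUC__) || defined(__clang__)
#define C10_LIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 1))
#define C10_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
#else
#define C10_LIKELY(expr) (expr)
#define C10_UNLIKELY(expr) (expr)
#endif

// Typical use site: mark the error path as cold.
// if (C10_UNLIKELY(index < 0)) { throw std::out_of_range("index"); }
```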
Reviewed By: Maratyszcza
Differential Revision: D10488399
fbshipit-source-id: 7476da208907480d49f02b37c7345c17d85c3db7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12588
By default, this is an alias for REGISTER_CPU_OPERATOR. If gradients are not required (e.g., on mobile), it can be converted to a no-op by defining CAFFE2_NO_GRADIENT_OPS, resulting in a smaller build.
GRADIENT_OPERATOR_SCHEMA works similarly.
CAFFE2_NO_GRADIENT_OPS also converts REGISTER_GRADIENT to a no-op.
Use these macros in fully_connected_op.cc as an example.
Follow-up diffs will convert more operators.
I had to introduce MACRO_EXPAND to handle the way Visual Studio expands `__VA_ARGS__`.
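A sketch of the conditional registration described above (macro shapes assumed for illustration):
```
// MACRO_EXPAND forces MSVC to re-scan __VA_ARGS__ so it expands as a
// full argument list rather than a single token.
#define MACRO_EXPAND(args) args

#ifdef CAFFE2_NO_GRADIENT_OPS
#define REGISTER_GRADIENT_OPERATOR(...) /* no-op */
#else
#define REGISTER_GRADIENT_OPERATOR(...) \
  MACRO_EXPAND(REGISTER_CPU_OPERATOR(__VA_ARGS__))
#endif
```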
Reviewed By: Yangqing
Differential Revision: D10209468
fbshipit-source-id: 4116d9098b97646bb30a00f2a7d46aa5d7ebcae0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12881
TSIA. This should not change any functionality.
Remaining work:
- change the build script to deprecate use of CAFFE2_USE_MINIMAL_GOOGLE_GLOG and use a C10 macro instead.
- Unify the exception name (EnforceNotMet -> Error)
- Unify the logging and warning APIs (like AT_WARNING)
Reviewed By: dzhulgakov
Differential Revision: D10441597
fbshipit-source-id: 4784dc0cd5af83dacb10c4952a2d1d7236b3f14d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714
This is a short change to enable the c10 namespace in caffe2. We did not enable it before due to gflags global-variable confusion, but that should be mostly cleaned up now. Right now, the plan on record is that namespace caffe2 and namespace aten will fully be supersets of namespace c10.
Most of the diff is codemod; the only two non-codemod changes are in caffe2/core/common.h, where
```
using namespace c10;
```
is added, and in Flags.h, where instead of creating aliasing variables in the c10 namespace, we directly put them in the global namespace to match gflags (with the same behavior if we are not building with gflags).
Reviewed By: dzhulgakov
Differential Revision: D10390486
fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12630
Instead of providing mutable accessors, our "mutators" now
return new copies of TensorOptions. Since TensorOptions is
simply two 64-bit integers, this is not a big efficiency
problem.
There may be some sites that assumed that TensorOptions was
mutable. They need to be fixed.
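The resulting value-semantics style at call sites (a small illustrative snippet in ATen's builder style):
```
#include <ATen/ATen.h>

// Each "mutator" returns a new TensorOptions copy, so calls chain:
at::TensorOptions opts = at::TensorOptions()
                             .dtype(at::kFloat)
                             .device(at::kCUDA);
at::Tensor t = at::zeros({2, 3}, opts);
```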
Reviewed By: SsnL
Differential Revision: D10249293
fbshipit-source-id: b3d17acc37e78c0b90ea2c29515de5dd01209bd3
Summary:
This does the following:
- add c10/util/Registry.h as the unified registry util
- clean up some APIs such as the export condition
- fully remove aten/core/registry.h
- fully remove caffe2/core/registry.h
- remove a bogus aten/registry.h
- unify all macros
- set up registry testing in c10
Also, an important note: we used to mark the templated Registry class as EXPORT - this should not happen, because one should almost never export a template class. This PR fixes that.
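A sketch of the unified macros in use (names per c10/util/Registry.h; exact signatures may differ across versions):
```
#include <c10/util/Registry.h>
#include <memory>

struct Animal {
  virtual ~Animal() = default;
};

C10_DECLARE_REGISTRY(AnimalRegistry, Animal);
C10_DEFINE_REGISTRY(AnimalRegistry, Animal);

struct Dog : Animal {};
C10_REGISTER_CLASS(AnimalRegistry, Dog, Dog);

// std::unique_ptr<Animal> a = AnimalRegistry()->Create("Dog");
```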
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12077
Reviewed By: ezyang
Differential Revision: D10050771
Pulled By: Yangqing
fbshipit-source-id: 417b249b49fed6a67956e7c6b6d22374bcee24cf