Commit Graph

135 Commits

Author SHA1 Message Date
Nikita Shulga
4edff32f81 [c10] Fix typo in __assert_fail noreturn modifier guard (#34157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34157

`[[noreturn]]` only conflicts with the CUDA `__assert_fail` definition if clang is used as the host compiler.

Test Plan: CI

Reviewed By: EscapeZero

Differential Revision: D20232088

fbshipit-source-id: 7182c28a15278e03175865cd0c87410c5de5bf2c
2020-03-03 17:25:25 -08:00
Nikita Shulga
0689cf8fc1 [c10] Make __assert_fail CUDA definition compilable with clang host compiler (#34102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34102

If nvcc is invoked with clang as the host compiler, the build fails with the following error due to a mismatch between the attribute declarations in CUDA and c10:
```
 error: attribute "noreturn" did not appear on original declaration
```
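
For context, a minimal sketch of the guarding pattern this pair of commits adjusts; the macro name and the exact condition below are illustrative, not the real c10 code (the real declaration also carries CUDA host/device qualifiers):
```
// Sketch only: attach [[noreturn]] conditionally so the redeclaration matches
// whatever the host compiler's headers already declared for __assert_fail.
#if defined(__CUDACC__) && !defined(__clang__)
#define SKETCH_NORETURN [[noreturn]]
#else
#define SKETCH_NORETURN
#endif

// glibc's signature for __assert_fail; redeclared here only to show the guard.
extern "C" SKETCH_NORETURN void __assert_fail(
    const char* assertion,
    const char* file,
    unsigned int line,
    const char* function);
```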

Test Plan: Build pytorch with clang

Reviewed By: EscapeZero

Differential Revision: D20204951

fbshipit-source-id: ff7cef0db43436e50590cb4bbf1ae7302c1440fa
2020-03-02 20:11:49 -08:00
Wojciech Baranowski
8aa09de19e build: set -DNDEBUG in Release (#32719)
Summary:
Defining NDEBUG disables `assert()`, which might lead to silent undefined behaviour (e.g. with out-of-bounds indices). This affects `test_multinomial_invalid_probs_cuda`, which is now removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32719
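
For context, a small self-contained illustration of why the flag matters: `assert()` expands to nothing once NDEBUG is defined, so checks silently disappear in Release builds.
```
// Illustration only: assert() becomes ((void)0) when NDEBUG is defined,
// so a Release build silently skips the bounds check.
#include <cassert>
#include <cstdio>

int get(const int* data, int size, int index) {
  assert(index >= 0 && index < size);  // compiled out if NDEBUG is defined
  return data[index];                  // out-of-bounds access becomes silent UB
}

int main() {
  int values[3] = {1, 2, 3};
  std::printf("%d\n", get(values, 3, 1));
  return 0;
}
```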

Test Plan:
* Build with VERBOSE=1 and manually inspect `less ndebug.build.log | grep 'c++' | grep -v -- -DNDEBUG` (only with nina on Linux)
* CI

Fixes https://github.com/pytorch/pytorch/issues/22745

Differential Revision: D20104340

Pulled By: yf225

fbshipit-source-id: 2ebfd7ddae632258a36316999eeb5c968fb7642c
2020-02-26 12:53:31 -08:00
Michael Ranieri
9b2b15f4fc misc windows warning fixes (#33632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33632

* `inline_container.h` was unnecessarily exposing all includers to caffe2 headers via `caffe2/core/logging.h`.
* Add an MSVC version of the pragmas for hiding unused warnings (a sketch of the pattern follows below).
* Make sure clang on Windows does not use MSVC pragmas.
* Don't redefine the math macro.
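
A hedged sketch of the compiler-guarded pragma pattern the second and third bullets describe; macro names and warning numbers are illustrative, not the exact c10 macros:
```
#if defined(_MSC_VER) && !defined(__clang__)
// MSVC proper: use #pragma warning; clang-cl must not see these.
#define SKETCH_SUPPRESS_UNUSED_BEGIN \
  __pragma(warning(push)) __pragma(warning(disable : 4100 4101))
#define SKETCH_SUPPRESS_UNUSED_END __pragma(warning(pop))
#elif defined(__clang__) || defined(__GNUC__)
#define SKETCH_SUPPRESS_UNUSED_BEGIN \
  _Pragma("GCC diagnostic push")     \
  _Pragma("GCC diagnostic ignored \"-Wunused-parameter\"")
#define SKETCH_SUPPRESS_UNUSED_END _Pragma("GCC diagnostic pop")
#else
#define SKETCH_SUPPRESS_UNUSED_BEGIN
#define SKETCH_SUPPRESS_UNUSED_END
#endif

SKETCH_SUPPRESS_UNUSED_BEGIN
static void callback_stub(int fd) {}  // `fd` deliberately unused
SKETCH_SUPPRESS_UNUSED_END
```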

Test Plan: CI green

Differential Revision: D20017046

fbshipit-source-id: 230a9743eb88aee08d0a4833680ec2f01b7ab1e9
2020-02-21 19:36:25 -08:00
Michael Ranieri
40265e2d66 prevent various warnings related to undef and redef (#33196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33196

Test Plan: Sandcastle green

Reviewed By: malfet

Differential Revision: D19842268

fbshipit-source-id: 47bc3d7a75e803041491e11a648b4a9e7d9cc72c
2020-02-12 13:28:35 -08:00
Sebastian Messmer
c21f89970f Remove c++14-conditional constexpr (#30916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30916

These macros said "make it constexpr if we're in C++14". Since we're now always C++14, we can just say "constexpr" instead.
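
A hedged before/after sketch of the change; the conditional macro name below is illustrative, not the one that was removed:
```
#if __cplusplus >= 201402L
#define SKETCH_CPP14_CONSTEXPR constexpr
#else
#define SKETCH_CPP14_CONSTEXPR
#endif

// Before: conditionally constexpr, depending on the language standard.
SKETCH_CPP14_CONSTEXPR int square_before(int x) {
  int result = x * x;  // multi-statement bodies need C++14 constexpr
  return result;
}

// After: the build is always at least C++14, so plain constexpr suffices.
constexpr int square_after(int x) {
  int result = x * x;
  return result;
}

static_assert(square_after(3) == 9, "evaluated at compile time");
```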
ghstack-source-id: 96369584

Test Plan: waitforsandcastle

Differential Revision: D18869635

fbshipit-source-id: f41751e4e26fad6214ec3a98db2d961315fd73ff
2020-01-07 16:40:11 -08:00
Sebastian Messmer
409151e1bb Use [[noreturn]] instead of C10_NORETURN or CAFFE_NORETURN (#30917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30917

`[[noreturn]]` is a standard attribute (available since C++11); now that the build is C++14 everywhere, we can use it directly.
ghstack-source-id: 95255753
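
A small sketch of the substitution, with an illustrative function:
```
// Before (illustrative): C10_NORETURN hid __attribute__((noreturn)) or
// __declspec(noreturn) behind a macro.
// C10_NORETURN void fail(const char* msg);

// After: the standard attribute works on every supported compiler.
#include <stdexcept>

[[noreturn]] void fail(const char* msg) {
  throw std::runtime_error(msg);
}
```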

Test Plan: waitforsandcastle

Differential Revision: D18869637

fbshipit-source-id: dd02036b9faeaffa64b2d2d305725443054da31b
2019-12-15 23:54:16 -08:00
Serhat Yilmaz
57ee7dab87 Wraps assert statements in cuda kernels (#31276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31276

Change `assert` --> `CUDA_ASSERT_KERNEL` to avoid HIP's undefined `__assert_fail()`.

This is similar to https://github.com/pytorch/pytorch/pull/13902 in caffe2 land.
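
A hedged sketch of the wrapper pattern; the macro name comes from the commit, but the expansion shown here is an assumption:
```
#include <cassert>

#if defined(__HIP_PLATFORM_HCC__)
// HIP device code has no __assert_fail, so device-side asserts become no-ops.
#define CUDA_ASSERT_KERNEL(cond) ((void)0)
#else
#define CUDA_ASSERT_KERNEL(cond) assert(cond)
#endif

// Usage inside a kernel body (host-compilable stand-in shown here):
void bounds_check_stand_in(int index, int size) {
  CUDA_ASSERT_KERNEL(index >= 0 && index < size);
}
```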

Test Plan: wait for CI to clear

Reviewed By: bddppq

Differential Revision: D19047582

fbshipit-source-id: 34703b03786c8eee9c78d2459eb54bde8dc21a57
2019-12-14 20:29:47 -08:00
Sebastian Messmer
70e9ef518f c10::string_view (#26616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26616

Implement C++17 std::string_view for C++11.

This is useful for compile-time type name retrieval, which I'm going to stack on top of this.
It is also useful for replacing `const std::string&` with it throughout our codebase.
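
A small usage sketch, assuming the header path below and an interface that mirrors std::string_view:
```
#include <c10/util/string_view.h>
#include <iostream>

// Passing string literals avoids constructing a std::string at the call site.
bool starts_with_aten(c10::string_view name) {
  return name.size() >= 6 && name.substr(0, 6) == c10::string_view("aten::");
}

int main() {
  std::cout << starts_with_aten("aten::add") << ' '
            << starts_with_aten("prim::Constant") << std::endl;
  return 0;
}
```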
ghstack-source-id: 92100314

Test Plan: unit tests

Differential Revision: D17518992

fbshipit-source-id: 48e31c677d51b0041f4b37e89a92bd176d4a0b08
2019-10-21 16:10:40 -07:00
Sebastian Messmer
5c67b01467 Switch internal CUDA build to C++14 (#26757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26757

This doesn't switch any open source builds or CI.
The internal fbcode build has been C++17 for quite some time, but CUDA code was restricted to C++11.
This diff changes that to C++14.

Because this doesn't change anything open source, the risk of this is low.
ghstack-source-id: 90728524

Test Plan: waitforsandcastle

Differential Revision: D17558142

fbshipit-source-id: 9cfd47e38e71d5a2fdae2f535c01f281bf007d9a
2019-09-26 14:57:21 -07:00
Johannes M Dieterich
a8d4bb34ea Unify treatment of warp size / wave size (#25884)
Summary:
Introduce a C10_WARP_SIZE define in Macros.h

For kernels that previously ifdef'ed WARP_SIZE for ROCm vs CUDA, use said macro. This is not a functional change; we merely refactor to unify on one WARP_SIZE definition.

I hope we can encourage use of this macro over more WARP_SIZE definitions being sprinkled across the code base (or numerically hard-coded).

Some kernels remain that have their own WARP_SIZE definitions but did not satisfy the above condition. They will be fixed in follow-up PRs.
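
A hedged sketch of the kind of definition such a macro unifies on; the exact guard in Macros.h may differ:
```
// Wavefronts on ROCm GPUs are 64 lanes wide; warps on CUDA GPUs are 32.
#if defined(__HIP_PLATFORM_HCC__) || defined(USE_ROCM)
#define SKETCH_WARP_SIZE 64
#else
#define SKETCH_WARP_SIZE 32
#endif

// A typical use: sizing per-warp shared memory and reduction strides.
constexpr int kWarpsPerBlock = 256 / SKETCH_WARP_SIZE;
static_assert(kWarpsPerBlock > 0, "block must hold at least one warp");
```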
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25884

Differential Revision: D17276662

Pulled By: bddppq

fbshipit-source-id: cef8e77a74ae2e5de10df816ea80b25cb2bab713
2019-09-10 00:11:09 -07:00
Sam Gross
d8314a6260 Replace nullary/unary/binary loops with generic implementation (#21475)
Summary:
```
This replaces the kernel helpers in Loops.h/cuh with the following:

  cpu_kernel
  cpu_kernel_vec

  gpu_kernel
  gpu_kernel_with_scalars

These work with functions with any number of input arguments, with the
exception of 'gpu_kernel_with_scalars', which is limited to binary
operations. Previously, we only supported functions of 0, 1, or 2 input
arguments. Adding support for 3 or 4 input argument functions required
a significant amount of additional code.

This makes a few other changes:

Remove 'ntensors' from the for_each/serial_for_each loop. Most loops
assume a fixed number of tensors, and the value is accessible from
TensorIterator::ntensors()

Only lift CPU scalars to parameters in 'gpu_kernel_with_scalars'.
Previously, we performed this recursively in gpu_unary_kernel and
gpu_binary_kernel, so something like `torch.add(3, 4, out=cuda_tensor)`
would specialize to a "nullary" kernel. Now, only the first
scalar input is lifted to a kernel parameter. Any additional scalar
inputs are copied to CUDA tensors. Note that operations like `x + 5`
and `5 + x` still work efficiently. This avoids generating an exponential
number of specializations in the number of input arguments.
```
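
A hedged usage sketch of the new helpers, roughly as an ATen CPU kernel would call them; the header paths and the TensorIterator factory are assumptions about the tree at the time, not verified code:
```
#include <ATen/ATen.h>
#include <ATen/native/TensorIterator.h>
#include <ATen/native/cpu/Loops.h>

void my_add_kernel(at::Tensor& out, const at::Tensor& a, const at::Tensor& b) {
  auto iter = at::TensorIterator::binary_op(out, a, b);
  // The lambda takes one value per input tensor; cpu_kernel handles the
  // iteration, broadcasting, and type checks set up by the iterator.
  at::native::cpu_kernel(iter, [](float x, float y) -> float { return x + y; });
}
```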

**Performance measurements**
Timing numbers are unchanged for basic elementwise operations. Linked below is a script to measure torch.add perf on PR vs. master CPU+GPU (GCC 7.3):
[miniperf.py](https://gist.github.com/colesbury/4a61893a22809cb0931f08cd37127be4)

**Generated assembly**
cpu_kernel and cpu_kernel_vec still generate good vectorized code with
both GCC 7.3 and GCC 4.8.5. Below is the assembly for the "hot" inner loop of
torch.add as well as an auto-vectorized torch.mul implementation using cpu_kernel/
binary_kernel. (The real torch.mul uses cpu_kernel_vec but I wanted to check that
auto vectorization still works well):

[torch.add GCC 7.3](https://gist.github.com/colesbury/927ddbc71dc46899602589e85aef1331)
[torch.add GCC 4.8](https://gist.github.com/colesbury/f00e0aafd3d1c54e874e9718253dae16)
[torch.mul auto vectorized GCC 7.3](https://gist.github.com/colesbury/3077bfc65db9b4be4532c447bc0f8628)
[torch.mul auto vectorized GCC 4.8](https://gist.github.com/colesbury/1b38e158b3f0aaf8aad3a76963fcde86)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21475

Differential Revision: D15745116

Pulled By: colesbury

fbshipit-source-id: 914277d7930dc16e94f15bf87484a4ef82890f91
2019-06-17 19:08:33 -07:00
Dmytro Dzhulgakov
c25e33789e Lightweight at-most-once logging for API usage (#20745)
Summary:
Resubmit #20698 which got messed up.

The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so - only the first invocation of a trigger point is logged. This is significantly more lightweight than #18235, and thus we can afford to put logging in e.g. TensorImpl.

Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcomed.
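
A hedged usage sketch; the macro name and header are as I understand the c10 tree and should be treated as assumptions:
```
#include <c10/util/Logging.h>

void load_custom_extension() {
  // Logged once per process, no matter how many times this function runs;
  // subsequent calls are a cheap no-op.
  C10_LOG_API_USAGE_ONCE("my_project.load_custom_extension");
  // ... actual work ...
}
```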
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745

Differential Revision: D15429196

Pulled By: dzhulgakov

fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
2019-05-23 23:17:59 -07:00
Edward Z. Yang
9b1dbffba5
Re-sync with internal repository (#20702) 2019-05-20 09:22:57 -04:00
Dmytro Dzhulgakov
d3059b9c49 Lightweight logging for once-only API usage 2019-05-19 23:04:40 -07:00
Edward Yang
4e551a7edb Make C10_NODISCARD macro more portable for nvcc+clang. (#20324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20324
ghimport-source-id: e51181c82f87c946b5ffcb87b0ad71a056cb4659

Differential Revision: D15359317

Pulled By: ezyang

fbshipit-source-id: d88798f13a61c74456641ddec8250c08ce8af240
2019-05-17 08:57:19 -07:00
Sebastian Messmer
e710f3b1e1 Fix C10_MOBILE macro for ios (#19779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19779

This macro wasn't set correctly because the target macros weren't included from Apple's header.
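
A hedged sketch of the fix described above; the mobile-flag macro name is illustrative, while TargetConditionals.h and TARGET_OS_IPHONE are Apple's real macros:
```
// Apple's target macros live in TargetConditionals.h; they must be included
// before being tested, otherwise TARGET_OS_IPHONE silently evaluates to 0.
#if defined(__APPLE__)
#include <TargetConditionals.h>
#if TARGET_OS_IPHONE
#define SKETCH_MOBILE
#endif
#endif
```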

Reviewed By: dzhulgakov

Differential Revision: D15090427

fbshipit-source-id: 43ca44f0f409e11718b7f60c3fdcd2aa02d7018e
2019-04-30 12:03:24 -07:00
Grigory Arutyunov
2336f0ba06 msvc_fixes (#17201)
Summary:
Fixing MSVC errors

```
  D:\pytorch-scripts\caffe2_builders\v141\pytorch\aten\src\THC/THCReduce.cuh(144): error C4002: too many actual parameters for macro 'C10_LAUNCH_BOUNDS_1' [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2\caffe2_gpu.vcxproj]
  D:\pytorch-scripts\caffe2_builders\v141\pytorch\aten\src\THC/THCReduce.cuh(259): error C4002: too many actual parameters for macro 'C10_LAUNCH_BOUNDS_1' [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2\caffe2_gpu.vcxproj]
  D:/pytorch-scripts/caffe2_builders/v141/pytorch/aten/src/THCUNN/SpatialDilatedMaxPooling.cu(51): error C4002: too many actual parameters for macro 'C10_LAUNCH_BOUNDS_1' [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2\caffe2_gpu.vcxproj]
```

on the variadic C10_LAUNCH_BOUNDS macro, as well as Debug linking issues with at::Half in pool_op_cudnn.cc, like this one:

```
pool_op_cudnn.obj : error LNK2019: unresolved external symbol "public: bool __cdecl caffe2::MaxPoolFunctor<class caffe2::CUDAContext>::GlobalPoolingBackward<struct c10::Half,2>(int,int,int,struct c10::Half const *,struct c10::Half const *,struct c10::Half const *,struct c10::Half *,class caffe2::CUDAContext *)const " (??$GlobalPoolingBackward@UHalf@c10@@$01@?$MaxPoolFunctor@VCUDAContext@caffe2@@caffe2@QEBA_NHHHPEBUHalf@c10@00PEAU23@PEAVCUDAContext@1@Z) referenced in function "public: bool __cdecl caffe2::`anonymous namespace'::CuDNNMaxPoolFunctor::GlobalPoolingBackward<struct c10::Half,2>(int,int,int,struct c10::Half const *,struct c10::Half const *,struct c10::Half const *,struct c10::Half *,class caffe2::CUDAContext *)const " (??$GlobalPoolingBackward@UHalf@c10@@$01@CuDNNMaxPoolFunctor@?A0xb936404a@caffe2@QEBA_NHHHPEBUHalf@c10@00PEAU34@PEAVCUDAContext@2@Z) [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2\caffe2_gpu.vcxproj]
```
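
A hedged sketch of the classic workaround family for MSVC's `__VA_ARGS__` expansion, the kind of fix applied to the variadic launch-bounds macro above; all names are illustrative:
```
#define SKETCH_EXPAND(x) x
#define SKETCH_GET_MACRO(_1, _2, NAME, ...) NAME
#define SKETCH_BOUNDS_1(t)    /* expands to __launch_bounds__((t)) in CUDA code */
#define SKETCH_BOUNDS_2(t, b) /* expands to __launch_bounds__((t), (b)) in CUDA code */

// Without SKETCH_EXPAND, MSVC passes all of __VA_ARGS__ as a single argument
// and the 1-vs-2 argument dispatch below picks the wrong overload.
#define SKETCH_LAUNCH_BOUNDS(...) \
  SKETCH_EXPAND(SKETCH_GET_MACRO(__VA_ARGS__, SKETCH_BOUNDS_2, SKETCH_BOUNDS_1)(__VA_ARGS__))

// Usage in a kernel declaration:
//   SKETCH_LAUNCH_BOUNDS(256)      -> one-argument form
//   SKETCH_LAUNCH_BOUNDS(256, 4)   -> two-argument form
```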
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17201

Differential Revision: D14165732

Pulled By: ezyang

fbshipit-source-id: 875fd9a5b2db6f83fc483f6d750d2c011260eb8b
2019-03-01 15:17:41 -08:00
Junjie Bai
212024282b Mark cudaGetLastError return value unused in C10_CUDA_CHECK
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17605

Reviewed By: xw285cornell

Differential Revision: D14277586

Pulled By: bddppq

fbshipit-source-id: 38879208f2ab83cf39d8a8a61b288cd09fcafd9a
2019-03-01 00:05:46 -08:00
Sebastian Messmer
6706e9af19 Make C10_MOBILE consistent with how feature macros are usually used (#17481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17481

Usually, feature macros are either defined or undefined and checked accordingly.
C10_MOBILE was a weird special case that was always defined but either defined to 1 or to 0.

This caused a lot of confusion for me when trying to disable something in the mobile build: it also
got disabled in the server build (because I was using ifdef). Also, I found a place in the existing code base that made
that wrong assumption and used the macro incorrectly, see https://fburl.com/y4icohts
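
A hedged illustration of the convention change; the macro name and the platform guard are illustrative:
```
#include <iostream>

// New convention: define the macro only for mobile builds.
// (iOS detection via TargetConditionals.h omitted here for brevity.)
#if defined(__ANDROID__)
#define SKETCH_MOBILE
#endif

int main() {
#ifdef SKETCH_MOBILE
  // With the old always-defined-to-0-or-1 convention, this #ifdef branch was
  // taken on servers too, silently disabling the feature everywhere.
  std::cout << "mobile build: feature disabled\n";
#else
  std::cout << "server build: feature enabled\n";
#endif
  return 0;
}
```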

Reviewed By: dzhulgakov

Differential Revision: D14214825

fbshipit-source-id: f3a155b6d43d334e8839e2b2e3c40ed2c773eab6
2019-02-27 17:57:51 -08:00
Syed Tousif Ahmed
86af14b0c7 Resolves ptxas warnings when compiling for CUDA_ARCH 750 and a memoryType deprecation warning (#15461)
Summary:
When compiling for `TORCH_CUDA_ARCH_LIST=7.5` we were getting ptxas warnings (https://github.com/pytorch/pytorch/issues/14310). This was because we had some hardcoded values when using launch_bounds in kernels. The maximum number of threads per multiprocessor is 1024 for Turing architecture (7.5) but 2048 for previous architectures. The hardcoded launch_bounds in the kernel were requesting for 2048 threads when compiling for Turing and hence were generating the warning.

This PR adds a macro that checks the bounds on the launch_bounds value supplied. The max number of threads per block across all architectures is 1024. If a user supplies more than 1024, I just clamp it down to 512. Depending on this value, I set the minimum number of blocks per SM. This PR should resolve https://github.com/pytorch/pytorch/issues/14310. The incorrect gradient computation reported in that issue is probably due to a faulty card.
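
A hedged sketch of the clamping logic described above; the names and exact fallback policy are illustrative, not the macro added by the PR:
```
constexpr int kMaxThreadsPerBlock = 1024;  // hard limit across architectures
constexpr int kFallbackThreads = 512;      // used when the request is out of range

constexpr int clamp_launch_threads(int requested) {
  return (requested > kMaxThreadsPerBlock || requested <= 0)
      ? kFallbackThreads
      : requested;
}

// Minimum resident blocks per SM derived from the clamped thread count, so the
// occupancy hint never exceeds what the architecture can actually schedule.
constexpr int min_blocks_per_sm(int threads_per_block, int desired_blocks) {
  return (kMaxThreadsPerBlock / threads_per_block) < desired_blocks
      ? (kMaxThreadsPerBlock / threads_per_block)
      : desired_blocks;
}

static_assert(clamp_launch_threads(2048) == 512, "over-large requests are clamped");
static_assert(min_blocks_per_sm(1024, 4) == 1, "occupancy hint bounded by block size");
```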
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15461

Differential Revision: D13633952

Pulled By: soumith

fbshipit-source-id: 795aa151109f343ab5433bf3cb070cb6ec896fff
2019-01-10 21:44:39 -08:00
Edward Yang
e58bbbac18 Delete dependencies from CUDAStream; remove synchronize_with (#13920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13920

I want to move CUDAStream and CUDAGuard to c10_cuda without also
bringing along CUDAContext or CUDAEvent for the ride (at least for
now).  To do this, I need to eliminate those dependencies.

There are a few functions in CUDAContext.h which don't really need
THCState, so they're separated out and put in the general-purpose
c10/cuda/CUDAFunctions.h.

Reviewed By: smessmer

Differential Revision: D13047468

fbshipit-source-id: 7ed9d5e660f95805ab39d7af25892327edae050e
2018-11-19 17:05:41 -08:00
Edward Yang
0478d32cb8 Move AlignOf, SmallVector and ArrayRef to c10.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13916

Reviewed By: smessmer

Differential Revision: D13046722

fbshipit-source-id: 1583d3170d60e22f0a535cd1fd56bdf928186f5d
2018-11-14 11:13:16 -08:00
Edward Yang
fbabe5bf62 Rename c10::detail to c10::impl (#13838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13838

According to Sebastian, the detail convention is specifically for header-private
functionality.  That's not what c10/detail is; it's general, library-private headers
which may be used in multiple places within PyTorch.  Rename it to impl to avoid
confusion in nomenclature.

Reviewed By: smessmer

Differential Revision: D13024368

fbshipit-source-id: 050f2632d83a69e3ae53ded88e8f938c5d61f0ef
2018-11-14 07:39:37 -08:00
Edward Yang
e35418b3be New implementations of DeviceGuard, StreamGuard and MultiStreamGuard (with CUDA specializations) (#13342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13342

This PR introduces a few new concepts:

- DeviceGuardImplInterface, and implementations for CPU and CUDA, which
  provide a generic interface for interfacing with device and stream state,
  without requiring a direct dependency on the code in question.
- InlineDeviceGuard, a general template for generating both specialized
  and dynamically dispatched device guard implementations.  Dynamic
  dispatch is done by specializing it on a VirtualGuardImpl.
- Provide a device-independent DeviceGuard class, which can be used even
  from CPU code. It uses the aforementioned dynamic dispatch.
- CUDA-specialized CUDAGuard class, which doesn't have a dynamic dispatch
  but can only be used from CUDA.
- StreamGuard, which is the same as above, but for streams rather than
  devices.
- Optional variants of all the aforementioned guards, which are a no-op if
  no device/stream is specified
- CUDAMultiStreamGuard, specifically for the case when we want to set
  a device on every guard.

There are some subtle semantic changes, which have been thoroughly documented
in the class definition.

BC-breaking changes:

- Move constructor/assignment have been removed from all device guard
  implementations.
- In some cases where you previously wrote 'set_device' (or 'set_stream'), you now must write
  'reset_device', because if you switch devices/device types, the stream/device on the
  previous device is unset.  This is different from previous behavior.
- CUDAGuard no longer handles streams, or multiple streams.  Use CUDAStreamGuard
  or CUDAMultiStreamGuard as appropriate for your use case.
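
A hedged usage sketch of the guards listed above, as downstream code might use them; the include paths and namespaces are assumptions about the tree at the time of this commit:
```
#include <c10/core/DeviceGuard.h>
#include <c10/cuda/CUDAGuard.h>
#include <c10/cuda/CUDAStream.h>

void run_on(c10::Device device) {
  // Device-independent: works even for CPU devices, via the dynamically
  // dispatched DeviceGuardImplInterface.
  c10::DeviceGuard guard(device);
  // ... launch work; the previous device is restored when `guard` dies ...
}

void run_on_cuda(c10::cuda::CUDAStream stream) {
  // CUDA-specialized, no dynamic dispatch; also switches the current stream.
  c10::cuda::CUDAStreamGuard guard(stream);
  // ... enqueue kernels on `stream` ...
}
```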

Reviewed By: dzhulgakov

Differential Revision: D12849620

fbshipit-source-id: f61956256f0b12be754b3234fcc73c2abc1be04e
2018-11-11 12:11:10 -08:00
Jerry Zhang
e06f92785c Move ATen/core/Macros.h to c10/macros/Macros.h
Summary:
EXT=h,cc,cpp,hpp,cxx,cu,cuh
d=caffe2/aten/
codemod -m -d $d --extensions $EXT 'AT_HOST_DEVICE' 'C10_HOST_DEVICE'
codemod -m -d $d --extensions $EXT 'AT_DEVICE' 'C10_DEVICE'
codemod -m -d $d --extensions $EXT 'AT_HOST' 'C10_HOST'
codemod -m -d $d --extensions $EXT 'AT_ANDROID' 'C10_ANDROID'
codemod -m -d $d --extensions $EXT 'AT_IOS' 'C10_IOS'
codemod -m -d $d --extensions $EXT 'AT_MOBILE' 'C10_MOBILE'
codemod -m -d $d --extensions $EXT 'ATen/core/Macros.h' 'c10/macros/Macros.h'
codemod -m -d $d --extensions $EXT 'HIP_HOST_DEVICE' 'C10_HIP_HOST_DEVICE'

Reviewed By: dzhulgakov

Differential Revision: D12851341

fbshipit-source-id: 7d540530ef779e16ddf2b4cdda9dcc85a61410c3
2018-11-05 12:32:11 -08:00
Sebastian Messmer
979560c9fc Include c10 namespace into caffe2 and at namespaces. (#12950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12950

For backwards compatibility, we want the c10 symbols to be reachable from caffe2 and aten.
When we move classes from at/caffe2 to c10, this
 1. allows keeping backwards compatibility with third-party code we can't control
 2. allows splitting diffs that move such classes into two diffs, where one only fixes the includes and the second one fixes the namespaces (a sketch of the mechanism follows below).
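
A minimal sketch of the re-export mechanism, using an illustrative class name:
```
namespace c10 {
class SmallExample {};
}

namespace caffe2 {
using namespace c10;  // caffe2::SmallExample now resolves to c10::SmallExample
}
namespace at {
using namespace c10;  // at::SmallExample likewise
}

caffe2::SmallExample a;  // still compiles after the class moves to c10
at::SmallExample b;
```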

Reviewed By: ezyang

Differential Revision: D10496244

fbshipit-source-id: 914818688fad8c079889dfdc6242bc228b539f0e
2018-10-25 14:08:47 -07:00
Edward Yang
8c514627a4 Add C10_LIKELY/C10_UNLIKELY macros (#12932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12932

I was looking at some assembly for some code I was working on,
and felt a desire to have likely()/unlikely() macros.  I checked
if we already had them, and we didn't.  This commit adds them,
and fixes up all known use sites to make use of them.
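
A hedged sketch of likely()/unlikely()-style macros in the usual GCC/Clang form; the real C10 definitions may differ in detail:
```
#include <stdexcept>

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_LIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 1))
#define SKETCH_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
#else
#define SKETCH_LIKELY(expr) (expr)
#define SKETCH_UNLIKELY(expr) (expr)
#endif

int checked_divide(int a, int b) {
  if (SKETCH_UNLIKELY(b == 0)) {  // hint keeps the error path off the hot path
    throw std::invalid_argument("division by zero");
  }
  return a / b;
}
```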

Reviewed By: Maratyszcza

Differential Revision: D10488399

fbshipit-source-id: 7476da208907480d49f02b37c7345c17d85c3db7
2018-10-22 16:26:19 -07:00
David Reiss
96d826f635 Define REGISTER_CPU_GRADIENT_OPERATOR (#12588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12588

By default, this is an alias for REGISTER_CPU_OPERATOR.  If gradients are not
required (e.g., on mobile) it can be converted to a no-op by defining
CAFFE2_NO_GRADIENT_OPS, resulting in a smaller build.

GRADIENT_OPERATOR_SCHEMA works similarly.

CAFFE2_NO_GRADIENT_OPS also converts REGISTER_GRADIENT to a no-op.

Use these macros in fully_connected_op.cc as an example.
Follow-up diffs will convert more operators.

I had to introduce MACRO_EXPAND to handle the way Visual Studio expands
`__VA_ARGS__`.
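
A hedged, simplified sketch of the aliasing scheme, with stub macros so it stands alone; the real Caffe2 macros register operator classes, not print statements:
```
#include <iostream>

#define SKETCH_REGISTER_CPU_OPERATOR(name) \
  static const bool registered_##name = (std::cout << #name << " registered\n", true);

#ifdef CAFFE2_NO_GRADIENT_OPS
// Gradient registrations compile away entirely, shrinking mobile builds.
#define SKETCH_REGISTER_CPU_GRADIENT_OPERATOR(name)
#else
#define SKETCH_REGISTER_CPU_GRADIENT_OPERATOR(name) \
  SKETCH_REGISTER_CPU_OPERATOR(name)
#endif

SKETCH_REGISTER_CPU_OPERATOR(FC)
SKETCH_REGISTER_CPU_GRADIENT_OPERATOR(FCGradient)

int main() { return 0; }
```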

Reviewed By: Yangqing

Differential Revision: D10209468

fbshipit-source-id: 4116d9098b97646bb30a00f2a7d46aa5d7ebcae0
2018-10-22 10:01:02 -07:00
Yangqing Jia
7dbb38e856 Moving logging from caffe2 to c10. (#12881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12881

TSIA. This should not change any functionality.

Remaining work:
- change the build script to deprecate use of CAFFE2_USE_MINIMAL_GOOGLE_GLOG and use a C10 macro instead.
- Unify the exception name (EnforceNotMet -> Error)
- Unify the logging and warning APIs (like AT_WARNING)

Reviewed By: dzhulgakov

Differential Revision: D10441597

fbshipit-source-id: 4784dc0cd5af83dacb10c4952a2d1d7236b3f14d
2018-10-19 20:22:08 -07:00
Yangqing Jia
7d5f7ed270 Using c10 namespace across caffe2. (#12714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714

This is a short change to enable the c10 namespace in caffe2. We did not enable
it before due to gflags global-variable confusion, but that should be mostly
cleaned up now. Right now, the plan on record is that namespace caffe2 and
namespace aten will be full supersets of namespace c10.

Most of the diff is codemod; the only two non-codemod changes are in caffe2/core/common.h, where

```
using namespace c10;
```

is added, and in Flags.h, where instead of creating aliasing variables in the c10 namespace, we put them directly in the global namespace to match gflags (and the behavior is the same if gflags is not built in).

Reviewed By: dzhulgakov

Differential Revision: D10390486

fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b
2018-10-17 12:57:19 -07:00
Edward Yang
07d67aa17a Make TensorOptions immutable. (#12630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12630

Instead of providing mutable accessors, our "mutators" now
return new copies of TensorOptions.  Since TensorOptions is
simply two 64-bit integers, this is not a big efficiency
problem.

There may be some sites that assumed that TensorOptions was
mutable.  They need to be fixed.
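
A hedged usage sketch of the immutable, builder-style API this implies; the calls below reflect the public TensorOptions interface as I understand it:
```
#include <ATen/ATen.h>

at::Tensor make_buffer() {
  at::TensorOptions base;                       // default options
  at::TensorOptions opts =
      base.dtype(at::kFloat).device(at::kCPU);  // `base` itself is unchanged
  return at::zeros({2, 3}, opts);
}
```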

Reviewed By: SsnL

Differential Revision: D10249293

fbshipit-source-id: b3d17acc37e78c0b90ea2c29515de5dd01209bd3
2018-10-15 08:30:16 -07:00
Sebastian Messmer
6f664d3917 Improve TypeMeta (#11502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11502

TypeMeta is now only a pointer to a TypeMetaData structure, of which there is exactly one global instance per type.
This reduces the size of everything storing a TypeMeta (Tensor, Blob, ...) and potentially improves performance.

Also, this diff gets rid of the type name registry in favor of static strings.
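
A hedged structural sketch of the layout described above; field and class names are illustrative:
```
#include <cstddef>

struct SketchTypeMetaData {
  size_t itemsize;
  const char* name;
  // ... constructor/destructor/copy function pointers in the real thing ...
};

class SketchTypeMeta {
 public:
  template <typename T>
  static SketchTypeMeta Make() {
    // Exactly one instance of the data per T, shared by every TypeMeta for T.
    static const SketchTypeMetaData data{sizeof(T), "static type name"};
    return SketchTypeMeta(&data);
  }
  size_t itemsize() const { return data_->itemsize; }
  bool operator==(const SketchTypeMeta& other) const { return data_ == other.data_; }

 private:
  explicit SketchTypeMeta(const SketchTypeMetaData* data) : data_(data) {}
  const SketchTypeMetaData* data_;  // the whole object is one pointer wide
};

static_assert(sizeof(SketchTypeMeta) == sizeof(void*), "TypeMeta is pointer-sized");
```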

Experiments (summary: 1-3% perf gain)
- Service Lab: https://our.intern.facebook.com/intern/servicelab/30712497/
 -> No significant results found.
- Mobile Lab c10bench.json: https://our.intern.facebook.com/intern/fblearner/details/75984908/
 -> 1-3% perf gain
- Mobile Lab c10bench default: https://our.intern.facebook.com/intern/fblearner/details/75984999/
 -> 2-3% perf gain
- adindexer canary: https://our.intern.facebook.com/intern/ads/canary/413002142824203076
 -> no significant changes (benchmark too noisy)
- adfinder canary: https://our.intern.facebook.com/intern/ads/canary/413002166737860362
 -> no significant changes (benchmark too noisy)

Reviewed By: dzhulgakov

Differential Revision: D9763422

fbshipit-source-id: fc08937f114af5ff9f3ddbe7c7e396942868cdf5
2018-10-06 14:09:28 -07:00
Yangqing Jia
9c49bb9ddf Move registry fully to c10 (#12077)
Summary:
This does 6 things:

- add c10/util/Registry.h as the unified registry util
  - cleaned up some APIs such as export condition
- fully remove aten/core/registry.h
- fully remove caffe2/core/registry.h
- remove a bogus aten/registry.h
- unifying all macros
- set up registry testing in c10

Also, an important note that we used to mark the templated Registry class as EXPORT - this should not happen, because one should almost never export a template class. This PR fixes that.
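
A hedged usage sketch of the unified registry; the macro names and the Create() call are as I understand c10/util/Registry.h and should be treated as assumptions:
```
#include <c10/util/Registry.h>
#include <iostream>
#include <memory>

struct Greeter {
  virtual ~Greeter() = default;
  virtual void greet() = 0;
};

C10_DECLARE_REGISTRY(GreeterRegistry, Greeter);
C10_DEFINE_REGISTRY(GreeterRegistry, Greeter);

struct EnglishGreeter : Greeter {
  void greet() override { std::cout << "hello\n"; }
};
C10_REGISTER_CLASS(GreeterRegistry, english, EnglishGreeter);

int main() {
  std::unique_ptr<Greeter> g = GreeterRegistry()->Create("english");
  if (g) g->greet();
  return 0;
}
```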
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12077

Reviewed By: ezyang

Differential Revision: D10050771

Pulled By: Yangqing

fbshipit-source-id: 417b249b49fed6a67956e7c6b6d22374bcee24cf
2018-09-27 03:09:54 -07:00
Yangqing Jia
a6f1ae7f20 set up c10 scaffolding. Move macros proper first.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11939

Reviewed By: orionr, dzhulgakov

Differential Revision: D10004629

Pulled By: Yangqing

fbshipit-source-id: ba50a96820d35c7922d81c78c4cbe849c85c251c
2018-09-24 11:09:59 -07:00