Commit Graph

22 Commits

Author SHA1 Message Date
Richard Barnes
3f5b59eef4 [codemod] c10::optional -> std::optional in caffe2/aten/src/ATen/DeviceGuard.h +117 (#126901)
Summary:
Generated with
```
fbgs -f '.*\.(cpp|cxx|cc|h|hpp|cu|cuh)$' c10::optional -l | perl -pe 's/^fbsource.fbcode.//' | grep -v executorch | xargs -n 50 perl -pi -e 's/c10::optional/std::optional/g'
```

 - If you approve of this diff, please use the "Accept & Ship" button :-)

(117 files modified.)

Test Plan: Sandcastle

Reviewed By: palmje

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126901
Approved by: https://github.com/Skylion007, https://github.com/eqy
2024-05-24 00:26:15 +00:00
Richard Barnes
ed327876f5 [codemod] c10:optional -> std::optional (#126135)
Generated by running the following from PyTorch root:
```
find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/'
```

`c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi
2024-05-14 19:35:51 +00:00
cyy
97918e8c37 [Clang-tidy header][18/N] Enable clang-tidy on headers in torch/csrc/cuda (#118504)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118504
Approved by: https://github.com/albanD
2024-02-23 16:47:33 +00:00
Nikita Shulga
dfd441a12c [BE] Use nested namespaces in torch/csrc/cuda (#106928)
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 6b1dde1</samp>

> _`namespace` syntax_
> _Simplified with C++17_
> _Code is more readable_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106928
Approved by: https://github.com/huydhn, https://github.com/izaitsevfb
2023-08-10 03:56:09 +00:00
Zachary DeVito
4b372e3958 [memory profiling] C++ tracing support (#95357)
Adds the ability to quickly generate stack traces for C++,
and combine Python, TorchScript, and C++ frames into a single trace.

This makes it possible for the memory tracer to record allocations inside
C++ code (e.g. convolution temporaries, backward operators).

The unwinder code is ~10x faster than execinfo.h's backward because it
cache fast unwinder routines for instruction pointers that have already been seen.
It is also only 1.2--2x slower than copying the entire stack (the approach perf takes),
while using 2 orders of magnitude less space per stack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95357
Approved by: https://github.com/bertmaher
2023-03-12 07:24:14 +00:00
Michael Suo
30fb2c4aba [lint] autoformat test/cpp and torch/csrc
Let's have some fun.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828

Approved by: https://github.com/ezyang
2022-06-11 21:11:16 +00:00
Thomas Viehmann
ac1fe91dc9 Clean up some THC includes (#69024)
Summary:
These seem to not be needed and cause ninja to rebuild the files at every build.

(There also is THCStorage.cu, but hopefully this will go away with https://github.com/pytorch/pytorch/issues/68556 )

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69024

Reviewed By: soulitzer

Differential Revision: D32705309

Pulled By: ngimel

fbshipit-source-id: 5255297f213fdcf36e7203de7460a71291f8c9a0
2021-11-29 20:55:27 -08:00
SsnL
de7ac60cf4 Add out= variants for cuda.comm.broadcast/gather/scatter (#39681)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/38911
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39681

Differential Revision: D22161342

Pulled By: mrshenli

fbshipit-source-id: 60295077159b02087823e93bb6ebac9d70adea0a
2020-06-24 12:58:19 -07:00
Edward Yang
1111a6b810 Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#30274)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/29095
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30274

Differential Revision: D18762293

Pulled By: ezyang

fbshipit-source-id: d3d50c2dd12bcb678ab25fa708eb6587cc4b66f9
2019-12-02 12:19:58 -08:00
Mike Ruberry
eff4c4d7c1 Revert D18301806: Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL
Test Plan: revert-hammer

Differential Revision:
D18301806

Original commit changeset: 03da6a26c41e

fbshipit-source-id: c1324ee8d154e7e16f5dd4f1cf3625aaa566cd39
2019-11-21 14:50:07 -08:00
Alan Du
f4b9690f2d Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#29095)
Summary:
Given that pybind11 implements these gil functions, I don't think it makes sense for Pytorch to have its own bespoke versions.

Fixes https://github.com/pytorch/pytorch/issues/29065
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29095

Differential Revision: D18301806

Pulled By: ezyang

fbshipit-source-id: 03da6a26c41ee65aaadf7b67b9f0b14d2def2a5a
2019-11-21 13:44:40 -08:00
Mikhail Zolotukhin
1905bbb01d Include ATen/core/functional.h directly instead of torch/csrc/utils/functional.h. (#16377)
Summary:
One more shim removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16377

Differential Revision: D13821816

Pulled By: ZolotukhinM

fbshipit-source-id: 007f014d404de51841437db7eef28367a2f6e46b
2019-01-30 14:02:34 -08:00
Edward Yang
517c7c9861 Canonicalize all includes in PyTorch. (#14849)
Summary:
Anywhere we used #include "foo.h", we now say #include <foo.h>
Paths are adjusted to be rooted out of aten/src, torch/lib, or
the root level directory.

I modified CMakeLists.txt by hand to remove TH and THC from
the include paths.

I used the following script to do the canonicalization:

```
  import subprocess
  import re
  import os.path

  files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
  for fn in files:
      if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
          continue
      if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
          continue
      with open(fn, 'r') as f:
          c = f.read()
      def fmt(p):
          return "#include <{}>".format(p)
      def repl(m):
          p = m.group(1)
          if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
              return fmt(p)
          if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
              return fmt(p)
          for root in ["aten/src", "torch/lib", ""]:
              for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
                  new_p = os.path.relpath(os.path.join(bad_root, p), root)
                  if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
                      return fmt(new_p)
          print("ERROR: ", fn, p)
          return m.group(0)
      new_c = re.sub(r'#include "([^"]+)"', repl, c)
      if new_c != c:
          print(fn)
          with open(fn, 'w') as f:
              f.write(new_c)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849

Reviewed By: dzhulgakov

Differential Revision: D13363445

Pulled By: ezyang

fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68
2018-12-08 19:38:30 -08:00
Edward Yang
1a4473bbd7 Rewrite THPUtils_PySequence_to_CUDAStreamList to return vector<optional<CUDAStream>> (#13125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13125

Previously, it returned a vector of THCStream*, which we eventually turned
into CUDAStream.  No need to spatter the conversion code everywhere: just
do it correctly to begin with.  An important side effect of doing it this
way is that we no longer pass nullptr to CUDAStream; instead, we create
the default stream.  I will rely on this in a later patch.

Reviewed By: gchanan

Differential Revision: D10853224

fbshipit-source-id: f6bd6594eba4626eb41a4a5e67fc64c9bbb46a1a
2018-10-29 08:27:23 -07:00
Yangqing Jia
713e706618 Move exception to C10 (#12354)
Summary:
There are still a few work to be done:

- Move logging and unify AT_WARN with LOG(ERROR).
- A few header files are still being plumbed through, need cleaning.
- caffe2::EnforceNotMet aliasing is not done yet.
- need to unify the macros. See c10/util/Exception.h

This is mainly a codemod and not causing functional changes. If you find your job failing and trace back to this diff, usually it can be fixed by the following approaches:

(1) add //caffe2/c10:c10 to your dependency (or transitive dependency).
(2) change objects such as at::Error, at::Optional to the c10 namespace.
(3) change functions to the c10 namespace. Especially, caffe2::MakeString is not overridden by the unified c10::str function. Nothing else changes.

Please kindly consider not reverting this diff - it involves multiple rounds of rebasing and the fix is usually simple. Contact jiayq@ or AI Platform Dev for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12354

Reviewed By: orionr

Differential Revision: D10238910

Pulled By: Yangqing

fbshipit-source-id: 7794d5bf2797ab0ca6ebaccaa2f7ebbd50ff8f32
2018-10-15 13:33:18 -07:00
mruberry
9d4360c060 Creates stream pool (#9938)
Summary:
This PR creates a stream pool per issue #9646. When a new stream is requested, that device it's requested on lazily creates two pools, one low priority and one high priority, of 32 streams each. Streams are returned from these pools round-robin. That is, stream 0 is returned, then stream 1... then stream 31, then stream 0... This PR also takes the opportunity to clean up the stream API, reducing its complexity and verbosity.

Change notes:

- There are now 3 sets of streams per device, the default stream, the low priority streams, and the high priority streams. These streams live in lazily initialized pools and are destroyed on shutdown.
- All stream refcounting has been removed (the pools pattern replaces it).
- Setting a stream now sets it on its device. Streams are associated with a device and the previous
requirement to specify that device was unnecessary.
- There is no exposure for setting the flags on a stream. This may also seem like a regression but the flag was always set to cudaStreamNonBlocking.
- Streams are now low or high priority whereas previously the priority could be set with an integer. In practice, however, the range for priorities is -1 to 0 on the latest hardware. -1 is high priority, 0 is low priority (aka default priority). Low vs. high actually clarifies this behavior if people were trying finer separations. (E.g., if someone tried streams with priorities 0, 1, and 2, they would actually all have priority 0, historically, and the intended behavior would not be respected.)
- Unused THCStream and THCState stream-related functions were removed.
- A new test of pooling behavior was added in stream_test.

fyi: colesbury, apaszke, goldsborough
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9938

Reviewed By: SsnL

Differential Revision: D9569036

Pulled By: ezyang

fbshipit-source-id: 12ed673fe373170d0cf4d65cb570de016c53ee7d
2018-08-30 12:40:23 -07:00
Mike Ruberry
1003ccfa15 Creates CUDAContext (#9435)
Summary:
ezyang noticed that the CUDAStream files lived under ATen/ despite being CUDA-specific, and suggested porting them to ATen/cuda and exposing them with a new CUDAContext. This PR does that. It also:

- Moves ATen's CUDA-specific exceptions for ATen/cudnn to ATen/cuda for consistency
- Moves getDeviceProperties() and getCurrentCUDASparseHandle() to CUDAContext from CUDAHooks

The separation between CUDAContext and CUDAHooks is straightforward. Files that are in CUDA-only builds should rely on CUDAContext, while CUDAHooks is for runtime dispatch in files that can be included in CPU-only builds. A comment in CUDAContext.h explains this pattern. Acquiring device properties and CUDA-specific handles is something only done in builds with CUDA, for example, so I moved them from CUDAHooks to CUDAContext.

This PR will conflict with #9277 and I will merge with master after #9277 goes in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9435

Reviewed By: soumith

Differential Revision: D8917236

Pulled By: ezyang

fbshipit-source-id: 219718864234fdd21a2baff1dd3932ff289b5751
2018-07-20 12:56:15 -07:00
Peter Goldsborough
b770156a7a Functional DataParallel (#9234)
Summary:
This PR adds the functional version of `DataParallel` (i.e. `data_parallel`) to the C++ frontend.

For this, I had to:
1. Add "differentiable" versions of scatter and gather, which perform their inverse operation in the backward pass, to C++. I've added them under `torch/csrc/autograd/functions/comm.{h,cpp}`. I had to move some utilities from `VariableType.cpp` into `torch/csrc/autograd/functions/utils.h`, and changed them a bit to fix the `const_cast`s for which there were `TODO`s,
2. Implement the `replicate`, `parallel_apply` and the combining `data_parallel` functions in C++.

`replicate` is implemented based on our existing `clone()` interface, along with the ability to set the current device via `at::OptionsGuard` (so nice).

`parallel_apply` is implemented using `at::parallel_for` (CC cpuhrsch) and [follows the code from PyTorch](https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/parallel_apply.py).

Added lots of tests for these things.

apaszke ezyang ebetica colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9234

Differential Revision: D8865182

Pulled By: goldsborough

fbshipit-source-id: 4f1fecf2b3f3bc1540c071dfb2d23dd45de433e4
2018-07-19 16:12:04 -07:00
Peter Goldsborough
f1ce15b50c Move nccl scatter and gather to C++ (#9117)
Summary:
As I try to replicate DP in C++, I need to move some functions into C++ from Python. This PR ports the scatter and gather primitives from Python in torch/cuda/comm.py to C++ in torch/csrc/cuda/comm.cpp. The basic infrastructure was already there, since apaszke had rewritten broadcast in C++ already.

I'm not very familiar with this code, so let me know if I'm doing something wrong. I largely just literally translated the code.

I don't know how "public" `torch.cuda.comm` is, but I feel like the `destination_index` parameter for `gather` should be changed from -1 indicating CPU to `None` indicating CPU, and `-1` indicating the default CUDA device. That would make the code clearer IMO.

apaszke colesbury teng-li pietern
Closes https://github.com/pytorch/pytorch/pull/9117

Differential Revision: D8721729

Pulled By: goldsborough

fbshipit-source-id: 1844a488079d21fa209b32e2c73e48632cbe9e68
2018-07-06 11:10:33 -07:00
Peter Goldsborough
04a3616de0 Replace std::size_t with size_t (#8093) 2018-06-04 11:10:44 -04:00
Sam Gross
f1c616418d
Fix Python docs for broadcast and braodcast_coalesced (#4727) 2018-01-19 10:57:20 -05:00
Adam Paszke
1061d7970d Move broadcast and broadcast_coalesced to C++ 2018-01-18 11:16:45 +01:00