642 Commits

Author SHA1 Message Date
Shen Li
24f4d3987e Move all Stream and Event Python implementation to C++ (#15937)
Summary:
1. Added `torch/csrc/cuda/Event.h` and `torch/csrc/cuda/Event.cpp` to bind Python Event class to C++ implementation.
2. Moved all CUDA runtime invocations from `torch/cuda/streams.py` to C++.
3. Added tests to cover Stream and Event APIs. ~(event IPC handle tests are introduced in #15974)~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15937

Differential Revision: D13649001

Pulled By: mrshenli

fbshipit-source-id: 84ca58f35f6ba679a4ba33150ceba678d760d240
2019-01-17 07:29:22 -08:00
Peter Goldsborough
0bf1383f0a Python <-> C++ Frontend inter-op (#13481)
Summary:
This PR enables C++ frontend modules to be bound into Python and added as submodules of Python modules. For this, I added lots of pybind11 bindings for the `torch::nn::Module` class, and modified the `torch.nn.Module` class in Python to have a new Metaclass that makes `isinstance(m, torch.nn.Module)` return true when `m` is a C++ frontend module. The methods and fields of C++ modules are bound in such a way that they work seamlessly as submodules of Python modules for most operations (one exception I know of: calling `.to()` ends up calling `.apply()` on each submodule with a Python lambda, which cannot be used in C++ -- this may require small changes on Python side).
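
The `isinstance` behaviour described above can be illustrated with a small pure-Python sketch. This is not the code from the PR: `CppModule`, `ModuleMeta`, `Module` and `Linear` are hypothetical stand-ins, and the only technique shown is a metaclass that overrides `__instancecheck__`.

```python
class CppModule:
    """Hypothetical stand-in for a pybind11-bound C++ frontend module."""

    def forward(self, x):
        return x


class ModuleMeta(type):
    """Metaclass that widens isinstance() checks against the base class."""

    def __instancecheck__(cls, instance):
        # Only the base class itself accepts the C++ stand-in; subclasses
        # such as Linear keep the ordinary isinstance() semantics.
        if cls is Module and isinstance(instance, CppModule):
            return True
        return super().__instancecheck__(instance)


class Module(metaclass=ModuleMeta):
    """Hypothetical stand-in for the Python-side module base class."""


class Linear(Module):
    """Ordinary Python subclass, used to show the check stays narrow."""


print(isinstance(CppModule(), Module))  # True
print(isinstance(CppModule(), Linear))  # False
```

Restricting the override to `cls is Module` is a design choice of this sketch, so that a C++ module is never mistaken for a specific Python subclass.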

I've added quite a bunch of tests to verify the bindings and equality with Python. I think I should also try out adding a C++ module as part of some large PyTorch module, like a WLM or something, and see if everything works smoothly.

The next step for inter-op across our system is ScriptModule <-> C++ Frontend Module inter-op. I think this will then also allow using C++ frontend modules from TorchScript.

apaszke zdevito

CC dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13481

Differential Revision: D12981996

Pulled By: goldsborough

fbshipit-source-id: 147370d3596ebb0e94c82cec92993a148fee50a7
2018-12-13 08:04:02 -08:00
Edward Yang
517c7c9861 Canonicalize all includes in PyTorch. (#14849)
Summary:
Anywhere we used #include "foo.h", we now say #include <foo.h>
Paths are adjusted to be rooted out of aten/src, torch/lib, or
the root level directory.

I modified CMakeLists.txt by hand to remove TH and THC from
the include paths.

I used the following script to do the canonicalization:

```
  import subprocess
  import re
  import os.path

  files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
  for fn in files:
      if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cuh', '.cc']):
          continue
      if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
          continue
      with open(fn, 'r') as f:
          c = f.read()
      def fmt(p):
          return "#include <{}>".format(p)
      def repl(m):
          p = m.group(1)
          if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
              return fmt(p)
          if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
              return fmt(p)
          for root in ["aten/src", "torch/lib", ""]:
              for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
                  new_p = os.path.relpath(os.path.join(bad_root, p), root)
                  if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
                      return fmt(new_p)
          print("ERROR: ", fn, p)
          return m.group(0)
      new_c = re.sub(r'#include "([^"]+)"', repl, c)
      if new_c != c:
          print(fn)
          with open(fn, 'w') as f:
              f.write(new_c)
```
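
The core substitution in the script above can be tried on an in-memory string. The following is a minimal sketch, not part of the original commit: `to_angle_includes` and its `resolve` hook are hypothetical names, and the path-rooting logic of the real script is reduced to a caller-supplied function.

```python
import re

# Same pattern as in the script above: a quoted include with its path captured.
INCLUDE_RE = re.compile(r'#include "([^"]+)"')


def to_angle_includes(source, resolve=lambda path: path):
    """Rewrite quoted includes to angle-bracket form.

    `resolve` is a hypothetical hook that maps an include path to its
    rooted form; by default the path is kept unchanged.
    """
    return INCLUDE_RE.sub(
        lambda m: "#include <{}>".format(resolve(m.group(1))), source
    )


before = '#include "foo.h"\n#include <vector>\n'
print(to_angle_includes(before))
print(to_angle_includes(before, resolve=lambda p: "torch/csrc/" + p))
```

Includes that already use angle brackets do not match the pattern and are left untouched.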

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849

Reviewed By: dzhulgakov

Differential Revision: D13363445

Pulled By: ezyang

fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68
2018-12-08 19:38:30 -08:00
Peter Goldsborough
d6c53328f9 Large scale fix of python-related files in torch/csrc/
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14515

Differential Revision: D13247966

Pulled By: goldsborough

fbshipit-source-id: 7a127c508fc576a7a92626dd6b729f660162d628
2018-12-07 13:04:46 -08:00
Pieter Noordhuis
220ce8046e Binding for prctl(PR_SET_PDEATHSIG) (#14491)
Summary:
If torch.multiprocessing.spawn is used to launch non-daemonic
processes (the default since #14391), the spawned children won't be
automatically terminated when the parent terminates.

On Linux, we can address this by setting PR_SET_PDEATHSIG, which
delivers a configurable signal to child processes when their parent
terminates.
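
The mechanism can be sketched from Python with `ctypes`. This is an illustration only and not the binding added by this commit: `set_parent_death_signal` is a hypothetical helper, and the constant value 1 for `PR_SET_PDEATHSIG` is taken from `<linux/prctl.h>`.

```python
import ctypes
import os
import signal
import sys

PR_SET_PDEATHSIG = 1  # option number from <linux/prctl.h>


def set_parent_death_signal(sig=signal.SIGTERM):
    """Ask the kernel to deliver `sig` to this process when its parent exits.

    Returns True on success and False on non-Linux platforms, where
    prctl() is not available. Passing 0 clears a previous setting.
    """
    if not sys.platform.startswith("linux"):
        return False
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_PDEATHSIG, int(sig), 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return True
```

A child process would call this helper first thing in its entry function, before doing any real work.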

Fixes #14394.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14491

Differential Revision: D13270374

Pulled By: pietern

fbshipit-source-id: 092c9d3c3cea2622c3766b467957bc27a1bd500c
2018-11-29 20:09:19 -08:00
albanD
f80d34a1c8 Update Tensor doc (#14339)
Summary:
Add to the Tensor doc info about `.device`, `.is_cuda`, `.requires_grad`, `.is_leaf` and `.grad`.
Update the `register_backward_hook` doc with a warning stating that it does not work in all cases.
Add support in the `_add_docstr` function to add docstring to attributes.

There is an explicit cast here, but I am not sure how to handle it properly. The doc field for getsetdescr is documented as a const char * (like all other doc fields in descriptor objects) in the CPython online documentation, but in the code it is the only one that is not const.
I assumed here that this is a bug in the code, because it follows neither the documentation nor the convention of the other descriptors, so I cast away the const.
EDIT: the online doc I was looking at is for 3.7, and in that version both the code and the doc are const. For older versions, both are non-const.
Please let me know if this should not be done and, if it should, whether there is a cleaner way to do it!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14339

Differential Revision: D13243266

Pulled By: ezyang

fbshipit-source-id: 75b7838f7cd6c8dc72b0c61950e7a971baefaeeb
2018-11-28 15:28:17 -08:00
Anders Papitto
2983998bb3 add torch-python target (#12742)
Summary:
This is the next minimal step towards moving _C into cmake. For now,
leave _C in setup.py, but reduce it to an empty stub file. All of its
sources are now part of the new torch-python cmake target.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12742

Reviewed By: soumith

Differential Revision: D13089691

Pulled By: anderspapitto

fbshipit-source-id: 1c746fda33cfebb26e02a7f0781fefa8b0d86385
2018-11-16 11:43:48 -08:00
Benoit Steiner
bbe6ef3864 torch.finfo and torch.iinfo to mimic the numpy equivalent (#12472)
Summary:
This pull request intends to provide the functionality requested in https://github.com/pytorch/pytorch/issues/10742 by adding a new torch.finfo and torch.iinfo API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12472

Differential Revision: D10250829

Pulled By: benoitsteiner

fbshipit-source-id: eb22ca55d5b0064bef381fa7f1eb75989977df30
2018-10-15 13:43:52 -07:00
Yangqing Jia
713e706618 Move exception to C10 (#12354)
Summary:
There is still some work to be done:

- Move logging and unify AT_WARN with LOG(ERROR).
- A few header files are still being plumbed through, need cleaning.
- caffe2::EnforceNotMet aliasing is not done yet.
- need to unify the macros. See c10/util/Exception.h

This is mainly a codemod and does not cause functional changes. If you find your job failing and trace it back to this diff, it can usually be fixed by one of the following approaches:

(1) add //caffe2/c10:c10 to your dependency (or transitive dependency).
(2) change objects such as at::Error, at::Optional to the c10 namespace.
(3) change functions to the c10 namespace. Especially, caffe2::MakeString is not overridden by the unified c10::str function. Nothing else changes.

Please kindly consider not reverting this diff - it involves multiple rounds of rebasing and the fix is usually simple. Contact jiayq@ or AI Platform Dev for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12354

Reviewed By: orionr

Differential Revision: D10238910

Pulled By: Yangqing

fbshipit-source-id: 7794d5bf2797ab0ca6ebaccaa2f7ebbd50ff8f32
2018-10-15 13:33:18 -07:00
Adam Paszke
8c3a94eaf2 Improve autograd profiler performance (#11773)
Summary:
To illustrate the benefits of this commit, I'll use the time/iter I got from one of the JIT benchmarks on my machine.

| Run                                          | Time                    |
|----------------------------------------------|-------------------------|
| No profiler                                  | 45ms                    |
| With profiler                                | 56ms                    |
| Use `clock_gettime` instead of `std::chrono` | 48ms                    |
| Touch all pages on block allocation          | 48ms (less jitter)      |
| Use `const char*` instead of `std::string`   | 47ms (even less jitter) |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11773

Differential Revision: D9886858

Pulled By: apaszke

fbshipit-source-id: 58f926f09e95df0b11ec687763a72b06b66991d0
2018-09-19 09:25:43 -07:00
Teng Li
020501b7b0 Getting rid of USE_C10D for build (#11237)
Summary:
Will use USE_DISTRIBUTED for both c10d and THD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11237

Differential Revision: D9647825

Pulled By: teng-li

fbshipit-source-id: 06e0ec9b5e2f8f38780fc88718f8499463e9e969
2018-09-04 17:27:53 -07:00
Pieter Noordhuis
033499cf56 Remove mention of USE_DISTRIBUTED_MW (#11240)
Summary:
This was lingering after #10731.

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11240

Differential Revision: D9645437

Pulled By: pietern

fbshipit-source-id: d02c33354b094be3bb0872cf54a45721e20c4e7d
2018-09-04 16:10:20 -07:00
Peter Goldsborough
7ddc6f84c4 NULL -> nullptr (#11047)
Summary:
How did we get so many uses of `NULL` again?

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11047

Differential Revision: D9566799

Pulled By: goldsborough

fbshipit-source-id: 83469f352ac69aa65bdaf1a1a21f922d892e0db3
2018-08-30 16:25:42 -07:00
Tongzhou Wang
23af7deea7 Add has_lapack flag (#11024)
Summary:
Currently our `skipIfLapack` uses a try-catch block and regex-matches the error message, which is highly unreliable. This PR adds `hasLAPACK` and `hasMAGMA` to the ATen context and exposes the flags to Python.

Also fixes refcounting bug with `PyModule_AddObject`. The method steals reference, but we didn't `Py_INCREF` in some places before calling it with `Py_True` or `Py_False`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11024

Differential Revision: D9564898

Pulled By: SsnL

fbshipit-source-id: f46862ec3558d7e0058ef48991cd9c720cb317e2
2018-08-29 22:41:16 -07:00
Richard Zou
ad6d62250a Add torch.compiled_with_cxx11_abi(). (#10071)
Summary:
It returns whether PyTorch was built with _GLIBCXX_USE_CXX11_ABI=1.

Fixes #8385
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10071

Differential Revision: D9088946

Pulled By: zou3519

fbshipit-source-id: b00fd92ee340ef34f60bdd6027ceaf46dd7442c0
2018-08-01 15:34:48 -07:00
Gregory Chanan
34c7c56c73 Re-enable empty n-dimensional empty tensor and fix parallel CPU on empty tensors (#10077)
Summary:
This is a combination of https://github.com/pytorch/pytorch/pull/9947 (this was reverted) and https://github.com/pytorch/pytorch/pull/10076.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10077

Differential Revision: D9087491

Pulled By: gchanan

fbshipit-source-id: 9fe9905628000f2ff3e47df32533cd7d1f25a354
2018-07-31 16:43:45 -07:00
Gregory Chanan
6fb9acfc16 Revert empty n-dim and ATen in C2 integration builds
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10064

Differential Revision: D9082082

Pulled By: gchanan

fbshipit-source-id: ae49470f5b4c89b13beb55fd825de1ba05b6a4fa
2018-07-31 07:25:56 -07:00
Gregory Chanan
ce5f0d40b6 Enable n-dimensional empty tensors. (#9947)
Summary:
These could use some autograd tests, which are coming in a later PR, but using them in autograd is probably pretty rare.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9947

Reviewed By: ezyang

Differential Revision: D9032778

Pulled By: gchanan

fbshipit-source-id: fa5a6509d3bac31ea4fae25143e82de62daabfbd
2018-07-30 12:33:17 -07:00
Thomas Viehmann
3254bcaed8 Call deleter when destroying unconsumed DLPack PyCapsules (#9297)
Summary:
Usually DLPack consumer is expected to call DLManagedTensor's
deleter to signal that it doesn't need the contents.
This patch calls the deleter when freeing unconsumed
DLPack capsules created by PyTorch.

Test script:
```
import torch
import torch.utils.dlpack
import gc
for i in range(10000):
    a = torch.randn(1000,1000, dtype=torch.float32, device='cuda')
    b = torch.utils.dlpack.to_dlpack(a)
    gc.collect()
```
Before patch: consume all GPU ram.
After patch: constant GPU ram consumption.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9297

Differential Revision: D8781571

Pulled By: soumith

fbshipit-source-id: 2ebadec6c857646220d632ca64110af430dbd52f
2018-07-10 07:56:59 -07:00
Will Feng
ff501c30af Turn on UBSAN in the OSS build (#8813)
Summary:
Copy of https://github.com/pytorch/pytorch/pull/8802
Closes https://github.com/pytorch/pytorch/pull/8813

Differential Revision: D8707364

Pulled By: yf225

fbshipit-source-id: bc201980b50e9fb44c42a17f898b50d3558fc417
2018-07-05 15:55:49 -07:00
Sam Gross
77484d91db Add AT_WARN to issue warnings from ATen (#8967)
Summary:
Use AT_WARN from python_anomaly_mode instead of printing to stdout.
Closes https://github.com/pytorch/pytorch/pull/8967

Reviewed By: ezyang

Differential Revision: D8670654

Pulled By: colesbury

fbshipit-source-id: 3f7aee8ea06914d7d4381feec086e95f0b194752
2018-06-27 21:24:39 -07:00
Peter Goldsborough
47492ed451
[C++ API] Bag of fixes (#8843)
* Bag of fixes

* Rename tensor_range.h to tensor_list_view.h

* Post rebase fixes

* Rename torch::tensor namespace to torch::tensors due to name conflict

* Avoid recursion in Module::to
2018-06-25 21:11:49 -07:00
gchanan
b6af5d40bf
Some 0-sized dimension support, port catArray away from resizeLegacy. (#8666)
* Some 0-sized dimension support, port catArray away from resizeLegacy.

The goal of this PR is to port catArray away from resizeLegacy (so we can delete the legacy resize calls), but since catArray has some weird behavior because
we don't have arbitrary 0-sized dimension support, I made some effort to fix these both in one pass.

The major changes here are:
1) catArray uses the new resize API, no longer the old resizeLegacy API.
2) As 1) is the last usage of resizeLegacy, it is deleted.
3) If compiled with USE_TH_SIZE_ZERO_DIM, catArray will work and properly check shapes for n-dimensional empty tensors.
4) However, we retain the old behavior of "ignoring" size [0] tensors in catArray.  We previously allowed this because we didn't have n-dimensional empty tensors.
5) To get the above to work, we also add support for n-dimensional empty tensors for narrow and slice (ifdef USE_TH_SIZE_ZERO_DIM).
6) We change the stride formula for empty tensors to match NumPy; basically, we never multiply by 0 as the size, always at least 1, so the
   strides are monotonically increasing in the empty tensor case.
7) We print the size of empty tensors if size != [0]; this matches NumPy behavior (even in cases where the size could be inferred from the brackets).
8) For test purposes, we add torch._C._use_zero_size_dim() to add tests for the above.
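
The stride rule in item 6 can be written out as a short sketch. This is a pure-Python illustration of the stated rule (treat every size as at least 1 when accumulating), not the TH/ATen implementation; `contiguous_strides` is a hypothetical name.

```python
def contiguous_strides(sizes):
    """Row-major strides where a zero-sized dimension counts as size 1."""
    strides = []
    running = 1
    # Walk from the innermost dimension outwards.
    for size in reversed(sizes):
        strides.append(running)
        running *= max(size, 1)  # never multiply by 0
    return tuple(reversed(strides))


print(contiguous_strides((2, 3)))     # (3, 1)
print(contiguous_strides((2, 0, 3)))  # (3, 3, 1)
```

Because the running product never drops to zero, the strides never decrease when read from the last dimension to the first, even when a dimension is empty.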

* Fix flake8.

* Address review comments.
2018-06-20 13:26:08 -04:00
Soumith Chintala
dc186cc9fe
Remove NO_* and WITH_* across codebase, except in setup.py (#8555)
* remove legacy options from CMakeLists

* codemod WITH_ to USE_ for WITH_CUDA, WITH_CUDNN, WITH_DISTRIBUTED, WITH_DISTRIBUTED_MW, WITH_GLOO_IBVERBS, WITH_NCCL, WITH_ROCM, WITH_NUMPY

* cover SYSTEM_NCCL, MKLDNN, NNPACK, C10D, NINJA

* removed NO_* variables and hotpatch them only in setup.py

* fix lint
2018-06-15 12:29:48 -04:00
James Reed
04503962ff
[ONNX] Add an ATen fallback pathway for ONNX export (#8273)
* ATen fallback for ONNX export

* Move to enum

* Fix model test

* Add comment

* Address comments

BC interface
2018-06-12 22:59:45 -07:00
Pieter Noordhuis
695d40efc2
Create initial Python bindings for c10d (#8119)
* Build and install c10d from tools/build_pytorch_libs.sh

* Create initial Python bindings for c10d

* clang-format

* Switch link order to include more symbols

* Add bindings and tests for ProcessGroupGloo

* Add broadcast test

* Separate build flag for c10d

* Explicit PIC property

* Skip c10d tests if not available

* Remove c10d from Windows blacklist

Let it skip by itself because it won't be available anyway.

* Make lint happy

* Comments

* Move c10d module into torch.distributed

* Close tempfile such that it is deleted
2018-06-08 12:59:51 -07:00
Edward Z. Yang
15122e93bc
Test if ASAN is actually working as part of ASAN tests. (#6050)
* Test if ASAN is actually working as part of ASAN tests.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Drop explicit use of libstdc++, we should not care.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Build with DEBUG=1

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Increase main thread stack size when using ASAN.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-30 11:31:42 -04:00
Zachary DeVito
286cd04a20
JIT cleanup (#7631)
Cleans up dead code in the JIT:

* Remove interpreter_autograd_function
* Remove Handles
* Remove HandleBuilder
* Remove creates_handles, and tracing_autograd_python_function flags
* Remove unused var_args
* Fix submodules
2018-05-21 10:06:29 -07:00
Adam Paszke
0829d4502d
Trace size-dependent expressions correctly (#6554)
This makes the JIT tracer much more robust, by allowing it to record
dependencies on tensor sizes. For example, if you were to trace this
function

def fn(x):
    return x.view(x.size(1), -1)

before this patch, then it would embed the actual value of x.size(1)
in the trace as a constant, making it very hard to have e.g. batch size
independent traces. Now, this will correctly record the dependency, and
will retrieve the size of x at every run.
2018-05-04 10:55:39 +02:00
Zachary DeVito
d985cf46f1
Add workaround to fix include warnings in Python 2 builds. (#6716)
2018-04-24 12:30:19 -07:00
li-roy
d1bb75e273
Redo tensor repr to make it less verbose (#6370)
* Redo tensor repr to make it less verbose

* fix empty tensor

* fix scaled scalars

* update for device-dtype split

* address comments

* removed repeated lines

* address comments

* add cuda to device string
2018-04-18 18:25:07 -07:00
bddppq
c43c911662
Export onnx protobuf bindings to python (#6651)
* Export onnx protobuf bindings to python

* rename native onnx module to _onnx
2018-04-17 16:38:57 -07:00
gchanan
d7cb78478f Split set_default_tensor_type(dtype) into set_default_dtype(dtype). (#6599)
* Split set_default_tensor_type(dtype) into set_default_dtype(dtype).

* Fix flake8.

The difference between this function and set_default_tensor_type is that it only sets the scalar type. What determines the type and device of a tensor returned from a factory function with defaults is the default tensor type plus the current device (if the default tensor type is cuda); this function just changes the scalar type of the default tensor type.

We do eventually want to deprecate set_default_tensor_type; it is not clear how to do that in a sensible and backwards compatible way.
2018-04-16 13:49:00 -04:00
gchanan
749d51414a
Separate cuda-ness from dtype. (#6470)
* Separate cuda-ness from dtype.

There are no longer torch.cuda.int64, etc.; there are only dtypes such as torch.int64, which correspond to at::ScalarType.
At the python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device).

There is also currently unused code in here for supporting ScalarType in native_functions; this will be used for specifying aggregate types
on reduction functions.

* Fix test_autograd.

* Add defaults to randint_like.

* Track is_cuda in py tensor types.

* Fix test_sparse.

* Fix multiprocessing.

* Fix rnn.

* Fix test_nn.

* Fix flake8.
2018-04-12 14:05:44 -04:00
gchanan
87e369111a
Add string-style devices to all tensors. (#6283)
* Add string-style devices to all tensors.

Previously, tensors only had a 'get_device' method which would throw an exception on a CPU tensor.   This made it necessary to if/else code that
was meant to be device agnostic.

This PR implements the following:
1) Adds a 'device' property to all tensors that returns a string representation of the device for all tensors.
For cpu tensors this is 'cpu'.  For cuda tensors this is 'cuda:X', where X is the cuda device ordinal.

2) Adds a DeviceSpec class.  This is just a helper class for separating device_type and device_index specification and to allow partial specification.
For example, you can call DeviceSpec('cuda'), DeviceSpec('cuda:0'), DeviceSpec('cuda', 1).
Also has backwards compatibility support for specifying integers, which are treated as cuda devices.

DeviceSpecs have the following properties:
a) device_type: string representation of the device type (i.e. 'cpu' or 'cuda')
b) device_index: integer for the device index (None if not specified)
c) cuda_device_index: for backwards compatibility; behaves roughly like `get_device` did previously.  I.e. if a function previously took integers for cuda devices,
it can now take DeviceSpecs (or strings), and can maintain the old functionality by calling `old_index = DeviceSpec(old).cuda_device_index`.

3) tensor methods and torch. functions that took integer devices can now take integers, strings, or DeviceSpecs.  For example:
torch.randn((2,3), dtype=torch.cuda.float32, device='cuda:1')
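
The parsing rules listed for DeviceSpec can be sketched in pure Python. This is a hypothetical re-implementation for illustration, not the class added by the PR; it covers only `device_type`, `device_index` and the string form, and omits `cuda_device_index`.

```python
class DeviceSpec:
    """Hypothetical pure-Python stand-in for the helper described above."""

    def __init__(self, device, index=None):
        if isinstance(device, int):
            # Backwards compatibility: a bare integer names a CUDA device.
            device_type, device_index = "cuda", device
        else:
            # Accept 'cpu', 'cuda', or 'cuda:X'.
            device_type, sep, ordinal = device.partition(":")
            device_index = int(ordinal) if sep else None
        if index is not None:
            if device_index is not None:
                raise ValueError("device index given twice")
            device_index = index
        if device_type not in ("cpu", "cuda"):
            raise ValueError("unknown device type: {!r}".format(device_type))
        self.device_type = device_type
        self.device_index = device_index  # None if not specified

    def __str__(self):
        if self.device_index is None:
            return self.device_type
        return "{}:{}".format(self.device_type, self.device_index)


spec = DeviceSpec("cuda", 1)
print(spec.device_type, spec.device_index, str(spec))  # cuda 1 cuda:1
```

Keeping `device_index` as `None` when it is not given is what allows partial specification such as `DeviceSpec('cuda')`.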

TODO in future PRs:
A) Split out cuda from dtype so you don't need to overspecify cuda-ness
B) We currently only support strings/DeviceSpecs in tensor methods and torch. functions.  We should have equivalents of torch.cuda.device(...), torch.cuda.device_of, etc.
at the torch. level that work on strings/DeviceSpecs

* Add deviceInt64 to python arg parser.

* device_str.

* Remove device_str.

* remove device prefix from attributes.

* Use const char * instead of string.

* Move autogpu index out of Device.

* comment on is_default.

* Rename torch.DeviceSpec to torch.device.

* comment.

* Fix tests.

* Fix flake8.

* Fix sparse_coo_tensor parameter name.

* Improve error message.

* Remove device_ prefix from C++ device object.

* Allocate static strings.

* Return not implemented from rich compare.

* Move torch::Device to THPDevice.

* Remove cuda index.

* Py_RETURN_NOTIMPLEMENTED doesn't exist in python2.
2018-04-06 15:12:05 -04:00
Sam Gross
6b3a4637d6
Make the tensor type torch.Tensor instead of torch.autograd.Variable (#5785)
This changes type(tensor) to return `torch.Tensor` instead of
`torch.autograd.Variable`.

This requires a few implementation changes:

 - torch.Tensor is now a regular Python class instead of a
   pseudo-factory like torch.FloatTensor/torch.DoubleTensor
 - torch.autograd.Variable is just a shell with a __new__ function.
   Since no instances are constructed it doesn't have any methods.
 - Adds torch.get_default_dtype() since torch.Tensor.dtype returns
   <attribute 'dtype' of 'torch._C._TensorBase' objects>
2018-04-03 16:29:25 -04:00
Sam Gross
83926393d3 Detect re-initialization of _C shared library (#6232)
We had a bug in the Buck build of PyTorch due to symbols from _C
being present in two shared libraries that were both loaded at
runtime. This caused global variables to be initialized twice and
destructed twice on exit. The second destruction often caused
segfaults on exit.

This attempts to detect that sort of situation early on. If
Module.cpp is compiled twice, the symbol
pytorch_duplicate_guard()::initialized will be shared. The second
initialization will print an error message and abort.
2018-04-03 15:28:37 -04:00
gchanan
4c81282c33
Introduce torch.layout and split layout from dtypes. (#6145)
* Introduce torch.layout and split layout from dtypes.

Tensors (and tensor types) now have a 'layout' attribute that returns either 'torch.strided' or 'torch.sparse_coo'.

Previously, dtypes were 1-to-1 with ATen types/PyTensorTypes; the impetus behind this decision was to make things easy in the common case
(i.e. specifying a type in a factory function).  But this doesn't really follow for sparsity, which isn't a common case.

It also doesn't properly represent the concept of a dtype, which in numpy are proper scalar types (i.e. roughly the type returned from indexing the
last dimension of an n-d array).  But this should be the same whether or not the tensor is represented via strides, sparsity, etc.

This is accomplished by:
1) having the dtype of tensor return the (device-type, scalar-type) combination, i.e. torch.cuda.float32, so both
   torch.cuda.FloatTensor and torch.cuda.sparse.FloatTensor have the same dtype
2) Adding a layout parameter to python functions, where the combination of (dtype, layout) maps to an ATen type that is used for dispatch.

* Formatting, make init throw python_error.

* Fix cuda not enabled error message.

* Fix test.
2018-04-02 14:07:50 -04:00
Tongzhou Wang
22ef8e5654 [fft][1 of 3] build system and helpers to support cuFFT and MKL (#5855)
This is the first of three PRs that #5537 will be split into.

This PR adds mkl headers to included files, and provides helper functions for MKL fft and cuFFT.
In particular, on POSIX, headers are using mkl-include from conda, and on Windows, it is from a new file @yf225 and I made and uploaded to s3.

* add mkl-include to required packages

* include MKL headers; add AT_MKL_ENABLED flag; add a method to query MKL availability

* Add MKL and CUFFT helpers
2018-03-19 15:43:14 -04:00
cpuhrsch
84400d5531 ReduceOps cleanup and set_num_threads (#5723)
2018-03-19 13:40:56 -04:00
Sam Gross
7588893ce2
Some additional clean-ups (#5505)
- Remove some uses of mega-header THP.h
 - Use HANDLE_TH_ERRORS in functions that may throw
 - Move NumPy includes to common header
 - Delete unused allocator
2018-03-05 17:45:02 -05:00
Sam Gross
5dedc648bb Compile DataLoader.cpp separately (#5507)
Don't #include DataLoader.cpp in Module.cpp
2018-03-02 05:54:33 -05:00
gchanan
285a9e2452
Add dtype to torch.Tensor constructors and accept them in set_default_tensor_type (#5444)
* Add dtype to torch.Tensor, torch.FloatTensor, etc.

* Support passing dtypes to set_default_tensor_type.

* Check dtype exception.

* Correctly handle new type initialization order.

* Move handling of torch.Storage alias to C++.

* Delete function that erroneously reappeared.
2018-03-01 14:06:55 -05:00
Sam Gross
509aed6ca3
More Variable/Tensor clean-ups (#5464)
2018-02-28 16:46:47 -05:00
Sam Gross
48a3349c29
Delete dead Tensor code paths (#5417)
This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp.

This also moves the THNN.cwrap/.cpp generation to generate_code which can use ninja if installed.
2018-02-27 17:58:09 -05:00
gchanan
d5038309a1
Remove WITH_SCALARS, as it's enabled by default now. (#5437)
2018-02-27 14:51:11 -05:00
Sam Gross
30ec06c140
Merge Variable and Tensor classes (#5225)
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.

To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.

There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:

 https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
2018-02-23 18:03:31 -05:00
gchanan
5edf6b2037
Add numpy-style dtypes to Variable factories. (#5245)
* Add numpy-style dtypes to Variable factories.

1) Add numpy-style dtypes corresponding to torch tensor types.  These are:
torch.float16, torch.float32, torch.float64, torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64
as well as torch.cuda, torch.sparse, and torch.cuda.sparse equivalents.

2) Adds "legacy" names for the above dtypes that correspond more closely to existing tensor names.  These are:
torch.half, torch.float, torch.double, torch.short, torch.int, torch.long.
torch.byte and torch.char don't exist because they either don't match numpy semantics or differ on different architectures.

3) Adds a "dtype" parameter to Variable factories (e.g. zeros, ones) that allows the user to specify the type without changing the default tensor type.

4) Adds a "dtype" getter to Variables that returns the canonical dtype from 1)

This PR is missing the following useful features that should be added in the future:
A) We only add the "dtype" parameter to auto-generated factories; hand-written factories like in tensor_new.cpp don't support this yet.

B) We don't allow type conversions to use dtypes; that should be added to type(param) or a new function.

C) We don't yet have a "device" parameter for these factories; right now, they will only create Variables on the default device.

* backend_to_string can be private.

* Define python binding argument indexes in a simpler way.

* add all_declared_types, still need to hook it up to THPDType.

* Fix all_declared_types for missing types (it's Sparse + Half).

* Ensure cuda dtypes are created even if compiled with NO_CUDA=1.

* Fix case where dtype is provided but dispatch is via namespace.

This happens in ones_like, empty_like, randn_like.

There is some question if we should do:
1) at::ones_like(tensor).toType(dtype)
2) at::ones_like(tensor.toType(dtype))

I did the former because this matches with the numpy documentation, i.e.:
"Overrides the data type of the result." and it's easier to implement.

Note that the above causes an extra copy, either of the input or output.
Here's a better implementation:
1) Make zeros_like, ones_like native functions that take an optional type (named dtype?).
2) Match the type argument with the dtype, so we don't have two different parameters.
3) Call at::zeros_like(input, type) -> at::native::zeros_like(input, type) -> type.zeros(input.sizes())

* Don't return from maybe_initialize_cuda.

* Don't leak DType name.

* Address cpp review comments.

* Share code between sparse and non-sparse test_dtypes.

* Rewrite _like functions as native function with explicit type parameter.

* Use type 'Type' instead of 'dtype' for consistency.

* Address review comments.

* Handle arg_idx when there is requires_grad but no dtype in python_binding_arguments.
2018-02-20 11:04:14 -05:00
Choongwoo Han
fae6c67121 Configurable flushing denormal numbers on CPU (#5294)
* Configurable flushing denormal numbers on CPU

* Formatting

* Update docs

* Minor doc changes
2018-02-19 19:23:43 -05:00
gchanan
9bb6d33d35
Enable scalars if compiled with WITH_SCALAR environment variable. (#4806)
* Enable scalars if compiled with WITH_SCALAR environment variable.

We are pretty close to enabling scalars (0-dimensional arrays); this allows turning them on
for development purposes and to be able to write code that works both with and without scalars enabled.

WITH_SCALARS is currently broken with distributions, but should work for test_torch, test_autograd, test_nn.

* Fix unsqueeze.

* Fix wrap dim, wrapping with Scalar.
2018-01-23 15:44:11 -05:00
gchanan
1569797b15 Use ATen infer_size implementation rather than TH. (#4781)
* Use ATen infer_size implementation rather than TH.

The only substantive difference between the two implementations is in how empty sizes are handled;
in ATen these are treated as scalars (i.e., can be expanded to anything), whereas in TH they are treated
as a special case of empty tensors (i.e., can't be expanded to anything).  Therefore, this change is
necessary to support scalars (0-dimensional tensors).  We could also take a bool parameter for determining
how we treat empty tensors but this seems unnecessary: if one tries to expand an empty tensor (as a result
of an infer_size calculation), the expansion will fail.
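
As a rough illustration of the ATen-style behavior described above, here is a pure-Python sketch of broadcast size inference in which an empty size `()` acts as a scalar that expands to anything. It is not the ATen C++ code; the function body and error message are assumptions that follow the usual trailing-dimension broadcasting rule.

```python
def infer_size(a, b):
    """Return the broadcast size of two size tuples, aligning trailing dims."""
    ndim = max(len(a), len(b))
    result = [0] * ndim
    for i in range(1, ndim + 1):
        # Missing leading dimensions behave like size 1.
        size_a = a[-i] if i <= len(a) else 1
        size_b = b[-i] if i <= len(b) else 1
        if size_a != size_b and size_a != 1 and size_b != 1:
            raise ValueError(
                f"sizes {a} and {b} do not match at dimension {ndim - i}"
            )
        result[-i] = size_b if size_a == 1 else size_a
    return tuple(result)


print(infer_size((), (2, 3)))      # the empty size expands like a scalar
print(infer_size((3, 1), (1, 4)))
```

Because `()` has no dimensions, every position falls back to size 1, so the result is simply the other operand's size.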

* Make changes for review.

* Attempt to fix windows build.

* long -> int.
2018-01-22 15:34:31 -05:00
Sam Gross
e855317370 Make dirichlet_grad and standard_gamma match ATen declarations (#4722)
The Python function has an underscore (_) prefix so the C++
IMPLEMENT_STATELESS call should have an underscore prefix as well.
2018-01-18 16:49:18 -05:00
Adam Paszke
1061d7970d Move broadcast and broadcast_coalesced to C++ 2018-01-18 11:16:45 +01:00
Sam Gross
57549b7e44 Bind functions with out= arguments in VariableType (#4565)
This adds overrides in VariableType for the xxx_out ATen functions and
implements Python bindings. There is no support for automatic
differentiation. If any of the inputs (or outputs) requires grad, then the
function will throw an exception unless it's running in "no-grad" mode.

The bindings for calling torch.xxx functions on Variables are moved to a
different object. Previously, they were static methods on VariableBase.
This change prevents users from accidentally calling static methods as if
they were instance methods.
2018-01-17 18:27:42 -05:00
HE, Tao
f4a75deccf Fix the inconsistency of polygamma on Tensor and Variable, for issue #4466 (#4527)
* Fix the inconsistency of `polygamma` on Tensor and Variable.

Signed-off-by: HE, Tao <sighingnow@gmail.com>

* Regression test for #4466, polygamma works on variables.

Signed-off-by: HE, Tao <sighingnow@gmail.com>

* Add macro IMPLEMENT_STATELESS_SWAP to dispatch stateless methods on Variables correctly.

When calling stateless methods with more than one argument where `self` comes second,
the `self` argument needs to be swapped to the first position before dispatching.

The macro `IMPLEMENT_STATELESS_ADDXX` is still reserved for deprecated `add**`
methods.

Signed-off-by: HE, Tao <sighingnow@gmail.com>
2018-01-09 10:39:09 -05:00
Fritz Obermeyer
35abc4efa2 Add low-precision digamma() and polygamma() functions (#4399) 2018-01-02 11:53:23 +01:00
Vishwak Srinivasan
e519ef5337 Adding torch.expm1() and its inplace function (#4350) 2017-12-28 18:56:03 +09:00
SsnL
658d4c7ea8 allow optional int tensor 2017-12-24 03:08:28 +08:00
Edward Z. Yang
5f7c5502b8 Further improvements to ATen convolution (#4287)
- Rename THNN convolution to have thnn_ prefix.
- Propagate CuDNN benchmark and deterministic to at::Context
- Add 'convolution', 'convNd' and 'conv_transposeNd' native wrappers, with defaults
  The conv_transposeNd wrappers are updated to have the same argument
  order as Python.
- torch.nn.functional directly dispatches to the native wrappers
- Make it possible to turn off tracing for some native wrappers, so I don't
  have to write symbolics for all the functions above
- Spectral ops can now make use of CuDNN convolution if possible
- Better commentary on cudnn_batch_norm
- Turn on DCE for all JIT tests.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-21 13:03:43 -05:00
Edward Z. Yang
9bf5e40dfa Refactor cudnn code layout / make build more robust. (#4201)
* Refactor cudnn code layout / make build more robust.

When I previously moved cuDNN into ATen, I wasn't too familiar with the
ATen native function directory layout, and so I did a number of
suboptimal things.  This commit fixes those problems.

- If NO_CUDA was set but cuDNN is installed on your system, we'd incorrectly
  assume that CUDNN was enabled, to hilarious effect.

- We now distinguish between cudnn implementation files and cudnn
  native function files.  The native files now live in ATen/native/cudnn,
  and are *unconditionally compiled*, even when we are not building with cuDNN.
  This means that we can unconditionally declare cudnn functions in yaml
  and they are always available, even if they are broken.  The cuDNN specific
  files live in 'cudnn', they are *never* installed, and they are used
  purely for implementation purposes.  I had to add stub implementations of
  all ATen functions to achieve this.

- I had written headers for at::native functions manually, but codegen
  will generate them for me automatically.  So I deleted the headers.
  That lets me get rid of some header install logic as well.

- There's a new note about ATen preprocessor philosophy.
2017-12-18 16:47:57 -05:00
Sam Gross
d605058212 Replace Variable.volatile with torch.no_grad() (#3970)
This removes volatile from Variable. The functionality is mostly
replaced by a global (thread-local) flag, which is controlled by
torch.set_grad_enabled() and the context manager torch.no_grad().

In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled()
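
The thread-local flag plus context manager pattern described above can be sketched in pure Python as follows. This is not the PyTorch implementation; `_state` and the function bodies are assumptions, with names chosen to mirror the commit message.

```python
import threading
from contextlib import contextmanager

# Each thread sees its own copy of the flag.
_state = threading.local()


def is_grad_enabled():
    # Gradients are assumed to be enabled by default in every thread.
    return getattr(_state, "enabled", True)


def set_grad_enabled(mode):
    _state.enabled = bool(mode)


@contextmanager
def no_grad():
    """Disable gradient tracking in the current thread for the with-block."""
    previous = is_grad_enabled()
    set_grad_enabled(False)
    try:
        yield
    finally:
        # Restore the previous mode even if the block raised.
        set_grad_enabled(previous)


with no_grad():
    print(is_grad_enabled())  # False inside the block
print(is_grad_enabled())      # True again afterwards
```

Restoring the previous value rather than unconditionally re-enabling lets nested blocks compose, and keeping the flag thread-local means one thread's `no_grad()` does not affect another thread.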

Fixes #3627
2017-12-18 15:46:13 -05:00
gchanan
0876bab8b7 Support CPU Apply in ATen and implement standard_gamma using it (#4161)
* Support CPU Apply directly in ATen and implement standard_gamma using it.

Main changes in this PR:
1) Added a TH_APPLY-style templatized function for CPU apply calls (currently only 2 and 3 tensor argument
versions are supported, but more are easy to add).  In fact, this is basically identical to TH_APPLY, except
it uses ATen functions and the API is a template instead of a macro.  The template takes an operation that
is performed on the data (and an indicator to signal early termination); i.e. you don't need to know that
x_data is a pointer to the current data location of x.

2) Refactors the ATen dispatch code to easily generate dispatch code for different subsets of the scalar types.
This is in preference to the template_scalar path, which requires valid specialization of each scalar type.  Valid
specializations are  particularly annoying with CUDA because you most likely can't put the specializations
in a header so need to write some sort of for-all-scalar-type macro to get the correct specializations.
Currently, we only generate dispatch_all (all scalar types, the equivalent existed already), and
dispatch_cpu_floating_types (which is used by standard_gamma).

3) Implements standard_gamma using the above changes (this is an arbitrary choice, it was the latest
apply macro to be committed).  The forward is bound via Declarations.yaml,
the backward via the Apply template, and then they are hooked together in derivatives.yaml.  This eliminates
needing to change TH at all going forward, which means one can write idiomatic C++ instead of the TH-style macros
(e.g. TH_MATH_NAME).

* Generate Dispatch code with nicer spacing.

* Small cleanups.

* Fix typo.

* Add TODOs for changing macros, remove dead code.

* Use a lambda function.

* Get rid of early exit.

* Rename Scalar,ScalarType template parameters to CScalar.

* Reorder _standard_gamma_grad parameters.

* Add comments explaining calling convention.

* Don't generate Dispatch.h anymore.

* Get rid of backend specific checks in dispatch.

* Fix empty/scalar check.
2017-12-18 15:45:01 -05:00
Fritz Obermeyer
ee98e7a82e Implement Dirichlet and Beta distributions (#4117) 2017-12-18 19:11:37 +01:00
Zachary DeVito
d8c5f2ae21 Fix a bug where from_dlpack fails if cuda is not initialized. (#4182) 2017-12-14 21:54:36 -05:00
Edward Z. Yang
787b9c5202 Propagate CuDNN enabled to ATen library. (#4104)
This is not currently used by anything, but eventually ATen
will need to make decisions about whether to use
CuDNN functions, which means we need to propagate
this variable to ATen.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-14 11:29:25 -05:00
Fritz Obermeyer
05ebd21a36 Implement reparameterized gradient for Gamma sampler (#3978) 2017-12-11 03:32:15 -08:00
gchanan
1c96809cf8 Bind cauchy_, exponential_, normal_, uniform_ functions to THPVariable. (#3945)
* Bind cauchy_, exponential_, normal_, uniform_ functions to THPVariable.

Also changes the error messages around Generator parser; previously, you'd get an error
like: torch._C.Generator is not a torch.Generator; now the check is proper but returns
that only None is supported.

* Support passing Generators to ATen Variable-bound methods.

This involves changing THPGenerator to have an at::Generator rather than a THGenerator.
TH getRNGState, setRNGState are still called directly because they are not bound from ATen yet;
they should probably be on the Generators and return (opaque) GenerateState objects.

* Fix default values.

* Properly use THRandom_initialSeed.

* update standard gamma to use new default generator.
2017-12-07 14:34:51 -08:00
Sam Gross
d0cabbde74 Implement Variable.from_numpy (#4043)
Implements from_numpy using ATen tensors. Variable.from_numpy is a
convenient placeholder for the variant that returns Variables until we
merge Tensor and Variable.

The behavior is slightly changed:

 - from_numpy() on an empty array now returns an empty tensor instead of
   throwing an exception. The shape may not be preserved.
 - CharTensor(ndarray) used to throw an exception. It now copies the
   ndarray. Copying is implemented via ATen toType.
2017-12-06 14:08:56 -05:00
Fritz Obermeyer
165d0897e4 Implement distributions.Gamma (#3841) 2017-12-02 01:10:08 +01:00
Edward Z. Yang
1c0fbd27a1 CuDNN bindings rewrite (into ATen) (#3666)
* Comprehensive rewrite of Torch CuDNN bindings / a bit of ATen infra

The executive summary is that this moves the torch/csrc/cudnn
library into ATen, adding a number of new cudnn_ methods to ATen
for batchnorm, convolution, affine grid generator and grid sampler.

ATen infra changes:

- TensorGeometry was moved to ATen
- TensorGeometry was modified to make its interface resemble that of
  Tensor; in particular, sizes is no longer a field, it's a method.
- AT_CUDA_ENABLED macro is set via ATen/Config.h header which is
  generated at cmake configure time.
  Fixes https://github.com/zdevito/ATen/issues/168
- Change AT_CUDA_ENABLED macro to be a function macro, so that we
  error if it is not defined
- Introduce a new TensorArg class, which is a Tensor plus a little
  metadata.  This helps us give good error messages when checking
  dimensions/shapes of tensors.
  Fixes https://github.com/zdevito/ATen/issues/169
- Also introduce a TensorGeometryArg class, for when you don't
  need the actual tensor data (which is most of the time.)
- Add ATen/Check.h, which contains a number of utility functions
  for testing shapes, types and devices of input tensors.  This
  will be particularly useful for native methods, which don't get
  code generated input testing code.  These functions take a
  'CheckedFrom' argument, at the moment just a string, which
  specifies some extra information about what function was
  doing the actual checking; this greatly improves error messages.
    - Many check functions take initializer lists, which let you
      test that all tensors have some property.  This API is
      peculiar, in that we IGNORE undefined tensors in this case.
      This is handled by filterDefined.
- Add AT_CUDNN_ENABLED macro
- CuDNN linking from ATen was improved; for example, we now actually
  add the CuDNN headers to our include path.
- Add some missing override specifiers to some methods
- We now actually build tests with CUDA functionality accessible
  (previously, AT_CUDA_ENABLED was not defined, meaning that
  the headers were missing all CUDA-only functionality.)
- Native functions now support giving explicit names to return
  outputs in yaml.  This makes it possible to hook into the NN
  autogenerated derivatives codepath using native functions.

CuDNN rewrite changes:

- torch/csrc/cudnn now uses ATen (rather than passing around
  THVoidTensor) and lives in ATen.  This lets us remove tensorPointer
  shenanigans.  The functions are exposed to ATen as native functions
  described in aten/src/ATen/cudnn/cuDNN.yaml
- ATen now builds and links against CuDNN when enabled.  The cmake
  package script was taken from Caffe2.
- Some header reorganization was done to help reduce dependencies
  on headers (this reorg is no longer used but I've kept it)
- Rename CHECK to CUDNN_CHECK
- Rip out old shape/type testing code in favor of modern ATen/Check.h
  interface using TensorArg.  In many cases, increase the robustness of
  the checking code.
- Change the inputs of the public facing functions, so that they can
  be bound by ATen
  - Delete THCState*; this is retrieved from the global ATen context
  - Delete cudnnHandle_t, this is retrieved from the global Handles.h
  - Delete cudnnDataType_t, this is retrieved from the Tensor type
  - Delete Convolution class, instead its constituent arguments are
    passed individually
- Change functions to return tensors, rather than take an appropriately
  sized output tensor as an input.
- Redo how transposed convolution / backward convolution is implemented
  (knock on effect of returning tensors).  Previously it was assumed
  that you would always pass an appropriately sized output tensor, but
  we don't want to do this anymore.  For backwards, we instead give
  the desired output tensor (input, really) size, because that is
  readily available.  For *transposed* convolution, however, we take
  output_padding, and otherwise do the shape calculation.
- Redo how legacy group convolution is implemented (knock on effect from
  porting cudnn to ATen.)  Previously, group convolution was implemented
  by manually constructing sizes and strides and then outputting
  appropriate, with macros switching between individual groups and
  all-at-once based on CuDNN version.  Now, the code looks exactly like what
  you'd expect: there's a top-level wrapping function that supports
  group convolution no matter the version of CuDNN, and a low-level
  wrapper which supports only what CuDNN supports.  The top-level
  function conditions on CuDNN version, and invokes the low-level
  interface 1 or n times.
- There is now a debugging printer for tensor descriptors.
- Convolution struct is replaced with ConvolutionArgs, which is not
  part of the public API but is used internally to conveniently
  pass around all of the arguments needed for Convolution.
- Add some constexprs for well-known dimensions, reduce amount of
  magic numbers in code.
- Put 'deterministic' in to ConvParams.  Fixes #3659
- Lots more comments.
- Some pessimizations, in the name of code clarity:
  - The descriptors are initialized on every invocation of convolution
    forward/backward.  Previously, the descriptors were cached, so that
    you didn't have to initialize them again on backwards.  This is
    difficult to support in the ATen interface so I didn't support it.
  - Legacy group convolution initializes its workspace for *every* group
    it performs.  I did not feel motivated to fix this because the
    legacy codepath is already quite slow.
- Affine grid generator and grid sampler automatically call contiguous
  on their arguments as necessary.
- Batchnorm input checking is greatly beefed up, it now checks for
  the following input characteristics:
    - Definedness
    - GPU location
    - Type
    - Contiguity
    - Size

PyTorch binding code changes

- batchnorm now uses consistent var/data naming
- batchnorm and convolution make use of new ATen bindings
- Affine grid generator and grid sampler make use of ATen CuDNN
  bindings via derivatives.yaml.  This means I had to restructure
  the code a little, since the THNN bindings still go through
  a legacy Python class.
- I fixed some warnings:
  - s/friend class/friend struct/ on InterpreterStateImpl
  - Removed pessimizing move 'detached' in torch/csrc/autograd/variable.cpp
  - Removed unused pack_list on Scalar

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

GCC 4.8 buildfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Add TensorGeometry to ATen.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CUDNN_CHECK

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Update TODO comment

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Delete return in cudnn_grid_sampler

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

s/cudnnSetStreamToCurrent/setCuDNNStreamToCurrent/g

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Don't allocate a new vector when filtering defined.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Remove Check overloads, convert to pass references.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Some more microbenchmarking.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-30 23:06:58 -05:00
SsnL
1661370ac5 Signal handling in DataLoader workers; Timeout option (#3474) 2017-11-29 23:52:14 +01:00
Sam Gross
4bce69be22 Implement Variable.storage() (#3765)
This still uses THPStorage, but avoids touching THPTensor
2017-11-20 14:18:07 -05:00
peterjc123
aa911939a3 Improve Windows Compatibility (for csrc/scripts) (#2941) 2017-11-08 19:51:35 +01:00
Holger Kohr
5e382894be add numpy() and from_numpy() to HalfTensor (#2953) 2017-11-08 15:01:29 +01:00
Sam Gross
7c0b16c140 Add torch.take and Tensor.put_ (#3263)
* Add torch.take and Tensor.put_

These are similar to numpy.take and numpy.put. The take function allows
you to linearly index into a tensor without viewing it as a 1D tensor
first. The output has the same shape as the indices. The put function
copies value into a tensor also using linear indices.
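
A small pure-Python sketch of the linear-indexing semantics described above, using nested lists in place of tensors. The helper names and the choice to model `put_` on a flat storage list are simplifications made for illustration, not the actual implementation.

```python
def flatten(nested):
    """Flatten nested lists in row-major order."""
    if not isinstance(nested, list):
        return [nested]
    return [x for item in nested for x in flatten(item)]


def take(tensor, index):
    """Gather elements by linear index; the result is shaped like `index`."""
    flat = flatten(tensor)

    def gather(idx):
        if isinstance(idx, list):
            return [gather(i) for i in idx]
        return flat[idx]

    return gather(index)


def put_(storage, index, values):
    """Write `values` into the flat `storage` list at the linear indices."""
    for i, v in zip(flatten(index), flatten(values)):
        storage[i] = v
    return storage


source = [[4, 3, 5],
          [6, 7, 8]]
print(take(source, [[0, 2], [5, 1]]))          # [[4, 5], [8, 3]]
print(put_([0] * 6, [[1, 4]], [[9, 10]]))      # [0, 9, 0, 0, 10, 0]
```

The 2x3 source is addressed as if it were the flat sequence `4, 3, 5, 6, 7, 8`, and the output of `take` follows the 2x2 shape of the index rather than the shape of the source.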
2017-11-01 06:04:44 -04:00
Sam Gross
a65db4e956 Use ATen for torch.cat, torch.addmm, and friends on Variables. (#3286)
This includes some changes to the dispatch code for torch.xxx functions:

 - Since Variable.addmm is an instance-method, the self argument has to
   come first. The dispatch code swaps the first two arguments if
   necessary to support the deprecated signatures where 'alpha' or 'beta'
   comes before the 'self' tensor.
 - Delete IMPLEMENT_STATELESS_REVERSED. These functions require output
   arguments to be passed in using the keyword 'out'. They were meant to
   handle torch.gt(out, a, b), but we haven't allowed that for a while.
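
The argument swap described in the first bullet can be sketched as below. `Toy`, `scaled_add`, and `dispatch_stateless` are hypothetical names for illustration; the real dispatch code is C++ and handles the actual `addmm`-style signatures.

```python
class Toy:
    """Hypothetical stand-in for a Variable with one instance method."""

    def __init__(self, value):
        self.value = value

    def scaled_add(self, scale, other):
        return Toy(self.value + scale * other.value)


def dispatch_stateless(name, *args):
    """Call args[0].name(*args[1:]), moving `self` to the front if needed."""
    args = list(args)
    # Deprecated form: a scalar comes first and the `self` tensor second.
    if len(args) >= 2 and not isinstance(args[0], Toy) and isinstance(args[1], Toy):
        args[0], args[1] = args[1], args[0]
    if not args or not isinstance(args[0], Toy):
        raise TypeError(f"{name}: expected a Toy as the first or second argument")
    return getattr(args[0], name)(*args[1:])


a, b = Toy(1), Toy(3)
print(dispatch_stateless("scaled_add", a, 2, b).value)  # 7, self-first form
print(dispatch_stateless("scaled_add", 2, a, b).value)  # 7, deprecated form
```

Both call forms reach the same instance method because the swap normalizes the argument order before dispatch.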
2017-10-25 14:27:45 -04:00
Sam Gross
f1f64c8d07 Generate autograd functions for NN / more refactors (#3136)
Generate autograd functions for NN and implement more derivatives in derivatives.yaml

A big refactor of gen_variable_type.py
2017-10-19 15:03:26 -04:00
Adam Paszke
f9ee52efa9 Update DLPack bindings 2017-10-19 10:06:53 -04:00
Sam Gross
47beb64b5c Use ATen generator as default CPU generator (#3135)
ATen has its own default CPU RNG. Use this as the default in PyTorch so
that random functions called through ATen have the same behavior as
random functions called through TensorMethods
2017-10-16 14:22:58 -04:00
Priya Goyal
756ab3f24f Adding conversion from python tensor to dlpack tensor (#2933) 2017-10-04 08:35:42 -04:00
Soumith Chintala
b3bc5fe302 refactor THCP method defs into cuda/Module.cpp 2017-09-30 13:14:35 -07:00
IraKorshunova
2b9765ad02 Erf and erfinv (#2799) 2017-09-20 21:23:45 -04:00
Adam Paszke
8dae433de8 Move JIT passes to a separate directory 2017-09-19 10:53:32 -04:00
Edward Z. Yang
2e266837f5 Port TracingState to pybind11, new export() method.
Along the way I added converters for Variable and TracingInput.  Variable should
probably be moved to a more widely known spot.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Adam Paszke
594f98ce16 Support multi-stage AutogradClosures 2017-09-05 17:48:55 -04:00
Zach DeVito
a3fdb281d1 Python wrapper for Node IR using pybind11
Supports almost all of the IR API.
2017-09-05 17:48:55 -04:00
Adam Paszke
bdcbbeaf68 Remove GlobalTracingState 2017-09-05 17:48:55 -04:00
Adam Paszke
e186d16e6b Apply JIT optimizations from Python 2017-09-05 17:48:55 -04:00
Adam Paszke
ea05ac8f41 Move JIT-related files to jit dir. Remove IR interpreter 2017-09-05 17:48:55 -04:00
Edward Z. Yang
a797ab9343 Rewrite AST to a new, more functional representation.
Previously, our AST was a DAG, where shared Nodes indicated a computation
should be reused.  This commit rewrites the IR into a new functional
representation which represents sharing explicitly using variable
bindings.

We offer a few justifications for this new style:

1. The new representation is not all that different from the
old one; it is about as easy to construct, and the lack of an
explicit graph doesn't negatively impact our ability to interpret
the graph, since we've chosen, as a matter of design, to NOT have
the IR participate in the actual execution of a graph.

2. The new let-binding representation has an implicit ordering,
which we can use to conveniently keep track of the original order
the trace showed up as.  This automatically gives us a topsort,
and gives us an easier to read textual representation of our
IR:

  %14 = Embedding %11, %0, -1, None, 2, False, False
  %15 = Dropout %14, 0.2, True, False
  %16 = Index %12, 0
  %17 = Index %12, 1
  %18 = Index %13, 0
  %19 = Index %13, 1
  %20 = Index %15, 0
  %21 = Linear %20, %1, %3
  %22 = Linear %16, %2, %4

3. It moves us closer to a Futhark style language
(http://futhark-lang.org/publications/pldi17.pdf).

Major aspects of the diff

- Node is replaced with Expr and Arg, a pair of mutually recursive
  structures which represent our new language.  In BNF, the language
  looks like this:

    a ::= c | %i
    e ::= %i, ... = e
        | PyOp e, ...
        | Ret %i, ...

  Technically, Ret is not actually a return (no control flow is involved),
  it just tuples up a series of tensors (identified by variables).

  One important invariant is that locals are always tensors; they
  are never constants (this is asymmetric with Args.)

- Arguments support Python constants.  This is an important piece because
  many operators take extra Python literals like integers and tuples in
  order to specify extra parameters about how an operator operates.  Adding
  this was essential to getting word_language_model to work.

- As both Expr and Arg have multiple variants, there is new infrastructure
  for doing case on the variants using ExprVisitor and ArgVisitor.  The
  strategy here is adapted from WebAssembly's visitors, although we have
  generalized to permit arbitrary argument forwarding, which is necessary
  to support tail-recursive visitor calls.  TCO is important because our
  interpreter may recurse arbitrarily deep into a stack of nested lets.
  If users wish, they can also manually case on the type tag.

- Tracing is now turned on and off using _tracer_enter/_tracer_exit in
  torch._C.  _tracer_enter accepts a list of variables which are to be
  treated as arguments; _tracer_exit accepts the list of traced variables
  which should be returned when you reexecute the trace, and returns
  the trace expression which can be reexecuted.  GlobalTracingState
  is a global variable which tracks whether or not we are tracing.

- You use run_forward to execute a trace on some set of parameters.

- While tracing, variables keep track, via trace_local, of the name of
  their corresponding variable in the IR.
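
To make the let-binding representation more concrete, here is a minimal pure-Python sketch of how an ordered list of bindings could be evaluated against a store of locals. The tuple encoding, the `OPS` table, and the name `run_trace` are all hypothetical; the real IR is a C++ structure whose operators are Python autograd functions.

```python
import operator

# Hypothetical operator table standing in for the PyOp nodes of the real IR.
OPS = {"Add": operator.add, "Mul": operator.mul}


def run_trace(bindings, ret, inputs):
    """Evaluate `%out = Op arg, ...` bindings in order, then tuple up `ret`.

    bindings: list of (out_name, op_name, args); an arg is either a local
              such as "%0" or a plain Python constant.
    ret:      names of the locals to return, like the Ret form.
    inputs:   mapping from input local names to their values.
    """
    store = dict(inputs)
    for out, op, args in bindings:
        values = [
            store[a] if isinstance(a, str) and a.startswith("%") else a
            for a in args
        ]
        # Bindings are already in execution order, so no topsort is needed.
        store[out] = OPS[op](*values)
    return tuple(store[name] for name in ret)


trace = [
    ("%2", "Add", ["%0", "%1"]),
    ("%3", "Mul", ["%2", 10]),   # 10 is a constant argument, not a local
]
print(run_trace(trace, ["%3", "%2"], {"%0": 1, "%1": 2}))  # (30, 3)
```

Like the interpreter described in this commit, the sketch never removes entries from the store, so every intermediate value stays alive until the trace finishes.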

Here is a simple runner which leaks memory but can be used to JIT models:

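  import types

  import torch._C
  import torch.autograd.function as F
  from torch.autograd import Variable

  def jit(model):
      real_forward = model.forward  # bound method, so it already carries self
      def forward(self, *args):
          def flatten(x):
              return tuple(F._iter_variables(x))
          if not hasattr(self, "saved_trace"):
              # First call: run the real forward under the tracer and save the trace.
              torch._C._tracer_enter(tuple(self.parameters()) + flatten(args))
              out = real_forward(*args)
              self.saved_trace = torch._C._tracer_exit(flatten(out))
              self.saved_outs = out
              return out
          else:
              # Later calls: re-execute the saved trace instead of the Python code.
              flat_out = Variable._execution_engine.run_forward(self.saved_trace, tuple(self.parameters()) + flatten(args))
              return F._unflatten(flat_out, self.saved_outs)
      # Assumption: install the wrapper on the instance; the original snippet
      # imported `types` but stopped before this step.
      model.forward = types.MethodType(forward, model)
      return model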

Major problems:

- Sanity checking is spotty at best, especially when users pass in variables.

- The interpreter leaks tensor memory from the store.  When we add back def-use
  we should be able to deallocate tensors as soon as we know they are no longer
  necessary.

- The interpreter needs to reach feature parity with the old execution engine.
  From there, we need to see if backwards can be subsumed as well.

- I still have no confidence that memory is managed correctly everywhere.
  This requires a close look.

- Rather than return an *open* expression as a trace, we should return a
  *lambda* instead, which knows about how many formal parameters it
  requires.

- The IR is not introspectable from Python at the moment, but this is simply a
  matter of implementing all the binding code.

- The tracer is NOT reentrant (you can't trace while you're inside a trace.)
  Furthermore, no sanity checking is done if you try to incorrectly reuse
  things from one trace in another.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Edward Z. Yang
e1b7872fc2 Make it possible to access IR from Python.
Also, add a new trace_fn field to attach forward IR to Variables.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
Justin Johnson
94b5990201 Add torch.cuda.get_device_name function (#2540) 2017-08-26 15:06:37 -04:00
Alykhan Tejani
eb58740651 add ones_like and zeros_like 2017-08-25 14:11:04 -04:00
gchanan
c000d15058 Properly use Py_RETURN_True, Py_RETURN_False in back compatibility warnings. (#2345) 2017-08-08 21:54:20 -04:00
Zach DeVito
9d8cff9bc1 initialize aten and pytorch to share the same THCState 2017-07-11 10:35:03 -04:00
Adam Paszke
714351ff39 Officially enable process-group mode 2017-06-12 22:02:11 -04:00
Gregory Chanan
4f602a52b5 Use THPUtils_assert rather than THError in torch/csrc/Module. 2017-06-11 05:37:59 -04:00
Gregory Chanan
ffd808768e Remove raiseErrors from THTensor functions, have THStorage functions take an error_buffer to return a proper error message while being able to handle memory management correctly from calling function. 2017-06-11 05:37:59 -04:00
Gregory Chanan
177785eecf explicit Ptr constructors, fast transposed copy. 2017-06-11 05:37:59 -04:00
Gregory Chanan
be65f46c76 Add optional warning for backwards incompatible keepdim. Setting torch.utils.backcompat.keepdim.warning.enabled=True will cause Python warnings in the case where the default value of keepdim is used for 1-d reductions.
Also specify keepdim via kwargs in library so these warnings have less
noise.
2017-06-11 05:37:59 -04:00
Gregory Chanan
3556d1b8a3 Add optional warning for backwards incompatible broadcast.
Setting torch.utils.backcompat.broadcast.warning.enabled=True
will cause Python warnings in the case where broadcast occurs
but previously 1-d view style pointwise ops occurred.
2017-06-11 05:37:59 -04:00
Gregory Chanan
5af46cb352 Add broadcasting support for matmul. 2017-06-11 05:37:59 -04:00
Sam Gross
d81da41650 Make sure the number of MKL and OpenMP threads match
Otherwise, on many machines, the size of the OpenMP thread pool will
change between MKL and our OpenMP enabled functions. The constant thread
creation and destruction results in worse performance and leaks memory
on GCC 5.4
2017-06-07 14:53:29 -04:00
Adam Paszke
8ea7c87c29 Improve init methods 2017-06-02 23:42:11 +02:00
Adam Paszke
181d2f41bd Add initial Python wrappers for THDTensors 2017-06-02 23:42:11 +02:00
Trevor Killeen
05bc877a05 make THPPointer have explicit constructors (#1636) 2017-05-25 15:35:54 -04:00
ethanluoyc
d0504aa41d Implement lgamma function. 2017-05-08 16:21:26 -07:00
Sam Gross
4c1cdb6148 Refactor Python string utility function 2017-04-28 21:25:26 +02:00
Sam Gross
27990fee54 Use fully qualified name as tp_name for tensors and storages (#1379) 2017-04-27 16:26:44 -04:00
Martin Raison
cd3bbc9dfd more operations and optimizations (hspmm, reorder, ...) 2017-04-18 12:46:54 -07:00
albanD
71303b8af4 Autograd deadlock for recent glibc fix (#1243) 2017-04-12 22:24:31 +02:00
Adam Paszke
afeeb81e79 Add support for keyword arguments in torch.cat 2017-04-11 14:48:54 -07:00
Adam Paszke
91c4ba7980 Add torch.arange and deprecate torch.range 2017-04-03 10:38:58 -04:00
albanD
dfa2d26830 * make random_ range correct when both lower and upper are specified 2017-03-31 15:37:24 -04:00
Sergey Zagoruyko
8dc5d2a22e export current_blas_handle 2017-03-23 23:32:45 +01:00
Brandon Amos
bb353ccc17 Add batch triangular factorization and solves, add IntegerTensor to cwrap (#903) 2017-03-23 15:06:00 -04:00
Adam Paszke
faac0f5c25 Fix torch.cat bugs
Always use PySequence API and disallow catting along nonexistent
dimensions.
2017-03-22 18:58:42 -04:00
Sam Gross
379ae6d865 Refactor out dispatchStateless (#1007)
Some of the error messages were incorrect due to erroneous
'tensor == THPDefaultTensorClass' checks
2017-03-15 16:24:55 -04:00
Martin Raison
f17cfe4293 sparse tensor operations (#735) 2017-03-03 18:37:03 +01:00
Zhou Chang
f366e5fc81 Support int16 numpy conversions
issue #891
2017-03-02 09:15:57 -05:00
Sam Gross
fc6fcf23f7 Lock the cudaFree mutex. (#880)
Prevents NCCL calls from overlapping with cudaFree() which can lead to
deadlocks.
2017-03-01 11:29:25 -05:00
Adam Paszke
67f94557ff Expose torch.HalfTensor 2017-02-27 19:35:47 -05:00
Sam Gross
bd5303010d Refactor autograd package to separate Python dependencies. (#662)
The core autograd Variable, Function, and Engine no longer depend on the
Python API. This lets us implement functions in C++. In the future, we
can also multithread engine and release the GIL for most of the
non-Python backwards.
2017-02-13 16:00:16 -08:00
Sam Gross
712686ce91 Add cat, contiguous, squeeze, and unsqueeze to THPP
Use unsqueeze and view from TH/THC
2017-02-11 17:49:31 +01:00
Adam Paszke
79232c24e2 Fixes after rebase 2017-01-31 01:58:09 +01:00
Janusz Marcinkiewicz
76520512e7 DataChannel tests rewrite (#42); DataChannel isend and irecv implementation (#44) 2017-01-31 01:58:09 +01:00
Adam Paszke
60d1852c7b Major improvements to master-worker mode
* Fixed all undefined symbol errors
* Implemented storage interface and THStorage class
* RPC improvements
* Code refactor
2017-01-31 01:58:09 +01:00
Adam Paszke
55632d81d2 Add Python wrappers for process group mode 2017-01-31 01:58:09 +01:00
Sam Gross
c414bf0aaf Fix handling of unicode in torch._C._add_docstr (#487) 2017-01-18 17:22:30 -05:00
Sam Gross
9302f860ae Remove unused file TensorDocstrings.cpp (#481)
Tensor docstrings are created in _tensor_docs.py
2017-01-18 13:34:40 -05:00
Soumith Chintala
8aa8f791fc add more torch.* and Tensor docs (#476) 2017-01-18 08:39:33 -05:00
Sam Gross
14d5d52789 Add placeholder tensor documentation for methods that exist in torch. (#463) 2017-01-17 19:37:47 -05:00
Adam Paszke
f91bb96071 Remove cmin, cmax and cinv 2017-01-16 19:07:37 -05:00
Soumith Chintala
bdfef2975c adding more docs for torch.* functions 2017-01-11 08:19:49 -08:00
Zeming Lin
59d66e6963 Sparse Library (#333) 2017-01-05 00:43:41 +01:00
Soumith Chintala
6b4ed52f10 adding docs for some torch.* functions, removing all, any stateless methods 2017-01-03 18:29:50 -05:00
Sam Gross
849794cd2c Remove deprecated and unimplemented functions (#383) 2016-12-30 18:37:44 -05:00
Sam Gross
ab5776449c Add documentation for some torch.xxx functions (#382) 2016-12-30 17:01:47 -05:00
Adam Paszke
9b7eceddc8 Accept outputs in out argument 2016-12-29 12:25:59 +01:00
Sam Gross
24af02154c Use ForkingPickler for sharing tensor/storages across processes (#344)
This hooks into the (internal) ForkingPickler class in multiprocessing
to reduce tensors, storages, and CUDA events instead of our queue from
joblib. This makes it easier to use the standard multiprocessing classes
in later versions of Python.

This also exposes:

 - Tensor/Storage.share_memory_()
 - Module.share_memory()

These methods move the CPU tensors and storages to shared memory. If
you're using the "fork" method of multiprocessing, these objects can be
directly inherited instead of serialized through a queue.
2016-12-28 20:34:23 -05:00
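The reduction hook described in the commit above can be sketched with the standard library alone. This is a minimal illustration, not the code from #344: `FakeStorage`, `rebuild_storage`, and `reduce_storage` are hypothetical names, and the string handle stands in for whatever shared-memory identifier a real storage would carry. Only `ForkingPickler.register` and `ForkingPickler.dumps` are actual `multiprocessing` APIs.

```python
import pickle
from multiprocessing.reduction import ForkingPickler


class FakeStorage:
    """Hypothetical stand-in for a storage backed by shared memory."""

    def __init__(self, handle, size):
        self.handle = handle  # assumed: a shared-memory name or similar token
        self.size = size


def rebuild_storage(handle, size):
    # Runs in the receiving process: reattach to the buffer via its handle.
    return FakeStorage(handle, size)


def reduce_storage(storage):
    # Serialize only the handle and size, never the underlying bytes.
    return rebuild_storage, (storage.handle, storage.size)


# Tell multiprocessing how to pickle FakeStorage whenever it crosses a
# process boundary (queues, pipes, Process arguments).
ForkingPickler.register(FakeStorage, reduce_storage)


if __name__ == "__main__":
    payload = bytes(ForkingPickler.dumps(FakeStorage("shm-0", 1024)))
    restored = pickle.loads(payload)
    print(restored.handle, restored.size)
```

Because the reducer is registered on `ForkingPickler` rather than on a custom queue class, the standard `multiprocessing.Queue` and `Pipe` pick it up without further changes, which is the convenience the commit message points to.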
Sam Gross
126a1cc398 Add Sphinx docs 2016-12-28 00:03:39 +01:00
Sam Gross
e46d942ca6 Fix double initialization of HalfStorage (#331) 2016-12-19 15:19:41 -05:00
Adam Paszke
8e09f0590b Make sure that C extension was compiled with cuDNN before using it 2016-12-15 00:47:55 +01:00
Adam Paszke
28f0cf6cee Add docstring support to cwrap (#295) 2016-12-11 23:25:14 +01:00
Sam Gross
1af9a9637f Refactor copy and release GIL during copy (#286) 2016-12-11 21:54:58 +01:00
Sam Gross
0d7d29fa57 Enable caching allocator for CUDA pinned memory (#275)
Also add binding for CUDA "sleep" kernel
2016-12-02 01:33:56 -05:00
Adam Paszke
1f5951693a Change torch.randperm to return Long tensors 2016-12-01 23:14:41 +01:00
Adam Paszke
3928f7740a Implement functional interface for Variables (torch.*) 2016-11-08 16:13:25 -05:00
Adam Paszke
ebc70f7919 Look for libcudart in default CUDA installation paths (#195) 2016-11-02 19:36:10 -04:00
Sam Gross
f2d7e94948 Use torch.Size for Tensor sizes and tuple for strides
See issue #20

The torch.Size class is a tuple subclass which distinguishes sizes from
other tuples so that torch.Tensor(size) is interpreted as size instead
of data.
2016-10-28 19:37:09 +02:00
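The idea in the commit above, a tuple subclass that lets a constructor tell sizes apart from data, can be sketched in plain Python. This is an illustration only: `Size` and `interpret_tensor_arg` are hypothetical stand-ins and do not reproduce the actual `torch.Size` or `torch.Tensor` implementation.

```python
class Size(tuple):
    """Hypothetical stand-in for torch.Size: a tuple that marks itself as a size."""

    def __repr__(self):
        return "Size({})".format(list(self))


def interpret_tensor_arg(arg):
    """Hypothetical dispatcher: decide whether an argument is a size or data."""
    # Check the subclass first; a Size is also a tuple, so order matters.
    if isinstance(arg, Size):
        return ("size", tuple(arg))
    if isinstance(arg, (tuple, list)):
        return ("data", list(arg))
    raise TypeError("expected a Size, tuple, or list")


if __name__ == "__main__":
    print(interpret_tensor_arg(Size((2, 3))))
    print(interpret_tensor_arg((2, 3)))
```

Since `Size` still behaves like an ordinary tuple (indexing, unpacking, equality), existing code that consumes sizes as tuples keeps working while the constructor gains an unambiguous signal.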
Sam Gross
ad2d413c0b Add C++ bindings for cuDNN (#167)
The overhead of the Python ctypes bindings was high enough that it slowed
down multi-GPU training when using 4+ Maxwell GPUs.
2016-10-26 19:51:48 -04:00
Adam Paszke
9000f40e61 Add torch.from_numpy 2016-10-24 22:30:11 +02:00
Adam Paszke
f137c0c05a Improve error messages of stateless functions 2016-10-24 22:29:43 +02:00
Sam Gross
79ead42ade Add CUDA Stream and Event API (#133) 2016-10-18 12:15:57 -04:00
Sam Gross
3931beee81 Use THSetNumThreads instead of omp_set_num_threads
Set OMP num threads to one in the data loader.

Fixes #81
Fixes #82
2016-10-17 15:15:00 -04:00
Sam Gross
ee14cf9438 Add support for pinned memory: (#127)
 - torch.Storage/Tensor.pin_memory()
 - torch.Storage/Tensor.is_pinned()
2016-10-15 18:38:26 -04:00
Soumith Chintala
3d6ebde756 qr and ormqr tests and bugfix 2016-10-14 03:10:16 -04:00
Adam Paszke
0325e2f646 Major autograd refactor
Improves autograd performance by more than 2x and fixes a couple
of bugs. All core functions have been moved to C.
2016-10-13 17:17:49 -07:00
Adam Paszke
2acee24332 Add keyword argument support to most tensor functions 2016-10-13 12:32:04 -04:00
Adam Paszke
96f61bff30 Add LAPACK functions 2016-10-08 20:37:37 -07:00
Adam Paszke
dbe540e49f Use the custom TH error handler in all threads by default 2016-09-30 14:59:50 -07:00
Adam Paszke
3f7ab95890 Finish implementation of prng related functions 2016-09-29 11:33:25 -07:00
Adam Paszke
941cf4e63d Add ffi utils for user C extensions 2016-09-29 09:35:56 -07:00
Adam Paszke
1828e7c42f Add async CUDA copy 2016-09-27 15:12:48 -07:00
Adam Paszke
ddf1598ef8 Add a method for catching exceptions thrown in ctypes 2016-09-25 12:25:54 -07:00
Adam Paszke
e71204b52f Improve error messages in storage and tensor C functions 2016-09-23 17:17:35 -07:00
Adam Paszke
06ab3f962f Refactor _C extension to export some utilities 2016-09-21 08:36:54 -07:00
Adam Paszke
8fdec15a55 Codemod to remove camel case method naming 2016-09-20 08:40:28 -07:00
soumith
1f2695e875 adding cuda driver check functions for runtime checking 2016-09-13 10:34:13 -07:00
Adam Paszke
58f507f9e3 Add file descriptor sharing mode to multiprocessing 2016-09-08 11:23:33 -07:00
Adam Paszke
f9d186d33a Add initial version of multiprocessing module 2016-08-31 19:46:08 -07:00
Adam Paszke
1902bc0bfb Interface with numpy 2016-08-13 20:19:17 -07:00
Adam Paszke
12bed8dc0d Add CUDA device selection 2016-08-12 07:46:46 -07:00
Adam Paszke
e9f9fd3727 Major refactor 2016-08-10 09:24:53 -07:00
Adam Paszke
554a1d8336 Add optim 2016-07-21 16:42:06 -04:00
Adam Paszke
bc7bd7a8b3 Add unit tests and fix detected bugs 2016-07-21 13:46:59 -04:00
Adam Paszke
c574295012 Various fixes 2016-07-19 10:45:59 -04:00
Adam Paszke
3a44259b32 Add support for CUDA 2016-07-19 10:45:59 -04:00
Adam Paszke
93ed433de3 Add rand and randn 2016-07-18 23:59:27 -04:00
Adam Paszke
3cec305524 Restructure python code 2016-06-23 22:55:05 +02:00
Adam Paszke
486ea76b98 Add more Tensor methods 2016-06-19 00:24:18 +02:00
Adam Paszke
4f66ea42af Add random-related Tensor methods 2016-06-18 21:36:10 +02:00
Adam Paszke
857c32bc21 Add all mm methods 2016-06-16 23:40:35 +02:00
Adam Paszke
0eb2b9e756 Add more Tensor and Storage methods 2016-06-15 23:03:47 +02:00
Adam Paszke
fdfe9d836e Add index* Tensor methods 2016-06-13 13:58:09 +02:00
Adam Paszke
a9282edf79 Add THPPointer and more Tensor methods 2016-06-13 13:26:00 +02:00
Soumith Chintala
5ee3358a92 python 2 support 2016-06-08 19:14:57 -04:00
Adam Paszke
0b61c3f233 Add more Tensor methods 2016-05-13 22:38:51 +02:00
Adam Paszke
56c98f7897 Add more Tensor methods 2016-05-13 00:01:54 +02:00
Adam Paszke
c3f7aac4f9 Add logical functions 2016-05-12 01:22:51 +02:00
Adam Paszke
449ac4ca2a Add torch.* functions 2016-05-09 19:14:40 +02:00
Adam Paszke
842e1b6358 Add exception handling 2016-05-05 20:58:13 +02:00