Summary:
How did we get so many uses of `NULL` again?
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11047
Differential Revision: D9566799
Pulled By: goldsborough
fbshipit-source-id: 83469f352ac69aa65bdaf1a1a21f922d892e0db3
Summary:
Currently our `skipIfLapack` uses a try-catch block and regex-matches the error message, which is highly unreliable. This PR adds `hasLAPACK` and `hasMAGMA` to the ATen context and exposes the flags to Python.
Also fixes a refcounting bug with `PyModule_AddObject`: the method steals a reference, but in some places we didn't `Py_INCREF` before calling it with `Py_True` or `Py_False`.
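As a rough sketch of how the exposed flags can be used on the Python side (the exact attribute names, e.g. `torch._C.has_lapack`, and the helper name below are assumptions for illustration, not taken from this PR):
```
import unittest
import torch

# Assumed name of the flag exposed from the ATen context.
HAS_LAPACK = getattr(torch._C, "has_lapack", False)

def skip_if_no_lapack(fn):
    # Skip based on the exposed flag instead of running the test,
    # catching the exception, and regex-matching its message.
    return unittest.skipIf(not HAS_LAPACK, "PyTorch compiled without LAPACK")(fn)
```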
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11024
Differential Revision: D9564898
Pulled By: SsnL
fbshipit-source-id: f46862ec3558d7e0058ef48991cd9c720cb317e2
Summary:
These could use some autograd tests, which are coming in a later PR, but using them in autograd is probably pretty rare.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9947
Reviewed By: ezyang
Differential Revision: D9032778
Pulled By: gchanan
fbshipit-source-id: fa5a6509d3bac31ea4fae25143e82de62daabfbd
Summary:
Usually the DLPack consumer is expected to call the DLManagedTensor's
deleter to signal that it no longer needs the contents.
This patch calls the deleter when freeing unconsumed
DLPack capsules created by PyTorch.
Test script:
```
import torch
import torch.utils.dlpack
import gc
for i in range(10000):
    a = torch.randn(1000, 1000, dtype=torch.float32, device='cuda')
    b = torch.utils.dlpack.to_dlpack(a)
gc.collect()
```
Before the patch: consumes all GPU RAM.
After the patch: constant GPU RAM consumption.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9297
Differential Revision: D8781571
Pulled By: soumith
fbshipit-source-id: 2ebadec6c857646220d632ca64110af430dbd52f
* Bag of fixes
* Rename tensor_range.h to tensor_list_view.h
* Post rebase fixes
* Rename torch::tensor namespace to torch::tensors due to name conflict
* Avoid recursion in Module::to
* Some 0-sized dimension support, port catArray away from resizeLegacy.
The goal of this PR is to port catArray away from resizeLegacy (so we can delete the legacy resize calls), but since catArray has some weird behavior
stemming from the lack of arbitrary 0-sized dimension support, I made some effort to fix both issues in one pass.
The major changes here are:
1) catArray uses the new resize API, no longer the old resizeLegacy API.
2) As 1) is the last usage of resizeLegacy, it is deleted.
3) If compiled with USE_TH_SIZE_ZERO_DIM, catArray will work and properly check shapes for n-dimensional empty tensors.
4) However, we retain the old behavior of "ignoring" size [0] tensors in catArray (see the sketch after this list). We previously allowed this because we didn't have n-dimensional empty tensors.
5) To get the above to work, we also add support for n-dimensional empty tensors for narrow and slice (ifdef USE_TH_SIZE_ZERO_DIM).
6) We change the stride formula for empty tensors to match NumPy: we never multiply by a size of 0, always by at least 1, so the
strides are monotonically increasing in the empty tensor case.
7) We print the size of empty tensors if size != [0]; this matches NumPy behavior (even in cases where the size could be inferred from the brackets).
8) For test purposes, we add torch._C._use_zero_size_dim() to add tests for the above.
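A minimal sketch of the intended semantics (written against current torch.cat behavior; the legacy size-[0] case from (4) is only described in comments):
```
import torch

x = torch.ones(2, 3)
e = torch.empty(0, 3)                    # n-dimensional empty tensor
print(torch.cat([x, e], dim=0).shape)    # torch.Size([2, 3]); shapes are checked
print(torch.empty(0, 3).stride())        # (3, 1): strides never multiply by 0

legacy = torch.empty(0)                  # size [0] tensor
# Under the retained legacy behavior in (4), a size-[0] tensor is simply
# ignored, so torch.cat([x, legacy], dim=0) also yields a (2, 3) result.
```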
* Fix flake8.
* Address review comments.
* Build and install c10d from tools/build_pytorch_libs.sh
* Create initial Python bindings for c10d
* clang-format
* Switch link order to include more symbols
* Add bindings and tests for ProcessGroupGloo
* Add broadcast test
* Separate build flag for c10d
* Explicit PIC property
* Skip c10d tests if not available
* Remove c10d from Windows blacklist
Let it skip by itself because it won't be available anyway.
* Make lint happy
* Comments
* Move c10d module into torch.distributed
* Close tempfile such that it is deleted
* Test if ASAN is actually working as part of ASAN tests.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Drop explicit use of libstdc++, we should not care.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Build with DEBUG=1
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Increase main thread stack size when using ASAN.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This makes the JIT tracer much more robust, by allowing it to record
dependencies on tensor sizes. For example, if you were to trace this
function
```
def fn(x):
    return x.view(x.size(1), -1)
```
before this patch, then it would embed the actual value of x.size(1)
in the trace as a constant, making it very hard to have e.g. batch size
independent traces. Now, this will correctly record the dependency, and
will retrieve the size of x at every run.
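A small sketch of the effect (using the `torch.jit.trace(fn, example_inputs)` signature of the current API, which is an assumption relative to this commit):
```
import torch

def fn(x):
    # The trace should record a dependency on x.size(1) rather than bake
    # the concrete value seen at trace time into the graph as a constant.
    return x.view(x.size(1), -1)

traced = torch.jit.trace(fn, torch.randn(2, 3))
# With the size dependency recorded, the same trace can be reused for
# inputs whose sizes differ from the example input.
print(traced(torch.randn(5, 4)).shape)   # torch.Size([4, 5])
```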
* Split set_default_tensor_type(dtype) into set_default_dtype(dtype).
* Fix flake8.
The difference between this one and set_default_tensor_type is that it only sets the scalar type. What determines the type + device of a tensor returned from a factory function with defaults is the default tensor type + the current device (if the default tensor type is cuda); this call just changes the scalar type of the default tensor type.
We do eventually want to deprecate set_default_tensor_type; it is not clear how to do that in a sensible and backwards compatible way.
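For illustration, a minimal example of the new call (using the public API as it exists today):
```
import torch

torch.set_default_dtype(torch.float64)
print(torch.tensor([1.0]).dtype)        # torch.float64
print(torch.ones(2, 2).dtype)           # torch.float64; device selection is unchanged
torch.set_default_dtype(torch.float32)  # restore the usual default
```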
* Separate cuda-ness from dtype.
There is no longer torch.cuda.int64, etc.; only torch.int64 and friends, which correspond to at::ScalarType.
At the python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device).
There is also currently unused code in here for supporting ScalarType in native_functions; this will be used for specifying aggregate types
on reduction functions.
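A short sketch of the resulting factory-function semantics (assuming today's public API):
```
import torch

# dtype now carries only the scalar type; the device is specified separately.
cpu_ints = torch.zeros(2, 3, dtype=torch.int64)
print(cpu_ints.dtype, cpu_ints.device)          # torch.int64 cpu

if torch.cuda.is_available():
    gpu_ints = torch.zeros(2, 3, dtype=torch.int64, device='cuda:0')
    print(gpu_ints.dtype, gpu_ints.device)      # torch.int64 cuda:0
```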
* Fix test_autograd.
* Add defaults to randint_like.
* Track is_cuda in py tensor types.
* Fix test_sparse.
* Fix multiprocessing.
* Fix rnn.
* Fix test_nn.
* Fix flake8.
* Add string-style devices to all tensors.
Previously, tensors only had a 'get_device' method, which would throw an exception on a CPU tensor. This made it necessary to add if/else branches to code that
was meant to be device agnostic.
This PR implements the following:
1) Adds a 'device' property to all tensors that returns a string representation of the device.
For cpu tensors this is 'cpu'. For cuda tensors this is 'cuda:X', where X is the cuda device ordinal.
2) Adds a DeviceSpec class. This is just a helper class for separating device_type and device_index specification and to allow partial specification.
For example, you can call DeviceSpec('cuda'), DeviceSpec('cuda:0'), DeviceSpec('cuda', 1).
Also has backwards compatibility support for specifying integers, which are treated as cuda devices.
DeviceSpecs have the following properties:
a) device_type: string representation of the device type (i.e. 'cpu' or 'cuda')
b) device_index: integer for the device index (None if not specified)
c) cuda_device_index: for backwards compatibility; behaves roughly like `get_device` did previously. I.e. if a function previously took integers for cuda devices,
it can now take DeviceSpecs (or strings), and can maintain the old functionality by calling `old_index = DeviceSpec(old).cuda_device_index`.
3) tensor methods and torch. functions that took integer devices can now take integers, strings, or DeviceSpecs (see the sketch after the TODO list below). For example:
torch.randn((2,3), dtype=torch.cuda.float32, device='cuda:1')
TODO in future PRs:
A) Split out cuda from dtype so you don't need to overspecify cuda-ness
B) We currently only support strings/DeviceSpecs in tensor methods and torch. functions. We should have equivalents of torch.cuda.device(...), torch.cuda.device_of, etc.,
at the torch. level that work on strings/DeviceSpecs.
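A brief sketch of the string-style device notation (written against today's API, in which DeviceSpec has been renamed to torch.device, as one of the commits below notes):
```
import torch

t = torch.randn(2, 3)
print(t.device)              # cpu

d = torch.device('cuda', 1)  # equivalent to torch.device('cuda:1')
print(d.type, d.index)       # cuda 1

legacy = torch.device(0)     # integers are still treated as cuda devices
print(legacy)                # cuda:0
```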
* Add deviceInt64 to python arg parser.
* device_str.
* Remove device_str.
* remove device prefix from attributes.
* Use const char * instead of string.
* Move autogpu index out of Device.
* comment on is_default.
* Rename torch.DeviceSpec to torch.device.
* comment.
* Fix tests.
* Fix flake8.
* Fix sparse_coo_tensor parameter name.
* Improve error message.
* Remove device_ prefix from C++ device object.
* Allocate static strings.
* Return not implemented from rich compare.
* Move torch::Device to THPDevice.
* Remove cuda index.
* Py_RETURN_NOTIMPLEMENTED doesn't exist in python2.
This changes type(tensor) to return `torch.Tensor` instead of
`torch.autograd.Variable`.
This requires a few implementation changes:
- torch.Tensor is now a regular Python class instead of a
pseudo-factory like torch.FloatTensor/torch.DoubleTensor
- torch.autograd.Variable is just a shell with a __new__ function.
Since no instances are constructed, it doesn't have any methods.
- Adds torch.get_default_dtype() since torch.Tensor.dtype returns
<attribute 'dtype' of 'torch._C._TensorBase' objects>
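A tiny illustration of the new behavior (today's API):
```
import torch

t = torch.randn(2, 3)
print(type(t))                      # <class 'torch.Tensor'>
print(isinstance(t, torch.Tensor))  # True
print(torch.get_default_dtype())    # torch.float32 unless changed
```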
We had a bug in the Buck build of PyTorch due to symbols from _C
being present in two shared libraries that were both loaded at
runtime. This caused global variables to be initialized twice and
destructed twice on exit. The second destruction often caused
segfaults on exit.
This attempts to detect that sort of situation early on. If
Module.cpp is compiled twice, the symbol
pytorch_duplicate_guard()::initialized will be shared. The second
initialization will print an error message and abort.
* Introduce torch.layout and split layout from dtypes.
Tensors (and tensor types) now have a 'layout' attribute that returns either 'torch.strided' or 'torch.sparse_coo'.
Previously, dtypes were 1-to-1 with ATen types/PyTensorTypes; the impetus behind this decision was to make things easy in the common case
(i.e. specifying a type in a factory function). But this doesn't really follow for sparsity, which isn't a common case.
It also doesn't properly represent the concept of a dtype, which in numpy is a proper scalar type (i.e. roughly the type returned from indexing the
last dimension of an n-d array). But this should be the same whether or not the tensor is represented via strides, sparsity, etc.
This is accomplished by:
1) having the dtype of a tensor return the (device-type, scalar-type) combination, i.e. torch.cuda.float32, so both
torch.cuda.FloatTensor and torch.cuda.sparse.FloatTensor have the same dtype
2) Adding a layout parameter to python functions, where the combination of (dtype, layout) maps to an ATen type that is used for dispatch.
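A short sketch of the layout attribute (today's API):
```
import torch

dense = torch.zeros(2, 3)
sparse = torch.zeros(2, 3, layout=torch.sparse_coo)
print(dense.layout)                 # torch.strided
print(sparse.layout)                # torch.sparse_coo
print(dense.dtype == sparse.dtype)  # True: layout is no longer baked into dtype
```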
* Formatting, make init throw python_error.
* Fix cuda not enabled error message.
* Fix test.
This is the first of three PRs that #5537 will be split into.
This PR adds MKL headers to the included files, and provides helper functions for MKL FFT and cuFFT.
In particular, on POSIX the headers come from the mkl-include conda package, and on Windows they come from a new file that @yf225 and I made and uploaded to S3.
* add mkl-include to required packages
* include MKL headers; add AT_MKL_ENABLED flag; add a method to query MKL availability
* Add MKL and CUFFT helpers
- Remove some uses of mega-header THP.h
- Use HANDLE_TH_ERRORS in functions that may throw
- Move NumPy includes to common header
- Delete unused allocator
* Add dtype to torch.Tensor, torch.FloatTensor, etc.
* Support passing dtypes to set_default_tensor_type.
* Check dtype exception.
* Correctly handle new type initialization order.
* Move handling of torch.Storage alias to C++.
* Delete function that erroneously reappeared.
This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp.
This also moves the THNN.cwrap/.cpp generation to generate_code which can use ninja if installed.
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.
To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.
There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:
https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
* Add numpy-style dtypes to Variable factories.
1) Add numpy-style dtypes corresponding to torch tensor types. These are:
torch.float16, torch.float32, torch.float64, torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64
as well as torch.cuda, torch.sparse, and torch.cuda.sparse equivalents.
2) Adds "legacy" names for the above dtypes that correspond more closely to existing tensor names. These are:
torch.half, torch.float, torch.double, torch.short, torch.int, torch.long.
torch.byte and torch.char don't exist because they either don't match numpy semantics or differ on different architectures.
3) Adds a "dtype" parameter to Variable factories (e.g. zeros, ones) that allows the user to specify the type without changing the default tensor type.
4) Adds a "dtype" getter to Variables that return the canonical dtype from 1)
This PR is missing the following useful features that should be added in the future:
A) We only add the "dtype" parameter to auto-generated factories; hand-written factories like in tensor_new.cpp don't support this yet.
B) We don't allow type conversions to use dtypes; that should be added to type(param) or a new function.
C) We don't yet have a "device" parameter for these factories; right now, they will only create Variables on the default device.
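A minimal example of the dtype parameter and getter (today's spelling; note that the torch.cuda.* dtype variants were subsequently folded into the device concept, as described in the "Separate cuda-ness from dtype" change above):
```
import torch

a = torch.zeros(2, 3, dtype=torch.float64)  # dtype without changing the default
b = torch.ones(4, dtype=torch.short)        # legacy alias for torch.int16
print(a.dtype, b.dtype)                     # torch.float64 torch.int16
print(torch.randn(2).dtype)                 # the default type is untouched
```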
* backend_to_string can be private.
* Define python binding argument indexes in a more simple way.
* add all_declared_types, still need to hook it up to THPDType.
* Fix all_declared_types for missing types (it's Sparse + Half).
* Ensure cuda dtypes are created even if compiled with NO_CUDA=1.
* Fix case where dtype is provided but dispatch is via namespace.
This happens in ones_like, empty_like, randn_like.
There is some question whether we should do:
1) at::ones_like(tensor).toType(dtype)
2) at::ones_like(tensor.toType(dtype))
I did the former because it matches the numpy documentation, i.e.:
"Overrides the data type of the result." It is also easier to implement.
Note that the above causes an extra copy, either of the input or output.
Here's a better implementation:
1) Make zeros_like, ones_like native functions that take an optional type (named dtype?).
2) Match the type argument with the dtype, so we don't have two different parameters.
3) Call at::zeros_like(input, type) -> at::native::zeros_like(input, type) -> type.zeros(input.sizes())
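For reference, the eventual user-facing form sketched in the list above looks roughly like this (today's API):
```
import torch

t = torch.randn(2, 3)                        # float32 by default
u = torch.ones_like(t, dtype=torch.float64)  # dtype overrides the result's type
print(u.dtype, tuple(u.shape))               # torch.float64 (2, 3)
```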
* Don't return from maybe_initialize_cuda.
* Don't leak DType name.
* Address cpp review comments.
* Share code between sparse and non-sparse test_dtypes.
* Rewrite _like functions as native function with explicit type parameter.
* Use type 'Type' instead of 'dtype' for consistency.
* Address review comments.
* Handle arg_idx when there is requires_grad but no dtype in python_binding_arguments.
* Enable scalars if compiled with WITH_SCALAR environment variable.
We are pretty close to enabling scalars (0-dimensional arrays); this allows turning them on
for development purposes and to be able to write code that works both with and without scalars enabled.
WITH_SCALARS is currently broken with distributions, but should work for test_torch, test_autograd, test_nn.
* Fix unsqueeze.
* Fix wrap dim, wrapping with Scalar.
* Use ATen infer_size implementation rather than TH.
The only substantive difference between the two implementations is in how empty sizes are handled;
in ATen these are treated as scalars (i.e., can be expanded to anything), whereas in TH they are treated
as a special case of empty tensors (i.e., can't be expanded to anything). Therefore, this change is
necessary to support scalars (0-dimensional tensors). We could also take a bool parameter for determining
how we treat empty tensors but this seems unnecessary: if one tries to expand an empty tensor (as a result
of an infer_size calculation), the expansion will fail.
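A small illustration of the difference (today's semantics for 0-dimensional tensors and expand):
```
import torch

s = torch.tensor(3.0)        # 0-dimensional tensor, size []
m = torch.ones(2, 3)
print((s + m).shape)         # torch.Size([2, 3]): [] expands to anything
print(s.expand(2, 3).shape)  # torch.Size([2, 3])

e = torch.empty(0)           # size [0] empty tensor
# e.expand(2, 3) would raise: a 0-sized dimension cannot be expanded.
```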
* Make changes for review.
* Attempt to fix windows build.
* long -> int.
This adds overrides in VariableType for the xxx_out ATen functions and
implements Python bindings. There is no support for automatic
differentiation. If any of the inputs (or outputs) requires grad, then the
function will throw an exception unless it's running in "no-grad" mode.
The bindings for calling torch.xxx functions on Variables are moved to a
different object. Previously, they were static method on VariableBase.
This change prevents users from accidentally calling static methods as if
they were instance methods.
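A sketch of the resulting Python-level behavior (written against today's API, where requires_grad lives directly on tensors):
```
import torch

a, b = torch.randn(3), torch.randn(3)
out = torch.empty(3)
torch.add(a, b, out=out)      # out= variant; no inputs require grad, so this is fine

x = torch.randn(3, requires_grad=True)
with torch.no_grad():
    torch.add(x, b, out=out)  # allowed in no-grad mode
# Outside no_grad, torch.add(x, b, out=out) raises, because out= functions
# do not support automatic differentiation.
```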
* Fix the inconsistency of `polygamma` on Tensor and Variable.
Signed-off-by: HE, Tao <sighingnow@gmail.com>
* Regression test for #4466, polygamma works on variables.
Signed-off-by: HE, Tao <sighingnow@gmail.com>
* Add macro IMPLEMENT_STATELESS_SWAP to dispatch stateless methods on Variables correctly.
When calling stateless methods with more than one argument where `self` comes second,
the `self` argument needs to be swapped to the first position before dispatching.
The macro `IMPLEMENT_STATELESS_ADDXX` is still reserved for deprecated `add**`
methods.
Signed-off-by: HE, Tao <sighingnow@gmail.com>
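For context, `polygamma` is one of the stateless functions where `self` comes second; a quick sketch of both call forms (today's API):
```
import torch

x = torch.rand(3) + 0.5
# In torch.polygamma(n, input) the tensor argument is second, so stateless
# dispatch must move it into the `self` slot before calling the method.
print(torch.polygamma(1, x))
print(x.polygamma(1))        # method form; same result
```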
- Rename THNN convolution to have thnn_ prefix.
- Propagate CuDNN benchmark and deterministic to at::Context
- Add 'convolution', 'convNd' and 'conv_transposeNd' native wrappers, with defaults
The conv_transposeNd wrappers are updated to have the same argument
order as Python.
- torch.nn.functional directly dispatches to the native wrappers
- Make it possible to turn off tracing for some native wrappers, so I don't
have to write symbolics for all the functions above
- Spectral ops can now make use of CuDNN convolution if possible
- Better commentary on cudnn_batch_norm
- Turn on DCE for all JIT tests.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Refactor cudnn code layout / make build more robust.
When I previously moved cuDNN into ATen, I wasn't too familiar with the
ATen native function directory layout, and so I did a number of
suboptimal things. This commit fixes those problems.
- If NO_CUDA was set but cuDNN is installed on your system, we'd incorrectly
assume that CUDNN was enabled, to hilarious effect.
- We now distinguish between cudnn implementation files and cudnn
native function files. The native files now live in ATen/native/cudnn,
and are *unconditionally compiled*, even when we are not building with cuDNN.
This means that we can unconditionally declare cudnn functions in yaml
and they are always available, even if they are broken. The cuDNN specific
files live in 'cudnn', they are *never* installed, and they are used
purely for implementation purposes. I had to add stub implementations of
all ATen functions to achieve this.
- I had written headers for at::native functions manually, but codegen
will generate them for me automatically. So I deleted the headers.
That lets me get rid of some header install logic as well.
- There's a new note about ATen preprocessor philosophy.
This removes volatile from Variable. The functionality is mostly
replaced by a global (thread-local) flag, which is controlled by
torch.set_grad_enabled() and the context manager torch.no_grad().
In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled().
Fixes #3627
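A minimal sketch of the replacement API (today's spelling):
```
import torch

x = torch.randn(3, requires_grad=True)

with torch.no_grad():          # replaces the old volatile=True flag
    y = x * 2
print(y.requires_grad)         # False

torch.set_grad_enabled(False)  # the global (thread-local) switch
z = x * 2
print(z.requires_grad)         # False
torch.set_grad_enabled(True)
```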
* Support CPU Apply directly in ATen and implement standard_gamma using it.
Main changes in this PR:
1) Added a TH_APPLY-style templatized function for CPU apply calls (currently only 2 and 3 tensor argument
versions are supported, but more are easy to add). In fact, this is basically identical to TH_APPLY, except
it uses ATen functions and the API is a template instead of a macro. The template takes an operation that
is performed on the data (and an indicator to signal early termination); i.e. you don't need to know that
x_data is a pointer to the current data location of x.
2) Refactors the ATen dispatch code to easily generate dispatch code for different subsets of the scalar types.
This is in preference to the template_scalar path, which requires valid specialization of each scalar type. Valid
specializations are particularly annoying with CUDA because you most likely can't put the specializations
in a header, so you need to write some sort of for-all-scalar-type macro to get the correct specializations.
Currently, we only generate dispatch_all (all scalar types, the equivalent existed already), and
dispatch_cpu_floating_types (which is used by standard_gamma).
3) Implements standard_gamma using the above changes (this is an arbitrary choice, it was the latest
apply macro to be committed). The forward is bound via Declarations.yaml,
the backward via the Apply template, and then they are hooked together in derivatives.yaml. This eliminates
needing to change TH at all going forward, which means one can write idiomatic C++ instead of the TH-style macros
(e.g. TH_MATH_NAME).
* Generate Dispatch code with nicer spacing.
* Small cleanups.
* Fix typo.
* Add TODOs for changing macros, remove dead code.
* Use a lambda function.
* Get rid of early exit.
* Rename Scalar,ScalarType template parameters to CScalar.
* Reorder _standard_gamma_grad parameters.
* Add comments explaining calling convention.
* Don't generate Dispatch.h anymore.
* Get rid of backend specific checks in dispatch.
* Fix empty/scalar check.