Commit Graph

70 Commits

Author SHA1 Message Date
peter
d6f62b70f3 Fix cuda and cudnn libraries search process on Windows (#20205)
Summary:
Fixes #20202
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20205

Differential Revision: D15258626

Pulled By: ezyang

fbshipit-source-id: 855ad457a8bb7a46accc7cf6ec5cb09e98f6e770
2019-05-08 06:08:47 -07:00
Tongzhou Wang
973d51079b Add device-specific cuFFT plan caches (#19300)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19224
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19300

Differential Revision: D14986967

Pulled By: soumith

fbshipit-source-id: 8c31237db50d6924bba1472434c10326610d9255
2019-04-18 06:39:35 -07:00
Edward Yang
50df3e5e2e Add ability to query if built with CUDA and MKL-DNN. (#18362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18362
ghimport-source-id: 374b7ab97e2d6a894368007133201f510539296f

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18242 Test running a CUDA build on CPU machine.
* **#18362 Add ability to query if built with CUDA and MKL-DNN.**

Fixes #18108.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14584430

fbshipit-source-id: 7605a1ac4e8f2a7c70d52e5a43ad7f03f0457473
2019-03-25 10:39:09 -07:00
SsnL
13422fca32 Add torch.backends.openmp.is_available(); fix some cmake messages (#16425)
Summary:
1. add `torch.backends.openmp.is_available()`
2. Improve various `cmake` outputs
3. Fix LDFLAGS not respected by `caffe2_pybind11_state_*` targets
4. Fix `MKL` warning message, and QUIET flag.
5. Fix various typos
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16425

Differential Revision: D13903395

Pulled By: soumith

fbshipit-source-id: d15c5d46f53e1ff1c27fca2887b9d23d0bd85b4d
2019-01-31 16:15:46 -08:00
Lu Fang
b1b00f329e Fix the flake8 linter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16549

Reviewed By: bddppq

Differential Revision: D13877435

Pulled By: houseroad

fbshipit-source-id: dbe575ba3f6dd30d27ac6aa5eec2eea025063540
2019-01-30 09:36:00 -08:00
David Riazati
bc74ec80d0 Add support for torch.backends.cudnn.enabled (#13057)
Summary:
This is used commonly in `nn` functions. This PR adds it as a weak
module (and also alters the conversion of weak modules to strong modules
to accept ordinary `object`s)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13057

Differential Revision: D10846618

Pulled By: driazati

fbshipit-source-id: 028b9f852d40e2e53ee85b93282c98cef8cd336b
2018-10-31 09:31:09 -07:00
sclarkson
2b033332c8 Allow linking to backwards-compatible cuDNN at runtime (#12239)
Summary:
Fixes #12193
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12239

Differential Revision: D10321744

Pulled By: soumith

fbshipit-source-id: bf437f7f9b6231158a1585d2dabae8d937396478
2018-10-10 23:56:51 -07:00
Matt Dawkins
87b2f05a9c Also set stdin to subprocess pipe in FindCUDNN windows popen call (#11435)
Summary:
Same issue as https://github.com/pytorch/pytorch/pull/10379, just in a different place (adding this resolves it)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11435

Differential Revision: D9736396

Pulled By: soumith

fbshipit-source-id: 220a52b8009fc2bee9313c5a091443c68f85f62f
2018-09-09 11:40:25 -07:00
Peter Goldsborough
9ce15173fb Move _cudnn_init_dropout_state to TensorOptions and enable cuDNN dropout in C++ API RNNs (#9012)
Summary:
The goal of this PR was to add support for dropout descriptors in the C++ API's RNN class.
The end result is a 4x-5x speedup for our RNN integration tests since they can now use cuDNN instead of autograd when dropout is set.

To achieve this, I had to move `_cudnn_init_dropout_state` to the `TensorOptions` API.

I also fixed a bug around `RNN::cuda()` not flattening parameters for cuDNN.

ebetica ezyang
Closes https://github.com/pytorch/pytorch/pull/9012

Reviewed By: pjh5

Differential Revision: D8689786

Pulled By: goldsborough

fbshipit-source-id: 44fb191f5a38e41c4ded5417306b5bbc012cd56c
2018-06-29 17:25:23 -07:00
Tongzhou Wang
e6c7b38f94
Cache cufft plans (#8344)
* cache cufft plans

* use an LRU cache

* suffix CuFFTParams members with _

* import print_function for py2

* lint

* fix potential race; add dummy impl for CPU only builds

* cpp formatting; remove nccl makefile change

* Use CUDA hooks instead

* comments and doc

* update the error message

* move LRU cachae to a separate file and native::detail namespace

* update comment

* specify NOTE location in CuFFTPlanCache.h

* update disabled_features.yaml to make amd ci work

* another fix for AMD CI in disabled_features.yaml

* Wrap cufft_plan_cache_* methods in __HIP_PLATFORM_HCC__

* improve the notes

* lint

* revert onnx change

* put back inlining for CUFFT_CHECK
2018-06-22 13:02:34 -04:00
Peter Goldsborough
0acddd6cee
Add torch.cuda.cudnn_is_available (#8703) 2018-06-20 14:18:03 -07:00
Edward Z. Yang
64834f6fb8
Split libATen.so into libATen_cpu.so and libATen_cuda.so (#7275)
* Split libATen.so into libATen_cpu.so and libATen_cuda.so

Previously, ATen could be built with either CPU-only support, or
CPU/CUDA support, but only via a compile-time flag, requiring
two separate builds.  This means that if you have a program which
indirectly uses a CPU-only build of ATen, and a CPU/CUDA-build of
ATen, you're gonna have a bad time.  And you might want a CPU-only
build of ATen, because it is 15M (versus the 300M of a CUDA build).

This commit splits libATen.so into two libraries, CPU/CUDA, so
that it's not necessary to do a full rebuild to get CPU-only
support; instead, if you link against libATen_cpu.so only, you
are CPU-only; if you additionally link/dlopen libATen_cuda.so,
this enables CUDA support.  This brings ATen's dynamic library
structure more similar to Caffe2's.  libATen.so is no more
(this is BC BREAKING)

The general principle for how this works is that we introduce
a *hooks* interface, which introduces a dynamic dispatch indirection
between a call site and implementation site of CUDA functionality,
mediated by a static initialization registry.  This means that we can continue
to, for example, lazily initialize CUDA from Context (a core, CPU class) without
having a direct dependency on the CUDA bits.  Instead, we look up
in the registry if, e.g., CUDA hooks have been loaded (this loading
process happens at static initialization time), and if they
have been we dynamic dispatch to this class.  We similarly use
the hooks interface to handle Variable registration.

We introduce a new invariant: if the backend of a type has not
been initialized (e.g., it's library has not been dlopened; for
CUDA, this also includes CUDA initialization), then the Type
pointers in the context registry are NULL.  If you access the
registry directly you must maintain this invariant.

There are a few potholes along the way.  I document them here:

- Previously, PyTorch maintained a separate registry for variable
  types, because no provision for them was made in the Context's
  type_registry.  Now that we have the hooks mechanism, we can easily
  have PyTorch register variables in the main registry.  The code
  has been refactored accordingly.

- There is a subtle ordering issue between Variable and CUDA.
  We permit libATen_cuda.so and PyTorch to be loaded in either
  order (in practice, CUDA is always loaded "after" PyTorch, because
  it is lazily initialized.)  This means that, when CUDA types are
  loaded, we must subsequently also initialize their Variable equivalents.
  Appropriate hooks were added to VariableHooks to make this possible;
  similarly, getVariableHooks() is not referentially transparent, and
  will change behavior after Variables are loaded.  (This is different
  to CUDAHooks, which is "burned in" after you try to initialize CUDA.)

- The cmake is adjusted to separate dependencies into either CPU
  or CUDA dependencies.  The generator scripts are adjusted to either
  generate a file as a CUDA (cuda_file_manager) or CPU file (file_manager).

- I changed all native functions which were CUDA-only (the cudnn functions)
  to have dispatches for CUDA only (making it permissible to not specify
  all dispatch options.)  This uncovered a bug in how we were handling
  native functions which dispatch on a Type argument; I introduced a new
  self_ty keyword to handle this case.  I'm not 100% happy about it
  but it fixed my problem.

  This also exposed the fact that set_history incompletely handles
  heterogenous return tuples combining Tensor and TensorList.  I
  swapped this codegen to use flatten() (at the possible cost of
  a slight perf regression, since we're allocating another vector now
  in this code path).

- thc_state is no longer a public member of Context; use getTHCState() instead

- This PR comes with Registry from Caffe2, for handling static initialization.
  I needed to make a bunch of fixes to Registry to make it more portable

  - No more ##__VA_ARGS__ token pasting; instead, it is mandatory to pass at
    least one argument to the var-args. CUDAHooks and VariableHooks pass a nullary
    struct CUDAHooksArgs/VariableHooksArgs to solve the problem. We must get rid of
    token pasting because it does not work with MSVC.

  - It seems MSVC is not willing to generate code for constructors of template
    classes at use sites which cross DLL boundaries. So we explicitly instantiate
    the class to get around the problem. This involved tweaks to the boilerplate
    generating macros, and also required us to shuffle around namespaces a bit,
    because you can't specialize a template unless you are in the same namespace as
    the template.
  - Insertion of AT_API to appropriate places where the registry must be exported

- We have a general problem which is that on recent Ubuntu distributions,
  --as-needed is enabled for shared libraries, which is (cc @apaszke who was
  worrying about this in #7160 see also #7160 (comment)). For now, I've hacked
  this up in the PR to pass -Wl,--no-as-needed to all of the spots necessary to
  make CI work, but a more sustainable solution is to attempt to dlopen
  libATen_cuda.so when CUDA functionality is requested.

    - The JIT tests somehow manage to try to touch CUDA without loading libATen_cuda.so. So
      we pass -Wl,--no-as-needed when linking libATen_cuda.so to _C.so

- There is a very subtle linking issue with lapack, which is solved by making sure libATen_cuda.so links against LAPACK. There's a comment in aten/src/ATen/CMakeLists.txt about htis as well as a follow up bug at #7353

- autogradpp used AT_CUDA_ENABLED directly. We've expunged these uses and added
  a few more things to CUDAHooks (getNumGPUs)

- Added manualSeedAll to Generator so that we can invoke it polymorphically (it
  only does something different for CUDAGenerator)

- There's a new cuda/CUDAConfig.h header for CUDA-only ifdef macros (AT_CUDNN_ENABLED, most prominently)

- CUDAHooks/VariableHooks structs live in at namespace because Registry's
  namespace support is not good enough to handle it otherwise (see Registry
  changes above)

- There's some modest moving around of native functions in ReduceOps and
  UnaryOps to get the CUDA-only function implementations into separate files, so
  they are only compiled into libATen_cuda.so. sspaddmm needed a separate CUDA
  function due to object linkage boundaries.

- Some direct uses of native functions in CUDA code has to go away, since these
  functions are not exported, so you have to go through the dispatcher
  (at::native::empty_like to at::empty_like)

- Code in THC/THCS/THCUNN now properly use THC_API macro instead of TH_API
  (which matters now that TH and THC are not in the same library)

- Added code debt in torch/_thnn/utils.py and other THNN parsing code to handle
  both TH_API and THC_API

- TensorUtils.h is now properly exported with AT_API

- Dead uses of TH_EXPORTS and co expunged; we now use ATen_cpu_exports and
  ATen_cuda_exports (new, in ATenCUDAGeneral.h) consistently

- Fix some incorrect type annotations on _cudnn_rnn_backward, where we didn't
  declare a type as possibly undefined when we should have. We didn't catch this
  previously because optional annotations are not tested on "pass-through" native
  ATen ops (which don't have dispatch). Upstream issue at #7316

- There's a new cmake macro aten_compile_options for applying all of our
  per-target compile time options. We use this on the cpu and cuda libraries.

- test/test_cpp_extensions.py can be run directly by invoking in Python,
  assuming you've setup your PYTHONPATH setup correctly

- type_from_string does some new funny business to only query for all valid CUDA
  types (which causes CUDA initialization) when we see "torch.cuda." in the
  requested string

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Last mile libtorch fixes

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* pedantic fix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-10 10:28:33 -07:00
Tongzhou Wang
1c01eabd3c
Codemod to update our codebase to 0.4 standard (#6641)
* Codemod to update our codebase to 0.4 standard

* Update some of the test scri[ts

* remove Variable in test_clip_grad_value

* fix _symbolic_override_wrapper_maker
2018-04-17 22:06:54 -04:00
gchanan
749d51414a
Separate cuda-ness from dtype. (#6470)
* Separate cuda-ness from dtype.

There are no longer torch.cuda.int64, etc; only torch.int64 that correspond to at::ScalarType.
At the python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device).

There is also currently unused code in here for support ScalarType in native_functions; this will be used for specifying aggregate types
on reduction functions.

* Fix test_autograd.

* Add defaults to randint_like.

* Track is_cuda in py tensor types.

* Fix test_sparse.

* Fix multiprocessing.

* Fix rnn.

* Fix test_nn.

* Fix flake8.
2018-04-12 14:05:44 -04:00
Tongzhou Wang
22ef8e5654 [fft][1 of 3] build system and helpers to support cuFFT and MKL (#5855)
This is the first of three PRs that #5537 will be split into.

This PR adds mkl headers to included files, and provides helper functions for MKL fft and cuFFT.
In particular, on POSIX, headers are using mkl-include from conda, and on Windows, it is from a new file @yf225 and I made and uploaded to s3.

* add mkl-include to required packages

* include MKL headers; add AT_MKL_ENABLED flag; add a method to query MKL availability

* Add MKL and CUFFT helpers
2018-03-19 15:43:14 -04:00
gchanan
a3442f62bc
Support native namespace functions with type dispatch. (#5576)
* Support native namespace functions with type dispatch.

Use 'ones' as an example.  Note this is a "halfway" solution; i.e. the call chain is:
at::ones(shape, dtype) -> dtype.ones(shape, dtype) -> CPUFloatType.ones(shape, dtype) -> at::native::ones(shape, dtype)

The "nicer" solution would probably be something like:
at::ones(shape, dtype) -> dtype.ones(shape) -> CPUFloatType.ones(shape) -> at::native::ones(shape, this)

* Fix type inference.

* Fix test install.

* Fix extensions.

* Put dtype argument at the beginning.

* Fix extension.cpp.

* Fix rnn.

* Move zeros in the same manner.

* Fix cuda.

* Change randn.

* Change rand.

* Change randperm.

* Fix aten contrib.

* Resize in randperm_out.

* Implement eye.

* Fix sparse zeros.

* linspace, logspace.

* arange.

* range.

* Remove type dispatch from gen_python_functions.

* Properly generate maybe_init_cuda for type dispatch functions not named type.

* Don't duplicate dtype, this parameters for native type dispatched functions.

* Call VariableType factory methods from the base type so it gets version number 0.

* Address review comments.
2018-03-09 10:52:53 -05:00
Edward Z. Yang
0877558e60
Port cuDNN RNN dropout state initialization to ATen and make Python c… (#5383)
* Port cuDNN RNN dropout state initialization to ATen and make Python code use it.

Fixes #5138.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Variable/Tensor bugfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-02 10:00:00 -05:00
Sam Gross
895aebac08
Use Variable instead of Tensor in Function.forward (#4786)
The Tensor and Variable classes are being merged.
autograd.Function.forward is now called on Variables, but with "no-grad"
mode (torch.no_grad()) enabled.

One benefit is that we no longer have to explicitly track shared
storages.
2018-02-06 17:24:27 -05:00
Edward Z. Yang
7bd2db997e
Port cuDNN RNN bindings to ATen (#4881)
* Add transpose() to TensorGeometry.

This code is dead; I briefly used it in my RNN patchset but
eventually rewrote it to not be necessary.  However, it seemed
like a useful gadget so I kept it.  In general, it seems that it
would be useful for TensorGeometry to support all operations that
Tensor does, but it only computes the changes to sizes/strides
instead of actually doing the computation.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Turn on wrap_dim behavior for TensorGeometry

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Support for hard-coded differentiable outputs.

Some outputs of functions are nondifferentiable, and should always
be returned with requires_grad=False.  Traditionally, we have used
the presence of 'grad' to signal that only the first output is
differentiable, and the rest are not, but cudnn_rnn (to be
implemented) breaks this pattern; its first three outputs are differentiable,
but its last output is a buffer that is just consumed by backwards.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* TensorGeometry constructor from just sizes

The sizes are assumed to form a contiguous tensor, and we compute
the strides we would get in that case.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Support saving TensorList for backwards.

There is some back story here.  Saved TensorList in backwards will
be used by cudnn_rnn, and it is worth asking, why is it necessary to
save a list of tensors?  Indeed, *technically* speaking a list of
tensors is not necessary, we only need to save the sizes of each
of the weight tensors.  (We need the sizes because cuDNN is only
going to blast the derivative of weights into a flat buffer, but
we need to match the sizes of the views into the buffer when we
eventually return the derivatives.)

However, it was surprisingly awful trying to implement passing just
sizes, because as non-Tensor arguments, the JIT interpreter generation
code is expected to handle all non-Tensor arguments as attributes in the
trace, and our attributes struct doesn't actually know how to do
arrays of arrays.  Saved TensorList code was much easier to get working,
so that's what this patch does.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* MatrixRef - an ArrayRef with a stride, making it a 2D ArrayRef.

Like ArrayRef, this class does not own the underlying data, it is expected
to be used in situations where the data resides in some other buffer.
This is intended to be trivially copyable, so it should be passed by
value.

For now, 2D only (so the copies are actually cheap, without having
to write a SmallVector class) and contiguous only (so we can
return non-strided ArrayRef on index).

The intended use-case (not in this commit) is to make it easier to
work with RNN weights, which are num_weights x num_layers matrix of
parameters.

P.S. dimension 0 indexes rows, dimension 1 indexes columns

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Generalize getDataType in Descriptors.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Change copy_range to take Tensor, and change cat_tensors_backward accordingly

Should a backward function return a Variable or a Tensor?  For the most
part, all of our backward functions return Tensor, except cat_tensors_backward,
which returns a variable_list (which is really the only thing that matters,
because Tensor and Variable are interconvertible).  But this is kind of weird,
because it means that you can't implement a backwards in ATen that returns
a std::vector<Tensor>, and then hook it up transparently with the derivatives
code.  So I switched it over.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Support 5-ary return Tensor tuple.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Support code generation with mixed Tensor/TensorList in output.

I don't think I ended up using this in cudnn_rnn, but this seems
it might be useful for someone else later.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Support 4-ary boolean array

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add support for retain_variables in tools/autograd/derivatives.yaml

'retain_variables', a bool which is true if a user has specified
that saved variables should be retained in case the backwards is
run again later.  This allows an optimization where we can
destroy saved buffers if we know variables are not going to be retained,
e.g., it is (will be) used by _cudnn_rnn

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Lazily initialize cuDNN descriptors

Previously, cuDNN descriptors were eagerly allocated as soon
as a FooDescriptor object was created.  However, in some uses
of TensorDescriptor, this is problematic: some tensors are optional
and cuDNN's API expects to be given a nullptr TensorDescriptor
in this case, not an uninitialized (but allocated) descriptor.

Lazily initializing the descriptors makes it less likely for
us to use uninitialized memory and matches the usual semantics of
unique_ptr.  It's good sense!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Port cuDNN RNNs to ATen.

This brings three new functions:
  - _cudnn_rnn_flatten_weight: flatten a matrix of weight tensors into
    a single contiguous weight buffer as required by cuDNN
  - _cudnn_rnn: run RNN forwards
  - _cudnn_rnn_backward: run RNN backwards

RNNs have a lot of parameters, so we restructured what was previously
a single 'fn' object that recorded all the parameters into three
objects: RNNDescriptorParams, TensorDescriptorListParams and
DropoutDescriptorParams.

We make use of MatrixRef to organize the weight tensors (which are
weight/bias x number of layers), but I did not teach the codegen
how to pass these as arguments/return values natively, so instead
a MatrixRef is passed as its constituent ArrayRef and int64_t stride0.

cudnn_rnn has three differentiable outputs and one nondifferentiable
one, so it makes use of the support for hard-coded differentiable outputs.

I haven't deleted all of the descriptor code from Python, because dropout
initialization still goes through this codepath, that should be fixed soon
but I don't see it as essential for this PR.

This commit also removes the last use of NestedIOFunction from PyTorch.

There are some shenanigans with cuDNN dropout descriptor initialization,
see below:

Note [cuDNN dropout descriptor initialization]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In most cases, setting descriptors in cuDNN is cheap (e.g.,
cudnnSetTensorNdDescriptor).  However, this is not the case for
cudnnSetDropoutDescriptor: in cuDNN 6/7 (and possibly others) it does an
expensive precomputation to initialize the random number generator states.  In
cuDNN 6, this is the ONLY official mechanism to initialize a dropout descriptor,
which means that law-abiding clients were expected to generate a dropout
descriptor once and cache it.  However, our ATen interface is (1) stateless (so
we can't cache the descriptors) and (2) does not accept arbitrary user types in
its interface (so we can't pass the descriptor in).  This puts us in a pickle.

In cuDNN 7, a new function, cudnnRestoreDropoutDescriptor was added, which
forgoes the expensive initialization process, and can initialize the
descriptor with a pre-initialized state CUDA tensor.  This is great, because
it means we can simply pass in the state tensor and then initialize the
descriptor internally.  Unfortunately, this function is not available in
cuDNN 6.

To work around this, we break the cuDNN abstraction barrier, and have
the struct layout of the underlaying dropout descriptor.  With this struct,
we can reimplement cudnnRestoreDropoutDescriptor from scratch. Great!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Fix cuDNN 7 behavior.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete some unused, controversial methods from MatrixRef.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add missing filter_dim_a slice

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Replace nested for-loop with itertools.chain.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* CR comment on mut_desc()

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Refactor DropoutDescriptor API.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Use cached CurrentDeviceProperties from Context.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Document _cudnn_rnn outputs.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Improve fmap docs, convert some functions to use it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Move IndexRange to autograd/function.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Elaborate on CUDNN_STATUS_INVALID_VALUE return some more.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add an all-in-one setter for RNNDescriptorParams.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Print what the unrecognized RNN mode was

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* RNN TensorDescriptor improvements

- Have an explicit size/stride overload for set TensorDescriptor,
  so you don't have to create a goofy view to feed in.

- Change the padding to 3D rather than 5D, which is all you actually
  need (it's just 2D that is not supported by cuDNN API.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Fix implementation of cudnnRestoreDropoutDescriptor, plus test.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Better comments about input layout.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add comment about no-DropoutDescriptor argument RNNDescriptor function.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Rename vocab_size back to input_size.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Don't use backslash in comment.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Bugfix for contiguous TensorGeometry calculation.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Don't allocate a dummy tensor when setting TensorDescriptor for flatten_weight.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Make contiguity errors more user-friendly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* s/fn.dropout.train/fn_train/

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* s/_cudnn_rnn_backward_grad/_cudnn_rnn_backward_input/

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Make dcx properly undefined when not required.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Remove old TODO.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add state size check in cudnnRestoreDropoutDescriptor

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Explicitly narrow int64_t to size_t

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Restore copyParams comment.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Update benchmark numbers, and slight engineering improvements.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Typofix.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-02-05 13:54:11 -05:00
Edward Z. Yang
7d25a41251
Fix #4492, make it impossible to forget to reset cudnn flags (#4503)
Three stage plan to no more stupidly weird "why isn't cuDNN enabled"
bugs:

- Add torch.backends.cudnn.disable_global_flags(), which as its name suggests,
  disables global flag setting in cuDNN, so that you are not allowed to
  make changes to this state.  However, the flags() context
  manager continues to work (since they are non-global changes).

- Call disable_global_flags() in test/common.py

- Switch all of the manual flag setting/unsetting in test/test_nn.py
  to use the context manager.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-08 12:21:09 -05:00
Edward Z. Yang
5f7c5502b8
Further improvements to ATen convolution (#4287)
- Rename THNN convolution to have thnn_ prefix.
- Propagate CuDNN benchmark and deterministic to at::Context
- Add 'convolution', 'convNd' and 'conv_transposeNd' native wrappers, with defaults
  The conv_transposeNd wrappers are updated to have the same argument
  order as Python.
- torch.nn.functional directly dispatches to the native wrappers
- Make it possible to turn off tracing for some native wrappers, so I don't
  have to write symbolics for all the functions above
- Spectral ops can now make use of CuDNN convolution if possible
- Better commentary on cudnn_batch_norm
- Turn on DCE for all JIT tests.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-21 13:03:43 -05:00
Edward Z. Yang
787b9c5202
Propagate CuDNN enabled to ATen library. (#4104)
This is not currently used by anything, but eventually ATen
will need to make decisions about whether or not to use
CuDNN functions or not, which means we need to propagate
this variable to ATen.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-14 11:29:25 -05:00
Richard Zou
28890b2046 Add rnn args check (#3925)
* Add rnn args check

* Check both hidden sizes for LSTM

* RNN args check test
2017-12-13 12:48:00 -05:00
peter
ba3b79b06b Fix the missing import 2017-11-14 09:36:43 +01:00
Christian Sarofeen
0443c11f7e Fix for cuDNN half precision RNN for pre-volta archs (#3613)
* Fix for cuDNN half RNN on pre-volta archs

* Fix cuDNN versioning in rnn.

* lint fix
2017-11-11 11:34:58 -05:00
peterjc123
aa911939a3 Improve Windows Compatibility (for csrc/scripts) (#2941) 2017-11-08 19:51:35 +01:00
Sean Naren
cf256ee268 Added tensor op check for cudnn rnns (#3409) 2017-11-01 05:51:23 -04:00
Priya Goyal
2443fcac0b Deterministic cudnn algorithms 2017-10-10 10:53:34 -04:00
Adam Paszke
ceb4f84d12 Improve memory usage of cuDNN RNN modules (#2179) 2017-07-25 04:00:17 +05:30
Gregory Chanan
69287250d1 Add a broadcast parameter to copy_, use it in the library in cases where there is non-broadcasting calls exposed by the tests. 2017-06-11 05:37:59 -04:00
Sam Gross
625850c2c2 Check cuDNN version at runtime (#1586)
* Check cuDNN version at runtime

This checks that the version from cudnn.h matches the version from
libcudnn.so.

Fixes #1476

* Only check major and minor version numbers
2017-05-19 01:55:09 -04:00
Sam Gross
e6c9509a41 Fix call to Tensor.set_ in rnn.py (#1592) 2017-05-18 20:28:49 -04:00
Sam Gross
b9379cfab7 Use cuDNN and NCCL symbols from _C library (#1017)
This ensures that we use the same library at the C++ level and with
Python ctypes. It moves the searching for the correct library from
run-time to compile-time.
2017-03-16 16:10:17 -04:00
Adam Paszke
1487278fdf Allow backprop through cuDNN RNN in eval mode
Handling of dropout descriptors has been improved too.
2017-03-01 19:42:39 +01:00
Adam Paszke
da725830c2 Add support for variable length sequences in RNNs (#873) 2017-03-01 17:36:32 +01:00
Christian Sarofeen
04aba1caec Fix cuDNN dropout desc for multi-gpu (#772) 2017-02-17 19:16:12 +01:00
bdfhjk
a217fefee1 Update rnn.py
Fixed a problem with outputting the RuntimeError if arguments are incorrect in cudnn/rnn.py
2017-02-15 21:49:42 +01:00
Adam Paszke
72c1982734 Add some more asserts to cuDNN RNN 2017-02-14 21:28:50 +01:00
Adam Paszke
63edca44f2 Add tests for non-contiguous inputs and gradients 2017-02-14 21:28:50 +01:00
ngimel
f096fb6859 adding cudnn V6 support (#515) 2017-01-31 02:01:37 +01:00
Adam Paszke
0180e638e5 Remove unnecessary zero_() calls in cuDNN RNN 2017-01-28 14:36:57 +01:00
Adam Paszke
95c6ae04fb Fix non-contiguous grad handling in cuDNN RNN 2017-01-28 14:36:57 +01:00
Luke Yeager
e7c1e6a8e3 [pep8] Fix most lint automatically with autopep8
Here's the command I used to invoke autopep8 (in parallel!):

    git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i

Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.

Also configures flake8 to match pep8's behavior.

Also configures TravisCI to check the whole project for lint.
2017-01-28 01:15:51 +01:00
ngimel
b32dd4a876 add cudnn deb package installation paths to cudnn discovery, add 5.1.10 to load options (#448) 2017-01-13 14:32:23 -05:00
ngimel
59b23d79c6 fix cudnn rnn batch_first with tests (#445)
* fix cudnn rnn batch_first with tests
2017-01-13 13:40:27 -05:00
Adam Lerer
183b3aacd2 Hold CuDNN PRNG state between RNN iterations 2016-12-30 00:14:55 +01:00
Sam Gross
8a29338837 Use cuDNN for Conv3d and ConvTranspose3d (#359)
I've also updated test_nn.py to run marked tests twice: once with cuDNN
enabled and once with it disabled.
2016-12-28 16:14:47 -05:00
Adam Paszke
cd82b2b869 Implement comparison and logical operators for tensors 2016-12-28 00:04:08 +01:00
soumith
a9c2809ce3 change the order of cudnn libs 2016-12-21 05:44:16 -08:00
Sergey Zagoruyko
5586f48ad5 add cudnn 5.0.5 to supported versions (#321) 2016-12-17 07:57:20 -05:00