Commit Graph

302 Commits

Author SHA1 Message Date
Zachary DeVito
286cd04a20
JIT cleanup (#7631)
Cleans up dead code in the JIT:

* Remove interpreter_autograd_function
* Remove Handles
* Remove HandleBuilder
* Remove creates_handles, and tracing_autograd_python_function flags
* Remove unused var_args
* Fix submodules
2018-05-21 10:06:29 -07:00
Adam Paszke
b45f2ff1ae
Remove CompiledFunction + clean up JIT tests (#7421) 2018-05-16 20:03:04 +02:00
Jorghi12
cd86d4c554
PyTorch AMD Build Scripts (#6625)
* PyTorch AMD Build Script.

* Python invocation for hipify

* Adding individual hip fles.

* Updating CWD

Use the actual path for the file instead of the current working directory, which depends on where the script is invoked.

* Updating folder path for amd_build

* Removing previous amd_build directory

* Updated setup.py to support WITH_ROCM

* Renaming the files for CuDNN BatchNorm & Conv since having two .cpp files with the same name results in a linking error in the HCC compiler used for ROCm/AMD.

* Removing old BatchNorm & Conv files since they've been renamed.

* Updating build path to handle ROCM

* Cleaned up the build path and created a FindHIP cmake file for setting up relevant hip paths.

* Seperated the individual patch files to make it easier to detect issues while building.

* Removed CMakeLists hip files and fixed directory structure

* Adding build pytorch amd script

* Merged setup patch into PyTorch setup.py & cleaned a few issues

* Added information on where to download the hipify-python script.

* Resolved linting issues inside of build_pytorch_amd.py

* Removing many unnecessary patch files. Removing unnecessary .hip files. Fixing up the build process.

* Refactored the PR for supporting HIP

* Minimizing the number of changes inside individual patches.

* Cleaned up patch files.

* Removed patch files.

* Updating patches

* Removing HIP change from file.

* Cleaned up patches

* Added AVX/SSE avoidance due to bug with ROCms stack. Just temporary for now.

* Removing the other HIP file

* Removed patch file + merged ROCm into Aten/test

* Removed ATen tests patch file and updated disbale_features yaml to remove headers that don't exist on the HIP stack.

* Reduced the number of patches down to 14 after Edward's suggestions.

* Transferred deletion of certain functions from patch to yaml file.

* Set default Thrust path

* Fixed aten files so we now use the templated pow/abs instead of std:: directly.

* Removed error from aten/src/THCUNN/Abs.cu

* Updated the locations of the cmake build files. Moved THCTensorRandom from a hip to a patch file. Added executable/library commands that can successfully handle either CUDA or HIP.

* Removed hip extraction from the build script and removed the old hip file.

* Replaced MACRO with function in upper level cmake.

* Added empty ELSE() block to prevent the loading of a command without CUDA or HIP. Also added IF guards around torch_cuda_based_add_executable in Aten tests.

* Updated aten tests.

* Removed the hip include from the ATen header.

* Can't throw exceptions on C++ AMP, using abort

* Missing IF guards for cuda/hip executables in aten tests.

* Removed a series of patch files.

* Added template keyword to help out the HCC compiler.

* Rebased the specific files displayed in the PR

* Fixing typo.

* Change flag from "WITH_CUDA" to "NOT NO_CUDA"

Replacing "WITH_CUDA" with "NOT NO_CUDA" after the rebase.

* Fix LoadHIP path

* Updating build files after rebasing.

* Reorganization after cpu/gpu separation.

* Removed HIPCC from setup.py & removed -shared extra linking args.

* Updated CMake / Setup build to correctly link when under ROCm stack.

* Removed the unnecessary argument from Extension constructor.

* Adding another test to be included with ROCm building.

* Updated the setup_helpers scripts in order to get around linter error

* Fix syntax issue

* Solving lint issue: line too long
2018-05-15 18:38:01 -07:00
Zachary DeVito
ce69d3110b
Improve script builtin checking using schema (#7311)
Improve script builtin checking using schema

* This add aten_schema.h which provides a barebones amount of type and
  argument information about each builtin operator
* emitBuiltinCall is updated to use this information rather than
  aten_dispatch to ensure the operator is correct.
* handling of keyword and position arguments now matches python behavior
* There is no longer a requirement that kwargs be constant or that the
  attributes of an op must be entirely constant or non-constant
* compiler now constructs a non-attributed version of the op first and
  then turns it into the constant-attribute version if all attributes
  are constants.
* default arguments for builtins now work
* SugaredValue::call and similar functions now have SourceRange information
  for their arguments so that error reporting is more accurate

Notes:
* This does not try to merge the builtin checking with python arg parser.
  Given that we will eventually have C10 schema which will replace aten_schema,
  we will eventually have a C++ description of the schema and working of that
  description directly will be the easiest form to understand.
* python function calls and script method calls do not support keyword arguments yet.
  When we add this support we should refactor the handling in tryEmitSchema
  that resolves keywords into a common function.

* default arguments work
* keyword arguments to builtins work (still need to extend to calling python and other script methods)
* much better error reporting for incorrect builtins

Lift any constants to attributes on nodes when possible

* Schema  is usable internally in the compiler as
  the function signatures of script functions as well as for builtin
  operators.
* Adds a List[T] class to better represent the arguments to cat/stack
  as a type rather than with custom checking.
* Support kwargs for calls of script methods

A future commit will be needed to add support for:
* calls to script _functions_ which are currently are GraphExecutors without schema info.
* kwargs to python functions, which will require refactoring python op
2018-05-14 14:46:36 -07:00
Zachary DeVito
93eb50c103
Mark expand nodes as implicit/explicit in trace (#7303)
When tracing we record expand nodes. This is useful in some cases because
it makes it clear a broadcast happened. However, in future runs
the broadcast may be different or not needed. This change adds an
attribute to expand to track if it was implicitly added. This
takes the form of an unused input to expand with a default value.

The execution engine then removes implicit expands before execution.
Note that shape_analysis will re-add expands when it can prove by
shape analysis that they will exist and this is useful for the fuser,
so this change should not affect fusion passes.
2018-05-10 10:47:43 -07:00
Edward Z. Yang
64834f6fb8
Split libATen.so into libATen_cpu.so and libATen_cuda.so (#7275)
* Split libATen.so into libATen_cpu.so and libATen_cuda.so

Previously, ATen could be built with either CPU-only support, or
CPU/CUDA support, but only via a compile-time flag, requiring
two separate builds.  This means that if you have a program which
indirectly uses a CPU-only build of ATen, and a CPU/CUDA-build of
ATen, you're gonna have a bad time.  And you might want a CPU-only
build of ATen, because it is 15M (versus the 300M of a CUDA build).

This commit splits libATen.so into two libraries, CPU/CUDA, so
that it's not necessary to do a full rebuild to get CPU-only
support; instead, if you link against libATen_cpu.so only, you
are CPU-only; if you additionally link/dlopen libATen_cuda.so,
this enables CUDA support.  This brings ATen's dynamic library
structure more similar to Caffe2's.  libATen.so is no more
(this is BC BREAKING)

The general principle for how this works is that we introduce
a *hooks* interface, which introduces a dynamic dispatch indirection
between a call site and implementation site of CUDA functionality,
mediated by a static initialization registry.  This means that we can continue
to, for example, lazily initialize CUDA from Context (a core, CPU class) without
having a direct dependency on the CUDA bits.  Instead, we look up
in the registry if, e.g., CUDA hooks have been loaded (this loading
process happens at static initialization time), and if they
have been we dynamic dispatch to this class.  We similarly use
the hooks interface to handle Variable registration.

We introduce a new invariant: if the backend of a type has not
been initialized (e.g., it's library has not been dlopened; for
CUDA, this also includes CUDA initialization), then the Type
pointers in the context registry are NULL.  If you access the
registry directly you must maintain this invariant.

There are a few potholes along the way.  I document them here:

- Previously, PyTorch maintained a separate registry for variable
  types, because no provision for them was made in the Context's
  type_registry.  Now that we have the hooks mechanism, we can easily
  have PyTorch register variables in the main registry.  The code
  has been refactored accordingly.

- There is a subtle ordering issue between Variable and CUDA.
  We permit libATen_cuda.so and PyTorch to be loaded in either
  order (in practice, CUDA is always loaded "after" PyTorch, because
  it is lazily initialized.)  This means that, when CUDA types are
  loaded, we must subsequently also initialize their Variable equivalents.
  Appropriate hooks were added to VariableHooks to make this possible;
  similarly, getVariableHooks() is not referentially transparent, and
  will change behavior after Variables are loaded.  (This is different
  to CUDAHooks, which is "burned in" after you try to initialize CUDA.)

- The cmake is adjusted to separate dependencies into either CPU
  or CUDA dependencies.  The generator scripts are adjusted to either
  generate a file as a CUDA (cuda_file_manager) or CPU file (file_manager).

- I changed all native functions which were CUDA-only (the cudnn functions)
  to have dispatches for CUDA only (making it permissible to not specify
  all dispatch options.)  This uncovered a bug in how we were handling
  native functions which dispatch on a Type argument; I introduced a new
  self_ty keyword to handle this case.  I'm not 100% happy about it
  but it fixed my problem.

  This also exposed the fact that set_history incompletely handles
  heterogenous return tuples combining Tensor and TensorList.  I
  swapped this codegen to use flatten() (at the possible cost of
  a slight perf regression, since we're allocating another vector now
  in this code path).

- thc_state is no longer a public member of Context; use getTHCState() instead

- This PR comes with Registry from Caffe2, for handling static initialization.
  I needed to make a bunch of fixes to Registry to make it more portable

  - No more ##__VA_ARGS__ token pasting; instead, it is mandatory to pass at
    least one argument to the var-args. CUDAHooks and VariableHooks pass a nullary
    struct CUDAHooksArgs/VariableHooksArgs to solve the problem. We must get rid of
    token pasting because it does not work with MSVC.

  - It seems MSVC is not willing to generate code for constructors of template
    classes at use sites which cross DLL boundaries. So we explicitly instantiate
    the class to get around the problem. This involved tweaks to the boilerplate
    generating macros, and also required us to shuffle around namespaces a bit,
    because you can't specialize a template unless you are in the same namespace as
    the template.
  - Insertion of AT_API to appropriate places where the registry must be exported

- We have a general problem which is that on recent Ubuntu distributions,
  --as-needed is enabled for shared libraries, which is (cc @apaszke who was
  worrying about this in #7160 see also #7160 (comment)). For now, I've hacked
  this up in the PR to pass -Wl,--no-as-needed to all of the spots necessary to
  make CI work, but a more sustainable solution is to attempt to dlopen
  libATen_cuda.so when CUDA functionality is requested.

    - The JIT tests somehow manage to try to touch CUDA without loading libATen_cuda.so. So
      we pass -Wl,--no-as-needed when linking libATen_cuda.so to _C.so

- There is a very subtle linking issue with lapack, which is solved by making sure libATen_cuda.so links against LAPACK. There's a comment in aten/src/ATen/CMakeLists.txt about htis as well as a follow up bug at #7353

- autogradpp used AT_CUDA_ENABLED directly. We've expunged these uses and added
  a few more things to CUDAHooks (getNumGPUs)

- Added manualSeedAll to Generator so that we can invoke it polymorphically (it
  only does something different for CUDAGenerator)

- There's a new cuda/CUDAConfig.h header for CUDA-only ifdef macros (AT_CUDNN_ENABLED, most prominently)

- CUDAHooks/VariableHooks structs live in at namespace because Registry's
  namespace support is not good enough to handle it otherwise (see Registry
  changes above)

- There's some modest moving around of native functions in ReduceOps and
  UnaryOps to get the CUDA-only function implementations into separate files, so
  they are only compiled into libATen_cuda.so. sspaddmm needed a separate CUDA
  function due to object linkage boundaries.

- Some direct uses of native functions in CUDA code has to go away, since these
  functions are not exported, so you have to go through the dispatcher
  (at::native::empty_like to at::empty_like)

- Code in THC/THCS/THCUNN now properly use THC_API macro instead of TH_API
  (which matters now that TH and THC are not in the same library)

- Added code debt in torch/_thnn/utils.py and other THNN parsing code to handle
  both TH_API and THC_API

- TensorUtils.h is now properly exported with AT_API

- Dead uses of TH_EXPORTS and co expunged; we now use ATen_cpu_exports and
  ATen_cuda_exports (new, in ATenCUDAGeneral.h) consistently

- Fix some incorrect type annotations on _cudnn_rnn_backward, where we didn't
  declare a type as possibly undefined when we should have. We didn't catch this
  previously because optional annotations are not tested on "pass-through" native
  ATen ops (which don't have dispatch). Upstream issue at #7316

- There's a new cmake macro aten_compile_options for applying all of our
  per-target compile time options. We use this on the cpu and cuda libraries.

- test/test_cpp_extensions.py can be run directly by invoking in Python,
  assuming you've setup your PYTHONPATH setup correctly

- type_from_string does some new funny business to only query for all valid CUDA
  types (which causes CUDA initialization) when we see "torch.cuda." in the
  requested string

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Last mile libtorch fixes

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* pedantic fix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-10 10:28:33 -07:00
Peter Goldsborough
54a4867675
Bring back C++ extension torch.h (#7310)
* Bring back C++ extension torch.h

* Fix python.h include in python_tensor.cpp
2018-05-05 14:06:27 -07:00
Peter Goldsborough
feb64b5291
Add -Wno-unknown-pragmas (#7291) 2018-05-04 13:44:13 -07:00
Peter Goldsborough
67d0d14908
Rename autograd namespace to torch and change torch.h into python.h (#7267)
* Rename autograd namespace to torch and change torch.h into python.h

* Include torch.h instead of python.h in test/cpp/api

* Change some mentions of torch.h to python.h in C++ extensions

* Set paths directly, without find_path
2018-05-04 08:04:57 -07:00
Soumith Chintala
92f54e1f01
remove static libstdc++ linking and PYTORCH_BINARY_BUILD env variable (#7259) 2018-05-03 12:32:57 -07:00
Luca Antiga
5d3c3c53aa
Add raw IR serialization/deserialization (#6392) 2018-05-01 20:21:29 +02:00
Luca Antiga
0703357723 Don't build THD/master_worker if not explicitly requested (#7081) 2018-04-29 13:17:09 -04:00
James Reed
4667983f0f
Fixes for interpreter and ONNX export for translation (#7044)
Fixes for interpreter and ONNX export for translation

Address comments
2018-04-27 22:23:57 -07:00
Peter Goldsborough
7b09bc72a5
[WIP] Enable WERROR in tests (#6539)
* Enable WERROR in tests

* Also set WERROR=1 for cpp_build in CI

* Enable Werror after the compiler checks

* Remove -DWERROR because its picked up from the env var

* Had to fix some errors in aten/contrib/data

* Allow an uninitialized variable in ReduceOpsKernel.cpp

* Use CUDNN_DATA_UINT8 in cuDNN type string conversion

* Fixes and use target_compile_options

* Fix uninitialized variables in THNN

* Include Python.h earlier in tensor_types.cpp

* Use CUDNN_VERSION 7100 instead of 7000?

* More Python.h includes

* Make switch case in common_subexpression_elimination.cpp exhaustive

* Build with WERROR=0 just to see all the warnings

* Remove some Python includes

* Enable WERROR=1 again

* Bring back switch case default
2018-04-28 01:51:16 +01:00
Soumith Chintala
bd14d8e8f8
add additional caffe/caffe2 paths to exclude list in pytorch setup.py (#6891) 2018-04-25 22:10:38 -04:00
Orion Reblitz-Richardson
dec5e99e99 [aten] Move submodules to third_party (#6866)
* [aten] Move submodules to third_party

* [aten] Update aten_mirror.sh script for third_party

* [aten] Move ATen submodules def to root and rename

* [aten] Update cpuinfo cmake build

* [aten] Fix cpuinfo cmake build

* Update third_party/cpuinfo to d03d5d296063063c66877fb559cf34469734e3e1

* [aten] Fix JIT test reference to catch
2018-04-24 23:33:46 -04:00
gchanan
1c7b0c1020 Update version string to 0.5. (#6795) 2018-04-22 13:57:48 -04:00
bddppq
c43c911662
Export onnx protobuf bindings to python (#6651)
* Export onnx protobuf bindings to python

* rename native onnx module to _onnx
2018-04-17 16:38:57 -07:00
srib
53d2612b55 Fix a typo in the setup.py script (#6632) 2018-04-16 15:29:45 -04:00
Zachary DeVito
825ce7f196
[jit][script] Allow tuples to be re-assigned (#6538)
* Allow tuples to be re-assigned

This commit improves our support of tuples by making them more first-class.
In particular, it allows tuples to be re-assigned across loops and ifs.
It does this by making them first-class values in the Graph IR, and then
removing the tuples in a LowerTuples pass.

An alternative approach would have added more support for desugaring tuples
in the Environment object as they were emitted. Instead,
the current approach was chosen anticipating a future when tuples are
fully supported (including the interpreter). In that future, the current
code can be completly reused with the LowerTuples pass just becoming
a optimization that removes unneeded tuple allocations.
2018-04-13 17:34:50 -07:00
Peter Goldsborough
e4f1d3b538
Better warnings (#6428)
* Better warnings

* Remove -Wc++14-extensions because gcc does not know it

* Warning fix in input_buffer.cpp

* Remove pedantic for torch/csrc/

* Also use Wextra and Wall for ATen

* Use check_env_flag

* Undo changes in shape_analysis.cpp

* Remove C linkage flag
2018-04-10 23:34:25 -07:00
peterjc123
5651695a99 Fixes #6386, Use copies instead of symbolic files (#6396)
* Use copies instead of symbolic files

* bug fix

* Remove useless item
2018-04-09 13:54:10 -04:00
Soumith Chintala
108f5c197f
[pytorch] add static linkage support for CuDNN and NCCL (#6410)
* when linking static CUDA libs, additional dep on culibos.a

* add USE_STATIC_NCCL option

* add USE_STATIC_CUDNN option

* remove libATen soversion

* add caffe, caffe2 folders to setup.py exclude list
2018-04-08 22:54:18 -04:00
Ben
119ea39021 add cuda headers (#6401) 2018-04-08 10:50:20 -04:00
gchanan
87e369111a
Add string-style devices to all tensors. (#6283)
* Add string-style devices to all tensors.

Previously, tensors only had a 'get_device' method which would throw an exception on a CPU tensor.   This made it necessary to if/else code that
was meant to be device agnostic.

This PR implements the following:
1) Adds a 'device' property to all tensors that returns a string representation of the device for all tensors.
For cpu tensors this is 'cpu'.  For cuda tensors this is 'cuda:X', where X is the cuda device ordinal.

2) Adds a DeviceSpec class.  This is just a helper class for separating device_type and device_index specification and to allow partial specification.
For example, you can call DeviceSpec('cuda'), DeviceSpec('cuda:0'), DeviceSpec('cuda', 1).
Also has backwards compatibility support for specifying integers, which are treated as cuda devices.

DeviceSpecs have the following properties:
a) device_type: string representation of the device type (i.e. 'cpu' or 'cuda')
b) device_index: integer for the device index (None if not specified)
c) cuda_device_index: for backwards compatibility; behaves roughly like `get_device` did previously.  I.e. if a function previously took integers for cuda devices,
it can now take DeviceSpecs (or strings), and can maintain the old functionality by calling `old_index = DeviceSpec(old).cuda_device_index`.

3) tensor methods and torch. functions that took integer devices can now take integers, strings, or DeviceSpecs.  For example:
torch.randn((2,3), dtype=torch.cuda.float32, device='cuda:1')

TODO in future PRs:
A) Split out cuda from dtype so you don't need to overspecify cuda-ness
B) We currently only support strings/DeviceSpecs in tensor methods and torch. functions.  We should have equivalents torch.cuda.device(...), torch.cuda.device_of, etc.
at the torch. level that work on strings/DeviceSpecs

* Add deviceInt64 to python arg parser.

* device_str.

* Remove device_str.

* remove device prefix from attributes.

* Use const char * instead of string.

* Move autogpu index out of Device.

* comment on is_default.

* Rename torch.DeviceSpec to torch.device.

* comment.

* Fix tests.

* Fix flake8.

* Fix sparse_coo_tensor parameter name.

* Improve error message.

* Remove device_ prefix from C++ device object.

* Allocate static strings.

* Return not implemented from rich compare.

* Move torch::Device to THPDevice.

* Remove cuda index.

* Py_RETURN_NOTIMPLEMENTED doesn't exist in python2.
2018-04-06 15:12:05 -04:00
Sam Gross
6b3a4637d6
Make the tensor type torch.Tensor instead of torch.autograd.Variable (#5785)
This changes type(tensor) to return `torch.Tensor` instead of
`torch.autograd.Variable`.

This requires a few implementation changes:

 - torch.Tensor is now a regular Python class instead of a
   pseudo-factory like torch.FloatTensor/torch.DoubleTensor
 - torch.autograd.Variable is just a shell with a __new__ function.
   Since no instanes are constructed it doesn't have any methods.
 - Adds torch.get_default_dtype() since torch.Tensor.dtype returns
   <attribute 'dtype' of 'torch._C._TensorBase' objects>
2018-04-03 16:29:25 -04:00
gchanan
4c81282c33
Introduce torch.layout and split layout from dtypes. (#6145)
* Introduce torch.layout and split layout from dtypes.

Tensors (and tensor types) now have a 'layout' attribute that returns either 'torch.strided' or 'torch.sparse_coo'.

Previously, dtypes were 1-to-1 with ATen types/PyTensorTypes; the impetus behind this decision was to make things easy in the common case
(i.e. specifying a type in a factory function).  But this doesn't really follow for sparity, which isn't a common case.

It also doesn't properly represent the concept or a dtype, which in numpy are proper scalar types (i.e. roughly the type returned from indexing the
last dimension of an n-d array).  But this should be the same whether or not the tensor is represented via strides, sparsity, etc.

This is accomplished by:
1) having the dtype of tensor return the (device-type, scalar-type) combination, i.e. torch.cuda.float32, so both
   torch.cuda.FloatTensor and torch.cuda.sparse.FloatTensor have the same dtype
2) Adding a layout parameter to python functions, where the combination of (dtype, layout) maps to an ATen type that is used for dispatch.

* Formatting, make init throw python_error.

* Fix cuda not enabled error message.

* Fix test.
2018-04-02 14:07:50 -04:00
peterjc123
63af898d46 Fix extension test on Windows (#5548)
* Change cpp_extensions.py to make it work on Windows

* Fix linting

* Show python paths

* Debug

* Debug 1

* set PYTHONPATH

* Add ATen into library

* expose essential libs and functions, and copy _C.lib

* Specify dir in header

* Update check_abi for MSVC

* Activate cl environment to compile cpp extensions

* change version string

* Redirect stderr to stdout

* Add monkey patch for windows

* Remove unnecessary self

* Fix various issues

* Append necessary flags

* add /MD flag to cuda

* Install ninja

* Use THP_API instead of THP_CLASS

* Beautify the paths

* Revert "Use THP_API instead of THP_CLASS"

This reverts commit dd7e74c44db48e4c5f85bb8e3c698ff9de71ba2d.

* Use THP_API instead of THP_CLASS(new)
2018-04-02 13:53:25 -04:00
Richard Zou
7355f5cd8d Tell source users about TORCH_CUDA_ARCH_LIST (#6185)
Put it into the comments about env vars in setup.py.
Also put in a line in the README about where to find this info.
2018-04-02 13:35:14 -04:00
Ma Mingfei
f8270c0225 Enable MKLDNN convolution forward and backward (#6062)
* Enable MKLDNN convolution forward and backward

* minor change

* fix mkldnn build error when building ATen standalone
2018-03-29 15:25:07 -07:00
Simeon Monov
a90aa5d818 Fix small typo in setup.py (#6091)
Fixed small typo in setup.py
2018-03-28 16:51:08 -07:00
Edward Z. Yang
eb18a2f26c
Reorganize third-party libraries into top-level third_party directory (#6025)
- gloo, pybind11, nanopb and nccl now live in third_party.
- ATen builds in aten/build rather than torch/lib/build/aten
- A bit of faffing about in the scripts was necessary, because they used to assume that everything lived in the same directory. Now you are expected to cd into the correct directory before calling one of the build functions. The actual builder script lives in tools
- Lint now just unconditionally ignores third_party, rather than enumerating folders explicitly
2018-03-27 22:09:20 -04:00
peterjc123
1ab248d09e Fixes #5973: Stop printing verbose warnings for MSVC (#6001)
* Stop printing verbose warnings

* Add missing options

* Fix for misspelling
2018-03-26 09:40:30 -04:00
Simeon Monov
c4ee2b7067 Moved torch headers copy to build_deps (#5772)
* Moved torch headers copy to build_deps

PR #5706 initially moved headers under build_ext to fix bdist_wheel and
build develop. This broke install and #5755 moved them back to install
which broke bdist_wheel and build develop. Looks like build_ext is called
from install after it already tried to copy the headers to the python install
dir and the headers were not installed correctly. Using build_deps works
correct with all setup.py install, bdist_wheel and build develop.

* Comment about the auto-generated files

Added comment that the current solution will not include auto-generated
files which may be a problem if somebody needs to use them
2018-03-23 11:34:27 -04:00
Jon Malmaud
add04c56bf Verify that 'catch' submodule has been checked out before attempting build. (#5941) 2018-03-22 11:28:04 -04:00
gchanan
c474136ee1
[REDO] Add torch.sparse_coo_tensor factory. (#5781)
* Add torch.sparse_coo_tensor factory.

Notes:
1) I didn't add Tensor.new_sparse_coo_tensor; it didn't seem particularly useful, but it's easy to add
2) This doesn't do the type inference, i.e. torch.sparse_coo_tensor(indices=LongTensor, values=IntTensor)
will return a sparse tensor corresponding to the default type rather than a sparse IntTensor.  We can add
type inference later when we add it to other factories.

* Fix merge.

* Use type_conversion function from python_variable_methods.
2018-03-16 13:58:02 -04:00
cpuhrsch
5fa3aac610 ATen ReduceOps (#5776)
#5481 was reverted due to a strange test bug. This PR attempts to fix that.

This diff adds vectorization to ATen. It uses intel intrinsics to build a general vec256 class, that represents types of 256bit width. These can then be treated like regular variables. Using those it implements torch.sum() for the contiguous case. It uses Intel TBB for multithreading, which allows workstealing and chunks the reduction operations based on a experimentally chosen value (_THRESHOLD). It uses cpuinfo to pick the right code depending on the host's capabilities.

The kernels are implemented under native/cpu. Each .cpp file is compiled with -avx, -avx2 and no additional flags. A macro is used to append AVX, AVX2 or NONE to the function name. The header then needs to define the functions three times, one for each capability. This could be improved by either changing the cmake file a bit or possibly generating source code using a Python script etc.

For the non-contiguous case this defaults to the current implementation within TH. For CUDA is entirely defaults to the implementation within THC.

There probably needs to be a bit of a debate around the design decisions here, the additional dependencies, parallelization strategy, clarity, etc. The numerical results also diverge from numpy with larger tensors, which is expected since we're summing, for example, 8 numbers and then adding the result to the running sum, instead of each number one by one. But there might be something to be said about accumulating into a double for floats or the degree of divergence, the behavior with respect to CUDA, etc.

I wrote a [small Python script]( https://github.com/cpuhrsch/benchmark/blob/sumall/benchmarks/sum_bench.py) to compare the results with numpy numerically as well as on timing. I ran this script to create timings both on master and this branch.

Here is the command for 1 core
`OMP_NUM_THREAD=1 taskset -c 0 python sum_bench.py --enable_numpy 200`

Here is the command for all cores
`python sum_bench.py --enable_numpy 200`

Here are the results of each:

[Master, 1 core](https://paste.fedoraproject.org/paste/Nho9JzHpPVK9av8a6mByjQ)

[This branch, 1 core](https://paste.fedoraproject.org/paste/6xLHkYvcVJx9z~5MoHxN4w)

[Master, all cores](https://paste.fedoraproject.org/paste/5l3V1d5zGqvJcMXIUteMRw)

[This branch, all cores](https://paste.fedoraproject.org/paste/J4RuDU-0Drz0aZwtphQwEA)

To test the command is
`python sum_bench.py --test 200`

[This branch, test results](https://paste.fedoraproject.org/paste/kTEoUC~oWgXA6XWMAfNfNw)

For this test we look at the average absolute value of the differences. This does not take into account the relative magnitude of the numbers. The numbers are sampled from a standard normal distribution. 

In terms of performance this diff should bring PyTorch on par with Numpy and usually exceed it by 1.5 to 2x.
2018-03-15 12:09:28 -04:00
peterjc123
abd6f82709 Fix debug build failure on Windows (#5771) 2018-03-15 11:42:44 -04:00
Edward Z. Yang
cadeb0cb17
Revert "ATen ReduceOps (#5481)" (#5765)
* Revert "ATen ReduceOps (#5481)"

This reverts commit 310c3735b9.

* Revert "Check that new cpuinfo and tbb submodules exist (#5714)"

This reverts commit 1a23c9901d.
2018-03-13 23:50:16 -04:00
Peter Goldsborough
bab0f8484b Put torch header install back into the install command (#5755) 2018-03-13 19:23:02 -04:00
Sam Gross
1a23c9901d
Check that new cpuinfo and tbb submodules exist (#5714) 2018-03-12 15:44:10 -04:00
Zachary DeVito
41285edbb6 [jit] add a compiled script module (#5630)
Add script::Module C++ class to represent script modules
switch AST -> IR conversion to work on Modules/Methods rather than raw graphs
function-only AST -> IR conversion is just a simplified case where there is
only one module with a single method and no parameters.
introduce SugaredValue in compiler.h to represent values in scope in a script
function that are not first-class and that get desugared. This is used to
represent the module's self parameter, as well as python function calls,
and method calls on tensor
provide a Python ScriptModule that provides a nice API on top of script::Module
allowing for the definition of script modules with methods, parameters,
and submodules
Not in this PR but intended for the future:

ScriptModule actually subclasses nn.Module, with most methods implemented
Unification of tracedmodule and script module functionality into one container class.

Detailed changelog:

* Switch compiler over to using Module, but don't
use them yet.

* Remove intermediate attribute encoding in compiler

* Create SugaredValue object to handle resolution
of compiled module.

* switch to_ir to modules, implement Select

* hacky python wrappers

* Private ScriptModule

* Add `define` to script module

* Attributes use TK_LIST_LITERAL

this anticipates adding a real list literal expression to the language.

* Add a metaclass to make sure script stubs are registered

* Add a test

* Doc createResolutionCallback

* Docs and minor editing

* Address PR comments

* Document

* Fix unicode issue
2018-03-12 09:52:40 -04:00
Simeon Monov
dede63689f Moved headers files copy for C++ extensions to build_ext in setup.py (#5706)
The header files needed for the C++ extensions were copied to
torch/lib/include under install. In case of bdist_wheel or build develop
for example, the files are not copied and cpp_extensions test is failing:

```
Running test_cpp_extensions.py ...
running install
running build
running build_ext
/home/moni/src/ibm/AI/pytorch/torch/utils/cpp_extension.py:79: UserWarning:
Your compiler (g++) may be ABI-incompatible with PyTorch.
Please use a compiler that is ABI-compatible with GCC 4.9 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.
  warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
building 'torch_test_cpp_extension' extension
creating build
creating build/temp.linux-x86_64-3.6
gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/moni/src/ibm/AI/pytorch/torch/lib/include -I/home/moni/src/ibm/AI/pytorch/torch/lib/include/TH -I/home/moni/src/ibm/AI/pytorch/torch/lib/include/THC -I/home/moni/miniconda3/envs/pytorch/include/python3.6m -c extension.cpp -o build/temp.linux-x86_64-3.6/extension.o -g -DTORCH_EXTENSION_NAME=torch_test_cpp_extension -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
extension.cpp:1:25: fatal error: torch/torch.h: No such file or directory
 #include <torch/torch.h>
                         ^
compilation terminated.
error: command 'gcc' failed with exit status 1
```
2018-03-12 14:07:45 +01:00
Richard Zou
03f2ad9029 Add check for python build deps to setup.py (#5618)
* Add check for python build deps to setup.py

* Address comments

* Remove install_requires line
2018-03-09 23:49:18 -05:00
Peter Goldsborough
7391dae709 Fix Variable conversion on the way to/from Python (#5581)
* PyObject* <--> at::Tensor no longer unwraps variables, instead we expect end uses to always work with variable types, and we will only unwrap the variables when we optimize.
* Add torch::CPU, torch::CUDA and torch::getType
* at::CPU -> torch::CPU in extensions
2018-03-09 14:31:05 -08:00
Sam Gross
5dedc648bb Compile DataLoader.cpp separately (#5507)
Don't #include DataLoader.cpp in Module.cpp
2018-03-02 05:54:33 -05:00
Peter Goldsborough
b10fcca5f0 Install cuda headers in ATen build (#5474) 2018-02-28 19:36:41 -08:00
peterjc123
377d896969 better solution for the linking error related to lazy_init for MSVC (#5375)
* Revert "Fix wrong argument name (#5366)"

This reverts commit cc9d3b265d.

* Fix wrong argument naming

* Revert "Wrap torch::cuda::lazy_init with WITH_CUDA flag"

This reverts commit a8fa37f8fac5aef09eb7fe54d84de6126618c262.

* Revert "Solves the linking error related to lazy_init for MSVC"

This reverts commit 63913a102f274865a76e7c40ffdf6b40c277d5ff.

* better solution for the linking error related to lazy_init for MSVC

* Naming changes

* Namespace changes and further comment

* Rebasing onto current master

* Remove code that is useless

* Fix linting

* Remove rebasing bugs
2018-02-28 17:34:34 -05:00
Sam Gross
48a3349c29
Delete dead Tensor code paths (#5417)
This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp.

This also moves the THNN.cwrap/.cpp generation to generate_code which can use ninja if installed.
2018-02-27 17:58:09 -05:00
gchanan
d5038309a1
Remove WITH_SCALARS, as it's enabled by default now. (#5437) 2018-02-27 14:51:11 -05:00