pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Edward Yang	a08119afc2	Eliminate direct access to size/strides of THTensor; replace them with std::vector (#9561 ) Summary: * THTensor now stores `sizes_` and `strides_` which is a `std::vector<int64_t>` * Anywhere a "public" API function made use of an int64_t* of sizes, I opted to just finagle it out of the tensor using THTensor_getSizePtr rather than try to rewrite all of these sites to use ArrayRef. They should use ArrayRef eventually, but not yet. * There are new utility functions for resizing sizes/strides in one go (THTensor_resizeDim), or replacing sizes and strides with completely new values (THTensor_setSizesAndStrides) * Anywhere you said `t->size[n] = 0`, we now say `THTensor_setSizeAt(t, n, 0)`, ditto for strides * Anywhere you said `t->size[n]`, we now say `t->size(n)` (coming soon: ditto for strides) Previous review of just the `std::vector` change in #9518, but I'm planning to merge this all in one go. Note for gchanan: review from commit "ci" and after Pull Request resolved: https://github.com/pytorch/pytorch/pull/9561 Reviewed By: cpuhrsch Differential Revision: D8901926 Pulled By: ezyang fbshipit-source-id: 483cf275060ab0a13845cba1ece39dd127142510	2018-07-19 14:10:06 -07:00
Anders Papitto	4c615b1796	Introduce libtorch to setup.py build (#8792 ) Summary: Prior to this diff, there have been two ways of compiling the bulk of the torch codebase. There was no interaction between them - you had to pick one or the other. 1) with setup.py. This method - used the setuptools C extension functionality - worked on all platforms - did not build test_jit/test_api binaries - did not include the C++ api - always included python functionality - produced _C.so 2) with cpp_build. This method - used CMake - did not support Windows or ROCM - was capable of building the test binaries - included the C++ api - did not build the python functionality - produced libtorch.so This diff combines the two. 1) cpp_build/CMakeLists.txt has become torch/CMakeLists.txt. This build - is CMake-based - works on all platforms - builds the test binaries - includes the C++ api - does not include the python functionality - produces libtorch.so 2) the setup.py build - compiles the python functionality - calls into the CMake build to build libtorch.so - produces _C.so, which has a dependency on libtorch.so In terms of code changes, this mostly means extending the cmake build to support the full variety of environments and platforms. There are also a small number of changes related to the fact that there are now two shared objects - in particular, windows requires annotating some symbols with dllimport/dllexport, and doesn't allow exposing thread_local globals directly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/8792 Reviewed By: ezyang Differential Revision: D8764181 Pulled By: anderspapitto fbshipit-source-id: abec43834f739049da25f4583a0794b38eb0a94f	2018-07-18 14:59:33 -07:00
Chunli Fu	a487b08c2e	AutoBatching - IR transformation(basic operators) (#9198 ) Summary: Use decorator `torch.jit.batch` to implement auto-batching (call `to_batch` pass to do IR transformation). - `to_batch` pass: "to_batch.h/cpp" in csrc/jit/passes to transform a graph to a new batched graph. - Write several basic operators for BatchTensor (add, mul, sigmoid, tanh, mm, matmul, select). - Register the operators in a lookup table `<std::string, std::shared_ptr<Graph>>`. (use the Graph to replace the original node in IR graph) Move BatchTensor in python from torch.BatchTensor to torch.jit.BatchTensor Pull Request resolved: https://github.com/pytorch/pytorch/pull/9198 Reviewed By: zdevito Differential Revision: D8744466 Pulled By: ChunliF fbshipit-source-id: 9ea56a30f55cb870f13a2069a47cc635419763ff	2018-07-11 18:25:07 -07:00
Adam Paszke	b9f575fc33	Remove legacy code from the JIT (#9323 ) Summary: In particular, get rid of backward tracing and CppOp. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9323 Reviewed By: ezyang Differential Revision: D8795935 Pulled By: apaszke fbshipit-source-id: fb7a7eeee41902da35f2a8efd77262ca60fd6bbe	2018-07-11 10:25:38 -07:00
Zachary DeVito	efefd1d7cf	Unify aten_dispatch and aten_schema into a single operator abstraction with human-readable schema. (#8885 ) Summary: This is a series of two commits that should probably be read separately. They are stacked on top of #9018 since the second commit requires it for correctness. Commit 1 ======= This commit is the first in a series that will clean up how we handle declaring operators and intrinsics in the JIT to make it more modular and readable. This introduces readable declarations that can be used to register operators and switches gen_jit_dispatch to generate this schema. A follow up PR will remove the dispatch keys like "add-3" and resolve ops directly based on the registered schema, further simplifying the generation process. * Switches schema over to parsed declarations, in the future this will allow something like: ``` registry.register_intrinsic("foo(Tensor a, Tensor b) -> Tensor", [](Stack& stack) { ... }) ``` This will allow the scalable registration of intrinsics for lists, tuples, and other ops, as long as meta-data for these ops (e.g. derivatives and size propagation routines) is provided. The declarations resemble those used by PythonArgParser but have been significantly cleaned up to minimize the number of types that can appear in the declaration. We should strive to get the other parts of PyTorch switched over to this restricted declaration set when possible, but it is too much to do in a single PR. My hope is that eventually we will use a very similar language to describe declarations in C10, and this can serve as a guide for that. Parsing is done using the script lexer, so it is very robust to whitespace and extensible for future types. This removes the other way we encoded schema, and makes it easier to see what schema are registered. 
Current generated declarations: https://gist.github.com/zdevito/a96a17766fb3a098d69a91ee00abaaf6 * Switches how we handle attempting to use an integer in the place of a fixed-sized int list, such as in conv (e.g. 'int[3] stride=1'). Now that we can statically distinguish between int and Tensor, we handle the expansion as an implicit conversion in the compiler. This allows us to simplify the interpreter since it no longer needs to handle the conversion itself. * Schema declarations have been changed so that they match the type system in the IR exactly. In particular, attribute_info which was used by liftConstantAttributes has been dropped and constant attributes are lifted purely based on the type of the input. Type conversions in compiler have been simplified due to this change. * Error highlighting in ErrorReport now only reports at most 20 lines of code, to make reading where an error occurred easier. Commit 2 ======= This commit unifies aten_dispatch and aten_schema into a single Operator object that both contains schema and implementation information. In the future we can use this object to also contain functionality like shape prop and autodiff needed by all operators. Operators are registered globally, and dispatch logic uses the schema information to figure out which variant to use. Descriptor keys, a frequent source of inscrutable debug errors, have been removed. * Introduce Operator, to replace TensorOp. Unlike TensorOp, we use Operator for all op implementations, including primitives that may occur in the graphs. The only exceptions are ops that are only known to the interpreter like jumps, and GraphExecutors where we need to record additional debug info. * Adds a global registry for Operator implementations. aten_dispatch.cpp turns into register_aten_ops.cpp, which registers all the Operators for aten with the operator registry. register_prim_ops.cpp now contains the implementations for primitive operators that used to be in the interpreter. 
This means that it is now safe to use `getOperation(node)` to lookup the true interpreter function for the node, which will simplify const-propagation passes. * Remove addInterpreterOpHandler in favor of global operator registry. * Instead of descriptors, we match Node arguments directly against FunctionSchema describing expected inputs in `matchSchema`. `matchSchema` knows how to parse both attributes and positional inputs from a node and match it to the appropriate registered operator. Debug error messages when we try to run an invalid operator are significantly improved: they now automatically display the schemas for the ops with the same name that are registered. * Merge aten_schema into register_aten_ops. Each Operator takes a string schema which is parsed to determine when to dispatch to that op. * Cleans up gen_jit_dispatch.py now that we do not need to write out descriptors. In particular, skip_scalar_overloads can be removed since Richard's code sorts declarations to put Tensor, Tensor declarations first. * remove matchSchemaAndLiftConstantAttributes and use emitBuiltinCall instead to remove code duplication * refactor stack manipulation functions into a separate header file. Pull Request resolved: https://github.com/pytorch/pytorch/pull/8885 Reviewed By: jamesr66a Differential Revision: D8751048 Pulled By: zdevito fbshipit-source-id: 312aabfbf88307c5f6ab947b6caf691468b94557	2018-07-10 10:24:48 -07:00
Edward Yang	d0d1820814	Add weak pointer and finalizer support directly to THStorage. (#9148 ) Summary: The underlying use-case is the file descriptor to storage cache in torch.multiprocessing.reductions. Previously, this was implemented by wrapping an existing allocator with a "weak ref" allocator which also knew to null out the weak reference when the storage died. This is terribly oblique, and prevents us from refactoring the allocators to get rid of per-storage allocator state. So instead of going through this fiasco, we instead directly implement weak pointers and finalizers in THStorage. Weak pointers to THStorage retain the THStorage struct, but not the data_ptr. When all strong references die, data_ptr dies and the finalizers get invoked. There is one major hazard in this patch, which is what happens if you repeatedly call _weak_ref on a storage. For cleanliness, we no longer shove our grubby fingers into the finalizer struct to see if there is already a Python object for the weak reference and return it; we just create a new one (no one is checking these Python objects for identity). This means if you keep calling it, we'll keep piling on finalizers. That's bad! But I am not going to fix it until it is actually a problem for someone, because then we need to add another caching layer. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/9148 Differential Revision: D8729106 Pulled By: ezyang fbshipit-source-id: 69710ca3b7c7e05069090e1b263f8b6b9f1cf72f	2018-07-10 06:25:33 -07:00
Peter Goldsborough	4498fb962b	Add space around operator (#9294 ) Summary: Fixes lint failure on master Pull Request resolved: https://github.com/pytorch/pytorch/pull/9294 Differential Revision: D8779010 Pulled By: goldsborough fbshipit-source-id: da1ea2604189fd704c22fa8a5770bd92845cea91	2018-07-09 20:24:21 -07:00
Jesse Hellemn	99ab082366	Making setup.py install work for Caffe2 (#8509 ) Summary: Tested on my mac on a pretty clean anaconda3 Pull Request resolved: https://github.com/pytorch/pytorch/pull/8509 Reviewed By: orionr Differential Revision: D8702257 Pulled By: pjh5 fbshipit-source-id: eda03ef9732da9fc56b31d909af5c0e39520d689	2018-07-09 18:10:58 -07:00
Zachary DeVito	819815d9c0	Fix missing compile_commands.json for aten (#9227 ) Summary: When we moved the libaten build into libcaffe2, we changed the location where it generated compile_commands.json such that it was no longer being picked up by the build script. This fixes it so it is still found. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9227 Reviewed By: goldsborough Differential Revision: D8757984 Pulled By: zdevito fbshipit-source-id: 73df26bf08d98f18ac841d6c0db7e332fd328ab6	2018-07-08 16:54:34 -07:00
Francisco Massa	f6027bb15d	Install hpp headers for CPP Extensions (#9182 ) Summary: With the Cppzation of a few files in `TH`/`THC`, the CPP extensions got broken whenever the user uses features from `THC` in their files, when pytorch is installed via `python setup.py install`. This addresses issues such as ``` /home/me/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/include/THC/THCDeviceTensorUtils.cuh:5:25: fatal error: THCTensor.hpp: No such file or directory ``` Closes https://github.com/pytorch/pytorch/pull/9182 Reviewed By: soumith Differential Revision: D8734581 Pulled By: fmassa fbshipit-source-id: 2a1138f208592eaccb01fcdb805a6b369d7a497a	2018-07-05 07:55:25 -07:00
Roy Li	c61f0217a5	combine size_average and reduce args in loss functions (#8018 ) Summary: closes #7929 Closes https://github.com/pytorch/pytorch/pull/8018 Differential Revision: D8682540 Pulled By: li-roy fbshipit-source-id: 649170dd1a7f373151c1d4e949838bd1c5651936	2018-07-01 05:39:00 -07:00
Chunli Fu	67b21117b7	Add BatchTensor class (#8922 ) Summary: Add BatchTensor class - construct from data, mask, dims or construct from list of tensors - can return a list of tensors from a BatchTensor class next step: do IR level transformation and operators Closes https://github.com/pytorch/pytorch/pull/8922 Differential Revision: D8668986 Pulled By: ChunliF fbshipit-source-id: 8b24d2a9f46a3b42dbb397e99e9e059dfb2b326e	2018-06-29 15:57:27 -07:00
Zachary DeVito	f74207c99f	Allow autograd to work even when the shape of values cannot be determined (#8641 ) This commit implements the solution proposed in https://github.com/pytorch/pytorch/issues/8410 to work around the need to create zero tensors with the same shape as inputs. It introduces the concept of a LinearBlock which marks places in the code where we know if all the inputs to the node are zero, then the outputs to the node are also zero. Autodiff introduces LinearBlocks around backwards functions, which have this property. specializeUndef then propagates Undef nodes using this information. Notes: * Since we do not always specialize, we have a pass LowerLinearBlocks that replaces the block with an if statement that dynamically guards the Undef case. * We introduce AutogradAdd which is addition that still works when its inputs might be undefined. In cases where we specialize this will get removed in favor of a normal add, but there are cases where gradient graphs do not specialize (e.g. when they are not differentiable, but a derivative is required) so it is important for this op to be executable.	2018-06-25 18:40:04 -07:00
Orion Reblitz-Richardson	5a7b4840d9	Move nanopb-generated ONNX to unique file name (#8773 ) * Move nanopb-generated ONNX to unique file name * fix other places	2018-06-22 09:51:56 -04:00
Richard Zou	8489c4cc6e	Better support for literals in jit script (#8687 ) Addresses #8177 A design doc can be found here: [gist](https://gist.github.com/zou3519/4b7f13f03cc9f3612bd9363e6405fa0a) version or [quip](https://fb.quip.com/azL1AqUckBdo) version General approach: - Add NumberType, FloatType, IntType to represent Python numbers, floats and ints. - Emit these types for python literals - Change aten_schema such that Scalars are NumberType, int64_t and bool are IntType. - Emit aten::type_as, prim::NumToTensor, and prim::TensorToNum nodes for tensor-number math. (see examples below) - Erase NumberType, prim::NumToTensor, and prim::TensorToNum for ONNX export ### Tensor/number math ``` import torch @torch.jit.script def fn(x): return x + 1 ``` ``` graph(%x : Dynamic) { %1 : int = prim::Constant[value={1}]() %2 : Dynamic = prim::NumToTensor(%1) %3 : Dynamic = aten::type_as(%2, %x) %4 : Dynamic = aten::add[alpha={1}](%x, %4) return (%5); } ``` ### Number/Number Math ``` import torch @torch.jit.script def fn(zero): c = 1 + 1 return zero + c ``` ``` graph(%zero : Dynamic) { %1 : int = prim::Constant[value={1}]() %2 : int = prim::Constant[value={1}]() %3 : Dynamic = prim::num_to_tensor(%1) %4 : Dynamic = prim::num_to_tensor(%2) %5 : Dynamic = aten::add[alpha={1}](%3, %4) %c : int = prim::TensorToNum(%6) # this is the result of the addition ... return (%13); } ``` List of squashed commits: * Introduce Python Number types Added: IntType, FloatType, NumberType with IntType <: NumberType FloatType <: NumberType Changed aten_schema so arguments have corresponding types * Emit a NumberType for python literals. Also emit a NumberType for Scalar default values. * Add prim::NumToTensor and prim::TensorToNum * Add DynamicType -> NumberType implicit cast for bc * Better ensureTensor error message * Add ensureTensorOrNumber. Allow passing Number to some functions Like the range() construct and slices * Patch IntList to work. 
IntList is still a DynamicType in the frontend: a tensor gets built from a List[int]. Also, IntList[1] is a "union between int and IntList" the way it is implemented. If the frontend sees an int being passed for an IntList[1] arg, it converts it to a tensor as well. * Enforce some order on schemas to avoid overload ambiguity add(Tensor, Tensor) should appear earlier than add(Tensor, Scalar). This matches the order in which python_arg_parser parses its arguments. * Disable std_dim and var_dim tests. With the new schema information, std(input, keepdim) and std(input, dim) are ambiguous. This will need to be fixed at a later date. * Add NumberType erasure pass. This is used for ONNX export and to ensure that NumberType information doesn't reach the interpreter * Add support for mixed tensor/number math ops. * Tests for new functionality. Includes: - Tensor/number math - number/number math - EraseNumberTypes pass test * Patch tests Update expect tests for: - decompose_addmm - loop unrolling tests Because python numbers are now NumberType, they cannot be returned by functions anymore. Work around this by using "torch.full", or by adding a tensor([0]) (taken from FIXME_zerol()). Both approaches are used because torch.full is more readable, but it is broken in some cases. * Add erase_number_types to torch/CMakeLists.txt * Move math back to emitSimpleExpr from emitSugaredExpr * Remove some dead lines * Re-enable some excluded script/trace tests that are fixed. * Move some tests to expected failure * Address some comments (more addressing to come) * Erase relevant aten::type_as nodes in EraseNumberTypes I also changed it so that EraseNumberTypes is only called for ONNX export. It is no longer used to prevent prim::NumToTensor/prim::TensorToNum from reaching shape_analysis or interpreter.cpp. shape_analysis infers the type of the output of these nodes to be the same as their input. interpreter.cpp treats both of these nodes as no-ops. 
* Add reminder to fix std/var * Call EraseNumberTypes only when exporting a script module * Update expects after rebase	2018-06-21 15:43:38 -04:00
anderspapitto	48e90e3339	Build system changes (#8627 ) * All changes needed to get rid of process_github.sh * allow thnn_h_path	2018-06-20 17:45:26 -04:00
Teng Li	61c96811be	[c10d] NCCL python binding and CI test, with bug fixes (#8357 ) * [c10d] NCCL python binding and CI test, with bug fixes * Addressed comments and further bug fix * Made NCCL build optional, made C10D libc10d.a only * Fixed tests so that NCCL pg won't run when not needed * Addressed comments	2018-06-19 13:02:39 -07:00
cpuhrsch	05c473b85c	Temporarily remove TBB (#8255 )	2018-06-18 19:31:57 -04:00
Peter Goldsborough	372d1d6735	Create ATen tensors via TensorOptions (#7869 ) * Created TensorOptions Storing the type in TensorOptions to solve the Variable problem Created convenience creation functions for TensorOptions and added tests Converted zeros to TensorOptions Converted rand to TensorOptions Fix codegen for TensorOptions and multiple arguments Put TensorOptions convenience functions into torch namespace too All factory functions except _like support TensorOptions Integrated with recent JIT changes Support _like functions Fix in place modification Some cleanups and fixes Support sparse_coo_tensor Fix bug in Type.cpp Fix .empty calls in C++ API Fix bug in Type.cpp Trying to fix device placement Make AutoGPU CPU compatible Remove some auto_gpu.h uses Fixing some headers Fix some remaining CUDA/AutoGPU issues Fix some AutoGPU uses Fixes to dispatch_tensor_conversion Reset version of new variables to zero Implemented parsing device strings Random fixes to tests Self review cleanups flake8 Undo changes to variable.{h,cpp} because they fail on gcc7.2 Add [cuda] tag to tensor_options_cuda.cpp Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks Fix linker error in AutoGPU.cpp Fix bad merge conflict in native_functions.yaml Fixed caffe2/contrib/aten Fix new window functions added to TensorFactories.cpp * Removed torch::TensorOptions Added code to generate wrapper functions for factory methods Add implicit constructor from Backend to TensorOptions Remove Var() from C++ API and use torch:: functions Use torch:: functions more subtly in C++ API Make AutoGPU::set_device more exception safe Check status directly in DynamicCUDAHooksInterface Rename AutoGPU to DeviceGuard Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad remove python_default_init: self.type() Add back original factory functions, but with deprecation warnings Disable DeviceGuard for a couple functions in ATen Remove print 
statement Fix DeviceGuard construction from undefined tensor Fixing CUDA device compiler issues Moved as many methods as possible into header files Dont generate python functions for deprecated factories Remove merge conflict artefact Fix tensor_options_cuda.cpp Fix set_requires_grad not being checked Fix tensor_new.h TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac Fix bug in DeviceGuard.h Missing includes TEMPORARILY moving a few more methods into .cpp to see if it fixes windows Fixing linker errors * Fix up SummaryOps to use new factories Undo device agnostic behavior of DeviceGuard Use -1 instead of optional for default device index Also move DeviceGuard methods into header Fixes around device index after optional -> int32_t switch Fix use of DeviceGuard in new_with_tensor_copy Fix tensor_options.cpp * Fix Type::copy( * Remove test_non_float_params from ONNX tests * Set requires_grad=False in ONNX tests that use ints * Put layout/dtype/device on Tensor * Post merge fixes * Change behavior of DeviceGuard to match AutoGPU * Fix C++ API integration tests * Fix flip functions	2018-06-16 00:40:35 -07:00
Tongzhou Wang	c537fd7432	fix lint (#8567 )	2018-06-15 17:34:39 -04:00
Soumith Chintala	dc186cc9fe	Remove NO_* and WITH_* across codebase, except in setup.py (#8555 ) * remove legacy options from CMakeLists * codemod WITH_ to USE_ for WITH_CUDA, WITH_CUDNN, WITH_DISTRIBUTED, WITH_DISTRIBUTED_MW, WITH_GLOO_IBVERBS, WITH_NCCL, WITH_ROCM, WITH_NUMPY * cover SYSTEM_NCCL, MKLDNN, NNPACK, C10D, NINJA * removed NO_* variables and hotpatch them only in setup.py * fix lint	2018-06-15 12:29:48 -04:00
Orion Reblitz-Richardson	edd4e2c5d1	Expose proto utils and ONNX (#8073 ) * Expose proto utils and ONNX from PyTorch libcaffe2.so * Try to use protobuf from _C.so * Fix ONNX proto header include * Adjust order of imports for ONNX until nanopb goes away * Set and use ONNX_NAMESPACE for PyTorch builds * Show protobuf summary for all builds * Add ONNX_NAMESPACE for cpp_build * Statically link libprotobuf.a into libtorch.so * Set ONNX_NAMESPACE on Windows build * Move core/dispatch up as well * Add /MD flag for Windows build of _C * Potential Windows fix for ONNX and protobuf * Add direct linkage from _C to ONNX on Windows * Only include protobuf wrapper for PyTorch * Pass extra_compile_args to _nvrtc ext build * Remove installation of .a files	2018-06-13 10:25:32 -07:00
Jorghi12	81b92f7515	Get ROCm building again on master (#8343 ) Billing of changes: - New Jenkins script for building on rocm. For now it is a bit hacked together, but we can improve it once CI is running - New ROCM docker image for nightly HIP, and also some legacy packages that we need temporarily - New enabled config py2-clang3.8-rocmnightly-ubuntu16.04-build based off of the existing Caffe2 image (not built yet) - A big pile of cmake fixes, mostly to turn bits on/off when ROCM build is involved - Switch from hiprng to hcrng - Apply some patches directly in code, eliminating the patches - Use __hdiv instead of hdiv, it's more portable - THCNumerics<T>::gt doesn't work in HIP, so simulate it with sub - Add a few more overloads HIP needs - Turn off use of hcc to link (we plan to turn this back on to get tests running) - Search for hiprand, hiprng, hipblas, hipsparse - Better Python 2 portability	2018-06-12 23:05:21 -04:00
albanD	78e3259bbe	Add autograd automatic anomaly detection (#7677 ) * add autograd automatic anomaly detection * python 3 string support * Fix non python build * fix typo in doc * better test and naming fix * fix no python build and python object handling * fix missing checks * clean NO_PYTHON build * Remove unwanted changes	2018-06-11 21:26:17 -04:00
Pieter Noordhuis	695d40efc2	Create initial Python bindings for c10d (#8119 ) * Build and install c10d from tools/build_pytorch_libs.sh * Create initial Python bindings for c10d * clang-format * Switch link order to include more symbols * Add bindings and tests for ProcessGroupGloo * Add broadcast test * Separate build flag for c10d * Explicit PIC property * Skip c10d tests if not available * Remove c10d from Windows blacklist Let it skip by itself because it won't be available anyway. * Make lint happy * Comments * Move c10d module into torch.distributed * Close tempfile such that it is deleted	2018-06-08 12:59:51 -07:00
Soumith Chintala	5e372c7106	fix lint	2018-06-06 12:53:58 -04:00
Paul Jesse Hellemn	8e6f7a1382	[Caffe2] Merging setup.py with setup_caffe2.py (#8129 ) * Merging setup.pys, torch works, caffe2 works up to other KP * Fix to super call for python 2 * Works on python2 on mac * Consolidating Caffe2 flags	2018-06-06 08:31:31 -07:00
Adam Paszke	f45a3d5558	Add a loop unrolling pass to PyTorch JIT (#7672 )	2018-06-06 09:36:12 +02:00
Zachary DeVito	23dd033b51	Factor python dependency out of interpreter (#7970 ) * Factor python dependency out of interpreter * Remove NO_PYTHON for the autograd engine If there are no python bindings, then a default Engine is constructed the first time it is requested. If the python libraries are loaded, then they override the default accessor and the default engine becomes a python Engine. Note: it is possible for two engines to be generated if a non-python one gets created before the python bindings are loaded. This case is rare, and just results in additional threads being spawned. * Fixing AlexNet test which is skipped in CI	2018-06-01 16:07:21 -04:00
James Reed	1f94a6eab3	[JIT] Fission and fusion passes for addmm (#7938 ) * Addmm decomposition pass * Addmm peephole pass * Fix handling of output shape in fusion pass * Add DCE to the peephole passes * add comments * maybe bugfix? * Fix GPU tests * fix py2/3 test issue	2018-05-30 18:06:58 -04:00
Orion Reblitz-Richardson	4bf0202cac	[build] Have PyTorch depend on minimal libcaffe2.so instead of libATen.so (#7399 ) * Have PyTorch depend on minimal libcaffe2.so instead of libATen.so * Build ATen tests as a part of Caffe2 build * Hopefully cufft and nvcc fPIC fixes * Make ATen install components optional * Add tests back for ATen and fix TH build * Fixes for test_install.sh script * Fixes for cpp_build/build_all.sh * Fixes for aten/tools/run_tests.sh * Switch ATen cmake calls to USE_CUDA instead of NO_CUDA * Attempt at fix for aten/tools/run_tests.sh * Fix typo in last commit * Fix valgrind call after pushd * Be forgiving about USE_CUDA disable like PyTorch * More fixes on the install side * Link all libcaffe2 during test run * Make cuDNN optional for ATen right now * Potential fix for non-CUDA builds * Use NCCL_ROOT_DIR environment variable * Pass -fPIC through nvcc to base compiler/linker * Remove THCUNN.h requirement for libtorch gen * Add Mac test for -Wmaybe-uninitialized * Potential Windows and Mac fixes * Move MSVC target props to shared function * Disable cpp_build/libtorch tests on Mac * Disable sleef for Windows builds * Move protos under BUILD_CAFFE2 * Remove space from linker flags passed with -Wl * Remove ATen from Caffe2 dep libs since directly included * Potential Windows fixes * Preserve options while sleef builds * Force BUILD_SHARED_LIBS flag for Caffe2 builds * Set DYLD_LIBRARY_PATH and LD_LIBRARY_PATH for Mac testing * Pass TORCH_CUDA_ARCH_LIST directly in cuda.cmake * Fixes for the last two changes * Potential fix for Mac build failure * Switch Caffe2 to build_caffe2 dir to not conflict * Cleanup FindMKL.cmake * Another attempt at Mac cpp_build fix * Clear cpp-build directory for Mac builds * Disable test in Mac build/test to match cmake	2018-05-24 07:47:27 -07:00
Zachary DeVito	286cd04a20	JIT cleanup (#7631 ) Cleans up dead code in the JIT: * Remove interpreter_autograd_function * Remove Handles * Remove HandleBuilder * Remove creates_handles, and tracing_autograd_python_function flags * Remove unused var_args * Fix submodules	2018-05-21 10:06:29 -07:00
Adam Paszke	b45f2ff1ae	Remove CompiledFunction + clean up JIT tests (#7421 )	2018-05-16 20:03:04 +02:00
Jorghi12	cd86d4c554	PyTorch AMD Build Scripts (#6625 ) * PyTorch AMD Build Script. * Python invocation for hipify * Adding individual hip files. * Updating CWD Use the actual path for the file instead of the current working directory, which depends on where the script is invoked. * Updating folder path for amd_build * Removing previous amd_build directory * Updated setup.py to support WITH_ROCM * Renaming the files for CuDNN BatchNorm & Conv since having two .cpp files with the same name results in a linking error in the HCC compiler used for ROCm/AMD. * Removing old BatchNorm & Conv files since they've been renamed. * Updating build path to handle ROCM * Cleaned up the build path and created a FindHIP cmake file for setting up relevant hip paths. * Separated the individual patch files to make it easier to detect issues while building. * Removed CMakeLists hip files and fixed directory structure * Adding build pytorch amd script * Merged setup patch into PyTorch setup.py & cleaned a few issues * Added information on where to download the hipify-python script. * Resolved linting issues inside of build_pytorch_amd.py * Removing many unnecessary patch files. Removing unnecessary .hip files. Fixing up the build process. * Refactored the PR for supporting HIP * Minimizing the number of changes inside individual patches. * Cleaned up patch files. * Removed patch files. * Updating patches * Removing HIP change from file. * Cleaned up patches * Added AVX/SSE avoidance due to bug with ROCm's stack. Just temporary for now. * Removing the other HIP file * Removed patch file + merged ROCm into Aten/test * Removed ATen tests patch file and updated disbale_features yaml to remove headers that don't exist on the HIP stack. * Reduced the number of patches down to 14 after Edward's suggestions. * Transferred deletion of certain functions from patch to yaml file. * Set default Thrust path * Fixed aten files so we now use the templated pow/abs instead of std:: directly. 
* Removed error from aten/src/THCUNN/Abs.cu * Updated the locations of the cmake build files. Moved THCTensorRandom from a hip to a patch file. Added executable/library commands that can successfully handle either CUDA or HIP. * Removed hip extraction from the build script and removed the old hip file. * Replaced MACRO with function in upper level cmake. * Added empty ELSE() block to prevent the loading of a command without CUDA or HIP. Also added IF guards around torch_cuda_based_add_executable in Aten tests. * Updated aten tests. * Removed the hip include from the ATen header. * Can't throw exceptions on C++ AMP, using abort * Missing IF guards for cuda/hip executables in aten tests. * Removed a series of patch files. * Added template keyword to help out the HCC compiler. * Rebased the specific files displayed in the PR * Fixing typo. * Change flag from "WITH_CUDA" to "NOT NO_CUDA" Replacing "WITH_CUDA" with "NOT NO_CUDA" after the rebase. * Fix LoadHIP path * Updating build files after rebasing. * Reorganization after cpu/gpu separation. * Removed HIPCC from setup.py & removed -shared extra linking args. * Updated CMake / Setup build to correctly link when under ROCm stack. * Removed the unnecessary argument from Extension constructor. * Adding another test to be included with ROCm building. * Updated the setup_helpers scripts in order to get around linter error * Fix syntax issue * Solving lint issue: line too long	2018-05-15 18:38:01 -07:00
Zachary DeVito	ce69d3110b	Improve script builtin checking using schema (#7311 ) Improve script builtin checking using schema * This adds aten_schema.h which provides a barebones amount of type and argument information about each builtin operator * emitBuiltinCall is updated to use this information rather than aten_dispatch to ensure the operator is correct. * handling of keyword and position arguments now matches python behavior * There is no longer a requirement that kwargs be constant or that the attributes of an op must be entirely constant or non-constant * compiler now constructs a non-attributed version of the op first and then turns it into the constant-attribute version if all attributes are constants. * default arguments for builtins now work * SugaredValue::call and similar functions now have SourceRange information for their arguments so that error reporting is more accurate Notes: * This does not try to merge the builtin checking with python arg parser. Given that we will eventually have C10 schema which will replace aten_schema, we will eventually have a C++ description of the schema and working off that description directly will be the easiest form to understand. * python function calls and script method calls do not support keyword arguments yet. When we add this support we should refactor the handling in tryEmitSchema that resolves keywords into a common function. * default arguments work * keyword arguments to builtins work (still need to extend to calling python and other script methods) * much better error reporting for incorrect builtins Lift any constants to attributes on nodes when possible * Schema is usable internally in the compiler as the function signatures of script functions as well as for builtin operators. * Adds a List[T] class to better represent the arguments to cat/stack as a type rather than with custom checking. 
* Support kwargs for calls of script methods A future commit will be needed to add support for: * calls to script _functions_ which currently are GraphExecutors without schema info. * kwargs to python functions, which will require refactoring python op	2018-05-14 14:46:36 -07:00
Zachary DeVito	93eb50c103	Mark expand nodes as implicit/explicit in trace (#7303 ) When tracing we record expand nodes. This is useful in some cases because it makes it clear a broadcast happened. However, in future runs the broadcast may be different or not needed. This change adds an attribute to expand to track if it was implicitly added. This takes the form of an unused input to expand with a default value. The execution engine then removes implicit expands before execution. Note that shape_analysis will re-add expands when it can prove by shape analysis that they will exist and this is useful for the fuser, so this change should not affect fusion passes.	2018-05-10 10:47:43 -07:00
Edward Z. Yang	64834f6fb8	Split libATen.so into libATen_cpu.so and libATen_cuda.so (#7275 ) * Split libATen.so into libATen_cpu.so and libATen_cuda.so Previously, ATen could be built with either CPU-only support, or CPU/CUDA support, but only via a compile-time flag, requiring two separate builds. This means that if you have a program which indirectly uses a CPU-only build of ATen, and a CPU/CUDA-build of ATen, you're gonna have a bad time. And you might want a CPU-only build of ATen, because it is 15M (versus the 300M of a CUDA build). This commit splits libATen.so into two libraries, CPU/CUDA, so that it's not necessary to do a full rebuild to get CPU-only support; instead, if you link against libATen_cpu.so only, you are CPU-only; if you additionally link/dlopen libATen_cuda.so, this enables CUDA support. This makes ATen's dynamic library structure more similar to Caffe2's. libATen.so is no more (this is BC BREAKING) The general principle for how this works is that we introduce a hooks interface, which introduces a dynamic dispatch indirection between a call site and implementation site of CUDA functionality, mediated by a static initialization registry. This means that we can continue to, for example, lazily initialize CUDA from Context (a core, CPU class) without having a direct dependency on the CUDA bits. Instead, we look up in the registry if, e.g., CUDA hooks have been loaded (this loading process happens at static initialization time), and if they have been we dynamic dispatch to this class. We similarly use the hooks interface to handle Variable registration. We introduce a new invariant: if the backend of a type has not been initialized (e.g., its library has not been dlopened; for CUDA, this also includes CUDA initialization), then the Type pointers in the context registry are NULL. If you access the registry directly you must maintain this invariant. There are a few potholes along the way. 
I document them here: - Previously, PyTorch maintained a separate registry for variable types, because no provision for them was made in the Context's type_registry. Now that we have the hooks mechanism, we can easily have PyTorch register variables in the main registry. The code has been refactored accordingly. - There is a subtle ordering issue between Variable and CUDA. We permit libATen_cuda.so and PyTorch to be loaded in either order (in practice, CUDA is always loaded "after" PyTorch, because it is lazily initialized.) This means that, when CUDA types are loaded, we must subsequently also initialize their Variable equivalents. Appropriate hooks were added to VariableHooks to make this possible; similarly, getVariableHooks() is not referentially transparent, and will change behavior after Variables are loaded. (This is different to CUDAHooks, which is "burned in" after you try to initialize CUDA.) - The cmake is adjusted to separate dependencies into either CPU or CUDA dependencies. The generator scripts are adjusted to either generate a file as a CUDA (cuda_file_manager) or CPU file (file_manager). - I changed all native functions which were CUDA-only (the cudnn functions) to have dispatches for CUDA only (making it permissible to not specify all dispatch options.) This uncovered a bug in how we were handling native functions which dispatch on a Type argument; I introduced a new self_ty keyword to handle this case. I'm not 100% happy about it but it fixed my problem. This also exposed the fact that set_history incompletely handles heterogeneous return tuples combining Tensor and TensorList. I swapped this codegen to use flatten() (at the possible cost of a slight perf regression, since we're allocating another vector now in this code path). - thc_state is no longer a public member of Context; use getTHCState() instead - This PR comes with Registry from Caffe2, for handling static initialization. 
I needed to make a bunch of fixes to Registry to make it more portable - No more ##__VA_ARGS__ token pasting; instead, it is mandatory to pass at least one argument to the var-args. CUDAHooks and VariableHooks pass a nullary struct CUDAHooksArgs/VariableHooksArgs to solve the problem. We must get rid of token pasting because it does not work with MSVC. - It seems MSVC is not willing to generate code for constructors of template classes at use sites which cross DLL boundaries. So we explicitly instantiate the class to get around the problem. This involved tweaks to the boilerplate generating macros, and also required us to shuffle around namespaces a bit, because you can't specialize a template unless you are in the same namespace as the template. - Insertion of AT_API to appropriate places where the registry must be exported - We have a general problem which is that on recent Ubuntu distributions, --as-needed is enabled for shared libraries, which is (cc @apaszke who was worrying about this in #7160 see also #7160 (comment)). For now, I've hacked this up in the PR to pass -Wl,--no-as-needed to all of the spots necessary to make CI work, but a more sustainable solution is to attempt to dlopen libATen_cuda.so when CUDA functionality is requested. - The JIT tests somehow manage to try to touch CUDA without loading libATen_cuda.so. So we pass -Wl,--no-as-needed when linking libATen_cuda.so to _C.so - There is a very subtle linking issue with lapack, which is solved by making sure libATen_cuda.so links against LAPACK. There's a comment in aten/src/ATen/CMakeLists.txt about this as well as a follow-up bug at #7353 - autogradpp used AT_CUDA_ENABLED directly. 
We've expunged these uses and added a few more things to CUDAHooks (getNumGPUs) - Added manualSeedAll to Generator so that we can invoke it polymorphically (it only does something different for CUDAGenerator) - There's a new cuda/CUDAConfig.h header for CUDA-only ifdef macros (AT_CUDNN_ENABLED, most prominently) - CUDAHooks/VariableHooks structs live in at namespace because Registry's namespace support is not good enough to handle it otherwise (see Registry changes above) - There's some modest moving around of native functions in ReduceOps and UnaryOps to get the CUDA-only function implementations into separate files, so they are only compiled into libATen_cuda.so. sspaddmm needed a separate CUDA function due to object linkage boundaries. - Some direct uses of native functions in CUDA code have to go away, since these functions are not exported, so you have to go through the dispatcher (at::native::empty_like to at::empty_like) - Code in THC/THCS/THCUNN now properly uses THC_API macro instead of TH_API (which matters now that TH and THC are not in the same library) - Added code debt in torch/_thnn/utils.py and other THNN parsing code to handle both TH_API and THC_API - TensorUtils.h is now properly exported with AT_API - Dead uses of TH_EXPORTS and co expunged; we now use ATen_cpu_exports and ATen_cuda_exports (new, in ATenCUDAGeneral.h) consistently - Fix some incorrect type annotations on _cudnn_rnn_backward, where we didn't declare a type as possibly undefined when we should have. We didn't catch this previously because optional annotations are not tested on "pass-through" native ATen ops (which don't have dispatch). Upstream issue at #7316 - There's a new cmake macro aten_compile_options for applying all of our per-target compile time options. We use this on the cpu and cuda libraries. 
- test/test_cpp_extensions.py can be run directly by invoking in Python, assuming you've set up your PYTHONPATH correctly - type_from_string does some new funny business to only query for all valid CUDA types (which causes CUDA initialization) when we see "torch.cuda." in the requested string Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Last mile libtorch fixes Signed-off-by: Edward Z. Yang <ezyang@fb.com> * pedantic fix Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2018-05-10 10:28:33 -07:00
Peter Goldsborough	54a4867675	Bring back C++ extension torch.h (#7310 ) * Bring back C++ extension torch.h * Fix python.h include in python_tensor.cpp	2018-05-05 14:06:27 -07:00
Peter Goldsborough	feb64b5291	Add -Wno-unknown-pragmas (#7291 )	2018-05-04 13:44:13 -07:00
Peter Goldsborough	67d0d14908	Rename autograd namespace to torch and change torch.h into python.h (#7267 ) * Rename autograd namespace to torch and change torch.h into python.h * Include torch.h instead of python.h in test/cpp/api * Change some mentions of torch.h to python.h in C++ extensions * Set paths directly, without find_path	2018-05-04 08:04:57 -07:00
Soumith Chintala	92f54e1f01	remove static libstdc++ linking and PYTORCH_BINARY_BUILD env variable (#7259 )	2018-05-03 12:32:57 -07:00
Luca Antiga	5d3c3c53aa	Add raw IR serialization/deserialization (#6392 )	2018-05-01 20:21:29 +02:00
Luca Antiga	0703357723	Don't build THD/master_worker if not explicitly requested (#7081 )	2018-04-29 13:17:09 -04:00
James Reed	4667983f0f	Fixes for interpreter and ONNX export for translation (#7044 ) Fixes for interpreter and ONNX export for translation Address comments	2018-04-27 22:23:57 -07:00
Peter Goldsborough	7b09bc72a5	[WIP] Enable WERROR in tests (#6539 ) * Enable WERROR in tests * Also set WERROR=1 for cpp_build in CI * Enable Werror after the compiler checks * Remove -DWERROR because it's picked up from the env var * Had to fix some errors in aten/contrib/data * Allow an uninitialized variable in ReduceOpsKernel.cpp * Use CUDNN_DATA_UINT8 in cuDNN type string conversion * Fixes and use target_compile_options * Fix uninitialized variables in THNN * Include Python.h earlier in tensor_types.cpp * Use CUDNN_VERSION 7100 instead of 7000? * More Python.h includes * Make switch case in common_subexpression_elimination.cpp exhaustive * Build with WERROR=0 just to see all the warnings * Remove some Python includes * Enable WERROR=1 again * Bring back switch case default	2018-04-28 01:51:16 +01:00
Soumith Chintala	bd14d8e8f8	add additional caffe/caffe2 paths to exclude list in pytorch setup.py (#6891 )	2018-04-25 22:10:38 -04:00
Orion Reblitz-Richardson	dec5e99e99	[aten] Move submodules to third_party (#6866 ) * [aten] Move submodules to third_party * [aten] Update aten_mirror.sh script for third_party * [aten] Move ATen submodules def to root and rename * [aten] Update cpuinfo cmake build * [aten] Fix cpuinfo cmake build * Update third_party/cpuinfo to d03d5d296063063c66877fb559cf34469734e3e1 * [aten] Fix JIT test reference to catch	2018-04-24 23:33:46 -04:00
gchanan	1c7b0c1020	Update version string to 0.5. (#6795 )	2018-04-22 13:57:48 -04:00
bddppq	c43c911662	Export onnx protobuf bindings to python (#6651 ) * Export onnx protobuf bindings to python * rename native onnx module to _onnx	2018-04-17 16:38:57 -07:00
srib	53d2612b55	Fix a typo in the setup.py script (#6632 )	2018-04-16 15:29:45 -04:00

333 Commits