pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Sven-Hendrik Haase	080266e79c	Document CUDAHOSTCXX environment variable (#12265 ) Summary: This variable is already being used so this just serves to document that. I think it's an important variable, too, so it should definitely be documented there somewhere. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12265 Differential Revision: D10162261 Pulled By: soumith fbshipit-source-id: e0d01e012c2fedea63372de9967a8eaa3745fe94	2018-10-03 06:33:06 -07:00
daquexian	1fb8925efe	Fix typo LMBD->LMDB in docs of setup.py (#12282 ) Summary: `setup.py` reads `USE_LMDB` rather than `USE_LMBD` Pull Request resolved: https://github.com/pytorch/pytorch/pull/12282 Differential Revision: D10162025 Pulled By: soumith fbshipit-source-id: 6295a777be10509ca49516ad7c10061d26b6f9c9	2018-10-03 06:14:19 -07:00
Edward Yang	1619264ca5	Make ATen-core and caffe2 mutually recursive / merge template data<T>() (#11970 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11970 Adds an ATen-core-headers target, which caffe2_cpu_internal depends on, and makes ATen-core depend on caffe2_headers. If you link against ATen-core, you must ALSO link against caffe2_cpu_internal; if you link against caffe2_cpu_internal, you must ALSO link against ATen-core, otherwise you'll have undefined symbols. Then, we merge template data<T>() method with Caffe2 implementation, demonstrating that includes to Caffe2 (core) from ATen/core are working Reviewed By: jerryzh168 Differential Revision: D9967509 fbshipit-source-id: 3d220c38b2c3c646f8ff2884fdcc889fa9276c7a	2018-09-27 17:40:42 -07:00
Yangqing Jia	9c49bb9ddf	Move registry fully to c10 (#12077 ) Summary: This does 6 things: - add c10/util/Registry.h as the unified registry util - cleaned up some APIs such as export condition - fully remove aten/core/registry.h - fully remove caffe2/core/registry.h - remove a bogus aten/registry.h - unifying all macros - set up registry testing in c10 Also, an important note that we used to mark the templated Registry class as EXPORT - this should not happen, because one should almost never export a template class. This PR fixes that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12077 Reviewed By: ezyang Differential Revision: D10050771 Pulled By: Yangqing fbshipit-source-id: 417b249b49fed6a67956e7c6b6d22374bcee24cf	2018-09-27 03:09:54 -07:00
Orion Reblitz-Richardson	02d7c88fa4	Unify versions across setup.py, libtorch, and libcaffe2 (#12053 ) Summary: This unifies our versions across setup.py, libtorch, and libcaffe2. CMake has a default version (bumped to 1.0.0) that can be overridden by setup.py. The versions are also printed as a part of cmake/Summary.cmake to make sure they are correct. cc Yangqing ezyang soumith goldsborough pjh5 Pull Request resolved: https://github.com/pytorch/pytorch/pull/12053 Differential Revision: D10041878 Pulled By: orionr fbshipit-source-id: a98a01771f6c008d1016ab63ab785c3a88c3ddb0	2018-09-26 08:55:06 -07:00
Peter Goldsborough	e05d689c49	Unify C++ API with C++ extensions (#11510 ) Summary: Currently the C++ API and C++ extensions are effectively two different, entirely orthogonal code paths. This PR unifies the C++ API with the C++ extension API by adding an element of Python binding support to the C++ API. This means the `torch/torch.h` included by C++ extensions, which currently routes to `torch/csrc/torch.h`, can now be rerouted to `torch/csrc/api/include/torch/torch.h` -- i.e. the main C++ API header. This header then includes Python binding support conditioned on a define (`TORCH_WITH_PYTHON_BINDINGS`), which is only passed when building a C++ extension. Currently stacked on top of https://github.com/pytorch/pytorch/pull/11498 Why is this useful? 1. One less codepath. In particular, there has been trouble again and again due to the two `torch/torch.h` header files and ambiguity when both ended up in the include path. This is now fixed. 2. I have found that it is quite common to want to bind a C++ API module back into Python. This could be for simple experimentation, or to have your training loop in Python but your models in C++. This PR makes this easier by adding pybind11 support to the C++ API. 3. The C++ extension API simply becomes richer by gaining access to the C++ API headers. soumith ezyang apaszke Pull Request resolved: https://github.com/pytorch/pytorch/pull/11510 Reviewed By: ezyang Differential Revision: D9998835 Pulled By: goldsborough fbshipit-source-id: 7a94b44a9d7e0377b7f1cfc99ba2060874d51535	2018-09-24 14:44:21 -07:00
Yangqing Jia	a6f1ae7f20	set up c10 scaffolding. Move macros proper first. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11939 Reviewed By: orionr, dzhulgakov Differential Revision: D10004629 Pulled By: Yangqing fbshipit-source-id: ba50a96820d35c7922d81c78c4cbe849c85c251c	2018-09-24 11:09:59 -07:00
Peter Goldsborough	6100c0ea14	Introduce ExtensionVersioner for C++ extensions (#11725 ) Summary: Python never closes shared library it `dlopen`s. This means that calling `load` or `load_inline` (i.e. building a JIT C++ extension) with the same C++ extension name twice in the same Python process will never re-load the library, even if the compiled source code and the underlying shared library have changed. The only way to circumvent this is to create a new library and load it under a new module name. I fix this, of course, by introducing a layer of indirection. Loading a JIT C++ extension now goes through an `ExtensionVersioner`, which hashes the contents of the source files as well as build flags, and if this hash changed, bumps an internal version stored for each module name. A bump in the version will result in the ninja file being edited and a new shared library and effectively a new C++ extension to be compiled. For this the version name is appended as `_v<version>` to the extension name for all versions greater zero. One caveat is that if you were to update your code many times and always re-load it in the same process, you may end up with quite a lot of shared library objects in your extension's folder under `/tmp`. I imagine this isn't too bad, since extensions are typically small and there isn't really a good way for us to garbage collect old libraries, since we don't know what still has handles to them. Fixes https://github.com/pytorch/pytorch/issues/11398 CC ezyang gchanan soumith fmassa Pull Request resolved: https://github.com/pytorch/pytorch/pull/11725 Differential Revision: D9948244 Pulled By: goldsborough fbshipit-source-id: 695bbdc1f1597c5e4306a45cd8ba46f15c941383	2018-09-20 14:43:12 -07:00
Mingzhe Li	a7cbcb1bb9	Enable build_python on windows (#11385 ) Summary: The PR aims to resolve issues related to BUILD_PYTHON and BUILD_TEST after FULL_CAFFE2 is removed on Windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11385 Reviewed By: orionr Differential Revision: D9884906 Pulled By: mingzhe09088 fbshipit-source-id: fc114c0cbff6223f1ec261161e4caecc1fef5dd6	2018-09-17 21:40:03 -07:00
Bram Wasti	e8ecbcdf01	Move IValue to ATen/core (#11610 ) Summary: unblocks D9202320 Pull Request resolved: https://github.com/pytorch/pytorch/pull/11610 Differential Revision: D9774853 Pulled By: bwasti fbshipit-source-id: 4798223f6de680a7152283e8cad8814da7f90209	2018-09-17 18:25:50 -07:00
Soumith Chintala	73738ec570	bump version to 1.0 (#11717 ) Summary: I'm just doing the honors and bumping the version to 1.0.0. 1.0 preview and RC releases will have the 1.0.0.dev{date} tag Pull Request resolved: https://github.com/pytorch/pytorch/pull/11717 Reviewed By: SsnL Differential Revision: D9840857 Pulled By: soumith fbshipit-source-id: 4c9c2e01dccb3c521dab26c49e1569d970a87ace	2018-09-17 12:13:48 -07:00
Gregory Chanan	e125e61824	Fix flake8 Summary: Fix flake8 Reviewed By: ezyang Differential Revision: D9873872 fbshipit-source-id: 26e81238f22caaeccd2c8b4f39cedb6cfb5520dd	2018-09-17 11:10:29 -07:00
Jesse Hellemn	5bfd8f583c	Moving copy of Caffe2 protos back to build_pytorch_libs.sh (#11726 ) Summary: This way it shows up in all current and future setup.py commands, as otherwise we'd have to override every one to have them all call copy_protos. This is needed because the nightly packages still do not include caffe2_pb2, because setup.py bdist does not go through setup.py install or setup.py develop Pull Request resolved: https://github.com/pytorch/pytorch/pull/11726 Reviewed By: orionr Differential Revision: D9844075 Pulled By: pjh5 fbshipit-source-id: 57b469e48010aacd0c08c214ba8a7e5d757feefa	2018-09-17 08:58:05 -07:00
Soumith Chintala	acb6f18bab	fix generate_code.py caching (#11644 ) Summary: Currently, because of some setup.py logic, `ninja` caching of the `generate_code.py` build step was broken. This resulted in `generate_code.py` running every single time builds were happening, regardless of whether inputs changed. This updated logic fixes the input caching Pull Request resolved: https://github.com/pytorch/pytorch/pull/11644 Reviewed By: orionr Differential Revision: D9814348 Pulled By: soumith fbshipit-source-id: 2012960908d0f600488d410094095cfd72adc34f	2018-09-13 12:39:48 -07:00
Teng Li	6dcdbd3a1d	Make C10d support CPU only build (#11513 ) Summary: This makes torch.distributed work for CPU-only builds. Also added one more CI test case to cover MPI CPU build. All CI tests should cover this change Pull Request resolved: https://github.com/pytorch/pytorch/pull/11513 Differential Revision: D9784546 Pulled By: teng-li fbshipit-source-id: 0976a6b0fd199670926f0273e17ad7d2805e42e7	2018-09-11 22:10:34 -07:00
Zachary DeVito	289a8c9b7d	Allow train/eval, and non-Tensor arguments to python functions (#11505 ) Summary: This whitelists train/eval functions in script modules, and tests that nested nn.Modules still work. This also changes the code for calling python functions from script to allow non-tensor inputs/outputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11505 Differential Revision: D9765466 Pulled By: zdevito fbshipit-source-id: 1177bff931324422b69e18fa0bbaa82e3c98ec69	2018-09-11 15:05:09 -07:00
Orion Reblitz-Richardson	d32b41003a	Copy protos on install same as develop (#11517 ) Summary: This is a potential fix for https://github.com/pytorch/pytorch/issues/11453 and https://github.com/pytorch/pytorch/issues/11074 worked through with pjh5 . Turns out we had some protos copy code that was in the .sh file that was removed. Better to have it in setup.py, though, same as for develop. cc ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/11517 Differential Revision: D9771911 Pulled By: orionr fbshipit-source-id: 76975d8f71f38d951eaaed0b50dd3ec36dd177a9	2018-09-11 10:09:56 -07:00
Soumith Chintala	4e8d9a4a58	Introducing python setup.py rebuild develop (#11487 ) Summary: This speeds up incremental builds by doing the following changes: - Uses `rsync` instead of `cp` (when `rsync` is found) which is a bit smarter in doing "maybe copy" - Introduces a `rebuild` mode which does not rerun `cmake` in `build_pytorch_libs.sh`. Note: `rebuild` should only be used if you don't add / remove files to the build, as `cmake` is not rerun Current no-op rebuild speedup: - 1m 15s -> 20s There are some lingering bugs. No-op rebuilds rerun `cmake` for two rebuilds (likely that cmake logic is dependent on the install folder, hence kicking off rebuild). So what you see ``` python setup.py rebuild develop # first time - ~5 mins python setup.py rebuild develop # second time - ~3 mins python setup.py rebuild develop # third time - ~2 mins python setup.py rebuild develop # fourth time - ~20 seconds python setup.py rebuild develop # fifth time - ~20 seconds ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/11487 Differential Revision: D9769087 Pulled By: soumith fbshipit-source-id: 20fbecde33af6426149c13767e8734fb3be783c5	2018-09-11 08:56:25 -07:00
Orion Reblitz-Richardson	a175282776	Flags for LMDB, LevelDB, and Caffe2 ops (#11462 ) Summary: Add flags for LMDB and LevelDB, default `OFF`. These can be enabled with ``` USE_LMDB=1 USE_LEVELDB=1 python setup.py build_deps ``` Also add a flag to build Caffe2 ops, which is default `ON`. Disable with ``` NO_CAFFE2_OPS=1 python setup.py build_deps ``` cc Yangqing soumith pjh5 mingzhe09088 Pull Request resolved: https://github.com/pytorch/pytorch/pull/11462 Reviewed By: soumith Differential Revision: D9758156 Pulled By: orionr fbshipit-source-id: 95fd206d72fdf44df54fc5d0aeab598bff900c63	2018-09-10 17:27:50 -07:00
Peter Goldsborough	a0d4106c07	Integrate custom op tests with CI (#10611 ) Summary: This PR is stacked on https://github.com/pytorch/pytorch/pull/10610, and only adds changes in one file `.jenkins/pytorch/test.sh`, where we now build the custom op tests and run them. I'd also like to take this PR to discuss whether the [`TorchConfig.cmake`](https://github.com/pytorch/pytorch/blob/master/cmake/TorchConfig.cmake.in) I made is robust enough (we will also see in the CI) orionr Yangqing dzhulgakov what do you think? Also ezyang for CI changes Pull Request resolved: https://github.com/pytorch/pytorch/pull/10611 Differential Revision: D9597627 Pulled By: goldsborough fbshipit-source-id: f5af8164c076894f448cef7e5b356a6b3159f8b3	2018-09-10 15:40:21 -07:00
Orion Reblitz-Richardson	802d21c8f4	Remove FULL_CAFFE2 flag (#11321 ) Summary: Continuing pjh5's work to remove FULL_CAFFE2 flag completely. With these changes you'll be able to also do something like ``` NO_TEST=1 python setup.py build_deps ``` and this will skip building tests in caffe2, aten, and c10d. By default the tests are built. cc mingzhe09088 Yangqing Pull Request resolved: https://github.com/pytorch/pytorch/pull/11321 Reviewed By: mingzhe09088 Differential Revision: D9694950 Pulled By: orionr fbshipit-source-id: ff5c4937a23d1a263378a196a5eda0cba98af0a8	2018-09-07 15:09:44 -07:00
Peter Goldsborough	01930a3145	Move sync_params to C++ (#9805 ) Summary: The next function I'm moving to C++ is `sync_params`. It is stacked on top of https://github.com/pytorch/pytorch/pull/9729, so some changes will go away when it lands and I rebase. I also split code into a `.h` and `.cpp` file for better code organization. pietern apaszke Pull Request resolved: https://github.com/pytorch/pytorch/pull/9805 Differential Revision: D9688604 Pulled By: goldsborough fbshipit-source-id: 4467104d3f9e2354425503b9e4edbd59603e20a8	2018-09-07 12:56:40 -07:00
iotamudelta	9de2085806	Use custom hcc/HIP, purge hcSPARSE (#11198 ) Summary: * purge hcSPARSE now that rocSPARSE is available * integrate a custom hcc and HIP * hcc brings two important compiler fixes (fixes hundreds of unit tests) * HIP brings a smart dispatcher that allows us to avoid a lot of static_casts (we haven't yet removed the automatic static_casts but this catches some occurrences the script did not catch) * mark 5 unit tests skipping that have regressed w/ the new hcc (we don't know yet what is at fault) * optimize bitonic sort - the comparator is always an empty struct - therefore passing it by value saves at least 3 bytes. It also removes an ambiguity around passing references to `__global__` functions Pull Request resolved: https://github.com/pytorch/pytorch/pull/11198 Differential Revision: D9652340 Pulled By: ezyang fbshipit-source-id: f5af1d891189da820e3d13b7bed91a7a43154690	2018-09-06 19:38:07 -07:00
Orion Reblitz-Richardson	dda8402447	Cleanup dependency of distributed flags (#11221 ) Summary: Now that we're building everything together, making all distributed flags conditional of USE_DISTRIBUTED being set. cc pietern cpuhrsch Pull Request resolved: https://github.com/pytorch/pytorch/pull/11221 Reviewed By: Yangqing Differential Revision: D9664267 Pulled By: orionr fbshipit-source-id: a296cda5746ad150028c97160f8beacba955ff73	2018-09-06 08:56:00 -07:00
Jesse Hellemn	c0efe6f027	Forward declarations of needed curand functions (#10911 ) Summary: Needed for FULL_CAFFE2=1 with statically linked CUDA libraries. Waiting on advice from Nvidia Pull Request resolved: https://github.com/pytorch/pytorch/pull/10911 Reviewed By: pjh5 Differential Revision: D9636256 Pulled By: orionr fbshipit-source-id: fcad7945910b6c8fb5f52e81cc87dad5fcfb3c65	2018-09-05 16:56:26 -07:00
Richard Zou	68c2e014cb	Handling for py2/py3 division differences (#11016 ) Summary: - In Python 2, use of `/` (regardless of int/float/Tensor) causes a compiler error if `from __future__ import division` is not imported in the file. - The / operator is universally set to do "true" division for integers - Added a `prim::FloorDiv` operator because it is used in loop unrolling. The error if users use '/' in python 2 without importing from __future__ occurs when building the JIT AST. cc apaszke zdevito Pull Request resolved: https://github.com/pytorch/pytorch/pull/11016 Differential Revision: D9613527 Pulled By: zou3519 fbshipit-source-id: 0cebf44d5b8c92e203167733692ad33c4ec9dac6	2018-09-05 14:57:38 -07:00
Teng Li	020501b7b0	Getting rid of USE_C10D for build (#11237 ) Summary: Will use USE_DISTRIBUTED for both c10d and THD Pull Request resolved: https://github.com/pytorch/pytorch/pull/11237 Differential Revision: D9647825 Pulled By: teng-li fbshipit-source-id: 06e0ec9b5e2f8f38780fc88718f8499463e9e969	2018-09-04 17:27:53 -07:00
iotamudelta	33c7cc13ca	improve docker packages, fix bugs, enable tests, enable FFT (#10893 ) Summary: * improve docker packages (install OpenBLAS to have at-compile-time LAPACK functionality w/ optimizations for both Intel and AMD CPUs) * integrate rocFFT (i.e., enable Fourier functionality) * fix bugs in ROCm caused by wrong warp size * enable more test sets, skip the tests that don't work on ROCm yet * don't disable asserts any longer in hipification * small improvements Pull Request resolved: https://github.com/pytorch/pytorch/pull/10893 Differential Revision: D9615053 Pulled By: ezyang fbshipit-source-id: 864b4d27bf089421f7dfd8065e5017f9ea2f7b3b	2018-09-02 08:54:42 -07:00
Teng Li	3791bd12c8	PT1 Release Milestone No.2 MPI Group Support with all tests passed (#11128 ) Summary: Added MPI group support, which makes all previous MPI group test cases pass. Also, release the MPI thread level support by serializing different PG's MPI ops. This is required. The build is fixed too Pull Request resolved: https://github.com/pytorch/pytorch/pull/11128 Differential Revision: D9602188 Pulled By: teng-li fbshipit-source-id: 1d618925ae5fb7b47259b23051cc181535aa7497	2018-08-31 12:39:56 -07:00
Edward Yang	cd9416317d	Minor copy-edit on setup.py Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10933 Reviewed By: cpuhrsch Differential Revision: D9526650 fbshipit-source-id: 8ad1c989bee7009b3f95a2641189f55cf6c1979f	2018-08-29 13:41:04 -07:00
Orion Reblitz-Richardson	3c9775fff8	Remove nanopb since we've switched to protobuf (#10772 ) Summary: We no longer use nanopb in PyTorch (or Caffe2) so removing. All protobuf manipulation should go through standard protobuf, which is statically linked inside libcaffe2.so by default. cc zdevito pjh5 ezyang Yangqing Pull Request resolved: https://github.com/pytorch/pytorch/pull/10772 Reviewed By: pjh5 Differential Revision: D9465894 Pulled By: orionr fbshipit-source-id: 8cdf9f1d3953b7a48478d381814d7107df447201	2018-08-24 10:54:38 -07:00
Orion Reblitz-Richardson	8c13971f57	Remove protobuf require and use requirements.txt (#10771 ) Summary: In prep for making FULL_CAFFE2 default, users shouldn't be required to have protobuf installed. cc pjh5 Pull Request resolved: https://github.com/pytorch/pytorch/pull/10771 Reviewed By: pjh5 Differential Revision: D9474458 Pulled By: orionr fbshipit-source-id: 3e28f5ce64d125a0a0418ce083f9ec73aec62492	2018-08-24 10:39:40 -07:00
Johannes M Dieterich	a4c59a9dab	MIOpen integration, more tests enabled, bug fixes (#10612 ) Summary: * first integration of MIOpen for batch norm and conv on ROCm * workaround a ROCm compiler bug exposed by elementwise_kernel through explicit capture of variables in the densest packing * workaround a ROCm compiler bug exposed by having `extern "C" __host__` as a definition and just `__host__` in the implementation through the hipify script * use fabs() in accordance with C++11 for double absolute, not ::abs() which is integer-only on ROCm * enable test_sparse set on CI, skip tests that don't work currently on ROCm * enable more tests in test_optim after the elementwise_bug got fixed * enable more tests in test_dataloader * improvements to hipification and ROCm build With this, resnet18 on CIFAR data trains without hang or crash in our tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/10612 Reviewed By: bddppq Differential Revision: D9423872 Pulled By: ezyang fbshipit-source-id: 22c0c985217d65c593f35762b3eb16969ad96bdd	2018-08-23 15:24:47 -07:00
Edward Yang	227635142f	Delete THD master_worker (#10731 ) Summary: Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/10731 Differential Revision: D9423675 Pulled By: ezyang fbshipit-source-id: 37221e11d84cc3672b944af598ea229a1d4c38cc	2018-08-22 08:54:36 -07:00
Peter Goldsborough	c101a57a74	Build mechanism for custom operators (#10226 ) Summary: This is the last step in the custom operator implementation: providing a way to build from C++ and Python. For this I: 1. Created a `FindTorch.cmake` taken largely from ebetica with a CMake function to easily create simple custom op libraries 2. Created a ` torch/op.h` header for easy inclusion of necessary headers, 3. Created a test directory `pytorch/test/custom_operator` which includes the basic setup for a custom op. 1. It defines an op in `op.{h,cpp}` 2. Registers it with the JIT using `RegisterOperators` 3. Builds it into a shared library via a `CMakeLists.txt` 4. Binds it into Python using a `setup.py`. This step makes use of our C++ extension setup that we already have. No work, yey! The pure C++ and the Python builds are separate and not coupled in any way. zdevito soumith dzhulgakov Pull Request resolved: https://github.com/pytorch/pytorch/pull/10226 Differential Revision: D9296839 Pulled By: goldsborough fbshipit-source-id: 32f74cafb6e3d86cada8dfca8136d0dfb1f197a0	2018-08-16 18:56:17 -07:00
Anders Papitto	130881f0e3	Delete build_caffe2.sh, replace with build_libtorch.py (#10508 ) Summary: delete build_caffe2.sh, replace with build_libtorch.py as suggested by peter (and copy-pasted from his draft PR). This ensures that all consumers of the torch CMake file go through as unified a path as possible. In order to change the surrounding infrastructure as little as possible, I made some tweaks to enable build_pytorch_libs.sh to generate the test binaries relative to the current directory, rather than hardcoding to pytorch/build. Pull Request resolved: https://github.com/pytorch/pytorch/pull/10508 Differential Revision: D9354398 Pulled By: anderspapitto fbshipit-source-id: 05b03df087935f88fca7ccefc676af477ad2d1e9	2018-08-16 08:10:04 -07:00
Orion Reblitz-Richardson	021b4888db	Remove setup_requires and tests_require from setup.py for FULL_CAFFE2 (#10530 ) Summary: In my environment, it looks like setup.py hangs when running ``` FULL_CAFFE2=1 python setup.py build_deps ``` Removing this fixes things, but we might also want to look at `tests_require`, which came over from `setup_caffe2.py`. cc pjh5 Pull Request resolved: https://github.com/pytorch/pytorch/pull/10530 Differential Revision: D9349597 Pulled By: orionr fbshipit-source-id: 589145eca507dfaf16386884ee2fbe60299660b4	2018-08-15 14:26:53 -07:00
Anders Papitto	d1442b36f3	add a rebuild_libtorch command for speedier iteration. (#10036 ) Summary: It just calls into `ninja install`. For iterative work on libtorch.so/_C.so, `python setup.py rebuild_libtorch develop` should provide quick iteration Pull Request resolved: https://github.com/pytorch/pytorch/pull/10036 Differential Revision: D9317869 Pulled By: anderspapitto fbshipit-source-id: 45ea45a1b445821add2fb9d823a724fc319ebdd2	2018-08-14 12:10:02 -07:00
iotamudelta	75651d5b58	improve use of ROCm libraries, enable more tests, small fixes (#10406 ) Summary: * some small leftovers from the last PR review * enable more unit test sets for CI * replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND) * use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2 * use strided_batched gemm interface also from the batched internal interface * re-enable Dropout.cu as we now have philox w/ rocRAND Pull Request resolved: https://github.com/pytorch/pytorch/pull/10406 Reviewed By: Jorghi12 Differential Revision: D9277093 Pulled By: ezyang fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2	2018-08-13 11:39:43 -07:00
Jesse Hellemn	cd81217f8e	A single print statement in setup.py Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10473 Reviewed By: ml7 Differential Revision: D9299196 Pulled By: pjh5 fbshipit-source-id: f9aa84c2859df12f9da9ac5205e1918c253e19fb	2018-08-13 11:39:42 -07:00
Sam Gross	0b63d12db6	Don't call into Python during Storage destruction. (#10407 ) Summary: ``` This removes PyObjectFinalizer. We were seeing SIGSEGV at exit in some programs that use multiprocessing. The backtrace pointed to StorageRef.__del__ being called from subtype_dealloc. My guess is that the Python interpreter was shutdown before all C++ Storage objects were deallocated. Deallocating the C++ Storage called the finalizer which called back into Python after it was no longer safe to do so. This avoids a callback from C++ into Python during Storage finalization. Instead, dead Storage objects (expired weak references) are collected periodically when shared_cache exceeds a limit. The limit is scaled with 2x the number of live references, which places an upper bound on the amount of extra memory held by dead Storage objects. In practice, this should be very small. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/10407 Differential Revision: D9272400 Pulled By: colesbury fbshipit-source-id: ecb14d9c6d54ffc91e134c34a4e770a4d09048a2	2018-08-13 11:20:07 -07:00
Jesse Hellemn	def3715e82	Minor changes for nicer pip packages (#9544 ) Summary: I am using this to test a CI job to upload pip packages, and so am using the Caffe2 namespace to avoid affecting the existing pytorch packages. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9544 Reviewed By: orionr Differential Revision: D9267111 Pulled By: pjh5 fbshipit-source-id: a68162ed29d2eb9ce353d8435ccb5f16c3b0b894	2018-08-10 12:09:46 -07:00
Yangqing Jia	40109b16d0	Remove caffe1 specific proto (#10380 ) Summary: This was used as a convenient way for us to convert c1 models. Now that conversion is more or less done, we should probably require any users who need to convert c1 models to explicitly install c1. This PR removes the explicit c1 proto (which was copied from c1) in favor of explicit installation. Note that caffe_translator would still work properly, only difference is that now users need to install c1 separately. Pull Request resolved: https://github.com/pytorch/pytorch/pull/10380 Differential Revision: D9267981 Pulled By: Yangqing fbshipit-source-id: a6ce5d9463e6567976da83f2d08b2c3d94d14390	2018-08-10 11:10:26 -07:00
peter	506142ac8a	Add warning for building PyTorch using Python 2.7 on Windows (#10247 ) Summary: Fixes #9232. Pull Request resolved: https://github.com/pytorch/pytorch/pull/10247 Differential Revision: D9178257 Pulled By: SsnL fbshipit-source-id: cc553335a5a918b6d77fe1064460cb66114859ca	2018-08-05 21:24:02 -07:00
Shuichi KITAGUCHI	df23bdc82d	add BEGIN NOT-CLEAN-FILES marker to .gitignore. (#10233 ) Summary: Using Visual Studio Code and Visual Studio, these IDEs store configurations to `FOLDER/.vscode` and `FOLDER/.vs`. But "setup.py clean" deletes these folders because those are described in `.gitignore` file. To prevent this, add "BEGIN NOT-CLEAN-FILES" marker to `.gitignore` file and "setup.py clean" ignores lines after this marker. Discussed in #10206 Pull Request resolved: https://github.com/pytorch/pytorch/pull/10233 Differential Revision: D9175515 Pulled By: ezyang fbshipit-source-id: 24074a7e6e505a3d51382dc5ade5c65c97deda37	2018-08-05 15:55:44 -07:00
Elias Ellison	170d29769b	Strings lexing, parsing, implementation in print (#9324 ) Summary: This PR adds strings to the ast and implements them for print statements. Strings are lifted as attributes to the print node. They must be arguments to print itself, not as an argument for an object that is passed to print. If they are encountered elsewhere a NYI exception will be thrown. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9324 Reviewed By: jramseyer Differential Revision: D8807128 Pulled By: eellison fbshipit-source-id: 984401ff458ed18d473c6d1bd86750e56c77d078	2018-08-02 11:09:03 -07:00
Gregory Chanan	2d56b5cf8b	Prepare THC for first class scalars (0-dimensional tensors). Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10072 Differential Revision: D9082421 Pulled By: gchanan fbshipit-source-id: d4327b07aaef85cc2521393008154ebceae8cbfd	2018-08-01 14:28:51 -07:00
Edward Yang	37a226de63	When BUILD_ATEN=OFF, use ATen/core directly (#10019 ) Summary: ATenCore.h is a dummy header to just test that this is working at all. Pull Request resolved: https://github.com/pytorch/pytorch/pull/10019 Reviewed By: smessmer Differential Revision: D9067262 Pulled By: ezyang fbshipit-source-id: 58bab9c0aa83b56335e36b719b9b6505400d8dee	2018-07-30 21:09:55 -07:00
Edward Yang	a08119afc2	Eliminate direct access to size/strides of THTensor; replace them with std::vector (#9561 ) Summary: * THTensor now stores `sizes_` and `strides_` which is a `std::vector<int64_t>` * Anywhere a "public" API function made use of a int64_t* of sizes, I opted to just finagle it out of the tensor using THTensor_getSizePtr rather than try to rewrite all of these sites to use ArrayRef. They should use ArrayRef eventually, but not yet. * There are new utility functions for resizing sizes/strides in one go (THTensor_resizeDim), or replacing sizes and strides with completely new values (THTensor_setSizesAndStrides) * Anywhere you said `t->size[n] = 0`, we now say `THTensor_setSizeAt(t, n, 0)`, ditto for strides * Anywhere you said `t->size[n]`, we now say `t->size(n)` (coming soon: ditto for strides) Previous review of just the `std::vector` change in #9518, but I'm planning to merge this all in one go. Note for gchanan: review from commit "ci" and after Pull Request resolved: https://github.com/pytorch/pytorch/pull/9561 Reviewed By: cpuhrsch Differential Revision: D8901926 Pulled By: ezyang fbshipit-source-id: 483cf275060ab0a13845cba1ece39dd127142510	2018-07-19 14:10:06 -07:00
Anders Papitto	4c615b1796	Introduce libtorch to setup.py build (#8792 ) Summary: Prior to this diff, there have been two ways of compiling the bulk of the torch codebase. There was no interaction between them - you had to pick one or the other. 1) with setup.py. This method - used the setuptools C extension functionality - worked on all platforms - did not build test_jit/test_api binaries - did not include the C++ api - always included python functionality - produced _C.so 2) with cpp_build. This method - used CMake - did not support Windows or ROCM - was capable of building the test binaries - included the C++ api - did not build the python functionality - produced libtorch.so This diff combines the two. 1) cpp_build/CMakeLists.txt has become torch/CMakeLists.txt. This build - is CMake-based - works on all platforms - builds the test binaries - includes the C++ api - does not include the python functionality - produces libtorch.so 2) the setup.py build - compiles the python functionality - calls into the CMake build to build libtorch.so - produces _C.so, which has a dependency on libtorch.so In terms of code changes, this mostly means extending the cmake build to support the full variety of environments and platforms. There are also a small number of changes related to the fact that there are now two shared objects - in particular, windows requires annotating some symbols with dllimport/dllexport, and doesn't allow exposing thread_local globals directly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/8792 Reviewed By: ezyang Differential Revision: D8764181 Pulled By: anderspapitto fbshipit-source-id: abec43834f739049da25f4583a0794b38eb0a94f	2018-07-18 14:59:33 -07:00
Chunli Fu	a487b08c2e	AutoBatching - IR transformation(basic operators) (#9198 ) Summary: Use decorator `torch.jit.batch` to implement auto-batching (call `to_batch` pass to do IR transformation). - `to_batch` pass: "to_batch.h/cpp" in csrc/jit/passes to transform a graph to a new batched graph. - Write several basic operators for BatchTensor (add, mul, sigmoid, tanh, mm, matmul, select). - Register the operators in a lookup table `<std::string, std::shared_ptr<Graph>>`. (use the Graph to replace the original node in IR graph) Move BatchTensor in python from torch.BatchTensor to torch.jit.BatchTensor Pull Request resolved: https://github.com/pytorch/pytorch/pull/9198 Reviewed By: zdevito Differential Revision: D8744466 Pulled By: ChunliF fbshipit-source-id: 9ea56a30f55cb870f13a2069a47cc635419763ff	2018-07-11 18:25:07 -07:00
Adam Paszke	b9f575fc33	Remove legacy code from the JIT (#9323 ) Summary: In particular, get rid of backward tracing and CppOp. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9323 Reviewed By: ezyang Differential Revision: D8795935 Pulled By: apaszke fbshipit-source-id: fb7a7eeee41902da35f2a8efd77262ca60fd6bbe	2018-07-11 10:25:38 -07:00
Zachary DeVito	efefd1d7cf	Unify aten_dispatch and aten_schema into a single operator abstraction with human-readable schema. (#8885 ) Summary: This is a series of two commits that should probably be read separately. They are stacked on top of #9018 since the second commit requires it for correctness. Commit 1 ======= This commit is the first in a series that will clean up how we handle declaring operators and intrinsics in the JIT to make it more modular and readable. This introduces readable declarations that can be used to register operators and switches gen_jit_dispatch to generate this schema. A follow up PR will remove the dispatch keys like "add-3" and resolve ops directly based on the registered schema, further simplifying the generation process. * Switches schema over to parsed declarations, in the future this will allow something like: ``` registry.register_intrinsic("foo(Tensor a, Tensor b) -> Tensor", [](Stack& stack) { ... }) ``` This will allow the scalable registration of intrinsics for lists, tuples, and other ops, as long as meta-data for these ops (e.g. derivatives and size propagation routines) is provided. The declarations resemble those used by PythonArgParser but have been significantly cleaned up to minimize the number of types that can appear in the declaration. We should strive to get the other parts of PyTorch switched over to this restricted declaration set when possible, but it is too much to do in a single PR. My hope is that eventually we will use a very similar language to describe declarations in C10, and this can serve as a guide for that. Parsing is done using the script lexer, so it is very robust to whitespace and extensible for future types. This removes the other way we encoded schema, and makes it easier to see what schema are registered. 
Current generated declarations: https://gist.github.com/zdevito/a96a17766fb3a098d69a91ee00abaaf6 * Switches how we handle attempting to use an integer in the place of a fixed-sized int list, such as in conv (e.g. 'int[3] stride=1'). Now that we can statically distinguish between int and Tensor, we handle the expansion as an implicit conversion in the compiler. This allows us to simplify the interpreter since it no longer needs to handle the conversion itself. * Schema declarations have been changed so that they match the type system in the IR exactly. In particular, attribute_info which was used by liftConstantAttributes has been dropped and constant attributes are lifted purely based on the type of the input. Type conversions in compiler have been simplified due to this change. * Error highlighting in ErrorReport now only reports at most 20 lines of code, to make reading where an error occurred easier. Commit 2 ======= This commit unifies aten_dispatch and aten_schema into a single Operator object that both contains schema and implementation information. In the future we can use this object to also contain functionality like shape prop and autodiff needed by all operators. Operators are registered globally, and dispatch logic uses the schema information to figure out which variant to use. Descriptor keys, a frequent source of inscrutable debug errors, have been removed. * Introduce Operator, to replace TensorOp. Unlike TensorOp, we use Operator for all op implementations, including primitives that may occur in the graphs. The only exceptions are ops that are only known to the interpreter like jumps, and GraphExecutors where we need to record additional debug info. * Adds a global registry for Operator implementations. aten_dispatch.cpp turns into register_aten_ops.cpp, which registers all the Operators for aten with the operator registry. register_prim_ops.cpp now contains the implementations for primitive operators that used to be in the interpreter. 
This means that it is now safe to use `getOperation(node)` to look up the true interpreter function for the node, which will simplify const-propagation passes. * Remove addInterpreterOpHandler in favor of global operator registry. * Instead of descriptors, we match Node arguments directly against FunctionSchema describing expected inputs in `matchSchema`. `matchSchema` knows how to parse both attributes and positional inputs from a node and match it to the appropriate registered operator. Debug error messages when we try to run an invalid operator are significantly improved: they now automatically display the schemas of the registered ops with the same name. * Merge aten_schema into register_aten_ops. Each Operator takes a string schema which is parsed to determine when to dispatch to that op. * Cleans up gen_jit_dispatch.py now that we do not need to write out descriptors. In particular, skip_scalar_overloads can be removed since Richard's code sorts declarations to put Tensor, Tensor declarations first. * remove matchSchemaAndLiftConstantAttributes and use emitBuiltinCall instead to remove code duplication * refactor stack manipulation functions into a separate header file. Pull Request resolved: https://github.com/pytorch/pytorch/pull/8885 Reviewed By: jamesr66a Differential Revision: D8751048 Pulled By: zdevito fbshipit-source-id: 312aabfbf88307c5f6ab947b6caf691468b94557	2018-07-10 10:24:48 -07:00
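The schema-driven dispatch described in the commit above can be illustrated with a small sketch. This is not the JIT's C++ implementation: `parse_schema`, `register_operator`, and `find_operator` are hypothetical names, the parser only handles plain `Type name` arguments (no defaults, no `int[3]` lists), and matching is by exact type list. It only shows the idea of registering an op under a human-readable schema string and listing the registered schemas when no overload matches.

```python
import re

# Hypothetical, simplified registry: operator name -> list of
# (schema string, argument type names, implementation).
_REGISTRY = {}

_SCHEMA_RE = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*->\s*(\w+)\s*$")


def parse_schema(schema):
    """Parse 'name(Type a, Type b) -> Ret' into (name, [(type, arg)], ret)."""
    match = _SCHEMA_RE.match(schema)
    if match is None:
        raise ValueError("malformed schema: %r" % schema)
    name, arg_text, ret = match.groups()
    args = []
    for piece in filter(None, (p.strip() for p in arg_text.split(","))):
        type_name, arg_name = piece.split()
        args.append((type_name, arg_name))
    return name, args, ret


def register_operator(schema, fn):
    """Register fn under the operator name found in the schema string."""
    name, args, _ret = parse_schema(schema)
    types = [type_name for type_name, _ in args]
    _REGISTRY.setdefault(name, []).append((schema, types, fn))


def find_operator(name, arg_types):
    """Return the implementation whose schema matches the argument types."""
    candidates = _REGISTRY.get(name, [])
    for _schema, types, fn in candidates:
        if types == list(arg_types):
            return fn
    # On failure, list every registered schema with the same name.
    known = "\n".join("  " + s for s, _, _ in candidates) or "  (none)"
    raise LookupError(
        "no overload of %r matches %s; registered schemas:\n%s"
        % (name, list(arg_types), known)
    )
```

Registering `"mul(Tensor a, Tensor b) -> Tensor"` and then calling `find_operator("mul", ["int"])` raises a `LookupError` whose message includes the registered schema, which mirrors the improved error reporting the commit describes.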
Edward Yang	d0d1820814	Add weak pointer and finalizer support directly to THStorage. (#9148 ) Summary: The underlying use-case is the file descriptor to storage cache in torch.multiprocessing.reductions. Previously, this was implemented by wrapping an existing allocator with a "weak ref" allocator which also knew to null out the weak reference when the storage died. This is terribly oblique, and prevents us from refactoring the allocators to get rid of per-storage allocator state. So instead of going through this fiasco, we instead directly implement weak pointers and finalizers in THStorage. Weak pointers to THStorage retain the THStorage struct, but not the data_ptr. When all strong references die, data_ptr dies and the finalizers get invoked. There is one major hazard in this patch, which is what happens if you repeatedly call _weak_ref on a storage. For cleanliness, we no longer shove our grubby fingers into the finalizer struct to see if there is already a Python object for the weak reference and return it; we just create a new one (no one is checking these Python objects for identity). This means if you keep calling it, we'll keep piling on finalizers. That's bad! But I am not going to fix it until it is actually a problem for someone, because then we need to add another caching layer. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/9148 Differential Revision: D8729106 Pulled By: ezyang fbshipit-source-id: 69710ca3b7c7e05069090e1b263f8b6b9f1cf72f	2018-07-10 06:25:33 -07:00
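As a rough analogy for the weak-pointer-plus-finalizer design in the commit above, the following sketch uses Python's standard `weakref` module. It is not the THStorage implementation: `Storage` is a stand-in class and `make_cache` is a hypothetical helper. The cache holds only weak references, and a finalizer removes the stale entry once the last strong reference goes away.

```python
import weakref


class Storage:
    """Stand-in for a storage object; not the real THStorage."""

    def __init__(self, name):
        self.name = name


def make_cache():
    """Build a key -> storage cache that never keeps a storage alive."""
    cache = {}

    def put(key, storage):
        cache[key] = weakref.ref(storage)
        # The finalizer runs when `storage` dies and drops the stale entry.
        # Calling put() repeatedly for the same storage piles up finalizers,
        # which is the same hazard the commit message points out.
        weakref.finalize(storage, cache.pop, key, None)

    def get(key):
        ref = cache.get(key)
        return ref() if ref is not None else None

    return cache, put, get
```

While a strong reference to the storage exists, `get(key)` returns it; after the last strong reference is dropped and the object is collected, the entry disappears from the cache.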
Peter Goldsborough	4498fb962b	Add space around operator (#9294 ) Summary: Fixes lint failure on master Pull Request resolved: https://github.com/pytorch/pytorch/pull/9294 Differential Revision: D8779010 Pulled By: goldsborough fbshipit-source-id: da1ea2604189fd704c22fa8a5770bd92845cea91	2018-07-09 20:24:21 -07:00
Jesse Hellemn	99ab082366	Making setup.py install work for Caffe2 (#8509 ) Summary: Tested on my mac on a pretty clean anaconda3 Pull Request resolved: https://github.com/pytorch/pytorch/pull/8509 Reviewed By: orionr Differential Revision: D8702257 Pulled By: pjh5 fbshipit-source-id: eda03ef9732da9fc56b31d909af5c0e39520d689	2018-07-09 18:10:58 -07:00
Zachary DeVito	819815d9c0	Fix missing compile_commands.json for aten (#9227 ) Summary: When we moved the libaten build into libcaffe2, we changed the location where it generated compile_commands.json such that it was no longer being picked up by the build script. This fixes it so it is still found. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9227 Reviewed By: goldsborough Differential Revision: D8757984 Pulled By: zdevito fbshipit-source-id: 73df26bf08d98f18ac841d6c0db7e332fd328ab6	2018-07-08 16:54:34 -07:00
Francisco Massa	f6027bb15d	Install hpp headers for CPP Extensions (#9182 ) Summary: With the Cppzation of a few files in `TH`/`THC`, the CPP extensions got broken whenever the user uses a feature from `THC` in their files, when pytorch is installed via `python setup.py install`. This addresses issues such as ``` /home/me/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/include/THC/THCDeviceTensorUtils.cuh:5:25: fatal error: THCTensor.hpp: No such file or directory ``` Closes https://github.com/pytorch/pytorch/pull/9182 Reviewed By: soumith Differential Revision: D8734581 Pulled By: fmassa fbshipit-source-id: 2a1138f208592eaccb01fcdb805a6b369d7a497a	2018-07-05 07:55:25 -07:00
Roy Li	c61f0217a5	combine size_average and reduce args in loss functions (#8018 ) Summary: closes #7929 Closes https://github.com/pytorch/pytorch/pull/8018 Differential Revision: D8682540 Pulled By: li-roy fbshipit-source-id: 649170dd1a7f373151c1d4e949838bd1c5651936	2018-07-01 05:39:00 -07:00
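The commit above folds two boolean loss arguments into one. A minimal sketch of such a mapping follows, assuming both legacy flags default to true when unset. `legacy_to_reduction` is a hypothetical helper, and the returned strings (`"mean"`, `"sum"`, `"none"`) are illustrative labels rather than a claim about the exact names this PR introduced.

```python
def legacy_to_reduction(size_average=None, reduce=None):
    """Map the legacy (size_average, reduce) pair to one reduction label."""
    # Assumption: an unset legacy flag behaves as True.
    if size_average is None:
        size_average = True
    if reduce is None:
        reduce = True
    if not reduce:
        # No reduction: keep the per-element losses.
        return "none"
    # Reduce by averaging or by summing.
    return "mean" if size_average else "sum"
```

Collapsing the two flags removes the combination in which `size_average` is set but ignored because `reduce` is false.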
Chunli Fu	67b21117b7	Add BatchTensor class (#8922 ) Summary: Add BatchTensor class - construct from data, mask, dims or construct from list of tensors - can return a list of tensors from a BatchTensor class next step: do IR level transformation and operators Closes https://github.com/pytorch/pytorch/pull/8922 Differential Revision: D8668986 Pulled By: ChunliF fbshipit-source-id: 8b24d2a9f46a3b42dbb397e99e9e059dfb2b326e	2018-06-29 15:57:27 -07:00
Zachary DeVito	f74207c99f	Allow autograd to work even when the shape of values cannot be determined (#8641 ) This commit implements the solution proposed in https://github.com/pytorch/pytorch/issues/8410 to work around the need to create zero tensors with the same shape as inputs. It introduces the concept of a LinearBlock which marks places in the code where we know if all the inputs to the node are zero, then the outputs to the node are also zero. Autodiff introduces LinearBlocks around backwards functions, which have this property. specializeUndef then propagates Undef nodes using this information. Notes: * Since we do not always specialize, we have a pass LowerLinearBlocks that replaces the block with an if statement that dynamically guards the Undef case. * We introduce AutogradAdd which is addition that still works when its inputs might be undefined. In cases where we specialize this will get removed in favor of a normal add, but there are cases where gradient graphs do not specialize (e.g. when they are not differentiable, but a derivative is required) so it is important for this op to be executable.	2018-06-25 18:40:04 -07:00
Orion Reblitz-Richardson	5a7b4840d9	Move nanopb-generated ONNX to unique file name (#8773 ) * Move nanopb-generated ONNX to unique file name * fix other places	2018-06-22 09:51:56 -04:00
Richard Zou	8489c4cc6e	Better support for literals in jit script (#8687 ) Addresses #8177 A design doc can be found here: [gist](https://gist.github.com/zou3519/4b7f13f03cc9f3612bd9363e6405fa0a) version or [quip](https://fb.quip.com/azL1AqUckBdo) version General approach: - Add NumberType, FloatType, IntType to represent Python numbers, floats and ints. - Emit these types for python literals - Change aten_schema such that Scalars are NumberType, int64_t and bool are IntType. - Emit aten::type_as, prim::NumToTensor, and prim::TensorToNum nodes for tensor-number math. (see examples below) - Erase NumberType, prim::NumToTensor, and prim::TensorToNum for ONNX export ### Tensor/number math ``` import torch @torch.jit.script def fn(x): return x + 1 ``` ``` graph(%x : Dynamic) { %1 : int = prim::Constant[value={1}]() %2 : Dynamic = prim::NumToTensor(%1) %3 : Dynamic = aten::type_as(%2, %x) %4 : Dynamic = aten::add[alpha={1}](%x, %4) return (%5); } ``` ### Number/Number Math ``` import torch @torch.jit.script def fn(zero): c = 1 + 1 return zero + c ``` ``` graph(%zero : Dynamic) { %1 : int = prim::Constant[value={1}]() %2 : int = prim::Constant[value={1}]() %3 : Dynamic = prim::num_to_tensor(%1) %4 : Dynamic = prim::num_to_tensor(%2) %5 : Dynamic = aten::add[alpha={1}](%3, %4) %c : int = prim::TensorToNum(%6) # this is the result of the addition ... return (%13); } ``` List of squashed commits: * Introduce Python Number types Added: IntType, FloatType, NumberType with IntType <: NumberType FloatType <: NumberType Changed aten_schema so arguments have corresponding types * Emit a NumberType for python literals. Also emit a NumberType for Scalar default values. * Add prim::NumToTensor and prim::TensorToNum * Add DynamicType -> NumberType implicit cast for bc * Better ensureTensor error message * Add ensureTensorOrNumber. Allow passing Number to some functions Like the range() construct and slices * Patch IntList to work. 
IntList is still a DynamicType in the frontend: a tensor gets built from a List[int]. Also, IntList[1] is a "union between int and IntList" the way it is implemented. If the frontend sees an int being passed for an IntList[1] arg, it converts it to a tensor as well. * Enforce some order on schemas to avoid overload ambiguity add(Tensor, Tensor) should appear earlier than add(Tensor, Scalar). This matches the order in which python_arg_parser parses its arguments. * Disable std_dim and var_dim tests. With the new schema information, std(input, keepdim) and std(input, dim) are ambiguous. This will need to be fixed at a later date. * Add NumberType erasure pass. This is used for ONNX export and to ensure that NumberType information doesn't reach the interpreter * Add support for mixed tensor/number math ops. * Tests for new functionality. Includes: - Tensor/number math - number/number math - EraseNumberTypes pass test * Patch tests Update expect tests for: - decompose_addmm - loop unrolling tests Because python numbers are now NumberType, they cannot be returned by functions anymore. Work around this by using "torch.full", or by adding a tensor([0]) (taken from FIXME_zerol()). Both approaches are used because torch.full is more readable, but it is broken in some cases. * Add erase_number_types to torch/CMakeLists.txt * Move math back to emitSimpleExpr from emitSugaredExpr * Remove some dead lines * Re-enable some excluded script/trace tests that are fixed. * Move some tests to expected failure * Address some comments (more addressing to come) * Erase relevant aten::type_as nodes in EraseNumberTypes I also changed it so that EraseNumberTypes is only called for ONNX export. It is no longer used to prevent prim::NumToTensor/prim::TensorToNum from reaching shape_analysis or interpreter.cpp. shape_analysis infers the type of the output of these nodes to be the same as their input. interpreter.cpp treats both of these nodes as no-ops. 
* Add reminder to fix std/var * Call EraseNumberTypes only when exporting a script module * Update expects after rebase	2018-06-21 15:43:38 -04:00
anderspapitto	48e90e3339	Build system changes (#8627 ) * All changes needed to get rid of process_github.sh * allow thnn_h_path	2018-06-20 17:45:26 -04:00
Teng Li	61c96811be	[c10d] NCCL python binding and CI test, with bug fixes (#8357 ) * [c10d] NCCL python binding and CI test, with bug fixes * Addressed comments and further bug fix * Made NCCL build optional, made C10D libc10d.a only * Fixed tests so that NCCL pg won't run when not needed * Addressed comments	2018-06-19 13:02:39 -07:00
cpuhrsch	05c473b85c	Temporarily remove TBB (#8255 )	2018-06-18 19:31:57 -04:00
Peter Goldsborough	372d1d6735	Create ATen tensors via TensorOptions (#7869 ) * Created TensorOptions Storing the type in TensorOptions to solve the Variable problem Created convenience creation functions for TensorOptions and added tests Converted zeros to TensorOptions Converted rand to TensorOptions Fix codegen for TensorOptions and multiple arguments Put TensorOptions convenience functions into torch namespace too All factory functions except _like support TensorOptions Integrated with recent JIT changes Support _like functions Fix in place modification Some cleanups and fixes Support sparse_coo_tensor Fix bug in Type.cpp Fix .empty calls in C++ API Fix bug in Type.cpp Trying to fix device placement Make AutoGPU CPU compatible Remove some auto_gpu.h uses Fixing some headers Fix some remaining CUDA/AutoGPU issues Fix some AutoGPU uses Fixes to dispatch_tensor_conversion Reset version of new variables to zero Implemented parsing device strings Random fixes to tests Self review cleanups flake8 Undo changes to variable.{h,cpp} because they fail on gcc7.2 Add [cuda] tag to tensor_options_cuda.cpp Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks Fix linker error in AutoGPU.cpp Fix bad merge conflict in native_functions.yaml Fixed caffe2/contrib/aten Fix new window functions added to TensorFactories.cpp * Removed torch::TensorOptions Added code to generate wrapper functions for factory methods Add implicit constructor from Backend to TensorOptions Remove Var() from C++ API and use torch:: functions Use torch:: functions more subtly in C++ API Make AutoGPU::set_device more exception safe Check status directly in DynamicCUDAHooksInterface Rename AutoGPU to DeviceGuard Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad remove python_default_init: self.type() Add back original factory functions, but with deprecation warnings Disable DeviceGuard for a couple functions in ATen Remove print 
statement Fix DeviceGuard construction from undefined tensor Fixing CUDA device compiler issues Moved as many methods as possible into header files Dont generate python functions for deprecated factories Remove merge conflict artefact Fix tensor_options_cuda.cpp Fix set_requires_grad not being checked Fix tensor_new.h TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac Fix bug in DeviceGuard.h Missing includes TEMPORARILY moving a few more methods into .cpp to see if it fixes windows Fixing linker errors * Fix up SummaryOps to use new factories Undo device agnostic behavior of DeviceGuard Use -1 instead of optional for default device index Also move DeviceGuard methods into header Fixes around device index after optional -> int32_t switch Fix use of DeviceGuard in new_with_tensor_copy Fix tensor_options.cpp * Fix Type::copy( * Remove test_non_float_params from ONNX tests * Set requires_grad=False in ONNX tests that use ints * Put layout/dtype/device on Tensor * Post merge fixes * Change behavior of DeviceGuard to match AutoGPU * Fix C++ API integration tests * Fix flip functions	2018-06-16 00:40:35 -07:00
Tongzhou Wang	c537fd7432	fix lint (#8567 )	2018-06-15 17:34:39 -04:00
Soumith Chintala	dc186cc9fe	Remove NO_* and WITH_* across codebase, except in setup.py (#8555 ) * remove legacy options from CMakeLists * codemod WITH_ to USE_ for WITH_CUDA, WITH_CUDNN, WITH_DISTRIBUTED, WITH_DISTRIBUTED_MW, WITH_GLOO_IBVERBS, WITH_NCCL, WITH_ROCM, WITH_NUMPY * cover SYSTEM_NCCL, MKLDNN, NNPACK, C10D, NINJA * removed NO_* variables and hotpatch them only in setup.py * fix lint	2018-06-15 12:29:48 -04:00
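The commit above keeps the legacy `NO_*` variables working by translating them in `setup.py` only. A minimal sketch of that kind of translation follows; it is not the code from `setup.py`. `hotpatch_legacy_flags`, its default flag list, and the set of accepted truthy spellings are all assumptions made for illustration.

```python
def hotpatch_legacy_flags(env, names=("CUDA", "CUDNN", "NCCL", "DISTRIBUTED")):
    """Return a copy of env where legacy NO_<X> flags are mirrored as USE_<X>."""
    truthy = ("1", "ON", "TRUE", "YES", "Y")  # assumed spellings
    patched = dict(env)
    for name in names:
        legacy = "NO_" + name
        modern = "USE_" + name
        # An explicit USE_<X> wins; otherwise derive it from NO_<X>.
        if legacy in patched and modern not in patched:
            disabled = patched[legacy].strip().upper() in truthy
            patched[modern] = "0" if disabled else "1"
    return patched
```

Passing a copy of `os.environ` through such a helper once, early in the build script, lets the rest of the build read only the `USE_*` names.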
Orion Reblitz-Richardson	edd4e2c5d1	Expose proto utils and ONNX (#8073 ) * Expose proto utils and ONNX from PyTorch libcaffe2.so * Try to use protobuf from _C.so * Fix ONNX proto header include * Adjust order of imports for ONNX until nanopb goes away * Set and use ONNX_NAMESPACE for PyTorch builds * Show protobuf summary for all builds * Add ONNX_NAMESPACE for cpp_build * Statically link libprotobuf.a into libtorch.so * Set ONNX_NAMESPACE on Windows build * Move core/dispatch up as well * Add /MD flag for Windows build of _C * Potential Windows fix for ONNX and protobuf * Add direct linkage from _C to ONNX on Windows * Only include protobuf wrapper for PyTorch * Pass extra_compile_args to _nvrtc ext build * Remove installation of .a files	2018-06-13 10:25:32 -07:00
Jorghi12	81b92f7515	Get ROCm building again on master (#8343 ) Billing of changes: - New Jenkins script for building on rocm. For now it is a bit hacked together, but we can improve it once CI is running - New ROCM docker image for nightly HIP, and also some legacy packages that we need temporarily - New enabled config py2-clang3.8-rocmnightly-ubuntu16.04-build based off of the existing Caffe2 image (not built yet) - A big pile of cmake fixes, mostly to turn bits on/off when ROCM build is involved - Switch from hiprng to hcrng - Apply some patches directly in code, eliminating the patches - Use __hdiv instead of hdiv, it's more portable - THCNumerics<T>::gt doesn't work in HIP, so simulate it with sub - Add a few more overloads HIP needs - Turn off use of hcc to link (we plan to turn this back on to get tests running) - Search for hiprand, hiprng, hipblas, hipsparse - Better Python 2 portability	2018-06-12 23:05:21 -04:00
albanD	78e3259bbe	Add autograd automatic anomaly detection (#7677 ) * add autograd automatic anomaly detection * python 3 string support * Fix non python build * fix typo in doc * better test and naming fix * fix no python build and python object handling * fix missing checks * clean NO_PYTHON build * Remove unwanted changes	2018-06-11 21:26:17 -04:00
Pieter Noordhuis	695d40efc2	Create initial Python bindings for c10d (#8119 ) * Build and install c10d from tools/build_pytorch_libs.sh * Create initial Python bindings for c10d * clang-format * Switch link order to include more symbols * Add bindings and tests for ProcessGroupGloo * Add broadcast test * Separate build flag for c10d * Explicit PIC property * Skip c10d tests if not available * Remove c10d from Windows blacklist Let it skip by itself because it won't be available anyway. * Make lint happy * Comments * Move c10d module into torch.distributed * Close tempfile such that it is deleted	2018-06-08 12:59:51 -07:00
Soumith Chintala	5e372c7106	fix lint	2018-06-06 12:53:58 -04:00
Paul Jesse Hellemn	8e6f7a1382	[Caffe2] Merging setup.py with setup_caffe2.py (#8129 ) * Merging setup.pys, torch works, caffe2 works up to other KP * Fix to super call for python 2 * Works on python2 on mac * Consolidating Caffe2 flags	2018-06-06 08:31:31 -07:00
Adam Paszke	f45a3d5558	Add a loop unrolling pass to PyTorch JIT (#7672 )	2018-06-06 09:36:12 +02:00
Zachary DeVito	23dd033b51	Factor python dependency out of interpreter (#7970 ) * Factor python dependency out of interpreter * Remove NO_PYTHON for the autograd engine If there are no python bindings, then a default Engine is constructed the first time it is requested. If the python libraries are loaded, then they override the default accessor and the default engine becomes a python Engine. Note: it is possible for two engines to be generated if a non-python one gets created before the python bindings are loaded. This case is rare, and just results in additional threads being spawned. * Fixing AlexNet test which is skipped in CI	2018-06-01 16:07:21 -04:00
James Reed	1f94a6eab3	[JIT] Fission and fusion passes for addmm (#7938 ) * Addmm decomposition pass * Addmm peephole pass * Fix handling of output shape in fusion pass * Add DCE to the peephole passes * add comments * maybe bugfix? * Fix GPU tests * fix py2/3 test issue	2018-05-30 18:06:58 -04:00
Orion Reblitz-Richardson	4bf0202cac	[build] Have PyTorch depend on minimal libcaffe2.so instead of libATen.so (#7399 ) * Have PyTorch depend on minimal libcaffe2.so instead of libATen.so * Build ATen tests as a part of Caffe2 build * Hopefully cufft and nvcc fPIC fixes * Make ATen install components optional * Add tests back for ATen and fix TH build * Fixes for test_install.sh script * Fixes for cpp_build/build_all.sh * Fixes for aten/tools/run_tests.sh * Switch ATen cmake calls to USE_CUDA instead of NO_CUDA * Attempt at fix for aten/tools/run_tests.sh * Fix typo in last commit * Fix valgrind call after pushd * Be forgiving about USE_CUDA disable like PyTorch * More fixes on the install side * Link all libcaffe2 during test run * Make cuDNN optional for ATen right now * Potential fix for non-CUDA builds * Use NCCL_ROOT_DIR environment variable * Pass -fPIC through nvcc to base compiler/linker * Remove THCUNN.h requirement for libtorch gen * Add Mac test for -Wmaybe-uninitialized * Potential Windows and Mac fixes * Move MSVC target props to shared function * Disable cpp_build/libtorch tests on Mac * Disable sleef for Windows builds * Move protos under BUILD_CAFFE2 * Remove space from linker flags passed with -Wl * Remove ATen from Caffe2 dep libs since directly included * Potential Windows fixes * Preserve options while sleef builds * Force BUILD_SHARED_LIBS flag for Caffe2 builds * Set DYLD_LIBRARY_PATH and LD_LIBRARY_PATH for Mac testing * Pass TORCH_CUDA_ARCH_LIST directly in cuda.cmake * Fixes for the last two changes * Potential fix for Mac build failure * Switch Caffe2 to build_caffe2 dir to not conflict * Cleanup FindMKL.cmake * Another attempt at Mac cpp_build fix * Clear cpp-build directory for Mac builds * Disable test in Mac build/test to match cmake	2018-05-24 07:47:27 -07:00
Zachary DeVito	286cd04a20	JIT cleanup (#7631 ) Cleans up dead code in the JIT: * Remove interpreter_autograd_function * Remove Handles * Remove HandleBuilder * Remove creates_handles, and tracing_autograd_python_function flags * Remove unused var_args * Fix submodules	2018-05-21 10:06:29 -07:00
Adam Paszke	b45f2ff1ae	Remove CompiledFunction + clean up JIT tests (#7421 )	2018-05-16 20:03:04 +02:00
Jorghi12	cd86d4c554	PyTorch AMD Build Scripts (#6625 ) * PyTorch AMD Build Script. * Python invocation for hipify * Adding individual hip files. * Updating CWD Use the actual path for the file instead of the current working directory, which depends on where the script is invoked. * Updating folder path for amd_build * Removing previous amd_build directory * Updated setup.py to support WITH_ROCM * Renaming the files for CuDNN BatchNorm & Conv since having two .cpp files with the same name results in a linking error in the HCC compiler used for ROCm/AMD. * Removing old BatchNorm & Conv files since they've been renamed. * Updating build path to handle ROCM * Cleaned up the build path and created a FindHIP cmake file for setting up relevant hip paths. * Separated the individual patch files to make it easier to detect issues while building. * Removed CMakeLists hip files and fixed directory structure * Adding build pytorch amd script * Merged setup patch into PyTorch setup.py & cleaned a few issues * Added information on where to download the hipify-python script. * Resolved linting issues inside of build_pytorch_amd.py * Removing many unnecessary patch files. Removing unnecessary .hip files. Fixing up the build process. * Refactored the PR for supporting HIP * Minimizing the number of changes inside individual patches. * Cleaned up patch files. * Removed patch files. * Updating patches * Removing HIP change from file. * Cleaned up patches * Added AVX/SSE avoidance due to bug with ROCm's stack. Just temporary for now. * Removing the other HIP file * Removed patch file + merged ROCm into Aten/test * Removed ATen tests patch file and updated disable_features yaml to remove headers that don't exist on the HIP stack. * Reduced the number of patches down to 14 after Edward's suggestions. * Transferred deletion of certain functions from patch to yaml file. * Set default Thrust path * Fixed aten files so we now use the templated pow/abs instead of std:: directly. 
* Removed error from aten/src/THCUNN/Abs.cu * Updated the locations of the cmake build files. Moved THCTensorRandom from a hip to a patch file. Added executable/library commands that can successfully handle either CUDA or HIP. * Removed hip extraction from the build script and removed the old hip file. * Replaced MACRO with function in upper level cmake. * Added empty ELSE() block to prevent the loading of a command without CUDA or HIP. Also added IF guards around torch_cuda_based_add_executable in Aten tests. * Updated aten tests. * Removed the hip include from the ATen header. * Can't throw exceptions on C++ AMP, using abort * Missing IF guards for cuda/hip executables in aten tests. * Removed a series of patch files. * Added template keyword to help out the HCC compiler. * Rebased the specific files displayed in the PR * Fixing typo. * Change flag from "WITH_CUDA" to "NOT NO_CUDA" Replacing "WITH_CUDA" with "NOT NO_CUDA" after the rebase. * Fix LoadHIP path * Updating build files after rebasing. * Reorganization after cpu/gpu separation. * Removed HIPCC from setup.py & removed -shared extra linking args. * Updated CMake / Setup build to correctly link when under ROCm stack. * Removed the unnecessary argument from Extension constructor. * Adding another test to be included with ROCm building. * Updated the setup_helpers scripts in order to get around linter error * Fix syntax issue * Solving lint issue: line too long	2018-05-15 18:38:01 -07:00
Zachary DeVito	ce69d3110b	Improve script builtin checking using schema (#7311 ) Improve script builtin checking using schema * This add aten_schema.h which provides a barebones amount of type and argument information about each builtin operator * emitBuiltinCall is updated to use this information rather than aten_dispatch to ensure the operator is correct. * handling of keyword and position arguments now matches python behavior * There is no longer a requirement that kwargs be constant or that the attributes of an op must be entirely constant or non-constant * compiler now constructs a non-attributed version of the op first and then turns it into the constant-attribute version if all attributes are constants. * default arguments for builtins now work * SugaredValue::call and similar functions now have SourceRange information for their arguments so that error reporting is more accurate Notes: * This does not try to merge the builtin checking with python arg parser. Given that we will eventually have C10 schema which will replace aten_schema, we will eventually have a C++ description of the schema and working of that description directly will be the easiest form to understand. * python function calls and script method calls do not support keyword arguments yet. When we add this support we should refactor the handling in tryEmitSchema that resolves keywords into a common function. * default arguments work * keyword arguments to builtins work (still need to extend to calling python and other script methods) * much better error reporting for incorrect builtins Lift any constants to attributes on nodes when possible * Schema is usable internally in the compiler as the function signatures of script functions as well as for builtin operators. * Adds a List[T] class to better represent the arguments to cat/stack as a type rather than with custom checking. 
* Support kwargs for calls of script methods A future commit will be needed to add support for: * calls to script _functions_ which are currently are GraphExecutors without schema info. * kwargs to python functions, which will require refactoring python op	2018-05-14 14:46:36 -07:00
Zachary DeVito	93eb50c103	Mark expand nodes as implicit/explicit in trace (#7303 ) When tracing we record expand nodes. This is useful in some cases because it makes it clear a broadcast happened. However, in future runs the broadcast may be different or not needed. This change adds an attribute to expand to track if it was implicitly added. This takes the form of an unused input to expand with a default value. The execution engine then removes implicit expands before execution. Note that shape_analysis will re-add expands when it can prove by shape analysis that they will exist and this is useful for the fuser, so this change should not affect fusion passes.	2018-05-10 10:47:43 -07:00
Edward Z. Yang	64834f6fb8	Split libATen.so into libATen_cpu.so and libATen_cuda.so (#7275 ) * Split libATen.so into libATen_cpu.so and libATen_cuda.so Previously, ATen could be built with either CPU-only support, or CPU/CUDA support, but only via a compile-time flag, requiring two separate builds. This means that if you have a program which indirectly uses a CPU-only build of ATen, and a CPU/CUDA-build of ATen, you're gonna have a bad time. And you might want a CPU-only build of ATen, because it is 15M (versus the 300M of a CUDA build). This commit splits libATen.so into two libraries, CPU/CUDA, so that it's not necessary to do a full rebuild to get CPU-only support; instead, if you link against libATen_cpu.so only, you are CPU-only; if you additionally link/dlopen libATen_cuda.so, this enables CUDA support. This brings ATen's dynamic library structure more similar to Caffe2's. libATen.so is no more (this is BC BREAKING) The general principle for how this works is that we introduce a hooks interface, which introduces a dynamic dispatch indirection between a call site and implementation site of CUDA functionality, mediated by a static initialization registry. This means that we can continue to, for example, lazily initialize CUDA from Context (a core, CPU class) without having a direct dependency on the CUDA bits. Instead, we look up in the registry if, e.g., CUDA hooks have been loaded (this loading process happens at static initialization time), and if they have been we dynamic dispatch to this class. We similarly use the hooks interface to handle Variable registration. We introduce a new invariant: if the backend of a type has not been initialized (e.g., its library has not been dlopened; for CUDA, this also includes CUDA initialization), then the Type pointers in the context registry are NULL. If you access the registry directly you must maintain this invariant. There are a few potholes along the way. 
I document them here: - Previously, PyTorch maintained a separate registry for variable types, because no provision for them was made in the Context's type_registry. Now that we have the hooks mechanism, we can easily have PyTorch register variables in the main registry. The code has been refactored accordingly. - There is a subtle ordering issue between Variable and CUDA. We permit libATen_cuda.so and PyTorch to be loaded in either order (in practice, CUDA is always loaded "after" PyTorch, because it is lazily initialized.) This means that, when CUDA types are loaded, we must subsequently also initialize their Variable equivalents. Appropriate hooks were added to VariableHooks to make this possible; similarly, getVariableHooks() is not referentially transparent, and will change behavior after Variables are loaded. (This is different to CUDAHooks, which is "burned in" after you try to initialize CUDA.) - The cmake is adjusted to separate dependencies into either CPU or CUDA dependencies. The generator scripts are adjusted to either generate a file as a CUDA (cuda_file_manager) or CPU file (file_manager). - I changed all native functions which were CUDA-only (the cudnn functions) to have dispatches for CUDA only (making it permissible to not specify all dispatch options.) This uncovered a bug in how we were handling native functions which dispatch on a Type argument; I introduced a new self_ty keyword to handle this case. I'm not 100% happy about it but it fixed my problem. This also exposed the fact that set_history incompletely handles heterogenous return tuples combining Tensor and TensorList. I swapped this codegen to use flatten() (at the possible cost of a slight perf regression, since we're allocating another vector now in this code path). - thc_state is no longer a public member of Context; use getTHCState() instead - This PR comes with Registry from Caffe2, for handling static initialization. 
I needed to make a bunch of fixes to Registry to make it more portable - No more ##__VA_ARGS__ token pasting; instead, it is mandatory to pass at least one argument to the var-args. CUDAHooks and VariableHooks pass a nullary struct CUDAHooksArgs/VariableHooksArgs to solve the problem. We must get rid of token pasting because it does not work with MSVC. - It seems MSVC is not willing to generate code for constructors of template classes at use sites which cross DLL boundaries. So we explicitly instantiate the class to get around the problem. This involved tweaks to the boilerplate generating macros, and also required us to shuffle around namespaces a bit, because you can't specialize a template unless you are in the same namespace as the template. - Insertion of AT_API to appropriate places where the registry must be exported - We have a general problem which is that on recent Ubuntu distributions, --as-needed is enabled for shared libraries, which is (cc @apaszke who was worrying about this in #7160 see also #7160 (comment)). For now, I've hacked this up in the PR to pass -Wl,--no-as-needed to all of the spots necessary to make CI work, but a more sustainable solution is to attempt to dlopen libATen_cuda.so when CUDA functionality is requested. - The JIT tests somehow manage to try to touch CUDA without loading libATen_cuda.so. So we pass -Wl,--no-as-needed when linking libATen_cuda.so to _C.so - There is a very subtle linking issue with lapack, which is solved by making sure libATen_cuda.so links against LAPACK. There's a comment in aten/src/ATen/CMakeLists.txt about this as well as a follow up bug at #7353 - autogradpp used AT_CUDA_ENABLED directly. 
We've expunged these uses and added a few more things to CUDAHooks (getNumGPUs) - Added manualSeedAll to Generator so that we can invoke it polymorphically (it only does something different for CUDAGenerator) - There's a new cuda/CUDAConfig.h header for CUDA-only ifdef macros (AT_CUDNN_ENABLED, most prominently) - CUDAHooks/VariableHooks structs live in at namespace because Registry's namespace support is not good enough to handle it otherwise (see Registry changes above) - There's some modest moving around of native functions in ReduceOps and UnaryOps to get the CUDA-only function implementations into separate files, so they are only compiled into libATen_cuda.so. sspaddmm needed a separate CUDA function due to object linkage boundaries. - Some direct uses of native functions in CUDA code have to go away, since these functions are not exported, so you have to go through the dispatcher (at::native::empty_like to at::empty_like) - Code in THC/THCS/THCUNN now properly uses THC_API macro instead of TH_API (which matters now that TH and THC are not in the same library) - Added code debt in torch/_thnn/utils.py and other THNN parsing code to handle both TH_API and THC_API - TensorUtils.h is now properly exported with AT_API - Dead uses of TH_EXPORTS and co expunged; we now use ATen_cpu_exports and ATen_cuda_exports (new, in ATenCUDAGeneral.h) consistently - Fix some incorrect type annotations on _cudnn_rnn_backward, where we didn't declare a type as possibly undefined when we should have. We didn't catch this previously because optional annotations are not tested on "pass-through" native ATen ops (which don't have dispatch). Upstream issue at #7316 - There's a new cmake macro aten_compile_options for applying all of our per-target compile time options. We use this on the cpu and cuda libraries. 
- test/test_cpp_extensions.py can be run directly by invoking in Python, assuming you've set up your PYTHONPATH correctly - type_from_string does some new funny business to only query for all valid CUDA types (which causes CUDA initialization) when we see "torch.cuda." in the requested string Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Last mile libtorch fixes Signed-off-by: Edward Z. Yang <ezyang@fb.com> * pedantic fix Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2018-05-10 10:28:33 -07:00
Peter Goldsborough	54a4867675	Bring back C++ extension torch.h (#7310 ) * Bring back C++ extension torch.h * Fix python.h include in python_tensor.cpp	2018-05-05 14:06:27 -07:00
Peter Goldsborough	feb64b5291	Add -Wno-unknown-pragmas (#7291 )	2018-05-04 13:44:13 -07:00
Peter Goldsborough	67d0d14908	Rename autograd namespace to torch and change torch.h into python.h (#7267 ) * Rename autograd namespace to torch and change torch.h into python.h * Include torch.h instead of python.h in test/cpp/api * Change some mentions of torch.h to python.h in C++ extensions * Set paths directly, without find_path	2018-05-04 08:04:57 -07:00
Soumith Chintala	92f54e1f01	remove static libstdc++ linking and PYTORCH_BINARY_BUILD env variable (#7259 )	2018-05-03 12:32:57 -07:00
Luca Antiga	5d3c3c53aa	Add raw IR serialization/deserialization (#6392 )	2018-05-01 20:21:29 +02:00
Luca Antiga	0703357723	Don't build THD/master_worker if not explicitly requested (#7081 )	2018-04-29 13:17:09 -04:00
James Reed	4667983f0f	Fixes for interpreter and ONNX export for translation (#7044 ) Fixes for interpreter and ONNX export for translation Address comments	2018-04-27 22:23:57 -07:00
Peter Goldsborough	7b09bc72a5	[WIP] Enable WERROR in tests (#6539 ) * Enable WERROR in tests * Also set WERROR=1 for cpp_build in CI * Enable Werror after the compiler checks * Remove -DWERROR because it's picked up from the env var * Had to fix some errors in aten/contrib/data * Allow an uninitialized variable in ReduceOpsKernel.cpp * Use CUDNN_DATA_UINT8 in cuDNN type string conversion * Fixes and use target_compile_options * Fix uninitialized variables in THNN * Include Python.h earlier in tensor_types.cpp * Use CUDNN_VERSION 7100 instead of 7000? * More Python.h includes * Make switch case in common_subexpression_elimination.cpp exhaustive * Build with WERROR=0 just to see all the warnings * Remove some Python includes * Enable WERROR=1 again * Bring back switch case default	2018-04-28 01:51:16 +01:00
Soumith Chintala	bd14d8e8f8	add additional caffe/caffe2 paths to exclude list in pytorch setup.py (#6891 )	2018-04-25 22:10:38 -04:00
Orion Reblitz-Richardson	dec5e99e99	[aten] Move submodules to third_party (#6866 ) * [aten] Move submodules to third_party * [aten] Update aten_mirror.sh script for third_party * [aten] Move ATen submodules def to root and rename * [aten] Update cpuinfo cmake build * [aten] Fix cpuinfo cmake build * Update third_party/cpuinfo to d03d5d296063063c66877fb559cf34469734e3e1 * [aten] Fix JIT test reference to catch	2018-04-24 23:33:46 -04:00
gchanan	1c7b0c1020	Update version string to 0.5. (#6795 )	2018-04-22 13:57:48 -04:00
bddppq	c43c911662	Export onnx protobuf bindings to python (#6651 ) * Export onnx protobuf bindings to python * rename native onnx module to _onnx	2018-04-17 16:38:57 -07:00
srib	53d2612b55	Fix a typo in the setup.py script (#6632 )	2018-04-16 15:29:45 -04:00
Zachary DeVito	825ce7f196	[jit][script] Allow tuples to be re-assigned (#6538 ) * Allow tuples to be re-assigned This commit improves our support of tuples by making them more first-class. In particular, it allows tuples to be re-assigned across loops and ifs. It does this by making them first-class values in the Graph IR, and then removing the tuples in a LowerTuples pass. An alternative approach would have added more support for desugaring tuples in the Environment object as they were emitted. Instead, the current approach was chosen anticipating a future when tuples are fully supported (including the interpreter). In that future, the current code can be completely reused with the LowerTuples pass just becoming an optimization that removes unneeded tuple allocations.	2018-04-13 17:34:50 -07:00
Peter Goldsborough	e4f1d3b538	Better warnings (#6428 ) * Better warnings * Remove -Wc++14-extensions because gcc does not know it * Warning fix in input_buffer.cpp * Remove pedantic for torch/csrc/ * Also use Wextra and Wall for ATen * Use check_env_flag * Undo changes in shape_analysis.cpp * Remove C linkage flag	2018-04-10 23:34:25 -07:00
peterjc123	5651695a99	Fixes #6386 , Use copies instead of symbolic files (#6396 ) * Use copies instead of symbolic files * bug fix * Remove useless item	2018-04-09 13:54:10 -04:00
Soumith Chintala	108f5c197f	[pytorch] add static linkage support for CuDNN and NCCL (#6410 ) * when linking static CUDA libs, additional dep on culibos.a * add USE_STATIC_NCCL option * add USE_STATIC_CUDNN option * remove libATen soversion * add caffe, caffe2 folders to setup.py exclude list	2018-04-08 22:54:18 -04:00
Ben	119ea39021	add cuda headers (#6401 )	2018-04-08 10:50:20 -04:00
gchanan	87e369111a	Add string-style devices to all tensors. (#6283 ) * Add string-style devices to all tensors. Previously, tensors only had a 'get_device' method which would throw an exception on a CPU tensor. This made it necessary to if/else code that was meant to be device agnostic. This PR implements the following: 1) Adds a 'device' property to all tensors that returns a string representation of the device for all tensors. For cpu tensors this is 'cpu'. For cuda tensors this is 'cuda:X', where X is the cuda device ordinal. 2) Adds a DeviceSpec class. This is just a helper class for separating device_type and device_index specification and to allow partial specification. For example, you can call DeviceSpec('cuda'), DeviceSpec('cuda:0'), DeviceSpec('cuda', 1). Also has backwards compatibility support for specifying integers, which are treated as cuda devices. DeviceSpecs have the following properties: a) device_type: string representation of the device type (i.e. 'cpu' or 'cuda') b) device_index: integer for the device index (None if not specified) c) cuda_device_index: for backwards compatibility; behaves roughly like `get_device` did previously. I.e. if a function previously took integers for cuda devices, it can now take DeviceSpecs (or strings), and can maintain the old functionality by calling `old_index = DeviceSpec(old).cuda_device_index`. 3) tensor methods and torch. functions that took integer devices can now take integers, strings, or DeviceSpecs. For example: torch.randn((2,3), dtype=torch.cuda.float32, device='cuda:1') TODO in future PRs: A) Split out cuda from dtype so you don't need to overspecify cuda-ness B) We currently only support strings/DeviceSpecs in tensor methods and torch. functions. We should have equivalents torch.cuda.device(...), torch.cuda.device_of, etc. at the torch. level that work on strings/DeviceSpecs * Add deviceInt64 to python arg parser. * device_str. * Remove device_str. * remove device prefix from attributes. 
* Use const char * instead of string. * Move autogpu index out of Device. * comment on is_default. * Rename torch.DeviceSpec to torch.device. * comment. * Fix tests. * Fix flake8. * Fix sparse_coo_tensor parameter name. * Improve error message. * Remove device_ prefix from C++ device object. * Allocate static strings. * Return not implemented from rich compare. * Move torch::Device to THPDevice. * Remove cuda index. * Py_RETURN_NOTIMPLEMENTED doesn't exist in python2.	2018-04-06 15:12:05 -04:00
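The device specification forms listed in the commit above (`'cuda'`, `'cuda:0'`, a type plus a separate index, and bare integers treated as CUDA devices) can be illustrated with a small pure-Python sketch. It does not use or reproduce the PyTorch implementation: `DeviceSpecSketch` and `parse_device` are hypothetical names, and only the `device_type` and `device_index` properties from the commit text are modelled.

```python
from typing import NamedTuple, Optional, Union


class DeviceSpecSketch(NamedTuple):
    """Hypothetical stand-in for the DeviceSpec helper described above."""
    device_type: str             # 'cpu' or 'cuda'
    device_index: Optional[int]  # None when the index is not specified


def parse_device(spec: Union[str, int], index: Optional[int] = None) -> DeviceSpecSketch:
    """Parse 'cuda', 'cuda:0', ('cuda', 1) or a bare integer into a spec."""
    if isinstance(spec, int):
        # Backwards compatibility: a bare integer is taken as a CUDA ordinal.
        return DeviceSpecSketch("cuda", spec)

    device_type, sep, ordinal = spec.partition(":")
    if device_type not in ("cpu", "cuda"):
        raise ValueError(f"unknown device type: {device_type!r}")
    if sep:
        if index is not None:
            raise ValueError("index given both in the string and as an argument")
        index = int(ordinal)
    return DeviceSpecSketch(device_type, index)


print(parse_device("cuda"))
print(parse_device("cuda:0"))
print(parse_device("cuda", 1))
print(parse_device(1))
```

Keeping the index optional is what allows the partial specification mentioned in the commit: `'cuda'` names a device type without committing to a particular ordinal.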
Sam Gross	6b3a4637d6	Make the tensor type torch.Tensor instead of torch.autograd.Variable (#5785 ) This changes type(tensor) to return `torch.Tensor` instead of `torch.autograd.Variable`. This requires a few implementation changes: - torch.Tensor is now a regular Python class instead of a pseudo-factory like torch.FloatTensor/torch.DoubleTensor - torch.autograd.Variable is just a shell with a __new__ function. Since no instances are constructed it doesn't have any methods. - Adds torch.get_default_dtype() since torch.Tensor.dtype returns <attribute 'dtype' of 'torch._C._TensorBase' objects>	2018-04-03 16:29:25 -04:00
gchanan	4c81282c33	Introduce torch.layout and split layout from dtypes. (#6145 ) * Introduce torch.layout and split layout from dtypes. Tensors (and tensor types) now have a 'layout' attribute that returns either 'torch.strided' or 'torch.sparse_coo'. Previously, dtypes were 1-to-1 with ATen types/PyTensorTypes; the impetus behind this decision was to make things easy in the common case (i.e. specifying a type in a factory function). But this doesn't really follow for sparsity, which isn't a common case. It also doesn't properly represent the concept of a dtype, which in numpy are proper scalar types (i.e. roughly the type returned from indexing the last dimension of an n-d array). But this should be the same whether or not the tensor is represented via strides, sparsity, etc. This is accomplished by: 1) having the dtype of tensor return the (device-type, scalar-type) combination, i.e. torch.cuda.float32, so both torch.cuda.FloatTensor and torch.cuda.sparse.FloatTensor have the same dtype 2) Adding a layout parameter to python functions, where the combination of (dtype, layout) maps to an ATen type that is used for dispatch. * Formatting, make init throw python_error. * Fix cuda not enabled error message. * Fix test.	2018-04-02 14:07:50 -04:00
peterjc123	63af898d46	Fix extension test on Windows (#5548 ) * Change cpp_extensions.py to make it work on Windows * Fix linting * Show python paths * Debug * Debug 1 * set PYTHONPATH * Add ATen into library * expose essential libs and functions, and copy _C.lib * Specify dir in header * Update check_abi for MSVC * Activate cl environment to compile cpp extensions * change version string * Redirect stderr to stdout * Add monkey patch for windows * Remove unnecessary self * Fix various issues * Append necessary flags * add /MD flag to cuda * Install ninja * Use THP_API instead of THP_CLASS * Beautify the paths * Revert "Use THP_API instead of THP_CLASS" This reverts commit dd7e74c44db48e4c5f85bb8e3c698ff9de71ba2d. * Use THP_API instead of THP_CLASS(new)	2018-04-02 13:53:25 -04:00
Richard Zou	7355f5cd8d	Tell source users about TORCH_CUDA_ARCH_LIST (#6185 ) Put it into the comments about env vars in setup.py. Also put in a line in the README about where to find this info.	2018-04-02 13:35:14 -04:00
Ma Mingfei	f8270c0225	Enable MKLDNN convolution forward and backward (#6062 ) * Enable MKLDNN convolution forward and backward * minor change * fix mkldnn build error when building ATen standalone	2018-03-29 15:25:07 -07:00
Simeon Monov	a90aa5d818	Fix small typo in setup.py (#6091 ) Fixed small typo in setup.py	2018-03-28 16:51:08 -07:00
Edward Z. Yang	eb18a2f26c	Reorganize third-party libraries into top-level third_party directory (#6025 ) - gloo, pybind11, nanopb and nccl now live in third_party. - ATen builds in aten/build rather than torch/lib/build/aten - A bit of faffing about in the scripts was necessary, because they used to assume that everything lived in the same directory. Now you are expected to cd into the correct directory before calling one of the build functions. The actual builder script lives in tools - Lint now just unconditionally ignores third_party, rather than enumerating folders explicitly	2018-03-27 22:09:20 -04:00
peterjc123	1ab248d09e	Fixes #5973 : Stop printing verbose warnings for MSVC (#6001 ) * Stop printing verbose warnings * Add missing options * Fix for misspelling	2018-03-26 09:40:30 -04:00
Simeon Monov	c4ee2b7067	Moved torch headers copy to build_deps (#5772 ) * Moved torch headers copy to build_deps PR #5706 initially moved headers under build_ext to fix bdist_wheel and build develop. This broke install and #5755 moved them back to install which broke bdist_wheel and build develop. Looks like build_ext is called from install after it already tried to copy the headers to the python install dir and the headers were not installed correctly. Using build_deps works correctly with all setup.py install, bdist_wheel and build develop. * Comment about the auto-generated files Added comment that the current solution will not include auto-generated files which may be a problem if somebody needs to use them	2018-03-23 11:34:27 -04:00
Jon Malmaud	add04c56bf	Verify that 'catch' submodule has been checked out before attempting build. (#5941 )	2018-03-22 11:28:04 -04:00
gchanan	c474136ee1	[REDO] Add torch.sparse_coo_tensor factory. (#5781 ) * Add torch.sparse_coo_tensor factory. Notes: 1) I didn't add Tensor.new_sparse_coo_tensor; it didn't seem particularly useful, but it's easy to add 2) This doesn't do the type inference, i.e. torch.sparse_coo_tensor(indices=LongTensor, values=IntTensor) will return a sparse tensor corresponding to the default type rather than a sparse IntTensor. We can add type inference later when we add it to other factories. * Fix merge. * Use type_conversion function from python_variable_methods.	2018-03-16 13:58:02 -04:00
cpuhrsch	5fa3aac610	ATen ReduceOps (#5776 ) #5481 was reverted due to a strange test bug. This PR attempts to fix that. This diff adds vectorization to ATen. It uses Intel intrinsics to build a general vec256 class, that represents types of 256bit width. These can then be treated like regular variables. Using those it implements torch.sum() for the contiguous case. It uses Intel TBB for multithreading, which allows workstealing and chunks the reduction operations based on an experimentally chosen value (_THRESHOLD). It uses cpuinfo to pick the right code depending on the host's capabilities. The kernels are implemented under native/cpu. Each .cpp file is compiled with -avx, -avx2 and no additional flags. A macro is used to append AVX, AVX2 or NONE to the function name. The header then needs to define the functions three times, one for each capability. This could be improved by either changing the cmake file a bit or possibly generating source code using a Python script etc. For the non-contiguous case this defaults to the current implementation within TH. For CUDA it entirely defaults to the implementation within THC. There probably needs to be a bit of a debate around the design decisions here, the additional dependencies, parallelization strategy, clarity, etc. The numerical results also diverge from numpy with larger tensors, which is expected since we're summing, for example, 8 numbers and then adding the result to the running sum, instead of each number one by one. But there might be something to be said about accumulating into a double for floats or the degree of divergence, the behavior with respect to CUDA, etc. I wrote a [small Python script]( https://github.com/cpuhrsch/benchmark/blob/sumall/benchmarks/sum_bench.py) to compare the results with numpy numerically as well as on timing. I ran this script to create timings both on master and this branch. 
Here is the command for 1 core `OMP_NUM_THREAD=1 taskset -c 0 python sum_bench.py --enable_numpy 200` Here is the command for all cores `python sum_bench.py --enable_numpy 200` Here are the results of each: [Master, 1 core](https://paste.fedoraproject.org/paste/Nho9JzHpPVK9av8a6mByjQ) [This branch, 1 core](https://paste.fedoraproject.org/paste/6xLHkYvcVJx9z~5MoHxN4w) [Master, all cores](https://paste.fedoraproject.org/paste/5l3V1d5zGqvJcMXIUteMRw) [This branch, all cores](https://paste.fedoraproject.org/paste/J4RuDU-0Drz0aZwtphQwEA) To test the command is `python sum_bench.py --test 200` [This branch, test results](https://paste.fedoraproject.org/paste/kTEoUC~oWgXA6XWMAfNfNw) For this test we look at the average absolute value of the differences. This does not take into account the relative magnitude of the numbers. The numbers are sampled from a standard normal distribution. In terms of performance this diff should bring PyTorch on par with Numpy and usually exceed it by 1.5 to 2x.	2018-03-15 12:09:28 -04:00
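The numerical divergence mentioned in the commit above, where groups of 8 numbers are summed and each group's result is added to the running sum instead of adding every number one by one, comes from floating-point addition not being associative. A minimal pure-Python sketch follows; `sequential_sum` and `chunked_sum` are hypothetical helpers written for this illustration, they are not the ATen kernels, and they operate on Python floats (doubles) rather than on vec256 values.

```python
def sequential_sum(values):
    """Add every element to the running total one by one."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, width=8):
    """Sum each group of `width` elements, then add the group result."""
    total = 0.0
    for start in range(0, len(values), width):
        chunk_total = 0.0
        for v in values[start:start + width]:
            chunk_total += v
        total += chunk_total
    return total


# One large value followed by fifteen small ones. Near 1e16 adjacent
# doubles are 2 apart, so adding 1.0 to the large value is rounded away.
data = [1e16] + [1.0] * 15

seq = sequential_sum(data)  # every 1.0 is absorbed by the large total
chk = chunked_sum(data)     # the second group sums to 8.0 before being added

print(seq, chk, chk - seq)
```

Both results are valid floating-point sums of the same data; they differ only in the order in which rounding happens, which is the effect the commit describes when comparing against numpy.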
peterjc123	abd6f82709	Fix debug build failure on Windows (#5771 )	2018-03-15 11:42:44 -04:00
Edward Z. Yang	cadeb0cb17	Revert "ATen ReduceOps (#5481 )" (#5765 ) * Revert "ATen ReduceOps (#5481)" This reverts commit `310c3735b9`. * Revert "Check that new cpuinfo and tbb submodules exist (#5714)" This reverts commit `1a23c9901d`.	2018-03-13 23:50:16 -04:00
Peter Goldsborough	bab0f8484b	Put torch header install back into the install command (#5755 )	2018-03-13 19:23:02 -04:00
Sam Gross	1a23c9901d	Check that new cpuinfo and tbb submodules exist (#5714 )	2018-03-12 15:44:10 -04:00
Zachary DeVito	41285edbb6	[jit] add a compiled script module (#5630 ) Add script::Module C++ class to represent script modules switch AST -> IR conversion to work on Modules/Methods rather than raw graphs function-only AST -> IR conversion is just a simplified case where there is only one module with a single method and no parameters. introduce SugaredValue in compiler.h to represent values in scope in a script function that are not first-class and that get desugared. This is used to represent the module's self parameter, as well as python function calls, and method calls on tensor provide a Python ScriptModule that provides a nice API on top of script::Module allowing for the definition of script modules with methods, parameters, and submodules Not in this PR but intended for the future: ScriptModule actually subclasses nn.Module, with most methods implemented Unification of tracedmodule and script module functionality into one container class. Detailed changelog: * Switch compiler over to using Module, but don't use them yet. * Remove intermediate attribute encoding in compiler * Create SugaredValue object to handle resolution of compiled module. * switch to_ir to modules, implement Select * hacky python wrappers * Private ScriptModule * Add `define` to script module * Attributes use TK_LIST_LITERAL this anticipates adding a real list literal expression to the language. * Add a metaclass to make sure script stubs are registered * Add a test * Doc createResolutionCallback * Docs and minor editing * Address PR comments * Document * Fix unicode issue	2018-03-12 09:52:40 -04:00
Simeon Monov	dede63689f	Moved headers files copy for C++ extensions to build_ext in setup.py (#5706 ) The header files needed for the C++ extensions were copied to torch/lib/include under install. In case of bdist_wheel or build develop for example, the files are not copied and cpp_extensions test is failing: ``` Running test_cpp_extensions.py ... running install running build running build_ext /home/moni/src/ibm/AI/pytorch/torch/utils/cpp_extension.py:79: UserWarning: Your compiler (g++) may be ABI-incompatible with PyTorch. Please use a compiler that is ABI-compatible with GCC 4.9 and above. See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html. warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler)) building 'torch_test_cpp_extension' extension creating build creating build/temp.linux-x86_64-3.6 gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/moni/src/ibm/AI/pytorch/torch/lib/include -I/home/moni/src/ibm/AI/pytorch/torch/lib/include/TH -I/home/moni/src/ibm/AI/pytorch/torch/lib/include/THC -I/home/moni/miniconda3/envs/pytorch/include/python3.6m -c extension.cpp -o build/temp.linux-x86_64-3.6/extension.o -g -DTORCH_EXTENSION_NAME=torch_test_cpp_extension -std=c++11 cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ extension.cpp:1:25: fatal error: torch/torch.h: No such file or directory #include <torch/torch.h> ^ compilation terminated. error: command 'gcc' failed with exit status 1 ```	2018-03-12 14:07:45 +01:00
Richard Zou	03f2ad9029	Add check for python build deps to setup.py (#5618 ) * Add check for python build deps to setup.py * Address comments * Remove install_requires line	2018-03-09 23:49:18 -05:00
Peter Goldsborough	7391dae709	Fix Variable conversion on the way to/from Python (#5581 ) * PyObject* <--> at::Tensor no longer unwraps variables, instead we expect end uses to always work with variable types, and we will only unwrap the variables when we optimize. * Add torch::CPU, torch::CUDA and torch::getType * at::CPU -> torch::CPU in extensions	2018-03-09 14:31:05 -08:00
Sam Gross	5dedc648bb	Compile DataLoader.cpp separately (#5507 ) Don't #include DataLoader.cpp in Module.cpp	2018-03-02 05:54:33 -05:00
Peter Goldsborough	b10fcca5f0	Install cuda headers in ATen build (#5474 )	2018-02-28 19:36:41 -08:00
peterjc123	377d896969	better solution for the linking error related to lazy_init for MSVC (#5375 ) * Revert "Fix wrong argument name (#5366)" This reverts commit `cc9d3b265d`. * Fix wrong argument naming * Revert "Wrap torch::cuda::lazy_init with WITH_CUDA flag" This reverts commit a8fa37f8fac5aef09eb7fe54d84de6126618c262. * Revert "Solves the linking error related to lazy_init for MSVC" This reverts commit 63913a102f274865a76e7c40ffdf6b40c277d5ff. * better solution for the linking error related to lazy_init for MSVC * Naming changes * Namespace changes and further comment * Rebasing onto current master * Remove code that is useless * Fix linting * Remove rebasing bugs	2018-02-28 17:34:34 -05:00
Sam Gross	48a3349c29	Delete dead Tensor code paths (#5417 ) This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp. This also moves the THNN.cwrap/.cpp generation to generate_code which can use ninja if installed.	2018-02-27 17:58:09 -05:00
gchanan	d5038309a1	Remove WITH_SCALARS, as it's enabled by default now. (#5437 )	2018-02-27 14:51:11 -05:00
Soumith Chintala	d2f71cbdeb	make CuDNN finders respect library major version (#5399 )	2018-02-24 19:37:00 -05:00
Sam Gross	30ec06c140	Merge Variable and Tensor classes (#5225 ) This replaces the torch.Tensor constructors with factories that produce Variables. Similarly, functions on the torch module (e.g. torch.randn) now return Variables. To keep the PR to a reasonable size, I've left most of the unused tensor code. Subsequent PRs will remove the dead code, clean-up calls to torch.autograd.Variable, and rename Variable to Tensor everywhere. There are some breaking changes because Variable and Tensors had slightly different semantics. There's a list of those changes here: https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge	2018-02-23 18:03:31 -05:00
peterjc123	6c587e9e67	Solves the linking error related to lazy_init for MSVC (#5368 ) * Revert "Fix wrong argument name (#5366)" This reverts commit `cc9d3b265d`. * Solves the linking error related to lazy_init for MSVC * Fix wrong argument naming * Wrap torch::cuda::lazy_init with WITH_CUDA flag	2018-02-23 11:08:20 -05:00
Peter Goldsborough	008ba18c5b	Improve CUDA extension support (#5324 ) * Also pass torch includes to nvcc build * Export ATen/cuda headers with install * Refactor flags common to C++ and CUDA * Improve tests for C++/CUDA extensions * Export .cuh files under THC * Refactor and clean cpp_extension.py slightly * Include ATen in cuda extension test * Clarifying comment in cuda_extension.cu * Replace cuda_extension.cu with cuda_extension_kernel.cu in setup.py * Copy compile args in C++ extension and add second kernel * Conditionally add -std=c++11 to cuda_flags * Also export cuDNN headers * Add comment about deepcopy	2018-02-23 10:15:30 -05:00
peterjc123	cc9d3b265d	Fix wrong argument name (#5366 )	2018-02-23 00:37:02 -05:00
peterjc123	013ed5b88f	Add lazy_init.h into build for Windows and refactor code (#5365 ) * Add lazy_init.h into build for Windows and refactor code * Remove minor bugs	2018-02-23 00:05:43 -05:00
Soumith Chintala	9388d35293	prioritize cudnn library dir in library_dirs order (#5345 )	2018-02-21 22:51:04 -05:00
gchanan	0878c6d4d7	Various dtype improvements. (#5321 ) * Various dtype improvements. 1) Add dtypes to the new data-based constructors: Variable.new_tensor and torch.autograd.variable. 2) In the python signatures, use Type instead of Dtype to match the C++ signatures; the error messages still print as dtype. 3) Handle / add a better error message when a dtype is used when ATen was not compiled with that type (e.g. cuda types). 4) Move cuda_lazy_init to its own file. A later commit will add support to the legacy constructors as well. * Move implementation of lazy_init to cpp. * Fix parsed_arg size.	2018-02-21 17:37:59 -05:00
Edward Z. Yang	031412a14b	setup.py and cmake improvements (#5269 ) * Document env vars and properly propagate MAX_JOBS down. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Apply CFLAGS and LDFLAGS environment variables to cmake builds. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Test that running built program works; fixes #5151. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * CMake CR. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2018-02-20 16:55:57 -05:00
gchanan	5edf6b2037	Add numpy-style dtypes to Variable factories. (#5245 ) * Add numpy-style dtypes to Variable factories. 1) Add numpy-style dtypes corresponding to torch tensor types. These are: torch.float16, torch.float32, torch.float64, torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64 as well as torch.cuda, torch.sparse, and torch.cuda.sparse equivalents. 2) Adds "legacy" names for the above dtypes that correspond more closely to existing tensor names. These are: torch.half, torch.float, torch.double, torch.short, torch.int, torch.long. torch.byte and torch.char don't exist because they either don't match numpy semantics or differ on different architectures. 3) Adds a "dtype" parameter to Variable factories (e.g. zeros, ones) that allows the user to specify the type without changing the default tensor type. 4) Adds a "dtype" getter to Variables that return the canonical dtype from 1) This PR is missing the following useful features that should be added in the future: A) We only add the "dtype" parameter to auto-generated factories; hand-written factories like in tensor_new.cpp don't support this yet. B) We don't allow type conversions to use dtypes; that should be added to type(param) or a new function. C) We don't yet have a "device" parameter for these factories; right now, they will only create Variables on the default device. * backend_to_string can be private. * Define python binding argument indexes in a more simple way. * add all_declared_types, still need to hook it up to THPDType. * Fix all_declared_types for missing types (it's Sparse + Half). * Ensure cuda dtypes are created even if compiled with NO_CUDA=1. * Fix case where dtype is provided but dispatch is via namespace. This happens in ones_like, empty_like, randn_like. 
There is some question if we should do: 1) at::ones_like(tensor).toType(dtype) 2) at::ones_like(tensor.toType(dtype)) I did the former because this matches with the numpy documentation, i.e.: "Overrides the data type of the result." and it's easier to implement. Note that the above causes an extra copy, either of the input or output. Here's a better implementation: 1) Make zeros_like, ones_like native functions that take an optional type (named dtype?). 2) Match the type argument with the dtype, so we don't have two different parameters. 3) Call at::zeros_like(input, type) -> at::native::zeros_like(input, type) -> type.zeros(input.sizes()) * Don't return from maybe_initialize_cuda. * Don't leak DType name. * Address cpp review comments. * Share code between sparse and non-sparse test_dtypes. * Rewrite _like functions as native function with explicit type parameter. * Use type 'Type' instead of 'dtype' for consistency. * Address review comments. * Handle arg_idx when there is requires_grad but no dtype in python_binding_arguments.	2018-02-20 11:04:14 -05:00
Adam Paszke	cb2fd39fdd	Add Python frontend to the JIT (#5190 )	2018-02-15 22:53:19 +01:00
Peter Goldsborough	2d5fbe6e0d	Improve Variable interface (#5127 ) * Improve Variable interface * Address comments from @apaszke and @colesbury * string ::operator= is not noexcept * Remove ir.h from tracer_state.h to improve build times * Make Variable a struct and pack SavedVariable fields * Implement as_variable_ref * grad_fn_ptr() -> grad_fn_unsafe() * Reduce hackiness of set_type hack * Include variable.h and edge.h in tracer_state.h because it uses them * class Variable -> struct Variable because Windows cant even * Make Variable::output_nr uint32_t instead of int * Add comment about tracing state * Replaced more static_cast<Variable&> and improve docs * Remove SavedVariable destructor and construct members in init list * Clarify docs for Variable * Variable::set_version -> set_version_counter	2018-02-12 23:26:26 -05:00
gchanan	4b8bf73729	Enable scalars. (#5158 ) * Enable scalars. * Avoid variable name shadowing in list comprehension, because it rebinds in python2, but not python3.	2018-02-09 15:45:41 -05:00
bddppq	3e85613751	Experimental jit script (#5074 )	2018-02-07 20:43:45 +01:00
Zachary DeVito	c308e03f3e	Initial GraphExecutor Implementation. (#4982 ) This adds the initial implementation of graph executor for the new JIT design. It includes a few python tests ensuring that nograd, backward, and double-backward cases work for simple examples and some corner cases. More work needs to be done to performance optimize as there are many extra copies and places where we hold onto variables longer than we should. These are noted in the comments.	2018-02-02 17:45:59 -08:00
Peter Goldsborough	1475895c1d	Use distutils.copy_tree/copy_file instead of shutil	2018-02-01 16:19:03 -08:00
Peter Goldsborough	1262fba8e7	[cpp extensions] Create torch.h and update setup.py	2018-02-01 16:19:03 -08:00
Zach DeVito	2d829d15af	[JIT] Add simple shape analysis This quick and dirty shape analysis just makes up fake tensors, and runs them through ATen to do shape propagation.	2018-01-28 22:55:36 -08:00
Edward Z. Yang	b8ab7bee26	Use variadic templates instead of initializer lists and overloads. (#4772 ) Suppose you are given a list of arguments, each of which may be Tensor or TensorList. How can you write a function that can treat these arguments uniformly as a list of tensors? This patch solves the problem using variadic templates. Why variadic templates? Use of variadic templates means anyone working with this code has to understand universal references, perfect forwarding, parameter packs and some idioms of C++ template design. However, I argue that variadic templates are the right tool for supporting the implementation of functions which must take an arbitrarily heterogenous set of inputs. We were able to limp by in old code because, for the most part, tensor inputs were homogenous, but this is no longer the case for some non-primitively differentiable functions; and with the upcoming cuDNN RNN in ATen PR, will no longer be the case for primitively differentiable functions too. There are two parts to the PR. First, we add torch/csrc/utils/variadic.h, which defines a mix-in IterArgs that takes any class which supports operator(), and augments with a new variadic function apply() which calls operator() on each argument passed to it. In an original draft of the patch, I wrote the recursion for each parameter pack from scratch for each function; however, it turns out there are no fewer than seven instances where we need this idiom, and the mix-in reduces the lines of code, and also helps centralize the most important (and easy to forget) boilerplate for perfect forwarding. To verify that IterArgs is compiled away into an unrolled form per call site, I inspected the assembly on some synthetic examples. 
Next, we modify the following functions to make use of IterArgs: - compute_requires_grad - Function::flags (Variable and Tensor variants) - flatten - isTracing - count_tensors / count_variables Finally, the tuple packer is rewritten to be variadic, although we cannot make use of IterArgs (since we are given a tuple). It might make sense to refactor the code into a generic piece which invokes a function with the arguments specified by a tuple, and then an appropriate IterArgs, but we leave this for future work. One thing to note: we cannot write a function with overloads for both Tensor and Variable, because both ArrayRef<Variable> and Tensor have implicit conversions from Variable, making such an overload ambiguous. It may be interesting to remove the implicit conversion from ArrayRef. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2018-01-26 15:56:39 -05:00
Soumith Chintala	bb3bc969ca	fix binary version scheme to be PEP compliant (#4847 )	2018-01-25 11:16:02 -05:00
Teng Li	1b3d6ab864	Enabling Infiniband support for Gloo data channel with auto IB detection (#4795 )	2018-01-24 23:18:24 +01:00
Zachary DeVito	0ae5498079	[JIT] add create_autodiff_subgraphs (#4822 ) This pass splits differentiable subgraphs into their own Node, similar to a fusion group. This initial implementation does not create optimal subgraphs, but it works well in the case where most things are differentiable, and has the building blocks (`mergeNodes`) to extend to the better implementation.	2018-01-23 23:46:54 -05:00
gchanan	9bb6d33d35	Enable scalars if compiled with WITH_SCALAR environment variable. (#4806 ) * Enable scalars if compiled with WITH_SCALAR environment variable. We are pretty close to enabling scalars (0-dimensional arrays); this allows turning them on for development purposes and to be able to write code that works both with and without scalars enabled. WITH_SCALARS is currently broken with distributions, but should work for test_torch, test_autograd, test_nn. * Fix unsqueeze. * Fix wrap dim, wrapping with Scalar.	2018-01-23 15:44:11 -05:00
Adam Paszke	ad2edd8613	Check submodules only in build_deps (#4770 )	2018-01-21 20:24:05 -08:00
Adam Paszke	816d5d8ff7	Scaffolding for source-to-source AD in the JIT	2018-01-20 17:34:08 +01:00
Adam Paszke	1061d7970d	Move broadcast and broadcast_coalesced to C++	2018-01-18 11:16:45 +01:00
Adam Paszke	de5f7b725e	Base for pure C++ NCCL interface	2018-01-18 11:16:45 +01:00
Sam Gross	57549b7e44	Bind functions with out= arguments in VariableType (#4565 ) This adds overrides in VariableType for the xxx_out ATen functions and implements Python bindings. There is no support for automatic differentiation. If any of the inputs (or outputs) requires grad, then the function will throw an exception unless it's running in "no-grad" mode. The bindings for calling torch.xxx functions on Variables are moved to a different object. Previously, they were static method on VariableBase. This change prevents users from accidentally calling static methods as if they were instance methods.	2018-01-17 18:27:42 -05:00
Adam Paszke	1a02d3ae86	Implement MM fusion (MM with add reduction tree) (#4615 ) Implement MM fusion (MM with add reduction tree) A tree where leaves are matrix multiplies and inner vertices are adds can be computed as a single mm. Such subgraphs often appear in backward if a single weight is reused multiple times (e.g. in RNNs). NOTE: this seems to be slightly slower on the GPU than the naive implementation, but it's a huge win on the CPU (think 100x lower overhead)	2018-01-17 21:36:21 +01:00
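The identity behind the MM fusion commit above can be checked with a minimal pure-Python sketch. The helper names (`matmul`, `mat_add`, `hcat`, `vcat`) are made up for illustration and are not PyTorch APIs; the sketch only shows that a sum of matrix products equals one product of concatenated operands, and does not reproduce how the JIT pass rewrites graphs.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def mat_add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def hcat(mats):
    """Concatenate matrices side by side (along columns)."""
    return [sum((m[i] for m in mats), []) for i in range(len(mats[0]))]


def vcat(mats):
    """Stack matrices on top of each other (along rows)."""
    return [row for m in mats for row in m]


# Three (left, right) pairs: the leaves of the add reduction tree.
lhs = [[[1, 2], [3, 4]], [[0, 1], [1, 0]], [[2, 0], [0, 2]]]
rhs = [[[1, 0], [0, 1]], [[5, 6], [7, 8]], [[1, 1], [1, 1]]]

# Naive form: three separate multiplies combined by two adds.
naive = mat_add(
    mat_add(matmul(lhs[0], rhs[0]), matmul(lhs[1], rhs[1])),
    matmul(lhs[2], rhs[2]),
)

# Fused form: one multiply of the concatenated operands.
fused = matmul(hcat(lhs), vcat(rhs))

print(naive == fused)  # True
```

The fused form trades several small multiplies for one larger one plus two concatenations, which is in line with the commit's note that the benefit depends on the device.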
Jon Crall	94f439c07c	Fixed setup.py to handle CUDNN_LIBRARY envvar with aten (#4597 ) * Fixed setup.py to handle CUDNN_LIBRARY envvar with aten * undo changes * Added CUDNN_LIBRARY to bat file	2018-01-11 07:24:17 -05:00
Edward Z. Yang	dc76db349e	Delete a pile of dead code (#4295 ) * Delete obsolete basic ops. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * More deletion. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Delete some unused utilities. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Delete dead apply_fn Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Delete CppFunction symbolic support. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Delete ForwardFunction Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Batchnorm is 'working' Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2018-01-04 09:21:54 -05:00
peterjc123	b78a37a058	Enable ninja during python build process for MSVC (#3993 )	2017-12-30 12:58:32 +01:00
Edward Z. Yang	8c9a22a88e	Support NO_NNPACK environment variable (#4401 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-12-29 16:33:01 +09:00
Edward Z. Yang	5b8fe5cbb5	Batchnorm in ATen (#4285 ) * Batchnorm in ATen This commit moves BatchNorm derivatives into ATen, eliminating torch/csrc/autograd/functions/batch_normalization.cpp Some refactoring along the way: - Functions got renamed to remove _forward from their names - CuDNN batchnorm forward was modified to return save_mean/save_std instead of taking them as parameters. To avoid returning undefined Variables, these return (small) uninitialized tensors when they are not used. - THNN batch normalization takes care of resizing save_mean and save_std on forward. - There are some shenanigans re batchnorm backwards in eval mode. I'm tracking that in #4284 - I decided not to introduce buffers as a proper concept in ATen, which means that tensors like running_mean/running_var are variables in ATen. This meant there needed to be some adjustments to how we trace such variables; the new strategy is if we can't find a Value for a variable, we look and see if we have a Value for the buffer pointed to by the variable, before finally falling back on constant. - This PR finally reliably triggered OOM on Travis builds; I fixed this by reducing the number of parallel jobs. - Stop using std::string when it's not necessary. - Remove training parameter from cudnn_batch_norm_backward, because it doesn't make sense; cuDNN doesn't implement the math for evaluation mode batchnorm backwards. - batchnorm_double_backward is now in an anonymous namespace, as it no longer needs to be called from torch/csrc Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-12-21 11:38:31 -05:00
Edward Z. Yang	a88a8ec827	Convolution derivatives in ATen (#4116 ) * Convolution derivatives in ATen This PR introduces ATen implementation of convolution, which dispatches to THNN/CuDNN/nnpack based on input parameters. The general strategy is to compose this function out of the various forward-backward pairs of specific implementations, rather than write a monolithic function with backwards (which is what we did before because the boilerplate of doing it otherwise would have been very high.) The new API provides the following functions: - _convolution, which is a fully generic, native convolution implementation that dispatches to various other convolution implementations depending on input characteristics. This is prefixed with an underscore because it explicitly takes benchmark, deterministic and cudnn_enabled which are implementation details for CuDNN. The intent is to eventually provide a convolution that reads these parameters out of the context using #4104. - _convolution_nogroup is a convolution implementation for non-CuDNN algorithms which don't support group convolution natively. - _convolution_double_backward is the generic double-backwards implementation for convolution. In more detail: - Most functionality from torch/csrc/autograd/functions/convolution.cpp has been moved into aten/src/ATen/native/Convolution.cpp - We continue to make use of ConvParams, but we now construct the parameters upon entry to a function from the function signature (which does not use ConvParams; having convolution take ConvParams directly would require teaching the code generator how to accept these as parameters, complicating ATen's API model) and destruct them when making subprocedure calls. - I introduce a new idiom, input_r, which represents a const Tensor& reference, which will subsequently be assigned to a local Tensor input. This is helpful because a lot of the existing algorithms relied on being able to assign to locals, which is not permitted with a const reference. 
- The native argument parser now supports std::array<bool,2> inputs (NB: there MUST NOT be a space; this is the same hack as is applied to derivatives.yaml) - Native parser now supports Tensor? arguments, which indicates a nullable tensor. Previously this function was only used by NN methods. - Documentation updates on THNN library - I added an extra fgradInput argument to VolumetricConvolutionMM_updateOutput and VolumetricConvolutionMM_accGradParameters so that its buffer list lines up with the backward argument list. This makes it possible to write derivative for conv3d which previously was not supported (commented out in derivatives.yaml) - Extra double_backward declarations for all convolution backwards functions were added. - You can now use the syntax Tensor? in native_functions.yaml to indicate that a tensor argument is nullable. There are adjustments to propagate this to the Python argument parser. - NNPACK was ported to ATen, and ATen now builds and links against NNPACK if possible. New AT_NNPACK_ENABLED macro. The nnpack functions are nnpack_spatial_convolution. - Some modest CuDNN convolution refactoring to remove _forward from names. - There's a new cudnn_convolution_backward function to deal with the fact that CuDNN convolution double backward requires you to have computed all gradients in one go. - Variable set_flags now checks if the tensor is undefined, fixing a silent memory corruption. - checkSameType updated to not raise an exception if called with Variable arguments - "no ATen declaration found for" error message is improved to say what available declarations are - make_variable now accepts undefined tensors, and returns an undefined tensor in this case.	2017-12-20 14:19:27 -05:00
peterjc123	77ea2f26d8	Add build support for Python 2.7 using MSVC (#4226 )	2017-12-20 15:07:25 +01:00
Sam Gross	d605058212	Replace Variable.volatile with torch.no_grad() (#3970 ) This removes volatile from Variable. The functionality is mostly replaced by a global (thread-local) flag, which is controlled by torch.set_grad_enabled() and the context manager torch.no_grad(). In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled() Fixes #3627	2017-12-18 15:46:13 -05:00
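The "global (thread-local) flag" design described in the commit above can be sketched in plain Python. This is a stand-in under stated assumptions, not PyTorch's implementation: the module-level `_state` object and the three functions below are hypothetical and only borrow the names `set_grad_enabled` and `no_grad` from the commit message (`is_grad_enabled` is named by analogy with the C++ `GradMode::is_enabled()`).

```python
import threading
from contextlib import contextmanager

# Hypothetical per-thread storage for the flag; each thread sees its own copy.
_state = threading.local()


def is_grad_enabled():
    """Return this thread's flag; threads that never set it default to True."""
    return getattr(_state, "enabled", True)


def set_grad_enabled(mode):
    """Set the flag for the calling thread only."""
    _state.enabled = bool(mode)


@contextmanager
def no_grad():
    """Disable the flag inside the block and restore the previous value after."""
    previous = is_grad_enabled()
    set_grad_enabled(False)
    try:
        yield
    finally:
        set_grad_enabled(previous)


seen = {}


def worker():
    # Runs in another thread, so it does not observe the main thread's flag.
    seen["worker"] = is_grad_enabled()


with no_grad():
    seen["inside"] = is_grad_enabled()
    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()

seen["after"] = is_grad_enabled()
print(seen)
```

Restoring the previous value in `finally` keeps nested blocks and exceptions from leaving the flag in the wrong state, which is one reason a context manager is a natural replacement for a per-Variable `volatile` attribute.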
peterjc123	02317d9336	Enable ext build for Windows (#3935 ) * Enable ext build for Windows * Include the static libs to make the compiling of the extension easier	2017-12-18 02:23:34 -05:00
Sam Gross	bec0349280	Implement Variable.cuda and Variable.type using ATen (#4139 ) * Implement Variable.cuda using ATen This adds an optional async flag to Tensor::copy_, which attempts to do a non-blocking copy if the one of the tensors is in pinned memory and the other is a CUDA tensor. * Perform cross-device copy in CopyBackwards Also call torch.cuda._lazy_init() from Variable.cuda() * Implement Variable.type via ATen * Changes from review: - remove copy_out - remove unnecessary include - fix default device for .cuda() * Combine if statements in dispatch_type	2017-12-18 01:54:35 -05:00
Edward Z. Yang	6d72c82985	Trace ATen native functions as themselves, not their implementations. (#4127 ) * Trace ATen non-primitive functions as themselves, not their implementations. Previously, if I invoked an ATen non-primitive function foo, which in turn called subfoo, I would always see 'subfoo' in the trace (e.g., tracing 'inlines' all of these operations.) Such inlining is bad for ONNX (and can be bad for optimization) as it prevents high-level optimizations from taking advantage of the structure. It might be right to inline, but give the optimizer a chance to work before inlining happens! The implementation here is surprisingly simple, because it uses the "DCE trick". Essentially, it doesn't matter if the constituent calls perform tracing, because you can always trace it again, and override the trace nodes associated with the returned variables. The original trace becomes dead and can be DCE'd. While implementing this, I also refactored how 'isTracing' and 'trace_outputs' works: - isTracing was previously a single function with overloads for both Tensor and Variable arguments. Unfortunately, such overloads are not safe, because of how C++ implicit conversions work. You would think that C++ should never confuse an overload for Variable with ArrayRef<Tensor>, but this is exactly what can happen: Tensor is convertible to both Variable and ArrayRef<Tensor>, thus it's ambiguous and C++ doesn't like it. The last time I ran into this problem, I applied initializer lists to everything and called it a day. A more robust fix is to separate out the Variable and Tensor overloads, which I have done in this patch. - trace_outputs was fed as an initializer list, which doesn't work when you have heterogenous inputs. So instead we first feed everything through 'flatten', which has overloads for each of the argument patterns in ATen, which then goes on to the recordTrace (which takes an ArrayRef). 
This is no less efficient, because we were allocating a vector anyway (to do the conversion from vector of Tensor to vector of Variable). These fixes mean that 'index' can properly be traced... although the JIT still does not support it. A failing test case has been added to this effect. Some knock-on effects: - The fuser now knows about chunk as well as split. They're pretty similar so there is no problem. - There is a new 'canonicalize' pass in the JIT which renumbers a graph so that all structurally equivalent graphs render the same. - We run DCE before the fuser tests, to make sure dead nodes don't block fusion. - There are new ONNX exports for the newly introduced higher level ATen operations. This includes type_as (no-op case only), chunk, select. Zach didn't like the extra use of 'native' in the new codegen, so we've introduced a new concept, 'abstract'. An abstract function is one that is implemented in derived types (e.g., CPUDoubleType), whereas a concrete one is implemented in the base type (Type). Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-12-15 13:50:32 -05:00
Will Feng	db446d69ca	Fix issues with Windows 7 & 10 CPU build (#4065 )	2017-12-15 10:14:43 +01:00
Sam Gross	aeb7a3668d	Implement Variable.new (#4080 )	2017-12-11 15:45:43 -05:00
Sam Gross	60c03bc09c	Implement apply_, map_, and map2_ in Variable (#4057 )	2017-12-07 14:48:56 -05:00
Sam Gross	d0cabbde74	Implement Variable.from_numpy (#4043 ) Implements from_numpy using ATen tensors. Variable.from_numpy is a convenient placeholder for the variant that returns Variables until we merge Tensor and Variable. The behavior is slightly changed: - from_numpy() on an empty array now returns an empty tensor instead of throwing an exception. The shape may not be preserved. - CharTensor(ndarray) used to throw an exception. It now copies the ndarray. Copying is implemented via ATen toType.	2017-12-06 14:08:56 -05:00
Sam Gross	38f13447bc	Implement Variable.tolist() (#4038 ) Tensor.tolist() now dispatches through Variable.tolist() so that we only have one code path to test until we merge Variable and Tensor.	2017-12-06 12:35:05 -05:00
Sam Gross	5241cdf546	Implement Variable.numpy() (#4006 ) Implement Variable.numpy() and dispatch Tensor.numpy() through Variable.numpy() Variable.numpy() is disallowed on variables that require grad.	2017-12-05 14:24:11 -05:00
Zachary DeVito	9e46fca424	Use ninja as the cmake backend as well.	2017-12-04 14:16:26 -05:00
Zach DeVito	f72fe0624d	Add a CPU Fuser (single core) This adds a simple fusion backend for the CPU. * Refactors CompiledFusionFunction to have two subclasses that handle the compilation details of each backend. * emit-compile-link-run cycle for the CPU * simple single core loop to run the operation * lift CUDA-only restrictions in the fuser, checks that fusion groups are only on a single backend.	2017-12-04 14:13:44 -05:00
Zach DeVito	710f6d6958	Fix warnings and add alert to enable ninja when developing.	2017-12-03 04:49:41 +01:00
Edward Z. Yang	1c0fbd27a1	CuDNN bindings rewrite (into ATen) (#3666 ) * Comprehensive rewrite of Torch CuDNN bindings / a bit of ATen infra The executive summary is that this moves the torch/csrc/cudnn library into ATen, adding a number of new cudnn_ methods to ATen for batchnorm, convolution, affine grid generator and grid sampler. ATen infra changes: - TensorGeometry was moved to ATen - TensorGeometry was modified to make its interface resemble that of Tensor; in particular, sizes is no longer a field, it's a method. - AT_CUDA_ENABLED macro is set via ATen/Config.h header which is generated at cmake configure time. Fixes https://github.com/zdevito/ATen/issues/168 - Change AT_CUDA_ENABLED macro to be a function macro, so that we error if it is not defined - Introduce a new TensorArg class, which is a Tensor plus a little metadata. This helps us give good error messages when checking dimensions/shapes of tensors. Fixes https://github.com/zdevito/ATen/issues/169 - Also introduce a TensorGeometryArg class, for when you don't need the actual tensor data (which is most of the time.) - Add ATen/Check.h, which contains a number of utility functions for testing shapes, types and devices of input tensors. This will be particularly useful for native methods, which don't get code generated input testing code. These functions take a 'CheckedFrom' argument, at the moment just a string, which specifies some extra information about what function was doing the actual checking; this greatly improves error messages. - Many check functions take initializer lists, which let you test that all tensors have some property. This API is peculiar, in that we IGNORE undefined tensors in this case. This is handled by filterDefined. - Add AT_CUDNN_ENABLED macro - CuDNN linking from ATen was improved; for example, we now actually add the CuDNN headers to our include path. 
- Add some missing override specifiers to some methods - We now actually build tests with CUDA functionality accessible (previously, AT_CUDA_ENABLED was not defined, meaning that the headers were missing all CUDA-only functionality.) - Native functions now support giving explicit names to return outputs in yaml. This makes it possible to hook into the NN autogenerated derivatives codepath using native functions. CuDNN rewrite changes: - torch/csrc/cudnn now uses ATen (rather than passing around THVoidTensor) and lives in ATen. This lets us remove tensorPointer shenanigans. The functions are exposed to ATen as native functions described in aten/src/ATen/cudnn/cuDNN.yaml - ATen now builds and links against CuDNN when enabled. The cmake package script was taken from Caffe2. - Some header reorganization was done to help reduce dependencies on headers (this reorg is no longer used but I've kept it) - Rename CHECK to CUDNN_CHECK - Rip out old shape/type testing code in favor of modern ATen/Check.h interface using TensorArg. In many cases, increase the robustness of the checking code. - Change the inputs of the public facing functions, so that they can be bound by ATen - Delete THCState; this is retrieved from the global ATen context - Delete cudnnHandle_t, this is retrieved from the global Handles.h - Delete cudnnDataType_t, this is retrieved from the Tensor type - Delete Convolution class, instead its constituent arguments are passed individually - Change functions to return tensors, rather than take an appropriately sized output tensor as an input. - Redo how transposed convolution / backward convolution is implemented (knock on effect of returning tensors). Previously it was assumed that you would always pass an appropriately sized output tensor, but we don't want to do this anymore. For backwards, we instead give the desired output tensor (input, really) size, because that is readily available. 
For transposed* convolution, however, we take output_padding, and otherwise do the shape calculation. - Redo how legacy group convolution is implemented (knock on effect from porting cudnn to ATen.) Previously, group convolution was implemented by manually constructing sizes and strides and then outputting appropriate, with macros switching between individual groups and all-at-once based on CuDNN version. Now, the code looks exactly what you'd expect: there's a top-level wrapping function that supports group convolution no matter the version of CuDNN, and a low-level wrapper which supports only what CuDNN supports. The top-level function conditions on CuDNN version, and invokes the low-level interface 1 or n times. - There is now a debugging printer for tensor descriptors. - Convolution struct is replaced with ConvolutionArgs, which is not part of the public API but is used internally to conveniently pass around all of the arguments needed for Convolution. - Add some constexprs for well-known dimensions, reduce amount of magic numbers in code. - Put 'deterministic' in to ConvParams. Fixes #3659 - Lots more comments. - Some pessimizations, in the name of code clarity: - The descriptors are initialized on every invocation of convolution forward/backward. Previously, the descriptors were cached, so that you didn't have to initialize them again on backwards. This is difficult to support in the ATen interface so I didn't support it. - Legacy group convolution initializes its workspace for every group it performs. I did not feel motivated to fix this because the legacy codepath is already quite slow. - Affine grid generator and grid sampler automatically call contiguous on their arguments as necessary. 
- Batchnorm input checking is greatly beefed up, it now checks for the following input characteristics: - Definedness - GPU location - Type - Contiguity - Size PyTorch binding code changes - batchnorm now uses consistent var/data naming - batchnorm and convolution make use of new ATen bindings - Affine grid generator and grid sampler make use of ATen CuDNN bindings via derivatives.yaml. This means I had to restructure the code a little, since the THNN bindings still go through a legacy Python class. - I fixed some warnings: - s/friend class/friend struct/ on InterpreterStateImpl - Removed pessimizing move 'detached' in torch/csrc/autograd/variable.cpp - Removed unused pack_list on Scalar Signed-off-by: Edward Z. Yang <ezyang@fb.com> GCC 4.8 buildfix Signed-off-by: Edward Z. Yang <ezyang@fb.com> Add TensorGeometry to ATen.h Signed-off-by: Edward Z. Yang <ezyang@fb.com> CUDNN_CHECK Signed-off-by: Edward Z. Yang <ezyang@fb.com> Update TODO comment Signed-off-by: Edward Z. Yang <ezyang@fb.com> Delete return in cudnn_grid_sampler Signed-off-by: Edward Z. Yang <ezyang@fb.com> s/cudnnSetStreamToCurrent/setCuDNNStreamToCurrent/g Signed-off-by: Edward Z. Yang <ezyang@fb.com> Don't allocate a new vector when filtering defined. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Remove Check overloads, convert to pass references. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Some more microbenchmarking. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-11-30 23:06:58 -05:00
Zachary DeVito	70ca83793d	Add support to emit compile_commands.json from CMake/ninja files.	2017-11-30 13:47:27 -05:00
Zachary DeVito	0e54c3a989	Significantly speed up the incremental build. This commit adds code to setup.py to use ninja to manage C++ and code generator dependencies rather than use raw setuptools. This is based on similar code added to ONNX. Enabled optionally when ninja is installed. On my computer speed for a do-nothing build drops from 10s to 1.5 seconds. Speed of other compilation steps is significantly improved as well. Dependencies are tracked correctly so the need for ccache is reduced.	2017-11-30 13:47:27 -05:00
Zachary DeVito	929a11f920	Add interpreter support for Handles/PythonOp/CppOp (#3866 ) * Add interpreter support for Handles/PythonOp/CppOp This treats Handles as a first-class type in the interpreter since this turned out to be conceptually simpler than treating them as a separate concept, which requires a second channel for register allocating and moving data from one op to the next. Notes: * The refcounting nature of tensors is factored into its own base type so that it can be shared with other refcounted types such as handle. * Some methods redundant with TensorBase have been deleted from Tensor * The interpreter uses raw refcounted handles. In addition to being able to treat Tensors and Handles as the same base object, it removes a lot of redundant refcounting as objects moved from tensors to input/ output lists. * aten_dispatch has been updated to work directly on the raw refcounted lists to avoid refcounting and duplicate lists. * Removing jit_closure.cpp, The interpreter can now handle all pathways. * Functions like `unsafeToTensorShare` describe how ownership transfers in the interpreter. The `Steal` variants take rvalue references as arguments, and invalidate those arguments to prevent potential problems. * Make TensorTemporary is not a subtype relationship because it is too easy to do something horribly unsafe: ``` void foo(at::Tensor bar) { // bar destructor call release on a temporary! } foo(TensorTemporary(retainable)); // structure slicing! ```	2017-11-29 11:38:57 -05:00
Sam Gross	4518793aa2	Implement indexing in ATen (#3725 ) Implements basic and advanced indexing using ATen tensors/variables. Basic indexing is translated at the Python-binding level (python_variable_indexing.cpp) to slice/squeeze/unsqueeze/select calls. Advanced indexing is implemented in ATen in terms of take() and put() calls.	2017-11-21 13:19:00 -05:00
Scott Stevenson	a9ef76b9c6	Reflect renaming of OS X to macOS (#3795 )	2017-11-20 16:52:10 -05:00
Adam Paszke	3e4a777e44	Correct JIT interpreter autograd function (#3760 )	2017-11-19 21:48:22 +01:00
Zachary DeVito	cc7f09a372	Add cudaEvent support to the profiler (#3734 ) * Add cudaEvent support to the profiler This adds the ability to record cuda timings using cudaEventRecord in the profiler. Since it doesn't require nvprof it is easier to run than the nvprof path. This also records a thread id for each event, which will make tracing results easier to understand * Add flow arrows from cpu to cuda event * Fix no cuda build * Review comments * Move CUDA checks to one place	2017-11-16 13:58:09 -08:00
Soumith Chintala	99037d627d	fix OSX cuda build (#3722 )	2017-11-15 16:38:18 -05:00
Zachary DeVito	e43ff32192	Add a JIT interpreter (#3634 ) * Add a JIT interpreter The separate interpreter is used to run graphs with a lower overhead than converting them to autograd graphs. Some notes: * does not support Handles/PythonOp/CppOp, these will be in a future commit * jit_closure.cpp still exists and we fall back to it for now when the interpreter cannot handle something because of PythonOp/CppOp * In order to support retain_graph=True, the interpreter can be cloned, creating a copy that can be run with different arguments. This is assumed to be the non-standard case so cloning is not particularly optimized. No tensor _data_ is copied, but the at::Tensor list in the interpreter is. If we hit problems, there is a lot we could do (such as register allocation) to minimize the stuff that needs to be copied. * Uses a pImpl pattern to keep implementation details out of its header file. * Modifies the way getTensorOp works so that it reads/writes to already-existing vectors, this prevents needing to realloc these buffers each time. * Timings are here: https://gist.github.com/zdevito/5a20ac29fb1b9e449e693b67dc478127 This reduces overhead to about the same as running it in python. It is about 10us faster to run the same thing using ATen directly. * Code Mod Interpreter -> InterpreterState Function -> Code Add other requested comments. * RegList -> ListHandle<T> Change the RegList functions to be safer by identifying the type of each argument list, and checking that list insert does not try to add to two different lists at once. * Use exactly equal for interp tests	2017-11-13 22:09:53 -08:00
Sam Gross	4fa94793dd	Bump version in master (#3605 )	2017-11-11 18:49:19 -05:00
peter	7160fb0801	Fix setup scripts for Windows CUDA builds	2017-11-11 13:05:35 +01:00
Adam Paszke	1f1612ee37	Move _CompiledMixin to C++	2017-11-10 16:31:44 +01:00
Soumith Chintala	285ce10dbe	fix linking order of nvrtc to force no-as-needed (#3583 )	2017-11-08 22:05:09 -05:00
Edward Z. Yang	d2784b6e5b	Link ATen against CuDNN when available. (#3582 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-11-08 20:20:53 -05:00
peterjc123	aa911939a3	Improve Windows Compatibility (for csrc/scripts) (#2941 )	2017-11-08 19:51:35 +01:00
Adam Paszke	621fbd5c4e	Move flattening/unflattening JIT logic to C	2017-11-06 19:42:44 -05:00
Sam Gross	fde355f7d4	Allow in-place operations on views (#3384 ) Allow in-place operations on views Adds VariableViewImpl, a subclass of VariableImpl which has a pointer to the base Variable on which it is a view. In-place operations on views change the grad_fn of the base. Note that in-place operations only work on views that are the first output of the function that created them. All C++/ATen implemented functions have this behavior, but it's possible to write Python-implemented autograd functions that do not. In-place operations on these views will raise an exception. Fixes #3313	2017-11-06 18:19:56 -05:00
Zach DeVito	f6dac327df	build fixes	2017-11-02 19:53:36 -04:00
Zach DeVito	88d56cc198	fix setup.py paths	2017-11-02 19:53:36 -04:00
Zach DeVito	5aa5b572e4	update build so that all of TH* is in libATen	2017-11-02 19:53:36 -04:00
Sam Gross	afdf50cafe	Move jit/assert.h to csrc/assertions.h (#3442 ) I've kept JIT_ASSERT as an alias to TORCH_ASSERT, which we can use throughout the C++ code.	2017-11-02 13:26:51 -04:00
Soumith Chintala	fc7a68d147	fix lint	2017-11-02 07:36:58 -04:00
Soumith Chintala	4108feb27d	fix OSX cuda build	2017-11-02 07:15:24 -04:00
Trevor Killeen	0e38d3bbb3	remove thpp library (#3405 )	2017-11-01 11:57:09 -04:00
Trevor Killeen	b544882335	ATen in THD (Part I) (#2288 ) * enable size from ATen type * temp commit aten thd * port copy, math * port random * changes after rebase * lapack bind * thd and csrc compile * fix min/max reductions in DataChannelTCP * clean up changes * re-enable tensor constructors * port MPI to at::Tensor * fix storage methods to not cast to thpp storage ptrs	2017-11-01 09:59:02 -04:00
Edward Z. Yang	d4abaa4b9e	Move ONNX broadcast fusion into separate ONNX pass, fixes verbose printing. This breaks a lot of the onnx-pytorch tests because the abstraction barriers are not respected. I'll spin up a patch for that separately. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-11-01 09:49:53 -04:00
Soumith Chintala	91af122d43	add no-as-needed for THRTC	2017-11-01 04:25:42 -07:00
Soumith Chintala	88d9ebc850	lazy-load nvrtc and libcuda (#3408 )	2017-11-01 06:07:03 -04:00
Adam Cécile	a5dbc254f8	if git is not installed at all, no subprocess exception will be raised (#3379 )	2017-10-30 18:37:12 -04:00
Edward Z. Yang	40f7f6e095	Improve handling of 'expand' (broadcasting) in JIT and ONNX The pieces: - I improved the lint / asserts to catch some bugs which I committed while working on my export. There are two new properties which the linter checks now: (1) "Anticipated uses". If a node says that it is used by M, M better appear later in the topsort. Previously, we only checked if it was in all_nodes. (2) If you are a select node, you better be a multi-type node; if you're not a select node, you better not be! And you should never have an input that is multi-type. - There is a new peephole optimization pass, for simple, local transformations to graphs. Right now, it implements a simple optimization: remove 'expand' invocations that are no-ops (the size before matches the size after), but we can add other things to it later. I needed this for ONNX because no-op expands show up in the left-hand argument, which we don't support. - There is now a broadcast fuser, which fuses ATen expand ops into broadcastable ONNX ops (Add, Div, Mul, Pow, Sub, Gemm.) It only fuses when the original size is a suffix of the new size, as per the ONNX spec. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-10-29 23:50:34 -04:00
Maxim Berman	7b00adf5d3	Add CUDNN_LIB_DIR in rpath (#3255 ) * Add CUDNN_LIB_DIR in link -rpath * insert CUDNN_LIB_PATH in front of rpath	2017-10-28 00:13:53 -04:00
Adam Paszke	61afb0d519	Autogenerate ATen dispatch for JIT nodes	2017-10-27 02:40:09 +05:30
Sam Gross	67839ce7bc	Delete unused Softmax code (#3220 ) Softmax and LogSoftmax are automatically bound and dispatched through VariableType.	2017-10-21 20:51:27 +02:00
Edward Z. Yang	67612cba09	Add -Wno-missing-braces Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-10-19 23:04:19 -04:00
Sam Gross	f1f64c8d07	Generate autograd functions for NN / more refactors (#3136 ) Generate autograd functions for NN and implement more derivatives in derivatives.yaml A big refactor of gen_variable_type.py	2017-10-19 15:03:26 -04:00
Adam Paszke	98e67448fa	Large Softmax and LogSoftmax refactor - Cleaned up THNN and THCUNN code and kernels - Improved THCUNN kernel performance 5x, making it match cuDNN performance - Added support for computing softmax over arbitrary dims NOTE: The default dim for 3D inputs is now 1 (used to be 0) - Both functions now accept inputs with arbitrarily many dimensions - Autograd functions no longer save the input (it's unnecessary) - Added cuDNN bindings for softmax, but they are unused as THCUNN matches or even exceeds cuDNN performance	2017-10-19 19:51:10 +02:00
Trevor Killeen	dcb457fdd9	add support for using nnpack when installed via conda (#3155 ) * add support for using nnpack when installed via conda * unify nnpack discovery between conda and user	2017-10-18 20:11:13 +02:00
Richard Zou	0f4ae13f05	Better cudnn version checking (#3132 )	2017-10-16 20:59:18 +02:00
Richard Zou	1322f9a272	Add cudnn version to torch.version	2017-10-13 23:58:25 +02:00
Francisco Massa	f093545919	Add compiled CUDA version in torch.version.cuda	2017-10-10 10:16:14 -04:00
Soumith Chintala	efe91fb9c1	delete redundant python nccl code	2017-10-09 22:24:18 -04:00
Soumith Chintala	4d62933529	add initial NCCL C bindings	2017-10-09 22:24:18 -04:00
Soumith Chintala	b7e258f81e	link specific versioned System NCCL, rather than generic file	2017-10-09 22:24:18 -04:00
Trevor Killeen	029252fb3b	NNPACK bindings for Convolution (#2826 ) * skeleton commit for building and linking nnpack library in PyTorch * first stab at conv forward binding + integration * bind NNPACK gradient kernels * move nnpack forward, input gradient calls deeper * nnpack conv api mimics nn * fix symbol error; use memory across calls * clean up warnings, add shape checking, thread safety, configurable thread specification * add batch size threshold, also bind for single-element batch for the future	2017-10-04 13:48:14 -04:00
Adam Paszke	437d3af7bf	Add CUDNN_INCLUDE_DIR before CUDA directories in setup.py	2017-10-03 10:06:47 -04:00
Sam Gross	de757805fc	Implement some autograd functions using ATen (#2805 ) This adds some generated autograd functions implemented in C++, which are generated from derivatives.yaml. It also generates Python bindings for the Variable methods. The generated files are: Functions.cpp/h: subclasses of torch::autograd::Function VariableType.cpp/h: The at::Type for autograd Variables python_variable_methods.cpp: Python bindings to torch::autograd::Variable python_variable_methods_dispatch.h: wrapper which releases GIL and sets the CUDA device python_functions.cpp/h: exposes generated autograd functions as Python objects The generated functions are mostly shadowed by the definitions in variable.py. We'll remove the Python implementations in favor of the generated C++ implementations in a subsequent commit.	2017-09-26 17:08:00 -04:00
Adam Paszke	b7849662b5	Always regenerate nn wrappers after rebuilding THNN and THCUNN	2017-09-25 23:21:30 -04:00
Adam Paszke	411e1469e0	Add tools for autograd profiling	2017-09-25 23:21:30 -04:00
Soumith Chintala	f4eca7c94d	make CUDA_HOME take precedence over all other CUDA detection methods (#2863 )	2017-09-25 18:17:40 -04:00
Soumith Chintala	5be06230f9	cleanup external NCCL detection, add NCCL_ROOT_DIR / NCCL_LIB_DIR mechanism	2017-09-25 11:28:59 -04:00
Edward Z. Yang	bf9ab91779	Indicate if the last invocation of setup.py was debug or not. How to use: import torch.version print(torch.version.debug) Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-22 18:33:47 -04:00
Lu Fang	0a1ac8bfe5	create a cse pass, with very naive support.	2017-09-22 17:06:27 -04:00
Edward Z. Yang	670ec4bc59	Split Type into its own header file. No other substantive changes. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-20 12:24:27 -04:00
Adam Paszke	28828e033f	Make certain functions traceable	2017-09-19 10:53:32 -04:00
Adam Paszke	b708b6de8d	Add ONNX pass (JIT trace initialization)	2017-09-19 10:53:32 -04:00
Adam Paszke	0e53fe3a41	Put ONNX files where they belong	2017-09-19 10:53:32 -04:00
Adam Paszke	8dae433de8	Move JIT passes to a separate directory	2017-09-19 10:53:32 -04:00
Sam Gross	80d229b0e7	Refactor THPUtils_invalidArguments into separate file	2017-09-13 19:18:02 -04:00
Peter Ruch	0a9f93e43c	add env var for python executable	2017-09-13 17:49:08 -04:00
Soumith Chintala	19cfda761c	write THD link libraries to text file and read it in setup.py to link dependencies correctly (#2711 )	2017-09-12 20:56:36 -04:00
Sam Gross	1290e586fb	Use at::Tensor based autograd Variable (#2676 ) Variable is now a subclass of at::Tensor backed by a VariableImpl* pImpl. The implementation of the ATen functions is defined in the auto-generated VariableType.h/cpp file. Currently, only functions which fall through to the base type, such as sizes() and isCuda() are implemented. Differentiable ops like add() and mul() will be added in a subsequent PR.	2017-09-12 11:36:01 -04:00
Soumith Chintala	cf2c7ca998	add THPP linkage when building THD (#2687 )	2017-09-11 08:53:38 -04:00
Edward Z. Yang	459cc5a346	Check for nanopb and pybind11 submodules as well. (#2660 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-07 13:24:31 -04:00
Soumith Chintala	84095f9512	add linux guard	2017-09-07 11:57:49 -04:00
Soumith Chintala	894c05fd22	fix static linkage and make THD statically linked	2017-09-07 11:54:18 -04:00
Zach DeVito	6d8d5bab4c	Codemod Toffee -> ONNX, toffee -> onnx. Change file names to match	2017-09-06 13:45:39 -04:00
Edward Z. Yang	d59714e3b1	Code review comment changes. - Reduce setup.py diff. - Expunge WITH_TOFFEE from codebase. - Elaborate on a comment. - Move gen_toffee.sh to tools - Delete densenet test. - Use 'using' to inherit a constructor. - Delete outdated comment. - Comment about why primspecs can return fewer outputs. - Remove dead, commented out includes. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Edward Z. Yang	7ac6d67a4e	Add nanopb to list of dep_libs. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Adam Paszke	594f98ce16	Support multi-stage AutogradClosures	2017-09-05 17:48:55 -04:00
Edward Z. Yang	605ef38831	Explicitly override CMAKE_DEBUG_POSTFIX for nanopb build. If it's not set, CMAKE_DEBUG_POSTFIX sets it to 'd' which means the static library gets named something different when built in debug mode. This is annoying because it means if you build in debug mode, the library is in a different place. Rather than teach the build system to find the correct name, just set this POSTFIX so names don't change. Also, update setup.py to look for the non-debug archive. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Edward Z. Yang	de6ef65be5	Port to nanopb. General strategy: - nanopb is statically linked into PyTorch. It must be built with -fPIC. - Generated nanopb files for toffee.proto are checked into our repo. - Because nanopb generated protobufs are C only, we wrote a wrapper around it to give a Google C++ style interface. More on this shortly. How does the wrapper work? - It's called "micropb" because it is less small than nanopb :) - nanopb requires all variable-length fields to be written out using a "callbacks" mechanism. - We wrote pre-canned callbacks for all of the types ToffeeIR writes out and lists; these are micropb_callback and micropb_callback_list. These operate simply by dynamically allocating and storing the data to be written out in data (this defeats the purpose of the callback mechanism, but it's easy to implement) - Finally some boilerplate to actually implement the wrapper classes and have owning pointers to the actual data. Testing strategy: - Take the serialized protobuf from nanopb, parse it again with ToffeeIR and print it. Worked with all of test_jit.py! These tests don't run without 'toffee' being installed. TODO: - Update CI to install ToffeeIR, so we can run the Toffee tests in CI - Update E2E with Caffe2 tests so that they work with new stuff. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00

581 Commits