pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Gregory Chanan	705d80b51e	Remove some Type.tensor usages and remove native_tensor without size. (#12355 ) Summary: This is to move us along the path to removing Type from the public API. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12355 Reviewed By: ezyang Differential Revision: D10212616 Pulled By: gchanan fbshipit-source-id: c9cd128d1111ab219cb0b2f3bf5b632502ab97c0	2018-10-05 11:12:07 -07:00
David Riazati	9ebac3d7fe	Improve type kind error message (#12344 ) Summary: Address #12326 Pull Request resolved: https://github.com/pytorch/pytorch/pull/12344 Differential Revision: D10210681 Pulled By: driazati fbshipit-source-id: fcc2e26b79dd2d7d5f9e7ef930e2bf434f2a7e08	2018-10-05 10:57:16 -07:00
Edward Yang	1e7050072b	Make TensorOptions contain optional fields, optimize struct size (#12103 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12103 This defers lookup of defaults to the site where we read out of TensorOptions. THIS IS A BC-BREAKING BEHAVIOR CHANGE, but we expect the bulk of uses of OptionsGuard don't allocate TensorOptions inside the OptionsGuard region, and then use it outside of the region (the situation where behavior could change.) I also optimize the size of TensorOptions by rearranging fields, so that we always fit in two 64-bit words. Reviewed By: goldsborough Differential Revision: D10052523 fbshipit-source-id: f454a15b4dbf8cd17bc902ab7d2016f2f689ed13	2018-10-05 09:24:53 -07:00
Bram Wasti	5cb2b2358c	Move interned_strings and get build working (#12039 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12039 Refactoring out this diff D9819906 Reviewed By: ezyang Differential Revision: D10024844 fbshipit-source-id: 75b6c93526dc1490299f8b5e564e029146338178	2018-10-05 00:41:18 -07:00
David Riazati	f0b73ff790	Pretty printer improvements (#12179 ) Summary: * Replaces `prim::PythonOp` with the name of the function being called * Delays printing values used in `prim::Return` nodes until the return node itself if that is the only place the value is used to remove some useless assigns zdevito apaszke ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/12179 Differential Revision: D10132661 Pulled By: driazati fbshipit-source-id: cbc4ac34137ed5872049082e25d19eb1ebc71208	2018-10-04 15:14:51 -07:00
Yangqing Jia	38f3d1fc40	move flags to c10 (#12144 ) Summary: Still in flux. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12144 Reviewed By: smessmer Differential Revision: D10140176 Pulled By: Yangqing fbshipit-source-id: 1a313abed022039333e3925d19f8b3ef2d95306c	2018-10-04 02:09:56 -07:00
Peter Goldsborough	bcc2a0599b	Enable clang-tidy in CI (#12213 ) Summary: At long last, we will have clang-tidy enabled in CI. For a while I thought I could clean up the project enough to enable clang-tidy with all checks enabled, but I figure it's smarter to set up the minimal checks and at least have those in CI. We can fix more going forward. ezyang apaszke Pull Request resolved: https://github.com/pytorch/pytorch/pull/12213 Differential Revision: D10183069 Pulled By: goldsborough fbshipit-source-id: 7ecd2d368258f46efe23a2449c0a206d10f3a769	2018-10-03 17:25:06 -07:00
David Riazati	c9f9df002d	Properly catch errors in PythonOps (#12243 ) Summary: If a PythonOp throws an error it raises an exception to the interpreter and also releases the GIL which causes [pybind to segfault](https://github.com/potassco/clingo/issues/42) This fix catches pybind errors while the GIL is still held and throws a `python_error` to re-capture the GIL Fixes #12118 apaszke Pull Request resolved: https://github.com/pytorch/pytorch/pull/12243 Differential Revision: D10182787 Pulled By: driazati fbshipit-source-id: 719d4a7c3294af201e061cf7141bec3ca0fb1f04	2018-10-03 17:25:03 -07:00
David Riazati	d1ac1eba3b	Add `bool` type to IR (#11834 ) Summary: This PR adds a bool type to `IValue` and puts it into place. * changes conds for `prim::If` and `prim::Loop` to use `bool` type * changes operators that take `bool`s to match their native ops * fixes ambiguous `aten` ops `aten::std` and `aten::var` * fixes tests in `test_jit.py TestJitGenerated` ``` 'test_std_dim', 'test_std_dim_1d', 'test_std_dim_1d_neg0', 'test_std_dim_neg0', 'test_var_dim', 'test_var_dim_1d', 'test_var_dim_1d_neg0', 'test_var_dim_neg0' ``` * adds `prim::BoolToTensor` and `prim::TensorToBool` apaszke zdevito Pull Request resolved: https://github.com/pytorch/pytorch/pull/11834 Differential Revision: D9928570 Pulled By: driazati fbshipit-source-id: 373c53df2f1a8ffa9e33d9a517002fbeef25f3eb	2018-10-03 12:40:03 -07:00
Elias Ellison	fed91f873f	(Very small) allow trailing commas in assign or tuples (#11723 ) Summary: Allow trailing commas in assign statements or tuples, which also allows single element tuples. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11723 Differential Revision: D10052162 Pulled By: eellison fbshipit-source-id: 344d908a3ad942a23ebd9f341794bc9734226aa8	2018-10-01 10:10:13 -07:00
mruberry	7b2c0a09e4	Adds support for NaN, +inf, -inf float scalars to CPU and CUDA fusers (#12070 ) Summary: In current upstream float scalars are always written into kernels with: `out << std::scientific << v << "f";` When the floats are special values like NaN, +inf, or -inf this produces nonsense that causes compilation to fail. This fix updates the conversion of float scalars to device-specific special values. The appropriate macros are added to the CPU and CUDA resource strings. Note that a NAN macro was not necessary on the CPU since math.h defines NAN. To verify this fix I updated the test_clamp_fusion test in test_jit.py. I wanted to test -inf, too, but -inf is not currently accepted by the interpreter. Edit: Forgot to mention, this partially addresses issue #12067. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12070 Reviewed By: ezyang Differential Revision: D10044704 Pulled By: soumith fbshipit-source-id: 8f4a930862d66a7d37d985e3f6a6fb724579e74c	2018-09-28 14:11:49 -07:00
Zachary DeVito	e7e10e60e0	Introduce builtin script functions (#12141 ) Summary: This functionality replaces the Scalar-Tensor builtin operators with builtin functions. Builtin functions are used in place of operators where one operator can be defined using a composition of another. This simplifies later optimization passes by allowing us to have fewer operators. In the future, builtin functions can be used for other purposes. For example, we can define derivative functions as code rather than building graphs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12141 Reviewed By: ezyang Differential Revision: D10088065 Pulled By: zdevito fbshipit-source-id: a2acb06346e649c4c8a2fe423b420871161c21cf	2018-09-28 10:55:08 -07:00
Luca Antiga	5be0baefa2	Use streams in JIT serialization, allow JIT serialization to/from buffer (#11932 ) Summary: This PR replaces the use of `std::FILE` with `istream`/`ostream` for JIT serialization. It uses this mechanism to add the possibility to serialize to/from binary buffers, in addition to files, both in `libtorch` and from Python. `getExportImportCopy` in `test_jit.py` has been updated so that both file and buffer codepaths are exercised during tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11932 Differential Revision: D10084303 Pulled By: apaszke fbshipit-source-id: b850801b3932922fa1dbac6fdaed5063d58bc20d	2018-09-28 07:54:27 -07:00
Michael Suo	7f35e92af2	mutable lists (#10700 ) Summary: This PR implements the design that we discussed. Changes: - Added a World token IValue and type. The IValue is basically a dummy struct for now, in the future we may extend it (say, add thread-local state). - Effectful ops explicitly declare they are mutable by having World tokens as inputs and outputs in their schema. - Purely functional ops that use mutable values will get "fenced" and the world token will be threaded through the fences - AnnotateEffects pass which wires up all the world tokens together. Pull Request resolved: https://github.com/pytorch/pytorch/pull/10700 Reviewed By: eellison Differential Revision: D9547881 Pulled By: michaelsuo fbshipit-source-id: ebbd786c31f15bf45e2ddb0c188438ff2f5f3c88	2018-09-27 19:25:13 -07:00
Edward Z. Yang	a5818047c4	Rewrite serialization to correctly handle partial reads/writes in all cases (#12143 ) Summary: Previously, doRead/doWrite were functions that could return partial reads/writes, and we checked for this case inconsistently in the call sites of serialization.cpp. Now, these functions do NOT return the amount of bytes read/written, and instead handle the necessary checking loop themselves. Fixes #12042. Maybe. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/12143 Differential Revision: D10097027 Pulled By: ezyang fbshipit-source-id: fd222ab8a825bed352153648ad396acfe124a3e1	2018-09-27 19:09:53 -07:00
Yangqing Jia	13cf39294d	Remove ATen/Error.h and use ATen/core/Error.h instead. (#12132 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12132 TSIA. No code change involved. Reviewed By: bwasti Differential Revision: D10083237 fbshipit-source-id: bdab029015b9d0f1fa1f866c68aa5945cc68db9d	2018-09-27 10:11:17 -07:00
Freddie Mendoza	a72603f8f8	Fix for ppc64le jit graph difference in sigmoid backward, see #10726 (#11579 ) Summary: As reported in Issue #10726, the jit compiler, when running on ppc64le, may produce an isomorphic output but fail a diff test against the expected output file. The expected output file is created from a test that was run on x86_64. This ensures that if ppc64le test output is different, the output is instead compared to an expected output file created when the test is run on a ppc64le system. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11579 Differential Revision: D10080890 Pulled By: soumith fbshipit-source-id: 7249bf6b5dfa7c853368a3688a982bc9ed642bc9	2018-09-27 07:09:31 -07:00
Zachary DeVito	478803a75f	Introduce type variables to implement generic list operators (#12040 ) Summary: We generate specialized list operations for int, float, and Tensor lists so that small lists of integers like the arguments to conv do not involve tons of boxing code. This PR adds a fallback GenericList for List types that contain any other type. It does so by adding type variables to `jit::Type`, and machinery for matching/replacing the type variables during `tryMatchSchema` and operator lookup. It also modifies the builtin list ops to include a fallback that works on a GenericList object that simply holds IValues. This is distinguished from IValue's tuple type so that conversion to/from Python still happens losslessly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12040 Differential Revision: D10037098 Pulled By: zdevito fbshipit-source-id: 0c5f2864d12e7d33554bf34cc29e5fb700dde150	2018-09-26 17:02:51 -07:00
Adam Paszke	78fe149ab9	Fix ONNX bug, add symbolic for full Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12052 Differential Revision: D10044910 Pulled By: apaszke fbshipit-source-id: 015ef372966d7594e1b450e348d457429f6ef20d	2018-09-26 11:45:25 -07:00
Richard Zou	c8a0b11b7f	add autodiff expressions for common operations (#11832 ) Summary: This PR does a few things: Previously test_jit.py only tested autograd on backward graphs. This is because we borrow from test_autograd and construct graphs with a small number of nodes. Because the number of nodes is small (typically 1-2), those graphs do not end up containing autodiff subgraphs, so autodiff never gets tested. This PR enables autodiff testing by doing the following: - added disableDebugAutodiffSubgraphInlining fn to graph_executor to disable autodiff subgraph inlining. - (implementation) added autodiffSubgraphNodeThreshold and autodiffSubgraphInlineThreshold. These are set to their default values (2, 5) but disableDebugAutodiffSubgraphInlining() sets both to 1, disabling subgraph inlining and allowing 1-node autodiff subgraphs. - The relevant backward jit tests disable autodiff subgraph inlining so they will test the autodiff versions of the operators instead of autograd whenever an autodiff variant exists. - We don't run the tests that do inline autodiff subgraphs anymore. This has no impact on testing correctness because the assumption is that autograd functions are correct and are tested in test_autograd.py This allows the graph fuser to work better because a lot of these ops were previously not autodiff-compatible but fusible. As a more concrete example, lstm backward contains a lot of tensor-scalar operations; these autodiff formulas help its double backward pass. Included: - arithmetic overloads - abs, acos, asin, atan, ceil, cos, cosh, exp, expm1, floor, fmod, frac, log, log10, log1p, log2, reciprocal, remainder, round, sin, sinh, tan, trunc, rsqrt TestJitGenerated tests autodiff for all of the added operations. cc apaszke zdevito Pull Request resolved: https://github.com/pytorch/pytorch/pull/11832 Differential Revision: D10031256 Pulled By: zou3519 fbshipit-source-id: 9daf9900a5ad187743609cd0fbbd10b15411ad93	2018-09-26 08:10:04 -07:00
Wei Yang	807de9a1e3	fix segfault when grad to a hook fn is None (#12028 ) Summary: - fixes https://github.com/pytorch/pytorch/issues/11751 by checking if a grad is a Python None object before getting cdata from it - behaviors: pre-fix ``` >>> a = torch.randn(5, requires_grad=True) >>> a_list = a.unbind() >>> a0 = a_list[0] >>> @a0.register_hook ...: def hook(grad): ...: print(grad) >>> a_list[0].backward() tensor(1.) >>> print('a_list[0]', a_list[0].grad, a.grad) ('a_list[0]', None, tensor([1., 0., 0., 0., 0.])) >>> a_list[1].backward() # segfault ``` post-fix ``` >>> a = torch.randn(5, requires_grad=True) >>> a_list = a.unbind() >>> a0 = a_list[0] >>> @a0.register_hook ...: def hook(grad): ...: print(grad) >>> a_list[0].backward() tensor(1.) >>> print(a_list[0].grad, a.grad) (None, tensor([1., 0., 0., 0., 0.])) >>> a_list[1].backward() None >>> print(a_list[1].grad, a.grad) (None, tensor([1., 1., 0., 0., 0.])) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/12028 Differential Revision: D10034094 Pulled By: weiyangfb fbshipit-source-id: 3f2135325fa7d338b920f57752057e4f6a6c0b1d	2018-09-25 19:10:25 -07:00
Edward Yang	3deb4791c3	Replace 'struct Tensor' with 'class Tensor'. (#12034 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12034 We need ATen and Caffe2 to line up, and the rule is that if you have any private/protected members, you should declare it as a class. Class we go. (There are some other obvious candidates for this treatment, but I've kept this patch just to Tensor) Reviewed By: gchanan, mingzhe09088 Differential Revision: D10024467 fbshipit-source-id: 17cfe2741ba9c3f56cb87d6f5d1afd3c61a8e4fe	2018-09-25 09:54:35 -07:00
Gregory Chanan	0947712e5d	Move Factory functions from Type to TypeExtendedInterface. (#12025 ) Summary: This makes a few changes wrt Type, with the ultimate goal of removing Type from the public Methods/Functions. In particular: 1) Removes factory functions from Type, into TypeExtendedInterface. 2) sparse_coo_tensor is now a first class at:: namespace function, with TensorOptions overloads. 3) We move from Type-based sparse_coo_tensor dispatch to function-based. Note we still require a number of changes to get rid of Type in the public interface, in particular TensorOptions needs to support CUDA vs non-CUDA dispatch. That is coming in a future patch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12025 Reviewed By: ezyang Differential Revision: D10017205 Pulled By: gchanan fbshipit-source-id: 00807a37b09ed33f0656aaa165bb925abb026320	2018-09-25 09:40:17 -07:00
Edward Yang	d4ce41c4de	Rename tensor_impl_ to impl_ in Tensor (#12035 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12035 This brings it in line with Caffe2's naming Reviewed By: mingzhe09088 Differential Revision: D10024485 fbshipit-source-id: a6feef82a56b5eb3043b0821ea802ba746e542a0	2018-09-25 09:11:39 -07:00
Hong Xu	3417a1e7e4	Prepend a "const" to a for loop in printPyObject. (#11857 ) Summary: As pytuple should be a constant type (since obj is constant), potential errors would occur without this const qualifier, e.g., when compiling against PyPy. Although PyPy is not supported yet, it is still useful to remove this compilation issue (one of only a few) so that hackers can experiment with it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11857 Differential Revision: D10024149 Pulled By: soumith fbshipit-source-id: aa7e08e58f6369233a11477113351dccd3854ba8	2018-09-24 23:12:57 -07:00
Adam Paszke	a830964007	Eliminate no-op adds and muls in peephole pass (#11801 ) Summary: Because we emit a lot of them in our symbolic AD. This brings down the backward time of an LSTM I'm testing from 14.2ms to 12.5ms (a 15% improvement). Pull Request resolved: https://github.com/pytorch/pytorch/pull/11801 Differential Revision: D9916815 Pulled By: apaszke fbshipit-source-id: 2d9cb886c424ccd43b9f996aad89950d3bddf494	2018-09-24 17:48:48 -07:00
Peter Goldsborough	e05d689c49	Unify C++ API with C++ extensions (#11510 ) Summary: Currently the C++ API and C++ extensions are effectively two different, entirely orthogonal code paths. This PR unifies the C++ API with the C++ extension API by adding an element of Python binding support to the C++ API. This means the `torch/torch.h` included by C++ extensions, which currently routes to `torch/csrc/torch.h`, can now be rerouted to `torch/csrc/api/include/torch/torch.h` -- i.e. the main C++ API header. This header then includes Python binding support conditioned on a define (`TORCH_WITH_PYTHON_BINDINGS`), which is only passed when building a C++ extension. Currently stacked on top of https://github.com/pytorch/pytorch/pull/11498 Why is this useful? 1. One less codepath. In particular, there has been trouble again and again due to the two `torch/torch.h` header files and ambiguity when both ended up in the include path. This is now fixed. 2. I have found that it is quite common to want to bind a C++ API module back into Python. This could be for simple experimentation, or to have your training loop in Python but your models in C++. This PR makes this easier by adding pybind11 support to the C++ API. 3. The C++ extension API simply becomes richer by gaining access to the C++ API headers. soumith ezyang apaszke Pull Request resolved: https://github.com/pytorch/pytorch/pull/11510 Reviewed By: ezyang Differential Revision: D9998835 Pulled By: goldsborough fbshipit-source-id: 7a94b44a9d7e0377b7f1cfc99ba2060874d51535	2018-09-24 14:44:21 -07:00
Adam Paszke	51414822f5	Stop moving constants into DifferentiableSubgraphs (#11809 ) Summary: Or even taking them as inputs. Doing so prevents optimizations from happening either inside the differentiable subgraphs or in the surrounding graph. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11809 Differential Revision: D10009680 Pulled By: apaszke fbshipit-source-id: face638566228e470a6deec48dc2aa3a1cce26d4	2018-09-24 13:24:53 -07:00
Christian Puhrsch	a9e6a673ae	Remove caffe2::Tensor::capacity_nbytes, at::Tensor::to##name##Data, (#11876 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11876 Modern C++ api instead of macros, item() is aligned with Python frontend. caffe2::Tensor::capacity_nbytes is effectively unused and confusing w.r.t. caffe2::Tensor::nbytes(). codemod -d caffe2 --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCByte "item<uint8_t>" codemod -d caffe2 --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCLong "item<int64_t>" codemod -d caffe2 --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCInt "item<int32_t>" codemod -d caffe2 --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCDouble "item<double>" codemod -d caffe2 --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat "item<float>" codemod -d caffe2 --extensions cc,cpp,cu,cuh,h,py,hpp,mm toByteData "data<uint8_t>" codemod -d caffe2 --extensions cc,cpp,cu,cuh,h,py,hpp,mm toLongData "data<int64_t>" codemod -d caffe2 --extensions cc,cpp,cu,cuh,h,py,hpp,mm toIntData "data<int32_t>" codemod -d caffe2 --extensions cc,cpp,cu,cuh,h,py,hpp,mm toDoubleData "data<double>" codemod -d caffe2 --extensions cc,cpp,cu,cuh,h,py,hpp,mm toFloatData "data<float>" codemod -d hphp --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCByte "item<uint8_t>" codemod -d hphp --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCLong "item<int64_t>" codemod -d hphp --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCInt "item<int32_t>" codemod -d hphp --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCDouble "item<double>" codemod -d hphp --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat "item<float>" codemod -d hphp --extensions cc,cpp,cu,cuh,h,py,hpp,mm toByteData "data<uint8_t>" codemod -d hphp --extensions cc,cpp,cu,cuh,h,py,hpp,mm toLongData "data<int64_t>" codemod -d hphp --extensions cc,cpp,cu,cuh,h,py,hpp,mm toIntData "data<int32_t>" codemod -d hphp --extensions cc,cpp,cu,cuh,h,py,hpp,mm toDoubleData "data<double>" codemod -d hphp --extensions cc,cpp,cu,cuh,h,py,hpp,mm toFloatData "data<float>" codemod -d caffe2 --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCComplexDouble "item<std::complex<double>>" codemod -d tc --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat "item<float>" Reviewed By: ezyang Differential Revision: D9948572 fbshipit-source-id: 70c9f5390d92b82c85fdd5f8a5aebca338ab413c	2018-09-24 10:40:10 -07:00
Gregory Chanan	1178851280	Get rid of most usages of Type.tensor. (#12002 ) Summary: 1) Most usages are replaced by at::empty. 2) native_tensor has its namespace function removed 3) Type.tensor(sizes, strides) becomes at::empty_strided(sizes, strides). Pull Request resolved: https://github.com/pytorch/pytorch/pull/12002 Differential Revision: D10007201 Pulled By: gchanan fbshipit-source-id: 5e5647c050ed2ecb87a33e0b5ce4928fa3186c34	2018-09-24 10:16:18 -07:00
Peter Goldsborough	825181ea9d	Rewrite C++ API tests in gtest (#11953 ) Summary: This PR is a large codemod to rewrite all C++ API tests with GoogleTest (gtest) instead of Catch. You can largely trust me to have correctly code-modded the tests, so it's not required to review every one of the 2000+ changed lines. However, additional things I changed were: 1. Moved the cmake parts for these tests into their own `CMakeLists.txt` under `test/cpp/api` and called `add_subdirectory` from `torch/CMakeLists.txt` 2. Fixed DataParallel tests which weren't being compiled because `USE_CUDA` wasn't correctly being set at all. 3. Updated README ezyang ebetica Pull Request resolved: https://github.com/pytorch/pytorch/pull/11953 Differential Revision: D9998883 Pulled By: goldsborough fbshipit-source-id: affe3f320b0ca63e7e0019926a59076bb943db80	2018-09-21 21:28:16 -07:00
Owen Anderson	89d56ae435	Move function deletion from the stack to the heap. (#11611 ) Summary: This eliminates the need for any heuristics regarding stack size limits. This is a re-do of #11534 with a fix to properly handle cases where multiple edges exist between a pair of functions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11611 Differential Revision: D9991198 Pulled By: resistor fbshipit-source-id: fecd2c5cac7e78f82a0f20cf33268bb1617bb4a0	2018-09-21 16:11:03 -07:00
Richard Zou	b5f60af94c	Shape prop view/reshape/as_strided through prim::ListConstructs (#11877 ) Summary: Previously, aten::view returned a Dynamic type when attr::size is a prim::ListConstruct. See [this for a repro](https://gist.github.com/zou3519/cbd610472ba3369f556fa612a7d93b28). This prevented a pre-multiplied lstm input graph from being fusible (aten::view is necessary to do premultiplication). If aten::view is passed an output of a prim::ListConstruct node, then shape prop should be able to figure out its TensorType because we statically know the number of inputs to prim::ListConstruct. This PR implements that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11877 Differential Revision: D9972356 Pulled By: zou3519 fbshipit-source-id: cb87786f6e7f222d4b8f07d8f2a9de34859cb6a5	2018-09-21 14:20:01 -07:00
Adam Paszke	7efbf3a827	Specialize ArgumentSpecs on tuple elements too (#11863 ) Summary: This is pretty important because a common situation of passing LSTM hidden states as a tuple completely trashes performance of a network. Cleans up all our propagation/undef specialization passes, at a cost of increased complexity of `ArgumentSpec` and `GraphExecutor`. An alternative would be to simply flatten all tuple inputs to a graph ahead of time, but that might just end up being confusing in the future (you never know if you're working with a graph that can have tuple or not). Pull Request resolved: https://github.com/pytorch/pytorch/pull/11863 Differential Revision: D9992814 Pulled By: apaszke fbshipit-source-id: 0a565a3b23e32f8fa72c0534e07c1ce6187739fc	2018-09-21 14:19:58 -07:00
Adam Paszke	1ad7e0c5ec	Minor JIT improvements (#11654 ) Summary: - Disable addmm fusion. The reason for this is explained in the comment. - Tiny change in `stack.h` that lets us avoid constructing an unnecessary temporary `IValue` on the (C++) stack (it will only get created on the interpreter stack directly). - Fixed a correctness issue in requires grad propagation Pull Request resolved: https://github.com/pytorch/pytorch/pull/11654 Reviewed By: colesbury Differential Revision: D9813739 Pulled By: apaszke fbshipit-source-id: 23e83bc8605802f39bfecf447efad9239b9421c3	2018-09-21 14:19:54 -07:00
Adam Paszke	c2a2110d71	Stop tracing _out overloads (#11910 ) Summary: They aren't recognized anywhere in the JIT Pull Request resolved: https://github.com/pytorch/pytorch/pull/11910 Differential Revision: D9979968 Pulled By: apaszke fbshipit-source-id: bb2505a14e3b1e54d5c243f99c80a4f4d918b204	2018-09-21 11:44:10 -07:00
Wei Yang	817e83fc01	fix PR #11061 (#11815 ) Summary: - fix PR https://github.com/pytorch/pytorch/pull/11061 by moving `detach_()` and `set_requires_grad()` to `torch.tensor_ctor()` and `tensor.new_tensor`, and also removed warnings and `args_requires_grad` from `internal_new_from_data ` - with this patch, the returned tensor from `tensor_ctor()` and `new_tensor` will be detached from source tensor, and set requires_grad based on the input args - `torch.as_tensor` retains its behavior as documented gchanan apaszke Pull Request resolved: https://github.com/pytorch/pytorch/pull/11815 Differential Revision: D9932713 Pulled By: weiyangfb fbshipit-source-id: 4290cbc57bd449954faadc597c24169a7b2d8259	2018-09-21 11:04:19 -07:00
Adam Paszke	e655f16c35	Pop stashed IntList in resize_, warn about its usage when tracing. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11909 Differential Revision: D9979595 fbshipit-source-id: 07b1027bd6bd1605a31afd4f57bcd58e307fa41e	2018-09-21 08:40:20 -07:00
Edward Yang	11bd2f2509	Retainable is no more (#11900 ) Summary: Stack:     ⚫  #11900 Retainable is no more  [💛](https://our.intern.facebook.com/intern/diff/D9977505/)     ⚪  #11902 Refactor fastGet/fastSet for clarity, removing a null pointer check.  [💛](https://our.intern.facebook.com/intern/diff/D9977654/) Kill it with fire Pull Request resolved: https://github.com/pytorch/pytorch/pull/11900 Differential Revision: D9979779 Pulled By: ezyang fbshipit-source-id: 0a437e7a0baadb6440e7dc39a01b4a406171faa7	2018-09-21 06:58:18 -07:00
Luca Antiga	58d28a5f12	Fix saving loaded module (#11915 ) Summary: This PR fixes #11913. In order to test for this, the model is serialized twice in `getExportImportCopy`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11915 Differential Revision: D9984697 Pulled By: soumith fbshipit-source-id: ae0250c179000c03db1522b99410f6ecb9681297	2018-09-21 06:58:16 -07:00
Peter Goldsborough	d712a71741	Protobuf serialization (#11619 ) Summary: This PR serves two purposes: 1. Design an abstraction over a serialization scheme for C++ modules, optimizers and tensors in general, 2. Add serialization to the ONNX/PyTorch proto format. This is currently a rough prototype I coded up today, to get quick feedback. For this I propose the following serialization interface within the C++ API: ```cpp namespace torch { namespace serialize { class Reader { public: virtual ~Reader() = default; virtual void read(const std::string& key, Tensor& tensor, bool is_buffer = false) = 0; virtual void finish() { } }; class Writer { public: virtual ~Writer() = default; virtual void write(const std::string& key, const Tensor& tensor, bool is_buffer = false) = 0; virtual void finish() { } }; }} // namespace torch::serialize ``` There are then subclasses of these two for (1) Cereal and (2) Protobuf (called the "DefaultWriter" and "DefaultReader" to hide the implementation details). See `torch/serialize/cereal.h` and `torch/serialize/default.h`. This abstraction and subclassing for these two allows us to: 1. Provide a cereal-less serialization forward that we can ship and iterate on going forward, 2. Provide no-friction backwards compatibility with existing C++ API uses, mainly StarCraft. The user-facing API is (conceptually): ```cpp void torch::save(const Module& module, Writer& writer); void torch::save(const Optimizer& optimizer, Writer& writer); void torch::read(Module& module, Reader& reader); void torch::read(Optimizer& optimizer, Reader& reader); ``` with implementations for both optimizers and modules that write into the `Writer` and read from the `Reader` ebetica ezyang zdevito dzhulgakov Pull Request resolved: https://github.com/pytorch/pytorch/pull/11619 Differential Revision: D9984664 Pulled By: goldsborough fbshipit-source-id: e03afaa646221546e7f93bb8dfe3558e384a5847	2018-09-20 20:39:34 -07:00
Thomas Viehmann	068eac255b	Jit fuse clamp (#11574 ) Summary: This patch adds fused forward and backward for clamp to the jit. This is one item of #11118 . If it's OK, I'd be happy to also add some more of #11118 . The patch depends on #11150 , which I merged into master as a base. I'll rebase it when that or #10981 is merged. This is my first serious jit patch; thank you, ngimel and the others, for your guidance. All errors are my own. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11574 Differential Revision: D9943090 Pulled By: apaszke fbshipit-source-id: c40954b8c28c374baab8d3bd89acc9250580dc67	2018-09-20 14:43:10 -07:00
Hong Xu	83740eae4a	Avoid using PyThreadState.frame as it is not a public member. (#11855 ) Summary: The doc of PyThreadState [1] emphasizes that interp is its only public member. Use PyEval_GetFrame() instead. [1] https://docs.python.org/3/c-api/init.html#c.PyThreadState Pull Request resolved: https://github.com/pytorch/pytorch/pull/11855 Differential Revision: D9954430 Pulled By: ezyang fbshipit-source-id: 92da6781e45e2bcb5e3a37b162fa40e49d823215	2018-09-19 20:58:37 -07:00
David Riazati	1091c5e59f	Throw error on indexing a 0 dim tensor (#11679 ) Summary: Following through on warning that indexing 0-dim tensor would be an error in PyTorch 0.5 and to use `item()` instead Pull Request resolved: https://github.com/pytorch/pytorch/pull/11679 Reviewed By: soumith Differential Revision: D9833570 Pulled By: driazati fbshipit-source-id: ac19f811fa7320d30b7f60cf66b596d6de684d86	2018-09-19 18:10:03 -07:00
Tongzhou Wang	24e958a0a7	Move bernoulli into ATen (#10273 ) Summary:
+ https://github.com/pytorch/pytorch/issues/10236 : torch.bernoulli's out kwarg is broken; fixed by moving `bernoulli_out` to ATen
+ https://github.com/pytorch/pytorch/issues/9917 : BUG torch.bernoulli(p.expand(shape)) is broken; fixed by moving all `bernoulli` ops in ATen to use the modern apply utils methods
+ https://github.com/pytorch/pytorch/issues/10357 : torch.bernoulli inconsistent gpu/cpu results; fixed by adding CUDA asserts

In order to use `curand_uniform4`, I made some changes to `CUDAApplyUtils.cuh`. Specifically, I introduced an optional template parameter `int step` to the `CUDA_tensor_applyN` methods, representing that we want to process `step` values at a time for each of the `N` tensors. The calling convention for `step = 1` (default) isn't changed. But if `step > 1`, the given lambda `op` must take in `int n` as its first argument, representing the number of valid values, because there may not be a full `step` values at the boundary. E.g., here is what the `bernoulli(self, p_tensor)` call looks like:

```cpp
// The template argument `4` below indicates that we want to operate on four
// elements at a time. See NOTE [ CUDA_tensor_applyN helpers ] for details.
at::cuda::CUDA_tensor_apply2<scalar_t, prob_t, 4>(
    ret, p,
    [seeds] __device__(
        int n, scalar_t& v1, scalar_t& v2, scalar_t& v3, scalar_t& v4,
        const prob_t& p1, const prob_t& p2, const prob_t& p3, const prob_t& p4) {
      curandStatePhilox4_32_10_t state;
      curand_init(
          seeds.first,
          blockIdx.x * blockDim.x + threadIdx.x,
          seeds.second,
          &state);
      float4 rand = curand_uniform4(&state);
      // No `break`s: with n valid values, control enters at `case n` and
      // falls through to `case 1`, so exactly the first n outputs are written.
      switch (n) {
        case 4: {
          assert(0 <= p4 && p4 <= 1);
          v4 = static_cast<scalar_t>(rand.w <= p4);
        }
        case 3: {
          assert(0 <= p3 && p3 <= 1);
          v3 = static_cast<scalar_t>(rand.z <= p3);
        }
        case 2: {
          assert(0 <= p2 && p2 <= 1);
          v2 = static_cast<scalar_t>(rand.y <= p2);
        }
        case 1: {
          assert(0 <= p1 && p1 <= 1);
          v1 = static_cast<scalar_t>(rand.x <= p1);
        }
      }
    }
);
```

Benchmarking on `torch.rand(200, 300, 400)` 20 times, each time with 20 loops:

post patch
```
➜ ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x) 6.841588497161865 +- 0.05413117632269859
torch.bernoulli(xc) 0.05963418632745743 +- 0.0008014909108169377
x.bernoulli_() 0.4024486541748047 +- 0.0021550932433456182
xc.bernoulli_() 0.02167394384741783 +- 2.3818030967959203e-05
```

pre-patch
```
➜ ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x) 12.394511222839355 +- 0.0966421514749527
torch.bernoulli(xc) 0.08970972150564194 +- 0.0038722590543329716
x.bernoulli_() 1.654480218887329 +- 0.02364428900182247
xc.bernoulli_() 0.058352887630462646 +- 0.003094920190051198
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10273 Differential Revision: D9831294 Pulled By: SsnL fbshipit-source-id: 65e0655a36b90d5278b675d35cb5327751604088	2018-09-19 16:45:47 -07:00
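The `step` calling convention described in the commit above can be illustrated on the host with plain C++. This is only a sketch under stated assumptions: `apply2_step4` and `threshold_demo` are hypothetical names, not part of ATen, and a fixed 0.5 threshold stands in for the random draw so the result is deterministic. It shows how an `op` that receives `n` can handle a final group with fewer than four valid elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side analogue of a stepped apply helper (NOT the ATen API).
// Walks two equally sized vectors four elements at a time and calls
// op(n, out0..out3, in0..in3), where n is the number of valid elements.
template <typename T, typename P, typename Op>
void apply2_step4(std::vector<T>& out, const std::vector<P>& in, Op op) {
  assert(out.size() == in.size());
  T unused_out{};       // bound to output slots past the end of the data
  const P unused_in{};  // bound to input slots past the end of the data
  for (std::size_t i = 0; i < out.size(); i += 4) {
    const std::size_t n = std::min<std::size_t>(4, out.size() - i);
    auto o = [&](std::size_t k) -> T& { return k < n ? out[i + k] : unused_out; };
    auto p = [&](std::size_t k) -> const P& { return k < n ? in[i + k] : unused_in; };
    op(static_cast<int>(n), o(0), o(1), o(2), o(3), p(0), p(1), p(2), p(3));
  }
}

// Example op: write 1 where the probability is at least 0.5, else 0.
// (Assumption: a deterministic stand-in for comparing against a random value.)
std::vector<int> threshold_demo(const std::vector<float>& probs) {
  std::vector<int> out(probs.size(), -1);  // -1 marks "never written"
  apply2_step4(out, probs,
      [](int n, int& v1, int& v2, int& v3, int& v4,
         const float& p1, const float& p2, const float& p3, const float& p4) {
        switch (n) {
          case 4: v4 = p4 >= 0.5f;
          // fall through
          case 3: v3 = p3 >= 0.5f;
          // fall through
          case 2: v2 = p2 >= 0.5f;
          // fall through
          case 1: v1 = p1 >= 0.5f;
        }
      });
  return out;
}
```

With six inputs the helper calls `op` twice, first with `n = 4` and then with `n = 2`, and the fall-through `switch` writes only the valid slots in the second call.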
Adam Paszke	8c3a94eaf2	Improve autograd profiler performance (#11773 ) Summary: To illustrate the benefits of this commit, I'll use the time/iter I got from one of the JIT benchmarks on my machine.

| Run | Time |
|---|---|
| No profiler | 45ms |
| With profiler | 56ms |
| Use `clock_gettime` instead of `std::chrono` | 48ms |
| Touch all pages on block allocation | 48ms (less jitter) |
| Use `const char*` instead of `std::string` | 47ms (even less jitter) |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11773 Differential Revision: D9886858 Pulled By: apaszke fbshipit-source-id: 58f926f09e95df0b11ec687763a72b06b66991d0	2018-09-19 09:25:43 -07:00
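Two of the techniques named in the profiler commit above, reading the clock with `clock_gettime` and storing event names as `const char*`, might look roughly like the following. This is a minimal sketch, not the profiler's actual code: `now_ns`, `Event`, and `mark` are hypothetical names, and the choice of `CLOCK_MONOTONIC` is an assumption, since the commit message does not say which clock is used. It requires a POSIX system.

```cpp
#include <cstdint>
#include <time.h>  // POSIX clock_gettime

// Hypothetical helper: monotonic timestamp in nanoseconds.
// CLOCK_MONOTONIC is an assumption; the commit does not name the clock.
inline std::int64_t now_ns() {
  timespec ts{};
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Hypothetical event record. The name is a borrowed pointer (for example to a
// string literal), so recording an event neither allocates nor copies a string.
struct Event {
  const char* name;
  std::int64_t start_ns;
};

// Record an event with the current timestamp. The caller must keep `name`
// alive for as long as the Event is used.
inline Event mark(const char* name) {
  return Event{name, now_ns()};
}
```

The trade-off with `const char*` is lifetime: it is only safe when names outlive the recorded events, which holds for string literals but not for temporaries.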
Peter Goldsborough	b3a2665e0f	Code-reorg to have TORCH_ARG in its own header (#11787 ) Summary: I noticed I was including `torch/nn/pimpl.h` in the optimizer library just to access `TORCH_ARG`, even though that file includes a lot of irrelevant code. Let's save some re-compilation time by refactoring this macro into a separate logical file. #small-wins ebetica ezyang apaszke Pull Request resolved: https://github.com/pytorch/pytorch/pull/11787 Differential Revision: D9924447 Pulled By: goldsborough fbshipit-source-id: 5acd4ba559ffb2a3e97277e74bb731d7b1074dcf	2018-09-19 09:25:41 -07:00
Natalia Gimelshein	8601b33c07	fix half grad assignment (#11781 ) Summary: currently grad assignment for half type fails with a misleading RuntimeError ``` RuntimeError: torch.cuda.sparse.HalfTensor is not enabled. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/11781 Differential Revision: D9931884 Pulled By: soumith fbshipit-source-id: 03e946c3833d1339a99585c9aa2dbb670f8bf459	2018-09-18 23:00:49 -07:00
David Riazati	a79f5d77ad	Add pretty printer for JIT IR (#10319 ) Summary: Adds some pretty-printing capability to the IR graph to make debugging easier/more human readable, see `torch/csrc/jit/test_jit.cpp:925` and onwards for example outputs. Results aren't perfect yet but it's a start. Pull Request resolved: https://github.com/pytorch/pytorch/pull/10319 Reviewed By: zdevito Differential Revision: D9558402 Pulled By: driazati fbshipit-source-id: 1d61c02818daa4c9bdca36d1477d1734cfc7d043	2018-09-18 17:39:44 -07:00
sven	e585f2fb48	Polish CPP docs, Minor Python Docs Fixes (#11722 ) Differential Revision: D9919120 Pulled By: goldsborough fbshipit-source-id: bf14cbe4ab79524495957cb749828046af864aab	2018-09-18 14:55:57 -07:00

1753 Commits