pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Phúc Lê	9b272affde	Add base support to torch.logspace, default base=10 (#19542 ) Summary: Add base support for torch.logspace. See #19220 for details. SsnL can you feedback? Thanks a lot. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19542 Differential Revision: D15028484 Pulled By: soumith fbshipit-source-id: fe5a58a203b279103abbc192c754c25d5031498e	2019-04-23 15:06:34 -07:00
SsnL	dce3d74dfb	add torch.cuda.synchronize(device=None) (#19573 ) Summary: fixes https://github.com/pytorch/pytorch/issues/19509 Pull Request resolved: https://github.com/pytorch/pytorch/pull/19573 Differential Revision: D15045730 Pulled By: ezyang fbshipit-source-id: 732721b4b360fc4348ca7c87d4cd1386e7651bdd	2019-04-23 08:40:38 -07:00
vishwakftw	c30224ad21	Rename potri to cholesky_inverse (#19498 ) Summary: Changelog: - Rename `potri` to `cholesky_inverse` to remain consistent with names of `cholesky` methods (`cholesky`, `cholesky_solve`) - Fix all callsites - Rename all tests - Create a tentative alias for `cholesky_inverse` under the name `potri` and add a deprecation warning to not promote usage Pull Request resolved: https://github.com/pytorch/pytorch/pull/19498 Differential Revision: D15029901 Pulled By: ezyang fbshipit-source-id: 2074286dc93d8744cdc9a45d54644fe57df3a57a	2019-04-22 08:18:39 -07:00
Tongzhou Wang	973d51079b	Add device-specific cuFFT plan caches (#19300 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/19224 Pull Request resolved: https://github.com/pytorch/pytorch/pull/19300 Differential Revision: D14986967 Pulled By: soumith fbshipit-source-id: 8c31237db50d6924bba1472434c10326610d9255	2019-04-18 06:39:35 -07:00
Richard Zou	eaa14f5f59	Error out on in-place binops on tensors with internal overlap (#19317 ) Summary: This adds checks for `mul_`, `add_`, `sub_`, `div_`, the most common binops. See #17935 for more details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19317 Differential Revision: D14972399 Pulled By: zou3519 fbshipit-source-id: b9de331dbdb2544ee859ded725a5b5659bfd11d2	2019-04-17 13:02:07 -07:00
J M Dieterich	31686805f2	Enable unit tests for ROCm 2.3 (#19307 ) Summary: Unit tests that hang on clock64() calls are now fixed. test_gamma_gpu_sample is now fixed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19307 Differential Revision: D14953420 Pulled By: bddppq fbshipit-source-id: efe807b54e047578415eb1b1e03f8ad44ea27c13	2019-04-16 10:58:27 -07:00
Sam Gross	7caad0ed33	Free all blocks with outstanding events on OOM-retry (#19222 ) Summary: The caching allocator tries to free all blocks on an out-of-memory error. Previously, it did not free blocks that still had outstanding stream uses. This change synchronizes on the outstanding events and frees those blocks. See #19219 Pull Request resolved: https://github.com/pytorch/pytorch/pull/19222 Differential Revision: D14925071 Pulled By: colesbury fbshipit-source-id: a2e9fe957ec11b00ea8e6c0468436c519667c558	2019-04-15 11:29:27 -07:00
Johannes M Dieterich	d8669a2c7e	Enable working ROCm tests (#19169 ) Summary: Enable multi-GPU tests that work with ROCm 2.2. Have been run three times on CI to ensure stability. While there, remove skipIfRocm annotations for tests that depend on MAGMA. They still skip but now for the correct reason (no MAGMA) to improve our diagnostics. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19169 Differential Revision: D14924812 Pulled By: bddppq fbshipit-source-id: 8b88f58bba58a08ddcd439e899a0abc6198fef64	2019-04-12 21:51:10 -07:00
Vishwak Srinivasan	487388d8ad	Rename btrisolve to lu_solve (#18726 ) Summary: Changelog: - Rename `btrisolve` to `lu_solve` to remain consistent with names of solve methods (`cholesky_solve`, `triangular_solve`, `solve`) - Fix all callsites - Rename all tests - Create a tentative alias for `lu_solve` under the name `btrisolve` and add a deprecation warning to not promote usage Pull Request resolved: https://github.com/pytorch/pytorch/pull/18726 Differential Revision: D14726237 Pulled By: zou3519 fbshipit-source-id: bf25f6c79062183a4153015e0ec7ebab2c8b986b	2019-04-09 15:21:24 -07:00
J M Dieterich	e45e3634d6	add launch bounds, enable more tests (#18909 ) Summary: Add launch bounds annotations for ROCm arising from maxThreadsPerBlock and apply threads use. Enable tests that now work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18909 Differential Revision: D14801490 Pulled By: ezyang fbshipit-source-id: b81c97fc783a2627bc7e31b32036a364cfe40cc7	2019-04-05 10:17:15 -07:00
Roy Li	f5741eb855	Store ScalarType and Backend instead of Type in TensorIterator Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17601 Reviewed By: ezyang Differential Revision: D14274754 fbshipit-source-id: b08880ae586b6ae57d4c0bbeb203796d087926c4	2019-04-04 02:24:16 -07:00
vishwakftw	baac5489a8	Expose alias multinomial methods to ATen (#17904 ) Summary: This PR exposes the multinomialAliasSetup and multinomialAliasDraw methods. cc: neerajprad Pull Request resolved: https://github.com/pytorch/pytorch/pull/17904 Differential Revision: D14700205 Pulled By: ezyang fbshipit-source-id: 16462fb1f1ef1d560fd586632ea356b23e966ee3	2019-04-02 07:56:41 -07:00
Edward Yang	173f224570	Turn on F401: Unused import warning. (#18598 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598 ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a Stack from [ghstack](https://github.com/ezyang/ghstack): * #18598 Turn on F401: Unused import warning. This was requested by someone at Facebook; this lint is turned on for Facebook by default. "Sure, why not." I had to noqa a number of imports in __init__. Hypothetically we're supposed to use __all__ in this case, but I was too lazy to fix it. Left for future work. Be careful! flake8-2 and flake8-3 behave differently with respect to import resolution for # type: comments. flake8-3 will report an import unused; flake8-2 will not. For now, I just noqa'd all these sites. All the changes were done by hand. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D14687478 fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3	2019-03-30 09:01:17 -07:00
Vishwak Srinivasan	e73be58ff7	Rename `btriunpack` to `lu_unpack` (#18529 ) Summary: Changelog: - Renames `btriunpack` to `lu_unpack` to remain consistent with the `lu` function interface. - Rename all relevant tests, fix callsites - Create a tentative alias for `lu_unpack` under the name `btriunpack` and add a deprecation warning to not promote usage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18529 Differential Revision: D14683161 Pulled By: soumith fbshipit-source-id: 994287eaa15c50fd74c2f1c7646edfc61e8099b1	2019-03-29 13:01:30 -07:00
Vishwak Srinivasan	d859031ebf	Rename `btrifact*` to `lu` (#18435 ) Summary: Changelog: - Renames `btrifact` and `btrifact_with_info` to `lu`to remain consistent with other factorization methods (`qr` and `svd`). - Now, we will only have one function and methods named `lu`, which performs `lu` decomposition. This function takes a get_infos kwarg, which when set to True includes a infos tensor in the tuple. - Rename all tests, fix callsites - Create a tentative alias for `lu` under the name `btrifact` and `btrifact_with_info`, and add a deprecation warning to not promote usage. - Add the single batch version for `lu` so that users don't have to unsqueeze and squeeze for a single square matrix (see changes in determinant computation in `LinearAlgebra.cpp`) Pull Request resolved: https://github.com/pytorch/pytorch/pull/18435 Differential Revision: D14680352 Pulled By: soumith fbshipit-source-id: af58dfc11fa53d9e8e0318c720beaf5502978cd8	2019-03-29 00:34:30 -07:00
jithunnair-amd	fdedc62c26	enable more unit tests (#18537 ) Summary: Enable unit tests working with ROCm 2.3. In particular, these are unit tests where we skipped for double data types previously and some tests for multi-GPU setups. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18537 Differential Revision: D14651822 Pulled By: ezyang fbshipit-source-id: 7dd575504ebe235a91489866c91000e9754b1235	2019-03-27 14:27:23 -07:00
Tongzhou Wang	5292685d2f	Improve numerical precision of (s)logdet (#18449 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/18448 and https://github.com/pytorch/pytorch/issues/18450 Pull Request resolved: https://github.com/pytorch/pytorch/pull/18449 Differential Revision: D14611638 Pulled By: soumith fbshipit-source-id: 4f1f27ab5316a92d2783e734169f599afed743cf	2019-03-26 15:32:14 -07:00
vishwakftw	291746f110	Rename trtrs to triangular_solve (#18213 ) Summary: Changelog: - Renames `trtrs` to `triangular_solve` to remain consistent with `cholesky_solve` and `solve`. - Rename all tests, fix callsites - Create a tentative alias for `triangular_solve` under the name `trtrs`, and add a deprecation warning to not promote usage. - Move `isnan` to _torch_docs.py - Remove unnecessary imports Pull Request resolved: https://github.com/pytorch/pytorch/pull/18213 Differential Revision: D14566902 Pulled By: ezyang fbshipit-source-id: 544f57c29477df391bacd5de700bed1add456d3f	2019-03-21 14:27:21 -07:00
Vishwak Srinivasan	a519217ee7	Add batched version of trtrs (#18025 ) Summary: - Remove single batch TH/THC implementations - Remove `_batch_trtrs_lower` from `multivariate_normal` - Add tests for batched behavior - Modify trtrs_backward to accommodate for batched case - Modify docs In a future PR, this will be renamed to `triangular_solve`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18025 Differential Revision: D14523004 Pulled By: ifedan fbshipit-source-id: 11c6a967d107f969b60e5a5c73ce6bb8099ebbe1	2019-03-20 11:11:32 -07:00
Vishwak Srinivasan	421b508d55	Rename gesv to solve (#18060 ) Summary: Changelog: - Renames `gesv` to `solve` to remain consistent with `cholesky_solve`. - Rename all tests, fix callsites - Create a tentative alias for `solve` under the name `gesv`, and add a deprecated warning to not promote usage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18060 Differential Revision: D14503117 Pulled By: zou3519 fbshipit-source-id: 99c16d94e5970a19d7584b5915f051c030d49ff5	2019-03-18 16:04:24 -07:00
Richard Zou	3c977fb7ce	Error out on in-place (unary) ops on tensors that have internal overlap (#17927 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17927 ghimport-source-id: 626d321e430b6b5c0ea3aa1eb9df8c1e2d058bf8 Stack: * #17926 Implement at::has_internal_overlap helper function * #17927 Error out on in-place (unary) ops on tensors that have internal overlap On the way to #17935. Works for CPU and CUDA on the following ops: - abs_, acos_, asin_, atan_, ceil_, cos_, erf_, erfc_, exp_, expm1_ - floor_, log_, log10_, log1p_, log2_, round_, rsqrt_, - sin_, sqrt_, tan_, tanh_, trunc_ This PR adds a check to see if the out/result tensor has internal overlap. If it does, then we error out because the result may be incorrect. This is overly conservative; there are some cases where if the result is the same as the input, the inplace operation is OK (such as floor_, round_, and trunc_). However, the current code isn't organized in such a way that this is easy to check, so enabling those will come in the future. Reviewed By: ezyang Differential Revision: D14438871 fbshipit-source-id: 15e12bf1fdb2ab7f74bb806e22bc74840bd6abd1	2019-03-15 07:50:19 -07:00
J M Dieterich	1ba1ca0acb	Update to ROCm2.2 (#18007 ) Summary: ROCm 2.2 was released today, if we respin the CI docker images with the attached, PyTorch/Caffe2 will support ROCm 2.2 Changes necessary: * for the Ubuntu target, HIP PR 934 needs to be applied to fix the forceinline definition. ROCm 2.3 will contain this. * two unit tests proof flaky on different platforms, disable them defensively. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18007 Differential Revision: D14473903 Pulled By: bddppq fbshipit-source-id: b1939f11d1c765a3bf71bb244b15f6ceb0e816d3	2019-03-14 18:47:22 -07:00
vaeksare	40a3e14ade	Disable btri tests on Windows if MAGMA is not found (#17989 ) Summary: Fixes #17988 Pull Request resolved: https://github.com/pytorch/pytorch/pull/17989 Reviewed By: ezyang Differential Revision: D14454571 Pulled By: soumith fbshipit-source-id: fc39a807a597d3574f4ca4e22cea12194e4693c0	2019-03-14 07:22:55 -07:00
Thomas Viehmann	aba9051a65	kthvalue consistency with sort in the presence of NaN (#17824 ) Summary: This PR causes kthvalue to be consistent with sort (i.e. treat NaN as larger than any number), so that `a.kthvalue(n) == a.sort()[n - 1]`. One drawback is that median with a NaN argument does not return NaN, which is a deviation from NumPy. Thank you, ngimel, for raising this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/17824 Differential Revision: D14410092 Pulled By: ezyang fbshipit-source-id: bdec2d8272dc4c65bcf2f9b8995e237774c44c02	2019-03-12 08:49:19 -07:00
vishwakftw	9d70e199f4	Move lerp to ATen, add functionality for tensor weights (#17348 ) Summary: Changelog: - Remove TH/THC bindings - Add tensor weights for `lerp` - Modify derivatives appropriately Pull Request resolved: https://github.com/pytorch/pytorch/pull/17348 Differential Revision: D14355845 Pulled By: soumith fbshipit-source-id: eaede4c09ee589d77ba6cf52583510ea8e3a2fcf	2019-03-07 14:04:58 -08:00
jwu	8ec7357312	fix different round behavior on CPU and GPU #16498 (#17443 ) Summary: xxtemp, colesbury, bhushan23, zou3519, convert gpu round behavior to half-to-even, consistent with torch cpu version and numpy. You feedback are welcomed. See #16498 Pull Request resolved: https://github.com/pytorch/pytorch/pull/17443 Differential Revision: D14261786 Pulled By: VitalyFedyunin fbshipit-source-id: 98156436b545d72769831a89e2775d43ad913ebc	2019-03-06 19:40:10 -08:00
Shen Li	1154506533	Always synchronize src and dst streams when copying tensors (#16966 ) Summary: fixes #15568 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16966 Differential Revision: D14213144 Pulled By: mrshenli fbshipit-source-id: 2fcf5e07895fde80b4aee72e2736b0def876d21f	2019-02-27 14:57:56 -08:00
Johannes M Dieterich	76828647c1	Enable tests working on ROCm 2.1 dual gfx906 Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17473 Reviewed By: bddppq Differential Revision: D14210243 Pulled By: ezyang fbshipit-source-id: 519032a1e73c13ecb260ea93102dc8efb645e070	2019-02-26 20:41:16 -08:00
Shen Li	b527055fcf	Restore current streams on dst device after switching streams (#17439 ) Summary: When switching back to `d0` from a stream on a different device `d1`, we need to restore the current streams on both `d0` and `d1`. The current implementation only does that for `d0`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/17439 Differential Revision: D14208919 Pulled By: mrshenli fbshipit-source-id: 89f2565b9977206256efbec42adbd789329ccad8	2019-02-25 12:06:41 -08:00
surgan12	fad9eda7fb	Optional arg fixes (#17222 ) Summary: fixes #17210. cc : ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/17222 Differential Revision: D14130833 Pulled By: soumith fbshipit-source-id: 19ff6020c47208e3436ae28cd16110a0f435b25e	2019-02-19 04:39:18 -08:00
jiej	b5193b6a81	Second PR to restore reverted commit (#16224 ) (#17040 ) Summary: update: 1. global_reduce check for should_block_y_reduce first. This avoids the enabling global_reduce without block_y_reduce. Leading to accessing shared memory during global reduce without allocation. 2. updating block_y_reduce heuristics. Improves perf on tiny tensors 3. adding test case covering old cases where illegal memory access might occur TensorIterator cuda launch configs update (#16224) Update launch configs for TensorIterator gpu_reduce_kernel. Enable flexible block dimension to improve efficiency for reduction cases with small fast dimension. Previously TensorIterator launches blocks with fixed 32x16 threads. For cases like: import torch torch.randn(2**20, 4, device='cuda').sum(0) The fixed launch config does handle coalesced memory access efficiently. Updated launch configure enables flexible block dimension. Combining with improved reduction scheme (using flexible vertical / horizontal reduction instead of limited warp / block reduction in the old code), it ensures optimal memory access pattern even with reduction on dimension with small stride. Possible future improvements: 1. Precise dynamic shared memory allocation. 2. Using warp shuffle for vertical (block_y) reduction. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16224 Pull Request resolved: https://github.com/pytorch/pytorch/pull/17040 Differential Revision: D14078295 Pulled By: umanwizard fbshipit-source-id: ecc55054a5a4035e731f0196d633412225c3b06c	2019-02-14 15:23:01 -08:00
Johannes M Dieterich	3e1e5d5a8b	enable unit tests in test_cuda that now pass with ROCm 2.1 Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17012 Differential Revision: D14059761 Pulled By: bddppq fbshipit-source-id: 8309c3ffe1efed42b5db69fdec26427413c3f224	2019-02-12 17:28:46 -08:00
vishwakftw	0d95028bee	Dispatch the correct legacy function for geqrf_out and ormqr_out (#16964 ) Summary: This fixes the segfault. Changelog: - Modify the function calls in LegacyDefinitions for `geqrf_out` and `ormqr_out` Pull Request resolved: https://github.com/pytorch/pytorch/pull/16964 Differential Revision: D14025985 Pulled By: gchanan fbshipit-source-id: aa50e2c1694cbf3642273ee14b09ba12625c7d33	2019-02-12 13:48:51 -08:00
Johannes M Dieterich	23e1c55cc0	enable unit tests working on ROCm 2.1 (#16871 ) Summary: This is the first round of enabling unit tests that work on ROCm 2.1 in my tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16871 Differential Revision: D13997662 Pulled By: bddppq fbshipit-source-id: d909a3f7dd5fc8f85f126bf0613751c8e4ef949f	2019-02-09 00:30:50 -08:00
vishwakftw	6d86bc7c3f	Fix issue with scalars and __rpow__ (#16687 ) Summary: Changelog: - Modify __rpow__ function in tensor.py to adapt to scalars Pull Request resolved: https://github.com/pytorch/pytorch/pull/16687 Differential Revision: D13936720 Pulled By: soumith fbshipit-source-id: b0c8727968b04efbc6e7461807c812d962f03370	2019-02-02 18:55:51 -08:00
Jacie Fan	a7796bc24d	CUDA histogram implementation Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15842 Reviewed By: zou3519 Differential Revision: D13868982 Pulled By: jaciefan fbshipit-source-id: bce81dc121c4538d204047506f8f14d0b4d8f905	2019-01-30 11:36:20 -08:00
Shen Li	7ce634ebc2	Relax lower bound for nogil timing test to avoid false alarm (#16259 ) Summary: fixes #16250, #16271 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16259 Differential Revision: D13784505 Pulled By: mrshenli fbshipit-source-id: 0b7ad98cd3c018b9907d70158de3abc3c4cb57ef	2019-01-24 17:16:02 -08:00
Shen Li	2235fb256e	Add default_stream() and enhance current_stream() (#16200 ) Summary: Closes #16156 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16200 Differential Revision: D13747455 Pulled By: mrshenli fbshipit-source-id: 00c0d5f341c3ac7a757bdb4631a17e11fbc6d3ec	2019-01-22 14:35:19 -08:00
Shen Li	1c058de9ac	Release GIL when synchronize or wait (#16182 ) Summary: address the second future work item in #15937 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16182 Differential Revision: D13744972 Pulled By: mrshenli fbshipit-source-id: e9812e3fd4a5623e99b639d9f334bfc2d1827d92	2019-01-22 13:29:07 -08:00
Shen Li	898329c3f9	Unify device() return type in Stream, Event, and Tensor (#16150 ) Summary: Addresses one future work item in #15937 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16150 Differential Revision: D13732299 Pulled By: mrshenli fbshipit-source-id: 4d0b35df573a3bf92dea6e2e7eb42fe8bac77b18	2019-01-19 23:01:31 -08:00
Shen Li	292edfb087	Change current device in stream context manager if necessary (#16128 ) Summary: Fixes #16019 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16128 Differential Revision: D13721850 Pulled By: mrshenli fbshipit-source-id: 422c6c0b97c1cd46e127e265b532cb8c74a3aac5	2019-01-18 12:39:51 -08:00
Shen Li	24f4d3987e	Move all Stream and Event Python implementation to C++ (#15937 ) Summary: 1. Added `torch/csrc/cuda/Event.h` and `torch/csrc/cuda/Event.cpp` to bind Python Event class to C++ implementation. 2. Move all CUDA runtime invocations from `torch/cuda/streams.py` to C++ 3. Added tests to cover Stream and Event APIs. ~(event IPC handle tests is introduced in #15974)~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/15937 Differential Revision: D13649001 Pulled By: mrshenli fbshipit-source-id: 84ca58f35f6ba679a4ba33150ceba678d760d240	2019-01-17 07:29:22 -08:00
jiej	7c56db73d5	Moving torch.norm to ATen using TensorIterator (#15414 ) Summary: Adding supports for torch.nomr: i. multi dimensions for dim ii. dtype that specifies math/output tensor type Pull Request resolved: https://github.com/pytorch/pytorch/pull/15414 Differential Revision: D13702022 Pulled By: ezyang fbshipit-source-id: da2676f2b6aff988889b1539d0de8ecd4946823a	2019-01-16 22:15:25 -08:00
Thomas Viehmann	d33e7d1236	multinomial: fix detection of zero probability (#16075 ) Summary: The cumsum over the probabilities can be not monotonically non-decreasing. Thus it is hard to detect zero probability classes using just the cumsum. This changes the binary search postprocessing to use the (non-cumulated) distribution instead. Thank you, jcjohnson, for the bug report with reproducing case. Fixes: #13867 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16075 Differential Revision: D13695565 Pulled By: soumith fbshipit-source-id: 02c4d6f868f0050c1ae7d333f4317c5610e49cd9	2019-01-16 12:50:49 -08:00
Brennan Vincent	fb68d813be	Fix logic errors when accumulating reductions in output (CUDA) (#16023 ) Summary: The correct logic is as follows: * If there is an earlier split, we need to combine with its result * If there is not a later split, we need to project before saving into the output. This should partially f i x #15837 . For example: ``` In [7]: a=torch.ones([1838860800], dtype=torch.float, device="cuda:1") In [8]: a.mean() Out[8]: tensor(1., device='cuda:1') ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/16023 Differential Revision: D13678449 Pulled By: umanwizard fbshipit-source-id: ab5078484c88e96bb30121b5cf24a0e8b0a8c2f8	2019-01-15 19:57:57 -08:00
SsnL	300dcc3b96	Add cuda.reset_max_memory_* (#15985 ) Summary: Addresses #15968 Pull Request resolved: https://github.com/pytorch/pytorch/pull/15985 Differential Revision: D13649916 Pulled By: soumith fbshipit-source-id: a207aea5709a79dba7a6fc541d0a70103f49efff	2019-01-14 07:31:51 -08:00
vishwakftw	b4c3268b23	Batched upper triangular, lower triangular (#15257 ) Summary: Changelog: - Implements `triu` and `tril` for batches of 2D tensors. - Remove TH/THC binding for `tril` - Fix CUDA implementation - Update docstrings for tril and triu. - Remove mask-based `triu` and `tril` in cholesky forward and backward. - Remove batched tril in torch.distributions.utils Pull Request resolved: https://github.com/pytorch/pytorch/pull/15257 Differential Revision: D13613888 Pulled By: mrshenli fbshipit-source-id: 0949a05b9b8e974c1acfaf02a6284848ec5cc1c4	2019-01-09 19:46:39 -08:00
Shen Li	7b9f794580	Wrap C10 CUDAStream instead of cudaStream_t in THCPStream Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15833 Differential Revision: D13608337 Pulled By: mrshenli fbshipit-source-id: 4c66ef89fad0dc14a11ddb69da92907797cd2828	2019-01-09 15:12:48 -08:00
Shen Li	1e9a6d7192	A quick fix for Stream operation errors on non-current device (#15689 ) Summary: see #15682 This is a quick fix by implementing the simpler solution as suggested by colesbury. As benchmark result shows, it slows down `Stream.query()` by ~20%, I would be happy to further pursue a more complex solution by implementing this in C++/ATen. But I would still vote for merge this quick fix first just to get rid of the bug sooner. ~Test TBA~ Added FYI jeffreyksmithjr now ```python In [1]: def f(): ...: d0 = torch.device('cuda:0') ...: d1 = torch.device('cuda:1') ...: with torch.cuda.device(d0): ...: s0 = torch.cuda.current_stream() ...: with torch.cuda.device(d1): ...: s1 = torch.cuda.current_stream() ...: s0.query() ...: s1.query() In [4]: %timeit f() 38.1 µs ± 4.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) In [5]: %timeit f() 37.6 µs ± 2.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) ``` before ```python In [4]: %timeit f() 28.5 µs ± 1.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) In [5]: %timeit f() 35.3 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/15689 Differential Revision: D13571697 Pulled By: mrshenli fbshipit-source-id: 4fe697f91248c6419136d37bb5b7147e612e2f4c	2019-01-03 15:14:58 -08:00
Natalia Gimelshein	e2549cbc01	initialize with ident value in global reduction (#15653 ) Summary: Fixes #15647. cc colesbury. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15653 Differential Revision: D13571132 Pulled By: soumith fbshipit-source-id: 8f25943c974b3b931f4528e0e0a370bc095dab51	2019-01-02 19:52:57 -08:00

1 2 3 4 5 ...

262 Commits