pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Yaxun (Sam) Liu	13a684d50b	Fix test TestCuda.test_streams_multi_gpu_query (#23912 ) Summary: This is a similar issue as TestCuda.test_events_wait. PyTorch test sets a policy() method to assertLeaksNoCudaTensors. Whenever a test is run, assertLeaksNoCudaTensors is called, which in turn calls CudaMemoryLeakCheck, which in turn calls initialize_cuda_context_rng, where it executes torch.randn on each device, where a kernel is launched on each device. Since the kernel may not finish on device 0, the first assertion self.assertTrue(s0.query()) fails. The fix is to insert torch.cuda.synchronize(d0) torch.cuda.synchronize(d1) at the beginning of the test so that previously launched kernels finish before the real test begins. Pull Request resolved: https://github.com/pytorch/pytorch/pull/23912 Differential Revision: D16688599 Pulled By: ezyang fbshipit-source-id: 3de2b555e99f5bbd05727835b9d7c93a026a0519	2019-08-07 07:44:30 -07:00
Hong Xu	be7fe1ccb9	Add tests to ensure that both abs(0.0) and abs(-0.0) lead to 0.0 (#23701 ) Summary: As pointed out by colesbury in https://github.com/pytorch/pytorch/pull/23579#discussion_r309798987 Pull Request resolved: https://github.com/pytorch/pytorch/pull/23701 Differential Revision: D16623781 Pulled By: mrshenli fbshipit-source-id: f48a29499128b08d2ac8bc9e466f2326112ead94	2019-08-05 07:50:06 -07:00
vishwakftw	5d130e4232	Allowing batching for det/logdet/slogdet operations (#22909 ) Summary: Changelog: - Add batching for det / logdet / slogdet operations - Update derivative computation to support batched inputs (and consequently batched outputs) - Update docs Pull Request resolved: https://github.com/pytorch/pytorch/pull/22909 Test Plan: - Add a `test_det_logdet_slogdet_batched` method in `test_torch.py` to test `torch.det`, `torch.logdet` and `torch.slogdet` on batched inputs. This relies on the correctness of `torch.det` on single matrices (tested by `test_det_logdet_slogdet`). A port of this test is added to `test_cuda.py` - Add autograd tests for batched inputs Differential Revision: D16580988 Pulled By: ezyang fbshipit-source-id: b76c87212fbe621f42a847e3b809b5e60cfcdb7a	2019-07-31 10:01:32 -07:00
Tongzhou Wang	af638ad5d7	pin_memory should not copy on already pinned tensors (#23484 ) Summary: fixes https://github.com/pytorch/pytorch/issues/21076 Pull Request resolved: https://github.com/pytorch/pytorch/pull/23484 Differential Revision: D16546264 Pulled By: ezyang fbshipit-source-id: 8058e0bbc6336751f36b884d71234feef498a982	2019-07-30 21:16:23 -07:00
vishwakftw	b3a9a7a9b9	Rename gels to lstsq (#23460 ) Summary: Changelog: - Rename `gels` to `lstsq` - Fix all callsites - Rename all tests - Create a tentative alias for `lstsq` under the name `gels` and add a deprecation warning to not promote usage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/23460 Test Plan: - All tests should pass to confirm that the patch is correct Differential Revision: D16547834 Pulled By: colesbury fbshipit-source-id: b3bdb8f4c5d14c7716c3d9528e40324cc544e496	2019-07-30 09:56:04 -07:00
Yaxun (Sam) Liu	0c9979dd7d	Fix TestCuda.test_events_wait (#23520 ) Summary: PyTorch test sets a policy() method to assertLeaksNoCudaTensors. Whenever a test is run, assertLeaksNoCudaTensors is called, which in turn calls CudaMemoryLeakCheck, which in turn calls initialize_cuda_context_rng, where it executes torch.randn on each device, where a kernel is launched on each device. Since the kernel may not finish on device 1, the assertion self.assertTrue(s1.query()) fails. The fix is to insert torch.cuda.synchronize(d0) torch.cuda.synchronize(d1) at the beginning of the test so that previously launched kernels finish before the real test begins. Pull Request resolved: https://github.com/pytorch/pytorch/pull/23520 Differential Revision: D16547701 Pulled By: soumith fbshipit-source-id: 42ad369f909d534e15555493d08e9bb99dd64b6a	2019-07-29 13:09:41 -07:00
Hong Xu	236149edc5	Make randperm works properly on non-contiguous tensors. (#23043 ) Summary: Close https://github.com/pytorch/pytorch/issues/22710 Pull Request resolved: https://github.com/pytorch/pytorch/pull/23043 Differential Revision: D16446340 Pulled By: VitalyFedyunin fbshipit-source-id: 1760af310fee71b369e1aaaf96546277058611c9	2019-07-29 11:59:04 -07:00
Johannes M Dieterich	4cd726c7b3	Update ROCm CI to python3.6 (#23088 ) Summary: Rehash of https://github.com/pytorch/pytorch/issues/22322 . Given that python 2.7 will be EOL'd on Jan 1, 2020 and we have models depending on python3.5+, we'd like to update the ROCm CI across the board to python3.6. This PR adds the skip tests and some semantic changes for PyTorch. Added pattern match skip for anything but the ROCm CI compared to #223222 for the python find step in the PyTorch build. Pull Request resolved: https://github.com/pytorch/pytorch/pull/23088 Differential Revision: D16448261 Pulled By: bddppq fbshipit-source-id: 69ece1a213418d9abf1444c496dce1c190ee07c8	2019-07-23 23:07:45 -07:00
Vishwak Srinivasan	0ab19d66ee	Port lu_solve to ATen (#22379 ) Summary: Changelog: - Port TH implementation to ATen/native/BatchLinearAlgebra.cpp - Port THC implementation to ATen/native/cuda/BatchLinearAlgebra.cu - Remove TH/THC implementations - Update doc strings Pull Request resolved: https://github.com/pytorch/pytorch/pull/22379 Test Plan: - Added new tests in test_torch.py (port to test_cuda.py exists) Differential Revision: D16089645 Pulled By: zou3519 fbshipit-source-id: dc8561aadacacb23e80c375b4fec687df2b6bbc8	2019-07-23 19:11:35 -07:00
Junjie Bai	eb76b7a564	Revert D16199862: [pytorch][PR] [ROCm] Update ROCm CI to python3.6 Differential Revision: D16199862 Original commit changeset: 46ca6029a232 fbshipit-source-id: 2843b919f2655674e39dc764053621994061a12b	2019-07-17 14:26:56 -07:00
iotamudelta	031b406c38	Update ROCm CI to python3.6 (#22322 ) Summary: Given that python 2.7 will be EOL'd on Jan 1, 2020 and we have models depending on python3.5+, we'd like to update the ROCm CI across the board to python3.6. This PR adds the skip tests and some semantic changes for PyTorch. Open tasks/questions: * RoiAlignTest.CheckCPUGPUEqual fails in the Caffe2 unit tests. Is this something expects / can be skipped? * for testing, I've used update-alternatives on CentOS/Ubuntu to select python == python 3.6. Is this the preferred way? Pull Request resolved: https://github.com/pytorch/pytorch/pull/22322 Differential Revision: D16199862 Pulled By: ezyang fbshipit-source-id: 46ca6029a232f7d23f3fdb5efc33ae39a379fca8	2019-07-17 13:42:30 -07:00
vishwakftw	7d055c21b3	Port SVD to ATen, enable batching for matrix inputs (#21588 ) Summary: Changelog: - Port SVD TH implementation to ATen/native/BatchLinearAlgebra.cpp - Port SVD THC implementation to ATen/native/cuda/BatchLinearAlgebra.cu - Allow batches of matrices as arguments to `torch.svd` - Remove existing implementations in TH and THC - Update doc string - Update derivatives to support batching - Modify nuclear norm implementation to use at::svd instead of _batch_svd - Remove _batch_svd as it is redundant Pull Request resolved: https://github.com/pytorch/pytorch/pull/21588 Test Plan: - Add new test suite for SVD in test_torch.py with port to test_cuda.py - Add tests in common_methods_invocations.py for derivative testing Differential Revision: D16266115 Pulled By: nairbv fbshipit-source-id: e89bb0dbd8f2d58bd758b7830d2389c477aa61fb	2019-07-15 13:34:01 -07:00
Hong Xu	7750cae722	Refactor and improve randperm tests. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22121 Test Plan: Imported from OSS Differential Revision: D16153794 Pulled By: li-roy fbshipit-source-id: 4dbfa6cfcc79f6d431918a6646664215fa9ea0b9	2019-07-10 12:23:33 -07:00
Hong Xu	0f7c3710dd	Support Half type in randperm. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22102 Test Plan: Imported from OSS Differential Revision: D16153586 Pulled By: li-roy fbshipit-source-id: d58e3dbc5da893005f4eaf521a28b0d752274eff	2019-07-10 12:23:25 -07:00
Hong Xu	574e808680	Add a bitwise NOT operator for integer and Boolean types (CUDA). Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22320 Test Plan: Imported from OSS Differential Revision: D16183578 Pulled By: colesbury fbshipit-source-id: 2f72cce5e10fd637be1ac87e1bbfe0937a661034	2019-07-10 12:17:48 -07:00
Brandon Amos	046c4589df	lu: When not using pivoting, return the identity permutation instead of zeros (#22242 ) Summary: Some of my qpth users have told me that updating to the latest version of PyTorch and replacing the btrifact/btrisolve calls with the LU ones wasn't working and I didn't believe them until I tried it myself :) These updates have broken unpivoted LU factorizations/solves on CUDA. The LU factorization code used to return the identity permutation when pivoting wasn't used but now returns all zeros as the pivots. This PR reverts it back to return the identity permutation. I've not yet tested this code as I'm having some trouble compiling PyTorch with this and am hitting https://github.com/pytorch/pytorch/issues/21700 and am not sure how to disable that option. Here's a MWE to reproduce the broken behavior, and my fix. ```python torch.manual_seed(0) n = 4 L = torch.randn(n,n) A = L.mm(L.t()).unsqueeze(0) b = torch.randn(1, n) A_lu_cpu = torch.lu(A) A_lu_cuda_nopivot = torch.lu(A.cuda(), pivot=False) A_lu_cuda_pivot = torch.lu(A.cuda(), pivot=True) print('A_lu_cuda_nopivot\n', A_lu_cuda_nopivot) print('-----\nA_lu_cuda_pivot\n', A_lu_cuda_nopivot) x_cpu = b.lu_solve(A_lu_cpu) x_cuda_nopivot = b.cuda().lu_solve(A_lu_cuda_nopivot) x_cuda_nopivot_fixed = b.cuda().lu_solve( A_lu_cuda_nopivot[0], torch.arange(1, n+1, device='cuda:0').int()) x_cuda_pivot = b.cuda().lu_solve(*A_lu_cuda_pivot) print(x_cpu, x_cuda_nopivot, x_cuda_nopivot_fixed, x_cuda_pivot) ``` Output: ``` A_lu_cuda_nopivot (tensor([[[ 2.8465, -0.7560, 0.8716, -1.7337], [-0.2656, 5.5724, -1.1316, 0.6678], [ 0.3062, -0.2031, 1.4206, -0.5438], [-0.6091, 0.1198, -0.3828, 1.5103]]], device='cuda:0'), tensor([[0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)) ----- A_lu_cuda_pivot (tensor([[[ 2.8465, -0.7560, 0.8716, -1.7337], [-0.2656, 5.5724, -1.1316, 0.6678], [ 0.3062, -0.2031, 1.4206, -0.5438], [-0.6091, 0.1198, -0.3828, 1.5103]]], device='cuda:0'), tensor([[0, 0, 0, 0]], device='cuda:0', dtype=torch.int32)) (tensor([[-0.3121, -0.1673, -0.4450, -0.2483]]), tensor([[-0.1661, -0.1875, -0.5694, -0.4772]], device='cuda:0'), tensor([[-0.3121, -0.1673, -0.4450, -0.2483]], device='cuda:0'), tensor([[-0.3121, -0.1673, -0.4450, -0.2483]], device='cuda:0')) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/22242 Differential Revision: D16049334 Pulled By: ezyang fbshipit-source-id: 7eacae810d87ffbdf8e07159bbbc03866dd9979d	2019-07-09 11:16:50 -07:00
iurii zdebskyi	59c42595e0	Enabled gather and scatter for bool tensor (#21924 ) Summary: - moving stuff around in order to enable bool. - Added implementation of atomicAdd(bool, bool) Pull Request resolved: https://github.com/pytorch/pytorch/pull/21924 Differential Revision: D15883711 Pulled By: izdeby fbshipit-source-id: 733f35c2bc3d87cec9f9687d72b62d2d2cd7c03e	2019-06-27 09:07:50 -07:00
Edward Yang	8f9e0f77dd	Turn off non-default stream testing. (#21793 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21793 ghimport-source-id: 5264fa90ca77fbc79898cfa2f0ee02f47dec27d4 Test Plan: Imported from OSS Differential Revision: D15874814 Pulled By: ezyang fbshipit-source-id: 5c51ab9ae431faf2db549b88b07ba00783acab25	2019-06-18 07:00:08 -07:00
Stefan Krah	710821875a	Fix flaky nuclear_norm() test (#21638 ) Summary: Try to fix a sporadic failure on some CIs. I've run this test hundreds of times on my machine (GeForce 1060, MAGMA) but I cannot reproduce this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/21638 Differential Revision: D15827779 Pulled By: ezyang fbshipit-source-id: 3586075e48907b3b84a101c560a34cc733514a02	2019-06-14 11:40:03 -07:00
vishwakftw	4c03ac7ac4	Allow batch sizes > 65535 for inverse, solve, cholesky_solve and tria… (#21689 ) Summary: …ngular_solve Changelog: - Iterate over mini batches of 65535 matrices (maximum) Pull Request resolved: https://github.com/pytorch/pytorch/pull/21689 Differential Revision: D15800254 Pulled By: soumith fbshipit-source-id: c743ff13f1ba25d26874429d44e41a3c0ed21d6a	2019-06-12 23:30:19 -07:00
vishwakftw	9737b166a4	Fix bug in multinomial_alias_draw (#21324 ) Summary: An incorrect increment / decrement caused the samples to not be generated from a multinomial distribution Changelog: - Remove the incorrect increment / decrement operation Fixes https://github.com/pytorch/pytorch/issues/21257, fixes https://github.com/pytorch/pytorch/issues/21508 cc: LeviViana neerajprad Pull Request resolved: https://github.com/pytorch/pytorch/pull/21324 Differential Revision: D15761029 Pulled By: colesbury fbshipit-source-id: 2aeb51e2d3cfdb8356806a7d5b12d4b9910e37fb	2019-06-11 15:18:17 -07:00
Stefan Krah	8b9b215dc5	Add a 'dim' argument to nuclear norm (#21022 ) Summary: Addresses #18275. Pull Request resolved: https://github.com/pytorch/pytorch/pull/21022 Differential Revision: D15743515 Pulled By: ezyang fbshipit-source-id: e4aaea0bd7f863a2abad45c4322d6a9fb02a88e3	2019-06-10 15:18:34 -07:00
Vishwak Srinivasan	3df5a46a99	Skip triangular_solve CUDA test on non-default stream Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21590 Differential Revision: D15742549 Pulled By: ezyang fbshipit-source-id: fd5b2cbce86e5f229c2ffba114ef362934296d07	2019-06-10 11:38:42 -07:00
huba	b144ba66d5	Change PyTorch tests to use non-default CUDA stream (#21474 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21474 ghimport-source-id: b2477765362248a80557d1a20db02a1290bdcde3 Differential Revision: D15699700 Pulled By: fbhuba fbshipit-source-id: 1aa4309fec0982c8477cfab29ca5f42d2b171f97	2019-06-07 10:24:48 -07:00
Edward Yang	8c9a88bdab	Make test_cuda.py work on Python 2. (#21466 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21466 ghimport-source-id: 0a235c8b8cf994621a5a5afe022340dd35764c91 Differential Revision: D15698096 Pulled By: ezyang fbshipit-source-id: 1759c2681071e9c7e83de3de86daf4333c5f8f3a	2019-06-07 08:13:03 -07:00
vishwakftw	f6ec464890	Enable batched QR decomposition and add a `some` option (#20689 ) Summary: This PR covers two important points with respect to the QR decomposition: - batching of input matrices (#7500) - adding `some` as an option in `torch.qr` akin to NumPy's `mode` option (#10538) Changelog: - Enable batching for inputs to `torch.qr` - Move QR decomposition implementation to ATen (CPU and CUDA) - Remove existing implementations in TH/THC - Add a `some` option to `torch.qr` that will enable users to switch between complete and reduced decomposition - Modify doc strings Pull Request resolved: https://github.com/pytorch/pytorch/pull/20689 Differential Revision: D15529230 Pulled By: soumith fbshipit-source-id: 16af82b1d2db8a3a758fa8a5f798d83f5f950efb	2019-05-28 17:52:37 -07:00
Sam Gross	b85c52923b	Re-land "Fix advanced indexing on "huge" Tensors" (#21019 ) Summary: This #20919 without the changes to aten/src/THC/THCIntegerDivider.cuh that broke the ROCm build. cc bddppq Original summary: This fixes advanced indexing in cases where there's more than 2^31-1 bytes in the output. The `gpu_index_kernel` was missing the `can_use_32bit_indexing`/`with_32bit_indexing` check. This also adds a number of TORCH_INTERNAL_ASSERTS in Loops.cuh, OffsetCalculator, and IntDivider that sizes are fit in a signed 32-bit integer. More comprehensive tests that require a 32 GB GPU are here: https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e Pull Request resolved: https://github.com/pytorch/pytorch/pull/21019 Differential Revision: D15518477 Pulled By: colesbury fbshipit-source-id: 4db5626fda76eb58250793e8aa7d4f2832db3a34	2019-05-28 12:45:56 -07:00
Junjie Bai	5ddbfc97e9	Revert D15501945: [pytorch][PR] Fix advanced indexing on "huge" Tensors Differential Revision: D15501945 Original commit changeset: e876e678e866 fbshipit-source-id: 2833eb118a62e301571a983529f6e4fc91442581	2019-05-27 20:26:37 -07:00
Sam Gross	b93bdf6989	Fix advanced indexing on "huge" Tensors (#20919 ) Summary: This fixes advanced indexing in cases where there's more than 2^31-1 bytes in the output. The `gpu_index_kernel` was missing the `can_use_32bit_indexing`/`with_32bit_indexing` check. This also adds a number of TORCH_INTERNAL_ASSERTS in Loops.cuh, OffsetCalculator, and IntDivider that sizes are fit in a signed 32-bit integer. More comprehensive tests that require a 32 GB GPU are here: https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e Fixes #20888 Pull Request resolved: https://github.com/pytorch/pytorch/pull/20919 Differential Revision: D15501945 Pulled By: colesbury fbshipit-source-id: e876e678e866d2efda8ee92c47a1d2d1310671f0	2019-05-24 16:25:04 -07:00
Sam Gross	dee11a92c1	Use Device instead of Backend in TensorIterator (#20690 ) Summary: This PR also moves Device::validate into the header file, which makes statements like `Device d = kCPU` effectively free. Device includes the device's index, so TensorIterator::compute_types now implicitly checks that all CUDA inputs are on the same GPU. Previously, this was done ad-hoc in places like TensorIterator::binary_op. Note that zero-dim Tensor (scalars) are NOT required to be on the same device as other inputs because they behave almost like Python numbers. TensorIterator handles copying zero-dim Tensors to the common device. Prior to this PR, TensorIterator would copy zero-dim Tensors between CPU and GPU, but not between different GPUs (because Backend didn't encode the GPU index). This removes that restriction. Pull Request resolved: https://github.com/pytorch/pytorch/pull/20690 Differential Revision: D15414826 Pulled By: colesbury fbshipit-source-id: 1d0ad1f7d663252af36dd4590bcda418c2f7a09f	2019-05-24 12:14:08 -07:00
Sam Gross	320c38555e	Refactor CUDA copy and general copy dispatch (#20685 ) Summary: Copy.cu goes from 308 to 190 lines of code. In general it uses, the same copy strategy, using cudaMempcyAsync, a pointwise kernel, or a copy using temporary buffers. The pointwise kernel has slightly improved performance when broadcasting due to faster index calculation. This deletes "`s_copy_`", "`_s_copy_from`", and "`_copy_same_type_`". The only entry-point now is "`copy_`". A mini-benchmark is here: https://gist.github.com/colesbury/706de1d4e8260afe046020988410b992 Before: https://gist.github.com/colesbury/ab454b6fe3791bff420d7bcf8c041f18 After: https://gist.github.com/colesbury/9024d242b56ab09a9ec985fa6d1620bc Results were measured on 2.2 GHz Broadwell; no-turbo; one thread; compiled with GCC 7.3.0. (Results are slower than typical usage due to turbo being off.) The only significant differences is in the CUDA [1024] -> [1024, 1024] broadcasting copy which is ~25% faster. I don't expect a noticeable difference in real programs. CPU copy overhead is a tiny bit (~200 ns) faster, but I don't expect anyone to notice that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/20685 Differential Revision: D15414819 Pulled By: colesbury fbshipit-source-id: d3c6e04a5020470e3bef15b1fc09503cae5df440	2019-05-20 17:09:44 -07:00
Iurii Zdebskyi	71260b98e2	Fixed histc return type for CUDA (#20369 ) Summary: Fixing reported [issue](https://github.com/pytorch/pytorch/issues/20208). Pull Request resolved: https://github.com/pytorch/pytorch/pull/20369 Reviewed By: zou3519 Differential Revision: D15300959 Pulled By: izdeby fbshipit-source-id: 219692f99a66ea433112dfc226132eb6867122cf	2019-05-20 08:08:28 -07:00
Roy Li	163f0e182c	Fix bug in non_blocking copy (#20305 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20305 ghimport-source-id: eb3dacb10fd93bbb5a6bbe078ed1ec842163d0e6 Differential Revision: D15276094 Pulled By: li-roy fbshipit-source-id: 4728f419aa050e6c94a4f62231fa1a86caa556a7	2019-05-11 15:20:19 -07:00
Phúc Lê	9b272affde	Add base support to torch.logspace, default base=10 (#19542 ) Summary: Add base support for torch.logspace. See #19220 for details. SsnL can you feedback? Thanks a lot. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19542 Differential Revision: D15028484 Pulled By: soumith fbshipit-source-id: fe5a58a203b279103abbc192c754c25d5031498e	2019-04-23 15:06:34 -07:00
SsnL	dce3d74dfb	add torch.cuda.synchronize(device=None) (#19573 ) Summary: fixes https://github.com/pytorch/pytorch/issues/19509 Pull Request resolved: https://github.com/pytorch/pytorch/pull/19573 Differential Revision: D15045730 Pulled By: ezyang fbshipit-source-id: 732721b4b360fc4348ca7c87d4cd1386e7651bdd	2019-04-23 08:40:38 -07:00
vishwakftw	c30224ad21	Rename potri to cholesky_inverse (#19498 ) Summary: Changelog: - Rename `potri` to `cholesky_inverse` to remain consistent with names of `cholesky` methods (`cholesky`, `cholesky_solve`) - Fix all callsites - Rename all tests - Create a tentative alias for `cholesky_inverse` under the name `potri` and add a deprecation warning to not promote usage Pull Request resolved: https://github.com/pytorch/pytorch/pull/19498 Differential Revision: D15029901 Pulled By: ezyang fbshipit-source-id: 2074286dc93d8744cdc9a45d54644fe57df3a57a	2019-04-22 08:18:39 -07:00
Tongzhou Wang	973d51079b	Add device-specific cuFFT plan caches (#19300 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/19224 Pull Request resolved: https://github.com/pytorch/pytorch/pull/19300 Differential Revision: D14986967 Pulled By: soumith fbshipit-source-id: 8c31237db50d6924bba1472434c10326610d9255	2019-04-18 06:39:35 -07:00
Richard Zou	eaa14f5f59	Error out on in-place binops on tensors with internal overlap (#19317 ) Summary: This adds checks for `mul_`, `add_`, `sub_`, `div_`, the most common binops. See #17935 for more details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19317 Differential Revision: D14972399 Pulled By: zou3519 fbshipit-source-id: b9de331dbdb2544ee859ded725a5b5659bfd11d2	2019-04-17 13:02:07 -07:00
J M Dieterich	31686805f2	Enable unit tests for ROCm 2.3 (#19307 ) Summary: Unit tests that hang on clock64() calls are now fixed. test_gamma_gpu_sample is now fixed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19307 Differential Revision: D14953420 Pulled By: bddppq fbshipit-source-id: efe807b54e047578415eb1b1e03f8ad44ea27c13	2019-04-16 10:58:27 -07:00
Sam Gross	7caad0ed33	Free all blocks with outstanding events on OOM-retry (#19222 ) Summary: The caching allocator tries to free all blocks on an out-of-memory error. Previously, it did not free blocks that still had outstanding stream uses. This change synchronizes on the outstanding events and frees those blocks. See #19219 Pull Request resolved: https://github.com/pytorch/pytorch/pull/19222 Differential Revision: D14925071 Pulled By: colesbury fbshipit-source-id: a2e9fe957ec11b00ea8e6c0468436c519667c558	2019-04-15 11:29:27 -07:00
Johannes M Dieterich	d8669a2c7e	Enable working ROCm tests (#19169 ) Summary: Enable multi-GPU tests that work with ROCm 2.2. Have been run three times on CI to ensure stability. While there, remove skipIfRocm annotations for tests that depend on MAGMA. They still skip but now for the correct reason (no MAGMA) to improve our diagnostics. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19169 Differential Revision: D14924812 Pulled By: bddppq fbshipit-source-id: 8b88f58bba58a08ddcd439e899a0abc6198fef64	2019-04-12 21:51:10 -07:00
Vishwak Srinivasan	487388d8ad	Rename btrisolve to lu_solve (#18726 ) Summary: Changelog: - Rename `btrisolve` to `lu_solve` to remain consistent with names of solve methods (`cholesky_solve`, `triangular_solve`, `solve`) - Fix all callsites - Rename all tests - Create a tentative alias for `lu_solve` under the name `btrisolve` and add a deprecation warning to not promote usage Pull Request resolved: https://github.com/pytorch/pytorch/pull/18726 Differential Revision: D14726237 Pulled By: zou3519 fbshipit-source-id: bf25f6c79062183a4153015e0ec7ebab2c8b986b	2019-04-09 15:21:24 -07:00
J M Dieterich	e45e3634d6	add launch bounds, enable more tests (#18909 ) Summary: Add launch bounds annotations for ROCm arising from maxThreadsPerBlock and apply threads use. Enable tests that now work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18909 Differential Revision: D14801490 Pulled By: ezyang fbshipit-source-id: b81c97fc783a2627bc7e31b32036a364cfe40cc7	2019-04-05 10:17:15 -07:00
Roy Li	f5741eb855	Store ScalarType and Backend instead of Type in TensorIterator Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17601 Reviewed By: ezyang Differential Revision: D14274754 fbshipit-source-id: b08880ae586b6ae57d4c0bbeb203796d087926c4	2019-04-04 02:24:16 -07:00
vishwakftw	baac5489a8	Expose alias multinomial methods to ATen (#17904 ) Summary: This PR exposes the multinomialAliasSetup and multinomialAliasDraw methods. cc: neerajprad Pull Request resolved: https://github.com/pytorch/pytorch/pull/17904 Differential Revision: D14700205 Pulled By: ezyang fbshipit-source-id: 16462fb1f1ef1d560fd586632ea356b23e966ee3	2019-04-02 07:56:41 -07:00
Edward Yang	173f224570	Turn on F401: Unused import warning. (#18598 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598 ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a Stack from [ghstack](https://github.com/ezyang/ghstack): * #18598 Turn on F401: Unused import warning. This was requested by someone at Facebook; this lint is turned on for Facebook by default. "Sure, why not." I had to noqa a number of imports in __init__. Hypothetically we're supposed to use __all__ in this case, but I was too lazy to fix it. Left for future work. Be careful! flake8-2 and flake8-3 behave differently with respect to import resolution for # type: comments. flake8-3 will report an import unused; flake8-2 will not. For now, I just noqa'd all these sites. All the changes were done by hand. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D14687478 fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3	2019-03-30 09:01:17 -07:00
Vishwak Srinivasan	e73be58ff7	Rename `btriunpack` to `lu_unpack` (#18529 ) Summary: Changelog: - Renames `btriunpack` to `lu_unpack` to remain consistent with the `lu` function interface. - Rename all relevant tests, fix callsites - Create a tentative alias for `lu_unpack` under the name `btriunpack` and add a deprecation warning to not promote usage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18529 Differential Revision: D14683161 Pulled By: soumith fbshipit-source-id: 994287eaa15c50fd74c2f1c7646edfc61e8099b1	2019-03-29 13:01:30 -07:00
Vishwak Srinivasan	d859031ebf	Rename `btrifact*` to `lu` (#18435 ) Summary: Changelog: - Renames `btrifact` and `btrifact_with_info` to `lu`to remain consistent with other factorization methods (`qr` and `svd`). - Now, we will only have one function and methods named `lu`, which performs `lu` decomposition. This function takes a get_infos kwarg, which when set to True includes a infos tensor in the tuple. - Rename all tests, fix callsites - Create a tentative alias for `lu` under the name `btrifact` and `btrifact_with_info`, and add a deprecation warning to not promote usage. - Add the single batch version for `lu` so that users don't have to unsqueeze and squeeze for a single square matrix (see changes in determinant computation in `LinearAlgebra.cpp`) Pull Request resolved: https://github.com/pytorch/pytorch/pull/18435 Differential Revision: D14680352 Pulled By: soumith fbshipit-source-id: af58dfc11fa53d9e8e0318c720beaf5502978cd8	2019-03-29 00:34:30 -07:00
jithunnair-amd	fdedc62c26	enable more unit tests (#18537 ) Summary: Enable unit tests working with ROCm 2.3. In particular, these are unit tests where we skipped for double data types previously and some tests for multi-GPU setups. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18537 Differential Revision: D14651822 Pulled By: ezyang fbshipit-source-id: 7dd575504ebe235a91489866c91000e9754b1235	2019-03-27 14:27:23 -07:00
Tongzhou Wang	5292685d2f	Improve numerical precision of (s)logdet (#18449 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/18448 and https://github.com/pytorch/pytorch/issues/18450 Pull Request resolved: https://github.com/pytorch/pytorch/pull/18449 Differential Revision: D14611638 Pulled By: soumith fbshipit-source-id: 4f1f27ab5316a92d2783e734169f599afed743cf	2019-03-26 15:32:14 -07:00

1 2 3 4 5 ...

295 Commits