pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Nikita Shulga	bb32e123e6	Report results of python unit tests during window test runs (#35687 ) Summary: Define `store_test_results` attribute in CircleCI yamls Install `unittest-xml-reporting` and define `IN_CIRCLECI` environment variable to trigger test runners to save results to XML Pull Request resolved: https://github.com/pytorch/pytorch/pull/35687 Differential Revision: D20739831 Pulled By: malfet fbshipit-source-id: 6a7bbf19f93c32766963f5edad191ad8ca316ff8	2020-03-30 12:33:03 -07:00
Mike Ruberry	683246e5ea	Improves precision of linspace, logspace (#35461 ) Summary: The Torch algorithms for linspace and logspace conceptually compute each of their values using: `start_value + step_value * idx` [And NumPy does the same,](`cef4dc9d91/numpy/core/function_base.py (L24)`) except NumPy then [sets the last value in its array directly.](`cef4dc9d91/numpy/core/function_base.py (L162)`) This is because the above computation is unstable when using floats, and NumPy's contract, like PyTorch's, is that the last element in the array is the stop value. In PyTorch there can be a divergence between the computed last value and the actual value. One user reported case was: `torch.linspace(-0.031608279794, 0.031531572342, 257, dtype=torch.float32)` Which causes a difference of 3.7253e-09 between the last value as set by NumPy and computed by PyTorch. After this PR the difference is zero. Instead of simply setting the last element of the tensor, this PR updates the kernels with a "symmetric" algorithm that sets the first and last array elements without requiring an additional kernel launch on CUDA. The performance impact of this change seems small. I tested with a step sizes of 2^8 and 2^22, and all timing differences were imperceptible except for 2^22 on CPU, which appears to have suffered ~5% slowdown. I think that's an acceptable performance hit for the improved precision when we consider the context of linspace. An alternative would be to simply set the last element, as NumPy does, on CPU. But I think it's preferable to keep the CPU and CUDA algorithms aligned and keep the algorithm symmetric. In current PyTorch, for example, torch.linspace starts generating values very similar to NumPy, but as the index increases so do the errors, giving our current implementation a "left bias." Two tests are added to test_torch.py for this behavior. The linspace test will fail on current PyTorch, but the logspace test will succeed since its more complex computation needs wider error bars. Pull Request resolved: https://github.com/pytorch/pytorch/pull/35461 Differential Revision: D20712539 Pulled By: mruberry fbshipit-source-id: 2c1257c8706f4cdf080ff0331bbf2f7041ab9adf	2020-03-27 23:50:39 -07:00
Alban Desmaison	181da12126	Revert D20687652: [pytorch][PR] Report results from cpp unittests on Windows and Linux Test Plan: revert-hammer Differential Revision: D20687652 Original commit changeset: fc370b7e2614 fbshipit-source-id: 8153815c8ed8f3d4f472caa95eda76180b038a42	2020-03-27 06:56:53 -07:00
Nikita Shulga	d2d40c45b6	Report results from cpp unittests on Windows and Linux (#35500 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35500 Test Plan: Test in production :) Results should eventually be published to: https://circleci.com/build-insights/gh/pytorch/pytorch/master Differential Revision: D20687652 Pulled By: malfet fbshipit-source-id: fc370b7e261402e14b427f42038ecb2d95bad059	2020-03-26 23:00:33 -07:00
Nikita Shulga	6fa0b3df2e	[testing] Pass verbosity settings to `XMLTestRunner` (#35224 ) Summary: When `unittest.main()` is invoked with custom testRunner, verbosity settings for the runner must be set manually Pull Request resolved: https://github.com/pytorch/pytorch/pull/35224 Test Plan: CI Differential Revision: D20605896 Pulled By: malfet fbshipit-source-id: 79fc6f55911189b6d8a4bc83bd2390c94bd69e5e	2020-03-23 16:37:52 -07:00
Ailing Zhang	471ddacd8b	Add retry decorator and use it for Hub tests. (#34829 ) Summary: fix https://github.com/pytorch/pytorch/issues/34751 Pull Request resolved: https://github.com/pytorch/pytorch/pull/34829 Differential Revision: D20476231 Pulled By: ailzhang fbshipit-source-id: eb38ee655e28250352b15e8e37b3b39310a7c378	2020-03-16 20:19:45 -07:00
Pearu Peterson	8bae1ed144	PCA and SVD for low-rank matrices, LOBPCG for positive-defined generalized eigenvalue problem - copy (#34721 ) Summary: This is a copy of PR https://github.com/pytorch/pytorch/issues/29488 to help the merging process. Pull Request resolved: https://github.com/pytorch/pytorch/pull/34721 Differential Revision: D20444270 Pulled By: vincentqb fbshipit-source-id: 042c56c8c0dae37834f52b4aee2deae7dd6fa659	2020-03-16 14:13:30 -07:00
Edward Yang	4b929e5466	Revert D20193196: [pytorch][PR] PCA and SVD for low-rank matrices, LOBPCG for positive-defined generalized eigenvalue problem Test Plan: revert-hammer Differential Revision: D20193196 Original commit changeset: 78a487991242 fbshipit-source-id: 8da4f8cb17c45af41e8c0ce80bc72581eb10dbb8	2020-03-11 09:24:34 -07:00
Pearu Peterson	2ec779d46c	PCA and SVD for low-rank matrices, LOBPCG for positive-defined generalized eigenvalue problem (#29488 ) Summary: This PR implements the following linear algebra algorithms for low-rank matrices: - [x] Approximate `A` as `Q Q^H A` - using Algorithm 4.4 from [Halko et al, 2009](http://arxiv.org/abs/0909.4061). + exposed as `torch.lowrank.get_approximate_basis(A, q, niter=2, M=None) -> Q` + [x] dense matrices + [x] batches of dense matrices + [x] sparse matrices + [x] documentation - [x] SVD - using Algorithm 5.1 from [Halko et al, 2009](http://arxiv.org/abs/0909.4061). + uses `torch.lowrank.get_approximate_basis` + exposed as `torch.svd_lowrank(A, q=6, niter=2, M=None) -> (U, S, V)` + [x] dense matrices + [x] batches of dense matrices + [x] sparse matrices + [x] documentation - [x] PCA - using `torch.svd_lowrank` + uses `torch.svd_lowrank` + exposed as `torch.pca_lowrank(A, center=True, q=None, niter=2) -> (U, S, V)` + [x] dense matrices + [x] batches of dense matrices + [x] sparse matrices, uses non-centered sparse matrix algorithm + [x] documentation - [x] generalized eigenvalue solver using the original LOBPCG algorithm [Knyazev, 2001](https://epubs.siam.org/doi/abs/10.1137/S1064827500366124) + exposed as `torch.lobpcg(A, B=None, k=1, method="basic", ...)` + [x] dense matrices + [x] batches of dense matrices + [x] sparse matrices + [x] documentation - [x] generalized eigenvalue solver using robust LOBPCG with orthogonal basis selection [Stathopoulos, 2002](https://epubs.siam.org/doi/10.1137/S1064827500370883) + exposed as `torch.lobpcg(A, B=None, k=1, method="ortho", ...)` + [x] dense matrices + [x] batches of dense matrices + [x] sparse matrices + [x] documentation - [x] generalized eigenvalue solver using the robust and efficient LOBPCG Algorithm 8 from [Duersch et al, 2018](https://epubs.siam.org/doi/abs/10.1137/17M1129830) that switches to orthogonal basis selection automatically + the "ortho" method improves iterations so rapidly that in the current test cases it does not make sense to use the basic iterations at all. If users will have matrices for which basic iterations could improve convergence then the `tracker` argument allows breaking the iteration process at user choice so that the user can switch to the orthogonal basis selection if needed. In conclusion, there is no need to implement Algorithm 8 at this point. - [x] benchmarks + [x] `torch.svd` vs `torch.svd_lowrank`, see notebook [Low-rank SVD](https://github.com/Quansight/pearu-sandbox/blob/master/pytorch/Low-rank%20SVD.ipynb). In conclusion, the low-rank SVD is going to be useful only for large sparse matrices where the full-rank SVD will fail due to memory limitations. + [x] `torch.lobpcg` vs `scipy.sparse.linalg.lobpcg`, see notebook [LOBPCG - pytorch vs scipy](https://github.com/Quansight/pearu-sandbox/blob/master/pytorch/LOBPCG%20-%20pytorch%20vs%20scipy.ipynb). In conculsion, both implementations give the same results (up to numerical errors from different methods), scipy lobpcg implementation is generally faster. + [x] On very small tolerance cases, `torch.lobpcg` is more robust than `scipy.sparse.linalg.lobpcg` (see `test_lobpcg_scipy` results) Resolves https://github.com/pytorch/pytorch/issues/8049. Pull Request resolved: https://github.com/pytorch/pytorch/pull/29488 Differential Revision: D20193196 Pulled By: vincentqb fbshipit-source-id: 78a4879912424595e6ea95a95e483a37487a907e	2020-03-11 07:33:49 -07:00
Edward Yang	ba1bd41767	Turn on strict dtype checking for test_torch.py (#33825 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33825 Partially addresses #20376 I do this by overriding assertEqual in classes that opt into this. This means I have to fix #33821. The fix is a little unsatisfactory as idiomatic Python 2 super() calls don't work (since the class is no longer in scope); hopefully this will just work when we go to Python 3. General approach taken: - A lot of dtype mismatches are because we specified tensor constants that infer to some dtype, but the actual dtype needed is something else. Those are easy, just annotate the tensor() constructor (often a legacy Tensor/FloatTensor call) with dtype - There are a few cases where the promotion rules are nontrivial. Some of them I just typed out the expected promotion rules manually (based on trial and error) - There are some more complex cases; if it gets too hairy I just set exact_dtype=False and nope the fuck out I don't have time to do it for all the other classes. But the setup should work if people just incrementally add the overrides to classes, and then eventually flip the default. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D20125791 Pulled By: ezyang fbshipit-source-id: 389c2d1efbd93172af02f13e38ac5e92fe730c57	2020-03-03 14:45:53 -08:00
anjali411	dece155335	Modified assertEqual to handle complex tensors (#33773 ) Summary: - Modified assertEqual to handle complex tensors - added a test in test_torch.py to test torch.zeros - added dispatch for complex for index_kernel, index_put_kernel Pull Request resolved: https://github.com/pytorch/pytorch/pull/33773 Differential Revision: D20135553 Pulled By: anjali411 fbshipit-source-id: f716604535c0447ecffa335b0fc843431397c988	2020-02-28 08:43:28 -08:00
Nikolay Korovaiko	a7e22b4c6a	add bailout checks to checkScript (#32802 ) Summary: this adds enough infrastructure to run bailout checks in `checkScript`. I'll need to figure out the best way to enable it for nightly builds now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/32802 Differential Revision: D19974718 Pulled By: Krovatkin fbshipit-source-id: 40485503f6d3ae14edcce98e1eec1f0559f3ad08	2020-02-21 21:18:54 -08:00
Rohan Varma	6cb9e6b015	Back out "Revert D19871946: [distributed] pass in timeout to TCP store when initializing" (#33434 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33434 Reland of https://github.com/pytorch/pytorch/pull/33325, since the unit test was flaky and failed on land. To ensure that the test is not flaky, I bumped the timeout so the rendezvous does not timeout (timing out the rendezvous in 1s led to the flakiness). I also generalized our mechanism for retrying on errors to include retrying on errors due to timeout in rendezvous. ghstack-source-id: 98558377 Test Plan: Added UT test_tcp_store_timeout_set Differential Revision: D19935390 fbshipit-source-id: 56ccf8c333dd2f954a33614d35cd1642d4e9473a	2020-02-19 17:17:17 -08:00
ptrblck	1e3664b6ef	Remove c/pdist tests from _internal/common_utils.py (#33409 ) Summary: * remove brute_test from `torch/testing/_internal/common_utils.py` * add these tests as internal tests to `test_torch.py` CC ailzhang Pull Request resolved: https://github.com/pytorch/pytorch/pull/33409 Differential Revision: D19951729 Pulled By: ailzhang fbshipit-source-id: b1126aaf26fa64a0f17cbb582dc8038b79cfe3eb	2020-02-19 10:27:30 -08:00
Pritam Damania	fd684cc312	Use torch.set_default_dtype in test_data_parallel and rename dtype2prec (#32962 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32962 As per gchanan's comments on https://github.com/pytorch/pytorch/pull/30445, I've used `torch.set_default_dtype` in test_data_parallel instead of specifying dtype=torch.double everywhere. Also, renamed dtype2prec to dtype2prec_DONTUSE ghstack-source-id: 98388429 Test Plan: waitforbuildbot Differential Revision: D19714374 fbshipit-source-id: eb55bbca33881625636ba9ea6dd4cb692f25668e	2020-02-15 14:07:54 -08:00
ptrblck	a64d0ffe81	Use int64 in pdist kernel to handle batches >= 46342 #30583 (#31593 ) Summary: Currently `torch.pdist` yields an illegal CUDA memory access for batch sizes >= 46342 as reported by SsnL in https://github.com/pytorch/pytorch/issues/30583. Thanks for the minimal code reproduction, btw! ;) Reason for this bug: The calculation if `i` in the [`pdist_kerne_cuda_impl`](`46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L112)`) might overflow, if a tensor with a `batch size >= 46342` is passed to `torch.pdist`. Detailed description: * `result` is resizes as ` n * (n - 1) / 2 = 1073767311` ([line of code](`46ad80c839/aten/src/ATen/native/Distance.cpp (L140)`)) * `grid` is initialized as `result.numel()` ([line of code](`46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L246)`)) * `k` is assigned to the `blockIdx.x` as an `int32` ([line of code](`46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L108)`)) * `i` is calculated using `2 * k >= 2147534622` ([line of code](`46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L112)`)), which overflows, since `2147534622 > 2147483647 (int32_max)`. Using `const int64_t k = blockIdx.x;` would solve the illegal memory access. This seems also be done for [`cdist_kernel_cuda_impl`](`46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L198-L201)`). However, we might expect a slowdown, so I've timed the current PyTorch master vs. this PR: (tested with `x = torch.randn(x.size(0), 128)` on a V100) \|x.size(0) \| int32 idx \| int64 idx \| slowdown \| \|----------\|-----------\|-----------\|----------\| \| 50000 \| - \| 4.4460 \| - \| \| 25000 \| 1.02522 \| 1.10869 \| 7.53% \| \| 12500 \| 0.25182 \| 0.27277 \| 7.68% \| \| 6250 \| 0.06291 \| 0.06817 \| 7.72% \| \| 3125 \| 0.01573 \| 0.01704 \| 7.69% \| \| 1562 \| 0.00393 \| 0.00426 \| 7.75% \| While checking the backward kernel, it seems I'm triggering another error with a size limit of ```python x = torch.randn(1449, 1, device='cuda', requires_grad=True) out = torch.pdist(x) out.mean().backward() > RuntimeError: CUDA error: invalid configuration argument ``` , while `[<=1448, 1]` works. I'll take another look at this issue. Let me know, if the potential fix should go into this PR or if I should open a new issue. CC ngimel, csarofeen Pull Request resolved: https://github.com/pytorch/pytorch/pull/31593 Differential Revision: D19825571 Pulled By: ngimel fbshipit-source-id: ace9ccab49f3cf0ce894cdb6daef0795e2e8ec03	2020-02-11 12:00:39 -08:00
George Guanheng Zhang	f4fbe9549d	Revert D19800021: [pytorch][PR] Improve error message for assertWarnsRegex Test Plan: revert-hammer Differential Revision: D19800021 Original commit changeset: 1c31ae785c8f fbshipit-source-id: d7b340d678562c25a84d48be66c576075000b50d	2020-02-10 12:17:52 -08:00
Peter Bell	c917a247a8	Improve error message for assertWarnsRegex (#33099 ) Summary: `assertWarnsRegex` now prints out any warnings that it caught while failing to find a matching warning. This makes it easier to debug tests by just looking at the CI logs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/33099 Differential Revision: D19800021 Pulled By: ezyang fbshipit-source-id: 1c31ae785c8ffc5d47619aff6597e479263be2de	2020-02-10 07:27:59 -08:00
Richard Zou	6209412647	Add option to use ninja to compile ahead-of-time cpp_extensions (#32495 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32495 Background ------------------------------ Previously, ninja was used to compile+link inline cpp_extensions and ahead-of-time cpp_extensions were compiled with distutils. This PR adds the ability to compile (but not link) ahead-of-time cpp_extensions with ninja. The main motivation for this is to speed up cpp_extension builds: distutils does not make use of parallelism. With this PR, using the new option, on my machine, - torchvision compilation goes from 3m43s to 49s - nestedtensor compilation goes from 2m0s to 28s. User-facing changes ------------------------------ I added a `use_ninja` flag to BuildExtension. This defaults to `True`. When `use_ninja` is True: - it will attempt to use ninja. - If we cannot use ninja, then this throws a warning and falls back to distutils. - Situations we cannot use ninja: Windows (NYI, I'll open a new issue for this), if ninja cannot be found on the system. Implementation Details ------------------------------ This PR makes this change in two steps. Please me know if it would be easier to review this if I split this up into a stacked diff. Those changes are: 1) refactor _write_ninja_file to separate the policy (what compiler flags to pass) from the mechanism (how to write the ninja file and do compilation). 2) call _write_ninja_file and _run_ninja_build while building ahead-of-time cpp_extensions. These are only used to compile objects; distutils still handles the linking. Change 1: refactor _write_ninja_file to seperate policy from mechanism - I split _write_ninja_file into: _write_ninja_file and _write_ninja_file_to_build_library - I renamed _build_extension_module to _run_ninja_build Change 2: Call _write_ninja_file while building ahead-of-time cpp_extensions - _write_ninja_file_and_compile_objects calls _write_ninja_file to only build object files. - We monkey-patch distutils.CCompiler.compile to call _write_ninja_files_and_compile_objects - distutils still handles the linking step. The linking step is not a bottleneck so it was not a concern. - This change only works on unix-based systems. Our code for windows goes down a different codepath and I did not want to mess with that. - If a system does not support ninja, we raise a warning and fall back to the original compilation path. Test Plan ------------------------------ Adhoc testing - I built torchvision using pytorch master and printed out the build commands. Next, I used this branch to build torchvision and looked at the ninja file. I compared the ninja file with the build commands and asserted that they were functionally the same. - I repeated the above for pytorch/nestedtensor. PyTorch test suite - I split `test_cpp_extensions` into `test_cpp_extensions_aot` and `test_cpp_extensions_jit`. The AOT (ahead-of-time) version tests ahead-of-time and the JIT version tests just-in-time (not to be confused with TorchScript) - `test_cpp_extensions_aot` gets run TWICE by run_test.py, once with a module that was built with ninja, and once with a module that was built without ninja. - run_test.py asserts that when we are building with use_ninja=True, ninja is actually available on the system. Test Plan: Imported from OSS Differential Revision: D19730432 Pulled By: zou3519 fbshipit-source-id: 819590d01cf65e8da5a1e8019b8b3084792fee90	2020-02-05 18:49:29 -08:00
davidriazati	2060e0a9dd	Split serialization tests to their own file (#32241 ) Summary: Stacked PRs * #32244 - Make zip serialization the default * #32241 - Split serialization tests to their own file This makes them all easier to run as a batch. This PR is just a code move / fixing up imports. There are still some serialization tests in `test_torch.py` as part of `TestDeviceType`. ](https://our.intern.facebook.com/intern/diff/19415826/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/32241 Pulled By: driazati Differential Revision: D19415826 fbshipit-source-id: a3f6cfe1626ff2f9b9631c409bf525bd32e4639b	2020-01-28 15:04:05 -08:00
Pritam Damania	f050b16dd9	Move pytorch distributed tests to separate folder for contbuild. (#30445 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445 Create distributed and rpc directories under caffe/test for better management of unit tests. Differential Revision: D18702786 fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606	2020-01-22 21:16:59 -08:00

21 Commits