In edge cases in CI, SLOW_TESTS_FILE is defined but does not point to an existing file.
Guessing this is due to a test case that opens a subprocess and changes its cwd but doesn't clean up its env.
We shouldn't make importing common_utils fail, so issue a warning and proceed.
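Here is a minimal sketch of the warn-and-proceed behavior. It assumes the file holds a JSON mapping and that an empty mapping is an acceptable fallback. `load_slow_tests` is a hypothetical name, and `common_utils` may structure this differently:

```python
import json
import os
import warnings
from typing import Dict, Optional


def load_slow_tests(path: Optional[str]) -> Dict[str, float]:
    """Return the slow-test mapping, or {} if the path is unset or missing."""
    if not path:
        return {}
    if not os.path.exists(path):
        # Don't make the import fail: warn and fall back to "no slow tests".
        warnings.warn(f"slow test file provided but not found: {path}")
        return {}
    with open(path) as f:
        return json.load(f)


# Evaluated at import time, so a stale SLOW_TESTS_FILE only produces a warning.
slow_tests_dict = load_slow_tests(os.environ.get("SLOW_TESTS_FILE"))
```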
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99659
Approved by: https://github.com/ezyang, https://github.com/malfet
Planning to do a full writeup later. The short story is, sometimes the following chain of events happens:
1. We turn on Dynamo's custom frame handler
2. GC triggers (and all of the finalizers run under Dynamo)
3. GC hits a GeneratorExit frame
4. You end up in the custom frame handler with throw_flag == TRUE and PyErr_Occurred() != NULL
If this happens and we blindly call into other Python functions (like the Python callback), the executed Python code will immediately raise an exception (because there's already an ambient exception set). This is very, very confusing. The fix is to defer to the regular handler when throw_flag is TRUE.
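In CPython, step 3 is visible from pure Python. When the garbage collector finalizes a suspended generator, it resumes the generator's frame with `GeneratorExit` thrown into it. This is the re-entry that reaches a custom frame handler with `throw_flag` set while an exception is already pending. The sketch only shows the Python-visible side and does not involve Dynamo. `worker` and `leak_suspended_generator` are illustrative names:

```python
import gc

events = []


def worker():
    try:
        yield 1
        yield 2
    except GeneratorExit:
        # Reached only when the frame is resumed with GeneratorExit thrown in.
        events.append("GeneratorExit")
        raise


def leak_suspended_generator() -> None:
    gen = worker()
    next(gen)            # suspend the generator at its first yield
    cycle = [gen]
    cycle.append(cycle)  # reference cycle: only the cyclic GC can free it


leak_suspended_generator()
gc.collect()  # the finalizer closes the generator, raising GeneratorExit inside worker()
print(events)  # ['GeneratorExit']
```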
I triggered this locally with
```
PYTHONUNBUFFERED=1 pytest test/dynamo/test_dynamic_shapes.py -k 'Unspec and export and not dupes and not reorder' -v -x -s
```
But I also have some tests which trigger the problem synthetically.
Fixes https://github.com/pytorch/pytorch/issues/93781
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96488
Approved by: https://github.com/albanD
Run more tests through pytest.
Use a blocklist for tests that shouldn't run through pytest. As far as I can tell, the numbers of tests run, skipped, and xfailed for those not on the blocklist are the same.
Regarding the main module:
Usually, when tests are run in CI, we call `python <test file>`, which causes the file to be imported under the module name `__main__`. However, pytest imports the module under its file name, so the file is imported a second time. This can cause issues for tests that run module-level code and change global state. Examples include test_nn, which modifies lists imported from another file, and the tests in test/lazy, which initialize a backend that cannot coexist with a second copy of itself.
My workaround is to run tests from the `__main__` module. However, this prevents pytest from rewriting assertions (and possibly from doing other things, but I don't know what else pytest does right now). A better solution might be to call `pytest <test file>` directly and either move all the code in run_tests(argv) to module-level code or put it in a hook in conftest.py.
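The double-import problem can be reproduced without PyTorch. This sketch executes the same file twice. The first run uses `__main__`, as `python <test file>` does. The second imports it under its file name, which is roughly what pytest does. The module-level side effect therefore runs twice. The file name `test_example.py` and the `shared_state` module are hypothetical:

```python
import importlib
import runpy
import sys
import tempfile
import types
from pathlib import Path


def run_twice() -> list:
    # Shared global state that the test file mutates at import time.
    shared = types.ModuleType("shared_state")
    shared.calls = []
    sys.modules["shared_state"] = shared
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp, "test_example.py")
        path.write_text("import shared_state\nshared_state.calls.append(__name__)\n")
        # 1) Like `python test_example.py`: module code runs as __main__.
        runpy.run_path(str(path), run_name="__main__")
        # 2) Like pytest: the same file is imported again under its file name.
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            sys.modules.pop("test_example", None)
            importlib.import_module("test_example")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("test_example", None)
            sys.modules.pop("shared_state", None)
    return shared.calls


print(run_twice())  # ['__main__', 'test_example']
```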
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95844
Approved by: https://github.com/huydhn
Summary: Currently running PyTorch tests with dynamo and inductor is
controlled by environment variables, and CI sets them based on test
config name matching. Change them to use options of run_test.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94539
Approved by: https://github.com/huydhn
The expression `argv + [f'--junit-xml-reruns={test_report_path}'] if TEST_SAVE_XML else []` evaluates to the empty list when `TEST_SAVE_XML` is false and would need parentheses.
Instead, simplify the code by appending the argument, when required, directly where `test_report_path` is set.
Note that `.append()` may not be used, as that would modify `argv` and in turn `UNITTEST_ARGS`, which might have undesired side effects.
Without this patch, `pytest.main([])` would be called, i.e. with no arguments. pytest then tries to discover all tests in the current working directory, which ultimately leads to (many) failures.
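The precedence pitfall and the aliasing concern can both be reproduced in isolation. `UNITTEST_ARGS`, `TEST_SAVE_XML`, and `test_report_path` below are stand-ins with made-up values, and `build_pytest_args` is a hypothetical helper, not the actual code:

```python
from typing import List

UNITTEST_ARGS: List[str] = ["test_example.py", "-v"]  # stand-in for the real global
TEST_SAVE_XML = False
test_report_path = "test-reports/report.xml"  # hypothetical path

argv = UNITTEST_ARGS
# Buggy: the conditional expression binds more loosely than `+`, so this parses
# as `(argv + [...]) if TEST_SAVE_XML else []` and yields [] when the flag is off.
buggy_args = argv + [f"--junit-xml-reruns={test_report_path}"] if TEST_SAVE_XML else []


def build_pytest_args(argv: List[str], save_xml: bool, report_path: str) -> List[str]:
    if save_xml:
        # `argv + [...]` builds a new list; `argv.append(...)` would mutate the
        # caller's list (and therefore UNITTEST_ARGS, which `argv` aliases).
        argv = argv + [f"--junit-xml-reruns={report_path}"]
    return argv
```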
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94589
Approved by: https://github.com/clee2000, https://github.com/Neilblaze
I applied some flake8 fixes and enabled checking for them in the linter. I also enabled some checks for my previous comprehensions PR.
This is a follow-up to #94323, where I enable the flake8 checkers for the fixes I made and fix a few more of them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94601
Approved by: https://github.com/ezyang
Prefer dashes over underscores in command-line options: add `--command-arg-name` to the argument parser. The old arguments with underscores (`--command_arg_name`) are kept for backward compatibility.
Both dashes and underscores are used in the PyTorch codebase. Some argument parsers only have dashes or only have underscores in arguments. For example, the `torchrun` utility for distributed training only accepts underscore arguments (e.g., `--master_port`). Dashes are more common in other command-line tools, and they appear to be the default choice in the Python standard library:
`argparse.BooleanOptionalAction`: 4a9dff0e5a/Lib/argparse.py (L893-L895)
```python
class BooleanOptionalAction(Action):
    def __init__(...):
        ...
        for option_string in option_strings:
            _option_strings.append(option_string)

            if option_string.startswith('--'):
                option_string = '--no-' + option_string[2:]
                _option_strings.append(option_string)
```
It adds `--no-argname`, not `--no_argname`. Also, typing `_` requires holding Shift, while typing `-` does not.
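Here is a minimal sketch of both points using the standard-library `argparse` (`BooleanOptionalAction` needs Python 3.9+). `--command-arg-name` is the placeholder from the description above. `--verbose` is an illustrative flag, not a specific PyTorch option:

```python
import argparse

parser = argparse.ArgumentParser(prog="example")  # prog name is illustrative
# Preferred dashed spelling first; the underscore spelling stays as an alias for
# backward compatibility. argparse derives dest="command_arg_name" from the first
# long option by replacing dashes with underscores.
parser.add_argument("--command-arg-name", "--command_arg_name", default=None)
# BooleanOptionalAction also registers the negated form --no-verbose.
parser.add_argument("--verbose", action=argparse.BooleanOptionalAction, default=False)

new_style = parser.parse_args(["--command-arg-name", "a", "--verbose"])
old_style = parser.parse_args(["--command_arg_name", "b", "--no-verbose"])
```

Both spellings land in the same `dest`, so downstream code only ever reads `args.command_arg_name`.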
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94505
Approved by: https://github.com/ezyang, https://github.com/seemethere
While discussing a possible addition of `assert_not_close` to the API (see #90005 later in the stack), it became clear that we need an intermediate function that returns a bool-ish value that one can assert on. This PR introduces this function as `are_equal`, a replacement for `assert_equal`. The interface is the same. However, instead of raising when a comparison fails, we return the `ErrorMeta`s of all failures and leave it to the caller to handle them. Note that this only applies to errors raised during the comparison stage. Everything else, e.g. setting only `atol` *or* `rtol`, will raise just as before.
We decided to keep this private for now unless there is user demand. The largest issue that needs to be solved before this can become public is the return type. A function like `torch.testing.are_close` would target two use cases:
1. Using it to branch inside code like `if are_close(...):`
2. Using it to assert closeness inside a test like `assert are_close(...)`. This is the default way to assert something with `pytest`
To do that, the return type has to be bool-ish, i.e. an instance of `bool` or an object implementing `__bool__`. In addition, `bool(are_close())` needs to be `True` if the inputs are close and `False` otherwise. The current logic of `are_close` satisfies the former but violates the latter: if everything is close, we return an empty list, but `bool([]) is False`.
Directly using an instance of `bool` would satisfy the requirements above, but then we would have no way to add diagnostics to the error. That means `assert are_close()` would work, but the failure would not be descriptive.
Using `Tuple[bool, str]` would work in general, but is quite dangerous and unexpected: since all non-empty tuples evaluate to `True`, this can easily hide bugs if the user is not super careful:
```pycon
>>> close = (False, "error message with diagnostics")
>>> assert close[0], close[1]
AssertionError: error message with diagnostics
>>> assert close  # passes silently: a non-empty tuple is truthy
```
One possible solution here would be a thin custom object:
```py
class Close:
    def __init__(self, flag: bool, msg: str = "") -> None:
        self._flag = flag
        self._msg = msg

    def __bool__(self) -> bool:
        return self._flag

    def __str__(self) -> str:
        return self._msg
```
Now we can do something like
```pycon
>>> close = Close(False, "error message with diagnostics")  # coming from are_close
>>> if not close:
...     print("It works!")
...
It works!
>>> assert close
AssertionError
>>> assert close, close  # This looks weird, but does its job
AssertionError: error message with diagnostics
```
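For illustration, here is a toy `are_close` over plain Python floats that returns a `Close` object like the one sketched above. It is hypothetical and not a `torch.testing` API. The tolerance check follows the formula documented for `torch.isclose`, `|actual - expected| <= atol + rtol * |expected|`, with the same default tolerances:

```python
from typing import Sequence


class Close:
    """Bool-ish comparison result that also carries a diagnostic message."""

    def __init__(self, flag: bool, msg: str = "") -> None:
        self._flag = flag
        self._msg = msg

    def __bool__(self) -> bool:
        return self._flag

    def __str__(self) -> str:
        return self._msg


def are_close(
    actual: Sequence[float],
    expected: Sequence[float],
    *,
    rtol: float = 1e-5,
    atol: float = 1e-8,
) -> Close:
    """Hypothetical toy comparison returning a Close instead of raising."""
    if len(actual) != len(expected):
        return Close(False, f"Length mismatch: {len(actual)} != {len(expected)}")
    mismatches = [
        i
        for i, (a, e) in enumerate(zip(actual, expected))
        # Same tolerance formula as documented for torch.isclose.
        if abs(a - e) > atol + rtol * abs(e)
    ]
    if mismatches:
        return Close(
            False,
            f"{len(mismatches)} of {len(actual)} elements are not close; "
            f"first mismatch at index {mismatches[0]}",
        )
    return Close(True)
```

In a test, this is used as `result = are_close(actual, expected)` followed by `assert result, result`.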
But this means we introduce another abstraction that the user has to deal with.
To reiterate, we are not going to make `are_close` public until there is user demand, since none of the options above is without flaws.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90004
Approved by: https://github.com/mruberry, https://github.com/malfet
This PR is a copy of https://github.com/pytorch/pytorch/pull/90849, whose merge was reverted.
The PR adds "check sparse tensor invariants" flag to Context that when enabled will trigger sparse tensor data invariants checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to UI:
- The `torch.sparse.check_sparse_tensor_invariants` class provides different ways to enable/disable the invariant checking.
- `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.
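The interplay between the global flag and the per-call `check_invariants` argument can be modeled in pure Python. The names below (`_CHECK_INVARIANTS`, `invariant_checks`, `make_csr_like`) are hypothetical stand-ins, not PyTorch APIs. The only invariant checked here is that `crow_indices` starts at 0 and is non-decreasing:

```python
import contextlib
from typing import Iterator, List, Optional

# Hypothetical stand-in for the global "check sparse tensor invariants" flag.
_CHECK_INVARIANTS = False


def is_enabled() -> bool:
    return _CHECK_INVARIANTS


@contextlib.contextmanager
def invariant_checks(enabled: bool = True) -> Iterator[None]:
    """Temporarily set the global flag, restoring the previous value on exit."""
    global _CHECK_INVARIANTS
    saved = _CHECK_INVARIANTS
    _CHECK_INVARIANTS = enabled
    try:
        yield
    finally:
        _CHECK_INVARIANTS = saved


def make_csr_like(crow_indices: List[int], check_invariants: Optional[bool] = None) -> List[int]:
    # An explicit argument overrides the global state for this call only.
    check = is_enabled() if check_invariants is None else check_invariants
    if check:
        if not crow_indices or crow_indices[0] != 0:
            raise ValueError("crow_indices[0] must be 0")
        if any(a > b for a, b in zip(crow_indices, crow_indices[1:])):
            raise ValueError("crow_indices must be non-decreasing")
    return crow_indices
```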
The PR fixes https://github.com/pytorch/pytorch/issues/90833
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92094
Approved by: https://github.com/cpuhrsch
#75854
A naive attempt at working around the limitations of using a single 64-bit integer to pack `stream_id`, `device_index`, and `device_type`.
Still needs sanity checks, testing, and minimization of BC-breaking changes.
Currently, a Holder for the `StreamData3` struct is used for `IValue` compatibility. This seems to work for `ivalue.h` and `ivalue_inl.h`, but it doesn't seem to work naively for the JIT CUDA stream wrapper (something about ambiguous calls if an `intrusive_ptr` to `c10::ivalue::StreamData3Holder` is used as the return type for `pack()`). It turns out that the methods required to access the fields for rematerializing a CUDA Stream are basically already present anyway. So `pack` is simply removed in the wrapper for now, and the methods that access the required fields are called directly.
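To make the limitation concrete, the sketch below compares packing three fields into one 64-bit integer with storing them in a three-field record. The bit layout (8 bits of device type, 8 bits of device index, 48 bits of stream id) is a hypothetical choice for illustration, not c10's actual encoding. The Python `StreamData3` dataclass only borrows the name of the C++ struct:

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical layout: [ device_type: 8 | device_index: 8 | stream_id: 48 ]
STREAM_ID_BITS = 48
INDEX_BITS = 8
TYPE_BITS = 8
STREAM_ID_MASK = (1 << STREAM_ID_BITS) - 1
INDEX_MASK = (1 << INDEX_BITS) - 1
TYPE_MASK = (1 << TYPE_BITS) - 1


def pack(stream_id: int, device_index: int, device_type: int) -> int:
    # Fields wider than their slot are silently truncated by the masks.
    return (
        ((device_type & TYPE_MASK) << (STREAM_ID_BITS + INDEX_BITS))
        | ((device_index & INDEX_MASK) << STREAM_ID_BITS)
        | (stream_id & STREAM_ID_MASK)
    )


def unpack(bits: int) -> Tuple[int, int, int]:
    return (
        bits & STREAM_ID_MASK,
        (bits >> STREAM_ID_BITS) & INDEX_MASK,
        (bits >> (STREAM_ID_BITS + INDEX_BITS)) & TYPE_MASK,
    )


@dataclass(frozen=True)
class StreamData3:
    """Three separate fields: no field has to give up bits to the others."""
    stream_id: int
    device_index: int
    device_type: int
```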
CC @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81596
Approved by: https://github.com/ezyang
This PR adds "check sparse tensor invariants" flag to Context that when enabled will trigger sparse tensor data invariants checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to UI:
- `torch.enable_check_sparse_tensor_invariants` and `torch.is_check_sparse_tensor_invariants_enabled` functions to globally enable/disable the invariant checks and to retrieve the state of the feature, respectively
- `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.
The PR also fixes https://github.com/pytorch/pytorch/issues/90833
# Main issue
*The following content is outdated after merging the PRs in this ghstack but kept for the record.*
The importance of this feature is that when enabling the invariants checks by default, say, via
<details>
```
$ git diff
diff --git a/torch/__init__.py b/torch/__init__.py
index c8543057c7..19a91d0482 100644
--- a/torch/__init__.py
+++ b/torch/__init__.py
@@ -1239,3 +1239,8 @@ if 'TORCH_CUDA_SANITIZER' in os.environ:
 # Populate magic methods on SymInt and SymFloat
 import torch.fx.experimental.symbolic_shapes
+
+# temporarily enable sparse tensor arguments validation in unsafe
+# constructors:
+
+torch._C._set_check_sparse_tensor_invariants(True)
```
</details>
a massive number of test failures/errors occur in test_sparse_csr.py tests:
```
$ pytest -sv test/test_sparse_csr.py
<snip>
==== 4293 failed, 1557 passed, 237 skipped, 2744 errors in 69.71s (0:01:09) ====
```
that means that we are silently constructing sparse compressed tensors that do not satisfy the sparse tensor invariants. In particular, the following errors are raised:
```
AssertionError: "resize_as_sparse_compressed_tensor_: self and src must have the same layout" does not match "expected values to be a strided and contiguous tensor"
RuntimeError: CUDA error: device-side assert triggered
RuntimeError: `col_indices[..., crow_indices[..., i - 1]:crow_indices[..., i]] for all i = 1, ..., nrows are sorted and distinct along the last dimension values` is not satisfied.
RuntimeError: expected col_indices to be a strided and contiguous tensor
RuntimeError: expected row_indices to be a strided and contiguous tensor
RuntimeError: expected values to be a strided and contiguous tensor
RuntimeError: for_each: failed to synchronize: cudaErrorAssert: device-side assert triggered
RuntimeError: tensor dimensionality must be sum of batch, base, and dense dimensionalities (=0 + 2 + 0) but got 3
```
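For a sense of what some of these invariants mean, here is a pure-Python sketch. It covers a subset of the CSR conditions for the non-batched 2-D case only, including the sorted-and-distinct requirement on `col_indices` quoted in the errors above. It is not PyTorch's implementation, which also checks layouts, contiguity, and dimensionalities. `check_csr_invariants` is a hypothetical name:

```python
from typing import Sequence


def check_csr_invariants(
    crow_indices: Sequence[int], col_indices: Sequence[int], ncols: int
) -> None:
    """Raise ValueError if a 2-D CSR index structure violates basic invariants."""
    if len(crow_indices) == 0 or crow_indices[0] != 0:
        raise ValueError("crow_indices[0] must be 0")
    if crow_indices[-1] != len(col_indices):
        raise ValueError("crow_indices[-1] must equal nnz")
    for i in range(1, len(crow_indices)):
        start, end = crow_indices[i - 1], crow_indices[i]
        if start > end:
            raise ValueError("crow_indices must be non-decreasing")
        row = col_indices[start:end]
        # Column indices within each row must be sorted and distinct ...
        if any(a >= b for a, b in zip(row, row[1:])):
            raise ValueError(f"col_indices of row {i - 1} are not sorted and distinct")
        # ... and must lie inside the matrix.
        if any(not 0 <= c < ncols for c in row):
            raise ValueError(f"col_indices of row {i - 1} out of range [0, {ncols})")
```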
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90849
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
Continuation of #79979.
Fixes #79161
This PR does the following:
* Expands the `parametrize_fn()` signature from returning a 3-tuple of `(test, test_name, param_kwargs)` to returning a 4-tuple of `(test, test_name, param_kwargs, decorator_fn)`. The expected signature for the addition is `decorator_fn(param_kwargs) -> List[decorator]`, i.e. given the full set of test params, return a list of decorators to apply.
* `modules`, `ops`, and `parametrize` now fit the new signature, returning `decorator_fn`s instead of applying decorators themselves.
* `instantiate_parametrized_tests()` and `instantiate_device_type_tests()` now call the returned `decorator_fn`, passing in the full set of `param_kwargs` (after composition + `device` / `dtype` additions) and applying the returned decorators.
* Composing multiple `parametrize_fn`s also composes the corresponding `decorator_fn`s; the composed `decorator_fn` simply concatenates the decorator lists returned by the constituents.
* Expands `DecorateInfo.is_active` to support callables:
```python
DecorateInfo(
    unittest.expectedFailure, "TestOps", "test_python_ref_executor",
    device_type='cuda', active_if=lambda params: params['executor'] == 'nvfuser'
),
```
* Adds several tests to `test/test_testing.py` ensuring proper decoration using `@parametrize`, `@modules`, and `@ops`.
* (minor) Fixes a couple of `ModuleInfo` naming oddities uncovered during testing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91658
Approved by: https://github.com/malfet
I made an important mistake here: I thought `not result.skipped` meant that the current test wasn't skipped.
Like `result.failures` and `result.errors`, `result.skipped` is a list of all the tests skipped so far in the test suite, each paired with its skip reason (https://docs.python.org/3/library/unittest.html#unittest.TestResult). The correct way to check whether the current test was skipped is therefore to compare `skipped_before` with `len(result.skipped)` after running the test, just as failures and errors are handled. If they are the same, the test wasn't skipped.
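Here is a self-contained `unittest` sketch of the length comparison. `Example` and `run_and_check_skipped` are illustrative names:

```python
import unittest


class Example(unittest.TestCase):
    def test_skipped(self) -> None:
        self.skipTest("not relevant here")

    def test_passes(self) -> None:
        self.assertTrue(True)


def run_and_check_skipped(test: unittest.TestCase, result: unittest.TestResult) -> bool:
    # result.skipped accumulates across the whole run, so compare its length
    # before and after this test instead of checking whether it is empty.
    skipped_before = len(result.skipped)
    test.run(result)
    return len(result.skipped) != skipped_before


result = unittest.TestResult()
first_skipped = run_and_check_skipped(Example("test_skipped"), result)
second_skipped = run_and_check_skipped(Example("test_passes"), result)
```

After the second run, `result.skipped` is still non-empty. A `not result.skipped` check would therefore wrongly treat the passing test as skipped.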
### Testing
Ran `python test/run_test.py -i test_autograd --verbose` to confirm that the disabled test `test_profiler_seq_nr` is always run 50 times in rerun mode.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90888
Approved by: https://github.com/clee2000
#85303 added a patch to `torch.testing.assert_close` to handle `torch.storage.TypedStorage`s. This change is not reflected in the docs and is not intended for the public API. This PR removes the patch again and moves the behavior to `TestCase.assertEqual` instead. That means `TypedStorage`s are again not supported by the public API, but the behavior is unchanged for all internal use cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89557
Approved by: https://github.com/kurtamohler, https://github.com/mruberry
This PR extends the `Tensor.to_sparse()` method to `Tensor.to_sparse(layout=None, blocksize=None)` in a BC manner (`layout=None` means `layout=torch.sparse_coo`).
In addition, the PR adds support for the following conversions:
- non-hybrid/hybrid COO tensor to a CSR, CSC, or COO tensor
- short, bool, byte, char, bfloat16, int, long, half CSR tensor to a BSR tensor
and fixes the following conversions:
- hybrid COO to COO tensor
- non-batch/batch hybrid BSR to BSR or BSC tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89502
Approved by: https://github.com/amjames, https://github.com/cpuhrsch