pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-08 07:39:33 +01:00

Author	SHA1	Message	Date
Joel Schlosser	e7ec294c10	NJT OpInfo tests v2 (#138370)
This PR updates OpInfo-based tests for NJTs:
* Adds extensive coverage across non-contiguous NJTs (both non-contiguous transposed and non-contiguous with holes)
* The `_sample_njts()` helper that `sample_input_func`s utilize now produces non-contig NJTs as well
* Utilizes a `SampleInput`-based xfail system for granular classification of bugs. For example, it's possible to indicate that a class of ops is expected to fail only on non-contig with holes NJT inputs.
* I decided on adding `SampleInput`s and utilizing this system over using test parametrization for two reasons:
    * Test perf - adding `SampleInput`s is faster than generating entire new tests
    * Avoiding the possibility of `sample_input_func`s not respecting the non-contig test parameter - this would result in silently incorrect passing of these tests. Keeping the responsibility for `SampleInput` generation firmly within each `OpInfo`'s `sample_input_func` means weirdness like this isn't possible
* Improves `SampleInput` naming for a bunch of `sample_input_func`s. This makes it easier to xfail them as needed. For example, binary / unary / other ops now use the new `_describe_njt()` helper to get a string repr that uniquely defines the type of NJT being passed to the op
* Adds appropriate `XFailRule`s to get tests passing for forward / backward / forward compile / backward compile. In general, each xfail corresponds to some bug that needs to be fixed
```python
# Represents a rule indicating how to xfail a particular test. It allows granularity
# at the device, dtype, op, and individual sample levels. This flexibility allows entire
# bugs to be represented by a single rule, even if this corresponds with multiple conceptual
# test cases across multiple ops.
@dataclass
class XFailRule:
    # expected error type
    error_type: TypeVar = Exception
    # expected error message
    error_msg: str = "."
    # function to indicate whether the rule applies; return True if so
    match_fn: Callable[[torch.device, torch.dtype, OpInfo, SampleInput], bool] = None
    # optional name for identifying the rule
    name: str = ""
    def match(self, device, dtype, op, sample) -> bool:
        return self.match_fn(device, dtype, op, sample)
```
Example:
```python
# Bug when broadcasting a binary op with non-contiguous with holes NJT + dense
# tensor with 1 in ragged dim.
XFailRule(
    error_type=RuntimeError,
    error_msg="cannot call binary pointwise function . with inputs of shapes",
    match_fn=lambda device, dtype, op, sample: (
        isinstance(op, BinaryUfuncInfo)
        and "noncontig_holes" in sample.name
        and "broadcasting 1 over ragged" in sample.name
    ),
    name="binary_noncontig_holes_broadcasting_1_over_ragged",
),
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138370 Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer ghstack dependencies: #140160	2024-11-11 19:35:24 +00:00
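As a rough illustration of how a rule like the one above might be applied, the sketch below uses simplified stand-ins rather than the real `torch` / `OpInfo` types. `SimpleRule`, `FakeSample`, `fake_op`, and `check_sample` are hypothetical names, and matching the message with `re.search` is an assumption suggested by the `"."` default, not something the commit states.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FakeSample:
    # Stand-in for SampleInput: only the descriptive name is modeled here.
    name: str


@dataclass
class SimpleRule:
    # Simplified, hypothetical version of XFailRule that matches on the sample only.
    error_type: type = Exception
    error_msg: str = "."
    match_fn: Optional[Callable[[FakeSample], bool]] = None
    name: str = ""

    def match(self, sample: FakeSample) -> bool:
        return self.match_fn is not None and self.match_fn(sample)


def check_sample(fn, sample: FakeSample, rules: List[SimpleRule]) -> str:
    """Run fn(sample); return 'passed' or 'xfailed', or raise on a mismatch."""
    rule = next((r for r in rules if r.match(sample)), None)
    if rule is None:
        fn(sample)  # no rule applies: any exception propagates as a real failure
        return "passed"
    try:
        fn(sample)
    except rule.error_type as e:
        # Assumption: error_msg is treated as a regex searched in the message.
        if re.search(rule.error_msg, str(e)):
            return "xfailed"
        raise
    raise AssertionError(f"rule {rule.name!r} expected a failure, but the call succeeded")


rules = [
    SimpleRule(
        error_type=RuntimeError,
        error_msg="cannot call binary pointwise function",
        match_fn=lambda s: "noncontig_holes" in s.name,
        name="binary_noncontig_holes",
    )
]


def fake_op(sample: FakeSample) -> None:
    # Pretend the op is broken only for non-contiguous-with-holes inputs.
    if "noncontig_holes" in sample.name:
        raise RuntimeError("cannot call binary pointwise function add with inputs of shapes")
```

Because a single rule matches on the sample name, one entry can cover the same bug across many ops, which is the granularity the commit message describes.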
Xiaodong Wang	565a7942ee	Recover non-standard bool test for msort (#139870) Summary: I was looking into why the non-standard bool value fails for msort. It makes sense for argsort and sort to fail, because we're randomly generating uint8 values, so the order will be different (and thus the indices will be different). But msort should work. After some digging, it's interesting that even though scalar_t is bool, when the actual value is a uint8_t, the comparison treats the operands as signed. I tried lhs=255 and rhs=0: lhs < rhs is equivalent to -1 < 0, which is true (but it's supposed to be false). Therefore we add an explicit type cast. Test Plan: Remove the test skip Differential Revision: D65472170 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139870 Approved by: https://github.com/Skylion007, https://github.com/davidberard98	2024-11-11 02:00:34 +00:00
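The arithmetic behind this bug can be reproduced in plain Python. The snippet below only illustrates the signed reinterpretation described in the commit, using `ctypes`; it does not call the actual sort kernel, and casting both operands to `bool` is an assumption about what the "explicit type cast" amounts to.

```python
import ctypes

lhs, rhs = 255, 0  # raw byte values stored in a non-standard "bool" tensor

# Reinterpreting the byte 255 as a signed 8-bit integer yields -1.
lhs_signed = ctypes.c_int8(lhs).value
rhs_signed = ctypes.c_int8(rhs).value
signed_less = lhs_signed < rhs_signed  # compares -1 < 0

# Assumption: casting both operands to bool first restores the intended ordering.
bool_less = bool(lhs) < bool(rhs)  # compares True < False
```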
Natalia Gimelshein	1cdaf1d85f	correctly keep track of processed tensors for foreach reductions (#140103 ) Fixes #140066 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140103 Approved by: https://github.com/janeyx99 Co-authored-by: Jane Xu <janeyx@meta.com>	2024-11-08 23:04:53 +00:00
Animesh Jain	86792a5a8d	[invoke_subgraph] User facing API to support arbitrary args and kwargs (#139162 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139162 Approved by: https://github.com/zou3519	2024-11-08 03:31:19 +00:00
Nikita Shulga	ae01f2b61b	Extend CPU implementation of MSELoss to BF16 (#139959 ) It's strange that it has not been implemented for the type yet Pull Request resolved: https://github.com/pytorch/pytorch/pull/139959 Approved by: https://github.com/jgong5, https://github.com/janeyx99 ghstack dependencies: #139961	2024-11-07 23:50:15 +00:00
Animesh Jain	75f3056c81	[hop-db] Import invoke_subgraph to avoid Dynamo error on mac (#140038 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140038 Approved by: https://github.com/ydwu4	2024-11-07 22:36:57 +00:00
IvanKobzarev	781c68c865	[aotd] coerce_same_metadata_as_tangent with expected_type for e.g. AsyncCollectiveTensor (#139095) Based on discussion here: https://github.com/pytorch/pytorch/pull/138731 Introduces the ability for a subclass to implement type conversion to expected_type.
```
def __coerce_same_metadata_as_tangent__(
    self, expected_metadata: Any, expected_type: Optional[Type] = None
):
```
Here, `expected_type=None` means `SubclassClass` is expected. E.g. for `DTensor` we may find an `AsyncCollectiveTensor` tangent where we expected `Tensor`; in this case the method is called with `expected_type=Tensor` at runtime. Adds an implementation for AsyncCollectiveTensor that just triggers `wait()`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139095 Approved by: https://github.com/bdhirsh	2024-11-07 16:24:48 +00:00
Gabriel Ferns	2037ea3e15	Add type annotations to Configs (#139833 ) Summary: Adds types to Configs, and fixes a bug in options that was caused by the lack of types. fixes: https://github.com/pytorch/pytorch/issues/139822 Configs are used by many modules so not sure which label to put. Types also allow https://github.com/pytorch/pytorch/pull/139736 to fuzz configs Pull Request resolved: https://github.com/pytorch/pytorch/pull/139833 Approved by: https://github.com/c00w	2024-11-07 03:49:09 +00:00
Sun, Jiayi	a59132b9c8	fix torch.linalg.norm and torch.norm for torch.complex32 datatype (#133661 ) Fix https://github.com/pytorch/pytorch/issues/132634. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133661 Approved by: https://github.com/mingfeima, https://github.com/Skylion007	2024-11-07 03:21:36 +00:00
Edward Z. Yang	4e647871d6	Ensure TORCH_TRACE is run for Dynamo/Distributed tests (#139786 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/139786 Approved by: https://github.com/bobrenjc93, https://github.com/c00w, https://github.com/anijain2305 ghstack dependencies: #139716	2024-11-07 01:58:05 +00:00
Colin L. Rice	2a857e940d	config: Add env_name_default and env_name_force to Config (#138956 ) This allows Configs to handle setting their defaults (or overriding themselves) via environment variables. The environment variables are resolved at install time (which is usually import time). This is done 1) to avoid any race conditions between threads etc..., but 2) to help encourage people to just go modify the configs directly, vs overriding environment variables to change pytorch behaviour. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138956 Approved by: https://github.com/ezyang ghstack dependencies: #138766	2024-11-06 21:20:42 +00:00
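A minimal sketch of the idea, assuming a precedence of forced environment value > user-set value > environment default > hard-coded default. `resolve_config_value` and the `MY_FLAG_*` variable names are hypothetical, the precedence is an assumption, and the real `Config` implementation may differ (for example in how it parses values).

```python
from typing import Mapping, Optional

_UNSET = object()


def resolve_config_value(
    default,
    env_name_default: Optional[str],
    env_name_force: Optional[str],
    environ: Mapping[str, str],
    user_value=_UNSET,
):
    """Hypothetical resolution order; values are kept as raw strings for simplicity."""
    # A "force" variable overrides everything, including explicit user settings.
    if env_name_force is not None and env_name_force in environ:
        return environ[env_name_force]
    if user_value is not _UNSET:
        return user_value
    # A "default" variable only replaces the hard-coded default.
    if env_name_default is not None and env_name_default in environ:
        return environ[env_name_default]
    return default


# Snapshot the environment once, mirroring resolution at install (import) time.
env = {"MY_FLAG_DEFAULT": "1", "MY_FLAG_FORCE": "0"}
```

Reading from a snapshot rather than the live environment is what keeps later changes to environment variables from racing with config reads.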
Nikita Shulga	68ef445c33	[MPS][Perf] Dispatch to SDP-math-mps for non-contig Tensors (#139791) macOS 15 or newer supports those out of the box. This significantly reduces memory requirements and improves performance for some Stable Diffusion networks. Test plan: Run
```python
from diffusers import StableDiffusionXLPipeline, AutoencoderKL, EulerAncestralDiscreteScheduler
import torch
import time
vae = AutoencoderKL.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder='vae', torch_dtype=torch.bfloat16, force_upcast=False).to('mps')
pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", vae=vae, torch_dtype=torch.bfloat16, variant="fp16").to('mps')
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
start_time = time.time()
start_mps_mem = torch.mps.driver_allocated_memory()
image = pipe(prompt="Spherical cow in vacuum",
             num_inference_steps=10,
             guidance_scale=8,
             generator=torch.Generator("mps").manual_seed(42),
             ).images[0]
end_mps_mem = torch.mps.driver_allocated_memory()
run_time = time.time() - start_time
# Memory figures are converted from bytes to Mb; the increase is measured against start_mps_mem.
print(f"run time in {run_time:.2f} sec, end_mps_mem {end_mps_mem/1024.0**2:.2f} Mb mem increase {(end_mps_mem-start_mps_mem)/1024.0**2:.2f} Mb")
image.save('bfloat16.png')
```
Before the change, total memory use was 16Gb and the run needed 65 sec to complete; after it, memory use drops to 14Gb and the run takes 50 sec on an M2 Pro, while the generated image remains the same: ![image](https://github.com/user-attachments/assets/1a35efef-9f80-4cd0-ac9c-30203eab6bb1) Fixes https://github.com/pytorch/pytorch/issues/139389 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139791 Approved by: https://github.com/drisspg, https://github.com/Skylion007 ghstack dependencies: #139788, #139784, #139763	2024-11-06 16:25:39 +00:00
Sun, Jiayi	44df6522ee	add Half/BFloat16 support for grid_sample on CPU (#134812 ) Fix https://github.com/pytorch/pytorch/issues/127224. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134812 Approved by: https://github.com/Skylion007, https://github.com/mingfeima	2024-11-06 14:02:08 +00:00
Jack Taylor	5f266b5a02	[ROCm] re-enable flex attention UTs (#139632 ) https://github.com/pytorch/pytorch/pull/136792 accidentally disabled flex attention UTs on ROCm. Re-enabling. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139632 Approved by: https://github.com/drisspg	2024-11-06 12:49:44 +00:00
Xiaodong Wang	e7cf7d00be	Support torch.bool in torch.sort + CUDA (#139409 ) Summary: This might be out-dated, so I'm adding it back and see if we pass all the tests. I'm pretty sure cuda12 is ok. Test Plan: CI Differential Revision: D65282650 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139409 Approved by: https://github.com/zou3519, https://github.com/ngimel, https://github.com/eqy	2024-11-06 00:02:54 +00:00
PyTorch MergeBot	1d28b8b6d5	Revert "Deprecate `torch._utils.is_compiling()` and `torch._dynamo.external_utils.is_compiling()` (#127690 )" This reverts commit `e84d1121ad`. Reverted https://github.com/pytorch/pytorch/pull/127690 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally. More details in D65483292 ([comment](https://github.com/pytorch/pytorch/pull/127690#issuecomment-2458381056))	2024-11-05 23:10:38 +00:00
Xuehai Pan	e84d1121ad	Deprecate `torch._utils.is_compiling()` and `torch._dynamo.external_utils.is_compiling()` (#127690 ) This PR is split from PR #126898. - #126898 ------ Pull Request resolved: https://github.com/pytorch/pytorch/pull/127690 Approved by: https://github.com/Skylion007, https://github.com/malfet	2024-11-05 10:44:56 +00:00
zeshengzong	ffb7a08921	Fix torch.histc not checking min > max on cuda for int8 tensors (#139372) Fixes #139360 `86e6513c86/aten/src/ATen/native/cuda/SummaryOps.cu (L323-L324)` Assigning `min` and `max` to the low-precision `input_t` variables `minvalue` and `maxvalue` causes a wrong comparison result in the following check: `86e6513c86/aten/src/ATen/native/cuda/SummaryOps.cu (L353)` ![image](https://github.com/user-attachments/assets/0d5c87f4-3dc6-48bb-bcc8-b1803e7cd487) Change the type of `minvalue` and `maxvalue` to fix it, similar to these lines: `86e6513c86/aten/src/ATen/native/cuda/SummaryOps.cu (L280-L282)` Test Result ```bash $ pytest test/test_reductions.py -vv ``` ![image](https://github.com/user-attachments/assets/6b5d0d48-ebc2-4a8c-85f4-dbad147c086c) ```bash $ lintrunner ``` ![image](https://github.com/user-attachments/assets/f97c2d6d-78ea-4439-a1ba-907bc9defad7) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139372 Approved by: https://github.com/eqy	2024-11-05 08:42:38 +00:00
Chen, Zejun	9aaf3a04fa	[profiler][UT] instantiate profiler UTs for devices and enable UTs for xpu profiler (#134316 ) This PR enables the profiler related UT to be device-agnostic. It instantiates the profiler UTs for different device types and enable them on XPU backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134316 Approved by: https://github.com/etaf, https://github.com/aaronenyeshi, https://github.com/gujinghui	2024-11-05 05:46:13 +00:00
Jane Xu	23169a6bcc	Disable foreach tests for complex128 internally (#139649 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139649 Approved by: https://github.com/ngimel	2024-11-04 23:24:47 +00:00
Colin L. Rice	abc5d59dcb	config: create Config objects with JK support (#138766) This teaches install_config_module (and the underlying code) to understand Config objects. Additionally we've added a JK option to this which resolves the JK. This config gets stored within the _ConfigEntry class and is evaluated when __getattr__ is called. If justknobs is set, it'll call justknobs_check to see the result. Due to preceding work, basically everything works correctly here; we only had to update a couple of tests and modify the getattr behaviour. Note that we are updating the justknobs_check function to support a default option, to make default work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138766 Approved by: https://github.com/ezyang	2024-11-01 19:20:37 +00:00
Joel Schlosser	ddb291a881	Fix and test several NJT reductions (#139317 ) I'm sick of reductions not working properly - spotty dim coverage, missing backwards, etc. This PR fixes quite a bit. It applies to the following ops: * `sum` / `mean` / `prod` * `all` / `any` * `amin` / `amax` * `min` / `max` * `argmin` / `argmax` The general reduction logic has been factored out into a helper `_apply_reduction(func, func_name, identity_element, args, kwargs)`. The idea is that by providing a valid identity element, we can utilize conversions to padded dense when needed for reducing over the ragged dim. Extensive test coverage includes: reductions across ragged dim * reductions across non-batch, non-ragged dims * reductions across both batch and ragged dims * multiple dim reductions (for ops that support this) * full reduction -> scalar Bonus: the PR includes backwards fixes for `sum` and `mean`, which have never worked. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139317 Approved by: https://github.com/cpuhrsch	2024-10-31 20:55:38 +00:00
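The identity-element idea can be sketched in plain Python, with nested lists standing in for a ragged tensor. `reduce_over_ragged` is a hypothetical helper for illustration and is not the `_apply_reduction()` implementation: it pads each row to the maximum length with the identity element, so the padding cannot change the result of reducing over the ragged dim.

```python
import math
from functools import reduce
from typing import Callable, List


def reduce_over_ragged(
    rows: List[List[float]],
    op: Callable[[float, float], float],
    identity: float,
) -> List[float]:
    """Reduce each variable-length row after padding it with the identity element."""
    max_len = max((len(r) for r in rows), default=0)
    # Convert to a padded dense layout; padded slots hold the identity element.
    padded = [r + [identity] * (max_len - len(r)) for r in rows]
    return [reduce(op, row, identity) for row in padded]


ragged = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
sums = reduce_over_ragged(ragged, lambda a, b: a + b, 0.0)   # 0 is the identity for sum
prods = reduce_over_ragged(ragged, lambda a, b: a * b, 1.0)  # 1 is the identity for prod
maxes = reduce_over_ragged(ragged, max, -math.inf)           # -inf is the identity for max
```

Choosing the wrong identity (for example padding with 0 for `prod`) would silently corrupt the shorter rows, which is why the helper takes it as an explicit argument.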
Antoni Viros	ad637a4c5c	Add support for index_put_ in NT (#135722 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135722 Approved by: https://github.com/jbschlosser	2024-10-30 17:17:59 +00:00
PyTorch MergeBot	5861279f47	Revert "Add support for index_put_ in NT (#135722 )" This reverts commit `b4836e5b5c`. Reverted https://github.com/pytorch/pytorch/pull/135722 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing on ROCm ([comment](https://github.com/pytorch/pytorch/pull/135722#issuecomment-2445651914))	2024-10-30 01:53:55 +00:00
Antoni Viros	b4836e5b5c	Add support for index_put_ in NT (#135722 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135722 Approved by: https://github.com/jbschlosser	2024-10-30 00:03:21 +00:00
Joel Schlosser	23d590e518	More flexible test parametrization with @reparametrize (#138369)
Background: The `@parametrize` decorator enjoys widespread usage as a convenient tool for ensuring extensive test coverage. One particular feature that makes this easy is the ability to stack such decorators, testing over the cross-product of inputs. Example:
```python
class MyTestClass(TestCase):
    @parametrize("x", range(3))
    @parametrize("y", [False, True])
    def test_foo(self, x, y):
        # Invoked with:
        # x=0, y=False
        # x=1, y=False
        # x=2, y=False
        # x=0, y=True
        # x=1, y=True
        # x=2, y=True
        ...
```
Note that the `@ops` and `@modules` decorators employ the same underlying machinery for parametrizing over `OpInfo` / `ModuleInfo` entries. These decorators also parametrize over op-specific `device` / `dtype` info according to what is supported for each op.
```python
class MyTestClass(TestCase):
    @ops(op_db)
    def test_foo(self, op, device, dtype):
        # Invoked with each OpInfo in the db along with each device / dtype that corresponds
        # with this op according to the OpInfo entry.
        ...
```
Note that this is in contrast to the naive cross product between ops and devices / dtypes, which would generate too many tests. Certain use cases benefit from a similar type of flexible parametrization that is more intelligent than simple cross-product composition. It is expensive to generate / run too many tests, even if the unneeded ones are skipped appropriately. This PR attempts to generalize such flexible parametrization and satisfy these use cases through the introduction of a `@reparametrize` decorator, which operates on an existing parametrizer and allows for customized on-the-fly parametrization through the use of an `adapter_fn`. Examples:
```python
# adapter_fn that adds a new arg
def include_is_even_arg(test_name, param_kwargs):
    x = param_kwargs["x"]
    is_even = x % 2 == 0
    new_param_kwargs = dict(param_kwargs)
    new_param_kwargs["is_even"] = is_even
    is_even_suffix = "_even" if is_even else "_odd"
    new_test_name = f"{test_name}{is_even_suffix}"
    yield (new_test_name, new_param_kwargs)

# adapter_fn that excludes certain values
def exclude_odds(test_name, param_kwargs):
    x = param_kwargs["x"]
    is_even = x % 2 == 0
    yield None if not is_even else (test_name, param_kwargs)

class MyTestClass(TestCase):
    @reparametrize(parametrize("x", range(5)), include_is_even_arg)
    def test_foo(self, x, is_even):
        # Invoked with both the x value and the new is_even arg
        ...

    @reparametrize(parametrize("x", range(5)), exclude_odds)
    def test_bar(self, x):
        # Only invoked with even x values
        ...
```
For a more real-world use case, imagine you want to write a set of OpInfo tests that parametrize over additional op-specific things beyond `device` / `dtype` (in NJT's case, this includes contiguity type, whether to operate over the batch / ragged / other dims, etc.). The `@reparametrize` decorator allows you to customize the `@ops` parametrization to add in these additional args as they make sense on a per-op basis. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138369 Approved by: https://github.com/janeyx99	2024-10-29 22:14:38 +00:00
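To make the adapter mechanics concrete, here is a simplified model in plain Python. `apply_adapter` is a hypothetical stand-in for what a reparametrizing decorator has to do, namely feed each `(test_name, param_kwargs)` pair through the `adapter_fn` and keep what it yields. Dropping `None` results is an assumption based on the `exclude_odds` example, the `test_bar_x_N` names are made up, and this is not the actual `@reparametrize` implementation.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

Case = Tuple[str, Dict[str, object]]


def apply_adapter(
    cases: Iterable[Case],
    adapter_fn: Callable[[str, Dict[str, object]], Iterator[Optional[Case]]],
) -> Iterator[Case]:
    """Expand, rewrite, or drop each parametrized case via adapter_fn."""
    for test_name, param_kwargs in cases:
        for result in adapter_fn(test_name, param_kwargs):
            if result is not None:  # assumption: None means "skip this case"
                yield result


def exclude_odds(test_name, param_kwargs):
    # Same adapter as in the commit message: keep only even x values.
    x = param_kwargs["x"]
    is_even = x % 2 == 0
    yield None if not is_even else (test_name, param_kwargs)


# Illustrative base parametrization over x in range(5).
base_cases = [(f"test_bar_x_{x}", {"x": x}) for x in range(5)]
kept = list(apply_adapter(base_cases, exclude_odds))
```

Because the adapter is a generator, the same mechanism can also expand one base case into several, which is how additional per-op arguments could be introduced.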
drisspg	80c7c7178e	Make sure all SDPA tests are ran with tensor cores enabled (#135592 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135592 Approved by: https://github.com/eqy	2024-10-29 20:53:10 +00:00
Jake Schmidt	2b577ae58f	Implement NJT embedding backward (#138627 ) Fixes #138352 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138627 Approved by: https://github.com/jbschlosser	2024-10-29 18:44:58 +00:00
PyTorch MergeBot	38645e8a3e	Revert "Fix unbind_copy and add its decomposition (#134319 )" This reverts commit `8aedc649bd`. Reverted https://github.com/pytorch/pytorch/pull/134319 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but this is still failing the same test on ExecuTorch ([comment](https://github.com/pytorch/pytorch/pull/134319#issuecomment-2443209139))	2024-10-29 04:54:37 +00:00
wz337	5b39734a0a	[DTensor][Test] Fix gloo backend failure when eager_init is turned on (#139097) We should only pass the `device_id` when the backend is `nccl`. Otherwise, we would run into the following error: ``` RuntimeError: No backend for the parent process group or its backend does not support splitting ``` This also fixes test failures not being asserted when using `with_comms()` or `with_comms(eager_init=False)`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139097 Approved by: https://github.com/XilunWu	2024-10-29 00:04:06 +00:00
cyy	aa2b17c330	[3/N] Don't skip ASAN on some tests (#139058 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139058 Approved by: https://github.com/ezyang	2024-10-28 23:57:23 +00:00
Guilherme Leobas	8785353f2f	Fix tensor subclass + dynamic shapes in torch.compile + aot autograd (#125941 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/125941 Approved by: https://github.com/bdhirsh ghstack dependencies: #133337	2024-10-28 21:58:59 +00:00
Guilherme Leobas	6baccb430b	Update TwoTensor impl. to accept `outer_size/outer_stride` (#133337 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133337 Approved by: https://github.com/bdhirsh	2024-10-28 21:58:59 +00:00
William Wen	52c80f663d	change name of dynamo CI shard to dynamo_wrapped (#138233) Implements https://github.com/pytorch/pytorch/issues/118127 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138233 Approved by: https://github.com/clee2000	2024-10-28 21:42:33 +00:00
Joel Schlosser	8ba9063002	FlexAttention support for NJT (#136792)
This PR adds FlexAttention + NJT support. In particular:
* To handle raggedness, treats the packed sequence dim of input NJTs as a giant "stacked sequence". To ensure user `score_mod` / `mask_mod` functions can still be written in the original NJT sequence space, this PR handles conversions for indices within the giant "stacked sequence" -> sequence relative indices automatically.
* Provides `py_impls` for `NestedTensor` to the HOPs for flex attention forward / backward that simply wrap / unwrap NJTs appropriately
* Adds barebones `new_empty()` support to NJT since FlexAttention utilizes this repeatedly; right now, only `new_empty()` with a shape of `()` is supported
* Tests that FlexAttention with a causal mask matches causal SDPA
* Adds a new public API for FlexAttention usage:
    * `create_nested_block_mask(mask_mod, B, H, njt, BLOCK_SIZE, _compile)` - NJT analogue for `create_block_mask()` that utilizes the `njt`'s ragged structure to create an appropriately-sized block mask (e.g. `(1, 1, total_seqlen, total_seqlen)`). This function handles the index conversion from "stacked sequence" space -> relative sequence space.
    * Minor note: as this is a public API, this function is purposefully named with "nested" instead of "njt" to keep the latter as an informal, mostly internal-only term.
Example usage:
```python
def causal_mask(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

query = ...  # NJT of shape (B, H, S, D)
key = ...  # NJT of shape (B, H, S, D)
value = ...  # NJT of shape (B, H, S, D)
# create_nested_block_mask() automatically converts indices from "stacked sequence" space -> relative sequence space
block_mask = create_nested_block_mask(causal_mask, 1, 1, query)  # block mask conceptual shape is (B, H, sum(S), sum(S))
output = flex_attention(query, key, value, block_mask=block_mask)

def causal_score_mod(score, b, h, q_idx, kv_idx):
    return torch.where(q_idx >= kv_idx, score, float("-inf"))

# flex_attention() automatically converts indices from "stacked sequence" space -> relative sequence space for NJT inputs
output2 = flex_attention(query, key, value, score_mod=causal_score_mod)
```
TODO:
* ~~Determine the right level of abstraction for public API helpers + move them alongside other helpers~~ Verify this with others though
* ~~Some cleanup~~
* ~~`njt_score_mod_adapter`~~
* ~~Q: should `create_njt_block_mask()` call `njt_mask_mod_adapter()` so we don't need two calls?~~
* Can we avoid materializing the `sum(s)` length `seq_idx` used for conversion between stacked sequence -> sequence relative indices?
    * Not for now, although future work may deepen the integration between Flex + NJT (possibly requiring custom templates). We should try to cache this though.
* ~~Demonstrate non-causal mask~~
* Support non-contiguous NJTs with holes (booted to future PR)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136792 Approved by: https://github.com/drisspg ghstack dependencies: #138841	2024-10-28 20:01:27 +00:00
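The "stacked sequence" -> sequence relative index conversion can be illustrated without any tensors. In this sketch, `build_seq_idx`, `to_relative`, and `stacked_causal_mask` are hypothetical helpers, the `offsets` layout (cumulative sequence starts) is an assumption about how the ragged structure is described, and restricting attention to positions within the same sequence is likewise an assumption. The real implementation operates on tensors inside wrapped `mask_mod` / `score_mod` functions.

```python
from itertools import accumulate
from typing import List, Tuple


def build_seq_idx(seq_lens: List[int]) -> Tuple[List[int], List[int]]:
    """Return (offsets, seq_idx) for sequences packed into one stacked sequence."""
    offsets = [0] + list(accumulate(seq_lens))
    # seq_idx[i] is the sequence that stacked position i belongs to.
    seq_idx = [b for b, n in enumerate(seq_lens) for _ in range(n)]
    return offsets, seq_idx


def to_relative(stacked_idx: int, offsets: List[int], seq_idx: List[int]) -> Tuple[int, int]:
    """Map a stacked-sequence index to (sequence id, index within that sequence)."""
    b = seq_idx[stacked_idx]
    return b, stacked_idx - offsets[b]


def stacked_causal_mask(q: int, kv: int, offsets: List[int], seq_idx: List[int]) -> bool:
    # Assumption: positions only attend within their own sequence; the causal
    # rule is then applied in sequence-relative coordinates.
    qb, q_rel = to_relative(q, offsets, seq_idx)
    kb, kv_rel = to_relative(kv, offsets, seq_idx)
    return qb == kb and q_rel >= kv_rel


offsets, seq_idx = build_seq_idx([3, 2, 4])  # total stacked length is 9
```

The `seq_idx` list here has length `sum(S)`, which is the materialization cost the TODO item about caching refers to.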
Mwiza Kunda	c2ded9ec0d	Fix dot reference checks (#138596) The dot reference implementation should be consistent with the cpu / cuda implementations, since it may be used for meta dispatch, e.g.
```python
import torch
x = torch.tensor([1,2,3], dtype=torch.float32)
y = torch.tensor([4,5,6], dtype=torch.float16)
x.dot(y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dot : expected both vectors to have same dtype, but found Float and Half
```
However, the following does not raise an exception:
```python
x.to("meta").dot(y.to("meta"))
```
Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/138596 Approved by: https://github.com/bdhirsh	2024-10-28 19:11:40 +00:00
Aaron Gokaslan	5d074746e9	[BE]: Add better optional typing (#138426 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/138426 Approved by: https://github.com/XuehaiPan, https://github.com/malfet	2024-10-27 14:19:00 +00:00
Simon Fan	99608ceed6	Scoped extension building for C++ backed custom ops tests (#136695 ) FIXES #125579 #131103 #133197 #133283 #134738 #135369 #135685 Tests that create C++ extensions can cause flakiness in CI due to library namespace conflict and test ordering. We can build them in temp dirs to ensure isolation. An alternative is to build these as part of the build process and have build time errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/136695 Approved by: https://github.com/zou3519	2024-10-26 07:41:00 +00:00
Yidi Wu	c6bb9b53f4	[scan] better error handling and remove redundant tests (#137967 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137967 Approved by: https://github.com/zou3519	2024-10-25 19:01:25 +00:00
Tom Ritchford	8aedc649bd	Fix unbind_copy and add its decomposition (#134319 ) * Fixes https://github.com/pytorch/pytorch/issues/130829 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134319 Approved by: https://github.com/amjames, https://github.com/eellison	2024-10-23 19:13:44 +00:00
Tom Ritchford	1bc73f3157	Add decomposition for permute_copy (#130944 ) * Extracted from #129476 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130944 Approved by: https://github.com/amjames, https://github.com/eellison	2024-10-23 17:42:11 +00:00
William Wen	3441ea7d74	[dynamo] reset compiler stance after test (#138277 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138277 Approved by: https://github.com/anijain2305, https://github.com/jansel	2024-10-23 00:07:33 +00:00
Animesh Jain	4dd4d38ca9	[hierarchical-compilation][hop] Introduce invoke_subgraph (#137538 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137538 Approved by: https://github.com/zou3519	2024-10-22 15:33:34 +00:00
Colin L. Rice	bb8bc7d6b3	config: simplify most of the config handling and fix some bugs (#138377 ) This PR combines a number of cleanups in one PR. If any of the specific cleanups don't seem to make sense, let me know and I can remove them. Cleanups - This PR adds a set of test suites for the config module code, which handles basically all the APIs and ways it is used. Please let me know if you see anything critical that is not tested that I missed. This test suite is primarily used as the regression test suite for later changes in this diff. Note that there is some dynamo specific testing of the config module, but it isn't as verbose. - I removed all internal usage of shallow_copy_dict. Those usages could all use the deep copy, and did not depend on the reference behavior of certain config values that shallow_copy_dict allows. - I removed shallow copy semantics for configuration with a deprecation warning. I think this requires a release note, so hopefully I did that correctly. Let me know if we want to continue to expose shallow copy value semantics, but I just can't find a case where I expect anyone would want it. It also complicated later internal changes to the API (i.e. breaking apart various layers of the config changes). - I fixed what I believe is a bug in how hashes are calculated on configs. In particular, if you got the hash, then made a config change, and then got the hash again, it would not update the hash. @oulgen, please let me know if I'm misunderstanding this behavior and it is desired. - I switched our multiple implementations of iterating through the dictionary to a single one. This is primarily to make later changes easier, but it also makes it clear how inconsistent our various config ignoring options are. Let me know if people would be interested in me unifying the various options for ignoring config values. 
- I updated the test patcher (not the performance critical one, just the normal one), to use __setattr__ and __getattr__ to remove direct API access to the underlying config fetcher. For release notes, Not sure exactly how to communicate this, but something like "ConfigModule.to_dict, and ConfigModule.shallow_copy_dict no longer retain their shallow copy semantics, which allowed reference values objects to be modified. If you wish to modify the config object, call load_config explicitly". Pull Request resolved: https://github.com/pytorch/pytorch/pull/138377 Approved by: https://github.com/ezyang, https://github.com/jansel, https://github.com/jovianjaison	2024-10-22 13:40:26 +00:00
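The hash bug described above is a stale-cache problem. The sketch below shows one way to avoid it, invalidating a cached hash on every write. `TinyConfig` and its keys are hypothetical, for illustration only, and unrelated to the actual `ConfigModule` code.

```python
import hashlib
from typing import Any, Dict, Optional


class TinyConfig:
    """Toy config container whose hash is cached but invalidated on change."""

    def __init__(self, **values: Any) -> None:
        self._values: Dict[str, Any] = dict(values)
        self._hash: Optional[str] = None

    def set(self, key: str, value: Any) -> None:
        self._values[key] = value
        self._hash = None  # drop the cached hash so the next get_hash() recomputes

    def get_hash(self) -> str:
        if self._hash is None:
            payload = repr(sorted(self._values.items())).encode()
            self._hash = hashlib.sha256(payload).hexdigest()
        return self._hash


cfg = TinyConfig(debug=False, verbose=True)
before = cfg.get_hash()
cfg.set("debug", True)
after = cfg.get_hash()
```

Without the `self._hash = None` line, `after` would equal `before` even though the config changed, which matches the symptom reported in the commit.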
Bob Ren	20a2d39557	Log all failing test repros to scuba (#138394) This has the benefit that 1) it's much easier to aggregate test failure repros into, say, a CSV or shell script from scuba, 2) we can do analysis (e.g. the set difference between two sets of tests across two PRs), and 3) we can get results faster, at test-level granularity instead of the job-level granularity we see in the HUD/GH. I tested this by introducing a breaking change, adding the ci-scribe label, and then verifying that the failed tests were logged to scuba: https://fburl.com/scuba/torch_open_source_signpost/w6qt7qr9 I then reverted the breaking change and published this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138394 Approved by: https://github.com/ezyang	2024-10-21 21:35:47 +00:00
Will Feng	e4ad02892f	Upgrade distributed test to g4dn instances (T4 GPUs) (#137161 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137161 Approved by: https://github.com/seemethere, https://github.com/eqy, https://github.com/yf225 Co-authored-by: Will Feng <yf225@cornell.edu>	2024-10-20 23:48:54 +00:00
Will Feng	a9f4f89cd5	[CI] Add Compiled DDP / Compiled FSDP2 / compute-comm reordering tests to test_inductor_distributed (#138178) `test_replicate_with_compiler.py` and `test_fully_shard_compile.py` require bf16, so they need to be run within the test_inductor_distributed job (which uses A10G (SM80) and has bf16 support). This allows us to migrate distributed jobs to T4 machines in https://github.com/pytorch/pytorch/pull/137161, as the compiled distributed jobs are the only blocking ones now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138178 Approved by: https://github.com/xmfan, https://github.com/fduwjj, https://github.com/fegin, https://github.com/kwen2501	2024-10-20 19:38:18 +00:00
Tom Ritchford	c0582fd0f8	Remove unused Python variables in torch/[b-z]* (#136963 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136963 Approved by: https://github.com/ezyang	2024-10-19 16:45:22 +00:00
wz337	ff598f2f4d	[DTensorTestbase] Add an optional `eager_init` flag to `with_comms()` to support eager init nccl communicator for DeviceMesh test case (#138108 ) Add an optional `eager_init` flag to `with_comms`. When `eager_init` is True and backend is `nccl`, we pass the `device_id` to `init_process_group()` for eager initialization. Otherwise, `device_id` is still `None` and this goes through the normal lazy call. Default for `eager_init` is False. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138108 Approved by: https://github.com/kwen2501	2024-10-19 01:04:55 +00:00
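The rule described in this commit (pass `device_id` only for `nccl` with eager init, otherwise leave it as `None`) can be sketched as a small helper. `process_group_kwargs` is a hypothetical function, and the `cuda:{rank}` device string is an assumption about how a device might be chosen per rank; this is not the actual `with_comms()` code.

```python
from typing import Any, Dict


def process_group_kwargs(backend: str, rank: int, eager_init: bool = False) -> Dict[str, Any]:
    """Build hypothetical keyword arguments for process-group initialization."""
    kwargs: Dict[str, Any] = {"backend": backend, "rank": rank}
    # Only nccl with eager_init gets a device_id; everything else keeps the
    # lazy default (None). The device string format is an assumption.
    if eager_init and backend == "nccl":
        kwargs["device_id"] = f"cuda:{rank}"
    else:
        kwargs["device_id"] = None
    return kwargs
```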
Ke Wen	c88b77af9c	[Distributed][CI] Add SM guard for compiled tests involving BF16 (#138245 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138245 Approved by: https://github.com/yf225	2024-10-18 21:39:39 +00:00

5256 Commits