This PR continues to clean up clang-tidy warnings in torch/csrc/distributed/c10d, following #124701. In addition, a libfmt dependency is added in the CMake code to enable using it in the headers. libfmt has to be added as a private dependency of torch_cuda and torch_hip because they include torch/csrc/distributed/c10d/Utils.hpp, which uses libfmt.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124987
Approved by: https://github.com/malfet
#113713
Currently passing trivial smoke tests, but I just pattern-matched bits and pieces of the autograd defs.
Will also collect benchmark data.
CC @drisspg
Co-authored-by: Eli Uriegas <1700823+seemethere@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122510
Approved by: https://github.com/drisspg
Summary:
This makes barrier and rank operations linear instead of quadratic with the number of workers. This drastically improves performance for rendezvous when running with over 1000 hosts.
This uses two approaches for different areas:
* local rank assignment: each worker does 1 set and 1 get; local ranks are assigned on the rank 0 host in an O(n) operation, which keeps total store operations linear in the number of workers.
* exit_barrier: uses a counter and a final flag so each worker does at most 1 set, 1 get, and 1 add (sketched below).
At 4000 hosts, torchelastic can run in as little as 10 seconds, down from 373 seconds.
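A minimal sketch of the counter-plus-flag exit barrier described above, assuming a `TCPStore`-style key-value store; the key names and helper are illustrative, not the actual torchelastic implementation:
```python
from torch.distributed import TCPStore

def exit_barrier(store: TCPStore, world_size: int) -> None:
    # Each worker does at most 1 add, 1 set, and 1 wait: O(n) store ops total.
    arrived = store.add("barrier/count", 1)   # atomic increment
    if arrived == world_size:
        store.set("barrier/done", "1")        # last worker flips the final flag
    store.wait(["barrier/done"])              # everyone blocks until the flag is set
```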
Test Plan:
Tested using many small jobs running on a remote cluster.
{D56549942}
```
torchx run --scheduler mast -- --image=torchelastic_benchmark --j=4000x1
```
Differential Revision: D56605193
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124982
Approved by: https://github.com/kiukchung, https://github.com/kurman
- sets it as a fake stack trace, since we don't have a generic comment feature
- when verbose is disabled, still adds a context manager and flag checks (see the sketch below). The alternative is to use macros, but those wouldn't be usable with TORCH_LOGS
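A hedged sketch of the flag-guarded context-manager pattern this describes; the names are illustrative, not the actual implementation:
```python
import contextlib

verbose_enabled = False  # imagine this flag is driven by TORCH_LOGS at runtime

def emit_fake_stack_trace(comment: str) -> None:
    # Hypothetical stand-in for attaching the comment as a fake stack trace.
    print(f"# {comment}")

@contextlib.contextmanager
def annotate(comment: str):
    # Even with verbose disabled, the context manager and flag check still run;
    # a macro would avoid that cost but couldn't be toggled via TORCH_LOGS.
    if verbose_enabled:
        emit_fake_stack_trace(comment)
    yield

with annotate("why this guard was added"):
    pass
```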
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124954
Approved by: https://github.com/jansel
## Description
Framework overhead is found to be significant for the onednn qconv op (used for quantization with the PT2E X86Inductor backend). This PR reduces the integration overhead by modifying the implementation of qconv.
## Performance results
Running quantized ResNet50 on an Intel(R) Xeon(R) Platinum 8490H machine
Before
```
Average latency: 8.378 ms.
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                     Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
onednn::qconv2d_pointwise        86.54%       6.954ms        87.42%       7.025ms     132.547us            53
```
After
```
Average latency: 6.255 ms.
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                     Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
onednn::qconv2d_pointwise        85.05%       6.381ms        85.98%       6.451ms     121.717us            53
```
Test script:
```python
import torch
import torchvision
import time
import copy
import numpy as np
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import (
    prepare_pt2e,
    convert_pt2e,
)
import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq
from torch.ao.quantization.quantizer.x86_inductor_quantizer import X86InductorQuantizer

torch._inductor.config.cpp.enable_kernel_profile = True
torch._inductor.config.profiler_mark_wrapper_call = True
torch._inductor.config.freezing = True
torch._inductor.config.cpp_wrapper = True

def bench_model(model, inputs):
    times = []
    with torch.no_grad():
        for _ in range(5):  # warm-up
            output = model(inputs)
        for _ in range(20):
            start_time = time.time()
            output = model(inputs)
            end_time = time.time()
            times.append(end_time - start_time)
        print('Average latency: %0.3f ms.' % (np.median(times) * 1000.0))
        with torch.profiler.profile(activities=[torch.profiler.ProfilerActivity.CPU]) as p:
            out_ipex = model(inputs)
        print(p.key_averages().table(sort_by="self_cpu_time_total", row_limit=-1))

def pt2e_ptq(m, example_inputs):
    m = m.eval()
    exported_model = capture_pre_autograd_graph(m, example_inputs)
    quantizer = X86InductorQuantizer()
    quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())
    prepared_model = prepare_pt2e(exported_model, quantizer)
    _ = prepared_model(*example_inputs)
    converted_model = convert_pt2e(prepared_model)
    torch.ao.quantization.move_exported_model_to_eval(converted_model)
    with torch.no_grad():
        optimized_model = torch.compile(converted_model)
        _ = optimized_model(*example_inputs)
        _ = optimized_model(*example_inputs)
        bench_model(optimized_model, *example_inputs)
    return optimized_model

if __name__ == "__main__":
    data = torch.randn(16, 3, 224, 224)
    model_fp = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.DEFAULT)
    pt2e_ptq(copy.deepcopy(model_fp), (data,))
```
Differential Revision: [D56288440](https://our.internmc.facebook.com/intern/diff/D56288440)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123240
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/jerryzh168
Summary:
This diff fixes a bug in PyTorch where creating a tensor from a list of booleans threw an error.
All credit goes to swolchok for identifying the root cause of the issue and suggesting this fix.
Test Plan: Running our model end to end works as expected and no error occurs.
Differential Revision: D55990810
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124899
Approved by: https://github.com/zhxchen17
**Summary**
This PR is an attempt to land an experimental feature designed in #103686. `local_map` is designed to let users apply a function written for `torch.Tensor` to `DTensor` objects.
As a function, `local_map` takes 2 required arguments (`func` and `out_placements`) and 3 optional arguments (`device_mesh`, `in_placements`, `redistribute_inputs`). `func` is the function to be applied to each local shard of the input `DTensor`s. `out_placements` is the sharding specification of the output `DTensor`s.
`local_map` returns a new function that does the following:
1. Infers `device_mesh` and `in_placements` from the `DTensor` inputs if they're not provided. If `device_mesh` is provided, it must be identical to the device mesh of every `DTensor` input. If `in_placements` is provided, it serves as the required sharding specification of the corresponding `DTensor` input before feeding its local shard into `func`. If it differs from the `DTensor`'s sharding specification, an exception is raised when `redistribute_inputs=False`; otherwise the input is resharded to the required sharding.
2. Calls `func` with the arguments passed in along with `device_mesh`, except that each `DTensor` argument is replaced by its local shard. This `func` may include collectives.
3. For each output of `func` that has a valid (i.e. not `None`) sharding specification in `out_placements`, constructs a new `DTensor` from the output and the specification, and uses this `DTensor` as the output.
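A hedged usage sketch based on the description above; the import path and exact argument shapes are assumptions and may differ from the landed API:
```python
import torch
from torch.distributed._tensor import distribute_tensor, Shard
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed._tensor.experimental import local_map  # assumed path

def add_local(x, y):
    # Written against plain torch.Tensor; only ever sees local shards.
    return x + y

# Assumes an initialized process group with 4 GPUs.
mesh = init_device_mesh("cuda", (4,))
x = distribute_tensor(torch.randn(8, 8), mesh, [Shard(0)])
y = distribute_tensor(torch.randn(8, 8), mesh, [Shard(0)])

add_dt = local_map(
    add_local,
    out_placements=[Shard(0)],               # sharding spec for the output DTensor
    in_placements=([Shard(0)], [Shard(0)]),  # required specs for each input
    device_mesh=mesh,
    redistribute_inputs=True,                # reshard inputs if specs mismatch
)
out = add_dt(x, y)  # a DTensor sharded on dim 0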
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123676
Approved by: https://github.com/wanchaol
torch.library.register_fake reports the python module the fake impl is
located in. This is used to check against
`m.set_python_module("foo.bar")` calls in C++.
The module reporting logic was wrong in most cases. This PR fixes it.
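For illustration, a minimal sketch of registering a fake impl; `mylib::mul2` is a hypothetical op, not one from this PR:
```python
import torch

@torch.library.custom_op("mylib::mul2", mutates_args=())
def mul2(x: torch.Tensor) -> torch.Tensor:
    return x * 2.0

@torch.library.register_fake("mylib::mul2")
def _(x):
    # The Python module this function is defined in is what gets reported and
    # checked against any m.set_python_module("foo.bar") call on the C++ side.
    return torch.empty_like(x)
```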
Test Plan:
- exhaustive tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125037
Approved by: https://github.com/williamwen42
Improves the Cutlass backend GEMM template:
* Adds code that allows creating stand-alone test runners for Cutlass GEMM kernels, enabling (manual) debugging of, for example, CUDA IMA errors or similar problems that occur in practice. Includes some utility code and tests that actually compile and run these standalone tests.
* Cleans up the GEMM template code through various refactorings
* Eliminates code sections and options that are unnecessary now that epilogue fusions are being removed.
* Limits the scope of a workaround for (flaky) Cutlass issues with bias broadcasting to the necessary cases.
* Puts some CPU runtime checks into #if / #endif blocks, so it's possible to compile CUTLASS kernels with lower CPU overhead.
* Adds documentation comments
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124577
Approved by: https://github.com/jansel
ghstack dependencies: #124576
Summary:
Previous attempts didn't work out: D49720297 caused an online training SEV due to extra importing, and D56299408 mitigated a tricky bug in the Distributed Shampoo constructor but unfortunately didn't fix the scuba logging either.
see f552546983
Test Plan: {F1491621504}
Differential Revision: D56378270
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124593
Approved by: https://github.com/anijain2305
This PR adds support for the tensor `is_complex` method in Dynamo. Take the following code as an example:
```python
def test_tensor_is_complex(x):
    if x.is_complex():
        return x + 1
    else:
        return x - 1
```
Before this fix, the `is_complex()` call caused a graph break ("torch.* op returned non-Tensor bool call_method is_complex"). After this fix, the graph break is avoided.
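A quick way to exercise this, assuming a build with the fix; `fullgraph=True` makes any remaining graph break fail loudly:
```python
import torch

compiled = torch.compile(test_tensor_is_complex, fullgraph=True)
print(compiled(torch.randn(3)))                         # real input: x - 1
print(compiled(torch.randn(3, dtype=torch.complex64)))  # complex input: x + 1
```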
Fixes #122692
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124927
Approved by: https://github.com/ezyang
A better query for the base commit of a PR.
Some ghstack PRs are not connected to main, so `git merge-base` doesn't work. Instead, use the GitHub API to query for the base of the PR, which should be more accurate.
Sanity checked on one of Ed's ghstack PRs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122214
Approved by: https://github.com/seemethere
I'm restoring the `training` and `inference` options after github.com/pytorch/pytorch/pull/124795 and removing the lesser-known `cppwrapper` option instead, per @desertfire's suggestion. The total number of parameters remains at 10.
Also, the default choices for training and inference are now explicitly spelled out when dispatching the workflow manually, to catch dev attention.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124971
Approved by: https://github.com/ezyang
Test the generic torch.Stream/Event with the fake device guard and hooks. Since we added a fake device backend, it is mutually exclusive with other backends. Tests will be skipped if TEST_CUDA or TEST_ROCM is true.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123614
Approved by: https://github.com/albanD
ghstack dependencies: #123611, #123612
The MTIA device has its own module in PyTorch now.
torch.mtia has the following APIs, similar to other backends. lazy_init is also supported.
```
__all__ = [
    "init",
    "is_available",
    "synchronize",
    "device_count",
    "current_device",
    "current_stream",
    "default_stream",
    "set_stream",
    "stream",
    "device",
]
```
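A hedged usage sketch exercising the listed APIs, assuming they mirror the semantics of other backend modules such as torch.cuda:
```python
import torch

if torch.mtia.is_available():
    torch.mtia.init()
    print(torch.mtia.device_count())
    idx = torch.mtia.current_device()
    with torch.mtia.device(idx):        # device context manager
        s = torch.mtia.current_stream()
        with torch.mtia.stream(s):      # stream context manager
            torch.mtia.synchronize()
```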
------------
For device management, we expand AcceleratorHooksInterface to support generic device management; it can be used in both C++ and Python.
```
def _accelerator_hooks_device_count() -> _int: ...
def _accelerator_hooks_set_current_device(device_index: _int) -> None: ...
def _accelerator_hooks_get_current_device() -> _int : ...
def _accelerator_hooks_exchange_device(device_index: _int) -> _int : ...
def _accelerator_hooks_maybe_exchange_device(device_index: _int) -> _int : ...
```
---------
Adding get_device_module API to retrieve device modules for different device types.
```
def get_device_module(device: Optional[Union[torch.device, str]] = None)
```
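A hedged usage sketch, assuming the API is exposed as `torch.get_device_module`:
```python
import torch

cuda_mod = torch.get_device_module("cuda")  # returns the torch.cuda module
default_mod = torch.get_device_module()     # module for the current device type
print(cuda_mod.is_available())
```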
---------
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123612
Approved by: https://github.com/albanD
ghstack dependencies: #123611
fake_tensor.py had its mypy errors ignored. That seems less than desirable.
Also added `SafePyObjectT<T>`, a tagged wrapper around `SafePyObject` that provides static type checking (with no other guarantees).
Used `SafePyObjectT<TorchDispatchModeKey>` on some of the TorchDispatchModeTLS API to ensure that we don't accidentally inject a different type than expected into the stack.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124428
Approved by: https://github.com/malfet
Summary:
Previously, `requires_grad` was not propagated from the original Tensor to the decomposed tensors.
Test Plan:
python test/test_parametrization.py -k test_register_parametrization_no_grad
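A minimal sketch in the spirit of that test, assuming the fix; the `Sum` parametrization is illustrative, not the one used in the test suite:
```python
import torch
import torch.nn as nn
from torch.nn.utils import parametrize

class Sum(nn.Module):
    # right_inverse decomposes the weight into two tensors; forward recombines.
    def forward(self, a, b):
        return a + b

    def right_inverse(self, w):
        return w * 0.5, w * 0.5

m = nn.Linear(4, 4)
m.weight.requires_grad_(False)
parametrize.register_parametrization(m, "weight", Sum())
# With the fix, the decomposed tensors inherit requires_grad=False.
assert not m.parametrizations.weight.original0.requires_grad
assert not m.parametrizations.weight.original1.requires_grad
```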
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124888
Approved by: https://github.com/lezcano