pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Nikita Shulga	0f739b8f66	[Codemod] `skipIfMps`->`skipIfMPS` (#140562 ) As `MPS` is an acronym that stands for Metal Performance Shaders Also to closer align with `skipCUDAIf` not `skipCudaIf` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140562 Approved by: https://github.com/ZainRizvi, https://github.com/r-barnes	2024-11-13 19:45:08 +00:00
Nikita Shulga	68ef445c33	[MPS][Perf] Dispatch to SDP-math-mps for non-contig Tensors (#139791 ) As MacOS-15 or newer supports those out of the box. This significantly reduces memory requirements and improves performance for some stable diffusion networks. Test plan: Run ```python from diffusers import StableDiffusionXLPipeline, AutoencoderKL, EulerAncestralDiscreteScheduler import torch import time vae = AutoencoderKL.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder='vae', torch_dtype=torch.bfloat16, force_upcast=False).to('mps') pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", vae=vae, torch_dtype=torch.bfloat16, variant="fp16").to('mps') pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config) start_time = time.time() start_mps_mem = torch.mps.driver_allocated_memory() image = pipe(prompt="Spherical cow in vacuum", num_inference_steps=10, guidance_scale=8, generator=torch.Generator("mps").manual_seed(42), ).images[0] end_mps_mem = torch.mps.driver_allocated_memory() run_time = time.time() - start_time print(f"run time in {run_time:.2f} sec, end_mps_mem {end_mps_mem/1024**2:.2f} Mb mem increase {(end_mps_mem-start_mps_mem)/1024**2:.2f} Mb") image.save(f'bfloat16.png') ``` Before the change total memory use was 16Gb and the run needed 65 sec to complete; after it, memory use drops to 14Gb and the run takes 50 sec on M2Pro, though the generated image remains the same: ![image](https://github.com/user-attachments/assets/1a35efef-9f80-4cd0-ac9c-30203eab6bb1) Fixes https://github.com/pytorch/pytorch/issues/139389 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139791 Approved by: https://github.com/drisspg, https://github.com/Skylion007 ghstack dependencies: #139788, #139784, #139763	2024-11-06 16:25:39 +00:00
Tom Ritchford	c0582fd0f8	Remove unused Python variables in torch/[b-z]* (#136963 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136963 Approved by: https://github.com/ezyang	2024-10-19 16:45:22 +00:00
PyTorch MergeBot	b6d6aa49b8	Revert "Validate input types for `torch.nn.Linear` and `torch.nn.Bilinear` (#135596 )" This reverts commit `e157ce3ebb`. Reverted https://github.com/pytorch/pytorch/pull/135596 on behalf of https://github.com/malfet due to It's too restrictive, should allow other int-like types, such as `numpy.int64` ([comment](https://github.com/pytorch/pytorch/pull/135596#issuecomment-2349714104))	2024-09-13 18:06:56 +00:00
Sanskar Modi	e157ce3ebb	Validate input types for `torch.nn.Linear` and `torch.nn.Bilinear` (#135596 ) Adding validation checks to check the input types and display better error messages for the same. Fixes https://github.com/pytorch/pytorch/issues/135463 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135596 Approved by: https://github.com/malfet	2024-09-12 21:28:37 +00:00
Mayank Mishra	9a04cfbeff	fix for fp16 (#134106 ) This PR is a replacement for https://github.com/pytorch/pytorch/pull/133085 for pushing a quick fix for RMSNorm. The original author is @kkontny Previous PR summary: Since FP16 has quite a small dynamic range, it is very easy to overflow while computing `at::pow(input, 2)`, and it happens in real world computation. I've tried to use `nn.RMSNorm` fused implementation instead of `LlamaRMSNorm` inside `transformers` implementation of Llama (`src/transformers/models/llama/modeling_llama.py`). It started to give wrong answers in FP16 while still giving correct ones in FP32. I figured out this happens due to overflow while computing the square of the input tensor. Original `LlamaRMSNorm` implementation upcasts input to fp32 to prevent this and give better numerical stability. ``` class LlamaRMSNorm(nn.Module): def __init__(self, hidden_size, eps=1e-6): """ LlamaRMSNorm is equivalent to T5LayerNorm """ super().__init__() self.weight = nn.Parameter(torch.ones(hidden_size)) self.variance_epsilon = eps def forward(self, hidden_states): input_dtype = hidden_states.dtype hidden_states = hidden_states.to(torch.float32) variance = hidden_states.pow(2).mean(-1, keepdim=True) hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon) return self.weight * hidden_states.to(input_dtype) ``` Proposed commit fixed the issue. FP16 in RMSNorm has to be treated in a special way to be usable in real world implementations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134106 Approved by: https://github.com/mikaylagawarecki, https://github.com/eqy	2024-09-11 22:02:07 +00:00
Nikita Shulga	71383dd3da	[MPS] Fix batchnorm_2d for channels last (#134618 ) By skipping gather of input tensor if memory_layout is channels_last, which is a first step towards fixing https://github.com/pytorch/pytorch/issues/134580 Though the underlying problem is much more interesting, i.e. MPS does not have generic support for channels last, but `c10::is_contiguous()` is true for channels last layout. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134618 Approved by: https://github.com/albanD	2024-09-03 19:20:11 +00:00
Nikita Shulga	f95085fd91	[BE][MPS] Prefer xfail to skip (#134858 ) This essentially undoes large skips on everything but MacOS Sequoia to nn.modules made by https://github.com/pytorch/pytorch/pull/128393 Instead it uses existing `xfail`, but guards it on `_macos15_or_newer` boolean Before the change if run on MacOS 14: ``` % python3 ../test/test_modules.py -v -k Hardswish 2>&1\|tail -n3 Ran 57 tests in 0.053s OK (skipped=32) ``` After ``` % python3 ../test/test_modules.py -v -k Hardswish 2>&1\|tail -n3 Ran 57 tests in 0.229s OK (skipped=10, expected failures=2) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/134858 Approved by: https://github.com/janeyx99	2024-08-31 00:29:48 +00:00
Nikita Shulga	8de0d7690c	Use newer `toAccumulateType` signature in `Normalization.cpp` (#134540 ) Which fixes BatchNorm behavior when called with empty tensors on the MPS backend. Removed `expectedFailureMPS` in test_nn.py, deleted expected failure in `test_mps.py` and adjusted `skipIfMPS` to `expectedFailureMPS` in BatchNorm2d OpInfo decorator, but restricted it only to the memory format tests Test Plan: CI + `python3 -c "import torch; print(torch.nn.BatchNorm2d(3, device='mps')(torch.rand(0, 3, 2, 2, device='mps')))"` Fixes https://github.com/pytorch/pytorch/issues/134423 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134540 Approved by: https://github.com/Skylion007, https://github.com/albanD	2024-08-27 18:09:20 +00:00
Mikayla Gawarecki	d028b810fe	Fix flaky GroupNorm ModuleInfo test (#133899 ) Fixes https://github.com/pytorch/pytorch/issues/98677 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133899 Approved by: https://github.com/albanD	2024-08-27 14:45:51 +00:00
Denis Vieriu	861bdf96f4	[MPS] Add native strided API for MPSNDArray starting with macOS 15 (#128393 ) Add support for native strides in MPS starting with macOS Sequoia. This will get rid of the additional gather and scatter operations needed to solve the strides or storage offsets of the tensors. Summary of changes (starting with macOS 15): - Add support for MPS strided API (strides/storage offsets etc): - [initWithBuffer:offset:descriptor:](https://developer.apple.com/documentation/metalperformanceshaders/mpsndarray/4391636-initwithbuffer?language=objc) - [arrayViewWithCommandBuffer:descriptor:aliasing:](https://developer.apple.com/documentation/metalperformanceshaders/mpsndarray/3114040-arrayviewwithcommandbuffer?language=objc) - [arrayViewWithShape:strides:](https://developer.apple.com/documentation/metalperformanceshaders/mpsndarray/4408694-arrayviewwithshape?language=objc) - [reshapeWithCommandBuffer:sourceArray:shape:destinationArray:](https://developer.apple.com/documentation/metalperformanceshaders/mpsndarrayidentity/4438557-reshapewithcommandbuffer?language=objc) - Add native support for NHWC convolutions (without incurring any extra copy from NCHW -> NHWC -> NCHW). - Add support for strided output buffers (previously we would create a contiguous buffer). OSes older than macOS 15 will run the old gather/scatter code path to solve strides/storage offsets. --- A couple of performance stats collected from torchbench comparing macOS 15 vs macOS 14: ``` - test_train[functorch_maml_omniglot-mps]: 27% faster - test_train[timm_vision_transformer-mps]: 12% faster - test_train[hf_T5-mps]: 9.46% faster ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/128393 Approved by: https://github.com/albanD Co-authored-by: Siddharth Kotapati <skotapati@apple.com>	2024-08-16 21:07:50 +00:00
ankurneog	ebc012ace6	Add hooks for execution on intel gaudi devices - 1 (#128584 ) ## Motivation This is follow up to PR:https://github.com/pytorch/pytorch/pull/126970 to support Gaudi devices for Pytorch UT execution. ## Changes We are adding additional hooks to: 1. Add dtype exceptions for Gaudi/HPU 2. Extend onlyNativeDevices decorator functionality to add additional devices Pull Request resolved: https://github.com/pytorch/pytorch/pull/128584 Approved by: https://github.com/albanD	2024-07-20 05:03:36 +00:00
cyy	d44daebdbc	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-31 01:20:45 +00:00
PyTorch MergeBot	67739d8c6f	Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 )" This reverts commit `699db7988d`. Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2138496995))	2024-05-30 01:16:57 +00:00
cyy	699db7988d	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-29 11:58:03 +00:00
PyTorch MergeBot	cdbb2c9acc	Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 )" This reverts commit `4fdbaa794f`. Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2136428735))	2024-05-29 03:02:35 +00:00
cyy	4fdbaa794f	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-27 03:54:03 +00:00
Shunting Zhang	db9c6aeec6	Revert "Skip test_memory_format_nn_BatchNorm2d in inductor (#125970 )" (#126594 ) This reverts commit `0a9c6e92f8`. enable the test since it's fixed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126594 Approved by: https://github.com/huydhn ghstack dependencies: #126593	2024-05-25 01:27:02 +00:00
PyTorch MergeBot	df4b7cb5f7	Reapply "Skip test_memory_format_nn_BatchNorm2d in inductor (#125970 )" (#126594 ) This reverts commit `ce6e36bf8b`. Reverted https://github.com/pytorch/pytorch/pull/126594 on behalf of https://github.com/clee2000 due to broke tests on inductor? test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_CTCLoss_cuda_float64 `43f2f43eb3` https://github.com/pytorch/pytorch/actions/runs/9200644034/job/25308511495 ([comment](https://github.com/pytorch/pytorch/pull/126586#issuecomment-2126228689))	2024-05-23 04:54:28 +00:00
Shunting Zhang	ce6e36bf8b	Revert "Skip test_memory_format_nn_BatchNorm2d in inductor (#125970 )" (#126594 ) This reverts commit `0a9c6e92f8`. enable the test since it's fixed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126594 Approved by: https://github.com/huydhn ghstack dependencies: #126586, #126593	2024-05-22 22:43:09 +00:00
Huy Do	0a9c6e92f8	Skip test_memory_format_nn_BatchNorm2d in inductor (#125970 ) Skipping the test in the context of https://github.com/pytorch/pytorch/issues/125967 until the issue is root caused and fixed properly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125970 Approved by: https://github.com/clee2000	2024-05-11 04:11:18 +00:00
Aaron Gokaslan	2f3b0befed	[BE]: Apply ruff FURB 118. (#124743 ) Replaces various lambdas with operator.itemgetter which is more efficient (as it's a builtin function). Particularly useful for when lambdas are used as 'key' functions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124743 Approved by: https://github.com/albanD, https://github.com/malfet	2024-04-26 14:34:52 +00:00
xinan.lin	6fcbeb3489	[ATen] Add CPU fp16 support for nll_loss and cross_entropy_loss (#123256 ) Add CPU FP16 support for nll_loss and cross_entropy_loss. Resolve issue #123328. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123256 Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/malfet	2024-04-18 11:44:38 +00:00
Mikayla Gawarecki	487b6d40ec	Add RMSNorm module (#121364 ) Similar to `dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L51)` The implementation here is not optimized and we welcome pull requests to improve this - Use `normalized_shape` instead of singular integer `dim` to be aligned with the `nn.LayerNorm` implementation - Remove the [upcast to float and downcast ](`dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L73)`) Differential Revision: [](https://our.internmc.facebook.com/intern/diff/) Differential Revision: [D55485840](https://our.internmc.facebook.com/intern/diff/D55485840) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121364 Approved by: https://github.com/albanD	2024-03-29 18:05:28 +00:00
PyTorch MergeBot	8698121636	Revert "Add RMSNorm module (#121364 )" This reverts commit `a7306de0dc`. Reverted https://github.com/pytorch/pytorch/pull/121364 on behalf of https://github.com/atalman due to Broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/121364#issuecomment-2025502007))	2024-03-28 15:31:10 +00:00
Mikayla Gawarecki	cc12668053	Fix swap_tensors path in _apply for modules that inherit from RNNBase (RNN, GRU, LSTM) (#122800 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122800 Approved by: https://github.com/albanD	2024-03-27 23:34:16 +00:00
Mikayla Gawarecki	a7306de0dc	Add RMSNorm module (#121364 ) Similar to `dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L51)` The implementation here is not optimized and we welcome pull requests to improve this - Use `normalized_shape` instead of singular integer `dim` to be aligned with the `nn.LayerNorm` implementation - Remove the [upcast to float and downcast ](`dbeed9724b/torchmultimodal/modules/layers/normalizations.py (L73)`) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121364 Approved by: https://github.com/albanD	2024-03-27 21:39:30 +00:00
Mikayla Gawarecki	d621e3e3b8	Add exhaustive module and optimizer tests for torch.load(state_dict, weights_only=True) (#121049 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121049 Approved by: https://github.com/janeyx99	2024-03-05 14:27:50 +00:00
feifan	bfa71b523d	add complex32 to v3_dtypes (#120388 ) Fixes [#120290](https://github.com/pytorch/pytorch/issues/120290) Fixes https://github.com/pytorch/pytorch/issues/73502 use `v3_dtypes` and `torch._utils._rebuild_tensor_v3` to handle torch.save(complex32) result: ![image](https://github.com/pytorch/pytorch/assets/37650440/18b6cbb3-fb3f-4855-9d48-374014647988) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120388 Approved by: https://github.com/albanD	2024-02-28 02:32:29 +00:00
Mikayla Gawarecki	677e67c399	Update nn.Module._apply to not gate on should_use_set_data when swap_tensors is set (#120659 ) This updates the nesting of if statements in `nn.Module._apply` such that if `torch.__future__.set_swap_module_params_on_conversion(True)`, we always try to swap regardless of whether - `torch._has_compatible_shallow_copy_type(param, fn(param)` - `torch.__future__.set_overwrite_module_params_on_conversion` is set This means that `meta_module.to_empty('device')` can now use the swap_tensors path cc @awgu Pull Request resolved: https://github.com/pytorch/pytorch/pull/120659 Approved by: https://github.com/albanD	2024-02-28 00:59:34 +00:00
rzou	b3df3e4e94	Restore OpInfo/ModuleInfo tests in Inductor-wrapped tests (#119693 ) I accidentally disabled this without realizing it. It turns out that PYTORCH_TEST_WITH_INDUCTOR=1 implies PYTORCH_TEST_WITH_DYNAMO=1, which activates skipIfTorchDynamo decorators. Test Plan: - wait for CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/119693 Approved by: https://github.com/bdhirsh	2024-02-12 22:44:45 +00:00
Pearu Peterson	2c91e13afc	Add lowerings to special functions (#119187 ) As in the title. In addition, the PR introduces infrastructure for lowerings of pointwise functions that have both cpp and triton implementations available. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119187 Approved by: https://github.com/peterbell10	2024-02-11 16:35:40 +00:00
Mikayla Gawarecki	db1a4dcb5a	[BE] Add dtypesIfMPS to ModuleInfo enabling float16 tests for MPS and remove all skipIfMPS for float64 (#119039 ) Right now, `ModuleInfo.dtypes` defaults to `torch.testing._internal.common_dtype.floating_types()`, and almost no ModuleInfos override this (so only `float32` and `float64` are tested). This is the first step to clean up/improve dtype testing for `ModuleInfos` and fix #116626. Follow-up PRs will update `dtypes=` (and perhaps `dtypesIf{Device}`, if it makes sense) for each `ModuleInfo` Pull Request resolved: https://github.com/pytorch/pytorch/pull/119039 Approved by: https://github.com/janeyx99	2024-02-08 20:35:32 +00:00
Mikayla Gawarecki	d5a718d27b	Add swap_tensors path to nn.Module._apply (#117167 ) Added `torch.__future__.{get/set}_swap_module_params_on_conversion` that defaults to `False` for now, but we probably want to modify to override this and default to `True` in `nn.Module._apply` if input is a tensor subclass. From offline discussion, for now we are not allowing `swap_tensor` after the first module forward has been run* if the autograd graph is still alive. The reason being that `torch.utils.swap_tensors(t1, t2)` requires the `use_count` of both `TensorImpl`s associated with `t1` and `t2` to be 1. The first forward pass will install `AccumulateGrad` nodes on each param, which [bump the refcount of the associated TensorImpl](`6cf1fc66e3/torch/csrc/autograd/variable.cpp (L307)`). Future work might be to swap the refs that the `AccumulateGrad` nodes hold if it is necessary. From this, it might seem like we don't need to handle gradients. However, I still handle the grads for the edge case that the grads are set via `p.grad = grad` OR the autograd graph is no longer alive because the output has been garbage collected. If any `swap_tensors` fails on any of the parameters in the `nn.Module` we raise an error. `RNNBase` overrides `nn.Module._apply()` and installs weakrefs on some parameters. As a result, all modules that inherit from `RNNBase` (`RNN`, `GRU` and `LSTM`) cannot use the`swap_tensors` path as of now* Pull Request resolved: https://github.com/pytorch/pytorch/pull/117167 Approved by: https://github.com/albanD ghstack dependencies: #118028	2024-02-07 18:55:44 +00:00
PyTorch MergeBot	c0164f2393	Revert "[BE] Add dtypesIfMPS to ModuleInfo enabling float16 tests for MPS and remove all skipIfMPS for float64 (#119039 )" This reverts commit `04d52d5399`. Reverted https://github.com/pytorch/pytorch/pull/119039 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing MPS test in trunk `04d52d5399`, may be a landrace ([comment](https://github.com/pytorch/pytorch/pull/119039#issuecomment-1928595240))	2024-02-06 01:13:28 +00:00
Mikayla Gawarecki	04d52d5399	[BE] Add dtypesIfMPS to ModuleInfo enabling float16 tests for MPS and remove all skipIfMPS for float64 (#119039 ) Right now, `ModuleInfo.dtypes` defaults to `torch.testing._internal.common_dtype.floating_types()`, and almost no ModuleInfos override this (so only `float32` and `float64` are tested). This is the first step to clean up/improve dtype testing for `ModuleInfos` and fix #116626. Follow-up PRs will update `dtypes=` (and perhaps `dtypesIf{Device}`, if it makes sense) for each `ModuleInfo` Pull Request resolved: https://github.com/pytorch/pytorch/pull/119039 Approved by: https://github.com/janeyx99	2024-02-05 23:19:01 +00:00
Edward Z. Yang	9bce208dfb	Replace follow_imports = silent with normal (#118414 ) This is a lot of files changed! Don't panic! Here's how it works: * Previously, we set `follow_imports = silent` for our mypy.ini configuration. Per https://mypy.readthedocs.io/en/stable/running_mypy.html#follow-imports, what this does is whenever we have an import to a module which is not listed as a file to be typechecked in mypy, we typecheck it as normal but suppress all errors that occurred in that file. * When mypy is run inside lintrunner, the list of files is precisely the files covered by the glob in lintrunner.toml, but with files in excludes excluded. * The top-level directive `# mypy: ignore-errors` instructs mypy to typecheck the file as normal, but ignore all errors. * Therefore, it should be equivalent to set `follow_imports = normal`, if we put `# mypy: ignore-errors` on all files that were previously excluded from the file list. * Having done this, we can remove the exclude list from .lintrunner.toml, since excluding a file from typechecking is baked into the files themselves. * torch/_dynamo and torch/_inductor were previously in the exclude list, because they were covered by MYPYINDUCTOR. It is not OK to mark these as `# mypy: ignore-errors` as this will impede typechecking on the alternate configuration. So they are temporarily being checked twice, but I am suppressing the errors in these files as the configurations are not quite the same. I plan to unify the configurations so this is only a temporary state. * There were some straggler type errors after these changes somehow, so I fixed them as needed. There weren't that many. In the future, to start type checking a file, just remove the ignore-errors directive from the top of the file. The codemod was done with this script authored by GPT-4: ``` import glob exclude_patterns = [ ... ] for pattern in exclude_patterns: for filepath in glob.glob(pattern, recursive=True): if filepath.endswith('.py'): with open(filepath, 'r+') as f: content = f.read() f.seek(0, 0) f.write('# mypy: ignore-errors\n\n' + content) ``` Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/118414 Approved by: https://github.com/thiagocrepaldi, https://github.com/albanD	2024-01-27 02:44:11 +00:00
rzou	06576d859d	Stop running ModuleInfo tests under Dynamo (#117318 ) This is a policy decision, similar to the OpInfo one. The problem is that they just take too long to run when we reset() before and after each. Pull Request resolved: https://github.com/pytorch/pytorch/pull/117318 Approved by: https://github.com/voznesenskym	2024-01-12 22:17:59 +00:00
Mikayla Gawarecki	d0cf2182ea	Fix TransformerEncoderLayer for bias=False (#116760 ) Fixes https://github.com/pytorch/pytorch/issues/116385 Don't call `torch._transformer_encoder_layer_fwd` when `bias=False` `bias=False` was not something that `torch._transformer_encoder_layer_fwd` was meant to work with, it was my bad that this wasn't tested as I approved https://github.com/pytorch/pytorch/pull/101687. `bias=False` was causing the `tensor_args` in [`TransformerEncoder`](`a17de2d645/torch/nn/modules/transformer.py (L663-L677)`) to contain `None`s and error on checks for the fastpath like `t.requires_grad for t in tensor_args`. Alternative fix would be to 1) Pass `torch.zeros_like({*}.weight)` to the kernel when `bias=False` and filter `tensor_args` as appropriate 2) Fix `torch._transformer_encoder_layer_fwd` to take `Optional<Tensor>` for biases and fix the kernels as appropriate Let me know if these approaches are preferable Pull Request resolved: https://github.com/pytorch/pytorch/pull/116760 Approved by: https://github.com/jbschlosser	2024-01-05 00:13:10 +00:00
Nikita Shulga	3acb7972b0	[BE] Test CrossEntropyLoss for `torch.half` (#116681 ) To test it on MPS and CUDA devices Also, move some float64 skip-tests for MPS to xfail, same as CPU tests for torch.half Pull Request resolved: https://github.com/pytorch/pytorch/pull/116681 Approved by: https://github.com/xuzhao9, https://github.com/mikaylagawarecki	2024-01-04 02:16:09 +00:00
Mikayla Gawarecki	ac60a70e06	Migrated loss functions to ModuleInfos (#115584 ) Migrates most tests in `common_nn.py:criterion_tests` to ModuleInfos. I can split this up if it is too large to review What this PR does not include: - [`no_batch_dim` tests](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_nn.py#L3995-L4112) - [tests that use the functional variant of the loss function and `wrap_functional`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_nn.py#L1079-L1128) #### On test times This PR increases test time by ~58s locally Before this PR: ``` >>> python test/test_nn.py -k Loss Ran 1003 tests in 28.977s ``` After this PR ``` >>> python test/test_nn.py -k Loss Ran 368 tests in 23.073s ``` ``` >>> python test/test_modules.py -k Loss Ran 836 tests in 63.900s ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/115584 Approved by: https://github.com/janeyx99 ghstack dependencies: #115617	2023-12-14 16:21:05 +00:00
PyTorch MergeBot	626b7dc847	Revert "Migrated loss functions to ModuleInfos (#115584 )" This reverts commit `f138b08d2e`. Reverted https://github.com/pytorch/pytorch/pull/115584 on behalf of https://github.com/atalman due to OSS CI oncall, breaks slow test ([comment](https://github.com/pytorch/pytorch/pull/115584#issuecomment-1854855080))	2023-12-13 23:34:30 +00:00
Mikayla Gawarecki	f138b08d2e	Migrated loss functions to ModuleInfos (#115584 ) Migrates most tests in `common_nn.py:criterion_tests` to ModuleInfos. I can split this up if it is too large to review What this PR does not include: - [`no_batch_dim` tests](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_nn.py#L3995-L4112) - [tests that use the functional variant of the loss function and `wrap_functional`](https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_nn.py#L1079-L1128) #### On test times This PR increases test time by ~58s locally Before this PR: ``` >>> python test/test_nn.py -k Loss Ran 1003 tests in 28.977s ``` After this PR ``` >>> python test/test_nn.py -k Loss Ran 368 tests in 23.073s ``` ``` >>> python test/test_modules.py -k Loss Ran 836 tests in 63.900s ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/115584 Approved by: https://github.com/janeyx99 ghstack dependencies: #115617	2023-12-12 22:20:20 +00:00
Wongboo	68f74dd162	Add python and C++ support for LPPool3d (#114199 ) Add python and C++ support for LPPool3d. Fixes #114114 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114199 Approved by: https://github.com/mikaylagawarecki	2023-12-08 18:18:44 +00:00
Aaron Gokaslan	b7b2178204	[BE]: Remove useless lambdas (#113602 ) Applies PLW0108 which removes useless lambda calls in Python, the rule is in preview so it is not ready to be enabled by default just yet. These are the autofixes from the rule. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113602 Approved by: https://github.com/albanD	2023-11-14 20:06:48 +00:00
CaoE	7c9052165a	add fp16 support for native conv and deconv on CPU (#99497 ) ### Testing Native conv vs. mkldnn conv on SPR (with avx512_fp16 support) Single core: Input \| Naïve impl / us \| oneDNN / us \| Speed up -- \| -- \| -- \| -- IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 \| 34676789 \| 524199.8 \| 66.15185 IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 \| 33454125 \| 349844.4 \| 95.62573 IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 \| 317650.1 \| 2317.677 \| 137.0554 IC: 128, OC: 256, kernel: 3, stride: 1, N: 1, L: 64 \| 15334.68 \| 167.264 \| 91.67952 56 cores: Input \| Naïve impl / us \| oneDNN / us \| Speed up -- \| -- \| -- \| -- IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 \| 1032064 \| 11073.58 \| 93.20061 IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 \| 1000097 \| 16371.19 \| 61.08883 IC: 256, OC: 1024, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 \| 981813.4 \| 9008.908 \| 108.9825 IC: 1024, OC: 256, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 \| 1082606 \| 10150.47 \| 106.6558 IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 \| 319980.6 \| 181.598 \| 1762.027 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99497 Approved by: https://github.com/jgong5, https://github.com/cpuhrsch	2023-09-25 01:31:26 +00:00
FFFrog	003c5bb156	Add checks to `num_layers` for `RNN`, `LSTM`, `GRU` (#108853 ) Fixes #108223 As the title shows Pull Request resolved: https://github.com/pytorch/pytorch/pull/108853 Approved by: https://github.com/mikaylagawarecki	2023-09-09 19:33:52 +00:00
CaoE	8f02884569	add Half support for GroupNorm on CPU (#100234 ) ### Testing Single socket (28cores): * Contiguous: shape \| forward / s\| forward / s\| backward / s\| backward / s -- \| -- \| -- \| -- \| -- \| fp32 \| mixed fp32 fp16 \| fp32 \| mixed fp32 fp16 [10, 128, 10, 10] \| 2.45E-05 \| 3.26E-05 \| 6.87E-05 \| 7.40E-05 [10, 128, 80, 80] \| 0.000726 \| 0.000606 \| 0.002183 \| 0.001112 * Channels Last: shape \| forward / s\| forward / s\| backward / s\| backward / s -- \| -- \| -- \| -- \| -- \| fp32 \| mixed fp32 fp16 \| fp32 \| mixed fp32 fp16 [10, 128, 10, 10] \| 2.88E-05 \| 2.72E-05 \| 6.56E-05 \| 6.63E-05 [10, 128, 80, 80] \| 0.00076 \| 0.000256 \| 0.002385 \| 0.000735 Single core: * Contiguous: shape \| forward / s\| forward / s\| backward / s\| backward / s -- \| -- \| -- \| -- \| -- \| fp32 \| mixed fp32 fp16 \| fp32 \| mixed fp32 fp16 [10, 128, 10, 10] \| 9.47E-05 \| 1.90E-04 \| 2.03E-04 \| 3.10E-04 [10, 128, 80, 80] \| 6.25E-03 \| 8.98E-03 \| 0.016485 \| 0.01369 * Channels Last: shape \| forward / s\| forward / s\| backward / s\| backward / s -- \| -- \| -- \| -- \| -- \| fp32 \| mixed fp32 fp16 \| fp32 \| mixed fp32 fp16 [10, 128, 10, 10] \| 8.66E-05 \| 7.89E-05 \| 1.95E-04 \| 1.43E-04 [10, 128, 80, 80] \| 5.97E-03 \| 3.13E-03 \| 0.01626 \| 8.70E-03 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100234 Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki	2023-09-01 21:25:24 +00:00
Mikayla Gawarecki	584a01b650	Fix LayerNorm(bias=False) error (#108060 ) Fixes #108048 - [ ] Cherry pick this [here](https://github.com/pytorch/pytorch/issues/108055) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108060 Approved by: https://github.com/jbschlosser, https://github.com/albanD, https://github.com/malfet	2023-08-28 18:23:13 +00:00
CaoE	3267996372	add channel last 3d support for maxpool3d on CPU (#97775 ) ### Testing Single socket (28 cores): shape \| fp32 forward / ms \| bf16 forward / ms \| fp32 backward / ms \| bf16 backward / ms -- \| -- \| -- \| -- \| -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig \| 3.959584 \| 5.493402 \| 0.557232 \| 0.568485 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL \| 0.815511 \| 1.351261 \| 5.710506 \| 10.57506 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig \| 10.63426 \| 15.28637 \| 2.67656 \| 1.71365 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL \| 2.63570 \| 2.05532 \| 2.55452 \| 2.33923 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig \| 0.375469 \| 0.479748 \| 0.066364 \| 0.065155 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d \| 0.112197 \| 0.112326 \| 0.111697 \| 0.145364 Single core: shape \| fp32 forward / ms \| bf16 forward / ms \| fp32 backward / ms \| bf16 backward / ms -- \| -- \| -- \| -- \| -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig \| 92.16582 \| 128.6513 \| 6.684325 \| 12.21541 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL \| 10.14318 \| 29.80297 \| 7.350142 \| 11.25323 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig \| 238.55453 \| 331.89967 \| 19.694657 \| 32.78853 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL \| 30.17079 \| 32.75628 \| 22.44543 \| 30.17796 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig \| 7.474389 \| 9.937217 \| 0.236015 \| 0.434229 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d \| 2.318954 \| 2.469444 \| 0.262125 \| 0.401361 Pull Request resolved: https://github.com/pytorch/pytorch/pull/97775 Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki	2023-08-26 00:21:27 +00:00
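
The FP16 overflow described in the `fix for fp16 (#134106)` entry can be illustrated with a small sketch. This is not the PyTorch kernel: it is a pure-Python simulation, and the helper names `to_fp16`, `rms_norm_naive_fp16`, and `rms_norm_upcast` are hypothetical. It assumes half-precision rounding can be imitated with the standard library's `struct` format `"e"`, and it uses Python's double-precision floats as a stand-in for the fp32 upcast.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value; overflow becomes +/-inf."""
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except OverflowError:
        # struct refuses finite values beyond the fp16 range (max is 65504);
        # real fp16 arithmetic would produce infinity here.
        return math.copysign(math.inf, x)


def rms_norm_naive_fp16(xs, eps=1e-6):
    """RMS normalization with every intermediate rounded to fp16 (simulated)."""
    total = 0.0
    for x in xs:
        total = to_fp16(total + to_fp16(x * x))  # x * x can overflow to inf
    variance = to_fp16(total / len(xs))
    scale = to_fp16(1.0 / math.sqrt(variance + eps))  # 1/sqrt(inf) == 0.0
    return [to_fp16(x * scale) for x in xs]


def rms_norm_upcast(xs, eps=1e-6):
    """Same computation, but intermediates stay in higher precision (stand-in for fp32)."""
    variance = sum(x * x for x in xs) / len(xs)
    scale = 1.0 / math.sqrt(variance + eps)
    return [to_fp16(x * scale) for x in xs]  # only the result is cast back to fp16


inputs = [300.0, 300.0]  # 300 ** 2 == 90000, which exceeds the fp16 maximum of 65504
print(rms_norm_naive_fp16(inputs))
print(rms_norm_upcast(inputs))
```

In this simulation the naive path collapses to zeros because the squared input overflows, while the upcast path returns the expected normalized values, which matches the motivation given in that commit message.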


144 Commits