Commit Graph

4284 Commits

Author SHA1 Message Date
Edward Z. Yang
f7365eca90 Add unbacked symints support; item works now (#90624)
The big idea is to add `create_unbacked_symfloat` and `create_unbacked_symint` to ShapeEnv, allowing you to allocate symbolic floats/ints corresponding to data you don't know about at compile time. Then, instead of immediately erroring out when you try to call local_scalar_dense on a FakeTensor, we instead create a fresh symint/symfloat and return that.
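
For example, something like the following now works (a minimal sketch; the exact import paths are assumptions and may vary across versions):

```
import torch
from torch._subclasses.fake_tensor import FakeTensorMode
from torch.fx.experimental.symbolic_shapes import ShapeEnv

shape_env = ShapeEnv()
with FakeTensorMode(shape_env=shape_env):
    t = torch.tensor(5)   # fake tensor; the concrete value is unknown at compile time
    u = t.item()          # now allocates and returns a fresh unbacked SymInt
    print(type(u))        # <class 'torch.SymInt'>
```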

There are a bunch of odds and ends that need to be handled:

* A number of `numel` calls converted to `sym_numel`
* When we finally return from item(), we need to ensure we actually produce a SymInt/SymFloat when appropriate. The previous binding code assumed that you would always get a normal Python number. I add a pybind11 binding for Scalar (to PyObject only) and refactor the code to use that. There is some trickiness where you are NOT allowed to go through c10::SymInt if there isn't actually any SymInt involved. See comment.
* One of our unit tests tripped an implicit data dependent access which occurs when you pass a Tensor as an argument to a sizes parameter. This is also converted to support symbolic shapes
* We now support tracking bare SymInt/SymFloat returns in proxy tensor mode (this was already in symbolic-shapes branch)
* Whenever we allocate an unbacked symint, we record the stack trace it was allocated at. These get printed when you attempt data dependent access on the symint (e.g., you try to guard on it)
* Subtlety: unbacked symints are not necessarily > 1. I added a test for this.

These unbacked symints are not very useful right now as you will almost always immediately raise an error later when you try to guard on them. The next logical step is adding an assertion refinement system that lets ShapeEnv learn facts about unbacked symints so it can do a better job eliding guards that are unnecessary.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90624
Approved by: https://github.com/Skylion007, https://github.com/voznesenskym
2022-12-12 13:33:07 +00:00
Larry Liu
ddf00c803b [torchgen] Introduce Executorch types and signatures (#90591)
Retry of #89595. Accidentally closed.

## Forked `BaseCppType`

Created a module for Executorch: `torchgen.executorch`.

In `torchgen.executorch.api.types.types`:
* Define `BaseCppType` with `torch::executor` namespace.
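
For reference, a sketch of what such a definition looks like, following the pattern of the existing ATen types in `torchgen.api.types` (the exact constructor arguments are an assumption):

```
from torchgen.api.types import BaseCppType

# Executorch mirrors the ATen types but lives under the torch::executor namespace.
tensorT = BaseCppType("torch::executor", "Tensor")
scalarT = BaseCppType("torch::executor", "Scalar")
```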

In `torchgen.executorch.api.et_cpp`:
* Help generate `NamedCType` for `ExecutorchCppSignature` arguments.

In `torchgen.executorch.api.types.signatures`:
* Define the signature using these types. (`ExecutorchCppSignature`)

In `torchgen.executorch.api.types.__init__`:
* Suppress flake8 error for `import *`.

Differential Revision: [D41501836](https://our.internmc.facebook.com/intern/diff/D41501836/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90591
Approved by: https://github.com/iseeyuan
2022-12-10 04:34:02 +00:00
Larry Liu
de6beca838 [torchgen] Let native function declaration generation logic take a callable (#90590)
Retry of #89594. Accidentally closed.

This PR allows the `get_native_function_declarations` API to take a function as an argument. This function should take a `NativeFunction` as input and emit code for the native function declaration. By default it is `dest.compute_native_function_declaration`.
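
A minimal sketch of passing a custom generator (the keyword name `native_function_decl_gen` and the surrounding setup are assumptions based on the description above):

```
from torchgen.gen import get_native_function_declarations

def my_decl_gen(g):
    # Emit a custom C++ declaration for a native function (or group).
    return [f"// custom declaration for {g}"]

def gen_declarations(grouped_native_functions, backend_indices):
    # Omitting native_function_decl_gen keeps the default,
    # dest.compute_native_function_declaration.
    return get_native_function_declarations(
        grouped_native_functions=grouped_native_functions,
        backend_indices=backend_indices,
        native_function_decl_gen=my_decl_gen,
    )
```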

Differential Revision: [D41501838](https://our.internmc.facebook.com/intern/diff/D41501838/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90590
Approved by: https://github.com/iseeyuan
2022-12-10 04:34:02 +00:00
Richard Barnes
ad188a227e Introduce CUDA Device Assertions Infrastructure (#84609)
Summary:
This diff introduces a set of changes that makes it possible for the host to get assertions from CUDA devices. This includes the introduction of

**`CUDA_KERNEL_ASSERT2`**

A preprocessor macro to be used within a CUDA kernel that, upon an assertion failure, writes the assertion message, file, line number, and possibly other information to UVM (Managed memory). Once this is done, the original assertion is triggered, which places the GPU in a Bad State requiring recovery. In my tests, data written to UVM appears there before the GPU reaches the Bad State and is still accessible from the host after the GPU is in this state.

Messages are written to a multi-message buffer which can, in theory, hold many assertion failures. I've done this as a precaution in case there are several, but I don't actually know whether that is possible and a simpler design which holds only a single message may well be all that is necessary.

**`TORCH_DSA_KERNEL_ARGS`**

This preprocessor macro is added as an _argument_ to a kernel function's signature. It expands to supply the standardized names of all the arguments needed by `C10_CUDA_COMMUNICATING_KERNEL_ASSERTION` to handle device-side assertions. This includes, e.g., the name of the pointer to the UVM memory the assertion would be written to. This macro abstracts the arguments so there is a single point of change if the system needs to be modified.

**`c10::cuda::get_global_cuda_kernel_launch_registry()`**

This host-side function returns a singleton object that manages the host's part of the device-side assertions. Upon allocation, the singleton allocates sufficient UVM (Managed) memory to hold information about several device-side assertion failures. The singleton also provides methods for getting the current traceback (used to identify when a kernel was launched). To avoid consuming all the host's memory, the singleton stores launches in a circular buffer; a unique "generation number" is used to ensure that kernel launch failures map to their actual launch points (in the case that the circular buffer wraps before the failure is detected).

**`TORCH_DSA_KERNEL_LAUNCH`**

This host-side preprocessor macro replaces the standard
```
kernel_name<<<blocks, threads, shmem, stream>>>(args)
```
invocation with
```
TORCH_DSA_KERNEL_LAUNCH(blocks, threads, shmem, stream, args);
```
Internally, it fetches the UVM (Managed) pointer and generation number from the singleton and appends these to the standard argument list. It also checks that the kernel launches correctly. This abstraction over kernel launches can be modified to provide additional safety/logging.

**`c10::cuda::c10_retrieve_device_side_assertion_info`**

This host-side function checks, when called, that no kernel assertions have occurred. If one has, it raises an exception with:
1. Information (file, line number) about which kernel was launched.
2. Information (file, line number, message) about the device-side assertion.
3. Information (file, line number) about where the failure was detected.

**Checking for device-side assertions**

Device-side assertions are most likely to be noticed by the host when a CUDA API call such as `cudaDeviceSynchronize` is made and fails with a `cudaError_t` indicating
> CUDA error: device-side assert triggered CUDA kernel errors

Therefore, we rewrite `C10_CUDA_CHECK()` to include a call to `c10_retrieve_device_side_assertion_info()`. To make the code cleaner, most of the logic of `C10_CUDA_CHECK()` is now contained within a new function `c10_cuda_check_implementation()` to which `C10_CUDA_CHECK` passes the preprocessor information about filenames, function names, and line numbers. (In C++20 we can use `std::source_location` to eliminate macros entirely!)

# Notes on special cases

* Multiple assertions from the same block are recorded
* Multiple assertions from different blocks are recorded
* Launching kernels from many threads on many streams seems to be handled correctly
* If two processes are using the same GPU and one of them fails with a device-side assertion, the other process continues without issue
* X Multiple assertions from separate kernels on different streams seem to be recorded, but we can't reproduce the test condition
* X Multiple assertions from separate devices should be all be shown upon exit, but we've been unable to generate a test that produces this condition

Differential Revision: D37621532

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84609
Approved by: https://github.com/ezyang, https://github.com/malfet
2022-12-08 01:26:07 +00:00
Edward Z. Yang
7abd035b2f Add missing mypy-nofollow.ini (#90179)
I'm not sure how lintrunner worked without this lol.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90179
Approved by: https://github.com/albanD, https://github.com/voznesenskym
2022-12-08 01:05:12 +00:00
Peter Bell
e6a7278753 Give std/var correction overloads proper defaults (#56398)
The defaults for the correction overloads were left off for
forward-compatibility reasons, but that FC window expired well over a
year ago.
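
With the defaults in place, the overloads can be called without spelling out the correction (illustrative):

```
import torch

x = torch.randn(100)
torch.var(x)                # correction defaults to 1 (Bessel's correction)
torch.var(x, correction=0)  # population variance, no correction
torch.std(x, correction=0)  # likewise for std
```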

Differential Revision: [D29625593](https://our.internmc.facebook.com/intern/diff/D29625593)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56398
Approved by: https://github.com/mruberry
2022-12-07 15:15:00 +00:00
Ram Rachum
351d73b97f Fix exception causes all over the codebase (#90271)
This is the continuation of #90134 and hopefully the final PR in this series.
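
The pattern being fixed is explicit exception chaining, e.g. (illustrative):

```
def lookup(config, key):
    try:
        return config[key]
    except KeyError as e:
        # Before: `raise ValueError(...)` without `from e`, losing the
        # explicit causal link. After: chain the original exception.
        raise ValueError(f"unknown key {key!r}") from e
```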

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90271
Approved by: https://github.com/kit1980
2022-12-07 04:29:00 +00:00
Peter Bell
4f44877983 [Inductor] Add test for Scheduler fusions (#90014)
Currently there is `test_vertical_fusion1` which fuses entirely during
the lowering stage and no buffers are realized. This adds
`test_scheduler_vertical_fusion1` which is the same test but with
several intermediate calculations realized so the scheduler is left
to do the fusion.

To support the test, this PR also adds:
- `metrics.ir_nodes_pre_fusion` which when compared with
`generated_kernel_count` tells us how many nodes were fused.
- `torch._test_inductor_realize` which is an identity operator in
eager mode, but under inductor also forces its input to be realized
(see the sketch below).
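
A sketch of how the helper splits an otherwise fully fusible chain so the scheduler has work to do (the operator name comes from this PR; the usage details are assumptions):

```
import torch

def f(a, b):
    c = a * a + b
    # Identity in eager mode; under inductor, forces c into a realized buffer
    # so the surrounding ops must be fused by the Scheduler, not the lowering.
    c = torch._test_inductor_realize(c)
    return c.relu()
```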

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90014
Approved by: https://github.com/jansel
2022-12-07 01:33:25 +00:00
Kimish Patel
bd456fb549 [Pytorch][Vulkan] shader codegen use ordered dictionary (#89951)
When not using an ordered dictionary, parameter values can end up in a
different order for each specialization. This can produce shader names that
are inconsistent: the position of a template parameter value in the name no
longer has a fixed meaning.
For example if you have:
```
conv2d_pw:
  default_values:
    - X: 1
    - Y: 2
  parameter_values:
    - Y: 3
```

The default parameter values can generate a shader named 'my_shader_1x2', where
1x2 stands for the X and Y parameters respectively. Then, for the non-default
values, of which there is only one, we have Y=3, and with the existing
implementation you can end up generating a shader named 'my_shader_3x1'. Here 3
is for Y and 1 is for X. This leads to confusing shader names.

This diff fixes this by:
1. using an ordered dict;
2. building non-default values by first copying the default values and then
updating them (see the sketch below).
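
A minimal illustration of the fix (the names are hypothetical; the real logic lives in the shader codegen script):

```
from collections import OrderedDict

default_values = OrderedDict([("X", 1), ("Y", 2)])

# Build the non-default variant by copying the defaults, then updating,
# so parameter order is identical across all specializations:
variant = OrderedDict(default_values)
variant.update({"Y": 3})

name = "my_shader_" + "x".join(str(v) for v in variant.values())
print(name)  # my_shader_1x3 -- X stays first, Y second
```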

Differential Revision: [D41006639](https://our.internmc.facebook.com/intern/diff/D41006639/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89951
Approved by: https://github.com/salilsdesai
2022-12-06 00:49:35 +00:00
William Wen
0c3537a3c3 Add dynamo smoke tests to CI (#89302)
Add dynamo smoke tests to CI, which checks for python/torch/cuda versions and runs simple dynamo examples on a few backends, including inductor. Smoke tests will run on dynamo and inductor shards.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89302
Approved by: https://github.com/malfet
2022-11-30 21:24:45 +00:00
Pearu Peterson
76c6dfeaa6 Add layout and blocksize arguments to Tensor.to_sparse method (#89502)
This PR extends the `Tensor.to_sparse()` method to `Tensor.to_sparse(layout=None, blocksize=None)` in a BC manner (`layout=None` means `layout=torch.sparse_coo`).

In addition, the PR adds support for the following conversions:
- non-hybrid/hybrid COO tensor to CSR or CSC or a COO tensor
- short, bool, byte, char, bfloat16, int, long, half CSR tensor to a BSR tensor

and fixes the following conversions:
- hybrid COO to COO tensor
- non-batch/batch hybrid BSR to BSR or BSC tensor
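
For example, after this change (illustrative):

```
import torch

x = torch.eye(6)
coo = x.to_sparse()                         # unchanged default: sparse COO
csr = x.to_sparse(layout=torch.sparse_csr)  # direct conversion to CSR
bsr = x.to_sparse(layout=torch.sparse_bsr, blocksize=(2, 2))  # blocked layouts take a blocksize
```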

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89502
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2022-11-30 20:21:10 +00:00
Nikita Karetnikov
4cb6bbbe27 Symintify embedding (#89327)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89327
Approved by: https://github.com/ezyang
2022-11-24 03:25:00 +00:00
Jane Xu
8695f0cced Rectify native_batch_norm schema by splitting it into two legit schemas (#88697)
Using the same repro from the issue (but with BatchNorm2D)

Rectifies native_batch_norm schema by splitting the schema into 2:
1. one will have NON-optional alias-able running_mean and running_var inputs
2. the other will just not have those parameters at all (no_stats variation)

**Calling for name suggestions!**

## test plan
I've added tests in test_functionalization.py as well as an entry in common_method_invocations.py for `native_batch_norm_legit`
CI should pass.

## next steps
Because of bc/fc reasons, we reroute native_batch_norm to call our new schemas ONLY through the python dispatcher, but in 2 weeks or so, we should make `native_batch_norm_legit` the official batch_norm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88697
Approved by: https://github.com/albanD
2022-11-23 23:23:17 +00:00
Huy Do
b8d3afd886 Skip upload test stats for test reports from rerun disabled tests workflow (#89548)
I have found the reason why uploading test stats fails for the rerun disabled tests workflow, for example https://github.com/pytorch/pytorch/actions/runs/3522896778/jobs/5917765699.  The problem is that the pytest XML file is now too big to be processed quickly (~50x bigger). Unlike unittest, `pytest-flakefinder`, used by rerun disabled tests for test_ops, includes skipped messages multiple times (50 times by default, once per retry-and-skip).  This slows down the upload test stats script too much (O(n)) because it tries to gather all the stats. On the other hand, `check_disabled_tests` doesn't suffer from the same issue because it ignores all these skipped messages.

This is a quick fix to skip test reports from rerun disabled tests workflow when trying to upload test stats.

I'll try to fix this properly later in the way we use pytest-flakefinder. From what I see, a zipped test report from rerun disabled tests is only a few MB ([example](https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3521687954/1/artifact/test-reports-test-default-1-2-linux.2xlarge_9636028803.zip)), but it balloons to a much bigger XML file after extraction, from a dozen to a few hundred MB of text.  The size of the zipped file is not a big immediate problem.

### Testing

[3521687954](https://github.com/pytorch/pytorch/actions/runs/3521687954) is an example workflow with rerun disabled tests and mem leak check.  The script can now finish when running locally:

* `upload_test_stats` finishes around 3+ minutes
```
time python -m tools.stats.upload_test_stats --workflow-run-id 3521687954 --workflow-run-attempt 1 --head-branch master
...
Writing 8925 documents to S3
Done!
Writing 1760 documents to S3
Done!
Writing 1675249 documents to S3
Done!
python3 -m tools.stats.upload_test_stats --workflow-run-id 3521687954  1    185.69s user 12.89s system 75% cpu 4:22.82 total
```

* `check_disabled_tests` finishes within 3 minutes
```
time python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954 --workflow-run-attempt 1 --repo pytorch/pytorch
...
python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954  1    154.19s user 4.17s system 97% cpu 2:42.50 total
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89548
Approved by: https://github.com/clee2000
2022-11-23 22:39:39 +00:00
Nikita Karetnikov
07dd2fe6c3 Symintify select (#89326)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89326
Approved by: https://github.com/ezyang
2022-11-23 05:00:33 +00:00
Edward Z. Yang
5266953443 Add crossref debug mode for functionalization, catches stride errors (#89498)
The idea is to add a custom handler to the Functionalize key in the Python
dispatcher that runs the functionalized version alongside a
non-functionalized version, and checks that their outputs agree in the
end.  (Technically, for metadata mutation we should also check the
inputs, but for now we're relying on those functions returning self.)
I turned this on for test_functionalize.py (new TestCrossRefFunctionalize)
and found a bunch of failures that look legit.
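
Conceptually, the crossref check amounts to something like this (a standalone sketch using `functorch.functionalize`, not the actual dispatcher handler):

```
import torch
from functorch import functionalize

def crossref_check(fn, *args):
    expected = fn(*args)                # non-functionalized reference run
    actual = functionalize(fn)(*args)   # functionalized run
    # Stride disagreements are exactly the class of bug this mode catches.
    assert expected.stride() == actual.stride(), "functionalization changed strides"
    return expected
```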

This probably doesn't interact that nicely if you're also tracing at
the same time, probably need more special logic for that (directly,
just disabling tracing for when we create the nested fake tensor mode,
but IDK if there's a more principled way to organize this.)

There are some misc fixups which I can split if people really want.

- xfail_inherited_tests moved to test common_utils
- Bindings for _dispatch_tls_set_dispatch_key_included,
  _dispatch_tls_is_dispatch_key_included and _functionalization_reapply_views_tls
- Type stubs for _enable_functionalization, _disable_functionalization
- all_known_overloads utility to let you iterate over all OpOverloads
  in all namespaces.  Iterator support on all torch._ops objects to let
  you iterate over their members.
- suspend_functionalization lets you temporarily disable functionalization mode
  in a context
- check_metadata_matches for easily comparing outputs of functions and see
  if they match (TODO: there are a few copies of this logic, consolidate!)
- _fmt for easily printing the metadata of a tensor without its data
- _uncache_dispatch for removing a particular dispatch key from the cache,
  so that we force it to regenerate
- check_significant_strides new kwarg only_cuda to let you also do stride
  test even when inputs are not CUDA
- Functionalize in torch._C.DispatchKey

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89498
Approved by: https://github.com/malfet
2022-11-23 04:18:25 +00:00
Mengwei Liu
047e542a1a [tools] expose selective build library (#89351)
Change the base module and visibility of `tools:gen_oplist_lib` so that it can be reused.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89351
Approved by: https://github.com/cccclai
2022-11-21 21:08:13 +00:00
Driss Guessous
1d9e1fca97 Update sdp dispatch logic to enable fused backward (#89154)
# Summary
Reorganizes how the sdp dispatch logic is done in order to enable backward for fused kernels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89154
Approved by: https://github.com/cpuhrsch
2022-11-21 20:02:09 +00:00
PyTorch MergeBot
e1d58b1928 Revert "Update sdp dispatch logic to enable fused backward (#89154)"
This reverts commit 2e72ec7982.

Reverted https://github.com/pytorch/pytorch/pull/89154 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but the new test_sdp_math_gradcheck test breaks periodic slow gradcheck, i.e. 419ef2cdcf
2022-11-20 22:14:38 +00:00
Driss Guessous
2e72ec7982 Update sdp dispatch logic to enable fused backward (#89154)
# Summary
Reorganizes how the sdp dispatch logic is done in order to enable backward for fused kernels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89154
Approved by: https://github.com/cpuhrsch
2022-11-19 02:06:27 +00:00
Huy Do
573eaf1225 Analyze and upload disabled tests rerun to S3 (#89083)
Analyze and upload disabled tests rerun to S3. Note that this only picks up `test-reports` from `rerun_disable_tests` workflows.

### Testing

Running the script manually `python -m tools.stats.check_disabled_tests --workflow-run-id 3473068035 --workflow-run-attempt 1 --repo pytorch/pytorch` and see the files successfully uploaded to s3://ossci-raw-job-status/rerun_disabled_tests/3473068035/1

Rockset collection created https://console.rockset.com/collections/details/commons.rerun_disabled_tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89083
Approved by: https://github.com/clee2000
2022-11-17 03:36:58 +00:00
Driss Guessous
ff6d2a6d1b Add mem efficient backward (#88856)
# Registers the derivative for mem efficient backward

- Use gradcheck to test correctness. The kernel is not implemented for fp64, so checks run in fp32 with bumped tolerances
- I also made updates based off of Xformer main branch and flash-attention cutlass branch.
- This will enable the fused backward to be called for scaled dot product attention

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88856
Approved by: https://github.com/cpuhrsch
2022-11-15 20:22:57 +00:00
PyTorch MergeBot
50c18217a3 Revert "Add mem efficient backward (#88856)"
This reverts commit 35e668b5ce.

Reverted https://github.com/pytorch/pytorch/pull/88856 on behalf of https://github.com/DanilBaibak due to breaking internal builds
2022-11-15 09:37:09 +00:00
Driss Guessous
35e668b5ce Add mem efficient backward (#88856)
# Registers the derivative for mem efficient backward

- Use gradcheck to test correctness. The kernel is not implemented for fp64, so checks run in fp32 with bumped tolerances
- I also made updates based off of Xformer main branch and flash-attention cutlass branch.
- This will enable the fused backward to be called for scaled dot product attention

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88856
Approved by: https://github.com/cpuhrsch
2022-11-15 01:10:35 +00:00
BowenBao
20ae19aa1d [ONNX] Improve diagnostic message formatting (#87830)
* Reflect required arguments in method signature for each diagnostic rule. Previous design accepts arbitrary sized tuple which is hard to use and prone to error.
     ![image](https://user-images.githubusercontent.com/9376104/200381982-d1e905f0-a159-4ef5-8d2e-070524e8f5bf.png)
* Removed `DiagnosticTool` to keep things compact.
* Removed specifying supported rule set for tool(context) and checking if rule of reported diagnostic falls inside the set, to keep things compact.
* Initial overview markdown file.
* Change `full_description` definition. The `text` field should now not be empty, and its markdown should be stored in the `markdown` field.
* Change `message_default_template` to allow only named fields (excluding numeric fields). `field_name` provides clarity on what argument is expected.
* Added `diagnose` api to `torch.onnx._internal.diagnostics`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87830
Approved by: https://github.com/abock
2022-11-10 21:42:17 +00:00
Panagiotis Antoniadis
656d0de6c5 Change TORCH_INTERNAL_ASSERT to TORCH_CHECK and add a nice error message (#88804)
Fixes #87672

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88804
Approved by: https://github.com/ezyang
2022-11-10 18:11:32 +00:00
ssjia
c4a3aa8fe7 [vulkan] Add option for buffer representations in vTensor (#87622)
This diff adds the option to use a Buffer to store data for a `vTensor` by passing `StorageType::BUFFER` to the constructor of `vTensor`. To enable this change, the construction of `vTensor` and `vTensorStorage` had to be slightly refactored to properly support strides. To summarize the changes:

* `vTensorStorage` now contains no Tensor metadata (such as tensor sizes, strides, and `TensorOptions`) - it now only contains the image extents (if texture storage is used) and the buffer length. Tensor metadata is now managed by `vTensor`. The reason for this is to allow multiple `vTensor` objects to point to the same `vTensorStorage` but with different metadata, which may be a useful feature now that Buffer storage is enabled.
* `vTensor` will now compute the strides upon construction based on the requested sizes and memory layout if Buffer storage is requested. Previously, strides were faked by setting them all to 0 as strides do not apply to image textures (this behavior is preserved for texture storage).

Differential Revision: [D40604163](https://our.internmc.facebook.com/intern/diff/D40604163/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87622
Approved by: https://github.com/digantdesai
2022-11-09 17:59:49 +00:00
Fabio Rocha
652af5ec15 upsample_*.vec ops are now CompositeImplicit (#85638)
They were previously CompositeExplicit, but that was not really necessary.
See discussion in https://github.com/pytorch/pytorch/issues/85405

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85638
Approved by: https://github.com/ezyang, https://github.com/lezcano, https://github.com/malfet, https://github.com/jansel
2022-11-09 09:58:04 +00:00
Kurt Mohler
ee28b865ee Deprecate TypedStorage, its derived classes, and all of their public methods (#85303)
Part of #85302

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85303
Approved by: https://github.com/ezyang
2022-11-08 18:11:01 +00:00
Edward Z. Yang
825f4e602b Add support for symbolic shapes to sparse tensor (#88573)
Along the way, I undid making sparse/dense dim symint (they're
dimensions, so they should be static.)

Also symintify set_indices_and_values_unsafe

There is a little bit of a nontrivial infra change here: previously, we didn't populate the strides field on sparse tensors. It is now populated with "empty" strides, and this meant that sparse tensors were falsely reporting they were non-overlapping dense/contiguous. I added in a hack to work around this case.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88573
Approved by: https://github.com/anjali411
2022-11-08 03:13:42 +00:00
Antoni Viros i Martin
c77368d416 Implement a constructor for nested_tensor that is similar to torch.tensor() (#88213)
Summary: This diff merges both previous implementations of constructors for nested tensors, the one from lists of tensors and the one from arbitrary Python lists, and implements it in PyTorch core so no extensions are needed to construct an NT.
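
Both construction paths now work through the same entry point (illustrative):

```
import torch

# From a list of tensors:
nt = torch.nested.nested_tensor([torch.randn(2, 3), torch.randn(4, 3)])

# From arbitrary (ragged) Python lists, like torch.tensor():
nt2 = torch.nested.nested_tensor([[1.0, 2.0], [3.0, 4.0, 5.0]])
```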

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88213
Approved by: https://github.com/cpuhrsch
2022-11-08 00:03:18 +00:00
Catherine Lee
d632d94cc7 Disable mem leak check (#88373)
tbh at this point it might be easier to make a new workflow and copy the relevant jobs...

Changes:
* Disable cuda mem leak check except for on scheduled workflows
* Make pull and trunk run on a schedule which will run the memory leak check
* Periodic will always run the memory leak check -> periodic does not have parallelization anymore
* Concurrency check changed to be slightly more generous
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88373
Approved by: https://github.com/ZainRizvi, https://github.com/huydhn
2022-11-04 20:47:42 +00:00
Jane Xu
3e6579b8f6 Don't print fatal:... in generate_torch_version.py (#88335)
During build, users commonly see a message like
```
fatal: no tag exactly matches 'd8b4f33324b1eb6c1103874764116fb68e0d0af4'
```
which is usually ignored when builds succeed, but has confused users when build fails (due to a different issue). This PR removes the red herring, since this usually prints for local development when tags are not found.

We catch the exception anyway and handle it under the hood, so we don't need to print it and confuse the user.
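
A sketch of the quieter lookup (illustrative; the real logic lives in tools/generate_torch_version.py):

```
import subprocess

def get_tag(pytorch_root):
    try:
        return (
            subprocess.check_output(
                ["git", "describe", "--tags", "--exact-match"],
                cwd=pytorch_root,
                stderr=subprocess.DEVNULL,  # swallow "fatal: no tag exactly matches ..."
            )
            .decode("ascii")
            .strip()
        )
    except (subprocess.CalledProcessError, OSError):
        return ""  # no exact tag; the caller falls back gracefully
```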

Test plan:
Note that builds on trunk currently have this line; cmd-F 'fatal: no tag exactly matches' in https://github.com/pytorch/pytorch/actions/runs/3379162092/jobs/5610355820.

Then check in the PR build to see that the line no longer appears.

I also tagged my commit locally and printed what the tag would be--this code and the old code produced the same result.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88335
Approved by: https://github.com/seemethere
2022-11-04 20:34:23 +00:00
Kimish Patel
9533fe9031 [pytorch][vulkan] Add bias storage type to template (#88324)
To enable buffer-based storage for bias as well, this diff adds a storage type
for bias to the template.

Differential Revision: [D40689003](https://our.internmc.facebook.com/intern/diff/D40689003/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88324
Approved by: https://github.com/jmdetloff
2022-11-03 20:02:24 +00:00
Kimish Patel
893f8e3790 [PyTorch][Vulkan] Add template based codegen for shader generation (#88323)
We would like to be able to parameterize kernels such that a parameterized
algorithm can be implemented via templates. We can then profile the
performance of a kernel with different parameter values. This enables us to
determine which parameters work best for a given kernel or a given device.

In this diff, one such kernel is added: 1x1 convolution, which is
parameterized across the size of the tile produced by each invocation.

Few other options for parameters can be:
- One can imagine dtype could also be a parameter, so that we can compute in
fp16 or int8/int16.
- Register blocking for input channels

Differential Revision: [D40280336](https://our.internmc.facebook.com/intern/diff/D40280336/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D40280336/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88323
Approved by: https://github.com/jmdetloff
2022-11-03 19:51:51 +00:00
Edward Z. Yang
2f296cfdbb Add a reshape_copy operator. (#88314)
The semantics are "as if" you did a reshape, but it always copies,
even if the input was directly view'able.
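
Illustrative, using the aten op added here:

```
import torch

x = torch.arange(6)
v = x.reshape(2, 3)                          # contiguous input: returns a view
c = torch.ops.aten.reshape_copy(x, (2, 3))   # always materializes a copy
assert v.data_ptr() == x.data_ptr()
assert c.data_ptr() != x.data_ptr()
```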

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88314
Approved by: https://github.com/albanD
2022-11-03 12:53:51 +00:00
Jerry Zhang
a0fb234b45 [codegen] using TORCH_LIBRARY_FRAGMENT for some namespaces (#88229)
Summary:
Sometimes we want to extend an existing custom namespace library instead of
creating a new one, but we don't have a namespace config right now, so we
hardcode some custom libraries defined in PyTorch today, i.e. quantized and
quantized_decomposed.
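
The generated registrations use the C++ `TORCH_LIBRARY_FRAGMENT` macro, which extends a namespace that may already be defined elsewhere (plain `TORCH_LIBRARY` would reject a redefinition). A Python sketch of the same idea via `torch.library` (the op name is hypothetical):

```
import torch

# "FRAGMENT" extends the existing quantized_decomposed namespace
# instead of redefining it (which "DEF" would reject).
lib = torch.library.Library("quantized_decomposed", "FRAGMENT")
lib.define("my_identity(Tensor x) -> Tensor")
lib.impl("my_identity", lambda x: x, "CompositeExplicitAutograd")
```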

Test Plan:
ci

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88229
Approved by: https://github.com/ezyang
2022-11-03 02:30:02 +00:00
Kimish Patel
72f3688029 [Pytorch][Vulkan] Update spv generation script to embed shader parameters (#88321)
This diff adds shader parameters such as tile size, weight storage type, and
format to the generated spv.cpp file.
These are used in the ShaderInfo struct that ops such as convolution will use
to determine the workgroup size and how to pack weights.

Differential Revision: [D40280337](https://our.internmc.facebook.com/intern/diff/D40280337/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88321
Approved by: https://github.com/jmdetloff, https://github.com/mcr229
2022-11-02 23:28:18 +00:00
Huy Do
7c6fe21a38 Fix monitoring script for macos (#88159)
The monitoring script is currently failing with AccessDenied when trying to access uss memory on mac because [psutil.memory_full_info](https://psutil.readthedocs.io/en/latest/index.html?highlight=memory_full_info) requires higher user privileges

Example failures:
* https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3363066309/1/artifact/usage-log-test-default-2-2-macos-12_9208104847.zip
* https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3363066309/1/artifact/usage-log-test-default-2-2-macos-m1-12_9207913759.zip

I could also make this script run with sudo, effectively granting this permission. But I'm not entirely sure that we need uss memory for mac, so gracefully handling the error looks nicer
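
A sketch of the graceful fallback (illustrative; `psutil.AccessDenied` is the documented exception type):

```
import psutil

def mem_stats(pid):
    p = psutil.Process(pid)
    try:
        uss = p.memory_full_info().uss  # needs elevated privileges on macOS
    except psutil.AccessDenied:
        uss = None                      # skip uss rather than crash the monitor
    return {"rss": p.memory_info().rss, "uss": uss}
```
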
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88159
Approved by: https://github.com/clee2000
2022-11-01 05:58:44 +00:00
KevinYuk
e9cabef663 enable xpu group norm channels last support (#87680)
XPU can support the channels-last format for the group norm operator; however, PyTorch converts all input tensors, including channels-last tensors, to contiguous format. PyTorch needs to pass this memory-format hint down to us.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87680
Approved by: https://github.com/albanD
2022-10-31 19:46:01 +00:00
Edward Z. Yang
ff94494644 Revert "Revert "Unify meta tensor and fake tensor converter conversion (#87943)"" (#88045)
This reverts commit bc64999b83.

Check torch/_subclasses/meta_utils.py for "This is very tricky" for the bugfix explanation.

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88045
Approved by: https://github.com/kit1980, https://github.com/Chillee
2022-10-31 17:50:14 +00:00
PyTorch MergeBot
bc64999b83 Revert "Unify meta tensor and fake tensor converter conversion (#87943)"
This reverts commit baa715e790.

Reverted https://github.com/pytorch/pytorch/pull/87943 on behalf of https://github.com/kit1980 due to Broke several inductor tests
2022-10-29 18:39:28 +00:00
Huy Do
384b84d6a6 [BE] Upload GHA artifacts to S3 (#87827)
This is exclusively used by macOS, ROCM (and any other future workflows) that don't have direct access to S3 to upload their artifacts

### Testing

Running the script locally with the personal GITHUB_TOKEN:

```
python3 -m tools.stats.upload_artifacts --workflow-run-id 3342375847 --workflow-run-attempt 1 --repo pytorch/pytorch

Using temporary directory: /var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb
Downloading sccache-stats-macos-12-py3-arm64-runattempt1-9155493770
Downloading sccache-stats-macos-12-py3-lite-interpreter-x86-64-runattempt1-9155493303
Downloading sccache-stats-macos-12-py3-x86-64-runattempt1-9155493627
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/sccache-stats-macos-12-py3-arm64-runattempt1-9155493770 to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/sccache-stats-macos-12-py3-arm64-9155493770
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/sccache-stats-macos-12-py3-lite-interpreter-x86-64-runattempt1-9155493303 to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/sccache-stats-macos-12-py3-lite-interpreter-x86-64-9155493303
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/sccache-stats-macos-12-py3-x86-64-runattempt1-9155493627 to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/sccache-stats-macos-12-py3-x86-64-9155493627
Downloading test-jsons-runattempt1-test-default-1-2-linux.rocm.gpu_9155913429.zip
Downloading test-jsons-runattempt1-test-default-1-2-macos-12_9155944815.zip
Downloading test-jsons-runattempt1-test-default-1-2-macos-m1-12_9155888061.zip
Downloading test-jsons-runattempt1-test-default-2-2-linux.rocm.gpu_9155913500.zip
Downloading test-jsons-runattempt1-test-default-2-2-macos-12_9155944892.zip
Downloading test-jsons-runattempt1-test-default-2-2-macos-m1-12_9155888182.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-jsons-runattempt1-test-default-1-2-linux.rocm.gpu_9155913429.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-jsons-test-default-1-2-linux.rocm.gpu_9155913429.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-jsons-runattempt1-test-default-1-2-macos-12_9155944815.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-jsons-test-default-1-2-macos-12_9155944815.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-jsons-runattempt1-test-default-1-2-macos-m1-12_9155888061.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-jsons-test-default-1-2-macos-m1-12_9155888061.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-jsons-runattempt1-test-default-2-2-linux.rocm.gpu_9155913500.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-jsons-test-default-2-2-linux.rocm.gpu_9155913500.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-jsons-runattempt1-test-default-2-2-macos-12_9155944892.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-jsons-test-default-2-2-macos-12_9155944892.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-jsons-runattempt1-test-default-2-2-macos-m1-12_9155888182.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-jsons-test-default-2-2-macos-m1-12_9155888182.zip
Downloading test-reports-runattempt1-test-default-1-2-linux.rocm.gpu_9155913429.zip
Downloading test-reports-runattempt1-test-default-1-2-macos-12_9155944815.zip
Downloading test-reports-runattempt1-test-default-1-2-macos-m1-12_9155888061.zip
Downloading test-reports-runattempt1-test-default-2-2-linux.rocm.gpu_9155913500.zip
Downloading test-reports-runattempt1-test-default-2-2-macos-12_9155944892.zip
Downloading test-reports-runattempt1-test-default-2-2-macos-m1-12_9155888182.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-reports-runattempt1-test-default-1-2-linux.rocm.gpu_9155913429.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-reports-test-default-1-2-linux.rocm.gpu_9155913429.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-reports-runattempt1-test-default-1-2-macos-12_9155944815.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-reports-test-default-1-2-macos-12_9155944815.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-reports-runattempt1-test-default-1-2-macos-m1-12_9155888061.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-reports-test-default-1-2-macos-m1-12_9155888061.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-reports-runattempt1-test-default-2-2-linux.rocm.gpu_9155913500.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-reports-test-default-2-2-linux.rocm.gpu_9155913500.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-reports-runattempt1-test-default-2-2-macos-12_9155944892.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-reports-test-default-2-2-macos-12_9155944892.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-reports-runattempt1-test-default-2-2-macos-m1-12_9155888182.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-reports-test-default-2-2-macos-m1-12_9155888182.zip
Downloading usage-log-runattempt1-test-default-1-2-linux.rocm.gpu_9155913429.zip
Downloading usage-log-runattempt1-test-default-1-2-macos-12_9155944815.zip
Downloading usage-log-runattempt1-test-default-1-2-macos-m1-12_9155888061.zip
Downloading usage-log-runattempt1-test-default-2-2-linux.rocm.gpu_9155913500.zip
Downloading usage-log-runattempt1-test-default-2-2-macos-12_9155944892.zip
Downloading usage-log-runattempt1-test-default-2-2-macos-m1-12_9155888182.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/usage-log-runattempt1-test-default-1-2-linux.rocm.gpu_9155913429.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/usage-log-test-default-1-2-linux.rocm.gpu_9155913429.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/usage-log-runattempt1-test-default-1-2-macos-12_9155944815.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/usage-log-test-default-1-2-macos-12_9155944815.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/usage-log-runattempt1-test-default-1-2-macos-m1-12_9155888061.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/usage-log-test-default-1-2-macos-m1-12_9155888061.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/usage-log-runattempt1-test-default-2-2-linux.rocm.gpu_9155913500.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/usage-log-test-default-2-2-linux.rocm.gpu_9155913500.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/usage-log-runattempt1-test-default-2-2-macos-12_9155944892.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/usage-log-test-default-2-2-macos-12_9155944892.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/usage-log-runattempt1-test-default-2-2-macos-m1-12_9155888182.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/usage-log-test-default-2-2-macos-m1-12_9155888182.zip
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87827
Approved by: https://github.com/clee2000
2022-10-29 17:40:07 +00:00
Edward Z. Yang
baa715e790 Unify meta tensor and fake tensor converter conversion (#87943)
Meta tensor does a lot of work to make sure tensors "look" similar
to their originals; e.g., if the original was a non-leaf, the meta
converter ensures the meta tensor is a non-leaf too.  Fake tensor
destroyed some of these properties when it wrapped them in a FakeTensor.

This patch pushes the FakeTensor constructor into the meta converter
itself, so that we first create a fake tensor, and then we do various
convertibility bits to it to make it look right.

The two tricky bits:

- We need to have no_dispatch enabled when we allocate the initial meta
  tensor, or fake tensor gets mad at us for making a meta fake tensor.
  This necessitates the double-callback structure of the callback
  arguments: the meta construction happens *inside* the function so
  it is covered by no_dispatch

- I can't store tensors for the storages anymore, as that will result
  in a leak.  But we have untyped storage now, so I just store untyped
  storages instead.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87943
Approved by: https://github.com/eellison, https://github.com/albanD
2022-10-29 15:01:07 +00:00
Kazuaki Ishizaki
14d5f139d2 Fix typos under benchmarks, test, and tools directories (#87975)
This PR fixes typos in `.md` files under benchmarks, test, and tools directories
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87975
Approved by: https://github.com/kit1980
2022-10-29 01:26:17 +00:00
albanD
8a9aca7b8d Reland 2 Many symintifications (#87604) (#87980)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87980
Approved by: https://github.com/ezyang
2022-10-28 13:40:11 +00:00
PyTorch MergeBot
8b4d95759c Revert "Many symintifications (#87604)"
This reverts commit 777e6a2c51.

Reverted https://github.com/pytorch/pytorch/pull/87604 on behalf of https://github.com/weiwangmeta due to breaking internal builds
2022-10-28 03:00:11 +00:00
Edward Z. Yang
1ff52225f1 Unify SymIntNode and SymFloatNode into SymNode (#87817)
This refactor was prompted by challenges handling mixed int/float
operations in C++.  A previous version of this patch
added overloads for each permutation of int/float and was unwieldy:
https://github.com/pytorch/pytorch/pull/87722/  This PR takes a different
approach.

The general outline of the patch is to combine the C++ types SymIntNode
and SymFloatNode into a single type, SymNode.  This is type erased; we
no longer know statically at C++ if we have an int/float and have to test
it with the is_int()/is_float() virtual methods.  This has a number of
knock on effects.

- We no longer have C++ classes to bind to Python.  Instead, we take an
  entirely new approach to our Python API, where we have a SymInt/SymFloat
  class defined entirely in Python, which hold a SymNode (which corresponds
  to the C++ SymNode).  However, SymNode is not pybind11-bound; instead,
  it lives as-is in Python, and is wrapped into C++ SymNode using PythonSymNode
  when it goes into C++.  This implies a userland rename.

  In principle, it is also possible for the canonical implementation of SymNode
  to be written in C++, and then bound to Python with pybind11 (we have
  this code, although it is commented out.)  However, I did not implement
  this as we currently have no C++ implementations of SymNode.

  Because we do return SymInt/SymFloat from C++ bindings, the C++ binding
  code needs to know how to find these classes.  Currently, this is done
  just by manually importing torch and getting the attributes.

- Because SymInt/SymFloat are easy Python wrappers, __sym_dispatch__ now
  takes SymInt/SymFloat, rather than SymNode, bringing it in line with how
  __torch_dispatch__ works.
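
A small sketch of the resulting layering from the Python side (attribute and method names follow the description above; treat them as illustrative):

```
import torch

def describe(s):
    # SymInt/SymFloat are plain Python wrapper classes after this change.
    assert isinstance(s, (torch.SymInt, torch.SymFloat))
    node = s.node                    # the underlying type-erased SymNode
    # int vs. float is now a runtime query, not a static C++ type:
    print(node.is_int(), node.is_float())
```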

Some miscellaneous improvements:

- SymInt now has a constructor that takes SymNode.  Note that this
  constructor is ambiguous if you pass in a subclass of SymNode,
  so an explicit downcast is necessary.  This means toSymFloat/toSymInt
  are no more.  This is a mild optimization as it means rvalue reference
  works automatically.

- We uniformly use the caster for c10::SymInt/SymFloat, rather than
  going the long way via the SymIntNode/SymFloatNode.

- Removed some unnecessary toSymInt/toSymFloat calls in normalize_*
  functions, pretty sure this doesn't do anything.

- guard_int is now a free function, since to guard on an int you cannot
  assume the method exists.  A function can handle both int and SymInt
  inputs.

- We clean up the magic method definition code for SymInt/SymFloat/SymNode.
  ONLY the user classes (SymInt/SymFloat) get magic methods; SymNode gets
  plain methods; this is to help avoid confusion between the two types.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87817
Approved by: https://github.com/albanD, https://github.com/anjali411
2022-10-27 20:56:02 +00:00
Huy Do
8016fd9eb1 Set check-latest to false when setup python and pip cache in CI (#87621)
I missed the fine print in https://github.com/actions/setup-python/blob/main/README.md#caching-packages-dependencies when setting up the cache using setup-python GHA

> Restored cache will not be used if the requirements.txt file is not updated for a long time and a newer version of the dependency is available which can lead to an increase in total build time.

The latter part is important because it implies that even with the cache, pip will still try to check if a newer version exists and that part can be flaky, i.e. https://github.com/pytorch/pytorch/actions/runs/3313764038/jobs/5472180293

This undesired behavior can be turned off by setting the advanced option `check-latest` to false: https://github.com/actions/setup-python/blob/main/docs/advanced-usage.md#check-latest-version. Per my understanding, this should tell pip install in these workflows to use the local cached copy of the package, avoiding the need to query PyPI every single time.

`check-latest` was added quite recently https://github.com/actions/setup-python/pull/406, so `actionlint-1.6.15` fails to recognize it. Thus, this PR also upgrades `actionlint` to the latest 1.6.21 to pass the linter check. Here is an example error from 1.6.15 from https://github.com/pytorch/pytorch/actions/runs/3315388073/jobs/5475918454:

```
>>> Lint for .github/workflows/lint.yml:

  Error (ACTIONLINT) [action]
    input "check-latest" is not defined in action "actions/setup-python@v4".
    available inputs are "architecture", "cache", "cache-dependency-path",
    "python-version", "python-version-file", "token"

         25  |        with:
         26  |          python-version: 3.8
         27  |          architecture: x64
    >>>  28  |          check-latest: false
         29  |          cache: pip
         30  |          cache-dependency-path: |
         31  |            **/.github/requirements-gha-cache.txt
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87621
Approved by: https://github.com/ZainRizvi
2022-10-26 20:08:29 +00:00
PyTorch MergeBot
5f4329134e Revert "Set check-latest to false when setup python and pip cache in CI (#87621)"
This reverts commit 4080b1db28.

Reverted https://github.com/pytorch/pytorch/pull/87621 on behalf of https://github.com/huydhn due to Somehow setup-python treats Python 3.10 as Python 3.1 in pr-label.yml. I missed this signal because this is only run at push
2022-10-26 19:40:53 +00:00