pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Catherine Lee	0db21a6b23	Remove most rockset references (#139922 ) Remove most references to rockset: * replace comments and docs with a generic "backend database" * Delete `upload_to_rockset`, so we no longer need to install the package. * Do not upload perf stats to rockset as well (we should be completely on DynamoDB now right @huydhn?) According to VSCode, it went from 41 -> 7 instances of "rockset" in the repo Pull Request resolved: https://github.com/pytorch/pytorch/pull/139922 Approved by: https://github.com/huydhn, https://github.com/ZainRizvi	2024-11-12 21:17:43 +00:00
Xuehai Pan	8a67daf283	[BE][Easy] enable postponed annotations in `tools` (#129375 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129375 Approved by: https://github.com/malfet	2024-06-29 09:23:35 +00:00
PyTorch MergeBot	a32ce5ce34	Revert "[BE][Easy] enable postponed annotations in `tools` (#129375 )" This reverts commit `59eb2897f1`. Reverted https://github.com/pytorch/pytorch/pull/129375 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I need to revert to cleanly revert https://github.com/pytorch/pytorch/pull/129374, please do a rebase and reland this ([comment](https://github.com/pytorch/pytorch/pull/129375#issuecomment-2197800541))	2024-06-29 00:44:25 +00:00
Xuehai Pan	59eb2897f1	[BE][Easy] enable postponed annotations in `tools` (#129375 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129375 Approved by: https://github.com/malfet	2024-06-28 15:37:54 +00:00
Catherine Lee	7c00635125	[CI] Move gha artifact download before xml parsing for test stat uploads (#125609 ) Move the gha artifact download to before any xml parsing is done for upload-test-stats. Do not download gha artifacts during xml parsing, since they were already uploaded to s3 in the step above and will be downloaded when all the artifacts are downloaded from s3. The previous method resulted in dups if you ran the script again. TODO: write a deduper so we don't have to worry at all. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125609 Approved by: https://github.com/huydhn	2024-05-09 20:35:09 +00:00
Huy Do	f334b54d7f	Handle the list of skipped messages when uploading disabled test stats (#104803 ) This fixes the failure when a list of skipped messages is encountered when uploading disabled test stats, for example https://github.com/pytorch/pytorch/actions/runs/5489936777/jobs/10004725533. This happens for ONNX tests (running regularly), i.e. https://ossci-raw-job-status.s3.amazonaws.com/log/14868893973: ``` onnx/test_op_consistency.py::TestOnnxModelOutputConsistency_opset13CPU::test_output_match_tile_cpu_bool SUBSKIP [0.0000s] (Logic not implemented for size 0 inputs in op.Reshape) [ 47%] onnx/test_op_consistency.py::TestOnnxModelOutputConsistency_opset13CPU::test_output_match_tile_cpu_bool SUBSKIP [0.0000s] (Logic not implemented for size 0 inputs in op.Reshape) [ 47%] ... onnx/test_op_consistency.py::TestOnnxModelOutputConsistency_opset13CPU::test_output_match_tile_cpu_bool SUBSKIP [0.0000s] (Logic not implemented for size 0 inputs in op.Reshape) [ 47%] onnx/test_op_consistency.py::TestOnnxModelOutputConsistency_opset13CPU::test_output_match_tile_cpu_bool PASSED [0.3136s] [ 47%] ``` The corresponding XML output is as follows https://paste.sh/b1DbSLJD#M-0WsXd9snjEVFh4ZsxPPIlv where `skipped` is a list of skipped messages instead of a dictionary. As we only care about gathering disabled tests stats in this script, the list of skipped messages can be safely ignored. ### Testing * Gathering disabled test stats works correctly when running under rerunning disabled tests mode https://github.com/pytorch/pytorch/actions/runs/5487829458/jobs/9999835911 * The command works locally for the above failed workflow (which is not a rerunning disabled tests workflow): ``` python3 -m tools.stats.check_disabled_tests --workflow-run-id "5488337480" --workflow-run-attempt 1 --repo "pytorch/pytorch" ... The following 0 tests should be re-enabled: The following 0 are still flaky: Writing 0 documents to S3 Done! 
``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/104803 Approved by: https://github.com/clee2000	2023-07-08 07:23:46 +00:00
Huy Do	e9f2921bff	Fix rerun disabled test uploading logic (#103476 ) After https://github.com/pytorch/pytorch/pull/102107, rerunning disabled tests only collects and runs disabled tests. A side effect of this change is that the skip message `Test is enabled but --rerun-disabled-tests verification mode is set, so only disabled tests are run` isn't in the test report anymore as these non-disabled tests are not going to be collected in the first place. This breaks the logic in the uploading script that depends on this string to know if the test report belongs to a rerunning disabled tests workflow. * This PR updates the logic in `is_rerun_disabled_tests` check to count the number of times a test is run instead. In rerunning disabled tests mode, a test is run 50 times by default and 15 times for distributed tests (to avoid timeout). Both these numbers are larger than the max number of retries a test can get normally (3 x 3) * This also removes the hacky `is_rerun_disabled_tests` check in `tools/stats/upload_test_stats.py` as rerun disabled tests reports are now very small (50 x the number of disabled tests) ### Testing * `test_gradgrad_nn_GroupNorm_cuda_float64` now shows up correctly https://github.com/pytorch/pytorch/issues/98678 ``` python3 -m tools.stats.check_disabled_tests --workflow-run-id 5229037746 --workflow-run-attempt 1 --repo "pytorch/pytorch" Using temporary directory: /var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpdojg5vq5 Downloading test-reports-test-default-1-4-linux.g5.4xlarge.nvidia.gpu_14154925022.zip Downloading test-reports-test-default-1-4-linux.g5.4xlarge.nvidia.gpu_14154925093.zip Downloading test-reports-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_14154925167.zip Downloading test-reports-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_14154925226.zip Downloading test-reports-test-default-3-4-linux.g5.4xlarge.nvidia.gpu_14154925295.zip Downloading test-reports-test-default-3-4-linux.g5.4xlarge.nvidia.gpu_14154925371.zip Downloading 
test-reports-test-default-4-4-linux.g5.4xlarge.nvidia.gpu_14154925453.zip Downloading test-reports-test-default-4-4-linux.g5.4xlarge.nvidia.gpu_14154925536.zip Downloading test-reports-test-slow-1-1-linux.2xlarge_14154853469.zip Downloading test-reports-test-slow-1-1-linux.rocm.gpu_14154932523.zip Downloading test-reports-test-slow-1-1-linux.rocm.gpu_14154932563.zip Downloading test-reports-test-slow-1-2-linux.4xlarge_14154873704.zip Downloading test-reports-test-slow-1-2-linux.g5.4xlarge.nvidia.gpu_14154931154.zip Downloading test-reports-test-slow-1-2-linux.g5.4xlarge.nvidia.gpu_14154931186.zip Downloading test-reports-test-slow-2-2-linux.4xlarge_14154873756.zip Downloading test-reports-test-slow-2-2-linux.g5.4xlarge.nvidia.gpu_14154931225.zip Downloading test-reports-test-slow-2-2-linux.g5.4xlarge.nvidia.gpu_14154931267.zip Extracting test-reports-test-default-1-4-linux.g5.4xlarge.nvidia.gpu_14154925022.zip to unzipped-test-reports-test-default-1-4-linux.g5.4xlarge.nvidia.gpu_14154925022 Extracting test-reports-test-default-1-4-linux.g5.4xlarge.nvidia.gpu_14154925093.zip to unzipped-test-reports-test-default-1-4-linux.g5.4xlarge.nvidia.gpu_14154925093 Extracting test-reports-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_14154925167.zip to unzipped-test-reports-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_14154925167 Extracting test-reports-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_14154925226.zip to unzipped-test-reports-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_14154925226 Extracting test-reports-test-default-3-4-linux.g5.4xlarge.nvidia.gpu_14154925295.zip to unzipped-test-reports-test-default-3-4-linux.g5.4xlarge.nvidia.gpu_14154925295 Extracting test-reports-test-default-3-4-linux.g5.4xlarge.nvidia.gpu_14154925371.zip to unzipped-test-reports-test-default-3-4-linux.g5.4xlarge.nvidia.gpu_14154925371 Extracting test-reports-test-default-4-4-linux.g5.4xlarge.nvidia.gpu_14154925453.zip to 
unzipped-test-reports-test-default-4-4-linux.g5.4xlarge.nvidia.gpu_14154925453 Extracting test-reports-test-default-4-4-linux.g5.4xlarge.nvidia.gpu_14154925536.zip to unzipped-test-reports-test-default-4-4-linux.g5.4xlarge.nvidia.gpu_14154925536 Extracting test-reports-test-slow-1-1-linux.2xlarge_14154853469.zip to unzipped-test-reports-test-slow-1-1-linux.2xlarge_14154853469 Extracting test-reports-test-slow-1-1-linux.rocm.gpu_14154932523.zip to unzipped-test-reports-test-slow-1-1-linux.rocm.gpu_14154932523 Extracting test-reports-test-slow-1-1-linux.rocm.gpu_14154932563.zip to unzipped-test-reports-test-slow-1-1-linux.rocm.gpu_14154932563 Extracting test-reports-test-slow-1-2-linux.4xlarge_14154873704.zip to unzipped-test-reports-test-slow-1-2-linux.4xlarge_14154873704 Extracting test-reports-test-slow-1-2-linux.g5.4xlarge.nvidia.gpu_14154931154.zip to unzipped-test-reports-test-slow-1-2-linux.g5.4xlarge.nvidia.gpu_14154931154 Extracting test-reports-test-slow-1-2-linux.g5.4xlarge.nvidia.gpu_14154931186.zip to unzipped-test-reports-test-slow-1-2-linux.g5.4xlarge.nvidia.gpu_14154931186 Extracting test-reports-test-slow-2-2-linux.4xlarge_14154873756.zip to unzipped-test-reports-test-slow-2-2-linux.4xlarge_14154873756 Extracting test-reports-test-slow-2-2-linux.g5.4xlarge.nvidia.gpu_14154931225.zip to unzipped-test-reports-test-slow-2-2-linux.g5.4xlarge.nvidia.gpu_14154931225 Extracting test-reports-test-slow-2-2-linux.g5.4xlarge.nvidia.gpu_14154931267.zip to unzipped-test-reports-test-slow-2-2-linux.g5.4xlarge.nvidia.gpu_14154931267 Downloading test-reports-runattempt1-test-slow-1-1-linux.rocm.gpu_14154932523.zip Downloading test-reports-runattempt1-test-slow-1-1-linux.rocm.gpu_14154932563.zip Extracting test-reports-runattempt1-test-slow-1-1-linux.rocm.gpu_14154932523.zip to unzipped-test-reports-runattempt1-test-slow-1-1-linux.rocm.gpu_14154932523 Extracting test-reports-runattempt1-test-slow-1-1-linux.rocm.gpu_14154932563.zip to 
unzipped-test-reports-runattempt1-test-slow-1-1-linux.rocm.gpu_14154932563 The following 32 tests should be re-enabled: test_huge_index (__main__.TestCuda) from test_cuda.py test_conv_bn_fuse_cpu (__main__.CpuTests) from inductor/test_torchinductor.py test_multi_threads (__main__.TestTorchrun) from backends/xeon/test_launch.py test_huge_index (__main__.TestCuda) from test_cuda_expandable_segments.py test_memory_timeline_no_id (__main__.TestMemoryProfilerE2E) from profiler/test_memory_profiler.py test_inverse_errors_large_cuda_float64 (__main__.TestLinalgCUDA) from test_linalg.py test_trace_dependencies (__main__.TestAnalyze) from test_package.py test_caching_pinned_memory (__main__.TestCuda) from test_cuda_expandable_segments.py test_graph_concurrent_replay (__main__.TestCuda) from test_cuda_expandable_segments.py test_module_attribute_mutation_violation_negative_1 (__main__.MutationExportTests) from dynamo/test_export_mutations.py test_module_attribute_mutation_violation_negative_2 (__main__.MutationExportTests) from dynamo/test_export_mutations.py test_module_attribute_mutation_violation_negative_4 (__main__.MutationExportTests) from dynamo/test_export_mutations.py test_vmapjvpall_linalg_lu_cuda_float32 (__main__.TestOperatorsCUDA) from functorch/test_ops.py test_vmapjvpvjp_linalg_lu_cuda_float32 (__main__.TestOperatorsCUDA) from functorch/test_ops.py test_Conv2d_no_bias_cuda_tf32 (__main__.TestNN) from test_nn.py test_save_graph_repro (__main__.TestAfterAot) from dynamo/test_after_aot.py test_doc_examples (__main__.TestTypeHints) from test_type_hints.py test_caching_pinned_memory (__main__.TestCuda) from test_cuda.py test_graph_concurrent_replay (__main__.TestCuda) from test_cuda.py test_non_contiguous_tensors_nn_ConvTranspose1d_cuda_complex32 (__main__.TestModuleCUDA) from test_modules.py test_pickle_nn_RNN_eval_mode_cuda_float64 (__main__.TestModuleCUDA) from test_modules.py test_op_has_batch_rule_nn_functional_conv_transpose3d_cuda_float32 
(__main__.TestVmapOperatorsOpInfoCUDA) from functorch/test_vmap.py test_geometric_kstest_cuda_float32 (__main__.TestTorchDeviceTypeCUDA) from test_torch.py test_profiler_experimental_tree_with_memory (__main__.TestProfilerTree) from profiler/test_profiler_tree.py test_fs_pool (__main__.TestMultiprocessing) from test_multiprocessing.py test_forward_mode_AD_linalg_lu_factor_ex_cuda_complex128 (__main__.TestFwdGradientsCUDA) from test_ops_fwd_gradients.py test_vjp_linalg_lu_cuda_float32 (__main__.TestOperatorsCUDA) from functorch/test_ops.py test_inplace_grad_fmod_cuda_float64 (__main__.TestBwdGradientsCUDA) from test_ops_gradients.py test_inplace_gradgrad_remainder_cuda_float64 (__main__.TestBwdGradientsCUDA) from test_ops_gradients.py test_bottleneck_cuda (__main__.TestBottleneck) from test_utils.py test_comprehensive_empty_strided_cuda_int32 (__main__.TestInductorOpInfoCUDA) from inductor/test_torchinductor_opinfo.py test_vmapvjpvjp_linalg_lu_cuda_float32 (__main__.TestOperatorsCUDA) from functorch/test_ops.py The following 11 are still flaky: test_transpose_with_norm (__main__.CPUReproTests) from inductor/test_cpu_repro.py, failing 215/215 test_compare_cpu_linalg_pinv_singular_cuda_float32 (__main__.TestCommonCUDA) from test_ops.py, failing 100/100 test_conv_bn_fuse_dynamic_shapes_cpu (__main__.DynamicShapesCodegenCpuTests) from inductor/test_torchinductor_codegen_dynamic_shapes.py, failing 115/115 test_lobpcg (__main__.TestAutograd) from test_autograd.py, failing 50/50 test_module_attribute_mutation_violation_negative_3 (__main__.MutationExportTests) from dynamo/test_export_mutations.py, failing 2/50 test_Conv2d_dilated_cuda_tf32 (__main__.TestNN) from test_nn.py, failing 1/50 test_grad_nn_GroupNorm_cuda_float64 (__main__.TestModuleCUDA) from test_modules.py, failing 50/50 test_index_add_correctness (__main__.TestTorch) from test_torch.py, failing 22/50 test_attn_cuda (__main__.TestMin) from functorch/test_dims.py, failing 1/50 test_open_device_registration 
(__main__.TestCppExtensionOpenRgistration) from test_cpp_extensions_open_device_registration.py, failing 50/50 test_gradgrad_nn_GroupNorm_cuda_float64 (__main__.TestModuleCUDA) from test_modules.py, failing 50/50 ``` * Uploading tests stats for rerunning disabled tests takes only half a minute ``` time python3 -m tools.stats.upload_test_stats --workflow-run-id 5229037746 --workflow-run-attempt 1 --head-branch main 31.94s user 2.94s system 44% cpu 1:19.07 total ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/103476 Approved by: https://github.com/clee2000	2023-06-13 17:07:40 +00:00
shaoyf42	7554c10899	Fix typos under tools directory (#97779 ) Fix typos under tools directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/97779 Approved by: https://github.com/clee2000, https://github.com/kit1980	2023-03-30 08:21:35 +00:00
PaliC	3df1a9baca	Upload external contribution data to s3 (#95747 ) Context: We want to create a metric panel to track external contributions to the PyTorch repo. This PR creates a daily job to track how many external contributions occurred the day before and uploads it to an s3 collection which is accessible by rockset. `upload_external_contrib_stats.py` is a python script which grabs the necessary stats from github and sticks them into an s3 bucket. It is used here to do daily uploads, but can generally be used for larger queries as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95747 Approved by: https://github.com/huydhn, https://github.com/kit1980	2023-03-02 21:57:28 +00:00
PyTorch MergeBot	06562529d2	Revert "Upload external contribution data to s3 (#95747 )" This reverts commit `f418e1f8b6`. Reverted https://github.com/pytorch/pytorch/pull/95747 on behalf of https://github.com/clee2000 due to broke lint on master, merge base is too old, https://github.com/pytorch/pytorch/actions/runs/4315881630/jobs/7531170401 `f418e1f8b6 (11721314649)`	2023-03-02 17:34:14 +00:00
PaliC	f418e1f8b6	Upload external contribution data to s3 (#95747 ) Context: We want to create a metric panel to track external contributions to the PyTorch repo. This PR creates a daily job to track how many external contributions occurred the day before and uploads it to an s3 collection which is accessible by rockset. `upload_external_contrib_stats.py` is a python script which grabs the necessary stats from github and sticks them into an s3 bucket. It is used here to do daily uploads, but can generally be used for larger queries as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95747 Approved by: https://github.com/huydhn, https://github.com/kit1980	2023-03-02 16:03:32 +00:00
Aaron Gokaslan	748bac8757	[BE]: Apply pyupgrade yield from and unit test alias upgrades (#94309 ) Applies some more harmless pyupgrades. This one gets rid of deprecated aliases in unit_tests and upgrades more yield for loops into yield from generators, which are more performant and propagate more information / exceptions from the original generator. This is the modern recommended way of forwarding generators. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94309 Approved by: https://github.com/albanD	2023-02-07 20:08:58 +00:00
Huy Do	b8d3afd886	Skip upload test stats for test reports from rerun disabled tests workflow (#89548 ) I have found the reason why uploading test stats fails for the rerun disabled tests workflow, for example https://github.com/pytorch/pytorch/actions/runs/3522896778/jobs/5917765699. The problem is that the pytest XML file is now too big to be processed quickly (x50 bigger). Unlike unittest, `pytest-flakefinder` used by rerun disabled tests for test_ops includes skipped messages multiple times (50 times by default, retrying and skipping). This slows down the upload test stats script too much (O(n)) because it tries to gather all the stats. On the other hand, `check_disabled_tests` doesn't suffer from the same issue because it ignores all these skipped messages. This is a quick fix to skip test reports from the rerun disabled tests workflow when trying to upload test stats. I'll try to fix this properly later in the way we use pytest-flakefinder. From what I see, a zipped test report from rerun disabled tests is only a few MB ([example](https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3521687954/1/artifact/test-reports-test-default-1-2-linux.2xlarge_9636028803.zip)), but after extraction it balloons into a much bigger XML file, from a dozen to a few hundred MB of text. The size of the zipped file is not a big immediate problem. ### Testing [3521687954](https://github.com/pytorch/pytorch/actions/runs/3521687954) is an example workflow with rerun disabled tests and mem leak check. The script can now finish when running locally: * `upload_test_stats` finishes around 3+ minutes ``` time python -m tools.stats.upload_test_stats --workflow-run-id 3521687954 --workflow-run-attempt 1 --head-branch master ... Writing 8925 documents to S3 Done! Writing 1760 documents to S3 Done! Writing 1675249 documents to S3 Done! 
python3 -m tools.stats.upload_test_stats --workflow-run-id 3521687954 1 185.69s user 12.89s system 75% cpu 4:22.82 total ``` * `check_disabled_tests` finishes within 3 minutes ``` time python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954 --workflow-run-attempt 1 --repo pytorch/pytorch ... python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954 1 154.19s user 4.17s system 97% cpu 2:42.50 total ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89548 Approved by: https://github.com/clee2000	2022-11-23 22:39:39 +00:00
Huy Do	573eaf1225	Analyze and upload disabled tests rerun to S3 (#89083 ) Analyze and upload disabled tests rerun to S3. Note that this only picks up `test-reports` from `rerun_disable_tests` workflows. ### Testing Running the script manually `python -m tools.stats.check_disabled_tests --workflow-run-id 3473068035 --workflow-run-attempt 1 --repo pytorch/pytorch` and see the files successfully uploaded to s3://ossci-raw-job-status/rerun_disabled_tests/3473068035/1 Rockset collection created https://console.rockset.com/collections/details/commons.rerun_disabled_tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/89083 Approved by: https://github.com/clee2000	2022-11-17 03:36:58 +00:00
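
The run-count check described in e9f2921bff (#103476) can be illustrated with a rough sketch. It assumes a report is reduced to a flat list of test identifiers, one entry per execution; the function and constant names below are hypothetical and are not taken from `tools/stats`. The thresholds are the ones quoted in the commit message: at most 3 x 3 runs normally, versus 15 or 50 runs in rerunning disabled tests mode.

```python
from collections import Counter
from typing import Iterable

# Hypothetical constant: the commit message states a test normally gets
# at most 3 x 3 runs, while rerun-disabled-tests mode runs each test
# 15 times (distributed) or 50 times (default).
MAX_RUNS_IN_NORMAL_MODE = 3 * 3


def looks_like_rerun_disabled_tests(test_ids: Iterable[str]) -> bool:
    """Guess whether a report comes from rerunning disabled tests.

    Assumption: in that mode only disabled tests are collected, so every
    test in the report should have run more often than normal retries allow.
    """
    run_counts = Counter(test_ids)
    if not run_counts:
        # An empty report carries no evidence either way; treat it as normal.
        return False
    return all(count > MAX_RUNS_IN_NORMAL_MODE for count in run_counts.values())
```

Requiring every test to exceed the threshold is a design choice of this sketch; a looser variant could accept a report as soon as any single test exceeds it.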

14 Commits
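
As a minimal sketch of the fix described in f334b54d7f (#104803), the snippet below ignores a `skipped` entry that is a list of messages and reads it only when it is a dictionary. It assumes each test case has already been parsed into a plain dict and that the message lives under a `message` key; both the helper name and that key are assumptions for illustration, not the script's actual code.

```python
from typing import Any, Dict, Optional


def get_skipped_message(test_case: Dict[str, Any]) -> Optional[str]:
    """Return the skip message of a parsed test case, if there is exactly one.

    Assumption: a single <skipped> element is parsed into a dict, while
    repeated elements (e.g. many SUBSKIP entries) are parsed into a list.
    """
    skipped = test_case.get("skipped")
    if isinstance(skipped, dict):
        # Hypothetical key name for the skip reason.
        return skipped.get("message")
    # A list of skipped messages (or no entry at all) is ignored, since
    # only disabled test stats matter here.
    return None
```

For example, `get_skipped_message({"skipped": [{"message": "a"}, {"message": "b"}]})` returns `None` instead of raising on the list.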