pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Edward Yang	dda071587f	Revert "Make distributed modules importable even when backend not built (#159889 )" (#162568 ) This reverts commit `a0d026688c`. Revert "Always build USE_DISTRIBUTED. (#160449)" This reverts commit `d80297a684`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162568 Approved by: https://github.com/huydhn	2025-09-10 04:29:42 +00:00
Edward Yang	d80297a684	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-08 19:10:36 +00:00
PyTorch MergeBot	1e0656f063	Revert "Always build USE_DISTRIBUTED. (#160449 )" This reverts commit `de893e96c7`. Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to internal changes breaks import checks, see [D81845053](https://www.internalfb.com/diff/D81845053) ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3264887002))	2025-09-08 07:04:36 +00:00
Edward Yang	de893e96c7	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-05 20:15:11 +00:00
PyTorch MergeBot	adae7f66aa	Revert "Always build USE_DISTRIBUTED. (#160449 )" This reverts commit `c37103234a`. Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Breaking internal build rules, see D81756619 ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3259430011))	2025-09-05 18:58:47 +00:00
Edward Yang	c37103234a	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-04 19:43:17 +00:00
PyTorch MergeBot	b7dad7dd49	Revert "Always build USE_DISTRIBUTED. (#160449 )" This reverts commit `90b08643c3`. Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Already discussed with @ezyang about the internal quirks and errors ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3254219358))	2025-09-04 15:25:07 +00:00
Edward Yang	90b08643c3	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-03 07:33:55 +00:00
PyTorch MergeBot	4e42aa8ffc	Revert "Always build USE_DISTRIBUTED. (#160449 )" This reverts commit `b7034e9c92`. Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, can't be landed with forward fix due to internal tooling problems ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3246689684))	2025-09-02 20:28:42 +00:00
Edward Yang	b7034e9c92	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-01 23:00:21 +00:00
Shivam Raikundalia	63fbc738dc	[Easy/Profiler] Add last entry to truncated values (#148576 ) Summary: Since the ranks of a PG are usually in a consecutive range it is useful to print the last values when truncating metadata Test Plan: Manually changed truncate length to 2 and ran 4 gpu graph to get the following trace: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devgpu003.rva5.facebook.com/rank-1.Mar_05_09_48_21.1280355.pt.trace.json.gz&bucket=gpu_traces Differential Revision: D70637461 Pull Request resolved: https://github.com/pytorch/pytorch/pull/148576 Approved by: https://github.com/davidberard98	2025-03-06 01:14:15 +00:00
cyy	e9f6045e80	[15/N] Fix extra warnings brought by clang-tidy-17 (#143100 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/143100 Approved by: https://github.com/Skylion007	2024-12-14 03:24:10 +00:00
Darshan Sanghani	28efc17d2c	[pytorch/profiler] Honor escape quotes arg in a profiler metadata log formatter (#141527 ) (#141626 ) Summary: We were ignoring the with_escaped_quotes param in format_list inline function iin utils.cpp in the case where we had to truncate a list of more than kTruncatelength items. In that case we would truncate a list into a string but always return it with an escaped quotes wrapping it. this will cause issues if this string is meant to be added to other lists which will also go through formatting. Leading to cases like `"["[a, b, c, ...]"]"`. now the above will be well formatted as `"[[a, b, c, ...]]"` as the escape quote requests will be honored. Differential Revision: D66521676 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141626 Approved by: https://github.com/sraikund16	2024-12-03 20:42:57 +00:00
cyy	96be048f06	[1/N] Avoid copy in std::get (#141812 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/141812 Approved by: https://github.com/Skylion007	2024-12-01 03:53:35 +00:00
Darshan Sanghani	32e93dfa92	[pytorch/profiler] Profiler NCCL metadata can now contain collective Input and Ouput Tensor addrs (#140637 ) Summary: Studying memory access patterns is the primary use cases. Differential Revision: D65918359 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140637 Approved by: https://github.com/briancoutinho	2024-11-19 22:22:16 +00:00
PyTorch MergeBot	97d995a0d3	Revert "[pytorch/profiler] Profiler NCCL metadata can now contain collective Input and Ouput Tensor addrs (#139837 )" This reverts commit `3e277eb9fe`. Reverted https://github.com/pytorch/pytorch/pull/139837 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/139837#issuecomment-2473466607))	2024-11-13 12:26:43 +00:00
Darshan Sanghani	3e277eb9fe	[pytorch/profiler] Profiler NCCL metadata can now contain collective Input and Ouput Tensor addrs (#139837 ) Studying memory access patterns is the primary use cases. Internal: The data may be used to find the % of operators that may cause alignment related overhead. Differential Revision: [D64413699](https://our.internmc.facebook.com/intern/diff/D64413699/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139837 Approved by: https://github.com/sraikund16	2024-11-13 04:57:16 +00:00
cyyever	456c87c8a2	[8/N] Fix extra warnings brought by clang-tidy-17 (#139151 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139151 Approved by: https://github.com/ezyang Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2024-10-30 14:20:08 +00:00
PyTorch MergeBot	375dcb960f	Revert "Avoid some dangling reference warnings (#132535 )" This reverts commit `f3d7a02716`. Reverted https://github.com/pytorch/pytorch/pull/132535 on behalf of https://github.com/clee2000 due to broke some internal builds D64479234 ([comment](https://github.com/pytorch/pytorch/pull/132535#issuecomment-2419983509))	2024-10-17 16:23:36 +00:00
Isuru Fernando	f3d7a02716	Avoid some dangling reference warnings (#132535 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132535 Approved by: https://github.com/aaronenyeshi	2024-10-16 13:41:12 +00:00
Shivam Raikundalia	3a1239a248	[Profiler] Harden Record Function Kwargs (#135365 ) Summary: In S445839, we had HTA break because of the "stream" parameter that was added to gpu traces. This brought up discussions regarding hardening our post processing of said inputs as to not break JSON schema as well as downstream tools. For this reason, this diff does the following. 1. Only allow int, double, bool and string values to be processed as kwinputs for JSON output. We can handle lists if needed in the future. 2. Make sure that any boolean is lowercase when a string so that the JSON does not break when parsing it 3. Force stream parameter to be an int Test Plan: Added unit tests to ensure that the list of requirements above is true for kwargs only. Differential Revision: D62304843 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135365 Approved by: https://github.com/aaronenyeshi	2024-09-10 18:44:05 +00:00
Shivam Raikundalia	195ac85fb6	[Profiler] Allow kwinputs to be non-string values (#134893 ) Summary: When we process keyword arguments in profiler today we assume that all values will be strings. This breaks HTA because it assumes that "stream" and other values similar to it will be ints. To fix this we will only put quotes around strings for ivalues. Test Plan: Add chrome trace export in unit tests and check that stream does not have quotes around it Differential Revision: D62056059 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134893 Approved by: https://github.com/sanrise, https://github.com/izaitsevfb	2024-09-04 16:34:10 +00:00
Shengbao Zheng	89bdd9c18f	[kineto] populate src/dst rank for p2p (#130812 ) Summary: as title populate src/dst rank (global rank) for p2p kernel Differential Revision: D59794535 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130812 Approved by: https://github.com/aaronenyeshi	2024-07-25 21:10:57 +00:00
Aaron Gokaslan	83eedf66b9	Update libfmt submodule to 11.0.1 (#130628 ) Update libfmt to 11.0.1 reopen of https://github.com/pytorch/pytorch/pull/129962. Requires a kineto update and moves fmt::join into a separate include so added it where necessary. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130628 Approved by: https://github.com/aaronenyeshi	2024-07-16 06:12:11 +00:00
Shivam Raikundalia	6f275ae4d0	Add kwinputs to Kineto Traces (#130373 ) Summary: On the autograd side of things, we are currently saving the kwinputs but we aren't doing anything with them on the profiler side. This diff enables the use of the kwinputs for both FunctionEvents and Chrome Traces. Test Plan: Added unit testing for both chrome traces and FunctionEvents. Used RecordFunctionFast to test kwinputs since test already had kwargs being passed in but not tested. Differential Revision: D59472345 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130373 Approved by: https://github.com/davidberard98	2024-07-14 00:40:59 +00:00
Shivam Raikundalia	13316a8d46	[Profiler] Add Rank to NCCL Debug Info (#129528 ) Summary: We need to add the Rank information to the NCCL debug data so that kineto can infer all the necessary process group info such that on-demand can create distributedInfo metadata. Kineto portion will be added in a follow up diff Test Plan: Tested in D58736045, this diff just splits the kineto and profiler instances Differential Revision: D59028819 Pull Request resolved: https://github.com/pytorch/pytorch/pull/129528 Approved by: https://github.com/aaronenyeshi	2024-06-26 21:24:05 +00:00
cyy	3d88c618d5	Concat namespaces in torch/csrc/profiler and other fixes (#127266 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127266 Approved by: https://github.com/soulitzer	2024-05-28 15:21:32 +00:00
briancoutinho	de42af4b00	Add coms metadata to execution trace (ET) (#126317 ) Add Execution Trace communication collective meta data. For specification see https://github.com/pytorch/pytorch/issues/124674 New fields look like ``` { "id": 80, "name": "record_param_comms", "ctrl_deps": 79, "inputs": {"values": [[[78,74,0,100,4,"cuda:0"]],21,["0","default_pg"],0,"allreduce",[],[],0,1,2], "shapes": [[[100]],[],[[],[]],[],[],[],[],[],[],[]], "types": ["GenericList[Tensor(float)]","Int","Tuple[String,String]","Int","String","GenericList[]","GenericList[]","Int","Int","Int"]}, "outputs": {"values": [[[78,74,0,100,4,"cuda:0"]]], "shapes": [[[100]]], "types": ["GenericList[Tensor(float)]"]}, "attrs": [{"name": "rf_id", "type": "uint64", "value": 53},{"name": "fw_parent", "type": "uint64", "value": 0},{"name": "seq_id", "type": "int64", "value": -1},{"name": "scope", "type": "uint64", "value": 0},{"name": "tid", "type": "uint64", "value": 2},{"name": "fw_tid", "type": "uint64", "value": 0},{"name": "op_schema", "type": "string", "value": ""},{"name": "kernel_backend", "type": "string", "value": ""},{"name": "kernel_file", "type": "string", "value": ""}, {"name": "collective_name", "type": "string", "value": "allreduce"}, {"name": "dtype", "type": "string", "value": "Float"}, {"name": "in_msg_nelems", "type": "uint64", "value": 100}, {"name": "out_msg_nelems", "type": "uint64", "value": 100}, {"name": "in_split_size", "type": "string", "value": "[]"}, {"name": "out_split_size", "type": "string", "value": "[]"}, {"name": "global_rank_start", "type": "uint64", "value": 0}, {"name": "global_rank_stride", "type": "uint64", "value": 1}, {"name": "pg_name", "type": "string", "value": "0"}, {"name": "pg_desc", "type": "string", "value": "default_pg"}, {"name": "pg_size", "type": "uint64", "value": 2}] } ``` ## Unit Test Added a new unit test to check the execution trace collected has right attributes `touch /tmp/barrier && TEMP_DIR="/tmp" BACKEND="nccl" WORLD_SIZE="2" python test/distributed/test_distributed_spawn.py -v TestDistBackendWithSpawn.test_ddp_profiling_execution_trace` ``` STAGE:2024-05-08 17:39:10 62892:62892 ActivityProfilerController.cpp:316] Completed Stage: Warm Up STAGE:2024-05-08 17:39:10 62893:62893 ActivityProfilerController.cpp:316] Completed Stage: Warm Up STAGE:2024-05-08 17:39:11 62892:62892 ActivityProfilerController.cpp:322] Completed Stage: Collection STAGE:2024-05-08 17:39:11 62893:62893 ActivityProfilerController.cpp:322] Completed Stage: Collection STAGE:2024-05-08 17:39:11 62892:62892 ActivityProfilerController.cpp:326] Completed Stage: Post Processing STAGE:2024-05-08 17:39:11 62893:62893 ActivityProfilerController.cpp:326] Completed Stage: Post Processing [rank1]:[W508 17:39:12.329544411 reducer.cpp:1399] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) [rank0]:[W508 17:39:12.329626774 reducer.cpp:1399] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) [rank0]:[W508 17:39:12.339239982 execution_trace_observer.cpp:825] Enabling Execution Trace Observer [rank1]:[W508 17:39:12.339364516 execution_trace_observer.cpp:825] Enabling Execution Trace Observer STAGE:2024-05-08 17:39:12 62892:62892 ActivityProfilerController.cpp:316] Completed Stage: Warm Up STAGE:2024-05-08 17:39:12 62893:62893 ActivityProfilerController.cpp:316] Completed Stage: Warm Up [rank1]:[W508 17:39:12.352452400 execution_trace_observer.cpp:837] Disabling Execution Trace Observer STAGE:2024-05-08 17:39:12 62893:62893 ActivityProfilerController.cpp:322] Completed Stage: Collection [rank0]:[W508 17:39:12.354019014 execution_trace_observer.cpp:837] Disabling Execution Trace Observer STAGE:2024-05-08 17:39:12 62893:62893 ActivityProfilerController.cpp:326] Completed Stage: Post Processing STAGE:2024-05-08 17:39:12 62892:62892 ActivityProfilerController.cpp:322] Completed Stage: Collection STAGE:2024-05-08 17:39:12 62892:62892 ActivityProfilerController.cpp:326] Completed Stage: Post Processing Execution trace saved at /tmp/tmpy01ngc3w.et.json Execution trace saved at /tmp/tmptf8543k4.et.json ok ---------------------------------------------------------------------- ``` Also run profilerunit test `touch /tmp/barrier && TEMP_DIR="/tmp" BACKEND="nccl" WORLD_SIZE="2" python test/distributed/test_distributed_spawn.py -v TestDistBackendWithSpawn.test_ddp_profiling_torch_profiler` ``` STAGE:2024-05-08 18:24:22 1926775:1926775 ActivityProfilerController.cpp:316] Completed Stage: Warm Up STAGE:2024-05-08 18:24:22 1926774:1926774 ActivityProfilerController.cpp:316] Completed Stage: Warm Up STAGE:2024-05-08 18:24:24 1926774:1926774 ActivityProfilerController.cpp:322] Completed Stage: Collection STAGE:2024-05-08 18:24:24 1926775:1926775 ActivityProfilerController.cpp:322] Completed Stage: Collection STAGE:2024-05-08 18:24:24 1926774:1926774 ActivityProfilerController.cpp:326] Completed Stage: Post Processing STAGE:2024-05-08 18:24:24 1926775:1926775 ActivityProfilerController.cpp:326] Completed Stage: Post Processing [rank1]:[W508 18:24:24.508622236 reducer.cpp:1399] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) [rank0]:[W508 18:24:24.508622241 reducer.cpp:1399] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) STAGE:2024-05-08 18:24:24 1926774:1926774 ActivityProfilerController.cpp:316] Completed Stage: Warm Up STAGE:2024-05-08 18:24:24 1926775:1926775 ActivityProfilerController.cpp:316] Completed Stage: Warm Up STAGE:2024-05-08 18:24:24 1926774:1926774 ActivityProfilerController.cpp:322] Completed Stage: Collection STAGE:2024-05-08 18:24:24 1926775:1926775 ActivityProfilerController.cpp:322] Completed Stage: Collection STAGE:2024-05-08 18:24:24 1926774:1926774 ActivityProfilerController.cpp:326] Completed Stage: Post Processing STAGE:2024-05-08 18:24:24 1926775:1926775 ActivityProfilerController.cpp:326] Completed Stage: Post Processing Trace saved to /tmp/tmpdrw_cmcu.json Trace saved to /tmp/tmpnio7ec9j.json ok ---------------------------------------------------------------------- Ran 1 test in 19.772s OK ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/126317 Approved by: https://github.com/yoyoyocmu, https://github.com/sanrise	2024-05-17 19:08:55 +00:00
Richard Barnes	ed327876f5	[codemod] `c10:optional` -> `std::optional` (#126135 ) Generated by running the following from PyTorch root: ``` find . -regex ".*\.$cpp\\|h\\|cu\\|hpp\\|cc\\|cxx$$" \| grep -v "build/" \| xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/' ``` `c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135 Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi	2024-05-14 19:35:51 +00:00
briancoutinho	c312cd8890	add simple test for nccl metadata (#125317 ) Add a few test cases to verify newly added NCCL metadata in profiler events The test looks at the following blocks record_param_comms ``` { "ph": "X", "cat": "cpu_op", "name": "record_param_comms", "pid": 2840966, "tid": 2844581, "ts": 2424859.045, "dur": 203.866, "args": { "Collective name": "allreduce", "Process Group Description": "default_pg", "dtype": "Float", "In msg nelems": 100, "Global rank start": 0, "Group size": 2, "Process Group Ranks": "[0, 1]", "Record function id": 0, "Out msg nelems": 100, "Global rank stride": 1, "Process Group Name": "0", } } ``` ## Unit test ``` >$ touch /tmp/barrier && TEMP_DIR="/tmp" BACKEND="nccl" WORLD_SIZE="2" python test/distributed/test_distributed_spawn.py -v TestDistBackendWithSpawn.test_ddp_profiling_torch_profiler test_ddp_profiling_torch_profiler (__main__.TestDistBackendWithSpawn.test_ddp_profiling_torch_profiler) ... NCCL version 2.20.5+cuda12.0 STAGE:2024-05-01 16:41:15 2840966:2840966 ActivityProfilerController.cpp:316] Completed Stage: Warm Up STAGE:2024-05-01 16:41:15 2840965:2840965 ActivityProfilerController.cpp:316] Completed Stage: Warm Up STAGE:2024-05-01 16:41:17 2840965:2840965 ActivityProfilerController.cpp:322] Completed Stage: Collection STAGE:2024-05-01 16:41:17 2840966:2840966 ActivityProfilerController.cpp:322] Completed Stage: Collection STAGE:2024-05-01 16:41:17 2840966:2840966 ActivityProfilerController.cpp:326] Completed Stage: Post Processing STAGE:2024-05-01 16:41:17 2840965:2840965 ActivityProfilerController.cpp:326] Completed Stage: Post Processing STAGE:2024-05-01 16:41:18 2840966:2840966 ActivityProfilerController.cpp:316] Completed Stage: Warm STAGE:2024-05-01 16:41:18 2840965:2840965 ActivityProfilerController.cpp:316] Completed Stage: Warm Up STAGE:2024-05-01 16:41:18 2840965:2840965 ActivityProfilerController.cpp:322] Completed Stage: Collection STAGE:2024-05-01 16:41:18 2840966:2840966 ActivityProfilerController.cpp:322] Completed Stage: Collection STAGE:2024-05-01 16:41:18 2840965:2840965 ActivityProfilerController.cpp:326] Completed Stage: Post Processing STAGE:2024-05-01 16:41:18 2840966:2840966 ActivityProfilerController.cpp:326] Completed Stage: Post Processing Trace saved to /tmp/tmpvwivp7mo.json Trace saved to /tmp/tmpvwvsc1fy.json ok ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/125317 Approved by: https://github.com/LucasLLC, https://github.com/kwen2501	2024-05-14 06:20:50 +00:00
Shengbao Zheng	9fa922c2ed	[profiler] Log process group name instead of pg uid (#124035 ) Summary: As part of the work of unifying process group identifier, log <group_name, group_desc>, instead of pg uid in profiler. - group_name remains as the unique identifier, e.g. “0”, "1" - group_desc will be the user specified name, e.g. "fsdp". Reviewed By: aaronenyeshi, kwen2501 Differential Revision: D55610682 Pull Request resolved: https://github.com/pytorch/pytorch/pull/124035 Approved by: https://github.com/aaronenyeshi	2024-04-15 21:49:06 +00:00
Shengbao Zheng	5b9e5f854b	[profiler] Log process group id instead of backend id (#120475 ) Summary: https://github.com/pytorch/pytorch/pull/104373 introduced backend_id > an unique ID for the actual backend object, this is also exposed in record_param_comms, so we can correlate these collectives with the right backend object. However, it is inconvenient to correlate collectives with backend id. Instead, using pg id(uid) to correlate directly is a better solution. This PR change the ID information exposted in record_param_comms from backend_id to pg_id. Differential Revision: D53558257 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120475 Approved by: https://github.com/aaronenyeshi	2024-02-29 15:04:33 +00:00
cyy	87c6cd2f00	[1/N] Replace std::tie with structural binding (#119774 ) This PR replaces some std::tie calls with structural binding from C++17. This not only makes the code more compact, but also has some performance gain. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119774 Approved by: https://github.com/albanD, https://github.com/malfet	2024-02-14 09:25:04 +00:00
Pavan Balaji	ffc826bf10	[nccl-pg] Store PG global rank information in tracing logs (#115730 ) Storing the list of global ranks associated with each PG allows us to correlate traces across different ranks. Test Plan: OSS CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/115730 Approved by: https://github.com/fduwjj	2023-12-14 00:59:17 +00:00
Pavan Balaji	aa390cec21	[profiler] Fix description to use nelems rather than size (#114735 ) We were storing the number of elements in the tensor, rather than the actual bytes. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/114735 Approved by: https://github.com/aaronenyeshi, https://github.com/yoyoyocmu, https://github.com/kwen2501, https://github.com/fduwjj	2023-12-01 06:21:47 +00:00
Scott Wolchok	165f4f6ccf	[PyTorch] Redirect c10::optional to std::optional (#101995 ) We have C++17 now! I am intentionally dropping the `c10::optional<c10::ArrayRef>` size optimization. It was intended to improve dispatch, but thanks to D34602980 / #70864 we don't use `optional<ArrayRef>` in function arguments anymore anyway. Differential Revision: [D46079028](https://our.internmc.facebook.com/intern/diff/D46079028/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101995 Approved by: https://github.com/malfet, https://github.com/Skylion007, https://github.com/ezyang	2023-11-30 02:46:41 +00:00
Yue Dong	7a1314c548	[Kineto] Fix the Chrome trace loading issue with all_to_all input split length > 30 (#113392 ) Summary: This change fixes the Chrome trace loading issue with all_to_all input split length > 30. Now when the `all_to_all` input split size is larger than 30 we truncate the content and adding `...` at the end, which caused trouble when loading with Chrome trace. Test Plan: Trace with length = 2: - Link: https://fburl.com/perfdoctor/b94u4x82 {F1145436735} Looking into the json file: ``` Before: "In split size": [6058496, 5942784] After "In split size": "[6058496, 5942784]" ``` Reviewed By: aaronenyeshi Differential Revision: D51167843 Pull Request resolved: https://github.com/pytorch/pytorch/pull/113392 Approved by: https://github.com/aaronenyeshi	2023-11-10 05:03:18 +00:00
Yue Dong	52e2b87d00	[Kineto][NCCL][5/n] Populate in/out split size info for all_to_all from CPU to CUDA kernel (#112308 ) Summary: This diff populates all_to_all input and out split size from CPU op to GPU kernel when valid. Test Plan: Trace example: - For non all_to_all collective functions: https://fburl.com/perfdoctor/4nobsu15 https://pxl.cl/3GNVb - For all_to_all: https://fburl.com/perfdoctor/f418goys https://pxl.cl/3H2nd Differential Revision: D50762093 Pull Request resolved: https://github.com/pytorch/pytorch/pull/112308 Approved by: https://github.com/aaronenyeshi	2023-11-07 09:50:37 +00:00
Yuqing Jiang	2c3ab60506	[profiler] skip flop compute for Nested tensor (#112767 ) Summary: Since nested tensor doesn't have size(), when profiler with_flops is turned on, it throws exception in saveExtraArgs(). It is tricky to support flop computation for Nested tensor because it has dynamic shape. So skip the flop compute for Nested tensor for now instead of throwing exception. Test Plan: Used profiler with NT, the log shows this warning instead of throwing. ```/torch/nested/_internal/nested_tensor.py:205: UserWarning: Failed to save extra arguments for flops computation of op aten::add with input[0] as nested tensor. (Triggered internally at fbcode/caffe2/torch/csrc/profiler/util.cpp:433.)``` Differential Revision: D50919789 Pull Request resolved: https://github.com/pytorch/pytorch/pull/112767 Approved by: https://github.com/aaronenyeshi	2023-11-03 17:44:00 +00:00
Aaron Enye Shi	63c089b09d	[c10] Move profiler clock to libc10 for timestamps (#111972 ) Summary: Move the profiler's Approximate Clock from libtorch to libc10. The main reason is to allow c10 features to get time. The clock is using TSC when available for performance. CUDA Caching Allocator's implementation of memory snapshot will add the timestamps to memory events with this same clock in subsequent diff. Test Plan: CI Differential Revision: D50601935 Pulled By: aaronenyeshi Pull Request resolved: https://github.com/pytorch/pytorch/pull/111972 Approved by: https://github.com/davidberard98	2023-10-27 16:18:40 +00:00
Yue Dong	ed15fa7cc2	[Kineto][NCCL][3/n] Get the NCCL communication info from PARAM_COMMS_INFO (#111846 ) This diff enables the functionality to get the NCCL communication metadata from `c10::DebugInfoKind::PARAM_COMMS_INFO` available in `ThreadLocalDebugInfo`. To make the overhead lighweight and avoid comparing the function name on each op, we add the method `bool isNcclMeta()`, which decided during initialization. Differential Revision: [D50439211](https://our.internmc.facebook.com/intern/diff/D50439211/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/111846 Approved by: https://github.com/aaronenyeshi ghstack dependencies: #111842, #111843	2023-10-25 20:35:06 +00:00
Yuqing Jiang	56659844f9	[profiler] Show shapes for lists of tensors in chrome traces #109263 (#109751 ) Summary: https://github.com/pytorch/pytorch/issues/109263 Show the shape of tensorlist when the length is < 30. Test Plan: {F1097707985} and unit tests Reviewed By: davidberard98 Differential Revision: D49351902 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109751 Approved by: https://github.com/davidberard98	2023-09-26 01:03:54 +00:00
cyy	1157b4393b	Add const reference and std::move in opportunities detected by clang-tidy (#105815 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105815 Approved by: https://github.com/Skylion007	2023-07-25 12:28:14 +00:00
cyy	77f2883c41	[Reland2] fix missing-prototypes warnings in torch_cpu (Part 4) (#102228 ) This PR relands the changes introduced in PR https://github.com/pytorch/pytorch/pull/100849. The old PR turnd nnc_* functions into static. We now add declarations for them and hope that inter builds will pass. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102228 Approved by: https://github.com/albanD	2023-06-02 22:04:44 +00:00
PyTorch MergeBot	32ce06a5ab	Revert "[Reland] fix missing-prototypes warnings in torch_cpu (Part 4) (#101949 )" This reverts commit `4f2c007a1b`. Reverted https://github.com/pytorch/pytorch/pull/101949 on behalf of https://github.com/osalpekar due to As noted in @izaitsevfb's comment, we are still seeing linker errors, this time due to `nnc_prepacked_linear_clamp_run` being made a static function. ([comment](https://github.com/pytorch/pytorch/pull/101949#issuecomment-1560226880))	2023-05-23 22:53:47 +00:00
cyy	4f2c007a1b	[Reland] fix missing-prototypes warnings in torch_cpu (Part 4) (#101949 ) This PR relands the changes introduced in PR #100849. The old PR turnd nnc_aten_embedding into a static function, however, it is actually used in torch/csrc/jit/tensorexpr/operators/misc.cpp. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101949 Approved by: https://github.com/albanD	2023-05-22 10:53:07 +00:00
PyTorch MergeBot	498c34e8e8	Revert " fix missing-prototypes warnings in torch_cpu (Part 4) (#100849 )" This reverts commit `c2f28d1c1d`. Reverted https://github.com/pytorch/pytorch/pull/100849 on behalf of https://github.com/izaitsevfb due to fails internal Meta builds, including fbcode and android, see D46009888: ld.lld: error: undefined symbol: nnc_aten_embedding ([comment](https://github.com/pytorch/pytorch/pull/100849#issuecomment-1555105800))	2023-05-19 19:05:15 +00:00
cyy	c2f28d1c1d	fix missing-prototypes warnings in torch_cpu (Part 4) (#100849 ) This PR fixes more missing-prototypes violations in the torch_cpu source following PRs #100053, #100147 and #100245 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100849 Approved by: https://github.com/albanD	2023-05-18 03:49:45 +00:00
David Berard	935100cbde	[profiler] When record_inputs=True, record scalar lists of length <= 30 (#100593 ) Many ops take as inputs scalars or scalar lists which are important to understand the properties of the op. For example, convolution ops' behavior and output shapes often depend on padding and strides, which are provided as scalars of lists of scalars. This will record scalar lists when record_inputs=True. Details: During collection (and this was true before this PR as well), we serialize values and tensor metadata into an InputOutputEncoder. After collection occurs, we deserialize these values to attach the information to each of the events. This PR does this: - Adds support for serializing scalar lists during collection / serialization - Adds an extra field called "Concrete Args" - Splits up the deserialization process into two steps - one for generating "input shapes" and one for generating "concrete args". We split up input shapes and concrete args to avoid interrupting any previous workflows that relied on the specific data in the input shapes category; additionally, it's just a better description. Note that single scalars will remain in the "input shapes" category as they were already in that category in the past. Differential Revision: [D45798431](https://our.internmc.facebook.com/intern/diff/D45798431) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100593 Approved by: https://github.com/aaronenyeshi	2023-05-16 07:58:46 +00:00
Aaron Enye Shi	98d3612e48	[Profiler] Enable SOFT_ASSERT to log Invariant Violation to Kineto (#92872 ) Summary: Record the Soft assert to Kineto. Test Plan: Internal CI Tests. Differential Revision: D42219145 Pulled By: aaronenyeshi Pull Request resolved: https://github.com/pytorch/pytorch/pull/92872 Approved by: https://github.com/robieta	2023-02-09 20:36:25 +00:00

1 2

68 Commits