Fixes the string_view errors and relands the work. The previous changes in torch/csrc/utils/invalid_arguments.cpp were too aggressive and not thoroughly tested, so they are discarded.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110518
Approved by: https://github.com/ezyang
Summary:
Port x86 inline assembly to aarch64:
- Use `sp` instead of `%rsp` for the stack pointer; move it to the second caller-saved register `x1` instead of `%rsi`
- Use `x29` instead of `%rbp` for the base pointer; move it to the third caller-saved register `x2` instead of `%rdx` (a hedged sketch of the pattern is shown below)
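As an illustration of the mapping only (the patched file is not named here, and it pins `x1`/`x2` explicitly), a minimal sketch of reading the stack and frame pointers on both architectures:

```cpp
// Illustrative sketch; the real diff rewrites existing inline assembly and
// pins specific registers, which this generic version does not do.
#include <cstdint>
#include <cstdio>

static void capture_frame(uintptr_t* sp_out, uintptr_t* fp_out) {
  uintptr_t sp = 0, fp = 0;
#if defined(__x86_64__)
  asm volatile("mov %%rsp, %0\n\t"   // stack pointer
               "mov %%rbp, %1"       // base (frame) pointer
               : "=r"(sp), "=r"(fp));
#elif defined(__aarch64__)
  asm volatile("mov %0, sp\n\t"      // sp instead of %rsp
               "mov %1, x29"         // x29 (frame pointer) instead of %rbp
               : "=r"(sp), "=r"(fp));
#endif
  *sp_out = sp;
  *fp_out = fp;
}

int main() {
  uintptr_t sp = 0, fp = 0;
  capture_frame(&sp, &fp);
  std::printf("sp=%#llx fp=%#llx\n",
              static_cast<unsigned long long>(sp),
              static_cast<unsigned long long>(fp));
  return 0;
}
```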
Test Plan:
```
$ buck2 build fbcode//mode/opt fbcode//caffe2/torch/fb/model_transform/fx2trt/packaging:generate_merge_net_file
```
Reviewed By: jasonjk-park
Differential Revision: D47242468
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104707
Approved by: https://github.com/aaronenyeshi
In almost all cases this include is only needed for writing the output formatter, which
only uses `std::ostream`, so including `<ostream>` is sufficient.
The `<istream>` header is ~1000 lines, so the difference is non-trivial.
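For context, a minimal sketch of the pattern (hypothetical type names, not from the PR) showing why `<ostream>` is enough for an output formatter:

```cpp
// Hypothetical example. Before: #include <iostream> (which also drags in
// <istream>); after: only the header the formatter actually needs.
#include <cstdint>
#include <ostream>
#include <string>

struct Event {
  std::string name;
  int64_t duration_us;
};

// Writing a formatter only requires std::ostream, so <ostream> suffices.
inline std::ostream& operator<<(std::ostream& os, const Event& e) {
  return os << e.name << " (" << e.duration_us << "us)";
}
```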
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106914
Approved by: https://github.com/lezcano
torch.profiler.record_function is relatively slow; for example, in some benchmarks I was running, x.view_as(x) was ~2us, and ~16-17us when wrapped in a record_function context. The reasons for this are: dispatcher overhead from going through an op (the main source of overhead), python binding / python conversion overhead, and some overhead from the context manager.
This new implementation is faster, but it won't work with torchscript. Based on the benchmarks I was running, it adds 0.5-0.7us overhead per call when the profiler is turned off. To use it, you can just:
```python
with torch._C._profiler_manual._RecordFunctionFast("title"):
    torch.add(x, y)
```
It implements a context manager in python which directly calls the record_function utilities, instead of calling through an op.
* The context manager is implemented directly in python because the overhead from calling a python function seems non-negligible
* All the record_function calls and python object conversions are guarded by checks for whether the profiler is enabled. This seems to save a few hundred nanoseconds.
For more details about the experiments I ran to choose this implementation, see [my record_functions experiments branch](https://github.com/pytorch/pytorch/compare/main...davidberard98:pytorch:record-function-fast-experiments?expand=1).
This also adds a `torch.autograd.profiler._is_profiler_enabled` global variable that can be used to check whether a profiler is currently enabled. It's useful for further reducing the overhead, like this:
```python
if torch.autograd.profiler._is_profiler_enabled:
    with torch._C._profiler_manual._RecordFunctionFast("title"):
        torch.add(x, y)
else:
    torch.add(x, y)
```
On BERT_pytorch (CPU-bound model), if we add a record_function inside CachedAutotuning.run:
* Naive torch.profiler.record_function() is a ~30% slowdown
* Always wrapping with RecordFunctionFast causes a regression of ~2-4%.
* Guarding with an if statement: any regression is within noise
**Selected benchmark results**: these come from a 2.20GHz machine, a GPU build but only running CPU ops; running `x.view_as(x)` with various record_functions applied (with profiling turned off). For more detailed results see the "record_functions experiments branch" linked above (those results are on a different machine, but show the same patterns). Note that the results are somewhat noisy; assume 0.05-0.1us variation.
```
Baseline:: 1.7825262546539307 us # Just running x.view_as(x)
profiled_basic:: 13.600390434265137 us # torch.profiler.record_function(x) + view_as
precompute_manual_cm_rf:: 2.317216396331787 us # torch._C._profiler_manual._RecordFunctionFast(), if the context is pre-constructed + view_as
guard_manual_cm_rf:: 1.7994389533996582 us # guard with _is_profiler_enabled + view_as
```
Differential Revision: [D48421198](https://our.internmc.facebook.com/intern/diff/D48421198)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107195
Approved by: https://github.com/albanD, https://github.com/aaronenyeshi
I do this instead of pybind11 because I need a custom tp_dealloc to promptly free PyFrames. I also add GC traverse/clear support. This is required to avoid leaking memory from co_extra on code objects in some obscure situations. This is indirectly tested by #107388
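A rough sketch of what such a hand-written type looks like (the names here are hypothetical, not the type added by this PR); the point is that tp_dealloc, tp_traverse, and tp_clear are fully under our control, which pybind11 does not make easy:

```cpp
// Hand-rolled CPython type with custom dealloc and GC support (sketch only;
// tp_new/tp_alloc and module registration are omitted for brevity).
#include <Python.h>

struct FrameHolder {
  PyObject_HEAD
  PyObject* frame;  // strong reference we want to drop promptly
};

static int FrameHolder_traverse(PyObject* self, visitproc visit, void* arg) {
  Py_VISIT(reinterpret_cast<FrameHolder*>(self)->frame);
  return 0;
}

static int FrameHolder_clear(PyObject* self) {
  Py_CLEAR(reinterpret_cast<FrameHolder*>(self)->frame);
  return 0;
}

static void FrameHolder_dealloc(PyObject* self) {
  PyObject_GC_UnTrack(self);   // stop GC tracking before tearing down
  FrameHolder_clear(self);     // free the held frame promptly
  Py_TYPE(self)->tp_free(self);
}

static PyTypeObject FrameHolderType = {
    PyVarObject_HEAD_INIT(nullptr, 0)
    "example.FrameHolder",     /* tp_name */
    sizeof(FrameHolder),       /* tp_basicsize */
    0,                         /* tp_itemsize */
    FrameHolder_dealloc,       /* tp_dealloc */
};

static int init_frame_holder_type() {
  FrameHolderType.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC;
  FrameHolderType.tp_traverse = FrameHolder_traverse;
  FrameHolderType.tp_clear = FrameHolder_clear;
  FrameHolderType.tp_free = PyObject_GC_Del;  // GC types need a GC-aware free
  return PyType_Ready(&FrameHolderType);
}
```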
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107438
Approved by: https://github.com/albanD
PR #90689 replaces NVTX with NVTX3. However, the torch::nvtoolsext target is created only when the third-party NVTX is used.
This is clearly a logical error. We now move the creation code out of the branch to cover all cases, which should fix the issues reported in the comments of #90689.
It would be better to move the configurations of the failed FRL jobs into CI tests so that we can find such issues before merging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97582
Approved by: https://github.com/peterbell10
Summary:
## About Sync Events
For CUDA profiling mode, we can enable tracing CUDA synchronization events.
* This feature captures synchronization events in CUDA, including 1) context/device sync, 2) stream sync, 3) CUDA event sync, and 4) CUDA stream wait event (inter-stream synchronization).
* We add this flag using the profiler's experimental config option.
* This PR relies on the 7b003638c6 change in pytorch/kineto.
## Usage
Just set the `enable_cuda_sync_events` option in `_ExperimentalConfig`
```
from torch.autograd.profiler import profile, _ExperimentalConfig

with profile(use_kineto=True, use_cuda=True,
             experimental_config=_ExperimentalConfig(enable_cuda_sync_events=True),
             ) as prof:
    workload()
```
**Please wait for the PyTorch GitHub repo to point to 7b003638c6 or a later commit in Kineto.**
Test Plan:
## Unit Test
Added a unit test:
```
buck2 test mode/dev-nosan caffe2/test:profiler --local-only -- test_profiler_cuda_sync_events
```
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. Build failure 0
https://www.internalfb.com/intern/testinfra/testrun/281475298097379
Reviewed By: davidberard98
Differential Revision: D46244591
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105187
Approved by: https://github.com/aaronenyeshi
When using torch.profiler.profile(record_shapes=True), the profiler tries to collect `tensor.sizes()` to put shape information into the profile trace.
When dynamic shapes are turned on, tensors with symbolic sizes can sometimes appear. In that case, `tensor.sizes()` can throw an assertion. This PR checks whether a tensor has symbolic shapes and skips collecting shape info in that case.
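A hedged sketch of the guard (member names are what the C++ tensor API exposes as far as I know; the actual change lives in the profiler's collection code and may differ):

```cpp
// Sketch: skip size collection for tensors with symbolic sizes, since calling
// .sizes() on them can trip an assertion under dynamic shapes.
#include <torch/torch.h>
#include <vector>

std::vector<std::vector<int64_t>> collect_sizes(const std::vector<at::Tensor>& inputs) {
  std::vector<std::vector<int64_t>> out;
  for (const at::Tensor& t : inputs) {
    if (t.defined() && !t.unsafeGetTensorImpl()->has_symbolic_sizes_strides()) {
      out.push_back(t.sizes().vec());  // concrete sizes: safe to record
    } else {
      out.emplace_back();              // symbolic or undefined: record nothing
    }
  }
  return out;
}
```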
Differential Revision: [D47082414](https://our.internmc.facebook.com/intern/diff/D47082414)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104320
Approved by: https://github.com/aaronenyeshi
Now, when you do an inplace mutation and the view is naughty, you get this message:
```
RuntimeError: A view was created in no_grad mode and is being modified inplace with grad mode enabled. Given that this use case is ambiguous and error-prone, it is forbidden. You can clarify your code by moving both the view and the inplace either both inside the no_grad block (if you don't want the inplace to be tracked) or both outside (if you want the inplace to be tracked). To find out where this view was allocated, run your entire forward region under anomaly mode (torch.autograd.detect_anomaly(check_nan=False)).
```
When you run under anomaly mode, you get:
```
RuntimeError: A view was created in no_grad mode and is being modified inplace with grad mode enabled. Given that this use case is ambiguous and error-prone, it is forbidden. You can clarify your code by moving both the view and the inplace either both inside the no_grad block (if you don't want the inplace to be tracked) or both outside (if you want the inplace to be tracked). This view was allocated at:
File "/data/users/ezyang/c/pytorch/test/test_autograd.py", line 4299, in arglebargle
File "/data/users/ezyang/c/pytorch/test/test_autograd.py", line 4306, in test_anomaly_gives_view_stack
File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/case.py", line 591, in run
File "/data/users/ezyang/c/pytorch/torch/testing/_internal/common_utils.py", line 2266, in _run_with_retry
File "/data/users/ezyang/c/pytorch/torch/testing/_internal/common_utils.py", line 2337, in run
File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/case.py", line 650, in __call__
File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/suite.py", line 122, in run
File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/suite.py", line 84, in __call__
File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/suite.py", line 122, in run
File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/suite.py", line 84, in __call__
File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/runner.py", line 184, in run
File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/main.py", line 271, in runTests
File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/main.py", line 101, in __init__
File "/data/users/ezyang/c/pytorch/torch/testing/_internal/common_utils.py", line 894, in run_tests
File "/data/users/ezyang/c/pytorch/test/test_autograd.py", line 11209, in <module>
```
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103185
Approved by: https://github.com/zdevito
1. `torch.autograd.profiler` interface parameters changed: using `self.use_device` instead of `self.use_cuda` facilitates access by other devices; it will be integrated in a subsequent PR.
2. Modify `ProfilerEventStub` (aka `std::shared_ptr<CUevent_st>`) to `ProfilerVoidEventStub` (aka `std::shared_ptr<void>`) so that `ProfilerStubs` can be inherited by any `{device}Methods`.
3. In addition, `cuda_event_start_` is renamed to `device_event_start_`; CUDA and other devices can use this event pointer if needed.
4. Custom device support using legacy profiling (add the `ProfilerState::KINETO_PRIVATEUSE1_FALLBACK` option).
5. Add the `privateuse1Stubs` registration.
(Result parsing and test cases are added in a subsequent PR.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101554
Approved by: https://github.com/aaronenyeshi
Now that we have full C++17 support, we can use if constexpr in some identified cases.
### 🤖 Generated by Copilot at df4c16d
The pull request improves the performance, readability, and consistency of various function templates in the `ATen` and `torch` modules by using `constexpr` keywords and C++17 features. It also fixes some type conversion and overflow issues for different input and output types. The changes affect the code for distributions, BLAS, batch normalization, embedding bag, random number generation, vectorized operations, cuBLAS, XNNPACK, CUTLASS, and shape inference. The affected files include `DistributionsHelper.h`, `vec256_int.h`, `vec512_int.h`, `BlasKernel.cpp`, `IndexKernel.cpp`, `EmbeddingBag.cpp`, `Normalization.cpp`, `rng_test.h`, `vec_test_all_types.h`, `TransformationHelper.h`, `CUDABlas.cpp`, `DistributionKernels.cpp`, `DistributionTemplates.h`, `RangeFactories.cu`, `RangeFactories.cpp`, `qconv.cpp`, `StructuredSparseLinearCUTLASS.cu`, `vec_test_all_types.cpp`, and `shape_inference.cpp`.
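For illustration, a small self-contained example (not taken from the PR) of the kind of rewrite `if constexpr` enables, where the branch not matching the template parameter is discarded at compile time instead of being handled with type traits or tag dispatch:

```cpp
#include <cstdint>
#include <iostream>
#include <type_traits>

// One template handles both floating-point and integral outputs; the branch
// that does not match T is never instantiated.
template <typename T>
T uniform_within(T lo, T hi, double unit) {
  if constexpr (std::is_floating_point_v<T>) {
    return static_cast<T>(lo + unit * (hi - lo));
  } else {
    return static_cast<T>(lo + static_cast<int64_t>(unit * (hi - lo)));
  }
}

int main() {
  std::cout << uniform_within<float>(0.f, 1.f, 0.25) << "\n";  // prints 0.25
  std::cout << uniform_within<int>(0, 10, 0.25) << "\n";       // prints 2
  return 0;
}
```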
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102471
Approved by: https://github.com/Skylion007, https://github.com/malfet
**TL;DR:** This re-introduces links between backward kernels and their corresponding forward kernels.
<img width="1020" alt="Screenshot 2023-05-26 at 7 25 22 PM" src="https://github.com/pytorch/pytorch/assets/5067123/02571b59-859c-4c9e-b3ef-121ef3159812">
In the example above, you can see there are two such flows: one for aten::add and one for aten::binary_cross_entropy.
### Details
Forward/backward links were added in https://github.com/pytorch/pytorch/pull/62553, but then disabled in https://github.com/pytorch/pytorch/pull/72904 due to segfaults (e.g. https://github.com/pytorch/pytorch/issues/69443).
Between now and when the fwd-bwd links were disabled, there's been a lot of refactoring; so this PR updates the implementation:
* Use a raw profiler::impl::Result instead of a KinetoEvent
* Move the implementation to collection.cpp, where the TraceWrapper is currently handled.
* Sort the events before processing, because they aren't always in chronological order
* There can now be more than one event in the backward pass that matches the sequenceNr-threadID pair. The implementation needed to be updated to avoid showing multiple endpoints for a given sequenceNr-threadID pair ([ptr to where the bwd sequenceNr-threadID pair is duplicated](6e3e3dd477/torch/csrc/profiler/collection.cpp (L398-L399))). A simplified sketch of this pairing is shown below.
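A simplified, self-contained sketch of that pairing (hypothetical event struct; the real code operates on profiler::impl::Result in collection.cpp and handles many more details):

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <map>
#include <utility>
#include <vector>

struct Event {
  int64_t start_us;
  int64_t sequence_nr;  // autograd sequence number
  uint64_t fwd_tid;     // thread id of the forward op that created the node
  bool is_backward;
};

int main() {
  std::vector<Event> events = {
      {50, 1, 1, true},   // backward kernel
      {10, 1, 1, false},  // forward kernel
      {55, 1, 1, true},   // second backward event with the same (seq, tid)
  };

  // Sort first: collected events are not guaranteed to be chronological.
  std::sort(events.begin(), events.end(),
            [](const Event& a, const Event& b) { return a.start_us < b.start_us; });

  // Emit one flow start per (seq, tid) on the forward side, and only one flow
  // end on the backward side even if several backward events share the key.
  std::map<std::pair<int64_t, uint64_t>, bool> bwd_done;
  for (const Event& e : events) {
    const auto key = std::make_pair(e.sequence_nr, e.fwd_tid);
    if (!e.is_backward) {
      std::cout << "flow start: seq=" << key.first << " tid=" << key.second << "\n";
    } else if (!bwd_done[key]) {
      bwd_done[key] = true;
      std::cout << "flow end:   seq=" << key.first << " tid=" << key.second << "\n";
    }
  }
  return 0;
}
```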
Next, we need to verify that https://github.com/pytorch/pytorch/issues/69443 is fixed. Running the repro no longer errors. Looking further into the details of the issue, it seems like the handling of the [raw linkedActivity pointer (old code from 2021)](6089dcac48/libkineto/src/output_json.cpp (L283)) resulted in the segfault. Now the linked activity doesn't appear to be used anywhere in output_json.cpp, so the issue should be fixed.
### Testing
#### 1. unit test
`test_profiler_fwd_bwd_link` was un-skipped. It was modified to match the new implementation.
#### 2. https://github.com/pytorch/pytorch/issues/69443
I ran the repro in https://github.com/pytorch/pytorch/issues/69443 and verified there were no segfaults.
#### 3. Duplicate flow IDs
When forward-backward connections were first introduced, gpu-cpu async links had not been introduced. There's a possibility that gpu-cpu links and fwd-bwd links could interfere if their IDs overlap.
I manually tested this in chrome://tracing; I edited a file so that a gpu-cpu link had the same ID as one of the fwd-bwd connections. The chrome tracing UI continued showing both types of links.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102424
Approved by: https://github.com/aaronenyeshi
Many ops take as inputs scalars or scalar lists which are important to understand the properties of the op. For example, convolution ops' behavior and output shapes often depend on padding and strides, which are provided as scalars or lists of scalars. This will record scalar lists when record_inputs=True.
Details:
During collection (and this was true before this PR as well), we serialize values and tensor metadata into an InputOutputEncoder. After collection occurs, we deserialize these values to attach the information to each of the events.
This PR does this:
- Adds support for serializing scalar lists during collection / serialization
- Adds an extra field called "Concrete Args"
- Splits up the deserialization process into two steps: one for generating "input shapes" and one for generating "concrete args". We split up input shapes and concrete args to avoid interrupting any previous workflows that relied on the specific data in the input shapes category; additionally, it's just a better description. Note that single scalars will remain in the "input shapes" category, as they were already in that category in the past. A simplified sketch of this split is shown below.
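A simplified sketch of the split (hypothetical types; the real encoder is the profiler's InputOutputEncoder and records much richer metadata):

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <variant>
#include <vector>

struct TensorMeta { std::vector<int64_t> sizes; };
using ScalarList = std::vector<int64_t>;
using Input = std::variant<TensorMeta, int64_t, ScalarList>;

int main() {
  // e.g. a convolution: one input tensor plus stride/padding scalar lists.
  std::vector<Input> inputs = {TensorMeta{{8, 3, 224, 224}}, ScalarList{2, 2}, ScalarList{1, 1}};

  std::vector<std::string> input_shapes;   // tensors and single scalars stay here
  std::vector<std::string> concrete_args;  // scalar lists go to the new field
  auto render = [](const std::vector<int64_t>& v) {
    std::string s = "[";
    for (int64_t x : v) s += std::to_string(x) + ",";
    return s + "]";
  };
  for (const auto& in : inputs) {
    if (const auto* t = std::get_if<TensorMeta>(&in)) {
      input_shapes.push_back(render(t->sizes));
    } else if (const auto* scalar = std::get_if<int64_t>(&in)) {
      input_shapes.push_back(std::to_string(*scalar));
    } else if (const auto* list = std::get_if<ScalarList>(&in)) {
      concrete_args.push_back(render(*list));
    }
  }
  std::cout << "input shapes: " << input_shapes.size()
            << ", concrete args: " << concrete_args.size() << "\n";
  return 0;
}
```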
Differential Revision: [D45798431](https://our.internmc.facebook.com/intern/diff/D45798431)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100593
Approved by: https://github.com/aaronenyeshi
Summary: This allows an internal use case to register a callback that can vary over time instead of being a static value over the lifetime of the program.
Test Plan: ran the test listed above ^^.
Differential Revision: D45805139
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101292
Approved by: https://github.com/aaronenyeshi
Summary: We don't think the performance impact of recording concrete shapes is significant; but it's good to have a knob for turning it off quickly in case it has a large performance impact.
Test Plan:
Ran D45681838. It prints the state of that "concrete inputs" boolean. I ran it before and after canarying a change to `pytorch/kineto:pytorch_record_concrete_inputs`; before, it returns true; after, it returns false.
Note that D45681838 had to add `service` on the main function. That's because we need to `initFacebook` in order to use jks.
Differential Revision: D45680162
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101043
Approved by: https://github.com/aaronenyeshi
This PR caches the addr -> Frame information across calls to symbolize,
and also keeps the addr2line symbolizing processes around once requested.
This makes symbolizing frames that have been seen before nearly instant,
and makes looking up addresses in libraries that addr2line has already
loaded faster.
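A rough sketch of the two caches (hypothetical class; the real implementation drives actual addr2line subprocesses, which is stubbed out here):

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>

struct Frame {
  std::string function;
  std::string file;
  uint64_t line = 0;
};

class Symbolizer {
 public:
  const Frame& symbolize(uint64_t addr) {
    auto it = frame_cache_.find(addr);
    if (it != frame_cache_.end()) {
      return it->second;  // previously seen address: nearly instant
    }
    const std::string lib = library_for(addr);
    Frame f = query_addr2line(lib, addr);  // reuses the long-lived process
    return frame_cache_.emplace(addr, std::move(f)).first->second;
  }

 private:
  // One addr2line process per library, spawned on first use and kept alive.
  struct Addr2LineProc { /* pipes to a running `addr2line -C -f -e <lib>` */ };

  std::string library_for(uint64_t /*addr*/) {
    return "<unknown>";  // real code maps the address to a loaded shared object
  }

  Frame query_addr2line(const std::string& lib, uint64_t addr) {
    (void)procs_[lib];             // spawn on first use, then reuse
    return Frame{"?", lib, addr};  // placeholder result
  }

  std::unordered_map<uint64_t, Frame> frame_cache_;  // addr -> Frame memo
  std::unordered_map<std::string, Addr2LineProc> procs_;
};

int main() {
  Symbolizer s;
  s.symbolize(0x1234);                   // first lookup: queries addr2line
  const Frame& f = s.symbolize(0x1234);  // second lookup: cache hit
  std::cout << f.file << ":" << f.line << "\n";
  return 0;
}
```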
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99670
Approved by: https://github.com/ezyang