pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Kai Londenberg	cb24013b5b	Fix assertion failure in pytorch profiler (#143940 ) Summary: Attempt to fix the following exception which occurred when profiling a Pytorch model ( Meta-internal LLM ) that also involved a ThreadPoolExecutor in the background: ``` Exception Found: !stack.empty() INTERNAL ASSERT FAILED at "fbcode/caffe2/torch/csrc/autograd/profiler_python.cpp":987, please report a bug to PyTorch. Python replay stack is empty. ``` The root cause of this issue seems to be that a thread call stack can be empty, which is asserted to not be empty. I fixed this with some minimal changes to profiler_python.cpp Approach: * Ensuring that the stack in question is not empty before trying to pop from it. Test Plan: * Tested manually on a reproducible scenario where the assertion failure was otherwise triggered ( repro too large to include here ). The assertion failure disappears. * CI Differential Revision: D67691558 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143940 Approved by: https://github.com/Skylion007, https://github.com/sraikund16	2024-12-31 01:43:04 +00:00
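The approach described in the commit above (check that the stack is non-empty before popping) can be sketched as follows. This is a minimal illustration, not the actual change in profiler_python.cpp; the helper name `pop_if_not_empty` is hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical helper (not from profiler_python.cpp): pops the top of a
// per-thread replay stack, returning std::nullopt when the stack is empty
// instead of tripping an assertion.
template <typename T>
std::optional<T> pop_if_not_empty(std::vector<T>& stack) {
  if (stack.empty()) {
    return std::nullopt;  // nothing to pop; caller decides how to proceed
  }
  T top = std::move(stack.back());
  stack.pop_back();
  return top;
}
```

A caller would treat `std::nullopt` as "no matching call on this thread" and skip the event rather than abort.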
cyy	dca443835e	Enable more readability-redundant checks (#143963 ) They help simplify code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143963 Approved by: https://github.com/albanD	2024-12-30 14:49:33 +00:00
cyy	075905b7bd	[14/N] Fix extra warnings brought by clang-tidy-17 (#141644 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/141644 Approved by: https://github.com/ezyang Co-authored-by: Eli Uriegas <1700823+seemethere@users.noreply.github.com>	2024-12-13 06:22:13 +00:00
PyTorch MergeBot	2f0fe82f6d	Revert "[14/N] Fix extra warnings brought by clang-tidy-17 (#141644 )" This reverts commit `24a5a2ef25`. Reverted https://github.com/pytorch/pytorch/pull/141644 on behalf of https://github.com/clee2000 due to failing internally D67112938 ([comment](https://github.com/pytorch/pytorch/pull/141644#issuecomment-2539602023))	2024-12-12 17:43:36 +00:00
cyy	24a5a2ef25	[14/N] Fix extra warnings brought by clang-tidy-17 (#141644 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/141644 Approved by: https://github.com/ezyang	2024-12-11 18:40:42 +00:00
cyy	7d98b3dcee	[3/N] Apply bugprone-unchecked-optional-access (#142442 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/142442 Approved by: https://github.com/albanD	2024-12-11 01:39:10 +00:00
cyy	40fb738197	Use Wextra-semi (#140236 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/140236 Approved by: https://github.com/ezyang	2024-11-13 02:15:16 +00:00
Richard Barnes	fddabc6e0b	C10_UNUSED to [[maybe_unused]] (#6357 ) (#138364 ) Summary: Pull Request resolved: https://github.com/pytorch/executorch/pull/6357 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138364 Approved by: https://github.com/Skylion007, https://github.com/eqy	2024-10-19 13:17:43 +00:00
Xuehai Pan	8962610247	[BE][clang-format] make macro `PyObject_HEAD_INIT(type)` and `PyVarObject_HEAD_INIT(type, size)` have its own line (#136949 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136949 Approved by: https://github.com/albanD, https://github.com/eqy ghstack dependencies: #136945	2024-10-02 18:39:22 +00:00
Shivam Raikundalia	9c2d119194	[Profiler/CPU] Add API for Dynamic Activity Toggling [3/n] (#133353 ) Summary: In this diff, we add the CPU activity implementation of being able to dynamically toggle profiling in between steps. To do this we remove the callbacks for Torch Ops and add them back in when an enable call is made. This diff also adds some support code for doing the same in python; however, the python stack comes with its own set of complications when enabling this feature. For one, we get into a scenario where the python stack during the toggle never gets an exit because the tracing gets turned off, which makes for some tricky post processing. For this reason, we can leave the python dynamic toggling off for now and revisit if there is enough demand. Test Plan: Got the following tracing by disabling torch and cuda ops: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Aug_13_13_03_02.606577.pt.trace.json.gz&bucket=gpu_traces Differential Revision: D61221497 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133353 Approved by: https://github.com/sanrise, https://github.com/aaronenyeshi	2024-08-16 16:36:57 +00:00
cyy	929d2f8253	[3/N] Fix clang-tidy warnings in torch/csrc/autograd (#133389 ) Follows #133295 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133389 Approved by: https://github.com/Skylion007	2024-08-16 00:57:54 +00:00
cyy	f4dcf2ae93	[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301 Approved by: https://github.com/ezyang, https://github.com/r-barnes	2024-07-08 07:03:53 +00:00
PyTorch MergeBot	846bb30e13	Revert "[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301 )" This reverts commit `bd72e28314`. Reverted https://github.com/pytorch/pytorch/pull/128301 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it fails XLA build `bd72e28314`. Please rebase your PR before relanding because I think the failure is hidden by an unrelated broken trunk XLA failure from your current base commit ([comment](https://github.com/pytorch/pytorch/pull/128301#issuecomment-2169035822))	2024-06-15 01:58:20 +00:00
cyy	bd72e28314	[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301 Approved by: https://github.com/ezyang	2024-06-14 23:21:01 +00:00
cyy	9538bf4e7c	[2/N] Remove inclusion of c10/util/string_utils.h (#128372 ) Follows #128300. Pull Request resolved: https://github.com/pytorch/pytorch/pull/128372 Approved by: https://github.com/aaronenyeshi	2024-06-12 01:18:20 +00:00
Richard Barnes	ed327876f5	[codemod] `c10:optional` -> `std::optional` (#126135 ) Generated by running the following from PyTorch root: ``` find . -regex ".*\.$cpp\\|h\\|cu\\|hpp\\|cc\\|cxx$$" \| grep -v "build/" \| xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/' ``` `c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135 Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi	2024-05-14 19:35:51 +00:00
Scott Wolchok	165f4f6ccf	[PyTorch] Redirect c10::optional to std::optional (#101995 ) We have C++17 now! I am intentionally dropping the `c10::optional<c10::ArrayRef>` size optimization. It was intended to improve dispatch, but thanks to D34602980 / #70864 we don't use `optional<ArrayRef>` in function arguments anymore anyway. Differential Revision: [D46079028](https://our.internmc.facebook.com/intern/diff/D46079028/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101995 Approved by: https://github.com/malfet, https://github.com/Skylion007, https://github.com/ezyang	2023-11-30 02:46:41 +00:00
Aaron Enye Shi	63c089b09d	[c10] Move profiler clock to libc10 for timestamps (#111972 ) Summary: Move the profiler's Approximate Clock from libtorch to libc10. The main reason is to allow c10 features to get time. The clock is using TSC when available for performance. CUDA Caching Allocator's implementation of memory snapshot will add the timestamps to memory events with this same clock in subsequent diff. Test Plan: CI Differential Revision: D50601935 Pulled By: aaronenyeshi Pull Request resolved: https://github.com/pytorch/pytorch/pull/111972 Approved by: https://github.com/davidberard98	2023-10-27 16:18:40 +00:00
cyy	d58a91b2a6	[4/N] Move remaining c10::variant calls to std::variant (#110382 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/110382 Approved by: https://github.com/Skylion007	2023-10-02 23:52:04 +00:00
fwenguang	c4f2b6dbd2	[profiler] use PyCFunction_Check to check both PyCMethod_Type and PyC… (#110002 ) At https://github.com/pytorch/pytorch/blob/main/torch/csrc/autograd/profiler_python.cpp#L1096, when `what` is PyTrace_C_CALL, Py_TYPE(arg) can only be PyCFunction_Type before Python 3.9. In Python 3.9 or later, Py_TYPE(arg) can also be PyCMethod_Type. PyCMethod_Type is a subtype of PyCFunction_Type (see `f2eaa92b0c/Objects/methodobject.c (L372)`), so PyCFunction_Check should be used to check arg->ob_type. Fixes #109877 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110002 Approved by: https://github.com/ezyang	2023-09-25 20:17:25 +00:00
cyy	75b954b715	[4/N] Enable clang-tidy in torch/csrc/autograd (#109455 ) The PR enables clang-tidy checks in torch/csrc/autograd. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109455 Approved by: https://github.com/Skylion007	2023-09-17 17:11:50 +00:00
cyy	a14d30d8d1	[1/N] apply clang-tidy in torch/csrc/autograd (#109032 ) This PR begins a new series of patches for enabling clang-tidy checks in torch/csrc/autograd. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109032 Approved by: https://github.com/albanD, https://github.com/Skylion007	2023-09-15 23:28:43 +00:00
cyy	36b8ca4e48	[2/N] apply clang-tidy in torch/csrc/autograd (#109277 ) This PR follows the work of PR #109032. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109277 Approved by: https://github.com/albanD	2023-09-15 00:39:12 +00:00
cyy	e4f3e5434f	[Reland] Elimates c10::guts::to_string (#108748 ) Reland of PR #108480, after relanding another blocking PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108748 Approved by: https://github.com/huydhn	2023-09-07 13:35:17 +00:00
PyTorch MergeBot	8da04e023e	Revert "Eliminate c10::guts::to_string (#108480 )" This reverts commit `4146be192e`. Reverted https://github.com/pytorch/pytorch/pull/108480 on behalf of https://github.com/huydhn due to Sorry for reverting this, but this is needed to keep trunk green after https://github.com/pytorch/pytorch/pull/108479 was reverted. Both will need to be relanded ([comment](https://github.com/pytorch/pytorch/pull/108480#issuecomment-1707067595))	2023-09-05 18:04:53 +00:00
cyy	4146be192e	Eliminate c10::guts::to_string (#108480 ) This PR replaces c10::guts::to_string with std::to_string. The major part of the changes is using void* as the optimizer state key, since the string is used only for serialization and using pointers as hashing keys is more efficient than using a string. Some other guts functions in the affected source files are also replaced. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108480 Approved by: https://github.com/Skylion007	2023-09-04 08:12:53 +00:00
Scott Wolchok	99f68d56ee	[PyTorch] Delete c10::guts::if_constexpr (#101991 ) Now that we have C++17, we should not need this any more. Differential Revision: [D46078335](https://our.internmc.facebook.com/intern/diff/D46078335/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101991 Approved by: https://github.com/r-barnes, https://github.com/Skylion007	2023-05-23 23:19:35 +00:00
Taylor Robie	d09cd15216	[Profiler] Defer recording startup python events (take 2) (#91684 ) This is my commandeer of https://github.com/pytorch/pytorch/pull/82154 with a couple extra fixes. The high level idea is that when we start profiling we see python frames which are currently executing, but we don't know what system TID created them. So instead we defer the TID assignment, and then during post processing we peer into the future and use the system TID of the next call on that Python TID. As an aside, it turns out that CPython does some bookkeeping (`ee821dcd39/Include/cpython/pystate.h (L159-L165)`, thanks @dzhulgakov for the pointer), but you'd have to do some extra work at runtime to know how to map their TID to ours so for now I'm going to stick to what I can glean from post processing alone. As we start observing more threads it becomes more important to be principled about how we start up and shut down. (Since threads may die while the profiler is running.) #82154 had various troubles with segfaults that wound up being related to accessing Python thread pointers which were no longer alive. I've tweaked the startup and shutdown interaction with the CPython interpreter and it should be safer now. Differential Revision: [D42336292](https://our.internmc.facebook.com/intern/diff/D42336292/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/91684 Approved by: https://github.com/chaekit	2023-02-11 18:44:00 +00:00
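The deferred TID assignment described in the commit above ("peer into the future and use the system TID of the next call on that Python TID") can be sketched as a reverse pass over recorded events. This is a simplified illustration under assumed data structures; the `Event` struct and `resolve_deferred_tids` function are hypothetical and do not mirror the real profiler types.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical event record: frames that were already executing when
// profiling started have no system TID yet.
struct Event {
  std::uint64_t python_tid;
  std::optional<std::uint64_t> system_tid;
};

// Walk the events from last to first, remembering for each Python TID the
// system TID of the nearest later event, and fill in the deferred entries.
void resolve_deferred_tids(std::vector<Event>& events) {
  std::unordered_map<std::uint64_t, std::uint64_t> next_known;
  for (auto it = events.rbegin(); it != events.rend(); ++it) {
    if (it->system_tid.has_value()) {
      next_known[it->python_tid] = *it->system_tid;
    } else {
      auto found = next_known.find(it->python_tid);
      if (found != next_known.end()) {
        it->system_tid = found->second;
      }
      // Otherwise no later call was observed on this Python TID; leave unset.
    }
  }
}
```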
Taylor Robie	4c6a7faec5	[Profiler] Use RAII wrapper to manage refcounts during python tracer startup. (#91646 ) Refcounting is hard. (Citation needed.) https://github.com/pytorch/pytorch/pull/81242 introduced a corner case where we would over incref when breaking out due to max (128) depth. https://github.com/pytorch/pytorch/pull/85847 ostensibly fixed a segfault, but in actuality was over incref-ing because PyEval_GetFrame returns a borrowed reference while `PyFrame_GetBack` returns a strong reference. Instead of squinting really hard at the loops, it's much better to use the RAII wrapper and do the right thing by default. I noticed the over incref issue because of a memory leak where Tensors captured by the closure of a function would be kept alive by zombie frames. Differential Revision: [D42184394](https://our.internmc.facebook.com/intern/diff/D42184394/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/91646 Approved by: https://github.com/albanD	2023-02-10 00:28:18 +00:00
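The borrowed-versus-strong reference distinction in the commit above is the kind of bookkeeping an RAII wrapper handles by default. The sketch below uses a stand-in `Obj` type with a plain integer refcount rather than the CPython API; `OwnedRef`, `steal`, and `borrow` are hypothetical names, not the wrapper PyTorch actually uses.

```cpp
#include <cassert>
#include <utility>

// Stand-in for a refcounted object (a PyObject in the real code).
struct Obj {
  int refcount = 1;
};
inline void incref(Obj* o) { ++o->refcount; }
inline void decref(Obj* o) { --o->refcount; }

// Hypothetical RAII owner: holds exactly one strong reference and releases
// it in the destructor, so early loop exits cannot leak or over-release.
class OwnedRef {
 public:
  // Take over a strong reference the caller already owns (no incref).
  static OwnedRef steal(Obj* o) { return OwnedRef(o); }
  // Wrap a borrowed reference by acquiring our own strong reference.
  static OwnedRef borrow(Obj* o) {
    if (o != nullptr) {
      incref(o);
    }
    return OwnedRef(o);
  }

  OwnedRef(OwnedRef&& other) noexcept
      : ptr_(std::exchange(other.ptr_, nullptr)) {}
  OwnedRef& operator=(OwnedRef&& other) noexcept {
    if (this != &other) {
      reset();
      ptr_ = std::exchange(other.ptr_, nullptr);
    }
    return *this;
  }
  OwnedRef(const OwnedRef&) = delete;
  OwnedRef& operator=(const OwnedRef&) = delete;
  ~OwnedRef() { reset(); }

  Obj* get() const { return ptr_; }

 private:
  explicit OwnedRef(Obj* o) : ptr_(o) {}
  void reset() {
    if (ptr_ != nullptr) {
      decref(ptr_);
      ptr_ = nullptr;
    }
  }
  Obj* ptr_;
};
```

With this shape, a result that is a borrowed reference would be wrapped with `borrow`, and a result that is already a strong reference with `steal`, so the count stays balanced whichever way a loop exits.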
Aaron Gokaslan	8c8cd9539d	Add missing moves to torch autograd (#92772 ) Applies some additional std::move functions to torch/csrc/autograd to opportunities that were found via static analysis. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92772 Approved by: https://github.com/ezyang	2023-01-24 02:01:52 +00:00
Aaron Gokaslan	77c2a8a11f	Clang-Tidy: Improve ctors by removing unnecessary copies and initializations (#91538 ) Apply clang-tidy fixups to prefer member initializer and modernize-pass-by-value. This is mostly a noop, but it should make a few ctors slightly more readable and more efficient. Also drops in some missing moves that prevent a lot of unnecessary copying. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91538 Approved by: https://github.com/ezyang	2022-12-31 07:19:30 +00:00
Aaron Gokaslan	553b592824	Clang-Tidy: use modern for each loops and transparent functors (#91449 ) This applies some more clang-tidy fixups. Particularly, this applies the modernize loops and modernize-use-transparent-functors checks. Transparent functors are less error prone since you don't have to worry about accidentally specifying the wrong type and are newly available as of C++17. Modern foreach loops tend to be more readable and can be more efficient to iterate over since the loop condition is removed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91449 Approved by: https://github.com/ezyang	2022-12-29 23:37:51 +00:00
Aaron Gokaslan	3916d7a575	Apply modernize-use-emplace to aten, c10, torch (#91077 ) Apply clang-tidy check modernize-use-emplace. This is slightly more efficient by using an inplace constructor and is the recommended style in parts of the codebase covered by clang-tidy. This just manually applies the check to rest of the codebase. Pinging @ezyang as this is related to my other PRs he reviewed like #89000 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91077 Approved by: https://github.com/ezyang	2022-12-19 07:49:56 +00:00
Taylor Robie	6e6f929b2c	[Profiler] Restructure inputs and capture TensorLists. (#87825 ) This PR unifies and rationalizes some of the input representation in Result. The current approach of storing separate types in separate vectors is tedious for two types (Tensors and scalars), but would be even more annoying with the addition of TensorLists. A similar disconnection exists with sizes and strides which the user is also expected to zip with tensor_metadata. I simplified things by moving inputs to a variant and moving sizes and strides into TensorMetadata. This also forced collection of sizes and strides in python tracer which helps to bring it in line with op profiling. Collection of TensorLists is fairly straightforward; `InputOutputEncoder` already has a spot for them (I actually collected them in the original TorchTidy prototype) so it was just a matter of plumbing things through. Differential Revision: [D40734451](https://our.internmc.facebook.com/intern/diff/D40734451/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87825 Approved by: https://github.com/slgong-fb, https://github.com/chaekit	2022-11-08 21:48:43 +00:00
Taylor Robie	b16b5fb802	[Profiler] Hold weak reference to prevent TensorImpl address reuse during profiling. (#87244 ) A recurring problem with assigning Tensor IDs is that we want to preserve identity when storage changes but we don't observe TensorImpl destruction so identity assignment is not robust to the ABA problem with respect to TensorImpl. ~TensorImpl is far too hot to instrument; even adding a call to a no-op function in a different compilation unit increases overhead by tens of percent. (OSS builds do not have any sort of LTO.) Fortunately there is a solution. A PyTorch Tensor is a `c10::intrusive_ptr<c10::TensorImpl>`, which in turn holds a storage. (Which is a `c10::intrusive_ptr<c10::StorageImpl>`) `c10::intrusive_ptr` has a `c10::weak_intrusive_ptr` class for taking non-owning references to the underlying object. The implementation involves both a strong refcount and weak refcount in `c10::intrusive_ptr`. If the strong refcount of an intrusive_ptr goes to zero and there are no weak references then everything is deleted. However if there is a weak reference then the intrusive_ptr calls `release_resources()` but not delete. This has the effect of freeing the underlying resources (ensuring that program semantics are unchanged) but leaves behind an empty shell of an `intrusive_ptr` that the `weak_intrusive_ptr`s use to check status. And herein lies the solution: as long as we hold a weak reference to a TensorImpl we will block deletion and prevent the `TensorImpl` from being reused. This PR uses a `c10::weak_intrusive_ptr<c10::TensorImpl>` to store the address of profiled TensorImpls and then converts it to a raw pointer (or rather, a `TensorImplAddress`) during post processing when we no longer care about blocking address reuse. Differential Revision: [D40492848](https://our.internmc.facebook.com/intern/diff/D40492848/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87244 Approved by: https://github.com/slgong-fb, https://github.com/albanD	2022-10-27 06:38:11 +00:00
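The idea in the commit above (a live weak reference keeps the object's memory from being freed, so its address cannot be handed out again) can be illustrated by analogy with the standard library. This sketch uses `std::allocate_shared` and `std::weak_ptr`, not `c10::intrusive_ptr`; it assumes the common implementation behavior where object and control block share one allocation. `CountingAllocator`, `FakeTensorImpl`, and `g_live_allocations` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Number of blocks currently allocated through CountingAllocator.
inline int g_live_allocations = 0;

// Minimal allocator that only counts allocations and deallocations.
template <typename T>
struct CountingAllocator {
  using value_type = T;
  CountingAllocator() = default;
  template <typename U>
  CountingAllocator(const CountingAllocator<U>&) {}

  T* allocate(std::size_t n) {
    ++g_live_allocations;
    return std::allocator<T>{}.allocate(n);
  }
  void deallocate(T* p, std::size_t n) {
    --g_live_allocations;
    std::allocator<T>{}.deallocate(p, n);
  }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) {
  return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) {
  return false;
}

// Stand-in for a profiled object whose address must stay unique.
struct FakeTensorImpl {
  int payload = 0;
};
```

Dropping the last strong reference destroys the object, but while a `std::weak_ptr` is outstanding the block is not returned to the allocator, which is the property the commit relies on for `TensorImpl`.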
Taylor Robie	b0e10292fa	[Profiler] Tensor IDs for Module and Optimizer variables (#86754 ) More sophisticated profiling will increasingly rely on python tracer to contextualize observed results. This PR adds Tensors which are observed by the python tracer to the identity assignment loop. Differential Revision: [D39852885](https://our.internmc.facebook.com/intern/diff/D39852885/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86754 Approved by: https://github.com/slgong-fb, https://github.com/aaronenyeshi	2022-10-23 19:23:42 +00:00
Taylor Robie	be2d647ea6	[Profiler] Use parameter as key for optimizer state recording. (#86753 ) While optimizer can store state however it likes, in practice most optimizer state corresponds to a particular parameter. (This is the case for all `torch.optim` optimizers.) Thus, it turns out to be ergonomic to collect using that structure. Note that this doesn't lock us into anything; we can always collect state with non Tensor keys if the use case arises. One simplification that arises is that Module and Optimizer collection has very similar structure. So similar, in fact, that it is possible to use a common template for config. I also found that a lot of the `check_and_store` logic could be simplified and inlined by this joining of collected optimizer state. Differential Revision: [D40210703](https://our.internmc.facebook.com/intern/diff/D40210703/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86753 Approved by: https://github.com/slgong-fb, https://github.com/aaronenyeshi	2022-10-23 19:23:39 +00:00
Seonglyong Gong	dbea07b6aa	[Profiler] record gradient from nnModule (#86355 ) Summary: - catch .grad tensor info - update data type and `check_and_store`, etc - update unit test case Test Plan: buck run mode/opt //caffe2/test:profiler Reviewed By: chaekit Differential Revision: D39711295 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86355 Approved by: https://github.com/chaekit	2022-10-07 09:58:50 +00:00
Seonglyong Gong	a117fde86f	[Profiler] Apply TensorMetadata for Optimizer and nnModule (#86047 ) Summary: - Use `TensorMetadata` struct in saving tensor info from Optimizer and nnModule. Test Plan: buck run mode/opt //caffe2/test:profiler Reviewed By: chaekit Differential Revision: D39682205 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86047 Approved by: https://github.com/chaekit, https://github.com/robieta	2022-10-06 06:18:56 +00:00
Seonglyong Gong	3cfc61b846	[Profiler][trivial] Optimizer states (part 4 of Record Optimizer) (#85840 ) Summary: - add states into OptInfo and update unit testcase Test Plan: buck run mode/opt //caffe2/test:profiler Reviewed By: chaekit Differential Revision: D39406540 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85840 Approved by: https://github.com/robieta	2022-09-29 07:28:33 +00:00
Seonglyong Gong	7628603aee	[Profiler] bug fix: python object reference counting (#85847 ) Summary: Wrong reference counting of Python objects caused an intermittent, corner-case-only segfault. - before: increment once, decrement in a loop. - after: increment and decrement in different but consistent loops. Test Plan: buck run mode/opt //caffe2/test:profiler Differential Revision: D39902973 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85847 Approved by: https://github.com/robieta, https://github.com/aaronenyeshi	2022-09-29 03:58:34 +00:00
Seonglyong Gong	d776693701	[Profiler] Optimizer param_groups (part 3 of Record Optimizer) (#85784 ) Summary: - use TensorMetadata struct - check_and_store util as overloading - param_groups - clean up unit test cases Test Plan: buck run mode/opt //caffe2/test:profiler Reviewed By: chaekit Differential Revision: D39406072 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85784 Approved by: https://github.com/aaronenyeshi, https://github.com/robieta	2022-09-28 19:18:12 +00:00
Seonglyong Gong	f80ef73d1c	[Profiler] tracking Optimizer (part 2 of Record Optimizer) (#84920 ) Summary: Part 2 of Record Optimizer param_groups and states (https://github.com/pytorch/pytorch/pull/84063) - hooking from optimizer step - PyOptCall Type - declare data type for collection - python binding - simple unit test case Test Plan: buck run mode/opt //caffe2/test:profiler Differential Revision: D39402667 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84920 Approved by: https://github.com/robieta	2022-09-28 02:48:07 +00:00
Seonglyong Gong	dc865bff4e	[Profiler] set_class util (part 1 of Record Optimizer) (#84779 ) Summary: Part 1 of Record Optimizer param_groups and states (https://github.com/pytorch/pytorch/pull/84063) - nnModule and Optimizer have duplicated parts - create a util function to avoid duplication Test Plan: buck run mode/opt //caffe2/test:profiler Differential Revision: D39397210 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84779 Approved by: https://github.com/robieta	2022-09-13 01:48:41 +00:00
Taylor Robie	daffff9986	[Profiler] Make `RecordQueue` manage the lifetime of `PythonTracer`. (#83964 ) `PythonTracer` holds a pointer to an owning `RecordQueue`, however that relationship is not enforced and it is possible to dangle that pointer if the ProfilerState owning the `RecordQueue` is destroyed without proper cleanup. We currently use a singleton to enforce the requirement that only one python tracer is active at a time, however a better formulation is to simply enforce that with an atomic bool and manage object lifetime through composition. In this new architecture, `RecordQueue` explicitly holds a unique_ptr to the python tracer instance. That way if `~RecordQueue` is called it will call `~PythonTracer` which can then clean up any state. Overall it is just a simpler ownership model, and less prone to unexpected failures. Differential Revision: [D38955616](https://our.internmc.facebook.com/intern/diff/D38955616/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83964 Approved by: https://github.com/slgong-fb	2022-09-09 19:04:08 +00:00
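The ownership model described in the commit above (an atomic bool enforcing a single active tracer, with the queue owning the tracer through a `unique_ptr`) can be sketched as follows. The class names `PythonTracerSketch` and `RecordQueueSketch` are hypothetical and heavily simplified relative to the real `PythonTracer` and `RecordQueue`.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

// Hypothetical tracer: at most one instance may be active at a time,
// enforced with an atomic flag rather than a singleton.
class PythonTracerSketch {
 public:
  // Returns nullptr if another tracer is already active.
  static std::unique_ptr<PythonTracerSketch> try_make() {
    bool expected = false;
    if (!active_.compare_exchange_strong(expected, true)) {
      return nullptr;
    }
    return std::unique_ptr<PythonTracerSketch>(new PythonTracerSketch());
  }
  // Destruction is the single place where tracer state is cleaned up.
  ~PythonTracerSketch() { active_.store(false); }

  static bool any_active() { return active_.load(); }

 private:
  PythonTracerSketch() = default;
  static inline std::atomic<bool> active_{false};
};

// Hypothetical queue: owns the tracer by composition, so destroying the
// queue always destroys (and cleans up) the tracer.
class RecordQueueSketch {
 public:
  RecordQueueSketch() : tracer_(PythonTracerSketch::try_make()) {}
  bool has_tracer() const { return tracer_ != nullptr; }

 private:
  std::unique_ptr<PythonTracerSketch> tracer_;
};
```

Because the tracer's lifetime is tied to the queue, there is no separate shutdown call that could be skipped and leave a dangling pointer behind.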
Taylor Robie	328538700a	[Profiler][Trivial] Move `PythonTracerBase` to `torch/csrc/profiler/orchestration` (#83895 ) The ownership model between `RecordQueue` and `PythonTracer` is brittle; if a profiler is popped without proper shutdown it can dangle a reference in `PythonTracer` which will segfault when dereferenced. The next PR will address this; to start we simply move the code into `torch/csrc/profiler/orchestration` to limit the sloc delta when making actual changes. Differential Revision: [D38933962](https://our.internmc.facebook.com/intern/diff/D38933962/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38933962/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/83895 Approved by: https://github.com/slgong-fb	2022-09-09 19:04:08 +00:00
Seonglyong Gong	fa241fd50e	[Profiler] record nn.Module's parameters (#83209 ) Summary: Record nn.Module's parameters for detailed memory profiling: - extend 'module_' in value cache & NNModuleInfo to save parameters - python binding and unit test case Test Plan: buck run mode/opt //caffe2/test:profiler -- -r test_nnmodule Differential Revision: D38379717 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83209 Approved by: https://github.com/robieta	2022-08-24 08:17:20 +00:00
Taylor Robie	09e837634b	[Profiler][Minor] Set end time on python events when profiling stops. (#83621 ) We don't have an end event for calls that are ongoing when profiling stops. (e.g. main) This cropped up when I was adding checks for negative durations. I also refactored `populate` to use a pop method. This not only allows me to implement this fix, but should also provide a convenient entry point for https://github.com/pytorch/pytorch/pull/82154 Differential Revision: [D38426342](https://our.internmc.facebook.com/intern/diff/D38426342/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83621 Approved by: https://github.com/slgong-fb	2022-08-21 00:22:11 +00:00
Taylor Robie	7edd947178	[Profiler][Python tracer] Add ephemeral inputs to the value cache. (#81958 ) There are a couple of bugs in the python tracer related to how we cache values. The first is that `ValueCache::store<CallType::PyModuleCall>` wrongly assumes that it will only be called from the profiling callback and calls `PyEval_GetFrame`, effectively violating the encapsulation of the cache by accessing global state. Secondly, we use `arg` to cache bound C functions. This turns out not to be correct, and collisions are resulting in incorrect traces. In both cases, we can solve the problem by introducing a concept of ephemeral data which is used to materialize a cached value, but is not part of the cache key. (And the author is responsible for making sure that is done correctly.) Differential Revision: [D38062921](https://our.internmc.facebook.com/intern/diff/D38062921/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/81958 Approved by: https://github.com/ngimel	2022-07-29 05:12:09 +00:00
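The "ephemeral data" concept from the commit above (data used to materialize a cached value without being part of the cache key) can be sketched with a small cache. `ValueCacheSketch`, its integer key, and the `"module:"` prefix are hypothetical simplifications, not the real `ValueCache` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical cache: `ephemeral` is consulted only on a miss, to build the
// stored value. It never participates in lookup, so two calls with the same
// key and different ephemeral data hit the same entry.
class ValueCacheSketch {
 public:
  const std::string& store(int key, const std::string& ephemeral) {
    auto it = cache_.find(key);
    if (it == cache_.end()) {
      // Materialize the value from the caller-supplied ephemeral data.
      it = cache_.emplace(key, "module:" + ephemeral).first;
    }
    return it->second;
  }
  std::size_t size() const { return cache_.size(); }

 private:
  std::unordered_map<int, std::string> cache_;
};
```

Passing the ephemeral data in explicitly keeps the cache from reaching into global state, at the cost that the caller must supply consistent data for a given key.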
albanD	4b7de26556	Fix C API to be compatible with latest 3.11 beta (#81242 ) Based off https://github.com/pytorch/pytorch/pull/80511 with extra changes: - Update pybind to the latest release as it contains some needed fixes - Extend the compat header to reduce changes in code Pull Request resolved: https://github.com/pytorch/pytorch/pull/81242 Approved by: https://github.com/malfet, https://github.com/mattip	2022-07-27 08:37:10 +00:00


62 Commits