1. `torch.autograd.profiler` interface parameters changed: using `self.use_device` instead of `self.use_cuda` makes the profiler accessible to other device backends; they will be integrated in subsequent PRs.
2. Modify `ProfilerEventStub` (aka `std::shared_ptr<CUevent_st>`) to `ProfilerVoidEventStub` (aka `std::shared_ptr<void>`) so that `ProfilerStubs` can be inherited by any `{device}Methods`.
3. In addition, `cuda_event_start_` is renamed to `device_event_start_`; CUDA and other devices can use this event pointer if needed.
4. Custom devices can use legacy profiling (adds the `ProfilerState::KINETO_PRIVATEUSE1_FALLBACK` option).
5. Add `privateuse1Stubs` registration.
(Result parsing and test cases will be added in a subsequent PR; a usage sketch follows below.)
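A minimal usage sketch, assuming the new keyword is exposed on the public `torch.autograd.profiler.profile` entry point (the exact surface is finalized in follow-up PRs):

```python
import torch
from torch.autograd import profiler

# Sketch of the interface change: the target device is selected by name
# rather than by the CUDA-only boolean flag, e.g. use_device="cuda" instead
# of use_cuda=True, so out-of-tree backends (e.g. privateuse1) can reuse
# the same path. The keyword surface here is an assumption.
device = "cuda" if torch.cuda.is_available() else None
with profiler.profile(use_device=device) as prof:
    y = torch.randn(32, 32) @ torch.randn(32, 32)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```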
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101554
Approved by: https://github.com/aaronenyeshi
**TL;DR:** This re-introduces links between backward kernels and their corresponding forward kernels.
<img width="1020" alt="Screenshot 2023-05-26 at 7 25 22 PM" src="https://github.com/pytorch/pytorch/assets/5067123/02571b59-859c-4c9e-b3ef-121ef3159812">
In the example above, you can see two such flows: one for `aten::add` and one for `aten::binary_cross_entropy`.
### Details
Forward/backward links were added in https://github.com/pytorch/pytorch/pull/62553, but then disabled in https://github.com/pytorch/pytorch/pull/72904 due to segfaults (e.g. https://github.com/pytorch/pytorch/issues/69443).
A lot of refactoring has happened since the fwd-bwd links were disabled, so this PR updates the implementation:
* Use a raw profiler::impl::Result instead of a KinetoEvent
* Move the implementation to collection.cpp, where the TraceWrapper is currently handled.
* Sort the events before processing, because they aren't always in chronological order
* There can now be more than one event in the backward pass that matches the sequenceNr-threadID pair. The implementation needed to be updated to avoid showing multiple endpoints for a given sequenceNr-threadID pair ([ptr to where the bwd sequenceNr-threadID pair is duplicated](6e3e3dd477/torch/csrc/profiler/collection.cpp (L398-L399))).
Next, we need to verify that https://github.com/pytorch/pytorch/issues/69443 is fixed. Running the repro no longer errors. Looking further into the details of the issue it seems like the handling of the [raw linkedActivity pointer (old code from 2021)](6089dcac48/libkineto/src/output_json.cpp (L283)) resulted in the segfault. Now, it doesn't look like the linked activity is used anywhere in output_json.cpp so the issue should be fixed.
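For reference, a minimal way to produce a trace where these fwd-bwd flow arrows can be inspected (standard `torch.profiler` usage; viewer behavior as described above):

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(16, 1)
x = torch.randn(4, 16)

# Export a chrome trace; fwd/bwd flow arrows should connect aten::* forward
# ops to their autograd backward kernels when viewed in chrome://tracing or
# Perfetto.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    loss = model(x).sum()
    loss.backward()

prof.export_chrome_trace("trace.json")
```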
### Testing
#### 1. unit test
`test_profiler_fwd_bwd_link` was un-skipped. It was modified to match the new implementation.
#### 2. https://github.com/pytorch/pytorch/issues/69443
I ran the repro in https://github.com/pytorch/pytorch/issues/69443 and verified there were no segfaults.
#### 3. Duplicate flow IDs
When forward-backward connections were first introduced, GPU-CPU async links did not yet exist. There's a possibility that GPU-CPU links and fwd-bwd links could interfere if their IDs overlap.
I manually tested this in chrome://tracing: I edited a trace file so that a GPU-CPU link had the same ID as one of the fwd-bwd connections, and the chrome tracing UI continued showing both types of links.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102424
Approved by: https://github.com/aaronenyeshi
Many ops take scalars or scalar lists as inputs, and these are important for understanding the properties of the op. For example, a convolution op's behavior and output shape often depend on padding and strides, which are provided as scalars or lists of scalars. This will record scalar lists when record_inputs=True.
Details:
During collection (and this was true before this PR as well), we serialize values and tensor metadata into an InputOutputEncoder. After collection occurs, we deserialize these values to attach the information to each of the events.
This PR does this:
- Adds support for serializing scalar lists during collection / serialization
- Adds an extra field called "Concrete Args"
- Splits the deserialization process into two steps: one for generating "input shapes" and one for generating "concrete args". We split input shapes and concrete args apart to avoid disrupting any previous workflows that relied on the specific data in the "input shapes" category; additionally, it's just a better description. Note that single scalars remain in the "input shapes" category, as they were already in that category in the past. (A usage sketch follows below.)
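A hedged usage sketch (assuming the Python-side `record_shapes` flag is what enables the input recording referred to as `record_inputs` above):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Scalar-list arguments such as stride and padding should now be captured
# and exported under the "Concrete Args" field of the trace, while tensor
# shapes remain under "input shapes".
x = torch.randn(1, 3, 32, 32)
w = torch.randn(8, 3, 3, 3)
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    torch.nn.functional.conv2d(x, w, stride=[2, 2], padding=[1, 1])

prof.export_chrome_trace("conv_trace.json")  # inspect the per-event fields here
```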
Differential Revision: [D45798431](https://our.internmc.facebook.com/intern/diff/D45798431)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100593
Approved by: https://github.com/aaronenyeshi
Summary:
There are two variables for profiler input shapes:
- In C++ interface: report_input_shapes
- In Python interface: record_shapes
Therefore `record_input_shapes` is a typo. We should also look into reducing the redundant naming between the two.
Test Plan: CI
Pulled By: aaronenyeshi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99430
Approved by: https://github.com/davidberard98
Previously we only plotted memory if it was allocated or freed while
trace recording was active. This change also adds any pre-existing blocks
to the visualization. This helps because it is common to enable trace recording
later in a run and then not realize that a lot of memory had already been
allocated before recording started, since it did not appear in the trace.
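For context, a minimal sketch of the snapshot workflow the visualizer consumes (the private helper names below are assumptions about the intended usage, and their signatures have changed across versions):

```python
import pickle
import torch

# CUDA required. Blocks allocated before recording starts should now also
# appear in the memory visualization.
pre_existing = torch.empty(1024, 1024, device="cuda")  # allocated before recording
torch.cuda.memory._record_memory_history()
during = torch.empty(512, 512, device="cuda")           # allocated while recording

with open("snapshot.pickle", "wb") as f:
    pickle.dump(torch.cuda.memory._snapshot(), f)        # plot with torch.cuda._memory_viz tools
```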
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97590
Approved by: https://github.com/eellison
This refactors the stack trace facility specific to memory profiling
in python+cuda to make a generic facility to generate combined stack
traces.
The generic facility (combined_traceback.h) does not require
python to be around to work, but will return python stacks if it is
present.
This facility is then used to add support for stack trace gathering in memory profiling that
happens directly from C++.
It is also used to expose a Python API for gathering and symbolizing
combined stacks.
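A rough usage sketch of the new Python API (the binding location and argument names below are assumptions; the PR text only states that gather/symbolize entry points are exposed):

```python
# Assumption: the combined-traceback bindings live under torch._C._profiler.
from torch._C._profiler import gather_traceback, symbolize_tracebacks

tb = gather_traceback(python=True, script=True, cpp=True)  # capture a combined stack
(frames,) = symbolize_tracebacks([tb])                      # resolve to readable frames
for frame in frames[:5]:
    print(frame)  # each entry describes one frame of the combined Python/C++ stack
```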
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95541
Approved by: https://github.com/ezyang
This patch aims to add support for an XPU profiler that will work together with Kineto. After this PR, Kineto will follow these APIs to integrate with it. The development of the Python interface is also nearly done.
Signed-off-by: Huang, Xunsong <xunsong.huang@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94502
Approved by: https://github.com/ezyang
@bypass-github-export-checks
This diff enables processing events in the profiler. Passing the events from the QueryPool, and making sure Vulkan events align correctly with parent CPU events, will be handled later in this diff stack.
This diff was made by forking Taylor's scaffolding diff, D39779878, with a few changes:
- Rebasing + resolving merge conflicts
- Fixing (i.e. removing) auto import of profiler/containers.h
- Changing the activity type to CPU_OP which makes the vulkan events appear on chrometrace
- Moving timestamp adjustment scaffolding to D39893109
Differential Revision: [D39834805](https://our.internmc.facebook.com/intern/diff/D39834805/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90852
Approved by: https://github.com/mcr229
The appropriate annotation for a block of memory is a function of time: an input can be mutated in-place to become an activation, a clever kernel might steal the memory of a detached input (such as a mask) to use as output memory, etc.
We could pessimistically assume that all ops mutate all of their inputs, however inspection of schema allows us to significantly narrow that assumption with minimal effort. Checking schemas also allows us to distinguish between dispatcher ops (which have load bearing semantics) and user annotations with reasonably high precision.
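For illustration, a small sketch of the kind of schema inspection described above (using the public op-schema accessors; this is not the profiler's internal code path):

```python
import torch

# A dispatcher op's schema records which arguments may be written, so the
# profiler does not have to assume every input of every op is mutated.
for op in (torch.ops.aten.add.Tensor, torch.ops.aten.add_.Tensor):
    schema = op._schema
    writes = [a.name for a in schema.arguments if a.alias_info and a.alias_info.is_write]
    print(schema.name, "mutates:", writes or "nothing")
```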
Differential Revision: [D40220390](https://our.internmc.facebook.com/intern/diff/D40220390/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86854
Approved by: https://github.com/chaekit
This PR unifies and rationalizes some of the input representation in Result. The current approach of storing separate types in separate vectors is tedious for two types (Tensors and scalars), but would be even more annoying with the addition of TensorLists. A similar disconnection exists with sizes and strides which the user is also expected to zip with tensor_metadata.
I simplified things by moving inputs to a variant and moving sizes and strides into TensorMetadata. This also forced collection of sizes and strides in python tracer which helps to bring it in line with op profiling. Collection of TensorLists is fairly straightforward; `InputOutputEncoder` already has a spot for them (I actually collected them in the original TorchTidy prototype) so it was just a matter of plumbing things through.
Differential Revision: [D40734451](https://our.internmc.facebook.com/intern/diff/D40734451/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87825
Approved by: https://github.com/slgong-fb, https://github.com/chaekit
Part of the current ID assignment algorithm groups any Storages which are associated with the same TensorImpl*. This isn't sound (which I knew but deferred until it actually became a problem) because pointers can be reused by different objects (the ABA problem).
ABA is easy to handle for Storage because we see allocations and frees, but ~TensorImpl is very hot and cannot tolerate profiling code without significant increases in overhead.
This PR narrows the conditions under which ID assignment will join on TensorImpl*. Two storages which are associated with the same TensorImpl* are grouped IFF they were live at the same time. (Note that this still allows storages with disjoint lifetimes to be joined transitively through a third storage which overlaps with both.)
The need for this PR arose in memory profiling. The Python argument parser creates short-lived Tensors for (some) scalar arguments, which triggers this issue. (This is stochastic and platform dependent, since optimizations like reusing recently freed allocations are implementation defined.) Spurious connections can lead to confusing, long-range interactions when building up the memory profile, so it makes sense to harden ID assignment to avoid any issues.
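As a loose analogy for the pointer-reuse hazard (it uses allocator data pointers, since TensorImpl* reuse is not observable from Python; purely illustrative):

```python
import torch

# After a tensor dies, a new tensor can land at the same address, so grouping
# by raw pointer alone can spuriously merge unrelated objects.
a = torch.randn(1024)
addr = a.data_ptr()
del a
b = torch.randn(1024)
print(addr == b.data_ptr())  # may print True when the allocation is reused
```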
Differential Revision: [D40445121](https://our.internmc.facebook.com/intern/diff/D40445121/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87133
Approved by: https://github.com/slgong-fb, https://github.com/chaekit
A recurring problem with assigning Tensor IDs is that we want to preserve identity when storage changes but we don't observe TensorImpl destruction so identity assignment is not robust to the ABA problem with respect to TensorImpl*. ~TensorImpl is far too hot to instrument; even adding a call to a no-op function in a different compilation unit increases overhead by tens of percent. (OSS builds do not have any sort of LTO.)
Fortunately there is a solution. A PyTorch Tensor is a `c10::intrusive_ptr<c10::TensorImpl>`, which in turn holds a storage. (Which is a `c10::intrusive_ptr<c10::StorageImpl>`) `c10::intrusive_ptr` has a `c10::weak_intrusive_ptr` class for taking non-owning references to the underlying object. The implementation involves both a strong refcount and weak refcount in `c10::intrusive_ptr`. If the strong refcount of an intrusive_ptr goes to zero and there are no weak references then everything is deleted. However if there is a weak reference then the intrusive_ptr calls `release_resources()` but not delete.
This has the effect of freeing the underlying resources (ensuring that program semantics are unchanged) but leaves behind an empty shell of an `intrusive_ptr` that the `weak_intrusive_ptr`s use to check status. And herein lies the solution: as long as we hold a weak reference to a TensorImpl we will block deletion and prevent the `TensorImpl*` from being reused.
This PR uses a `c10::weak_intrusive_ptr<c10::TensorImpl>` to store the address of profiled TensorImpls and then converts it to a raw pointer (or rather, a `TensorImplAddress`) during post processing when we no longer care about blocking address reuse.
Differential Revision: [D40492848](https://our.internmc.facebook.com/intern/diff/D40492848/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87244
Approved by: https://github.com/slgong-fb, https://github.com/albanD
While an optimizer can store state however it likes, in practice most optimizer state corresponds to a particular parameter. (This is the case for all `torch.optim` optimizers.) Thus, it turns out to be ergonomic to collect state using that structure. Note that this doesn't lock us into anything; we can always collect state with non-Tensor keys if the use case arises.
One simplification that arises is that Module and Optimizer collection has very similar structure. So similar, in fact, that it is possible to use a common template for config. I also found that a lot of the `check_and_store` logic could be simplified and inlined by this joining of collected optimizer state.
Differential Revision: [D40210703](https://our.internmc.facebook.com/intern/diff/D40210703/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86753
Approved by: https://github.com/slgong-fb, https://github.com/aaronenyeshi
There are a number of instrumentation utils which have been added to the profiler toolkit. They are generally small and self contained, often wrapping vendor APIs. (NVTX, ITT)
They don't really interact with the much more expansive machinery of the PyTorch profiler beyond registration / unregistration, minor util sharing, and reusing the profiler base class. Just as in the case of stubs, it makes sense to group them in a dedicated subfolder.
Differential Revision: [D39108649](https://our.internmc.facebook.com/intern/diff/D39108649/)
**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39108649/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85511
Approved by: https://github.com/albanD
Summary:
- Catch `.grad` tensor info
- Update data types and `check_and_store`, etc.
- Update unit test cases
Test Plan: buck run mode/opt //caffe2/test:profiler
Reviewed By: chaekit
Differential Revision: D39711295
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86355
Approved by: https://github.com/chaekit
Summary:
- Added a config option to remove the 'Call stack' field from the trace file (#84982)
- Changed the default value to `false`
Test Plan:
- `experimental_config=_ExperimentalConfig(verbose=True)` will add the 'Call stack' field back to the trace file (see the sketch below).
- CI tests
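For illustration, a minimal sketch of opting back in to the call-stack field (assuming the standard `torch.profiler` entry point; `with_stack=True` is needed so there are stacks to emit):

```python
import torch
from torch.profiler import profile, ProfilerActivity
from torch._C._profiler import _ExperimentalConfig

# With the new default, the 'Call stack' field is omitted; passing
# verbose=True in the experimental config opts back in.
with profile(
    activities=[ProfilerActivity.CPU],
    with_stack=True,
    experimental_config=_ExperimentalConfig(verbose=True),
) as prof:
    torch.randn(64, 64).softmax(dim=-1).sum()

prof.export_chrome_trace("trace_with_call_stack.json")
```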
Differential Revision: D40092377
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86263
Approved by: https://github.com/aaronenyeshi
This is necessary for memory profiling because we need to know how to interpret an allocation. However there is a slight wrinkle: we don't know if an allocation is for a Tensor's StorageImpl until we see it used in a later call. (We could record outputs, however we're not willing to incur the overhead.) So we instead treat all allocations as relevant and then filter out some later. Otherwise the change to the ID assignment algorithm is minimal.
Differential Revision: [D39788870](https://our.internmc.facebook.com/intern/diff/D39788870/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85719
Approved by: https://github.com/chaekit
I want to start using `TensorMetadata` elsewhere in profiler so we have a common representation of Tensor. The main changes in this PR are:
1) Replace raw pointers with strong typedefs and create a custom type caster to handle moving them to Python.
2) Add a `device()` method to handle reassembling type and index.
Differential Revision: [D39563965](https://our.internmc.facebook.com/intern/diff/D39563965/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85161
Approved by: https://github.com/chaekit
Summary: The `Call stack` field significantly increases trace file size for Python stack tracing (it needs to be deprecated carefully). Added a config option to avoid this increase.
Test Plan:
`experimental_config=_ExperimentalConfig(no_callstack_trace=True)` will remove the field.
+ CI tests
Differential Revision: D39489828
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84982
Approved by: https://github.com/robieta
Summary:
Record nn.Module's parameters for detailed memory profiling:
- extend `module_` in the value cache & `NNModuleInfo` to save parameters
- add Python bindings and a unit test case (a usage sketch follows below)
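A hedged usage sketch (the Python-side flags below are assumptions; the diff itself covers the value-cache/`NNModuleInfo` plumbing and bindings):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Profile a module with memory and module recording enabled, the setting
# under which the newly captured nn.Module parameter info can be used for
# detailed memory attribution.
model = torch.nn.Linear(32, 8)
inp = torch.randn(4, 32)
with profile(
    activities=[ProfilerActivity.CPU],
    profile_memory=True,
    with_modules=True,
) as prof:
    model(inp).sum().backward()

print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=5))
```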
Test Plan: buck run mode/opt //caffe2/test:profiler -- -r test_nnmodule
Differential Revision: D38379717
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83209
Approved by: https://github.com/robieta