While an optimizer can store state however it likes, in practice most optimizer state corresponds to a particular parameter. (This is the case for all `torch.optim` optimizers.) Thus, it turns out to be ergonomic to collect optimizer state using that per-parameter structure. Note that this doesn't lock us into anything; we can always collect state with non-Tensor keys if the use case arises.
One simplification that arises is that Module and Optimizer collection have a very similar structure. So similar, in fact, that it is possible to use a common template for config. I also found that a lot of the `check_and_store` logic could be simplified and inlined once the collected optimizer state is joined in this way.
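To make the per-parameter structure concrete, here is a minimal pure-Python sketch. It is not the profiler's actual implementation: `Param`, `_collect`, `collect_module`, and `collect_optimizer` are hypothetical names, and a plain dict stands in for the optimizer's state store. It only illustrates how keying optimizer state by parameter lets Module and Optimizer collection share one template.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so a Param can be a dict key
class Param:
    """Hypothetical stand-in for a parameter Tensor."""
    name: str
    grad: Optional[str] = None


def _collect(
    params: Iterable[Param], extra: Callable[[Param], Any]
) -> List[Tuple[Param, Any]]:
    """Shared template: pair every parameter with whatever `extra` yields for it."""
    return [(p, extra(p)) for p in params]


def collect_module(params: Iterable[Param]) -> List[Tuple[Param, Any]]:
    # For a module, the per-parameter payload is the gradient (if any).
    return _collect(params, lambda p: p.grad)


def collect_optimizer(
    params: Iterable[Param], state: Dict[Param, Dict[str, Any]]
) -> List[Tuple[Param, Any]]:
    # For an optimizer, the payload is the state entry keyed by that parameter.
    return _collect(params, lambda p: state.get(p, {}))


w, b = Param("weight", grad="weight.grad"), Param("bias")
# Adam-like state keys, used purely for illustration.
opt_state = {w: {"exp_avg": "m_w", "exp_avg_sq": "v_w"}}

module_view = collect_module([w, b])
optimizer_view = collect_optimizer([w, b], opt_state)
```

Both views are lists of `(parameter, payload)` pairs, which is the similarity that makes a common template possible. State with non-Tensor keys would simply need a different `extra` lookup.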
Differential Revision: [D40210703](https://our.internmc.facebook.com/intern/diff/D40210703/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86753
Approved by: https://github.com/slgong-fb, https://github.com/aaronenyeshi
A number of instrumentation utils have been added to the profiler toolkit. They are generally small and self-contained, often wrapping vendor APIs (NVTX, ITT).
They don't really interact with the much more expansive machinery of the PyTorch profiler beyond registration / unregistration, minor util sharing, and reusing the profiler base class. Just as in the case of stubs, it makes sense to group them in a dedicated subfolder.
Differential Revision: [D39108649](https://our.internmc.facebook.com/intern/diff/D39108649/)
**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39108649/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85511
Approved by: https://github.com/albanD
Summary:
- catch .grad tensor info
- update data type and `check_and_store`, etc
- update unit test case
Test Plan: buck run mode/opt //caffe2/test:profiler
Reviewed By: chaekit
Differential Revision: D39711295
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86355
Approved by: https://github.com/chaekit
Summary:
- Added config option to remove 'Call stack' field from trace file (#84982)
- Change default value to `false`
Test Plan:
- `experimental_config=_ExperimentalConfig(verbose=True),` will add the 'Call stack' field back to the trace file.
- CI tests
Differential Revision: D40092377
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86263
Approved by: https://github.com/aaronenyeshi
This is necessary for memory profiling because we need to know how to interpret an allocation. However, there is a slight wrinkle: we don't know if an allocation is for a Tensor's StorageImpl until we see it used in a later call. (We could record outputs, but we're not willing to incur the overhead.) So we instead treat all allocations as relevant and then filter some out later. Otherwise, the change to the ID assignment algorithm is minimal.
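The "treat everything as relevant, filter later" idea can be sketched in plain Python. This is an illustration under stated assumptions, not the profiler's ID assignment algorithm: the `(kind, ptr)` event format and the `assign_ids` function are hypothetical, and a "use" event stands for a later call in which a Tensor's storage pointer is observed.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical event format: ("alloc", ptr), ("free", ptr) or ("use", storage_ptr).
Event = Tuple[str, int]


def assign_ids(events: List[Event]) -> List[Optional[int]]:
    live: Dict[int, int] = {}   # ptr -> provisional ID of the live allocation there
    confirmed: Set[int] = set() # IDs later observed backing a Tensor
    ids: List[Optional[int]] = []
    next_id = 0

    for kind, ptr in events:
        if kind == "alloc":
            # Every allocation gets an ID up front; we cannot yet tell
            # whether it will turn out to be Tensor storage.
            live[ptr] = next_id
            next_id += 1
            ids.append(live[ptr])
        elif kind == "free":
            ids.append(live.pop(ptr, None))
        else:  # "use": a Tensor with this storage pointer appeared in a call
            alloc_id = live.get(ptr)
            if alloc_id is not None:
                confirmed.add(alloc_id)
            ids.append(alloc_id)

    # Second pass: drop IDs that were never confirmed as Tensor storage.
    return [i if i in confirmed else None for i in ids]


events = [
    ("alloc", 0x10),  # later used by a Tensor
    ("alloc", 0x20),  # never seen in a Tensor
    ("use", 0x10),
    ("free", 0x20),
    ("free", 0x10),
    ("alloc", 0x10),  # address reused by a new, unconfirmed allocation
]
result = assign_ids(events)
```

Here `result` is `[0, None, 0, None, 0, None]`: only the allocation that was later used keeps its ID, and the reused address gets a fresh ID that is filtered out because it was never observed in a Tensor.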
Differential Revision: [D39788870](https://our.internmc.facebook.com/intern/diff/D39788870/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85719
Approved by: https://github.com/chaekit
I want to start using `TensorMetadata` elsewhere in profiler so we have a common representation of Tensor. The main changes in this PR are:
1) Replace raw pointers with strong typedefs and create a custom type caster to handle moving them to Python.
2) Add a `device()` method to handle reassembling type and index.
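As a rough Python analogue of the two changes (the real change is in C++ and its bindings, and none of the names below are taken from it), `typing.NewType` can play the role of a strong typedef and a small method can reassemble device type and index. The type names, the field names, and the convention that a negative index means "no index" are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import NewType

# Distinct nominal types over int, so a static type checker rejects mixing
# the two kinds of address. (NewType has no runtime effect in Python.)
TensorImplAddress = NewType("TensorImplAddress", int)
StorageImplData = NewType("StorageImplData", int)


@dataclass(frozen=True)
class TensorMetadata:
    """Hypothetical common representation of a Tensor."""
    impl: TensorImplAddress
    data: StorageImplData
    device_type: str
    device_index: int  # assumption: negative means "no index"

    def device(self) -> str:
        # Reassemble type and index into a single device description.
        if self.device_index < 0:
            return self.device_type
        return f"{self.device_type}:{self.device_index}"


gpu = TensorMetadata(TensorImplAddress(0x1000), StorageImplData(0x2000), "cuda", 0)
cpu = TensorMetadata(TensorImplAddress(0x3000), StorageImplData(0x4000), "cpu", -1)
```

With these definitions `gpu.device()` returns `"cuda:0"` and `cpu.device()` returns `"cpu"`.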
Differential Revision: [D39563965](https://our.internmc.facebook.com/intern/diff/D39563965/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85161
Approved by: https://github.com/chaekit
Summary: The `Call stack` field increases trace file size exponentially for Python stack tracing (it needs to be deprecated carefully). Added a config option to avoid this increase.
Test Plan:
`experimental_config=_ExperimentalConfig(no_callstack_trace=True),` will remove the field.
+ CI tests
Differential Revision: D39489828
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84982
Approved by: https://github.com/robieta
Summary:
Record nn.Module's parameters for detailed memory profiling:
- extend 'module_' in value cache & NNModuleInfo to save parameters
- python binding and unit test case
Test Plan: buck run mode/opt //caffe2/test:profiler -- -r test_nnmodule
Differential Revision: D38379717
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83209
Approved by: https://github.com/robieta