Commit Graph

64 Commits

Author SHA1 Message Date
Taylor Robie
c321dfe1b5 move tree tests to the start of test_profiler.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79301

Approved by: https://github.com/davidchencsl
2022-06-12 04:24:30 +00:00
PyTorch MergeBot
c99ea0db46 Revert "[PyTorch] Record Sequence Number to Match Forward and Backward Operators (#78795)"
This reverts commit a299a2fa26.

Reverted https://github.com/pytorch/pytorch/pull/78795 on behalf of https://github.com/janeyx99 due to Broke profiler tests a299a2fa26
2022-06-10 13:11:44 +00:00
Louis Feng
a299a2fa26 [PyTorch] Record Sequence Number to Match Forward and Backward Operators (#78795)
Summary: Add sequence number to map forward and backward operators.
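
For illustration, a minimal sketch (not this diff's test plan) of reading the recorded sequence numbers via the public autograd profiler, assuming the Python-side `sequence_nr` field on profiler events:
```
import torch
from torch.autograd import profiler

x = torch.randn(4, 4, requires_grad=True)
with profiler.profile() as prof:
    y = x.mul(2).sum()
    y.backward()

# A forward op and the backward node it spawns share a sequence number;
# sequence_nr is -1 for events with no autograd sequence.
by_seq = {}
for evt in prof.function_events:
    if evt.sequence_nr >= 0:
        by_seq.setdefault(evt.sequence_nr, []).append(evt.name)

for seq, names in sorted(by_seq.items()):
    print(seq, names)
```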

Test Plan:
```
buck build mode/dev-nosan cea/ml_perf_model/gpu/scripts: --show-output
buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestExecutionGraph.test_execution_graph_start_stop
```

Outputs with seq_id: P505545974

Differential Revision: D36881999

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78795
Approved by: https://github.com/robieta
2022-06-10 05:51:17 +00:00
Taylor Robie
84b9e5ba84 Move test_profiler tests to tree rather than icicle format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79175

Approved by: https://github.com/ezyang
2022-06-09 19:45:02 +00:00
Taylor Robie
9f2e2aa28b Revert "Revert "[Profiler] Move python tracing to unified event type (Part 2)""
This reverts commit 4305f8e9bd.

replace TEST_CUDA with torch.has_cuda

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79173

Approved by: https://github.com/ezyang
2022-06-09 19:45:02 +00:00
Edward Z. Yang
eb856daf0f Do not treat all dense tensors as isTensorSubclassLike
Fixes https://github.com/pytorch/pytorch/issues/79079

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79098

Approved by: https://github.com/soulitzer, https://github.com/albanD
2022-06-09 03:00:57 +00:00
PyTorch MergeBot
4305f8e9bd Revert "[Profiler] Move python tracing to unified event type (Part 2)"
This reverts commit c2a3c8186c.

Reverted https://github.com/pytorch/pytorch/pull/78164 on behalf of https://github.com/malfet due to Broke cuda-on-cpu tests, see c2a3c8186c
2022-06-08 02:21:16 +00:00
Taylor Robie
c2a3c8186c [Profiler] Move python tracing to unified event type (Part 2)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78164

This PR finishes moving over the python tracer to use the unified event type. Things that changed:

1) The hacky after-the-fact splicing of python events in profiler_kineto.cpp is gone and python events now simply fold into the rest. (Yay!!!) This is a major BE win.
2) Added `ExtraFields<EventType::PyCall>` and `ExtraFields<EventType::PyCCall>`
3) The enter events (time + TraceKey) are now handled by RecordQueue for performance.
4) Python tracing now uses TSC for lower overhead.

Simplifications in profiler_python WRT part 1:
1) Rather than ValueCache emitting an intermediate value_t that gets further converted, load methods can now directly emit ExtraFields<...>
2) The complicated replay in profiler_python.cpp is replaced with a much simpler (and safer) pass to just pair start and end times.
3) During post processing we can now use `CallTypeHelper::map` to automatically pull in all events instead of having to loop over the entries for each type manually. This will make it simpler to add new types of Python events later.
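
A sketch of the user-visible effect, assuming the standard `torch.profiler` API: with `with_stack=True`, Python call events now share the timeline with op events instead of being spliced in afterwards:
```
import torch
from torch.profiler import profile, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU], with_stack=True) as prof:
    torch.randn(8, 8).matmul(torch.randn(8, 8))

# Python call frames appear alongside aten ops in the same event list.
print(prof.key_averages(group_by_stack_n=5).table(sort_by="cpu_time_total", row_limit=5))
```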

Differential Revision: [D36515869](https://our.internmc.facebook.com/intern/diff/D36515869/)

Approved by: https://github.com/aaronenyeshi
2022-06-07 23:42:00 +00:00
Taylor Robie
a173613f6d [Profiler] Move python tracing to unified event type (Part 1)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78163

The python function tracer is complicated and separate from the other profiler types, so I've chosen to break the change into two diffs. The first (this one) reworks the cache structure to make it amenable to integration (along with some other nice tweaks), and the next one actually moves it over.

The old cache scheme worked very hard to pack all the information about an event into a small struct via bit packing, with a couple of secondary caches for things like names. Because of the space constraints on that struct (and the fact that it had to represent all call and return types), there were a lot of subtle invariants swirling around that made it hard to offload anything to a different component. The new cache system is more modular and also, as it turns out, a bit faster. (Benchmarks in part 2.)

There is a more detailed description of the cache hierarchy in the PR, but the gist is that I use various specializations to handle the different event types (python call, nn module, c function) and lean on the type system to keep everything safe and organized. (One nice thing about using unique IDs is that they also implicitly encode the event type. They implicitly encode everything!) Given that we are going to want to expand the semantics (e.g. torch ops, DataLoader, etc) this will give a nice way to capture richer semantics without significantly increasing the complexity of the profiler.

Differential Revision: [D36379147](https://our.internmc.facebook.com/intern/diff/D36379147/)

Approved by: https://github.com/aaronenyeshi
2022-06-07 23:42:00 +00:00
Edward Z. Yang
7313a7a987 Make Meta into a backend component
Seems like it should be one.  This will make it possible to register
meta implementations even when there is a CompositeImplicitAutograd
registration already.  It also paves the way for sparse meta, etc.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78469

Approved by: https://github.com/ngimel
2022-05-31 18:59:16 +00:00
Louis Feng
18d46ea9fd [PyTorch] Integrate Execution Graph Observer into PyTorch Profiler (#75358)
Test Plan:
```
buck build mode/dev-nosan caffe2/test:profiler --show-output
buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestExecutionGraph.test_execution_graph
```

Example output: P491658589

Differential Revision: D35342394

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75358
Approved by: https://github.com/robieta
2022-05-26 08:06:27 +00:00
Taylor Robie
34d160b1fa [Profiler] Build call tree in collection.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77698

This PR adds tree building to the post processing of profiler. The basic algorithm is to sort the events, maintain a stack and a priority queue of event ends, and push/pop accordingly. The logic for merging Python events is still separate in `profiler_kineto.cpp`. That can be removed when Python events have an `EventType`.
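
A simplified sketch of the pairing algorithm (stack only, assuming properly nested intervals; the actual implementation also keeps a priority queue of event ends for the general case):
```
def build_tree(events):
    # events: iterable of (start, end, name) intervals.
    roots, stack = [], []
    for start, end, name in sorted(events):
        # Pop any open events that ended before this one starts.
        while stack and stack[-1]["end"] <= start:
            stack.pop()
        node = {"name": name, "end": end, "children": []}
        (stack[-1]["children"] if stack else roots).append(node)
        stack.append(node)
    return roots

tree = build_tree([(0, 10, "aten::add"), (1, 4, "aten::mul"), (5, 9, "aten::sum")])
```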

Differential Revision: [D36321105](https://our.internmc.facebook.com/intern/diff/D36321105/)

Approved by: https://github.com/aaronenyeshi
2022-05-25 16:11:01 +00:00
Louis Feng
82cb7210e8 [PyTorch] Fix record function inputs_valid_ check (#78002)
Summary:
I think this has to be set in all before() calls, because by default `inputs_valid_ = false`. For a RecordFunction without any input parameters (so this interface is not used), calling record_func.inputs() will trigger an assert:
```
fbcode/caffe2/aten/src/ATen/record_function.h:322
    TORCH_INTERNAL_ASSERT_DEBUG_ONLY(inputs_valid_, "Called inputs() outside RecordFunction start callback");
```
An alternative, I suppose, is to require users to call num_inputs() as a check before calling inputs(). But I think the intent of inputs_valid_ is to verify that inputs are requested only within the lifetime of the start callback.

Test Plan:
After this diff fix:

```
=> buck build mode/dev-nosan caffe2/test:profiler --show-output

=> buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction.test_record_function
test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_763488.json
INFO:2022-05-20 13:02:33 763488:763488 Config.cpp:470] Trace start time: 2022-05-20 13:02:48
Trace duration: 500ms
Warmup duration: 5s
Max GPU buffer size: 128MB
Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event,python_function
Manifold bucket: gpu_traces
Manifold object: tree/traces/clientAPI/0/1653076953/devgpu040.ftw6/libkineto_activities_763488.json
Trace compression enabled: 1
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:559] Enabling GPU tracing
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:486] Running child profiler CuptiRangeProfiler for 500 ms
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:593] Tracing starting in 14s
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:596] Tracing will end in 15s
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0520 13:02:34.013141 763488 Logger.cpp:2273] Dropping logs in unit tests. Set shouldLogDuringTests=True in your CBLC to fix this
STAGE:2022-05-20 13:02:33 763488:763488 ActivityProfilerController.cpp:269] Completed Stage: Warm Up
STAGE:2022-05-20 13:02:34 763488:763488 ActivityProfilerController.cpp:275] Completed Stage: Collection
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:134] Processing 1 CPU buffers
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:771] Traces Recorded:
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:774] PyTorch Profiler: 1 iterations
ok

----------------------------------------------------------------------
Ran 1 test in 0.060s

OK
```

New test failure case:

```
=> buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction.test_record_function
test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_808629.json
INFO:2022-05-20 13:04:46 808629:808629 Config.cpp:470] Trace start time: 2022-05-20 13:05:01
Trace duration: 500ms
Warmup duration: 5s
Max GPU buffer size: 128MB
Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event,python_function
Manifold bucket: gpu_traces
Manifold object: tree/traces/clientAPI/0/1653077086/devgpu040.ftw6/libkineto_activities_808629.json
Trace compression enabled: 1
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:559] Enabling GPU tracing
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:486] Running child profiler CuptiRangeProfiler for 500 ms
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:593] Tracing starting in 14s
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:596] Tracing will end in 15s
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0520 13:04:46.853253 808629 Logger.cpp:2273] Dropping logs in unit tests. Set shouldLogDuringTests=True in your CBLC to fix this
STAGE:2022-05-20 13:04:46 808629:808629 ActivityProfilerController.cpp:269] Completed Stage: Warm Up
W0520 13:04:48.126065 808629 record_function.cpp:470] Exception in RecordFunction callback: inputs_valid_ INTERNAL ASSERT FAILED at "caffe2/aten/src/ATen/record_function.h":322, please report a bug to PyTorch. Called inputs() outside RecordFunction start callback
Exception raised from inputs at caffe2/aten/src/ATen/record_function.h:322 (most recent call first):
# 0  c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool)
# 1  std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::GetFetchStackTrace()::$_1>::_M_invoke(std::_Any_data const&)
# 2  c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
# 3  c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4  c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*)
# 5  torch::profiler::impl::ThreadLocalSubqueue::begin_op(at::RecordFunction const&, unsigned long)
# 6  std::unique_ptr<at::ObserverContext, std::default_delete<at::ObserverContext> > torch::autograd::profiler::(anonymous namespace)::onFunctionEnter<false>(at::RecordFunction const&)
# 7  at::RecordFunction::runStartCallbacks()
```

Differential Revision: D36556512

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78002
Approved by: https://github.com/swolchok
2022-05-25 06:33:59 +00:00
Kevin Tse
7c52f204e0 [DataPipe] Enforcing single valid iterator for IterDataPipes without multiple outputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70479
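
A sketch of the enforced behavior, assuming `IterableWrapper` (creating a second iterator invalidates the first):
```
from torch.utils.data.datapipes.iter import IterableWrapper

dp = IterableWrapper(range(10))
it1 = iter(dp)
print(next(it1))  # 0
it2 = iter(dp)    # creating a second iterator invalidates it1
print(next(it2))  # 0
next(it1)         # raises RuntimeError: it1 is no longer the valid iterator
```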

Approved by: https://github.com/ejguan
2022-05-18 01:31:38 +00:00
Taylor Robie
0df2e863fb [Profiler] Expose profilerType in Python
Summary: It's currently possible for C++ callers to check if there is an active profiler. This adds Python API parity. For now we just use the `torch._C._autograd` namespace, as this is mostly for first-party frameworks like RPC. (We can always move to a public API if there is demand.)
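
A hedged sketch of the intended usage; the exact binding name `_profiler_type` is an assumption here:
```
import torch
from torch.profiler import profile

# Assumed binding: torch._C._autograd._profiler_type(); treat as hypothetical.
with profile():
    print(torch._C._autograd._profiler_type())  # e.g. ActiveProfilerType.KINETO
```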

Test Plan: Added unit test

Differential Revision: D35602425

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75754
Approved by: https://github.com/rohan-varma
2022-04-16 21:08:18 +00:00
erjia
277c8fe646 [DataPipe] Make sure the profiler wrapper can delegate API for iterator
This PR delegates the API from the profiler wrapper layer to the `Iterator` returned from `IterDataPipe`.

We need this for internal usages such as `limit`, `resume`, etc.
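
A hypothetical sketch of the delegation pattern (names are illustrative, not the actual implementation):
```
class _ProfilerIterWrapper:
    # Profile __next__, forward everything else to the wrapped iterator.
    def __init__(self, inner):
        self._inner = inner

    def __iter__(self):
        return self

    def __next__(self):
        # ... profiling hooks around the real next() would go here ...
        return next(self._inner)

    def __getattr__(self, name):
        # Delegate limit/resume/etc. to the underlying DataPipe iterator.
        return getattr(self._inner, name)
```
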
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75275
Approved by: https://github.com/NivekT
2022-04-13 21:47:14 +00:00
alexmsettle
c0a6add7ee Changes to support input sequence ID tracking (#70264)
Summary:
Adds input sequence ID tracking in the NVTX markers. This feature adds additional information to the NVTX marker string, e.g. `seq_ids=[101, 102, 103]`, indicating the sequence id of the op that produced each input tensor, based on the tensor's position index in the array. In the above example, input tensor 0 was produced by the node with sequence id 101, input tensor 1 by node 102, and input tensor 2 by node 103. This is the same way the sizes array is organized. If you know the sequence id of a node and the sequence ids of its input edges, then you have enough information to construct the network graph.
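
A hypothetical sketch (illustrative data, not this PR's code) of rebuilding graph edges from such markers:
```
# Each marker carries the op's own sequence id plus the producer
# sequence ids of its inputs, mirroring the sizes array layout.
markers = [
    {"seq_id": 101, "seq_ids": []},
    {"seq_id": 102, "seq_ids": []},
    {"seq_id": 104, "seq_ids": [101, 102]},  # consumes outputs of 101 and 102
]

edges = [(src, m["seq_id"]) for m in markers for src in m["seq_ids"]]
print(edges)  # [(101, 104), (102, 104)]
```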

Fixes https://github.com/pytorch/pytorch/issues/66105

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70264

Reviewed By: chaekit

Differential Revision: D34792707

Pulled By: robieta

fbshipit-source-id: 4407b853c929a737505803b0db77a8ecd966cce2
(cherry picked from commit cd3c0c8c9d4d63d7897f60521c407883240d1d5b)
2022-03-31 22:15:39 +00:00
Mike Guo
554169fc7b Disable forward/backward correlation to workaround the profiler crash (#72904)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69443 and https://github.com/pytorch/pytorch/issues/72858

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72904

Reviewed By: george-qi

Differential Revision: D34382323

Pulled By: robieta

fbshipit-source-id: 2a6c18a010b6844a769d091b50bea1fd6285524f
(cherry picked from commit 7d2fabc5eca2b00c1f80b287b4a4e2650a178136)
2022-03-01 20:34:14 +00:00
Taylor Robie
322f13d914 [Profiler] Fix memory profile type from recent refactor (#71417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71417

I accidentally changed CPU_INSTANT_EVENT to CPU_OP, which broke TensorBoard.

Test Plan: Make memory profiling unit test check this case.

Reviewed By: aaronenyeshi

Differential Revision: D33637286

fbshipit-source-id: c95945f6b85cd4168820bd4d2a9203274a0a5bd6
(cherry picked from commit b1e258672a)
2022-01-18 22:18:11 +00:00
Mike Guo
23633bdb5c record the datapipe for each piece of Dataset (#67613)
Summary:
Add record_function for each DataPipe.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67613

Reviewed By: H-Huang

Differential Revision: D32246672

Pulled By: ejguan

fbshipit-source-id: 02ef7e75748c5b84fdcbb103398532e1f2962fbf
2021-12-01 10:29:06 -08:00
Andrey Talman
f1a3512b78 Adding Linux cuda 11.5 workflows (#68745)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68960

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68745

Reviewed By: janeyx99

Differential Revision: D32707491

Pulled By: atalman

fbshipit-source-id: 100facfdcc0fc2f68e203a696856852faa25ee08
2021-11-29 16:21:00 -08:00
Jane Xu
fa7fb7b4d9 [skip ci] Set test owner for test_profiler.py (#66831)
Summary:
Followup action to https://github.com/pytorch/pytorch/issues/66232

cc ilia-cher robieta chaekit gdankel bitfort ngimel orionr nbcsm guotuofeng guyang3532 gaoteng-git

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66831

Reviewed By: gdankel

Differential Revision: D31909245

Pulled By: janeyx99

fbshipit-source-id: 4156a5cffa215c29022fc4dab6ee5b442a509db4
2021-10-25 15:59:52 -07:00
Louis Feng
ecb7b38c00 [PyTorch] Support additional arguments in Python record function (#65736)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65736

We ran into some limitations when extracting PyTorch operator parameters through hooks or the execution graph. Some of these limitations are not due to the operator not exposing the parameters; rather, the inputs for these operators are already fused/processed in some cases (like embedding tables). We want to be able to attach metadata to the user-scope record functions, allowing the profilers to later extract this information.

The record function C++ API already supports taking inputs and outputs information; the corresponding Python interface does not, and only allows a string name as the record function parameter.

This diff adds support for users to optionally add additional arguments to the record function, in two ways.
1. To remain backward compatible with `record_function_op`, we have added an optional string arg to the interface: `with record_function(name, arg_str)`.
2. To support the data dependency graph, we also have the new `torch.autograd._record_function_with_args_enter` and `torch.autograd._record_function_with_args_exit` functions, which provide an interface that accepts additional tensor arguments. For now we imagine this can be used for debugging or analysis purposes. In this form, we currently support some basic data types as inputs: scalars, string, list, and tensor.

Example usage:

```
import torch
from torch.autograd import (_record_function_with_args_enter,
                            _record_function_with_args_exit)
from torch.autograd.profiler import record_function

u = torch.rand(3, 4, 5)

# record_function operator with a name and, optionally, a string for arguments.
with record_function("## TEST 1 ##", "[1, 2, 3]"):
    torch.relu(u)  # actual module or operator

# More general form of record_function, taking tensor and scalar arguments.
a = _record_function_with_args_enter("## TEST 2 ##", 1, False, 2.5, [u, u], "hello", u)
torch.relu(u)  # actual module or operator
_record_function_with_args_exit(a)
```
Corresponding outputs in execution graph:
```
    {
      "name": "## TEST 2 ##", "id": 7, "parent": 3, "fw_parent": 0, "scope": 5, "tid": 1, "fw_tid": 0,
      "inputs": [1,false,2.5,[6,6],"hello",6], "input_shapes": [[],[],[],[[3,4,5],[3,4,5]],[],[3,4,5]], "input_types": ["Int","Bool","Double","GenericList[Tensor(float),Tensor(float)]","String","Tensor(float)"],
      "outputs": [], "output_shapes": [], "output_types": []
    },
    {
      "name": "## TEST 1 ##", "id": 3, "parent": 2, "fw_parent": 0, "scope": 5, "tid": 1, "fw_tid": 0,
      "inputs": ["1, 2, 3"], "input_shapes": [[]], "input_types": ["String"],
      "outputs": [], "output_shapes": [], "output_types": []
    },
```

Test Plan:
```
=> buck build caffe2/test:profiler --show-output
=> buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction
test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_1651304.json
Net filter:
Target net for iteration count:
Net Iterations: 3
INFO:2021-09-27 01:10:15 1651304:1651304 Config.cpp:424] Trace start time: 2021-09-27 01:10:30
Trace duration: 500ms
Warmup duration: 5s
Net size threshold: 0
GPU op count threshold: 0
Max GPU buffer size: 128MB
Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event
Manifold bucket: gpu_traces
Manifold object: tree/traces/clientAPI/0/1632730215/devvm2060.ftw0/libkineto_activities_1651304.json
Trace compression enabled: 1
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:536] Tracing starting in 14s
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:48] Target net for iterations not specified - picking first encountered that passes net filter
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:57] Tracking net PyTorch Profiler for 3 iterations
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:126] Processing 1 CPU buffers
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:686] Recorded nets:
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:689] PyTorch Profiler: 1 iterations
ok

----------------------------------------------------------------------
Ran 1 test in 0.021s

OK
```

Reviewed By: gdankel

Differential Revision: D31165259

fbshipit-source-id: 15920aaef7138c666e5eca2a71c3bf33073eadc4
2021-10-13 01:49:15 -07:00
Nikita Shulga
399214efd6 Revert D31172530: [pytorch][PR] Enable CUPTI for kineto by default on windows
Test Plan: revert-hammer

Differential Revision:
D31172530 (6b60884f12)

Original commit changeset: 2c69ed0282c5

fbshipit-source-id: 649e040a8c44b0f536a8db397b4325309a285934
2021-09-24 19:18:15 -07:00
Guangyun Han
6b60884f12 Enable CUPTI for kineto by default on windows (#65608)
Summary:
Retry of https://github.com/pytorch/pytorch/pull/62175

See https://github.com/pytorch/pytorch/pull/62175#issuecomment-926411151 for more information.

malfet gdankel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65608

Reviewed By: zou3519

Differential Revision: D31172530

Pulled By: gdankel

fbshipit-source-id: 2c69ed0282c54fa6cdb6e604096d0370e230fd66
2021-09-24 13:00:49 -07:00
Nikita Shulga
bc02255d5e Revert D30721329: [pytorch][PR] Enable CUPTI for kineto by default on windows.
Test Plan: revert-hammer

Differential Revision:
D30721329 (7dbc21bc2b)

Original commit changeset: aa1af47df8cc

fbshipit-source-id: 565d50841e19a45f8798a490aa3aa6b9f69ca404
2021-09-23 22:14:32 -07:00
Guangyun Han
7dbc21bc2b Enable CUPTI for kineto by default on windows. (#62175)
Summary:
This does not fix anything by itself.

For tracking this PR, please refer to https://github.com/pytorch/kineto/issues/356

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62175

Reviewed By: ezyang

Differential Revision: D30721329

Pulled By: gdankel

fbshipit-source-id: aa1af47df8cc1b6f5ba2194447f62b902a6a9c84
2021-09-23 15:13:47 -07:00
Teng Gao
d35ee431d8 correlate forward and backward op (#62553)
Summary:
Use the startThreadId+seqNumber of the forward op and the fwdThreadId+seqNumber of the backward op to correlate pairs of them.
third_party/kineto should be updated accordingly: https://github.com/pytorch/kineto/pull/372

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62553

Reviewed By: malfet

Differential Revision: D30125728

Pulled By: gdankel

fbshipit-source-id: 9877a54392ba043d0eac56ce5b7bbf244277fa7e
2021-09-21 07:28:29 -07:00
Lucas Kabela
4a59f0b9d9 [Profiler] Change FLOP/s to Total FLOPs (#62779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62779

Change from floating point operations per second to total floating point operations. This requires removing the division by execution time from the Kineto-computed FLOPs and updating the necessary documentation.

Test Plan:
Running the following script:

```
import torch
from torch.profiler import profile
import torchvision.models as models

model = models.resnet18().eval()
inputs = torch.randn(5, 3, 224, 224)
with torch.no_grad():
    with profile(record_shapes=True, with_flops=True) as prof:
        model(inputs)
print(prof.key_averages().table(sort_by="cpu_time_total"))
```

Before diff results in:

{F636640118}

And after the diff it should be about `(27.78 * 10^9) FLOP/s * 0.652838 s = 18135839640 FLOP ≈ 18.136 GFLOP`. Running the script again yields this answer:

{F636655686}

------------------------------------

Reviewed By: gdankel

Differential Revision: D29972997

fbshipit-source-id: 0f8d9f264b7d9f8f6bb3f10ab7c2c9794291e28b
2021-08-16 13:43:32 -07:00
Kimish Patel
54f2eb6e7e [Pytorch Profiler] Add support for adding module hierarchy to KinetoEvent (#61792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61792

This PR adds module hierarchy information to events.
What is the module hierarchy information attached to events?
While profiling a TorchScript module, when events are added, we ask the JIT
for the module hierarchy associated with the node being executed. At the
time that node executes, there might be multiple frames on the interpreter
stack. For each frame, we find the corresponding node and query its module
hierarchy. The module hierarchy corresponding to a node is associated with
the node's InlinedCallStack, which tracks the path via which the node was
inlined. Thus, during the inlining process we annotate module information
corresponding to the CallMethod nodes being inlined.

With this PR, the chrome trace will contain additional metadata,
"Module Hierarchy", which can look like this:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward
It contains the module instance name, type name, and method name at each
level of the call stack.
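
A sketch of surfacing this metadata, assuming the `with_modules` profiler flag:
```
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.jit.script(torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()))
with profile(activities=[ProfilerActivity.CPU], with_modules=True) as prof:
    model(torch.randn(2, 4))

# Events in the exported trace carry the "Module Hierarchy" metadata.
prof.export_chrome_trace("module_hierarchy_trace.json")
```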

Test Plan:
test_profiler

Imported from OSS

Reviewed By: raziel, ilia-cher

Differential Revision: D29745442

fbshipit-source-id: dc8dfaf7c5b8ab256ff0b2ef1e5ec265ca366528
2021-08-13 21:39:10 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
Han Guangyun
8bbcef5096 Report more information for memory profiling (#61282)
Summary:
Report the allocation pointer and size, total allocated memory, and total reserved size all in one report.

`ptr` and `alloc_size` will be used for associating allocations with the op trace;
`allocated_size` and `reserved_size` will be used for the memory trace.
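
A minimal sketch of consuming these fields through the public API, assuming `profile_memory=True`:
```
import torch
from torch.profiler import profile, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    x = torch.randn(1024, 1024)
    y = x @ x

print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=5))
```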

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61282

Reviewed By: ejguan

Differential Revision: D29796282

Pulled By: chaekit

fbshipit-source-id: 5314c867632d3af1fa9a3811b35eaa5e931a5d87
2021-08-04 15:03:14 -07:00
Mike Guo
08539ca047 Add non-context manager usage support for profiler (#61690)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60238, https://github.com/pytorch/kineto/issues/329
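
A minimal sketch of the non-context-manager form, assuming `start()`/`stop()` on `torch.profiler.profile`:
```
import torch
from torch.profiler import profile

prof = profile()
prof.start()                     # instead of `with profile() as prof:`
torch.randn(64, 64).sum()
prof.stop()
print(prof.key_averages().table(row_limit=5))
```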

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61690

Reviewed By: malfet

Differential Revision: D30016561

Pulled By: ngimel

fbshipit-source-id: 93a578ffbb556f4b584213ac9cfafcc5cf0a9270
2021-07-30 15:54:36 -07:00
Ilia Cherniavskii
6997e7bd39 Update Kineto submodule (#58179)
Summary:
Update Kineto submodule, minor API changes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58179

Test Plan: CI

Reviewed By: gdankel

Differential Revision: D28391369

Pulled By: ilia-cher

fbshipit-source-id: 61fbf63d9ec2db66fac203944679e4b99cb0d568
2021-05-13 04:03:04 -07:00
Ilia Cherniavskii
e18f5f1d13 [profiler][small] Add skip_first parameter to the default schedule (#58025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58025

Add `skip_first` parameter to allow for arbitrary profiler step ranges
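
A minimal sketch, assuming the standard `torch.profiler.schedule` signature:
```
import torch
from torch.profiler import profile, schedule

# Ignore the first 10 steps entirely, then cycle wait/warmup/active twice.
sched = schedule(skip_first=10, wait=1, warmup=1, active=3, repeat=2)

def on_ready(prof):
    print(prof.key_averages().table(row_limit=3))

with profile(schedule=sched, on_trace_ready=on_ready) as prof:
    for _ in range(30):
        torch.randn(32, 32).sum()
        prof.step()
```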

Test Plan: python test/test_profiler.py

Reviewed By: gdankel

Differential Revision: D28347768

Pulled By: ilia-cher

fbshipit-source-id: bb6fd3cedfa4a5d1307b91002def733896dd03eb
2021-05-12 02:06:11 -07:00
Ilia Cherniavskii
bf2ebfc9f6 [profiler][small] Handle empty trace (#58013)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58013

Add a test case and a fix (legacy profiler) for empty trace handling

Test Plan: python test/test_profiler.py

Reviewed By: gdankel

Differential Revision: D28345388

Pulled By: ilia-cher

fbshipit-source-id: 4727589ab83367ac8b506cc0f186e5292d974671
2021-05-12 02:06:08 -07:00
Ilia Cherniavskii
c714596027 [kineto] Update Kineto submodule, cupti library paths (#57789)
Summary:
Update Kineto submodule, improve CUPTI detection

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57789

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D28297175

Pulled By: ilia-cher

fbshipit-source-id: 5895270fae160097ae8872a592984d0e4a1b187b
2021-05-10 19:15:59 -07:00
Ilia Cherniavskii
8639fd104e [profiler][kineto] Support for memory allocs/deallocs in the traces (#57835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57835

Pull Request resolved: https://github.com/pytorch/kineto/pull/208

Adding the ability to save memory allocs/deallocs into the trace

Test Plan: python test/test_profiler.py -v

Reviewed By: gdankel

Differential Revision: D28260915

fbshipit-source-id: d7905d38d7fac9750754ac1b293d3a1951590b5f
2021-05-07 21:23:30 -07:00
Ilia Cherniavskii
2370d8c41f [profiler] Add profiler fallback (#57612)
Summary:
Add the ability to use the new profiler API even if Kineto is not compiled
in, by falling back to the legacy profiler.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57612

Test Plan:
Compiled with
USE_KINETO=0 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake
and with USE_KINETO=1, then ran
python test/test_profiler.py -v

Reviewed By: gdankel

Differential Revision: D28217680

Pulled By: ilia-cher

fbshipit-source-id: ec81fb527eb69bb0a3e0bd6aad13592200d7fe70
2021-05-06 13:35:27 -07:00
Ilia Cherniavskii
8df9b88042 [kineto] Update Kineto submodule (#57700)
Summary:
Update Kineto submodule to fix an invalid JSON bug; also update and
move the profiler JSON tracing unit test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57700

Test Plan: python test/test_profiler.py -v

Reviewed By: gdankel, rohan-varma

Differential Revision: D28243256

Pulled By: ilia-cher

fbshipit-source-id: edfe9f26c66e967d610231be5fc22ba5ee1054fa
2021-05-05 20:09:38 -07:00
Ilia Cherniavskii
65fad0ebd2 Expand Kineto platform support (ci-all) (#56323)
Summary:
Expanding support to all builds

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56323

Test Plan: CI

Reviewed By: malfet

Differential Revision: D28171478

Pulled By: ilia-cher

fbshipit-source-id: 16bc752d1be3cbaeda5316f5d8a687ae05a83d22
2021-05-05 15:00:01 -07:00
Mike Guo
efd451385c Add gzip format support for chrome tracing (#56554)
Summary:
Add gzip format support when exporting chrome tracing
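
A minimal sketch, assuming a `.gz` suffix on the output path triggers compression:
```
import torch
from torch.profiler import profile

with profile() as prof:
    torch.randn(64, 64).sum()

prof.export_chrome_trace("trace.json.gz")  # gzip-compressed chrome trace
```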

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56554

Reviewed By: xuzhao9

Differential Revision: D28019111

Pulled By: ilia-cher

fbshipit-source-id: 7d522481912bc9e93b4b31b17f01b1b069c7d2b6
2021-04-28 12:40:33 -07:00
Ilia Cherniavskii
3115728cba [profiler] Support for trace metadata (#56575)
Summary:
Adding support for user-defined trace metadata
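
A minimal sketch, assuming the `add_metadata(key, value)` helper on the profile object:
```
import torch
from torch.profiler import profile

with profile() as prof:
    torch.randn(64, 64).sum()
    prof.add_metadata("run_id", "exp-42")  # assumed helper; must run while active

prof.export_chrome_trace("trace_with_metadata.json")
```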

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56575

Test Plan: python test/test_profiler.py TestProfiler.test_profiler_metadata

Reviewed By: gdankel

Differential Revision: D27957876

Pulled By: ilia-cher

fbshipit-source-id: 8b6c254cca97eca23fc418e37e5772b207b0525a
2021-04-28 05:12:34 -07:00
Ilia Cherniavskii
dfd5331e9c Skip tests on ROCm (#53339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53339

Skip tests on ROCm

Test Plan: CI

Reviewed By: gdankel, ZolotukhinM

Differential Revision: D26838813

fbshipit-source-id: e26286a61a192710e393c19d3eb2316b6c76a42e
2021-03-04 21:55:34 -08:00
Ilia Cherniavskii
795ed5ca3f Enable Kineto in CPU builds (#53174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53174

Enable Kineto also in the CPU builds (non-mobile, non-Windows at the moment)

Test Plan: CI

Reviewed By: gdankel

Differential Revision: D26776112

Pulled By: ilia-cher

fbshipit-source-id: 8733f65c2993105136c853f2a7b6e497d0fa53bf
2021-03-04 19:15:52 -08:00
Xu Zhao
5c3a054b12 Add FLOPS support to the new profiler API. (#51734)
Summary:
The new profiler API was added in PR #48280. This PR adds FLOPS
support to the new profiler API.
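
A minimal sketch, assuming `with_flops=True` on `torch.profiler.profile`:
```
import torch
from torch.profiler import profile, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU], record_shapes=True, with_flops=True) as prof:
    torch.randn(64, 64) @ torch.randn(64, 64)

# The table gains a FLOPs column for ops with known formulas (matmul, conv, ...).
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```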

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51734

Test Plan:
```python
python test/test_profiler.py -k test_flops
```

Reviewed By: xuzhao9

Differential Revision: D26261851

Pulled By: ilia-cher

fbshipit-source-id: dbeba4c197e6f51a9a8e640e8bb60ec38df87f73
2021-02-05 15:03:35 -08:00
guyang3532
ab0cf3b6b5 Add 'repeat' argument to profiler.schedule (#51630)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51630

Reviewed By: gdankel

Differential Revision: D26246317

Pulled By: ilia-cher

fbshipit-source-id: 28b572c837184fe1b2a07dd57e99aa72cb93a9cb
2021-02-04 13:51:04 -08:00
Ilia Cherniavskii
f1f9b049d8 [profiler] Support top-level memory events (#51421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51421

Mark memory events that did not happen within an operator context
explicitly in the profiler output.

Test Plan: python test/test_profiler.py -k test_memory_profiler

Reviewed By: ngimel

Differential Revision: D26166518

Pulled By: ilia-cher

fbshipit-source-id: 3c14d3ac25a7137733ea7cc65f0eb48693a98f5e
2021-02-04 04:14:15 -08:00
Xu Zhao
cae4379826 Enable FLOPS Computation for Experimental Kineto Profiler (#51503)
Summary:
Add the FLOPS metric computation to the experimental Kineto profiler.
This includes saving the necessary extra arguments and computing FLOPs in the C++ code,
and extracting the FLOPS value in the Python frontend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51503

Test Plan:
Build PyTorch with USE_KINETO option, then run the unit test:

```python
python test/test_profiler.py -k test_flops
```

Reviewed By: ilia-cher

Differential Revision: D26202711

Pulled By: xuzhao9

fbshipit-source-id: 7dab7c513f454355a220b72859edb3ccbddcb3ff
2021-02-03 12:15:23 -08:00