Pull Request resolved: https://github.com/pytorch/pytorch/pull/78164
This PR finishes moving over the python tracer to use the unified event type. Things that changed:
1) The hacky after-the-fact splicing of python events in profiler_kineto.cpp is gone and python events now simply fold into the rest. (Yay!!!) This is a major BE win.
2) Added `ExtraFields<EventType::PyCall>` and `ExtraFields<EventType::PyCCall>`
3) The enter events (time + TraceKey) are now handled by RecordQueue for performance.
4) Python tracing now uses TSC for lower overhead.
Simplifications in profiler_python WRT part 1:
1) Rather than ValueCache emitting an intermediate value_t that gets further converted, load methods can now directly emit ExtraFields<...>
2) The complicated replay in profiler_python.cpp is replaced with a much simpler (and safer) pass that just pairs start and end times. (A rough sketch of this pairing follows the list.)
3) During post processing we can now use `CallTypeHelper::map` to automatically pull in all events instead of having to loop over the entries for each type manually. This will make it simpler to add new types of Python events later.
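To illustrate item (2), here is a minimal Python sketch of the pairing pass, assuming per-thread streams of `(key, is_enter, timestamp)` records in time order. The names are illustrative only, not the actual types in `profiler_python.cpp`:
```python
def pair_enter_exit(records):
    """records: iterable of (key, is_enter, t) tuples, in time order, for one thread.

    Returns (key, start_t, end_t) tuples. Enters still open when the stream ends
    are closed at the last observed timestamp, mirroring a trace that stops mid-call.
    """
    stack, paired, last_t = [], [], None
    for key, is_enter, t in records:
        last_t = t
        if is_enter:
            stack.append((key, t))
        elif stack:
            start_key, start_t = stack.pop()
            paired.append((start_key, start_t, t))
    while stack:  # close calls that never exited before profiling stopped
        start_key, start_t = stack.pop()
        paired.append((start_key, start_t, last_t))
    return paired
```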
Differential Revision: [D36515869](https://our.internmc.facebook.com/intern/diff/D36515869/)
Approved by: https://github.com/aaronenyeshi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78163
The python function tracer is complicated and separate from the other profiler types, so I've chosen to break the change into two diffs. The first (this one) reworks the cache structure to make it amenable to integration (as well as making some other nice tweaks), and the next one actually moves it over.
The old cache scheme worked very hard to pack all the information about an event into a small struct via bit packing, with a couple of secondary caches for things like names. Because of the space constraints on that struct (and the fact that it had to represent all call and return types), there were a lot of subtle invariants swirling around that made it hard to offload anything to a different component. The new cache system is more modular and also, as it turns out, a bit faster. (Benchmarks in part 2.)
There is a more detailed description of the cache hierarchy in the PR, but the gist is that I use various specializations to handle the different event types (python call, nn module, c function) and lean on the type system to keep everything safe and organized. (One nice thing about using unique IDs is that they also implicitly encode the event type. They implicitly encode everything!) Given that we are going to want to expand the semantics (e.g. torch ops, DataLoader, etc) this will give a nice way to capture richer semantics without significantly increasing the complexity of the profiler.
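As a loose analogy for that layout (the real code is templated C++; everything below is illustrative, not the actual `ValueCache` API), each call type gets its own specialized store, and the opaque key returned on intern both identifies the entry and encodes which store it lives in:
```python
from dataclasses import dataclass
from enum import Enum

class CallType(Enum):
    PY_CALL = 0
    NN_MODULE = 1
    C_FUNCTION = 2

@dataclass(frozen=True)
class CacheKey:
    call_type: CallType  # the key implicitly encodes the event type
    index: int

class ValueCacheSketch:
    def __init__(self):
        # One specialized store per call type instead of a single bit-packed struct.
        self._stores = {t: [] for t in CallType}

    def intern(self, call_type, payload):
        store = self._stores[call_type]
        store.append(payload)
        return CacheKey(call_type, len(store) - 1)

    def load(self, key):
        # Dispatch on the type encoded in the key; each store can hold a
        # differently shaped record, analogous to the per-type specializations.
        return self._stores[key.call_type][key.index]
```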
Differential Revision: [D36379147](https://our.internmc.facebook.com/intern/diff/D36379147/)
Approved by: https://github.com/aaronenyeshi
Seems like it should be one. This will make it possible to register
meta implementations even when there is a CompositeImplicitAutograd
registration already. It also paves the way for sparse meta, etc.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78469
Approved by: https://github.com/ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77698
This PR adds tree building to the post processing of profiler. The basic algorithm is to sort the events, maintain a stack and a priority queue of event ends, and push/pop accordingly. The logic for merging Python events is still separate in `profiler_kineto.cpp`. That can be removed when Python events have an `EventType`.
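A rough Python sketch of that algorithm (not the actual C++ in `profiler_kineto.cpp`; the `Event` fields below are assumptions for illustration):
```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Event:
    name: str
    start: int
    end: int
    parent: Optional["Event"] = None
    children: List["Event"] = field(default_factory=list)

def build_tree(events):
    # Assumes properly nested (non-crossing) intervals, as profiler events are.
    stack, end_heap, roots = [], [], []
    tie = 0  # tie-breaker so the heap never compares Event objects
    # Outer events sort before inner events that start at the same time.
    for ev in sorted(events, key=lambda e: (e.start, -e.end)):
        # Pop every open event that ends at or before this event's start.
        while end_heap and end_heap[0][0] <= ev.start:
            heapq.heappop(end_heap)
            stack.pop()
        if stack:
            ev.parent = stack[-1]
            stack[-1].children.append(ev)
        else:
            roots.append(ev)
        stack.append(ev)
        heapq.heappush(end_heap, (ev.end, tie))
        tie += 1
    return roots
```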
Differential Revision: [D36321105](https://our.internmc.facebook.com/intern/diff/D36321105/)
Approved by: https://github.com/aaronenyeshi
Summary:
I think this has to be set in all before() calls, because inputs_valid_ defaults to false. For a RecordFunction without any input parameters (so this interface is not used), calling record_func.inputs() will trigger an assert:
```
fbcode/caffe2/aten/src/ATen/record_function.h
322
TORCH_INTERNAL_ASSERT_DEBUG_ONLY(inputs_valid_, "Called inputs() outside RecordFunction start callback");
```
I suppose an alternative would be to require users to call num_inputs() before calling inputs(). But I think the intent of inputs_valid_ is to verify that inputs are only requested within the lifetime of the start callback.
Test Plan:
After this diff fix:
```
=> buck build mode/dev-nosan caffe2/test:profiler --show-output
=> buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction.test_record_function
test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_763488.json
INFO:2022-05-20 13:02:33 763488:763488 Config.cpp:470] Trace start time: 2022-05-20 13:02:48
Trace duration: 500ms
Warmup duration: 5s
Max GPU buffer size: 128MB
Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event,python_function
Manifold bucket: gpu_traces
Manifold object: tree/traces/clientAPI/0/1653076953/devgpu040.ftw6/libkineto_activities_763488.json
Trace compression enabled: 1
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:559] Enabling GPU tracing
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:486] Running child profiler CuptiRangeProfiler for 500 ms
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:593] Tracing starting in 14s
INFO:2022-05-20 13:02:33 763488:763488 CuptiActivityProfiler.cpp:596] Tracing will end in 15s
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0520 13:02:34.013141 763488 Logger.cpp:2273] Dropping logs in unit tests. Set shouldLogDuringTests=True in your CBLC to fix this
STAGE:2022-05-20 13:02:33 763488:763488 ActivityProfilerController.cpp:269] Completed Stage: Warm Up
STAGE:2022-05-20 13:02:34 763488:763488 ActivityProfilerController.cpp:275] Completed Stage: Collection
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:134] Processing 1 CPU buffers
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:771] Traces Recorded:
INFO:2022-05-20 13:02:34 763488:763488 CuptiActivityProfiler.cpp:774] PyTorch Profiler: 1 iterations
ok
----------------------------------------------------------------------
Ran 1 test in 0.060s
OK
```
New test failure case:
```
=> buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction.test_record_function
test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_808629.json
INFO:2022-05-20 13:04:46 808629:808629 Config.cpp:470] Trace start time: 2022-05-20 13:05:01
Trace duration: 500ms
Warmup duration: 5s
Max GPU buffer size: 128MB
Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event,python_function
Manifold bucket: gpu_traces
Manifold object: tree/traces/clientAPI/0/1653077086/devgpu040.ftw6/libkineto_activities_808629.json
Trace compression enabled: 1
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:559] Enabling GPU tracing
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:486] Running child profiler CuptiRangeProfiler for 500 ms
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:593] Tracing starting in 14s
INFO:2022-05-20 13:04:46 808629:808629 CuptiActivityProfiler.cpp:596] Tracing will end in 15s
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0520 13:04:46.853253 808629 Logger.cpp:2273] Dropping logs in unit tests. Set shouldLogDuringTests=True in your CBLC to fix this
STAGE:2022-05-20 13:04:46 808629:808629 ActivityProfilerController.cpp:269] Completed Stage: Warm Up
W0520 13:04:48.126065 808629 record_function.cpp:470] Exception in RecordFunction callback: inputs_valid_ INTERNAL ASSERT FAILED at "caffe2/aten/src/ATen/record_function.h":322, please report a bug to PyTorch. Called inputs() outside RecordFunction start callback
Exception raised from inputs at caffe2/aten/src/ATen/record_function.h:322 (most recent call first):
# 0 c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool)
# 1 std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::GetFetchStackTrace()::$_1>::_M_invoke(std::_Any_data const&)
# 2 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
# 3 c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4 c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*)
# 5 torch::profiler::impl::ThreadLocalSubqueue::begin_op(at::RecordFunction const&, unsigned long)
# 6 std::unique_ptr<at::ObserverContext, std::default_delete<at::ObserverContext> > torch::autograd::profiler::(anonymous namespace)::onFunctionEnter<false>(at::RecordFunction const&)
# 7 at::RecordFunction::runStartCallbacks()
```
Differential Revision: D36556512
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78002
Approved by: https://github.com/swolchok
Summary: It's currently possible for C++ callers to check if there is an active profiler. This adds Python API parity. For now we just use `torch._C._autograd` namespace, as this is mostly for first party frameworks like RPC. (We can always move to public API if there is demand.)
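A minimal usage sketch, hedged: the binding name used below (`_profiler_enabled`) is an assumption for illustration rather than a quote of the PR; check `torch._C._autograd` for the actual symbol.
```python
import contextlib
import torch

def maybe_annotate(name):
    """Only pay for a record_function annotation when a profiler is active."""
    # Hypothetical attribute lookup; the real name of the new binding may differ.
    enabled = getattr(torch._C._autograd, "_profiler_enabled", lambda: False)()
    if enabled:
        return torch.autograd.profiler.record_function(name)
    return contextlib.nullcontext()

with maybe_annotate("rpc_request"):
    pass  # work that should only be annotated while profiling
```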
Test Plan: Added unit test
Differential Revision: D35602425
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75754
Approved by: https://github.com/rohan-varma
This PR aims to solve the problem of delegating the API from the profiler layer to the `Iterator` returned from `IterDataPipe`.
We need this for internal usage such as `limit`, `resume`, etc.
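A minimal sketch of the idea, with made-up names (the real iterator wrapper and method names in torch.utils.data may differ): the wrapper returned when iterating an `IterDataPipe` simply forwards control-style calls such as `limit` or `resume` to the underlying pipe.
```python
class _DelegatingIteratorSketch:
    """Illustrative only: forwards unknown attributes to the source IterDataPipe."""

    def __init__(self, source_datapipe):
        self._source = source_datapipe
        self._it = iter(source_datapipe)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __getattr__(self, name):
        # Anything the wrapper does not define itself (e.g. limit, resume)
        # is delegated to the wrapped datapipe.
        return getattr(self._source, name)
```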
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75275
Approved by: https://github.com/NivekT
Summary:
Include the sequence ids of the ops that produced each input tensor in the NVTX markers.
This feature adds additional information to the NVTX marker string, e.g. seq_ids=[101, 102, 103],
indicating the sequence id of the op that produced the input tensor at that position index
in the array. In the above example, input tensor 0 was produced by the node with sequence id 101,
input tensor 1 by node 102, and input tensor 2 by node 103. This is the same way the sizes array
is organized. If you know the sequence id of a node and the sequence ids of its input edges,
then you have enough information to construct the network graph.
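For example, a post-processing script can recover the input edges from the marker strings roughly like this (the full marker format below is a simplified assumption; only the `seq_ids=[...]` field is taken from the description above):
```python
import re

def input_edges(marker: str):
    """Return (producer_seq_id, input_index) pairs parsed from a marker string.

    Assumes the marker contains a field like "seq_ids=[101, 102, 103]".
    """
    m = re.search(r"seq_ids=\[([^\]]*)\]", marker)
    if not m:
        return []
    ids = [int(s) for s in m.group(1).split(",") if s.strip()]
    return [(seq_id, idx) for idx, seq_id in enumerate(ids)]

# e.g. input_edges("aten::add, sizes=[[3], [3]], seq_ids=[101, 102]")
# -> [(101, 0), (102, 1)]
```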
Fixes https://github.com/pytorch/pytorch/issues/66105
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70264
Reviewed By: chaekit
Differential Revision: D34792707
Pulled By: robieta
fbshipit-source-id: 4407b853c929a737505803b0db77a8ecd966cce2
(cherry picked from commit cd3c0c8c9d4d63d7897f60521c407883240d1d5b)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71417
I accidentally changed CPU_INSTANT_EVENT to CPU_OP, which broke TensorBoard.
Test Plan: Make memory profiling unit test check this case.
Reviewed By: aaronenyeshi
Differential Revision: D33637286
fbshipit-source-id: c95945f6b85cd4168820bd4d2a9203274a0a5bd6
(cherry picked from commit b1e258672a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65736
We ran into some limitations when extracting PyTorch operator parameters through hooks or the execution graph. Some of these limitations are not due to the operator not exposing the parameters; rather, the inputs for these operators are already fused/processed in some cases (like embedding tables). We want to be able to attach some metadata to the user scope record functions, allowing the profilers to later extract this information.
The record function C++ API already supports taking inputs and outputs information. The corresponding Python interface does not support them and only allows a string name as record function parameter.
This diff adds support for users to optionally add additional arguments to the record function, in two ways.
1. to remain backward compatible with `record_function_op`, we have added an optional string arg to the interface: `with record_function(name, arg_str)`.
2. to support the data dependency graph, we also have the new `torch.autograd._record_function_with_args_enter` and `torch.autograd._record_function_with_args_exit` functions to provide an interface where we can give additional tensor arguments. For now we imagine this can be used for debugging or analysis purposes. In this form, we currently support some basic data types as inputs: scalars, string, list, and tensor.
Example usage:
```
# record_function operator with a name and, optionally, a string for arguments.
with record_function("## TEST 1 ##", "[1, 2, 3]"):
    <actual module or operator>

# more general form of record_function
a = _record_function_with_args_enter("## TEST 2 ##", 1, False, 2.5, [u, u], "hello", u)
<actual module or operator>
_record_function_with_args_exit(a)
```
Corresponding outputs in execution graph:
```
{
  "name": "## TEST 2 ##", "id": 7, "parent": 3, "fw_parent": 0, "scope": 5, "tid": 1, "fw_tid": 0,
  "inputs": [1, false, 2.5, [6, 6], "hello", 6],
  "input_shapes": [[], [], [], [[3, 4, 5], [3, 4, 5]], [], [3, 4, 5]],
  "input_types": ["Int", "Bool", "Double", "GenericList[Tensor(float),Tensor(float)]", "String", "Tensor(float)"],
  "outputs": [], "output_shapes": [], "output_types": []
},
{
  "name": "## TEST 1 ##", "id": 3, "parent": 2, "fw_parent": 0, "scope": 5, "tid": 1, "fw_tid": 0,
  "inputs": ["1, 2, 3"], "input_shapes": [[]], "input_types": ["String"],
  "outputs": [], "output_shapes": [], "output_types": []
},
```
Test Plan:
```
=> buck build caffe2/test:profiler --show-output
=> buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction
test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_1651304.json
Net filter:
Target net for iteration count:
Net Iterations: 3
INFO:2021-09-27 01:10:15 1651304:1651304 Config.cpp:424] Trace start time: 2021-09-27 01:10:30
Trace duration: 500ms
Warmup duration: 5s
Net size threshold: 0
GPU op count threshold: 0
Max GPU buffer size: 128MB
Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event
Manifold bucket: gpu_traces
Manifold object: tree/traces/clientAPI/0/1632730215/devvm2060.ftw0/libkineto_activities_1651304.json
Trace compression enabled: 1
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:536] Tracing starting in 14s
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:48] Target net for iterations not specified - picking first encountered that passes net filter
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:57] Tracking net PyTorch Profiler for 3 iterations
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:126] Processing 1 CPU buffers
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:686] Recorded nets:
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:689] PyTorch Profiler: 1 iterations
ok
----------------------------------------------------------------------
Ran 1 test in 0.021s
OK
```
Reviewed By: gdankel
Differential Revision: D31165259
fbshipit-source-id: 15920aaef7138c666e5eca2a71c3bf33073eadc4
Summary:
Use startThreadId+seqNumber of the forward op and fwdThreadId+seqNumber of the backward op to correlate pairs of them.
third_party/kineto should be updated accordingly: https://github.com/pytorch/kineto/pull/372
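Conceptually the pairing looks like the sketch below, assuming plain event records with the listed fields (illustrative; not the Kineto schema):
```python
def pair_fwd_bwd(events):
    """events: records with .is_backward, .seq_nr, .start_thread_id, .fwd_thread_id.

    Forward ops are keyed on (start_thread_id, seq_nr); backward ops look up
    their forward partner via (fwd_thread_id, seq_nr).
    """
    fwd = {}
    for ev in events:
        if not ev.is_backward and ev.seq_nr >= 0:
            fwd[(ev.start_thread_id, ev.seq_nr)] = ev
    pairs = []
    for ev in events:
        if ev.is_backward and ev.seq_nr >= 0:
            partner = fwd.get((ev.fwd_thread_id, ev.seq_nr))
            if partner is not None:
                pairs.append((partner, ev))
    return pairs
```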
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62553
Reviewed By: malfet
Differential Revision: D30125728
Pulled By: gdankel
fbshipit-source-id: 9877a54392ba043d0eac56ce5b7bbf244277fa7e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62779
Change from floating point operations per second to total floating point operations. This requires removing the division by execution time from the Kineto-computed FLOPs and updating the necessary documentation.
Test Plan:
Running the following script:
```
import torch
from torch.profiler import profile
import torchvision.models as models
model = models.resnet18().eval()
inputs = torch.randn(5, 3, 224, 224)
with torch.no_grad():
    with profile(record_shapes=True, with_flops=True) as prof:
        model(inputs)

print(prof.key_averages().table(sort_by="cpu_time_total"))
```
Before diff results in:
{F636640118}
And after the diff it should be about `(27.78 * 10^9) FLOP/s * 0.652838 s = 18,135,839,640 FLOP ≈ 18.136 GFLOP`. Running the script again yields this answer:
{F636655686}
Reviewed By: gdankel
Differential Revision: D29972997
fbshipit-source-id: 0f8d9f264b7d9f8f6bb3f10ab7c2c9794291e28b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61792
This PR adds module hierarchy information to KinetoEvent.
What is the module hierarchy information attached to events?
While profiling a TorchScript module, when events are added we ask the JIT
for the module hierarchy associated with the node being executed. At the
time of execution of that node there may be multiple frames in the
interpreter's stack. For each frame we find the corresponding node and
query its module hierarchy. The module hierarchy of a node is associated
with the node's InlinedCallStack, which tracks the path via which the node
was inlined. Thus, during the inlining process we annotate module
information corresponding to the CallMethod nodes being inlined.
With this PR, the chrome trace will contain additional metadata,
"Module Hierarchy", which can look like this:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward
It contains the module instance, type name, and method name for each frame
in the callstack.
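To make the format concrete, a small helper along these lines (illustrative only, assuming each frame looks like `instance(Type)::method` separated by `.` as in the example above) can split the string back into its components:
```python
def parse_module_hierarchy(hierarchy: str):
    """Split e.g. "TOP(ResNet)::forward.layer1(Sequential)::forward" into
    [("TOP", "ResNet", "forward"), ("layer1", "Sequential", "forward")]."""
    frames = []
    for frame in hierarchy.split("."):
        if "(" not in frame or ")::" not in frame:
            continue
        instance, rest = frame.split("(", 1)
        type_name, method = rest.split(")::", 1)
        frames.append((instance, type_name, method))
    return frames
```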
Test Plan:
test_profiler
Imported from OSS
Reviewed By: raziel, ilia-cher
Differential Revision: D29745442
fbshipit-source-id: dc8dfaf7c5b8ab256ff0b2ef1e5ec265ca366528
Summary:
Report the allocation size of the pointed-to memory, the total allocated memory, and the total reserved size, all in one report.
`ptr` and `alloc_size` will be used for associating with the op trace.
`allocated_size` and `reserved_size` will be used for the memory trace.
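As a rough sketch of how these fields can be consumed downstream (field names here are assumptions, not the profiler's actual schema): `ptr`/`alloc_size` let a post-processing pass attribute allocations and frees to the op that was running, while `allocated_size`/`reserved_size` track the running totals.
```python
def attribute_memory(memory_events, op_intervals):
    """memory_events: records with .t, .ptr, .alloc_size (negative for frees).
    op_intervals: (op_name, start, end) tuples. Returns net bytes per op.
    Illustrative only; the real association happens inside the profiler."""
    net = {}
    for ev in memory_events:
        owner = next(
            (name for name, start, end in op_intervals if start <= ev.t <= end),
            None,
        )
        if owner is not None:
            net[owner] = net.get(owner, 0) + ev.alloc_size
    return net
```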
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61282
Reviewed By: ejguan
Differential Revision: D29796282
Pulled By: chaekit
fbshipit-source-id: 5314c867632d3af1fa9a3811b35eaa5e931a5d87
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58013
Add a test case and a fix (legacy profiler) for empty trace handling
Test Plan: python test/test_profiler.py
Reviewed By: gdankel
Differential Revision: D28345388
Pulled By: ilia-cher
fbshipit-source-id: 4727589ab83367ac8b506cc0f186e5292d974671
Summary:
Add the ability to use the new profiler API even if Kineto is not compiled
in, by falling back to the legacy profiler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57612
Test Plan:
Compiled with
USE_KINETO=0 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake
and with USE_KINETO=1, and ran
python test/test_profiler.py -v
Reviewed By: gdankel
Differential Revision: D28217680
Pulled By: ilia-cher
fbshipit-source-id: ec81fb527eb69bb0a3e0bd6aad13592200d7fe70
Summary:
Expanding support to all builds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56323
Test Plan: CI
Reviewed By: malfet
Differential Revision: D28171478
Pulled By: ilia-cher
fbshipit-source-id: 16bc752d1be3cbaeda5316f5d8a687ae05a83d22
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53174
Enable Kineto also in the CPU builds (non-mobile, non-Windows(atm))
Test Plan: CI
Reviewed By: gdankel
Differential Revision: D26776112
Pulled By: ilia-cher
fbshipit-source-id: 8733f65c2993105136c853f2a7b6e497d0fa53bf
Summary:
The new profiler API was added in PR #48280. This PR adds FLOPS
support to the new profiler API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51734
Test Plan:
```python
python test/test_profiler.py -k test_flops
```
Reviewed By: xuzhao9
Differential Revision: D26261851
Pulled By: ilia-cher
fbshipit-source-id: dbeba4c197e6f51a9a8e640e8bb60ec38df87f73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51421
Mark memory events that did not happen within an operator context
explicitly in the profiler output.
Test Plan: python test/test_profiler.py -k test_memory_profiler
Reviewed By: ngimel
Differential Revision: D26166518
Pulled By: ilia-cher
fbshipit-source-id: 3c14d3ac25a7137733ea7cc65f0eb48693a98f5e
Summary:
Add the FLOPS metric computation to the experimental Kineto profiler.
This includes saving the necessary extra arguments and computing FLOPs in the C++ code,
and extracting the FLOPS value from the Python frontend.
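For reference, the per-op counts reduce to simple formulas over the saved shapes; a hedged sketch of the two common cases (shape conventions assumed here, not copied from the C++ code):
```python
def matmul_flops(m, k, n):
    # One multiply-add per (m, n, k) triple counts as two floating point ops.
    return 2 * m * n * k

def conv2d_flops(batch, c_in, c_out, kh, kw, h_out, w_out):
    # Each output element does kh * kw * c_in multiply-adds.
    return 2 * batch * c_out * h_out * w_out * kh * kw * c_in

# e.g. a 64x128 @ 128x256 matmul: matmul_flops(64, 128, 256) == 4_194_304
```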
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51503
Test Plan:
Build PyTorch with USE_KINETO option, then run the unit test:
```python
python test/test_profiler.py -k test_flops
```
Reviewed By: ilia-cher
Differential Revision: D26202711
Pulled By: xuzhao9
fbshipit-source-id: 7dab7c513f454355a220b72859edb3ccbddcb3ff