
27 Commits

Author SHA1 Message Date
Edward Yang
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
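The `__all__` alternative mentioned above can be sketched as follows. This assumes a hypothetical package `mypkg` whose `__init__` only re-exports names. pyflakes, which flake8 uses for F401, treats names listed in `__all__` as used, so such re-exports would not need `# noqa`. The module is built in memory only to keep the example self-contained.

```python
import sys
import types

# Hypothetical package __init__ source: both imports are re-exports, and
# listing them in __all__ marks them as intentionally public for pyflakes.
INIT_SOURCE = '''
from collections import OrderedDict
from functools import reduce

__all__ = ["OrderedDict", "reduce"]
'''

# Build the module in memory so no files are needed on disk.
mypkg = types.ModuleType("mypkg")
exec(INIT_SOURCE, mypkg.__dict__)
sys.modules["mypkg"] = mypkg

# A star import copies exactly the names listed in __all__.
namespace = {}
exec("from mypkg import *", namespace)
exported = sorted(k for k in namespace if not k.startswith("__"))
print(exported)  # ['OrderedDict', 'reduce']
```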
Edward Yang
d1497debf2 Fix B903 lint: save memory for data classes with slots/namedtuple (#18184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18184
ghimport-source-id: 2ce860b07c58d06dc10cd7e5b97d4ef7c709a50d

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18184 Fix B903 lint: save memory for data classes with slots/namedtuple**
* #18181 Fix B902 lint error: invalid first argument.
* #18178 Fix B006 lint errors: using mutable structure in default argument.
* #18177 Fix lstrip bug revealed by B005 lint

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14530872

fbshipit-source-id: e26cecab3a8545e7638454c28e654e7b82a3c08a
2019-03-21 09:10:30 -07:00
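The B-codes in this stack come from the flake8-bugbear plugin. Here is a small illustration of three of the flagged patterns (B005, B006, B903). All names are made up for the example and are not taken from the PyTorch sources.

```python
from collections import namedtuple

# B005: str.lstrip() removes any leading characters found in the given *set*;
# it is not a prefix remover. Here it also eats the "t" of "tests".
print("torch_tests".lstrip("torch_"))  # ests


def strip_prefix(text, prefix):
    """Remove `prefix` once, and only if `text` starts with it."""
    return text[len(prefix):] if text.startswith(prefix) else text


# B006: the default list is created once and shared by every call.
def collect_bad(item, acc=[]):
    acc.append(item)
    return acc


def collect_good(item, acc=None):
    if acc is None:  # a fresh list for each call
        acc = []
    acc.append(item)
    return acc


# B903: a plain data class carries a per-instance __dict__; __slots__ or a
# namedtuple avoids it, which saves memory per object.
class PointSlots(object):
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


PointTuple = namedtuple("PointTuple", ["x", "y"])

print(strip_prefix("torch_tests", "torch_"))  # tests
collect_bad(1)
print(collect_bad(2))   # [1, 2]: state leaked from the first call
collect_good(1)
print(collect_good(2))  # [2]
print(hasattr(PointSlots(1, 2), "__dict__"), hasattr(PointTuple(1, 2), "__dict__"))  # False False
```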
Adam Paszke
8c3a94eaf2 Improve autograd profiler performance (#11773)
Summary:
To illustrate the benefits of this commit, I'll use the time/iter I got from one of the JIT benchmarks on my machine.

| Run                                          | Time                    |
|----------------------------------------------|-------------------------|
| No profiler                                  | 45ms                    |
| With profiler                                | 56ms                    |
| Use `clock_gettime` instead of `std::chrono` | 48ms                    |
| Touch all pages on block allocation          | 48ms (less jitter)      |
| Use `const char*` instead of `std::string`   | 47ms (even less jitter) |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11773

Differential Revision: D9886858

Pulled By: apaszke

fbshipit-source-id: 58f926f09e95df0b11ec687763a72b06b66991d0
2018-09-19 09:25:43 -07:00
Michael Carilli
0c2648830f Augment emit_nvtx to help connect backward-pass Function apply calls with their corresponding forward pass ops (#10881)
Summary:
Often, we find ourselves looking at some long-running kernel or emit_nvtx range on an nvvp profile and trying to connect it to the offending line in a training script.  If the op is in the forward pass that's easy:  ops are enqueued explicitly from the Python side, so tracking it down with manual nvtx ranges supplemented by the built-in emit_nvtx ranges is straightforward.  If the op is in the backward pass, it's much more difficult.  From the Python side, all you can do is wrap loss.backward() in an nvtx range, and if you also use emit_nvtx, the automatic ranges provide only local information.  Right now, the only consistent way to connect backward-pass kernels to their associated forward-pass lines of Python is to understand your script line by line, and know exactly where in the backward pass you are.

This PR augments the existing nvtx machinery to bridge the gap between forward and backward, allowing connection of backward-pass Function apply calls to the forward-pass operations that required/created those Functions.

The method is simple and surgical.  During the forward pass, when running with emit_nvtx, the nvtx range for each function in VariableType is tagged with the current sequence number.  During the backward pass, the nvtx range associated with each Function's operator() is tagged with that Function's stashed sequence number, which can be compared to "current sequence numbers" from the forward pass to locate the associated op.

Double-backward is not a problem.  If a backward pass with create_graph = True is underway, the relationship between backward and double-backward is conceptually the same as the relationship between forward and backward:  The functions in VariableType still spit out current-sequence-number-tagged ranges, the Function objects they create still stash those sequence numbers, and in the eventual double-backward execution, their operator() ranges are still tagged with the stashed numbers, which can be compared to "current sequence numbers" from the backward pass.

Minor caveats:

- The sequence number is thread-local, and many VariableType functions (specifically, those without a derivative explicitly defined in derivatives.yaml) don't create an associated function object (instead delegating that to sub-functions further down the call chain, perhaps called from within at::native functions that route back through VariableType by calling at::function_name).  So the correspondence of stashed sequence numbers in Function operator() ranges with numbers in forward-pass ranges is not guaranteed to be 1 to 1.  However, it's still a vast improvement over the current situation, and I don't think this issue should be a blocker.
- Feel free to litigate my use of stringstream in profiler.cpp.  I did it because it was easy and clean.  If that's too big a hammer, let's figure out something more lightweight.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10881

Differential Revision: D9833371

Pulled By: apaszke

fbshipit-source-id: 1844f2e697117880ef5e31394e36e801d1de6088
2018-09-14 11:56:55 -07:00
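Below is a toy model, in plain Python, of the sequence-number bookkeeping described in this commit. `forward_op`, `BackwardNode` and the `ranges` list are hypothetical stand-ins for VariableType functions, autograd Function objects and NVTX ranges, and no CUDA or NVTX calls are made. Every forward op here creates exactly one backward node, so the matching is one-to-one. The commit notes that this is not guaranteed in the real system.

```python
import threading
from itertools import count

_tls = threading.local()  # the sequence counter is per thread


def next_sequence_nr():
    if not hasattr(_tls, "counter"):
        _tls.counter = count()
    return next(_tls.counter)


ranges = []  # (label, seq) pairs standing in for NVTX range names


class BackwardNode(object):
    def __init__(self, name, seq):
        self.name = name
        self.seq = seq  # stashed when the forward op creates this node

    def apply(self):
        # Backward ranges are tagged with the *stashed* number.
        ranges.append(("backward:" + self.name, self.seq))


def forward_op(name):
    seq = next_sequence_nr()
    ranges.append(("forward:" + name, seq))  # tagged with the current number
    return BackwardNode(name + "Backward", seq)


nodes = [forward_op("mul"), forward_op("add"), forward_op("sum")]
for node in reversed(nodes):  # backward visits nodes in reverse order
    node.apply()

# Connect each backward range to the forward range with the same number.
forward_by_seq = {seq: label for label, seq in ranges if label.startswith("forward:")}
matches = [(label, forward_by_seq[seq]) for label, seq in ranges if label.startswith("backward:")]
for backward_label, forward_label in matches:
    print(backward_label, "<-", forward_label)
```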
Richard Zou
4e446b85fb Make profiler.build_table() O(n) rather than O(n^2) (#10969)
Summary:
Fixes #10851

Speeds up profiling results dramatically.

For the following script:
```python
import torch
import time

ITER = 2000

x = torch.randn(1, 1, requires_grad=True)

with torch.autograd.profiler.profile() as prof:
    y = x
    for i in range(ITER):
        y = 3 * y - 2 * y
    y.backward()

start = time.time()
print("Done running. Preparing prof")
x = str(prof)
print("Done preparing prof results")
end = time.time()
print("Elapsed: {}".format(end - start))
```

I get 7s before / 0.13s after these changes.

cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10969

Differential Revision: D9556129

Pulled By: zou3519

fbshipit-source-id: 26b421686f8a42cdaace6382567d403e6385dc12
2018-08-29 12:25:51 -07:00
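The commit message does not say which step was quadratic. A common source in table-building code is repeated string concatenation, where each `+=` may copy the whole string built so far. CPython sometimes optimizes this in place, but not reliably. The sketch below shows the linear alternative: collect the pieces in a list and join them once. The `build_table` signature and row format are hypothetical.

```python
def build_table(rows, header=("Name", "CPU time")):
    """Render (name, time_str) rows as fixed-width text, in linear time."""
    name_width = max([len(header[0])] + [len(name) for name, _ in rows])
    row_format = "{:<" + str(name_width) + "}  {:>10}"
    result = []  # collect pieces and join once, instead of result += line

    def append(line):
        result.append(line)
        result.append("\n")

    append(row_format.format(*header))
    append("-" * (name_width + 12))
    for name, time_str in rows:
        append(row_format.format(name, time_str))
    return "".join(result)


print(build_table([("mul", "1.2us"), ("add", "0.8us")]))
```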
Ryan Brigden
8f421159fd Fix profiler crash when no events register (#8034)
* Fix profiler crash when no events register

When trying to profile, attempting to print the event table throws a vague error because the event list is empty:

....
max_name_length = max(len(evt.key) for evt in events)
ValueError: max() arg is an empty sequence

This change fixes the error by returning an empty string.

* Update profiler.py
2018-06-01 14:38:24 -04:00
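Here is a minimal sketch of the guard described above. `Event` and `format_events` are hypothetical stand-ins, not the profiler's own classes. The early return avoids calling `max()` on an empty generator, which raises `ValueError`.

```python
from collections import namedtuple

Event = namedtuple("Event", ["key", "cpu_time_us"])  # hypothetical event record


def format_events(events):
    """Return a text table of events, or '' when nothing was recorded."""
    if len(events) == 0:
        return ""  # max() below would raise ValueError on an empty sequence
    max_name_length = max(len(evt.key) for evt in events)
    lines = []
    for evt in events:
        lines.append("{}  {:>8.1f}".format(evt.key.ljust(max_name_length), evt.cpu_time_us))
    return "\n".join(lines)


print(repr(format_events([])))  # ''
print(format_events([Event("mul", 1.5), Event("addmm", 12.0)]))
```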
Maxim Berman
03767b66db Add FileNotFoundError to torch._six (#7524)
Add FileNotFoundError for compatibility with Python 2 and use in
dataloader. Fixes pytorch/pytorch#6932
2018-05-12 20:54:26 -04:00
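Below is a sketch of what such a compatibility alias can look like. It is not copied from `torch._six`, and `read_or_default` is a hypothetical caller. On Python 2, a missing file raises `IOError`, which is broader than Python 3's `FileNotFoundError`, so the alias also catches other I/O failures there.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    FileNotFoundError = IOError  # Python 2 has no FileNotFoundError
else:
    import builtins
    FileNotFoundError = builtins.FileNotFoundError


def read_or_default(path, default=""):
    """Hypothetical caller: return file contents, or `default` if missing."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return default


print(read_or_default("/nonexistent_dir_for_example/missing.txt", "fallback"))
```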
Soumith Chintala
0016dad841 [pytorch] minor fixes around binary builds (#6291)
* remove patch

* check that cuda dev environment is also present before running cpp_extension cuda tests

* add OSError to list of exceptions when c++filt is not found
2018-04-04 22:37:13 -04:00
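This sketch shows why `OSError` belongs in that exception list. When `c++filt` is not installed, the subprocess call fails with `FileNotFoundError`, a subclass of `OSError`, rather than `CalledProcessError`. The `tool` parameter is a hypothetical addition used here only to simulate a missing binary.

```python
import subprocess


def demangle(name, tool="c++filt"):
    """Demangle a C++ symbol with c++filt, falling back to the raw name."""
    try:
        output = subprocess.check_output([tool, "-n", name])
        return output.decode("ascii").strip()
    except (subprocess.CalledProcessError, OSError):
        # OSError covers the case where the demangler executable is not found.
        return name


print(demangle("not_a_mangled_name"))  # unchanged either way
print(demangle("_Z3foov", tool="no-such-demangler-binary"))  # falls back: _Z3foov
```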
Richard Zou
1449c9f754 Update autograd docs (#5907)
* Update autograd docs

* Deprecate 'grad_variables' in backward().

Advise to replace with 'grad_tensors'.

* Resolve saved_variables/saved_tensors

* Tensor section

* Address comments

* Address comments

* Address comments
2018-03-30 15:33:11 -04:00
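Below is a sketch of one way to deprecate a keyword such as `grad_variables` in favor of `grad_tensors`: accept the old name, warn, and forward its value. `resolve_grad_tensors` is a hypothetical helper, not the signature of `torch.autograd.backward`.

```python
import warnings


def resolve_grad_tensors(grad_tensors=None, grad_variables=None):
    """Hypothetical helper: map the deprecated keyword onto the new one."""
    if grad_variables is not None:
        warnings.warn("'grad_variables' is deprecated. Use 'grad_tensors' instead.",
                      DeprecationWarning)
        if grad_tensors is not None:
            raise RuntimeError("'grad_tensors' and 'grad_variables' (deprecated) were "
                               "both passed. Please only use 'grad_tensors'.")
        grad_tensors = grad_variables
    return grad_tensors


with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    print(resolve_grad_tensors(grad_variables=[1.0]))  # [1.0]
```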
li-roy
d776c52ff7 Fix nvprof parsing (#5840) 2018-03-17 10:38:57 -04:00
Teng Li
e979b7c940 Removed redundant import re (#4826) 2018-01-23 23:43:28 -05:00
peterjc123
23dc8acbc8 Fix missing import and enable test for profiler on Windows (#4522)
* Fix missing import and enable test for profiler on Windows

* Skip process when executable is not found
2018-01-23 21:30:42 -05:00
Kaiyu Shi
c650c73cbc Extract the finish check for profiler (#4519)
* Extract the finish check for profiler

Delete unused import and rearrange the import order.

* Add imports for win support
2018-01-08 07:54:55 -05:00
peterjc123
e5f25421ae Implement demangle in Windows (#4515) 2018-01-07 05:35:10 -05:00
Atabak Dehban
ab80c27b47 Fix undefined FileNotFoundError (#4384) 2017-12-28 20:32:49 +01:00
Konstantin Lopuhin
f01052ade4 Use enabled in torch.autograd.profiler.emit_nvtx (#4032)
Or else it's always enabled.
2017-12-05 08:45:23 -08:00
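The fix amounts to checking the flag in both `__enter__` and `__exit__`. This sketch uses a hypothetical `annotate_ranges` class in place of `emit_nvtx`. The `log` list stands in for the profiler being switched on and off, and no NVTX calls are made.

```python
class annotate_ranges(object):
    """Hypothetical stand-in for an NVTX-emitting context manager."""

    def __init__(self, enabled=True):
        self.enabled = enabled
        self.entered = False
        self.log = []

    def __enter__(self):
        if not self.enabled:
            return self  # honour enabled=False: do no work at all
        if self.entered:
            raise RuntimeError("context manager is not reentrant")
        self.entered = True
        self.log.append("start profiling")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if not self.enabled:
            return False
        self.log.append("stop profiling")
        self.entered = False
        return False  # never swallow exceptions


with annotate_ranges(enabled=False) as off:
    pass
with annotate_ranges(enabled=True) as on:
    pass
print(off.log, on.log)  # [] ['start profiling', 'stop profiling']
```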
Zachary DeVito
c25a1493cd CUDA mode profiler fixes (#3754)
* CUDA mode profiler fixes

* Enable multi-gpu CUDA tracing

We need to record per-device start events because event timing
comparison only works for events on the same device.

* Coarse-grained CPU-CUDA syncing of timelines
  Record a __cuda_start event used to synchronize cuda/gpu timings.
  This requires running some warm-up event records to ensure the
  call to event record for the __cuda_start event doesn't take
  longer than normal.

fix syncing

* fix cuda build and lint
2017-11-28 09:32:34 -05:00
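The per-device alignment described above can be sketched with plain numbers. The sketch assumes that each device's start event was recorded at a known CPU timestamp, and that each GPU event's elapsed time, in milliseconds, is measured against the start event on the *same* device, since cross-device event timing is not comparable. All timestamps and event names are hypothetical.

```python
def align_to_cpu_timeline(gpu_events, device_start_us):
    """Place (name, device, elapsed_ms_since_device_start) events on the CPU timeline.

    device_start_us maps each device to the CPU time (us) at which its start
    event was recorded; times are only compared against that same device.
    """
    aligned = []
    for name, device, elapsed_ms in gpu_events:
        aligned.append((name, device_start_us[device] + elapsed_ms * 1000.0))
    return sorted(aligned, key=lambda pair: pair[1])


starts = {0: 100.0, 1: 130.0}  # hypothetical CPU timestamps (us) per device
events = [("gemm", 0, 0.5), ("copy", 1, 0.25), ("relu", 0, 0.125)]
print(align_to_cpu_timeline(events, starts))
# [('relu', 225.0), ('copy', 380.0), ('gemm', 600.0)]
```

The alignment is only as accurate as the CPU timestamp of each start event. This is why the commit runs warm-up event records, so that recording the start event is not unusually slow.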
folz
ca3fc59a9a fix elapsed_us spelling 2017-11-18 18:28:27 +01:00
Zachary DeVito
cc7f09a372 Add cudaEvent support to the profiler (#3734)
* Add cudaEvent support to the profiler

This adds the ability to record cuda timings using cudaEventRecord
in the profiler. Since it doesn't require nvprof it is easier
to run than the nvprof path.

This also records a thread id for each event, which will make
tracing results easier to understand

* Add flow arrows from cpu to cuda event

* Fix no cuda build

* Review comments

* Move CUDA checks to one place
2017-11-16 13:58:09 -08:00
Adam Paszke
02450fff38 Expand autograd profiler docs (#3621) 2017-11-10 08:58:45 -05:00
Ozan Çağlayan
dd6d04ddf2 doc: Normalize all true/false in docstrings to `True|False` (#3593)
* doc: Normalize all true/false in docstrings to ``True|False``

This makes them more apparent in the documentation.

* doc: fix flake8
2017-11-09 08:12:29 -05:00
Adam Paszke
22f596572c Add torch.autograd.profiler.range 2017-11-06 19:42:44 -05:00
Alexander Miller
837f933cac remove 'path' from key_averages header
path appears to be unused
2017-10-25 21:34:59 +02:00
Adam Paszke
76abc06b1f Fix nvprof mode in autograd profiler 2017-10-20 10:22:54 -04:00
Edward Z. Yang
191224b6e6 Suggest key_averages by default, it's more useful.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-13 01:31:22 +02:00
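Averages are more useful than a raw trace because they collapse many calls to the same op into one row. The grouping idea is sketched below on plain `(key, cpu_time_us)` tuples. This `key_averages` function is a hypothetical simplification; the profiler's own method returns richer event objects.

```python
from collections import OrderedDict


def key_averages(events):
    """Group (key, cpu_time_us) events by key: count, total and mean time."""
    stats = OrderedDict()  # keep first-seen order of keys
    for key, cpu_time_us in events:
        calls, total = stats.get(key, (0, 0.0))
        stats[key] = (calls + 1, total + cpu_time_us)
    return [(key, calls, total, total / calls) for key, (calls, total) in stats.items()]


trace = [("mul", 2.0), ("add", 1.0), ("mul", 4.0)]
for row in key_averages(trace):
    print(row)
# ('mul', 2, 6.0, 3.0)
# ('add', 1, 1.0, 1.0)
```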
Adam Paszke
6fbbb1bc4e Limit number of demangler invocations in autograd profiler 2017-10-03 09:55:37 -04:00
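The commit title does not say how the invocations were limited. One straightforward approach, assumed here, is to memoize the demangler so that each distinct symbol spawns at most one process. `slow_demangle` is a hypothetical stand-in for the `c++filt` call. A plain dict is used so the sketch also works on Python 2, where `functools.lru_cache` is unavailable.

```python
_demangle_cache = {}
tool_invocations = []  # records how often the expensive path runs


def slow_demangle(name):
    """Hypothetical stand-in for spawning c++filt once per symbol."""
    tool_invocations.append(name)
    return "demangled(" + name + ")"


def demangle_cached(name):
    """Run the slow demangler at most once per distinct symbol."""
    if name not in _demangle_cache:
        _demangle_cache[name] = slow_demangle(name)
    return _demangle_cache[name]


symbols = ["_Z3foov", "_Z3barv", "_Z3foov", "_Z3foov"]
names = [demangle_cached(s) for s in symbols]
print(len(tool_invocations))  # 2
```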
Adam Paszke
411e1469e0 Add tools for autograd profiling 2017-09-25 23:21:30 -04:00