Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to `noqa` a number of imports in `__init__.py` files. In principle
we're supposed to use `__all__` in this case, but I was too lazy
to fix it; left for future work.
Be careful! flake8-2 and flake8-3 behave differently with
respect to import resolution for `# type:` comments: flake8-3 will
report such an import as unused; flake8-2 will not. For now, I just
noqa'd all these sites.
All the changes were done by hand.
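For reference, a minimal sketch of the patterns described above, with made-up module and names (nothing here is taken from the actual diff):
```
# Hypothetical mypkg/__init__.py (names are placeholders).

# Option 1: keep the re-export and silence F401 explicitly.
from collections import OrderedDict  # noqa: F401

# Option 2: list the name in __all__; pyflakes then treats it as used,
# and the public re-exports are documented at the same time.
__all__ = ["OrderedDict"]

# An import referenced only from a "# type:" comment: flake8 under
# Python 3 reports it as unused, flake8 under Python 2 does not,
# hence the noqa here as well.
from typing import List  # noqa: F401


def total(xs):
    # type: (List[int]) -> int
    return sum(xs)
```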
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
Summary:
To illustrate the benefits of this commit, I'll use the time/iter I got from one of the JIT benchmarks on my machine.
| Run | Time per iteration |
|----------------------------------------------|-------------------------|
| No profiler | 45ms |
| With profiler | 56ms |
| Use `clock_gettime` instead of `std::chrono` | 48ms |
| Touch all pages on block allocation | 48ms (less jitter) |
| Use `const char*` instead of `std::string` | 47ms (even less jitter) |
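For context, a rough sketch of the kind of before/after measurement in the table, using a toy scripted function and made-up sizes rather than the actual JIT benchmark:
```
import time

import torch


@torch.jit.script
def step(x):
    # Toy stand-in for the real JIT benchmark workload.
    for i in range(100):
        x = 3 * x - 2 * x
    return x


def time_per_iter(n=200):
    x = torch.randn(100, 100)
    start = time.time()
    for i in range(n):
        step(x)
    return (time.time() - start) / n


# Time the loop once on its own, then again under the profiler to see
# how much overhead the profiler adds per iteration.
baseline = time_per_iter()
with torch.autograd.profiler.profile() as prof:
    profiled = time_per_iter()
print("no profiler: {:.2f} ms/iter, with profiler: {:.2f} ms/iter".format(
    baseline * 1e3, profiled * 1e3))
```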
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11773
Differential Revision: D9886858
Pulled By: apaszke
fbshipit-source-id: 58f926f09e95df0b11ec687763a72b06b66991d0
Summary:
Often, we find ourselves looking at some long-running kernel or emit_nvtx range on an nvvp profile and trying to connect it to the offending line in a training script. If the op is in the forward pass, that's easy: ops are enqueued explicitly from the Python side, so tracking it down with manual nvtx ranges supplemented by the built-in emit_nvtx ranges is straightforward. If the op is in the backward pass, it's much more difficult. From the Python side, all you can do is wrap loss.backward() in an nvtx range, and if you also use emit_nvtx, the automatic ranges provide only local information. Right now, the only consistent way to connect backward-pass kernels to their associated forward-pass lines of Python is to understand your script line by line and know exactly where in the backward pass you are.
This PR augments the existing nvtx machinery to bridge the gap between forward and backward, allowing connection of backward-pass Function apply calls to the forward-pass operations that required/created those Functions.
The method is simple and surgical. During the forward pass, when running with emit_nvtx, the nvtx range for each function in VariableType is tagged with the current sequence number. During the backward pass, the nvtx range associated with each Function's operator() is tagged with that Function's stashed sequence number, which can be compared to "current sequence numbers" from the forward pass to locate the associated op.
Double-backward is not a problem. If a backward pass with create_graph = True is underway, the relationship between backward and double-backward is conceptually the same as the relationship between forward and backward: The functions in VariableType still spit out current-sequence-number-tagged ranges, the Function objects they create still stash those sequence numbers, and in the eventual double-backward execution, their operator() ranges are still tagged with the stashed numbers, which can be compared to "current sequence numbers" from the backward pass.
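As a usage sketch (the model and input below are placeholders, not part of this PR): run the script under nvprof/nvvp with emit_nvtx, then match the sequence number stashed in a backward-pass range against the forward-pass range tagged with the same number.
```
import torch

# Placeholder workload; any CUDA model and input will do.
model = torch.nn.Linear(64, 64).cuda()
inp = torch.randn(32, 64, device="cuda")

# Run under e.g. `nvprof --profile-from-start off -o trace.prof python script.py`
# so the emitted nvtx ranges show up in the nvvp timeline.
with torch.cuda.profiler.profile():
    with torch.autograd.profiler.emit_nvtx():
        out = model(inp)       # forward ranges tagged with current seq numbers
        out.sum().backward()   # backward ranges tagged with stashed seq numbers
```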
Minor caveats:
- The sequence number is thread-local, and many VariableType functions (specifically, those without a derivative explicitly defined in derivatives.yaml) don't create an associated function object (instead delegating that to sub-functions further down the call chain, perhaps called from within at::native functions that route back through VariableType by calling at::function_name). So the correspondence of stashed sequence numbers in Function operator() ranges with numbers in forward-pass ranges is not guaranteed to be 1 to 1. However, it's still a vast improvement over the current situation, and I don't think this issue should be a blocker.
- Feel free to litigate my use of stringstream in profiler.cpp. I did it because it was easy and clean. If that's too big a hammer, let's figure out something more lightweight.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10881
Differential Revision: D9833371
Pulled By: apaszke
fbshipit-source-id: 1844f2e697117880ef5e31394e36e801d1de6088
Summary:
Fixes #10851
Dramatically speeds up formatting of profiler results.
For the following script:
```
import torch
import time
ITER = 2000
x = torch.randn(1, 1, requires_grad=True)
with torch.autograd.profiler.profile() as prof:
    y = x
    for i in range(ITER):
        y = 3 * y - 2 * y
    y.backward()
start = time.time()
print("Done running. Preparing prof")
x = str(prof)
print("Done preparing prof results")
end = time.time()
print("Elapsed: {}".format(end - start))
```
I get 7s before / 0.13s after these changes.
cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10969
Differential Revision: D9556129
Pulled By: zou3519
fbshipit-source-id: 26b421686f8a42cdaace6382567d403e6385dc12
* Fix profiler crash when no events are registered
When no events have been recorded, attempting to print the event table throws a vague error because the event list is empty:
....
max_name_length = max(len(evt.key) for evt in events)
ValueError: max() arg is an empty sequence
This change fixes the error by returning an empty string when the event list is empty.
* Update profiler.py
* remove patch
* check that cuda dev environment is also present before running cpp_extension cuda tests
* add OSError to list of exceptions when c++filt is not found
* CUDA mode profiler fixes
* Enable multi-GPU CUDA tracing
We need to record per-device start events because event timing
comparison only works for events on the same device.
* Coarse-grained CPU-CUDA syncing of timelines
Record a __cuda_start event used to synchronize CPU and CUDA timings.
This requires running some warm-up event records to ensure that the
event record call for the __cuda_start event doesn't take longer than normal.
fix syncing
* fix cuda build and lint
* Add cudaEvent support to the profiler
This adds the ability to record CUDA timings using cudaEventRecord
in the profiler. Since it doesn't require nvprof, it is easier
to run than the nvprof path.
This also records a thread id for each event, which makes
tracing results easier to understand. A usage sketch follows after this list.
* Add flow arrows from CPU to CUDA events
* Fix the no-CUDA build
* Address review comments
* Move CUDA checks to one place
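For reference, a rough usage sketch of the cudaEvent-based path referenced from the "Add cudaEvent support to the profiler" item above; it assumes the `use_cuda=True` flag of `torch.autograd.profiler.profile`, and the workload is a placeholder:
```
import torch

x = torch.randn(128, 128, device="cuda", requires_grad=True)

# use_cuda=True asks the profiler to time ops with CUDA events as well,
# so no external nvprof run is needed.
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    y = (x @ x).sum()
    y.backward()

# The printed table should now include CUDA timings alongside CPU timings.
print(prof)
```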