pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-08 07:39:33 +01:00

Author	SHA1	Message	Date
Edward Yang	517c7c9861	Canonicalize all includes in PyTorch. (#14849)

Summary: Anywhere we used #include "foo.h", we now say #include <foo.h>
Paths are adjusted to be rooted out of aten/src, torch/lib, or the root level directory.

I modified CMakeLists.txt by hand to remove TH and THC from the include paths.

I used the following script to do the canonicalization:

```
import subprocess
import re
import os.path

files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
for fn in files:
    if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
        continue
    if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
        continue
    with open(fn, 'r') as f:
        c = f.read()
    def fmt(p):
        return "#include <{}>".format(p)
    def repl(m):
        p = m.group(1)
        if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
            return fmt(p)
        if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
            return fmt(p)
        for root in ["aten/src", "torch/lib", ""]:
            for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
                new_p = os.path.relpath(os.path.join(bad_root, p), root)
                if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
                    return fmt(new_p)
        print("ERROR: ", fn, p)
        return m.group(0)
    new_c = re.sub(r'#include "([^"]+)"', repl, c)
    if new_c != c:
        print(fn)
        with open(fn, 'w') as f:
            f.write(new_c)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849
Reviewed By: dzhulgakov
Differential Revision: D13363445
Pulled By: ezyang
fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68	2018-12-08 19:38:30 -08:00
Adam Paszke	31b3d81714	Broadcast prim::FusedConcat inputs independently when checking kernels (#14503) Summary: Fixes #14483. cc zou3519 mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/14503 Differential Revision: D13256343 Pulled By: zou3519 fbshipit-source-id: 1c68a23f425be067a742bada7ee8cdfab7fc3fa2	2018-11-29 13:05:00 -08:00
Thomas Viehmann	8408dff55a	Add Type support to the fuser, fuse more (#14336) Summary: This adds scalar type support to the fuser, both internally (instead of auto / assuming float) and for the inputs/outputs. We can now fuse things with input / output of arbitrary scalar type, in particular comparisons and where work well. So it fixes #13384 by returning the right type tensor (and adds a test where byte and double tensors are returned). The type inference is done by re-calling PropagateTensorShapeOnNode in the compilation, I would venture that it isn't prohibitively expensive compared to the actual compilation. (Propagation was fixed for where to return the second argument's type and amended to handle FusedConcat.) I'm not sure how to add a check for the code generated by the fuser, but I am not sure we absolutely need to (we'd see if it is invalid / produces wrong results). Thanks in particular to apaszke, fmassa, mruberry for advice and encouragement! All the errors are my own. I have discussed order of PRs briefly with mruberry, if this goes in before he submits the PR, he graciously agreed to rebasing his, but I'd happily rebase, too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14336 Differential Revision: D13202620 Pulled By: soumith fbshipit-source-id: 855159e261fa15f21aca3053bfc05fb3f720a8ef	2018-11-27 11:33:11 -08:00
Michael Suo	33d091f432	shape analysis fix (#14325) Summary: This PR is deceptively large because of an indenting change. The actual change is small; I will highlight it inline Pull Request resolved: https://github.com/pytorch/pytorch/pull/14325 Differential Revision: D13183296 Pulled By: suo fbshipit-source-id: fcbf6d5317954694ec83e6b8cc1c989f2d8ac298	2018-11-23 11:24:24 -08:00
Thomas Viehmann	1f871f126f	Have PYTORCH_FUSION_DEBUG print C kernel source (#14213)

Summary:
- Move up handling the environment variable from CPU only to all
- Introduce two levels to be enabled with PYTORCH_FUSION_DEBUG=n:
  1: print C source
  2: print CPU assembly, too (previous effect of PYTORCH_FUSION_DEBUG)

apaszke

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14213
Differential Revision: D13135393
Pulled By: soumith
fbshipit-source-id: befa4ebea3b3c97e471393a9f6402b93a6b24031	2018-11-20 12:45:07 -08:00
ArmenAg	751b5ea941	use at::Device throughout JIT (#14181) Summary: zdevito soumith Sorry about the previous PR, had some git issues. This is the same exact code as the previous PR but updated w.r.t pytorch/master. fixes #13254 Pull Request resolved: https://github.com/pytorch/pytorch/pull/14181 Differential Revision: D13117688 Pulled By: soumith fbshipit-source-id: 044840b2c7a0101ef43dd16655fd9a0f9981f53f	2018-11-19 09:21:57 -08:00
mruberry	6fe089c6ea	Hierarchical device independent -> device specific architecture (#13108)

Summary: This PR principally redesigns the fuser's logical flow to be hierarchical, with device-independent logic directing (relatively little) device-specific logic. This design is based on reviews of XLA, TVM, internal design review at NVIDIA and discussions with fuser owners at Facebook. To further vet the design I have begun developing the next significant PR (extended fusion logic) on top of this architecture and it has made the work significantly easier. This PR also improves fuser modularity, which should make it easier for others to contribute to. Unfortunately, this PR is large and its nature has made breaking it into smaller pieces challenging. Future PRs should be smaller.

The fusion flow is now:
- Fusions are "registered" and "upfront compilation" occurs. The fusion specifications, which include the graph, go into a thread-safe device-independent cache. Upfront compilation generates some information used later during shape inference.
- Fusions are run, which passes them to an executor that performs shape inference, requests an instantiated fusion from the specification's thread-safe store, and launches them. Launch logic eventually defers to device-specific logic.
- Fusions not previously instantiated are compiled. Compilation is device-specific and arg-specific. Compilation logic eventually defers to device-specific logic.
- If the fusion could not be run because fusion on the requested device is disabled or shape inference fails, a fallback is invoked.

This flow can be thought of as PyTorch IR -> Device-Independent Fusion Logic -> Device-Specific Fusion Logic. The current upstream logic is, by contrast, PyTorch IR -> Device-Specific Logic -> Device-Independent Logic, which results in needless code duplication and lack of conceptual clarity. That was my mistake when splitting the fuser off from the rest of the jit and our reviews since then have been incredibly helpful in understanding why the approach in this PR is better.

This PR does not only move code around. It also fixes a couple of bugs and makes some logical/code changes.

Bug fixes:
- thread-safety is improved with caches preventing concurrent access
- the nvrtc version is now reviewed to determine the appropriate compute architecture to compile for, fixing a bug that would cause runtime errors if a user's nvrtc didn't support the compute architecture their gpu reported
- an issue with DeviceGuard not setting the device properly and failing silently is worked around (ezyang mentioned he was reviewing the dynamic registration DeviceGuard uses, which may resolve the issue)

Code/Logical changes:
- "const" now appears many more places (note: I cast const away in operator.h because of some obscure build issues -- I think we should be able to fix this and will take a look while this goes through testing)
- The new flow allowed some redundant code to be removed (AnnotatedGraph is gone, for example, and the more straightforward flow eliminated duplication of effort elsewhere)
- Fallback logic is now also invoked if a fusion is requested on a device that cannot handle fusions
- Use of macros to determine which files are compiled is reduced (though they may come back if the Windows build is unhappy)
- There is no more "common" code or folder, the device-independent logic being at the forefront of the fuser replaces and improves upon the goal of sharing code

apaszke who I promised naming rights to
zdevito who correctly pointed out that the device-independent logic should be the bulk of what the fuser is doing
ngimel who contributed to the design of this architecture

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13108
Reviewed By: gchanan, fmassa
Differential Revision: D12850608
Pulled By: soumith
fbshipit-source-id: 24e2df6dfa97591ee36aeca8944519678c301fa3	2018-10-31 18:13:00 -07:00
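
The fusion flow described in the last entry (#13108) can be illustrated with a toy Python model: register a spec in a device-independent cache, run it through shape inference, compile lazily per device and argument shape, and fall back when fusion is unavailable. This is only a sketch of the control flow as the commit message describes it, not the fuser's C++ code. Every name here (`FusionSpec`, `Fuser`, `register`, `run`, `compile_count`, `add`) is hypothetical, and the "shape inference" and "compilation" steps are stand-ins.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical key for an instantiated fusion: (device, inferred shape).
ArgSpec = Tuple[str, Tuple[int, ...]]


@dataclass
class FusionSpec:
    """Device-independent description of a registered fusion."""
    graph: Callable[..., List[float]]  # stands in for the fused graph
    kernels: Dict[ArgSpec, Callable[..., List[float]]] = field(default_factory=dict)
    lock: threading.Lock = field(default_factory=threading.Lock)


class Fuser:
    def __init__(self, enabled_devices):
        self._specs: Dict[int, FusionSpec] = {}
        self._lock = threading.Lock()      # guards the spec cache
        self._enabled = set(enabled_devices)
        self.compile_count = 0             # only here to make caching visible

    def register(self, graph) -> int:
        """Store the spec in the device-independent cache and return its key."""
        with self._lock:
            key = len(self._specs)
            self._specs[key] = FusionSpec(graph)
            return key

    @staticmethod
    def _infer_shape(inputs) -> Optional[Tuple[int, ...]]:
        # Toy shape inference: succeeds only if all inputs have equal length.
        lengths = {len(x) for x in inputs}
        return (lengths.pop(),) if len(lengths) == 1 else None

    def _compile(self, spec: FusionSpec, arg_spec: ArgSpec):
        # A real fuser would defer to device-specific code generation here.
        self.compile_count += 1
        return spec.graph

    def run(self, key: int, device: str, inputs):
        with self._lock:
            spec = self._specs[key]
        shape = self._infer_shape(inputs)
        if device not in self._enabled or shape is None:
            # Fusion disabled on this device or shape inference failed.
            return ("fallback", spec.graph(*inputs))
        arg_spec = (device, shape)
        with spec.lock:
            kernel = spec.kernels.get(arg_spec)
            if kernel is None:  # not previously instantiated: compile once
                kernel = spec.kernels[arg_spec] = self._compile(spec, arg_spec)
        return ("fused", kernel(*inputs))


def add(a, b):
    """Elementwise add, used as the example 'graph'."""
    return [x + y for x, y in zip(a, b)]
```

Calling `register(add)` once and then `run(key, "cpu", [[1.0, 2.0], [3.0, 4.0]])` twice on a `Fuser` with `"cpu"` enabled takes the fused path both times and compiles only once. A request for a device that is not enabled, or inputs of mismatched length, takes the fallback path.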

7 Commits