Mirror of https://github.com/zebrajr/pytorch.git, synced 2025-12-06 12:20:52 +01:00

8 Commits

1c46a32b67
Minor typing improvements (#91068)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91068
Approved by: https://github.com/Skylion007, https://github.com/soumith

b4799736ee
autograd: fix non-deterministic output in codegen comments (#84695)
Summary: Like it says in the title. Currently, the codegen embeds an absolute template path in the generated-file comment. In Buck1, that's OK because Buck1's caching doesn't really care too much about it. However, in Buck2, this is a disaster, because caching is based exclusively on inputs and outputs.

The diff here proposes making the path relative to the codegen script itself, which should carry about as much info, but avoid cache misses.

Concretely, this:

```
// generated from /dev/shm/uid-34135/cfbc5712-seed-nspid4026533424_cgpid2794673-ns-4026533443/tools/autograd/templates/python_functions.h
```

Becomes this:

```
// generated from ../tools/autograd/templates/python_functions.h
```

So, we keep the useful part, and we get caching.

This matters because those headers are used in actions like:

```
fbcode//deeplearning/fbgemm/fbgemm_gpu/codegen:embedding_ops -- action (cxx_compile gen_embedding_backward_adam_split_unweighted_cuda.cu (pic))
```

Those actions take upwards of 5 minutes to finish, so by allowing a cache hit, we are a) saving our users a lot of time and b) saving some RE capacity as well.

This actually matters a lot because right now those targets are produced by `//caffe2:generate-code`, which itself doesn't get cache hits from RE because `generate_code.par` is non-deterministic (this is, unfortunately, true of PARs in general), so that rule introduces non-determinism that the codegen propagates and we get zero caching.

This diff doesn't fix `//caffe2:generate-code`'s inputs being non-deterministic, but it does fix its *outputs* being non-deterministic, which means the non-determinism stops there, and we get back to cache hits.

Test Plan:
- CI

```
buck2 build fbcode//caffe2:generate-code
buck2 build fbcode//deeplearning/fbgemm/fbgemm_gpu/codegen:embedding_ops
```

Reviewed By: ndmitchell

Differential Revision: D39348565

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84695
Approved by: https://github.com/soulitzer
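As a rough illustration of the approach described in this entry (hypothetical code, not the actual autograd codegen change; the function and paths below are made up), relativizing the template path against the codegen script's directory keeps the comment stable across checkouts:

```python
# Hypothetical sketch: produce a "generated from ..." comment whose path does
# not depend on where the repository happens to be checked out.
import os

def generated_from_comment(template_abs_path: str, codegen_script_path: str) -> str:
    # Resolve the directory that contains the codegen script, then express the
    # template path relative to it, so the comment is machine-independent.
    codegen_dir = os.path.dirname(os.path.abspath(codegen_script_path))
    rel = os.path.relpath(template_abs_path, codegen_dir)
    return f"// generated from {rel}"

# Example: an absolute path collapses to a stable relative one.
print(generated_from_comment(
    "/repo/tools/autograd/templates/python_functions.h",
    "/repo/tools/autograd/gen_autograd.py",
))  # -> "// generated from templates/python_functions.h"
```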

406ce692ca
[torchgen] Generate wrapper functions under custom namespaces (#81744)
Summary: A follow up of #81581. Before these 2 PRs, if an operator with a custom kernel namespace was added to `native_functions.yaml` (or any other yaml consumed by `torchgen`), we were able to recognize the custom kernel in files such as `NativeFunctions.h` and `RegisterCPU.cpp`, but we still generated the backend-specific wrappers under the hardcoded `at` namespace. This PR changes that behavior by generating wrapper functions under custom namespaces.

For example, if the entries in the yaml file look like:

```
- func: op_1(Tensor(a) self) -> Tensor(a)
  dispatch:
    CPU: at::op_1_kernel # ATen kernel

- func: op_2(Tensor(a) self) -> Tensor(a)
  dispatch:
    CPU: custom::op_2_kernel # custom kernel
```

We generate the following code for `CPUFunctions_inl.h` and `RegisterCPU.cpp`:

`CPUFunctions_inl.h`:

```
namespace at {
namespace cpu {
TORCH_API at::Tensor & op_1(const at::Tensor & self);
} // namespace cpu
} // namespace at

namespace custom {
namespace cpu {
TORCH_API at::Tensor & op_2(const at::Tensor & self);
} // namespace cpu
} // namespace custom
```

Notice the difference between `at::cpu` and `custom::cpu`. The definitions for these can then be found in `RegisterCPU.cpp`.

`RegisterCPU.cpp`:

```
#include "CPUFunctions.h"

namespace at {
namespace {
at::Tensor & wrapper_op_1(const at::Tensor & self) {
  // No device check
  // DeviceGuard omitted
  return at::native::op_1_kernel(self);
}
} // anonymous namespace

TORCH_LIBRARY_IMPL(aten, CPU, m) {
  m.impl("op_1", TORCH_FN(wrapper_op_1));
}

namespace cpu {
at::Tensor & op_1(at::Tensor & self) {
  return wrapper_op_1(self);
}
} // namespace cpu
} // namespace at

namespace custom {
namespace {
at::Tensor & wrapper_op_2(const at::Tensor & self) {
  // No device check
  // DeviceGuard omitted
  return at::native::op_2_kernel(self);
}
} // anonymous namespace

TORCH_LIBRARY_IMPL(aten, CPU, m) {
  m.impl("op_2", TORCH_FN(wrapper_op_2));
}

namespace cpu {
at::Tensor & op_2(at::Tensor & self) {
  return wrapper_op_2(self);
}
} // namespace cpu
} // namespace custom
```

The benefit of this change is that it unifies all the namespaces derived from custom ops. In the example above, there are:
1. `custom::native` for kernels
2. `custom::<dispatch_key>`, e.g., `custom::cpu`, for wrappers

This customized operator will have nothing to do with `at::native`, `at::cpu`, etc.

Test Plan: This is very hard to test. I will refactor this logic and abstract out some layers so it's testable. Will do it in coming PRs.

Differential Revision: D37972772

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81744
Approved by: https://github.com/bdhirsh
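As a rough sketch of the rule this entry describes (hypothetical code, not the torchgen implementation): the wrapper's namespace is taken from the kernel's namespace prefix, falling back to `at` when the dispatch entry has no prefix:

```python
# Hypothetical illustration of the namespace rule above; not actual torchgen
# logic. "custom::op_2_kernel" puts the generated wrapper under "custom",
# while an unprefixed entry such as "op_1_kernel" stays under "at".
def wrapper_namespace(dispatch_entry: str, default: str = "at") -> str:
    namespace, sep, _kernel = dispatch_entry.rpartition("::")
    return namespace if sep else default

print(wrapper_namespace("at::op_1_kernel"))      # at
print(wrapper_namespace("custom::op_2_kernel"))  # custom
print(wrapper_namespace("op_1_kernel"))          # at
```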

ba84e9662e
Use OrderedSet in ufunc codegen (#82567)
Follow up from https://github.com/pytorch/pytorch/pull/82536#discussion_r934000916

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82567
Approved by: https://github.com/ezyang
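For context, a minimal sketch of the idea behind an ordered set in codegen (this is not torchgen's actual `OrderedSet` class): iterating a set with stable insertion order keeps the generated output identical from run to run.

```python
# Minimal illustration: an insertion-ordered "set" backed by a dict, so code
# generation that iterates over it emits the same output on every run.
from typing import Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")

class OrderedSet(Generic[T]):
    def __init__(self, items: Iterable[T] = ()) -> None:
        # dict preserves insertion order and deduplicates keys.
        self._items = dict.fromkeys(items)

    def add(self, item: T) -> None:
        self._items[item] = None

    def __contains__(self, item: object) -> bool:
        return item in self._items

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)

dtypes = OrderedSet(["float", "int", "float"])
print(list(dtypes))  # ['float', 'int'] -- stable, unlike iterating a built-in set
```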

a4647cc1fa
Apply ufmt linter to all py files under torchgen (#81570)
Previous batches:
* https://github.com/pytorch/pytorch/pull/81285
* https://github.com/pytorch/pytorch/pull/81335

We have multiple batches here to minimize merge conflicts and simplify the reviewing process. Once everything has been formatted by ufmt (black + usort), the current black linter will be removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81570
Approved by: https://github.com/ezyang

5c8a9803c8
[torchgen] Support multiple namespace in NativeFunctions.h (#79733)
Summary: This is a follow up to #78015. This PR
* introduces namespace logic for generating `NativeFunctions.h`,
* adds a helper function to extract the namespace from a string (see the sketch after this entry),
* relaxes the constraint on the levels we support for custom kernel namespaces to 2.

Test Plan:

Yaml entry:

```
- func: unsqueeze.out(Tensor(a) self, int dim, *, Tensor(a!) out) -> Tensor(a!)
  variants: function
  device_check: NoCheck
  dispatch:
    CPU: custom_1::custom_2::unsqueeze
```

Generated `NativeFunctions.h`:

```
namespace custom_1 {
namespace custom_2 {
namespace native {
TORCH_API at::Tensor & unsqueeze(const at::Tensor & self, int64_t dim, at::Tensor & out);
} // namespace native
} // namespace custom_2
} // namespace custom_1
```

Differential Revision: D37198111

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79733
Approved by: https://github.com/bdhirsh
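A rough sketch of the kind of namespace-extraction helper mentioned above (hypothetical; not the actual torchgen code), splitting a dispatch entry such as `custom_1::custom_2::unsqueeze` into its namespace levels and kernel name, with the two-level limit enforced:

```python
# Hypothetical helper: split "ns1::ns2::kernel" into (namespaces, kernel name)
# and enforce the maximum number of supported custom namespace levels.
from typing import List, Tuple

MAX_NAMESPACE_LEVELS = 2  # the PR relaxes the supported custom levels to 2

def split_namespaced_kernel(entry: str) -> Tuple[List[str], str]:
    *namespaces, kernel = entry.split("::")
    assert len(namespaces) <= MAX_NAMESPACE_LEVELS, (
        f"too many namespace levels in {entry!r}"
    )
    return namespaces, kernel

print(split_namespaced_kernel("custom_1::custom_2::unsqueeze"))
# (['custom_1', 'custom_2'], 'unsqueeze')
print(split_namespaced_kernel("unsqueeze"))
# ([], 'unsqueeze')
```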

dca416b578
Pretty-print dataclasses (#76810)
Unfortunately, the built-in pprint module supports pretty-printing of dataclasses only from Python 3.10. The code that I wrote in the `__str__` method of OpInfo should do the same job and should also work for any dataclass. For now I've put it there, but we could create a function and put it somewhere where it is accessible to other dataclasses as well. Also, the max width (80) is currently hardcoded, but it would ideally be a parameter of the function.

When you call print on an OpInfo you get:
```
OpInfo(name = '__getitem__',
ref = None,
aliases = (),
variant_test_name = '',
op = <slot wrapper '__getitem__' of 'torch._C._TensorBase' objects>,
method_variant = <slot wrapper '__getitem__' of 'torch._C._TensorBase' objects>,
inplace_variant = None,
skips = (<torch.testing._internal.common_methods_invocations.DecorateInfo object at 0x7f463acbca90>,
<torch.testing._internal.common_methods_invocations.DecorateInfo object at 0x7f463acbcae0>),
decorators = (<torch.testing._internal.common_methods_invocations.DecorateInfo object at 0x7f463acbca90>,
<torch.testing._internal.common_methods_invocations.DecorateInfo object at 0x7f463acbcae0>),
sample_inputs_func = <function sample_inputs_getitem at 0x7f463acc6af0>,
reference_inputs_func = None,
error_inputs_func = None,
sample_inputs_sparse_coo_func = <function _DecoratorContextManager.__call__.<locals>.decorate_context at 0x7f463acc6b80>,
sample_inputs_sparse_csr_func = <function _DecoratorContextManager.__call__.<locals>.decorate_context at 0x7f463acc6c10>,
dtypes = {torch.int16,
torch.float64,
torch.int32,
torch.int64,
torch.complex64,
torch.float16,
torch.bfloat16,
torch.uint8,
torch.complex128,
torch.bool,
torch.float32,
torch.int8},
dtypesIfCUDA = {torch.int16,
torch.float64,
torch.int32,
torch.int64,
torch.complex64,
torch.float16,
torch.bfloat16,
torch.uint8,
torch.complex128,
torch.bool,
torch.float32,
torch.int8},
dtypesIfROCM = {torch.int16,
torch.float64,
torch.int32,
torch.int64,
torch.complex64,
torch.float16,
torch.bfloat16,
torch.uint8,
torch.complex128,
torch.bool,
torch.float32,
torch.int8},
backward_dtypes = {torch.int16,
torch.float64,
torch.int32,
torch.int64,
torch.complex64,
torch.float16,
torch.bfloat16,
torch.uint8,
torch.complex128,
torch.bool,
torch.float32,
torch.int8},
backward_dtypesIfCUDA = {torch.int16,
torch.float64,
torch.int32,
torch.int64,
torch.complex64,
torch.float16,
torch.bfloat16,
torch.uint8,
torch.complex128,
torch.bool,
torch.float32,
torch.int8},
backward_dtypesIfROCM = {torch.int16,
torch.float64,
torch.int32,
torch.int64,
torch.complex64,
torch.float16,
torch.bfloat16,
torch.uint8,
torch.complex128,
torch.bool,
torch.float32,
torch.int8},
supports_out = False,
supports_autograd = True,
supports_gradgrad = True,
supports_fwgrad_bwgrad = True,
supports_inplace_autograd = False,
supports_forward_ad = True,
gradcheck_wrapper = <function OpInfo.<lambda> at 0x7f463a7a40d0>,
check_batched_grad = True,
check_batched_gradgrad = True,
check_batched_forward_grad = True,
check_inplace_batched_forward_grad = True,
gradcheck_nondet_tol = 0.0,
gradcheck_fast_mode = None,
aten_name = '__getitem__',
decomp_aten_name = None,
aten_backward_name = None,
assert_autodiffed = False,
autodiff_nonfusible_nodes = ['aten::__getitem__'],
autodiff_fusible_nodes = [],
supports_sparse = False,
supports_scripting = False,
supports_sparse_csr = False,
test_conjugated_samples = True,
test_neg_view = True,
assert_jit_shape_analysis = False,
supports_expanded_weight = False)
```
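As a minimal sketch of the generic approach described above (hypothetical code, not the `__str__` added to OpInfo in this PR; the width is exposed as a parameter as the description suggests):

```python
# Generic dataclass pretty-printer sketch: one "field = value" pair per line,
# with continuation lines indented under the opening parenthesis.
import dataclasses
import pprint

def pretty_dataclass(obj, width: int = 80) -> str:
    name = type(obj).__name__
    indent = len(name) + 1  # align fields under the opening parenthesis
    lines = []
    for f in dataclasses.fields(obj):
        value = getattr(obj, f.name)
        rendered = pprint.pformat(value, width=width - indent)
        # Indent wrapped value lines so they line up under the value column.
        rendered = rendered.replace("\n", "\n" + " " * (indent + len(f.name) + 3))
        lines.append(f"{f.name} = {rendered}")
    sep = ",\n" + " " * indent
    return f"{name}({sep.join(lines)})"

# Example on a small dataclass:
@dataclasses.dataclass
class Point:
    x: int = 0
    label: str = "origin"

print(pretty_dataclass(Point()))
# Point(x = 0,
#       label = 'origin')
```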
cc @ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76810
Approved by: https://github.com/ezyang

36420b5e8c
Rename tools/codegen to torchgen (#76275)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/76275

In preparation for addressing https://github.com/pytorch/pytorch/issues/73212

Diff was generated with:

```
git mv tools/codegen torchgen
git grep -l 'tools.codegen' | xargs sed -i 's/tools.codegen/torchgen/g'
sed -i "s/\${TOOLS_PATH}\/codegen/\${TORCH_ROOT}\/torchgen/g" caffe2/CMakeLists.txt
```

and manual edits to:
* tools/test/test_gen_backend_stubs.py
* torchgen/build.bzl
* torchgen/gen_backend_stubs.py

aka this diff:

```
diff --git a/tools/test/test_gen_backend_stubs.py b/tools/test/test_gen_backend_stubs.py
index 3dc26c6d2d..104054575e 100644
--- a/tools/test/test_gen_backend_stubs.py
+++ b/tools/test/test_gen_backend_stubs.py
@@ -9,7 +9,7 @@ from torchgen.gen_backend_stubs import run
 from torchgen.gen import _GLOBAL_PARSE_NATIVE_YAML_CACHE  # noqa: F401

 path = os.path.dirname(os.path.realpath(__file__))
-gen_backend_stubs_path = os.path.join(path, '../torchgen/gen_backend_stubs.py')
+gen_backend_stubs_path = os.path.join(path, '../../torchgen/gen_backend_stubs.py')

 # gen_backend_stubs.py is an integration point that is called directly by external backends.
 # The tests here are to confirm that badly formed inputs result in reasonable error messages.
diff --git a/torchgen/build.bzl b/torchgen/build.bzl
index ed04e35a43..d00078a3cf 100644
--- a/torchgen/build.bzl
+++ b/torchgen/build.bzl
@@ -1,6 +1,6 @@
 def define_targets(rules):
     rules.py_library(
-        name = "codegen",
+        name = "torchgen",
         srcs = rules.glob(["**/*.py"]),
         deps = [
             rules.requirement("PyYAML"),
@@ -11,6 +11,6 @@ def define_targets(rules):

     rules.py_binary(
         name = "gen",
-        srcs = [":codegen"],
+        srcs = [":torchgen"],
         visibility = ["//visibility:public"],
     )
diff --git a/torchgen/gen_backend_stubs.py b/torchgen/gen_backend_stubs.py
index c1a672a655..beee7a15e0 100644
--- a/torchgen/gen_backend_stubs.py
+++ b/torchgen/gen_backend_stubs.py
@@ -474,7 +474,7 @@ def run(
 ) -> None:

     # Assumes that this file lives at PYTORCH_ROOT/torchgen/gen_backend_stubs.py
-    pytorch_root = pathlib.Path(__file__).parent.parent.parent.absolute()
+    pytorch_root = pathlib.Path(__file__).parent.parent.absolute()
     template_dir = os.path.join(pytorch_root, "aten/src/ATen/templates")

     def make_file_manager(install_dir: str) -> FileManager:
```

run_all_fbandroid_tests

Test Plan: sandcastle

Reviewed By: albanD, ngimel

Differential Revision: D35770317

fbshipit-source-id: 153ac4a7fef15b1e750812a90bfafdbc8f1ebcdf
(cherry picked from commit c6d485d1d4648fa1c8a4c14c5bf3d8e899b9b4dd)
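For code that imports the package, the visible effect of the rename is just the module path, as the `sed` command above implies; a minimal illustration (using `NativeFunction` from `torchgen.model` as an example symbol of my choosing):

```python
# The rename only changes the top-level package name for importers.
# Old layout (pre-rename):
#   from tools.codegen.model import NativeFunction
# New layout:
from torchgen.model import NativeFunction  # any other torchgen module follows the same pattern

print(NativeFunction.__module__)  # torchgen.model
```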