Summary:
fmax/fmin return the non-NaN operand when one argument is NaN, which doesn't match the eager-mode behavior of propagating NaN.
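For reference, a minimal eager-mode sketch of the NaN propagation the fused kernels should match (the fmax/fmin comparison in the comments is based on standard C semantics and is included only for illustration):
```
import torch

a = torch.tensor([float("nan")])
b = torch.tensor([1.0])

# Eager elementwise max/min propagate NaN.
print(torch.maximum(a, b))  # tensor([nan])
print(torch.minimum(a, b))  # tensor([nan])

# C's fmax/fmin instead return the non-NaN operand (fmax(NaN, 1.0) == 1.0),
# which is the mismatch being fixed here.
```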
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43590
Reviewed By: mruberry
Differential Revision: D23338664
Pulled By: bertmaher
fbshipit-source-id: b0316a6f01fcf8946ba77621efa18f339379b2d0
Summary:
The JIT pointwise kernel currently does not use vectorized loads/stores, which can lead to suboptimal performance for shorter data types such as half and int8.
This PR adds a fixed vectorization length of 4 elements per load/store for supported tensor shapes, implemented as a runtime check inside the kernel.
Supported tensor shapes (a sketch of the eligibility check follows this list):
- all input/output data pointers are aligned to 4*sizeof(dtype)
- the last dimension is contiguous (stride 1) and its size is a multiple of 4
- all other dimensions have strides that are multiples of 4
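A minimal Python sketch of such an eligibility check (the helper name is hypothetical; the actual check is emitted inside the generated kernel, driven by the TensorDesc cache):
```
import torch

def eligible_for_vec4(t: torch.Tensor) -> bool:
    """Illustrative check for the 4-element vectorized load/store path."""
    vec = 4
    # Data pointer must be aligned to 4 * sizeof(dtype).
    if t.data_ptr() % (vec * t.element_size()) != 0:
        return False
    # Last dimension must be contiguous (stride 1) and a multiple of 4.
    if t.stride(-1) != 1 or t.size(-1) % vec != 0:
        return False
    # All other strides must be multiples of 4.
    return all(s % vec == 0 for s in t.stride()[:-1])
```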
All test_jit* tests pass. Performance results for a simple `ax+by+c` fusion:
Result before this PR:
```
torch.float32 kernel time: 0.748 ms.
torch.float16 kernel time: 0.423 ms.
torch.int8 kernel time: 0.268 ms.
```
Result after this PR:
```
torch.float32 kernel time: 0.733 ms.
torch.float16 kernel time: 0.363 ms.
torch.int8 kernel time: 0.191 ms.
```
Test code:
```
import torch
import time

# disable profiling to test all data types
torch._C._jit_set_profiling_mode(False)
torch._C._jit_set_profiling_executor(False)

@torch.jit.script
def axpby(x, y):
    return x * 2 - y * 3 + 1

for test_dtype in [torch.float32, torch.float16, torch.int8]:
    a = torch.randn(12345, 4096, device="cuda").to(test_dtype)
    b = torch.randn(12345, 4096, device="cuda").to(test_dtype)
    # warm up
    for _ in range(100):
        c = axpby(a, b)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(1000):
        c = axpby(a, b)
    torch.cuda.synchronize()
    end = time.time()
    # over 1000 iterations, the elapsed seconds equal the per-call time in ms
    print("{} kernel time: {:.3f} ms.".format(test_dtype, end - start))
```
Generated code:
[log_with_generated_code.txt](https://github.com/pytorch/pytorch/files/4472813/log_with_generated_code.txt)
Additional note:
The double type is excluded from the vectorized code path.
This can later be improved with dynamic vectorization lengths and fewer in-kernel checks once tensor shape information is available at codegen time. For now, the implementation follows the existing TensorDesc-based caching mechanism, which does not carry enough compile-time information.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36555
Differential Revision: D21142762
Pulled By: ngimel
fbshipit-source-id: 1cfdc5807a944c4670b040dc2d2dfa480377e7d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35115
This commit runs the newly added tools/clang_format.py on the JIT
codebase and includes all of the formatting changes thus produced.
Testing:
Ran the script, CI.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D20568523
Pulled By: SplitInfinity
fbshipit-source-id: e09bdb982ccf090eecfb7c7b461b8d0681eef82b