pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
jjsjann123	6583c0384b	fixing trivial reduction & broadcast scheduling (#77884 ) cherry-picked fixes from https://github.com/csarofeen/pytorch/pull/1714 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77884 Approved by: https://github.com/csarofeen, https://github.com/davidberard98	2022-05-20 02:00:42 +00:00
jjsjann123	a2802ad0b9	Upstream master bump 0513 (#77471 ) Updating nvfuser code base. This should fix the indexing issue observed in https://github.com/pytorch/vision/issues/6015. Running tests locally as well. Will update the description here at a later point @bypass-github-export-checks Pull Request resolved: https://github.com/pytorch/pytorch/pull/77471 Approved by: https://github.com/seemethere, https://github.com/eellison	2022-05-18 11:48:50 -07:00
Xiang Gao	4eec865f58	[nvFuser] Improving bitwise ops support (#77158 ) - Some renaming to better match PyTorch API: - `lshift` -> `bitwise_left_shift` - `rshift` -> `bitwise_right_shift` - `andOp` -> `bitwise_and` - `orOp` -> `bitwise_or` - `xorOp` -> `bitwise_xor` - `notOp` -> `bitwise_not` - Fix type inferences and type checking of these ops - Add `bitwise_*` to parser and python frontend - Improve test coverage Pull Request resolved: https://github.com/pytorch/pytorch/pull/77158 Approved by: https://github.com/kevinstephano, https://github.com/jjsjann123	2022-05-18 17:21:34 +00:00
David Berard	36f7a6cc4a	[NVFuser] don't decompose conv2d if we don't have shape info Sometimes bias won't have shape info (e.g. in the added test, conv gets run two times in a loop, each with different shapes). In that case we should just skip decomposition instead of erroring out. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77440 Approved by: https://github.com/jjsjann123	2022-05-13 22:39:43 +00:00
jjsjann123	b010c3451c	nvfuser opinfo test fixes masked_var/std (#77273 ) Enables guard mode in opinfo tests. Fixes opinfo failures for test_nvfuser_correctness__masked_var_cuda_xxxx test_nvfuser_correctness__masked_std_cuda_xxxx The root cause of the failure is that tracing changes stride properties and causes nvfuser to use wrong kernel and generate wrong results. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77273 Approved by: https://github.com/davidberard98	2022-05-12 05:04:56 +00:00
David Berard	949cbf1d65	[NVFuser] Opinfos for extremal values in binary ufuncs Added slow tests for comparing the eager & fused outputs for given extremal inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75917 Approved by: https://github.com/jjsjann123, https://github.com/eellison	2022-05-10 03:22:20 +00:00
jjsjann123	489818e7c6	disabling squeeze/unsqueeze; disabling BN/BN_BWD for perf concern (#77017 ) Fixes #76883 (via disabling squeeze/unsqueeze) Disables BN fwd/bwd due to performance concerns. I need to update our Python tests; waiting for the build to finish so I can update them accordingly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77017 Approved by: https://github.com/csarofeen, https://github.com/davidberard98	2022-05-09 22:57:20 +00:00
jjsjann123	b4f3f9c651	Torchvision patch (#77001 ) Fixes #76791 Note that this is a hot patch so we get to run upstream tests. I'm doing a proper fix in our local repo and will update upstream code once those are merged/reviewed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77001 Approved by: https://github.com/davidberard98	2022-05-09 16:53:23 +00:00
Xiang Gao	104f0bf09e	[Reland] Add atan2 isfinite isinf isnan isneginf isposinf isreal to nvfuser and its frontend (#76769 ) This reverts commit `4bb5944133`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76769 Approved by: https://github.com/csarofeen, https://github.com/mruberry	2022-05-07 21:26:00 +00:00
PyTorch MergeBot	4bb5944133	Revert "Add atan2 isfinite isinf isnan isneginf isposinf isreal to nvfuser and its frontend" This reverts commit `92d10decc4`. Reverted https://github.com/pytorch/pytorch/pull/76598 on behalf of https://github.com/malfet	2022-05-03 19:53:28 +00:00
Xiang Gao	92d10decc4	Add atan2 isfinite isinf isnan isneginf isposinf isreal to nvfuser and its frontend Fixes: https://github.com/csarofeen/pytorch/issues/1632 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76598 Approved by: https://github.com/csarofeen, https://github.com/mruberry	2022-05-03 16:31:40 +00:00
jjsjann123	d23619b030	Permutation extended Extended permutation support in integration (see more details in https://github.com/csarofeen/pytorch/issues/1601). This update allows us to better support permutation propagation on tensors, specifically for binary ops with inputs of different ranks. Our goal is to avoid permuting tensors unless absolutely necessary. We try to preserve the permutation propagation rule in aten, with some known limitations at the time. The idea in this implementation is the same as in our existing code, which is to permute input/output tensors outside of codegen. For a simplified binary op scenario, `output = binaryOp(input0, input1)`:
1. In the simple case where `input0` and `input1` come with the same rank & permutation order, the output preserves the same permutation;
2. For cases where `input0` and `input1` come with different ranks but with compatible permutations, the tensor with the higher rank dictates the permutation of the output;
3. For cases where `input0` and `input1` come with different ranks and incompatible permutations, permutation propagation fails and the output tensor will be contiguous.
By compatible permutation, we mean that we can permute the higher-rank tensor to contiguous format, and then apply a second permutation to the lower-rank tensor to match their axes. This check is implemented in `MemoryFormat::broadcastToRank(int lower_rank)`. Some concrete examples (note that we comply with eager propagation in cases 1-3, but diverge in behavior for cases 4 and 5):
1. different rank & same permutation
```
t0 = torch.randn(b, h, w, c).cuda().permute([0, 3, 1, 2]) # stride (hwc, 1, wc, c)
t1 = torch.randn(h, w, c).cuda().permute([2, 0, 1]) # stride (1, wc, c)
out = scripted_add(t0, t1) # stride (hwc, 1, wc, c) preserving memory format of t0
```
2. different rank & compatible permutation
```
t0 = torch.randn(b, h, w, c).cuda().permute([0, 3, 1, 2]) # stride (hwc, 1, wc, c)
t1 = torch.randn(c, h, w).cuda() # stride (hw, w, 1)
out = scripted_add(t0, t1) # stride (hwc, 1, wc, c) preserving memory format of t0
```
3. different rank & compatible permutation with broadcasting
```
t0 = torch.randn(b, h, w, c).cuda().permute([0, 3, 1, 2]) # stride (hwc, 1, wc, c)
t1 = torch.randn(c).cuda().unsqueeze(-1).unsqueeze(-1) # stride (1, 1, 1)
out = scripted_add(t0, t1) # stride (hwc, 1, wc, c) preserving memory format of t0
```
4. different rank & incompatible permutation
```
t0 = torch.randn(b, h, w, c).cuda().permute([0, 3, 1, 2]) # stride (hwc, 1, wc, c)
t1 = torch.randn(h, w).cuda() # stride (w, 1)
jit_out = scripted_add(t0, t1) # stride (hwc, 1, wc, c) # stride (hwc, wc, c, 1) # nvfuser outputs contiguous tensor
eager_out = eager_add(t0, t1) # stride (hwc, 1, wc, c) # stride (hwc, 1, wc, c) # TI preserves memory format of LHS operand
```
5. different rank & incompatible permutation
```
t0 = torch.randn(c, h, w).cuda() # stride (hw, w, 1)
t1 = torch.randn(b, h, w, c).cuda().permute([0, 3, 1, 2]) # stride (hwc, 1, wc, c)
jit_out = scripted_add(t0, t1) # stride (hwc, 1, wc, c) # stride (hwc, 1, wc, c) # nvfuser preserves memory format of highest rank tensors
eager_out = eager_add(t0, t1) # stride (hwc, 1, wc, c) # stride (hwc, hw, w, 1) # TensorIterator preserves memory format of LHS operand
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76563 Approved by: https://github.com/kevinstephano, https://github.com/ngimel	2022-05-02 22:09:56 +00:00
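The stride comments in these examples follow from plain row-major arithmetic. A minimal pure-Python sketch (no torch needed; `contiguous_strides` and `permute_strides` are hypothetical helper names, not nvfuser or PyTorch APIs) reproduces them for the sample sizes b, h, w, c = 2, 3, 4, 5, so hwc = 60, wc = 20, hw = 12:

```python
from typing import List, Sequence


def contiguous_strides(shape: Sequence[int]) -> List[int]:
    """Row-major (contiguous) strides, in elements, for the given shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def permute_strides(strides: Sequence[int], perm: Sequence[int]) -> List[int]:
    """Strides after tensor.permute(perm): output dim i takes input dim perm[i]."""
    return [strides[p] for p in perm]


b, h, w, c = 2, 3, 4, 5

# t0 = torch.randn(b, h, w, c).permute([0, 3, 1, 2]) -> (hwc, 1, wc, c)
t0_strides = permute_strides(contiguous_strides([b, h, w, c]), [0, 3, 1, 2])
print(t0_strides)  # [60, 1, 20, 5]

# t1 = torch.randn(h, w, c).permute([2, 0, 1]) -> (1, wc, c)
t1_strides = permute_strides(contiguous_strides([h, w, c]), [2, 0, 1])
print(t1_strides)  # [1, 20, 5]

# t1 = torch.randn(c, h, w), left contiguous -> (hw, w, 1)
print(contiguous_strides([c, h, w]))  # [12, 4, 1]
```

Because a permute only reorders strides, the channels-last layout of `t0` shows up as the stride-1 dimension sitting at index 1 instead of at the end.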
Elias Ellison	bcee215d2b	[Testing CI] test exact layout on nvfuser tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/76393 Approved by: https://github.com/jjsjann123, https://github.com/davidberard98	2022-04-28 00:18:15 +00:00
Elias Ellison	0d7be81c9c	[JIT] Add Context Manager to force strict fusion Fixes https://github.com/pytorch/pytorch/issues/75464 Adds a context manager that will throw if the ops in the context are not fused. The API is:
```
with torch.jit.strict_fusion():
    ...
```
A few TODOs: [+] Compose/figure out how to do with autodiff - right now it will run on autodiff as well [+] Support all of the nvfuser operators that are added in guarding [+] Figure out what to do with control flow that isn't taken (right now it will just error). This is probably a source of the original issue :/ - will just error [+] (After those are figured out) add to docs Pull Request resolved: https://github.com/pytorch/pytorch/pull/75777 Approved by: https://github.com/davidberard98	2022-04-25 16:08:57 +00:00
David Berard	1324410f2e	[JIT] Reuse traced fn for jit opinfos Previously, jit opinfos would only run the traced function once. This is a problem for NNC and NVFuser, where the fused implementation only runs on the second invocation. This caches the traced function and calls the cached implementation, so that subsequent calls actually perform fusion and use the fused implementation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76000 Approved by: https://github.com/eellison	2022-04-22 20:14:29 +00:00
David Berard	cd0fdccaef	Enable windows tests for nvfuser Pull Request resolved: https://github.com/pytorch/pytorch/pull/75190 Approved by: https://github.com/eellison, https://github.com/jjsjann123	2022-04-19 12:36:50 +00:00
David Berard	ebb60a8b2f	[NVFuser] don't decompose linear if we don't have shape info Pull Request resolved: https://github.com/pytorch/pytorch/pull/75770 Approved by: https://github.com/jjsjann123, https://github.com/robieta	2022-04-18 14:24:37 +00:00
jjsjann123	692ebc8d8b	baby steps on patching inf/nan behavior & aten::amin support in nvfuser Fixes #75622 1. Instead of getting max/min_value for reduction init value, we go with (-)infinity instead so we can properly preserve inf inputs; 2. Adding inf/(-)inf/nan for float value. 3. Adding aten::amin in nvfuser (@kevinstephano @rdspring1 for review) Pull Request resolved: https://github.com/pytorch/pytorch/pull/75646 Approved by: https://github.com/rdspring1, https://github.com/kevinstephano, https://github.com/ngimel	2022-04-13 15:51:17 +00:00
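Why the reduction's initial value matters can be seen with a scalar model of a reduction. This pure-Python sketch uses double-precision floats and `functools.reduce`, not nvfuser's codegen, and `max_reduce`/`min_reduce` are hypothetical helpers: seeding a max reduction with the lowest finite value turns an all `-inf` input into a finite result, while seeding it with `-inf` (the identity of `max`) preserves the infinite input.

```python
import math
import sys
from functools import reduce
from typing import Iterable


def max_reduce(values: Iterable[float], init: float) -> float:
    """Max reduction that starts from `init` (the reduction's initial value)."""
    return reduce(max, values, init)


def min_reduce(values: Iterable[float], init: float) -> float:
    """Min reduction that starts from `init`."""
    return reduce(min, values, init)


all_neg_inf = [-math.inf, -math.inf]
all_pos_inf = [math.inf, math.inf]

# Finite "lowest/highest value" init: infinite inputs are clipped to a finite result.
print(max_reduce(all_neg_inf, -sys.float_info.max))  # -1.7976931308623157e+308
print(min_reduce(all_pos_inf, sys.float_info.max))   # 1.7976931308623157e+308

# (-)infinity init, the identity element of max/min: infinite inputs survive.
print(max_reduce(all_neg_inf, -math.inf))  # -inf
print(min_reduce(all_pos_inf, math.inf))   # inf
```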
David Berard	790cc8f259	[JIT] nvfuser test - use only major & minor versions torch.version.cuda.split('.') can have 2 or 3 elements depending on whether the version string contains a patch number. This updates the test so it doesn't error out when the version has > 2 parts. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75706 Approved by: https://github.com/ngimel	2022-04-13 15:14:12 +00:00
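A minimal sketch of the version handling described above: keep only the first two components, so both `'11.3'` and `'11.3.1'` parse. In the test the string comes from `torch.version.cuda`; this sketch assumes a CUDA build where that value is a string, and `cuda_major_minor` is a hypothetical helper name.

```python
from typing import Tuple


def cuda_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a CUDA version string like '11.3' or '11.3.1'.

    Only the first two components are used, so a trailing patch number is ignored.
    """
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


print(cuda_major_minor("11.3"))    # (11, 3)
print(cuda_major_minor("11.3.1"))  # (11, 3)
```

Slicing to `[:2]` before unpacking is what avoids the "too many values to unpack" error that a plain `major, minor = version.split('.')` raises on a three-part string.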
David Berard	a305c078da	[JIT] Prevent nvfuser registration on ROCm Previously, cuda_graph_fuser.h registration of the nvfuser pass used `at::globalContext().hasHIP()` to check whether we were using ROCm/HIP. However, I don't think that check actually does anything; on the ROCm CI jobs the registration would still succeed. Instead it's replaced with `#ifdef USE_ROCM`. Verified this by enabling the NVFuser tests on ROCm and running in CI. Before this change: the NVFuser test in CI on ROCm would throw really long and complex errors. Now, it errors out immediately when trying to enable nvfuser. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75284 Approved by: https://github.com/eellison	2022-04-12 20:06:14 +00:00
jiej	0203341bbd	patching clamp for one sided clamp Fixes #75088 The solution is simply to avoid filling in an arbitrary value for the unspecified clamp bound, as pointed out in https://github.com/pytorch/pytorch/issues/75088#issuecomment-1093410036 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75558 Approved by: https://github.com/ngimel	2022-04-12 03:02:32 +00:00
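A one-sided clamp should leave the missing bound out entirely rather than substitute some value for it. The scalar pure-Python sketch below models only that semantics (it is not the nvfuser fix; `clamp` here is a hypothetical helper). It applies the lower bound first and then the upper bound, so when `min_val > max_val` the result is `max_val`, which matches the documented behavior of `torch.clamp`.

```python
from typing import Optional


def clamp(x: float, min_val: Optional[float] = None, max_val: Optional[float] = None) -> float:
    """Clamp x, applying only the bounds that were actually given."""
    if min_val is not None:
        x = max(x, min_val)  # lower bound only when specified
    if max_val is not None:
        x = min(x, max_val)  # upper bound only when specified
    return x


print(clamp(5.0, min_val=0.0))   # 5.0  (no upper bound is invented)
print(clamp(-1.0, min_val=0.0))  # 0.0
print(clamp(5.0, max_val=3.0))   # 3.0
```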
jjsjann123	f7e7af80e0	disabling reshape Fixes #75282 Temporarily disables reshape to avoid codegen failure. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75539 Approved by: https://github.com/davidberard98	2022-04-12 02:43:45 +00:00
jjsjann123	2d5e4cff85	disabling view Disabling view to avoid codegen errors as we resolve them internally. This is currently done by simply stopping the non-alias transformation for the view op in the fusion pass. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75235 Approved by: https://github.com/davidberard98	2022-04-07 01:00:04 +00:00
jiej	aac4d6cd63	updating nvfuser tests Re-enabled the failing test `test_category_rule` since I don't have the repro; removed `test_linear_1d_weight_mismatch_bias_dtype` since the old behavior is not supported in aten; disabled `test_int_tensor_input` for pre-volta device since we have reduction `amax` in the test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75340 Approved by: https://github.com/davidberard98	2022-04-07 00:58:04 +00:00
Horace He	5994d68484	Reland NVFuser guard changes Reland of https://github.com/pytorch/pytorch/pull/75016 with `USE_CUDA` => `USE_NVFUSER` Pull Request resolved: https://github.com/pytorch/pytorch/pull/75303 Approved by: https://github.com/jjsjann123, https://github.com/davidberard98	2022-04-06 06:32:34 +00:00
David Berard	83400e836e	[JIT] nvfuser CI fixes * test_native_batch_norm_backward * test_reduction_empty_axes * test_register_fuser * test_category_rule Pull Request resolved: https://github.com/pytorch/pytorch/pull/75116 Approved by: https://github.com/jjsjann123, https://github.com/eellison	2022-04-04 22:19:03 +00:00
David Berard	c5b3727e5e	[JIT] OpInfo tests for nvfuser (#71299 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71299 These tests verify that for the same inputs, the eager version of an op and a traced, fused version of the op return the same output. Currently the tests don't check whether or not fusion actually occurred. Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D33595299 Pulled By: davidberard98 fbshipit-source-id: 26fdacf44941808c134953e7a883a02d13a43f19 (cherry picked from commit 8cd084e2e3130fcd5f9c99302d6d9bf4e21c25bb)	2022-04-01 23:48:30 +00:00
David Berard	27deefb5e1	[JIT] Enable NVFuser tests in OSS CI (#73322 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73322 These tests have been disabled in OSS CI since #34785. Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D34436844 Pulled By: davidberard98 fbshipit-source-id: c5b14b33e7f369a6fa1e9cfbcb484a30dffc659e (cherry picked from commit b08f51587c0203c3e8b69f06ea613759e740aa4f)	2022-04-01 23:48:30 +00:00
David Berard	e9e75215e2	[JIT] Optionally validate nvfuser outputs after execution (#74361 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74361 This adds an optional validation after executing an NVFuser node, which checks that the output is the same as the unfused implementation. Then the outputs and the graph are reported via a callback.
```python
import torch


def callback(x, y, graph):
    # x and y are the two sets of outputs being compared
    for i in range(len(x)):
        print(x[i])
        print(y[i])
    print(graph)


with torch.jit.fuser("fuser2"):
    torch._C._jit_nvfuser_set_comparison_callback(True, callback)

    @torch.jit.script
    def g(x, y):
        z = torch.add(x, y)
        return torch.sin(z)

    def f(x, y, a):
        z = torch.add(x, y)
        return g(torch.relu(z), a)

    f_s = torch.jit.script(f)
    x = torch.rand((10, 10), dtype=torch.half).cuda()
    y = torch.rand((10, 10), dtype=torch.half).cuda()
    a = torch.rand((10, 10), dtype=torch.half).cuda()
    # call repeatedly: profiling runs come first, fusion happens on later calls
    f_s(x, y, a)
    f_s(x, y, a)
    f_s(x, y, a)
```
Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D34975310 Pulled By: davidberard98 fbshipit-source-id: 2379c9a6f371cd58da6a187c1f16882f3923ab24 (cherry picked from commit 96c87992c65f5e6bb1bdd51791682dd837af99b4)	2022-04-01 23:48:30 +00:00
PyTorch MergeBot	1352c6417a	Revert "Nvfuser guard patch" This reverts commit `d86181f745`. Reverted https://github.com/pytorch/pytorch/pull/75016 on behalf of https://github.com/malfet	2022-04-01 23:45:55 +00:00
jjsjann123	d86181f745	Nvfuser guard patch Fixes issue where CudaFusionGuard would return false on backward graph because `requires_grad` flag doesn't match. This is due to the fact that autodiff uses GradMode switch to turn on/off requires_grad, which is not taken into consideration by nvfuser guard. We verified the implementation under `TensorType::matchTensor`. - [x] Add python test to verify no fallback is observed Pull Request resolved: https://github.com/pytorch/pytorch/pull/75016 Approved by: https://github.com/eellison	2022-04-01 14:23:48 +00:00
jjsjann123	873ced7cd0	Nvfuser code bump 030122 (#73627 ) Summary: Things changed in this PR that require review: test/forward_backward_compatibility/check_forward_backward_compatibility.py Our previous function overload extension names were wrong and have been updated in this PR, hence the compatibility list update. nvfuser code updates with bug fixes for failures we encountered in OpInfoTests as well as failures reported by the AOTAutograd team. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73627 Reviewed By: Chillee Differential Revision: D34765458 Pulled By: davidberard98 fbshipit-source-id: c81f3d6a1b723fb3a8ba419b7f82227f70440ca7 (cherry picked from commit b6a2c362c37051e44fac31687b2fe272f776551e)	2022-03-31 08:18:22 +00:00
jiej	86c817cfa0	Requires grad guard Adding CudaFusionGuard to guard on device/requires_grad of profiled tensor type. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74780 Approved by: https://github.com/davidberard98	2022-03-29 19:23:10 +00:00
jiej	13f28df460	disable contiguity on cross dimensional overlapped tensor Unmarked contiguity on stride properties when we have dimensions potentially covering overlapping memory. This check could be done more accurately, per dimension instead of a global flag per tensor. I'm just keeping it simple here, as the existing code gives us correctness and that's what's important. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74359 Approved by: https://github.com/ngimel, https://github.com/malfet	2022-03-23 21:17:42 +00:00
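The commit keeps the overlap check coarse (one flag per tensor). One common way to decide conservatively whether a strided layout might map two distinct indices to the same memory is to sort the non-trivial dimensions by stride and require each stride to be at least the extent spanned by all smaller-stride dimensions. The pure-Python sketch that follows illustrates that standard sufficient condition; `may_overlap` is a hypothetical name and this is not the nvfuser check itself.

```python
from typing import Sequence


def may_overlap(sizes: Sequence[int], strides: Sequence[int]) -> bool:
    """Conservatively report whether two distinct indices could share an address.

    Returns False only when the layout is provably non-overlapping; some
    non-overlapping layouts are still reported as True (false positives).
    """
    if any(size == 0 for size in sizes):
        return False  # an empty tensor has no elements to alias
    # Size-1 dims never produce a second address, so ignore them.
    dims = sorted((st, sz) for sz, st in zip(sizes, strides) if sz != 1)
    span = 1  # number of elements spanned by the dims checked so far
    for stride, size in dims:
        if stride < span:
            return True  # this dim can land inside the span of smaller dims
        span = stride * size
    return False


# Contiguous (2, 3): strides (3, 1) -> no overlap.
print(may_overlap([2, 3], [3, 1]))  # False
# Broadcast dim (stride 0) -> every index along it aliases.
print(may_overlap([2, 3], [0, 1]))  # True
# Cross-dimensional overlap: indices (0, 1) and (1, 0) both hit offset 1.
print(may_overlap([2, 2], [1, 1]))  # True
```

A channels-last NCHW tensor, with strides (hwc, 1, wc, c), passes the check because its dimensions still tile memory exactly once when sorted by stride.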
jiej	e4e19d5beb	nvfuser parser skip api (#74520 ) Summary: Added a Python API to disable nvfuser for certain op kinds:
```
"_jit_set_nvfuser_skip_node_kind",
[](const std::string& op_name, bool flip = true) {
  return fuser::cuda::skipNode(op_name, flip);
})
```
Args: `op_name`: symbol of the op; `flip`: flag indicating whether to flip the given op in the skip list. Returns: a bool flag indicating whether `op_name` was already in the skip list. The following Python example disables fusion of `aten::add` from then on: `torch._C._jit_set_nvfuser_skip_node_kind("aten::add", True) # returns False, as no op is in the skip list by default` Pull Request resolved: https://github.com/pytorch/pytorch/pull/74520 Reviewed By: saketh-are Differential Revision: D35046110 Pulled By: davidberard98 fbshipit-source-id: 689f5286513dbab206768823a852467b9f6b49b6 (cherry picked from commit 9a31129f7591ba2d393ab057b1cd137a6a25e7e8)	2022-03-23 20:56:43 +00:00
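The `flip` argument and return value describe a toggle plus a query. The following is a hypothetical pure-Python model of that contract, under the assumption that "flip" means toggling membership and `flip=False` only queries. `SkipNodeRegistry` is an illustrative name; this is not the C++ `fuser::cuda::skipNode` implementation.

```python
from typing import Set


class SkipNodeRegistry:
    """Hypothetical model of the skip-list contract described in the commit.

    Assumption: flip=True toggles the op's membership, flip=False only queries.
    The return value reports whether the op was skipped *before* the call.
    """

    def __init__(self) -> None:
        self._skipped: Set[str] = set()

    def skip_node(self, op_name: str, flip: bool = True) -> bool:
        was_skipped = op_name in self._skipped
        if flip:
            if was_skipped:
                self._skipped.remove(op_name)  # toggle off: fusible again
            else:
                self._skipped.add(op_name)  # toggle on: excluded from fusion
        return was_skipped

    def is_skipped(self, op_name: str) -> bool:
        return op_name in self._skipped


registry = SkipNodeRegistry()
print(registry.skip_node("aten::add"))              # False: list starts empty; now skipped
print(registry.skip_node("aten::add", flip=False))  # True: query only, no change
print(registry.skip_node("aten::add"))              # True: toggled back, fusible again
```

Returning the previous state lets a caller restore the original setting by calling again only if the first call actually changed something.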
PyTorch MergeBot	a7866ada1c	Revert "disable contiguity on cross dimensional overlapped tensor" This reverts commit `6c383dede5`. Reverted https://github.com/pytorch/pytorch/pull/74359 on behalf of https://github.com/malfet	2022-03-23 20:54:22 +00:00
jiej	6c383dede5	disable contiguity on cross dimensional overlapped tensor Unmarked contiguity on stride properties when we have dimensions potentially covering overlapping memory. This check could be done more accurately, per dimension instead of a global flag per tensor. I'm just keeping it simple here, as the existing code gives us correctness and that's what's important. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74359 Approved by: https://github.com/ngimel	2022-03-23 17:42:38 +00:00
jiej	2d110d514f	Nvfuser code bump 2_1_2022 (#72127 ) Summary: Things changed in this PR that require review: 1. aten/src/ATen/core/interned_strings.h 2. torch/csrc/jit/ir/alias_analysis.h : exposing createValue to allow efficient mutation 3. torch/csrc/jit/runtime/symbolic_shape_registry.cpp : added gelu/tanh/erf in registry 4. torch/jit/_script.py : throws when scripting a model that uses autocast as a decorator, since that is not supported. nvfuser code update: 1. codegen improvements and performance tuning 2. integration bug fixes for shape expression logic 3. kernel segmentation update to address perf regression from horizontal fusion 4. scalar cpu tensor promotion to support inter-device operations between a cpu scalar tensor and a cuda tensor. Things reverted from local changes: aten::gelu with approximation (tracked in PR: https://github.com/pytorch/pytorch/pull/61439) Pull Request resolved: https://github.com/pytorch/pytorch/pull/72127 Reviewed By: HamidShojanazeri Differential Revision: D34113233 Pulled By: jbschlosser fbshipit-source-id: b82cde32b71e324eca0ea57cb8c9f9647278ca74 (cherry picked from commit `e009bc5c4e`)	2022-02-15 00:43:16 +00:00
Ryan Spring	4f8b986e28	Implement Tanh Gelu Approximation (#61439 ) Summary: 1. Implements https://github.com/pytorch/pytorch/issues/39853 2. Adds approximate boolean flag to Gelu 3. Enables Tanh Gelu approximation 4. Adds double backward support for Gelu 5. Enable Tanh Gelu in NvFuser
```
import math

import torch


def gelu(x, approximate: str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        # exact GELU: x * Phi(x), with Phi the standard normal CDF
        return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
```
Linking XLA PR - https://github.com/pytorch/xla/pull/3039 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439 Reviewed By: VitalyFedyunin Differential Revision: D33894937 Pulled By: jbschlosser fbshipit-source-id: b65e8fb6ea66168af8f34f45ed50e92737a33851 (cherry picked from commit `6e986f91a9`)	2022-02-14 03:40:32 +00:00
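To see how close the tanh approximation is to exact GELU, here is a scalar pure-Python comparison that uses only the standard-library `math` module. The grid over [-6, 6] and the helper names `gelu_exact`/`gelu_tanh` are arbitrary illustrative choices. On this grid the largest absolute gap stays below 1e-3.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF (via erf)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU."""
    k = math.sqrt(2.0 / math.pi)  # ~0.7978845608028654
    return 0.5 * x * (1.0 + math.tanh(k * (x + 0.044715 * x ** 3)))


# Largest absolute gap between the two on a grid over [-6, 6].
grid = [i / 100.0 for i in range(-600, 601)]
worst = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in grid)
print(f"max |exact - tanh| on [-6, 6]: {worst:.1e}")
```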
jjsjann123	e429a68478	Allow single node fusion for nvfuser (#70000 ) Summary: Setting `PYTORCH_NVFUSER_ONE_OP_FUSION=1` makes nvFuser take every node it supports, instead of waiting for a fusion opportunity. Pull Request resolved: https://github.com/pytorch/pytorch/pull/70000 Reviewed By: samdow Differential Revision: D33292195 Pulled By: davidberard98 fbshipit-source-id: 8ed5ce5e82fbb6737e8ab5ce4223b038eaf47756	2021-12-23 17:07:57 -08:00
jiej	78f06e0690	fixing conv2d decomposition and tests (#70127 ) Summary: The current implementation has a bug where the `add_optional` decomposed from `conv2d` is placed before its producer node; this causes a linter error on the graph. Cherry-picked from https://github.com/csarofeen/pytorch/pull/1333 Fixing issue posted in https://github.com/csarofeen/pytorch/issues/1325 Pull Request resolved: https://github.com/pytorch/pytorch/pull/70127 Reviewed By: ejguan Differential Revision: D33199018 Pulled By: jansel fbshipit-source-id: bce1f14a443811b4d55116a04fd4daa86084cc47	2021-12-19 10:38:23 -08:00
jiej	76d282d447	Nvfuser code bump 12 5 (#69964 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69964 Things added in this PR that require review: 1. cuLaunchCooperativeKernel driver API added in aten/src/ATen/cuda/detail/LazyNVRTC.cpp aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h. nvfuser code update: 1. perf tuning on the codegen scheduler that improves performance. 2. permutation support has been extended beyond contiguous/channels-last. (The improvements could be observed on the PW benchmark) Things reverted from local changes: 1. aten::gelu with approximation 2. local changes that are upstreamed in PR https://github.com/pytorch/pytorch/issues/68804 Pull Request resolved: https://github.com/pytorch/pytorch/pull/69428 Reviewed By: ngimel Differential Revision: D33073817 Pulled By: wconstab fbshipit-source-id: e77d32e81d037d7370822b040456fd4c3bd68edb	2021-12-16 08:28:54 -08:00
jjsjann123	fed9b90ed4	fixing removeProfilingNodes duplicated functions (#1282 ) (#68804 ) Summary: Unfortunately there are two versions of the removeProfilingNodes function, and one of them does not clean up profile_ivalue nodes properly. This leaves a dangling profile_ivalue node, which ends up being profiled multiple times and can give us false assert failures. Pull Request resolved: https://github.com/pytorch/pytorch/pull/68804 Reviewed By: mrshenli Differential Revision: D32980157 Pulled By: Krovatkin fbshipit-source-id: cd57c58a941d10ccd01a6cd37aac5c16256aaea6	2021-12-13 22:54:30 -08:00
jjsjann123	0dc3f829d9	Nvfuser code bump 11 5 (#67943 ) Summary: nvfuser code update: 1. Tuning heuristics on schedulers for reduction/normalization kernels; 2. bfloat16 on IO tensor support; 3. Refactored memory format support, now we can support dimension collapsing with non-coherent input tensors with different memory format. e.g. channels last tensor input to batch normalization. Note that we are currently limiting memory format to only Contiguous and Channels last; 4. Refactored nvfuser graph partitioning in `graph_fuser.cpp`, separated node merge and profile node API. Updated `profiling_record.cpp`. Things that are reverted from our local branch: 1. changes on some entries in autodiff 2. aten::gelu with approximation 3. native_dropout(_backward) Pull Request resolved: https://github.com/pytorch/pytorch/pull/67943 Reviewed By: ngimel Differential Revision: D32288709 Pulled By: dzhulgakov fbshipit-source-id: fc9491182ea7e0158bc112c66f096823c588eaf1	2021-11-17 01:22:17 -08:00
Jane Xu	09c7771e9c	Set test owners for jit tests (#66808 ) Summary: Action following https://github.com/pytorch/pytorch/issues/66232 Pull Request resolved: https://github.com/pytorch/pytorch/pull/66808 Reviewed By: mrshenli Differential Revision: D31761414 Pulled By: janeyx99 fbshipit-source-id: baf8c49ff9c4bcda7b0ea0f6aafd26380586e72d	2021-10-25 07:51:10 -07:00
jjsjann123	d609957c95	patching graph_for (#55139 ) Summary: Allows individual DifferentiableGraphOp to display optimized forward graph. This improves user visibility to graph mutation via optimization pass, especially fusion. Pull Request resolved: https://github.com/pytorch/pytorch/pull/55139 Reviewed By: albanD Differential Revision: D31330909 Pulled By: dzhulgakov fbshipit-source-id: c745b482fdc34876dc404cbe3bacd99dcf2ac724	2021-10-04 21:50:22 -07:00
jiej	127c9402d0	Revert "Revert D30752939: [pytorch][PR] nvfuser update" (#65137 ) Summary: This reverts commit `03389dc851`. Attempt again for PR: https://github.com/pytorch/pytorch/issues/63745 Fixes the windows build failure. Pull Request resolved: https://github.com/pytorch/pytorch/pull/65137 Reviewed By: seemethere, dzhulgakov, heitorschueroff Differential Revision: D30994556 Pulled By: malfet fbshipit-source-id: f1925b6c5cc1a1a441a96499667c91e8dfc1b53d	2021-09-22 04:54:51 -07:00
Eli Uriegas	03389dc851	Revert D30752939: [pytorch][PR] nvfuser update Test Plan: revert-hammer Differential Revision: D30752939 (`cfaecaf40b`) Original commit changeset: ce122e80f01b fbshipit-source-id: 57685df8f9946032a06eff1de8a3d1498500d2d2	2021-09-15 17:38:47 -07:00
jiej	cfaecaf40b	nvfuser update (#63745 ) Summary: Syncing nvfuser code base from devel branch. Listing a few of our developments since the last sync: - Extends support to normalization and reduction kernels. - Multiple kernel launch for single `CudaFusionGroup`. Hierarchical caching system has been updated to cache graph segmentation. - profile_ivalue is enabled to convert dynamic scalars into compile-time constants, which are required by the codegen (e.g. reduction axes). To keep this PR simple and relatively review-free, we stripped most external changes and submitted them as separate PRs, so this gigantic PR is easier to handle. Internal updates are files located in: 1. updates in nvfuser codegen `torch/csrc/jit/codegen/cuda` 2. added nvfuser specific benchmarks `benchmarks/cpp/nvfuser` 3. nvfuser jit cpp tests `test/cpp/jit/test_gpu.cpp` `test/cpp/jit/test_gpu_shift.cpp` `test/cpp/jit/test_gpu_validator.h` Updates affecting integration: 1. profile_ivalue enabled for nvfuser. Related changes are in `torch/csrc/jit/runtime/`, 2. exposed a few more symbols `aten/src/ATen/core/` used by codegen Pull Request resolved: https://github.com/pytorch/pytorch/pull/63745 Reviewed By: saketh-are Differential Revision: D30752939 Pulled By: malfet fbshipit-source-id: ce122e80f01bcd3865f5bd3c4dfde660665fd84c	2021-09-15 14:42:55 -07:00
Shen Li	1022443168	Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: revert-hammer Differential Revision: D30279364 (`b004307252`) Original commit changeset: c1ed77dfe43a fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e	2021-08-12 11:45:01 -07:00


74 Commits