pytorch/test/nn
hongxyan 637ab85e7f fix for launching kernel invalid config error when calling embedding with large index (#130994)

Fixes #130806
When an output size of 2147483648 (= 131072 * 16384) elements is expected, as in the issue above, the call threw the following error:
RuntimeError: HIP error: invalid configuration argument

What happened was that the second parameter passed to hipLaunchKernel was an invalid {2147483648,1,1}.
Two issues were found in Indexing.cu:

1: `ptrdiff_t` was used, but it is a signed integer type; `outTotalSize >= 2147483648` can overflow when computing [this](39493aa934/aten/src/ATen/native/cuda/Indexing.cu (L1367)).
2: On ROCm, the `std::min -> ::min` substitution did not behave as expected when `outTotalSize >= 2147483648`.

As a result, 2147483648 was passed to hipLaunchKernel as the number of threads per block, which the GPU cannot support. The original code intended to use 128 threads per block; that choice is debatable, since performance may not be ideal on the latest powerful GPUs (a possible TODO item for tuning), but at least it would not cause the `invalid configuration argument` error.

[Test]
Run the same code snippet as in the [issue](https://github.com/pytorch/pytorch/issues/130806) and print the output, its dim, and numel(), which now look like this:
```
output=tensor([[ 0.4044, -0.0244, -0.6865,  ..., -0.7800,  0.1175,  1.6726],
        [-1.0866, -0.1609,  0.3538,  ...,  1.9105,  0.7882,  1.1583],
        [-2.2079,  0.3736,  0.3610,  ..., -0.2658, -0.0459,  1.3077],
        ...,
        [ 0.8753, -0.7482, -0.1978,  ...,  0.9016,  1.1501, -0.5178],
        [-1.5845, -0.6277,  1.4520,  ...,  0.5733, -2.1198, -0.0915],
        [-0.6310, -1.0239, -0.1910,  ...,  0.4309,  0.1630,  0.3239]],
       device='cuda:0'), dim=2, numel=2147483648
```

A large-tensor unit test was also added:
```
/pytorch# pytest test/nn/test_embedding.py -k test_large_tensors
================================================================================== test session starts ===================================================================================
platform linux -- Python 3.9.19, pytest-7.3.2, pluggy-1.4.0
rootdir: /dockerx/development/pytorch
configfile: pytest.ini
plugins: flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, cpp-2.3.0, hypothesis-5.35.1
collected 288 items / 287 deselected / 1 selected
Running 1 items in this shard

test/nn/test_embedding.py .                                                                                                                                                        [100%]

=========================================================================== 1 passed, 287 deselected in 3.16s ============================================================================
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130994
Approved by: https://github.com/jeffdaily, https://github.com/xw285cornell
2024-07-20 08:33:29 +00:00
test_convolution.py Dont mutate tensor stride in place in cudnn conv (#126786) 2024-05-22 01:53:44 +00:00
test_dropout.py Enable UFMT on all of test/nn (#123809) 2024-04-12 18:32:25 +00:00
test_embedding.py fix for launching kernel invalid config error when calling embedding … (#130994) 2024-07-20 08:33:29 +00:00
test_init.py [BE] wrap deprecated function/class with typing_extensions.deprecated (#127689) 2024-06-02 12:30:43 +00:00
test_lazy_modules.py Fix wrong ufmt exclusions in .lintrunner.toml (#124135) 2024-04-17 12:22:50 +00:00
test_load_state_dict.py Revert "Make nn.Module state_dict load_state_dict pre-hook and state_dict post hook public (#126704)" 2024-06-11 17:45:20 +00:00
test_module_hooks.py Revert "Make nn.Module state_dict load_state_dict pre-hook and state_dict post hook public (#126704)" 2024-06-11 17:45:20 +00:00
test_multihead_attention.py Enable UFMT on all of test/nn (#123809) 2024-04-12 18:32:25 +00:00
test_packed_sequence.py Enable UFMT on all of test/nn (#123809) 2024-04-12 18:32:25 +00:00
test_parametrization.py [parametrization] fix requires_grad propagation (#124888) 2024-04-26 10:19:31 +00:00
test_pooling.py Fix max_pool2d decomposition for empty list and integer limits (#129106) 2024-06-24 22:19:42 +00:00
test_pruning.py [BE]: Optimize min/max/sum comprehensions C419 (#123960) 2024-04-12 23:54:15 +00:00