pytorch/test/nn
hongxyan 637ab85e7f fix for launching kernel invalid config error when calling embedding with large index (#130994)

Fixes #130806
When an output size of 2147483648 (= 131072 * 16384) elements is expected, as in the issue above, the call threw the following error:
RuntimeError: HIP error: invalid configuration argument

What happened was that the second parameter passed to hipLaunchKernel was an invalid {2147483648,1,1}.
Two issues were found in Indexing.cu:

1: `ptrdiff_t` was used, but it is a signed integer type; `outTotalSize >= 2147483648` can overflow when computing [this](39493aa934/aten/src/ATen/native/cuda/Indexing.cu (L1367)).
2: On ROCm, the `std::min -> ::min` substitution did not behave as expected when `outTotalSize >= 2147483648`.

As a result, 2147483648 was passed to hipLaunchKernel as the number of threads per block, which the GPU cannot support. The original code intended to use 128 threads per block; that choice is debatable, since performance may not be ideal on the latest powerful GPUs (a possible TODO item for tuning), but at least it would not cause the `invalid configuration argument` error.

[Test]
Run the same code snippet as in the [issue](https://github.com/pytorch/pytorch/issues/130806) and print the output, its dim, and numel(), which now look like this:
```
output=tensor([[ 0.4044, -0.0244, -0.6865,  ..., -0.7800,  0.1175,  1.6726],
        [-1.0866, -0.1609,  0.3538,  ...,  1.9105,  0.7882,  1.1583],
        [-2.2079,  0.3736,  0.3610,  ..., -0.2658, -0.0459,  1.3077],
        ...,
        [ 0.8753, -0.7482, -0.1978,  ...,  0.9016,  1.1501, -0.5178],
        [-1.5845, -0.6277,  1.4520,  ...,  0.5733, -2.1198, -0.0915],
        [-0.6310, -1.0239, -0.1910,  ...,  0.4309,  0.1630,  0.3239]],
       device='cuda:0'), dim=2, numel=2147483648
```

A large-tensor unit test was also added:
```
/pytorch# pytest test/nn/test_embedding.py -k test_large_tensors
================================================================================== test session starts ===================================================================================
platform linux -- Python 3.9.19, pytest-7.3.2, pluggy-1.4.0
rootdir: /dockerx/development/pytorch
configfile: pytest.ini
plugins: flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, cpp-2.3.0, hypothesis-5.35.1
collected 288 items / 287 deselected / 1 selected
Running 1 items in this shard

test/nn/test_embedding.py .                                                                                                                                                        [100%]

=========================================================================== 1 passed, 287 deselected in 3.16s ============================================================================
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130994
Approved by: https://github.com/jeffdaily, https://github.com/xw285cornell
2024-07-20 08:33:29 +00:00
test_convolution.py Dont mutate tensor stride in place in cudnn conv (#126786) 2024-05-22 01:53:44 +00:00
test_dropout.py Enable UFMT on all of test/nn (#123809) 2024-04-12 18:32:25 +00:00
test_embedding.py fix for launching kernel invalid config error when calling embedding … (#130994) 2024-07-20 08:33:29 +00:00
test_init.py [BE] wrap deprecated function/class with typing_extensions.deprecated (#127689) 2024-06-02 12:30:43 +00:00
test_lazy_modules.py Fix wrong ufmt exclusions in .lintrunner.toml (#124135) 2024-04-17 12:22:50 +00:00
test_load_state_dict.py Revert "Make nn.Module state_dict load_state_dict pre-hook and state_dict post hook public (#126704)" 2024-06-11 17:45:20 +00:00
test_module_hooks.py Revert "Make nn.Module state_dict load_state_dict pre-hook and state_dict post hook public (#126704)" 2024-06-11 17:45:20 +00:00
test_multihead_attention.py Enable UFMT on all of test/nn (#123809) 2024-04-12 18:32:25 +00:00
test_packed_sequence.py Enable UFMT on all of test/nn (#123809) 2024-04-12 18:32:25 +00:00
test_parametrization.py [parametrization] fix requires_grad propagation (#124888) 2024-04-26 10:19:31 +00:00
test_pooling.py Fix max_pool2d decomposition for empty list and integer limits (#129106) 2024-06-24 22:19:42 +00:00
test_pruning.py [BE]: Optimize min/max/sum comprehensions C419 (#123960) 2024-04-12 23:54:15 +00:00