Commit Graph

98 Commits

Author SHA1 Message Date
Tom Ritchford
d8c8ba2440 Fix unused Python variables in test/[e-z]* (#136964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136964
Approved by: https://github.com/justinchuby, https://github.com/albanD
2024-12-18 23:02:30 +00:00
zeshengzong
cb71bcc542 Replace clone.detach with detach.clone (#140264)
Fixes #64532

As stated in the issue, replace `clone.detach` with `detach.clone`
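Both orderings produce an equivalent detached copy; `detach().clone()` simply avoids recording the clone in the autograd graph, since the clone then operates on a non-tracking tensor. A minimal illustration:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# detach().clone(): detach first, then copy -- the clone is never
# recorded by autograd, so no redundant graph node is created.
y = x.detach().clone()
assert not y.requires_grad

# The copy shares no storage with x: writing to it leaves x untouched.
y[0] = 42.0
assert x[0].item() == 1.0
```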

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140264
Approved by: https://github.com/soulitzer
2024-11-13 07:01:02 +00:00
Jeff Daily
046f02d2de [ROCm] index_put performance improvement (#138259)
On ROCm, using a non-vectorized index_put kernel provides a ~2x perf improvement over the hipified CUDA kernel. None of the existing unit tests exercised the large-index case, so a new unit test was added.

It was also noted that the scale value in the original kernel was hard-coded to 1.0, making it a no-op, so it was removed from the simplified ROCm kernel.
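For reference, a small-scale sketch of the operation being optimized (illustrative only; the new unit test exercises the large-index case): `index_put_` with `accumulate=True` sums every value that lands on the same index.

```python
import torch

a = torch.zeros(5)
idx = torch.tensor([0, 0, 2])  # note the repeated index
vals = torch.ones(3)

# accumulate=True sums values landing on the same index
a.index_put_((idx,), vals, accumulate=True)
assert a.tolist() == [2.0, 0.0, 1.0, 0.0, 0.0]
```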

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138259
Approved by: https://github.com/xw285cornell, https://github.com/leitian, https://github.com/eqy
2024-10-22 15:21:43 +00:00
Xuehai Pan
4226ed1585 [BE] Format uncategorized Python files with ruff format (#132576)
Remove patterns `**`, `test/**`, and `torch/**` in `tools/linter/adapters/pyfmt_linter.py` and run `lintrunner`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132576
Approved by: https://github.com/ezyang, https://github.com/Skylion007
ghstack dependencies: #132574
2024-08-04 17:13:31 +00:00
Xuehai Pan
ba48cf6535 [BE][Easy][6/19] enforce style for empty lines in import segments in test/ (#129757)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129757
Approved by: https://github.com/ezyang
2024-07-17 06:42:37 +00:00
Antoni Vros
78e40b271b Change index_put on GPU to accept FP8 inputs (#128758)
As the title says, this PR changes the dispatcher for the CUDA index_put_ kernel to accept FP8 inputs. This is useful for Transformers models where the KV cache is FP8 and has been pre-allocated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128758
Approved by: https://github.com/eqy, https://github.com/drisspg
2024-06-25 00:38:03 +00:00
William Wen
5359af0c7e [dynamo] wrap GraphModule exceptions in dynamo-wrapped tests (#126341)
Better approach to https://github.com/pytorch/pytorch/pull/126197 to catch issues like https://github.com/pytorch/pytorch/issues/125568.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126341
Approved by: https://github.com/anijain2305, https://github.com/jansel
2024-05-29 05:18:04 +00:00
Xuehai Pan
26f4f10ac8 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
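For context, `usort` reads its known-category lists from a `[tool.usort.known]` table, so the misspelled table name (`kown`) was silently ignored. An illustrative snippet of the shape of that section (not the verbatim repo config):

```toml
[tool.usort.known]
# first-party modules get sorted into their own import segment
first_party = ["torch"]
```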

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
2024-05-27 14:49:57 +00:00
PyTorch MergeBot
55c0ab2887 Revert "[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)"
This reverts commit 7763c83af6.

Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))
2024-05-27 09:22:08 +00:00
Xuehai Pan
7763c83af6 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
2024-05-27 04:22:18 +00:00
Jianping Wu
c281d3a0cb Enable UFMT on test_indexing&test_view_ops (#125112)
Part of https://github.com/pytorch/pytorch/issues/123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125112
Approved by: https://github.com/ezyang
2024-05-01 23:44:53 +00:00
chilli
392dc45597 Made FlexAttention rewrite getitem calls to use aten.index in score_mod (#124799)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124799
Approved by: https://github.com/drisspg
ghstack dependencies: #124444
2024-04-26 17:22:13 +00:00
Catherine Lee
025387f4dd [ez][CI] Reduce CI_SERIAL_LIST pt2 (#124298)
#124085

Add @serialTest() to some tests

slow gradcheck already runs serially

Doing this slowly so it's easier to check for flaky issues that might arise

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124298
Approved by: https://github.com/kit1980
2024-04-18 00:13:36 +00:00
laith sakka
8455447972 Support builtin callable with object arguments in dynamo (#118678)
Fix issue #117556
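The builtin being supported here has straightforward semantics; a plain-Python refresher of what dynamo must now model for arbitrary objects:

```python
class WithCall:
    def __call__(self):
        return "called"

class WithoutCall:
    pass

assert callable(WithCall())         # instances with __call__ are callable
assert not callable(WithoutCall())  # plain instances are not
assert callable(WithoutCall)        # classes themselves always are
```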

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118678
Approved by: https://github.com/anijain2305
2024-01-31 17:54:08 +00:00
Aaron Gokaslan
6de28e92d2 [BE]: Apply FURB118 (prev): replaces unnecessary lambdas with operator. (#116027)
This replaces a bunch of unnecessary lambdas with the operator package. This is semantically equivalent, but the operator package is faster, and arguably more readable. When the FURB rules are taken out of preview, I will enable it as a ruff check.
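A sketch of the kind of rewrite FURB118 performs (illustrative, not taken from this PR's diff):

```python
import functools
import operator

pairs = [("a", 3), ("b", 1), ("c", 2)]

# Before: key=lambda p: p[1]
# After (FURB118): operator.itemgetter avoids a Python-level lambda call
by_value = sorted(pairs, key=operator.itemgetter(1))
assert by_value == [("b", 1), ("c", 2), ("a", 3)]

# Same idea for binary operations, e.g. reducing with operator.add
assert functools.reduce(operator.add, [1, 2, 3]) == 6
```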

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116027
Approved by: https://github.com/malfet
2023-12-20 19:35:08 +00:00
Nikita Shulga
16e539e0e6 Fix index range check (#116062)
Fixes an incorrect range check when index is `std::numeric_limits<int64_t>::min()`: the result of unary minus for such a value is undefined, but in practice is equal to the value itself, see https://godbolt.org/z/Wxhh44ocr

The lower-bound check was `size >= -index`, which is incorrect if `index` is `INT64_MIN`. It is replaced with a check based on `-1 - index`, which for every `int64_t` value yields a result that also fits into the `int64_t` range. `-(index + 1)` is more readable and compiles to identical optimized assembly, see https://godbolt.org/z/3vcnMYf9a , but its intermediate result for `INT64_MAX` falls outside the `int64_t` range, which leads to problems similar to those with `INT64_MIN` in the original example.
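Python integers are arbitrary-precision, so the range argument can be checked directly:

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1

# The old check computed -index: for INT64_MIN the true value (2**63) does
# not fit in int64_t, so the C++ negation is undefined behaviour (in
# practice it wraps back to INT64_MIN, defeating the check).
assert -INT64_MIN == INT64_MAX + 1  # one past the representable maximum

# The new check uses -1 - index, which stays representable for every
# int64 input; for INT64_MIN it yields exactly INT64_MAX.
assert -1 - INT64_MIN == INT64_MAX
assert all(INT64_MIN <= -1 - i <= INT64_MAX
           for i in (INT64_MIN, -1, 0, INT64_MAX))
```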

Added regression test.

Fixes https://github.com/pytorch/pytorch/issues/115415

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116062
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-12-20 15:40:57 +00:00
PyTorch MergeBot
24af118e55 Revert "markDynamoStrictTest more tests (#115871)"
This reverts commit 478f0e96dc.

Reverted https://github.com/pytorch/pytorch/pull/115871 on behalf of https://github.com/jeanschmidt due to Breaking internal tests and builds, please check diff, this is required to revert #115870 ([comment](https://github.com/pytorch/pytorch/pull/115871#issuecomment-1862992931))
2023-12-19 15:36:27 +00:00
rzou
478f0e96dc markDynamoStrictTest more tests (#115871)
For:
test_dispatch.py
test_fake_tensor.py
test_indexing.py
test_linalg.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115871
Approved by: https://github.com/voznesenskym
ghstack dependencies: #115845, #115855, #115856, #115857, #115858, #115870
2023-12-15 05:26:54 +00:00
Kurt Mohler
5292a92e03 Add torch.unravel_index (#110580)
Fixes #35674
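`torch.unravel_index` follows the semantics of NumPy's `np.unravel_index`, which converts flat indices into per-dimension coordinates:

```python
import numpy as np

# flat index 6 in a 2x4 array sits at row 1, column 2
assert np.unravel_index(6, (2, 4)) == (1, 2)

# and for an array of flat indices
rows, cols = np.unravel_index(np.array([0, 5, 7]), (2, 4))
assert rows.tolist() == [0, 1, 1]
assert cols.tolist() == [0, 1, 3]
```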

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110580
Approved by: https://github.com/lezcano, https://github.com/kulinseth
2023-10-12 00:55:51 +00:00
slc
2d4b1ae434 [Fix Bug] Cannot assign index like x[[1,2], :] = 2 when torch.use_deterministic_algorithms(True) to main (#105833)
Fixes https://github.com/pytorch/pytorch/issues/105819 and fix https://github.com/pytorch/pytorch/issues/96724
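The failing pattern, which now works with deterministic algorithms enabled (a small CPU sketch):

```python
import torch

torch.use_deterministic_algorithms(True)
try:
    x = torch.zeros(4, 3)
    # list index plus a full slice -- the assignment that used to raise
    x[[1, 2], :] = 2
    assert x.sum().item() == 12.0
finally:
    torch.use_deterministic_algorithms(False)  # restore the default
```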

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105833
Approved by: https://github.com/kurtamohler, https://github.com/janeyx99
2023-08-07 17:00:19 +00:00
Nikita Shulga
f0832914ee [Dynamo] Fix lineinfo generation on PY3.11+ (#103525)
- Replace `for inst in instructions[0:target.offset//2]: inst.starts_line = None` with a loop that iterates over all instructions until the `inst.offset == target.offset` condition is met, making the logic uniform across Python bytecode dialects (Python 3.11+ bytecode size is variable, while it is fixed for older Pythons)
- Speed up the target_index search by replacing `[i for i in instructions if i.offset == offset][0]` with `next(i for i in instructions if i.offset == offset)`, which stops evaluating as soon as the condition is met for the first time, according to:
  ```python
  In [1]: lst = list(range(10000))

  In [2]: %time [i for i in lst if i == 10]
  CPU times: user 144 µs, sys: 23 µs, total: 167 µs
  Wall time: 168 µs
  Out[2]: [10]

  In [3]: %time next(i for i in lst if i == 10)
  CPU times: user 6 µs, sys: 0 ns, total: 6 µs
  Wall time: 9.06 µs
  Out[3]: 10
  ```
- Fix small typo
- use `is_py311_plus` variable rather than checking `sys.version_info`

<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 6cd7f27</samp>

> _We fix the typos in our code of doom_
> _We remove the warnings that obscure our vision_
> _We refactor the `generate` function for the dynamo_
> _We resume the execution with precision_

Fixes https://github.com/pytorch/pytorch/issues/103355

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103525
Approved by: https://github.com/Skylion007, https://github.com/williamwen42
2023-06-14 05:41:43 +00:00
Nikita Shulga
4cfa06f706 [BE] Deprecate has_XYZ attributes (#103279)
Use [`__getattr__`](https://peps.python.org/pep-0562/) to raise a warning when one tries to access `has_XYZ` attributes, and recommend the appropriate `torch.backends.XYZ` methods instead

Make respective properties in `torch._C` private (by prefixing them with underscore), to exclude from `from torch._C import *`.

Added `warnings.simplefilter` to work around a Python 3.11 torch.compile lineinfo issue.

Fixes https://github.com/pytorch/pytorch/issues/102484

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103279
Approved by: https://github.com/janeyx99, https://github.com/Skylion007
2023-06-10 05:17:17 +00:00
Yanbo Liang
075d36d37f [Dynamo] Fix nested function resume execution (#100426)
Fixes #99665

Let me explain the root cause using the unit test I added:
* This bug is triggered when:
  * ```wrapped``` is a nested function.
  * ```wrapped``` is in another module which is different from the main function ```fn```.
  * There is a graph break inside of ```wrapped```.
* The root cause: when resuming a nested function, we were actually using the outermost function's (```fn``` in my example) global variables, but ```wrapped``` calls ```inner_func```, which is not part of ```fn```'s globals, so we have to set the correct globals when the nested function resumes execution.
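The CPython rule at the heart of the fix, shown standalone with toy dicts standing in for modules: a function always resolves globals through its defining module, never through its caller's.

```python
# Two "modules" simulated as globals dicts.
mod_a = {"helper": lambda: "from A"}
mod_b = {"helper": lambda: "from B"}

src = "def wrapped():\n    return helper()\n"
exec(src, mod_a)  # define wrapped inside "module A"

wrapped = mod_a["wrapped"]
# Even when called from elsewhere, wrapped resolves names in the
# globals of its defining module, not the caller's.
assert wrapped() == "from A"
assert wrapped.__globals__ is mod_a
```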

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100426
Approved by: https://github.com/jansel
2023-05-11 03:10:23 +00:00
PyTorch MergeBot
4b8127b90e Revert "[Dynamo] Fix nested function resume execution (#100426)"
This reverts commit d719f0276d.

Reverted https://github.com/pytorch/pytorch/pull/100426 on behalf of https://github.com/jeanschmidt due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/100426#issuecomment-1540915913))
2023-05-09 21:32:13 +00:00
Yanbo Liang
d719f0276d [Dynamo] Fix nested function resume execution (#100426)
Fixes #99665

Let me explain the root cause using the unit test I added:
* This bug is triggered when:
  * ```wrapped``` is a nested function.
  * ```wrapped``` is in another module which is different from the main function ```fn```.
  * There is a graph break inside of ```wrapped```.
* The root cause: when resuming a nested function, we were actually using the outermost function's (```fn``` in my example) global variables, but ```wrapped``` calls ```inner_func```, which is not part of ```fn```'s globals, so we have to set the correct globals when the nested function resumes execution.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100426
Approved by: https://github.com/jansel
2023-05-06 05:04:50 +00:00
Edward Z. Yang
77dae43767 Don't truncate leading 1s if they are unbacked (#95141)
This prevents us from guarding on leading unbacked SymInts.

The previous attempt at https://github.com/pytorch/pytorch/pull/94521 I got the logic a bit wrong. My idea there was to avoid slicing when the values to be set have low enough dimensionality that they definitely aren't too long. To do this, I need to compute the difference between the data to be set, and the post-slice space for the values. But I incorrectly compared against the *pre-slice* space in the original PR. Another version of this PR which is wrong is to compare against variableIndices.size(); but remember that in advanced indexing with tensors/lists, each of the individual indices specify what coordinates to read out of each dimension! A third incorrect attempt tested `variableIndices[0].dim()`, which is only correct if you don't broadcast one of the later variable indices, and if there are enough variableIndices to cover all dims. This is all quite complicated, so I went for a simpler solution of checking if the leading dim had a hint before testing if it is not equal to one.

BTW, there previously was no test for this ones-stripping behavior. There is now a test, based off the real code that caused the problem.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95141
Approved by: https://github.com/ngimel
2023-02-21 00:22:24 +00:00
Nikita Shulga
d5d55363d9 Add broadcastable check to index_put (#94849)
Copy-n-paste it from
989299802c/aten/src/ATen/native/TensorAdvancedIndexing.cpp (L582-L583)

Which is used for both CPU and CUDA checks, unless op is called for GPU with `deterministicAlgorithms()` set to true

Followup: do the same for XLA and fix the case when indices are not null

Fixes https://github.com/pytorch/pytorch/issues/94667

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94849
Approved by: https://github.com/ngimel
2023-02-17 20:37:23 +00:00
Huy Do
21dd311077 Add a mode to rerun all disabled tests (without running anything else) (#88646)
Rerun all disabled tests to gather their latest results so that we can close disabled-test tickets automatically. When running under this mode (RERUN_DISABLED_TESTS=true), only disabled tests are run while the rest are skipped: `<skipped message="Test is enabled but --rerun-disabled-tests verification mode is set, so only disabled tests are run" type="skip"/>`

The logic is roughly as follows; each disabled test runs multiple times (n=50):

* If the disabled test passes some retries but fails others, it is still flaky, so do nothing. In the test report, we'll see the test pass with the following skipped message:
```
<testcase classname="TestMultiprocessing" file="test_multiprocessing.py" line="357" name="test_fs" time="0.000" timestamp="0001-01-01T00:00:00">
    <skipped message="{&quot;flaky&quot;: True, &quot;num_red&quot;: 4, &quot;num_green&quot;: 0, &quot;max_num_retries&quot;: 3, &quot;rerun_disabled_test&quot;: true}" type="skip"/>
</testcase>
```

* If the disabled test passes every single time, and it is not flaky anymore, mark it so that it can be closed later.  We will see the test runs and passes, i.e.
```
<testcase classname="TestCommonCUDA" name="test_out_warning_linalg_lu_factor_cuda" time="0.170" file="test_ops.py" />
```

* If the disabled test fails after all retries, this is also expected, so only report it but don't fail the job (we don't care about red signals here). We'll see the test skipped (without the `flaky` field), i.e.
```
<testcase classname="TestMultiprocessing" file="test_multiprocessing.py" line="357" name="test_fs" time="0.000" timestamp="0001-01-01T00:00:00">
    <skipped message="{&quot;num_red&quot;: 4, &quot;num_green&quot;: 0, &quot;max_num_retries&quot;: 3, &quot;rerun_disabled_test&quot;: true}" type="skip"/>
</testcase>
```

This runs at the same schedule as `mem_leak_check` (daily).  The change to update test stats, and (potentially) grouping on HUD will come in separated PRs.

### Testing

* pull https://github.com/pytorch/pytorch/actions/runs/3447434434
* trunk https://github.com/pytorch/pytorch/actions/runs/3447434928
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88646
Approved by: https://github.com/clee2000
2022-11-15 05:08:26 +00:00
Natalia Gimelshein
dc9c507d24 add nominal support for int32 indices in index/index_put ops (#86309)
Currently the index_select/index_add decompositions decompose to the `index` or `index_put` ops. The problem is that `index_select` and `index_add` accept int32 indices while `index` doesn't, which leads to an error in the meta function for those decompositions. This PR adds non-performant support for int32 indices to the `index` operations so the decompositions can go through.
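With this change, `index`-style advanced indexing accepts int32 index tensors as well (small CPU sketch):

```python
import torch

t = torch.arange(10.0)
idx32 = torch.tensor([1, 3, 8], dtype=torch.int32)

# int32 indices now take the same path as int64 (long) ones
assert t[idx32].tolist() == [1.0, 3.0, 8.0]
```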

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86309
Approved by: https://github.com/lezcano
2022-10-05 23:59:16 +00:00
Elias Ellison
f37069aac7 Re-enable fixed dynamo tests (#84969)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84969
Approved by: https://github.com/bdhirsh, https://github.com/ezyang
2022-09-16 15:36:52 +00:00
Elias Ellison
f701cb04fb Test Dynamo CI w Fake Tensors (#84282)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84282
Approved by: https://github.com/anijain2305
2022-09-01 00:15:05 +00:00
yuguo68
cd41c8f032 fix set item to scalar tensor missing gradient info
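The behavior being fixed, sketched: assigning a 0-d tensor that requires grad into another tensor should make the target participate in autograd.

```python
import torch

a = torch.zeros(3)
b = torch.ones((), requires_grad=True)  # 0-d (scalar) tensor

a[1] = b                # setitem with a scalar tensor
assert a.requires_grad  # gradient info is preserved
a.sum().backward()
assert b.grad.item() == 1.0
```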
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78246

Approved by: https://github.com/ngimel
2022-05-25 21:22:44 +00:00
Natalia Gimelshein
394b4d853c Fix deterministic indexing with non-contiguous tensor
Fixes #76176 (first case)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76220
Approved by: https://github.com/mruberry
2022-04-22 18:46:50 +00:00
Duncan Hill
0988dc481a [Codemod][Codemod deprecated unittest asserts] fbcode//caffe2/test (#71708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71708

In Python 3.2, a number of asserts were deprecated.

In Python 3.11, these asserts are deleted completely. The files in this change still use the deprecated asserts.

Switch over to the supported syntax for 3.2 onwards.

Test Plan: Tested on the internal test suite runner.

Reviewed By: ajtulloch

Differential Revision: D33503694

fbshipit-source-id: a150f296033260acf8365d77b837ce0679f57361
(cherry picked from commit abf60ed97409265222915d8265aaabedd625fd93)
2022-03-15 19:28:52 +00:00
vfdev-5
a2ab06514b Fixes CUDA vs CPU consistency for index_put_ when accumulating (part 2) (#67189)
Summary:
Description:
- Follow up PR to https://github.com/pytorch/pytorch/issues/66790 to fix the tests on functorch, https://github.com/pytorch/functorch/issues/195

In functorch, a null tensor is added to the list of indices for the batch dimension in C++, but I cannot find an equivalent of that in Python without using `torch.jit.script`. If a better solution can be suggested, I'd be happy to replace the current way of testing.

cc ngimel zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67189

Reviewed By: suo

Differential Revision: D31966686

Pulled By: ngimel

fbshipit-source-id: a14b9e5d77d9f43cd728d474e2976d84a87a6ff4
2021-11-08 17:56:43 -08:00
Can Balioglu
efdb17b984 Add meta support to tensor range factories (#67032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67032

This PR adds meta backend support to the `range`, `arange`, `linspace`, and `logspace` operators.

Note that the original PR (#66630) was reverted due to two failing unit tests in the Bionic CI. This revision includes a fix for those tests; otherwise its content is identical to the previous PR.

Original commit changeset: 2f9d8d1acbb0
ghstack-source-id: 142487306

Test Plan: Extended the existing tensor creation tests to assert meta backend support.

Reviewed By: zhaojuanmao

Differential Revision: D31834403

fbshipit-source-id: a489858a2a8a38a03234b14408e14d2b208a8d34
2021-11-05 15:36:29 -07:00
kshitij12345
885a8e53ba replace onlyOnCPUAndCUDA with onlyNativeDeviceTypes (#65201)
Summary:
Reference https://github.com/pytorch/pytorch/issues/53849

Replace `onlyOnCPUandCUDA` with `onlyNativeDeviceTypes` which includes `cpu, cuda and meta`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65201

Reviewed By: mrshenli

Differential Revision: D31299718

Pulled By: mruberry

fbshipit-source-id: 2d8356450c035d6a314209ab51b2c237583920fd
2021-11-01 09:22:34 -07:00
vfdev-5
28fac23409 Fixes CUDA vs CPU consistency for index_put_ when accumulating (#66790)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39227
Fixes https://github.com/pytorch/pytorch/issues/66495 (duplicate of 39227)

Description:
- Expands values for CUDA implementation
- Improved shapes checking for CUDA
- Improved error message for CUDA
- Added tests

cc zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66790

Reviewed By: mruberry

Differential Revision: D31843566

Pulled By: ngimel

fbshipit-source-id: c9e5d12a33e1067619c210174ba6e3cd66d5718b
2021-10-21 19:09:57 -07:00
Jane Xu
8a65047acc [skip ci] Set test owners for everything considered with module: tests (#66865)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66865

Reviewed By: anjali411

Differential Revision: D31771147

Pulled By: janeyx99

fbshipit-source-id: 8bebe5ac2098364ef1ee93b590abb5f4455b0f89
2021-10-20 09:37:03 -07:00
Kushashwa Ravi Shrimali
d37636901e [Doc] make_tensor to torch.testing module (#63925)
Summary:
This PR aims to add `make_tensor` to the `torch.testing` module in PyTorch docs.

TODOs:

* [x] Add examples

cc: pmeier mruberry brianjo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63925

Reviewed By: ngimel

Differential Revision: D30633487

Pulled By: mruberry

fbshipit-source-id: 8e5a1f880c6ece5925b4039fee8122bd739538af
2021-08-30 12:25:40 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
Eddie Yan
42d6543c7b [bc-breaking] Dispatch index_put with boolean mask argument to masked_fill (#61612)
Summary:
https://github.com/pytorch/pytorch/issues/57515

Based on ngimel's branch, with a few tweaks to determine when to copy value tensors to device memory, plus additional tests.
bc-breaking note: Previously, if in `x[index]=value` `value` was a 0-d tensor with device different from `x`'s device, it resulted in a RuntimeError. Now this case is handled by copying `value` to the correct device.
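The pattern now routed to `masked_fill` (a scalar assigned through a boolean mask), shown on CPU:

```python
import torch

x = torch.zeros(4)
mask = torch.tensor([True, False, True, False])

# boolean-mask assignment of a scalar -- the case dispatched to masked_fill
x[mask] = 1.0
assert x.tolist() == [1.0, 0.0, 1.0, 0.0]
```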

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61612

Reviewed By: mrshenli

Differential Revision: D29753491

Pulled By: ngimel

fbshipit-source-id: 3fba14f4c2b9b136b50af020f9c1eda88f7373b0
2021-07-19 22:53:14 -07:00
Natalia Gimelshein
61f946bba6 don't copy indices to the self device in dispatch_index (#59059)
Summary:
Let the index/index_put implementation in ATen take care of moving the indices to the correct device; don't make the Python wrapper do that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59059

Reviewed By: mruberry

Differential Revision: D28750562

Pulled By: ngimel

fbshipit-source-id: 2f2b5f875733898f1c0b30b544c89808f91e4a6f
2021-05-27 14:19:59 -07:00
Xiang Gao
3de86b951d Migrate thrust->cub for index put (#55693)
Summary:
64-bit indexing is not supported, because if `num_indices = 2^31`, then 4 long tensors of `num_indices` elements would take 64GB of RAM. I don't think anybody will be interested in running `index_put` with 64GB of GPU RAM.

Benchmark on CUDA 11.3 RTX3090:
```python
import torch
import itertools

def run50_sync(f):
    for _ in range(50):
        f()
    torch.cuda.synchronize()

run50_sync(lambda: torch.randperm(1000000, device='cuda'))

def benchmark(M, L):
    a = torch.randn(M, device='cuda')
    i1 = torch.randint(M, (L,), dtype=torch.long, device='cuda')
    v = torch.randn(L, device='cuda')

    torch.cuda.synchronize()

    %timeit run50_sync(lambda:a.index_put_((i1,), v, True))

for M, L in itertools.product((100, 100000, 10000000), repeat=2):
    print(M, L)
    benchmark(M, L)
```

Before
```
100 100
5.13 ms ± 91 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100 100000
30.2 ms ± 471 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
100 10000000
3.17 s ± 14.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
100000 100
5.19 ms ± 61.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000 100000
11.9 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000 10000000
712 ms ± 3.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
10000000 100
5.07 ms ± 66.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000 100000
12.1 ms ± 76.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000 10000000
627 ms ± 7.65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

After
```
100 100
3.75 ms ± 49.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100 100000
26.2 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
100 10000000
2.81 s ± 23.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
100000 100
3.85 ms ± 16.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000 100000
9.74 ms ± 40.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000 10000000
444 ms ± 1.86 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
10000000 100
3.85 ms ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000 100000
10.7 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000 10000000
396 ms ± 2.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55693

Reviewed By: albanD

Differential Revision: D27895967

Pulled By: ngimel

fbshipit-source-id: 0616ce33395ce46f1a4161dfd38940b8e54fedc2
2021-04-27 12:27:09 -07:00
Mike Ruberry
399b66c813 Ports logdet from method_tests() to op_db (#55743)
Summary:
Per title. Also updates some tensor construction helpers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55743

Reviewed By: ngimel

Differential Revision: D27702060

Pulled By: mruberry

fbshipit-source-id: f64b7bee855733ad1f4fd182819ceec5831d9878
2021-04-11 20:39:16 -07:00
Yukio Siraichi
93bf0ae6fc Remove legacy constructor calls from pytorch codebase. (#54142)
Summary:
Follow up from https://github.com/pytorch/pytorch/issues/53889
Related to https://github.com/pytorch/pytorch/issues/47112

Removing every occurrence of the legacy constructor call present in PyTorch at:
- _docs_
- _benchmarks_
- _test_
- _caffe2_
- _CONTRIBUTING.md_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54142

Reviewed By: ngimel

Differential Revision: D27699450

Pulled By: mruberry

fbshipit-source-id: 530aa3f5746cc8bc1407d5d51b2bbd8075e30546
2021-04-11 15:45:17 -07:00
kshitij12345
0527d14248 [numpy] Add torch.take_along_dim (#52833)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/38349

Wrapper around the existing `torch.gather` with broadcasting logic.

TODO:
* [x] Add Doc entry (see if phrasing can be improved)
* [x] Add OpInfo
* [x] Add test against numpy
* [x] Handle broadcasting behaviour and when dim is not given.
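`torch.take_along_dim` mirrors NumPy's `np.take_along_axis`; the canonical use is pairing it with `argsort` to gather per-row values:

```python
import numpy as np

a = np.array([[30, 10, 20],
              [ 5, 50, 40]])
idx = np.argsort(a, axis=1)

# gather along axis=1 using per-row indices, broadcasting as needed
sorted_rows = np.take_along_axis(a, idx, axis=1)
assert sorted_rows.tolist() == [[10, 20, 30], [5, 40, 50]]
```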

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52833

Reviewed By: malfet

Differential Revision: D27319038

Pulled By: mruberry

fbshipit-source-id: 00f307825f92c679d96e264997aa5509172f5ed1
2021-03-28 05:22:51 -07:00
Thomas J. Fan
dc070605f1 TST Replaces assertEqualIgnoreTypes with assertEqual in test_indexing (#53115)
Summary:
Related to https://github.com/pytorch/pytorch/issues/38095 and https://github.com/pytorch/pytorch/issues/50006

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53115

Reviewed By: mruberry

Differential Revision: D27086086

Pulled By: VitalyFedyunin

fbshipit-source-id: 7a6af6bcf3d7ce9ba96d47a24a40f451d00f0e67
2021-03-16 16:06:36 -07:00
anjali411
4a2aa0f5f1 index_put_ for complex tensors on CUDA (#51148)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51148

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26102025

Pulled By: anjali411

fbshipit-source-id: b1b6fd12fda03c4520a3c3200226edf352496188
2021-01-27 09:11:37 -08:00