Commit Graph

179 Commits

Author SHA1 Message Date
PyTorch MergeBot
a8f36dd646 Revert "add amp support for custom backend (#96188)"
This reverts commit cf12edee02.

Reverted https://github.com/pytorch/pytorch/pull/96188 on behalf of https://github.com/kit1980 due to Broke some linalg tests : https://github.com/pytorch/pytorch/actions/runs/4420037607/jobs/7750708339
2023-03-15 00:03:19 +00:00
shibo
cf12edee02 add amp support for custom backend (#96188)
Fixes #ISSUE_NUMBER
1. Add amp support for custom backends.
2. Optimize the file `backend_registration.py` and rename it to `custom_backend_registration.py`. Other functions for custom backends can then be registered there.
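A hedged sketch of what amp support looks like from the user side: an autocast context is entered for a given device type, and eligible ops run in the lower precision. "cpu" stands in here for a custom backend's device type, since no out-of-tree backend is available in this sketch.

```python
import torch

# Enter autocast for a device type; with this PR a custom backend's
# device type could be used here instead of "cpu" (an assumption).
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    a = torch.randn(4, 4)
    b = torch.randn(4, 4)
    c = torch.mm(a, b)  # matmul runs in the lower autocast precision
```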

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96188
Approved by: https://github.com/bdhirsh
2023-03-14 20:43:21 +00:00
soulitzer
d30db9a251 Replace non-reentrant checkpoint with a rewrite that can be nested and contain grad (#90105)
Changes:
- bc-breaking change: The main difference from the old non-reentrant implementation it replaces is that we clear recomputed tensors on backward immediately upon unpack, even if retain_graph=True. This has the following additional implications:
   - Accessing _saved_tensors multiple times will silently recompute forward multiple times.
   - Accessing `ctx.saved_tensors` twice in the same backward will now raise an error.
- To avoid dealing with the potential consequences, early stopping has been hidden behind a global flag that is by default False, and can be enabled via a context manager. We can remove this in a follow up. Some features of nesting as a result do not work by default.
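The non-reentrant checkpoint being rewritten here can be exercised as follows: activations inside the checkpointed function are dropped after forward and recomputed during backward. A minimal sketch:

```python
import torch
from torch.utils.checkpoint import checkpoint

def fn(x):
    # intermediate activations here are not kept; they are recomputed
    # when backward unpacks the saved tensors
    return torch.relu(x).pow(2).sum()

x = torch.randn(8, requires_grad=True)
out = checkpoint(fn, x, use_reentrant=False)  # non-reentrant impl
out.backward()
```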

Before land:
- import to check for more bc-breakingness
- implement any workarounds for the bc-breaking-ness, if we decide on any
- update docs to reflect new lifetime of recomputed variables
- update docs to mention the early stop feature

Follow ups:
- enable early-stopping by default
- update docs/tutorial to feature nested use cases

Related docs:
  - code comment: https://github.com/pytorch/pytorch/pull/90105/files#diff-9dcd955620b52ce128e18e3567be88edbb238810460d1288a86fabc20e483b30R448
  - design doc: https://docs.google.com/document/d/1UDLhTNv6_kvuDTRlsjfj9WdqtNaQNr8ahrvdBIB6914/edit#
  - retains_grad <> checkpoint https://docs.google.com/document/d/1maiGmuFUxysQL0AdYUU88kngAaXh_L0XpDcLDh_5Ors/edit

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90105
Approved by: https://github.com/albanD
2023-03-14 20:38:36 +00:00
Xuehai Pan
046e88a291 [BE] [3/3] Rewrite super() calls in test (#94592)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Cases where the rewrite would change the semantics are kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592
Approved by: https://github.com/ezyang, https://github.com/seemethere
2023-02-12 22:20:53 +00:00
Aaron Gokaslan
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: it removes the need to inherit from `object` and removes unused `__future__` imports.
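The two mechanical rewrites can be sketched as follows (class name is illustrative, not from the PR):

```python
# before: class Config(object): ...
# after — Python 3 classes inherit from object implicitly:
class Config:
    pass

# pyupgrade also deletes now-unused lines such as:
# from __future__ import print_function
```

Both rewrites are behavior-preserving on Python 3, which is why they are the safest place to start.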

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
albanD
0b2dc3b3ac [Py-3.11] Skip dynamo related tests (#94187)
The quantization test fails to import Dynamo as expected.
The traceback tool looks a lot trickier; opened https://github.com/pytorch/pytorch/issues/94189 to investigate further.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94187
Approved by: https://github.com/malfet
2023-02-07 16:40:55 +00:00
Edward Z. Yang
8b00c54425 Add utility report_compile_source_on_error (#91069)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91069
Approved by: https://github.com/soumith, https://github.com/albanD
2023-01-11 22:54:46 +00:00
Edward Z. Yang
333540a458 Reland "Add torch.utils.device_mode" (#91796)
Original PR https://github.com/pytorch/pytorch/pull/91525

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91796
Approved by: https://github.com/albanD
2023-01-09 20:57:12 +00:00
PyTorch MergeBot
9b415240d4 Revert "Reland "Add torch.utils.device_mode" (#91796)"
This reverts commit 81b5eff3c3.

Reverted https://github.com/pytorch/pytorch/pull/91796 on behalf of https://github.com/huydhn due to This breaks trunk with the following failed test https://hud.pytorch.org/failure/test_jit_save%2CTestTracer
2023-01-09 04:45:47 +00:00
Edward Z. Yang
81b5eff3c3 Reland "Add torch.utils.device_mode" (#91796)
Original PR https://github.com/pytorch/pytorch/pull/91525

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91796
Approved by: https://github.com/albanD
2023-01-08 03:44:56 +00:00
PyTorch MergeBot
f571ae4fdb Revert "Make torch.device usable as a context manager (#91525)"
This reverts commit 619d52a5d2.

Reverted https://github.com/pytorch/pytorch/pull/91525 on behalf of https://github.com/mehtanirav due to Internal breakages
2023-01-05 21:34:50 +00:00
Edward Z. Yang
619d52a5d2 Make torch.device usable as a context manager (#91525)
Fixes https://github.com/pytorch/pytorch/issues/82296
Fixes https://github.com/pytorch/pytorch/issues/27878
Fixes https://github.com/pytorch/pytorch/issues/260
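With this change, a `torch.device` works as a context manager that sets the default device for factory calls made inside the block:

```python
import torch

# factory functions called inside the block allocate on the given device
with torch.device("cpu"):
    t = torch.zeros(2, 3)
```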

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91525
Approved by: https://github.com/albanD
2023-01-04 01:32:00 +00:00
Huy Do
0417da2288 Set a timeout value when testing multiprocess DataLoader (#91476)
Set a timeout value when testing the multiprocess DataLoader to prevent ASAN jobs from timing out after 4 hours.

We are seeing multiple timeout issues running ASAN tests on HUD; see https://hud.pytorch.org/hud/pytorch/pytorch/master/1?per_page=50&name_filter=asan for examples:

* Without mem leak check enabled https://github.com/pytorch/pytorch/actions/runs/3794216079/jobs/6455118197
* With mem leak check https://github.com/pytorch/pytorch/actions/runs/3792743994/jobs/6449356306

Looking a bit closer into the test, the hanging happens when the multiprocess DataLoader is used in `test_utils`.  Here is a snapshot of those processes when I logged into the hung runner:

```
UID        PID  PPID  C STIME TTY          TIME CMD
jenkins      1     0  0 Dec28 pts/0    00:00:00 bash
jenkins      8     0  0 Dec28 pts/1    00:00:00 sh -c pip install dist/torch-2.0.0a0+git97db9fd-cp37-cp37m-linux_x86_64.whl[opt-einsum] && .jenkins/pytorch/test.sh
jenkins     20     8  0 Dec28 pts/1    00:00:00 /bin/bash .jenkins/pytorch/test.sh
jenkins    764    20  0 Dec28 pts/1    00:00:07 python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --shard 5 5 --verbose
jenkins    788   764  0 Dec28 pts/1    00:00:00 /opt/conda/bin/python -c from multiprocessing.semaphore_tracker import main;main(6)
jenkins   3743   764  0 Dec28 pts/1    00:00:05 /opt/conda/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=7, pipe_handle=11) --multiprocessing-fork
jenkins   3766  3743  0 Dec28 pts/1    00:00:06 /opt/conda/bin/python -bb test_utils.py -v --import-slow-tests --import-disabled-tests
jenkins   3878  3766  0 Dec28 pts/1    00:00:06 /opt/conda/bin/python -bb test_utils.py -v --import-slow-tests --import-disabled-tests
jenkins   3879  3766  0 Dec28 pts/1    00:00:00 /opt/conda/bin/python -bb test_utils.py -v --import-slow-tests --import-disabled-tests
jenkins   3880  3766  0 Dec28 pts/1    00:00:00 /opt/conda/bin/python -bb test_utils.py -v --import-slow-tests --import-disabled-tests
jenkins   3881  3766  0 Dec28 pts/1    00:00:00 /opt/conda/bin/python -bb test_utils.py -v --import-slow-tests --import-disabled-tests
jenkins   3893     0  0 01:45 pts/2    00:00:00 /bin/bash
jenkins   3904  3893  0 01:46 pts/2    00:00:00 ps -ef
```

The specific hanging test was `test_random_seed`, which spawned 4 subprocesses to load data.  After I killed one of them, the test could continue and printed the following stacktrace:

```
    test_random_seed (__main__.TestDataLoaderUtils) ... [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
  [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
  [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
  [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
  [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
  [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
  [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
  [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
  ERROR (9345.840s)
    test_random_seed (__main__.TestDataLoaderUtils) ...     test_random_seed errored - num_retries_left: 3
  Traceback (most recent call last):
    File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1134, in _try_get_data
      data = self._data_queue.get(timeout=timeout)
    File "/opt/conda/lib/python3.7/multiprocessing/queues.py", line 104, in get
      if not self._poll(timeout):
    File "/opt/conda/lib/python3.7/multiprocessing/connection.py", line 257, in poll
      return self._poll(timeout)
    File "/opt/conda/lib/python3.7/multiprocessing/connection.py", line 414, in _poll
      r = wait([self], timeout)
    File "/opt/conda/lib/python3.7/multiprocessing/connection.py", line 921, in wait
      ready = selector.select(timeout)
    File "/opt/conda/lib/python3.7/selectors.py", line 415, in select
      fd_event_list = self._selector.poll(timeout)
    File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
      _error_if_any_worker_fails()
  RuntimeError: DataLoader worker (pid 3878) is killed by signal: Terminated.
  The above exception was the direct cause of the following exception:
  Traceback (most recent call last):
    File "test_utils.py", line 469, in test_random_seed
      x2 = run()
    File "test_utils.py", line 464, in run
      return next(iter(dataloader))
    File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 635, in __next__
      data = self._next_data()
    File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1330, in _next_data
      idx, data = self._get_data()
    File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1296, in _get_data
      success, data = self._try_get_data()
    File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1147, in _try_get_data
      raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
  RuntimeError: DataLoader worker (pid(s) 3878) exited unexpectedly
  [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
  [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
  [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
  [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
  [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
  [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
  [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
  [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
  ok (0.137s)
```

This doesn't fix the underlying issue, which I'll need to follow up on to see why the workers hang.  However, it should allow the test to terminate gracefully and report errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91476
Approved by: https://github.com/kit1980
2022-12-29 17:50:37 +00:00
mikey dagitses
3a1bdfee67 skip environment collection test in fbcode (#88744)
Summary: This runs pip, which we don't have in the fbcode environment.

Test Plan: Rely on CI.

Differential Revision: D41156589

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88744
Approved by: https://github.com/zou3519
2022-11-09 18:20:04 +00:00
soulitzer
c18eead2df Update saved variable hooks to no longer trigger on wrapped numbers (#87316)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87316
Approved by: https://github.com/ezyang, https://github.com/albanD
2022-10-20 03:01:11 +00:00
Rohan Varma
7a411952fb CheckpointSequential support non-reentrant (#86331)
Closes https://github.com/pytorch/pytorch/issues/86328

Adds `use_reentrant` argument to `checkpoint_sequential`.
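A hedged sketch of the new argument: checkpoint a `Sequential` in segments while selecting the non-reentrant implementation (layer sizes are illustrative).

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(3, 4, requires_grad=True)
# split the Sequential into 2 checkpointed segments, non-reentrant impl
out = checkpoint_sequential(model, 2, x, use_reentrant=False)
out.sum().backward()
```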

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86331
Approved by: https://github.com/zhaojuanmao, https://github.com/albanD
2022-10-06 23:10:18 +00:00
Zain Rizvi
a1a95d402d Fix inheritance in TestDataLoaderUtil (#85018)
TestDataLoaderUtils needs to run its parent class's setUp method to actually disable flaky tests (see https://github.com/pytorch/pytorch/issues/70516#issuecomment-1247045072 for details)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85018
Approved by: https://github.com/clee2000, https://github.com/huydhn
2022-09-14 22:04:43 +00:00
soulitzer
b18962552e Fix and unskip cpp extension tests for ARM (#83115)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83115
Approved by: https://github.com/albanD
2022-08-11 20:01:53 +00:00
albanD
7dd795cbed Prevent ref cycle creation in inner hook (#82776)
Towards fixing https://github.com/pytorch/pytorch/issues/82482

This PR fixes two things:

## 1) memory leak
The `.detach()` call prevents a true memory leak in some cases where the user function uses multiple ops in a row that save their inputs. The following chain of objects keeps each other alive:
- the `storage` object
- a recomputed Tensor y
- y's grad_fn FooBackward (in c++)
- FooBackward's SavedVariables (in c++)
- SavedVariable Hook
- the `inner_pack` function
- captures `storage`

Since part of this cycle is in c++, the python gc is not able to break it.
Should THPCppFunction_traverse actually visit its SavedVariables, which in turn should visit their hooks? I think the answer is yes, but I haven't dived into which python object is traversing what; if there is non-unique ownership of the c++ object, it makes the traversal a lot trickier. @ezyang do you think we should dive into this more?

In this case, this can be easily solved anyways by storing `y.detach()` in the `storage` object as we don't care about the temporary backward graph that gets created during the second forward call.
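A minimal illustration of why `.detach()` breaks the cycle: the detached copy carries no `grad_fn`, so storing it does not keep the recomputed backward graph (and the SavedVariables hanging off it) alive.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2            # y.grad_fn references SavedVariables
stored = y.detach()  # same data, no reference back into the autograd graph
```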

## 2) Lifetime of the recomputed buffers
The new storage system is now such that the lifetime of the recomputed buffer is directly linked to the SavedVariable c++ object. Meaning that this buffer will get deleted IIF the SavedVariable is cleared.
This means that we now get the exact same behavior as the version without the saved variable hook where Tensors are saved directly on the SavedVariable object.

This is great as this solves all the cases where the non-checkpoint version used to work but the checkpoint version does not (even double access or retain_graph=True).

The one drawback of this approach, though, is that the buffers do NOT get cleared when the user passes in `retain_graph=True`! The next backward won't even re-run the forward, as it already has all the buffers available. Is this a problem that you think we would need to find a solution for @rohan-varma, or is it niche enough that we don't care for now?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82776
Approved by: https://github.com/ezyang, https://github.com/rohan-varma
2022-08-06 00:31:22 +00:00
albanD
2255911f8a Make M1 tests green (#82213)
This skips all the failing tests and adds a new master job to test on M1.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82213
Approved by: https://github.com/seemethere, https://github.com/soulitzer, https://github.com/malfet
2022-08-05 16:12:08 +00:00
PyTorch MergeBot
ec4be38ba9 Revert "To add hipify_torch as a submodule in pytorch/third_party (#74704)"
This reverts commit 93b0fec39d.

Reverted https://github.com/pytorch/pytorch/pull/74704 on behalf of https://github.com/malfet due to broke torchvision
2022-06-21 23:54:00 +00:00
Bhavya Medishetty
93b0fec39d To add hipify_torch as a submodule in pytorch/third_party (#74704)
`hipify_torch` as a submodule in `pytorch/third_party`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74704
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-06-21 18:56:49 +00:00
Kiarash Jamali
bc3c7a6cbd Fix issue with _checkpoint_without_reentrant
Fixes  #76737
I also added a test case for this bug.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76890
Approved by: https://github.com/albanD
2022-05-05 17:37:31 +00:00
Nikita Shulga
8473173c36 Remove breakpad dependency
This functionality does not seem to be used,
and there are some requests to update the dependency.

Add `third_party` to torch_cpu include directories if compiling with
Caffe2 support, as `caffe2/quantization/server/conv_dnnlowp_op.cc` depends on `third_party/fbgemm/src/RefImplementations.h`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394
Approved by: https://github.com/janeyx99, https://github.com/seemethere
2022-05-03 20:21:55 +00:00
PyTorch MergeBot
d79d9fa283 Revert "Remove breakpad dependency"
This reverts commit 9aa3c7fd83.

Reverted https://github.com/pytorch/pytorch/pull/75394 on behalf of https://github.com/malfet
2022-04-17 17:58:51 +00:00
Nikita Shulga
9aa3c7fd83 Remove breakpad dependency
This functionality does not seem to be used,
and there are some requests to update the dependency.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394
Approved by: https://github.com/janeyx99, https://github.com/seemethere
2022-04-17 17:43:45 +00:00
Nicolas Hug
d0387ad285 Move torchhub tests into separate test_hub.py file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74826

Approved by: https://github.com/vmoens
2022-03-30 10:06:14 +00:00
Nicolas Hug
7df0d9fda4 Call super().setUp() and super().tearDown() in torchhub tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74621

Approved by: https://github.com/vmoens, https://github.com/janeyx99, https://github.com/cpuhrsch
2022-03-25 14:36:31 +00:00
Jane Xu
a1e284d9c8 Remove high priority as an owner for tests (#74555)
Summary:
Following triage review discussion, it would be best for these tests not to be triaged as high priority by automation, but by the triagers in the oncall.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74555

Reviewed By: albanD

Differential Revision: D35099202

Pulled By: janeyx99

fbshipit-source-id: 657a0317141de3a598476a6f601ec26cc26231b1
(cherry picked from commit 057519cb2494d0f9a0b169f359ac87ba9e89f088)
2022-03-24 14:29:52 +00:00
Lood
670e4d9808 set_dir expanding "~"
Fixes #69761.

Small change to torch.hub.set_dir() (<10 LOC).

It seems that before the code was split into `set_dir()` and `_get_torch_home`, an [earlier version](5164622ba4/torch/hub.py (L111)) of hub.py had an os.path.expanduser check.

Currently, [_get_torch_home](https://github.com/pytorch/pytorch/blob/master/torch/hub.py#L104) retains the os.path.expanduser check, but `set_dir()` doesn't have one. This PR fixes that (I hope).
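The restored check amounts to expanding a leading "~" before using the path as the hub directory; a minimal illustration (the path here is just an example):

```python
import os.path

raw = "~/.cache/torch/hub"
expanded = os.path.expanduser(raw)  # "~" becomes the user's home dir
```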

(As I mentioned in the issue, I can't run the tests on my laptop yet because of storage space :/ But I did include a test.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69763
Approved by: https://github.com/malfet, https://github.com/NicolasHug
2022-03-23 20:38:14 +00:00
Nicolas Hug
08590b4159 Cosmetic changes to torchhub tests (#74431)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74431

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D35011898

Pulled By: NicolasHug

fbshipit-source-id: 37a42f843b0a3c781fa59254552a9b3af8678176
(cherry picked from commit aa4f83e126cb72cd846266af7ea77c70e2a9dc81)
2022-03-22 08:55:09 +00:00
Nicolas Hug
e0ecdb5cba Properly catch warning in torchhub tests (#74430)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74430

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D35011900

Pulled By: NicolasHug

fbshipit-source-id: 36753167d6ee737ee437d1cd7303e5cc8b5c286c
(cherry picked from commit d0fdf4af795bdf74c145260c82f976a53f1aaff5)
2022-03-22 08:55:09 +00:00
Nicolas Hug
bcc77c470b Cosmetic changes to torchhub tests (#74431)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74431

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D35011832

Pulled By: NicolasHug

fbshipit-source-id: f76f92cf92b236ac8a2e2947001d219d0a7d5f14
(cherry picked from commit 3e142f8da9479eab356b3f38ace321cc9fde9bfc)
2022-03-22 08:55:09 +00:00
Alban Desmaison
734281c3d6 Cleanup all module references in doc (#73983)
Summary:
Working towards https://docs.google.com/document/d/10yx2-4gs0gTMOimVS403MnoAWkqitS8TUHX73PN8EjE/edit?pli=1#

This PR:
- Ensure that all the submodules are listed in a rst file (that ensure they are considered by the coverage tool)
- Remove some long deprecated code that just error out on import
- Remove the allow list altogether to ensure nothing gets added back there

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73983

Reviewed By: anjali411

Differential Revision: D34787908

Pulled By: albanD

fbshipit-source-id: 163ce61e133b12b2f2e1cbe374f979e3d6858db7
(cherry picked from commit c9edfead7a01dc45bfc24eaf7220d2a84ab1f62e)
2022-03-10 22:26:29 +00:00
Nikita Shulga
bede18b061 Add support for C++ frontend wrapper on Linux (#69094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69094

Partially addresses https://github.com/pytorch/pytorch/issues/68768

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D32730079

Pulled By: malfet

fbshipit-source-id: 854e4215ff66e087bdf354fed7a17e87f2649c87
2021-12-02 16:47:00 -08:00
Michael Suo
5fd93fb5f8 broaden retries on TestHub (#67779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67779

Not all flaky failures from this test are URLErrors; I think we should
err on the side of being expansive with retries here.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D32145434

Pulled By: suo

fbshipit-source-id: 3c3274b2080681fcafb3ea6132e420605f65c429
2021-11-03 13:48:58 -07:00
Jane Xu
c19cda5782 [skip ci] Add test owners for a special hi-pri class of tests (#67553)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

This change does require some context: there were several suggestions regarding what to do about this group of tests: tests that are core and crucial to all of PyTorch and are too broad to be owned by one team.
1. Let's add a "module: core" and put people behind it! This idea sounds appealing unless you are one of the people backing the label. From talking to albanD among others, this idea of putting all these core tests on the shoulders of a few people or one team isn't super fair, and I have not yet found anyone willing to take on this job.
2. Taking advantage of the fact that we already have a triaging oncall that takes turns triaging issues, we can leave these tests essentially unlabeled and allow the oncall to triage these tests. Since these tests are crucial to PyTorch, we'll add the "high priority" label to mark them different from other unowned tests (see https://github.com/pytorch/pytorch/issues/67552).
3. I _could_ still create an unbacked label "module: core" and attribute these tests there, but I don't like the idea of creating a facade that the tests are "triaged" to a label when no one is actually taking a look.

Now we could potentially break these tests down into smaller files so that each piece _could_ be owned by a team, but 1. I don't know if this is currently feasible and 2. This approach does not prevent that from happening in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67553

Reviewed By: albanD

Differential Revision: D32025004

Pulled By: janeyx99

fbshipit-source-id: 1fb1aa4c27e305695ab6e80ae3d02f90519939c0
2021-10-29 12:17:21 -07:00
Jane Xu
68555339d7 test_utils.py: Add another retry to test_download_url_to_file (#66159)
Summary:
Fixes one of the flakiness concerns mentioned https://github.com/pytorch/pytorch/issues/65439#issuecomment-934686485

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66159

Reviewed By: ngimel

Differential Revision: D31406485

Pulled By: janeyx99

fbshipit-source-id: cf7834cdab58360ecef1748075d52969de2e0778
2021-10-05 16:26:20 -07:00
Nicolas Hug
0a3cf8886a Torchhub: More robust assumption regarding main or master branch (#64364)
Summary:
Closes https://github.com/pytorch/pytorch/issues/63753

This PR changes the assumption regarding the default branch of a repo to the following:

> If `main` exists then use `main`, otherwise use `master`.

This will make torchhub more robust w.r.t. the ongoing changes where repos use `main` instead of `master` as the development / default branch.
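The stated rule can be sketched as follows (the helper name is hypothetical, not torchhub's actual code):

```python
def pick_default_branch(branches):
    # prefer "main" if the repo has it, otherwise fall back to "master"
    return "main" if "main" in branches else "master"
```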

cc nairbv NicolasHug

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64364

Reviewed By: saketh-are

Differential Revision: D30731551

Pulled By: NicolasHug

fbshipit-source-id: 7232a30e956dcccca21933a29de5eddd711aa99b
2021-09-20 10:36:13 -07:00
Mike Ruberry
6596173811 Revert D30731191: [pytorch][PR] Torchhub: rewrite commit hash check to avoid using unnecessary GitHub API credits
Test Plan: revert-hammer

Differential Revision:
D30731191 (f9bf144a0c)

Original commit changeset: d1ee7c2ef259

fbshipit-source-id: 5c7207f66c5354ce7b9ac2594e4f5b8307619b0c
2021-09-17 14:33:00 -07:00
Nicolas Hug
f9bf144a0c Torchhub: rewrite commit hash check to avoid using unnecessary GitHub API credits (#64362)
Summary:
This PR adds more detailed error messages to torchhub if the commit hash validation goes wrong, providing suggestions to the users on how to resolve the issue.

It also documents why such validation is important.

EDIT: it also avoids validating some stuff when we know "stuff" isn't a commit, since there's no risk in this case

CC malfet mthrok

cc nairbv NicolasHug

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64362

Reviewed By: gchanan, malfet

Differential Revision: D30731191

Pulled By: NicolasHug

fbshipit-source-id: d1ee7c2ef2591dd7a5291977af1635ada2552d1b
2021-09-17 10:30:39 -07:00
Nicolas Hug
9157a2889f Pass GITHUB_TOKEN to linux CI jobs and avoid skipping torchhub tests (#64807)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64760

This should hopefully put the torchhub tests back.

This also avoids skipping the torchhub tests: currently the tests are skipped if they fail, which pretty much defeats the purpose of having a test in the first place since we're never notified when they do fail.

cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra nairbv NicolasHug

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64807

Reviewed By: seemethere

Differential Revision: D30994585

Pulled By: NicolasHug

fbshipit-source-id: 561782c22462b5cfec99cca153eb59623db5660a
2021-09-17 03:30:56 -07:00
driazati
bd8608cd5c Use CMake for breakpad (#63186)
Summary:
We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main) this fork now includes a CMake based build, so we can add breakpad as a proper dependency rather than relying on including it in Docker images as a system library, which is error prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux.

```python
import torch

# On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes
# On MacOS/Linux this writes crashes to /tmp/pytorch_crashes
torch.utils._crash_handler.enable_minidumps()

# Easy way to cause a segfault and trigger the handler
torch.bincount(input=torch.tensor([9223372036854775807]))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186

Reviewed By: malfet, seemethere

Differential Revision: D30318404

Pulled By: driazati

fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc
2021-08-19 10:42:01 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
driazati
45cc207a88 Fix breakpad build + add test canary (#60990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60990

This makes the breakpad build more explicit in its messaging and hints to cmake where to look for the library (it wasn't able to find it without `PATHS` on CI even though that works locally). This also adds a smoke test that will fail if breakpad isn't present on a CI job where it is expected (e.g. binary builds).

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29514316

Pulled By: driazati

fbshipit-source-id: 79514363334788f311ba5d4f25deed3452f0c3eb
2021-07-06 14:15:07 -07:00
johnlu
265f0e5321 Add device runtime API for the plug-in to register platform python module into torch (#59857)
Summary:
## Motivation
Allow an out-of-tree PyTorch plug-in, for a device type other than CUDA, to add its runtime interface to the `torch` module. The runtime interface of the device can then be referred to by the device type name in the `torch` module, i.e., `torch.cuda` or `torch.xpu`.

## Solution
- Add a registration interface for the plug-in to add its platform Python module into the `torch` module under the device type name. I.e., `torch.xpu` can be used to refer to the XPU runtime interface after the XPU runtime module is registered with `torch._register_device_module('xpu', xpu_module)` in Intel's XPU plug-in.

## Additional Context
More details about runtime has been discussed in https://github.com/pytorch/pytorch/issues/53707.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59857

Reviewed By: mrshenli

Differential Revision: D29309320

Pulled By: ezyang

fbshipit-source-id: b9802a5f937ddef9e0bdaf2f7692dfe463912fbe
2021-06-23 07:54:45 -07:00
Philip Meier
d5988c5eca remove unused type: ignore directives (#60006)
Summary:
During development it is common practice to put `type: ignore` comments on lines that are correct, but `mypy` doesn't recognize this. This often stems from the fact that the `mypy` version in use wasn't able to handle the pattern.

With every new release `mypy` gets better at handling complex code. In addition to fixing all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see if they are still needed. Fortunately, we don't need to do it manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out when it encounters a `type: ignore` that is no longer needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006

Reviewed By: jbschlosser, malfet

Differential Revision: D29133237

Pulled By: albanD

fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a
2021-06-18 07:23:31 -07:00
driazati
059a717c9e Fix breakpad build and add to more images (#59236)
Summary:
This PR
* adds the breakpad build to most of the remaining docker images (except the mobile + slim ones)
* pins to a [fork of breakpad](https://github.com/google/breakpad/compare/master...driazati:master?expand=1) to enable daisy chaining of signal handlers
* renames the API to be nicer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59236

Reviewed By: malfet

Differential Revision: D28792511

Pulled By: driazati

fbshipit-source-id: 83723e74b7f0a00e1695210ac2620a0c91ab4bf2
2021-06-01 22:47:14 -07:00
Sam Estep
75024e228c Add lint for unqualified type: ignore (#56290)
Summary:
The other half of https://github.com/pytorch/pytorch/issues/56272.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290

Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed:

- https://github.com/pytorch/pytorch/runs/2384511062
- https://github.com/pytorch/pytorch/actions/runs/765036024

Reviewed By: seemethere

Differential Revision: D27867219

Pulled By: samestep

fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235
2021-04-21 08:07:23 -07:00