pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
PyTorch MergeBot	9b415240d4	Revert "Reland "Add torch.utils.device_mode" (#91796 )" This reverts commit `81b5eff3c3`. Reverted https://github.com/pytorch/pytorch/pull/91796 on behalf of https://github.com/huydhn due to This breaks trunk with the following failed test https://hud.pytorch.org/failure/test_jit_save%2CTestTracer	2023-01-09 04:45:47 +00:00
Edward Z. Yang	81b5eff3c3	Reland "Add torch.utils.device_mode" (#91796 ) Original PR https://github.com/pytorch/pytorch/pull/91525 Signed-off-by: Edward Z. Yang <ezyangfb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/91796 Approved by: https://github.com/albanD	2023-01-08 03:44:56 +00:00
PyTorch MergeBot	f571ae4fdb	Revert "Make torch.device usable as a context manager (#91525 )" This reverts commit `619d52a5d2`. Reverted https://github.com/pytorch/pytorch/pull/91525 on behalf of https://github.com/mehtanirav due to Internal breakages	2023-01-05 21:34:50 +00:00
Edward Z. Yang	619d52a5d2	Make torch.device usable as a context manager (#91525 ) Fixes https://github.com/pytorch/pytorch/issues/82296 Fixes https://github.com/pytorch/pytorch/issues/27878 Fixes https://github.com/pytorch/pytorch/issues/260 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/91525 Approved by: https://github.com/albanD	2023-01-04 01:32:00 +00:00
Huy Do	0417da2288	Set a timeout value when testing multiprocess DataLoader (#91476 ) Setting a timeout value when testing multiprocess DataLoader to prevent ASAN jobs timing out after 4 hours. We are seeing multiple timeout issue running ASAN tests on HUD https://hud.pytorch.org/hud/pytorch/pytorch/master/1?per_page=50&name_filter=asan for examples * Without mem leak check enabled https://github.com/pytorch/pytorch/actions/runs/3794216079/jobs/6455118197 * With mem leak check https://github.com/pytorch/pytorch/actions/runs/3792743994/jobs/6449356306 Looking a bit closer into the test, the hanging happens when multiprocess DataLoader is used in `test_utils`. Here is the snapshot of those processes when I log into the hang runner: ``` UID PID PPID C STIME TTY TIME CMD jenkins 1 0 0 Dec28 pts/0 00:00:00 bash jenkins 8 0 0 Dec28 pts/1 00:00:00 sh -c pip install dist/torch-2.0.0a0+git97db9fd-cp37-cp37m-linux_x86_64.whl[opt-einsum] && .jenkins/pytorch/test.sh jenkins 20 8 0 Dec28 pts/1 00:00:00 /bin/bash .jenkins/pytorch/test.sh jenkins 764 20 0 Dec28 pts/1 00:00:07 python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --shard 5 5 --verbose jenkins 788 764 0 Dec28 pts/1 00:00:00 /opt/conda/bin/python -c from multiprocessing.semaphore_tracker import main;main(6) jenkins 3743 764 0 Dec28 pts/1 00:00:05 /opt/conda/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=7, pipe_handle=11) --multiprocessing-fork jenkins 3766 3743 0 Dec28 pts/1 00:00:06 /opt/conda/bin/python -bb test_utils.py -v --import-slow-tests --import-disabled-tests jenkins 3878 3766 0 Dec28 pts/1 00:00:06 /opt/conda/bin/python -bb test_utils.py -v --import-slow-tests --import-disabled-tests jenkins 3879 3766 0 Dec28 pts/1 00:00:00 /opt/conda/bin/python -bb test_utils.py -v --import-slow-tests --import-disabled-tests jenkins 3880 3766 0 Dec28 pts/1 00:00:00 /opt/conda/bin/python -bb test_utils.py -v --import-slow-tests --import-disabled-tests jenkins 3881 3766 0 Dec28 pts/1 00:00:00 /opt/conda/bin/python -bb test_utils.py -v --import-slow-tests --import-disabled-tests jenkins 3893 0 0 01:45 pts/2 00:00:00 /bin/bash jenkins 3904 3893 0 01:46 pts/2 00:00:00 ps -ef ``` The specific hanging test was `test_random_seed` which spawned 4 subprocesses to load data. After I killed one of them, the test could continue and printed the following stacktrace: ``` test_random_seed (__main__.TestDataLoaderUtils) ... [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads) [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads) [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads) [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads) [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads) [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads) [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads) [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads) ERROR (9345.840s) test_random_seed (__main__.TestDataLoaderUtils) ... test_random_seed errored - num_retries_left: 3 Traceback (most recent call last): File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1134, in _try_get_data data = self._data_queue.get(timeout=timeout) File "/opt/conda/lib/python3.7/multiprocessing/queues.py", line 104, in get if not self._poll(timeout): File "/opt/conda/lib/python3.7/multiprocessing/connection.py", line 257, in poll return self._poll(timeout) File "/opt/conda/lib/python3.7/multiprocessing/connection.py", line 414, in _poll r = wait([self], timeout) File "/opt/conda/lib/python3.7/multiprocessing/connection.py", line 921, in wait ready = selector.select(timeout) File "/opt/conda/lib/python3.7/selectors.py", line 415, in select fd_event_list = self._selector.poll(timeout) File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler _error_if_any_worker_fails() RuntimeError: DataLoader worker (pid 3878) is killed by signal: Terminated. The above exception was the direct cause of the following exception: Traceback (most recent call last): File "test_utils.py", line 469, in test_random_seed x2 = run() File "test_utils.py", line 464, in run return next(iter(dataloader)) File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 635, in __next__ data = self._next_data() File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1330, in _next_data idx, data = self._get_data() File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1296, in _get_data success, data = self._try_get_data() File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1147, in _try_get_data raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e RuntimeError: DataLoader worker (pid(s) 3878) exited unexpectedly [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads) [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads) [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads) [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads) [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads) [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads) [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads) [W ParallelNative.cpp:230] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads) ok (0.137s) ``` This doesn't fix the issue which I'll need to follow up to see why they hang. However, this should allow the test to terminate gracefully and report errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91476 Approved by: https://github.com/kit1980	2022-12-29 17:50:37 +00:00
mikey dagitses	3a1bdfee67	skip environment collection test in fbcode (#88744 ) Summary: This runs pip, which we don't have in the fbcode environment. Test Plan: Rely on CI. Differential Revision: D41156589 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88744 Approved by: https://github.com/zou3519	2022-11-09 18:20:04 +00:00
soulitzer	c18eead2df	Update saved variable hooks to no longer trigger on wrapped numbers (#87316 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87316 Approved by: https://github.com/ezyang, https://github.com/albanD	2022-10-20 03:01:11 +00:00
Rohan Varma	7a411952fb	CheckpointSequential support non-reentrant (#86331 ) Closes https://github.com/pytorch/pytorch/issues/86328 Adds `use_reentrant` argument to `checkpoint_sequential`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86331 Approved by: https://github.com/zhaojuanmao, https://github.com/albanD	2022-10-06 23:10:18 +00:00
Zain Rizvi	a1a95d402d	Fix inheritance in TestDataLoaderUtil (#85018 ) TestDataLoaderUtils needs to run it's parent class's setUp method to actually disable flaky tests (see https://github.com/pytorch/pytorch/issues/70516#issuecomment-1247045072 for details) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85018 Approved by: https://github.com/clee2000, https://github.com/huydhn	2022-09-14 22:04:43 +00:00
soulitzer	b18962552e	Fix and unskip cpp extension tests for ARM (#83115 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83115 Approved by: https://github.com/albanD	2022-08-11 20:01:53 +00:00
albanD	7dd795cbed	Prevent ref cycle creation in inner hook (#82776 ) Towards fixing https://github.com/pytorch/pytorch/issues/82482 This PR fixes two things: ## 1) memory leak The .detach() call prevents a true memory leak in some cases where the user function is using multiple ops in a row that save their inputs. The following chain of objects keep each other alive - the `storage` object - a recomputed Tensor y - y's grad_fn FooBackward (in c++) - FooBackward's SavedVariables (in c++) - SavedVariable Hook - the `inner_pack` function - captures `storage` Since part of this cycle is in c++, the python gc is not able to break it. Should THPCppFunction_traverse actually visit it's SavedVariables which in turn should visit their hooks? I think the answer is yes but I haven't dived into which python object is traversing what as if there is non-unique ownership of the c++ object, it makes the traversal a lot trickier. @ezyang do you think we should dive into this more? In this case, this can be easily solved anyways by storing `y.detach()` in the `storage` object as we don't care about the temporary backward graph that gets created during the second forward call. ## 2) Lifetime of the recomputed buffers The new storage system is now such that the lifetime of the recomputed buffer is directly linked to the SavedVariable c++ object. Meaning that this buffer will get deleted IIF the SavedVariable is cleared. This means that we now get the exact same behavior as the version without the saved variable hook where Tensors are saved directly on the SavedVariable object. This is great as this solves all the cases where the non-checkpoint version used to work but the checkpoint version does not (even double access or retain_graph=True). The one drawback of this approach though is that the buffer do NOT get cleared when the user passes in `retain_graph=True`! The next backward won't even re-run the forward as it already has all the buffers available. Is this a problem that you think we would need to find a solution for @rohan-varma or it is niche enough that we don't care for now? Pull Request resolved: https://github.com/pytorch/pytorch/pull/82776 Approved by: https://github.com/ezyang, https://github.com/rohan-varma	2022-08-06 00:31:22 +00:00
albanD	2255911f8a	Make M1 tests green (#82213 ) This is skipping all the failing tests and add a new master job to test on M1 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82213 Approved by: https://github.com/seemethere, https://github.com/soulitzer, https://github.com/malfet	2022-08-05 16:12:08 +00:00
PyTorch MergeBot	ec4be38ba9	Revert "To add hipify_torch as a submodule in pytorch/third_party (#74704 )" This reverts commit `93b0fec39d`. Reverted https://github.com/pytorch/pytorch/pull/74704 on behalf of https://github.com/malfet due to broke torchvision	2022-06-21 23:54:00 +00:00
Bhavya Medishetty	93b0fec39d	To add hipify_torch as a submodule in pytorch/third_party (#74704 ) `hipify_torch` as a submodule in `pytorch/third_party` Pull Request resolved: https://github.com/pytorch/pytorch/pull/74704 Approved by: https://github.com/jeffdaily, https://github.com/malfet	2022-06-21 18:56:49 +00:00
Kiarash Jamali	bc3c7a6cbd	Fix issue with _checkpoint_without_reentrant Fixes #76737 I also added a test case for this bug. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76890 Approved by: https://github.com/albanD	2022-05-05 17:37:31 +00:00
Nikita Shulga	8473173c36	Remove breakpad dependency This functionality does not seem to be used and there are some requests to update dependency. Add `third_party` to torch_cpu include directories if compiling with Caffe2 support, as `caffe2/quantization/server/conv_dnnlowp_op.cc` depends on `third_party/fbgemm/src/RefImplementations.h` Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394 Approved by: https://github.com/janeyx99, https://github.com/seemethere	2022-05-03 20:21:55 +00:00
PyTorch MergeBot	d79d9fa283	Revert "Remove breakpad dependency" This reverts commit `9aa3c7fd83`. Reverted https://github.com/pytorch/pytorch/pull/75394 on behalf of https://github.com/malfet	2022-04-17 17:58:51 +00:00
Nikita Shulga	9aa3c7fd83	Remove breakpad dependency This functionality does not seem to be used and there are some requests to update dependency Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394 Approved by: https://github.com/janeyx99, https://github.com/seemethere	2022-04-17 17:43:45 +00:00
Nicolas Hug	d0387ad285	Move torchhub tests into separate test_hub.py file Pull Request resolved: https://github.com/pytorch/pytorch/pull/74826 Approved by: https://github.com/vmoens	2022-03-30 10:06:14 +00:00
Nicolas Hug	7df0d9fda4	Call super().setUp() and super().tearDown() in torchhub tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/74621 Approved by: https://github.com/vmoens, https://github.com/janeyx99, https://github.com/cpuhrsch	2022-03-25 14:36:31 +00:00
Jane Xu	a1e284d9c8	Remove high priority as an owner for tests (#74555 ) Summary: Following triage review discussion, it would be best for these tests to not be triaged high priority by automation, but by the triagers in the oncall. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74555 Reviewed By: albanD Differential Revision: D35099202 Pulled By: janeyx99 fbshipit-source-id: 657a0317141de3a598476a6f601ec26cc26231b1 (cherry picked from commit 057519cb2494d0f9a0b169f359ac87ba9e89f088)	2022-03-24 14:29:52 +00:00
Lood	670e4d9808	set_dir expanding "~" Fixes #69761. Small change to torch.hub.set_dir() (<10 LOC). It seems that before the code was split into `set_dir()` and `_get_torch_home `, an [earlier version](`5164622ba4/torch/hub.py (L111)`) of hub.py had a os.path.expanduser check. Currently, [_get_torch_home](https://github.com/pytorch/pytorch/blob/master/torch/hub.py#L104) retained the os.path.expanduser check, but `set_dir()` didn't have one. This PR fixes that (I hope). (As I mentioned in the issue, I can't run the tests on my laptop yet because of storage space :/ But I did include a test.) Pull Request resolved: https://github.com/pytorch/pytorch/pull/69763 Approved by: https://github.com/malfet, https://github.com/NicolasHug	2022-03-23 20:38:14 +00:00
Nicolas Hug	08590b4159	Cosmetic changes to torchhub tests (#74431 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74431 Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D35011898 Pulled By: NicolasHug fbshipit-source-id: 37a42f843b0a3c781fa59254552a9b3af8678176 (cherry picked from commit aa4f83e126cb72cd846266af7ea77c70e2a9dc81)	2022-03-22 08:55:09 +00:00
Nicolas Hug	e0ecdb5cba	Properly catch warning in torchhub tests (#74430 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74430 Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D35011900 Pulled By: NicolasHug fbshipit-source-id: 36753167d6ee737ee437d1cd7303e5cc8b5c286c (cherry picked from commit d0fdf4af795bdf74c145260c82f976a53f1aaff5)	2022-03-22 08:55:09 +00:00
Nicolas Hug	bcc77c470b	Cosmetic changes to torchhub tests (#74431 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74431 Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D35011832 Pulled By: NicolasHug fbshipit-source-id: f76f92cf92b236ac8a2e2947001d219d0a7d5f14 (cherry picked from commit 3e142f8da9479eab356b3f38ace321cc9fde9bfc)	2022-03-22 08:55:09 +00:00
Alban Desmaison	734281c3d6	Cleanup all module references in doc (#73983 ) Summary: Working towards https://docs.google.com/document/d/10yx2-4gs0gTMOimVS403MnoAWkqitS8TUHX73PN8EjE/edit?pli=1# This PR: - Ensure that all the submodules are listed in a rst file (that ensure they are considered by the coverage tool) - Remove some long deprecated code that just error out on import - Remove the allow list altogether to ensure nothing gets added back there Pull Request resolved: https://github.com/pytorch/pytorch/pull/73983 Reviewed By: anjali411 Differential Revision: D34787908 Pulled By: albanD fbshipit-source-id: 163ce61e133b12b2f2e1cbe374f979e3d6858db7 (cherry picked from commit c9edfead7a01dc45bfc24eaf7220d2a84ab1f62e)	2022-03-10 22:26:29 +00:00
Nikita Shulga	bede18b061	Add support for C++ frontend wrapper on Linux (#69094 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69094 Partially addresses https://github.com/pytorch/pytorch/issues/68768 Test Plan: Imported from OSS Reviewed By: seemethere Differential Revision: D32730079 Pulled By: malfet fbshipit-source-id: 854e4215ff66e087bdf354fed7a17e87f2649c87	2021-12-02 16:47:00 -08:00
Michael Suo	5fd93fb5f8	broaden retries on TestHub (#67779 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67779 Not all flaky failures from this test are URLErrors; I think we should err on the side of being expansive with retries here. Test Plan: Imported from OSS Reviewed By: jamesr66a Differential Revision: D32145434 Pulled By: suo fbshipit-source-id: 3c3274b2080681fcafb3ea6132e420605f65c429	2021-11-03 13:48:58 -07:00
Jane Xu	c19cda5782	[skip ci] Add test owners for a special hi-pri class of tests (#67553 ) Summary: Action following https://github.com/pytorch/pytorch/issues/66232 This change does require some context: there were several suggestions regarding what to do about this group of tests: tests that are core and crucial to all of PyTorch and are too broad to be owned by one team. 1. Let's add a "module: core" and put people behind it! This idea sounds appealing unless you are one of the people backing the label. From talking to albanD among others, this idea of putting all these core tests on the shoulder of a few people or one team isn't super fair and I have not yet found anyone willing to take on this job. 2. Taking advantage of the fact that we already have a triaging oncall that takes turns triaging issues, we can leave these tests essentially unlabeled and allow the oncall to triage these tests. Since these tests are crucial to PyTorch, we'll add the "high priority" label to mark them different from other unowned tests (see https://github.com/pytorch/pytorch/issues/67552). 3. I _could_ still create an unbacked label "module: core" and attribute these tests there, but I don't like the idea of creating a facade that the tests are "triaged" to a label when no one is actually taking a look. Now we could potentially break these tests down into smaller files so that each piece _could_ be owned by a team, but 1. I don't know if this is currently feasible and 2. This approach does not prevent that from happening in the future. Pull Request resolved: https://github.com/pytorch/pytorch/pull/67553 Reviewed By: albanD Differential Revision: D32025004 Pulled By: janeyx99 fbshipit-source-id: 1fb1aa4c27e305695ab6e80ae3d02f90519939c0	2021-10-29 12:17:21 -07:00
Jane Xu	68555339d7	test_utils.py: Add another retry to test_download_url_to_file (#66159 ) Summary: Fixes one of the flakiness concerns mentioned https://github.com/pytorch/pytorch/issues/65439#issuecomment-934686485 Pull Request resolved: https://github.com/pytorch/pytorch/pull/66159 Reviewed By: ngimel Differential Revision: D31406485 Pulled By: janeyx99 fbshipit-source-id: cf7834cdab58360ecef1748075d52969de2e0778	2021-10-05 16:26:20 -07:00
Nicolas Hug	0a3cf8886a	Torchhub: More robust assumption regarding main or master branch (#64364 ) Summary: Closes https://github.com/pytorch/pytorch/issues/63753 This PR changes the assumption regarding the default branch of a repo to the following: > If main exist then use main,otherwise use master This will make torchhub more robust w.r.t. to the ongoing changes where repo use `main` instead of `master` as the development / default branch. cc nairbv NicolasHug Pull Request resolved: https://github.com/pytorch/pytorch/pull/64364 Reviewed By: saketh-are Differential Revision: D30731551 Pulled By: NicolasHug fbshipit-source-id: 7232a30e956dcccca21933a29de5eddd711aa99b	2021-09-20 10:36:13 -07:00
Mike Ruberry	6596173811	Revert D30731191: [pytorch][PR] Torchhub: rewrite commit hash check to avoid using unnecessary GitHub API credits Test Plan: revert-hammer Differential Revision: D30731191 (`f9bf144a0c`) Original commit changeset: d1ee7c2ef259 fbshipit-source-id: 5c7207f66c5354ce7b9ac2594e4f5b8307619b0c	2021-09-17 14:33:00 -07:00
Nicolas Hug	f9bf144a0c	Torchhub: rewrite commit hash check to avoid using unnecessary GitHub API credits (#64362 ) Summary: This PR adds more detailed error messages to torchhub if the commit hash validation goes wrong, providing suggestions to the users on how to resolve the issue. It also documents why such validation is important. EDIT: it also avoids validatating some stuff when we know "stuff" isn't a commit since there's no risk in this case CC malfet mthrok cc nairbv NicolasHug Pull Request resolved: https://github.com/pytorch/pytorch/pull/64362 Reviewed By: gchanan, malfet Differential Revision: D30731191 Pulled By: NicolasHug fbshipit-source-id: d1ee7c2ef2591dd7a5291977af1635ada2552d1b	2021-09-17 10:30:39 -07:00
Nicolas Hug	9157a2889f	Pass GITHUB_TOKEN to linux CI jobs and avoid skipping torchhub tests (#64807 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/64760 This should hopefully put the torchhub tests back. This also avoids skipping the torchhub tests: currently the tests are skipped if they fail, which pretty much defeats the purpose of having a test in the first place since we're never notified when they do fail. cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra nairbv NicolasHug Pull Request resolved: https://github.com/pytorch/pytorch/pull/64807 Reviewed By: seemethere Differential Revision: D30994585 Pulled By: NicolasHug fbshipit-source-id: 561782c22462b5cfec99cca153eb59623db5660a	2021-09-17 03:30:56 -07:00
driazati	bd8608cd5c	Use CMake for breakpad (#63186 ) Summary: We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main) this fork now includes a CMake based build, so we can add breakpad as a proper dependency rather than rely on including it in Docker images as a system library which is error prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux. ```python import torch # On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes # On MacOS/Linux this writes crashes to /tmp/pytorch_crashes torch.utils._crash_handler.enable_minidumps() # Easy way to cause a segfault and trigger the handler torch.bincount(input=torch.tensor([9223372036854775807])) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186 Reviewed By: malfet, seemethere Differential Revision: D30318404 Pulled By: driazati fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc	2021-08-19 10:42:01 -07:00
Shen Li	1022443168	Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: revert-hammer Differential Revision: D30279364 (`b004307252`) Original commit changeset: c1ed77dfe43a fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e	2021-08-12 11:45:01 -07:00
Zsolt Dollenstein	b004307252	[codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: manual inspection & sandcastle Reviewed By: zertosh Differential Revision: D30279364 fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a	2021-08-12 10:58:35 -07:00
driazati	45cc207a88	Fix breakpad build + add test canary (#60990 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60990 This makes the breakpad build more explicit in its messaging and hints to cmake where to look for the library (it wasn't able to find it without `PATHS` on CI even though that works locally). This also adds a smoke test that will fail if breakpad isn't present on a CI job where it is expected (e.g. binary builds). Test Plan: Imported from OSS Reviewed By: malfet Differential Revision: D29514316 Pulled By: driazati fbshipit-source-id: 79514363334788f311ba5d4f25deed3452f0c3eb	2021-07-06 14:15:07 -07:00
johnlu	265f0e5321	Add device runtime API for the plug-in to register platform python module into torch (#59857 ) Summary: ## Motivation Allow the out-of-tree Pytorch plug-in, for the device type other than CUDA, to add the runtime interface to the `torch` module. The runtime interface of the device can be referred with the device type name in the `torch` module. I.E., `torch.cuda` or `torch.xpu`. ## Solution - Add a register interface for the plug-in to add the platform python module into `torch` module with the device type name. I.E., The `torch.xpu` can be used to refer the XPU runtime interface after the XPU runtime module is registered with `torch._register_device_module('xpu', xpu_module)` in Intel's XPU plug-in. ## Additional Context More details about runtime has been discussed in https://github.com/pytorch/pytorch/issues/53707. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59857 Reviewed By: mrshenli Differential Revision: D29309320 Pulled By: ezyang fbshipit-source-id: b9802a5f937ddef9e0bdaf2f7692dfe463912fbe	2021-06-23 07:54:45 -07:00
Philip Meier	d5988c5eca	remove unused `type: ignore` directives (#60006 ) Summary: During development it is common practice to put `type: ignore` comments on lines that are correct, but `mypy` doesn't recognize this. This often stems from the fact, that the used `mypy` version wasn't able to handle the used pattern. With every new release `mypy` gets better at handling complex code. In addition to fix all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see if they are still needed or not. Fortunately, we don't need to do it manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out in case it encounters an `type: ignore` that is no longer needed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006 Reviewed By: jbschlosser, malfet Differential Revision: D29133237 Pulled By: albanD fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a	2021-06-18 07:23:31 -07:00
driazati	059a717c9e	Fix breakpad build and add to more images (#59236 ) Summary: This PR * adds the breakpad build to most of the remaining docker images (except the mobile + slim ones) * pins to a [fork of breakpad](https://github.com/google/breakpad/compare/master...driazati:master?expand=1) to enable dasiy chaining on signal handlers * renames the API to be nicer Pull Request resolved: https://github.com/pytorch/pytorch/pull/59236 Reviewed By: malfet Differential Revision: D28792511 Pulled By: driazati fbshipit-source-id: 83723e74b7f0a00e1695210ac2620a0c91ab4bf2	2021-06-01 22:47:14 -07:00
Sam Estep	75024e228c	Add lint for unqualified `type: ignore` (#56290 ) Summary: The other half of https://github.com/pytorch/pytorch/issues/56272. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290 Test Plan: CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed: - https://github.com/pytorch/pytorch/runs/2384511062 - https://github.com/pytorch/pytorch/actions/runs/765036024 Reviewed By: seemethere Differential Revision: D27867219 Pulled By: samestep fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235	2021-04-21 08:07:23 -07:00
cyy	f74a346213	Fix torch.hub.load("pytorch/vision") fails to validate the master branch (#56138 ) Summary: We should iterate all pages of the branches API. Otherwise, even using "pytorch/vision" would fail to find master. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56138 Reviewed By: heitorschueroff Differential Revision: D27872346 Pulled By: ailzhang fbshipit-source-id: 55881558f7980b1fb08b0d08ed6687a38df06edd	2021-04-20 09:33:25 -07:00
davidriazati@fb.com	638617f9f8	Write mini dump on pybind exceptions (#55652 ) Summary: We register an [error handler](https://pybind11.readthedocs.io/en/stable/advanced/exceptions.html#registering-custom-translators) with pybind so that C++ exceptions are passed to Python and raised as runtime errors that can be `try...except`ed etc. Since these don't terminate the program (until Python does), they never fire the signal handler to write a minidump out with the crash information. This PR adds some logic in the exception translator to write out a minidump if enabled. ](https://our.intern.facebook.com/intern/diff/27830952/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/55652 Pulled By: driazati Reviewed By: bertmaher Differential Revision: D27830952 fbshipit-source-id: 26e8f913e99dff971a4eb09eb87221c66f759763	2021-04-19 14:53:43 -07:00
Sam Estep	e3900d2ba5	Add lint for unqualified `noqa` (#56272 ) Summary: As this diff shows, currently there are a couple hundred instances of raw `noqa` in the codebase, which just ignore all errors on a given line. That isn't great, so this PR changes all existing instances of that antipattern to qualify the `noqa` with respect to a specific error code, and adds a lint to prevent more of this from happening in the future. Interestingly, some of the examples the `noqa` lint catches are genuine attempts to qualify the `noqa` with a specific error code, such as these two: ``` test/jit/test_misc.py:27: print(f"{hello + ' ' + test}, I'm a {test}") # noqa E999 test/jit/test_misc.py:28: print(f"format blank") # noqa F541 ``` However, those are still wrong because they are [missing a colon](https://flake8.pycqa.org/en/3.9.1/user/violations.html#in-line-ignoring-errors), which actually causes the error code to be completely ignored: - If you change them to anything else, the warnings will still be suppressed. - If you add the necessary colons then it is revealed that `E261` was also being suppressed, unintentionally: ``` test/jit/test_misc.py:27:57: E261 at least two spaces before inline comment test/jit/test_misc.py:28:35: E261 at least two spaces before inline comment ``` I did try using [flake8-noqa](https://pypi.org/project/flake8-noqa/) instead of a custom `git grep` lint, but it didn't seem to work. This PR is definitely missing some of the functionality that flake8-noqa is supposed to provide, though, so if someone can figure out how to use it, we should do that instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56272 Test Plan: CI should pass on the tip of this PR, and we know that the lint works because the following CI run (before this PR was finished) failed: - https://github.com/pytorch/pytorch/runs/2365189927 Reviewed By: janeyx99 Differential Revision: D27830127 Pulled By: samestep fbshipit-source-id: d6dcf4f945ebd18cd76c46a07f3b408296864fcb	2021-04-19 13:16:18 -07:00
Ailing Zhang	0a06d054d0	Revert "Only allow hub.load() from original repo. (#54451 )" (#56048 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56048 This reverts commit `c411017a41`. This implementation broke CI in pytorch/vision and it's not handling tags properly. So I want to revert it first to unblock vision CI and send out a proper fix later. Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D27771701 Pulled By: ailzhang fbshipit-source-id: 932f9be72a1ae1816f4032643b3c2dde0cb7ae4c	2021-04-15 11:16:56 -07:00
Ailing Zhang	c411017a41	Only allow hub.load() from original repo. (#54451 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54451 Test Plan: Imported from OSS Reviewed By: nikithamalgifb Differential Revision: D27243825 Pulled By: ailzhang fbshipit-source-id: 2f65a82064d83b71224b4280ddfaabfa8ec9aec3	2021-03-22 20:27:54 -07:00
Pritam Damania	4fa47e5e7d	Support non-tensor inputs and outputs for checkpointed functions. (#52422 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52422 As mentioned in https://github.com/pytorch/pytorch/issues/52415, `torch.utils.checkpoint` doesn't support checkpointing for functions which have non-tensor inputs and outputs. This PR resolves this issue by ensuring the autograd machinery ignores the non-tensor inputs and outputs and processes the tensors accordingly. ghstack-source-id: 124406867 Test Plan: 1) unit test 2) waitforbuildbot Reviewed By: albanD Differential Revision: D26507228 fbshipit-source-id: 0a5a1591570814176185362e83ad18dabd9c84b0	2021-03-19 21:29:03 -07:00
Jane Xu	09ce9b5877	Store test file in S3 as well for every TestSuite (#52869 ) Summary: We want to store the file names that triggers each test suite so that we can use this data for categorizing those test files. ~~After considering several solutions, this one is the most backwards compatible, and the current test cases in test_testing.py for print test stats don't break.~~ The previous plan did not work, as there are multiple Python test jobs that spawn the same suites. Instead, the new S3 format will store test files (e.g., `test_nn` and `distributed/test_distributed_fork`) which will contain the suites they spawn, which will contain the test cases run within the suite. (Currently, there is no top layer of test files.) Because of this major structural change, a lot of changes have now been made (thank you samestep!) to test_history.py and print_test_stats.py to make this new format backwards compatible. Old test plan: Make sure that the data is as expected in S3 after https://github.com/pytorch/pytorch/pull/52873 finishes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52869 Test Plan: Added tests to test_testing.py which pass, and CI. Reviewed By: samestep Differential Revision: D26672561 Pulled By: janeyx99 fbshipit-source-id: f46b91e16c1d9de5e0cb9bfa648b6448d979257e	2021-03-02 07:36:00 -08:00
Jane Xu	550c965b2e	Re-enable test_standalone_load for Windows 11.1 (#51596 ) Summary: This fixes the previous erroring out by adding stricter conditions in cpp_extension.py. To test, run a split torch_cuda build on Windows with export BUILD_SPLIT_CUDA=ON && python setup.py develop and then run the following test: python test/test_utils.py TestStandaloneCPPJIT.test_load_standalone. It should pass. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51596 Reviewed By: malfet Differential Revision: D26213816 Pulled By: janeyx99 fbshipit-source-id: a752ce7f9ab9d73dcf56f952bed2f2e040614443	2021-02-03 08:58:34 -08:00

1 2 3 4

171 Commits