114 Commits

Author SHA1 Message Date
Tongzhou Wang
23db54acdf [DataLoader] add repr for WorkerInfo (#39975)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39975

Differential Revision: D22039414

Pulled By: ezyang

fbshipit-source-id: 230f68a91fca901bce652fdf88ba88167f39b978
2020-06-16 08:19:32 -07:00
ShawnZhong
c8c53c802e Add generator= kwarg for DataLoader & random samplers (#39737)
Summary:
Fix https://github.com/pytorch/pytorch/issues/39572

Add `generator=` kwarg for DataLoader & random samplers

cc: SsnL, deeppatel4557, albanD, mitar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39737

Differential Revision: D22019132

Pulled By: albanD

fbshipit-source-id: 835e08b86c5396bc0b0e41057661306b15394d6e
2020-06-15 07:01:20 -07:00
Daiming Yang
0b90b9cdd3 Allow shuffle when auto-batching disabled in DataLoader (#39865)
Summary:
Fix https://github.com/pytorch/pytorch/issues/35761
cc SsnL

Note: closed the other PR for this new branch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39865

Differential Revision: D22003612

Pulled By: ezyang

fbshipit-source-id: 26aecd1b298fe99d3924f4c8157cd6cae2561c7c
2020-06-11 15:17:46 -07:00
Hong Xu
283a3ff16d The exception raised when RandomSampler.replacement is non-boolean should be TypeError (#36547)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36547

Differential Revision: D21818752

Pulled By: ezyang

fbshipit-source-id: 7502a24a0df134c44ac72959ba992777c873f8e9
2020-06-02 06:54:02 -07:00
Donna Choi
3d2fce6bc3 Change len(DataLoader) for IterableDataset (#38925)
Summary:
Fix https://github.com/pytorch/pytorch/issues/36176

One-liner change to ensure that ```len(loader) == (len(dataset) // batch_size)``` for IterableDataset.
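
The arithmetic behind that expression can be sketched in plain Python. This is an illustration, not the actual `DataLoader` code: `expected_num_batches` is a hypothetical helper, and the rounding-up branch for `drop_last=False` is an assumption about the non-dropping case.

```python
import math


def expected_num_batches(num_samples, batch_size, drop_last=False):
    """Hypothetical helper: how many batches a loader would report."""
    if batch_size is None:
        # Auto-batching disabled: every sample is its own item.
        return num_samples
    if drop_last:
        # Incomplete trailing batch is discarded (floor division).
        return num_samples // batch_size
    # Assumed behaviour when the trailing batch is kept (round up).
    return math.ceil(num_samples / batch_size)
```

With 10 samples and `batch_size=3`, this gives 3 batches when the last partial batch is dropped and 4 when it is kept.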
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38925

Differential Revision: D21731587

Pulled By: ezyang

fbshipit-source-id: 59a086165a004c0c1c8a1ee0776b1444bd26de23
2020-05-27 11:56:41 -07:00
Donna Choi
8c07a98adc Error out of default_collate for lists of unequal size (#38492)
Summary:
Fix issue https://github.com/pytorch/pytorch/issues/23141

In the below example ```default_collate``` collates each element of the list. Since the second element isn't present in all samples, it is discarded:
```
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import numpy as np

class CustomDataset(Dataset):
    def __len__(self):
        return 2

    def __getitem__(self, idx):
        tmp = {
            "foo": np.array([1, 2, 3]),
            "bar": ["X"] * (idx+1),
        }

        return tmp

training = CustomDataset()

for batch in DataLoader(training, batch_size=2):
    print(batch)
```
Yields
```
{
  'foo': tensor(
    [
      [1, 2, 3],
      [1, 2, 3]
    ]
  ),
  'bar': [
      ('X', 'X'),
    ]
}
```

Based on discussion in the issue, it seems the best course of action is to error out in this case. This seems consistent with what is done for tensor elements, as seen in [TensorShape.cpp line 1066](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/TensorShape.cpp#L1060) which is called when ```torch.stack``` is called. In this PR, I introduce a similar message to error out for lists.
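
A minimal sketch of such a check, written in plain Python so it runs without PyTorch. `collate_sequences` is a hypothetical stand-in for the list branch of a collate function, and the error text is illustrative rather than the exact message added by this PR.

```python
def collate_sequences(batch):
    """Hypothetical stand-in for collating a batch of equal-length lists."""
    it = iter(batch)
    elem_size = len(next(it))
    # Refuse to silently truncate: zip() below would drop trailing items.
    if not all(len(elem) == elem_size for elem in it):
        raise RuntimeError("each element in list of batch should be of equal size")
    # Transpose: group the i-th item of every sample together.
    return [tuple(samples) for samples in zip(*batch)]
```

Under this sketch, the `"bar"` values from the example above (`["X"]` and `["X", "X"]`) raise instead of being cut down to a single tuple.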

SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38492

Differential Revision: D21620396

Pulled By: ezyang

fbshipit-source-id: 17f59fbb1ed1f0d9b2185c95b9ebe55ece701b0c
2020-05-18 14:53:33 -07:00
SsnL
b5868b2833 Relax sampler check in BatchSampler (#38403)
Summary:
Since the check was added in https://github.com/pytorch/pytorch/pull/6249, one cannot pass an iterable as a sampler to the data loader anymore, which was a very handy feature (e.g., https://github.com/pytorch/pytorch/issues/1337). I think the check should be removed for two reasons:
1. It is too strict. There is no reason that it should not be a general iterable.
2. It is inconsistent. In `DataLoader` (the main place where people use samplers), you can pass a general iterable as `batch_sampler` but not `sampler` due to this check.
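
To illustrate why a general iterable is sufficient, here is a plain-Python sketch of batching that only ever iterates over its sampler. `batch_indices` is a hypothetical function, not the `BatchSampler` class itself.

```python
def batch_indices(sampler, batch_size, drop_last=False):
    """Group indices from any iterable into lists of `batch_size`."""
    batch = []
    for idx in sampler:  # only iteration is required of `sampler`
        batch.append(idx)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the incomplete final batch unless the caller asked to drop it.
    if batch and not drop_last:
        yield batch
```

A list, a `range`, or a generator expression all work here, since nothing beyond `for idx in sampler` is assumed.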
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38403

Differential Revision: D21555958

Pulled By: soumith

fbshipit-source-id: c7267bb99a31edd8f2750689205d6edc5dab5cff
2020-05-13 22:24:29 -07:00
Vitaly Fedyunin
57d01be92b Replacing assertEqual with assertEqualIgnoreType wherever types mismatch (#38102)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38102

Test Plan: Imported from OSS

Differential Revision: D21477060

Pulled By: VitalyFedyunin

fbshipit-source-id: 25e0fd837ca9bfccf0ce994c80f7790c894096d4
2020-05-09 14:48:55 -07:00
David Reiss
e75fb4356b Remove (most) Python 2 support from Python code (#35615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615

Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace change might be helpful).

Test Plan: CI

Differential Revision: D20842886

Pulled By: dreiss

fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
2020-04-22 09:23:14 -07:00
Wojciech Baranowski
69e3ee2d5f DataLoader: properly diagnose exceeding file descriptor limit (#34768)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/973

Common failure scenario:
* DataLoader creates workers and communicates with them through SHMs
* Workers send back through an AF_UNIX socket file descriptors to SHMs containing data
* The limit of open files gets fully used
* A FD gets stripped from a socket message coming back from a worker, without the worker knowing this.
* This causes a `RuntimeError: received 0 items of ancdata` in the standard `multiprocessing` package
* The exception is not handled by PyTorch and so is presented to the users.

After this change the user will see

```
Traceback (most recent call last):
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 761, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/home/wbaranowski/git/Quansight/pytorch/torch/multiprocessing/reductions.py", line 294, in rebuild_storage_fd
    fd = df.detach()
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/reduction.py", line 184, in recv_handle
    return recvfds(s, 1)[0]
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/reduction.py", line 162, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 787, in _try_get_data
    fs = [tempfile.NamedTemporaryFile() for i in range(10)]
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 787, in <listcomp>
    fs = [tempfile.NamedTemporaryFile() for i in range(10)]
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/tempfile.py", line 551, in NamedTemporaryFile
    (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/tempfile.py", line 262, in _mkstemp_inner
    fd = _os.open(file, flags, 0o600)
OSError: [Errno 24] Too many open files: '/tmp/tmpnx_f6v_f'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_shm_leak.py", line 56, in <module>
    worker_init_fn=worker_init_fn
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 861, in _next_data
    idx, data = self._get_data()
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 828, in _get_data
    success, data = self._try_get_data()
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 791, in _try_get_data
    "Too many open files. Communication with the"
RuntimeError: Too many open files. Communication with the workers is no longer possible. Please increase the limit using `ulimit -n` in the shell or change the sharing strategy by calling `torch.multiprocessing.set_sharing_strategy('file_system')` at the beginning of your code
```
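
The second traceback shows the diagnosis probing for free descriptors by opening ten temporary files. A hedged sketch of that idea follows; `fd_limit_exhausted` is a hypothetical helper and its structure is an assumption, not a copy of `_try_get_data`.

```python
import errno
import tempfile


def fd_limit_exhausted(probe_count=10):
    """Return True if opening a few temp files fails with EMFILE."""
    opened = []
    try:
        for _ in range(probe_count):
            opened.append(tempfile.NamedTemporaryFile())
    except OSError as exc:
        if exc.errno == errno.EMFILE:  # "Too many open files"
            return True
        raise  # some other I/O problem; do not misreport it
    finally:
        # Release whatever the probe managed to open.
        for f in opened:
            f.close()
    return False
```

A caller that has just caught the `ancdata` error could use such a probe to decide whether to raise the friendlier "Too many open files" message.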
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34768

Differential Revision: D20538053

Pulled By: ezyang

fbshipit-source-id: be4425cf2fa02aff61619b2b829c153cb1a867cb
2020-04-14 07:10:57 -07:00
Wanchao Liang
3526627f46 Use unittest assertWarns instead (#36411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36411

This PR removes the PyTorch-specific assertWarns and uses the unittest
one instead. It also reformats some tests.

Test Plan: Imported from OSS

Differential Revision: D20998159

Pulled By: wanchaol

fbshipit-source-id: 1280ecff2dd293b95a639d13cc7417fc819c2201
2020-04-13 15:56:42 -07:00
Hong Xu
817e4f9ef1 Correct a ValueError in dataloader to TypeError (#36244)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36244

Differential Revision: D20963949

Pulled By: ezyang

fbshipit-source-id: 8c6aa4831021788052269e7aa8282d11eba4e085
2020-04-10 09:03:58 -07:00
Mathis Chenuet
17a01c7c7b feature: deterministic random_split (#34043)
Summary:
## 🚀 Feature
Option to provide a seed (random_state) for random_split() like the sklearn API https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html.

## Motivation
Useful for deterministic sampling & reproducible data generation (easily, without affecting the PRNG for other uses).
See https://github.com/pytorch/pytorch/issues/32467
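
The idea of a split that is reproducible without touching the global PRNG can be sketched with the standard library. `seeded_split` is a hypothetical helper that uses `random.Random` as a stand-in for whatever generator object the real `random_split()` accepts.

```python
import random


def seeded_split(items, lengths, seed):
    """Hypothetical helper: split `items` into parts of the given lengths."""
    if sum(lengths) != len(items):
        raise ValueError("Sum of lengths must equal the number of items")
    rng = random.Random(seed)  # private PRNG; the global one is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    parts, start = [], 0
    for n in lengths:
        parts.append(shuffled[start:start + n])
        start += n
    return parts
```

Calling it twice with the same seed yields the same partition, and `random.getstate()` is unchanged by the call.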
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34043

Differential Revision: D20605678

Pulled By: ezyang

fbshipit-source-id: 12b10bf72cd8a0d4264ae4d326064f806945d011
2020-03-26 08:02:39 -07:00
Hong Xu
a6a72ac68f Fix all occurrences of C416. (#33429)
Summary:
C416: Unnecessary (list/set) comprehension - rewrite using list/set().

See https://pypi.org/project/flake8-comprehensions/
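
For illustration, the kind of rewrite this lint asks for (the variable names are made up for the example):

```python
pairs = {"a": 1, "b": 2}

# Flagged by C416: the comprehension only copies its input.
copied_flagged = [item for item in pairs.items()]
unique_flagged = {key for key in pairs}

# Equivalent rewrites that call the constructors directly.
copied = list(pairs.items())
unique = set(pairs)
```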
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33429

Differential Revision: D19972858

Pulled By: ezyang

fbshipit-source-id: faac042a94c59d737bd5ae983121a0a029346e23
2020-02-21 08:32:22 -08:00
Pritam Damania
f050b16dd9 Move pytorch distributed tests to separate folder for contbuild. (#30445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445

Create distributed and rpc directories under caffe/test for better management
of unit tests.

Differential Revision: D18702786

fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
2020-01-22 21:16:59 -08:00
Brian Vaughan
945ce71b18 Correctly handle scalar types, fix parse of numpy ints (#30486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30486

Fixes: https://github.com/pytorch/pytorch/issues/29252

There is some incorrect code in the handling of parsing python numbers that led to issue #29252:

When we allow interpretation of a zero-dim numpy integer value
as a scalar in pytorch, we incorrectly parse the int as a float.

This PR also fixes the issue described in the "FIXME" here:
https://github.com/pytorch/pytorch/pull/27628/files#diff-f539198dd366265fb8dc2d661bc5d5bcR1487

Test Plan: Added a unit test based on the example given in the issue.

Differential Revision: D18932520

Pulled By: nairbv

fbshipit-source-id: f6416f28dfd73ac72c1042042851d76beb5fcf65
2019-12-11 15:35:57 -08:00
Tongzhou Wang
c37de32b23 Enable len(dataloader) for iterable dataset (#23587)
Summary:
Copy-paste comment from code for reasoning:

```
            # NOTE [ IterableDataset and __len__ ]
            #
            # For `IterableDataset`, `__len__` could be inaccurate when one naively
            # does multi-processing data loading, since the samples will be duplicated.
            # However, no real use case should be actually using that behavior, so
            # it should count as a user error. We should generally trust user
            # code to do the proper thing (e.g., configure each replica differently
            # in `__iter__`), and give us the correct `__len__` if they choose to
            # implement it (this will still throw if the dataset does not implement
            # a `__len__`).
            #
            # To provide a further warning, we track if `__len__` was called on the
            # `DataLoader`, save the returned value in `self._len_called`, and warn
            # if the iterator ends up yielding more than this number of samples.
```
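
The tracking described in that note can be sketched as a small wrapper. `LengthCheckedLoader` is a hypothetical class written for illustration; only the `_len_called` name is taken from the comment above.

```python
import warnings


class LengthCheckedLoader:
    """Hypothetical wrapper that trusts `__len__` but warns if it was wrong."""

    def __init__(self, source):
        self._source = source
        self._len_called = None  # set once len() has been requested

    def __len__(self):
        # Raises TypeError if the source does not implement __len__.
        self._len_called = len(self._source)
        return self._len_called

    def __iter__(self):
        for count, item in enumerate(self._source, start=1):
            if self._len_called is not None and count > self._len_called:
                warnings.warn(
                    "Length was reported as {} but the iterator has yielded "
                    "{} samples".format(self._len_called, count)
                )
            yield item
```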

Fixes https://github.com/pytorch/pytorch/issues/30184
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23587

Differential Revision: D18852625

Pulled By: ailzhang

fbshipit-source-id: aea8d4d70c7f21aaa69b35908a6f43026493d826
2019-12-06 15:38:05 -08:00
Peter Bell
dcd1216efe Force early initialization of OpenMP in forked children (#29006)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28389

Intel's OpenMP implementation sets the thread affinity on the first call to an OpenMP function after a fork. By adding an atfork handler we can force this to happen before a user tries to set the affinity in their own DataLoader `worker_init_fn`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29006

Differential Revision: D18782456

Pulled By: ezyang

fbshipit-source-id: ce0b515256da0cf18ceb125e0cdec99a3311bbd3
2019-12-03 15:23:31 -08:00
Michael Suo
4b0a6d299c test reporting (#29658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29658

This PR makes our test scripts output artifacts that CircleCI can
understand. This has a few benefits:
1. We can actually see failed tests and their output in the job screen
(instead of having to scroll through logs)
2. We can use the CircleCI test metadata API to track failed tests
programmatically.

It looks like this (old UI):
https://circleci.com/gh/pytorch/pytorch/3546584?pipelines-ui-opt-out
or this (new UI):
https://app.circleci.com/jobs/github/pytorch/pytorch/3546584/tests

Test Plan: Imported from OSS

Differential Revision: D18597261

Pulled By: suo

fbshipit-source-id: 07fc7d26bbb834e13cc4cc0e48178645ae6579f5
2019-11-19 11:15:31 -08:00
Mike Ruberry
f6bda1e07b Removes @default_floating_dtype decorator (#27628)
Summary:
One fewer legacy decorator cluttering the test suite.

Functions relying on this decorator were updated or, in the case of test_sparse, the test suite was put back on double by default.

Note: this PR is blocked on https://github.com/pytorch/pytorch/issues/27599.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27628

Differential Revision: D17896254

Pulled By: mruberry

fbshipit-source-id: 13d460301f50ef4af7a660372432108164c0de1f
2019-10-12 12:39:34 -07:00
Mike Ruberry
7f183a978f Stops common_utils.py from setting the default tensor type (to torch.DoubleTensor) (#27444)
Summary:
This PR stops common_utils.py from setting the default tensor type when it's imported. See issue https://github.com/pytorch/pytorch/issues/27355. This is a frequent source of confusion for test writers.

Many tests relied on this setting (whether they knew it or not), and this PR also updates the test suite to pass without common_utils.py setting the default tensor type. Some larger test files now set the default floating dtype themselves, however. These test files are:

- test_autograd.py
- test_distributions.py
- test_jit.py
- test_nn.py

This is still a significant improvement from today, however. First, these files set the default floating dtype much more clearly than importing it from common_utils. Second, the rest of the test suite no longer sets this globally. Third, this PR is a springboard to updating those tests, too. In particular, as tests are made generic they can be moved away from relying on this global setting.

Notable technical changes in this PR are:

- Significant updates to test_torch.py to make it pass without setting the default floating dtype globally.
- The default_floating_dtype decorator is now defined in common_utils; a couple of versions of this decorator were previously defined in test files.
- test_torch-specific parts of common_utils were refactored into test_torch.
- tensor creation methods in common_utils were updated to accept an optional dtype and device.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27444

Differential Revision: D17795235

Pulled By: mruberry

fbshipit-source-id: 7f77271c0c836e69f183ad9057a2c4b29f09d2e1
2019-10-08 09:52:44 -07:00
SsnL
df9d8f9032 Fix no auto batching bugs: cannot bulk load; not work with namedtuple (#26065)
Summary:
see title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26065

Differential Revision: D17392851

Pulled By: soumith

fbshipit-source-id: 468cd41c8e03d689ff2e0261d948e28daad6bfaf
2019-09-16 07:22:31 -07:00
Pritam Damania
f8611eaa7e Disable tsan for test_dataloader.py. (#25005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25005

Seeing a bunch of failures in TSAN mostly with the following error:

```
ThreadSanitizer: starting new threads after multi-threaded fork is not
supported. Dying (set die_after_fork=0 to override)
```

TSAN is unsafe to use in a multi-threaded program after fork() and setting
die_after_fork can lead to deadlocks. As a result, I'm disabling tsan.
ghstack-source-id: 88765698

Differential Revision: D16954347

fbshipit-source-id: 18895cd82b5052938284b46479d8470af2d74a06
2019-08-22 16:20:54 -07:00
Tongzhou Wang
928754b67d make more iterator attributes private (#23744)
Summary:
1. Prefixed underscores to any `DataLoaderIter` attribute that is not part of the data loader ctor argument list.
2. Prefixed `DataLoader.dataset_kind` with underscore because it only makes sense with the private enum `_DatasetKind`, and is an implementation detail.
3. Disallow setting `DataLoader.dataset` and `DataLoader.batch_sampler` after initializing a `DataLoader` because they affect other attributes in `__init__`.

These changes should not have a major BC-breaking effect, since the big changes are on the iterator class and most users don't even store it. I searched GitHub for `pin_memory_thread` and (while I didn't look through all result pages) the results I saw were forks of pytorch and blog posts on how the data loader works.
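
Point 3 can be illustrated with a `__setattr__` guard. The class below is a hypothetical sketch, not the `DataLoader` source; the attribute names come from the list above and the error message is made up.

```python
class FrozenAttrLoader:
    """Hypothetical loader whose key attributes are fixed after __init__."""

    _FROZEN = ("dataset", "batch_sampler")

    def __init__(self, dataset, batch_sampler=None):
        self._initialized = False
        self.dataset = dataset
        self.batch_sampler = batch_sampler
        # From here on, assignments to frozen attributes are rejected.
        self._initialized = True

    def __setattr__(self, name, value):
        if getattr(self, "_initialized", False) and name in self._FROZEN:
            raise ValueError(
                "{} attribute should not be set after {} is initialized".format(
                    name, type(self).__name__
                )
            )
        super().__setattr__(name, value)
```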
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23744

Differential Revision: D16732507

Pulled By: ezyang

fbshipit-source-id: 9f04d000b4200b8047f31eaa3473780b66cebd26
2019-08-09 11:43:00 -07:00
SsnL
e982e46de3 Add multiprocessing_context= argument to DataLoader (#22990)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22131
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22990

Differential Revision: D16539052

Pulled By: colesbury

fbshipit-source-id: b1c48ae2fb54065dd96a67be263254129e02eaa2
2019-07-29 12:58:40 -07:00
Jan Schlüter
0bc90194fb Catch and print exception traceback in parallel_apply() workers (#18055)
Summary:
When an exception occurs in one of the modules passed to `parallel_apply()`, it is caught and re-raised in the main thread. This preserves the original exception type and message, but has the traceback point at the position where it's re-raised, rather than the original point of failure.

This PR saves the exception information required to generate the traceback, and includes the original traceback in the message of the exception raised in the main thread.

Before:
```
  ...
  File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File ".../torch/nn/parallel/parallel_apply.py", line 84, in parallel_apply
    raise output
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```

After:
```
  ...
  File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File ".../torch/nn/parallel/parallel_apply.py", line 88, in parallel_apply
    ''.join(traceback.format_exception(*exc_info)))
RuntimeError: Caught exception in replica 0. Original traceback and message:
Traceback (most recent call last):
  ...
  File "../models/foo.py", line 319, in bar
    baz = asdf / ghij[:, np.newaxis]
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```

I took care to raise an exception of the original type (in case the main code checks for that), but replaced the message. It helped me find a bug that did not occur outside `data_parallel()`.
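
The approach can be sketched with plain threads. `run_in_threads` and `_Failure` are hypothetical names, and this is a simplified illustration of the technique rather than the code in `parallel_apply.py`.

```python
import sys
import threading
import traceback


class _Failure:
    """Holds the exc_info tuple captured inside a worker thread."""

    def __init__(self, exc_info):
        self.exc_info = exc_info


def run_in_threads(funcs):
    results = [None] * len(funcs)

    def worker(i, fn):
        try:
            results[i] = fn()
        except Exception:
            # Keep type, value and traceback so the origin is not lost.
            results[i] = _Failure(sys.exc_info())

    threads = [threading.Thread(target=worker, args=(i, fn))
               for i, fn in enumerate(funcs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    for i, res in enumerate(results):
        if isinstance(res, _Failure):
            original = "".join(traceback.format_exception(*res.exc_info))
            # Re-raise the same type; assumes it accepts a single message.
            raise res.exc_info[0](
                "Caught exception in replica {}. Original traceback and "
                "message:\n{}".format(i, original)
            )
    return results
```

Callers that catch the original exception type keep working, while the message now names the line where the failure actually happened.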
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18055

Differential Revision: D16444972

Pulled By: zhangguanheng66

fbshipit-source-id: ec436c9d4677fad18106a8046cfa835a20a101ce
2019-07-26 11:41:22 -07:00
Tongzhou Wang
25eae3ed08 Disable test_proper_exit flaky worker_kill (#22208)
Summary:
I learned from https://github.com/pytorch/pytorch/pull/22058 that `worker_kill` is just flaky, regardless of `hold_iter_reference`. So let's disable it altogether for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22208

Differential Revision: D15990307

Pulled By: soumith

fbshipit-source-id: d7d3f4fe7eaac4987f240cb8fd032c73a84157d7
2019-06-26 09:47:40 -07:00
Tongzhou Wang
71741ba115 rename test to be more consistent
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22057

Differential Revision: D15936870

Pulled By: soumith

fbshipit-source-id: ab6194219da2582efdf324b89b2bc87dfe4e5d69
2019-06-20 22:02:36 -07:00
Tongzhou Wang
058beae411 Add IterableDataset (#19228)
Summary:
This is a modified version of https://github.com/pytorch/pytorch/pull/14705 since commit structure for that PR is quite messy.

1. Add `IterableDataset`.
2. So we have 2 data loader modes: `Iterable` and `Map`.

    1. `Iterable` if the `dataset` is an instance of `IterableDataset`
    2. `Map` otherwise.

3. Add better support for non-batch loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful in doing things like bulk loading.
4. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
5. Add `torch.utils.data.get_worker_info` which returns worker information in a worker proc (e.g., worker id, dataset obj copy, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` to do per-worker configuration.
6. Add `ChainDataset`, which is the analog of `ConcatDataset` for `IterableDataset`.
7. Import torch.utils.data in `torch/__init__.py`
8. Data loader examples and documentation
9. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`
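
As an illustration of the per-worker configuration mentioned in the `get_worker_info` item, the sketch below splits a range of samples so that each worker reads a disjoint shard. `worker_shard` is a hypothetical pure-Python helper; in a real `IterableDataset.__iter__` the `worker_id` and `num_workers` values would presumably come from the object returned by `get_worker_info()`.

```python
import math


def worker_shard(start, end, worker_id, num_workers):
    """Hypothetical helper: the sub-range of [start, end) for one worker."""
    per_worker = math.ceil((end - start) / num_workers)
    shard_start = start + worker_id * per_worker
    shard_end = min(shard_start + per_worker, end)
    # An empty range is returned when there are more workers than samples.
    return range(shard_start, shard_end)
```

Without this kind of split, every worker would iterate over the full dataset copy and each sample would be yielded once per worker.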

Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228

Reviewed By: bddppq

Differential Revision: D15058152

fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde
2019-06-20 20:12:44 -07:00
Iurii Zdebskyi
03617574d3 Change type of a tensor with bools (#19097)
Summary:
**This is a bc-breaking change.**
Change dtype of a tensor which was created from bool data.
Old behavior: torch.tensor([True, False]) -> uint8 tensor
Now: torch.tensor([True, False]) -> bool tensor

Tested via tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19097

Reviewed By: ezyang

Differential Revision: D15632553

Pulled By: izdeby

fbshipit-source-id: b019150844c561a6845710a3c62b12f06b68bbe3
2019-06-05 10:19:27 -07:00
Tongzhou Wang
f051fbd4a8 Fix typo in test_dataloader
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21226

Differential Revision: D15592797

Pulled By: soumith

fbshipit-source-id: b9a83e574c7b10fb0d661332ab68e376409a4724
2019-06-01 10:30:14 -07:00
Tongzhou Wang
1d4685c20f Improve test_proper_exit error printing (#20166)
Summary:
This doesn't have `strace` yet, but it still uses `faulthandler` to print stack traces when the process hangs. It is also part of an attempt to isolate changes from #19228.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20166

Differential Revision: D15536504

Pulled By: ezyang

fbshipit-source-id: fe6e6e2e9899f30d8167436d7bc62b42883a3356
2019-05-29 07:51:31 -07:00
Tongzhou Wang
f496ea36b2 DataLoader: add error detection for worker_init_fn (#20150)
Summary:
This is an attempt to isolate unrelated changes from #19228 for easier review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20150

Differential Revision: D15314891

Pulled By: ezyang

fbshipit-source-id: 8c429747ba83ad5aca4cdd8f8086bcf65a326921
2019-05-12 18:28:56 -07:00
Tongzhou Wang
1ab33fce9a Disable worker_kill & hold_iter_reference combination in test_proper_exit (#20172)
Summary:
cc nairbv
All failures I have seen are of this combination, so let's just disable it for all cases. After #20063 I saw it fail once on py3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20172

Differential Revision: D15266527

Pulled By: nairbv

fbshipit-source-id: afb9389dfc54a0878d52975ffa37a0fd2aa3a735
2019-05-08 14:39:47 -07:00
Brian Vaughan
9005a2c0fc disable flaky test_proper_exit again, still occasionally failing (#20063)
Summary:
The test was disabled for being flaky and re-enabled in https://github.com/pytorch/pytorch/pull/19421, but it is still occasionally failing:

https://circleci.com/gh/pytorch/pytorch/1520165?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

```
Apr 29 19:51:58 ======================================================================
Apr 29 19:51:58 FAIL: test_proper_exit (__main__.TestDataLoader)
Apr 29 19:51:58 There might be ConnectionResetError or leaked semaphore warning (due to dirty process exit), but they are all safe to ignore
Apr 29 19:51:58 ----------------------------------------------------------------------
Apr 29 19:51:58 Traceback (most recent call last):
Apr 29 19:51:58   File "/var/lib/jenkins/workspace/test/common_utils.py", line 129, in wrapper
Apr 29 19:51:58     fn(*args, **kwargs)
Apr 29 19:51:58   File "test_dataloader.py", line 847, in test_proper_exit
Apr 29 19:51:58     self.fail(fail_msg + ', and had exception {}'.format(loader_p.exception))
Apr 29 19:51:58 AssertionError: test_proper_exit with use_workers=True, pin_memory=False, hold_iter_reference=False, exit_method=worker_kill: loader process did not terminate, and had exception Traceback (most recent call last):
Apr 29 19:51:58   File "test_dataloader.py", line 227, in run
Apr 29 19:51:58     super(ErrorTrackingProcess, self).run()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/process.py", line 114, in run
Apr 29 19:51:58     self._target(*self._args, **self._kwargs)
Apr 29 19:51:58   File "test_dataloader.py", line 424, in _test_proper_exit
Apr 29 19:51:58     for i, _ in enumerate(it):
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 545, in __next__
Apr 29 19:51:58     idx, batch = self._get_batch()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 522, in _get_batch
Apr 29 19:51:58     success, data = self._try_get_batch()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 480, in _try_get_batch
Apr 29 19:51:58     data = self.data_queue.get(timeout=timeout)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/queues.py", line 135, in get
Apr 29 19:51:58     res = self._recv()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 22, in recv
Apr 29 19:51:58     return pickle.loads(buf)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1382, in loads
Apr 29 19:51:58     return Unpickler(file).load()
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 858, in load
Apr 29 19:51:58     dispatch[key](self)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1133, in load_reduce
Apr 29 19:51:58     value = func(*args)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/reductions.py", line 274, in rebuild_storage_fd
Apr 29 19:51:58     fd = multiprocessing.reduction.rebuild_handle(df)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 155, in rebuild_handle
Apr 29 19:51:58     conn = Client(address, authkey=current_process().authkey)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/connection.py", line 169, in Client
Apr 29 19:51:58     c = SocketClient(address)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/multiprocessing/connection.py", line 304, in SocketClient
Apr 29 19:51:58     s.connect(address)
Apr 29 19:51:58   File "/opt/python/2.7.9/lib/python2.7/socket.py", line 224, in meth
Apr 29 19:51:58     return getattr(self._sock,name)(*args)
Apr 29 19:51:58 error: [Errno 111] Connection refused

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20063

Differential Revision: D15218223

Pulled By: nairbv

fbshipit-source-id: 32018c4220f7cb9372ef138631fc3a79759265e1
2019-05-06 08:34:27 -07:00
Seungwon Park
6c7135decb fix typo: pytoch -> pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19719

Differential Revision: D15080095

Pulled By: ezyang

fbshipit-source-id: b731a0fde87d25c63c1e3d4b9a9c2244e5ad84af
2019-04-25 06:40:40 -07:00
SsnL
5e62ee2b97 Fix no SIGCHLD checking in DataLoaderIter._shutdown_workers (#19421)
Summary:
Also

1. Bump multiprocessing test timeout following python core tests
2. Fix one type of flakiness in `test_proper_exit`.
3. Add trace reporting when loader process hangs in `test_proper_exit` using `faulthandler`.
4. Give `test_proper_exit` another try.
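
For reference, hang reporting with `faulthandler` can look roughly like the sketch below. `run_with_hang_report` is a hypothetical helper, and how the test itself wires this up is not shown in this message.

```python
import faulthandler
import tempfile
import time


def run_with_hang_report(fn, timeout_s, report_file):
    """Hypothetical helper: run fn(), dumping all thread stacks if it is slow."""
    # Arm a watchdog that writes tracebacks to report_file after timeout_s
    # seconds; exit=False keeps the process alive after the dump.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=report_file)
    try:
        return fn()
    finally:
        # Disarm the watchdog once fn() has returned or raised.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    with tempfile.TemporaryFile("w+") as report:
        run_with_hang_report(lambda: time.sleep(0.5), 0.1, report)
        report.seek(0)
        print(report.read())  # stack traces captured while fn() was sleeping
```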

I'll heavily retest this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19421

Differential Revision: D15063728

Pulled By: ezyang

fbshipit-source-id: 4e0d992622e11053c44a9ec237b88b9a28a4472c
2019-04-24 08:06:58 -07:00
Edward Yang
8793e8db42 Disable flaky test_proper_exit test. (#18950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18950
ghimport-source-id: 27bd575fd3c73a51ace1360aa020fa63a792a5d2

Differential Revision: D14802009

Pulled By: ezyang

fbshipit-source-id: 051e1d038892c2c6e8337357fa80771b8dc42680
2019-04-05 09:49:54 -07:00
Edward Yang
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
Tongzhou Wang
d1e416ac73 Enable printing to stderr for test_proper_exit for better debugging (#18458)
Summary:
related to https://github.com/pytorch/pytorch/issues/16608
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18458

Differential Revision: D14611718

Pulled By: soumith

fbshipit-source-id: 6dc903ff2d32b9c3b76470869d1f4e9a67f706df
2019-03-25 19:20:21 -07:00
Edward Yang
2934153f35 Correctly call superclass setUp in TestCase subclasses. (#18291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18291
ghimport-source-id: d6e95e899bd320407967df41435801e54864ba62

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18292 Add test for #17271 (torch.exp incorrect for 2**31 size tensor)
* **#18291 Correctly call superclass setUp in TestCase subclasses.**

This makes PYTORCH_TEST_SKIP_FAST work correctly for more
tests, reducing the wasted testing effort on our slow_test job.
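
Why the missing superclass call matters can be sketched with plain `unittest`. The names below (`SKIP_FAST`, `is_slow`, `BaseTestCase`) are hypothetical stand-ins for the PYTORCH_TEST_SKIP_FAST machinery, assuming the skip decision lives in the base class `setUp`:

```python
import unittest

SKIP_FAST = True  # hypothetical stand-in for the PYTORCH_TEST_SKIP_FAST switch


class BaseTestCase(unittest.TestCase):
    is_slow = False  # hypothetical marker for slow tests

    def setUp(self):
        # The skip logic lives here, so it only runs if subclasses call it.
        if SKIP_FAST and not self.is_slow:
            self.skipTest("fast test skipped on the slow-test job")


class GoodCase(BaseTestCase):
    def setUp(self):
        super().setUp()  # base-class skip check runs first
        self.data = [1, 2, 3]

    def test_len(self):
        self.assertEqual(len(self.data), 3)


class ForgetfulCase(BaseTestCase):
    def setUp(self):
        self.data = [1, 2, 3]  # no super().setUp(): the skip check is bypassed

    def test_len(self):
        self.assertEqual(len(self.data), 3)


def run(case):
    """Run every test in a TestCase class and return the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result


good = run(GoodCase)
forgetful = run(ForgetfulCase)
```

In this sketch `GoodCase` is skipped as intended, while `ForgetfulCase` still runs its fast test, which is the wasted effort described above.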

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14567643

fbshipit-source-id: 40cf1d6556e0dd0a0550ff3d9ffed8b6000f8191
2019-03-22 07:46:44 -07:00
Tongzhou Wang
f212fd9fd6 Customized pin_memory for PackedSequence (#18079)
Summary:
fixes https://github.com/pytorch/pytorch/issues/18078
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18079

Reviewed By: ezyang

Differential Revision: D14521192

Pulled By: zou3519

fbshipit-source-id: cec773a3a6f2c405a0d9701e213b7caf81649181
2019-03-19 13:41:30 -07:00
Edward Yang
d391137acd Fix lint in test_dataloader.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17878

Reviewed By: eellison

Differential Revision: D14409933

fbshipit-source-id: 20ee8953a21e29b4557aff62b5e48dddd630eef6
2019-03-11 14:50:51 -07:00
Edward Yang
b3c9090736 Revert D14392864: Fix lint in test_dataloader.py
Differential Revision: D14392864

Original commit changeset: 12477b9cfe29

fbshipit-source-id: 1864a80d5cfaceeae55d0145340a578f978ab4a7
2019-03-11 10:19:41 -07:00
Edward Yang
c02369151d Fix lint in test_dataloader.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17820

Reviewed By: eellison

Differential Revision: D14392864

fbshipit-source-id: 12477b9cfe290428d51cc28e024c8cbe8bb7bf51
2019-03-11 08:01:33 -07:00
bhushan
a6c4ea66dd Passing indices as a list to Subset instead of Tensor (#17649)
Summary:
Indices in Subset were previously stored as tensors.
random_split now passes them as a list to ensure integer indexing.

fixes: #17466
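
The idea can be sketched in pure Python. This is a simplified illustration, not the `torch.utils.data` implementation: the `seed` parameter and the use of the `random` module are assumptions made to keep the example self-contained.

```python
import random


class Subset:
    """View of a dataset restricted to the given list of integer indices."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = indices  # a plain list of Python ints

    def __getitem__(self, idx):
        # self.indices[idx] is an int, so any integer-indexable dataset works.
        return self.dataset[self.indices[idx]]

    def __len__(self):
        return len(self.indices)


def random_split(dataset, lengths, seed=0):
    """Split dataset into non-overlapping Subsets of the given lengths."""
    if sum(lengths) != len(dataset):
        raise ValueError("sum of lengths must equal the dataset length")
    perm = list(range(len(dataset)))
    random.Random(seed).shuffle(perm)  # assumption: stdlib RNG instead of torch
    subsets, offset = [], 0
    for length in lengths:
        subsets.append(Subset(dataset, perm[offset:offset + length]))
        offset += length
    return subsets


data = ["a", "b", "c", "d", "e"]
train, val = random_split(data, [3, 2])
```

Because each stored index is a plain `int`, the wrapped dataset only needs to support ordinary integer indexing.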
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17649

Differential Revision: D14400250

Pulled By: soumith

fbshipit-source-id: cd20a959f33773c4babf8e861ea37ec61c2713a0
2019-03-10 09:23:53 -07:00
youkaichao
b87abdfc12 typo fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17653

Differential Revision: D14302003

Pulled By: ezyang

fbshipit-source-id: 8ad90985a392b07127c7e315d4e74ce77962b573
2019-03-06 11:36:44 -08:00
Eskil Jörgensen
8042edcdb1 Make pin_memory and default_collate preserve namedtuples (#16440)
Summary:
Open issue: https://github.com/pytorch/pytorch/issues/3281
Corresponding PR (conflict): https://github.com/pytorch/pytorch/pull/4577

Another open issue: https://github.com/pytorch/pytorch/issues/14613
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16440
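
Preserving namedtuples while recursing over a batch can be sketched as follows. This is an illustration of the general technique only; `apply_to_leaves` is a hypothetical helper, not the actual `pin_memory` or `default_collate` code.

```python
from collections import namedtuple


def apply_to_leaves(batch, fn):
    """Apply fn to every leaf of a nested batch, keeping container types intact."""
    # Namedtuples are tuples with a _fields attribute; rebuild them field by field
    # so the result is the same namedtuple type, not a plain tuple or list.
    if isinstance(batch, tuple) and hasattr(batch, "_fields"):
        return type(batch)(*(apply_to_leaves(x, fn) for x in batch))
    if isinstance(batch, (list, tuple)):
        return type(batch)(apply_to_leaves(x, fn) for x in batch)
    if isinstance(batch, dict):
        return {k: apply_to_leaves(v, fn) for k, v in batch.items()}
    return fn(batch)


Point = namedtuple("Point", ["x", "y"])
out = apply_to_leaves(Point(1, [2, 3]), lambda v: v * 10)
```

The namedtuple check has to come before the generic tuple branch, because a namedtuple constructor takes its fields as separate positional arguments rather than a single iterable.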

Differential Revision: D14020901

Pulled By: ezyang

fbshipit-source-id: 4abe817fc43c281a510715d311bad544511995d3
2019-02-11 08:47:33 -08:00
Michael Carilli
0742874643 Allow dataloader to accept a custom memory pinning function (#16743)
Summary:
Renewed attempt at https://github.com/pytorch/pytorch/pull/14171

From the original PR:
> Currently, the pin_memory_batch function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.
>
> This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom collate_fn returns a custom batch type.

The old PR allowed the user to implement batch pinning for custom batch and data types by passing a custom pin function to the dataloader.  slayton58 suggested a cleaner approach:  allow the user to define a `pin_memory` method on their custom types, and have `pin_memory_batch` [check for the presence of that method](https://github.com/pytorch/pytorch/pull/16743/files#diff-9f154cbd884fe654066b1621fad654f3R56) in the incoming batch as a fallback.  I've updated the test and docstrings accordingly.
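
The fallback described above can be sketched without PyTorch. Everything here is a hypothetical stand-in: `FakeTensor` only records a `pinned` flag, `CustomBatch` plays the role of a user-defined batch type, and this `pin_memory_batch` is a simplified illustration rather than the function in the PR.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor; only tracks whether it is pinned."""

    def __init__(self, values, pinned=False):
        self.values = values
        self.pinned = pinned

    def pin_memory(self):
        return FakeTensor(self.values, pinned=True)


class CustomBatch:
    """User-defined batch type that knows how to pin its own fields."""

    def __init__(self, inputs, targets):
        self.inputs = inputs
        self.targets = targets

    def pin_memory(self):
        self.inputs = self.inputs.pin_memory()
        self.targets = self.targets.pin_memory()
        return self


def pin_memory_batch(batch):
    """Pin known types recursively; fall back to a pin_memory method if present."""
    if isinstance(batch, FakeTensor):
        return batch.pin_memory()
    if isinstance(batch, str):
        return batch
    if isinstance(batch, dict):
        return {k: pin_memory_batch(v) for k, v in batch.items()}
    if isinstance(batch, (list, tuple)):
        return [pin_memory_batch(x) for x in batch]
    if hasattr(batch, "pin_memory"):
        # Fallback: an unrecognized type that defines pin_memory pins itself.
        return batch.pin_memory()
    return batch  # unrecognized and no pin_memory method: returned unchanged


pinned = pin_memory_batch(CustomBatch(FakeTensor([1.0, 2.0]), FakeTensor([0, 1])))
untouched = pin_memory_batch(42)
```

Putting the hook on the batch type keeps the loader signature unchanged, which matches the reasoning given for preferring this approach over passing a pin function.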

The old PR was merged but then reverted due to weird CUDA OOM errors on Windows that may or may not have been related.  I have no idea why my changes would cause such errors (then or now) but it's something to keep an eye out for.

fmassa and yf225 were my POCs on the old PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16743

Differential Revision: D13991745

Pulled By: ezyang

fbshipit-source-id: 74e71f62a03be453b4caa9f5524e9bc53467fa17
2019-02-10 19:37:53 -08:00
Johannes M Dieterich
23e1c55cc0 enable unit tests working on ROCm 2.1 (#16871)
Summary:
This is the first round of enabling unit tests that work on ROCm 2.1 in my tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16871

Differential Revision: D13997662

Pulled By: bddppq

fbshipit-source-id: d909a3f7dd5fc8f85f126bf0613751c8e4ef949f
2019-02-09 00:30:50 -08:00