243 Commits

Author SHA1 Message Date
Johannes M Dieterich
a4c59a9dab MIOpen integration, more tests enabled, bug fixes (#10612)
Summary:
* first integration of MIOpen for batch norm and conv on ROCm
* work around a ROCm compiler bug exposed by elementwise_kernel through explicit capture of variables in the densest packing
* work around a ROCm compiler bug exposed by having `extern "C" __host__` as a definition and just `__host__` in the implementation through the hipify script
* use fabs() in accordance with C++11 for double absolute, not ::abs() which is integer-only on ROCm
* enable test_sparse set on CI, skip tests that don't work currently on ROCm
* enable more tests in test_optim after the elementwise_bug got fixed
* enable more tests in test_dataloader
* improvements to hipification and ROCm build

With this, resnet18 on CIFAR data trains without hang or crash in our tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10612

Reviewed By: bddppq

Differential Revision: D9423872

Pulled By: ezyang

fbshipit-source-id: 22c0c985217d65c593f35762b3eb16969ad96bdd
2018-08-23 15:24:47 -07:00
iotamudelta
75651d5b58 improve use of ROCm libraries, enable more tests, small fixes (#10406)
Summary:
* some small leftovers from the last PR review
* enable more unit test sets for CI
* replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
* use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
* use strided_batched gemm interface also from the batched internal interface
* re-enable Dropout.cu as we now have philox w/ rocRAND
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10406

Reviewed By: Jorghi12

Differential Revision: D9277093

Pulled By: ezyang

fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2
2018-08-13 11:39:43 -07:00
Tongzhou Wang
04f381650e Resubmit: Fix dataloader hang when it is not completely iterated (#10366)
Summary:
https://github.com/pytorch/pytorch/pull/9655
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10366

Differential Revision: D9237393

Pulled By: SsnL

fbshipit-source-id: fabfad7f371ba33300098f6b885c0e3f26c3e14a
2018-08-09 00:10:24 -07:00
iotamudelta
cfa05706ef ROCm contributions week 29 (#9653)
Summary:
In this changeset:
* improvements to `hipify-python.py`
* marking unit tests broken for ROCm
* reducing the number of jobs for the build to avoid out-of-memory issues
* switch to Thrust/cub-hip master for the CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9653

Differential Revision: D9117791

Pulled By: ezyang

fbshipit-source-id: a6c3c7b81f2bda9825974bf9bf89a97767244352
2018-08-02 09:09:00 -07:00
Tongzhou Wang
a7f183f971 Revert "Fix dataloader hang when it is not completely iterated (#9655)" (#9804)
Summary:
This reverts commit 9ee5133651.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9804

Reviewed By: ezyang

Differential Revision: D8987780

Pulled By: SsnL

fbshipit-source-id: 75ad70b0b8d672d0b35235fa248b187be64b68e5
2018-07-25 10:10:30 -07:00
Tongzhou Wang
a387331e54 Re-enable test_segfault after recent dataloader changes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9700

Differential Revision: D8953615

Pulled By: SsnL

fbshipit-source-id: c6aa3c07dd2857dd54889d47e537a6b1e9198c60
2018-07-23 18:38:42 -07:00
Tongzhou Wang
9ee5133651 Fix dataloader hang when it is not completely iterated (#9655)
Summary:
second trial of https://github.com/pytorch/pytorch/pull/7140

cc csarofeen Let's see if this works. It passes everything locally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9655

Differential Revision: D8940177

Pulled By: SsnL

fbshipit-source-id: 8d6340fc9f7355c71e1e26b262da166402faa158
2018-07-22 20:38:27 -07:00
Tongzhou Wang
050a2588b5 change stft to have consistent signature with librosa (#9497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9497

Fixes #7883 by using `rfft`.

It's worth noting that this is BC breaking, and the change is impossible to detect because the signatures before and after it support a common subset of calling patterns, e.g., `stft(Tensor, int, int)`. (Some other calling patterns will raise an error.)

soumith and I plan to change the current `stft` interface because it is a bit messy and non-standard. rafaelvalle suggested to us that `librosa` is a good reference API to align with. After discussing with soumith and ezyang, and given that `stft` has only been out for 1 release, I decided to change the signature directly. Also, my understanding is that most researchers in this field will welcome this change, as `librosa` seems to be the gold standard here. (It doesn't yet support all `pad_mode` options, but those will become available if added to `F.pad`.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9308

Reviewed By: ezyang

Differential Revision: D8806148

Pulled By: SsnL

fbshipit-source-id: f6e8777d0c34d4a4d7024e638dc9c63242e8bb58
2018-07-17 10:55:43 -07:00
Will Feng
90fd4df695 Add flag for disabling tests with multiprocessing spawn start method (#9061)
Summary:
This will resolve some of the timeout issues in CPU and GPU tests internally.
Closes https://github.com/pytorch/pytorch/pull/9061

Reviewed By: ezyang

Differential Revision: D8707471

Pulled By: yf225

fbshipit-source-id: 9dc82a2c9da0c540ae015442f74b9b2b1a67a246
2018-06-30 14:39:11 -07:00
Will Feng
c84b97b979 [READY TO MERGE] Enable tests that use DataLoader with multiple workers on Windows (#6745)
* Don't import TEST_CUDA for test_dataloader on Windows

* test_partial_workers is stuck on Windows
2018-06-06 22:50:39 -04:00
Will Feng
e8bdbdaa27 Terminate dataloader workers properly when parent process is SIGKILL'ed (#6779)
Reopening #6606 with fix for TEST_CUDA import issue on Windows and improvement to how we wait for manager exit in test_manager_unclean_exit. Loop tested on the Windows CI multiple times to make sure this actually fixes the CUDA OOM issue.

* Terminate dataloader workers properly when parent process is SIGKILL'ed

* Wait for worker processes to finish before shutting down manager process

* Add test for checking proper worker exit

* cosmetic change

* Test only if CUDA exists

* Don't call multiprocessing.set_start_method() in Python 2

* import TEST_CUDA only when we are in __main__

* Tune JOIN_TIMEOUT

* handle os.getppid() == 0 case

* Reset to original JOIN_TIMEOUT

* Use WaitForSingleObject() to check parent process status on Windows

* Fix TEST_CUDA import

* clean up

* Check main process only when index_queue.get() times out

* Change index_queues to multiprocessing.Queue

* Move manager checking logic to watchdog class

* Fix bugs in dataloader

* Fix TEST_CUDA import issue

* Don't import TEST_CUDA from common_nn

* Use event to signal manager exit in test

* fix lint

* Add comments
2018-04-22 23:03:54 -04:00
gchanan
4c5b95a433 Revert "Terminate dataloader workers properly when parent process is SIGKILL'ed (#6606)" (#6772)
This reverts commit 8d6a50aaeb.
2018-04-19 14:28:48 -04:00
Will Feng
8d6a50aaeb Terminate dataloader workers properly when parent process is SIGKILL'ed (#6606)
* Terminate dataloader workers properly when parent process is SIGKILL'ed

* Wait for worker processes to finish before shutting down manager process

* Add test for checking proper worker exit

* cosmetic change

* Test only if CUDA exists

* Don't call multiprocessing.set_start_method() in Python 2

* import TEST_CUDA only when we are in __main__

* Tune JOIN_TIMEOUT

* handle os.getppid() == 0 case

* Reset to original JOIN_TIMEOUT

* Use WaitForSingleObject() to check parent process status on Windows

* Fix TEST_CUDA import

* clean up

* Check main process only when index_queue.get() times out

* Change index_queues to multiprocessing.Queue

* Move manager checking logic to watchdog class

* Fix bugs in dataloader

* Fix TEST_CUDA import issue
2018-04-18 20:41:33 -04:00
Tongzhou Wang
60a16e5663 Set dataloader.batch_size = None when batch_sampler is given (#6108) 2018-03-30 10:01:09 +02:00
Jason Park
64e2c03bea Enable TensorDataset to get any number of tensors (#6038)
Keeping compatibility, enable TensorDataset to get any number of tensors.

* Enable TensorDataset to get any number of tensors

* Update dataset.py

Fix syntax error on python 2.7

* Add several tests for tensordataset

* Fix whitespaces

* Simplify args

* Update dataset.py
2018-03-28 11:20:50 -04:00
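
The idea of a dataset that accepts any number of parallel inputs can be sketched without PyTorch. This is a minimal illustration only: `MultiSequenceDataset` is a hypothetical name, and plain Python sequences stand in for tensors. It is not the code from the commit above.

```python
class MultiSequenceDataset:
    """Hypothetical stand-in for a dataset built from N parallel sequences."""

    def __init__(self, *sequences):
        if not sequences:
            raise ValueError("at least one sequence is required")
        # All sequences must describe the same number of samples.
        if any(len(seq) != len(sequences[0]) for seq in sequences):
            raise ValueError("all sequences must have the same length")
        self.sequences = sequences

    def __getitem__(self, index):
        # A sample is a tuple holding the index-th element of every sequence.
        return tuple(seq[index] for seq in self.sequences)

    def __len__(self):
        return len(self.sequences[0])


dataset = MultiSequenceDataset([1, 2, 3], "abc", [0.1, 0.2, 0.3])
sample = dataset[1]
```

With two sequences this behaves like a classic (data, target) pair, which is how the variadic form can stay compatible with the older two-argument usage.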
AlexanderRadionov
831780390c Fixed non-determinate preprocessing on DataLoader (#4640)
Added ind_worker_queue parameter to data.DataLoader. It makes preprocessing deterministic.

DataLoader in multiprocessing mode may produce non-deterministic results. Even if the random seed is fixed, each subprocess may get tasks in an unstable order. This is caused by varying I/O time while data loads. If you use augmentation while loading data, it makes results irreproducible. See https://discuss.pytorch.org/t/deterministic-non-deterministic-results-with-pytorch/9087

To fix this issue I have added an individual queue for each worker. In this case each worker gets tasks in a stable order. As a result, the subprocesses produce stable results.

To reproduce the issue you may change ind_worker_queue to False and run the script several times.
The code to reproduce the issue is in the corresponding PR.

* TestIndividualWorkerQueue added to DataLoader tests

* Review fixes

* "Simplify" code by removing itertools

* Rebase conflicts fix

* Review fixes

* Fixed shutdown behavior

* Removed ind_worker_queue flag.

* Rebase on master

* Disable tests that use DataLoader with multiple workers (#5322)
2018-03-23 17:43:59 -04:00
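
The per-worker queue idea described in the commit above can be sketched in plain Python. This is an illustration under assumptions, not the PyTorch implementation: `assign_batches` is a hypothetical helper, and round-robin assignment is assumed as the rule that maps batches to workers.

```python
from collections import deque


def assign_batches(batch_indices, num_workers):
    """Hypothetical helper: give each worker its own queue of batches."""
    # One private queue per worker instead of a single shared queue.
    queues = [deque() for _ in range(num_workers)]
    for batch_id, indices in enumerate(batch_indices):
        # Assumed rule: batch k always goes to worker k % num_workers,
        # so the assignment does not depend on how fast each worker runs.
        queues[batch_id % num_workers].append((batch_id, indices))
    return queues


batches = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
queues = assign_batches(batches, num_workers=2)
worker0_batches = [batch_id for batch_id, _ in queues[0]]
worker1_batches = [batch_id for batch_id, _ in queues[1]]
```

With a single shared queue, whichever worker finishes first takes the next batch, so the batch-to-worker mapping varies with I/O timing. With fixed per-worker queues the mapping is the same on every run, which keeps per-worker random augmentation reproducible.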
Will Feng
0340e46f9b Disable tests that use DataLoader with multiple workers (#5322) 2018-02-21 09:20:37 -05:00
Tongzhou Wang
964707e9b5 temporarily disable test_segfault until we figure out why it intermittently fails on cuda CI workers (#4976) 2018-01-31 19:04:44 -05:00
Tongzhou Wang
64a9ecae02 Dataloader issues (#4643)
* EINTR and kill by loader fix

* addressed @apaszke 's comments

* remove EINTR handling and add test if we are in main thread before setting SIGCHLD
2018-01-29 01:18:17 +01:00
peterjc123
2dd7039b6b Fix multiprocessing and dataloader tests on Windows (#4453) 2018-01-06 17:41:36 +01:00
Tongzhou Wang
cc9dc3f343 add lock for SynchronizedSeedDataset; add additional os level close stderr for tests that launch failing process (#4463) 2018-01-03 22:45:05 -05:00
Alykhan Tejani
18a866aedd Add random_split to torch.utils.data.dataset (#4435) 2018-01-02 18:56:49 +01:00
Will Feng
1681d07199 Disable tests and fix issues with Windows CUDA build (#4251) 2017-12-20 11:30:21 +01:00
Tongzhou Wang
5cc26c0c90 Add default PyTorch seeding and worker_init_fn to DataLoader (#4018)
* Add default PyTorch seeding and worker_init_fn to DataLoader

* generate seed using current RNG each time

* worker_seed <- main_proc_RNG_generated_seed + worker_id
2017-12-18 02:19:08 -05:00
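
The seeding rule in the commit above (`worker_seed <- main_proc_RNG_generated_seed + worker_id`) can be sketched with Python's standard `random` module. `make_worker_seeds` is a hypothetical helper, and `random.Random` stands in for the main-process RNG; the real DataLoader uses PyTorch's own generator.

```python
import random


def make_worker_seeds(num_workers, main_rng):
    """Hypothetical helper: derive one seed per worker from the main RNG."""
    # A fresh base seed is drawn from the main-process RNG on every call,
    # so each new pass over the data gets different worker seeds.
    base_seed = main_rng.getrandbits(63)
    # worker_seed <- base_seed + worker_id
    return [base_seed + worker_id for worker_id in range(num_workers)]


main_rng = random.Random(0)
first_epoch = make_worker_seeds(4, main_rng)
second_epoch = make_worker_seeds(4, main_rng)
```

Seeding the main RNG therefore fixes every worker seed, while workers within one pass never share a seed.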
Will Feng
db446d69ca Fix issues with Windows 7 & 10 CPU build (#4065) 2017-12-15 10:14:43 +01:00
SsnL
1661370ac5 Signal handling in DataLoader workers; Timeout option (#3474) 2017-11-29 23:52:14 +01:00
Richard Zou
e579ae75b5 Fix error when default_collate is passed a collection of numpy.str_ (#3404)
* Fix error when default_collate is passed a collection of numpy.str_

* Error if default_collate input is nested nparray containing non-numbers
2017-11-08 10:02:08 -05:00
Tzu-Wei Huang
618026e999 implements operator + for Dataset class (#3180)
* implements operator + for Dataset class

* check for exact equivalent
2017-10-29 01:19:59 +05:30
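
An `operator +` for datasets can be sketched as an `__add__` method that returns a concatenating wrapper. The classes below are hypothetical stand-ins written in plain Python, not the PyTorch `Dataset` or `ConcatDataset` classes.

```python
class SketchDataset:
    """Hypothetical minimal dataset backed by a list."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        return self.items[index]

    def __add__(self, other):
        # `a + b` yields a dataset that serves a's samples, then b's.
        return SketchConcat([self, other])


class SketchConcat(SketchDataset):
    """Hypothetical concatenation of several datasets."""

    def __init__(self, datasets):
        self.datasets = list(datasets)

    def __len__(self):
        return sum(len(d) for d in self.datasets)

    def __getitem__(self, index):
        if index < 0 or index >= len(self):
            raise IndexError(index)
        # Walk the parts, shifting the index past each one that is skipped.
        for dataset in self.datasets:
            if index < len(dataset):
                return dataset[index]
            index -= len(dataset)


combined = SketchDataset("ab") + SketchDataset("cde")
all_items = [combined[i] for i in range(len(combined))]
```

Because the wrapper inherits `__add__`, the result of one concatenation can itself be added to another dataset.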
Valentin Haenel
d592e188f7 port of ConcatDataset (#1902) 2017-06-27 12:31:56 -04:00
Sam Gross
f09027bc29 Add batch sampler to DataLoader (#1867) 2017-06-22 20:18:31 +02:00
Isac Arnekvist
156fe28666 dataloader can now handle growing datasets (#1575) 2017-05-17 19:23:15 -04:00
Sasank Chilamkurthy
94b147fd41 Allows dicts batches in dataloader. (#1354)
* Allow dicts in Dataloader

* use collections.Sequence instead of collections.Iterable in dataloader
2017-04-28 19:14:52 +02:00
Adam Paszke
605b3c86ce Retain the type of numpy scalars in collate_fn 2017-04-11 14:48:54 -07:00
Eli Stevens
e216f557fd Fixes issue returning strings from a Dataloader with pin_memory=True (#908) 2017-03-13 10:11:07 +01:00
Adam Paszke
7ea6ae57c8 Support numpy arrays in default_collate 2017-02-20 23:28:31 -08:00
zhtvk
4d37ef878c Remove view on data and target tensors of dim 1 in TensorDataset (#609) 2017-02-09 22:06:39 +01:00
Luke Yeager
e7c1e6a8e3 [pep8] Fix most lint automatically with autopep8
Here's the command I used to invoke autopep8 (in parallel!):

    git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i

Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.

Also configures flake8 to match pep8's behavior.

Also configures TravisCI to check the whole project for lint.
2017-01-28 01:15:51 +01:00
Adam Paszke
a1fa995044 Fixes and improvements (#593)
* Fix error in ELU backward

* Add --seed flag for tests

* Add test for BatchNorm eval

* Fix autograd.backward docs

* Support cc flags in cuDNN search

* Fix IndexSelect backward formula
2017-01-25 22:21:49 -05:00
Sam Gross
ac8a5e7f0d Remove error message assertion (#480)
Depending on how PyTorch is compiled, the source code for DataLoader
might not be fully available which can cause a spurious error in
test_dataloader.py
2017-01-18 13:16:38 -05:00
Sergey Zagoruyko
89d930335b fix tests for GPU-less setup (#298) 2016-12-12 10:56:57 +01:00
Sam Gross
be3276fcdd Account for batch_size in DataLoader.__len__() (#277) 2016-12-02 01:21:36 -05:00
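
A loader length that accounts for `batch_size` counts batches rather than samples. A minimal sketch, assuming the final partial batch is kept; `loader_len` is a hypothetical function, not the method from the commit above.

```python
def loader_len(num_samples, batch_size):
    """Hypothetical helper: number of batches needed to cover all samples."""
    # Ceiling division in integer arithmetic: a trailing partial batch
    # still counts as one batch (assumption: it is not dropped).
    return (num_samples + batch_size - 1) // batch_size


full_batches_only = loader_len(9, 3)
with_partial_batch = loader_len(10, 3)
```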
Sam Gross
aea6ba4bcd Support pinned memory in the DataLoader (#265)
DataLoader now supports the constructor argument 'pin_memory'. When set
to true, tensors in the sample are copied to pinned memory. This happens
in a background thread when num_workers > 1.
2016-11-29 12:35:03 -05:00
Sam Gross
6db721b5dd Make DataLoader preserve the ordering of the dataset (#135) 2016-10-21 23:54:16 -04:00