Summary:
In this changeset:
* improvements to `hipify-python.py`
* marking unit tests broken for ROCm
* reducing the number of jobs for the built to avoid out of memory issues
* switch to Thrust/cub-hip master for the CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9653
Differential Revision: D9117791
Pulled By: ezyang
fbshipit-source-id: a6c3c7b81f2bda9825974bf9bf89a97767244352
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9497Fixes#7883 by using `rfft`.
It's worth noting that this is BC breaking. And it's impossible to detect the change because the two signatures before and after this change supports a common subset of calling patterns, e.g., `stft(Tensor, int, int)`. (some other calling patterns will raise error).
soumith and I plan to change the current `stft` interface because it is a bit messy and non-standard. rafaelvalle suggested us that `librosa` is a good reference API to align with. After discussing with soumith and ezyang , and given that `stft` is only out for 1 release, I decide to go with directly changing the signature. Also, my understanding is that most researchers in this field will welcome this change as `librosa` seems to be the golden-standard here. (it doesn't yet support all `pad_mode` but those will become available if added to `F.pad`.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9308
Reviewed By: ezyang
Differential Revision: D8806148
Pulled By: SsnL
fbshipit-source-id: f6e8777d0c34d4a4d7024e638dc9c63242e8bb58
Summary:
This will resolve some of the timeout issues in CPU and GPU tests internally.
Closes https://github.com/pytorch/pytorch/pull/9061
Reviewed By: ezyang
Differential Revision: D8707471
Pulled By: yf225
fbshipit-source-id: 9dc82a2c9da0c540ae015442f74b9b2b1a67a246
Reopening #6606 with fix for TEST_CUDA import issue on Windows and improvement to how we wait for manager exit in test_manager_unclean_exit. Loop tested on the Windows CI multiple times to make sure this actually fixes the CUDA OOM issue.
* Terminate dataloader workers properly when parent process is SIGKILL'ed
* Wait for worker processes to finish before shutting down manager process
* Add test for checking proper worker exit
* cosmetic change
* Test only if CUDA exists
* Don't call multiprocessing.set_start_method() in Python 2
* import TEST_CUDA only when we are in __main__
* Tune JOIN_TIMEOUT
* handle os.getppid() == 0 case
* Reset to original JOIN_TIMEOUT
* Use WaitForSingleObject() to check parent process status on Windows
* Fix TEST_CUDA import
* clean up
* Check main process only when index_queue.get() times out
* Change index_queues to multiprocessing.Queue
* Move manager checking logic to watchdog class
* Fix bugs in dataloader
* Fix TEST_CUDA import issue
* Don't import TEST_CUDA from common_nn
* Use event to signal manager exit in test
* fix lint
* Add comments
* Terminate dataloader workers properly when parent process is SIGKILL'ed
* Wait for worker processes to finish before shutting down manager process
* Add test for checking proper worker exit
* cosmetic change
* Test only if CUDA exists
* Don't call multiprocessing.set_start_method() in Python 2
* import TEST_CUDA only when we are in __main__
* Tune JOIN_TIMEOUT
* handle os.getppid() == 0 case
* Reset to original JOIN_TIMEOUT
* Use WaitForSingleObject() to check parent process status on Windows
* Fix TEST_CUDA import
* clean up
* Check main process only when index_queue.get() times out
* Change index_queues to multiprocessing.Queue
* Move manager checking logic to watchdog class
* Fix bugs in dataloader
* Fix TEST_CUDA import issue
Keeping compatibility, enable TensorDataset to get any number of tensors.
* Enable TensorDataset to get any number of tensors
* Update dataset.py
Fix syntax error on python 2.7
* Add several test for tensordataset
* Fix whitespaces
* Simplify args
* Update dataset.py
dded ind_worker_queue parameter to data.DataLoader. It makes preprocessing determinate.
DataLoader in multiprocessing mode may cause non-deterministic issue. Even if radom_seed has frozen, each subprocess may get tasks in unstable order. This is caused by different I/O time while data loads. If you use augmentation while data loading, it makes results unreproduceble. Look at the https://discuss.pytorch.org/t/deterministic-non-deterministic-results-with-pytorch/9087
To fix this issue I have added the individual queue for each worker. In this case each worker get tasks in the stable order. In summary, subprocess produces the stable results.
To reproduce issue you may change ind_worker_queue to False and run the script several times.
Code to reproduce issue is in the corresponding PR.
* TestIndividualWorkerQueue added to DataLoader tests
* Review fixes
* "Simplify" code by removing itertools
* Rebase conflicts fix
* Review fixes
* Fixed shutdown behavior
* Removed ind_worker_queue flag.
* Rebase on master
* Disable tests that use DataLoader with multiple workers (#5322)
* Add default PyTorch seeding and worker_init_fn to DataLoader
* generate seed using current RNG each time
* worker_seed <- main_proc_RNG_generated_seed + worker_id
Here's the command I used to invoke autopep8 (in parallel!):
git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i
Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.
Also configures flake8 to match pep8's behavior.
Also configures TravisCI to check the whole project for lint.
* Fix error in ELU backward
* Add --seed flag for testst st
* Add test for BatchNorm eval
* Fix autograd.backward docs
* Support cc flags in cuDNN search
* Fix IndexSelect backward formula
Depending on how PyTorch is compiled, the source code for DataLoader
might not be fully available which can cause a spurious error in
test_dataloader.py
DataLoader now supports the constructor argument 'pin_memory'. When set
to true, tensors in the sample are copied to pinned memory. This happens
in a background thread when num_workers > 1.