Summary:
Indices in Subset were stored as tensors earlier
passing as list in random_split to ensure integer indexing
fixes: #17466
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17649
Differential Revision: D14400250
Pulled By: soumith
fbshipit-source-id: cd20a959f33773c4babf8e861ea37ec61c2713a0
Summary:
Currently, the fake tqdm implementation requires an input (whereas real tqdm does not).
This caused a problem in torchvision (https://github.com/pytorch/vision/pull/770), and seems likely to cause minor irritations elsewhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17636
Differential Revision: D14296530
Pulled By: ezyang
fbshipit-source-id: bc077d898773c93dab34c985a7b30525a43e558a
Summary:
Currently, when you pass a negative index to a `Dataset` created with `ConcatDataset`, it simply passes that index to the first dataset in the list. So if, for example, we took `concatenated_dataset[-1]`, this will give us the last entry of the *first* dataset, rather than the last entry of the *last* dataset, as we would expect.
This is a simple fix to support the expected behavior for negative indices.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15756
Reviewed By: ezyang
Differential Revision: D14081811
Pulled By: fmassa
fbshipit-source-id: a7783fd3fd9e1a8c00fd076c4978ca39ad5a8a2a
Summary:
Fix issue #12174 for Mac OSX.
PS: This is a duplicate of PR #16968 that got messed up. Sorry for the confusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16999
Differential Revision: D14050669
Pulled By: zou3519
fbshipit-source-id: a4594c03ae8e0ca91a4836408b6c588720162c9f
Summary:
Renewed attempt at https://github.com/pytorch/pytorch/pull/14171
From the original PR:
> Currently, the pin_memory_batch function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.
>
>This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom collate_fn returns a custom batch type.
The old PR allowed the user to implement batch pinning for custom batch and data types by passing a custom pin function to the dataloader. slayton58 suggested a cleaner approach: allow the user to define a `pin_memory` method on their custom types, and have `pin_memory_batch` [check for the presence of that method](https://github.com/pytorch/pytorch/pull/16743/files#diff-9f154cbd884fe654066b1621fad654f3R56) in the incoming batch as a fallback. I've updated the test and docstrings accordingly.
The old PR was merged but then reverted due to weird cuda OOM errors on windows that may or may not have been related. I have no idea why my changes would cause such errors (then or now) but it's something to keep an eye out for.
fmassa and yf225 who were my POCs on the old PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16743
Differential Revision: D13991745
Pulled By: ezyang
fbshipit-source-id: 74e71f62a03be453b4caa9f5524e9bc53467fa17
Summary:
This PR implements:
1. a fix to issue #12174 - determine the location of cudnn library using `ldconfig`
2. a fix to determine the installed conda packages (in recent versions of conda, the command `conda` is a Bash function that cannot be called within a python script, so using CONDA_EXE environment variable instead)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16859
Differential Revision: D14000399
Pulled By: soumith
fbshipit-source-id: 905658ecacb0ca0587a162fade436de9582d32ab
Summary:
Since pip 18.0 (2018-07-22), `legacy` is no longer a valid choice for `pip list --format` as can be seen in the [Release Notes](https://pip.pypa.io/en/stable/news/#id62). Therefore, the options now are: `columns`, `freeze` and `json`. With `legacy`, this is how it looked like:
```
[...]
Versions of relevant libraries:
[pip3] numpy (1.16.1)
[pip3] torch (1.0.1)
[pip3] torchvision (0.2.1)
[...]
```
Changing to `freeze`, this is how it looks like:
```
[...]
Versions of relevant libraries:
[pip3] numpy==1.16.1
[pip3] torch==1.0.1
[pip3] torchvision==0.2.1
[...]
```
Currently, this is what happens:
```
[...]
Versions of relevant libraries:
[pip] Could not collect
[...]
```
The `freeze` option is also available in old pip, so this change is backwards compatible. Also, if we would like to keep the old style, which I think it is not necessary, I could easily change that.
---
In case anyone wants to know how `columns` looks like (I prefer `freeze`):
```
[...]
Versions of relevant libraries:
[pip3] numpy 1.16.1
[pip3] torch 1.0.1
[pip3] torchvision 0.2.1
[...]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16798
Differential Revision: D13971793
Pulled By: soumith
fbshipit-source-id: 3721d9079a2afa245e1185f725598901185ea4cd
Summary:
The current implementation of the `torch.utils.model_zoo.load_url`
function is prone to a race condition when creating the directory in
which it saves the loaded models, since it checks whether the
directory exists and then creates it in two separate steps. The
directory can be created after the check was made but before we
attempt to create the directory, resulting in an unhandled exception.
Instead, try to create the directory directly, and do nothing if it
already exists.
Note: for Python versions ≥ 3.2, we could simply use the
`exist_ok=True` flag on `os.makedirs`, but this is unavailable in
Python 2.7.
Signed-off-by: Antoine Busque <antoine.busque@elementai.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16578
Differential Revision: D13886470
Pulled By: soumith
fbshipit-source-id: 88815c8a65eec96caea32d6e9a7f83802502fdb9
Summary:
Rehash of previous attempts. This tries a different approach where we accept the install as specified in cmake (leaving bin/ include/ and lib/ alone), and then try to adjust the rest of the files to this more standard layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16414
Differential Revision: D13863635
Pulled By: zdevito
fbshipit-source-id: 23725f5c64d7509bf3ca8f472dcdcad074de9828
Summary:
Some HTTP servers dont return Content-Length, account for that
Fixes: https://github.com/pytorch/pytorch/issues/16152
Differential Revision: D13858882
Pulled By: soumith
fbshipit-source-id: e4293e9368ed4c87548d22adec1ce0c25ea4bd8f
Summary:
1. Improve error message for better debugging info
2. Increase timeout
3. Also apply the windows worker failure detection mechanism on non-Windows platforms, for better robustness
Attempt to fix#14501
cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16249
Differential Revision: D13784702
Pulled By: ezyang
fbshipit-source-id: 09a7cff83ab9edce561ed69f9fb555ab35d1275f
Summary:
I fixed a very small extra parenthesis in a doctest.
I'm also going to use this issue as a place to propose the eventual inclusion of xdoctest (a pip installable library I wrote) in pytorch's test suite. I think there are a lot of problems with Python's built in doctest module, and I've built xdoctest to fix them. I would love for my project to get some exposure and its addition to PyTorch may benefit both projects. Please see the readme for more details on what xdoctest brings to the table over the builtin doctest module: https://github.com/Erotemic/xdoctest
I came across this small syntax error when working on ensuring xdoctest was compatible with pytorch. It isn't 100% there yet, but I'm working on it. My goal is to ensure that xdoctest is 100% compatible with all of torch's doctest out-of-the-box before writing up the PR. I'm also airing the idea out-loud before I commit too much time into this (or get my hopes up), so I'm attaching this little blurb to a no-brainer-merge PR to (1) demonstrate a little bit of value (because xdoctest flagged this syntax error) and (2) see how its received.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15646
Differential Revision: D13606111
Pulled By: soumith
fbshipit-source-id: d4492801a38ee0ae64ea0326a83239cee4d811a4
Summary:
Since #1323 tensors are shared with shared memory, but this feature is not active for numpy.
This PR fix this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14534
Differential Revision: D13561649
Pulled By: soumith
fbshipit-source-id: b6bc9e99fb91e8b675c2ef131fba9fa11c1647c0
Summary:
Same as #14668, and was approved there.
ailzhang , please apply this patch to Horizon's `data_streamer.py`: https://gist.github.com/SsnL/020fdb3d6b7016d81b6ba1d04cc41459 Thank you!
Below is the original description at #14668:
As I am working on tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is because all functions to be run in multiprocessing must be at top global level. Adding more functionalities to `dataloader.py` will only make things worse.
So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes code much clearer. I will base my future changes to DataLoader on top of this.
No functionality is changed, except that I added `torch._six.queue`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15331
Reviewed By: yf225
Differential Revision: D13503120
Pulled By: ailzhang
fbshipit-source-id: 94df16b4d80ad1102c437cde0d5a2e62cffe1f8e
Summary:
This PR enables C++ frontend modules to be bound into Python and added as submodules of Python modules. For this, I added lots of pybind11 bindings for the `torch::nn::Module` class, and modified the `torch.nn.Module` class in Python to have a new Metaclass that makes `isinstance(m, torch.nn.Module)` return true when `m` is a C++ frontend module. The methods and fields of C++ modules are bound in such a way that they work seamlessly as submodules of Python modules for most operations (one exception I know of: calling `.to()` ends up calling `.apply()` on each submodule with a Python lambda, which cannot be used in C++ -- this may require small changes on Python side).
I've added quite a bunch of tests to verify the bindings and equality with Python. I think I should also try out adding a C++ module as part of some large PyTorch module, like a WLM or something, and see if everything works smoothly.
The next step for inter-op across our system is ScriptModule <-> C++ Frontend Module inter-op. I think this will then also allow using C++ frontend modules from TorchScript.
apaszke zdevito
CC dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13481
Differential Revision: D12981996
Pulled By: goldsborough
fbshipit-source-id: 147370d3596ebb0e94c82cec92993a148fee50a7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14278
In this commit, we make checkpoint_sequential work for models with multiple tensor inputs. Previously, it only processed the first tensor and ignored the rest.
We introduce a new test in test/test_utils.py that replicates the issue referenced in this [GitHub issue](https://github.com/pytorch/pytorch/issues/11093), and we make sure that the test passes by changing the behavior of checkpoint_sequential to process all input tensors.
Reviewed By: ezyang
Differential Revision: D13144672
fbshipit-source-id: 24f58233a65a0f5b80b89c8d8cbced6f814004f7
Summary:
As I am working on tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is because all functions to be run in multiprocessing must be at top global level. Adding more functionalities to `dataloader.py` will only make things worse.
So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes code much clearer. I will base my future changes to DataLoader on top of this.
No functionality is changed, except that I added `torch._six.queue`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14668
Reviewed By: soumith
Differential Revision: D13289919
Pulled By: ailzhang
fbshipit-source-id: d701bc7bb48f5dd7b163b5be941a9d27eb277a4c
Summary:
When using `setuptools` to build a Python extension, setuptools will automatically add an ABI suffix like `cpython-37m-x86_64-linux-gnu` to the shared library name when using Python 3. This is required for extensions meant to be imported as Python modules. When we use setuptools to build shared libraries not meant as Python modules, for example libraries that define and register TorchScript custom ops, having your library called `my_ops.cpython-37m-x86_64-linux-gnu.so` is a bit annoying compared to just `my_ops.so`, especially since you have to reference the library name when loading it with `torch.ops.load_library` in Python.
This PR fixes this by adding a `with_options` class method to the `torch.utils.cpp_extension.BuildExtension` which allows configuring the `BuildExtension`. In this case, the first option we add is `no_python_abi_suffix`, which we then use in `get_ext_filename` (override from `setuptools.build_ext`) to throw away the ABI suffix.
I've added a test `setup.py` in a `no_python_abi_suffix_test` folder.
Fixes https://github.com/pytorch/pytorch/issues/14188
t-vi fmassa soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14130
Differential Revision: D13216575
Pulled By: goldsborough
fbshipit-source-id: 67dc345c1278a1a4ee4ca907d848bc1fb4956cfa
Summary:
For custom TorchScript operators, `torch.ops.load_library` must be used and passed the path to the shared library containing the custom ops. Our C++ extensions stuff generally is meant to build a Python module and import it. This PR changes `torch.utils.cpp_extension.load` to have an option to just return the shared library path instead of importing it as a Python module, so you can then pass it to `torch.ops.load_library`. This means folks can re-use `torch.utils.cpp_extension.load` and `torch.utils.cpp_extension.load_inline` to even write their custom ops inline. I think t-vi and fmassa will appreciate this.
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13941
Differential Revision: D13110592
Pulled By: goldsborough
fbshipit-source-id: 37756307dbf80a81d2ed550e67c8743dca01dc20
Summary:
Currently, the `pin_memory_batch` function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.
This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom `collate_fn` returns a custom batch type.
The present PR adds the ability for the user to pass a `pin_fn` alongside any custom `collate_fn` to handle such custom types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14171
Differential Revision: D13166669
Pulled By: soumith
fbshipit-source-id: ca965f9841d4a259b3ca4413c8bd0d8743d433ab
Summary:
This issue was noticed, and fix proposed, by raulpuric.
Checkpointing is implemented by rerunning a forward-pass segment for each checkpointed segment during backward. This can result in the RNG state advancing more than it would without checkpointing, which can cause checkpoints that include dropout invocations to lose end-to-end bitwise accuracy as compared to non-checkpointed passes.
The present PR contains optional logic to juggle the RNG states such that checkpointed passes containing dropout achieve bitwise accuracy with non-checkpointed equivalents.** The user requests this behavior by supplying `preserve_rng_state=True` to `torch.utils.checkpoint` or `torch.utils.checkpoint_sequential`.
Currently, `preserve_rng_state=True` may incur a moderate performance hit because restoring MTGP states can be expensive. However, restoring Philox states is dirt cheap, so syed-ahmed's [RNG refactor](https://github.com/pytorch/pytorch/pull/13070#discussion_r235179882), once merged, will make this option more or less free.
I'm a little wary of the [def checkpoint(function, *args, preserve_rng_state=False):](https://github.com/pytorch/pytorch/pull/14253/files#diff-58da227fc9b1d56752b7dfad90428fe0R75) argument-passing method (specifically, putting a kwarg after a variable argument list). Python 3 seems happy with it.
Edit: It appears Python 2.7 is NOT happy with a [kwarg after *args](https://travis-ci.org/pytorch/pytorch/builds/457706518?utm_source=github_status&utm_medium=notification). `preserve_rng_state` also needs to be communicated in a way that doesn't break any existing usage. I'm open to suggestions (a global flag perhaps)?
**Batchnorm may still be an issue, but that's a battle for another day.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14253
Differential Revision: D13166665
Pulled By: soumith
fbshipit-source-id: 240cddab57ceaccba038b0276151342344eeecd7
Summary:
This is the next minimal step towards moving _C into cmake. For now,
leave _C in setup.py, but reduce it to an empty stub file. All of its
sources are now part of the new torch-python cmake target.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12742
Reviewed By: soumith
Differential Revision: D13089691
Pulled By: anderspapitto
fbshipit-source-id: 1c746fda33cfebb26e02a7f0781fefa8b0d86385
Summary:
We weren't running C++ extensions tests in CI.
Also, let's error hard when `ninja` is not available instead of skipping C++ extensions tests.
Fixes https://github.com/pytorch/pytorch/issues/13622
ezyang soumith yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13646
Differential Revision: D12961468
Pulled By: goldsborough
fbshipit-source-id: 917c8a14063dc40e6ab79a0f7d345ae2d3566ba4
Summary:
In TorchScript and C++ extensions we currently advocate a mix of `torch::` and `at::` namespace usage. In the C++ frontend I had instead exported all symbols from `at::` and some from `c10::` into the `torch::` namespace. This is far, far easier for users to understand, and also avoid bugs around creating tensors vs. variables. The same should from now on be true for the TorchScript C++ API (for running and loading models) and all C++ extensions.
Note that since we're just talking about typedefs, this change does not break any existing code.
Once this lands I will update stuff in `pytorch/tutorials` too.
zdevito ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13523
Differential Revision: D12942787
Pulled By: goldsborough
fbshipit-source-id: 76058936bd8707b33d9e5bbc2d0705fc3d820763
Summary:
I found a bug about compiling the cuda file when I install maskrcnn-benchmark lib.
`python setup.py build develop` will throw the error:
```
File "/usr/local/lib/python2.7/dist-packages/torch/utils/cpp_extension.py", line 214, in unix_wrap_compile
original_compile(obj, src, ext, cc_args, cflags, pp_opts)
File "/usr/lib/python2.7/distutils/unixccompiler.py", line 125, in _compile
self.spawn(compiler_so + cc_args + [src, '-o', obj] +
TypeError: coercing to Unicode: need string or buffer, list found
```
For more information, please see [issue](https://github.com/facebookresearch/maskrcnn-benchmark/issues/99).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13509
Differential Revision: D12902675
Pulled By: soumith
fbshipit-source-id: b9149f5de21ae29f94670cb2bbc93fa368f4e0f7
Summary:
This PR makes the ABI compatibility check for C++ extensions more robust by resolving the real path of the compiler binary, such that e.g. `"c++"` is resolved to the path of g++. This more robust than assuming that `c++ --version` will contain the word "gcc".
CC jcjohnson
Closes#10114
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13092
Differential Revision: D12810448
Pulled By: goldsborough
fbshipit-source-id: 6ac460e24496c0d8933b410401702363870b7568
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13141
This is an example diff to show what lint rules are being applied.
Reviewed By: mingzhe09088
Differential Revision: D10858478
fbshipit-source-id: cbeb013f10f755b0095478adf79366e7cf7836ff