* Add optimizer_for_mobile doc into python api root doc
* Apply suggestions from code review
Remove all references to `optimization_blacklist` as it's missing in 1.6
Co-authored-by: Nikita Shulga <nshulga@fb.com>
As ninja has accurate dependency tracking, if there is nothing to do,
then we will very quickly noop. But this is important for correctness:
if a change was made to a header that is not listed explicitly in
the distutils Extension, then distutils will come to the wrong
conclusion about whether or not recompilation is needed (but Ninja
will work it out.)
This caused https://github.com/pytorch/vision/issues/2367
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
ghstack-source-id: 6409595c8a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40837
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37462
Instead of running all the optimization pass in optimizeForMobile method,
introducing a whitelist optimizer dictionary as second param in the method,
when it is not passed during calling, the method will run all the optimization
passes, otherwise the method will read the dict and only run the pass with
value of True.
ghstack-source-id: 106104503
Test Plan:
python test/test_mobile_optimizer.py
Imported from OSS
Differential Revision: D22096029
fbshipit-source-id: daa9370c0510930f4c032328b225df0bcf97880f
Summary:
There still are occasional reports of DataLoader workers not exiting (e.g., https://github.com/pytorch/pytorch/issues/39570). Before we figure out why, we should just kill them if the join timesout to prevent hanging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39869
Differential Revision: D22018501
Pulled By: ezyang
fbshipit-source-id: 66a00d0f5b3e303b6106b336949176b3ff8ac8ae
Summary:
Changes in PR https://github.com/pytorch/pytorch/issues/39759 broke HIP caffe2.
hipify for caffe2 renames CUDA to HIP; torch does not.
If caffe2 calls into torch, it needs to use CUDA-named functions.
CC ezyang xw285cornell sunway513 houseroad dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39801
Differential Revision: D21982493
Pulled By: xw285cornell
fbshipit-source-id: 8e88e0fb80c71f0342e23ef0214a42d5542bdc70
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39277
This PR contains initial changes that makes PyTorch build with Ampere GPU, CUDA 11, and cuDNN 8.
TF32 related features will not be included in this PR.
Test Plan: Imported from OSS
Differential Revision: D21832814
Pulled By: malfet
fbshipit-source-id: 37f9c6827e0c26ae3e303580f666584230832d06
Summary:
Gets rid of some in-kernel asserts where they can be replaced with static_asserts
Replaces bare in-kernel `assert` in one case with `CUDA_KERNEL_ASSERT` where necessary
replaces host code `assert`s with `TORCH_INTERNAL_ASSERT`
Another group of asserts is in fractional max pooling kernels which should be fixed regardless https://github.com/pytorch/pytorch/issues/39044, the problems there are not just asserts.
I've audited remaining cases of in-kernel asserts, and they are more like `TORCH_INTERNAL_ASSERT`, so they should not happen with invalid user data. I think it's ok to leave them as is.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39047
Differential Revision: D21750392
Pulled By: ngimel
fbshipit-source-id: e9417523a2c672284de3515933cb7ed166e56719
Summary:
This PR adds the following changes:
1. It sets the default extension build to use ninja
2. Adds HIPCC flags to the host code compile string for ninja builds. This is needed when host code makes HIP API calls
cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38939
Differential Revision: D21721905
Pulled By: ezyang
fbshipit-source-id: 75206838315a79850ecf86a78391a31ba5ee97cb
Summary:
CC ezyang xw285cornell
HIP from ROCm 3.5 renames `hipOccupancyMaxActiveBlocksPerMultiprocessor` to `hipModuleOccupancyMaxActiveBlocksPerMultiprocessor`. In addition, the API parameter types now match CUDA. Add these changes in a backwards-compatible manner.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38551
Differential Revision: D21721832
Pulled By: ezyang
fbshipit-source-id: 6fc971845e363d7495d8be9550e76d0f082c3062
Summary:
current `to_mkldnn` model conversion logic under `torch.utils.mkldnn` does not cover `nn.Conv1d`. This patch fills the gap, using similar logic to `nn.Conv2d`. The model conversion will remove unnecessary memory format reorders of input/output tensors and thus speedup the model.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38528
Differential Revision: D21640325
Pulled By: albanD
fbshipit-source-id: c3340153b5c524e020c097eb4b9e2ffcbde8896d
Summary:
Fix issue https://github.com/pytorch/pytorch/issues/23141#
In the below example ```default_collate``` collates each element of the list. Since the second element isn't present in all samples, it is discarded:
```
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import numpy as np
class CustomDataset(Dataset):
def __len__(self):
return 2
def __getitem__(self, idx):
tmp = {
"foo": np.array([1, 2, 3]),
"bar": ["X"] * (idx+1),
}
return tmp
training = CustomDataset()
for batch in DataLoader(training, batch_size=2):
print(batch)
```
Yields
```
{
'foo': tensor(
[
[1, 2, 3],
[1, 2, 3]
]
),
'bar': [
('X', 'X'),
]
}
```
Based on discussion in the issue, it seems the best course of action is to error out in this case. This seems consistent with what is done for tensor elements, as seen in [TensorShape.cpp line 1066](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/TensorShape.cpp#L1060) which is called when ```torch.stack``` is called. In this PR, I introduce a similar message to error out for lists.
SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38492
Differential Revision: D21620396
Pulled By: ezyang
fbshipit-source-id: 17f59fbb1ed1f0d9b2185c95b9ebe55ece701b0c
Summary:
Since the check was added in https://github.com/pytorch/pytorch/pull/6249, one can not pass an iterable as a sampler to the data loader anymore, which was a very handy feature (e.g., https://github.com/pytorch/pytorch/issues/1337). I think the check should be removed for two-fold reasons:
1. It is too strict. There is no reason that it should not be a general iterable.
2. It is inconsistent. In `DataLoader` (the main place where people use samplers), you can pass a general iterable as `batch_sampler` but not `sampler` due to this check.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38403
Differential Revision: D21555958
Pulled By: soumith
fbshipit-source-id: c7267bb99a31edd8f2750689205d6edc5dab5cff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38157
This removes the error prone process of assembling `torch/__init__.pyi`
(and frequently forgetting to expose things), since now we can simply
rely on the true source file to get things done. Most of the old
codegen in gen_pyi.py is now rerouted to various files:
- `torch/_C/__init__.pyi` (the dumping pile of all misc bindings)
- `torch/_C/_nn.pyi` (NN function bindings)
- `torch/_C/_VariableFunctions.pyi` (torch function bindings)
`torch.types` grew a bunch more definitions that previously where
defined in `torch/__init__.pyi`
Some miscellaneous changes
- Fixed a bug where we treat single TensorList argument as implying
varargs are accepted. This is actually only supported on IntList.
This means we can correctly generate a stub for dequantize.
- Add missing manual stub for nonzero
- Switched torch/onnx/operators.py to directly refer to _C module,
since apparently mypy doesn't think that methods prefixed with
underscores get reexported. This may be a recurring theme; maybe
we need to find a better way to solve it.
Because I was really lazy, I dumped namedtuple definitions in both
`torch._C` and `torch._C._VariableFunctions`. This is definitely wrong.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21497400
Pulled By: ezyang
fbshipit-source-id: 07b126141c82efaca37be27c07255cb2b9b3f064
Summary:
The root cause of incorrect rendering is that numbers are treated as a string if the data type is not specified. Therefore the data is sort based on the first digit.
closes https://github.com/pytorch/pytorch/issues/29906
cc orionr sanekmelnikov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31544
Differential Revision: D21105403
Pulled By: natalialunova
fbshipit-source-id: a676ff5ab94c5bdb653615d43219604e54747e56
Summary:
This pull request adds a check for ROCm environment and skips adding CUDA specific flags for the scenario when a pytorch extension is built on ROCm.
ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38047
Differential Revision: D21470507
Pulled By: ezyang
fbshipit-source-id: 5af2d7235e306c7aa9a5f7fc8760025417383069
Summary:
Also move the ignores for imports to the bottom in `mypy.ini`, those are much less interesting - start with the stuff people want to work on.
Second commit tests the instructions: remove an ignore, fix the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37594
Differential Revision: D21434858
Pulled By: ezyang
fbshipit-source-id: 4f1a6868cdb4cb59d072bcf105f48c3a5ba3ff98
Summary:
This pull request enables ahead of time compilation of HIPExtensions with ninja by setting appropriate compilation flags for ROCm environment. Also, this enables the unit test for testing cuda_extensions on ROCm as well as removing test for ahead of time compilation of extensions with ninja from ROCM_BLACKLIST
ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37800
Differential Revision: D21408148
Pulled By: soumith
fbshipit-source-id: 146f4ffb3418f3534e6ce86805d3fe9c3eae84e1
Summary:
We don't need to create `torch.Generator()` and seed it if we are not shuffling.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37604
Differential Revision: D21346167
Pulled By: rohan-varma
fbshipit-source-id: 6ed560d236bc5c026a7d321755ddc02a29db1604
Summary:
See https://discuss.pytorch.org/t/training-with-gradient-checkpoints-torch-utils-checkpoint-appears-to-reduce-performance-of-model/78102/3?u=jwl for details.
Updated the docs to warn users about issues with checkpointing models that use `detach()` or `torch.no_grad()` to freeze their model layers/weights during training. When they do this, training with `checkpoint` will fail as it forces the outputs to require gradients when the model itself does not. Hence, during the backward pass it will output the error:
```
[4]<stderr>:RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```
Maybe it is possible to fix this directly in the code, but I am not sure how in the current codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37266
Differential Revision: D21262558
Pulled By: mrshenli
fbshipit-source-id: 529cf370534504baf8937ef17dac5d6916fbf5ae
Summary:
As described in the issue (https://github.com/pytorch/pytorch/issues/33701) the compiler check
for building cpp extensions does not work with ccache.
In this case we check compiler -v to determine which
compiler is actually used and check it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37293
Differential Revision: D21256913
Pulled By: ezyang
fbshipit-source-id: 5483a10cc2dbcff98a7f069ea9dbc0c12b6502dc