Summary:
This adds a note on making experiments reproducible.
It also adds instructions for building the documentation to `README.md`. Please ping me if I missed any requirements.
I'm not sure what to do about the submodule changes; please advise.
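For context, here is a minimal sketch of the kind of settings such a note typically covers (this snippet is only illustrative and is not quoted from the note itself):
```
import random
import numpy as np
import torch

# Illustrative reproducibility settings; the note in the docs is the authoritative reference.
random.seed(0)
np.random.seed(0)
torch.manual_seed(0)                       # seeds the CPU and CUDA RNGs
torch.backends.cudnn.deterministic = True  # prefer deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # disable autotuning, which can be nondeterministic
```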
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11329
Differential Revision: D9784939
Pulled By: ezyang
fbshipit-source-id: 5c5acbe343d1fffb15bdcb84c6d8d925c2ffcc5e
Summary:
Ping ezyang
This addresses your comment in #114. Strangely, when running the doc build (`make html`), none of my changes actually show up; could you point out what I'm doing wrong?
Once #11329 is merged it might make sense to link to the reproducibility note everywhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11434
Differential Revision: D9751208
Pulled By: ezyang
fbshipit-source-id: cc672472449564ff099323c39603e8ff2b2d35c9
Summary:
I'm 80% sure that this fixes the math bug, but I can't reproduce it locally, so I can't confirm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11472
Differential Revision: D9755328
Pulled By: SsnL
fbshipit-source-id: 130be664d3c6ceee3c0c166c1a86fc9ec3b79d74
Summary:
vishwakftw Your patch needed some updates because the default native function dispatches changed from `[function, method]` to `[function]`. The CI was run before that change happened, so it still shows green, but the internal test caught it.
I made some changes when rebasing and updating, so I didn't just force push to your branch. Let's see if this passes CI and the internal test. If it does, let me know whether you want me to force push to your branch or use this PR instead.
Note to reviewers: the patch was already approved in #10068.
cc yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11421
Differential Revision: D9733407
Pulled By: SsnL
fbshipit-source-id: cf2ed293bb9942dcc5158934ff4def2f63252599
Summary:
In addition to documentation, this cleans up a few error message formats.
It also adds infrastructure to automatically discover which operators are supported by the JIT, which is then used when generating the docs.
The wording and formatting of the docs are not yet polished, but having this will allow our documentation writers to make faster progress.
Followup PRs will polish the docs and fix formatting issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11357
Differential Revision: D9721277
Pulled By: zdevito
fbshipit-source-id: 153a0d5be1efb314511bcfc0cec48643d78ea48b
Summary:
This PR cleans up the `at::Tensor` class by removing all methods that start with an underscore in favor of functions in the `at::` namespace. This greatly simplifies the `Tensor` class and makes it clearer which parts of the API are public and which are not.
For this I changed `native_functions.yaml` and `Declarations.cwrap` to make all underscore methods `variant: function` (or added such a statement where it was missing), and then fixed all code locations that used the underscore methods.
ezyang colesbury gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11152
Differential Revision: D9683607
Pulled By: goldsborough
fbshipit-source-id: 97f869f788fa56639c05a439e2a33be49f10f543
Summary:
Since we don't need `torch.autograd.Variable` anymore, I removed `torch.autograd.Variable` from `onnx.rst`.
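For illustration, a minimal sketch of an export call that takes plain tensors (the tiny model and file name below are hypothetical, not part of this PR):
```
import torch
import torch.nn as nn

# A plain tensor works as the example input; no Variable wrapper is needed anymore.
model = nn.Sequential(nn.Linear(4, 2))
dummy_input = torch.randn(1, 4)
torch.onnx.export(model, dummy_input, "model.onnx")
```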
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10810
Differential Revision: D9500960
Pulled By: zou3519
fbshipit-source-id: 1bc820734c96a8c7cb5d804e6d51a95018db8e7f
Summary:
The CPU and CUDA variants are a direct transposition of Graves et al.'s description of the algorithm, with the modification that it works in log space.
There is also a binding for the (much faster) CuDNN implementation.
This could eventually fix #3420.
I still need to add tests (TestNN seems much more elaborate than the other testing) and fix the bugs that invariably turn up during testing. I also want to add some more code comments.
I could use feedback on all sorts of things, including:
- Type handling (cuda vs. cpu for the int tensors, dtype for the int tensors)
- Input convention. I use log probs because that is what the gradients are for.
- Launch parameters for the kernels
- Errors and omissions and anything else I'm not even aware of.
Thank you for looking!
In terms of performance, it looks superficially comparable to WarpCTC, but I have not systematically investigated this.
I have read that CuDNN is much faster than other implementations because it does *not* use log space, but also because its gathering step is much, much faster (I avoided trying tricky things there, as they seem to contribute to WarpCTC's fragility). I might think some more about which existing torch function (scatter or index..) I could learn from for that step.
Average kernel timings from nvprof for one problem size:
```
CuDNN:
60.464us compute_alphas_and_betas
16.755us compute_grads_deterministic
Cuda:
121.06us ctc_loss_backward_collect_gpu_kernel (= grads)
109.88us ctc_loss_gpu_kernel (= alphas)
98.517us ctc_loss_backward_betas_gpu_kernel (= betas)
WarpCTC:
299.74us compute_betas_and_grad_kernel
66.977us compute_alpha_kernel
```
Of course, I still have the (silly) outer block loop rather than computing consecutive `s` in each thread, which I might change, and there are a few other places where one could look for better implementations.
Finally, it might not be unreasonable to start with these implementations, as the performance of the loss has to be seen in the context of the entire training computation, which would likely dilute the relative speedup considerably.
My performance-measuring test script:
```
import timeit
import sys
import torch
num_labels = 10
target_length = 30
input_length = 50
eps = 1e-5
BLANK = 0#num_labels
batch_size = 16
torch.manual_seed(5)
activations = torch.randn(input_length, batch_size, num_labels + 1)
log_probs = torch.log_softmax(activations, 2)
probs = torch.exp(log_probs)
targets = torch.randint(1, num_labels+1, (batch_size * target_length,), dtype=torch.long)
targets_2d = targets.view(batch_size, target_length)
target_lengths = torch.tensor(batch_size*[target_length])
input_lengths = torch.tensor(batch_size*[input_length])
activations = log_probs.detach()
def time_cuda_ctc_loss(grout, *args):
    # forward + backward through the native CUDA kernels
    torch.cuda.synchronize()
    culo, culog_alpha = torch._ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_cudnn_ctc_loss(grout, *args):
    # forward + backward through the CuDNN binding
    torch.cuda.synchronize()
    culo, cugra = torch._cudnn_ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_warp_ctc_loss(grout, *args):
    # forward + backward through the warpctc binding for comparison
    torch.cuda.synchronize()
    culo = warpctc.ctc_loss(*args, blank_label=BLANK, size_average=False, length_average=False, reduce=False)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

if sys.argv[1] == 'cuda':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets_2d.cuda(), input_lengths.cuda(), target_lengths.cuda(), BLANK]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cuda_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'cudnn':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets.int(), input_lengths.int(), target_lengths.int(), BLANK, True]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cudnn_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'warpctc':
    import warpctc
    activations = activations.cuda().detach().requires_grad_()
    args = [activations, input_lengths.int(), targets.int(), target_lengths.int()]
    grout = activations.new_ones((batch_size,), device='cpu')
    torch.cuda.synchronize()
    print(timeit.repeat("time_warp_ctc_loss(grout, *args)", number=1000, globals=globals()))
```
I'll also link to a notebook that I used for writing up the algorithm in simple form and then testing the implementations against it.
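For orientation, here is a minimal sketch of how a public wrapper around these kernels could be called; the `torch.nn.CTCLoss` name and interface below are assumed for illustration and are not part of this patch:
```
import torch
import torch.nn as nn

# Assumed public interface; log-probabilities of shape (T, N, C) are the expected input.
T, N, C, S = 50, 16, 20, 15   # input length, batch size, number of classes, target length
log_probs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
targets = torch.randint(1, C, (N, S), dtype=torch.long)   # labels 1..C-1; 0 is the blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```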
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9628
Differential Revision: D8952453
Pulled By: ezyang
fbshipit-source-id: 18e073f40c2d01a7c96c1cdd41f6c70a06e35860
Summary:
This implements the two-parameter Weibull distribution, with scale $\lambda$ and shape $k$ parameters as described on [Wikipedia](https://en.wikipedia.org/wiki/Weibull_distribution).
**Details**
- We implement it as a transformed exponential distribution, as described [here](https://en.wikipedia.org/wiki/Weibull_distribution#Related_distributions); a rough sketch of this construction appears below the traceback.
- The `weibull_min` variance function in scipy does not yet support a vector of distributions, so our unit test uses a scalar distribution instead of a vector.
Example of the bug:
```
>>> sp.stats.expon(np.array([0.5, 1, 2])).var() # fine
array([1., 1., 1.])
>>> sp.stats.weibull_min(c=np.array([0.5, 1, 2])).var() # buggy
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 490, in var
return self.dist.var(*self.args, **self.kwds)
File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1242, in var
res = self.stats(*args, **kwds)
File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1038, in stats
if np.isinf(mu):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
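As noted in the first bullet, the construction goes through a transformed exponential. A rough sketch of that idea using the existing transform utilities (the values below are arbitrary):
```
import torch
from torch.distributions import Exponential, TransformedDistribution
from torch.distributions.transforms import AffineTransform, PowerTransform

# If X ~ Exponential(1), then scale * X ** (1 / concentration) is Weibull(scale, concentration).
scale = torch.tensor(2.0)
concentration = torch.tensor(1.5)
weibull_like = TransformedDistribution(
    Exponential(torch.ones_like(scale)),
    [PowerTransform(exponent=concentration.reciprocal()),
     AffineTransform(loc=0, scale=scale)],
)
samples = weibull_like.sample((1000,))
```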
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9454
Differential Revision: D8863574
Pulled By: SsnL
fbshipit-source-id: 1ad3e175b469eee2b6af98e7b379ea170d3d9787
Summary:
This pull request implements the low-rank multivariate normal distribution, where the covariance matrix has the form `W @ W.T + D`. Here D is a diagonal matrix and W has shape n x m with m << n. It uses the matrix determinant lemma and the Woodbury matrix identity to save computational cost.
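For reference, a minimal usage sketch, assuming the class is exposed as `torch.distributions.LowRankMultivariateNormal` (sizes below are arbitrary):
```
import torch
from torch.distributions import LowRankMultivariateNormal

n, m = 5, 2                       # event size n, rank m, with m << n in realistic settings
loc = torch.zeros(n)
cov_factor = torch.randn(n, m)    # W, so the covariance is W @ W.T + diag(cov_diag)
cov_diag = torch.ones(n)          # diagonal entries of D
dist = LowRankMultivariateNormal(loc, cov_factor, cov_diag)
x = dist.sample((3,))
log_p = dist.log_prob(x)          # evaluated without forming the full n x n covariance
```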
Along the way, I also revised the MultivariateNormal distribution a bit. Here are the other changes:
+ `torch.trtrs` works with CUDA tensors, so I tried to use it instead of `torch.inverse`.
+ Use `torch.matmul` instead of `torch.bmm` in `_batch_mv`. The former is faster and simpler.
+ Use `torch.diagonal` for `_batch_diag`
+ Reimplement `_batch_mahalanobis` based on `_batch_trtrs_lower`.
+ Use trtrs to compute term2 of KL.
+ `variance` relies on `scale_tril` instead of `covariance_matrix`
TODO:
- [x] Resolve the fail at `_gradcheck_log_prob`
- [x] Add test for KL
cc fritzo stepelu apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8635
Differential Revision: D8951893
Pulled By: ezyang
fbshipit-source-id: 488ee3db6071150c33a1fb6624f3cfd9b52760c3
Summary:
Fixes #4176. cc vishwakftw
I didn't use `:math:` and `\neg` because I am using double backticks, so they render more similarly to `:attr:`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9630
Differential Revision: D8933022
Pulled By: SsnL
fbshipit-source-id: 31d8551f415b624c2ff66b25d886f20789846508
Summary:
DLPacks deserve documentation. :)
I wonder whether it might make sense to merge the various small torch.utils pages (and include a link for the larger ones, e.g. data) to improve the structure of the docs.
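For context, a minimal sketch of the API being documented, a round trip through a DLPack capsule (the tensor is just a placeholder):
```
import torch
from torch.utils.dlpack import to_dlpack, from_dlpack

t = torch.arange(4)
capsule = to_dlpack(t)       # export the tensor as a DLPack capsule (shares memory)
t2 = from_dlpack(capsule)    # import it back; no copy is made
```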
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9343
Differential Revision: D8801227
Pulled By: soumith
fbshipit-source-id: 2980d271971743b86f052bec5a2cb4d146a90d9b
Summary:
Commits:
1. In the extension doc, get rid of all references to `Variable` (closes #6947)
+ also add minor improvements
+ also added a section with links to cpp extension :) goldsborough
+ removed mentions of `autograd.Function.requires_grad` as it's not used anywhere and is hardcoded to return `Py_True`.
2. Fix several sphinx warnings
3. Change `*` in equations in `module/conv.py` to `\times`
4. Fix docs for `Fold` and `Unfold`.
+ Added a better shape check for `Fold` (it previously could give bogus results when there are not enough blocks), and added a test for the checks. See the short example after this list.
5. Fix doc saying `trtrs` is not available for CUDA (#9247)
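A short illustration of the `Fold`/`Unfold` shape relationship mentioned in item 4 (sizes are arbitrary; with non-overlapping blocks the round trip reconstructs the input exactly):
```
import torch
import torch.nn as nn

x = torch.randn(1, 3, 8, 8)
unfold = nn.Unfold(kernel_size=2, stride=2)
blocks = unfold(x)                                           # shape (1, 3 * 2 * 2, 16)
fold = nn.Fold(output_size=(8, 8), kernel_size=2, stride=2)
y = fold(blocks)                                             # same shape as x
assert torch.allclose(x, y)                                  # blocks do not overlap, so this is exact
```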
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9239
Reviewed By: soumith
Differential Revision: D8762492
Pulled By: SsnL
fbshipit-source-id: 13cd91128981a94493d5efdf250c40465f84346a
Summary:
This PR addresses #5823.
* fix docstring: upsample doesn't support LongTensor
* Enable float scale up & down sampling for linear/bilinear/trilinear modes. (following SsnL 's commit)
* Enable float scale up & down sampling for nearest mode. Note that our implementation is slightly different from TF in that there's actually no "align_corners" concept in this mode.
* Add a new interpolate function API to replace upsample, and add a deprecation warning to upsample (a minimal usage sketch follows the list below).
* Add an "area" mode, which is essentially adaptive average pooling, to resize_image.
* Add test cases for interpolate in test_nn.py
* Add a few comments to help understand *linear interpolation code.
* The only mode missing from the resize_images API is "*cubic", which is pretty useful in practice; it's labeled as a hackamonth task in #1552. I discussed with SsnL that we probably want to implement all new ops in ATen instead of THNN/THCUNN. Depending on the priority, I could either put it in my queue or leave it for a HAMer.
* After this change, the files named *Upsampling*.c work for both up- and downsampling. I could rename the files if needed.
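A minimal usage sketch of the new function, as referenced in the list above (sizes and scale factors are arbitrary, and the `torch.nn.functional.interpolate` path is assumed):
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)
# Upsample by a non-integer factor; the bilinear mode accepts float scales.
up = F.interpolate(x, scale_factor=1.5, mode='bilinear', align_corners=False)
# Downsample with the 'area' mode, which is essentially adaptive average pooling.
down = F.interpolate(x, scale_factor=0.5, mode='area')
```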
Differential Revision: D8729635
Pulled By: ailzhang
fbshipit-source-id: a98dc5e1f587fce17606b5764db695366a6bb56b
Summary:
Closes #9147
Added a test in test_torch to prevent regressions
Added entries in the docs
cc ezyang weiyangfb
Closes https://github.com/pytorch/pytorch/pull/9156
Differential Revision: D8732095
Pulled By: soumith
fbshipit-source-id: 7a6892853cfc0ccb0142b4fd25015818849adf61