- Non-differentiable outputs could prevent gradient computation (see
test_dep_nograd)
- Crash in backward on a variable with requires_grad=False (issue
#438)
- Stochastic functions could be backpropagated through multiple times
Only references to their data and version counters are stored.
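A toy illustration of storing only a data reference plus a version
counter. It assumes "their" refers to tensors saved for backward, and
every name in it (VersionCounter, Tensor, SavedTensor, add_) is
hypothetical. It is a pure-Python sketch of the idea, not the autograd
implementation: the saved object keeps no copy of the data, and the
counter snapshot lets it detect an in-place change made after saving.

```python
class VersionCounter:
    """Hypothetical counter shared by a tensor and every reference to its data."""

    def __init__(self):
        self.value = 0


class Tensor:
    """Hypothetical toy tensor: a list of floats plus a shared version counter."""

    def __init__(self, data):
        self.data = list(data)
        self.version = VersionCounter()

    def add_(self, x):
        # In-place op: mutate the existing list and bump the version.
        for i in range(len(self.data)):
            self.data[i] += x
        self.version.value += 1
        return self


class SavedTensor:
    """What a function keeps for backward in this sketch: a reference to
    the data and a snapshot of the version counter, not the variable."""

    def __init__(self, tensor):
        self.data = tensor.data              # reference, no copy
        self.version = tensor.version        # shared counter object
        self.saved_version = tensor.version.value

    def unpack(self):
        # Refuse to hand out data that was modified in place after saving.
        if self.version.value != self.saved_version:
            raise RuntimeError("saved tensor was modified in place")
        return self.data
```

Unpacking succeeds until t.add_(1.0) runs on the original tensor. After
that, unpack() raises instead of silently using stale values.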
Also, it is now possible to pass None arguments to save_for_backward
and to return more values from backward than forward had inputs (as
long as the extra values are None).
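A sketch of the rule for surplus return values from backward. The
helper normalize_grads is hypothetical and is not PyTorch's code. It
assumes backward returns either a single gradient or a tuple with one
entry per forward input. Trailing extras are dropped only when they are
all None, and any other mismatch is treated as an error.

```python
def normalize_grads(grads, num_inputs):
    """Hypothetical helper: match backward's result to the forward inputs.

    Extra trailing values are dropped, but only if every one of them
    is None; otherwise the mismatch is reported as an error.
    """
    if not isinstance(grads, tuple):
        grads = (grads,)  # a single gradient may be returned bare
    extra = grads[num_inputs:]
    if len(grads) < num_inputs or any(g is not None for g in extra):
        raise RuntimeError(
            f"backward returned {len(grads)} values, expected {num_inputs}"
        )
    return grads[:num_inputs]
```

For example, (1.0, None, None) for a one-input function is trimmed to
(1.0,), while (1.0, 3.0) is rejected because the surplus value is not
None.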