Summary:
These could use some autograd tests, which are coming in a later PR, but using them in autograd is probably pretty rare.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9947
Reviewed By: ezyang
Differential Revision: D9032778
Pulled By: gchanan
fbshipit-source-id: fa5a6509d3bac31ea4fae25143e82de62daabfbd
Summary:
During tracing (and export) we are now introducing an unnecessary hard-coded view on the RHS of indexed assignments such as `tensor[idxs] = rhs`. This caused a regression in the PyTorch translate models because these expressions appear with variable sizes in the RHS. This change makes it so we only call view if we indeed need to strip leading 1-dimensions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9424
Reviewed By: colesbury
Differential Revision: D8838881
Pulled By: jamesr66a
fbshipit-source-id: 399e5daa7d021f4f59f6f92b9fae581f92bfc538
Summary:
This PR implements and tests N-dimensional empty tensors for indexing, factories, and reductions if compiled with -DUSE_TH_SIZE_ZERO_DIM.
Still remaining to add:
1) TensorShape functions
2) Simple linear algebra functions (matrix multiply variants)
3) Other functions that operate over a dimension (but don't reduce).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9209
Reviewed By: ezyang
Differential Revision: D8751257
Pulled By: gchanan
fbshipit-source-id: 2113374dc7af6caf31a99bf67b3893f130a29e23
Summary:
Booleaning indexing was special cased to handle a single boolean value, but didn't generally work given multiple booleans.
This PR unifies the behavior with slicing. Note that only 'True' and torch.tensor(True) behave like NumPy due to the lack of n-dimensional empty tensors.
The corresponding tests for false values have been added, but are guarded behind a flag until we add n-dimensional empty tensors.
Closes https://github.com/pytorch/pytorch/pull/8920
Reviewed By: ezyang
Differential Revision: D8661876
Pulled By: gchanan
fbshipit-source-id: 0dc8a45a303aa41f729d04ab8908cfaf2e3ce3d7
* Created TensorOptions
Storing the type in TensorOptions to solve the Variable problem
Created convenience creation functions for TensorOptions and added tests
Converted zeros to TensorOptions
Converted rand to TensorOptions
Fix codegen for TensorOptions and multiple arguments
Put TensorOptions convenience functions into torch namespace too
All factory functions except *_like support TensorOptions
Integrated with recent JIT changes
Support *_like functions
Fix in place modification
Some cleanups and fixes
Support sparse_coo_tensor
Fix bug in Type.cpp
Fix .empty calls in C++ API
Fix bug in Type.cpp
Trying to fix device placement
Make AutoGPU CPU compatible
Remove some auto_gpu.h uses
Fixing some headers
Fix some remaining CUDA/AutoGPU issues
Fix some AutoGPU uses
Fixes to dispatch_tensor_conversion
Reset version of new variables to zero
Implemented parsing device strings
Random fixes to tests
Self review cleanups
flake8
Undo changes to variable.{h,cpp} because they fail on gcc7.2
Add [cuda] tag to tensor_options_cuda.cpp
Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks
Fix linker error in AutoGPU.cpp
Fix bad merge conflict in native_functions.yaml
Fixed caffe2/contrib/aten
Fix new window functions added to TensorFactories.cpp
* Removed torch::TensorOptions
Added code to generate wrapper functions for factory methods
Add implicit constructor from Backend to TensorOptions
Remove Var() from C++ API and use torch:: functions
Use torch:: functions more subtly in C++ API
Make AutoGPU::set_device more exception safe
Check status directly in DynamicCUDAHooksInterface
Rename AutoGPU to DeviceGuard
Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad
remove python_default_init: self.type()
Add back original factory functions, but with deprecation warnings
Disable DeviceGuard for a couple functions in ATen
Remove print statement
Fix DeviceGuard construction from undefined tensor
Fixing CUDA device compiler issues
Moved as many methods as possible into header files
Dont generate python functions for deprecated factories
Remove merge conflict artefact
Fix tensor_options_cuda.cpp
Fix set_requires_grad not being checked
Fix tensor_new.h
TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac
Fix bug in DeviceGuard.h
Missing includes
TEMPORARILY moving a few more methods into .cpp to see if it fixes windows
Fixing linker errors
* Fix up SummaryOps to use new factories
Undo device agnostic behavior of DeviceGuard
Use -1 instead of optional for default device index
Also move DeviceGuard methods into header
Fixes around device index after optional -> int32_t switch
Fix use of DeviceGuard in new_with_tensor_copy
Fix tensor_options.cpp
* Fix Type::copy(
* Remove test_non_float_params from ONNX tests
* Set requires_grad=False in ONNX tests that use ints
* Put layout/dtype/device on Tensor
* Post merge fixes
* Change behavior of DeviceGuard to match AutoGPU
* Fix C++ API integration tests
* Fix flip functions
* throw error on 0-length tensor slicing
* return empty tensor instead of throwing error
* make 0 slice work for tuples also
* add tests
* move check to aten
* Address comments
* Fix torch.tensor(...) device-type calculation when used with numpy and type inference.
* Fix tensor device type inference as well.
* Better variable type inference: infer cuda-ness only if device is not specified.
* Fix performance regression on simple cases of indexing
Dispatches to the old kernels
* Adapt JIT test
The test was expected to fail, but due to the change in the previous diff, it would now dispatch to index_select, which succeeds. I modified the function to go through the advanced indexing codepath
* Only do checks once, properly AutoNoGil, AutoGPU.
* Fix cross device indexing for more than 1 cuda device.
Cross device indexing is attempted from ATen, which doesn't work well because ATen doesn't have AutoGPU, etc.
Instead, before dispatching to ATen we do type conversion on the indices; it would probably be better if we
pushed all this down to ATen, but that will take some work.
* Small cleanup.
This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp.
This also moves the THNN.cwrap/.cpp generation to generate_code which can use ninja if installed.
* Improve Variable interface
* Address comments from @apaszke and @colesbury
* string ::operator= is not noexcept
* Remove ir.h from tracer_state.h to improve build times
* Make Variable a struct and pack SavedVariable fields
* Implement as_variable_ref
* grad_fn_ptr() -> grad_fn_unsafe()
* Reduce hackiness of set_type hack
* Include variable.h and edge.h in tracer_state.h because it uses them
* class Variable -> struct Variable because Windows cant even
* Make Variable::output_nr uint32_t instead of int
* Add comment about tracing state
* Replaced more static_cast<Variable&> and improve docs
* Remove SavedVariable destructor and construct members in init list
* Clarify docs for Variable
* Variable::set_version -> set_version_counter
This better maintains backwards compatibility when Tensors and Variables
are merged. For example:
>>> loss = var.sum().data[0]
Currently, `var.sum().data` is 1-dim so indexing. Once scalars are
enabled and Variable and Tensor are merged it will be zero-dim. This
change allows that expression to continue working (with a warning). In
the future, the canonical way to compute that expression will be:
>>> loss = float(var.sum())
Or an equivalent alternative:
>>> loss = var.sum().item()
Also fixes a few error cases.
The Tensor and Variable classes are being merged.
autograd.Function.forward is now called on Variables, but with "no-grad"
mode (torch.no_grad()) enabled.
One benefit is that we no longer have to explicitly track shared
storages.
1) Have 0-dim byte tensors behave like Py_TRUE, Py_FALSE
1) Py_TRUE now properly returns a copy from getitem
3) setitem now properly shapes the LHS consistent with the RHS (this doesn't really matter outside of error messages having the proper shape)
4) setitem supports numpy-style copy_to broadcasting (cuts off prefix 1s from src), so e.g. you can setitem (1,1,2,3) to (2,3) even though
that doesn't follow the normal inplace broadcasting rules.
Implements basic and advanced indexing using ATen tensors/variables.
Basic indexing is translated at the Python-binding level
(python_variable_indexing.cpp) to slice/squeeze/unsqueeze/select calls.
Advanced indexing is implemented in ATen in terms of take() and put()
calls.