Mirror of https://github.com/zebrajr/pytorch.git, synced 2025-12-07 12:21:27 +01:00.
347 commits; latest commit 14cbd9adb8. Each entry below lists a commit's short SHA1, subject line, and message.

14cbd9adb8 - Implement torch.pinverse : Pseudo-inverse (#9052)
Summary: 1. Used SVD to compute. 2. Tests in test_autograd, test_cuda and test_torch 3. Doc strings in _torch_docs.py and _tensor_docs.py Closes #6187 Closes https://github.com/pytorch/pytorch/pull/9052 Reviewed By: soumith Differential Revision: D8714628 Pulled By: SsnL fbshipit-source-id: 7e006c9d138b9f49e703bd0ffdabe6253be78dd9
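
A minimal usage sketch of the API this commit adds; the matrix shapes and the least-squares use are illustrative, not taken from the PR.

```python
import torch

# torch.pinverse computes the Moore-Penrose pseudo-inverse via SVD.
A = torch.randn(5, 3)          # tall matrix: more rows than columns
b = torch.randn(5)

A_pinv = torch.pinverse(A)     # shape (3, 5)
x = A_pinv @ b                 # least-squares solution of A x ~= b

# Moore-Penrose identity: A @ A^+ @ A == A, up to floating-point error
assert torch.allclose(A @ A_pinv @ A, A, atol=1e-4)
```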

08daed40f7 - Fix bug in flip() (#9156)
Summary: Closes #9147 Added a test to prevent regression in test_torch Added entries in docs cc ezyang weiyangfb Closes https://github.com/pytorch/pytorch/pull/9156 Differential Revision: D8732095 Pulled By: soumith fbshipit-source-id: 7a6892853cfc0ccb0142b4fd25015818849adf61
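
For context, minimal usage of the torch.flip API this fix targets; the tensor values are illustrative.

```python
import torch

x = torch.arange(8).view(2, 2, 2)
torch.flip(x, dims=[0])      # reverse the order of elements along the first dimension
torch.flip(x, dims=[0, 2])   # reverse along several dimensions at once
```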

90fd4df695 - Add flag for disabling tests with multiprocessing spawn start method (#9061)
Summary: This will resolve some of the timeout issues in CPU and GPU tests internally. Closes https://github.com/pytorch/pytorch/pull/9061 Reviewed By: ezyang Differential Revision: D8707471 Pulled By: yf225 fbshipit-source-id: 9dc82a2c9da0c540ae015442f74b9b2b1a67a246

15a75208ee - Use std::random_device for generating storage handle (#8971)
Summary: Currently the `test_RNG_after_pickle` in the PR would fail because pickling a tensor changes the RNG state. This PR aims to fix it. Closes https://github.com/pytorch/pytorch/pull/8971 Reviewed By: ezyang Differential Revision: D8677474 Pulled By: yf225 fbshipit-source-id: 1713d9611699ad288b66d92dbb29ce9feb34b8cf

edb88b5f3a - Update from Facebook (#8887)
* add opencl + fpga context adds an opencl context inside caffe2/fb which can be used for fpga access * [Caffe2] Force tensor inference checks to be triggered during testing We've started to rely on TensorInference functions more for different analysis. This diff ensures that the TensorInference function's result matches what is expected from the definition of the operator. * Enable building //caffe2:torch with @mode/opt In @mode/opt, python runs out of a PAR, which breaks a lot of assumptions in the code about where templates/ folders live relative to __file__. Rather than introduce hacks with parutil, I simply turn template_path into a parameter for all the relevant functions and thread it through from the top level. * [Caffe2] Fix cost models for DotProduct and Div. Update Tensor Inference for dot product As title. DotProduct states that output is a 1-D tensor (https://caffe2.ai/docs/operators-catalogue.html#dotproduct) though code suggests it is either 0- or 1-D depending on inputs. TensorInference defined to support implementation. * [SG-MoE] Add an option to make the experts NOT as components * [nomnigraph] Rename and fixup convertToNeuralNetOperator API This will make things a bit cleaner * no longer symlink THNN.h and THCUNN.h * forced decoder network (onnx export) Closes https://github.com/pytorch/translate/pull/95 Add networks in ensemble_export.py to create a forced decoding network from PyTorch NMT checkpoints. This network takes an arbitrary numberized (source, target) pair and returns the model score for the translation, including penalties. Vocabulary reduction networks are also supported, but note that target indices which are not in the possible_translation_tokens generated for the source input will be trea * Revert schema change to fix production models Revert schema change to fix production models * MockLogDeviceReader - rebase on FIX # Goal 1), Build a make_mock_log_device_reader using make_mock_reader 2), Replace the real log_device_reader here: https://fburl.com/raihwf1p # Log by D8151734 Real log_device_reader: ``` I0529 20:29:05.373108 954994 tensor.h:839] Tensor print_net/log of type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >. Dims: (): read_net/ParseOpenTrainingRow:0 I0529 20:29:05.373244 954994 tensor.h:839] Tensor read_net/ParseOpenTrainin * [C2/D2][1/n]: Nonnegative-Constrained Optimization -- log barrier implement log barrier as a regularization method * Add teacher weight screening. Add teacher weight sceening according to teacher labels. If teacher label is zero, we do not use the distill loss in the objective function. * Add NormalizerContext See task for more detail. This implementation is a copy of what exists for RegularizerContext except for how the parameters are defined in the model_definition thrift file. I'll try an alternative implementation which overrides the default arguments of functions instead like for argscopes in tensorflow. https://github.com/pytorch/pytorch/compare/master...MaximeBoucher:update-from-facebook-0939578c068c?expand=1 * Adding cosine similarity option in dot processor Add pairwise cosine similarity option in dot product. Add an option to concate dot product and cosine similarity. Add test cases. 
* [nomnigraph][redo] Concat elim for sparseNN Same as D7962948, which was reverted because Operator Schema was not defined * [pytorch] Revert pytorch/pytorch#7918 'Release GIL when copying to shared memory', breaks ASAN Revert this pytorch diff that breaks ASAN when running Filament in dev mode; in opt mode it gives "bad file descriptor" errors. Looks like a race when copying tensors to shared memory in multiple mp.Queue's (which spawn separate threads). https://github.com/pytorch/pytorch/pull/7918/files * [nomnigraph][mobile] Enable nomnigraph by default, use -Oz on nomnigraph related code to reduce code size enables nomnigraph and reduces codesize * [Warmup] Allow both offline incremental training and online training Change plan name on saving side and reading side to support both training type This diff depends on D8128530 and D8168651. * Revert D7802642: [Warmup] Allow both offline incremental training and online training This reverts commit afc213cf9b36cecf75333a788391c4d09f4afccc @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * Add legacy grad logic to fix div op on old graphs. Add legacy grad logic to fix div op on old graphs. * Correctly propagate operator failures Propagate errors from operators that throw exceptions and return false * Revert D8374829: [caffe2][nomnigraph][redo] Concat elim for sparseNN This reverts commit 6dda028c463e54bb5c32188bbbe9202107e188a5 @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * [Caffe2] Added extra_info to core.DeviceOption(), enforced extra_info to be inherited in scope.DeviceScope extra_info is a newly defined field in DeviceOption proto. This diff added extra_info to the core.DeviceOption(). And, In scope.DeviceScope(), this diff enforce the new scope to inherit the extra_info from old scope. * [opt] hgdirsync wasn't enabled, merge diverged code Here's the damage, P59732616 basically xplat was left behind but had the change from assert to CAFFE_ENFORCE * OMP parallelism over RoIs for RoIAlign op Simpler to parallelize over RoIs. Shouldn't affect other uses as it relies on the number of OMP threads set during startup. PR: https://github.com/pytorch/pytorch/pull/8562 * Use int64_t for shape in FillOps to avoid overflow of int32 * Implement Rotated RoIAlign op Based on Rotated RPNs as explained in https://arxiv.org/abs/1703.01086. The idea is simple - orientation/angle is added as an RPN anchor parameter and then the angle is further regressed similar to bbox coords. There are some additional changes related to NMS and IoU, but besides that it's a direct extension to Faster-RCNN. Further details in https://fb.quip.com/sZHlA1iMfWPZ. RoIs are represented in [center_x, center_y, width, height, angle] format. 
`angle` repre * Rotated RoIAlign op CUDA forward implementation CUDA forward impl for D8415490 * RoIAlignRotated op CUDA backward pass implementation TSIA * All remaining fixes to eliminate process_github.sh Most of this diff has already been reviewed separately, except for the parts relating to _thnn/utils.py and _utils._internal.py remove skipIf(True, 'Fbcode') line from process_github.sh replace sed of cpp file with #ifdef to control cudnnDestroy use undo sync-time deletion of .gitattributes, remove process_github.sh switch to using _utils._internal rather than try-import-except This diff also fixes the open-source bug where rebuilds have * Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training" Original commit changeset: 7707d2efe60e The original diff is backout becuase the online trainer package is backed out. This code would only work with new online trainer package * [easy] improve error log in adagrad op as title * re-allow use of thnn_h_path This fixes cffi usage in OSS * [4/4] [tum] paralyzing layerNorm for GPU full sync as title * add compile=False to pytorch tests, remove hack with pyc * Add shape and type inference for RowWiseArgMax operator See title * Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training" This reverts commit 78167eeef0af16b60f72c82f9dcdda9b41b4dcbd @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * [fix-flaky-test] mock_hive_reader_test flaky, because GlobalCounter collects local counts intervally # Problem `MockHiveReader` uses `GlobalCounter` to limit `max_examples`. GlobalCounter on server node collect local counts from worker nodes every 1 sec. This 1 sec delay makes it impossible to limit exactly to the `max_examples`, it will definitely exceed `max_examples`. # Plan Given, ``` Expected num_examples = max_examples + num_examples/sec (Read Speed) x 1 sec (GlobalCounter Sync Int * [Caffe2] Fix FCGradient cost inference. Prevent overflow in cost inference FCGradient missed a factor 2 in the `num_outputs == 3` case. Overflow was occurring with flop calculation for FC. Changed types to `uint64_t` to prevent future problems. * Fix binary ops with empty inputs Fix binary ops with empty inputs * Support the filling of input blob with provided data as title for Biz Integrity case * Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"" Original commit changeset: 30c55dd38816 Original diff is reverted due to introducing bad integration test. Fixed the integration test. * [c2][easy] improve pack ops error loggings as desc. * Add ShapeTypeInference for LpNorm operator As desc * Shard test_nn to reduce runtime for each test target Closes https://github.com/pytorch/pytorch/pull/8793 The current test_nn would time out and be disabled in GreenWarden, and we need to have an option to split it up in order to pass the stress test. Right now GreenWarden roughly allows running 100 test cases in test_nn before timing out, and here we have an option to divide test_nn into 30 shards (with ~40 tests in each shard) to allow for some test suite growth in the future. * Change default caffe2_streams_per_gpu to 1 * Remove IN_SANDCASTLE from common.py and test_nn.py We prefer to disable the failing tests through Sandcastle UI instead. 
* Add a new class for an updated prof_dag.proto This diff contains: - An updated prof_dag.proto that contains blob profiles. - A class to deserialize this information (serialization is in a follow up diff) - Update to separate profiling information from NeuralNet (and use it as part of the class above). - Unit tests * Lambdarank for SparseNN This diff adds a lambda_rank_layer for SparseNN. changes include 1) Adds support for multi sessions in c2 op 2) Adds support for two different loss functions in c2 op 3) Unit tests for op * Revert D8586950: Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"" This reverts commit 012220ed63eccc35659a57b31d16a3625da6317b @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * [easy] A few fixups to multithread predictor benchmark (1) support perf on T6 server (2) remove dead code * fix a bug about the map size as title * Fix reduce sum on in-place case. Fix reduce sum on in-place case. * [Warmup] Reland reverted diff Allow both offline incremental training and online training Closes https://github.com/pytorch/pytorch/pull/8827 fix net transform integration test. Allow offline and online trainer to coexist D7802642. * Add StoreHandlerNotAvailableException Add an exception for a store that is not available or has been deleted. * Use exception handling for fault tolerance, missing KV store Remove status blobs to communication ops so that exceptions propagate on failure. * [C2/D2][2/n]: Nonnegative-Constrained Optimization -- bounded grad proj for simple bounded constrained optimization, incl non-negative box constraints. * [GanH]: Adaptive Weighting with More Estimations With implemented postivity optimization, we now learn adaptive weights with different parameterizations. This improves parameter estimation and training stability. * Revert some changes for landing * Remove AutoNoGIL in StorageSharing * Temporarily disable net_tests * Revert "[Caffe2] Force tensor inference checks to be triggered during testing" This reverts commit 67ef05c22b2f71b4a489695384932f968384a2a4. * Revert "Fix reduce sum on in-place case." This reverts commit 6cb8a8e1b3db7b6d20941b0053e3f3836068eb64. * Revert "Revert "Fix reduce sum on in-place case."" This reverts commit 130a257c0893dc09f4bd6e6a45d112261807fd2c. |

55757357b2 - [C++ API] Better forward methods (#8739)
* Better forward methods in C++ API capitalize error message in test_torch.test_flatten Support for operator()
* Add operator() to Functional
* Get rid of SigmoidLinear
* Add BoundFunction to FunctionalImpl
* Remove macro from conv because it makes errors more nasty

04440d2c57 - Fix nonzero and tensor printing of n-dimensional empty tensors. (#8849)

46bff5d9ff - Set MKL VML error mode to ignore (#8800)

ce13ca235e - added default lambd=0.5 for hardshrink (#8770)
* added default lambd=0.5 and tests
* lint

48e90e3339 - Build system changes (#8627)
* All changes needed to get rid of process_github.sh
* allow thnn_h_path

b6af5d40bf - Some 0-sized dimension support, port catArray away from resizeLegacy. (#8666)
* Some 0-sized dimension support, port catArray away from resizeLegacy. The goal of this PR is to port catArray away from resizeLegacy (so we can delete the legacy resize calls), but since catArray has some weird behavior because we don't have arbitrary 0-sized dimension support, I made some effort to fix these both in one pass. The major changes here are:
1) catArray uses the new resize API, no longer the old resizeLegacy API.
2) As 1) is the last usage of resizeLegacy, it is deleted.
3) If compiled with USE_TH_SIZE_ZERO_DIM, catArray will work and properly check shapes for n-dimensional empty tensors.
4) However, we retain the old behavior of "ignoring" size [0] tensors in catArray. We previously allowed this because we didn't have n-dimensional empty tensors.
5) To get the above to work, we also add support for n-dimensional empty tensors for narrow and slice (ifdef USE_TH_SIZE_ZERO_DIM).
6) We change the stride formula for empty tensors to match NumPy; basically, we never multiply by 0 as the size, always at least 1, so the strides are monotonically increasing in the empty tensor case.
7) We print the size of empty tensors if size != [0]; this matches NumPy behavior (even in cases where the size could be inferred from the brackets).
8) For test purposes, we add torch._C._use_zero_size_dim() to add tests for the above.
* Fix flake8.
* Address review comments.

cc6b046f48 - Implement flatten function (#8578)
* Implement flatten function
* address comments
* allow start_dim=end_dim
* undo submodule change
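
A short sketch of the function added here; the shapes are illustrative.

```python
import torch

x = torch.randn(2, 3, 4, 5)
torch.flatten(x).shape                           # torch.Size([120])
torch.flatten(x, start_dim=1).shape              # torch.Size([2, 60]); common before a Linear layer
torch.flatten(x, start_dim=1, end_dim=2).shape   # torch.Size([2, 12, 5])
torch.flatten(x, start_dim=2, end_dim=2).shape   # start_dim == end_dim is allowed; shape unchanged
```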

8e4fe5dcf4 - Fix serialization for Parameters (#8633)
* Fix serialization for Parameters
* address comments
* address comments

372d1d6735 - Create ATen tensors via TensorOptions (#7869)
* Created TensorOptions
Storing the type in TensorOptions to solve the Variable problem
Created convenience creation functions for TensorOptions and added tests
Converted zeros to TensorOptions
Converted rand to TensorOptions
Fix codegen for TensorOptions and multiple arguments
Put TensorOptions convenience functions into torch namespace too
All factory functions except *_like support TensorOptions
Integrated with recent JIT changes
Support *_like functions
Fix in place modification
Some cleanups and fixes
Support sparse_coo_tensor
Fix bug in Type.cpp
Fix .empty calls in C++ API
Fix bug in Type.cpp
Trying to fix device placement
Make AutoGPU CPU compatible
Remove some auto_gpu.h uses
Fixing some headers
Fix some remaining CUDA/AutoGPU issues
Fix some AutoGPU uses
Fixes to dispatch_tensor_conversion
Reset version of new variables to zero
Implemented parsing device strings
Random fixes to tests
Self review cleanups
flake8
Undo changes to variable.{h,cpp} because they fail on gcc7.2
Add [cuda] tag to tensor_options_cuda.cpp
Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks
Fix linker error in AutoGPU.cpp
Fix bad merge conflict in native_functions.yaml
Fixed caffe2/contrib/aten
Fix new window functions added to TensorFactories.cpp
* Removed torch::TensorOptions
Added code to generate wrapper functions for factory methods
Add implicit constructor from Backend to TensorOptions
Remove Var() from C++ API and use torch:: functions
Use torch:: functions more subtly in C++ API
Make AutoGPU::set_device more exception safe
Check status directly in DynamicCUDAHooksInterface
Rename AutoGPU to DeviceGuard
Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad
remove python_default_init: self.type()
Add back original factory functions, but with deprecation warnings
Disable DeviceGuard for a couple functions in ATen
Remove print statement
Fix DeviceGuard construction from undefined tensor
Fixing CUDA device compiler issues
Moved as many methods as possible into header files
Dont generate python functions for deprecated factories
Remove merge conflict artefact
Fix tensor_options_cuda.cpp
Fix set_requires_grad not being checked
Fix tensor_new.h
TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac
Fix bug in DeviceGuard.h
Missing includes
TEMPORARILY moving a few more methods into .cpp to see if it fixes windows
Fixing linker errors
* Fix up SummaryOps to use new factories
Undo device agnostic behavior of DeviceGuard
Use -1 instead of optional for default device index
Also move DeviceGuard methods into header
Fixes around device index after optional -> int32_t switch
Fix use of DeviceGuard in new_with_tensor_copy
Fix tensor_options.cpp
* Fix Type::copy(
* Remove test_non_float_params from ONNX tests
* Set requires_grad=False in ONNX tests that use ints
* Put layout/dtype/device on Tensor
* Post merge fixes
* Change behavior of DeviceGuard to match AutoGPU
* Fix C++ API integration tests
* Fix flip functions

c9b8d8566d - Added flip() fn in ATen (CPU + CUDA) (#7873)
* Spelling fix in MultivariateNormal docstring (#7915)
* [c10d] MPI Process Group Implementation (#7783) This provides a bare-minimum MPI Process Group implementation, the commit is on top of @pietern's Gloo Process Group PR.
* [c10d] MPI Process Group Implementation ref: https://github.com/pytorch/pytorch/issues/7434
* Better exception, atexit func, and addressed comments
* Clang formatting changes
* Static initialization and addressed comments
* Added constness back
* Test will now launch mpi processes if found
* CMakeList Changed
* Fix Windows doc for import error (#7704)
* Fix Windows doc for import error
* Fix doc again
* Fix wrong format
* Moved condition for dilated grouped convolutions to CUDNN convolution implementation (#7465)
* Updates to caffe2 operator documentation (#7917)
* Significant updates to the operator docs in prep for merge
* [auto] Update onnx to 307995b - Update from upstream (onnx/onnx#1038)

711e5a6ceb - Port THS to ATen. (#8409)
* Port THS to ATen.
The basic structure of the patch:
- All kernels in aten/src/THS got rewritten as native
functions in aten/src/ATen/native/sparse
I took the liberty to rename some of the kernels,
opting for a longer, more transparent names than
things like 'spaddcmul'.
- Instead of holding fields for sparse tensor in the TH
C struct THSTensor, they are now held in a C++ class
SparseTensorImpl (this explains why I had to do this
all in one go; I can't have *two* reps for sparse
tensors!)
Along the way, we change a key internal representation
invariant: an "empty" sparse tensor has dimI == 1 and
dimV == 0 (this is different from dimI == 0 and dimV == 0
we had before); this ensures that we maintain the invariant
that dim == dimI + dimV. "Scalar" sparse tensors are
made illegal, because there really is no way to properly
express them in COO format.
- Because we haven't ported THCS or any of the traditional
dense TH implementations, there is a new set of adapter
functions in native/LegacyBridge.cpp exclusively devoted
to deciding whether or not to go to the new native implementation
or back to the legacy TH binding (prefixed with th_).
The intent is that when everything gets ported, we can
delete this file.
- I've kept the stubs for all the THS functions, but they now all
error if you try to actually call them. Eventually, we should
replace these with calls to ATen so that everything keeps
working.
- I gobbled up SparseMM (SparseMM.cpp is no more). It was tasty.
There are some miscellaneous improvements which were needed for other
changes in this patch:
- There is now AT_FORALL_SCALAR_TYPES_EXCEPT_HALF, which does what
it says on the tin.
- axpy templated function moved to TH/BlasUtils.h, there's a new macro
which lets you easily forward to all of the TH functions. We also expose
THBlas_copy. I'm not terribly pleased with these functions but
they seem to serve a purpose they need.
- New method on Tensor to get TensorImpl*, unsafeGetTensorImpl
- accessor() is now this-const, since const-correctness on Tensor is a lie
- New toSparse()/toDense() methods on Type; now you can call these
directly without having to manually apply at::toSparse/toDense
on the Backend and then running toBackend yourself.
Changes to the kernels:
- Previously, the whole body of all kernels was compiled for
every supported scalar type. In our new implementation,
the scalar dispatch has been pushed into the smallest extent
which (1) is not in a type loop and (2) requires statically
knowing the scalar type. These sites all use
AT_DISPATCH_ALL_TYPES. I tried to use lambdas as much as
possible, but sometimes it was not possible when a OpenMP
pragma was used.
- Anywhere we tested if the nDimension of a tensor was zero,
we replaced with a test that numel is zero. Because, as we
known, nDimension of zero-size tensors in TH is zero, and
that's wrong wrong wrong (and not done this way in ATen).
Some subtleties:
- Places where previously fastget1d was used, I now use a
TensorAccessor. However, you have to be careful about grabbing
the accessor, because sometimes you will be accessor'ing
indices/values and they are empty, which means they will
be *1D* ("oh, aren't indices always 2D?" Nope. Nyet.)
So, essentially, it is only safe to grab an accessor *after*
you have checked that nnz != 0. All of these shenanigans
will go away when we properly support zero-size dimensions.
A few places, we test for this case just by wrapping the loop
in a conditional on nnz. Some other places this is not so easy,
so we instead short-circuit the function with a special case for
when nnz == 0 (usually, these implementations are degenerate).
- There is a very subtle but important difference between
_sparse_get_impl(self)->indices() and self._indices();
the latter may return a view! This is because nnz is
not guaranteed to match the dimensions of indices/values;
you can "truncate" a sparse tensor by setting the nnz.
Actually, I think this is not a good idea and we should
enforce a stronger invariant, but for this patch I slavishly
adhere to the old ways, and as such I have to be very
careful if I want to resize something, I had better use
the former and not the latter.
- I had to reimplement broadcasting by hand (thus the s_
and non-s_ functions in the sparse native files). There
is a very important distinction between foo_out and foo_,
so it is important that the LegacyBridge function always
call to the lower layer, and not try to avoid boilerplate
by calling to another LegacyBridge function first.
I did NOT put broadcasting in LegacyBridge (even though,
ultimately, that's where it must live), because the th_
functions which are invoked from LegacyBridge handle
broadcasting themselves, and I don't want to broadcast
twice.
- Sparse function MUST explicitly specify the Type they
dispatch from, otherwise Variable wrapping/unwrapping will
not work correctly. If you use _get_sparse_impl, that is
sufficient to levy this requirement.
- The "has native" tests in LegacyBridge.cpp are not 100%,
because some of the functions are mixed dense-sparse functions,
and so you can't just say, "Oh, if it's sparse and CPU, call
the native sparse implementation." This is handled on a
case by case basis. There is some especially complex
logic for add(), which has dense-dense, sparse-sparse
and dense-sparse implementations.
- I added some uses of SparseTensorRef in native_functions.yaml,
but you will notice that these are all on native_* functions,
and not the actual, top-level functions. So the SparseTensorRef
is purely documentary (helping you not call the wrong overload)
but there is no magic; we do the wrapping ourselves the hard
way. (This is in constrast to the TH binding code which is magical.)
Except for _sparse_mask; _sparse_mask is magical.
- There is a raw_copy_sparse_ method, which is really my way of
getting around the fact that copy_ has never been implemented
for sparse tensors (even before this patch), but there IS a
super secret, internal way of doing these copies that the THS
code used, and which I needed to get my hands on when I did this
port. We should refactor so that either (a) copy_ does support
sparse-sparse copy natively, or (b) we do this other ways.
- Irritatingly, I must explicitly resize_as_ before copy_ into
a tensor. This was not the case with THTensor_(copy) but I don't
have any direct binding that doesn't have this requirement.
- For some reason, the sparse tensor constructor accepts a scalar
tensor for the values tensor. This is kind of weird because
you always need an nnz-dimension. However, the old code supported
this and just expanded it into a 1D size 0 tensor; so we need some
explicit code to do this.
There are maybe a bit more AT_ASSERTs in some of the kernels
than is wise. I added them all when I was debugging and was
loathe to remove them.
Some last mile fixes after this commit went into PR
- Move expand outside of dispatch so autograd works (it used to be inside and then we lost all of the recorded broadcasts).
- Hack to duplicate the derivatives for our now two definitions TH and native. Mercifully the derivatives are short.
- Apparently, TH has a special case to make foo_ functions method only, and if you don't do this the Python arg parsing is wrong. We carefully work around this in the native bindings
- Apply DCE to a test_jit case, fixes wobbling due to DCE trick in tracing
- Update test_function's output
- Some last mile fixes for dispatch confusion in sparse_coo_tensor functions.
- New simplified regression test based on failures I saw in ONNX
- Increase tolerance on super resolution test
- More robust dynamic_type normalization, fixes ONNX bug.
The dynamic_type situation is very delicate; probably need
to stop having both Scalar and real.
- Make new_with_tensor_sparse more CUDA safe
- Note about CUDA-safety in SparseTensorImpl
- Rename dimI/dimV to sparseDims/denseDims.
- Make localScalar on SparseTensorImpl work.
- Make numel uniformly supported on all types, not just dense
types
- Add tests for is_nonzero() method (which exercises localScalar)
- Disable constant JIT autogenerated tests, which are fragile and broken
by this change, but being fixed in a parallel track.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
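
Since the notes above lean on the COO representation (an indices matrix plus a values tensor, with sparse and dense dimension counts), here is a minimal Python-side sketch; the index and value data are made up, and sparse_dim()/dense_dim() are the present-day Python names for those counts.

```python
import torch

# COO format: a (sparse_dim x nnz) index matrix plus an nnz-long values tensor.
indices = torch.tensor([[0, 1, 1],
                        [2, 0, 2]])        # two sparse dimensions, nnz == 3
values = torch.tensor([3.0, 4.0, 5.0])     # no dense dimensions in this example
s = torch.sparse_coo_tensor(indices, values, size=(2, 3))

s.to_dense()
# tensor([[0., 0., 3.],
#         [4., 0., 5.]])
```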

6869a5f0fb - Throw error on 0-length tensor slicing (#7775)
* throw error on 0-length tensor slicing
* return empty tensor instead of throwing error
* make 0 slice work for tuples also
* add tests
* move check to aten
* Address comments

ae55865a3b - Migrated hardshrink() to ATen and deprecated nn.Hardshrink() (#8117)
* 1. added hardshrink() to ATen (CPU + GPU); 2. removed nn.Hardshrink(); 3. reusing previous tests for nn.Hardshrink() and included CUDA tests at test_nn; 4. default parameter lambda=0.5 is not working yet
* optimized memory read/write
* 1. pass in lambd as scalar for CPU/CUDA_apply*; 2. removed tests for hardshrink at test_legacy_nn
* fixes test_utils
* 1. replace zeros_like with empty_like; 2. use scalar_cast in cuda
* 1. printing lambd value; 2. default lambd=0.5 is still failing
* getting around Scalar bug by removing default value of lambd from native_functions.yaml, and declare it at nn/functional.py
* cleaned up debug printf
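
A small usage sketch of the migrated function; the input values are illustrative.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, -0.3, 0.0, 0.4, 2.0])

# hardshrink zeroes entries with |x| <= lambd and leaves the rest unchanged
F.hardshrink(x)               # default lambd=0.5 -> tensor([-1., 0., 0., 0., 2.])
F.hardshrink(x, lambd=0.35)   # keeps 0.4 as well
```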

21609e0fd0 - `bincount` feature implementation (#6688)
* Implement CPU bincount feature support
* Incorporate feedback on renaming to SummaryOps file and other nits
* bincount gpu implementation
* refactor cuda code and incorporate nits
* doc fix
* cuda bincount - cast weights to double if integral type
* fix: signed unsigned comparison error
* fix: ssize_t error
* refactor
* make template typenames readable and other nits
* make compatible with v0.5
* incorporate comments
* update test cases to ensure CUDA code coverage
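
Minimal usage of the new operator; the input values are made up.

```python
import torch

x = torch.tensor([0, 1, 1, 3, 3, 3])
torch.bincount(x)                   # tensor([1, 2, 0, 3])
torch.bincount(x, minlength=6)      # tensor([1, 2, 0, 3, 0, 0])

# Optional per-element weights accumulated into each bin
w = torch.tensor([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
torch.bincount(x, weights=w)        # tensor([0.1000, 0.5000, 0.0000, 1.5000])
```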

6a85b133d3 - Improve number formatting in tensor print (#7632)
* Improve number formatting in tensor print
* fix bad rebase
* address comments
* fix test
* fix test
* use assertExpected for tests
* address comments
* address comments

71a3633e3f - change tensor.set_() argument names to match descriptions in doc (#8403)
Renamed the `storage` and `sourceStorage` arguments of tensor.set_() to `source` to match the descriptions in the docs.

ffffee6aa9 - Skip test_multinomial_invalid_probs on Windows (#8360)

c3e4b3c88b - raise more informative error msg for torch.load not support seek (#7754)
Raise a more informative error message from torch.load() when the input file does not support seek() or tell().

742912512c - Move signal window functions to ATen; add Blackman window (#8130)
* Move signal window functions to ATen; add Blackman window
* fix cuda test not checking scipy
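
A quick sketch of the added factory function; the frame length is illustrative, and torch.fft.rfft is the current-API spelling used only to show a typical downstream use, not part of this PR.

```python
import torch

w = torch.blackman_window(128)                       # periodic window, default
w_sym = torch.blackman_window(128, periodic=False)   # symmetric variant

# Typical use: taper a frame before taking its FFT
frame = torch.randn(128)
spectrum = torch.fft.rfft(frame * w)
```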

89ea6acde2 - [NEEDS REVIEW] Add nan and inf probability check to multinomial (#7647)
* Add nan and inf probs check to multinomial
* fix bug
* Spawn CUDA test in subprocess
* Make sure invalid input won't pass the test case
* Try to fix error
* Test failure cases in Python 3 only
* Try to fix Windows error
* Move CUDA test to test_cuda.py
* fix issues
* fix module name error
* no need to check for CUDA existence in test_cuda
* Use PY3

c0a419e6ba - Add non_blocking to Tensor/Module.to (#7312)
* Add non_blocking to Tensor/Module.to
* flake8
* Add argparse tests
* cpp parse
* Use C++ parser
* use a common parse function with Tensor.to
* fix test_jit
* use THPObjectPtr
* increase refcount for None, True, and False
* address comments
* address comments
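
A brief sketch of the new keyword; the pinned-memory setup and the availability check are illustrative usage, not taken from the PR.

```python
import torch

if torch.cuda.is_available():
    x = torch.randn(1024, 1024).pin_memory()   # pinned host memory enables asynchronous copies
    y = x.to('cuda', non_blocking=True)        # queued on the current stream; may overlap host work

model = torch.nn.Linear(10, 10)
model = model.to(torch.device('cpu'), non_blocking=True)   # Module.to accepts the same keyword
```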

bafec1637e - support loading gzip (#6490)
* support loading gzip
* address comments
* address comments
* fix lint
* fix test for python2
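
A sketch of what this enables, assuming a hypothetical file name; torch.save and torch.load work with seekable file-like objects, which gzip provides.

```python
import gzip
import torch

t = torch.arange(10)

with gzip.open('tensor.pt.gz', 'wb') as f:   # hypothetical path
    torch.save(t, f)

with gzip.open('tensor.pt.gz', 'rb') as f:
    t2 = torch.load(f)
```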

e9c33e91d9 - Remove python bindings for torch.slice (#7924)
* skip python bindings for slice
* remove tests
* convert slice test to indexing

b5594ac750 - Raise error when torch.load a storage on a non-existing device (#7921)
* Raise error when torch.load a storage on a non-existing device
Before, doing torch.load(...) on a CUDA tensor on a CPU-only machine
would raise an unreadable error:
```
~/pytorch/pytorch/torch/cuda/__init__.py in __enter__(self)
223 if self.idx is -1:
224 return
--> 225 self.prev_idx = torch._C._cuda_getDevice()
226 if self.prev_idx != self.idx:
227 torch._C._cuda_setDevice(self.idx)
AttributeError: module 'torch._C' has no attribute '_cuda_getDevice'
```
This PR makes it so that torch.load raises a hard error if one tries to
load a storage onto a non-existing device and suggests the user to use
torch.load's map_location feature.
* Address comments
* missing dep
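
The map_location workaround that the new error message points at, sketched briefly; the checkpoint path is hypothetical.

```python
import torch

# Load a GPU-saved checkpoint on a CPU-only machine
state = torch.load('checkpoint.pt', map_location='cpu')                  # string form
state = torch.load('checkpoint.pt', map_location=torch.device('cpu'))    # torch.device form
state = torch.load('checkpoint.pt',
                   map_location=lambda storage, loc: storage)            # callable form
```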

769f5f7cfe - Handling of scalars in torch.Size (#5676)
* Handling of scalars in torch.Size
torch.Size() constructor uses python_arg_parser
IntList in python_arg_parser can take iter/range
Have IntList take python iterables and ranges.
Address comments: don't use python_arg_parser and instead call __index__ in THPSize_pynew
Address comments
Address comments
* Rebased
* Address nit

42a68749bf - einsum: don't inplace modify arguments (fixes: #7763) (#7765)
Thank you, Pierce Freeman, for the report and minimal example!

b4ae80d459 - serialization for torch.device (#7713)

75cf0faf4c - Implement __reduce__ for torch.dtype (#7699)

4f20a0e439 - Fix various sparse transpose issues; remove dead code from Declaratio… (#7200)
* Fix various sparse transpose issues; remove dead code from Declarations.yaml.
1) Fixes some checks in t_, transpose_ that don't allow transposing empty sparse tensors.
2) Remove out= variants from docs since they don't exist (and haven't since at least v0.3.1).
3) Unify implementations of t_, transpose_, t, transpose.
4) Move dead checking code from Declarations.cwrap to actual implementations.
5) Fix test which never tested transpose_.
* Add test for error with t, t_.
* Address review comments.
* Fix jit tests.
* Fix test_jit.

7abdc303c6 - Don't allow requires_grad to be set on integer Tensor constructors in… (#7185)
* Don't allow requires_grad to be set on integer Tensor constructors in tensor_new.
* Fix autograd test.
* Fix test_distributions.
* Fix test_jit.
* Fix NN tests.
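
The user-visible effect, sketched below; the exact error text is paraphrased and may differ slightly between releases.

```python
import torch

torch.tensor([1.0, 2.0], requires_grad=True)      # fine: floating point dtype

try:
    torch.tensor([1, 2, 3], requires_grad=True)   # int64 dtype
except RuntimeError as e:
    print(e)   # only Tensors of floating point dtype can require gradients
```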

32b23a4bfc - Throw error on tensor creation when sequence shape cannot be determined (#7583)
* first commit
* unit test
* minor style edits

bf95dff85b - Map digamma +/-inf results to nan in test (fixes #7651) (#7665)

e1148db7f2 - Implement logsumexp (fixes #2591) (#7254)
* Implement logsumexp (fixes #2591)
* Add logsumexp_backward, fix _out declaration. Thank you Simon and Edward for your comments!
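
A small sketch of the new reduction and the identity it computes stably; the shapes are illustrative.

```python
import torch

x = torch.randn(4, 5)

# logsumexp(x, dim) == log(sum(exp(x), dim)), evaluated as m + log(sum(exp(x - m))) with m = max(x)
a = torch.logsumexp(x, dim=1)
b = torch.log(torch.exp(x).sum(dim=1))   # naive form; can overflow or underflow for large |x|
assert torch.allclose(a, b, atol=1e-6)

torch.logsumexp(x, dim=1, keepdim=True).shape   # torch.Size([4, 1])
```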

cfc1d92975 - Implement ellipses ('...') and diagonals (e.g. 'ii->i') in einsum. (#7173)
This brings the two most important missing numpy einsum features to torch.einsum.
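
A sketch of the two added features; the operand shapes are illustrative, and the varargs calling form shown is the current one.

```python
import torch

M = torch.randn(4, 4)
d = torch.einsum('ii->i', M)     # main diagonal, same as torch.diagonal(M)
t = torch.einsum('ii', M)        # repeated index with no output subscript: the trace

# Ellipsis broadcasts over leading batch dimensions
A = torch.randn(3, 4, 4)
B = torch.randn(3, 4, 5)
C = torch.einsum('...ij,...jk->...ik', A, B)   # batched matmul, shape (3, 4, 5)
```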

eaa3f2e613 - Fix advanced indexing with negative indices (#7345)
* Fix advanced indexing with negative indices
Fixes #7156
Here is some behavior before this PR:
```
In[1]: x = torch.arange(9).view(3, 3).contiguous()
x[[0], [-1]]  # Should be equivalent to x[0, -1]
Out[1]: tensor([ 8])
```
The bug is that negative indices are added to the computed linear index directly. In the above example, the linear index computed is "-1", which wraps around to "8", giving the last element of a flattened view of `x`. Instead, we should wrap negative indices around before adding them to the linear index.
* Use toCLong()

857e3f4a5e - Throw error in tensor constructor when numpy strides mismatch (#7440)

9fa1dff66a - Allow the use of torch.device for loading (#7339)
* Allow using torch.device for loading
* Make recommended changes
* Better tests

71626491c4 - Add batched linear solver to torch.gesv() (#6100)
* Add batched linear solver to torch.gesv()
Fixes #3164 Picks up from #4502
I moved `gesv` to ATen. Adds bindings for MAGMA's `gesv_batched` function for CUDA. For CPU, runs `THLapack(gesv)` in a for loop. The new function supports arbitrary batch dimensions (and broadcasting of those dimensions). For example, the 4-d tensor `A x B x M x M` should be treated as having batch-size `(A x B)`. The overhead of creating the magma_queue_t is: ~350000 microseconds the first time it's called and ~6 microseconds every time after that.
* Tests and docs
* Address comments
* Address comments
* Rebase
* Address comments
* Fix rebase
* Addressed comments
* Address comments
* Address comments
* Addressed comments
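
A sketch of the batched-solve semantics described above. torch.gesv has since been deprecated, so the example uses the current torch.linalg.solve equivalent; at the time of this commit the call was torch.gesv(b, A), returning the solution and the LU factorization. The shapes are illustrative.

```python
import torch

# Batch dimensions broadcast: A has shape (*, M, M), b has shape (*, M, K)
A = torch.randn(2, 3, 4, 4)
b = torch.randn(2, 3, 4, 6)

x = torch.linalg.solve(A, b)            # shape (2, 3, 4, 6)
residual = (A @ x - b).abs().max()      # ~0 up to floating-point error
```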

8091388d0f - Add support for __floordiv__ and __rdiv__ for integral tensors (#7245)

681baa9254 - Restore warning to torch.range. (#7194)
Also, get rid of warning specification in Declarations.cwrap, which currently has no effect.

07513cfd1d - implement sum over multiple dimensions (fixes #2006) (#6152)

88a705555a - Add SLEEF for float and double (#6725)

8031da5479 - Implement torch.as_tensor, similar to numpy.asarray. (#7109)
* Implement torch.as_tensor, similar to numpy.asarray. torch.as_tensor behaves like torch.tensor except it avoids copies if possible; so also somewhat like tensor.new but without the size overloads. I didn't add a requires_grad field, because we haven't decided on the semantics such as as_param.
* Remove requires_grad for doc.
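
A short sketch of the copy-avoiding behavior described above; the array values are illustrative.

```python
import numpy as np
import torch

a = np.array([1.0, 2.0, 3.0])
t = torch.as_tensor(a)      # shares memory with the numpy array when possible
a[0] = 42.0                 # the change is visible through t

c = torch.tensor(a)         # torch.tensor always copies

x = torch.as_tensor(t)      # already a Tensor with matching dtype and device
assert x is t               # returned as-is, no copy
```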

8fbab83c2a - only Tensors of floating point dtype can require gradients (see #7021) (#7034)

361648a4a7 - Fix torch.tensor(...) device-type calculation when used with numpy an… (#6995)
* Fix torch.tensor(...) device-type calculation when used with numpy and type inference.
* Fix tensor device type inference as well.
* Better variable type inference: infer cuda-ness only if device is not specified.