Summary:
`torch.tensor([True, False, True], dtype=torch.bool).sum()` should return **2** instead of **True** as it does now.
Tested via unit tests
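A quick illustration of the intended behavior:
```python
import torch

t = torch.tensor([True, False, True], dtype=torch.bool)
print(t.sum())   # tensor(2) after this change, instead of a bool result
```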
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21421
Differential Revision: D15674203
Pulled By: izdeby
fbshipit-source-id: b00e3d0ca809c9b92b750adc05632522dad50c74
Summary:
Something flaky is going on with `test_inplace_view_saved_output` on Windows.
With my PR #20598 applied, the test fails, even though there is no obvious reason it should be related, so the PR was reverted.
Based on commenting out various parts of my change and re-building, I think the problem is with the name -- renaming everything from `T` to `asdf` seems to make the test stop failing. I can't be sure that this is actually the case though, since I could just be seeing patterns in non-deterministic build output...
I spoke with colesbury offline and we agreed that it is okay to just disable this test on Windows for now and not block landing the main change. He will look into why it is failing.
**Test Plan:** I will wait to make sure the Windows CI suite passes before landing this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21175
Differential Revision: D15566970
Pulled By: umanwizard
fbshipit-source-id: edf223375d41faaab0a3a14dca50841f08030da3
Summary:
This PR improves the performance of advanced indexing backward, partially solving #15245 (performance is still worse than gather, but not by such outrageous margins). Before, using the benchmarking harness from #15245, CUDA 10/V100:
```
Indexing is faster by at most -270.61607820767887 us on N: 16 D: 256 K: 1
Indexing is slower by at most 11127.466280784833 us on N: 16 D: 4096 K: 4096
```
after:
```
Indexing is faster by at most 23.524456737696028 us on N: 512 D: 4096 K: 4096
Indexing is slower by at most 186.24056029472553 us on N: 16 D: 1024 K: 4096
```
The strategy is to reuse the embedding backward kernel, adapting it to handle unindexed dimensions at the beginning by launching additional threadblocks, and also allowing it to handle slices bigger than `65K*128`, which is hardly ever a problem for embedding. Still, integer indexing is baked into the kernel and is important for performance, so tensors with more than 2G elements are not supported for now.
The main savings come from not having to expand the index to all unindexed dimensions, and from sorting only the unexpanded index rather than the expanded index together with the incoming gradient values.
There are ways to make the sorting overhead smaller (thanks mcarilli for the suggestions), but I'll get to it when it becomes a real problem, or rather, when CUDA graphs force us to get rid of thrust::sort calls.
I've also added tests for indexing backward; previously, tests for index_put_ and indexing backward were non-existent.
This PR also fixes #20457 by casting indices to the `self` backend.
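For reference, a minimal sketch of the kind of advanced-indexing backward this PR speeds up (shapes are illustrative, loosely following the N/D/K parameters above):
```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.randn(16, 4096, device=device, requires_grad=True)   # N x D weight-like tensor
idx = torch.randint(0, 16, (4096,), device=device)             # K indices, with repeats
out = x[idx]                        # advanced indexing forward
out.sum().backward()                # backward scatters/accumulates gradients into x.grad
print(x.grad.shape)                 # torch.Size([16, 4096])
```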
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20557
Differential Revision: D15582434
Pulled By: ezyang
fbshipit-source-id: 91e8f2769580588ec7d18823d99a26f1c0da8e2a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21196
We'll add `quantize(quantizer)` as a tensor method later, when we expose `quantizer` in the Python frontend.
Python
```
torch.quantize_linear(t, ...)
```
C++
```
at::quantize_linear(t, ...)
```
Differential Revision: D15577123
fbshipit-source-id: d0abeea488418fa9ab212f84b0b97ee237124240
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21156
We'll add `quantize(quantizer)` as a tensor method later, when we expose `quantizer` in the Python frontend.
Python
```
torch.quantize_linear(t, ...)
```
C++
```
at::quantize_linear(t, ...)
```
Differential Revision: D15558784
fbshipit-source-id: 0b194750c423f51ad1ad5e9387a12b4d58d969a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20874
A criterion for what should be a Tensor method is whether NumPy has it; since NumPy does not have this one,
we are removing it as a Tensor method. It can still be called as a function.
Python
```
torch.quantize_linear(t, ...), torch.dequantize(t)
```
C++
```
at::quantize_linear(t, ...), at::dequantize(t)
```
Reviewed By: dzhulgakov
Differential Revision: D15477933
fbshipit-source-id: c8aa81f681e02f038d72e44f0c700632f1af8437
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20869
Adding support for the functions listed in the title, by implementing the copy kernel.
Differential Revision: D15474060
fbshipit-source-id: 9264df6e442cca1cc5d952e3e5dcc9f4a426f317
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21035
Fix the dtype error in `dequantize_linear`; it should accept the same dtype argument as `quantize_linear`.
Differential Revision: D15521931
fbshipit-source-id: 0114c046a3f1046e42fca49c74c85e487fee8616
Summary:
This PR covers two important points with respect to the QR decomposition:
- batching of input matrices (#7500)
- adding `some` as an option in `torch.qr` akin to NumPy's `mode` option (#10538)
Changelog:
- Enable batching for inputs to `torch.qr`
- Move QR decomposition implementation to ATen (CPU and CUDA)
- Remove existing implementations in TH/THC
- Add a `some` option to `torch.qr` that will enable users to switch between complete and reduced decomposition
- Modify doc strings
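A short usage sketch of the new behavior described above:
```python
import torch

A = torch.randn(4, 5, 3)                    # batch of 4 matrices, now accepted
q, r = torch.qr(A, some=True)               # reduced: q is (4, 5, 3), r is (4, 3, 3)
q_full, r_full = torch.qr(A, some=False)    # complete: q_full is (4, 5, 5), r_full is (4, 5, 3)
print(torch.allclose(q @ r, A, atol=1e-5))  # True
```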
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20689
Differential Revision: D15529230
Pulled By: soumith
fbshipit-source-id: 16af82b1d2db8a3a758fa8a5f798d83f5f950efb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20938
`dequantize_linear` need not be exposed to front-end users. It will only be used in
the JIT passes for q-dq insertion and op substitution.
Differential Revision: D15446097
fbshipit-source-id: a5fbcf2bb72115122c9653e5089d014e2a2e891d
Summary:
Bug reported internally at FB:
```python
>>> t=torch.from_numpy(np.empty((0,4)))
>>> t[:,1::2]*=1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: Trying to resize storage that is not resizable at ../aten/src/TH/THStorageFunctions.cpp:76
```
This happens because the storage offset of `t[:, 1::2]` is 1, and it has 0 elements. We can fix this by avoiding resizing the storage for no-element arrays.
(We could *also* have avoided it by not modifying the storage offset in this case, but I felt this way was more semantically correct -- in general, we should not be assuming it's okay to do anything to the storage when it has zero elements.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20914
Differential Revision: D15497860
Pulled By: umanwizard
fbshipit-source-id: 6af61d73a05edfc5c07ce8be9e530f15bf72e6a9
Summary:
This PR also moves Device::validate into the header file, which makes
statements like `Device d = kCPU` effectively free.
Device includes the device's index, so TensorIterator::compute_types
now implicitly checks that all CUDA inputs are on the same GPU.
Previously, this was done ad-hoc in places like TensorIterator::binary_op.
Note that zero-dim Tensor (scalars) are NOT required to be on the
same device as other inputs because they behave almost like Python numbers.
TensorIterator handles copying zero-dim Tensors to the common device.
Prior to this PR, TensorIterator would copy zero-dim Tensors between CPU
and GPU, but not between different GPUs (because Backend didn't encode
the GPU index). This removes that restriction.
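A sketch of the resulting behavior; the cross-GPU part needs at least two devices, so treat it as illustrative:
```python
import torch

if torch.cuda.device_count() >= 2:
    a = torch.randn(3, device='cuda:0')
    b = torch.randn(3, device='cuda:1')
    # a + b                    # error: non-scalar inputs must be on the same GPU
    s = torch.tensor(2.0)      # zero-dim CPU tensor, treated like a Python number
    print(a * s)               # ok: TensorIterator copies the scalar to cuda:0
```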
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20690
Differential Revision: D15414826
Pulled By: colesbury
fbshipit-source-id: 1d0ad1f7d663252af36dd4590bcda418c2f7a09f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20740
Provide a way to assemble a quantized Tensor from an int8 Tensor, a scale, and a zero point.
Differential Revision: D15232416
fbshipit-source-id: c3a3d9d7214b1dc569214c019440c2779fbd063b
Summary:
CUDA 8 is no longer supported and has been removed from CI, so these checks are irrelevant.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20482
Differential Revision: D15393438
Pulled By: ezyang
fbshipit-source-id: ac0979bf660b3314eec502c745e34ce4940bda0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19932
In preparation to add int8_t data type for QTensor
Reviewed By: zafartahirov
Differential Revision: D15137838
fbshipit-source-id: 59462c36d6fc5982986d4196bf3f32f49bb294d7
Summary:
#19975 was split into 2 PRs.
This one:
Introduces a MemoryFormat argument to the `x.is_contiguous(memory_format=torch.channels_last)` and `y = x.contiguous(memory_format=torch.channels_last)` functions.
At this moment both functions just operate on strides and don't store any tensor state.
(Original RFC #19092)
-----
Expands the functionality of two tensor functions, `.is_contiguous` and `.contiguous` (both the Python and C++ APIs).
Note: We had several complaints about the `.to(memory_format)` function, and decided not to support it.
1. `.contiguous` now supports an optional keyword-only argument, `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.
- Using `torch.contiguous_format` preserves the existing `.contiguous()` behavior.
- Calling `x.contiguous(memory_format=torch.channels_last)` returns a new tensor which maintains the same semantic layout (NCHW) but has a different memory allocation pattern.
`x.contiguous(memory_format=torch.channels_last)` expects the input tensor to be 3d, 4d, or 5d, and fails otherwise.
2. `.is_contiguous` now supports an optional keyword-only argument, `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.
- `x.is_contiguous(memory_format=torch.contiguous_format)` preserves the same functionality as `x.is_contiguous()` and remains unchanged.
- `x.is_contiguous(memory_format=torch.channels_last)` returns true if A) the input tensor is contiguous in memory AND B) it is allocated in memory in NHWC (or the analogous 3d/5d) format.
Note: By the end of phase one, `x.is_contiguous(memory_format=torch.channels_last)` will compute the state of the Tensor on every call. This functionality is going to be updated later.
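A brief usage sketch of the two extended functions on a 4d (NCHW) tensor:
```python
import torch

x = torch.randn(2, 3, 4, 4)                                   # NCHW tensor
y = x.contiguous(memory_format=torch.channels_last)
print(y.shape)                                                 # (2, 3, 4, 4): semantic layout unchanged
print(y.stride())                                              # strides reordered so C moves fastest
print(y.is_contiguous())                                       # False
print(y.is_contiguous(memory_format=torch.channels_last))      # True
```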
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20455
Differential Revision: D15341577
Pulled By: VitalyFedyunin
fbshipit-source-id: bbb6b4159a8a49149110ad321109a3742383185d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19816
We need this for quantizing the bias.
Add a third argument of ScalarType to `quantize_linear`.
Differential Revision: D15094174
fbshipit-source-id: f19ec8f4716cf5fe0aa21b38d45af6d27c9ab377
Summary:
The current variance kernels compute the mean at the same time. We often want both statistics together, so it seems reasonable to have a kwarg/function that returns both values without launching an extra kernel.
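Assuming the combined functions land as `torch.var_mean` / `torch.std_mean` (the names here are an assumption, not stated above), usage would look like:
```python
import torch

x = torch.randn(1024, 1024)
var, mean = torch.var_mean(x, dim=1)   # both statistics without an extra kernel launch
std, mean = torch.std_mean(x, dim=1)
```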
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18731
Differential Revision: D14726082
Pulled By: ifedan
fbshipit-source-id: 473cba0227b69eb2240dca5e61a8f4366df0e029
Summary:
Add automatic translations for a few argument names that commonly differ between PyTorch and NumPy.
For now, they are as follows:
* `keepdim` -> `keepdims`
* `dim` -> `axis`
* `input` -> (any of `a`, `x`, `x1`)
* `other` -> `x2`
Basic examples:
```python
>>> t=torch.randn(10,10)
>>> torch.sum(x=t, axis=1)
tensor([ 0.5199, -0.3768, 4.3619, -0.9105, 1.1804, 1.0837, -0.9036, 0.2365,
1.1171, -0.0999])
```
```python
>>> torch.add(x1=5, x2=6)
tensor(11)
```
The additional overhead is zero when using traditional PyTorch argument names, and a few (usually 1) extra PyDict lookups when using NumPy argument names.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20451
Differential Revision: D15337521
Pulled By: umanwizard
fbshipit-source-id: 7a7d389786f4ccf5c86a14ecb2002c61730c51b5
Summary:
This addresses #18436
The logic replicates the essence of closing file descriptors in numpy:
bf20e30340/numpy/core/include/numpy/npy_3kcompat.h (L278)
This stores the position of the file descriptor before resetting it to the Python handle offset, then resets to the original position before exit. The Python-side handle is then updated to reflect the new position. Also added somewhat more demanding tests to cover this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20270
Differential Revision: D15275902
Pulled By: soumith
fbshipit-source-id: 5ca8a52b61c7718d2e69571f72f80b1350b0acdb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19513
Add support for printing a QTensor in the Python frontend.
Differential Revision: D15017168
fbshipit-source-id: 312d1f18e6ca3c9eb4a5b8bb1c64f7cc8bc1dcf5
Summary:
`log_normal_` and `geometric_` were disabled for CPU by mistake in [this PR](bc53805f2e); this PR fixes it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19938
Differential Revision: D15143404
Pulled By: izdeby
fbshipit-source-id: 41c7bd29f046b5a3ac6d601de8c64ab553771d19
Summary:
Added deprecation warnings for the masked methods and enabled them for a bool tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19140
Differential Revision: D14888021
Pulled By: izdeby
fbshipit-source-id: 0e42daf8f3732ca29f36d10485402bfc502716ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19676
Make copy work with QTensor and enable assignment of QTensor in the PyTorch frontend.
Differential Revision: D15064710
fbshipit-source-id: 04f2dc02a825695d41fa1114bfca49e92108fef3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19530
Make copy work with QTensor and enable assignment of QTensor in the PyTorch frontend.
Differential Revision: D15008160
fbshipit-source-id: 5f1166246d768b23f009cde1fa03e8952368a332
Summary:
Add `base` support for torch.logspace. See #19220 for details.
SsnL, can you give feedback? Thanks a lot.
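Assuming the new option is a `base` keyword on `torch.logspace` (the keyword name here is inferred from #19220), a minimal example:
```python
import torch

print(torch.logspace(0, 3, steps=4))          # default base 10: tensor([1., 10., 100., 1000.])
print(torch.logspace(0, 3, steps=4, base=2))  # tensor([1., 2., 4., 8.])
```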
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19542
Differential Revision: D15028484
Pulled By: soumith
fbshipit-source-id: fe5a58a203b279103abbc192c754c25d5031498e
Summary:
Changelog:
- Rename `potri` to `cholesky_inverse` to remain consistent with names of `cholesky` methods (`cholesky`, `cholesky_solve`)
- Fix all callsites
- Rename all tests
- Create a tentative alias for `cholesky_inverse` under the name `potri` and add a deprecation warning to not promote usage
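A small sketch of the renamed function:
```python
import torch

A = torch.randn(3, 3)
A = A @ A.t() + 3 * torch.eye(3)          # symmetric positive-definite
u = torch.cholesky(A)                     # lower-triangular Cholesky factor
A_inv = torch.cholesky_inverse(u)         # formerly torch.potri
print(torch.allclose(A_inv, A.inverse(), atol=1e-5))
```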
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19498
Differential Revision: D15029901
Pulled By: ezyang
fbshipit-source-id: 2074286dc93d8744cdc9a45d54644fe57df3a57a
Summary:
Attempted fix for #14057. This PR fixes the example script in the issue.
The old behavior is a bit confusing here. What happened during pickling is that Python 2 failed to recognize that `torch.float32` is in module `torch`, so it looked for `torch.float32` in module `__main__`. Python 3 is smart enough to handle it.
According to the doc [here](https://docs.python.org/2/library/pickle.html#object.__reduce__), it seems `__reduce__` should return `float32` instead of the old name `torch.float32`. This way Python 2 is able to find `float32` in the `torch` module.
> If a string is returned, it names a global variable whose contents are pickled as normal. The string returned by __reduce__() should be the object’s local name relative to its module
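A quick round-trip check of the fixed behavior (Python 2 previously failed here; Python 3 already worked):
```python
import pickle
import torch

buf = pickle.dumps(torch.float32)
assert pickle.loads(buf) is torch.float32   # __reduce__ now returns 'float32', resolved in torch
```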
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18045
Differential Revision: D14990638
Pulled By: ailzhang
fbshipit-source-id: 816b97d63a934a5dda1a910312ad69f120b0b4de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18960
empty_affine_quantized creates an empty affine quantized Tensor from scratch.
We might need this when we implement quantized operators.
Differential Revision: D14810261
fbshipit-source-id: f07d8bf89822d02a202ee81c78a17aa4b3e571cc
Summary:
This adds checks for `mul_`, `add_`, `sub_`, `div_`, the most common
binops. See #17935 for more details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19317
Differential Revision: D14972399
Pulled By: zou3519
fbshipit-source-id: b9de331dbdb2544ee859ded725a5b5659bfd11d2
Summary:
Make it possible to construct a pinned-memory tensor without creating a storage first and without calling the pin_memory() function. It is also faster, as the copy operation is unnecessary.
Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```
Part of the bigger: `Remove Storage` plan.
Now compatible with both torch scripts:
` _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"), pin_memory=False)`
and
` _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"))`
The same is checked for all similar functions (`rand_like`, `empty_like`, and others).
This is a fixed version of #18455.
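A minimal check of the new path (pinned memory needs a CUDA-capable build, so the example guards on it):
```python
import torch

if torch.cuda.is_available():
    t = torch.empty(1024, pin_memory=True)   # pinned allocation, no intermediate copy
    print(t.is_pinned())                     # True
```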
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18952
Differential Revision: D14801792
Pulled By: VitalyFedyunin
fbshipit-source-id: 8dbc61078ff7a637d0ecdb95d4e98f704d5450ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18546
We'll expose all combinations of the various ways of quantization in the top-level dispatch key, that is, we have AffineCPUTensor, PerChannelAffineCUDATensor, etc.
QTensor method added:
- is_quantized()
- item()
Differential Revision: D14637671
fbshipit-source-id: 346bc6ef404a570f0efd34e8793056ad3c7855f5
Summary:
I've been messing around with vectorizing the fusion compiler in JIT, and noticed that these ops were pathologically slow. I moved them to use TensorIterator + Vec256<> and got some speed wins.
Benchmark script:
```
import torch, time

ops = ['abs', 'neg', 'reciprocal', 'frac']
x = torch.rand(1024, 1024)
NITER = 10000

print('op', 'time per iter (ms)', 'gops/s', 'GB/s', sep='\t')
for op in ops:
    s = time.time()
    for i in range(NITER):
        getattr(x, op)()
    elapsed_sec = ((time.time() - s) / NITER)
    print(op, elapsed_sec * 1000, (1024*1024/elapsed_sec)/1e9, (1024*1024*4*2) / elapsed_sec / 1e9, sep='\t')
```
Before this change (on my mac with a skylake):
```
op time per iter (ms) gops/s GB/s
abs 0.9730974197387695 1.0775652866097343 8.620522292877874
neg 1.0723679780960083 0.9778136063534356 7.822508850827485
reciprocal 1.2610594034194946 0.8315040490215421 6.6520323921723366
frac 1.1681334018707275 0.8976509004200546 7.181207203360437
```
After this change:
```
op time per iter (ms) gops/s GB/s
abs 0.5031076192855835 2.084198210889721 16.673585687117768
neg 0.4433974027633667 2.3648672578256087 18.91893806260487
reciprocal 0.47145988941192624 2.2241043693195985 17.79283495455679
frac 0.5036592721939087 2.0819154096627024 16.65532327730162
```
So, after this change it looks like we are hitting machine peak for bandwidth and are bandwidth bound.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19041
Differential Revision: D14862037
Pulled By: jamesr66a
fbshipit-source-id: e2032ac0ca962dbf4120bb36812277c260e22912
Summary:
Changelog:
- Rename `btrisolve` to `lu_solve` to remain consistent with names of solve methods (`cholesky_solve`, `triangular_solve`, `solve`)
- Fix all callsites
- Rename all tests
- Create a tentative alias for `lu_solve` under the name `btrisolve` and add a deprecation warning to not promote usage
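A short sketch combining `torch.lu` (formerly `btrifact`) with the renamed `torch.lu_solve`:
```python
import torch

A = torch.randn(2, 3, 3)
b = torch.randn(2, 3, 1)
LU, pivots = torch.lu(A)              # formerly btrifact
x = torch.lu_solve(b, LU, pivots)     # formerly btrisolve
print(torch.allclose(A @ x, b, atol=1e-5))
```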
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18726
Differential Revision: D14726237
Pulled By: zou3519
fbshipit-source-id: bf25f6c79062183a4153015e0ec7ebab2c8b986b
Summary:
Partial fix of https://github.com/pytorch/pytorch/issues/394:
- `gels` and `triangular_solve` now return namedtuples
- refactored the namedtuple API tests for better coverage and maintainability
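A sketch of the namedtuple-style access (the field names `solution` / `cloned_coefficient` are an assumption based on the documented return type):
```python
import torch

A = torch.randn(3, 3).triu()            # upper-triangular coefficients (the default)
b = torch.randn(3, 2)
res = torch.triangular_solve(b, A)
print(res.solution)                     # fields accessible by name instead of index
print(res.cloned_coefficient)
```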
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17195
Differential Revision: D14851875
Pulled By: ezyang
fbshipit-source-id: 9b2cba95564269d2c3a15324ba48751d68ed623c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18832
ghimport-source-id: fde4ad90541ba52dfa02bdd83466f17e6541e535
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18833 [STACK] Cache device on TensorImpl; clean up TensorImpl constructors.
* **#18832 [STACK] Disallow changing the device of a tensor via set_.**
* #18831 [STACK] Stop swapping in Storages of the wrong device for Tensors.
This is necessary to cache the device on a TensorImpl.
Differential Revision: D14766231
fbshipit-source-id: bba61634b2d6252ac0697b96033c9eea680956e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18831
ghimport-source-id: 2741e0d70ebe2c2217572c3af54ddd9d2047e342
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18833 [STACK] Cache device on TensorImpl; clean up TensorImpl constructors.
* #18832 [STACK] Disallow changing the device of a tensor via set_.
* **#18831 [STACK] Stop swapping in Storages of the wrong device for Tensors.**
This is necessary to support device caching, see https://github.com/pytorch/pytorch/pull/18751 and https://github.com/pytorch/pytorch/pull/18578.
In library code, we potentially swap in Storages with the wrong device when device_guard is False. This happens as follows with "view-like" operations.
1) We allocate a tensor on the 'wrong' device (because device_guard is false).
2) We swap out the 'wrong' storage with the 'right' storage using e.g. THCTensor_setStorage.
Instead, we can just construct the Tensor with the correct Storage from the beginning. This is what we do with 'view'.
Note there are two other "view-like" cases where this happens:
1) unfold
2) set_()
Because these aren't performance critical, I just added the device_guard instead of applying the above correction.
For completeness, this also includes a test that all `device_guard: false` functions behave properly under these conditions.
Reviewed By: dzhulgakov
Differential Revision: D14766232
fbshipit-source-id: 0865c3ddae3f415df5da7a9869b1ea9f210e81bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18648
ghimport-source-id: 1cf4a8fe91492621e02217f38cae5d7e0699fb05
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18661 Step 7: remove _unique
* #18655 Step 6: Rename _unique2 to unique and add int? dim
* #18654 Step 5: remove _unque_dim in favor of unique_dim
* #18651 Step 4: add support for unique with dim=None
* #18650 Step 3: Add support for return_counts to torch.unique for dim not None
* #18649 Step 2: Rename _unique_dim2_temporary_will_remove_soon to unique_dim
* **#18648 Step 1: Secretly add return_counts to unique, and refactor unique_dim for performance**
`unique` is fragile: previously I tried to change it in #18391 and #17097; they all passed OSS tests but were ultimately reverted due to internal failures. My previous work on refactoring unique, #18459, is based on #18391, and after #18391 got reverted, I could not continue with #18459. To continue working on #18459, #18391, and #17097 without worrying about internal failures, I am suggesting the following steps for improving `unique` and `unique_dim`. soumith, please take this; there is no need to put #18391 back.
The motivation is basically to move forward as much as possible without causing any internal failures. So I will try to divide the work into steps, sorted from low probability of internal failure to high probability. (I don't know what the internal failure is, so I have to guess.) Let's merge this PR stack one by one until we encounter an internal failure.
Step 1: Create two new ATen operators, `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon`, and keep `_unique` and `_unique_dim` unchanged. The backends of these two functions and of `_unique` and `_unique_dim` are all the same; the only difference is that the temporary ones support `return_counts` while `_unique` and `_unique_dim` do not. Step one is mostly #18391 + #18459. The CUDA 8 errors have been fixed. At this point, there is no user-visible API change, so no docs are updated. `torch.unique` does not support `return_counts` yet, and `return_counts` is tested through the newly added temporary operators. This step just adds two new ATen operators, so there shouldn't be any internal failure.
Step 2: Rename `_unique_dim2_temporary_will_remove_soon` to `unique_dim`. This should cause no internal failure either, because there is no change to existing operators. The only thing to worry about is deleting `unique_dim` from the Python side, because we don't want users to use it. At this point, C++ users have `return_counts` support for `unique_dim`.
Step 3: Update the docs of `torch.unique` and use `unique_dim` inside `torch.unique` to support `return_counts`. In the docs, we should say that `torch.unique` with dim=None does not support `return_counts` yet. This might cause internal failure.
Step 4: Rename `_unique2_temporary_will_remove_soon` to `_unique2` and use `_unique2` inside `torch.unique` to support `return_counts`. Update the docs, saying that `torch.unique` with dim=None now supports `return_counts`. This might cause internal failure.
Step 5: Remove `_unique_dim`. This might cause internal failure.
Step 6: Rename `_unique2` to `unique` and add an optional `dim` argument to make it look like the signature of Python's `torch.unique`. Inside `torch.unique`, use `unique` and get rid of `unique_dim`. Unbind `unique_dim` entirely from Python at codegen. This is likely to cause internal failure.
Step 7: Remove `_unique`. This is very likely to cause internal failure.
This PR
======
This PR is for step 1. It creates two new ATen operators, `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon`, implements `return_counts` inside them, and refactors for performance improvements.
Please review, ngimel VitalyFedyunin. They are mostly copied from #18391 and #18459, so the review should be easy.
Below is a benchmark on a tensor of shape `torch.Size([15320, 2])`:
Before
---------
```python
print(torch.__version__)
%timeit a.unique(dim=0, sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(dim=0, sorted=True, return_inverse=True); torch.cuda.synchronize()
```
```
1.0.1
192 µs ± 1.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
548 ms ± 3.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
```python
print(torch.__version__)
%timeit a.unique(sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(sorted=True, return_inverse=True); torch.cuda.synchronize()
```
```
1.0.1
226 µs ± 929 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
302 µs ± 7.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
After
-------
```python
print(torch.__version__)
%timeit a.unique(dim=0, sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(dim=0, sorted=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted=True, return_inverse=False, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```
```
1.1.0a0+83ab8ac
190 µs ± 2.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
237 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
219 µs ± 2.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
263 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
```python
print(torch.__version__)
%timeit a.unique(sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(sorted=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=True, return_inverse=False, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```
```
1.1.0a0+83ab8ac
232 µs ± 2.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
301 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
264 µs ± 7.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
339 µs ± 9.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Differential Revision: D14730905
fbshipit-source-id: 10026b4b98628a8565cc28a13317d29adf1225cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18230
Implementing the minimum QTensor API to unblock other workstreams in quantization.
Changes:
- Added Quantizer which represents different quantization schemes
- Added qint8 as a data type for QTensor
- Added a new ScalarType QInt8
- Added QTensorImpl for QTensor
- Added following user facing APIs
- quantize_linear(scale, zero_point)
- dequantize()
- q_scale()
- q_zero_point()
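A rough sketch of the listed user-facing surface, spelled here as tensor methods; the exact frontend signatures are an assumption and shifted in later PRs in this stack (e.g. a dtype argument was added to `quantize_linear` afterwards):
```python
import torch

t = torch.rand(2, 3)
qt = t.quantize_linear(0.1, 10)            # scale=0.1, zero_point=10, per the list above
print(qt.q_scale(), qt.q_zero_point())     # 0.1, 10
print(qt.dequantize())                     # back to a float tensor
```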
Reviewed By: dzhulgakov
Differential Revision: D14524641
fbshipit-source-id: c1c0ae0978fb500d47cdb23fb15b747773429e6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18749
ghimport-source-id: 9026a037f5e11cdb9ccd386f4b6b5768b9c3259b
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18751 Disallow changing the device of a tensor via set_.
* #18750 Use non-legacy constructors for tensor deserialization.
* **#18749 Add device and dtype to storage.**
The goal here is to fix our serialization, which currently depends on the legacy constructors. Having dtype and device on Storage allows us to use the non-legacy constructors.
This fits somewhat with our goal of removing Storage, by having Storage act like a Tensor.
Differential Revision: D14729516
fbshipit-source-id: bf4a3e8669ad4859931f4a3fa56df605cbc08dcb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18166
ghimport-source-id: a8e2ba2d966e49747a55701c4f6863c5e24d6f14
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18166 Bool Tensor for CUDA**
* #18165 Resolved comments from Bool Tensor for CPU PR
------
This PR enables bool tensor creation and some basic operations for the CUDA backend. This is a part of the Bool Tensor feature implementation work. The whole plan looks like this:
1. Storage Implementation [Done]
2. Tensor Creation.
a) CPU [Done]
b) CUDA [This PR]
3. Tensor Conversions.
4. Tensor Indexing.
5. Tensor Operations.
6. Back compatibility related changes.
Change:
Enable bool tensor in CUDA with the following operations:
torch.zeros
torch.tensor
torch.ones
torch.rand/rand_like/randint/randint_like
torch.full
torch.full_like
torch.empty
torch.empty_like
Tested via unit tests and local scripts.
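An illustrative check of a few of the enabled factory functions on the CUDA backend:
```python
import torch

if torch.cuda.is_available():
    a = torch.zeros(2, 3, dtype=torch.bool, device='cuda')
    b = torch.tensor([True, False, True], device='cuda')
    c = torch.full((2, 2), True, dtype=torch.bool, device='cuda')
    print(a.dtype, b.dtype, c.dtype)    # torch.bool for all three
```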
Differential Revision: D14605104
fbshipit-source-id: b7d7340a7d70edd03a109222d271e68becba762c
Summary:
Argument dim=-1 didn't work for torch.cross. The signature of torch.cross has been changed to take c10::optional<int64_t> dim instead of int64_t. As the documentation says, "If dim is not given, it defaults to the first dimension found with the size 3."; and if dim is specified (even a negative value), the corresponding dim is used.
Fixes #17229
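A small example of the clarified `dim` handling:
```python
import torch

a = torch.randn(4, 3)
b = torch.randn(4, 3)
c1 = torch.cross(a, b, dim=-1)    # negative dim now resolves to dim=1
c2 = torch.cross(a, b)            # dim omitted: first dimension of size 3 is used
print(torch.allclose(c1, c2))     # True
```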
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17582
Differential Revision: D14483063
Pulled By: ifedan
fbshipit-source-id: f9699093ec401cb185fd33ca4563c8a46cdcd746
Summary:
Make it possible to construct a pinned-memory tensor without creating a storage first and without calling the pin_memory() function. It is also faster, as the copy operation is unnecessary.
Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```
Part of the bigger: `Remove Storage` plan.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18455
Reviewed By: ezyang
Differential Revision: D14672084
Pulled By: VitalyFedyunin
fbshipit-source-id: 9d0997ec00f59500ee018f8b851934d334012124
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to noqa a number of imports in __init__. Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it. Left for future work.
Be careful! flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments. flake8-3 will
report an import unused; flake8-2 will not. For now, I just
noqa'd all these sites.
All the changes were done by hand.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
Summary:
Changelog:
- Renames `btriunpack` to `lu_unpack` to remain consistent with the `lu` function interface.
- Rename all relevant tests, fix callsites
- Create a tentative alias for `lu_unpack` under the name `btriunpack` and add a deprecation warning to not promote usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18529
Differential Revision: D14683161
Pulled By: soumith
fbshipit-source-id: 994287eaa15c50fd74c2f1c7646edfc61e8099b1
Summary:
Changelog:
- Renames `btrifact` and `btrifact_with_info` to `lu` to remain consistent with other factorization methods (`qr` and `svd`).
- Now we will only have one function and method named `lu`, which performs the LU decomposition. This function takes a `get_infos` kwarg which, when set to True, includes an infos tensor in the returned tuple.
- Rename all tests, fix callsites
- Create a tentative alias for `lu` under the name `btrifact` and `btrifact_with_info`, and add a deprecation warning to not promote usage.
- Add the single batch version for `lu` so that users don't have to unsqueeze and squeeze for a single square matrix (see changes in determinant computation in `LinearAlgebra.cpp`)
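A brief sketch of the renamed interface and the `get_infos` kwarg:
```python
import torch

A = torch.randn(3, 3)
LU, pivots = torch.lu(A)                         # single matrix, no unsqueeze needed
LU, pivots, infos = torch.lu(A, get_infos=True)  # infos reports factorization status
print(infos)                                     # 0 indicates a successful factorization
```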
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18435
Differential Revision: D14680352
Pulled By: soumith
fbshipit-source-id: af58dfc11fa53d9e8e0318c720beaf5502978cd8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18507
ghimport-source-id: 1c3642befad2da78a7e5f39d6d58732b85c76267
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18507 Upgrade flake8-bugbear to master, fix the new lints.**
It turns out Facebook is internally using the unreleased master
flake8-bugbear, so upgrading it grabs a few more lints that Phabricator
was complaining about but we didn't get in open source.
A few of the getattr sites that I fixed look very suspicious (they're
written as if Python were a lazy language), but I didn't look more
closely into the matter.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14633682
fbshipit-source-id: fc3f97c87dca40bbda943a1d1061953490dbacf8
Summary:
This depends on https://github.com/pytorch/pytorch/pull/16039
This prevents people (reviewers, PR authors) from forgetting to add things to `tensors.rst`.
When something new is added to `_tensor_doc.py` or `tensor.py` but is intentionally not in `tensors.rst`, people should manually whitelist it in `test_docs_coverage.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16057
Differential Revision: D14619550
Pulled By: ezyang
fbshipit-source-id: e1c6dd6761142e2e48ec499e118df399e3949fcc
Summary:
More ops for https://github.com/pytorch/pytorch/issues/394. ~~Also need to rebase after landing #16186, because we need to update the whitelist of the new unit test added in #16186.~~
cc: ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17093
Differential Revision: D14620068
Pulled By: ezyang
fbshipit-source-id: deec5ffc9bf7624e0350c85392ee59789bad4237
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18165
ghimport-source-id: 55cb3fb63a25c2faab1725b4ec14c688bf45bd38
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18166 Bool Tensor for CUDA
* **#18165 Resolved comments from Bool Tensor for CPU PR**
-------
------------
This is a follow-up PR that resolves some additional feedback on one of the previous Bool Tensor PRs.
gchanan, here is a list of almost all the comments from the original PR with respective fixes and replies:
**[utils/python_scalars.h]** why is this converting from uint8_t and not bool? (comment?)
When I was adding this, I was testing by creating a tensor and then calling its .tolist(). It worked equally well for bool and uint8_t, so I left uint8_t, as I thought it made more sense since we are calling PyBool_FromLong. Changing it to bool.
**[ATen/Dispatch.h]** better name?
Fixed.
**[test/test_torch.py]** what about other factories, such as full? (and more).
There is a test that goes through the factory methods, test_tensor_factories_empty. I added some bool cases above it and added a comment that once CUDA is done, I will unite them so it iterates not just between CUDA and CPU but also over all types. Adding all bool cases now. Will unite in the CUDA PR.
**[generic/THTensorMath.h]** any changes in this file actually needed?
Bad merge. Fixed.
**[TH/THTensor.h]** this generates code for random, clampedRandom, and cappedRandom -- do we have tests for all of these with bool?
Added
**[c10/core/ScalarType.h]** I'm not very confident about the lack of Bool here -- can you look at the call sites and see what makes sense to do here?
Added bool to the macro and created a similar one without bool for a single case, which otherwise fails the build with errors:
_./torch/csrc/jit/symbolic_variable.h:79:20: error: ambiguous overload for ‘operator*’ (operand types are ‘const torch::jit::SymbolicVariable’ and ‘torch::jit::Value*’)
return (*this) * insertConstant(rhs);_
Differential Revision: D14605105
fbshipit-source-id: abf82d50e8f8c50b386545ac068268651b28496d