Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60804
The lowerings are stored as a map c10::Symbol -> std::function, and the
signature of those functions matches the signature of
`computeOperandValue`. Custom lowerings take priority over the standard
ones, i.e. we can redefine how a given op is lowered.
In general this feature is aimed at unblocking users whose models
contain ops that are not yet supported by NNC - it lets them quickly add
a custom lowering for a given op.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D29409580
Pulled By: ZolotukhinM
fbshipit-source-id: e8e8dc9d3cb9155cfbf5c08a4216ba1b5b791a60
Summary:
Previously, in PR https://github.com/pytorch/pytorch/issues/58968, we added RAdam to Optimizers. Here in this PR we are proposing a multi-tensor version of RAdam for PyTorch.
RAdam was proposed in the paper https://arxiv.org/pdf/1908.03265.pdf by Liyuan Liu et al.
It has been one of the most widely used algorithms in the deep learning community.
Differing from the paper, we selected the variance tractability cut-off as 5 instead of 4, as is common practice.
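For intuition, here is a rough sketch of the rectification term and the cut-off of 5 mentioned above (illustrative only; this is not the actual torch.optim implementation):
```python
import math

# Rough sketch of RAdam's rectification logic with the cut-off of 5.
def radam_step_size(step, beta2, lr):
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t > 5.0:  # variance is tractable: apply the rectification term
        rect = math.sqrt(
            (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
            / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
        )
        return lr * rect  # adaptive (Adam-like) step, scaled by the rectifier
    return lr            # otherwise fall back to an un-adapted step
```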
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59161
Reviewed By: vincentqb
Differential Revision: D29360576
Pulled By: iramazanli
fbshipit-source-id: 7ccdbf12b1ee7f12e66f7d7992123a70cc818b6b
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60669
Test Plan: Added unit test to check for nested outputs.
Reviewed By: ajyu
Differential Revision: D29322025
fbshipit-source-id: a3c8d3c5f0bb7cf7fda4bc5f579adb8fa7bc3724
Summary:
This argument only matters for speed and memory usage, so it is ok to ignore it during the backward pass.
As discussed, we might want to change this to speed up the backward pass in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60673
Reviewed By: soulitzer
Differential Revision: D29370125
Pulled By: albanD
fbshipit-source-id: ad50b3ea530aeb194f5a51845523b517a50f2c71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60656
This PR uses `torch.testing.get_all_dtypes()` for dtype parametrisation
of tests in `test_sparse_csr.py`. It adds the bool, half, bfloat16, and
complex dtypes that were previously excluded from the tests.
`torch.complex32` is omitted due to lack of coverage and lack of a
specialized `AT_DISPATCH...`.
The process of adding more dtypes to the tests revealed that `.to_dense()`
doesn't work for all dtypes.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D29408058
Pulled By: cpuhrsch
fbshipit-source-id: 319b6f51b9786d6957d508f51657657a6d00267a
Summary: it seems to be accidentally missing
Test Plan: run CI
Reviewed By: suo
Differential Revision: D29335990
fbshipit-source-id: 2790bc10d141f9484a0807ff7800024a02fd9cfa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58059
Add a CUDA.used vital sign which is true only if CUDA was "used", which technically means the CUDA context was created.
Also adds the following features:
- Force vitals to be written even if vitals are disabled, to enable testing when the env variable is not set at the start of execution
- Add a read_vitals call for Python to read existing vital signs.
Test Plan: buck test mode/dbg caffe2/test:torch -- --regex basic_vitals
Reviewed By: xuzhao9
Differential Revision: D28357615
fbshipit-source-id: 681bf9ef63cb1458df9f1c241d301a3ddf1e5252
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58065
This PR replaces the existing code-generated CPU fallback kernels that XLA uses with a single boxed CPU fallback.
Current state: there are a couple of different design ideas that I want to point out, but the logic for the actual kernel is mostly done and passing tests.
### Design
To preface, I'm not 100% tied to the current design; I'm putting the PR up now for opinions and am totally open to alternatives, some of which I listed below. Actually, after writing this description, I'm leaning toward the following changes:
* Confirm whether or not we can remove all C++ logging info directly in the yaml.
**Current Design**
All of the CPU fallback codegen is deleted. In its place, XLA (and other external backends, later) can choose to opt into a CPU fallback by adding the following code in a C++ file. I have a corresponding [xla-side PR with the xla changes](https://github.com/pytorch/xla/pull/2945/files#diff-1a005c10039f0cb11130a3b740f5de716d2f10acaea121017016025861886798R1).
There's no actual requirement to split up the code into a .h and .cpp file, but that's necessary in the XLA case because they sometimes need to call the fallback directly from their handcrafted kernels.
```
// xla_cpu_fallback.h
#include <ATen/native/CPUFallback.h>
...
void xla_cpu_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack);
...
```
```
// xla_cpu_fallback.cpp
#include "torch_xla/csrc/aten_cpu_fallback.h"
...
void xla_cpu_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) {
// Do custom logging here
...
// Call the actual boxed CPU fallback.
at::native::cpu_fallback(op, stack);
}
TORCH_LIBRARY_IMPL(_, XLA, m) {
m.fallback(torch::CppFunction::makeFromBoxedFunction<&xla_cpu_fallback>());
}
```
Now that the fallback is exposed in the backend, they can call it directly. Doing so requires converting from an unboxed to a boxed context, for which we provide a utility function. E.g.:
```
#include <ATen/native/CPUFallback.h>
at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) {
....
if (...call_fallback...) {
return at::native::call_fallback_fn<&xla_cpu_fallback, decltype(at::addmm)>::call("aten::addmm", self, mat1, mat2, beta, alpha);
}
...
}
```
That `decltype(at::addmm)` logic isn't actually used everywhere in the xla-side PR yet, since you hit issues with overloads. I could use it everywhere once #58092 lands.
**Alternatives: The API for calling the CPU fallback directly is ugly, can we make it nicer?**
We could change the api to use `at::redispatch`, which would make it look something like this:
```
at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) {
....
if (...call_fallback...) {
return at::redispatch::addmm(c10::DispatchKeySet(c10::DispatchKey::CPUFallback), self, mat1, mat2, beta, alpha);
}
...
}
```
Which definitely feels cleaner, but also requires adding a new DispatchKey just for this use case. Conditionally calling the CPU fallback doesn't sound like a hugely important use case, so I don't know if giving up one of our 64 dispatch key slots is worth the API improvement. Totally open to other opinions though!
Another more mild improvement that would avoid having to pass operator string names (including overloads) around would be to codegen (yet another) namespaced API. Something like this:
```
at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) {
....
if (...call_fallback...) {
return at::fallback::addmm<&xla_cpu_fallback>(self, mat1, mat2, beta, alpha);
}
...
}
```
Writing that out, I actually like it more (I think it'll let us get rid of `decltype(...)`). Maybe that is nice enough to warrant a new codegen API - I haven't tried adding that yet, but if people like it I'm happy to try it out.
**More alternatives**
The current design also involves the backend manually writing and registering the boxed fallback themselves, but an alternative would be for us to do it in codegen too: they would just need to pass in all of the C++ logging that they want done in the fallback, directly through the yaml. The main downsides:
* Backend code that wants to call the fallback needs to abide by whatever convention our codegen uses to name the generated boxed fallback.
* Passing custom C++ logging through yaml is just more fragile: right now xla uses an `iostream` to log each tensor arg in the operator, so we'd have to either force other backends into the same convention or figure something else out later.
To be fair, we actually already do that: XLA has custom per-tensor-arg logging for all of the generated `out` wrappers in the codegen, which we do by passing their C++ logging info through the yaml. This seems unnecessary though, since `out` wrappers just call into a functional kernel, which is hand written with its own custom logging. So my take is: try to remove custom C++ logging from the yaml, and if it turns out to be really necessary, then we may as well take advantage of that to codegen the fallback.
### Performance impact
While ops that fall back to CPU aren't exactly hot path, we probably don't want to use a boxed fallback if it turns out to be an absolute perf killer.
I ran my benchmarks using callgrind, benchmarking both `at::add` and `at::add_out` run on XLA. My callgrind benchmark for `at::add` can be found here (the add_out benchmark looks basically the same): https://www.internalfb.com/phabricator/paste/view/P415418587. I created the benchmark by hacking the existing xla C++ test build scripts and throwing in a reference to callgrind.
I also attached the full callgrind output for each benchmark; the full output is actually pretty noisy and hard to parse, but I focused on everything underneath the `at::add()` call in the output, which was much more stable. My guess is that the noise is due to some heavyweight async startup processing that xla does.
`at::add`:
before: 88,505,130 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415421001
after: 102,185,654 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415421273
delta: ~15.5% increase
`at::add_out`:
before: 63,897,395 instructions. Full output: https://www.internalfb.com/intern/everpaste/?handle=GBrrKwtAPlix9wUEAOZtrFXpdO5UbsIXAAAz
after: 73,170,346 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415423227
delta: ~14.5% increase
High level takeaway: A framework overhead increase of 10-20% doesn't seem too horrible for the CPU fallback use case.
For structured, functional ops that require a CPU fallback, we're actually in an unfortunate situation: we're doing even more work than necessary. Our codegen automatically creates a `CompositeExplicitAutograd` kernel which calls into the `out` operator. So the extra work that we end up doing is:
* An extra dispatcher hop: (at::add -> CompositeExplicitAutograd -> CPUFallback -> at::native::add) instead of (at::add -> CPUFallback -> at::native::add)
* An unnecessary tensor allocation (the CompositeExplicitAutograd kernel uses at::empty() to create an output tensor, which is immediately overwritten by the CPU fallback)
* An unnecessary meta() call (the CompositeExplicitAutograd kernel calls it to create the output tensor, but we call it again in the CPU kernel).
* unboxing->boxing->unboxing logic (this is the only strictly required piece)
There are definitely ways to avoid the unnecessary work explained above: one would be to give the boxed fallback higher priority than composite keys (there's [an issue for it here](https://github.com/pytorch/pytorch/issues/55104)), and codegen fallthroughs for all composite ops. It'll require more infra to set up, so I see it as more of a perf knob that we can apply if we need it later.
Unfortunately I couldn't dig much deeper into the differences aside from the aggregate change in instructions, since it looks like callgrind fudged some of the instruction attribution (`at::to_cpu` takes up a ton of instructions, but I don't see any attribution for the `at::native::add` kernel anywhere).
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D28833085
Pulled By: bdhirsh
fbshipit-source-id: 537ebd5d7fb5858f1158764ff47132d503c3b92b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60517
This fixes module support in LazyModuleMixin, reported in bug issue #60132.
See the link: https://github.com/pytorch/pytorch/issues/60132
We will have to update lazy_extension, given its dependency on module.py, and update the unit test as well.
Test Plan:
Unit test passes
torchrec test passes
Reviewed By: albanD
Differential Revision: D29274068
fbshipit-source-id: 1c20f7f0556e08dc1941457ed20c290868346980
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60631
Per #48360, speed up `Transformer.generate_square_subsequent_mask`. The new impl is informally ~5x faster, though the absolute difference is probably small.
The PR includes Python and C++ versions, as well as updates to a couple of places where the previous impl had been copied around.
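For reference, a minimal sketch of the kind of mask this function produces (0.0 where attention is allowed, -inf for future positions); this is illustrative and not necessarily the exact new implementation:
```python
import torch

def square_subsequent_mask(sz: int) -> torch.Tensor:
    # -inf above the diagonal (future positions), 0.0 elsewhere.
    return torch.triu(torch.full((sz, sz), float('-inf')), diagonal=1)

print(square_subsequent_mask(4))
# tensor([[0., -inf, -inf, -inf],
#         [0., 0., -inf, -inf],
#         [0., 0., 0., -inf],
#         [0., 0., 0., 0.]])
```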
Test Plan: Imported from OSS
Reviewed By: jbschlosser, albanD
Differential Revision: D29356673
Pulled By: bhosmer
fbshipit-source-id: 4c062ba0ead61a445aeef451c78777bf0b3a631e
Summary:
`merge` is the directory with the actual changes, not `master`. Verified by downloading artifacts from https://github.com/pytorch/pytorch/pull/60777/checks and searching through the result.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60792
Reviewed By: walterddr
Differential Revision: D29405288
Pulled By: driazati
fbshipit-source-id: 419c943727c00429945c1f116645bfa22fb12456
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60593
Per #55270, this PR makes it configurable whether to run LayerNorm before or after other operations in Transformer layers.
However, it leaves for a separate PR the removal of the LayerNorm performed after the final encoder/decoder layer has run, which is redundant when LayerNorm has been run after the other in-layer operations (problem described in #24930, #50086, #51447).
Note: this means that transformers built with `nn.Transformer()` are now configurable, but will still contain a redundant LayerNorm when configured as before. However, callers of the `TransformerEncoder` and `TransformerDecoder` classes have always been able to avoid this redundancy.
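For illustration, a usage sketch assuming the new option is exposed as a `norm_first` constructor flag (check the diff for the exact name and default):
```python
import torch.nn as nn

# Pre-LN layer: LayerNorm runs before attention / feed-forward.
pre_ln_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, norm_first=True)

# Default behaviour (post-LN) is unchanged.
post_ln_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
```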
Reviewer notes:
1. Ran across this during other work, don't know if anybody's working on it already (most recent conversation in issues seems to be from early April). Happy to abandon if so.
2. Was looking for a quick way to add tests but it looks like the existing ones in test_nn just compare against snapshots. I could add something similar, but I'm curious if there's any prepackaged way to add a test that LayerNorm-first (the new option) yields a model that trains properly, etc.
3. New code in the `forward`s was written to minimize diff churn rather than maximize beauty :P happy to pretty it up if desired.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D29356590
Pulled By: bhosmer
fbshipit-source-id: 308669326990b8923aab5fcd96e03b582fb21f24
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60782
PR #60473 introduced a new folder nesting level; this change updates
clang_format_utils.py to adjust accordingly the way it sets up the root
path.
Test Plan: Imported from OSS
Reviewed By: zhxchen17
Differential Revision: D29403622
Pulled By: ZolotukhinM
fbshipit-source-id: 6404271615c2d263834cf538ab0153c4d41cc5c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60692
Update make_cifar_db.cc to work with the DB API changes in D29204425 (00896cb9ed).
Test Plan: buck build caffe2/binaries:make_cifar_db
Differential Revision: D29374754
fbshipit-source-id: 23d2acd24031d11071791e398433b537215ffd38
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60711
We already build the docs on each PR; this adds a step to push the relevant folder of the docs. (We build the entire website for pytorch.github.io, which clocks in at around 500 MB, but we really only need the "master" docs, not every version. The master docs by themselves are around 50 MB, which is more reasonable.) It uses the same S3 bucket as the artifacts but places the items at the `pytorch/pytorch/pr-previews/<pr number>` prefix. The bucket has a rule to expire resources in that prefix after 1 month.
On the AWS side the bucket has static hosting enabled with CloudFront directing to the docs preview prefix, so you can see the output at `https://d28slxzaq48q8t.cloudfront.net/<pr number>/`, e.g. https://d28slxzaq48q8t.cloudfront.net/60711/. For advertising we could link this on the HUD PR page as well as in the Dr. CI comment. We could add a CNAME on CloudFront to make this be `pr-preview.pytorch.org/<pr number>` or something but having random PRs be able to host content on the pytorch.org domain seems sketchy.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D29398818
Pulled By: driazati
fbshipit-source-id: 24032854d83815853b3650d8e54f60b684707f76
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59760
See https://github.com/pytorch/pytorch/issues/59049
There are some moving parts to this PR, I'll structure this explanation so the straightforward parts go first, and then the less straightforward parts.
**The actual dispatch to Python.** The core logic of dispatch to Python lives in `concrete_dispatch_fn` in `torch/csrc/autograd/python_variable.cpp`. It takes the input IValue stack, scans all the arguments for Tensor arguments, and defers most of the heavy lifting to `handle_torch_function_no_python_arg_parser` which actually does all of the logic for calling out to torch dispatch (in particular, this function handles multiple dispatch situations for you). Because we have a different function name than regular `__torch_function__` handling, `handle_torch_function_no_python_arg_parser` is generalized to accept a magic method name to look for when testing if Tensors have custom handling or not. Unlike `__torch_function__`, by default there is no `__torch_dispatch__` on Tensor classes.
**Maintaining the Python dispatch key.** In order to get to the dispatch-to-Python logic, we must tag Tensors that have the `__torch_dispatch__` magic method with the newly added Python dispatch key (separated from PythonFuncTorch to allow for a transitional period while they migrate to this mechanism). We expose a new private property `_is_python_dispatch` that assists in debugging whether a Tensor is participating in Python dispatch or not. We apply the Python dispatch key the first time a PyObject for a Tensor is constructed (THPVariable_NewWithVar), testing if `__torch_dispatch__` exists with the newly added `check_has_torch_dispatch`.
**Shallow copy and detach.** For the simple examples tested in this PR, most creations of Tensor route through the dispatcher. The exception to this is `shallow_copy_and_detach`, which bypasses the dispatcher and is used when saving tensors for backwards. When a Tensor is Python dispatch, we override the behavior of `shallow_copy_and_detach` to instead directly call into `__torch_dispatch__` to perform a `detach` operation (in the same way it would be invoked if you called `detach` directly). Because this Python call is triggered directly from c10::TensorImpl, it must be indirected through `PyInterpreter::detach`, which is the general mechanism for dynamic dispatching to the Python interpreter associated with a TensorImpl.
**torchdeploy compatibility.** The dispatch to Python logic cannot be directly registered to the dispatcher as it is compiled in the Python library, which will get loaded multiple times per torchdeploy interpreter. Thus, we must employ a two phase process. First, we register a fallback inside a non-Python library (aten/src/ATen/core/PythonFallbackKernel.cpp). Its job is to determine the appropriate PyInterpreter to handle the Python dispatch by going through all of the arguments and finding the first argument that has a PyObject/PyInterpreter. With this PyInterpreter, it makes another dynamic dispatch via "dispatch" which will go to the correct torchdeploy interpreter to handle dispatching to actual Python.
**Testing.** We provide a simple example of a LoggingTensor for testing, which can be used to generate TorchScript-like traces to observe what operations are being called when a Tensor is invoked. Although a LoggingTensor would be better implemented via an is-a relationship rather than a has-a relationship (as is done in the test), we've done it this way to show that arbitrarily complex compositions of tensors inside a tensor work properly.
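For illustration, a rough sketch of what a `__torch_dispatch__` subclass along these lines can look like (names and wrapping details are simplified assumptions, not the actual test code):
```python
import torch

class LoggingTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, elem):
        # Wrap an existing tensor (a has-a relationship, as in the test).
        r = torch.Tensor._make_subclass(cls, elem, elem.requires_grad)
        r.elem = elem
        return r

    # Per the limitations below, default __torch_function__ handling is disabled.
    __torch_function__ = torch._C._disabled_torch_function_impl

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        print(f"dispatch: {func}")  # log the op, TorchScript-trace style
        unwrap = lambda t: t.elem if isinstance(t, LoggingTensor) else t
        out = func(*map(unwrap, args), **{k: unwrap(v) for k, v in kwargs.items()})
        return LoggingTensor(out) if isinstance(out, torch.Tensor) else out

x = LoggingTensor(torch.ones(2))
y = x + x  # prints the dispatched add op before returning a LoggingTensor
```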
**Known limitations.**
* We haven't adjusted any operator code, so some patterns may not work (as they lose the Python subclass in an unrecoverable way)
* `__torch_function__` must be explicitly disabled with `_disabled_torch_function_impl` otherwise things don't work quite correctly (in particular, what is being disabled is default subclass preservation behavior.)
* We don't ever populate kwargs, even when an argument is kwarg-only
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D29017912
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Pulled By: ezyang
fbshipit-source-id: a67714d9e541d09203a8cfc85345b8967db86238
Summary:
This PR makes `tools/clang_tidy.py` use Python 3.6 APIs for `asyncio` and `shlex`.
I ran into some issues when running this script with the `-j` flag inside of the clang-tidy docker image (which uses Python 3.6). Specifically, the functions `asyncio.run` and `shlex.join` are only available in Python >= 3.8.
This change does not affect CI because we do not run the clang-tidy job in parallel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60659
Reviewed By: albanD
Differential Revision: D29377851
Pulled By: 1ntEgr8
fbshipit-source-id: 92ab7ee6782b78d40ffccd03f1718ede4204d948
Summary:
Currently foreach `addcmul` and `addcdiv` cast the scalar to float so that the actual math is done in FP32 when the tensor dtype is Float16/BFloat16, while the regular `addcmul` and `addcdiv` do not.
### Reproducible steps to see the behavioral difference
```ipython
In [1]: import torch; torch.__version__
Out[1]: '1.9.0'
In [2]: a, b, c = torch.tensor([60000.0], device='cuda', dtype=torch.half), torch.tensor([60000.0], device='cuda', dtype=torch.half), torch.tensor([-1.0], device='cuda', dtype=torch.half)
In [4]: torch.addcmul(a, b, c, value=2)
Out[4]: tensor([-inf], device='cuda:0', dtype=torch.float16)
In [5]: torch._foreach_addcmul([a], [b], [c], value=2)[0]
Out[5]: tensor([-60000.], device='cuda:0', dtype=torch.float16)
```
### How foreach casts?
Foreach addcmul and addcdiv cast scalar to `opmath_t` (almost equivalent to acc_type) here: 42c8439b6e/aten/src/ATen/native/cuda/ForeachPointwiseOp.cu (L30) and cast inputs and results here:
42c8439b6e/aten/src/ATen/native/cuda/ForeachFunctors.cuh (L133-L135)
Related to https://github.com/pytorch/pytorch/issues/58833, #60227, and https://github.com/pytorch/pytorch/issues/60454
cc ptrblck mcarilli ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60715
Reviewed By: albanD
Differential Revision: D29385715
Pulled By: ngimel
fbshipit-source-id: 8bb2db19ab66fc99d686de056a6ee60f9f71d603
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60662
Fixes this flaky test. Basically, sometimes a rank can exit the test
early before rank 0 calls into allreduce. In this case Gloo will throw
a connection reset error on all other ranks.
ghstack-source-id: 132363151
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D29364806
fbshipit-source-id: ce0c292a2166edad57ea0dbb76df12cfd560a10d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60507
Fix incorrect documentation about the dtype for `torch.randint` described in issue #56347
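For reference, a quick check of the actual default behaviour being documented (assuming the issue concerns the default dtype):
```python
import torch

# torch.randint returns integer tensors (int64 by default), independent of
# the global default floating-point dtype.
t = torch.randint(0, 10, (3,))
print(t.dtype)                    # torch.int64
print(torch.get_default_dtype())  # torch.float32
```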
Test Plan: Review documentation to make sure formatting is right
Reviewed By: bdhirsh
Differential Revision: D29321181
fbshipit-source-id: caae69a9bbb30052da518a3f5d22a7ed3504cdd2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59987
Similar to GroupNorm, improve the numerical stability of LayerNorm by using the Welford algorithm and pairwise summation.
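For background, a rough sketch of Welford's online mean/variance update (illustrative only; the actual change is in the C++/CUDA kernels):
```python
def welford_mean_var(xs):
    # Single-pass, numerically stable mean/variance (Welford's algorithm).
    mean, m2, count = 0.0, 0.0, 0
    for x in xs:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the updated mean
    return mean, m2 / count       # biased variance, as LayerNorm uses

print(welford_mean_var([1.0, 2.0, 3.0, 4.0]))  # (2.5, 1.25)
```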
Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"
Reviewed By: ngimel
Differential Revision: D29115235
fbshipit-source-id: 5183346c3c535f809ec7d98b8bdf6d8914bfe790
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60370
When creating a single partition, skip the output nodes but process possible nodes after them.
Test Plan: Run all CI tests.
Reviewed By: jfix71
Differential Revision: D29265278
fbshipit-source-id: 2242009973a54498d8027cce5a294558a1206fdf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60147
Remove aten::to from allow_list now that the aten::to schema change has landed (D29121620 (eda2ddb5b0)).
Test Plan: CI
Reviewed By: iseeyuan
Differential Revision: D29187314
fbshipit-source-id: abdb5a560287a861f3858732f7b3da342ee4aa55
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60241
We're going to make a forward-incompatible change to this serialization
format soon, so I'm taking the opportunity to do a little cleanup.
- Use int for version. This was apparently not possible when V2
was introduced, but it works fine now as long as we use int64_t.
(Note that the 64 bits are only used in memory. The serializer will
use 1 byte for small non-negative ints.)
- Remove the "packed params" tensor and replace it with a list of ints.
- Replace the "transpose" field with "flags" to allow more binary flags
to be packed in.
- Unify required and optional tensors. I just made them all optional
and added an explicit assertion for the one we require.
A bit of a hack: I added an always-absent tensor to the front of the
tensor list. Without this, when passing unpacked params from Python to
the ONNX JIT pass, the type would be inferred as `List[Tensor]` if all
tensors were present, making it impossible to cast to
`std::vector<c10::optional<at::Tensor>>` without jumping through hoops.
The plan is to ship this, along with another diff that adds a flag to
indicate numerical requirements, wait a few weeks for an FC grace
period, then flip the serialization version.
Test Plan: CI. BC tests.
Reviewed By: vkuzo, dhruvbird
Differential Revision: D29349782
Pulled By: dreiss
fbshipit-source-id: cfef5d006e940ac1b8e09dc5b4c5ecf906de8716