Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66167
Sometimes, due to a desync, the PG wrapper's monitored barrier fails. In
this case it is useful to print info about the collective that was attempting
to run along with the actual error.
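As a rough illustration (not from this PR), a minimal sketch of the kind of desync that hits this path, assuming the wrapper is enabled via TORCH_DISTRIBUTED_DEBUG=DETAIL and the script is launched with torchrun so RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT are set:
```
import os

import torch
import torch.distributed as dist

# Assumption: setting the debug level here (before init) enables the PG wrapper;
# exporting TORCH_DISTRIBUTED_DEBUG=DETAIL before launch works as well.
os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"
dist.init_process_group("gloo")

t = torch.ones(1)
if dist.get_rank() == 0:
    # Only rank 0 issues a collective; the other ranks never join, so the
    # wrapper's monitored barrier times out. With this change, the error
    # should also describe the all_reduce that was about to run.
    dist.all_reduce(t)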
ghstack-source-id: 140037653
Test Plan: CI
Reviewed By: cbalioglu
Differential Revision: D31353021
fbshipit-source-id: e2a515326c9314c98119978d5566eb5431cca96c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66166
These methods should be private.
ghstack-source-id: 139782587
Test Plan: CI
Reviewed By: cbalioglu
Differential Revision: D31353020
fbshipit-source-id: 583fb315cc2cacc37df3d29cd5793b42558930b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65345
FooType::get() can return a const reference. Inconveniently, converting shared_ptr<FooType> to shared_ptr<Type> requires a copy & refcount bump, so to properly take advantage of this in unshapedType() we need to take a const Type& in isSubtypeOf(), which is good practice anyway -- don't require a shared_ptr if you don't need to take ownership.
ghstack-source-id: 140044165
Test Plan:
CI
perf says c10::unshapedType time decreased from 2.8% to 2.2% during static runtime startup, though I expect this to be generally beneficial.
Reviewed By: hlu1
Differential Revision: D31027361
fbshipit-source-id: 676feb81db9f74ad7b8651d8774f4ecb4cfa6ab8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65346
Tidying up the top sources of reference count decrements seen during static runtime startup.
ghstack-source-id: 140027349
Test Plan:
CI
perf now shows under 2% of time spent in ~__shared_count instead of about 5%.
Reviewed By: suo
Differential Revision: D31057277
fbshipit-source-id: 9a16daf2e655fda80d4ec21290b30f02ba63d8da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66277
Previously, it was grouped together with tests related to `MapDataPipe`, but it should be with `IterDataPipe`.
cc VitalyFedyunin ejguan NivekT
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D31485823
Pulled By: NivekT
fbshipit-source-id: d13d8c28cbfc305da0e3033d4109a0f971281a02
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66275
Once this is added to Core, TorchData's PR will not need a custom class and can use this wrapper instead.
cc VitalyFedyunin ejguan NivekT
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D31485822
Pulled By: NivekT
fbshipit-source-id: 790de27629c89c0ca7163a8ee5a09ee8b8233340
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66051
Make the error message clearer when quantized embedding is converted
with an unsupported dtype. This is helpful when debugging quantization
errors on new models.
Test Plan:
```
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(1, 1)

m = M().eval()
m.qconfig = torch.quantization.QConfig(
    activation=torch.quantization.MinMaxObserver.with_args(dtype=torch.qint8),
    weight=torch.quantization.MinMaxObserver.with_args(dtype=torch.qint8))
m.embedding.qconfig = m.qconfig
mp = torch.quantization.prepare(m)
mq = torch.quantization.convert(m)
# error message now includes the incorrect dtype
```
Imported from OSS
Reviewed By: dagitses
Differential Revision: D31472848
fbshipit-source-id: 86f6d90bc0ad611aa9d1bdae24497bc6f3d2acaa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66050
Adds the dtype to an error message when trying to quantize something
other than a float. This is useful for debugging quantization tools on
new models.
Test Plan:
```
import torch

x = torch.randn(1, 1, 1, 1, dtype=torch.double)
xq = torch.quantize_per_tensor(x, 0.01, 0, torch.quint8)
# error message now includes Double
```
Imported from OSS
Reviewed By: dagitses
Differential Revision: D31472849
fbshipit-source-id: 2331ffacefcbc6f8eca79694757d740de74a0f1d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66049
Enables quantized add with broadcasting. As pointed out by jamesr66a,
this was disabled but TensorIterator already supports it. Added a test
case to verify.
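A minimal sketch of the now-supported broadcasting case (not the actual test), assuming the usual `torch.ops.quantized.add(qa, qb, scale, zero_point)` entry point:
```
import torch

a = torch.randn(4, 3)
b = torch.randn(1, 3)  # broadcasts against `a` along dim 0
qa = torch.quantize_per_tensor(a, scale=0.1, zero_point=0, dtype=torch.quint8)
qb = torch.quantize_per_tensor(b, scale=0.1, zero_point=0, dtype=torch.quint8)

# Quantized add now goes through TensorIterator's broadcasting support.
qc = torch.ops.quantized.add(qa, qb, 0.1, 0)
print(qc.shape)  # torch.Size([4, 3])
```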
Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_qadd_broadcast
```
Imported from OSS
Reviewed By: dagitses
Differential Revision: D31472850
fbshipit-source-id: a3b16d9000487918db743525d22db6864330762b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66108
BC-breaking change: intT is now longT (which aligns it more accurately with how
the types are referred to in C++). The benefit of this is that we can idiomatically
express all C++ dtypes (with intT now mapping to int32_t). These types are needed
for ufunc codegen in a later patch.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31385761
Pulled By: ezyang
fbshipit-source-id: ec6f3a0953794313470dbe14911f23ac116be425
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66149
The updated logic can infer the rank of the slice output when only the rank is known for the slice input. This enables cases where `ConstantValueMap::HasRank(input)` is `True` while `ConstantValueMap::HasShape(input)` is `False`.
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31423232
Pulled By: ezyang
fbshipit-source-id: 516e3916aa71afda2b10e44620636e42ed837236
Co-authored-by: BowenBao <bowbao@microsoft.com>
Summary:
Hi, I'm looking forward to contributing to PyTorch, so starting with a minor fix in the documentation for `index_add`.
Currently, in the documentation for `index_add_` (please see https://pytorch.org/docs/master/generated/torch.Tensor.index_add_.html#torch.Tensor.index_add_):
1. The `tensor` attribute was pointing to the `torch.tensor` class, which IMO (though it may not be a big deal) is unintentional.
2. The `dim` attribute is pointing to `torch.Tensor.dim`, which again IMO is unintentional.
This PR suggests a correction for the first point above: rename the `tensor` attribute to `input` so that it doesn't point to the `torch.tensor` class. (I've verified that other ops like `scatter` use `input`, so this should not break consistency in the documentation.) I couldn't find an appropriate fix for the second point, since renaming `dim` to something else would break consistency (almost all other ops in PyTorch use `dim` as the attribute name).
I may be wrong here, so please let me know if there is any feedback or an alternate fix for this.
_Note:_ I plan to fix this behavior for `index_copy_` (https://pytorch.org/docs/master/generated/torch.Tensor.index_copy_.html#torch.Tensor.index_copy_) once and if this PR is approved.
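For context, a small usage sketch of the op whose documentation is being touched (not part of this PR), showing the `dim`, `index`, and source-tensor arguments:
```
import torch

x = torch.zeros(5, 3)
index = torch.tensor([0, 4, 2])
src = torch.ones(3, 3)

# Accumulate rows of `src` into rows 0, 4 and 2 of `x` along dim 0.
x.index_add_(0, index, src)
```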
To the reviewers, please help me tag the correct person who could help review this PR.
cc: krshrimali mruberry zou3519
cc brianjo mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65806
Reviewed By: dagitses, mruberry
Differential Revision: D31431182
Pulled By: zou3519
fbshipit-source-id: 66ced9677ac3bc71d672d13366f9f567ecea0a2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65958
zhxchen17 added a `pickle` pybind for the trt engine, which allows us to save and load an nn.Module with a trt engine in fbcode. This diff, though, explicitly serializes/deserializes the engine in `__getstate__` and `__setstate__` so that in OSS people can also save and load TRTModule directly.
Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fx2trt
Reviewed By: wushirong
Differential Revision: D31309429
fbshipit-source-id: 9068e2ae6375ed0e1bb55b0e9d582b8d9c049dbf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65959
Gives some more control over the output dtype of a trt engine. Previously the output would be fp16 if fp16_mode was turned on. This diff allows the engine to generate fp32 output with fp16_mode=True.
Test Plan: CI
Reviewed By: kflu, wushirong
Differential Revision: D31243929
fbshipit-source-id: 09c752e6f382d6ad169da66878d9a9277c134869
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66131
Turns out that a model with 72k instructions causes about 0.5MiB of additional memory overhead (if there's an 8 byte memory overhead per instruction). This is not necessary if we're building w/o eager symbolication support. This change eliminates the 8 byte `debug_handle` if the build is w/o eager symbolication support.
ghstack-source-id: 140045478
(Note: this ignores all push blocking failures!)
Test Plan:
```
buck build -c "pt.enable_eager_symbolication"=1 //xplat/caffe2/fb/lite_predictor:lite_predictor
buck build //xplat/caffe2/fb/lite_predictor:lite_predictor
```
Reviewed By: kimishpatel
Differential Revision: D31387784
fbshipit-source-id: af56787ad833b990a46b79ab021e512edaa22143
Summary:
Noticed that the `periodic-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-slow-gradcheck` job has a `ciflow/default` label, but does not have a `ciflow/scheduled` label.
Added asserts to enforce that jobs with a non-trivial is_scheduled property do not have the default label and do have the scheduled label.
Rename `periodic-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-slow-gradcheck` to `periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66300
Reviewed By: seemethere
Differential Revision: D31493323
Pulled By: malfet
fbshipit-source-id: 194c1d7a4e659847d94a547b87a0d7d08e66406d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65326
parallel_for and parallel_reduce currently share some common code in
all backends, specifically for detecting if it should run in parallel
or not. This moves all the backend-specific code into a single
`internal::invoke_parallel` function and makes the `parallel_`
functions common to all backends.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D31124495
fbshipit-source-id: 65c3d2af42a8860cc4d6349566085c9fa8d8c6f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66258
Installing libgnutls30 has shown to be good when confronted with the
CERT issue related to deb.nodesource.com
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: dagitses
Differential Revision: D31477789
Pulled By: seemethere
fbshipit-source-id: f87ae4c098771acc505db14e3982d8858cf7326f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66015
Fixes https://github.com/pytorch/pytorch/issues/61982 by cloning
tensors in DDPSink. This only applies once for static_graph and generally for unused
params, which already have overhead, so the perf hit should not be an issue. Will
verify with a benchmark.
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D31346633
fbshipit-source-id: 5b9245ade628565cffe01731f6a0dcbb6126029b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65517
This change retrofits `GetAlwaysAliveValues` into `ValueGroup` to group the values used by a graph into three groups as follows:
- input_aliases: values that are either inputs or contain aliases of inputs or constants.
- output_aliases: values that are either outputs or contain aliases of outputs and are not in input_aliases.
- Values that don't show up in input_aliases or output_aliases are internally created and consumed within the graph.
`output_aliases` is the only new group introduced by this change, and a follow-up diff will use it to preallocate output Tensors to improve Static Runtime's performance.
Test Plan: Added `ValueGroup.Init` to cover the updated code path. Note that there was no test for `GetAlwaysAliveValues` before.
Reviewed By: hlu1
Differential Revision: D30940955
fbshipit-source-id: 2cb065ecda0f447a61e64a7cf70cc7c6947f7dfc
Summary: Adding a test to ensure non-vanilla SGD treats complex numbers as two real numbers in R^2, as per issue 65711 on GitHub.
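A minimal sketch of the property under test (not the actual test code), assuming SGD with momentum already accepts complex parameters: stepping on a complex parameter should match stepping on its view as two real numbers.
```
import torch

p_c = torch.randn(3, dtype=torch.complex64, requires_grad=True)
p_r = torch.view_as_real(p_c.detach()).clone().requires_grad_(True)

opt_c = torch.optim.SGD([p_c], lr=0.1, momentum=0.9)
opt_r = torch.optim.SGD([p_r], lr=0.1, momentum=0.9)

for _ in range(3):
    opt_c.zero_grad()
    opt_r.zero_grad()
    (p_c * p_c.conj()).real.sum().backward()  # |p|^2 as a real-valued loss
    p_r.pow(2).sum().backward()               # same loss on the R^2 view
    opt_c.step()
    opt_r.step()

# The complex parameter, viewed as pairs of reals, should track p_r.
print(torch.allclose(torch.view_as_real(p_c), p_r))
```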
Test Plan:
```buck test mode/dev caffe2/test:optim -- 'test_sgd_complex'```
https://pxl.cl/1QLxw
Reviewed By: albanD
Differential Revision: D31477212
fbshipit-source-id: 500678e561a05ac96759223b4c87a37cab26c6a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66021
A builtin library consists of a list of frozen modules and a list of builtin modules. For tensorrt, it's quite simple since we only have a single builtin module, tensorrt.tensorrt. But it can be complex for libraries like numpy, which contain multiple builtin modules (np.core._multiarray_umath, np.random.mtrand, etc.), if we want to add them as torch::deploy builtins. We enhance the macro that registers builtin libraries to accept a variable number of builtin modules. We can use this macro to register frozentorch, frozenpython, and tensorrt for now, and can also use it to register libraries like numpy later on.
The enhanced macro now looks as follows. Although we don't need to worry about backward compatibility for now, this enhanced version is fully compatible with the previous one; the previous version is just the special case where the library contains no builtin modules.
```
REGISTER_TORCH_DEPLOY_BUILTIN(library_name_without_quote, frozen_modules_list,
builtin_module_name_1, builtin_module_init_function_1, ...,
builtin_module_name_N, builtin_module_init_function_N)
```
ghstack-source-id: 140007970
Test Plan:
1. Play around with interactive_embedded_interpreter.cpp to import torch._C, tensorrt.tensorrt etc inside the embedded interpreter.
2. Enhance test_builtin_registry.cpp
3. Run test_deploy.cpp and test_deploy_gpu.cpp
Reviewed By: suo
Differential Revision: D31349390
fbshipit-source-id: 70a1fcf660341180fc4d5195aed15ceb07c2bef7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66218
This stack of diffs reduces the memory used by LLVMCodeGen object.
Here are the numbers on model `294738512`: (this is the number reported as `Memory turnover after freeze_module:` in the output)
```
Before: 123343496
After : 121566008
```
So, there is a reduction of about `~1.77MB` with this change of making `PytorchLLVMJIT` a singleton.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM, hlu1
Differential Revision: D31445798
Pulled By: navahgar
fbshipit-source-id: c860d36456b2c5d3e21010c1217e2948326f666d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65671
Tentative implementation that uses dist.gather_object to collect shards from all ranks and then "merge" them. The merge is done on dst_rank by first padding the sharded tensors to the size of the full tensor based on their metadata (offsets, lengths), and then summing these padded tensors together.
Also considered concatenating the sharded tensors without padding to minimize memory footprint (assuming padding will increase memory), but that may not be flexible enough for arbitrary sharding (e.g. sharding along multiple dimensions).
Another option is constructing the padded tensor on each rank and reducing to rank 0. This is the easiest implementation, but it incurs higher memory usage and communication payload. Please let me know if this alternative is preferred.
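A minimal sketch of the pad-and-sum merge described above, using a hypothetical `merge_shards` helper and plain tensors standing in for the gathered shard metadata:
```
import torch

def merge_shards(shards, full_size):
    # shards: list of (offsets, lengths, local_tensor) gathered on dst_rank,
    # e.g. via dist.gather_object.
    merged = torch.zeros(full_size)
    for offsets, lengths, local_tensor in shards:
        padded = torch.zeros(full_size)
        idx = tuple(slice(o, o + l) for o, l in zip(offsets, lengths))
        padded[idx] = local_tensor
        # Shards are disjoint, so summing the padded tensors rebuilds the full tensor.
        merged += padded
    return merged

# Example: a 4x4 tensor sharded into two 2x4 row shards.
shards = [
    ((0, 0), (2, 4), torch.ones(2, 4)),
    ((2, 0), (2, 4), torch.full((2, 4), 2.0)),
]
full = merge_shards(shards, (4, 4))
```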
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23
Test Plan:
Imported from OSS
python test/distributed/_sharded_tensor/test_sharded_tensor.py -v -k test_gather
did not manage to test on oss, but tested in fbcode by reserving on demand gpu
arc patch D31197611
modify the test with 2 gpus as on-demand gpu only has 2 cores (D31227986)
buck test -c fbcode.enable_gpu_sections=true mode/dev-nosan caffe2/test/distributed/_sharded_tensor:sharded_tensor -- test_gather
buck-out/gen/caffe2/test/distributed/_sharded_tensor/sharded_tensor#binary.par test_sharded_tensor.TestShardedTensorChunked.test_gather
{F667213605}
Reviewed By: dagitses, pritamdamania87
Differential Revision: D31197611
Pulled By: dracifer
fbshipit-source-id: cf98b4a2d7838b11b9582eb23f826bb0fa38a7f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65758
The same change has already been made for conv2d; the proper algorithm is both
faster and more precise.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31257872
Pulled By: ngimel
fbshipit-source-id: 6ff3a7a00a05b66f83d45cc820bd0c230cb8de6d
Summary:
Enable testing of `torch.Tensor.resize_`.
The negative view test is skipped as the test doesn't work with resize_; see
https://github.com/pytorch/pytorch/issues/65945.
cc mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66135
Reviewed By: dagitses
Differential Revision: D31444263
Pulled By: mruberry
fbshipit-source-id: 00c7fe05df28fba01508b31adb3ed4fdcf4d0326
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65542
Add docstring for torch.fx.passes.split_module that conforms to Google Python Style conventions.
Changed original example to the example from this diff:
https://www.internalfb.com/diff/D24925283 (9734c042b8)
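For reference, a small usage sketch of the documented API (a simplified variant of the docstring example, with a hypothetical first-op/rest partition callback):
```
import torch
from torch.fx import symbolic_trace
from torch.fx.passes.split_module import split_module

class MyModule(torch.nn.Module):
    def forward(self, x):
        x = x + 1
        return torch.relu(x)

m = MyModule()
traced = symbolic_trace(m)

# Put the first call node into partition 0 and everything else into partition 1.
seen = []
def split_callback(node):
    seen.append(node)
    return 0 if len(seen) == 1 else 1

split = split_module(traced, m, split_callback)
print(split)  # GraphModule containing submod_0 and submod_1
```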
Test Plan:
Ran buck test //caffe2/test:fx. No errors detected
https://pxl.cl/1QCch
Reviewed By: jamesr66a
Differential Revision: D31145694
fbshipit-source-id: 8e54f3b1be3dca1c4d414fdeeab71b9f2b5d9f3e
Summary:
These utils are prerequisites for Lazy Node base class.
- set up new torch/csrc/lazy, test/cpp/lazy dirs
- add source files to build_variables.bzl in new lazy_core_sources var
- create new test_lazy binary
Fixes https://github.com/pytorch/pytorch/issues/65636
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66181
Original commit changeset: 3d0d5377d71e
Test Plan:
Run PyTorch XLA corresponding PR in XLA CI:
https://github.com/pytorch/xla/pull/3148/files
Reviewed By: suo
Differential Revision: D31416438
fbshipit-source-id: 58a6a49c5bc30134bc6bae2e42778f359b9a8f40