Summary:
Switched to short forms of `splitWithTail` / `splitWithMask` for all tests in `test/cpp/tensorexpr/test_*.cpp` (except `test_loopnest.cpp`)
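For reference, a hedged sketch of the difference (the `LoopNest` object, loop pointers, and split factor here are illustrative; see the `LoopNest` declarations in this revision for the exact signatures):
```
#include <vector>

using namespace torch::jit::tensorexpr;

void splitExample(LoopNest& l, const std::vector<For*>& loops) {
  // Long form: the resulting loops are returned via out-parameters.
  For *outer, *inner, *tail;
  l.splitWithTail(loops[0], 4, &outer, &inner, &tail);

  // Short form: for tests that never inspect the resulting loops.
  LoopNest::splitWithTail(loops[1], 4);
}
```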
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55542
Reviewed By: mrshenli
Differential Revision: D27632033
Pulled By: jbschlosser
fbshipit-source-id: dc2ba134f99bff8951ae61e564cd1daea92c41df
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/55203
Fixes issues (1) and (2) in the following tests:
tests in test/cpp/tensorexpr/test_loopnest.cpp, from the beginning up to and including LoopNestReorderLongStringFull
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55512
Reviewed By: mrshenli
Differential Revision: D27630679
Pulled By: soulitzer
fbshipit-source-id: b581aaea4f5f54b3285f0348aa76e99779418f80
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55497
Migrating some of the NNC APIs used in testing, from this issue: https://github.com/pytorch/pytorch/issues/55203
I covered the second half of `test_loopnest.cpp` and migrated (1) and (2) in the above issue: `LoopNest::getLoopStmtsFor`, `splitWithTail`, and `splitWithMask`.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D27628625
Pulled By: bdhirsh
fbshipit-source-id: ec15efba45fae0bbb442ac3577fb9ca2f8023c2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55012
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54442
Added needsOutputs support to RecordFunction and improved the ObserverUtil functions to handle list data, with some minor renaming for consistency.
To get output data from kernel calls, we need to temporarily capture the outputs before passing them to the record function; the results are then released to the function's return. We handle two cases, for unboxed and boxed kernels. The boxed version is fairly simple, since all outputs are stored in the stack object. For unboxed kernel calls, we added a `ReturnValue` utility class to properly handle the different return values of unboxed kernels.
As an optimization, this intermediate capture is only enabled for observers that request `needsOutputs(true)`; it does not affect other observers or cases where the observer is not enabled.
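As a hedged sketch of how an observer opts in (callback signatures are approximate; see `aten/src/ATen/record_function.h` for the exact API):
```
#include <ATen/record_function.h>
#include <memory>

void registerOutputObserver() {
  at::addGlobalCallback(
      at::RecordFunctionCallback(
          [](const at::RecordFunction&) -> std::unique_ptr<at::ObserverContext> {
            return nullptr; // no per-call state needed for this sketch
          },
          [](const at::RecordFunction& fn, at::ObserverContext*) {
            // fn.outputs() is populated only because needsOutputs(true)
            // is requested below.
            for (const c10::IValue& out : fn.outputs()) {
              (void)out; // inspect the captured outputs here
            }
          })
          .needsOutputs(true));
}
```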
Test Plan:
```
=> buck build //caffe2/test/cpp/jit: --show-output
=> buck-out/gen/caffe2/test/cpp/jit/jit --gtest_filter=RecordFunctionTest*
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = RecordFunctionTest*-*_CUDA:*_MultiCUDA
[==========] Running 7 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 7 tests from RecordFunctionTest
[ RUN ] RecordFunctionTest.TracedTestInputsOutputs
[ OK ] RecordFunctionTest.TracedTestInputsOutputs (226 ms)
[ RUN ] RecordFunctionTest.SampledCallbacks
[ OK ] RecordFunctionTest.SampledCallbacks (771 ms)
[ RUN ] RecordFunctionTest.RecordFunctionGuard
[ OK ] RecordFunctionTest.RecordFunctionGuard (0 ms)
[ RUN ] RecordFunctionTest.Callbacks
[ OK ] RecordFunctionTest.Callbacks (2 ms)
[ RUN ] RecordFunctionTest.ShouldRun
[ OK ] RecordFunctionTest.ShouldRun (0 ms)
[ RUN ] RecordFunctionTest.Basic
[ OK ] RecordFunctionTest.Basic (1 ms)
[ RUN ] RecordFunctionTest.OperatorNameOverload
[ OK ] RecordFunctionTest.OperatorNameOverload (1 ms)
[----------] 7 tests from RecordFunctionTest (1001 ms total)
[----------] Global test environment tear-down
[==========] 7 tests from 1 test case ran. (1002 ms total)
[ PASSED ] 7 tests.
```
Reviewed By: ilia-cher
Differential Revision: D27449877
fbshipit-source-id: 69918b729565f5899471d9db42a587f9af52238d
Summary:
The non-backwards-compatible change introduced in https://github.com/pytorch/pytorch/pull/53843 is tripping up a lot of code. It is better to set it to False initially and then potentially flip it to True in a later version, to give people time to adapt.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55169
Reviewed By: mruberry
Differential Revision: D27511150
Pulled By: jbschlosser
fbshipit-source-id: 1ac018557c0900b31995c29f04aea060a27bc525
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54998
The only reason we couldn't use Load instead of FunctionCall was DepTracker. Now that it is gone, we can finally replace FunctionCall with Load.
Test Plan: Imported from OSS
Reviewed By: bertmaher, pbelevich
Differential Revision: D27446412
Pulled By: ZolotukhinM
fbshipit-source-id: 9183ae5541c2618abc9026b1dc4c4c9fab085d47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54997
DepTracker was used to automatically pull in dependent computations from the output ones. While that seems quite convenient, it led to several architectural issues, which are fixed in this stack.
DepTracker worked on Tensors, each of which is a pair of a Buf and a Stmt. However, the Stmt could become stale, and there was no way to reliably update the corresponding tensor. We now use Bufs and Stmts directly and are moving away from using Tensors, to avoid these problems.
Removing DepTracker allowed us to unify Loads and FunctionCalls, which were essentially duplicates of each other.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D27446414
Pulled By: ZolotukhinM
fbshipit-source-id: a2a32749d5b28beed92a601da33d126c0a2cf399
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55136
This will ease the transition to the new API where `Buffer` does not
store a length anymore.
Test Plan: CI
Reviewed By: lw
Differential Revision: D27466385
fbshipit-source-id: 9a167f8c501455a3ab49ce75257c69d8b4869925
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54633
There's currently no information that could be used to determine what is a parameter during the loading of a mobile module. This prevents named parameters from functioning correctly. This change is a temporary hack to help out federated learning, currently the sole user of this API.
ghstack-source-id: 124885201
Test Plan: todo
Reviewed By: dhruvbird
Differential Revision: D27308738
fbshipit-source-id: 0af5d1e8381ab7b7a43b20560941aa070a02e7b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54442
Added needsOutputs support to RecordFunction and improved the ObserverUtil functions to handle list data, with some minor renaming for consistency.
To get output data from kernel calls, we need to temporarily capture the outputs before passing them to the record function; the results are then released to the function's return. We handle two cases, for unboxed and boxed kernels. The boxed version is fairly simple, since all outputs are stored in the stack object. For unboxed kernel calls, we added a `ReturnValue` utility class to properly handle the different return values of unboxed kernels.
As an optimization, this intermediate capture is only enabled for observers that request `needsOutputs(true)`; it does not affect other observers or cases where the observer is not enabled.
Test Plan:
```
=> buck build //caffe2/test/cpp/jit: --show-output
=> buck-out/gen/caffe2/test/cpp/jit/jit --gtest_filter=RecordFunctionTest*
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = RecordFunctionTest*-*_CUDA:*_MultiCUDA
[==========] Running 7 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 7 tests from RecordFunctionTest
[ RUN ] RecordFunctionTest.TracedTestInputsOutputs
[ OK ] RecordFunctionTest.TracedTestInputsOutputs (226 ms)
[ RUN ] RecordFunctionTest.SampledCallbacks
[ OK ] RecordFunctionTest.SampledCallbacks (771 ms)
[ RUN ] RecordFunctionTest.RecordFunctionGuard
[ OK ] RecordFunctionTest.RecordFunctionGuard (0 ms)
[ RUN ] RecordFunctionTest.Callbacks
[ OK ] RecordFunctionTest.Callbacks (2 ms)
[ RUN ] RecordFunctionTest.ShouldRun
[ OK ] RecordFunctionTest.ShouldRun (0 ms)
[ RUN ] RecordFunctionTest.Basic
[ OK ] RecordFunctionTest.Basic (1 ms)
[ RUN ] RecordFunctionTest.OperatorNameOverload
[ OK ] RecordFunctionTest.OperatorNameOverload (1 ms)
[----------] 7 tests from RecordFunctionTest (1001 ms total)
[----------] Global test environment tear-down
[==========] 7 tests from 1 test case ran. (1002 ms total)
[ PASSED ] 7 tests.
```
Reviewed By: ilia-cher
Differential Revision: D25966661
fbshipit-source-id: 707886e1f212f40ba16a1fe292ea7dd33f2646e3
Summary:
*Context:* https://github.com/pytorch/pytorch/issues/53406 added a lint for trailing whitespace at the ends of lines. However, in order to pass FB-internal lints, that PR also had to normalize the trailing newlines in four of the files it touched. This PR adds an OSS lint to normalize trailing newlines.
The changes to the following files (made in 54847d0adb9be71be4979cead3d9d4c02160e4cd) are the only manually-written parts of this PR:
- `.github/workflows/lint.yml`
- `mypy-strict.ini`
- `tools/README.md`
- `tools/test/test_trailing_newlines.py`
- `tools/trailing_newlines.py`
I would have liked to make this just a shell one-liner like the other three similar lints, but nothing I could find quite fit the bill. Specifically, all the answers I tried from the following Stack Overflow questions were far too slow (at least a minute and a half to run on this entire repository):
- [How to detect file ends in newline?](https://stackoverflow.com/q/38746)
- [How do I find files that do not end with a newline/linefeed?](https://stackoverflow.com/q/4631068)
- [How to list all files in the Git index without newline at end of file](https://stackoverflow.com/q/27624800)
- [Linux - check if there is an empty line at the end of a file [duplicate]](https://stackoverflow.com/q/34943632)
- [git ensure newline at end of each file](https://stackoverflow.com/q/57770972)
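For illustration, the fast approach only needs to look at the file's tail instead of scanning it; a hypothetical C++ sketch of the check (this is not the actual `tools/trailing_newlines.py`, which is written in Python):
```
#include <cstdio>

// Returns true if the file ends in exactly one '\n' (no blank line at EOF).
// Only the last two bytes are read, so the check is O(1) per file.
bool endsInSingleNewline(const char* path) {
  std::FILE* f = std::fopen(path, "rb");
  if (!f) return false;
  std::fseek(f, 0, SEEK_END);
  long size = std::ftell(f);
  if (size == 0) { std::fclose(f); return true; } // treat empty files as OK
  char tail[2] = {0, 0};
  long n = size < 2 ? size : 2;
  std::fseek(f, -n, SEEK_END);
  size_t read = std::fread(tail + (2 - n), 1, n, f);
  std::fclose(f);
  return read == (size_t)n && tail[1] == '\n' && (size == 1 || tail[0] != '\n');
}
```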
To avoid giving false positives during the few days after this PR is merged, we should probably only merge it after https://github.com/pytorch/pytorch/issues/54967.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54737
Test Plan:
Running the shell script from the "Ensure correct trailing newlines" step in the `quick-checks` job of `.github/workflows/lint.yml` should print no output and exit in a fraction of a second with a status of 0. That was not the case prior to this PR, as shown by this failing GHA workflow run on an earlier draft of this PR:
- https://github.com/pytorch/pytorch/runs/2197446987?check_suite_focus=true
In contrast, this run (after correcting the trailing newlines in this PR) succeeded:
- https://github.com/pytorch/pytorch/pull/54737/checks?check_run_id=2197553241
To unit-test `tools/trailing_newlines.py` itself (this is run as part of our "Test tools" GitHub Actions workflow):
```
python tools/test/test_trailing_newlines.py
```
Reviewed By: malfet
Differential Revision: D27409736
Pulled By: samestep
fbshipit-source-id: 46f565227046b39f68349bbd5633105b2d2e9b19
Summary:
**BC-breaking note**: This change throws errors for cases that used to silently pass. The old behavior can be obtained by setting `error_if_nonfinite=False`.
Fixes https://github.com/pytorch/pytorch/issues/46849
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53843
Reviewed By: malfet
Differential Revision: D27291838
Pulled By: jbschlosser
fbshipit-source-id: 216d191b26e1b5919a44a3af5cde6f35baf825c4
Summary:
I added a helper to convert a Stmt to a string and FileCheck it, so I started using it in a bunch of places. I replaced about half of the current uses, got tired, started to write a Perl script to automate it, realized that was hard, and decided to give up for a bit. But this cleans up some of the tests, so it seems easy to review and worth landing.
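A minimal sketch of the pattern (the real helper lives in the tensorexpr test utilities; this just inlines the idea):
```
#include <sstream>
#include <string>
#include <torch/csrc/jit/tensorexpr/ir_printer.h>
#include <torch/csrc/jit/testing/file_check.h>

using torch::jit::tensorexpr::Stmt;

// Print an NNC statement via the IR printer and run FileCheck on the text.
void checkStmtIR(Stmt* s, const std::string& pattern) {
  std::ostringstream oss;
  oss << *s;
  torch::jit::testing::FileCheck().run(pattern, oss.str());
}
```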
Test Plan: test_tensorexpr --gtest_filter=LoopNest.*
Reviewed By: navahgar
Differential Revision: D27375866
fbshipit-source-id: 15894b9089dec5cf25f340fe17e6e54546a64257
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54756
We have multiple bugs here, one relating to index flattening and the
other to computeAt.
ghstack-source-id: 125054729
Test Plan: yikes
Reviewed By: ZolotukhinM
Differential Revision: D27354082
fbshipit-source-id: 8b15bac28e3eba4629881ae0f3bd143636f65ad7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54755
As title. A step on the way to using computeAt to optimize
convolution.
ghstack-source-id: 125054730
Test Plan: new test
Reviewed By: ZolotukhinM
Differential Revision: D27353663
fbshipit-source-id: 930e09d96d1f74169bf148cd30fc195c6759a3e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53953
torch.futures.wait_all would wait for all specified futures to complete before it returned. As a result, if there was an error, it would still wait for a long time (e.g., long-running RPCs) before returning an error to the user.
This PR ensures `wait_all` returns an error as soon as any future runs into an error, without waiting for all futures to complete.
I removed the logic in `_invoke_rpc_python_udf` that raised an error in the unwrap function, because ideally the error should be set on the Future rather than raised to the user only when `wait()` is called. For example, in the case of `wait_all`, the user never calls `wait()` on the future that errored out, but on a future down the chain, so we should propagate these errors via `setError` instead.
ghstack-source-id: 124721216
Test Plan:
1) Unit test added.
2) waitforbuildbot
Reviewed By: mrshenli
Differential Revision: D27032362
fbshipit-source-id: c719e2277c27ff3d45f1511d5dc6f1f71a03e3a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54439
For now the only way to represent conv2d in TE is via an external call, and since the aten library doesn't have an out variant for conv2d, the external call has to perform an extra copy. Because of that, fusing conv2d regressed performance, and hence it is now disabled. However, in the near future we should have two alternative ways to enable it:
1) represent conv2d natively in TE (without an external call)
2) add an out variant for conv2d
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D27237045
Pulled By: ZolotukhinM
fbshipit-source-id: f5545ff711b75f9f37bc056316d1999a70043b4c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54579
## Summary
1. Eliminate a few more tests when BUILD_LITE_INTERPRETER is on, such that test_lite_interpreter_runtime can build and run on device.
2. Remove `#include <torch/torch.h>`, because it's not needed.
## Test plan
Set `BUILD_TEST=ON` in `build_android.sh`, then run
`BUILD_LITE_INTERPRETER=1 ./scripts/build_pytorch_android.sh x86`
Push the binary to the Android device:
```
adb push ./build_android_x86/bin/test_lite_interpreter_runtime /data/local/tmp
```
Reorganize the folder in `/data/local/tmp` so that the test binary and model file are laid out as follows:
```
/data/local/tmp/test_bin/test_lite_interpreter_runtime
/data/local/tmp/test/cpp/lite_interpreter_runtime/sequence.ptl
```
such that the model file is on the correct path and can be found by test_lite_interpreter_runtime.

Test Plan: Imported from OSS
Reviewed By: iseeyuan
Differential Revision: D27300720
Pulled By: cccclai
fbshipit-source-id: d9526c7d3db8c0d3e76c5a4d604c6877c78afdf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52881
**This PR adds:**
1. logic to parse complex constants (complex literals of the form `bj`)
2. logic to parse complex lists
3. support for complex constructors: `complex(tensor/int/float/bool, tensor/int/float/bool)`
4. Limited operator support
- `add`, `sub`, `mul`, `torch.tensor`, `torch.as_tensor`
**Follow-up work:**
1. Add complex support for unary and other registered ops.
2. Support the complex constructor with a string as input (this is supported in Python eager mode).
3. Test all emitXYZ for all XYZ in `ir_emitter.cpp` (currently only emitConst and emitValueToTensor are tested), e.g., test loops etc.
4. ONNX doesn't support complex tensors, so we should error out with a clear and descriptive error message.
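A hedged example of source that should now parse (using `torch::jit::compile` to feed TorchScript from C++; the complex literal, the constructor, and `add` are the pieces added here):
```
#include <torch/jit.h>

void parseComplexExample() {
  // 3j is a complex literal; complex(1.0, 2.0) uses the new constructor.
  auto cu = torch::jit::compile(R"JIT(
def fn():
    return complex(1.0, 2.0) + 3j
)JIT");
}
```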
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D27245059
Pulled By: anjali411
fbshipit-source-id: af043b5159ae99a9cc8691b5a8401503fa8d6f05
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54337
This PR adds a new API to NNC to perform loop fusion.
```
static For* fuseLoops(const std::vector<For*>& loops);
```
Loop fusion is done only when all the conditions below are satisfied.
* All the loops have the same parent.
* There are no statements between these loops in their parent body.
* The start bounds are the same for all loops.
* The stop bounds are the same for all loops.
* Fusing the loops does not violate or add any dependencies.
This PR also adds an API to check for partial overlaps in `buffer_inference.h` and fixes a bug in `mem_dependency_checker.cpp`.
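A hedged usage sketch (the IR is built with the NNC `ExprHandle`/`Stmt` helpers; builder signatures may differ slightly across revisions):
```
using namespace torch::jit::tensorexpr;

void fuseExample() {
  BufHandle a("A", {100}, kInt);
  BufHandle b("B", {100}, kInt);
  VarHandle i("i", kInt);
  VarHandle j("j", kInt);
  // Two adjacent loops with identical bounds under a common parent block.
  For* f1 = For::make(i, 0, 100, Store::make(a, {i}, i));
  For* f2 = For::make(j, 0, 100, Store::make(b, {j}, j));
  Block::make({f1, f2});
  // All of the preconditions above hold, so this yields a single loop.
  For* fused = LoopNest::fuseLoops({f1, f2});
  (void)fused;
}
```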
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54461
Reviewed By: bertmaher
Differential Revision: D27254888
Pulled By: navahgar
fbshipit-source-id: c21b027d738e5022e9cb88f6f72cd9e255bdb15e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54251
Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/324
In order to merge the channel hierarchies, we need a generic `Buffer` type that can wrap either a `CpuBuffer` or a `CudaBuffer`.
The main constraint is that, since this type is used by the channels, it cannot explicitly refer to `CudaBuffer`. We propose here a type-erasure based solution, with a small-buffer optimization to avoid heap-allocating the wrapped concrete buffer.
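For illustration, a minimal sketch of type erasure with a small-buffer optimization (illustrative names, not tensorpipe's actual implementation; the real one also needs copy/move support and a fallback for large types, which this sketch replaces with a static_assert):
```
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

class Buffer {
 public:
  template <
      typename T,
      typename = std::enable_if_t<!std::is_same<std::decay_t<T>, Buffer>::value>>
  /* implicit */ Buffer(T concrete) {
    static_assert(sizeof(Impl<T>) <= kCapacity, "increase kCapacity");
    new (&storage_) Impl<T>(std::move(concrete));
  }
  Buffer(const Buffer&) = delete;
  ~Buffer() { base().~Base(); }

  // The caller must know the concrete wrapped type (e.g. a CPU buffer).
  template <typename T>
  T& unwrap() { return static_cast<Impl<T>&>(base()).concrete; }

 private:
  struct Base { virtual ~Base() = default; };
  template <typename T>
  struct Impl : Base {
    explicit Impl(T c) : concrete(std::move(c)) {}
    T concrete;
  };
  static constexpr std::size_t kCapacity = 32; // in-place, no heap allocation
  std::aligned_storage_t<kCapacity> storage_;
  Base& base() { return *reinterpret_cast<Base*>(&storage_); }
};
```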
This is a new version of D27001339 (c618dc13d2) which broke PyTorch OSS build.
Test Plan: CI
Reviewed By: lw, mrshenli
Differential Revision: D27156053
fbshipit-source-id: 4244302af33a3be91dcd06093c0d6045d081d3cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45667
First part of #3867 (Pooling operators still to do)
This adds a `padding='same'` mode to the interface of `conv{n}d` and `nn.Conv{n}d`. This should match the behaviour of `tensorflow`. I couldn't find it explicitly documented, but through experimentation I found `tensorflow` returns the shape `ceil(len/stride)` and always adds any extra asymmetric padding onto the right side of the input.
Since the `native_functions.yaml` schema doesn't seem to support strings or enums, I've moved the function interface into Python, and it now dispatches between the numerically padded `conv{n}d` and the `_conv{n}d_same` variant. The underscores are because I couldn't see any way to avoid exporting a function into the `torch` namespace.
A note on asymmetric padding: the total padding required can be odd if both the kernel length is even and the dilation is odd, as sketched below. mkldnn has native support for asymmetric padding, so there is no overhead there, but for other backends I resort to padding the input tensor by 1 on the right-hand side to make the remaining padding symmetrical. In these cases, I use `TORCH_WARN_ONCE` to notify the user of the performance implications.
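A small sketch of that arithmetic (an illustrative helper, assuming stride 1; not the actual PyTorch internals):
```
#include <utility>

// Total 'same' padding for one spatial dim is dilation * (kernel - 1); it is
// odd exactly when the kernel size is even and the dilation is odd. Any odd
// remainder goes on the right, matching tensorflow's convention.
std::pair<int, int> samePadding(int kernelSize, int dilation) {
  int total = dilation * (kernelSize - 1);
  int left = total / 2;
  return {left, total - left};
}
```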
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D27170744
Pulled By: jbschlosser
fbshipit-source-id: b3d8a0380e0787ae781f2e5d8ee365a7bfd49f22
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53864
This PR adds the following APIs that perform loop distribution to `LoopNest`:
```
static std::vector<For*> distributeLoop(For* loop, const std::unordered_set<Stmt*>& pivots);
static std::vector<For*> distributeLoop(For* loop);
static std::vector<For*> distributeLoopOverInnerLoops(For* loop);
```
* The first method distributes the given loop over its body by splitting after every given pivot stmt.
* The second method distributes the given loop over every stmt in its body.
* The last method distributes the given loop over its body by splitting after every `For` stmt in its body.
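A hedged usage sketch (`loop` is a `For*` already obtained from a `LoopNest`, and `pivot` is a hypothetical `Stmt*` from its body):
```
#include <unordered_set>
#include <vector>

using namespace torch::jit::tensorexpr;

void distributeExample(For* loop, Stmt* pivot) {
  // Split the loop's body after the chosen pivot statement:
  std::unordered_set<Stmt*> pivots = {pivot};
  std::vector<For*> parts = LoopNest::distributeLoop(loop, pivots);
  // The no-pivot overload would instead split after every statement:
  //   std::vector<For*> all = LoopNest::distributeLoop(someLoop);
  (void)parts;
}
```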
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53865
Reviewed By: mruberry
Differential Revision: D27075006
Pulled By: navahgar
fbshipit-source-id: 031746aad619fe84c109e78b53387535e7f77cef
Summary:
Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/322
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54145
In order to merge the channel hierarchies, we need a generic `Buffer` type that can wrap either a `CpuBuffer` or a `CudaBuffer`.
The main constraint is that, since this type is used by the channels, it cannot explicitly refer to `CudaBuffer`. We propose here a type-erasure based solution, with a small-buffer optimization to avoid heap-allocating the wrapped concrete buffer.
ghstack-source-id: 124131499
Test Plan: CI
Reviewed By: lw
Differential Revision: D27001339
fbshipit-source-id: 26d7dc19d69d7e3336df6fd4ff6ec118dc17c5b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53662
Add a base ProcessGroup::Options so that we can use inheritance and provide a universal options API in Python.
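A hypothetical sketch of the pattern (not the exact c10d definitions):
```
#include <chrono>

// Common options shared by all process groups; backends extend this.
struct Options {
  virtual ~Options() = default;
  std::chrono::milliseconds timeout{std::chrono::minutes(30)};
};

// A backend-specific subclass adds its own knobs on top of the base,
// so Python can accept any Options subclass through one API.
struct GlooLikeOptions : Options {
  int threads = 2;
};
```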
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D26968856
Pulled By: wanchaol
fbshipit-source-id: 858f4b61b27aecb1943959bba68f8c14114f67d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54121
It would be nice to do range analysis to determine if a condition
cannot be satisfied. These are some tests that we should be able to turn on
once we have this feature.
ghstack-source-id: 124116847
Test Plan: Simplify.*LoopBounds
Reviewed By: ZolotukhinM
Differential Revision: D27107956
fbshipit-source-id: bb27e3d3bc803f0101c416e4a351ba2278684980
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54094
We should be able to use 64-bit integers for loop boundaries and
buffer/tensor indexing.
ghstack-source-id: 124116846
Test Plan: New tests, disabled
Reviewed By: ZolotukhinM
Differential Revision: D27094934
fbshipit-source-id: a53de21a0ef523ea3560d5dd4707df50624896ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53677
When serializing bytecode, we serialize per method. It may happen that there are multiple instances of a class; in such a case, the methods of the class may be serialized multiple times.
To reduce the duplication, we cache the qualified name of the methods, so that one method is serialized only once.
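Illustratively, the caching amounts to something like this (hypothetical types and names, not the actual serializer code):
```
#include <string>
#include <unordered_set>
#include <vector>

// Method is any type exposing a qualified-name accessor; EmitFn serializes it.
template <typename Method, typename EmitFn>
void serializeMethods(const std::vector<Method>& methods, EmitFn emitBytecode) {
  std::unordered_set<std::string> seen;
  for (const auto& method : methods) {
    // The qualified name identifies a method across instances of a class.
    if (!seen.insert(method.qualifiedName()).second) {
      continue; // already serialized once; skip the duplicate
    }
    emitBytecode(method);
  }
}
```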
Test Plan: existing unittests and CI
Reviewed By: dhruvbird, raziel
Differential Revision: D26933945
Pulled By: iseeyuan
fbshipit-source-id: 8a9833949fa18f7103a5a0be19e2028040dc7717
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53092
This PR adds the following APIs to NNC.
```
// In For:
static For* getParentLoop(const Stmt* st);
static std::vector<For*> getEnclosingLoopNest(const Stmt* st);
// In LoopNest:
std::vector<const Stmt*> getAllWritesToBuf(const Buf*) const;
std::vector<For*> getAllInnermostLoopsWritingToBuf(const Buf*) const;
std::vector<std::vector<For*>> getAllLoopNestsWritingToBuf(const Buf*) const;
```
These APIs are required for some use cases that involve multiple transformations, like `splitWithTail` followed by `reorder`, as shown in https://github.com/pytorch/pytorch/issues/53092
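A hedged sketch of the intended flow (assuming the short `splitWithTail` form and `reorderAxis` from this codebase; exact calls may vary by revision):
```
#include <vector>

using namespace torch::jit::tensorexpr;

void splitThenReorder(Tensor* tensor) {
  LoopNest nest({tensor});
  std::vector<For*> loops = nest.getLoopStmtsFor(tensor);
  LoopNest::splitWithTail(loops[0], 4); // 'loops' may now point to stale stmts
  // Re-discover the loop nests writing to the tensor's buffer, then reorder.
  auto nests = nest.getAllLoopNestsWritingToBuf(tensor->buf());
  nest.reorderAxis(nests[0][0], nests[0][1]);
}
```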
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53778
Reviewed By: albanD
Differential Revision: D26987013
Pulled By: navahgar
fbshipit-source-id: 491459eddfff045132d2358631ad069bbcc520df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53861
Replaced the iterators in the for-loops with integer index variables due to
overflow when handling empty vectors.
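The class of bug looks like this (a generic illustration, not the original code):
```
#include <cstdint>
#include <vector>

void example(const std::vector<int>& v) {
  // Buggy: v.size() is unsigned, so for an empty vector v.size() - 1 wraps
  // around to a huge value and the loop reads out of bounds:
  //   for (size_t i = 0; i < v.size() - 1; ++i) { ... }

  // Fixed: a signed index makes the comparison behave for empty vectors.
  for (int64_t i = 0; i < static_cast<int64_t>(v.size()) - 1; ++i) {
    (void)v[i];
  }
}
```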
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26998894
Pulled By: huiguoo
fbshipit-source-id: a1f6475c8ba123968ef7247b4f6f38edbf24b9ef
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52581
The git diff is absolutely atrocious since I also refactored the code to share stuff between `Load` and `FunctionCall`.
The biggest questions I have about this diff are:
1. The asserts I added. From my understanding, it's not possible to have a non-zero constant index in `Store`, since `Store` always creates a new buffer. Perhaps the user can write this kind of incorrect code, though, so perhaps I should just check for it instead of asserting?
2. I don't think(?) I need to do any special handling for `index_vars`, but I wasn't totally able to follow the logic there.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53254
Reviewed By: albanD
Differential Revision: D26991064
Pulled By: Chillee
fbshipit-source-id: 0bcd612d5f4b031c0b34e68a72d9c8d12d118be8
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50577
Learning rate schedulers had not yet been implemented for the C++ API.
This pull request introduces the learning rate scheduler base class and the StepLR subclass. Furthermore, it modifies the existing OptimizerOptions such that the learning rate scheduler can modify the learning rate.
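A minimal usage sketch (assuming the `StepLR` interface introduced here; consult the C++ optim headers for the exact signatures):
```
#include <torch/torch.h>

void train() {
  torch::nn::Linear model(10, 1);
  torch::optim::SGD optimizer(
      model->parameters(), torch::optim::SGDOptions(0.1));
  // Decays the learning rate by gamma every step_size epochs.
  torch::optim::StepLR scheduler(optimizer, /*step_size=*/30, /*gamma=*/0.1);

  for (int epoch = 0; epoch < 100; ++epoch) {
    // ... forward pass, loss.backward(), optimizer.step() ...
    scheduler.step();
  }
}
```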
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52268
Reviewed By: mrshenli
Differential Revision: D26818387
Pulled By: glaringlee
fbshipit-source-id: 2b28024a8ea7081947c77374d6d643fdaa7174c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53752
This test doesn't work today because we don't properly vectorize
"FunctionCall" (which is the way one accesses an intermediate tensor).
ghstack-source-id: 123592860
Test Plan: `buck test //caffe2/test/cpp/tensorexpr -- LoopNest.VectorizeUse`
Reviewed By: ZolotukhinM
Differential Revision: D26895550
fbshipit-source-id: 0798ebf3e6a834bd70181732c81528455d5329fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53751
Sometimes the initial value of a reduction expression needs to be
computed with reference to the loop axes; for example, adding bias can be
efficiently represented by initializing the accumulator from the bias tensor:
```
C[n, c, h, w] = bias[c]
for (...)
C[n, c, h, w] += ...
```
ghstack-source-id: 123592861
Test Plan: `buck test //caffe2/test/cpp/tensorexpr -- Reductions.InitFunction`
Reviewed By: navahgar
Differential Revision: D26940321
fbshipit-source-id: 8a08e19e5d0b9ad453a07fab8b61e75dcd3d626b