Summary:
Things changed in this PR that require review:
test/forward_backward_compatibility/check_forward_backward_compatibility.py
Our previous function overload extension names were wrong and have been updated in this PR; hence the compatibility list was updated.
nvfuser code updates with bug fixes for failures we encountered in OpInfoTests, as well as failures reported by the AOTAutograd team.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73627
Reviewed By: Chillee
Differential Revision: D34765458
Pulled By: davidberard98
fbshipit-source-id: c81f3d6a1b723fb3a8ba419b7f82227f70440ca7
(cherry picked from commit b6a2c362c37051e44fac31687b2fe272f776551e)
Summary:
Things changed in this PR that require review:
1. aten/src/ATen/core/interned_strings.h
2. torch/csrc/jit/ir/alias_analysis.h : exposing createValue to allow efficient mutation
3. torch/csrc/jit/runtime/symbolic_shape_registry.cpp : added gelu/tanh/erf in registry
4. torch/jit/_script.py : throws an error when a scripted model uses autocast as a decorator, since that's not supported
nvfuser code update:
1. codegen improvements and performance tuning
2. integration bug fixes for shape expression logic
3. kernel segmentation update to address perf regression from horizontal fusion
4. scalar CPU tensor promotion to support inter-device operations between a CPU scalar tensor and a CUDA tensor (see the sketch below)
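A minimal eager-mode illustration of the inter-device case from item 4 (sizes are illustrative; this shows the operation pattern only, not the fuser change itself):
```
#include <ATen/ATen.h>
#include <iostream>

int main() {
  // A CUDA tensor combined with a zero-dim CPU "scalar" tensor; the CPU
  // scalar is promoted so the binary op can run on the CUDA device.
  auto cuda_tensor = at::randn({4}, at::kCUDA);
  auto cpu_scalar = at::ones({});  // 0-dim tensor on the CPU
  auto out = cuda_tensor + cpu_scalar;
  std::cout << out.device() << std::endl;
  return 0;
}
```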
Things reverted from local changes:
aten::gelu with approximation (tracked in PR: https://github.com/pytorch/pytorch/pull/61439)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72127
Reviewed By: HamidShojanazeri
Differential Revision: D34113233
Pulled By: jbschlosser
fbshipit-source-id: b82cde32b71e324eca0ea57cb8c9f9647278ca74
(cherry picked from commit e009bc5c4e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72390
This class didn't add much value and only caused more boilerplate code.
This change removes the class and replaces all its uses with `ExprHandle`.
A side effect of this change is different loop-variable names, which
caused massive mechanical changes in our tests.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D34030296
Pulled By: ZolotukhinM
fbshipit-source-id: 2ba4e313506a43ab129a10d99e72b638b7d40108
(cherry picked from commit c2ec46a058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66746
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable warning suppressions added by hand.
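For illustration, a minimal before/after sketch of the rewrite on a made-up helper (not an actual hunk from this diff):
```
#include <c10/util/irange.h>
#include <cstdint>

// Before: for (int64_t i = 0; i < n; i++) { data[i] *= s; }
// After:  the index variable is const and its type is deduced from irange.
void scale(float* data, int64_t n, float s) {
  for (const auto i : c10::irange(n)) {
    data[i] *= s;
  }
}
```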
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D31705361
fbshipit-source-id: 33fd22eb03086d114e2c98e56703e8ec84460268
Summary:
nvfuser code update:
1. Tuning heuristics on schedulers for reduction/normalization kernels;
2. bfloat16 support on I/O tensors;
3. Refactored memory format support; we can now support dimension collapsing with non-coherent input tensors that have different memory formats, e.g. a channels-last tensor input to batch normalization (see the sketch after this list). Note that we are currently limiting memory format to only Contiguous and Channels Last;
4. Refactored nvfuser graph partitioning in `graph_fuser.cpp`, separated node merge and profile node API. Updated `profiling_record.cpp`.
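A minimal sketch of the channels-last case from item 3 (sizes are illustrative; this shows the eager-mode input pattern only, not the fusion itself):
```
#include <ATen/ATen.h>
#include <iostream>

int main() {
  // Channels-last CUDA input fed into batch normalization (eager mode).
  auto x = at::randn({2, 8, 16, 16}, at::kCUDA)
               .contiguous(at::MemoryFormat::ChannelsLast);
  auto weight = at::ones({8}, at::kCUDA);
  auto bias = at::zeros({8}, at::kCUDA);
  auto mean = at::zeros({8}, at::kCUDA);
  auto var = at::ones({8}, at::kCUDA);
  auto y = at::batch_norm(x, weight, bias, mean, var,
                          /*training=*/false, /*momentum=*/0.1, /*eps=*/1e-5,
                          /*cudnn_enabled=*/true);
  std::cout << y.is_contiguous(at::MemoryFormat::ChannelsLast) << std::endl;
  return 0;
}
```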
Things that are reverted from our local branch:
1. changes on some entries in autodiff
2. aten::gelu with approximation
3. native_dropout(_backward)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67943
Reviewed By: ngimel
Differential Revision: D32288709
Pulled By: dzhulgakov
fbshipit-source-id: fc9491182ea7e0158bc112c66f096823c588eaf1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable warning suppressions added by hand.
bypass_size_limit
allow-large-files
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D30652629
fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`
Delete a number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variable in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants
Do not delete `caffe2::OperatorBase::Output` calls as they have side effects
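A small sketch of the suppression patterns above (illustrative code, not hunks from this diff):
```
#include <c10/macros/Macros.h>
#include <vector>

// constexpr instead of static for a global constant:
constexpr int kMaxIters = 10;

// C10_UNUSED for a global whose constructor runs only for its side effects:
C10_UNUSED static const bool kRegistered = []() { return true; }();

// (void)var; silences an otherwise-unused range-loop variable:
int countAll(const std::vector<int>& v) {
  int n = 0;
  for (const auto& item : v) {
    (void)item;
    ++n;
  }
  return n;
}
```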
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66041
Reviewed By: ngimel
Differential Revision: D31360142
Pulled By: malfet
fbshipit-source-id: 6fdfb9f91efdc49ca984a2f2a17ee377d28210c8
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`
Delete a number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variable in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65954
Reviewed By: ngimel
Differential Revision: D31326599
Pulled By: malfet
fbshipit-source-id: 924155f1257a2ba1896c50512f615e45ca1f61f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65551
Previously we had a big switch on Op kind to decide how to lower a given
JIT operator to NNC. This PR changes this switch to a hash table lookup.
Why? This helps us with at least two things:
1) With this approach we can easily check in advance whether we know how to
handle a given node - i.e. we can inspect the entire graph and tell whether
it's possible to compile it without actually trying to do that and dying in
the middle. This would allow us to, say, provide user-friendly error messages
in the AOT workflow.
2) We can switch to using the schema instead of the op kind to determine the
correct lowering. Unlike the op schema, the op kind might be ambiguous (see
e.g. #64963), and using it instead of the schema can lead to bugs.
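A minimal sketch of the lookup idea, with hypothetical names (the real NNC registry keys lowerings by operator schema; the types below are simplified stand-ins, not the actual API):
```
#include <functional>
#include <string>
#include <unordered_map>

// A lowering turns a JIT node into NNC IR; here it is just an opaque functor.
using LoweringFn = std::function<void(/*node, inputs, outputs*/)>;

std::unordered_map<std::string, LoweringFn>& loweringRegistry() {
  static std::unordered_map<std::string, LoweringFn> registry;
  return registry;
}

// Because lowering selection is a plain map lookup keyed by schema, the whole
// graph can be inspected up front to decide whether it is compilable at all.
bool canLower(const std::string& schema) {
  return loweringRegistry().count(schema) != 0;
}
```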
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31148926
Pulled By: ZolotukhinM
fbshipit-source-id: ac12684e2126c899426ef5e4cc1e3f70fa01f704
Summary:
Syncing the nvfuser code base from the devel branch. Listing a few of our developments since the last sync:
- Extends support to normalization and reduction kernels.
- Multiple kernel launches for a single `CudaFusionGroup`. The hierarchical caching system has been updated to cache graph segmentation.
- profile_ivalue is enabled to convert dynamic scalars into compile-time constants, which are required by the codegen (e.g. reduction axes).
To keep this PR simple and relatively review-free, we stripped most external changes and submitted them as separate PRs, so this gigantic PR is easier to handle.
Internal updates are in files located in:
1. updates in nvfuser codegen `torch/csrc/jit/codegen/cuda`
2. added nvfuser specific benchmarks `benchmarks/cpp/nvfuser`
3. nvfuser jit cpp tests `test/cpp/jit/test_gpu.cpp` `test/cpp/jit/test_gpu_shift.cpp` `test/cpp/jit/test_gpu_validator.h`
Updates affecting integration:
1. profile_ivalue enabled for nvfuser; related changes are in `torch/csrc/jit/runtime/*`
2. exposed a few more symbols in `aten/src/ATen/core/*` used by codegen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63745
Reviewed By: saketh-are
Differential Revision: D30752939
Pulled By: malfet
fbshipit-source-id: ce122e80f01bcd3865f5bd3c4dfde660665fd84c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64887
BufHandle has exactly the same functionality and should be used instead.
Differential Revision: D30889483
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 365fe8e396731b88920535a3de96bd3301aaa3f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64205
The log_vml version of the micro-bench is over **2x** faster than the log1p version. Here are the perf numbers:
```
---------------------------------------------------------------------------------------------
Benchmark                               Time        CPU    Iterations  UserCounters...
---------------------------------------------------------------------------------------------
SignedLog1pBench/ATen/10/1467       45915 ns   45908 ns         14506  GB/s=2.5564G/s
SignedLog1pBench/NNC/10/1467        40469 ns   40466 ns         17367  GB/s=2.9002G/s
SignedLog1pBench/NNCLogVml/10/1467  19560 ns   19559 ns         35902  GB/s=6.00016G/s
```
Thanks to bertmaher for pointing this out.
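For context, the elementwise pattern being benchmarked is presumably the "signed log1p" sketched below; this is an assumption based on the benchmark name, not on the benchmark source:
```
#include <ATen/ATen.h>
#include <iostream>

// sign(x) * log1p(|x|); the NNCLogVml variant presumably computes the log1p
// part via a vectorized log, which would account for the speedup above.
at::Tensor signed_log1p(const at::Tensor& x) {
  return at::sign(x) * at::log1p(at::abs(x));
}

int main() {
  auto x = at::randn({5});
  std::cout << signed_log1p(x) << std::endl;
  return 0;
}
```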
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D30644716
Pulled By: navahgar
fbshipit-source-id: ba2b32c79d4265cd48a2886b0c62d0e89ff69c19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587
Now that there is no classes using KernelArena for memory management we
can remove it.
Differential Revision: D30429115
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586
This is another commit in transition from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr> and we don't need
to dynamically allocate it at all - it's cheap to pass it by value, and
that's what we're switching to in this commit.
After this change nothing uses KernelScope/KernelArena and they can be
safely removed.
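A rough sketch of the resulting shape of `Tensor`, using simplified stand-in types (not the actual class definition):
```
#include <memory>
#include <utility>

struct Buf;   // stand-ins for the real NNC IR nodes
struct Stmt;
using BufPtr = std::shared_ptr<Buf>;
using StmtPtr = std::shared_ptr<Stmt>;

// Tensor is just a pair of pointers, so it is cheap to copy and pass by
// value; no arena allocation is involved.
class Tensor {
 public:
  Tensor(BufPtr buf, StmtPtr stmt)
      : buf_(std::move(buf)), stmt_(std::move(stmt)) {}
  BufPtr buf() const { return buf_; }
  StmtPtr stmt() const { return stmt_; }

 private:
  BufPtr buf_;
  StmtPtr stmt_;
};
```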
Differential Revision: D30429114
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63778
This is a preparation for a switch from raw pointers to shared pointers
as a memory model for TE expressions and statements.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30487425
Pulled By: ZolotukhinM
fbshipit-source-id: 9cbe817b7d4e5fc2f150b29bb9b3bf578868f20c
Summary:
As the GoogleTest `TEST` macro, as well as `DEFINE_DISPATCH`, is non-compliant with the `cppcoreguidelines-avoid-non-const-global-variables` check, the corresponding NOLINT suppressions are dropped.
All changes but the ones to `.clang-tidy` are generated using the following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57334
Here's a possibly controversial PR. These counters got in the way of
generalizing the fuser tests to handle arbitrary devices, and I guess I'm just
generally skeptical that they provide much value. While it's true that they let us
observe whether fusion groups were created, we already have assertions based on
the shape of the graph, and I'm not sure that I trust those any less than these
counters.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D29471484
Pulled By: bertmaher
fbshipit-source-id: f6d76f6e72dbfb581acff1d834b0c74500941b57
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60550
Original commit changeset: ed655497a981
Whatever gcc version OSS Bazel uses wasn't happy move-constructing the
SimpleIREvaluator, so use a unique_ptr instead.
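A tiny sketch of the workaround pattern, using a stub type and hypothetical member names (illustrative only, not the actual code):
```
#include <memory>

struct SimpleIREvaluatorStub {  // stand-in for the real evaluator
  explicit SimpleIREvaluatorStub(int codegen_args) { (void)codegen_args; }
};

class Runner {
 public:
  // Holding the evaluator behind a unique_ptr avoids move-constructing it
  // into the member, which the gcc used by the OSS Bazel build rejected.
  explicit Runner(int args)
      : eval_(std::make_unique<SimpleIREvaluatorStub>(args)) {}

 private:
  std::unique_ptr<SimpleIREvaluatorStub> eval_;
};
```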
Test Plan:
CI. Hopefully the gcc version used by the OSS Bazel build is
happier with this (it should be), since actually testing it locally is
an intractable pain.
Reviewed By: navahgar
Differential Revision: D29333116
fbshipit-source-id: c3e4b5d8c91eb96a43ae5315a01ca0c0f4d4a99d
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os


def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files


def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])


def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)


if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56816
This doesn't actually work. For some reason the linker can't find
at::cpu::logit_out, and it's not worth digging into why not.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D27977406
Pulled By: bertmaher
fbshipit-source-id: d0235a393f25243e2c8a011e9baf267daf483ae4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56455
CPU convolution performance is pretty important for inference, so
tracking performance for CNNs often boils down to finding shapes that have
either regressed or need optimization. This diff adds a benchmark harness that
lets you pretty easily add new sets of convolution parameters to benchmark.
I've started with an exhaustive list of layers from MobileNetV3, ResNet-18 and
ResNet-50, which are fairly popular torchvision models. More to come if these
prove useful.
I've also added four backend configurations:
- native: uses at::conv2d, which applies its own backend selection heuristics (see the sketch after this list)
- mkldnn_none: uses mkldnn but applies no prepacking; uses the NCHW default
- mkldnn_weight: prepacks weights in an mkldnn-friendly format
- mkldnn_input: also prepacks the inputs in NCHW16c
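A minimal sketch of the "native" configuration (shapes are illustrative, not taken from the benchmark):
```
#include <ATen/ATen.h>
#include <iostream>

int main() {
  // "native" path: at::conv2d applies its own backend selection heuristics.
  auto input = at::randn({1, 64, 56, 56});
  auto weight = at::randn({64, 64, 3, 3});
  auto output = at::conv2d(input, weight, /*bias=*/{}, /*stride=*/{1},
                           /*padding=*/{1}, /*dilation=*/{1}, /*groups=*/1);
  std::cout << output.sizes() << std::endl;
  return 0;
}
```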
ghstack-source-id: 127027784
Test Plan: Ran this on my Skylake Xeon
Reviewed By: ngimel
Differential Revision: D27876139
fbshipit-source-id: 950e1dfa09a33cc3acc7efd579f56df8453af1f2
Summary:
There is a build failure in `bench_approx.cpp` due to a namespace change for `log_out` and `tanh_out`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56278
Reviewed By: bertmaher, nikithamalgifb
Differential Revision: D27825621
Pulled By: navahgar
fbshipit-source-id: 0bccd324af92a3460610bf475514449f0223de2b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55825
The mask has never been used (in vectorization we generate an explicit
`IfThenElse` construct when we need to mask out some elements). The PR
removes it and cleans up all its traces from tests.
Differential Revision: D27717776
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 41d1feeea4322da75b3999d661801c2a7f82b9db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55824
Seemingly some of my last changes (namely, removing dep-tracker) broke
the TE benchmarks. This PR fixes it.
Differential Revision: D27717778
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 48584bc0cfd4879a3e44cb45ee1f0d5c91b5afbc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55324
With this change `rfactor` only affects the passed loop and its body,
never touching anything outside (that was the root cause of a bug in the
previous implementation). Also, we don't have an `insertion_point`
parameter anymore - its meaning was vague, and the effect of it
should've been achievable with other transformations anyway.
The new `rfactor` semantics is as follows:
```
Requirements:
* S is the reduction store
* S is the only statement in the innermost loop
* There are at least two reduction arguments in S
* OUTER_REDUCTION_FOR loop corresponds to the outermost reduction variable
used in the store and all other reduction variables are index variables of
children loops of OUTER_REDUCTION_FOR
* OUTER_REDUCTION_FOR is a perfect loop nest, i.e. it has only loops
corresponding to the other reduction variables and the store, nested into
each other
What it does:
* Introduce a new buffer with an extra dimension of a size equal to the
span of the loop OUTER_REDUCTION_FOR (the new buffer is returned via
RFAC_BUF_PTR)
* Insert an initialization store for the new buffer in
OUTER_REDUCTION_FOR before its nested loop
* Replace the reduction store to the original buffer with the reduction
store to the temp buffer, removing the index var of OUTER_REDUCTION_FOR
from reduction arguments
* Insert a final reduction store over the extra dimension of the new
buffer to the original buffer
* Returns TRUE if the transformation succeeded and FALSE otherwise
Example:
  Original IR:
    S1: for i          # normal axis
    S2:   X[i] = 0
    S3:   for j        # reduction axis
    S4:     for k      # reduction axis
    S5:       X[i] = ReduceOp(X[i] + Y[i,j,k], reduce_axis={j,k})
  After RFACTOR(S5, S3)
    S1: for i          # normal axis
    S2:   X[i] = 0
    S3:   for j        # reduction axis for X, normal axis for X_rfac
            X_rfac[i,j] = 0
    S4:     for k      # reduction axis
              X_rfac[i,j] = ReduceOp(X_rfac[i,j] + Y[i,j,k], reduce_axis={k})
            X[i] = ReduceOp(X[i] + X_rfac[i,j], reduce_axis={j})
```
Differential Revision: D27694960
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 076fa6a1df2c23f5948302aa6b43e82cb222901c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54997
DepTracker was used to automatically pull in dependent computations from
output ones. While it seems quite convenient, it's led to several
architectural issues, which are fixed in this stack.
DepTracker worked on Tensors, which is a pair of Buf and Stmt. However,
Stmt could become stale and there was no way to reliably update the
corresponding tensor. We're now using Bufs and Stmts directly and moving
away from using Tensors to avoid these problems.
Removing DepTracker allowed us to unify Loads and FunctionCalls, which
essentially were duplicates of each other.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D27446414
Pulled By: ZolotukhinM
fbshipit-source-id: a2a32749d5b28beed92a601da33d126c0a2cf399