Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65499
When the tensors in question are contiguous, there is no need to go through dispatch, use TensorIterator, etc.
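As a rough illustration of the idea (not the code from this PR; the op and function name are placeholders), a contiguous fast path can write through raw data pointers and fall back to the regular dispatched path otherwise:
```
#include <torch/torch.h>
#include <cmath>

void sigmoid_out_fast(const at::Tensor& in, at::Tensor& out) {
  if (in.is_contiguous() && out.is_contiguous() &&
      in.scalar_type() == at::kFloat && out.scalar_type() == at::kFloat &&
      out.sizes() == in.sizes()) {
    const float* src = in.data_ptr<float>();
    float* dst = out.data_ptr<float>();
    const int64_t n = in.numel();
    for (int64_t i = 0; i < n; ++i) {
      dst[i] = 1.0f / (1.0f + std::exp(-src[i]));  // no dispatch, no TensorIterator
    }
    return;
  }
  at::sigmoid_out(out, in);  // generic path
}
```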
ghstack-source-id: 139549027
Test Plan:
Ran ptvsc2_predictor_bench for ctr_mobile_feed local net following https://fb.quip.com/q8hBAFGMeaOU (but without the profile and compare_results options).
Before:
I0922 14:00:32.261942 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 7.18124. Iters per second: 139.252
I0922 14:01:44.865965 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 7.25314. Iters per second: 137.871
I0922 14:02:56.929602 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 7.1986. Iters per second: 138.916
I0922 14:04:05.923025 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.89211. Iters per second: 145.093
I0922 14:05:17.953056 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 7.19577. Iters per second: 138.971
mean: 7.144172, stddev: 0.1283
After:
I0922 13:51:55.233937 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.79709. Iters per second: 147.122
I0922 13:53:03.062682 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.77605. Iters per second: 147.579
I0922 13:54:10.230386 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.70993. Iters per second: 149.033
I0922 13:55:18.403434 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.81044. Iters per second: 146.833
I0922 13:56:26.568646 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.80965. Iters per second: 146.85
mean: 6.800632, stddev: 0.013227
Looks like about a 5.3% improvement.
Reviewed By: hlu1
Differential Revision: D31125492
fbshipit-source-id: 92ab5af242d0a84dcf865323a57b48e8374eb823
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65384
The following pattern appears frequently in `ops.cpp`:
```
if (!n->matches(schema_1) && !n->matches(schema_2) && ... && !n->matches(schema_n)) {
  LogAndDumpSchema(n);
  return nullptr;
}
return [](ProcessedNode* p_node) {
  if (p_node->Output(0).isNone()) {
    if (p_node->Input(i).isSomeType()) {
      // special logic for schema 1
    } else if (p_node->Input(i).isSomeOtherType()) {
      // special logic for schema 2
    } else if (...) {
      // special logic for schema 3
    }
    // and so on
  } else {
    // another complicated type-checking chain
  }
};
```
A much cleaner way to implement operator overloads is like this:
```
if (n->matches(schema_1)) {
  return schema_1_impl;
} else if (n->matches(schema_2)) {
  return schema_2_impl;
}
// and so on
```
This has a few advantages:
* Significantly reduces complexity of the out variant implementations, especially for ops with more than 2 overloads. One implementation corresponds to one schema. This makes the implementation more readable/maintainable.
* Adhering to this convention makes it easier to add a new overload. Just add a new `n->matches(...)` case instead of working the schema into existing complicated logic.
* Ops are marginally faster since we don't have to check types at runtime.
Note: there are a few cases where this actually made the code less concise (`aten::div`), so I left those ops untouched.
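For concreteness, a hedged sketch of a two-overload registration under the new convention (the schema placeholders and overload names are illustrative, not real entries from `ops.cpp`):
```
if (n->matches(scalar_overload_schema)) {
  return [](ProcessedNode* p_node) {
    // logic specialized for the Scalar overload only
  };
}
if (n->matches(tensor_overload_schema)) {
  return [](ProcessedNode* p_node) {
    // logic specialized for the Tensor overload only
  };
}
LogAndDumpSchema(n);
return nullptr;
```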
Thanks to d1jang for pointing this out in another diff.
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D31072328
fbshipit-source-id: c40a4f7e6a79881e94c9ec49e9008ed75cfc8688
Summary:
This PR attempts to port `baddbmm` and `bmm` to structured kernels. Both are in the same PR because much of the code is common to the two ops, including the checks and the implementation.
Issue tracker: https://github.com/pytorch/pytorch/issues/55070
cc: ysiraichi ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64805
Reviewed By: gchanan
Differential Revision: D31134454
Pulled By: ezyang
fbshipit-source-id: 3294619834a8cc6a0407aea660c556d3a42b6261
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65387
Added a customized NNC implementation for the signed log1p kernel and enabled the fusion pass that adds the fused signed log1p op.
Also, added a SR microbenchmark for this kernel which shows the performance improvement.
Without fusion:
```
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------------
BM_signed_log1p/16 1953 ns 1953 ns 358746
BM_signed_log1p/64 2049 ns 2049 ns 342145
BM_signed_log1p/512 3291 ns 3291 ns 214342
BM_signed_log1p/4096 15559 ns 15559 ns 44420
BM_signed_log1p/32768 101936 ns 101935 ns 6843
BM_signed_log1p/65536 194792 ns 194789 ns 3615
```
With NNC fusion:
```
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------------
BM_signed_log1p/16 369 ns 369 ns 1896179
BM_signed_log1p/64 497 ns 497 ns 1406995
BM_signed_log1p/512 1618 ns 1618 ns 430209
BM_signed_log1p/4096 11327 ns 11326 ns 61463
BM_signed_log1p/32768 84099 ns 84086 ns 8325
BM_signed_log1p/65536 166531 ns 166510 ns 4186
```
This shows a clear performance improvement for this kernel with NNC fusion, ranging from roughly 15% at the largest sizes to over 80% at the smallest.
On the inline_cvr local model, there is a small improvement in the profiled time spent on these ops:
without fusion: `0.9%` (computed by adding the % spent on all 4 ops involved)
with NNC fusion: `0.55%`
Test Plan:
`buck test mode/opt-clang //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- SignedLog1p`
Also ran the accuracy test with inline_cvr, as described at https://fb.quip.com/qmdDAJzEmPtf, on the full-size model (285298536_1):
```
get 57220 prediction values
get 57220 prediction values
max_error: 0 total: 0
```
Reviewed By: hlu1
Differential Revision: D30609492
fbshipit-source-id: d2e68df580569a30ee61abb0ef18d2c4c56827bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64209
Add a new fusion pass that transforms the following pattern:
```
graph(%input):
%0 : Tensor = aten::sign(%input)
%1 : Tensor = aten::abs(%input)
%2 : Tensor = aten::log1p(%1)
%res : Tensor = aten::mul(%0, %2)
return (%res)
```
Into a single op:
```
graph(%input):
%res : Tensor = static_runtime::signed_log1p(%input)
return (%res)
```
The intent is to reduce the number of passes over the tensor. However, enabling this pass actually causes a performance regression, probably due to a lack of vectorization in the fused implementation. Because of this issue, this diff **does not** enable this pass.
Followup: navahgar will add an NNC kernel that is faster than the unfused version and enable this pass. We still need this version as a fallback since the NNC kernel will not support all dtypes.
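For reference, the fused op computes `sign(x) * log1p(|x|)`; a minimal ATen-level sketch of that computation (not the actual static runtime out variant) is:
```
#include <torch/torch.h>

at::Tensor signed_log1p_reference(const at::Tensor& x) {
  // Three separate kernels here; the fused op's goal is a single pass over x.
  return at::sign(x) * at::log1p(at::abs(x));
}
```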
Test Plan:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- SignedLog1p`
Test passed with new graph pass disabled and enabled.
Reviewed By: hlu1
Differential Revision: D30559929
fbshipit-source-id: e4e080cb2e6a705cfdde1fc98bee92b723f8132a
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64159
Test Plan:
Confirm out variant is called for both versions:
```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```
Reviewed By: mikeiovine
Differential Revision: D30622819
fbshipit-source-id: a2c8c7f969dae5f507718fb3d513e1fb4f026736
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64157
The UseVariadicCat optimization is not applied to aten::cat if the list input to the op cannot be moved to a position before the op (https://fburl.com/diffusion/l6kweimu). For these cases, we need an out variant for SR.
Test Plan:
Confirm out variant is called:
```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```
Reviewed By: d1jang
Differential Revision: D30598574
fbshipit-source-id: 74cfa8291dc8b5df4aef58adfb1ab2a16f10d90a
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64070
Test Plan:
Confirm out variant is called for both versions:
```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```
Reviewed By: d1jang
Differential Revision: D30595816
fbshipit-source-id: e88d88d4fc698774e83a98efce66b8fa4e281563
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64101
Checking `getOutOfPlaceOperation(n)` is a very expensive operation, especially in multithreaded environments, due to a lock acquisition when the NNC cache is queried. This slows down the memory planner initialization time, and by extension, the latency for the first static runtime inference.
There are two optimizations in this diff:
* Cache the result of `p_node->has_out_variant()` to avoid the call to `getOutOfPlaceOperation`. This speeds up calls to `canReuseInputOutputs`, which in turn speeds up `isOptimizableContainerType`
* Precompute all `isOptimizableContainerType` during static runtime initialization to avoid a pass over all of each node's inputs.
Test Plan: All unit tests pass: `buck test caffe2/benchmarks/static_runtime/...`
Reviewed By: movefast1990
Differential Revision: D30595579
fbshipit-source-id: 70aaa7af9589c739c672788bf662f711731864f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64078
This change converts `aten::layer_norm -> output Tensor` to `static_runtime::layer_norm -> (output Tensor, tmp1 Tensor, tmp2 Tensor)` so that the `tmp1` and `tmp2` Tensors are managed by the static runtime.
Currently the out-variant of `aten::layer_norm` creates two temporary Tensors inside it:
```
at::Tensor mean = create_empty_from({M}, *X);
at::Tensor rstd = create_empty_from({M}, *X);
```
that the static runtime misses an opportunity to manage.
This change puts them into (unused) output Tensors of a new placeholder op `static_runtime::layer_norm` so that the static runtime can manage them, since the static runtime as of now chooses to manage only output tensors.
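A hedged sketch of the mechanism (output indices and helpers are illustrative, not the actual implementation): the out variant writes `mean`/`rstd` into the extra outputs instead of allocating unmanaged temporaries.
```
// Outputs: (0) result, (1) mean, (2) rstd -- all visible to the memory planner.
if (p_node->Output(1).isNone()) {
  p_node->Output(1) = create_empty_from({M}, *X);
}
if (p_node->Output(2).isNone()) {
  p_node->Output(2) = create_empty_from({M}, *X);
}
auto& mean = p_node->Output(1).toTensor();
auto& rstd = p_node->Output(2).toTensor();
```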
Test Plan:
- Enhanced `StaticRuntime.LayerNorm` to ensure that `static_runtime::layer_norm` gets activated.
- Confirmed that the new op gets activated during testing:
```
V0825 12:51:50.017890 2265227 impl.cpp:1396] Switch to out variant for node: %8 : Tensor, %9 : Tensor, %10 : Tensor = static_runtime::layer_norm(%input.1, %normalized_shape.1, %4, %4, %5, %3)
```
Reviewed By: hlu1
Differential Revision: D30486475
fbshipit-source-id: 5121c44ab58c2d8a954aa0bbd9dfeb7468347a2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63999
Use folly::F14FastMap/F14FastSet instead of std::unordered_map/unordered_set in the Static Runtime code base. folly::F14FastMap/F14FastSet implement the same APIs as std::unordered_map/unordered_set but are faster. For details see https://github.com/facebook/folly/blob/master/folly/container/F14.md
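Usage is a drop-in swap; a minimal sketch (the key string is just an example):
```
#include <folly/container/F14Map.h>
#include <string>

int main() {
  // Same call sites as std::unordered_map; only the type name changes.
  folly::F14FastMap<std::string, int> op_counts;
  op_counts["aten::add"] += 1;
  return op_counts.count("aten::add") == 1 ? 0 : 1;
}
```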
Reviewed By: d1jang
Differential Revision: D30566149
fbshipit-source-id: 20a7fa2519e4dde96fb3fc61ef6c92bf6d759383
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63980
The out variant implementation of `aten::clone` causes a crash, which needs further investigation. This change disables it until the problem gets fixed.
Note that `inline_cvr` doesn't use `aten::clone` as of now, so no perf implication: https://www.internalfb.com/phabricator/paste/view/P446858755?lines=121
Test Plan: N/A
Reviewed By: hlu1
Differential Revision: D30544149
fbshipit-source-id: facb334d67473f622b36862fbdb2633358556fdf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63839
Replaced the use of a constant for clamp in the NNC code for Logit
with a variable. This makes it easier to enable caching for Logit.
There is no performance difference with this change, as shown in the micro-benchmarks below.
```
Logit NNC Benchmark      const-clamp (ns)   var-clamp (ns)
logit_nnc_sleef/64                    550              543
logit_nnc_sleef/512                  3514             3517
logit_nnc_sleef/8192                85537            82900
logit_nnc_sleef/32768              347635           337016
logit_nnc_fast/64                     173              167
logit_nnc_fast/512                    829              866
logit_nnc_fast/8192                 13286            13069
logit_nnc_fast/32768                51116            53429
logit_nnc_vml/64                      146              164
logit_nnc_vml/512                     773              783
logit_nnc_vml/8192                  11556            11563
logit_nnc_vml/32768                 44815            46720
```
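For intuition, the kernel computes a clamped logit; a plain C++ analogy of the computation (not the NNC lowering, and the function name is made up) is:
```
#include <algorithm>
#include <cmath>

// eps used to be baked into the generated code as a constant; passing it as a
// parameter means one cached kernel can serve any clamp bound.
float logit_clamped(float x, float eps) {
  x = std::min(std::max(x, eps), 1.0f - eps);
  return std::log(x / (1.0f - x));
}
```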
Test Plan: SR unit tests and the inline_cvr model.
Reviewed By: bertmaher
Differential Revision: D30405466
fbshipit-source-id: adb891fdae5746439931ce5f43165291fec08f52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63838
Refactored NNC operator definitions code into separate files.
Made `TEWrapper` a class with a fixed set of methods and added separate definitions for them based on `TORCH_ENABLE_LLVM` to keep the same functionality as before.
Test Plan: Built and ran Static Runtime tests.
Reviewed By: hlu1
Differential Revision: D30405467
fbshipit-source-id: 606ef852bb820d5e23a0f8af1bf5dc122e90bceb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63579
Provide a static runtime out variant implementation for the new op introduced in D30426232 (1385f9fb12).
Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- IndividualOps_VarStack`
Reviewed By: navahgar
Differential Revision: D30410525
fbshipit-source-id: bc59a3d8ad23e3d94561ec2dca9cc20687dbadf8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63578
Added a new op `prim::VarStack` and a pass that transforms instances of `aten::stack(list, dim)` into `prim::VarStack(list[0], ..., list[n], dim)`. Also provided a JIT interpreter implementation.
Most of the implementation/tests are the same as `prim::VarConcat`.
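A rough ATen-level sketch of what the variadic form evaluates to (not the actual interpreter registration; the helper name is made up):
```
#include <torch/torch.h>
#include <vector>

// prim::VarStack(t0, ..., tn, dim) behaves like stacking the unpacked inputs.
at::Tensor var_stack(const std::vector<at::Tensor>& inputs, int64_t dim) {
  return at::stack(inputs, dim);
}
```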
Test Plan: `buck test caffe2/test/cpp/jit:jit -- TestStackOpt`
Reviewed By: navahgar
Differential Revision: D30426232
fbshipit-source-id: 9829a7db6e0a5038c9b7528c43c25b0c221aa2ce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587
Now that there are no classes using KernelArena for memory management, we can remove it.
Differential Revision: D30429115
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586
This is another commit in the transition away from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr> and we don't need
to dynamically allocate it at all - it's cheap to pass it by value, and
that's what we're switching to in this commit.
After this change nothing uses KernelScope/KernelArena and they can be
safely removed.
Differential Revision: D30429114
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195
This helps us to later switch from using KernelArena with raw pointers
to shared pointers without having to change all our source files at
once.
The changes are mechanical and should not affect any functionality.
With this PR, we're changing the following:
* `Add*` --> `AddPtr`
* `new Add(...)` --> `alloc<Add>(...)`
* `dynamic_cast<Add*>` --> `to<Add>`
* `static_cast<Add*>` --> `static_to<Add>`
Due to some complications with args forwarding, some places became more
verbose, e.g.:
* `new Block({})` --> `new Block(std::vector<ExprPtr>())`
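A hedged before/after sketch of the mechanical rewrite (the expression names are just examples):
```
// Before: raw pointers owned by the KernelArena
Add* sum = new Add(lhs, rhs);
if (Add* a = dynamic_cast<Add*>(expr)) {
  // ...
}

// After: *Ptr aliases plus the alloc<>/to<> helpers
AddPtr sum = alloc<Add>(lhs, rhs);
if (AddPtr a = to<Add>(expr)) {
  // ...
}
```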
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30292779
Pulled By: ZolotukhinM
fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62921
Added a cache for NNC generated code across different calls to the same ops.
Before this diff:
```
ProcessedNode time 13402.9 ms
Static Module initialization took 30964.8 ms
```
After this diff:
```
ProcessedNode time 85.4195 ms
Static Module initialization took 4348.42 ms
```
There is one global cache for all the ops. It is guarded with a reader-writer lock, which is necessary because multiple threads could be loading different models in parallel. Note that this locking does not guarantee that exactly one piece of code is generated per op: more than one thread may generate code for the same op simultaneously, and all of them will update the cache in some order. But that should be a small number, bounded by the number of threads. There is no correctness issue either, since the generated code is always the same; the code generated by the last thread is retained in the cache and reused later while running the model.
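A minimal sketch of this scheme (types and names are illustrative, not the actual NNC cache): readers take a shared lock, a miss generates code outside the lock, and the writer then updates the map, so two threads may occasionally generate the same kernel but the cache stays consistent.
```
#include <memory>
#include <shared_mutex>
#include <string>
#include <unordered_map>

struct CompiledKernel { /* stands in for the generated NNC code */ };

class KernelCache {
 public:
  std::shared_ptr<CompiledKernel> getOrGenerate(const std::string& key) {
    {
      std::shared_lock<std::shared_mutex> read_lock(mutex_);
      auto it = cache_.find(key);
      if (it != cache_.end()) {
        return it->second;  // fast path: no codegen, shared lock only
      }
    }
    // Miss: generate outside the lock; several threads may do this at once.
    auto kernel = std::make_shared<CompiledKernel>();
    std::unique_lock<std::shared_mutex> write_lock(mutex_);
    cache_[key] = kernel;  // last writer wins; the generated code is identical
    return kernel;
  }

 private:
  std::shared_mutex mutex_;
  std::unordered_map<std::string, std::shared_ptr<CompiledKernel>> cache_;
};
```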
Test Plan: Tested inline_cvr model
Reviewed By: hlu1
Differential Revision: D30104017
fbshipit-source-id: 32e9af43d7e724ed54b661dfe58a73a14e443ff7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61983
Trial #2. The previous PR (https://github.com/pytorch/pytorch/pull/61498) was reverted because it caused a failure in `pytorch_linux_backward_compatibility_check_test`. That is now fixed by adding an entry to the exception list in `check_backward_compatibility.py`.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D29828830
Pulled By: navahgar
fbshipit-source-id: 947a7b1622ff6e3e575c051b8f34a789e105bcee
Summary:
Re-land of D29935444
We previously had lots of ops with implementations like this:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = create_empty_like(input_0);
}
...
auto& out = p_node->Output(0);
some_func_out(inputs, out);
```
This would make the output have the correct shape. But it would
also take the dtype of `input_0`, which is not always correct.
This change transforms these blocks to:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = some_func(inputs);
} else {
  ...
  auto& out = p_node->Output(0);
  some_func_out(inputs, out);
}
```
This gives the output the correct shape and dtype.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62267
Reviewed By: ejguan
Differential Revision: D29937253
Pulled By: malfet
fbshipit-source-id: d91ca5d5703490d7d349a1de2ad3bb09b0c33967
Summary:
We previously had lots of ops with implementations like this:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = create_empty_like(input_0);
}
...
auto& out = p_node->Output(0);
some_func_out(inputs, out);
```
This would make the output have the correct shape. But it would
also take the dtype of `input_0`, which is not always correct.
This change transforms these blocks to:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = some_func(inputs);
} else {
  ...
  auto& out = p_node->Output(0);
  some_func_out(inputs, out);
}
```
This gives the output the correct shape and dtype.
Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D29887367
fbshipit-source-id: cef04bfa52ec082ad3a9a32aa27c44e275c6b24c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62067
The wrapper for aten::cat is no longer needed after the variadic cat change in D29565344 (ae58a4c45d).
Also added a simple test for dynamic shapes, i.e., the input tensors in args2 are larger than those in args1.
Reviewed By: navahgar, mikeiovine
Differential Revision: D29864600
fbshipit-source-id: 44a712c2e776815c09e0bf5631412149b81274b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61361
This PR ports the `clamp` kernel to the structured format. In addition, it introduces `OptionalScalarRef` as a replacement for `c10::optional<Scalar>&`. The latter, although it is a reference type, can still involve copying the contained `Scalar` (e.g. if the actual parameter is a `Scalar`, or if a `c10::optional<Scalar>` is constructed just to call a kernel). `OptionalScalarRef` contains only a `const Scalar&` and stores the flag indicating whether the instance contains a value inside the `Scalar` itself, using a new tag.
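A toy model of the idea (this is not the actual ATen type; the names and the `None` tag are stand-ins): the wrapper holds only a reference, and emptiness is encoded as a tag inside the referenced value rather than in a separate `c10::optional`, so no `Scalar` is copied at the call site.
```
#include <cassert>

struct ToyScalar {
  enum class Tag { None, Double } tag = Tag::None;
  double v = 0.0;
};

class ToyOptionalScalarRef {
 public:
  explicit ToyOptionalScalarRef(const ToyScalar& s) : ref_(s) {}
  bool has_value() const { return ref_.tag != ToyScalar::Tag::None; }
  double value() const { assert(has_value()); return ref_.v; }
 private:
  const ToyScalar& ref_;  // only a reference; nothing is copied
};
```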
For more information, see #55070.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D29821533
Pulled By: SplitInfinity
fbshipit-source-id: 88d55df5a4b2c14b68a57e4905d90eea1b088d99
Summary:
The GoogleTest `TEST` macro, as well as `DEFINE_DISPATCH`, is non-compliant with the `cppcoreguidelines-avoid-non-const-global-variables` check, so the corresponding `NOLINTNEXTLINE` suppressions are removed.
All changes but the ones to `.clang-tidy` were generated using the following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h" | xargs grep cppcoreguidelines-avoid-non-const-global-variables | cut -f1 -d: | sort | uniq`; do
  sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i
done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61595
Add out variant wrapper for `aten::linear` in the static runtime
Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D29684236
fbshipit-source-id: 94df6d7267b3f269b2cadf065f207648777147df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61505
The handling of `self` in the static runtime was previously incorrect. This diff fixes that issue, since `self` is essential to prim::GetAttr/prim::SetAttr; after all, most of the time we're getting and setting attributes from `self`, the TorchScript module.
Reviewed By: ajyu
Differential Revision: D29350173
fbshipit-source-id: 6e62add4cda517ef8cd6c315d4cb0595e7d531fb
Summary:
This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master.
I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver):
```bash
python3 setup.py develop
# Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options
python3 tools/clang_tidy.py \
-j \
-s \
-k \
-v \
--paths torch/csrc/ \
-g"-torch/csrc/jit/passes/onnx/helper.cpp" \
-g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
-g"-torch/csrc/jit/serialization/onnx.cpp" \
-g"-torch/csrc/jit/serialization/export.cpp" \
-g"-torch/csrc/jit/serialization/import.cpp" \
-g"-torch/csrc/jit/serialization/import_legacy.cpp" \
-g"-torch/csrc/onnx/init.cpp" \
-g"-torch/csrc/cuda/nccl.*" \
-g"-torch/csrc/cuda/python_nccl.cpp" \
-g"-torch/csrc/autograd/FunctionsManual.cpp" \
-g"-torch/csrc/generic/*.cpp" \
-g"-torch/csrc/jit/codegen/cuda/runtime/*" \
-g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
-g"-torch/csrc/deploy/interpreter/interpreter.h" \
-g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
-g"-torch/csrc/deploy/interpreter/test_main.cpp"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649
Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors.
Reviewed By: walterddr, janeyx99
Differential Revision: D29504258
Pulled By: 1ntEgr8
fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e