Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62058
This is the second diff in this stack. This diff includes the changes to DPER3; the first diff includes the changes to Caffe2.
We want to decay learning parameters properly. Previously this was not done when a parameter was absent from a minibatch; we fix this by keeping track of missed minibatches and making the decay catch up accordingly.
The exponential moving averages (EMAs) of the first and second moments used in Adam are updated only for parameters seen in a minibatch. In fact, for parameters absent from a minibatch, a zero gradient should be folded into the EMAs, which amounts to decaying the EMAs by beta1 and beta2, respectively.
To avoid the computational overhead of touching every parameter for every minibatch, we:
* keep track of the last time a parameter is seen
* instead of decaying the EMAs by multiplying by beta1 and beta2, we multiply by beta1^k and beta2^k, where k is the number of minibatches since the parameter was last seen.
We hope this will significantly improve the inconsistent learning parameter issue we have seen with Adam.
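As a minimal NumPy sketch of the catch-up rule (standalone and illustrative; names like `last_seen` are hypothetical, and bias correction is omitted), note that k-1 missed zero-gradient updates plus the current update collapse into a single multiplication by beta^k:
```
import numpy as np

def sparse_adam_step(w, m, v, last_seen, ids, grads, t,
                     lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One sparse Adam step at global iteration t over the rows in `ids`."""
    for i, g in zip(ids, grads):
        k = t - last_seen[i]  # minibatches since row i was last seen (k >= 1)
        # Catch-up decay: beta**k covers the missed steps and the current one.
        m[i] = beta1 ** k * m[i] + (1 - beta1) * g
        v[i] = beta2 ** k * v[i] + (1 - beta2) * g * g
        w[i] -= lr * m[i] / (np.sqrt(v[i]) + eps)
        last_seen[i] = t
```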
Differential Revision: D29638897
fbshipit-source-id: 18d8e227d72c2e23010ca81e0f6eeb78872c8d3c
Summary:
The GoogleTest `TEST` macro is non-compliant with the `cppcoreguidelines-avoid-non-const-global-variables` check, as is `DEFINE_DISPATCH`, so the corresponding `NOLINTNEXTLINE` suppressions are dropped.
All changes but the ones to `.clang-tidy` are generated using the following script:
```
for i in $(find . -type f \( -iname "*.c*" -or -iname "*.h" \) \
           | xargs grep cppcoreguidelines-avoid-non-const-global-variables \
           | cut -f1 -d: | sort | uniq); do
  sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" "$i"
done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary: Add a test case that exercises `db_options` with the Save operator; this coverage was previously missing.
Test Plan: buck test
Differential Revision: D29642719
fbshipit-source-id: 72b7374d40430398abac26dfe91538550525384d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61548
We want to decay learning parameters properly. Previously this was not done when a parameter was absent from a minibatch; we fix this by keeping track of missed minibatches and making the decay catch up accordingly.
The exponential moving averages (EMAs) of the first and second moments used in Adam are updated only for parameters seen in a minibatch. In fact, for parameters absent from a minibatch, a zero gradient should be folded into the EMAs, which amounts to decaying the EMAs by beta1 and beta2, respectively.
To avoid the computational overhead of touching every parameter for every minibatch, we:
* keep track of the last time a parameter is seen
* instead of decaying the EMAs by multiplying by beta1 and beta2, we multiply by beta1^k and beta2^k, where k is the number of minibatches since the parameter was last seen
* calculate the amount of momentum that would have been discharged over the missed minibatches and update the weight accordingly (see the sketch below).
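For the last bullet, one plausible closed form (a hypothetical sketch; it assumes the second-moment denominator is held at its last-seen value) collapses the discharged momentum into a geometric series:
```
import numpy as np

def catch_up_weight(w_i, m_i, v_i, k, lr, beta1=0.9, eps=1e-8):
    # Momentum after j missed zero-gradient steps is m_i * beta1**j, so the
    # total discharge over steps j = 1 .. k-1 sums to a geometric series.
    discharge = m_i * beta1 * (1.0 - beta1 ** (k - 1)) / (1.0 - beta1)
    # Apply the missed weight updates in one shot.
    return w_i - lr * discharge / (np.sqrt(v_i) + eps)
```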
Differential Revision: D29654246
fbshipit-source-id: 7a6cd7966eb1f31116d99dfce79a78b2d3ee9e3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61551
We aim to enable a rate limiter in C2 load with a fixed bandwidth limit.
This diff updates LoadOp to pass down the Manifold db options (sketched below).
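For illustration only (the `db_type`, path, and option payload are backend-specific assumptions, not taken from this diff), a load passing options down might look like:
```
from caffe2.python import core

# Hypothetical sketch: `db_options` is forwarded to the underlying DB so the
# Manifold client can apply, e.g., a fixed bandwidth limit while loading.
load_op = core.CreateOperator(
    "Load", [], ["param"],
    absolute_path=1,
    db="manifold://some_bucket/tree/model.minidb",  # illustrative path
    db_type="minidb",
    db_options=b"...",  # backend-specific options blob (payload assumed here)
)
```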
Test Plan:
```
buck test mode/opt caffe2/caffe2/python/operator_test:load_save_test
```
Differential Revision: D29639102
fbshipit-source-id: cf69549adadf4c7f12a8a2b7f3ca39092cab4b99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61488
We want to decay learning parameters properly. Previously this was not done when a parameter was absent from a minibatch; we fix this by keeping track of missed minibatches and making the decay catch up accordingly.
The exponential moving averages (EMAs) of the first and second moments used in Adam are updated only for parameters seen in a minibatch. In fact, for parameters absent from a minibatch, a zero gradient should be folded into the EMAs, which amounts to decaying the EMAs by beta1 and beta2, respectively.
To avoid the computational overhead of touching every parameter for every minibatch, we:
* keep track of the last time a parameter is seen
* instead of decaying the EMAs by multiplying by beta1 and beta2, we multiply by beta1^k and beta2^k, where k is the number of minibatches since the parameter was last seen.
Differential Revision: D27978269
fbshipit-source-id: e47524101ddfcb281c46c505b9b7a8f0835bc64a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60402
Add float64 data type support to ScatterWeightedSum for cases where float32 precision (about 7 significant decimal digits, i.e., values around 10^7) is not sufficient.
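A minimal sketch of the case this enables, assuming the standard five-input ScatterWeightedSum signature (X0, Weight0, INDICES, X1, Weight1) with an in-place output:
```
import numpy as np
from caffe2.python import core, workspace

# At float32 precision, 1e8 + 1.0 rounds back to 1e8 and the scattered
# update is silently lost; float64 preserves it.
workspace.FeedBlob("X0", np.array([1e8, 1e8, 1e8], dtype=np.float64))
workspace.FeedBlob("W0", np.array([1.0], dtype=np.float64))
workspace.FeedBlob("indices", np.array([0, 2], dtype=np.int32))
workspace.FeedBlob("X1", np.array([1.0, 1.0], dtype=np.float64))
workspace.FeedBlob("W1", np.array([1.0], dtype=np.float64))
op = core.CreateOperator(
    "ScatterWeightedSum",
    ["X0", "W0", "indices", "X1", "W1"], ["X0"],
)
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("X0"))  # [1.00000001e8, 1e8, 1.00000001e8]
```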
Test Plan: buck test caffe2/caffe2/python/operator_test:sparse_ops_test -- testScatterWeightedSum
Reviewed By: jianyuh
Differential Revision: D29190324
fbshipit-source-id: 871a60744694e901a2c7685a67350860745d6729
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59775
This operator is similar to `GetAllBlobNames` but also returns the estimated
size required to serialize each blob.
One goal of this operator is to allow checkpoint saving logic to estimate the
amount of space/bandwidth required to save a checkpoint when first starting
training, without actually serializing any blobs yet. Currently the
checkpointing logic uses `GetAllBlobNames` to determine the blobs to
checkpoint. It can instead be updated to use `EstimateAllBlobSizes` to also
get an estimate for how much space will be required for the checkpoint.
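A sketch of the intended use (assuming the operator takes no inputs and returns parallel name/size tensors, mirroring `GetAllBlobNames`; the output names are illustrative):
```
from caffe2.python import core, workspace

op = core.CreateOperator("EstimateAllBlobSizes", [], ["names", "sizes"])
workspace.RunOperatorOnce(op)
names = workspace.FetchBlob("names")   # blob names in the workspace
sizes = workspace.FetchBlob("sizes")   # estimated serialized size per blob
total = int(sizes.sum())
print(f"estimated checkpoint size: {total} bytes for {len(names)} blobs")
```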
ghstack-source-id: 132275153
Test Plan: Included a new unit test.
Reviewed By: mraway
Differential Revision: D29020227
fbshipit-source-id: 811e5d86c4b59183e84e6424c48c97739be09043
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60382
Instead of setting weight_decay `w` uniformly for all ids, for each row i of the sparse embedding table the actual weight decay `w_i` becomes `w * freq_i`, where `freq_i = halflife / counter_i ∈ [log(2), halflife]`. The counter comes from `rowwise_counter`, with the update `counter_i ← 1 + exp(-iter_delta * rho) * counter_i`.
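In NumPy terms, a minimal sketch of the per-row decay (illustrative names; the clipping bounds follow the stated range for `freq_i`):
```
import numpy as np

def row_wise_weight_decay(w, halflife, counter):
    # Rows with a larger (exponentially decayed) appearance count get a
    # smaller effective decay; freq is clipped to [log(2), halflife].
    freq = np.clip(halflife / counter, np.log(2), halflife)
    return w * freq
```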
Test Plan:
buck test //caffe2/caffe2/python/operator_test:adagrad_test -- test_row_wise_sparse_adagrad
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_weight_decay
Reviewed By: 0x10cxR1
Differential Revision: D25581030
fbshipit-source-id: 54b3831b20516c76c559b13d8deb809e2ee3b446
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60106
In Caffe2, some elementwise in-place compatible ops lack coverage for the in-place case. We add tests for a subset of them here and thereby increase coverage.
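In Caffe2's Python operator tests, exercising the in-place path just means reusing the input blob name as the output; a minimal sketch using the Sigmoid operator:
```
import numpy as np
from caffe2.python import core, workspace

x = np.random.randn(8).astype(np.float32)
workspace.FeedBlob("X", x)
# In-place: the output blob name matches the input blob name.
op = core.CreateOperator("Sigmoid", ["X"], ["X"])
workspace.RunOperatorOnce(op)
np.testing.assert_allclose(
    workspace.FetchBlob("X"), 1.0 / (1.0 + np.exp(-x)), rtol=1e-5)
```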
Test Plan:
```
buck test //caffe2/caffe2/python/operator_test:elementwise_ops_test
```
Let CI run.
Reviewed By: clrfb
Differential Revision: D29143189
fbshipit-source-id: 83138ad8eff8fe95c40aece53714da3577396a23
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57080
The ONNX optimizer was removed in ONNX 1.9.
This PR removes the ONNX optimizer from the C++ code path and uses a `try-except` block in Python to keep it compatible with both ONNX 1.8 and 1.9.
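The compatibility pattern is the usual guarded import; a sketch of the shape it takes (the flag name is illustrative):
```
try:
    # onnx.optimizer exists in ONNX 1.8 but was removed in ONNX 1.9
    from onnx import optimizer as onnx_optimizer
    HAS_ONNX_OPTIMIZER = True
except ImportError:
    HAS_ONNX_OPTIMIZER = False
```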
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D28467330
Pulled By: malfet
fbshipit-source-id: 5e4669dd0537648898e593f9e253da18d6dc7568
Co-authored-by: neginraoof <neginmr@utexas.edu>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58062
Templatize the implementation so that BatchSparseToDense supports int32 lengths/indices.
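A minimal sketch of the newly supported case (the `dense_last_dim` argument name is assumed here):
```
import numpy as np
from caffe2.python import core, workspace

# int32 lengths/indices; lengths [2, 1] means row 0 has two (index, value)
# pairs and row 1 has one.
workspace.FeedBlob("lengths", np.array([2, 1], dtype=np.int32))
workspace.FeedBlob("indices", np.array([0, 3, 2], dtype=np.int32))
workspace.FeedBlob("values", np.array([1.0, 2.0, 3.0], dtype=np.float32))
op = core.CreateOperator(
    "BatchSparseToDense", ["lengths", "indices", "values"], ["dense"],
    dense_last_dim=4,
)
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("dense"))
# [[1. 0. 0. 2.]
#  [0. 0. 3. 0.]]
```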
Test Plan:
```
buck test //caffe2/caffe2/python/operator_test:batch_sparse_to_dense_op_test
```
Reviewed By: khabinov
Differential Revision: D28271423
fbshipit-source-id: 41b88b7a3663616b533aaf4731ff35cdf6ec4c85
Summary: Relax test deadlines for c2 tests. We run on loaded machines, and timings are unreliable.
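These deadlines come from hypothesis; relaxing them per test is a one-line settings change, sketched here:
```
from hypothesis import given, settings
import hypothesis.strategies as st

@given(x=st.floats(allow_nan=False))
@settings(deadline=None)  # no per-example deadline; loaded CI hosts are slow
def test_some_c2_op(x):
    ...
```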
Test Plan: Fixes existing tests
Reviewed By: mruberry
Differential Revision: D28690006
fbshipit-source-id: 457707e81a1ec92548c1f23ea7a0022fa0a3bfda
Summary: Tests are frequently failing with "exceeded the deadline of 1000.00ms". This is expected on loaded machines, so remove the deadline.
Test Plan: N/A: Fix breakages
Reviewed By: robieta
Differential Revision: D28581051
fbshipit-source-id: 4825ada9af151fa5d57c45c549138c15ba613705
Summary: When run on very heavily loaded machines, some of these tests time out. It's not an issue with the tests but with the environment. I've removed the timeout so we at least keep unit test coverage.
Test Plan: N/A: Fix breakages
Reviewed By: ngimel
Differential Revision: D28492334
fbshipit-source-id: aed3ee371763161aab2d356f5623c7df053fda6f
Summary:
This is the only line (not in `third_party`) matching the regex `^#!.*python2`, and [it is not the first line of its file](https://github.com/koalaman/shellcheck/wiki/SC1128), so it has no effect. As a followup to https://github.com/pytorch/pytorch/issues/58275, this PR removes that shebang to reduce confusion, so now all Python shebangs in this repo are `python3`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58409
Reviewed By: walterddr
Differential Revision: D28478469
Pulled By: samestep
fbshipit-source-id: c17684c8651e45d3fc383cbbc04a31192d10f52f
Summary:
Some machines don't have a versionless `python` on their PATH, which breaks these existing shebangs.
I'm assuming that all the existing versionless `python` shebangs are meant to be `python3` and not `python2`; please let me know if my assumption was incorrect for any of these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58275
Test Plan: CI.
Reviewed By: zhouzhuojie
Differential Revision: D28428143
Pulled By: samestep
fbshipit-source-id: 6562be3d12924db72a92a0207b060ef740f61ebf
Summary: Removed the deadline restriction since the first run can take longer than the deadline, while subsequent runs are shorter.
Reviewed By: ngimel
Differential Revision: D28260077
fbshipit-source-id: 8ed2f5c16bc184bf4fae0a59b662fa1da2d4dd0a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57296
Many trainers seem to disable print(), so we cannot see the thread dumps from CompleteInTimeOrDie(). Emit them via log.info() as well.
Test Plan: sandcastle
Reviewed By: aalmah
Differential Revision: D28098738
fbshipit-source-id: dfdca8801bacf5c7bccecc2387cb7ef41dadfa46
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56717
The signal_handler was under the caffe2 namespace but was being used
by PyTorch as well.
I've fixed this by moving it to the c10 namespace, where now both C2 and PyTorch
can use it.
The signal_handler interface in caffe2/utils/signal_handler.h is kept the same
for backward compatibility for C2, but most of the common code is moved to c10.
ghstack-source-id: 127446929
Test Plan: waitforbuildbot
Reviewed By: ezyang
Differential Revision: D27946738
fbshipit-source-id: d6228d1a0108f4c807d405e7a0bb799c5375388f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56813
When the arg `pass_inputs_as_tensor_list` is True, the input tensors are wrapped into a TensorList and passed in as a single param.
Test Plan: buck test //caffe2/caffe2/python:workspace_test -- TestScriptModule
Reviewed By: dzhulgakov
Differential Revision: D27972928
fbshipit-source-id: 5a199649445b0306f3134086c85bd55da45e1a0b
Summary: `networkx 2.4+` renamed the `node` attribute of graph objects to `nodes`. This caused failures in `caffe2`'s `topological_sort_traversal_longest_path` function, which uses the networkx library for topological sort.
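For reference, a minimal repro of the rename:
```
import networkx as nx

G = nx.DiGraph()
G.add_edge("a", "b")
# networkx < 2.4:  G.node["a"]   (removed in 2.4)
# networkx >= 2.4: G.nodes["a"]
print(G.nodes["a"])  # {} -- the per-node attribute dict
```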
Differential Revision: D27718857
fbshipit-source-id: 812fbb613946565d089cc84a20f3cdf7df046e19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55003
Using the `caffe2::setPrintStackTracesOnFatalSignal` utility in
distributed tests to set a signal handler that dumps the state of all threads
for all processes when it receives a FATAL signal. This would help in debugging
tests further.
I had to revert all the Python faulthandler code since only one signal handler
function is supported; running the Python faulthandler together with
`setPrintStackTracesOnFatalSignal` doesn't work.
Sample output:
```
SIGSEGV(11), PID: 3492872, Thread 3492872:
[0] ???(0x7fa7b2d1d61b) in libcaffe2_caffe2_caffe2_cpu.so
[1] ???(0x7fa7b2d1d3fb) in libcaffe2_caffe2_caffe2_cpu.so
[2] ???(0x7fa7b2d1d33d) in libcaffe2_caffe2_caffe2_cpu.so
[3] ???(0x7fa7b2d1d167) in libcaffe2_caffe2_caffe2_cpu.so
[4] ???(0x7fa7ce683150) in libpthread.so.0
[5] ???(0x7fa7be2b233c) in libcaffe2__C_impl_cuda.so
[6] ???(0x7fa7be2ce80c) in libcaffe2__C_impl_cuda.so
[7] ???(0x7fa7be2a0512) in libcaffe2__C_impl_cuda.so
[8] torch::distributed::rpc::TensorPipeAgent::send(torch::distributed::rpc::WorkerInfo const&, torch::distributed::rpc::Message&&, float, std::unordered_map<signed char, signed char, std::hash<signed char>, std::equal_to<signed char>, std::allocator<std::pair<signed char const, signed char> > > const&)+0x24f(0x7fa7be29f71f) in libcaffe2__C_impl_cuda.so
[9] torch::distributed::autograd::sendMessageWithAutograd(torch::distributed::rpc::RpcAgent&, torch::distributed::rpc::WorkerInfo const&, torch::distributed::rpc::Message&&, bool, float, bool)+0x393(0x7fa7b602b203) in libcaffe2_libtorch.so
[10] torch::distributed::rpc::pyRpcPythonUdf(torch::distributed::rpc::WorkerInfo const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<at::Tensor, std::allocator<at::Tensor> >&, float, bool)+0x201(0x7fa7bd844971) in libcaffe2__C_impl_cuda.so
```
ghstack-source-id: 125630551
Test Plan: waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D27419714
fbshipit-source-id: 8aca9a14ef688004053d8798124d9c3a3fbe3489
Summary:
*Context:* https://github.com/pytorch/pytorch/issues/53406 added a lint for trailing whitespace at the ends of lines. However, in order to pass FB-internal lints, that PR also had to normalize the trailing newlines in four of the files it touched. This PR adds an OSS lint to normalize trailing newlines.
The changes to the following files (made in 54847d0adb9be71be4979cead3d9d4c02160e4cd) are the only manually-written parts of this PR:
- `.github/workflows/lint.yml`
- `mypy-strict.ini`
- `tools/README.md`
- `tools/test/test_trailing_newlines.py`
- `tools/trailing_newlines.py`
I would have liked to make this just a shell one-liner like the other three similar lints, but nothing I could find quite fit the bill. Specifically, all the answers I tried from the following Stack Overflow questions were far too slow (at least a minute and a half to run on this entire repository):
- [How to detect file ends in newline?](https://stackoverflow.com/q/38746)
- [How do I find files that do not end with a newline/linefeed?](https://stackoverflow.com/q/4631068)
- [How to list all files in the Git index without newline at end of file](https://stackoverflow.com/q/27624800)
- [Linux - check if there is an empty line at the end of a file [duplicate]](https://stackoverflow.com/q/34943632)
- [git ensure newline at end of each file](https://stackoverflow.com/q/57770972)
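Instead, the check can be done quickly in pure Python by inspecting each file's final bytes, with no subprocess per file; a minimal sketch of one way to do it (assuming "normalized" means a file is empty or ends in exactly one newline; this is a reconstruction, not the actual `tools/trailing_newlines.py`):
```
def has_correct_trailing_newline(path: str) -> bool:
    # Correct means: empty file, or content ending in exactly one '\n'.
    with open(path, "rb") as f:
        data = f.read()
    return not data or (data.endswith(b"\n") and not data.endswith(b"\n\n"))
```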
To avoid giving false positives during the few days after this PR is merged, we should probably only merge it after https://github.com/pytorch/pytorch/issues/54967.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54737
Test Plan:
Running the shell script from the "Ensure correct trailing newlines" step in the `quick-checks` job of `.github/workflows/lint.yml` should print no output and exit in a fraction of a second with a status of 0. That was not the case prior to this PR, as shown by this failing GHA workflow run on an earlier draft of this PR:
- https://github.com/pytorch/pytorch/runs/2197446987?check_suite_focus=true
In contrast, this run (after correcting the trailing newlines in this PR) succeeded:
- https://github.com/pytorch/pytorch/pull/54737/checks?check_run_id=2197553241
To unit-test `tools/trailing_newlines.py` itself (this is run as part of our "Test tools" GitHub Actions workflow):
```
python tools/test/test_trailing_newlines.py
```
Reviewed By: malfet
Differential Revision: D27409736
Pulled By: samestep
fbshipit-source-id: 46f565227046b39f68349bbd5633105b2d2e9b19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54042
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53881
1. Fix the position_weighted optimizer: the position-weighted layer uses the default optimizer but its gradient is actually a gradient slice, which causes problems if we do not handle it properly in the new optimizer. The solution is to use SparseAdagrad when the gradient is a gradient slice.
2. Optimizer implementation of v1 and v2: using first momentum with/without bias_correction.
3. Also implement decoupled weight decay in the new optimizer (rough sketch below).
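A rough, hypothetical sketch of points 2 and 3 (names illustrative; the actual DecayAdagrad kernel may differ):
```
import numpy as np

def decay_adagrad_step(w, m, v, grad, t, lr=0.01, beta1=0.9, eps=1e-8,
                       weight_decay=0.0, bias_correction=True):
    m = beta1 * m + (1.0 - beta1) * grad        # first momentum
    v = v + grad * grad                         # Adagrad-style accumulator
    m_hat = m / (1.0 - beta1 ** t) if bias_correction else m  # v2 vs v1
    # Decoupled weight decay: applied to the weight, not the gradient.
    w = w - lr * (m_hat / (np.sqrt(v) + eps) + weight_decay * w)
    return w, m, v
```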
Test Plan:
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_2 -- test_mlp_optimization
buck test //caffe2/caffe2/python:optimizer_test -- TestDecayAdagrad
buck test //caffe2/caffe2/python/operator_test:decay_adagrad_test
ctr_mbl_feed workflow: f255731660
oc workflow: f255739503
Reviewed By: 0x10cxR1
Differential Revision: D26839668
fbshipit-source-id: 2b6881c1a88540ef5766be40f5e80001257e2199
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53735
Add an option to BlobSerializationOptions to request that float data be
serialized as bfloat16. This reduces the serialized data size at the expense
of some loss in precision.
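Assuming the proto fields introduced in this stack keep these names (`blob_name_regex`, `float_format`; both are assumptions as written here), requesting bfloat16 for matching blobs might look like:
```
from caffe2.proto import caffe2_pb2

opts = caffe2_pb2.SerializationOptions()
blob_opts = opts.options.add()
blob_opts.blob_name_regex = "embedding_.*"   # which blobs to affect
blob_opts.float_format = (
    caffe2_pb2.BlobSerializationOptions.FLOAT_BFLOAT16)
```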
ghstack-source-id: 124317910
Test Plan: Included a new unit test.
Reviewed By: mraway
Differential Revision: D26658205
fbshipit-source-id: 74521ed161059066355a3f208488ed01a344dbb5
Summary: Add the ability to reset the optimizer counter.
Test Plan: will wait for integration tests to run on diff.
Differential Revision: D27248286
fbshipit-source-id: a608df1bd61b64eb317c9ffd9cfdd804c5288f6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54274
Some of the Python tests need to be aware of whether or not FBGEMM is
available, so expose this setting in the pybind extension.
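Assuming the flag is surfaced as `workspace.has_fbgemm` (the exact name is this diff's choice), tests can then skip cleanly when FBGEMM is unavailable:
```
import unittest
from caffe2.python import workspace

@unittest.skipIf(not getattr(workspace, "has_fbgemm", False),
                 "FBGEMM is not available")
class FbgemmDependentTest(unittest.TestCase):
    def test_something(self):
        pass
```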
ghstack-source-id: 124317732
Test Plan: Will use this variable in the tests on D26658205.
Reviewed By: mraway
Differential Revision: D27171780
fbshipit-source-id: 4c94144a959bf8bf0e1553b6e029e94a91794e29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53402
Add an `options` field to the `Save` operator which accepts options for how to
serialize different blobs. At the moment this simply allows controlling the
existing `chunk_size` behavior, but in the future we can add other options,
such as the ability to control compression settings or other serialization
formats.
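A sketch of how a caller might pass these options to Save (proto and argument names as assumed here; values illustrative):
```
from caffe2.proto import caffe2_pb2
from caffe2.python import core

opts = caffe2_pb2.SerializationOptions()
blob_opts = opts.options.add()
blob_opts.blob_name_regex = "large_param_.*"
blob_opts.chunk_size = 4096          # elements per serialized chunk

save_op = core.CreateOperator(
    "Save", ["large_param_0"], [],
    absolute_path=1, db="/tmp/ckpt.minidb", db_type="minidb",
    options=opts.SerializeToString(),
)
```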
ghstack-source-id: 123567034
Test Plan:
Added a new test to `load_save_test.py` that passes in options and verifies
that blobs were serialized with the expected number of chunks.
buck test caffe2/caffe2:caffe2_test_cpu \
caffe2/caffe2/core:serialization_test \
caffe2/caffe2/python/operator_test:load_save_test
Reviewed By: mraway
Differential Revision: D26502577
fbshipit-source-id: 6e302e530bb96990517c2e35c505db7f14a56284
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53401
This is a reland of D26641599 (cd9ac54ea7) after rebasing onto D26802576 (f595ba1bae).
Add some small utility functions to read the blob names back from the minidb
file so that we can verify how many chunks were written for each blob.
ghstack-source-id: 123567033
Test Plan: buck test caffe2/caffe2/python/operator_test:load_save_test
Reviewed By: mraway
Differential Revision: D26853942
fbshipit-source-id: 0b45078fdd279f547752c8fdb771e296374a00da