Commit Graph

125 Commits

Author SHA1 Message Date
Yun Wang (Speech)
0d57e87000 Fix test_div in caffe2/caffe2/python:hypothesis_test (#106694)
Summary: Suppress the "too_slow" health check for `test_div`.
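
A minimal sketch of what such a suppression looks like (the decorator placement and strategy are illustrative; the real test lives in hypothesis_test.py):
```
from hypothesis import HealthCheck, given, settings, strategies as st

@settings(suppress_health_check=[HealthCheck.too_slow])
@given(x=st.floats(allow_nan=False, allow_infinity=False))
def test_div(x):
    pass  # placeholder body; the real test exercises the Div operator
```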

Test Plan: Sandcastle

Differential Revision: D48105842

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106694
Approved by: https://github.com/malfet
2023-08-08 04:50:21 +00:00
Omkar Salpekar
ae1ed27756 [codemod][numpy] replace np.str with str (#103931)
Summary:
`np.str` was removed in numpy 1.20.0. It was an alias for the builtin `str`, so the replacement is safe.

The whole change is mechanical, generated using the following one-liner:
```
fbgr -sl 'np\.str\b' | xargs perl -pi -e 's,\bnp\.str\b,str,g'
```
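
For those outside that tooling, a hedged OSS equivalent of the codemod (pathlib + re in place of fbgr/perl) might look like:
```
import pathlib
import re

# Rewrite np.str -> str across a source tree; behavior-preserving since
# np.str was a plain alias for the builtin.
for path in pathlib.Path("caffe2").rglob("*.py"):
    src = path.read_text()
    new = re.sub(r"\bnp\.str\b", "str", src)
    if new != src:
        path.write_text(new)
```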

Test Plan: sandcastle

Differential Revision: D46586144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103931
Approved by: https://github.com/huydhn
2023-06-21 18:16:42 +00:00
Yun Wang (Speech)
4130e4f284 [hypothesis==6.70.1] Fix more test errors (#98685)
Summary:
This diff fixes more test failures (T150117218) caused by upgrading the "hypothesis" library to 6.70.1 (D44523679).

# //caffe2/caffe2/python:hypothesis_test
This test generates float numbers and filters out those whose absolute values are less than 1e-2.
It is a known issue with the new version of "hypothesis" that it generates zeros or floats with small absolute values too often:
https://github.com/HypothesisWorks/hypothesis/issues/3603
I'm circumventing this issue by suppressing the health check `filter_too_much`.
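
A hedged sketch of the pattern in question (the strategy bounds are illustrative): heavy filtering of small floats trips `filter_too_much` on hypothesis 6.70.1, so the check is suppressed:
```
from hypothesis import HealthCheck, given, settings, strategies as st

@settings(suppress_health_check=[HealthCheck.filter_too_much])
@given(x=st.floats(-10, 10).filter(lambda v: abs(v) >= 1e-2))
def test_example(x):
    assert abs(x) >= 1e-2  # the filter guarantees this
```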

# //caffe2/caffe2/quantization/server:resize_nearest_dnnlowp_op_test
All arithmetic should be done in float32 when calculating the reference, since the network under test uses float32 everywhere.
Mixing float32 with float64 or even integers will produce intermediate values in float64.
The difference in precision may cause off-by-1 errors when converting to integer (see the sketch below).
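
A hedged numpy illustration of the off-by-1 risk (the values are made up, not taken from the test):
```
import numpy as np

ref64 = 2.1 * 100                          # float64 arithmetic
ref32 = np.float32(2.1) * np.float32(100)  # float32 arithmetic: ~209.99998
print(int(ref64), int(ref32))              # -> 210 209: off by one
```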

Test Plan:
Run all the tests in both "dev" and "opt" modes:
```
for mode in dev opt; do
  buck2 test mode/$mode //caffe2/caffe2/python:hypothesis_test -- --run-disabled
  buck2 test mode/$mode //caffe2/caffe2/quantization/server:resize_nearest_dnnlowp_op_test -- --run-disabled
  buck2 test mode/$mode //caffe2/caffe2/fb/layers/tests:tum_history_test -- --run-disabled
  buck2 test mode/$mode //caffe2/caffe2/fb/dper/layer_models/tests:nn_ops_test -- --run-disabled
  buck2 test mode/$mode //caffe2/caffe2/fb/metrics:metrics_test -- --run-disabled
  buck2 test mode/$mode //deeplearning/numeric_suite/toolkit/test:net_transform_test -- --run-disabled
  buck2 test mode/$mode //f3/type_system:tests -- --run-disabled
done
```

**NOTE:** In the first test (`//caffe2/caffe2/python:hypothesis_test`), the two methods `test_constant_fill_from_tensor` and `test_recurrent` would crash.
But these crash on hypothesis 5.49.0, too, so I'm leaving them alone.

Differential Revision: D44812706

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98685
Approved by: https://github.com/malfet
2023-04-11 19:07:55 +00:00
Nikita Shulga
1906eaf22f [BE] Get rid of future (#92596)
PyTorch has been Python-3.X+ for ages, so it's a shame to still rely on `future.utils`, even in the deprecated Caffe2 codebase.

For reference:
https://peps.python.org/pep-0469/#migrating-directly-to-python-3
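
A hedged before/after sketch of the kind of change involved, per PEP 469:
```
d = {"a": 1, "b": 2}

# Before, with the `future` package:
#   from future.utils import viewitems
#   for k, v in viewitems(d): ...

# After, plain Python 3:
for k, v in d.items():
    print(k, v)
```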

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92596
Approved by: https://github.com/kit1980, https://github.com/orionr
2023-01-19 08:46:50 +00:00
Adam Simpkins
c4eb22009e Drop some Python 2 compatibility code (#51769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51769

Remove some Python 2 compatibility code that otherwise causes errors to
be reported from static type checkers.

Static type checkers complain that the old Python 2 modules and
functions referenced by this code do not exist. Given that Python 2
support is entirely deprecated now, we can simply remove the
compatibility code.
ghstack-source-id: 121313191

Test Plan:
Was able to get Pyre to successfully type check the `caffe2/python`
directory with this and some other changes.

Reviewed By: Tianshu-Bao

Differential Revision: D26271723

Pulled By: simpkins

fbshipit-source-id: fec8a09466be6867388832380480aafd36616aa1
2021-02-11 11:02:33 -08:00
Lu Fang
1fdc35da2c [BE] Fix the broken test -- caffe2/caffe2/python:hypothesis_test - test_recurrent (#50668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50668

GPU initialization is sometimes slow.

Test Plan: buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --exact 'caffe2/caffe2/python:hypothesis_test - test_recurrent (caffe2.caffe2.python.hypothesis_test.TestOperators)' --run-disabled

Reviewed By: hl475

Differential Revision: D25939037

fbshipit-source-id: 832700cf42ece848cda66dd629a06ecda207f086
2021-01-17 21:21:38 -08:00
Taylor Robie
faf6032945 Remove deadlines for Caffe2 hypothesis_test when running on GPU. (#49591)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49591

A bunch of these tests are marked flaky, and have been since time immemorial. (Read: as far back as Buck will build.) However, closer inspection reveals that they fail if and only if run on a GPU worker. What seems to be going on is that there are more jobs than GPUs, so the contention causes waits which register as timeouts on the test.

This diff is kind of hacky, but it basically just drops deadlines if a GPU is present. Because Caffe2 is going away I'm not too terribly concerned about a beautiful solution, but we may as well keep some test coverage if it's easy.
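
A minimal sketch of the approach, assuming a GPU probe along these lines (the profile name is illustrative):
```
from hypothesis import settings

def gpu_present() -> bool:
    try:
        from caffe2.python import workspace
        return workspace.NumGpuDevices() > 0
    except Exception:
        return False

# Device contention, not slow code, is what trips the per-example timer,
# so drop the deadline entirely when a GPU is present.
if gpu_present():
    settings.register_profile("gpu-no-deadline", deadline=None)
    settings.load_profile("gpu-no-deadline")
```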

CC Sebastian, Ilia, Min, and Hongzheng who also have tasks for what seems to be the same flakiness.

Test Plan: Turn the tests back on and see if they fall over. (The failure repros reliably on an OnDemand GPU and is fixed by this change, so it's not really just a hail Mary.)

Reviewed By: ngimel

Differential Revision: D25632981

fbshipit-source-id: 43dcce416fea916ba91f891e9e5b59b2c11cca1a
2020-12-18 10:00:24 -08:00
Peiyao Zhou
4078f44668 [TB][embedding supporting] Modify histogram to accept multiple types to skip CastOp and avoid OOMing in CastOp
Summary: To support min/max/mean/std, SummarizeOp needs to skip size checking (similar to the LpNorm error mentioned above) and accept multiple types

Test Plan:
unit test:
`buck test //caffe2/caffe2/fb/tensorboard/tests:tensorboard_accumulate_histogram_op_test`

https://our.intern.facebook.com/intern/testinfra/testrun/1407375057859572

`buck test //caffe2/caffe2/fb/tensorboard/tests:tensorboard_accumulate_histogram_op_test --stress-runs 1000`

https://our.intern.facebook.com/intern/testinfra/testrun/2533274832166362

Reviewed By: cryptopic

Differential Revision: D24605507

fbshipit-source-id: fa08372d7c9970083c38abd432d4c86e84fb10e0
2020-11-11 12:03:54 -08:00
Brandon Lin
4a581ba6c2 Implement LengthsToOffsets operator in Caffe2 (#46590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46590

This operator is very similar to LengthsToRanges but doesn't pack the offsets next to the original lengths.
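
A hedged illustration of the difference (the values are made up):
```
lengths = [2, 3, 1]

# LengthsToRanges packs each offset next to its length: an N x 2 output.
ranges = [[0, 2], [2, 3], [5, 1]]

# LengthsToOffsets keeps only the running offsets (an exclusive prefix sum).
offsets = [0, 2, 5]
assert offsets == [sum(lengths[:i]) for i in range(len(lengths))]
```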

Reviewed By: yf225

Differential Revision: D24419746

fbshipit-source-id: aa8b014588bb22eced324853c545f8684086c4e4
2020-10-29 07:03:34 -07:00
Danny Huang
cd7a682282 [caffe2] adds hypothesis test for queue ops cancel (#45178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45178

## Motivation
* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are currently blocking and thus non-cancellable. If an error
occurs, we need to be able to safely stop all net execution so we can throw
the exception to the caller.

## Summary
* Adds a hypothesis test for queue ops cancellation.

Test Plan:
## Unit test added to verify that queue ops propagate errors

```
buck test caffe2/caffe2/python:hypothesis_test
buck test caffe2/caffe2/python:hypothesis_test -- test_safe_dequeue_blob__raises_exception_when_hang --stress-runs 1000
```

```
Summary
  Pass: 1000
  ListingSuccess: 1
```

Reviewed By: d4l3k

Differential Revision: D23847576

fbshipit-source-id: 2fc351e1ee13ea8b32d976216d2d01dfb6fcc1ad
2020-09-24 14:43:52 -07:00
Bugra Akyildiz
27c7158166 Remove __future__ imports for legacy Python2 supports (#45033)
Summary:
There is a tool called `2to3` whose `future` fixer specifically removes these; the `caffe2` directory has the most redundant imports:

```2to3 -f future -w caffe2```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033

Reviewed By: seemethere

Differential Revision: D23808648

Pulled By: bugra

fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
2020-09-23 17:57:02 -07:00
Mike Ruberry
b6f4bb0a70 Revert D23236088: [pytorch][PR] [caffe2] adds Cancel to SafeDequeueBlobsOp and SafeEnqueueBlobsOp
Test Plan: revert-hammer

Differential Revision:
D23236088 (0ccc38b773)

Original commit changeset: daa90d9ee324

fbshipit-source-id: 933c7deab177250075683a9bea143ac37f16a598
2020-09-16 23:32:50 -07:00
Danny Huang
0ccc38b773 [caffe2] adds Cancel to SafeDequeueBlobsOp and SafeEnqueueBlobsOp (#44495)
Summary:
## Motivation

* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are currently blocking and thus non-cancellable. If an error
  occurs, we need to be able to safely stop all net execution so we can throw
  the exception to the caller.

* When an error occurs in a net, or the net is cancelled, running ops will have
  their `Cancel` method called.

* This diff adds a `Cancel` method to `SafeEnqueueBlobsOp`
  and `SafeDequeueBlobsOp` that calls `queue->close()` to force all the
  blocking ops to return.
* Adds a unit test that verifies the error propagation (see the sketch below).
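
A hedged sketch of the behavior under test (the ops are real Caffe2 queue ops, but the wiring here is illustrative): once the queue is closed, a dequeue returns with a failure status instead of blocking forever.
```
from caffe2.python import core, workspace

workspace.RunOperatorOnce(core.CreateOperator(
    "CreateBlobsQueue", [], ["queue"], capacity=1, num_blobs=1))
workspace.RunOperatorOnce(core.CreateOperator(
    "CloseBlobsQueue", ["queue"], []))
workspace.RunOperatorOnce(core.CreateOperator(
    "SafeDequeueBlobs", ["queue"], ["blob", "status"]))
print(workspace.FetchBlob("status"))  # expected: True -- closed, did not block
```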

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44495

Test Plan:
## Unit Test added to verify that queue ops propagate errors
```
buck test caffe2/caffe2/python:hypothesis_test
```

Reviewed By: dzhulgakov

Differential Revision: D23236088

Pulled By: dahsh

fbshipit-source-id: daa90d9ee32483fb51195e269a52cf5987bb0a5a
2020-09-16 18:17:34 -07:00
Lingyi Liu
bc64efae48 Back out "Revert D19987020: [pytorch][PR] Add the sls tensor train op" (#43938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43938

resubmit

Test Plan: unit test included

Reviewed By: mruberry

Differential Revision: D23443493

fbshipit-source-id: 7b68f8f7d1be58bee2154e9a498b5b6a09d11670
2020-09-01 11:42:12 -07:00
Mike Ruberry
cc52386096 Revert D19987020: [pytorch][PR] Add the sls tensor train op
Test Plan: revert-hammer

Differential Revision:
D19987020 (f31b111a35)

Original commit changeset: e3ca7b00a374

fbshipit-source-id: a600c747a45dfb51e0882196e382a21ccaa7b989
2020-08-29 12:46:11 -07:00
Lingyi Liu
f31b111a35 Add the sls tensor train op (#33525)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33525

Reviewed By: wx1988

Differential Revision: D19987020

Pulled By: lly-zero-one

fbshipit-source-id: e3ca7b00a374a75ee42716c4e6236bf168ebebf1
2020-08-29 12:16:44 -07:00
Christopher Whelan
7a9ae52550 [hypothesis] Deadline followup (#42842)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42842

Test Plan: `buck test`

Reviewed By: thatch

Differential Revision: D23045269

fbshipit-source-id: 8a3f4981869287a0f5fb3f0009e13548b7478086
2020-08-11 15:33:23 -07:00
Christopher Whelan
5cd0f5e8ec [PyFI] Update hypothesis and switch from tp2 (#41645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41645

Pull Request resolved: https://github.com/facebookresearch/pytext/pull/1405

Test Plan: buck test

Reviewed By: thatch

Differential Revision: D20323893

fbshipit-source-id: 54665d589568c4198e96a27f0ed8e5b41df7b86b
2020-08-08 12:13:04 -07:00
Dinesh Govindaraj
f153b35b9b Shape inference for SparseToDense in ExpertCombiner
Summary: Adding shape inference for SparseToDense. The proposed shape inference only works when data_to_infer_dim is given; otherwise the SparseToDense output dimension depends on the maximum value of the input tensor.

Test Plan:
buck test //caffe2/caffe2/python:sparse_to_dense_test
buck test //caffe2/caffe2/python:hypothesis_test -- test_sparse_to_dense

Dper3 Changes:
f204594813
buck test dper3/dper3_models/ads_ranking/model_impl/sparse_nn/tests:sparse_nn_lib_test

Reviewed By: zhongyx12, ChunliF

Differential Revision: D22479511

fbshipit-source-id: 8983a9baea8853deec53ad6f795c874c3fb93de0
2020-07-15 08:04:48 -07:00
rohithkrn
df252c059c [ROCm] Skip caffe2 unique op test for rocm3.5 (#41219)
Summary:
The unique op test failure in caffe2 blocks upgrading CI to ROCm 3.5.1. Skipping the test to unblock; it will be re-enabled after root-causing and fixing the issue.
jeffdaily sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41219

Differential Revision: D22471452

Pulled By: xw285cornell

fbshipit-source-id: 9e503c8b37c0a4b92632f77b2f8a90281a9889c3
2020-07-09 20:00:29 -07:00
Hongzheng Shi
f6b0fbe2c5 topk tensor k support (#39407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39407

 - support passing a single-element tensor as k for the topk module
 - support passing a single-element tensor to the constant fill output

Test Plan:
buck test dper3/dper3/modules/tests:core_modules_test -- test_topk_gating_without_split_examples_tensor_k
buck test caffe2/caffe2/python:hypothesis_test -- test_constant_fill_from_tensor

Reviewed By: huayuli00

Differential Revision: D21843739

fbshipit-source-id: 0c5f5c03e9f57eeba40c0068784625164c2527ec
2020-06-15 13:10:20 -07:00
Nikita Shulga
e2a178ca21 Update caffe2 hypothesis_test_util to support hypothesis-5 (#39498)
Summary:
Extracting the forward-backward `hypothesis` interface update parts of https://github.com/pytorch/pytorch/pull/39430 into a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39498

Differential Revision: D21900210

Pulled By: malfet

fbshipit-source-id: 75e637cf839f49dc141d37e1686ce45ff4721245
2020-06-05 08:27:50 -07:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
Lu Fang
c89340f068 Extend HasElements to support multiple inputs (#28717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28717

Make HasElements support multiple inputs: if any input has elements, return true (see the sketch below).
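
A hedged pure-Python reference for the extended semantics (not the op itself):
```
import numpy as np

def has_elements(*tensors):
    # With multiple inputs, return true if ANY input is non-empty.
    return any(t.size > 0 for t in tensors)

assert has_elements(np.zeros(0), np.zeros(2))           # one non-empty -> True
assert not has_elements(np.zeros(0), np.zeros((0, 3)))  # all empty -> False
```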

Test Plan: to be added

Reviewed By: BIT-silence

Differential Revision: D17972759

fbshipit-source-id: 3ecdea74a30fcfaaa6490fef1debc6cde68db922
2019-10-27 23:00:07 -07:00
Junjie Bai
a7eb18e243 Enable Unique operator tests on ROCm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26046

Differential Revision: D17331522

Pulled By: bddppq

fbshipit-source-id: 729624d1df15a1c0c7ba2b7e7e3c3a903fb13abf
2019-09-11 16:36:14 -07:00
Andrey Malevich
d58059bc6f Fix SliceGradientOp to handle properly empty batches (#23784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23784

The backward path does nothing during the gradient pass when the input is empty; as a
result, the workspace can preserve gradient values from the previous iteration and feed
inconsistent inputs to some of the backward-pass operators. This diff fixes this
discrepancy by always reinitializing the output during the backward path.
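
A hedged pure-Python sketch of the fix (not the C++ op): always reinitialize the gradient output, even for an empty batch, so stale values cannot leak between iterations.
```
import numpy as np

def slice_gradient(data_shape, grad_output, starts, ends):
    grad_data = np.zeros(data_shape, dtype=grad_output.dtype)  # fresh every call
    if grad_output.size:  # empty batch: keep the zeros, scatter nothing
        grad_data[tuple(slice(s, e) for s, e in zip(starts, ends))] = grad_output
    return grad_data
```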

Reviewed By: dzhulgakov

Differential Revision: D16646096

fbshipit-source-id: 8ca68dfad17a63fc87c033cce7b36b40bd77245c
2019-08-06 02:43:32 -07:00
Your Name
99674eb86f Re-enable test_dag_net_forking on ROCm (#21013)
Summary:
Fixes #16229
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21013

Differential Revision: D15515824

Pulled By: bddppq

fbshipit-source-id: 23a6c7eaad6129328c6b9dfcc55ac2d31a6d2dc0
2019-05-28 12:12:53 -07:00
rohithkrn
aa88c2c0b6 Unify gpu_support variable in python tests (#16748)
Summary:
Assign `has_gpu_support = has_cuda_support or has_hip_support` and make the corresponding changes in the python tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16748

Differential Revision: D13983132

Pulled By: bddppq

fbshipit-source-id: ca496fd8c6ae3549b736bebd3ace7fa20a6dad7f
2019-02-07 00:29:51 -08:00
peter.yeh@amd.com
10cd9d5a03 Skip dag_net_forking test on Rocm (#16639)
Summary:
- Skip the test due to flaky behavior on AMD/ROCm.
- The fix is expected in ROCm 2.2 (HSA runtime).
bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16639

Differential Revision: D13915231

Pulled By: bddppq

fbshipit-source-id: 66e1d275836337170b15ceb9d60cfdd3242d4df8
2019-02-01 00:53:54 -08:00
Yangqing Jia
da73d709a8 Remove unsafecoalesce op (#12897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12897

The UnsafeCoalesce op is a holdover from the memonger days, when we tried to coalesce
operators into more efficient computation kernels. It creates a somewhat unsafe
underlying memory storage pattern.

With the new tensor unification I am not sure whether it is still safe for us to do
so, so I propose we delete it for the sake of safety.

Reviewed By: bddppq, ilia-cher

Differential Revision: D10475980

fbshipit-source-id: b1a838c9f47d681c309ee8e2f961b432236e157e
2018-10-22 09:42:26 -07:00
Will Feng
cdead5ace1 Enable CircleCI for Linux jobs (#12389)
Summary:
Changes in this PR:
1. Intermediate Docker image is shared from build stage to test stage through ECR, in order to fix the Caffe2 flaky CUDA tests.
2. There are ~7 Caffe2 operator tests that are only flaky in `caffe2_py2_gcc4_8_ubuntu14_04_test` on CPU. Disabling those tests on that config only, which is okay to do because we are still running those tests in other test jobs.

After this PR is merged, CircleCI will be running on master automatically, and will be running on PRs if the author rebased their PR onto the newest master (which we will ask all the authors to do when we switch off Jenkins for Linux).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12389

Differential Revision: D10224267

Pulled By: yf225

fbshipit-source-id: dd1a90a425c3d13b870d3d328cb301eee2e6e2cd
2018-10-08 17:09:37 -07:00
Jongsoo Park
29610621ec 64B align for avx512 (#11748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11748

For AVX-512, we need to align at a multiple of 64B, not 32B.
Regardless of AVX-512, it is in general a good idea to be cache-line aligned.

Reviewed By: ilia-cher

Differential Revision: D9845056

fbshipit-source-id: b1d3ed67749c0c1a64acd5cc230a1279e8023512
2018-09-17 14:08:31 -07:00
Will Feng
c9e66351a7 Port all PyTorch and Caffe2 jobs to CircleCI (#11264)
Summary:
This PR adds all PyTorch and Caffe2 job configs to CircleCI.

Steps for the CircleCI mini-trial:
- [ ] Make sure this PR passes Jenkins CI and fbcode internal tests
- [x] Approve this PR
- [ ] Ask CircleCI to turn up the number of build machines
- [ ] Land this PR so that the new `.circleci/config.yml` will take effect

Several Caffe2 tests are flaky on CircleCI machines and hence skipped when running on CircleCI. A proper fix for them will be worked on after a successful mini-trial.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11264

Differential Revision: D9656793

Pulled By: yf225

fbshipit-source-id: 7832e90018f3dff7651489c04a179d6742168fe1
2018-09-05 16:28:11 -07:00
rohithkrn
f5910c8a36 Add MIOPEN recurrent operator (#10840)
Summary:
The goal of this PR is to enable the MIOpen engine (for HIP devices) for the recurrent operator and also enable the corresponding unit test.
bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10840

Differential Revision: D9518980

Pulled By: bddppq

fbshipit-source-id: 214661e79a47c5dc6b712ef0fba986bd99db051f
2018-08-27 15:39:56 -07:00
Xiuyan Ni
db96a0951f Add SIMD version to GFTRL optimizer (#9698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9698

Add SIMD version to GFTRL optimizer

Differential Revision: D8949723

fbshipit-source-id: 835ce2ce49630ae43fc6bac63c545c14b25f5a26
2018-07-30 15:27:24 -07:00
Junjie Bai
bdbbcf068a Temporarily disable test_unique on rocm since it keeps running into segfault (#9872)
Summary:
petrex

https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3758/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3757/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3752/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9872

Reviewed By: ezyang

Differential Revision: D9013335

Pulled By: bddppq

fbshipit-source-id: 80490a0fd4a86aa9c8454378c0edddc57d135c4e
2018-07-26 08:34:00 -07:00
Junjie Bai
7af5883860 Eanble python tests on ROCM (#9616)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9616

Differential Revision: D8960623

Pulled By: bddppq

fbshipit-source-id: bde93bda6230094e6bf4badd8ee79f0688ae1993
2018-07-24 11:37:58 -07:00
Xiuyan Ni
4e5369349f Add FTRL Optimizer with Group Lasso regularizer (#9074)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9074

Implement an optimizer, based on the FTRL optimizer, which supports a Group
Lasso regularizer.

The relevant paper list for this optimizer:
1. About the FTRL Optimizer: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf,
2. About the group lasso regularizer solver: http://www.cse.cuhk.edu.hk/~king/PUB/ICML2010-Yang-473.pdf
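
For intuition, the group soft-thresholding step at the heart of a group-lasso solver is the standard proximal operator (a textbook result, not taken from this diff):
```
\operatorname{prox}_{\lambda\,\lVert\cdot\rVert_2}(v)
  \;=\; \max\!\left(0,\; 1 - \frac{\lambda}{\lVert v \rVert_2}\right) v
```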

Differential Revision: D8623146

fbshipit-source-id: 40e08aa6319d1ad7aa95e8716e3de83b9cfb8452
2018-07-06 13:41:00 -07:00
bddppq
bc4feab3e3
Fix flaky atomic iter test (#7649) 2018-05-17 21:17:29 -07:00
Paul Jesse Hellemn
b875fb281c
Update from facebook (#7451)
* [bootcamp] Improve "Shape" operator to support axes specification

Improve the "Shape" operator of Caffe2 to support x.shape(tensor, axes), which takes an optional int array "axes" as input. For example, x.shape(tensor, [1, 0]) will return the dimensions for axes 1 and 0, in that order. In the current version, the "axes" input allows duplicates and can have arbitrary length (see the sketch below).
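
A hedged numpy illustration of the semantics (not the Caffe2 op itself):
```
import numpy as np

x = np.zeros((4, 5, 6))
full_shape = list(x.shape)               # Shape(x)              -> [4, 5, 6]
partial = [x.shape[a] for a in (1, 0)]   # Shape(x, axes=[1, 0]) -> [5, 4]
assert partial == [5, 4]
```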

* Back out "Add barrier net that runs before training nets"

Original commit changeset: b373fdc9c30f. Need additional changes to some callers to support barrier failures.

* Change warning to verbose log to reduce log spam

The `LOG(WARNING)` was a bit spammy for regular use, so let's just make it a `VLOG`.

* Extract the shared code from different caffe2_benchmark binaries

The OSS benchmark and Internal benchmark will share most functions in the benchmark.

* Support MFR in sequence training

As titled.

* Make knowledge distillation work using a logged prediction feature as the teacher label.

1) Add loading a raw dense feature as the teacher label.
2) Optional calibration function for the teacher label.
3) Add the teacher label into the generic unit test.
4) Deprecated the TTSN workflow version that used feature_options to configure the teacher label.

* [C2/CUDA]: unjoined cross entropy sigmoid

as desc

* Add async_scheduling executor into deferrable_net_exec_test

Add async_scheduling into tests and fix some exception cases

* Fix Event disabled error

When disabling event in RNN ops make sure we don't call Finish on disabled
event from op's RunAsync

* cuda ensure cpu output op can handle both TensorCPU and TensorCUDA

as desc.

* [C2 Core] Infer input device option in C2 hypothesis_test checkers

Improve how we default input blob device options.
Previously it defaulted to wherever the op lives, but that is not necessarily the case.

For example:
CopyCPUToGPU

* [C2 Op]SplitByLengthsOp CPU/GPU implementation

[C2 Op]SplitByLengthsOp CPU/GPU implementation

* fix undefined symbol error

not sure why we're getting undefined symbol even with link_whole = True
Need to figure out why but need this workaround for now

* Add tools in DAIPlayground platform to help debugging models

Add additional tools to allow Playground to override individual methods defined in AnyExp. This will allow users to create modules that specifically change certain default method behavior. An example included in this diff is deactivating the test model and checkpointing. When debugging model problems, switching off components helps quickly narrow down the location of the bug. The technique is extensively used in task T27038712 (steady memory increase in EDPM, eventually resulting in gloo/cuda.cu:34: out of memory).

* add shape and type inference for int8 conversion operator

* Fix flaky test for group_norm

Fix flaky test for group_norm

* Fix group_norm_op_test flaky

Fix group_norm_op_test flaky

* Implementation of composite learning rate policy

In many state-of-the-art deep learning works, people use a simple trick to
schedule the learning rate: use a fixed learning rate until the error plateaus,
then switch to a different fixed learning rate, and so on. In this diff,
we implemented a simple version of the composite learning rate. The user gives
a set of learning rate policies and corresponding iteration counts, and the
optimizer will change the learning rate policy based on the number of iterations so far.

For example, suppose the user gives two learning rate policies, FixedLearningRate
and PolyLearningRate, with an iteration number of 1k. For the first 1k iterations,
we use FixedLearningRate; for the following iterations, we use PolyLearningRate
(see the sketch below).
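
A minimal sketch of such a schedule (hypothetical helper, not the Caffe2 API): fixed for the first 1k iterations, then a power-1 polynomial decay.
```
def composite_lr(iter_num, base_lr=0.1, switch_at=1000, max_iter=10000):
    if iter_num < switch_at:                                  # FixedLearningRate
        return base_lr
    progress = min((iter_num - switch_at) / (max_iter - switch_at), 1.0)
    return base_lr * (1.0 - progress)                         # PolyLearningRate

assert composite_lr(500) == 0.1   # fixed phase
assert composite_lr(5500) < 0.1   # decaying phase
```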

* Split two use cases of CachedReader into two classes, DBFileReader and CachedReader

# Use Cases:

1). input: DB file -> output: DatasetReader.

Use DBFileReader.

2). input: Reader -> build cache DB file -> output: DatasetReader.

Use CachedReader.

# Changes to CachedReader:

1). Move db_path to the constructor,
because with a mock reader the cache will always be built ahead of time.

# Changes to tests:

1). Make a separate TestCase class for CachedReader and DBFileReader.

2). Make it possible to add more test functions by adding setUp, tearDown and _make_temp_path.

3). Make deleting db_path more general. `db_path` could be a file for `log_file_db`, but could also be a directory for `leveldb`.

* Back out "On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization"

Original commit changeset: 4489c6133f11

* Fix LARS bug

Fixed a bug in the LARS implementation which caused all subsequent blobs not using LARS to have the LARS learning rate multiplier applied to them.

* [tum] support sparse init & add uniformFill option

as title

* Propagate exception for async nets

Capture the exception when an exception is thrown in async nets and re-throw it after wait().  This allows exceptions to be propagated up to the caller.

This diff was a part of D7752068.  We split the diff so that C2 core files changes are in a separate diff.

* Automatic update of fbcode/onnx to 69894f207dfcd72d1e70497d387201cec327efbc

Previous import was 403ccfbd0161c38f0834413d790bad0874afbf9a

Included changes:
- **[69894f2](https://github.com/onnx/onnx/commit/69894f2)**: Use op schema.all tensor types in random like definitions (#865) <Scott McKay>
- **[b9d6b90](https://github.com/onnx/onnx/commit/b9d6b90)**: Clarify random like operators (#846) <Scott McKay>
- **[fc6b5fb](https://github.com/onnx/onnx/commit/fc6b5fb)**: Refactor shape inference implementation (#855) <anderspapitto>
- **[b7d8dc8](https://github.com/onnx/onnx/commit/b7d8dc8)**: fix cmake warning message (#863) <Eric S. Yu>
- **[f585c5d](https://github.com/onnx/onnx/commit/f585c5d)**: add pytorch-operator test for tile (#831) <Wenhao Hu>
- **[993fe70](https://github.com/onnx/onnx/commit/993fe70)**: add install step (#832) <Eric S. Yu>
- **[68bc26c](https://github.com/onnx/onnx/commit/68bc26c)**: add type inference for traditional ml ops except classifier ops. (#857) <Ke Zhang>
- **[9cc0cda](https://github.com/onnx/onnx/commit/9cc0cda)**: fix string representation of scalar types (#858) <G. Ramalingam>
- **[1078925](https://github.com/onnx/onnx/commit/1078925)**: fix y in pow test case to scalar (#852) <Wenhao Hu>
- **[c66fb6f](https://github.com/onnx/onnx/commit/c66fb6f)**: Add some math function shape inference (#845) <anderspapitto>
- **[ff667d1](https://github.com/onnx/onnx/commit/ff667d1)**: Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853) <Marat Dukhan>
- **[11c6876](https://github.com/onnx/onnx/commit/11c6876)**: clear initializer names when clear initializer (#849) <Wenhao Hu>
- **[73c34ae](https://github.com/onnx/onnx/commit/73c34ae)**: Clarify FeatureVectorizer description. (#843) <Scott McKay>
- **[1befb9b](https://github.com/onnx/onnx/commit/1befb9b)**: Remove useless text in docs (#850) <Lu Fang>
- **[e84788f](https://github.com/onnx/onnx/commit/e84788f)**: Fix SELU attributes' default values (#839) <Lu Fang>
- **[ebac046](https://github.com/onnx/onnx/commit/ebac046)**: Add tile test case (#823) <Wenhao Hu>
- **[8b7a925](https://github.com/onnx/onnx/commit/8b7a925)**: a few more shape inference functions (#772) <anderspapitto>
- **[9718f42](https://github.com/onnx/onnx/commit/9718f42)**: Make the coefficient non optional for LinearClassifier (#836) <Jaliya Ekanayake>
- **[ef083d0](https://github.com/onnx/onnx/commit/ef083d0)**: Add save_tensor and load_tensor functions for Protos (#770) <Lu Fang>
- **[45ceb55](https://github.com/onnx/onnx/commit/45ceb55)**: Check if CMAKE_BUILD_TYPE set before project(). (#812) <Sergii Dymchenko>
- **[4b3d2b0](https://github.com/onnx/onnx/commit/4b3d2b0)**: [WIP] reenable shape inference tests (#834) <anderspapitto>
- **[22d17ee](https://github.com/onnx/onnx/commit/22d17ee)**: RNN tests: LSTM, GRU, SimpleRNN (#739) <Peyman Manikashani>
- **[de65b95](https://github.com/onnx/onnx/commit/de65b95)**: dimension denotation (#443) <Tian Jin>
- **[eccc76e](https://github.com/onnx/onnx/commit/eccc76e)**: fix field number issue in onnx operator proto and enable its build (#829) <Ke Zhang>
- **[d582beb](https://github.com/onnx/onnx/commit/d582beb)**: disable shape inference test to unbreak ci (#830) <Lu Fang>
- **[485b787](https://github.com/onnx/onnx/commit/485b787)**: function proto for composite op. (#802) <Ke Zhang>
- **[cd58928](https://github.com/onnx/onnx/commit/cd58928)**: specify defaults for attributes of Affine op (#820) <G. Ramalingam>
- **[7ee2cf9](https://github.com/onnx/onnx/commit/7ee2cf9)**: merge the dummy backend back into the main one (#743) <anderspapitto>
- **[1c03a5a](https://github.com/onnx/onnx/commit/1c03a5a)**: [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551) <Marat Dukhan>
- **[3769a98](https://github.com/onnx/onnx/commit/3769a98)**: Rename real model test case from VGG-16 to ZFNet (#821) <Lu Fang>

* [C2]ReluN Op

relu n op.

tf reference: https://www.tensorflow.org/api_docs/python/tf/nn/relu6

* Call destructor when assigning a blob value

* Add executor overrides

Add executor overrides flag to enable migration to async_scheduling executor

* Add barrier net that runs before training nets - attempt #2

Add a synchronize barrier net that is run before training nets. With this net, shards that are faster will wait for the other shards before starting training. This reduces the chance of the faster shards timing out during GLOO AllReduce.
Removed the explicit data_parallel_model.py synchronize call in the holmes workflow.

This change was landed previously but caused errors for some EDPM workflows - See https://fb.facebook.com/groups/1426530000692545/permalink/1906766366002237/ - because EDPM assumes any call to CreateOrCloneCommonWorld and Gloo ops are wrapped in exception handlers but in this case exception thrown in the barrier init net is not handled.

To address this issue, we add _CreateOrCloneCommonWorld to the param_init_net instead of a new barrier init net. Since errors from the param_init_net run are handled gracefully with re-rendezvous, this should fix the problem.

* Handle empty nets in async_scheduling

Make sure we don't get stuck on empty nets

* use CUDA_ARCH for conditional compile

* [C2 fix] infer function for ensure_cpu_output_op

* Update group_norm test to reduce flaky test

* Fix lr_multiplier for GPU
2018-05-10 23:14:27 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
mlappelbaum
d11fc90317 Export atomic iter count (#2379)
* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Add axis to top_k_op. (#2416)

* Revert update on top_k_op

* Add axis to top_k_op

Add axis to top_k_op

* [auto] Update onnx to a8e4648 - Adjust link flags when built in Windows Debug mode (#647)
a8e4648a7d

* [auto] Update onnx to f4acf28 - Remove allowconsumed enforceconsumed from op schema. (#617)
f4acf281ef

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Initialize cpuinfo in the thread pool

Thread pool called cpuinfo_get_processors_count() without initializing cpuinfo. Only by luck it didn't make Caffe2 single-threaded: threadpool is initialized after NNPACK, and NNPACK initializes cpuinfo itself.

This commit also updates cpuinfo to a version that aborts with a fatal error if it's used uninitialized.

* Updated Python Op and Image Pre-Processing Pipeline tutorials && Added CIFAR-10 Part 1 tutorial (#2286)

* Updated Basics tutorial: (1) Added Python 3 support with __future__ statements; (2) Various grammatical/typo fixes and minor refactoring of Markdown

* Added Python 3 support and made minor typo fixes

* Added Python 3 support with future imports, refactored and corrected errors in Markdown, added comments

* Added Python 3 support with future imports, Added use of caffe_translator.py to translate downloaded .caffemodel file to .pb files

* Upgrades to Image Pre-Processing Pipeline tutorial

* Updated Python Op tutorial

* removed markdown with empty links

* Added Part 1 of an end-to-end CIFAR-10 tutorial

* Updated MNIST Dataset and Databases tutorial with python3 support and markdown fixes

* Tweaks to markup, less training iterations

* changed permissions of CIFAR10_Part1; typo corrections in Image_Pre-Processing_Pipeline

* Typo corrections in Multi-GPU Training tutorial

* sync Python_Op py_gen with the IPython notebook

* nit typo correction

* [auto] Update onnx to 5cb999d - Minor cleanups to shape inference (#653)
5cb999ddc1

* [auto] Update onnx to ecac1c1 - Merge Rel 1.1.0 branch into master (#657)
ecac1c1624

* Strip down onnx to only pb definitions in mobile build (#2426)

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count
2018-03-26 19:26:09 -07:00
Orion Reblitz-Richardson
0ea8964fd6 Revert "Export number of iterations of AtomicIterOp" (#2359)
* Revert "Use -DCMAKE_BUILD_TYPE=Release for local build by default"

This reverts commit 035c62081f6420405b9f1380cc5d21b4c6ae78f6.

* Revert "Export number of iterations of AtomicIterOp (#2338)"

This reverts commit 91b7a0cb48c6b079e2ca8fd5c26819a003937d76.
2018-03-21 16:11:29 -07:00
mlappelbaum
8346088094 Export number of iterations of AtomicIterOp (#2338)
* Exported AtomicIterOp count

* Exported AtomicIterOp count
2018-03-21 12:39:30 -07:00
Orion Reblitz-Richardson
6aa087d902 Revert "export num iterations of AtomicIter"
This reverts commit be9c8e5591f5d38131b9bdc2249542f27dadc221.
2018-03-20 13:34:22 -07:00
Matan Appelbaum
fac306d3c9 export num iterations of AtomicIter
as title.  Useful for tracking number of EASGD updates.
2018-03-20 13:34:22 -07:00
Orion Reblitz-Richardson
5c381bbc57 Patch cuda-convnet2 from internal Facebook changes.
* Unfortunately this needs to be manually monkey patched.
* This should get it so GitHub and fbcode versions match.
2018-02-28 14:20:48 -08:00
Pieter Noordhuis
52fa742c51 Revert D6893040: Implementing Pow operator (this merges existing pow with a scalar and new pow with a tensor exponent).
Summary:
This reverts commit 30f614beea6f859fee25ce4f85573142885dde45

bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
cause_a_sev_many_files

Differential Revision:
D6893040

Original commit changeset: 30f614beea6f

fbshipit-source-id: 5e98a24699088283f864efe31234874bdacbe3c3
2018-02-14 10:34:08 -08:00
Maxim Naumov
f7cc8e8822 Implementing Pow operator (this merges existing pow with a scalar and new pow with a tensor exponent).
Summary: The old pow operator has been deleted in math_ops.cc, math_ops.cu and math_ops.h, while the new operator, supporting scalar and tensor exponents, has been added in pow_op.cc, pow_op.h and elementwise_op.cu.

Reviewed By: houseroad

Differential Revision: D6893040

fbshipit-source-id: 30f614beea6f859fee25ce4f85573142885dde45
2018-02-13 17:46:35 -08:00
Huazhong Ning
90543ff13a weighted sampling reader dequeue outputs table index
Summary: Weighted sampling reader dequeue randomly chooses a hive reader to read a mini-batch. This diff allows dequeue to output the index of the randomly chosen table to a specific blob.

Reviewed By: kennyhorror

Differential Revision: D6621070

fbshipit-source-id: 754b981fc2bcfdb0146d2a0a5b677e7cfe74211b
2018-01-24 19:06:25 -08:00