pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Will Constable	4f34cd6d1e	Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032 ) Avoid exposing defines that conflict with google logging, since this blocks external usage of libtorch in certain cases. All the 'interesting' changes should be in these two files, and the rest should just be mechanical changes via sed. c10/util/logging_is_not_google_glog.h c10/util/logging_is_google_glog.h Fixes https://github.com/pytorch/pytorch/issues/81415 cc @miladm @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/82032 Approved by: https://github.com/soumith, https://github.com/miladm	2022-07-26 01:20:44 +00:00
Nikita Shulga	a9b0a921d5	Disable `avoid-non-const-global-variables` lint check (#62008 ) Summary: The GoogleTest `TEST` macro, as well as `DEFINE_DISPATCH`, is non-compliant with this check. All changes except the ones to `.clang-tidy` are generated using the following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008 Reviewed By: driazati, r-barnes Differential Revision: D29838584 Pulled By: malfet fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13	2021-07-22 18:04:40 -07:00
Nikita Shulga	3a66a1cb99	[clang-tidy] Exclude cppcoreguidelines-avoid-magic-numbers (#57841 ) Summary: Add cppcoreguidelines-avoid-magic-numbers exclusion to clang-tidy. Remove existing nolint warnings using the following script:
```
for file in `git ls-files | grep -v \.py`; do gsed '/^ *\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers)/d' -i $file; done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57841 Reviewed By: samestep Differential Revision: D28295045 Pulled By: malfet fbshipit-source-id: 7c6e8d1213c9593f169ed3df6a916498f1a97163	2021-05-07 20:02:33 -07:00
Nikita Shulga	4cb534f92e	Make PyTorch code-base clang-tidy compliant (#56892 ) Summary: This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os


def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files


def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])


def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)


if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892 Reviewed By: H-Huang Differential Revision: D27991944 Pulled By: malfet fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179	2021-04-28 14:10:25 -07:00
Tristan Rice	ce54f0d411	Back out "Revert D20449887: [dt][caffe2] enable using smart exceptions in async nets" (#36172 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36172 Original commit changeset: 3d7801613f86 D20449887 broke some OSS tests as the OSS export sync wasn't working correctly. Test Plan: Manually export latest version to OSS to trigger the tests + test plan in D20449887 verified onnx tests are passing in https://github.com/pytorch/pytorch/pull/36172 Reviewed By: andrewwdye Differential Revision: D20902279 fbshipit-source-id: bc30fcc9f5cc8076f69a5d92675fd27455948372	2020-04-13 11:31:52 -07:00
Michael Ranieri	1702152ef9	fixup unit tests (#34105 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34105 make parallel_net_test.cc chronos conforming. exclude gtest asserts that check thrown exceptions when exceptions are disabled. Test Plan: CI green Differential Revision: D20153525 fbshipit-source-id: 7371e559da948f46773fed09e3a23a77411d59e0	2020-03-03 10:33:21 -08:00
Sebastian Messmer	643ca5def2	Replace c10::guts::stuff with std::stuff (#30915 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30915 Since we now have C++14, we don't need these c10::guts helpers anymore ghstack-source-id: 95777609 Test Plan: waitforsandcastle Differential Revision: D18869639 fbshipit-source-id: 97716f932297c64c6e814410ac47b444c33d4e2e	2019-12-16 13:57:19 -08:00
Ilia Cherniavskii	7190789f58	Handling of failing and terminal async cpu ops (#29052 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29052 Make sure we handle the case of multiple, async, terminal (no children) and failing cpu ops. Test Plan: AsyncIf tests Reviewed By: yyetim Differential Revision: D18276401 Pulled By: ilia-cher fbshipit-source-id: 35b175dd025bc7e392056ac1331b159376a29e60	2019-11-04 12:01:21 -08:00
Ilia Cherniavskii	47263e48f4	Better handling of net errors in prof_dag counters (#17384 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17384 Better handling of possible net run errors in prof_dag counters. Reviewed By: yinghai Differential Revision: D14177619 fbshipit-source-id: 51bc952c684c53136ce97e22281b1af5706f871e	2019-02-22 18:38:31 -08:00
Michael Liu	5f866d0ea2	Apply modernize-use-override (2nd iteration) Summary: Use C++11’s override and remove virtual where applicable. Changes are automatically generated. Reviewed By: Orvid Differential Revision: D14086124 fbshipit-source-id: 2005227d095d776ca3b4309a57f54e25782b9b58	2019-02-14 16:52:57 -08:00
Ilia Cherniavskii	f019a2d9b3	Remove unused executors, part 3 (#14199 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14199 Remove legacy code for dag, async_dag Reviewed By: salexspb Differential Revision: D13019102 fbshipit-source-id: ff07e45304d9af4be0375215f4b642c4b0edb12d	2018-11-26 19:10:43 -08:00
Ilia Cherniavskii	4276fe7867	Support for saving exceptions in async CPU ops (#12904 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12904 Enabling support for saving exceptions in async parts of CPU ops via event().SaveException(). The error contract for CPU ops becomes:
- return false in sync part -> net->Run() returns false
- throw in sync part -> net->Run() rethrows the same exception
- SetFinished("error msg") in async part -> net->Run() returns false
- event().SetFinishedWithException() in async part -> net->Run() rethrows the same exception
Reviewed By: andrewwdye Differential Revision: D10479130 fbshipit-source-id: 850ee9cbf83b04dd24b25eba359439b0cf7853c0	2018-10-29 04:57:40 -07:00
Ilia Cherniavskii	da2da55170	Make sure to update success_ at the end of the run (#12806 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12806 Make sure to update success_ status at the end of the run when going through task statuses Reviewed By: aazzolini Differential Revision: D10443704 fbshipit-source-id: 79f8f7fe1eccb78f6e2859f3b1e66dc44347bcc8	2018-10-22 16:58:20 -07:00
Yangqing Jia	7d5f7ed270	Using c10 namespace across caffe2. (#12714 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714 This is a short change to enable c10 namespace in caffe2. We did not enable it before due to gflags global variable confusion, but it should have been mostly cleaned now. Right now, the plan on record is that namespace caffe2 and namespace aten will fully be supersets of namespace c10. Most of the diff is codemod, and only two places of non-codemod is in caffe2/core/common.h, where ``` using namespace c10; ``` is added, and in Flags.h, where instead of creating aliasing variables in c10 namespace, we directly put it in the global namespace to match gflags (and same behavior if gflags is not being built with). Reviewed By: dzhulgakov Differential Revision: D10390486 fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b	2018-10-17 12:57:19 -07:00
Junjie Bai	f54ab540af	Rename cuda_gpu_id to device_id in DeviceOption (#12456 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12456 codemod with 'Yes to all' codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id Overload TextFormat::ParseFromString to do string replace when parsing from protobuf format Reviewed By: Yangqing Differential Revision: D10240535 fbshipit-source-id: 5e6992bec961214be8dbe26f16f5794154a22b25	2018-10-09 15:54:04 -07:00
Yangqing Jia	38f3d1fc40	move flags to c10 (#12144 ) Summary: still in flux. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12144 Reviewed By: smessmer Differential Revision: D10140176 Pulled By: Yangqing fbshipit-source-id: 1a313abed022039333e3925d19f8b3ef2d95306c	2018-10-04 02:09:56 -07:00
Jerry Zhang	9f4bcdf075	caffe2::DeviceType -> at::DeviceType (#11254 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11254 Previously we used DeviceType in caffe2.proto directly, but it is an `enum` with an implicit conversion to int, which is not type safe; e.g. we have to explicitly check that a device type is valid in event.h:
```
template <int d>
struct EventCreateFunctionRegisterer {
  explicit EventCreateFunctionRegisterer(EventCreateFunction f) {
    static_assert(d < MaxDeviceTypes, "");
    Event::event_creator_[d] = f;
  }
};
```
at::DeviceType is an `enum class`; it has no implicit conversion to int and provides better type safety guarantees. In this diff we have done the following refactor (taking CPU as an example):
1. caffe2::DeviceType → caffe2::DeviceTypeProto
2. caffe2::CPU → caffe2::PROTO_CPU
3. caffe2::DeviceType = at::DeviceType
4. caffe2::CPU = at::DeviceType::CPU
codemod -d caffe2/caffe2 --extensions h,cc,cpp 'device_type, ' 'device_type(), PROTO_' + some manual changes
In short, after this diff, in C++, caffe2::CPU refers to at::DeviceType::CPU and the old proto caffe2::CPU becomes caffe2::PROTO_CPU. On the Python side, we have a temporary workaround that aliases `caffe2_pb2.CPU = caffe2_pb2.PROTO_CPU` to make the change easier to review; this will be removed later. Reviewed By: ezyang Differential Revision: D9545704 fbshipit-source-id: 461a28a4ca74e616d3ee183a607078a717fd38a7	2018-09-05 16:28:09 -07:00
Ilia Cherniavskii	94bc4c6091	Ensure pending tasks are finished in case of failure (#9290 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9290 Ensure pending tasks (e.g. network ops) are finished when net fails Reviewed By: heslami Differential Revision: D8777230 fbshipit-source-id: e57fcf1df6aa0ed8847923391502b666edb43674	2018-07-11 15:39:46 -07:00
Orion Reblitz-Richardson	edb88b5f3a	Update from Facebook (#8887 ) * add opencl + fpga context adds an opencl context inside caffe2/fb which can be used for fpga access * [Caffe2] Force tensor inference checks to be triggered during testing We've started to rely on TensorInference functions more for different analysis. This diff ensures that the TensorInference function's result matches what is expected from the definition of the operator. * Enable building //caffe2:torch with @mode/opt In @mode/opt, python runs out of a PAR, which breaks a lot of assumptions in the code about where templates/ folders live relative to __file__. Rather than introduce hacks with parutil, I simply turn template_path into a parameter for all the relevant functions and thread it through from the top level. * [Caffe2] Fix cost models for DotProduct and Div. Update Tensor Inference for dot product As title. DotProduct states that output is a 1-D tensor (https://caffe2.ai/docs/operators-catalogue.html#dotproduct) though code suggests it is either 0- or 1-D depending on inputs. TensorInference defined to support implementation. * [SG-MoE] Add an option to make the experts NOT as components * [nomnigraph] Rename and fixup convertToNeuralNetOperator API This will make things a bit cleaner * no longer symlink THNN.h and THCUNN.h * forced decoder network (onnx export) Closes https://github.com/pytorch/translate/pull/95 Add networks in ensemble_export.py to create a forced decoding network from PyTorch NMT checkpoints. This network takes an arbitrary numberized (source, target) pair and returns the model score for the translation, including penalties. 
Vocabulary reduction networks are also supported, but note that target indices which are not in the possible_translation_tokens generated for the source input will be trea * Revert schema change to fix production models Revert schema change to fix production models * MockLogDeviceReader - rebase on FIX # Goal 1), Build a make_mock_log_device_reader using make_mock_reader 2), Replace the real log_device_reader here: https://fburl.com/raihwf1p # Log by D8151734 Real log_device_reader: ``` I0529 20:29:05.373108 954994 tensor.h:839] Tensor print_net/log of type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >. Dims: (): read_net/ParseOpenTrainingRow:0 I0529 20:29:05.373244 954994 tensor.h:839] Tensor read_net/ParseOpenTrainin * [C2/D2][1/n]: Nonnegative-Constrained Optimization -- log barrier implement log barrier as a regularization method * Add teacher weight screening. Add teacher weight screening according to teacher labels. If teacher label is zero, we do not use the distill loss in the objective function. * Add NormalizerContext See task for more detail. This implementation is a copy of what exists for RegularizerContext except for how the parameters are defined in the model_definition thrift file. I'll try an alternative implementation which overrides the default arguments of functions instead like for argscopes in tensorflow. https://github.com/pytorch/pytorch/compare/master...MaximeBoucher:update-from-facebook-0939578c068c?expand=1 * Adding cosine similarity option in dot processor Add pairwise cosine similarity option in dot product. Add an option to concatenate dot product and cosine similarity. Add test cases. 
* [nomnigraph][redo] Concat elim for sparseNN Same as D7962948, which was reverted because Operator Schema was not defined * [pytorch] Revert pytorch/pytorch#7918 'Release GIL when copying to shared memory', breaks ASAN Revert this pytorch diff that breaks ASAN when running Filament in dev mode; in opt mode it gives "bad file descriptor" errors. Looks like a race when copying tensors to shared memory in multiple mp.Queue's (which spawn separate threads). https://github.com/pytorch/pytorch/pull/7918/files * [nomnigraph][mobile] Enable nomnigraph by default, use -Oz on nomnigraph related code to reduce code size enables nomnigraph and reduces codesize * [Warmup] Allow both offline incremental training and online training Change plan name on saving side and reading side to support both training type This diff depends on D8128530 and D8168651. * Revert D7802642: [Warmup] Allow both offline incremental training and online training This reverts commit afc213cf9b36cecf75333a788391c4d09f4afccc @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * Add legacy grad logic to fix div op on old graphs. Add legacy grad logic to fix div op on old graphs. * Correctly propagate operator failures Propagate errors from operators that throw exceptions and return false * Revert D8374829: [caffe2][nomnigraph][redo] Concat elim for sparseNN This reverts commit 6dda028c463e54bb5c32188bbbe9202107e188a5 @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * [Caffe2] Added extra_info to core.DeviceOption(), enforced extra_info to be inherited in scope.DeviceScope extra_info is a newly defined field in DeviceOption proto. This diff added extra_info to the core.DeviceOption(). And, In scope.DeviceScope(), this diff enforce the new scope to inherit the extra_info from old scope. 
* [opt] hgdirsync wasn't enabled, merge diverged code Here's the damage, P59732616 basically xplat was left behind but had the change from assert to CAFFE_ENFORCE * OMP parallelism over RoIs for RoIAlign op Simpler to parallelize over RoIs. Shouldn't affect other uses as it relies on the number of OMP threads set during startup. PR: https://github.com/pytorch/pytorch/pull/8562 * Use int64_t for shape in FillOps to avoid overflow of int32 * Implement Rotated RoIAlign op Based on Rotated RPNs as explained in https://arxiv.org/abs/1703.01086. The idea is simple - orientation/angle is added as an RPN anchor parameter and then the angle is further regressed similar to bbox coords. There are some additional changes related to NMS and IoU, but besides that it's a direct extension to Faster-RCNN. Further details in https://fb.quip.com/sZHlA1iMfWPZ. RoIs are represented in [center_x, center_y, width, height, angle] format. `angle` repre * Rotated RoIAlign op CUDA forward implementation CUDA forward impl for D8415490 * RoIAlignRotated op CUDA backward pass implementation TSIA * All remaining fixes to eliminate process_github.sh Most of this diff has already been reviewed separately, except for the parts relating to _thnn/utils.py and _utils._internal.py remove skipIf(True, 'Fbcode') line from process_github.sh replace sed of cpp file with #ifdef to control cudnnDestroy use undo sync-time deletion of .gitattributes, remove process_github.sh switch to using _utils._internal rather than try-import-except This diff also fixes the open-source bug where rebuilds have * Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training" Original commit changeset: 7707d2efe60e The original diff was backed out because the online trainer package was backed out. 
This code would only work with new online trainer package * [easy] improve error log in adagrad op as title * re-allow use of thnn_h_path This fixes cffi usage in OSS * [4/4] [tum] paralyzing layerNorm for GPU full sync as title * add compile=False to pytorch tests, remove hack with pyc * Add shape and type inference for RowWiseArgMax operator See title * Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training" This reverts commit 78167eeef0af16b60f72c82f9dcdda9b41b4dcbd @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * [fix-flaky-test] mock_hive_reader_test flaky, because GlobalCounter collects local counts intervally # Problem `MockHiveReader` uses `GlobalCounter` to limit `max_examples`. GlobalCounter on server node collect local counts from worker nodes every 1 sec. This 1 sec delay makes it impossible to limit exactly to the `max_examples`, it will definitely exceed `max_examples`. # Plan Given, ``` Expected num_examples = max_examples + num_examples/sec (Read Speed) x 1 sec (GlobalCounter Sync Int * [Caffe2] Fix FCGradient cost inference. Prevent overflow in cost inference FCGradient missed a factor 2 in the `num_outputs == 3` case. Overflow was occurring with flop calculation for FC. Changed types to `uint64_t` to prevent future problems. * Fix binary ops with empty inputs Fix binary ops with empty inputs * Support the filling of input blob with provided data as title for Biz Integrity case * Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"" Original commit changeset: 30c55dd38816 Original diff is reverted due to introducing bad integration test. Fixed the integration test. * [c2][easy] improve pack ops error loggings as desc. 
* Add ShapeTypeInference for LpNorm operator As desc * Shard test_nn to reduce runtime for each test target Closes https://github.com/pytorch/pytorch/pull/8793 The current test_nn would time out and be disabled in GreenWarden, and we need to have an option to split it up in order to pass the stress test. Right now GreenWarden roughly allows running 100 test cases in test_nn before timing out, and here we have an option to divide test_nn into 30 shards (with ~40 tests in each shard) to allow for some test suite growth in the future. * Change default caffe2_streams_per_gpu to 1 * Remove IN_SANDCASTLE from common.py and test_nn.py We prefer to disable the failing tests through Sandcastle UI instead. * Add a new class for an updated prof_dag.proto This diff contains: - An updated prof_dag.proto that contains blob profiles. - A class to deserialize this information (serialization is in a follow up diff) - Update to separate profiling information from NeuralNet (and use it as part of the class above). - Unit tests * Lambdarank for SparseNN This diff adds a lambda_rank_layer for SparseNN. changes include 1) Adds support for multi sessions in c2 op 2) Adds support for two different loss functions in c2 op 3) Unit tests for op * Revert D8586950: Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"" This reverts commit 012220ed63eccc35659a57b31d16a3625da6317b @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * [easy] A few fixups to multithread predictor benchmark (1) support perf on T6 server (2) remove dead code * fix a bug about the map size as title * Fix reduce sum on in-place case. Fix reduce sum on in-place case. * [Warmup] Reland reverted diff Allow both offline incremental training and online training Closes https://github.com/pytorch/pytorch/pull/8827 fix net transform integration test. 
Allow offline and online trainer to coexist D7802642. * Add StoreHandlerNotAvailableException Add an exception for a store that is not available or has been deleted. * Use exception handling for fault tolerance, missing KV store Remove status blobs to communication ops so that exceptions propagate on failure. * [C2/D2][2/n]: Nonnegative-Constrained Optimization -- bounded grad proj for simple bounded constrained optimization, incl non-negative box constraints. * [GanH]: Adaptive Weighting with More Estimations With implemented positivity optimization, we now learn adaptive weights with different parameterizations. This improves parameter estimation and training stability. * Revert some changes for landing * Remove AutoNoGIL in StorageSharing * Temporarily disable net_tests * Revert "[Caffe2] Force tensor inference checks to be triggered during testing" This reverts commit 67ef05c22b2f71b4a489695384932f968384a2a4. * Revert "Fix reduce sum on in-place case." This reverts commit 6cb8a8e1b3db7b6d20941b0053e3f3836068eb64. * Revert "Revert "Fix reduce sum on in-place case."" This reverts commit 130a257c0893dc09f4bd6e6a45d112261807fd2c.	2018-06-26 14:55:48 -07:00
Edward Z. Yang	88db4c816e	Disable flaky Chaining tests (#8601 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2018-06-18 11:24:37 -04:00
sf-wind	752bb954b4	Update RunAsyncFailure test (#8486 ) Fix RunAsyncFailure test	2018-06-14 12:05:57 -04:00
sf-wind	5b86c3af4a	Update from facebook (#8384 ) * [fix] fixup the bias multiplier data access issue Hotfix for failures in conv_transpose * [D2][Easy]: lint regularizer lint with black * [GanH]: Split mu in adaptive weight for diagnose * [Dper] Add the ability to split FC weights into multiple smaller ones * fix SumReduceLikeOp for empty blob as desc. * add ctc_greedy_decoder for caffe2 ctc_greedy_decoder same as tf's * Update event callback handling Allow multiple callbacks per event * Add WeightedSum layer The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in honet: https://fburl.com/f4rmolg2 crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm * Replicate DAG's behavior Some callers expect RunAsync to block, replicate that behavior in case of explicit 'dag' net type * [dper] layernorm layer as title * Override dag, async_dag, async_polling Overriding dag, async_dag and async_polling with async_scheduling * Name the thread pools Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead. * [Caffe2] FillerOp should support int64_t dimensions Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc) * Remove caffe2/caffe2/contrib/torch/ It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!) #accept2ship * Fix linearWarmup multiplier check The multiplier needs to be non-negative, not strictly positive. * Revert D3314316 This is after 2 years and we do not seem to have a use case for this one, so for the sake of clean API design we should potentially remove this. 
This would allow us to potentially pass in arguments to optionally construct an object, although it is indeed a little bit unclear how we can reuse existing objects if constructor arguments are passed in. In any case, we may want to remove this dangling feature. * Speedup generate proposals by partial_sort. Speedup generate proposals by partial_sort. FACEBOOK: - Saw speed improvement for training with this op. - Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. See next diff for details. * More parallel processing friendly for CPP version of GenerateProposals. More parallel processing friendly for CPP version of GenerateProposals. * [DT] [43/n] Lift stop conditions inside reader code back to flow control 1. Split multi_reader function into local_reader and remote_reader 2. Lifted stop conditions inside Limiter back to flow control 3. Split epoch flow building logic into 3 cases: - single machine (1 reader, 1 trainer on trainer0 node, no PS) - (1 reader + 1 trainer) on trainer0 node, has PS - multiple readers, readers do not share nodes with trainers, might have PS or not * Resolve conflicts for torch/_thnn/utils.py * [Caffe2] Handle image decoding errors Image decoding errors can make the whole training fail. This diff is to handle them 1.Catch imdecode exceptions and check if decoded image has zero columns or rows. This is counted as decoding errors. 2.Replace the image with empty in case of error 3.Count the number of errors and throw runtime exception if the rate reaches given number The empty image data is kept. It might introduce noise in the training data. * Update MKL exporter to IDEEP ops TSIA * [Caffe2] GlobalInit is thread safe, fixing the comment With the mutex and lock, GlobalInit is thread safe. Update the comments. 
* Back out "Add support for generating ATen files during fbcode build" Original commit changeset: 28970ddba353 @override-unit-failures (Note: this ignores all push blocking failures!) * [DT]: fix predictor save similar to D6610058, here we add the fix for distributed online training * Remove net_singlethread_async_gpu.cc Closes https://github.com/caffe2/caffe2/pull/2528 This removes net_singlethread_async_gpu.cc as part of our effort to clean CUDAContext and the net executors. * Inline DFS task execution Add a DFS inline task execution mode in executor * Add c10 folder to fbcode This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten. * add dependencies for online trainer Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators Relevant post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/ * Resolve conflicts for tools/jit/gen_jit_dispatch.py * [Fix] sparse regularization in distributed training * Support advanced pooling options in sum processor * support advanced pooling options in sum processor * remove redundant code * support attention in sum processor * Improve shard logging in net tracing code Make it handle arbitrary shard ids instead of just one digit ids. * [Caffe2] Call GlobalInit in predictor only in mobile FACEBOOK: Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens: User does not call GlobalInit and initFacebook after program starts User sets a flag manually: https://fburl.com/mcsumw7d User calls OSS predictor. OSS predictor calls GlobalInit GlobalInit calls initFacebook initFacebook resets all flags: https://fburl.com/tolszha1 Thus, the user manually set flags are overwritten This would happen anytime GlobalInit is called long after the program starts. 
I suppose the intention of the user in this case is not to call GlobalInit throughout the program, but use Caffe2 regardless (is that desired?) But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2. This issue doesn't exist in mobile, since initFacebook is not called on mobile. For now, guard the GlobalInit in predictor for mobile only. May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this. * resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py * Add empty fix for SumLikeReduceOp Add empty fix for SumLikeReduceOp * Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9 @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * Remove Declarations.yaml * Include common.h * Change std::stoi to caffe2::stoi * Add thread_name.cc to the CMake file * No need to subtract 1. Fix test segfaults * Fix NetTest, ObserverTest Fix tests (cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41) * CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU * Add a variable to avoid conversion resizing issue
* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py * Add empty fix for SumLikeReduceOp Add empty fix for SumLikeReduceOp * Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9 @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * Remove Declarations.yaml * Include common.h * Change std::stoi to caffe2::stoi * Add thread_name.cc to the CMake file * No need to subtract 1. Fix test segfaults * Fix NetTest, ObserverTest Fix tests (cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41) * CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU * Add a variable to avoid conversion resizing issue * Remove the code per soumith's comments * Remove the code per soumith's comments * Remove blank lines in the end of file * Resolve conflicts for torch/_thnn/utils.py * Update MKL exporter to IDEEP ops TSIA * Back out "Add support for generating ATen files during fbcode build" Original commit changeset: 28970ddba353 @override-unit-failures (Note: this ignores all push blocking failures!) * add dependencies for online trainer Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/ * Resolve conflicts for tools/jit/gen_jit_dispatch.py * Support advanced pooling options in sum processor * support advanced pooling options in sum processor * remove redundant code * support attention in sum processor * resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py * Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9 @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! 
@cause_a_sev_many_files * Remove Declarations.yaml * Include common.h * Change std::stoi to caffe2::stoi * [caffe2] uprade IDEEP and hotfix for conv op accuracy issue (#8364) * [IDEEP] Upgrade IDEEP version Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com> * [IDEEP] Fix accuracy issue in conv op Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com> * Fix build error due to lack of src in CMakeLists Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com> * Remove the code per soumith's comments * [ONNX] Add an ATen fallback pathway for ONNX export (#8273) * ATen fallback for ONNX export * Move to enum * Fix model test * Add comment * Address comments BC interface * Remove imaginary file (#8415) * [Caffe2] Enable AMD/MIOPEN ops for Caffe2 (#8306) * Add hip support for caffe2 core * Add MIOPEN header/wrapper to caffe2 core * Add HIP device into caffe2 PB * top level makefile change for rocm/hip * makefile scaffolding for AMD/RocM/HIP * Makefile scafodding for AMD/RocM/HIP; add makefile/utility for HIP files * caffe2 PB update for AMD/ROCM HIP device * Add AMD/RocM/Thrust dependency * HIP threadpool update * Fix makefile macro * makefile fix: duplicate test/binary name * makefile clean-up * makefile clean-up * add HIP operator registry * add utilities for hip device * Add USE_HIP to config summary * makefile fix for BUILD_TEST * merge latest * Fix indentation * code clean-up * Guard builds without HIP and use the same cmake script as PyTorch to find HIP * Setup rocm environment variables in build.sh (ideally should be done in the docker images) * setup locale * set HIP_PLATFORM * Revert "set HIP_PLATFORM" This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad. 
* continue the build script environment variables mess * HCC_AMDGPU_TARGET * Cleanup the mess, has been fixed in the lastest docker images * Assign protobuf field hip_gpu_id a new field number for backward compatibility * change name to avoid conflict * Fix duplicated thread pool flag * Refactor cmake files to not add hip includes and libs globally * Fix the wrong usage of environment variables detection in cmake * Add MIOPEN CNN operators * Revert "Add MIOPEN CNN operators" This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a. * Add MIOPEN pooling operator * Add MIOPEN activation operator * Add MIOPEN softmax operator * Add MIOPEN spatial batch norm operator * Add MIOPEN loacl response normalization operator * Add MIOPEN conv operator * Clean-up LRN ops * enable fp16 in MIOPEN pool ops * Enable fp16 for MIOPEN relu op * Enable fp16 for MIOPEN spatial batch norm op * code clean-up * revert float16 support * Create Caffe2 python binding for AMD/ROCM/HIP * Add op fallback for HIP operator * add hip src/test files in cmake * exclude hip src/test files * fix python binding for hip backend * fix MIOPEN pooling op workspace * hack to compile miopen operators * fix include path for MIOPEN ops * Fix include path * Add HIP math utilities * Fix path for HIP math utils * cmake fix * Cmake fix / hipcc for hip files * suppress hipcc warning * cmake fix /replcae USE_HIP with USE_ROCM * revert LoadHIP.cmake change * fix include for thrust/cub-hip * include path fix for conversion.h * Updated with latest upstream changes * clang format fixes * Context_hip updates * Fixed typo in rocblas handle get function * Updated hipified math utils * Updated math hip test util * Updated context hip test * Updated common_hip * Updated net async dag for HIP * Added MIOPEN in operator hip test * fix * C2 dependencies clean-up * fix include path for building custom protobuf * Decouple miopen pool op and conv_pool_op base * cmake refactor * fix operator_hip_test * move all hip/miopen ops 
files into caffe2/operators/hip * sanitize cmake * permission issue * remove extra parenthesis * remove artifact from resolving merge conflict * cont. sanitize cmake files * fix syntax error * sanitize conversion.h * . * Revert "." This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9. * clang-format * Enable some reduce operators' ONNX backend tests (#8418) * fix old comment to point to the right file (#8416) * Stop pinning nccl version. (#8421) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428) * Enable some of the ONNX backend test on broadcasting (#8423) * Enable some of the ONNX backend test on broadcasting * enable gemm broadcast * Expose proto utils and ONNX (#8073) * Expose proto utils and ONNX from PyTorch libcaffe2.so * Try to use protobuf from _C.so * Fix ONNX proto header include * Adjust order of imports for ONNX until nanopb goes away * Set and use ONNX_NAMESPACE for PyTorch builds * Show protobuf summary for all builds * Add ONNX_NAMESPACE for cpp_build * Statically link libprotobuf.a into libtorch.so * Set ONNX_NAMESPACE on Windows build * Move core/dispatch up as well * Add /MD flag for Windows build of _C * Potential Windows fix for ONNX and protobuf * Add direct linkage from _C to ONNX on Windows * Only include protobuf wrapper for PyTorch * Pass extra_compile_args to _nvrtc ext build * Remove installation of .a files * Rebase creates some weird situations, revert them manually * Remove more weird changes due to rebase * Need to add thread_name.cc after merge	2018-06-13 13:10:45 -07:00
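Note on the "[Caffe2] Handle image decoding errors" item in the commit above: the three steps it lists (treat exceptions and zero-sized results as decoding errors, substitute an empty image, throw once the error rate reaches a limit) might be sketched roughly as follows. This is an illustration in Python, not the Caffe2 implementation; `DecodeErrorTracker`, `toy_decoder` and `max_error_rate` are hypothetical names, and images are modelled as plain nested lists.

```python
class DecodeErrorTracker:
    """Counts decoding failures and raises once the failure rate reaches a limit.

    Hypothetical helper for illustration only; not part of Caffe2.
    """

    def __init__(self, max_error_rate):
        self.max_error_rate = max_error_rate
        self.total = 0
        self.errors = 0

    def decode(self, raw, decoder):
        self.total += 1
        try:
            image = decoder(raw)
            # A decoded image with zero rows or zero columns counts as an error.
            if len(image) == 0 or len(image[0]) == 0:
                raise ValueError("decoded image is empty")
        except Exception:
            self.errors += 1
            if self.errors / self.total >= self.max_error_rate:
                raise RuntimeError(
                    "decode error rate %d/%d reached the limit"
                    % (self.errors, self.total)
                )
            # Replace the broken image with an empty one and keep going.
            return []
        return image


def toy_decoder(raw):
    """Stand-in decoder (assumption): fails on b"bad", otherwise returns a 2x2 image."""
    if raw == b"bad":
        raise ValueError("cannot decode")
    return [[1, 2], [3, 4]]
```

As the commit message itself warns, the empty placeholder stays in the data stream, so it may add noise to training.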
bddppq	f94ae3ba1d	Update from facebook (#7696 ) * Fix handling of empty batches in SumReduceDimsOp As titled * Deferrable async_scheduling finishRun fix Proper order of finishing run operations in deferrable_async_scheduling net * Simplify exception handling in async_scheduling Simplify exception handling, no need to busy wait, thread that processes the last task can finish the run * [C2]worker_coordinator_memorize_worker_ids As titled. This is related to T28689868, where the number of blobs we want to create is equal to the number of worker ids * Add unit test for nets with no type set * Ignore total length argument in sympolic_pad_packed_sequence 1- There was a mistake in the code that total_length was added to the wrong symbolic function (pack_padded_sequence) instead of (pad_packed_sequence) 2- No need to throw an exception if total_length is given since it is only used to enable data_parallel training on multi-gpus and doesn't have anything to do with onnx export, so just ignore it. https://fburl.com/tk4gciqp * Add support for MKLDNN to async_scheduling Just add MKLDNN as a possible CPU option to async_scheduling's pool function * [AuFL][ensemble] support branch output for prediction This diff supports using predictions from different branches and thus enables model ensembling (not fully independent). * Fix a bug in add_loss in layer_model_helper As titled. * Support lradaption for adam 1.lr adaption operator 2.apply to dense adam * Perf tweaks for async_scheduling Restore single pool option + remove unnecessary (no-ops) calls * add quantization to SparseSimdAdagradOp add a bunch of quantization signatures to SparseSimdAdagradOp, implementations to come next * [sr] [codemod] Change all SR callsites to use new API @allow-large-files This diff refactors all callsites of SR to use the slightly changed API introduced in the diff below. Really what this means is that you need to include the correct header. 
Also if you were using `ClientFactory::newFactory` you need to not prefix it with `ClientFactory::`. ``` cd ~/fbsource/fbcode find ./ -type f -exec sed -i -e 's:#include "servicerouter/client/cpp2/ClientFactory.h":#include "servicerouter/client/cpp2/ServiceRouter.h":' -e 's:#include <servicerouter/client/cpp2/ClientFactory.h>:#include <servicerouter/client/cpp2/ServiceRouter.h>:' -e 's/ClientFactory::newFactory(/newFactory(/g' {} \; ``` Also manually fixed spots that couldn't be done automatically (or broke because they depended on transitive includes). * Back out "Fix handling of empty batches in SumReduceDimsOp" Original commit changeset: 282da1730cc2 This commit is blocking the Github->fbcode sync, which really needs to get merged ASAP. D7881937 which this diff depends on will be reverted in the sync D7990948 which causes this to break. The sync diff cannot be patched with this reversion because it must be landed against base revision 5c8c099 , and D7881937 must not be included in the sync diff because it is breaking GPU tests that are not available in sandcastle : https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-cuda8.0-cudnn6-ubuntu16.04-test/3638/console for one example. * Add the flow to support operator benchmark 1) generate model with the operator 2) upload to everstore 3) generate model spec into json file 4) start running the benchmark * [tum][gpu] Connect DPM trainer with flow and unit tests This diff: - Fix some small bugs for Yiming's recent changes to parallelizer, so it suits real use cases. - Add correct tags to the TUM code, so we can do data parallel transform - pass extra info when instantiation. - add unit test for using DPM in TUM model After this diff, we can do simple box, multi-gpu fully-sync trainer for TUM in Fblearner workflow, but may still need to do speed benchmarking. * w/o normalized lradaption for adam dense only The previous lr adaption includes a normalization step when performing the dot product operation. 
This is not exactly same as what is proposed in the paper. I add normalization as an option. Without it, the operator performs exactly what the paper proposed. With the option, we add the normalization step * [fb] Use SharedPromise in DeferrableAsyncSchedulingNet This code is to simplify DeferrableAsyncSchedulingNet by removing condition variable + small fixes * [tum] implement cuda sparseLengthsMean and LengthsMean as title * Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function. Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function. * Move feature_to_index to FeatureSpec.feature_to_index move feature_to_index to FeatureSpec.feature_to_index to avoid override other fields * [Caffe2] Rename bytes_moved to bytes_written Just a rename in preparation for supporting bytes_read. * [c2] fix ReduceFrontSumOp for empty case by setting 0 otherwise, it may use the results from last iteration when it's empty batch. * [Caffe2] [Int8] Improve Intel CPU performance * [Easy] Improve PrependDim op logging as titled * DBFileReader expand db_path using os.path.expanduser(..) Since there are a lot of possible use cases of `DBFileReader` to read from user home path, like `~/local/sample.db`, I want to save people's trouble of calling `os.path.expanduser(db_path)` themselves. * [Caffe2] Add bytes_read to cost structure We're adding analytical read bytes to cost functions. This extends the structure accordingly for all CostInference defined operators. Additionally, some small bug fixes were performed: 1) Cost functions now extract type information of operands instead of assuming float * Fix sleef on aarch64 for hhvm @bypass-lint Rename flag * Remove duplicated part in caffe2/ideep/operators/conv_op.cc should be sync error * Rename test helper function test_adagrad_sparse_helper to adagrad_sparse_test_helper to avoid confusing pytest	2018-05-19 23:10:48 -07:00
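Note on the "DBFileReader expand db_path using os.path.expanduser(..)" item in the commit above: the effect is that a path such as `~/local/sample.db` is resolved to the user's home directory before use. A minimal sketch, where `normalize_db_path` is a hypothetical helper name rather than the actual `DBFileReader` code:

```python
import os


def normalize_db_path(db_path):
    """Expand a leading '~' so callers can pass paths like '~/local/sample.db'."""
    return os.path.expanduser(db_path)


# Absolute paths pass through unchanged; '~' paths are resolved against the home directory.
print(normalize_db_path("/tmp/sample.db"))
print(normalize_db_path("~/local/sample.db"))
```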
Eric S. Yu	7bc3414f8f	Fix caffe build failure with -O0 (#7570)	2018-05-16 11:19:15 -04:00
Paul Jesse Hellemn	b875fb281c	Update from facebook (#7451 ) * [bootcamp] Improve "Shape" operator to support axes specification To improve .shape operator of Caffe2 to support x.shape(tensor, axes), which takes an optional int array "axes" as input. For example, x.shape(tensor, [1, 0]) will return the dimension for axis 1 and 0 following the specified order. For current version, "axes" input allows duplications and can have arbitrary length. * Back out "Add barrier net that runs before training nets" Original commit changeset: b373fdc9c30f. Need additional changes to some callers to support barrier failures. * Change warning to verbose log to reduce log spam The `LOG(WARNING)` was a bit spammy for regular use so lets just make it a `VLOG`. * Extract the shared code from different caffe2_benchmark binaries The OSS benchmark and Internal benchmark will share most functions in the benchmark. * Support MFR in sequence training As titled. * Make knowledge distillation work with using logged prediction feature as teacher label. 1) Add loading raw dense feature as teacher label. 2) Optional calibration function for teacher label 3) Add teacher label into generic unit test 4) Deprecated TTSN workflow version using feature_options to config teacher label * [C2/CUDA]: unjoined cross entropy sigmoid as desc * Add async_scheduling executor into deferrable_net_exec_test Add async_scheduling into tests and fix some exception cases * Fix Event disabled error When disabling event in RNN ops make sure we don't call Finish on disabled event from op's RunAsync * cuda ensure cpu output op can handle both TensorCPU and TensorCUDA as desc. * [C2 Core] Infer input device option in C2 hypothesis_test checkers Improve how we default input blob device options. Previously it defaults as where op lives but it is not necessarily the case. 
For example: CopyCPUToGPU * [C2 Op]SplitByLengthsOp CPU/GPU implementation [C2 Op]SplitByLengthsOp CPU/GPU implementation * fix undefined symbol error not sure why we're getting undefined symbol even with link_whole = True Need to figure out why but need this workaround for now * Add tools in DAIPlayground platform to help debugging models Add additional tools to allow Plauground override individual method defined in AnyExp. This will allow user to create module that specificly change certain default method behavior. An example included in this diff is deactivating test model and checkpointing. When debugging any model problems, switching off components helps me quickly narrow down the location of the bug. The technique is extensively used in task T27038712 (Steady memory increase in EDPM, eventually resulting in gloo/cuda.cu:34: out of memory) * add shape and type inference for int8 conversion operator * Fix flaky test for group_norm Fix flaky test for group_norm * Fix group_norm_op_test flaky Fix group_norm_op_test flaky * Implementation of composite learning rate policy In many state-of-the-arts deep learning works, people use a simple trick to schedule the learning rate: use a fixed learning rate until error plateaus and then switch to a different fixed learning rate, and so on. In this diff, we implemented a simple version of the composite learning rate. The user gives a set of learning rates policies and corresponding iteration nums, and the optimizer will change the learning rate policy based on the number of iterations so far. For example, the user give two learning rate policies, one is FixedLearningRate and PolyLearningRate, with an iteration number of 1k. Then the first 1k iteration, we use FixedLearningRate. For the following iterations, we use PolyLearningRate. * Split two use cases of CachedReader into two classes, DBFileReader and CachedReader # Use Cases: 1). input: DB file -> output: DatasetReader. Use DBFileReader. 2). 
input: Reader -> build cache DB file -> output: DatasetReader. Use CachedReader. # Changes to CachedReader: 1). Move db_path to the constructor. Because in mock reader. cache will always be built ahead. # Changes to tests: 1). Make a separate TestCase class for CachedReader and DBFileReader. 2). Make it possible to add more test functions by adding setUp, tearDown and _make_temp_path. 3). Make delete db_path more general. `db_path` could be a file for `log_file_db`, but could also be a directory for `leveldb`. * Back out "On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization" Original commit changeset: 4489c6133f11 * Fix LARS bug Fixed a bug in the LARS implementation which caused all subsequent blobs not using LARS to have the LARS learning rate multiplier applied to them. * [tum] support sparse init & add uniformFill option as title * Propagate exception for async nets Capture the exception when an exception is thrown in async nets and re-throw it after wait(). This allows exceptions to be propagated up to the caller. This diff was a part of D7752068. We split the diff so that C2 core files changes are in a separate diff. * Automatic update of fbcode/onnx to 69894f207dfcd72d1e70497d387201cec327efbc Previous import was 403ccfbd0161c38f0834413d790bad0874afbf9a Included changes: - [69894f2](https://github.com/onnx/onnx/commit/69894f2): Use op schema.all tensor types in random like definitions (#865) <Scott McKay> - [b9d6b90](https://github.com/onnx/onnx/commit/b9d6b90): Clarify random like operators (#846) <Scott McKay> - [fc6b5fb](https://github.com/onnx/onnx/commit/fc6b5fb): Refactor shape inference implementation (#855) <anderspapitto> - [b7d8dc8](https://github.com/onnx/onnx/commit/b7d8dc8): fix cmake warning message (#863) <Eric S. 
Yu> - [f585c5d](https://github.com/onnx/onnx/commit/f585c5d): add pytorch-operator test for tile (#831) <Wenhao Hu> - [993fe70](https://github.com/onnx/onnx/commit/993fe70): add install step (#832) <Eric S. Yu> - [68bc26c](https://github.com/onnx/onnx/commit/68bc26c): add type inference for traditional ml ops except classifier ops. (#857) <Ke Zhang> - [9cc0cda](https://github.com/onnx/onnx/commit/9cc0cda): fix string representation of scalar types (#858) <G. Ramalingam> - [1078925](https://github.com/onnx/onnx/commit/1078925): fix y in pow test case to scalar (#852) <Wenhao Hu> - [c66fb6f](https://github.com/onnx/onnx/commit/c66fb6f): Add some math function shape inference (#845) <anderspapitto> - [ff667d1](https://github.com/onnx/onnx/commit/ff667d1): Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853) <Marat Dukhan> - [11c6876](https://github.com/onnx/onnx/commit/11c6876): clear initializer names when clear initializer (#849) <Wenhao Hu> - [73c34ae](https://github.com/onnx/onnx/commit/73c34ae): Clarify FeatureVectorizer description. (#843) <Scott McKay> - [1befb9b](https://github.com/onnx/onnx/commit/1befb9b): Remove useless text in docs (#850) <Lu Fang> - [e84788f](https://github.com/onnx/onnx/commit/e84788f): Fix SELU attributes' default values (#839) <Lu Fang> - [ebac046](https://github.com/onnx/onnx/commit/ebac046): Add tile test case (#823) <Wenhao Hu> - [8b7a925](https://github.com/onnx/onnx/commit/8b7a925): a few more shape inference functions (#772) <anderspapitto> - [9718f42](https://github.com/onnx/onnx/commit/9718f42): Make the coefficient non optional for LinearClassifier (#836) <Jaliya Ekanayake> - [ef083d0](https://github.com/onnx/onnx/commit/ef083d0): Add save_tensor and load_tensor functions for Protos (#770) <Lu Fang> - [45ceb55](https://github.com/onnx/onnx/commit/45ceb55): Check if CMAKE_BUILD_TYPE set before project(). 
(#812) <Sergii Dymchenko> - [4b3d2b0](https://github.com/onnx/onnx/commit/4b3d2b0): [WIP] reenable shape inference tests (#834) <anderspapitto> - [22d17ee](https://github.com/onnx/onnx/commit/22d17ee): RNN tests: LSTM, GRU, SimpleRNN (#739) <Peyman Manikashani> - [de65b95](https://github.com/onnx/onnx/commit/de65b95): dimension denotation (#443) <Tian Jin> - [eccc76e](https://github.com/onnx/onnx/commit/eccc76e): fix field number issue in onnx operator proto and enable its build (#829) <Ke Zhang> - [d582beb](https://github.com/onnx/onnx/commit/d582beb): disable shape inference test to unbreak ci (#830) <Lu Fang> - [485b787](https://github.com/onnx/onnx/commit/485b787): function proto for composite op. (#802) <Ke Zhang> - [cd58928](https://github.com/onnx/onnx/commit/cd58928): specify defaults for attributes of Affine op (#820) <G. Ramalingam> - [7ee2cf9](https://github.com/onnx/onnx/commit/7ee2cf9): merge the dummy backend back into the main one (#743) <anderspapitto> - [1c03a5a](https://github.com/onnx/onnx/commit/1c03a5a): [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551) <Marat Dukhan> - [3769a98](https://github.com/onnx/onnx/commit/3769a98): Rename real model test case from VGG-16 to ZFNet (#821) <Lu Fang> * [C2]ReluN Op relu n op. tf reference: https://www.tensorflow.org/api_docs/python/tf/nn/relu6 * Call destructor when assigning a blob value * Add executor overrides Add executor overrides flag to enable migration to async_scheduling executor * Add barrier net that runs before training nets - attempt #2 Add a synchonize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before start training. This reduce chances of the faster shards timing out during GLOO AllReduce. Removed explicit data_parallel_model.py.synchronize call in holmes workflow. 
This change was landed previously but caused errors for some EDPM workflows - See https://fb.facebook.com/groups/1426530000692545/permalink/1906766366002237/ - because EDPM assumes any call to CreateOrCloneCommonWorld and Gloo ops are wrapped in exception handlers but in this case exception thrown in the barrier init net is not handled. To address this issue, we add _CreateOrCloneCommonWorld to the param_init_net instead of a new barrier init net. Since errors for param_init_net run is handled gracefully and re-rendezvous, it should fixes the problem. * Handle empty nets in async_scheduling Make sure we don't get stuck on empty nets * use CUDA_ARCH for conditional compile * [C2 fix] infer function for ensure_cpu_output_op * Update group_norm test to reduce flaky test * Fix lr_multiplier for GPU	2018-05-10 23:14:27 -07:00
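Note on the "Implementation of composite learning rate policy" item in the commit above: the idea of switching between sub-policies by iteration count can be sketched as below. This is only an illustration under stated assumptions. The function names are hypothetical, the polynomial decay form is assumed, and restarting each sub-policy's iteration counter at its segment boundary is a design choice of this sketch, not something the commit message specifies.

```python
def fixed_lr(base_lr):
    """Policy that always returns the same learning rate."""
    return lambda it: base_lr


def poly_lr(base_lr, power, max_iter):
    """Polynomial decay (assumed form): base_lr * (1 - it / max_iter) ** power."""
    return lambda it: base_lr * (1 - min(it, max_iter) / max_iter) ** power


def composite_lr(schedule, it):
    """Pick the active sub-policy for iteration `it`.

    `schedule` is a list of (num_iter, policy) pairs; the last policy covers
    all remaining iterations, so its num_iter is ignored.
    """
    start = 0
    for num_iter, policy in schedule[:-1]:
        if it < start + num_iter:
            return policy(it - start)
        start += num_iter
    return schedule[-1][1](it - start)


# Example from the commit message: FixedLearningRate for the first 1k
# iterations, PolyLearningRate afterwards.
schedule = [(1000, fixed_lr(0.1)), (None, poly_lr(0.1, 1.0, 1000))]
for it in (0, 999, 1000, 1500, 2000):
    print(it, composite_lr(schedule, it))
```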
Orion Reblitz-Richardson	acea18a54a	Fix net_test ParseFromString usage.	2018-03-30 21:00:44 -07:00
Ilia Cherniavskii	028a598cb9	Expose thread pool to operators Adding ExecutorHelper interface between executor and operators	2018-03-30 21:00:44 -07:00
Orion Reblitz-Richardson	1d5780d42c	Remove Apache headers from source. * LICENSE file contains details, so removing from individual source files.	2018-03-27 13:10:18 -07:00
Yangqing Jia	611a89c4b6	Remove more protobuf APIs. (#2348) * Wrap ShutdownProtobufLibrary * Remove text_format.h header and only put the function in proto_utils.h * ParseFromString returns bool	2018-03-21 10:29:45 -07:00
Marat Dukhan	a6a75621cb	Fix warning in net_test Summary: Fixes an annoying warning when building for Android with tests enabled. Closes https://github.com/caffe2/caffe2/pull/1970 Reviewed By: pietern Differential Revision: D7011817 Pulled By: Maratyszcza fbshipit-source-id: 06162d5c5b12ed939581ce9a8498fbed3eb2c47b	2018-02-16 10:56:15 -08:00
Orion Reblitz-Richardson	94b659034e	Potential fix to net_test failure with one GPU Summary: * Putting up to test on Jenkins since I can't test locally on my Mac. Might fix https://github.com/caffe2/caffe2/issues/1796 but I haven't touched these files before, so it's a guess. :) Closes https://github.com/caffe2/caffe2/pull/1826 Reviewed By: Yangqing Differential Revision: D6832918 Pulled By: orionr fbshipit-source-id: 22bdeafa031dbe6457d81cb105b41a451ca3a25d	2018-01-28 23:25:51 -08:00
Ilia Cherniavskii	304e64e70d	Fix NetTest.ChainingForDifferentDevices Summary: Closes https://github.com/caffe2/caffe2/pull/1495 Reviewed By: bwasti Differential Revision: D6381246 Pulled By: pietern fbshipit-source-id: bdc104dff0c667bde08fa0512b5956a39e84ad7e	2017-11-21 11:04:36 -08:00
Ilia Cherniavskii	2792de0d22	Revert D6331513: [caffe2][test] Fix NetTest Summary: This reverts commit b9e8ec9afc110b0284550c4818bde15ae108fa2f bypass-lint Differential Revision: D6331513 fbshipit-source-id: f24cb46fbcbcdbea2523297c567b08ceaaa93ea6	2017-11-14 21:33:12 -08:00
Ilia Cherniavskii	80c3f8fa88	Fix NetTest Summary: Split into cpu and gpu parts, update chaining test Reviewed By: Yangqing Differential Revision: D6331513 fbshipit-source-id: b9e8ec9afc110b0284550c4818bde15ae108fa2f	2017-11-14 19:18:28 -08:00
Ilia Cherniavskii	1149b9bbb5	Polling async net executor Summary: Implementation of polling async net executor. Notes: - New net executor async_polling - schedules CPU and GPU ops asynchronously, uses single polling thread - Events: update to Caffe2 events to support async CPU events, adding new methods: Query() - non-blocking checking of event states: INITIALIZED -> RECORDED -> SUCCESS/FAILED ErrorMessage() - when operation runs asynchronously and fails calling this on event will give error message - Tasks: using existing DAGNet's algorithm to compute CPU and GPU chains, a separate task for each chain - Polling: using single thread to query state of events - for CPU tasks atomically queries task state, for GPU task - uses cudaEventQuery; using Event - Scheduling of CPU ops: using global thread pools - Scheduling of GPU ops: using GPU thread pool per GPU device Reviewed By: dzhulgakov Differential Revision: D5985110 fbshipit-source-id: a9de7fcbb71d046a3aa1b573072b89a65dfeee8c	2017-11-03 07:27:44 -07:00
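Note on the polling executor commit above: the event life cycle it describes (INITIALIZED -> RECORDED -> SUCCESS/FAILED, a non-blocking Query(), and ErrorMessage() for failed asynchronous work) can be modelled in a few lines. The sketch below is a Python illustration only. The class and function names are hypothetical, a thread pool stands in for the CPU/GPU pools, and the loop in `poll_until_done` plays the role of the single polling thread.

```python
import enum
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class EventState(enum.Enum):
    INITIALIZED = 0
    RECORDED = 1
    SUCCESS = 2
    FAILED = 3


class Event:
    """Minimal event with a non-blocking query, loosely mirroring Query()/ErrorMessage()."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = EventState.INITIALIZED
        self._error = ""

    def record(self):
        with self._lock:
            self._state = EventState.RECORDED

    def finish(self, error=None):
        with self._lock:
            if error is None:
                self._state = EventState.SUCCESS
            else:
                self._state = EventState.FAILED
                self._error = error

    def query(self):
        # Non-blocking: returns the current state without waiting for completion.
        with self._lock:
            return self._state

    def error_message(self):
        with self._lock:
            return self._error


def run_task(event, fn):
    """Run one chain of work and report the outcome through its event."""
    event.record()
    try:
        fn()
    except Exception as exc:
        event.finish(error=str(exc))
    else:
        event.finish()


def poll_until_done(events, interval=0.001):
    """Single polling loop: query every pending event until all have finished."""
    done = (EventState.SUCCESS, EventState.FAILED)
    pending = set(range(len(events)))
    while pending:
        for i in list(pending):
            if events[i].query() in done:
                pending.discard(i)
        if pending:
            time.sleep(interval)
    return [e.query() for e in events]


def run_chains(chains, workers=2):
    """Schedule each chain as a separate task, then poll its event."""
    events = [Event() for _ in chains]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for event, fn in zip(events, chains):
            pool.submit(run_task, event, fn)
        states = poll_until_done(events)
    return events, states
```

A failed chain does not raise in the polling loop; the caller reads the error text from the corresponding event afterwards.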
Yangqing Jia	8286ce1e3a	Re-license to Apache Summary: Closes https://github.com/caffe2/caffe2/pull/1260 Differential Revision: D5906739 Pulled By: Yangqing fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902	2017-09-28 16:22:00 -07:00
Yangqing Jia	93e12e75df	Allow caffe2 to detect if cuda lib has been linked, and also fix oss build error. Summary: Closes https://github.com/caffe2/caffe2/pull/1114 Reviewed By: pietern Differential Revision: D5686557 Pulled By: Yangqing fbshipit-source-id: 6b7245ebbe4eeb025ce9d0fe8fda427a0c3d9770	2017-08-23 18:41:15 -07:00
Yangqing Jia	65112f3865	code cleanup: separate the several net implementations to separate files. Summary: TSIA. Reviewed By: harouwu Differential Revision: D5670906 fbshipit-source-id: 507e789978144341bf696fb20dc11f3c2d55493b	2017-08-21 22:07:48 -07:00
Victor Gao	34be12353b	comment out unused parameters Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually. Reviewed By: igorsugak Differential Revision: D5454343 fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2	2017-07-21 15:14:43 -07:00
Ben Zhang	ff914bf201	Move away from creating netdef from string, which may get deprecated Summary: Remove cases of constructing a NetDef from String, instead of just creating a NetDef. Reviewed By: salexspb Differential Revision: D5309645 fbshipit-source-id: 06ec8617733d9dc5385668485f3b091bb37b3f73	2017-06-23 11:36:23 -07:00
Pieter Noordhuis	086cd6fa3e	Don't continue running operators after failure Summary: This brings DAGNet back up to parity with SimpleNet, where execution stops as expected after an operator fails. For the DAGNet it's more involved, since we have to deal with all worker threads stopping execution. Because the job queue may still hold an arbitrary number of chains to execute, this diff explicitly closes it down, waits for all workers to terminate, and resets the job queue upon seeing a failure. Reviewed By: akyrola Differential Revision: D5232955 fbshipit-source-id: 4dac3c3ed6e5c2ebd07473b0f8be2b02c28978e9	2017-06-15 14:34:33 -07:00
Andrew Gallagher	9c58341809	codemod: use `<>` includes for gtest headers Summary: These are system headers and so should be included via `<>`. Reviewed By: yfeldblum Differential Revision: D4783480 fbshipit-source-id: 979670b594859b45560cead34f615442dfcc9f8b	2017-03-28 00:50:54 -07:00
Aapo Kyrola	ba1d592b5f	New 40% faster net-type for MLP on GPUs Summary: This diff introduces a new net type 'singlethread_async' which is based on my investigation of DPER/hogwild MLP bottlenecks. It only uses one CPU thread, but multiple CUDA streams on each GPU. This is implemented by having each Net submit its list of operators to a central GPU-specific executor queue and a thread that executes them asynchronously. This executor takes all tasks in the queue, executes them on separate CUDA streams, and then waits for them at the end. This solution can achieve >95% GPU utilization on 8 GPUs when a sufficient number of workers is used. FYI: I also tried fancier solutions such as using cudaStreamCallbacks(), but they did not perform as well. Improved the dper bench by adding the MomentumSGDUpdate operations and adding speed test capabilities. During my testing I also noticed that the startup costs for initializing CUDA streams and contexts are high, so it is important to do a warm-up. Reviewed By: Yangqing Differential Revision: D4553941 fbshipit-source-id: bb00524bef653d75de026dd64097b8d9b7a0acb3	2017-02-21 21:40:15 -08:00
Aapo Kyrola	7bdd8737cb	Fix to dagnet execution & dependency pruning Summary: Matt uyt reported (1) a very infrequent assertion failure at net.cc worker function. This was caused because an operator that was not a chain was scheduled in the job queue. This was possible because our DAGNet operator graph is a graph of operators, not chains. The dependency pruning that I introduced last week exposed this problem, since it removed some "middle-to-chain" dependencies when computing the chains. (It is a bit hard to explain.) This diff attempts to fix the problem by only allowing scheduling of chains. In addition, I added an extra check to confirm that all parents of all nodes were indeed executed before starting the next round. This adds additional safety and a breakpoint to see if there are still problems. I also fixed a bug in the operator graph pruning that made pruning less effective. (1) Matt's report: https://www.prod.facebook.com/groups/1405155842844877/permalink/1639428779417581/ Reviewed By: dzhulgakov Differential Revision: D4531424 fbshipit-source-id: 80fa7def6e8aff6910ebf0d9d5fef15ff20e0aec	2017-02-13 23:47:45 -08:00
Aapo Kyrola	b2472eab3a	Improve dagnet chain computation by pruning redundant dependencies Summary: We have noticed that the number of chains computed is usually much larger than necessary when there is a backward pass. For example, a network of 5 FCs with gradient operators (but no parameter updates) should yield only one chain, but instead over 20 were created. After adding parameter updates, the forward pass should still remain one chain, while the backward pass will be splintered. Analysis showed that the problem was the dependencies from forward ops to the gradient computation. But these are redundant, since the gradient op already depends on the forward op via the full path over ops. Example: fc1 -> fc2 ---> fc3 --> loss \| \| \| \| fc1grad <- fc2grad <- fc3grad <- Here fc1 and fc1grad have a direct dependency, but the indirect dependency via fc2->fc3->[...]->fc1grad already covers that dependency. To fix this, I added a pruning step prior to the chain computation. The chain computation is done on the pruned tree, but I do not modify the runtime chains for safety. Pruning is based on the following logic: - if one of my direct parents is also an ancestor via another traversal, I can remove the direct dependency. Pruning is extremely fast, linear in the number of dependencies. Reviewed By: dzhulgakov Differential Revision: D4500293 fbshipit-source-id: 0994ae6775c53378ea1e0074365cef041764a1b4	2017-02-03 07:44:40 -08:00
Yangqing Jia	238ceab825	fbsync. TODO: check if build files need update.	2016-11-15 00:00:46 -08:00
Yangqing Jia	2535720654	fbsync	2016-10-07 15:47:52 -07:00
Yangqing Jia	d1e9215184	fbsync	2016-10-07 13:08:53 -07:00
Yangqing Jia	b23e51d467	chunky sync	2016-09-06 15:55:19 -07:00
Yangqing Jia	6463eebc7b	chunky sync - build scripts to be written	2016-07-21 10:16:42 -07:00

51 Commits