pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Howard Huang	dadbf43eff	Fix asserts in tests (#72864 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72864 Fixes #72860 Test Plan: Imported from OSS Reviewed By: rohan-varma Differential Revision: D34246987 Pulled By: H-Huang fbshipit-source-id: 1ba47585533aff4cff9beec49bdc801f8320ffc8 (cherry picked from commit `03e45ceb89`)	2022-02-16 18:35:16 +00:00
Stephen Macke	3d3ad0a52f	[easy] add an `inplace` argument to MutableNetProto.to_net() and core.Net() constructor (#63068 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63068 The caffe2 core.Net constructor can accept a caffe2_pb2.NetDef proto, but it always creates a copy. This is wasteful when we can prove that the proto being passed to it will not be used anywhere else. So we add an "inplace" argument to the `core.Net` constructor that allows clients to give away ownership of the passed proto without copying. We default this argument to `False`, ensuring that behavior does not change unless explicitly requested. Test Plan: Let CI run. Differential Revision: D29976510 fbshipit-source-id: 26e13ca76f3431b8ef0de51f08bbf263491d323e	2021-08-11 11:10:52 -07:00
Gary Zheng	4a58f35bef	[caffe2] Fix duplicate name bug in Net.AddExternalInput (#47530 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47530 `Net.AddExternalInput` should raise if there are duplicate names. The previous code would only raise if the addition of duplicates was in separate calls, but not if it was in the same call. Test Plan: Added two new regression tests ``` ✓ Pass: caffe2/caffe2/python:core_test - testSetInputRecordWithBlobs (caffe2.caffe2.python.core_test.TestExternalInputs) (9.622) ✓ Pass: caffe2/caffe2/python:core_test - testAddExternalInputShouldRaiseIfDuplicate (caffe2.caffe2.python.core_test.TestExternalInputs) (9.639) ✓ Pass: caffe2/caffe2/python:core_test - testSetInputRecordWithoutBlobs (caffe2.caffe2.python.core_test.TestExternalInputs) (9.883) ✓ Pass: caffe2/caffe2/python:core_test - testAddExternalInputShouldRaiseIfDuplicateInSameCall (caffe2.caffe2.python.core_test.TestExternalInputs) (10.153) ``` Test trained 2 models. No issues f230755456 f230754926 Reviewed By: dzhulgakov Differential Revision: D24763586 fbshipit-source-id: c87088441d76f7198f8b07508b2607aec13521ed	2020-11-09 08:30:58 -08:00
Yunfan Zhong	e519fcd1aa	Remap net name inside arg.n for AsyncIf operator Summary: Similar to If operator, AsyncIf also contains nets in args. It needs the same handling. Test Plan: New unit test test_control_op_remap `buck test caffe2/caffe2/python:core_test` Also it worked end to end in prototype of dist bulk eval workflow f226680903 Reviewed By: yyetim Differential Revision: D24451775 fbshipit-source-id: 50594e2ab9bb457329ed8da7b035f7409461b5f6	2020-10-23 10:41:06 -07:00
Bugra Akyildiz	27c7158166	Remove __future__ imports for legacy Python2 supports (#45033 ) Summary: There is a module called `2to3` which you can target for future specifically to remove these, the directory of `caffe2` has the most redundant imports: ```2to3 -f future -w caffe2``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033 Reviewed By: seemethere Differential Revision: D23808648 Pulled By: bugra fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38	2020-09-23 17:57:02 -07:00
Nikita Shulga	48ae5945de	Skip TestExtractPredictorNet if compiled without OpenCV (#42168 ) Summary: Found while trying to get RocM Caffe2 CI green Pull Request resolved: https://github.com/pytorch/pytorch/pull/42168 Reviewed By: seemethere Differential Revision: D22791879 Pulled By: malfet fbshipit-source-id: 8f7ef9711bdc5941b2836e4c8943bb95c72ef8af	2020-07-28 11:26:55 -07:00
Nikita Shulga	0799a81cb7	Extend Net.RunAllOnGPU() to support RecurrentNetwork op (#15713 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15713 [caffe2] Extend Net.RunAllOnGPU() to support RecurrentNetwork op Reviewed By: dzhulgakov Differential Revision: D13576507 fbshipit-source-id: f517127492c9d516ece663d42fef84338c70344e	2019-02-08 15:48:42 -08:00
rohithkrn	aa88c2c0b6	Unify gpu_support variable in python tests (#16748 ) Summary: Assign `has_gpu_support = has_cuda_support or has_hip_support` and make according changes in python tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16748 Differential Revision: D13983132 Pulled By: bddppq fbshipit-source-id: ca496fd8c6ae3549b736bebd3ace7fa20a6dad7f	2019-02-07 00:29:51 -08:00
rohithkrn	0d663cec30	Unify cuda and hip device types in Caffe2 python front end (#14221 ) Summary: Goal of this PR is to unify cuda and hip device types in caffe2 python front end. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14221 Differential Revision: D13148564 Pulled By: bddppq fbshipit-source-id: ef9bd2c7d238200165f217097ac5727e686d887b	2018-11-29 14:00:16 -08:00
Yan Zhu	2356c8d542	device inference for Adam (#13990 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13990 to make sure ITER blob lives on CPU. Reviewed By: xianjiec Differential Revision: D13056070 fbshipit-source-id: 148edbf745e50e886da3eb99d4e485d11c1924e2	2018-11-14 17:21:08 -08:00
Junjie Bai	f54ab540af	Rename cuda_gpu_id to device_id in DeviceOption (#12456 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12456 codemod with 'Yes to all' codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id Overload TextFormat::ParseFromString to do string replace when parsing from protobuf format Reviewed By: Yangqing Differential Revision: D10240535 fbshipit-source-id: 5e6992bec961214be8dbe26f16f5794154a22b25	2018-10-09 15:54:04 -07:00
Junjie Bai	ff608a9ff3	Back out "Revert D10123245: Back out "codemod cuda_gpu_id to device_id"" (#12232 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12232 Original commit changeset: fca91fea58b7 This adds proper modifications to the DeviceType <->DeviceOption conversion code added in D10033396 Reviewed By: jerryzh168 Differential Revision: D10132473 fbshipit-source-id: 801ef777e2950982cb47b48051b1471a0a91e64b	2018-10-01 21:54:52 -07:00
Rick Ratmansky	3010dc4208	Revert D10123245: Back out "codemod cuda_gpu_id to device_id" Differential Revision: D10123245 Original commit changeset: d83da8e00a12 fbshipit-source-id: fca91fea58b7df208edc2e218a1d514f9821ec7b	2018-10-01 12:22:36 -07:00
Yang Liu	7d7d336c45	Back out "codemod cuda_gpu_id to device_id" Summary: Original commit changeset: f5614a5d2607 D9986213 is causing a [huge performance difference](https://our.intern.facebook.com/intern/ads/analyze_canary/412951953278781781/) in Multifeed Aggregator and has been blocking aggregator push since last Friday night: https://fburl.com/feedtools/b6izvwjz We need to land this revert ASAP to unblock aggregator push. Reviewed By: orionr Differential Revision: D10123245 fbshipit-source-id: d83da8e00a1250f5d09811a0a587c127e377aab2	2018-10-01 11:31:14 -07:00
Junjie Bai	3eb5940cf5	codemod cuda_gpu_id to device_id (#12022 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12022 codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id codemod with 'Yes to all' Reviewed By: orionr Differential Revision: D9986213 fbshipit-source-id: f5614a5d26078817aee8caf79a494abfd6a95ff1	2018-09-27 20:24:53 -07:00
Shihao Xu	b834d9107e	Revert D9566744: [New Checkpoint] Kill the dummy TaskOutput when task.get_step() (#11164 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11164 Revert D9566744 Reviewed By: enosair Differential Revision: D9620272 fbshipit-source-id: 6a78c46929f66bd11969840cb6b107f734be0c02	2018-08-31 22:25:57 -07:00
Shihao Xu	ad1670cf54	Kill the dummy TaskOutput when task.get_step() (#11048 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11048 Pull Request resolved: https://github.com/pytorch/pytorch/pull/10739 I wanted to assert that the blobs in the workspace of the new session after loading a checkpoint are exactly the same as the blobs in the workspace of the old session before saving to a checkpoint. But I found that when calling `task.get_step()`, a dummy task output blob, `task:output/ConstIntFill:0`, is added. A dummy net `task:output` is also added along with it. See https://fburl.com/937lf2yk This makes it hard to assert "Equal", forcing me to assert "LessThan" or "GreaterThan". Adding a dummy TaskOutput when the user specifies no TaskOutput is a hack. The reason for it is that a ZMQ socket can't send an empty blob list. As a result, if the Task on the Worker had no output, the master would never stop waiting and would hang forever. See https://fburl.com/rd7fhy6p and imagine `socket.recv(net, 0)`. TaskOutput is at the user layer. The hack shouldn't be exposed to the user layer, polluting user workspaces. Instead, we should move the creation of the dummy blob to some deeper layer, and remove the dummy blob from the workspace afterwards to avoid polluting user workspaces. After this change, the workaround becomes totally transparent and has no side effects for users. Reviewed By: mraway Differential Revision: D9566744 fbshipit-source-id: 18292dd64a6d48192c34034200a7c9811d2172af	2018-08-29 20:11:29 -07:00
Zhanibek Datbayev	22e3b2c9c3	Revert D9413150: [New Checkpoint] Kill the dummy TaskOutput when task.get_step() Differential Revision: D9413150 Original commit changeset: 51aaf3201e26 fbshipit-source-id: ac7c4c0960db03f344fe3eb2ad7f0e034db2371a	2018-08-29 14:39:49 -07:00
Shihao Xu	6ca28984c7	Kill the dummy TaskOutput when task.get_step() (#10739 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10739 I wanted to assert that the blobs in the workspace of the new session after loading a checkpoint are exactly the same as the blobs in the workspace of the old session before saving to a checkpoint. But I found that when calling `task.get_step()`, a dummy task output blob, `task:output/ConstIntFill:0`, is added. A dummy net `task:output` is also added along with it. See https://fburl.com/937lf2yk This makes it hard to assert "Equal", forcing me to assert "LessThan" or "GreaterThan". Adding a dummy TaskOutput when the user specifies no TaskOutput is a hack. The reason for it is that a ZMQ socket can't send an empty blob list. As a result, if the Task on the Worker had no output, the master would never stop waiting and would hang forever. See https://fburl.com/rd7fhy6p and imagine `socket.recv(net, 0)`. TaskOutput is at the user layer. The hack shouldn't be exposed to the user layer, polluting user workspaces. Instead, we should move the creation of the dummy blob to some deeper layer, and remove the dummy blob from the workspace afterwards to avoid polluting user workspaces. After this change, the workaround becomes totally transparent and has no side effects for users. Reviewed By: mraway Differential Revision: D9413150 fbshipit-source-id: 51aaf3201e26570b4fcf5738e9b9aa17c58777ac	2018-08-28 20:41:46 -07:00
Yiming Wu	579962f2a8	reroute tensor feature in core.Net and generate one net feature in model_helper (#10528 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10528 adding 2 features to core and model_helper - reroute_tensor which supports op insertion on net level - model_helper complete net and cut net used for full graph analysis Differential Revision: D9330345 fbshipit-source-id: 56341d3f500e72069ee306e20266c8590ae7985a	2018-08-15 16:40:15 -07:00
Kittipat Virochsiri	8a0fe0a588	set_input_record() should always add external input (#9636 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9636 Make sure that the blobs are registered to the net Reviewed By: pjh5 Differential Revision: D8924883 fbshipit-source-id: f09422a2d4d5ba8bf6cfbfd00172097b5ab1fcd6	2018-07-20 11:55:37 -07:00
Artem Volkhin	b6b6e1b39f	Fix core.Plan.create_from_proto (#9438 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9438 Current implementation of create_from_proto doesn't work as expected: it duplicates networks and execution steps by copying original PlanDef first and adding each step one-by-one later. Reviewed By: pjh5 Differential Revision: D8850316 fbshipit-source-id: 9b02836d6e6ee1c91cfdd3b4c4804f14137dc22b	2018-07-18 10:55:55 -07:00
Bram Wasti	82b981e4db	Update from facebook 1ee4edd286a3 (#8040 ) * Adding instance weight to batch distill loss as title * add bfloat 16-31 added bfloat 16-31 and their respective unit tests * [CUDA9] Upgrade - fbcode CUDA9 upgrade diff D5654023 has been out for a while thanks to Pieter. But with time growing it's becoming quite hard to rebase, because of the symlinks and auto-generated build/config files in tp2. Break D5654023 into two diffs, one touching tp2 config files, and another one touching fbcode TARGETS file (adding nvcc flag). These two should be a bit easier to rebase (for detailed procedure see "Test Plan"). This diff can only be committed if: 1. CUDA 9 rpm is rolled out fleet-wide (TBD) 2. NVidia driver 390.40 is rolled out fleet-wide (done) 3. Upgrade CUDA 9.1, cudnn 7.1, nccl 2.1 (done) 4. Make sure all dependents are built (done) 5. Test all C2 operators, PyTorch (see test plan) * Share intermediate int32 buffer across Conv ops Adding a known type * [C2 fix] infer function for ensure_cpu_output_op this is adding the missing device function for ensure_cpu_output_op * [int8] Add blob serializer/deserializer for Int8TensorCPU To export to logfiledb * [nomnigraph] Add try catch block to optimization passes in predictor This will catch failures that happen in the optimization pass. * Caffe2: avoid static initialization order fiasco for CAFFE_ENFORCE CAFFE_ENFORCE uses the stack trace fetcher, which is currently a global static variable. If CAFFE_ENFORCE is used at static initialization time, this is a SIOF. Recently CAFFE_ENFORCE was added into init functions registration, so we started to see this. Meyers singleton is going to provide safety here. If stacktrace fetcher was not registered yet, it will just use a dummy one. 
* NUMA support in SparseNN CPU benchmark Adding support for NUMA in SparseNN CPU benchmark * [mobile-roofline] Add logging needed for roofline model This should be all that's needed * Let the operators use the same input if the operators are not chained or else, we have to change the input data dims * fix null-pointer-use UBSAN errors in reshape_op.h * revert previous fix on input blob name as title * Adding flag to let MineHardNegative automatically extract single value from dict Model exporter requires the output of the model to be a struct. This makes it convenient to use those models directly in MineHardNegative by allowing automatic extraction of the single element of dict, which is a common use case. * Reverting change that broke internal tests back to OSS compatible state	2018-06-01 17:41:09 -04:00
Paul Jesse Hellemn	b875fb281c	Update from facebook (#7451 ) * [bootcamp] Improve "Shape" operator to support axes specification To improve .shape operator of Caffe2 to support x.shape(tensor, axes), which takes an optional int array "axes" as input. For example, x.shape(tensor, [1, 0]) will return the dimension for axis 1 and 0 following the specified order. For current version, "axes" input allows duplications and can have arbitrary length. * Back out "Add barrier net that runs before training nets" Original commit changeset: b373fdc9c30f. Need additional changes to some callers to support barrier failures. * Change warning to verbose log to reduce log spam The `LOG(WARNING)` was a bit spammy for regular use so lets just make it a `VLOG`. * Extract the shared code from different caffe2_benchmark binaries The OSS benchmark and Internal benchmark will share most functions in the benchmark. * Support MFR in sequence training As titled. * Make knowledge distillation work with using logged prediction feature as teacher label. 1) Add loading raw dense feature as teacher label. 2) Optional calibration function for teacher label 3) Add teacher label into generic unit test 4) Deprecated TTSN workflow version using feature_options to config teacher label * [C2/CUDA]: unjoined cross entropy sigmoid as desc * Add async_scheduling executor into deferrable_net_exec_test Add async_scheduling into tests and fix some exception cases * Fix Event disabled error When disabling event in RNN ops make sure we don't call Finish on disabled event from op's RunAsync * cuda ensure cpu output op can handle both TensorCPU and TensorCUDA as desc. * [C2 Core] Infer input device option in C2 hypothesis_test checkers Improve how we default input blob device options. Previously it defaults as where op lives but it is not necessarily the case. 
For example: CopyCPUToGPU * [C2 Op]SplitByLengthsOp CPU/GPU implementation [C2 Op]SplitByLengthsOp CPU/GPU implementation * fix undefined symbol error not sure why we're getting undefined symbol even with link_whole = True Need to figure out why but need this workaround for now * Add tools in DAIPlayground platform to help debugging models Add additional tools to allow Plauground override individual method defined in AnyExp. This will allow user to create module that specificly change certain default method behavior. An example included in this diff is deactivating test model and checkpointing. When debugging any model problems, switching off components helps me quickly narrow down the location of the bug. The technique is extensively used in task T27038712 (Steady memory increase in EDPM, eventually resulting in gloo/cuda.cu:34: out of memory) * add shape and type inference for int8 conversion operator * Fix flaky test for group_norm Fix flaky test for group_norm * Fix group_norm_op_test flaky Fix group_norm_op_test flaky * Implementation of composite learning rate policy In many state-of-the-arts deep learning works, people use a simple trick to schedule the learning rate: use a fixed learning rate until error plateaus and then switch to a different fixed learning rate, and so on. In this diff, we implemented a simple version of the composite learning rate. The user gives a set of learning rates policies and corresponding iteration nums, and the optimizer will change the learning rate policy based on the number of iterations so far. For example, the user give two learning rate policies, one is FixedLearningRate and PolyLearningRate, with an iteration number of 1k. Then the first 1k iteration, we use FixedLearningRate. For the following iterations, we use PolyLearningRate. * Split two use cases of CachedReader into two classes, DBFileReader and CachedReader # Use Cases: 1). input: DB file -> output: DatasetReader. Use DBFileReader. 2). 
input: Reader -> build cache DB file -> output: DatasetReader. Use CachedReader. # Changes to CachedReader: 1). Move db_path to the constructor. Because in mock reader. cache will always be built ahead. # Changes to tests: 1). Make a separate TestCase class for CachedReader and DBFileReader. 2). Make it possible to add more test functions by adding setUp, tearDown and _make_temp_path. 3). Make delete db_path more general. `db_path` could be a file for `log_file_db`, but could also be a directory for `leveldb`. * Back out "On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization" Original commit changeset: 4489c6133f11 * Fix LARS bug Fixed a bug in the LARS implementation which caused all subsequent blobs not using LARS to have the LARS learning rate multiplier applied to them. * [tum] support sparse init & add uniformFill option as title * Propagate exception for async nets Capture the exception when an exception is thrown in async nets and re-throw it after wait(). This allows exceptions to be propagated up to the caller. This diff was a part of D7752068. We split the diff so that C2 core files changes are in a separate diff. * Automatic update of fbcode/onnx to 69894f207dfcd72d1e70497d387201cec327efbc Previous import was 403ccfbd0161c38f0834413d790bad0874afbf9a Included changes: - [69894f2](https://github.com/onnx/onnx/commit/69894f2): Use op schema.all tensor types in random like definitions (#865) <Scott McKay> - [b9d6b90](https://github.com/onnx/onnx/commit/b9d6b90): Clarify random like operators (#846) <Scott McKay> - [fc6b5fb](https://github.com/onnx/onnx/commit/fc6b5fb): Refactor shape inference implementation (#855) <anderspapitto> - [b7d8dc8](https://github.com/onnx/onnx/commit/b7d8dc8): fix cmake warning message (#863) <Eric S. 
Yu> - [f585c5d](https://github.com/onnx/onnx/commit/f585c5d): add pytorch-operator test for tile (#831) <Wenhao Hu> - [993fe70](https://github.com/onnx/onnx/commit/993fe70): add install step (#832) <Eric S. Yu> - [68bc26c](https://github.com/onnx/onnx/commit/68bc26c): add type inference for traditional ml ops except classifier ops. (#857) <Ke Zhang> - [9cc0cda](https://github.com/onnx/onnx/commit/9cc0cda): fix string representation of scalar types (#858) <G. Ramalingam> - [1078925](https://github.com/onnx/onnx/commit/1078925): fix y in pow test case to scalar (#852) <Wenhao Hu> - [c66fb6f](https://github.com/onnx/onnx/commit/c66fb6f): Add some math function shape inference (#845) <anderspapitto> - [ff667d1](https://github.com/onnx/onnx/commit/ff667d1): Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853) <Marat Dukhan> - [11c6876](https://github.com/onnx/onnx/commit/11c6876): clear initializer names when clear initializer (#849) <Wenhao Hu> - [73c34ae](https://github.com/onnx/onnx/commit/73c34ae): Clarify FeatureVectorizer description. (#843) <Scott McKay> - [1befb9b](https://github.com/onnx/onnx/commit/1befb9b): Remove useless text in docs (#850) <Lu Fang> - [e84788f](https://github.com/onnx/onnx/commit/e84788f): Fix SELU attributes' default values (#839) <Lu Fang> - [ebac046](https://github.com/onnx/onnx/commit/ebac046): Add tile test case (#823) <Wenhao Hu> - [8b7a925](https://github.com/onnx/onnx/commit/8b7a925): a few more shape inference functions (#772) <anderspapitto> - [9718f42](https://github.com/onnx/onnx/commit/9718f42): Make the coefficient non optional for LinearClassifier (#836) <Jaliya Ekanayake> - [ef083d0](https://github.com/onnx/onnx/commit/ef083d0): Add save_tensor and load_tensor functions for Protos (#770) <Lu Fang> - [45ceb55](https://github.com/onnx/onnx/commit/45ceb55): Check if CMAKE_BUILD_TYPE set before project(). 
(#812) <Sergii Dymchenko> - [4b3d2b0](https://github.com/onnx/onnx/commit/4b3d2b0): [WIP] reenable shape inference tests (#834) <anderspapitto> - [22d17ee](https://github.com/onnx/onnx/commit/22d17ee): RNN tests: LSTM, GRU, SimpleRNN (#739) <Peyman Manikashani> - [de65b95](https://github.com/onnx/onnx/commit/de65b95): dimension denotation (#443) <Tian Jin> - [eccc76e](https://github.com/onnx/onnx/commit/eccc76e): fix field number issue in onnx operator proto and enable its build (#829) <Ke Zhang> - [d582beb](https://github.com/onnx/onnx/commit/d582beb): disable shape inference test to unbreak ci (#830) <Lu Fang> - [485b787](https://github.com/onnx/onnx/commit/485b787): function proto for composite op. (#802) <Ke Zhang> - [cd58928](https://github.com/onnx/onnx/commit/cd58928): specify defaults for attributes of Affine op (#820) <G. Ramalingam> - [7ee2cf9](https://github.com/onnx/onnx/commit/7ee2cf9): merge the dummy backend back into the main one (#743) <anderspapitto> - [1c03a5a](https://github.com/onnx/onnx/commit/1c03a5a): [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551) <Marat Dukhan> - [3769a98](https://github.com/onnx/onnx/commit/3769a98): Rename real model test case from VGG-16 to ZFNet (#821) <Lu Fang> * [C2]ReluN Op relu n op. tf reference: https://www.tensorflow.org/api_docs/python/tf/nn/relu6 * Call destructor when assigning a blob value * Add executor overrides Add executor overrides flag to enable migration to async_scheduling executor * Add barrier net that runs before training nets - attempt #2 Add a synchonize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before start training. This reduce chances of the faster shards timing out during GLOO AllReduce. Removed explicit data_parallel_model.py.synchronize call in holmes workflow. 
This change was landed previously but caused errors for some EDPM workflows - See https://fb.facebook.com/groups/1426530000692545/permalink/1906766366002237/ - because EDPM assumes any call to CreateOrCloneCommonWorld and Gloo ops are wrapped in exception handlers but in this case the exception thrown in the barrier init net is not handled. To address this issue, we add _CreateOrCloneCommonWorld to the param_init_net instead of a new barrier init net. Since errors in the param_init_net run are handled gracefully with a re-rendezvous, this should fix the problem. * Handle empty nets in async_scheduling Make sure we don't get stuck on empty nets * use CUDA_ARCH for conditional compile * [C2 fix] infer function for ensure_cpu_output_op * Update group_norm test to reduce flaky test * Fix lr_multiplier for GPU	2018-05-10 23:14:27 -07:00
Lu Fang	664fe34e0a	[Caffe2][fbcode=>GH sync] Update from facebook 4323b18ce13c (#7116 ) * [fix] Re-enable events in RNN ops We have earlier added event disabling in RNN ops as back then we didn't use events, with current use cases this is no longer true (https://fburl.com/8vd0lp8y) * use ops with cuda impl * Revert D7729695: [caffe2][fix] Re-enable events in RNN ops This reverts commit 4b215c7496fb724656ff4c776933a15bdbbcde5e @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * [observer] Clean up observer_config.h #accept2ship * [1/n] Refactor dataio_test.py Replace code duplication with a common function * Add barrier net that runs before training nets Add a synchronize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before starting training. This reduces the chances of the faster shards timing out during GLOO AllReduce. Removed explicit data_parallel_model.py.synchronize call in holmes workflow. Similar change in speech/asr_training workflow will come in another diff. * Support the dnnlowp backend in caffe2_benchmark This is for SHARE operator latency evaluation * Migrate integral_image_op to main caffe2 migrate integral_image_op(GPU version) given by https://fburl.com/yvqezigi to caffe2/caffe2/operators and implement its CPU version. Write up a test using the hypothesis_test mechanism * [pos_disc, fbcode] Implement unjoined lr loss As explained in https://our.intern.facebook.com/intern/wiki/Model_Based_Calibration/, when the dataset is a joined data set, where labels might change later, we need to use unjoined logloss. 
The implementation is almost the same as in Sigrid (https://fburl.com/1trngsls), where loss = y (log(p) - log(1-p)) + (1-y)(log(1-p)) = xy - (1-y)x - (1-y)log(1+exp(-x)) For x < 0, to ensure stability and avoid overflow, we reformulate the above exp as loss = xy - (1-y)x + (1-y)x - (1-y)log(1+exp(x)) = xy - (1-y)log(1+exp(x)) Then the final expression becomes loss = xy + (y - 1) x (x >= 0) - (1 - y) log(1 + exp(x - 2 x (x >= 0))) where y is the true label, x is the dot product and p = logistic(x). This kind of implementation is aligned with the current implementation of the original cross entropy in https://phabricator.intern.facebook.com/diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/cross_entropy_op.cc;0bae3b5d0f825897c5e0dd0ff10f489d7271bf25$7-13 * Keep the array to fix the conflict * [C2] Compute Adagrad effective LR The AdagradWithLR op outputs an extra blob which contains the average effective learning rate across all weights in this blob. * Open-source extractMetaNetDef & runGlobalInitialization, add new Predictor constructor from db file, and add run_map_outputs 1. Open-source extractMetaNetDef and runGlobalInitialization, for use in 2. new Predictor constructor from db file. 3. Add new run function that returns outputs as TensorMap * Disable eigen cpu Disable eigen cpu in transpose and reduce * Introduce request_only/object_only property of ModelLayer by default this is False * A simple TC Caffe2 benchmark We can run tuner, get MappingOptions and then use them to compare against cuBLAS currently broken due to LLVM issues. 
How to run: hg checkout eec1ab31b59c03b8deded1c755a9abaf8c45be01 add D7401202 add D7434625 add D7506031 add D7540728 buck run @mode/dev-nosan tc/tc/benchmarks_python:caffe2_benchmark * Move Caffe2 feature_maps_ops to open source Need feature maps operators in open source project facebookresearch/BlueWhale * Manually fix the conflicts in channel shuffle op * Fix the inconsistency between different gh and fbcode * Skip Adagrad GPU Test (Because some gpu implementation is missing) * Fix another test to make sure it won't run on gpu when implementation is not available yet	2018-05-01 20:49:00 -07:00
Orion Reblitz-Richardson	6223bfdb1d	Update from Facebook (#6692 ) * [GanH][Easy]: Add assertion to adaptive weighting layer 0 weight causes numeric instability and exploding ne * [Easy] Add cast op before computing norm in diagnose options As LpNorm only takes floats we add a manual casting here. * Introduce a new caching device allocator `cudaMalloc` and `cudaFree` calls are slow, and become slower the more GPUs there are. Essentially, they grab a host-wide (not device-wide) lock because GPU memory is transparently shared across all GPUs. Normally, this isn't much of a concern since workloads allocate memory upfront, and reuse it during later computation. However, under some computation models (specifically, memory conserving approaches like checkpoint-and-recompute, see https://medium.com/@yaroslavvb/fitting-larger-networks-into-memory-583e3c758ff9) this assumption is no longer true. In these situations, `cudaMalloc` and `cudaFree` are common and frequent. Furthermore, in data parallel contexts, these calls happen at nearly the same time from all GPUs worsening lock contention. A common solution to this problem is to add a custom allocator. In fact, nVIDIA provides one out of the box: CUB, which Caffe2 already supports. Unfortunately, the CUB allocator suffers from very high fragmentation. This is primarily because it is a "buddy" allocator which neither splits nor merges free cached blocks. Study https://github.com/NVlabs/cub/blob/1.8.0/cub/util_allocator.cuh#L357 if you want to convince yourself. This diff adapts a caching allocator from the Torch codebase https://github.com/torch/cutorch/blob/master/lib/THC/THCCachingAllocator.cpp which does splitting and merging and ends up working really well, at least for workloads like the checkpoint-and-recompute computation models noted above. I simplified the implementation a little bit, made it a bit more C++-like. I also removed a bunch of stream synchronization primitives for this diff. 
I plan to add them back in subsequent diffs. * Report reader progress in fblearner workflows Integrate with fblearner progress reporting API and add support to report training progress from reader nodes. If reader is constructed with batch limits, report based on finished batch vs total batch. The finished batch may be more than total batch because we evaludate if we should stop processing everytime we dequeue a split. If no limit for the reader, report based on finished splits (Hive files) vs total splits. This is fairly accurate. * [GanH][Diagnose]: fix plotting 1. ganh diagnose needs to set plot options 2. modifier's blob name is used for metric field can need to be fixed before generating net * Automatic update of fbcode/onnx to 985af3f5a0f7e7d29bc0ee6b13047e7ead9c90c8 * Make CompositeReader stops as soon as one reader finishes Previously, CompositeReader calls all readers before stopping. It results in flaky test since the last batch may be read by different threads; resulting in dropped data. * [dper] make sure loss is not nan as desc. * [rosetta2] [mobile-vision] Option to export NHWC order for RoIWarp/RoIAlign Thanks for finding this @stzpz and @wangyanghan. Looks like NHWC is more optimized. For OCR though it doesn't yet help since NHWC uses more mem b/w but will soon become important. * Intra-op parallel FC operator Intra-op parallel FC operator * [C2 Proto] extra info in device option passing extra information in device option design doc: https://fb.quip.com/yAiuAXkRXZGx * Unregister MKL fallbacks for NCHW conversions * Tracing for more executors Modified Tracer to work with other executors and add more tracing * Remove ShiftActivationDevices() * Check for blob entry iff it is present When processing the placeholders ops, ignore if the blob is not present in the blob_to_device. * Internalize use of eigen tensor Move use of eigen tensor out of the header file so we don't get template partial specialization errors when building other libraries. 
* feature importance for transformed features. * - Fix unused parameter warnings The changes in this diff comments out unused parameters. This will allow us to enable -Wunused-parameter as error. #accept2ship * add opencv dependencies to caffe2 The video input op requires additional opencv packages. This is to add them to cmake so that it can build * Add clip_by_value option in gradient clipping Add clip_by_value option in gradient clipping when the value is bigger than max or smaller than min, do the clip * std::round compat	2018-04-17 23:36:40 -07:00
Manoj Krishnan	a92a6233b5	Enable support for placeholder ops in InjectCrossDeviceCopies This is required to support placeholder/decorator ops which does not have operator schema. Note that the change is made in such a way that it is a no-op if placeholder Ops are not used. Changes: 1. Since the placeholder ops always run on CPU, added a utility to infer placeholder ops blob devices. 2. Placeholder op's input/output blobs should be on CPU as well. This change takes care of dealing with output blobs - i.e. use blobs on CPU. 3. Added a Unit test - test_inject_copy_placeholder_ops	2018-03-27 18:10:39 -07:00
Orion Reblitz-Richardson	1d5780d42c	Remove Apache headers from source. * LICENSE file contains details, so removing from individual source files.	2018-03-27 13:10:18 -07:00
Manoj Krishnan	c43896732e	Added device inference functions for Concat and Split Ops. Changes: ======= 1. Added device inference functions for Concat and Split Ops. 2. Added a unit test to validate the change. See, test_device_inference_function in core_test.py 3. Fixed some formatting.	2018-03-20 13:34:22 -07:00
Kutta Srinivasan	0a18608b43	hacks to test exception handling and python operator backtraces Add exception handling & re-throwing to worker threads of DAGNetBase	2018-03-07 15:09:17 -08:00
Bram Wasti	51897e52da	fix all the broken tests from adding debug info (#2013 )	2018-02-22 17:43:53 -08:00
Simon Layton	a8250280bb	Py3 test fixes Summary: \cc pietern Closes https://github.com/caffe2/caffe2/pull/1555 Differential Revision: D6479902 Pulled By: pietern fbshipit-source-id: 84647eddec45620b1ed603f4882ded2dd49adc43	2017-12-05 10:34:41 -08:00
Hassan Eslami	c2ea3f66b3	Make a concrete function for device_option equality Summary: Currently, the device_option equality is done in a specialized private function. Ideally, we should be able to test the equality from other places in the code and have a more detailed check for the equality. Reviewed By: akyrola Differential Revision: D6316608 fbshipit-source-id: c3fd085583e535d7936d05e4c8b15d2eff91c744	2017-11-13 15:17:06 -08:00
Junjie Bai	d894a6362f	Add missing is_test argument in ImageInput ops Summary: reported in Github Issue https://github.com/caffe2/caffe2/issues/1269 Reviewed By: salexspb Differential Revision: D6004461 fbshipit-source-id: 03f4bccfe085010b30109ab7b6fe7325caa160ef	2017-10-10 10:03:13 -07:00
Junjie Bai	91bb6ce095	Allow explicitly specifying to use operators' default implementation Reviewed By: dzhulgakov Differential Revision: D5973635 fbshipit-source-id: 12dccc6332a8dd264ccc9f831a053a3be9b89c56	2017-10-04 12:17:36 -07:00
Yangqing Jia	8286ce1e3a	Re-license to Apache Summary: Closes https://github.com/caffe2/caffe2/pull/1260 Differential Revision: D5906739 Pulled By: Yangqing fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902	2017-09-28 16:22:00 -07:00
Alisson Gusatti Azzolini	e3609a0619	Correctly propagate remap_blob across net boundaries Summary: If a blob is copy from device A to device B in the init_net, and then is used as an external_input in the train_net, we want the train_net to correctly use the blob already on device B instead of copying it over and over again. Reviewed By: akyrola Differential Revision: D5800870 fbshipit-source-id: d93f44bba80e4ed70eb03183d552496b54a966b5	2017-09-24 21:21:57 -07:00
Lei Chen	14950a9082	Support session in distributed realtime trainer Summary: Convert from PlanDef ProtoBuf into python Plan object by recursively creating Nets and ExecutionSteps. Also support running Plan object directly in Session. Reviewed By: azzolini Differential Revision: D5608393 fbshipit-source-id: c0ae3b6da743a759af6db3b614a5a3935fe0b34c	2017-08-16 10:28:55 -07:00
Yiming Wu	b51e0ec0c2	quick fix inplace blob bug Summary: fixing the case where the init net will initialize same blob twice. I made an exception by allowing inplace blob among ops if the blob keeps on the same device. This should fix this problem in a generalized way as most of our training is only on CPU now. Reviewed By: dzhulgakov Differential Revision: D5450564 fbshipit-source-id: 525c4c9a2e5216a70dbd1229da2d9f8a58b89e47	2017-07-23 02:18:16 -07:00
Yiming Wu	1fce3eac4e	single trainer hybrid device Summary: First try of single trainer hybrid device training for sparsenn Comparison results with CPU training: https://our.intern.facebook.com/intern/fblearner/run/compare/?compare_to[0]=20016969&compare_to[1]=19660293&baseline_run=19660293&all_runs[0]=20016969&all_runs[1]=19660293 Reviewed By: dzhulgakov Differential Revision: D5205723 fbshipit-source-id: 4a024324ac2efc3248dd470d4c533cf2ecec2e92	2017-06-27 22:06:30 -07:00
Alexander Sidorov	c8410859d9	Operator python stacktraces, attempt 2 Summary: Last time I used uuid filled into OperatorDef. And operator_tracebacks was populated using traceback.extract_stack. There were several issues with this approach: 1. A random field in OperatorDef breaks workflows relying on memoization, i.e. when computation is skipped based on already computed result before. 2. Adding one more field revealed RNNs being non forward compatible wrt to new fields in there. prototxt format seems to not allow forward compatibility (thanks jamesr66a for the investigation!). For RNNs we need to switch them to a more resilient approach. azzolini's proposed change to OperatorDef / NetDef would allow that by just nesting NetDef directly inside OperatorDef without need for extra serialization. 3. traceback.extract_stack is very slow when executable is on a remote filesystem. It does one or more os.stat for each frame on the stack. For some cases it ended up being up to 15 extra minutes on model construction. In this diff I use a different approach which should fix all those problems above. 1.2. are solved by not adding a new field at all. Instead I report operator idx wrt to a net it runs in. Thanks akyrola and dzhulgakov for the idea. Downside here is that operator list manipulation breaks the logic and separately created ops are not covered at all. 3. I solved this by operating on raw frames without using traceback and inspect modules which end up doing a lot of file system calls. See function extract_stacktace in core.py with additional comments. Reviewed By: dzhulgakov Differential Revision: D5286285 fbshipit-source-id: 626dd0f5f6b8b1d86bd6bf519078b122f43ddcaa	2017-06-25 19:32:58 -07:00
Alexander Sidorov	83e6a0bec8	Revert uuid change to OperatorDef protobuf Summary: a few issues: 1. Randomization hurts memoization 1. Even if we make it non random, then we can get key collisions when loading it back. 2. RNNs use prototxt for step net and apparently it's not forward compatible like normal protobuf is I am thinking of a better less invasive solution now. Reviewed By: jamesr66a Differential Revision: D5272118 fbshipit-source-id: ab577fad04fbfc632e1fceffa923377a0d3da1be	2017-06-19 16:47:31 -07:00
Luke Yeager	90a52c3904	Skip TestInferDevice if no GPU support Summary: Working towards https://github.com/caffe2/caffe2/pull/817. ``` E AttributeError: Method CopyCPUToGPU is not a registered operator. Did you mean: [] ``` https://travis-ci.org/caffe2/caffe2/jobs/243867951 Closes https://github.com/caffe2/caffe2/pull/818 Differential Revision: D5276735 Pulled By: akyrola fbshipit-source-id: 35d9df19330ae522037e8a5d721d83dc2e5aa4dc	2017-06-19 12:21:24 -07:00
Luke Yeager	8ef12951e0	Fix for protobuf with unicode_literals Summary: Python 2.7, Protobuf 2.6 > op.ClearField('uuid') E TypeError: field name must be a string Fix: http://python-future.org/imports.html#should-i-import-unicode-literals /cc salexspb tomdz Closes https://github.com/caffe2/caffe2/pull/804 Differential Revision: D5258494 Pulled By: akyrola fbshipit-source-id: 04c473c1e55bf8caac0bfde7d86171c9f95e71a1	2017-06-15 13:22:57 -07:00
Alexander Sidorov	eebda50b79	Operator python traceback Summary: This is going to show a python Caffe2 user where a failed operator was created. Motivation for having this information not right in protobuf is to avoid having it too verbose and keep ability to read protobufs of a net after a simple print() call. Reviewed By: jamesr66a Differential Revision: D5226047 fbshipit-source-id: 7edfe850e05a2ec209577142aa3368664a57a108	2017-06-13 18:50:02 -07:00
Yiming Wu	406d748423	better engineering for core_test.TestInferDevice Summary: Recently people find that this test is too strict because of proto string matching. Thus, I change it to compare fields so that this test will not complain even if protobuf changed in future. Reviewed By: dzhulgakov Differential Revision: D5229855 fbshipit-source-id: 54efcd7a0f9e5dbba1ddeb480801abcb859e07bd	2017-06-12 15:23:00 -07:00
Luke Yeager	52ee7697f4	Fixing broken Python tests Summary: `brew_test.py` is just plain broken. `core_test.py` doesn't work with pytest. `apmeter_test.py` and `top_k_test.py` don't work for CUDA builds. Closes https://github.com/caffe2/caffe2/pull/765 Differential Revision: D5211817 Pulled By: Yangqing fbshipit-source-id: 78ec5af35a3fa870978e4c9590210ade9e3bc5ac	2017-06-08 13:34:46 -07:00
Yiming Wu	4fefff0bbb	Auto injecting device copy for single net and several nets Summary: This diff plan to attack the problem where we want to just annotate device option for operators and leave Caffe2 to help us inject cross device copy functions. This feature would be useful for mixed device training and multi device training with several nets, where previously we do the heavy lifting of adding copy functions ourselves. Ideally, this feature will happen like this: //construct your nets first core.InjectDeviceCopyAmongNets([train_init, train_net, ...]) My ideas are written in comments. I will update them here as well later. Reviewed By: dzhulgakov Differential Revision: D5134103 fbshipit-source-id: 173f7da9d1773d1c50ccdc27f1b5cd3067b04af5	2017-06-07 20:03:18 -07:00
Yiming Wu	8cd208ad6f	Infer input and output device from OperatorDef through OperatorSchema Summary: Infer input and output device from OperatorDef through OperatorSchema. This is inspired by shape inference. With this feature, we can easily analyze device information for all blobs in the net in a generic way. It is really helpful for auto cross device execution. Reviewed By: akyrola, dzhulgakov Differential Revision: D5161065 fbshipit-source-id: ee656123112171a4ca00f2fb3f6940f32ddf3135	2017-06-05 23:47:33 -07:00
Aapo Kyrola	5e6bd4fbfc	Return predict params from ExtractPredictorNet + test Summary: Make it easier for users by returning from ExtractPredictorNet the list of blobs that must be saved/exported to run a predictor net. Added a test for ExtractPredictorNet Codemod. Reviewed By: asaadaldien Differential Revision: D5176097 fbshipit-source-id: b1af42132459487b8d94fcdde0e4c514da608243	2017-06-05 15:34:37 -07:00

60 Commits