pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Paul Jesse Hellemn	b875fb281c	Update from facebook (#7451 ) * [bootcamp] Improve "Shape" operator to support axes specification To improve .shape operator of Caffe2 to support x.shape(tensor, axes), which takes an optional int array "axes" as input. For example, x.shape(tensor, [1, 0]) will return the dimension for axis 1 and 0 following the specified order. For current version, "axes" input allows duplications and can have arbitrary length. * Back out "Add barrier net that runs before training nets" Original commit changeset: b373fdc9c30f. Need additional changes to some callers to support barrier failures. * Change warning to verbose log to reduce log spam The `LOG(WARNING)` was a bit spammy for regular use so lets just make it a `VLOG`. * Extract the shared code from different caffe2_benchmark binaries The OSS benchmark and Internal benchmark will share most functions in the benchmark. * Support MFR in sequence training As titled. * Make knowledge distillation work with using logged prediction feature as teacher label. 1) Add loading raw dense feature as teacher label. 2) Optional calibration function for teacher label 3) Add teacher label into generic unit test 4) Deprecated TTSN workflow version using feature_options to config teacher label * [C2/CUDA]: unjoined cross entropy sigmoid as desc * Add async_scheduling executor into deferrable_net_exec_test Add async_scheduling into tests and fix some exception cases * Fix Event disabled error When disabling event in RNN ops make sure we don't call Finish on disabled event from op's RunAsync * cuda ensure cpu output op can handle both TensorCPU and TensorCUDA as desc. * [C2 Core] Infer input device option in C2 hypothesis_test checkers Improve how we default input blob device options. Previously it defaults as where op lives but it is not necessarily the case. 
For example: CopyCPUToGPU * [C2 Op]SplitByLengthsOp CPU/GPU implementation [C2 Op]SplitByLengthsOp CPU/GPU implementation * fix undefined symbol error not sure why we're getting undefined symbol even with link_whole = True Need to figure out why but need this workaround for now * Add tools in DAIPlayground platform to help debugging models Add additional tools to allow Playground to override individual methods defined in AnyExp. This will allow the user to create a module that specifically changes certain default method behavior. An example included in this diff is deactivating test model and checkpointing. When debugging any model problems, switching off components helps me quickly narrow down the location of the bug. The technique is extensively used in task T27038712 (Steady memory increase in EDPM, eventually resulting in gloo/cuda.cu:34: out of memory) * add shape and type inference for int8 conversion operator * Fix flaky test for group_norm Fix flaky test for group_norm * Fix group_norm_op_test flaky Fix group_norm_op_test flaky * Implementation of composite learning rate policy In many state-of-the-art deep learning works, people use a simple trick to schedule the learning rate: use a fixed learning rate until error plateaus and then switch to a different fixed learning rate, and so on. In this diff, we implemented a simple version of the composite learning rate. The user gives a set of learning rate policies and corresponding iteration nums, and the optimizer will change the learning rate policy based on the number of iterations so far. For example, the user gives two learning rate policies, FixedLearningRate and PolyLearningRate, with an iteration number of 1k. Then for the first 1k iterations, we use FixedLearningRate. For the following iterations, we use PolyLearningRate. * Split two use cases of CachedReader into two classes, DBFileReader and CachedReader # Use Cases: 1). input: DB file -> output: DatasetReader. Use DBFileReader. 2). 
input: Reader -> build cache DB file -> output: DatasetReader. Use CachedReader. # Changes to CachedReader: 1). Move db_path to the constructor. Because in mock reader. cache will always be built ahead. # Changes to tests: 1). Make a separate TestCase class for CachedReader and DBFileReader. 2). Make it possible to add more test functions by adding setUp, tearDown and _make_temp_path. 3). Make delete db_path more general. `db_path` could be a file for `log_file_db`, but could also be a directory for `leveldb`. * Back out "On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization" Original commit changeset: 4489c6133f11 * Fix LARS bug Fixed a bug in the LARS implementation which caused all subsequent blobs not using LARS to have the LARS learning rate multiplier applied to them. * [tum] support sparse init & add uniformFill option as title * Propagate exception for async nets Capture the exception when an exception is thrown in async nets and re-throw it after wait(). This allows exceptions to be propagated up to the caller. This diff was a part of D7752068. We split the diff so that C2 core files changes are in a separate diff. * Automatic update of fbcode/onnx to 69894f207dfcd72d1e70497d387201cec327efbc Previous import was 403ccfbd0161c38f0834413d790bad0874afbf9a Included changes: - [69894f2](https://github.com/onnx/onnx/commit/69894f2): Use op schema.all tensor types in random like definitions (#865) <Scott McKay> - [b9d6b90](https://github.com/onnx/onnx/commit/b9d6b90): Clarify random like operators (#846) <Scott McKay> - [fc6b5fb](https://github.com/onnx/onnx/commit/fc6b5fb): Refactor shape inference implementation (#855) <anderspapitto> - [b7d8dc8](https://github.com/onnx/onnx/commit/b7d8dc8): fix cmake warning message (#863) <Eric S. 
Yu> - [f585c5d](https://github.com/onnx/onnx/commit/f585c5d): add pytorch-operator test for tile (#831) <Wenhao Hu> - [993fe70](https://github.com/onnx/onnx/commit/993fe70): add install step (#832) <Eric S. Yu> - [68bc26c](https://github.com/onnx/onnx/commit/68bc26c): add type inference for traditional ml ops except classifier ops. (#857) <Ke Zhang> - [9cc0cda](https://github.com/onnx/onnx/commit/9cc0cda): fix string representation of scalar types (#858) <G. Ramalingam> - [1078925](https://github.com/onnx/onnx/commit/1078925): fix y in pow test case to scalar (#852) <Wenhao Hu> - [c66fb6f](https://github.com/onnx/onnx/commit/c66fb6f): Add some math function shape inference (#845) <anderspapitto> - [ff667d1](https://github.com/onnx/onnx/commit/ff667d1): Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853) <Marat Dukhan> - [11c6876](https://github.com/onnx/onnx/commit/11c6876): clear initializer names when clear initializer (#849) <Wenhao Hu> - [73c34ae](https://github.com/onnx/onnx/commit/73c34ae): Clarify FeatureVectorizer description. (#843) <Scott McKay> - [1befb9b](https://github.com/onnx/onnx/commit/1befb9b): Remove useless text in docs (#850) <Lu Fang> - [e84788f](https://github.com/onnx/onnx/commit/e84788f): Fix SELU attributes' default values (#839) <Lu Fang> - [ebac046](https://github.com/onnx/onnx/commit/ebac046): Add tile test case (#823) <Wenhao Hu> - [8b7a925](https://github.com/onnx/onnx/commit/8b7a925): a few more shape inference functions (#772) <anderspapitto> - [9718f42](https://github.com/onnx/onnx/commit/9718f42): Make the coefficient non optional for LinearClassifier (#836) <Jaliya Ekanayake> - [ef083d0](https://github.com/onnx/onnx/commit/ef083d0): Add save_tensor and load_tensor functions for Protos (#770) <Lu Fang> - [45ceb55](https://github.com/onnx/onnx/commit/45ceb55): Check if CMAKE_BUILD_TYPE set before project(). 
(#812) <Sergii Dymchenko> - [4b3d2b0](https://github.com/onnx/onnx/commit/4b3d2b0): [WIP] reenable shape inference tests (#834) <anderspapitto> - [22d17ee](https://github.com/onnx/onnx/commit/22d17ee): RNN tests: LSTM, GRU, SimpleRNN (#739) <Peyman Manikashani> - [de65b95](https://github.com/onnx/onnx/commit/de65b95): dimension denotation (#443) <Tian Jin> - [eccc76e](https://github.com/onnx/onnx/commit/eccc76e): fix field number issue in onnx operator proto and enable its build (#829) <Ke Zhang> - [d582beb](https://github.com/onnx/onnx/commit/d582beb): disable shape inference test to unbreak ci (#830) <Lu Fang> - [485b787](https://github.com/onnx/onnx/commit/485b787): function proto for composite op. (#802) <Ke Zhang> - [cd58928](https://github.com/onnx/onnx/commit/cd58928): specify defaults for attributes of Affine op (#820) <G. Ramalingam> - [7ee2cf9](https://github.com/onnx/onnx/commit/7ee2cf9): merge the dummy backend back into the main one (#743) <anderspapitto> - [1c03a5a](https://github.com/onnx/onnx/commit/1c03a5a): [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551) <Marat Dukhan> - [3769a98](https://github.com/onnx/onnx/commit/3769a98): Rename real model test case from VGG-16 to ZFNet (#821) <Lu Fang> * [C2]ReluN Op relu n op. tf reference: https://www.tensorflow.org/api_docs/python/tf/nn/relu6 * Call destructor when assigning a blob value * Add executor overrides Add executor overrides flag to enable migration to async_scheduling executor * Add barrier net that runs before training nets - attempt #2 Add a synchonize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before start training. This reduce chances of the faster shards timing out during GLOO AllReduce. Removed explicit data_parallel_model.py.synchronize call in holmes workflow. 
This change was landed previously but caused errors for some EDPM workflows - See https://fb.facebook.com/groups/1426530000692545/permalink/1906766366002237/ - because EDPM assumes any call to CreateOrCloneCommonWorld and Gloo ops are wrapped in exception handlers, but in this case the exception thrown in the barrier init net is not handled. To address this issue, we add _CreateOrCloneCommonWorld to the param_init_net instead of a new barrier init net. Since errors in the param_init_net run are handled gracefully with a re-rendezvous, this should fix the problem. * Handle empty nets in async_scheduling Make sure we don't get stuck on empty nets * use CUDA_ARCH for conditional compile * [C2 fix] infer function for ensure_cpu_output_op * Update group_norm test to reduce flaky test * Fix lr_multiplier for GPU	2018-05-10 23:14:27 -07:00
Orion Reblitz-Richardson	1d5780d42c	Remove Apache headers from source. * LICENSE file contains details, so removing from individual source files.	2018-03-27 13:10:18 -07:00
mlappelbaum	d11fc90317	Export atomic iter count (#2379 ) * Exported AtomicIterOp count * Exported AtomicIterOp count * Exported AtomicIterOp count * Exported AtomicIterOp count * Exported AtomicIterOp count * Exported AtomicIterOp count * Exported AtomicIterOp count * Add axis to top_k_op. (#2416) * Revert update on top_k_op * Add axis to top_k_op Add axis to top_k_op * [auto] Update onnx to a8e4648 - Adjust link flags when built in Windows Debug mode (#647) `a8e4648a7d` * [auto] Update onnx to f4acf28 - Remove allowconsumed enforceconsumed from op schema. (#617) `f4acf281ef` * Exported AtomicIterOp count * Exported AtomicIterOp count * Exported AtomicIterOp count * Exported AtomicIterOp count * Exported AtomicIterOp count * Exported AtomicIterOp count * Initialize cpuinfo in the thread pool Thread pool called cpuinfo_get_processors_count() without initializing cpuinfo. Only by luck it didn't make Caffe2 single-threaded: threadpool is initialized after NNPACK, and NNPACK initializes cpuinfo itself. This commit also updates cpuinfo to a version that aborts with a fatal error if its used uninitialized. 
* Updated Python Op and Image Pre-Processing Pipeline tutorials && Added CIFAR-10 Part 1 tutorial (#2286) * Updated Basics tutorial: (1) Added Python 3 support with __future__ statements; (2) Various grammatical/typo fixes and minor refactoring of Markdown * Added Python 3 support and made minor typo fixes * Added Python 3 support with future imports, refactored and corrected errors in Markdown, added comments * Added Python 3 support with future imports, Added use of caffe_translator.py to translate downloaded .caffemodel file to .pb files * Upgrades to Image Pre-Processing Pipeline tutorial * Updated Python Op tutorial * removed markdown with empty links * Added Part 1 of an end-to-end CIFAR-10 tutorial * Updated MNIST Dataset and Databases tutorial with python3 support and markdown fixes * Tweaks to markup, less training iterations * changed permissions of CIFAR10_Part1; typo corrections in Image_Pre-Processing_Pipeline * Typo corrections in Multi-GPU Training tutorial * sync Python_Op py_gen with the IPython notebook * nit typo correction * [auto] Update onnx to 5cb999d - Minor cleanups to shape inference (#653) `5cb999ddc1` * [auto] Update onnx to ecac1c1 - Merge Rel 1.1.0 branch into master (#657) `ecac1c1624` * Strip down onnx to only pb definitions in mobile build (#2426) * Exported AtomicIterOp count * Exported AtomicIterOp count * Exported AtomicIterOp count * Exported AtomicIterOp count * Exported AtomicIterOp count * Exported AtomicIterOp count * Exported AtomicIterOp count * Exported AtomicIterOp count * Exported AtomicIterOp count	2018-03-26 19:26:09 -07:00
Orion Reblitz-Richardson	0ea8964fd6	Revert "Export number of iterations of AtomicIterOp" (#2359 ) * Revert "Use -DCMAKE_BUILD_TYPE=Release for local build by default" This reverts commit 035c62081f6420405b9f1380cc5d21b4c6ae78f6. * Revert "Export number of iterations of AtomicIterOp (#2338)" This reverts commit 91b7a0cb48c6b079e2ca8fd5c26819a003937d76.	2018-03-21 16:11:29 -07:00
mlappelbaum	8346088094	Export number of iterations of AtomicIterOp (#2338 ) * Exported AtomicIterOp count * Exported AtomicIterOp count	2018-03-21 12:39:30 -07:00
Orion Reblitz-Richardson	6aa087d902	Revert "export num iterations of AtomicIter" This reverts commit be9c8e5591f5d38131b9bdc2249542f27dadc221.	2018-03-20 13:34:22 -07:00
Matan Appelbaum	fac306d3c9	export num iterations of AtomicIter as title. Useful for tracking number of EASGD updates.	2018-03-20 13:34:22 -07:00
Orion Reblitz-Richardson	5c381bbc57	Patch cuda-convnet2 from internal Facebook changes. * Unfortunately this needs to be manually monkey patched. * This should get it so GitHub and fbcode versions match.	2018-02-28 14:20:48 -08:00
Pieter Noordhuis	52fa742c51	Revert D6893040: Implementing Pow operator (this merges existing pow with a scalar and new pow with a tensor exponent). Summary: This reverts commit 30f614beea6f859fee25ce4f85573142885dde45 bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! cause_a_sev_many_files Differential Revision: D6893040 Original commit changeset: 30f614beea6f fbshipit-source-id: 5e98a24699088283f864efe31234874bdacbe3c3	2018-02-14 10:34:08 -08:00
Maxim Naumov	f7cc8e8822	Implementing Pow operator (this merges existing pow with a scalar and new pow with a tensor exponent). Summary: The old pow operator has been deleted in math_ops.cc, math_ops.cu and math_ops.h, while the new operator supporting scalar and tensor exponent has been added in pow_op.cc, pow_op.h and elementwise_op.cu. Reviewed By: houseroad Differential Revision: D6893040 fbshipit-source-id: 30f614beea6f859fee25ce4f85573142885dde45	2018-02-13 17:46:35 -08:00
Huazhong Ning	90543ff13a	weighted sampling reader dequeue outputs table index Summary: Weighted sampling reader dequeue randomly chooses a hive reader to read a mini-batch. This diff allows dequeue to output the index of the randomly chosen table to a specific blob. Reviewed By: kennyhorror Differential Revision: D6621070 fbshipit-source-id: 754b981fc2bcfdb0146d2a0a5b677e7cfe74211b	2018-01-24 19:06:25 -08:00
Pieter Noordhuis	d618c05174	Increase lower bound of values for values in div test Summary: This should translate to a 1% error margin. The gradient checker uses a .5% threshold. Closes https://github.com/caffe2/caffe2/pull/1766 Differential Revision: D6774077 Pulled By: pietern fbshipit-source-id: f97c7ffb2ef34fdd71d69320a7fdcf4a6a457715	2018-01-22 09:06:12 -08:00
Ahmed Taei	d1d6c0b12b	Add CUDA implementation for ReplaceNaNOp Reviewed By: jay-mahadeokar Differential Revision: D6481993 fbshipit-source-id: cb253621795bb9de73d3e8bc1c8fc21b596d88c3	2017-12-05 13:34:51 -08:00
Junjie Bai	3da9d7971d	Suppress pytest filter_too_much health check Summary: Fix the travis CI Closes https://github.com/caffe2/caffe2/pull/1524 Reviewed By: dzhulgakov Differential Revision: D6412499 Pulled By: bddppq fbshipit-source-id: eaa5942c88d4edd65600d035e31d2300fd8ab3a8	2017-11-27 08:35:27 -08:00
Yan Shang	24e83acbb9	Enable sampling in evaluation Reviewed By: chocjy Differential Revision: D6119768 fbshipit-source-id: c8447326008392df70ab10b04f84223cf6d882b1	2017-11-16 14:03:51 -08:00
Andrey Malevich	e13f199452	Switch RNNOp to use NetDef argument for step representation. Summary: Before this diff RNNOp was using TextFormat for representing steps. This diff is changing RNNOp to prefer NetDef argument instead. To be backward compatible it supports TextFormat for existing models, though we can compile RNNs without TextFormat as well. Reviewed By: salexspb Differential Revision: D5949330 fbshipit-source-id: 9336a8f5ccf30ad8d8e3a7067b9437e1704b1c9f	2017-10-10 22:01:51 -07:00
Di Yu	acc384183a	caffe2 operator logit / logit gradient CUDA implementation Summary: This is the continuation of T20872698 Implement the gradient operator for element-wise Logit Reviewed By: asaadaldien Differential Revision: D5969487 fbshipit-source-id: c9bb4222529f9fd9085aa9048b90eb70a63f41f4	2017-10-03 18:48:25 -07:00
Yangqing Jia	8286ce1e3a	Re-license to Apache Summary: Closes https://github.com/caffe2/caffe2/pull/1260 Differential Revision: D5906739 Pulled By: Yangqing fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902	2017-09-28 16:22:00 -07:00
Di Yu	711d7137c7	Implement the gradient operator for element-wise Logit Summary: Implemented logit gradient with eps as arg. Add the unit test for it and explored the optimal parameter to run the test. Reviewed By: asaadaldien Differential Revision: D5910655 fbshipit-source-id: 44898b784a57c7ad45519b202b1eaf95c1c4d460	2017-09-26 14:49:22 -07:00
Aapo Kyrola	bb08f261f1	EnsureDense/SparseToDense for CUDA Summary: Make CUDA version of SparseToDense, register EnsureDense (which is trivial) on CUDA. Need to use atomics because indices can be duplicated. We can later add an option to inform if the indices are unique, and use faster path then. Reviewed By: jhcross Differential Revision: D5750893 fbshipit-source-id: 005d1675b127a571aac8474fca62d9633f0c7bff	2017-09-01 09:33:05 -07:00
Ahmed Taei	8af625ede2	Implement gradients for Col2Im and Im2Col operators Reviewed By: jay-mahadeokar Differential Revision: D5576385 fbshipit-source-id: a0ca4f704fd861f7cc67079041b1d0772fc66920	2017-08-07 15:51:30 -07:00
Wojciech Glogowski	8f8dccd2ed	distance_op_test from hypothesis_test refactored Summary: Moved distance_op_test from hypothesis_test to distance_op_test and refactored Reviewed By: akyrola, asaadaldien Differential Revision: D5495104 fbshipit-source-id: 4a90c75eabeb380ae9d150d6258e9b5b0fbfc5ca	2017-07-26 13:37:08 -07:00
Wojciech Glogowski	f656e002a7	CosineSimilarity GPU Reviewed By: asaadaldien, akyrola Differential Revision: D5476812 fbshipit-source-id: d931a7d8e4a4dfdf22ee18f8b9c755cc21b0e75b	2017-07-25 13:34:01 -07:00
Bangsheng Tang	e5a7891038	dot product using matmul Summary: 1. PairwiseDotProduct in layers 2. add_axis argument in Concat and Split (just for backward propagation) Reviewed By: xianjiec Differential Revision: D5383208 fbshipit-source-id: 8e18ce371fff2da2da77b1a728142d69cd48e9c3	2017-07-17 23:20:37 -07:00
Aapo Kyrola	f44991b398	add timeout argument to DequeueBlobs; use 10 min timeout for data workers Summary: As title. This helps with (quite common) cases where data input is stuck for one reason or another, and the net execution never proceeds and is stuck forever. Reviewed By: andrewwdye Differential Revision: D5409885 fbshipit-source-id: 840261fd5964408f788fc0f50ece0d74193694ac	2017-07-13 18:52:03 -07:00
Marat Dukhan	2ac9ff5c96	Cos, Sin, and Abs operators Summary: add Cos, Sin, and Abs operators Reviewed By: akyrola Differential Revision: D5307632 fbshipit-source-id: 743c9d289e4d3fd439e4b5385841cdff87d9247a	2017-07-03 22:18:32 -07:00
Thomas Dudziak	5355634dac	Dict fixes/improvements and unittest targets for Python 3 in caffe2 core Summary: As title Reviewed By: salexspb Differential Revision: D5316104 fbshipit-source-id: aee43819d817842e5ce6ba3d045a55b1a2491c30	2017-06-29 17:05:41 -07:00
Andrew Tulloch	912ee4e40a	Fix `test_sparse_to_dense` precision failures Summary: .. Reviewed By: tomdz Differential Revision: D5349561 fbshipit-source-id: 4c510905515eb03a64abc36f33d59a1d998c2ab1	2017-06-29 12:48:03 -07:00
Thomas Dudziak	342de07231	Core unit test fixes for Python 3 Summary: As title Differential Revision: D5291327 fbshipit-source-id: 7dd9279c53ba55d3422c31973ffcec5705787fdf	2017-06-23 13:22:16 -07:00
Luke Yeager	e2107fffba	Fixes for test_recurrent in hypothesis_test.py Summary: /cc akyrola is it possible this test has been broken ever since `5614816fce`? More generally, why do we still have `hypothesis_test.py` at all? In the case of this test, surely one of these files does more than this one old test: * `operator_test/cudnn_recurrent_test.py` * `operator_test/recurrent_network_test.py` * `operator_test/rnn_cell_test.py` Closes https://github.com/caffe2/caffe2/pull/843 Differential Revision: D5292109 Pulled By: akyrola fbshipit-source-id: 6df5df6353a9741d1ae1b796adaab98382857527	2017-06-21 05:35:42 -07:00
Luke Yeager	31e700910d	Fix entropy error coming from test_div Summary: Working towards https://github.com/caffe2/caffe2/pull/817. `E InvalidArgument: Insufficient bytes of entropy to draw requested array. shape=(4, 2, 5, 1, 3, 5, 5, 1), dtype=float32. Can you reduce the size or dimensions of the array? What about using a smaller dtype? If slow test runs and minimisation are acceptable, you could increase settings().buffer_size from 8192 to at least 24576000.` https://travis-ci.org/caffe2/caffe2/jobs/243867951 Closes https://github.com/caffe2/caffe2/pull/828 Differential Revision: D5276723 Pulled By: akyrola fbshipit-source-id: f7d0e2dd8ef8b6a2354bd4ff7c7446c377c954b4	2017-06-19 13:47:29 -07:00
Po-Yen Chou	5ce9cbae70	Upgrades python/hypothesis_test.py to use brew instead of CNNHelperModel Summary: Upgrades this file to use brew instead of CNNHelperModel Reviewed By: harouwu Differential Revision: D5252089 fbshipit-source-id: 6df4350717c1d42bc4bcc63d255cd422f085ee05	2017-06-15 15:07:56 -07:00
Luke Yeager	f61e4ca070	Fixes in tests to support numpy >= 0.12 Summary: ``` File "/data/caffe2/install/caffe2/python/hypothesis_test.py", line 1911, in test_batch_to_space (w + 2 * pad) / block_size).astype(np.float32) File "mtrand.pyx", line 1404, in mtrand.RandomState.randn (numpy/random/mtrand/mtrand.c:19843) File "mtrand.pyx", line 1534, in mtrand.RandomState.standard_normal (numpy/random/mtrand/mtrand.c:20368) File "mtrand.pyx", line 167, in mtrand.cont0_array (numpy/random/mtrand/mtrand.c:6127) TypeError: 'float' object cannot be interpreted as an index ``` ``` File "/data/caffe2/install/caffe2/python/operator_test/tile_op_test.py", line 101, in tile_ref tiled_data = np.tile(X, tuple(dims)) File "/data/caffe2/venv/local/lib/python2.7/site-packages/numpy/lib/shape_base.py", line 881, in tile return c.reshape(shape_out) TypeError: only integer scalar arrays can be converted to a scalar index ``` I also tested to make sure this still works with 0.11. Closes https://github.com/caffe2/caffe2/pull/787 Differential Revision: D5248087 Pulled By: salexspb fbshipit-source-id: eff69482a8eabb8ace330003fa326c832b53865f	2017-06-15 14:17:20 -07:00
Jiyan Yang	c7aa8e142d	Add gradient to SparseToDense op Summary: As desc. Differential Revision: D5169423 fbshipit-source-id: 64c72933c14c3caabfbe0bf85912194a479c24fa	2017-06-09 13:47:21 -07:00
Pooya Davoodi	2c97c98ca7	Enable testing the GPU implementations of Adagrad and Adam Summary: Enable testing the GPU implementations of Adagrad and Adam incl sparse versions. Closes https://github.com/caffe2/caffe2/pull/607 Reviewed By: dzhulgakov Differential Revision: D5121552 Pulled By: Yangqing fbshipit-source-id: da6b7dde456237c94cf74d00860e7327b2267eab	2017-06-01 18:10:57 -07:00
Yiming Wu	b070197e8a	cuda unique op Summary: cuda unique op, unittest provided, will provide benchmark against CPU SpeedUp results for synthetic real data. Input of size 20k, range[1, 10million], ~5x speedup CPU 9.05795(ms) Unique GPU 1.79434(ms) Unique SpeedUp results for 5x synthetic data. Input of size 1 million, range[1, 10million] ~13.7x speedup CPU 54.7539(ms) Unique GPU 3.99473(ms) Unique Reviewed By: akyrola Differential Revision: D5007726 fbshipit-source-id: 0a00c518fd1809d0ae8c6cfcba09b0bd982ffaff	2017-05-11 21:08:10 -07:00
Xiaolong Wang	0d32ab4a45	Refactor FTRL optimizer to allow sending Alpha as input blob Summary: Split from parent diff Reviewed By: xianjiec Differential Revision: D4992993 fbshipit-source-id: 9f8a79023b0c581e84bd5e82e2e730c9e1a86e1e	2017-05-04 22:57:00 -07:00
Kittipat Virochsiri	c34d5a838f	Generalize LastNWindowCollector Summary: Use `CopyItems` so that it accepts any type of tensor. Also, move the cursor to input blob so that it's checkpoint friendly. Output is now also part of input so that inference can work correctly. Reviewed By: xianjiec Differential Revision: D4920987 fbshipit-source-id: da532736225ec27f409ff763ff69a0629235151c	2017-05-04 16:05:15 -07:00
Ahmed Taei	561255218a	NormalizeOP CUDA implementation Summary: Implement NormalizeOP for GPU using CUDA, and rewrite the gradient to be a function of the output so it's more efficient, especially for the CUDA implementation. Reviewed By: akyrola Differential Revision: D4971300 fbshipit-source-id: e0ab66462000988aaf1f26010ea550533d107167	2017-05-01 09:25:30 -07:00
Aapo Kyrola	ed05c28bc6	Speedup SquaredL2Distance CUDA Summary: Both SquaredL2Distance and SquaredL2DistanceGradient had bad CUDA implementations. Use proper reductions and batched kernels. Reviewed By: asaadaldien Differential Revision: D4968527 fbshipit-source-id: f7cf82072d38bc127c757c5751863a9439aca8b5	2017-04-28 11:55:59 -07:00
Luke Yeager	09bb91022a	Fix tests for ops without a CUDA backend Summary: See https://github.com/caffe2/caffe2/pull/227 * Logit * ReplaceNaN * BatchOneHot Closes https://github.com/caffe2/caffe2/pull/277 Differential Revision: D4915268 Pulled By: Yangqing fbshipit-source-id: 77ccb2e7d03e6953e8ca60646987a02868d0ef5b	2017-04-24 15:52:25 -07:00
Lei Chen	8b5782ed5c	Weighted sampling dequeue operator Summary: Similar to SafeDequeueBlobsOp, but add weight-based sampling for reading from multiple input BlobsQueue. WeightedSampleDequeueBlobsOp will take a vector of weights (each weight is mapped to one input blob queue). Based on probability, we will choose which BlobQueue to fetch. WeightedSampleDequeueBlobsOp shall stop when any of input BlobQueue is empty. Reviewed By: dzhulgakov Differential Revision: D4905160 fbshipit-source-id: 5b1551e2250569f933a6c01ed04442843c5e0cb6	2017-04-19 12:02:06 -07:00
Xianjie Chen	70e9c08f27	feature processing ops Summary: add necessary ops for feature processing * logit op * replace nan * batch one hot op Reviewed By: kittipatv Differential Revision: D4840869 fbshipit-source-id: 197123ea5608d54f0b5ac7899973a077a6a86775	2017-04-11 07:07:51 -07:00
Aapo Kyrola	8da2d75ec8	[Caffe2/Recurrent] recurrent.py API to cuDNN LSTM Summary: Quite large diff to make cuDNN LSTM and our LSTM produce same results and provide python API for the cuDNN LSTM. * Added operators RecurrentParamGet and RecurrentParamSet to access weights and biases for the different gates, input/recurrent. * Removed RecurrentInit as not needed * recurrent.cudnn_LSTM() returns a special net and mapping that can be used to retrieve the parameters from the LSTM * recurrent.cudnn_LSTM() can be passed blobs that have the parameters for the individual gate weights and biases * recurrent.InitFromLSTMParams() can be used to initialize our own LSTM from CUDNN params. This way we can test if cuDNN and our own produce the same result. recurrent_test.py tests for the equivalency Reviewed By: salexspb Differential Revision: D4654988 fbshipit-source-id: 6c1547d873cadcf33e03b0e0110248f0a7ab8cb0	2017-04-05 14:20:23 -07:00
Dmytro Dzhulgakov	ef42d4c2aa	Fix sparse to dense and improve DispatchHelper Summary: Actually adds stuff on duplicated indices. I didn't use UnorderedSegmentSum because it'd need more modifications for figuring out the first dimension and I don't want to make that function more complex than it already is :) We theoretically can have a version that does CopyItems and fails on duplicate indices as a fallback. But I haven't implemented it yet as it wouldn't be that useful for now. Also fixes hypothesis test - doing rand() inside the body is not cool as it makes hypothesis run forever Differential Revision: D4814574 fbshipit-source-id: 1851ec5f5df8fc4bf4844585076b8af23a06b0b2	2017-04-04 15:03:39 -07:00
Aapo Kyrola	e13e9c1302	cuDNN version of TransposeOp Summary: Uses the cudnnTransformTensor function. It works by shuffling the strides according to the transpose axis. Significant speedup over current GPU version. + moves the transpose test under utility_ops, because hypothesis_test is too big Reviewed By: jamesr66a Differential Revision: D4810993 fbshipit-source-id: 82577c4ced1389e70bd5992820ae4d8297a3817f	2017-04-03 13:33:10 -07:00
Luke Yeager	a95751e918	Fix test_random_seed_behavior for multi-GPU Summary: ``` E0327 17:33:12.775998 15629 context_gpu.h:126] Encountered CUDA error: an illegal memory access was encountered F0327 17:33:12.776208 15629 operator.h:176] Computation on device returned error in operator output: "Y" name: "" type: "XavierFill" arg { name: "shape" ints: 2 } device_option { device_type: 1 cuda_gpu_id: 0 } ``` Closes https://github.com/caffe2/caffe2/pull/225 Differential Revision: D4819785 Pulled By: Yangqing fbshipit-source-id: 896ca4d6534643bc261667377cc74d4fd7b3aca3	2017-04-03 10:50:46 -07:00
Luke Yeager	d76a814c93	Fixes for ops without a CUDA backend Summary: All of these tests fail with some variant of `Cannot create operator of type 'X' on the device 'CUDA'` (see commit messages). Closes https://github.com/caffe2/caffe2/pull/227 Differential Revision: D4797060 Pulled By: Yangqing fbshipit-source-id: 5feaa8e949098bfc1254d4c7449a2744e552f925	2017-03-29 14:36:09 -07:00
Ahmed Taei	f2b8150a1a	Fix PadImage same padding argument. Summary: PadImage has no kernel parameters, so the pads_ parameters were left unset (0). I added a test case too. Differential Revision: D4785230 fbshipit-source-id: fd475e7c41208e07fa7a363def9a45c6f82cddfe	2017-03-28 13:21:36 -07:00
James Cross	b41449b680	SparseMomentumSGDUpdateOp Summary: Creates SparseMomentumSGDUpdate, a sparse version of MomentumSGDUpdate, to make that optimization method (via in-place updating operator) compatible with GradientSlices. Differential Revision: D4784973 fbshipit-source-id: e6330f471a4d5f53589a6ac245e38f256ca7f354	2017-03-28 07:47:46 -07:00

86 Commits
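
The composite learning rate policy described in commit b875fb281c (a list of policies, each with an iteration count, switched according to the number of iterations so far) can be illustrated with a minimal plain-Python sketch. This is not the Caffe2 operator: the names `fixed_lr`, `poly_lr` and `composite_lr` are hypothetical, the polynomial decay formula is one common form chosen for illustration, and restarting each sub-policy's iteration counter at its own stage boundary is an assumption the commit message does not confirm.

```python
from typing import Callable, List, Tuple

# A policy maps an iteration index (local to its stage) to a learning rate.
Policy = Callable[[int], float]


def fixed_lr(rate: float) -> Policy:
    """Hypothetical stand-in for a fixed learning rate policy."""
    return lambda it: rate


def poly_lr(base: float, power: float, max_iter: int) -> Policy:
    """Hypothetical polynomial decay: base * (1 - it / max_iter) ** power."""
    def policy(it: int) -> float:
        frac = min(it, max_iter) / max_iter  # clamp so the rate never goes negative
        return base * (1.0 - frac) ** power
    return policy


def composite_lr(schedule: List[Tuple[int, Policy]]) -> Policy:
    """Chain (num_iter, policy) stages; the last stage also covers any later iteration."""
    if not schedule:
        raise ValueError("schedule must contain at least one stage")

    def policy(it: int) -> float:
        start = 0
        for num_iter, sub_policy in schedule:
            if it < start + num_iter:
                # Assumption: each sub-policy sees an iteration count local to its stage.
                return sub_policy(it - start)
            start += num_iter
        # Past the end of the schedule: keep using the final stage.
        last_num_iter, last_policy = schedule[-1]
        return last_policy(it - (start - last_num_iter))

    return policy


# The example from the commit message: a fixed rate for the first 1k iterations,
# then a polynomial policy for the following iterations.
lr = composite_lr([
    (1000, fixed_lr(0.1)),
    (1000, poly_lr(0.1, 1.0, 1000)),
])
```

Calling `lr(i)` with the global iteration `i` returns the rate from whichever stage `i` falls into. Keeping the stage lookup separate from the individual policies means new policy types can be added without touching the switching logic.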