Commit Graph

300 Commits

Author SHA1 Message Date
Qichao Que
2f68632a32 Add SparseNN workflow for feed.
Summary: Add SparseNN workflow for feed. I haven't fully thought through the changes needed for ads; I added a property called 'preproc_output_schema' to LayerModelHelper.

Reviewed By: xianjiec

Differential Revision: D4585796

fbshipit-source-id: 060d08f4beb928e7e7863f2e563f612c358951fb
2017-03-01 11:02:38 -08:00
Aapo Kyrola
aa3156c235 Remove use of logging module and np.random.randint() due to deadlocks with forks
Summary: See http://bugs.python.org/issue6721. Since everstore loaders use ProcessPoolExecutor, which is based on forks, and there was perhaps an update of the numpy library or some unrelated library, we started getting subprocesses stuck at np.random.randint(). Also changed logging to prints, since logging is known to have issues with multiprocessing. See https://www.prod.facebook.com/groups/fbpython/permalink/1438647216176641/

Differential Revision: D4633725

fbshipit-source-id: ae948a1827c71a3a2119d6a3248706728984df31
2017-03-01 03:32:56 -08:00
Aapo Kyrola
02937903cc add inference for gradient ops + a couple of missing shape inference functions + fix to scalars
Summary:
A bit too much stuff in one diff, sorry:

1. Add inference for gradient types by using the fact that x_grad is the gradient of x and must be of the same shape. Using string matching for this is somewhat awkward, so in addition I rely on the operator actually being a gradient op.
2. dzhulgakov was right: a scalar's shape is () and not (1). Sorry, my earlier claim was #fakenews.
3. Added inference functions for MakeTwoClass, MomentumSGDUpdate and cross-entropy ops.
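The shape rule in point 1 can be illustrated with a small generic sketch (the helper name and signature are illustrative assumptions, not the Caffe2 implementation):

```python
def infer_gradient_shape(blob_name, known_shapes):
    """If blob_name looks like 'x_grad', return the shape recorded for 'x'.

    Mirrors the rule that a gradient blob has the same shape as the blob it
    is the gradient of. Callers should also verify the producing op really
    is a gradient op, since the name match alone is not conclusive.
    """
    suffix = "_grad"
    if blob_name.endswith(suffix):
        return known_shapes.get(blob_name[: -len(suffix)])
    return None
```

Note that, per point 2, a scalar's shape would be the empty tuple `()`, not `(1,)`.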

Reviewed By: dzhulgakov

Differential Revision: D4569758

fbshipit-source-id: 0db13f33819777fdddefe21d4b1ebf906fcaf98c
2017-02-28 23:33:32 -08:00
Aapo Kyrola
f84e5360cc LSTM benchmark (Caffe2 RNN based)
Summary: Just generate some random data and put it through an LSTM (Caffe2 RNN based), using its own output as the gradient value, for benchmark purposes. With default parameters it fits my dev GPU's memory; on the default parameters provided in this diff I got 300k entries per second processed. The entries are split into blocks of seq_length * block_size. Each entry is of size hidden_dim, so the LSTM takes hidden_dim-sized input and produces output of the same size.

Reviewed By: salexspb

Differential Revision: D4605815

fbshipit-source-id: dd529302a0a93e8711784c67e4c777c8d6a8cdf4
2017-02-28 23:17:26 -08:00
Jerry Pan
8a0ebed4c9 Caffe2: Tile operator
Summary: Caffe2: Tile operator

Differential Revision: D4630698

fbshipit-source-id: 1aa5c3c9d7fcfc17f78c80fd4b752595280266a0
2017-02-28 23:17:26 -08:00
Yangqing Jia
bdd542d087 backup functions for non-cuda cases
Summary: This fixes the error introduced in cudnn v6 diff.

Reviewed By: ajtulloch

Differential Revision: D4633113

fbshipit-source-id: 454cd4b3e52b8de01c1914e66d25310d7ecb13aa
2017-02-28 22:07:54 -08:00
Luke Yeager
69fa85be26 Fix some typos
Summary:
Found while reading through d522693cc8
Closes https://github.com/caffe2/caffe2/pull/176

Differential Revision: D4630275

Pulled By: Yangqing

fbshipit-source-id: 0a8e85d317d427a39467ebcb5e9a70594075bae2
2017-02-28 18:36:12 -08:00
Simon Layton
fbf47a8825 Cudnn v6
Summary:
Add cudnn v6 support, including testing support for dilated convolution.
Add a check to ensure that the versions of cuDNN used to compile Caffe2 and run it are compatible
Closes https://github.com/caffe2/caffe2/pull/85

Reviewed By: bwasti

Differential Revision: D4387690

Pulled By: Yangqing

fbshipit-source-id: 312960134398dd4afe6ee0c01cdc160046c904e8
2017-02-28 17:46:33 -08:00
Zachary Mirman
1c92e85dae Added editDistance helper to caffe2 operators
Summary: Added editDistance helper to caffe2 operators
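For reference, the classic dynamic-programming recurrence such an edit-distance helper typically implements (a generic sketch, not the actual Caffe2 operator code):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))            # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                            # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,      # deletion
                            curr[j - 1] + 1,  # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]
```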

Differential Revision: D4622152

fbshipit-source-id: 4d6246b8226c1283d5883edfaa27e8f7748fdc4c
2017-02-28 13:31:56 -08:00
Artem Volkhin
000db87bc7 Half-floats support for the rest of segment ops
Summary:
fp16 was previously supported only in the SparseLengthsSum operator; now it works in all the other segment operators as well.

Reviewed By: dzhulgakov

Differential Revision: D4624312

fbshipit-source-id: c9d72110e3762167270bb088405eaf9c56e88493
2017-02-28 11:19:15 -08:00
Andrey Malevich
a3726759c6 Add a way do describe layers in a more AdHoc manner.
Summary:
This diff is trying to address one of the concerns that Xianjie has had: the requirement to create a layer for every operator and to pass shapes and other info around.

The basic idea of the diff:
1. Try to create a layer with a given name, but if it's not available, fall back on an operator with that name (which is expected to have no parameters).
2. For all operators added through this functional style of creation, try to use C2 shape/type inference logic to get the output type. If that fails, just return an untyped record and expect the user to annotate it when it's really needed.

Reviewed By: xianjiec

Differential Revision: D4408771

fbshipit-source-id: aced7487571940d726424269970df0eb62670c39
2017-02-27 23:30:39 -08:00
Aaron Markham
851cb7059d changed StringfyProto to StringifyProto
Summary: Closes https://github.com/caffe2/caffe2/pull/155

Reviewed By: dzhulgakov

Differential Revision: D4621607

Pulled By: Yangqing

fbshipit-source-id: ec7f45132260fbb6d36ef61ffbf5bf6466f237eb
2017-02-27 23:05:04 -08:00
Pooya Davoodi
d85ca8c6df Do not initialize BN params if init_params is false.
Summary:
If init_params is False, the parameters should not be initialized.
This is particularly important when testing a model that provides values for these BN parameters.
Closes https://github.com/caffe2/caffe2/pull/174

Differential Revision: D4621791

Pulled By: Yangqing

fbshipit-source-id: 518443925990a12c1d5729b0971ebe19ba5d8998
2017-02-27 20:19:03 -08:00
Aapo Kyrola
7b0126381c Share queue + reduce logging
Summary: It is better for the workers to share the Python-side queue, since I saw a case where the workers assigned to one GPU were lagging behind the others. Also reduced logging, as requested by rpenggithub.

Differential Revision: D4620487

fbshipit-source-id: 73353f9570b07788c8cd71c9fec9308cd93a44dd
2017-02-27 19:38:45 -08:00
Kun Huang
07623e24c9 Implement shape inference function for Im2ColOp
Summary: Inference function for the Im2ColOp: caffe2/caffe2/operators/im2col_op.cc.

Differential Revision: D4608663

fbshipit-source-id: d26ffb403c2acb7a5ead5f58f044ee3340c8311a
2017-02-27 10:46:54 -08:00
Aapo Kyrola
449f8997ab close blobs queues when stopping + test
Summary:
Mysterious deadlocks after an epoch has finished have occurred randomly, but quite frequently, for myself, vigneshr and others recently. Looking at a stack trace of vigneshr's job (P57129798), I noticed a couple of threads were calling BlobsQueue.blockingWrite (or something like that). That call blocks when the caffe2/C++-side queue is at capacity (we use a capacity of 4 with data workers). So when this call was made just as the script was about to terminate, the thread did not close and the whole process did not close either (I'm not completely sure why, since the thread is a daemon thread, but this might be a flow-related issue since we run inside a flow container).

This is quite easy to fix: just call CloseBlobsQueue() when terminating the process. I modified coordinator.stop() and wait_for_finish() to return a status code based on whether the threads that were joined actually closed within the 1.0 sec timeout. This allowed creating a unit test for this issue; before my change, the unit test failed.

Reviewed By: pietern

Differential Revision: D4619638

fbshipit-source-id: d96314ca783977517274fc7aadf8db4ee5636bdf
2017-02-27 10:07:57 -08:00
Kevin Matzen
04d02632e9 instance norm test fix
Summary:
Reduce the test input size for the instance norm gradient check. The larger size is currently timing out on stress tests.
e.g. failed: Timeout: Ran out of time before finding a satisfying example for test_instance_norm_gradients. Only found 2 examples in 125.39s.

Reviewed By: Yangqing

Differential Revision: D4608828

fbshipit-source-id: ce17a3ad28752d808efcbf79f1ea4238e63fb005
2017-02-25 14:31:42 -08:00
Peng Yang
8ab13eea6f delete redundant comment lines.
Summary: delete redundant comment lines.

Differential Revision: D4600596

fbshipit-source-id: 4bb619f9ff99d6f799e87970b6b6d5ea7de02c98
2017-02-24 11:04:36 -08:00
Xianjie Chen
b257fd8e83 Other places that may need NameScope
Summary:
For code in the layer model helper and layers, it's intentional not to have NameScope by default.

This looks like another place that may need the default NameScope:
https://fburl.com/wdwtxp0m

Reviewed By: kennyhorror

Differential Revision: D4606971

fbshipit-source-id: b560bf59d3242e3f9443cd5aeda5c7e2e4e89079
2017-02-23 21:16:35 -08:00
Aapo Kyrola
9eeeb8407f use CUDA version of AccuracyOp with top_k=1
Summary: D4348953 added support for accuracy with top_k>1, which is only supported on CPU, requiring data to be copied from CUDA. But that diff did not take into account that we have a top_k=1 version of AccuracyOp for CUDA. This diff ensures we use the CUDA version for top_k=1.

Differential Revision: D4607767

fbshipit-source-id: 8becda23890343043eb79ad04e4c6196e9010f0c
2017-02-23 19:02:53 -08:00
Min Li
182c168285 Add group collector limit and add option for enable sum loss
Summary: as title. Add a limit on the number of examples for the group collector, and add an option for enabling sum loss in BatchLRLoss.

Reviewed By: xianjiec

Differential Revision: D4602311

fbshipit-source-id: 5b2a244f1f0e9f1ab0f4590e94828fd18d018d8d
2017-02-23 15:03:22 -08:00
Deepak Gopinath
cd4ea42048 Allowing creation of random odd length arrays in RandGaussian
Summary: curandGenerateNormal can only generate arrays whose lengths are multiples of 2. The MSRAFill and GaussianFill operators use the RandGaussian utility method, which in turn uses curandGenerateNormal. This adds a test which runs the operators on both devices to generate odd-sized random arrays.
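The usual workaround for an even-length-only generator is to over-generate and trim; a hedged numpy sketch of the idea (not the actual CUDA code):

```python
import numpy as np


def gaussian_fill(n, rng=None):
    """Generate n normal samples using a backend that only supports even counts.

    Rounds the request up to the next even number, then trims, mirroring the
    standard workaround for curandGenerateNormal's even-length requirement.
    """
    rng = rng or np.random.default_rng()
    even_n = n + (n % 2)                 # round an odd n up to the next even
    return rng.standard_normal(even_n)[:n]
```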

Differential Revision: D4602819

fbshipit-source-id: e65f5c731e925886cfa14afff482f7053bd020a0
2017-02-23 15:03:22 -08:00
Aapo Kyrola
0a060dae50 better killing after timeout, cleanup
Summary:
This at least partly fixes a recurring problem when using everstore data input (or any other data input with multiprocessing): if the main process dies violently, the child processes are not killed. One cause was the TimeoutGuard(), which called os._exit(1) and thus prevented any cleanup from happening. I changed it to send a SIGINT signal to the PID and, if the process is still alive after 10 secs, to call os._exit(1). In my tests, this works well.

Did some other cleanup:
- improved logging of inputs/sec in data_workers
- removed redundant atexit() handling as the multiprocessing pool does it itself

Differential Revision: D4602550

fbshipit-source-id: 64d4526a2a3625d163d23f078286e719d56998f4
2017-02-23 13:16:19 -08:00
Chonglin Sun
8a85d6bd34 support vectors with different dims in DotProductOp.
Summary:
Add two arguments to the DotProductOp operator: `force_same_dim` (1 if we want
DotProductOp to accept only two tensors of equal dimension, 0 otherwise) and
`pad_value` (only used when force_same_dim = 0; pads the smaller tensor to the
same size as the other one).
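The described behavior can be sketched in numpy (a hedged sketch with assumed argument names matching the commit message, not the C++ operator):

```python
import numpy as np


def dot_product(x, y, force_same_dim=False, pad_value=0.0):
    """Row-wise dot product of two 2-D tensors.

    With force_same_dim, mismatched widths are an error; otherwise the
    narrower input is padded with pad_value up to the wider one's width.
    """
    if x.shape[1] != y.shape[1]:
        if force_same_dim:
            raise ValueError("inputs must have the same dimension")
        width = max(x.shape[1], y.shape[1])

        def pad(t):
            return np.pad(t, ((0, 0), (0, width - t.shape[1])),
                          constant_values=pad_value)

        x, y = pad(x), pad(y)
    return (x * y).sum(axis=1)
```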

Differential Revision: D4502619

fbshipit-source-id: 46f7da710c6f6365f76a7af6234c34c7f656be62
2017-02-23 11:09:07 -08:00
Yury Zemlyanskiy
4a53ab3cb6 LSTMWithAttention implementation in Caffe2
Summary:
Implementation of ##LSTMWithAttention##

Still TBD:
1. There are problems with back-propagation, because the gradient is not implemented for ops with broadcasting
2. I need to make initial_recurrent_state be of shape [dim] rather than [1, batch_size, dim], so one doesn't need to provide batch_size to LSTMWithAttention

Differential Revision: D4298735

fbshipit-source-id: 8903fcff4d6a66647ee6d45a6ef28803fc3091e5
2017-02-23 04:08:34 -08:00
Alexander Sidorov
95262032d8 Char RNN bug fix for batching
Summary:
It could be that only the first item in the batch was really used, in the case
that the rest of the memory was 0. Or, if the memory there had a big positive
integer, the whole sequence was used. So whether we used the rest of the batch depended on our luck :)

Reviewed By: Yangqing

Differential Revision: D4599569

fbshipit-source-id: ae89cee796bbcbc232e4abcab71dee360b0d8bc6
2017-02-22 17:34:30 -08:00
Andrew Tulloch
312821d36c Allow in-place instance norm.
Summary:
In-place is ~30% speedup, but needs a change to torch2caffe
or a graph rewrite on the client.

Differential Revision: D4577582

fbshipit-source-id: c31bf8ba97f4fa4cedf355cf2475eb7bab48b304
2017-02-22 14:03:55 -08:00
Bram Wasti
c7ed091633 Added model downloader
Summary: Closes https://github.com/caffe2/caffe2/pull/156

Reviewed By: Yangqing

Differential Revision: D4574588

Pulled By: bwasti

fbshipit-source-id: a0f2da0b13358157c7d7322257a9c4f1c61aae12
2017-02-22 12:47:15 -08:00
Pooya Davoodi
b0148a7c7d Use ws_nbytes_limit (called cudnn_ws in args).
Summary:
The cudnn_ws arg was already there. This PR just uses that arg when the model is created.
Closes https://github.com/caffe2/caffe2/pull/164

Differential Revision: D4598443

Pulled By: Yangqing

fbshipit-source-id: c2e83f73059360ecf2fedf2c62be7cacbb4034ca
2017-02-22 12:19:16 -08:00
Xianjie Chen
aed3aabc7f model and preprocessor can handle empty dense inputs
Summary: we may not need dense feature inputs in some models (e.g., double helix).

Reviewed By: dzhulgakov

Differential Revision: D4568755

fbshipit-source-id: 6850508f86fafb53f81783b2a2a38776be5455d7
2017-02-22 11:19:15 -08:00
Artem Volkhin
45e1905722 add support of fp16 to SparseLengthsSum and SparseLengthsMean
Summary: Another part of making DPER compatible with half-floats. This diff adds support of fp16 to the segment reduction operators used in DPER.

Reviewed By: dzhulgakov

Differential Revision: D4587560

fbshipit-source-id: 0ae10648a7286a820bffaee802464dd9464584bc
2017-02-22 11:05:55 -08:00
Artem Volkhin
b2cf0fad15 Convert SparseLookup layer's embedding to fp16 blobs for predictor
Summary:
First part of adding half-float support to DPER 2.0. Let's add an option use_half_floats to enable converting some weights of the model from fp32 to fp16 before saving them to the predictor model parts. For now, it's for the SparseLookup layer's embeddings. All conversion is done after training is finished, and the saved models are ready to be used on remote predictors as-is (they will be stored compacted in memory). The new fp16 blobs are saved to the model instead of the original ones, under the same names, so we don't modify MetaNetDef at all.
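The conversion step can be sketched generically (helper name and dict-of-arrays representation are assumptions for illustration, not the DPER code):

```python
import numpy as np


def convert_for_publishing(blobs, half_float_names):
    """Convert selected fp32 weight blobs to fp16 before saving a predictor model.

    Only the named blobs (e.g. SparseLookup embeddings) are converted; they
    keep their original names, so the model metadata is unchanged.
    """
    return {
        name: w.astype(np.float16) if name in half_float_names else w
        for name, w in blobs.items()
    }
```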

Next steps:
1) support on delivery side -- operators working with these blobs should support both float and float16 input types
2) benchmark performance to make sure there is no regression
 a) of serialization
 b) of delivery
3) support realtime training (I'm thinking about adding new pre-publishing net which will be executed each time the realtime trainer stops to publish a new snapshot)

Depends on D4567304

Reviewed By: kennyhorror

Differential Revision: D4571710

fbshipit-source-id: 19967a17d3bd84878d66e8c0ed8c5342bf38d979
2017-02-22 11:05:49 -08:00
Xian Li
64419a928d Implement EnsureDenseOp and EnsureDenseGradientOp.
Summary:
This operator always outputs dense gradients regardless of the input
gradients. For the forward pass, it passes inputs to outputs in place.

Reviewed By: xianjiec

Differential Revision: D4582511

fbshipit-source-id: 7eb2c5d2142aa05d373f06cab1e7f89d8b747d34
2017-02-22 07:16:26 -08:00
Yangqing Jia
47b65b6d8d Add a create your own dataset tutorial
Summary:
bwasti - will follow up via email.
Closes https://github.com/caffe2/caffe2/pull/166

Differential Revision: D4596858

Pulled By: Yangqing

fbshipit-source-id: 6d088ccf1604e0dc9b94cbf0a75b51587e734d95
2017-02-22 03:31:47 -08:00
Alisson Gusatti Azzolini
59f0454621 Gather perf counters for distributed jobs
Summary: Set up a server node that periodically gathers the values of all nodes' perf counters, making it possible to publish them all at once.

Reviewed By: dzhulgakov

Differential Revision: D4555116

fbshipit-source-id: 8e49ac8353b52b2be82aedf305762478e7fa687a
2017-02-21 22:06:25 -08:00
Alisson Gusatti Azzolini
6ff05fd49d Fix issues pickling jobs
Summary:
We were running into a problem where a Job could not be pickled. It needs to be pickled in order for the master flow operator to execute it using the session.
This introduces the concept of a "compiled" Job, which pretty much only stores protobufs with the Jobs to be executed, avoiding any issues with pickling.

Reviewed By: dzhulgakov

Differential Revision: D4554799

fbshipit-source-id: 2ee9877ca49a796d51925e5ec917436e3d930984
2017-02-21 20:47:27 -08:00
Alisson Gusatti Azzolini
8fa156d082 Improve "reporter net" design
Summary:
Previously we had several limitations for a reporter net:
 - it needed to be a net, not an execution step
 - only one was allowed per execution step, with a single interval

Now, "reporter nets" become reporter steps, and multiple of them can be specified with different timeouts.

Reviewed By: dzhulgakov

Differential Revision: D4583686

fbshipit-source-id: ad7266e16f96e7829fd24dcc1f165f39e9db573d
2017-02-21 20:17:40 -08:00
Peng Yang
26be1977bf fix CrossEntropyOp bug for batch input
Summary: this fixes a bug in the Eigen implementation that calculates cross-entropy

Reviewed By: salexspb

Differential Revision: D4582078

fbshipit-source-id: 4c92047e9dbbe219fcbef618a45c584c2fbfaad5
2017-02-21 17:34:31 -08:00
Bram Wasti
183e158642 Remove Model API (unused)
Summary: Removed Model API because no one {seems to,should} be using it

Reviewed By: Yangqing

Differential Revision: D4575126

fbshipit-source-id: 174d39e9aa46750f1fae8295f7e1e5452559af33
2017-02-21 17:19:05 -08:00
Alisson Gusatti Azzolini
04eccb8ebe Performance counters
Summary:
- Key-value store for counters.
- Counters are updated via macros that also export USTD probes.
- Counter values can be exported using caffe2 operators.
- Snapshot mechanism for tracking time-window counter values.

Reviewed By: dzhulgakov, pietern

Differential Revision: D4553761

fbshipit-source-id: 25a1a91a3168dcff2159c6fba7b357d3fd3aa9bf
2017-02-21 16:31:24 -08:00
Qichao Que
7f4d5e9900 Add feed label parser operator.
Summary: Add feed label parser operator. This layer depends on D4520993.

Reviewed By: kennyhorror

Differential Revision: D4538797

fbshipit-source-id: 8efcd7b2f6962c30023c7464a13c125ba1a99dc4
2017-02-21 14:17:00 -08:00
Alexander Sidorov
ea9f4da368 fix typo in TextFileReader
Summary: as title

Reviewed By: bwasti

Differential Revision: D4591870

fbshipit-source-id: 01912ee75b036335402c7b4a5b147f20a50ce95b
2017-02-21 14:02:48 -08:00
Ahmed Taei
5bc3d2ef03 Add ReduceFront GPU Ops
Summary: Add GPU implementations for the ReduceFront{Sum|Mean} ops

Differential Revision: D4577270

fbshipit-source-id: 697f498531af6b9da4a0138d2a9beb39234f9756
2017-02-17 16:46:42 -08:00
Kittipat Virochsiri
ba7fad53b5 Support for sample softmax
Summary:
This diff adds the ability to train a multiclass classifier on a sampled subset of classes. It basically implements what is described in https://arxiv.org/abs/1412.2007, without the sampling-probability correction; since this implements uniform sampling, the sampling probabilities cancel out in the softmax anyway.

The trick to make this work is to have 2 different nets for prediction and training, both sharing parameters. The model is built normally until the last layer. If sampling is needed, the class sampling works as follows:
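A generic sketch of such uniform class sampling (illustrative only; the commit message does not include the actual sampling code, and the helper name is an assumption):

```python
import numpy as np


def sample_classes(num_classes, num_negatives, positive_ids, rng):
    """Uniformly sample negative classes, keeping the true classes in the subset.

    Because sampling is uniform, the per-class sampling probability is
    constant and cancels out in the softmax normalization.
    """
    pool = np.setdiff1d(np.arange(num_classes), positive_ids)
    negatives = rng.choice(pool, size=num_negatives, replace=False)
    return np.concatenate([positive_ids, negatives])
```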

Reviewed By: xianjiec

Differential Revision: D4512859

fbshipit-source-id: ab537bcac81d5e5877a8795045e8682c8064da68
2017-02-17 09:31:54 -08:00
Xianjie Chen
8949abe10b more clear about supported output dimension
Summary: Do I understand correctly? It must be of size 1 for sigrid

Reviewed By: kennyhorror

Differential Revision: D4576541

fbshipit-source-id: 92fa8dc62e36ff095e14cceeb80b03c0028f5695
2017-02-16 21:01:52 -08:00
Fei Sun
d8b7166251 Move build_ftrl to open source directory
Summary:
Move the open-source version of build_ftrl to the open-source directory.
build_ftrl can use several engines, and the SIMD engine is fb-specific,
so we keep that build_ftrl in the fb/optimizers/sgd.py file.
If the caller only uses the open-source engine, it can import the
open-source build_ftrl; if the caller may use the SIMD engine, it needs
to import the fb-specific build_ftrl.
Also move the tests to the python directory.

Reviewed By: salexspb

Differential Revision: D4560384

fbshipit-source-id: 84fc915d3bbe42fd19503ef132d3277088f6fab3
2017-02-16 18:02:15 -08:00
Xianjie Chen
d0621a2449 NextScopedBlob with well-defined behavior and respect namescope
Summary:
Remove the use of `NextName` in the layer model helper, so that the same function returns a `model_helper` that constructs an identical `Net` when under the same NameScope.

`NextScopedBlob` should only take effect when there is a real name conflict; otherwise it returns a ScopedBlobReference.

This is critical for parameter blobs. In the long run, we need to be able to specify parameter blobs more explicitly (kennyhorror is working on this). This solution works in the short term for, e.g., two-tower sparse nn models.

Reviewed By: kennyhorror

Differential Revision: D4555423

fbshipit-source-id: 2c4b99a61392e5d51aa878f7346466a8f14be187
2017-02-16 17:16:36 -08:00
James Cross
b436788b16 LSTMUnit: pass through H values
Summary:
Pass through the h-value recurrent output unchanged at each LSTM step beyond the valid part of a sequence (computed based on seqLengths, which allows batching sequences of different lengths). This enables using the final-step output of each sequence as the output when one vector is desired for the entire sequence. The gradient is also passed back unchanged.

Also made some cosmetic changes to recurrent_network_test.py (seq_lengths offset corrected; it should be in [1, T] rather than [0, T-1]).
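The pass-through behavior can be sketched with a per-step mask (a generic numpy sketch; the function name and layout are assumptions, not the LSTMUnit code):

```python
import numpy as np


def apply_sequence_mask(h_prev, h_new, t, seq_lengths):
    """Past a sequence's valid length, pass the previous hidden state through.

    t is the current (0-based) timestep; rows whose sequence has already
    ended keep h_prev, so the last valid output survives to the final step.
    """
    valid = (t < np.asarray(seq_lengths))[:, None]   # (batch, 1) mask
    return np.where(valid, h_new, h_prev)
```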

Reviewed By: urikz

Differential Revision: D4540307

fbshipit-source-id: 73a9f6326069d713dcb0cdc8d17869317c6dbe96
2017-02-16 15:31:38 -08:00
Artem Volkhin
0c03c8fca5 Add name_overrides argument to SaveOp
Summary:
In the current implementation of SaveOp we always use the blob names from the
current workspace. But there is a use case for replacing names in a saved model:
for example, to use half-floats in the prediction model but keep full floats for
the training model, we might want to save the blob "w_fp16" as "w".
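The renaming step can be sketched as a simple key remap (an illustrative helper; the name and dict representation are assumptions, not the SaveOp code):

```python
def apply_name_overrides(blobs, name_overrides=None):
    """Return blobs keyed by their save-time names.

    E.g. with name_overrides={'w_fp16': 'w'}, the half-float blob is saved
    under the name the prediction model expects.
    """
    name_overrides = name_overrides or {}
    return {name_overrides.get(name, name): blob
            for name, blob in blobs.items()}
```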

Differential Revision: D4567304

fbshipit-source-id: 87bc84fa6a45d8bfa33edb55ac1fb1cff542dbe3
2017-02-16 12:32:51 -08:00
Steven Strijakov
5429031917 Adding SoftmaxWithLoss operator to Shape Inference
Summary: This diff adds shape inference for the SoftmaxWithLoss Operator

Differential Revision: D4565835

fbshipit-source-id: 1c2db398524c765977ec4d8a22c9b986bf9faf82
2017-02-16 12:32:51 -08:00