Summary: Add synchronous optimization on GPUs to the translation training pipeline via data_parallel_model.Parallelize_GPU, which needs to be updated to support sparse parameter updates (e.g., on embedding tables), whether on GPU or CPU.
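A minimal sketch of how the pipeline would invoke it (the builder functions here are hypothetical placeholders; only the call shape follows data_parallel_model):
```
from caffe2.python import data_parallel_model

# Hypothetical builder functions for the translation model.
data_parallel_model.Parallelize_GPU(
    train_model,                                    # a ModelHelper
    input_builder_fun=add_inputs,                   # per-GPU input readers
    forward_pass_builder_fun=add_forward_pass,      # builds the net, returns losses
    param_update_builder_fun=add_parameter_update,  # per-GPU SGD/Adagrad etc.
    devices=[0, 1, 2, 3],
)
```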
Reviewed By: urikz
Differential Revision: D4631914
fbshipit-source-id: 9cdd655f7dbda3f9b2733d459228b3e097892441
Summary: This adds a nearest neighbor interpolation resizing operator to caffe2. CPU only, NCHW only, no gradients. Also adds torch2caffe support. This is probably not optimal in terms of performance, but it works.
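A minimal usage sketch, assuming the operator is registered as ResizeNearest with float width_scale/height_scale arguments:
```
import numpy as np
from caffe2.python import core, workspace

# NCHW input; CPU only per this diff.
workspace.FeedBlob('X', np.random.rand(1, 3, 8, 8).astype(np.float32))
op = core.CreateOperator(
    'ResizeNearest', ['X'], ['Y'], width_scale=2.0, height_scale=2.0)
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob('Y').shape)  # expected: (1, 3, 16, 16)
```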
Reviewed By: ajtulloch
Differential Revision: D4724244
fbshipit-source-id: b8295061141fb513da84acf91fdfd67264119059
Summary:
1. migrate the basic mtml model to dper 2
2. test dper 2 mtml model
3. test all optimizers
Reviewed By: kittipatv
Differential Revision: D4680215
fbshipit-source-id: 7aac5c59bdac22fcad8ed869b98e9e62dca1d337
Summary: Add a layer that takes a (label, prediction) pair and outputs the L2 loss.
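In math terms, the layer computes the squared error between prediction and label, L(y, y') = sum_i (y'_i - y_i)^2; whether a 1/2 factor or batch averaging is applied is not stated in this diff.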
Reviewed By: kittipatv
Differential Revision: D4702111
fbshipit-source-id: 09f2ede44d1b548e61096de741f1b2aa0b66bbcb
Summary:
It was broken in trunk and I fixed it locally, but then had a
wrong merge in D4672026. This is just a revert of those changes.
Reviewed By: ajtulloch
Differential Revision: D4723138
fbshipit-source-id: 14757d9c8ae5135bd7c084003a64e25efc74b54f
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter
Reviewed By: urikz
Differential Revision: D4702086
fbshipit-source-id: c4c1d8425cd36c1e86695918eaba2667c27e9601
Summary:
/cc akyrola
I basically just copied all the `ShapeCall` stuff as `TypeCall`. Is there a better way?
Closes https://github.com/caffe2/caffe2/pull/187
Differential Revision: D4699312
Pulled By: Yangqing
fbshipit-source-id: 92f736ffe4127b00b5821acb1eb359771975fdd7
Summary: For some embedding tasks, we don't want to include a bias term in the embedding computation.
Reviewed By: xianjiec
Differential Revision: D4689620
fbshipit-source-id: 4168584681d30c0eaa1d17ceaf68edda11924644
Summary:
Make it use Gloo and optionally use Redis for rendezvous (where a
shared filesystem is not available).
Differential Revision: D4709943
fbshipit-source-id: 59cc7a14316c7b634417ea5161a75fab3c19f2fa
Summary:
We have more and more nested Struct schemas. There is an increasing need to get/add a field by nested name, e.g., for the following nested Struct schema:
st = Struct(
    ('a', Scalar()),
    ('b', Struct(
        ('c', Scalar()),
    )),
)
We may want to get the field "b:c" and/or insert a new field "b:x". The immediate need is for dper2 metrics.
This diff is to achieve this.
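A hypothetical usage sketch, assuming ':' is the nested-name separator as in the example above:
```
from caffe2.python import schema

st = schema.Struct(
    ('a', schema.Scalar()),
    ('b', schema.Struct(
        ('c', schema.Scalar()),
    )),
)

# Hypothetical nested get added by this diff:
c_field = st['b:c']
```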
Reviewed By: kittipatv
Differential Revision: D4690225
fbshipit-source-id: 71d4a74b36bd1228a2fefd901db2f200602152b7
Summary: For example, test and train nets could have shared workspaces, leading to a race condition. This adds an assertion and a running counter to the workspace-blob name.
Reviewed By: jhcross
Differential Revision: D4712152
fbshipit-source-id: 808d7069095bac24ebfe0c9d31ebd134f4cf0956
Summary:
No longer need GPU-to-CPU copies. The allreduce operator no longer
uses the 'local allreduce - global allreduce - local broadcast' sequence
when Gloo is used, but passes all input blobs directly.
Depends on D4708860.
Differential Revision: D4709897
fbshipit-source-id: 4d745d5d8bac9c2fcca081dd5d812c902808c3b6
Summary:
This is going to allow experimenting with various training-from-scratch / fine-tuning techniques. The code itself for the new model is not intended to be used as is. Instead, one could train a full precision model first, then add quantization for the last layer, then for the next one, and so on.
In my experiments I took a pretrained model and quantized all inception layers with 4 bits. This restored the original accuracy after several dozen iterations.
Also in this diff I added a common prefix to the model checkpoint and added this prefix to git/hg ignore.
And also some extra logs which are useful for quickly seeing how things changed right after enabling quantization.
Differential Revision: D4672026
fbshipit-source-id: b022c8ccf11dd8a2af1a7b2e92673483bc741a11
Summary:
These are all essentially no-op changes which allow for nose-style (or pytest-style) test discovery.
With this patch, you can use any of these methods to discover and run tests under `caffe2/python`:
```
python -m unittest discover -p '*test*.py' caffe2/python/
python -m nose caffe2/python/
python -m pytest caffe2/python/
```
Future work:
* Get all of the tests to pass
* Some seem to be testing operations which don't have GPU implementations
* I get a segfault unless I set `CUDA_VISIBLE_DEVICES=0`
* Some tests are flaky
* Allow test discovery throughout the whole project (e.g. the `experiments/` dir)
Closes https://github.com/caffe2/caffe2/pull/199
Reviewed By: pietern
Differential Revision: D4704504
Pulled By: Yangqing
fbshipit-source-id: 8f5687ec9c8aa873dfaff30dbf44272bc38a206b
Summary:
First, this diff includes a full test of data-parallel LSTM, which confirms it works correctly. To make it work, some changes had to be made:
- cell net/step net external inputs must be namespace-scoped
- prevent double-namescoping of cell net inputs
- make data parallel model understand recurrent nets so the device mapping works
Reviewed By: salexspb
Differential Revision: D4708840
fbshipit-source-id: 4b0ddc43642d449076a2b6f67ad1c47f84138ff4
Summary: Some operators, e.g., SoftmaxWithLoss, return scalar-typed tensors. This would allow us to use those ops without having to write a layer manually.
Reviewed By: xianjiec, kennyhorror
Differential Revision: D4703982
fbshipit-source-id: f33969971c57fc037c9b44adb37af1caba4084b6
Summary: When cloning a recurrent net op, we do a remapping of the lengths-blobs. But if they don't exist (as with CRF), we should not do that.
Differential Revision: D4702123
fbshipit-source-id: 37a22d11e709011b8b98b2cc3d9f08eb9fda06c4
Summary: These Python helpers are going to provide sufficient bookkeeping when adding quantization for conv layers.
Reviewed By: Yangqing
Differential Revision: D4671478
fbshipit-source-id: 292e2f633dd30969c0afbe7a8075b340ce9a6d12
Summary: UNK needs to be indexed in the vocabulary for validation to work. Default args now result in the training loss decreasing.
Reviewed By: urikz
Differential Revision: D4703393
fbshipit-source-id: e4d6ad100daf8392f8ba1e502f9ecf39bb8ce24a
Summary:
It has been a pain to save predictor-compatible models from Caffe2. This diff adds a function, ExtractPredictorNet, that takes a training model and outputs a predictor model by removing all operators that are not relevant for prediction, such as the backward pass and dequeue ops for input loading (in a predictor, the input data is an external input).
We can also consider including this directly in the predictor exporter for FB usage.
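A rough usage sketch; the exact signature and blob names are assumptions, not confirmed by this diff:
```
from caffe2.python import model_helper

# Strip backward-pass and input-loading operators from the training net,
# keeping only what is needed to go from 'data' to 'softmax'.
predict_net = model_helper.ExtractPredictorNet(
    net_proto=train_model.net.Proto(),
    input_blobs=['data'],
    output_blobs=['softmax'],
)
```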
Reviewed By: rpenggithub
Differential Revision: D4693264
fbshipit-source-id: e81abbbec0bd4d717159cf36488d0baaf0130090
Summary:
Implement ReduceBackSum & ReduceBackMean with gradients for CPU & GPU contexts.
The reduction happens over the last dimensions; for example, if the input is an
M x N matrix, ReduceBackSum will produce a vector of dim M x 1 containing the
rowwise sums.
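A minimal sketch of the new op on CPU:
```
import numpy as np
from caffe2.python import core, workspace

X = np.random.rand(4, 5).astype(np.float32)  # M x N
workspace.FeedBlob('X', X)
workspace.RunOperatorOnce(
    core.CreateOperator('ReduceBackSum', ['X'], ['Y']))
Y = workspace.FetchBlob('Y')  # rowwise sums; should match X.sum(axis=1)
```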
Differential Revision: D4689768
fbshipit-source-id: 5b0482d4341867ecf23526dc6c4d544420e7d8f7
Summary: Add shape inference for Reshape. Because the output shape cannot be inferred when the new shape comes from runtime tensor data, set `out[0].set_unknown_shape(true)` if no `shape` argument is used.
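The two cases, sketched with the Python op interface (blob names are illustrative):
```
from caffe2.python import core

# Static shape argument: the output shape can be inferred.
op_static = core.CreateOperator(
    'Reshape', ['X'], ['Y', 'old_shape'], shape=[2, 6])

# Shape supplied as a runtime input: out[0] gets marked unknown.
op_runtime = core.CreateOperator(
    'Reshape', ['X', 'new_shape'], ['Y', 'old_shape'])
```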
Differential Revision: D4671125
fbshipit-source-id: 685a9198f9b08e3336014c792f20051b381d8619
Summary: We should be using the vocabulary built on the training data, and corpus_eval as the data for the evaluation phase.
Reviewed By: urikz
Differential Revision: D4700382
fbshipit-source-id: ca1dd043a28f9bb585faad050c82fb12c1cdf6cc
Summary: Fixed a bug (AttributeError: ModelTrainerLog instance has no attribute 'external_loggers', at File "caffe2/python/experiment_util.py", line 101) when no external_loggers is passed to ModelTrainerLog().
Differential Revision: D4697197
fbshipit-source-id: 1c770c366d87ea474bcf40ab289b67c76648d48b
Summary:
Otherwise the blob will be in a different namescope, e.g., `_nested`: https://fburl.com/ntlsaezv.
This makes TensorBoard ugly.
Reviewed By: dzhulgakov
Differential Revision: D4696946
fbshipit-source-id: 73627feccd7c4896964e6c549b7241bcce4f49a7
Summary:
TSIA
This change also fixes an undefined attribute error after running 20
iterations of the resnet50 example trainer.
Differential Revision: D4692794
fbshipit-source-id: b98efdfeb078c5ba89d2a86837f3c672e1eade5f
Summary: A lot of people get confused if the file can't be loaded.
Reviewed By: rpenggithub
Differential Revision: D4686572
fbshipit-source-id: 519ff68a3d4f04cf8ce893f255f7814e043383b6
Summary: We ran InferToDeviceMapping too early; we should have done it also after running the parameter update function, since that can create new blobs, like the momentum blobs. This fix is maybe not optimal, but it works and is fast enough.
Differential Revision: D4693450
fbshipit-source-id: 4c4cc2396dad371b3fbcd1d8da51133ea09a57e0
Summary:
Before, we didn't propagate the 'out-of-data' signal if splits_per_epoch wasn't specified.
Right now it's a hacky fix (just reusing ReaderWithLimit). azzolini - any suggestions for a more elegant solution? I could create an extra reader that just exports an 'is empty' signal.
Overall, I guess we need to turn global_queue into a more sustainable unit test that verifies all possible combinations - I'm still not sure it's correct :-\
Reviewed By: xianjiec
Differential Revision: D4665677
fbshipit-source-id: fe44d10ee82c3383145635e67dea1d9b666e061f
Summary: When debugging using LayerModelHelper, adding Print to the model will trigger this assert.
Reviewed By: xianjiec
Differential Revision: D4687859
fbshipit-source-id: 6932e38f8dd17ba0b80da18a20943ecdb2e8af0a
Summary: Thanks to shenpan for detecting this bug. The problem is that FinalizeAfterCheckpoint() can be passed a list of strings, not blob references, which fails in stripParam() after the assertion I added in D4649208. It is OK to pass strings to that function as well.
Reviewed By: jhcross
Differential Revision: D4691028
fbshipit-source-id: 0bca80d44a5ab641438cc5b26482bca0b1527d69
Summary: Following krp's suggestion, check if the shape parameter is empty.
Reviewed By: dzhulgakov
Differential Revision: D4686698
fbshipit-source-id: 3f9fb1e3215dd2a4a726442531201eeb18224bc6
Summary:
Created a new function with the specifics of the MI LSTM implementation in Caffe2.
See https://arxiv.org/pdf/1606.06630.pdf for details.
See D4478877 for the implementation of the same in tensorflow
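For reference, per the paper, multiplicative integration replaces each gate's additive pre-activation $W x_t + U h_{t-1} + b$ with
$\alpha \odot (W x_t) \odot (U h_{t-1}) + \beta_1 \odot (U h_{t-1}) + \beta_2 \odot (W x_t) + b$,
where $\alpha$, $\beta_1$, $\beta_2$ are learned vectors and $\odot$ denotes elementwise multiplication.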
Reviewed By: jhcross
Differential Revision: D4669882
fbshipit-source-id: 095bbcf187dbdac2cd79558ff0c8f9f67d8af639
Summary:
OSS implementation of the seq2seq model in Caffe2. The script uses the Seq2SeqModelCaffe2 class to build and run the model. It takes training data in the form of a text file with one sentence per line, builds a vocabulary, generates batches based on batch size, and runs the net for a configurable number of epochs. It prints the total scalar loss at the end of each epoch.
All FBLearner and neural_mt type-system dependencies have been removed. Unimplemented and unnecessary methods have been removed to make the script simpler.
fblearner/flow/projects/langtech/translation/neural_mt/model_util_caffe2.py has been moved to caffe2/caffe2/python/examples/seq2seq_util.py and remains unchanged.
Potential TODOs:
- Get the model running on GPU. Only GatherOp does not have a corresponding GPU implementation. Try adding CopyGPUToCPU before and CopyCPUToGPU after Gather, and use a CUDA DeviceOption.
- Add evaluation on test data with a suitable metric (perplexity? BLEU?)
Reviewed By: urikz
Differential Revision: D4653333
fbshipit-source-id: 1c7d970ebc86afe23fad4d48854296bf54eb0f77
Summary: ReversePackedSegs operator for CUDA. The "lengths" input (static integers) is required to be in CPU memory.
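A minimal sketch on CPU; the (T, N, D) data layout is an assumption based on the usual Caffe2 RNN convention:
```
import numpy as np
from caffe2.python import core, workspace

# data: (T, N, D); lengths: one entry per batch element, kept on CPU.
workspace.FeedBlob('data', np.random.rand(5, 2, 3).astype(np.float32))
workspace.FeedBlob('lengths', np.array([5, 3], dtype=np.int32))
workspace.RunOperatorOnce(core.CreateOperator(
    'ReversePackedSegs', ['data', 'lengths'], ['reversed']))
```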
Differential Revision: D4661281
fbshipit-source-id: c800c316c34015ba8e732dcbcaa8c4edaffdfeab
Summary:
Data parallel model did not support sparse operations, nor gradients computed on CPU ops.
Currently sparse operations are done on CPU, so there is no point in "data parallelizing" them. I had to make a few changes to data_parallel_model to support this:
1. Model can have params that are added prior to adding the data parallel part. For example, a lookup table of word vectors would be a parameter that is non-parallel.
2. Thus, when data parallel model is called, it will separate the non-parallel params and avoid working on them. Note: when we add a distributed version, we need to explicitly handle them with AllGather!
This works nicely since Caffe2 automatically adds the backward concat-operator when multiple ops gather from the same blob.
I also added support for data parallel CPU ops, which might be necessary in cases where we don't have a GPU implementation of some ops.
The test in data_parallel_model_test validates the correctness of the code by running the same trainer on different numbers of GPUs and checking that the end result is the same.
Reviewed By: jhcross
Differential Revision: D4649208
fbshipit-source-id: e3b7ae701ead468dc94c52a976eafec5c9831097
Summary: This diff is getting rid of old metrics interface in realtime training.
Reviewed By: xianjiec
Differential Revision: D4649734
fbshipit-source-id: de4af85eb5476df9790ebd3915625bf8beee65af
Summary: The sum processor and sqrt pooling mimic the DoubleHelix model.
Differential Revision: D4678413
fbshipit-source-id: fc1ccfe3c92c540ce5914dfd8ff1a040805c48db
Summary: AccumulateHistogramOp, for computing the histogram of all values in input tensors
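A hypothetical invocation; the argument and output names here (lower_bound, upper_bound, num_buckets, cur_hist, acc_hist) are assumptions, not confirmed by this diff:
```
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob('X', np.random.randn(1000).astype(np.float32))
# Bucket the values of X into a fixed-range histogram, accumulating across runs.
workspace.RunOperatorOnce(core.CreateOperator(
    'AccumulateHistogram', ['X'], ['cur_hist', 'acc_hist'],
    lower_bound=-3.0, upper_bound=3.0, num_buckets=64))
```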
Differential Revision: D4654417
fbshipit-source-id: dea92346004c772af16e1eb41306287d81dc5a02
Summary: Take user inputs for the introspection visualization: convolution output-layer activations, filters selected by containing phrases, and the number of samples.
Reviewed By: Mortimerp9
Differential Revision: D4603797
fbshipit-source-id: dc972dcb8ad36e30defab266d710e047b11cff73
Summary: Super rough implementation of recurrent attention. Planning to factor out the common code between the two functions, as well as between train and eval. I want to get this out and get eyes on it sooner rather than later.
Differential Revision: D4647837
fbshipit-source-id: 54bc4e8ed0df6f04c86c425926decbe89f73b068
Summary: In the case of a distributed task, load_from_db() loads into the wrong workspace when used inside a Python op. Pass the workspace to use explicitly, so that it loads into the one the Python op is being run in.
Reviewed By: kennyhorror
Differential Revision: D4653692
fbshipit-source-id: 94585c012b05ee38b9ce5e8ef0efdd50aa41dd2b
Summary: The evaluation part of the two-tower workflow is missing. This diff completes it. Some of the newly added functions can be used for other workflows, e.g., feed. As the eval parts of different workflows will overlap, a generic eval workflow will be added in a separate diff.
Reviewed By: kennyhorror
Differential Revision: D4646880
fbshipit-source-id: 4d6eb35df10f6f613533d442f2a04dc0332386f8