Commit Graph

725 Commits

Aapo Kyrola
acb2ad12e5 fix race condition at terminate
Summary:
Looking at one segfault at exit (https://our.intern.facebook.com/intern/chronos/jobinstance/?jobinstanceid=911625597&smc=chronos_gp_admin_client&log_type=stderr&offset=0&pretty_logs=false) and its coredump, the only thing I can see is that a FreeBlob() operator is called concurrently while a cudaMemcpyAsync (on thread 1) is crashing. FreeBlobOp is only called at data_workers _stop() (via utils.ResetBlobs()), and the only code that could run a cudaMemcpyAsync at that time is the fetcher thread of data_workers that is enqueuing blobs.

Here are the stacks: P57455299

This is clearly a bug since we should only clear the scratch blobs after all threads are terminated, which happens at wait_for_finish().

I am not 100% sure this fixes all the segfaults, but at least this one was most likely caused by this.
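A minimal sketch of the ordering this diff enforces (the function and variable names here are hypothetical; only _stop(), wait_for_finish() and utils.ResetBlobs() come from the summary above):
```python
# Hypothetical sketch: free the scratch blobs only after every worker
# thread has finished, not inside _stop() while a fetcher thread may
# still be issuing a cudaMemcpyAsync into one of them.
from caffe2.python import utils

def shutdown(coordinator, scratch_blobs):
    coordinator._stop()              # signal fetcher threads to stop enqueuing
    coordinator.wait_for_finish()    # block until all worker threads have exited
    utils.ResetBlobs(scratch_blobs)  # only now is it safe to free the blobs
```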

Reviewed By: andrewwdye

Differential Revision: D5146278

fbshipit-source-id: ae00796706bfc4fee6823caf6529b62ab20c1cd3
2017-05-30 13:47:10 -07:00
Aapo Kyrola
cdb50fbf2b add optimizer support to data_parallel_model; Use MomentumSGDUpdate
Summary:
This diff does two things:
- adds support for optimizers to data_parallel_model. The user can supply optimizer_builder_fun instead of param_update_builder_fun. The latter is called for each GPU separately with the proper namescope and devicescope, while the optimizer builder is called only once and adds optimizers to the whole model.

- uses MomentumSGDUpdate instead of MomentumSGD + WeightedSum. This brings major perf benefits.

Changes resnet50 trainer to use optimizer.
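A minimal usage sketch of the interface described above, under the assumption that Parallelize_GPU and optimizer.build_sgd keep their usual signatures (the input/forward builder helpers are hypothetical):
```python
# Sketch only: optimizer_builder_fun is called once for the whole model,
# unlike param_update_builder_fun, which is called once per GPU.
from caffe2.python import data_parallel_model, optimizer

def add_optimizer(model):
    # assumed call; builds a single SGD-with-momentum optimizer for all params
    return optimizer.build_sgd(model, base_learning_rate=0.1,
                               momentum=0.9, policy="fixed")

data_parallel_model.Parallelize_GPU(
    model,
    input_builder_fun=add_inputs,          # hypothetical helper
    forward_pass_builder_fun=create_net,   # hypothetical helper
    optimizer_builder_fun=add_optimizer,   # replaces param_update_builder_fun
    devices=range(num_gpus),
)
```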

This relies on D5133652

Reviewed By: dzhulgakov

Differential Revision: D5142973

fbshipit-source-id: 98e1114f5fae6c657314b3296841ae2dad0dc0e2
2017-05-30 12:49:57 -07:00
Luke Yeager
0a9684c3b9 Mark in-place GPU dropout as broken, add test
Summary:
I'll let y'all decide how you want to fix this (probably need a persistent curand buffer). Here's a test to verify the fix.
Closes https://github.com/caffe2/caffe2/pull/495

Differential Revision: D5148815

Pulled By: akyrola

fbshipit-source-id: e80dabe65230ddd32340f2d872cd8786ac960bf8
2017-05-30 12:35:22 -07:00
Aapo Kyrola
44257ea5ed automatically infer device scope for param
Summary:
hankun is using the optimizer, but has a mixed set of GPU and CPU operators. Currently this won't work with the optimizer, since it adds optimizers for all parameters in the current device scope. But we can actually infer the device that a param belongs to by looking at the device option in the param_init_net.

Added a test as well.
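A minimal sketch of the inference described above (the function name is hypothetical; it assumes the op in param_init_net that produces the parameter carries the right device option):
```python
# Sketch only: find the op in param_init_net that produces the parameter
# and reuse its device option instead of the current device scope.
def infer_param_device(param_init_net, param_name):
    for op in param_init_net.Proto().op:
        if param_name in op.output:
            return op.device_option  # DeviceOption proto (CPU or CUDA)
    return None  # not found; fall back to the current device scope
```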

Reviewed By: salexspb

Differential Revision: D5133652

fbshipit-source-id: ad8689d75ac1f5c78981bae1b6978fe91e40ef0f
2017-05-30 12:02:19 -07:00
Luke Yeager
6b1cf26380 Fix for dpm when GPUs don't have p2p access
Summary:
See discussion at https://github.com/caffe2/caffe2/pull/633#issuecomment-303536902

Tested with a TitanX (Pascal) and a TitanZ (Kepler) with this access pattern.
```
Checking GPU(s) for support of peer to peer memory access...
> Peer access from TITAN X (Pascal) (GPU0) -> GeForce GTX TITAN Z (GPU1) : No
> Peer access from TITAN X (Pascal) (GPU0) -> GeForce GTX TITAN Z (GPU2) : No
> Peer access from GeForce GTX TITAN Z (GPU1) -> TITAN X (Pascal) (GPU0) : No
> Peer access from GeForce GTX TITAN Z (GPU1) -> GeForce GTX TITAN Z (GPU2) : Yes
> Peer access from GeForce GTX TITAN Z (GPU2) -> TITAN X (Pascal) (GPU0) : No
> Peer access from GeForce GTX TITAN Z (GPU2) -> GeForce GTX TITAN Z (GPU1) : Yes
```
All combinations pass:
* `0,1`
* `0,2`
* `1,2`
* `0,1,2`
Closes https://github.com/caffe2/caffe2/pull/659

Differential Revision: D5148779

Pulled By: akyrola

fbshipit-source-id: 6263edfe8b36623983f1946b5c3f4a3fef415a45
2017-05-30 12:02:19 -07:00
Luke Yeager
a47652379f Fix SparseAdagrad for indices.ndim>1
Summary:
Same fix as https://github.com/caffe2/caffe2/pull/249, but for SparseAdagrad.

Also update the tests for both ops to test this functionality.
Closes https://github.com/caffe2/caffe2/pull/675

Differential Revision: D5148750

Pulled By: akyrola

fbshipit-source-id: d30b722429bc547fd53400c1a29e4ee9e2e6ed18
2017-05-30 12:02:18 -07:00
Luke Yeager
16b240145a Fixing some tests
Summary:
As dzhulgakov said at https://github.com/caffe2/caffe2/pull/227#issuecomment-295084443, it would be nice to avoid this stream of CPU-only test fixes.

The second fix could have been avoided if tests were run on TravisCI. I think the TravisCI infra could be greatly improved if we used ccache like your colleagues at PyTorch: https://github.com/pytorch/pytorch/pull/614. Would you be interested in a PR which does this?
Closes https://github.com/caffe2/caffe2/pull/547

Differential Revision: D5147405

Pulled By: akyrola

fbshipit-source-id: 5e9a4571d364c5f0ed8a5e216c9b6136dd4d10be
2017-05-30 09:16:48 -07:00
Luke Yeager
dc517b6c42 Change hypothesis settings for slow memonger test
Summary:
Failure mode:
```
  - 7 passing examples, 0 failing examples, 0 invalid examples
  - Typical runtimes: 12-14987 ms
  - Stopped because settings.timeout=60
```
After this change:
```
  - 5 passing examples, 0 failing examples, 0 invalid examples
  - Typical runtimes: 12-15475 ms
  - Stopped because settings.max_examples=5
```
Obviously, the `DYNAMIC_PROGRAMMING` tests are the troublemakers. An alternate solution would be to make separate tests for the two assignment algorithms (one fast, one slow).
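A minimal sketch of the kind of settings change described above (the test name and strategy are hypothetical; hypothesis' @settings decorator is assumed to accept max_examples):
```python
# Sketch only: cap the number of generated examples so the slow
# DYNAMIC_PROGRAMMING cases stop hitting the hypothesis timeout.
from hypothesis import given, settings
import hypothesis.strategies as st

@given(algo=st.sampled_from(["GREEDY", "DYNAMIC_PROGRAMMING"]))
@settings(max_examples=5)
def test_memonger_assignment(algo):
    ...  # run memonger with the chosen assignment algorithm
```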
Closes https://github.com/caffe2/caffe2/pull/676

Differential Revision: D5147363

Pulled By: akyrola

fbshipit-source-id: 85d9f8198e53c10de2a8d6645e2b0eb7953c96e0
2017-05-30 09:16:48 -07:00
Simon Layton
2c3071fc4e Rework initializers to pass a class not object
Summary:
Changed tests
Moved to WeightInitializer, BiasInitializer keywords
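A minimal sketch of how the new keywords described above might look in a brew call (the brew.fc target and the Initializer import path are assumptions based on this summary):
```python
# Sketch only: pass the initializer *class* via the new keywords and let
# the layer construct it, instead of handing over a pre-built object.
from caffe2.python import brew
from caffe2.python.modeling.initializers import Initializer

fc1 = brew.fc(
    model, "data", "fc1", dim_in=256, dim_out=128,  # model: an existing ModelHelper
    WeightInitializer=Initializer,  # class, not an instance
    BiasInitializer=Initializer,
)
```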
Closes https://github.com/caffe2/caffe2/pull/682

Reviewed By: Yangqing

Differential Revision: D5138769

Pulled By: salexspb

fbshipit-source-id: 81d266100b2a95c64c0196c16670dfd34ea03e02
2017-05-30 09:06:56 -07:00
Huazhong Ning
660dd58022 fix for realtime training.
Reviewed By: kennyhorror

Differential Revision: D5068298

fbshipit-source-id: 0dc3580c9c8123368a3625fb654c6eaf1dc4a950
2017-05-26 23:49:40 -07:00
Jiyan Yang
6aff754dbc Add batch normalization layer
Summary: As desc.

Reviewed By: xianjiec

Differential Revision: D5077230

fbshipit-source-id: f73cdedac6d9a3542f8ef829b54fb4c713dcafd0
2017-05-26 16:46:52 -07:00
Thomas Dudziak
ec19b4bd7b Import fixes for Python 3
Summary: As title

Differential Revision: D5135990

fbshipit-source-id: 88cb15bb2fb97dd21faf3ea5ddb8d4dbff7fad93
2017-05-26 16:31:50 -07:00
Thomas Dudziak
3ccbf23132 String-related fixes for Python 3
Summary: This diff is one step towards enabling the Python 3 build by making the code more diligent in its handling of strings.

Reviewed By: salexspb

Differential Revision: D4893083

fbshipit-source-id: 28b8adf3280e8d1f0a7dc9b0fee5ad53f2fada57
2017-05-26 16:04:32 -07:00
Anmol Kalia
7f98dc28cb Refactored spatial softmax
Summary: Refactored SoftmaxWithLoss by removing the code for spatial=1 mode and created a new op SpatialSoftmaxWithLoss that has the spatial mode implemented.

Reviewed By: viswanathgs

Differential Revision: D5104120

fbshipit-source-id: 8ab999e32c916b2a39a670a7b2a3365401535f24
2017-05-26 14:50:43 -07:00
Ahmed Taei
75a6f909c5 Add option to enable memonger for gradients and add param_names for save_model.
Reviewed By: akyrola

Differential Revision: D5131493

fbshipit-source-id: 7c159ccffa30eb064c157e559f1d8f0350f03ccb
2017-05-26 11:31:35 -07:00
Dmytro Dzhulgakov
35eaf444c0 Quickly hack sparsenn_benchmarks to also do BenchmarkNet
Summary:
Makes benchmark a bit hacky, but it's a benchmark after all :)

Specifically, ports the functionality of a proper BenchmarkNet run from ads_benchmarks so that we can see training net perf.

Also adds a --report_interval parameter to print stats more often when running in hogwild mode.

kdub0 - hopefully if you have time you can integrate it properly with Flow's workflow.

harouwu - shouldn't conflict too much with your current diff.

Reviewed By: rayleichen

Differential Revision: D5125183

fbshipit-source-id: 9c6f1663bc85e26d6609f0f2f23aa280731939db
2017-05-26 10:48:45 -07:00
Aapo Kyrola
d60a2e3c58 UnsortedSegmentSum/Mean for CUDA
Summary:
To make optimizer for sparse gradients work with CUDA, we need UnsortedSegmentSum and Mean implemented for CUDA. Unique was already implemented by harouwu.

Pretty straightforward implementations; they should be fast enough -- and I don't know a faster way anyway.

Added some tests as well.
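A small usage sketch of the op semantics (CPU shown; the example values are made up): each row of DATA is summed into OUTPUT[SEGMENT_IDS[i]], and segments with no entries come out as zeros.
```python
# Sketch only: UnsortedSegmentSum sums data rows by segment id.
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("data", np.array([[1., 2.], [3., 4.], [5., 6.]], dtype=np.float32))
workspace.FeedBlob("segment_ids", np.array([0, 2, 0], dtype=np.int32))
workspace.RunOperatorOnce(
    core.CreateOperator("UnsortedSegmentSum", ["data", "segment_ids"], ["out"]))
print(workspace.FetchBlob("out"))  # [[6. 8.] [0. 0.] [3. 4.]]
```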

Reviewed By: asaadaldien

Differential Revision: D5124548

fbshipit-source-id: 63ae72f45fc2f07470603f7b2de12f34635dbb3d
2017-05-26 09:33:49 -07:00
Luke Yeager
97159810c9 Restore compatibility with protobuf2
Summary:
Addresses an issue with 417f74509e.
```
>               operators.append(proto.op.pop())
E               AttributeError: 'RepeatedCompositeFieldContainer' object has no attribute 'pop'
```
/cc jhcross
Closes https://github.com/caffe2/caffe2/pull/658
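A minimal sketch of a protobuf2-compatible replacement for the failing call (the proto/operators names are taken from the traceback above; the exact fix in the diff may differ):
```python
# Sketch only: protobuf2's repeated-field container has no .pop(), so take
# the last element and delete it explicitly instead.
last_op = proto.op[-1]
del proto.op[-1]
operators.append(last_op)
```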

Reviewed By: dzhulgakov

Differential Revision: D5130382

Pulled By: salexspb

fbshipit-source-id: 34e0c39aad5f339c1aaa1506af3e7495193565f4
2017-05-26 08:47:24 -07:00
Alexander Sidorov
016f72537a ModelHelper.create_param, Initializer abstraction and ParameterInfo for optimizers
Summary:
This is going to unblock Nvidia in their work on adding fp16
support to Caffe2. I discussed this with kennyhorror before to make
sure this fits into his work on parameter sharing.
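A minimal usage sketch of the abstraction described above, assuming the ModelHelper.create_param / Initializer interface takes roughly this shape:
```python
# Sketch only: parameters are created through create_param with an
# Initializer, so fp16 (or other) initialization can be swapped in later.
from caffe2.python.model_helper import ModelHelper
from caffe2.python.modeling.initializers import Initializer

model = ModelHelper(name="example")
fc_w = model.create_param(
    param_name="fc_w",
    shape=[128, 256],
    initializer=Initializer("XavierFill"),  # assumed constructor argument
)
```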

Reviewed By: kennyhorror

Differential Revision: D5127797

fbshipit-source-id: 4db155d320b1862570c23b77c4252bdacbf2296f
2017-05-25 22:03:15 -07:00
Andrey Malevich
6c12df3003 Fix export of SparseToDense layer.
Summary:
If there are two SparseToDense layers densifying the same IdList feature, we might export an invalid input for prediction in the input specs. This diff changes the behavior to use an Alias to a new blob instead of passing things through directly.

Reviewed By: dzhulgakov

Differential Revision: D5093754

fbshipit-source-id: ef4fa4ac3722331d6e72716bd0c6363b3a629cf7
2017-05-25 21:46:28 -07:00
Jiyan Yang
9bf1f16255 Add bias to cosine distance for two tower models
Summary: Currently, using two-tower models with cosine distance results in bad calibration. Adding a bias to the output of the cosine term solves the problem.

Reviewed By: xianjiec

Differential Revision: D5132606

fbshipit-source-id: eb4fa75acf908db89954eeee67627b4a00572f61
2017-05-25 19:50:20 -07:00
Zhicheng Yan
2002018603 memory_leak_data_worker
Summary: A memory leak happens when new BlobReferences are constantly added to the set _scratch_blobs.

Reviewed By: panshen1

Differential Revision: D5134945

fbshipit-source-id: 3ce4d482153bb89de065f20cd91411178085caad
2017-05-25 19:22:03 -07:00
Pieter Noordhuis
a9b5efe3c2 Expose max collective concurrency
Summary:
This was hardcoded at 4 before but should be made
configurable. Can be kept low for big MLPs and higher for convnets.

Reviewed By: akyrola

Differential Revision: D5126138

fbshipit-source-id: 713ee8bbeb243b7de1479808fd6398d397e0b49a
2017-05-25 13:32:40 -07:00
Mohamed Fawzy
e35a4fe5cc Implement SizeOp as requested in github issue#583
Summary:
Implement SizeOp, which returns the number of elements in the input tensor.

The output is a 1D tensor that contains the number of elements.
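A small usage sketch of the op (example values made up; the op name follows the summary):
```python
# Sketch only: Size returns the total element count of its input tensor.
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("x", np.zeros((2, 3, 4), dtype=np.float32))
workspace.RunOperatorOnce(core.CreateOperator("Size", ["x"], ["num_elements"]))
print(workspace.FetchBlob("num_elements"))  # 24
```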

Reviewed By: akyrola

Differential Revision: D5101061

fbshipit-source-id: d1c56053b6f3b41c65ac574dd748482775d1ea0d
2017-05-25 11:07:35 -07:00
Artem Volkhin
55d293f730 remove non-existing blobs from output_schema in layer_model_instantiator
Summary: In some cases (for example, when the include_tags option is used) output_schema contains blobs that aren't produced by the generated net. In that case we want to filter them out of output_schema as well.

Differential Revision: D5120115

fbshipit-source-id: f98ea3f747589390b039d1e1987becec3980634c
2017-05-25 00:36:19 -07:00
Aapo Kyrola
da6b82b810 fix another bug related to in-place ops --> treat in-place ops like any other
Summary:
D5116828 changed how in-place ops were handled in memonger and fixed a crash in NeuralMT. However, it still produced incorrect memongerization, because an op with one in-place input/output but another non-in-place output would still be handled incorrectly, as the other output's branch would not be followed properly.

This is fixed by removing the in-place op special handling altogether. It is not needed anymore; it was left over from an older version of memonger that used a topological sort of the ops.

Reviewed By: asaadaldien

Differential Revision: D5128142

fbshipit-source-id: b551b0faebdde410e6bd7516958c63cf610cc065
2017-05-24 23:32:03 -07:00
Deepak Gopinath
33c40e8a6e Handling shared indices in sparse gradient updates
Summary: When two or more blobs are gathered by the same indices blob in a data parallel model, we used to concatenate multiple times and re-write to the same indices blob. This leads to illegal memory access at times because the gradientslice indices blob is longer than its corresponding gradientslice values blob. This diff adds a check in order to avoid this.

Reviewed By: akyrola

Differential Revision: D5116817

fbshipit-source-id: 1c086d092eb6d48926d600f9408f578f5ddc41c7
2017-05-24 22:47:00 -07:00
Aapo Kyrola
f2303ccb77 fix tileop test
Summary: The gradient test for the tile op was flaky because I had made the dimensions too large. This caused push-blocking errors. Also, I noticed my test_grad_tile was incorrect.

Reviewed By: asaadaldien

Differential Revision: D5126476

fbshipit-source-id: ae9ce5d9041648d7a4535fc88d4013e669bd6f02
2017-05-24 18:32:01 -07:00
Bram Ton
4da076d3e9 Fixed typo caffe_translator.py, fixes bug #397
Summary:
Fixed minor typo in python/caffe_translator.py. Fixes #397.
Closes https://github.com/caffe2/caffe2/pull/412

Differential Revision: D4950875

Pulled By: aaronmarkham

fbshipit-source-id: 07183c6d6e8e97451bb5ee5ff01a88553d6bdb82
2017-05-24 12:18:32 -07:00
Bor-Yiing Su
c79ce5c2ba Profiles pipe stages.
Summary: Adds timers to collect the average runtime for each pipe stage.

Reviewed By: azzolini

Differential Revision: D5083958

fbshipit-source-id: 42536bd70c80c2453d98d872286525388f6164c3
2017-05-24 12:02:03 -07:00
Viswanath Sivakumar
152d439400 Allow specifying net type in predictor_exporter
Summary:
predictor_exporter copies the original predict_net's op, external_input and
external_output fields, but ignores the type field. This is reasonable as the
train net would generally have 'dag' type and copying that for inference may
not be applicable. It's good to have a way to specify the net type nevertheless
to run DAGNet for inference. This diff adds a field in predictor_exporter to do
that.
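A minimal sketch of the option described above, assuming it is exposed as a net_type field on PredictorExportMeta (the other arguments are the usual ones):
```python
# Sketch only: request a 'dag' net type for the exported predict net.
from caffe2.python.predictor import predictor_exporter

export_meta = predictor_exporter.PredictorExportMeta(
    predict_net=model.net,
    parameters=model.params,
    inputs=["data"],
    outputs=["softmax"],
    net_type="dag",  # assumed name of the new field
)
```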

Reviewed By: akyrola

Differential Revision: D5122354

fbshipit-source-id: 0e3cc417128db903c71515135c9e3b87620ae21e
2017-05-24 11:46:27 -07:00
James Cross
03503140fd DropoutCell as wrapper for another RNNCell
Summary: Added a new RNNCell, DropoutCell, which wraps an existing RNNCell and applies dropout to its primary output (as defined by get_output_state_index()).
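A minimal usage sketch of the wrapper described above (constructor arguments are assumptions; only the wrapping behavior comes from the summary):
```python
# Sketch only: DropoutCell wraps another RNNCell and applies dropout to
# that cell's primary output.
from caffe2.python import rnn_cell

lstm = rnn_cell.LSTMCell(input_size=64, hidden_size=128,
                         forget_bias=0.0, memory_optimization=False,
                         name="lstm")
cell = rnn_cell.DropoutCell(lstm, dropout_ratio=0.5, name="lstm_dropout")
```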

Reviewed By: salexspb

Differential Revision: D5084871

fbshipit-source-id: 60474af84e5757a12e7fdc3814840dc9ba8e32a1
2017-05-24 11:36:45 -07:00
Bram Wasti
c55be38e63 Added mobile exporter
Summary: Basically takes in a live net and creates an init_net and predict_net which can be written to file and run in Predictor
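A minimal usage sketch, assuming the exporter is exposed as mobile_exporter.Export taking the workspace, the live net, and the model parameters:
```python
# Sketch only: export a live net into an init_net + predict_net pair that
# can be serialized to file and loaded into Predictor.
from caffe2.python import workspace
from caffe2.python.predictor import mobile_exporter

init_net, predict_net = mobile_exporter.Export(workspace, model.net, model.params)
with open("init_net.pb", "wb") as f:
    f.write(init_net.SerializeToString())
with open("predict_net.pb", "wb") as f:
    f.write(predict_net.SerializeToString())
```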

Reviewed By: salexspb

Differential Revision: D4989425

fbshipit-source-id: 8052065da9ed763d48bd9e1e19f7697ef60a2829
2017-05-24 11:36:44 -07:00
James Cross
c39f6cf2d0 gradient accumulation fix
Summary: As noted by salexspb, MultiRNNCell had unreliable gradient computation. The problem was that the recurrent gradient and the gradient computed within the backward step net were not being accumulated during the backward pass, but rather written to the same blob, thus overwriting each other. This diff fixes that by artificially introducing an extra blob for the internal output, and then accumulating it into the gradient coming from the recurrent connection.

Reviewed By: salexspb

Differential Revision: D5110059

fbshipit-source-id: 16add50989fe8866361bbc21afce5f214c5292fd
2017-05-24 10:33:32 -07:00
Aapo Kyrola
6c511f64cc fix handling of ops with in-place input/output
Summary: Memonger ignores ops with an in-place input and output, but did not work correctly if there were also non-in-place inputs, as with Mul. Simple fix to also look at in-placeness during the traversal.

Reviewed By: jhcross

Differential Revision: D5116828

fbshipit-source-id: 52817f1221597986cc09cc65d094417c1923d965
2017-05-23 18:23:33 -07:00
Andrey Malevich
64e04e78d2 Remove AddOperator from ModelHelper
Summary:
It looks like AddOperator was never really used (searched across the whole code-base). In addition, all model_helper functionality is getting replaced with Brew, so I'd prefer to remove this method to reduce the amount of code touching model.params.

Reviewed By: rayleichen

Differential Revision: D5110425

fbshipit-source-id: f2a88e4c1ce5149d27e809e03da9a86c0867bc4d
2017-05-23 13:34:45 -07:00
Aapo Kyrola
2b11adb414 TileOp CUDA fix: number of threads must be hard coded
Summary:
I had "optimized" the number of threads / block, but cub::BlockReduce has a static template parameter for the number of threads, and this must match. Probably tests still passed because typically the initial numbers are zeros.

Also added a stronger test.

Thanks ves for the report.

Differential Revision: D5110901

fbshipit-source-id: c1169b1286e204c202b0727448ddb51b4965eacb
2017-05-23 09:32:19 -07:00
Aapo Kyrola
74e964ff0d make data_workers restartable
Summary: Add the ability to restart data workers' data input.

Reviewed By: andrewwdye

Differential Revision: D5108666

fbshipit-source-id: f7f71cd6d4d45d007067814a552fc93cbe3eca42
2017-05-23 01:18:44 -07:00
Simon Layton
193c9289f0 Fix LRN schema for cuDNN op
Summary:
Correct schema generation was previously broken, leading to invalid gradient op creation.

Also exhibited in model_device_helper, where invalid schemas were being created on the CPU when kwargs['engine'] == 'CUDNN'.
Closes https://github.com/caffe2/caffe2/pull/617

Reviewed By: asaadaldien

Differential Revision: D5097062

Pulled By: akyrola

fbshipit-source-id: e22181f857deccb7b4395e87271e2cbf1226eb64
2017-05-22 08:33:34 -07:00
Alexander Sidorov
92610e78bb CuDNN comparison mode
Summary:
This allows producing nice comparisons against cuDNN. Currently, on 1 layer I see about a 28% slowdown on average across the specified setups.

Reviewed By: akyrola

Differential Revision: D4986218

fbshipit-source-id: efb12081f13dbfb92428fd4a85f12fd566eb9522
2017-05-20 15:19:43 -07:00
Aapo Kyrola
a2c01e830b fix duplicate init blob issue + fix test
Summary:
Address KaimingHe's comments in D5093689 about the same blob being initialized twice, causing the internal consistency check to fail. Also, I noticed that my new test for test_checkpoint_params was completely botched due to an indentation issue (it did not actually execute any test), so this fixes that as well.
Modified the test to add a duplicate param initializer, so that this bug is tested for.

Reviewed By: KaimingHe

Differential Revision: D5101304

fbshipit-source-id: 72f343035c1b4953e7bb9a1a1c171cf05d3ead26
2017-05-20 09:18:29 -07:00
Aapo Kyrola
aa603a9083 add test for input order
Summary: Based on jay-mahadeokar's code, add a test for input order consistency to data workers.

Reviewed By: jay-mahadeokar

Differential Revision: D5096887

fbshipit-source-id: efd226343f81e9a0157ec89d4588f1eee8a78549
2017-05-19 23:46:38 -07:00
Aapo Kyrola
6384bae29b call save_to_db in CPUContext + fix a typo in data_parallel_model.
Summary:
If Predictor Exporter's save_to_db is called in CUDAContext, a failure occurs since the following FeedBlob() tries to store a string (metadata), but for CUDA blobs we assume they are tensors.
  + fix a typo in data_parallel_model that I bumped into.

Reviewed By: asaadaldien

Differential Revision: D5099837

fbshipit-source-id: 69d01b35a9a1816bf083f13d8a6ce88e1f5aecb7
2017-05-19 18:25:00 -07:00
James Cross
83f6dceaa6 remove forget_bias as argument to AttentionCell constructor
Summary: The argument is unused.

Differential Revision: D5096088

fbshipit-source-id: fcda8a1d2b0d7c85182ab5bc002c86640b443f97
2017-05-19 16:53:40 -07:00
Ahmed Taei
09bbd0382c ConvNd cuDNN
Summary: Add ConvND cuDNN implementation.

Reviewed By: akyrola

Differential Revision: D4702205

fbshipit-source-id: 65275bcff3970b0d43ac5c168d38bcd075985979
2017-05-19 15:20:33 -07:00
Aapo Kyrola
0af0cba2b7 Refactor data_parallel_model initial sync and checkpointing
Summary:
Major improvements. Before, we only synced the "params" and "computed params" of the model after initialization and after loading a checkpoint. But actually we want to sync all blobs that are generated in the param_init_net. For example, the _momentum blobs were missed by the previous implementation and had to be manually included in checkpoint finalization.

I also added GetCheckpointParams() to data_parallel_model because it is now fully general. Also added a unit test.

Reviewed By: andrewwdye

Differential Revision: D5093689

fbshipit-source-id: 8154ded0c73cd6a0f54ee024dc5f2c6826ed7e42
2017-05-19 12:48:06 -07:00
Yiming Wu
0aeffa985e make sure mutex is on CPU too
Summary: The mutex is only supported on CPU. We need to make sure the mutex and the following atomicIter are both on CPU. This is critical for GPU SparseNN training.

Differential Revision: D5093184

fbshipit-source-id: 021e6ba699a3208449fa4761cad6b0ec4544957e
2017-05-19 12:17:17 -07:00
Yiming Wu
65750349ba deprecate CNNModelHelper in python/operator_test dir
Summary:
deprecate CNNModelHelper in python/operator_test dir

BTW, I found that there are 2 mkl_speed_tests. I am confused...

Reviewed By: salexspb

Differential Revision: D5094122

fbshipit-source-id: f6526f4de334f2245eb4c1f204a8ec9f23750d78
2017-05-19 12:17:17 -07:00
Ahmed Taei
32bf7a2c2b Generalize PoolingOp(cuDNN) to compute 2D and 3D pooling.
Reviewed By: akyrola

Differential Revision: D5090689

fbshipit-source-id: f9f11e12adc0ee8db088f3397a8c33aa31eb5deb
2017-05-19 10:19:00 -07:00
Yiming Wu
1b7497807f cnnmodelhelper deprecate warning
Summary: We will start our API migration process. Before that, I want to make sure people don't add new CNNModelHelper instances to our open-source code, so I put a deprecation warning here in advance.

Reviewed By: salexspb

Differential Revision: D5093556

fbshipit-source-id: 74bf4a7782c2d882f72f202d48c72255d152b68a
2017-05-18 23:35:26 -07:00