pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Yinghai Lu	582d47e986	[Caffe2] Scoped dummy name generator (#6458 ) * Scoped dummy name generator * Fix * Fix * Use class variable * Fix build * comment	2018-04-16 11:58:02 -07:00
bddppq	7ef14bf04c	Follow the change of ONNX Cast operator "to" attribute (#6574 ) * Follow the change of ONNX Cast operator "to" attribute * Update Cast conversion in frontend and backend * update pytorch onnx frontend	2018-04-16 14:24:42 -04:00
Xiaomeng Yang	cd2112717c	[caffe2] Update math functions with params on host. (#6602 ) * Update ReduceMean Add reduce mean to math Add reduce mean to math * sync reduce_ops_test * Update math_gpu.cu	2018-04-14 21:41:41 -07:00
Yinghai Lu	434f710f3f	[Caffe2] Add support to TensorRT (#6150 ) * Add support to TensorRT * Removed License header * Bind input/output by position * Comments * More comments * Add benchmark * Add warning for performance degradation on large batch * Address comments * comments	2018-04-11 17:03:54 -07:00
Yinghai Lu	ef8f556212	[Caffe2] Changes done inside Facebook (#6378 ) * fix unit test for sqrt op From the error logging: [idx, grad, grad_estimate] are: [[ 146. 0.5 0.45776367] [ 147. 0.5 0.45776367] The gradient == 0.5 is correct, which means the SqrtOp and its gradient is doing right job. (Because y = sqrt(x), loss = y^2/2 = x/2, and then d(loss)/dx = 1/2 = 0.5; ) The test failed because of numerical problem of grad_estimate (in unit test). It can be because the step_size is small, and float precision is not high (when there are multiple elements in the tensor, we do sum(y^2) to compute loss) This diff - increase the step size, and also move the test cases to be further away from 0 (where sqrt(x) is not well defined) to be safe :) - also clean up, and merge the test case for inplace Vs. non-inplace Tested with: `CAFFE2_HYPOTHESIS_PROFILE=debug ai_bt caffe2/caffe2/python/operator_test:elementwise_ops_test -- "test_sqrt"` * CompositeReader & CompositeReaderBuilder A new type of reader gluing multiple readers together. * Back out "Revert D7394363: [GanH]: Log D Trick for Cross Entropy with Sigmoid" Original commit changeset: 9325a4356dbe * [dai][WIP] convert params to int8 on ps before sending to trainer Add float->uint8 conversion in addition to float->fp16 conversion in model_saver. * [easy] improve unit test for sparse length sum ops as desc. #accept2ship * Update GitHub upstream to `771fcb3455` * move sparse hash unique ops to OOS and add unit tests - move the SparseHash version to OOS, since 'sparsehash' is already deps of caffe2 OOS: https://fburl.com/arssw4n1 - The 'SparseHash' engine is also being used in OOS, so the SparseHash version shall be in OOS to reduce confusion: https://fburl.com/o5ea7ah2 - fix the CUDA UniqueOp for the case when batch is empty. 
- add unit test * group_norm_op for caffe2 This is the cuda op for Group Normalization (GN): https://arxiv.org/abs/1803.08494 This code implements GN in one op that computes Y=gamma * (X-mu) / sigma + beta and also its gradients. It is expected to have minimal memory consumption (similar to the BN op), without creating new blobs if GN were implemented as several ops (e.g., reshape, norm_mean/std, affine_channel). * Resubmit D7405233: disappeared in D7464958 OOS publish causes the op missing -- however, test was still there * [c2] add sparse hash engine for cuda unique op The SparseHash version of UniqueOp copy input tensor to CPU, and make use of sparse hash map to get unique output, and then copy back to GPU. * [dper][gpu] enable unit testing gpu trainer for sparse nn to debug the GPU trainer using mock data in unit test. make it easier to develop GPU trainer for new models. * Reuse Gloo context for Synchronize() calls Previously we were creating (and leaking) the Gloo context on each call to Synchronize(). Now only run the common world op and create the barrier net once, then run the barrier net on each Synchronize() call. Since timeout is associated with the Gloo context, assert that the timeout is fixed instead of trying to handle the complexity of multiple timeouts (and associated contexts). * [GanH/WGAN][1/n]: add FC param clipping as titled * [mobile] minimizing changes between caffe2_benchmark and speed_benchmark * [GanH]: enable diagnose within model avoid finding blob names but to directly enable inside the model * Add `net_transformer_fun` option to DPM This callback allows for various transformations to be made to the model after gradient operators have been added. The immediate motivation for this is to allow transformations such as "checkpoint-and-recompute" which allow trading off memory for additional compute. Adding several callbacks like this has made DPM's API less than ideal at this stage. However, I could not find any reasonable alternative. 
* [DT] [33/n] Compile flow task groups task groups need to be compiled in order to pickle the object in fblearner. However I also changed the Job's compile function as creating new object is not necessary. * Initial commit for sparse_normalize vectorization and benchmark * [GanH]: LB Calibration for JSD as titled * Tracing event in async executor Adding event tracing through TRACE_EVENT macro in async executor * [Resubmit] D7409751 Resetting book-keeping blobs when the reservoir is reset D7409751 got lost in D7464958 * Visualizing realtime weights values we want to visualize the weights values as optimizer is iterating. This diff supports visualizing the weights at an assigned index. Currently, we assume the blob to be 2 dimensional. * [GanH][Easy]: Fix Homotopy Weighting apparently, there was a bug in homotopy weight (alpha, beta) update * [c2] move sparse hash unique op out of oss so that oss do not need to depend on google hash map. * Get rid of std::round as it's not supported on Android * Revert changes on setup.py * Skip shaky test on Dataio * fix	2018-04-10 21:11:43 -07:00
Bram Wasti	7bd398b3db	Add fuseNNPACKConvRelu (#6439 )	2018-04-10 16:51:16 -07:00
Qinqing Zheng	038b66ee07	[caffe2] use dictionary in Printer (#6443 )	2018-04-10 10:37:07 -07:00
Qinqing Zheng	66791f54d5	Update the compile function of Job (#6323 )	2018-04-09 22:44:23 -07:00
bddppq	df2e1d2962	Disallow using the OOP api workspace as context managers (#6456 )	2018-04-09 22:13:54 -07:00
François Garillot	a91c88a348	Check mappings ONNX -> Caffe2 bear the same argument names (#6317 ) * Check mappings ONNX -> Caffe2 bear the same argument names When adding an extra arg to an input ONNX op, if it's not supported in Caffe2, the exporter would just silently pass it to NetDef and ignore it in the implementation. It's pretty error-prone. Caffe2 also has an OpSchema description and we can enforce that all arguments explicitly appear in schema or listed explicitly in Caffe2. See also https://github.com/caffe2/caffe2/pull/2478 Add test for C2 argument checking * Some operators do not log arguments, which prevents argument checks. Invite users to file an issue to fix the schema.	2018-04-09 09:15:42 -07:00
Svetoslav Kolev	997acfd7fe	[Caffe2] Some small changes to InferBlobShapesAndTypes definition and SameAsInput Schema (#6335 ) * Change Same as input type deduction to work for ops with multiple outputs * change InferBlobShapesAndTypes definition to take vector of pointers instead of unique_ptr. The function doesn't own the objects, so no need to pass smart pointers and that prevents calling the function with existing object, since the caller has to create unique_ptr, i.e. copy an existing object just to create the pointer * switching order of std::move<unique_ptr> and unique_ptr.get * adding comma	2018-04-06 19:06:46 -07:00
Lu Fang	aab0bd3c13	Change onnx_optimizer API (#6290 )	2018-04-06 13:46:53 -07:00
Lu Fang	876ad110af	Skip some unsupported onnx backend tests (#6247 )	2018-04-05 21:33:35 -07:00
bddppq	8df2487de9	Properly skip the failing onnx conversion test (#6280 )	2018-04-04 14:07:03 -07:00
kuttas	460e8cd376	change print to logger.warning in operator traceback code (#6216 )	2018-04-03 08:01:25 -07:00
Qinqing Zheng	fd2e7cb487	Change JobRunner's __call__ function to train (#6205 )	2018-04-02 21:04:36 -07:00
Paul Jesse Hellemn	771fcb3455	[caffe2] Fbcode to GitHub sync (#6208 ) * [easy] allow empty tensor in cuda relu op The diff has not enabled unit test of empty tensor, because MLKVersion of ReluOp need extra work to support * Make blob norm plotting work with distributed trainer when the old framework is used	2018-04-02 16:35:27 -07:00
Orion Reblitz-Richardson	a409f959e8	Remove ShuffleNet from model zoo. (#6203 ) * No longer supported.	2018-04-02 15:00:06 -07:00
Orion Reblitz-Richardson	cbe92abd7c	Disable failing test_lengths_max_gpu	2018-03-30 21:00:45 -07:00
Ellie Wen	3d27095eec	[easy] fix comments nit: fix comments	2018-03-30 21:00:44 -07:00
Qinqing Zheng	365652229d	Back out "Revert D7372460: [DT] [28/n] Lift epoch_limiter" Original commit changeset: b0a986d16c3b	2018-03-30 21:00:44 -07:00
Andrey Malevich	b9d2ba1dbf	Revert D7394363: [GanH]: Log D Trick for Cross Entropy with Sigmoid This reverts commit d63266ccbc0c1390c58c2a71ae0b562fdec2fbc0 @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files	2018-03-30 21:00:44 -07:00
Ellie Wen	363a227d19	extend bucketize op to support duplicated boundaries upgrade bucketize op to support duplicated boundaries	2018-03-30 21:00:44 -07:00
Jason Gauci	551d5fbf9a	CUDA version of LengthsMax operator CUDA version of LengthsMax operator @override-unit-failures	2018-03-30 21:00:44 -07:00
Andrew Tulloch	0df662c67f	[Caffe2] [Int8] More exhaustive unit tests for int8 ops (+ bug fix in Int8Add in-place case) As title. This catches one bug in the Int8Add in-place case, which wasn't tested in int8_test.cc	2018-03-30 21:00:44 -07:00
Xiaolong Wang	2b0e39f569	[GanH]: Log D Trick for Cross Entropy with Sigmoid as titled	2018-03-30 21:00:44 -07:00
Andrey Malevich	f8eb8a66e2	Revert D7372460: [DT] [28/n] Lift epoch_limiter This reverts commit 05bd9bec10fad5ff9dc40be88836fd7274d50ce9 @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files	2018-03-30 21:00:44 -07:00
Bram Wasti	ee64200c64	[nomnigraph] Expose transformations to python Adding a python interface to the transformations	2018-03-30 21:00:44 -07:00
Yiming Wu	03c5198331	[C2 Int8][C2 Core]fetch int8 blob Providing Python API to fetch Int8 tensors. data, scale, zero_point = workspace.FetchInt8Blob(blob_name) now returns a tuple if the blob contains an Int8TensorCPU 'data' = int8 data array 'scale' = fake quantization scale 'zero_point' = fake quantization offset Although FetchBlob shares back-end implementation with FetchInt8Blob, we raise error to prevent unexpected behavior of the same method	2018-03-30 21:00:44 -07:00
Lu Fang	8f3ba30266	Fix a typo Fix a typo in optimize_onnx_test.py	2018-03-30 21:00:44 -07:00
James Reed	47a1fd208f	Quick and dirty raw value substitution from zip file (#2454 )	2018-03-29 19:18:58 -07:00
Lu Fang	344fa57680	Adjust the test since the op only has a CPU implementation	2018-03-27 18:10:39 -07:00
Lu Fang	0ac8495165	Fix the CMake issues caused by internal changes	2018-03-27 18:10:39 -07:00
Xiaolong Wang	af3dcdf6ae	[D2]: Improve loss weight by allowing omitted weights as titled	2018-03-27 18:10:39 -07:00
Xiaolong Wang	d6c30ee6af	[GanH]: Unifying two discriminators to improve the flexibility and combines different discriminators in one model.	2018-03-27 18:10:39 -07:00
Jongsoo Park	3300e21d52	Add SparseLengthsPositionalWeightedSum operator that fuses SparseLengthsWeightedSum, LengthsRangeFill, and Gather add SparseLengthsPositionalWeightedSum operator that fuses SparseLengthsWeightedSum, LengthsRangeFill, and Gather	2018-03-27 18:10:39 -07:00
Xianjie Chen	e6b04ba121	fix lengths sum cuda op for empty batch the cuda does not allow launching empty kernel	2018-03-27 18:10:39 -07:00
Xianjie Chen	6ed9a0c3f2	fix cuda elementwise ops for empty batch CUDA will fail to launch empty kernel	2018-03-27 18:10:39 -07:00
Dehua Cheng	c6587597d8	Ignore backward step when there is no loss function; Ignore backward step when there is no loss function; For some customized model, we can encode the update directly in forward step and there is no backward step;	2018-03-27 18:10:39 -07:00
Xiaolong Wang	c909abd85f	[GanH] Label Smooth: Add Layer and Integrate to SparseNN as titled	2018-03-27 18:10:39 -07:00
Yan Zhu	107cb670b1	add typecast and assertion for histogram computing as title	2018-03-27 18:10:39 -07:00
Xianjie Chen	078b6d5ad1	[layer model] remove duplicated init ops it saves some model init time, and reduce confusion.	2018-03-27 18:10:39 -07:00
Roxie He	d2453afb1e	Add SumElementsInt operator Added a caffe2 math sum operator so that it takes integers (only int32) Changed the SumFloatIter to SumGenericIter so that it takes >1 types. Added a sumElementInt operator	2018-03-27 18:10:39 -07:00
James Cross	16312e8123	[fbtranslate/onnx] decoder step (pytorch -> caffe2) exporter for fbtranslate This code introduces a new class for exporting decoder step (ensemble) models trained with fbtranslate pytorch to Caffe2 models via ONNX, for the purpose of use in "component beam search" being developed concurrently in C++ by @juancarabina.	2018-03-27 18:10:39 -07:00
Manoj Krishnan	a92a6233b5	Enable support for placeholder ops in InjectCrossDeviceCopies This is required to support placeholder/decorator ops which does not have operator schema. Note that the change is made in such a way that it is a no-op if placeholder Ops are not used. Changes: 1. Since the placeholder ops always run on CPU, added a utility to infer placeholder ops blob devices. 2. Placeholder op's input/output blobs should be on CPU as well. This change takes care of dealing with output blobs - i.e. use blobs on CPU. 3. Added a Unit test - test_inject_copy_placeholder_ops	2018-03-27 18:10:39 -07:00
Jiyan Yang	8fa38f8dce	Add gradient clipping (#2452 ) As titled.	2018-03-27 15:10:15 -07:00
Orion Reblitz-Richardson	1d5780d42c	Remove Apache headers from source. * LICENSE file contains details, so removing from individual source files.	2018-03-27 13:10:18 -07:00
Jason Gauci	f93e820e7d	Revert "[C2][GPU]LengthsMax CUDA version (#2209 )" (#2444 ) This reverts commit 71acc269bb573c8c04343e6d534b2557a456b29a.	2018-03-27 01:15:52 -07:00
harouwu	6740126f5c	[C2][GPU]LengthsMax CUDA version (#2209 ) lengthsmax CUDA version. will provide gradient later	2018-03-27 00:19:17 -07:00
Kutta Srinivasan	0e0918cb9a	dpm synchronize	2018-03-26 19:54:31 -07:00
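
Commit ef8f556212 above notes that for y = sqrt(x) and loss = y^2/2 = x/2 the analytic gradient is 0.5, and attributes the test failure to the finite-difference estimate when the step size is small and float precision is limited. The following is a minimal sketch of that kind of check, not the Caffe2 test itself. The helper names (f32, loss, grad_estimate), the sample point x = 10, and the two step sizes are assumptions chosen for illustration, and float32 rounding is emulated with the standard struct module.

```python
import math
import struct

ANALYTIC_GRAD = 0.5  # d(x/2)/dx, as derived in the commit message


def f32(v):
    """Round a Python float to the nearest float32 value (illustrative helper)."""
    return struct.unpack("f", struct.pack("f", v))[0]


def loss(x):
    """loss = sqrt(x)^2 / 2, with float32 rounding applied after each operation."""
    y = f32(math.sqrt(f32(x)))
    return f32(f32(y * y) / 2.0)


def grad_estimate(x, step):
    """Central-difference estimate of d(loss)/dx at x."""
    return (loss(x + step) - loss(x - step)) / (2.0 * step)


# Compare a small and a larger step at a point well away from 0.
# Both step sizes are hypothetical values, not the ones used in the test.
for step in (1e-4, 1e-1):
    est = grad_estimate(10.0, step)
    err = abs(est - ANALYTIC_GRAD)
    print(f"step={step:g} estimate={est:.6f} abs_error={err:.6f}")
```

The rounding error in each loss value is divided by 2 * step, so a smaller step can amplify it in the estimate. That is in line with the commit's decision to increase the step size and move the test points away from 0.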

1732 Commits