* Add full impl of GroupNorm
* Fix comments in math.h
* Remove unused buffers
* Add #include <array> in gpu version
* Remove unused moments_buffer_
* Make inverse std a template.
* Add detailed comments
* fix unit test for sqrt op
From the error logging:
[idx, grad, grad_estimate] are:
[[ 146. 0.5 0.45776367]
[ 147. 0.5 0.45776367]
The gradient == 0.5 is correct, which means the SqrtOp and its gradient are doing the right job. (Because y = sqrt(x), loss = y^2/2 = x/2, and so d(loss)/dx = 1/2 = 0.5.)
The test failed because of a numerical problem in grad_estimate (in the unit test). This is likely because the step_size is small and float precision is not high (when there are multiple elements in the tensor, we do sum(y^2) to compute the loss).
This diff
- increases the step size, and also moves the test cases further away from 0 (where the gradient of sqrt(x) is not well defined) to be safe :)
- also cleans up and merges the test cases for in-place vs. non-in-place
Tested with:
`CAFFE2_HYPOTHESIS_PROFILE=debug ai_bt caffe2/caffe2/python/operator_test:elementwise_ops_test -- "test_sqrt"`
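For reference, a minimal numpy sketch (not the actual hypothesis-based unit test) of the check being discussed, showing why the analytic gradient is 0.5 and why the finite-difference estimate needs a reasonable step size and inputs away from 0:

```python
# Minimal numpy sketch (not the actual unit test) of the reasoning above:
# loss = sum(sqrt(x)^2) / 2 = sum(x) / 2, so d(loss)/dx_i = 0.5 for every element,
# and the finite-difference estimate only matches when the step is not too small
# for float32 and x stays away from 0.
import numpy as np

def loss(x):
    y = np.sqrt(x)
    return np.sum(y * y) / 2.0

x = np.random.uniform(1.0, 2.0, size=100).astype(np.float32)  # away from 0
step = 1e-2  # a much smaller step drowns in float32 rounding error

i = 0  # check a single element
xp, xm = x.copy(), x.copy()
xp[i] += step
xm[i] -= step
grad_estimate = (loss(xp) - loss(xm)) / (2.0 * step)
assert abs(grad_estimate - 0.5) < 1e-2
```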
* CompositeReader & CompositeReaderBuilder
A new type of reader that glues multiple readers together.
* Back out "Revert D7394363: [GanH]: Log D Trick for Cross Entropy with Sigmoid"
Original commit changeset: 9325a4356dbe
* [dai][WIP] convert params to int8 on ps before sending to trainer
Add float->uint8 conversion in addition to float->fp16 conversion in model_saver.
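The exact packing used by model_saver is not shown here; purely as a rough illustration, a generic min/max affine float->uint8 quantization (with the scale and offset kept alongside the payload) looks like the sketch below.

```python
# Rough illustration only: generic min/max affine float -> uint8 quantization.
# This is NOT necessarily the scheme model_saver uses; names are hypothetical.
import numpy as np

def quantize_uint8(w):
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, scale, lo  # payload plus the metadata needed to dequantize

def dequantize_uint8(q, scale, lo):
    return q.astype(np.float32) * scale + lo

w = np.random.randn(4, 8).astype(np.float32)
q, scale, lo = quantize_uint8(w)
assert np.abs(dequantize_uint8(q, scale, lo) - w).max() <= scale / 2 + 1e-6
```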
* [easy] improve unit test for sparse length sum ops
as desc.
#accept2ship
* Update GitHub upstream to 771fcb3455
* move sparse hash unique ops to OSS and add unit tests
- move the SparseHash version to OSS, since 'sparsehash' is already a dependency of caffe2 OSS: https://fburl.com/arssw4n1
- The 'SparseHash' engine is also being used in OSS, so the SparseHash version should live in OSS to reduce confusion: https://fburl.com/o5ea7ah2
- fix the CUDA UniqueOp for the case when the batch is empty.
- add unit test
* group_norm_op for caffe2
This is the cuda op for Group Normalization (GN): https://arxiv.org/abs/1803.08494
This code implements GN in one op that computes Y = gamma * (X - mu) / sigma + beta, together with its gradients. It is expected to have minimal memory consumption (similar to the BN op), without creating the new blobs that would be needed if GN were implemented as several ops (e.g., reshape, norm_mean/std, affine_channel).
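As a reference for what the fused op computes (a numpy sketch for illustration, not the CUDA kernel): split the C channels into G groups, normalize each (group, spatial) slice, then apply the per-channel gamma/beta.

```python
# Numpy sketch of the GroupNorm forward pass that the op fuses into one kernel;
# for illustration only, not the CUDA implementation.
import numpy as np

def group_norm_ref(X, gamma, beta, num_groups, eps=1e-5):
    N, C, H, W = X.shape
    G = num_groups
    Xg = X.reshape(N, G, C // G, H, W)
    mu = Xg.mean(axis=(2, 3, 4), keepdims=True)
    var = Xg.var(axis=(2, 3, 4), keepdims=True)
    Y = ((Xg - mu) / np.sqrt(var + eps)).reshape(N, C, H, W)
    return Y * gamma.reshape(1, C, 1, 1) + beta.reshape(1, C, 1, 1)

X = np.random.randn(2, 32, 7, 7).astype(np.float32)
Y = group_norm_ref(X, np.ones(32, np.float32), np.zeros(32, np.float32), num_groups=8)
```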
* Resubmit D7405233: disappeared in D7464958
The OSS publish caused the op to go missing; however, the test was still there.
* [c2] add sparse hash engine for cuda unique op
The SparseHash version of UniqueOp copies the input tensor to the CPU, uses a sparse hash map to compute the unique output, and then copies the result back to the GPU.
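Conceptually (a Python sketch of the idea, not the C++ op), the hash-map pass produces the unique values plus the remapping indices, and an empty input naturally yields empty outputs:

```python
# Python sketch of the hash-map-based unique-with-remapping idea behind the
# SparseHash engine; the real op does this in C++ on a CPU copy of the tensor.
import numpy as np

def unique_with_remapping(values):
    table, uniques, remap = {}, [], []
    for v in values:  # an empty input simply produces empty outputs
        if v not in table:
            table[v] = len(uniques)
            uniques.append(v)
        remap.append(table[v])
    return np.asarray(uniques, dtype=np.int64), np.asarray(remap, dtype=np.int32)

u, r = unique_with_remapping([3, 1, 3, 7, 1])
# u == [3, 1, 7], r == [0, 1, 0, 2, 1]
```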
* [dper][gpu] enable unit testing gpu trainer for sparse nn
This lets us debug the GPU trainer using mock data in a unit test, and makes it easier to develop the GPU trainer for new models.
* Reuse Gloo context for Synchronize() calls
Previously we were creating (and leaking) the Gloo context on each call to Synchronize(). Now we run the common world op and create the barrier net only once, then run the barrier net on each Synchronize() call. Since the timeout is associated with the Gloo context, assert that the timeout is fixed instead of trying to handle the complexity of multiple timeouts (and associated contexts).
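A sketch of the create-once/reuse pattern described above, with stubbed helpers standing in for the real common-world and barrier-net construction (hypothetical names, not the actual data_parallel_model code):

```python
# Create-once/reuse pattern: build the common world op and barrier net a single
# time, then only run the barrier net on later Synchronize() calls.
_barrier_net = None
_barrier_timeout = None

def _build_barrier_net(timeout_sec):
    # Stand-in for: run the common world op and create the barrier net once.
    return {"timeout": timeout_sec}

def _run_barrier_net(net):
    # Stand-in for: run the already-created barrier net.
    pass

def synchronize(timeout_sec):
    global _barrier_net, _barrier_timeout
    if _barrier_net is None:
        _barrier_net = _build_barrier_net(timeout_sec)
        _barrier_timeout = timeout_sec
    # The timeout lives in the Gloo context, so it must stay fixed across calls.
    assert timeout_sec == _barrier_timeout, "Synchronize() timeout cannot change"
    _run_barrier_net(_barrier_net)
```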
* [GanH/WGAN][1/n]: add FC param clipping
as titled
* [mobile] minimizing changes between caffe2_benchmark and speed_benchmark
* [GanH]: enable diagnose within model
Instead of looking up blob names, enable the diagnostics directly inside the model.
* Add `net_transformer_fun` option to DPM
This callback allows various transformations to be made to the model after gradient operators have been added. The immediate motivation for this is to allow transformations such as "checkpoint-and-recompute", which trade off memory for additional compute (a sketch follows below).
Adding several callbacks like this has made DPM's API less than ideal at this
stage. However, I could not find any reasonable alternative.
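A hedged sketch of how such a callback might be wired up; the callback signature and the surrounding Parallelize arguments are assumptions for illustration, not taken from this diff:

```python
# Hedged sketch: wiring a net transformer callback into data_parallel_model.
# The callback signature and the Parallelize keyword usage below are assumptions;
# consult data_parallel_model.Parallelize for the real contract.
from caffe2.python import data_parallel_model

def recompute_transformer(model, *args, **kwargs):
    # Runs after gradient operators have been added; a real implementation could
    # rewrite model.net here, e.g. checkpoint-and-recompute activations to trade
    # memory for extra compute.
    pass

# data_parallel_model.Parallelize(
#     model,
#     input_builder_fun=add_inputs,
#     forward_pass_builder_fun=build_forward,
#     optimizer_builder_fun=build_optimizer,
#     net_transformer_fun=recompute_transformer,
#     devices=range(4),
# )
```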
* [DT] [33/n] Compile flow task groups
Task groups need to be compiled in order to pickle the object in fblearner. I also changed the Job's compile function, since creating a new object is not necessary.
* Initial commit for sparse_normalize vectorization and benchmark
* [GanH]: LB Calibration for JSD
as titled
* Tracing event in async executor
Add event tracing via the TRACE_EVENT macro in the async executor.
* [Resubmit] D7409751 Resetting book-keeping blobs when the reservoir is reset
D7409751 got lost in D7464958
* Visualizing realtime weight values
We want to visualize the weight values as the optimizer iterates. This diff adds support for visualizing the weights at an assigned index.
Currently, we assume the blob is 2-dimensional.
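A rough sketch of the idea (hypothetical helper, not the diff's code): fetch the 2-D blob each iteration and record the value at the assigned index for plotting.

```python
# Rough sketch of the idea (not the diff's code): fetch the 2-D weight blob and
# record the value at an assigned (row, col) index on each optimizer iteration.
from caffe2.python import workspace

def sample_weight(blob_name, row, col):
    w = workspace.FetchBlob(blob_name)  # assumed to be a 2-D numpy array
    return float(w[row, col])
```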
* [GanH][Easy]: Fix Homotopy Weighting
Apparently, there was a bug in the homotopy weight (alpha, beta) update.
* [c2] move sparse hash unique op out of OSS
so that OSS does not need to depend on the Google hash map.
* Get rid of std::round as it's not supported on Android
* Revert changes on setup.py
* Skip shaky test on Dataio
* fix
Summary: Quick fix for unit test broken by D6454290. This is my fault for approving while the tests covering the single callsite were broken.
Reviewed By: goldsborough
Differential Revision: D6466566
fbshipit-source-id: 2683be3d6bb184286e64fbde3e572946e39030c7
Summary:
While working on layer normalization for LSTMs I encountered an issue where the layer norm parameters (which are the scale/gain and bias/shift from the paper) were not registered in the model for `brew.layer_norm`. salexspb explained that this is because it was using the `init_net_param` API instead of `create_param`. This diff fixes this.
While fixing I noticed that I noticed that `brew.layer_norm` actually had a bug where it was multiplying with the bias instead of adding it. Another issue was that the function giving the scale and bias a shape of `[1]`, however the paper (https://arxiv.org/pdf/1607.06450.pdf) specifies that, like for batch norm, there is one scale and bias parameter per neuron, i.e. the shape should be `[1, axis_dimension]`. The API now takes an explicit `dim_in` parameter (also more consistent with other normalization functions in that module) so that this can be specified. See tests for how this now looks.
Reviewed By: jhcross
Differential Revision: D6454290
fbshipit-source-id: fc00ca614de3190c40ab743e8984bec9e85fb58c
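For reference, a hedged usage sketch of the fixed helper; apart from `dim_in` (added by this diff), the argument names and defaults are assumed from the other brew normalization helpers:

```python
# Hedged usage sketch of brew.layer_norm after the fix; argument names other
# than dim_in are assumptions based on the surrounding brew helpers.
from caffe2.python import brew, model_helper

model = model_helper.ModelHelper(name="layer_norm_example")
hidden = brew.fc(model, "input", "hidden", dim_in=64, dim_out=256)
# One scale and one bias per neuron, i.e. parameters of shape [1, 256],
# registered via create_param so they show up in model.params.
normed = brew.layer_norm(model, hidden, "hidden_ln", dim_in=256)
```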
Summary: Updated brew SpatialBN to use initializers similar to other brew ops such as conv and fc, instead of initializing all of its parameters itself within the brew call.
Reviewed By: asaadaldien
Differential Revision: D5840359
fbshipit-source-id: 9f3d688d4957605eaf7ecd2488bc26bfb1da3f78
Summary: Implement a brew wrapper for the LayerNorm op. This adds the scalar weight and bias terms to the op.
Reviewed By: jmp84
Differential Revision: D5595836
fbshipit-source-id: 467b2e1158b0c454a149d4b26c47719826e98752
Summary:
This diff is the first step in the effort to refactor all parameters. As a first step, I'm merging the concepts of params and computed_params, which are going to be based on tags instead (in the first version it's still using the old data structs to store all the BlobReferences).
Renaming computed_params to non-trainable/non-backprop params should be done in some other diff.
Reviewed By: salexspb
Differential Revision: D5171159
fbshipit-source-id: 68031ca779f053fb266a7c4a2e5b482a3bd9c832
Summary:
This diff is the first step in the effort to refactor all parameters. As a first step, I'm merging the concepts of params and computed_params, which are going to be based on tags instead (in the first version it's still using the old data structs to store all the BlobReferences).
Renaming computed_params to non-trainable/non-backprop params should be done in some other diff.
Reviewed By: salexspb
Differential Revision: D5119830
fbshipit-source-id: 2001090a37346eb12abbb234e13e727c288eb8a7
Summary:
Correct schema generation was previously broken, leading to invalid gradient op creation.
This was also exhibited in model_device_helper, where invalid schemas were being created on the CPU when kwargs['engine'] == 'CUDNN'.
Closes https://github.com/caffe2/caffe2/pull/617
Reviewed By: asaadaldien
Differential Revision: D5097062
Pulled By: akyrola
fbshipit-source-id: e22181f857deccb7b4395e87271e2cbf1226eb64
Summary:
Update the rnn_cell.py and char_rnn.py examples with the new `brew` model.
- Deprecate CNNModelHelper
- Replace all helper functions with brew helper functions
- Use the `model.net.<SingleOp>` format to create bare-bone operators for better clarity (see the sketch below)
Reviewed By: salexspb
Differential Revision: D5062963
fbshipit-source-id: 254f7b9059a29621027d2b09e932f3f81db2e0ce
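As a quick illustration of that last point (blob names and dimensions here are made up for the example):

```python
# Quick illustration of the brew-helper vs. bare-bone-operator styles mentioned
# above; blob names and sizes are invented for the example.
from caffe2.python import brew, model_helper

model = model_helper.ModelHelper(name="char_rnn_style_example")
# brew helper: creates and registers the FC weight/bias and adds the op.
fc_out = brew.fc(model, "embedded", "fc_out", dim_in=128, dim_out=256)
# bare-bone operator via model.net.<SingleOp>: just the op, no parameters.
softmax = model.net.Softmax(fc_out, "softmax")
```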
Summary:
cuDNN versions of dropout and LRN (for native fp16 support), and a port of Caffe's max pooling algo that uses an explicit mask to store locations (also supports fp16 storage)
Closes https://github.com/caffe2/caffe2/pull/396
Reviewed By: akyrola
Differential Revision: D4990880
Pulled By: asaadaldien
fbshipit-source-id: a716acffb656843e9b31e3e6808bd2d8aa959d03
Summary:
rename model_helpers to brew. This is a big diff now. I did these things:
1. replace model_helpers with brew:
find . -type f -exec sed -i 's/model_helpers/brew/g' {} +
2. rename model_helpers.py and model_helpers_test.py
3. rename ModelHelpersTest to BrewTest
4. lowercase all the helper functions to distinguish them from single ops
5. run my unit tests
6. run convergence tests
Reviewed By: salexspb
Differential Revision: D4930465
fbshipit-source-id: f420a1b03238df1cbe9f4426e0b9c43a12119661