Summary:
Commit 479e4ce5 did not stop the health checks from firing; they are
likely still triggered by the remaining `assume` calls.
Closes https://github.com/caffe2/caffe2/pull/1625
Differential Revision: D6573036
Pulled By: pietern
fbshipit-source-id: eeb21bdd61dca0a632eb1ba9e529177ac2569bfd
Summary: The `assume` statement in adagrad_test leads to health check failures. Here we remove it and instead check `dc == hu.gpu_do`.
Reviewed By: pietern
Differential Revision: D6513314
fbshipit-source-id: 4caf2d938e5f5935a95cca8abd99185182223d63
Summary:
PR #1536 suppressed health checks for test_sparse_adagrad, but test_row_wise_sparse_adagrad also filters out too many examples. Suppress health checks for this test as well.
Closes https://github.com/caffe2/caffe2/pull/1599
Differential Revision: D6530850
Pulled By: pietern
fbshipit-source-id: c73f30d2e104565421e3e381b1cf66185edc833e
Summary:
With some test seeds, the Hypothesis health check warning starts firing.
This should be addressed in a better way, by not generating as many invalid examples.
Closes https://github.com/caffe2/caffe2/pull/1536
Reviewed By: bddppq
Differential Revision: D6437138
Pulled By: pietern
fbshipit-source-id: c619d928a585e3d887f686db5d98f841af10c56b
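A minimal sketch of what "not generating as many invalid examples" can mean, assuming (hypothetically; the commit does not say which constraint the tests filter on) that the test needs a set of unique row indices. The helper names `gen_indices_by_filtering` and `gen_indices_directly` are illustrative only and use the standard `random` module rather than Hypothesis.

```python
import random


def gen_indices_by_filtering(num_rows, n, rng, max_tries=1000):
    """Draw n indices with replacement and reject draws containing duplicates.

    This mimics an assume()-style filter: every rejected draw is a wasted
    example. Returns (indices, number_of_rejected_draws).
    """
    rejected = 0
    for _ in range(max_tries):
        indices = [rng.randrange(num_rows) for _ in range(n)]
        if len(set(indices)) == n:
            return indices, rejected
        rejected += 1
    raise RuntimeError("too many rejected draws")


def gen_indices_directly(num_rows, n, rng):
    """Draw n distinct indices in one shot, so no draw is ever rejected."""
    return rng.sample(range(num_rows), n)


if __name__ == "__main__":
    rng = random.Random(0)
    _, rejected = gen_indices_by_filtering(num_rows=20, n=10, rng=rng)
    direct = gen_indices_directly(num_rows=20, n=10, rng=rng)
    print("rejected draws when filtering:", rejected)
    print("direct draw:", sorted(direct))
```

With 10 indices out of 20 rows, only about 6.5% of with-replacement draws contain no duplicates, so the filtering version throws away most of its work. Constructing valid inputs directly avoids the rejections that trip the health check.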
Summary:
Implemented a new CUDA class for the SparseAdagrad operator. The param and moment inputs can now be float or float16.
The functions for mixed-precision add/mult/store are defined in a separate header file ("caffe2/core/float16_util.h") for reuse.
Reviewed By: azzolini
Differential Revision: D5880200
fbshipit-source-id: dca227f38629a03a9d771f42efe2c0b673075c4d
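The mixed-precision idea can be sketched in Python, assuming a simple scheme: param and moment are stored as float16, each value is loaded and updated in wider precision, and the result is rounded back to float16 only on store. This is not the CUDA code or the float16_util.h helpers; `to_fp16` and `sparse_adagrad_fp16` are hypothetical names, rounding uses the standard `struct` half-precision format `"e"`, and the textbook descent sign (subtracting `lr * g / ...`) is an assumption about the learning-rate convention.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value.

    struct raises OverflowError for magnitudes beyond the float16 range.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sparse_adagrad_fp16(param, moment, indices, grad, lr, epsilon=1e-5):
    """In-place SparseAdagrad on rows whose values are stored as float16.

    param, moment: lists of rows holding float16-representable values.
    indices: row ids touched by this sparse gradient (assumed unique).
    grad: one gradient row per index.
    """
    for row, g_row in zip(indices, grad):
        for j, g in enumerate(g_row):
            # Load and compute in Python float (double) precision.
            h = moment[row][j] + g * g
            p = param[row][j] - lr * g / (math.sqrt(h) + epsilon)
            # Mixed-precision store: round back to float16 only when writing.
            moment[row][j] = to_fp16(h)
            param[row][j] = to_fp16(p)
```

Keeping the arithmetic in wider precision and rounding once per store limits accumulated float16 rounding error while still halving the memory for param and moment.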
Summary: Implemented a version of SparseAdagrad that keeps only an average sum-of-squared-gradients term for each row of the parameter tensor, rather than a sum-of-squared-gradients term for each individual parameter.
Differential Revision: D5881918
fbshipit-source-id: bd96ccf25554b457baaaca9309fc8048adbb37f7
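A minimal sketch of the row-wise variant, assuming the per-row term accumulates the mean of the squared gradient entries of that row and is then shared by every element of the row. The function name is hypothetical, and the descent sign convention is an assumption rather than a copy of the Caffe2 operator.

```python
import math


def row_wise_sparse_adagrad(param, moment, indices, grad, lr, epsilon=1e-5):
    """In-place row-wise SparseAdagrad.

    param: list of rows (lists of floats).
    moment: one float per row, the running sum of per-row mean squared gradients.
    indices, grad: the sparse gradient, one gradient row per index.
    """
    for row, g_row in zip(indices, grad):
        # A single scalar per row: the mean of the squared gradient entries.
        moment[row] += sum(g * g for g in g_row) / len(g_row)
        # Every element in the row shares the same adaptive step size.
        scale = lr / (math.sqrt(moment[row]) + epsilon)
        for j, g in enumerate(g_row):
            param[row][j] -= scale * g
```

The moment state shrinks from one value per parameter to one value per row, which matters for large embedding tables; the trade-off is that all elements of a row get the same effective learning rate.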
Summary:
These GPU paths are probably even buggier than the CPU paths for sparse gradients with duplicate indices. Both paths apply multiple momentum updates in a single iteration, but only the GPU path is non-deterministic. Depending on how we decide to address the issues in the CPU path, pooyadavoodi has a good idea for matching dense behavior with the sparse GPU ops.
Closes https://github.com/caffe2/caffe2/pull/254
Reviewed By: bwasti
Differential Revision: D4871680
Pulled By: dzhulgakov
fbshipit-source-id: 220be57a0f699a22ea85ed4f7022d92d362d06b3
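To make the duplicate-index problem concrete, here is a sequential sketch using Adagrad as the example optimizer (the commit does not name the specific ops). Applying the update once per occurrence advances the optimizer state once per duplicate, while first summing the gradient values per unique index and then updating once reproduces the dense result. The coalescing approach is an assumption for illustration, not necessarily the fix the authors chose, and GPU race conditions are not modeled. All function names are hypothetical.

```python
import math


def adagrad_dense(param, moment, grad, lr, epsilon=1e-5):
    """Reference dense Adagrad step over every element."""
    for i, g in enumerate(grad):
        moment[i] += g * g
        param[i] -= lr * g / (math.sqrt(moment[i]) + epsilon)


def sparse_adagrad_naive(param, moment, indices, values, lr, epsilon=1e-5):
    """Each occurrence of a duplicate index triggers its own state update."""
    for i, g in zip(indices, values):
        moment[i] += g * g
        param[i] -= lr * g / (math.sqrt(moment[i]) + epsilon)


def sparse_adagrad_dedup(param, moment, indices, values, lr, epsilon=1e-5):
    """Sum gradient values per unique index first, then update each index once."""
    summed = {}
    for i, g in zip(indices, values):
        summed[i] = summed.get(i, 0.0) + g
    for i, g in summed.items():
        moment[i] += g * g
        param[i] -= lr * g / (math.sqrt(moment[i]) + epsilon)
```

For indices `[0, 0]` with values `[1.0, 1.0]`, the equivalent dense gradient is `[2.0, 0.0]`: the dense and coalesced versions both accumulate a moment of 4.0 for index 0, while the naive version accumulates only 2.0 and takes two steps instead of one.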