Summary:
Add a calibration module based on histogram binning:
Divide the prediction range (e.g., [0, 1]) into B bins. For each bin, keep two parameters: the number of positive examples and the total number of examples that fall into it. This effectively gives us a histogram of the model's predictions.
As a result, each bin carries an empirical estimate of the real CTR (num_pos / num_examples). When a pre-calibration prediction falls into a bin, we use that bin's estimate as the final calibrated prediction.
In this way, the predictions within each bin should be well calibrated as long as we have sufficient examples; that is, this module gives us a fine-grained calibrated model.
In theory, this calibration layer can fix any uncalibrated model or prediction given sufficient bins and examples. It therefore lets us apply any kind of training-weight allocation to our training data without worrying about calibration.
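A minimal sketch of the idea (a hypothetical C++ stand-in, not the actual dper3 module): accumulate per-bin positive/total counts, then map each raw prediction to its bin's empirical positive rate.

```cpp
// Histogram binning calibration sketch (assumed simplification of the module
// described above): B bins over [0, 1], per-bin positive and total counts,
// calibrated output = empirical positive rate of the bin the prediction falls in.
#include <cstddef>
#include <iostream>
#include <vector>

class HistogramBinningCalibration {
 public:
  explicit HistogramBinningCalibration(std::size_t num_bins)
      : num_positives_(num_bins, 0.0), num_examples_(num_bins, 0.0) {}

  // Accumulate one (prediction, label) pair into its bin.
  void Observe(double prediction, bool positive) {
    const std::size_t bin = BinIndex(prediction);
    num_examples_[bin] += 1.0;
    if (positive) {
      num_positives_[bin] += 1.0;
    }
  }

  // Calibrated prediction = empirical CTR of the prediction's bin;
  // fall back to the raw prediction when the bin has no data yet.
  double Calibrate(double prediction) const {
    const std::size_t bin = BinIndex(prediction);
    if (num_examples_[bin] == 0.0) {
      return prediction;
    }
    return num_positives_[bin] / num_examples_[bin];
  }

 private:
  std::size_t BinIndex(double prediction) const {
    // Predictions are assumed to lie in [0, 1]; clamp to be safe.
    if (prediction < 0.0) prediction = 0.0;
    if (prediction > 1.0) prediction = 1.0;
    std::size_t bin =
        static_cast<std::size_t>(prediction * num_examples_.size());
    return bin < num_examples_.size() ? bin : num_examples_.size() - 1;
  }

  std::vector<double> num_positives_;
  std::vector<double> num_examples_;
};

int main() {
  HistogramBinningCalibration cali(10);
  // An over-predicting model: raw score 0.8, true positive rate 0.5.
  for (int i = 0; i < 100; ++i) {
    cali.Observe(0.8, i % 2 == 0);
  }
  std::cout << cali.Calibrate(0.8) << std::endl;  // ~0.5 instead of 0.8
  return 0;
}
```

With enough examples per bin, the calibrated output tracks the observed CTR regardless of how the raw model was trained or weighted.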
Test Plan:
buck test dper3/dper3/modules/calibration/tests:calibration_test -- test_histogram_binning_calibration
buck test dper3/dper3_models/ads_ranking/tests:model_paradigm_e2e_tests -- test_sparse_nn_histogram_binning_calibration
All tests passed.
Example workflows:
f215431958
{F326445092}
f215445048
{F326445223}
Reviewed By: chenshouyuan
Differential Revision: D23356450
fbshipit-source-id: c691b66c51ef33908c17575ce12e5bee5fb325ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27001
This unconditional log line spams the logs enough that it's a drag on CPU and will eventually fill up the logs.
Test Plan: Rely on unit tests and automated testing for feedback.
Reviewed By: jspark1105
Differential Revision: D17638140
fbshipit-source-id: 4e8a44bda31327ba7e797f7579a9e3bf866eef7e
Summary:
These implicit fallthroughs lead to the following warning on g++ 7, because g++ cannot recognize the implicit `abort` call inside `LOG(FATAL)`. We suppress the warning by adding explicit `return`s (see the sketch after the warning output below).
/home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc: In function void caffe2::math::GemmEx(CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int, int, int, T, const T*, int, const T*, int, T, T*, int, Context*) [with T = float; Context = caffe2::CPUContext; Engine = caffe2::DefaultEngine]:
/home/hong/wsrc/pytorch/c10/util/logging_is_not_google_glog.h:98:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
   ::c10::MessageLogger((char*)__FILE__, __LINE__, n).stream()
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:179:11: note: in expansion of macro LOG
     LOG(FATAL) << "Unexpected CBLAS_TRANSPOSE for trans_B";
     ^
/home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:182:5: note: here
     case CblasTrans: {
     ^~~~
In file included from /home/hong/wsrc/pytorch/c10/util/Logging.h:28:0,
                 from /home/hong/wsrc/pytorch/caffe2/core/logging.h:2,
                 from /home/hong/wsrc/pytorch/caffe2/core/types.h:9,
                 from /home/hong/wsrc/pytorch/caffe2/utils/math.h:17,
                 from /home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:14:
/home/hong/wsrc/pytorch/c10/util/logging_is_not_google_glog.h:98:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
   ::c10::MessageLogger((char*)__FILE__, __LINE__, n).stream()
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:202:11: note: in expansion of macro LOG
     LOG(FATAL) << "Unexpected CBLAS_TRANSPOSE for trans_B";
     ^
/home/hong/wsrc/pytorch/caffe2/utils/math_cpu.cc:205:5: note: here
     default:
     ^~~~~~~
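For reference, a minimal sketch of the fix pattern, using a hypothetical stand-in for `LOG(FATAL)` (the real change is in caffe2/utils/math_cpu.cc): the logging call aborts at runtime, but g++ 7 cannot prove that, so each affected `case` gets an explicit `return`.

```cpp
// Sketch of the -Wimplicit-fallthrough fix pattern under the assumptions above.
#include <cstdlib>
#include <iostream>

enum TransposeSketch { kNoTrans, kTrans, kConjTrans };

// Stand-in for LOG(FATAL): aborts, but is deliberately not marked [[noreturn]],
// mirroring how g++ 7 sees the ::c10::MessageLogger stream call.
void LogFatalSketch(const char* msg) {
  std::cerr << msg << std::endl;
  std::abort();
}

int LeadingDim(TransposeSketch trans_B, int K, int N) {
  switch (trans_B) {
    case kNoTrans:
      return N;
    case kConjTrans:
      LogFatalSketch("Unexpected CBLAS_TRANSPOSE for trans_B");
      return 0;  // Explicit return added; without it the compiler warns that
                 // control may fall through into the next case.
    case kTrans:
      return K;
    default:
      LogFatalSketch("Unexpected CBLAS_TRANSPOSE for trans_B");
      return 0;  // Same pattern for the default branch.
  }
}

int main() {
  std::cout << LeadingDim(kTrans, 3, 4) << std::endl;  // prints 3
  return 0;
}
```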
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24053
Differential Revision: D16732530
Pulled By: ezyang
fbshipit-source-id: 90373879f25b52efca5bf151c7ed58d6ad19d925
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16929
Separate CPU reduce functions from math
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D13999469
fbshipit-source-id: bd628b15a6e3c1f04cc62aefffb0110690e1c0d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16175
Separate Moments from math and optimize it
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D13742472
fbshipit-source-id: 90757d908d38c98ca69818855aaf68315e525992
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16135
Separate affine_channel from math and optimize it
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D13727606
fbshipit-source-id: 8980af4afadaf964a18a9da581106fe30896a7e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13949
This diff adds filler support for `SparseLengthsWeight*` ops. It does three things:
1. Add fillers for the `SparseLengthsWeight*` ops.
2. Add filling heuristics that account for the path `LengthsRangeFill` -> `Gather` -> `SparseLengthsWeightedSum`, where the length input is shared by `LengthsRangeFill` and `SparseLengthsWeightedSum`. We therefore need to carefully bound the values of that length input so that `Gather` does not index out of bounds on its weight input.
3. Fix and simplify the logic of `math::RandFixedSum`: we simply keep rejecting a generated value as long as it violates the invariants (see the sketch after this list).
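A minimal sketch of the rejection idea in item 3 (a hypothetical simplification, not the actual `math::RandFixedSum` signature): each value is re-drawn until the remaining sum is still reachable by the remaining slots within [min, max].

```cpp
// Rejection-sampling sketch of a fixed-sum generator under the stated assumptions.
// Precondition: n * min_v <= target_sum <= n * max_v.
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

std::vector<int64_t> RandFixedSumSketch(
    std::size_t n, int64_t min_v, int64_t max_v, int64_t target_sum,
    std::mt19937& gen) {
  std::uniform_int_distribution<int64_t> dist(min_v, max_v);
  std::vector<int64_t> out(n, 0);
  int64_t remaining_sum = target_sum;
  for (std::size_t i = 0; i < n; ++i) {
    const int64_t slots_left = static_cast<int64_t>(n - i - 1);
    int64_t v = dist(gen);
    // Keep rejecting the generated value while it makes the target unreachable
    // for the remaining slots.
    while (remaining_sum - v < slots_left * min_v ||
           remaining_sum - v > slots_left * max_v) {
      v = dist(gen);
    }
    out[i] = v;
    remaining_sum -= v;
  }
  return out;
}

int main() {
  std::mt19937 gen(0);
  auto v = RandFixedSumSketch(5, 1, 10, 23, gen);
  int64_t s = 0;
  for (auto x : v) { std::cout << x << " "; s += x; }
  std::cout << "| sum=" << s << std::endl;  // sum == 23
  return 0;
}
```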
Reviewed By: highker
Differential Revision: D13048216
fbshipit-source-id: bfe402e07e6421b28548047d18b298c148e0ec87
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12733
Conv in NHWC layout only works for 2D images. This has been a pain point when implementing quantized 3D convolution, because we need NHWC layout for the best performance (note that NHWC layout generally gives better performance on CPU, not just for quantized operators). For example, our quantized ops can measure quantization error operator by operator, but that requires running a shadow fp32 operator, which is not easy when no 3D conv in NHWC layout is available (currently we do the layout conversion on the fly for the shadow fp32 operator, which is error prone). Some Caffe2 frameworks like brew raise an error when we try to create a 3D conv op in NHWC layout, which was also a blocker for using aibench, since aibench uses brew.
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D10333829
fbshipit-source-id: 2d203ee1db833cd3f9d39353219e3894b46c4389
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13480
Revert D12845220: the MKL functions use multiple threads, and the single-threaded run is slower than the Eigen version.
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D12891751
fbshipit-source-id: 2a61727b269a304daeee2af6ff7fee7820cb5344
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11884
In CPU mode, the current convNd uses Im2ColNdNCHWImpl, a generic implementation that handles convolution for an arbitrary number of dimensions. In video modeling, we use convNd with 3-dimensional filters.
The problem with the current convNd is that Im2ColNdNCHWImpl is much slower than the Im2Col used by conv2d for filters with the same FLOPs. For example, a (1, 7, 7) 3D filter takes 5 times longer than a (7, 7) 2D filter at inference time.
This diff extends Im2Col to the 3D case (Im2Col3dNCHWImpl); this optimization makes 3D convolution 4-5 times faster at inference time on CPU for various video models (a sketch of the 3D im2col idea follows the benchmark reference below):
{F128300920}
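A minimal sketch of the 3D im2col idea (a hypothetical simplification of Im2Col3dNCHWImpl: single image, no padding or dilation, stride 1): each output column entry holds one element of a (kt, kh, kw) patch, so the 3D convolution reduces to a single GEMM instead of going through the generic N-dimensional path.

```cpp
// im2col sketch for a 3D (C, T, H, W) input under the stated assumptions.
#include <iostream>
#include <vector>

void Im2Col3dSketch(
    const std::vector<float>& data,  // C * T * H * W
    int C, int T, int H, int W,
    int kT, int kH, int kW,
    std::vector<float>& cols) {
  const int out_t = T - kT + 1;
  const int out_h = H - kH + 1;
  const int out_w = W - kW + 1;
  cols.assign(
      static_cast<std::size_t>(C) * kT * kH * kW * out_t * out_h * out_w, 0.0f);
  std::size_t col = 0;
  // One row per (c, kt, kh, kw), one column per output position (t, h, w).
  for (int c = 0; c < C; ++c) {
    for (int kt = 0; kt < kT; ++kt) {
      for (int kh = 0; kh < kH; ++kh) {
        for (int kw = 0; kw < kW; ++kw) {
          for (int t = 0; t < out_t; ++t) {
            for (int h = 0; h < out_h; ++h) {
              for (int w = 0; w < out_w; ++w) {
                const std::size_t src =
                    (((static_cast<std::size_t>(c) * T + t + kt) * H + h + kh) *
                     W) + w + kw;
                cols[col++] = data[src];
              }
            }
          }
        }
      }
    }
  }
}

int main() {
  const int C = 1, T = 3, H = 3, W = 3;
  std::vector<float> data(C * T * H * W);
  for (std::size_t i = 0; i < data.size(); ++i) data[i] = static_cast<float>(i);
  std::vector<float> cols;
  Im2Col3dSketch(data, C, T, H, W, 2, 2, 2, cols);
  std::cout << cols.size() << " column entries" << std::endl;  // 8 * 8 = 64
  return 0;
}
```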
i-am-not-moving-c2-to-c10
Reviewed By: BIT-silence
Differential Revision: D8245940
fbshipit-source-id: 75231d65c9dd56059dfe31701e26021fd1ff2a85
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12306
In a future diff, I'm going to introduce a non-placement constructor and destructor to TypeMeta.
To make that less ambiguous, this diff first renames the existing ones to PlacementXXX (a generic illustration of placement construction follows).
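A generic C++ illustration of what "placement" means in the new PlacementXXX names (not the TypeMeta code itself): construction and destruction happen in caller-provided storage, with no allocation or deallocation of that storage.

```cpp
// Placement construction/destruction sketch, independent of TypeMeta.
#include <iostream>
#include <memory>
#include <new>
#include <string>

int main() {
  // Caller-provided raw storage for one std::string.
  alignas(std::string) unsigned char storage[sizeof(std::string)];

  // "Placement construction": build the object in place; nothing is allocated
  // for the object itself.
  std::string* s = new (storage) std::string("placement");
  std::cout << *s << std::endl;

  // "Placement destruction": run the destructor in place; the storage itself
  // is not freed here.
  std::destroy_at(s);
  return 0;
}
```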
Reviewed By: dzhulgakov
Differential Revision: D10184117
fbshipit-source-id: 119120ebc718048bdc1d66e0cc4d6a7840e666a4
Summary:
CAFFE2_UNIQUE_LONG_TYPEMETA has been a tricky variable defined only from CMake. This is an experiment to remove it and see exactly which compilers need it set.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12311
Reviewed By: dzhulgakov
Differential Revision: D10187777
Pulled By: Yangqing
fbshipit-source-id: 03e4ede4eafc291e947e0449382bc557cb624b34
Summary:
TSIA. Right now we should use C10_EXPORT and C10_IMPORT for explicitly marking dllexport and dllimport, as a continuation of the C10 unification effort.
This is a codemod by mechanically doing the following change:
CAFFE2_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
AT_CORE_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12019
Reviewed By: ezyang, teng-li
Differential Revision: D10016276
Pulled By: Yangqing
fbshipit-source-id: a420d62c43d1110105fc88f9e9076e28a3203164
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10888
Add a CUDA version of SpatialBNOp and optimize SpatialBN on CPU.
Reviewed By: houseroad
Differential Revision: D9512435
fbshipit-source-id: 6f828c88d56d30dc9a2f98a297a161c35cc511b1
Summary:
This is an experimental build on top of what orionr and mingzhe09088 built.
Essentially, the idea is that we will need separate *_API versions for different shared libraries (a sketch of the export/import pattern they build on follows). If this theory is right, I'll try to clean up the design a bit and document it properly.
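A sketch of the export/import macro pattern such per-library *_API macros build on (simplified, hypothetical names, not the actual c10 definitions): symbols compiled into a shared library get dllexport/default visibility, and consumers of that library get dllimport, keyed on a per-library "building this library" define.

```cpp
// Per-library export/import macro sketch under the stated assumptions.
#include <iostream>

// For this single-file demo, pretend we are building the library itself.
#define BUILDING_SKETCHLIB 1

#if defined(_WIN32)
#define SKETCH_EXPORT __declspec(dllexport)
#define SKETCH_IMPORT __declspec(dllimport)
#else
#define SKETCH_EXPORT __attribute__((__visibility__("default")))
#define SKETCH_IMPORT
#endif

// Hypothetical library "sketchlib": export when building it, import otherwise.
#ifdef BUILDING_SKETCHLIB
#define SKETCHLIB_API SKETCH_EXPORT
#else
#define SKETCHLIB_API SKETCH_IMPORT
#endif

// A symbol meant to be visible across the shared-library boundary.
SKETCHLIB_API int sketch_answer();

int sketch_answer() { return 42; }

int main() {
  std::cout << sketch_answer() << std::endl;
  return 0;
}
```

Each shared library would define its own *_API macro on top of the shared export/import primitives, which is the "separate *_API versions" idea above.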
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11266
Reviewed By: orionr
Differential Revision: D9682942
Pulled By: Yangqing
fbshipit-source-id: c79653199e67a1500c9174f39f8b0357324763f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11060
Add synthetic data generation to filler.h (the exact distribution will be replaced later on).
Reviewed By: highker
Differential Revision: D9417594
fbshipit-source-id: 5d66dfbcb254a5961c36b7d3a081332c7372dac7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10439
Update Im2Col-related code in preparation for group conv in NHWC order.
Reviewed By: houseroad
Differential Revision: D9285344
fbshipit-source-id: 1377b0243acb880d2ad9cf73084529a787dcb97d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13
Pull Request resolved: https://github.com/pytorch/translate/pull/166
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125
Closes https://github.com/pytorch/pytorch/pull/9125
Use inheritance for polymorphism, and remove the template parameter.
This changes the templating at call sites; the core implementations will change later.
Previously, the Caffe2 Tensor class was fixed at compile time to bind to a particular device/context. With this change, we make the device a runtime property (stored inside the tensor) while preserving the same semantics. For example, one still has to specify a device type in order to create a Tensor; there are no uninitialized tensors. More specifically, the changes are (a simplified sketch follows the list):
1. We added an extra argument *DeviceType* to most of the Tensor constructors, e.g. Tensor(DeviceType type).
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in so that we can call the templated Copy function. Previously it could be a different context than the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter Blob::GetMutableTensor that verifies both that the Blob contains a Tensor and that it is of the correct type.
4. Tensor is no longer default-constructible (as we don't have unknown-device tensors), so some of the code handling STL containers had to change.
Note: Some changes are postponed just to keep this diff smaller. Please see `TODO`s.
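A simplified sketch of the API direction (hypothetical, heavily reduced types, not the real caffe2::Tensor): the device moves from a compile-time template parameter to a runtime property stored in the tensor, and there is no default constructor.

```cpp
// Device-as-runtime-property sketch under the stated assumptions.
#include <iostream>

enum class DeviceType { CPU, CUDA };

class Tensor {
 public:
  explicit Tensor(DeviceType type) : type_(type) {}  // device required at creation
  Tensor() = delete;                                  // no uninitialized tensors
  DeviceType GetDeviceType() const { return type_; }

 private:
  DeviceType type_;
};

int main() {
  // Before: Tensor<CPUContext> t;  (device fixed at compile time)
  // After:  the device is passed in and stored at runtime.
  Tensor t(DeviceType::CPU);
  std::cout << (t.GetDeviceType() == DeviceType::CPU ? "CPU" : "CUDA")
            << std::endl;
  return 0;
}
```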
Reviewed By: ezyang, houseroad
Differential Revision: D9024330
fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13
Pull Request resolved: https://github.com/pytorch/translate/pull/166
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125
Closes https://github.com/pytorch/pytorch/pull/9125
Use inheritance for polymorphism, and remove the template parameter.
This changes the templating at call sites; the core implementations will change later.
Previously, the Caffe2 Tensor class was fixed at compile time to bind to a particular device/context. With this change, we make the device a runtime property (stored inside the tensor) while preserving the same semantics. For example, one still has to specify a device type in order to create a Tensor; there are no uninitialized tensors. More specifically, the changes are:
1. We added an extra argument *DeviceType* to most of the Tensor constructors, e.g. Tensor(DeviceType type).
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in so that we can call the templated Copy function. Previously it could be a different context than the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter Blob::GetMutableTensor that verifies both that the Blob contains a Tensor and that it is of the correct type.
4. Tensor is no longer default-constructible (as we don't have unknown-device tensors), so some of the code handling STL containers had to change.
Note: Some changes are postponed just to keep this diff smaller. Please see `TODO`s.
Reviewed By: xw285cornell
Differential Revision: D8121878
fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9520
Add a random data filler to the predictor benchmark to support production nets.
Reviewed By: salexspb
Differential Revision: D8712757
fbshipit-source-id: 2c732b2ba71ab210f9222adf94d08442ca71dc03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9299
ONNX has ReduceL1 and ReduceL2 operators that facilitate this, so allow PyTorch to export them and allow Caffe2 to run them.
So far I have only implemented this on CPU (a sketch of the two reductions follows).
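A sketch of what the two reductions compute (plain C++, not the ONNX or Caffe2 kernels): ReduceL1 sums absolute values and ReduceL2 takes the square root of the sum of squares, shown here over a 1-D input.

```cpp
// L1/L2 reduction sketch under the stated assumptions (full reduction of a 1-D input).
#include <cmath>
#include <iostream>
#include <vector>

float ReduceL1(const std::vector<float>& x) {
  float sum = 0.0f;
  for (float v : x) sum += std::fabs(v);  // sum of absolute values
  return sum;
}

float ReduceL2(const std::vector<float>& x) {
  float sum_sq = 0.0f;
  for (float v : x) sum_sq += v * v;      // sum of squares
  return std::sqrt(sum_sq);
}

int main() {
  std::vector<float> x = {3.0f, -4.0f};
  std::cout << ReduceL1(x) << " " << ReduceL2(x) << std::endl;  // 7 5
  return 0;
}
```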
Reviewed By: pjh5
Differential Revision: D8757381
fbshipit-source-id: 68afc9e2f90042a70929b73ace05a499b5c670c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9350
Re-apply #9270
Breaking this out of #8338
This takes care of the Eigen failure we saw on Mac CUDA builds when BUILD_CAFFE2 and BUILD_ATEN were removed. The fix is to isolate Eigen from the headers included by .cu files and processed by nvcc. This was worked on with smessmer.
Reviewed By: mingzhe09088
Differential Revision: D8794431
fbshipit-source-id: de656334af46c697802073f8e8d9a6aeb9ca65a7