Commit Graph

18 Commits

Author SHA1 Message Date
Bugra Akyildiz
27c7158166 Remove __future__ imports for legacy Python 2 support (#45033)
Summary:
There is a tool called `2to3` whose `future` fixer you can target specifically to remove these imports; the `caffe2` directory has the most of these redundant imports:

```2to3 -f future -w caffe2```
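For illustration only, the `future` fixer simply deletes `from __future__ import ...` statements and leaves the rest of the module untouched; a hypothetical before/after on a made-up module:

```
# Before the fixer runs:
from __future__ import absolute_import, division, print_function, unicode_literals
from caffe2.python import core

# After `2to3 -f future -w`, only the __future__ line is gone:
from caffe2.python import core
```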

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033

Reviewed By: seemethere

Differential Revision: D23808648

Pulled By: bugra

fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
2020-09-23 17:57:02 -07:00
Eileen Pan
f07816003a [2/n][Compute Meta] support analysis for null flag features
Summary:
## TLDR
Support using a NaN default value for missing dense features in RawInputProcessor for DPER2, in preparation for subsequent support for null flag features in compute meta. For train_eval this is already supported in DPER3, and we do not plan to support it in DPER2 train_eval.
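As a rough illustration of the idea (a toy numpy sketch, not the DPER implementation), filling missing dense values with NaN instead of zero is what lets a later stage derive an explicit null flag:

```
import numpy as np

# Toy dense feature column where two rows are missing the feature.
raw = [0.5, None, 1.8, None]

# Old behavior: missing values become 0.0 and are indistinguishable from true zeros.
zero_filled = np.array([v if v is not None else 0.0 for v in raw])

# New behavior: missing values become NaN, so an explicit null flag can be derived.
nan_filled = np.array([v if v is not None else np.nan for v in raw])
null_flag = np.isnan(nan_filled).astype(np.float32)  # [0., 1., 0., 1.]
```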

Differential Revision: D22439142

fbshipit-source-id: 99ae9755bd41a5d5f43bf5a9a2819d64f3883005
2020-07-20 13:13:45 -07:00
Will Feng (FAIAR)
078669f6c3 Back out "[2/n][Compute Meta] support analysis for null flag features"
Summary:
Original commit changeset: 46c59d849fa8

The original commit is breaking the DPER3 release pipeline with the following failures:
https://www.internalfb.com/intern/chronos/jobinstance?jobinstanceid=9007207344413239&smc=chronos_gp_admin_client&offset=0
```
Child workflow f 202599639  failed with error: c10::Error: [enforce fail at operator.cc:76] blob != nullptr. op Save: Encountered a non-existing input blob: feature_preproc/feature_sparse_to_dense/default_float_value
```
https://www.internalfb.com/intern/chronos/jobinstance?jobinstanceid=9007207344855973&smc=chronos_gp_admin_client&offset=0
```
Child workflow f 202629391  failed with error: c10::Error: [enforce fail at operator.cc:76] blob != nullptr. op Save: Encountered a non-existing input blob: tum_preproc/inductive/feature_sparse_to_dense/default_float_value
```

Related UBN tasks: T69529846, T68986110

Test Plan: Build a DPER3 package on top of this commit, and check that the DPER3 release test `model_deliverability_test` passes.

Differential Revision: D22396317

fbshipit-source-id: 92d5b30cc146c005d6159a8d5bfe8973e2c546dd
2020-07-06 16:29:03 -07:00
Eileen Pan
db39542509 [2/n][Compute Meta] support analysis for null flag features
Summary:
## TLDR
Support using a NaN default value for missing dense features in RawInputProcessor for DPER2, in preparation for subsequent support for null flag features in compute meta. For train_eval this is already supported in DPER3, and we do not plan to support it in DPER2 train_eval.
## Overview
This is part of an intern project to support adding dense flags for missing feature values instead of replacing them with zero.
## Project plan:
https://docs.google.com/document/d/1OsPUTjpJycwxWLCue3Tnb1mx0uDC_2KKWvC1Rwpo2NI/edit?usp=sharing
## Code paths:
See https://fb.quip.com/eFXUA0tbDmNw for the call stack for all affected code paths.

Test Plan:
## fblearner flow test

1. `flow-cli clone f197867430 --run-as-secure-group ads_personalization_systems --force-build` to build an ephemeral package and start an fblearner flow run (this run may fail)
2. Clone the new run and change the secure_group to `XXXX` and the entitlement to `default` in the UI
3. Add the `explicit_null_min_coverage` flag
4. Optionally reduce `max_examples`, since we only test pass/fail rather than quality.
5. Submit the run to test the change

Example:
f198538878

## compare output coverages to daiquery runs

1. Randomly select null flag features from the compute meta workflow output
2. Look up the feature id in the feature metadata using the feature name
3. Check against a daiquery sample of coverage to see whether the coverage falls within guidelines.
https://www.internalfb.com/intern/daiquery/workspace/275342740223489/192619942076136/

## Sampled features:
GFF_C66_ADS_USER_SUM_84_PAGE_TYPE_RATIO_EVENT_LIKE_IMPRESSION: 15694257
- original feature compute meta coverage: 0.999992
- daiquery feature coverage (10k rows): 0.69588
- null flag compute meta coverage: 0.293409
GFF_R1303_ADS_USER_SUM_7_PAGE_TYPE_COUNTER_CONVERSION: 16051183
- original feature compute meta coverage: 0.949868
- daiquery feature coverage: 0.82241
- null flag compute meta coverage: 0.151687
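For context, coverage here is presumably the fraction of rows in which a feature is actually present; a minimal sketch of how such a number could be computed from a NaN-encoded dense column (toy data, not the daiquery pipeline):

```
import numpy as np

col = np.array([0.3, np.nan, 1.2, np.nan, 0.0])
coverage = np.mean(~np.isnan(col))  # 0.6: the feature is present in 3 of 5 rows
```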

## Unit tests:

`buck test fblearner/flow/projects/dper/tests/workflows:ads_test`

https://www.internalfb.com/intern/testinfra/testconsole/testrun/6192449504303863/

Differential Revision: D22026450

fbshipit-source-id: 46c59d849fa89253f14dc2b035c4c677cd6e3a4c
2020-07-02 12:44:41 -07:00
Eileen Pan
4102fbdf08 [1/n] Allow dense NaN value in dper raw input processor output
Summary:
## TLDR
Support using a NaN default value for missing dense features in RawInputProcessor for *DPER2*, in preparation for subsequent support for null flag features in *compute meta*. For train_eval this is already supported in DPER3, and we do not plan to support it in DPER2 train_eval.
## Overview
This is part of an intern project to support adding dense flags for missing feature values instead of replacing them with zero.

Project plan:
https://docs.google.com/document/d/1OsPUTjpJycwxWLCue3Tnb1mx0uDC_2KKWvC1Rwpo2NI/edit?usp=sharing

## Code paths:
See https://fb.quip.com/eFXUA0tbDmNw for the call stack for all affected code paths.

Test Plan:
# A. DPER3 blob value inspection
## 1. Build a local bento kernel in the fbcode folder
`buck build mode/dev-nosan //bento/kernels:bento_kernel_ads_ranking`

## 2. Use kernel `ads_ranking (local)` to print dense feature blob values
n280239

## 2.1 Try `default_dense_value = "0.0"` (default)
```
preproc_6/feature_preproc_6/dper_feature_processor_7/raw_input_proc_7/float_feature_sparse_to_dense_7/float_features [[0.       ]
 [0.       ]
 [0.       ]
 [0.       ]
 [0.       ]
 [0.       ]
 [0.       ]
 [1.       ]
 [1.7857143]
 [1.7777778]
 [1.       ]
 [0.       ]
 [0.5625   ]
 [0.       ]
 [0.       ]
 [0.8      ]
 [0.       ]
 [1.       ]
 [0.56     ]
 [0.       ]]
```
## 2.2 Try `default_dense_value = "123"`
```
preproc_2/feature_preproc_2/dper_feature_processor_3/raw_input_proc_3/float_feature_sparse_to_dense_3/float_features [[123.       ]
 [123.       ]
 [123.       ]
 [123.       ]
 [123.       ]
 [123.       ]
 [123.       ]
 [  1.       ]
 [  1.7857143]
 [  1.7777778]
 [  1.       ]
 [123.       ]
 [  0.5625   ]
 [123.       ]
 [123.       ]
 [  0.8      ]
 [123.       ]
 [  1.       ]
 [  0.56     ]
 [123.       ]]
```
## 2.3 Try `default_dense_value = float("nan")`
```
RuntimeError: [enforce fail at enforce_finite_op.h:40] std::isfinite(input_data[i]). Index 0 is not finite (e.g., NaN, Inf): -nan (Error from operator:
input: "unary_4/logistic_regression_loss_4/average_loss_4/average_loss" name: "" type: "EnforceFinite" device_option { random_seed: 54 })
```
This failure is expected due to the NaN input.
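The failure is analogous to a plain finiteness assertion on the loss; a minimal sketch of the kind of check that trips (not the caffe2 operator itself):

```
import numpy as np

def enforce_finite(x):
    # Same idea as the EnforceFinite operator: every element must be finite.
    if not np.isfinite(x).all():
        raise RuntimeError("input is not finite (e.g., NaN, Inf)")

enforce_finite(np.array([np.nan]))  # raises, just like the failing run above
```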

# B. Unit test
`buck test fblearner/flow/projects/dper/tests/preprocs:raw_feature_extractor_test`

https://www.internalfb.com/intern/testinfra/testconsole/testrun/5348024586274923/

{F241336814}

Differential Revision: D21961595

fbshipit-source-id: 3dcb153b3c7f42f391584f5e7f52f3d9c76de31f
2020-06-26 16:54:14 -07:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
Kevin Wilfong
88b1f6619e Return list of AccessedFeatures from get_accessed_features (#23983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23983

While testing I realized that model layers can extract different types of features from the same column.  For example, MultifeedFeaturesTransform uses float and ID list features from the "features" column.

get_accessed_features returns a map from column to AccessedFeatures, and AccessedFeatures only has the feature IDs for one feature type. This is incompatible with having multiple types of features per column; one type ends up overwriting another in the map.

To fix this, I've modified get_accessed_features to return a map from column to a list of AccessedFeatures objects.
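A hypothetical sketch of the interface change (type and function names taken from the summary; the fields and representations are assumed):

```
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class AccessedFeatures:
    feature_type: str     # e.g. "float" or "id_list" (assumed representation)
    feature_ids: Set[int]

# Before: Dict[str, AccessedFeatures] -- a second feature type for the same
# column overwrote the first.
# After: a list per column, so float and ID list features can coexist.
accessed: Dict[str, List[AccessedFeatures]] = {
    "features": [
        AccessedFeatures("float", {101, 102}),
        AccessedFeatures("id_list", {205}),
    ],
}
```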

Reviewed By: itomatik

Differential Revision: D16693845

fbshipit-source-id: 2099aac8dc3920dd61de6b6ad5cf343c864803bc
2019-08-14 10:50:27 -07:00
Kevin Wilfong
3ca7c0ffdb Add get_accessed_features function to ModelLayer class (#23036)
Summary:
We need a way to get a complete list of the features that are used in training a model. One way to do this is to make it possible to get the list of features used in each model layer; then, once the model is complete, we can go through the layers and aggregate the features.

I've introduced a function to expose that information here, get_accessed_features, and implemented it in the FeatureSparseToDense layer to start with.

I've tried to include the minimum amount of information needed to make this useful, while making it easy to integrate into the variety of model layers. This is, for example, why AccessedFeatures does not contain feature_names, which are not always present in a model layer. I debated whether or not to include feature_type, but it is useful enough, and easy enough to determine in a model layer, that it's worth including.
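For illustration, aggregating the per-layer results once the model is complete might look like the following (a hypothetical helper, not the actual DPER code):

```
from collections import defaultdict

def aggregate_accessed_features(layers):
    # Walk every layer of the finished model and merge the per-layer
    # column -> accessed-features maps into one complete view.
    merged = defaultdict(list)
    for layer in layers:
        for column, accessed in layer.get_accessed_features().items():
            merged[column].append(accessed)
    return dict(merged)
```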
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23036

Test Plan:
Added a unit test to verify the behavior of get_accessed_features in FeatureSparseToDense.

aml_dper2-fblearner-flow-integration-tests failed due to a known issue D16355865
aml_dper3-fblearner-flow-integration-tests failed due to a known issue T47197113

I verified that none of the integration tests failed due to issues other than those known ones.

DPER2 canaries: https://fburl.com/fblearner/1217voga

Reviewed By: volkhin

Differential Revision: D16365380

Pulled By: kevinwilfong

fbshipit-source-id: 2dbb4d832628180336533f29f7d917cbad171950
2019-07-22 15:04:28 -07:00
Junjie Bai
52135e9b12 Revert D13551909: [fbcode] logdevice for generic feature type
Differential Revision:
D13551909

Original commit changeset: 807830c50bee

fbshipit-source-id: 48cacf4ec1765253a9be9d78f4b28cc48330be59
2019-01-25 00:33:06 -08:00
Qin Huang
11a2b3799b logdevice for generic feature type (#16191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16191

Logdevice-related modifications for the generic feature type.

We directly convert the generic feature structures to JSON strings, which correspond to the column input in offline and DPER.
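A minimal sketch of that conversion, with a made-up feature record (the real schema is internal):

```
import json

# Hypothetical generic feature structure.
record = {"feature_id": 42, "type": 1, "value": [0.5, 0.8]}
column_input = json.dumps(record)  # the JSON string used as the column input
```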

Reviewed By: itomatik

Differential Revision: D13551909

fbshipit-source-id: 807830c50bee569de202530bc3700374757793a2
2019-01-24 23:33:19 -08:00
Qin Huang
ab293924bb support generic feature in DPER2 (#10197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10197

Support generic feature in DPER2

For now, since we only have generic type 1, we directly add the parsed feature record to the embedding feature.

New feature types with specific structure will require corresponding code changes.

Reviewed By: itomatik

Differential Revision: D8788177

fbshipit-source-id: 9aaa6f35ece382acb4072ec5e57061bb0727f184
2018-08-04 15:25:13 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* The LICENSE file contains the details, so the headers are being removed from individual source files.
2018-03-27 13:10:18 -07:00
Artem Volkhin
5b10ad255b Use EMBEDDING feature type instead of FLOAT_TENSOR
Summary: Create a special type for embeddings.

Differential Revision: D5997808

fbshipit-source-id: 9a5ad8ecc019d10536705d3b25f2436ca8a56454
2017-10-11 13:50:03 -07:00
Artem Volkhin
fb8a7679cc preprocs for embeddings
Summary: embeddings

Differential Revision: D5888420

fbshipit-source-id: b293df6444cba49e2feab6ccf8b8346019e5b421
2017-10-04 22:18:21 -07:00
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Jiyan Yang
a8695178aa Adding parameter sharing API to Dper2
Summary:
To achieve this, I modified the blob naming scheme defined in a layer.
Before, it was scope/fc_w and scope/fc_w_auto_0 (if there is another fc
within the same scope).
Now I have changed it to scope/fc/w and scope/fc_auto_0/w.
That is, we rely on the uniqueness of the scoped layer name to define
names for blobs.

I also overrode the create_param method in LayerModelHelper to let it
use the resolved name for blobs given the parameter-sharing context.

There are some details, such as making the initializer more structured,
that I still need to finalize.
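A rough sketch of the new naming scheme (a hypothetical helper; the real logic lives in create_param):

```
def blob_name(scope, layer_name, param, dedup_index=None):
    # Uniqueness comes from the scoped layer name, so a second fc in the same
    # scope yields "scope/fc_auto_0/w" rather than the old "scope/fc_w_auto_0".
    layer = layer_name if dedup_index is None else "%s_auto_%d" % (layer_name, dedup_index)
    return "%s/%s/%s" % (scope, layer, param)

assert blob_name("scope", "fc", "w") == "scope/fc/w"
assert blob_name("scope", "fc", "w", dedup_index=0) == "scope/fc_auto_0/w"
```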

Reviewed By: kennyhorror

Differential Revision: D5435132

fbshipit-source-id: a0525f5ea0977e255dd5ea765b38913f5951d455
2017-08-03 00:33:18 -07:00
Jiyan Yang
c822e89956 Rename SparseToDense layer
Summary:
The SparseToDense layer essentially calls the SparseToDenseMask op,
which makes it impossible to call the functional layer with the true SparseToDense op.
This diff renames the layer.

Please let me know if I missed anything or if you have a better name suggestion.
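For context, the mask variant densifies only a fixed, ordered list of feature IDs and fills a default for the missing ones; a toy sketch of those semantics (plain Python, not the caffe2 op):

```
import numpy as np

def sparse_to_dense_mask(ids, values, mask, default_value=0.0):
    # "mask" is the ordered list of feature IDs to densify; IDs absent from
    # the sparse input are filled with the default value.
    lookup = dict(zip(ids, values))
    return np.array([lookup.get(i, default_value) for i in mask])

sparse_to_dense_mask([7, 3], [0.5, 1.25], mask=[3, 5, 7])  # -> [1.25, 0.0, 0.5]
```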

Differential Revision: D5169353

fbshipit-source-id: 724d3c6dba81448a6db054f044176ffc7f708bdb
2017-06-09 12:48:27 -07:00