Inference for XGBoost models is implemented using the `NaiveAdditiveDecisionTree` implementation. As it is a `DenseLtrRanker`, it fills in missing values with 0. If the data that the XGBoost model was trained on contained missing values, then the scores produced in training may not match those in production, unless we similarly fill in missing values with 0.

This is related to #135, #353, and has been partly implemented in #452 (which was ultimately merged via #480). With this change we now visit the designated missing node when a score is missing, but we won't hit this branch as missing scores are filled in with `0` at inference time.

This issue proposes that we alter the implementation of `NaiveAdditiveDecisionTree` to not fill in missing values, given that the implementation now correctly follows the model specification when missing values are encountered.

In the meantime we have found that we can achieve parity in scoring between training and inference by filling in missing values with 0 in the training data.
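
To illustrate the mismatch described above, here is a minimal sketch in Python. It does not reproduce the plugin's code: the single-split tree, the threshold, the leaf values, and the default direction for missing values are all hypothetical and chosen only to show how filling a missing value with 0 can route it to a different leaf than the model's designated missing branch.

```python
import math

# Hypothetical single-split tree; every name and number here is illustrative.
THRESHOLD = 0.5
LEFT_LEAF = -1.0           # returned when feature < THRESHOLD
RIGHT_LEAF = 2.0           # returned when feature >= THRESHOLD
MISSING_GOES_LEFT = False  # assumed learned default direction for missing values


def score(feature):
    """Score one feature value, sending NaN along the default direction."""
    if math.isnan(feature):
        return LEFT_LEAF if MISSING_GOES_LEFT else RIGHT_LEAF
    return LEFT_LEAF if feature < THRESHOLD else RIGHT_LEAF


def fill_missing(feature, fill_value=0.0):
    """Replace a missing (NaN) value with fill_value, as a dense ranker would."""
    return fill_value if math.isnan(feature) else feature


missing = float("nan")

# Following the model specification: the missing branch leads to the right leaf.
score_following_spec = score(missing)

# Filling with 0 first: 0 < 0.5, so the value lands in the left leaf instead.
score_after_fill = score(fill_missing(missing))
```

In this sketch the two scores differ, which is the training versus production discrepancy the issue describes. Applying the same `fill_missing` step to the training data before fitting would make both sides see 0, which corresponds to the workaround mentioned above.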