This repository has been archived by the owner on Jul 21, 2024. It is now read-only.
Hi @ColtAllen,
First of all, thank you for the work you are doing on this project.
Is there any news on the implementation of continuous-time CLV calculation? Did you find a formulation for the BG/NBD model?
Thanks!
Alfredo
Labels: bug, enhancement
Unrealistically high CLV estimations have been a common complaint with the legacy `lifetimes` CLV formulations. I believe these discrepancies are due to calculations being made in discrete rather than continuous time.

Follow this link for a primer on comparing summation to integration:
https://math.stackexchange.com/questions/2089929/comparing-discrete-sums-and-integrals
Assume in the graphic below that the red line is our customer value function, and integrating over time will give us our total CLV for the customer:
The bars represent discrete value measurements at regular time intervals. Summing up the area of the bars is essentially what the existing `customer_lifetime_value` method is doing. However, note the corners of the bars protruding beyond the red line: this will inflate total CLV estimations, and the wider the bar, the greater the inflation. This width is fixed to monthly intervals in the current `customer_lifetime_value` implementation; the `freq` parameter only reflects the time intervals in which data was aggregated for model training, which is daily for most use cases. In theory, training models on weekly or monthly data would improve the accuracy of CLV estimates.

Fortunately, continuous-time CLV expressions exist. In addition to accuracy, they also have the advantage of summing over the total lifetime of the customer rather than a user-specified time period. However, implementation is model-specific.
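To make the bar-width inflation described above concrete, here is a minimal sketch. It does not use `btyd` or `lifetimes`; the exponentially decaying `value_rate` curve, its `base` and `decay` parameters, and the left-edge bar heights are all assumptions chosen for illustration, not the library's actual calculation.

```python
import math


def value_rate(t, base=100.0, decay=0.1):
    """Hypothetical customer value per month at time t (in months)."""
    return base * math.exp(-decay * t)


def exact_clv(horizon, base=100.0, decay=0.1):
    """Closed-form integral of value_rate over [0, horizon]."""
    return base / decay * (1.0 - math.exp(-decay * horizon))


def bar_sum_clv(horizon, width, base=100.0, decay=0.1):
    """Total area of bars of the given width (in months).

    Each bar is as tall as the curve at its left edge (an assumption),
    so on a decreasing curve every bar overshoots the area beneath it.
    """
    n_bars = round(horizon / width)
    return sum(value_rate(k * width, base, decay) * width for k in range(n_bars))


if __name__ == "__main__":
    horizon = 12.0  # months
    exact = exact_clv(horizon)
    for label, width in [("daily", 1 / 30), ("weekly", 0.25), ("monthly", 1.0)]:
        approx = bar_sum_clv(horizon, width)
        inflation = 100.0 * (approx / exact - 1.0)
        print(f"{label:8s} sum={approx:8.2f}  exact={exact:8.2f}  inflation={inflation:5.2f}%")
```

For this particular curve the overshoot ratio works out to `decay * width / (1 - exp(-decay * width))`, so it grows with the bar width and shrinks toward zero as the bars narrow. The size of the effect for a real customer depends on the shape of the fitted value function.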
An expression for the Pareto-NBD model is provided as equation (2) on page 8 of this paper:
http://brucehardie.com/papers/rfm_clv_2005-02-16.pdf
I've also found implementations for the Beta-Geometric/Beta-Binomial model and a few other models that haven't been added yet to `btyd`, but none for the BG/NBD model. I'll try reaching out to Fader himself on LinkedIn for assistance on this.