Skip to content

Latest commit

 

History

History
52 lines (32 loc) · 2.83 KB

README.md

File metadata and controls

52 lines (32 loc) · 2.83 KB

Predicting Customer Lifetime Value (CLV)

Installation

File Descriptions

Because the dataset is large and publicly available, I did not upload it here.

The analysis can be found as Jupyter Notebook here:

Project Description

In this project, I analyzed customer behavior for online retail store that sells unique all-occasion gift-ware in the UK.

The dataset consists of 1,067,371 transactions and has the following variables:

Variable Description
InvoiceNo Invoice number. Nominal. A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'c', it indicates a cancellation.
StockCode Product (item) code. Nominal. A 5-digit integral number uniquely assigned to each distinct product.
Description Product (item) name. Nominal.
Quantity The quantities of each product (item) per transaction. Numeric.
InvoiceDate Invice date and time. Numeric. The day and time when a transaction was generated.
UnitPrice Unit price. Numeric. Product price per unit in sterling.
CustomerID Customer number. Nominal. A 5-digit integral number uniquely assigned to each customer.
Country Country name. Nominal. The name of the country where a customer resides.

I calculated three types of revenue-based CLV, assuming Average Lifespan for Basic and Granular CLV being 36 months:

Basic CLV = Average Revenue per Month * Average Lifespan

Granular CLV = (Average Revenue per Transaction * Average Frequency per Month) * Average Lifespan

Traditional CLV = Average Revenue * (Retention Rate / Churn Rate)

Results

Basic CLV gave unrealistically high CLV - 21725.62 USD per customer. Granular CLV is much lower - with only 1865.33 USD per customer. Still, both Basic and Traditional CLV relied on an arbitrary value of lifespan per customer, which we assumed here to be 3 years.

Traditional CLV, however, gave a more realistic number - only 141.69 USD per customer and was based on the real retention to churn ratio as a proxy for the customer lifespan.

Still, the traditional CLV method assumed that the churn is final, i.e. customers that churn do not come back later. Hence, it might underreport actual CLV, especially with low retention rates as in our case (19%).

Acknowledgement

This project is part of "Machine Learning for Marketing" course on Data Camp taught by Karolis Urbonas, Global Head of Machine Learning and Science at Amazon Web Services (AWS).