UNIVERSITY OF ECONOMICS AND LAW
FACULTY OF INFORMATION SYSTEMS
1 Ho Trung Thanh, Assoc. Prof., PhD
2 Le Thi Kim Hien, PhD
3 Nguyen Phat Dat
Acknowledgements
First of all, we would like to express our profound gratitude to the University of Economics and Law for integrating the Interdisciplinary Research Methods course into the Faculty of Information Systems' program. We particularly want to convey our appreciation to Associate Professor Dr. Ho Trung Thanh and Deputy Dean Dr. Le Thi Kim Hien for their invaluable guidance and unwavering support, which were instrumental in the success of our research.
Our heartfelt thanks also go to the authors and author groups who have made significant contributions through research works, articles, theses, models, and the sharing of knowledge and methods across the fields relevant to this study. These contributions have significantly enhanced the clarity and comprehensiveness of our research.
Despite our earnest efforts during the research process, we acknowledge that some mistakes may be unavoidable. We value and welcome all types of feedback as valuable contributions to improving this work.
Commitment
The research has been carried out collectively by all members of Group 2 under the guidance of two lecturers, Ho Trung Thanh and Le Thi Kim Hien. Additionally, the paper includes references from various articles on related subjects. Should there be any evidence of academic misconduct in this research paper, our group is committed to bearing full responsibility for the consequences, at any level of punishment.
Ho Chi Minh City, 2023
Group 2
TABLE OF CONTENTS
Members of Group
List of Tables
List of Figures
Figure 1.1
Figure 1.2
GANTT CHART
Build ideas and Projects: 03/11/23 to 05/11/23
04/11/23 to 07/11/23
Data Preparation: 0/11/23 to 10/11/23
CHAPTER 4: EXPERIMENTAL RESULTS
Experimental process and results: 11/11/23 to 18/11/23
Discussion: 18/11/23 to 20/11/23
References (APA), Appendix: 20/11/23 to 23/11/23
ABSTRACT
Targeting the right customers has always been a key strategy for increasing profit, and the Adventure Works retail company is no exception. This research was conducted to ensure that the company's differentiated marketing strategies reach the appropriate customer segments. A <year> dataset of <number of records> records with <number of data variables> characteristics was collected through <data source>. This study proposes a customer loyalty segmentation in a retail context, in which clustering is performed using the Length-Recency-Frequency-Monetary (LRFM) model integrated with the k-means method. In the end, <number of categories> categories were found: <types of loyalty>. The final clustering yielded a Silhouette score of <Silhouette's evaluation score>. Based on the results of this segmentation, Adventure Works can strategically deliver tailored marketing to its clients.
Keywords: LRFM model, K-means clustering, Elbow method, Silhouette score, customer segmentation, customer loyalty, marketing, retail industry
As of January 20, 2014, a search for "customer analytics" yielded over 5 million results, encompassing sponsored links from major players such as IBM, Accenture, and Adobe, as well as service providers like SAS, SAP, and Deloitte. Notably, a global survey in 2018 reported that 84% of leading companies in the United States and worldwide had initiated big data analytics efforts to improve decision-making accuracy (Statista, 2018). Big data analytics is influential in refining business operations such as supply chain management (Gunasekaran et al., 2017) and customer relationship management (Nam, Lee, & Lee, 2019; Phillips-Wren & Hoskisson, 2015; Zerbino, Aloini, Dulmin, & Mininno, 2018). For Adventure Works, applying the LRFM methodology to segment customers by loyalty status is therefore a critical element in developing and sustaining the business: it enables the company to study and understand customer loyalty thoroughly and, armed with those insights, to formulate specific strategies that enhance market competitiveness, attract potential customers, and foster loyalty among existing clients.
Objectives:
The study aims to provide an efficient customer segmentation model based on loyalty status using the LRFM (Length, Recency, Frequency, Monetary) model and K-Means algorithm to help Adventure Works:
- Identify customers’ actual shopping behavior based on real data
- Comprehend customer diversity and capture the typical characteristics of each segment
- Enable businesses to make more reasonable business decisions and develop more effective marketing and advertising campaigns
Objects and scopes
Objects:
We investigate and analyze the purchasing behaviors and habits of Adventure Works' customers, derived from the retailer dataset: specifically, how long they have been purchasing from Adventure Works, when they last bought something from AW, how often they order from AW, and how much they have spent on AW's products. In other words, we examine each customer's Length, Recency, Frequency, and Monetary values to derive actionable insights.
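To make the four measures concrete, the Python sketch below derives L, R, F, and M for each customer from a list of order records. It assumes the definitions commonly used in LRFM studies (L = days between a customer's first and last purchase, R = days from the last purchase to an analysis date, F = number of orders, M = total spend); the `Order` record, its field names, and `compute_lrfm` are hypothetical and do not reflect the actual Adventure Works schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Order:
    """One order; field names are hypothetical, not the Adventure Works schema."""
    customer_id: str
    order_date: date
    amount: float


def compute_lrfm(orders, analysis_date):
    """Return {customer_id: (L, R, F, M)} computed from a list of orders.

    L: days between the customer's first and last purchase
    R: days between the last purchase and analysis_date
    F: number of orders
    M: total amount spent
    """
    first, last, freq, money = {}, {}, {}, {}
    for o in orders:
        cid = o.customer_id
        first[cid] = min(first.get(cid, o.order_date), o.order_date)
        last[cid] = max(last.get(cid, o.order_date), o.order_date)
        freq[cid] = freq.get(cid, 0) + 1
        money[cid] = money.get(cid, 0.0) + o.amount
    return {
        cid: (
            (last[cid] - first[cid]).days,
            (analysis_date - last[cid]).days,
            freq[cid],
            round(money[cid], 2),
        )
        for cid in first
    }


# Hypothetical orders for two customers
orders = [
    Order("C1", date(2019, 1, 5), 120.00),
    Order("C1", date(2020, 3, 1), 80.50),
    Order("C2", date(2020, 6, 1), 2500.00),
]
lrfm = compute_lrfm(orders, analysis_date=date(2020, 6, 15))
print(lrfm)
```

With the analysis date set to June 15, 2020 (the end of the study's time scope), customer C1 gets (L, R, F, M) = (421, 106, 2, 200.5) and the one-time buyer C2 gets (0, 14, 1, 2500.0), so a length of zero immediately flags customers without a repeat purchase.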
Scopes:
Time scope: From July 1, 2017, to June 15, 2020
Space scope: Adventure Works Cycle online sales system
- Calculation of L, R, F, and M scores: The L (Length), R (Recency), F (Frequency), and
M (Monetary) scores of each customer are calculated
- Determination of the optimal number of clusters: Both the Elbow method and Silhouette score were carried out to determine the optimal number of clusters
- Model construction: The customer segmentation model is constructed using the K-means algorithm
- Model evaluation: The Silhouette method is used to assess the effectiveness of the
customer segmentation model
- Visualization of results: The results of the customer segmentation are visualized using <>.
- Analysing results, labeling clusters, and recommending strategies: An in-depth analysis
of the segmented data is conducted to derive actionable insights and propose strategies to benefit the company
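The Elbow step is usually judged visually from a plot of the within-cluster sum of squared errors (SSE) against the number of clusters k. As a minimal sketch of automating that judgment, the function below rescales both axes to [0, 1] and returns the k whose point lies farthest from the straight line joining the first and last points of the curve. This distance-to-chord heuristic and the name `elbow_k` are our assumptions rather than a procedure prescribed by the study, and the SSE values are made up.

```python
import math


def elbow_k(ks, sse):
    """Return the k at the knee of an SSE-versus-k curve.

    Heuristic (an assumption, not the study's procedure): rescale both axes
    to [0, 1], draw the chord from the first to the last point, and return
    the k whose point is farthest from that chord. Needs at least two
    ascending k values and non-constant SSE.
    """
    x_lo, x_hi = ks[0], ks[-1]
    y_lo, y_hi = min(sse), max(sse)
    xs = [(k - x_lo) / (x_hi - x_lo) for k in ks]
    ys = [(s - y_lo) / (y_hi - y_lo) for s in sse]
    ax, ay, bx, by = xs[0], ys[0], xs[-1], ys[-1]
    chord = math.hypot(bx - ax, by - ay)
    # Perpendicular distance from each (x, y) to the line through A and B
    dists = [
        abs((by - ay) * x - (bx - ax) * y + bx * ay - by * ax) / chord
        for x, y in zip(xs, ys)
    ]
    return ks[dists.index(max(dists))]


# Hypothetical SSE values for k = 1..6
ks = [1, 2, 3, 4, 5, 6]
sse = [1000.0, 450.0, 180.0, 150.0, 130.0, 120.0]
print(elbow_k(ks, sse))
```

For this hypothetical curve the function returns 3. Because the elbow is a heuristic, its answer would still be cross-checked against the Silhouette score before fixing the number of clusters.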
Chapter 1: Theoretical Basics and Literature Review
This first chapter lays the theoretical groundwork for our research, focusing on customer behavior, customer segmentation, and the LRFM model, along with the techniques and algorithms employed by our research team. The proposals, methods, and limitations of relevant studies are also presented.
1.1 Customer Behavior
Customer behavior is a field of study rooted in sciences such as psychology, sociology, social psychology, the humanities, and economics. In their work Consumer Behavior: Concepts and Applications, David L. Loudon and Albert J. Della Bitta state that “customer behavior is the actual decision-making and action process of individuals when evaluating, purchasing, using, or discarding goods and services.” Studying customer behavior is an important part of economic research: it aims to understand how and why consumers buy (or do not buy) products and services, and how the customer's shopping process takes place. An understanding of customer behavior provides the foundation for marketing strategies such as product positioning, market segmentation, new product development, new market applications, global marketing, and strategic decisions on the marketing mix. These major marketing activities are more effective when they are based on an understanding of customer behavior.
“Factors affecting the online shopping intention of Generation Z consumers in Vietnam”, conducted by Ta Van Thanh and Dang Xuan On in 2021, identified and evaluated the impact of the key factors affecting the online shopping intention of Generation Z consumers. Using quantitative research methods, namely scale reliability analysis, exploratory factor analysis, regression, and model fit testing, the study showed that four factors, (1) perceived usefulness, (2) trust, (3) perceived risk, and (4) psychological safety, affect Generation Z's online shopping intention. From these results, the authors drew conclusions and recommendations to help e-commerce businesses improve their online shopping activities.
“Solutions to improve individual customer loyalty at Orient Commercial Joint Stock Bank (OCB)”, conducted by Nguyen Thanh Tuan and Bui Thi Thanh in 2022, identified the factors affecting individual customer loyalty and proposed solutions to enhance it. The study combines qualitative and quantitative research methods to evaluate the current state of customer loyalty.
“Phu Quoc eco-tourism market segment”, conducted by Nguyen Tri Nam Khang, Duong Que Nhu, and Chau My Lan and published in 2013, segments Phu Quoc eco-tourists according to demographic and behavioral criteria. From this segmentation, the study (1) identifies the number of distinct tourist groups in Phu Quoc and (2) selects a target group of tourists and states that group's identifying characteristics.
1.2 Customer Segmentation
Customer segmentation is widely used to group customers by shared characteristics. Clustering is the process of forming segments in a dataset by measuring the similarity between data points (Singh & Kaur, 2013). Each customer cluster has distinct features and behaviors that affect business strategy. Segmentation gives organizations and companies a thorough view of their customers so that they can target and market to them more effectively.
The study “A decision-making support system module for customer segmentation and ranking” (Yossi Hadad & Baruch Keren, 2022) proposed a modular decision support system that enables complete customer classification and ranking. The modules are based on customer criteria with quantitative values that can be extracted from the organization's information system. By calculating customer scores from measurable underlying criteria, the module can identify and classify customers (e.g., bronze, silver, gold, platinum), track changes over time, and produce complete and accurate rankings. The proposed method saved 90% of the time and resources needed to prepare for customer portfolio management.
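The scoring-and-tiering idea can be illustrated with a short Python sketch. The criteria names, weights, and tier cutoffs below are hypothetical placeholders chosen for illustration; Hadad and Keren's module defines its own criteria, which are not reproduced here. Each criterion is assumed to be pre-scaled to [0, 1].

```python
# Hypothetical criteria weights (sum to 1); not taken from the cited study.
WEIGHTS = {"revenue": 0.5, "frequency": 0.3, "tenure": 0.2}

# Hypothetical tier cutoffs, checked from highest to lowest.
CUTOFFS = [(0.75, "platinum"), (0.5, "gold"), (0.25, "silver")]


def customer_score(criteria, weights=WEIGHTS):
    """Weighted sum of criteria values that are already scaled to [0, 1]."""
    return sum(weights[name] * value for name, value in criteria.items())


def tier(score, cutoffs=CUTOFFS):
    """Return the first tier whose cutoff the score reaches, else bronze."""
    for threshold, label in cutoffs:
        if score >= threshold:
            return label
    return "bronze"


score = customer_score({"revenue": 0.9, "frequency": 0.6, "tenure": 0.5})
print(round(score, 2), tier(score))
```

Under these made-up weights, a customer with scaled revenue 0.9, frequency 0.6, and tenure 0.5 scores 0.73 and lands in the gold tier. Re-running the scoring on each new period's data is one simple way to track how customers move between tiers over time.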
The study “Hybrid soft computing approach based on clustering, rule mining, and decision tree analysis for customer segmentation problem: Real case of customer-centric industries” (Kaveh Khalili-Damghani, Farshid Abdi, Shaghayegh Abolmakarem, 2018) combined clustering, rule extraction, and decision trees to predict the segments of new customers in customer-centric companies. First, the K-Means algorithm clusters the company's previous customers based on their purchasing behavior. Next, a filter-based hybrid feature selection method and a multi-attribute decision-making method are proposed. Finally, IF-THEN rules are extracted through decision tree analysis of customer characteristics. The method is applied to predict profitable customers and to map out the factors that most influence customers.
The study “A comparative dimensionality reduction study in telecom customer segmentation using deep learning and PCA” (Maha Alkhayrat, Mohamad Aljnidi, Kadan Aljoumaa, 2020) focuses on reducing the dimensionality of telecommunication datasets and performing customer clustering in both the reduced space and the latent space to improve clustering quality. The initial dataset contained over 100,000 customers with 20 variables. The authors used principal component analysis (PCA) and an autoencoder neural network to eliminate irrelevant features and noisy data, which is especially important when the data is high-dimensional. This work has helped telecommunications companies achieve better results in classifying customers into different groups.
The study “Research on customer segmentation based on the characteristics of shopping centers in Ho Chi Minh City”, conducted by Dinh Tien Minh and Le Vu Lan Anh in 2021, combined two main methods, qualitative and quantitative, with K-Means clustering to segment customers at shopping centers in Ho Chi Minh City, Vietnam, providing a reasonable basis for shopping center business models to adopt appropriate policies for their target customers. The results identify three customer segments: entertainment-oriented buyers, buyers who follow practicality trends, and buyers who follow the agreeable trend.
The study “Application of clustering techniques and association rules to explore customer data using hotel services”, conducted by Nguyen Van Chuc and Dao Thi Giang and published in 2015, demonstrated new ways of applying clustering methods to customer data mining. Based on a data mining model with two techniques, data clustering and association rule discovery, the authors built a web interface that supports hotel managers' decision making, helping them devise separate policies for each customer group and predict customer behavior in using hotel services and booking tours.
The study “Overview of big data analysis in e-commerce”, conducted by Le Trieu Tuan and Ly Thu Trang in 2020, comprehensively researched the benefits of big data analysis and proposed an analysis model for e-commerce businesses, helping businesses in this industry take a closer look at using big data to improve performance. By analyzing both structured and unstructured data, e-commerce businesses can improve business performance and customer care. This helps them retain existing customers and attract potential ones, improve business quality, and enhance brand image, creating opportunities to capture the market.
The study “Some solutions to perfect marketing-mix for the organizational customer segment of Mobifone Region II Mobile Information Center in Ho Chi Minh City”, conducted by Tran My Yen and Bao Trung in 2015, analyzed and evaluated the current Marketing mix for MobiFone's organizational customer segment in Ho Chi Minh City to identify the problems affecting business efficiency, and then proposed solutions to perfect it.
1.3 Segmentation Model
Clustering is a popular unsupervised learning technique in Data Mining (DM). It is used to find groups in a dataset such that objects in the same group are similar to one another and dissimilar to objects in other groups. In the article by DD Truc (2022), clustering is described as a DM technique that divides data into related groups without prior knowledge of the group definitions.
K-means is the most popular hard clustering technique for dividing data into groups in which the objects of each cluster are homogeneous and dissimilar to those of other clusters. The method finds the center (centroid) of each cluster in an unlabeled dataset, grouping the given objects into K clusters so that the sum of squared distances between the objects and their cluster centers is minimized. To carry out K-means, we repeatedly compute the distance between each object and each center, assign each object to its nearest cluster, and recompute each center as the mean vector of its assigned objects, until the centers no longer change. In the article by Khajvand and Tarokh (2011), K-means was used to cluster bank customers based on customer lifetime value (CLV) and a weighted RFM model.
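The iterative procedure just described is Lloyd's algorithm. The minimal pure-Python sketch below implements it for small lists of numeric tuples (for example, scaled L, R, F, M vectors) and also returns the within-cluster sum of squared errors (SSE) that K-means minimizes. The function names, the initialization from k randomly sampled data points, and the `max_iter` and `seed` parameters are our assumptions; a production analysis would more likely rely on a library implementation such as scikit-learn's `KMeans`.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-means. Returns (centroids, labels, sse)."""
    rng = random.Random(seed)
    # Initialize centroids with k data points sampled without replacement
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Within-cluster sum of squared errors, the quantity K-means minimizes
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


# Hypothetical two-feature data with two obvious groups
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels, sse = kmeans(points, k=2)
print(labels, round(sse, 3))
```

Running `kmeans` for several values of k and collecting the returned SSE values gives exactly the curve that the Elbow method inspects. Because the result depends on the initial centroids, practical implementations typically restart from several seeds and keep the lowest-SSE solution.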
1.4 Traditional RFM
The three-value Recency, Frequency, Monetary (RFM) model was first introduced in 1995 and brought a fresh approach to classifying customers, proving highly effective because of its consistency with the 80/20 principle (Bult & Wansbeek, 1995). The RFM method ranks each customer on three factors: Recency, which shows how recently the customer's last purchase occurred; Frequency, which shows how often the customer made a purchase in a given period; and Monetary, which shows how much the customer spent in that period. Many domestic and international studies have since built on this model, each trying to find the best way to rank RFM values, provide in-depth and detailed information about customer groups, and bring customers into long-term relationships with the business.
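A common way to rank customers on the three factors is to convert each raw value into a 1 to 5 score by its rank. The Python sketch below does this with equal-frequency bins; the binning scheme, tie handling (by original position), the function name `rank_scores`, and the data are assumptions for illustration rather than a method prescribed by Bult and Wansbeek. Recency is scored in reverse, because fewer days since the last purchase is better.

```python
def rank_scores(values, bins=5, higher_is_better=True):
    """Score each value from 1 to `bins` by its rank (equal-frequency bins).

    Ties are broken by original position. With higher_is_better=False the
    scale is reversed, which suits recency measured in days.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0] * n
    for rank, i in enumerate(order):
        score = rank * bins // n + 1  # lowest values get 1, highest get `bins`
        scores[i] = score if higher_is_better else bins + 1 - score
    return scores


# Hypothetical data for five customers
recency_days = [10, 200, 45, 5, 90]
frequency = [8, 1, 3, 12, 2]
monetary = [900.0, 50.0, 300.0, 1500.0, 120.0]

r = rank_scores(recency_days, higher_is_better=False)
f = rank_scores(frequency)
m = rank_scores(monetary)
rfm_codes = [f"{a}{b}{c}" for a, b, c in zip(r, f, m)]
print(rfm_codes)
```

Here the fourth customer (5 days since the last purchase, 12 orders, 1,500 spent) receives the code 555, while the second customer receives 111. Concatenated codes like these are what traditional RFM studies use to label and rank customer groups.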
“Estimating customer lifetime value based on RFM analysis of customer purchase behavior: case study” (Mahboubeh Khajvand, Kiyana Zolfaghar, Sarah Ashoori, and Somayeh Alizadeh, 2011) employed two distinct methodologies. In the first approach, the researchers used the RFM marketing analysis method (Recency, Frequency, and Monetary) for customer segmentation. In the second approach, they introduced an extended RFM analysis method incorporating an additional parameter, Customer Lifetime Value (CLV), which was calculated for each segment using the weighted RFM method. The resulting CLV values for the various segments offer insights that can inform the company's marketing and sales strategies.
“Customer-Centric Sales Forecasting Model: RFM-ARIMA Approach” (Elhosseimi, 2023) focuses on improving the accuracy of sales forecasting by applying forecasting models after RFM analysis. The study uses a large dataset, Tableau's Global Superstore, which includes information on sales of multiple products, customer segments, geographic locations of purchases, revenue, profits, and more. The article presents a detailed study of a customer-centric sales forecasting approach that combines RFM analysis with the ARIMA model. It contributes to the field of sales forecasting by proposing a customer-centric approach that can be applied across a variety of industries and businesses to improve forecast accuracy.
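The weighted-RFM idea behind the CLV step can be sketched as a weighted sum of min-max normalized R, F, and M values, computed here per segment from segment averages. The weights (0.2, 0.3, 0.5), the function names, and the segment data below are hypothetical placeholders (Khajvand et al. derive their own weights, which are not reproduced here), and recency is inverted so that a more recently active segment scores higher.

```python
def min_max(values, invert=False):
    """Scale values to [0, 1]; invert=True maps the smallest value to 1."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values match
    scaled = [(v - lo) / span for v in values]
    return [1.0 - s for s in scaled] if invert else scaled


def weighted_rfm_clv(recency, frequency, monetary, weights=(0.2, 0.3, 0.5)):
    """CLV proxy per segment: w_R * R' + w_F * F' + w_M * M'.

    Inputs are per-segment averages; the default weights are hypothetical.
    Recency (days since last purchase) is inverted so recent segments score high.
    """
    w_r, w_f, w_m = weights
    r = min_max(recency, invert=True)
    f = min_max(frequency)
    m = min_max(monetary)
    return [w_r * a + w_f * b + w_m * c for a, b, c in zip(r, f, m)]


# Hypothetical segment averages: recency (days), frequency (orders), monetary
clv = weighted_rfm_clv([30, 120, 300], [10, 4, 1], [5000.0, 1500.0, 200.0])
print([round(v, 3) for v in clv])
```

The most recent, most frequent, highest-spending segment receives the maximum value of 1.0 and the least valuable one receives 0.0, so the scores can be compared directly when allocating marketing effort across segments.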
1.5 RFM with Machine Learning
Instead of grouping customers manually, combining the RFM model with machine learning algorithms such as K-means is highly effective. The algorithm assigns customers to segments automatically, while the Elbow method and the Silhouette score help determine how many segments to use. This helps businesses identify and prioritize key customers, predict future shopping behavior, and optimize marketing and customer care campaigns to increase business performance and profits.
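For reference, the Silhouette coefficient of a point i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other members of its own cluster and b(i) is the smallest mean distance from i to the members of any other cluster; the Silhouette score is the mean of s(i) over all points and ranges from -1 to 1. The plain-Python sketch below computes it directly in O(n²) time, so it suits only small samples. The function name and the convention s(i) = 0 for singleton clusters are our assumptions; `labels` is expected to be a list.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; needs at least two distinct labels."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        # a(i): mean distance to the other members of the same cluster
        same = [math.dist(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == own]
        if not same:
            scores.append(0.0)  # singleton cluster convention
            continue
        a = sum(same) / len(same)
        # b(i): lowest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters
            if c != own
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical points: two tight pairs far apart
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(pts, [0, 0, 1, 1])
bad = silhouette_score(pts, [0, 1, 0, 1])
print(round(good, 3), round(bad, 3))
```

The natural grouping scores about 0.90, while the mixed-up labeling scores about -0.45, which is why the k with the highest average Silhouette score is usually preferred.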
“Customer Segmentation Using Machine Learning Model: An Application of RFM Analysis” (2023) by Israa Lewaa combines ML and RFM analysis techniques to predict customer churn (customers who stop using services or end contracts), mainly from transaction data. The study uses real online retail data to analyze customer behavior. By applying practical methods such as the Box-Cox transformation to improve data quality and RFM scoring to evaluate customers, the research produced deeper insights into shopping habits. Customer segmentation was performed with K-Means and DBSCAN clustering, with the optimal number of clusters determined by the Elbow curve method. The research provides valuable insights into how people shop online, contributing to the ongoing effort to better understand customers in the online retail sector.
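The Box-Cox transformation maps a strictly positive value x to (x^λ - 1)/λ for λ ≠ 0 and to ln x for λ = 0, which tends to reduce the strong right skew typical of frequency and monetary values. In practice λ is usually estimated by maximum likelihood (for example with `scipy.stats.boxcox`); the minimal sketch below assumes λ is already given, and the function name `box_cox` and the spending values are ours.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # Limit of (x**lam - 1) / lam as lam approaches 0
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


# Hypothetical monetary values with a long right tail
spend = [20.0, 35.0, 50.0, 80.0, 2500.0]
print([round(x, 2) for x in box_cox(spend, 0)])    # log transform
print([round(x, 2) for x in box_cox(spend, 0.5)])  # square-root-like transform
```

After either transform the single large spender sits much closer to the rest of the data, which keeps distance-based methods such as K-means from being dominated by a few extreme customers.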
The article “A Mathematical Model for Customer Segmentation Leveraging Deep Learning, Explainable AI, and RFM Analysis in Targeted Marketing” (2023) by Mostafa Elhosseini emphasizes the importance of mathematical models for improving customer segmentation. The author introduces the DeepLimeSeg model, explains its mathematical basis, and describes the role the model plays in advancing customer segmentation. The author combines RFM (Recency, Frequency, Monetary) analysis with deep learning techniques to develop a mathematical model of customer segmentation for targeted marketing. The method includes collecting and preprocessing customer data, training and testing the deep learning model, and evaluating its performance with various metrics. The author also compares the proposed model with existing models, discusses the experimental results, and concludes that applying deep learning techniques substantially improves the accuracy and reliability of segmentation methods.
“RFM Analysis Using K-Means Clustering to Improve Revenue and Customer Retention” (Vinit Dawane, 2021) focuses on combining RFM analysis with the K-Means clustering algorithm to improve revenue and customer retention. Its main contributions include applying RFM analysis to classify customers by Recency, Frequency, and Monetary value; using the K-Means clustering algorithm; and providing specific marketing recommendations for each customer segment to help optimize marketing strategies. The research addresses a key challenge in the FMCG industry by providing insights to improve sales and customer relations. It combines data analysis and marketing strategy effectively and follows a systematic approach, from data collection through preprocessing to the application of machine learning techniques. The study provides an important foundation for customer segmentation in the FMCG sector, and the author encourages future research to explore improvements such as weighted RFM analysis and alternative clustering techniques. In short, the article helps bridge the gap between data analytics and marketing strategy in the FMCG sector, providing valuable information for businesses looking to optimize customer interactions and generate revenue.
1.6 Research Gap and Motivation
1.6.1 Summary Table/Chart of Previous Studies
[Summary table of previous studies; recoverable row topics: the customer lifetime value case study based on RFM analysis, the Phu Quoc eco-tourism market segment study, the clustering study on hotel-service customers, the telecom dimensionality reduction study, and the customer segmentation and ranking decision-support module.]