UNIVERSITY OF ECONOMICS AND LAW
FACULTY OF INFORMATION SYSTEMS
FINAL PROJECT REPORT INTERDISCIPLINARY RESEARCH METHOD COURSE
TOPIC:
AN APPLICATION OF LRFM MODEL FOR CUSTOMER LOYALTY
SEGMENTATION AT ADVENTURE WORKS COMPANY
Lecturers:
1. Ho Trung Thanh, Assoc. Prof., Ph.D.
2. Le Thi Kim Hien, Ph.D.
3. Nguyen Phat Dat, B.S.
GROUP 02
Ho Chi Minh City, November, 2023
Members of Group 02
Acknowledgements
First of all, we would like to express our profound gratitude to the University of Economics and Law for integrating the Interdisciplinary Research Methods course into the Faculty of Information Systems' program. We particularly want to convey our appreciation to Associate Professor Dr. Ho Trung Thanh, Deputy Dean Dr. Le Thi Kim Hien, and Bachelor of Science Nguyen Phat Dat for their invaluable guidance and unwavering support, which were instrumental in the success of our research.
Our heartfelt thanks also go to the authors and author groups who have made significant contributions through research works, articles, theses, models, and the sharing of knowledge and methods across various fields relevant to this study. These contributions have significantly enhanced the clarity and comprehensiveness of our research. Despite our earnest efforts during the research process, we acknowledge that some mistakes may be unavoidable. We welcome all feedback as a valuable contribution to improving our work.
Group 02
Commitment
The research has been carried out collectively by all members of Group 02 under the guidance of three lecturers: Ho Trung Thanh, Le Thi Kim Hien, and Nguyen Phat Dat. Additionally, the paper includes references from various articles on related subjects. Should there be any evidence of academic misconduct in this research paper, our group is committed to bearing full responsibility for any consequences at any level of punishment.
Ho Chi Minh City, 2023
Group 02
Table of Contents
Chapter 2 Methodology and Proposed Research Models
3.4 L, R, F, M calculation
List of Tables
List of Figures
Figure 3.1 Select necessary attributes to calculate L, R, F, M values
Figure 3.4 Relationship between Monetary and Length
Figure 3.5 Chart Relationship between Monetary and Recency
Figure 3.6 Chart Relationship between Monetary and Frequency
Figure 3.3 The distribution after transform and normalize the data
Figure 4.1 Elbow table
Figure 4.2
Figure 4.3 Silhouette score versus "K"
Figure 4.4 Clustering result
Figure 4.5
Figure 4.6 Average of L, R, F, M values for each cluster
Figure 4.7 Number of customers in each segment
Figure 4.8 Total Length of each segment
Figure 4.9 Total Recency of each segment
Figure 4.10 Total Frequency of each segment
Figure 4.11 Total Monetary of each segment
Figure 4.13 Description of Cluster 2 - Original Moderate Loyal Customers
Figure 4.14 Description of Cluster 3 - New Extreme Loyal Customers
Figure 4.15 Description of Cluster 4 - All-Time Extreme Loyal Customers
Figure 4.16 Description of Cluster 5 - All-Time Loyal Customers
Figure 4.17 Description of Cluster 6 - New Customers but low Loyalty
B2B | Business to Business
BUILDING IDEAS AND PROJECTS
In this step, we identify issues related to big data analysis and methods that help improve data analysis. We will use a company's sample customer data to analyze and draw conclusions using research methods over a selected period of time.
CHAPTER 1: THEORETICAL BACKGROUND AND RELATED WORK
Chapter 1 sets the conceptual foundation, concentrating on customer segmentation, the LRFM model, and numerous approaches and algorithms used by other research teams. Furthermore, the recommendations, methodologies, and limitations of pertinent research are provided.
CHAPTER 2: METHODOLOGY
The second chapter provides a clear framework for investigating the research topic by outlining the study design, data gathering methods, and analytical approaches. It introduces the LRFM model that will be used to drive data analysis, establishing the groundwork for later chapters.
CHAPTER 3: DATA UNDERSTANDING AND PREPARATION
The structure of the dataset is examined along with the important variables and their connections. The chapter then highlights the critical process of data cleaning and preprocessing in order to ensure data quality and dependability.
CHAPTER 4: EXPERIMENTAL RESULTS
Applying the K-means algorithm to a normalized dataset using the LRFM model aims to identify distinct customer clusters. Subsequent analysis of these clusters will inform strategic labeling, enabling the development of targeted marketing campaigns for the company.
CONCLUSION AND FUTURE WORK
In the competitive retail sector, our research identifies high-value customers using surveys and Python analysis with the LRFM model and K-means method. This approach offers a framework for effective, customer-centric strategies. However, limitations include dataset representativeness, sensitivity in K-means clustering, and challenges in achieving stable segmentation despite normalization efforts.
ABSTRACT
Targeting the right customers has always been a key strategy in increasing profit, and the Adventure Works retail company is no different. This research was conducted to ensure that its differentiated marketing strategies reach the appropriate customer segments. A 2017 - 2020 dataset of Sales Data with 121,253 records and 15 attributes was collected. This study proposes a customer loyalty segmentation in a retail context, wherein the clustering is performed using the Length-Recency-Frequency-Monetary (LRFM) model integrated with the K-means method. In the end, six clusters were found, but only five of them allowed a positive Loyalty Status assessment, labeled as: Original Extreme Loyal Customers, Original Moderate Loyal Customers, New Extreme Loyal Customers, All-Time Extreme Loyal Customers, and All-Time Loyal Customers. The clustering results yielded a Silhouette Coefficient score of 0.837. Based on the results of this segmentation, Adventure Works can strategically deliver tailored marketing to its clients and gradually strengthen its customer relations.
Keywords: LRFM model; K-means clustering; Elbow method; Silhouette score; customer segmentation; customer loyalty; marketing; retail industry
ABSTRACT (Vietnamese version, translated)
Targeting the right customers has always been an important strategy for increasing profit, and the retail company Adventure Works is no exception. This research was conducted to ensure that the company's differentiated marketing strategies reach the appropriate customer segments. A dataset covering 2017 - 2020, with 121,253 records and 15 attributes relating to Sales Data, was collected. The research group then segmented customers by loyalty level in a retail context, with clustering performed using the Length-Recency-Frequency-Monetary (LRFM) model integrated with the K-means method. Clustering produced six groups, but only five of them met the conditions for a positive Loyalty Status assessment; these were named, in order: Original Extreme Loyal Customers, Original Moderate Loyal Customers, New Extreme Loyal Customers, All-Time Extreme Loyal Customers, and All-Time Loyal Customers. This segmentation achieved a fairly impressive Silhouette Coefficient score of 0.837. Based on these results, Adventure Works can tailor its marketing strategy to each of its customer segments and progressively strengthen the company's customer relationship management.
Keywords: LRFM model; K-means clustering algorithm; Elbow method; Silhouette score method; customer segmentation; customer loyalty; marketing; retail industry
to enhance market competitiveness, attract potential customers, and foster loyalty among its existing clients
Objectives
The study aims to provide an efficient customer segmentation model based on loyalty status using the LRFM (Length, Recency, Frequency, Monetary) model and the K-means algorithm. The in-depth analysis of each segment is meant to help Adventure Works:
● Identify customers' actual shopping behavior
● Comprehend customer diversity and capture the typical characteristics of each segment
● Enhance business decisions and develop more effective marketing and advertising campaigns to increase profit
Objects and scopes
Objects
We investigate and analyze the purchasing behaviors and habits of Adventure Works' customers, derived from the retailer dataset. Specifically, we examine how long they have been purchasing from Adventure Works (AW), when they last bought something from AW, how often they ordered from AW, and how much they have spent on AW's products. In other words, their Length, Recency, Frequency, and Monetary value scores are examined to produce insightful outputs.
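As a rough illustration of the four measures described above, the sketch below computes L, R, F, and M from a list of transactions. The record layout `(customer_id, order_date, amount)`, the sample values, and the reference date are assumptions made for illustration and are not the Adventure Works schema. Length is taken as the number of days between a customer's first and last purchase, and Recency is assumed to be the number of days from the last purchase to the reference date.

```python
from collections import defaultdict
from datetime import date


def compute_lrfm(transactions, reference_date):
    """Return {customer_id: (L, R, F, M)} from (customer_id, order_date, amount) tuples.

    L: days between the first and last purchase
    R: days between the last purchase and reference_date (assumed definition)
    F: number of purchases
    M: total amount spent
    """
    by_customer = defaultdict(list)
    for customer_id, order_date, amount in transactions:
        by_customer[customer_id].append((order_date, amount))

    scores = {}
    for customer_id, orders in by_customer.items():
        dates = [d for d, _ in orders]
        first, last = min(dates), max(dates)
        length = (last - first).days
        recency = (reference_date - last).days
        frequency = len(orders)
        monetary = sum(a for _, a in orders)
        scores[customer_id] = (length, recency, frequency, monetary)
    return scores


# Hypothetical sample data (not taken from the Adventure Works dataset).
sample = [
    ("C1", date(2019, 1, 1), 100.0),
    ("C1", date(2019, 1, 31), 50.0),
    ("C2", date(2020, 6, 1), 20.0),
]
lrfm = compute_lrfm(sample, reference_date=date(2020, 6, 15))
```

Here customer C1 has two orders 30 days apart, so L = 30 and F = 2, while C2 has a single order and therefore L = 0. On a full dataset the same grouping would more typically be done with a data-frame library.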
Scopes
● Time scope: From July 1, 2017, to June 15, 2020
● Space scope: Buyers (except for Resellers) of retail products from AWC
Research method
The research aims to segment Adventure Works' customers based on loyalty status using the LRFM model and K-means algorithm, following the steps below:
● Data preprocessing: Irrelevant and inaccurate information is removed to ensure data cleanliness
● Calculation of L, R, F, and M scores: The L (Length), R (Recency), F (Frequency), and M (Monetary) scores of each customer are calculated
● Determination of the optimal number of clusters: Both the Elbow method and the Silhouette score are used to determine the optimal number of clusters
● Model construction: The customer segmentation model is constructed using the K-means algorithm
● Model evaluation: The Silhouette method is used to assess the effectiveness of the customer segmentation model
● Analysing results, labeling clusters, and recommending strategies: An in-depth analysis of the segmented data is conducted to derive actionable insights and propose strategies to benefit the company
Chapter 1 Theoretical Framework and Literature Review
In the introductory section of the first chapter, we lay the theoretical groundwork for our research, focusing on customer segmentation, the LRFM model, and the various techniques and algorithms employed by our research team. In addition, the proposals, methods, and limitations of relevant studies are presented.
of new products, exploration of new markets, global marketing endeavors, and strategic decision-making
“Factors affecting the online shopping intention of Generation Z consumers in Vietnam” (Ta Van Thanh, Dang Xuan On, 2021) identifies and evaluates the impact of key factors affecting the online shopping intention of Generation Z consumers. Through quantitative research, including analysis of scale reliability, exploratory factor analysis, regression, and testing of model fit, the authors conclude that four factors affect Generation Z's online shopping intention: perceived usefulness, trust, perceived risk, and psychological safety. From these findings they draw conclusions and recommendations to improve online shopping activities.
“Solutions to improve individual customer loyalty at Orient Commercial Joint Stock Bank (OCB)” (Nguyen Thanh Tuan, Bui Thi Thanh, 2022) identifies factors affecting individual customer loyalty, thereby proposing solutions to enhance customer loyalty. The study combines qualitative and quantitative research methods to evaluate the current state of customer loyalty.
“Phu Quoc eco-tourism market segment” (Nguyen Tri Nam Khang, Duong Que Nhu, Chau My Lan, 2013) focuses on segmenting Phu Quoc eco-tourists according to demographic and behavioral criteria. The authors determined the number of different tourist groups in Phu Quoc, then selected a target group of tourists and stated the identifying characteristics of that group.
The study “A decision-making support system module for customer segmentation and ranking” (Yossi Hadad, Baruch Keren, 2022) proposes a modular decision support system that allows for complete customer classification and ranking. Modules are based on customer criteria with quantitative values that can be extracted from the business's organizational information system. By calculating customer scores based on measurable underlying criteria, the module can identify and classify customers (e.g. bronze, silver, gold, platinum), track changes over time, and allow for complete and accurate rankings. The proposed method saved 90% of the time and resources needed to prepare for customer portfolio management.
The study “Hybrid soft computing approach based on clustering, rule mining, and decision tree analysis for customer segmentation problem: Real case of customer-…” (… Abolmakarem, 2018) uses a computational method combining clustering, rule extraction, and decision tree analysis to predict new customer segments in customer-centric companies. First, the K-Means algorithm is applied to cluster the company's previous customers based on their purchasing behavior. Next, filtering-based hybrid feature selection and multi-attribute decision-making methods are proposed. Finally, on the basis of customer characteristics and using decision tree analysis, IF-THEN rules are extracted. This method is applied to predict profitable customers and map out the factors that most influence customers.
The study “A comparative dimensionality reduction study in telecom customer segmentation using deep learning and PCA” (Maha Alkhayrat, Mohamad Aljnidi, Kadan Aljoumaa, 2020) focuses on reducing the dimensionality of telecommunication data sets and performing customer clustering in the reduced and latent spaces to improve clustering quality. The initial data set contains over 100,000 customers with 20 variables. By using principal component analysis and an autoencoder neural network to eliminate irrelevant features and noisy data, especially when the data is high-dimensional, this work has helped telecommunications companies achieve better results in classifying customers into different groups.
The study “Research on Customer Segmentation Based on the characteristics of shopping centers in Ho Chi Minh City” (Dinh Tien Minh, Le Vu Lan Anh, 2021) combined qualitative and quantitative methods with the K-Means clustering method to segment customers at shopping centers in Ho Chi Minh City, Vietnam. The authors aimed to provide a reasonable basis for shopping centers to release appropriate policies for their target customers. The results identified three customer segments: entertainment-oriented buyers, buyers who follow practicality trends, and buyers who follow the agreeable trend.
The study “Application of clustering techniques and association rules to explore customer data using hotel services” (Nguyen Van Chuc, Dao Thi Giang, 2015) shows new features in applying clustering methods in the context of customer data mining. Based on a data mining model with two techniques, data clustering and association rule discovery, the authors built a web interface to support hotel managers' decision-making, helping them release appropriate policies for each customer group and predict customer behavior in booking hotel services as well as tours.
The study “Overview of big data analysis in e-commerce” (Le Trieu Tuan, Ly Thu Trang, 2020) researches the benefits of big data analysis and proposes an analysis model to boost e-commerce businesses. The study takes a closer look at using big data to improve business performance. Based on big data analysis methods covering both structured and unstructured data, it helps e-commerce businesses retain customers and attract more potential customers, in addition to improving overall quality and enhancing brand image.
1.3 K-means clustering
Clustering is a common unsupervised learning method in Data Mining (DM) that identifies classes or groups in a dataset. K-means clustering, a prominent clustering approach, separates data into distinct groups, ensuring similarity within each cluster and dissimilarity between clusters. It works by finding the center of each cluster in an unlabeled dataset, minimizing the sum of squared distances between objects and their cluster centers.
1.4 Traditional RFM
The three-value Recency, Frequency, Monetary Value model, first introduced in 1995, brought a fresh approach to classifying customers and has proved extremely effective due to its suitability with the 80/20 principle (Bult & Wansbeek, 1995). The RFM method ranks each customer according to three factors: Recency, which shows how recent the customer's last purchase was; Frequency, which shows how often the customer purchased in a given period; and Monetary, which shows how much the customer spent in the given period. Many studies have successfully applied this model to effectively categorize customers, gaining in-depth and detailed insights about these customer groups, thereby enhancing the customer relationship with the business.
“Estimating customer lifetime value based on RFM analysis of customer purchase behavior: A case study” (Mahboubeh Khajvand, Kiyana Zolfaghar, Sarah Ashoori, and Somayeh Alizadeh, 2011) employs two distinct methodologies. In the first approach, the researchers use the RFM marketing analysis method (Recency, Frequency, and Monetary) for customer segmentation. In the second approach, they introduce an extended RFM analysis method, incorporating an additional parameter: Customer Lifetime Value (CLV). The CLV was calculated based on the weighted RFM method for each segment. This thorough analysis gives CLV results for different segments, providing valuable insights for refining the company's marketing and sales strategies.
“Customer-Centric Sales Forecasting Model: RFM-ARIMA Approach” (Elhosseini, 2023) focuses on improving accuracy in sales forecasting with ARIMA after applying the RFM model. This study uses a large dataset, the Global Superstore dataset from Tableau, which includes information on multiple products, customer segments, geographic locations of purchases, revenue, profits, and more. The article presents a detailed study of the results of this customer-centric combination for sales forecasting using the RFM-ARIMA model. The study contributes to the field of sales forecasting by proposing a customer-centric approach that can be applied across a variety of industries and businesses to improve the accuracy of sales forecasts.
1.5 LRFM model
When it comes to categorizing customers, the combination of the RFM model and Machine Learning (such as K-means) is highly effective, but it does not take into account the length of a customer's relationship with the company. This is where the LRFM model comes in. Together with the optimal number of clusters determined by methods such as the Elbow and Silhouette methods, it lets businesses tailor marketing and services so that each segment receives the highest possible level of satisfaction.
“LRFMV: An efficient customer segmentation model for superstores” (Rezwana Mahfuza, Nafisa Islam, Md Toyeb, Md Asaduzzaman Faisal Emon, Md Shahnur Azad Chowdhury, Md Golam Rabiul Alam; 2022) shows that the LRFMV model is an improved version of the LRFM model that adds a new dimension, V, to represent the volume of products purchased. This allows the LRFMV model to identify customer segments with a clear profit-quantity relationship. The LRFMV model was compared to the RFM and LRFM models, and it was found to create more accurate customer segments with the same number of customers while maintaining a greater profit.
The study “Customer Segmentation Based on Loyalty Level Using K-Means and LRFM Feature Selection in Retail Online Store” (Tiara Lailatul Nikmah, Nur Hazimah Syani Harahap, Gina Cahya Utami, Muhammad Mirza Razzaq; 2023) focuses on identifying high-potential customer groups by analyzing retail online shop sales data. The LRFM feature selection method and the K-Means data mining algorithm were used to segment customers into four categories: Premium Loyalty, Inertia Loyalty, Latent Loyalty, and No Loyalty. The Silhouette Score Index technique validated the clustering results, yielding a score of 0.943898. Businesses can use these insights to prioritize customer service and enhance sales.
“A New Approach for Customer Clustering by Integrating the LRFM Model and Fuzzy Inference System” (Ali Alizadeh Zoeram, AhmadReza Karimi Mazidi; 2018) presents an enhanced LRFM model for analyzing customer behavior and optimizing customer relationship management. Unlike traditional RFM models that overlook customer loyalty, the proposed LRFM model incorporates a loyalty dimension, enabling more accurate customer segmentation. The model leverages a fuzzy inference system to incorporate LRFM indices and facilitate dynamic customer clustering, ensuring flexible and adaptive strategies. By analyzing customer attributes within each cluster, tailored marketing interventions can be devised to enhance customer engagement and drive business success.
1.6 Research GAP and Motivation
1.6.1 Summary Table of Previous Studies
Table 1 Previous Studies
… behavior: case study
… tourist group and state the identifying characteristics of that group
… each customer group, and predict customer behavior in using hotel services
4 | “Hybrid soft computing approach based on clustering … decision tree analysis … segmentation problem: A real case …” | 2018 | computational method combining …
“A decision-making …” | 2022 | Customer … | Track changes over time …
“Factors affecting the … Vietnam” | 2021 | Quantitative … | Assess the key factors …
2021
… RFM Analysis Using K-Means Clustering …
… marketing strategy in the FMCG sector, providing valuable information for …
… sales forecast accuracy …
… Machine Learning Application of RFM Analysis”
… online … contributing to the …
… method was utilized for … distinct consumer types. Clients were sorted into … Loyalty, Inertia Loyalty, Latent Loyalty, and No Loyalty
13 | “A New Approach for … the … and … LRFM Model … System” | 2018 | LRFM, CLV | The outcomes derived … proposed approach within a wholesale firm revealed … among clusters concerning the four LRFM indices. Consequently, this method proves to be effective for customer clustering and studying their distinctive …
1.6.2 Research GAP and Motivation
The above studies have shown that collecting, processing, and storing data brings a lot of value to retail businesses. Based on collected data and appropriate algorithms, we can enhance the effectiveness of marketing and advertising strategies aimed at target customer segments. The proposed models mentioned above all aimed to segment customers distinctly and specifically so that businesses can make reasonable business decisions accordingly. However, the customer segmentation methods of these studies still do not support the most appropriate advertising and marketing campaigns: after dividing customers into clusters, the studies have not described the typical characteristics of each segment, leading to difficulties in choosing suitable strategies for each specific segment.
To comprehensively explore and capture customer diversity, Data Mining methods were applied carefully, using various well-known data transformation and data normalization methods. We also added a “Length” feature to the traditional RFM model, enabling the extended model to distinguish between customers who may have similar RFM scores but differ in their long-term loyalty. This additional dimension allows for a more nuanced understanding of customer loyalty, enabling businesses to identify and reward customers who have demonstrated consistent loyalty over an extended period. Finally, we illustrated the clustered segmentation in several different ways, improving the comprehensiveness of each customer segment's analysis and making it easier to identify strategies for Adventure Works to deepen its appeal to customers.
Chapter 2 Methodology and Proposed Research Models
In this chapter, we establish the foundation of our research by outlining the research design, data collection methods, and analytical techniques.
[Flowchart. Goal: segment Adventure Works customers by loyalty level. Steps: Data Collecting; Data Preprocessing (Data Cleaning, Choosing features); Calculate L, R, F, M scores; Data Preprocessing (Data Transformation, Data Normalization); determine the number of segments using the Elbow & Silhouette methods; Building Model (segmentation into "k" groups using the K-means algorithm); evaluate efficiency using Silhouette. Silhouette coefficient > 0: Analyse Results, End. Silhouette coefficient <= 0: revise the number of segments.]
Figure 2.1 Research Model
2.1 Research Method
The research starts with creating an initial customer segmentation model based on Adventure Works brand loyalty. We collect the essential dataset and ensure data cleanliness through normalization and transformation. After calculating L, R, F, and M scores, we use the Elbow and Silhouette methods to determine the optimal number of clusters before applying that number in the clustering process. The Silhouette method then assesses the effectiveness of the segmentation result. If the Silhouette coefficient is higher than 0, we proceed to labeling and analysis; otherwise, we refine the cluster number. Afterwards, in-depth analysis of the segmented data yields insights and strategies for Adventure Works' marketing. This iterative process ensures a finely tuned segmentation model for actionable recommendations.
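The acceptance rule in this loop can be sketched as a small helper. This is only an illustration: the function name, the choice of the candidate with the highest Silhouette coefficient, and the sample scores are assumptions, while the "greater than 0" threshold comes from the description above.

```python
def choose_cluster_count(silhouette_by_k):
    """Pick the k with the highest Silhouette coefficient and report whether it passes.

    silhouette_by_k maps a candidate number of clusters to its Silhouette coefficient.
    The result is accepted only if the best coefficient is greater than 0.
    """
    best_k = max(silhouette_by_k, key=silhouette_by_k.get)
    best_score = silhouette_by_k[best_k]
    accepted = best_score > 0
    return best_k, best_score, accepted


# Hypothetical scores for illustration only.
scores = {2: 0.41, 3: 0.55, 4: 0.48}
k, score, ok = choose_cluster_count(scores)
```

If `ok` were false, the candidate range for the number of clusters would be revised and the clustering repeated.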
2.2 Customer Segmentation Analysis
2.2.1 Data Preprocessing
Data preprocessing is a pivotal phase in the data mining process, encompassing tasks such as data cleaning, transformation, and integration. Its purpose is to refine data for analysis, aiming to enhance data quality and align it with the requirements of the specific data mining task. The common data procedures we include are:
- Data cleaning: This step involves identifying and rectifying errors, inconsistencies, and inaccuracies within a dataset. This process is essential to ensure that the data is accurate and reliable for subsequent analysis, and it includes handling missing values, removing duplicates, and addressing outliers.
- Data transformation: This involves converting data into a suitable format for analysis. Common techniques in data transformation include normalization (scaling data to a common range), standardization (adjusting data to have a standard mean and variance), and discretization (converting continuous data into discrete categories). These transformations make the data more amenable to analytical techniques and modeling.
- Data normalization: Data normalization is a pivotal data preprocessing procedure involving the transformation of data to a consistent range, typically within the bounds of 0 to 1 or -1 to 1. Its application is driven by the necessity to mitigate disparities in data units and scales. Notable techniques in data normalization include min-max normalization, z-score normalization, and decimal scaling.
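Two of the normalization techniques named above can be sketched in a few lines. This is a minimal illustration on a single hypothetical column of values; the function names and sample data are assumptions, and decimal scaling is not shown.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Centre values on mean 0 with population standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: every z-score is 0
    return [(v - mu) / sigma for v in values]


monetary = [10.0, 20.0, 40.0]  # hypothetical values
scaled = min_max_normalize(monetary)
standardized = z_score_normalize(monetary)
```

Min-max scaling keeps the result inside a fixed range, whereas z-scores are unbounded but express each value in units of standard deviation; either way, features measured in days and in currency end up on comparable scales before distances are computed.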
2.2.2 LRFM Data Modeling
The LRFM model we use is an extension of the well-known RFM model. Similar to RFM, it is used in feature selection after data preprocessing. Recency (R) refers to how recently a customer has made a purchase. It is typically measured by the customer's last purchase date. The idea is that more recent customers may have a higher likelihood of making another purchase. Frequency (F) represents how often a customer purchases within a given time frame. It is a measure of customer loyalty and engagement, as customers who buy frequently are often more loyal to the business. Monetary Value (M) refers to the total amount of money a customer has spent on purchases during a specified period. It helps identify high-value customers who contribute significantly to the business's revenue.
In addition to the three existing variables, we add a new variable called Length. Length (L) is the number of days between a customer's first and last purchase. Because “Length” represents the duration of the customer's association with the business, the LRFM model is more advantageous for loyalty segmentation. The LRFM approach assigns value based not only on Recency, Frequency, and Monetary factors but also on the time length of the customer's engagement, thereby enhancing its effectiveness in identifying and categorizing loyal customer segments.
2.2.3 Elbow method
The Elbow method is employed to ascertain the optimal number of clusters in a dataset by examining how the total within-cluster variation, measured as squared distances, decreases as the number of clusters grows. This method relies on evaluating the Sum of Squared Errors (SSE) values. SSE serves as a validation measure for clusters by calculating the sum of squares of each cluster member's distance to its center. The Elbow method identifies the point at which the SSE values form an "elbow" in the graph, beyond which adding clusters yields only small reductions in SSE, indicating the optimal number of clusters. The SSE formula is as follows:
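In standard notation (the symbols are assumed here rather than taken from the source), with $k$ clusters $C_1, \dots, C_k$ and $\mu_j$ denoting the centroid of cluster $C_j$, the SSE described above is commonly written as:

```latex
\mathrm{SSE} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2}
```

Each inner sum adds up the squared Euclidean distances from the members of one cluster to its center, so SSE never increases as $k$ grows; the elbow is the value of $k$ after which the decrease flattens.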
1) Updating Centroids: The algorithm recalculates the centroids of each cluster by determining the mean of all data points assigned to that cluster. These recalculated centroids serve as the updated cluster centers. Conceptually, this step involves identifying the "average location" of all members within a cluster and designating it as the new center of the cluster.
The algorithm evaluates whether the centroids have undergone substantial or minimal movement. If the centroids exhibit minimal displacement from the previous iteration, indicating limited changes in the groups, the algorithm ceases execution due to convergence. Conversely, if significant movement is detected, K-means initiates another iteration of centroid recalculation and convergence checking.
This process of iteratively updating centroids and assessing convergence, which resembles a gradual refinement of the clusters until they stabilize, is repeated multiple times. K-means continues these iterations until either the centroids cease to exhibit significant movement or a predetermined number of iterations is reached.
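The update and convergence steps above can be sketched in plain Python. This is a minimal illustration, not the implementation used in this study: the random initialization from data points, the nearest-centroid assignment step, the tolerance value, and the sample points are assumptions based on the standard form of K-means.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iterations=100, tolerance=1e-9, seed=0):
    """Basic K-means; returns (centroids, labels, sse)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization: k random data points
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        # Convergence check: stop once no centroid moves more than the tolerance.
        shift = max(squared_distance(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tolerance:
            break
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


# Hypothetical two-dimensional points forming two obvious groups.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels, sse = k_means(data, k=2)
```

The returned `sse` is the Sum of Squared Errors for the final clustering, so calling `k_means` for a range of `k` values gives the curve that the Elbow method inspects.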
2.2.5 Silhouette method
The Silhouette method is a valuable technique for evaluating the quality of clustering in unsupervised machine learning. It provides a quantitative measure of how well-separated and cohesive the clusters are within a dataset. This method calculates a silhouette score for each data point, taking into account its distance to the other data points within the same cluster and to the nearest neighboring cluster. The Silhouette score ranges from -1 to 1, where a higher score indicates that the data point is appropriately clustered. A score close to 1 implies that the data point is well-matched to its cluster and significantly separated from neighboring clusters, while a score near 0 suggests it is on or very close to the decision boundary between clusters. The formula of Silhouette score:
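In its standard form (the notation is assumed here), the score of a point i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other points in its own cluster and b(i) is the smallest mean distance from i to the points of any other cluster. The sketch below averages s(i) over all points; the function name, the use of Euclidean distance, and the sample data are assumptions made for illustration.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points, with s(i) = (b - a) / max(a, b)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a point alone in its cluster contributes 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


# Hypothetical one-dimensional points with a sensible and a poor labelling.
pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(pts, [0, 0, 1, 1])
bad = silhouette_score(pts, [0, 1, 0, 1])
```

With the sensible labelling the score is close to 1, while the poor labelling, which mixes the two groups, gives a negative score.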