
Final Project Report, Interdisciplinary Research Method Course. Topic: An Application of LRFM Model for Customer Loyalty Segmentation at Adventure Works Company


DOCUMENT INFORMATION

Basic information

Title: An Application of LRFM Model for Customer Loyalty Segmentation at Adventure Works Company
Authors: Lờ Đỡnh Giỏp, Ha Tran Ngoc Quy, Hồ Song Tớn, Thai Anh Thu, Huỳnh Huệ Trỳc
Supervisors: Ho Trung Thanh, Assoc. Prof., Ph.D.; Le Thi Kim Hien, Ph.D.; Nguyen Phat Dat, B.S.
University: University of Economics and Law
Major: Information Systems
Type: Final Project Report
Year of publication: 2023
City: Ho Chi Minh City
Pages: 61
File size: 5.56 MB


UNIVERSITY OF ECONOMICS AND LAW FACULTY OF INFORMATION SYSTEMS

FINAL PROJECT REPORT INTERDISCIPLINARY RESEARCH METHOD COURSE

TOPIC:

AN APPLICATION OF LRFM MODEL FOR CUSTOMER LOYALTY

SEGMENTATION AT ADVENTURE WORKS COMPANY

Lecturers:

1. Ho Trung Thanh, Assoc. Prof., Ph.D.

2. Le Thi Kim Hien, Ph.D.

3. Nguyen Phat Dat, B.S.

GROUP 02

Ho Chi Minh City, November, 2023


Members of Group 02


Acknowledgements

First of all, we would like to express our profound gratitude to the University of Economics and Law for integrating the Interdisciplinary Research Methods course into the program of the Faculty of Information Systems. We particularly want to convey our appreciation to Associate Professor Dr. Ho Trung Thanh, Deputy Dean Dr. Le Thi Kim Hien, and Bachelor of Science Nguyen Phat Dat for their invaluable guidance and unwavering support, which were instrumental in the success of our research.

Our heartfelt thanks also go to the authors and author groups who have made significant contributions through research works, articles, theses, models, and the sharing of knowledge and methods across various fields relevant to this study. These contributions have significantly enhanced the clarity and comprehensiveness of our research. Despite our earnest efforts during the research process, we acknowledge that some mistakes may be unavoidable. We welcome all types of feedback as valuable contributions to improving our work.

Group 02


Commitment

The research has been carried out collectively by all members of Group 02 under the guidance of three lecturers: Ho Trung Thanh, Le Thi Kim Hien, and Nguyen Phat Dat. Additionally, the paper includes references from various articles on related subjects. Should there be any evidence of academic misconduct in this research paper, our group is committed to bearing full responsibility for any consequences at any level of punishment.

Ho Chi Minh City, 2023

Group 02

Chapter 2 Methodology and Proposed Research Models

3.4 L, R, F, M calculation

List of Tables


List of Figures

Figure 3.1 Select necessary attributes to calculate L, R, F, M values
Figure 3.4 Relationship between Monetary and Length
Figure 3.5 Chart Relationship between Monetary and Recency
Figure 3.6 Chart Relationship between Monetary and Frequency
Figure 3.3 The distribution after transforming and normalizing the data
Figure 4.1 Elbow table
Figure [number and caption illegible]
Figure 4.3 Silhouette score versus "K"
Figure 4.4 Clustering result
Figure 4.5 [caption illegible]
Figure 4.6 Average of L, R, F, M values for each cluster
Figure 4.7 Number of customers in each segment
Figure 4.8 Total Length of each segment
Figure 4.9 Total Recency of each segment
Figure 4.10 Total Frequency of each segment
Figure 4.11 Total Monetary of each segment
Figure 4.13 Description of Cluster 2 - Original Moderate Loyal Customers
Figure 4.14 Description of Cluster 3 - New Extreme Loyal Customers
Figure 4.15 Description of Cluster 4 - All-Time Extreme Loyal Customers
Figure 4.16 Description of Cluster 5 - All-Time Loyal Customers
Figure 4.17 Description of Cluster 6 - New Customers but low Loyalty

B2B | Business to Business


BUILDING IDEAS AND PROJECTS

In this step, we identify issues related to big data analysis and methods that help improve data analysis. We will use a company's sample customer data to analyze and draw conclusions using research methods over a selected period of time.

CHAPTER 1: THEORETICAL BACKGROUND AND RELATED WORK

Chapter 1 sets the conceptual foundation, concentrating on customer segmentation, the LRFM model, and numerous approaches and algorithms used by other research teams. Furthermore, the recommendations, methodologies, and limitations of pertinent research are provided.

CHAPTER 2: METHODOLOGY

The second chapter provides a clear framework for investigating the research topic by outlining the study design, data gathering methods, and analytical approaches. It introduces the LRFM model that will be used to drive data analysis, establishing the groundwork for later chapters.

CHAPTER 3: DATA UNDERSTANDING AND PREPARATION

The structure of the dataset is examined along with the important variables and their connections. The chapter then highlights the critical process of data cleaning and preprocessing in order to ensure data quality and dependability.

CHAPTER 4: EXPERIMENTAL RESULTS

Applying the K-means algorithm to a normalized dataset using the LRFM model aims to identify distinct customer clusters. Subsequent analysis of these clusters will inform strategic labeling, enabling the development of targeted marketing campaigns for the company.

CONCLUSION AND FUTURE WORK

In the competitive retail sector, our research identifies high-value customers using surveys and Python analysis with the LRFM model and K-means method. This approach offers a framework for effective, customer-centric strategies. However, limitations include dataset representativeness, sensitivity in K-means clustering, and challenges in achieving stable segmentation despite normalization efforts.

ABSTRACT

Targeting the right customers has always been a key strategy in increasing profit, and the retail company Adventure Works is no different. This research was conducted to ensure that its differentiated marketing strategies reach the appropriate customer segments. A 2017-2020 dataset of 121,253 records with 15 attributes of sales data was collected. This study proposes a customer loyalty segmentation in a retail context, wherein the clustering is performed using the Length-Recency-Frequency-Monetary (LRFM) model integrated with the K-means method. In the end, six clusters were found, but only five of them allowed a positive Loyalty Status assessment, labeled as: Original Extreme Loyal Customers, Original Moderate Loyal Customers, New Extreme Loyal Customers, All-Time Extreme Loyal Customers, and All-Time Loyal Customers. The clustering results yielded a Silhouette Coefficient score of 0.837. Based on the results of this segmentation, Adventure Works can strategically deliver tailored marketing to its clients and gradually strengthen its customer relations.

Keywords: LRFM model; K-means clustering; Elbow method; Silhouette score; customer segmentation; customer loyalty; marketing; retail industry


ABSTRACT (VIETNAMESE VERSION)

Targeting the right customers has always been an important strategy for increasing profit, and the retail company Adventure Works is no exception. This research was conducted to ensure that the company's differentiated marketing strategies reach the appropriate customer segments. A dataset from 2017 to 2020 with 121,253 records and 15 attributes related to sales data was collected. The research group then segmented customer loyalty levels in a retail context, with the clustering performed using the Length-Recency-Frequency-Monetary (LRFM) model integrated with the K-means method. The clustering produced six groups, but only five of them met the conditions for a positive Loyalty Status assessment; they were named, in order: Original Extreme Loyal Customers, Original Moderate Loyal Customers, New Extreme Loyal Customers, All-Time Extreme Loyal Customers, and All-Time Loyal Customers. This segmentation achieved a fairly impressive Silhouette Coefficient score of 0.837. Based on this result, Adventure Works can tailor its marketing strategy to better suit each of its customer segments, steadily strengthening the company's customer relationship management.

Keywords: LRFM model; K-means clustering algorithm; Elbow method; Silhouette score method; customer segmentation; customer loyalty; marketing; retail industry

to enhance market competitiveness, attract potential customers, and foster loyalty among its existing clients

Objectives

The study aims to provide an efficient customer segmentation model based on loyalty status using the LRFM (Length, Recency, Frequency, Monetary) model and the K-means algorithm. The in-depth analysis of each segment is meant to help Adventure Works:

- Identify customers' actual shopping behavior

- Comprehend customer diversity and capture the typical characteristics of each segment

- Enhance business decisions and develop more effective marketing and advertising campaigns for increasing profit


Objects and scopes

Objects

We investigate and analyze the purchasing behaviors and habits of Adventure Works' customers, derived from the retailer dataset: specifically, how long they have been purchasing from Adventure Works (AW), when they last bought something from AW, how often they ordered from AW, and how much they have spent on AW's products. In other words, their Length, Recency, Frequency, and Monetary value scores are to be examined and understood for insightful outputs.
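As a rough illustration of how these four quantities could be derived from order records, the sketch below uses only the Python standard library. It takes Length to be the number of days between a customer's first and last purchase (the definition given in Section 2.2.2). It further assumes that Recency is counted in days back from a reference date (here the end of the study's time scope), that Frequency is the number of distinct orders, and that Monetary is total spend. The row layout `(customer_id, order_id, order_date, amount)`, the function name `compute_lrfm`, and the sample rows are hypothetical and are not taken from the Adventure Works dataset.

```python
from collections import defaultdict
from datetime import date


def compute_lrfm(order_lines, reference_date):
    """Return {customer_id: {"L": ..., "R": ..., "F": ..., "M": ...}}.

    order_lines: iterable of (customer_id, order_id, order_date, amount).
    reference_date: the day Recency is measured from (an assumption of this sketch).
    """
    grouped = defaultdict(list)
    for customer_id, order_id, order_date, amount in order_lines:
        grouped[customer_id].append((order_id, order_date, amount))

    scores = {}
    for customer_id, lines in grouped.items():
        dates = [order_date for _, order_date, _ in lines]
        first_purchase, last_purchase = min(dates), max(dates)
        scores[customer_id] = {
            # Length: days between the first and the last purchase
            "L": (last_purchase - first_purchase).days,
            # Recency: days from the last purchase to the reference date
            "R": (reference_date - last_purchase).days,
            # Frequency: number of distinct orders (one order may span several lines)
            "F": len({order_id for order_id, _, _ in lines}),
            # Monetary: total amount spent
            "M": sum(amount for _, _, amount in lines),
        }
    return scores


# Hypothetical sample rows, not real Adventure Works data.
sample_lines = [
    ("C1", "SO1", date(2017, 7, 1), 100.0),
    ("C1", "SO2", date(2018, 7, 1), 50.0),
    ("C1", "SO2", date(2018, 7, 1), 25.0),
    ("C2", "SO3", date(2020, 6, 1), 300.0),
]
lrfm_scores = compute_lrfm(sample_lines, reference_date=date(2020, 6, 15))
for customer_id, values in sorted(lrfm_scores.items()):
    print(customer_id, values)
```

A lower R means a more recent purchase, so R is the only one of the four scores where smaller is better; this matters when the scores are later normalized and interpreted per cluster.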

Scopes

- Time scope: From July 1, 2017, to June 15, 2020

- Space scope: Buyers (except for Resellers) of retail products from AWC

Research method

The research aims to segment Adventure Works' customers based on loyalty status using the LRFM model and K-means algorithm, following the below set of steps:

- Data preprocessing: Irrelevant and inaccurate information is removed to ensure data cleanliness.

- Calculation of L, R, F, and M scores: The L (Length), R (Recency), F (Frequency), and M (Monetary) scores of each customer are calculated.

- Determination of the optimal number of clusters: Both the Elbow method and the Silhouette score are used to determine the optimal number of clusters.

- Model construction: The customer segmentation model is constructed using the K-means algorithm.

- Model evaluation: The Silhouette method is used to assess the effectiveness of the customer segmentation model.

- Analysing results, labeling clusters, and recommending strategies: An in-depth analysis of the segmented data is conducted to derive actionable insights and propose strategies to benefit the company.


Chapter 1 Theoretical Framework and Literature Review

In the introductory section of the first chapter, we lay the theoretical groundwork for our research, focusing on customer segmentation, the LRFM model, and the various techniques and algorithms employed by our research team. In addition, the proposals, methods, and limitations of relevant studies are also presented.

of new products, exploration of new markets, global marketing endeavors, and strategic decision-making

“Factors affecting the online shopping intention of Generation Z consumers in Vietnam” (Ta Van Thanh, Dang Xuan On, 2021) identifies and evaluates the impact of key factors affecting the online shopping intention of Generation Z consumers. Through quantitative research, including analysis of scale reliability, exploratory factor analysis, regression, and testing of model fit, the authors conclude that four factors (perceived usefulness, trust, perceived risk, and psychological safety) affect Generation Z's online shopping intention. They then draw conclusions and recommendations to help improve online shopping activities.

“Solutions to improve individual customer loyalty at Orient Commercial Joint Stock Bank (OCB)” (Nguyen Thanh Tuan, Bui Thi Thanh, 2022) identifies factors affecting individual customer loyalty, thereby proposing solutions to enhance customer loyalty. The study mainly combines qualitative and quantitative research methods to evaluate the current state of customer loyalty.

“Phu Quoc eco-tourism market segment” (Nguyen Tri Nam Khang, Duong Que Nhu, Chau My Lan, 2013) segments Phu Quoc eco-tourists according to demographic and behavioral criteria. The authors determined the number of distinct tourist groups in Phu Quoc, then selected a target group of tourists and stated the identifying characteristics of that group.

The study “A decision-making support system module for customer segmentation and ranking” (Yossi Hadad, Baruch Keren, 2022) proposes a modular decision support system that allows for complete customer classification and ranking. Modules are based on customer criteria with quantitative values that can be extracted from the business's organizational information system. By calculating customer scores based on measurable underlying criteria, the module can identify and classify customers (e.g. bronze, silver, gold, platinum), track changes over time, and allow for complete and accurate rankings. The proposed method saved 90% of the time and resources needed to prepare for customer portfolio management.

The study “Hybrid soft computing approach based on clustering, rule mining, and decision tree analysis for customer segmentation problem: Real case of customer- [...] Abolmakarem, 2018) uses a computational method combining clustering, rule extraction, and decision trees to predict new customer segments in customer-centric companies. First, the K-means algorithm is applied to cluster the company's previous customers based on their purchasing behavior. Next, a filtering-based hybrid feature selection and multi-attribute decision-making methods are proposed. Finally, on the basis of customer characteristics and using decision tree analysis, IF-THEN rules are extracted. This method is applied to predict profitable customers and map out the factors that most influence customers.

The study “A comparative dimensionality reduction study in telecom customer segmentation using deep learning and PCA” (Maha Alkhayrat, Mohamad Aljnidi, Kadan Aljoumaa, 2020) focuses on reducing the dimensionality of telecommunication datasets and performing customer clustering in the reduced and latent spaces to improve clustering quality. The initial dataset contains over 100,000 customers with 20 variables. By using principal component analysis and an autoencoder neural network to eliminate irrelevant features and noisy data, especially when the data is high-dimensional, this work has helped telecommunications companies achieve better results in grouping customers.

The study “Research on Customer Segmentation Based on the characteristics of shopping centers in Ho Chi Minh City” (Dinh Tien Minh, Le Vu Lan Anh, 2021) combined qualitative and quantitative methods with the K-means clustering method to segment customers at shopping centers in Ho Chi Minh City, Vietnam. The authors aimed to provide a reasonable basis for shopping centers to issue appropriate policies for their target customers. The results show three identified customer segments: entertainment-oriented buyers, buyers who follow practicality trends, and buyers who follow the agreeable trend.

The study “Application of clustering techniques and association rules to explore customer data using hotel services” (Nguyen Van Chuc, Dao Thi Giang, 2015) shows new features in applying clustering methods in the context of customer data mining. Based on a data mining model with two techniques, data clustering and association rule discovery, the authors built a web interface to support hotel managers' decision-making. The tool helps managers issue appropriate policies for each customer group and predict customer behavior in booking hotel services as well as tours.

The study “Overview of big data analysis in e-commerce” (Le Trieu Tuan and Ly Thu Trang, 2020) examines the benefits of big data analysis and proposes an analysis model to boost e-commerce businesses. The study takes a closer look at using big data to improve business performance. Drawing on big data analysis methods for both structured and unstructured data, it helps e-commerce businesses retain existing customers and attract more potential customers, in addition to improving overall quality and enhancing brand image.

1.3 K-means clustering

Clustering is a common unsupervised learning method in Data Mining (DM) that identifies classes or groups in a dataset. K-means clustering, a prominent clustering approach, separates data into distinct groups, ensuring similarity within each cluster and dissimilarity between clusters. It works by finding the center of each cluster in an unlabeled dataset, minimizing the sum of squared distances between objects and their cluster centers.

1.4 Traditional RFM

The three-value model of Recency, Frequency, and Monetary Value, first introduced in 1995, brought a fresh approach to classifying customers and has proved extremely effective because of its fit with the 80/20 principle (Bult & Wansbeek, 1995). The RFM method ranks each customer according to three factors: Recency, which shows how recent the customer's last purchase was; Frequency, which shows how often the customer purchased in a given period; and Monetary, which shows how much the customer spent in the given period. Many studies have successfully applied this model to categorize customers effectively, gaining in-depth and detailed insights about these customer groups and thereby enhancing the customer relationship with the business.

“Estimating customer lifetime value based on RFM analysis of customer purchase behavior: A case study” (Mahboubeh Khajvand, Kiyana Zolfaghar, Sarah Ashoori, and Somayeh Alizadeh, 2011) employs two distinct methodologies. In the first approach, the researchers use the RFM marketing analysis method (Recency, Frequency, and Monetary) for customer segmentation. In the second approach, they introduce an extended RFM analysis method, incorporating an additional parameter: Customer Lifetime Value (CLV). The CLV was calculated based on the weighted RFM method for each segment. This thorough analysis gives CLV results for the different segments, providing valuable insights for refining the company's marketing and sales strategies.

“Customer-Centric Sales Forecasting Model: RFM-ARIMA Approach” (Elhosseini, 2023) focuses on improving accuracy in sales forecasting with ARIMA after applying the RFM model. This study uses a large dataset from Global Superstore's Tableau, which includes information on multiple products, customer segments, geographic locations of purchases, revenue, profits, and more. The article presents a detailed study of the results of this customer-centric combination for sales forecasting using the RFM-ARIMA model. The study contributes to the field of sales forecasting by proposing a customer-centric approach that can be applied across a variety of industries and businesses to improve the accuracy of sales forecasts.

1.5 LRFM model

When it comes to categorizing customers, the combination of RFM models and Machine Learning (such as K-means) is highly effective. However, it does not take into account the length of a customer's relationship with the company. This is where the LRFM model comes in. Together with the optimal number of clusters determined by methods like the Elbow and Silhouette methods, businesses can optimize tailored marketing and services, making sure each segment reaches the highest level of satisfaction possible.

“LRFMV: An efficient customer segmentation model for superstores” (Rezwana Mahfuza, Nafisa Islam, Md Toyeb, Md Asaduzzaman Faisal Emon, Md Shahnur Azad Chowdhury, Md Golam Rabiul Alam; 2022) presents the LRFMV model, an improved version of the LRFM model that adds a new dimension, V, to represent the volume of products purchased. This allows the LRFMV model to identify customer segments with a clear profit-quantity relationship. The LRFMV model was compared to the RFM and LRFM models, and it was found to create more accurate customer segments with the same number of customers while maintaining a greater profit.

The study “Customer Segmentation Based on Loyalty Level Using K-Means and LRFM Feature Selection in Retail Online Store” (Tiara Lailatul Nikmah, Nur Hazimah Syani Harahap, Gina Cahya Utami, Muhammad Mirza Razzaq; 2023) focuses on identifying high-potential customer groups by analyzing retail online shop sales data. The LRFM feature selection method and the K-means data mining algorithm were used to segment customers into four categories: Premium Loyalty, Inertia Loyalty, Latent Loyalty, and No Loyalty. The Silhouette Score Index technique validated the clustering results, yielding a score of 0.943898. Businesses can use these insights to prioritize customer service and enhance sales.

“A New Approach for Customer Clustering by Integrating the LRFM Model and Fuzzy Inference System” (Ali Alizadeh Zoeram, AhmadReza Karimi Mazidi; 2018) presents an enhanced LRFM model for analyzing customer behavior and optimizing customer relationship management. Unlike traditional RFM models that overlook customer loyalty, the proposed LRFM model incorporates a loyalty dimension, enabling more accurate customer segmentation. The model leverages a fuzzy inference system to incorporate the LRFM indices and facilitate dynamic customer clustering, ensuring flexible and adaptive strategies. By analyzing customer attributes within each cluster, tailored marketing interventions can be devised to enhance customer engagement and drive business success.

1.6 Research GAP and Motivation

1.6.1 Summary Table of Previous Studies

Table 1 Previous Studies

No. | Study | Year | Method | Findings
 | [...] behavior: case study | | |
 | [...] | | | [...] tourist group and state the identifying characteristics of that group
 | [...] | | | [...] each customer group, and predict customer behavior in using hotel services
 | Hybrid soft computing approach based on clustering, [...] decision tree analysis [...] segmentation problem: A real case | 2018 | computational method combining [...] |
 | “A decision-making [...] | 2022 | Customer [...] | Track changes over time [...]
 | “Factors affecting the [...] Vietnam” | 2021 | Quantitative [...] | Assess the key factors [...]
 | [...] RFM Analysis Using K-Means Clustering | 2021 | | [...] marketing strategy in the FMCG sector, providing valuable information for [...]
 | [...] | | | [...] sales forecast accuracy
 | [...] Machine Learning [...] Application of RFM Analysis” | | | [...] online [...] contributing to the [...]
 | [...] | | | [...] method was utilized for [...] distinct consumer types. Clients were sorted into [...] Loyalty, Inertia Loyalty, Latent Loyalty, and No Loyalty
13 | “A New Approach for [...] the [...] and [...] LRFM Model [...] System” | 2018 | LRFM, CLV | The outcomes derived [...] proposed approach within a wholesale firm revealed [...] among clusters concerning the four LRFM indices. Consequently, this method proves to be effective for customer clustering and studying their distinctive [...]

1.6.2 Research GAP and Motivation

The above studies have shown that collecting, processing, and storing data brings a lot of value to retail businesses. Based on collected data and appropriate algorithms, we can deploy marketing and advertising strategies for target customer segments more effectively. The many proposed models mentioned above all aim to segment customers separately and specifically so that businesses can easily make reasonable business decisions accordingly. However, the customer segmentation methods of the above studies still do not support the most appropriate advertising and marketing campaigns: after dividing customers into clusters, the studies have not yet described the typical characteristics of each segment, which makes it difficult to choose suitable strategies for each specific segment.

To comprehensively explore and capture customer diversity, we applied data mining methods carefully, using various well-known data transformation and data normalization methods. We also added a “Length” feature to the traditional RFM model, enabling the extended model to distinguish between customers who may have similar RFM scores but differ in their long-term loyalty. This additional dimension allows for a more nuanced understanding of customer loyalty, enabling businesses to identify and reward customers who have demonstrated consistent loyalty over an extended period. Finally, we illustrated the clustered segmentation in many different ways, improving the comprehensiveness of each customer segment's analysis and making it easier to identify strategies for Adventure Works to deepen its appeal to customers.

Chapter 2 Methodology and Proposed Research Models

In this chapter, we establish the foundation of our research journey by outlining the research design, data collection methods, and analytical techniques.

[Flowchart: Goal: segment AW's customers by loyalty level -> Data collecting -> Data preprocessing (data cleaning, choosing features) -> Calculate L, R, F, M scores -> Data preprocessing (data transformation, data normalization) -> Determine the number of segments using the Elbow and Silhouette methods -> Building model: segmentation into "k" groups using the K-means algorithm -> Evaluate efficiency using Silhouette. If the Silhouette coefficient > 0: analyse results -> End. If the Silhouette coefficient <= 0: refine the number of clusters.]

Figure 2.1 Research Model

2.1 Research Method

The research starts with creating an initial customer segmentation model based on Adventure Works brand loyalty. We collect the essential dataset and ensure data cleanliness through normalization and transformation. After calculating the L, R, F, and M scores, we use the Elbow and Silhouette methods to determine the optimal number of clusters before applying that number in the clustering process. The Silhouette method assesses the effectiveness of the segmentation result. If the Silhouette coefficient is higher than 0, we proceed to labeling and analysis; otherwise, we refine the cluster number. Afterwards, in-depth analysis of the segmented data yields insights and strategies for Adventure Works' marketing. This iterative process ensures a finely tuned segmentation model for actionable recommendations.

2.2 Customer Segmentation Analysis

2.2.1 Data Preprocessing

Data preprocessing is a pivotal phase in the data mining process, encompassing tasks such as data cleaning, transformation, and integration. Its purpose is to refine data for analysis, aiming to enhance data quality and align it with the requirements of the specific data mining task. The common data procedures we include are:

- Data cleaning: This step involves identifying and rectifying errors, inconsistencies, and inaccuracies within a dataset. It is essential to ensure that the data is accurate and reliable for subsequent analysis, and it includes handling missing values, removing duplicates, and addressing outliers.

- Data transformation: This involves converting data into a suitable format for analysis. Common techniques in data transformation include normalization (scaling data to a common range), standardization (adjusting data to have a standard mean and variance), and discretization (converting continuous data into discrete categories). These transformations make the data more amenable to analytical techniques and modeling.

- Data normalization: Data normalization transforms data to a consistent range, typically within the bounds of 0 to 1 or -1 to 1. It is applied to mitigate disparities in data units and scales. Notable normalization techniques include min-max normalization, z-score normalization, and decimal scaling.
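To make the three normalization techniques named above concrete, here is a minimal standard-library sketch. It is an illustration only: the report does not state which technique or library it used, the function names are hypothetical, and the sample Monetary values are invented. The z-score variant divides by the population standard deviation, which is one common choice among several.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale to [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def z_score_normalize(values):
    """Centre on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def decimal_scaling_normalize(values):
    """Divide by the smallest power of 10 that brings every |v| below 1."""
    max_abs = max(abs(v) for v in values)
    exponent = 0
    while max_abs / 10 ** exponent >= 1:
        exponent += 1
    return [v / 10 ** exponent for v in values]


monetary = [100.0, 200.0, 300.0]  # hypothetical Monetary values
min_max_scaled = min_max_normalize(monetary)
z_scaled = z_score_normalize(monetary)
decimal_scaled = decimal_scaling_normalize(monetary)
print(min_max_scaled, z_scaled, decimal_scaled)
```

Because K-means relies on Euclidean distance, leaving Monetary in currency units next to Frequency in order counts would let the larger-scaled feature dominate the clustering, which is the disparity the normalization step is meant to remove.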

2.2.2 LRFM Data Modeling

The LRFM model we use is an extension of the well-known RFM model. Similar to RFM, it is used in feature selection after data preprocessing. Recency (R) refers to how recently a customer has made a purchase. It is typically measured by the customer's last purchase date. The idea is that more recent customers may have a higher likelihood of making another purchase. Frequency (F) represents how often a customer purchases within a given time frame. It is a measure of customer loyalty and engagement, as customers who buy frequently are often more loyal to the business. Monetary Value (M) refers to the total amount of money a customer has spent on purchases during a specified period. It helps identify high-value customers who contribute significantly to the business's revenue.

In addition to the three existing variables, we add a new variable called Length. Length (L) is the number of days between a customer's first and last purchase. Because Length represents the duration of the customer's association with the business, the LRFM model is more advantageous for loyalty segmentation. The LRFM approach assigns value based not only on the Recency, Frequency, and Monetary factors but also on the length of the customer's engagement, thereby enhancing its effectiveness in identifying and categorizing loyal customer segments.

2.2.3 Elbow method

The Elbow method is employed to ascertain the optimal number of clusters in a dataset by minimizing the total variation or squared distances within the clusters. This method relies on evaluating the Sum of Squared Errors (SSE) values. SSE serves as a validation measure for clusters by calculating the sum of squares of each cluster member's distance to its center. The Elbow method identifies the point at which the SSE values form an "elbow" in the graph, indicating the optimal number of clusters. The SSE formula is as follows:
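In its usual textbook form (the symbols here are our own notation and may differ from the report's original), for $K$ clusters $C_1, \dots, C_K$ with centroids $c_1, \dots, c_K$:

$$\mathrm{SSE} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - c_k \rVert^2$$

The short sketch below evaluates this quantity on a toy two-dimensional dataset for three hand-made partitions (k = 1, 2, 3). The points and partitions are invented for illustration and are not K-means output; the function names are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def sse(clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for members in clusters:
        center = centroid(members)
        for point in members:
            total += sum((x - c) ** 2 for x, c in zip(point, center))
    return total


left = [(1, 1), (1, 2), (2, 1)]
right = [(8, 8), (8, 9), (9, 8)]

# Hand-made partitions for k = 1, 2, 3 (illustrative, not produced by K-means).
partitions = {
    1: [left + right],
    2: [left, right],
    3: [left, right[:2], right[2:]],
}
sse_by_k = {k: sse(clusters) for k, clusters in partitions.items()}
for k, value in sse_by_k.items():
    print(k, round(value, 2))
```

On this toy data the SSE falls sharply from k = 1 to k = 2 and only slightly from k = 2 to k = 3, so the "elbow" sits at k = 2. In a real analysis each partition would come from running K-means for that k.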


1) Updating Centroids: The algorithm recalculates the centroids of each cluster by determining the mean of all data points assigned to that cluster These recalculated centroids serve as the updated cluster centers Conceptually, this step involves identifying the "average location” of all members within a cluster and designating it as the new center of the cluster

The algorithm then evaluates whether the centroids have undergone substantial or minimal movement. If the centroids exhibit minimal displacement from the previous iteration, indicating limited changes in the groups, the algorithm ceases execution due to convergence. Conversely, if significant movement is detected, K-means initiates another iteration of centroid recalculation and convergence checking.

The process of iteratively updating centroids and assessing convergence, resembling a gradual refinement of the clusters until they stabilize, is repeated multiple times. K-means continues these iterations until either the centroids cease to exhibit significant movement or a predetermined number of iterations is reached.
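The update-and-check loop described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the report's implementation: it assumes the usual K-means assignment step (each point goes to its nearest centroid by Euclidean distance), takes the starting centroids as an argument instead of choosing them randomly, and the names `kmeans`, `assign`, `tolerance`, and `max_iterations` are hypothetical.

```python
from math import dist


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]


def kmeans(points, initial_centroids, max_iterations=100, tolerance=1e-9):
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            # New centroid = mean of the members; an empty cluster keeps its old one.
            updated.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
                if members else old
            )
        # Largest distance any centroid moved in this iteration.
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tolerance:  # centroids barely moved: converged
            break
    return assign(points, centroids), centroids


# Toy data; both starting centroids sit in the same blob to show them moving apart.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centroids = kmeans(points, initial_centroids=[(1, 1), (2, 1)])
print(labels, centroids)
```

The loop has two exits, matching the two stopping conditions in the text: the `tolerance` test covers "centroids cease to exhibit significant movement", and `max_iterations` covers the predetermined iteration cap.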

2.2.5 Silhouette method

The Silhouette method is a valuable technique for evaluating the quality of clustering in unsupervised machine learning. It provides a quantitative measure of how well-separated and cohesive the clusters are within a dataset. This method calculates a silhouette score for each data point, taking into account its distance to the other data points within the same cluster and to the nearest neighboring cluster. The Silhouette score ranges from -1 to 1, where a higher score indicates that the data point is appropriately clustered. A score close to 1 implies that the data point is well-matched to its cluster and significantly separated from neighboring clusters, while a score near 0 suggests it is on or very close to the decision boundary between clusters. The formula of the Silhouette score:
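In the standard definition (again in our own notation, which may differ from the report's original), the silhouette value of a point $i$ is

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$

where $a(i)$ is the mean distance from $i$ to the other points in its own cluster and $b(i)$ is the smallest mean distance from $i$ to the points of any other cluster. The overall Silhouette score is the average of $s(i)$ over all points. The sketch below computes it by hand on toy data; the function name and data are hypothetical, and assigning 0 to points in singleton clusters is a common convention that we assume here.

```python
from math import dist


def silhouette_values(points, labels):
    """Per-point silhouette values; requires at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    values = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            values.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so it does not affect the sum)
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        values.append((b - a) / max(a, b))
    return values


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
good = silhouette_values(points, [0, 0, 0, 1, 1, 1])  # labels follow the two blobs
bad = silhouette_values(points, [0, 1, 0, 1, 0, 1])   # labels mix the two blobs
mean_good = sum(good) / len(good)
mean_bad = sum(bad) / len(bad)
print(round(mean_good, 3), round(mean_bad, 3))
```

The labeling that follows the two blobs scores far higher than the mixed one, which is the property the research flow relies on when it accepts a clustering only if its Silhouette coefficient is above 0.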
