THE UNIVERSITY OF ECONOMICS, UNIVERSITY OF DANANG
FACULTY OF E-COMMERCE
ASSIGNMENT
Subject: E-BUSINESS
BI IN CRM
Lecturer: Dr. Lê Diên Tuấn
Members : Võ Nguyễn Quỳnh Trân
Đoàn Nguyên Minh Tuấn
Phạm Thị Quỳnh Như
Nguyễn Khoa Thuỳ Linh
Đà Nẵng, April 2024
TABLE OF CONTENTS
THEORY
I Introduction
II Literature review
2.1 Benefits/Goals of Business Intelligence
2.2 Applications of Business Intelligence in CRM
2.3 Business Intelligence integrated into CRM framework
2.4 Previous Studies and Future Directions
III Theory of research methods
4.3.1 Review the data
4.3.2 Check descriptive statistics
4.3.3 Delete null data
4.3.4 Boxplot data description
4.4 Convert data to RFM
4.5 Convert data back into a range
4.6 Build a model based on K-means
This report reviews the concepts behind RFM, normalization, and K-Means, and defines an approach to using them in CRM within business organizations. The study is qualitative and examines the benefits of integrating CRM with BI. It empirically tests a conceptual model based on the success factors and benefits of that integration. The findings show that the benefits and goals of CRM and BI complement each other, as both help to understand customer behavior and enhance customer satisfaction.
Keywords: Business Intelligence, Customer Relationship Management, K-Means, RFM, Normalization
I Introduction
Business intelligence (BI) is a process that encompasses the collection, storage, analysis, and provision of data to help organizations make informed decisions. BI systems can offer real-time or near-real-time data, which can be used to boost operational efficiency, anticipate and monitor financial performance, and understand and react to customer behavior.
CRM, which stands for Customer Relationship Management, is a business strategy aimed at enhancing customer satisfaction and fostering loyalty. CRM systems are used to manage customer data, track customer interactions, and automate sales and marketing processes.
II Literature review
2.1 Benefits/Goals of Business Intelligence
BI offers benefits such as the following (Chaudhuri et al., 2016; Chen & Popovich, 2017; Watson & Wixom, 2018):
● Reducing costs by avoiding redundant data management methods
● Timely delivery of data
● Better decisions through advanced analytics
● Enabling business strategies (for example, organizational transformation, identification of new customers, and customer retention)
● Trusted data, made available through BI tools
● Management of business performance, for example through scorecards and dashboards
BI goals (Thomas H. Davenport and Jeanne G. Harris, "The Role of Business Intelligence in Today's Business Environment", 2006):
● The main objective of BI is to provide businesses with the information they need to make better decisions.
● BI helps businesses better understand the market, their customers, and their own operations.
● BI helps businesses identify new opportunities and challenges.
2.2 Applications of Business Intelligence in CRM
BI tools and techniques enable organizations to collect, analyze, and use customer data effectively. By integrating BI into CRM processes, businesses can improve customer segmentation, personalized marketing campaigns, and predictive analytics to drive customer satisfaction and loyalty. The ability to extract actionable insights from data lets companies make informed decisions and tailor their offerings to customer needs.
2.3 Business Intelligence integrated into CRM framework
2.4 Previous Studies and Future Directions
Previous Studies:
Previous studies have explored the use of cluster analysis in conjunction with Business Intelligence (BI) approaches to develop Customer Relationship Management (CRM) methodologies for assessing customer loyalty. Research by Lee et al. (2017) demonstrated how cluster analysis techniques could segment customers based on their purchasing behavior, preferences, and engagement with the brand. By using BI tools to analyze these customer segments, businesses were able to gain insight into the drivers of customer loyalty and develop targeted strategies to improve customer retention.
Future Directions:
Moving forward, there are several potential future directions for cluster analysis using BI approaches to further advance CRM methodologies for assessing customer loyalty.
One direction could involve incorporating advanced machine learning algorithms, such as neural networks or decision trees, into the cluster analysis process to improve the accuracy of customer segmentation and loyalty prediction. Additionally, integrating real-time data streams from various sources, such as social media platforms or IoT devices, could enable businesses to create dynamic customer loyalty models that adapt to changing customer behavior and preferences.
Furthermore, the integration of predictive analytics capabilities within BI tools could help businesses forecast customer loyalty trends and proactively identify at-risk customers before they churn. By combining cluster analysis with predictive modeling, organizations can not only assess current customer loyalty but also anticipate future loyalty levels and tailor their CRM strategies accordingly.
Overall, the future of cluster analysis using BI approaches in CRM methodologies to assess customer loyalty lies in applying advanced analytics techniques, integrating diverse data sources, and focusing on predictive capabilities to drive personalized customer engagement and long-term loyalty.
III Theory of research methods
3.1 RFM
3.1.1 Definition
RFM analysis is a marketing technique used to quantitatively rank and group customers based on the recency, frequency, and monetary total of their recent transactions, in order to identify the best customers and run targeted marketing campaigns. The system assigns each customer numerical scores based on these factors to provide an objective analysis. RFM analysis is based on the marketing adage that "80% of your business comes from 20% of your customers."
RFM analysis ranks each customer on the following factors:
- Recency: How recent was the customer's last purchase? Recency captures the freshness of customer activity, be it purchases or visits. Customers who recently made a purchase will still have the product on their mind and are more likely to purchase or use the product again. Businesses often measure recency in days, but, depending on the product, they may measure it in years, weeks, or even hours.
- Frequency: How often did this customer make a purchase in a given period? Customers who purchased once are often more likely to purchase again. Additionally, first-time customers may be good targets for follow-up advertising to convert them into more frequent customers.
- Monetary: How much money did the customer spend in a given period? Customers who spend a lot of money are more likely to spend money in the future and have a high value to a business.
3.1.2 Calculation formula
R = the date of the customer's last purchase minus the date of the customer's first transaction
F = the total number of purchases divided by the time between the first and last purchases
M = the sum of all payments made by the customer
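The three formulas above can be sketched in Python using only the standard library. This is a minimal illustration, not the report's actual implementation: the function name `compute_rfm`, the `(customer_id, purchase_date, amount)` tuple layout, and the sample rows are hypothetical. The sketch also assumes that the span between first and last purchase is counted in days and floored at one day, so that one-time buyers do not cause a division by zero. Note that R as defined here is the span between the first and last purchase; many RFM treatments instead measure the number of days since the last purchase relative to a reference date.

```python
from collections import defaultdict
from datetime import date


def compute_rfm(transactions):
    """Return {customer_id: (R, F, M)} following the formulas in 3.1.2.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    """
    by_customer = defaultdict(list)
    for customer_id, purchase_date, amount in transactions:
        by_customer[customer_id].append((purchase_date, amount))

    rfm = {}
    for customer_id, rows in by_customer.items():
        dates = [d for d, _ in rows]
        # R: last purchase date minus first purchase date, in days.
        r = (max(dates) - min(dates)).days
        # F: number of purchases divided by that span.
        # Assumption: floor the span at 1 day to avoid dividing by zero.
        f = len(rows) / max(r, 1)
        # M: sum of all payments.
        m = sum(amount for _, amount in rows)
        rfm[customer_id] = (r, f, m)
    return rfm


# Hypothetical sample data, for illustration only.
sample = [
    ("C1", date(2011, 1, 1), 20.0),
    ("C1", date(2011, 1, 11), 30.0),
    ("C2", date(2011, 3, 5), 15.0),
]
rfm_scores = compute_rfm(sample)
```

Grouping by customer first keeps each of the three measures a one-line computation per customer.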
3.2 Data Scaling Method
Features are sometimes measured in different units, or two components of a data vector have very different ranges.
For example, one component ranges from 0 to 1000, while another ranges only from 0 to 1.
→ In this case, we need to normalize the data before performing the next steps.
- Each value is normalized according to the following formula:
x’ = (x - min) / (max - min)
For example, suppose the maximum value of a feature in a data set is 30 and the minimum value is -10. The value 18.8 is then normalized as follows:
x’ = (18.8 - (-10)) / (30 - (-10))
x’ = 28.8 / 40
x’ = 0.72
If the x value lies outside the minimum and maximum values, the result will not fall within the range 0 to 1. If the max and min values have been fixed in advance and a data point lies outside that range, we can remove that point from the dataset.
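The min-max formula and the out-of-range rule can be sketched as follows. The function names `min_max_normalize` and `normalize_with_fixed_bounds` are hypothetical, and dropping out-of-range points is shown only as one way to apply the rule described above.

```python
def min_max_normalize(x, min_value, max_value):
    """Scale x with x' = (x - min) / (max - min)."""
    if max_value == min_value:
        raise ValueError("max_value and min_value must differ")
    return (x - min_value) / (max_value - min_value)


def normalize_with_fixed_bounds(values, min_value, max_value):
    """Normalize values, dropping points outside [min_value, max_value]."""
    kept = [v for v in values if min_value <= v <= max_value]
    return [min_max_normalize(v, min_value, max_value) for v in kept]


# Worked example from the text: min = -10, max = 30, x = 18.8.
scaled = min_max_normalize(18.8, -10, 30)
print(round(scaled, 2))

# With fixed bounds, 45 lies outside [-10, 30] and is removed.
bounded = normalize_with_fixed_bounds([-10, 10, 30, 45], -10, 30)
```

Without the removal step, the value 45 would map to 1.375, which is outside the range 0 to 1.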
3.3 K-Means
3.3.1 Definition
K-Means is used to cluster data, grouping it into K different clusters such that the points within each cluster are similar to one another.
K-Means clustering is an unsupervised machine learning technique used to divide data into different groups. The algorithm assigns data points to clusters so that the total distance between the points and their cluster centers is minimized.
3.3.2 What is the clustering method with K-Means used for?
Clustering method with K-Means is used to divide data into groups (clusters) based on similar characteristics between data points Specifically, using the K-Means algorithm can help in the following purposes:
● Data classification: K-Means can help classify data into similar groups based on common characteristics
● Data analysis: By clustering data, users can more easily analyze characteristics and trends in the data
● Data compression: K-Means can help compress data by representing data using cluster centers instead of the original data
● Anomaly detection: By comparing data points with cluster centers, K-Means can help detect data points that are unusual or do not belong to any cluster
● Segmentation and marketing: In the marketing field, K-Means can help segment customers based on purchasing behavior, interests, or other characteristics
3.3.3 How does K-Means run?
Step 1: Determine an initial number K:
● Method 1: Select K using the Elbow method
The Elbow method is a graphical method for finding the optimal K value in a k-means clustering algorithm. The elbow graph shows the within-cluster sum of squares (WCSS, abbreviated WSS below) on the y-axis against the different values of K on the x-axis. The optimal K value is the point at which the graph forms an elbow.
For example: let K run from 1 to 9 and calculate the WSS for each value.
- Choose the K at which increasing K by 1 decreases the WSS by only an insignificant amount → in this example, K = 4 can be chosen
How to calculate WSS
$$\mathrm{WSS} = \sum_{i=1}^{n} (x_i - c)^2$$
$x_i$ is a data point in your cluster
$c$ is the center of your cluster
You calculate this squared difference for all $n$ data points in your cluster and sum them up.
How does the Elbow method work with WSS?
Step 1: Run k-means clustering on your data for a range of cluster counts (k values)
Step 2: Calculate the total WSS for each k
Step 3: Create a chart with the number of clusters on the x-axis and the corresponding total WSS on the y-axis
Step 4: Identify the "elbow", the point beyond which adding another cluster no longer gives a noticeably better fit to the data
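The WSS calculation used by the Elbow method can be sketched as below. The helper names `centroid`, `wss`, and `total_wss`, and the toy clusters, are hypothetical. The sketch assumes that each point is a tuple of numbers and that the squared difference is the squared Euclidean distance.

```python
def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def wss(points, center):
    """Sum of squared distances from each point to the cluster center."""
    return sum(
        sum((xi - ci) ** 2 for xi, ci in zip(point, center))
        for point in points
    )


def total_wss(clusters):
    """Total WSS over all clusters; this is the y-value of the elbow chart."""
    return sum(wss(points, centroid(points)) for points in clusters)


# Hypothetical clustering with two clusters of two points each.
clusters = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(10.0, 10.0), (10.0, 12.0)],
]
total = total_wss(clusters)
```

To draw the elbow chart, this total would be computed once for each candidate K and plotted against K.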
● Method 2: Select K using the Silhouette Coefficient
Silhouette Score = (b - a) / max(a, b)
Where:
a is the average distance from the current point to all other points in its own cluster
b is the average distance from the current point to all points in the nearest other cluster
The Silhouette Score ranges over [-1, 1]; the closer it is to 1, the better the clustering
A negative value indicates that the point has probably been assigned to the wrong cluster
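The silhouette score of a single point can be sketched as follows, assuming Euclidean distance. The function names and the toy data are hypothetical. Returning 0 for a point that is alone in its cluster follows the usual convention and is not stated in the text above.

```python
import math


def mean_distance(point, others):
    """Average Euclidean distance from point to each point in others."""
    return sum(math.dist(point, other) for other in others) / len(others)


def silhouette(point, own_others, other_clusters):
    """Silhouette score (b - a) / max(a, b) for one point.

    own_others: the remaining points of the cluster that contains `point`.
    other_clusters: list of the other clusters (each a list of points).
    """
    if not own_others:
        # Convention: a point alone in its cluster scores 0.
        return 0.0
    # a: average distance to the other points of its own cluster.
    a = mean_distance(point, own_others)
    # b: average distance to the points of the nearest other cluster.
    b = min(mean_distance(point, cluster) for cluster in other_clusters)
    return (b - a) / max(a, b)


# Hypothetical example: a = 2 and b = 10.
score = silhouette((0.0, 0.0), [(0.0, 2.0)], [[(10.0, 0.0), (10.0, 0.0)]])
```

To select K, the scores of all points would be averaged for each candidate K, and the K with the highest average would be preferred.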
Step 2: Randomly select K points in the data set to be the K central points of the K clusters
Step 3: Assign each remaining point to the cluster whose center is nearest
Step 4: Calculate the new center of each cluster by averaging the points assigned to it
Step 5: Repeat steps 3 and 4 until the cluster centers no longer change
The distance between a data point $x_i$ and a cluster center $C_j$ most commonly used in the K-Means algorithm is the Euclidean distance:
$$d(x_i, C_j) = \sqrt{\sum_{k=1}^{m} (x_{ik} - C_{jk})^2}$$
where $m$ is the number of features.
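Steps 2 to 5 can be sketched in plain Python as below, using Euclidean distance, the usual choice for K-Means. This is an illustrative sketch, not the report's implementation: the function name `kmeans`, the parameters `max_iter` and `seed`, and the toy points are assumptions. Keeping the previous center when a cluster becomes empty is one simple way to handle that case.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of numeric tuples into k groups.

    Returns (centers, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    # Step 2: pick k data points at random as the initial centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 3: assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Step 5: unchanged assignments mean the centers no longer move.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Hypothetical data: two well-separated groups of three points.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]
centers, labels = kmeans(points, k=2)
```

Because the initial centers are random, the cluster numbering can differ between seeds even when the grouping itself is the same.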
PRACTICE
IV Model application
4.1 Introduction
- Data: Online Retail is the data of a UK-based online retail store, containing cross-border sales transactions from December 1, 2010 to December 9, 2011
- Objective: segment customers based on RFM using the K-means method, so that the company can target customers and make the right decisions in managing its relationships with them
- Implementation process:
Step 1: Read and understand the data source
Step 2: Clean and process data
Step 3: Convert data to calculate RFM
Step 4: Normalization
Step 5: Build a model based on the K-means algorithm
4.2 Read and understand data sources
- The data includes 8 columns and 541,909 rows
- Data description:
+ InvoiceNo: Invoice code. A 6-digit integer uniquely assigned to each transaction. If the code begins with the letter 'c', it represents a canceled transaction
+ StockCode: Product (item) code. A 5-digit integer uniquely assigned to each individual product
+ Description: Product name
+ Quantity: Quantity of each product (item) in a transaction
+ InvoiceDate: Date and time of invoice creation
+ UnitPrice: Retail price of each product
+ CustomerID: Customer code. A 5-digit integer uniquely assigned to each customer
+ Country: Name of the country where the customer resides
4.3 Data preprocessing
4.3.1 Review the data
The dataset has 541,909 rows and 8 columns. However, the Description column has only 540,455 non-null values and the CustomerID column only 406,829, so both columns contain null values.
4.3.2 Check descriptive statistics
The Quantity column has Min = -80995 and Max = 80995, while Mean = 9.55. These extremes lie far from the mean, so the data contains outliers.
→ The Description column has 1,454 null rows and the CustomerID column has 135,080 null rows
4.3.3 Delete null data
→ Proceed to delete rows with null values. RFM depends on distinguishing individual customers, so the CustomerID column is indispensable. Missing CustomerID values cannot be replaced with an average, because CustomerID is an identifier used to tell customers apart, not a measurement.
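Counting and deleting null rows can be sketched as below. The sketch assumes that each row is a dictionary keyed by column name and that a missing value appears as `None` or an empty string. The helper names and the sample rows are hypothetical and are not taken from the dataset.

```python
def is_missing(value):
    """Treat None and the empty string as null (an assumption)."""
    return value is None or value == ""


def count_nulls(rows, column):
    """Number of rows whose value in `column` is missing."""
    return sum(1 for row in rows if is_missing(row.get(column)))


def drop_null_rows(rows, columns):
    """Keep only rows that have a value in every listed column."""
    return [
        row for row in rows
        if not any(is_missing(row.get(col)) for col in columns)
    ]


# Hypothetical rows, for illustration only.
rows = [
    {"InvoiceNo": "A1", "Description": "ITEM ONE", "CustomerID": "10001"},
    {"InvoiceNo": "A2", "Description": None, "CustomerID": None},
    {"InvoiceNo": "A3", "Description": "ITEM TWO", "CustomerID": ""},
]
null_customers = count_nulls(rows, "CustomerID")
clean_rows = drop_null_rows(rows, ["CustomerID", "Description"])
```

Rows are dropped rather than filled because there is no meaningful replacement value for a missing customer identifier.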
4.3.4 Boxplot data description
After removing the rows with null values, the remaining data has 406,829 rows and 8 columns.
Using a boxplot to check for unusual points, we can see that the unusual points of the Quantity column are values greater than 60,000 and the unusual points of the UnitPrice column are values greater than 3,000.
a. Check the rows with Quantity > 60000
After checking, we find that the quantity 74,215 is recorded twice at the same time. This unusual data may have been added by someone testing the system, or entered by mistake, and is not actual customer purchase data. It does not affect the analysis, so there is no need to delete it.
b. Check the rows with UnitPrice > 3000
After checking the abnormal values of the UnitPrice column, we found negative Quantity values, specifically 8,905 rows. As the Description column indicates, these may be items returned by the buyer. Customer returns affect the total revenue that customers bring to the company, so these abnormal rows are not removed.
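The boxplot check can also be approximated numerically. Boxplots conventionally flag points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that rule with the standard library. The function name `boxplot_outliers` and the sample values are hypothetical, and plotting libraries may compute quartiles slightly differently, so the flagged points may not match a given chart exactly.

```python
import statistics


def boxplot_outliers(values, whisker=1.5):
    """Return the values lying beyond the boxplot whiskers.

    A value is flagged if it is below Q1 - whisker * IQR
    or above Q3 + whisker * IQR.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical quantities with one extreme value.
quantities = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
flagged = boxplot_outliers(quantities)
```

Flagged values still need manual review, as in steps a and b above, because an extreme value can be a legitimate transaction such as a return.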
4.4 Convert data to RFM
R = the date of the customer's last purchase minus the date of the customer's first transaction
F = the total number of purchases divided by the time between the first and last purchases
M = the sum of all payments made by the customer
Result: