Introduction
As society and the economy evolve, the demand for efficient transportation continues to rise, with the aviation industry emerging as a preferred option for long-distance travel and time-sensitive commuters. Airlines strive to offer not only safe and convenient flights but also high-quality service, which serves as a key competitive advantage. This focus on exceptional customer experience is particularly important as their clientele often consists of middle-class individuals and above, who prioritize service excellence during their travels.
Airline customer service encompasses the range of activities and services provided by airlines to enhance the passenger experience throughout their journey. This includes pre-flight services like seamless booking and efficient check-in, as well as in-flight offerings such as comfortable seating, entertainment options, and various other amenities aimed at improving overall satisfaction.
Analyzing customer reviews and satisfaction with airline services is crucial for airlines to gain insights into their customers' needs and preferences. Understanding whether the services offered align with passenger expectations allows airlines to evaluate customer attitudes and identify areas for improvement. By addressing these insights, airlines can enhance their offerings to better meet the demands of their passengers.
To effectively assess customer satisfaction ratings for flight services, we utilized the Orange tool to analyze extensive data from various sources. By employing K-means clustering and linear regression, we identified key factors influencing customer satisfaction throughout the flight journey. This analysis not only enables a comprehensive evaluation of airline performance but also guides future strategies to enhance and maintain high levels of customer satisfaction.
Data overview
The data set, obtained from Kaggle, contains 4,000 responses that combine customer information with each customer's evaluation of the flight service experience. The fields are:
Gender: Gender of the passenger (Female, Male)
Customer Type: The customer type (Loyal customer, disloyal customer)
Age: The actual age of the passenger
Profession: The passenger's current job
Annual income: The annual income of the passenger
Spending Score: The spending score of the passenger (1-100)
Type of Travel: Purpose of the passenger's flight (Personal Travel, Business Travel)
Class: Travel class of the passenger in the plane (Business, Eco, Eco Plus)
Flight distance: The flight distance of the journey
Inflight wifi service: Satisfaction level with the inflight wifi service (0: Not Applicable; 1-5)
Departure/Arrival time convenient: Satisfaction level with the convenience of the departure/arrival time
Ease of Online booking: Satisfaction level with online booking
Gate location: Satisfaction level with the gate location
Food and drink: Satisfaction level with food and drink
Online boarding: Satisfaction level with online boarding
Seat comfort: Satisfaction level with seat comfort
Inflight entertainment: Satisfaction level with inflight entertainment
On-board service: Satisfaction level with on-board service
Leg room service: Satisfaction level with leg room service
Baggage handling: Satisfaction level with baggage handling
Check-in service: Satisfaction level with check-in service
Inflight service: Satisfaction level with inflight service
Cleanliness: Satisfaction level with cleanliness
Departure Delay in Minutes: Minutes of delay at departure
Arrival Delay in Minutes: Minutes of delay at arrival
Satisfaction: Airline satisfaction level (Satisfaction, neutral or dissatisfaction)
Research method
Clustering using K-means
The K-means++ algorithm enhances the traditional K-means method by addressing the limitations of random centroid initialization. By strategically selecting centroids that are statistically closer to the actual data centers, K-means++ improves the quality of clustering. While the subsequent steps remain consistent with the standard K-means approach, the key innovation lies in its informed centroid initialization. Thus, K-means++ can be viewed as an upgraded version of K-means, featuring a more effective method for determining initial centers of gravity.
The K-means++ technique effectively addresses the challenges of determining initial cluster centroids in K-means, as highlighted in Shindler's review of various clustering algorithms.
K-Means is an unsupervised learning algorithm that partitions a given dataset into a fixed number of clusters (K clusters) by defining K centroids, one for each cluster. To ensure better results, the centroids are placed far away from each other.
3.1.2 The idea of K-means++ algorithm
The K-means++ algorithm selects the k initial centroids from the data points being clustered as follows:
1 Choose the first centroid uniformly at random from the data points.
2 Calculate D(x), the distance between each unselected data point x and the closest chosen centroid.
3 Using a weighted probability distribution, choose a new data point as a new centroid, where a point x is chosen with probability proportional to D(x)^2.
4 Repeat steps 2 and 3 until all k centroids have been chosen.
5 After choosing the initial centroids, continue clustering with the conventional K-means algorithm.
The K-means++ algorithm was created to overcome the problem of picking the initial centroids at random, because the final clustering outcome depends on the initial cluster centroids.
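The seeding procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the code Orange runs internally: the function name `kmeans_pp_init`, the use of Euclidean distance, and the `seed` parameter are assumptions made for the example, and the data set is assumed to contain at least k distinct points.

```python
import math
import random


def kmeans_pp_init(points, k, seed=None):
    """Pick k initial centroids from `points` using K-means++ seeding.

    Hypothetical helper for illustration; assumes Euclidean distance and
    at least k distinct points in `points`.
    """
    rng = random.Random(seed)
    # Step 1: the first centroid is chosen uniformly at random.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Step 2: squared distance D(x)^2 from each point to its nearest
        # already-chosen centroid.
        weights = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        # Step 3: sample the next centroid with probability proportional
        # to D(x)^2. Points already chosen have weight 0.
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


# Illustrative points (not taken from the survey data).
pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
print(kmeans_pp_init(pts, 2, seed=0))
```

Because far-away points receive large weights, the second centroid tends to land in a different region of the data than the first, which is the behaviour the text attributes to K-means++.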
3.1.3 Details of the K-means++ algorithm
K-means++ initialization relies on D(x), the shortest distance between a data point and its nearest already-selected centroid. The initialization steps of K-means++ are:
● Step 1: The first centroid c1 is chosen uniformly at random from the collection of m points in the data.
● Step 2: For each point xi in the dataset, calculate its distance to the nearest of the centroids selected so far:
di = min over j = 1..k of ||xi - cj||
where di is the distance from the point xi to its nearest selected centroid and k is the number of centroids selected so far.
● Step 3: Choose a new center xi with probability proportional to di^2.
● Step 4: Repeat steps 2 and 3 until all K centroids have been found.
After finding the K centroids, we continue to divide the clusters using the standard K-means algorithm as in Section 3, Part III above.
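Once initial centroids are available, the standard K-means iterations alternate between assigning points to the nearest centroid and recomputing each centroid as the mean of its points. The sketch below illustrates that loop under stated assumptions: the function name `kmeans`, Euclidean distance, and the `max_iter` cap are choices made for this example, and the initial centroids are passed in however they were obtained.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Run standard K-means iterations from the given initial centroids.

    Illustrative sketch; returns the final centroids and one label per point.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels


# Illustrative points and starting centroids (not taken from the survey data).
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
final_centroids, labels = kmeans(points, [(0, 0), (10, 0)])
print(final_centroids, labels)
```

The loop stops as soon as an update leaves every centroid unchanged, so the number of iterations depends on how good the starting centroids are.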
Linear Regression method
Linear Regression is a statistical technique used to predict the relationship between a dependent variable and one or more independent variables. This method aims to identify the best-fit line that accurately represents the connection between these variables.
Regression analysis serves two primary purposes: making predictions and identifying causal relationships between independent and dependent variables. While it overlaps with machine learning in forecasting, it's crucial to understand that regression only indicates relationships within a defined set of variables and does not establish definitive causal links.
Simple Linear Regression is the fundamental type of linear regression, which aims to identify the straight line that best fits a given dataset. The model is represented by the equation y = b0 + b1*x, where y is the dependent variable, x is the independent variable, b0 is the y-intercept, and b1 is the slope of the line.
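As a minimal sketch of how b0 and b1 can be estimated, the example below fits y = b0 + b1*x by ordinary least squares. Least squares is assumed here as the fitting criterion, the function name `fit_simple_linear_regression` is hypothetical, and the sample data are made up for illustration rather than taken from the survey.

```python
def fit_simple_linear_regression(x, y):
    """Return (b0, b1) minimising the squared error of y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Illustrative data lying exactly on y = 1 + 2x.
b0, b1 = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)
```

With more than one independent variable the same idea applies, but the coefficients are solved jointly rather than with these two closed-form expressions.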
Linear regression is a powerful statistical tool that enables the prediction of future values of a dependent variable, elucidates the relationship between independent and dependent variables, and identifies key predictors within a dataset. Its applications span various fields, including finance, economics, social sciences, and engineering.
Linear regression analysis utilizes the regression coefficient to assess how independent variables influence a dependent variable within a linear model. This coefficient represents the average change in the dependent variable for each unit change in the independent variable. A positive coefficient indicates that both variables move in the same direction, while a negative coefficient signifies an inverse relationship. The absolute value of the coefficient reflects the strength of the independent variable's impact on the dependent variable, making it crucial for understanding their relationship.
Data mining and analysis using Orange software
Data preprocessing
Step 1: Add the data to Orange using the CSV File Import widget and evaluate it. In this task, we added two data files, Segmentation Information and Customer Survey, which we extracted separately to simulate data from multiple sources being imported into the data warehouse.
Picture 2 Adding data to Orange Data Mining by CSV File Import
We will utilize the Feature Statistics tool in Orange to assess the data effectively. This tool provides a comprehensive overview by displaying various statistics for each feature, enabling us to identify key features and detect any missing data.
Picture 3 Evaluate the data by the Feature Statistics
Step 2: If the data has errors as mentioned above, we can use the Preprocess widget to help handle missing data. Here, we will select the Impute Missing Values option and choose Replace with Random Value to fill in the missing data with random values.
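To make the idea of random-value imputation concrete, the sketch below fills each missing entry of a column with a value drawn at random from the observed entries of that same column. This is an assumption about how such imputation can work and may differ from what Orange does internally; the function name `impute_random`, the row layout, and the sample values are hypothetical.

```python
import random


def impute_random(rows, column, seed=None):
    """Fill missing (None) entries of `column` with values drawn at random
    from the observed entries of the same column.

    Illustrative sketch; returns new rows and leaves the input untouched.
    """
    rng = random.Random(seed)
    observed = [r[column] for r in rows if r[column] is not None]
    return [
        {**r, column: rng.choice(observed)} if r[column] is None else dict(r)
        for r in rows
    ]


# Made-up rows with one missing Age value.
rows = [{"Age": 30}, {"Age": None}, {"Age": 45}]
print(impute_random(rows, "Age", seed=1))
```

Drawing from observed values keeps the imputed column within its original range, at the cost of adding noise to the affected rows.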
To merge processed data files, utilize the Concatenate widget in Orange, which enables the joining of multiple data sources based on a shared attribute or index.
Picture 6 Merging 2 Data by Concatenate
Step 3: Adding the processed data to a table to get an overview of the data
Picture 7 Table of Customer Data After Preprocessing and Merging
Data mining and analysis using Orange software
Picture 8 Cluster analysis model and customer satisfaction level
4.2.2 The customer portrait of the airline industry
a) Use distributions to visualize customer information
Figure 1 Passenger age and gender
Male customers generally have a higher frequency of flights compared to female customers, with the most active age group being those between 25 and 27 years old and individuals in their early 40s. Notably, the highest flight frequency is observed among male customers around 41 years old, averaging nearly 120 flights. In contrast, the number of passengers aged 50 and older shows fluctuations but is on a downward trend.
The data indicates that the majority of loyal customers fall within the 40 to 60 age range, with the 40 to 45 age group having the highest concentration at 180 loyal customers. Additionally, the 25 to 30 age group shows a significant number of loyal customers, totaling 120. Conversely, the 20 to 30 age range experiences the highest level of disloyalty, with nearly 70 disloyal customers. Other age groups maintain a relatively consistent number of disloyal customers, averaging around the same level.
Customers in the aviation industry typically have an annual income of $60,000 or more, encompassing both loyal and disloyal patrons. The primary users of aviation services are individuals earning between $60,000 and $100,000 annually. While there are some customers with incomes below $60,000, they represent a negligible portion of the overall customer base.
4.2.3 Describe customer portraits by using Box Plot
The highest group of air travelers consists of those in the entertainment industry, with artists leading at 1,237 flights, followed by entertainment professionals at 458. The healthcare sector ranks second, with healthcare workers taking 674 flights, and doctors frequently traveling as well, totaling 327 flights. Other notable professions with significant air travel include engineers (357 flights), lawyers (355 flights), executives (287 flights), marketers (172 flights), and homemakers (133 flights).
Figure 5 Customer’s profession by Class
Eco Class is popular among various professions, including Artists, Engineers, Healthcare workers, and Marketing professionals. Additionally, individuals in the Entertainment and Healthcare sectors tend to prefer Eco Plus over the other two classes. However, overall usage of the different flight classes shows no significant distinction.
4.2.4 Relationship between Flight Distances, Annual Income and Type of Travel by using Scatter Plot
Figure 6 Relationship between flight distance, income and type of travel
Customers opting for transportation typically have an annual income of $50,000 or more. In the realm of flight segments, individuals traveling for personal reasons predominantly choose short flights, averaging around 1,000 miles. Meanwhile, flights ranging from 1,000 to under 3,000 miles are primarily booked by business travelers, although there remains a small percentage of personal travel customers in this category.
Customer clustering by K means
Picture 9 K-means clustering diagram using Orange
We apply the K-means clustering algorithm to group airline customers with similar characteristics, using the Orange tool to perform the clustering.
We have chosen 9 out of the 20 initial factors to calculate the Silhouette index. The selected factors are Age, Annual Income, Class, Flight Distance, Work Experience, Customer Type, Profession, Type of Travel, and Spending Score (ranging from 1 to 100).
In the third step, we utilize various tools including Scatter Plot, Bar Plot, Table, and Silhouette Plot to analyze and visualize customer segments. The K-means algorithm, specifically the K-means++ variant used in the Orange tool, allows us to determine the optimal number of customer subgroups. Based on the Silhouette index, we conclude that the ideal number of subgroups is two.
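For readers unfamiliar with the Silhouette index, the sketch below shows one common way to compute it: for each point, compare the mean distance to its own cluster (a) with the smallest mean distance to any other cluster (b), and average (b - a) / max(a, b) over all points. Euclidean distance, the function name `silhouette_score`, and the sample points are assumptions for this illustration; the scores reported in this study come from Orange, not from this code.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (closer to 1 is better).

    Illustrative sketch; requires at least two clusters and assumes the
    points are not all identical.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself does not affect the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Two well-separated pairs of made-up points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_score(points, [0, 0, 1, 1]))  # sensible grouping
print(silhouette_score(points, [0, 1, 0, 1]))  # poor grouping
```

Running the clustering for several candidate values of k and keeping the k with the highest score is the usual way such an index is used to pick the number of clusters.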
First, we will use Scatter Plot for analysis based on the variable Flight Distance
Figure 7 Scatter chart of customer clusters by age
The analysis reveals that the C2 customer group primarily prefers flights exceeding 1,500 km, while the C1 group exclusively targets shorter flights under this distance. Notably, both segments share a significant customer base aged between 20 and 60, prompting further exploration of their distinct preferences.
Figure 8 Scatter chart of customer clusters by Annual Income
Group C1 typically has an annual income ranging from $50,000 to $190,000 and prefers flights with a distance of around 1,000 km. In contrast, group C2 exhibits a more varied annual income distribution, yet those who favor flights of approximately 2,500 km tend to have the most concentrated income within the same range of $50,000.
Figure 9 Scatter chart of customer clusters according to Work Experiences
The C1 group primarily consists of individuals with work experiences ranging from 0 to 1 year and 4 to 10 years. In contrast, the other customer segment predominantly features individuals with 0 to 1 year of experience.
In addition, we apply another way of analysis using Bar Plot
Figure 10 Scatter chart of customer clusters by Age
By age, customer group C1 concentrates on three ages: approximately 25 years old, approximately 39 years old, and approximately 45 years old. As for group C2, customers are mainly people around 42 years old, about 44 years old and almost
Figure 11 Scatter chart of customer clusters by Income
The annual income disparity between customer groups C1 and C2 is significant, with C1 earning nearly double that of C2. The highest income for C1 reaches around $90,000 with a frequency of approximately 120 transactions, while the lowest stands at $20,000 with 15 transactions. In comparison, C2's highest income is $70,000 with about 60 transactions, and its lowest income is approximately $10,000 with a frequency of 5 transactions.
Figure 12 Scatter chart of customer clusters by Flight Distances
C1 customers show a strong preference for shorter flights, particularly those around 500 km, with over 320 individuals favoring this distance. In contrast, C2 customers gravitate towards longer flights, with the most popular distance being approximately 2,500 km and the least preferred at 1,100 km.
Figure 13 Scatter chart of customer clusters according to Work Experiences
The last factor is Work Experience: customer group C1 has twice as much experience as customer group C2, and the highest level of experience across the two customer groups is approximately 2 years.
Combining the above analysis factors, the airline's customer data shows two customer clusters with the following meanings:
● Group C1: young to middle-aged customers who have more years of working experience than group C2, have high income, and love short-haul flights.
● Group C2: elderly customers who have a medium number of years of working experience, have income not as high as C1's, and prefer long flights.
Analyze the factors affecting customer satisfaction
Picture 12 The process of Linear Regression method
Step 1: From the processed data, select the data for analysis: the survey data on factors affecting customer satisfaction, measured on a 1-5 Likert scale.
Picture 13 The table of variables
Here we see that the variables in the Features section are independent variables, and the variables in the Target section are dependent variables
Step 2: Use the Linear Regression widget to perform Linear Regression Analysis
Picture 14 The result of independent variables
A linear regression analysis reveals that gate location, leg room service, and baggage handling are the top three factors significantly enhancing customer satisfaction with airline services. In-flight service and cleanliness also positively influence satisfaction levels. Conversely, in-flight entertainment, departure delays, and check-in service have minimal or negative effects on customer satisfaction. While other factors contribute positively, their impact on the overall customer experience is comparatively smaller.
A positive coefficient indicates a direct correlation between various factors and customer satisfaction in air services; as the quality of these factors improves, customer satisfaction levels also rise. The strength of this impact is reflected in the coefficient value, with higher values signifying a more significant effect. For instance, a one-unit increase in gate location, with a coefficient of 0.131702, is projected to enhance customer satisfaction by 0.131702. Similarly, an increase in leg room service, which has a coefficient of 0.122556, is expected to boost customer satisfaction by 0.122556.
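The interpretation above amounts to multiplying a coefficient by the change in its factor, holding the other factors fixed. The short sketch below applies that arithmetic to three coefficients quoted in this report; the dictionary layout and the function name `predicted_change` are hypothetical and only serve the illustration.

```python
# Coefficients quoted in this report (change in satisfaction per one-unit
# increase in the factor's rating).
coefficients = {
    "Gate location": 0.131702,
    "Leg room service": 0.122556,
    "Baggage handling": 0.119746,
}


def predicted_change(factor, delta):
    """Predicted change in satisfaction when `factor` changes by `delta`
    units, with all other factors held fixed."""
    return coefficients[factor] * delta


# Rank factors from strongest to weakest estimated effect.
ranking = sorted(coefficients, key=lambda f: abs(coefficients[f]), reverse=True)
print(ranking)
print(predicted_change("Gate location", 2))
```

Ranking by absolute value treats a strongly negative coefficient as just as influential as a strongly positive one, which matches the way coefficient magnitude is read in the text.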
The results of the linear regression analysis indicate that the following factors have a positive impact on customer satisfaction when using airline services, listed in order of their coefficient values:
On the other hand, the following factors have a negative impact on customer satisfaction or don’t have effect when using airline services:
Even if coefficients indicate that certain factors have minimal or negative impacts on customer satisfaction, airlines should still prioritize these elements in their strategies to enhance overall customer experience. Small enhancements in these areas can be valued by customers and lead to a more positive perception of the airline.
A more detailed analysis of the factors that strongly affect customer satisfaction:
The location of the gate significantly influences customer satisfaction, with a strong correlation to positive experiences. When gates are conveniently situated and easily accessible, customers report higher satisfaction levels. Therefore, airlines must focus on ensuring that gate locations are simple to find and navigate, complemented by clear signage and directions.
A key factor contributing to customer satisfaction is the availability of adequate leg room during flights, which significantly enhances passenger comfort. Airlines can boost customer satisfaction by either increasing leg room in standard seating or offering premium options for additional space, thereby catering to the preferences of travelers.
Efficient and reliable baggage handling, with a coefficient of 0.119746, plays a crucial role in enhancing customer satisfaction by minimizing stress and inconvenience for travelers. A positive correlation exists between effective luggage management and customer contentment, indicating that passengers whose luggage is handled properly are more likely to be satisfied. To boost customer satisfaction, airlines must prioritize improvements in their baggage handling processes, including investing in advanced tracking technology and increasing the number of baggage handlers.
Inflight service quality, particularly regarding meals and drinks, significantly influences customer satisfaction. Passengers express higher satisfaction levels when airlines provide a diverse selection of high-quality inflight dining options. To enhance customer satisfaction, airlines should focus on expanding their food and beverage offerings and improving the overall quality of their meals.
A coefficient of 0.0912188 indicates that a clean and well-maintained cabin environment significantly impacts customer satisfaction by enhancing comfort and minimizing illness risk. Cleanliness is a key driver of customer satisfaction, with well-maintained aircraft leading to higher levels of contentment among passengers. To boost customer satisfaction, airlines should invest in advanced cleaning equipment and processes, along with training employees to ensure a sanitary environment.
A positive check-in experience greatly influences customer satisfaction, with a significant coefficient of 0.0689468 highlighting the importance of speed, efficiency, and staff friendliness. Key elements of the check-in service include the efficiency of the process, the helpfulness of staff, and the clarity of information provided. Enhancements in these areas can lead to a more favorable customer experience and improved satisfaction ratings.
Comfortable seating is crucial for enhancing customer satisfaction, as passengers are more likely to feel content when their seats offer adequate support. Airlines can boost customer happiness by investing in improved seating options or supplying additional amenities such as cushions and pillows.
Customer satisfaction is significantly impacted by the quality and variety of food and beverages offered. Passengers tend to feel more satisfied when they have access to a diverse selection of high-quality options. Airlines can enhance customer satisfaction by partnering with reputable food and beverage suppliers or by investing in their own catering services.
Customers are more likely to be satisfied with their flight experience when online booking is simple, as indicated by a coefficient of 0.0501179. The ease of the online booking and check-in process plays a crucial role in enhancing customer satisfaction. Airlines can boost satisfaction levels by investing in user-friendly online booking and check-in systems.
The online boarding process significantly influences customer satisfaction, as evidenced by a coefficient of 0.038703. A well-organized and efficient boarding experience leads to higher levels of customer contentment. Airlines can enhance satisfaction by investing in advanced boarding technology and optimizing staffing during the boarding process.
The quality of on-board services, including duty-free shopping and entertainment options, significantly influences customer satisfaction, as indicated by a coefficient of 0.0383291. Friendly and helpful crew members, along with a variety of entertainment choices, enhance the overall travel experience. To boost customer satisfaction, airlines should invest in improved crew training and expand entertainment offerings.
A coefficient of 0.0288492 highlights the positive influence of convenient departure and arrival times on customer satisfaction. When flights are scheduled to arrive at reasonable hours, customers experience greater happiness. Airlines can enhance customer satisfaction by offering more flight options with these convenient timings.
The following factors have a negligible or negative effect on customer satisfaction:
Conclusion
The analysis utilizing K-Means and Linear Regression reveals two significant customer segments: young and middle-aged high-income travelers who favor short flights, and older customers with average incomes who tend to prefer longer flights.
Customer satisfaction is influenced by various positive and negative factors. Key positive factors include the convenience of gate locations, quality of staff service, efficient baggage handling, and in-flight services. Additionally, cleanliness, ease of online ticketing and check-in, comfort of seats, onboard food and drinks, and suitable departure and arrival times significantly enhance the overall customer experience.
On the other hand, some factors have a negative or negligible effect on customer satisfaction, such as in-flight entertainment services, delayed departure and arrival, and onboard WiFi services.
Both K-Means and Linear Regression have inherent limitations in data analysis. K-Means necessitates a predetermined number of clusters and can produce varying outcomes based on the initial centroids selected. On the other hand, Linear Regression is based on the assumption of a linear relationship between input and output variables, which may not be appropriate for complex data models. To achieve accurate analysis results, it is essential to utilize multiple methods and conduct a comprehensive evaluation of the results, as both K-Means and Linear Regression can struggle with certain data types.
Here are some recommendations for focusing on these customer segments and increasing customer satisfaction:
Airlines can enhance customer satisfaction by offering tailored services and products that align with the specific preferences of different customer segments. For instance, special deals and promotions for short-haul flights can effectively attract young and middle-aged travelers, while comfortable seating and engaging in-flight entertainment can cater to the needs of elderly passengers on long-haul flights.
To enhance customer satisfaction, airlines must prioritize improving various aspects of customer service, including quality gate locations, adequate legroom, efficient baggage handling, and superior inflight service. Additionally, maintaining cleanliness, streamlining check-in processes, ensuring comfortable seating, and offering appealing food and drink options are essential. Airlines should also focus on simplifying online booking and boarding procedures, as well as ensuring convenient departure and arrival times.
Airlines can enhance their services and products by leveraging customer feedback. By gathering insights from passengers, they can pinpoint areas needing improvement and implement necessary changes to elevate the overall customer experience.
Build customer loyalty: Airlines can build customer loyalty by providing frequent flyer programs, special deals and discounts, and personalized services. This can help retain customers and attract new ones.
To effectively target and develop specific customer segments, airlines must provide tailored services and products, enhance customer service, leverage customer feedback, and foster customer loyalty. Implementing these strategies will not only boost customer satisfaction but also create a competitive edge in the airline industry.