(Essay) Introduction to Business Analysis: Analyzing Customer Behaviour of a Car Insurance Business

DOCUMENT INFORMATION

Basic information

Title: Analyzing Customer Behaviour of A Car Insurance Business
Authors: Nhữ Hà Phong, Ngô Xuân Tuấn, Vũ Thanh Đức, Trần Nguyễn Tuấn Bách, Hoàng Thuỳ Linh, Cù Thảo Ly
Supervisors: Assoc. Prof. Nguyen Thi Vinh, PhD. Pham Thi Cam Anh
School: Foreign Trade University
Major: Economics and International Business
Type: Essay
Year of publication: 2023
City: Hanoi
Pages: 54
File size: 9.06 MB

Structure

  • Chapter 01: DATA DICTIONARY
  • Chapter 02: DATA CLEANSING & PREPROCESSING
  • Chapter 03: EXPLORING DATA ANALYSIS & FINDING BUSINESS INSIGHTS
  • Part 1: Describing attributes: Creating real customer profile
  • Part 2: Multivariate statistics: Finding business insights
  • Part 3: Linear regression: Finding the trends and interdependence between outcome
  • Chapter 04: HYPOTHESIS TESTING
  • Chapter 05: INSIGHTS, CONCLUSION & RECOMMENDATIONS
  • Part 01: Synthesis of business insights of real customer behavior
  • Part 02: Recommendations for the business

Content

INSTITUTE OF ECONOMICS AND INTERNATIONAL BUSINESS

INTRODUCTION TO BUSINESS ANALYSIS: ANALYZING CUSTOMER BEHAVIOUR OF A CAR INSURANCE BUSINESS

Course code: VJPE205HK1-23241.1 Tra

DATA DICTIONARY

The dataset includes 18 features as described below:

No  Field name           Data type  Description
1   ID                   Integer    Unique ID number for all customers
2   Age                  Ordinal    Range of all customers’ age
3   Gender               Nominal    Gender of all customers
4   Race                 Nominal    Race of all customers
5   Driving experience              Range of all customers’ total driving experience in years
6   Education            Nominal    Education level of all customers
7   Income               Nominal    Class of income of all customers
8   Credit score         Float      Credit score of all customers
9   Vehicle ownership               Whether all customers own their vehicles
10  Vehicle year         Nominal    The time that all customers have their vehicles
11  Married                         Whether all customers are married
12  Children                        Whether all customers have children (1: having children use their vehicles)
14  Annual mileage       Integer    Annual mileage of all customers using their vehicles
15  Vehicle type         Nominal    Type of all customers’ vehicle
16  Speeding violations             Number of speeding violations that all customers have committed while using their vehicles
17  Past accidents                  Number of accidents that all customers have been involved in while using their vehicles
18  Outcome                         Whether all customers claim their loans from the company

DATA CLEANSING & PREPROCESSING

INPUT: Check for missing values in the dataset

OUTPUT: The dataset does not contain any missing values. The next step is to look for inconsistencies or outliers in the data.
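As a minimal sketch of the same check outside a spreadsheet, assuming the data has been loaded into a list of row dictionaries, missing cells per column could be counted as follows. The column names and sample rows are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical sample rows; the real dataset has 18 columns.
rows = [
    {"id": 1, "credit_score": 0.63, "annual_mileage": 12000},
    {"id": 2, "credit_score": None, "annual_mileage": 16000},
    {"id": 3, "credit_score": 0.49, "annual_mileage": ""},
]


def count_missing(rows):
    """Return a Counter mapping column name -> number of missing cells.

    A cell counts as missing when it is None or a blank string.
    """
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or (isinstance(value, str) and value.strip() == ""):
                missing[column] += 1
    return missing


missing_counts = count_missing(rows)
print(dict(missing_counts))
```

A column that never appears in the result has no missing cells, which is the situation reported for this dataset.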

1.2 - Outliers

a) INPUT: Find outliers in credit score column

Outliers = OR(H2$U$6). If FALSE, the value is not an outlier; if TRUE, it is an outlier.

Table 2.1: Finding outliers

b) INPUT: Find outliers in annual mileage column

Outliers = OR(I2$V$6). If FALSE, the value is not an outlier; if TRUE, it is an outlier.

These outliers do not have a statistically significant impact on the outcome of the analysis.
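The spreadsheet formulas above appear to flag a value when it falls outside a pair of bounds. The source does not state how those bounds were chosen, so the sketch below assumes one common convention, the 1.5 × IQR rule; the function names and sample values are also hypothetical.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return one boolean per value: True if it lies outside the fences."""
    lower, upper = iqr_bounds(values, k)
    # Same logic as a spreadsheet OR(value < lower, value > upper).
    return [v < lower or v > upper for v in values]


# Hypothetical credit scores; the last one is deliberately extreme.
credit_scores = [0.45, 0.52, 0.48, 0.50, 0.55, 0.47, 0.51, 0.99]
flags = flag_outliers(credit_scores)
print(flags)
```

The same function can be applied to the annual mileage column; only the input list changes.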

Select the whole dataset (click any cell in the dataset, then press Ctrl+A) → Data → Remove Duplicates → Select All → OK

OUTPUT: No duplicate values found.
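The Remove Duplicates step treats two rows as duplicates when every selected column matches. A minimal sketch of an equivalent check, assuming rows are held as dictionaries with hashable values (the sample rows and function name are hypothetical):

```python
def find_duplicates(rows):
    """Return the rows that repeat an earlier row in every column."""
    seen = set()
    duplicates = []
    for row in rows:
        # Sort by column name so column order does not affect the comparison.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
    return duplicates


# Hypothetical rows; the third repeats the first exactly.
rows = [
    {"id": 1, "gender": "female", "annual_mileage": 12000},
    {"id": 2, "gender": "male", "annual_mileage": 9000},
    {"id": 1, "gender": "female", "annual_mileage": 12000},
]
duplicates = find_duplicates(rows)
print(len(duplicates))
```

An empty result corresponds to the "No duplicate values found" output above.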

Table 2.4: Output of modified dataset

Describing attributes: Creating real customer profile

1 - Visualizing the data of each attribute (feature)

a - Age

(Figure 3.1: Table and Pie chart presenting categories of age)

The pie chart shows that the largest segment consists of customers aged 26-39, accounting for 32% of the total. The 40-64 age group is slightly smaller, at around 30%. Meanwhile, the 16-25 and over-65 age groups have roughly equal shares, approximately 19% each. From this, we can deduce that customers between the ages of 26 and 64 are more inclined to acquire insurance from the company.

b - Education

(Figure 3.2: Table and Pie chart presenting categories of education)

The chart reveals that only 19% of customers lack an educational background, meaning they did not receive any formal schooling. In contrast, 41% of customers have completed high school, which is only 1 percentage point more than the share of customers who attended university. It can be inferred that the vast majority of the company's customers have a high school education or higher.

c - Gender

Row Labels    Count    Percentage
female        1342     50.22%
male          1330     49.78%

(Figure 3.3: Table and Pie chart presenting categories of gender)

In the pie chart, the proportions of females and males are nearly identical, with females accounting for 50.22% and males for 49.78%. This suggests that the company does not exhibit any gender bias towards its customers.

d - Race

Row Labels    Count    Percentage
majority      2403     89.93%
minority      269      10.07%

About 90% of the customers belong to the majority racial group, while only 10% are from minority backgrounds. In other words, the company seems to give preference to individuals from the majority ethnic group.

e - Income

Row Labels       Count    Percentage
middle class     591      22.12%
poverty          471      17.63%
upper class      1168     43.71%
working class    442      16.54%

(Figure 3.5: Table and Pie chart presenting categories of income)

The largest group of customers in the dataset comes from the upper class, in other words, people with advanced educational degrees who work in white-collar jobs; this group makes up 44% of the customers. The proportion of working-class people is the smallest, at 16.5%. From this contrast, we can conclude that the insurance company prioritizes people with average or higher incomes, including middle-class and upper-class individuals.

f - Marriage

(Figure 3.6: Table and Pie chart presenting categories of marriage)

The shares of single and married individuals in the dataset are almost equal, at 49% and 51% respectively. From this comparison, we can see that the company is not biased between married and unmarried customers.

g - Children

(Figure 3.7: Table and Pie chart presenting categories of children)

Of the two categories (1: having children, 0: no children), customers with children form the larger group, at 69%. To conclude, most of the clients have children of their own.

h - Vehicle year

Row Labels     Count    Percentage
after 2015     848      31.74%
before 2015    1824     68.26%

(Figure 3.8: Table and Pie chart presenting categories of vehicle year)

The majority of customers have owned their vehicles since before 2015. Since each of those vehicles is insured by the company, these 1824 individuals (68%) have likely been clients of the company for longer than the remaining 848 (32%).

i - Vehicle ownership

(Figure 3.9: Table and Pie chart presenting categories of vehicle ownership)

As seen in the pie chart, the number of customers who legally own the cars they have insured is significantly higher than the number of customers who do not (70.1% compared to 29.9%). It makes sense that customers would be more concerned about their own vehicles, and ownership also makes buying and selling insurance easier by avoiding some extra paperwork.

j - Driving experience

(Figure 3.10: Table and Pie chart presenting categories of driving experience)

It is clear that the majority of the market volume (more than two-thirds of the total) is made up of customers with less than 20 years of driving experience. More specifically, the proportions of customers with less than 10 years of driving experience and of those with 11 to 19 years are similar, at about 33% and 34% respectively. Given that just under 10% of customers have more than 30 years of driving experience, the number of years of driving experience appears to be inversely related to the customer's probability of purchasing insurance.

k - Credit score

(Figure 3.11: Table and Pie chart presenting categories of credit score)

The majority of customers in the dataset have fair (0.4 to 0.8) or poor (less than 0.4) credit scores. Only a small portion of customers (those with scores higher than 0.8) have excellent credit. This suggests that the insurance company is focusing on customers with low to mid credit scores by offering them coverage with more options or lower deductibles. The company also does not seem to give much priority to customers with higher credit scores.

l - Postal code

(Figure 3.12: Table and Pie chart presenting categories of postal code)

The graphic shows that roughly 70% of all clients have the postal code 10238 (New York, USA), which may be indicative of the higher risk associated with that area. Meanwhile, another location, with the postal code 32765 (Oviedo, Florida, USA), is a safe suburban area with relatively low crime and accident rates, and accounts for only about 25% of customers.

m - Annual mileage

(Figure 3.13: Column chart presenting categories of annual mileage)

According to the graph, the majority of our customers drive a similar distance each year, approximately 10,000 to 13,000 miles. This suggests that these customers have similar needs and preferences, and that they can therefore be placed in the same segment.

n - Vehicle types

Row Labels    Count    Percentage
sedan         2284     85.48%
sports car    388      14.52%

(Figure 3.14: Table and Pie chart presenting categories of vehicle type)

The dataset makes it clear that just two car categories are in use, and the sedan category is dominant, representing 85% of the total. It can therefore be inferred that the majority of the company's customers prefer sedans.

o - Speeding violation

(Figure 3.15: Table and Column chart presenting categories of speeding violation)

A quick view of the columns reveals an inverse relationship between the frequency of speeding violations and the number of customers. Notably, a very large number of customers have no speeding violations, indicating that the majority of customers generally adhere to the speed limit, with only minimal violations.

p - Past accidents

(Figure 3.16: Column chart presenting categories of past accidents)
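The count and percentage tables in this part can be reproduced with a simple frequency count. A minimal sketch, using the gender counts reported above (1342 female, 1330 male); the function name is illustrative.

```python
from collections import Counter


def frequency_table(values):
    """Return (label, count, percentage) tuples sorted by label."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (label, n, round(100 * n / total, 2))
        for label, n in sorted(counts.items())
    ]


# Rebuild the gender column from the counts given in the gender table.
genders = ["female"] * 1342 + ["male"] * 1330
table = frequency_table(genders)
for label, n, pct in table:
    print(f"{label:<8}{n:>6}{pct:>8.2f}%")
```

Applying the same function to any other categorical column (race, income, vehicle type) yields the corresponding table.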

Multivariate statistics: Finding business insights

To gain a more comprehensive and nuanced understanding of the relationships between the different features in the data frame, we use pair plots and a heatmap to visualize the data. This enables us to compare the features with one another more effectively.

1.1.1 - A matrix of pairwise scatter plots

Given a set of features, excluding the "outcome" feature, we can create a matrix of pairplots, where each pairplot is a scatterplot of two different features.

1.1.2 - Pairwise relationship matrix of numerical features, grouped by categorical feature

Figure 3.19-31: Pairplot matrix of numerical features, grouped by categorical features

1.2 - Correlation heatmap of pairwise attributes
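A correlation heatmap colours a matrix of pairwise correlation coefficients. As a minimal sketch of how such a matrix can be computed, the code below assumes Pearson correlation (the source does not state which coefficient was used) and uses hypothetical sample values and column names.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values for three numerical features.
columns = {
    "annual_mileage": [10000, 12000, 11000, 14000, 9000],
    "speeding_violations": [0, 1, 0, 3, 0],
    "past_accidents": [0, 0, 1, 2, 0],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, [round(v, 2) for v in row.values()])
```

The matrix is symmetric with ones on the diagonal, which is why a heatmap of it mirrors across the diagonal.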

2 - Business insights from data visuals

By analyzing the pair plots and heatmap above, we can gain a deeper understanding of our customers' behavior. For example, we can identify patterns and relationships between different customer characteristics. This information can be used to improve our marketing and sales strategies, as well as to develop new products and services that better meet the needs of our customers.

1. Customers with children tend to have lower annual mileage, but commit more speeding violations and are involved in more accidents than those without children.

2. Customers with higher education levels tend to be more likely to engage in risky driving (more speeding violations and more accidents) than those without high school or university education.

3. Customers in postal code 10238 tend to have more car accidents and speeding violations than customers in other areas.

4. Customers in the upper class tend to be involved in more speeding violations and car accidents than those in other income classes.

The insights gained from the data are promising; however, they require further testing to validate their accuracy. This is because some insights may appear to be significant but may not be valid or generalizable to other populations or contexts. The next chapter presents these insights as hypotheses and describes the testing process in more detail.

In addition, the data analysis reveals several salient insights regarding annual mileage and credit score.

1. Across the categories of each categorical feature, customer groups tend to show similar trends in credit score.

2. Across the categories of each categorical feature, customer groups tend to show similar annual mileage.

Linear regression: Finding the trends and interdependence between outcome

Linear regression analysis is used to predict the value of a dependent variable based on the values of one or more independent variables.

- The structure of data: Cross-sectional data.

- We processed the data by estimating the coefficients of the OLS (ordinary least squares) model and calculating the correlation matrix.

After referring to previous studies and collecting additional information from our network, our group decided to use multiple regression analysis and the t-test to determine how the outcome depends on 4 independent variables: credit score, annual mileage, speeding violations, and past accidents.

To analyze the influence of factors on outcome, our research chose to study a linear regression as follows:

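$$\text{OUTCOME}_i = \beta_0 + \beta_1\,\text{CREDIT SCORE}_i + \beta_2\,\text{ANNUAL MILEAGE}_i + \beta_3\,\text{SPEEDING VIOLATIONS}_i + \beta_4\,\text{PAST ACCIDENTS}_i + \varepsilon_i$$

in which $\varepsilon_i$ is the residual.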

3 - Business insights from correlation matrix

From the equation and the correlation table above, we can observe some significant trends, including:

1. Customers with lower credit scores tend to claim their loans.

2. Customers with higher annual mileage tend to claim their loans.

3. Customers with fewer speeding violations tend to claim their loans.

4. Customers with fewer past accidents tend to claim their loans.
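The sign of each trend above corresponds to the sign of an estimated OLS coefficient. As a minimal sketch of how the coefficients of a model with an intercept and four predictors can be estimated, the code below solves the normal equations $(X^\top X)\beta = X^\top y$ in plain Python. The sample rows, the scaling of mileage to thousands of miles, and all function names are hypothetical, and the resulting numbers say nothing about the report's actual estimates.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def ols(features, outcome):
    """Return [b0, b1, ..., bk]: intercept first, then one slope per feature."""
    x = [[1.0] + list(row) for row in features]  # prepend intercept column
    columns = list(zip(*x))
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * y for p, y in zip(ci, outcome)) for ci in columns]
    return solve(xtx, xty)


# Hypothetical rows: (credit score, annual mileage in thousands,
# speeding violations, past accidents) and a 0/1 outcome.
features = [
    (0.6, 12, 0, 0),
    (0.4, 15, 1, 0),
    (0.8, 9, 0, 1),
    (0.5, 11, 2, 1),
    (0.3, 14, 3, 2),
    (0.7, 10, 1, 3),
]
outcome = [0, 1, 0, 1, 1, 0]
coefficients = ols(features, outcome)
print([round(c, 3) for c in coefficients])
```

Because the outcome is binary, a fit of this kind is a linear probability model; a negative coefficient means the predictor is associated with a lower fitted outcome, and a positive one with a higher fitted outcome.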

HYPOTHESIS TESTING

For hypothesis testing, Python is used as the tool to carry out the process. Depending on the characteristics of each hypothesis, an appropriate testing method is used.

1 - Hypothesis testing for Part 2 - Chapter 03: p-test

Insight 1: Customers with children tend to have lower annual mileage, but commit more speeding violations and are involved in more accidents than those without children.

H0: Average annual mileage of customers with children is greater than or equal to that of customers without children.

Ha: Average annual mileage of customers with children is lower than that of customers without children.

H0: Average number of speeding violations of customers with children is lower than or equal to that of customers without children.

Ha: Average number of speeding violations of customers with children is greater than that of customers without children.

H0: Average number of past accidents of customers with children is lower than or equal to that of customers without children.

Ha: Average number of past accidents of customers with children is greater than that of customers without children.

Insight 2: Customers with higher education levels tend to be more likely to engage in risky driving (more speeding violations and more accidents) than those without high school or university education.

H0: Average number of speeding violations of customers with a university or high school education is smaller than or equal to that of customers without an education level.

Ha: Average number of speeding violations of customers with a university or high school education is greater than that of customers without an education level.

H0: Average number of past accidents of customers with a university or high school education is smaller than or equal to that of customers without an education level.

Ha: Average number of past accidents of customers with a university or high school education is greater than that of customers without an education level.

Insight 3: Customers in postal code 10238 tend to have more car accidents and speeding violations than customers in other areas.

H0: Average number of speeding violations of customers in the area of postal code number 10238 is smaller than or equal to that of customers in other areas.

Ha: Average number of speeding violations of customers in the area of postal code number 10238 is greater than that of customers in other areas.

H0: Average number of past accidents of customers in the area of postal code number 10238 is smaller than or equal to that of customers in other areas.

Ha: Average number of past accidents of customers in the area of postal code number 10238 is greater than that of customers in other areas.

Insight 4: Customers in the upper class tend to be involved in more speeding violations and car accidents than those in other income classes do.

H0: Average number of speeding violations of customers in the upper class is smaller than or equal to that of customers in other classes of income.

Ha: Average number of speeding violations of customers in the upper class is greater than that of customers in other classes of income.

H0: Average number of past accidents of customers in the upper class is smaller than or equal to that of customers in other classes of income.

Ha: Average number of past accidents of customers in the upper class is greater than that of customers in other classes of income.
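Each pair of hypotheses above compares the mean of one group with the mean of another in a single direction. The source does not name the exact test it ran, so the sketch below assumes a one-sided Welch two-sample test and approximates the p-value with the standard normal distribution, which is only reasonable for large groups such as those in this dataset. The tiny samples and the function name are hypothetical and only demonstrate the mechanics.

```python
import math
from statistics import NormalDist, mean, variance


def one_sided_welch(sample_a, sample_b):
    """Test Ha: mean(sample_a) > mean(sample_b).

    Returns (test statistic, approximate one-sided p-value). The p-value
    uses a normal approximation instead of the t distribution.
    """
    se = math.sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    t_stat = (mean(sample_a) - mean(sample_b)) / se
    p_value = 1.0 - NormalDist().cdf(t_stat)
    return t_stat, p_value


# Hypothetical speeding-violation counts for two customer groups.
with_children = [2, 1, 3, 0, 2, 4, 1, 2]
without_children = [0, 1, 0, 2, 1, 0, 1, 0]

t_stat, p_value = one_sided_welch(with_children, without_children)
alpha = 0.05
print("reject H0" if p_value < alpha else "fail to reject H0")
```

For a "lower than" alternative, such as the annual mileage hypothesis, the two samples are simply passed in the opposite order.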

2 - Hypothesis testing for Part 3 - Chapter 03: OLS method

Insight 1: Customers with lower credit scores tend to claim their loans.

H0: Customers with lower credit scores do not tend to claim their loans.
Ha: Customers with lower credit scores tend to claim their loans.

The null hypothesis can be rejected. The coefficient of the credit score variable is negative and statistically significant, which means that a lower credit score is associated with a higher outcome.

Insight 2: Customers with higher annual mileage tend to claim their loans.

H0: Customers with higher annual mileage do not tend to claim their loans.
Ha: Customers with higher annual mileage tend to claim their loans.

The null hypothesis can be rejected. The coefficient of the annual mileage variable is positive and statistically significant, which means that a higher annual mileage is associated with a higher outcome.

Insight 3: Customers with fewer speeding violations tend to claim their loans.

H0: Customers with fewer speeding violations do not tend to claim their loans.
Ha: Customers with fewer speeding violations tend to claim their loans.

The null hypothesis can be rejected. The analysis indicates that, in this dataset, more speeding violations are associated with a lower likelihood of claiming a loan. The model is statistically significant, and the coefficients show the direction and strength of this association.

Insight 4: Customers with fewer past accidents tend to claim their loans.

H0: Customers with fewer past accidents do not tend to claim their loans.
Ha: Customers with fewer past accidents tend to claim their loans.

The null hypothesis can be rejected. These results suggest that customers with fewer past accidents are more likely to claim their loans, while those with more past accidents are less likely to claim.

INSIGHTS, CONCLUSION & RECOMMENDATIONS

Synthesis of business insights of real customer behavior

Based on the analysis above, we can synthesize the following insights and briefly explain each of them:

Insight 01: Customers with children tend to have lower annual mileage, but commit more speeding violations and are involved in more accidents than those without children.

It is apparent that customers with children drive shorter distances. The demands of parenting are the fundamental cause of several factors that contribute to this tendency. Foremost among these is the large amount of time and attention that children need. With children in tow, parents frequently find that their personal driving time is limited as they attend to the demands of everyday life, including commuting to and from work, running errands, or taking leisurely road trips. Children can also create logistical difficulties, requiring more careful route planning and frequent breaks. Larger vehicles and room for additional luggage may be necessary, increasing the overall cost and time required for transportation.

On the other hand, parents with children typically drive differently from customers without children, as seen in a propensity for more speeding violations and a higher likelihood of accidents. There could be a number of causes for this behavior. Parents may be prone to driving beyond the speed limit in an effort to get where they need to be quickly, since they are frequently constrained by busy schedules. Additionally, having children in the car can cause distractions as parents tend to their demands and various childcare duties, reducing their concentration on the road. Driving in crowded areas, such as near schools or where children are participating in activities, can further increase the risk of an accident.

Insight 02: Customers with higher education levels tend to be more likely to engage in risky driving (more speeding violations and more accidents) than those without high school or university education.

There are probably several causes for this. First, people with more education frequently have higher levels of self-confidence, which could cause them to overestimate their driving ability. As a result of this overconfidence, they may be more likely to disobey safety instructions and engage in risky driving habits, such as speeding, since they think they can handle any situation that may arise.

Second, people with more education typically live in cities with heavier traffic. Stricter traffic laws and speed limits usually apply in these locations, and they are strictly enforced. Therefore, the greater reported incidence of such violations among this population may be explained by the increased risk of getting caught and punished for speeding offenses in these areas.

Finally, more highly educated people might be more vulnerable to the pressures of demanding professional responsibilities, which could result in hasty driving as they try to meet deadlines.

Insight 03: Customers in postal code 10238 tend to have more car accidents and speeding violations than customers in other areas.

Compared with other customer groups, the residents of postal code 10238 appear to experience a significantly higher frequency of traffic accidents and speeding tickets. There could be a number of causes for this tendency. First and foremost, this region's high poverty rate is a major factor. High rates of poverty can contribute to an environment where crime is more prevalent and may even encourage more reckless driving.

Second, the high unemployment rate in postal code 10238 makes the issue much worse. Stress levels rise in areas experiencing high unemployment and financial difficulty, which may lead to hazardous driving practices.

Additionally, demographics are important. There are many young people living in the area, and young drivers are more likely to engage in risky driving behaviors, such as speeding and driving while intoxicated.

Lastly, driving safely is made more difficult by the narrow, congested roadways in postal code 10238. These circumstances may make accidents more likely.

Insight 04: Customers in the upper class tend to be involved in more speeding violations and car accidents than those in other income classes.

Customers in the upper income bracket have a higher propensity for speeding tickets and car accidents than people in lower income brackets. There could be a number of causes for this behavior.

First, people from the upper class might drive cars that are faster and more powerful, which might tempt them to speed. This group may also be more likely to engage in unsafe driving practices because they believe their financial advantages will make it easier for them to avoid penalties.

Additionally, members of the upper class frequently live in cities with heavier traffic. The chance of speeding and accidents can rise as a result of annoyance and impatience.

Finally, upper-class people may have demanding or stressful occupations that tempt them to take chances, such as speeding to save time. They could also be more prone to using drugs or drinking alcohol, both of which can impair their ability to drive safely and increase the likelihood of an accident.

Insight 05: Customers with lower credit scores tend to claim their loans.

Customers with lower credit scores have a propensity to default on their debts more frequently. Several reasons might account for this connection between credit scores and loan claims.

First, people with lower credit scores frequently experience financial difficulties, which makes it harder for them to fulfill their financial obligations. They may require loans to pay for emergency bills or to support basic living expenses, which increases the possibility that they will make loan repayments.

Second, poorer credit scores may be a sign of past financial instability, missed payments, or loan defaults. These people might be viewed by lenders as bigger risks, which would result in higher loan interest rates. As a result, clients with poorer credit ratings could feel pressured to pay back their loans in order to lighten their total financial load.

Additionally, people with poorer credit may not have access to conventional lenders and may instead turn to unorthodox, high-risk lenders that have stricter terms and conditions. These adverse borrowing conditions may make loan claims more likely.

Insight 06: Customers with higher annual mileage tend to claim their loans.

Customers who drive more each year are more likely to use their loans. The causes of this trend are varied.

First, because they spend more time driving, customers who drive more are more likely to experience accidents and other vehicle-related problems. These mishaps may result in personal injury or property damage, creating the need for a loan to pay for the related expenses.

Also, higher annual mileage often results in more wear and tear on a vehicle, which increases maintenance and repair costs. Customers who drive more miles per year may need loans to pay for upkeep or unforeseen repairs, which can be expensive for well-used cars.

Additionally, long-distance travel frequently leads to increased fuel costs as well as additional operating costs. Customers with high yearly mileage could occasionally need loans to pay for these recurring expenses.

Recommendations for the business

For many decades, insurance customers viewed insurance as a commodity product they had to purchase based on what was available. Over the past decade, however, the market has become saturated with more insurance products and options, creating more sophisticated buyers with a wide range of needs and wants. To meet these changing needs, here are some recommendations that can help the company adapt in a changing world:

Focus and engage high-potential customers

Based on the dataset, these customers share all of the following characteristics: aged between 26 and 64, educated to high school level or higher, having a high income, having children, owning a vehicle manufactured before 2015, having limited driving experience (under 19 years), having an average credit score (0.4 to 0.6), driving a moderate distance (between 10,000 and 13,000 miles), living in the Bronx, and almost never receiving speeding penalties or being involved in accidents.

Four positive changes that insurers should adopt are:

● Adding new channels to communicate with customers for policy questions and claims.

● Changing language in documents and communications to use less insurance jargon.

● Offering hybrid experiences (human and artificial intelligence, physical and virtual, direct and agent-based).

● Engaging with customers daily or throughout the year instead of only at renewal time.

Besides improving services to attract more customers, insurance companies should also improve the efficiency of insurers by raising commissions. The higher the commission insurers receive, the more efficiently they will work. While traditional insurance policies typically deliver an 8-10% return, insurance companies nowadays usually pay more for each policy to encourage the insurers.

Date posted: 30/01/2024, 05:21
