
Graduation thesis: Use machine learning to determine which factors are crucial in long-term love


DOCUMENT INFORMATION

Basic information

Title: Use Machine Learning to Determine Which Factors Are Crucial in Long-Term Love
Authors: Trinh Anh Tu, Tang Quoc Minh
Advisor: Ph.D. Phan Thanh Trung
University: University of Information Technology
Major: Information Systems
Degree: Bachelor of Engineering
City: Ho Chi Minh City
Number of pages: 55
File size: 29.1 MB



VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

UNIVERSITY OF INFORMATION TECHNOLOGY ADVANCED PROGRAM IN INFORMATION SYSTEMS

TRINH ANH TU — TANG QUOC MINH

USE MACHINE LEARNING TO DETERMINE WHICH

FACTORS ARE CRUCIAL IN LONG-TERM LOVE

BACHELOR OF ENGINEERING IN INFORMATION SYSTEMS

THESIS ADVISOR


ASSESSMENT COMMITTEE

The Assessment Committee is established under the Decision ..., dated ..., by the Rector of the University of Information Technology.

........................................ - Chairman

........................................ - Secretary

........................................ - Member

........................................ - Member


I would like to extend my deepest gratitude to Dr Trung for his exceptional guidance and invaluable support during the development of my graduation thesis, "Use machine learning to determine which factors are crucial in long-term love."

Dr Trung's expertise in machine learning and data science has been a cornerstone of my academic journey. His insightful feedback, rigorous academic standards, and patient explanations have greatly enhanced both my understanding of and my interest in the field. Under his mentorship, I have not only developed a robust thesis but also cultivated a deeper appreciation for the intricacies and potential of machine learning.

His approach to teaching, marked by a balance of challenging concepts and supportive guidance, has been instrumental in pushing me to explore new ideas, refine my methodologies, and strive for excellence in my research. Dr Trung's ability to simplify complex concepts and encourage critical thinking has significantly contributed to my personal and professional growth.

Furthermore, his constant encouragement and belief in my abilities provided me with the confidence and motivation needed to navigate the challenges of my research. This journey, under Dr Trung's mentorship, has been one of immense learning and personal development.

I am profoundly grateful for the time, effort, and wisdom that Dr Trung has invested in my thesis and my academic growth. His mentorship has not only shaped this thesis but also prepared me for future endeavors in the realm of data science and machine learning.

I would like to thank Dr Trung for being more than a supervisor; he has been a mentor, a guide, and a significant influence on my academic path. His impact extends beyond this thesis and will undoubtedly continue to inspire me in my future career.


TABLE OF CONTENTS

Chapter 1 Introduction

Chapter 2 Literature Review

2.1 Dating Before the Internet Era

2.2 Dating in the Digital Age and Online Dating Applications

2.3 The Role of Professional Matchmaking Services

Chapter 3 Data Collection

3.1 Conducting the questionnaire

3.2 Data collection methods

Chapter 4 Data Understanding

4.1 Basic Information

4.1.2 Duration of Relationship

4.1.3 Big Five Personality

4.2 Self-Assessment of Male and Female

4.2.1 Applying t-test and p-value

4.2.2 First Meeting

4.2.3 Current

4.3 Partner Assessment of Male and Female

4.3.1 First Meeting

4.3.2 Current

4.3.3 Future

4.4 Previous Relationship Assessment of Male and Female

4.4.1 First meeting

4.4.2 Features

Independent Variables: Our study's independent variables are derived from two key sources: the current partner evaluations and the breakup factors. Specifically, we extracted 10 features based on the 'Partner (Current)' ratings and inverted the importance rankings from 'Breakup Reasons' for ended


of our independent variables, offering a comprehensive view of factors influencing romantic

compatibility.

5.4 Modeling Methods

5.5 Final Model Selection

5.5.1 Model trained result

5.5.2 Features Importance

5.6 Testing Hypothesis

5.7 Model Limitation

Chapter 6 Discussion

6.1 Data Collection Contribution

6.2 Important Factor in Long-Term Love


Machine learning models were developed to analyze the key factors behind long-term romantic relationship success among young Vietnamese adults. The study focused on data collection, with primary data meticulously gathered through an extensive questionnaire completed by 196 individuals. A significant proportion of these participants were aged 18-24, providing a dataset that is essential for accurately identifying predictors of relationship success. The evaluation process involved self, partner, and past-relationship assessments across 10 traits, from which matched and mismatched cases were derived. Models such as Random Forest, Logistic Regression, and SVM were employed, with the Random Forest model proving most effective, achieving an 87.4% F1 score. This model particularly highlighted 'Dedication' as the key predictor of enduring relationships. While the study offers new insights, it acknowledges the need for more extensive data to validate these findings across varied demographics. This research not only offers a unique empirical perspective, particularly from Vietnamese youth, but also illustrates the crucial role of detailed data collection in understanding complex social dynamics.
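The abstract reports model quality as an F1 score. The sketch below shows what that number combines: it computes precision, recall, and F1 for one positive class from paired true and predicted labels. The labels are made-up toy values, not data from this study. The function name `f1_score` is our own illustration, not the code used in the thesis. Libraries such as scikit-learn provide an equivalent metric.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical toy labels: 1 marks the positive class, 0 the negative class
y_true = [1, 1, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
p, r, f1 = f1_score(y_true, y_pred)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")  # precision=0.800 recall=0.800 F1=0.800
```

Because F1 is a harmonic mean, it is high only when both precision and recall are high. This makes it a stricter summary than accuracy when the two classes are unbalanced.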


Chapter 1 Introduction

Background to the Study:

In Vietnam, a nation marked by a rich cultural heritage and rapid modernization, the dynamics of romantic relationships among young adults are evolving in unique ways. This demographic is at the intersection of traditional values and contemporary relationship paradigms, making their experiences in romantic relationships a compelling area for study. While there is some research on romantic relationships in broader Asian contexts, such as "Surrogate Dating and the Translation of Gendered Meanings across Borders: The Case of China's E-mail-Order Brides" by Monica Liu (2015) [39], focused studies on Vietnamese youth, particularly in the context of gender differences in self and partner perception over time, are scarce. This gap is significant, considering the unique interplay of traditional Vietnamese values and modern romantic concepts, as well as the global relevance of understanding gender dynamics in relationships, as explored in studies like "Gender Differences in Communication Behaviors, Spatial Proximity Patterns, and Mobility Habits" by Yang Yang et al. (2016) [40].

Problem Statement:

This study aims to explore the critical factors contributing to the success of long-term romantic relationships among young Vietnamese adults, with a specific focus on understanding how gender influences self and partner perception at different stages of a relationship. This exploration is particularly relevant in the context of Vietnam's rapidly changing social landscape, where traditional norms coexist with modern perspectives on love and relationships. Research such as "Asymmetries of Men and Women in Selecting Partner" by Haluk O. Bingol et al. (2012) [41] and "Quantifying gender preferences across humans lifespan" by Asim Ghosh et al. (2016) [42] highlights the importance of examining these gendered nuances in romantic contexts.

Aim of the Study:

The primary aim is to identify key factors that contribute to the success of long-term romantic relationships among young adults in Vietnam, with a special focus on gender differences in self and partner perception over time. To achieve this, the study will employ advanced machine learning techniques to analyze survey data, offering a novel and comprehensive approach to understanding these complex dynamics.

Research Questions:

• RQ1: What are the most significant factors contributing to the success of long-term romantic relationships among young couples in Vietnam?

• RQ2: How do gender differences influence the perception of key factors in a relationship?

Scope of the Study:

The study mainly focuses on Vietnamese youth aged 18-25 years, a demographic that represents a significant portion of the population undergoing key life transitions. The methodology includes collecting data from 196 individuals in committed romantic relationships and employing machine learning models to determine the crucial factors in long-term relationships. It also examines how men and women differ in their partner and self-assessments at each relationship stage.

Significance of the Study:

This research aims to provide insights into the factors that enable young couples in Vietnam to maintain successful and enduring relationships, with a particular emphasis on understanding the role of gender in shaping these dynamics. It will contribute to the academic discourse on romantic relationships within the Vietnamese context and offer comparative insights with studies conducted in other cultural settings, such as "Aspirational pursuit of mates in online dating markets" by Elizabeth E. Bruch et al. (2018) [43] and "Computational Courtship: Understanding the Evolution of Online Dating through Large-scale Data Analysis" by Rachel Dinh et al. (2023) [44]. The use of machine learning in this research represents an innovative approach, enhancing the depth and accuracy of the analysis.

Study Outline:

The thesis is structured as follows:

Chapter 2: Literature Review

Chapter 3: Data Collection

Chapter 4: Data Understanding

Chapter 5: Modeling

Chapter 6: Discussion

Chapter 7: Conclusion


Chapter 2 Literature Review

2.1 Dating Before the Internet Era

Dating in the pre-internet era was deeply rooted in traditional social structures. In many cultures, families and community elders played a significant role in arranging marriages, with a focus on long-term commitments [1]. This period was characterized by limited choices, often restricted by social class and geographical boundaries. The evolution of societal values over time, coupled with these limitations, gradually paved the way for the emergence of online dating.

2.2 Dating in the Digital Age and Online Dating Applications

With the advent of the internet, dating underwent a significant transformation. Online platforms have expanded the possibilities for meeting potential partners, transcending traditional geographical and social constraints. This era saw a shift in how relationships are formed, with an emphasis on compatibility and communication.

However, maintaining authenticity and ensuring safety became more prominent challenges in the online dating landscape due to increased risks. While online platforms allowed for greater connectivity, they also made deception and abuse easier. There were incidents of racism, body-shaming, and catfishing, in which fake profiles were created to deceive others. Personal information shared online could potentially be misused, and meeting strangers from the internet also carried some physical risks.

As these problems arose, dating apps and websites started implementing new features to improve screening and promote integrity in user profiles and interactions. These included secure authentication mechanisms, such as two-factor authentication, and end-to-end encryption to protect users' data. Nevertheless, as technology advanced rapidly, bad actors also found new ways to circumvent security measures.


In response to the growing safety concerns, many companies specializing in vetted introductions emerged [33, 34]. Services like eHarmony [45] and Rudicaf [46] invested in sophisticated verification processes to match compatible partners whose intentions had been deemed genuine through in-depth profiling. Their curated approach focused on building meaningful relationships rather than casual encounters, hoping to minimize risks in the evolving world of online dating [5, 6, 7].

2.3 The Role of Professional Matchmaking Services

Conventional matchmaking services rely heavily on human labor to conduct compatibility assessments and matches. Counselors interview each client to understand their needs and preferences. They then browse through profiles to find suitable candidates based on their expertise and intuition.

While this personalized approach is effective, its scalability and efficiency are limited by real-world constraints. The matching process depends on the skills of counselors and becomes cumbersome as the user base grows. It also lacks systematic evaluation methods to validate its effectiveness [8].

This provided the motivation for my research, which aims to apply machine learning techniques to optimize different steps in the matchmaking workflow. Algorithms could be developed to help counselors profile users based on traits that truly correlate with relationship success. They could also recommend candidates that counselors might not think of, ultimately improving outcomes. For instance, a two-sided matching framework and an LDA (Latent Dirichlet Allocation) model that learns user preferences from messaging behavior and profile features have been proposed [9]. Moreover, the effect of specific factors such as sport activity on human mating in online dating has been explored using causal machine learning techniques [10]. Additionally, the matchmaking problem in electronic social networks has been formulated as an optimization problem, with a proposed function to measure how closely users' interests match [11]. Even in contexts like brand-influencer matchmaking on Instagram, machine learning techniques have shown their potential in matching based on online profiles [12].


Chapter 3 Data Collection

3.1 Conducting the questionnaire

We initiated our questionnaire development by performing a comprehensive literature review of existing research papers related to self and partner attractiveness, love relationships, and breakups. As information technology students without a strong background in psychology, we aimed to ground our questionnaire in established scientific theories and instruments to ensure validity.

Our literature search uncovered a multitude of high-quality articles across peer-reviewed psychology, sociology, and interpersonal communication journals. After carefully screening over 50 abstracts, we narrowed them down to six seminal papers that directly informed the scope of our questionnaire. These examined self-perception, mate selection preferences, relationship stages, predictors of breakups, and personality assessments.

Two seminal studies formed the integral framework for developing our relationship questionnaire. The first was "Cognitive processes underlying human mate choice: The relationship between self-perception and mate preference in Western society" by Conroy-Beam & Buss (2016) [13]. The second was "Measuring personality in one minute or less: A 10-item short version of the Big Five Inventory in English and German" by Rammstedt & John (2007) [14]. Conroy-Beam and Buss empirically devised a reliable, valid 10-item measure of how Westerners perceive themselves and their ideal long-term mates across key traits. This provided the basis for adapting sections that ask Vietnamese participants to rate themselves and their partners from 1 to 5 on these 10 qualities. The ratings cover three key relationship stages: first meeting, current, and future aspirations. These stages enabled statistical analysis of gender and temporal differences. Additionally, Rammstedt and John's 10-item Big Five personality questionnaire assessing Extraversion, Agreeableness, Conscientiousness, Emotional Stability, and Openness to


In addition to the relationship questions, we also developed a 10-item list of common factors leading to breakups based on the same traits. Participants rated the extent to which they believed each item contributed to their past failed relationships on a 5-point Likert scale. This provided supplementary data on which interpersonal deficiencies most strongly undermine Vietnamese couples' relationship stability.

All our questions use an answer scale from 1 (Strongly Disagree) to 5 (Strongly Agree). The questionnaire is structured as follows:

1. Big Five personality questions

Table 3.1: Big Five personality questions

Question: Do you see yourself as...?

Extroverted, enthusiastic

Criticizing, arguing

Calm and emotionally stable

Traditional, not creative


2. Partner and self-evaluation questions

Table 3.2: Partner and self-evaluation questions

Question: Which factors do you evaluate as important?

Finances (income and inheritance)

Attractive appearance

Parenting

Social status

Health

Desire for children

Dedication

Ambition

Family relationship

3. Questions for breakup reasons

Table 3.3: Questions for breakup reasons

Question: Which factors do you evaluate as important in leading to a breakup?

Doesn't have Finances (income and inheritance)

Doesn't have Attractive appearance

Doesn't have Loyalty

Doesn't have Parenting

Doesn't have Social status

Doesn't have Health

Doesn't have Desire for children

Doesn't have Dedication

Doesn't have Ambition


In summary, through rigorous academic research and by culturally adapting established instruments, we developed a comprehensive relationship questionnaire with three sections: self and partner ratings across relationship stages, breakup predictors, and a personality assessment.

3.2 Data collection methods

Our data collection methodology centered on leveraging personal connections to widely distribute links to our online survey questionnaire. As a two-person student research team, we relied on a viral "snowball sampling" approach. We first seeded the survey to our extended social and professional networks, and it then spread through exponential peer-to-peer sharing.

In the first phase, we individually sent the survey link via messaging apps to every contact, including friends, classmates, co-workers, relatives, and acquaintances. Each message invited them to participate and to pass the survey on to their own networks. This enabled us to tap into multiple diverse segments spanning university students, public sector employees, and others. Within a few days, we observed rapid voluntary diffusion as our contacts promoted the study among their coworkers, organizations, societies, family members, and online communities. Snowball sampling risks selection bias. The broad, uncontrolled sharing helped to partly mitigate this by reaching respondents across relationship statuses, locations, ages, occupations, and interests. Sharing the survey via web links and QR codes on multiple platforms such as Facebook, Zalo, and Telegram also expanded its visibility.
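Snowball sampling can be pictured as a breadth-first spread over a social network. The researchers seed the link, and each wave of respondents forwards it to their own contacts. The minimal sketch below assumes a hypothetical toy network and a cap on the number of referral waves (`max_waves`). It only illustrates the mechanism, not the actual diffusion of our survey.

```python
from collections import deque

def snowball_sample(network, seeds, max_waves):
    """Breadth-first referral: return {person: wave in which they received the link}."""
    reached = {s: 0 for s in seeds}  # seeds are wave 0
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        if reached[person] == max_waves:
            continue  # people in the last wave do not forward the link further
        for contact in network.get(person, []):
            if contact not in reached:  # each person answers at most once
                reached[contact] = reached[person] + 1
                queue.append(contact)
    return reached

# Hypothetical toy network: two researchers ("A", "B") and their contacts
network = {
    "A": ["c1", "c2"], "B": ["c2", "c3"],
    "c1": ["d1"], "c2": ["d2", "d3"], "c3": ["d3"],
    "d1": ["e1"], "d2": [], "d3": [], "e1": [],
}
result = snowball_sample(network, ["A", "B"], max_waves=2)
print(result)
# {'A': 0, 'B': 0, 'c1': 1, 'c2': 1, 'c3': 1, 'd1': 2, 'd2': 2, 'd3': 2}
```

`e1` is three hops from the seeds, so it is never reached within two waves. Reach depends on distance from the researchers' own circles, and this is the source of the selection bias noted above.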

Concurrently with the viral approach, we collected data in person through printed questionnaires. These were distributed to students on various university campuses and in public spaces such as cafés, malls, and parks around Ho Chi Minh City. To qualify, respondents were screened against two criteria: being currently in a romantic relationship, and being willing to complete the 15-minute questionnaire seriously and honestly. These public intercept surveys boosted sample diversity, especially among people who are less active online.

Over the intense one-week data gathering phase combining digital outreach and in-person collection, we obtained 196 valid responses that met our quality checks. For open-ended questions, we standardized free text into categories based on a codebook that we developed inductively to capture emerging themes.
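The screening step and the free-text coding described above can be sketched as two small functions. The field names (`in_relationship`, `completed_seriously`, `occupation`) and the keyword codebook are hypothetical placeholders. Our actual codebook was developed inductively and is not reproduced here.

```python
# Hypothetical keyword codebook for one open-ended question (occupation)
CODEBOOK = {
    "student": ["student", "sinh viên"],
    "public sector": ["civil servant", "public sector", "government"],
    "private sector": ["engineer", "developer", "sales", "office"],
}

def is_eligible(response):
    """Screening: currently in a relationship and willing to answer seriously."""
    return bool(response["in_relationship"] and response["completed_seriously"])

def code_free_text(text, codebook=CODEBOOK, default="other"):
    """Return the first category whose keywords appear in the free-text answer."""
    lowered = text.lower()
    for category, keywords in codebook.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return default

# Toy responses, made up for illustration
responses = [
    {"in_relationship": True, "completed_seriously": True, "occupation": "3rd-year university student"},
    {"in_relationship": False, "completed_seriously": True, "occupation": "Engineer"},
    {"in_relationship": True, "completed_seriously": True, "occupation": "Government officer"},
]
valid = [r for r in responses if is_eligible(r)]
coded = [code_free_text(r["occupation"]) for r in valid]
print(coded)  # ['student', 'public sector']
```

Keyword matching is the simplest form of a codebook. Answers that match nothing fall into `other`, which flags them for manual review.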


This methodology generated a robust convenience sample suitable for exploratory relationship research; it is not intended to statistically represent Vietnam's broader young adult population. The multi-channel collection protocol also appeared effective for rapidly capturing 196 diverse responses with limited researcher resources.


In our data understanding process, it is crucial to note that we have not defined any threshold for what constitutes 'long-term love.' Our dataset comprises responses from individuals who were in relationships at the time they participated in the survey. This approach reflects our focus on understanding relationship dynamics at their current stage, without a predetermined definition of long-term commitment. This perspective frames our analysis and interpretation of the data in the context of contemporary relationship experiences.

The six key variables we examined through graphs were gender, age, monthly income, education level attained, length of current relationship, and scores on the Big Five Personality assessment. Plotting the demographics provided a broad overview of the composition and profile attributes of the respondents who completed our questionnaire. Inspecting these distributions enabled us to evaluate the quality of the data captured through the collection protocols. It also confirmed sufficient coverage across the categories of interest needed for later analysis stages. After this initial descriptive statistical phase, we were confident that we had a sizable, good-quality dataset. With it, we could proceed to test hypotheses and develop predictive models addressing the aims of our couple dynamics research.

Going forward, assessing the generalizability of our findings, and the limits on extending them to the broader population, would involve comparing sample characteristics to national census benchmarks. These visual methods for scanning the dataset also guided the choice of statistical tests suited to the underlying properties of variables such as age and income. Overall, these diagnostics provided the foundation for deeper analyses that could yield actionable insights for our research questions.


Fig 4.1: Distribution of Male and Female

Fig 4.2: Distribution of Age

Fig 4.3: Distribution of Educational Level

Fig 4.4: Distribution of Income

Our sample of 196 responses exhibits a moderately unbalanced gender split, with 55.6% males compared to 44.4% females. This could be attributed to both research team members being male students from the male-dominated University of Information Technology (UIT). The dominant group is youths aged 18-25 years old, confirming alignment with our intended target group for exploring dynamics between young adult romantic partners.
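The reported gender shares are simple proportions of the 196 responses. The sketch below recomputes them from counts. The counts (109 and 87) are back-calculated from the reported 55.6% and 44.4%, which is an assumption, since the raw counts are not listed here.

```python
from collections import Counter

def percentage_split(labels):
    """Return each category's share of the sample in percent, rounded to one decimal."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}

# Counts back-calculated from the reported percentages of 196 responses (assumed)
genders = ["male"] * 109 + ["female"] * 87
print(percentage_split(genders))  # {'male': 55.6, 'female': 44.4}
```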


Most respondents hold a university degree (79.1%), followed by junior college, vocational, master's, and doctorate levels, and lastly high school qualifications. Monthly income clusters primarily in the under-10-million-VND bracket, reflecting the earning power of this young demographic. The education level, age range, and income levels also reasonably match the typical student profile at our university.

Taken together, the convenience sample shows slightly more male and highly educated participation. Its reasonable coverage across ages and income levels partly offsets, but does not remove, the biases expected from recruiting through our personal networks.

Fig 4.5: Duration of Relationship

At this age, many individuals are in their early adult years. This is a period often characterized by exploration and personal growth, when long-term commitments might not be a priority. Therefore, the prevalence of relationships in the 1-2-year and 2-4-year ranges aligns well with this life stage.


Fig 4.6: Distribution of Previous Lovers

Fig 4.7: Duration Before Breakup (x-axis: duration range)

The study "Physical and Psychological Aggression in At-Risk Young Couples: Stability and Change in Young Adulthood" found that young couples who have been together for approximately 1.5 to 2.5 years are more likely to experience physical and psychological aggression, which can lead to a breakup [8]. This aligns with the popular Vietnamese belief that young couples tend to break up around the 2.5-year mark.


4.1.3 Big Five Personality


Emotional Stability, and Openness to Experiences. To score each trait, the responses to its paired questions must first be recoded where required, based on the reverse-scored items denoted with "R". Next, the recoded responses are averaged across the dimension's two questions to obtain the subscale score. For example, to compute Extraversion, the original response to Q1 is averaged with the reverse-coded response to Q6R. This results in a subscale score between 1 and 5 for each of the Big Five traits. These subscale values can be referenced to gender norms to interpret one's relative standing on each dimension. In this manner, the 10-item questionnaire allows quick yet valid measurement of the fundamental Five Factor personality traits [14].
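The scoring rule above (recode reverse-scored items, then average each trait's two items) can be sketched as follows. On the 1-to-5 scale used here, a reverse-scored answer x is assumed to become 6 - x. The text states only the Extraversion pairing (Q1 with Q6R). The other item pairings follow the common TIPI-style layout and are assumptions.

```python
REVERSE_OFFSET = 6  # on a 1-5 scale, a reverse-scored answer x is recoded as 6 - x

# (question number, is_reverse_scored) pairs per trait.
# Only Extraversion (Q1 with Q6R) is given in the text; the other pairings are assumed.
SCORING_KEY = {
    "Extraversion": [(1, False), (6, True)],
    "Agreeableness": [(2, True), (7, False)],
    "Conscientiousness": [(3, False), (8, True)],
    "Emotional Stability": [(4, True), (9, False)],
    "Openness to Experiences": [(5, False), (10, True)],
}

def score_big_five(answers):
    """answers maps question number (1-10) to a 1-5 response; returns trait -> subscale score."""
    scores = {}
    for trait, items in SCORING_KEY.items():
        recoded = [REVERSE_OFFSET - answers[q] if reverse else answers[q] for q, reverse in items]
        scores[trait] = sum(recoded) / len(recoded)  # average of the two (recoded) items
    return scores

# One hypothetical respondent
answers = {1: 4, 2: 2, 3: 5, 4: 3, 5: 3, 6: 2, 7: 4, 8: 1, 9: 4, 10: 2}
scores = score_big_five(answers)
print(scores)
```

For this respondent, Extraversion is (4 + (6 - 2)) / 2 = 4.0. Every subscale stays within 1 to 5 because recoding maps 1..5 onto 5..1.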

Men tend to rate themselves higher in extraversion and openness to experiences. They believe they have more outgoing, socially engaged, and adventurous personalities.

On the other hand, women demonstrate more confidence in their ability to regulate emotions. They see themselves as calmer and more level-headed when facing stress. Regarding conscientiousness and meticulousness, both genders rate themselves highly. Men and women alike think that they tend to be thoughtful, deliberative decision makers who consider multiple angles. However, neither gender seems particularly assured of their own agreeableness and warmth.

In summary, while my study drew upon the questionnaire items from "Measuring personality in one minute or less" [14] to generate indicative Big Five trait measurements, the personal analysis presented here is my own interpretation based solely on the results of my converted 10-item survey. It suggests some subtle differences in how men and women view certain aspects of their own personalities.


4.2 Self-Assessment of Male and Female

4.2.1 Applying t-test and p-value

Conditional Mean Comparison:

If you have a continuous variable (like a score) and a categorical variable with two groups (like gender), you can use a t-test to compare the means of the continuous variable across the two groups.

This can indicate whether there is a statistically significant difference in the continuous variable across groups, which may suggest an association between the two variables.

Using T-Tests in Regression Analysis:

In a regression model, t-tests are used to assess whether each independent variable significantly impacts the dependent variable.

The t-statistic and p-value for each variable in the regression model indicate whether that variable is significantly associated with the dependent variable.

4.2.2 First Meeting

Table 4.1: t-test & p-value of Self-Assessment in first meeting



Fig 4.9: boxplot of Self-Assessment in first meeting

Through the data and charts for the first meeting, we can see that men pay more attention to their own finances than women do, while women care a lot about their fidelity. The remaining factors allow a similar comparison between men and women. These differences are statistically significant (p-value < 0.05). It is also understandable that both men and women show less interest in Parenting and Desire for Children, because the dataset mainly covers young people, an age group with less desire to have children than other age groups [31]. Correspondingly, factors related to appearance and attentiveness, such as health, fidelity, and thoroughness, receive relatively more attention.


4.2.3 Current

Table 4.2: t-test & p-value of Self-Assessment currently



Fig 4.10: boxplot of Self-Assessment currently

The boxplots for "Male Self Current Evaluation" and "Female Self Current Evaluation" reflect self-perceived attributes of individuals currently in a romantic relationship. Men's evaluations of factors such as "Finance" and "Ambition" display a broad distribution and higher medians, indicating that men rate these attributes highly in the context of a romantic relationship. For women, attributes like "Loyalty" and "Dedication" have higher medians, suggesting that these qualities are particularly valued by women in their romantic partnerships.

The statistical analysis for "Self (current)" reveals a significant difference between men and women in the attribute of "Finance," with a p-value of 0.00017. This suggests that men in romantic relationships perceive their financial status as more


relationships perceive the importance of ambition, although the difference is not conclusively significant. The other attributes do not show a statistically significant difference between genders, suggesting broadly similar views on their importance within a romantic relationship.

The emphasis men place on "Finance" and "Ambition" may reflect their self-assessment of these traits as central to their identity or value within a romantic partnership. Women prioritizing "Loyalty" and "Dedication" may indicate that these are key traits defining their engagement in a relationship. Neither gender shows a statistically significant difference in how they self-evaluate the other attributes, suggesting a shared perspective on the importance of these qualities in the context of their current romantic relationships.

4.3 Partner Assessment of Male and Female

3.193 -0.6163 3.064 -2.6612

Attribute | p-value
Finance | 0.000229
Attractiveness | 0.163063
Loyalty | 0.106041
Parenting | 0.321845
Social Status | 0.059779
Health | 0.134887
Desire for children | 0.187129
Dedication | 0.232742
Ambition | 0.538456
Family Relationship | 0.008458
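Reading the listing above at the 0.05 significance level used in this chapter amounts to a simple filter. The sketch below copies the p-values from the listing and keeps the attributes that fall below the threshold, ordered from most to least significant.

```python
ALPHA = 0.05  # significance level used in the text

# p-values copied from the listing above
p_values = {
    "Finance": 0.000229,
    "Attractiveness": 0.163063,
    "Loyalty": 0.106041,
    "Parenting": 0.321845,
    "Social Status": 0.059779,
    "Health": 0.134887,
    "Desire for children": 0.187129,
    "Dedication": 0.232742,
    "Ambition": 0.538456,
    "Family Relationship": 0.008458,
}

def significant(p_values, alpha=ALPHA):
    """Return attributes whose p-value is below alpha, smallest p-value first."""
    return sorted((k for k, p in p_values.items() if p < alpha), key=p_values.get)

print(significant(p_values))  # ['Finance', 'Family Relationship']
```

At this threshold, only Finance and Family Relationship differ significantly between genders. Social Status (0.0598) narrowly misses it.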

Date posted: 02/10/2024, 02:27
