VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY ADVANCED PROGRAM IN INFORMATION SYSTEMS
TRINH ANH TU — TANG QUOC MINH
USE MACHINE LEARNING TO DETERMINE WHICH
FACTORS ARE CRUCIAL IN LONG-TERM LOVE
BACHELOR OF ENGINEERING IN INFORMATION SYSTEMS
THESIS ADVISOR
ASSESSMENT COMMITTEE
The Assessment Committee is established under the Decision ............, dated ............, by the Rector of the University of Information Technology.
.................................................... - Chairman
.................................................... - Secretary
.................................................... - Member
.................................................... - Member
I would like to extend my deepest gratitude to Dr Trung for his exceptional guidance and invaluable support during the development of my graduation thesis on
"Use machine learning to determine which factors are crucial in long-term love."
Dr Trung's expertise in machine learning and data science has been a cornerstone of my academic journey. His insightful feedback, rigorous academic standards, and patient explanations have greatly enhanced both my understanding of and my interest in the field. Under his mentorship, I have not only developed a robust thesis but also cultivated a deeper appreciation for the intricacies and potential of machine learning.
His approach to teaching, marked by a balance of challenging concepts and supportive guidance, has been instrumental in pushing me to explore new ideas, refine my methodologies, and strive for excellence in my research. Dr Trung's ability to simplify complex concepts and encourage critical thinking has significantly contributed to my personal and professional growth.
Furthermore, his constant encouragement and belief in my abilities provided me with the confidence and motivation needed to navigate the challenges of my research. This journey, under Dr Trung's mentorship, has been one of immense learning and personal development.
I am profoundly grateful for the time, effort, and wisdom that Dr Trung has invested in my thesis and my academic growth. His mentorship has not only shaped this thesis but also prepared me for future endeavors in the realm of data science and machine learning.
I would like to thank Dr Trung for being more than a supervisor; he has been a mentor, a guide, and a significant influence on my academic path. His impact extends beyond this thesis and will undoubtedly continue to inspire me in my future career.
TABLE OF CONTENTS
Chapter 1 Introduction
Chapter 2 Literature Review
2.1 Dating Before the Internet Era
2.2 Dating in the Digital Age and Online Dating Applications
2.3 The Role of Professional Matchmaking Services
Chapter 3 Data Collection
3.1 Conducting the questionnaire
3.2 Data collection methods
Chapter 4 Data Understanding
4.1 Basic Information
4.1.2 Duration of Relationship
4.1.3 Big Five Personality
4.2 Self-Assessment of Male and Female
4.2.1 Applying t-test and p-value
4.2.2 First Meeting
4.2.3 Current
4.3 Partner Assessment of Male and Female
4.3.1 First Meeting
4.3.2 Current
4.3.3 Future
4.4 Previous Relationship Assessment of Male and Female
4.4.1 First meeting
4.4.2 Features
Independent Variables: Our study's independent variables are derived from two key sources: the current partner evaluations and the breakup factors. Specifically, we extracted 10 features based on the 'Partner (Current)' ratings and inverted the importance rankings from 'Breakup Reasons' for ended
of our independent variables, offering a comprehensive view of factors influencing romantic compatibility.
5.4 Modeling Methods
5.5 Final Model Selection
5.5.1 Model trained result
5.5.2 Features Importance
5.6 Testing Hypothesis
5.7 Model Limitation
Chapter 6 Discussion
6.1 Data Collection Contribution
6.2 Important Factor in Long-Term Love
Machine learning models were developed to analyze key factors in long-term romantic relationship success among young Vietnamese adults. The study focused on data collection, with primary data gathered through an extensive questionnaire completed by 196 individuals. A significant proportion of these participants were aged 18-24, giving a dataset concentrated on the age group in which the study seeks predictors of relationship success. The evaluation process involved self, partner, and past-relationship assessments across 10 traits, from which matched and mismatched cases were derived. Random Forest, Logistic Regression, and SVM models were compared. The Random Forest model proved most effective, achieving an F1 score of 87.4%, and it highlighted 'Dedication' as the key predictor of enduring relationships. While the study offers new insights, it acknowledges the need for more extensive data to validate these findings across varied demographics. This research not only offers a unique empirical perspective, particularly from Vietnamese youth, but also illustrates the crucial role of detailed data collection in understanding complex social dynamics.
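The headline result above is reported as an F1 score. The following is a minimal standard-library sketch of how a binary F1 score is computed. Two points are assumptions, since the thesis does not state them: which class is treated as positive (here the label `1`), and whether the score is averaged over classes.

```python
def f1_score(y_true: list, y_pred: list, positive=1) -> float:
    """Binary F1 score: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 3 true positives, 1 false positive, 1 false negative.
print(f1_score([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0]))  # 0.75
```

Unlike accuracy, F1 ignores true negatives. This makes it a reasonable choice when matched and mismatched cases are not equally frequent.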
Chapter 1 Introduction
Background to the Study:
In Vietnam, a nation marked by a rich cultural heritage and rapid modernization, the dynamics of romantic relationships among young adults are evolving in unique ways. This demographic is at the intersection of traditional values and contemporary relationship paradigms, making their experiences in romantic relationships a compelling area for study. While there is some research on romantic relationships in broader Asian contexts, such as "Surrogate Dating and the Translation of Gendered Meanings across Borders: The Case of China's E-mail-Order Brides" by Monica Liu (2015) [39], focused studies on Vietnamese youth, particularly on gender differences in self and partner perception over time, are scarce. This gap is significant, considering the unique interplay of traditional Vietnamese values and modern romantic concepts, as well as the global relevance of understanding gender dynamics in relationships, as explored in studies like "Gender Differences in Communication Behaviors, Spatial Proximity Patterns, and Mobility Habits" by Yang Yang et al. (2016) [40].
Problem Statement:
This study aims to explore the critical factors contributing to the success of long-term romantic relationships among young Vietnamese adults, with a specific focus on understanding how gender influences self and partner perception at different stages of a relationship. This exploration is particularly relevant in the context of Vietnam's rapidly changing social landscape, where traditional norms coexist with modern perspectives on love and relationships. Research such as "Asymmetries of Men and Women in Selecting Partner" by Haluk O. Bingol et al. (2012) [41] and "Quantifying gender preferences across humans lifespan" by Asim Ghosh et al. (2016) [42] highlights the importance of examining these gendered nuances in romantic contexts.
Aim of the Study:
The primary aim is to identify key factors that contribute to the success of long-term romantic relationships among young adults in Vietnam, with a special focus on gender differences in self and partner perception over time. To achieve this, the study will employ advanced machine learning techniques to analyze survey data, offering a novel and comprehensive approach to understanding these complex dynamics.
Research Questions:
RQ1: What are the most significant factors contributing to the success of long-term romantic relationships among young couples in Vietnam?
RQ2: How do gender differences influence the perception of key factors in a relationship?
Scope of the Study:
The study mainly focuses on Vietnamese youth aged 18-25 years, a demographic that represents a significant portion of the population undergoing key life transitions. The methodology includes collecting data from 196 individuals in committed romantic relationships and employing machine learning models to determine the crucial factors in long-term relationships. It also compares how men and women assess their partners and themselves at each stage of the relationship.
Significance of the Study:
This research aims to provide insights into the factors that enable young couples in Vietnam to maintain successful and enduring relationships, with a particular emphasis on understanding the role of gender in shaping these dynamics. It will contribute to the academic discourse on romantic relationships within the Vietnamese context. It will also offer comparative insights with studies conducted in other cultural settings, such as "Aspirational pursuit of mates in online dating markets" by Elizabeth E. Bruch et al. (2018) [43] and "Computational Courtship: Understanding the Evolution of Online Dating through Large-scale Data Analysis" by Rachel Dinh et al. (2023) [44]. The use of machine learning in this research represents an innovative approach, intended to enhance the depth and accuracy of the analysis.
Study Outline:
The thesis is structured as follows:
Chapter 2: Literature Review
Chapter 3: Data Collection
Chapter 4: Data Understanding
Chapter 5: Modeling
Chapter 6: Discussion
Chapter 7: Conclusion
Chapter 2 Literature Review
2.1 Dating Before the Internet Era
Dating in the pre-internet era was deeply rooted in traditional social structures. In many cultures, families and community elders played a significant role in arranging marriages, with a focus on long-term commitments [1]. This period was characterized by limited choices, often restricted by social class and geographical boundaries. The evolution of societal values over time, coupled with these limitations, gradually paved the way for the emergence of online dating.
2.2 Dating in the Digital Age and Online Dating Applications
With the advent of the internet, dating underwent a significant transformation. Online platforms have expanded the possibilities for meeting potential partners, transcending traditional geographical and social constraints. This era saw a shift in how relationships are formed, with an emphasis on compatibility and communication.
However, maintaining authenticity and ensuring safety became more prominent challenges in the online dating landscape. While online platforms allowed for greater connectivity, they also made deception and abuse easier. There were incidents of racism, body-shaming, and catfishing, in which fake profiles were created to deceive others. Personal information shared online could potentially be misused, and meeting strangers from the internet also carried physical risks.
As these problems arose, dating apps and websites started implementing new features to improve screening and promote integrity in user profiles and interactions. These included secure authentication mechanisms, such as two-factor authentication, and end-to-end encryption to protect users' data. Nevertheless, with technology advancing rapidly, bad actors also found newer ways to circumvent security measures.
In response to the growing safety concerns, many companies specializing in vetted introductions emerged [33, 34]. Services like eHarmony [45] and Rudicaf [46] invested in sophisticated verification processes to match compatible partners whose intentions had been deemed genuine through in-depth profiling. Their curated approach focused on building meaningful relationships rather than casual encounters, hoping to minimize risks in the evolving world of online dating [5, 6, 7].
2.3 The Role of Professional Matchmaking Services
Conventional matchmaking services rely heavily on human labor to conduct compatibility assessments and matches. Counselors interview each client to understand their needs and preferences. They then scroll through profiles to find suitable candidates based on their expertise and intuition.
While this personalized approach is effective, its scalability and efficiency are limited by real-world constraints. The matching process depends on the skills of counselors and becomes cumbersome as the user base grows. It also lacks systematic evaluation methods to validate its effectiveness [8].
This provided the motivation for my research, which aims to apply machine learning techniques to optimize different steps in the matchmaking workflow. Algorithms could be developed to assist counselors in profiling users based on traits that truly correlate with relationship success. They could also recommend candidates that counselors might not think of, ultimately improving outcomes. For instance, a two-sided matching framework has been proposed that uses an LDA (Latent Dirichlet Allocation) model to learn user preferences from messaging behavior and profile features [9]. Moreover, the effect of specific factors, such as sport activity, on human mating in online dating has been explored using causal machine learning techniques [10]. Additionally, the matchmaking problem in electronic social networks has been formulated as an optimization problem, with a proposed function to measure the matching degree of interest [11]. Even in contexts like brand-influencer matchmaking on Instagram, machine learning techniques have shown their potential for matching based on online profiles [12].
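As a hypothetical illustration of recommending candidates based on traits, the sketch below ranks candidates by an importance-weighted average of their 1-5 trait ratings. This simple weighted-score heuristic is our own assumption for illustration. It is not the LDA framework of [9], the causal approach of [10], or the matching function of [11]. The trait names reuse the questionnaire attributes, and the candidate data is made up.

```python
def match_score(importance: dict, candidate: dict) -> float:
    """Importance-weighted mean of a candidate's 1-5 trait ratings.

    Traits the candidate was not rated on count as 0 (a simplifying assumption).
    """
    total_weight = sum(importance.values())
    if total_weight == 0:
        raise ValueError("at least one trait needs a non-zero importance weight")
    return sum(w * candidate.get(trait, 0) for trait, w in importance.items()) / total_weight


def rank_candidates(importance: dict, candidates: dict, top_k: int = 3) -> list:
    """Return the ids of the top_k candidates, best match first (ties broken by id)."""
    ranked = sorted(candidates.items(),
                    key=lambda item: (-match_score(importance, item[1]), item[0]))
    return [cid for cid, _ in ranked[:top_k]]


# A client who cares far more about loyalty than finances (hypothetical data).
client = {"Loyalty": 5, "Finance": 1}
pool = {
    "A": {"Loyalty": 5, "Finance": 1},
    "B": {"Loyalty": 1, "Finance": 5},
    "C": {"Loyalty": 3, "Finance": 3},
}
print(rank_candidates(client, pool))  # ['A', 'C', 'B']
```

In a machine-learning version of this workflow, the hand-set importance weights would be replaced by weights learned from data on which traits actually predict relationship success.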
Chapter 3 Data Collection
3.1 Conducting the questionnaire
We initiated our questionnaire development by performing a comprehensive literature review of existing research papers related to self and partner attractiveness, love relationships, and breakups. As information technology students without a strong background in psychology, we aimed to ground our questionnaire in established scientific theories and instruments to ensure validity.
Our literature search uncovered many high-quality articles across peer-reviewed psychology, sociology, and interpersonal communication journals. After carefully screening over 50 abstracts, we narrowed our selection down to six seminal papers that directly informed our questionnaire scope. These examined self-perception, mate selection preferences, relationship stages, predictors of breakups, and personality assessments.
Two seminal studies formed the core framework for developing our relationship questionnaire. The first is "Cognitive processes underlying human mate choice: The relationship between self-perception and mate preference in Western society" by Conroy-Beam & Buss (2016) [13]. The second is "Measuring personality in one minute or less: A 10-item short version of the Big Five Inventory in English and German" by Rammstedt & John (2007) [14]. Conroy-Beam and Buss empirically devised a reliable, valid 10-item measure of how Westerners perceive themselves and their ideal long-term mates across key traits. This provided the basis for sections asking Vietnamese participants to rate themselves and their partners from 1 to 5 on each of these 10 qualities at three key relationship stages: first meeting, current, and future aspirations. The stages enabled statistical analysis of gender and temporal differences. Additionally, Rammstedt and John's 10-item Big Five personality questionnaire assessing Extraversion, Agreeableness, Conscientiousness, Emotional Stability, and Openness to
In addition to the relationship questions, we developed a 10-item list of common factors leading to breakups, based on the same traits. Participants rated, on a 5-point Likert scale, the extent to which they believed each item contributed to their past failed relationships. This provided supplementary data on which interpersonal deficiencies most strongly undermine Vietnamese couples' relationship stability.
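A breakup rating scores how much a missing trait contributed to a breakup, so it points in the opposite direction to a partner rating. The thesis notes elsewhere that these importance rankings are inverted for ended relationships before modelling. The following is a minimal sketch of one way to do that on the 1-5 scale. It assumes the inversion is 6 - x, since the exact formula is not stated, and the function names are hypothetical.

```python
LIKERT_MIN, LIKERT_MAX = 1, 5


def invert_rating(score: int) -> int:
    """Map a 1-5 breakup-importance rating onto a 1-5 trait score (5 -> 1, 1 -> 5)."""
    if not LIKERT_MIN <= score <= LIKERT_MAX:
        raise ValueError(f"rating out of range: {score}")
    return LIKERT_MIN + LIKERT_MAX - score  # equals 6 - score on a 1-5 scale


def breakup_features(breakup_ratings: dict, attributes: list) -> list:
    """Turn an ended relationship's breakup ratings into trait-style features."""
    return [invert_rating(breakup_ratings[a]) for a in attributes]


# "Lack of dedication" rated 5 (a major breakup cause) becomes a dedication score of 1.
print(breakup_features({"Dedication": 5, "Finance": 2}, ["Dedication", "Finance"]))  # [1, 4]
```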
All questions are answered on a scale from 1 (Strongly Disagree) to 5 (Strongly Agree). The questionnaire is structured as follows:
1. Big Five personality questions
Table 3.1: Big Five personality questions
Question: Do you think of yourself as:
Extroverted, enthusiastic
Criticizing, arguing
Calm and emotionally stable
Traditional, not creative
2. Partner and self-evaluation questions
Table 3.2: Partner and self-evaluation questions
Question: What factors do you evaluate as important?
Finances (income and inheritance)
Attractive appearance
Loyalty
Parenting
Social status
Health
Desire for children
Dedication
Ambition
Family relationship
3. Questions on breakup reasons
Table 3.3: Questions on breakup reasons
Question: Which factors do you consider important in leading to a breakup?
Doesn’t have Finances (income and inheritance)
Doesn’t have Attractive appearance
Doesn’t have Loyalty
Doesn’t have Parenting
Doesn’t have Social status
Doesn’t have Health
Doesn’t have Desire for children
Doesn’t have Dedication
Doesn’t have Ambition
In summary, through rigorous academic research and cultural adaptation of established instruments, we developed a comprehensive relationship questionnaire with three sections: self and partner ratings across relationship stages, breakup predictors, and a personality assessment.
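To make the three-part structure concrete, here is a hypothetical schema for one response. Several details are our assumptions and not the format the survey data was actually stored in: the class and field names, the stage labels, and the choice to validate only keys and the 1-5 range.

```python
from dataclasses import dataclass, field

# Illustrative labels for the ten rated traits and the three relationship stages.
ATTRIBUTES = (
    "Finance", "Attractiveness", "Loyalty", "Parenting", "Social Status",
    "Health", "Desire for children", "Dedication", "Ambition", "Family Relationship",
)
STAGES = ("first_meeting", "current", "future")


def _check_rating(value) -> None:
    if not isinstance(value, int) or not 1 <= value <= 5:
        raise ValueError(f"expected an integer rating from 1 to 5, got {value!r}")


def _check_attribute(name: str) -> None:
    if name not in ATTRIBUTES:
        raise ValueError(f"unknown attribute: {name}")


@dataclass
class SurveyResponse:
    gender: str
    big_five: dict = field(default_factory=dict)         # item id, e.g. "Q1" -> 1..5
    self_ratings: dict = field(default_factory=dict)     # stage -> {attribute: 1..5}
    partner_ratings: dict = field(default_factory=dict)  # stage -> {attribute: 1..5}
    breakup_reasons: dict = field(default_factory=dict)  # attribute -> 1..5

    def validate(self) -> None:
        """Raise ValueError if any rating or key falls outside the schema."""
        for value in self.big_five.values():
            _check_rating(value)
        for section in (self.self_ratings, self.partner_ratings):
            for stage, ratings in section.items():
                if stage not in STAGES:
                    raise ValueError(f"unknown stage: {stage}")
                for attribute, value in ratings.items():
                    _check_attribute(attribute)
                    _check_rating(value)
        for attribute, value in self.breakup_reasons.items():
            _check_attribute(attribute)
            _check_rating(value)


response = SurveyResponse(
    gender="Female",
    big_five={"Q1": 4, "Q6": 2},
    self_ratings={"current": {"Loyalty": 5}},
    partner_ratings={"first_meeting": {"Dedication": 4}},
)
response.validate()  # passes silently
```

Validating at the point of entry catches out-of-range values from the printed questionnaires before they reach the analysis stage.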
3.2 Data collection methods
Our data collection methodology centered on leveraging personal connections to widely distribute links to our online survey questionnaire. As a two-person student research team, we relied on a viral “snowball sampling” approach. We first seeded the survey to our extended social and professional networks, after which it spread through exponential peer-to-peer sharing.
In the first phase, we individually sent the survey link via messaging apps to every contact, including friends, classmates, co-workers, relatives, and acquaintances. The message invited each of them to participate and to pass the survey on to their own networks. This enabled us to reach several diverse segments, including university students, public sector employees, and others. Within a few days, we observed rapid voluntary diffusion as our contacts promoted the study among their coworkers, organizations, societies, family members, and online communities. While snowball sampling risks selection bias, the uncontrolled spread of the survey may have partly offset this by broadening representation across relationship statuses, locations, ages, occupations, and interests. Sharing the survey through web links and QR codes on multiple platforms, such as Facebook, Zalo, and Telegram, also expanded its visibility.
Concurrently with the viral approach, we collected data in person through printed questionnaires. These were distributed to students on various university campuses and in public spaces such as cafés, malls, and parks around Ho Chi Minh City. To qualify, respondents had to meet two criteria: being currently in a romantic relationship, and being willing to complete the 15-minute questionnaire seriously and honestly. These public intercept surveys boosted sample diversity, especially among people who are less active online.
Over the intense one-week data gathering phase, which combined digital outreach and in-person collection, we obtained 196 valid responses that passed our quality checks. For open-ended questions, we standardized free text into categories based on a codebook that we developed inductively to capture emerging themes.
This methodology generated a convenience sample suitable for exploratory relationship research; it is not intended to statistically represent Vietnam's broader young adult population. The multi-channel collection protocol also appeared effective for rapidly capturing 196 diverse responses with limited researcher resources.
In our data understanding process, it is crucial to note that we have not defined any threshold for what constitutes 'long-term love'. Our dataset comprises responses from individuals who were in relationships at the time they participated in the survey. This approach reflects our focus on understanding relationship dynamics at their current stage, without a predetermined definition of long-term commitment. This perspective is essential in framing our analysis and interpretation of the data in the context of contemporary relationship experiences.
The six key variables we examined through graphs were gender, age, monthly income, education level attained, length of current relationship, and scores on the Big Five personality assessment. Plotting the demographics provided a broad overview of the composition and profile attributes of the respondents who completed our questionnaire. Inspecting these distributions enabled us to evaluate the quality of the data captured through the collection protocols. It also let us confirm sufficient coverage across the categories needed for later analysis stages. After this initial descriptive phase, we were confident that the dataset was large enough and of sufficient quality to proceed. The next steps were testing hypotheses and developing predictive models to address the motivating aims of our couple dynamics research.
Going forward, assessing how far our findings generalize to the broader population would involve comparing sample characteristics with national census benchmarks. These visual methods for scanning the dataset also guided our choice of statistical tests suited to the underlying properties of variables like age and income. Overall, the diagnostics provided the foundation for deeper analyses that could yield actionable insights for our research questions.
Fig 4.1: Distribution of Male and Female
Fig 4.2: Distribution of Age
Fig 4.3: Distribution of Educational Level
Fig 4.4: Distribution of Income
Our sample of 196 responses exhibits a moderately unbalanced gender split, with 55.6% males compared to 44.4% females. This could be attributed to both research team members being male students from the male-dominated University of Information Technology (UIT). Youths aged 18-25 years old make up the dominant share of the sample, consistent with our intended target group for exploring dynamics between young adult romantic partners.
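The gender percentages above follow from a simple frequency count. The sketch below assumes 109 male and 87 female respondents. These counts are not stated in the text, but they are the only integer split of 196 consistent with 55.6% and 44.4%.

```python
from collections import Counter


def percentage_split(values: list, decimals: int = 1) -> dict:
    """Share of each category as a percentage of all responses."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: round(100 * count / total, decimals)
            for category, count in counts.items()}


# Assumed counts (not stated in the text): 109 male and 87 female respondents.
genders = ["Male"] * 109 + ["Female"] * 87
print(percentage_split(genders))  # {'Male': 55.6, 'Female': 44.4}
```

The same function applies unchanged to the education, age, and income columns plotted in Figs 4.2 to 4.4.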
Most respondents hold a university degree (79.1%), followed by junior college, vocational, master's, doctorate, and lastly high school qualifications. Monthly income clusters primarily in the under-10-million-VND bracket, consistent with the earning power of young Vietnamese adults. The education level, age range, and income levels also reasonably match the typical student profile at our university.
Taken together, the convenience sampling yielded slightly more male and highly educated participants than a balanced sample would. Reasonable coverage across ages and income levels somewhat improves generalizability, given the biases expected from drawing on our personal networks.
Fig 4.5: Duration of Relationship
At this age, many individuals are in their early adult years. This is a period often characterized by exploration and personal growth, when long-term commitments might not be a priority. Therefore, the prevalence of relationships in the 1-2 year and 2-4 year ranges aligns well with this life stage.
Fig 4.6: Distribution of previous lovers
Fig 4.7: Duration Before Breakup (x-axis: duration range)
The study "Physical and Psychological Aggression in At-Risk Young Couples: Stability and Change in Young Adulthood" found that young couples who have been together for approximately 1.5 to 2.5 years are more likely to experience physical and psychological aggression, which can lead to a breakup [8]. This aligns with the observation that young Vietnamese couples tend to break up around the 2.5-year mark.
4.1.3 Big Five Personality
Emotional Stability, and Openness to Experiences. To score each trait, the responses to its paired questions must first be recoded where required, based on the reverse-scored items denoted with "R". On the 1-5 scale, a reverse-coded response x becomes 6 - x. Next, the recoded responses are averaged across the dimension's two questions to obtain the subscale score. For example, to compute Extraversion, the original response for Q1 is averaged with the reverse-coded response for Q6R. This results in a subscale score between 1 and 5 for each of the Big Five traits. These subscale values can be referenced to gender norms to interpret one's relative standing on each dimension. In this manner, the 10-item questionnaire allows quick yet valid measurement of the fundamental Five Factor personality traits [14].
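The scoring procedure described above can be sketched in a few lines. Only the Extraversion pairing (Q1 with reverse-coded Q6) is given in the text. The keying for the other four traits would have to come from the instrument's published scoring key [14]. The `KEYS` table below is therefore deliberately incomplete, and its structure is our assumption.

```python
LIKERT_MAX = 5

# Hypothetical keying table: trait -> ((item, is_reverse_scored), ...).
# Only the Extraversion pair (Q1, Q6R) is stated in the text; the remaining
# traits must be added from the instrument's scoring key.
KEYS = {
    "Extraversion": (("Q1", False), ("Q6", True)),
}


def reverse(score: int) -> int:
    """Reverse-code a 1-5 response: 1 <-> 5, 2 <-> 4, 3 stays 3."""
    return LIKERT_MAX + 1 - score


def subscale_scores(answers: dict, keys: dict = KEYS) -> dict:
    """Average each trait's two items after reverse-coding the 'R' items."""
    scores = {}
    for trait, items in keys.items():
        values = [reverse(answers[item]) if is_reversed else answers[item]
                  for item, is_reversed in items]
        scores[trait] = sum(values) / len(values)
    return scores


# Q1 = 4 and Q6 = 2 (reverse-coded to 4) give an Extraversion score of 4.0.
print(subscale_scores({"Q1": 4, "Q6": 2}))  # {'Extraversion': 4.0}
```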
Men tend to rate themselves higher in extraversion and openness to experiences. They believe they have more outgoing, socially engaged, and adventurous personalities. On the other hand, women demonstrate more confidence in their ability to regulate emotions, seeing themselves as calmer and more level-headed when facing stress. Both genders rate themselves highly on conscientiousness and meticulousness. Men and women alike think that they tend to be thoughtful, deliberate decision makers who consider multiple angles. However, neither gender seems particularly assured of their own agreeableness and warmth.
In summary, my study drew upon the questionnaire items from "Measuring personality in one minute or less" [14] to generate indicative Big Five trait measurements. The analysis presented here, however, is my own interpretation, based solely on the results of my converted 10-item survey. It suggests some subtle differences in how men and women view certain aspects of their own personalities.
4.2 Self-Assessment of Male and Female
4.2.1 Applying t-test and p-value
Conditional Mean Comparison:
If you have a continuous variable (like a score) and a two-group categorical variable (like gender), you can use a t-test to compare the means of the continuous variable across the two groups.
This indicates whether there is a statistically significant difference in the continuous variable between the groups, which may suggest an association between the two variables.
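The mean comparison described above can be sketched with the standard library alone. The thesis does not state which t-test variant it used, so the sketch assumes Welch's unequal-variance test. It computes the two-sided p-value by numerically integrating the t density with Simpson's rule. That is adequate for illustration but not a substitute for a statistics library; in SciPy, the corresponding call would be `scipy.stats.ttest_ind(a, b, equal_var=False)`. The ratings below are made up.

```python
import math
from statistics import mean, variance


def two_sided_p_value(t: float, df: float, steps: int = 2000) -> float:
    """Two-sided p-value P(|T| >= |t|) for a t distribution with df degrees of freedom.

    Integrates the t density over [0, |t|] with Simpson's rule (steps must be even).
    """
    t = abs(t)
    if t == 0:
        return 1.0
    # Log of the density's normalising constant; lgamma avoids overflow for large df.
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # approximates P(0 <= T <= t)
    return max(0.0, 1.0 - 2.0 * area)


def welch_t_test(a: list, b: list) -> tuple:
    """Return (t, df, p) for Welch's unequal-variance two-sample t-test."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    if va + vb == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_sided_p_value(t, df)


# Made-up 1-5 self-ratings of one attribute for two groups.
male_ratings = [4, 5, 5, 4, 5]
female_ratings = [2, 3, 2, 3, 2]
t, df, p = welch_t_test(male_ratings, female_ratings)
print(f"t = {t:.3f}, df = {df:.1f}, p = {p:.5f}")
```

A p-value below 0.05 is read, as in the tables of this chapter, as a statistically significant gender difference for that attribute.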
Using T-Tests in Regression Analysis:
In a regression model, t-tests are used to assess whether each independent variable is significantly associated with the dependent variable.
The t-statistic and p-value for each variable's coefficient indicate whether that coefficient differs significantly from zero. In other words, they show whether the variable is associated with the dependent variable after accounting for the other predictors in the model.
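The two uses of the t-test above are closely related. Suppose the only predictor is a 0/1 indicator, for example 1 for male and 0 for female. Then the slope's t-statistic in an ordinary least-squares regression equals Student's pooled-variance two-sample t-statistic. The sketch below checks this numerically on made-up ratings; the function names are our own.

```python
import math
from statistics import mean


def slope_t_statistic(x: list, y: list) -> float:
    """t-statistic of the slope b1 in the least-squares fit y = b0 + b1 * x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return b1 / se_b1


def pooled_t_statistic(a: list, b: list) -> float:
    """Student's two-sample t-statistic with a pooled variance estimate."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_within = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    sp2 = ss_within / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))


# Made-up ratings; x = 1 marks a male respondent, x = 0 a female respondent.
male = [4, 5, 3, 4, 5]
female = [3, 3, 2, 4, 3]
x = [1] * len(male) + [0] * len(female)
y = male + female
print(round(slope_t_statistic(x, y), 6), round(pooled_t_statistic(male, female), 6))
```

With a 0/1 predictor, the fitted values are just the two group means. As a result, the slope equals the difference in means and the residual variance equals the pooled within-group variance, which is why the two statistics coincide.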
4.2.2 First Meeting
Table 4.1: t-test & p-value of Self-Assessment in first meeting
Attribute | Male Mean | Female Mean | t-statistic | p-value
Fig 4.9: Boxplot of Self-Assessment in first meeting
The data and charts for the first meeting show that men pay more attention to their own finances than women do, while women place strong emphasis on their own fidelity. These gender differences are supported by p-values below 0.05, and the remaining factors show how men's and women's self-assessments relate to each other. It is understandable that both men and women are less interested in Parenting and Desire for Children. The dataset mainly covers young people, an age group with less desire to have children than other age groups [31]. This also means that factors related to appearance and attentiveness, such as health, fidelity, and thoroughness, receive relatively more weight.
4.2.3 Current
Table 4.2: t-test & p-value of Self-Assessment currently
Attribute | Male Mean | Female Mean | t-statistic | p-value
Fig 4.10: Boxplot of Self-Assessment currently
The boxplots for "Male Self Current Evaluation" and "Female Self Current Evaluation" reflect the self-perceived attributes of individuals currently in a romantic relationship. Men's evaluations of factors such as "Finance" and "Ambition" display a broad distribution and higher medians, indicating that men rate these attributes highly in the context of a romantic relationship. For women, attributes like "Loyalty" and "Dedication" have higher medians, suggesting that these qualities are particularly valued by women in their romantic partnerships.
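The boxplot readings above (medians and spread) come from each group's five-number summary. The following is a minimal sketch using Python's `statistics.quantiles`. The quartile method (`"inclusive"`) and the sample ratings are assumptions, and plotting libraries may compute quartiles or whisker ends slightly differently.

```python
from statistics import quantiles


def five_number_summary(values: list) -> tuple:
    """(min, Q1, median, Q3, max); the box of a boxplot spans Q1 to Q3."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def iqr(values: list) -> float:
    """Interquartile range, i.e. the height of the box."""
    _, q1, _, q3, _ = five_number_summary(values)
    return q3 - q1


# Made-up 1-5 self-ratings of "Finance" for one group.
print(five_number_summary([2, 3, 4, 4, 5, 5, 3, 4]))
```

A "higher median" in the text corresponds to a larger middle value here, and a "broad distribution" to a larger IQR or a wider min-max range.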
The statistical analysis for "Self (current)" reveals a significant difference between men and women in the attribute "Finance", with a p-value of 0.00017. This suggests that men in romantic relationships perceive their financial status as more
relationships perceive the importance of ambition, although the difference is not conclusively significant. The other attributes do not show a statistically significant difference between genders. This is consistent with, though not proof of, broadly similar views on their importance within a romantic relationship.
The emphasis men place on "Finance" and "Ambition" may reflect their self-assessment of these traits as central to their identity or value within a romantic partnership. Women's prioritization of "Loyalty" and "Dedication" may indicate that these are key traits defining their engagement in a relationship. Neither gender shows a statistically significant difference in how they self-evaluate the other attributes. This is consistent with a shared perspective on the importance of these qualities in the context of their current romantic relationships.
4.3 Partner Assessment of Male and Female
3.193 -0.6163 3.064 -2.6612
Attribute | p-value
Finance | 0.000229
Attractiveness | 0.163063
Loyalty | 0.106041
Parenting | 0.321845
Social Status | 0.059779
Health | 0.134887
Desire for children | 0.187129
Dedication | 0.232742
Ambition | 0.538456
Family Relationship | 0.008458
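Reading this table against the 0.05 threshold used in this chapter can be made explicit. The sketch copies the p-values listed above; the function name is our own.

```python
# p-values copied from the table above.
P_VALUES = {
    "Finance": 0.000229,
    "Attractiveness": 0.163063,
    "Loyalty": 0.106041,
    "Parenting": 0.321845,
    "Social Status": 0.059779,
    "Health": 0.134887,
    "Desire for children": 0.187129,
    "Dedication": 0.232742,
    "Ambition": 0.538456,
    "Family Relationship": 0.008458,
}


def significant_attributes(p_values: dict, alpha: float = 0.05) -> list:
    """Attributes whose p-value falls below the significance level alpha."""
    return [name for name, p in p_values.items() if p < alpha]


print(significant_attributes(P_VALUES))  # ['Finance', 'Family Relationship']
```

Social Status (p ≈ 0.060) falls just above the threshold, so it is not counted as significant at the 0.05 level.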