
Final Report, Business Intelligence Major: Social Media Data Analysis


DOCUMENT INFORMATION

Basic information

Title: Social Media Data Analysis
Authors: Le Nguyen Trieu Giang, Tran Ngoc My Dung, Cao Nhat Phi, Nguyen Duc Vinh, Nguyen Phi Hung
Supervisor: Tran Thanh Cong, Lecturer
University: Ho Chi Minh City University of Economics and Finance
Major: Business Intelligence
Type: Final Report
Year: 2024
City: Ho Chi Minh City
Pages: 33
File size: 3.66 MB

Structure

  • CHAPTER 1: IDENTIFY PROBLEM
    • 1.1. OVERVIEW
    • 1.2. PROBLEM STATEMENT
    • 1.3. OBJECTIVES OF THE STUDY METHODOLOGY
    • 1.4. REPORT STRUCTURE
      • 1.4.1. INTRODUCTION
      • 1.4.2. DATA COLLECTION
      • 1.4.3. ANALYSIS AND FINDINGS
      • 1.4.4. RECOMMENDATIONS
      • 1.4.5. CONCLUSION
      • 1.4.6. REFERENCES
  • CHAPTER 2: DATA DESCRIPTION
    • 2.1. IMPORT PYTHON LIBRARIES
    • 2.2. READING DATASET
      • 2.2.1. DATA HEAD
      • 2.2.2. DATA TAIL
      • 2.2.3. DATA INFO
  • CHAPTER 3: DATA CLEANING
    • 3.1. WHAT IS DATA CLEANING?
    • 3.2. CHECK FOR DUPLICATION
    • 3.4. DATA REDUCTION
    • 3.5. FEATURE ENGINEERING
    • 3.6. CREATING FEATURES
  • CHAPTER 4: EXPLORATORY DATA ANALYSIS (EDA)
    • 4.1. STATISTICS SUMMARY
    • 4.2. EDA UNIVARIATE ANALYSIS
    • 4.3. DATA TRANSFORMATION
    • 4.4. EDA BIVARIATE ANALYSIS
    • 4.5. EDA MULTIVARIATE ANALYSIS
  • CHAPTER 5: RECOMMENDATIONS
  • CHAPTER 6: CONCLUSIONS
    • 6.1. SUMMARIZE THE PROJECT
    • 6.2. LIMITATIONS

Contents

Figures (partial list):
Figure 3: Description of the last 5 lines of the data table
Figure 5: Unique data description for each data column
Figure 8: Description of the column of redundant data that

IDENTIFY PROBLEM

OVERVIEW

Relying on social media data analysis to maximize sales is a project in which our team analyzes data from social media platforms such as Facebook, Twitter, Instagram, and LinkedIn to determine which platforms are most popular and when consumers visit them most frequently. From these findings, we provide organizations with suitable ways to reach customers quickly and to improve their sales and marketing initiatives.

PROBLEM STATEMENT

In recent years, Vietnam's economy has grown significantly and steadily. The state encourages the creation of new small and medium-sized enterprises, and the growing number of firms fosters originality in business thinking while allowing entrepreneurs to continually reinvent their processes. One of the most significant innovations in business today is the integration of database-driven information technology into corporate operating systems. Managers struggle to digest the volume of input data, yet they must still make informed decisions, and decision support systems have proved beneficial for this purpose. Furthermore, rapid industry growth forces organizations to identify and forecast market trends quickly. That is why our team monitors traffic on social networking sites and offers suitable business advice to firms looking to grow.

Businesses can gain a great deal from analyzing social media data to maximize sales, including better customer understanding, higher engagement, measurable performance, and lower marketing costs.

OBJECTIVES OF THE STUDY METHODOLOGY

• To identify trends and patterns: Analyze user-generated content to determine popular themes, hashtags, and interaction patterns across social media sites.

• To understand user behavior: Examine how people interact with content, including their preferences, sentiments, and engagement metrics such as likes, shares, and comments.

• To measure impact and influence: Determine the effect of social media campaigns, influencers, or viral content on brand perception, customer behavior, and audience attitudes.

• To explore audience segmentation: Segment the social media audience by demographics, interests, and habits to better understand its needs and preferences.

• To identify opportunities and challenges: Identify new opportunities or risks in the social media ecosystem, such as algorithm updates, regulatory issues, or changes in customer behavior.

• To inform decision-making: Provide stakeholders with information and recommendations so they can make informed decisions about content strategy, marketing campaigns, customer service, and product development.

• To enhance customer engagement: Develop methods for increasing consumer engagement and loyalty through tailored content, timely communication, and community management on social media platforms.

• To support research and innovation: Contribute to research on social media analysis methods, tools, and best practices, helping to advance the discipline and address emerging challenges.

REPORT STRUCTURE

1.4.1 INTRODUCTION

• Provide an overview of the study's objectives and rationale.

• Introduce the importance of social media analysis in understanding user behavior, brand perception, and market trends.

• Outline the structure of the report.

1.4.2 DATA COLLECTION

• Provide an overview of the social media platforms and datasets used in the study.

• Explain the criteria for selecting data sources and the timeframe of data collection.

1.4.3 ANALYSIS AND FINDINGS

• Present the results of the social media analysis based on the research objectives.

• Analyze trends, patterns, and insights derived from the data.

• Use visualizations such as charts, graphs, and heatmaps to illustrate key findings.

• Interpret the findings in the context of existing literature and theoretical frameworks.

1.4.4 RECOMMENDATIONS

• Provide actionable recommendations based on the study findings.

• Suggest strategies for optimizing social media engagement, improving brand reputation, or addressing identified challenges.

• Prioritize recommendations based on their potential impact and feasibility of implementation.

1.4.5 CONCLUSION

• Summarize the key findings of the study and their implications.

• Reflect on the contribution of the study to the field of social media analysis.

• Highlight avenues for future research and areas for further exploration.

1.4.6 REFERENCES

• Provide a list of references cited throughout the report, following the appropriate citation style.

DATA DESCRIPTION

IMPORT PYTHON LIBRARIES

The first step in any machine-learning or data-analysis workflow in Python is to load and understand the data, which relies on a set of libraries.

Import all the libraries needed for the analysis, covering data loading, statistical analysis, visualization, data transformation, and merging and joining.
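As a minimal sketch (the report does not show its import cell, so the choice of libraries here is an assumption), the core imports for loading, summarizing, and transforming the data could look like this:

```python
# Minimal import cell (an assumption: the report does not show its exact imports).
import numpy as np   # numerical arrays and summary statistics
import pandas as pd  # data loading, cleaning, transformation, merging and joining

# Plotting libraries such as matplotlib.pyplot (and optionally seaborn) would also be
# imported here to reproduce the charts in Chapter 4.

# Show every column when previewing a wide DataFrame instead of eliding the middle ones.
pd.set_option("display.max_columns", None)

print("pandas", pd.__version__, "| numpy", np.__version__)
```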

READING DATASET

Pandas can load data into a DataFrame from many formats, including JSON, CSV, Excel (xlsx), SQL databases, pickle, HTML, and plain-text files.

Much tabular data is distributed as CSV files, a popular and easily accessible format. The read_csv() function reads such a file into a pandas DataFrame.
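The sketch below shows read_csv() on a small in-memory CSV. The three rows and their values are hypothetical, and the column names are assumed from the variables discussed later in this report; the report does not name its data file.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt; the real dataset has more rows and columns.
csv_text = """Text,Platform,Likes,Retweets,Country
Enjoying a sunny afternoon!,Twitter,30,15,USA
Stuck in traffic again.,Facebook,10,5,Canada
New workout personal best!,Instagram,40,20,USA
"""

# read_csv accepts a file path, a URL, or any file-like object such as this buffer.
data = pd.read_csv(io.StringIO(csv_text))

print(type(data).__name__, data.shape)  # DataFrame (3, 5)
print(data.dtypes)
```

With the real file, pass its path instead, for example pd.read_csv("social_media.csv") (a placeholder name).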

Exploratory Data Analysis (EDA) plays a crucial role in examining social media user data, helping businesses understand how people use social networks and identify the factors that influence their operations. Social media data is typically stored in a DataFrame for analysis, and EDA techniques then reveal patterns and correlations that guide data-driven decisions and help optimize operations.

DATA HEAD

The data.head() method returns the first few rows of a DataFrame (a two-dimensional table) or a Series (a one-dimensional labeled array) in pandas. It returns 5 rows by default; pass a number, as in data.head(10), to return a different number of rows.

• Preview data: data.head() shows part of the data without printing the entire DataFrame or Series, which saves time.

• Check the data structure: The first few rows reveal the column names and the kind of values each column holds.

• Check input data: Use data.head() to confirm that data was read correctly from a source such as a CSV file or a database.

• Inspect data after processing: After transforming the data, use data.head() to examine the results of those transformations.
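A short illustration of head() on a hypothetical eight-row DataFrame (the values are made up), showing the default of five rows and an explicit row count:

```python
import pandas as pd

# Hypothetical data: eight posts with their like counts.
data = pd.DataFrame({
    "Post": range(1, 9),
    "Likes": [12, 25, 31, 40, 18, 55, 22, 70],
})

first_five = data.head()    # default n=5
first_three = data.head(3)  # explicit number of rows

print(first_three)
```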

DATA TAIL

The data.tail() method returns the last few rows of a DataFrame or Series in pandas. It returns 5 rows by default; pass a number, as in data.tail(10), to change how many rows are returned.

• Preview data: data.tail() shows the end of the data without printing the entire DataFrame or Series, which saves time.

• Check the data structure: The last few rows reveal the column names and the kind of values each column holds.

• Check input data: Use data.tail() to confirm that data was read completely from a source such as a CSV file or a database.

• Inspect data after processing: After transforming the data, use data.tail() to examine the results of those transformations.
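The same hypothetical eight-row DataFrame illustrates tail(); note that the returned rows keep their original index labels:

```python
import pandas as pd

# Hypothetical data: eight posts with their like counts.
data = pd.DataFrame({
    "Post": range(1, 9),
    "Likes": [12, 25, 31, 40, 18, 55, 22, 70],
})

last_five = data.tail()   # default n=5
last_two = data.tail(2)   # explicit number of rows

print(last_two)
```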

Figure 3: Description of the last 5 lines of the data table

DATA INFO

The data.info() method in pandas prints a concise summary of a DataFrame, including the number of rows and columns, the data type of each column, and the number of non-null values in each column. For this dataset, no column contains missing values. The summary reports:

• The total number of rows (entries) and columns

• The name and data type of each column

• The number of non-null values in each column
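A minimal sketch of info() on hypothetical data (one value is deliberately missing so the non-null counts differ). Because info() prints rather than returns, the buf argument is used to capture its text, and the same facts are then read programmatically:

```python
import io

import pandas as pd

# Hypothetical data with one missing value in "Text".
data = pd.DataFrame({
    "Text": ["Great day!", "Bad traffic", None],
    "Platform": ["Instagram", "Twitter", "Facebook"],
    "Likes": [30, 10, 20],
})

# info() prints its summary; passing buf= captures the text instead.
buffer = io.StringIO()
data.info(buf=buffer)
report = buffer.getvalue()
print(report)

# The same facts are available programmatically.
n_rows, n_cols = data.shape      # rows (entries) and columns
column_types = data.dtypes       # data type of each column
non_null = data.count()          # non-null values per column
print(non_null)
```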

Figure 4: Description of data columns

DATA CLEANING

WHAT IS DATA CLEANING?

Data cleaning is the process of correcting and standardizing data before analysis. Its goal is to remove or correct inaccurate, inconsistent, or unreliable values so that the data entering the analysis is accurate and trustworthy.

Below are some of the operations commonly performed when cleaning data in an analytics pipeline:

• Remove duplicates: Detect and delete records that repeat identical values in rows or columns.

• Handle missing data: Fill in missing values, often with the mean, median, or mode, or predict them from other variables with a model.

• Standardize data: Ensure that values in a column use the same units or scale, for example by converting Fahrenheit to Celsius, or the same form, for example by converting strings to lower or upper case.

• Remove noise: Identify and remove noisy or imprecise values, such as outliers that may result from recording errors.

• Check and correct errors: Detect invalid or unprocessable values and correct them where possible.

• Reformat data: Ensure that each column is stored as the appropriate data type, such as numbers, strings, or dates.

• Check consistency: Verify that values are consistent across related columns and records, to ensure the data was recorded or collected correctly and without conflicts.

Data cleaning is an important part of the analytics pipeline, ensuring that analytical results are accurate and meaningful.
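The sketch below applies four of the operations listed above (standardizing text, removing duplicates, filling a missing value with the median, and reformatting a column as dates) to a small hypothetical DataFrame. The order of the steps and the choice of the median are illustrative assumptions, not the report's documented procedure.

```python
import pandas as pd

# Hypothetical raw data: inconsistent casing and whitespace, a duplicate row,
# a missing like count, and timestamps stored as text.
raw = pd.DataFrame({
    "Platform": [" Instagram", "twitter", "twitter", "FACEBOOK"],
    "Likes": [30, 10, 10, None],
    "Timestamp": ["2023-01-15 12:30:00", "2023-01-16 08:45:00",
                  "2023-01-16 08:45:00", "2023-01-17 20:00:00"],
})

clean = raw.copy()
# Standardize text: trim whitespace and use one letter case.
clean["Platform"] = clean["Platform"].str.strip().str.title()
# Remove exact duplicate rows (done after standardizing so variants are caught).
clean = clean.drop_duplicates().reset_index(drop=True)
# Fill the missing numeric value with the column median.
clean["Likes"] = clean["Likes"].fillna(clean["Likes"].median())
# Reformat text timestamps as a proper datetime column.
clean["Timestamp"] = pd.to_datetime(clean["Timestamp"])

print(clean)
```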

CHECK FOR DUPLICATION

The data.nunique() method counts the number of distinct values in each column of a DataFrame. It returns a Series indexed by column name that holds the number of unique values in each column. Figure 5 shows the result for the social media dataset.

Figure 5: Unique data description for each data column
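A small sketch on hypothetical data: nunique() counts distinct values per column, and, since this section is about duplication, duplicated().sum() (a related pandas method that the report does not mention) counts rows that exactly repeat an earlier row.

```python
import pandas as pd

# Hypothetical posts; the last row repeats the first one exactly.
data = pd.DataFrame({
    "User": ["u1", "u2", "u1", "u3", "u1"],
    "Platform": ["Instagram", "Twitter", "Instagram", "Instagram", "Instagram"],
    "Likes": [30, 10, 25, 40, 30],
})

unique_counts = data.nunique()                   # Series indexed by column name
n_duplicate_rows = int(data.duplicated().sum())  # rows identical to an earlier row

print(unique_counts)
print("duplicate rows:", n_duplicate_rows)
```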

Missing-value calculation is the process of counting and evaluating the proportion of missing values in a dataset. Real-world data often contains missing or incomplete entries, for reasons such as errors in data collection or conversion, or data simply not being available for some observations. In pandas, the number of missing values per column is obtained with data.isnull().sum().

Figure 6: Depicting no missing data

The check shows that the dataset contains no missing values.
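The sketch below counts missing values per column and converts them to percentages of all rows. The sample is hypothetical and deliberately contains gaps so that the output is non-zero; the report's own dataset has none.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap in "Text" and one in "Likes".
data = pd.DataFrame({
    "Text": ["Great day!", None, "Bad traffic"],
    "Likes": [30, 10, np.nan],
    "Country": ["USA", "Canada", "UK"],
})

missing_count = data.isnull().sum()       # number of missing values per column
missing_pct = data.isnull().mean() * 100  # share of rows missing, in percent

print(pd.DataFrame({"missing": missing_count, "percent": missing_pct.round(1)}))
```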

DATA REDUCTION

Some columns or variables can be dropped if they do not add value to the analysis.

In the data table above, the columns "Unnamed: 0.1" and "Unnamed: 0" contain the same data, so one of them can be deleted without affecting the social media analysis.

Figure 7: Description of columns 0 and 1 having duplicate data

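The line data.drop('Unnamed: 0.1', axis=1, inplace=True) removes the column named 'Unnamed: 0.1' from the DataFrame data: the first argument is the label to drop, axis=1 indicates that the label refers to a column rather than a row, and inplace=True modifies data directly instead of returning a new DataFrame.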

Figure 8: Description of the column of redundant data that
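A minimal sketch of the same drop on a hypothetical three-row frame, with a check (an added safeguard, not shown in the report) that the two index-like columns really are identical before one is removed:

```python
import pandas as pd

# Hypothetical frame with two identical index-like columns.
data = pd.DataFrame({
    "Unnamed: 0.1": [0, 1, 2],
    "Unnamed: 0": [0, 1, 2],
    "Likes": [30, 10, 20],
})

# Confirm the columns hold identical values before dropping one of them.
assert data["Unnamed: 0.1"].equals(data["Unnamed: 0"])

# axis=1 targets columns; inplace=True modifies data instead of returning a copy.
data.drop('Unnamed: 0.1', axis=1, inplace=True)
# Equivalent non-inplace form: data = data.drop(columns=['Unnamed: 0.1'])

print(data.columns.tolist())
```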

FEATURE ENGINEERING

Feature engineering means using domain knowledge to select and transform the most relevant variables from raw data when building a predictive model with machine learning or statistical modeling. Its main goal is to derive meaningful inputs from raw data.

CREATING FEATURES

"Creating features" is the process of deriving new variables or attributes from the original data to improve model performance or to understand the data better. In data analytics and machine learning, selecting and generating appropriate features is an important and often highly creative part of the modeling process.

Figure 10: Description of the datetime data table

We create a feature named “datetime” so that posts can be filtered and analyzed by time more easily, supporting sales optimization based on the social media user data.
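The report does not show how its “datetime” feature was built. One possibility, sketched here under that assumption, is to assemble it from the Year, Month, Day, and Hour columns; the weekday column is a hypothetical extra feature derived from it, and the rows are made up.

```python
import pandas as pd

# Hypothetical rows with separate date parts, as in the Year/Month/Day/Hour columns.
data = pd.DataFrame({
    "Year": [2023, 2023, 2024],
    "Month": [1, 6, 2],
    "Day": [15, 3, 29],
    "Hour": [12, 8, 20],
})

# pd.to_datetime can assemble a datetime from columns named year, month, day, hour.
parts = data[["Year", "Month", "Day", "Hour"]].rename(columns=str.lower)
data["datetime"] = pd.to_datetime(parts)

# Hypothetical derived feature: the day of the week each post was made.
data["weekday"] = data["datetime"].dt.day_name()

print(data[["datetime", "weekday"]])
```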

EXPLORATORY DATA ANALYSIS (EDA)

STATISTICS SUMMARY

Summary statistics in social media analysis offer valuable insights by condensing the key numerical data from the analysis. They give an overview of essential metrics such as post counts, views, interactions, and participation levels, and can extend to user-specific information such as time spent, geographic location, preferred platforms, and social network behavior. By analyzing these statistics, analysts can uncover significant trends and patterns, such as user engagement over time, posting frequency, share rates, and community involvement.

A statistical summary gives a general picture of the data's distribution: whether it is roughly normal, skewed left or right, or contains outliers. In Python this can be done with describe(). By default, describe() summarizes only numeric columns (such as int and float), reporting the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum. A mean far from the median (50%) hints at skew, and a maximum far above the 75th percentile hints at outliers.

Figure 11: Describe data without non-numeric columns
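A minimal sketch of describe() on hypothetical data: the text column "Platform" is left out by default, and each numeric column gets count, mean, std, min, quartiles, and max.

```python
import pandas as pd

# Hypothetical posts with one text column and two numeric columns.
data = pd.DataFrame({
    "Platform": ["Instagram", "Twitter", "Facebook", "Instagram"],
    "Likes": [30, 10, 20, 40],
    "Retweets": [15, 5, 10, 20],
})

summary = data.describe()  # numeric columns only by default
print(summary)
```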

Likes and Retweets vary widely, with a large standard deviation relative to the mean, which points to marked differences in user engagement. The time variables Year, Month, Day, and Hour are also recorded, providing information about temporal patterns in user activity on social media platforms.

Figure 12: Describe data with all types of columns

From this extended summary, the data covers a variety of users and platforms, with 685 unique users and 4 unique platforms recorded. Instagram is the most frequent platform with 258 mentions, while '#Compassionate #TearsOfEmpathy' is the most frequent hashtag with 3 uses. Engagement also varies, with likes averaging around 42 and reaching up to 80. Geographically, the data is concentrated mainly in the United States, with 59 mentions. Finally, 'Positive' is the most frequently recorded sentiment with 44 appearances, reflecting a positive tone in many social media posts.
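Figures such as "most frequent platform" and "number of unique users" come from the extra rows that describe(include="all") adds for text columns: unique, top (most frequent value), and freq (its count). A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical posts with one text column and one numeric column.
data = pd.DataFrame({
    "Platform": ["Instagram", "Twitter", "Facebook", "Instagram"],
    "Likes": [30, 10, 20, 40],
})

summary_all = data.describe(include="all")  # numeric and non-numeric columns

# unique/top/freq are filled for text columns; mean/std/quartiles for numeric ones.
print(summary_all.loc[["count", "unique", "top", "freq"], "Platform"])
```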

EDA UNIVARIATE ANALYSIS

This bar chart compares the number of posts on three social media platforms in the collected data. Instagram leads with the most posts, followed by Facebook and Twitter. The disparity may reflect each platform's popularity or brands' priorities when choosing where to engage their audience. The chart offers insight into how content is distributed across social media platforms and indicates which platforms may be the strongest marketing channels for customer outreach.
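The counts behind a bar chart like this one can be computed with value_counts(), which sorts categories from most to least frequent. The platform labels below are hypothetical sample data.

```python
import pandas as pd

# Hypothetical platform label for each post.
data = pd.DataFrame({"Platform": [
    "Instagram", "Facebook", "Instagram", "Twitter", "Instagram", "Facebook",
]})

posts_per_platform = data["Platform"].value_counts()  # descending by count
print(posts_per_platform)
```

Calling posts_per_platform.plot(kind="bar") would draw the chart itself, which requires matplotlib.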

This graph shows the distribution of "Likes" received by social media posts. The distribution is multimodal: several peaks appear at different ranges of "Likes", indicating groups of posts whose "Likes" cluster at different levels, from low to high. Peaks stand out at around 20 to 30 "Likes" and again at around 40 "Likes". Beyond that, the number of posts gradually declines, with only a few posts receiving more than 70 "Likes". The chart shows how users interact with content on social media platforms and can support deeper analysis of content strategies and social media marketing techniques.
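A histogram simply counts posts per range of "Likes". This sketch uses numpy.histogram with bins of width 10 on hypothetical like counts and prints a text version of the chart; the bin width is an assumption, since the report does not state its own.

```python
import numpy as np
import pandas as pd

# Hypothetical like counts for twelve posts.
likes = pd.Series([12, 22, 25, 28, 30, 38, 40, 41, 44, 55, 72, 80])

# Bins of width 10; each bin is half-open [a, b) except the last, which includes 80.
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80]
counts, edges = np.histogram(likes, bins=bins)

# Text histogram: one "#" per post in each bin.
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"{int(lo):>2}-{int(hi):<2} | {'#' * int(c)}")
```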

This chart shows the total number of "Likes" received by posts from each country. The US comes out on top with the most "Likes", followed by the United Kingdom and Canada, then Australia and India. Some country names appear more than once in the chart, possibly because of an input error or the way the chart is displayed. Even so, the chart shows where social media content is most popular and identifies the US as the most engaged market on these platforms. Businesses can use this information to focus their marketing on regions where users are more likely to interact, improving the effectiveness of advertising and content campaigns.

Figure 16: Chart of 2 variables “Top country likes”
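One common cause of duplicated category labels (an assumption here; the report does not identify the cause) is stray whitespace, which makes "USA" and "USA " separate groups. The hypothetical sketch below shows the effect on a per-country sum and how strip() merges the groups.

```python
import pandas as pd

# Hypothetical posts; some country labels carry leading or trailing spaces.
data = pd.DataFrame({
    "Country": ["USA", "USA ", "UK", "Canada", " USA", "UK "],
    "Likes": [40, 30, 25, 20, 35, 15],
})

# Before cleaning: whitespace variants form separate groups.
raw_totals = data.groupby("Country")["Likes"].sum()
print(len(raw_totals), "groups before stripping")

# After stripping whitespace, each country forms a single group.
data["Country"] = data["Country"].str.strip()
top_countries = data.groupby("Country")["Likes"].sum().sort_values(ascending=False)
print(top_countries)
```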

This bar graph shows the number of posts tied to each hashtag in the top 10 list. Each bar corresponds to a hashtag, and its height represents the number of posts. The distribution is uneven, with some hashtags used more often than others. This reveals current trends and interests across social media platforms, helping marketers understand the topics of greatest public interest. Notably, hashtags associated with emotions and reactions, such as '#Compassionate' and '#TearsOfEmpathy', appear frequently, which may reflect users' interest in humane and empathetic content.
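A sketch of the top-10 count on hypothetical data. Counting each full hashtag string as one label matches how the summary statistics treat '#Compassionate #TearsOfEmpathy'; splitting the strings into individual tags is an alternative not used in the report.

```python
import pandas as pd

# Hypothetical "Hashtags" values; each post stores its tags in one string.
data = pd.DataFrame({"Hashtags": [
    "#Compassionate #TearsOfEmpathy", "#Nature #Park", "#Compassionate #TearsOfEmpathy",
    "#Fitness", "#Nature #Park", "#Compassionate #TearsOfEmpathy",
]})

# Each full string counts as one label.
top10_strings = data["Hashtags"].str.strip().value_counts().head(10)

# Alternative: split on whitespace and count individual tags.
top10_tags = data["Hashtags"].str.split().explode().value_counts().head(10)

print(top10_strings)
print(top10_tags)
```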

DATA TRANSFORMATION

After the univariate analysis gave an overview of the initial data, we found that the results in the charts were not fully consistent and that there were many overlapping variables. Data transformation is therefore applied, based on what we learned about the data's characteristics, to adjust the data to the goals of the analysis and modeling. The first step is to separate the numerical variables from the categorical ones.

Applying the strip() method to the 'Text' column removes unwanted whitespace, such as spaces, tabs, and carriage returns, from the start and end of each string, resulting in cleaner data.
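A sketch of both steps on hypothetical data. Columns are split with select_dtypes; exclude="number" is used for the categorical side so the split also works when pandas stores text with a dedicated string dtype. The report strips only 'Text'; extending strip() to every text column is an assumption.

```python
import pandas as pd

# Hypothetical rows with stray whitespace in the text columns.
data = pd.DataFrame({
    "Text": ["  Great day!  ", "\tBad traffic\n"],
    "Platform": ["Instagram ", "Twitter"],
    "Likes": [30, 10],
    "Hour": [12, 8],
})

# Separate numerical from categorical (non-numeric) columns.
numeric_cols = data.select_dtypes(include="number").columns.tolist()
categorical_cols = data.select_dtypes(exclude="number").columns.tolist()

# strip() removes leading and trailing spaces, tabs, and newlines.
for col in categorical_cols:
    data[col] = data[col].str.strip()

print(numeric_cols, categorical_cols)
print(data)
```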

EDA BIVARIATE ANALYSIS

Figure 19: Bivariate graph between Hour and Like

This scatter plot shows the relationship between the number of "Likes" that social media posts receive and the hour of the day at which they are posted. The points are widely scattered, showing large variation in "Likes" at every hour. The trendline indicates a slight upward relationship between "Likes" and the hour of posting, which may mean that posts at certain hours get more engagement. However, the wide dispersion also implies that many other factors affect the number of "Likes" a post receives, so no firm conclusion can be drawn from this chart alone.
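A trendline like the one described is typically an ordinary least-squares fit. This sketch computes one with numpy.polyfit on hypothetical Hour and Likes values, together with the Pearson correlation r. The value r² indicates how much of the variation in Likes the hour alone explains, which quantifies the caution above.

```python
import numpy as np
import pandas as pd

# Hypothetical posting hours and like counts.
data = pd.DataFrame({
    "Hour":  [8, 10, 12, 14, 16, 18, 20, 22],
    "Likes": [20, 35, 25, 40, 30, 45, 35, 50],
})

# Least-squares line: Likes ≈ slope * Hour + intercept
slope, intercept = np.polyfit(data["Hour"], data["Likes"], deg=1)

# Pearson r, and r² as the share of variance in Likes explained by Hour alone.
r = data["Hour"].corr(data["Likes"])
r_squared = r ** 2

print(f"slope={slope:.3f} likes/hour, intercept={intercept:.2f}, r={r:.3f}, r^2={r_squared:.3f}")
```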

Figure 20: Chart of 2 variables "Likes" and "Platform"

This chart shows the distribution of "Likes" across social media posts as a blue histogram overlaid with a red kernel density estimate (KDE) line. The blue bars give the number of posts that receive a given range of "Likes", while the KDE line gives a smooth view of the overall shape of the distribution. Many posts are concentrated at a low number of "Likes", while only a few receive a high number, as shown by the peak of the histogram and the KDE line. This reflects disparities in popularity and user engagement between posts.
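To show what the red KDE line represents, here is a from-scratch Gaussian kernel density estimate in NumPy: each post contributes a small bell curve, and the curves are averaged. The bandwidth rule (1.06 · σ · n^(-1/5), a Silverman-style rule of thumb) and the sample values are assumptions; plotting libraries such as seaborn choose their own defaults, so their curves may differ slightly.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate of `samples`, evaluated at each point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Rule-of-thumb bandwidth (Silverman-style); an assumption, not seaborn's exact default.
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    # Scaled distance between every grid point (rows) and every sample (columns).
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the curve integrates to 1.
    return kernels.sum(axis=1) / (n * bandwidth)


# Hypothetical like counts: many low values, a few high ones.
likes = np.array([10, 12, 15, 15, 18, 20, 22, 25, 30, 45, 60, 80])
grid = np.linspace(-50, 150, 2001)
density = gaussian_kde_1d(likes, grid)

step = grid[1] - grid[0]
area = density.sum() * step          # close to 1 for a valid density
peak_likes = grid[density.argmax()]  # where the smooth curve is highest
print(f"area={area:.4f}, peak at about {peak_likes:.1f} likes")
```

To overlay the curve on a count histogram, multiply the density by the number of posts and the bin width.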

EDA MULTIVARIATE ANALYSIS

Figure 21: Multivariate chart by "Platform" (Instagram, Facebook, Twitter)

Figure 21 illustrates the relationships among the engagement metrics 'Likes' and 'Retweets' and posting time across Twitter, Instagram, and Facebook. The diagonal density plots show right-skewed distributions for both 'Likes' and 'Retweets': most posts receive modest engagement, while a small fraction achieves much higher interaction, a pattern most pronounced on Instagram. The off-diagonal scatterplots show a strong positive correlation between 'Likes' and 'Retweets' on all platforms, meaning that posts which attract likes also tend to be widely shared; the two metrics reinforce each other. The color-coded points show that Instagram in particular stands out for 'Likes', suggesting deeper user engagement than on Twitter and Facebook. However, the time variables plotted against the engagement metrics do not show a clear relationship with posting time, which suggests that social media interactions depend on many factors and would benefit from more granular analysis. Taken together, these multivariate insights support tailoring content strategies to each platform and exploiting Instagram's higher engagement potential, so that content is optimized and engagement maximized for a strong social media presence.
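The "strong positive correlation" in the off-diagonal panels can be quantified with Pearson's r, both overall and within each platform. The values below are hypothetical and chosen only to illustrate the computation.

```python
import pandas as pd

# Hypothetical engagement data for two platforms.
data = pd.DataFrame({
    "Platform": ["Instagram"] * 4 + ["Twitter"] * 4,
    "Likes":    [20, 40, 60, 80, 10, 20, 30, 40],
    "Retweets": [10, 20, 30, 40, 6, 10, 14, 20],
})

# Pearson correlation over all posts, then within each platform.
overall_corr = data["Likes"].corr(data["Retweets"])
per_platform = {
    name: group["Likes"].corr(group["Retweets"])
    for name, group in data.groupby("Platform")
}
print(round(overall_corr, 3), {k: round(v, 3) for k, v in per_platform.items()})
```

A pairplot like Figure 21 could be drawn with seaborn.pairplot(data, hue="Platform"), which requires seaborn.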

Figure 22: Multivariate chart “Percentages of Platforms”

Figure 22 presents a pie chart of the distribution of social media platform usage in the dataset. Instagram accounts for 35.2% of usage, followed closely by Twitter at 33.2% and Facebook at 31.6%. This nearly even distribution suggests a diverse social media landscape in which user attention is not heavily skewed toward a single platform.
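Pie-chart percentages like these are the normalized frequencies of the Platform column. A sketch on 20 hypothetical posts:

```python
import pandas as pd

# Hypothetical platform label for each of 20 posts.
platforms = pd.Series(["Instagram"] * 8 + ["Twitter"] * 7 + ["Facebook"] * 5)

# normalize=True returns proportions; multiply by 100 for percentages.
shares = platforms.value_counts(normalize=True).mul(100).round(1)
print(shares)
```

With matplotlib installed, shares.plot(kind="pie", autopct="%.1f%%") would draw the chart.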

The multivariate nature of this analysis stems from the comparison between platforms, which gives a visual representation of their respective shares within the dataset. The close percentages indicate that, while Instagram leads slightly, all three platforms maintain a significant presence in the user base.

It is also noteworthy that the percentages fall within a narrow range, suggesting that any shift in user behavior or platform features could change which platform leads. This reflects the competitive nature of social media platforms and the need for businesses to maintain a versatile, adaptive online presence.

Businesses looking to maximize their social media impact can draw strategic insights from this distribution. Given the similar share of each platform, a balanced approach may be advisable. However, Instagram's slight edge could be leveraged for engagement-focused campaigns, given its visual nature and the engagement patterns identified in the previous analyses.

In essence, Figure 22 provides a multivariate snapshot of platform usage, highlighting the need for a dynamic, well-informed approach to social media marketing given the relatively even distribution of user engagement across platforms.

RECOMMENDATIONS

According to our results, Instagram is the most popular social media platform in the dataset.

As a result, we offer the following advice to businesses:

• Optimize Instagram posting times: Based on post timing data, focus on posting when users are most likely to notice and engage with your content. Publishing compelling material at these times can help boost engagement.

• Curate content to increase engagement: Incorporate emotions and messages that evoke empathy and compassion into your content strategy. Use stories, photos, or videos that make users feel emotionally connected and inspired. This can boost engagement and foster a positive community around the company.

• Focus your marketing strategy on Instagram: Since Instagram was identified as the most engaging platform, consider investing more effort in content and marketing strategies for it. This includes improving account management, producing original material, and actively engaging with the community.

• Optimize by geography and customize content: Use geographic engagement data to tailor content and marketing tactics to the regions with the highest engagement. This might involve developing local content or advertising to increase engagement and brand exposure in those areas.

• Monitor and evaluate effectiveness: Regularly monitor and assess the recommended tactics to determine their effectiveness and any needed improvements. Use community feedback and data to continually refine your social media marketing plan.

These recommendations should be implemented pragmatically and adapted to the specific business and market context.

CONCLUSIONS

SUMMARIZE THE PROJECT

Social media analytics show Instagram standing out as the most engaging platform, a crucial consideration in digital marketing. Strategic timing of posts is vital for maximizing user engagement, since there are specific windows in which posts gain more visibility and interaction. This underscores the importance of platform selection and optimized posting times for improving sales and marketing strategies.

Geographic targeting is another key finding: the United States shows high engagement, suggesting that marketers focus their efforts on this region for greater impact. In addition, content with emotional resonance, especially content that evokes empathy and compassion, appears to be associated with higher user engagement, which supports a content strategy that prioritizes emotional connection.

LIMITATIONS

However, the study's insights come with limitations that merit consideration. The analysis, focused on a select group of social media platforms, may not capture the full spectrum of engagement across the broader social media landscape. Future research should therefore include a more diverse range of platforms for a comprehensive understanding.

The temporal nature of the data is another limitation, as social media trends are notoriously fluid and evolve rapidly. The insights represent a snapshot in time and may not hold as user behavior and platform algorithms change. Furthermore, because the study relies on quantitative analysis, the nuanced dynamics of user engagement, such as content quality, sentiment, and the underlying reasons for user interactions, remain largely unexplored. These qualitative aspects are crucial for a deeper understanding of social media engagement and should be incorporated into future studies.

Lastly, the findings may not generalize across industries and business models. Each industry and target audience has unique characteristics, so the insights may need to be adapted to specific contexts, which highlights the importance of tailored analyses.

To address these limitations and build on the current findings, future research should expand the scope of analysis to a wider range of social media platforms and data types. Incorporating real-time data and qualitative analysis would also make the insights more robust and applicable, keeping them relevant in the fast-paced world of social media.

Date posted: 08/10/2024, 16:41
