
DOCUMENT INFORMATION

Basic information

Title: Unit 14: Business Intelligence
Author: Truong Quoc Kiet
Supervisor: Pham Quoc Trung
School: Btec
Major: Computing
Type: assignment
Year of publication: 2023
Pages: 84
File size: 6.56 MB

Structure

  • 2. Tableau
  • B. Design a business intelligence tool, application or interface that can perform a specific task to
    • I. Python
    • II. Tableau
  • C. Discuss how business intelligence tools can contribute to effective decision-making. (P5)
    • I. Histogram
      • 1. Analyze
      • 2. Evaluate how this tool helps
    • II. Boxplot chart
    • III. Bar Chart
    • IV. Correlation Matrix
  • D. Explore the legal issues involved in the secure exploitation of business intelligence tools. (P6)
    • I. Legal issues
      • 1. Data Privacy
      • 2. Intellectual Property
      • 3. Contractual Obligations
    • II. Security issues
      • 1. Data handling and storage
      • 2. Cybersecurity Threats
      • 3. Bias, Fairness & Discrimination
    • III. Business Intelligence Tools
      • 1. Spyder
    • IV. Conclusion
  • E. References

Content

Definition: Business Intelligence (BI) refers to the technology-driven process of collecting, analyzing, and presenting business data to support decision-making within an organization. Key…

Tableau

Tableau empowers organizations to unlock the potential of their data, transforming raw data into actionable intelligence. Its interactive dashboards, reports, and charts provide a comprehensive view of complex datasets. This enables businesses to make data-driven decisions, leading to greater efficiency, better decisions, and improved outcomes.

Tableau, with its robust set of features, stands out as a leading data visualization and business intelligence platform. Its intuitive visualization capabilities let users build visually rich, interactive dashboards without extensive coding or technical expertise. The platform's strength lies in its connectivity: it integrates with a wide range of data sources, including databases, spreadsheets, and cloud-based repositories. Tableau's real-time data analysis lets users gain quick insights into dynamic trends, while drag-and-drop functionality simplifies the creation of charts, graphs, and maps. The ability to blend data from various sources ensures a comprehensive…

Our first step is a histogram that visualizes the distribution of AI/machine learning adoption across industries. To do this, we tally the occurrences of each industry in column Q24, which records the respondent's current industry. Seaborn generates the histogram from the values in Q24, and matplotlib is then used to replace the x-axis label with something more informative than "Q24". We also rotate the industry names by 90 degrees for…
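A minimal sketch of this step, assuming the survey is already loaded into a pandas DataFrame `df` with the respondent's industry in column `Q24`. The original draws the chart with seaborn (its `countplot` would give an equivalent bar-style histogram of a categorical column); to keep dependencies small, this sketch tallies the categories with pandas and draws the bars with matplotlib directly. The function names, the axis label text, and the toy rows are illustrative assumptions, not the author's code.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render without a display
import matplotlib.pyplot as plt
import pandas as pd


def industry_counts(df: pd.DataFrame, column: str = "Q24") -> pd.Series:
    """Tally respondents per industry, most frequent first (missing answers dropped)."""
    return df[column].dropna().value_counts()


def plot_industry_histogram(counts: pd.Series):
    """Draw one bar per industry with a descriptive x-axis label and rotated names."""
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.bar(counts.index.tolist(), counts.to_numpy())
    ax.set_xlabel("Current industry of respondent")  # more context than the raw "Q24"
    ax.set_ylabel("Number of respondents")
    ax.tick_params(axis="x", labelrotation=90)  # rotate industry names by 90 degrees
    fig.tight_layout()
    return fig, ax


# Made-up rows standing in for the cleaned survey data
df = pd.DataFrame({"Q24": ["Computer/Technology", "Academics/Education",
                           "Computer/Technology", "Shipping/Transportation", None]})
counts = industry_counts(df)
fig, ax = plot_industry_histogram(counts)
```

Counting first and plotting second keeps the tallies available as a Series, so the same numbers can be checked against the Tableau version of the chart.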

Continuing our analysis of compensation across field-related jobs, we construct a box plot. The yearly compensation column (Q29), which we cleaned earlier, serves as the y-axis, and the respondents' job titles (Q23) form the x-axis. We generate the plot with seaborn and matplotlib and adjust its appearance and details. This lets us visually compare the distribution of yearly compensation across job roles in the field.

16 GCD210125 –Figure 21 Code for boxplot
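The box plot step can be sketched as follows, assuming `Q29` already holds numeric yearly compensation after the earlier cleaning (the cleaning rules themselves are not reproduced here) and `Q23` holds the job title. It uses matplotlib's `Axes.boxplot` in place of seaborn; the helper names and toy rows are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render without a display
import matplotlib.pyplot as plt
import pandas as pd


def salary_by_role(df: pd.DataFrame, role_col: str = "Q23", pay_col: str = "Q29") -> dict:
    """Collect the numeric yearly compensation values for each job title."""
    clean = df[[role_col, pay_col]].dropna()
    # groupby sorts job titles alphabetically, so boxes appear in that order
    return {role: group[pay_col].to_numpy() for role, group in clean.groupby(role_col)}


def plot_salary_boxplot(groups: dict):
    """One box per job title; whiskers follow the 1.5 x IQR rule (matplotlib's default)."""
    labels = list(groups)
    fig, ax = plt.subplots(figsize=(12, 6))
    ax.boxplot([groups[role] for role in labels], whis=1.5)
    ax.set_xticks(range(1, len(labels) + 1))  # boxes sit at x = 1..n by default
    ax.set_xticklabels(labels, rotation=90)
    ax.set_xlabel("Job title")
    ax.set_ylabel("Yearly compensation (USD)")
    fig.tight_layout()
    return fig, ax


# Made-up rows; Q29 is assumed to be numeric already
df = pd.DataFrame({
    "Q23": ["Data Scientist", "Data Scientist", "Statistician", "Data Architect", None],
    "Q29": [40000, 50000, 4000, 65000, 30000],
})
groups = salary_by_role(df)
fig, ax = plot_salary_boxplot(groups)
```

Grouping first gives one array per job title, so each box summarizes exactly one role; rows with a missing title are dropped rather than plotted.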

To explore the correlations among programming languages (Python, R, SQL, C) and their relationship with age, we first create a new dataframe containing only the age column (Q2) and the four language columns (Q12_1, Q12_2, Q12_3, Q12_4). Because these columns are categorical, we apply pandas' factorize function to convert each one into numeric codes, which makes the pandas corr() function usable. Once the correlations are computed, we visualize the matrix with seaborn's heatmap function and use matplotlib to adjust the labels for clarity. This analysis lets us examine the relationships between age and proficiency in the different programming languages.

Figure 23 Code for correlation matrix

Figure 24 Correlation Matrix in Python
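A sketch of the factorize-then-correlate approach described above, assuming the columns named in the text (`Q2`, `Q12_1` to `Q12_4`) are string categories with missing values where a language is not used. `pd.factorize` maps each distinct value to an integer and missing values to -1, and `DataFrame.corr()` then computes Pearson coefficients on those codes. With pandas' default `sort=False`, codes follow the order in which values first appear, so the sign of a coefficient involving a multi-category column such as age can depend on row order; this sketch passes `sort=True` for a reproducible ordering, which is an assumption rather than the author's setting. The heatmap step is omitted; the returned frame could be passed to seaborn's `heatmap` as in the original.

```python
import pandas as pd


def factorized_correlation(df: pd.DataFrame, columns: list, sort_codes: bool = True) -> pd.DataFrame:
    """Encode each categorical column as integer codes, then compute Pearson correlations."""
    encoded = pd.DataFrame(
        # pd.factorize returns (codes, uniques); missing values get the code -1
        {col: pd.factorize(df[col], sort=sort_codes)[0] for col in columns},
        index=df.index,
    )
    return encoded.corr()  # Pearson by default


# Made-up rows: an age bracket plus two "language used" columns (None = not used)
df = pd.DataFrame({
    "Q2": ["18-21", "22-24", "18-21", "25-29", "22-24", "25-29"],
    "Q12_1": ["Python", "Python", None, "Python", None, "Python"],
    "Q12_4": [None, "C", None, None, "C", "C"],
})
corr = factorized_correlation(df, ["Q2", "Q12_1", "Q12_4"])
```

For a single-language column, the codes reduce to 0 for users and -1 for non-users, so correlations between two language columns measure whether the languages tend to be used together.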

To analyze language preferences across education levels, we created four dataframes, each containing the education level column and one language column. Null values were removed from each language column so that each chart counts only the users of that language. We then generated four bar charts with seaborn showing how many Python, R, SQL, and C users fall into each education level. The charts reveal language preferences by education level within the surveyed dataset.

Figure 25 Code for bar chart

Figure 26 Barchart in Python - C user

Figure 27 Barchart in Python - R user

Figure 28 Barchart in Python - SQL user

Figure 29 Barchart in Python - Python user
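The four-dataframe approach can be sketched like this, assuming `Q8` holds the education level (as in the Tableau bar chart) and `Q12_1` to `Q12_4` hold the language answers with nulls for non-users. Dropping nulls per language column means each chart counts only that language's users. The helper names, chart titles, and toy rows are hypothetical, and matplotlib stands in for seaborn.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render without a display
import matplotlib.pyplot as plt
import pandas as pd


def education_counts(df, edu_col="Q8", language_cols=("Q12_1", "Q12_2", "Q12_3", "Q12_4")):
    """For each language column, count education levels among respondents who use it."""
    result = {}
    for col in language_cols:
        # One small frame per language; rows with a null language answer are dropped
        users = df[[edu_col, col]].dropna(subset=[col])
        result[col] = users[edu_col].value_counts()
    return result


def plot_each(counts_by_language):
    """Draw one separate bar chart per language, mirroring the four-chart approach."""
    figures = {}
    for col, counts in counts_by_language.items():
        fig, ax = plt.subplots()
        ax.bar(counts.index.tolist(), counts.to_numpy())
        ax.set_title(f"Education level of {col} users")
        ax.tick_params(axis="x", labelrotation=45)
        fig.tight_layout()
        figures[col] = fig
    return figures


# Made-up rows standing in for the survey
df = pd.DataFrame({
    "Q8": ["Master's degree", "Bachelor's degree", "Master's degree", "Doctoral degree"],
    "Q12_1": ["Python", "Python", "Python", None],
    "Q12_2": ["SQL", None, "SQL", None],
    "Q12_3": [None, None, "R", "R"],
    "Q12_4": [None, "C", None, None],
})
counts = education_counts(df)
figures = plot_each(counts)
```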

For convenience, since we need the dataset again in another tool, we save the filtered dataframe with the altered Q29 column to a new CSV file.

Figure 30 Save into new file
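Saving the cleaned frame is a single pandas call. A minimal sketch, where the file name `cleaned_survey.csv`, the helper name, and the toy rows are placeholders:

```python
import os
import tempfile

import pandas as pd


def save_cleaned(df: pd.DataFrame, path: str) -> str:
    """Write the cleaned frame to CSV; index=False avoids an extra unnamed index column."""
    df.to_csv(path, index=False)
    return path


# Made-up cleaned rows; written to a temporary folder for the example
cleaned = pd.DataFrame({"Q23": ["Data Scientist", "Statistician"], "Q29": [50000, 5000]})
path = save_cleaned(cleaned, os.path.join(tempfile.mkdtemp(), "cleaned_survey.csv"))
reloaded = pd.read_csv(path)  # the same file can be opened as a text data source in Tableau
```

Passing `index=False` keeps pandas' row index out of the file, so the next tool sees only the survey columns.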


Design a business intelligence tool, application or interface that can perform a specific task to

Tableau

The cleaned dataset is ready for use in Tableau. Because we reuse the CSV file cleaned earlier with Python, no additional data processing is needed.

Figure 31 Histogram in Tableau

For the histogram, we placed Q24 as a dimension on the Columns shelf and the record count of the whole file as the measure on the Rows shelf.

Figure 32 Boxplot in Tableau

For the box plot, we placed Q23 on the Columns shelf and the altered Q29 on the Rows shelf. Rather than letting Tableau aggregate Q29 as a Count by default, we converted Q29 to a dimension so that the individual values, not a single aggregate, are plotted.

Figure 33 Barchart in Tableau

For the bar chart, Tableau did not require four separate sheets with four charts. Instead, we placed the language columns, aggregated with Count, as measures on the Rows shelf and Q8 as a dimension on the Columns shelf.

Discuss how business intelligence tools can contribute to effective decision-making (P5)

Histogram

Figure 34 Histogram made by using Python (Spyder)

Figure 35 Histogram made by using Tableau

The histograms generated with Python and Tableau give a comprehensive view of how far various industries have embraced Machine Learning techniques, measured by the number of respondents from each industry in column Q24. A visual examination reveals that the Computer/Technology industry clearly dominates, with a far higher count than any other sector. The Academics/Education sector ranks second, and both stand well above the remaining sectors. Notably, the largest adopter of AI/machine learning has nearly twice the count of the second-largest. At the other end of the spectrum, the Shipping/Transportation industry shows the least adoption, closely followed by the Broadcasting/Communication sector.
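The comparison between the largest and second-largest industries is simple arithmetic on the tallied counts. The sketch below shows how it could be checked; the helper name is hypothetical and the counts are made up for illustration, not the survey's real figures.

```python
import pandas as pd


def adoption_summary(counts: pd.Series) -> dict:
    """Identify the top two and bottom industries and how far ahead the leader is."""
    ordered = counts.sort_values(ascending=False)
    return {
        "largest": ordered.index[0],
        "second": ordered.index[1],
        "ratio_top_two": ordered.iloc[0] / ordered.iloc[1],  # "nearly twice" means close to 2
        "least": ordered.index[-1],
    }


# Made-up counts, NOT the survey's real figures
counts = pd.Series({
    "Computer/Technology": 100,
    "Academics/Education": 52,
    "Broadcasting/Communication": 8,
    "Shipping/Transportation": 5,
})
summary = adoption_summary(counts)
```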

2. Evaluate how this tool helps

Python played a crucial role in efficiently filtering and cleaning the dataset, a task that is less straightforward in Tableau, and it made it quick to create a histogram. Tableau's strength came into play when we used it to validate the analysis and visualization produced in Python. Tableau provided a more detailed, interactive histogram that allowed closer examination of specific bins. This interactivity was particularly useful, offering insights that would have been hard to obtain with Python code alone given our time and skill constraints. Together, Python and Tableau formed a complementary toolkit, combining Python's data processing capabilities with Tableau's advanced visualization features for a comprehensive, cross-validated analysis.

Boxplot chart

Figure 36 Boxplot chart made by using Tableau

Figure 37 Boxplot chart made by using Python (Spyder)

Analyzing the two charts reveals a distinct disparity in average salaries across positions. Positions that work directly with data, such as data scientists, data architects, and research scientists, command notably higher average salaries than other roles: data architects earn around $65,000 annually, while data scientists average about $45,000. Roles such as Data Analyst, Engineer (Non-Software), and Data Administrator have much lower average salaries of around $8,700 per year, and statisticians have the lowest, at just over $4,000. Although most positions earn below $200,000 annually, some outliers earn over $700,000. Because a handful of such extreme values can distort comparisons, the focus should remain on the average salary trends across positions.

The analysis shows that positions in the management team and those directly involved in working with data demand the most attention. These roles are crucial because of their substantial workload and the precision they bring to the company's information and data analysis. The accuracy of information and data is directly linked to the advantages a company can gain, which justifies the higher salaries for these positions. Investing in competitive salaries for these roles is a strategic move, since competitors are likely to do the same to attract top talent. In a fiercely competitive market, gaining an edge through precise data analysis is a worthwhile investment.

Careful and meticulous data processing is essential for maintaining a clear understanding of the market and staying ahead in it. Without a clear strategic direction, a company may lose momentum relative to its competitors and have to spend extra effort adapting its operating systems. Recruiting skilled people in management and data-related fields should therefore be a top priority, so that the company remains competitive, agile, and well positioned in a dynamic business landscape.

2. Evaluate how this tool helps

The survey drew participation from thousands of individuals, so visualization tools like Tableau and Python were essential for extracting information. Without them, filtering and analyzing the data manually would take weeks and be prone to errors. These tools drastically reduce the time and personnel required: work that would otherwise take hundreds of hours of processing and checking can be completed in a few hours.

These tools can replace much of the manual statistical effort that would otherwise require dozens of people collaborating on the analysis. The visualizations generated by Tableau and Python give an intuitive, clear understanding of the data. The box plots' depiction of salary ranges, including the highest salaries, the central values, and the overall spread, is particularly insightful. Tableau goes a step further by precisely displaying the upper hinge and whisker points, along with an accurate representation of the lower end. Individual data points are marked by large, easily discernible dots, which makes it easy to judge where the data is concentrated.
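The hinges, whiskers, and outlier dots described above follow a standard rule, which can be sketched as below. This assumes the common 1.5 × IQR whisker convention, which is matplotlib's default (`whis=1.5`) and, to our understanding, Tableau's default box plot setting as well; tools may also compute quartiles with slightly different interpolation, so exact hinge values can differ between Python and Tableau. The helper name and sample values are illustrative.

```python
import numpy as np


def box_plot_stats(values, whis=1.5):
    """Quartiles, 1.5 x IQR whiskers, and outliers, as drawn in a standard box plot."""
    data = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(data, [25, 50, 75])  # linear interpolation (NumPy default)
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = data[(data >= low_fence) & (data <= high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # whiskers end at the most extreme data points still inside the fences
        "lower_whisker": inside.min(),
        "upper_whisker": inside.max(),
        "outliers": data[(data < low_fence) | (data > high_fence)].tolist(),
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

For the sample list, the upper fence is 7.75 + 1.5 × 4.5 = 14.5, so 100 is reported as an outlier while the upper whisker stops at 9.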

In summary, the effectiveness of these tools is clear. They are indispensable assets for analysts, analytics experts, and companies dealing with massive volumes of data. Tableau and Python not only streamline the analytical process but also improve accuracy and provide a visual clarity that traditional manual approaches cannot match.

Bar Chart

Figure 38 Barchart in Tableau

Figure 39 Barchart in Python - C user

Figure 40 Barchart in Python - R user

Figure 41 Barchart in Python - SQL user

Figure 42 Barchart in Python - Python user

The presented charts show how users of Python, SQL, R, and C are distributed across education levels. A bar chart effectively communicates the number of users of each language within each educational bracket. The key insights from the analysis are:

Educational Distribution:
  • A significant portion of users across all four languages hold advanced educational qualifications, primarily university degrees or higher.
  • Individuals with master's degrees consistently represent the highest count for each language, indicating a strong presence of users with advanced academic backgrounds.

Usage Patterns Across Languages:
  • C Language: Use of C appears relatively limited, with its highest count reaching just 364 users. This may be attributed to the complexity of C as a low-level language.
  • Python & SQL: Python and SQL are especially prevalent among users with master's degrees, with 3,264 Python users and 2,077 SQL users. This suggests that these languages are widely taught and applied in academic and professional settings, especially in data science.

Overall Language Preference:
  • Python emerges as the predominant choice across most educational tiers, followed by R, SQL, and C, each with a notably smaller user count.
  • The strong preference for Python may reflect its versatility and popularity in diverse domains, including data science and general programming.

In summary, the charts offer valuable insights into how users of different programming languages are distributed across educational backgrounds. These observations can inform educational and professional strategies for programming language training and use.

2. Evaluate how this tool helps

The Spyder charts do not label the user counts, which makes it hard to read precise numbers for each educational category. These charts also do not include demographic variables such as age or gender, so no definitive conclusions can be drawn about the age or gender distribution of users with higher educational backgrounds.
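The missing count labels can be added in Python with a few extra lines, which illustrates the kind of additional code the comparison with Tableau refers to. A minimal sketch using matplotlib's `Axes.bar_label` (available since matplotlib 3.4); the helper name, counts, and title are made up.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render without a display
import matplotlib.pyplot as plt
import pandas as pd


def labelled_bar_chart(counts: pd.Series, title: str):
    """Bar chart that prints each bar's exact count above it."""
    fig, ax = plt.subplots()
    bars = ax.bar(counts.index.tolist(), counts.to_numpy())
    labels = ax.bar_label(bars)  # one text label per bar, default format "%g"
    ax.set_title(title)
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    return fig, ax, labels


# Made-up counts for one language
counts = pd.Series({"Master's degree": 300, "Bachelor's degree": 120})
fig, ax, labels = labelled_bar_chart(counts, "Education level of C users")
```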

From a visual analysis, the advantages of Tableau over Spyder in terms of data presentation and manipulation become evident:

Unified Chart Presentation:
  • Tableau can consolidate data from multiple columns into a single chart, which streamlines comparison.
  • In our Spyder workflow, the data was split across multiple charts, requiring frequent toggling between visuals, which is less efficient.

Interactive Data Insights:
  • Tableau offers greater interactivity: hovering over a bar reveals detailed data, often including information from other columns in the same chart.
  • Spyder's visuals are static, so users must refer back to the original data for precise details. Although Spyder provides finer interval divisions, reproducing Tableau's detailed views requires additional code, which lengthens the process.

Ease of Visualization Customization:
  • Tableau simplifies customization through intuitive controls that let users modify colors and shapes and sort data with little effort.
  • Customization in Spyder is code-driven. Although this offers flexibility, matching Tableau's visual polish requires additional libraries and lines of code, which can be more time-consuming.

In summary, while both Tableau and Spyder offer visualization capabilities, Tableau's user-friendly interface and interactive features often provide a more efficient and engaging experience, particularly for users seeking rapid insights and easy customization. Its unified chart presentation, interactive data insights, and simple customization make Tableau the preferable choice for efficient and effective data analysis.

Correlation Matrix

Figure 43 Correlation Matrix made by using Python (Spyder) (Note: Q2 = Age; Q12_1 = Python; Q12_2 = SQL; Q12_3 = R and Q12_4 = C)

Based on the observations from the correlation matrix, several conclusions and insights can be drawn:

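Senior Programmers and Language Choice: Age (Q2) shows its weakest negative correlation with C (-0.28), followed by SQL (-0.33) and Python (-0.36), and its strongest with R (-0.37). This may indicate that senior programmers favor C, SQL, and Python over R; the preference for C could reflect its suitability for experienced professionals working on embedded systems and performance-intensive applications.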

Preference for Python in Working with SQL Data: Those proficient in SQL tend to favor Python the most (-0.28), followed by C and R (-0.013 and -0.32). This preference may stem from Python's robust libraries and tools for SQL data processing, which allow flexible integration between database querying and data analysis.

Python Users Tend to Avoid C: Users of Python show the least preference for the C programming language (-0.15). This may reflect the convenience and flexibility of Python for application development and data processing compared with C, which requires more code and manual memory management.

In conclusion, the correlation matrix suggests that programming language usage may reflect factors such as age, experience, and job role. This information can help shape training and skill development strategies within an organization and tailor the use of programming languages to specific user groups.

2. Evaluate how this tool helps

When creating the correlation matrix, we had difficulty visualizing it in Tableau but generated it successfully in Python. Python excels at this kind of data processing, whereas producing similar results in Tableau proved challenging. However, Python's output falls short of Tableau's in aesthetics: the variables in the matrix still carry their column codes (such as Q2 and Q12_1) rather than descriptive names, which is a significant drawback for readability.
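One way to address this drawback is to rename the matrix's row and column labels before plotting. A minimal sketch using `DataFrame.rename`, with the mapping taken from the note under Figure 43; the helper name and toy codes are made up:

```python
import pandas as pd

# Descriptive names taken from the note under Figure 43
READABLE_NAMES = {"Q2": "Age", "Q12_1": "Python", "Q12_2": "SQL", "Q12_3": "R", "Q12_4": "C"}


def readable_correlation(corr: pd.DataFrame) -> pd.DataFrame:
    """Replace survey column codes with descriptive names on both axes of the matrix."""
    return corr.rename(index=READABLE_NAMES, columns=READABLE_NAMES)


# Made-up numeric codes standing in for the factorized survey columns
encoded = pd.DataFrame({
    "Q2": [0, 1, 2, 0, 1],
    "Q12_1": [0, -1, 0, 0, -1],
    "Q12_4": [-1, 0, -1, 0, 0],
})
labelled = readable_correlation(encoded.corr())
```

Because seaborn's `heatmap` takes its tick labels from a DataFrame's index and columns, plotting the renamed frame shows the descriptive names.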

Explore the legal issues involved in the secure exploitation of business intelligence tools (P6)

Legal issues

Navigating the legal landscape of data use, businesses face stringent regulatory frameworks such as the GDPR in Europe and the CCPA in the U.S. These regulations impose rigorous standards on the collection, storage, and processing of data. Compliance requires businesses to obtain valid user consent where required, maintain transparent data handling practices, and honor the right to data erasure. Beyond basic compliance, businesses must follow principles such as data minimization and purpose limitation, collecting only the information strictly necessary for the intended purpose. Over-collecting data or repurposing it without proper consent violates these regulations and can result in severe penalties, underscoring the importance of responsible, lawful data management in today's data-centric business landscape.

In the rapidly evolving landscape of artificial intelligence (AI), safeguarding proprietary algorithms has become imperative for companies that rely on sophisticated AI solutions. These algorithms are valuable assets, and companies often secure ownership rights through legal instruments such as patents, copyrights, or trade secrets. Establishing clear ownership not only protects intellectual property but also preserves a competitive edge. In addition, as businesses increasingly integrate third-party tools or engage in collaborative ventures, careful attention to licensing terms is essential. Misuse or unauthorized distribution of licensed software can lead to legal disputes, so licensing agreements and collaboration terms must be followed meticulously. In the dynamic, interconnected realm of AI, proactive legal measures are crucial to foster innovation, protect assets, and maintain ethical business practices.

In business operations, whether licensing agreements for business intelligence tools or contracts with clients, clarity of agreement is paramount. Clearly defined terms, conditions, and obligations help avoid misunderstandings, disputes, and financial repercussions later. Ambiguous or vague contracts can cause confusion and hinder the smooth execution of business relationships. Termination clauses are equally important and should state unambiguously the conditions under which an agreement can be ended. These clauses protect the interests of both parties, provide a clear framework for handling unforeseen circumstances, and lay the foundation for a fair resolution if the agreement is terminated. In the complex landscape of business agreements, precise and transparent contractual language is key to successful, mutually beneficial partnerships.

Security issues

Securing AI systems involves encrypting data in transit and at rest to protect sensitive information. Access controls restrict data to authorized personnel, preventing internal breaches and unauthorized access and keeping data handling and storage secure.
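Access control of this kind is often expressed as a role-to-permission mapping with a deny-by-default rule. The sketch below is a hypothetical illustration only: the role names, the column-level permissions, and the helper are assumptions, and in practice such rules are usually enforced by the database or BI platform itself rather than in analysis code.

```python
# Hypothetical roles and the survey columns each may read
ROLE_PERMISSIONS = {
    "analyst": {"Q8", "Q23", "Q24"},
    "hr_manager": {"Q8", "Q23", "Q24", "Q29"},  # only this role may see compensation
}


def authorized_view(record: dict, role: str) -> dict:
    """Return only the fields the role may read; unknown roles are denied outright."""
    allowed = ROLE_PERMISSIONS.get(role)
    if allowed is None:
        raise PermissionError(f"role {role!r} is not authorized")  # deny by default
    return {field: value for field, value in record.items() if field in allowed}


# Made-up survey record
row = {"Q8": "Master's degree", "Q23": "Data Scientist", "Q24": "Computer/Technology", "Q29": 50000}
analyst_view = authorized_view(row, "analyst")
```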

Cybersecurity threats to AI systems call for both vulnerability management and defense against Advanced Persistent Threats (APTs). Vulnerabilities in AI systems must be continuously monitored, identified, and patched, and regular security audits help address weaknesses proactively. AI systems are also susceptible to APTs: prolonged, targeted cyber-attacks that infiltrate systems to extract data or cause damage over an extended period. The interconnected nature of AI systems makes them prime targets for such threats.

Bias, fairness, and discrimination in AI systems can be addressed through several strategies. First, the integrity of training data is crucial, because biases often originate there; ensuring that training data is diverse and representative helps mitigate discriminatory outcomes. Second, regular algorithmic audits are essential for identifying biases or unintended discriminatory outcomes. These audits involve applying fairness metrics and monitoring systems after deployment to detect and correct issues as they arise. Together, these measures promote fairness and reduce discrimination in AI systems.
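As one example of a fairness metric, the demographic parity difference compares the rate of positive outcomes between groups. A minimal sketch in plain Python; the helper names, outcomes, and group labels are made up, and this is only one of several fairness metrics an audit might use.

```python
def selection_rates(outcomes, groups):
    """Share of positive outcomes (1s) within each group."""
    totals, positives = {}, {}
    for outcome, group in zip(outcomes, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + outcome
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(outcomes, groups):
    """Gap between the highest and lowest group selection rates (0 means parity)."""
    rates = selection_rates(outcomes, groups)
    return max(rates.values()) - min(rates.values())


# Made-up model decisions (1 = approved) for two hypothetical groups
outcomes = [1, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "B", "B", "B"]
gap = demographic_parity_difference(outcomes, groups)
```

Tracking this gap after deployment is one way to put the post-deployment monitoring described above into practice.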

Business Intelligence Tools

Spyder (Scientific Python Development Environment) is an open-source integrated development environment (IDE) tailored for scientific computing in Python. It is widely used for data analysis, machine learning, and other AI-driven tasks because of its flexibility and comprehensive capabilities. On the legal side, Spyder is distributed under the MIT license, which permits users to use, modify, and distribute the software, including for commercial purposes, provided the copyright and permission notice is retained. Because Spyder is often used together with third-party Python libraries such as NumPy, pandas, and scikit-learn, it is crucial to review the licensing terms of each library, particularly when redistributing software or deploying solutions commercially. In addition, since Spyder is used for data analysis, businesses must ensure that any data processed or stored in the environment complies with data privacy regulations such as the GDPR or CCPA. Applying data anonymization techniques and encrypting data handled in Spyder are key to maintaining privacy and compliance.
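As one illustration of the privacy techniques mentioned above, direct identifiers can be replaced with keyed hashes before analysis. Strictly, this is pseudonymization rather than anonymization: whoever holds the key can recompute tokens for known identifiers and so re-link records to individuals, and under the GDPR pseudonymized data is still personal data. A minimal sketch using Python's standard `hmac` and `hashlib` modules; the helper name, identifier format, and key are placeholders:

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed HMAC-SHA256 token (hex string)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"replace-with-a-secret-from-a-key-vault"  # placeholder; never hard-code real keys
token = pseudonymize("respondent-00123", key)
```

Using a secret key rather than a plain hash prevents anyone without the key from recomputing tokens by hashing guessed identifiers, while the same identifier still maps to the same token, so records can be joined.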

Tableau is a prominent data visualization tool that enables businesses to transform raw data into interactive, understandable visualizations, dashboards, and reports. On the legal side, businesses that use Tableau are bound by Tableau's Terms of Service (ToS), which govern the contractual relationship. These terms set out acceptable uses, data handling practices, and limits on data storage and processing. Tableau may also have specific data usage policies, particularly for cloud-based services or public sharing of visualizations, so businesses need to understand how data is stored, accessed, and processed within Tableau's ecosystem. Tableau also regularly publishes security disclosures about potential vulnerabilities, and businesses must track these to apply updates and patches promptly and keep their environment secure. Finally, when exporting visualizations and reports, businesses must comply with data export regulations and use secure transmission protocols to protect data integrity.

Conclusion

In conclusion, as businesses increasingly adopt AI-powered solutions and advanced business intelligence tools, they must understand and address the legal and security risks these technologies bring. A comprehensive strategy that combines stringent compliance measures, strong security protocols, and ongoing education enables businesses to embrace AI's potential while minimizing risk. Collaborating with legal and cybersecurity experts and conducting regular risk assessments can further strengthen an organization's resilience in a rapidly evolving digital environment.

Posted: 06/05/2024, 14:59
