MIDTERM ESSAY: APPLIED PROBABILITY AND STATISTICS FOR INFORMATION TECHNOLOGY

OPENING

Statistics library in Python

1.1.1 Overview of the statistics library in Python

In today's world of big data and AI, data science and machine learning are crucial across many scientific and technological domains. An important skill in data management is the ability to describe, summarize, and visually represent data effectively. Python's statistics libraries are essential, popular, and widely used tools for data analysis.

The statistics module provides functions for calculating mathematical statistics of numeric (real-valued) data.

The module is aimed at users who need basic graphing and statistical functionality at the level of graphing and scientific calculators. It does not try to compete with comprehensive libraries such as NumPy and SciPy, or with professional statistical software such as Minitab, SAS, and MATLAB.

Descriptive statistics is about describing and summarizing data. It uses two main approaches:

- The quantitative approach describes and summarizes data numerically.

- The visual approach illustrates data with charts, plots, histograms, and other graphs.

1.1.2 Some functions in the statistics library

Averages and measures of central location:

- statistics.harmonic_mean(data, weights=None)

- statistics.median_grouped(data, interval=1)

- statistics.mode(data)

- statistics.multimode(data)

Measures of spread:

- statistics.pstdev(data, mu=None)

- statistics.pvariance(data, mu=None)

- statistics.stdev(data, xbar=None)

- statistics.variance(data, xbar=None)

Statistics for relations between two inputs:

- statistics.covariance(x, y, /)

- statistics.correlation(x, y, /)

- statistics.linear_regression(x, y, /, *, proportional=False)

1.1.2.1 statistics.mean(data)

- The mean() function calculates the mean (average) of a given list of numbers. It returns the mean of the data set passed as its parameter.

The arithmetic mean, a key statistical measure, is calculated by dividing the sum of a data set by the number of data points. It represents the central location of the values in the data set. In Python, the arithmetic mean can be computed by summing the numbers and dividing by how many numbers there are.

- Syntax: mean([data-set])

- Parameters: [data-set]: a list or tuple of numbers.

- Returns: the sample arithmetic mean of the provided data set.


- Exceptions: StatisticsError if the data set is empty; TypeError when anything other than numeric values is passed.
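A minimal sketch of the definition above, comparing the by-hand computation (sum divided by count) with statistics.mean(). The data values and the helper safe_mean() are hypothetical, added only for illustration.

```python
import statistics

data = [4, 8, 15, 16, 23, 42]

# By definition: the sum of the values divided by the number of values.
manual_mean = sum(data) / len(data)

# statistics.mean() computes the same quantity exactly.
library_mean = statistics.mean(data)

print(manual_mean, library_mean)


def safe_mean(values):
    """Hypothetical helper: return the mean, or None for an empty data set."""
    try:
        return statistics.mean(values)
    except statistics.StatisticsError:  # raised when there are no data points
        return None


print(safe_mean([]), safe_mean([1, 2]))
```

The manual version yields the float 18.0, while statistics.mean() returns the int 18 for this input, because the exact mean of these integers is a whole number.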

1.1.2.2 statistics.fmean(data)

The fmean() function converts the input data to floats and calculates the arithmetic mean (average) of the given sequence or iterable. It always returns a float.

The primary difference between mean() and fmean() lies in type conversion: fmean() converts the data to floats, while mean() does not. In addition, fmean() runs faster than mean(). The syntax is fmean([data-set]).

- Parameters: [data-set]: a list or tuple of numbers.

- Returns: the floating-point arithmetic mean of the provided data.
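To make the type difference concrete, this sketch (with made-up data) calls both functions on the same integers, then calls fmean() on float readings:

```python
import statistics

data = [4, 8, 15, 16, 23, 42]

exact = statistics.mean(data)   # int here, because the exact mean is a whole number
fast = statistics.fmean(data)   # always converted to float

print(exact, type(exact).__name__)
print(fast, type(fast).__name__)

# With float input, fmean() simply averages the floats.
readings = [3.5, 4.0, 5.25]
readings_mean = statistics.fmean(readings)
print(readings_mean)
```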


1.1.2.3 statistics.geometric_mean(data)

- geometric_mean() converts the data to floats and computes the geometric mean.

- The geometric mean indicates the central tendency or typical value of the data using the product of the values (as opposed to the arithmetic mean, which uses their sum).

- It raises a StatisticsError if the input data set is empty, if it contains a zero, or if it contains a negative value. The data may be a sequence or an iterable.

- No special effort is made to achieve exact results. (However, this may change in the future.)

- Parameters: [data-set]: a list or tuple of numbers.

- Returns: the geometric mean of the provided data set.
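A small sketch comparing the product-based definition with statistics.geometric_mean(). The data are illustrative, chosen so the answer is easy to check (54 × 24 × 36 = 46656 = 36³).

```python
import math
import statistics

data = [54, 24, 36]

# Definition: the n-th root of the product of the n values.
manual = math.prod(data) ** (1 / len(data))
library = statistics.geometric_mean(data)

# Both are 36 up to floating-point rounding in the last digits.
print(round(manual, 6), round(library, 6))

# Negative values are rejected with StatisticsError.
try:
    statistics.geometric_mean([-1, 2])
except statistics.StatisticsError as exc:
    print("StatisticsError:", exc)
```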

1.1.2.4 statistics.harmonic_mean(data, weights=None)

The harmonic mean, historically also called the subcontrary mean, is one of the Pythagorean means. It is particularly useful for averaging rates and is defined as the reciprocal of the arithmetic mean of the reciprocals of a set of observations: for $n$ observations $x_1, \dots, x_n$, $H = \dfrac{n}{\sum_{i=1}^{n} 1/x_i}$.

The harmonic mean can be computed in Python 3 with the harmonic_mean() function from the statistics module.

The harmonic_mean() function calculates the harmonic mean of a given set of real-valued numbers. It accepts the data as a list, tuple, or iterator and returns the harmonic mean, a useful statistical measure for data analysis.

- Errors and exceptions: StatisticsError when an empty data set is passed or when the data set contains negative values; TypeError when the data set contains non-numeric values.


1.1.2.5 statistics.median(data)

- The statistics.median() function calculates the median (middle value) of the given data set. It sorts the data in ascending order before calculating the median.

Python is widely used for data analysis and statistics, and Python 3 provides a statistics module that includes essential functions such as mean(), median(), and mode(). The median() function lets users calculate the median of an unsorted list directly, without sorting it first.

The median is the value that divides a data sample or probability distribution into two equal halves, effectively representing the middle point of the dataset.


The median is a key measure of central tendency in statistics and probability theory. Its main advantage over the mean is that it is much less affected by extreme values (outliers). In addition, the median is always either one of the data values or the average of the two middle data values, so it stays close to the actual data.

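- For an odd number of elements, the median is the middle element.

- For an even number of elements, the median is the mean of the two middle elements.

- With the data sorted as $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$, the median can be written as:

$$
\operatorname{median} =
\begin{cases}
x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd},\\[4pt]
\dfrac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right) & \text{if } n \text{ is even}.
\end{cases}
$$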

The syntax for calculating the median in Python is `median([data-set])`, where `[data-set]` can be a list, tuple, or any iterable of numeric values. The function returns the median (middle value) of the provided data set. A `StatisticsError` is raised if the data set is empty.
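The sketch below, using made-up numbers, checks the odd and even cases described above and shows how an extreme value pulls the mean but hardly moves the median:

```python
import statistics

odd = [7, 1, 5, 3, 9]   # sorted: 1 3 5 7 9
even = [8, 2, 6, 4]     # sorted: 2 4 6 8

odd_median = statistics.median(odd)    # the middle element
even_median = statistics.median(even)  # mean of the two middle elements, 4 and 6

# One extreme value drags the mean far away, but not the median.
skewed = [1, 3, 5, 7, 1000]
skewed_mean = statistics.mean(skewed)
skewed_median = statistics.median(skewed)

print(odd_median, even_median)
print(skewed_mean, skewed_median)
# median() works on a sorted copy, so the input lists are left unchanged.
print(odd)
```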

HISTOGRAM EQUALIZATION ALGORITHM

Histogram equalization algorithm

Histogram equalization is a fundamental image processing technique that enhances the global contrast of an image by modifying the distribution of pixel intensities in its histogram. This allows areas of low contrast to gain contrast in the resulting image. In image processing, a histogram counts how often each intensity (light) level occurs in an image.

Essentially, histogram equalization works by:

- Computing a histogram of image pixel intensities

- Evenly spreading out and distributing the most frequent pixel values (i.e., the ones with the largest counts in the histogram)

- Giving a linear trend to the cumulative distribution function (CDF)

The result of applying histogram equalization is an image with higher global contrast.
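The three steps above can be sketched in plain Python without OpenCV. This is a minimal sketch, assuming an 8-bit grayscale image stored as a list of rows of integers in 0..255, and using the common mapping round((cdf(v) - cdf_min) × 255 / (N - cdf_min)), where N is the number of pixels and cdf_min is the smallest non-zero CDF value. The function names histogram() and equalize() are illustrative and are not taken from the author's source code.

```python
from itertools import accumulate


def histogram(image, levels=256):
    """Count how many pixels have each intensity 0 .. levels-1."""
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    return counts


def equalize(image, levels=256):
    """Global histogram equalization of a grayscale image given as rows of ints."""
    cdf = list(accumulate(histogram(image, levels)))  # cumulative pixel counts
    total = cdf[-1]                                   # number of pixels
    cdf_min = next(c for c in cdf if c > 0)           # CDF at the darkest level present
    if total == cdf_min:
        # Only one intensity is present: there is nothing to spread out.
        return [list(row) for row in image]
    # Look-up table that stretches the CDF linearly onto 0 .. levels-1.
    lut = [
        max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
        for c in cdf
    ]
    return [[lut[value] for value in row] for row in image]


print(equalize([[52, 55], [61, 59]]))  # [[0, 85], [255, 170]]
```

For real images, OpenCV's cv2.equalizeHist() performs global histogram equalization on 8-bit single-channel images, so a hand-written version like this is mainly useful for understanding the steps.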

- We can further improve on histogram equalization by applying an algorithm called Contrast Limited Adaptive Histogram Equalization (CLAHE), which produces higher-quality output images.

- Apart from photographers using histogram equalization to correct under- or over-exposed images, one of the most widely used applications of histogram equalization is in the medical field.

Histogram equalization is commonly applied to X-ray and CT scans to enhance contrast, helping doctors and radiologists interpret the images more effectively and make more precise diagnoses.

- By the end of this section, you will be able to apply both basic histogram equalization and adaptive histogram equalization to images with OpenCV.

- The implementation relies mainly on the OpenCV library.

Example of the histogram equalization algorithm

- Applying histogram equalization starts by computing the histogram of pixel intensities in an input grayscale (single-channel) image:

Our histogram shows multiple peaks, meaning that a large number of pixels fall into a few specific bins. The goal of histogram equalization is to redistribute these pixels into the underrepresented bins, which increases the overall contrast of the image.

- Mathematically, this means that we are trying to give the cumulative distribution function (CDF) a linear trend:
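One common textbook way to write this mapping, assuming $L$ intensity levels $r_0, \dots, r_{L-1}$, $n$ pixels in total and $n_j$ pixels at level $r_j$, is:

```latex
p_r(r_j) = \frac{n_j}{n}, \qquad
s_k = T(r_k) = (L - 1) \sum_{j=0}^{k} p_r(r_j), \qquad k = 0, 1, \dots, L - 1
```

Since $s_k$ is a scaled cumulative sum of the input histogram, rounding $s_k$ to the nearest integer level gives an output whose histogram is approximately flat, that is, whose CDF is approximately linear.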


The image before and after applying histogram equalization can be seen in Figure 3:

- With adaptive histogram equalization, we divide the input image into an M x N grid. We then apply equalization to each cell of the grid, resulting in a higher-quality output image:
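A simplified sketch of the grid idea, under the assumption that each cell is equalized completely independently. Full CLAHE, as provided by OpenCV's cv2.createCLAHE(clipLimit=..., tileGridSize=...), additionally clips each cell's histogram and interpolates bilinearly between neighbouring cells to avoid visible tile borders; this sketch omits both. The functions equalize_cell() and grid_equalize() are illustrative names, and the image is assumed to be a list of rows of 8-bit integers.

```python
from itertools import accumulate


def equalize_cell(cell, levels=256):
    """Plain histogram equalization of one cell (a list of rows of ints)."""
    counts = [0] * levels
    for row in cell:
        for value in row:
            counts[value] += 1
    cdf = list(accumulate(counts))
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # flat cell: nothing to spread out
        return [list(row) for row in cell]
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min))) for c in cdf]
    return [[lut[value] for value in row] for row in cell]


def grid_equalize(image, m, n, levels=256):
    """Split the image into an m x n grid and equalize every cell independently."""
    height, width = len(image), len(image[0])
    if not (1 <= m <= height and 1 <= n <= width):
        raise ValueError("grid must fit inside the image")
    row_edges = [height * i // m for i in range(m + 1)]
    col_edges = [width * j // n for j in range(n + 1)]
    out = [list(row) for row in image]
    for i in range(m):
        r0, r1 = row_edges[i], row_edges[i + 1]
        for j in range(n):
            c0, c1 = col_edges[j], col_edges[j + 1]
            cell = [row[c0:c1] for row in image[r0:r1]]
            for dr, new_row in enumerate(equalize_cell(cell, levels)):
                out[r0 + dr][c0:c1] = new_row
    return out


image = [
    [10, 20, 100, 100],
    [30, 40, 100, 200],
]
print(grid_equalize(image, 1, 2))  # [[0, 85, 0, 0], [170, 255, 0, 255]]
```

Here the value 100 maps to 0 inside its own cell, whereas equalizing the whole image at once would map it to a much brighter level; this locality is what lets each region gain contrast on its own.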

- The downside is that adaptive histogram equalization is by definition more computationally complex (although, on modern hardware, both implementations are still quite fast).

- In this section, we performed both basic histogram equalization and adaptive histogram equalization with OpenCV.

- Basic histogram equalization aims to improve the global contrast of an image by “spreading out” the most frequently used pixel intensities.

My comments, analysis, and evaluation

IMPLEMENTATION

Implementation

Instructions for building and running my source code:

- Import the library and the image

- Build the function that calculates the histogram


- My equalize function

- The binary image

The chart shows the histograms of the binary image and of the equalized image.



Title	Midterm Essay
Author	Lâm Quang Huy
Advisor	Dr. Nguyễn Quốc Bình
University	Trường Đại Học Tôn Đức Thắng (Ton Duc Thang University)
Subject	Applied Probability and Statistics for Information Technology
Type	essay
Year	2022
City	Ho Chi Minh City
