FACULTY OF INFORMATION TECHNOLOGY
MID-TERM ESSAY FOR THE COURSE:
PROBABILITY AND STATISTICS
APPLIED TO INFORMATION TECHNOLOGY
MID-TERM ESSAY
Supervisor: Dr. NGUYỄN QUỐC BÌNH    Author: LÂM QUANG HUY
Cohort: K25
HO CHI MINH CITY, 2022
I would like to thank my teacher, Nguyễn Quốc Bình, for teaching me about programming applications of probability and statistics, and for guiding me through this mid-term essay.
WORK COMPLETED AT TON DUC THANG UNIVERSITY
I declare that this is my own research work, carried out under the scientific supervision of Dr. Nguyễn Văn A. The research content and results in this work are truthful and have not been published in any form before. The data in the tables used for analysis, comment, and evaluation were collected by the author from various sources, which are clearly listed in the references section.
In addition, the essay uses a number of comments, evaluations, and data from other authors and organizations, all of which are cited with their sources. If any fraud is detected, I take full responsibility for the content of my essay. Ton Duc Thang University is not involved in any copyright infringement caused by me in the course of this work (if any).
Ho Chi Minh City, 26 October 2022
Author (signature and full name)
Lâm Quang Huy
This essay summarizes the knowledge the student acquired up to the middle of semester 1. It applies the probability and statistics theory covered in the lectures, together with the Python programming methods taught in the lab sessions, to solve a number of problems. In particular, it covers the function groups of the statistics module in the Python library. The student carried out two parts: writing code for the histogram equalization algorithm for image processing, and writing the report (3 chapters). The essay ends with the list of references the student consulted.
WORK COMPLETED AT TON DUC THANG UNIVERSITY
CHAPTER 1 - OPENING
1.1 Statistics library in Python
1.1.1 Generality about the Statistics library in Python
1.1.2 Some functions related to the Statistics library
1.1.2.1 statistics.mean(data)
1.1.2.2 statistics.fmean(data)
1.1.2.3 statistics.geometric_mean(data)
1.1.2.4 statistics.harmonic_mean(data, weights=None)
1.1.2.5 statistics.median(data)
1.1.2.6 statistics.median_low(data)
1.1.2.7 statistics.median_high(data)
1.1.2.8 statistics.median_grouped(data)
1.1.2.9 statistics.mode(data)
1.1.2.10 statistics.multimode(data)
1.1.2.11 statistics.quantiles(data)
1.1.2.12 statistics.pstdev(data, mu=None)
1.1.2.13 statistics.pvariance(data, mu=None)
1.1.2.14 statistics.stdev(data, xbar=None)
1.1.2.15 statistics.variance(data, xbar=None)
1.1.2.16 statistics.covariance(x, y, /)
[two entries illegible in the source]
CHAPTER 2 - HISTOGRAM EQUALIZATION ALGORITHM
2.1 Histogram equalization algorithm
2.2 Example about Histogram equalization algorithm
2.3 My comment, analysis, evaluation
CHAPTER 3 - IMPLEMENTATION
1.1 Statistics library in Python
1.1.1 Generality about the Statistics library in Python
In the era of big data and artificial intelligence, data science and machine learning have become essential in many fields of science and technology. A necessary aspect of working with data is the ability to describe, summarize, and represent data visually. Python statistics libraries are comprehensive, popular, and widely used tools that will assist you in working with data.
This module provides functions for calculating mathematical statistics of numeric (real-valued) data.
The module is not intended to be a competitor to third-party libraries such as NumPy and SciPy, or to proprietary full-featured statistics packages aimed at professional statisticians such as Minitab, SAS and Matlab. It is aimed at the level of graphing and scientific calculators.
Descriptive statistics is about describing and summarizing data. It uses two main approaches:
- The quantitative approach describes and summarizes data numerically.
- The visual approach illustrates data with charts, plots, histograms, and other graphs.
1.1.2 Some functions related to the Statistics library
Averages and measures of central location:
- statistics.mean(data)
- statistics.fmean(data)
- statistics.geometric_mean(data)
- statistics.variance(data, xbar=None)
Statistics for relations between two inputs:
- Syntax: mean([data-set])
- Parameters:
  [data-set]: List or tuple of a set of numbers
- Returns: Sample arithmetic mean of the provided data-set
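The behaviour described above can be sketched as follows. This is a minimal illustration, assuming Python 3.8 or later; the sample values and variable names are chosen here and do not come from the original examples.

```python
from fractions import Fraction
from statistics import StatisticsError, mean

# Integers: the result is a float when the mean is not a whole number.
int_mean = mean([1, 2, 3, 4, 4])
print(int_mean)  # 2.8

# Fractions stay exact instead of being converted to floats.
frac_mean = mean([Fraction(3, 7), Fraction(1, 21), Fraction(5, 3), Fraction(1, 3)])
print(frac_mean)  # 13/21

# An empty data-set has no mean, so StatisticsError is raised.
empty_error = None
try:
    mean([])
except StatisticsError as exc:
    empty_error = str(exc)
    print("error:", empty_error)
```

Catching StatisticsError explicitly is one way to handle input that may be empty; checking the length beforehand would work just as well.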
converted to floats. Moreover, the fmean() function runs faster than the mean() function.
- Syntax: fmean([data-set])
- Parameters: [data-set]: List or tuple of a set of numbers
- Returns: Floating-point arithmetic mean of the provided data
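A short sketch of fmean(), assuming Python 3.8 or later (the version in which the function was added). The data and variable names are illustrative only, and the speed difference is not measured here.

```python
from statistics import fmean, mean

values = [3.5, 4.0, 5.25]
fast = fmean(values)
print(fast)  # 4.25

# Integer input is converted to float, so the result is always a float.
int_result = fmean([1, 2, 3, 4])
print(int_result, type(int_result).__name__)  # 2.5 float

# On this data both functions give the same value.
same_as_mean = fast == mean(values)
print(same_as_mean)  # True
```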
- Parameters:
  [data-set]: List or tuple of a set of numbers
- Returns: The geometric mean of the provided data-set
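As an illustration, the sketch below calls geometric_mean() and compares it with the logarithm-based definition. It assumes Python 3.8 or later; the data and the name by_logs are chosen here for the example.

```python
import math
from statistics import geometric_mean

data = [54, 24, 36]
gm = geometric_mean(data)

# 54 * 24 * 36 = 46656 = 36 ** 3, so the geometric mean is 36 (up to rounding).
print(round(gm, 1))  # 36.0

# Equivalent definition: exp of the arithmetic mean of the logarithms.
by_logs = math.exp(sum(math.log(x) for x in data) / len(data))
print(math.isclose(gm, by_logs))  # True
```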
1.1.2.4 statistics.harmonic_mean(data, weights=None)
- The harmonic mean (also known as the subcontrary mean) is one of several kinds of average, and in particular one of the Pythagorean means. It is usually used in situations where an average of rates is desired. The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of a given set of observations.
The harmonic mean can be computed in Python 3 by using the harmonic_mean() function from the statistics module.
- Example output:
Harmonic Mean of data set 2 is 4.574783168721765
Harmonic Mean of data set 3 is 55/56
Harmonic Mean of data set 4 is 1.6363636363636365
- Output when harmonic_mean() is called on dat1:
Traceback (most recent call last):
  File "main.py", line 11, in <module>
    print(statistics.harmonic_mean(dat1))
  File "/usr/lib/python3.8/statistics.py", line 406, in harmonic_mean
    T, total, count = _sum(1/x for x in _fail_neg(data, errmsg))
  File "/usr/lib/python3.8/statistics.py", line 166, in _sum
    for n,d in map(_exact_ratio, values):
  File "/usr/lib/python3.8/statistics.py", line 406, in <genexpr>
    T, total, count = _sum(1/x for x in _fail_neg(data, errmsg))
  File "/usr/lib/python3.8/statistics.py", line 289, in _fail_neg
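The following sketch shows harmonic_mean() on a rate problem and checks it against the definition given in this section. It assumes Python 3.8 or later; the speeds and variable names are made up for illustration. The weights parameter is left out because it needs Python 3.10 or later.

```python
import math
from statistics import StatisticsError, harmonic_mean

# Average speed over two equal distances driven at 40 and 60 km/h.
speeds = [40, 60]
hm = harmonic_mean(speeds)
print(round(hm, 1))  # 48.0

# Same value from the definition: n divided by the sum of the reciprocals.
by_definition = len(speeds) / sum(1 / s for s in speeds)
print(math.isclose(hm, by_definition))  # True

# Negative values are rejected with StatisticsError.
negative_error = None
try:
    harmonic_mean([1, -2, 3])
except StatisticsError as exc:
    negative_error = str(exc)
    print("error:", negative_error)
```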
- Python is a very popular language for data analysis and statistics. Python 3 provides the statistics module, which comes with useful functions such as mean(), median() and mode().
The median() function in the statistics module can be used to calculate the median value of an unsorted data-list. The biggest advantage of using the median() function is that the data-list does not need to be sorted before being passed as a parameter to median().
- Example:
- The median is the value that separates the higher half of a data sample or probability distribution from the lower half. For a dataset, it may be thought of as the middle value.
The median is a measure of the central tendency of a data-set in statistics and probability theory. The median has a major advantage over the mean: it is not skewed as much by extremely large or small values. The median is either one of the values in the data-set or does not stray far from them.
- For an odd number of elements, the median is the middle one.
- For an even number of elements, the median is the mean of the two middle elements.
- The median can be represented by the following formula:
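The formula itself did not survive in this copy of the text. The standard definition, consistent with the odd and even cases stated above, can be written as follows; the order-statistic notation $x_{(k)}$ (the $k$-th smallest of the $n$ values) is introduced here and is an assumption about the notation, not the author's original.

```latex
\[
\operatorname{median}(x) =
\begin{cases}
x_{\left(\frac{n+1}{2}\right)}, & n \text{ odd},\\[4pt]
\dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}, & n \text{ even}.
\end{cases}
\]
```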
- Syntax: median([data-set])
- Parameters: [data-set]: List, tuple or other iterable with a set of numeric values
- Returns: The median (middle value) of the iterable containing the data
- Exceptions: StatisticsError is raised when the iterable passed is empty
- Example:
from statistics import median
empty = []
print(median(empty))
- Output:
Traceback (most recent call last):
  File "main.py", line 7, in <module>
    print(median(empty))
  File "/usr/lib/python3.8/statistics.py", line 430, in median
    raise StatisticsError("no median for empty data")
statistics.StatisticsError: no median for empty data
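A minimal sketch of the odd, even and empty cases discussed in this section. The data and variable names are illustrative assumptions, not the author's original example.

```python
from statistics import StatisticsError, median

odd_data = [5, 1, 3]       # unsorted on purpose
even_data = [7, 1, 5, 3]

odd_median = median(odd_data)
even_median = median(even_data)
print(odd_median)   # 3
print(even_median)  # 4.0

# The median is not pulled towards an extreme value.
outlier_median = median([1, 3, 5, 1000])
print(outlier_median)  # 4.0

# Empty input raises StatisticsError, as in the traceback above.
try:
    median([])
except StatisticsError as exc:
    print("error:", exc)
```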
1.1.2.6 statistics.median_low(data)
- The median is often referred to as a robust measure of central location, because it is less affected by the presence of outliers in the data. The statistics module in Python provides three options for dealing with the median / middle elements of a data set: median(), median_low() and median_high(). The low median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the smaller of the two middle values is returned.
- Syntax: median_low([data-set])
- Parameters: [data-set]: Takes a list, tuple or other iterable of numeric data
- Return type: Returns the low median of the numeric data. The low median is a member of the actual data-set
- Example:
Traceback (most recent call last):
  File "main.py", line 7, in <module>
    print(median_low(empty))
  File "/usr/lib/python3.8/statistics.py", line 453, in median_low
    raise StatisticsError("no median for empty data")
statistics.StatisticsError: no median for empty data
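For non-empty input, the difference between median() and median_low() can be sketched like this; the data and variable names are chosen here for illustration.

```python
from statistics import median, median_low

odd_data = [1, 3, 5]
even_data = [1, 3, 5, 7]

low_odd = median_low(odd_data)
low_even = median_low(even_data)
print(low_odd)   # 3
print(low_even)  # 3

# median() interpolates between 3 and 5, so its result is not a data point.
print(median(even_data))      # 4.0
print(low_even in even_data)  # True
```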
1.1.2.7 statistics.median_high(data)
- The median is often referred to as a robust measure of central location, because it is less affected by the presence of outliers in the data. The statistics module in Python provides three options for dealing with the median / middle elements of a data set: median(), median_low() and median_high(). The high median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the larger of the two middle values is returned.
- Syntax: median_high([data-set])
- Parameters: [data-set]: Takes a list or other iterable of numeric data
- Return type: Returns the high median of the numeric data (always a member of the actual data-set)
- Example: median_high() applied to the tuples data1 and data3, where data3 holds fr(...) values (the full listing is not legible in the source).
Traceback (most recent call last):
  File "main.py", line 7, in <module>
    print(median_high(empty))
  File "/usr/lib/python3.8/statistics.py", line 475, in median_high
    raise StatisticsError("no median for empty data")
statistics.StatisticsError: no median for empty data
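A small sketch of median_high() on an even-sized data set, including exact Fraction values. The data and variable names are illustrative assumptions and are not the values from the original listing.

```python
from fractions import Fraction
from statistics import median_high, median_low

even_data = [1, 3, 5, 7]
high_even = median_high(even_data)
print(high_even)              # 5
print(median_low(even_data))  # 3

# With four values, the high median is the third smallest one.
fractions_data = (Fraction(1, 2), Fraction(11, 3), Fraction(10, 3), Fraction(2, 3))
high_fraction = median_high(fractions_data)
print(high_fraction)  # 10/3
```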
1.1.2.8 statistics.median_grouped(data)
- The median_grouped() function in the statistics module calculates the median value of a set of grouped continuous data.
- The data are assumed to be grouped into intervals of width interval. Each data point in the array is the midpoint of the interval containing the true value. The median is calculated by interpolation within the median interval (the interval containing the median value), assuming that the true values within that interval are distributed uniformly:
median = L + interval * (N / 2 - CF) / F
L = lower limit of the median interval
N = total number of data points
CF = number of data points below the median interval
F = number of data points in the median interval
- Syntax: median_grouped([data-set], interval)
- Parameters:
  [data-set]: List, tuple or other iterable with a set of numeric values
  interval (1 by default): Determines the width of the grouped data; changing it also changes the interpolation of the calculated median
- Return type: Returns the median of grouped continuous data, calculated as the 50th percentile
- Exceptions: StatisticsError is raised when the iterable passed is empty
- Example output:
Grouped Median of set 1 is 4.5
Grouped Median of set 2 is -6.5
Grouped Median of set 3 is 1.5
- Example output with different interval values:
Grouped Median for Interval set as (default) 1 is 12.5
Grouped Median for Interval set as 2 is 12.0
Grouped Median for Interval set as 5 is 10.5
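The interpolation formula of this section can be checked against the library with a short sketch. The helper grouped_median_by_formula is hypothetical and written only for this illustration; it assumes each data value is the midpoint of its interval, as stated above. The data set is also chosen here and is not one of the original examples.

```python
import math
from statistics import median_grouped


def grouped_median_by_formula(data, interval=1):
    """Hypothetical helper: apply L + interval * (N / 2 - CF) / F directly."""
    values = sorted(data)
    n = len(values)                        # N: total number of data points
    x = values[n // 2]                     # midpoint of the median interval
    lower_limit = x - interval / 2         # L: lower limit of that interval
    cf = sum(1 for v in values if v < x)   # CF: points below the median interval
    f = values.count(x)                    # F: points inside the median interval
    return lower_limit + interval * (n / 2 - cf) / f


data = [1, 3, 3, 5, 7]
lib_1 = median_grouped(data, interval=1)
lib_2 = median_grouped(data, interval=2)
print(lib_1)  # 3.25
print(lib_2)  # 3.5

# The hand-written formula agrees with the library for both widths.
print(math.isclose(lib_1, grouped_median_by_formula(data, 1)))  # True
print(math.isclose(lib_2, grouped_median_by_formula(data, 2)))  # True
```

For interval 1, the median interval is [2.5, 3.5) with N = 5, CF = 1 and F = 2, which gives 2.5 + 1 * (2.5 - 1) / 2 = 3.25.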
1.1.2.9 statistics.mode(data)
- The mode of a set of data values is the value that appears most often. It is the value at which the data is most likely to be sampled. A mode of a continuous probability distribution is often considered to be any value x at which its probability density function has a local maximum, so any peak is a mode.
- The statistics module provides a number of functions for working with data-sets, and mode() is one of them. It returns the most common value of the data, which serves as a robust measure of central location.
- Syntax: mode([data-set])
- Parameters:
  [data-set]: A tuple, list or iterator of real-valued numbers or strings
- Return type:
  Returns the most common data point from discrete or nominal data
- Errors and Exceptions:
  Raises StatisticsError when the data set is empty
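A minimal sketch of mode() on numeric and nominal data, including the empty case. The sample data and variable names are assumptions made for this illustration.

```python
from statistics import StatisticsError, mode

numbers = [1, 1, 2, 3, 3, 3, 3, 4]
colours = ["red", "blue", "blue", "red", "green", "red", "red"]

number_mode = mode(numbers)
colour_mode = mode(colours)
print(number_mode)  # 3
print(colour_mode)  # red

# An empty data set has no mode, so StatisticsError is raised.
empty_raised = False
try:
    mode([])
except StatisticsError as exc:
    empty_raised = True
    print("error:", exc)
```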