Midterm Essay: Applied Probability and Statistics for Information Technology



FACULTY OF INFORMATION TECHNOLOGY

MIDTERM ESSAY FOR THE COURSE:

PROBABILITY AND STATISTICS

APPLIED TO INFORMATION TECHNOLOGY

MIDTERM ESSAY

Advisor: Dr. NGUYỄN QUỐC BÌNH    Author: LÂM QUANG HUY

Cohort: K25

HO CHI MINH CITY, 2022


I would like to thank Mr. Nguyễn Quốc Bình for teaching me about programming applications of probability and statistics, and for guiding me through this midterm essay.


I hereby declare that this is my own research work, carried out under the scientific guidance of Dr. Nguyễn Văn A. The research content and results in this topic are truthful and have not previously been published in any form. The data in the tables used for analysis, comments, and evaluation were collected by the author from various sources, which are clearly listed in the references section.

In addition, the thesis uses some comments, evaluations, and data from other authors and organizations, all of which are cited with their sources noted. If any fraud is discovered, I take full responsibility for the content of my thesis. Ton Duc Thang University is not involved in any copyright violations I may have caused during this work (if any).

Ho Chi Minh City, 26 October 2022

Author (signature and full name)

Lâm Quang Huy


This essay summarizes the knowledge the student gained during the first half of semester 1, applying the probability and statistics theory learned in lectures together with the Python programming taught in lab sessions to solve a number of problems. In particular, it covers the function groups of the statistics module in the Python library. The student completed two parts: writing code for the histogram equalization algorithm for image processing, and writing the report (3 chapters). The end of the essay lists the sources the student consulted.


THIS WORK WAS COMPLETED AT TON DUC THANG UNIVERSITY

CHAPTER 1 - OPENING
1.1 Statistics library in Python
1.1.1 Overview of the statistics library in Python
1.1.2 Some functions related to the statistics library
1.1.2.1 statistics.mean(data)
1.1.2.2 statistics.fmean(data)
1.1.2.3 statistics.geometric_mean(data)
1.1.2.4 statistics.harmonic_mean(data, weights=None)
1.1.2.5 statistics.median(data)
1.1.2.6 statistics.median_low(data)
1.1.2.7 statistics.median_high(data)
1.1.2.8 statistics.median_grouped(data)
1.1.2.9 statistics.mode(data)
1.1.2.10 statistics.multimode(data)
1.1.2.11 statistics.quantiles(data)
1.1.2.12 statistics.pstdev(data, mu=None)
1.1.2.13 statistics.pvariance(data, mu=None)
1.1.2.14 statistics.stdev(data, xbar=None)
1.1.2.15 statistics.variance(data, xbar=None)
1.1.2.16 statistics.covariance(x, y, /)

CHAPTER 2 - HISTOGRAM EQUALIZATION ALGORITHM
2.1 Histogram equalization algorithm
2.2 Example of the histogram equalization algorithm
2.3 My comments, analysis, and evaluation

CHAPTER 3 - IMPLEMENTATION


1.1 Statistics library in Python

1.1.1 Overview of the statistics library in Python

In the era of big data and artificial intelligence, data science and machine learning have become essential in many fields of science and technology. A necessary part of working with data is the ability to describe, summarize, and represent it visually. Python's statistics libraries are comprehensive, popular, and widely used tools that help with this work.

Python's built-in statistics module provides functions for calculating mathematical statistics of numeric (real-valued) data.

The module is not intended to be a competitor to third-party libraries such as NumPy and SciPy, or to proprietary full-featured statistics packages aimed at professional statisticians such as Minitab, SAS, and MATLAB. It is aimed at the level of graphing and scientific calculators.

Descriptive statistics is about describing and summarizing data. It uses two main approaches:

- The quantitative approach describes and summarizes data numerically.

- The visual approach illustrates data with charts, plots, histograms, and other graphs.

1.1.2 Some functions related to the statistics library

Averages and measures of central location:

- statistics.mean(data)

- statistics.fmean(data)

- statistics.geometric_mean(data)

…

- statistics.variance(data, xbar=None)

Statistics for relations between two inputs:

- Syntax: mean([data-set])

- Parameters:

  [data-set]: list or tuple of a set of numbers

- Returns: the sample arithmetic mean of the provided data set
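A minimal sketch of mean() in use, assuming Python 3.8 or later; the data values are illustrative, not the essay's own. It shows that mean() keeps the input type where it can (ints with an integral mean stay int, Fractions stay exact) and that empty input raises StatisticsError.

```python
import statistics
from fractions import Fraction

# Illustrative data sets
int_data = [1, 2, 3, 4, 5]
float_data = [1.5, 2.5, 3.5]
fraction_data = [Fraction(1, 2), Fraction(3, 4)]

avg_int = statistics.mean(int_data)        # integral mean of ints comes back as an int
avg_float = statistics.mean(float_data)    # float in, float out
avg_frac = statistics.mean(fraction_data)  # Fraction in, exact Fraction out

print(avg_int, avg_float, avg_frac)  # 3 2.5 5/8

# An empty data set has no mean
try:
    statistics.mean([])
except statistics.StatisticsError as exc:
    print("Error:", exc)
```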


converted to floats. Moreover, the fmean() function runs faster than the mean() function.

- Syntax: fmean([data-set])

- Parameters: [data-set]: list or tuple of a set of numbers

- Returns: the floating-point arithmetic mean of the provided data
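A short comparison, assuming Python 3.8 or later (when fmean() was added) and illustrative data: mean() preserves the input type, while fmean() always returns a float, even for Fraction input.

```python
import statistics
from fractions import Fraction

data = [1, 2, 3]  # illustrative integers

exact = statistics.mean(data)   # keeps the int type here: 2
fast = statistics.fmean(data)   # always a float: 2.0

print(exact, type(exact).__name__)  # 2 int
print(fast, type(fast).__name__)    # 2.0 float

# Fractions are converted to floats as well
frac_avg = statistics.fmean([Fraction(1, 2), Fraction(3, 4)])
print(frac_avg)  # 0.625
```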


- Parameters:

  [data-set]: list or tuple of a set of numbers

- Returns: the geometric mean of the provided data set
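A minimal sketch, assuming Python 3.8 or later and illustrative data: geometric_mean() is compared with the definition (the n-th root of the product of the values), using math.isclose because both are floating-point results. Negative input is rejected with StatisticsError.

```python
import math
import statistics

data = [54, 24, 36]  # illustrative positive values

gm = statistics.geometric_mean(data)
# Cross-check against the definition: n-th root of the product
by_definition = math.prod(data) ** (1 / len(data))

print(round(gm, 1))                     # 36.0
print(math.isclose(gm, by_definition))  # True

# Negative values are not allowed
try:
    statistics.geometric_mean([2, -8])
except statistics.StatisticsError as exc:
    print("Error:", exc)
```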

1.1.2.4 statistics.harmonic_mean(data, weights=None)

- The harmonic mean (also known as the subcontrary mean) is one of several kinds of average, and in particular one of the Pythagorean means. It is typically used when an average of rates is desired. The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of a given set of observations.

The harmonic mean can be computed in Python 3 with the harmonic_mean() function from the statistics module.


Output of the harmonic_mean() example:

Harmonic Mean of data set 2 is 4.574783168721765

Harmonic Mean of data set 3 is 55/56

Harmonic Mean of data set 4 is 1.6363636363636365


Traceback (most recent call last):
  File "main.py", line 11, in <module>
    print(statistics.harmonic_mean(dat1))
  File "/usr/lib/python3.8/statistics.py", line 406, in harmonic_mean
    T, total, count = _sum(1/x for x in _fail_neg(data, errmsg))
  File "/usr/lib/python3.8/statistics.py", line 166, in _sum
    for n,d in map(_exact_ratio, values):
  File "/usr/lib/python3.8/statistics.py", line 406, in <genexpr>
    T, total, count = _sum(1/x for x in _fail_neg(data, errmsg))
  File "/usr/lib/python3.8/statistics.py", line 289, in _fail_neg

- Python is a very popular language for data analysis and statistics. Fortunately, Python 3 provides the statistics module, which comes with useful functions such as mean(), median(), and mode().

The median() function in the statistics module can be used to calculate the median of an unsorted data list. The biggest advantage of median() is that the data list does not need to be sorted before it is passed to the function.

- The median is the value that separates the higher half of a data sample or probability distribution from the lower half. For a data set, it may be thought of as the middle value.


The median is a measure of the central tendency of a data set in statistics and probability theory. The median has a big advantage over the mean: it is not skewed much by extremely large or small values. The median is either contained in the data set or does not stray far from the data provided.

- For an odd number of elements, the median is the middle value.

- For an even number of elements, the median is the mean of the two middle elements.

- The median can be represented by the following formula, where $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ are the sorted values:

$$
\operatorname{median} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\[4pt] \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even} \end{cases}
$$

- Syntax: median([data-set])

- Parameters: [data-set]: list, tuple, or other iterable with a set of numeric values

- Returns: the median (middle value) of the iterable containing the data

- Exceptions: StatisticsError is raised when the iterable passed is empty

- Example:


Traceback (most recent call last):
  File "main.py", line 7, in <module>
    print(median(empty))
  File "/usr/lib/python3.8/statistics.py", line 430, in median
    raise StatisticsError("no median for empty data")
statistics.StatisticsError: no median for empty data
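A small sketch comparing statistics.median() with a hand-written version of the odd/even rule described earlier in this section; `median_by_rule` is a hypothetical helper written for illustration, and the data are illustrative. Both inputs are deliberately unsorted.

```python
import statistics

def median_by_rule(values):
    """Hypothetical helper: median via the odd/even rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise statistics.StatisticsError("no median for empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd: the middle element
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: mean of the two middle elements

odd_data = [7, 1, 5]       # unsorted input is fine
even_data = [4, 1, 3, 2]

print(statistics.median(odd_data), median_by_rule(odd_data))    # 5 5
print(statistics.median(even_data), median_by_rule(even_data))  # 2.5 2.5
```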

1.1.2.6 statistics.median_low(data)

- The median is often described as a robust measure of central location, since it is less affected by outliers in the data. The statistics module in Python offers three functions for dealing with the median/middle elements of a data set: median(), median_low(), and median_high(). The low median is always a member of the data set. When the number of data points is odd, the middle value is returned; when it is even, the smaller of the two middle values is returned.

- Syntax: median_low([data-set])

- Parameters: [data-set]: a list, tuple, or other iterable of numeric data

- Return type: the low median of the numeric data. The low median is always a member of the actual data set.

- Example:


  File "main.py", line 7, in <module>
    print(median_low(empty))
  File "/usr/lib/python3.8/statistics.py", line 453, in median_low
statistics.StatisticsError: no median for empty data
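A minimal sketch, assuming Python 3 and illustrative data, of how median_low() differs from median() on an even-length data set: median() averages the two middle values, while median_low() returns the smaller one, which is always an element of the data.

```python
import statistics

# Illustrative even-length data: the two middle values are 3 and 5
data = [7, 1, 3, 5]

print(statistics.median(data))      # 4.0 (mean of 3 and 5, not in the data)
print(statistics.median_low(data))  # 3   (smaller middle value, in the data)

# With an odd number of points, median_low() returns the middle value
print(statistics.median_low([9, 2, 4]))  # 4
```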


1.1.2.7 statistics.median_high(data)

- The median is often described as a robust measure of central location, since it is less affected by outliers in the data. The statistics module in Python offers three functions for dealing with the median/middle elements of a data set: median(), median_low(), and median_high(). The high median is always a member of the data set. When the number of data points is odd, the middle value is returned; when it is even, the larger of the two middle values is returned.

- Syntax: median_high([data-set])

- Parameters: [data-set]: a list or other iterable of numeric data

- Return type: the high median of the numeric data (always a member of the actual data set)


  File "main.py", line 7, in <module>
    print(median_high(empty))
  File "/usr/lib/python3.8/statistics.py", line 475, in median_high
statistics.StatisticsError: no median for empty data
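A short sketch with illustrative Fraction data (not the essay's screenshot values): median_high() picks the larger of the two middle values and returns it exactly, as a member of the data, while median_low() picks the smaller one.

```python
import statistics
from fractions import Fraction

# Illustrative exact-rational data; Fraction(10, 4) reduces to 5/2
data = [Fraction(1, 2), Fraction(10, 4), Fraction(7, 3), Fraction(2, 3)]

high = statistics.median_high(data)
low = statistics.median_low(data)

print(high)          # 7/3 (larger middle value, still a Fraction)
print(low)           # 2/3
print(high in data)  # True: the high median is always a member of the data
```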

1.1.2.8 statistics.median_grouped(data)

- The median_grouped() function in the statistics module calculates the median of a set of grouped continuous data.

- The data are assumed to be grouped into intervals of width interval. Each data point in the array is the midpoint of the interval containing the true value. The median is calculated by interpolation within the median interval (the interval containing the median value), assuming that the true values within that interval are distributed uniformly:

$$
\text{median} = L + \text{interval} \times \frac{N/2 - CF}{F}
$$

where:

L = lower limit of the median interval

N = total number of data points

CF = number of data points below the median interval

F = number of data points in the median interval

- Syntax: median_grouped([data-set], interval=1)

- Parameters:

  [data-set]: list, tuple, or other iterable with a set of numeric values

  interval (1 by default): the width of the grouping intervals; changing it also changes the interpolated median

- Return type: the median of grouped continuous data, calculated as the 50th percentile

- Exceptions: StatisticsError is raised when the iterable passed is empty


Output of the median_grouped() example:

Grouped Median of set 1 is 4.5

Grouped Median of set 2 is -6.5

Grouped Median of set 3 is 1.5

Output of the example with different interval values:

Grouped Median for Interval set as (default) 1 is 12.5

Grouped Median for Interval set as 2 is 12.0

Grouped Median for Interval set as 5 is 10.5
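A sketch that evaluates the interpolation formula given above term by term and compares it with statistics.median_grouped(); `grouped_median_by_formula` is a hypothetical helper written for illustration, and the data (treated as class midpoints) are illustrative. It mirrors the approach of taking the middle element as the midpoint of the median interval and using bisection to count CF and F.

```python
import math
import statistics
from bisect import bisect_left, bisect_right

def grouped_median_by_formula(data, interval=1):
    """Hypothetical helper: median = L + interval * (N/2 - CF) / F."""
    ordered = sorted(data)
    n = len(ordered)                   # N
    if n == 0:
        raise statistics.StatisticsError("no median for empty data")
    x = ordered[n // 2]                # midpoint of the median interval
    lower = x - interval / 2           # L
    cf = bisect_left(ordered, x)       # CF: points below the median interval
    f = bisect_right(ordered, x) - cf  # F: points inside the median interval
    return lower + interval * (n / 2 - cf) / f

data = [1, 2, 2, 3, 4, 4, 4, 4, 4, 5]  # illustrative class midpoints

for width in (1, 2):
    ours = grouped_median_by_formula(data, interval=width)
    builtin = statistics.median_grouped(data, interval=width)
    print(width, round(builtin, 4), math.isclose(ours, builtin))
```

For interval 1 the formula gives 3.5 + 1 × (5 − 4) / 5 = 3.7, and widening the interval to 2 moves L down to 3.0, giving 3.4.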


1.1.2.9 statistics.mode(data)

- The mode of a set of data values is the value that appears most often. It is the value at which the data is most likely to be sampled. A mode of a continuous probability distribution is often considered to be any value x at which its probability density function has a local maximum, so any peak is a mode.

- Python is well suited to statistics and to working with large data sets. The statistics module provides many functions for working with data sets, and mode() is one of them. It returns the most common data point in a data set, which serves as a robust measure of its central location.

- Syntax: mode([data-set])

- Parameters:

[data-set]: a tuple, list, or iterator of real-valued numbers or strings

- Return type:

Returns the most common data point from discrete or nominal data

- Errors and exceptions:

Raises StatisticsError when the data set is empty
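A minimal sketch, assuming Python 3.8 or later and illustrative data: mode() works on numbers and on nominal data such as strings; for multimodal data it returns the first mode encountered, while multimode() (listed in the table of contents) returns all of them.

```python
import statistics

print(statistics.mode([1, 1, 2, 3, 3, 3, 4]))   # 3
print(statistics.mode(["red", "blue", "blue"]))  # blue (nominal data works too)

# Python 3.8+: with ties, mode() returns the first mode encountered,
# while multimode() returns every value with the highest count
print(statistics.mode([1, 1, 2, 2]))       # 1
print(statistics.multimode([1, 1, 2, 2]))  # [1, 2]

try:
    statistics.mode([])
except statistics.StatisticsError as exc:
    print("Error:", exc)
```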

Title	Applications for Information Technology
Author	Lam Quang Huy
Advisor	Dr. Nguyễn Quốc Bình
School	Ton Duc Thang University
Major	Probability and Statistics
Type	Midterm essay
Year of publication	2022
City	Ho Chi Minh City

Format
Pages	49
File size	9.09 MB