
DOCUMENT INFORMATION

Basic information

Title Analysis of Car Prices
Author Jeffrey C. Schlemmer
School University of California, Irvine
Major Machine Learning
Type thesis
Year of publication 2023
City Irvine
Format
Pages 116
File size 21.68 MB

Content


Page 2

How much does the car cost?

Page 3

Price ...?

Data scientists

Jeffrey C Schlemmer
https://archive.ics.uci.edu/ml/machine-learning-databases/autos/

Page 4

Collect the relevant attributes.

     Attribute1  Attribute2  Attribute3  Attribute4
0
1
2
3
...
n

Source: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.names

Page 5

Jeffrey C Schlemmer: car price

UCI, Kaggle, KDnuggets:

Page 8

Hierarchical Data Format (HDF): hdf, pandas.read_hdf()

Data Source: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data

header
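The data file referenced above has no header row, so the column names have to be supplied after loading. A minimal sketch, assuming a tiny in-memory stand-in for the file; the two rows and the three column names are illustrative, not the full dataset:

```python
import io

import pandas as pd

# Made-up stand-in for a headerless CSV file (illustrative values only).
raw = io.StringIO("3,?,alfa-romero\n1,164,audi\n")

# header=None tells pandas that the first line is data, not column names.
df = pd.read_csv(raw, header=None)

# Attach column names afterwards (a hypothetical subset of the real headers).
df.columns = ["symboling", "normalized-losses", "make"]

print(df.head())
```

Without `header=None`, pandas would treat the first data row as the column names and silently lose one record.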

Page 9

SQL database

Page 10

Comma-separated

Excel sheet: excel, pandas.read_excel(), df.to_excel()

Hierarchical Data

Page 12

Page 14

df) (or feature)

(average) (frequency)

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Parameters

axis: Determine if rows or columns which contain missing values are removed.

0, or 'index': Drop rows which contain missing values.

1, or 'columns': Drop columns which contain missing values.

how: Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.

thresh: Require that many non-NA values.

subset: Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.

inplace: If True, do operation inplace and return None.

Returns

DataFrame: DataFrame with NA entries dropped from it.

Page 15

df.dropna(subset=[ ], axis=0, inplace=True)

horsepower peak-rpm price
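As a hedged illustration of dropna() with subset, the sketch below drops only the rows whose price is missing. The three-row frame is made up, and choosing "price" as the subset is an assumption for the example:

```python
import numpy as np
import pandas as pd

# Made-up frame using the three column names shown on the slide.
df = pd.DataFrame({
    "horsepower": [111.0, np.nan, 154.0],
    "peak-rpm": [5000.0, 5500.0, np.nan],
    "price": [13495.0, 16500.0, np.nan],
})

# axis=0 drops rows; subset limits the NA check to the 'price' column,
# so the missing 'horsepower' value in the second row does not remove that row.
cleaned = df.dropna(subset=["price"], axis=0)

print(cleaned)
```

With inplace=False (the default) the original frame is left untouched and a new frame is returned.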

df.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, ...)

Page 16

horsepower peak-rpm price
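Instead of dropping rows, a missing value can be replaced, for example by the column mean. A minimal sketch with made-up numbers; using the mean is one possible choice, not necessarily the one the slides make for every column:

```python
import numpy as np
import pandas as pd

# Made-up column with one missing value.
df = pd.DataFrame({"horsepower": [100.0, np.nan, 140.0]})

# mean() skips NaN by default, so this is the mean of 100 and 140.
mean_hp = df["horsepower"].mean()

# Replace every NaN in the column with that mean.
df["horsepower"] = df["horsepower"].replace(np.nan, mean_hp)

print(df)
```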

Page 17

... different ... => hence not consistent

... according to a common standard, and allows ...

inplace=False, level=None,

target with mapper. Can be either the axis name

Page 18

mpg -> L/100km

dataframe.astype()
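The unit conversion and the type cast can be sketched as follows. The factor 235 is the usual rounded constant for converting miles per US gallon to litres per 100 km; the data values and the new column name are assumptions for illustration:

```python
import pandas as pd

# Made-up frame: fuel economy in mpg, and prices stored as strings.
df = pd.DataFrame({
    "city-mpg": [21, 19, 24],
    "price": ["13495", "16500", "13950"],
})

# L/100km is inversely proportional to mpg (235 is an approximate constant).
df["city-L/100km"] = 235 / df["city-mpg"]

# astype() converts the column to a numeric dtype so arithmetic works on it.
df["price"] = df["price"].astype("int64")

print(df.dtypes)
```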

Page 19

0.167 0.99 0.75

df['length'] = df['length'] / df['length'].max()
df['width'] = df['width'] / df['width'].max()
df['height'] = df['height'] / df['height'].max()

Page 20

df['length'] = (df['length'] - df['length'].min()) / (df['length'].max() - df['length'].min())

df['width'] = (df['width'] - df['width'].min()) / (df['width'].max() - df['width'].min())

df['height'] = (df['height'] - df['height'].min()) / (df['height'].max() - df['height'].min())
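To make the difference between simple feature scaling and min-max scaling concrete, here is a small sketch on one made-up column. The results are kept in separate variables so the original values stay available:

```python
import pandas as pd

# Made-up lengths, for illustration only.
df = pd.DataFrame({"length": [160.0, 170.0, 180.0, 200.0]})

# Simple feature scaling: divide by the maximum, so the largest value becomes 1.
simple = df["length"] / df["length"].max()

# Min-max scaling: shift by the minimum and divide by the range,
# so the values span exactly [0, 1].
minmax = (df["length"] - df["length"].min()) / (
    df["length"].max() - df["length"].min()
)

print(simple.tolist())
print(minmax.tolist())
```

Simple scaling keeps the ratio between values, while min-max scaling always maps the smallest value to 0.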

Z-score

df['length'] = (df['length'] - df['length'].mean()) / df['length'].std()

df['width'] = (df['width'] - df['width'].mean()) / df['width'].std()

df['height'] = (df['height'] - df['height'].mean()) / df['height'].std()

Z-score

Parameters:

a: array_like. An array-like object containing the sample data.

axis: int or None, optional. Axis along which to operate. Default is 0. If None, compute over the whole array a.

Returns:

zscore: array_like. The z-scores, standardized by mean and standard deviation of input array a.
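The manual pandas formula and scipy.stats.zscore do not give identical numbers by default: pandas std() uses the sample standard deviation (ddof=1), while zscore uses the population one (ddof=0). A short sketch on made-up values shows the difference and how to align the two:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up values, for illustration only.
length = pd.Series([160.0, 170.0, 180.0, 200.0])

# Manual z-score with pandas: std() defaults to ddof=1 (sample std).
z_pandas = (length - length.mean()) / length.std()

# scipy's zscore defaults to ddof=0 (population std).
z_scipy = stats.zscore(length.to_numpy())

# Passing ddof=1 reproduces the pandas result.
z_scipy_sample = stats.zscore(length.to_numpy(), ddof=1)

print(z_pandas.to_numpy())
print(z_scipy)
```

For large samples the two conventions are nearly the same; on small samples it is worth choosing one deliberately.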

Page 21

binbin

Page 22

pandas.get_dummies(..., prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None) -> DataFrame
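A hedged sketch of how get_dummies turns a categorical column into indicator columns. The "fuel-type" column, its values, and the "fuel" prefix are assumptions chosen for illustration:

```python
import pandas as pd

# Made-up categorical column.
df = pd.DataFrame({"fuel-type": ["gas", "diesel", "gas"]})

# One indicator column per category; prefix and prefix_sep build the names.
# dtype=int gives 0/1 columns instead of booleans.
dummies = pd.get_dummies(df["fuel-type"], prefix="fuel", dtype=int)

print(dummies)
```

Setting drop_first=True would keep only one of the two columns, which avoids redundant features in linear models.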

Page 25

Page 26

Mean, Median, Mode

... the mean (average) value ...

... a statistic for central tendency, or a statistic of location ...

Page 27

2.1 Central tendency

Mode

Mean = Median = Mode: symmetric distribution
Mode < Median < Mean: positive skew
Mean < Median < Mode: negative skew
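The ordering of mode, median and mean under positive skew can be checked on a small made-up sample with a long right tail:

```python
import pandas as pd

# Made-up sample: one large value pulls the mean to the right.
values = pd.Series([1, 1, 2, 3, 10])

mean = values.mean()
median = values.median()
mode = values.mode()[0]  # mode() returns a Series; take the first mode
skew = values.skew()     # sample skewness; positive for a right tail

print(mode, median, mean, skew)
```

Here the mode is below the median and the median is below the mean, which matches the positive-skew case.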

report

Page 28

2.2 Dispersion

x1, x2, x3, ..., xn

Min = x1
Max = xn

Range = Max - Min = xn - x1

Quartiles provide information about the values ...

Page 29

Interquartile Range (IQR)

IQR = Q3 - Q1
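Range, quartiles and IQR can be computed directly with pandas. A minimal sketch on made-up sorted values; note that quantile() uses linear interpolation by default, so other quartile conventions may give slightly different numbers:

```python
import pandas as pd

# Made-up values, already sorted for readability.
x = pd.Series([2, 4, 4, 5, 7, 9, 10, 12])

# Range = Max - Min
value_range = x.max() - x.min()

# First and third quartiles (linear interpolation between neighbours).
q1 = x.quantile(0.25)
q3 = x.quantile(0.75)

# Interquartile range: spread of the middle 50% of the data.
iqr = q3 - q1

print(value_range, q1, q3, iqr)
```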

Variance

Variance: measures how far the observations are spread from their mean.

Page 30

Coefficient of Variation (CV)
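Variance, standard deviation and the coefficient of variation are related as sketched below. The sample is made up, and the population form (ddof=0) is an assumption; the slides may use the sample form instead:

```python
import pandas as pd

# Made-up values with mean 5.
x = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Population variance: mean squared deviation from the mean.
variance = x.var(ddof=0)

# Standard deviation is the square root of the variance.
std = x.std(ddof=0)

# Coefficient of variation: standard deviation relative to the mean,
# a unit-free measure that allows comparing spread across variables.
cv = std / x.mean()

print(variance, std, cv)
```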

Page 31

Skewness

$\text{skewness} = \dfrac{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^3}{\sigma^3}$

Page 32

Page 33

Correlation:

Cancer ...

Correlation does NOT ...

2.4 Correlation

Page 34

Close to +1: Large positive relationship
Close to -1: Large negative relationship
Close to 0: No relationship

P-value

P-value < 0.001: Strong
P-value < 0.05: Moderate
P-value < 0.1: Weak
P-value > 0.1: No

Page 35

Pearson correlation: 0.81
P-value: 9.35e-48
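A coefficient and P-value of this kind can be obtained with scipy.stats.pearsonr. The sketch below uses made-up, almost linear data, so its numbers are unrelated to the values reported above:

```python
import numpy as np
from scipy import stats

# Made-up data: y grows almost linearly with x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

# pearsonr returns the correlation coefficient and the two-sided P-value.
coef, p_value = stats.pearsonr(x, y)

print(coef, p_value)
```

A coefficient close to +1 together with a very small P-value corresponds to the "large positive relationship, strong" row of the scale.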

2.4 Correlation - Statistics

Correlation heatmap

3 Grouping

drive-wheels, body-style, price
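Grouping by the two categorical columns and averaging the price can be sketched with groupby() followed by pivot(). The rows below are made up; only the three column names come from the slide:

```python
import pandas as pd

# Made-up rows, for illustration only.
df = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "rwd", "rwd", "4wd"],
    "body-style": ["sedan", "hatchback", "sedan", "sedan", "sedan"],
    "price": [9000.0, 8000.0, 20000.0, 24000.0, 11000.0],
})

# Mean price for every (drive-wheels, body-style) combination.
grouped = df.groupby(["drive-wheels", "body-style"], as_index=False)["price"].mean()

# Pivot into a grid: one row per drive-wheels value, one column per body-style.
# Combinations that never occur become NaN.
pivot = grouped.pivot(index="drive-wheels", columns="body-style", values="price")

print(pivot)
```

The pivoted grid is the natural input for a heatmap, since each cell holds one aggregated value.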

Page 36

row

Using a heatmap

Page 37

4 Phân tích ANOVA

kê hay không

60

4 Phân tích ANOVAANOVA F-test

import scipy.stats as stats

groupB, groupC, )
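A minimal sketch of the ANOVA F-test with scipy.stats.f_oneway. The three groups are made-up price samples: two with similar means and one clearly different:

```python
from scipy import stats

# Made-up samples, for illustration only.
group_a = [10.0, 11.0, 9.0, 10.5]
group_b = [10.2, 9.8, 10.9, 9.5]
group_c = [30.0, 31.0, 29.5, 30.5]

# Similar means: small F statistic, large P-value.
f_same, p_same = stats.f_oneway(group_a, group_b)

# Very different means: large F statistic, small P-value.
f_diff, p_diff = stats.f_oneway(group_a, group_c)

print(f_same, p_same)
print(f_diff, p_diff)
```

A large F with a small P-value suggests that the group means differ by more than the variation within the groups would explain.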

Page 39

import scipy.stats as stats

honda and jaguar

Page 41

asethub/ds105/master/Model_Dataset.csv

https://raw.githubusercontent.com/datasethub/ds105/master/Model_Dataset_Lab.csv

Model

'city-mpg'

39

'price'

7000

Page 42

'body-style' 'horsepower' 'highway-mpg' 'engine-size'

Page 43

-2019

-19 and 2017

Page 46

$Y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4$

$b_0$: intercept (the value of Y when X = 0)
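The multiple linear regression equation can be fitted with scikit-learn's LinearRegression. In this sketch the four predictors and the coefficients are synthetic and noise-free, chosen only so that the fitted b0 to b4 can be compared with known values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic predictors x1..x4 (50 samples), for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 4))

# Known coefficients used to generate Y = b0 + b1*x1 + ... + b4*x4.
true_b0 = 5.0
true_b = np.array([1.0, -2.0, 0.5, 3.0])
y = true_b0 + X @ true_b

# Fit the model; intercept_ estimates b0 and coef_ estimates b1..b4.
lm = LinearRegression().fit(X, y)

print(lm.intercept_, lm.coef_)
```

On real data the recovered coefficients are estimates, and their quality depends on noise and on how correlated the predictors are.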

Page 49

4.2 Residual plot

Page 50

4.3 Distribution plot

Page 53

- PolynomialFeatures in the preprocessing package of sklearn

Page 55

Pipeline Constructor

pipeline object
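A pipeline object chains preprocessing steps and a final estimator. The sketch below is one possible arrangement (scaling, polynomial features, linear regression); the step names and the synthetic quadratic data are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic, noise-free quadratic data.
x = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 2.0 * x[:, 0] ** 2 + 1.0

# The constructor takes a list of (name, transformer/estimator) tuples.
# Every step except the last must be a transformer.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("polynomial", PolynomialFeatures(degree=2)),
    ("model", LinearRegression()),
])

# fit() runs each transform in order, then fits the final model.
pipe.fit(x, y)
y_hat = pipe.predict(x)

print(pipe.score(x, y))
```

Keeping the steps in one object means the same transformations are applied at prediction time without repeating them by hand.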

Page 56

... model.

7.1 Mean Squared Error (MSE)

7.2 R-squared (R^2)

y = 150

yHat = 50

150 - 50 = 100

(100)^2
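The same arithmetic extends to several samples: square each error, then average. In this sketch the first pair (150, 50) is the one from the slide; the other two pairs are made up:

```python
import numpy as np

# Actual values y and predictions yHat (first pair from the slide, rest made up).
y = np.array([150.0, 120.0, 90.0])
y_hat = np.array([50.0, 110.0, 95.0])

# Error for each sample: y - yHat.
errors = y - y_hat

# Mean Squared Error: average of the squared errors.
mse = np.mean(errors ** 2)

print(errors, mse)
```

Squaring makes large errors dominate: the single error of 100 contributes 10000 of the 10125 total.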

Page 57

7.2 R-squared (R^2)

Y = 6

Page 58

The blue squares ... MSE

Page 59

Computing R^2
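One common way to compute R^2 compares the MSE of the model with the MSE of always predicting the mean. A minimal sketch on made-up numbers, assuming that definition:

```python
import numpy as np

# Made-up actual values and predictions.
y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 6.5, 9.5])

# MSE of the regression predictions.
mse_model = np.mean((y - y_hat) ** 2)

# MSE of the baseline that always predicts the mean of y.
mse_mean = np.mean((y - y.mean()) ** 2)

# R^2 = 1 - (MSE of model) / (MSE of mean baseline).
r2 = 1 - mse_model / mse_mean

print(mse_model, mse_mean, r2)
```

An R^2 near 1 means the model explains most of the variation; a value near 0 means it does no better than the mean.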

Page 60

Regression plot

Distribution plot

Page 63

... evaluation allows us to ...

... whether the model fits ... used to develop the model ...

Page 65

from sklearn.model_selection import cross_val_score

scores = cross_val_score(lr, x_data, y_data, cv=3)
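For a runnable version of the call above, the names lr, x_data and y_data need definitions. The sketch below supplies them with a LinearRegression model and synthetic data; both are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: a noisy linear relationship.
rng = np.random.default_rng(0)
x_data = rng.uniform(0, 10, size=(60, 1))
y_data = 3.0 * x_data[:, 0] + 2.0 + rng.normal(0, 0.5, size=60)

lr = LinearRegression()

# cv=3 splits the data into 3 folds; each fold is used once for testing
# while the other two are used for training. For a regressor the default
# score is R^2, so one R^2 value per fold is returned.
scores = cross_val_score(lr, x_data, y_data, cv=3)

print(scores, scores.mean())
```

The mean of the fold scores is the usual summary of out-of-sample performance.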

Page 66

Cross validation (CV): cross_val_score()

Root Mean Squared Error (RMSE)

Relative Squared Error (RSE)

Mean Absolute Error (MAE)

Relative Absolute Error (RAE)
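The four error measures listed above can be written out with NumPy. This sketch uses the usual textbook definitions, which is an assumption since the slides do not show the formulas, and made-up data:

```python
import numpy as np

# Made-up actual values and predictions.
y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.0, 5.0, 8.0, 9.0])
err = y - y_hat

# Root Mean Squared Error: square root of the mean squared error.
rmse = np.sqrt(np.mean(err ** 2))

# Mean Absolute Error: mean of the absolute errors.
mae = np.mean(np.abs(err))

# Relative Squared Error: squared error relative to the mean baseline.
rse = np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)

# Relative Absolute Error: absolute error relative to the mean baseline.
rae = np.sum(np.abs(err)) / np.sum(np.abs(y - y.mean()))

print(rmse, mae, rse, rae)
```

RMSE and MAE are in the units of the target, while RSE and RAE are unit-free ratios against the mean predictor.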

2 Overfitting - Underfitting

Page 67

2.1 Overfitting

Page 68

y(x) + noise

Page 70

Code

Page 72

... and overfitting.

Page 73

alpha

Page 75

Page 78

Comparison

Page 79

https://python-graph-gallery.com/

2 Matplotlib

Page 80

Artist layer (Artist), Scripting layer (pyplot)

Backend layer

(abstract interface classes)

1 FigureCanvas: matplotlib.backend_bases.FigureCanvas

2 Renderer: matplotlib.backend_bases.Renderer

3 Event: matplotlib.backend_bases.Event

Artist layer

Artist object. Types of Artist:

1 Primitive Artist

2 Container Artist: Axis, Axes, Figure, and Tick. A container artist may contain other container artists as well as primitive artists.

Ref.: https://www.aosabook.org/en/matplotlib.html

Page 81

Scripting layer

pyplot

Page 82

Page 83

plot(*args, scalex=True, scaley=True, data=None, **kwargs)

(line)

plot([x], y, [fmt], *, data=None, **kwargs)

plot([x], y, [fmt], [x2], y2, [fmt2], ..., **kwargs)
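The second call form draws several lines at once from alternating data and format strings. A small sketch with made-up data; the "Agg" backend is selected only so that the example runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

# Made-up data for two lines.
x = [0, 1, 2, 3]
y = [0, 1, 4, 9]
y2 = [0, 2, 4, 6]

fig, ax = plt.subplots()

# plot(x, y, fmt, x2, y2, fmt2): "ro-" is red circles joined by a solid line,
# "b--" is a blue dashed line. One Line2D object is returned per data set.
lines = ax.plot(x, y, "ro-", x, y2, "b--")

fig.savefig("lines.png")  # hypothetical output file name
```

The format string is a shorthand for color, marker and line style; the same settings can be passed as keyword arguments.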

Page 84

Ref.: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html

https://github.com/datasethub/ds105/blob/master/Canada.xlsx

Page 85

https://www.un.org/en/development/desa/population/migration/data/empirical2/migrationflows.asp

df_can.head()

Page 94

6 Pie chart

Page 97

8 Scatter plot

Page 98

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html

9 Waffle chart

Page 99

Page 100

Page 101

{

"ID" : "28",

Page 102

Ref: http://132.72.155.230:3838/js/geojson-1.html

Page 103

1.2 GeoJSON

Structure of GeoJSON

type: Feature, FeatureCollection
geometry: Point, LineString, MultiLineString, Polygon, MultiPolygon, GeometryCollection
properties
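The structure listed above can be illustrated by building a small FeatureCollection with the standard json module. The coordinates and the property value are made up; GeoJSON stores positions as [longitude, latitude]:

```python
import json

# One Feature: a geometry plus free-form properties.
point_feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [-79.38, 43.65],  # [longitude, latitude], illustrative
    },
    "properties": {"name": "example point"},
}

# A FeatureCollection wraps a list of Feature objects.
collection = {"type": "FeatureCollection", "features": [point_feature]}

# Serialize to GeoJSON text and parse it back.
text = json.dumps(collection, indent=2)
parsed = json.loads(text)

print(text)
```

Mapping libraries usually match a key inside properties (or a feature id) against a data column when drawing a choropleth.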

Page 107

Folium: Open Street Map

Map styles: Stamen Toner

Page 109

Ontario

Page 110

2.4 Choropleth map

Page 111

GeoJSON file

Page 113

gmaps

gmaps API key

import gmaps

# GOOGLE_API_KEY must hold a valid Google Maps API key
gmaps.configure(api_key=GOOGLE_API_KEY)
fig = gmaps.figure()
fig

from ipywidgets.embed import embed_minimal_html

# Export the figure to a standalone HTML file
embed_minimal_html('export.html', views=[fig])

Page 114

cities, overlaid, 'TERRAIN' is a map that emphasizes terrain

stroke_color='green', scale= )

fig = gmaps.figure()
fig.add_layer(symbol_layer)
fig

Page 115

gmaps: Choropleth map

gmaps: Heat map

import gmaps

gmaps.configure(api_key=GOOGLE_API_KEY)
fig = gmaps.figure(map_type='SATELLITE')
locations = [
...
weights=[1895, 926, 5785, 4256, 3745],
point_radius=50)
fig.add_layer(heatmap_layer)
fig

Posted: 17/02/2024, 11:34