
Ultimate pandas notebook


Pandas

Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis/manipulation tool available in any language.

Installing Pandas

To install pandas, go to your command prompt and type "pip install pandas". Once the installation is completed, go to your IDE (for example, PyCharm) or Anaconda Jupyter Notebook and import it by typing: import pandas as pd

How to import Pandas

To start using pandas and all of the functions available in it, you need to import it. This is done with the following import statement:

In [1]:
import pandas as pd

NOTE: We shorten pandas to pd to save time and to keep code standardized, so that anyone working with your code can easily understand and run it.

Data structures:

Pandas has historically dealt with the following three data structures:
Series
DataFrame
Panel (removed in pandas 0.25 and later; only Series and DataFrame are covered here)

Series:

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.).
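As a small illustration of "any data type", the sketch below builds a few Series and prints the dtype pandas infers for each. The variable names and values are made up for this example and do not come from the notebook.

```python
import pandas as pd

# Hypothetical sample values, chosen only to show dtype inference.
ints = pd.Series([1, 2, 3])
floats = pd.Series([0.5, 1.5, 2.5])
mixed = pd.Series([1, "two", 3.0, None])   # mixed Python objects

# Labels can be attached through the index argument.
labeled = pd.Series([10, 20], index=["x", "y"])

print(ints.dtype)     # an integer dtype
print(floats.dtype)   # a floating point dtype
print(mixed.dtype)    # object, because the elements have different types
print(labeled["y"])   # lookup by label
```

Whatever the element type, each of these is still a one-dimensional object (ndim is 1).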
The axis labels are collectively referred to as the index. A pandas Series can be created using the following constructor:

Syntax
In [ ]:
pandas.Series(data, index, dtype, copy)

Here, data can be many different things:
a Python dict
an ndarray
a scalar value (like 5)

Example: Create an Empty Series
In [2]:
import pandas as pd
s = pd.Series()
print(s)

Series([], dtype: float64)
:2: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
  s = pd.Series()

Example: Create a Series from ndarray
In [3]:
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)

0    a
1    b
2    c
3    d
dtype: object

Example: Create a Series from ndarray with index mentioned
In [4]:
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print(s)

100    a
101    b
102    c
103    d
dtype: object

Example: Create a Series from dict
NOTE: A dict can be passed as input. If no index is specified, the dictionary keys are used to construct the index (in insertion order on pandas 0.23+ with Python 3.6+; older versions sorted the keys). If an index is passed, the values in data corresponding to the labels in the index are pulled out, and labels with no matching key get NaN.
In [5]:
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print(s)

a    0.0
b    1.0
c    2.0
dtype: float64

In [6]:
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
print(s)

b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64

Example: Create a Series from Scalar
In [7]:
import pandas as pd
import numpy as np
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)

0    5
1    5
2    5
3    5
dtype: int64

NOTE: If data is a scalar value, an index must be provided. The value will be repeated to match the length of the index.

Accessing Data from Series with Position:
To access the data, give the position within square brackets [].
NOTE: On a Series with a non-integer index, s[0] is treated as a position. Recent pandas versions deprecate this; s.iloc[0] is the explicit positional form.
In [8]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print("Series are: \n",s)
#retrieve the first element
print(s[0])
#retrieve the fourth element
print(s[3])

Series are: 
 a    1
b    2
c    3
d    4
e    5
dtype: int64
1
4

Slicing Data from Series with position:
For slicing, give the start and end positions of the range you want.
In [9]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print("Series are: \n",s)
#retrieve the first three elements
print(s[:3])

Series are: 
 a    1
b    2
c    3
d    4
e    5
dtype: int64
a    1
b    2
c    3
dtype: int64

In [10]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print("Series are: \n",s)
#retrieve the last three elements
print(s[-3:])

Series are: 
 a    1
b    2
c    3
d    4
e    5
dtype: int64
c    3
d    4
e    5
dtype: int64

Retrieve Data Using Label:
To access the data, give the label within square brackets [].
In [11]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print("Series are: \n",s)
#retrieve a single element
print(s['a'])

Series are: 
 a    1
b    2
c    3
d    4
e    5
dtype: int64
1

In [12]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print("Series are: \n",s)
#retrieve multiple elements
print(s[['a','c','d']])

Series are: 
 a    1
b    2
c    3
d    4
e    5
dtype: int64
a    1
c    3
d    4
dtype: int64

Basic functionality of series:

axes: Returns the list of the labels of the series.
In [13]:
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print(s)
print("The axes are: ")
print(s.axes)

100    a
101    b
102    c
103    d
dtype: object
The axes are: 
[Int64Index([100, 101, 102, 103], dtype='int64')]

empty: Returns a Boolean value saying whether the object is empty or not. True indicates that the object is empty.
In [14]:
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print(s)
print("The series is empty or not: ",s.empty)

100    a
101    b
102    c
103    d
dtype: object
The series is empty or not:  False

In [15]:
import numpy as np
import pandas as pd
s=pd.Series()
print(s)
print("The series is empty or not: ",s.empty)

Series([], dtype: float64)
The series is empty or not:  True
:3: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
  s=pd.Series()

ndim: Returns the number of dimensions of the object. By definition, a Series is a 1D data structure, so it returns 1.
In [16]:
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print(s)
print("The dimension of s: ",s.ndim)

100    a
101    b
102    c
103    d
dtype: object
The dimension of s:  1

size: Returns the size (length) of the series.
In [17]:
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print(s)
print("The size of s: ",s.size)

100    a
101    b
102    c
103    d
dtype: object
The size of s:  4

values: Returns the actual data in the series as an array.
In [18]:
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print(s)
print("The values of s: ",s.values)

100    a
101    b
102    c
103    d
dtype: object
The values of s:  ['a' 'b' 'c' 'd']

head(): head() returns the first n rows (observe the index values). The default number of elements to display is five, but you may pass a custom number.
tail(): tail() returns the last n rows (observe the index values). The default number of elements to display is five, but you may pass a custom number.
In [19]:
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d','e','f','g','h'])
s = pd.Series(data,index=[100,101,102,103,104,105,106,10])
print(s)
print("***************************")
print("The head of s: \n",s.head())
print("***************************")
print("The tail of s: \n",s.tail())

100    a
101    b
102    c
103    d
104    e
105    f
106    g
10     h
dtype: object
***************************
The head of s: 
 100    a
101    b
102    c
103    d
104    e
dtype: object
***************************
The tail of s: 
 103    d
104    e
105    f
106    g
10     h
dtype: object

Explore via coding
Create a series with the ages of 15 people, access the fifth element, and print the dimension.
In [20]:
import pandas as pd
s=pd.Series([18,19,20,22,24,26,70,69,45,35,76,65,15,35,43],name="age")
print("Series are: \n",s)
print("Fifth element of series are: ",s[4])
print("Dimension is: ",s.ndim)

Series are: 
 0     18
1     19
2     20
3     22
4     24
5     26
6     70
7     69
8     45
9     35
10    76
11    65
12    15
13    35
14    43
Name: age, dtype: int64
Fifth element of series are:  24
Dimension is:  1

DataFrames:

A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet or a SQL table. A pandas DataFrame can be created using the following constructor:

Syntax
In [ ]:
pandas.DataFrame(data, index, columns, dtype, copy)

DataFrame accepts many different kinds of input:
Dict of 1D ndarrays, lists, dicts, or Series
2-D numpy.ndarray
Structured or record ndarray
A Series
Another DataFrame

Example: Create an Empty DataFrame
In [21]:
import pandas as pd
df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []

Example: Create a DataFrame from Lists
In [22]:
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)

   0
0  1
1  2
2  3
3  4
4  5

In [23]:
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)

     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13

In [24]:
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'],dtype=float)
print(df)

     Name   Age
0    Alex  10.0
1     Bob  12.0
2  Clarke  13.0

NOTE: Recent pandas versions may raise a ValueError here because the Name column cannot be cast to float. In that case, cast only the numeric column, for example with df['Age'].astype(float).

Example: Create a DataFrame from Dict of ndarrays / Lists
In [25]:
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)

    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42

In [26]:
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data,
                  index=['rank1','rank2','rank3','rank4'])
print(df)

        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42

Example: Create a DataFrame from List of Dicts
In [27]:
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)

   a   b     c
0  1   2   NaN
1  5  10  20.0

In [28]:
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)

        a   b     c
first   1   2   NaN
second  5  10  20.0

How do I select a specific column from a DataFrame:

column selection: We can select a column in a data frame by its label. Pass the column name in square brackets.
In [29]:
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print("Data frame is: \n",df)
print("***********")
print(df['one'])

Data frame is: 
    one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
***********
a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64

NOTE: Each column in a DataFrame is a Series. As a single column is selected, the returned object is a pandas Series. We can verify this by checking the type of the output.

Check which rows have SepalLengthCm > 5
In [85]:
ds['SepalLengthCm']>5

Out[85]:
0       True
1      False
2      False
3      False
4      False
       ...
145     True
146     True
147     True
148     True
149     True
Name: SepalLengthCm, Length: 150, dtype: bool

Create a new column TotalPetalCm = PetalLengthCm + PetalWidthCm
In [86]:
ds['TotalPetalCm']=ds['PetalLengthCm']+ds['PetalWidthCm']
ds

Out[86]:
     Id   SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species         TotalPetalCm
0    1    5.1  3.5  1.4  0.2  Iris-setosa     1.6
1    2    4.9  3.0  1.4  0.2  Iris-setosa     1.6
2    3    4.7  3.2  1.3  0.2  Iris-setosa     1.5
3    4    4.6  3.1  1.5  0.2  Iris-setosa     1.7
4    5    5.0  3.6  1.4  0.2  Iris-setosa     1.6
...
145  146  6.7  3.0  5.2  2.3  Iris-virginica  7.5
146  147  6.3  2.5  5.0  1.9  Iris-virginica  6.9
147  148  6.5  3.0  5.2  2.0  Iris-virginica  7.2
148  149  6.2  3.4  5.4  2.3  Iris-virginica  7.7
149  150  5.9  3.0  5.1  1.8  Iris-virginica  6.9
150 rows × 7 columns

Delete column TotalPetalCm from the dataset
In [87]:
del ds['TotalPetalCm']
ds

Out[87]:
     Id   SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
0    1    5.1  3.5  1.4  0.2  Iris-setosa
1    2    4.9  3.0  1.4  0.2  Iris-setosa
2    3    4.7  3.2  1.3  0.2  Iris-setosa
3    4    4.6  3.1  1.5  0.2  Iris-setosa
4    5    5.0  3.6  1.4  0.2  Iris-setosa
...
145  146  6.7  3.0  5.2  2.3  Iris-virginica
146  147  6.3  2.5  5.0  1.9  Iris-virginica
147  148  6.5  3.0  5.2  2.0  Iris-virginica
148  149  6.2  3.4  5.4  2.3  Iris-virginica
149  150  5.9  3.0  5.1  1.8  Iris-virginica
150 rows × 6 columns

Access the dataframe
In [88]:
ds.iloc[0:10]

Out[88]:
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
0  1   5.1  3.5  1.4  0.2  Iris-setosa
1  2   4.9  3.0  1.4  0.2  Iris-setosa
2  3   4.7  3.2  1.3  0.2  Iris-setosa
3  4   4.6  3.1  1.5  0.2  Iris-setosa
4  5   5.0  3.6  1.4  0.2  Iris-setosa
5  6   5.4  3.9  1.7  0.4  Iris-setosa
6  7   4.6  3.4  1.4  0.3  Iris-setosa
7  8   5.0  3.4  1.5  0.2  Iris-setosa
8  9   4.4  2.9  1.4  0.2  Iris-setosa
9  10  4.9  3.1  1.5  0.1  Iris-setosa

Delete the first row from the dataframe
In [89]:
ds.drop(0, inplace=True)
ds

Out[89]:
     Id   SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
1    2    4.9  3.0  1.4  0.2  Iris-setosa
2    3    4.7  3.2  1.3  0.2  Iris-setosa
3    4    4.6  3.1  1.5  0.2  Iris-setosa
4    5    5.0  3.6  1.4  0.2  Iris-setosa
5    6    5.4  3.9  1.7  0.4  Iris-setosa
...
145  146  6.7  3.0  5.2  2.3  Iris-virginica
146  147  6.3  2.5  5.0  1.9  Iris-virginica
147  148  6.5  3.0  5.2  2.0  Iris-virginica
148  149  6.2  3.4  5.4  2.3  Iris-virginica
149  150  5.9  3.0  5.1  1.8  Iris-virginica
149 rows × 6 columns

Delete nine more rows (positions 1 to 9) from the dataframe
In [90]:
ds.drop(ds.index[1:10],inplace=True)
ds

Out[90]:
     Id   SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
1    2    4.9  3.0  1.4  0.2  Iris-setosa
11   12   4.8  3.4  1.6  0.2  Iris-setosa
12   13   4.8  3.0  1.4  0.1  Iris-setosa
13   14   4.3  3.0  1.1  0.1  Iris-setosa
14   15   5.8  4.0  1.2  0.2  Iris-setosa
...
145  146  6.7  3.0  5.2  2.3  Iris-virginica
146  147  6.3  2.5  5.0  1.9  Iris-virginica
147  148  6.5  3.0  5.2  2.0  Iris-virginica
148  149  6.2  3.4  5.4  2.3  Iris-virginica
149  150  5.9  3.0  5.1  1.8  Iris-virginica
140 rows × 6 columns

Reindex the dataframe
In [91]:
ds.reset_index(inplace=True, drop=True)
ds

Out[91]:
     Id   SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
0    2    4.9  3.0  1.4  0.2  Iris-setosa
1    12   4.8  3.4  1.6  0.2  Iris-setosa
2    13   4.8  3.0  1.4  0.1  Iris-setosa
3    14   4.3  3.0  1.1  0.1  Iris-setosa
4    15   5.8  4.0  1.2  0.2  Iris-setosa
...
135  146  6.7  3.0  5.2  2.3  Iris-virginica
136  147  6.3  2.5  5.0  1.9  Iris-virginica
137  148  6.5  3.0  5.2  2.0  Iris-virginica
138  149  6.2  3.4  5.4  2.3  Iris-virginica
139  150  5.9  3.0  5.1  1.8  Iris-virginica
140 rows × 6 columns

Use the describe function
In [92]:
ds.describe()

Out[92]:
       Id          SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count  140.000000  140.000000     140.000000    140.000000     140.000000
mean   80.435714   5.910000       3.030714      3.922857       1.268571
std    40.676511   0.812696       0.432642      1.711465       0.741677
min    2.000000    4.300000       2.000000      1.000000       0.100000
25%    45.750000   5.200000       2.800000      1.675000       0.400000
50%    80.500000   5.850000       3.000000      4.500000       1.400000
75%    115.250000  6.425000       3.300000      5.100000       1.800000
max    150.000000  7.900000       4.400000      6.900000       2.500000

Describe the setosa flowers
NOTE: The cell below filters on "Iris-virginica", so the statistics shown are for virginica; filtering on "Iris-setosa" would give the setosa statistics.
In [93]:
ds[ds['Species']=="Iris-virginica"].describe()

Out[93]:
       Id         SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count  50.00000   50.00000       50.000000     50.000000      50.00000
mean   125.50000  6.58800        2.974000      5.552000       2.02600
std    14.57738   0.63588        0.322497      0.551895       0.27465
min    101.00000  4.90000        2.200000      4.500000       1.40000
25%    113.25000  6.22500        2.800000      5.100000       1.80000
50%    125.50000  6.50000        3.000000      5.550000       2.00000
75%    137.75000  6.90000        3.175000      5.875000       2.30000
max    150.00000  7.90000        3.800000      6.900000       2.50000

Describe the virginica flowers
In [94]:
ds[ds['Species']=="Iris-virginica"].describe()

Out[94]:
       Id         SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count  50.00000   50.00000       50.000000     50.000000      50.00000
mean   125.50000  6.58800        2.974000      5.552000       2.02600
std    14.57738   0.63588        0.322497      0.551895       0.27465
min    101.00000  4.90000        2.200000      4.500000       1.40000
25%    113.25000  6.22500        2.800000      5.100000       1.80000
50%    125.50000  6.50000        3.000000      5.550000       2.00000
75%    137.75000  6.90000        3.175000      5.875000       2.30000
max    150.00000  7.90000        3.800000      6.900000       2.50000

Describe the versicolor flowers
In [95]:
ds[ds['Species']=="Iris-versicolor"].describe()

Out[95]:
       Id         SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count  50.00000   50.000000      50.000000     50.000000      50.000000
mean   75.50000   5.936000       2.770000      4.260000       1.326000
std    14.57738   0.516171       0.313798      0.469911       0.197753
min    51.00000   4.900000       2.000000      3.000000       1.000000
25%    63.25000   5.600000       2.525000      4.000000       1.200000
50%    75.50000   5.900000       2.800000      4.350000       1.300000
75%    87.75000   6.300000       3.000000      4.600000       1.500000
max    100.00000  7.000000       3.400000      5.100000       1.800000

Sort the dataframe according to SepalLengthCm
In [96]:
ds.sort_values(by='SepalLengthCm')

Out[96]:
     Id   SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
3    14   4.3  3.0  1.1  0.1  Iris-setosa
32   43   4.4  3.2  1.3  0.2  Iris-setosa
28   39   4.4  3.0  1.3  0.2  Iris-setosa
31   42   4.5  2.3  1.3  0.3  Iris-setosa
37   48   4.6  3.2  1.4  0.2  Iris-setosa
...
112  123  7.7  2.8  6.7  2.0  Iris-virginica
108  119  7.7  2.6  6.9  2.3  Iris-virginica
125  136  7.7  3.0  6.1  2.3  Iris-virginica
107  118  7.7  3.8  6.7  2.2  Iris-virginica
121  132  7.9  3.8  6.4  2.0  Iris-virginica
140 rows × 6 columns

Check for null values in PetalLengthCm
In [97]:
ds['PetalLengthCm'].isnull().sum()

Out[97]:
0

Rename the SepalLengthCm column as SP(Cm)
In [98]:
ds.rename(columns={'SepalLengthCm':'SP(Cm)'},inplace=True)
ds

Out[98]:
     Id   SP(Cm)  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
0    2    4.9  3.0  1.4  0.2  Iris-setosa
1    12   4.8  3.4  1.6  0.2  Iris-setosa
2    13   4.8  3.0  1.4  0.1  Iris-setosa
3    14   4.3  3.0  1.1  0.1  Iris-setosa
4    15   5.8  4.0  1.2  0.2  Iris-setosa
...
135  146  6.7  3.0  5.2  2.3  Iris-virginica
136  147  6.3  2.5  5.0  1.9  Iris-virginica
137  148  6.5  3.0  5.2  2.0  Iris-virginica
138  149  6.2  3.4  5.4  2.3  Iris-virginica
139  150  5.9  3.0  5.1  1.8  Iris-virginica
140 rows × 6 columns

Rename the PetalLengthCm column as PL(Cm), SepalWidthCm as SW(Cm) and PetalWidthCm as PW(Cm)
In [99]:
ds.rename(columns={'PetalLengthCm':'PL(Cm)','SepalWidthCm':'SW(Cm)','PetalWidthCm':'PW(Cm)'},inplace=True)
ds

Out[99]:
     Id   SP(Cm)  SW(Cm)  PL(Cm)  PW(Cm)  Species
0    2    4.9  3.0  1.4  0.2  Iris-setosa
1    12   4.8  3.4  1.6  0.2  Iris-setosa
2    13   4.8  3.0  1.4  0.1  Iris-setosa
3    14   4.3  3.0  1.1  0.1  Iris-setosa
4    15   5.8  4.0  1.2  0.2  Iris-setosa
...
135  146  6.7  3.0  5.2  2.3  Iris-virginica
136  147  6.3  2.5  5.0  1.9  Iris-virginica
137  148  6.5  3.0  5.2  2.0  Iris-virginica
138  149  6.2  3.4  5.4  2.3  Iris-virginica
139  150  5.9  3.0  5.1  1.8  Iris-virginica
140 rows × 6 columns

Find and print the count of each kind of flower. Print the count as an integer value.
In [100]:
ds['Species'].value_counts()

Out[100]:
Iris-virginica     50
Iris-versicolor    50
Iris-setosa        40
Name: Species, dtype: int64

In [101]:
df=ds[ds['Species']=="Iris-setosa"]
print("Count of setosa flower: ",df['Id'].count())
df=ds[ds['Species']=="Iris-virginica"]
print("Count of virginica flower: ",df['Id'].count())
df=ds[ds['Species']=="Iris-versicolor"]
print("Count of versicolor flower: ",df['Id'].count())

Count of setosa flower:  40
Count of virginica flower:  50
Count of versicolor flower:  50

Find the data of flowers of type "Iris-virginica" where petal length > 1.5
In [102]:
df=ds[ds['Species']=="Iris-virginica"]
df[df['PL(Cm)']>1.5]

Out[102]:
     Id   SP(Cm)  SW(Cm)  PL(Cm)  PW(Cm)  Species
90   101  6.3  3.3  6.0  2.5  Iris-virginica
91   102  5.8  2.7  5.1  1.9  Iris-virginica
92   103  7.1  3.0  5.9  2.1  Iris-virginica
93   104  6.3  2.9  5.6  1.8  Iris-virginica
94   105  6.5  3.0  5.8  2.2  Iris-virginica
95   106  7.6  3.0  6.6  2.1  Iris-virginica
96   107  4.9  2.5  4.5  1.7  Iris-virginica
97   108  7.3  2.9  6.3  1.8  Iris-virginica
98   109  6.7  2.5  5.8  1.8  Iris-virginica
99   110  7.2  3.6  6.1  2.5  Iris-virginica
100  111  6.5  3.2  5.1  2.0  Iris-virginica
101  112  6.4  2.7  5.3  1.9  Iris-virginica
102  113  6.8  3.0  5.5  2.1  Iris-virginica
103  114  5.7  2.5  5.0  2.0  Iris-virginica
104  115  5.8  2.8  5.1  2.4  Iris-virginica
105  116  6.4  3.2  5.3  2.3  Iris-virginica
106  117  6.5  3.0  5.5  1.8  Iris-virginica
107  118  7.7  3.8  6.7  2.2  Iris-virginica
108  119  7.7  2.6  6.9  2.3  Iris-virginica
109  120  6.0  2.2  5.0  1.5  Iris-virginica
110  121  6.9  3.2  5.7  2.3  Iris-virginica
111  122  5.6  2.8  4.9  2.0  Iris-virginica
112  123  7.7  2.8  6.7  2.0  Iris-virginica
113  124  6.3  2.7  4.9  1.8  Iris-virginica
114  125  6.7  3.3  5.7  2.1  Iris-virginica
115  126  7.2  3.2  6.0  1.8  Iris-virginica
116  127  6.2  2.8  4.8  1.8  Iris-virginica
117  128  6.1  3.0  4.9  1.8  Iris-virginica
118  129  6.4  2.8  5.6  2.1  Iris-virginica
119  130  7.2  3.0  5.8  1.6  Iris-virginica
120  131  7.4  2.8  6.1  1.9  Iris-virginica
121  132  7.9  3.8  6.4  2.0  Iris-virginica
122  133  6.4  2.8  5.6  2.2  Iris-virginica
123  134  6.3  2.8  5.1  1.5  Iris-virginica
124  135  6.1  2.6  5.6  1.4  Iris-virginica
125  136  7.7  3.0  6.1  2.3  Iris-virginica
126  137  6.3  3.4  5.6  2.4  Iris-virginica
127  138  6.4  3.1  5.5  1.8  Iris-virginica
128  139  6.0  3.0  4.8  1.8  Iris-virginica
129  140  6.9  3.1  5.4  2.1  Iris-virginica
130  141  6.7  3.1  5.6  2.4  Iris-virginica
131  142  6.9  3.1  5.1  2.3  Iris-virginica
132  143  5.8  2.7  5.1  1.9  Iris-virginica
133  144  6.8  3.2  5.9  2.3  Iris-virginica
134  145  6.7  3.3  5.7  2.5  Iris-virginica
135  146  6.7  3.0  5.2  2.3  Iris-virginica
136  147  6.3  2.5  5.0  1.9  Iris-virginica
137  148  6.5  3.0  5.2  2.0  Iris-virginica
138  149  6.2  3.4  5.4  2.3  Iris-virginica
139  150  5.9  3.0  5.1  1.8  Iris-virginica

Find and print the minimum and maximum of each feature for each kind of flower
In [103]:
df=ds[ds['Species']=="Iris-setosa"]
print("Setosa")
print("The minimum value of: \n",df.min())
print("The maximum value of: \n",df.max())
print("************************************")
df=ds[ds['Species']=="Iris-virginica"]
print("Virginica")
print("The minimum value of: \n",df.min())
print("The maximum value of: \n",df.max())
print("************************************")
df=ds[ds['Species']=="Iris-versicolor"]
print("Versicolor")
print("The minimum value of: \n",df.min())
print("The maximum value of: \n",df.max())

Setosa
The minimum value of: 
 Id                   2
SP(Cm)             4.3
SW(Cm)             2.3
PL(Cm)             1.0
PW(Cm)             0.1
Species    Iris-setosa
dtype: object
The maximum value of: 
 Id                  50
SP(Cm)             5.8
SW(Cm)             4.4
PL(Cm)             1.9
PW(Cm)             0.6
Species    Iris-setosa
dtype: object
************************************
Virginica
The minimum value of: 
 Id                    101
SP(Cm)                4.9
SW(Cm)                2.2
PL(Cm)                4.5
PW(Cm)                1.4
Species    Iris-virginica
dtype: object
The maximum value of: 
 Id                    150
SP(Cm)                7.9
SW(Cm)                3.8
PL(Cm)                6.9
PW(Cm)                2.5
Species    Iris-virginica
dtype: object
************************************
Versicolor
The minimum value of: 
 Id                      51
SP(Cm)                 4.9
SW(Cm)                 2.0
PL(Cm)                 3.0
PW(Cm)                 1.0
Species    Iris-versicolor
dtype: object
The maximum value of: 
 Id                     100
SP(Cm)                 7.0
SW(Cm)                 3.4
PL(Cm)                 5.1
PW(Cm)                 1.8
Species    Iris-versicolor
dtype: object

Use group by on species
In [104]:
import pandas as pd
# raw string so the backslashes in the Windows path are not read as escapes
iris=pd.read_csv(r"D:\Python learning track and notes\Dataset\Iris.csv")
grouped_df=iris.groupby('Species')
for name,group in grouped_df:
    print("Name is: ",name)
    print("Group is:")
    print(group)

Name is:  Iris-setosa
Group is:
    Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
0   1   5.1  3.5  1.4  0.2  Iris-setosa
1   2   4.9  3.0  1.4  0.2  Iris-setosa
2   3   4.7  3.2  1.3  0.2  Iris-setosa
3   4   4.6  3.1  1.5  0.2  Iris-setosa
4   5   5.0  3.6  1.4  0.2  Iris-setosa
5   6   5.4  3.9  1.7  0.4  Iris-setosa
6   7   4.6  3.4  1.4  0.3  Iris-setosa
7   8   5.0  3.4  1.5  0.2  Iris-setosa
8   9   4.4  2.9  1.4  0.2  Iris-setosa
9   10  4.9  3.1  1.5  0.1  Iris-setosa
10  11  5.4  3.7  1.5  0.2  Iris-setosa
11  12  4.8  3.4  1.6  0.2  Iris-setosa
12  13  4.8  3.0  1.4  0.1  Iris-setosa
13  14  4.3  3.0  1.1  0.1  Iris-setosa
14  15  5.8  4.0  1.2  0.2  Iris-setosa
15  16  5.7  4.4  1.5  0.4  Iris-setosa
16  17  5.4  3.9  1.3  0.4  Iris-setosa
17  18  5.1  3.5  1.4  0.3  Iris-setosa
18  19  5.7  3.8  1.7  0.3  Iris-setosa
19  20  5.1  3.8  1.5  0.3  Iris-setosa
20  21  5.4  3.4  1.7  0.2  Iris-setosa
21  22  5.1  3.7  1.5  0.4  Iris-setosa
22  23  4.6  3.6  1.0  0.2  Iris-setosa
23  24  5.1  3.3  1.7  0.5  Iris-setosa
24  25  4.8  3.4  1.9  0.2  Iris-setosa
25  26  5.0  3.0  1.6  0.2  Iris-setosa
26  27  5.0  3.4  1.6  0.4  Iris-setosa
27  28  5.2  3.5  1.5  0.2  Iris-setosa
28  29  5.2  3.4  1.4  0.2  Iris-setosa
29  30  4.7  3.2  1.6  0.2  Iris-setosa
30  31  4.8  3.1  1.6  0.2  Iris-setosa
31  32  5.4  3.4  1.5  0.4  Iris-setosa
32  33  5.2  4.1  1.5  0.1  Iris-setosa
33  34  5.5  4.2  1.4  0.2  Iris-setosa
34  35  4.9  3.1  1.5  0.1  Iris-setosa
35  36  5.0  3.2  1.2  0.2  Iris-setosa
36  37  5.5  3.5  1.3  0.2  Iris-setosa
37  38  4.9  3.1  1.5  0.1  Iris-setosa
38  39  4.4  3.0  1.3  0.2  Iris-setosa
39  40  5.1  3.4  1.5  0.2  Iris-setosa
40  41  5.0  3.5  1.3  0.3  Iris-setosa
41  42  4.5  2.3  1.3  0.3  Iris-setosa
42  43  4.4  3.2  1.3  0.2  Iris-setosa
43  44  5.0  3.5  1.6  0.6  Iris-setosa
44  45  5.1  3.8  1.9  0.4  Iris-setosa
45  46  4.8  3.0  1.4  0.3  Iris-setosa
46  47  5.1  3.8  1.6  0.2  Iris-setosa
47  48  4.6  3.2  1.4  0.2  Iris-setosa
48  49  5.3  3.7  1.5  0.2  Iris-setosa
49  50  5.0  3.3  1.4  0.2  Iris-setosa
Name is:  Iris-versicolor
Group is:
    Id   SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
50  51   7.0  3.2  4.7  1.4  Iris-versicolor
51  52   6.4  3.2  4.5  1.5  Iris-versicolor
52  53   6.9  3.1  4.9  1.5  Iris-versicolor
53  54   5.5  2.3  4.0  1.3  Iris-versicolor
54  55   6.5  2.8  4.6  1.5  Iris-versicolor
55  56   5.7  2.8  4.5  1.3  Iris-versicolor
56  57   6.3  3.3  4.7  1.6  Iris-versicolor
57  58   4.9  2.4  3.3  1.0  Iris-versicolor
58  59   6.6  2.9  4.6  1.3  Iris-versicolor
59  60   5.2  2.7  3.9  1.4  Iris-versicolor
60  61   5.0  2.0  3.5  1.0  Iris-versicolor
61  62   5.9  3.0  4.2  1.5  Iris-versicolor
62  63   6.0  2.2  4.0  1.0  Iris-versicolor
63  64   6.1  2.9  4.7  1.4  Iris-versicolor
64  65   5.6  2.9  3.6  1.3  Iris-versicolor
65  66   6.7  3.1  4.4  1.4  Iris-versicolor
66  67   5.6  3.0  4.5  1.5  Iris-versicolor
67  68   5.8  2.7  4.1  1.0  Iris-versicolor
68  69   6.2  2.2  4.5  1.5  Iris-versicolor
69  70   5.6  2.5  3.9  1.1  Iris-versicolor
70  71   5.9  3.2  4.8  1.8  Iris-versicolor
71  72   6.1  2.8  4.0  1.3  Iris-versicolor
72  73   6.3  2.5  4.9  1.5  Iris-versicolor
73  74   6.1  2.8  4.7  1.2  Iris-versicolor
74  75   6.4  2.9  4.3  1.3  Iris-versicolor
75  76   6.6  3.0  4.4  1.4  Iris-versicolor
76  77   6.8  2.8  4.8  1.4  Iris-versicolor
77  78   6.7  3.0  5.0  1.7  Iris-versicolor
78  79   6.0  2.9  4.5  1.5  Iris-versicolor
79  80   5.7  2.6  3.5  1.0  Iris-versicolor
80  81   5.5  2.4  3.8  1.1  Iris-versicolor
81  82   5.5  2.4  3.7  1.0  Iris-versicolor
82  83   5.8  2.7  3.9  1.2  Iris-versicolor
83  84   6.0  2.7  5.1  1.6  Iris-versicolor
84  85   5.4  3.0  4.5  1.5  Iris-versicolor
85  86   6.0  3.4  4.5  1.6  Iris-versicolor
86  87   6.7  3.1  4.7  1.5  Iris-versicolor
87  88   6.3  2.3  4.4  1.3  Iris-versicolor
88  89   5.6  3.0  4.1  1.3  Iris-versicolor
89  90   5.5  2.5  4.0  1.3  Iris-versicolor
90  91   5.5  2.6  4.4  1.2  Iris-versicolor
91  92   6.1  3.0  4.6  1.4  Iris-versicolor
92  93   5.8  2.6  4.0  1.2  Iris-versicolor
93  94   5.0  2.3  3.3  1.0  Iris-versicolor
94  95   5.6  2.7  4.2  1.3  Iris-versicolor
95  96   5.7  3.0  4.2  1.2  Iris-versicolor
96  97   5.7  2.9  4.2  1.3  Iris-versicolor
97  98   6.2  2.9  4.3  1.3  Iris-versicolor
98  99   5.1  2.5  3.0  1.1  Iris-versicolor
99  100  5.7  2.8  4.1  1.3  Iris-versicolor
Name is:  Iris-virginica
Group is:
     Id   SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
100  101  6.3  3.3  6.0  2.5  Iris-virginica
101  102  5.8  2.7  5.1  1.9  Iris-virginica
102  103  7.1  3.0  5.9  2.1  Iris-virginica
103  104  6.3  2.9  5.6  1.8  Iris-virginica
104  105  6.5  3.0  5.8  2.2  Iris-virginica
105  106  7.6  3.0  6.6  2.1  Iris-virginica
106  107  4.9  2.5  4.5  1.7  Iris-virginica
107  108  7.3  2.9  6.3  1.8  Iris-virginica
108  109  6.7  2.5  5.8  1.8  Iris-virginica
109  110  7.2  3.6  6.1  2.5  Iris-virginica
110  111  6.5  3.2  5.1  2.0  Iris-virginica
111  112  6.4  2.7  5.3  1.9  Iris-virginica
112  113  6.8  3.0  5.5  2.1  Iris-virginica
113  114  5.7  2.5  5.0  2.0  Iris-virginica
114  115  5.8  2.8  5.1  2.4  Iris-virginica
115  116  6.4  3.2  5.3  2.3  Iris-virginica
116  117  6.5  3.0  5.5  1.8  Iris-virginica
117  118  7.7  3.8  6.7  2.2  Iris-virginica
118  119  7.7  2.6  6.9  2.3  Iris-virginica
119  120  6.0  2.2  5.0  1.5  Iris-virginica
120  121  6.9  3.2  5.7  2.3  Iris-virginica
121  122  5.6  2.8  4.9  2.0  Iris-virginica
122  123  7.7  2.8  6.7  2.0  Iris-virginica
123  124  6.3  2.7  4.9  1.8  Iris-virginica
124  125  6.7  3.3  5.7  2.1  Iris-virginica
125  126  7.2  3.2  6.0  1.8  Iris-virginica
126  127  6.2  2.8  4.8  1.8  Iris-virginica
127  128  6.1  3.0  4.9  1.8  Iris-virginica
128  129  6.4  2.8  5.6  2.1  Iris-virginica
129  130  7.2  3.0  5.8  1.6  Iris-virginica
130  131  7.4  2.8  6.1  1.9  Iris-virginica
131  132  7.9  3.8  6.4  2.0  Iris-virginica
132  133  6.4  2.8  5.6  2.2  Iris-virginica
133  134  6.3  2.8  5.1  1.5  Iris-virginica
134  135  6.1  2.6  5.6  1.4  Iris-virginica
135  136  7.7  3.0  6.1  2.3  Iris-virginica
136  137  6.3  3.4  5.6  2.4  Iris-virginica
137  138  6.4  3.1  5.5  1.8  Iris-virginica
138  139  6.0  3.0  4.8  1.8  Iris-virginica
139  140  6.9  3.1  5.4  2.1  Iris-virginica
140  141  6.7  3.1  5.6  2.4  Iris-virginica
141  142  6.9  3.1  5.1  2.3  Iris-virginica
142  143  5.8  2.7  5.1  1.9  Iris-virginica
143  144  6.8  3.2  5.9  2.3  Iris-virginica
144  145  6.7  3.3  5.7  2.5  Iris-virginica
145  146  6.7  3.0  5.2  2.3  Iris-virginica
146  147  6.3  2.5  5.0  1.9  Iris-virginica
147  148  6.5  3.0  5.2  2.0  Iris-virginica
148  149  6.2  3.4  5.4  2.3  Iris-virginica
149  150  5.9  3.0  5.1  1.8  Iris-virginica

In [105]:
grouped_df.get_group('Iris-versicolor')

Out[105]:
    Id   SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
50  51   7.0  3.2  4.7  1.4  Iris-versicolor
51  52   6.4  3.2  4.5  1.5  Iris-versicolor
52  53   6.9  3.1  4.9  1.5  Iris-versicolor
53  54   5.5  2.3  4.0  1.3  Iris-versicolor
54  55   6.5  2.8  4.6  1.5  Iris-versicolor
55  56   5.7  2.8  4.5  1.3  Iris-versicolor
56  57   6.3  3.3  4.7  1.6  Iris-versicolor
57  58   4.9  2.4  3.3  1.0  Iris-versicolor
58  59   6.6  2.9  4.6  1.3  Iris-versicolor
59  60   5.2  2.7  3.9  1.4  Iris-versicolor
60  61   5.0  2.0  3.5  1.0  Iris-versicolor
61  62   5.9  3.0  4.2  1.5  Iris-versicolor
62  63   6.0  2.2  4.0  1.0  Iris-versicolor
63  64   6.1  2.9  4.7  1.4  Iris-versicolor
64  65   5.6  2.9  3.6  1.3  Iris-versicolor
65  66   6.7  3.1  4.4  1.4  Iris-versicolor
66  67   5.6  3.0  4.5  1.5  Iris-versicolor
67  68   5.8  2.7  4.1  1.0  Iris-versicolor
68  69   6.2  2.2  4.5  1.5  Iris-versicolor
69  70   5.6  2.5  3.9  1.1  Iris-versicolor
70  71   5.9  3.2  4.8  1.8  Iris-versicolor
71  72   6.1  2.8  4.0  1.3  Iris-versicolor
72  73   6.3  2.5  4.9  1.5  Iris-versicolor
73  74   6.1  2.8  4.7  1.2  Iris-versicolor
74  75   6.4  2.9  4.3  1.3  Iris-versicolor
75  76   6.6  3.0  4.4  1.4  Iris-versicolor
76  77   6.8  2.8  4.8  1.4  Iris-versicolor
77  78   6.7  3.0  5.0  1.7  Iris-versicolor
78  79   6.0  2.9  4.5  1.5  Iris-versicolor
79  80   5.7  2.6  3.5  1.0  Iris-versicolor
80  81   5.5  2.4  3.8  1.1  Iris-versicolor
81  82   5.5  2.4  3.7  1.0  Iris-versicolor
82  83   5.8  2.7  3.9  1.2  Iris-versicolor
83  84   6.0  2.7  5.1  1.6  Iris-versicolor
84  85   5.4  3.0  4.5  1.5  Iris-versicolor
85  86   6.0  3.4  4.5  1.6  Iris-versicolor
86  87   6.7  3.1  4.7  1.5  Iris-versicolor
87  88   6.3  2.3  4.4  1.3  Iris-versicolor
88  89   5.6  3.0  4.1  1.3  Iris-versicolor
89  90   5.5  2.5  4.0  1.3  Iris-versicolor
90  91   5.5  2.6  4.4  1.2  Iris-versicolor
91  92   6.1  3.0  4.6  1.4  Iris-versicolor
92  93   5.8  2.6  4.0  1.2  Iris-versicolor
93  94   5.0  2.3  3.3  1.0  Iris-versicolor
94  95   5.6  2.7  4.2  1.3  Iris-versicolor
95  96   5.7  3.0  4.2  1.2  Iris-versicolor
96  97   5.7  2.9  4.2  1.3  Iris-versicolor
97  98   6.2  2.9  4.3  1.3  Iris-versicolor
98  99   5.1  2.5  3.0  1.1  Iris-versicolor
99  100  5.7  2.8  4.1  1.3  Iris-versicolor
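The per-species minimum and maximum computed above by filtering one species at a time can also be obtained from a single groupby call. The following is a minimal sketch, not the notebook author's code: the small `sample` frame is a hypothetical stand-in for the Iris CSV, with values copied from rows shown earlier, and only two of the measurement columns are included.

```python
import pandas as pd

# Hypothetical mini-sample in the shape of the Iris data used above;
# the values are copied from rows that appear earlier in this notebook.
sample = pd.DataFrame({
    "Species": ["Iris-setosa", "Iris-setosa",
                "Iris-versicolor", "Iris-versicolor",
                "Iris-virginica", "Iris-virginica"],
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "PetalLengthCm": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
})

# One groupby call gives min and max per species for every numeric column.
summary = sample.groupby("Species").agg(["min", "max"])
print(summary)

# A single value is read back with the (column, statistic) pair.
setosa_max_sepal = summary.loc["Iris-setosa", ("SepalLengthCm", "max")]
print(setosa_max_sepal)
```

The result has one row per species and a two-level column header, so adding a species or a statistic does not require another filtered copy of the frame.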

Posted: 20/10/2022, 13:50