Missing Value Problem
Overview
•
Background
•
Problems
•
General methods
•
Handling Missing Value
•
Imputation Method
Background
•
Human Error
•
No information available (possibly withheld for
privacy reasons)
Problems
•
Loss of efficiency
•
Complications in handling and analyzing the data
•
Bias resulting from differences between missing
and complete data
General Method
•
Sequential Method (Preprocessing Method)
•
Parallel Method
Missing values are taken into account within the
main knowledge-mining process
Handling Missing Value
•
Do Not Impute (parallel method)
•
Case Deletion / Row Ignoring
•
Fill in the missing value / Imputation Method
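As a minimal sketch of the second option, case deletion (row ignoring) simply drops every record that has at least one missing attribute. The encoding of a missing value as `None` and the helper name `delete_incomplete_rows` are assumptions made for illustration; they do not come from the slides.

```python
def delete_incomplete_rows(rows):
    """Case deletion: keep only the rows that have no missing attribute.

    Assumption: a missing value is encoded as None.
    """
    return [row for row in rows if all(value is not None for value in row)]


# Hypothetical toy data: (length, colour, label)
data = [
    (5.1, "red", "yes"),
    (None, "blue", "no"),   # numerical attribute missing
    (4.7, None, "yes"),     # symbolic attribute missing
    (6.0, "red", "no"),
]

complete = delete_incomplete_rows(data)
print(complete)
```

Half of the toy records are discarded here, which illustrates the loss of efficiency listed under Problems.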
Problems with Imputation Methods
•
Spurious new patterns can emerge when the wrong
imputation method is used
•
The natural patterns in the original data can be
biased
Imputation Method
•
Use a global constant
e.g. "unknown", "N/A", −∞
•
Global most common attribute value for symbolic
attributes, or global average (mean) value for
numerical attributes (MC)
•
Concept most common attribute value for
symbolic attributes, or concept average (mean) value
for numerical attributes (CMC), computed only over
instances of the same concept (class)
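The MC and CMC rules can be sketched for a single attribute column as follows. This is an illustrative sketch, not the slides' own code: the names `fill_value`, `impute_mc` and `impute_cmc` are hypothetical, missing values are assumed to be encoded as `None`, and every column (and every class, for CMC) is assumed to contain at least one known value.

```python
from collections import Counter
from statistics import mean


def fill_value(values):
    """Mean for a numerical column, most common value for a symbolic one."""
    known = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in known):
        return mean(known)
    return Counter(known).most_common(1)[0][0]


def impute_mc(column):
    """MC: one fill value computed over the whole column."""
    fill = fill_value(column)
    return [fill if v is None else v for v in column]


def impute_cmc(column, labels):
    """CMC: the fill value is computed only over rows of the same class."""
    fills = {
        label: fill_value([v for v, l in zip(column, labels) if l == label])
        for label in set(labels)
    }
    return [fills[l] if v is None else v for v, l in zip(column, labels)]


# Hypothetical toy column with two classes.
ages = [20, 30, None, 50, 60, None]
classes = ["a", "a", "a", "b", "b", "b"]

print(impute_mc(ages))            # both gaps get the global mean
print(impute_cmc(ages, classes))  # each gap gets the mean of its own class
```

In this toy column MC fills both gaps with the global mean, while CMC fills each gap with the mean of its own class, so the CMC values follow the class structure more closely.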
Imputation Method (2)
•
Well-known data mining algorithms that predict
the most probable value
e.g. KNN, K-Means, Fuzzy K-Means, SVM,
Regularized EM, Singular Value Decomposition,
Bayesian Principal Component Analysis, Local
Least Squares, etc.
KNN Imputation Approach
•
Compute the k nearest neighbours and impute a value
from them.
•
For nominal values, use the most common value among
all neighbours
•
For numerical values use the average value.
•
This requires a proximity measure between instances.
The Euclidean distance (the p = 2 case of the Lp norm
distance) is the usual choice.
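The KNN approach above can be sketched in plain Python. Several choices here are assumptions made for the sketch rather than part of the slides: missing values are encoded as `None`, the Euclidean distance is computed only over the numerical attributes known in both instances, and only instances that know the attribute being filled are eligible as neighbours. The function names are hypothetical.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance over numerical attributes known in both rows."""
    shared = [
        (x, y) for x, y in zip(a, b)
        if isinstance(x, (int, float)) and isinstance(y, (int, float))
    ]
    if not shared:
        return math.inf  # nothing to compare: treat as infinitely far
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute(rows, k=2):
    """Fill each None from the k nearest rows that know that attribute."""
    result = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [r for n, r in enumerate(rows) if n != i and r[j] is not None]
            donors.sort(key=lambda r: distance(row, r))
            neighbours = [r[j] for r in donors[:k]]
            if not neighbours:
                continue  # no donor available: leave the value missing
            if all(isinstance(v, (int, float)) for v in neighbours):
                # numerical attribute: average of the neighbours
                filled[j] = sum(neighbours) / len(neighbours)
            else:
                # nominal attribute: most common value among the neighbours
                filled[j] = Counter(neighbours).most_common(1)[0][0]
        result.append(filled)
    return result


# Hypothetical toy data: two numerical attributes and one nominal attribute.
rows = [
    [1.0, 2.0, "a"],
    [1.1, None, "a"],
    [5.0, 6.0, "b"],
    [1.2, 2.2, None],
    [5.2, 6.4, "b"],
]
for filled_row in knn_impute(rows, k=2):
    print(filled_row)
```

Distances computed over different numbers of shared attributes are not normalized in this sketch, so a real implementation would probably rescale them.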
... represents the mean value of the objects in the cluster
K-means Clustering Imputation (2)
•
Once the clusters have converged, the last step is to fill in all the
non-reference attributes for each incomplete object based on the
cluster information
•
Data objects that belong to the same cluster are taken as nearest
neighbours of each other
•
Apply a nearest neighbour algorithm to replace missing data, ...
... fill in missing data
•
First, we select the examples in which there are no missing
attribute values
•
Next, we set one of the condition attributes (input attributes), some
of whose values are missing, as the decision attribute (output
attribute), and conversely use the decision attributes as condition
attributes
•
Finally, we use SVM regression to predict the decision attribute values
Event...
... a minimum loss of information criterion
•
Treating a mixed-mode feature n-tuple as a discrete-valued one, the
authors propose a new statistical approach for synthesis of knowledge
based on cluster analysis
•
As its main advantage, this method requires neither scale
normalization nor ordering of discrete values
Event Covering Approach (2)
•
By synthesis of the data into statistical knowledge, they ...
[...]... revised in three steps
•
First, for each record with missing values, the regression parameters
of the variables with missing values on the variables with available
values are computed from the estimates of the mean and of the
covariance matrix
•
Second, the missing values in a record are filled in with their
conditional expectation values given the available values and the
estimates of the mean and of the covariance matrix, the conditional
expectation values being the product of the available values and the
estimated regression coefficients
Regularized Expectation-Maximization (2)
•
Third, the mean and the covariance matrix are re-estimated, the mean
as the sample mean ...
... cycles through these steps until the imputed values and the
estimates of the mean and of the covariance matrix stop changing
appreciably from one iteration to the next
Singular Value Decomposition Imputation
•
Employ singular value decomposition to obtain a set of mutually
orthogonal expression patterns that can be linearly combined to
approximate the values of all attributes in the data set
•
In ...
... the missing values (MVs) with the EM algorithm, and then we compute
the Singular Value Decomposition and obtain the eigenvalues
•
Now we can use the eigenvalues to apply a regression over the complete
attributes of the instance, to obtain an estimation of the MV itself
Bayesian Principal Component Analysis
•
This method is an estimation method for missing values, which is based
on Bayesian principal component analysis ...
... simultaneously within the framework of Bayes inference is not new in
principle, actual BPCA implementation that makes it possible to
estimate arbitrary missing variables is new in terms of statistical
methodology
Bayesian Principal Component Analysis (2)
•
The missing value estimation method based on BPCA consists of three
elementary processes
•
Three elementary processes:
1. Principal component (PC) regression
2. Bayesian estimation
3. An expectation-maximization (EM)-like repetitive algorithm
Local Least Squares Imputation
•
With this method, a target instance that has missing values is
represented as a linear combination of similar instances
•
Rather than using all available genes in the data, only similar genes
based on a similarity measure are used
... the method has the ...
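The iteration described in the Regularized Expectation-Maximization fragments above has four parts: compute regression parameters from the mean and covariance estimates, fill in conditional expectations, re-estimate, and repeat until the values stop changing. It can be illustrated with a deliberately simplified two-variable sketch. The assumptions are substantial and are not taken from the slides: only `y` has missing values (encoded as `None`), `x` is complete and not constant, the regularization itself is omitted, and the covariance is re-estimated directly from the completed data without the residual-covariance correction that a full EM algorithm applies. The name `em_style_impute` is hypothetical.

```python
from statistics import fmean


def em_style_impute(x, y, tol=1e-10, max_iter=500):
    """Simplified EM-style imputation of y from x (x has no missing values)."""
    missing = [i for i, v in enumerate(y) if v is None]
    observed = [v for v in y if v is not None]
    # Initial guess: fill every gap with the mean of the observed values.
    filled = [fmean(observed) if v is None else v for v in y]

    for _ in range(max_iter):
        # Estimates of the mean and of the (co)variance from the completed data.
        mean_x, mean_y = fmean(x), fmean(filled)
        var_x = fmean([(a - mean_x) ** 2 for a in x])
        cov_xy = fmean([(a - mean_x) * (b - mean_y) for a, b in zip(x, filled)])

        # Step 1: regression parameter of y on x from those estimates.
        slope = cov_xy / var_x

        # Step 2: replace each missing y by its conditional expectation given x.
        change = 0.0
        for i in missing:
            new_value = mean_y + slope * (x[i] - mean_x)
            change = max(change, abs(new_value - filled[i]))
            filled[i] = new_value

        # Step 3 (re-estimation) happens at the top of the next pass;
        # stop once the imputed values no longer change appreciably.
        if change < tol:
            break
    return filled


# Hypothetical toy data: the observed points lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, None, 9, None]
print(em_style_impute(xs, ys))
```

Because the observed points in the toy data lie exactly on a line, the imputed values settle on that same line (approximately 7 and 11), which makes it easy to check the iteration by hand.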
Date posted: 28/03/2014, 23:20