Received March 20, 2019, accepted April 7, 2019, date of publication April 11, 2019, date of current version April 24, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2910406

A Semi-Supervised CNN With Fuzzy Rough C-Mean for Image Classification

SAMAN RIAZ 1,2, ALI ARSHAD 1,2, AND LICHENG JIAO 3 (Fellow, IEEE)
1 School of Computer Science and Technology, Xidian University, Xi’an 710071, China
2 School of International Education, Xidian University, Xi’an 710071, China
3 Key Laboratory of Intelligent Perception and Image Understanding, International Joint Collaboration Laboratory of Intelligent Perception and Computation, International Research Center of Intelligent Perception and Computation, School of Artificial Intelligence, Ministry of Education, Xidian University, Xi’an 710071, China

Corresponding author: Saman Riaz (samanriaz@hotmail.com)

This work was supported in part by the National Basic Research Program (973 Program) of China under Grant 2013CB329402; in part by the National Natural Science Foundation of China under Grant 61573267, Grant 61473215, Grant 61571342, Grant 61572383, Grant 61501353, Grant 61502369, Grant 61271302, Grant 61272282, and Grant 61202176; in part by the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project) under Grant B07048; and in part by the Major Research Plan of the National Natural Science Foundation of China under Grant 91438201 and Grant 91438103.

ABSTRACT Deep learning (DL) has proven to be a powerful paradigm for the classification of large-scale image data, but deep networks such as CNNs require a large number of labeled samples for training, and labeled data are often difficult, expensive, and time-consuming to obtain. In this paper, we propose a semi-supervised approach that fuses fuzzy-rough C-mean clustering with convolutional neural networks (CNNs) to learn knowledge simultaneously from intra-model and inter-model relationships, forming the final data
representation to be classified, which yields better performance. The idea behind this is to reduce uncertainty, in terms of vagueness and indiscernibility, by using fuzzy-rough C-mean clustering, and to remove noisy samples from the raw data by using the CNN. The framework of our proposed semi-supervised approach uses all the training data, i.e., abundant unlabeled data together with a few labeled data, to train the FRCNN model. To show the effectiveness of our model, we used four benchmark large-scale image datasets and compared it with state-of-the-art supervised, unsupervised, and semi-supervised learning methods for image classification.

INDEX TERMS Semi-supervised learning, convolutional neural network, fuzzy rough C-mean clustering, feature learning, image classification.

I. INTRODUCTION

Representation learning in machine learning is a set of techniques for learning a transformation of raw input data into a representation that can be effectively exploited in machine learning tasks such as clustering and classification for big data [1]. We have entered an age of big data: every day, millions of data items are generated in all kinds of industrial and scientific fields around the world. However, as more and more data are generated, another critical issue is the presence of a high amount of uncertainty, vagueness, and noise in the data [2], [66]–[68]. Hence, feature-ambiguity issues have become a huge challenge for the classification task. To overcome these issues, fuzzy learning [3]–[5] and rough sets [6]–[8] have been introduced to delineate uncertainties in raw data [9]–[14].

The associate editor coordinating the review of this manuscript and approving it for publication was Guitao Cao.
VOLUME 7, 2019

After the successful application of fuzzy learning and rough sets to many practical problems [15]–[18], [19]–[23], the combination of fuzzy and rough sets provides a paradigm shift in dealing
and reasoning with uncertainty and inconsistency [1], [24]. Fuzzy rough set theory [25], [26] covers two different types of uncertainty, namely vagueness (fuzziness) and indiscernibility (roughness). Various clustering methods based on fuzzy set theory, rough set theory, and their combination have been introduced to expose uncertainties in raw data, such as Fuzzy C-Mean clustering (FCM) [27], [28], Rough C-Mean clustering (RCM) [7], [8], hybrid rough-fuzzy clustering [9], generalized rough-fuzzy possibilistic C-Means (RFPCM) [29], and Fuzzy-Rough C-Mean clustering (FRCM) [30]. FRCM incorporates rough set theory into fuzzy set theory: each cluster is represented by a center, a crisp lower approximation, and a fuzzy boundary, and the new center is a weighted average of the boundary and the lower approximation.

2169-3536 © 2019 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Deep learning has become an immensely active area of machine learning [31] for many computer vision [32] and language processing [33] tasks. In particular, convolutional neural network (CNN) based models [34]–[37] have achieved top performance on image classification tasks since AlexNet [32] was successfully applied to obtain the most accurate results on the ImageNet dataset [33]. CNN models are intensive models that permit large-scale, task-driven features to be learned from big data. These improvements are obtained with supervised learning, which requires a large amount of labeled data for training. However, labeled data are often difficult, expensive, or time-consuming to obtain [38]. To address this issue, semi-supervised learning has attracted growing attention [39]; it uses a large amount of
unlabeled data together with labeled data to build better learners. Recently, several semi-supervised deep learning approaches have been proposed for image classification [19]–[21], [65], [40]–[42], [69], [70]. In practice, their advantages are mainly summarized as follows: 1) semi-supervised learning, 2) noise reduction, and 3) task-driven feature learning. In semi-supervised learning, labeled and unlabeled data are handled simultaneously during the training process, so that the hidden information in the unlabeled data is exploited alongside the labeled data to build better learners. The noise-reduction property is apparent in that, as information is transformed from layer to layer, the contamination in the raw data is sequentially decreased and removed. Meanwhile, task-driven feature learning allows knowledge to be sequentially propagated from the learning layer (e.g., the classification layer) back to the data-representation layers, which provides a more intelligent way to automatically discover informative features from data.

It is common knowledge that fuzzy logic and neural network systems both aim to exploit human-like knowledge-processing capability and are effective methods for data representation. There have been a few early attempts that successfully combined fuzzy logic with neural networks for image classification. Our literature review shows that no one has yet added fuzzy and rough factors into CNN models with semi-supervised learning to guarantee noise insensitivity and image-detail preservation and thereby improve representation ability and robustness. Existing approaches mostly follow a sequential learning framework [43]–[45], in which the original input data are first transformed into fuzzy degrees and deep learning then builds the representation at the output layer for image classification, or vice versa. Only a few researchers have addressed this challenge by proposing a joint learning framework that could more intelligently fuse
the respective learning views altogether [1]. However, these frameworks are based on supervised learning. To address this problem, we propose a semi-supervised approach that fuses Fuzzy-Rough C-Mean clustering with a Convolutional Neural Network (FRCNN), in which labeled and unlabeled data simultaneously contribute information through the fuzzy-rough C-mean clustering and the CNN representations. In particular, the fuzzy-rough representation reduces uncertainty, in terms of vagueness and indiscernibility, by using unsupervised Fuzzy-Rough C-Mean clustering, while the neural representation reduces the noise in the original data by using a supervised CNN architecture. For data classification, these two representations are fused into a semi-supervised representation in the fusion layer to form the final data representation to be classified. The motivation of the proposed work is to handle difficult pattern-classification tasks on semi-supervised data with high levels of uncertainty and noise. The main novelties and contributions of this paper are as follows:

1) Semi-supervised learning: a novel semi-supervised Fuzzy Rough Convolutional Neural Network (SSFRCNN) approach that combines unsupervised Fuzzy-Rough C-Mean clustering and a supervised neural network to train the proposed semi-supervised CNN.
2) Supervised and unsupervised representation: for the unsupervised representation, Fuzzy-Rough C-Mean clustering is used for feature extraction by reducing the (vagueness and indiscernibility) uncertainty in the raw data; the supervised representation is learned with a CNN by removing noise from the raw data.
3) Semi-supervised representation: in the fusion layer of the neural network, the knowledge learned from intra-model and inter-model relationships (between the supervised and unsupervised models) is fused into the final representation for data classification, which achieves better performance.
4) Verified results: the performance of the proposed SSFRCNN approach is verified on the four benchmark image datasets ImageNet (ILSVRC-12) [46], MNIST [47],
CIFAR-10 [48], and Scene-15 [49], which contain high levels of noise and uncertainty. SSFRCNN performs much better than other state-of-the-art supervised, unsupervised, and semi-supervised deep learning approaches.

The remainder of the paper is organized as follows. In Section II, we briefly review recent advances in Fuzzy-Rough C-Mean clustering and deep learning approaches. In Section III, we introduce the newly proposed semi-supervised FRCNN algorithm. Section IV describes the experimental results and analysis, and Section V concludes the paper.

II. RELATED WORK

A. FUZZY-ROUGH C-MEAN CLUSTERING (FRCM)

Clustering is an unsupervised learning approach that separates data into different groups based on their similarities. Fuzzy C-Mean clustering (FCM) [50] and Rough C-Mean clustering (RCM) [7], [8] are two main mathematical tools of granular computing [51] for exposing uncertainties, in terms of vagueness (fuzziness) and indiscernibility (roughness), in raw data. Fuzzy-Rough C-Mean clustering (FRCM) integrates FCM [50] and RCM [7], [8] on the original input data. FRCM was proposed by Hu and Yu [52]; it incorporates the fuzzy membership value of each sample into the lower approximation and the boundary region of a cluster.

Let $X = \{a_1, a_2, \dots, a_n\} \subset \mathbb{R}^d$ be a set of image data, where $d$ is the dimension and $n$ is the number of data points. Let $C_j$ $(j = 1, 2, \dots, k)$ be the $k$ clusters, each regarded as a rough set, characterized by three regions: the lower approximation $\underline{C}_j$, the upper approximation $\overline{C}_j$, and the boundary region $C_j^B = \overline{C}_j - \underline{C}_j$. Let $V = \{v_1, v_2, \dots, v_k\}$ be the set of $k$ cluster centers. Samples in the lower approximation belong to a cluster categorically; however, samples that lie in the boundary belong to a cluster only to some extent and have
a diverse effect on the centers and clusters, so different weighting values ought to be imposed on the boundary samples when computing the new centers. Let $W = [W_{ij}]_{n \times k}$ be the membership matrix. The membership function of FRCM is defined as

$$W_{ij} = \frac{1}{\sum_{r=1}^{k} \left( d_{ij}/d_{ir} \right)^{2/(m-1)}}, \quad i = 1, \dots, n, \; j = 1, \dots, k, \tag{1}$$

where $d_{ij} = \operatorname{dist}(a_i, v_j)$. The exponent $m > 1$ is used to adjust the weighting effect of the membership values. The cluster centers are updated as

$$v_j^{\,l} = \frac{\sum_{i=1}^{n} \big( W_{ij}^{\,l-1} \big)^m a_i}{\sum_{i=1}^{n} \big( W_{ij}^{\,l-1} \big)^m}, \quad j = 1, \dots, k. \tag{2}$$

The objective function of FRCM is

$$J_m^{\,l}(W, V) = \sum_{i=1}^{n} \sum_{j=1}^{k} W_{ij}^{\,m} \, \lVert a_i - v_j \rVert^2. \tag{3}$$

B. DEEP LEARNING APPROACHES

Deep learning is a supervised learning paradigm that has achieved great success in large-scale image-data clustering and classification [1], [24], [40], [53] for big data. However, deep learning models do not address the uncertainties in the raw data, which affects accuracy. Therefore, numerous fuzzy- or rough-set-based deep learning approaches have been introduced to handle this uncertainty problem for clustering and classification [1], [24], [43]–[45], [53], [54]. Deng et al. [1] proposed a hierarchical approach that fuses fuzzy logic and a neural network to learn feature representations simultaneously for robust data classification; it reduces uncertainty in terms of vagueness (fuzziness, via the fuzzy logic) and removes noise from the raw data via the neural network. Yaganejou and Dick [53] also proposed a fuzzy-based CNN model for data classification: they first applied a CNN for automated feature extraction from the input images and then used the Fuzzy C-Mean algorithm to cluster the image data in the derived feature space. Rajesh and Malar [54] proposed a rough-set-based neural network for classification: they first extract features from the input images using rough set theory (RST), and the selected features are then given as input to a feed-forward neural network for classification.

Algorithm 1 Fuzzy-Rough C-Mean (FRCM)
Input: unlabeled data $X$, number of clusters $k$, threshold parameter $T$, exponent index $m$, stop criterion $\varepsilon$.
Output: membership matrix $W_{ij}$, $k$ cluster centers.
Begin
1. Let $l = 1$; initialize the cluster centers $v_j^{\,l-1}$, $j = 1, 2, \dots, k$, by random sampling.
2. Assign the data samples to the approximations:
(i) For a given data point $a_i$, find the closest center $v_h^{\,l-1}$, i.e., $d_{ih}^{\,l-1} = \operatorname{dist}(a_i, v_h^{\,l-1}) = \min_{1 \le j \le k} \operatorname{dist}(a_i, v_j^{\,l-1})$, and form the set
$$A = \big\{ r \mid r = 1, 2, \dots, k, \; r \ne h, \; d_{ir}^{\,l-1} - d_{ih}^{\,l-1} \le T \big\}. \tag{4}$$
(ii) If $A \ne \emptyset$, then $a_i \in \overline{C}_h$, $a_i \in \overline{C}_r$ for all $r \in A$, and $a_i$ belongs to no lower approximation; if $A = \emptyset$, then $a_i \in \underline{C}_h$ and $a_i \in \overline{C}_h$.
3. Calculate the FRCM membership values using Eq. (1).
4. Calculate the updated cluster centers using Eq. (2).
5. Stop if converged; otherwise set $l = l + 1$ and go to step 2.
End

All the above approaches handle the uncertainty in raw data. However, they are all supervised learning approaches, which require a large amount of labeled data for training; the drawbacks are expense and time consumption, whereas our goal is a semi-supervised deep learning approach for data classification. Many semi-supervised deep learning approaches have been proposed to handle the problem of limited labeled data for clustering and classification [40]–[42], [43]. Wu and Prasad [40] proposed a semi-supervised deep learning approach using pseudo-labels for image classification: labels for the unlabeled data are first predicted by a C-DPMM-based clustering algorithm, and pre-training a neural network with these predicted labels helps to extract discriminative features; fine-tuning the network then further adjusts the features from the pre-trained model to make them more beneficial for classification. Tarvainen et al. [70] proposed the semi-supervised Mean Teacher model, which averages model weights to form a target-generating teacher model; unlike Temporal Ensembling, Mean Teacher works with large datasets and online learning.
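As a concrete illustration, Algorithm 1 can be sketched in Python with NumPy. This is a minimal sketch rather than the authors' implementation: the rough test of Eq. (4) is returned as a boundary mask, the center update uses the plain fuzzily weighted mean of Eq. (2), and the function name, the optional `V0` initial centers, and all default values are our own assumptions.

```python
import numpy as np

def frcm(X, k, m=2.0, T=0.1, eps=1e-2, max_iter=100, V0=None, seed=0):
    """Sketch of Algorithm 1 (FRCM). X: (n, d) data; k: clusters;
    m: fuzzifier (m > 1); T: rough threshold of Eq. (4); eps: stop criterion."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    V = X[rng.choice(n, size=k, replace=False)] if V0 is None else np.asarray(V0, float)
    for _ in range(max_iter):
        # d_ij = dist(a_i, v_j); a small constant avoids division by zero
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        # Eq. (1): W_ij = 1 / sum_r (d_ij / d_ir)^(2/(m-1))
        W = 1.0 / np.sum((D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1)), axis=2)
        # Eq. (4): a sample lies in the boundary region if a second center
        # is within T of its closest center; otherwise it belongs to the
        # lower approximation of the closest cluster
        gap = D - D.min(axis=1, keepdims=True)
        in_boundary = np.sum(gap <= T, axis=1) > 1
        # Eq. (2): center update as the fuzzily weighted mean
        Wm = W ** m
        V_new = (Wm.T @ X) / Wm.sum(axis=0)[:, None]
        converged = np.max(np.linalg.norm(V_new - V, axis=1)) < eps
        V = V_new
        if converged:
            break
    return W, V, in_boundary
```

In the full algorithm, boundary and lower-approximation samples would be weighted differently when recomputing the centers; the paper's Eq. (2) shows the plain fuzzily weighted mean, which is what this sketch reproduces.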
Laine and Aila [69] also proposed an efficient method for training deep neural networks in a semi-supervised way: they introduced self-ensembling, which forms a consensus prediction of the unknown labels from the outputs of the network-in-training on different epochs. Both of the above methods achieved remarkable results on SVHN, CIFAR-10, and CIFAR-100; however, they do not achieve good results on datasets that contain high uncertainty and noisy samples. Shi et al. [41] proposed a transductive semi-supervised deep learning method using min-max features to overcome unreliable label estimates on outliers and uncertain samples when learning features for classification. Zhou et al. [43] proposed a semi-supervised fuzzy deep belief network for classification, in which the DBN model is first trained by semi-supervised learning and a fuzzy membership function is then designed for each class of reviews based on the learned deep architecture. To date, no one has used the combined view of fuzzy and rough logic with a CNN model. The main motivation of our study is to reduce the uncertainties, in terms of vagueness (fuzziness) and indiscernibility (roughness), and to remove the noise in raw datasets.

III. A SEMI-SUPERVISED FUZZY ROUGH CONVOLUTIONAL NEURAL NETWORK (SSFRCNN)

A. PRELIMINARIES

Let $L = \{(a_i, y_i)\}_{i=1}^{m}$ denote the labeled training set and $U = \{a_i\}_{i=m+1}^{n}$ the unlabeled training set, so that $n$ is the total number of training samples, of which $m$ are labeled and $n - m$ are unlabeled.

B. METHODOLOGY

In this section, we introduce a semi-supervised representation-learning approach for image-data classification. The Semi-Supervised Fuzzy Rough Convolutional Neural Network (SSFRCNN) framework, shown in Figure 1, consists of four parts: (1) unsupervised representation learning, (2) supervised representation learning, (3) semi-supervised representation learning (the fusion part), and (4) the task-driven part. In this framework, the semi-supervised input data follow two paths to produce the unsupervised and supervised representations: (A) Fuzzy-Rough C-Mean clustering is used for unsupervised representation learning on the basis of the unlabeled data, and (B) a CNN architecture is used for supervised representation learning on the basis of the labeled data. The semi-supervised representation is then learned from these two views, which are fused in the fusion part. The task-driven part is the classification layer, a softmax function that assigns the semi-supervised representation to its corresponding category. Further details of each part are given below.

FIGURE 1. Flowchart of the proposed semi-supervised FRCNN model.

1) UNSUPERVISED REPRESENTATION LEARNING

Lately, machine learning has focused on learning better feature representations from unsupervised data for higher-level tasks such as classification. A cardinal drawback of many feature-learning systems is their complexity and expense. We propose a novel Fuzzy-Rough C-Mean based feature-learning technique, which is highly useful for representation learning on high-dimensional datasets containing high levels of uncertainty. We learn the representation on the basis of FRCM clustering [50]: we apply FRCM (Algorithm 1) to learn $k$ centroids from the unlabeled dataset $U$. Given the learned centroids $v_k$, we choose the non-linear mapping

$$f_k^{\,l}(a_i) = \max\!\big(0, \; \mu(z^l) - z_k^{\,l}\big), \quad i = m+1, \dots, n, \tag{5}$$

where $a_i \in U$ is the input data, $z_k^{\,l} = \lVert a_i^{\,l} - v_k^{\,l} \rVert$, and $\mu(z^l)$ is the mean of the elements of $z^l$. The feature $f_k$ outputs 0 whenever the distance to the centroid $v_k$ is above average; in practice, this means that roughly half of the features will be set to zero.

2) SUPERVISED REPRESENTATION LEARNING

For the supervised representation, we use AlexNet [32] and LeNet-5 [61] to extract valuable knowledge for enhancing the high-level representation. AlexNet consists of five convolutional layers and three fully connected layers; each node on the $l$th layer is connected to all the nodes on the $(l-1)$th layer with parameters $\theta^l = \{w^l, b^l\}$, i.e.,

$$f_i^{\,l} = \frac{1}{1 + e^{-a_i^{\,l}}}, \quad a_i^{\,l} = w_i^{\,l} f^{(l-1)} + b_i^{\,l}, \quad i = 1, \dots, m, \tag{6}$$

where $w_i^{\,l}$ and $b_i^{\,l}$ are the weights and bias connecting node $i$ on the $l$th layer. Initially, $a_i$ is the input data from the labeled set, $a_i \in L$.

3) SEMI-SUPERVISED REPRESENTATION LEARNING

The semi-supervised representation combines the unsupervised and the supervised representations, inspired by the multi-modal learning of Ngiam et al. [17]: a single view is not enough to capture complex, high-resolution image data. In the semi-supervised Fuzzy Rough C-Mean CNN (SSFRCNN), we simultaneously exploit the unsupervised representation from FRCM and the supervised representation from the CNN architecture to obtain a better representation; this is achieved by reducing the high level of uncertainty and noise in the original input data. In this paper, we combine the unsupervised and supervised representations with densely connected fusion layers:

$$f_i^{\,l} = \frac{1}{1 + e^{-a_i^{\,l}}}, \quad a_i^{\,l} = \sum_j (w_D)_i^{\,l} (f_j)^{l-1} + \sum_k (w_{FR})_i^{\,l} (f_k)^{l-1} + b_i^{\,l}, \quad i = 1, \dots, n. \tag{7}$$

In Eq. (7), $f_k$ is the output of the unsupervised representation and $f_j$ is the output of the CNN architecture; the two representations are fused with the weights $w_D$ and $w_{FR}$, respectively.

4) TASK-DRIVEN PART

The task-driven part is the final classification layer, which assigns the semi-supervised representation to its corresponding category using a softmax function to compute the predictive probabilities of all categories:

$$\hat{y}_{ik} = \frac{\exp\!\big( w_k \, \pi(x_i) + b_k \big)}{\sum_{k'} \exp\!\big( w_{k'} \, \pi(x_i) + b_{k'} \big)}, \tag{8}$$

where $(x_i, y_i)$, equivalently $(a_i, f(a_i))$, denotes the $i$th labeled input and its corresponding label, $\pi(x_i)$ is the feed-forward transformation of SSFRCNN from the input layer to the final classification layer, and $w_k$ and $b_k$ are the weight vector and bias of category $k$. The loss takes the form of an average over the losses of all training samples:

$$\mathcal{L}(\theta; z_L, z_U) = \lambda L_L(z_L) + \gamma L_U(z_U) = \frac{\lambda}{m} \sum_{i=1}^{m} \lVert \hat{y}_i - y_i \rVert^2 + \frac{\gamma}{n-m} \sum_{i=m+1}^{n} \lVert \hat{y}_i - \bar{y}_i \rVert^2, \tag{9}$$

where $\hat{y}_i$ is the label predicted by the SSFRCNN and $\bar{y}_i$ is the label predicted for the unlabeled data by the FRCM clustering algorithm.
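To make the data flow concrete, the forward pass of Eqs. (5) through (8) can be sketched with NumPy. This is a schematic under our own naming and shape assumptions, not the trained network: `triangle_features` implements the mapping of Eq. (5), `fusion_forward` the dense fusion of Eq. (7), and `softmax_head` the classification layer of Eq. (8).

```python
import numpy as np

def triangle_features(a, V):
    """Eq. (5): unsupervised FRCM features f_k = max(0, mean(z) - z_k),
    where z_k = ||a - v_k|| is the distance to centroid k."""
    z = np.linalg.norm(V - a, axis=1)
    return np.maximum(0.0, z.mean() - z)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fusion_forward(f_cnn, f_fr, W_D, W_FR, b):
    """Eq. (7): dense fusion of the supervised (CNN) and unsupervised (FRCM)
    representations with weights W_D and W_FR."""
    return sigmoid(W_D @ f_cnn + W_FR @ f_fr + b)

def softmax_head(f, W, b):
    """Eq. (8): softmax classification layer over the fused representation."""
    s = W @ f + b
    e = np.exp(s - s.max())  # subtract the max for numerical stability
    return e / e.sum()
```

Roughly half of the `triangle_features` outputs are zero by construction, which gives the unsupervised branch a sparse, noise-damping representation before fusion.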
In Eq. (9), $\theta$ collects the parameters of the supervised and unsupervised representation parts (with $\varphi$ the unsupervised-part parameters), $\lambda$ and $\gamma$ are parameters that control the trade-off between the supervised and unsupervised terms, $n$ is the number of training samples, $m$ the number of labeled samples, and $n - m$ the number of unlabeled samples.

C. SEMI-SUPERVISED FUZZY ROUGH CONVOLUTIONAL NEURAL NETWORK (FRCNN) TRAINING

As summarized in Algorithm 2, parameter initialization and fine-tuning are the two major steps in training the semi-supervised FRCNN model. In the initialization phase, both the supervised and unsupervised learning parts must be initialized. For the unsupervised part we follow [52]: initially, the cluster centroids are the averages of the data points in each of the $k$ clusters, where $k$ is suggested to be the same as the number of classes to be classified, and the exponent parameter is $m = 2$. The initialization of all the other parts (supervised, semi-supervised, and classification) follows [55]: the weights between the layers of these three parts are randomly initialized according to the rule

$$w_i \sim U\!\left( -\frac{1}{\sqrt{r}}, \; \frac{1}{\sqrt{r}} \right), \tag{10}$$

where $U$ is the uniform distribution and $r$ is the size of the previous layer. For the semi-supervised part, $r$ is the total size of the previous layers of both the supervised and unsupervised parts. All biases are initialized to zero.

After all parts are initialized, a fine-tuning step is required to train the whole semi-supervised FRCNN in a classification manner. A good fine-tuning process helps to precisely adjust the parameters of the neural network and enhance the discriminative ability of the final representation. We use the back-propagation algorithm to compute the gradients.

Algorithm 2 The Training Strategy for Semi-Supervised FRCNN
Input: training samples $X = \{X_L, X_{UL}\}$ and their labels $Y = \{Y_L, Y_{UL}\}$,
where $Y_{UL}$ are the labels predicted by FRCM; trade-off parameters $\lambda$ and $\gamma$; the number of clusters $k$, suggested to be the same as the number of classes; and the number of training epochs $T$.
Initialization: initialize the parameters $\theta$ of the semi-supervised FRCNN in two steps: initialize the unsupervised part according to [52], and initialize the weights of the supervised and semi-supervised parts according to Eq. (10).
For $t = 1, \dots, T$:
1. Compute the fitting error $\mathcal{L}$ from Eq. (9) on the training samples and their labels $(X, Y)$.
2. Back-propagate the fitting error according to Eqs. (11) and (12).
3. Apply the adaptation law of Eq. (13) to obtain the updated parameter set $\theta_{t+1}$.
End
Output: the well-trained semi-supervised FRCNN with parameters $\theta_{T+1}$.

The gradients of all the parameters in the system are obtained by back-propagation [56]:

$$\frac{\partial \mathcal{L}}{\partial \theta^l} = \sum_r \frac{\partial \mathcal{L}_r}{\partial f_i^{\,l}} \frac{\partial f_i^{\,l}}{\partial a_i^{\,l}} \frac{\partial a_i^{\,l}}{\partial \theta^l}, \tag{11}$$

where $\mathcal{L}$ is the loss defined in Eq. (9) and $\theta$ is the set of all parameters of the semi-supervised FRCNN. In the unsupervised part, the parameter set $\varphi = \{v_i\}$ is adjusted by

$$\frac{\partial L_U}{\partial \varphi} = \frac{\partial L_U}{\partial v_i} = \frac{2}{n-m} \sum_{i=m+1}^{n} (\hat{y}_i - \bar{y}_i) \, \max\!\big(0, \mu(z_i) - z_i\big)\,(a_i - v_i). \tag{12}$$

The neurons in the supervised, semi-supervised, and task-driven parts involve only linear weights and biases, i.e., $\{w, b\}$, whose gradients are easily calculated from Eqs. (6), (7), and (8). After obtaining the gradients for the parameter set, we implement the fine-tuning step with stochastic gradient descent. The adaptation law for the parameters is

$$V_t = \omega V_{t-1} - \alpha \frac{\partial \mathcal{L}}{\partial \theta^l}, \qquad \theta_{t+1}^{\,l} = \theta_t^{\,l} + V_t, \tag{13}$$

where $V_t$ is the velocity vector, determined by the previous velocity and the current gradient, $t$ is the iteration count, $\omega \in [0, 1]$ controls how much the previous gradient contributes, and $\alpha > 0$ is the learning rate. We set $\omega = 0.1$ and $\alpha = 0.05$ following the suggestions in [57].

TABLE 1. Description of the datasets.

IV. EXPERIMENT

Our experimental study is divided into two parts.
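The training ingredients just described, namely the weight initialization of Eq. (10), the semi-supervised loss of Eq. (9), and the momentum update of Eq. (13), can be sketched as follows. The helper names are ours; the defaults $\lambda = \gamma = 0.01$, $\omega = 0.1$, and $\alpha = 0.05$ follow the values quoted in the text, and the loss uses the squared-error form of Eq. (9).

```python
import numpy as np

def init_weights(rng, fan_in, fan_out):
    """Eq. (10): uniform initialization U(-1/sqrt(r), 1/sqrt(r)),
    where r is the size of the previous layer."""
    bound = 1.0 / np.sqrt(fan_in)
    return rng.uniform(-bound, bound, size=(fan_out, fan_in))

def semi_supervised_loss(y_hat_L, y_L, y_hat_U, y_bar_U, lam=0.01, gamma=0.01):
    """Eq. (9): weighted sum of the supervised squared error on labeled data
    and the unsupervised error against the FRCM pseudo-labels y_bar."""
    L_sup = np.mean(np.sum((y_hat_L - y_L) ** 2, axis=1))
    L_unsup = np.mean(np.sum((y_hat_U - y_bar_U) ** 2, axis=1))
    return lam * L_sup + gamma * L_unsup

def momentum_step(theta, grad, velocity, omega=0.1, alpha=0.05):
    """Eq. (13): SGD with momentum; omega weighs the previous velocity,
    alpha is the learning rate."""
    velocity = omega * velocity - alpha * grad
    return theta + velocity, velocity
```

Each epoch of Algorithm 2 then amounts to computing the loss, back-propagating its gradient, and calling `momentum_step` on every parameter tensor.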
In the first part, we use the four image datasets to validate the efficiency of the proposed semi-supervised FRCNN against other state-of-the-art semi-supervised methods. In the second part, we compare our semi-supervised approach with state-of-the-art unsupervised and supervised classification methods. The unsupervised learning algorithms are the Support Vector Machine (SVM) [58] and Fuzzy Rough C-Mean (FRCM) [50]. The semi-supervised learning algorithms are Semi-Supervised Deep Learning using Pseudo-Labels (PL-SSDL) [40], Spatio-spectral LapSVM (SS-LapSVM) [59], the Semi-Supervised Ladder Network (SSLN) [60], Temporal Ensembling with augmentation (TE) [69], and Mean Teacher with weight-averaged consistency targets (Mean Teacher) [70]. The supervised learning algorithms include the CNN models AlexNet [32], LeNet-5 [61], and CDNN [32], and the Fused Fuzzy Deep Neural Network (FDNN) [1]. In our experiments, we use the two deep CNN models AlexNet [32] and LeNet-5 [61] as the baseline architectures.

A. DATA PREPARATION

In our experiments, we evaluated the performance of the proposed semi-supervised Fuzzy Rough C-Mean approach with the AlexNet [32] architecture on the large-scale ImageNet ILSVRC-12 [46] dataset, which consists of 1.2 million training examples and 50,000 validation examples of 256 × 256 pixels collected from 1,000 object categories, and with the LeNet-5 [61] architecture on three small-scale datasets: Scene-15 [49], MNIST [47], and CIFAR-10 [48]. Scene-15 contains more than 4,500 natural indoor and outdoor images from 15 categories (15 classes). To extract visual features, we follow the bag-of-features method: on each image, we use a grid-based method to extract dense SIFT [62] features on 16 × 16 pixel patches sampled on a regular grid, and the local features are then clustered into 1,024 codewords by kernel assignment [63]. MNIST is a handwritten-digit dataset containing 60,000 training images and 10,000 testing images of 28 × 28 pixels from 10 classes (the digits 0 to 9); the digits are centered and size-normalized.
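The bag-of-features step described for Scene-15 can be sketched as follows. This is a toy stand-in, not the paper's pipeline: raw pixel patches replace the dense SIFT descriptors of [62], hard nearest-codeword assignment replaces the kernel assignment of [63], the 8-pixel grid stride is an assumption (the original stride is missing from the text), and the codebook here is random rather than learned with 1,024 codewords.

```python
import numpy as np

def grid_patches(img, size=16, stride=8):
    """Densely sample size x size patches on a regular grid (the real pipeline
    computes a SIFT descriptor per patch; here we just flatten the pixels)."""
    h, w = img.shape
    return np.array([img[y:y + size, x:x + size].ravel()
                     for y in range(0, h - size + 1, stride)
                     for x in range(0, w - size + 1, stride)])

def bag_of_features(img, codebook, size=16, stride=8):
    """Assign each patch descriptor to its nearest codeword and return the
    normalized codeword histogram used as the image feature."""
    P = grid_patches(img, size, stride).astype(float)
    d = np.linalg.norm(P[:, None, :] - codebook[None, :, :], axis=2)
    hist = np.bincount(np.argmin(d, axis=1), minlength=len(codebook))
    return hist / hist.sum()
```

With a learned 1,024-word codebook, the resulting histogram is the fixed-length feature vector that the clustering and classification stages consume.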
CIFAR-10 is a natural-image dataset containing 50,000 training images and 10,000 testing images of 32 × 32 pixel color images from 10 classes.

B. EXPERIMENTAL SETUP

To evaluate the proposed approach, we trained and tested on the four benchmark datasets ImageNet, MNIST, CIFAR-10, and Scene-15. For training, we used 10%, 20%, and 30% of the samples of each class as labeled samples; the rest were used as unlabeled samples. We initialized the unsupervised part of our approach with fuzziness $m = 2$, objective threshold $\varepsilon = 0.01$, and $k$ clusters, where $k$ equals the number of classes. For the supervised part, we used AlexNet [32] for ImageNet [46] and LeNet-5 [61] for MNIST, CIFAR-10, and Scene-15. To train the proposed semi-supervised model, we used stochastic gradient descent (Eq. (13)) with $\lambda = \gamma = 0.01$, $\omega = 0.1$, and $\alpha = 0.05$, as suggested in [57]. For evaluation, we used the accuracy measure for classification. All reported training and testing results are averages over 10 repeated experiments of 40 epochs each. To further demonstrate the performance of our approach, we compared it with five semi-supervised (PL-SSDL, SS-LapSVM, SSLN, TE, Mean Teacher), two unsupervised (SVM and FRCM), and three supervised (AlexNet/LeNet-5, CDNN, and FDNN) learning algorithms. All experiments were conducted using TensorFlow [64] on a personal computer with a commercial GPU card.

TABLE 3. Accuracy of semi-supervised methods on the four datasets with a 20% labeled rate.

TABLE 4. Accuracy of semi-supervised methods on the four datasets with a 30% labeled rate.

C. EXPERIMENT RESULTS AND ANALYSIS

In this section, we investigate the novel semi-supervised FRCNN approach on the four datasets, including the large-scale ImageNet ILSVRC-12 [46] dataset, with labeled rates of 10%, 20%, and 30%. All experiments were run for 40 epochs. For comparison purposes, we first investigated our approach against
the state-of-the-art semi-supervised approaches. Tables 2, 3, and 4 present the classification accuracy of the semi-supervised FRCNN and the compared semi-supervised methods at labeled rates of 10%, 20%, and 30%, respectively. The reported results include the mean and standard deviation over the 10 repeated experiments. According to the results, our proposed approach achieves better results than the other methods on all datasets at the different labeled rates. This improvement rests on the feature-representation mechanism with robust data: multi-modal learning [31] holds that the features extracted by a single method are not sufficient to handle complex structures.

TABLE 2. Accuracy of semi-supervised methods on the four datasets with a 10% labeled rate.

In our approach, we take advantage of both the supervised and the unsupervised model for feature representation by fusing the two in a semi-supervised manner. Figure 2 shows the classification results of our approach on six sample images. Figure 3 shows the accuracy of the semi-supervised methods on all four datasets at the different labeled rates; according to Figure 3, our proposed approach achieves significantly better performance than all other methods. Mean Teacher and TE perform better as the amount of labeled data increases on CIFAR-10, but they do not perform better than our approach on Scene-15, owing to the increased uncertainty and the noisy samples in that dataset. TE is a self-ensembling method that forms a consensus prediction of the unknown labels using the outputs of the network-in-training on different epochs, and these predictions are not always consistent with the true labels. Our semi-supervised FRCNN instead uses the unlabeled data with labels predicted by FRCM clustering as ground truth, which is consistent with the true labels and extracts discriminative information useful for classification. For this reason, the semi-supervised FRCNN achieves better results as
Secondly, we evaluated the performance of our approach against other state-of-the-art supervised, unsupervised, and semi-supervised methods. Table 5 shows the accuracy of the supervised, unsupervised, and semi-supervised methods on all datasets; the semi-supervised results are averages over the three labeled rates (10%, 20%, and 30%). FRCM, AlexNet, and LeNet-5 are used as baseline methods. According to the results in Table 5, the supervised methods are superior to the unsupervised methods.

FIGURE 2. Top five classification results for six image samples from the ImageNet (ILSVRC-12) dataset (the correct label of each image is shown under the image).

TABLE 5. Average accuracy of unsupervised, supervised, and semi-supervised methods on four datasets.

Figure 4 is a multi-bar chart showing the average accuracy of the supervised, unsupervised, and semi-supervised methods on all datasets. Comparing the semi-supervised approach with the supervised and unsupervised approaches, we observe that the semi-supervised approach achieves better results because it exploits abundant unlabeled data alongside a few labeled samples. Among the supervised approaches, FDNN achieved superior results except on CIFAR-10, thanks to the knowledge learned through multi-model representation learning. Analyzing the results of our approach, the improvement comes from two main sources: first, the knowledge learned by multi-model learning with robust data, and second, the semi-supervised data used to train the model.

D. TIME COMPLEXITY
To evaluate the efficiency of our proposed algorithm, we compared its running time with that of the other methods (presented in Table 6). According to the table, semi-supervised approaches require longer training time than supervised and unsupervised approaches. However, our proposed semi-supervised approach consumed less training time than all the other semi-supervised approaches; our method thus effectively addresses not only classification accuracy but also training time.

FIGURE 3. Comparison of the accuracy of the proposed method SSFRCNN and other semi-supervised methods at different labeled rates on four datasets: (a) ImageNet, (b) MNIST, (c) CIFAR-10, (d) Scene-15.

FIGURE 4. Comparison of the average accuracy of semi-supervised methods with supervised and unsupervised methods on four datasets.

E. CONVERGENCE ANALYSIS OF SEMI-SUPERVISED FUZZY ROUGH CONVOLUTIONAL NEURAL NETWORK (SSFRCNN)
Although the CNN architecture, FCM, and rough C-mean (RCM) are well-known machine-learning methods that are guaranteed to converge, there is no theoretical convergence proof for FRCM or for clustering-based CNN classification. Nonetheless, many previous studies of clustering-based CNN models have shown that such models usually reach reliable convergence during training. Figure 5 shows the convergence performance of our proposed approach on all datasets with parameter initialization only and with pre-trained models. The accuracy of our approach with the pre-trained model is higher than with parameter initialization on all datasets. However, Figure 5 also shows that the accuracy of the proposed method increases with the number of training epochs under both settings, which indicates that both models make semi-supervised FRCNN converge stably.

TABLE 6. Computational time comparison of semi-supervised FRCNN
(SSFRCNN) with other methods.

FIGURE 5. Convergence analysis of the training on all datasets with the pre-trained model and with initialization.

It is also interesting to note that, as the training epochs increase, the accuracy curve of the pre-trained model becomes flat, while the initialization model suffers slight oscillation after 30 epochs within the first 50 epochs; therefore, we chose 40 epochs in this paper.

V. CONCLUSION
In this paper, we designed a novel and efficient semi-supervised algorithm (SSFRCNN), which fuses a neural network (supervised learning) with fuzzy-rough C-mean clustering (unsupervised learning). This approach successfully alleviates the problems of limited labeled data, feature uncertainty, and noise in raw data for robust data-classification tasks. In the unsupervised representation learning, the efficient human-like reasoning of fuzzy and rough logic handles uncertainty, while the supervised deep representation learning reduces noise in the raw data. Fusing the supervised and unsupervised representation learning into semi-supervised representation learning achieves much better classification results on large-scale image datasets. In our study, we used the unlabeled data with labels predicted by the FRCM clustering algorithm to train semi-supervised FRCNN, which helps generate more reasonable features for classification. The experimental results provide sound support for the effectiveness of semi-supervised learning in real applications. The competitive performance of our approach shows that it significantly outperforms other state-of-the-art supervised, unsupervised, and semi-supervised approaches on four image datasets. In future work, we also aim to investigate the use of supervised and semi-supervised deep architectures for representation-learning tasks.

REFERENCES
[1] Y. Deng, Z. Ren, Y. Kong, F. Bao, and Q. Dai, ‘‘A hierarchical fused fuzzy deep neural network for data classification,’’ IEEE Trans. Fuzzy Syst., vol. 25, no. 4, pp. 1006–1012, Aug. 2017.
[2] A. L. Blum and P. Langley, ‘‘Selection of relevant features and examples in machine learning,’’
Artif. Intell., vol. 97, pp. 245–271, Dec. 1997.
[3] L. A. Zadeh, ‘‘Fuzzy sets,’’ Inf. Control, vol. 8, no. 3, pp. 338–353, Jun. 1965.
[4] C.-C. Lin and C. S. G. Lee, ‘‘Neural-network-based fuzzy logic control and decision system,’’ IEEE Trans. Comput., vol. 40, no. 12, pp. 1320–1336, Dec. 1991.
[5] J. J. Buckley and Y. Hayashi, ‘‘Fuzzy neural networks: A survey,’’ Fuzzy Sets Syst., vol. 66, no. 1, pp. 1–13, 1994.
[6] J. C. van Gemert, J.-M. Geusebroek, C. J. Veenman, and A. W. Smeulders, ‘‘Kernel codebooks for scene categorization,’’ in Computer Vision–ECCV. Berlin, Germany: Springer, Oct. 2008, pp. 696–709.
[7] P. Lingras and C. West, ‘‘Interval set clustering of Web users with rough k-means,’’ J. Intell. Inf. Syst., vol. 23, no. 1, pp. 5–16, Jul. 2004.
[8] G. Peters, ‘‘Outliers in rough k-means clustering,’’ in Pattern Recognition and Machine Intelligence, vol. 3776. Berlin, Germany: Springer, Dec. 2005, pp. 702–707.
[9] G. Peters, F. Crespo, P. Lingras, and R. Weber, ‘‘Soft clustering–Fuzzy and rough approaches and their extensions and derivatives,’’ Int. J. Approx. Reasoning, vol. 54, pp. 307–322, Feb. 2013.
[10] Y. Zhang, S. Ye, and W. Ding, ‘‘Based on rough set and fuzzy clustering of MRI brain segmentation,’’ Int. J. Biomath., vol. 10, no. 2, Feb. 2017, Art. no. 1750026.
[11] T. Zhang and F. Ma, ‘‘Improved rough k-means clustering algorithm based on weighted distance measure with Gaussian function,’’ Int. J. Comput. Math., vol. 94, no. 4, pp. 663–675, 2017.
[12] K. A. Vidhya and T. V. Geetha, ‘‘Rough set theory for document clustering: A review,’’ J. Intell. Fuzzy Syst., vol. 32, no. 3, pp. 2165–2185, 2017.
[13] F. J. Wentz et al., ‘‘Evaluating and extending the ocean wind climate data record,’’ IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 10, no. 5, pp. 2165–2185, May 2017.
[14] N. N. R. R. Suri, M. N. Murty, and G. Athithan, ‘‘Detecting outliers in categorical data through rough clustering,’’ Natural Comput., vol. 15, no. 3, pp. 385–394, 2016.
[15] H. K. Kwan and Y. Cai, ‘‘A fuzzy neural network and its application to pattern recognition,’’ IEEE Trans. Fuzzy
Syst., vol. 2, no. 3, pp. 185–193, Aug. 1994.
[16] F. S. Wong, P. Z. Wang, T. H. Goh, and B. K. Quek, ‘‘Fuzzy neural systems for stock selection,’’ Financial Analysts J., vol. 48, no. 1, pp. 47–52, 1992.
[17] M. K. Mehlawat and P. Gupta, ‘‘Fuzzy chance-constrained multiobjective portfolio selection model,’’ IEEE Trans. Fuzzy Syst., vol. 22, no. 3, pp. 653–671, Jun. 2014.
[18] F.-J. Lin, C.-H. Lin, and P.-H. Shen, ‘‘Self-constructing fuzzy neural network speed controller for permanent-magnet synchronous motor drive,’’ IEEE Trans. Fuzzy Syst., vol. 9, no. 5, pp. 751–759, Oct. 2001.
[19] A. Arshad, S. Riaz, L. Jiao, and A. Murthy, ‘‘A semi-supervised deep fuzzy C-mean clustering for two classes classification,’’ in Proc. IEEE 3rd Inf. Technol. Mechatron. Eng. Conf. (ITOEC), Oct. 2017, pp. 365–370.
[20] A. Arshad, S. Riaz, L. Jiao, and A. Murthy, ‘‘Semi-supervised deep fuzzy C-mean clustering for software fault prediction,’’ IEEE Access, vol. 6, pp. 25675–25685, 2018.
[21] A. Arshad, S. Riaz, L. Jiao, and A. Murthy, ‘‘The empirical study of semi-supervised deep fuzzy C-mean clustering for software fault prediction,’’ IEEE Access, vol. 6, pp. 47047–47061, 2018.
[22] S. Riaz, A. Arshad, and L. Jiao, ‘‘Rough-KNN noise-filtered convolutional neural network for image classification,’’ in Proc. 3rd Int. Conf. Inf. Technol. Intell. Transp. Syst. (ITITS), vol. 314, Xi’an, China, Jan. 2019, pp. 265–275.
[23] S. Riaz, A. Arshad, and L. Jiao, ‘‘Rough noise-filtered easy ensemble for software fault prediction,’’ IEEE Access, vol. 6, pp. 46886–46899, 2018.
[24] S. Riaz, A. Arshad, and L. Jiao, ‘‘Fuzzy rough C-mean based unsupervised CNN clustering for large-scale image data,’’ Appl. Sci., vol. 8, no. 10, p. 1869, 2018.
[25] K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, ‘‘When is ‘Nearest Neighbor’ meaningful?’’ in Proc. Int. Conf. Database Theory, Jan. 1999, pp. 217–235.
[26] P. K. Bhowmick, A. Basu, P. Mitra, and A. Prasad, ‘‘Sentence level news emotion analysis in fuzzy
multi-label classification framework,’’ in Proc. Natural Lang. Process. Appl., 2010, pp. 143–154.
[27] V. Barnett and T. Lewis, Outliers in Statistical Data, 3rd ed. Hoboken, NJ, USA: Wiley, 1984.
[28] J. C. Dunn, ‘‘Some recent investigations of a new fuzzy partitioning algorithm and its application to pattern classification problems,’’ J. Cybern., vol. 4, no. 2, pp. 1–15, 1974.
[29] P. Maji and S. K. Pal, ‘‘Rough set based generalized fuzzy C-means algorithm and quantitative indices,’’ IEEE Trans. Syst., Man, Cybern. B (Cybern.), vol. 37, no. 6, pp. 1529–1540, Dec. 2007.
[30] N. Verbiest, ‘‘Fuzzy rough and evolutionary approaches to instance selection,’’ M.S. thesis, Dept. Comput. Sci., Ghent Univ., Ghent, Belgium, 2014.
[31] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, ‘‘Multimodal deep learning,’’ in Proc. 28th Int. Conf. Mach. Learn. (ICML), 2011, pp. 689–696.
[32] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘‘ImageNet classification with deep convolutional neural networks,’’ in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[33] R. Girshick, J. Donahue, T. Darrell, and J. Malik, ‘‘Rich feature hierarchies for accurate object detection and semantic segmentation,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 580–587.
[34] K. Simonyan and A. Zisserman. (2014). ‘‘Very deep convolutional networks for large-scale image recognition.’’ [Online]. Available: https://arxiv.org/abs/1409.1556
[35] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Deep residual learning for image recognition,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 770–778.
[36] A. Graves and J. Schmidhuber, ‘‘Framewise phoneme classification with bidirectional LSTM and other neural network architectures,’’ Neural Netw., vol. 18, no. 5, pp. 602–610, 2005.
[37] A. Graves, A.-R. Mohamed, and G. Hinton, ‘‘Speech recognition with deep recurrent neural networks,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Vancouver, BC, Canada, May 2013, pp. 6645–6649.
[38] O. Chapelle, B. Schölkopf, and A. Zien, Semi-Supervised Learning.
Cambridge, MA, USA: MIT Press, 2006.
[39] X. Zhu, ‘‘Semi-supervised learning literature survey,’’ Ph.D. dissertation, Dept. Comput. Sci., Univ. Wisconsin, Madison, WI, USA, 2007.
[40] H. Wu and S. Prasad, ‘‘Semi-supervised deep learning using pseudo labels for hyperspectral image classification,’’ IEEE Trans. Image Process., vol. 27, no. 3, pp. 1259–1270, Mar. 2018.
[41] W. Shi, Y. Gong, C. Ding, Z. Ma, X. Tao, and N. Zheng, ‘‘Transductive semi-supervised deep learning using min-max features,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), vol. 11209. Cham, Switzerland: Springer, 2015, pp. 299–315.
[42] Y. Kuznietsov, J. Stückler, and B. Leibe, ‘‘Semi-supervised deep learning for monocular depth map prediction,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 6647–6655.
[43] S. Zhou, Q. Chen, and X. Wang, ‘‘Fuzzy deep belief networks for semi-supervised sentiment classification,’’ Neurocomputing, vol. 131, pp. 312–322, May 2014.
[44] R. Zhang, F. Shen, and J. Zhao, ‘‘A model with fuzzy granulation and deep belief networks for exchange rate forecasting,’’ in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2014, pp. 366–373.
[45] G. Padmapriya and K. Duraiswamy, ‘‘Association of deep learning algorithm with fuzzy logic for multidocument text summarization,’’ J. Theor. Appl. Inf. Technol., vol. 62, no. 1, pp. 166–173, 2014.
[46] J. Deng, A. Berg, S. Satheesh, H. Su, A. Khosla, and F. Li, ‘‘ImageNet large-scale visual recognition competition 2012,’’ Int. J. Comput. Vis., vol. 115, pp. 211–252, 2015. [Online]. Available: http://www.imagenet.org/challenges/LSVRC/2012/
[47] Y. LeCun. The MNIST Database of Handwritten Digits. Accessed: Dec. 15, 2018. [Online]. Available: http://yann.lecun.com/exdb/mnist/
[48] A. Krizhevsky and G. Hinton, ‘‘Learning multiple layers of features from tiny images,’’ M.S. thesis, Dept. Comput. Sci., Univ. Toronto, Toronto, ON, Canada, 2009.
[49] X. Wang, X. Gao, Y. Yuan, D. Tao, and J. Li, ‘‘Semi-supervised Gaussian process latent variable model with pairwise constraints,’’ Neurocomputing, vol. 73, pp.
2186–2195, Jun. 2010.
[50] W. Jin, A. K. H. Tung, and J. Han, ‘‘Mining top-n local outliers in large databases,’’ in Proc. 7th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2001, pp. 293–298.
[51] J. T. Yao, A. V. Vasilakos, and W. Pedrycz, ‘‘Granular computing: Perspectives and challenges,’’ IEEE Trans. Cybern., vol. 43, no. 6, pp. 1977–1989, Dec. 2013.
[52] Q. Hu and D. Yu, ‘‘An improved clustering algorithm for information granulation,’’ in Fuzzy Systems and Knowledge Discovery, vol. 3613. Berlin, Germany: Springer, Aug. 2005, pp. 494–504.
[53] M. Yeganejou and S. Dick, ‘‘Classification via deep fuzzy C-means clustering,’’ in Proc. IEEE Int. Conf. Fuzzy Syst. (FUZZ-IEEE), Jul. 2018, pp. 1–6.
[54] T. Rajesh and R. S. M. Malar, ‘‘Rough set theory and feed forward neural network based brain tumor detection in magnetic resonance images,’’ in Proc. Int. Conf. Adv. Nanomater. Emerg. Eng. Technol., Jul. 2013, pp. 240–244.
[55] X. Glorot and Y. Bengio, ‘‘Understanding the difficulty of training deep feedforward neural networks,’’ in Proc. 13th Int. Conf. Artif. Intell. Statist., Mar. 2010, pp. 249–256.
[56] C.-F. Juang and C.-T. Lin, ‘‘An online self-constructing neural fuzzy inference network and its applications,’’ IEEE Trans. Fuzzy Syst., vol. 6, no. 1, pp. 12–32, Feb. 1998.
[57] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, ‘‘On the importance of initialization and momentum in deep learning,’’ in Proc. Int. Conf. Mach. Learn., Feb. 2013, pp. 1139–1147.
[58] T. Menzies, J. Greenwald, and A. Frank, ‘‘Data mining static code attributes to learn defect predictors,’’ IEEE Trans. Softw. Eng., vol. 33, no. 1, pp. 2–13, Jan. 2007.
[59] L. Yang, S. Yang, P. Jin, and R. Zhang, ‘‘Semi-supervised hyperspectral image classification using spatio-spectral Laplacian support vector machine,’’ IEEE Geosci. Remote Sens. Lett., vol. 11, no. 3, pp. 651–655, Mar. 2014.
[60] A. Rasmus, M. Berglund, M. Honkala, H. Valpola, and T. Raiko, ‘‘Semi-supervised learning with ladder networks,’’ in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 3546–3554.
[61] Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y.
Gu, ‘‘Deep learning-based classification of hyperspectral data,’’ IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2094–2107, Jun. 2014.
[62] J. C. van Gemert, C. J. Veenman, A. W. M. Smeulders, and J.-M. Geusebroek, ‘‘Visual word ambiguity,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 7, pp. 1271–1283, Jul. 2010.
[63] A. Oliva and A. Torralba, ‘‘Modeling the shape of the scene: A holistic representation of the spatial envelope,’’ Int. J. Comput. Vis., vol. 42, no. 3, pp. 145–175, 2001.
[64] M. Abadi et al. (2016). ‘‘TensorFlow: Large-scale machine learning on heterogeneous distributed systems.’’ [Online]. Available: https://arxiv.org/abs/1603.04467
[65] A. Arshad, S. Riaz, and L. Jiao, ‘‘Semi-supervised deep fuzzy C-mean clustering for imbalanced multi-class classification,’’ IEEE Access, vol. 7, pp. 28100–28112, 2019.
[66] B. Du, Y. Zhang, L. Zhang, and D. Tao, ‘‘Beyond the sparsity-based target detector: A hybrid sparsity and statistics-based detector for hyperspectral images,’’ IEEE Trans. Image Process., vol. 25, no. 11, pp. 5345–5357, Nov. 2016.
[67] B. Du and L. Zhang, ‘‘Target detection based on a dynamic subspace,’’ Pattern Recognit., vol. 47, no. 1, pp. 344–358, 2014.
[68] L. Zhang, Q. Zhang, L. Zhang, D. Tao, X. Huang, and B. Du, ‘‘Ensemble manifold regularized sparse low-rank approximation for multiview feature embedding,’’ Pattern Recognit., vol. 48, no. 10, pp. 3102–3112, Dec. 2015.
[69] S. Laine and T. Aila. (2016). ‘‘Temporal ensembling for semi-supervised learning.’’ [Online]. Available: https://arxiv.org/abs/1610.02242
[70] A. Tarvainen and H. Valpola, ‘‘Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results,’’ in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 1195–1204.

SAMAN RIAZ received the M.Sc. and M.Phil. degrees in applied mathematics from Quaid-e-Azam University, Pakistan, in 2006 and 2008, respectively. She is
currently pursuing the Ph.D. degree with the School of Computer Science and Technology, Xidian University, China. Her research interests include machine learning, rough set theory, and probability.

ALI ARSHAD received the B.S. degree in computer science from Iqra University, Pakistan, in 2008, and the M.S. degree in software engineering from International Islamic University, Pakistan, in 2012. He is currently pursuing the Ph.D. degree with the School of Computer Science and Technology, Xidian University, China. His research interests include machine learning, semi-supervised learning, and fuzzy C-mean clustering.

LICHENG JIAO received the B.S. degree from Shanghai Jiao Tong University, Shanghai, China, in 1982, and the M.S. and Ph.D. degrees from Xi’an Jiaotong University, Xi’an, China, in 1984 and 1990, respectively. Since 1992, he has been a Professor with the School of Artificial Intelligence, Xidian University, Xi’an, where he is currently the Director of the Key Laboratory of Intelligent Perception and Image Understanding, Ministry of Education, China. He is in charge of about 40 important scientific research projects and has published over 20 monographs and 100 papers in international journals and conferences. His research interests include image processing, natural computation, machine learning, and intelligent information processing. He is an Expert of the Academic Degrees Committee of the State Council, a Fellow of the IEEE Xi’an Section Execution Committee, and the Chair of its Awards and Recognition Committee. He is the Vice Board Chairperson of the Chinese Association of Artificial Intelligence, a Councilor of the Chinese Institute of Electronics, and a Committee Member of the Chinese Committee of Neural Networks.