The Nature of Biomedical Images (Part 12)

12 Pattern Classification and Diagnostic Decision

The final purpose of biomedical image analysis is to classify a given image, or the features that have been detected in the image, into one of a few known categories. In medical applications, a further goal is to arrive at a diagnostic decision regarding the condition of the patient. A physician or medical specialist may achieve this goal via visual analysis of the image and data presented: comparative analysis of the given image with others of known diagnoses or the application of established protocols and sets of rules assist in such a decision-making process. Images taken earlier of the same patient may also be used, when available, for comparative or differential analysis. Some measurements may also be made from the given image to assist in the analysis. The basic knowledge, clinical experience, expertise, and intuition of the physician play significant roles in this process.

When image analysis is performed via the application of computer algorithms, the typical result is the extraction of a number of numerical features. When the numerical features relate directly to measurements of organs or features represented by the image (such as an estimate of the size of the heart or the volume of a tumor), the clinical specialist may be able to use the features directly in his or her diagnostic logic. However, when parameters such as measures of texture and shape complexity are derived, a human analyst is not likely to be able to analyze or comprehend the features. Furthermore, as the number of the computed features increases, the associated diagnostic logic may become too complicated and unwieldy for human analysis. Computer methods would then be desirable to perform the classification and decision process.

At the outset, it should be borne in mind that a biomedical image forms but one piece of information in arriving at a diagnosis: the classification of a given image into one of many categories may assist in the diagnostic procedure,
but will almost never be the only factor. Regardless, pattern classification based upon image analysis is indeed an important aspect of biomedical image analysis, and forms the theme of the present chapter. Remaining within the realm of CAD as introduced in Figure 1.33 and Section 1.11, it would be preferable to design methods so as to aid a medical specialist in arriving at a diagnosis rather than to provide a decision.

Biomedical Image Analysis, © 2005 by CRC Press LLC

A generic problem statement for pattern classification may be expressed as follows:

    A number of measures and features have been derived from a biomedical image. Develop methods to classify the image into one of a few specified categories. Investigate the relevance of the features and the classification methods in arriving at a diagnostic decision about the patient.

Observe that the features mentioned above may have been derived manually or by computer methods. Recognize the distinction between classifying the given image and arriving at a diagnosis regarding the patient: the connection between the two tasks or steps may not always be direct. In other words, a pattern classification method may facilitate the labeling of a given image as being a member of a particular class; arriving at a diagnosis of the condition of the patient will most likely require the analysis of several other items of clinical information.

Although it is common to work with a prespecified number of pattern classes, many problems exist where the number of classes is not known a priori. A special case is screening, where the aim is to simply decide on the presence or absence of a certain type of abnormality or disease. The initial decision in screening may be further focused on whether the subject appears to be free of the specific abnormality of concern or requires further investigation.

The problem statement and description above are rather generic. Several considerations arise in the practical application of the concepts mentioned above
to medical images and diagnosis. Using the detection of breast cancer as an example, the following questions illustrate some of the problems encountered in practice.

- Is a mass or tumor present? (Yes/No)
- If a mass or tumor is present:
    - Give or mark its location.
    - Compare the density of the mass to that of the surrounding tissues: hypodense, isodense, hyperdense.
    - Describe the shape of its boundary: round, ovoid, irregular, macrolobulated, microlobulated, spiculated.
    - Describe its texture: homogeneous, heterogeneous, fatty.
    - Describe its edge: sharp (well-circumscribed), ill-defined (fuzzy).
    - Decide if it is a benign mass, a cyst (solid or fluid-filled), or a malignant tumor.
- Are calcifications present? (Yes/No)
- If calcifications are present:
    - Estimate their number per cm².
    - Describe their shape: round, ovoid, elongated, branching, rough, punctate, irregular, amorphous.
    - Describe their spatial distribution or cluster.
    - Describe their density: homogeneous, heterogeneous.
- Are there signs of architectural distortion? (Yes/No)
- Are there signs of bilateral asymmetry? (Yes/No)
- Are there major changes compared to the previous mammogram of the patient?
- Is the case normal? (Yes/No)
- If the case is abnormal:
    - Is the disease benign or malignant (cancer)?
The items listed above give a selection of the many features of mammograms that a radiologist would investigate; see Ackerman et al. [1084] and the BI-RADS™ manual [403] for more details. Figure 12.1 shows a graphical user interface developed by Alto et al. [528, 1085] for the categorization of breast masses related to some of the questions listed above. Figure 12.2 illustrates four segments of mammograms demonstrating masses and tumors of different characteristics, progressing from a well-circumscribed and homogeneous benign mass to a highly spiculated and heterogeneous tumor.

The subject matter of this book (image analysis and pattern classification) can provide assistance in responding to only some of the questions listed above. Even an entire set of mammograms may not lead to a final decision: other modes of diagnostic imaging and means of investigation may be necessary to arrive at a definite diagnosis.

In the following sections, a number of methods for pattern classification, decision making, and evaluation of the results of classification are reviewed and illustrated.

(Note: Parts of this chapter are reproduced, with permission, from R.M. Rangayyan, Biomedical Signal Analysis: A Case-Study Approach, IEEE Press and Wiley, New York, NY, 2002, © IEEE.)
12.1 Pattern Classification

Pattern recognition or classification may be defined as the categorization of the input data into identifiable classes via the extraction of significant features or attributes of the data from a background of irrelevant detail [402, 721, 1086, 1087, 1088, 1089, 1090]. In biomedical image analysis, after quantitative features have been extracted from the given images, each image (or ROI) may be represented by a feature vector $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$, which is also known as the measurement vector or a pattern vector. When the values $x_i$ are real numbers, $\mathbf{x}$ is a point in an $n$-dimensional Euclidean space: vectors of similar objects may be expected to form clusters as illustrated in Figure 12.3.

FIGURE 12.1 Graphical user interface for the categorization of breast masses. Reproduced with permission from H. Alto, R.M. Rangayyan, R.B. Paranjape, J.E.L. Desautels, and H. Bryant, "An indexed atlas of digital mammograms for computer-aided diagnosis of breast cancer", Annales des Telecommunications, 58(5): 820-835, 2003. © GET - Lavoisier. Figure courtesy of C. LeGuillou, Ecole Nationale Superieure des Telecommunications de Bretagne, Brest, France.

FIGURE 12.2 Examples of breast mass regions and contours with the corresponding values of fractional concavity $f_{cc}$, spiculation index $SI$, compactness $cf$, acutance $A$, and sum entropy $F_8$.

    Panel  Case      f_cc  SI    cf    A     F8
    (a)    b145lc95  0.00  0.04  0.11  0.07  8.11
    (b)    b164ro94  0.42  0.18  0.26  0.08  8.05
    (c)    m51rc97   0.64  0.49  0.55  0.09  8.15
    (d)    m55lo97   0.83  0.61  0.99  0.01  8.29

(a) Circumscribed benign mass. (b) Macrolobulated benign mass. (c) Microlobulated malignant tumor. (d) Spiculated malignant tumor. Note that the masses and their contours are of widely differing size, but have been scaled to the same size in the illustration. The first letter of the case identifier indicates a malignant diagnosis with 'm' and a benign diagnosis with 'b' based upon biopsy. The symbols after the first numerical portion of the identifier represent l: left, r: right, c: cranio-caudal view, o: medio-lateral oblique view, x: axillary view. The last two digits represent the year of acquisition of the mammogram. An additional character of the identifier after the year (a-f), if present, indicates the existence of multiple masses visible in the same mammogram. Reproduced with permission from H. Alto, R.M. Rangayyan, and J.E.L. Desautels, "Content-based retrieval and analysis of mammographic masses", Journal of Electronic Imaging, in press, 2005. © SPIE and IS&T.

FIGURE 12.3 Two-dimensional feature vectors of two classes, $C_1$ and $C_2$. The prototypes of the two classes are indicated by the vectors $\mathbf{z}_1$ and $\mathbf{z}_2$. The linear decision function $d(\mathbf{x}) = w_1 x_1 + w_2 x_2 + w_3 = 0$ shown (solid line) is the perpendicular bisector of the straight line joining the two prototypes (dashed line). Reproduced with permission from R.M. Rangayyan, Biomedical Signal Analysis: A Case-Study Approach, IEEE Press and Wiley, New York, NY, 2002, © IEEE.

For efficient pattern classification, measurements that could lead to disjoint sets or clusters of feature vectors are desired. This point underlines the importance of the appropriate design of the preprocessing and feature extraction procedures. Features or characterizing attributes that are common to all patterns belonging to a particular class are known as intraset or intraclass features. Discriminant features that represent the differences between pattern classes are called interset or interclass features.

The pattern classification problem is that of generating optimal decision boundaries or decision procedures to separate the data into pattern classes based on the feature vectors provided. Figure 12.3 illustrates a simple linear decision function or boundary to
separate 2D feature vectors into two classes.

12.2 Supervised Pattern Classification

The problem considered in supervised pattern classification may be stated as follows:

    You are provided with a number of feature vectors with classes assigned to them. Propose techniques to characterize and parameterize the boundaries that separate the classes.

A given set of feature vectors of known categorization is often referred to as a training set. The availability of a training set facilitates the development of mathematical functions that can characterize the separation between the classes. The functions may then be applied to new feature vectors of unknown classes to classify or recognize them. This approach is known as supervised pattern classification. A set of feature vectors of known categorization that is used to evaluate a classifier designed in this manner is referred to as a test set. After adequate testing and confirmation of the method with satisfactory results, the classifier may be applied to new feature vectors of unknown classes; the results may then be used to arrive at diagnostic decisions. The following subsections describe a few methods that can assist in the development of discriminant and decision functions.

12.2.1 Discriminant and decision functions

A general linear discriminant or decision function is of the form

$$ d(\mathbf{x}) = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + w_{n+1} = \mathbf{w}^T \mathbf{x}, \qquad (12.1) $$

where $\mathbf{x} = [x_1, x_2, \ldots, x_n, 1]^T$ is the feature vector augmented by an additional entry equal to unity, and $\mathbf{w} = [w_1, w_2, \ldots, w_n, w_{n+1}]^T$ is a correspondingly augmented weight vector. A two-class pattern classification problem may be stated as

$$ d(\mathbf{x}) = \mathbf{w}^T \mathbf{x} \; \begin{cases} > 0 & \text{if } \mathbf{x} \in C_1 \\ \leq 0 & \text{if } \mathbf{x} \in C_2 \end{cases} \qquad (12.2) $$

where $C_1$ and $C_2$ represent the two classes. The discriminant function may be interpreted as the boundary separating the classes $C_1$ and $C_2$, as illustrated in Figure 12.3.

In the general case of an $M$-class pattern classification problem, we will need $M$ weight vectors and $M$ decision functions to perform the following decisions:

$$ d_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} \; \begin{cases} > 0 & \text{if } \mathbf{x} \in C_i \\ \leq 0 & \text{otherwise} \end{cases} \qquad i = 1, 2, \ldots, M, \qquad (12.3) $$

where $\mathbf{w}_i = (w_{i1}, w_{i2}, \ldots, w_{in}, w_{i,n+1})^T$ is the weight vector for the class $C_i$. Three cases arise in solving this problem [1086]:

Case 1: Each class is separable from the rest by a single decision surface:

$$ \text{if } d_i(\mathbf{x}) > 0 \text{ then } \mathbf{x} \in C_i. \qquad (12.4) $$

Case 2: Each class is separable from every other individual class by a distinct decision surface, that is, the classes are pairwise separable. There are $M(M-1)/2$ decision surfaces given by $d_{ij}(\mathbf{x}) = \mathbf{w}_{ij}^T \mathbf{x}$:

$$ \text{if } d_{ij}(\mathbf{x}) > 0 \;\; \forall j \neq i \text{ then } \mathbf{x} \in C_i. \qquad (12.5) $$

[Note: $d_{ij}(\mathbf{x}) = -d_{ji}(\mathbf{x})$.]

Case 3: There exist $M$ decision functions $d_k(\mathbf{x}) = \mathbf{w}_k^T \mathbf{x}$, $k = 1, 2, \ldots, M$, with the property that

$$ \text{if } d_i(\mathbf{x}) > d_j(\mathbf{x}) \;\; \forall j \neq i \text{ then } \mathbf{x} \in C_i. \qquad (12.6) $$

This is a special instance of Case 2. We may define

$$ d_{ij}(\mathbf{x}) = d_i(\mathbf{x}) - d_j(\mathbf{x}) = (\mathbf{w}_i - \mathbf{w}_j)^T \mathbf{x} = \mathbf{w}_{ij}^T \mathbf{x}. \qquad (12.7) $$

If the classes are separable under Case 3, they are separable under Case 2; the converse is, in general, not true.

Patterns that may be separated by linear decision functions as above are said to be linearly separable. In other situations, an infinite variety of complex decision boundaries may be formulated by using generalized decision functions based upon nonlinear functions of the feature vectors as

$$ d(\mathbf{x}) = w_1 f_1(\mathbf{x}) + w_2 f_2(\mathbf{x}) + \cdots + w_K f_K(\mathbf{x}) + w_{K+1} \qquad (12.8) $$

$$ \phantom{d(\mathbf{x})} = \sum_{i=1}^{K+1} w_i f_i(\mathbf{x}). \qquad (12.9) $$

Here, $\{f_i(\mathbf{x})\}$, $i = 1, 2, \ldots, K$, are real, single-valued functions of $\mathbf{x}$; $f_{K+1}(\mathbf{x}) = 1$. Whereas the functions $f_i(\mathbf{x})$ may be nonlinear in the $n$-dimensional space of $\mathbf{x}$, the decision function may be formulated as a linear function by defining a transformed feature vector $\mathbf{x}^{\dagger} = [f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_K(\mathbf{x}), 1]^T$. Then, $d(\mathbf{x}) = \mathbf{w}^T \mathbf{x}^{\dagger}$, with $\mathbf{w} = [w_1, w_2, \ldots, w_K, w_{K+1}]^T$. Once evaluated, $\{f_i(\mathbf{x})\}$ is just a set of numerical values, and $\mathbf{x}^{\dagger}$ is simply a $K$-dimensional vector augmented by an entry equal to unity. Several methods exist for the derivation of optimal linear discriminant functions [402, 738, 674].

Example of
application: The ROIs of 57 breast masses are shown in Figure 12.4 arranged in the order of decreasing acutance $A$ (see Sections 2.15, 7.9.2, and 12.12). Figure 12.5 shows the contours of the 57 masses arranged in the increasing order of fractional concavity $f_{cc}$ (see Section 6.4). Most of the contours of the benign masses are seen to be smooth, whereas most of the contours of the malignant tumors are rough and spiculated. Furthermore, most of the benign masses have well-defined, sharp edges and are well-circumscribed, whereas the majority of the malignant tumors possess ill-defined and fuzzy borders. It is seen that the shape factor $f_{cc}$ facilitates the ordering of the contours in terms of shape complexity. However, the contours of a few benign masses and a few malignant tumors do not follow the expected trend. In addition, the acutance measure has lower values for most of the malignant tumors than for a majority of the benign masses.

The three shape factors $cf$, $f_{cc}$, and $SI$ (see Chapter 6); the 14 texture measures as defined by Haralick [441, 442] (see Section 7.3); and four measures of edge sharpness as defined by Mudigonda et al. [165] (see Section 7.9.2) were computed for the ROIs and their contours. (Note: The factor $SI$ was divided by two in this example to reduce it to the range [0, 1].)
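As a hedged illustration of the linear decision function in Equation 12.1 and of the perpendicular-bisector boundary between two class prototypes described in the caption of Figure 12.3, a minimal Python sketch follows. It is not the authors' implementation: the function names and the toy two-dimensional feature vectors are hypothetical, chosen only for illustration. The bisector weights are obtained by equating the squared Euclidean distances from $\mathbf{x}$ to the two prototypes, which gives $d(\mathbf{x}) = (\mathbf{z}_1 - \mathbf{z}_2)^T \mathbf{x} - \frac{1}{2}(\|\mathbf{z}_1\|^2 - \|\mathbf{z}_2\|^2)$.

```python
def augment(x):
    """Append an entry equal to unity, as in the augmented vector of Eq. 12.1."""
    return list(x) + [1.0]


def linear_decision(w, x):
    """Evaluate d(x) = w^T x for an augmented weight vector w."""
    return sum(wi * xi for wi, xi in zip(w, augment(x)))


def prototype(vectors):
    """Class prototype: the average of the feature vectors of the class."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def bisector_weights(z1, z2):
    """Augmented weights of the perpendicular bisector between z1 and z2.

    d(x) > 0 when x is closer to z1, and d(x) < 0 when x is closer to z2.
    """
    w = [a - b for a, b in zip(z1, z2)]
    bias = -0.5 * (sum(a * a for a in z1) - sum(b * b for b in z2))
    return w + [bias]


def classify(w, x):
    """Two-class rule of Eq. 12.2: C1 if d(x) > 0, otherwise C2."""
    return "C1" if linear_decision(w, x) > 0 else "C2"


# Hypothetical training vectors (not data from the text).
class1 = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2)]
class2 = [(0.8, 0.9), (0.9, 0.8), (0.9, 0.9)]

z1, z2 = prototype(class1), prototype(class2)
w = bisector_weights(z1, z2)

print(classify(w, (0.15, 0.15)))
print(classify(w, (0.85, 0.85)))
```

In this sketch the boundary depends only on the two prototypes, so it is cheap to compute but ignores the spread of each class; that design choice matches the simple boundary of Figure 12.3 rather than an optimized discriminant.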
Figure 12.6 gives a plot of the 3D feature-vector space $(f_{cc}, A, F_8)$ for the 57 masses. The feature $F_8$ shows poor separation between the benign and malignant samples, whereas the feature $A$ demonstrates some degree of separation. A scatter plot of the three shape factors $(f_{cc}, cf, SI)$ of the 57 masses is given in Figure 12.7. Each of the three shape factors demonstrates high discriminant capability.

Figure 12.8 shows a 2D plot of the shape-factor vectors $[f_{cc}, SI]$ for a training set formed by selecting the vectors for 18 benign masses and 10 malignant tumors. The prototypes for the benign and malignant classes, obtained by averaging the vectors over all the members of the two classes in the training set, are marked as 'B' and 'M', respectively, on the plot. The solid straight line is the perpendicular bisector of the line joining the two prototypes (dashed line), and represents a linear discriminant function. The equation of the straight line is

$$ SI + 0.6826\, f_{cc} - 0.5251 = 0. $$

The decision function is represented by the following rule:

    if SI + 0.6826 f_cc - 0.5251 < 0
    then benign mass
    else malignant tumor
    end

It is seen that the rule given above will correctly classify all of the training samples.

Figure 12.9 shows the result of application of the linear discriminant function designed and shown in Figure 12.8 to a test set of 19 benign masses and 10 malignant tumors. The test set does not include any of the cases from the training set. It is seen that the classifier will lead to three false negatives in the test set.

12.2.2 Distance functions

Consider $M$ pattern classes represented by their prototype patterns $\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_M$. The prototype of a class is typically computed as the average of all the feature vectors belonging to the class. Figure 12.3 illustrates schematically the prototypes $\mathbf{z}_1$ and $\mathbf{z}_2$ of the two classes shown.

FIGURE 12.4 ROIs of 57 breast masses, including 37 benign masses and 20 malignant tumors. The ROIs are arranged in the order of decreasing acutance $A$ (from 0.088 for m51rc97 to 0.012 for m55lo97). Note that the masses are of widely differing size, but have been scaled to the same size in the illustration. For details regarding the case identifiers, see Figure 12.2. Reproduced with permission from H. Alto, R.M. Rangayyan, and J.E.L. Desautels, "Content-based retrieval and analysis of mammographic masses", Journal of Electronic Imaging, in press, 2005. © SPIE and IS&T.

TABLE 12.13 Accuracy of Classification of Masses as Benign or Malignant Using Combinations of the Shape Factor $f_{cc}$, Acutance $A$, and the Texture Measure Sum Entropy $F_8$ with Pattern Classification Methods and CBIR (Precision) [529].

                 Logistic regression   Mahalanobis distance   Linear discriminant analysis   k-NN          Precision
    Features     Sens  Spec  Avg       Sens  Spec  Avg        Sens   Spec  Avg   Az          k=5   k=7     k=5   k=7
    fcc          90    97.3  94.7      90    97.3  94.7       100.0  97.3  98.2  0.99        94.7  94.7    95.1  95.2
    A            50    94.6  78.9      75    67.6  70.0       75.0   73.0  73.7  0.74        68.4  73.7    65.3  67.4
    F8           30    86.5  66.7      65    56.8  59.6       75.0   54.1  61.4  0.68        63.2  54.4    58.2  60.9
    fcc, A       90    97.3  94.7      90    97.3  94.7       95.0   97.3  96.5  0.98        96.5  94.7    93.0  91.2
    fcc, F8      90    97.3  94.7      90    97.3  94.7       100.0  97.3  98.2  0.99        94.7  94.7    95.1  93.7
    A, F8        55    86.5  75.4      60    70.3  66.7       75.0   73.0  73.7  0.76        75.4  75.4    68.4  68.4
    fcc, A, F8   90    97.3  94.7      95    97.3  96.5       100.0  97.3  98.2  0.99        96.5  96.5    90.9  91.2
    14 texture   *     *     *         70    50.0  64.9       65.0   64.9  64.9  0.67        #     #       #     #

*: Logistic regression identified the texture feature $F_8$ as the only significant feature; results were computed for $F_8$ only. #: Experiments not conducted for this feature set. Sens = sensitivity; Spec = specificity; Avg = average accuracy, as percentages. See Figures 12.4 and 12.5 for illustrations of the masses and their contours.

A review of CBIR systems by Gudivada and Raghavan [1123] outlines previous approaches to content-based retrieval, and expresses the need to utilize features from a variety of approaches based on attributes, feature extraction, or object recognition, for information representation. Gudivada and Raghavan [1123] and Yoshitaka and Ichikawa [1122] indicate that conventional database systems are not well suited to handle multimedia data containing images, video, and text. Hence, it is necessary to explore more flexible query and retrieval methods.

Representation of breast mass images for CBIR: The first step in the development of a CBIR system is to represent the data or information in a meaningful way in the database so that retrieval is facilitated for a given application [529, 1085, 1121]. The representation of breast masses and tumors in a database requires the design of a reasonable number of descriptors to represent the image features of interest (or diagnostic value) with minimal loss of information. It is well established that most benign masses have contours that are well-circumscribed, smooth, and are round or oval, and have a relatively homogeneous internal texture. On the other hand, malignant tumors typically exhibit ill-differentiated and rough or spiculated contours, with a heterogeneous internal texture [54, 55]. For these reasons, shape factors and
texture measures have been proposed for differentiating between benign masses and malignant tumors [163, 165, 275, 345, 354, 428, 451].

Various researchers have chosen to represent the contours of objects in a variety of ways, some of which include: coding the object's contour as an ordered sequence of points or high-curvature points [1124, 1125, 1126]; using chain-code histograms [1125, 1126, 1127, 1128]; and using shape descriptors such as compactness [163, 274, 345, 428, 1118, 1132], concavity/convexity [345, 354, 1132], moments [163, 274, 1128, 1132], Fourier descriptors [274, 428, 1124, 1126], spiculation index [345], and the wavelet transform modulus maxima [1118]. Loncaric [406] gives an overview of shape analysis techniques from chain codes to fractal geometry. (See Chapter 6 for details on shape analysis.)

Automatically extracted shapes and parameters are considered to be primitive features, whereas logical features are abstract representations of images at various levels of detail and may be synthesized from primitive features. Logical features require more human intervention and domain expertise, and therefore, there is a higher cost associated with preprocessing the data for the database. CBIR approaches differ with respect to the image features that are extracted, the level of abstraction of the features, and the degree of desired domain independence [1123]. Depending upon the application, object-based descriptors such as tumor shape may be preferred to attribute-based descriptors such as color or texture; keywords may also be used where appropriate [1129, 1130]. In the work of Alto et al. [529], the features used are related to radiologically established attributes of breast masses. When the images in a database are indexed with objective measures of diagnostic features, the database may be referred to as an indexed atlas [1085, 1133].

The query process: Once the mammographic masses have an appropriate
representation in a database, the next step in the development of a CBIR system is to design the query techniques to fit the needs of the end-user. In a CAD application, the end-user could be a radiologist, a radiology intern, or a physician. In standard text-based databases, queries are generally comprised of keywords, natural language queries, or browsing procedures (that is, query by subject). For image or multimedia databases, the same methods may apply only if there are searchable keywords or textual descriptors associated with the images. In a "query by example", the user specifies a condition by giving examples of the desired image or object, either by cutting and pasting an image or by sketching the example object's contour [1122].

One of the best-known commercial CBIR systems is Query by Image Content (QBIC) developed at IBM [1129]. QBIC uses visual content such as color percentages, color layout, and texture extracted from images of art collections. A CBIR system developed by Srihari [1130], known as Piction, contains images of newspaper photos annotated with their associated captions. Queries based on text and image features extracted from the photos may then be used to identify human faces found in the newspaper photographs.

A comprehensive list of query classes is given by Gudivada and Raghavan [1123] as: color, texture, sketch, shape, volume, spatial constraints, browsing, objective attributes, subjective attributes, motion, text, and domain concepts. Fewer classes may be used when the database is highly domain-specific, and more are needed when the database is of general scope.

When a query is made in a CBIR system, the retrieved results are typically presented as a set or series of images that are rank-ordered by their degree of similarity (or a distance measure, such as the Euclidean, Manhattan, or Mahalanobis distance, as an indicator of dissimilarity) with respect to the query image. This is different from retrieval in text-based database systems, which generally
provide results with an exact match (that is, a single word, set of words, or a phrase). The CBIR work of Alto et al. was focused on retrieving similar masses, of established diagnosis, to assist the radiologist by suggesting a probable diagnosis for the query case on hand. The concept of similarity is especially pertinent in such an application because no two breast masses may be expected to be identical, and a perfect or exact match to a query would be improbable in practice.

Some of the shape-matching procedures suggested by Trimeche et al. [1125] require a comparison of the vertices of polygonal models of the query contour and the database contours. Each vertex is represented by a set of values (such as scale, angle, ratio of consecutive segments, and the ratio to the overall length). The feature vectors could be excessively lengthy if the contour has many vertices, such as a spiculated mass. A matrix containing all possible matches between the vertices of the query shape and each of the candidate contours may then be created. The polygonal model method produced good results with fish contours. The use of shape factors to represent the contours of masses, as in the work of Alto et al., simplifies the process of comparative analysis.

Visualization of the query results is accomplished by rank-ordering the retrieved results from the minimum to the maximum distance and presenting the top k objects to the user, where k could be N. One suggested method of visualization of the retrieved results that may enhance the user's perception of the overall information presented was defined by Moghaddam et al. [1134] as a Splat: the retrieved images were displayed in rank order of their visual similarities, with their placement on the page with respect to the query being dictated by their mutual similarities.

Evaluation of retrieval: An important step in developing a CBIR system is to evaluate its efficiency with respect to the retrieval of
relevant information. Measures of precision and recall have been proposed to assess the performance of general information retrieval systems, based upon the following definitions [1131]:

- Correct detections

$$ A_k = \sum_{n=1}^{k} V_n, \qquad (12.93) $$

  where $k$ is the number of retrieved objects, and $V_n \in \{0, 1\}$, with $V_n = 1$ if the retrieved object is relevant to the query and $V_n = 0$ if it is irrelevant. In the present application, a relevant object is a retrieved benign mass for a benign query sample; a retrieved malignant tumor would be considered irrelevant. In the case of a malignant query sample, a retrieved benign mass would be irrelevant and a malignant tumor would be relevant.

- False alarms

$$ B_k = \sum_{n=1}^{k} (1 - V_n). \qquad (12.94) $$

- Misses

$$ M_k = \left( \sum_{n=1}^{N} V_n \right) - A_k, \qquad (12.95) $$

  where $N$ is the total number of objects in the database.

- Correct dismissals

$$ D_k = \left( \sum_{n=1}^{N} (1 - V_n) \right) - B_k. \qquad (12.96) $$

- Recall, defined as the ratio of the number of relevant retrieved objects to all relevant objects in the database, and computed as

$$ R_k = \frac{A_k}{A_k + M_k}. \qquad (12.97) $$

- Precision, defined as the ratio of the number of relevant retrieved objects to all retrieved objects, and computed as

$$ P_k = \frac{A_k}{A_k + B_k}. \qquad (12.98) $$

- Fallout, defined as the ratio of the number of retrieved irrelevant objects to all irrelevant objects in the database, and computed as

$$ F_k = \frac{B_k}{B_k + D_k}. \qquad (12.99) $$

The following plots may be used to evaluate the effectiveness of CBIR systems:

- Retrieval effectiveness: precision versus recall.
- ROC: correct detections versus false alarms.
- Relative operating characteristics: correct detections versus fallout.
- Response ratio: $B_k / A_k$ versus $A_k$.

In general, an effective CBIR system will demonstrate high precision for all values of recall [1131].

Results of CBIR with breast masses: Alto et al. [529] applied a content-based retrieval algorithm to the 57 masses and their contours shown in Figures 12.4 and 12.5. The retrieval algorithm uses the Euclidean distance between a query sample's
feature vector and the feature vector of each of the remaining masses in the database, and rank-orders the masses corresponding to the vectors that are most similar to the query vector (that is, the shortest Euclidean distance). The rank-ordered masses are presented to the user, annotated with the biopsy-proven diagnosis for each retrieved mass. The 57 masses were each used, in turn, as the query sample.

Figure 12.30 shows the contours of the 57 masses rank-ordered by the Euclidean distance from the origin in the three-feature space of $[f_{cc}, cf, SI]$. This is equivalent to sorting the contours by the magnitudes of the feature vectors $[f_{cc}, cf, SI]$. Observe that the use of three shape factors has led to a more comprehensive characterization of shape roughness than only one ($f_{cc}$) as in Figure 12.5, resulting in a different order of sorting.

Figures 12.31 through 12.34 show the retrieval results for the four masses illustrated in Figure 12.2 using various feature vectors including $f_{cc}$, $A$, and $F_8$. (In each case, the first mass at the left is the query sample. The retrieved samples are arranged in the increasing order of distance from the query sample from left to right. The contour of the mass in each ROI is provided above the ROI.)
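The retrieval step described above (rank-ordering the remaining masses by Euclidean distance from the query) and the measures of Equations 12.93 to 12.99 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithm of Alto et al.: the function names, the case identifiers, and the feature values are hypothetical, and the query itself is excluded from the database when counting $N$, which is one reading of "each of the remaining masses".

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def retrieve(query_id, database, k):
    """Return the ids of the k objects nearest to the query (query excluded).

    database maps an id to a (feature_vector, diagnosis) pair.
    """
    query_vector = database[query_id][0]
    ranked = sorted(
        (euclidean(query_vector, vector), object_id)
        for object_id, (vector, _) in database.items()
        if object_id != query_id
    )
    return [object_id for _, object_id in ranked[:k]]


def retrieval_measures(query_id, database, retrieved):
    """Compute A_k, B_k, M_k, D_k and recall, precision, fallout."""
    query_label = database[query_id][1]
    pool = [oid for oid in database if oid != query_id]  # assumption: N excludes the query
    relevant_total = sum(1 for oid in pool if database[oid][1] == query_label)
    irrelevant_total = len(pool) - relevant_total

    a = sum(1 for oid in retrieved if database[oid][1] == query_label)  # correct detections
    b = len(retrieved) - a           # false alarms
    m = relevant_total - a           # misses
    d = irrelevant_total - b         # correct dismissals
    return {
        "A": a, "B": b, "M": m, "D": d,
        "recall": a / (a + m) if a + m else 0.0,
        "precision": a / (a + b) if a + b else 0.0,
        "fallout": b / (b + d) if b + d else 0.0,
    }


# Hypothetical two-feature database (ids and values are made up).
db = {
    "b1": ((0.10, 0.10), "benign"),
    "b2": ((0.20, 0.10), "benign"),
    "b3": ((0.15, 0.30), "benign"),
    "m1": ((0.80, 0.90), "malignant"),
    "m2": ((0.90, 0.70), "malignant"),
    "m3": ((0.30, 0.25), "malignant"),
}

top = retrieve("b1", db, k=3)
print(top)
print(retrieval_measures("b1", db, top))
```

Sorting the full list of distances is adequate for a database of this size; for much larger collections an indexed nearest-neighbor search would be a natural alternative.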
The results indicate that the masses retrieved and their sequence depend upon the features used to characterize and index the masses. Regardless, all of the results illustrated clearly lead to decisions that agree with the known diagnoses of the query samples, except for the case in Figure 12.33 (b). Although the texture and edge-sharpness measures resulted in poor classification accuracies on their own, the results of retrieval using both of the features with the shape factor $f_{cc}$ indicate the need to include these measures so as to provide a broader scope of representation of radiographic features than shape complexity alone.

FIGURE 12.30 Contours of 57 breast masses, including 37 benign masses and 20 malignant tumors. The contours are arranged in the order of increasing magnitude of the feature vector $[f_{cc}, cf, SI]$, which is given next to each sample (from 0.12 for b161lc95 to 1.4 for m64lc97). Note that the masses and their contours are of widely differing size, but have been scaled to the same size in the illustration. For details regarding the case identifiers, see Figure 12.2. See also Figure 12.5. Figure courtesy of H. Alto [528].

The results of
content-based retrieval, as illustrated in Figures 12.31 to 12.34, lend themselves easily to pattern classification with the k-NN method. The k-NN method may be applied by simple visual inspection of the first k cases in the results of retrieval: the classification of the query sample is made based upon the known classification of the majority of the first k objects. Alto et al. applied the k-NN method to the retrieval results with k = ... and 11. Correct detections in these cases refer to the retrieval of at least ... or 6, respectively, correct cases by virtue of their diagnosis corresponding to the known diagnosis of the query (test) sample. The results of k-NN analysis and retrieval precision are presented in Table 12.13 for k = ... and .... The high levels of classification accuracy and retrieval precision with the use of shape factors indicate the importance of shape in the analysis of breast masses and tumors. A study by Sahiner et al. [428] has also indicated the importance of shape parameters in the classification of breast masses and tumors. See Zheng et al. [1135] for the application of CBIR to image analysis in pathology (histology).

FIGURE 12.31: Content-based retrieval with the circumscribed benign query sample b145lc95 (a) using the shape factor fcc only, (b) using acutance A only, and (c) using the three features [fcc, A, F8]. For details of case identification, see Figure 12.2. Reproduced with permission from H. Alto, R.M. Rangayyan, and J.E.L. Desautels, "Content-based retrieval and analysis of mammographic masses", Journal of Electronic Imaging, in press, 2005. © SPIE and IS&T.

FIGURE 12.32: Content-based retrieval with the macrolobulated benign query sample b164ro94. Panels, case identification, and source as in Figure 12.31.

FIGURE 12.33: Content-based retrieval with the microlobulated malignant query sample m51rc97. Panels, case identification, and source as in Figure 12.31.

FIGURE 12.34: Content-based retrieval with the spiculated malignant query sample m55lo97. Panels, case identification, and source as in Figure 12.31.

12.12.3 Extension to telemedicine

The concepts of an indexed atlas and CBIR may be combined with mobile software agents for web-based retrieval and analysis of medical images, telemedicine, and remote medical consultation applications [1085]. Software agents are autonomous, intelligent software objects that can process, analyze, and make decisions about data [1136, 1137]. A mobile agent is a self-contained software program that can move within a computer network and perform tasks for the user. A mobile agent can reduce search time and function with limited computational resources and low-bandwidth communication links: this is accomplished by having the agent process or evaluate the data at the source, and then transmit only the pertinent data to the user. Mobile agents can serve a variety of functions, and may be used to [1136, 1137, 1138, 1139, 1140]: search information residing at remote nodes and report back to the source (information agent); find under-utilized network resources to perform computationally intensive processing tasks (computation agent); and send messages back and forth between clients residing at various network nodes (communication agent).

Figure 12.35 shows a schematic representation of the combined use of an indexed atlas, CBIR, and mobile agents in the context of mammography and CAD of breast cancer. The major strength of mobile agents is that they permit program execution near or at a distributed data source by moving to each site in order to perform a computational task. In addition, mobile-agent systems typically have the characteristics of low network traffic, load balancing, fault tolerance, and asynchronous interaction. Agents can function independently of one another, as well as cooperate to solve problems.

The use of mobile agents with a CBIR system brings specific benefits and difficulties. For example, mobile agents can move to sites with better or more data, and faster computers. They can replicate themselves and use the inherent power of parallelism to improve productivity. The basic strengths of mobile-agent systems include the inherent
parallelism of multiple agents conducting simultaneous searches, parallel searching with intelligent preprocessing, and agent-to-agent communication. Specific difficulties with the mobile-agent paradigm include issues of security, complexity, and control. Security is important when dealing with patient information. Data encryption and restricting the agents to operations within secure networks could address security concerns.

FIGURE 12.35: Use of an indexed atlas, CBIR, and mobile agents in the context of mammography and CAD of breast cancer. Note: "U of C" = University of Calgary, Calgary, Alberta, Canada. Reproduced with permission from H. Alto, R.M. Rangayyan, R.B. Paranjape, J.E.L. Desautels, and H. Bryant, "An indexed atlas of digital mammograms for computer-aided diagnosis of breast cancer", Annales des Telecommunications, 58(5): 820-835, 2003. © GET - Lavoisier.

12.13 Remarks

We have studied how biomedical images may be processed and analyzed to extract quantitative features that may be used to classify the images as well as lead toward diagnostic decisions. The practical development and application of such techniques is usually hampered by a number of limitations related to the extent of discriminant information present in the images selected for analysis, as well as the limitations of the features designed and computed. Artifacts inherent in the images or caused by the image acquisition systems impose further limitations.

The subject of pattern classification is a vast area by itself [402, 1086, 401, 1087]. The topics presented in this chapter provide a brief introduction to the subject. A pattern classification system that is designed with limited data and information about the chosen images and features will provide results that should be interpreted with due care. Above all, it should be borne in mind that the final diagnostic decision requires far more information than that provided by images and image analysis: this aspect is best left to the physician or health-care specialist in the spirit of computer-aided diagnosis.

12.14 Study Questions and Problems

Selected data files related to some of the problems and exercises are available at the site www.enel.ucalgary.ca/People/Ranga/enel697

The prototype vectors of two classes of images are specified as Class 1: [... 0.5]^T, and Class 2: [... 3]^T. A new sample vector is given as [... 1]^T. Give the equations for two measures of similarity or dissimilarity, compute the measures for the sample vector, and classify the sample into Class 1 or Class 2 using each measure.

In a three-class pattern classification problem, the three decision boundaries are d1(x) = -x1 + x2, d2(x) = x1 + x2 - 5, and d3(x) = -x2 + .... Draw the decision boundaries on a sheet of graph paper. Classify the sample pattern vector x = [... 5]^T using the decision functions.

Two pattern class prototype vectors are given to you as z1 = [... 4]^T and z2 = [10 2]^T. Classify the sample pattern vector x = [... 5]^T using (a) the normalized dot product, and (b) the Euclidean distance.

A researcher makes two measurements per sample on a set of 10 normal and 10 abnormal samples. The set of feature vectors for the normal samples is { 6]TT 22 20]T T 10 T14]T 10T 10]T 24T 24]T 10] 8] 10] 12] 12] }. The set of feature vectors for the abnormal samples is { 10]TT 24 16]TT 16 18]TT 18 20]TT 14 20]TT 20 22] 18 16] 20 20] 18 18] 20 18] }. Plot the scatter diagram of the samples in both classes in the feature-vector space (on a sheet of graph paper). Design a linear decision function to classify the samples with the lowest possible error of misclassification. Write the decision function as a mathematical rule and draw the same on the scatter diagram. How many (if any) samples are misclassified by your decision function?
Mark the misclassified samples on the plot. Two new observation sample vectors are provided to you as x1 = [12 15]^T and x2 = [14 15]^T. Classify the samples using your decision rule. Now, classify the samples x1 and x2 using the k-nearest-neighbor method, with k = .... Measure distances graphically on your graph paper plot and mark the neighbors used in this decision process for each sample. Comment upon the results (whether the two methods resulted in the same classification result or not) and provide reasons.

12.15 Laboratory Exercises and Projects

The file tumor shape1.dat gives the values of several shape factors for 28 benign masses and 26 malignant tumors; the contours are illustrated in Figure 6.27. See Chapter ... for details regarding the methods; see the file tumor shape1.txt for details regarding the data file. Select the three shape factors (SI, fcc, cf) and form feature vectors for each case. Using each feature vector as the test case and the remaining vectors in the dataset as the training set, classify each case as benign or malignant using (a) the Euclidean distance, (b) the Manhattan distance, and (c) the Mahalanobis distance. Compare the results with the classification provided in the data file and determine the TPF, TNF, FPF, and FNF. Comment upon the performance of the three distance measures.

Repeat the experiment with the data in the file tumor shape2.dat for 37 benign masses and 20 malignant tumors; the contours are illustrated in Figure 12.5. See the file tumor shape2.txt for details regarding the data file. Discuss the differences in the results you obtain with the two datasets.

Using the data in the preceding problem, classify each case as benign or malignant using the k-NN method, with k = ... and .... Use the Euclidean distance. Comment upon the results.

The files mfc ben.dat and mfc mal.dat give the values of the three shape factors (mf, ff, cf) for 64 benign calcifications and 79 malignant calcifications, respectively. (See Chapter ... for details.) Design a pattern classification system using the Mahalanobis distance and evaluate its performance in terms of the TPF, TNF, FPF, and FNF.

[...]

... boundary between the two classes. Figure courtesy of F.J. Ayres.

FIGURE 12.11: Second iteration of the K-means algorithm. Axes: fractional concavity fcc and spiculation index SI/2. Details as in Figure 12.10. Figure courtesy of F.J. Ayres.

... Third iteration of the K-means algorithm. Details as in Figure 12.10. Figure courtesy of F.J. Ayres.

FIGURE 12.13: Fourth iteration of the K-means algorithm. Axes: fractional concavity fcc and spiculation index SI/2. Details as in Figure 12.10. Figure courtesy of F.J. Ayres.

... unsupervised pattern classification, and may be solved by cluster-seeking methods.

12.3.1 Cluster-seeking methods

Given a set of feature vectors, we may examine them for the formation of inherent groups or clusters. This is a simple task in the case of 2D vectors, where we may plot them, visually identify groups, and label each group with a pattern class. Allowance may have to be made to assign the same class ...

... and z2, respectively. If D31 and D32 are both greater than ..., start a new cluster with its center as z3 = x3; otherwise, assign x3 to the domain of the closer cluster.

5. Continue to apply Steps 3 and 4 by computing and checking the distance from every new (unclassified) pattern vector to every established cluster center and applying the assignment or cluster-creation rule.

6. Stop when every given pattern vector ...
of occurrence of an event y; p(y) is used to represent the PDF of a random variable y. Probabilities and PDFs involving multidimensional feature vectors are multivariate functions with dimension equal to that of the feature vector.] Bayes rule shows how observing the sample x changes the a priori probability P(Ci) to the a posteriori probability P(Ci | x). In other words, Bayes rule provides a mechanism ...

... benign class prototype (mean) is indicated by the solid diamond; that for the malignant class is indicated by the * symbol. The dashed and dash-dot contours indicate two constant-Mahalanobis-distance contours (level sets) each for the two Gaussian PDF models (see Equation 12.18) for the malignant and benign classes, respectively. The solid contour indicates the decision boundary, as given by Equation 12.53 ... F.J. Ayres.

FIGURE 12.14: Final state of the K-means algorithm after the fifth iteration. Axes: fractional concavity fcc and spiculation index SI/2. Details as in Figure 12.10. Figure courtesy of F.J. Ayres.

A classifier ...

FIGURE 12.6: Plot of the 3D feature-vector space (fcc, A, F8) for the set of 57 masses in Figure 12.4. Axes: fractional concavity, acutance, and sum entropy. 'o': benign masses (37); '...': malignant tumors (20). Reproduced with permission from H. Alto, R.M. Rangayyan, and J.E.L. Desautels, "Content-based retrieval and analysis of mammographic ...
above, it would be appropriate to verify the Gaussian nature of the PDFs of the variables on hand by conducting statistical tests [168, 1087]. Furthermore, it would be necessary to derive or estimate the mean vector and covariance matrix for each class; sample statistics computed from a training set may serve this purpose.

Example: Figure 12.15 shows plots ...

... probability functions for the benign (solid line) and malignant (dash-dot line) classes are also shown. Equal prior probabilities of 0.5 were used for the two classes. See also Figure 12.16. Figure courtesy of F.J. Ayres.

FIGURE 12.16: Plot with fractional concavity fcc on the horizontal axis and p(fcc | Ci) and P(Ci | fcc) on the vertical axes ...
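As an illustration of how sample statistics, Gaussian PDF models, and equal prior probabilities combine under Bayes rule, the following Python sketch computes P(Ci | x) for a single feature. It is a simplified sketch under stated assumptions, not the procedure used to produce the figures: it treats the univariate case (a mean and standard deviation per class instead of a mean vector and covariance matrix), and the function names and training values are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Univariate Gaussian PDF p(x | C) with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def sample_mean_std(values):
    """Sample mean and unbiased sample standard deviation of a training set."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var)

def posteriors(x, class_params, priors):
    """Bayes rule: P(Ci | x) = p(x | Ci) P(Ci) / sum over j of p(x | Cj) P(Cj)."""
    joint = {c: gaussian_pdf(x, *class_params[c]) * priors[c] for c in class_params}
    total = sum(joint.values())
    return {c: j / total for c, j in joint.items()}

# Hypothetical training values of one feature (for example, fcc) per class.
training = {
    "benign": [0.05, 0.10, 0.12, 0.18, 0.20],
    "malignant": [0.45, 0.50, 0.55, 0.60, 0.65],
}
class_params = {c: sample_mean_std(v) for c, v in training.items()}
priors = {"benign": 0.5, "malignant": 0.5}  # equal prior probabilities

# Classify a new sample by the larger a posteriori probability.
post = posteriors(0.15, class_params, priors)
label = max(post, key=post.get)
print(post, label)
```

With equal priors, the class with the larger posterior is simply the class whose PDF is larger at x; unequal priors would shift the decision point toward the less probable class. Before relying on such a model, the Gaussian assumption should be checked against the training data, as the text recommends.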
