

Classification Systems in Orthopaedics

Donald S. Garbuz, MD, MHSc, FRCSC, Bassam A. Masri, MD, FRCSC, John Esdaile, MD, MPH, FRCPC, and Clive P. Duncan, MD, FRCSC

Dr. Garbuz is Assistant Professor, Department of Orthopaedics, University of British Columbia, Vancouver, BC, Canada. Dr. Masri is Associate Professor and Head, Division of Reconstructive Orthopaedics, University of British Columbia. Dr. Esdaile is Professor and Head, Division of Rheumatology, University of British Columbia. Dr. Duncan is Professor and Chairman, Department of Orthopaedics, University of British Columbia. Reprint requests: Dr. Garbuz, Laurel Pavilion, Third Floor, 910 West Tenth Avenue, Vancouver, BC, Canada V5Z 4E3. Copyright 2002 by the American Academy of Orthopaedic Surgeons.

Abstract

Classification systems help orthopaedic surgeons characterize a problem, suggest a potential prognosis, and offer guidance in determining the optimal treatment method for a particular condition. Classification systems also play a key role in the reporting of clinical and epidemiologic data, allowing uniform comparison and documentation of like conditions. A useful classification system is reliable and valid. Although the measurement of validity is often difficult and sometimes impractical, reliability, as summarized by intraobserver and interobserver reliability, is easy to measure and should serve as a minimum standard for validation. Reliability is measured by the kappa value, which distinguishes true agreement of various observations from agreement due to chance alone. Some commonly used classifications of musculoskeletal conditions have not proved to be reliable when critically evaluated.

J Am Acad Orthop Surg 2002;10:290-297

Classifications of musculoskeletal conditions have at least two central functions. First, accurate classification characterizes the nature of a problem and then guides treatment decision making, ultimately improving outcomes. Second, accurate classification establishes an expected outcome for the natural history of a condition or injury, thus forming a basis for uniform reporting of results for various surgical and nonsurgical treatments. This allows the comparison of results from different centers purportedly treating the same entity.

A successful classification system must be both reliable and valid. Reliability reflects the precision of a classification system; in general, it refers to interobserver reliability, the agreement between different observers. Intraobserver reliability is the agreement of one observer's repeated classifications of an entity. The validity of a classification system reflects the accuracy with which the classification system describes the true pathologic process. A valid classification system correctly categorizes the attribute of interest and accurately describes the actual process that is occurring.1

To measure or quantify validity, the classification of interest must be compared to some "gold standard." If the surgeon is classifying bone stock loss prior to revision hip arthroplasty, the gold standard could potentially be intraoperative assessment of bone loss. Validation of the classification system would require a high correlation between the preoperative radiographs and the intraoperative findings. In this example, the radiographic findings would be considered "hard" data because different observers can confirm the radiographic findings. Intraoperative findings, on the other hand, would be considered "soft" data because independent confirmation of this intraoperative assessment is often impossible. This problem with the validation phase affects many commonly used classification systems that are based on radiographic criteria, and it introduces the element of observer bias to the validation process. Because of the difficulty of measuring validity, it is critical that classification systems have at least a high degree of reliability.
Assessment of Reliability

Classifications and measurements in general must be reliable to be assessed as valid. However, because confirming validity is difficult, many commonly used classification systems can be shown to be reliable yet not valid. On preoperative radiographs of a patient with a hip fracture, for example, two observers may categorize the fracture as Garden type 3. This measurement is reliable because of interobserver agreement. However, if the intraoperative findings are of a Garden type 4 fracture, then the classification on radiographs, although reliable, is not valid (ie, is inaccurate). A minimum criterion for the acceptance of any classification or measurement, therefore, is a high degree of both interobserver and intraobserver reliability. Once a classification system has been shown to have acceptable reliability, testing for validity is appropriate. If the degree of reliability is low, however, the classification system will have limited utility.

Initial efforts to measure reliability looked only at observed agreement, the percentage of times that different observers categorized their observations the same. This concept is illustrated in Figure 1, a situation in which the two surgeons agree 70% of the time. In 1960, Cohen2 introduced the kappa value (or kappa statistic) as a measure to assess agreement that occurred above and beyond that related to chance alone. Today the kappa value and its variants are the most accepted methods of measuring observer agreement for categorical data.

Figure 1 demonstrates how the kappa value is used and how it differs from the simple measurement of observed agreement. In this hypothetical example, observed agreement is calculated as the percentage of times both surgeons agree whether fractures were displaced or nondisplaced; it does not take into account the fact that they may have agreed by chance alone. To calculate the percentage of chance agreement, it is assumed that each surgeon will choose a category independently of the other. The marginal totals are then used to calculate the agreement expected by chance alone; in Figure 1, this is 0.545. To calculate the kappa value, the observed agreement (Po) minus the chance agreement (Pc) is divided by the maximum possible agreement that is not related to chance (1 − Pc):

    κ = (Po − Pc) / (1 − Pc)

Figure 1  Hypothetical example of agreement between two orthopaedic surgeons classifying radiographs of subcapital hip fractures.

                         Surgeon No. 2
    Surgeon No. 1    Displaced   Nondisplaced   Total
    Displaced            50           15          65
    Nondisplaced         15           20          35
    Total                65           35         100

    Observed agreement:       Po = (50 + 20)/100 = 0.70
    Chance agreement:         Pc = (65/100)(65/100) + (35/100)(35/100) = 0.545
    Agreement beyond chance:  κ = (0.70 − 0.545)/(1 − 0.545) = 0.34

This example is the simplest case of two observers and two categories. The kappa value can be used for multiple categories and multiple observers in a similar manner.
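For readers who want to verify the arithmetic, the short Python sketch below reproduces the Figure 1 calculation from its 2 × 2 contingency table; the function and variable names are illustrative only and do not come from the article.

```python
# Sketch: Cohen's kappa for two observers, reproducing the Figure 1 example.
# Rows are Surgeon No. 1's categories, columns are Surgeon No. 2's.

def cohens_kappa(table):
    """Return (observed agreement Po, chance agreement Pc, kappa) for a
    square contingency table of counts."""
    n = sum(sum(row) for row in table)                 # total number of cases
    k = len(table)                                     # number of categories
    po = sum(table[i][i] for i in range(k)) / n        # observed agreement
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of the marginal proportions, summed over categories.
    pc = sum((row_totals[i] / n) * (col_totals[i] / n) for i in range(k))
    return po, pc, (po - pc) / (1 - pc)

# Figure 1: displaced vs nondisplaced subcapital hip fractures
figure1 = [[50, 15],
           [15, 20]]
po, pc, kappa = cohens_kappa(figure1)
print(f"Po = {po:.3f}, Pc = {pc:.3f}, kappa = {kappa:.2f}")
# Prints Po = 0.700, Pc = 0.545, kappa = 0.34, matching the figure.
```

The same formula applies to any number of categories; agreement among more than two observers is usually summarized with a multi-rater variant of kappa, one of which is sketched further below.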
In analyzing categorical data, which the kappa value is designed to measure, there will be cases in which disagreement between some categories does not have as profound an impact as disagreement between other categories. For this reason, categorical data are divided into two types: nominal (unranked), in which all categorical differences are equally important, and ordinal (ranked), in which disagreement between some categories has a more profound impact than disagreement between other categories. An example of nominal data is eye color; an example of ordinal data is the AO classification, in which each subsequent class denotes an increase in the severity of the fracture.

The kappa value can be unweighted or weighted depending on whether the data are nominal or ordinal. Unweighted kappa values should always be used with unranked data. When ordinal data are being analyzed, however, a decision must be made whether or not to weight the kappa value. Weighting has the advantage of giving some credit to partial agreement, whereas the unweighted kappa value treats all disagreements as equal. A good example of appropriate use of the weighted kappa value is the study by Kristiansen et al3 of interobserver agreement in the Neer classification of proximal humeral fractures. This well-known classification has four categories of fractures, from nondisplaced or minimally displaced to four-part fractures. Weighting was appropriate in this case because disagreement between a two-part and a three-part fracture is not as serious as disagreement between a nondisplaced fracture and a four-part fracture. By weighting kappa values, one can account for the different levels of importance between levels of disagreement. If a weighted kappa value is determined to be appropriate, the weighting scheme must be specified in advance because the weights chosen will dramatically affect the kappa value. In addition, when reporting studies that have used a weighted kappa value, the weighting scheme must be documented clearly. One problem with weighting is that, without uniform weighting schemes, it is difficult to generalize across studies. A larger sample size narrows the confidence interval around the kappa value, but it does not by itself change the number of categories being compared.
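To make the effect of weighting concrete, the sketch below computes a linearly weighted kappa for a hypothetical four-category ordinal classification. The table values are invented for illustration and are not data from Kristiansen et al3; linear disagreement weights are one common choice, not the only one.

```python
# Sketch: linearly weighted kappa for ordinal categories (eg, a four-level
# fracture classification). Partial disagreement is penalized in proportion
# to how far apart the two chosen categories are.

def weighted_kappa(table):
    """table[i][j] = cases rated category i by observer 1 and j by observer 2.
    Uses linear disagreement weights |i - j| / (k - 1)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    observed_penalty = 0.0
    expected_penalty = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)   # 0 on the diagonal, 1 at maximal disagreement
            observed_penalty += w * table[i][j] / n
            expected_penalty += w * (row_totals[i] / n) * (col_totals[j] / n)
    return 1 - observed_penalty / expected_penalty

# Invented ratings of 100 fractures into four ordered categories by two observers
ratings = [[20,  5,  1,  0],
           [ 4, 25,  6,  1],
           [ 1,  5, 15,  4],
           [ 0,  1,  3,  9]]
print(f"linearly weighted kappa = {weighted_kappa(ratings):.2f}")
```

With quadratic weights, w = ((i − j)/(k − 1))², larger disagreements are penalized even more heavily; whichever scheme is chosen, it should be specified in advance and reported, as noted above.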
Although the kappa value has become the most widely accepted method of measuring observer agreement, its interpretation is difficult. Values obtained range from −1.0 (complete disagreement) through 0.0 (chance agreement) to 1.0 (complete agreement). Hypothesis testing has limited usefulness when the kappa value is used because it allows the researcher to see only whether the obtained agreement is significantly different from zero, that is, from chance agreement, revealing nothing about the extent of agreement. Consequently, when kappa values are obtained for assessing classifications of musculoskeletal conditions, hypothesis testing has almost no role. As Kraemer stated, "It is insufficient to demonstrate merely the nonrandomness of diagnostic procedures; one requires assurance of substantial agreement between observations."4 This statement is equally applicable to classifications used in orthopaedics.

To assess the strength of agreement obtained with a given kappa value, two different benchmarks have gained widespread use in orthopaedics and other branches of medicine. The most widely adopted criteria for assessing the extent of agreement are those of Landis and Koch:5 κ > 0.80, almost perfect; κ = 0.61 to 0.80, substantial; κ = 0.41 to 0.60, moderate; κ = 0.21 to 0.40, fair; κ = 0.00 to 0.20, slight; and κ < 0.00, poor. Although these criteria have gained widespread acceptance, the values were chosen arbitrarily and were never intended to serve as general benchmarks. The criteria of Svanholm et al,6 while less widely used, are more stringent than those of Landis and Koch and are perhaps more practical for use in medicine. Like Landis and Koch, Svanholm et al chose arbitrary values: κ ≥ 0.75, excellent; κ = 0.51 to 0.74, good; and κ ≤ 0.50, poor.
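As a convenience when screening reported values, the small helper below maps a kappa value onto both sets of (admittedly arbitrary) benchmarks; the cut points are exactly those quoted above.

```python
# Sketch: grade a kappa value against the two benchmark schemes quoted above.

def landis_koch(kappa):
    """Descriptive labels of Landis and Koch.5"""
    if kappa < 0.00:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

def svanholm(kappa):
    """Descriptive labels of Svanholm et al.6"""
    if kappa >= 0.75:
        return "excellent"
    if kappa > 0.50:
        return "good"
    return "poor"

for k in (0.34, 0.62, 0.86):
    print(f"kappa = {k:.2f}: Landis-Koch '{landis_koch(k)}', Svanholm '{svanholm(k)}'")
```

As the next paragraph cautions, these labels are only a rough guide; the actual kappa value, the observed agreement, and the prevalence should always be reported alongside them.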
When reviewing reports of studies on agreement of classification systems, readers should look at the actual kappa value and not just at the arbitrary categories described here. Although the interpretation of a given kappa value is difficult, it is clear that the higher the value, the more reliable the classification system. When interpreting a given kappa value, the impact of prevalence and bias must also be considered; Feinstein and Cicchetti7,8 refer to these as the two paradoxes of high observed agreement and low kappa values. Most important is the effect that prevalence (the base rate) can have on the kappa value. Prevalence refers to the number of times a given category is selected. In general, as the proportion of cases in one category approaches 0% or 100%, the kappa value will decrease for any given observed agreement. In Figure 2, the same two hypothetical orthopaedic surgeons as in Figure 1 review and categorize 100 different radiographs. The observed agreement is the same as in Figure 1, 0.70. However, the agreement beyond chance (the kappa value) is only 0.06. The main difference between Figures 1 and 2 is the marginal totals, that is, the underlying prevalence of displaced and nondisplaced fractures, defined as the proportion of each in the sample. If one category has a very high prevalence, there can be paradoxically high observed agreement yet low kappa values (although to some extent this is a result of the way chance agreement is calculated). The effect of prevalence on kappa values must be kept in mind when interpreting studies of observer variability. The prevalence, observed agreement, and kappa values should be clearly stated in any report on classification reliability. Certainly a study with a low kappa value and an extreme prevalence rate will not represent the same level of disagreement as a low kappa value in a sample with a balanced prevalence rate.

Figure 2  Hypothetical example of agreement between two orthopaedic surgeons classifying radiographs, with a higher prevalence of displaced fractures than in Figure 1.

                         Surgeon No. 2
    Surgeon No. 1    Displaced   Nondisplaced   Total
    Displaced            65           15          80
    Nondisplaced         15            5          20
    Total                80           20         100

    Observed agreement:       Po = (65 + 5)/100 = 0.70
    Chance agreement:         Pc = (80/100)(80/100) + (20/100)(20/100) = 0.68
    Agreement beyond chance:  κ = (0.70 − 0.68)/(1 − 0.68) = 0.06
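Running the same calculation on both tables shows the prevalence paradox directly: the observed agreement is identical, yet kappa collapses. The sketch repeats the small kappa function from the earlier example so that it stands alone.

```python
# Sketch: the prevalence paradox, ie, same observed agreement, very different kappa.

def cohens_kappa(table):
    n = sum(sum(row) for row in table)
    k = len(table)
    po = sum(table[i][i] for i in range(k)) / n
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]
    pc = sum((row[i] / n) * (col[i] / n) for i in range(k))
    return po, pc, (po - pc) / (1 - pc)

balanced = [[50, 15], [15, 20]]   # Figure 1: 65% of fractures displaced
skewed   = [[65, 15], [15,  5]]   # Figure 2: 80% of fractures displaced

for name, table in (("Figure 1", balanced), ("Figure 2", skewed)):
    po, pc, kappa = cohens_kappa(table)
    print(f"{name}: Po = {po:.2f}, Pc = {pc:.3f}, kappa = {kappa:.2f}")
# Kappa is about 0.34 for Figure 1 but only about 0.06 for Figure 2,
# even though the observed agreement is 0.70 in both cases.
```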
Bias (systematic difference) is the second factor that can affect the kappa value. Bias has a lesser effect than does prevalence, however. As bias increases, kappa values paradoxically will increase, although this is usually seen only when kappa values are low. To assess the extent of bias in observer agreement studies, Byrt et al9 have suggested measuring a bias index, but this has not been widely adopted. Although the kappa value, influenced by prevalence and bias, measures agreement, it is not the only measure of the precision of a classification system. Many other factors can affect both observer agreement and disagreement.
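For a 2 × 2 table, Byrt et al9 describe simple indices that make the prevalence and bias effects visible at a glance. The sketch below is a minimal rendering of those indices; the a/b/c/d cell labels follow common convention rather than anything in the article.

```python
# Sketch (after Byrt et al9): prevalence index and bias index for a 2x2 table.
# a = both observers chose category 1, d = both chose category 2,
# b and c = the two discordant cells.

def prevalence_and_bias_index(a, b, c, d):
    n = a + b + c + d
    prevalence_index = (a - d) / n   # how unbalanced the two categories are
    bias_index = (b - c) / n         # systematic difference between the observers
    return prevalence_index, bias_index

# Figure 2 table: a = 65, b = 15, c = 15, d = 5
pi, bi = prevalence_and_bias_index(65, 15, 15, 5)
print(f"prevalence index = {pi:.2f}, bias index = {bi:.2f}")
# A prevalence index far from 0 (here 0.60) flags the setting in which high
# observed agreement can coexist with a low kappa; a bias index of 0 means the
# two observers used the categories at the same overall rate.
```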
Sources of Disagreement

As mentioned, any given classification system must have a high degree of reliability or precision. The degree of observer agreement obtained is affected by many factors, including the precision of the classification system. To improve reliability, these other sources of disagreement must be understood and minimized. Once this is done, the reliability of the classification system itself can be accurately estimated. Three sources of disagreement or variability have been described:1,10 the clinician (observer), the patient (examined), and the procedure (examination). Each of these can affect the reliability of classifications in clinical practice and in studies that examine classifications and their reliability.

Clinician variability arises from the process by which information is observed and interpreted. The information can be obtained from different sources, such as the history, physical examination, or radiographic examination. These raw data are often then converted into categories. Wright and Feinstein1 called the criteria used to put the raw data into categories conversion criteria. Disagreement can occur when the findings are observed or when they are organized into the arbitrary categories commonly used in classification systems. An example of variability in the observational process is the measurement of the center edge angle of Wiberg: inconsistent choice of the edge of the acetabulum will lead to variations in the measurements obtained (Fig. 3).

As a result of the emphasis on arbitrary criteria for the various categories in a classification system, an observer may make measurements that do not meet all of the criteria of a category. The observer will then choose the closest matching category. Another observer may disagree about the choice of closest category and choose another. Such variability in the use of conversion criteria is common and is the result of trying to convert the continuous spectrum of clinical data into arbitrary and finite categories.

The particular state being measured will vary depending on when and how it is measured. This results in patient variability. A good example is the variation obtained in measuring the degree of spondylolisthesis when the patient is in a standing compared with a supine position.11 To minimize patient variability, examinations should be performed in a consistent, standardized fashion.

The final source of variability is the procedure itself. This often refers to technical aspects, such as the taking of a radiograph. If the exposures of two radiographs of the same patient's hip are different, for example, then classification of the degree of osteopenia, which depends on the degree of exposure, will differ as a result of the variability. Standardization of technique will help reduce this source of variability.

Figure 3  Anteroposterior radiograph of a dysplastic hip, showing the difficulty in defining the true margin of the acetabulum when measuring the center edge angle of Wiberg (solid lines). The apparent lateral edge of the acetabulum (arrow) is really a superimposition of the true anterior and posterior portions of the superior rim of the acetabulum. Inconsistent choice among observers may lead to errors in measurement.

These three sources of variation apply to all measurement processes. Variability is therefore not solely a problem of the classification system itself; improving the system is only one of the ways in which the reliability and utility of classification systems can be improved. Understanding these sources of measurement variability and how to minimize them is critically important.1,10

Assessment of Commonly Used Orthopaedic Classification Systems

Although many classification systems have been widely adopted and frequently used in orthopaedic surgery to guide treatment decisions, few have been scientifically tested for their reliability. A high degree of reliability or precision should be a minimum requirement before any classification system is adopted. The results of several recent studies that have tested various orthopaedic classifications for their intraobserver and interobserver reliability are summarized in Table 1.12-21

In general, the reliability of the listed classification systems would be considered low and probably unacceptable. Despite this lack of reliability, these systems are commonly used. Although Table 1 lists only a limited number of systems, they were chosen because they have been subjected to reliability testing. Many other classification systems commonly cited in the literature have not been tested; consequently, there is no evidence that they are or are not reliable. In fact, most classification systems for medical conditions and injuries that have been tested have levels of agreement that are considered unacceptably low.22,23 There is no reason to believe that the classification systems that have not been tested would fare any better. Four of the studies listed in Table 1 are discussed in detail to highlight the methodology that should be used to assess the reliability of any classification system: the AO classification of distal radius fractures,15 the classification of acetabular bone defects in revision hip arthroplasty,13 the Severin classification of congenital dislocation of the hip,14 and the Vancouver classification of periprosthetic fractures of the femur.12

Table 1  Intraobserver and Interobserver Agreement in Orthopaedic Classification Systems

Study | Classification | Assessors | Intraobserver agreement (%) / κ | Interobserver agreement (%) / κ
Brady et al12 | Periprosthetic femur fractures (Vancouver) | Reconstructive orthopaedic surgeons (including originator), residents | NR / 0.73 to 0.83* | NR / 0.60 to 0.65*
Campbell et al13 | Acetabular bone defect in revision total hip (AAOS26) | Reconstructive orthopaedic surgeons (including originators) | NR / 0.05 to 0.75* | NR / 0.11 to 0.28*
Campbell et al13 | Acetabular bone defect in revision total hip (Gross24) | Reconstructive orthopaedic surgeons (including originators) | NR / 0.33 to 0.55* | NR / 0.19 to 0.62*
Campbell et al13 | Acetabular bone defect in revision total hip (Paprosky25) | Reconstructive orthopaedic surgeons (including originators) | NR / 0.27 to 0.60* | NR / 0.17 to 0.41*
Ward et al14 | Congenital hip dislocation (Severin) | Pediatric orthopaedic surgeons | 45 to 61 / 0.20 to 0.44*, 0.32 to 0.59† | 14 to 61 / −0.01 to 0.42*, 0.05 to 0.55†
Kreder et al15 | Distal radius (AO) | Attending surgeons, fellows, residents, nonclinicians | NR / 0.25 to 0.42* | NR / 0.33*
Sidor et al16 | Proximal humerus (Neer) | Shoulder surgeon, radiologist, residents | 62 to 86 / 0.50 to 0.83* | NR / 0.43 to 0.58*
Siebenrock et al17 | Proximal humerus (Neer) | Shoulder surgeons | NR / 0.46 to 0.71† | NR / 0.25 to 0.51†
Siebenrock et al17 | Proximal humerus (AO/ASIF) | Shoulder surgeons | NR / 0.43 to 0.54† | NR / 0.36 to 0.49†
McCaskie et al18 | Quality of cement grade in THA | Experts in THA, consultants, residents | NR / 0.07 to 0.63* | NR / −0.04*
Lenke et al19 | Scoliosis (King) | Spine surgeons | 56 to 85 / 0.34 to 0.95* | 55 / 0.21 to 0.63*
Cummings et al20 | Scoliosis (King) | Pediatric orthopaedic surgeons, spine surgeons, residents | NR / 0.44 to 0.72* | NR / 0.44*
Haddad et al21 | Femoral bone defect in revision total hip (AAOS,30 Mallory,28 Paprosky et al29) | Reconstructive orthopaedic surgeons | NR / 0.43 to 0.62* | NR / 0.12 to 0.29*

* Unweighted kappa. † Weighted kappa. NR = observed agreement not reported.

Kreder et al15 assessed the reliability of the AO classification of distal radius fractures. This classification system divides fractures into three types based on whether the fracture is extra-articular (type A), partial articular (type B), or complete articular (type C). These fracture types can then be divided into groups, which are further divided into subgroups, giving 27 possible combinations. Thirty radiographs of distal radial fractures were presented to the observers on two occasions. Before classifying the radiographs, the observers attended a 30-minute review of the AO classification; they also had a handout, which they were encouraged to use when classifying the fractures. There were 36 observers in all, including attending surgeons, clinical fellows, residents, and nonclinicians. These groups were chosen to ascertain whether the type of observer had an influence on the reliability of the classification. In this study, an unweighted kappa value was used.
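Studies such as this one involve many observers rating the same set of radiographs. Kreder et al15 report unweighted kappa values by observer group; purely as a general illustration (not necessarily the statistic they computed), the sketch below implements Fleiss' kappa, a common chance-corrected summary of agreement among more than two raters. The example data are invented.

```python
# Sketch: Fleiss' kappa for m raters classifying N cases into k categories.
# ratings[i][j] = number of raters who placed case i into category j.

def fleiss_kappa(ratings):
    n_cases = len(ratings)
    n_raters = sum(ratings[0])   # assumes every case was rated by the same number of raters
    n_cats = len(ratings[0])
    total = n_cases * n_raters

    # Proportion of all assignments falling in each category.
    p_cat = [sum(row[j] for row in ratings) / total for j in range(n_cats)]
    expected = sum(p * p for p in p_cat)   # agreement expected by chance

    # Per-case agreement: agreeing rater pairs divided by all rater pairs.
    per_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    observed = sum(per_case) / n_cases
    return (observed - expected) / (1 - expected)

# Invented example: 5 raters assign each of 6 fractures to AO type A, B, or C.
example = [
    [5, 0, 0],
    [4, 1, 0],
    [0, 4, 1],
    [1, 3, 1],
    [0, 1, 4],
    [0, 0, 5],
]
print(f"Fleiss' kappa = {fleiss_kappa(example):.2f}")
```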
The authors evaluated intraobserver and interobserver reliability for AO type, AO group, and AO subgroup. The criteria of Landis and Koch5 were used to grade the levels of agreement. Interobserver agreement was highest for the initial AO type, and it decreased for groups and subgroups as the number of categories increased. This should be expected because, as the number of categories increases, there is more opportunity for disagreement. Intraobserver agreement showed similar results. Kappa values for AO type ranged from 0.67 for residents to 0.86 for attending surgeons. Again, with more detailed AO subgroups, kappa values decreased progressively. When all 27 categories were included, kappa values ranged from 0.25 to 0.42. The conclusions of this study were that the use of AO types A, B, and C produced levels of reliability that were high and acceptable. However, subclassification into groups and subgroups was unreliable. The clinical utility of using only the three types was not addressed and awaits further study.

Several important aspects of this study, aside from the results, merit mention. This study showed that not only the classification system is tested but also the observer. For any classification system tested, it is important to document the observers' experience because this can substantially affect reliability. One omission in this study15 was the lack of discussion of observed agreement and the prevalence of fracture categories; these factors have a distinct effect on observer variability.

Campbell et al13 looked at the reliability of acetabular bone defect classifications in revision hip arthroplasty. One group of observers included the originators of the classification systems. This is the ultimate way to remove observer bias; however, it lacks generalizability because the originators would be expected to have unusually high levels of reliability. In this study, preoperative radiographs of 33 hips were shown to three different groups of observers on two occasions at least 2 weeks apart. The groups of observers were the three originators, three reconstructive orthopaedic surgeons, and three senior residents. The three classifications assessed were those attributed to Gross,24 Paprosky,25 and the American Academy of Orthopaedic Surgeons.26 The unweighted kappa value was used to assess the level of agreement.

As expected, the originators had higher levels of intraobserver agreement than did the other two observer groups (AAOS, 0.57; Gross, 0.59; Paprosky, 0.75). However, levels of agreement fell markedly when the systems were tested by surgeons other than the originators. This study underscores the importance of the qualifications of the observers in studies that measure reliability. To test the classification system itself, experts would be the initial optimal choice, as was the case in this study.13 However, even if the originators have acceptable agreement, this result should not be generalized. Because most classification systems are developed for widespread use, reliability must be high among all observers for a system to have clinical utility. Hence, although the originators of the classifications of femoral bone loss were not included in a similar study21 at the same center, the conclusions of that study remain valuable with respect to the reliability of femoral bone loss classifications in the hands of orthopaedic surgeons other than the originators.

Ward et al14 evaluated the Severin classification, which is used to assess the radiographic appearance of the hip after treatment for congenital dislocation. This system has six main categories ranging from normal to recurrent dislocation and is reported to be a prognostic indicator.
Despite its widespread acceptance, it was not tested for reliability until 1997. The authors made every effort to test only the classification system by minimizing other potential sources of disagreement. All identifying markers were removed from 56 radiographs of hips treated by open reduction. Four fellowship-trained pediatric orthopaedic surgeons who routinely treated congenital dislocation of the hip independently rated the radiographs. Before classifying the hips, the observers were given a detailed description of the Severin classification. Eight weeks later, three observers repeated the classifying exercise. The radiographs were presented in a different order in an attempt to minimize recall bias. Both weighted and unweighted kappa values were calculated. Observed agreement also was calculated and reported so that the possibility of a high observed agreement with a low kappa value would be apparent. The kappa values, whether weighted or unweighted, were low, usually less than 0.50. The authors of this study used the arbitrary criteria of Svanholm et al6 to grade their agreement and concluded that this classification scheme is unreliable and should not be widely used. This study demonstrated the methodology that should be used when testing classification systems: it eliminated other sources of disagreement and focused on the precision of the classification system itself.

The Vancouver classification of periprosthetic femur fractures is an example of a system that was tested for reliability prior to its widespread adoption and use.12 The first description was published in 1995.27 Shortly afterward, testing began on the reliability and the validity of this system. The methodology was similar to that described in the three previous studies. Reliability was acceptable for the three experienced reconstructive orthopaedic surgeons tested, including the originator. To assess generalizability, three senior residents also were assessed for their intraobserver and interobserver reliability. The kappa values for this group were nearly identical to those of the three expert surgeons. This study confirmed that the Vancouver classification is both reliable and valid. With these two criteria met, this system can be recommended for widespread use and can subsequently be assessed for its value in guiding treatment and outlining prognosis.

Summary

Classification systems are tools for identifying injury patterns, assessing prognoses, and guiding treatment decisions. Many classification systems have been published and widely adopted in orthopaedics without information available on their reliability. Classification systems should consistently produce the same results. A system should, at a minimum, have a high degree of intraobserver and interobserver reliability. Few systems have been tested for this reliability, but those that have been tested generally fall short of acceptable levels of reliability. Because most classification systems have poor reliability, their use to differentiate treatments and suggest outcomes is not warranted. A system that has not been tested cannot be assumed to be reliable. The systems used by orthopaedic surgeons must be tested for reliability, and if a system is not found to be reliable, it should be modified or its use seriously questioned. Improving reliability involves looking at many components of the classification process.1
Methodologies exist to assess classifications, with the kappa value the standard for measuring observer reliability. Once a system is found to be reliable, the next step is to prove its utility. Only when a system is shown to be reliable should it be widely adopted by the medical community. This should not be construed to mean that untested classification systems, or those with disappointing reliability, are without value. Systems are needed to categorize or define surgical problems before surgery in order to plan appropriate approaches and techniques. Classification systems provide a discipline to help define pathology as well as a language to describe that pathology. However, it is necessary to recognize the limitations of existing classification systems and the need to confirm or refine proposed preoperative categories by careful intraoperative observation of the actual findings. Furthermore, submission of classification systems to statistical analysis highlights their inherent flaws and lays the groundwork for their improvement.

References

1. Wright JG, Feinstein AR: Improving the reliability of orthopaedic measurements. J Bone Joint Surg Br 1992;74:287-291.
2. Cohen J: A coefficient of agreement for nominal scales. Educational and Psychological Measurement 1960;20:37-46.
3. Kristiansen B, Andersen UL, Olsen CA, Varmarken JE: The Neer classification of fractures of the proximal humerus: An assessment of interobserver variation. Skeletal Radiol 1988;17:420-422.
4. Kraemer HC: Extension of the kappa coefficient. Biometrics 1980;36:207-216.
5. Landis JR, Koch GG: The measurement of observer agreement for categorical data. Biometrics 1977;33:159-174.
6. Svanholm H, Starklint H, Gundersen HJ, Fabricius J, Barlebo H, Olsen S: Reproducibility of histomorphologic diagnoses with special reference to the kappa statistic. APMIS 1989;97:689-698.
7. Feinstein AR, Cicchetti DV: High agreement but low kappa: I. The problems of two paradoxes. J Clin Epidemiol 1990;43:543-549.
8. Cicchetti DV, Feinstein AR: High agreement but low kappa: II. Resolving the paradoxes. J Clin Epidemiol 1990;43:551-558.
9. Byrt T, Bishop J, Carlin JB: Bias, prevalence and kappa. J Clin Epidemiol 1993;46:423-429.
10. Clinical disagreement: I. How often it occurs and why. Can Med Assoc J 1980;123:499-504.
11. Lowe RW, Hayes TD, Kaye J, Bagg RJ, Luekens CA: Standing roentgenograms in spondylolisthesis. Clin Orthop 1976;117:80-84.
12. Brady OH, Garbuz DS, Masri BA, Duncan CP: The reliability and validity of the Vancouver classification of femoral fractures after hip replacement. J Arthroplasty 2000;15:59-62.
13. Campbell DG, Garbuz DS, Masri BA, Duncan CP: Reliability of acetabular bone defect classification systems in revision total hip arthroplasty. J Arthroplasty 2001;16:83-86.
14. Ward WT, Vogt M, Grudziak JS, Tumer Y, Cook PC, Fitch RD: Severin classification system for evaluation of the results of operative treatment of congenital dislocation of the hip: A study of intraobserver and interobserver reliability. J Bone Joint Surg Am 1997;79:656-663.
15. Kreder HJ, Hanel DP, McKee M, Jupiter J, McGillivary G, Swiontkowski MF: Consistency of AO fracture classification for the distal radius. J Bone Joint Surg Br 1996;78:726-731.
16. Sidor ML, Zuckerman JD, Lyon T, Koval K, Cuomo F, Schoenberg N: The Neer classification system for proximal humeral fractures: An assessment of interobserver reliability and intraobserver reproducibility. J Bone Joint Surg Am 1993;75:1745-1750.
17. Siebenrock KA, Gerber C: The reproducibility of classification of fractures of the proximal end of the humerus. J Bone Joint Surg Am 1993;75:1751-1755.
18. McCaskie AW, Brown AR, Thompson JR, Gregg PJ: Radiological evaluation of the interfaces after cemented total hip replacement: Interobserver and intraobserver agreement. J Bone Joint Surg Br 1996;78:191-194.
19. Lenke LG, Betz RR, Bridwell KH, et al: Intraobserver and interobserver reliability of the classification of thoracic adolescent idiopathic scoliosis. J Bone Joint Surg Am 1998;80:1097-1106.
20. Cummings RJ, Loveless EA, Campbell J, Samelson S, Mazur JM: Interobserver reliability and intraobserver reproducibility of the system of King et al. for the classification of adolescent idiopathic scoliosis. J Bone Joint Surg Am 1998;80:1107-1111.
21. Haddad FS, Masri BA, Garbuz DS, Duncan CP: Femoral bone loss in total hip arthroplasty: Classification and preoperative planning. J Bone Joint Surg Am 1999;81:1483-1498.
22. Koran LM: The reliability of clinical methods, data and judgments (first of two parts). N Engl J Med 1975;293:642-646.
23. Koran LM: The reliability of clinical methods, data and judgments (second of two parts). N Engl J Med 1975;293:695-701.
24. Garbuz D, Morsi E, Mohamed N, Gross AE: Classification and reconstruction in revision acetabular arthroplasty with bone stock deficiency. Clin Orthop 1996;324:98-107.
25. Paprosky WG, Perona PG, Lawrence JM: Acetabular defect classification and surgical reconstruction in revision arthroplasty: A 6-year follow-up evaluation. J Arthroplasty 1994;9:33-44.
26. D'Antonio JA, Capello WN, Borden LS: Classification and management of acetabular abnormalities in total hip arthroplasty. Clin Orthop 1989;243:126-137.
27. Duncan CP, Masri BA: Fractures of the femur after hip replacement. Instr Course Lect 1995;44:293-304.
28. Mallory TH: Preparation of the proximal femur in cementless total hip revision. Clin Orthop 1988;235:47-60.
29. Paprosky WG, Lawrence J, Cameron H: Femoral defect classification: Clinical application. Orthop Rev 1990;19(suppl 9):9-15.
30. D'Antonio J, McCarthy JC, Bargar WL, et al: Classification of femoral abnormalities in total hip arthroplasty. Clin Orthop 1993;296:133-139.
