Safer Surgery 104

threats and increasing the likelihood of further error, creating a cascade that leads to more serious surgical problems and, subsequently, to harm or an adverse event. Threats either predispose to errors that cause minor failures in process, or directly cause minor failures themselves. These minor failures either lead to more threats, and so to more errors, or lead directly to more serious or potentially dangerous major failures. Major failures may expose more threats, create more errors and lead directly to an adverse outcome (Catchpole et al. 2005). Though we did not directly measure surgical outcomes or observe any deaths, in more than 40 cases the Great Ormond Street team observed over 500 minor problems and 8 major problems that represented considerable lapses in the quality of care given and a serious threat to the safety of the patient. The reader is directed to our accompanying chapter in this volume (Catchpole, Chapter 19), which describes the results in orthopaedic surgery.

The multidisciplinary, cross-industry team wondered why some operations went more smoothly than others, and why in some operations even a large number of small problems did not result in serious problems. In part, the research team observed that the escalation from small, seemingly innocuous problems to these sometimes life-threatening situations depended upon the type of operation performed and the risk (or complexity) of the operation. Some operations had more critical stages, so the coincidence of a minor problem with a critical time might also be more likely; certain operations were also more demanding, and thus might be more likely to result in human errors or be more sensitive to the overloading of individual mental and physical capacity.

Figure 7.1 Escalation model of surgical error

Rating Operating Theatre Teams 105
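The escalation model described above can be read as a small directed graph. The sketch below is our own illustration, not part of Catchpole et al. (2005): the node labels and the choice of listing non-repeating paths are assumptions made for clarity. It encodes only the transitions named in the text (threats to errors or minor failures; errors to minor failures; minor failures back to threats and errors or on to major failures; major failures back to threats and errors or on to an adverse outcome) and lists the routes from a threat to an adverse outcome.

```python
from typing import Dict, List

# Transitions named in the escalation model (Catchpole et al. 2005).
# Node labels are illustrative; they are not taken from Figure 7.1.
ESCALATION: Dict[str, List[str]] = {
    "threat": ["error", "minor_failure"],
    "error": ["minor_failure"],
    "minor_failure": ["threat", "error", "major_failure"],
    "major_failure": ["threat", "error", "adverse_outcome"],
    "adverse_outcome": [],
}


def escalation_paths(graph: Dict[str, List[str]], start: str, goal: str) -> List[List[str]]:
    """Return every path from start to goal that visits no node twice (depth-first search)."""
    paths: List[List[str]] = []

    def visit(node: str, path: List[str]) -> None:
        if node == goal:
            paths.append(path)
            return
        for nxt in graph[node]:
            # Skip nodes already on this path, so feedback loops such as
            # minor_failure -> threat do not cause infinite recursion.
            if nxt not in path:
                visit(nxt, path + [nxt])

    visit(start, [start])
    return paths


if __name__ == "__main__":
    for route in escalation_paths(ESCALATION, "threat", "adverse_outcome"):
        print(" -> ".join(route))
```

Under this encoding there are two non-repeating routes (threat, error, minor failure, major failure, adverse outcome; and threat, minor failure, major failure, adverse outcome), and both pass through a major failure. The feedback loops the search skips, such as a minor failure exposing a new threat, are what allow small problems to accumulate before that point.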
The Great Ormond Street team therefore concluded that non-technical skills, which are specifically trained for in aviation to address these types of situations, might also have an influence in surgical care, and so first sought to derive a simple scale for evaluating this hypothesis. NOTECHS was designed through a pan-European collaboration between airlines and academics to provide a generic structure for non-technical skills in aviation training and to allow consistent assessment across different national and organisational cultures (Avermaete and van Kruijsen 1998). It was because of this broad applicability, demonstrated utility (Avermaete and van Kruijsen 1998, Flin et al. 2003), operational validation (O’Connor et al. 2002, Lodge et al. 2001) and the success with which the method of developing the scale had previously been adapted to medical specialties (Fletcher et al. 2003, 2004) that this behavioural marker methodology was selected for adaptation to the surgical environment.

The NOTECHS system consists of elements of behaviour grouped into four categories: two social skill categories (teamwork and cooperation; leadership and management) and two cognitive skill categories (situation awareness; problem-solving and decision-making). It was adapted for use with surgical teams following consultation with two cardiac surgeons, one vascular surgeon, one orthopaedic surgeon, one human factors researcher and two aviation non-technical skills trainers who regularly used NOTECHS in civil airline simulation. In this first iteration of the Surgical NOTECHS system, the changes from aviation to surgery largely related to the language used, the types of behaviour that were defined as markers, and the structure of the situation awareness dimension. The Surgical NOTECHS scale was applied to the whole team in each operation by assigning a score from 1 to 4 to each of the four dimensions (Table 7.1).
This was done once for each of three predefined periods of the operation. The first phase, described as the access phase, started with the first incision and lasted until the site of the surgical treatment had been exposed. The second phase, known as the treatment phase, followed directly afterwards and lasted until the completion of the surgical treatment. The final phase, known as the closure phase, lasted from the completion of the surgical treatment to the moment that the final suture in the closure of the incision was tied off. The human factors (non-clinically trained) observer (KC) recorded his observations on the scoring sheet using ticks, crosses and notes, in order to promote consistency and balance of judgement intra-operatively and inter-operatively, and then made a global estimate of his overall impression of performance. For paediatric cardiac surgery, the Surgical NOTECHS evaluation was conducted entirely from the videotaped operation. In orthopaedic surgery, it was conducted by the observers in situ.

Operations were then ranked from best to worst according to the number of ‘below standard’ Surgical NOTECHS scores obtained in each operation, then the number of ‘basic standard’ scores, then ‘standard’ and finally ‘exceed’ scores. This gave an overall ordering of the operations purely in terms of the positive and negative non-technical behaviour observed. In paediatric cardiac surgery this simple approach proved effective, useful and informative, and helped to establish a methodology and a substantive link between non-technical skills, process and, by implication with previous work (de Leval et al. 2000), outcome. There was a moderately close relationship between the number of minor process problems and the ranked Surgical NOTECHS score (Figure 7.2), since this type of surgery requires considerable use of these types of skills, particularly in the management of blood circulation between anaesthetist, perfusionist and surgeon.
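The ranking rule above is a lexicographic sort. The following is a minimal sketch, assuming each operation's ratings are held as a list of 1 to 4 scores (four dimensions in each of three phases) and that lower counts of ‘below standard’, then ‘basic standard’, then ‘standard’ scores rank better. Because every operation receives the same number of scores, fewer ‘standard’ scores at equal lower counts implies more ‘exceed’ scores, so ‘exceed’ acts as a descending final tie-break. The function name and example data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def rank_operations(scores: Dict[str, List[int]]) -> List[str]:
    """Order operation ids from best to worst Surgical NOTECHS performance.

    scores maps a (hypothetical) operation id to all of its 1-4 ratings,
    e.g. four dimensions scored in each of three phases.
    """

    def sort_key(op_id: str) -> Tuple[int, int, int, int]:
        counts = Counter(scores[op_id])
        # Fewer 'below standard' (1), then 'basic standard' (2), then 'standard' (3)
        # ratings rank higher; more 'exceed' (4) ratings break any remaining tie.
        return (counts[1], counts[2], counts[3], -counts[4])

    # sorted() is stable, so fully tied operations keep their input order.
    return sorted(scores, key=sort_key)


if __name__ == "__main__":
    example = {  # illustrative data only, not from the study
        "op_A": [3] * 12,
        "op_B": [3] * 10 + [4] * 2,
        "op_C": [1] + [3] * 11,
        "op_D": [2] * 2 + [4] * 10,
    }
    print(rank_operations(example))
```

With the illustrative data the order is op_B, op_A, op_D, op_C: op_D, despite ten ‘exceed’ ratings, ranks below op_A because of its two ‘basic standard’ ratings, and a single ‘below standard’ rating puts op_C last. This is the trade-off any count-based lexicographic ranking implies.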
Moreover, because this type of surgery involves large teams, and is amongst the most complex, technically demanding and potentially risky of any surgery, the surgical process can quickly fall apart if the operating team do not work together effectively. Combined with previous observations regarding operative risk and operative type, the Great Ormond Street team found that escalation from small problems to bigger problems was partially mitigated in teams with higher non-technical skills ratings (Catchpole et al. 2007). Thus, it was possible to describe a mechanism for surgical failure, which suggested direct methods for improving surgical performance and safety, one of which was the explicit training of non-technical skills. Though the model also fitted orthopaedic surgery, it was clear that this simple method of scoring non-technical skills as a team did not fit as well when an operation placed widely differing demands on individuals.

Table 7.1 Summary of first iteration of the Surgical NOTECHS scoring system

LEADERSHIP and MANAGEMENT: Leadership, planning and preparation, workload management, authority and assertiveness
TEAMWORK and COOPERATION: Team building/maintaining, support of others, understanding team needs, conflict solving
PROBLEM-SOLVING and DECISION-MAKING: Definition and diagnosis, option generation, risk assessment, outcome review
SITUATION AWARENESS: Notice (patient, procedure, people), understand (patient, procedure, people), think ahead (patient, procedure, people)

Below standard = 1: Behaviour directly compromises patient safety and effective teamwork.
Basic standard = 2: Behaviour in other conditions could directly compromise patient safety and effective teamwork.
Standard = 3: Behaviour maintains an effective level of patient safety and teamwork.
Excellent = 4: Behaviour enhances patient safety and teamwork. A model for all other teams.

Source: Catchpole et al. 2005
Knee replacement operations are demanding of some members of the team, with success relying heavily on the relationship between the scrub nurse and the surgeon, while the anaesthetist is rarely involved directly with the surgical procedure and has little interaction with the team, even though, for example, patients can have rare but extreme reactions to the surgical treatment. Thus, with nurses and surgeons who worked well together, the course of the operation might be smooth, even though the anaesthetist might not be as situationally aware – or as safe – as might be desirable. The escalation from small problems to bigger ones, and the influence of risk, operative type and non-technical performance, is illustrated in Figure 7.3.

To improve the quality of the evaluation of non-technical skills across a wider range of operations, and to evaluate the success of an aviation-style non-technical skills training programme in more common types of surgery, the original method for non-technical skills measurement needed refinement. The key limitation was that the first iteration of the Surgical NOTECHS system could not account for the different contributions of individuals or sub-teams to the success of the surgery. This might be particularly important where certain types of surgery do not usually demand constant interaction with another team member or sub-team, encouraging a lack of awareness that could prove deleterious during the development of more acute events in the process of surgery. Moreover, scoring sub-teams separately might also help us to better understand the team dynamics, and the contribution of sub-teams to the overall success of an operation. In the remainder of this chapter, we describe the development and validation of this refined non-technical performance measurement methodology.

Figure 7.2 Relationship between minor failures and ranked non-technical skills performance in paediatric cardiac surgery
Source: Catchpole et al. 2005
Figure 7.3 Mechanisms of surgical failure

The Oxford NOTECHS System

The tool was developed by a consultant upper gastro-intestinal surgeon, a consultant vascular surgeon, a clinical research fellow who was also a trainee surgeon, and the human factors researcher and aviation trainers involved in the first iteration. The most obvious change from the first iteration was the scoring of the performance of each of the three sub-teams (surgeons, anaesthetists and nurses) in each dimension for every operation. Overall sub-team performance was taken as the sum of its four dimension scores (out of 16). Overall team non-technical performance was calculated as the sum of the three sub-team scores (out of 48). Each overall team dimension score was the sum of the three sub-team scores in that dimension (out of 12). Thus, a non-technical score was obtained in each dimension for the theatre team, and for each sub-team of surgeons, anaesthetists and nurses. Repeat scoring for three phases of the operation was abandoned to ease the burden on the observers, and because the operations were generally shorter and lacked the clearly delineated phases of cardiac surgery.

As part of a larger study that sought to examine the value of multidisciplinary aviation-style teamwork training on performance in the operating theatre (McCulloch et al. 2009), we applied the Oxford NOTECHS scale to examine behavioural change. This provided the opportunity to examine a number of properties of the scale for validity and reliability purposes. We observed 65 laparoscopic cholecystectomies (LC) and 45 carotid endarterectomies (CEA) in the pre- and post-intervention phases of the study. For each case, a non-technical performance score was given using the Oxford NOTECHS system.
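The aggregation rules above can be written out directly. This is a minimal sketch, assuming ratings are stored as ratings[sub_team][dimension] with integer scores from 1 to 4; the function name and the labels LM, TC, PD and SA (leadership and management, teamwork and cooperation, problem-solving and decision-making, situation awareness) are our own choices for illustration.

```python
from typing import Dict, Tuple

DIMENSIONS = ("LM", "TC", "PD", "SA")
SUB_TEAMS = ("surgeons", "anaesthetists", "nurses")


def oxford_notechs_totals(
    ratings: Dict[str, Dict[str, int]],
) -> Tuple[Dict[str, int], Dict[str, int], int]:
    """Aggregate 1-4 ratings given as ratings[sub_team][dimension].

    Returns (sub-team totals out of 16, team dimension totals out of 12,
    overall team total out of 48).
    """
    for team in SUB_TEAMS:
        for dim in DIMENSIONS:
            if not 1 <= ratings[team][dim] <= 4:
                raise ValueError(f"{team}/{dim}: rating must be between 1 and 4")

    # Sum across the four dimensions for each sub-team (max 4 x 4 = 16).
    sub_team_totals = {t: sum(ratings[t][d] for d in DIMENSIONS) for t in SUB_TEAMS}
    # Sum across the three sub-teams for each dimension (max 3 x 4 = 12).
    dimension_totals = {d: sum(ratings[t][d] for t in SUB_TEAMS) for d in DIMENSIONS}
    # Overall team score is the sum of the sub-team totals (max 3 x 16 = 48).
    team_total = sum(sub_team_totals.values())
    return sub_team_totals, dimension_totals, team_total
```

Because the sub-team totals and the dimension totals sum the same twelve ratings, both routes always give the same team total out of 48, which makes a convenient consistency check when transcribing scoring sheets.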
We further evaluated relationships between technical errors, minor process problems and the sub-team non-technical scores, as well as the four dimensions. This allowed a more detailed analysis of the aspects of teamwork most closely associated with changes in these outcome measures. Technical errors were identified at the same time using the Observational Clinical Human Reliability Assessment (OCHRA) system (Tang et al. 2004), minor process problems were identified using the previous framework (Catchpole et al. 2006), and operative duration was also recorded. In 36 cases, the teams were assessed independently by the study’s two principal observers, and in 14 other cases by a third independent observer. Five cases were also studied with one observer using Oxford NOTECHS and the other using the Observational Teamwork Assessment for Surgery (OTAS) scale (Undre et al. 2007).

Reliability of the Oxford NOTECHS Tool

Inter-rater reliability was evaluated using the Rwg statistic for overall Oxford NOTECHS scores and component dimensions, by parallel independent scoring of cases by two observers. Including all pre- and post-intervention cases, 24 LCs and 12 CEAs were independently dual observed. Reliability of Oxford NOTECHS scoring is reported for all 36 cases in Table 7.2, and for CEA operations only in Table 7.3. Results for the 24 LC operations on their own have been reported elsewhere (Mishra et al. 2009). Across all cases, and for most sub-teams and dimensions, overall reliability is high. In CEA, reliability on the problem-solving and decision-making dimension was lower than might be desired for some sub-teams, as were the ratings on the surgeons’ cognitive dimensions. This reflects difficulties in scoring where observable behaviours may be limited. For ten of the LCs and four of the CEAs, a third observer was invited to score the theatre teams independently on their non-technical performance.
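The chapter does not state exactly how Rwg was computed. One common form is the single-item rwg of James, Demaree and Wolf (1984), which compares the observed variance between raters with the variance expected if ratings were uniformly random over the A scale points, (A^2 - 1)/12, giving 1.25 for a 1 to 4 scale. The sketch below assumes that form, averages per-case values over the dual-observed cases, and truncates negative values to zero; these choices and the function names are our assumptions, not the authors' documented procedure.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def rwg(ratings: Sequence[float], n_options: int = 4) -> float:
    """Single-item within-group agreement (James, Demaree and Wolf 1984).

    Compares the observed variance between raters with the variance of a
    uniform distribution over n_options scale points, (A**2 - 1) / 12.
    Negative values are truncated to 0 (a common convention; assumed here).
    """
    if len(ratings) < 2:
        raise ValueError("need ratings from at least two observers")
    expected_var = (n_options ** 2 - 1) / 12  # 1.25 for a 1-4 scale
    observed_var = variance(ratings)  # sample variance, n - 1 denominator
    return max(0.0, 1 - observed_var / expected_var)


def mean_rwg(pairs: Sequence[Tuple[int, int]], n_options: int = 4) -> float:
    """Average per-case rwg over dual-observed cases (hypothetical aggregation)."""
    return mean(rwg(pair, n_options) for pair in pairs)
```

Under these assumptions, two observers who give the same rating score 1.0, a one-point disagreement scores 0.6 and a two-point disagreement scores 0, so a cell averaged over many cases can only approach the high values in Tables 7.2 to 7.4 if most dual-observed ratings agree exactly.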
Overall reliability was good (Table 7.4), but again the lowest reliability was noted in the scoring of situation awareness (SA). This is perhaps because the third observer, who came from an aviation background, had only a very basic understanding of the workings of an operating theatre.

Table 7.2 Reliability (Rwg) of Oxford NOTECHS tool for 36 dual observed LCs and CEAs

                 LM     TC     PD     SA     Total
Surgeons         0.87   0.90   0.83   0.83   0.96
Anaesthetists    0.90   0.97   0.83   0.93   0.97
Nurses           0.87   0.90   0.93   0.83   0.95
Total            0.91   0.94   0.96   0.93   0.98

Table 7.3 Reliability (Rwg) of Oxford NOTECHS for 12 dual observed CEAs

                 LM     TC     PD     SA     Total
Surgeons         0.89   0.89   0.91   0.86   0.96
Anaesthetists    0.94   0.91   0.93   0.93   0.97
Nurses           0.87   0.89   0.87   0.90   0.96
Total            0.94   0.93   0.96   0.96   0.98

Table 7.4 Reliability of Oxford NOTECHS in 14 cases observed independently with third observer

                 LM     TC     PD     SA     Total
Surgeons         1.00   0.89   0.91   0.83   0.98
Anaesthetists    0.94   0.94   0.94   0.97   0.98
Nurses           0.94   0.83   0.97   0.86   0.94
Total            0.98   0.91   0.96   0.93   0.98

LM = leadership and management; TC = teamwork and cooperation; PD = problem-solving and decision-making; SA = situation awareness.

We also examined the reliability of the non-technical skills ratings over time by dividing the data in both the pre- and post-intervention conditions into thirds and comparing performance across these one-third cohorts. In laparoscopic cholecystectomy, differences between the thirds were not significant using ANOVA either before (F=1.34, p=0.28) or after (F=1.03, p=0.34) the training intervention. Similarly, in carotid endarterectomy, differences between the thirds were not significant before (F=1.93, p=0.17) or after (F=1.01, p=0.38). Though reliability over time clearly needs to be studied in more detail, this at least suggested that, when comparing pre- and post-intervention data to examine the effect of the training programme, we could be more confident that any effect was not brought about by changes in scores over time.

Convergent Validity

Overall agreement between OTAS and Oxford NOTECHS was excellent (r=0.88, n=5, p=0.04).
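The stability check above splits each cohort into three consecutive blocks and compares them with a one-way ANOVA. A minimal plain-Python sketch follows, assuming the scores are already in chronological order; how the original analysis handled cohorts not divisible by three is not stated, so putting any remainder into the last third is our assumption, as are the function names.

```python
from statistics import mean
from typing import List, Sequence


def split_into_thirds(scores: Sequence[float]) -> List[List[float]]:
    """Split chronologically ordered scores into three consecutive cohorts.

    Any remainder is added to the last cohort (an assumption; the chapter
    does not say how uneven cohorts were handled).
    """
    if len(scores) < 3:
        raise ValueError("need at least three scores")
    k = len(scores) // 3
    return [list(scores[:k]), list(scores[k:2 * k]), list(scores[2 * k:])]


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """Return the one-way ANOVA F statistic for two or more groups."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when every group has zero variance")
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

To obtain p-values like those reported, the F statistic is referred to an F distribution with (2, n - 3) degrees of freedom for three cohorts of n cases in total; if SciPy is available, scipy.stats.f_oneway(*groups) returns both the statistic and the p-value.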
The mean OTAS score for the five cases compared was 18.8 (range 14–22, out of a possible maximum of 30), and the mean Oxford NOTECHS score was 37.8 (range 33–45, out of a possible maximum of 48), also suggesting that data on both scales covered a similar range in relation to the overall scale maxima and minima (see Figure 7.4).

Figure 7.4 Oxford NOTECHS scores against OTAS scores for 5 LCs

Relationship between Oxford NOTECHS, Technical Errors and Minor Failures

We exploited our developing model of the relationship between non-technical skills and intra-operative performance to examine the relationship between Oxford NOTECHS scores, technical errors and minor intra-operative problems. We observed 65 LCs and 45 CEAs, in which non-technical performance was evaluated using the Oxford NOTECHS scale and technical errors were noted using the OCHRA system. In CEA there were no statistically significant associations between non-technical performance and technical errors, but in LC there was a strong association with surgeons’ SA (ρ=–0.54, p<0.001). Linear regression analysis suggests that surgeons’ SA and nurses’ problem-solving and decision-making (PD) together account for 40.5 per cent of the variation in technical errors seen. For carotid endarterectomy, significant relationships were found between nurses’ teamwork and cooperation (TC) and minor problems (ρ=–0.30, p=0.04), and between the nurses’ sub-team Oxford NOTECHS score and minor problems, but there were no statistically significant associations between non-technical skills performance and technical errors. As with LC, there is a suggestion that the number of technical errors is negatively related to overall non-technical performance, but in this group the associations were not significant.
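The associations above are reported as Spearman rank correlations (ρ), and the convergent-validity figure as r, conventionally Pearson's coefficient. The sketch below computes both from paired per-operation values, for example a sub-team's SA rating and the number of OCHRA technical errors in the same case. It is a plain-Python illustration with hypothetical function names, not the analysis code used in the study; tied values receive the average of their ranks. On Python 3.10 and later, statistics.correlation also computes Pearson's r directly.

```python
from statistics import mean
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value in sorted order is tied.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

A negative ρ, such as the –0.54 reported between surgeons’ SA and technical errors in LC, means that cases with higher SA ratings tended to have fewer technical errors; because ρ works on ranks, it is less sensitive than r to the skewed error counts typical of short observational samples.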
Examining operative duration, for LC, significant relationships were seen between anaesthetists’ non-technical skills and operative duration (ρ=–0.25, p=0.041) and between team leadership and management (LM) and duration (ρ=–0.25, p=0.046). For CEA, the correlations between surgeons’ LM and non-technical skills (ρ=–0.31, p=0.037), and between surgeons’ SA and non-technical skills (ρ=–0.31, p=0.040), were significant.

Discussion

The Oxford NOTECHS tool has been found to be reliable and to relate in different ways, depending upon operative type, to other intra-operative performance measures. One finding was that greater surgical situation awareness may result in fewer technical errors in LC operations. Surgical SA is assessed by gauging the surgical team’s awareness of the state of the patient, the stage of the procedure and the availability of theatre staff. It is unsurprising, therefore, that excellence in this dimension translates into technical outcome, and so perhaps disappointing that no relationship was found in the other type of operation (CEA). Laparoscopy is known to be cognitively demanding, which may explain the closer relationship with cognitive skills scores. Very few technical errors were recorded in CEA, and we did observe a trend similar to that in LC, so it may simply be that a larger sample of cases would be necessary to demonstrate the relationship. Clearly, we also need to be cautious with these results, since SA can be difficult to observe.

Minor failures are a reflection of errors produced mainly outside the operating field, and it is therefore logical that better coordination amongst the nursing staff results in fewer problems. For example, where higher nursing teamwork scores were recorded, fewer psychomotor general errors (such as dropping instruments) were noted. Clearly, one criticism is that the Oxford NOTECHS and failure assessments were scored by the same observer and thus may be contaminating one another.
However, this relationship was found to be consistent after analysis of the scores recorded by both observers, and the Oxford NOTECHS scores were further validated by comparison with scoring by an independent observer uninvolved in the recording of minor failures. So Oxford NOTECHS at least seems resistant to this contamination, even though the timing of the Oxford NOTECHS ratings at the end of the operation, after the recording of minor failures, means it would be the measure more likely to be contaminated.

In the UK, three groups working independently and more or less simultaneously have derived observational techniques of team performance in the operating theatre that closely resemble one another in many respects. Indeed, we suspect that the differences largely relate to decisions made about the appropriate trade-offs between conflicting demands. The ANTS (Fletcher et al. 2003) and NOTSS (Yule et al. 2006) systems, for anaesthetists and surgeons respectively, were developed at the University of Aberdeen primarily as training aids for assessing individuals, and they are both extremely well designed for these purposes (see Chapters 2, 11 and 12 in this volume). The OTAS system (Undre et al. 2007) was developed by the Clinical Safety Research Unit at Imperial College, London, to measure team behaviour in the operating theatre. It is prescriptive – so relatively easily trained – but requires the full attention of a single observer (see Chapter 6). Our Oxford NOTECHS tool has been developed to allow assessment of team performance simultaneously with other intra-operative parameters, and has evolved from assessing whole-team performance to allow the rating of sub-teams. We have demonstrated here that, though it may require observational experience and some calibration, it can indeed be used reliably even by non-specialists and may relate to other aspects of intra-operative performance.
We feel that establishing the relationship between teamwork and quality of care or operative duration may be one way to evaluate whether training for these skills in healthcare is valuable, and to engage front-line staff in thinking differently about how they practise. The complexity and difficulty of observing performance and behaviour in the operating theatre should not be underestimated. Meaningful, useful and reliable data depend upon the skills of the observer – who is the basic tool in the observation – and upon an observational method designed to be appropriate for the type of operation, the parameters being studied and the purpose of the observations. We have described the evolution and substantive affirmation of the value of the Oxford NOTECHS technique for evaluating non-technical skills in the operating theatre, and have demonstrated an iterative cycle that swings between measurement and theoretical development. Thus, by building on the excellent research work reported in this volume, we hope to work with others to develop better measurement methods that require fewer trade-offs to be made, that examine these skills in more