May, Gary S. "Computational Intelligence in Microelectronics Manufacturing." In Computational Intelligence in Manufacturing Handbook, edited by Jun Wang et al. Boca Raton: CRC Press LLC, 2001.

13 Computational Intelligence in Microelectronics Manufacturing

Gary S. May, Georgia Institute of Technology

13.1 Introduction
13.2 The Role of Computational Intelligence
13.3 Process Modeling
13.4 Optimization
13.5 Process Monitoring and Control
13.6 Process Diagnosis
13.7 Summary

13.1 Introduction

New knowledge and tools are constantly expanding the range of applications for semiconductor devices, integrated circuits, and electronic packages. The solid-state computing, telecommunications, aerospace, automotive, and consumer electronics industries all rely heavily on the quality of these methods and processes. In each of these industries, dramatic changes are underway. In addition to increased performance, next-generation computing is increasingly being performed by portable, hand-held computers. A similar trend exists in telecommunications, where users will soon be employing high-performance, multifunctional, portable units. In the consumer industry, multimedia products capable of voice, image, video, text, and other functions are also expected to be commonplace within the next decade. The common thread in each of these trends is low-cost electronics.

This multi-billion-dollar electronics industry is fundamentally dependent on the manufacture of semiconductor integrated circuits (ICs). However, the fabrication of ICs is extremely expensive. In fact, the last couple of decades have seen semiconductor manufacturing become so capital-intensive that only a few very large companies can participate. A typical state-of-the-art, high-volume manufacturing facility today costs over a billion dollars [Dax, 1996]. As shown in Figure 13.1, this represents a factor of over 1000 increase over the cost of a comparable facility 20 years ago. If this trend continues at its present rate, facility costs will exceed the total annual revenue of any of the four leading U.S. semiconductor companies at the turn of the century [May, 1994].

Because of rising costs, the challenge before semiconductor manufacturers is to offset capital investment with a greater amount of automation and technological innovation in the fabrication process. In other words, the objective is to use the latest developments in computer technology to enhance the manufacturing methods that have become so expensive. In effect, this effort in computer-integrated manufacturing of integrated circuits (IC-CIM) is aimed at optimizing the cost-effectiveness of integrated circuit manufacturing in much the same way that computer-aided design (CAD) has dramatically affected the economics of circuit design.

Under the overall heading of reducing manufacturing cost, several important subtasks have been identified. These include increasing chip fabrication yield, reducing product cycle time, maintaining consistent levels of product quality and performance, and improving the reliability of processing equipment. Unlike the manufacture of discrete parts such as electrical appliances, where relatively little rework is required and a yield greater than 95% of salable product is often realized, the manufacture of integrated circuits faces unique obstacles. Semiconductor fabrication processes consist of hundreds of sequential steps, and yield loss occurs at every step.
As a result, overall yields in IC manufacturing processes typically range from only 20 to 80%. The problem of low yield is particularly severe for new fabrication sequences. Effective IC-CIM systems, however, can alleviate such problems. Table 13.1 summarizes the results of a 1986 Toshiba study that analyzed the use of IC-CIM techniques in producing 256K dynamic RAM memory circuits [Hodges et al., 1989]. This study showed that CIM techniques improved the manufacturing process on each of the four productivity metrics investigated.

TABLE 13.1 Results of 1986 Toshiba Study

  Productivity Metric        No CIM   With CIM
  Turnaround Time            1.0      0.58
  Integrated Unit Output     1.0      1.50
  Average Equipment Uptime   1.0      1.32
  Direct Labor Hours         1.0      0.75

  Source: Hodges, D., Rowe, L., and Spanos, C., 1989. Computer-Integrated Manufacturing of VLSI, Proc. IEEE/CHMT Int. Elec. Manuf. Tech. Symp., 1-3. With permission.

Because of the large number of steps involved, maintaining product quality in an IC manufacturing facility requires strict control of literally hundreds or even thousands of process variables. The interdependent issues of high yield, high quality, and low cycle time have been addressed in part by the ongoing development of several critical capabilities in state-of-the-art IC-CIM systems: in situ process monitoring, process/equipment modeling, real-time closed-loop process control, and equipment malfunction diagnosis. Each of these activities increases throughput and reduces yield loss by preventing potential misprocessing, but each presents significant engineering challenges in effective implementation and deployment.

13.2 The Role of Computational Intelligence

Recently, the use of computational intelligence in various manufacturing applications has dramatically increased, and semiconductor manufacturing is no exception to this trend. Artificial neural networks [Dayhoff, 1990], genetic algorithms [Goldberg, 1989], expert systems [Parsaye and Chignell, 1988], and other techniques have emerged as powerful tools for assisting IC-CIM systems in performing various process monitoring, modeling, control, and diagnostic functions. The following is an introduction to various computational intelligence tools in preparation for a more detailed description of the manner in which these tools have been used in IC-CIM systems.

FIGURE 13.1 Graph of rising integrated circuit fabrication costs in thousands of dollars over the last three decades. (Source: May, G., 1994. Manufacturing ICs the Neural Way, IEEE Spectrum, 31(9):47-51. With permission.)

13.2.1 Neural Networks

Because of their inherent learning capability, adaptability, and robustness, artificial neural networks are used to solve problems that have heretofore resisted solution by other, more traditional methods. Although the name "neural network" stems from the fact that these systems crudely mimic the behavior of biological neurons, the neural networks used in microelectronics manufacturing applications actually have little to do with biology. However, they share some of the advantages that biological organisms have over standard computational systems. Neural networks are capable of performing highly complex mappings on noisy and/or nonlinear data, thereby inferring very subtle relationships between diverse sets of input and output parameters. Moreover, these networks can also generalize well enough to learn overall trends in functional relationships from limited training data.

There are several neural network architectures and training algorithms eligible for manufacturing applications. However, the backpropagation (BP) algorithm is the most generally applicable and most popular approach for microelectronics manufacturing. Feedforward neural networks trained by BP consist of several layers of simple processing elements called "neurons" (Figure 13.2).
FIGURE 13.2 Schematic of a single neuron. The output of the neuron is a function of the weighted sum of its inputs, where F is a sigmoid function. Feedforward neural networks consist of several layers of interconnected neurons. (Source: Himmel, C. and May, G., 1993. Advantages of Plasma Etch Modeling Using Neural Networks over Statistical Techniques, IEEE Trans. Semi. Manuf., 6(2):103-111. With permission.)

These rudimentary processors are interconnected so that information relevant to input–output mappings is stored in the weights of the connections between them. Each neuron computes the weighted sum of its inputs and passes the result through a sigmoidal transfer function. The layers of neurons in BP networks receive, process, and transmit critical information about the relationships between the input parameters and corresponding responses. In addition to the input and output layers, these networks incorporate one or more "hidden" layers of neurons that do not interact with the outside world, but assist in performing nonlinear feature extraction on information provided by the input and output layers.

In the BP learning algorithm, the network begins with a random set of weights. An input vector is presented and fed forward through the network, and the output is calculated using this initial weight matrix. Next, the calculated output is compared to the measured output data, and the squared difference between these two vectors determines the system error. The accumulated error for all of the input–output pairs is defined as the Euclidean distance in the weight space that the network attempts to minimize. Minimization is accomplished via the gradient descent approach, in which the network weights are adjusted in the direction of decreasing error. It has been demonstrated that, if a sufficient number of hidden neurons are present, a three-layer BP network can encode any arbitrary input–output relationship [Irie and Miyake, 1988]. The structure of a typical BP network appears in Figure 13.3.

FIGURE 13.3 BP neural network showing input, output, and hidden layers, as well as interconnection strengths (weights), inputs and outputs of neurons in different layers. (Source: Himmel, C. and May, G., 1993. Advantages of Plasma Etch Modeling Using Neural Networks over Statistical Techniques, IEEE Trans. Semi. Manuf., 6(2):103-111. With permission.)

Referring to this figure, let $w_{i,j,k}$ denote the weight between the $j$th neuron in layer $(k-1)$ and the $i$th neuron in layer $k$; $in_{i,k}$ the input to the $i$th neuron in the $k$th layer; and $out_{i,k}$ the output of the $i$th neuron in the $k$th layer. The input to a given neuron is given by

$in_{i,k} = \sum_{j} w_{i,j,k} \cdot out_{j,k-1}$    (13.1)

where the summation is taken over all neurons in the previous layer. The output of a given neuron is a sigmoidal transfer function of the input, expressed as

$out_{i,k} = \dfrac{1}{1 + e^{-in_{i,k}}}$    (13.2)

Error is calculated for each input–output pair as follows. Input neurons are assigned a value, and computation occurs by a forward pass through each layer of the network. The computed value at the output is then compared to its desired value, and the square of the difference between these two vectors provides a measure of the error $E$ using

$E = \sum_{j=1}^{q} 0.5\,(d_j - out_{j,n})^2$    (13.3)

where $n$ is the number of layers in the network, $q$ is the number of output neurons, $d_j$ is the desired output of the $j$th neuron in the output layer, and $out_{j,n}$ is the calculated output of that same neuron.
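As a concrete illustration of Equations 13.1 through 13.3, the following sketch computes a forward pass and the resulting error for a small network. It is not from the original chapter; the NumPy implementation, the 2-4-1 layer sizes, and the sample values are illustrative assumptions (biases are omitted, matching the formulation above).

```python
import numpy as np

def sigmoid(x):
    # Equation (13.2): sigmoidal transfer function
    return 1.0 / (1.0 + np.exp(-x))

def forward_pass(weights, x):
    """Propagate an input vector through the network.

    weights: list of arrays; weights[k][i, j] plays the role of w_{i,j,k},
    the weight between neuron j in layer k-1 and neuron i in layer k.
    """
    out = x
    for W in weights:
        net = W @ out          # Equation (13.1): weighted sum of inputs
        out = sigmoid(net)     # Equation (13.2): neuron output
    return out

def squared_error(d, out):
    # Equation (13.3): half the squared Euclidean distance
    return 0.5 * np.sum((d - out) ** 2)

# Hypothetical 2-4-1 network: two process settings in, one response out
rng = np.random.default_rng(0)
weights = [rng.uniform(-1, 1, (4, 2)), rng.uniform(-1, 1, (1, 4))]
x = np.array([0.3, 0.7])   # normalized process conditions (illustrative)
d = np.array([0.5])        # measured response (illustrative)
print(squared_error(d, forward_pass(weights, x)))
```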
After a forward pass through the network, error is propagated backward from the output layer. Learning occurs by minimizing error through modification of the weights, one layer at a time. The weights are modified by calculating the derivative of $E$ and following the gradient that results in a minimum value. From Equations 13.1 and 13.2, the following partial derivatives are computed:

$\dfrac{\partial in_{i,k}}{\partial w_{i,j,k}} = out_{j,k-1} \qquad \dfrac{\partial out_{i,k}}{\partial in_{i,k}} = out_{i,k}\,(1 - out_{i,k})$    (13.4)

Now let

$\delta_{i,k} = -\dfrac{\partial E}{\partial in_{i,k}} \qquad \phi_{i,k} = -\dfrac{\partial E}{\partial out_{i,k}}$    (13.5)

Using the chain rule, the gradient of error with respect to the weights is given by

$\dfrac{\partial E}{\partial w_{i,j,k}} = \left(\dfrac{\partial E}{\partial in_{i,k}}\right)\left(\dfrac{\partial in_{i,k}}{\partial w_{i,j,k}}\right) = -\delta_{i,k} \cdot out_{j,k-1}$    (13.6)

In the previous expression, $out_{j,k-1}$ is available from the forward pass. The quantity $\delta_{i,k}$ is calculated by propagating the error backward through the network. For the output layer,

$\delta_{i,n} = -\dfrac{\partial E}{\partial in_{i,n}} = \left(-\dfrac{\partial E}{\partial out_{i,n}}\right)\left(\dfrac{\partial out_{i,n}}{\partial in_{i,n}}\right) = \phi_{i,n}\, out_{i,n}\,(1 - out_{i,n})$    (13.7)

where the expressions in Equations 13.3 and 13.4 have been substituted. Likewise, the quantity $\phi_{i,n}$ is given by

$\phi_{i,n} = d_i - out_{i,n}$    (13.8)

Consequently, for the inner layers of the network,

$\phi_{i,k} = -\dfrac{\partial E}{\partial out_{i,k}} = \sum_j \left(-\dfrac{\partial E}{\partial in_{j,k+1}}\right)\left(\dfrac{\partial in_{j,k+1}}{\partial out_{i,k}}\right)$    (13.9)

where the summation is taken over all neurons in the $(k+1)$th layer. This expression can be simplified using Equations 13.1 and 13.5 to yield

$\phi_{i,k} = \sum_j \left[\delta_{j,k+1} \cdot w_{j,i,k+1}\right]$    (13.10)

Then $\delta_{i,k}$ is determined from Equation 13.7 as

$\delta_{i,k} = \phi_{i,k}\, out_{i,k}\,(1 - out_{i,k}) = out_{i,k}\,(1 - out_{i,k}) \sum_j \left[\delta_{j,k+1} \cdot w_{j,i,k+1}\right]$    (13.11)

Note that $\phi_{i,k}$ depends only on the $\delta$ values in the $(k+1)$th layer. Thus, $\phi$ for all neurons in a given layer can be computed in parallel. The gradient of the error with respect to the weights is calculated for one pair of input–output patterns at a time. After each computation, a step is taken in the opposite direction of the error gradient. This procedure is iterated until convergence is achieved.
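Combining Equations 13.4 through 13.11 with the forward pass above gives a minimal per-pattern gradient descent loop. This is a sketch of the algorithm as described, not the chapter's code; the learning rate and the stopping rule (a fixed epoch count rather than a convergence test) are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bp(weights, samples, rate=0.5, epochs=1000):
    """Per-pattern gradient descent, following Equations 13.4-13.11."""
    for _ in range(epochs):
        for x, d in samples:
            # Forward pass, keeping each layer's output (Eqs. 13.1-13.2)
            outs = [x]
            for W in weights:
                outs.append(sigmoid(W @ outs[-1]))
            # Output layer: phi = d - out (Eq. 13.8), then delta (Eq. 13.7)
            phi = d - outs[-1]
            delta = phi * outs[-1] * (1.0 - outs[-1])
            # Backward pass through the remaining layers
            for k in reversed(range(len(weights))):
                grad = -np.outer(delta, outs[k])         # Eq. 13.6
                phi = weights[k].T @ delta               # Eq. 13.10
                weights[k] -= rate * grad                # step against gradient
                delta = phi * outs[k] * (1.0 - outs[k])  # Eq. 13.11
    return weights
```

Note that `phi` for layer k is computed from the pre-update weights, so each pattern's gradient is consistent with a single error evaluation.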
13.2.2 Genetic Algorithms

Neural networks are an extremely useful tool for defining the often complex relationships between controllable process conditions and measurable responses in electronics manufacturing processes. However, in addition to predicting the output behavior of a given process from a set of input conditions, one would also like to be able to use such models "in reverse." In other words, given a target response or set of response characteristics, it is often desirable to derive an optimum set of process conditions (or process "recipe") to achieve these targets. Genetic algorithms (GAs) are a method to optimize a given process and define this reverse mapping.

In the 1970s, John Holland introduced GAs as an optimization procedure [Holland, 1975]. Genetic algorithms are guided stochastic search techniques based on the principles of genetics. They use three operations found in natural evolution to guide their trek through the search space: selection, crossover, and mutation. Using these operations, GAs search through large, irregularly shaped spaces quickly, requiring only objective function values (detailing the quality of possible solutions) to guide the search. Furthermore, GAs take a more global view of the search space than many methods currently encountered in engineering optimization. Theoretical analyses suggest that GAs quickly locate high-performance regions in extremely large and complex search spaces and possess some natural insensitivity to noise. These qualities make GAs attractive for optimizing neural network based process models.

In computing terms, a genetic algorithm maps a problem onto a set of binary strings, each string representing a potential solution. The GA then manipulates the most promising strings in searching for improved solutions. A GA typically operates through a simple cycle of four stages: (i) creation of a population of strings; (ii) evaluation of each string; (iii) selection of the "best" strings; and (iv) genetic manipulation to create the new population of strings. During each computational cycle, a new generation of possible solutions for a given problem is produced. At the first stage, an initial population of potential solutions is created as a starting point for the search process. Each element of the population is encoded into a string (the "chromosome") to be manipulated by the genetic operators. In the next stage, the performance (or fitness) of each individual of the population is evaluated. Based on each individual string's fitness, a selection mechanism chooses "mates" for the genetic manipulation process. The selection policy is responsible for assuring survival of the most fit individuals.

A common method of coding multiparameter optimization problems is concatenated, multiparameter, mapped, fixed-point coding. Using this procedure, if an unsigned integer x is the decoded parameter of interest, then x is mapped linearly from [0, 2^l - 1] to a specified interval [U_min, U_max] (where l is the length of the binary string). In this way, both the range and precision of the decision variables are controlled. To construct a multiparameter coding, as many single-parameter strings as required are simply concatenated, and each coding has its own sub-length. Figure 13.4 shows an example of a two-parameter coding with four bits in each parameter. The ranges of the first and second parameters are 2-5 and 0-15, respectively.

FIGURE 13.4 Example of multiparameter binary coding. Two parameters are coded into binary strings with different ranges and varying precision (π). (Source: Han, S. and May, G., 1997. Using Neural Network Process Models to Perform PECVD Silicon Dioxide Recipe Synthesis via Genetic Algorithms, IEEE Trans. Semi. Manuf., 10(2):279-287. With permission.)
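A minimal sketch of the decoding step of this scheme follows; it is illustrative rather than taken from the chapter, and the two parameter ranges are borrowed from the Figure 13.4 example.

```python
def decode(bits, intervals):
    """Decode a concatenated, fixed-point-coded chromosome.

    bits: string of '0'/'1' characters (the chromosome).
    intervals: list of (length, u_min, u_max) tuples, one per parameter.
    """
    params, pos = [], 0
    for l, u_min, u_max in intervals:
        x = int(bits[pos:pos + l], 2)   # unsigned integer in [0, 2^l - 1]
        # Linear map from [0, 2^l - 1] onto [u_min, u_max]
        params.append(u_min + x * (u_max - u_min) / (2 ** l - 1))
        pos += l
    return params

# Two 4-bit parameters with ranges 2-5 and 0-15, as in Figure 13.4
print(decode("10100111", [(4, 2.0, 5.0), (4, 0.0, 15.0)]))  # [4.0, 7.0]
```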
The string manipulation process employs genetic operators to produce a new population of individuals ("offspring") by manipulating the genetic "code" possessed by members ("parents") of the current population. It consists of selection, crossover, and mutation operations. Selection is the process by which strings with high fitness values (i.e., good solutions to the optimization problem under consideration) receive larger numbers of copies in the new population. In one popular method of selection, called elitist roulette wheel selection, strings with fitness value $F_i$ are assigned a proportionate probability of survival into the next generation. This probability distribution is determined according to

$P_i = \dfrac{F_i}{\sum_i F_i}$    (13.12)

Thus, an individual string whose fitness is $n$ times better than another's will produce $n$ times the number of offspring in the subsequent generation. Once the strings have reproduced, they are stored in a "mating pool" awaiting the actions of the crossover and mutation operators.

The crossover operator takes two chromosomes and interchanges part of their genetic information to produce two new chromosomes (see Figure 13.5). After the crossover point is randomly chosen, portions of the parent strings (P1 and P2) are swapped to produce the new offspring (O1 and O2) based on a specified crossover probability. Mutation is motivated by the possibility that the initially defined population might not contain all of the information necessary to solve the problem. This operation is implemented by randomly changing a fixed number of bits in every generation according to a specified mutation probability (see Figure 13.6). Typical values for the probabilities of crossover and bit mutation range from 0.6 to 0.95 and 0.001 to 0.01, respectively. Higher rates disrupt good string building blocks more often, and for smaller populations, sampling errors tend to wash out the predictions. For this reason, the greater the mutation and crossover rates and the smaller the population size, the less frequently predicted solutions are confirmed.

FIGURE 13.5 The crossover operation. Two parent strings exchange binary information at a randomly determined crossover point to produce two offspring. (Source: Han, S. and May, G., 1997. Using Neural Network Process Models to Perform PECVD Silicon Dioxide Recipe Synthesis via Genetic Algorithms, IEEE Trans. Semi. Manuf., 10(2):279-287. With permission.)

FIGURE 13.6 The mutation operation. A randomly selected bit in a given binary string is changed according to a given probability. (Source: Han, S. and May, G., 1997. Using Neural Network Process Models to Perform PECVD Silicon Dioxide Recipe Synthesis via Genetic Algorithms, IEEE Trans. Semi. Manuf., 10(2):279-287. With permission.)
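The three operators just described might be implemented as in the sketch below. This is an illustrative reconstruction, not the chapter's code; the default crossover and mutation probabilities are simply chosen from the typical ranges quoted above, and the per-bit mutation style is one common variant of the fixed-count scheme the text describes.

```python
import random

def roulette_select(population, fitnesses):
    # Equation (13.12): survival probability proportional to fitness
    total = sum(fitnesses)
    return random.choices(population, weights=[f / total for f in fitnesses])[0]

def crossover(p1, p2, pc=0.9):
    # Swap string tails at a random point with probability pc (Figure 13.5)
    if random.random() < pc:
        point = random.randint(1, len(p1) - 1)
        return p1[:point] + p2[point:], p2[:point] + p1[point:]
    return p1, p2

def mutate(s, pm=0.005):
    # Flip each bit with probability pm (Figure 13.6)
    return "".join(b if random.random() >= pm else "10"[int(b)] for b in s)
```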
13.2.3 Expert Systems

Computational intelligence has also been introduced into electronics manufacturing in the areas of automated process and equipment diagnosis. When unreliable equipment performance causes operating conditions to vary beyond an acceptable level, overall product quality is jeopardized. Thus, timely and accurate diagnosis is a key to the success of the manufacturing process. Diagnosis involves determining the assignable causes for equipment malfunctions and correcting them quickly to prevent the subsequent occurrence of expensive misprocessing.

[...]

...effectively resulting in a time-varying gain for the network transfer function (Figure 13.9). Annealing the network at high temperature early leads to rapid location of the general vicinity of the global minimum of the error surface. The training algorithm remains within the attractive basin of the global minimum as the temperature decreases, preventing any significant uphill excursion. When used in conjunction...

[...]

...to obtain, and the representational inadequacy of a limited number of data sets can induce network overtraining, thus increasing the misclassification or "false alarm" rate. Also, approaches such as this, in which diagnostic actions take place following a sequence of several processing steps, are not appropriate, since evidence pertaining to potential equipment malfunctions accumulates at irregular intervals...

[...]

...were used as initial weights of the modified model; and (iii) the source input method, in which the source model output was used as an additional input to the modified network. The starting point for model transfer was the development of a physical neural network (PNM) model trained on 98 data points generated from a process simulator utilizing first principles. Training data was obtained by running the simulator...

[...]

...criterion individually, irrespective of the optimal set for the other two. The results of the independent optimization are summarized in Table 13.4. Several interesting interactions and trade-offs between the various parameters emerged in this study. One such trade-off can be visualized in two-dimensional contour plots such as those in Figures 13.13 and 13.14. Figure 13.13 plots training error against training...

[...]

...tolerance and narrow initial weight distribution. The latter result implies that the interaction between neurons within the restricted weight space during training is a primary stimulus for improving prediction. Thus, although learning degrades with a wider weight range, generalization is improved.

TABLE 13.4 Independently Optimized Network Inputs: Parameter / Training Error / Prediction Error / Training Time / Hidden...

[...]

...learning rates. Further, slight differences in optimal parameters lead to significant differences in performance. This is indicated in Table 13.11, which shows training and prediction errors for the neural network models trained with the parameter sets in Table 13.10. If the improvements for the multiple response model are factored in, GAs provide an average benefit of 10.0% in training accuracy and 65.6% in...

[...]

...search methods. (In each table, the "% Improvement" column refers to the improvement obtained by using genetic search.) Although in two cases involving training error minimization the simplex method proved superior, the genetically optimized networks exhibited vastly improved performance in nearly every category for prediction error minimization. The overall average improvement observed in using genetic optimization...

[...]

...method involves determining a set of n linearly independent, mutually conjugate directions (where n is the dimensionality of the search space). Successive line minimizations put the algorithm at the minimum of the quadratic approximation. For functions that are not exactly quadratic, the algorithm does not find the exact minimum, but repeated cycles of n line minimizations converge in due course to the minimum...

[...]

...classification for new scenarios. One approach to defining a hybrid scheme involves combining neural networks with an inference system based on the Dempster–Shafer theory of evidential reasoning [Shafer, 1976]. This technique allows the combination of various pieces of uncertain evidence obtained at irregular intervals, and its implementation results in time-varying, nonmonotonic belief functions that reflect...

[...]
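To make the Dempster–Shafer combination mentioned in the last fragment concrete, the sketch below applies Dempster's rule to two pieces of uncertain evidence. The fault hypotheses and mass assignments are hypothetical, and the rule follows the standard formulation in [Shafer, 1976] rather than any implementation described in the chapter.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination for two basic probability assignments.

    m1, m2: dicts mapping frozensets of hypotheses to belief mass.
    """
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y   # mass assigned to contradictory evidence
    # Normalize by 1 - K, where K is the total conflicting mass
    return {s: m / (1.0 - conflict) for s, m in combined.items()}

# Hypothetical fault set for an etcher: flow fault (F) vs. pressure fault (P)
theta = frozenset({"F", "P"})
m1 = {frozenset({"F"}): 0.6, theta: 0.4}                          # sensor 1
m2 = {frozenset({"F"}): 0.3, frozenset({"P"}): 0.5, theta: 0.2}   # sensor 2
print(dempster_combine(m1, m2))   # belief concentrates on the flow fault
```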
...three adjustable fitting parameters in an analytical expression for the TiO2 deposition rate. The first step in this hybrid modeling technique involves developing an analytical model. For TiO2 deposition via MOCVD, this was accomplished by applying the continuity equation to the reactant concentration as the reactant of interest is transported from the bulk gas and incorporated into the growing film. Under these...

[...]
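The hybrid structure this fragment describes, a physically derived rate expression whose fitting parameters are supplied by a trained neural network, can be sketched as follows. The rate law shown is a generic Arrhenius-style placeholder, not the chapter's actual expression derived from the continuity equation, and `param_net` stands in for any trained parameter-prediction model.

```python
import numpy as np

def analytical_rate(T, p, a, b, c):
    """Placeholder rate law with three adjustable parameters (a, b, c).

    Stands in for the chapter's expression derived from the continuity
    equation; an Arrhenius-style form is assumed here for illustration.
    """
    return a * p * np.exp(-b / T) / (1.0 + c * p)

def hybrid_rate(conditions, param_net):
    """Hybrid model: a trained neural network supplies (a, b, c)."""
    T, p = conditions   # temperature (K), reactant partial pressure
    a, b, c = param_net(np.array(conditions))
    return analytical_rate(T, p, a, b, c)
```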
