Figure 9.14 A simple two-node SCAF is shown.

The weight w21 is equal to one, and the weight w12 is zero. The mechanism for training these weights will be described later. The final assumption is that the initial value of Γ is zero.

Consider what happens when Q11 is applied first. The net input to unit 1 is

    I1,net = z1 · Q11 + w12 x2 − Γ(t) = 1 + 0 − Γ(t)

where we have explicitly shown Γ as a function of time. According to Eqs. (9.2) through (9.4), ẋ1 = −ax1 + b(1 − Γ), so x1 begins to increase, since Γ and x1 are initially zero. The net input to unit 2 is

    I2,net = z2 · Q11 + w21 x1 − Γ(t) = 0 + x1 − Γ(t)

Thus, x2 also begins to rise, owing to the positive contribution from x1. Since both x1 and x2 are increasing from the start, the total activity in the network, x1 + x2, increases quickly. Under these conditions, Γ(t) will begin to increase, according to Eq. (9.9).

After a short time, we remove Q11 and present Q12. x1 will begin to decay, but slowly with respect to its rise time. Now, we calculate I1,net and I2,net again:

    I1,net = z1 · Q12 + w12 x2 − Γ(t) = 0 + 0 − Γ(t)
    I2,net = z2 · Q12 + w21 x1 − Γ(t) = 1 + x1 − Γ(t)

Using Eqs. (9.2) through (9.4) again, ẋ1 = −cax1 and ẋ2 = b(1 + x1 − Γ), so x1 continues to decay, but x2 will continue to rise until 1 + x1 < Γ(t). Figure 9.15(a) shows how x1 and x2 evolve as functions of time.

A similar analysis can be used to evaluate the network output for the opposite sequence of input vectors. When Q12 is presented first, x2 will increase. x1 remains at zero, since I1,net = −Γ(t) and, thus, ẋ1 = −cax1. The total activity in the system is not sufficient to cause Γ(t) to rise. When Q11 is then presented, the input to unit 1 is I1 = 1. Even though x2 is nonzero, the connection weight w12 is zero, so x2 does not contribute to the input to unit 1. x1 begins to rise, and Γ(t) begins to rise in response to the increasing total activity. In this case, Γ does not increase as much as it did in the first example. Figure 9.15(b) shows the behavior of x1 and x2 for this example. The values of Γ(t) for both cases are shown in Figure 9.15(c). Since Γ(t) is the measure of recognition, we can conclude that Q11 → Q12 was recognized, but Q12 → Q11 was not.

9.3.2 Training the SCAF

As mentioned earlier, we accomplish training of the weights on the connections from the inputs by methods already described for other networks. These weights encode the spatial part of the STP. We have drawn the analogy between the SOM and the spatial portion of the STN. In fact, a good method for training the spatial weights of the SCAF is Kohonen's clustering algorithm (see Chapter 7). We shall not repeat the discussion of that training method here. We shall instead concentrate on the training of the temporal part of the SCAF.

Encoding the proper temporal order of the spatial patterns requires training the weights on the connections between the various nodes. This training uses the differential Hebbian learning law (also referred to as the Kosko-Klopf learning law):

    ẇij = (−c wij + d xi xj) U(ẋi) U(−ẋj)        (9.10)

where c and d are positive constants, and

    U(s) = 1 if s > 0, and U(s) = 0 otherwise.

Figure 9.15 These figures illustrate the output response of a two-node SCAF. (a) This graph shows the results of a numerical simulation of the two output values during the presentation of the sequence Q11 → Q12. The input pattern changes at t = 17. (b) This graph shows the results for the presentation of the sequence Q12 → Q11. (c) This figure shows how the value of Γ evolves in each case. Γ1 is for the case shown in (a), and Γ2 is for the case shown in (b).
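To make this analysis concrete, the following short Python program simulates the two-node SCAF numerically. It follows the attack-function steps listed later in Section 9.5.2; since Eq. (9.9) itself does not appear in this excerpt, the Γ update used here (a slow relaxation toward the total activity) is only an assumed stand-in, and all numeric parameter values are illustrative.

# Minimal two-node SCAF simulation (Figure 9.14).  The Gamma update is an
# assumed stand-in for Eq. (9.9); parameter values are illustrative only.
Q11, Q12 = (1.0, 0.0), (0.0, 1.0)     # orthonormal spatial patterns
z = [(1.0, 0.0), (0.0, 1.0)]          # unit 1 stores Q11, unit 2 stores Q12
w = [[0.0, 0.0],                      # w[i][j]: weight from unit j to unit i
     [1.0, 0.0]]                      # trained configuration: w21 = 1, w12 = 0
a, b, c, g = 1.0, 1.0, 0.05, 0.2      # a, b, c as in Eqs. (9.2)-(9.4); g is assumed
dt = 0.01                             # integration step

def run(sequence, steps_per_pattern=1700):
    """Present each pattern for 17 time units and return the final Gamma."""
    x = [0.0, 0.0]
    gamma = 0.0
    for Q in sequence:
        for _ in range(steps_per_pattern):
            for i in range(2):
                net = sum(zi * qi for zi, qi in zip(z[i], Q))    # z_i . Q
                net += sum(w[i][j] * x[j] for j in range(2))     # lateral input
                net -= gamma                                     # subtract threshold
                rate = -a * x[i] + b * max(net, 0.0)             # attack function
                if rate <= 0.0:
                    rate *= c                                    # slow decay
                x[i] = min(1.0, max(0.0, x[i] + rate * dt))      # hard-limit to [0, 1]
            gamma += g * (x[0] + x[1] - gamma) * dt              # assumed Eq. (9.9)
    return gamma

print("Q11 -> Q12: Gamma =", round(run([Q11, Q12]), 3))   # larger: sequence recognized
print("Q12 -> Q11: Gamma =", round(run([Q12, Q11]), 3))   # smaller: not recognized

Run for both orderings, the sketch reproduces the qualitative result of Figure 9.15: the final value of Γ is noticeably larger for Q11 → Q12 than for Q12 → Q11.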
Without the U factors, Eq. (9.10) resembles the Grossberg outstar law. The U factors ensure that learning can occur (ẇij is nonzero) only under certain conditions. These conditions are that xi is increasing (ẋi > 0) at the same time that xj is decreasing (−ẋj > 0). When these conditions are met, both U factors will be equal to one. Any other combination of ẋi and ẋj will cause one, or both, of the Us to be zero.

The effect of the differential Hebbian learning law is illustrated in Figure 9.16, which refers back to the two-node SCAF in Figure 9.14. We want to train the network to recognize that pattern Q11 precedes pattern Q12. In the example that we did, we saw that the proper response from the network was achieved if w12 = 0 while w21 = 1. Thus, our learning law must be able to increase w21 without increasing w12.

Figure 9.16 This figure shows the results of a sequential presentation of Q11 followed by Q12. The net-input values of the two units are shown, along with the activity of each unit. Notice that we still consider that ẋ1 > 0 and ẋ2 > 0 throughout the periods indicated, even though the activity value is hard-limited to a maximum value of one. The region R indicates the time for which ẋ1 < 0 and ẋ2 > 0 simultaneously. During this time period, the differential Hebbian learning law causes w21 to increase.

Referring to Figure 9.16, you will see that the proper conditions will occur if we present the input vectors in their proper sequence during training. If we train the network by presenting Q11 followed by Q12, then x2 will be increasing while x1 is decreasing, as indicated by the region R in Figure 9.16. The weight w12 remains at zero, since the conditions are never right for it to learn. The weight w21 does learn, resulting in the configuration shown in Figure 9.14.
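The gating action of the U factors is easy to verify numerically. The short Python sketch below applies Eq. (9.10) to a pair of synthetic activity traces shaped like those in Figure 9.16 (x1 rises and then decays slowly, while x2 rises after the pattern switch). The traces, the constants c and d, and the step size are illustrative assumptions; only the update rule itself comes from Eq. (9.10).

import math

# Differential Hebbian (Kosko-Klopf) update of Eq. (9.10) applied to synthetic
# activity traces resembling Figure 9.16.  Traces, constants, and dt are
# illustrative assumptions; only the update rule itself is from the text.
dt = 0.01
T = [k * dt for k in range(3000)]                       # 30 time units
x1 = [1 - math.exp(-t) if t < 15 else math.exp(-0.05 * (t - 15)) for t in T]
x2 = [0.0 if t < 15 else 1 - math.exp(-(t - 15)) for t in T]

def U(s):
    return 1.0 if s > 0 else 0.0                        # unit step of Eq. (9.10)

c, d = 0.1, 1.0
w21 = w12 = 0.0
for k in range(1, len(T)):
    dx1 = (x1[k] - x1[k - 1]) / dt                      # finite-difference estimate of x1-dot
    dx2 = (x2[k] - x2[k - 1]) / dt
    w21 += dt * (-c * w21 + d * x2[k] * x1[k]) * U(dx2) * U(-dx1)
    w12 += dt * (-c * w12 + d * x1[k] * x2[k]) * U(dx1) * U(-dx2)

print(f"w21 = {w21:.3f}, w12 = {w12:.3f}")              # w21 grows, w12 stays at zero

Only w21 grows, because the gate U(ẋ2)U(−ẋ1) is open only during the interval in which x2 is rising while x1 decays; the corresponding gate for w12 never opens.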
9.3.3 Time-Dilation Effects

The output values of nodes in the SCAF network decay slowly in time with respect to the rate at which new patterns are presented to the network. Viewed as a whole, the pattern of output activity across all of the nodes varies on a time scale somewhat longer than the one for the input patterns. This is a time-dilation effect, which can be put to good use.

Figure 9.17 This representation of a SCAF layer shows the output values as vertical lines.

Figure 9.17 shows a representation of a SCAF with circles as the nodes. The vertical lines represent hypothetical output values for the nodes. As the input vectors change, the output of the SCAF will change: New units may saturate while others decay, although this decay will occur at a rate slightly slower than the rate at which new input vectors are presented. For STPs that are sampled frequently (say, every few milliseconds), the variation of the output values may still be too quick to be followed by a human observer.

Suppose, however, that the output values from the SCAF were themselves used as input vectors to another SCAF. Since these outputs vary at a slower rate than the original input vectors, they can be sampled at a lower frequency. The output values of this second SCAF would decay even more slowly than those of the previous layer. Conceptually, this process can be continued until a layer is reached where the output patterns vary on a time scale that is equal to the total time necessary to present a complete sequence of patterns to the original network. The last output values would be essentially stationary. A single set of output values from the last slab would represent an entire series of patterns making up one complete STP. Figure 9.18 shows such a system based on a hierarchy of SCAF layers.

The stationary output vector can be used as the input vector to one of the spatial pattern-classification networks. The spatial network can learn to classify the stationary input vectors by the methods discussed previously. A complete spatiotemporal pattern-recognition and pattern-classification system can be constructed in this manner.

Exercise 9.4: No matter how fast input vectors are presented to a SCAF, the outputs can be made to linger if the parameters of the attack function are adjusted such that, once saturated, a node output decays very slowly. Such an arrangement would appear to eliminate the need for the layered SCAF architecture proposed in the previous paragraphs. Analyze the response of a SCAF to an arbitrary STP in the limiting case where saturated nodes never decay.

Figure 9.18 This hierarchy of SCAF layers is used for spatiotemporal pattern classification. The original input vectors feed SCAF 1; the output of each SCAF feeds the next one (SCAF 2 through SCAF 4), with an associative memory at the top. The outputs from each layer are sampled at a rate slower than the rate at which inputs to that layer change. The output from the top layer, essentially a spatial pattern, can be used as an input to an associative network that classifies the original STP.
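The subsampling idea behind this hierarchy can be sketched in a few lines of Python. The per-layer dynamics below are reduced to a simple leaky integrator, which is only a stand-in for the full SCAF attack function, and the layer width, number of layers, and sampling ratio are arbitrary choices; the point of the sketch is merely that each layer samples the layer below it less often, so activity higher in the stack varies more slowly.

import random

# Sketch of the SCAF hierarchy of Figure 9.18: each layer's outputs are sampled
# less often than its inputs change, so patterns become slower ("time dilated")
# as they move up the stack.  The layer dynamics are a leaky-integrator
# stand-in, not the SCAF attack function; sizes and rates are assumed.
def layer_step(state, invec, decay=0.05, gain=0.5):
    """Leaky update: outputs rise with their input and decay slowly."""
    return [(1 - decay) * s + gain * v for s, v in zip(state, invec)]

def run_hierarchy(input_sequence, n_layers=3, sample_every=4):
    width = len(input_sequence[0])
    states = [[0.0] * width for _ in range(n_layers)]
    history = [[] for _ in range(n_layers)]
    for t, invec in enumerate(input_sequence):
        feed = invec
        for k in range(n_layers):
            # layer k sees a new input only every (sample_every ** k) steps
            if t % (sample_every ** k) == 0:
                states[k] = layer_step(states[k], feed)
                history[k].append(list(states[k]))
            feed = states[k]              # outputs become the next layer's input
    return history

seq = [[random.random() for _ in range(4)] for _ in range(64)]
hist = run_hierarchy(seq)
print([len(h) for h in hist])             # e.g. [64, 16, 4]: higher layers vary more slowly

The printed history lengths (64, 16, and 4 snapshots for the parameters shown) exhibit the progressive time dilation: the top layer changes only a handful of times over the entire input sequence.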
9.4 APPLICATIONS OF STNS

We suggested earlier in this chapter that STNs would be useful in areas such as speech recognition, radar analysis, and sonar-echo classification. To date, the dearth of literature indicates that little work has been done with this promising architecture.

A prototype sonar-echo classification system was built by General Dynamics Corporation using the layered STN architecture described in Section 9.2 [8]. In that study, time slices of the incoming sonar signals were converted to power spectra, which were then presented to the network in the proper time sequence. After being trained on seven civilian boats, the network was able to identify each of these vessels correctly from its passive sonar signature.

The developers of the SCAF architecture experimented with a 30-by-30 SCAF, in which outputs from individual units are connected randomly to other units. Apparently, the network performance was encouraging, as the developers are reportedly working on new applications. Details of those applications are not available at the time of this writing.

9.5 STN SIMULATION

In this section, we shall describe the design of the simulator for the spatiotemporal network. We shall focus on the implementation of a one-layer STN, and shall show how that STN can be extended to encompass multilayer (and multinetwork) STN architectures. The implementation of the SCAF architecture is left to you as an exercise. We begin this section, as we have all previous simulation discussions, with a presentation of the data structures used to construct the STN simulator. From there, we proceed with the development of the algorithms used to perform signal processing within the simulator. We close this section with a discussion of how a multiple STN structure might be created to record a temporal sequence of related patterns.

9.5.1 STN Data Structures

The design of the STN simulator is reminiscent of the design we used for the CPN in Chapter 6. We therefore recommend that you review Section 6.4 prior to continuing here. The reason for the similarity between these two networks is that both networks fit precisely the processing structure we defined for performing competitive processing within a layer of units. (Although the STN is not competitive in the same sense that the hidden layer in the CPN is, we shall see that STN units respond actively to inputs in much the same way that CPN hidden-layer units respond.) The units in both the STN and the competitive layer of the CPN operate by processing normalized input vectors, and even though competition in the CPN suppresses the output from all but the winning unit(s), all network units generate an output signal that is distributed to other PEs. The major difference between the competitive layer in the CPN and the STN structure is related to the fact that the output from each unit in the STN becomes an input to all subsequent network units on the layer, whereas the lateral connections in the CPN simulation were handled by the host computer system, and never were actually modeled. Similarly, the interconnections between units on the layer in the STN can be accounted for by the processing algorithms performed in the host computer, so we do not need to account for those connections in the simulator design.

Let us now consider the top-level data structure needed to model an STN. As before, we will construct the network as a record containing pointers to the appropriate lower-level structures, and containing any network-specific data parameters that are used globally within the network. Therefore, we can create an STN structure through the following record declaration:

record STN =
begin
  UNITS : ^layer;        {pointer to network units}
  a, b, c, d : float;    {network parameters}
  gamma : float;         {constant value for gamma}
  upper : ^STN;          {pointer to next STN}
  lower : ^STN;          {pointer to previous STN}
  y : float;             {output of last STN element}
end record;

Notice that, as illustrated in Figure 9.19, this record definition differs from all previous network record declarations in that we have included a means for stacking multiple networks through the use of a doubly linked list of network record pointers. We include this capability for two reasons:

1. As described previously, a network that recognizes only one pattern is not of much use. We must therefore consider how to integrate multiple networks as part of our simulator design.

2. When multiple STNs are used to time-dilate temporal patterns (as in the SCAF), the activity patterns of the network units can be used as input patterns to another network for further classification.

Figure 9.19 The data structure of the STN simulator is shown. Notice that, in this network structure, there are pointers to other network records above and below to accommodate multiple STNs. In this manner, the same input data can be propagated efficiently through multiple STN structures.

Finally, inspection of the STN record structure reveals that there is nothing about the STN that will require further modifications or extensions to the generic simulator structure we proposed in Chapter 1. We are therefore free to begin developing STN algorithms.
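For readers who prefer to follow along in a modern language rather than in the pseudocode used here, one possible rendering of the STN record is the Python dataclass below. The field names mirror the record above; the Layer type is a minimal assumption standing in for the generic layer structure of Chapter 1.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Layer:
    # Assumed minimal stand-in for the generic layer structure:
    # one stored weight vector per unit, plus the unit output values.
    weights: List[List[float]]
    outs: List[float] = field(default_factory=list)

@dataclass
class STN:
    units: Layer                      # pointer to network units
    a: float                          # network parameters
    b: float
    c: float
    d: float
    gamma: float                      # constant value for gamma
    upper: Optional["STN"] = None     # pointer to next STN
    lower: Optional["STN"] = None     # pointer to previous STN
    y: float = 0.0                    # output of last STN element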
9.5.2 STN Algorithms

Let us begin by considering the sequence of operations that must be performed by the computer to simulate the STN. Using the speech-recognition example described in Section 9.2.1 as the basis for the processing model, we can construct a list of the operations that must be performed by the STN simulator.

1. Construct the network, and initialize the input connections to the units such that the first unit in the layer has the first normalized input pattern contained in its connections, the second unit has the second pattern, and so on.

2. Begin processing the test pattern by zeroing the outputs from all units in the network (as well as the STN.y value, since it is a duplicate copy of the output value from the last network unit), and then applying the first normalized test vector to the input of the STN.

3. Calculate the inner product between the input test vector and the weight vector for the first unprocessed unit.

4. Compute the sum of the outputs from all units on the layer from the first to the previous unit, and multiply the result by the network d term.

5. Add the result from step 3 to the result from step 4 to produce the input activation for the unit.

6. Subtract the threshold value (Γ) from the result of step 5. If the result is greater than zero, multiply it by the network b term; otherwise, substitute zero for the result.

7. Multiply the negative of the network a term by the previous output from the unit, and add the result to the value produced in step 6.

8. If the result of step 7 is less than or equal to zero, multiply it by the network c term to produce ẋ. Otherwise, use the result of step 7 without modification as the value for ẋ.

9. Compute the attack value for the unit by multiplying the ẋ value calculated in step 8 by a small value indicating the network update rate (δt) to produce the update value for the unit output. Update the unit output by adding the computed attack value to the current unit output value.

10. Repeat steps 3 through 9 for each unit in the network.

11. Repeat steps 3 through 10 for the duration of the time step, Δt. The number of repetitions that occur during this step will be a function of the sampling frequency for the specific application.

12. Apply the next time-sequential test vector to the network input, and repeat steps 3 through 11.

13. After all the time-sequential test vectors have been applied, use the output of the last unit on the layer as the output value for the network for the given STP.

Notice that we have assumed that the network units update at a rate much more rapid than the sampling rate of the input (i.e., the value of δt is much smaller than the value of Δt). Since the actual sampling frequency (given by 1/Δt) will always be application dependent, we shall assume that the network must update itself 100 times for each input pattern. Thus, the ratio of δt to Δt is 0.01, and we can use this ratio as the value for δt in our simulations.

We shall also assume that you will provide the routines necessary to perform the first two operations in the list. We therefore begin developing the simulator algorithms with the routine needed to propagate a given input pattern vector to a specified unit on the STN. This routine will encompass the operations described in steps 3 through 5.
function activation (net : STN; unumber : integer; invec : ^float[]) return float;
{propagate the given input vector to the specified STN unit}
var i : integer;            {iteration counter}
    sum : float;            {accumulator}
    others : float;         {unit output accumulator}
    connects : ^float[];    {locate connection array}
    unit : ^float[];        {locate unit outputs}
begin
  sum = 0;                              {initialize accumulator}
  others = 0;                           {ditto}
  unit = net.UNITS^.OUTS;               {locate unit arrays}
  connects = net.UNITS^.WEIGHTS[unumber];
  for i = 1 to length(invec)            {for all input elements}
  do                                    {compute sum of products}
    sum = sum + connects[i] * invec[i];
  end do;
[...]
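A minimal Python sketch of the complete propagation procedure, following steps 3 through 11 of the list above (and step 13 for the final output value), is given below. It uses plain lists in place of the simulator's record structures and is an illustrative reading of the steps rather than the text's own routine; the parameter values in the example call are assumptions.

# Sketch of STN propagation following steps 3-11 (and 13) of Section 9.5.2.
# Plain lists replace the simulator's record structures.
def propagate(weights, inputs, a, b, c, d, gamma, dt=0.01, updates_per_pattern=100):
    """weights[i] is the normalized exemplar stored on unit i; `inputs` is the
    time-ordered sequence of normalized test vectors.  Returns the output of
    the last unit after the whole sequence has been presented (step 13)."""
    n = len(weights)
    x = [0.0] * n                                    # step 2: zero all unit outputs
    for invec in inputs:                             # step 12: next test vector
        for _ in range(updates_per_pattern):         # step 11: repeat over delta-t
            for i in range(n):                       # step 10: every unit
                net = sum(wv * iv for wv, iv in zip(weights[i], invec))   # step 3
                net += d * sum(x[:i])                # steps 4-5: prior units' outputs
                net -= gamma                         # step 6: subtract threshold
                excite = b * net if net > 0.0 else 0.0
                rate = -a * x[i] + excite            # step 7
                if rate <= 0.0:
                    rate *= c                        # step 8: slow the decay
                x[i] += rate * dt                    # step 9: attack-function update
    return x[-1]

# Illustrative use: three units storing an orthogonal sequence of patterns.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
stored_order = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
reversed_order = list(reversed(stored_order))
print(propagate(W, stored_order, a=1.0, b=1.0, c=0.1, d=0.5, gamma=0.3))    # larger
print(propagate(W, reversed_order, a=1.0, b=1.0, c=0.1, d=0.5, gamma=0.3))  # smaller

Presenting the stored sequence drives the final unit's output substantially higher than presenting the same vectors in reverse order, which is the behavior the layered STN relies on for recognition.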