Temporal generalization accounts for response resurgence in the peak procedure

Behavioural Processes 74 (2007) 126–141

Federico Sanabria*, Peter R. Killeen

Arizona State University, United States

Received 20 June 2006; received in revised form 20 October 2006; accepted 24 October 2006

* Corresponding author at: Department of Psychology, Arizona State University, P.O. Box 871104, Tempe, AZ 85287-1104, United States. Tel.: +1 480 965 0756; fax: +1 480 965 8544. E-mail address: Federico.Sanabria@asu.edu (F. Sanabria).

© 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.beproc.2006.10.012

Abstract

The peak interval (PI) procedure is commonly used to evaluate animals’ ability to produce timed intervals. It consists of presenting fixed interval (FI) schedules in which some of the trials are replaced by extended non-reinforced trials. Responding will often resume (resurge) at the end of the non-reinforced trials unless precautions are taken to prevent it. Response resurgence was replicated in rats and pigeons. Variation of the durations of the FI and the non-reinforced probe trials showed it to be dependent on the time when reinforcement is expected. Timing of both the normal time to reinforcement and the subsequent time to reinforcement during the probe trials followed Weber’s law. A quantitative model of resurgence is described, suggesting how animals respond to the signaling properties of reinforcement omission. Model results were simulated using a stochastic binary counter.

Keywords: Model; Peak procedure; Reinforcement omission; Resurgence; Temporal control; Pigeons; Rats

1. Introduction

Many organisms can coordinate their behavior with periodic changes in their environment, even without the assistance of concurrent stimuli marking those changes (Lejeune and Wearden, 1991). This ability is usually labeled timing, and it implies the operation of a mechanism that, when exposed to the periodicity of a stimulus, outputs an isoperiodic response in phase with the environmental period. A paradigmatic example of timing is the pattern of responding obtained under fixed-interval (FI) schedules of reinforcement (Ferster and Skinner, 1957). In an FI trial, the onset of a cue such as insertion of a response lever starts an interval of fixed duration, after whose completion a target response is reinforced. Repeated exposure to the FI generates a scalloped pattern in cumulative response records.

The peak interval (PI) procedure introduces an aperiodicity in reinforcement that is highly informative of the timing mechanism that generates the iconic scalloped records (Catania, 1970; Roberts, 1981). This procedure consists of cancellation of reinforcement in a random selection of probe trials, each lasting for at least twice the FI. Histograms of responses in probe trials have been characterized as measures of temporal generalization; at first flat, after extended training they come to resemble a Gaussian distribution with slight positive skew centered on the target FI, having a width proportional to the target interval (Roberts, 1981).

The orderliness of performance under PI and other timing procedures usually elicits model-building from those with the penchant (e.g., Church and Broadbent, 1990; Killeen, 2002; Wearden and Ferrara, 1995); their models typically involve: (a) a pacemaker that feeds pulses to (b) a counter or accumulator; on reinforcement, the counter flushes into (c) a memory module that updates estimated counts-to-reinforcement, (d) which may be Gaussian distributed due to noise in the counter or in memory; finally, (e) a decision algorithm generates responses depending on the divergence between current and stored counts. This is a very flexible theoretical framework, as demonstrated by the diversity of models it supports (Church and Broadbent, 1990; Gallistel and Gibbon, 2000; Killeen and Taylor, 2000; Lejeune, 1998; Machado, 1997; Wearden, 1999). Aside from the ubiquitous
scalar invariance of PI performance (Gibbon, 1977, 1981, 1986; Gibbon and Allan, 1984), further constraints to timing models provided by PI studies have been based on the analysis of within-trial performance (Cheng and Westwood, 1993; Church et al., 1994), the detection of features of timing supported by specific brain structures (Buhusi and Meck, 2005), the impact of gaps in the presentation of interval-marking cues during probe trials (Aum et al., 2004; Buhusi and Meck, 2006; Buhusi et al., 2006; Devaca et al., 1994; Hopson, 1999; Kaiser et al., 2002; Zentall and Kaiser, 2005), and the manipulation of reinforcement/punishment contingencies (Cheng, 1992; Cheng et al., 1993; MacEwen and Killeen, 1991; Matell and Meck, 1999).

Despite the extensive use of the PI procedure to evaluate timing models, there are systematic divergences of temporal gradients from precisely Gaussian form that are not completely understood and that have yet to be theoretically exploited (but see Church, 1999). The generality of two such divergences is well illustrated by selections from recent reports of peak performance under various conditions (Fig. 1). Note that in all panels of Fig. 1 response distributions are positively skewed and response rates increase towards the end of probe trials, a pattern that we refer to as response resurgence.

Church et al. (1991) identified these divergences from Gaussian distributions as asymmetrical sources of variance in temporal generalization, distinct from the symmetrical variance captured by Gaussian distributions that is the cornerstone of scalar expectancy theory (SET; Gibbon, 1977). In their experiments with Norway rats, Church et al. (1991) demonstrated that when the duration of probe trials (and hence the time of subsequent reinforcement) was variable, response resurgence disappeared and skewness was virtually eliminated. This finding suggests that the asymmetries in response
distributions are due to the anticipation of either current trial termination or subsequent reinforcement. This hypothesis implies that animals may anticipate reinforcement despite cues, such as inter-trial interval and trial stimulus onset, that always signal its imminence. This occurs even though the animals are responsive to them, as their onset demonstrably changes response rates, and by implication resets the counter in the timing mechanism (Roberts, 1981; Roberts and Church, 1978). Under this hypothesis, rats are like commuters who keep their eyes on the tracks for an arriving train, even though loudspeakers always announce its arrival in advance.

Using pigeons as subjects, Kirkpatrick-Steger et al. (1996) reported a change in response resurgence that is dependent on PI parameters, and is not accounted for by the anticipation hypothesis of Church et al. (1991). As shown in Fig. 2A, Kirkpatrick-Steger and associates systematically replicated the often reported skewness and resurgence in response rates when the duration of probe trials (P) was more than four times the target fixed interval F (i.e., P/F > 4; in Church et al., P/F = 2, and 12). When P/F = 4, however, response resurgence took the form of a second peak centered about 3F, instead of a monotonic ramp (Fig. 2B). This second peak was also reported by Saulsgiver et al. (2006) for some pigeons exposed to P/F = . If responding tracks the termination of trials, or forthcoming reinforcement, as suggested by Church et al. (1991), it is not clear why response rates would systematically decrease before the end of the trial under certain P/F ratios.

Fig. 1. Peak interval performance in five experiments. (A) Female Wistar rats under various 5-HT receptor agonist and antagonist treatments (reprinted from Asgari et al. (2006), with permission from Elsevier). (B) PVG rats under various doses of d-amphetamine (reprinted from Bayley et al. (1998), with kind permission of Springer Science and Business Media). (C) Pigeons with hippocampal lesion and sham control (reprinted from Colombo et al. (2001), with permission from Elsevier). (D) Starlings trained in two fixed interval (FI) schedules of reinforcement (reproduced from Rodríguez-Gironés and Kacelnik (1999), with permission from Elsevier). (E) Pigeons trained in two FI schedules of reinforcement (reproduced from Grace and Nevin (1999), with permission from Elsevier).

Fig. 2. Peak interval performance of pigeons under two probe-to-FI (P/F) duration ratios: (A) eight (120/15) and (B) four (120/30; reprinted from Kirkpatrick-Steger et al. (1996), published by APA, with permission). (C) Simulation of peak interval performance on 120-s probe trials, using a stochastic binary counter (p = .9, pulse rate = s−1) trained on 30-s trials (reprinted from Killeen and Taylor (2000), published by APA, with permission).

It is possible that the timing mechanism supporting PI performance is different in rats than in pigeons, and certain components of the pigeons’ mechanism remain unaccounted for. In light of the generality of FI and PI performance across species (e.g., Lejeune and Wearden, 1991; Wearden and McShane, 1988), this is unlikely. It is perhaps more appropriate, or at least more parsimonious, to assume that the differences in performance are due to procedural differences, in this case differences in the properties of the interval-marking cues. Whereas Church and associates used salient diffuse cues (tone and houselight), Kirkpatrick-Steger, Saulsgiver and their coworkers used the more localized illumination of a key in a continuously illuminated chamber. It is possible that, because of the low contrast of key light relative to chamber illumination, pigeons were only sensitive to the onset of the key light when they were oriented towards the key. In that case, when pigeons orient away from the key during probe trials, reorienting towards the illuminated key before the end of the trial would generate a
situation indiscriminable from reorienting shortly after the intertrial interval (ITI). When the latter situation takes place during FI trials, key pecking may be reinforced. Therefore, reorienting towards the key during probe trials might engender key pecking and, possibly, the second peak. Second peaks may thus require the inconspicuity of interval-marking cues, of the kind typically required for autoshaping (Wasserman, 1973). However necessary, this condition is clearly insufficient to generate second peaks. This attentional hypothesis does not explain the dependency of second peaks on particular P/F ratios, which remains a major challenge to theories of timing.

Fig. 3. A binary counter. With each input, the register increments a radix = 2 counter. Reinforcement, upon the 12th input, averages the counters to the long-term store, which is the criterion. Upon each input the device compares its bit settings to the criterion. A standard binary counter would weight each bit by 2^i; this model weights the bits by how often they were set when reinforcement occurred (and possibly gives them inhibitory weights if they were not set when reinforcement occurred). The product of the criterion and present time are weighted and summed to predict response probability. In this example, the probability of a correct increment is 1.0. In a real embodiment, p < 1, causing generalization around the reinforced values.

Killeen and Taylor’s (2000) Theory of Stochastic Counters (TSC) explicitly confronts that challenge, and provides a comprehensive account for skewness of response rates, resurgence and second peaks. According to TSC, these attributes of temporal production may be due, not to the interference of subsequent trials or attentional processes on the operation of a simple mechanism, but rather to the continued operation of a stochastic binary counter. This counter may be characterized as a set of neurons or binary switches (bits) arranged in sequence, represented in Fig. 3 by each row of squares. The onset of the interval-marking cue (input = 0) sets all bits to OFF (clear squares), and every pulse from the pacemaker sends an input signal that travels from one bit to the next. If a bit is ON (dark squares), an input flips it to OFF and the process continues to the next bit in the sequence. If a bit is OFF, an input flips it to ON with probability p and the process stops. For p = 1, this is a standard binary counter, and could be instantiated in a sequence of neurons that have a two-pulse threshold for firing.

Reinforcement sends the current counter configuration (input 12 in Fig. 3) to memory, adding it to a weighting vector of previously stored configurations. A unit is added to weights corresponding to bits that are ON, and a unit subtracted from weights corresponding to bits that are OFF. The decision algorithm multiplies the current configuration in real time (e.g., input 28) by the weighting vector stored in memory, and responding occurs if their dot product (i.e., filter output) is larger than a threshold. This is essentially a model of conditioning to a compound cue, with each of the bits a different feature of the stimulus, and with equal speeds of learning both inhibitory and excitatory stimuli.

A binary counter like the one described above, with p = 1 and a threshold equal to the weighted sum of the bits in the reinforced counter configuration, would generate a single response every time it reaches the reinforced counter configuration in a PI procedure. As long as it does not run short of switches, it would be a precise and accurate clock. When p < 1, the system generates skewed temporal generalization gradients like those typically reported in the PI literature. It can also produce persistent responding that closely matches the pattern of responding obtained by Kirkpatrick-Steger et al. (1996) (see Fig. 2C). Second peaks occur because the configuration of lower order bits recurs periodically in probe trials.
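The counter, memory update and decision rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of Killeen and Taylor (2000): the function names, the six-bit register, the lowest-order-bit-first ordering, the treatment of a failed flip (the bit stays OFF and the process stops), and the use of "greater than or equal to" when comparing the filter output with the threshold are all choices made here for illustration only.

```python
import random

def pulse(bits, p, rng):
    """Apply one pacemaker pulse to the register (lowest-order bit first)."""
    for i in range(len(bits)):
        if bits[i]:
            bits[i] = 0              # ON bit: flip to OFF, pass the input onward
        else:
            if rng.random() < p:
                bits[i] = 1          # OFF bit: flip to ON with probability p
            return                   # assumption: the process stops either way

def count(n_pulses, n_bits, p, rng):
    """Register configuration after n_pulses, starting with all bits OFF."""
    bits = [0] * n_bits
    for _ in range(n_pulses):
        pulse(bits, p, rng)
    return bits

def reinforce(weights, bits):
    """Add one unit to weights of ON bits, subtract one from weights of OFF bits."""
    for i, bit in enumerate(bits):
        weights[i] += 1 if bit else -1

def filter_output(weights, bits):
    """Dot product of the current configuration and the stored weighting vector."""
    return sum(w * b for w, b in zip(weights, bits))

def response_pulses(weights, threshold, n_pulses, p, rng):
    """Pulse numbers in one probe trial at which the filter output reaches threshold."""
    bits = [0] * len(weights)
    responses = []
    for t in range(1, n_pulses + 1):
        pulse(bits, p, rng)
        if filter_output(weights, bits) >= threshold:   # assumption: >= rather than >
            responses.append(t)
    return responses

rng = random.Random(0)                    # hypothetical seed, for repeatability
N_BITS = 6                                # hypothetical register size
weights = [0] * N_BITS
reinforced = count(12, N_BITS, 1.0, rng)  # reinforcement on the 12th input
reinforce(weights, reinforced)
threshold = filter_output(weights, reinforced)

exact = response_pulses(weights, threshold, 48, 1.0, rng)   # error-free counter
noisy = response_pulses(weights, threshold, 48, 0.9, rng)   # stochastic counter
print(exact, noisy)
```

With p = 1 and a single reinforced configuration, the sketch responds only at the 12th pulse of the probe. With p < 1, the pulses at which the threshold is reached, if any, vary from run to run, which is the source of generalization in this kind of counter.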
Prolonged exposure to probe trials, however, will condition negative weights for those higher order bits that are ON after the target interval elapses; they become negative features of the compound cue. Unless positive features overshadow these negative features, prolonged training with a high frequency of probes should cause extinction to those configurations of the register, and second peaks should become flatter or disappear. In the simulation in Fig. 2C, the weighting vector was updated only during FI trials, leaving no chance for conditioned inhibition to evolve.

To summarize, some divergences of PI performance from Gaussian distributions (skewness and resurgence) may be explained by anticipation of reinforcement (Church et al., 1991), but this would not explain second peaks. An attentional module that predicts an occasional neglect of the interval-marking cue may be superimposed on the anticipation model to explain second peaks, but this would not explain their dependency on PI parameters. A stochastic binary counter may account for divergences from Gaussian distributions, invoking neither reinforcement anticipation nor inattention, but resurgence and second peaks are predicted to decrease with training. Although the right tail of the bell curve does decrease with training, it is not eliminated with extended probe trials (Kirkpatrick-Steger et al., 1996; Saulsgiver et al., 2006).

We attempted to replicate Kirkpatrick-Steger et al.’s (1996) data to evaluate and reconcile these models. Pigeons and rats served as experimental subjects to detect differences between species that may have confounded the development of general models of PI performance. Salient interval-marking cues were used (tones for rats, simultaneous darkening of the chamber and illumination of response key for pigeons) to minimize the possibility of cue neglect. A broad spectrum of probe trial durations and target intervals was examined to generalize our conclusions across these parameters.

2. Experiment 1

2.1. Method

2.1.1. Subjects

Six experimentally naïve Sprague-Dawley rats (Rattus norvegicus; Charles River) were approximately 160 days old at the beginning of the experiment. Rats were housed individually in a room with a 12 h:12 h light:dark cycle, with the light cycle beginning at 1800 h; experiments were conducted only during the dark cycle. The rats’ running weights were based on 85% of their free-feeding weights, as estimated from the provider’s growth curves. Each rat was weighed immediately prior to an experimental session. When required, supplementary feeding of 8604 rodent chow (Harlan Teklad, Madison, WI) was given at the end of each day, no fewer than 12 h before experimental sessions. Supplementary feeding amounts were based 50% on a moving average of the amount historically fed, and 50% on current deviations from target running weight. Water was always available in the home cages.

2.1.2. Apparatus

Experimental sessions were conducted in six MED Associates modular test chambers (305 mm long, 241 mm wide, and 292 mm high), each enclosed in a sound- and light-attenuating box equipped with a ventilating fan. The front and rear walls and the ceiling of the experimental chambers were made of clear plastic, and the front wall was hinged and functioned as a door to the chamber. One of the two aluminum side panels served as a test panel. The floor consisted of thin metal bars positioned above a catch pan. A square opening (51 mm sides) located 15 mm above the floor and centered on the test panel provided access to the hopper (MED Associates, ENV-200-R2M) for 45 mg food pellets (Noyes Precision pellets, Improved Formula A/I, Research Diets, Inc., New Brunswick, NJ). A single pellet was provided with each activation of a dispenser (MED Associates, ENV-203). Two retractable levers (MED Associates, ENV-112CM) flanked the food hopper. Only the lever closer to the chamber door, to the right of the hopper, was operative; the other lever remained retracted throughout the experiment. The
center of the lever was 80 mm from the center of the food hopper, and 21 mm from the floor. Lever presses were recorded when a force of approximately 0.2 N was applied to the end of the lever; its activation generated a 100-ms refractory period in which no further activations were registered. A multiple tone generator (MED Associates, ENV-223) was used to produce kHz tones at approximately 75 dB through a MED Associates ENV-224AM speaker centered on the top of the test panel, 125 mm above the food hopper. The ventilation fan mounted on the rear wall of the sound-attenuating chamber provided masking noise of approximately 60 dB. There was no illumination of the test chambers during experimental sessions. Experimental events were arranged via a Med-PC® interface connected to a PC controlled by Med-PC IV® software.

2.1.3. Procedure

2.1.3.1. Hopper training and autoshaping

At the beginning of the experiment each rat was unsystematically and permanently assigned to a different test chamber. When a rat was placed in its corresponding test chamber for the first time, the food hopper was baited with 20 pellets. After h rats were removed from their chambers, and each hopper was inspected for leftover pellets. This procedure was repeated daily for each rat until all pellets in the hopper were eaten. Immediately following hopper training, lever pressing was autoshaped by inserting the lever into the chamber every 208 s for at least 16 s, followed by a pellet delivery. Each daily autoshaping session consisted of 34 lever presentations; 15 autoshaping sessions were conducted for each rat.

2.1.3.2. Fixed interval pretraining

Immediately after autoshaping, rats were introduced to a fixed interval 15 s (FI-15 s) schedule of reinforcement. Pretraining sessions started with a 5-min chamber habituation period, in which levers remained retracted. The start of each experimental trial was signaled by the insertion of the lever and the
onset of a kHz tone. A lever press occurring 15 s or more after trial start was immediately followed by the retraction of the lever, the offset of the tone, and the delivery of one pellet, followed by a 15 s inter-trial interval (ITI), which in turn was followed by the insertion of the lever, onset of the tone, and initiation of the next FI. Sessions ended after 150 trials or h, whichever occurred first. Each rat was pretrained for six sessions.

2.1.3.3. Peak training

The first experimental condition was similar to pretraining, except for the following modifications to the experimental protocol. Trials were divided into blocks of four, and one trial of each block was randomly selected as a probe trial. In probe trials, the tone remained on and the lever inserted for 120 s; lever presses were recorded, but they had no other programmed effect. In the other three FI trials of each block, experimental conditions were the same as during pretraining except that if no lever press was made during a trial, that trial ended without pellet delivery after 120 s (i.e., the duration of probe trials). Sessions ended after 120 trials (probe + training) or h, whichever occurred first. The first experimental condition was conducted for 32–33 sessions; in subsequent conditions, the number of sessions was reduced to 21–23. The target FI schedule and probe trial duration were varied across experimental conditions as indicated in Table 1. After Condition 3, levers remained inserted during ITIs, and trials were only signaled by the tone. Rat was excluded from the experiment after Condition 2, due to an intermittent failure of the lever in its box that was only detected during Condition.

Table 1. Chronological order and parameters of peak training conditions during Experiment 1

Order   Target FI schedule, F (s)   Probe trial duration, P (s)   P/F
1a      15                          120                           8
2a      30                          240                           8
3a      30                          120                           4
4       30                          240                           8
5       60                          240                           4
6       15                          240                           16
7       15                          60                            4

a Lever was retracted during ITI.

2.1.4. Data analysis

Peak interval performance on individual probe trials has typically been described by a sequence of three states: an initial low response rate (post-reinforcement pause, or PRP), a high response rate that typically envelops the target FI, and a terminal low response rate (Church et al., 1994). The presence of response resurgence would be captured by a fourth, high-rate state. To detect resurgence, the pattern of responding on individual trials was examined using a modified version of an analytical procedure employed by Church et al. (1994) and Hanson and Killeen (1981). The parameters of two nested models were fit to response times obtained from each probe trial. Both models assumed that the sequence of responses on each probe trial could be partitioned into a number of alternating low-high states (three in one model, four in the other). The response rate during each state was assumed constant, and represented as the mean rate during that time. The start-time of the first state (PRP) was assumed to coincide with the initiation of the trial; the partition was constrained to require that the response rate in the first (PRP) state be lower than the response rate during the second state. All possible partitions of the responses consistent with these conditions were examined to find the partition that minimized the sum of squared deviations from Model 1 (three epochs of constant responding) and Model 2 (four epochs). Predicted rates within states were set at the average observed rate for those partitions. The corrected Akaike Information Criterion (AICc; Burnham and Anderson, 2002) was used to select the best fitting model relative to the number of free parameters used: six in Model 1 (two break points, three rates, one residual variance), eight in Model 2.

States were called Low or High in relation to the rates in the states preceding them. The first and second states were Low and High, as noted. If the sequence of states in a trial followed the pattern Low-High-Low, Low-High-High-Low, or Low-High-Low-Low, the response rate function for that trial was deemed
bitonic, showing typical inverted U shape without evidence of resurgence. If the sequence of states in a trial followed the pattern Low-High-Low-High, the response rate function for that trial was deemed tritonic, and provided evidence for resurgence. The first and second break points were labeled as start- and stop-times for that trial. The third break point in a tritonic response rate function was the resurgence time for that trial. Monotonic trials (Low-High-High patterns) were also identified. These were, however, very rare: only three such trials were observed in the sample of about 2000 trials examined. Thus, monotonic trials were excluded from the break-point detection analysis.

Fig. 4. Peak interval performance of rats in Experiment 1, from 0 s to twice the FI. Mean response rates were calculated over 2-s bins and normalized by dividing each of them by the maximum rate in the same experimental condition. Numbers in the legend are fixed interval (F) and probe duration (P), in that order. Dark symbols and the letter “r” in parentheses indicate conditions where levers were retracted between trials.

2.2. Results

2.2.1. Autoshaping and pretraining

The typical rat responded on half of the trials (17/34) by the third autoshaping session. All rats obtained the 150 programmed pellets in the first and last FI pretraining sessions. Response rates during the last pretraining session ranged from 18 to 41 responses/min.

2.2.2. Peak training

2.2.2.1. Mean response rates

Fig. 4 shows that responses clustered around the target fixed interval F in probe trials across all experimental conditions. Fig. 5 depicts changes in response rate as a function of time through a probe trial (left of vertical line) and through the subsequent (post-probe) trial (right of vertical line). Response rates progressively decreased after F, and then increased again in a monotonic fashion, with the possible exception of one condition (F = 60, P = 240), where the mean response rate did not fall as low as in the other conditions, and did not thereafter increase. Across experimental conditions, the mean probe terminal response rate (between 0.9P and P) was 45% of the maximum response rate of each condition.

The approximate superimposition of the relative distributions of responses in Fig. 4 suggests that their dispersion was approximately proportional to F and, therefore, behaved according to Weber’s Law. There were, however, at least two systematic deviations from exact superimposition: (a) gradients were slightly broader in each FI condition when P = 240; (b) response rates prematurely peaked around 0.25F when the lever was retracted between trials. A comparison of two otherwise identical conditions (F = 30, P = 240) suggests no effect of lever retraction except for the premature peak. The direction of the skewness of response distributions, measured by splitting the area under each curve at F in Fig. 4, did not vary systematically on any other dimension.

Under the hypothesis that responses were under the partial control of the delivery of the subsequent reinforcer, we fit two Gaussian distributions to response rates, one centered on reinforcement at F, and the other centered on the subsequent reinforcement, at P + F (following Whitaker et al., 2003). The peak and spread of each distribution (R_F, R_P+F, σ_F and σ_P+F) were free parameters. The curves in Fig. 5 to the left of the vertical line in each plot show that the model described performance during probe trials quite well.¹ Appendix A.1 gives the mathematical details; Table 2 contains the parameters. Fig. 6 plots the standard deviations against the means of their distributions (F or P + F), along with the best fitting simple Weber function, σ = wt, where w is the Weber Fraction, or coefficient of variation, and t is the stipulated mean of the distributions. There is a reasonable fit by this simple model to the data, further evidencing the harmonious nature of control by time to reinforcement.
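The double Gaussian description and the simple Weber function can be illustrated with a short Python sketch. It is offered as a hedged example, not as the authors' analysis: the exact model is the one given in their Appendix A.1, so the plain sum of two Gaussian-shaped rate functions below is an assumption, as is estimating w by least squares through the origin. The standard deviations are transcribed from Table 2 (Experiment 1), and all function and variable names are hypothetical.

```python
import math

def double_gaussian(t, F, P, R_F, R_PF, s_F, s_PF):
    """Assumed form: response rate at time t as the sum of two Gaussian-shaped
    components, one peaking at F and one at P + F."""
    first = R_F * math.exp(-((t - F) ** 2) / (2 * s_F ** 2))
    second = R_PF * math.exp(-((t - (P + F)) ** 2) / (2 * s_PF ** 2))
    return first + second

def weber_fraction(means, sds):
    """Least-squares slope w of sigma = w * t, with the line forced through the
    origin (an assumed fitting method)."""
    return sum(t * s for t, s in zip(means, sds)) / sum(t * t for t in means)

# Conditions and fitted spreads as listed in Table 2.
F = [15, 15, 15, 30, 30, 30, 60]
P = [60, 120, 240, 120, 240, 240, 240]
sigma_F = [7.70, 8.62, 10.29, 13.42, 15.24, 13.45, 28.72]
sigma_PF = [25.36, 45.94, 81.25, 39.25, 91.78, 96.11, 274.98]

# Stipulated means: F for the first Gaussian, P + F for the second.
means = F + [p + f for p, f in zip(P, F)]
sds = sigma_F + sigma_PF

w = weber_fraction(means, sds)

# Predicted rate at t = F for the F = 15 s, P = 120 s condition (Table 2 values).
rate_at_F = double_gaussian(15, 15, 120, 1.08, 0.94, 8.62, 45.94)
print(round(w, 3), round(rate_at_F, 3))
```

Forcing the line through the origin matches the one-parameter form σ = wt stated in the text; a fit with a free intercept would be a different, generalized Weber function.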
We examined the persistence of response rates in trials following unreinforced probe trials (post-probe) and compared them with response rates in trials following reinforced FI trials (post-FI; Fig. 5, right of vertical line). Because probe trials were mostly post-FI, the initial rise in responding in probe trials is virtually identical to the mean initial rise of all post-FI trials. Responding in post-probe trials, however, restarted at a much higher rate than in post-FI trials, essentially matching the response rate at the end of the preceding probe. By the time reinforcement was due, responding in post-probe trials was either similar to or slightly below responding in post-FI trials. To account for these differences, we assumed that response rates rise from their level at the end of the probe toward their standard peak rate at around the time of reinforcement. In particular, responding that resurged in the preceding probe trial becomes less dominant, and typical responding between trial initiation and reinforcement becomes dominant, transitioning at rate λ (see Table 2). A mathematical model of this transition is described in Appendix A.2. Fig. 5, right of the vertical line, shows that the model describes post-probe performance adequately.

2.2.2.2. Break point detection

The break point detection algorithm was run on the last three to five sessions of each condition. Model parameters (break-points and response rates) were estimated for each individual trial and used to simulate responding. Simulated responding was aggregated to generate, for each condition, a function relating mean response rate to time in probe trial. Mean simulated response rates accounted for 91–96% of the variance in the obtained response rates (Fig. 5, left of vertical line).

The average probability of a tritonic resurgence (Low-High-Low-High) response rate function ranged from .11 to .74 (Fig. 7) across experimental conditions. The probability of tritonic rate functions did not vary linearly with F, P, or F/P,
but it was significantly higher when the lever was retracted between trials than when the lever was left in place. A within-subject resampling of the probability of tritonic rate functions in each condition (5000 iterations) indicated that an equal or larger mean difference was unlikely to occur by chance (p < .002). This means that resurgence was more likely with well-demarked intervals.

The relative frequency of starts in probe trials was positively skewed, peaking a few seconds before F (Fig. 8A). In two conditions where the lever was retracted between trials, a peak in start-times was detected soon after the lever was inserted in the chamber to initiate a trial. Stop-times were more symmetrically distributed and concentrated at 1.35F (Fig. 8B). The distributions of resurgences in probe trials with tritonic rate functions were somewhat heterogeneous (Fig. 8C). When levers were retracted between trials, and particularly when P = 120 s, resurgence-times peaked at different moments and then declined at various rates towards the end of the trial. In other experimental conditions, resurgence probability increased around the middle of the trial and rapidly approached asymptote, except for F = 15 s, P = 60 s, where the increase in resurgence persisted until the end of the trial. Fig shows that across experimental

¹ The total time between the onset of a probe trial and the next reinforcer was P + ITI + F. Analytically, the inclusion of the ITI in the estimation of timing parameters did not affect goodness of fit, because ITI did not vary across experimental conditions. Functionally, Roberts and Church (1978) have demonstrated that timing is unaffected by intervals that do not differentially signal reinforcement. Thus, analysis was based on the duration of the cue signaling time to reinforcement (F and P + F).

Fig. 5. Response rates of rats in Experiment 1 as a function of time (in seconds) through a probe trial (left of vertical line) and during the subsequent, post-probe trial (right of vertical line). Responding during ITI was not recorded. Gray circles indicate response rates in trials preceded by FI trials and serve as comparison between post-FI and post-probe performance. F and P indicate fixed interval and probe durations, respectively; LR indicates whether the response lever was retracted between trials. Predictions from a double Gaussian (Eq. (A1)) and a transition model (Eq. (A2)) are superimposed on probe and post-probe data, respectively.

Table 2. Parameters of the double Gaussian distribution and transition model of PI performance (Experiment 1)

F (s)   P (s)   Lever retraction   R_F (responses/s)   R_F+P (responses/s)   σ_F (s)   σ_F+P (s)   λ (s−1)
15      60      No                 1.11                0.51                  7.70      25.36       .082
15      120     Yes                1.08                0.94                  8.62      45.94       .072
15      240     No                 0.85                0.44                  10.29     81.25       .093
30      120     Yes                1.33                1.08                  13.42     39.25       .017
30      240     Yes                1.02                0.53                  15.24     91.78       .077
30      240     No                 0.91                0.46                  13.45     96.11       .036
60      240     No                 0.36                0.17                  28.72     274.98      .066

Fig. 6. The standard deviations of the Gaussian distributions are plotted against t = F (FI) and t = P + F (the time of the post-probe reinforcer) for Experiment 1.

Fig. 7. Mean probabilities of trials with tritonic rate functions across rats in Experiment 1. Bars indicate standard errors.

Fig. 8. Experiment 1: probability distributions of (A) starts and (B) stops in probe trials, from 0 s to 4F. (C) Probability distribution of resurgences in tritonic trials. Note that panel C is plotted on a different scale than panels A and B. Symbols and legend correspond to those of Fig.

per reinforcer is not favored by these data. Apparently, rats are biased to take a chance even when the signs are not portentous. But this is not news (see, e.g., Inglis et al., 1997). Our results are consistent with the notion that response rates integrate simultaneous intervals between various stimuli and food (Guilhardi et al.,
2005; Meck and Church, 1984) The interval-marking cues were prolonged (15 s) and reliable There were no uncued food presentations, and food followed the cues after Fs 75% of the time These cues engendered a transition from response rates controlled by a distal cue – the trial cue that was Ps prior – to response rates controlled by the proximal cue, predicting reinforcement in Fs Experiment Fig Obtained vs predicted [k(P + F)] resurgence times in Experiment Fitted k slopes are indicated for each subject conditions, median resurgence times were well described by the linear function k(P + F), with k tailored to each individual rat, and clustering around two-thirds 2.3 Discussion Temporal generalization gradients obtained from rats in various peak procedures are well described by Gaussian distributions with standard deviations proportional to the training FI schedule (F) for probe intervals between zero and 2F Beyond 2F, responding resurges at rates that are not a simple function of F A second Gaussian distribution centered on the next reinforcer captured the resurgence Like the first distribution, the standard deviation of the second Gaussian distribution was proportional to its mean This suggests that response rates were controlled not only by reinforcement in the current trial, but also by the subsequent reinforcement in the next trial This occurred despite the salience of interval-marking cues It is possible that responding resurges because, after stopping, animals may not discriminate between the current time within the trial and times between trial initiation and reinforcement (e.g., see Bizo et al., 2006)—as if the animals suddenly forgot how much time had passed since trial initiation The evidence provided here suggests otherwise If it were simply forgetting, rate of forgetting should be independent of probe duration Fig shows, however, that there is an extremely orderly relation between resurgence and time to the next reinforcer It should not be surprising 
that response rates track the temporal location of a second reinforcer when the first reinforcer is overdue (Whitaker et al., 2003) It is surprising, however, that the tone offset during ITI did not preclude responding prior to that cue in probe trials The rats would be signaled when reinforcement was imminent; why did not they just wait for that signal? If rats can exploit such forthcoming predictors to minimize wasted effort, why should response resurgence occur? The difference in response patterns following probe and FI trials indicates that rats were sensitive to the interval-marking cue once it occurred: they just did not wait for it Thus, an intuitive account based on the use of external cues to minimize number of responses To evaluate the generality of our observations, we trained pigeons on a PI procedure similar to the one implemented in Experiment for rats 3.1 Method 3.1.1 Subjects Five experienced adult White Carneaux pigeons (Columba livia) were housed individually in a room with a 12 h:12 h day:night cycle, with dawn at 0600 h They had free access to water and grit in their home cages The pigeons’ running weights were based on 80% of their free-feeding weights Each pigeon was weighed immediately prior to an experimental session and was excluded from a session if its weight exceeded 8% of its running weight When required, a supplementary feeding of ACE-HI pigeon pellets (Star Milling Co.) 
was given at the end of each day, at least 12 h before experimental sessions were conducted. Supplementary feeding amounts were determined as in Experiment 1.

3.1.2 Apparatus

Three MED Associates modular test chambers, similar to those described in Experiment 1, were used. The test panel contained a plastic transparent response key (25 mm in diameter: MED Associates, ENV-123AM), centered horizontally 70 mm from the ceiling. The key could be illuminated by white light from two diodes. Activation of the key generated a 100-ms period in which no further activations were registered. A rectangular opening (52 mm wide, 57 mm high) located 20 mm above the floor and centered on the test panel could provide access to milo when a grain hopper was activated (Coulbourn Instruments, H14-10R). A houselight (MED Associates, ENV-215M) was mounted 12 mm from the ceiling on the sidewall opposite the test panel.

3.1.3 Procedure

3.1.3.1 Fixed interval pretraining

Pigeons were introduced directly to a FI-15s schedule of reinforcement. Sessions started with a 15-s ITI in which the houselight was illuminated. The start of each experimental trial was signaled by the extinction of the houselight simultaneously with the illumination of the response key. A key peck after 15 s of trial start extinguished the key, activated the food hopper for 2.5 s, and then started an ITI. Sessions ended after 80 trials or h, whichever happened first. Each pigeon was pretrained for five sessions.

3.1.3.2 Peak training

The first experimental condition was similar to pretraining, except for a few modifications to the experimental protocol. Trials were divided into blocks of four, with one trial of each block randomly selected as probe trial. In probe trials the key light remained on for 120 s; key pecks were recorded, but they did not have any other programmed effect. In the other three FI trials of each block, experimental conditions were the same as during pretraining except that trials with no key pecks ended after 120 s without food delivery. Sessions ended after h or after trial 80, whichever happened first. Peak training conditions were in effect for 30 daily sessions.

Initially, an analysis of the acquisition of probe performance within each experimental condition was planned. To minimize carry-over effects between conditions, a mixed variable interval extinction schedule was instituted after the first experimental condition. The mixed schedule was intended to dissociate reinforcement from time since onset of interval-marking cue. Because subsequent acquisition data were deemed uninformative for the purpose of the study, this de-training procedure was not conducted again.

Peak training contingencies were restored in subsequent experimental conditions. Each condition was in effect for 19–31 sessions. The target FI schedule, probe trial duration, and maximum number of trials per session were varied across experimental conditions as indicated in Table 3 (Conditions 1–6). One pigeon (#115) was excluded from Conditions and due to an injury to its foot.

Table 3. Chronological order and parameters of peak training conditions during Experiment 2

Order   Target FI schedule, F (s)   Probe trial duration, P (s)   P/F   Maximum no. of daily trials
1       15                          120                           8     80
2a      30                          240                           8     80
3       30                          240                           8     68
4       60                          240                           4     41
5       15                          240                           16    61
6       15                          60                            4     80

a Preceded by mixed (VI 60 s, EXT 120 s) schedule of reinforcement.

3.1.4 Data analysis

Data were analyzed with the break point detection algorithm described in Experiment 1.

3.2 Results

3.2.1 Pretraining

All pigeons obtained all programmed reinforcers in each of the five pretraining sessions. Response rates during the last pretraining session ranged from 35 to 74 responses/min.

3.2.2 Peak training

3.2.2.1 Mean response rates

Temporal generalization gradients were similar to those of rats. From zero to 2F, they were well described by Gaussian distributions centered on F with dispersions a linear function of F (Fig. 10); past 2F, responding resurged in all conditions except one (F = 15, P = 240; Fig. 11). The skewness of the generalization gradients was not systematic. The mean terminal response rate (between 0.9P and P) was 22% of the maximum response rate, substantially lower than that found for rats.

The double Gaussian model used to describe the response rates of rats also described the performance of pigeons in the probe trials quite well (Fig. 11), even though the pigeons showed much less resurgence than the rats. Table 4 contains the parameters and Fig. 12 plots the standard deviations of the estimated Gaussian distributions against their mean. (The σP+F for F = 15, P = 240 is very large, due to the absence of response resurgence in this condition. We treated this value as exceptional, and excluded it from Fig. 12.) There was a tendency for the maximum height of the second distribution (RP+F) to increase with longer FIs and with shorter probes. Consequently, terminal probe rates (represented in Fig. 13 as percent of maximum rate) generally increased as a function of FI-to-probe duration ratio (F/P).

Unlike in Experiment 1, post-probe and post-FI performance were very similar. Nevertheless, when F = 15 s, post-probe rates were slightly lower than post-FI rates. Conversely, when F = 30 or 60 s, post-probe rates were slightly higher, at least transiently, than post-FI rates. The transition model used in Experiment 1, which took resurgence rates as their starting point, would overpredict response rates at the beginning of post-probe trials. The post-probe curves in Fig. 12 are more accurately and parsimoniously drawn by treating post-probe performance as indistinguishable from post-FI performance. This amendment to the model is laid out in Appendix A.3. Because of the lesser resurgence before the ITI marker and the greater resetting of rates after it, we infer that the ITI marker was much more effective with pigeons than with rats.

Table 4. Parameters of the double Gaussian distribution model of PI performance (Experiment 2)

F (s)   P (s)   RF (responses/s)   RF+P (responses/s)   σF (s)   σF+P (s)
15      60      1.89               0.38                 7.11     27.22
15      120     2.14               0.19                 7.22     71.44
15      240     1.22               0.10                 7.35     a
30      120     1.70               0.94                 16.07    44.18
30      240     1.64               0.30                 13.98    185.11
60      240     0.88               0.73                 23.12    191.07

a Best fit is a meaninglessly large number.

Fig. 10. Peak interval performance of pigeons in Experiment 2, from 0 s to twice the FI. Calculation and normalization of mean response rates are the same as in Fig.

Fig. 11. Response rates of pigeons in Experiment 2 as a function of time (in seconds) through a probe (left of vertical line) and post-probe trial (right of vertical line). Conventions are the same as in Fig. 5, except that the curve superimposed on post-probe data was generated from Eq. (A3).

Fig. 12. The standard deviations of the Gaussian distributions are plotted against t = F (FI) and t = P + F (the time of the post-probe reinforcer) for Experiment 2.

Fig. 13. Normalized terminal response rates (between 0.9P and P) as a function of F/P ratio.

3.2.2.2 Break point detection

The break point detection algorithm was run on the last five sessions of each condition. Mean response rates were reproduced from the algorithm using the procedure described in Experiment 1. They accounted for 83–99% of the variance in the obtained response rates (Fig. 11, left of vertical line). The distribution of tritonic response rate functions across experimental conditions is shown in Fig. 14. Longer FI schedules generated more tritonic functions than shorter FI schedules. A within-subject resampling of the probability of tritonic rate functions in each condition (5000 iterations) indicated that a mean difference equal to or larger than that obtained between conditions with F = 15 and other experimental conditions would occur, by chance, with p < .03.

The probability distributions of starts, stops (in probe trials) and resurgences (in probe trials with tritonic rate functions) are shown in Fig. 15. Compared to rats (Fig. 9), start- and stop-times were more consistent in pigeons, and responding generally started earlier in probe trials. As was the case for rats, resurgence in pigeons was heterogeneous and generally increased in probability towards the end of probe trials. Across experimental conditions, median resurgence times were well described by a proportional function k(P + F), with k fitted to each individual pigeon (Fig. 16). The median value of k was .67 and .63 for rats and pigeons, respectively. Note that these values, approximately two-thirds of the expected time to reinforcement, are also the duration of pausing that Schneider (1969) found to characterize simple FI schedules.

Fig. 14. Mean probabilities of trials with tritonic rate functions across pigeons in Experiment 2. Bars indicate standard errors.

Fig. 15. Experiment 2: probability distributions of (A) starts and (B) stops in probe trials, from 0 s to 4F. (C) Probability distribution of resurgences in tritonic trials. Note that panel C is plotted on a different scale than panels A and B. Symbols and legend correspond to those of Fig. 10.

Fig. 16. Obtained vs. predicted [k(P + F)] resurgence times in Experiment 2. Fitted k slopes are indicated for each subject.

3.3 Discussion

Pigeon data confirm many of the conclusions drawn from rat data. Rates between zero and 2F showed scalar invariance (Weber's Law), and resurgence after 2F was well described as the left tail of a Gaussian distribution centered on the time of the next reinforcer. The distribution of resurgence times supports the hypothesis that response rates track subsequent reinforcers. The ability of distant reinforcement to engender resurgence appears to be positively correlated with the duration of the training fixed interval, F (Figs. 13 and 14), and somewhat negatively influenced by the duration of probe trials, P (Fig. 13). This is consistent with Whitaker et
al.'s (2003) results from mixed FI schedules. In the most extreme situation, when F = 15 s and P = 240 s, there was no second peak.

It is conceivable that resurgence was dependent on the conditioned reinforcing value of the trial-termination signal (i.e., offset of interval-marking cue). This signal would acquire its value from its temporal proximity to incoming reinforcers. Thus, we would expect more resurgence when the interval between trial termination and the next reinforcer (F, excluding ITI) was short. Contrary to that hypothesis, the amount of resurgence increased monotonically with F. Compare, for instance, the proportion of tritonic trials across conditions with P = 240 s (Fig. 14). On average, tritonic trials are least frequent when F is 15 s, and most frequent when F is 60 s. Because of the good account given by the assumption that the second peak was centered at the forthcoming reinforcer, and because resurgence rates did not covary in the direction predicted by a delayed conditioned reinforcement account for trial termination cues, we conclude that resurgence is under the control of forthcoming reinforcers, not the trial termination cues that predict them.

The response rate of pigeons generally reset after every ITI, whether or not preceded by reinforcement. This was not the case for rats. Research on reinforcement omission in FI schedules with pigeons using ITI of similar length (16 s) has shown similar results (Papini and Hollingsworth, 1998). Papini and Hollingsworth further demonstrated that response rates increase slightly after reinforcement omission when ITIs were very short (2 s). Thus, a common process may underlie response rate transition in both rats and pigeons, with pigeons' behavior looking more like rats' when transition markers are degraded. This exponential transfer of control starts at the end of each probe trial, continues unobserved through the ITI, and manifests during the following trial. Under this hypothesis, the transfer of control was not complete for the rats in Experiment 1, whereas it was largely complete for the pigeons.

4 General discussion

The principal objective of this study was to reproduce and examine two common divergences from Gaussian timing errors in a PI procedure with long probe trials. Rats and pigeons demonstrated response resurgence under various FI and probe trial durations. The distribution of responses near target times (0–2F), however, was not always positively skewed. Consistent with prior reports, resurgence invariably took the form of a monotonic increase in responding. Resurgence times and the distribution of responding at the end of probe trials were sensitive to changes in FI and probe trial duration. The specific form of covariation of resurgence and PI parameters supports Church et al.'s (1991) hypothesis that resurgence is controlled by reinforcement following probe trial termination. The often reported positive skewness in Gaussian timing errors may be due to the overlapping of two Gaussian distributions (one for the cancelled reinforcer, one for the incoming reinforcer) in probe trials. Because of the relatively large P/F ratios used in this study, we may have minimized the overlapping of response distributions.

Peak interval performance was consistent with prior reports of behavioral control by multiple reinforcement times (Leak and Gibbon, 1995; Meck and Church, 1984; Whitaker et al., 2003). Each time appears to be tracked independently and with proportional error. It is noteworthy that such tracking subsisted even though, unlike mixed FI schedules, reinforcement after probe trials was reliably signaled by the start of a new trial. This indicates that not only were reinforcers temporally tracked, but stimuli related to those reinforcers were also tracked; their control over behavior appears to be combinatorial (Guilhardi et al., 2005).

Across experimental conditions, response resurgence never took the form of the second peak described by Kirkpatrick-Steger et al. (1996), although a small ripple in response rate was somewhat noticeable for pigeons in the F = 60 s, P = 240 s condition. The location of this ripple (centered at 3F) and the conditions under which it appeared (P/F = 4) are consistent with Kirkpatrick-Steger et al.'s data. The absence of larger second peaks may have been due to the saliency of the interval-marking cue. Unlike in Kirkpatrick-Steger et al.'s study, interval initiation was signaled by diffused and easily detectable cues: extinction of houselight for pigeons, tone onset for rats. In other respects, experimental conditions in this study and in Kirkpatrick-Steger et al.'s were very similar. The difference across studies is, therefore, consistent with the notion that neglect of reinforcement cues may be necessary to generate second peaks. Consistent with this hypothesis, it is possible that the infrequency of reinforcement or the long interval between trial initiation and reinforcement when F = 60 s, P = 240 s inhibited cue tracking and generated the ripple observed in pigeon performance. But even if non-temporal factors are necessary to generate second peaks, the dependency of second peaks on P/F = 4 suggests that those factors are not sufficient to account for such aberrant timing.

The Theory of Stochastic Counters (Killeen and Taylor, 2000) predicts that a fallible binary counter would generate resurgence and second peaks, because of the periodicity of the configuration of lower order bits. The repetition of lower order bits would be expected at a period proportional to the fixed interval (F), and response resurgence times proportional to P + F. As noted in the introduction, however, prolonged training is expected to condition responding to the value of higher order bits that indicate whether or not reinforcement in the current trial is past due. With these conditioning properties, a binary counter model with a single counter and memory modules faces similar challenges in accounting for resurgence as other pacemaker-counter models.

Within the theoretical framework of pacemaker-counter models, two temporal locations must be encoded to predict resurgence of the character we report. It must condition F and P + F. A single counter resetting at reinforcement could encode both times into a single memory matrix, which would store both reinforcer times (or more, if it were necessary). But this did not correspond to the performance of pigeons in our experiments. Resurgence was ubiquitous, but post-probe performance was not a mere projection of performance at the end of probe trials (note changes in response acceleration around the vertical lines in Figs. 5 and 11). Instead, post-probe performance appeared to be under partial (rats) or complete (pigeons) control of the FI marked by the new trial onset. Behavior under the control of P + F was overshadowed by the control of F. If the counter resets with trial-initiation stimuli, how can it ever measure and condition P + F?
This demands either a dual-counter or a dual-memory system. The architecture must be that each trial termination (or initiation) starts a timer, but only reinforcement prints to memory and stops the timer. Thus, on a post-probe trial transition, a new counter is initiated without resetting the older counter. Pigeons, in the conditions of our experiments, come quickly under the control of the new counter, rats under the joint control of old and new. Then, upon reinforcement, the score on both counters is encoded in memory as predictive of reinforcement.

The replication of Gaussian distributions around expected time to reinforcement further begs the question of their origin. Such curves are so general that they can have many origins. One of the origins is errors in a stochastic counter. Fig. 17 shows the distribution of reproductions of intervals presented to an imperfect binary counter. In timing the length of the preceding interval, some counts may be dropped, leading to a shorter-than-true estimate of the time; but in producing the interval, again some counts may be dropped. On the average these errors will cancel, to yield an unbiased estimate; but they will add variance to the estimates. For precise counters, ones with a probability of error p less than 1%, the resulting distribution of reproductions looks like a pair of back-to-back exponential distributions, centered on the correct time, and with a slightly greater standard deviation for the right tail (Killeen and Taylor, 2000). But for inaccurate counters, the distributions of reproductions take the shape of slightly peaked and skewed Gaussian distributions. A p = .90 counter was presented with a mixture of 20 or 80 inputs, and 'reinforced' at the end of each. Four thousand replications of the experiment provided the data shown in Fig. 17, which were scaled to have a maximum of. The curves through the simulated data are Gaussian. They represent the distributions of hitting times for each of the criterial counts. This display, up to the vertical dashed line, corresponds to probe performance in P/F = 4. A more precise implementation of the logic of the Theory of Stochastic Counters would generate hitting times for onset, offset, and resurgence, using those criterial counts to move into states of high or low response rate. For the present data however, that would be putting too fine a point on this modeling, which is intended merely as a proof of concept.

Theoretical models are at best sufficient: with luck they will reproduce the observed data. This is an important achievement for any model. But many other models may also reproduce the data. Then the choice is based on parsimony and ease of use. A more elegant model might treat these animals' performance as the response of control systems to random impulses at periods of F and P + F. But that is a chapter for another book.

Fig. 17. The output of a stochastic counter conditioned to 20 and 80 inputs. The curve through the data was the same as that used in the displays of probe performance of rats and pigeons (Eq. (A1)), with F as a free parameter and P = 4F. The vertical dashed line indicates P. The parameter values of the model are RF = 0.83, RP+F = 0.37, σF = 7.08, σP+F = 30.78, F = 18.08.

Acknowledgement

NSF IBN 0236821 supported this research. We thank Lewis A. Bizo for methodological advice, Weihua Chen for data collection and pre-analysis, and Heather Adams, Michelle Barker, Cindy Bazua, Kayla Cranston, Courtney Ficks, Michelle Gaza, Paul Jellison, Kweku Osafo-Acquaah, and Hyewon Rhieu for data collection. Portions of this research were presented at the 2006 International Conference on Comparative Cognition, Melbourne Beach, FL.

Appendix A

A.1 Double Gaussian model

The double Gaussian model used to fit the probe distributions is the sum of two Gaussian kernels:

R_t = R_F \, e^{-\left(\frac{t-F}{\sqrt{2}\,\sigma_F}\right)^2} + R_{P+F} \, e^{-\left(\frac{t-(P+F)}{\sqrt{2}\,\sigma_{P+F}}\right)^2}, \quad 0 ... (A1)

... P, such that

R_{\text{post-ITI}} = R_{P+F} \, e^{-\lambda(t-P)} + R_F \left(1 - e^{-\lambda(t-P)}\right), \quad (A2)

P ... from zero to F:
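The two appendix equations can be sketched numerically as follows. This is a minimal illustration, not the authors' fitting code. The function names are hypothetical, the exponents in Eq. (A1) are assumed to be squared (as a Gaussian kernel requires), R_F in Eq. (A2) is treated as a constant rate exactly as the equation is printed, and the parameter values are one reading of the first row of Table 2 (F = 15 s, P = 60 s, no lever retraction), with λ taken as .082 on the assumption that the leading decimal point was lost.

```python
import math


def double_gaussian(t, F, P, R_F, R_PF, sigma_F, sigma_PF):
    """Eq. (A1): sum of two Gaussian kernels, centered on F and on P + F."""
    first = R_F * math.exp(-((t - F) / (math.sqrt(2) * sigma_F)) ** 2)
    second = R_PF * math.exp(-((t - (P + F)) / (math.sqrt(2) * sigma_PF)) ** 2)
    return first + second


def transition(t, P, R_F, R_PF, lam):
    """Eq. (A2) as printed: control shifts exponentially from R_PF to R_F."""
    w = math.exp(-lam * (t - P))  # weight remaining on the resurgence rate
    return R_PF * w + R_F * (1.0 - w)


# Assumed reading of Table 2, first row (hypothetical variable names).
params = dict(F=15, P=60, R_F=1.11, R_PF=0.51, sigma_F=7.70, sigma_PF=25.36)
lam = 0.082

# Predicted probe-trial rates at a few time points from 0 s to P.
probe_rates = [double_gaussian(t, **params) for t in range(0, 61, 15)]

# Predicted rates after the probe, starting at t = P.
post_rates = [transition(60 + dt, 60, 1.11, 0.51, lam) for dt in (0, 15, 30)]
```

Under these assumptions the transition function starts at R_PF when t = P and approaches R_F as t grows, which matches the verbal description of resurged responding giving way to typical FI responding at rate λ.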
