A theory of behaviour on progressive ratio schedules, with applications in behavioural pharmacology


C. M. Bradshaw & P. R. Killeen

Psychopharmacology, ISSN 0033-3158, DOI 10.1007/s00213-012-2771-4
REVIEW
Received: 16 March 2012 / Accepted: June 2012
© Springer-Verlag 2012

The order of authorship is alphabetical. Correspondence may be addressed to either author.

Electronic supplementary material: The online version of this article (doi:10.1007/s00213-012-2771-4) contains supplementary material, which is available to authorized users.

C. M. Bradshaw, Psychopharmacology Section, Division of Psychiatry, University of Nottingham, B109 Medical School, University of Nottingham, Nottingham NG7 2UH, UK; e-mail: c.m.bradshaw@nottingham.ac.uk
P. R. Killeen, Department of Psychology, Arizona State University, Tempe, AZ 85287-1104, USA; e-mail: Killeen@asu.edu

Abstract

Rationale: Mathematical principles of reinforcement (MPR) provide the theoretical basis for a family of models of schedule-controlled behaviour. A model of fixed-ratio schedule performance that was applied to behaviour on progressive ratio (PR) schedules showed systematic departures from the data.

Objective: This study aims to derive a new model from MPR that will account for overall and running response rates in the component ratios of PR schedules, and their decline toward 0, the breakpoint.

Method: The role of pausing is represented in a real-time model containing four parameters: T_0 and k are the intercept and slope of the linear relation between post-reinforcement pause duration and the prior inter-reinforcer interval; a (specific activation) measures the incentive value of the reinforcer; δ (response time) sets biomechanical limits on response rate. Running rate is predicted to decrease with negative acceleration as ratio size increments, overall rate to increase and then decrease. Differences due to type of progression are explained as hysteresis in the control by reinforcement rates. Re-analysis of extant data focuses on the effects of acute treatment with antipsychotic drugs, lesions of the nucleus accumbens core, and destruction of orexinergic neurones of the lateral hypothalamus.

Results: The new model resolves some anomalies evident in earlier analyses, and provides new insights into the results of these interventions.

Conclusions: Because they can render biologically relevant parameters, mathematical models can provide greater power in interpreting the effects of interventions on the processes underlying schedule-controlled behaviour than is possible for first-order data such as the breakpoint.

Keywords: Progressive ratio schedule · Mathematical principles of reinforcement · Mathematical model · Linear waiting · Hysteresis · Reinforcer magnitude · Antipsychotics · Nucleus accumbens core · Lesion · Orexinergic neurones

Introduction

Ratio schedules of reinforcement specify the number of responses that a subject must emit in order to obtain a reinforcer. In fixed ratio (FR) schedules, this number is an unchanging feature of the schedule, whereas in variable ratio schedules, it changes unpredictably from one reinforcer to the next (Ferster and Skinner 1957). In progressive ratio (PR) schedules, the required number of responses is systematically increased, typically from one
reinforcer to the next (Hodos 1961), but sometimes between sessions (Czachowski and Samson 1999) or according to some other schedule (e.g. Li et al. 2003; Richardson and Roberts 1996; Stafford et al. 1998). Responding on PR schedules is generally found to be well maintained under lower ratios; however, the rate of responding declines with progressive increases in the ratio requirement, until, eventually, responding ceases altogether. The ratio at which the subject stops responding is known as the breakpoint or breaking point (Hodos 1961; Hodos and Kalman 1963).

PR schedules have found favour among behavioural neuroscientists interested in the biological basis of motivation and reward processes because of the prima facie relationship between the breakpoint and (a) the subject's motivational state (Barr and Philips 1999; Bowman and Brown 1998; Ferguson and Paule 1997), and (b) the incentive value (Cheeta et al. 1995; Hodos 1961) and magnitude (Covarrubias and Aparicio 2008; Ferguson and Paule 1997; Rickard et al. 2009; Skjoldager et al. 1993) of the reinforcer.

It is becoming increasingly apparent, however, that the uncritical use of the breakpoint as an index of motivation or reinforcer value can no longer be justified. The specificity of the breakpoint as a motivational index is called into question by its sensitivity to ostensibly non-motivational manipulations such as changes in the response requirement (Aberman et al. 1998; Skjoldager et al. 1993) and the ratio step size (Covarrubias and Aparicio 2008). It has also been noted that the breakpoint is an intrinsically unreliable measure, being derived from a single time point during an experimental session, data from the rest of the session being ignored (Arnold and Roberts 1997; Killeen et al. 2009). Moreover, the definition of the breakpoint is arbitrary, there being no general consensus as to the period of time that must elapse without a response occurring before the subject may be
said to have truly stopped responding (Rickard et al. 2009).

The ambiguities inherent in the breakpoint may be circumvented by quantitative analyses that take into account the response rate in each component ratio of the schedule. Models derived from the mathematical principles of reinforcement (MPR; Killeen 1994) provide a theoretical basis for such analyses.

Mathematical principles of reinforcement

MPR is a theoretical account of the way in which reinforcers exert control over operant behaviour. The theory is founded on three fundamental principles: (1) reinforcers activate behaviour; (2) the rate at which organisms can emit operant responses is limited by biological constraints; and (3) the contingencies specified by reinforcement schedules determine the 'coupling' of reinforcers to operant responses and to discriminative stimuli. The characteristic patterns of free-operant responding maintained by classical reinforcement schedules (e.g. Ferster and Skinner 1957) derive from the operation of these three principles.

Principles are akin to strategies; models to tactics. Models are expedients that may be revised as better executions emerge. The above principles of MPR provide the theoretical substrate for a family of models of performance maintained under ratio and interval schedules of reinforcement. The models and their parameters translate the data in terms of those principles.

The first principle of reinforcement is that incentives empower behaviour (Killeen 1998). Its first implementation in a model was the simplest possible: A = ar, where A is arousal, the behavioural manifestation of incitement; r is rate of reinforcement; and a is a motivational parameter called specific activation. Although simple, it is not ad hoc, as it was derived from prior research on the cumulation of arousal (Killeen 1979; Killeen et al. 1978). The specific activation parameter expresses the duration of activation induced by the delivery of a single reinforcer. It is the primary motivational parameter,
being affected by deprivation, incentive motivation, and pharmacological intervention.

Actual manifestation of that excitement is curtailed under natural ceilings on response rate. That is the second principle. It is instantiated in a model holding that response rate is proportional to the time left available for responding. The constraints on responding are summarised by the response time parameter delta, δ, which defines the minimum time that must elapse between the initiation of two successive responses. Delta depends on the nature of the manipulandum and on the organism's dexterity in operating it. This realisation yields a version of Herrnstein's 'hyperbolic' matching law (Herrnstein 1970, 1974; Herrnstein et al. 1997), manifest in Eq. 1 (see below).

The association of responses to reinforcers is summarised by a coupling coefficient, C, which is specific for any particular schedule. C, which ranges from 0 to 1, is the degree of association between responses and reinforcement; it is derived from the weight beta, β, given to the most recent response in the reinforcement process (0 ≤ β ≤ 1). Beta is called the currency parameter, because if it takes a value of 1, all of the weight of reinforcement is focused on the particular response that immediately preceded reinforcement. For smaller values, the impact of reinforcement is spread out over responses before the last one in what is traditionally called the delay of reinforcement gradient. The currency parameter is therefore a measure of the steepness of the gradient. It is identical to the hypothetical decay of 'eligibility traces' as used in reinforcement learning models (e.g. Sutton and Barto 1990).

The coupling coefficient tells how much of the credit for a reinforcer is assigned to a target response class. Since responses are spread out in time relative to the reinforcer, the coefficient is proportional to the area under the delay of reinforcement gradient. In ratio schedules, the reinforcement contingencies make the responses occurring
proximate to reinforcement predominately target responses, ones that are counted by the experimenter, and the calculation of coupling is straightforward, given in the next section. On other reinforcement schedules, other unmeasured responses capture some of the reinforcement strength, and different formulas are necessary to compute the coupling, as shown in Killeen (1994).

Combining the three principles gives the basic predictive equation (from Killeen 1998, Eq. 6b):

$$R = \frac{C \cdot a r}{\delta\,(1 + a r)} \qquad (1)$$

with δ > 0 (for typical responses, between 0.2 and 0.4 s). If we define T as the average inter-reinforcer interval, T = 1/r, then for FR schedules:

$$R = \frac{C_{\mathrm{FR}_N}}{\delta\,(1 + T/a)} \qquad (2)$$

The next section briefly describes MPR's account of performance on FR schedules, and the section after it the application of MPR to performance on PR schedules.

The FR model (Bizo and Killeen 1997)

Bizo and Killeen (1997) developed Eq. 2 for FR schedules, where rate of reinforcement itself depends on rate of responding, R, and ratio size, N: T = N/R. Fortunately, in this case, the positive feedback loop resolved to a simple equation:

$$R = \frac{C_{\mathrm{FR}_N}}{\delta} - \frac{N}{a} \qquad (3)$$

As the ratio requirement N increases, the curve defined by Eq. 3 rises to a peak before falling linearly to zero (Fig. 1, upper left panel). It rises because, on ratio schedules, more and more target responses are strengthened by the reinforcer as N increases. But early responses, remote from the reinforcer, are strengthened less than those most proximate to it. The falloff of strength with distance is assumed to be exponential, or, in the case of discrete responses, geometric, with rate of falloff in that case determined by beta (β), yielding the coupling coefficient for FR schedules as C_FRN = 1 − (1 − β)^N. When β = 1, coupling is tightly focused on the last response, and Eq. 3 resolves to a simple inverse linear relation between R and N. The value of a governs the slope of the descending limb of the function (slope = −1/a); the value of δ determines the intersection of the
(extrapolated) descending limb with the ordinate (intercept = 1/δ). The intercept with the x-axis, historically called the extinction ratio, and subsequently the breakpoint, is given by a/δ. As N increases, coupling increases; at the same time, however, reinforcement rate is decreasing with N, and that takes its toll at larger values of N, bending the function down toward 0 at the extinction ratio.

Fig. 1 Theoretical response rate functions; ordinates, response rate, R; abscissae, response/reinforcer ratio, N. The upper right graph shows the progressive ratio (PR) model (Eq. 5, running response rate, R_RUN; Eq. 7, overall response rate, R_OVERALL). For comparison, the upper left graph shows the fixed ratio (FR) model (Eq. 3). The FR model specifies a linear decline of response rate from its peak towards zero; 1/δ is the (extrapolated) ordinate intercept, −1/a defines the slope, and the breakpoint is predicted by a/δ. The locus of the peak is defined by β; when β = 1, the function resolves to a straight line extending from 1/δ to a/δ. Note that in contrast to the FR model, the PR model defines different curves for R_RUN and R_OVERALL, and that response rate declines in a curvilinear fashion towards zero. The middle and lower graphs show the effects of changes in the four parameters of the PR model on the curves defined by Eqs. 5 and 7; continuous lines show 'baseline' functions and broken lines the effect of the changed value of each parameter. An increase in the minimum post-reinforcement pause, T_0, reduces R_RUN, the effect being mainly confined to lower values of N. An increase in the slope of the linear waiting function, k, results in an increase of the proportion of the inter-reinforcer interval devoted to post-reinforcement pausing; the reduction of R_OVERALL occurs at all values of N. A reduction of specific activation, a, is reflected in steepened decline of both response rate functions. An increase in response time, δ, produces a parallel downward displacement of both curves (see text for further explanation).

Equation 3 has proved to be a robust descriptor of performance on FR schedules (Bizo and Killeen 1997; Killeen and Sitomer 2003; Reilly 2003). It also provides a passable description of performance on PR schedules (Bezzina et al. 2008; Covarrubias and Aparicio 2008; den Boon et al. 2012; Ho et al. 2003; Kheramin et al. 2005; Olarte Sánchez et al. 2012a, b; Zhang et al. 2005a, b). The parameters a and δ have been shown to be differentially sensitive to various brain lesions and acute treatment with different classes of psychoactive drug, offering encouragement to those who would use this approach to analyse PR schedule performance as a means of disentangling the effects of neuropharmacological interventions on motivational and motor processes (see Olarte Sánchez et al. 2012a, b; Zhang et al. 2005a). In some cases, it has been found that analysis based on Eq. 3 can reveal complex effects of interventions that are hidden in the breakpoint. For example, atypical antipsychotics (e.g. clozapine, olanzapine) have been found to increase both a and δ; however, because these effects exert opposing influences on the breakpoint, that measure is often unaffected by these drugs (den Boon et al. 2012; Olarte Sánchez et al. 2012a).

On the other hand, there are significant difficulties with the application of Eq. 3 to PR schedule performance. A growing body of evidence indicates that empirical response rate curves deviate systematically from Eq. 3 (Killeen et al. 2009; Rickard et al. 2009). In the next section, the limitations of Eq. 3 as a descriptor of PR schedule performance are discussed, and solutions offered.

The PR schedule

PR schedules allow the effect of an intervention on behaviour to be assessed within a single session. This is preferable to testing the effect of the intervention on a series of separate FR schedules presented in different phases of an experiment, not only because of the saving of experimental time and effort, but also because the shorter
protocol minimises the contaminating influence of instability of the effects of interventions over time.

Although the FR model provides a fair description of performance on PR schedules, it fails to capture some characteristic features of performance on these schedules. For example, Eq. 3 specifies a linear decline in response rate as a function of increasing ratio requirement, whereas a number of studies have found marked concavity of the declining limb of the response rate curve (e.g. Bezzina et al. 2008; Killeen et al. 2009; Olarte Sánchez et al. 2012a, b; Rickard et al. 2009; Zhang et al. 2005a, b). Rickard et al. (2009) found that while overall response rate on PR schedules was well described by the bitonic function of Eq. 3, running response rate declined convexly, not linearly.

As we now understand, the problem is that in the original statement of MPR (Killeen 1994), Eq was applied to overall response rate (R_OVERALL), i.e. response rate calculated by dividing N by the inter-reinforcement interval, T. This works in steady-state scenarios, such as FR, but is the nub of the problem in dynamically changing ones, such as PR, especially when response rates are non-homogeneous, that is, when they consist of low rates (during the post-reinforcement pause, of duration T_P) mixed with high rates (the running rate, R_RUN, throughout the rest of the ratio).

The impact of the post-reinforcement pause on overall response rates may be incorporated into the PR model by invoking the linear waiting principle (Wynne et al. 1996), which expresses the robust empirical finding that the post-reinforcement pause on trial i, T_P,i, is a linear function of the total inter-reinforcement interval on the prior trial, T_TOT,i−1:

$$T_{P,i} = T_0 + k\,T_{\mathrm{TOT},i-1} \qquad (4)$$

where T_0 is an initial pause due to post-prandial activity or lassitude, and k is the slope of the linear waiting function (Schneider 1969).

With Eq. 4 in place, we can construct a model of PR performance. Running rate is given by Eq. 2. Because on PR schedules, animals are
exposed to long ratio components, we assume that coupling is quickly driven toward its maximum, and economise on free parameters by setting C_PR equal to 1.0. Then:

$$R_{\mathrm{RUN},i} = \frac{1}{\delta\,(1 + T_{\mathrm{TOT},i-1}/a)} \qquad (5)$$

Equation 5 comes from Eq. 2 by setting the coupling coefficient to 1.0, and assuming that run rates depend on the time between reinforcers in the prior component (see Appendix 1A). The time between reinforcers is the sum of the pause time and time to complete the ratio once responding has commenced (the run time, T_RUN):

$$T_{\mathrm{TOT},i} = T_{P,i} + T_{\mathrm{RUN}} \qquad (6a)$$

Pause time is given by linear waiting (Eq. 4), and run time by the number of responses required, N_i, divided by the run rate, R_RUN,i. Thus,

$$T_{\mathrm{TOT},i} = \overbrace{T_0 + k\,T_{\mathrm{TOT},i-1}}^{\text{Pause time}} + \overbrace{\frac{N_i}{R_{\mathrm{RUN},i}}}^{\text{Run time}} \qquad (6b)$$

Substituting the definition of R_RUN from Eq. 5 into Eq. 6b gives:

$$T_{\mathrm{TOT},i} = T_0 + k\,T_{\mathrm{TOT},i-1} + N_i\,\delta\,(1 + T_{\mathrm{TOT},i-1}/a) \qquad (6c)$$

Finally, to predict overall response rate, divide the response requirement by the predicted duration of the component:

$$R_{\mathrm{OVERALL},i} = N_i / T_{\mathrm{TOT},i} \qquad (7)$$

Equations 5 and 7 are the key predictions. The curves defined by these equations are illustrated in the upper right panel of Fig. 1; the effects of changes in each of the four parameters are shown in the middle and lower panels. A flowchart for computing these values is in the Appendix.

Note that all of the computations involve predicted quantities, and are not drawn from the data. Note also how this computed version of MPR for PR schedules draws a different prediction than Eq. 3 for FR schedules (cf. Fig. 1, upper left panel). Now, rather than a linear descent to the x-intercept, it draws a curvilinear decrease toward asymptote. The curvilinearity arises from the key difference between these models: respect for the dependence of behaviour on the recently experienced conditions (the prior ratio, and its dependence on the prior ratio …), which are different from the current conditions. By using this
real-time computed model, MPR can automatically adapt to any sequence of component lengths in the progression, N_i. If all values of N are the same, that is, if it is an FR schedule, these computations reduce to those of FR schedules. If they deviate only a little, as in an arithmetic progression with a small step size, the FR model works fine. But for the commonly used exponential progressions, the behaviour on large ratios is sustained by the recent history of smaller ratios and thus biased upward, sustained above the level that could be maintained at that ratio were it to be presented in a regular FR schedule. For practical use, the real-time computation, described by the flowchart in the Appendix and embodied in the archived spreadsheet, is equally effective, whether dealing with FR or PR schedules.

Equations 5 and 7 constitute a coherent model of performance on PR schedules founded on MPR. The following section presents a re-analysis of some recent observations on PR schedule performance based on this model.

Applications

Reinforcer magnitude (Rickard et al. 2009)

Rickard et al. (2009) examined the effect of manipulating reinforcer magnitude on PR schedule performance. Fifteen rats were trained under a PR schedule based on the exponential progression described by Roberts and Richardson (1992). Seven volumes of a 0.6 M sucrose solution (6, 12, 25, 50, 100, 200, and 300 μl) were used as the reinforcer in different phases of the experiment, each phase lasting for at least 30 sessions. In the present re-analysis, Eqs. 5 and 7 were fitted to the overall and running response rate data obtained from each rat, averaged over the last 10 sessions of each phase. Figure shows the group mean data; open symbols indicate the running response rates, R_RUN, and filled symbols the overall response rates, R_OVERALL. R_RUN declines monotonically towards zero, whereas R_OVERALL rises to a peak before descending towards zero. The descent of both R_RUN and R_OVERALL towards zero displays marked curvature,
consistent with the PR model. The effect of the magnitude of reinforcement is clearly evident in the rate of decay towards zero, larger volumes being associated with more gradual decline, implicating changes in a with changes in incentive motivation, as expected. Interestingly, although the height of the peak of the R_OVERALL curve appears to be inversely related to reinforcer volume, the intercept of the R_RUN curve with the ordinate seems to be unrelated to reinforcer volume. The goodness of fit of the model was r² = 0.991 (Eq. 5, r² = 0.991; Eq. 7, r² = 0.976). Note that the same parameter values are used for those matched predictions.

Figure shows the values of the parameters (mean ± SEM) derived from the fits of the model to the data from the individual rats. There was a significant effect of reinforcer volume on the specific activation parameter, a [F(6,84) = 29.4, p
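
The recursion defined by Eqs. 4 to 7 can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' Appendix flowchart or archived spreadsheet: the function name `simulate_pr`, the `Component` record, the starting value of the prior inter-reinforcer interval (`t_prev0`, taken as 0 here), the ratio sequence, and all parameter values are assumptions chosen for illustration.

```python
from typing import List, NamedTuple, Sequence


class Component(NamedTuple):
    """Predicted values for one ratio component (hypothetical record type)."""
    n: int            # response requirement N_i
    r_run: float      # running response rate, Eq. 5
    r_overall: float  # overall response rate, Eq. 7
    t_total: float    # predicted inter-reinforcer interval T_TOT,i, Eq. 6b


def simulate_pr(ratios: Sequence[int], a: float, delta: float,
                t0: float, k: float, t_prev0: float = 0.0) -> List[Component]:
    """Iterate Eqs. 4-7 over a sequence of ratio requirements.

    a      specific activation (s)
    delta  minimum response time (s)
    t0, k  intercept and slope of the linear waiting function (Eq. 4)
    t_prev0 is an ASSUMED starting value for T_TOT before the first ratio.
    """
    if a <= 0 or delta <= 0:
        raise ValueError("a and delta must be positive")
    out: List[Component] = []
    t_prev = t_prev0
    for n in ratios:
        r_run = 1.0 / (delta * (1.0 + t_prev / a))  # Eq. 5, coupling set to 1
        t_pause = t0 + k * t_prev                   # Eq. 4, linear waiting
        t_total = t_pause + n / r_run               # Eq. 6b, pause + run time
        r_overall = n / t_total                     # Eq. 7
        out.append(Component(n, r_run, r_overall, t_total))
        t_prev = t_total                            # history carried forward
    return out


if __name__ == "__main__":
    # Illustrative ratio sequence and parameter values (not taken from the data).
    demo_ratios = [1, 2, 4, 6, 9, 12, 15, 20, 25, 32, 40, 50, 62, 77, 95]
    for c in simulate_pr(demo_ratios, a=100.0, delta=0.3, t0=2.0, k=0.2):
        print(f"N={c.n:3d}  R_RUN={c.r_run:.3f}  R_OVERALL={c.r_overall:.3f}")
```

Because each component uses the interval predicted for the previous component, the loop carries the history forward in the way the text describes. As a consistency check under these assumptions, setting T_0 = 0 and k = 0 and repeating a constant N makes Eq. 6c settle at T = Nδ / (1 − Nδ/a), so that N/T = 1/δ − N/a, which is Eq. 3 with the coupling coefficient equal to 1.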
