Running head: DISCOUNTING

Models of trace decay, eligibility for reinforcement, and delay of reinforcement gradients, from exponential to hyperboloid

Peter R. Killeen
Arizona State University

Prepublication preprint. Killeen, P. R. (2011). Models of trace decay, eligibility for reinforcement, and delay of reinforcement gradients, from exponential to hyperboloid. Behavioural Processes, 87, 57–63. DOI: 10.1016/j.beproc.2010.12.016

Abstract

Behavior such as depression of a lever or perception of a stimulus may be strengthened by consequent behaviorally significant events (BSEs), such as reinforcers. This is the Law of Effect. As time passes since its emission, the ability of the behavior to be reinforced decreases. This is trace decay. It is upon decayed traces that subsequent BSEs operate. If the trace comes from a response, it constitutes primary reinforcement; if from perception of an extended stimulus, it is classical conditioning. This paper develops simple models of these processes. It premises exponentially decaying traces related to the richness of the environment, and conditioned reinforcement as the average of such traces over the extended stimulus, yielding an almost-hyperbolic function of duration. The models account for some data, and reinforce the theories of other analysts by providing a sufficient account of the provenance of these effects. They lead to a linear relation between sooner and later isopreference delays, whose slope depends on sensitivity to reinforcement and whose intercept depends on that and on the steepness of the delay gradient. Unlike human prospective judgments, all control is vested in either primary or secondary reinforcement processes; therefore the use of the term discounting, appropriate for humans, may be less descriptive of the behavior of nonverbal organisms.

Keywords: delay of reinforcement gradients, discounting, forced-choice paradigms, magnitude effect, matching paradigms, reinforcement learning, trace decay gradients

Introduction

Pigeons cannot reliably count beyond small numbers (Brannon et al., 2001; Nickerson, 2009; Uttal, 2008), have short time horizons (Shettleworth and Plowright, 1989), may be stuck in time (Roberts and Feeney, 2009), do not ask for the answers to the questions they are about to be asked (Roberts et al., 2009), and fail to negotiate an amount of reinforcement commensurate with the work that they are about to undertake (Reilly et al., 2011). How do such simple creatures discount future payoffs as a function of their delay? It is the thesis of this paper that they do not; that the orderly data in such studies are the simple result of the dilution of the conditioned reinforcers which support and guide that choice, as a function of the delay to the outcome that they signal.

Classic and generally accepted concepts of causality preclude events from acting backward in time. Then what sense do we make of Fig. 1, a familiar rendition of the control exerted by delayed reinforcers? How do the animals know what's coming? Only three accounts come to mind: (a) precognition, but causality rules that out; (b) it is memory of a past choice that makes contact with reinforcement, and the figure should be reversed; or (c) the animals have learned what leads to what. There follows an extended argument that (b) and (c) are both true, and that in novel contexts, (b) typically leads to (c).

>> Please place Fig. 1 hereabouts

The trace of an event is premised to decay exponentially with the time d since its occurrence:

$a(d) = e^{-\lambda d}, \quad \lambda > 0$,    (1)

which is the point association of an event at d seconds' remove from the BSE, as seen in Fig. 2.

>> Please place Fig. 2 hereabouts

>> Please place Fig. 3 hereabouts
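The abstract's claim that averaging such exponentially decaying traces over an extended stimulus yields an almost-hyperbolic function of duration can be seen in a short derivation. What follows is a minimal sketch, assuming uniform weighting over the stimulus interval [0, d] and writing \bar{a}(d) for the average of the trace in Eq. (1); the symbols are this sketch's own, not necessarily the paper's notation:

```latex
% Average of the exponential trace of Eq. (1), a(t) = e^{-\lambda t},
% over a stimulus lasting d seconds (uniform weighting assumed):
\bar{a}(d) = \frac{1}{d} \int_{0}^{d} e^{-\lambda t}\, dt
           = \frac{1 - e^{-\lambda d}}{\lambda d}

% Why "almost hyperbolic": with x = \lambda d, the expansions
%   (1 - e^{-x})/x = 1 - x/2 + x^2/6 - \dots
%   1/(1 + x/2)    = 1 - x/2 + x^2/4 - \dots
% agree through first order, so for moderate delays
%   \bar{a}(d) \approx \frac{1}{1 + k d}, \quad k = \lambda / 2,
% the familiar hyperbolic delay-of-reinforcement gradient.
```

On this reading, a hyperboloid delay gradient in nonverbal subjects need not reflect prospective discounting of future payoffs; it can fall out of exponential trace decay averaged over the stimulus that signals the delayed outcome, which is the argument the paper develops.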