Preference Reversal, 1981
Ainslie, G., and Herrnstein, R. J. (1981). Preference reversal and delayed reinforcement. Animal Learning and Behavior, 9(4), 476-482.

 


 

Excerpt from

Preference reversal and delayed reinforcement

 

George Ainslie
Massachusetts Mental Health Center
Boston, Massachusetts 02115

and

R. J. Herrnstein
Harvard University

Cambridge, Massachusetts 02138


Published in Animal Learning and Behavior, 1981, 9(4), 476-482.

 

From pages 477-478.


Method

Subjects

Six male White Carneaux pigeons, experienced in a number of other operant procedures, were maintained at 80% of their free-feeding weights.


Procedure

Each subject was given 40-45 trials daily, each of which lasted 60 sec regardless of the subject's behavior. At the beginning of each trial, the left key was green and the right key, white. A peck on the left (green) key darkened both keys and raised the food hopper D sec later. The hopper remained raised for 2 sec after the subject's head entered the feeder opening. The keys then remained dark until the beginning of the next trial, 60 sec from the trial's onset. A peck on the right (white) key darkened both keys and raised the food hopper D + 4 sec later. The hopper remained raised for 4 sec after the subject's head entered the feeder opening, whether or not the bird kept his head continuously in the opening. The keys then remained dark until the beginning of the next trial.
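The trial contingencies can be restated in a brief sketch (Python is used here only for compactness; this is an illustration of the rules described above, not the authors' control program, and the function and argument names are hypothetical):

    def run_trial(choice, D):
        """One 60-sec trial. choice is 'left' (green key), 'right' (white key),
        or None if no key was pecked; D is the programmed delay in seconds.
        Returns (delay to food in sec, hopper access in sec), or None."""
        if choice is None:
            return None                     # keys darken and the trial simply times out
        if choice == 'left':
            return (D, 2.0)                 # small-early: food D sec later, 2-sec access
        if choice == 'right':
            return (D + 4.0, 4.0)           # large-late: food D + 4 sec later, 4-sec access
        raise ValueError(choice)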

If no peck occurred within 10 sec of the beginning of a trial, the keys darkened and remained dark until the beginning of the next trial. The experimental session ended when the subject had pecked on 40 trials or when 45 trials had taken place, whichever happened first.
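A companion sketch of the session-level rules (again illustrative; run_trial is the sketch above, and choose_key is a hypothetical stand-in for the bird's behavior, which the procedure does not, of course, program):

    def run_session(choose_key, D, max_pecked=40, max_trials=45):
        """Run trials until the subject has pecked on 40 trials or 45 trials
        have taken place, whichever happens first."""
        pecked, total = 0, 0
        while pecked < max_pecked and total < max_trials:
            outcome = run_trial(choose_key(D), D)   # choose_key returns 'left', 'right', or None
            if outcome is not None:
                pecked += 1
            total += 1
        return pecked, total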

The value of D was initially set at .01 sec for all subjects, making the smaller reinforcer available virtually immediately after the appropriate peck. For Subjects 1, 2, and 3, D was increased from .01 sec to 2, 4, 6, 8, and 12 sec, then returned to .01 sec; for Subjects 4, 5, and 6, D went from .01 to 12 sec and was then decreased to 8, 6, 4, 2, and .01 sec. Although stable preferences usually formed within the first few sessions of a new contingency, D was not changed until no subject was pecking outside the range of rates it had emitted on previous trials with the same value of D, and no subject seemed to be moving toward the edge of that range. The experiment ran for 320 sessions, extending over a period of 11 months, and was divided roughly equally among the seven conditions.
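Written out explicitly, the two condition orders for D (in seconds) were as follows; the names are labels for this summary only, not terms used by the authors:

    ASCENDING_GROUP  = [0.01, 2, 4, 6, 8, 12, 0.01]    # Subjects 1, 2, and 3
    DESCENDING_GROUP = [0.01, 12, 8, 6, 4, 2, 0.01]    # Subjects 4, 5, and 6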

Results

Data from the last 10 sessions in each condition were analyzed.

All subjects initially preferred the smaller reinforcer at .01 sec rather than the larger reinforcer at 4 sec (Figure 2). As the delay, D, was increased, all subjects reversed preference, choosing the large-late reinforcer on most or all trials. When D was decreased to .01 sec again, four subjects again chose the small-early reinforcer on virtually all trials; two subjects (2 and 5) continued to choose the large-late reinforcer on all trials. Figure 2 summarizes the main findings for individual subjects. The ordinate gives the large-late reinforcers as a proportion of all reinforcers; the abscissa gives the two delays at each condition, which differ by 4 sec, plotted in the order in which they were imposed. Somewhere between .01 and 8 sec, all subjects reversed preference. Lines through the data points show ± 1 standard deviation (estimated population value) around the mean of the individual sessions. Where no variability lines are shown, the standard deviation was zero or virtually zero.
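One conventional way to see why a crossover of this kind is expected is a simple hyperbolic delay-discounting calculation. The sketch below is purely illustrative: the discount form V = A / (1 + kD) and the value k = 1.0 per second are assumptions made for the example, not estimates from these data, and the amounts are simply the 2- and 4-sec hopper durations:

    def value(amount, delay, k=1.0):
        # hyperbolic discounting: value falls off as 1 / (1 + k * delay); k is assumed
        return amount / (1.0 + k * delay)

    for D in [0.01, 2, 4, 6, 8, 12]:
        small_early = value(2.0, D)           # 2-sec reinforcer after D sec
        large_late  = value(4.0, D + 4.0)     # 4-sec reinforcer after D + 4 sec
        print(D, "small-early" if small_early > large_late else "large-late")

With these assumed parameters the preferred option switches from small-early to large-late between D = 2 and D = 4 sec, the same qualitative pattern reported above, although the actual crossover point differed across birds.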