Behavior 101 #2: Simple Schedules of Reinforcement, Extinction

Hopefully by now you’ve had some practice identifying how the four quadrants of reinforcement and punishment play an integral part in your everyday life. If you need a refresher, feel free to review HERE. Last time, we talked briefly about how a reinforcer may hold great value for one individual but no value at all for another. Choosing effective reinforcers is a must for any behavior change intervention. Often a behavior analyst will do what is known as a preference assessment along with the initial assessment. A preference assessment involves repeated presentation of various potential reinforcers, with data taken on how much the individual interacts with each item and whether that item is freely chosen over other available items. This isn’t done with just people or dogs; research in the animal behavior world has even involved preference assessments of Galapagos turtles (for those interested- the preference assessment showed that one turtle preferred having his shell rubbed, while another turtle in the study preferred being sprayed with the hose).
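For the data-minded, here is a rough sketch of how the tallying side of a preference assessment might look in Python. The trial data and the function name `rank_preferences` are made up for illustration; a real assessment would also track how long the individual interacts with each item.

```python
from collections import Counter

def rank_preferences(choices):
    """Rank items by how often they were freely chosen, most chosen first."""
    return Counter(choices).most_common()

# Hypothetical record of which item was chosen on each of five trials.
trials = ["shell rub", "hose spray", "shell rub", "shell rub", "hose spray"]
print(rank_preferences(trials))  # [('shell rub', 3), ('hose spray', 2)]
```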

Generally, things like food/edibles, shelter, water, and sexual stimulation all fall under the category of ‘primary’ reinforcers. This means that these things are genetically hardwired into most of us to act as reinforcers. This isn’t to say that pizza or candy will be a reinforcer for everyone, but with a few extreme exceptions, I haven’t met too many people who don’t like to eat at least every so often (I have, however, worked with kids who don’t eat at all. Period. So just because something is a primary reinforcer doesn’t mean it will act as one for a given individual).

Most other reinforcers fall into the category of secondary reinforcers (or conditioned reinforcers). Secondary reinforcers are things that, by themselves, have no reinforcing power, but through the use of pairing, have become reinforcers. Can you think of the most commonly used conditioned reinforcer? We use it every day. I’ll give you a hint- it’s green (at least, in the US it is!). If you guessed money, then you were correct! Money is a very powerful secondary reinforcer, because it gives access to a nearly unlimited range of other conditioned or primary reinforcers (it’s also a generalized reinforcer, but we’ll talk about generalization later). If you hadn’t been taught at some point that money is a means of buying things you want and/or need, then all money would be to you is a piece of paper. Hand money to a 3 month old. Does it have any value to them? Other than maybe an interesting texture (or taste?), not much. You probably won’t have a very effective intervention if you reward a baby with a dollar bill every time he makes eye contact.

Reinforcers can be conditioned by a process called pairing. Pairing involves simultaneous presentation of the primary reinforcer with the one to be conditioned. Most clicker trainers will be familiar with this process from their early stages of working with a new dog. A click noise by itself holds no value to a dog that has never heard it before, or has never had it paired. The click itself is not reinforcing. What clicker-trained dogs have been taught to understand is that the click means a food reward is coming. Initially, when you are starting out, you must pair the food (or ball, or whatever primary reinforcer you’re using) with the sound of the click. This is usually achieved by firing the clicker in rapid succession with the presentation of the primary reinforcer, which is also called ‘loading’ the clicker. What it does is teach the dog that the sound equals a reward. It pairs the stimuli together.

A paired reinforcer

I’m sure most people are aware of Pavlov’s dogs. Pavlov conditioned dogs to a bell by feeding them every time the bell was rung. After enough times of hearing the bell and being presented with food, the dogs would begin to drool in anticipation of being fed as soon as they heard the bell. This is often referred to as Pavlovian conditioning.

Pairing isn’t just done with reinforcers, but with punishers as well. We can have conditioned punishers, just like we have conditioned reinforcers. We use this process throughout the day, most likely without realizing it. A speeding ticket is a conditioned punisher, as the presentation of one usually limits the future amount of time we spend going above the posted speed limit. A car horn is another conditioned punisher. A parent’s warning could be another (“wait ‘till your father gets home!” sent most of us scuttling to our rooms and generally stopped whatever behavior tended to bring on the wrath of the offended parent). Of course, a conditioned punisher or reinforcer will lose its value if it’s not regularly backed up with the primary reinforcer/punisher. If your parent warned you enough times, but Dad never cared about your antics when he returned home, chances are, after a time, you didn’t care when Mom said that, and you just continued on with your shenanigans until somebody put an eye out. The punisher lost its effectiveness, because it was never backed up with the actual punisher to which it had been paired. You can probably guess that this is called ‘unpairing.’ Never fear, it’s simple to re-pair a conditioned reinforcer or punisher.

The more items a conditioned reinforcer can be paired with, the more powerful that conditioned reinforcer will become. A paper bill that only buys you a drink of water is nice, if you’re thirsty (we’ll talk about motivating operations another day) but a paper bill in the form of a $1 bill, which can be spent on limitless things from gum to soda to cheap dollar store toys, or even saved for bigger and better things, will be a more effective reinforcer. And, naturally, $100 will generally be more effective than $1 when it comes to maintaining the behavior that got the money in the first place.

This reinforcer is pretty worthless to Dierdre. As you can see, she's tasting it. Other than the novel fun of shredding it, this wouldn't motivate her to retain behavior. However, it would probably be more reinforcing to you or me!

Of course, there is a limit to how effective a reinforcer is given the behavior that it was presented for. If you only got paid $1 a day to go to work, I’m going to guess that that’s probably not enough to keep you going to work, even in this economy (especially if today’s cost of living continued to be what it is). Maybe if you really loved your job, but then your reinforcement would most likely be coming from somewhere else, such as social attention, praise, or a rewarding feeling (e.g. volunteer work). Even if you offered me $5 a day to snuggle puppies all day, I’d still probably turn you down, simply because that involves leaving my house and I love to sleep. Sleeping and being at home is worth more to me than $5 and some puppy snuggles. Ok, there are times I’d get out of bed for this for free, but that’s usually because I haven’t done so in a while, so I’m deprived of the secondary reinforcer in this scenario- puppy snuggles. Give me a few days of doing it and I’ll quickly become satiated and it will lose its reinforcing value. Puppies are a lot of work!

Now, to dabble a bit into what’s known as organizational behavior (a.k.a. the behavior of workers/employees), we see this play out every day in the wages paid for different jobs. Less desirable and more demanding jobs, and/or jobs requiring much more initial effort in terms of education, higher degrees, etc. often net the larger paychecks. And I don’t mean ‘fast food worker’ less desirable so much as ‘septic tank scuba diver’ less desirable. There’s only so little money you can pay someone to suit up and scuba dive in a tank full of human waste before they call it quits. (Yes, this is a job that actually exists. Mostly in Australia. Gotta love their adventurous, hard-working spirit, that’s for sure!). This is also the reason why often (not always, but often) you find poorer customer service and lower morale at locations that pay poor wages. Research shows that companies that pay higher wages and provide benefits, such as Starbucks and Costco, have higher morale among their employees, better customer service, and less turnover. More effort equals higher reinforcement equals continued effort on the part of the individual.

If your dog exerts a lot of effort for a behavior (say, a variable surface track, or a utility dog obedience routine) and is met with very little reinforcement, the dog’s quality of work may decline. The work and effort is not worth the reinforcement. So how do we get around having to give the dog a cheeseburger for every agility jump, thus creating a 200-pound porker that knocks down all the rails because his belly hangs so low? We institute reinforcement schedules. These can be fixed or variable, based on a count of responses (a ratio) or on a time interval, and all have their pros and cons. Chances are pretty good that, at your job, your pay is on a fixed schedule. You are probably getting paid weekly, every 2 weeks, twice a month, or on some similar schedule. If it’s always the same, every pay period (excluding bank holidays and whatever kinks get thrown in there), then you’re on a fixed schedule of reinforcement. At my job I get paid every week, so my pay arrives on a fixed 7-day interval. Because a paycheck follows the calendar rather than a count of behaviors, it is strictly closer to a fixed interval schedule, which we’ll get to below. If you give a puppy a treat every other time the dog sits, then you’ve placed the puppy on a fixed ratio schedule of 2. Every 2 behaviors nets the pup a treat. We usually write this as FR-2. A ratio can vary widely, but there comes a point where, eventually, an FR schedule is too high to control the behavior. If your dog only got rewarded every 150th time he ran a challenging variable-surface track, he’d probably ‘forget’ how to track. The reinforcement is too infrequent to maintain the behavior. Finding the happy medium by gradually fading the reinforcement is required.
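If it helps to see the FR-2 puppy example written out, here is a minimal Python sketch. The function name `fixed_ratio_rewards` is just something I made up for illustration; it simply lists which responses earn a treat when every Nth response is rewarded.

```python
def fixed_ratio_rewards(n_responses, ratio):
    """Return the 1-based response numbers that earn a reward under a fixed ratio."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return [i for i in range(1, n_responses + 1) if i % ratio == 0]

# FR-2: every other sit earns a treat.
print(fixed_ratio_rewards(10, 2))     # [2, 4, 6, 8, 10]

# FR-150: 149 tracks in a row earn nothing at all.
print(fixed_ratio_rewards(149, 150))  # []
```

The second call shows why a very high ratio struggles to maintain a behavior: the dog does a great deal of work before the first reward ever arrives.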

A more effective schedule of reinforcement is a variable ratio schedule. A variable ratio (VR) is still written as a number, say VR-5, but that number is only the average ratio, not a constant one. The dog may get rewarded after 3 sits, then after 6 more, then 4, then 5, then 7. The average ratio is 5, but the number of sits required each time varies. This keeps the individual guessing as to when the reward will come. This time? No. This time? No. This time? YES! All right, I got it!! This time? No. Because the individual keeps guessing, the behavior persists through several instances of non-presentation. Variable ratio schedules are the most resistant to extinction.
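The arithmetic behind that example can be checked with a short Python sketch. I am reading the numbers 3, 6, 4, 5 and 7 as the count of sits required for each successive reward (that reading is my assumption), and `variable_ratio_rewards` is a made-up helper name.

```python
from itertools import accumulate

def variable_ratio_rewards(gaps):
    """Given the responses required for each successive reward,
    return the running response numbers on which a reward lands."""
    return list(accumulate(gaps))

gaps = [3, 6, 4, 5, 7]                # sits required for each reward
print(sum(gaps) / len(gaps))          # 5.0, so this is a VR-5 schedule
print(variable_ratio_rewards(gaps))   # [3, 9, 13, 18, 25]
```

Five rewards over 25 sits averages out to one reward per 5 sits, even though no two gaps in a row are the same.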

Extinction is what we call it when a behavior stops being reinforced, and eventually disappears. That behavior becomes extinguished. Often, when you extinguish a behavior, you experience what is called an ‘extinction burst.’ The phrase, “It gets worse before it gets better!” usually refers to an extinction burst. An extinction burst happens when a behavior that typically produces a reinforcer suddenly stops producing that reinforcer. The organism’s first response will be to try the behavior again. Maybe harder this time. Or faster, or in rapid succession. Anything to try and make that behavior work again. Think of it like a vending machine. Every day you put a dollar in, press the button, and get a soda. This has been working nicely for you and you’ve become accustomed to getting a soda every day from this vending machine. One day, you put in your dollar, push the button, and nothing happens. What is the usual response? You may push the button again. And again. You may push it harder, you may push it faster, or multiple times in rapid succession. You might try putting in another dollar. You may get mad, try and shake the machine, even kick it, before you finally give up and walk away. This is an extinction burst. Your usual behavior of putting a dollar in and pushing a button has worked well until now, and when it didn’t work, you escalated the rate and intensity of your behavior until you realized it wasn’t going to work.

We see this a lot in children, especially younger ones. Say you go through the grocery checkout, and there’s candy there, and the child wants the candy. The parent says No, and the child starts to kick and scream. The parent is distracted, busy, maybe doesn’t want to deal with it right now, so they give in and buy the candy. The next time they go to the grocery store, the parent tells themselves, “I’m going to stick to my guns this time. No candy!” When they approach that checkout and the child screams for candy, the parent says No. The kid screams and cries, and when that doesn’t work, the kid may scream and cry even louder, or start kicking and hitting. The child is going through an extinction burst.

Extinction bursts are often the reason people believe that whatever behavior intervention they’re doing is not working. In reality, the presence of the extinction burst tells you it IS working: you’ve blocked the reinforcer maintaining the behavior, and you just need to be persistent and wait for the behavior to extinguish. This can take time, and of course, in some instances, such as self-injurious behavior, the risk of injury is far too great and a different intervention must be tried.

Sometimes ignoring an attention-maintained behavior can be rough, such as trying to ignore a barking puppy in a crate while you wait for them to be quiet before you let them out. You don’t want to let them out while they’re barking and therefore inadvertently reinforce the barking, but you just can’t stand the barking. In instances like these we try to elicit the behavior we want, often through prompting, or by offering an alternative or incompatible behavior. For barking dogs, I usually toss a blanket over the crate, which distracts them enough for them to settle down for a moment while they try and figure out what just happened, and then reward the quiet with praise and letting them out.

Interval schedules of reinforcement involve amounts of time, rather than counts of behavior. Instead of rewarding a dog for every 3rd sit, maybe you’re rewarding for every 30 seconds of a solid down-stay. This would be a fixed interval of 30 seconds, written FI-30. Of course, interval schedules can come in the variable form as well, and, like variable ratios, these are very resistant to extinction. When you’re proofing that down-stay for competition, chances are you’ll use variable intervals to make sure your dog stays for however long you leave them there- 30 seconds or 10 minutes. These are also written as the average number, so VI-5 for an average of every 5 minutes (or 5 seconds!).
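Here is a small Python sketch of the down-stay example, showing when rewards would land during a 2-minute stay. The helper `interval_reward_times` is hypothetical, and the variable case is simplified: it cycles through a short list of gaps, whereas in real training you would want the gaps to be unpredictable.

```python
from itertools import cycle

def interval_reward_times(stay_seconds, intervals):
    """Return the elapsed times (in seconds) at which a reward is given.

    `intervals` is repeated in order: [30] models FI-30, while several
    different values model a variable interval averaging their mean.
    """
    if any(gap <= 0 for gap in intervals):
        raise ValueError("intervals must be positive")
    times, elapsed = [], 0
    for gap in cycle(intervals):
        elapsed += gap
        if elapsed > stay_seconds:
            break
        times.append(elapsed)
    return times

print(interval_reward_times(120, [30]))          # FI-30: [30, 60, 90, 120]
print(interval_reward_times(120, [20, 40, 30]))  # averages 30: [20, 60, 90, 110]
```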

There are also fixed and variable time schedules. In these scenarios, the organism is rewarded on that time schedule regardless of whether or not the behavior has occurred. Feeding your dogs dinner might be on a fixed or variable time schedule. If you feed your dog dinner every day at 5, regardless of their behavior that day (did they chew a pillow? Or were they a perfect angel while you were gone? Either way- you’re going to feed them!) they’re on a fixed time schedule. If you’re more like me, and feed them generally sometime in the evening (6? 7? 8:30?) then they’re on a variable time schedule. Of course if your dogs are anything like mine, they start campaigning for dinner around 5:30ish, or as soon as I get home from work. The closer to when they believe dinner should be, the more intense the behavior gets. This is typical of a time-based schedule- the closer it gets to the approximate time the reward should appear, the more intense and stronger the behavior will get. Then, after the reward appears, the behavior diminishes or disappears entirely until closer to the end of the next interval (the following morning, about 8am, my dogs are back to campaigning!).

Next time you go to train, think about how you’re using these simple schedules of reinforcement throughout your training session. Next time, we’ll talk about compound schedules of reinforcement, as well as different types of differential reinforcement!


