There’s an internal conflict inside all of us. A battle between two independent systems that influence every decision and behaviour a person can undertake. The term “intention-behaviour gap” exemplifies this phenomenon perfectly, referring to the frequent disconnect between a person’s plans and actions (Rhodes & de Bruijn, 2013). Anecdotal examples are frequently seen, especially in the realms of dieting, exercise, saving, and education. People enter new diets but don’t stick with them, pay for exercise classes but don’t attend them, or take time out to study but spend most of it on social media. In the internal conflict there’s always a winner, but not necessarily the right one.
Learning From Experience
Most animals learn from experience, repeating actions that have led to positive results and avoiding actions that have led to negative results. This type of learning is known as conditioning, and has been heavily studied over the last 100 years (Passer et al., 2008). Conditioning techniques have been used to train dogs to do tricks, get crows to forage for coins on the street, and teach pigeons to carry messages across long distances. Yet this kind of learning is not isolated to other animals; humans are just as susceptible to conditioning. Play any video game and you’ll be exposed to a reinforcement schedule, or a set of rules for the presentation of rewards and punishments. Depending on the genre, you can beat levels to earn experience points or gain new items. These rewards act as signals to the brain that the behaviour has been successful, leading to a surge of dopamine that strengthens the connection between neurons, and solidifies the association between the behaviour and the outcome (Schultz, 2006). As you play the game, you will increase the frequency of the behaviours that lead to these rewards, allowing the reinforcement schedule within the game to shape your actions.
There’s a third pillar in this learning mechanism, and it involves the environment. Not only is an association created between the reward and the behaviour, but the context surrounding the action is also encoded. The environment serves as the selection mechanism for the correct script, which contains the behaviour and an expectation of reward. As such, once the cue is viewed, mental energy is saved by choosing the script that has been successful in the past. We can think of the brain as a storage facility for many different behavioural scripts. Every time a cue is encountered, the brain searches through its stored scripts for the corresponding one, deploys it, and then compares the received reward against its expected value.
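The cue, script, and comparison loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Script class, the encounter function, the cue name, and the learning rate are all hypothetical, and the simple update rule (nudging the expectation towards the received reward) is one common modelling choice rather than something the cited sources prescribe.

```python
from dataclasses import dataclass


@dataclass
class Script:
    """A stored behavioural script: what to do and what reward to expect."""
    behaviour: str
    expected_reward: float


LEARNING_RATE = 0.5  # assumed value, chosen only for illustration


def encounter(scripts, cue, received_reward):
    """Look up the script for a cue, deploy it, and compare reward to expectation."""
    script = scripts[cue]  # the environment (cue) selects the script
    prediction_error = received_reward - script.expected_reward
    # Assumed update rule: move the expectation part of the way towards the outcome.
    script.expected_reward += LEARNING_RATE * prediction_error
    return prediction_error


# Hypothetical cue and script; the reward of 1.0 is arbitrary.
scripts = {"doughnut on counter": Script("eat it", expected_reward=0.0)}
errors = [encounter(scripts, "doughnut on counter", 1.0) for _ in range(3)]
```

In this sketch the surprise shrinks with every repetition as the expectation catches up with the reward, which is one way to picture a script becoming routine.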
Model-Based and Model-Free Systems
This behavioural script storage system is also known as the model-free (MF) learning system (Botvinick & Weinstein, 2014). It is quick to deploy, not resource intensive and effective in a lot of situations. However, it requires a degree of trial and error, as well as time to explore these possibilities. For behaviours that have dire consequences (such as death), trial and error is not always the best approach. The system is also quite inflexible. After the brain gets used to deploying a specific script in response to a cue, even if the reward stops, it will take some time before the script is no longer called. This is because the brain gets excited when it sees the cue as it thinks an opportunity for future reward is available. This excitement releases the same reward signals in the brain as the actual reward itself, meaning that the valence of the outcome gets transferred to the cue itself (Neal, Wood, & Quinn, 2006). Even when the outcome of the script changes, the excitement of seeing the cue is still rewarding, clouding the brain’s ability to notice the change.
Thankfully, the brain can call on another system that doesn’t suffer from any of these drawbacks: the model-based (MB) system. You’ll know this system as the voice inside your head that is reading this very paragraph. It’s a (mostly) logical, deliberate and predictive system that can imagine future scenarios and create plans of action to reach expected outcomes. The MB system uses working memory resources to create a model of the current environment and manipulates it mentally to test out different behaviours. Anyone who has tried to complete a maze puzzle will have experienced the ability to test out different paths mentally without actually having to trace them with a pen.
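The maze example lends itself to a small sketch of planning over an internal model. The code below is an assumption-laden illustration, not a claim about how the brain actually searches: the MAZE layout and the plan_path function are made up for this example, and breadth-first search is just one simple way to try out paths on a model without acting.

```python
from collections import deque

# Hypothetical internal model of a maze: S = start, G = goal, # = wall.
MAZE = [
    "S.#",
    ".##",
    "..G",
]


def plan_path(maze):
    """Search the model for a shortest path from S to G without taking a step."""
    rows, cols = len(maze), len(maze[0])
    start = next(
        (r, c) for r in range(rows) for c in range(cols) if maze[r][c] == "S"
    )
    frontier = deque([[start]])  # each entry is a candidate path being imagined
    visited = {start}
    while frontier:
        path = frontier.popleft()
        r, c = path[-1]
        if maze[r][c] == "G":
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and maze[nr][nc] != "#" and (nr, nc) not in visited:
                visited.add((nr, nc))
                frontier.append(path + [(nr, nc)])
    return None  # no route exists in this model


path = plan_path(MAZE)
```

Every dead end here is explored only in memory, which mirrors the trade-off in the text: no risky trial and error, but more working effort per decision.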
The MB system is explicit in its methods, requires effort to deploy and is part of conscious decision-making. The MF system, however, is implicit, easy to deploy and unconscious in nature. The only way that the MF system is able to communicate and affect decision-making is through emotional reactions. These two systems sometimes work together but can often produce conflicting requests for behaviours. For example, when looking at a tasty doughnut, the MF system may be screaming with cravings to try and get you to eat it, while the MB system is laying out a reasonable argument as to why the doughnut is not the best food to consume. This battle is typical of many behaviours that people want to do but know that they shouldn’t, or behaviours they know they should do but don’t want to.
The MF system has a strong advantage by being less costly, as the brain is incredibly lazy and is always looking for the path of least resistance. While MF decision-making is not inherently bad, and works well for many species, relying on it heavily in modern human society can have problematic consequences. Issues with motivation often result from constant pushback from the MF system whenever a person tries to make reasoning-based decisions. Understanding this conflict allows for new techniques to be developed that can align both systems and reduce the problems of low motivation.
Why No One Ever Wants to Study (or Exercise, etc.)
Delayed gratification is a real problem for the MF system as it can only learn based on frequency and recency. A task like studying is both boring and tiring, and these emotions are stored in the MF memory bank as the outcome or punishment of the activity. Every repetition compounds the association between studying and the negative emotions. These emotions are not only attached to the behaviour of studying, but also to any cues that signal that study will commence. This includes textbooks, notepads, and even thinking about studying. When these cues are encountered, the MF system will have an aversive response and will send a wave of negative emotions to try and prevent the study behaviour. Even if an individual is able to overcome these negative feelings and achieves the desired outcome of passing the test or getting a good grade, these outcomes commonly occur at a much later time. The MF system can’t associate the good grade with studying as the distance in time between them is too great. As far as it is concerned, study leads to negative outcomes and the act of receiving test scores back from the teacher leads to positive outcomes. Unless a person can receive immediate and frequent positive feedback, the study session will always carry negative emotions with it. No one will ever want to study if the only reward is a good grade.
So what can be done? Video games have been perfecting this technique for years: frequent positive feedback. The MF system reacts solely based on experience, and if the experiences are positive it will react positively. Not only that, the MF system will provide cravings for the behaviour, especially around cues related to it. Littering little treats across a study session will train the brain to associate positive feelings with the activity, increasing enjoyment and attention. If the tasks that a person should do are also the tasks that a person wants to do, motivation is never an issue (Milkman, Minson, & Volpp, 2014).
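A toy calculation can make the contrast between delayed and frequent rewards concrete. Everything in it is a hypothetical assumption for illustration: the cue names, the numbers assigned to boredom, treats, and the grade, and the simple rule that a cue's value only moves towards the outcome felt immediately after it.

```python
LEARNING_RATE = 0.5  # assumed value, chosen only for illustration

# Hypothetical emotional magnitudes; only their signs matter for the argument.
BOREDOM = -1.0
TREAT = 2.0
GOOD_GRADE = 5.0


def train(cue_values, cue, immediate_outcome, repetitions):
    """Move a cue's stored value towards the outcome felt right after it."""
    for _ in range(repetitions):
        cue_values[cue] += LEARNING_RATE * (immediate_outcome - cue_values[cue])


# Study with only a delayed reward: the grade is credited to the cue it
# arrives with (getting results back), not to the textbook.
plain = {"textbook": 0.0, "results day": 0.0}
train(plain, "textbook", BOREDOM, repetitions=4)
train(plain, "results day", GOOD_GRADE, repetitions=1)

# Study with small treats during each session: the immediate outcome
# attached to the textbook becomes positive overall.
bundled = {"textbook": 0.0}
train(bundled, "textbook", BOREDOM + TREAT, repetitions=4)
```

Under these assumptions the textbook ends up with a negative value in the first case and a positive one in the second, even though the grade is identical in both.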
If you’re trying to get into a new activity like running but you don’t enjoy it, it might be better to try a different exercise, such as a team sport or a martial arts class. The other option is to find a way to add frequent positive feedback to the exercise. This can mean using apps like Zombies, Run! that turn the activity into a game, or watching your favourite show only while running on a treadmill (known as temptation bundling; Milkman et al., 2014). The important thing is that the activity is rewarding, because then you will want to do it.
Botvinick, M., & Weinstein, A. (2014). Model-based hierarchical reinforcement learning and human action control. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 369(1655). doi:10.1098/rstb.2013.0480
Milkman, K. L., Minson, J. A., & Volpp, K. G. M. (2014). Holding the Hunger Games Hostage at the Gym: An Evaluation of Temptation Bundling. Management Science, 60(2), 283–299. doi:10.1287/mnsc.2013.1784
Neal, D. T., Wood, W., & Quinn, J. M. (2006). Habits—A Repeat Performance. Current Directions in Psychological Science, 15(4), 198–202. doi:10.1111/j.1467-8721.2006.00435.x
Passer, M., Smith, R., Holt, N., Bremner, A. J., Sutherland, E., et al. (2008). Psychology: The science of mind and behaviour. Retrieved from http://eprints.gold.ac.uk/id/eprint/5051
Rhodes, R. E., & de Bruijn, G.-J. (2013). How big is the physical activity intention-behaviour gap? A meta-analysis using the action control framework. British Journal of Health Psychology, 18(2), 296–309. doi:10.1111/bjhp.12032
Schultz, W. (2006). Behavioral theories and the neurophysiology of reward. Annual Review of Psychology, 57, 87–115. doi:10.1146/annurev.psych.56.091103.070229