
Your “best” training programme might be your least effective one.
Not the one with the lowest completion rates or worst feedback. The one everyone loved. The one that felt smooth, clear, and effortless. The one you pointed to as a success.
That’s the uncomfortable premise of this article — and the research behind it has been building for over thirty years. This concept is called desirable difficulties, and it suggests that almost everything L&D teams optimise for is working against long-term learning.
As James Clear, author of Atomic Habits, puts it, “If you’re always right, you’re not learning… If you’re always winning, you’re undershooting your potential.” The same applies to your learners. Let’s find out why.
The Counterintuitive Truth
Picture the scene. Your training programme has just wrapped up. Completion rates are at an all-time high. Satisfaction scores came back at 4.8 out of 5. By every metric you track, it was a success. Back pats and confetti cannons abound.
But three months later, the same learners fail an audit. How is that possible? They completed the training. They said they enjoyed it. So what went wrong?
Nothing went wrong with the training itself. The issue is with how we measure training. And this brings us to a paradox that most L&D professionals have never been told about.
The strategies that produce the best performance during training — clear, easy-to-follow content that learners breeze through — tend to produce the worst long-term retention. On the other hand, strategies that feel slow, effortful, and even frustrating tend to produce the strongest long-term retention.
Cognitive psychologist Robert Bjork has spent decades studying this paradox. His conclusion is that we confuse performance with learning. Performance is what you can demonstrate right now. Learning is what you can still apply weeks, months, or even years later.
They’re not the same thing. And the conditions that maximise one (performance) often undermine the other (learning). Bjork’s term for the strategies that bridge this gap is desirable difficulties.
What Are Desirable Difficulties?
The term was coined by Robert Bjork in 1994, in a paper on memory and metacognition. The definition is precise, and this precision matters, as “desirable difficulties” is one of the most misapplied concepts in learning science.
A desirable difficulty is a condition of learning that slows down performance during training but enhances long-term retention and transfer afterwards.
The word “desirable” is doing a lot of heavy lifting in that sentence. After all, not all difficulty is good. Confusing instructions, cluttered interfaces, unnecessary complexity — this all creates friction. And that’s not the kind of difficulty we’re seeking.
The difficulty is only desirable when it forces the brain to engage in deeper cognitive processing — the kind that strengthens how information is encoded, stored, and retrieved. The effort has to be productive.
This is the distinction that separates Bjork’s framework from the lazy interpretation that “harder is always better”. It isn’t. Harder is only better when the difficulty targets the right cognitive processes.
So what makes difficulty desirable at the cognitive level? The answer lies in a distinction Bjork draws between two things most people assume are the same, but aren’t.
The Performance-Learning Paradox
Here’s a truth that may make some learning professionals uncomfortable: performance and learning are not the same thing.
- Performance is what a learner can demonstrate right now. It’s the score achieved in the end-of-module quiz. It’s the confidence in the room at the end of a workshop. And it’s the ability to recall a fact five minutes after reading it. It’s visible, measurable, and reassuring.
- Learning is what a learner can still recall, apply, and transfer weeks or months later. It holds under different contexts, prompts, and levels of pressure. It’s difficult to quantify in the moment. It only reveals itself afterwards.
One of Bjork’s key insights is that the conditions that maximise performance during training often undermine learning, and vice versa. This is known as the performance-learning paradox. It helps to explain why so much corporate training fails to produce lasting change.
Here’s how it works. When training is clear and easy to follow, learners perform well in the moment. But that fluency comes at a cost. The brain hasn’t had to work hard enough to build durable memory traces. As a result, the memory’s long-term durability, which Bjork calls storage strength, never gets built.
On the other hand, when training introduces the right sort of difficulties (we’ll come to this shortly), performance during the session often drops. Learners feel less confident. Errors creep in. But the brain is now doing the deeper cognitive work that builds storage strength.
Put simply, low cognitive effort is the enemy of long-term retention.
The New Theory of Disuse
Robert and Elizabeth Bjork formalised this through their new theory of disuse. This theory proposes that every memory has two strengths:
- Retrieval strength: how easily you can access it right now.
- Storage strength: how durably it’s encoded for the long term.
Easy learning conditions inflate retrieval strength without building storage strength. Desirable difficulties do the opposite. They temporarily reduce retrieval strength while building the storage strength that determines whether the knowledge survives.
That’s the theoretical foundation. But which specific strategies create this effect?
The Four Desirable Difficulties
Bjork and Bjork (2011) identified four core strategies that consistently produce the performance-learning gap described above. Each approach is backed by decades of independent research.
1. Retrieval Practice

Retrieval practice is the act of pulling information out of memory rather than putting it back in. Every time a learner is asked to recall something (through a quiz, a flashcard, or a scenario), the brain has to reconstruct the answer from scratch. That effort strengthens the neural pathway that stores the answer.
The performance-learning paradox is visible here in its purest form. Re-reading feels productive but produces weak retention. Retrieval feels harder, but produces dramatically stronger long-term memory.
In fact, Karpicke and Roediger (2008) found that repeated retrieval practice produced around 80% recall after one week, compared to just 36% for restudy. These aren’t small margins.
2. Spaced Practice

Spaced practice works by distributing learning across time, rather than concentrating it in a single session. The brain consolidates memories during the gaps between study sessions, and each return to the material after a delay forces a small act of retrieval.
This is why cramming (or ‘massed practice’) works for an exam, but fails for long-term retention. Spaced practice may feel slower, and it produces more errors during training, but it builds knowledge that lasts.
Rohrer and Pashler (2007) tested this directly. They found that spacers scored 74% on a test one week later, compared to 49% for massers. Same material. Same total study time. Dramatically different outcomes.
3. Interleaving

Interleaving takes place when you mix different topics, problem types, or skills within a single practice session, rather than completing all of one type before moving on to the next.
This contrasts with blocking, which is how almost all training is structured. Learn Topic A. Practise Topic A. Move on to Topic B. It’s neat, logical, and intuitive. It also produces significantly weaker learning.
Interleaving is also a closer match to reality. In the workplace, problems and tasks don’t arrive in a neat, predictable order. Working out which approach a problem calls for is part of the job. As a bonus, alternating between topics introduces natural spacing, giving the brain time to consolidate the information.
Rohrer, Dedrick, Hartwig, and Cheung (2020) confirmed this in a randomised trial with 787 students over three months. The interleaving group scored 61% on an unannounced test, compared to 38% for the blocked group.
4. Varied Practice

Varied practice occurs when you change the conditions under which learning takes place. Think different locations, formats, devices, and times of day. When you learn something in a single context, your brain ties the memory to that environment. Change the environment, and the retrieval route weakens.
Smith, Glenberg, and Bjork (1978) demonstrated this with a simple experiment. Participants studied a 40-word list twice — one group in the same room both times, the other in two different rooms. The varied-context group recalled 24.4 words compared to 15.9 for the same-context group. That’s a 53% boost.
In training terms: when every module is delivered in the same format, the brain encodes within a narrow band of contextual cues. However, if you mix in quizzes, scenarios, videos, and discussions, you start to widen the net.
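If you’d like to check the arithmetic, the headline figures quoted for the four strategies can be put on a common footing as relative improvements. The short Python sketch below is purely illustrative: the function name `relative_gain` and the labels are our own, and the numbers are simply the ones cited in this section.

```python
def relative_gain(treatment: float, baseline: float) -> float:
    """Return the percentage improvement of treatment over baseline."""
    return (treatment - baseline) / baseline * 100


# Figures as quoted in this article: (difficult condition, easy condition).
results = {
    "retrieval practice": (80, 36),    # % recalled after one week
    "spaced practice": (74, 49),       # % correct one week later
    "interleaving": (61, 38),          # % correct on an unannounced test
    "varied practice": (24.4, 15.9),   # words recalled from a 40-word list
}

for strategy, (hard, easy) in results.items():
    print(f"{strategy}: {relative_gain(hard, easy):.0f}% relative improvement")
```

The varied practice line reproduces the 53% boost mentioned above. Bear in mind that these studies used different materials, learners, and delays, so the percentages shouldn’t be read as a ranking of the strategies.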
Desirable vs Undesirable Difficulty
The pattern is clear: making learning harder tends to make it stick. But this concept can be dangerous if misapplied. After all, not all difficulty is desirable. Here are some examples of the undesirable kind:
- Poorly written instructions that force the learner to decode what they’re being asked to do before they can start learning.
- Cluttered or unintuitive interfaces that turn navigation into a task in itself. Finding the next module shouldn’t feel like an odyssey.
- Content pitched far above the learner’s current level with no scaffolding, no context, and no way to bridge the gap.
- Unnecessary time pressure on tasks where speed adds nothing. Rushing a learner through complex material helps no one.
- Irrelevant complexity, like trick questions or ambiguous wording, that tests learners’ patience rather than their knowledge.
- Cognitive overload from cramming too much into a single session. Twenty topics in an afternoon isn’t challenging; it’s wasteful.
The test is whether the difficulty targets the right cognitive work. A 48-hour delay between learning and testing is difficult, but it forces retrieval from long-term memory rather than short-term recognition. That’s desirable.
But strip away the mechanisms that support encoding and long-term memory formation, and all you’re left with is difficulty.
Why L&D Gets This Wrong
Think about how most organisations evaluate training. Completion rates tell you who showed up. Satisfaction surveys tell you who enjoyed it. End-of-module quizzes tell you who could recall information five minutes after reading it. You’re measuring immediate performance, not learning.
Incentive structures often make this worse. L&D teams are typically measured on what’s easiest to report: completion, satisfaction, and pass rates. These can be pulled from your LMS in minutes, optimised for in weeks, and presented to leadership in a single slide.
The metrics that actually matter — recall after a delay, behavioural change on the job, and performance impact — are harder to measure and slower to materialise. It’s no surprise that most teams gravitate towards what’s visible and immediate, even when the evidence suggests that’s an ineffective approach.
So how do we break the cycle?
How to Implement Desirable Difficulties
Let’s start with the good news. Implementing desirable difficulties doesn’t require a complete redesign of your learning programmes. Most can be introduced into existing initiatives with relatively small structural changes. The key is knowing where to apply them. Here’s our seven-step guide.

1. Space It Out
If your training is delivered in a single session, that’s the first thing to change. Break your content down into shorter sessions distributed over days or weeks. A five-module compliance course delivered across three weeks will produce significantly better retention than the same course delivered in an afternoon.
Pro-tip: The optimal gap between sessions is roughly 10-20% of the time you need learners to retain the information (Pashler et al., 2007).
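To turn that rule of thumb into numbers, here’s a minimal Python sketch. The function name `recommended_gap_days` is our own invention, and the 10-20% range is simply the figure quoted above, so treat the output as a starting point rather than a precise prescription.

```python
def recommended_gap_days(retention_days: float) -> tuple[float, float]:
    """Return the (shortest, longest) suggested gap between sessions, in days.

    Applies the 10-20% rule of thumb: the gap scales with how long
    learners need to retain the material.
    """
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return 0.10 * retention_days, 0.20 * retention_days


# Example: learners need to retain the material for roughly 90 days.
low, high = recommended_gap_days(90)
print(f"Space sessions about {low:.0f} to {high:.0f} days apart")
```

For a 90-day retention target, that suggests sessions roughly 9 to 18 days apart.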

2. Test Before You Teach
Send your learners a short quiz on material they haven’t covered yet. Do this before the workshop, before the eLearning module, before the onboarding session, and so on. They’ll get most of it wrong. That’s the point.
Pro-tip: The pretesting effect (Richland, Kornell and Kao, 2009) shows that attempting retrieval before learning primes the brain to encode the correct information with greater depth when it arrives.

3. Introduce Interleaving
Stop grouping all problems and scenarios of the same type together. For example, if you’re training salespeople on three objection-handling techniques, don’t practise all of Technique A, then all of B, then all of C. Instead, mix them up.
Pro-tip: Start with blocked practice for the first exposure. Then switch to interleaving for all subsequent practice. Interleaving before any foundation is built crosses the line into undesirable difficulty.
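As a rough illustration of that “blocked first, interleaved after” pattern, the Python sketch below builds a practice order for a handful of techniques. Everything here is an assumption for the sake of the example: the function name, its parameters, and the decision to avoid back-to-back repeats during the interleaved phase.

```python
import random


def build_practice_sequence(techniques, blocked_reps, interleaved_rounds, seed=None):
    """Return a practice order: blocked first exposure, then interleaved rounds.

    Each technique is first practised blocked_reps times in a row. After
    that, every round contains each technique once in shuffled order, and
    the same technique never appears twice in a row.
    """
    if len(techniques) < 2 or len(set(techniques)) != len(techniques):
        raise ValueError("need at least two distinct techniques")
    rng = random.Random(seed)

    # Phase 1: blocked practice to build a foundation.
    sequence = []
    for technique in techniques:
        sequence.extend([technique] * blocked_reps)

    # Phase 2: interleaved practice.
    for _ in range(interleaved_rounds):
        round_order = list(techniques)
        rng.shuffle(round_order)
        # Avoid repeating the previous item across the round boundary.
        if sequence and round_order[0] == sequence[-1]:
            round_order[0], round_order[-1] = round_order[-1], round_order[0]
        sequence.extend(round_order)
    return sequence


plan = build_practice_sequence(["A", "B", "C"], blocked_reps=2, interleaved_rounds=3, seed=1)
print(" ".join(plan))
```

The first six items are always A A B B C C. The remaining nine mix the three techniques, with the exact order depending on the seed.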

4. Vary the Format
Don’t deliver every module the same way. Follow a video with a scenario. Follow that scenario with a quiz. Then follow that quiz with a discussion. Each format change creates a new encoding context, which builds additional retrieval routes.
Pro-tip: Use a simple rule of thumb: no two consecutive touchpoints in the same format. If it feels repetitive, the encoding probably is too.

5. Ditch the Final Exam
A single 20-question quiz at the end of a course measures short-term recognition. But four 5-question quizzes delivered over the following three weeks measure (and strengthen) long-term recall. That’s the same number of questions with a fundamentally different outcome.
Pro-tip: Use a 1-3-7-21 schedule (retrieval prompts 1, 3, 7, and 21 days after training) to help combat the forgetting curve. Prompting retrieval on these days strengthens the memory trace at the point when it’s starting to fade.
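If you want to drop those quiz dates straight into a calendar, a few lines of Python will do it. This is only a sketch: the function name `retrieval_schedule` is hypothetical, the example date is made up, and we’re assuming the numbers count days after the training day itself.

```python
from datetime import date, timedelta


def retrieval_schedule(training_day, offsets=(1, 3, 7, 21)):
    """Return the dates on which to send retrieval quizzes.

    Offsets are counted in days after the training day (an assumption).
    """
    return [training_day + timedelta(days=offset) for offset in offsets]


# Example: training delivered on 3 March 2025.
for quiz_day in retrieval_schedule(date(2025, 3, 3)):
    print(quiz_day.isoformat())
# Prints 2025-03-04, 2025-03-06, 2025-03-10, and 2025-03-24 on separate lines.
```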

6. Measure What Matters
Start tracking recall after a delay, not just completion rates and learner satisfaction. If you can’t measure everything, pick one high-stakes area (compliance, safety, product knowledge, etc.) and run a delayed test 30 days after training.
Pro-tip: Compare your end-of-session quiz scores against a surprise retest 30 days later. This gives you a retention baseline to improve against.
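Here’s one way that comparison might look in practice. The Python sketch below uses made-up scores and a hypothetical `retention_rate` function. It assumes you can export each learner’s end-of-session and 30-day scores from your LMS as simple id-to-score mappings.

```python
from statistics import mean


def retention_rate(end_of_session, delayed):
    """Return the delayed average as a fraction of the immediate average.

    Both arguments map learner id -> score (0-100). Only learners who sat
    both tests are compared, so drop-outs don't skew the baseline.
    """
    both = end_of_session.keys() & delayed.keys()
    if not both:
        raise ValueError("no learners took both tests")
    immediate_avg = mean(end_of_session[learner] for learner in both)
    delayed_avg = mean(delayed[learner] for learner in both)
    if immediate_avg == 0:
        raise ValueError("immediate average is zero, so retention is undefined")
    return delayed_avg / immediate_avg


# Illustrative data only: learner "c" missed the 30-day retest.
immediate = {"a": 90, "b": 80, "c": 100}
day_30 = {"a": 45, "b": 60}

print(f"Retention after 30 days: {retention_rate(immediate, day_30):.0%}")
# Prints: Retention after 30 days: 62%
```

A single ratio like this is crude, but it gives you a baseline number to beat once spacing and retrieval practice are in place.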

7. Prepare Your Learners
Knowing difficulty is desirable doesn’t make it feel any less difficult. However, a brief explanation at the start of a programme (“this is designed to be challenging, because that’s how the brain builds lasting knowledge”) can reframe the entire experience.
Pro-tip: Showcase the research behind your approach and frame it as a competitive advantage. Learners who understand the why are far less likely to push back on the how.
The Final Word
Desirable difficulties are not complicated. Space learning over time. Test before you teach. Mix topics instead of blocking them. Vary the format. Replace single assessments with distributed retrieval. These are small, structural changes — not a complete redesign.
The hard part isn’t implementation. It’s trust. Trusting the research over the satisfaction scores. Trusting that a dip in short-term performance is the price of long-term retention and that the price is worth paying.
The organisations that make this shift will build workforces that don’t just complete training, but remember it, apply it, and perform because of it. The rest will keep optimising for comfort and wondering why nothing sticks.