I recently had a call with a founder who was frustrated that the new AI models were “getting stupider”.
They were plugging in information as before, but the AI was tripping over itself, thinking in loops and generally making a pig’s ear of it all.
I asked them to show me.
Turns out they were adding "think step-by-step" to every prompt, just as they'd learned from countless prompting guides!
They weren't doing anything wrong per se. The AI landscape had simply shifted beneath their feet.
Increasingly we have so-called reasoning models.
What we're witnessing is fascinating – modern reasoning models now do much of their thinking internally, automatically breaking problems into steps before responding. Adding explicit step-by-step instructions can actually disrupt this process, like interrupting someone who's already deep in thought and asking what they are thinking about.
As we conclude this Playbook on prompting, let's explore how reasoning techniques are evolving and the new best practices that will keep you ahead of the curve.
Let's get started:
At its core, reasoning in AI is about breaking complex problems into manageable steps, considering multiple perspectives, weighing evidence, and drawing logically sound conclusions.
(Kinda the same with humans. But that’s a different discussion!)
Very crudely we have non-reasoning models and reasoning models:
Early AI models typically jumped straight to conclusions without methodically working through problems. This led to the development of techniques like Chain of Thought prompting, which served as external scaffolding to guide the AI toward more thorough analysis. GPT-3.5 and GPT-4 are examples of this earlier generation.
Modern reasoning models now have much of this analytical capability built in. They automatically break problems into steps, weigh different approaches, and check their work before responding.
This internal reasoning happens before you see the response – it’s as if the model drafts a detailed analysis, then edits it down to just the essential conclusions.
It’s pretty magical.
Understanding this evolution is crucial for effective prompting. What worked brilliantly with previous generations of models might now be redundant or even counterproductive. We need to adapt our techniques to match the capabilities of the models we're using.
Chain of Thought (CoT) prompting was one of the most significant breakthroughs in getting better performance from AI systems. Before we talk about how it's evolving, let's understand what it actually is and why it was so revolutionary.
At its simplest, Chain of Thought is a prompting technique that asks the AI to break down its thinking process into visible steps before providing an answer. Instead of jumping straight to conclusions, the model shows its reasoning – like working out a math problem on paper instead of just writing the answer.
Here's what CoT looks like in practice…
Standard prompt: "What's the average monthly payment on a £300,000 mortgage at a 4.5% interest rate over a 25-year term?"
This will (more often than not) go horribly wrong. Older models are a bit crap at this sort of mathematical/logical reasoning. You might run the prompt 10 times and get 10 different answers. No bueno.
Instead, with a Chain of Thought prompt: "What's the average monthly payment on a £300,000 mortgage at a 4.5% interest rate over a 25-year term? Let's work through this step by step."
With the CoT prompt, the model might respond with something like this:
Let's work through this step by step:
1. Convert the annual rate to a monthly rate: 4.5% ÷ 12 = 0.375% (0.00375)
2. Work out the total number of payments: 25 years × 12 = 300
3. Apply the standard repayment formula: M = P × r(1 + r)^n ÷ ((1 + r)^n - 1)
4. Plug in the numbers: M = 300,000 × 0.00375 × (1.00375)^300 ÷ ((1.00375)^300 - 1) ≈ 1,667
Therefore, the monthly payment would be approximately £1,667.
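If you want to sanity-check that figure yourself (no AI required), the standard repayment formula is easy to reproduce. Here's a minimal Python sketch – the function name is just for illustration:

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Standard repayment-mortgage formula: M = P * r(1+r)^n / ((1+r)^n - 1)."""
    r = annual_rate / 12   # monthly interest rate: 0.045 / 12 = 0.00375
    n = years * 12         # total number of monthly payments: 300
    return principal * r * (1 + r) ** n / ((1 + r) ** n - 1)

print(round(monthly_payment(300_000, 0.045, 25), 2))  # -> 1667.5
```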
Chain of Thought transformed AI performance for several critical reasons:
1. Error Reduction: By breaking problems into steps, models became much less likely to make calculation errors or logical mistakes. Each intermediate step serves as a checkpoint, keeping the reasoning on track.
2. Complex Problem Solving: CoT allowed models to tackle much more complex problems than they could previously handle. Problems that require multiple steps of reasoning became solvable.
3. Transparency: The visible reasoning gave users insight into how the AI was approaching problems, making it possible to spot where things might be going wrong.
4. Educational Value: The step-by-step approach made AI outputs more useful for learning, as users could follow the reasoning process rather than just seeing the answer. It’s sort of like showing your workings in a maths exam.
5. Confidence Assessment: Users could evaluate the soundness of the AI's reasoning, rather than having to blindly trust its conclusions. Traditional models give us an answer and we just have to hope it’s true – which is tricky, because hallucinations exist!
The impact of CoT can't be overstated – it turned models from simple text predictors into systems capable of sophisticated reasoning across mathematics, logic, planning, and more. It was (and is, in certain situations) a very clever hack to get better results from AI.
Now though we have models that do this for us: reasoning models.
Modern reasoning models now generate their own internal chain-of-thought before responding. Think of OpenAI's o-series – o1 and o3, for example. Or any AI that has a “Deep Research” function.
They think through problems step by step, check their work, and sometimes even revise their thinking – all before showing you a single word.
They are running a chain-of-thought process internally. With some bells and whistles, of course!
This changes everything about how we should prompt these models. Rather than forcing the model to show every step of its reasoning, we can now focus on guiding its attention and shaping its output.
This is an evolving field, so for now we’ll stick to best practices rather than hard-and-fast rules. As with everything in AI, it’s fluid! Working with modern reasoning models requires a new set of best practices:
1. Guide, don't babysit: The AI already “thinks” – just give it a clear job.
Example: "You're a growth-marketer. Suggest 3 paid channels for a $10k budget."
Rather than scripting every step of the thinking process (as we would with non-reasoning models), focus on clearly defining the task and letting the model's internal reasoning take care of the rest. So in the context of the RISEN framework, we can drop the S!
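If you're calling a reasoning model from code rather than a chat window, the same principle applies. Here's a minimal sketch assuming the OpenAI Python SDK and an o-series model name like "o3-mini" – swap in whatever provider and model you actually use:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# A clear role, task and output - no "think step by step" scaffolding needed.
response = client.chat.completions.create(
    model="o3-mini",  # example reasoning model; use whichever you have access to
    messages=[{
        "role": "user",
        "content": "You're a growth-marketer. Suggest 3 paid channels for a "
                   "$10k budget. Return a short bullet for each.",
    }],
)
print(response.choices[0].message.content)
```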
2. Lead with the role: Start by telling the model who it is.
Example: "You're a CFO explaining cash flow to non-finance staff..."
Role-based prompting provides context that shapes how the model approaches the problem, without micromanaging its reasoning process. This remains valid with reasoning models - we’re just telling it who to reason as. We retain the R of the RISEN framework.
3. State the output first: Spell out format and style before asking.
Example: "Return a 5-bullet checklist, each bullet < 20 words."
By clearly defining what you want the final output to look like, you can let the model handle the reasoning process while ensuring you get a result in exactly the format you need. This is the E of RISEN - still valid!
4. Prototype hot, deploy cold: Tinker with loose settings, then lock them down.
Example: Draft ideas at temperature 0.7, final runs at 0.2.
When developing prompts, use higher temperature settings to explore different approaches. Once you've found what works, lower the temperature for consistent, reliable results. This is generally good advice for all prompt engineering, but the greater variability of reasoning models makes it even more powerful.
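In code, that might look something like the sketch below (again assuming the OpenAI Python SDK; note that some reasoning models ignore or reject the temperature parameter altogether, so this mainly applies to models that expose it):

```python
from openai import OpenAI

client = OpenAI()

def run(prompt: str, temperature: float) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # example of a model that accepts a temperature setting
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content

draft = run("Brainstorm 10 taglines for a budgeting app.", temperature=0.7)   # prototype hot
final = run("Pick the strongest tagline from this list and explain why:\n" + draft,
            temperature=0.2)                                                  # deploy cold
```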
5. Budget tokens like cash: Extra words = extra cost. Trim the fat.
Example: Paste the exec summary, not the 30-page report.
Since modern models handle reasoning internally, you can focus on providing just the essential information needed, rather than including verbose instructions. This is even more important than with non-reasoning models, because reasoning models come with additional costs attached.
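A practical way to keep yourself honest here is to count tokens before you hit send. A minimal sketch using the tiktoken library (the encoding name is an assumption – different models use different tokenisers):

```python
import tiktoken  # pip install tiktoken

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Rough token count - enough to compare a summary against a full report."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

exec_summary = "Q3 revenue up 12%, churn flat, hiring paused until January."
print(count_tokens(exec_summary))  # a handful of tokens, versus thousands for the 30-page report
```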
6. Build your fallback: Use cheaper models for easy jobs, premium for tough ones.
Example: Use advanced models for strategy, simpler models for spell-check.
This practice recognises that not all tasks require sophisticated reasoning – match the model to the complexity of the task. We talked about the capability cliff before. This is particularly the case with reasoning models, which tend to be more expensive.
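In code, the fallback can be as simple as a lookup table that routes each task to a model tier. A rough sketch – the model names and task labels are placeholders for whatever you actually use:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical routing table: match the model to the complexity of the task.
MODEL_BY_TASK = {
    "strategy": "o3-mini",        # reasoning model for the hard stuff
    "spell_check": "gpt-4o-mini", # cheap, fast model for the easy stuff
}

def run_task(task_type: str, prompt: str) -> str:
    model = MODEL_BY_TASK.get(task_type, "gpt-4o-mini")  # default to the cheap option
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(run_task("spell_check", "Fix any typos: 'We recieved the report yesterday.'"))
```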
While basic reasoning is now baked into many models (via chain of thought), certain complex problems still benefit from specialised approaches. These are just additional ways to nudge the AI to think “deeper”. They work with reasoning and non-reasoning models alike and are worth adding to your toolkit.
Tree of Thoughts encourages the AI to consider multiple solutions before committing to one – similar to how chess players evaluate different moves before choosing.
How it works:
For this [problem/challenge], please:
1. Generate 3 distinct approaches to solving it
2. Briefly evaluate each approach's strengths and limitations
3. Select the most promising approach and develop it into a full solution
This technique is particularly effective for:
Step-back prompting asks the AI to consider the broader context before addressing specifics – like taking a step back to see the whole forest before examining individual trees.
How it works:
Before addressing this specific question about [topic], first consider the broader context, relevant principles, and frameworks an expert would apply. Then provide a focused answer.
This approach works particularly well for:
Self-consistency involves having the AI verify its own work using different approaches – like double-checking a calculation using a different method.
How it works:
I need a reliable answer to this problem. Please:
1. Solve it using your primary method
2. Verify the solution using a different approach
3. If there are discrepancies, determine which approach is more reliable and why
This technique is valuable for:
If you are building workflows or software, you might use a less advanced (read: cheaper!) model to check the work of the more advanced model.
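Here's a rough sketch of that pattern: one model produces the answer, a cheaper one checks it. The model names are placeholders; the point is the shape of the workflow:

```python
from openai import OpenAI

client = OpenAI()

def ask(model: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

question = "What's the monthly payment on a £300,000 mortgage at 4.5% over 25 years?"

# 1. Solve with the more capable (more expensive) model.
answer = ask("o3-mini", question)

# 2. Verify with a cheaper model, ideally via a different method.
review = ask("gpt-4o-mini",
             f"Question: {question}\nProposed answer: {answer}\n"
             "Check this answer using a different approach and flag any discrepancies.")

print(answer, review, sep="\n---\n")
```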
These are just supplemental ways to get our AI to solve problems for us. Honestly, these are solid human thinking methods! They just become prompting techniques because of the particular context. Remember, this is all ultimately about communicating what we want the AI to do for us!