Today I Learned (TIL)

Things I've learned or things I find interesting will be logged here. For long-form content, you might want to check out my newsletter.

Prompt Engineering vs Context Engineering

From Anthropic: https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents

Prompt Engineering:

Prompt engineering refers to methods for writing and organizing LLM instructions for optimal outcomes.

Context Engineering:

Context engineering refers to the set of strategies for curating and maintaining the optimal set of tokens (information) during LLM inference, including all the information that may land in the context window outside of the prompt itself.

I like this framing because in the world of agentic systems, writing clever prompts alone won’t cut it. Agents operate in dynamic environments, constantly juggling new information. The real skill is curating which pieces of that evolving universe end up in context at the right moment. It’s a subtle but powerful shift that mirrors how good software architectures focus not only on code, but also on data flow.

If you’re building or designing AI agents, this is worth a read.

Why LLMs Confidently Hallucinate a Seahorse Emoji That Never Existed

Ask any major AI if there's a seahorse emoji and they'll say yes with 100% confidence. Then ask them to show you, and they completely freak out, spitting random fish emojis in an endless loop. Plot twist: there's no seahorse emoji. Never has been. But tons of humans also swear they remember one existing.

Check out the analysis in this post 👉🏽 https://vgel.me/posts/seahorse/

A seahorse emoji was actually proposed to Unicode in 2018 but got rejected. Makes sense we'd all assume it exists though. Tons of other ocean animals have emojis, so why not seahorses?

The post digs into what's happening inside the model using an interpretability technique called logit lens (#todo: learn more about logit lens). The model builds up an internal concept of "seahorse + emoji" and genuinely believes it's about to output one. But when it hits the final layer that picks the actual token, there's no seahorse in the vocabulary. So it grabs the closest match, a tropical fish or a horse, and outputs that. The AI doesn't realize it messed up until it sees its own wrong answer. Then some models catch themselves and backtrack; others just spiral into emoji hell.
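As I understand the technique (this is my own toy sketch in numpy, not code from the post): logit lens takes the hidden state after each intermediate layer and projects it through the model's unembedding matrix, as if that layer were the final one, to see which token the model is leaning toward at each depth. The weights and the tiny vocabulary below are entirely made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for a transformer's pieces (all invented for this sketch):
# three "layers" that each transform the hidden state, plus an unembedding
# matrix W_U that maps hidden states onto a 5-token vocabulary.
vocab = ["fish", "horse", "tropical-fish", "emoji", "<eos>"]
d_model = 8
layers = [rng.normal(size=(d_model, d_model)) for _ in range(3)]
W_U = rng.normal(size=(d_model, len(vocab)))

def logit_lens(h):
    """Project every intermediate hidden state through the unembedding,
    as if that layer were the last one, and record the top token."""
    readouts = []
    for i, W in enumerate(layers):
        h = np.tanh(h @ W)   # one toy "layer" update
        logits = h @ W_U     # premature readout: this is the logit lens
        readouts.append((i, vocab[int(np.argmax(logits))]))
    return readouts

for layer, token in logit_lens(rng.normal(size=d_model)):
    print(f"layer {layer}: top token = {token}")
```

The seahorse failure mode falls out of the final projection step: if the concept the model has built has no matching token in the vocabulary, the argmax lands on the nearest neighbour instead.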

I tried this myself with both Claude and ChatGPT and it looks like they've mostly fixed this now.

ChatGPT went through the whole confusion cycle (horse, dragon, then a bunch of random attempts) before finally catching itself and admitting there's no seahorse emoji. Claude went even further off the rails, confidently claiming the seahorse emoji is U+1F994 and telling me I should be able to find it on my keyboard.

It's a perfect example of how confidence means nothing. The model isn't lying or hallucinating in the usual sense. It's just wrong about something it reasonably assumed was true, then gets blindsided by reality.

Prompt Engineering
dspy

Goodbye Manual Prompts, Hello DSPy

Today I learned about a smarter way to deal with the headache of prompts in production. Drew Breunig’s talk at the Databricks Data + AI Summit is hands down the clearest explanation I’ve seen of why traditional prompting doesn’t scale well. He compares it to regex gone wild: what starts as a neat solution quickly becomes a brittle mess of instructions, examples, hacks, and model quirks buried inside giant text blocks that no one wants to touch. A single “good” prompt can have so many moving parts that it becomes practically unreadable.

DSPy takes a very different approach. Instead of hand-crafting and maintaining prompts, you define the task in a structured way and let the framework generate and optimise the prompts for you. You describe what goes in and what should come out, pick a strategy (like simple prediction, chain-of-thought, or tool use), and DSPy handles the formatting, parsing, and system prompt details behind the scenes. Because the task is decoupled from any specific model, switching to a better or cheaper model later is as easy as swapping it out and re-optimising.

This feels like a glimpse of where prompt engineering is heading: less manual tinkering, more structured task definitions and automated optimisation. I’ll definitely be trying DSPy out soon.

https://www.youtube.com/watch?v=I9ZtkgYZnOw

leadership

Nemawashi (根回し)

There’s a Japanese concept called nemawashi (literally “going around the roots,” from the practice of preparing a tree’s roots before transplanting it) that offers a way around the dreaded “big reveal” in engineering proposals. Instead of marching into a meeting with your fully-formed design and expecting everyone to buy in, nemawashi encourages you to talk privately with all relevant stakeholders first: get feedback, surface objections, let people shape the idea, and build informal buy-in. By the time the formal meeting happens, the decision is mostly settled rather than sprung on the room.

When I read “Quiet Influence: A Guide to Nemawashi in Engineering,” what struck me is how often we dismiss the political or social side of engineering work. A technically perfect solution can still die if colleagues feel blindsided, ignored, or defensive in a meeting. Adopting nemawashi has the power to transform you from someone pushing an idea to someone guiding a shared direction. For me (and for readers who work in cross-team or senior roles), it underlines a critical truth: influence is relational, not just visionary.

👉🏽 https://hodgkins.io/blog/quiet-influence-a-guide-to-nemawashi-in-engineering/

AI Engineering

AI agents are starting to do real work

Ethan Mollick argues that AIs have quietly crossed a line. OpenAI recently tested models on complex, expert-designed tasks that usually take humans four to seven hours. Humans still performed better, but the gap is shrinking fast. Most AI mistakes were about formatting or following instructions, not reasoning.

The standout example is Claude 4.5 replicating academic research on its own. Work that would have taken hours was done in minutes, hinting at how whole fields could change when repetitive but valuable tasks get automated.

It’s a reminder that the real shift isn’t just about replacing jobs. It’s about rethinking how we work with AI so we don’t drown in a sea of AI-generated busywork.

👉 Read Ethan’s full piece