<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Context Engineering vs. Memory Engineering vs. Harness Engineering in Software - General</title>
    <link>https://community.hpe.com/t5/software-general/context-engineering-vs-memory-engineering-vs-harness-engineering/m-p/7266438#M1505</link>
    <description>&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic tested Claude against a simple prompt: "build a retro game maker." A solo run cost $9, took 20 minutes, and produced a game where the core feature was broken. A harnessed run, with a planner, generator, and evaluator working together, cost about $200, ran for 6 hours, and produced a working game with an AI-assisted sprite editor.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Same model. Same prompt. Roughly 22x the cost. Completely different result.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The difference was not the model. It was the system around the model.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;One useful way to think about that system is as three layers:&lt;/FONT&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Context Engineering&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;- curating what the model sees at each step&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Memory Engineering&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;- deciding what the agent retains across time&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Harness Engineering&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;- designing the orchestration, evaluation, and infrastructure the agent runs inside&lt;/FONT&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;I am using these as practical labels, not as settled academic categories. In real systems they overlap, but separating them makes agent failures much easier to diagnose.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Many teams still treat them as a single problem. The teams shipping serious agents usually cannot afford to.&lt;/FONT&gt;&lt;/P&gt;&lt;HR /&gt;&lt;P&gt;&lt;FONT size="4"&gt;&lt;STRONG&gt;&lt;FONT face="helvetica"&gt;1. 
Context Engineering: Curating the Model's Attention Budget&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The term context engineering picked up speed in 2025, but Anthropic's version is still the clearest operational definition I have seen: context engineering is the iterative curation of what goes into the model's limited context window from a constantly evolving universe of possible information. Unlike prompt engineering, which usually focuses on static instructions, context engineering is dynamic. The curation happens every time you decide what to pass to the model.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The core constraint is simple. Chroma's work on context rot, and Anthropic's follow-on discussion of it, make the same point: more tokens do not automatically mean better reasoning. Context is a finite attention budget.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;In practice, most of the mechanics reduce to four moves:&lt;/FONT&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Write&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;- persist notes, plans, and intermediate results outside the active window&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Select&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;- pull the right pieces in just in time; Claude Code loads&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;CLAUDE.md&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;up front but uses&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;grep&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;glob&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;to navigate a codebase on demand&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Compress&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;- summarize and prune; Anthropic specifically calls out tool-result clearing as one of the safest forms of 
compaction&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Isolate&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;- split work across subagents with clean windows; Anthropic's multi-agent research system outperformed single-agent Claude Opus 4 by 90.2% on Anthropic's internal research evaluation&lt;/FONT&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Drew Breunig's failure-mode map explains why this matters. Long contexts fail through&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;context poisoning&lt;/STRONG&gt;,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;context distraction&lt;/STRONG&gt;,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;context confusion&lt;/STRONG&gt;, and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;context clash&lt;/STRONG&gt;. His examples are concrete: agent traces getting stuck beyond ~100K tokens, a quantized Llama 3.1 8B failing with 46 tools but succeeding with 19, and sharded prompts causing an average 39% performance drop in one study.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic's guiding principle is the right one:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;"find the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome."&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;HR /&gt;&lt;P&gt;&lt;FONT size="4"&gt;&lt;STRONG&gt;&lt;FONT face="helvetica"&gt;2. Memory Engineering: Persistence Beyond the Context Window&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Memory becomes the problem the moment a task spans sessions.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic put it well: "the core challenge of long-running agents is that they must work in discrete sessions, and each new session begins with no memory of what came before. 
Imagine a software project staffed by engineers working in shifts, where each new engineer arrives with no memory of what happened on the previous shift."&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Memory is the least settled of these three categories, but it matters as soon as an agent has to preserve state across resets, handoffs, or long-running tasks.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;A useful lens borrows from cognitive science:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;episodic&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;memory for prior experiences and examples,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;semantic&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;memory for facts and relationships, and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;procedural&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;memory for learned rules and instructions.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;One of the simplest patterns that keeps showing up is structured note-taking. Anthropic's Pokemon example is the clearest demonstration I have seen: without any special prompting about memory structure, the agent maintained tallies across thousands of steps, built maps, tracked achievements, and kept strategic combat notes. After context resets, it read its own notes and kept going.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic turned the same idea into a progress-file pattern for long-running coding: each fresh session reads a&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;claude-progress.txt&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;file and recent git history, makes incremental progress, then leaves a clean commit and a structured handoff for the next session. Dead simple. Very effective.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The harder problem is retrieval, not storage. Saving state is easy. 
Deciding what to surface, when to surface it, and how visible that selection should be to the user is much harder. Hidden memory injection can feel helpful one moment and invasive the next.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The cleanest distinction is this: short-term memory often lives inside the agent's active state, while long-term memory usually lives outside the context window and gets pulled back in when needed.&lt;/FONT&gt;&lt;/P&gt;&lt;HR /&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT face="helvetica" size="4"&gt;3. Harness Engineering: The System the Agent Runs Inside&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Harness design has moved especially quickly over the past year. Anthropic has published a useful sequence of posts on long-running harnesses and managed agents, while Cognition made the counterpoint case for defaulting to simpler, single-threaded agents unless you truly need more architecture.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;A harness is the orchestration loop that calls the model, routes tool calls, manages sessions, and governs how the agent operates. If context is the model's RAM and memory is its disk, the harness is the operating system.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;What harness engineering encompasses:&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;Orchestration and Session Management&lt;/FONT&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic's long-running coding work showed that naive agents fail in predictable ways. First, they try to one-shot the whole app and leave half-finished work behind. 
Later, they swing the other way and declare victory too early.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The first fix was a two-agent harness: an initializer that sets up the environment, feature list, scripts, and progress file, followed by coding agents that work one feature at a time and leave clean artifacts behind.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The next step was a three-agent architecture: planner, generator, evaluator. The planner expands a one-line prompt into a fuller spec. The generator builds. The evaluator tests the running application with browser automation. That planner-generator-evaluator loop produced far better results than a solo run on Anthropic's retro game maker example.&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;The Bitter Lesson of Harness Design&lt;/FONT&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic's broader lesson is the one most teams miss:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;harness components encode assumptions about what the model cannot do, and those assumptions go stale fast.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;They originally needed sprint decomposition and context resets because Sonnet 4.5 showed "context anxiety" as it approached the end of a long run. When they moved to Opus 4.5, some of that scaffolding stopped helping. With Opus 4.6, they removed the sprint construct entirely and let the model work coherently for more than two hours in a continuous build.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;That is the right default posture toward harnesses: treat them as perishable, not permanent.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;This is also what makes Anthropic's April 2026 Managed Agents piece important. 
It treats the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;session&lt;/STRONG&gt;,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;harness&lt;/STRONG&gt;, and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;sandbox&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;as separable components so any implementation can be swapped without disturbing the others. The brain is decoupled from the hands.&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;Tool Design&lt;/FONT&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic said something in its SWE-bench work that still feels underappreciated:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;"We actually spent more time optimizing our tools than the overall prompt."&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Their later context engineering post makes the same point more bluntly: bloated tool sets create ambiguous decision points. If a human engineer cannot clearly say which tool should be used, the model will not do better.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The practical implication is task-specific tool curation. Do not hand an agent every tool you own. Give it the smallest viable set for the job.&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;Evaluation as a Harness Component&lt;/FONT&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic's March 2026 harness post surfaced a critical finding about self-evaluation:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;agents are poor judges of their own work.&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;When asked to evaluate work they produced, they tend to praise it, even when the result is obviously mediocre to a human reviewer.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The fix is structural. Separate the agent doing the work from the agent judging it. 
Anthropic's line here is worth remembering:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;"Tuning a standalone evaluator to be skeptical turns out to be far more tractable than making a generator critical of its own work."&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;In their setup, the evaluator used Playwright MCP to click through a running application like a real user. That let it catch bugs the generator missed entirely.&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;Production Reliability&lt;/FONT&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic's Managed Agents architecture pushes the same idea into production infrastructure. Containers are cattle, not pets. The session log lives outside the harness, so crashes do not erase state. Credentials are kept out of the sandbox where generated code runs.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;That architectural decoupling had real operational payoff: Anthropic reported roughly a 60% drop in p50 time-to-first-token and more than a 90% drop in p95.&lt;/FONT&gt;&lt;/P&gt;&lt;HR /&gt;&lt;P&gt;&lt;FONT size="4"&gt;&lt;STRONG&gt;&lt;FONT face="helvetica"&gt;How the Three Layers Interact&lt;/FONT&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;Context Engineering&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;→ Core question: "What should the model see right now?" → Scope: Single inference step → Analogy: Working memory / RAM → Failure modes: Poisoning, distraction, confusion, clash → Changes: Every agent step&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;Memory Engineering&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;→ Core question: "What should persist across sessions?" 
→ Scope: Across sessions and time → Analogy: Long-term memory / disk → Failure modes: Irrelevant retrieval, stale memories, privacy problems → Changes: Every session or periodically&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;Harness Engineering&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;→ Core question: "How should the agent operate?" → Scope: Entire system lifecycle → Analogy: Operating system / kernel → Failure modes: One-shotting, premature completion, cascading errors, poor self-evaluation → Changes: Every model generation and major deploy cycle&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;These layers are not independent. Memory feeds context, and the harness decides when and how that happens. Matt Webb's phrase&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;"context plumbing"&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is a good metaphor here: context is dynamic, distributed, and time-sensitive. A big part of agent engineering is moving the right context to the right place at the right moment.&lt;/FONT&gt;&lt;/P&gt;&lt;HR /&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT face="helvetica" size="4"&gt;The Takeaway&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;These layers shift with every model release. Anthropic's line about harnesses is the right one:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;"the space of interesting harness combinations doesn't shrink as models improve. Instead, it moves."&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Sonnet 4.5 needed more scaffolding than Opus 4.6. Opus 4.6, in turn, makes new kinds of long-running builds worth trying.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Three things to do now:&lt;/FONT&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Audit your context.&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Find what is wasting attention budget. 
Bigger context windows do not remove context-management problems. They mostly let you postpone them.&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Design memory as infrastructure.&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Start simple with durable notes, progress files, and explicit handoff artifacts.&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Stress-test your harness on every model release.&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Components that were load-bearing six months ago may already be dead weight.&lt;/FONT&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;For many teams, the harder work is no longer the model alone. It is the system around the model.&lt;/FONT&gt;&lt;/P&gt;&lt;HR /&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;&lt;FONT size="5"&gt;&lt;STRONG&gt;&lt;FONT face="helvetica" size="4"&gt;Readings&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Anthropic: "Building effective agents" - anthropic.com/engineering/building-effective-agents&lt;/LI&gt;&lt;LI&gt;Anthropic: "How we built our multi-agent research system" - anthropic.com/engineering/multi-agent-research-system&lt;/LI&gt;&lt;LI&gt;Anthropic: "Effective context engineering for AI agents" - anthropic.com/engineering/effective-context-engineering-for-ai-agents&lt;/LI&gt;&lt;LI&gt;Anthropic: "Effective harnesses for long-running agents" - anthropic.com/engineering/effective-harnesses-for-long-running-agents&lt;/LI&gt;&lt;LI&gt;Anthropic: "Harness design for long-running application development" - anthropic.com/engineering/harness-design-long-running-apps&lt;/LI&gt;&lt;LI&gt;Anthropic: "Scaling Managed Agents: Decoupling the brain from the hands" - anthropic.com/engineering/managed-agents&lt;/LI&gt;&lt;LI&gt;Chroma: "Research on context rot" - research.trychroma.com/context-rot&lt;/LI&gt;&lt;LI&gt;Drew Breunig: "How Long Contexts Fail" - 
dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html&lt;/LI&gt;&lt;LI&gt;Cognition: "Don't Build Multi-Agents" - cognition.ai/blog/dont-build-multi-agents&lt;/LI&gt;&lt;LI&gt;Matt Webb: "Context plumbing" - interconnected.org/home/2025/11/28/plumbing&lt;/LI&gt;&lt;LI&gt;LangChain: "Context overview" - docs.langchain.com/oss/python/concepts/context&lt;/LI&gt;&lt;LI&gt;LangChain: "Memory overview" - docs.langchain.com/oss/python/concepts/memory&lt;/LI&gt;&lt;LI&gt;LangChain: "The Anatomy of an Agent Harness" - langchain.com/blog/the-anatomy-of-an-agent-harness&lt;/LI&gt;&lt;LI&gt;LangChain: "Your harness, your memory" - langchain.com/blog/your-harness-your-memory&lt;/LI&gt;&lt;/UL&gt;</description>
    <pubDate>Fri, 01 May 2026 21:48:13 GMT</pubDate>
    <dc:creator>Sarwesh</dc:creator>
    <dc:date>2026-05-01T21:48:13Z</dc:date>
    <item>
      <title>Context Engineering vs. Memory Engineering vs. Harness Engineering</title>
      <link>https://community.hpe.com/t5/software-general/context-engineering-vs-memory-engineering-vs-harness-engineering/m-p/7266438#M1505</link>
      <description>&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic tested Claude against a simple prompt: "build a retro game maker." A solo run cost $9, took 20 minutes, and produced a game where the core feature was broken. A harnessed run, with a planner, generator, and evaluator working together, cost about $200, ran for 6 hours, and produced a working game with an AI-assisted sprite editor.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Same model. Same prompt. Roughly 22x the cost. Completely different result.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The difference was not the model. It was the system around the model.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;One useful way to think about that system is as three layers:&lt;/FONT&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Context Engineering&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;- curating what the model sees at each step&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Memory Engineering&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;- deciding what the agent retains across time&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Harness Engineering&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;- designing the orchestration, evaluation, and infrastructure the agent runs inside&lt;/FONT&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;I am using these as practical labels, not as settled academic categories. In real systems they overlap, but separating them makes agent failures much easier to diagnose.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Many teams still treat them as a single problem. The teams shipping serious agents usually cannot afford to.&lt;/FONT&gt;&lt;/P&gt;&lt;HR /&gt;&lt;P&gt;&lt;FONT size="4"&gt;&lt;STRONG&gt;&lt;FONT face="helvetica"&gt;1. 
Context Engineering: Curating the Model's Attention Budget&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The term context engineering picked up speed in 2025, but Anthropic's version is still the clearest operational definition I have seen: context engineering is the iterative curation of what goes into the model's limited context window from a constantly evolving universe of possible information. Unlike prompt engineering, which usually focuses on static instructions, context engineering is dynamic. The curation happens every time you decide what to pass to the model.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The core constraint is simple. Chroma's work on context rot, and Anthropic's follow-on discussion of it, make the same point: more tokens do not automatically mean better reasoning. Context is a finite attention budget.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;In practice, most of the mechanics reduce to four moves:&lt;/FONT&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Write&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;- persist notes, plans, and intermediate results outside the active window&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Select&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;- pull the right pieces in just in time; Claude Code loads&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;CLAUDE.md&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;up front but uses&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;grep&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;glob&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;to navigate a codebase on demand&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Compress&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;- summarize and prune; Anthropic specifically calls out tool-result clearing as one of the safest forms of 
compaction&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Isolate&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;- split work across subagents with clean windows; Anthropic's multi-agent research system outperformed single-agent Claude Opus 4 by 90.2% on Anthropic's internal research evaluation&lt;/FONT&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Drew Breunig's failure-mode map explains why this matters. Long contexts fail through&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;context poisoning&lt;/STRONG&gt;,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;context distraction&lt;/STRONG&gt;,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;context confusion&lt;/STRONG&gt;, and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;context clash&lt;/STRONG&gt;. His examples are concrete: agent traces getting stuck beyond ~100K tokens, a quantized Llama 3.1 8B failing with 46 tools but succeeding with 19, and sharded prompts causing an average 39% performance drop in one study.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic's guiding principle is the right one:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;"find the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome."&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;HR /&gt;&lt;P&gt;&lt;FONT size="4"&gt;&lt;STRONG&gt;&lt;FONT face="helvetica"&gt;2. Memory Engineering: Persistence Beyond the Context Window&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Memory becomes the problem the moment a task spans sessions.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic put it well: "the core challenge of long-running agents is that they must work in discrete sessions, and each new session begins with no memory of what came before. 
Imagine a software project staffed by engineers working in shifts, where each new engineer arrives with no memory of what happened on the previous shift."&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Memory is the least settled of these three categories, but it matters as soon as an agent has to preserve state across resets, handoffs, or long-running tasks.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;A useful lens borrows from cognitive science:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;episodic&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;memory for prior experiences and examples,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;semantic&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;memory for facts and relationships, and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;procedural&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;memory for learned rules and instructions.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;One of the simplest patterns that keeps showing up is structured note-taking. Anthropic's Pokemon example is the clearest demonstration I have seen: without any special prompting about memory structure, the agent maintained tallies across thousands of steps, built maps, tracked achievements, and kept strategic combat notes. After context resets, it read its own notes and kept going.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic turned the same idea into a progress-file pattern for long-running coding: each fresh session reads a&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;claude-progress.txt&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;file and recent git history, makes incremental progress, then leaves a clean commit and a structured handoff for the next session. Dead simple. Very effective.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The harder problem is retrieval, not storage. Saving state is easy. 
Deciding what to surface, when to surface it, and how visible that selection should be to the user is much harder. Hidden memory injection can feel helpful one moment and invasive the next.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The cleanest distinction is this: short-term memory often lives inside the agent's active state, while long-term memory usually lives outside the context window and gets pulled back in when needed.&lt;/FONT&gt;&lt;/P&gt;&lt;HR /&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT face="helvetica" size="4"&gt;3. Harness Engineering: The System the Agent Runs Inside&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Harness design has moved especially quickly over the past year. Anthropic has published a useful sequence of posts on long-running harnesses and managed agents, while Cognition made the counterpoint case for defaulting to simpler, single-threaded agents unless you truly need more architecture.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;A harness is the orchestration loop that calls the model, routes tool calls, manages sessions, and governs how the agent operates. If context is the model's RAM and memory is its disk, the harness is the operating system.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;What harness engineering encompasses:&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;Orchestration and Session Management&lt;/FONT&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic's long-running coding work showed that naive agents fail in predictable ways. First, they try to one-shot the whole app and leave half-finished work behind. 
Later, they swing the other way and declare victory too early.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The first fix was a two-agent harness: an initializer that sets up the environment, feature list, scripts, and progress file, followed by coding agents that work one feature at a time and leave clean artifacts behind.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The next step was a three-agent architecture: planner, generator, evaluator. The planner expands a one-line prompt into a fuller spec. The generator builds. The evaluator tests the running application with browser automation. That planner-generator-evaluator loop produced far better results than a solo run on Anthropic's retro game maker example.&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;The Bitter Lesson of Harness Design&lt;/FONT&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic's broader lesson is the one most teams miss:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;harness components encode assumptions about what the model cannot do, and those assumptions go stale fast.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;They originally needed sprint decomposition and context resets because Sonnet 4.5 showed "context anxiety" as it approached the end of a long run. When they moved to Opus 4.5, some of that scaffolding stopped helping. With Opus 4.6, they removed the sprint construct entirely and let the model work coherently for more than two hours in a continuous build.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;That is the right default posture toward harnesses: treat them as perishable, not permanent.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;This is also what makes Anthropic's April 2026 Managed Agents piece important. 
It treats the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;session&lt;/STRONG&gt;,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;harness&lt;/STRONG&gt;, and&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;sandbox&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;as separable components so any implementation can be swapped without disturbing the others. The brain is decoupled from the hands.&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;Tool Design&lt;/FONT&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic said something in its SWE-bench work that still feels underappreciated:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;"We actually spent more time optimizing our tools than the overall prompt."&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Their later context engineering post makes the same point more bluntly: bloated tool sets create ambiguous decision points. If a human engineer cannot clearly say which tool should be used, the model will not do better.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The practical implication is task-specific tool curation. Do not hand an agent every tool you own. Give it the smallest viable set for the job.&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;Evaluation as a Harness Component&lt;/FONT&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic's March 2026 harness post surfaced a critical finding about self-evaluation:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;agents are poor judges of their own work.&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;When asked to evaluate work they produced, they tend to praise it, even when the result is obviously mediocre to a human reviewer.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;The fix is structural. Separate the agent doing the work from the agent judging it. 
Anthropic's line here is worth remembering:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;"Tuning a standalone evaluator to be skeptical turns out to be far more tractable than making a generator critical of its own work."&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;In their setup, the evaluator used Playwright MCP to click through a running application like a real user. That let it catch bugs the generator missed entirely.&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;Production Reliability&lt;/FONT&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Anthropic's Managed Agents architecture pushes the same idea into production infrastructure. Containers are cattle, not pets. The session log lives outside the harness, so crashes do not erase state. Credentials are kept out of the sandbox where generated code runs.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;That architectural decoupling had real operational payoff: Anthropic reported roughly a 60% drop in p50 time-to-first-token and more than a 90% drop in p95.&lt;/FONT&gt;&lt;/P&gt;&lt;HR /&gt;&lt;P&gt;&lt;FONT size="4"&gt;&lt;STRONG&gt;&lt;FONT face="helvetica"&gt;How the Three Layers Interact&lt;/FONT&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;TABLE border="1"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TH&gt;Layer&lt;/TH&gt;&lt;TH&gt;Core question&lt;/TH&gt;&lt;TH&gt;Scope&lt;/TH&gt;&lt;TH&gt;Analogy&lt;/TH&gt;&lt;TH&gt;Failure modes&lt;/TH&gt;&lt;TH&gt;Changes&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;Context Engineering&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;"What should the model see right now?"&lt;/TD&gt;&lt;TD&gt;Single inference step&lt;/TD&gt;&lt;TD&gt;Working memory / RAM&lt;/TD&gt;&lt;TD&gt;Poisoning, distraction, confusion, clash&lt;/TD&gt;&lt;TD&gt;Every agent step&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;Memory Engineering&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;"What should persist across sessions?"&lt;/TD&gt;&lt;TD&gt;Across sessions and time&lt;/TD&gt;&lt;TD&gt;Long-term memory / disk&lt;/TD&gt;&lt;TD&gt;Irrelevant retrieval, stale memories, privacy problems&lt;/TD&gt;&lt;TD&gt;Every session or periodically&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;Harness Engineering&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;"How should the agent operate?"&lt;/TD&gt;&lt;TD&gt;Entire system lifecycle&lt;/TD&gt;&lt;TD&gt;Operating system / kernel&lt;/TD&gt;&lt;TD&gt;One-shotting, premature completion, cascading errors, poor self-evaluation&lt;/TD&gt;&lt;TD&gt;Every model generation and major deploy cycle&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;These layers are not independent. Memory feeds context, and the harness decides when and how that happens. Matt Webb's phrase&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;"context plumbing"&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is a good metaphor here: context is dynamic, distributed, and time-sensitive. A big part of agent engineering is moving the right context to the right place at the right moment.&lt;/FONT&gt;&lt;/P&gt;&lt;HR /&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT face="helvetica" size="4"&gt;The Takeaway&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;These layers shift with every model release. Anthropic's line about harnesses is the right one:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;"the space of interesting harness combinations doesn't shrink as models improve. Instead, it moves."&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Sonnet 4.5 needed more scaffolding than Opus 4.6. Opus 4.6, in turn, makes new kinds of long-running builds worth trying.&lt;/FONT&gt;&lt;/P&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;Three things to do now:&lt;/FONT&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Audit your context.&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Find what is wasting attention budget. 
Bigger context windows do not remove context-management problems. They mostly let you postpone them.&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Design memory as infrastructure.&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Start simple with durable notes, progress files, and explicit handoff artifacts.&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT face="helvetica"&gt;&lt;STRONG&gt;Stress-test your harness on every model release.&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;Components that were load-bearing six months ago may already be dead weight.&lt;/FONT&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;For many teams, the harder work is no longer the model alone. It is the system around the model.&lt;/FONT&gt;&lt;/P&gt;&lt;HR /&gt;&lt;P class=""&gt;&lt;FONT face="helvetica"&gt;&lt;FONT size="5"&gt;&lt;STRONG&gt;&lt;FONT face="helvetica" size="4"&gt;Readings&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Anthropic: "Building effective agents" - anthropic.com/engineering/building-effective-agents&lt;/LI&gt;&lt;LI&gt;Anthropic: "How we built our multi-agent research system" - anthropic.com/engineering/multi-agent-research-system&lt;/LI&gt;&lt;LI&gt;Anthropic: "Effective context engineering for AI agents" - anthropic.com/engineering/effective-context-engineering-for-ai-agents&lt;/LI&gt;&lt;LI&gt;Anthropic: "Effective harnesses for long-running agents" - anthropic.com/engineering/effective-harnesses-for-long-running-agents&lt;/LI&gt;&lt;LI&gt;Anthropic: "Harness design for long-running application development" - anthropic.com/engineering/harness-design-long-running-apps&lt;/LI&gt;&lt;LI&gt;Anthropic: "Scaling Managed Agents: Decoupling the brain from the hands" - anthropic.com/engineering/managed-agents&lt;/LI&gt;&lt;LI&gt;Chroma: "Research on context rot" - research.trychroma.com/context-rot&lt;/LI&gt;&lt;LI&gt;Drew Breunig: "How Long Contexts Fail" - 
dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html&lt;/LI&gt;&lt;LI&gt;Cognition: "Don't Build Multi-Agents" - cognition.ai/blog/dont-build-multi-agents&lt;/LI&gt;&lt;LI&gt;Matt Webb: "Context plumbing" - interconnected.org/home/2025/11/28/plumbing&lt;/LI&gt;&lt;LI&gt;LangChain: "Context overview" - docs.langchain.com/oss/python/concepts/context&lt;/LI&gt;&lt;LI&gt;LangChain: "Memory overview" - docs.langchain.com/oss/python/concepts/memory&lt;/LI&gt;&lt;LI&gt;LangChain: "The Anatomy of an Agent Harness" - langchain.com/blog/the-anatomy-of-an-agent-harness&lt;/LI&gt;&lt;LI&gt;LangChain: "Your harness, your memory" - langchain.com/blog/your-harness-your-memory&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Fri, 01 May 2026 21:48:13 GMT</pubDate>
      <guid>https://community.hpe.com/t5/software-general/context-engineering-vs-memory-engineering-vs-harness-engineering/m-p/7266438#M1505</guid>
      <dc:creator>Sarwesh</dc:creator>
      <dc:date>2026-05-01T21:48:13Z</dc:date>
    </item>
  </channel>
</rss>

