Guide | Clinical AI | Intermediate | New | Ramez's Pick

Context Engineering for Physicians: The Skill After Prompting

Ramez Kouzy | 8 min read

What you'll learn

  • Why prompt engineering is only the entry-level skill
  • How data, task, tool, and normative context shape AI output
  • Why longer prompts and larger context windows are not the same as better context
  • How to evaluate clinician-facing AI workflows before adoption
  • What physicians should ask vendors about context, uncertainty, and action boundaries

Prompt engineering was the first useful AI skill for clinicians. It taught us that asking clearly matters. Tell the model who you are, what you want, what format you need, and what constraints matter. That still works.

But it is no longer enough.

The next skill is context engineering. Not "what words should I type into the chatbot?" but "what should the system know before it answers, and what should it not be allowed to forget?"

That distinction matters in medicine because clinical work is not a clean question-answer exercise. It is a context problem. The same symptom means different things depending on the patient, the setting, the available data, the local practice pattern, the goal of the encounter, and the risk of being wrong.

The Shift

Prompt engineering is asking better questions. Context engineering is designing the information environment in which the AI does its work.


Why This Matters Now

Clinicians are already using AI in workflows that depend on context:

  • ambient scribes that generate notes from visits
  • chatbots that summarize charts
  • tools that draft patient messages
  • literature assistants that read uploaded papers
  • clinical evidence systems that retrieve and synthesize sources
  • coding agents that build dashboards, calculators, and workflow tools

In each case, the question is not just whether the model is "smart." The question is what context the model had when it produced the answer.

Did the scribe know this was a follow-up visit, not a new consult? Did the chart summarizer know which problem was active and which was historical? Did the patient message tool know the patient's reading level, recent scan result, and prior conversation with the clinician? Did the evidence tool retrieve the right guideline for the right disease stage?

A model with the wrong context can sound excellent and still be clinically useless. Worse, it can sound excellent and be wrong in exactly the way a busy clinician may not notice.

This is why the Nature Medicine piece "Physicians as Context Engineers in the Era of Generative AI," already listed on BeamPath, is important. The core argument is not that physicians need to become software engineers. It is that physicians cannot be passive end users of AI systems. We have to help define the conditions under which these systems operate.

The paper describes four kinds of context physicians need to shape:

  • data context
  • task context
  • tool context
  • normative context

That is a useful frame. I would make it even more practical: before you trust an AI output, ask what it knew, what job it thought it was doing, what tools it used, and what values it was optimizing.
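
For readers who think in code, here is a minimal Python sketch of those four questions as a record that travels with an AI output. The class name, field names, and function are my own illustration, not something taken from the paper or from any vendor's API. The one design assumption worth noting: a tool list of None means "nobody recorded it," while an empty list means "no tools were used."

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContextRecord:
    """Hypothetical record of the four contexts behind one AI output."""
    data: List[str] = field(default_factory=list)   # what it knew
    task: str = ""                                   # what job it thought it was doing
    tools: Optional[List[str]] = None                # None = not recorded, [] = none used
    norms: List[str] = field(default_factory=list)  # what values it was optimizing


def unanswered_questions(record: ContextRecord) -> List[str]:
    """Return the review questions this record cannot answer."""
    questions = []
    if not record.data:
        questions.append("What did it know?")
    if not record.task.strip():
        questions.append("What job did it think it was doing?")
    if record.tools is None:
        questions.append("What tools did it use?")
    if not record.norms:
        questions.append("What values was it optimizing?")
    return questions
```

If the returned list is not empty, the output arrived without enough context to judge it, no matter how fluent it reads.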


Prompting Is a Moment. Context Is a System.

A prompt is a moment. You type something, the model responds, and the interaction ends.

Context is a system. It includes the patient's information, the source documents, the institution's rules, the model instructions, the available tools, the retrieval layer, the output format, the human review process, and the escalation pathway.

This is why the best AI workflows are starting to look less like chatbot conversations and more like configured workspaces.

NotebookLM works because the sources are explicit. Notion AI works better when the project context already lives in the workspace. OpenEvidence works because it is built around retrieval and citation. Coding agents work better when they have a repo, instructions, tests, and a clear definition of done.

The same principle applies in clinical AI.

If you ask a general model, "What should I do for this patient with chest pain?" you have asked for trouble. If you give an approved system the triage note, vital signs, ECG, troponin trend, prior cardiac history, local pathway, and the specific task "identify missing information before clinician evaluation," you have a much better clinical support tool.

Same model class. Different context. Completely different risk profile.


The Four Contexts Physicians Should Care About

1. Data Context

Data context is what the AI knows about the patient, the source material, and the clinical setting.

This is the obvious one, but it is also where many workflows fail. A model may have the diagnosis but not the line of therapy. It may have the medication list but not adherence. It may have the note but not the scan. It may have a guideline but not the local formulary. It may have a trial result but not the patient's performance status.

In medicine, missing context is not a minor inconvenience. It can change the answer.

This is why "just paste the chart" is not a strategy. More context is not automatically better context. Long context windows make it tempting to dump everything into the model, but long does not mean organized. A thousand pages of chart noise can bury the five facts that matter.

The physician's job is to know which facts are decision-relevant.

2. Task Context

Task context is what job the AI thinks it is doing.

"Review this patient" is not a task. It is a wish.

Better tasks look like this:

  • identify missing staging information before tumor board
  • summarize the last three oncology notes for a covering physician
  • draft patient-friendly scan results without interpreting prognosis
  • compare this treatment plan against the protocol checklist
  • extract adverse events from these notes for a retrospective study

Each task has a different evidence standard, output format, and failure mode. A patient message should optimize for clarity and tone. A protocol checklist should optimize for completeness and conservatism. A literature summary should optimize for faithful source extraction, not persuasive prose.

If the task is vague, the AI fills in the gaps. That is exactly what you do not want in high-stakes work.

3. Tool Context

Tool context is what the AI is allowed to use.

Can it search the web? Can it retrieve from a local guideline library? Can it access the EHR? Can it run code? Can it cite papers? Can it call a calculator? Can it send a message? Can it place an order?

These are not technical details. They are clinical safety boundaries.

An AI that can draft a message is different from an AI that can send one. An AI that can recommend a lab is different from an AI that can order it. An AI that can summarize a chart is different from an AI that can modify it.

The more tools an AI can use, the more important context engineering becomes. You are no longer managing an answer. You are managing an action surface.
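
One way to make that action surface explicit is a permission ceiling per workflow. The sketch below is illustrative only: the three tiers and the tool names are hypothetical, and a real deployment would enforce this in the integration layer rather than in a lookup table. It assumes that unknown tools are denied by default.

```python
from enum import IntEnum


class Boundary(IntEnum):
    """Hypothetical risk tiers, ordered from least to most consequential."""
    READ = 1   # summarize a chart, retrieve a guideline
    DRAFT = 2  # draft a message or suggest a lab for human review
    ACT = 3    # send the message, place the order, modify the chart


# Hypothetical tool names mapped to the tier each one requires.
TOOL_TIERS = {
    "summarize_chart": Boundary.READ,
    "draft_patient_message": Boundary.DRAFT,
    "send_patient_message": Boundary.ACT,
    "place_lab_order": Boundary.ACT,
}


def is_allowed(tool: str, ceiling: Boundary) -> bool:
    """Allow a tool only if it is known and within the workflow's ceiling."""
    tier = TOOL_TIERS.get(tool)
    return tier is not None and tier <= ceiling
```

With a ceiling of Boundary.DRAFT, the system can draft a patient message but cannot send it, which is exactly the distinction described above.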

4. Normative Context

Normative context is the hardest one and probably the most important.

What should the AI optimize for? Speed? Sensitivity? Specificity? Cost? Patient autonomy? Institutional policy? Equity? Minimizing missed diagnoses? Avoiding unnecessary testing?

Clinical work is full of value judgments. We pretend some of them are purely technical because it is more comfortable. They are not.

A triage tool tuned to minimize missed emergencies may increase downstream testing. A documentation tool optimized for billing may create bloated notes. A patient education tool optimized for reassurance may understate uncertainty. A utilization tool optimized for cost may conflict with clinician judgment.

Context engineering forces these choices into the open.


A Practical Checklist Before You Use AI Clinically

Here is the simple version I would use before adopting any clinician-facing AI workflow.

Define the task

Write the task in one sentence. If you cannot define it clearly, the AI should not be doing it.

List the required context

Name the minimum information needed for a competent answer: patient facts, guideline, prior treatment, local policy, audience, and output format.

Remove irrelevant context

Do not dump the entire chart unless the workflow truly needs it. Irrelevant context creates noise and can hide the signal.

Set the boundary

Decide whether the AI is drafting, summarizing, checking, recommending, or acting. Those are different risk categories.

Define human review

Specify who reviews the output, what they must verify, and when the task escalates to a human-only workflow.

This is not bureaucracy. It is the difference between an AI assistant and an unlicensed workflow hiding inside a polished interface.
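
The five steps can also be written down as a small specification that someone fills in before a workflow goes live. This is a sketch under my own assumptions: the field names are hypothetical, and I assume the workflow must state explicitly what it leaves out, so an empty exclusion list counts as a gap.

```python
from dataclasses import dataclass, field
from typing import List

# The five risk categories named in "Set the boundary".
ALLOWED_BOUNDARIES = {"drafting", "summarizing", "checking", "recommending", "acting"}


@dataclass
class WorkflowSpec:
    """Hypothetical pre-adoption spec for one clinician-facing AI workflow."""
    task: str = ""
    required_context: List[str] = field(default_factory=list)
    excluded_context: List[str] = field(default_factory=list)
    boundary: str = ""
    reviewer: str = ""
    must_verify: List[str] = field(default_factory=list)
    escalation_rule: str = ""


def checklist_gaps(spec: WorkflowSpec) -> List[str]:
    """Return the checklist steps that the spec leaves unanswered."""
    gaps = []
    if not spec.task.strip():
        gaps.append("Define the task")
    if not spec.required_context:
        gaps.append("List the required context")
    if not spec.excluded_context:
        gaps.append("Remove irrelevant context")
    if spec.boundary not in ALLOWED_BOUNDARIES:
        gaps.append("Set the boundary")
    if not (spec.reviewer.strip() and spec.must_verify and spec.escalation_rule.strip()):
        gaps.append("Define human review")
    return gaps
```

An empty result does not prove the workflow is safe. It only shows that someone has answered the five questions in writing.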


A Clinical Example

Imagine a patient sends a portal message:

"I started the new medication and feel terrible. Should I stop it?"

A generic AI response might say something broadly safe: contact your doctor, seek urgent care if severe, do not stop medication without medical advice. Fine. Not wrong. Also not very useful.

A context-engineered system would know:

  • the medication
  • the indication
  • start date
  • dose
  • relevant labs
  • allergy history
  • known serious adverse effects
  • the clinic's escalation policy
  • whether this is a high-risk drug
  • whether the patient has already called about this

It would not simply answer the patient. It might classify urgency, draft a response for clinician review, surface missing information, and flag whether the symptom matches a known toxicity pattern.
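
A rough Python sketch of that behavior follows. Everything in it is hypothetical: the field names mirror the list above, and the expedite rule (a high-risk drug, or not knowing whether the drug is high-risk) is an illustrative placeholder, not clinical guidance. The one fixed design choice is that the function never replies to the patient. It only prepares the message for clinician review.

```python
from typing import Dict, List

# Hypothetical field names mirroring the context list above.
REQUIRED_FIELDS: List[str] = [
    "medication",
    "indication",
    "start_date",
    "dose",
    "relevant_labs",
    "allergy_history",
    "known_serious_adverse_effects",
    "escalation_policy",
    "high_risk_drug",
    "prior_contact",
]


def prepare_for_review(message: str, context: Dict[str, object]) -> Dict[str, object]:
    """Package a portal message for clinician review and surface what is missing."""
    # A field counts as missing only when it is absent or None,
    # so an explicit False (for example, not a high-risk drug) is kept.
    missing = [name for name in REQUIRED_FIELDS if context.get(name) is None]
    # Illustrative rule only: expedite when the drug is high-risk or that is unknown.
    expedite = bool(context.get("high_risk_drug")) or "high_risk_drug" in missing
    return {
        "message": message,
        "missing_information": missing,
        "needs_expedited_review": expedite,
        "action": "route_to_clinician",  # the sketch never answers the patient directly
    }
```

Surfacing the missing fields is the useful part: the reviewing clinician sees what the system did not know before reading anything it drafted.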

That is the point. Context engineering is not about making the chatbot sound smarter. It is about changing the workflow from generic advice to clinically situated assistance.


The Big Mistake: Treating Context as a Bigger Prompt

The most common misunderstanding is thinking context engineering means "write a longer prompt."

It does not.

A longer prompt can help, but it can also make the output worse. The goal is not maximum context. The goal is selected context.

Good context is:

  • relevant to the task
  • current
  • source-grounded
  • structured
  • bounded
  • easy for a human to inspect

Bad context is:

  • huge
  • stale
  • mixed with irrelevant notes
  • missing provenance
  • full of copied-forward errors
  • impossible to audit

Every physician already understands this. A good sign-out is not the whole chart. It is the right facts, in the right order, for the next clinician's task.

Context engineering is sign-out for AI.
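
To make the good-versus-bad lists concrete, here is a small sketch of a context audit that runs before anything reaches the model. The item fields, the task tags, and the age cutoff are all assumptions chosen for illustration. A real system would need clinically justified rules for what counts as stale.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class ContextItem:
    """Hypothetical unit of context: one fact, its source, and its date."""
    text: str
    source: Optional[str]  # provenance, e.g. a note or report identifier
    as_of: date            # when the fact was last known to be true
    tasks: List[str]       # tasks this fact is relevant to


def audit_context(
    items: List[ContextItem], task: str, today: date, max_age_days: int
) -> Tuple[List[ContextItem], List[Tuple[ContextItem, List[str]]]]:
    """Split items into those kept and those dropped, with reasons for each drop."""
    kept, dropped = [], []
    for item in items:
        reasons = []
        if task not in item.tasks:
            reasons.append("not relevant to the task")
        if (today - item.as_of).days > max_age_days:
            reasons.append("stale")
        if not item.source:
            reasons.append("missing provenance")
        if reasons:
            dropped.append((item, reasons))
        else:
            kept.append(item)
    return kept, dropped
```

Returning the reasons alongside the dropped items keeps the selection easy for a human to inspect, which is the last property on the good-context list.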


What This Means for Physicians

The physician of the AI era does not need to become a machine learning engineer.

But the physician does need to become harder to fool.

When a vendor shows an AI demo, ask:

  • What context does the system receive?
  • Where does that context come from?
  • How recent is it?
  • What does the system ignore?
  • What sources can it retrieve?
  • What actions can it take?
  • How does it express uncertainty?
  • What happens when the context is incomplete?
  • Who is responsible for the final output?

Those questions matter more than a benchmark score.

Benchmarks tell you how a model performs on a test. Context tells you whether the system understands the job in front of it.


The Bottom Line

Prompt engineering was the entry-level skill. It still matters. Clinicians should know how to ask better questions, give constraints, request formats, and force the model to show uncertainty.

But context engineering is the more durable skill.

It is how we move from impressive demos to useful systems. It is how we keep AI from becoming a confident generator of context-free nonsense. And it is how physicians stay involved in the design of clinical AI instead of becoming passive recipients of whatever a vendor decides to ship.

The future physician will not just ask AI better questions.

The future physician will decide what the AI is allowed to know, what it is allowed to do, and when it needs to stop.

That is context engineering.


Source notes: This draft builds on the BeamPath resource Physicians as Context Engineers in the Era of Generative AI, Anthropic's discussion of effective context engineering for agents, Martin Fowler's writing on context engineering and skill files, and recent medical AI work on context-switching and contextual errors in clinical settings.
