I have been paying attention to the wrong thing. We all have.
For the past year, the AI conversation in medicine has been dominated by model releases, benchmark wars, and breathless press announcements. Every week brings a new "breakthrough." Every month brings a new leaderboard. I have written about some of them in this newsletter, and I will write about more today. But something happened recently that made me rethink what actually matters.
A radiologist named Dr. Laura Heacock posted on Twitter that she had been using Claude Code — an AI coding tool — to iterate on her academic talks. It applied her visual and writing style guides to shape her drafts, triple-checked her citations, and cleaned up slide formatting. She saved a few hours of tedious work. And the way she described it was completely unremarkable. No fanfare. No "AI is revolutionizing my practice." Just: I used a tool, it worked, I moved on with my week.
That is the shift I want to talk about. Not the announcements. Not the benchmarks. The moment when AI becomes something you just use on a Tuesday afternoon and do not think twice about.
Google's AMIE Moved From the Lab to the Waiting Room
In January 2024, Google published a retrospective analysis showing that their AI system AMIE could match physician-level diagnostic accuracy in simulated patient conversations. It was impressive. It was also entirely theoretical — standardized cases, actor patients, controlled conditions.
On March 9, 2026, they published a prospective feasibility study.
One hundred adult patients at Beth Israel Deaconess Medical Center, scheduled for urgent care appointments between April and November 2025, text-chatted with AMIE up to five days before their visit. Not simulations. Not retrospective chart review. Real patients with real complaints, interacting with a conversational AI before seeing their physician. The system — built on Gemini 2.5 Pro with Thinking mode enabled — guided patients through five conversation phases: intake, history taking, diagnostic validation, assessment delivery, and wrap-up. It generated a conversation summary that was shared with the PCP before the appointment.
The results: AMIE's differential diagnosis included the final diagnosis in 90% of cases (confirmed via chart review eight weeks later), with 75% top-3 accuracy. Blinded assessment found no significant difference in overall DDx quality between AMIE and PCPs (p=0.6). Patient satisfaction was high. Attitudes toward AI improved significantly (p<0.001). Human safety supervisors monitored every interaction in real time and did not need to intervene once.
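To make those numbers concrete: "top-k accuracy" for a differential diagnosis (DDx) just asks, case by case, whether the confirmed final diagnosis appears among the model's top k ranked candidates. A minimal sketch, using invented toy data rather than anything from the study:

```python
# Hypothetical sketch of top-k DDx accuracy scoring.
# Each case pairs the chart-review final diagnosis with the model's
# ranked differential; a case is a hit if the final diagnosis appears
# within the top k entries of that list.

def top_k_accuracy(cases, k):
    """cases: list of (final_dx, ranked_ddx) tuples."""
    hits = sum(1 for final_dx, ddx in cases if final_dx in ddx[:k])
    return hits / len(cases)

# Toy data, invented for illustration only
cases = [
    ("pneumonia", ["pneumonia", "bronchitis", "URI"]),
    ("appendicitis", ["gastroenteritis", "appendicitis", "UTI"]),
    ("migraine", ["tension headache", "sinusitis", "migraine"]),
    ("gout", ["cellulitis", "septic arthritis", "DVT"]),
]

print(top_k_accuracy(cases, 1))  # 0.25 — only one case ranks it first
print(top_k_accuracy(cases, 3))  # 0.75 — three of four within the top 3
```

So when the study reports 90% with 75% top-3 accuracy, it means the right answer was almost always somewhere on AMIE's list, and three times out of four it was near the top.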
Important caveats: this is a single-arm feasibility study with 100 patients, not a randomized trial. AMIE's management plans were logged for research purposes only — they were never shown to patients or providers. And physicians still outperformed AMIE on the practicality (p=0.003) and cost-effectiveness (p=0.004) of management plans, which is exactly what you would expect. Knowing the right diagnosis is not the same as knowing what to do about it in the context of a specific patient's life, insurance, and preferences.
But here is what matters: this is the first time a conversational diagnostic AI has been tested prospectively with real patients in a real clinic. Not imaging AI reading scans in the background. Not a risk model running silently in the EHR. A patient texting with an AI about their symptoms, and a physician receiving that context before walking in the room. The question has shifted. We are no longer asking "can AI do diagnostics in a lab?" We are asking "how does the AI's pre-visit differential change the conversation I have with my patient?"
Feasibility studies are not definitive. But they matter more than benchmarks for a simple reason: they tell you what works when the patient is real, the context is messy, and the physician still has four more patients to see before lunch.
AI Became a Coworker, Not an Assistant
In January, Anthropic launched Claude Cowork — an extension of Claude Code designed for knowledge work beyond programming. By February 24, it had rolled out broadly. By March 9, Microsoft had integrated it into Copilot as part of a new E7 licensing tier for enterprise customers.
Eight weeks from launch to Microsoft integration. That is not a pilot program. That is a land grab.
But the more interesting story is not what the companies are doing. It is what physicians are doing without asking permission.
Dr. Heacock is not alone. Across Twitter and LinkedIn, physicians are posting about using Claude Code and Cowork for literature reviews, manuscript drafting, slide formatting, and reference management. These are not IT-approved deployments. They are individual physicians discovering a tool, trying it, finding it useful, and telling their colleagues.
The language is shifting, too. People are not calling these tools "assistants" anymore. They are calling them "coworkers." The framing matters. An assistant takes orders. A coworker takes a task and runs with it. You say "I need this deck cleaned up by Friday" and it comes back done. That is a fundamentally different relationship than "write me a summary of this paper."
I think this is what separates the AI tools that will last from the ones that will fade. The ones that remove friction win. The ones that require new workflows lose. Cowork fits into the gap between "I need to do this" and "I do not have time to do this well." That gap, in medicine, is enormous.
FDA Gave Its First Nod to a Generative AI Chatbot
On March 3, the FDA granted Breakthrough Device Designation to RecovryAI, a conversational chatbot designed to guide patients through recovery after joint replacement surgery.
This is the first generative AI chatbot to receive this designation.
RecovryAI is prescribed to patients for 30 days after surgery. It checks in twice daily, answers questions about pain and mobility, and escalates to the care team when something looks wrong. It is there at 2 AM when the patient is worried about swelling and does not want to bother the on-call surgeon.
Breakthrough Device Designation is not approval — it is a prioritized pathway. The criteria are strict: the device must address a life-threatening or irreversibly debilitating condition and offer advantages over existing alternatives. RecovryAI meets those criteria because post-surgical complications are common and costly, and because patients often do not know when to call for help.
Here is the context that matters: as of March 2026, no generative AI device has received FDA authorization. RecovryAI could be the first.
This is the FDA learning by doing. They do not have a playbook for regulating conversational AI in medicine. They are writing it in real time. RecovryAI is the test case, and the precedent it sets will shape everything that comes after — ambient scribes, diagnostic chatbots, clinical decision support, all of it.
Meanwhile, FDA Commissioner Marty Makary announced in January that the agency would ease regulation of AI-enabled clinical decision support software, saying they need to "move at Silicon Valley speed" and foster a "good environment for investors." The tension is obvious: accelerate innovation, but do not compromise safety. RecovryAI sits right in the middle of that tension.
I think this is the most important regulatory story in AI and medicine right now, and it is getting less attention than it deserves. When RecovryAI gets authorized — or does not — the reasoning will tell us how the FDA plans to handle the next hundred generative AI tools lining up behind it.
Quick Hits
A few other developments worth noting:
Amazon just launched a health AI assistant for all US customers. On March 10, Amazon expanded its Health AI assistant from the One Medical app to amazon.com and the main Amazon app. It can answer health questions, explain medical records, manage prescription renewals, and book appointments. With your permission, it pulls your records through the Health Information Exchange — the nationwide system for sharing patient data. Prime members get up to five free consultations with a One Medical provider for over 30 conditions. You do not need to be a Prime member or a One Medical subscriber to use it. Amazon acquired One Medical for $3.9 billion in 2023. This is them leveraging that investment. When the company that delivers your toothpaste also interprets your lab results, the line between retail and healthcare is not blurring — it is gone.
AI scribes are reducing burnout, and the effect size is not subtle. A multicenter quality improvement study in JAMA Network Open followed 263 physicians and APPs across six health systems for 30 days with an ambient AI scribe. Burnout dropped from 51.9% to 38.8% (OR 0.26, 95% CI 0.13-0.54, p<0.001). Note-related cognitive load improved by 2.64 points on a 10-point scale. Time documenting after hours dropped by nearly an hour per week. Separately, a survey of 43 US health systems in the Scottsdale Institute found that all 43 respondents had adopted ambient notes, with 53% reporting high success rates. That is a self-selected sample of large, tech-forward systems — not all US hospitals. But the signal is clear: ambient scribes are the AI application that is actually shipping and working at scale.
Medical schools are teaching AI now, not debating whether to ban it. 77% of US and Canadian medical schools now include AI in their curriculum, according to the AAMC's 2023-2024 SCOPE survey. Two years ago, schools were worried about students using ChatGPT to cheat. Now they are teaching them how to use it responsibly in clinical practice. Stanford created a new position — Director of Medical Education in Artificial Intelligence — that did not exist three months earlier. The students arriving at medical school in 2026 already know how to use AI. The challenge is teaching them to use it well.
AI drug discovery has 173 programs in clinical trials and zero FDA approvals. This is the "not yet" story. The hype around AI-designed drugs is enormous, but as of December 2025, no AI-discovered drug has been authorized. The first approval is expected in 2026 or 2027 at the earliest, possibly later. Contrast that with ambient scribes — widely deployed across major health systems in under two years — and you see the difference between AI that removes friction and AI that requires an entirely new infrastructure.
The Consolidation Risk No One Is Talking About
Here is the problem no one wants to say out loud: AI is making the gap between large health systems and small practices wider, not narrower.
The athenahealth 2026 Physician Sentiment Survey, released March 4, found that 65% of physicians at large enterprises report being comfortable with AI, compared to 43% at small practices. That 22-point gap is not just about training. It is about resources. Large systems can afford AI tools, infrastructure, and dedicated support. Small practices cannot. And 90% of those small practices fear losing their independence to consolidation.
The rural picture is worse. Rural physician burnout is running at 67%, compared to 52% in urban settings. Sixty-nine percent of rural doctors are considering leaving medicine entirely. These physicians do not have access to ambient scribes, Cowork integrations, or AI-powered scheduling tools. They are burning out at higher rates while urban academic physicians are tweeting about how AI saved them hours last week.
AI is not an equalizer. It is a consolidator.
If large, well-funded health systems adopt AI and reduce burnout, improve efficiency, and retain physicians, while small practices and rural hospitals fall further behind, the result is not hard to predict. More acquisitions. More closures. Fewer independent practices. A two-tier system where access to care depends on whether you live near a hospital that can afford the AI tools that make medicine sustainable.
I think this is the policy question we should be asking, and I do not see it getting the attention it deserves. The conversation is all upside — "AI will save us from burnout!" — without accounting for the fact that the tools saving urban academic physicians are not reaching the physicians who need them most.
I started this issue by saying I have been paying attention to the wrong thing. Here is what I mean.
The stories that will matter most in five years are not the model releases or the benchmark scores. They are Dr. Heacock quietly saving hours on a Tuesday. A patient texting an AI about their symptoms before an appointment and finding it useful. A chatbot checking on a post-surgical patient at 2 AM. A small practice in rural Texas that cannot afford any of it.
The question is not whether AI will transform medicine. It already is. The question is whether we are building something that works for everyone or only for those who can afford to keep up.
I know which answer I want. I am less sure which one we are building.
