I see it all the time. A resident pastes a patient summary into ChatGPT to help draft a discharge note. A researcher uploads a spreadsheet of outcomes data to Claude for analysis. A colleague copies a radiology report into an AI tool to generate patient-friendly language.
Sometimes it's fine. Sometimes it's a HIPAA violation that could cost your institution millions.
The difference? Whether that data contains Protected Health Information.
Here's what you need to know before you paste anything into an AI tool.
What Actually Counts as PHI?
Protected Health Information is any health information that can be linked to a specific individual. The key word is "linked." A diagnosis alone isn't PHI. A medical record number alone isn't PHI. But put them together, and now you've got PHI.
HIPAA defines 18 specific identifiers that make health data "protected." Remove all 18, and the data becomes de-identified. Keep even one, and you're dealing with PHI.
Here's the complete list:
- Names
- Geographic subdivisions smaller than a state (street addresses, cities, ZIP codes)
- Dates directly related to an individual (birth date, admission date, discharge date, date of death)
- Telephone numbers
- Fax numbers
- Email addresses
- Social Security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers (including license plates)
- Device identifiers and serial numbers
- Web URLs
- IP addresses
- Biometric identifiers (fingerprints, retinal scans, voiceprints)
- Full-face photographs and comparable images
- Any other unique identifying number, characteristic, or code
That last one is deliberately broad. If your dataset includes a unique patient identifier you created yourself, that counts.
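If you're scrubbing free text before pasting it anywhere, a few of these identifiers follow predictable patterns and can be caught programmatically. Here's a minimal Python sketch; the regexes and the "MRN: 1234567" format are illustrative assumptions, and no pattern-matcher will catch names, addresses, or context-dependent clues, so treat it as a first pass, never as proof of de-identification.

```python
import re

# A first-pass scrubber for a few pattern-based identifiers.
# Illustrative only: it cannot catch names, addresses, or free-text context,
# and it is NOT a substitute for a full Safe Harbor review.
PATTERNS = {
    "email": r"[\w.+-]+@[\w-]+\.[\w.-]+",
    "phone": r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b",
    "ssn":   r"\b\d{3}-\d{2}-\d{4}\b",
    "date":  r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",
    "mrn":   r"\bMRN[:#\s]*\d+\b",   # assumes an "MRN: 1234567" style label
}

def scrub(text: str) -> str:
    """Replace pattern-matched identifiers with bracketed placeholders."""
    for label, pattern in PATTERNS.items():
        text = re.sub(pattern, f"[{label.upper()} REMOVED]", text, flags=re.IGNORECASE)
    return text

print(scrub("Contact 617-555-0199 re: MRN: 4821234, seen 3/14/2024."))
# -> "Contact [PHONE REMOVED] re: [MRN REMOVED], seen [DATE REMOVED]."
```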
The Date Trap
Dates are where most people slip up. You can't just redact the birth date and call it done. Admission dates, surgery dates, lab collection dates, even the date you accessed the chart: all of them are PHI if they're tied to a patient. Under Safe Harbor, the only date element you're allowed to keep is the year.
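A concrete way to think about it: keep the year, or keep intervals, but drop the calendar dates themselves. A small sketch (the dates and variable names here are made up):

```python
from datetime import date

admission = date(2024, 3, 14)                   # tied to a patient: PHI as-is
discharge = date(2024, 3, 20)

year_only = admission.year                      # Safe Harbor allows the year
length_of_stay = (discharge - admission).days   # an interval carries no calendar date

print(year_only, length_of_stay)                # 2024 6
```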
PHI vs PII vs De-Identified Data
You'll hear these terms used interchangeably. They're not the same.
PHI (Protected Health Information) is health information plus an identifier. It's governed by HIPAA. It requires specific safeguards, business associate agreements, and security measures.
PII (Personally Identifiable Information) is a broader term that includes any data that can identify someone, health-related or not. Your name, your address, your Social Security number -- all PII. PHI is a subset of PII. HIPAA regulates PHI; other laws (like GDPR, state privacy laws) regulate PII more broadly.
De-identified data is what you get when you strip out all 18 HIPAA identifiers. Once properly de-identified, data is no longer considered PHI and isn't subject to HIPAA restrictions.
The critical distinction: you can share de-identified data much more freely than PHI. You can publish it. You can use it for research without patient consent. And yes, you can paste it into ChatGPT or any other AI tool without violating HIPAA.
The Two Ways to De-Identify Data
HIPAA recognizes two methods for de-identification: Safe Harbor and Expert Determination.
Safe Harbor is the straightforward approach. Remove all 18 identifiers, confirm there's no reasonable basis to believe the information could identify someone, and you're done. It's mechanical. It's verifiable. Most institutions prefer it because it's defensible.
The catch? Safe Harbor is strict. You can't keep full ZIP codes (at most the first three digits, and only when that three-digit area contains more than 20,000 people). You can't keep exact ages over 89. You can't keep any date element more specific than the year. For many clinical use cases, that level of scrubbing makes the data less useful.
Expert Determination is the alternative. You hire a qualified statistician to analyze your dataset and certify that the risk of re-identification is "very small." This lets you keep more granular data: maybe the month and year of diagnosis, or full five-digit ZIP codes.
The catch? You need actual statistical expertise, documentation of the methodology, and someone willing to stake their professional reputation on the analysis. It's not something you can eyeball.
For most clinicians using AI tools day-to-day, Safe Harbor is the practical choice. If you need Expert Determination, you're probably working on a formal research project with institutional support.
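To make Safe Harbor concrete for tabular data, here's a rough pandas sketch. The dataframe and column names are hypothetical; the point is the three moves you keep repeating: drop direct identifiers, coarsen geography and ages, and reduce dates to years.

```python
import pandas as pd

# Hypothetical patient-level extract (column names are made up for illustration).
df = pd.DataFrame({
    "mrn":            ["4821234", "5519870"],
    "zip":            ["02115", "10032"],
    "age":            [67, 93],
    "admission_date": pd.to_datetime(["2024-03-14", "2024-07-02"]),
    "outcome":        ["discharged", "readmitted"],
})

safe = (
    df.drop(columns=["mrn"])                                  # direct identifier: remove entirely
      .assign(
          # first three ZIP digits are allowed only if that area holds >20,000 people;
          # otherwise Safe Harbor says replace them with "000"
          zip=lambda d: d["zip"].str[:3],
          # ages over 89 collapse into a single 90-or-older bucket
          age=lambda d: d["age"].where(d["age"] <= 89, 90),
          # calendar dates reduce to the year
          admission_year=lambda d: d["admission_date"].dt.year,
      )
      .drop(columns=["admission_date"])
)
print(safe)
```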
Real-World Scenarios: What Can You Actually Do?
Let's get practical. Here are the questions I get asked most often.
Can I paste a radiology report into ChatGPT to generate a patient-friendly summary?
Not unless you de-identify it first. That report contains the patient's name, MRN, date of exam, possibly age and clinical history. All PHI. Strip those out using Safe Harbor, and you're good to go.
Can I upload a spreadsheet of patient outcomes to Claude for data analysis?
Depends what's in the spreadsheet. If it's just aggregate statistics (median age, survival rates, complication percentages), that's not PHI. If it includes patient-level rows with any identifiers -- even just a study ID if that ID could be linked back to real patients -- that's PHI.
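One practical workaround: do the aggregation locally and share only the summary. A quick sketch, assuming a hypothetical outcomes.csv with age, survived, and complication columns; the row-level file never leaves your machine.

```python
import pandas as pd

# Row-level data stays local; only the aggregate numbers get shared.
df = pd.read_csv("outcomes.csv")   # hypothetical columns: age, survived, complication

summary = {
    "n":                 len(df),
    "median_age":        float(df["age"].median()),
    "survival_rate":     round(float(df["survived"].mean()), 3),
    "complication_rate": round(float(df["complication"].mean()), 3),
}
print(summary)  # aggregates like these aren't PHI, but watch out for very small cell counts
```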
Can I use an AI tool to help write a case report?
Yes, but change all identifying details first. Use a fake name, alter the age slightly, generalize the location. Case reports are published, so you'd need patient consent anyway, but even in your draft work, treat the data as if it could leak.
Can I screenshot part of the EMR and ask an AI to interpret it?
Absolutely not. Screenshots almost always contain names, MRNs, dates, and other identifiers burned into the image. Even if you think you cropped them out, metadata and partial text can betray you.
The pattern here is simple. If you wouldn't be comfortable posting it on Twitter with your name attached, don't put it in an AI tool unless that tool is HIPAA-compliant.
The Redaction Test
Before pasting anything into an AI tool, ask yourself: "If this conversation leaked publicly, would I be explaining myself to my compliance officer?" If yes, redact first.
What About "HIPAA-Compliant" AI Tools?
Some AI tools claim HIPAA compliance. What does that actually mean?
HIPAA compliance requires a Business Associate Agreement (BAA). This is a legal contract in which the AI vendor agrees to safeguard PHI, limit how it's used, report breaches, and allow audits. Without a signed BAA, you have no HIPAA-level guarantees about what happens to your data: depending on the vendor's terms, it may be used to train models, stored indefinitely, or shared with third parties.
As of early 2026, most consumer AI tools do NOT offer BAAs. ChatGPT's free tier? No BAA. Claude's standard web interface? No BAA. Google's Gemini? No BAA for general users.
Some do offer BAA-compliant versions:
- OpenAI has ChatGPT Enterprise with BAA options for organizations
- Microsoft's Azure OpenAI Service can be configured for HIPAA compliance
- Some medical-specific AI tools like OpenEvidence are built with healthcare compliance in mind
But here's the thing: even with a BAA, you should still minimize what you share. A BAA doesn't give you unlimited permission to dump patient data into a tool. It just means the vendor has agreed to basic safeguards. The principle of minimum necessary use still applies.
For most day-to-day clinical questions, the safest approach is simple: de-identify everything before it touches an AI tool. That way you're not dependent on vendor promises or contract fine print.
Why This Matters More Than You Think
I know this feels like bureaucratic nitpicking. You're trying to provide better care or accelerate research, and someone's waving a rulebook at you.
But the consequences are real. HIPAA violations can trigger:
- Fines up to $1.9 million per violation category per year
- Criminal charges for knowing misuse
- Institutional sanctions, loss of Medicare funding
- Career damage (yes, individual clinicians can be held liable)
- Patient harm if data is misused or breached
More importantly, this isn't just about compliance. It's about trust. Patients share intimate details with us because they trust we'll protect that information. When we're careless with their data -- even with good intentions -- we erode that trust.
The AI tools we're using today are powerful. They can help us practice better medicine, understand complex cases, and communicate more clearly. But they're also black boxes. We don't fully understand how they process data, what they retain, or how they might be exploited.
De-identifying data isn't just a legal checkbox. It's basic information hygiene.
Practical Takeaways
Here's what I want you to remember:
Before you paste anything into an AI tool, ask:
- Does this contain any of the 18 HIPAA identifiers?
- If yes, can I remove them without losing the value of the query?
- If I can't remove them, does this tool have a signed BAA with my institution?
If you need to use real patient data:
- Check if your institution has approved AI tools with BAAs
- Work with your compliance or IT team to set up secure workflows
- Document what you're doing and why
- Consider whether you actually need patient-level data or if aggregate data would work
For research and education:
- De-identify using Safe Harbor whenever possible
- If you need Expert Determination, work with your IRB and biostatistics team
- Remember that even "anonymized" data can sometimes be re-identified -- be thoughtful about what you share
When in doubt:
- Don't guess. Ask your compliance officer.
- Err on the side of caution.
- Read the dos and don'ts of using LLMs in medicine for more practical guidance.
The bottom line: AI tools are incredibly useful, but they're not worth risking your patients' privacy or your career. Learn the rules, follow them, and when you're uncertain, ask.
If you're looking to use AI tools safely in clinical practice, start with the basics: understand what data you can and can't share, learn to recognize PHI, and build good habits now. Check out our guide on AI safety for clinicians for more on building a sustainable, compliant AI workflow.
Now go forth and use AI responsibly. Your patients are counting on it.
