A new study evaluating 21 large language models found a consistent pattern: while AI can arrive at the correct diagnosis when given complete information, it struggles with the core of clinical medicine—reasoning through uncertainty.
Across real-world clinical scenarios, the models failed to generate appropriate differential diagnoses more than 80% of the time, exposing a critical gap in early-stage decision-making. The problem is not accuracy at the endpoint; it is the inability to navigate the stepwise diagnostic process, where incomplete information, clinical judgment, and prioritization define care.
The takeaway for healthcare is becoming clearer: AI is improving incrementally and can support tasks such as data synthesis and final diagnosis, but it still depends heavily on structured inputs and human oversight.