One online user described living with jaw pain for years after a sports injury. Despite multiple scans and visits to specialists, no clear solution emerged until the user entered their symptoms into a language model. The AI suggested a specific alignment issue and proposed a tongue-positioning technique. After trying it, the pain vanished.

This case, which gained traction on social media, is far from unique. Other stories describe patients claiming AI tools have correctly interpreted scans or offered accurate diagnoses where medical professionals had not. In one example, a mother struggling for years to obtain a diagnosis for her child’s neurological issues turned to a language model. After submitting records and scans, she received a suggestion that led to surgery—and a significant improvement in the child’s condition.

Consumer-friendly AI is transforming how people seek health advice. The era of “Dr. Google” is giving way to a new phase, where conversational agents take on diagnostic roles. In response, universities, clinicians, and developers are exploring how reliable these systems are, how they can be safely integrated into care, and how to deal with misinformation when it occurs.

Some physicians are already encountering patients using AI tools during treatment. One reported an instance where a frustrated patient, tired of waiting, input her records into an AI chatbot and received an accurate diagnosis. Rather than being annoyed, the doctor saw it as an opportunity to better understand the patient’s concerns.

However, studies show that while AI can be highly accurate on its own, its effectiveness drops when humans are in the loop. Errors often stem from incomplete information entered into the system or misinterpretation of AI responses. In one experiment, two groups of doctors evaluated identical patient cases—one with AI support, one without. Both groups performed similarly, though the AI alone achieved much higher diagnostic accuracy.

Medical professionals also caution that while AI may offer a correct diagnosis, it doesn’t account for the nuances of a patient's unique situation. For example, in fertility care, recommendations based solely on embryo viability scores may overlook critical factors such as the timing of biopsies or previous reproductive history—details a seasoned physician would consider.

Patients sometimes come in convinced of a particular course of action based on what an AI has told them. While the AI’s suggestion may not be wrong, it may not be optimal either. Experienced physicians argue that there’s both a science and an art to determining the right treatment, and AI often lacks the ability to combine the two.

In response, some AI developers are building tools tailored for medical use. One major company launched a benchmark system, developed with input from hundreds of physicians, to evaluate AI performance on simulated health scenarios. The company claims that the latest version of its model can match or outperform doctors in producing high-quality responses.

Another tech firm introduced a diagnostic platform for clinicians, which uses multiple language models working in tandem—mimicking the dynamic of a group of specialists. In trials, it significantly outperformed human doctors.
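
As a rough illustration of this "panel of specialists" pattern, the sketch below prompts several model instances as different specialists and reconciles their answers with a majority vote. It is a minimal, assumption-laden sketch, not the platform's actual design: the query_specialist stub, the role names, and the voting step are all hypothetical, and the stub returns canned answers so the code runs on its own.

```python
from collections import Counter

# Hypothetical stand-in for a call to a chat-style language model.
# A real system would send a role-specific prompt to a hosted model;
# here each simulated specialist returns a canned candidate diagnosis
# so the sketch runs without any external service.
def query_specialist(role: str, case_summary: str) -> str:
    canned = {
        "neurologist": "subarachnoid hemorrhage",
        "emergency physician": "subarachnoid hemorrhage",
        "internist": "tension headache",
    }
    return canned[role]

def panel_diagnosis(case_summary: str, roles: list[str]) -> str:
    """Ask one model instance per simulated specialist for a leading
    diagnosis, then resolve disagreement with a simple majority vote."""
    votes = Counter(query_specialist(role, case_summary) for role in roles)
    return votes.most_common(1)[0][0]

case = "58-year-old with sudden severe headache and neck stiffness"
roles = ["neurologist", "emergency physician", "internist"]
print(panel_diagnosis(case, roles))  # -> subarachnoid hemorrhage
```

A production system would replace both the stub and the crude vote with real model calls and a more careful aggregation step; the sketch only shows the division of labor described above.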

With these tools gaining traction, some medical schools are now teaching students how to work with them—and how to communicate about them with patients. One educator compared the situation to when patients first began using search engines for medical information, saying that in today’s world, a doctor not using AI may be seen as behind the curve.

In practice, however, doctors still often act as the gatekeepers of information. Studies show that they tend to trust AI only when it agrees with their own assessments and dismiss it otherwise. In one case, an AI correctly identified a rare disease that several specialists had misdiagnosed; the model also listed, as a less likely alternative, the more common condition the specialists had mistakenly settled on.

Another large study, involving over 1,200 participants, found that when the AI operated independently it provided the correct diagnosis in nearly 95% of cases. But when people used the AI as a guide, success rates dropped to just one-third. The problem often lay in the input: when users omitted critical symptoms, the AI gave misleading advice. In a case of sudden-onset headache and neck stiffness, for example, the right response is immediate medical attention, yet when the sudden onset was not mentioned, the AI suggested simple pain relief at home.

Accurate or not, AI responses often arrive in a confident, polished tone that feels authoritative. Unlike a traditional search engine, which provides links for further reading, AI tools generate structured text that conveys an impression of finality, even when wrong. This can be dangerously misleading.