The AI-powered chatbot ChatGPT is taking the Internet by storm with its impressive language capabilities, helping to draw up legal contracts as well as write fiction. But it turns out that the underlying technology could also help spot the early signs of Alzheimer’s disease, potentially making it possible to diagnose the debilitating condition sooner.
Catching Alzheimer’s early can significantly improve treatment options and give patients time to make lifestyle changes that could slow progression. However, diagnosing the disease typically requires brain imaging or lengthy cognitive evaluations, which are both expensive and time-consuming and therefore unsuitable for widespread screening, says Hualou Liang, a professor of biomedical engineering at Drexel University, in Philadelphia.
A promising avenue for early detection of Alzheimer’s is automated speech analysis. One of the most common and noticeable symptoms of the disease is trouble with language, such as grammatical mistakes, pausing, repetition, or forgetting the meaning of words, says Liang. This has led to growing interest in using machine learning to spot early signs of the disease in the way people talk.
Such speech analysis normally relies on purpose-built models, but Liang and his colleagues wanted to see if they could repurpose the technology behind ChatGPT, OpenAI’s large language model GPT-3, to spot the telltale signs of Alzheimer’s. They discovered it could discriminate between transcripts of speech from Alzheimer’s patients and healthy volunteers well enough to predict the disease with 80 percent accuracy, which represents state-of-the-art performance.
“These large language models like GPT-3 are so powerful they can pick up these kind of subtle differences,” says Liang. “If the subject has some kind of issue [involving] Alzheimer’s, and that’s already reflected in the language, the hope is that we can use machine learning to pick up these kinds of signals that allow us to do early diagnostics.”
The researchers tested their approach on a collection of 237 audio recordings taken from healthy volunteers and Alzheimer’s patients, which were converted to text using a pre-trained speech recognition model. To enlist the help of GPT-3, the researchers made use of one of its less well-known capabilities. Its API makes it possible to feed a chunk of text into the model and get it to spit out what is known as an “embedding”—a numerical representation of a piece of text that encodes its meaning and can be used to assess its similarity to other text.
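To make the idea of comparing embeddings concrete, here is a minimal sketch that assumes each transcript has already been turned into a plain list of numbers. The three-dimensional vectors are made up for illustration (real GPT-3 embeddings are far longer and come from OpenAI’s API, which is not called here), and cosine similarity is used as one common way to measure how alike two embeddings are; the article does not say which similarity measure, if any, the researchers used.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" for three transcripts; the numbers are invented.
transcript_a = [0.9, 0.1, 0.3]
transcript_b = [0.8, 0.2, 0.35]   # points in nearly the same direction as transcript_a
transcript_c = [-0.1, 0.9, -0.4]  # points in a very different direction

print(round(cosine_similarity(transcript_a, transcript_b), 3))
print(round(cosine_similarity(transcript_a, transcript_c), 3))
```

Texts with similar meaning should end up with embeddings pointing in similar directions, so the first score comes out close to 1 and the second is much lower.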
While most machine learning models deal with word embeddings, one of the novel features of GPT-3, says Liang, is that it’s powerful enough to produce embeddings for entire paragraphs. And because of the model’s vast size and the huge amount of data used to train it, it is able to produce very rich representations of the text.
The researchers used this capability to create embeddings for all of the transcripts from both Alzheimer’s patients and healthy individuals. They then took a selection of these embeddings, combined with labels indicating which group each came from, and used them to train machine-learning classifiers to distinguish between the two groups. When tested on unseen transcripts, the best classifier achieved an accuracy of 80.3 percent, as reported in a paper in PLOS Digital Health.
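The train-then-test procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the researchers’ code: the “embeddings” are synthetic random vectors rather than GPT-3 output, the labels (1 for the hypothetical Alzheimer’s group, 0 for healthy) are invented, and plain logistic regression stands in for whichever classifiers the team actually used. The point is the workflow: fit on labeled embeddings, then measure accuracy on transcripts the classifier never saw.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def predict(w, b, x):
    """Return 1 (hypothetical 'Alzheimer's' label) or 0 ('healthy')."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

def accuracy(w, b, X, y):
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)

def make_fake_embedding(label, dim, rng):
    # Hypothetical stand-in for a GPT-3 embedding: the two groups differ slightly on average.
    center = 0.5 if label == 1 else -0.5
    return [rng.gauss(center, 1.0) for _ in range(dim)]

rng = random.Random(0)
dim = 8
labels = [i % 2 for i in range(200)]
data = [make_fake_embedding(lbl, dim, rng) for lbl in labels]

# Hold out the last 50 examples as "unseen transcripts".
split = 150
X_train, y_train = data[:split], labels[:split]
X_test, y_test = data[split:], labels[split:]

w, b = train_logistic_regression(X_train, y_train)
test_acc = accuracy(w, b, X_test, y_test)
print(f"held-out accuracy: {test_acc:.3f}")
```

Keeping the test examples out of training is what makes the reported accuracy meaningful; a score computed on the training data would overstate how well the classifier generalizes.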
That was significantly better than the 74.6 percent the researchers achieved when they applied a more conventional approach to the speech data, which relies on acoustic features that have to be painstakingly identified by experts. They also compared their technique with three cutting-edge machine-learning approaches that also use large language models but add an extra step in which the model is laboriously fine-tuned on some of the transcripts from the training data. The GPT-3 embedding approach matched the performance of the best of these models and outperformed the other two.
Interestingly, when the researchers tried fine-tuning GPT-3 itself, the model’s performance actually dropped. This might seem counterintuitive, but Liang says it is probably due to the mismatch between the vast amount of data used to train GPT-3 and the small amount of domain-specific data available for fine-tuning.
While the team does achieve state-of-the-art results, Frank Rudzicz, an associate professor of computer science at the University of Toronto, says relying on privately owned models to carry out this kind of research does raise some problems. “Part of the reason these closed APIs are limiting is that we also can’t inspect or deeply modify the internals of those models or do a more complete set of experiments that would help elucidate potential sources of error that we need to avoid or correct,” he says.
Liang is also open about the limitations of the approach. The model is nowhere near accurate enough to properly diagnose Alzheimer’s, he says, and any real-world deployment of this kind of technology would be as an initial screening step designed to direct people toward a specialist for a full medical evaluation. As with many AI-based approaches, it’s also hard to know exactly what the model is picking up on when it detects Alzheimer’s, which may be a problem for medical staff. “The doctor, very naturally would ask why you get these results,” says Liang. “They want to know what feature is really important.”
Nonetheless, Liang thinks the approach holds considerable promise, and he and his colleagues are planning to build an app that can be used at home or in a doctor’s office to simplify screening for the disease.
Original Source: https://spectrum.ieee.org/gpt-3-ai-chat-alzheimers