Depression is a widespread mental health issue affecting around 280 million individuals globally. In response to this challenge, researchers from Kaunas University of Technology (KTU) have created an artificial intelligence (AI) model designed to detect depression through an analysis of speech and brain neural activity. By integrating these two data sources, their approach offers a more precise and impartial evaluation of an individual’s emotional state, paving the way for advancements in depression diagnosis.
“Depression is a prevalent mental disorder that can have serious effects on individuals and society. Therefore, we are developing a new, more objective diagnostic method that could eventually be accessible to everyone,” states Rytis Maskeliūnas, a KTU professor and one of the researchers behind this innovation.
The researchers note that most previous diagnostic studies of depression have relied on a single data type. The new multimodal approach, however, aims to provide a fuller understanding of an individual’s emotional condition.
High accuracy achieved using voice and brain activity data
This combination of speech and brain activity data achieved an accuracy of 97.53% in diagnosing depression, far surpassing existing single-modality methods. “The inclusion of vocal data enhances our research because it reveals aspects we cannot yet gather solely from brain activity,” Maskeliūnas explains.
Musyyab Yousufi, a PhD student at KTU who contributed to the research, elaborates on their data selection: “While it’s commonly thought that facial expressions can provide insight into a person’s psychological state, this data is easily manipulated. We opted for voice, as it can subtly communicate emotional states through variations in speech rate, tone, and overall energy.”
Furthermore, unlike EEG (electroencephalography) or vocal data, facial expressions can only indicate the severity of a person’s condition up to a certain limit. “However, we must respect patients’ privacy, and gathering and integrating data from multiple sources is more beneficial for future applications,” explains the professor from KTU’s Faculty of Informatics (IF).
Maskeliūnas highlights that the EEG data utilized came from the Multimodal Open Dataset for Mental Disorder Analysis (MODMA), making it clear that KTU’s research team specializes in computer science rather than medical science.
The MODMA EEG data was gathered over a five-minute period while participants were awake, resting, and with their eyes closed. During the audio segment of the study, participants engaged in a session of questions and answers, as well as activities that involved reading and describing images to capture their natural language use and cognitive state.
AI must learn to support its diagnostic conclusions
The gathered EEG and audio signals were converted into spectrograms so they could be analyzed as images. Specialized noise filters and pre-processing techniques were applied to remove noise and ensure comparability, and a modified DenseNet-121 deep-learning model was used to recognize signs of depression in the resulting imagery. Each image depicted signal variations over time: EEG data as brain-activity waveforms, audio data as frequency and intensity distributions.
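The signal-to-spectrogram step described above can be sketched in a few lines. This is a minimal illustration, not the study's actual pipeline: the sampling rate, window parameters, and the synthetic single-channel signal below are all assumptions standing in for the MODMA recordings.

```python
import numpy as np
from scipy import signal

# Simulate a 10-second, single-channel EEG excerpt at 250 Hz
# (rate and signal content are illustrative assumptions).
fs = 250
t = np.arange(0, 10, 1 / fs)
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)  # 10 Hz rhythm plus noise

# A short-time Fourier transform yields a time-frequency spectrogram,
# which can then be rendered as an image for a convolutional model.
freqs, times, Sxx = signal.spectrogram(eeg, fs=fs, nperseg=256, noverlap=128)

# Log-scale and normalize to [0, 1], a common step before feeding an image model.
Sxx_db = 10 * np.log10(Sxx + 1e-12)
Sxx_norm = (Sxx_db - Sxx_db.min()) / (Sxx_db.max() - Sxx_db.min())
print(Sxx_norm.shape)  # (frequency bins, time frames)
```

The same transformation applies to the audio recordings, where the spectrogram captures the frequency and intensity distributions the article mentions.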
The model included a custom classification layer trained to categorize the data into healthy versus depressed individuals. This classification process was rigorously evaluated, followed by an assessment of the application’s accuracy.
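A two-class head of this kind can be sketched as follows. This is a simplified stand-in, not the trained model: DenseNet-121's pooled feature vector is 1024-dimensional, and the random weights below are placeholders for the parameters the researchers would have learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# DenseNet-121 ends in a 1024-dimensional feature vector; the custom head
# maps it to two classes (healthy vs. depressed). Weights are random
# placeholders standing in for trained parameters.
n_features, n_classes = 1024, 2
W = rng.normal(scale=0.01, size=(n_features, n_classes))
b = np.zeros(n_classes)

def classify(features: np.ndarray) -> np.ndarray:
    """Linear layer followed by a softmax over the two classes."""
    logits = features @ W + b
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

features = rng.normal(size=(4, n_features))  # batch of 4 pooled feature vectors
probs = classify(features)
print(probs.shape)  # (4, 2): one probability pair per input
```

Each row of `probs` sums to 1, giving a healthy-versus-depressed probability for each spectrogram image.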
In the future, this AI model has the potential to expedite the diagnosis of depression, possibly even enabling remote evaluations and reducing the reliance on subjective assessments. This advancement does, however, hinge on future clinical trials as well as enhancements to the program. Yet, Maskeliūnas notes that this aspect of research may present its own set of challenges.
“The primary challenge with these studies is the scarcity of data, as many individuals prefer to keep their mental health issues private,” he explains.
Another critical aspect highlighted by the professor from KTU’s Department of Multimedia Engineering is the necessity for the algorithm to not only be accurate but also to provide medical professionals with insights into how it arrived at its diagnostic conclusions. “The algorithm still needs to learn to present these diagnoses in an understandable manner,” Maskeliūnas notes.
The KTU professor further remarks that, given the increasing demand for AI solutions that directly impact individuals across sectors like healthcare, finance, and legal systems, the need for explainability in AI is becoming widespread.
This trend is fostering the rise of explainable artificial intelligence (XAI), which focuses on elucidating the reasoning behind model decisions to enhance user trust in AI systems.