Jun 10, 2026  |  12:00pm - 1:00pm

T-CAIREM Trainee Rounds: David Chen & Mahri Kadyrova

Type
Trainee Rounds

Join us for these presentations

Trainee Rounds Presentations (Session 2)

David Chen

David Chen is a first-year radiation oncology resident at the University of British Columbia. He completed his MD at the University of Toronto and BMSc at Western University. His research interests include cancer bioinformatics and applications of artificial intelligence for clinical decision-making and patient care in oncology. Through his research, David aims to leverage big data and artificial intelligence to generate evidence-based conclusions in medicine. He is currently designing AI tools to support clinical trial screening and conduct, automate evidence synthesis in systematic reviews, evaluate and improve the reporting completeness of research, as well as translate and summarize complex healthcare information into patient-friendly and useful formats. Outside of medicine and science, David is involved in community arts initiatives, including musicals and fashion shows, and recently discovered an unlikely interest in spin classes. 

Abstract Title

Development of an Agentic Multi-LLM System to Support the Informed Consent Process in Clinical Trials

Abstract

Background: Large language models (LLMs) could assist clinical trial coordinators in responding to trial participant queries during the informed consent process. We developed an agentic, multi-LLM system that generates accurate responses grounded in the trial knowledge base, and iteratively improves the accuracy of its own responses.

Methods: Trial-Agent comprises (1) a Coordinator LLM that drafts responses using the trial knowledge base, (2) a Moderator LLM that rates factual accuracy (Likert 1–5 rubric) against the trial knowledge base and provides constructive feedback, and (3) an optional revise-and-rerate loop that prompts the Coordinator LLM to improve the accuracy of its response. We evaluated all Coordinator–Moderator pairings of OpenAI’s GPT-4o, GPT-5 Chat, and GPT-5 Thinking LLMs across two prospective clinical trials, DEFEND and CAN-SILENCE. Primary study outcomes were (a) the agreement of the Moderator LLM accuracy ratings with human trial coordinator accuracy ratings and (b) the Moderator LLM-rated accuracy of the Coordinator LLM’s draft and post-revision responses.
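The draft, rate, and revise steps described above can be sketched as a short control loop. This is only an illustration, not the presenters' implementation: the names `Review`, `revise_and_rerate`, `target`, and `max_rounds` are hypothetical, the stopping rule is an assumption (the abstract does not state when the loop is triggered or stopped), and the two toy functions stand in for real LLM calls against the trial knowledge base.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Review:
    rating: int    # Likert 1-5 factual-accuracy rating from the Moderator
    feedback: str  # constructive feedback passed back to the Coordinator


def revise_and_rerate(
    query: str,
    draft_fn: Callable[[str, str], str],
    rate_fn: Callable[[str, str], Review],
    target: int = 5,       # assumed stopping threshold (hypothetical)
    max_rounds: int = 2,   # assumed cap on revisions (hypothetical)
) -> Tuple[str, Review, int]:
    """Draft a response, rate it, and revise until the rating reaches
    `target` or `max_rounds` revisions have been made."""
    response = draft_fn(query, "")          # first draft, no feedback yet
    review = rate_fn(query, response)
    rounds = 0
    while review.rating < target and rounds < max_rounds:
        response = draft_fn(query, review.feedback)  # revise using feedback
        review = rate_fn(query, response)            # re-rate the revision
        rounds += 1
    return response, review, rounds


# Toy stand-ins for the Coordinator and Moderator LLMs (not real API calls).
def toy_coordinator(query: str, feedback: str) -> str:
    return "revised answer" if feedback else "first draft"


def toy_moderator(query: str, response: str) -> Review:
    if response == "revised answer":
        return Review(rating=5, feedback="")
    return Review(rating=3, feedback="Check the visit schedule in the knowledge base.")


final_response, final_review, rounds_used = revise_and_rerate(
    "How many study visits are there?", toy_coordinator, toy_moderator
)
```

Passing the Coordinator and Moderator in as plain callables keeps the loop independent of any particular model, which mirrors the abstract's evaluation of every Coordinator–Moderator pairing.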

Results: The GPT-5 Chat Moderator's accuracy ratings were comparable to human trial coordinator accuracy ratings for the DEFEND (human: 4.69, 95% CI 4.63–4.74; LLM: 4.63, 95% CI 4.58–4.69; p=0.142) and CAN-SILENCE (human: 4.83, 95% CI 4.76–4.89; LLM: 4.86, 95% CI 4.82–4.90; p=0.182) trials. The most accurate Coordinator LLM for responding to participant queries was GPT-5 Chat for the DEFEND trial (4.84, 95% CI 4.81–4.88) and GPT-5 Thinking for the CAN-SILENCE trial (4.94, 95% CI 4.92–4.96). The revise-and-rerate loop improved inaccurate responses in both trials.

Conclusions: We designed an agentic LLM system that enables scalable, trial-concordant responses to participant queries about clinical trials, using trial-grounded retrieval-augmented generation and an LLM-as-a-judge to evaluate and improve its own response accuracy.


Mahri Kadyrova

Mahri Kadyrova is a PhD researcher in the Department of Electrical and Computer Engineering under the supervision of Dr. Ervin Sejdic, collaborating with Dr. Yana Yunusova from the Department of Speech-Language Pathology. Her research applies machine learning to uncontrolled, real-world video data to develop digital health assessment tools for remote monitoring of individuals with motor neuron disease. She is a recipient of the NSERC PGS-D scholarship.

Abstract Title

Deep Learning–Based Tongue Segmentation and Motion Analysis in Motor Neuron Disease Using Uncontrolled Videos

Abstract

Introduction. Monitoring motor neuron disease (MND) progression, particularly tongue function, is vital for timely intervention but is limited by access to, and variability of, clinical assessments. Remote video-based evaluation using machine learning–derived tongue motion features offers a scalable alternative. This study aimed to (1) identify the best-performing tongue segmentation model for the pipeline and (2) clinically validate tongue motion features.

Methods. A total of 133 participants with MND (53 females; median age = 65.0 [59.0–70.0]) completed the ALSBDI-R and performed three tongue tasks (Relax, lateral-fast [L-fast], lateral-normal) via video in uncontrolled environments. From 711 videos, up to 100 frames per video (median = 100 [90–107]) were manually annotated with high reliability. Two decoder architectures (U-Net, U-Net++) combined with five pre-trained encoders (MobileNetV2, VGG19, ResNeXt50_32x4d, EfficientNet-B5, InceptionV4) were fine-tuned using an 80/10/10 participant-level split to identify the optimal segmentation model. The best-performing model (U-Net + EfficientNet-B5) was applied to 297 L-fast videos for motion analysis. Framewise tongue tip trajectories were derived from segmentation masks to compute tongue tip frequency and 95th percentile speed and acceleration. Spearman’s correlation was conducted to evaluate the relationship between each feature and ALSBDI-R total score.
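As a rough illustration of how motion features might be computed from a framewise tongue tip trajectory, the sketch below uses finite differences for speed and acceleration and counts mean-crossings of the lateral position to estimate frequency. These are assumptions for illustration only: the abstract does not specify how frequency, speed, or acceleration were derived, and the names `percentile`, `tongue_tip_features`, `xy`, and `fps` are hypothetical.

```python
import math
from statistics import fmean


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def tongue_tip_features(xy, fps):
    """xy: list of (x, y) tongue tip positions, one per frame (pixels).
    fps: video frame rate. Returns frequency (Hz) and 95th percentile
    speed (px/s) and acceleration (px/s^2)."""
    dt = 1.0 / fps
    # Frame-to-frame velocity vectors (forward differences).
    vel = [((x1 - x0) / dt, (y1 - y0) / dt)
           for (x0, y0), (x1, y1) in zip(xy, xy[1:])]
    speed = [math.hypot(vx, vy) for vx, vy in vel]
    # Acceleration magnitude from differences of consecutive velocities.
    accel = [math.hypot((vx1 - vx0) / dt, (vy1 - vy0) / dt)
             for (vx0, vy0), (vx1, vy1) in zip(vel, vel[1:])]
    # Frequency estimate: each lateral cycle crosses the mean position twice.
    mean_x = fmean(x for x, _ in xy)
    centred = [x - mean_x for x, _ in xy]
    crossings = sum(1 for a, b in zip(centred, centred[1:]) if a * b < 0)
    duration = (len(xy) - 1) * dt
    return {
        "frequency_hz": crossings / 2.0 / duration,
        "speed_p95": percentile(speed, 95),
        "accel_p95": percentile(accel, 95),
    }


# Synthetic check: a 2 Hz side-to-side movement sampled at 30 fps for 5 s.
fps = 30
trajectory = [(50.0 * math.sin(2 * math.pi * 2.0 * k / fps + 0.3), 0.0)
              for k in range(151)]
features = tongue_tip_features(trajectory, fps)
```

On real videos the trajectory would likely need smoothing before differencing, since frame-level segmentation noise is amplified in speed and even more in acceleration.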

Results. All models achieved >99.4% accuracy for tongue segmentation. While U-Net++ generally outperformed U-Net across encoders, U-Net with EfficientNet-B5 produced the highest performance (IoU = 87.74 ± 10.00). There were significant negative correlations between ALSBDI-R total score and tongue tip frequency (rs(295) = −0.68, p < .001), 95th percentile speed (rs(295) = −0.43, p < .001), and 95th percentile acceleration (rs(295) = −0.57, p < .001).
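For readers unfamiliar with the two segmentation metrics reported above, the following minimal sketch computes intersection over union (IoU) and pixel accuracy for a pair of binary masks. The function name and the toy masks are hypothetical and are not data from the study. It also illustrates a general property of these metrics: when the target region covers few pixels, pixel accuracy can be high even when IoU is modest, because correctly labelled background dominates the count.

```python
def iou_and_pixel_accuracy(pred, truth):
    """pred, truth: equally sized 2D lists of 0/1 (1 = tongue pixel)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # tongue predicted, tongue annotated
            elif p and not t:
                fp += 1      # tongue predicted, background annotated
            elif t:
                fn += 1      # background predicted, tongue annotated
            else:
                tn += 1      # background in both
    union = tp + fp + fn
    iou = tp / union if union else 1.0   # two empty masks agree perfectly
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return iou, accuracy


# Toy 4x4 masks: the annotated tongue covers 2 pixels, the prediction
# overlaps one of them and adds one false-positive pixel.
truth_mask = [[0, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]]
pred_mask = [[0, 0, 0, 0],
             [0, 1, 0, 0],
             [0, 1, 0, 0],
             [0, 0, 0, 0]]
iou, accuracy = iou_and_pixel_accuracy(pred_mask, truth_mask)
```

In this toy case the accuracy is 0.875 while the IoU is only one third, which is why overlap metrics are usually reported alongside accuracy for small structures.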

Discussion. Remote tongue localization in uncontrolled videos proved feasible. Quantitative tongue motion features show at least moderate correlations with clinical scores in the expected direction, indicating their potential as biomarkers for MND progression.

2026 Trainee Rounds-Session2

Contact

Dominic Ali
Communications Specialist
d.ali@utoronto.ca 647-378-6425