Researchers found that a hybrid “2M+1H” model—combining two independently trained machine-learning systems with limited gastroenterologist adjudication—matched the accuracy of traditional expert central reading while cutting human review workload by more than 80%. In an analysis of 150 full-length endoscopy videos, the AI-led approach achieved strong agreement with expert reference standards and showed comparable performance for key trial endpoints such as endoscopic improvement and remission.
Just as striking, the study exposed a vulnerability in current practice: nearly one in six videos received a different final score depending on which human readers were assigned. By using AI as the first-pass scorer and bringing in physicians only to resolve disagreements, the hybrid model reduced reader variability and delivered more reproducible results.
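
For readers who want to see the workflow in concrete terms, the adjudication logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' implementation: it assumes each model outputs an integer score per video, that a "disagreement" means the two model scores differ, and that the physician's decision is final. The names `hybrid_read`, `read_batch`, and `adjudicate` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a human adjudicator receives both model scores
# and returns the final score for the video.
Adjudicator = Callable[[int, int], int]


def hybrid_read(score_a: int, score_b: int, adjudicate: Adjudicator) -> Tuple[int, bool]:
    """Return (final_score, human_needed) for one video.

    Assumption: the two models' scores are accepted when they match,
    and a physician is consulted only when they differ.
    """
    if score_a == score_b:
        return score_a, False
    return adjudicate(score_a, score_b), True


def read_batch(
    pairs: Sequence[Tuple[int, int]], adjudicate: Adjudicator
) -> Tuple[List[int], float]:
    """Score a batch of videos and report the share that needed human review."""
    results = [hybrid_read(a, b, adjudicate) for a, b in pairs]
    finals = [score for score, _ in results]
    needed = sum(1 for _, human in results if human)
    human_fraction = needed / len(results) if results else 0.0
    return finals, human_fraction


if __name__ == "__main__":
    # Toy data, not study data: four videos, one model disagreement.
    toy_pairs = [(0, 0), (2, 3), (1, 1), (3, 3)]
    # Stand-in adjudicator for the demo; a real one would be a physician's read.
    finals, human_fraction = read_batch(toy_pairs, lambda a, b: max(a, b))
    print(finals, human_fraction)
```

In this toy run only one of the four videos reaches the adjudicator, which is the mechanism behind the workload reduction: human effort scales with the rate of model disagreement, not with the number of videos.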
