Variability in grading diabetic retinopathy using retinal photography and its comparison with an automated deep learning diabetic retinopathy screening software

Abstract

Background: Diabetic retinopathy (DR) screening using colour retinal photographs is cost-effective and time-efficient. In real-world clinical settings, DR severity is frequently graded by individuals with different levels of expertise. We aimed to determine the agreement in DR severity grading between human graders of varying expertise and an automated deep learning DR screening software (ADLS). Methods: Two hundred macula-centred fundus photographs were graded by retinal specialists, ophthalmology residents, family medicine physicians, medical students, and the ADLS using the International Clinical DR Disease Severity Scale. Based on referral urgency, gradings were categorised as no referral, non-urgent referral, or urgent referral to an ophthalmologist. Inter-observer and intra-group variability were analysed using Gwet’s agreement coefficient, and the performance of the ADLS was evaluated using sensitivity and specificity. Results: Agreement coefficients ranged from fair to very good for inter-observer variability and from moderate to good for intra-group variability. The ADLS achieved areas under the curve (AUC) of 0.879, 0.714, and 0.836 for non-referable DR, non-urgent referable DR, and urgent referable DR, respectively, with varying sensitivity and specificity. Conclusion: Inter-observer and intra-group agreement among human graders varies widely, but the ADLS is a reliable and reasonably sensitive tool for mass screening to detect referable DR and urgent referable DR.
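The agreement analysis above uses Gwet’s first-order agreement coefficient (AC1), a chance-corrected statistic that remains stable when category prevalence is skewed (a known weakness of Cohen’s kappa). As a minimal sketch only, not the study’s actual analysis code, the function below computes AC1 for two raters on a nominal scale; the grade data are hypothetical:

```python
from collections import Counter

def gwet_ac1(ratings_a, ratings_b, categories):
    """Gwet's AC1 chance-corrected agreement for two raters
    on a nominal scale (e.g. DR referral grades)."""
    n = len(ratings_a)
    k = len(categories)
    # Observed agreement: fraction of subjects both raters grade identically.
    pa = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # pi_q: mean proportion of all ratings (both raters) in category q.
    counts = Counter(ratings_a) + Counter(ratings_b)
    # Chance agreement under Gwet's model: sum of pi_q * (1 - pi_q) / (k - 1).
    pe = sum((counts[q] / (2 * n)) * (1 - counts[q] / (2 * n))
             for q in categories) / (k - 1)
    return (pa - pe) / (1 - pe)

# Hypothetical referral grades from two graders
# (0 = no referral, 1 = non-urgent referral, 2 = urgent referral).
a = [0, 0, 1, 2, 1, 0, 2, 1, 0, 2]
b = [0, 1, 1, 2, 1, 0, 2, 2, 0, 2]
print(gwet_ac1(a, b, [0, 1, 2]))
```

The full study computes this coefficient pairwise across graders (inter-observer) and within each expertise group (intra-group); this sketch covers only the two-rater case.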

Keywords

automated screening software; deep learning; diabetic retinopathy; grading; variability

Link to Publisher Version (URL)

https://doi.org/10.3390/healthcare11121697
