As the prevalence of glaucoma increases worldwide, there has been growing focus on the use of AI in glaucoma diagnosis and disease monitoring. In one study, Maetschke et al. developed a deep learning algorithm that accurately distinguished glaucomatous from healthy eyes using raw, unsegmented OCT images of the optic nerve head. This represents an advancement, as prior AI-based approaches were segmentation-based and required multiple inputs such as peripapillary RNFL thickness and cup-to-disc ratio. In another study, Shuldiner et al. developed a machine learning (ML) algorithm that showed modest success in identifying eyes at risk for rapid glaucoma progression based on initial visual field (VF) testing. Further research in this area is focusing on potential AI-powered home detection and monitoring systems for glaucoma.
In 2021, an Israeli startup, Ophthalmic Sciences, unveiled the world's first AI-based contactless intraocular pressure (IOP) measuring device. The device, known as IOPerfect™, is a virtual reality (VR)-like headset that can be operated at home without any eye drops, giving it great potential for remote monitoring of glaucoma. The company states that IOP can be checked in two minutes and that the measurement is unaffected by corneal thickness, thanks to its proprietary AI-based algorithm, which analyzes the vascular pressure response. The device can be used not only at home but also in non-ophthalmic care settings such as emergency rooms, primary care offices, nursing homes, and even pharmacies. Further clinical trials are ongoing, and Ophthalmic Sciences is seeking FDA approval in 2023.
…And upcoming challenges
There are many limitations to the use of AI in glaucoma care. One major concern is large-scale implementation: while many AI-based algorithms have shown success in various aspects of glaucoma care, actual adoption in day-to-day care has been limited. For example, while IOPerfect™ has shown great promise, it has not yet received FDA approval for sale in the US. Additionally, the large-scale trials often required for FDA approval of such devices can be very expensive to conduct. If this cost is passed on to the patient, it may limit the device's utility and accessibility.
1. Ahuja AS, Bommakanti S, Wagner I, Dorairaj S, Ten Hulzen RD, Checo L. Current and Future Implications of Using Artificial Intelligence in Glaucoma Care. J Curr Ophthalmol. 2022 Jul 26;34(2):129-132. doi: 10.4103/joco.joco_39_22. PMID: 36147268; PMCID: PMC9486995.
2. Maetschke S, Antony B, Ishikawa H, Wollstein G, Schuman J, Garnavi R. A feature agnostic approach for glaucoma detection in OCT volumes. PLoS One. 2019 Jul 1;14(7):e0219126. doi: 10.1371/journal.pone.0219126. PMID: 31260494; PMCID: PMC6602191.
3. Shuldiner SR, Boland MV, Ramulu PY, De Moraes CG, Elze T, Myers J, Pasquale L, Wellik S, Yohannan J. Predicting eyes at risk for rapid glaucoma progression based on an initial visual field test using machine learning. PLoS One. 2021 Apr 16;16(4):e0249856. doi: 10.1371/journal.pone.0249856. PMID: 33861775; PMCID: PMC8051770.
From the coding lab...
Diabetic retinopathy (DR) is a leading cause of blindness in adults worldwide, and early, accurate screening is essential for diagnosis and prevention of vision loss. Screening has traditionally been performed through in-person ophthalmic examination and more recently has included hybrid models of care, such as retinal imaging in primary care settings followed by remote telemedical interpretation. Another rapidly advancing area of research is the use of artificial intelligence (AI) algorithms for automated DR detection. DR is an ideal target for automated diagnosis given its high prevalence and reliance on screening, which enables the creation of large retinal imaging data banks that can be used to train and test AI algorithms. These algorithms can also be developed using several types of single-modality retinal imaging, such as fundus photography, optical coherence tomography (OCT), or OCT angiography (OCTA). AI-based automated DR detection is a promising avenue for expanding eye care, specifically by improving patient access and streamlining referral of cases to specialists.
…To Food and Drug Administration (FDA) clearance
There has been a growing number of studies evaluating AI algorithms for DR, many of which utilize machine learning (a subset of AI). Two AI systems have thus far received FDA clearance after large multicenter prospective clinical trials: IDx-DR in 2018 and EyeArt in 2020. IDx-DR is an autonomous deep learning system with an operator-assistive AI that helps technicians without prior imaging experience capture high-quality fundus images. In the pivotal trial that led to its clearance, IDx-DR was compared to stereoscopic fundus photos and OCT data obtained and analyzed by experienced personnel for the diagnosis of more-than-mild DR (MTMDR) and diabetic macular edema. The AI system yielded a sensitivity of 87.2% (95% CI 81.8–91.2%) and specificity of 90.7% (95% CI 88.3–92.7%), demonstrating sufficient capability to diagnose MTMDR in non-ophthalmic care settings. EyeArt is a second autonomous point-of-care deep learning system that was evaluated with a similar study design; it successfully diagnosed MTMDR with a sensitivity of 95.5% (95% CI 92.4–98.5%) and specificity of 85% (95% CI 82.6–87.4%). EyeArt was also able to diagnose vision-threatening DR, which typically requires more urgent intervention and is responsive to treatment during early stages, with a sensitivity of 95.1% (95% CI 90.1–100%) and specificity of 89.0% (95% CI 87.–91.1%).
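As a sketch of where point estimates and 95% confidence intervals like those reported above come from, the snippet below computes a sensitivity and a Wilson score interval from raw trial counts. The counts are invented for illustration (not the actual trial data), and the Wilson interval is only one common choice of method.

```python
# Sensitivity with a Wilson score 95% CI, computed from invented counts.
import math

def sens_with_wilson_ci(tp, fn, z=1.96):
    """Sensitivity = TP / (TP + FN), with a Wilson score interval."""
    n = tp + fn
    p = tp / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return p, center - half, center + half

# Hypothetical counts: 171 true positives, 25 false negatives.
p, lo, hi = sens_with_wilson_ci(tp=171, fn=25)
print(f"sensitivity {p:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```

The same function applies to specificity by passing true-negative and false-positive counts instead.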
…To the future
EyeArt and IDx-DR are among the many promising AI systems designed for DR that are either commercially available or in the pipeline. As these systems gain further clinical traction, future areas of interest include applying AI algorithms to mixed imaging data, similar to how an ophthalmologist integrates OCT and fundus photo findings before making a clinical judgment. Platforms such as mobile applications are another area of tremendous interest and may further improve access. Finally, it remains to be seen how AI algorithms will be adapted for newer diagnostic imaging modalities such as ultra-widefield imaging, which enables detection of peripheral retinal lesions that can represent early DR.
1. Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med. 2018;1(1). doi:10.1038/S41746-018-0040-6
2. Horie S, Ohno-Matsui K. Progress of Imaging in Diabetic Retinopathy-From the Past to the Present. Diagnostics (Basel, Switzerland). 2022;12(7):1684. doi:10.3390/DIAGNOSTICS12071684
3. Ipp E, Liljenquist D, Bode B, et al. Pivotal Evaluation of an Artificial Intelligence System for Autonomous Detection of Referrable and Vision-Threatening Diabetic Retinopathy. JAMA Netw open. 2021;4(11). doi:10.1001/JAMANETWORKOPEN.2021.34254
4. Saleh GA, Batouty NM, Haggag S, et al. The Role of Medical Image Modalities and AI in the Early Detection, Diagnosis and Grading of Retinal Diseases: A Survey. Bioengineering. 2022;9(8):366. doi:10.3390/BIOENGINEERING9080366
5. Wu JH, Liu TYA, Hsu WT, Ho JHC, Lee CC. Performance and Limitation of Machine Learning Algorithms for Diabetic Retinopathy Screening: Meta-analysis. J Med Internet Res. 2021;23(7). doi:10.2196/23863
While anti-VEGF therapy has become the standard of care for patients with neovascular age-related macular degeneration (nAMD), it remains difficult to predict treatment response and optimize the treatment regimen. With new therapies on the horizon, treatment decisions will become increasingly complex. It is possible that artificial intelligence (AI) can be used to help determine optimal treatment regimens for patients with nAMD. Studies focusing on the anatomic response to anti-VEGF treatment on OCT have shown success using AI-based convolutional neural networks. For example, one study was able to predict the effectiveness of anti-VEGF treatment on choroidal neovascularization (CNV) or cystoid macular edema with an area under the curve of 0.81.
A pilot study performed by Genentech/Roche successfully developed a machine learning algorithm to predict the treatment regimen for patients with nAMD. The investigators evaluated 324 patients who had received ranibizumab or faricimab for the treatment of nAMD, analyzing their baseline characteristics along with spectral-domain OCT scans. One algorithm they developed, XGBoost, showed the greatest success in predicting the treatment interval for these patients. This algorithm shows real-world promise that, in the future, we may be able to optimize the clinical regimen to allow for the least frequent treatment interval while maintaining best-corrected visual acuity (BCVA) outcomes. It must be noted that this is only a pilot study based on a small patient population.
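The general shape of such a model can be sketched as below, using scikit-learn's gradient boosting as a stand-in for XGBoost. The feature names, the dosing-interval classes, and all data are invented for illustration; this is not the study's actual feature set or pipeline.

```python
# Hypothetical sketch: predicting an anti-VEGF dosing interval from
# baseline features with a gradient-boosted classifier.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 324  # cohort size reported in the pilot study
X = np.column_stack([
    rng.normal(78, 8, n),    # age in years (hypothetical feature)
    rng.normal(65, 10, n),   # baseline BCVA in ETDRS letters (hypothetical)
    rng.normal(320, 60, n),  # central subfield thickness in um (hypothetical)
])
# Synthetic target: 0 = q4w, 1 = q8w, 2 = q12w dosing interval,
# tied loosely to retinal thickness (thicker retina -> shorter interval)
# so the model has some signal to learn.
y = 2 - np.digitize(X[:, 2], [300, 360])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {model.score(X_te, y_te):.2f}")
```

In a real study, the target would come from clinicians' actual dosing decisions and performance would be validated far more carefully than a single train/test split.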
…And upcoming challenges
Still, a downside to an AI-based approach is the bias inherited from the datasets used to develop these models. For nAMD specifically, there are a limited number of large datasets with high-quality OCT data, and multiple study groups have used these same datasets. This means the current models are based on datasets of nAMD patients that may be less demographically diverse than real-world populations. Though AI has significant potential in guiding treatment of nAMD, limitations remain, and further development will be needed before applications reach actual clinical practice. However, if algorithms can be validated on larger and more diverse datasets, the results may be extrapolated to a far broader population and potentially transform clinical practice.
1. Ferrara D, Newton EM, Lee AY. Artificial intelligence-based predictions in neovascular age-related macular degeneration. Curr Opin Ophthalmol. 2021 Sep 1;32(5):389-396. doi: 10.1097/ICU.0000000000000782. PMID: 34265783; PMCID: PMC8373444.
2. Ramessur R, Raja L, Kilduff CLS, Kang S, Li JO, Thomas PBM, Sim DA. Impact and Challenges of Integrating Artificial Intelligence and Telemedicine into Clinical Ophthalmology. Asia Pac J Ophthalmol (Phila). 2021 May-Jun 01;10(3):317-327. doi: 10.1097/APO.0000000000000406. PMID: 34383722.
Asia-Pacific Journal of Ophthalmology
To develop a robust AI algorithm, it is important to consider the six "rights" that guide the translation of AI technologies from bench to bedside.
1) The right "intended use" environment
The intended use should be a digital and clinical ecosystem that is ready to have the AI algorithm plugged in. It should also take into account the market size, scalability, and finances needed to achieve clinical impact.
2) The right “training and testing” dataset
a) A training dataset should be representative of the AI algorithm’s target population.
b) A testing dataset should be sufficiently large to detect a difference within the intended use environment with adequate power.
c) Datasets need to be deidentified and anonymized to comply with data privacy and patient confidentiality rules.
3) The right “techniques”
Technical partners should be identified to help clinicians develop an AI algorithm, e.g., machine learning for structured data, deep learning for images.
4) The right “reporting guidelines”
International reporting guidelines (STARD, CONSORT, SPIRIT-AI) should be followed when reporting AI diagnostic performance.
5) The right “enabler”
The AI algorithm should be deployable through the cloud, standalone workstations, or imaging devices.
6) The ethical rights of the patients
Ethical standards should be agreed upon by all stakeholders before the wide implementation of AI in medicine.
Author Insights – Dr. Al-Aswad
The combination of human intelligence and artificial intelligence, or what is called augmented intelligence, is guaranteed to transform healthcare delivery and management in the near future. As with any emerging field or technology, there is the promise of endless benefits but also the possibility of harm.1 In this editorial we described the six "rights" to consider today, but with further research and implementation we will most likely identify more.
For example, in the context of health disparities, could augmented intelligence be applied towards precision medicine approaches that can be deployed across the world to balance health outcomes? Conversely, will the benefits of augmented intelligence be available to everyone or only the wealthy?
To achieve some of these goals, we need a better understanding of the models, their outputs, and their intended roles in healthcare paradigms. This can only occur with standardization and transparency in the field. If you look at current AI research, at least in ophthalmology, you find a variety of models and reporting without consensus or standardization. The field of AI is, in my opinion, currently similar to the Wild West; with time, it will gain more rules, regulations, and standards for reporting and implementation.
Asia-Pacific Journal of Ophthalmology
As populations worldwide age, it has become increasingly difficult for health systems to keep up with growing eye care needs. Even in developed nations such as the United States, the wait time to see an eye care provider has become substantial, and studies have shown that earlier intervention can help prevent permanent vision impairment. There are well-known gaps in population-level screening for major eye diseases such as diabetic retinopathy (DR), age-related macular degeneration (AMD), and glaucoma. While digital ophthalmology solutions such as teleophthalmology have helped to improve clinical outcomes, their benefit is constrained by scalability. AI-based technologies can help to alleviate this constraint by scaling up eye screening and primary care services. Based upon the well-understood relationships between clinical features and disease severity in major eye diseases such as glaucoma and DR, AI can be harnessed to perform initial screening. In the last 5-10 years, a branch of AI known as deep learning (DL) has emerged that allows an algorithm to "self-learn" the predictive features used to classify diagnosis or severity. DL algorithms have shown clinically acceptable performance in classifying ophthalmic imaging data in various eye diseases such as DR. AI-based algorithms have also been developed that detect AMD on color fundus photographs (CFPs) and predict AMD progression using OCT. Other possible applications include automating the selection of intraocular lenses during preoperative planning for cataract surgery. It must be noted that there are significant challenges to the practical application of AI in ophthalmology. However, AI has already shown tremendous promise in the clinical setting, and novel solutions are already in the works.
AI Insights – Implementations at a Glance
Diabetic retinopathy - One example of AI-based technology already in use for the diagnosis of DR is EyeArt, an automatic DR detection device that is commercially available in the EU and Canada. When measured against the Messidor-2 dataset, its referable-DR screening sensitivity was 93.8%, with a specificity of 72.2%. EyeArt was also used in the first AI-based DR screening study relying on smartphone app-based fundus images, achieving a sensitivity of 95.8% for any DR, 99.3% for referable DR, and 99.1% for sight-threatening DR, with specificities of 80.2%, 68.8%, and 80.4%, respectively.
AMD - Several studies have demonstrated success using AI in the diagnosis of AMD. One research group developed an OCT-based DL algorithm to evaluate the need for anti-VEGF treatment; trained on 150,000 OCT line scans and validated on 5,358, it achieved an AUC of 96.8%, a sensitivity of 90.1%, and a specificity of 96.2%.
IOL Calcs - There are currently several AI-based formulas for intraocular lens (IOL) power calculation that demonstrate high levels of accuracy, such as the Hill-Radial Basis Function (RBF) calculator, the Kane formula, the PEARL-DGS formula, and the Ladas formula. A new model using various ML techniques was able to predict 90% and 100% of eyes within ±0.5 D and ±1.0 D, respectively.
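The "percentage of eyes within ±0.5 D and ±1.0 D" accuracy metric used to benchmark IOL formulas is simple to compute once predicted and achieved refractions are known. The values below are invented for illustration only:

```python
# Fraction of eyes whose refractive prediction error falls within
# +/-0.5 D and +/-1.0 D, the standard benchmark for IOL formulas.
import numpy as np

# Invented predicted vs. postoperative achieved refractions (diopters).
predicted = np.array([-0.10, 0.25, -0.40, 0.60, 0.05, -0.75, 0.30, 0.45])
achieved  = np.array([ 0.00, 0.10, -0.20, 0.40, 0.00, -0.90, 0.95, 0.55])
error = predicted - achieved  # prediction error in diopters

within_half = np.mean(np.abs(error) <= 0.5)
within_one  = np.mean(np.abs(error) <= 1.0)
print(f"within ±0.5 D: {within_half:.0%}, within ±1.0 D: {within_one:.0%}")
```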
1. Grzybowski A, Brona P, Lim G, et al. Artificial intelligence for diabetic retinopathy screening: a review. Eye. 2020;34:451-460. doi: 10.1038/s41433-019-0566-0
2. Nuzzi R, Boscia G, Marolo P, Ricardi F. The Impact of Artificial Intelligence and Deep Learning in Eye Diseases: A Review. Front Med (Lausanne). 2021 Aug 30;8:710329. doi: 10.3389/fmed.2021.710329. PMID: 34527682; PMCID: PMC8437147.
3. Rampat R, Deshmukh R, Chen X, et al. Artificial Intelligence in Cornea, Refractive Surgery, and Cataract: Basic Principles, Clinical Applications, and Future Directions. Asia Pac J Ophthalmol (Phila). 2021;10(3):268-281. Published 2021 Jul 1. doi: 10.1097/APO.0000000000000394
The optic disc may tell us more than what catches the eye. A convolutional neural network was trained to predict global retinal nerve fiber layer (RNFL) thickness using 86,123 pairs of fundus photos matched with spectral-domain optical coherence tomography (SD-OCT) images from patients with glaucoma or glaucoma suspects at the Duke Eye Center. RNFL thickness predicted from fundus photos correlated significantly with observed thickness on OCT (r = 0.76). Fundus photos could also discriminate progressors from nonprogressors (AUC = 0.86). While fundus photos will never replace OCT, especially given the decreased accuracy and lack of sectoral measurements, they may be a useful tool in limited-resource settings where OCT is not available.
AI Insights - AUC and ROC
AUC (Area Under the Curve) is a commonly used metric in AI representing the area under the ROC (Receiver Operating Characteristic) curve. Every test has a sensitivity and specificity, but many AI algorithms are tunable to different thresholds that trade off sensitivity against specificity. The ROC curve is a graph of all the test's sensitivity/specificity pairs, and a larger area under the curve represents a better overall sensitivity/specificity trade-off, expressed as a decimal with a maximum of 1.0. It is important to note that a random classifier between two possibilities (such as progressors vs. nonprogressors) will have an AUC of 0.5, which should therefore be considered the baseline minimum. Because sensitivity and specificity are not affected by the prevalence of the disease being studied, the AUC reflects algorithmic performance and may not reflect real-world accuracy after accounting for disease prevalence.
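The construction can be made concrete with a toy example: sweep every classification threshold, collect the resulting (false positive rate, true positive rate) pairs, and integrate. The scores and labels below are invented; 1 marks a progressor, 0 a nonprogressor.

```python
# Toy illustration of an ROC curve and its AUC.
import numpy as np

labels = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.55, 0.9])

# Sweep thresholds from high to low, recording one (FPR, TPR) point each.
fpr, tpr = [0.0], [0.0]
for t in np.sort(np.unique(scores))[::-1]:
    pred = scores >= t
    tpr.append((pred & (labels == 1)).sum() / (labels == 1).sum())  # sensitivity
    fpr.append((pred & (labels == 0)).sum() / (labels == 0).sum())  # 1 - specificity

# Trapezoidal area under the (FPR, TPR) curve.
auc = sum((f2 - f1) * (t1 + t2) / 2
          for f1, f2, t1, t2 in zip(fpr, fpr[1:], tpr, tpr[1:]))
print(f"AUC = {auc:.3f}")  # 0.5 is chance level; 1.0 is a perfect classifier
```

Equivalently, the AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.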
Translational Vision Science & Technology
Can machine learning algorithms stand out in a crowded field? Six different machine learning (ML) algorithms were trained to identify glaucomatous progression using 90,713 visual fields from 13,516 eyes across five institutions. Their performance was compared to six existing progression algorithms using clinical expert labels as the reference. The ML algorithms classified progressed versus stable fields with performance similar to or better than the individual or ensemble progression algorithms. The authors also found that traditional algorithms had a significant tendency to call "unclear" patterns consistently progressing or stable, while ML algorithms showed no significant bias in these borderline cases. Visual field progression remains an important problem in the management of glaucoma, and machine learning algorithms demonstrate equivalent or better accuracy than conventional progression algorithms with less bias in borderline progression cases.
AI Insights - Ground Truth Labels
“Ground truth” is an important concept for the training and evaluation of AI algorithms. In particular, supervised tasks (i.e., those where labels are explicitly provided for algorithm training) rely on valid signals to avoid learning biased or faulty patterns in the data. Therefore, great care should be taken to avoid using proxy labels without extensive consideration and verification of assumptions. For example, ICD-10 codes are a relatively simple proxy to acquire for diagnostic labels; however, anyone who has used an EMR for a patient with multiple hospitalizations would agree that these labels are often not rigorously evaluated for precision. Typically, expert panel judgment is considered “ground truth” for many clinical problems. However, in large datasets, manual assessment of the entire dataset may be time- and cost-prohibitive. Ideally, “ground truth” labels would be used both when training and when testing supervised models; when “ground truth” labels are difficult to acquire, they should at least be used for testing and evaluation. In this work, the authors trained their machine learning algorithms on a training dataset with proxy labels generated from the majority decision of the traditional progression algorithms, but evaluated performance on a testing dataset with expert panel labels. The authors deliberately explored their training data and made various algorithmic design choices to account for their use of proxy labels. While the algorithms were not necessarily fully optimized to outperform the consensus prediction, the inferences made from the testing dataset were still based on “ground truth” labels.
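The train-on-proxy, test-on-ground-truth pattern described above can be sketched in a few lines. Everything here is invented for illustration: the data, the 15% proxy-label noise rate, and the choice of logistic regression as the model.

```python
# Illustrative sketch: train on cheap proxy labels, evaluate against
# ground-truth labels on held-out data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 5))
truth = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in "expert panel" labels
flip = rng.random(n) < 0.15                        # proxy disagrees 15% of the time
proxy = np.where(flip, 1 - truth, truth)

# Train on the noisy proxy labels...
model = LogisticRegression().fit(X[:1500], proxy[:1500])
# ...but report held-out performance against the ground-truth labels.
acc_truth = model.score(X[1500:], truth[1500:])
acc_proxy = model.score(X[1500:], proxy[1500:])
print(f"held-out accuracy vs ground truth: {acc_truth:.2f}")
print(f"held-out accuracy vs proxy labels: {acc_proxy:.2f}")
```

With symmetric label noise, the model can still recover the true decision boundary, so its accuracy measured against ground truth is typically higher than its accuracy measured against the noisy proxy itself, which is exactly why evaluation should use the best available labels.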