Abstract
Humans have been shown to exhibit biases when reading medical images, raising the question of whether human disease gradings are uniform. Artificial intelligence (AI) tools trained on human-labeled data may inherit this human non-uniformity. In this study, we used a radiographic knee osteoarthritis external validation dataset of 50 patients and a six-year retrospective consecutive clinical cohort of 8,273 patients. An FDA-approved and CE-marked AI tool was tested for potential non-uniformity in Kellgren-Lawrence grades between the right and left sides of the images: we flipped the images horizontally so that a left knee appeared as a right knee and vice versa, and compared the resulting grades. Under human review, the AI tool showed non-uniformity, with 20–22% of gradings disagreeing on the external validation dataset and 13.6% on the clinical cohort. However, we found no evidence of a significant difference in accuracy compared to senior radiologists on the external validation dataset, nor of age or sex bias on the cohort. AI non-uniformity can boost the evaluated performance against humans, but image areas with inferior performance should be investigated.
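A minimal sketch of the flip test described above, for illustration only: `grade_knee` is a hypothetical stand-in for the AI tool's Kellgren-Lawrence inference call, whose actual API is not given in the abstract.

```python
# Minimal sketch of the horizontal-flip uniformity check described in the
# abstract; grade_knee() is a hypothetical stand-in for the AI tool's
# Kellgren-Lawrence inference, whose real API is not given in the source.
from pathlib import Path

from PIL import Image, ImageOps


def grade_knee(image: Image.Image) -> int:
    """Hypothetical placeholder: return the Kellgren-Lawrence grade (0-4)."""
    raise NotImplementedError("replace with the vendor tool's inference call")


def flip_disagreement_rate(image_paths: list[Path]) -> float:
    """Fraction of radiographs whose KL grade changes after a horizontal flip.

    Mirroring swaps the apparent laterality (a left knee is presented as a
    right knee and vice versa), so a side-uniform model should return the
    same grade for both versions of each image.
    """
    disagreements = 0
    for path in image_paths:
        original = Image.open(path).convert("L")  # grayscale radiograph
        mirrored = ImageOps.mirror(original)      # flip left-right
        if grade_knee(original) != grade_knee(mirrored):
            disagreements += 1
    return disagreements / len(image_paths)
```

The disagreement rate this returns corresponds to the 20–22% and 13.6% figures reported above, under the assumption that grades for the original and mirrored images are compared one-to-one.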
Original language | English
---|---
Article number | 26782
Journal | Scientific Reports
Volume | 14
Issue number | 1
Number of pages | 9
ISSN | 2045-2322
DOIs | 
Publication status | Published - 2024
Bibliographical note
Publisher Copyright: © The Author(s) 2024.
Keywords
- Artificial intelligence
- Bias
- Clinical data
- Knee osteoarthritis
- Laterality
- Uniform performance