TY - JOUR
T1 - External validation of an artificial intelligence tool for radiographic knee osteoarthritis severity classification
AU - Brejnebøl, Mathias Willadsen
AU - Hansen, Philip
AU - Nybing, Janus Uhd
AU - Bachmann, Rikke
AU - Ratjen, Ulrik
AU - Hansen, Ida Vibeke
AU - Lenskjold, Anders
AU - Axelsen, Martin
AU - Lundemann, Michael
AU - Boesen, Mikael
N1 - Publisher Copyright: © 2022 The Authors
PY - 2022
Y1 - 2022
N2 - Purpose: To externally validate an artificial intelligence (AI) tool for radiographic knee osteoarthritis severity classification on a clinical dataset. Method: This retrospective, consecutive patient sample, external validation study used weight-bearing, non-fixed-flexion posterior-anterior knee radiographs from a clinical production PACS. The index test was ordinal Kellgren-Lawrence grading by an AI tool, two musculoskeletal radiology consultants, two reporting technologists, and two resident radiologists. Grading was repeated by all readers after at least four weeks. The reference test was the consensus of the two consultants. The primary outcome was quadratic weighted kappa. Secondary outcomes were ordinal weighted accuracy, multiclass accuracy and F1-score. Results: 50 consecutive patients between September 24, 2019 and October 22, 2019 were retrospectively included (3 excluded) totaling 99 knees (1 excluded). Quadratic weighted kappa for the AI tool and the consultant consensus was 0.88 CI95% (0.82–0.92). Agreement between the consultants was 0.89 CI95% (0.85–0.93). Intra-rater agreements for the consultants were 0.96 CI95% (0.94–0.98) and 0.94 CI95% (0.91–0.96) respectively. For the AI tool it was 1 CI95% (1–1). For the AI tool, ordinal weighted accuracy was 97.8% CI95% (96.9–98.6%). Average multiclass accuracy and F1-score were 84% (83/99) CI95% (77–91%) and 0.67 CI95% (0.51–0.81). Conclusions: The AI tool achieved the same good-to-excellent agreement with the radiology consultant consensus for radiographic knee osteoarthritis severity classification as the consultants did with each other.
KW - Artificial intelligence
KW - Conventional radiography
KW - External validation
KW - Inter-rater agreement
KW - Knee osteoarthritis
U2 - 10.1016/j.ejrad.2022.110249
DO - 10.1016/j.ejrad.2022.110249
M3 - Journal article
C2 - 35338955
AN - SCOPUS:85126866648
VL - 150
JO - European Journal of Radiology
JF - European Journal of Radiology
SN - 0720-048X
M1 - 110249
ER -