Early Colorectal Cancer Detected by Machine Learning Model Using Gender, Age, and Complete Blood Count Data

Hornbrook MC, Goshen R, Choman E, O’Keeffe-Rosetti M, Kinar Y, Liles EG, Rust KC

Dig. Dis. Sci. 2017 Oct;62(10):2719-2727

PMID: 28836087

Abstract

BACKGROUND: Machine learning tools identify patients with blood counts indicating greater likelihood of colorectal cancer and warranting colonoscopy referral.

AIMS: To validate a machine learning colorectal cancer detection model on a US community-based insured adult population.

METHODS: Eligible colorectal cancer cases (439 females, 461 males) with complete blood counts before diagnosis were identified from the Tumor Registry of Kaiser Permanente Northwest Region (KPNW). Control patients (n = 9108) were randomly selected from KPNW members who had no cancers, had received ≥1 blood count, had continuous enrollment from 180 days before the blood count through 24 months after it, and were aged 40-89 years. For each control, one blood count was randomly selected to serve as the pseudo-colorectal cancer diagnosis date for matching to cases, and the control was assigned a “calendar year” based on the date of that count. For each calendar year, 18 controls were randomly selected to match the general enrollment’s 10-year age groups and lengths of continuous enrollment. Prediction performance was evaluated by area under the curve, specificity, and odds ratios.

RESULTS: The area under the receiver operating characteristic curve for detecting colorectal cancer was 0.80 ± 0.01. At 99% specificity, the odds ratio for the association of a high-risk detection score with colorectal cancer was 34.7 (95% CI 28.9-40.4). The detection model had the highest accuracy in identifying right-sided colorectal cancers.
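
ILLUSTRATION: The abstract does not describe how these metrics were computed, so the following is only a minimal sketch of one common way to obtain an area under the curve and an odds ratio at a fixed specificity from case and control scores. The function names (auc, cutoff_at_specificity, odds_ratio) and all scores are hypothetical and made up for illustration; they are not the study's code or data.

```python
import math


def auc(case_scores, control_scores):
    """Fraction of case/control pairs in which the case scores higher (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def cutoff_at_specificity(control_scores, specificity):
    """Score cutoff such that at least `specificity` of controls score at or below it."""
    ordered = sorted(control_scores)
    n = len(ordered)
    # Number of controls that must NOT be flagged; round() guards against float noise.
    needed = math.ceil(round(specificity * n, 9))
    needed = max(1, min(n, needed))
    return ordered[needed - 1]


def odds_ratio(case_scores, control_scores, cutoff):
    """Odds ratio of being a case given a score strictly above the cutoff."""
    a = sum(1 for s in case_scores if s > cutoff)     # flagged cases
    b = len(case_scores) - a                          # unflagged cases
    c = sum(1 for s in control_scores if s > cutoff)  # flagged controls
    d = len(control_scores) - c                       # unflagged controls
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Made-up scores, for illustration only (assumption: higher score = higher risk).
controls = [float(i) for i in range(1, 101)]
cases = [50.0, 99.5, 100.0, 150.0]

cutoff = cutoff_at_specificity(controls, 0.99)
print("cutoff:", cutoff)
print("AUC:", auc(cases, controls))
print("odds ratio at 99% specificity:", odds_ratio(cases, controls, cutoff))
```

In this sketch the cutoff is taken from the control scores alone, so specificity is fixed first and the odds ratio then compares the odds of a flagged score among cases with the odds among controls. A confidence interval, as reported above, would need an additional step that is not shown here.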

CONCLUSIONS: ColonFlag identifies individuals with tenfold higher risk of undiagnosed colorectal cancer at curable stages (0/I/II), flags colorectal tumors 180-360 days prior to usual clinical diagnosis, and is more accurate at identifying right-sided (compared to left-sided) colorectal cancers.