Aims
To evaluate the open-source natural language processing (NLP) tool ‘CogStack’, combined with machine learning (ML), for oncology clinical trial recruitment. Different approaches were explored and compared with existing methods.
Methods
A short-list of potential trial patients with hypervascular tumours was recorded over five weeks using existing targeted manual (human) and semi-automated methods. CogStack’s Named-Entity-Recognition model was used to review the Electronic Medical Records (EMRs) of the 21,050 patient presentations during that period and to identify Unified Medical Language System concepts corresponding to the trial’s biomedical inclusion/exclusion criteria. Each patient’s output was used as input to an ensemble of ML models. Model 1 ranked each patient by similarity to six synthetic patients, one for each cancer of interest. The 100 most similar patients each day were short-listed, and a second ML model then made a binary prediction of patient suitability. The EMRs of the CogStack short-listed patients were manually reviewed for suitability.
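The ranking step of Model 1 can be illustrated with a minimal sketch. The abstract does not state which similarity measure or feature representation was used, so the cosine similarity over sets of extracted concept identifiers below is an assumption, as are the function names, the placeholder concept identifiers, and the two example profiles (the study used six).

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two concept sets treated as binary vectors.

    Assumption: the abstract does not name the similarity measure used.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def shortlist(patients, synthetic_profiles, k=100):
    """Rank patients by their best match to any synthetic profile; keep the top k."""
    scored = []
    for patient_id, concepts in patients.items():
        best_label, best_score = max(
            ((label, cosine(concepts, profile))
             for label, profile in synthetic_profiles.items()),
            key=lambda pair: pair[1],
        )
        scored.append((patient_id, best_label, best_score))
    # Highest similarity first.
    scored.sort(key=lambda row: row[2], reverse=True)
    return scored[:k]


# Hypothetical data: placeholder identifiers, not real UMLS concepts.
synthetic_profiles = {
    "cancer_1": {"CUI_A", "CUI_B", "CUI_C"},
    "cancer_2": {"CUI_D", "CUI_E"},
}
patients = {
    "p1": {"CUI_A", "CUI_B"},
    "p2": {"CUI_D", "CUI_E"},
    "p3": {"CUI_Z"},
}

top_two = shortlist(patients, synthetic_profiles, k=2)
```

Because the synthetic profiles are written directly from the trial criteria, a ranking of this kind needs no labelled training data, which is the property the Conclusions attribute to Model 1.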
Results
The test case was a complex commercial trial with extensive inclusion/exclusion criteria covering multiple cancer types. The existing method yielded 12/25 suitable patients (precision@k = 0.48), identified the first suitable patient in 84.85 hours, and had an average review time of 5.33 minutes per patient.
Model 1 short-listed 137 unique patients, yielding 43/137 suitable patients (precision@k = 0.31), identified the first suitable patient in 2.28 hours (97.31% faster), and had an average review time of 3.08 minutes per patient (42.19% faster). The classification model achieved a weighted-average F1-score of 55% (precision@k = 0.60).
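The headline figures can be recomputed from the counts and times reported above. The sketch below is illustrative only: the function names are hypothetical, and precision@k is taken here to mean suitable patients divided by patients short-listed. Because the inputs are the rounded values printed in the abstract, the review-time reduction comes out at roughly 42.2%, marginally different from the reported 42.19%, presumably because that figure was derived from unrounded times.

```python
def precision_at_k(suitable, shortlisted):
    """Fraction of short-listed patients judged suitable on manual review."""
    return suitable / shortlisted


def percent_reduction(baseline, new):
    """Relative time saving of `new` against `baseline`, as a percentage."""
    return 100.0 * (baseline - new) / baseline


existing_precision = precision_at_k(12, 25)            # existing method
model1_precision = precision_at_k(43, 137)             # Model 1 short-list
first_patient_saving = percent_reduction(84.85, 2.28)  # hours to first suitable patient
review_time_saving = percent_reduction(5.33, 3.08)     # minutes of review per patient

print(f"existing precision@k: {existing_precision:.2f}")
print(f"Model 1 precision@k:  {model1_precision:.2f}")
print(f"time to first suitable patient: {first_patient_saving:.2f}% faster")
print(f"review time per patient: {review_time_saving:.2f}% faster")
```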
Conclusions
This CogStack patient-recruitment test case employed a trial-specific solution. Model 1 demonstrated precision comparable with established methods without requiring training data. CogStack/ML short-listed more suitable patients because it assessed all hospital presentations, and the time saving was significant. The classification model results indicate that, with sufficient training data, CogStack/ML can outperform established methods. The benefits are ‘always-on’ prospective patient identification and reduced time to trial.