QIMR Berghofer


ML Solutions
Extracting Diagnostics From Skin Cancer Pathology Reports

Using supervised machine learning methods to automatically extract diagnostic information from skin cancer pathology reports for QIMR Berghofer Medical Research Institute.


Keratinocyte cancers are the most common cancers in Caucasian populations. In most jurisdictions these cancers are not routinely registered, so estimates of incidence are derived from administrative data that do not distinguish between basal cell carcinomas, squamous cell carcinomas and other diagnoses. Automated extraction of diagnostic information from pathology reports would provide timely and affordable incidence data at a population level.


We employed supervised learning methods to develop algorithms to classify diagnosis (BCC, SCC, keratoacanthoma and intraepidermal carcinoma), number of lesions, and site of lesions from free-text pathology reports. The resulting algorithms were incorporated into a web application capable of processing large numbers of pathology reports.

The training dataset included all pathology reports for participants (including non-skin lesions, benign skin lesions and melanoma). Separate supervised machine learning algorithms were developed for each classification task (i.e., diagnosis and site).

We developed a web application that accepts uploaded pathology reports and analyses the free text on a local server. It can parse and analyse reports in the range of formats used by different laboratories.

To assess ‘real-world’ performance of the algorithms, we compared algorithm-derived output against ‘gold-standard’ data.
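
As an illustration of this kind of comparison, the sketch below scores algorithm output against gold-standard labels separately for each diagnosis. The choice of precision, recall and F1, the function name `per_label_scores`, the label codes and the four example reports are all assumptions made for illustration; the case study does not state which measures were actually reported.

```python
def per_label_scores(predicted, gold, labels):
    """Compare predicted label sets with gold-standard label sets.

    `predicted` and `gold` are parallel lists with one set of diagnosis
    codes per report. Returns precision, recall and F1 for each label.
    """
    scores = {}
    for label in labels:
        pairs = list(zip(predicted, gold))
        tp = sum(1 for p, g in pairs if label in p and label in g)
        fp = sum(1 for p, g in pairs if label in p and label not in g)
        fn = sum(1 for p, g in pairs if label not in p and label in g)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Invented example: four reports, not real study data.
gold = [{"BCC"}, {"SCC"}, {"BCC", "SCC"}, set()]
predicted = [{"BCC"}, {"BCC"}, {"BCC", "SCC"}, set()]

scores = per_label_scores(predicted, gold, ["BCC", "SCC"])
for label, s in scores.items():
    print(label, {k: round(v, 2) for k, v in s.items()})
```

Scoring each diagnosis separately keeps a common class such as BCC from masking weaker performance on a rarer one.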

Challenges and Roadblocks

Because pathology reports often discuss multiple lesions, extracting structured information from them can be very challenging. We implemented a multi-label classification algorithm, which delivered a significant improvement over more traditional approaches.
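
To make the multi-label idea concrete, here is a minimal sketch of one common strategy, binary relevance, in which each diagnosis gets its own present/absent classifier so that a single report can receive several labels at once. It is not the algorithm used in the project: the naive Bayes model, the class name `BinaryRelevanceNB`, the label codes and the six toy reports are all hypothetical stand-ins.

```python
import math
from collections import Counter

LABELS = ["BCC", "SCC", "KA", "IEC"]  # hypothetical diagnosis codes


def tokenize(text):
    return text.lower().split()


class BinaryRelevanceNB:
    """One small multinomial naive Bayes (present vs. absent) per label."""

    def fit(self, texts, label_sets):
        docs = [tokenize(t) for t in texts]
        self.vocab = {w for d in docs for w in d}
        self.models = {}
        for label in LABELS:
            counts = {True: Counter(), False: Counter()}
            n_docs = {True: 0, False: 0}
            for doc, labels in zip(docs, label_sets):
                present = label in labels
                counts[present].update(doc)
                n_docs[present] += 1
            self.models[label] = (counts, n_docs)
        return self

    def _log_score(self, label, doc, present):
        counts, n_docs = self.models[label]
        total_docs = n_docs[True] + n_docs[False]
        # Laplace smoothing on both the class prior and word likelihoods.
        score = math.log((n_docs[present] + 1) / (total_docs + 2))
        total_words = sum(counts[present].values())
        for word in doc:
            if word in self.vocab:  # ignore words never seen in training
                score += math.log((counts[present][word] + 1)
                                  / (total_words + len(self.vocab)))
        return score

    def predict(self, text):
        doc = tokenize(text)
        # Each label is decided independently, so several can be returned.
        return {label for label in LABELS
                if self._log_score(label, doc, True)
                > self._log_score(label, doc, False)}


# Invented toy reports for illustration only, not real pathology data.
train_texts = [
    "nodular basal cell carcinoma left cheek",
    "invasive squamous cell carcinoma right forearm",
    "basal cell carcinoma nose and squamous cell carcinoma scalp",
    "keratoacanthoma left hand",
    "intraepidermal carcinoma right shin",
    "benign naevus back",
]
train_labels = [{"BCC"}, {"SCC"}, {"BCC", "SCC"}, {"KA"}, {"IEC"}, set()]

model = BinaryRelevanceNB().fit(train_texts, train_labels)
print(model.predict("basal cell carcinoma and squamous cell carcinoma"))
```

On this toy data the two-lesion sentence is assigned both BCC and SCC, whereas a single-label classifier would be forced to pick one. A report with no matching diagnosis yields an empty set.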


Supervised learning methods were used to develop a web application capable of accurately and rapidly classifying large numbers of pathology reports for keratinocyte cancers and related diagnoses. In the absence of population-based skin cancer registration, this solution assists with accurately measuring subtype-specific skin cancer incidence.

About QIMR Berghofer

QIMR Berghofer is one of Australia’s most successful medical research institutes, translating discoveries from bench to bedside for a better future of health.

