Cervical cancer screening in the U.S. still faces major challenges, including inconsistent reporting and data that does not fully reflect all populations. To address these challenges, the American Society for Clinical Pathology (ASCP), with an inaugural grant from the Alliance for Women’s Health and Prevention (AWHP), spearheaded the Cervical Cancer Screening Collaborative.
Using large-scale, multi-site datasets and a novel natural language processing (NLP)-based algorithm, the project aims to improve cervical cancer screening accuracy and equity across diverse populations in the U.S.
A preventable cancer still showing troubling gaps
Even with effective screening and HPV testing tools widely available, cervical cancer continues to disproportionately affect women who face barriers to care, particularly women of color and women in rural communities, according to Millicent Gorham, PhD (Hon), MBA, FAAN, Chief Executive Officer of the Alliance for Women’s Health and Prevention.
Those disparities reveal a need for a broader and more inclusive understanding of how screening performs across different populations and geographic regions.
“Cervical cancer is one of the most preventable cancers we have,” Dr. Gorham says. Yet advanced cases rising disproportionately among Black and Hispanic women, women in rural communities, and women with limited access to care are “a signal that the system meant to catch this disease early is not reaching everyone it needs to,” she adds.
What also drew AWHP to ASCP’s Cervical Cancer Screening Collaborative is that much of the U.S.-based evidence informing national screening guidelines comes from a single integrated health system: Kaiser Permanente Northern California. “What we are questioning is whether one health system in one region can tell us what we need to know about cervical cancer screening for every woman in this country. It cannot,” Dr. Gorham says.
Hence their support of research like ASCP’s project, which “broadens the evidence base, so that guidelines, coverage, and access are built on data that reflects the full diversity of women in this country,” Dr. Gorham adds.
A more representative national dataset
The initiative is built upon a large retrospective study drawing data from 10 participating sites across the United States. It combines Pap test results, HPV testing data, and cervical biopsy findings collected over roughly a decade, allowing researchers to analyze screening outcomes longitudinally and across diverse patient populations by age, ZIP code, race, and ethnicity.
Sachin Gupta, PhD, MBA, MLS(ASCPi)MBi, LSSBB, CPHQ, ASCP’s Scientific Director in the Center for Quality and Patient Care, explains that one aim of the project is to study how Pap test diagnostic rates and HPV positivity rates vary across demographics and geographic regions. The project could eventually help validate “whether current screening guidelines are appropriate and up to date or not,” Dr. Gupta says.
By aggregating data from multiple institutions, ASCP hopes to create a broader evidence base that more accurately reflects the populations that current screening guidelines are intended to serve.
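To make the idea of comparing rates across groups concrete, here is a minimal sketch in Python. It is not ASCP's analysis: the records are invented for illustration, and the field names (`site`, `age_group`, `hpv_positive`) and the helper `positivity_rate_by` are hypothetical.

```python
from collections import defaultdict

# Made-up records for illustration only; these are not study data.
records = [
    {"site": "A", "age_group": "30-39", "hpv_positive": True},
    {"site": "A", "age_group": "30-39", "hpv_positive": False},
    {"site": "A", "age_group": "40-49", "hpv_positive": False},
    {"site": "B", "age_group": "30-39", "hpv_positive": True},
    {"site": "B", "age_group": "40-49", "hpv_positive": False},
    {"site": "B", "age_group": "40-49", "hpv_positive": True},
]


def positivity_rate_by(records, key):
    """Return {group: fraction of HPV-positive results} for one field."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        if rec["hpv_positive"]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


rates_by_age = positivity_rate_by(records, "age_group")
rates_by_site = positivity_rate_by(records, "site")
```

The same helper could be pointed at any other grouping field, such as ZIP code or race and ethnicity, provided each site's data has first been mapped to common field names.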
Why cervical biopsy reports are difficult to analyze at scale
Although Pap and HPV test results are often captured in structured fields in a pathology report, cervical biopsy reports frequently exist as narrative “unstructured text,” Dr. Gupta explains, making large-scale analysis and standardized data extraction more difficult.
“The cervical biopsy result can hide within a lengthy description,” Dr. Gupta says. It’s not sustainable to pull these findings out through manual review, “given the rapid growth in documentation and expanding demand for data-driven clinical insights.”
Traditionally, extracting information from these reports would require a person to manually examine patient charts one by one. That process is labor-intensive and prone to human error, especially when analyzing tens of thousands of pathology reports across multiple institutions.
To address that challenge, ASCP is using its Performance & Diagnostics Insights (PDI) platform and has developed an NLP-based approach capable of converting narrative pathology reports into structured, analyzable data. In other words, ASCP developed a tool that can read the pathology reports and automatically sort the information into organized data, making it easier to study and spot patterns across thousands of cases.
Training an NLP model to understand pathology language
Many existing pathology NLP libraries are tissue-specific and do not generalize well to cervical biopsy data, making a custom solution necessary.
“ASCP developed a hybrid framework that integrates deep learning techniques along with BERT (Bidirectional Encoder Representations from Transformers),” Dr. Gupta says. Instead of generating text, the system interprets clinical language and extracts diagnostic information from pathology narratives.
Understanding context is critical because pathologists may phrase findings differently across institutions or reporting styles.
“Every pathologist writes their report a little differently. It’s not a kind of standardized template, so to speak. And a diagnosis can be hidden in a long text string somewhere,” Dr. Gupta says.
The system uses multiple processing steps, including text preprocessing, information extraction, feature engineering, and sentiment analysis, to help interpret the meaning behind narrative pathology language. Rather than simply searching for keywords, the model is trained to understand how pathologists communicate diagnostic findings within clinical context.
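The gap between keyword search and context-aware reading can be shown with a small Python sketch. This is a rule-based stand-in, not ASCP's BERT-based hybrid model: it covers only a simplified preprocessing step and a negation-aware extraction step, and makes no attempt at feature engineering or sentiment analysis. The report text, the term list, the negation cues, and all function names are hypothetical.

```python
import re

# Hypothetical term list and negation cues, chosen for illustration only.
DIAGNOSTIC_TERMS = ["cin 1", "cin 2", "cin 3", "carcinoma"]
NEGATION_CUES = ["no evidence of", "negative for", "without"]


def preprocess(report):
    """Lowercase, collapse whitespace, and split into rough sentences."""
    text = re.sub(r"\s+", " ", report.lower()).strip()
    return [s.strip() for s in re.split(r"[.;]", text) if s.strip()]


def keyword_search(report):
    """Naive baseline: return every term that appears anywhere in the text."""
    text = report.lower()
    return [term for term in DIAGNOSTIC_TERMS if term in text]


def extract_findings(report):
    """Keep a term only if no negation cue precedes it in its sentence."""
    findings = []
    for sentence in preprocess(report):
        for term in DIAGNOSTIC_TERMS:
            pos = sentence.find(term)
            if pos == -1:
                continue
            before = sentence[:pos]
            if not any(cue in before for cue in NEGATION_CUES):
                findings.append(term)
    return findings


# Invented report text, not taken from any real patient record.
report = (
    "Cervix, biopsy: Squamous mucosa with koilocytic change, "
    "consistent with CIN 1. No evidence of CIN 2, CIN 3 or carcinoma."
)

print(keyword_search(report))    # every listed term, including negated ones
print(extract_findings(report))  # only the term that is actually asserted
```

On this invented report, the keyword baseline flags all four terms, while the negation-aware version keeps only the finding the pathologist actually asserts. Real reports vary far more than a fixed cue list can handle, which is the kind of variation a trained model is meant to absorb.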
Validating the model across multiple health systems
After training and validating the algorithm on more than 10,000 pathology reports from one participating site, ASCP is now applying the model to data from additional institutions. “Our algorithm's performance turned out to be great with superb accuracy and precision and now we are in the process of using it for datasets of additional sites,” Dr. Gupta says.
Each new site introduces differences in reporting structure, formatting, and terminology, giving researchers an opportunity to evaluate how well the model adapts across real-world practice settings.
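For readers less familiar with the metrics Dr. Gupta mentions, accuracy and precision can be computed by comparing a model's output with labels assigned by human reviewers. The Python sketch below uses invented labels, not ASCP's results, and the function name `accuracy_and_precision` is hypothetical.

```python
def accuracy_and_precision(truth, predicted, positive_label):
    """Compare predicted labels against reviewer-assigned labels.

    accuracy  = share of all reports labeled correctly
    precision = share of reports predicted as positive_label
                that truly carry that label
    """
    if len(truth) != len(predicted) or not truth:
        raise ValueError("truth and predicted must be non-empty and equal in length")
    pairs = list(zip(truth, predicted))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive_label and p == positive_label for t, p in pairs)
    predicted_pos = sum(p == positive_label for p in predicted)
    accuracy = correct / len(truth)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    return accuracy, precision


# Invented labels for five reports from one hypothetical site.
truth = ["cin 1", "negative", "cin 1", "negative", "cin 1"]
predicted = ["cin 1", "negative", "negative", "cin 1", "cin 1"]

accuracy, precision = accuracy_and_precision(truth, predicted, "cin 1")
```

Running the same comparison separately for each site would be one way to see whether performance holds up as reporting styles change.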
Shaping quality improvement and future guidelines
Dr. Gupta says the work may help standardize reporting practices and improve researchers’ ability to study incidence rates and screening outcomes in specific populations. The project could also create new opportunities for benchmarking and population health analysis across participating institutions.
From Dr. Gorham’s perspective, “Cervical cancer screening guidelines in this country should rest on evidence that reflects every woman the guidelines are meant to serve.”
Ultimately, the goal is fewer women diagnosed with advanced cervical cancer, Dr. Gorham adds. “We know how to prevent this disease. The work is making sure access to preventive care reaches everyone.”
The laboratory’s role in shaping the future of screening
Although the initiative relies on advanced AI and NLP tools, Dr. Gorham stresses the importance of pathologists and laboratory professionals, who are “central to this project — without them, it does not happen.”
While AI may dominate healthcare headlines in abstract terms, this project represents a highly practical application grounded in pathology workflows and laboratory data.