
Pathology Markup Language (PathoML)

Brief Introduction

Integrative analysis of large-scale multimodal biomedical data with computational models offers a significant opportunity to advance data-driven cancer research. However, the value of such data lies not only in its volume and number of modalities but also in the biologically and medically meaningful features it contains. Although deep-learning techniques have enabled researchers to extract features from data at scale, it remains time-consuming and error-prone for computational methods to access and use these features for diagnostic or scientific purposes, because no supporting data standard exists. Consequently, the potential of data to inform cancer research remains largely untapped. Here we propose the Pathology Markup Language (PathoML) and the Tumor Pathology Ontology to systematically represent heterogeneous features across multimodal pathology data, including pathology reports and digital slides, in a form suitable for use by computational models. We pilot PathoML by representing the pathological features contained in pathology data for several neoplastic diseases and illustrate different uses of these representations. The example representation files, the source code of the use cases, the Tumor Pathology Ontology, and the ontology specification and documentation of PathoML are available on this website.

PathoML ontology specification and documentation

Representation Examples

Use Cases

Tumor Pathology Ontology

“Hello World” example

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<rdf:RDF xmlns:rdf="">
	<owl:Ontology rdf:about="">
		<owl:imports rdf:resource=""/>
	</owl:Ontology>
	<histo:NeoplasticCell rdf:ID="Hello_World_Cell">
		<histo:displayName rdf:datatype="">Hello World</histo:displayName>
	</histo:NeoplasticCell>
</rdf:RDF>
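
As a rough illustration of how a computational method might read a representation like the “Hello World” example, the following Python sketch parses an RDF/XML string and collects the display name of each identified individual. It is only a sketch under stated assumptions: the namespace URIs are not shown on this page, so the `histo` namespace below is a hypothetical placeholder, and `list_display_names` is an illustrative helper, not part of PathoML. The `rdf` and `owl` namespaces are the standard W3C ones.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespaces for RDF and OWL.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"
# ASSUMPTION: hypothetical placeholder, not the real PathoML namespace URI.
HISTO_NS = "http://example.org/pathoml/histo#"

# A self-contained stand-in for the "Hello World" file, using the
# placeholder namespace above.
HELLO_WORLD = f"""<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:owl="{OWL_NS}" xmlns:histo="{HISTO_NS}">
  <owl:Ontology rdf:about=""/>
  <histo:NeoplasticCell rdf:ID="Hello_World_Cell">
    <histo:displayName>Hello World</histo:displayName>
  </histo:NeoplasticCell>
</rdf:RDF>
"""


def list_display_names(rdf_xml):
    """Map each top-level rdf:ID to the text of its histo:displayName child."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(rdf_xml.encode("utf-8"))
    names = {}
    for element in root:
        rdf_id = element.get(f"{{{RDF_NS}}}ID")
        label = element.find(f"{{{HISTO_NS}}}displayName")
        # Skip elements such as owl:Ontology that carry no ID or display name.
        if rdf_id is not None and label is not None:
            names[rdf_id] = (label.text or "").strip()
    return names


if __name__ == "__main__":
    print(list_display_names(HELLO_WORLD))
```

For real PathoML files, a dedicated RDF library would likely be a better fit than plain XML parsing, since RDF/XML allows several equivalent serializations of the same statements.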

PathoML Team

Chen Li’s group