Frontiers in Mathematical Oncology

Model dictionaries to screen experimental data for possible biological events

Anelia Horvath

George Washington University


Tumor samples have a complex nucleic acid composition, consisting of normal and cancerous genome and transcriptome components. Segregating tumor features from those of the microenvironment has strong implications for the analysis of a sample and is essential for interpreting both clinical and biological findings. Most existing approaches to resolving tumor purity rely solely on DNA-derived data and do not take advantage of expressed genetic variation. We present a novel approach to interrogating genome composition, including ploidy changes, genome admixture, and allele-specific expression, based on differential RNA-DNA allele distributions from next-generation sequencing data. Our method, Model Dictionaries, uses the Earth Mover’s Distance to quantify continuous allelic signals derived from experimental RNA and DNA sequencing data by testing them against idealized allele distributions that correspond to all possible biological events. Model Dictionaries uses chromosome-of-origin transcriptional information to refine and further inform the sample’s genome composition, and offers immediate visualization at any desired resolution, from genome to nucleotide. The method shows high accuracy on both simulated datasets and real data assessed with seven popular alternative approaches. Applying Model Dictionaries, we defined novel relationships between encoded and expressed genetic variation that improve the resolution of admixed cellular content.
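
The core idea of scoring observed allele fractions against idealized distributions with the Earth Mover's Distance can be illustrated with a minimal sketch. This is not the authors' implementation: the model names, the peak positions, the bin count, and the helper functions `histogram`, `emd_1d`, and `best_model` are all hypothetical, chosen only to show how a small "dictionary" of idealized allele distributions might be compared with data.

```python
from itertools import accumulate

N_BINS = 100  # histogram resolution over [0, 1]; an assumed value


def histogram(values, n_bins=N_BINS):
    """Return a normalized histogram of allele fractions in [0, 1]."""
    counts = [0.0] * n_bins
    for v in values:
        idx = min(int(v * n_bins), n_bins - 1)  # put v == 1.0 in the last bin
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def emd_1d(p, q):
    """Earth Mover's Distance between two normalized histograms that share
    the same equal-width bins on [0, 1]. In one dimension this equals the
    area between the two cumulative distributions."""
    width = 1.0 / len(p)
    return width * sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


# Hypothetical dictionary: each model is an idealized set of allele-fraction
# peaks with equal weight. These entries are illustrative assumptions only.
MODELS = {
    "balanced (1:1)": [0.5],
    "imbalanced (2:1)": [1 / 3, 2 / 3],
    "monoallelic": [0.0, 1.0],
}


def best_model(observed_fractions, models=MODELS):
    """Score the observed fractions against every model in the dictionary
    and return the name of the closest model plus all scores."""
    observed = histogram(observed_fractions)
    scores = {
        name: emd_1d(observed, histogram(peaks))
        for name, peaks in models.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Made-up allele fractions clustered near 1/3 and 2/3.
    fractions = [0.30, 0.35, 0.33, 0.66, 0.70, 0.64, 0.31, 0.68]
    name, scores = best_model(fractions)
    print(name)
    for model, score in sorted(scores.items(), key=lambda kv: kv[1]):
        print(f"{model}: {score:.3f}")
```

A smaller distance means less "mass" has to be moved to turn the observed distribution into the idealized one, so the model with the lowest score is the closest match under these assumptions. A real analysis would need noise-aware model distributions and separate treatment of RNA and DNA signals, which this sketch does not attempt.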