Deep Learning · Cancer Biology · Bioinformatics

MMDRP: Predicting Cancer Drug Response with Multimodal Deep Learning

June 2022 · Farzan Taj

The Problem

Two patients can receive the same cancer diagnosis and the same chemotherapy regimen, yet one responds well while the other does not. This heterogeneity is driven by the unique molecular landscape of each patient's tumour — differences in gene expression, mutational burden, protein abundance, and more.

Predicting in vitro drug response from cell line profiles is a tractable proxy for this problem: if we can learn which molecular features make a cancer cell line sensitive or resistant to a given compound, we gain both a predictive tool and a window into the biology of drug action.

Our Approach

Most prior work used gene expression alone to represent a cell line. We asked: what happens if you give the model everything? MMDRP (Multimodal Drug Response Prediction) encodes up to eight distinct omics layers simultaneously, fusing them with a learned representation of the drug's chemical structure.

MMDRP Model Architecture

Omics inputs (Cell Line Encoder): Gene Expression, Mutations, Copy Number, Proteomics, miRNA, Metabolomics, Histone Marks, RPPA

Drug input (Drug Encoder): Drug Features (SMILES)

↓

Multimodal Fusion

↓

Outputs: Drug Response (AUC), Biomarker Scores

Each input modality is projected through its own encoder before being concatenated into a shared representation. A prediction head then regresses the drug response metric (area under the dose-response curve, AUC) from CTRPv2. Importantly, the model is designed to handle missing modalities gracefully — not every cell line has every data type.
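To make the encode-then-concatenate idea concrete, here is a minimal PyTorch sketch. It is not the MMDRP implementation: the class name `MultimodalFusion`, the layer sizes, and the strategy for missing modalities (a zero embedding plus a presence flag, applied to a whole batch at a time) are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class MultimodalFusion(nn.Module):
    """Hypothetical sketch: one encoder per modality, concatenation, regression head."""

    def __init__(self, omics_dims, drug_dim, latent_dim=32):
        super().__init__()
        self.omics_dims = dict(omics_dims)  # modality name -> number of input features
        self.latent_dim = latent_dim
        self.omics_encoders = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(dim, latent_dim), nn.ReLU())
            for name, dim in self.omics_dims.items()
        })
        self.drug_encoder = nn.Sequential(nn.Linear(drug_dim, latent_dim), nn.ReLU())
        # Each omics slot contributes latent_dim values plus one presence flag.
        fused_dim = len(self.omics_dims) * (latent_dim + 1) + latent_dim
        self.head = nn.Sequential(nn.Linear(fused_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, omics, drug):
        batch = drug.shape[0]
        parts = []
        for name in self.omics_dims:  # fixed order keeps the fused layout stable
            x = omics.get(name)
            if x is None:
                # Assumed strategy: zero embedding and flag 0 for a missing modality.
                parts.append(drug.new_zeros(batch, self.latent_dim))
                parts.append(drug.new_zeros(batch, 1))
            else:
                parts.append(self.omics_encoders[name](x))
                parts.append(drug.new_ones(batch, 1))
        parts.append(self.drug_encoder(drug))
        # One predicted response value per (cell line, drug) pair.
        return self.head(torch.cat(parts, dim=1)).squeeze(-1)


model = MultimodalFusion({"expression": 100, "protein": 40}, drug_dim=64)
drug = torch.randn(4, 64)
# Proteomics is absent for this batch; the model still returns four predictions.
pred = model({"expression": torch.randn(4, 100)}, drug)
```

The presence flag lets the prediction head distinguish "modality missing" from "modality present but all zeros"; other strategies, such as learned placeholder embeddings or modality dropout during training, would be equally plausible.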

Data

Cell line omics data came from the DepMap portal (releases 20Q2 and 21Q2). Drug response data was sourced from CTRPv2 and processed using the PharmacoGx Bioconductor package.

Modality	Source	Scale
Gene expression	DepMap 21Q2	~19 000 genes
Somatic mutations	DepMap 21Q2	~18 000 genes
Copy number variation	DepMap 21Q2	~18 000 genes
Proteomics	DepMap 20Q2	~8 000 proteins
miRNA expression	DepMap 21Q2	~734 miRNAs
Metabolomics	DepMap 21Q2	~225 metabolites
Histone modifications	DepMap 21Q2	Chromatin marks
RPPA	DepMap 21Q2	~214 antibodies
Drug structure	CTRPv2	SMILES fingerprints

Drug chemical structures were encoded from SMILES strings into fingerprint representations, giving the model access to structural information that purely omics-based approaches ignore entirely.
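For readers unfamiliar with the response label, the sketch below illustrates what an area under the dose-response curve measures. It is an illustration only and does not reproduce the PharmacoGx or CTRPv2 computation: the function name `normalized_auc`, the trapezoidal rule on raw points (rather than on a fitted curve), and the normalisation by the log-concentration range are all assumptions.

```python
import math


def normalized_auc(concentrations_um, viabilities):
    """Trapezoidal area under viability vs. log10(concentration), scaled to [0, 1]."""
    if len(concentrations_um) != len(viabilities) or len(concentrations_um) < 2:
        raise ValueError("need at least two matched dose/viability points")
    points = sorted(zip(concentrations_um, viabilities))  # order by increasing dose
    log_doses = [math.log10(c) for c, _ in points]
    vals = [min(max(v, 0.0), 1.0) for _, v in points]  # clip viability to [0, 1]
    area = 0.0
    for i in range(1, len(points)):
        width = log_doses[i] - log_doses[i - 1]
        area += width * (vals[i] + vals[i - 1]) / 2.0
    # Dividing by the tested range makes curves over different dose ranges comparable.
    return area / (log_doses[-1] - log_doses[0])


resistant = normalized_auc([0.01, 0.1, 1, 10], [1.0, 1.0, 1.0, 1.0])
sensitive = normalized_auc([0.01, 0.1, 1, 10], [1.0, 0.5, 0.0, 0.0])
```

Under this convention a value near 1 means the cells survived at every tested dose, and lower values indicate greater sensitivity. Conventions differ between tools, so the exact definition should be checked against the data source before comparing numbers.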

Biomarker Discovery

Prediction accuracy is only half the story. A model that predicts drug response accurately but remains a black box offers limited scientific value. MMDRP includes an attribution pipeline: by computing gradients of the predicted drug response with respect to the inputs, we can rank individual features (genes, proteins, metabolites) by how much they influence each prediction.

This allows the model to surface candidate biomarkers on a per-drug basis. A feature that consistently appears in attributions across cell lines sensitive to a particular compound is a hypothesis about the mechanism of action or resistance — something a wet-lab collaborator can test.
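The gradient-based ranking described above can be sketched with PyTorch autograd. This is a generic illustration of input-gradient attribution, not the MMDRP attribution pipeline: the helper `input_gradient_attribution`, the stand-in regressor, and the choice to aggregate by mean absolute gradient are assumptions.

```python
import torch
import torch.nn as nn


def input_gradient_attribution(model, x):
    """Return d(prediction)/d(input) for each sample and feature."""
    x = x.clone().detach().requires_grad_(True)
    # Samples are independent in this model, so the gradient of the summed
    # predictions gives each sample's own gradient row.
    pred = model(x).sum()
    (grad,) = torch.autograd.grad(pred, x)
    return grad


torch.manual_seed(0)
# Stand-in regressor; the real MMDRP network is not reproduced here.
model = nn.Sequential(nn.Linear(5, 8), nn.Tanh(), nn.Linear(8, 1))
model.eval()
x = torch.randn(16, 5)  # 16 hypothetical cell lines, 5 hypothetical features
grads = input_gradient_attribution(model, x)

# Rank features by mean absolute gradient across cell lines.
importance = grads.abs().mean(dim=0)
ranking = torch.argsort(importance, descending=True)
```

Restricting `x` to the cell lines treated with one compound would give a per-drug ranking. Plain gradients can be noisy, so variants such as gradient times input or integrated gradients are common alternatives.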

Implementation

The entire pipeline is open-source: R scripts handle data downloading and preprocessing (dose-response curve fitting, omics normalisation), and a PyTorch-based Python CLI runs training and inference. The repository is designed to be reproducible — starting from raw DepMap and CTRPv2 downloads and producing trained models and result figures.

$ python src/drp_full_model.py --modalities expression mutation cnv protein --drug-features smiles