Reconciling Multiple Connectivity Scores for Drug Repurposing

Samart, Tuyishime, et al., 2021

Kewalin Samart (Mathematics, Michigan State University), Phoebe Tuyishime (Food Science and Nutrition, Michigan State University), Arjun Krishnan (Computational Mathematics, Science, and Engineering; Biochemistry and Molecular Biology, Michigan State University), Janani Ravi (Pathobiology and Diagnostic Investigations, Michigan State University)


The basis of several recent methods for drug repurposing is the key principle that an efficacious drug will reverse the disease molecular ‘signature’ with minimal side-effects. This principle was defined and popularized by the influential ‘connectivity map’ study in 2006 regarding reversal relationships between disease- and drug-induced gene expression profiles, quantified by a disease-drug ‘connectivity score.’ Over the past 15 years, several studies have proposed variations in calculating connectivity scores towards improving accuracy and robustness in light of massive growth in reference drug profiles. However, these variations have been formulated inconsistently using various notations and terminologies even though they are based on a common set of conceptual and statistical ideas. Therefore, we present a systematic reconciliation of multiple disease-drug similarity metrics (\(ES\), \(css\), \(Sum\), \(Cosine\), \(XSum\), \(XCor\), \(XSpe\), \(XCos\), \(EWCos\)) and connectivity scores (\(CS\), \(RGES\), \(NCS\), \(WCS\), \(Tau\), \(CSS\), \(EMUDRA\)) by defining them using consistent notation and terminology. In addition to providing clarity and deeper insights, this coherent definition of connectivity scores and their relationships provides a unified scheme that newer methods can adopt, enabling the computational drug-development community to compare and investigate different approaches easily. To facilitate the continuous and transparent integration of newer methods, this article will be available as a live document coupled with a GitHub repository that any researcher can build on and push changes to.


drug repositioning/repurposing | disease gene signature | drug profile | CMap and LINCS L1000 | similarity metrics | connectivity mapping | transcriptomics

Key points


The past few decades have seen a rapid increase in computational, experimental, and clinical drug repositioning/repurposing approaches owing to the appeal of reduced costs and drug discovery time [1–3]. Drug repurposing works on the principle that drugs have multiple modes of action, targets, and off-targets that can be exploited to identify new indications [1]. This principle has been leveraged to identify novel therapeutic candidates for several diseases [1,4]. Approaches and resources for drug repurposing have been broadly summarized and discussed elsewhere [2,5]. With the accumulation of massive drug and disease data collections, computational methods and databases have now become an indispensable component of the drug repurposing workflow [2,6]. Nearly all these methods leverage the high-throughput gene expression profiles that are abundantly available for drugs and diseases to find novel associations [7–9]. These expression profiles can be used to derive a characteristic molecular imprint, i.e., a signature, of a disease or drug perturbation in a tissue [10]. Large compendia of such transcriptomic signatures have been created for thousands of drugs based on the differential gene expression of various cell lines with or without drug perturbation. Computational methods then use these compendia to predict repurposed candidates for a disease based either on the (dis)similarity of a drug’s expression signature to that disease’s expression signature [11] or on similarity to the signatures of other drugs previously linked to the disease [12,13].

In this article, we will focus on these widely used expression-based methods for drug repurposing, collectively referred to as “drug-disease connectivity analysis” [11]. A typical instance of this analysis is presented in Figure 1, where novel drug indications for a particular disease of interest are identified based on the extent to which the ranked drug gene signature is a “reversal” of the disease gene signature [14,15]. Connectivity-based drug repurposing has been used to discover drugs in various cancers and non-cancer diseases [3].
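To make the notion of “reversal” concrete, the following minimal sketch scores toy drug signatures against a toy disease signature using a plain cosine similarity over shared genes. It is an illustration only and is not the definition of any of the metrics reconciled in this article: the function name `cosine_connectivity`, the gene labels, the drug labels, and all expression values are hypothetical. Under this simplified scheme, a negative score suggests that the drug tends to reverse the disease signature, and a positive score suggests that it mimics it.

```python
import math


def cosine_connectivity(disease_sig, drug_sig):
    """Cosine similarity between two signatures over their shared genes.

    Each signature is a dict mapping a gene label to a differential
    expression value (e.g., a log fold change). Illustrative only.
    """
    shared = sorted(set(disease_sig) & set(drug_sig))
    if not shared:
        raise ValueError("signatures share no genes")
    dot = sum(disease_sig[g] * drug_sig[g] for g in shared)
    norm_disease = math.sqrt(sum(disease_sig[g] ** 2 for g in shared))
    norm_drug = math.sqrt(sum(drug_sig[g] ** 2 for g in shared))
    if norm_disease == 0 or norm_drug == 0:
        raise ValueError("a signature is all zeros on the shared genes")
    return dot / (norm_disease * norm_drug)


# Hypothetical toy data: positive = up-regulated, negative = down-regulated.
disease = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": -1.5}
drugs = {
    "drug_reversing": {"GENE_A": -1.8, "GENE_B": -0.9, "GENE_C": 1.2},
    "drug_mimicking": {"GENE_A": 1.5, "GENE_B": 0.7, "GENE_C": -1.0},
}

# Score every drug and list the strongest "reversers" (most negative) first.
scores = {name: cosine_connectivity(disease, sig) for name, sig in drugs.items()}
ranked = sorted(scores.items(), key=lambda item: item[1])

for name, score in ranked:
    print(f"{name}: {score:+.3f}")
```

In this toy setting, the drug whose signature opposes the disease signature ranks first. Real connectivity analyses typically work with ranked, genome-wide profiles and with more elaborate statistics than this sketch.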