Research



Mathematical and Computational Biology

Analysis of single-cell and spatial omics data



Single-cell RNA sequencing (scRNA-seq) provides transcriptomic profiles of individual cells, revealing novel cell types and temporal trajectories of cells. More challenging questions can be addressed when scRNA-seq is paired with other technologies or data resources that supply the missing information. We are broadly interested in integrating single-cell and spatial omics data and in inferring complex processes, such as cell-cell communication, from these data. For example, the spatial information missing from scRNA-seq data can be recovered by integrating it with spatial gene expression data. With spatially annotated scRNA-seq data or recent spatial transcriptomics data, cell-cell communication, a process with strong spatial constraints, can be studied in detail. In both tasks, we use optimal transport and variants of it motivated by the biological problems. The datasets to be integrated, or the spatial distributions of ligand and receptor expression, are viewed as distributions to be coupled. The complex processes and relationships are then dissected by finding globally optimal coupling plans.
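For readers unfamiliar with the term, the "globally optimal coupling plan" mentioned above can be written as the standard discrete Kantorovich problem. The symbols are assumptions introduced here for illustration and do not come from the original text: $a$ and $b$ are the two distributions (for example, normalized weights over cells and over spatial locations), $C_{ij}$ is a dissimilarity between item $i$ of one dataset and item $j$ of the other, and $\gamma$ is the coupling.

```latex
\[
\gamma^{*} \in \operatorname*{arg\,min}_{\gamma \in \Pi(a,b)}
\sum_{i=1}^{n} \sum_{j=1}^{m} C_{ij}\,\gamma_{ij},
\qquad
\Pi(a,b) = \Bigl\{ \gamma \in \mathbb{R}_{\ge 0}^{n \times m} :
\sum_{j=1}^{m} \gamma_{ij} = a_i,\ \sum_{i=1}^{n} \gamma_{ij} = b_j \Bigr\}.
\]
```

The variants used in this research modify this basic problem; the formula is only the textbook starting point.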

Related works


Predictive modeling of molecular structures



Computationally predicting various properties of biomolecules is a key component of modern drug discovery workflows. This is a challenging task due to 1) the structural complexity of biological macromolecules and 2) the limited data available on biomolecular properties. We adapt topological data analysis to derive concise structural representations of biomolecules so that machine learning models can be trained effectively on small functional datasets. This strategy has proven very competitive in various applications, for example, in the D3R challenges for computational drug discovery (see this report paper for the performance in one D3R challenge).

Related works


Data-driven modeling of biological systems



The variety of available data resources provides an opportunity to incorporate data into mechanistic models to study complex biological systems. As an example, we study the development of the early mammalian embryo with the help of single-cell RNA sequencing and spatial gene expression data. During early mammalian embryo development, a small number of cells make robust fate decisions at particular spatial locations within a tight time window to form the inner cell mass (ICM), and later the epiblast (Epi) and primitive endoderm (PE). We build a multiscale three-dimensional model of the mammalian embryo to recapitulate the observed patterning process from zygote to late blastocyst. By integrating the spatiotemporal information reconstructed from multiple single-cell transcriptomic datasets, the data-informed modeling analysis suggests two major processes critical to the formation of the Epi/PE layers: a selective cell-cell adhesion mechanism for fate-location coordination and a temporal attenuation mechanism of cell signaling.

Related works



Topological Data Analysis and Optimal Transport

Topological data analysis



Topological data analysis (TDA) is a powerful framework for describing complex, high-dimensional data, and one of its main tools is persistent homology. Motivated by applications to molecular structure analysis, we develop TDA methods suited to heterogeneous molecular descriptions. Using persistent cohomology, we connect non-geometric information to the topological characterization of the geometric information, so that both kinds of information are represented in the resulting enriched persistence barcodes. We can also define filtrations that handle the dynamics involved in molecular interactions.
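To make the notion of a persistence barcode concrete, here is a minimal sketch of the simplest case: 0-dimensional persistent homology (connected components) of a Vietoris-Rips filtration on a point cloud, computed with a union-find structure. It is a textbook illustration only, not the enriched persistent-cohomology barcodes described above; the function name `h0_barcode` and the toy points are hypothetical.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Return 0-dimensional persistence bars (birth, death) for a point cloud.

    Every point is born at scale 0. A bar dies at the edge length at which
    its connected component merges into another one (Kruskal-style sweep).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length (the filtration value).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj             # two components merge here,
            bars.append((0.0, length))  # so one of them dies at this scale
    if n:
        bars.append((0.0, math.inf))    # the last component never dies
    return bars


# Toy example: a tight cluster of three points and a distant pair.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
bars = h0_barcode(points)
for birth, death in bars:
    print(f"[{birth}, {death})")
```

In this toy cloud, the short bars reflect points merging within each cluster, and the one long finite bar reflects the gap between the two clusters. Higher-dimensional features (loops, cavities) require a full boundary-matrix reduction and are not covered by this sketch.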

Related works


Optimal transport



Optimal transport has gained tremendous attention due to its applications in deep learning. It is also an indispensable tool for finding correspondences between paired datasets and interactions among cells. When optimal transport is applied to integrating scRNA-seq and spatial data and to inferring cell-cell communication, it is natural to enforce constraints: dissimilar cells should not be connected in data integration, and communication between cells that are too far apart is implausible. Motivated by this, we develop supervised optimal transport, in which the transport plan is restricted by application-induced constraints.
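The idea of restricting a transport plan can be illustrated with a small sketch. The code below is not the supervised optimal transport formulation developed in this research; it swaps in a simpler, well-known technique, entropy-regularized optimal transport solved by Sinkhorn iterations, and forbids pairs whose cost exceeds a cutoff by zeroing the corresponding kernel entries. The function name `masked_sinkhorn`, the parameters `cutoff`, `eps`, and `n_iter`, and the toy 1-D positions are all hypothetical. The sketch assumes that every row and column keeps at least one allowed pair and that the masked problem remains feasible for the given marginals.

```python
import math


def masked_sinkhorn(a, b, cost, cutoff, eps=0.5, n_iter=200):
    """Entropic OT between histograms a and b with forbidden pairs.

    Pairs with cost > cutoff get zero kernel weight, so the returned plan
    can never place mass on them. Assumes each row and column of the mask
    has at least one allowed entry (otherwise a division by zero occurs).
    """
    n, m = len(a), len(b)
    # Gibbs kernel exp(-cost / eps), set to zero on forbidden pairs.
    K = [
        [math.exp(-cost[i][j] / eps) if cost[i][j] <= cutoff else 0.0
         for j in range(m)]
        for i in range(n)
    ]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy example: "cells" on a line; pairs farther apart than 2.0 are forbidden.
x = [0.0, 1.0, 5.0]
y = [0.5, 1.5, 5.5]
cost = [[abs(xi - yj) for yj in y] for xi in x]
a = [1.0 / 3] * 3
b = [1.0 / 3] * 3

plan = masked_sinkhorn(a, b, cost, cutoff=2.0)
for row in plan:
    print([round(p, 4) for p in row])
```

Here the third point on each side can only be matched to its nearby counterpart, while the first two points share mass among themselves. Hard masking of this kind can make the balanced problem infeasible when the constraints are too strict, which is one reason more careful formulations are needed in practice.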

Related works