A Graph-Based Clustering Approach to Identify Cell Populations in Single-Cell RNA Sequencing Data

Einipour, Amin; Mosleh, Mohammad; Ansari-Asl, Karim

Volume 7, Issue 1 (6-2020) jhbmi 2020, 7(1): 60-72

Einipour A, Mosleh M, Ansari-Asl K. A Graph-Based Clustering Approach to Identify Cell Populations in Single-Cell RNA Sequencing Data. jhbmi 2020; 7(1): 60-72
URL: http://jhbmi.ir/article-1-420-en.html

Amin Einipour, Mohammad Mosleh*, Karim Ansari-Asl

Ph.D. in Computer Engineering, Assistant Professor, Computer Engineering Dept., Faculty of Engineering, Dezful Branch, Islamic Azad University, Dezful, Iran

Abstract:

Introduction: The emergence of single-cell RNA-sequencing (scRNA-seq) technology has provided new information about the structure of cells and yields very high-resolution data on the expression of different genes in each cell at a single point in time. One of the main uses of scRNA-seq data is clustering cells by their expressed genes, which sometimes leads to the detection of rare cell populations. However, the results of existing methods depend mainly on the shape of the cell populations and the dimensionality of the data. It is therefore important to develop a method that can identify cell populations regardless of these obstacles.
Method: In the proposed method, which was a library-based study, the number of clusters (cell populations) was estimated first. This estimate is important because, in practice, basic information such as the number and type of cell populations is not available. Thereafter, using a graph-based Gaussian kernel and reducing the dimensionality of the problem, the cell populations were identified with k-means++ clustering.
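The three steps named in the Method paragraph (estimating the number of clusters, building a Gaussian-kernel graph with dimensionality reduction, and clustering with k-means++) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the number of clusters is estimated, how the kernel bandwidth is set, or how the dimensions are reduced. The sketch assumes a standard spectral-clustering recipe instead: an eigengap heuristic for the number of clusters, a bandwidth taken from the distance to the 7th nearest neighbour, and a row-normalised eigenvector embedding. All function names and parameters (`gaussian_affinity`, `spectral_embedding`, `kmeans_pp`, `cluster_cells`, `knn`, `max_k`) are hypothetical.

```python
import numpy as np


def gaussian_affinity(X, knn=7):
    """Dense Gaussian-kernel graph over cells (rows of X).

    ASSUMPTION: the squared bandwidth is the median squared distance to the
    knn-th nearest neighbour; the abstract does not say how it is chosen.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, 0.0)
    sigma2 = np.median(np.sort(d2, axis=1)[:, knn])  # column 0 is the cell itself
    W = np.exp(-d2 / (2.0 * sigma2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def spectral_embedding(W, max_k=10):
    """Estimate k with the eigengap heuristic (an assumed stand-in for the
    paper's estimator) and embed every cell in k dimensions."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    gaps = np.diff(vals[: max_k + 1])    # gaps[j] = vals[j + 1] - vals[j]
    k = int(np.argmax(gaps[1:])) + 2     # largest gap, restricted to k >= 2
    Z = vecs[:, :k]
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)  # row-normalise
    return k, Z


def kmeans_pp(Z, k, rng, n_iter=100):
    """Lloyd's k-means with k-means++ seeding."""
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[rng.choice(len(Z), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_cells(X, seed=0):
    """Run the full sketch: graph, estimate of k, embedding, k-means++."""
    W = gaussian_affinity(X)
    k, Z = spectral_embedding(W)
    labels = kmeans_pp(Z, k, np.random.default_rng(seed))
    return k, labels
```

Here `X` stands for a cells-by-genes expression matrix (after whatever normalisation is used upstream), and `cluster_cells(X)` returns the estimated number of populations together with one label per cell. The dense kernel is only practical for a few thousand cells; a sparse nearest-neighbour graph would be the usual substitute for larger data sets.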
Results: The implementation results showed that the proposed method achieves an acceptable improvement over other machine learning methods presented for this task. For example, the adjusted Rand index (ARI) reached 100, 93.47, and 84.69 on the Kolod, Buettner, and Usoskin single-cell data sets, respectively.
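For readers unfamiliar with the ARI criterion, the following sketch computes it from two label vectors using the standard pair-counting formula. The function name `adjusted_rand_index` is illustrative. The formula yields values of at most 1, so the figures quoted above are presumably this quantity multiplied by 100; that reading is an assumption, since the abstract does not state the scale.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two clusterings of the same cells."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: the clusterings trivially agree

    # Contingency counts and their row and column marginals.
    joint = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both clusterings have one cluster
    return (sum_joint - expected) / (max_index - expected)
```

The index is invariant to how the clusters are named, equals 1 for identical partitions, and is close to 0 (possibly negative) for partitions that agree no more than chance would predict.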
Conclusion: The proposed method can cluster, and thus identify, cell populations with high accuracy and quality without any prior information about the number and type of cell populations, regardless of the high dimensionality of the problem.

Keywords: Single-cell RNA-sequencing, Clustering, Identification of Cell Populations, Graph-based Gaussian Kernel

Type of Study: Original Article | Subject: Bioinformatics
Received: 2019/07/16 | Accepted: 2019/12/14
