Journal of Xidian University

• Research Paper

Cluster analysis of pan-cancer DNA methylation sites

YANG Liying; YANG Shengnan; YUAN Xiguo; GENG Fangge; ZHANG Junying

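  1. (School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China)
  • Received: 2017-09-11 Publication date: 2018-08-20 Release date: 2018-09-25
  • About the author: YANG Liying (1974-), female, associate professor, E-mail: yangliying1208@163.com
  • Funding:

    Fundamental Research Funds for the Central Universities (20101164977); National Natural Science Foundation of China (61571341)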

Analyzing pan-cancer DNA methylation patterns via clustering

YANG Liying;YANG Shengnan;YUAN Xiguo;GENG Fangge;ZHANG Junying   

  1. (School of Computer Science and Technology, Xidian Univ., Xi'an 710071, China)
  • Received: 2017-09-11 Online: 2018-08-20 Published: 2018-09-25

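Abstract:

Most current studies of DNA methylation are confined to a single gene or a small region within a single disease. To address this limitation, a clustering-based methylation analysis method is proposed that mines methylation patterns at the genome-wide level from a pan-cancer perspective. First, differentially methylated sites are screened across multiple cancers; next, these sites are clustered; finally, biological enrichment analysis is performed. Experiments on six cancer types provided by the Pan-Cancer project yielded 2184 differentially methylated sites after screening and 9 methylation clusters after clustering. Pattern analysis within the methylation clusters shows that different cancer types share common methylation patterns, and that the relationship between methylation and gene expression is complex rather than a simple positive or negative correlation. Enrichment analysis shows that the gene sets obtained in this study are significantly enriched in the pathways of multiple cancers.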
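According to the English abstract, the screening step uses SAM (Significance Analysis of Microarrays). As a rough illustration only, the following is a simplified SAM-style sketch in Python, assuming NumPy, tumor and normal matrices of methylation beta values (sites × samples), a fudge factor s0 fixed at the median of the per-site standard errors s_i, and a fixed cutoff on |d| instead of SAM's Δ-based rule. The function names (`pooled_sd`, `sam_d`, `screen_sites`) and the synthetic data are hypothetical; this is not the authors' implementation.

```python
import numpy as np

# Simplified SAM-style screening sketch. Function names, the s0 choice, the
# fixed |d| cutoff and the synthetic demo data are illustrative assumptions,
# not the authors' implementation.


def pooled_sd(tumor, normal):
    """Per-site standard error s_i from the pooled within-group sum of squares.

    tumor, normal: arrays of shape (n_sites, n_samples) of methylation beta values.
    """
    n1, n2 = tumor.shape[1], normal.shape[1]
    ss = ((tumor - tumor.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) + (
        (normal - normal.mean(axis=1, keepdims=True)) ** 2
    ).sum(axis=1)
    return np.sqrt((1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2) * ss)


def sam_d(tumor, normal, s0):
    """Relative difference d_i = (mean_tumor - mean_normal) / (s_i + s0)."""
    diff = tumor.mean(axis=1) - normal.mean(axis=1)
    return diff / (pooled_sd(tumor, normal) + s0)


def screen_sites(tumor, normal, threshold=2.0, n_perm=100, seed=0):
    """Indices of sites with |d| >= threshold, plus a permutation-based FDR estimate."""
    rng = np.random.default_rng(seed)
    # Fudge factor s0 is fixed once from the observed data (assumed: median s_i).
    s0 = np.median(pooled_sd(tumor, normal))
    d = sam_d(tumor, normal, s0)
    called = np.flatnonzero(np.abs(d) >= threshold)

    data = np.hstack([tumor, normal])
    n1 = tumor.shape[1]
    null_counts = []
    for _ in range(n_perm):
        # Shuffle the tumor/normal labels and recount sites passing the cutoff.
        perm = rng.permutation(data.shape[1])
        d_null = sam_d(data[:, perm[:n1]], data[:, perm[n1:]], s0)
        null_counts.append(int(np.sum(np.abs(d_null) >= threshold)))
    # FDR estimate: median number of null calls over the number of observed calls.
    fdr = float(np.median(null_counts)) / max(len(called), 1)
    return called, fdr


# Synthetic demo: 200 sites, 12 tumor and 10 normal samples; the first 20
# sites are shifted upward in tumors to mimic hypermethylation.
rng = np.random.default_rng(1)
normal = rng.beta(2, 5, size=(200, 10))
tumor = rng.beta(2, 5, size=(200, 12))
tumor[:20] = np.clip(tumor[:20] + 0.4, 0.0, 1.0)
sites, fdr = screen_sites(tumor, normal)
print(len(sites), "sites called, estimated FDR", round(fdr, 3))
```

In SAM proper, s0 is tuned to minimize the coefficient of variation of d, and the cutoff is set through the Δ parameter against expected order statistics; the fixed median s0 and symmetric |d| cutoff here only keep the sketch short.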

Keywords: pattern recognition, DNA methylation, differential sites, cluster analysis, gene expression

Abstract:

Although DNA methylation has been studied extensively, most studies focus on a single cancer, on individual genes, or on small genomic regions. To address this limitation, this paper proposes a clustering-based method that analyzes DNA methylation at the whole-genome level from a pan-cancer perspective. First, methylation levels across multiple cancer types are analyzed with SAM (Significance Analysis of Microarrays) to screen out differentially methylated sites. In addition, common regulatory sites are identified by computing the correlation between methylation and gene expression. Then the differentially methylated sites are clustered with affinity propagation (AP). Finally, Gene Ontology (GO) and KEGG are used for gene annotation and enrichment analysis. Experiments are performed on six cancer types from the Pan-Cancer project of The Cancer Genome Atlas (TCGA). Screening with SAM and clustering with AP yield 2184 differentially methylated sites and 9 clusters, respectively. The results show that the relationship between methylation and gene expression is complex rather than a simple positive or negative correlation. The GO and KEGG analyses further indicate that the genes corresponding to the clusters are enriched in multiple cancer-related pathways and are biologically interpretable.
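The clustering step relies on affinity propagation (AP). The following minimal NumPy sketch shows the standard message passing (responsibilities and availabilities, with damping), assuming negative squared Euclidean distance as the similarity, the median off-diagonal similarity as the shared preference, and that each row is one site's methylation profile. The function name `affinity_propagation`, these choices, and the synthetic data are assumptions, not details taken from the paper.

```python
import numpy as np

# Minimal affinity propagation sketch (responsibility/availability message
# passing with damping). The function name, the similarity measure, the
# median preference and the synthetic demo data are assumptions.


def affinity_propagation(X, damping=0.7, max_iter=500, conv_iter=20, preference=None):
    """Cluster the rows of X; returns (exemplar indices, cluster label per row)."""
    n = X.shape[0]
    # Similarity s(i, k): negative squared Euclidean distance between rows.
    sq = np.sum(X ** 2, axis=1)
    S = -(sq[:, None] + sq[None, :] - 2.0 * X @ X.T)
    if preference is None:
        # Shared preference: median off-diagonal similarity (assumption).
        preference = np.median(S[~np.eye(n, dtype=bool)])
    np.fill_diagonal(S, preference)

    R = np.zeros((n, n))
    A = np.zeros((n, n))
    rows = np.arange(n)
    prev, stable = None, 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1.0 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))), i != k
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0.0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0.0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1.0 - damping) * A_new

        # Points with r(k,k) + a(k,k) > 0 are exemplars; stop once the set is stable.
        exemplars = np.flatnonzero(np.diag(R) + np.diag(A) > 0)
        if prev is not None and exemplars.size > 0 and np.array_equal(exemplars, prev):
            stable += 1
            if stable >= conv_iter:
                break
        else:
            stable = 0
        prev = exemplars

    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    # Assign each point to its most similar exemplar; exemplars label themselves.
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels


# Synthetic demo: 90 hypothetical sites, each described by a 6-value profile
# (e.g. one methylation level per cancer type), drawn around 3 centers.
rng = np.random.default_rng(0)
centers = np.array([[0.1] * 6, [0.5] * 6, [0.9] * 6])
X = np.vstack([c + 0.05 * rng.standard_normal((30, 6)) for c in centers])
exemplars, labels = affinity_propagation(X)
print(len(exemplars), "clusters")
```

AP does not take the number of clusters as input; it emerges from the preference values, and lowering the shared preference generally yields fewer clusters.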

Key words: pattern recognition, deoxyribonucleic acid methylation, differential site, clustering analysis, gene expression