J4 ›› 2014, Vol. 41 ›› Issue (6): 95-99. doi: 10.3969/j.issn.1001-2400.2014.06.016

• Research Paper •

Novel cluster refinement algorithm for DNA motif discovery

ZHANG Yipu

  1. (School of Computer Science and Technology, Xidian Univ., Xi'an 710071, China)
  • Received: 2013-12-17 Online: 2014-12-20 Published: 2015-01-19
  • Contact: ZHANG Yipu
  • About the author: ZHANG Yipu (born 1985), male, Ph.D. candidate at Xidian University, E-mail: zephyr26026@gmail.com.
  • Supported by:

    Natural Science Foundation of Shaanxi Province, Young Talent Program (2013JQ8037); Fundamental Research Funds for the Central Universities (K5051303002)

Abstract:

Motif discovery is an important part of analyzing gene transcriptional regulatory relationships. This paper proposes a novel entropy-based cluster refinement algorithm, ECRmotif, for motif discovery in DNA sequences. ECRmotif employs a flexible probabilistic model to distinguish motifs from the background sequences. It first uses an entropy-based clustering process to divide the dataset into several subsets, then reduces the search space of motif instances for each candidate subset and refines the motif from that subset. Experiments on both synthetic and real datasets show that ECRmotif improves running speed and efficiency and finds motifs accurately.

Key words: motif discovery, clustering, refinement
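
The abstract does not state which entropy measure ECRmotif uses, so the following is only a minimal sketch of one common choice: the per-column Shannon entropy of a set of aligned, equal-length candidate instances. A low mean entropy indicates a well-conserved candidate subset. The function names `column_entropies` and `mean_entropy` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def column_entropies(instances):
    """Return the Shannon entropy (in bits) of each alignment column.

    `instances` is a list of equal-length DNA strings, assumed to be
    already aligned. This is an illustrative measure, not necessarily
    the one used by ECRmotif.
    """
    if not instances or len(instances[0]) == 0:
        raise ValueError("need at least one non-empty instance")
    width = len(instances[0])
    if any(len(s) != width for s in instances):
        raise ValueError("all instances must have the same length")

    entropies = []
    for j in range(width):
        counts = Counter(s[j] for s in instances)
        total = sum(counts.values())
        # H = sum over observed bases of p * log2(1 / p), with p = c / total
        h = sum((c / total) * math.log2(total / c) for c in counts.values())
        entropies.append(h)
    return entropies


def mean_entropy(instances):
    """Average column entropy; lower values mean stronger conservation."""
    cols = column_entropies(instances)
    return sum(cols) / len(cols)


# A conserved subset (one mismatch) versus a subset with no shared signal.
conserved = ["TATAAT", "TATAAT", "TATGAT", "TATAAT"]
scattered = ["ACGTAC", "CGTACG", "GTACGT", "TACGTA"]
print(mean_entropy(conserved) < mean_entropy(scattered))
```

Under this assumption, a clustering step could prefer subsets whose mean entropy is low, and a refinement step would then search for instances only within those subsets.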