An Active Co-Training Algorithm for Biomedical Named-Entity Recognition


Tsendsuren Munkhdalai, Meijing Li, Unil Yun, Oyun-Erdene Namsrai, Keun Ho Ryu, Journal of Information Processing Systems Vol. 8, No. 4, pp. 575-588, Dec. 2012  

DOI: 10.3745/JIPS.2012.8.4.575
Keywords: Biomedical Named-Entity Recognition, Co-Training, Semi-supervised Learning, Feature Processing, Text Mining

Abstract

Exploiting unlabeled text data together with a relatively small labeled corpus has been an active and challenging research topic in text mining, driven by the recent growth in the amount of biomedical literature. Biomedical named-entity recognition is an essential prerequisite for effective text mining of that literature. This paper proposes an Active Co-Training (ACT) algorithm for biomedical named-entity recognition. ACT is a semi-supervised learning method in which two classifiers, built on two different feature sets, iteratively learn from informative examples queried from the unlabeled data. We design a new classification problem to measure the informativeness of an example in the unlabeled data. In this classification problem, examples are classified, based on a joint view of the feature sets, as informative or non-informative to both classifiers. To form the training data for this classification problem, we adopt a query-by-committee method: in ACT, both classifiers are treated as one committee, which is applied to the labeled data to assign an informativeness label to each example. The ACT method outperforms the traditional co-training algorithm in terms of F-measure as well as the number of training iterations needed to build a good classification model. The proposed method tends to exploit a large amount of unlabeled data efficiently by selecting a small number of examples that carry not only useful information but also a comprehensive pattern.
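The committee-based labeling step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact labeling rule, so this example assumes the standard query-by-committee criterion: an example counts as informative when the two committee members disagree. The function name, the two toy classifiers, and the sample data are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: each example carries two views, one per classifier.
Example = Tuple[str, str]          # (token, previous word)
Classifier = Callable[[str], str]  # maps one view to a tag


def clf_orthographic(token: str) -> str:
    """Toy view-1 classifier (assumption): capitalised tokens are entities."""
    return "ENTITY" if token[:1].isupper() else "O"


def clf_context(previous_word: str) -> str:
    """Toy view-2 classifier (assumption): cue words precede entities."""
    return "ENTITY" if previous_word in {"gene", "protein"} else "O"


def committee_informativeness(
    examples: Sequence[Example],
    clf_a: Classifier,
    clf_b: Classifier,
) -> List[int]:
    """Return 1 where the two committee members disagree, else 0.

    Disagreement as the informativeness signal is an assumption of this
    sketch, not a rule confirmed by the abstract.
    """
    return [
        1 if clf_a(view_a) != clf_b(view_b) else 0
        for view_a, view_b in examples
    ]


# Hypothetical labeled-data examples: (token, previous word).
EXAMPLES: List[Example] = [
    ("IL-2", "gene"),
    ("binds", "IL-2"),
    ("Kinase", "the"),
    ("cell", "protein"),
]

INFORMATIVENESS = committee_informativeness(
    EXAMPLES, clf_orthographic, clf_context
)
```

In a fuller pipeline, labels of this kind would serve as training targets for a separate informativeness classifier over the joint feature view, which would then be used to query examples from the unlabeled pool.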


Cite this article
[APA Style]
Munkhdalai, T., Li, M., Yun, U., Namsrai, O., & Ryu, K. (2012). An Active Co-Training Algorithm for Biomedical Named-Entity Recognition. Journal of Information Processing Systems, 8(4), 575-588. DOI: 10.3745/JIPS.2012.8.4.575.

[IEEE Style]
T. Munkhdalai, M. Li, U. Yun, O. Namsrai, and K. H. Ryu, "An Active Co-Training Algorithm for Biomedical Named-Entity Recognition," Journal of Information Processing Systems, vol. 8, no. 4, pp. 575-588, 2012. DOI: 10.3745/JIPS.2012.8.4.575.

[ACM Style]
Tsendsuren Munkhdalai, Meijing Li, Unil Yun, Oyun-Erdene Namsrai, and Keun Ho Ryu. 2012. An Active Co-Training Algorithm for Biomedical Named-Entity Recognition. Journal of Information Processing Systems, 8, 4, (2012), 575-588. DOI: 10.3745/JIPS.2012.8.4.575.