A graph-search framework for associating gene identifiers with documents

BMC Bioinformatics, 7:440

Received: 02 May 2006
Accepted: 10 October 2006
First Online: 10 October 2006


Background

One step in the model organism database curation process is to find, for each article, the identifier of every gene discussed in the article. We consider a relaxation of this problem suited to semi-automated systems, in which each article is associated with a ranked list of possible gene identifiers, and we experimentally compare methods for solving this geneId ranking problem. In addition to baseline approaches that combine named entity recognition (NER) systems with a "soft dictionary" of gene synonyms, we evaluate a graph-based method that combines the outputs of multiple NER systems, as well as other sources of information, and a learning method for reranking the output of the graph-based method.
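
The soft-dictionary baseline can be illustrated with a minimal sketch. The abstract does not specify the string-similarity measure, so the code below assumes character-trigram Jaccard similarity and a hypothetical match threshold of 0.5; the function names, gene IDs, and synonym lists are illustrative and not taken from the paper. Each gene identifier is scored by its best soft match between any NER-extracted mention and any of its synonyms, and the identifiers are returned as a ranked list.

```python
from collections import defaultdict


def trigrams(s: str) -> set[str]:
    """Character trigrams of a lowercased string padded with spaces."""
    s = f"  {s.lower()} "
    return {s[i:i + 3] for i in range(len(s) - 2)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two trigram sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_gene_ids(mentions, synonyms, threshold=0.5):
    """Rank gene IDs by their best soft match to any NER mention.

    mentions: strings extracted by an NER system.
    synonyms: gene ID -> list of synonym strings (the soft dictionary).
    threshold: hypothetical cutoff below which a match is ignored.
    """
    scores = defaultdict(float)
    for mention in mentions:
        m = trigrams(mention)
        for gene_id, names in synonyms.items():
            best = max((jaccard(m, trigrams(n)) for n in names), default=0.0)
            if best >= threshold:
                scores[gene_id] = max(scores[gene_id], best)
    # Highest score first; ties broken alphabetically by gene ID.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical gene IDs and synonym lists, for illustration only.
synonyms = {"GENE:0001": ["dpp", "decapentaplegic"],
            "GENE:0002": ["Egfr", "epidermal growth factor receptor"]}
ranked = rank_gene_ids(["Decapentaplegic", "EGFR", "wingless"], synonyms)
print(ranked)
```

Here both hypothetical gene IDs score 1.0, and "wingless" matches no synonym above the threshold. Trigram overlap tolerates small orthographic variations that an exact-match lookup would miss; a real system would need a tuned similarity measure and a far larger synonym list.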

Results

We show that NER systems with similar F-measure performance can perform significantly differently when used with a soft dictionary for geneId ranking. The graph-based approach can outperform any of its component NER systems, even without learning, and learning can further improve the performance of the graph-based ranking approach.
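
One way to picture how a graph can combine several NER systems is a random walk over a graph that links the document to the mentions proposed by each NER system, the mentions to their tokens, the tokens to dictionary synonyms, and the synonyms to gene identifiers. This is an assumption-laden sketch, not the authors' implementation: the node types, the fixed six-step lazy walk, the stay probability of 0.5, and all function names and IDs are hypothetical. Mentions proposed by several NER systems receive heavier edges, so agreement between systems pushes more probability mass toward the corresponding gene IDs.

```python
from collections import defaultdict


def build_graph(ner_outputs, synonyms):
    """Weighted undirected graph: doc - mention - token - synonym - gene.

    ner_outputs: NER-system name -> list of mention strings (hypothetical input).
    synonyms: gene ID -> list of synonym strings (hypothetical input).
    """
    graph = defaultdict(lambda: defaultdict(float))
    linked = set()  # mention/synonym nodes whose tokens are already linked

    def link(a, b, w=1.0):
        graph[a][b] += w
        graph[b][a] += w

    def link_tokens(node):
        if node not in linked:
            linked.add(node)
            for tok in node[1].split():
                link(node, ("token", tok))

    for mentions in ner_outputs.values():
        for m in mentions:
            node = ("mention", m.lower())
            link("doc", node)  # each proposal of m adds weight 1 to doc-m
            link_tokens(node)
    for gene_id, names in synonyms.items():
        for name in names:
            syn = ("syn", name.lower())
            link(syn, ("gene", gene_id))
            link_tokens(syn)
    return graph


def lazy_walk(graph, start="doc", steps=6, stay=0.5):
    """Node distribution after a fixed number of lazy random-walk steps."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in dist.items():
            nbrs = graph.get(node, {})
            total = sum(nbrs.values())
            if total == 0:  # isolated node: its mass stays put
                nxt[node] += p
                continue
            nxt[node] += stay * p
            for nbr, w in nbrs.items():
                nxt[nbr] += (1 - stay) * p * w / total
        dist = nxt
    return dist


def rank_genes(dist):
    """Gene IDs sorted by walk probability, highest first."""
    genes = [(node[1], p) for node, p in dist.items()
             if isinstance(node, tuple) and node[0] == "gene"]
    return sorted(genes, key=lambda kv: (-kv[1], kv[0]))


# Two hypothetical NER systems and a hypothetical synonym dictionary.
ner_outputs = {"ner_a": ["decapentaplegic"],
               "ner_b": ["decapentaplegic", "EGFR"]}
synonyms = {"GENE:0001": ["dpp", "decapentaplegic"],
            "GENE:0002": ["Egfr", "epidermal growth factor receptor"]}
ranking = rank_genes(lazy_walk(build_graph(ner_outputs, synonyms)))
print(ranking)
```

In this toy example GENE:0001 ranks above GENE:0002 because both hypothetical NER systems proposed "decapentaplegic" while only one proposed "EGFR". Shared token nodes also let a mention reach synonyms it matches only partially, which gives the graph a soft-matching behavior comparable to the soft dictionary.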

Conclusion

The utility of an NER system for geneId finding may not be accurately predicted by its entity-level F1 performance, the most common performance measure. GeneId-ranking systems are best implemented by combining several NER systems. With appropriate combination methods, usefully accurate geneId-ranking systems can be built from easily available resources, without resorting to problem-specific, engineered components.
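
A set-level score such as F1 ignores the order of a ranked list, whereas a ranking measure does not. The abstract does not name the ranking measure used, so as an assumption the sketch below scores a ranked list of gene IDs with average precision, a standard ranking measure; the gene IDs are hypothetical.

```python
def average_precision(ranked_ids, gold_ids):
    """Average precision of a ranked gene-ID list against the gold IDs.

    Assumes ranked_ids contains no duplicates.
    """
    gold = set(gold_ids)
    if not gold:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, gene_id in enumerate(ranked_ids, start=1):
        if gene_id in gold:
            hits += 1
            precision_sum += hits / rank  # precision at this correct hit
    return precision_sum / len(gold)


# Hypothetical gold set and two rankings of the same three IDs.
gold = {"GENE:0001", "GENE:0002"}
good = average_precision(["GENE:0001", "GENE:0002", "GENE:0003"], gold)
poor = average_precision(["GENE:0003", "GENE:0001", "GENE:0002"], gold)
print(good, poor)
```

Both lists contain the same three identifiers, so a set-level F1 scores them identically (precision 2/3, recall 1), while average precision gives 1.0 for the first and 7/12 (about 0.583) for the second.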

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-7-440) contains supplementary material, which is available to authorized users.

Authors: William W Cohen, Einat Minkov

Source: https://link.springer.com/
