TY - JOUR
T1 - The EU-ADR corpus
T2 - Annotated drugs, diseases, targets, and their relationships
AU - van Mulligen, Erik M.
AU - Fourrier-Reglat, Annie
AU - Gurwitz, David
AU - Molokhia, Mariam
AU - Nieto, Ainhoa
AU - Trifiro, Gianluca
AU - Kors, Jan A.
AU - Furlong, Laura I.
N1 - Funding Information:
This research received funding from the European Union Community in the framework of the FP7/2007–2013 convention-governing subsidy no. 215847 – the EU-ADR project, the Innovative Medicines Initiative [eTOX,115002], and the Instituto de Salud Carlos III FEDER (CP10/00524). The Research Programme on Biomedical Informatics (GRIB) is a node of the Spanish National Institute of Bioinformatics (INB) and a member of the COMBIOMED network.
PY - 2012/10
Y1 - 2012/10
N2 - Corpora with specific entities and relationships annotated are essential to train and evaluate text-mining systems that are developed to extract specific structured information from a large corpus. In this paper we describe an approach where a named-entity recognition system produces a first annotation and annotators revise this annotation using a web-based interface. The agreement figures achieved show that the inter-annotator agreement is much better than the agreement with the system-provided annotations. The corpus has been annotated for drugs, disorders, genes and their inter-relationships. For each of the drug-disorder, drug-target, and target-disorder relations, three experts have annotated a set of 100 abstracts. These annotated relationships will be used to train and evaluate text-mining software to capture these relationships in texts.
AB - Corpora with specific entities and relationships annotated are essential to train and evaluate text-mining systems that are developed to extract specific structured information from a large corpus. In this paper we describe an approach where a named-entity recognition system produces a first annotation and annotators revise this annotation using a web-based interface. The agreement figures achieved show that the inter-annotator agreement is much better than the agreement with the system-provided annotations. The corpus has been annotated for drugs, disorders, genes and their inter-relationships. For each of the drug-disorder, drug-target, and target-disorder relations, three experts have annotated a set of 100 abstracts. These annotated relationships will be used to train and evaluate text-mining software to capture these relationships in texts.
KW - Adverse drug reactions
KW - Corpus development
KW - Machine learning
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=84865957320&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2012.04.004
DO - 10.1016/j.jbi.2012.04.004
M3 - Article
AN - SCOPUS:84865957320
SN - 1532-0464
VL - 45
SP - 879
EP - 884
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
IS - 5
ER -