
Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Wissensmanagement in der Bioinformatik

A comprehensive benchmark of kernel methods to extract protein-protein interactions from literature

Online appendix

This web page is the online appendix to our PPI benchmark paper: Domonkos Tikk, Philippe Thomas, Peter Palaga, Jörg Hakenberg, and Ulf Leser: A comprehensive benchmark of kernel methods to extract protein-protein interactions from literature, PLoS Comput Biol 6(7): e1000837. doi:10.1371/journal.pcbi.1000837. Here we provide source code and documentation that enable you to reproduce our experiments, run the kernels with different settings or on different corpora, and explore or compare the generated results. We kindly ask you to cite the above paper and the papers of the respective kernels used in your experiments (see below).

In the paper we compare nine state-of-the-art kernel methods for PPI relation extraction, namely

  • shallow linguistic (SL) kernel [1]
  • subtree (ST) kernel [2]
  • subset tree (SST) kernel [3]
  • partial tree (PT) kernel [4, 5]
  • spectrum tree (SpT) kernel [6]
  • k-band shortest path spectrum kernel (kBSPS) [7]
  • cosine similarity (cosine) kernel [8]
  • edit similarity (edit) kernel [8]
  • all-path graph (APG) kernel [9]

We also include in the comparison package the kernels of Kim et al. [10], reimplemented by Fayruzov et al. [11], although, due to late availability, they are not thoroughly investigated in our benchmark paper.

We mostly used the implementations of the original authors, except for the SpT and kBSPS kernels, which we implemented ourselves.
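For readers new to kernel methods, the following toy sketch illustrates the general idea shared by the kernels listed above: a kernel assigns a similarity score to a pair of candidate interaction instances, and the learner only sees the resulting matrix of pairwise scores. This is not the code of any of the benchmarked kernels. The function names, the bigram representation, and the example token sequences (with `PROT1`/`PROT2` as placeholder entity names) are assumptions made purely for illustration; the actual kernels operate on richer representations such as parse trees and dependency graphs.

```python
from collections import Counter
from math import sqrt


def ngram_counts(tokens, n=2):
    """Count the contiguous n-grams of a token sequence (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_kernel(tokens_a, tokens_b, n=2):
    """Cosine similarity between the n-gram count vectors of two sequences.

    Toy stand-in for a kernel function; not the cosine kernel of [8].
    """
    a, b = ngram_counts(tokens_a, n), ngram_counts(tokens_b, n)
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def gram_matrix(instances, n=2):
    """Pairwise kernel values for a list of instances."""
    return [[cosine_kernel(x, y, n) for y in instances] for x in instances]


# Made-up candidate instances, one token sequence per protein pair.
instances = [
    ["PROT1", "interacts", "with", "PROT2"],
    ["PROT1", "binds", "to", "PROT2"],
    ["PROT1", "interacts", "with", "PROT2", "directly"],
]
K = gram_matrix(instances)
for row in K:
    print(" ".join(f"{value:.3f}" for value in row))
```

A matrix like `K` is what a kernel-based learner such as a support vector machine would consume; swapping in a different kernel function changes the notion of similarity without changing the learner.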

An instance-based evaluation of all 13 kernels is described in "A detailed error analysis of 13 kernel methods for protein-protein interaction extraction". Instance-based difficulty levels are freely available for download.


Sources:


References:

[1] Giuliano C, Lavelli A, Romano L (2006) Exploiting shallow linguistic information for relation extraction from biomedical literature. In: Proc. of the 11th Conf. of the European Chapter of the Association for Computational Linguistics (EACL’06). Trento, Italy: The Association for Computational Linguistics, pp. 401–408.

[2] Vishwanathan SVN, Smola AJ (2002) Fast kernels for string and tree matching. In: Proc. of Neural Information Processing Systems (NIPS’02). Vancouver, BC, Canada, pp. 569–576.

[3] Collins M, Duffy N (2001) Convolution kernels for natural language. In: Proc. of Neural Information Processing Systems (NIPS’01). Vancouver, BC, Canada, pp. 625–632.

[4] Moschitti A (2006) Efficient convolution kernels for dependency and constituent syntactic trees. In: Proc. of the 17th European Conf. on Machine Learning (ECML’06). Berlin, Germany, pp. 318–329.

[5] Moschitti A (2008) Kernel methods, syntax and semantics for relational text categorization. In: Proc. of ACM 17th Conf. on Information and Knowledge Management (CIKM’08). Napa Valley, CA, USA, pp. 253–262.

[6] Kuboyama T, Hirata K, Kashima H, Aoki-Kinoshita KF, Yasuda H (2007) A spectrum tree kernel. Information and Media Technologies 2: 292–299.

[7] Palaga P (2009) Extracting Relations from Biomedical Texts Using Syntactic Information. Master’s Thesis, Technische Universität Berlin.

[8] Erkan G, Özgür A, Radev DR (2007) Semi-supervised classification for extracting protein interaction sentences using dependency parsing. In: Proc. of the 2007 Joint Conf. on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). Prague, Czech Republic, pp. 228–237.

[9] Airola A, Pyysalo S, Björne J, Pahikkala T, Ginter F, et al. (2008) All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning. BMC Bioinformatics 9: S2.

[10] Kim S, Yoon J, Yang J (2008) Kernel approaches for genic interaction extraction. Bioinformatics 24: 118–126.

[11] Fayruzov T, De Cock M, Cornelis C, Hoste V (2009) Linguistic feature analysis for protein interaction extraction. BMC Bioinformatics 10: 374.