Tuning Text Classification for Hereditary Diseases with Section Weighting
Jörg Hakenberg1*, Juliane Rutsch2, and Ulf Leser1
1 Humboldt-Universität zu Berlin, Department of
Computer Science, Knowledge Management Group, Unter den Linden 6, 10099
Berlin, Germany.
2 School of Electrical Engineering and Computer Science, FH
Stralsund, Zur Schwedenschanze 15, 18435 Stralsund, Germany.
* Corresponding author. Current affiliation: Knowledge
Management in Bioinformatics, Dept. Computer Science,
Humboldt-Universität zu Berlin, Rudower Chaussee 25, 12489 Berlin,
Germany. Phone: +49.30.2093.3903, eMail:
hakenberg(a)informatik.hu-berlin.de
Abstract
Motivation: Information in life science publications is
heterogeneously distributed over various sections. Depending on
research questions, different sections cover more or less of the data
needed to answer them. Our approach, called section weighting, seeks to
make use of information coverage and density found in typical life
science publications. We study the impact of section weighting on text
classification according to hereditary diseases.
Results: Our results indicate that weighting sections can improve text
classification. Our system gains 7% in F1-measure when we add section
weighting. Proper composition of features is equally crucial, improving
our results by 11%. Combining both techniques, the system yields a
performance 18% higher than the baseline classifier. For our research
question, favoring the sections Abstract, Introduction, and Materials
and Methods yields the best results.
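
As an illustration of the general idea only, the following Python sketch scales term counts by a per-section weight before they would be passed to a classifier. The abstract does not state the weighting scheme or the weight values used, so the function name `weighted_term_counts`, the weight of 2.0, and the whitespace tokenization are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical weights (not taken from the paper): favor the sections the
# abstract reports as most useful for this research question.
SECTION_WEIGHTS = {
    "abstract": 2.0,
    "introduction": 2.0,
    "materials and methods": 2.0,
}
DEFAULT_WEIGHT = 1.0  # assumed weight for every other section


def weighted_term_counts(sections):
    """Sum term counts over all sections, scaling each by its section weight."""
    totals = Counter()
    for name, text in sections.items():
        weight = SECTION_WEIGHTS.get(name.lower(), DEFAULT_WEIGHT)
        # Naive whitespace tokenization, chosen only to keep the sketch short.
        for term, count in Counter(text.lower().split()).items():
            totals[term] += weight * count
    return totals


# Toy document: section name -> section text.
doc = {
    "Abstract": "mutation causes disease",
    "Results": "mutation observed",
}
features = weighted_term_counts(doc)
```

In this toy document, a term that occurs in the Abstract contributes twice as much to the feature vector as the same term occurring in Results; the resulting counts could then be fed to any standard text classifier.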
Published in
Proceedings of the First International Symposium on Semantic Mining in
Biomedicine (SMBM), pp. 34-37. Hinxton, UK, April 2005.
@InProceedings{Hakenberg:2005a,
  author    = {J\"org Hakenberg and Juliane Rutsch and Ulf Leser},
  title     = {Tuning Text Classification for Hereditary Diseases with Section Weighting},
  booktitle = {Proc International Symposium on Semantic Mining in Biomedicine, SMBM},
  address   = {Hinxton, UK},
  pages     = {34-37},
  month     = {April},
  year      = 2005
}