In this paper, we propose solutions to the problem of many spelling variants and the problem of
a lack of annotated corpora for training, which are two of the main difficulties in named entity
recognition in the biomedical domain. To resolve the problem of spelling variants, we propose using edit distance as a feature for the SVM. We also propose using virtual examples to automatically expand the
annotated corpus and thereby address the lack-of-corpus problem. With virtual examples, the annotated corpus can be extended quickly, efficiently, and easily. The experimental results show that introducing edit distance yields some improvement in protein name recognition performance. Moreover, the model
trained on the corpus expanded with virtual examples outperforms the model trained on the original
corpus. With the proposed methods, we achieve an F-measure of 75.80
(71.89% precision, 80.15% recall) in protein name recognition on the GENIA corpus (ver. 3.0).
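
As a rough illustration of how edit distance can absorb spelling variants, the following minimal sketch computes the Levenshtein distance between a token and the entries of a small name lexicon and returns the smallest length-normalized distance as a single numeric feature. The feature definition, the function names, and the example lexicon are assumptions made for illustration only; the abstract does not specify how the edit-distance feature is actually encoded for the SVM.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertion, deletion, and substitution."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep the row short
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def min_normalized_distance(token: str, lexicon: Sequence[str]) -> float:
    """Hypothetical feature: smallest length-normalized distance to any lexicon entry.

    Returns 1.0 (maximally distant) when the lexicon is empty.
    """
    if not lexicon:
        return 1.0
    token = token.lower()
    return min(
        edit_distance(token, entry.lower()) / max(len(token), len(entry), 1)
        for entry in lexicon
    )


# Illustrative lexicon (assumed, not taken from the paper).
lexicon = ["NF-kappa B", "interleukin-2"]
feature = min_normalized_distance("NF-kappaB", lexicon)
```

Under these assumptions, the variant "NF-kappaB" differs from the lexicon entry "NF-kappa B" by one inserted space, so it receives a small feature value (0.1) even though an exact dictionary lookup would miss it.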