My thesis is about part-of-speech (POS) taggers. I analyzed their behavior and evaluated their accuracy for incremental tagging. For this, I adapted HunPOS to work incrementally.
Here you can download my thesis as a PDF (it's in German): download
Since my thesis is written in German, I will use this page to give a brief overview of the content in English.
What are POS taggers?
POS taggers are programs that assign parts of speech (also known as word classes) to words. You feed in a sentence and get the part of speech for every word in it. Have a look at the Wikipedia article for more information.
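To make the input/output concrete, here is a toy sketch of what a tagger does. The lexicon and tag names are invented for illustration; a real tagger like those below uses statistical models instead of a fixed dictionary.

```python
# A hypothetical toy lexicon, just to show a tagger's interface.
LEXICON = {
    "the": "DET", "dog": "NOUN", "barks": "VERB", "loudly": "ADV",
}

def tag(sentence):
    """Assign a part of speech to every word. Unknown words get 'NOUN'
    as a naive fallback (real taggers use suffix and context models)."""
    return [(w, LEXICON.get(w.lower(), "NOUN")) for w in sentence.split()]

print(tag("The dog barks loudly"))
# [('The', 'DET'), ('dog', 'NOUN'), ('barks', 'VERB'), ('loudly', 'ADV')]
```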
Evaluation of POS taggers
Evaluations of taggers are sometimes very brief. Often only the accuracy of the tagger on one corpus (which has to be split into a training set and a test set) is used as the evaluation measure.
There are several additional measures that should be taken into account:
- Accuracy over different sizes of the training set
- Robustness against unknown words
- Performance on languages other than English
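The two accuracy measures above can be computed like this. This is a minimal sketch with invented data, not code from POSeval; `unknown_accuracy` restricts the score to words absent from the training vocabulary.

```python
def accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def unknown_accuracy(words, gold, predicted, train_vocab):
    """Accuracy restricted to words not seen in the training set."""
    pairs = [(g, p) for w, g, p in zip(words, gold, predicted)
             if w not in train_vocab]
    return sum(g == p for g, p in pairs) / len(pairs)

# Invented example: "miaows" is an unknown word and gets mistagged.
words = ["the", "dog", "miaows"]
gold = ["DET", "NOUN", "VERB"]
pred = ["DET", "NOUN", "NOUN"]
print(accuracy(gold, pred))                                 # 0.666...
print(unknown_accuracy(words, gold, pred, {"the", "dog"}))  # 0.0
```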
POSeval is a collection of tools that help to evaluate POS taggers. It might seem like overkill at first, but it's really handy if you want to run several tests.
The interface of the main screen looks like this:
(it looks better on my screen because I use a light-text, dark-background color palette).
I will write more about POSeval soon.
TnT is a tagger based on higher-order hidden Markov models. It's a well-known tagger, very fast, but sadly not free software.
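HMM taggers like TnT find the most probable tag sequence with the Viterbi algorithm. Here is a minimal bigram sketch of that idea with an invented toy model; TnT itself uses smoothed trigram models and handles unknown words via suffix statistics.

```python
import math

def viterbi(words, tags, trans, emit, start):
    """Bigram Viterbi decoding: return the most probable tag sequence.
    trans[(prev, cur)], emit[(tag, word)] and start[tag] are probabilities;
    unseen events get a tiny floor instead of proper smoothing."""
    FLOOR = 1e-12
    # chart[t] = (log-probability of the best path ending in tag t, that path)
    chart = {t: (math.log(start.get(t, FLOOR))
                 + math.log(emit.get((t, words[0]), FLOOR)), [t])
             for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            score, path = max(
                (chart[p][0] + math.log(trans.get((p, t), FLOOR)), chart[p][1])
                for p in tags)
            new[t] = (score + math.log(emit.get((t, w), FLOOR)), path + [t])
        chart = new
    return max(chart.values())[1]

# Toy model, invented for illustration:
tags = ["DET", "NOUN", "VERB"]
start = {"DET": 0.9}
trans = {("DET", "NOUN"): 0.9, ("NOUN", "VERB"): 0.8}
emit = {("DET", "the"): 0.9, ("NOUN", "dog"): 0.8, ("VERB", "barks"): 0.7}
print(viterbi(["the", "dog", "barks"], tags, trans, emit, start))
# ['DET', 'NOUN', 'VERB']
```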
HunPOS is a reimplementation of TnT. It's free software and written in OCaml.
SVMTool is based on support vector machines. It's not as fast as TnT or HunPOS, but more accurate.
This one is a rule-based tagger. It learns a list of transformations that are applied to the sentence. This tagger is from 1994 and can't compete with the others in terms of accuracy.
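The core of transformation-based (Brill-style) tagging is easy to sketch: start from an initial tagging and apply the learned context rules in order. The rule format and the example rule below are simplified for illustration; the real Brill tagger learns many rule templates.

```python
def apply_rules(tagged, rules):
    """Apply transformation rules in order. Each rule is a triple
    (from_tag, to_tag, prev_tag): change from_tag to to_tag when the
    previous word carries prev_tag. (A simplified Brill-style rule.)"""
    tagged = list(tagged)
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(tagged)):
            word, t = tagged[i]
            if t == from_tag and tagged[i - 1][1] == prev_tag:
                tagged[i] = (word, to_tag)
    return tagged

# "can" is initially mistagged as a verb; a learned rule fixes it.
initial = [("the", "DET"), ("can", "VERB")]
rules = [("VERB", "NOUN", "DET")]  # VERB becomes NOUN after a determiner
print(apply_rules(initial, rules))
# [('the', 'DET'), ('can', 'NOUN')]
```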
The following tests were run on the NEGRA corpus (a German newspaper corpus).
The x-axis of each graph shows the size of the training set, the y-axis the accuracy. Three different train/test splits were used for every training-set size. You can see the best, worst, and average results as error bars.
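The error bars boil down to a simple summary over the three splits; a sketch with invented numbers:

```python
def summarize(accuracies):
    """Best, worst, and average accuracy over several train/test splits,
    as plotted in the error bars."""
    return max(accuracies), min(accuracies), sum(accuracies) / len(accuracies)

print(summarize([0.5, 0.75, 1.0]))  # (1.0, 0.5, 0.75)
```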
The most important graph first:
These are the best configurations for each tagger. As you can see, SVMTool wins.
Now, let's have a look at the performance on the words that were missing from the training set (the unknown words):
Here is the graph for the Brill-Tagger, competing with HunPOS:
Tagging on a different corpus
To investigate how well the taggers can adapt to new texts, I trained them on the same sentences as above, but let them tag sentences from the heise corpus (unfortunately not released yet). Since these articles are mostly technical, their structure can differ from that of the NEGRA corpus.
Here are the results:
And here are the results for unknown words:
As you might have seen, I wanted to work on incremental taggers. I've adapted HunPOS to work incrementally; here is the source. SVMTool didn't need such an adaptation because its use of the Viterbi algorithm is optional.
HunPOS doesn't use lookahead; SVMTool comes in three variants: normal (lookahead of two words), 1_lookahead, and no_lookahead.
Here are the results:
As you can see, HunPOS performs as well as SVMTool without lookahead, and the difference between a lookahead of one and two words is marginal.
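The essence of incremental tagging with bounded lookahead can be sketched as follows. This is a hypothetical interface, not the actual HunPOS or SVMTool API: when tagging word i, the tagger may only see the words up to i plus `lookahead` words of right context.

```python
def incremental_tag(words, tag_word, lookahead=0):
    """Tag words one at a time. For word i, the tagger function tag_word
    only sees words[: i + 1 + lookahead], never the whole sentence."""
    tags = []
    for i in range(len(words)):
        visible = words[: i + 1 + lookahead]
        tags.append(tag_word(visible, i))
    return tags

# A trivial "tagger" that just reports how many words it could see:
print(incremental_tag(["a", "b", "c"], lambda vis, i: len(vis), lookahead=1))
# [2, 3, 3]
```

With lookahead 0 the decision for a word is final the moment it arrives, which is what the adapted HunPOS does; larger lookahead trades latency for context.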