Westfalica electronica / Automatic normalization for linguistic annotation of historical language data

Title record

Title	Automatic normalization for linguistic annotation of historical language data / Marcel Bollmann
Author	Bollmann, Marcel
Published	Bochum, 2013
Extent	84 pp.
Series	Bochumer linguistische Arbeitsberichte ; 13
Subject headings (GND)	Corpus / Annotation / Online resource
URN	urn:nbn:de:hbz:6:2-45690

Accessibility
The document is publicly accessible online.

Files
Automatic normalization for linguistic annotation of historical language data [PDF, 0.81 MB]

Abstract

This paper deals with spelling normalization of historical texts as a preprocessing step for modern part-of-speech (POS) taggers. Different methods for this task are presented and evaluated on a set of historical German texts from the 15th to 18th centuries, and specific problems inherent to the processing of historical data are discussed. A chain combination of word-based and character-based techniques is shown to perform best for normalization, while POS tagging of normalized data is shown to benefit from ignoring punctuation marks. With these techniques, and with 500 manually normalized tokens as training data for the normalization, the tagging accuracy on a 15th-century manuscript can be raised from 28.65% to 76.27%.
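
To illustrate what a "chain combination" of word-based and character-based techniques might look like, the following is a minimal Python sketch. It is not the paper's implementation: the function names, the toy training pairs, and the character rewrite rules are all hypothetical and chosen only for illustration. The sketch assumes that a word-based lookup learned from manually normalized tokens is tried first, and that character-level rewrites are applied only to tokens the lookup has not seen.

```python
from collections import Counter, defaultdict


def train_wordlist(pairs):
    """Map each historical form to its most frequent normalization in the pairs."""
    counts = defaultdict(Counter)
    for hist, norm in pairs:
        counts[hist][norm] += 1
    return {hist: c.most_common(1)[0][0] for hist, c in counts.items()}


def apply_char_rules(word, rules):
    """Apply (old, new) substring rewrites in order: a crude character-based fallback."""
    for old, new in rules:
        word = word.replace(old, new)
    return word


def normalize(token, wordlist, rules):
    """Chain: word-based lookup first, character-based rules only for unseen tokens."""
    if token in wordlist:
        return wordlist[token]
    return apply_char_rules(token, rules)


# Hypothetical toy data, not taken from the paper.
training_pairs = [("vnd", "und"), ("vnd", "und"), ("jn", "in")]
char_rules = [("v", "u"), ("th", "t")]  # illustrative only; far too coarse for real text

wordlist = train_wordlist(training_pairs)
normalized = [normalize(t, wordlist, char_rules) for t in ["vnd", "jn", "vns", "thun"]]
print(normalized)
```

In this sketch the ordering of the chain matters: tokens seen in training bypass the character rules entirely, so overly general rewrites can only affect unseen tokens. A realistic system would learn context-sensitive rules from the training data rather than use a hand-written list.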

Classification

All legal deposit documents → 400 Language → 400 Language, Linguistics

Links

Catalog record	Catalog record at ULB Münster

Statistics
The PDF document has been downloaded 10 times.

Usage note

	The work may be used within the limits of German copyright law.
