Grammar and Corpora 2016
Recommended citation (chapter)

Schauwecker, Yela and Stein, Achim: Automatic Morphosyntactic and Dependency Annotation of the Anglo-Norman Text Database, in: Fuß, Eric et al. (eds.): Grammar and Corpora 2016, Heidelberg: Heidelberg University Publishing, 2018.


This work is licensed under the Creative Commons License 4.0 (CC BY-SA 4.0).

Identifiers (book)
ISBN 978-3-946054-84-9 (PDF)
ISBN 978-3-946054-82-5 (Softcover)
ISBN 978-3-946054-83-2 (Hardcover)

Published on 16 May 2018.

Yela Schauwecker, Achim Stein

Automatic Morphosyntactic and Dependency Annotation of the Anglo-Norman Text Database

Abstract Non-standardized languages are an immense challenge for automatic annotation. This paper discusses the case of Anglo-Norman (AN), the variety of Old French (OF) spoken and written in medieval England for over 300 years, until well after 1400. In addition to the irregularities in, for example, spelling, inflection, and word order that are also characteristic of OF, AN developed particular spelling variants and shows even less consistent case marking, as well as considerable diachronic variation between the earliest (c1112) and the latest (c1440) texts in the Anglo-Norman text database (Rothwell and Trotter 2005; henceforth “ANdb”).

We present the first attempt to provide an automatic grammatical analysis of the ANdb. We applied machine-learning techniques combined with lexicon-driven tools that were trained on OF resources. This paper is organized according to the individual steps in the annotation process: Section 1 gives a succinct overview of the historical context and some relevant linguistic peculiarities of AN. Section 2 deals with the automated graphical “normalisation” of the texts. We generated regularized spellings that temporarily replaced the original graphical forms during the annotation process to improve the accuracy of lemmatisation, part-of-speech tagging, and dependency parsing. Section 3 describes how a dependency parser developed for Old French was applied to the normalised version of the AN data, and discusses the usefulness of the parsed output for historical syntactic research.
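
The idea of temporarily substituting regularized spellings during annotation can be illustrated with a minimal sketch. It is not the authors' implementation: the rule table `TOY_RULES`, the stand-in `toy_tagger`, its two-entry lexicon, and the tag labels are all hypothetical and serve only to show how an annotation tool can be given the normalised form while the output keeps the original graphical form.

```python
from dataclasses import dataclass

# Hypothetical spelling rules, invented for illustration only; they are NOT
# the normalisation rules used for the ANdb.
TOY_RULES = {"ki": "qui", "rei": "roi"}


@dataclass
class Token:
    original: str         # graphical form as attested in the text
    normalised: str       # regularized form shown to the annotation tool
    annotation: str = ""  # filled in by the tagger stand-in


def normalise(form: str) -> str:
    """Return a regularized spelling, or the lowercased form if no rule applies."""
    lowered = form.lower()
    return TOY_RULES.get(lowered, lowered)


def toy_tagger(form: str) -> str:
    """Hypothetical stand-in for a tool trained on OF; knows only regular forms."""
    lexicon = {"qui": "PRO:rel", "roi": "NOM"}  # assumed labels, for illustration
    return lexicon.get(form, "UNKNOWN")


def annotate(forms):
    """Annotate via the normalised forms, then report the original forms."""
    tokens = [Token(f, normalise(f)) for f in forms]
    for tok in tokens:
        # The tool sees the normalised form; the original stays attached.
        tok.annotation = toy_tagger(tok.normalised)
    # The substitution is temporary: the result pairs the ORIGINAL form
    # with the annotation obtained from the normalised form.
    return [(tok.original, tok.annotation) for tok in tokens]
```

In this toy setting, `toy_tagger("ki")` fails on the unnormalised spelling, whereas `annotate(["Ki", "rei"])` returns annotations attached to the spellings as they appear in the text.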

Keywords Dependency parsing, part-of-speech tagging, automatic spelling normalisation, Anglo-Norman, Old French historical corpora