Joanna Bilińska, Monika Kwiecień, Magdalena Derwojedowa
Microcorpus of Nineteenth-Century Polish
Abstract This paper describes a one-million-word corpus of Polish texts from the period 1830–1918. The corpus was compiled to provide diversified linguistic data for morphological analysis; however, several tests showed that it can also serve as a versatile resource for identifying various linguistic phenomena and tracing their dynamics in inflection, spelling, and even syntax. To ensure stylistic variety, it is divided into five equal subcorpora: scientific texts for the general public, news, feuilletons, fiction, and drama. For morphological analysis, an analyzer designed for contemporary texts was adapted so that it can process word forms that differ from contemporary inflection and spelling. Several experiments conducted with the corpus are also discussed.
Keywords Morphological analysis, spelling, 19th-century Polish, corpus