Todorova, Velislava and Chinkina, Maria: Significance Filters for N-gram Viewer, in Bubenhofer, Noah and Kupietz, Marc (eds.): Visualisierung sprachlicher Daten: Visual Linguistics – Praxis – Tools, Heidelberg: Heidelberg University Publishing, 2018, pp. 301–314. https://doi.org/10.17885/heiup.345.c4407

Identifiers (book)

ISBN 978-3-946054-75-7 (PDF)
ISBN 978-3-946054-77-1 (Hardcover)
ISBN 978-3-947732-15-9 (Softcover)




Velislava Todorova, Maria Chinkina

Significance Filters for N-gram Viewer

Abstract. This paper presents a visualization tool for analyzing tendencies in language use over time. Given a dated and tokenized corpus, the tool calculates the frequencies of selected n-grams and plots them as data points on a line chart, with time on the x-axis and relative frequency on the y-axis. The graph can optionally be smoothed to make the general tendency more salient. The user can specify an n-gram as a sequence of tokens, lemmas, and/or POS tags, provided the corpus contains these annotations. Along with the original text, the tool also accesses the metadata of the corpus, such as dates and authors’ names, so that the use of n-grams by different authors in different time periods can be compared in context. The latest version of the tool introduces a filtering mechanism that indicates the time periods in which the observed values within one or more datasets differ significantly. We used Fisher’s exact test of independence because it has the advantage of providing reliable results even for sparse data.
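To illustrate the kind of filter described above, the following minimal Python sketch applies a two-sided Fisher's exact test to n-gram counts from two datasets, period by period. It is not the tool's actual implementation. The function names, the input layout (a mapping from period to a pair of n-gram count and total n-gram count), and the significance threshold of 0.05 are all assumptions made for this example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x in the top-left cell, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that ties with p_obs are counted.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def significant_periods(counts_a, counts_b, alpha=0.05):
    """Return the periods in which two datasets differ significantly.

    counts_a, counts_b: hypothetical input layout, mapping each period to
    (occurrences of the n-gram, total number of n-grams in that period).
    alpha: assumed significance threshold.
    """
    flagged = []
    for period in sorted(counts_a.keys() & counts_b.keys()):
        hits_a, total_a = counts_a[period]
        hits_b, total_b = counts_b[period]
        p = fisher_exact_two_sided(hits_a, total_a - hits_a,
                                   hits_b, total_b - hits_b)
        if p < alpha:
            flagged.append(period)
    return flagged


# Made-up counts for two authors, purely for illustration.
author_a = {1900: (12, 10000), 1910: (5, 10000)}
author_b = {1900: (3, 10000), 1910: (6, 10000)}
flagged = significant_periods(author_a, author_b)
```

With these made-up counts, only the period 1900 is flagged. Because the test is exact, it needs no large-sample approximation, which is why it remains usable when an n-gram occurs only a handful of times in a period.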