Lucie Flekova, Florian Stoffel, Iryna Gurevych, Daniel Keim
Content-based Analysis and Visualization of Story Complexity
Abstract
Obtaining insights into the style and content characteristics of a novel can benefit a large number of users. Parents and teachers may be interested in finding appropriate books for children. Booksellers may want to assess the fit of a candidate’s artwork in their portfolio or determine the target audience for their promotion activities. Literature scholars might discover particular stylistic similarities in the writing patterns of different authors. For all of the above, manually reviewing the textual content of the books is a tedious and time-consuming task that can be carried out only to a limited level of detail. The combination of automated data analysis of literature and computer-based visualization techniques proves to be powerful in giving a quick overview as well as providing details of the visualized data. In this chapter, we define the umbrella term Story Complexity and outline the text data analysis required to describe the properties of literature that contribute to the numerous aspects of this term. We introduce a multi-faceted model of story complexity by addressing numerous aspects of writing that can pose difficulties for human readers attempting to follow a storyline in fictional literature. Approximations of these aspects are computed automatically with state-of-the-art Natural Language Processing methods. We present the corresponding text data analysis methods and give examples of how the extracted data can be presented visually, so that the results of the data analysis can be perceived more effectively than by examining the extracted text properties numerically.