Abstract
This study investigates whether statistical analysis of text data can reveal divergent reader reactions to the same content when it is presented in different languages. To address this question, we used Marcel Proust’s “À la recherche du temps perdu,” a vast corpus spanning seven books with millions of words. We compared the original raw French text with the raw English translation by Scott Moncrieff, using three distinct analytical approaches. Linguistic Analysis: The first part of the study is a stochastic analysis of linguistic elements, including words, syntax, grammar, and punctuation. These details are crucial to understanding how a reader interprets the literary text. Visual Syntax Mapping: In the second part, we employed a visual syntax mapping (VSM) technique that assigns each word a numerical vector based on its placement and proximity to other words in the text, so that the text can be processed by machine learning models. Cosine similarity was then computed between character names and their surrounding words, generating a two-dimensional graph of the referential fictional space. Reader Reaction Analysis: The final part of the study evaluated reader reactions to words, based on eye movement and heart rate data, to determine their positive or negative connotations. By computing a value for every word in the text and averaging these values for each sentence, we created a comprehensive map of reader reactions across the seven books, showing where and by how much the reaction differs between the two languages.
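As a rough illustration of the second and third approaches, the following minimal sketch computes cosine similarity between two character names from their surrounding words and averages per-word reaction values over each sentence. It is not the study's implementation: the co-occurrence window stands in for the VSM vectors, and the function names, the window size, and the word-score dictionary are all hypothetical.

```python
import math
from collections import Counter


def cooccurrence_vector(tokens, target, window=2):
    """Count the words within `window` positions of each occurrence of `target`.

    Assumption: a simple co-occurrence count stands in for the VSM vectors.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sentence_scores(sentences, word_scores):
    """Average per-word reaction values for each sentence.

    Words without a value are skipped; a sentence with no scored words gets 0.0.
    `word_scores` is a hypothetical word-to-value mapping.
    """
    averages = []
    for sentence in sentences:
        values = [word_scores[w] for w in sentence if w in word_scores]
        averages.append(sum(values) / len(values) if values else 0.0)
    return averages
```

Running the same functions on the tokenized French and English texts would give two sentence-level series that can be compared position by position, provided the sentences are aligned between the two versions.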
Presenters
Wright Donald, Professor of French and Arabic, Director of Middle Eastern Studies, Global Languages and Cultures, Hood College, Maryland, United States
Details
Presentation Type
Paper Presentation in a Themed Session
Theme
Communications and Linguistic Studies
Keywords
Linguistics, Machine-Learning, Data, Quantitative Analysis, Translation