Distinguishing Fact from Fiction: Pattern Recognition in Texts Using Complex Networks - Computer Science > Computation and Language





Abstract: We establish concrete mathematical criteria to distinguish between different kinds of written storytelling, fictional and non-fictional. Specifically, we constructed a semantic network from both novels and news stories, with $N$ independent words as vertices or nodes, and edges or links allotted to words occurring within $m$ places of a given vertex; we call $m$ the word distance. We then used measures from complex network theory to distinguish between news and fiction, studying the minimal text length needed as well as the optimized word distance $m$. The literature samples were found to be most effectively represented by their corresponding power laws over degree distribution $P(k)$ and clustering coefficient $C(k)$; we also studied the mean geodesic distance, and found all our texts were small-world networks. We observed a natural break-point at $k=\sqrt{N}$ where the power law in the degree distribution changed, leading to separate power law fits for the bulk and the tail of $P(k)$. Our linear discriminant analysis yielded a $73.8 \pm 5.15\%$ accuracy for the correct classification of novels and $69.1 \pm 1.22\%$ for news stories. We found an optimal word distance of $m=4$ and a minimum text length of 100 to 200 words $N$.
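The network construction described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the regex tokenizer, the reading of "independent words" as distinct lowercase tokens, and the choice to put $k=\sqrt{N}$ itself in the bulk are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def build_cooccurrence_network(text, m=4):
    """Link each distinct word to the words occurring within m places of it.

    Assumption: words are lowercase alphabetic tokens; the abstract does not
    specify the tokenization.
    """
    words = re.findall(r"[a-z']+", text.lower())
    neighbors = {w: set() for w in words}
    for i, w in enumerate(words):
        # Look ahead m positions; edges are undirected, so this covers both sides.
        for other in words[i + 1 : i + 1 + m]:
            if other != w:
                neighbors[w].add(other)
                neighbors[other].add(w)
    return neighbors


def degree_distribution(neighbors):
    """Return P(k): the fraction of vertices that have degree k."""
    n = len(neighbors)
    counts = Counter(len(adj) for adj in neighbors.values())
    return {k: c / n for k, c in sorted(counts.items())}


def local_clustering(neighbors, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    adj = list(neighbors[v])
    k = len(adj)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if adj[j] in neighbors[adj[i]]
    )
    return 2.0 * links / (k * (k - 1))


def split_at_sqrt_n(pk, n):
    """Split P(k) at k = sqrt(N) into bulk and tail (boundary placement assumed)."""
    cut = math.sqrt(n)
    bulk = {k: p for k, p in pk.items() if k <= cut}
    tail = {k: p for k, p in pk.items() if k > cut}
    return bulk, tail


if __name__ == "__main__":
    sample = "the cat sat on the mat"  # toy text, far below the 100-200 word minimum
    net = build_cooccurrence_network(sample, m=4)
    pk = degree_distribution(net)
    bulk, tail = split_at_sqrt_n(pk, len(net))
    print(pk, bulk, tail)
```

Each half returned by `split_at_sqrt_n` could then be fitted separately, for example by a straight-line fit in log-log space, to obtain the two power-law exponents the abstract mentions. The fitting procedure and the linear discriminant analysis are not reproduced here.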



Authors: J. T. Stevanak, David M. Larue, Lincoln D. Carr

Source: https://arxiv.org/






