Sparseness-constrained Nonnegative Tensor Factorization for Detecting Topics at Different Time Scales
From MaRDI portal
Publication: 6350495
arXiv: 2010.01600
MaRDI QID: Q6350495
Lara Kassab, Hanbaek Lyu, Deanna Needell, Alona Kryshchenko, Denali Molitor, Elizaveta Rebrova, Jiahong Yuan
Publication date: 4 October 2020
Abstract: Temporal data (such as news articles or Twitter feeds) often consists of a mixture of long-lasting trends and popular but short-lived topics of interest. A truly successful topic modeling strategy should be able to detect both types of topics and clearly locate them in time. In this paper, we compare the variability of topic lengths discovered by several well-known topic modeling methods, including latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), and its tensor counterparts based on the nonnegative CANDECOMP/PARAFAC tensor decomposition (NCPD and Online NCPD). We demonstrate that only the tensor-based methods, which have a dedicated mode for tracking time evolution, successfully detect short-lived topics. Furthermore, these methods are considerably more accurate in discovering the points in time when topics appeared and disappeared than matrix-based methods such as LDA and NMF. We propose quantitative ways to measure topic length and demonstrate the ability of NCPD (as well as its online variant) to discover short- and long-lasting temporal topics in semi-synthetic and real-world data, including news headlines and COVID-19-related tweets.
Has companion code repository: https://github.com/lara-kassab/dynamic-tensor-topic-modeling
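The key idea in the abstract — factoring a three-way (time × documents × words) tensor so that one factor matrix localizes each topic in time — can be sketched with plain NumPy multiplicative updates for nonnegative CP decomposition. This is an illustrative sketch, not the paper's implementation (see the companion repository above); the tensor sizes, the planted short-lived topic, and the function name `ncpd` are all assumptions made for the example.

```python
import numpy as np

def ncpd(X, rank, n_iter=300, eps=1e-12, seed=0):
    """Minimal nonnegative CP decomposition of a 3-way tensor via
    Lee-Seung-style multiplicative updates (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    for _ in range(n_iter):
        # Each update multiplies elementwise by (positive gradient part) /
        # (negative gradient part), so factors stay nonnegative.
        A *= np.einsum('ijk,jr,kr->ir', X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum('ijk,ir,kr->jr', X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum('ijk,ir,jr->kr', X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C

# Synthetic (time x documents x words) tensor with a planted
# short-lived topic: active only during the first few time slices.
rng = np.random.default_rng(1)
T, D, W, r = 20, 30, 50, 3
At = rng.random((T, r))
At[5:, 0] = 0.0                      # topic 0 "disappears" after t = 5
Ad = rng.random((D, r))
Aw = rng.random((W, r))
X = np.einsum('tr,dr,wr->tdw', At, Ad, Aw)

A, B, C = ncpd(X, r)
# The time factor A plays the role of the dedicated time mode: each
# column traces when the corresponding topic is active.
err = np.linalg.norm(X - np.einsum('tr,dr,wr->tdw', A, B, C)) / np.linalg.norm(X)
print(round(err, 3))
```

Because the time mode gets its own factor matrix, a short-lived topic shows up as a column of `A` that is near zero outside a narrow time window — exactly the temporal localization that matrix-based methods like LDA and NMF lack.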