tosca: tool for statistical content analysis

The tool was developed by Lars Koppers, Jonas Rieger, Karin Boczek and Gerret von Nordheim. It provides functions for exploring text corpora with topic models. The package focuses on the visualisation and validation of content analysis. To this end, it provides several filters for preprocessing and a wrapper for latent Dirichlet allocation (LDA) from the lda package to include topic models. Most visualisations present measures for corpora, subcorpora or LDA topics over time. To use this functionality, every document needs a date specification as metadata. To harmonize different text sources, we provide the S3 object textmeta.

Further information can be found on tosca's CRAN page.

ldaPrototype: Prototype of Multiple Latent Dirichlet Allocation Runs

The framework determines a prototype from a number of runs of Latent Dirichlet Allocation. The procedure selects the LDA run with the highest mean pairwise similarity to all other runs, i.e. the medoid of the runs.
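
The selection rule can be illustrated with a minimal sketch in Python. It is not the ldaPrototype API: the function name `select_prototype` and the similarity values are hypothetical, and the sketch assumes that a symmetric matrix of pairwise similarities between runs has already been computed by some measure.

```python
from typing import Sequence


def select_prototype(similarity: Sequence[Sequence[float]]) -> int:
    """Return the index of the run with the highest mean similarity to all other runs.

    `similarity[i][j]` is assumed to hold the similarity between run i and run j.
    The diagonal (self-similarity) is ignored.
    """
    n = len(similarity)
    if n < 2:
        raise ValueError("at least two runs are needed to select a prototype")

    best_index = 0
    best_mean = float("-inf")
    for i in range(n):
        # Similarities of run i to every other run, excluding itself.
        others = [similarity[i][j] for j in range(n) if j != i]
        mean_similarity = sum(others) / len(others)
        # Strict comparison: on ties, the first run encountered is kept.
        if mean_similarity > best_mean:
            best_mean = mean_similarity
            best_index = i
    return best_index


# Hypothetical pairwise similarities between four LDA runs.
sims = [
    [1.0, 0.8, 0.6, 0.7],
    [0.8, 1.0, 0.5, 0.9],
    [0.6, 0.5, 1.0, 0.4],
    [0.7, 0.9, 0.4, 1.0],
]
prototype = select_prototype(sims)
print(prototype)
```

In this made-up example the second run (index 1) has the highest mean similarity to the other three runs, so it would be chosen as the prototype.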

For further information, see the corresponding GitHub page.

rollinglda: Construct Consistent Time Series from Textual Data

RollingLDA is a rolling version of Latent Dirichlet Allocation. Using a sequential approach, it enables the construction of LDA-based time series of topics that are consistent with previous states of the LDA model. After the initial modeling, updates can be computed efficiently, allowing for real-time monitoring and detection of events or structural breaks.

For further information, see the corresponding GitHub page.