Universiteit Leiden


Text & datamining

Text and Data Mining (TDM) is increasingly applied in various academic disciplines to extract useful information from unstructured textual data using computational methods.

The Centre for Digital Scholarship (CDS) offers support to researchers wishing to apply TDM techniques.


We can offer help with:

  • the acquisition and the pre-processing of digitised and born-digital library collections for TDM research
  • data enrichment
  • data analysis and data visualisation
  • data curation and data preservation

Don’t hesitate to contact us if you need information about the application of TDM techniques.

Text & data mining explained

Text and Data Mining (TDM) encompasses a range of techniques with which texts in natural languages can be analysed, searched and visualised. Such techniques can be applied to scholarly content, such as journal articles or monographs, but also to tweets, works of literature or blog posts. Among many other applications, TDM can be used to identify the most frequently used words, to recognise named entities such as persons or organisations, or to characterise the sentiments expressed in a text. Leiden University Libraries hold an important collection of publications on TDM, which can be requested through the Catalogue.
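As a minimal illustration of the first application mentioned above, identifying the most frequently used words, the Python sketch below splits a text into word tokens and counts them with the standard library. The stop-word list and the function names are illustrative assumptions, not a CDS tool; real projects usually rely on a curated stop-word list and a proper tokeniser for the language being studied.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny English stop-word list (an assumption);
# these very common words would otherwise dominate the counts.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is",
    "are", "be", "that", "with", "for", "on", "as",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into runs of letters.

    The pattern [^\\W\\d_]+ matches Unicode letters only, so accented
    words such as 'café' stay whole while digits and punctuation are dropped.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def most_frequent_words(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most common words in text, ignoring stop words."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    # Counter.most_common orders ties by first occurrence in the text.
    return Counter(tokens).most_common(n)


if __name__ == "__main__":
    sample = (
        "Text mining turns text into data. Mining text at scale "
        "reveals patterns that close reading of a single text may miss."
    )
    print(most_frequent_words(sample, 3))
```

Named entity recognition and sentiment analysis normally build on the same first step (tokenisation) but require trained language models, which is why dedicated NLP libraries are typically used for them.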

What can be achieved, and how, depends partly on the format of the texts to be mined and on their licensing.

TDM techniques can be applied both to texts in the public domain and to texts that are still protected by copyright. Since 7 June 2021, articles 15n and 15o of the Dutch Copyright Act have stipulated that researchers affiliated with universities (and other non-commercial research institutes) are entitled to use texts they can legally access for the purpose of TDM. Publishers are not allowed to put barriers in place for this type of research. If you experience any obstacles while acquiring texts, feel free to contact us via cds@library.leidenuniv.nl.

You can also find more information on the web pages of the Copyright Information Office.

In TDM-based research, you can in principle use any textual source that you can access online. Please contact us by email at cds@library.leidenuniv.nl if you would like guidance on finding the texts you need.
