Finnish Text Corpus

STT Archive – Corpus with keywords

A person using a mobile phone reads a news story about the citation survey.

Lehtikuva / Pihla Lehmusjoki


The STT text corpus comprises Finnish-language newswire articles sent to media outlets between 1992 and 2021 (material from 2022 and 2023 will be added later). The corpus includes a few million items in total. Most of the material consists of news articles, ranging from short “news flashes” to telegrams and longer articles. The articles are categorized by department (domestic, foreign, economy, politics, culture, entertainment, and sports) as well as by metadata (IPTC subject codes or keywords, and location data).

The STT text corpus is available through the Language Bank of Finland (Kielipankki). For non-commercial research, you can get access to the corpus by filling in an application on the Language Bank site. For commercial research and use, we kindly ask you to contact us first through mediapalvelut(at)

