The STT Text corpus comprises Finnish-language news wire articles sent to media outlets between 1992 and 2018 (soon to be extended until 2021). The corpus includes about 2.8 million items in total. Most of the material consists of news articles, ranging from short “news flashes” to telegrams and longer articles. News articles are categorized by department (domestic, foreign, economy, politics, culture, entertainment and sports) as well as by metadata (IPTC subject codes or keywords and location data).
The STT Text corpus is available through the Language Bank of Finland (Kielipankki). For non-commercial research, you can get access to the corpus by filling in an application at the Language Bank site. For commercial research and use, we kindly ask you to contact us first at mediapalvelut(at)stt.fi.
For further information, please contact mediapalvelut(at)stt.fi