TY - GEN
T1 - ParlaMint: Comparable Corpora of European Parliamentary Data
AU - Erjavec, Tomaž
AU - Ogrodniczuk, Maciej
AU - Osenova, Petya
AU - Pančur, Andrej
AU - Ljubešić, Nikola
AU - Agnoloni, Tommaso
AU - Barkarson, Starkaður
AU - Calzada Pérez, María
AU - Çöltekin, Çağrı
AU - Coole, Matthew
AU - Darģis, Roberts
AU - de Macedo, Luciana D.
AU - de Does, Jesse
AU - Depuydt, Katrien
AU - Diwersy, Sascha
AU - Hansen, Dorte Haltrup
AU - Kopp, Matyáš
AU - Krilavičius, Tomas
AU - Luxardo, Giancarlo
AU - Marx, Maarten
AU - Morkevičius, Vaidas
AU - Navarretta, Costanza
AU - Rayson, Paul
AU - Ring, Orsolya
AU - Rudolf, Michał
AU - Simov, Kiril
AU - Steingrímsson, Steinþór
AU - Üveges, István
AU - van Heusden, Ruben
AU - Venturi, Giulia
PY - 2021
Y1 - 2021
N2 - This paper outlines the ParlaMint project from the perspective of its goals, tasks, participants, results and applications potential. The project produced language corpora from the sessions of the national parliaments of 17 countries, almost half a billion words in total. The corpora are split into COVID-related subcorpora (from November 2019) and reference corpora (to October 2019). The corpora are uniformly encoded according to the ParlaMint schema with the same Universal Dependencies linguistic annotations. Samples of the corpora and conversion scripts are available from the project’s GitHub repository. The complete corpora are openly available via the CLARIN.SI repository for download, and through the NoSketch Engine and KonText concordancers as well as through the Parlameter interface for exploration and analysis.
AB - This paper outlines the ParlaMint project from the perspective of its goals, tasks, participants, results and applications potential. The project produced language corpora from the sessions of the national parliaments of 17 countries, almost half a billion words in total. The corpora are split into COVID-related subcorpora (from November 2019) and reference corpora (to October 2019). The corpora are uniformly encoded according to the ParlaMint schema with the same Universal Dependencies linguistic annotations. Samples of the corpora and conversion scripts are available from the project’s GitHub repository. The complete corpora are openly available via the CLARIN.SI repository for download, and through the NoSketch Engine and KonText concordancers as well as through the Parlameter interface for exploration and analysis.
M3 - Article in proceedings
SP - 19
EP - 24
BT - Proceedings of CLARIN Annual Conference 2021
PB - CLARIN ERIC
ER -