- Lund Corpora
-
-
Swedia2000
-
This is both a research corpus and a public corpus of Swedish dialects. It was recorded between 1998 and 2001 by the Swedia 2000 project and described in IMDI within the ECHO project.
-
-
Tactile Reading
-
Recordings of participants reading a short text from Pippi Longstocking and feeling a tactile image of a face. The recordings were made with the automated finger-tracker developed by Björn Breidegård.
-
-
ThaiSweVideo
-
The corpus consists of 60 transcripts of interactions in everyday contexts between 6 children and their caregivers (10 transcripts per child), recorded longitudinally over the period when the children are 18 to 27 months of age. All six children are growing up in middle-class environments, in Sweden or in Thailand (Bangkok area). The videos of the corpus are linked to the transcripts on an utterance-by-utterance basis using the software CLAN (MacWhinney 2000). This makes the corpus suitable for, among other things, studies of the interaction between verbal and gestural communication.
-
-
The Barack Obama Corpus
-
The Barack Obama Corpus (BOC) consists of 6,215,948 words (tokens) drawn from nearly 3,500 different texts, dating from January 2009 to January 2016. The texts, all taken from the White House Archives, comprise all speeches given by Barack Obama in his official capacity as 44th President of the United States of America. The earliest speech in the BOC is President Obama’s inauguration speech and the last is his final State of the Union speech (January 2016). In total, the corpus includes 34,967 word types, which gives a type/token ratio of 0.56% (34,967 / 6,215,948 ≈ 0.0056).
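The type/token ratio can be checked directly from the two counts given in the description. The following is a minimal sketch in Python; the function and variable names are illustrative assumptions and are not part of the corpus or its documentation. It shows that the ratio is about 0.0056 as a fraction, which corresponds to 0.56 when expressed as a percentage.

```python
# Counts as reported in the corpus description.
TOKENS = 6_215_948  # running words
TYPES = 34_967      # distinct word forms


def type_token_ratio(types: int, tokens: int) -> float:
    """Return the number of word types divided by the number of tokens."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return types / tokens


ttr = type_token_ratio(TYPES, TOKENS)
# Print the ratio both as a fraction and as a percentage.
print(f"TTR = {ttr:.4f} ({ttr * 100:.2f}%)")
```

A ratio this low is expected for a corpus of several million tokens, since the number of distinct word forms grows much more slowly than the number of running words.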
The files, which retain the original titles given to them by the White House, have been tagged for genre, audience type, date and location of delivery, and principal topics. The genres include remarks, addresses, statements, press conferences, debates and question-answer sessions, while the audience types are divided into three: general public, specialized audience and press. The locations distinguish between the United States and abroad (Germany, UK, Indonesia etc.). Topics follow a six-way distinction: political issues (e.g. fiscal household), social issues (e.g. health care), humanitarian issues, environment, representation (e.g. ceremonial duties) and campaign speeches.
How to cite this resource:
Riesner, Katherina (2017). The Barack Obama Corpus [Data set]. http://hdl.handle.net/10050/00-0000-0000-0003-C53B-4@view
the_barack_obama_corpus_information.txt
-
-
The Politics of State Building
-
This project collects, digitizes, and segments the transcripts of parliamentary debates from several countries from the nineteenth century to the present. These corpora are then analyzed using NLP techniques to examine the political dimensions of investments in state capacity. This repository includes the supplementary materials and replication code for all publications related to the project, as well as different versions of the parliamentary corpora with their respective documentation. The corpora will be continuously updated.
-
-
USE
-
This corpus of written first-, second-, and third-term assignments by students of English at Uppsala University, Sweden, was collected by Ylva Berglund and Margareta Westergren Axelsson between 1999 and 2001. It is also available from the Oxford Text Archive, http://www.ota.ahds.ac.uk/. The corpus was described with the IMDI metadata set for the ECHO project.
It is organized into subcorpora either by task (useful for comparing essays by different people facing the same task) or by individual (that is, longitudinally; useful for comparing development over time in individual writers). Note that the two hierarchies are only two different ways of viewing the corpus: the data they point to is identical.