This project collects, digitizes, and segments transcripts of parliamentary debates from several countries, covering the nineteenth century to the present. The resulting corpora are analyzed with natural language processing (NLP) techniques to examine the political dimensions of investments in state capacity. This repository contains the supplementary materials and replication code for all publications related to the project, along with the different versions of the parliamentary corpora and their documentation. The corpora will be updated on an ongoing basis.