Big data and the study of language and culture: Parliamentary discourse across time and space

Conveners: Jukka Tyrkkö (Linnaeus University/University of Turku), Minna Korhonen (Macquarie University), Haidee Kruger (Macquarie University/North-West University)

Workshop description

The last two decades have witnessed increasing interest in the use of large digitised archives as linguistic primary data. While these are not corpora in the strictest traditional sense (Leech 1992), they nonetheless potentially provide vast amounts of evidence for investigating cultural phenomena (Baker & McEnery 2016; Schneider 2018) as well as language, stylistic and register change (Millar 2009; Rühlemann & Hilpert 2017). At the same time, this kind of data poses diverse conceptual and methodological challenges for researchers.

This workshop is dedicated to one specific type of large cultural and linguistic data: parliamentary records. Perhaps the best-known example of these in the English-speaking world is the Hansard, which has spread from Britain along with colonisation to other parts of the world. In addition to the British Hansard, digitised archival records from various countries are widely available, including recently compiled specialised diachronic corpora of parliamentary records from Australia, Canada, New Zealand, and South Africa.

Although this field of research is still young, parliamentary records have already been used for research from a variety of linguistic perspectives ranging from critical discourse analysis and sentiment detection to historical sociolinguistics and register analysis. As diachronic data, parliamentary records allow the analysis of language and register change (Kruger & Smith 2018; Kruger et al. submitted; Korhonen 2018; Hou & Smith 2018; Macalister 2006) as well as of societal shifts reflected in language (Michel et al. 2010; Rheault et al. 2016; Alexander & Struan 2017; Tyrkkö & Nevala 2018; Tyrkkö accepted). While providing unparalleled opportunities for the empirical investigation of both language variation and change, parliamentary records also allow for the investigation of how a comparable register is reshaped by different contexts over time (Kruger et al. submitted). These records also include varying amounts of metadata on speakers, their political affiliations and backgrounds, and the topics of the debates, and as such allow for the investigation of the role of individual speakers in language variation and change. Furthermore, as a hybrid spoken-written register, parliamentary records reflect the transformation from the actual spoken parliamentary discourse to the written record, which may involve substantial editorial intervention, often altering spoken usage in the direction of norms for formal writing (Slembrouck 1992; Mollin 2007). The practices of record-keeping and transcribing parliamentary records have varied across time and place, and their effect on compilation practices and the choice of methods in linguistic analyses has only recently been discussed in detail (Mollin 2007; Ryx 2014; Edwards 2016; Beelen et al. 2017; Hiltunen et al. 2018).

The conveners welcome contributions from any linguistic perspective that use one or more English-language parliamentary records as primary data. In addition to the various Hansards, we welcome contributions based on archives of the U.S. Congress or Senate, or similar data from other parts of the Anglophone world. The proposed papers should use quantitative methods, and make use of either the entire dataset, or a substantial part of it (e.g. in the form of a corpus based on particular design principles). Contrastive studies involving more than one parliamentary dataset, or other reference data, are also very welcome.

Potential topics include, but are not limited to:

  1. colloquialisation and democratisation
  2. register change in parliamentary debates
  3. parliamentary debates across varieties of English
  4. the use of parliamentary data to study language variation and change in English
  5. the role of individual speakers in language variation and change
  6. the transformation of spoken parliamentary discourse to written parliamentary records
  7. the linguistic use of parliamentary records as cultural artefacts.

The aim of the workshop is to promote best practices in the compilation and use of these specialised datasets, as well as to advance the use of computational, quantitative and corpus-based methods in the study of various aspects of political language. The papers presented in the workshop will be published as a collection edited by the conveners. The conveners will be in touch with the authors of the accepted papers prior to the conference.

References


Alexander, Marc & Andrew Struan. 2017. Digital Hansard: Politics and the uncivil. Digital Humanities 2017, Montréal, QC, Canada, 08-11 Aug 2017. 378-380.

Baker, Helen & Anthony McEnery. 2016. Corpus Linguistics and 17th-Century Prostitution. Bloomsbury.

Beelen, Kaspar, Timothy Alberdingk Thijm, Christopher Cochrane, Kees Halvemaan, Graeme Hirst, Michael Kimmins, Sander Lijbrink, Maarten Marx, Nona Naderi, Ludovic Rheault, Roman Polyanovsky & Tanya Whyte. 2017. Digitization of the Canadian parliamentary debates. Canadian Journal of Political Science/Revue canadienne de science politique 50(3): 849–864.

Edwards, Cecilia. 2016. The political consequences of Hansard editorial policies: The case for greater transparency. Australasian Parliamentary Review 31(2): 145–160.

Hiltunen, Turo, Jenni Riihimäki & Jukka Tyrkkö. 2018. Tracing the use of colloquial language in the British parliamentary record in late 19th and early 20th century. Conference presentation at ICAME39, 30 May to 3 June, 2018, Tampere, Finland.

Hou, Liwen & David A. Smith. 2018. Modeling the decline in English passivization. Proceedings of the Society for Computation in Linguistics. Vol. 1, Article 5.

Korhonen, Minna. 2018. Possession in the Australian Hansard: Stative have (got) in the 20th century. Conference presentation at ICAME39, 30 May to 3 June, 2018, Tampere, Finland.

Kruger, Haidee & Adam Smith. 2018. Colloquialization and densification in Australian English: A multidimensional analysis of the Australian Diachronic Hansard Corpus. Australian Journal of Linguistics 38(3): 293–328.

Kruger, Haidee, Bertus Van Rooy & Adam Smith. (Submitted). Register change in the British and Australian Hansard (1901–2015).

Leech, Geoffrey. 1992. Corpora and theories of linguistic performance. In Jan Svartvik (ed.) Directions in corpus linguistics. Berlin: Mouton de Gruyter. 105–122.

Macalister, John. 2006. The Maori presence in the New Zealand lexicon, 1850–2000: Evidence from a corpus-based study. English World-Wide 27(1): 1–24.

Michel, Jean-Baptiste, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak & Erez Lieberman Aiden. 2010. Quantitative analysis of culture using millions of digitized books. Science 331(6014): 176–182.

Millar, Neil. 2009. Modal verbs in TIME: Frequency changes 1923–2006. International Journal of Corpus Linguistics 14(2): 191–220.

Mollin, Sandra. 2007. The Hansard hazard: Gauging the accuracy of British parliamentary transcripts. Corpora 2(2): 187–210.

Rheault, Ludovic, Kaspar Beelen, Christopher Cochrane & Graeme Hirst. 2016. Measuring emotion in parliamentary debates with automated textual analysis. PLoS ONE 11(12).

Rühlemann, Christoph & Martin Hilpert. 2017. Colloquialization in journalistic writing: The case of inserts with a focus on well. Journal of Historical Pragmatics 18(1): 104–135.

Ryx, Kathryn. 2014. “Whatever passed in parliament ought to be communicated to the public”: Reporting the proceedings of the reformed commons, 1833–50. Parliamentary History 33(3): 453–474.

Schneider, Edgar W. 2018. The interface between cultures and corpora: Tracing reflections and manifestations. ICAME Journal 42(1): 97–132.

Slembrouck, Stef. 1992. The parliamentary Hansard ‘verbatim’ report: The written construction of spoken discourse. Language and Literature 1(2): 101–119.

Tyrkkö, Jukka. (Accepted). Kinship references in the British Parliament, 1800–2005. In Lutzky, Ursula & Minna Nevala (eds) Reference and identity in public discourses (Pragmatics and Beyond). Amsterdam: John Benjamins.

Tyrkkö, Jukka & Minna Nevala. 2018. Lunatics, crackpots and maniacs: Mental illness and the democratisation of British public discourses. Conference presentation at ICEHL20, 27 August to 30 August, 2018, Edinburgh.