The Corpus

What drives the use of text-analytical methods is the idea that our images of children and childhood have been, and continue to be, expressed in – amongst other possibilities – written form. That is to say, as texts. So analyzing pertinent texts can give us useful evidence for what our images are. Beyond that quite general intuition, there’s the more specific idea that, at least in their broad parameters, most of the images with traction today are the result of historical discourses about children and childhood. One need not examine every word that has been or is being said about children in order to grasp these images. Rather, a large, but still relatively contained, set of historical texts generally sets the terms within which our images vary.

The goal in assembling a corpus – that is to say, the ‘body’ of texts that constitutes the data for the project – is to pull together as much of this history, for as much of the world, as possible, given the time, technical, linguistic, and financial constraints of the project.

A good starting point is to assemble the primary sources used in the two books in the image above. They draw on a great many sources, and these offer relatively good, if English-language-biased, coverage of the modern, Eurocentric tradition, coverage that could be supplemented by some similar related works. Identifying sources from earlier in the Eurocentric tradition is relatively straightforward, certainly for the ‘classic’ Greek and Roman literature, if only because there’s relatively little of it and it is all available in one place online. Identifying sources from outside the Eurocentric tradition is somewhat more challenging – at least for one raised and trained within that tradition – but is by no means impossible, as the two books shown below demonstrate.

The more challenging issue, regardless of the tradition considered, is that these primary sources were composed in many, many languages. While generally good English translations exist for many of them, that is not true for all of them. Moreover, there is good reason to think it would be better to consider all the sources in their original languages. That poses a significant practical analytical problem, but one worth exploring. It may turn out that, within the limits of this project, relying on English translations is necessary. But if a better solution can be found, it should definitely be pursued.

Speaking of limitations, it is worth saying a few words about sampling and bias. Take for granted that the corpus used for this project will be biased towards English-language texts (even if I’m able to use a multilingual corpus), towards ‘Western’ texts, and towards texts – from whatever tradition – by dominant-culture male intellectuals. I don’t say that to justify the bias; quite the contrary. I say it to be explicit, in advance, about the limits I expect to encounter in acquiring sources. In principle, what I want is the population – or a good, fully representative, random sample – of all the things that have ever been said or thought or done about human children and childhood, anywhere. What I will be getting is a partly purposive, partly convenience, sample of things that have been written, published, and come down to us in the historical record, and that are available to an anglophone Canadian university student in 2021. Those are seriously different things. A key part of the ‘purposive’ aspect of the sample will be to specifically seek out a wider range of sources, demographically speaking, than convenience alone might dictate.

Another part of that ‘purposive’ aspect – and this is not a limitation – is to specifically seek out a wider range of sources, ideologically speaking, than convenience alone would dictate. The goal of this project is to try to model all the different images of children and childhood prevailing in the world today, if perhaps skewed towards an anglo-Canadian standpoint in the world. That means the different images, not just the mainstream, modal, typical, ‘consensus’ images, whether that mainstream is defined ethnically, linguistically, religiously, or occupationally. I want to capture the variation in our images, specifically including minority opinions, and specifically including stigmatized minority opinions. So, even in principle, I would want to oversample views seen as ‘extreme’ – regardless of how noxious I might myself consider some of those views. In practical terms, I will need to consciously seek out texts embodying such views, which may not be easy.
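
One way to keep these sampling concerns explicit as the corpus grows is to tally its composition from the bibliography's metadata. What follows is only a minimal sketch in Python, assuming each source is recorded with a few hypothetical fields (`language`, `tradition`, `selection`). The field names, the `composition` helper, and the four records are placeholders for illustration, not actual entries in the corpus.

```python
from collections import Counter

# Hypothetical placeholder records: the field names and values are
# illustrative assumptions, not actual entries in the corpus.
SOURCES = [
    {"title": "Source A", "language": "English", "tradition": "Western", "selection": "convenience"},
    {"title": "Source B", "language": "English", "tradition": "Western", "selection": "convenience"},
    {"title": "Source C", "language": "French", "tradition": "Western", "selection": "purposive"},
    {"title": "Source D", "language": "Arabic", "tradition": "non-Western", "selection": "purposive"},
]


def composition(sources, field):
    """Return each value's share of the corpus for one metadata field."""
    counts = Counter(source[field] for source in sources)
    total = sum(counts.values())
    # An empty corpus yields an empty dict, so no division by zero occurs.
    return {value: count / total for value, count in counts.items()}


if __name__ == "__main__":
    # Print the share of each value, field by field.
    for field in ("language", "tradition", "selection"):
        print(field)
        for value, share in sorted(composition(SOURCES, field).items()):
            print(f"  {value}: {share:.0%}")
```

Reporting shares like these alongside the bibliography would show, at a glance, how far the purposive additions have moved the corpus away from what convenience alone would yield.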

I will be adding a bibliography of texts included in the corpus to this page, with links where possible, as I acquire them. Wish me luck!