For the purposes of my current research, I approach the web essentially as a corpus. There is no shortage of material online about … well, anything … but certainly not about children, childhood, parenting and education. Some of that material consists of historical documents: the influences that have shaped the understandings of children and childhood we currently hold. Most of it consists of current expressions of those understandings. My research aims to use the historical material to help us organize those current expressions.
In that context, my main goal for LLCU 607 is to learn more about the efficient collection and analysis of content from the web. I expect I’ll also learn about some corners of the web with which I’m not already familiar and about some of the ways in which the web itself is currently being discussed in the humanities. But I am most interested in learning how to exploit the web for research purposes. Not to be crass or anything …
I have been using the internet since 1990 and the web, well, basically since it took off in 1993. That said, I have only once set up a website of my own (in 2003), and it was very basic and almost entirely static. It was more a way to make file transfers easier for myself than anything intended to attract visitors. I was never particularly captivated by blogging or by sites like LiveJournal and MySpace, and I actively resisted the shift to Facebook, even though most of my friends were early adopters.
In that respect, I have very little experience with ‘digital cultural production’. At the same time, I have been writing on computers since the early 1980s and have been producing graphs, maps, posters, and publications professionally since 2006. So how much experience I have depends partly on what one counts as ‘digital cultural production’. This is even more true of ‘digital methods’. There are things I know very well (stats, GIS). There are things I know hardly at all (audio and video processing). I have a reasonable foundation in programming, but only a foundation.
Practically speaking, my goals include:
- getting some baseline proficiency with Scrapy and Beautiful Soup,
- getting a better understanding of the different API styles and protocols (REST, SOAP, etc.) and how best to interact with them,
- learning how to parse XML and JSON results efficiently, and
- developing a more consistent workflow for acquiring, cleaning, and processing web-derived data, especially data from social networking sites.
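To make the parsing and workflow goals a little more concrete, here is a minimal sketch of the kind of normalization step I have in mind, using only Python's standard library (`json` and `xml.etree.ElementTree`). The two payloads, the `posts`/`post` structure, and the field names are invented for illustration and do not come from any real API; actual responses would need their own parsing rules.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads: the same two records as a JSON API and an XML API might return them.
JSON_RESPONSE = (
    '{"posts": ['
    '{"id": 1, "author": "a_parent", "text": "First day of school"}, '
    '{"id": 2, "author": "a_teacher", "text": "Reading hour"}]}'
)

XML_RESPONSE = """<posts>
  <post id="1"><author>a_parent</author><text>First day of school</text></post>
  <post id="2"><author>a_teacher</author><text>Reading hour</text></post>
</posts>"""


def parse_json(raw: str) -> list[dict]:
    """Turn the (hypothetical) JSON payload into a list of flat records."""
    data = json.loads(raw)
    return [
        {"id": int(p["id"]), "author": p["author"], "text": p["text"]}
        for p in data["posts"]
    ]


def parse_xml(raw: str) -> list[dict]:
    """Turn the (hypothetical) XML payload into the same list of flat records."""
    root = ET.fromstring(raw)
    return [
        {
            "id": int(post.get("id")),  # attributes come back as strings
            "author": post.findtext("author"),
            "text": post.findtext("text"),
        }
        for post in root.iter("post")
    ]


if __name__ == "__main__":
    from_json = parse_json(JSON_RESPONSE)
    from_xml = parse_xml(XML_RESPONSE)
    # Both formats reduce to identical records.
    print(from_json == from_xml)  # True
```

Reducing both formats to the same list of dictionaries means that later cleaning and analysis steps only have to handle one structure, whichever source the data came from.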