All Things Corpus!


The last TESOL Convention in Toronto seemed to be corpus-themed for me. I went to a number of sessions about using corpuses as a materials writer and as a teacher, and even about having students use corpuses themselves. I also learned about some new corpus tools, new aspects of old corpus tools, and lots of activity ideas.

And, yes, I’m just getting around to writing up things I did at TESOL. Better late than never.

Why Use a Corpus?

There were really three reasons I kept hearing that resonated with me:

1. Our instincts aren’t always right. Looking at how language is actually used is important because, frankly, what we think we know about language usage isn’t always correct. I suspect that as teachers, we tend to get a lot of textbook, overly formal input, which biases our ear. We also aren’t necessarily talking to a broad spectrum of society: no one is in constant communication with speakers from every region of the country (or the world), of all socio-economic statuses and cultural backgrounds. We’re also aging while language is changing, like it or not. We’ve all seen those little fun facts about language. My two favorites are these: use of the subjunctive is growing in the US, not shrinking, and the subjunctive is almost unheard of in the UK (even though we think of the subjunctive as formal and of UK English as more formal than US English). If we want to give our students accurate knowledge about vocabulary and grammar use, it’s good to consult a source, and a corpus is a nice source of language as it is actually used. We can then temper that with our own instincts and textbooks, but I know that every time I look up a word in a corpus, I am surprised by what I learn.

2. We discover patterns and rules we never realized existed. My personal favorite was the discovery that “due to” is almost always used with negative causes. We never say, “We are having cake due to Bob’s birthday.” We say, “We don’t have any cake due to shortages.” Stumbling on those kinds of collocations and associations helps you teach better and gives your students more of that instinct for language that we often attribute to being a native speaker: the rules we understand subconsciously but never really think about. That leads me to my last reason for using corpuses.

3. Students can use corpuses. Letting students discover language for themselves is a great way to impart those subconscious rules of language, yes, but it also helps them build vocabulary (through collocations and word families) and use that vocabulary better by seeing real-world examples.

Corpus Tools

  1. The biggest find for me was MICUSP, the Michigan Corpus of Upper-Level Student Papers (thanks to Ashley Hewlett). MICUSP is a collection of academic class essays from undergraduate seniors and graduate students. Here is what makes it stand out:
    • The search and filter functions let you search or filter by academic subject, type or genre of essay, native vs. non-native speaker, and particular features of the paper (abstract, lit review, tables or graphs, etc.). What that means is that you can show students examples of argument essays in their own discipline, or easily find a specific example paper that meets your requirements. You can have students compare argument essays in Philosophy classes with argument essays in English classes, or compare the abstract of a critique with the abstract of a research paper. In this way, they can see how different aspects of a paper affect each other. Students can also see what kinds of papers are written in different fields and what kinds are not.
    • The corpus provides the full text of the essay, not just the part where your keyword appears.
    • Speaking of keywords, you can search with or without a keyword, so students can see how a word is used across disciplines or genres.
  2. Ashley Hewlett also mentioned MICASE, the Michigan Corpus of Academic Spoken English, which I have used before because there are fewer corpuses of spoken English. Like MICUSP, MICASE has nice search options. You can search by the number and identity of speakers (professor, student, postdoctoral fellow, etc.), gender, age, and the setting of the encounter (seminar vs. lecture vs. service encounter), as well as by discipline. You can even search by the speakers’ L1s and by the nature of the interaction: more monologue or more interactive. Again, it’s nice because it provides the sources (unfortunately, in the form of scripts). And the corpus is fascinating because even in an academic environment, the spoken language is still full of grammar mistakes, run-on sentences, fragments, false starts, and non sequiturs.
  3. In another presentation in the Electronic Village, Jon Smart introduced me to AntConc, a tool that lets you build and analyze your own corpus. It’s not super user-friendly, but it’s also not terribly difficult. If you collect a series of texts in separate .txt files, you can use AntConc to search them for keywords, much as a traditional corpus tool does. I thought this would be great for collecting student essays in a class you teach year after year. After a few years, or semesters, you would have a nice set of student essays that you could let students search for language use or genre features. I was also playing with it by downloading the top 100 texts on Project Gutenberg, which helps students see literary language in action.
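For the curious, here is a rough idea of what a keyword search over a folder of .txt files involves. This is only a minimal Python sketch of a keyword-in-context (concordance) search, not how AntConc itself works; the function names (load_corpus, concordance, print_concordance) and the 30-character context window are my own assumptions for illustration.

```python
import re
from pathlib import Path


def load_corpus(folder):
    """Read every .txt file in `folder` into a {filename: text} dict."""
    return {
        path.name: path.read_text(encoding="utf-8", errors="replace")
        for path in sorted(Path(folder).glob("*.txt"))
    }


def concordance(corpus, keyword, width=30):
    """Return (filename, left context, match, right context) for each hit."""
    # \b stops "due to" from matching inside "overdue tome"; case is ignored.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for name, text in corpus.items():
        flat = " ".join(text.split())  # collapse line breaks and extra spaces
        for m in pattern.finditer(flat):
            left = flat[max(0, m.start() - width):m.start()]
            right = flat[m.end():m.end() + width]
            hits.append((name, left, m.group(0), right))
    return hits


def print_concordance(hits, width=30):
    """Print hits with the keyword lined up in a center column."""
    for name, left, match, right in hits:
        print(f"{left:>{width}} [{match}] {right:<{width}}  ({name})")
```

With a folder of essays (say, a hypothetical folder named student_essays), something like print_concordance(concordance(load_corpus("student_essays"), "due to")) would list every use of “due to” with a little context on each side, which is the kind of view that lets students spot patterns for themselves.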

In a later post, I’ll cover some of the activities that I saw.

If you liked this post, you might enjoy my book, 50 Activities for the First Day of School, a collection of rapport building and community creating activities.
