Chomsky's beef with corpus linguistics

Re: a word of thanks | Mike McDonald | June 16th, 2005

Paul Raper wrote:

There are many examples, but it is interesting how even learning a language as a first language displays many attributes that ELT teachers put down to L1 interference. Is it truly that? Or is it simply applying a logic that seems normal? Once the knowledge is acquired, the problem is solved; my son now makes very few mistakes in either English or Swiss German. His main problem now is building his vocabulary.

Most of this though seems to indicate that grammar is certainly not innate as Chomsky suggests.

My guess is that Chomsky would reply that while grammar is innate, the parameters are not, and that mixing the parameters of different languages causes errors like our children's.


Re: a word of thanks | Dave Mackie | June 16th, 2005


The problem is, Chomsky doesn't accept experimentation or evidence, only argument. He's a mediaeval thinker.


Re: a word of thanks | Chris Baldwin | June 16th, 2005

I seem to remember, when I read Pinker many years ago, that he referred to the fact that all languages fit into patterns, whilst none fit into others. Seems like good reasoning to me. I'm actually on the fence on this one, though.


Re: a word of thanks | Dave Mackie | June 17th, 2005


Chomsky once advised Applied Linguists that he couldn't see any relation between his ideas and what happened in a classroom. His devotees have overlooked that, alas.

"The modern Plato" is just that- he's a mediaeval thinker who has attempted to update Plato. What he offers is a kind of "Creationism". Everything that can be known is already known. Change and growth are therefore deviant from the "ideal".


If only he'd listened to Mrs. Chomsky! | Dave Mackie | June 21st, 2005


Credit should go to Alfred North Whitehead for the observation that Western philosophy was a series of "footnotes to Plato". However, I think Noam Chomsky has always aspired to be something more than a mere footnote. He certainly must have worked hard to become the second-most quoted man, after Joshua the Nazarene: oft quoted, but perhaps less read.

An early devotee of Marx and Freud, Chomsky is a superb rhetorician, gifted in persuasive argumentation. Anyone who might doubt that should read carefully pp. 8-10 of "Rules and Representations" (Oxford, Basil Blackwell, 1980) in which he dons the mantle of Galileo. He proposes therein to remake linguistics in the image of physics . . . as he understands it:

"Abstract mathematical models of the universe to which at least the physicists give a higher degree of reality than they accord the ordinary world . . . we have no present alternative to pursuing the 'Galilean style' in the natural sciences . . . to construct abstract models that are accorded more significance than the ordinary world . . . ( and ) . . . to tolerate unexplained counterevidence".

There is the beginning of the Chomsky problem(s):

1. Personal vanity (Chomsky and Galileo?)

2. Bizarre comparison (linguistics and physics?)

3. The abstract ideal being more real than reality

4. Ignoring counter-evidence.

In case you wonder what he thinks of reality, please refer to pp. 3-4 and 58 of "Aspects of the Theory of Syntax" (M.I.T. Press, 1965), in which reality is variously, but always pejoratively, dismissed as "limitation . . . distraction . . . shift . . . error . . . false . . . deviation . . . degenerate . . . messy . . . not relevant."

His justification for innate UG (sadly, it doesn't have an H on the end) is based on the premise that every ordinarily-abled 5-year-old child has mastered most of the vocabulary, pronunciation and grammar of his or her native language, and has done it IN TOO SHORT A TIME and WITH INSUFFICIENT INPUT for it to have been learned. I was shocked, therefore, when I first learned that he had a wife and three children.

The average 5-year-old has had about 27,000 hours of language immersion, explicit instruction, modeling, encouragement, existential necessity, supportive peer groups, and every other stimulus and motivational factor you can think of. I shudder (a personal note) to conjecture what must have gone on in the Chomsky household for him not to have noticed.

There's more, much more, and I don't want to ramble. Here are some last words from two workers who tried for many years to follow his lead. First, Yorick Wilks, University of Sheffield, Computational Pragmatics (in "Noam Chomsky: Consensus & Controversy", p. 205):

"An intellectual embarrassment"

More temperately, Terry Winograd, Stanford University, Artificial Intelligence:

"a characterization of abstract competence will inevitably fail to capture the appropriate generalities about language since it does not deal with 'what is really going on' " (in "Language as a Cognitive Process", Addison-Wesley, 1983, p. 187).

Bring back Skinner.


Re: Chomsky, Plato, a CP's pathway AND SFG | Paul Raper | June 30th, 2005

Just for what it's worth, have you seen this corpus linguistics site?


Chomsky's beef with corpus linguistics | Robert Haines | June 30th, 2005

Paul, thanks for the link to the corpus linguistics (CL) web site. It looks very interesting! The link to information about Chomsky and CL is worth a look if you're interested in learning more about Chomsky's ideas or CL.

I'd like to make sure I'm clear about one thing: Chomsky is claiming that CL is flawed, and will always be, because it captures bits of performance but not competence, is that right? So, the stuff of corpora doesn't reflect what each language user can do, but instead what he/she has done under specific circumstances. For example, my use of German after a liter of Weissbier, recorded by CL researchers at the Hofbräuhaus, will not necessarily reflect what I am capable of a week later when I'm more sober and fresh off an intensive German course. Perhaps a poor example since German is not my first language, but is Chomsky's beef about such issues?


Re: Chomsky's beef with CL | Ramesh Krishnamurthy | July 5th, 2005

Hi Rob,

You quote an extreme example, but it is certainly because performance data is subject to influence from a large variety of contextual factors that Chomsky rejects its validity in theoretical linguistics.

Here is Chomsky's view as expressed in a recent interview (it is on the reading list for GLE Unit 1):

Andor, J. (2004). The master and his performance: An Interview with Noam Chomsky. Intercultural Pragmatics 1-1, 93-111.

Chomsky: Corpus linguistics doesn't mean anything. It's like saying suppose a physicist decides, suppose physics and chemistry decide that instead of relying on experiments, what they're going to do is take videotapes of things happening in the world and they'll collect huge videotapes of everything that's happening and from that maybe they'll come up with some generalizations or insights. Well, you know, sciences don't do this.

But maybe they're wrong. Maybe the sciences should just collect lots and lots of data and try to develop the results from them. Well if someone wants to try that, fine. They're not going to get much support in the chemistry or physics or biology department. But if they feel like trying it, well, it's a free country, try that. We'll judge it by the results that come out. So if results come from study of massive data, rather like videotaping what's happening outside the window, fine - look at the results. I don't pay much attention to it. I don't see much in the way of results.

My judgment, if you like, is that we learn more about language by following the standard method of the sciences. The standard method of the sciences is not to accumulate huge masses of unanalyzed data and to try to draw some generalization from them. The modern sciences, at least since Galileo, have been strikingly different. What they have sought to do was to construct refined experiments which ask, which try to answer specific questions that arise within a theoretical context as an approach to understanding the world.

Go back to Galileo. If you want to understand how bodies fall, Galileo would not have been interested in videotapes of leaves falling and balls going around and rocks rolling down mountains and so on and so forth. What he was interested in is the highly refined abstract conception of a ball rolling down a frictionless plane, which doesn't even exist in nature. But it turns out that the study of that is what provided insight into the principles of nature and that's the way the sciences have developed ever since. To say that it's more empirical to just collect and observe data is completely wrong.

I mean physics, chemistry, biology do not lack in empirical content because they're based on experiments that are designed in order to answer questions that arise within a theoretical context. That's a form of empirical inquiry that the sciences have pursued very successfully as we know them. Somebody can argue well, they've just been done in the wrong way. They should have just been taking & collecting massive data from what's happening and try to work on that. Maybe, but I don't see any particular reason to believe it, and I don't see why linguistics should be different.


People who work seriously in this particular area do not rely on corpus linguistics. They may begin by looking at facts about frequency and shifts in frequency and so on, but if they want to move on to some understanding of what's happening they will very quickly, and in fact do, shift to the experimental framework. Where you design situations, you enquire into how people will act in those situations. You design them within a framework of theoretical inquiry which has already suggested that these are likely to be important questions and I want the answers to them. But that's not corpus linguistics. If you want to use hints from data that you acquire by looking at large corpuses, fine. That's useful information for you, fine. I mean, Galileo might have gotten some hints from looking at events that were happening in the world. In fact, he did. He observed the tides - that's like corpus linguistics. You're observing the tides. And from the general observations about the tides you see regularities and so on and that leads you to construct experimental frameworks including highly abstract situations. In fact, many of Galileo's experiments were thought experiments. He couldn't actually carry them out.

You may be motivated by phenomena that you've observed in the world, but as soon as you get beyond the most superficial stage, you guide inquiry by partial understanding and experiments in which you construct situations in which you hope to get answers to particular questions that are arising from a theoretical framework. And that's done whether you're studying speech acts or human interaction or discourse or any other topic. There's no other rational way to proceed.

So there's simply no issues here. I mean, there may be issues about what people personally find to be helpful in sharpening their intuitions or stimulating their ideas, and so on, but that's personal history. Maybe some people get ideas by taking a walk and others get ideas by looking at data, but that's of no interest. What you want to know is the kind of results that come out. And invariably, they come from some form of experimental inquiry in which an experiment is just designing a situation in which you can hope that it will yield an answer to a particular question that's arising from a theoretical framework. And everyone does that. There's no other way to proceed. There's no issues here.

However, a corpus linguist might also want to record your output when you are not slewed on Weissbier, and also to record a few hundred or a few thousand other people in a variety of states, and would thereby hope to eliminate a lot of the contextual variations and see what linguistic phenomena/features seem to remain stable, typical, central, and frequent, irrespective of speaker, context (or alcoholic status).... :-)


Who gives a ___ about Chomsky. Just Testing | Dave Mackie | July 7th, 2005


Of course Chomsky has a beef about Corpus linguistics!!! Chomsky began as a Hebrew scholar. Then he got interested in Freud, followed by Marx. None of these influences are subject to PROOF: they all involve delicate argument from assumption.

Chomsky cannot survive conclusions derived from objective sources. He is a medieval exegesist.


PS: Jerry wrote:

What does "exegesist" mean? It's not in my dictionary. I have a feeling that other CPs may also not know that word.

It's a style of argument expanding upon a priori assumptions, typically used by religious scholars reinterpreting the meaning of scripture: i.e., they have the text as a "given", but argue over what it means in their context. When the Quran says that "Women are the sisters of Men", for example, does it mean that they are equal, or that men have a duty to supervise them?

Chomsky is very fond of this style, since it does not require him to gather data or do anything Empirical. For instance, he was happy to assert that children have all mastered the basics of their native language by the age of 5 years, and that they can do this DESPITE the "degenerate . . . narrowly limited" input from their environments, not as a result thereof. Much of his thinking builds on such maxims, even though evidence for their accuracy is lacking.

And there is the problem: however skilful his argument, he's not supported by data, only by assertion. I could go on, and on .......

After 38 yrs of being Chomskied, I am rather tired of the whole mess. However, if you'd like to pursue his ideas critically (in the best sense of the word), there are some excellent books on the subject. Geoffrey Sampson's "Educating Eve" and Terrence Deacon's "The Symbolic Species: The co-evolution of language and the human brain" are both accessible critiques of Chomskythink.

There's also a very nice web-site here.

Goals of linguistics and SL research | Robert Haines | July 8th, 2005

If you're interested, here is a link to a paper by Vivian Cook on relationships between linguistics and second language research. Scrolling down about midway to the caption (in blue letters) 'The goals of linguistics and of SL research' takes you to a section on Chomsky's questions about SL, followed by Cook's interpretation of how they relate to second language learning (a multi-competence view).


Re: Chomsky's beef with CL | Glyn Hughes | July 7th, 2005

I wish I had Chomsky's cast iron belief in himself.

No, on second thoughts, I don't.

I may be misunderstanding this but Chomsky's interpretation of what happens in sciences doesn't tally with what I remember. For a start, what is an experiment if it isn't a means of collecting data? And what's with the 'mass of unanalysed data'? I don't recall having seen a corpus study where anybody wrote about how they stared at a bunch of texts for a while before drawing conclusions. Are not concordance lines, t-scores, m-scores and even tagging for parts of speech examples of 'developing a theoretical framework' in order to learn more about what is being studied?

And as for the tides. Yes, probably you would do some mathematical modelling and all sorts of theoretical stuff but in order to draw conclusions about the tides you would still surely have to actually measure the tides themselves otherwise you're just talking about how tides might move.

I don't actually know about Galileo and his frictionless plane but it sounds to me like the sort of theoretical work that goes on in the sciences all the time. Practitioners in sciences can get quite exercised about the implications of theoretical models but their purpose is ultimately as a tool for interpreting real-world data.

A theoretical framework helps scientists to develop hypotheses but as I understand it science is all about hypothesis testing. It seems that Chomsky feels this last part is unnecessary.


Re: Chomsky's beef with CL | Robert Haines | July 7th, 2005


I've just read your post after replying to the other posts from Ramesh on Chomsky's beef. On the one hand, I have often agreed with Chomsky's ideas on U.S. and foreign politics, so I have gained some respect for his critical thinking skills in those realms. Now, I'm working to come to grips with at least a few of his concepts with regard to language.

It seems Chomsky deals with linguistics as a science and has little if any interest in applied linguistics or TEFL/TESOL. If I'm not too far off base, he wants to remind us of the importance of imagination in scientific enquiry. This might be hard to do if you've ever read Bill Bryson's book (A Short History of Nearly Everything), in which he describes how revered scientists did rather bizarre things, e.g. poking needles in their eyes and staring at the sun, to explore natural phenomena. :-)

Anyway, I get the impression Mr. Chomsky is simply working on a different level (not necessarily higher, but different). He may be thinking beyond the scope of our traditional methods as applied linguists. After all, he works in a much different context than we do, right?

I wonder if Chomsky and we teachers have something to learn from each other?

Ramesh, (and list members)

I really enjoyed reading this, and I'd like to ask some questions:

Although it seems obvious, what does Chomsky mean by 'theoretical context'?

Chomsky claims that CL cannot reasonably inform theoretical linguistics but says little about its use in applied linguistics, right?

How does Chomsky propose we refine our abstract concepts, and what sort of situations should we create in order to investigate the nature of language?

Chomsky claims that science has not traditionally been about collecting masses of data and generalizing from them, but isn't that just what we see a lot of today in the scientific community, e.g. in Medicine? Eight out of ten people showed few or no symptoms after three weeks of taking Drug X, so we can mass market it to the public now.



Re: Chomsky's beef with CL | Ramesh Krishnamurthy | July 7th, 2005

Hi Rob, Glyn, and all other interested parties,

I found an interesting webpage with contrastive references from Chomsky and other linguists, including this one:

Chomsky: The verb perform cannot be used with mass-word objects: one can perform a task, but one cannot perform labor.

Hatcher: How do you know, if you don't use a corpus and have not studied the verb perform?

Chomsky: How do I know? Because I am a native speaker of the English language. (Hill, 1962c: 29)

The added comment "BUT: What about perform magic?" seems initially not to be a very significant counter-argument: the 450m-word Bank of English has 42,000+ examples of 'perform/performs/performing/performed' but only 27 contain 'magic', and several of those are for 'magic tricks', not the mass-noun. But I haven't checked all the other potential mass-nouns yet...
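To put those two figures side by side, here is a minimal sketch of the arithmetic, using only the counts quoted above. The count of 42,000 is treated as a lower bound because the post says "42,000+", so the real share is at most what is printed. The function name is made up for illustration.

```python
def share_percent(hits: int, total: int) -> float:
    """Return hits as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * hits / total


# Figures quoted in the post; 42,000 is a lower bound ("42,000+").
PERFORM_TOTAL = 42_000
PERFORM_WITH_MAGIC = 27

magic_share = share_percent(PERFORM_WITH_MAGIC, PERFORM_TOTAL)
print(f"'magic' occurs in at most {magic_share:.3f}% of the 'perform' lines")
```

On these numbers, 'magic' turns up in well under one line in a thousand, and that is before the 'magic tricks' cases are discounted.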



Re: Chomsky's beef with CL | Robert Haines | July 8th, 2005

Ramesh, that's an interesting snippet. Can you share the link to the web site with us, please?

Dear McDave, Freud and Marx are two influential thinkers, whether we agree with their ideas or not. That means, as with Chomsky, it might be worth investigating what has drawn people to concepts such as ego, workers' rights and, in the case of Chomsky, language performance. Even if it's only pomp and circumstance, how can I refute what I don't understand?

If you mean to say Chomsky is an exegesist in the sense that he critically analyzes and explains scripture, I don't know if that's true. Otherwise, he's just another 'important' critical thinker/interpreter/explainer (exegegist?), right?


Re: Chomsky's beef with CL | Ramesh Krishnamurthy | July 8th, 2005

Sorry Rob, I thought I had put the URL in the email. Here it is.

The main text is in German - so you should be OK - but I think the quotes/references are meaningful/useful even if you don't know German.



Re: Chomsky's beef with CL | Ramesh Krishnamurthy | July 8th, 2005

I got 81,100 hits on Google for "perform music".

Mike McDonald

Hi Mike,

Good example: 'perform music'.

But please be careful about using Google counts as linguistic evidence. We have little idea about the number of documents accessed by Google (current estimates for the Web as a whole are 150 billion documents in English alone!), nor about the quality of the documents, number of duplicated documents, etc. (that's why I gave frequency information from a corpus, about whose contents I can be reasonably sure).

There have been numerous postings and heated discussions on corpora-list recently about the problems of using search-engine counts:

Web: Google's missing pages: mystery solved?
Web: MSN cheating too?
Web: Yahoo doubles its counts!

I had a quick look at the first page of Google hits for "perform music", and many of the examples are from headlines, section headings, concert announcements, concert programmes, etc. So this usage may be severely genre-restricted.

Using Google again, but strictly for comparison, "perform labour" has 655 hits, "perform labor" has 13,400 hits (= Google accesses more US English documents than BR English documents?); "perform work" has 607,000 hits, but many from legal/technical contexts; "perform tasks" has 521,000 hits. Again, have a quick look at the contexts.

I actually think that Chomsky's intuition about the typical objects of "perform" may be reasonable; and his justification for 'how do I know - because I am a native-speaker' is excusable (often, we can intuitively say that something 'sounds right' or 'sounds wrong' in this way, and we can even introspect plausible "rules" or patterns).

However, how then do we test for 'native-speakerhood'? It becomes a circular argument - "this pattern is reasonable because a native-speaker says so; this person is a native-speaker because he accepts this pattern as reasonable"...



Re: Chomsky's beef with CL | Mike McDonald | July 8th, 2005

Thanks for your thought-provoking reply, Ramesh.

I didn't intend to present the 81,000 Google hits as academic evidence, just as an easily verifiable backup for my intuition as to what I think most native speakers (except Chomsky, perhaps) would agree was a natural collocation. When Google gives a few hundred hits for some collocation, it's wise to be wary. But when it gives 81,000 hits for "perform music", I think it's a reasonable indication that the verb perform can be used idiomatically with mass-word objects - in certain contexts, at least. To verify this, I just tried the Cobuild Concordance Sampler, and got several hits each for "perform+3music", "performing+3music", "performed+3music", and "music+3performed".
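For anyone who wants to try the same kind of windowed search on their own texts, here is a rough sketch. It assumes that a query like "perform+3music" means "music within three words to the right of perform"; that reading of the sampler's syntax is an assumption, and the function name and sample sentences are invented for illustration, not taken from any corpus.

```python
import re


def window_hits(text: str, node: str, collocate: str, max_gap: int = 3) -> list[str]:
    """Find places where `collocate` occurs within `max_gap` words after `node`."""
    # Crude tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token == node:
            window = tokens[i + 1 : i + 1 + max_gap]
            if collocate in window:
                # Keep the node plus its right-hand window as a mini concordance line.
                hits.append(" ".join(tokens[i : i + 1 + max_gap]))
    return hits


# Invented sample text, not corpus data.
sample = (
    "The choir will perform music by Bach. "
    "They perform some sacred music on Sundays. "
    "Students perform a task and then listen to music."
)

hits = window_hits(sample, "perform", "music")
for line in hits:
    print(line)
```

The third sentence is not matched because "music" falls outside the three-word window. A real study would also need to handle inflected forms ("performed", "performing") and sentence boundaries, which this sketch ignores.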

I had a quick look at the first page of Google hits for "perform music", and many of the examples are from headlines, section headings, concert announcements, concert programmes, etc. So this usage may be severely genre-restricted.

I disagree with the "severely". You are right, of course, that the Google examples came mainly from those genres, but that may be something to do with the specific combination of the base form of the verb "perform" with "music". I got 26,500 hits for "performed music", and 25,900 for "music is performed" - both significant numbers which do not include that many headlines or headings as far as I can see. As a classical music grad, I would expect the genres to include spoken interactions about past or future classical concerts, books and articles on classical music, various ethno-musicological genres, and tourist guides, as well as the ones you mentioned. I think the genre restriction is due more to the formal nature of the verb "perform" (vs. "play") than to any lack of idiomaticness, which is what Chomsky claims.

I actually think that Chomsky's intuition about the typical objects of "perform" may be reasonable; and his justification for 'how do I know - because I am a native-speaker' is excusable (often, we can intuitively say that something 'sounds right' or 'sounds wrong' in this way, and we can even introspect plausible "rules" or patterns).

Well, if Chomsky can claim native-speaker intuition, so can I. The problem with such intuition is that everyone's idiolect is necessarily limited, and within our own idiolect there may be regions that we cannot access immediately. Even when I was living in Britain, I often used to wonder "Would I say that?" when thinking about what to teach my students.

Mike McDonald

Chomsky's idea of NS | Robert Haines | July 8th, 2005

My impression is that, traditionally, 'native speaker' (NS) means someone who 'inherits' a language. But, aren't there other ways of defining the term native speaker besides claiming that the NS is someone who accepts certain patterns as reasonable, as you've done? I know NS is a slippery term to begin with, but what if we follow Mike's line of idiolects; can we claim that what is accepted as reasonable by the users in a particular language community/environment (ecolects?) qualifies as NS language?

If so, that wouldn't bolster Chomsky's claim, but it also wouldn't provide much support for current corpus findings, would it? Doesn't CL research select NS English for its corpora? If not, then Chomsky's argument about measuring performance seems valid as the corpora might include English that is ungrammatical or 'non-standard'. Granted, it is still English, but which English?


PS: Mike, sorry, you wrote 'idiomaticness' and I wrote 'idiomaticity', which was an idiotic thing of me to do. :-) You mean that Chomsky is making a claim about how authentic it sounds to a native speaker, right?

I was trying to do too many things at once there. Apologies.

Re: Chomsky's idea of NS | Ramesh Krishnamurthy | July 8th, 2005

Hi Rob,

English is now spoken by c. 1.5 billion people. How many and which ones would qualify as NS?

Chomsky was justifying his statement about "perform" by saying he was an NS.

So every statement made by anyone claiming to be an NS is valid?

Sinclair says that NS may be expert 'users' of a language, but don't necessarily know much 'about' the language... How many car drivers know about the engine, gearbox, transmission, or braking system of the car they drive?



