
1
The Study of Language and Language Acquisition

We may regard language as a natural phenomenon—an aspect of his biological nature, to be studied in the same manner as, for instance, his anatomy.
Lenneberg, Biological Foundations of Language (), p.
1.1 The naturalistic approach to language

Fundamental to modern linguistics is the view that human language is a natural object: our species-specific ability to acquire a language, our tacit knowledge of the enormous complexity of language, and our capacity to use language in free, appropriate, and infinite ways are attributed to a property of the natural world, our brain.

It follows, then, that as in the biological sciences, linguistics aims to identify the abstract properties of the biological object under study—human language—and the mechanisms that govern its organization. Consider the famous duo:

() a. Colorless green ideas sleep furiously.
   b. *Furiously sleep ideas green colorless.

Neither sentence has even a remote chance of being encountered in natural discourse, yet every speaker of English can perceive their differences: while they are both meaningless, (a) is grammatically well formed, whereas (b) is not. [...] 'the goal of linguistic theory' (Chomsky /: )—in other words, a psychology, and ultimately, a biology of human language.
The postulation of innate linguistic knowledge, Universal Grammar (UG), is a case in point. A well-known example concerns structure dependency in language syntax and children's knowledge of it in the absence of learning experience (Chomsky , Crain & Nakayama ). English yes/no questions are formed by fronting an auxiliary verb:

() Is Alex e singing a song?

There are many possible hypotheses compatible with the language acquisition data in ():
() a. front the first auxiliary verb in the sentence
   b. front the auxiliary verb that most closely follows a noun
   c. front the last auxiliary verb
   d. front the auxiliary verb whose position in the sentence is a prime number

Only the structure-dependent operation yields the correct result, e.g.:

() a. Is [NP the woman who is singing] e happy?
   b. *Is [the woman who e singing] is happy?

Children stick to the correct operation from very early on, as Crain & Nakayama () showed using elicitation tasks.
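The contrast between the structure-independent hypothesis (a) and the structure-dependent operation can be made concrete in a few lines of code. This is purely illustrative (my construction, not from the text): the word list, the auxiliary set, and the hand-supplied index of the main-clause auxiliary are assumptions standing in for a real syntactic analysis.

```python
# Two candidate question-formation rules applied to the declarative
# "the woman who is singing is happy". The subject NP and the main-clause
# auxiliary are marked by hand, since the point is the rule, not parsing.

def front_first_auxiliary(words, auxiliaries=frozenset({"is", "can", "will"})):
    """Structure-independent rule: move the linearly first auxiliary."""
    i = next(k for k, w in enumerate(words) if w in auxiliaries)
    return [words[i]] + words[:i] + words[i + 1:]

def front_main_clause_auxiliary(words, main_aux_index):
    """Structure-dependent rule: move the auxiliary of the main clause,
    whose position we supply by hand (index 5 below)."""
    i = main_aux_index
    return [words[i]] + words[:i] + words[i + 1:]

decl = ["the", "woman", "who", "is", "singing", "is", "happy"]

print(" ".join(front_first_auxiliary(decl)))
# -> "is the woman who singing is happy"   (the ungrammatical (b))
print(" ".join(front_main_clause_auxiliary(decl, main_aux_index=5)))
# -> "is the woman who is singing happy"   (the correct (a))
```

The structure-independent rule produces exactly the error pattern that, as the text notes, children never produce.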

Though sentences like those in () may serve to disconfirm
hypothesis (a), they are very rarely if ever encountered by children in normal discourse, not to mention the fact that each of
the other incorrect hypotheses in () will need to be ruled out by
disconfirming evidence
...
The conclusion is then Chomsky’s (: ):
‘the child’s mind
...

The principle of structure-dependence is not learned, but forms
part of the conditions for language learning
...
For example, the earlier language-particular and construction-specific transformational rules, while descriptively powerful, are inadequate when viewed in a biological context. The unrestrictiveness of such rules made the acquisition of language wildly difficult: the learner had a vast (and perhaps infinite) space of hypotheses to entertain.

Footnote: [...] we will rely on corpus statistics from Legate () and Legate & Yang (in press) to make this remark precise, and to address some recent challenges to the argument from the poverty of stimulus (APS) by Sampson () and Pullum ().

The present book is a study of language development in children. [...] Drawing insights from the study of biological evolution, we put forth a model that makes this interaction precise, by embedding a theory of knowledge, Universal Grammar (UG), into a theory of learning from data. The justification of this approach will be naturalistic, just as in the justification of innate linguistic knowledge: we will provide evidence—conceptual, mathematical, and empirical, from a number of independent areas of linguistic research, including the acquisition of syntax, the acquisition of phonology, and historical language change—to show that without the postulated model an adequate explanation of these empirical cases is not possible.


Footnote: [...] Language acquisition research attempts to give an explicit account of this process.

Footnote: Explanation of language acquisition is not complete with a mere description of child language, no matter how accurate or insightful, without an explicit account of the mechanism responsible for how language develops over time, the learning function L. Such statements, if devoid of a serious effort at some learning-theoretic account of how this is achieved, reveal irresponsibility rather than ignorance.
Given reasonable assumptions about the linguistic data, the duration of learning, the learner's cognitive and computational capacities, and so on, the model must be able to attain a terminal state of linguistic knowledge ST comparable to that of a normal human learner. This requirement has traditionally been referred to as the learnability condition, which unfortunately carries some misleading connotations. [...] However, this position has little empirical content.


[...] (Chomsky ). Hence there is no external target of learning, and hence no 'learnability' in the traditional sense. Section  below documents evidence that child language and adult language appear to be sufficiently different that language acquisition cannot be viewed as a recapitulation or approximation of the linguistic expressions produced by adults, or of any external target. This means that the learnability condition (in the conventional sense) must fail under certain conditions—in particular (as we shall see in Chapter ) in empirical cases where learners do not converge on any unique 'language' in the informal, E-language sense of 'English' or 'German', but rather on a combination of multiple (I-language) grammars.


1.2.2 Developmental compatibility

A model of language acquisition is, after all, a model of reality: it must be compatible with what is known about children's language. No matter how much innate linguistic knowledge (S) children are endowed with, language still must be acquired from experience (E). [...] As long as this is the case, there remains a possibility that there is something in the input, E, that causes such variations. Only then can the respective contributions of S and E—nature vs. nurture—be properly assessed.

This urges us to take seriously quantitative comparisons between the input and the attained product of learning: in our case, quantitative measures of child language against those of adult language. A few examples illustrate this point and the challenge it poses to an acquisition model.
...
The placement of finite verbs in French matrix clauses is such an example.

() Jean sees often/not Marie.

French, in contrast to English, places finite verbs in a position preceding sentential adverbs and negation. [...] This finding has been replicated in a number of languages with similar properties; see Wexler () and much related work for a survey.
The best-known example is perhaps the phenomenon of subject drop.

() (I) help Daddy.
   (He) dropped the candy.

Young English children drop subjects at a substantial level (Valian ), in striking contrast to adult language, where a subject is used in almost all sentences.
One such example that has attracted considerable attention is what is known as the Optional Infinitive (OI) stage (e.g. Weverink , Rizzi , Wexler ): children acquiring some languages that morphologically express tense nevertheless produce a significant number of sentences in which matrix verbs are non-finite.

() [...]

Non-finite root sentences like () are ungrammatical in adult Dutch and thus appear very infrequently in acquisition data.

These quantitative disparities between child and adult language represent a considerable difficulty for empiricist learning models such as neural networks (e.g. [...]). It is therefore unclear how a statistical learning model could duplicate the developmental patterns in child language. The model must not produce patterns that are in principle compatible with the input but never attested (the argument from the poverty of stimulus). The model must not produce certain patterns abundant in the input (the subject drop phenomenon). And the model must produce certain patterns that are never attested in the input (the Optional Infinitive phenomenon). [...] For instance, both the obligatory use of subjects in English and the placement of finite verbs before/after negation and adverbs involve a binary choice. As will be discussed in Chapter , previous formal models of acquisition in the UG tradition have in general not begun to address these questions.
...

Finally, quantitative modeling is important to the development of linguistics at large. [...] Biology did not come of age until the twin pillars of the biological sciences, Mendelian genetics and Darwinian evolution, were successfully integrated into the mathematical theory of population genetics—part of the Modern Synthesis (Mayr & Provine )—whereby evolutionary change can be explicitly and quantitatively expressed in terms of its internal genetic basis and external environmental conditions.


1.2.3 Explanatory continuity

Because child language apparently differs from adult language, an acquisition model must make some choice in explaining such differences. [...] Explanatory Continuity is an instantiation of the well-known Continuity Hypothesis (Macnamara , Pinker ), with roots dating back to Jakobson (), Halle (), and Chomsky ().


On the contrary, children's cognitive system is assumed to be identical to that of adults. There are then two possibilities:

() a. Children and adults differ in linguistic performance.
   b. Children and adults differ in grammatical competence.

(a) appeals to extragrammatical factors (e.g. [...]). This necessarily leads to a performance-based explanation of child acquisition. To be sure, adult linguistic performance is affected by these factors as well.

Parsimony is the obvious, and primary, reason. In addition, competence is one of the few components of linguistic performance of which our theoretical understanding has some depth. The tests used in competence studies, often in the form of native speakers' grammatical intuitions, can be carefully controlled and evaluated.

Footnote: For example, it has been shown that there is much data in child subject drop that does not follow from performance-limitation explanations; see e.g. Hyams & Wexler (), Roeper & Rohrbacher (), Bromberg & Wexler ().


The appeal to [...] memory lapses (Pinker ) fails to explain much of the developmental data reported in Marcus et al. (). And in Chapter , we will see additional developmental data from several studies of children's syntax, including the subject drop phenomenon, that show the empirical problems of the performance-based approach. But exactly how is child competence different from adult competence? Here again are two possibilities:
()

a
...

b
...


(a) says that child language is subject to different rules and
constraints from adult language
...

It is important to realize that there is nothing unprincipled in
postulating a discontinuous competence system to explain child
language
...
However, in the absence of a concrete theory of how linguistic competence matures (a) runs the risk of ‘anything goes’
...
More specifically, we must not confuse the difference between child language and adult language [...]. That is, while (part of) child language may not fall under the grammatical system the child eventually attains, it is possible that it falls under some other, equally principled grammatical system allowed by UG.

Footnote: This must be determined for individual problems, although when maturational accounts have been proposed, non-maturational explanations of the empirical data have often not been conclusively ruled out (e.g. Demuth , Crain , Allen , Fox & Grodzinsky ).
This leaves us with (b), which, in combination with (b), gives the strongest realization of the Continuity Hypothesis: that child language is subject to the same principles and constraints as adult language, and that every utterance in child language is potentially an utterance in adult language. This position further splits into two directions:

()
a. [...]
b. [...]


(a), the dominant view ('triggering') in theoretical language acquisition, will be rejected in Chapter . [...] This perspective will be elaborated in the rest of this book, where we examine how it measures up against the criteria of formal sufficiency, developmental compatibility, and explanatory continuity.
1.3 A road map

This book is organized as follows. [...] After an encounter with the populational and variational thinking in biological evolution that inspired this work, we propose to model language acquisition as a population of competing grammars whose distribution changes in response to the linguistic evidence presented to the learner.

Chapter  applies the model to one of the biggest developmental problems in language, the learning of English past tense
...
Again, quantitative predictions are made and checked against children’s performance on
irregular verbs
...

Chapter  continues to subject the model to the developmental
compatibility test by looking at the acquisition of syntax
...
In addition, a number of major empirical cases
in child language will be examined, including the acquisition of
word order in a number of languages, the subject drop phenomenon, and Verb Second
...
The quantitativeness of the acquisition model
allows one to view language change as the change in the distribution of grammars in successive generations of learners
...
We apply the model of language
change to explain the loss of Verb Second in Old French and Old
English
...


2
A Variational Model of Language Acquisition

One hundred years without Darwin are enough.
H. J. Muller

[...] However, this simple observation raises profound questions: what results in the differences between child language and adult language, and how does the child eventually resolve such differences through exposure to linguistic evidence? These questions are fundamental to language acquisition research.

Two leading approaches to L can be distinguished in this formulation according to the degree of focus on S and L. An empiricist approach assumes that S contributes little; rather, emphasis is given to L, which is claimed to be a generalized learning mechanism cutting across cognitive domains. In contrast, a rationalist approach, often rooted in the tradition of generative grammar, attributes the success of language acquisition to a richly endowed S, while relegating L to a background role. Almost all theories of acquisition in the UG-based approach can be called transformational learning (TL) models, borrowing a term from evolutionary biology (Lewontin ): the learner's linguistic hypothesis undergoes direct transformations (changes), moving from one hypothesis to another, driven by linguistic evidence. [...] We will show that once the domain-specific and innate knowledge of language (S) is assumed, the mechanism of language acquisition (L) can be related harmoniously to learning theories from traditional psychology and, possibly, the development of neural systems.
2.1 Against transformational learning

Recall from Chapter  the three conditions on an adequate acquisition model:

() a. formal sufficiency
   b. developmental compatibility
   c. explanatory continuity

In recent years, the generalized statistical learning (GSL) approach to language acquisition has (re)gained popularity in cognitive science and computational linguistics (see e.g. Bates & Elman , Seidenberg ).
The child learner is viewed as a generalized data processor, such as an artificial neural network, which approximates the adult language based on the statistical distribution of the input data (e.g. [...]). Despite this renewed enthusiasm, it is regrettable that the GSL approach has not tackled the problem of language acquisition in a broad empirical context (e.g. [...]).
Much more effort has gone into the learning of irregular verbs, starting with Rumelhart & McClelland () and followed by numerous others, which prompted a review of the connectionist manifesto, Rethinking Innateness (Elman et al. ). But even for such a trivial problem, no connectionist network has passed the Wug-test (Prasada & Pinker , Pinker ), and, as we shall see in Chapter , much of the complexity of past-tense acquisition is not covered by these works. [...] There is reason to believe that these challenges are formidable for generalized learning models such as artificial neural networks. What would be remarkable is to discover whether the constructed system learns in much the same way that human children learn.
...


[...] to find an empiricist (learning-theoretic) alternative to the learning biases introduced by innate UG. That is, an empiricist must account for, say, systematic utterances like me riding horse (meaning 'I am riding a horse') in child language and island constraints in adult language, at the same time.
We thus focus our attention on the other leading approach to language acquisition, the one most closely associated with generative linguistics. (See section  for a simple yet convincing example.)
The Principles and Parameters (P&P) approach (Chomsky ) is an influential instantiation of this idea, which attempts to constrain the space of linguistic variation to a set of parametric choices (e.g. [...]). It assumes that the state of the learner undergoes direct changes, as the old hypothesis is replaced by a new hypothesis. [...] Hence, a new hypothesis is formed to replace the old. An influential way to implement parameter setting is the triggering model (Chomsky , Gibson & Wexler ). [...] Again, a new hypothesis replaces the old one. For the rest of our discussion we will focus on the triggering model (Gibson & Wexler ), representative of the TL models in the UG-based approach to language acquisition.
2.1.1 Formal insufficiency

The first problem concerns the existence of local maxima in the learning space. [...] By analyzing the triggering model as a Markovian process in a finite space of grammars, Berwick & Niyogi () have demonstrated the pervasiveness of local maxima in Gibson and Wexler's (very small) three-parameter space.
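The local-maxima problem can be illustrated with a toy sketch (my construction, not Gibson and Wexler's actual parameter space): grammars are parameter vectors, and a greedy single-parameter learner can start at a grammar from which no one-step change ever succeeds, so it never reaches the target.

```python
# Toy Triggering Learning Algorithm: on a parsing failure, flip one random
# parameter and adopt the new grammar only if it analyzes the current
# sentence. The analysis table below is hand-built so that (0, 0) is a
# local maximum: neither one-bit neighbor analyzes anything.
import random

ANALYZES = {
    (0, 0): {"s1"},          # local maximum
    (0, 1): set(),
    (1, 0): set(),
    (1, 1): {"s1", "s2"},    # target grammar of the environment
}

def tla_step(grammar, sentence, rng):
    if sentence in ANALYZES[grammar]:
        return grammar                       # success: no change
    i = rng.randrange(len(grammar))          # failure: try flipping one bit
    candidate = tuple(b ^ (j == i) for j, b in enumerate(grammar))
    return candidate if sentence in ANALYZES[candidate] else grammar

rng = random.Random(1)
g = (0, 0)
for _ in range(10000):
    g = tla_step(g, rng.choice(["s1", "s2"]), rng)
print(g)  # still (0, 0): every single-parameter move fails, so the learner is stuck
```

However many sentences arrive, the learner at (0, 0) parses "s1" (and stays) or fails on "s2" but finds that both one-bit neighbors also fail (and stays), exactly the trap Berwick & Niyogi analyze.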
However, Kohl (), using an
exhaustive search in a computer implementation of the triggering
model, shows that in a linguistically realistic twelve-parameter
space, , of the , grammars are still not learnable even


The present discussion concerns acquisition in a homogeneous environment in
which all input data can be identified with a single, idealized ‘grammar’
...


A Variational Model 
with the best default starting state
...
Overall, there are on average
, unlearnable grammars for the triggering model
...
In a broad sense, ambiguous evidence refers to sentences that are compatible with more than one grammar. [...] When ambiguous evidence is presented, the learner may select any of the grammars compatible with the evidence and may subsequently be led to local maxima and unlearnability. In Fodor's revision, only evidence that unambiguously determines the target grammar triggers the learner to change parameter values. [...] Without unambiguous evidence, Fodor's revised triggering model will not work. As pointed out by Osherson et al. (), [...]. In the most extreme form, if the last sentence the learner encounters [...]


Niyogi & Berwick () argue that ‘mis-convergence’, i
...
the learner attaining a
grammar that is different from target grammar, is what makes language change possible: hence formal insufficiency of the triggering model may be a virtue instead of a
defect
...
In addition, whatever positive implications of misconvergence are surely negated by the overwhelming failure to converge, as Kohl’s results
show
...
This scenario is by no means an exaggeration when a realistic learning environment is taken into account. [...] For example, Weinreich et al. () [...]. To take a concrete example, consider again the acquisition of subject use. English speakers do drop subjects in informal registers (e.g. [...]). This pattern, of course, is compatible with an optional-subject grammar.

Consequently, variability in linguistic evidence, however sparse, may still lead a triggering learner to swing back and forth between grammars like a pendulum.

2.1.2 Developmental incompatibility

While the formal problems of the triggering model might be remedied by further modifications (e.g. Sakas & Fodor ), the difficulty posed by the developmental compatibility condition is far more serious.
explanation of child language, the following predictions are
inevitable:
()

a
...

b
...


To the best of my knowledge, there is in general no developmental evidence in support of either (a) or (b). First, consider the prediction in (a), the consistency of child language with respect to a single grammar defined in the UG space. [...] However, Valian () shows that while Italian children drop subjects in % of all sentences, the null-subject (NS) ratio is only % for American children in the same age group.

Alternatively, Hyams () suggests that during the NS stage,
English children use a discourse-based, optional-subject grammar like Chinese
...
() show that while
subject drop rate is only % for American children during the
NS stage (;–;), Chinese children in the same age group drop
subjects in % of all sentences
...
 for additional discussion)
...
() find that for -year-olds, Chinese children
drop objects in % of sentences containing objects and
American children only %
...

Turning now to the triggering models' second prediction for language development (b), we expect to observe abrupt changes in child language as the learner switches from one grammar to another.

Footnote: This figure, as well as Valian's (), is lower than those reported elsewhere in the literature, e.g. Bloom (), Hyams & Wexler (). In particular, Wang et al. () [...].
Behrens () reports similar findings in a large longitudinal study of German children's NS stage. In section  [...]. Again, there is no indication of a radical change in the child's grammar, contrary to what the triggering model entails.


2.1.3 Imperfection in child language?

So the challenge remains: what explains the differences between child and adult languages? As summarized in Chapter  and repeated below, two approaches have been advanced to account for these differences:

() a. [...]
   b. [...]

The performance-deficit approach (a) is often stated under the Continuity Hypothesis (Macnamara , Pinker ) (e.g. [...]).


The competence-deficit approach (b) is more often found in work within the parameter-setting framework. [...] The differences between child language and adult language have been attributed to various deficits in children's grammatical competence. One influential account assumes a deficit in the Tense/Agreement node in children's syntactic representation (Wexler ): the Tense/Agreement features are missing in young children during the ROI stage. [...] The reader is referred to Phillips () for a review and critique of some recent proposals along these lines. In section  [...]. More empirically, as we shall see in Chapters  and , the imperfection perspective on child language leaves many developmental patterns unexplained.

We will see that when English children drop subjects in Wh-questions, they do so almost always in adjunct (where, how) questions, but almost never in argument (who, what) questions: a categorical asymmetry not predicted by any imperfection explanation proposed so far. [...] We will also see a significant share (approximately %) of V1 patterns in children acquiring V2 languages: hence, % of 'imperfection' to be explained away.
While there is no doubt that innate UG knowledge must play a crucial role in constraining the child's hypothesis space and the learning process, there is one component of the GSL approach that is too sensible to dismiss. [...] In the rest of this chapter we propose a new approach that incorporates this useful aspect of the GSL model into a generative framework: an innate UG provides the hypothesis space, and statistical learning provides the mechanism.


2.2 [...]

It is a fundamental question how such variation is interpreted in a theory of language acquisition. Variation, as an intrinsic fact of life, can be observed at many levels of biological organization, often manifested in physiological, developmental, and ecological characteristics. As pointed out by Ernst Mayr on many occasions (in particular, , , ), it was Darwin who first realized that the variations among individuals are 'real': individuals in a population are inherently different, and are not mere 'imperfect' deviations from some idealized archetype.
As R. C. Lewontin remarks, evolutionary changes are hence changes in the distribution of different individuals in the population:

Before Darwin, theories of historical change were all transformational. [...] Lamarck's theory of evolution was transformational in regarding species as changing because each individual organism within the species underwent the same change.

In contrast, Darwin proposed a variational principle, that individual members of the ensemble differ from each other in some properties and that the system evolves by changes in the proportions of the different types. (Lewontin : –; italics original)
Non-uniformity in a sample of data often should, as in evolution, be interpreted as reflecting a collection of distinct individuals: variations are therefore real and expected, and should not be viewed as 'imperfect' forms of a single archetype. [...] Similarly, the distinction between transformational and variational thinking in evolutionary biology is instructive for constructing a formal model of language acquisition. [...] In contrast, we may consider a variational theory in which language acquisition is the change in the distribution of I-language grammars, the principled variations in human language. [...] The computational properties of the model will then be discussed in the context of the formal sufficiency condition on acquisition theories.
2.2.1 [...]

We adopt the P&P framework, i.e. we assume that there is only a finite number of possible human grammars, varying along some parametric dimensions.
Each grammar Gi is paired with a weight pi, which can be viewed as the measure of prominence of Gi in the learner's language faculty. Learning stops when the weights of all grammars have stabilized and do not change any further, possibly corresponding to some kind of critical period of development. That is, the target grammar has eliminated all other grammars in the population as a result of learning.

() Upon the presentation of an input sentence s, the learner
a. selects a grammar Gi with the probability pi;
b. analyzes s with Gi;
c. rewards Gi if the analysis succeeds, and punishes Gi otherwise.

As learning proceeds, grammars that have overall more success with the data will be more prominently represented in the learner's hypothesis space.
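The select–analyze–reward loop just described can be sketched in a few lines. This is a minimal illustration of mine: the two toy "grammars" (predicates over sentence labels), the environment, and the learning rate are invented placeholders, and the weight updates anticipate the Linear reward–penalty scheme given later in the chapter.

```python
# Variational learner sketch: select grammar i with probability weights[i],
# reward it if it analyzes the sentence, punish it otherwise. The update
# rule keeps the weights summing to 1 by construction.
import random

def variational_step(weights, grammars, s, gamma, rng):
    i = rng.choices(range(len(grammars)), weights=weights)[0]
    n = len(grammars)
    if grammars[i](s):                            # Gi -> s : reward Gi
        return [w + gamma * (1 - w) if j == i else (1 - gamma) * w
                for j, w in enumerate(weights)]
    return [(1 - gamma) * w if j == i             # Gi fails : punish Gi
            else gamma / (n - 1) + (1 - gamma) * w
            for j, w in enumerate(weights)]

# toy environment: the "target" grammar analyzes every sentence,
# the competitor analyzes only the ambiguous ones
g_target = lambda s: True
g_rival = lambda s: s == "ambiguous"
rng = random.Random(0)
w = [0.5, 0.5]
for _ in range(5000):
    s = rng.choice(["ambiguous", "unambiguous"])
    w = variational_step(w, [g_target, g_rival], s, gamma=0.02, rng=rng)
print(w)  # w[0] should end up near 1: the target grammar comes to dominate
```

The rival grammar is punished whenever it is selected on an unambiguous sentence, so its weight is driven toward zero, mirroring the elimination of competitors described in the text.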
To illustrate, imagine that the learner has two grammars: G1, the target grammar used in the environment, and G2, the competitor, with associated weights p1 and p2 respectively. Early in acquisition, p1 and p2 are comparable, i.e. [...]. The learner will then have comparable probabilities of selecting the two grammars for both input analysis and sentence production, following the null hypothesis that there is a single grammatical system responsible for both comprehension/learning and production. Writing SG1 for a sentence produced by the grammar G1, sequences produced by the learner at this stage will look like this:

() Intermediate stage of acquisition:
SG1, SG2, SG2, SG1, SG2, SG1, ...

When learning stops, G2 will have been eliminated (p2 ≈ 0) and G1 is the only grammar the learner has access to:

() Completion of acquisition:
SG1, SG1, SG1, SG1, SG1, SG1, ...


Of course, grammars do not actually compete with each other: the competition metaphor only serves to illustrate (a) the grammars' coexistence and (b) their differential representation in the learner's language faculty. We will also stress the passiveness of the learner in the learning process, conforming to the research strategy of a 'dumb' learner in language acquisition. The justification for this minimal assumption is twofold. [...] Hence, we assume that the learner does not contemplate which grammar to use when an input datum is presented. He does not make active changes to the selected grammar (as in the triggering model), or reorganize his grammar space, but simply updates the weight of the grammar selected and moves on.
Write s ∈ E if a sentence s is an utterance in the linguistic environment E. Write G → s if a grammar G can analyze s, which, as a special case, can be interpreted as parsability (Wexler & Culicover , Berwick ), in the sense of strong generative capacity. However, as we shall see in Chapter , children use their morphological knowledge and domain-specific knowledge of UG—strong generative notions—to disambiguate grammars. Our choice of string–grammar compatibility obviously eases the evaluation of grammars using linguistic corpora.

Footnote: In this respect, the variational model differs from a similar model of acquisition (Clark ), in which the learner is viewed as a genetic algorithm that explicitly evaluates grammar fitness.

For simplicity, write pi for pi(E, t) at time t, and pi′ for pi(E, t + ) at
time t + 
...
In the present model, learning is the adaptive change in
the weights of grammars in response to the sentences successively
presented to the learner
...
 Consider the one in ():
() Given an input sentence s, the learner selects a grammar Gi with probability pi:
a
...
if Gi → s then
/

{

{

p′i = pi + ( – pi)
p′j = ( – )pj if j ≠ i
p′i = ( – )pi

p′j = —— + ( – )pj
N–

if j ≠ i

() is the Linear reward–penalty (LR–P) scheme (Bush & Mosteller , ), one of the earliest, simplest, and most extensively studied learning models in mathematical psychology; see () for a review.
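A quick numerical check of the two update rules in () may be useful (an illustration of mine, with arbitrary weights and γ = 0.1): both the reward case and the penalty case redistribute probability mass among the N grammars while preserving a total of exactly 1.

```python
# Linear reward-penalty updates from (): reward grammar i on success,
# punish it on failure; the remaining mass is shared among the others.

def reward(p, i, gamma):
    return [pi + gamma * (1 - pi) if j == i else (1 - gamma) * pi
            for j, pi in enumerate(p)]

def punish(p, i, gamma):
    n = len(p)
    return [(1 - gamma) * pi if j == i else gamma / (n - 1) + (1 - gamma) * pi
            for j, pi in enumerate(p)]

p = [0.5, 0.3, 0.2]                # three competing grammars
print([round(x, 2) for x in reward(p, 0, 0.1)])   # [0.55, 0.27, 0.18]
print([round(x, 2) for x in punish(p, 0, 0.1)])   # [0.45, 0.32, 0.23]
```

In the reward case the selected grammar gains γ(1 − pi) = 0.05 while the others shrink by the factor (1 − γ); in the penalty case it loses γpi = 0.05, which is redistributed as γ/(N − 1) to each competitor. Either way the weights remain a probability distribution.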
In competition learning models, what is crucial is the constitution of the hypothesis space. [...] conditioned on external stimulus; in the grammar competition model, the hypothesis space consists of Universal Grammar, a highly constrained and finite range of possibilities.

And, as will be discussed in later chapters and in numerous other studies of language acquisition, in order adequately to account for child language development one needs to make reference to the specific characterization of UG supplied by linguistic theories.
The landmark study of Newport et al. () [...]. Specifically, children who are exposed to more yes/no questions tend to use auxiliary verbs faster and better (e.g. [...]). The reason, as we shall see, lies in Universal Grammar.


2.3 [...]

2.3.1 Asymptotic behaviors

In any competition process, some measure of fitness is required. Define the penalty probability of a grammar Gi in a linguistic environment E as the probability that Gi fails to analyze an incoming sentence: ci = Pr(Gi ↛ s | s ∈ E). In other words, ci is the percentage of sentences in the environment with which the grammar Gi is incompatible.

For example, consider a Germanic V environment, where the
main verb is situated in the second constituent position
...
 An
English-type SVO grammar, although not compatible with all V
sentences, is nevertheless compatible with a certain proportion of
them
...
Since the grammars in
the delimited UG space are fixed—it is only their weights that
change during learning—their fitness values defined as penalty
probabilities are also fixed if the linguistic environment is, by
assumption, fixed
...
The penalty probability is not part of the learner's knowledge: it is a notion used, by the linguist, in the formal analysis of the learning model. For example, the learner need not and does not keep track of frequency information about sentence patterns, and does not explicitly compute the penalty probabilities of the competing grammars.

 For expository ease we will keep to the fitness measure of whole grammars in the present discussion. Later in this chapter we will place the model in a more realistic P&P grammar space, and discuss the desirable consequences in the reduction of computational cost.
For simplicity but without loss of generality, suppose that there are two grammars in the population, G1 and G2, and that they are associated with penalty probabilities c1 and c2 respectively. It can be shown that, with a sufficiently small learning rate γ, the weight p1 settles near c2/(c1 + c2): the grammar with the lower penalty probability comes to dominate.

2.2.2 Stable multiple grammars
Recall from section  that realistic linguistic environments are not uniform, but contain expressions attributable to more than one idealized grammar. This inherent variability poses a significant challenge for the robustness of the triggering model. From a learning perspective, a non-homogeneous environment induces a population of grammars, none of which is 100% compatible with the input data. Therefore, the variability of a speaker's linguistic competence can be viewed as a probabilistic combination of multiple grammars. In Chapter  we extend the acquisition model to language change; we will derive conditions under which one grammar will inevitably replace another in a number of generations, much like the process of natural selection.

Consider the special case of an idealized environment in which all linguistic expressions are generated by a single input grammar G1. It is easy to see from () that p1 then converges to 1, with the competing grammars eliminated.

Empirically, one of the most important features of the variational model is its ability to make quantitative predictions about language development via the calculation of the expected change in the weights of the competing grammars. At any time, p1 + p2 = 1. After each input sentence, p1 changes as follows:

()  Δp1 = +γ(1 − p1)   with Pr. p1(1 − c1)        [G1 is chosen and G1 → s]
    Δp1 = −γp1         with Pr. p1c1              [G1 is chosen and G1 ↛ s]
    Δp1 = −γp1         with Pr. (1 − p1)(1 − c2)  [G2 is chosen and G2 → s]
    Δp1 = +γ(1 − p1)   with Pr. c2(1 − p1)        [G2 is chosen and G2 ↛ s]

Taking expectations gives E[Δp1] = γ[c2(1 − p1) − c1p1], which vanishes exactly at p1 = c2/(c1 + c2). Although the actual rate of language development is hard to predict—it would rely on an accurate estimate of the learning parameter γ and the precise manner in which the learner updates grammar weights—the model does make comparative predictions about language development. Estimating the penalty probabilities of grammars from CHILDES () allows us to make longitudinal predictions about language development that can be verified against actual findings.
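The case analysis above can be checked by direct simulation. In this sketch the penalty probabilities c1 = 0.1 and c2 = 0.3 are invented for illustration; the weight p1 should hover near c2/(c1 + c2) = 0.75:

```python
import random

def simulate_two_grammars(c1, c2, gamma=0.01, steps=20000, seed=0):
    """LR-P competition between G1 and G2 with fixed penalty probabilities."""
    rng = random.Random(seed)
    p1 = 0.5
    for _ in range(steps):
        if rng.random() < p1:                 # G1 is selected
            if rng.random() < c1:             # G1 fails: punish it
                p1 = (1 - gamma) * p1
            else:                             # G1 succeeds: reward it
                p1 = p1 + gamma * (1 - p1)
        else:                                 # G2 is selected
            if rng.random() < c2:             # G2 fails: its loss is G1's gain
                p1 = gamma + (1 - gamma) * p1
            else:                             # G2 succeeds: p1 shrinks
                p1 = (1 - gamma) * p1
    return p1
```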

Before we go on, a disclaimer, or rather, a confession, is in order. We are not committed to the specific update scheme in (). What we are committed to is the mode of learning: coexisting hypotheses in competition and gradual selection, as schematically illustrated in () and elaborated throughout this book with case studies in child language. This gradualness is needed to accommodate the fact of linguistic variation in adult speakers, which is particularly clear in language change, as we shall see in Chapter .


2.2.3 Unambiguous evidence
The theorem in () states that in the variational model, convergence to the target grammar is guaranteed if all competitor grammars have positive penalty probabilities
...
While the general existence of
unambiguous evidence has been questioned (Clark , Clark &

A Variational Model 
Roberts ), the present model does not require unambiguous
evidence to converge in any case
...
The target of
learning is a Dutch V grammar, which competes in a population
of (prototype) grammars, where X denotes an adverb, a prepositional phrase, and other adjuncts that can freely appear at the
initial position of a sentence:
() a
...

c
...

e
...
 Observe that
none of the patterns in (a) alone could distinguish Dutch from
the other four human grammars, as each of them is compatible
with certain V sentences
...
% are SVO patterns, followed by XVSO patterns at % and
only 
...
 Most notably, Hebrew, and Semitic in
general, grammar, which allows VSO and SVO alternations
(Universal : Greenberg ; see also Fassi-Fehri , Shlonsky
), is compatible with 
...

Despite the lack of unambiguous evidence for the V grammar,
as long as SVO, OVS, and XVSO patterns appear at positive
frequencies, all the competing grammars in () will be punished
...
The theorem in
() thus ensures the learner’s convergence to the target V grammar
...
, based
on a computer simulation
...

 Thanks to Edith Kaan for her help in this corpus study
...
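The scenario generalizes the two-grammar simulation to N competitors. The competitor penalty values below are invented for illustration; the point is only that a target with penalty 0 wins even though every individual sentence type is ambiguous:

```python
import random

def compete(penalties, gamma=0.01, steps=20000, seed=1):
    """LR-P competition among N grammars with fixed penalty probabilities.

    penalties[i] is c_i, the chance that grammar i fails on an input sentence.
    """
    rng = random.Random(seed)
    n = len(penalties)
    p = [1.0 / n] * n
    for _ in range(steps):
        i = rng.choices(range(n), weights=p)[0]   # select a grammar
        if rng.random() < penalties[i]:           # it fails: punish it
            p = [(1 - gamma) * w if j == i else gamma / (n - 1) + (1 - gamma) * w
                 for j, w in enumerate(p)]
        else:                                     # it succeeds: reward it
            p = [w + gamma * (1 - w) if j == i else (1 - gamma) * w
                 for j, w in enumerate(p)]
    return p

# a target grammar (penalty 0) against four penalized competitors
weights = compete([0.0, 0.35, 0.45, 0.6, 0.75])
```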
F I G U R E .  The convergence to the V2 grammar in the absence of unambiguous evidence (weight of the target grammar vs. number of input samples).

The model only requires a finite and non-arbitrary space of possible grammars, a conclusion accepted by many of today's linguists.

2.3.1 Parameter interference
So far we have been treating competing grammars as individual entities; we have not taken into account the structure of the grammar space. According to some estimates (Clark ; cf. ), natural language syntax involves several dozen binary parameters. If the grammars were stored as individual wholes, the learner would have to manipulate 2^n grammar weights for n parameters: that seems implausible.

The solution is to define learning over the parameters themselves. Suppose that there are n binary parameters, α1, α2, ..., αn, each taking the value 0 or 1. Each parameter αi is associated with a weight pi, the probability of αi being 1; write P = (p1, p2, ..., pn). The learner selects the value 1 for αi with probability pi and the value 0 with probability 1 − pi; as pi changes, so does the probability of selecting 1 or 0. From P = (p1, p2, ..., pn), the learner can non-deterministically generate a string of 0s and 1s, which is a grammar, G. P gives rise to all 2^n grammars; as P changes, the probability of P ⇒ G also changes.

() describes how P generates a grammar to analyze an incoming sentence:

() For each incoming sentence s:
   a. For i = 1, 2, ..., n:
      • with probability pi, choose the value of αi to be 1;
      • with probability 1 − pi, choose the value of αi to be 0.
   b. Let G be the grammar with the parameter values chosen in (a). Analyze s with G.
   c. Update the parameter weights to P′ = (p′1, p′2, ..., p′n).

 Different theories of UG will yield different generalizations: when situated in a theory-neutral learning model, they will—if they are not merely notational variants—make different developmental predictions. See Ch. .

Now a problem of parameter interference immediately arises
...
By contrast, fitness
measure and thus the outcome of learning—reward or punishment—is defined on whole grammars
...
Suppose that the language to be
acquired is German, which has [+Wh] and [+V]
...
Now although [+Wh] is the
target value for the Wh parameter, the whole grammar [+Wh,
–V] is nevertheless incompatible with a V declarative sentence
and will fail
...

So the problem is this
...
This in turns introduces
the problem of parameter interference: updating independent

A Variational Model 
parameter probability is made complicated by the success/failure
of the composite grammar
...


2.3.2 Independent parameters and signatures
To be sure, not all parameters are subject to the interference problem. With respect to a parameter α, its signature refers to sα, a class of sentences that are analyzable only if α is set to the target value; call a parameter with a signature independent.

In the variational model, unlike the cue-based learning model to be reviewed a little later, the signature–parameter association need not be specified a priori, and neither does the learner actively search for signatures in the input. Both values of a parameter are available to the child at the outset, but only the target value can analyze the signature sentences: the non-target value has a positive penalty probability, and will be eliminated after a sufficient number of signatures have been encountered. On the one hand, this radically reduces the problem of parameter interference; on the other hand, parameters with signatures lead to longitudinal predictions that can be directly related to corpus statistics. In Chapter  we will see the acquisition of several independent parameters that can be developmentally tracked this way.

 This also suggests that when proposing syntactic parameters, we should have the problem of acquisition in mind.
The Wh-movement parameter is a straightforward example: Wh questions are analyzable only under the target value, while for non-Wh sentences the Wh parameter obviously has no effect. Another independent parameter is the verb-raising parameter. Its positive value is associated with signatures such as (), where finite verbs precede negation/adverbs:

() a. Jean ne mange pas de fromage.
      Jean ne eats  not of cheese
      'Jean does not eat cheese.'
   b. Jean mange souvent du fromage.
      Jean eats  often  of cheese
      'Jean often eats cheese.'

Yet another independent parameter is the obligatory subject parameter, for which the positive value (e.g. English) is associated with the use of pure expletives such as there in sentences like There is a train in the house.

 Finite verbs followed by negation or adverbs in a language therefore indicate that the verb must raise at least to Tense.

Below we review two models that untangle parameter interference by endowing the learner with additional resources; we then pursue a far simpler model and study its formal sufficiency. A fuller treatment of the mathematical and computational issues can be found in Yang (in press).
2.3.3 Learning with additional resources

One approach is to endow the learner with the ability to detect parametric ambiguity in the input, and to act only when ambiguity is absent. Fodor's () Structural Trigger Learner (STL) takes this approach: the STL attempts to determine whether an input sentence is parametrically ambiguous; if so, the present parameter values are left unchanged, and parameters are set only when the input is completely unambiguous.

The other approach was proposed by Dresher & Kaye ()
and Dresher (); see Lightfoot () for an extension to the
acquisition of syntax
...
Dresher & Kaye () propose that for each
parameter, the learner is innately endowed with the knowledge of
the cue associated with that parameter
...
Upon
the presentation of a cue, the learner sets the value for the corresponding parameter
...
That is, the cue
 Tesar & Smolensky Constraint Demotion model () is similar
...


 A Variational Model
for a parameter may not be usable if another parameter has not
been set
...
Suppose the parameter
sequence is , ,
...
, sn, respectively
...
Initialize , ,
...

b
...
, n
• Set i upon seeing si
...
, i alone
...
, n to respective default values
...
The STL model seems to introduce computational cost that is too high to be realistic: the learner faces a very large degree of structural ambiguity that must be disentangled (Sakas & Fodor ). The cue-based model, for its part, requires that the ordering of parameters be innately specified. While this has been deductively worked out for about a dozen parameters in metrical stress (Dresher ), whether the same is true for a non-trivial space of syntactic parameters remains to be seen.

Neither model sits comfortably with variation in child language. The STL model may maintain that before a parameter is conclusively set, both parameter values are available, to which variation in child language can be attributed. The cue-based model is completely deterministic: at any time, a parameter is associated with a unique parameter value—correct or incorrect, but not both—and hence no variation in child language can be accounted for. When a parameter is reset, it predicts radical and abrupt reorganization of child language: incorrectly, as reviewed earlier.

 See () for a formal discussion; see also Church () for general comments on the cue-based model, and Gillis et al. ().


2.3.4 Naive parameter learning
In what follows, we will pursue an approach that sticks to the strategy of assuming a ‘dumb’ learner—call it Naive Parameter Learning (NPL):

() Naive Parameter Learning (NPL)
   a. Reward all the parameter values if the composite grammar succeeds.
   b. Punish all the parameter values if the composite grammar fails.

The hope is that, in the long run, the correct parameter values will prevail.
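The NPL scheme can be sketched as a per-parameter LR−P update applied to whichever value was just used (the weight and γ values in the examples are illustrative):

```python
def npl_update(weights, grammar, success, gamma):
    """Naive Parameter Learning: after the composite grammar is evaluated,
    reward (success) or punish (failure) every parameter value it used.

    weights[i] is the probability of parameter i being 1; grammar[i] is the
    0/1 value actually used on this sentence.
    """
    new = []
    for w, v in zip(weights, grammar):
        p = w if v == 1 else 1.0 - w        # weight of the value just used
        p = p + gamma * (1.0 - p) if success else (1.0 - gamma) * p
        new.append(p if v == 1 else 1.0 - p)
    return new
```

On success every used value gains weight; on failure every used value loses weight, including target values that happened to ride on a failing composite grammar: this is precisely the interference problem.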
To see how NPL fares, consider again the hypothetical German learner, whose target grammar is [+Wh, +V2]. The combinations of the two parameters give four grammars, of which we can explicitly measure the fitness values (penalty probabilities): Wh questions are incompatible with [−Wh], and of the declarative sentences, about % are SVO sentences that are consistent with the [−V2] value. We then have the penalty probabilities shown in Table ; Fig.  shows the changes of the two parameter values over time.

 This figure is based on English data: we are taking the liberty of extrapolating it to our (hypothetical) German simulation.

It is not difficult to prove that for parameters with signatures, the NPL will converge on the target value, using the Martingale methods in Yang & Gutmann (); see Yang (in press).

T A B L E .  Penalty probabilities of the four grammars formed from the [±Wh] and [±V2] parameter values.

F I G U R E .  The change of the two parameter weights ([+Wh] and [+V2]) over time.
We now turn to the more difficult issue of learning parameters that are subject to the interference problem. Two caveats are in order.

First, our conclusions are based on results from computer simulation. Analytical results—proofs—would be much better, but so far they have been elusive. As the example of the Wh and V2 learning (Fig. ) shows, the time required for convergence is a function of the penalty probabilities of the competing grammars. In that example, if the three competitors have high penalty probabilities, intuition tells us that the two parameters rise to their target values quickly.

This is a departure from the traditional linguistic learnability study, and we believe it is a necessary one. In the traditional approach, the actual amount of input data plays no role; rather, learning is studied ‘in the limit’ (Gold ), with the assumption that learning can take an arbitrary amount of data as long as it converges on the correct grammar in the end: hence, no sample complexity considerations. In Chapter  we show that it is possible to establish bounds on the amount of linguistic data needed for actual acquisition: if the learning data required by a model greatly exceed such bounds, then such a model will fail the formal sufficiency condition.

 Although intuition fades rapidly as more and more parameters combine and interact.
...
For example, suppose one has found a model that requires exactly n (or 2n) specific kinds of input sentences to set n parameters. But to claim this is an efficient model, one must show that these n sentences are in fact attested with robust frequencies in the actual input: a model whose theoretical convergence relies on twenty levels of embedded clauses with parasitic gaps is hopeless in reality. The convergence time also depends on the fitness of the competitors: computer simulation shows that the NPL model does not converge onto the target parameter values in a reasonable amount of time if all of the 2^n − 1 competing composite grammars have vanishingly small penalty probabilities. But this curious (and disastrous) scenario does not occur in reality. Nor is it practical to estimate the actual penalty probabilities directly: to do so, one would have to consider all, or at least a large portion, of the 2^n grammars, and evaluate each against realistic input; each of these steps poses enormous practical problems for large n.
The example in (), the V2 grammar in competition with four other grammars, is a case in point. Furthermore, we believe that it is reasonable to assume that the badness of a grammar is in general correlated with how ‘far away’ it is from the target grammar, where distance can be measured by the number of parameter values on which the two differ: the Hamming distance. It is true that changing some single parameters—e.g. [±Wh], or scrambling—may induce radical changes in the overall grammar obtained (though some of these parameters may be independent, and thus free of parameter interference).
Specifically, we assume that the penalty probabilities of the competing grammars follow a standard Gaussian distribution:

()  c(x) = 1 − e^(−x²/(2σ²))

where σ is a constant.
To choose penalty probabilities, we first divide the interval (, )
into n equal segments, where n is the number of parameters
...
However, to simulate the effect that grammars
further from the target are generally (but not always) worse than
the closer ones, we assume that Gh falls in the hth region with
probability s, in the h ± st regions with probability s , in the h ±
nd regions with probability s , etc
...
Thus, a grammar farther away can be still be
compatible with many sentences from the target grammar, but
the likelihood of it being so vanishes very quickly
...
But overall, further away grammars are on average
worse than those that are closer to the target
...
And

 A Variational Model
even here we will make simplified assumptions; see Appendix A
for details
...
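A toy rendering of this fitness assumption, where the decay constant s = 0.2 is an invented illustration:

```python
import random

def hamming(g1, g2):
    """Number of parameter values on which two grammars differ."""
    return sum(a != b for a, b in zip(g1, g2))

def draw_penalty(h, n, s=0.2, rng=random):
    """Draw a penalty probability for a grammar at Hamming distance h from
    the target, in a space of n parameters: it lands in the h-th of n equal
    slices of (0, 1), drifting k slices away with probability ~ s**k.
    """
    if h == 0:
        return 0.0                         # the target grammar never fails
    k = 0
    while rng.random() < s:                # geometrically distributed drift
        k += 1
    region = min(max(h + rng.choice([-1, 1]) * k, 0), n - 1)
    return (region + rng.random()) / n     # uniform within the chosen slice
```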
Second, some essential distributional statistics are based on English and Germanic languages, and then extrapolated (not unreasonably, we believe) to other grammars. In the three-parameter space of Appendix A, the average penalty probability for grammars one parameter away from the target is lower than that for grammars two parameters away, which in turn is lower than that for grammars three parameters away: penalty probability in general correlates with the Hamming distance from the target. The estimates (Table  in Appendix A) are also consistent with our assumption of distance-related exponential decay.
...
A final issue concerns the learning rate γ. First, if γ is too small—the learner modifies parameter weights very slightly upon success/failure—the learner takes an incredibly long time to converge; if γ is too large, a few unrepresentative sentences may swing the weights wildly. It is not hard to understand why this may be the case. The learner's weights are P = (p1, p2, ..., pn), and the target values are T, an n-ary vector of 0s and 1s. At the outset of learning, the learner has no idea what T may be; as learning proceeds, P moves closer and closer to T. Thus, the learner may have increasingly higher confidence in P, which now works better and better.
There are a number of ways of implementing this intuition. There are many algorithms in computer science and machine learning that formally—and computationally expensively—modify the learning rate with respect to a confidence interval; but such methods are costly, and furthermore they deviate from the guidelines of psychological plausibility and explanatory continuity that acquisition models are advised to follow (Chapter ).
Our alternative is based on two observations. First, the learner need not adjust the weights after every single sentence. Second, the overall goodness of P can be related to how often P successfully analyzes incoming sentences. Formally:

() The Naive Parameter Learner with Batch (NPL+B)
   a. Generate a grammar G from P, as in (), and analyze the incoming sentence s.
   b. • If G → s, then b = b + 1.
      • If G ↛ s, then b = b − 1.
   c. • If b = B, reward G and reset b = 0.
      • If b = −B, punish G and reset b = 0.
   d. Go to (a).
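One way to sketch the batch counter is as a wrapper around a one-step learner (the bound B, the rate γ, and the toy `bump` updater in the example are illustrative):

```python
def npl_batch_step(weights, grammar, success, b, B, gamma, update):
    """One NPL+B step: move the counter b toward +B on success and -B on
    failure; only when a bound is hit is the underlying learner's reward
    or punishment actually applied (and the counter reset).
    """
    b = b + 1 if success else b - 1
    if b == B:
        return update(weights, grammar, True, gamma), 0    # batched reward
    if b == -B:
        return update(weights, grammar, False, gamma), 0   # batched punishment
    return weights, b                                      # just keep counting
```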
Usually, ‘batch’ refers to a memory that stores a number of data points before processing them; here the batch is a single counter, and the cost of the additional memory load is trivial. The batch has the effect of an adaptive learning rate. To see this, consider the case in which P is very close to T: then almost every B sentences will push the batch counter b to its bound (+B), so rewards come frequently. By contrast, if P is quite far from T, then it generally takes a longer time for b to reach its bound—reward and punishment are then less frequent, and thus slow down learning.
The behavior of the batch counter is a version of the classic gambler's-ruin problem. A gambler has n dollars to start the game; on each round he wins a dollar with probability p and loses a dollar with probability q = 1 − p. The gambler wins if he ends up with 2n dollars, and is ruined if he is down to 0. It is not difficult to show—the interested reader may consult any textbook on stochastic processes—that the probability of the gambler winning (i.e. getting 2n dollars), w, is:

()  w = ((q/p)^n − 1) / ((q/p)^(2n) − 1)

Our batch counter b does exactly the same thing: b ‘wins’ if it reaches B, and ‘loses’ if it reaches −B.
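The formula can be checked directly (q = 1 − p; the symmetric case p = 1/2 is taken as the limit of the expression):

```python
def win_probability(B, p):
    """Probability that a +/-1 counter started at 0 hits +B before -B,
    stepping +1 with probability p: w = ((q/p)**B - 1) / ((q/p)**(2B) - 1).
    """
    if p == 0.5:
        return 0.5                 # limit of the formula for a fair walk
    r = (1.0 - p) / p
    return (r ** B - 1.0) / (r ** (2 * B) - 1.0)
```

With B = 1 the function reduces to w = p (no batch); larger B sharpens the response, so a P that succeeds 70% of the time is rewarded far more than 70% of the time.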


Let p be the probability of P yielding a successful grammar. Fig.  plots the winning probability w(B, p) for several values of B; B = 1 means that there is no batch: the learning parameter would be uniform throughout learning. A typical result from a simulation of learning a ten-parameter grammar is given in Fig. , and the learner converges in a reasonable amount of time.

It must be conceded that the formal sufficiency condition of the NPL model is only tentatively established. First, and obviously, much more work is needed to establish whether the assumptions of Gaussian distribution and exponential decay are accurate.

The most important consequence of the NPL model, if vindicated, lies in the dramatic reduction of computational cost: the memory load is reduced from storing 2^n grammar weights to n parameter weights.

 A copy of the NPL+B learner can be obtained from the author.
 Presumably, the learner can focus on parameters that are not independent: a smaller space means a smaller computational cost for the STL parser.
F I G U R E .  The probability function w(B, p) = ((q/p)^B − 1)/((q/p)^(2B) − 1), plotted for B = 1, ..., 5.

F I G U R E .  A typical run of the NPL+B learner on a ten-parameter grammar: the parameter weights converge to their target values over time.

Since the successful acquisition of a grammar is accomplished only when all parameters are set correctly, children may go through an extended period of time in which some parameters are already in place while others are still fluctuating. What the child then possesses are partial fragments of grammars that may not correspond to any attested adult language—something that is, say, English-like in one respect but Chinese-like in another. A number of such cases in child language will be documented in Chapter .
2.5 Related approaches
The idea of language acquisition as grammar competition has occasionally surfaced in the literature, although it has never been pursued systematically or directly related to quantitative data in language development. The position was echoed in Stampe (), and seems to be accepted by at least some researchers in phonological acquisition (Macken ).

Since the advent of the P&P framework, some linguists have claimed that syntactic acquisition selects a grammar out of all possible human grammars (Piattelli-Palmarini , Lightfoot ), but nothing has been formalized. The possibility of associating grammars with weights has been raised by Valian (), Weinberg (), and Bloom (), either for learnability considerations or to explain the gradual developmental patterns in child language.

Recently, Roeper (; cf. ) has argued that multiple grammars may coexist in individual speakers. Roeper further suggests that in the selection of competing grammars, the learner follows some principles of economy akin to those in the Minimalist Program (Chomsky b): grammars with less complex structural representations are preferred. For instance, English children who alternate between I go, using a nominative case subject, and me go, using a default (accusative) case, can be viewed as using two grammars with different case/agreement systems, both of which are attested in human languages.

The closest precedent to the present proposal is Clark's () genetic algorithm (GA) model. The GA model represents grammars as parameter vectors, which undergo reproduction via ‘crossover’, i.e. parts of two parental parameter vectors are swapped and combined. Candidate grammars are evaluated against input data; a measure of fitness is hence defined, which is subsequently translated into differential reproduction. The variational model is neutral with respect to additional learning biases: it can incorporate other possibilities, including the economy condition suggested by Roeper. However, these additional biases must be argued for empirically.


Both the GA model and the variational model are explicitly built on the idea of language acquisition as grammar competition, and in both models grammars are selected for or against on the basis of their compatibility with input data. One major difference lies in the evaluation of grammar fitness. In the variational model, fitness is the penalty probability of a grammar in a linguistic environment: it is not accessed by the learner, but can be measured from text corpora by the linguist. In the GA model, by contrast, the learner explicitly evaluates the parsability of candidate grammars, and the parsability measures are then explicitly used to determine the differential reproduction that leads to the next generation of grammars. The variational model developed here sidesteps these problems by making use of probabilities/weights to capture the cumulative effects of discriminating linguistic evidence.


Appendix A: Fitness distribution in a three-parameter space

Gibson & Wexler (: table ) considered the variations of degree-0 sentences within three parameters: Spec-Head, Comp-Head, and V2. For simplicity, we do not consider double objects.
A principled way to estimate the probability of a string wn =
w, w
...
A space of three parameters, or eight grammars, and the string patterns
they allow
Language

Spec-Head

Comp-Head

V

degree- sentences

VOS–V







VOS+V







SVO–V







SVO+V







OVS–V







OVS+V







SOV–V







SOV+V







VS VOS AVS AVOS
XVS XVOS XAVOS
SV SVO OVS SV SAVO OAVS
XVS XVOS XAVS XAVOS
SV SVO SAV SAVO
XSV XSVO XSAV XSAVO
SV SVO OVS SAV SAVO OASV
XVS XVSO XASV XASVO
VS OVS VAS OVAS
XVS XOVS XVAS XOVAS
SV OVS SVO SAV SAOV OAVS
XVS XVOS XAVS XAOVS
SV SOV SVA SOVA
XSOV XSVA XSOVA
SV SVO OVS SAV SAOV OASV
XVS XVSO XASV XASOV

The probability of a string wn = w1w2 ... wn can be computed via the chain rule:

()  p(wn) = p(w1)p(w2|w1)p(w3|w1w2) ... p(wn|w1 ... wn−1)

For example, if w1 = S, w2 = V, and w3 = O, then p(SVO) = p(S)p(V|S)p(O|SV). Presumably p(V|S) = 1: every sentence has a verb (including auxiliary verbs). When n gets large, the conditional probabilities get complicated, as substrings of w1 ... wn are dependent. But again there is independence to be exploited; for example, the verb-to-Tense raising parameter is conditioned only upon the presence of a negation or adverb, and nothing else. Nor does it seem unreasonable to assume, say, that the frequencies of transitive verbs are more or less uniform across languages, because transitive verbs are used in certain life contexts, which perhaps do not vary greatly across languages.

Furthermore, some grammars, i.e. parameter settings, may not be attested in the world; for these we estimate the relevant statistics by extrapolating from the grammars for which we do have some statistical results. Roughly % of all sentences contain an auxiliary, and % of verbs are transitive. We then obtain estimates such as:

()  P(XSV) = P(XSVO) = P(SAV) = P(XSAVO)

() will be carried over to the other three non-V2 grammars, and assigned to their respective canonical word orders. In addition, we must consider the effect of V2: raising S, O, or X to the sentence-initial position. These probability masses will be distributed among the canonical patterns.


3
Rules over Words
Fuck these irregular verbs
...


The acquisition of English past tense has generated much interest and controversy in cognitive science, often pitched as a clash between generative linguistics and connectionism (Rumelhart & McClelland ), or even between rationalism and empiricism (Pinker ). Yet this is not to say that the problem of English past tense is trivial or uninteresting. We show that the variational learning model, instantiated here as competition among phonological rules (rather than grammars/parameters, as in the case of syntactic acquisition), provides a new understanding of how phonology is organized and learned.
3.1 Background
Our problem primarily concerns three systematic patterns in children's acquisition of past tense. First, children by and large use irregular verbs correctly. Second, young children sometimes overregularize: for example, they produce take-taked instead of take-took, where the suffix -d for regular verbs is used for an irregular verb. Third, errors such as bring-brang and wipe-wope, mis-irregularization errors where children misapply and overapply irregular past tense forms, are exceedingly rare.

One leading approach to the problem of past tense, following the influential work of Rumelhart and McClelland (), claims that the systematic patterns noted above emerge from the statistical properties of the input data presented to connectionist networks. Such models have met with serious criticism. To give just one example (from Prasada & Pinker ), connectionist models have difficulty with the Wug-test, the hallmark of past tense knowledge.

In this chapter, we will critically assess another leading approach to the problem of past tense, the Words and Rules (WR) model developed by Pinker and his associates (Pinker , ). In the WR model, irregular past tense forms are stored in and retrieved from the lexicon as associated word pairs; in the ‘rule’ component, following the tradition of generative linguistics, regular verbs are inflected by making use of a default phonological rule, which adds -d to the root (stem). Equally important to the WR model is the Blocking Principle, a traditional idea dating back to Panini. Applied to past tense formation, the Blocking Principle has the effect of forcing the use of a more specific form over a more general form: for example, sang is a more specific realization of the past tense of sing than singed, and is therefore used. In the WR model, the strength of the verb–past-tense association is conditioned upon the frequencies of irregular verbs that children hear; thus, memorization of irregular verbs takes time and experience to be perfected, and before memory traces are sufficiently strong, the default rule may apply. This accounts for the second salient pattern of past tense acquisition: overregularization errors in child language.
We will propose an alternative, the Rules and Competition (RC) model. The RC model treats both irregular and regular verbs within a single component of the cognitive system: generative phonology. In contrast to the WR model, we claim that irregular past tense is also formed by phonological rules.

The RC model derives from the variational approach to language acquisition, which holds that systematic errors in child language are reflections of coexisting hypotheses in competition. For the problem of past tense, the hypothesis space for each irregular verb x includes an irregular rule R, defined over a verb class S of which x is a member. The acquisition of x involves a process of competition between R and the default -d rule, the latter of which could in principle apply to all verbs, regular and irregular. Before learning is complete, the default rule will be probabilistically accessed, leading to overregularization errors.
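The competition for a single irregular verb can be sketched as follows. The class rule, its weight, and the orthographic "readjustment" are simplifications invented for illustration; real rules operate on phonological representations:

```python
import random

def past_tense(verb, class_rule, rule_weight, rng=random):
    """RC-model sketch: with probability rule_weight apply the verb's
    irregular class rule; otherwise the default -d rule applies, which
    for an irregular verb surfaces as an overregularization error.
    """
    if rng.random() < rule_weight:
        return class_rule(verb)
    return verb + "ed"                     # the default -d rule

# toy readjustment for the sing/ring class: i -> a, a spelling stand-in
# for the vowel-changing readjustment rule
sing_class = lambda v: v.replace("i", "a", 1)
```

At rule weight 1.0 the learner reliably produces sang; at intermediate weights it probabilistically produces either sang or the overregularized singed, matching the error pattern described above.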
Section  presents the RC model in detail, including a description of the past tense formation rules in the computational system and a learning algorithm that specifies how rules compete. Section  tests the model against children's production data: specifically, we show that children's performance on an irregular verb strongly correlates with the weight of its corresponding phonological rule, which explains a number of class-based patterns in the acquisition of irregular verbs. Sections  and  then review the arguments that have been advanced in support of the WR model; we show that each of them is either empirically flawed or can be accommodated equally well in the RC model.
3.2 A model of rule competition
A central question for a theory of past tense formation, and consequently for a theory of past tense acquisition, is the following: should the -d rule be considered together with the inflection of the irregulars as an integrated computational system, or should they be treated by different modules of cognition? The approach advocated here is rooted in the first tradition, along the lines pursued in Chomsky & Halle (), Halle & Mohanan (), and present-day Distributed Morphology (Halle & Marantz ). In an intermediate class of cases, exemplified by verbs like sing-sang or bind-bound, the changes affect only a specific number of verbs. The grammar does not memorize each of these past tense forms separately; rather, it contains a few rules, each of which determines the stem vowels of a list of verbs specifically marked to undergo the rule in question.


3.2.1 A simple learning task
Before diving into the details of our model, let's consider a simple learning task, which may help the reader understand the core issues at a conceptual level. Suppose a learner is presented with pairs of numbers (x, y). One strategy is rote memorization: the learner stores in its memory a list of pairs, as is. Notice, however, that the pairs may contain regularities between the paired numbers that can be formulated as rules—say, y = x + 1 for one set of x values and y = x for another—so that a second strategy is to associate each set of numbers with the rule that applies to it:

() a. {…} → Rx+1
   b. {…} → Rx

The WR model employs the first strategy: irregular verbs are memorized by rote as associated pairs such as feed-fed, bring-brought, shoot-shot, think-thought. The RC model employs the second strategy: verbs are associated with the rules that inflect them, e.g. {feed, shoot, …} with a vowel-shortening rule, and {bring, think, …} with the rule that yields -ought forms.

In an information-theoretic sense, the rule-based strategy, which allows a more ‘compact’ description of the data, is the more efficient one. Furthermore, there is reason to believe that the rule-based strategy is preferred when verbs (rather than numbers) are involved. Irregular past tense rules are often well-motivated phonological processes that are abundantly attested in the language; therefore, such rules are frequently encountered by, and naturally available to, the learner. In what follows, we will describe the properties of the phonological rules for past tense, and how they compete in the process of learning.
...
2
...
This, along with the issue of irregular phonology in other languages,
will be discussed in section 
...
The change in the quality of the vowel actually involves shortening as well as lowering. Readjustment rules, mostly vowel-changing processes, further alter the phonological structure of the stem.

The default rule for English verb past tense is given in ():

() The default -d rule:
    x → x + -d

Irregular verbs fall into a number of classes as they undergo identical or similar suffixation and readjustment processes. Such a rule is schematically shown in (), while the rule system for the most common irregular verbs is given in Appendix B.

() x → y where x ∈ S = {x1, x2, ...}

For example, the verb class consisting of lose, deal, feel, keep, sleep, etc. is defined by the rule [-t & Vowel Shortening]. Suffixation and readjustment rules are generally independent of each other, and are in fact acquired separately.
() a. [ay]–[I]: divine-divinity
    b. ...
    c. [e]–[æ]: nation-national
    d. ...
    e. [u]–[ʌ]: deduce-deduction


See Halle & Marantz () for arguments that the -ø (null) morpheme is ‘real’.
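The class-based architecture just described can be rendered as a small sketch. The string-level “readjustments” and the tiny class inventory below are crude orthographic stand-ins of my own, not the actual phonological rules, which operate on phonological rather than spelling representations:

```python
# Each irregular class pairs a suffix with a readjustment process;
# class membership is a stored list, as in the rule schema above.
CLASSES = {
    "-t & Vowel Shortening": (["lose", "keep", "sleep"], "t",
                              lambda s: s.replace("ee", "e").replace("ose", "os")),
    "-ø & No Change": (["hit", "cut", "put"], "", lambda s: s),
}

def past(verb):
    """Apply the verb's class rule if it has one; otherwise the default -d."""
    for members, suffix, readjust in CLASSES.values():
        if verb in members:
            return readjust(verb) + suffix
    return verb + "ed"   # the default -d rule

print(past("keep"))  # kept
print(past("cut"))   # cut
print(past("walk"))  # walked
```

The point is only the architecture: look up class membership, apply the class’s suffixation and readjustment, and fall back on the default -d otherwise.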

Now the conceptual similarities and differences between the WR model and the RC model ought to be clear. Every theory must have some memory component for irregular verbs: irregularity, by definition, is unpredictable and hence must be memorized, somehow. In the WR model, irregular verbs and their past tense forms are stored as simple associated pairs, and learning is a matter of strengthening their connections.

Once a rule system such as () is situated in a model of learning, a number of important questions immediately arise:

() a. Where do rules such as suffixation and readjustment come from?
    b. How does the learner determine the default rule (-d)?
    c. How does the learner know which class a verb belongs to?
    d. How do the rules apply to generate past tense verbs?

We postpone (c) and (d) until section .

For our purposes, we will simply assume that the relevant rules for past tense formation, both the default and the irregular, are available to the child from very early on.

Ablaut and umlaut only designate the direction of vowel shifting, e.g. front → back, but leave other articulatory positions, e.g. [± high/low], unspecified. We will return to the issue of class homogeneity in section .


F I G U R E . The rule-based organization of irregular verbs: verbs V1–V6 are associated with the rules R1–R4 that define their classes.
The justification of our assumption is threefold. Recall that their past tense is very good (% correct), and all their errors result from using a wrong rule: almost always the default, very rarely a wrong irregular rule. This suggests that knowledge of the rules must be present.

Second, there is strong crosslinguistic evidence that children’s inflectional morphology is in general close to perfect; see Phillips () for a review. Clahsen & Penke () had similar findings in a German child during the period of ; to ;: the correct use of the affixes -st (2nd singular) and -t (3rd singular) is consistently above %. And interestingly, when children’s morphology occasionally deviates from adult forms, the errors are overwhelmingly of omission, i.e. the use of a default form, rather than substitution, i.e. the use of an incorrect form. To acquire the inflectional morphologies in these languages, the learner must be able to extract the suffixes that correspond to the relevant syntactic/semantic features, and master the readjustment rules and processes when combining stems and suffixes.

Finally, recent work in computational modeling of phonological acquisition proposed by Yip & Sussman (, ) and extended by Molnar () suggests not only that these rules can be learned very rapidly under psychologically plausible assumptions but that they are learnable by precisely the principle of storage minimization. It learns with far greater efficiency and accuracy than every computational model proposed to date, including MacWhinney & Leinbach (), Ling & Marinov (), and Mooney & Califf ().

The rapid learning of rules in the Yip–Sussman model is consistent with the observation that children’s knowledge of inflectional morphology is virtually perfect. In section , we lay out the RC model that explains what remains problematic over an extended period of time: the application of these rules.
2
...
First, we assume, uncontroversially, that children are able to pair a root with its past tense: for
example, when sat is heard, the learner is able to deduce from the
meaning of the sentence that sat is the past tense realization of the
root sit
...

 For a review that very young children can perform morphological analysis of word
structures, see Clark ()
...
However, empirical evidence strongly speaks against this possibility. Misapplication of irregular rules such as bring-brang, trick-truck, wipe-wope, dubbed ‘weird past tense forms’ by Xu & Pinker (), is exceedingly rare: about % of all verb uses. The rarity of weird past tense forms suggests that the child is conservative in learning verb class membership: without seeing evidence that a verb is irregular, the child generally assumes that it is regular, instead of postulating class membership arbitrarily. Write P(x ∈ S) for the probability that the learner correctly places x into the verb class S. These frequencies can be estimated from adult-to-child corpora such as CHILDES.

A central feature of the RC model is that rule application is not absolute. For example, when the child tries to inflect sing, the irregular rule [-ø & ablaut], which would produce sang, may apply with a probability that might be less than 1. If R is probabilistically bypassed, the -d rule applies as the default.

The present model should not be confused with a suggestion in Pinker & Prince (), which has an altogether different conception of ‘competition’.
The Blocking Principle states that when two rules or lexical items are available to realize a certain set of morphophonological features, the more specific one wins out. Call this version of the Blocking Principle the Absolute Blocking Principle (ABP). Under the present model, by contrast, a more specific rule can be skipped in favor of a more general rule.

An irregular rule R, defined over the verb class S, applies with probability PR once a member of S is encountered. The acquisition of irregular verb past tense proceeds as the algorithm shown in Fig. .
When presented with a verb in past tense (Xpast), the learner proceeds as in Fig. .

Pinker & Prince suggest, much like the present model, that irregular verbs are dealt with by irregular rules (although this is not the position they eventually adopt). Irregular rules may compete to apply to the verb V. Under Pinker & Prince’s suggestion, when the appropriate irregular rule loses out, another irregular rule will apply.


F I G U R E . The RC learning model: the input Xpast undergoes root extraction, yielding X; rule selection associates X with its class S with probability P(X∈S), otherwise (1 – P(X∈S)) X-ed is produced; rule competition then applies R with probability PR to yield XIrregular, otherwise (1 – PR) X-ed; the output is matched against Xpast and the weights are updated.
As illustrated in Fig. , learning an irregular verb x involves the two-step procedure in ():

() a. Selection: associate x to the corresponding class S and hence the rule R defined over this class.
    b. Competition: apply R to x over the default rule.

Either step may fail. First, the learner may not reliably associate x to S, in which case x would be treated as a regular verb (recall that it is virtually impossible for an irregular verb to be misclassified). Second, even if x’s class membership S is correctly established, the corresponding rule R may not apply: rather, in (b), R applies with the probability PR, its weight.
When either of the two steps fails, the overregularized form will be produced, resulting in a mismatch with the input form, Xpast. Learning is successful when ∀x, P(x ∈ S)PR = 1: the learner can reliably associate an irregular verb with its matching irregular rule, and reliably apply the rule over the default -d rule. Many models for updating probabilities (weights) are in principle applicable.
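The two-step procedure can be simulated directly. The sketch below is an illustrative implementation of my own, not the book’s specific choice: it assumes a simple reward-only update (a fraction gamma of the remaining probability is added after a match), and the starting weights and learning rate are made-up values.

```python
import random

random.seed(1)

p_class = 0.5   # P(x in S): probability of associating the verb with its class
p_rule  = 0.5   # PR: probability that the class rule wins over the default -d
gamma   = 0.1   # assumed learning rate (reward-only update)

def present():
    """One episode: hear the irregular past tense, try to generate it."""
    global p_class, p_rule
    selected = random.random() < p_class               # step (a): selection
    applied  = selected and random.random() < p_rule   # step (b): competition
    if applied:                       # output matches the input form
        p_class += gamma * (1 - p_class)   # reward both weights
        p_rule  += gamma * (1 - p_rule)
    # otherwise the child overregularizes: a mismatch, no reward

for _ in range(500):   # 500 past tense tokens of the verb
    present()

print(round(p_class, 2), round(p_rule, 2))  # both weights approach 1
```

With only reward on matches, both probabilities climb toward 1, at which point the child reliably selects and applies the irregular rule.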

Under the null hypothesis, we assume that the grammar system the child uses for production is the same one he uses for comprehension/learning, the two-step procedures in ().

The RC model makes direct and quantitative predictions about the performance of both irregular verbs and irregular verb classes. While P(x ∈ S) may increase when the past tense of x is encountered, PR may increase whenever any member of S is encountered. Hence, if we hold fx or fS constant, the RC model makes two predictions about the performance of irregular verbs:

() a. ...
    b. ...

In section 
...


3.4 The Absolute and Stochastic Blocking Principles

We now give justifications for the Stochastic Blocking Principle (SBP), fundamental to the RC model.
The ABP is central to the WR model: when it is presupposed, the rote memorization of irregular verbs is virtually forced. The WR model accounts for this by claiming that irregular verbs are individually memorized. The memory imprints of irregular verbs in a child’s mind are not as strong as those in an adult’s mind, for children have not seen irregular verbs as many times as adults.

Pinker (: ) justifies the ABP by arguing that it is part of
the innate endowment of linguistic knowledge, for it cannot be
deduced from its effect
...
First, to learn
the ABP, the child must somehow know that forms like singed are
ungrammatical
...
Finally, Pinker
claims that to know singed is ungrammatical ‘is to use it and to be
corrected, or to get some other negative feedback signals from
adults like disapproval, a puzzled look, or a non sequitur
response’
...
g
...

It is not the logic of this argument that we are challenging; rather, it is the premise that the blocking effect of a more specific form over a more general form is absolute.

Suppose that, initially, for the verb sing, the irregular rule R = [-ø & ablaut] and the default -d rule are undifferentiated. However, only when R is selected can a match result, which in turn increases its weight (probability), PR. The end product of such a competition process is a rule system that appears to obey the ABP but does not presuppose it: while the specific rule has priority—just as in the ABP—this preference is probabilistic, and gradually increases as a result of learning from experience.

If the effect of the ABP can be duplicated by rule competition and statistical learning, its theoretical status needs to be reconsidered. There is at least one good reason to reject the ABP: the presence of ‘doublets’.

For doublets, the ABP cannot be literally true, for otherwise learned and dived should never be possible, blocked by the more specific learnt and dove. The term ‘expected’ is important here, implying that the learner has indeed seen irregular forms of x before, but is now being confronted with conflicting evidence.
 As


Including no less a literary genius than Lewis Carroll
...
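Doublets are just what reward-penalty competition predicts: with conflicting evidence in the input, PR settles at an intermediate value rather than being driven to 0 or 1. A toy run (the input mix, update scheme, and rates are all invented for illustration):

```python
import random

random.seed(7)

p_rule = 0.5   # PR for the irregular rule yielding "learnt"
gamma  = 0.02  # assumed learning rate

# Invented input mix for a doublet verb: 70% learnt, 30% learned.
for _ in range(5000):
    if random.random() < 0.7:
        p_rule += gamma * (1 - p_rule)   # heard learnt: reward the rule
    else:
        p_rule -= gamma * p_rule         # heard learned: penalize it

print(round(p_rule, 2))  # hovers at an intermediate value, near the input share
```

Under the ABP, by contrast, any stable irregular entry should categorically block learned, so no mixture in the input could yield free variation.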


3
...
rules in overregularization
In this section we examine children’s overregularization data in detail.


3
...
1 The mechanics of the WR model
In order to contrast the RC model with the WR model, we must be explicit about how the WR model works and what predictions it makes. It is not clear how predictions can be made with this level of clarity under the WR model. However, the closest to a clear statement that we can find in the WR literature is still vague:

It is not clear exactly what kind of associative memory fosters just the kinds of analogies that speakers are fond of. Here we can only present a rough sketch.

‘Well, I can’t show it you myself,’ the Mock Turtle said: ‘I’m too stiff.’


units. Links between stems and pasts would be set up during learning between their representations at two levels: between the token representations of each pair member, and their type representations at the level of representation that is ordinarily accessed by morphology: syllables, onsets, rhymes, feet (specifically, the structures manipulated in reduplicative and templatic systems, as shown in the ongoing work of McCarthy and Prince and others). On occasions where token-token links are noisy or inaccessible and retrieval fails, the type-type links would yield an output that has some probability of being correct, and some probability of being an analogical extension (e.g. brang). (Pinker & Prince : )

It is difficult to evaluate statements like these. However, it is not clear how the type-level linkings between phonological structures (syllables, onsets, etc.) are formed. But far worse is the vagueness concerning how the two levels interact. Such imprecise formulations are not amenable to analytical results such as (). The data clearly point to an organization of irregular verbs by rules and classes.



3
...
2 The data
The measure of children’s knowledge of irregular verbs is the correct usage rate (CUR), C(x), defined as follows:

() C(x) = (total number of correct past tenses of x) / (total number of past tenses of x)

Our data on child performance come from the monograph Overregularization in Language Acquisition (Marcus et al. ). The input frequencies of irregular verbs are determined by the present author, based on more than , adult sentences to which Adam, Eve, Sarah, and Abe were exposed during the recording sessions. The overregularization rates, computed from Marcus et al. (: tables A–A), are given in ():

() a. Adam: / = %
    b. ...
    c. Sarah: / = %
    d. ...

It is clear that there is quite a bit of individual variation among the children.
Of particular interest is the verb class [-ø & Rime → u], which includes verbs such as know, grow, blow, fly, and throw. The CURs are / = % (Adam), / = % (Eve), / = % (Sarah), and / = % (Abe). We will explain this peculiar pattern in section .

For example, the past tense of no-change irregular verbs can only be accurately identified from the conversation context.

Hence, C(x) for the WR model is correlated with the frequency of x in past tense form, fx. In contrast, C(x) in the RC model is correlated with fx × ∑m∈S fm.
3
...


To test this prediction, we have listed some verbs grouped by class in (), along with their input frequencies estimated from adult speech. Also, to minimize sampling effect, only verbs that were used by children at least twenty times are included in our study (Appendix C gives a complete list of irregular verbs with their frequencies):
()

Verbs grouped by class
a
...
%)
leave (/=
...
g
...
Ambiguities that arise between past tense and
present tense (e
...
hit), past participles (e
...
brought, lost), nouns (e
...
shot), and adjectives
(e
...
left) were eliminated by manually combing through the sentences in which they
occurred
...


b
...
%)
think (/=
...
%)
buy (/=
...
[-ø & No Change]
put (/=
...
%)
hurt (/=
...
%)
d
...
%)
bite (/=
...
[-ø & Backing ablaut]
get (/=
...
%)
write (/=
...
%)
f
...
%)
throw (/=
...
 The
‘exception’ in class (b), where think, a more frequent verb than
catch, is used at a lower CUR, is only apparent
...
Adam, Eve, & Sarah
b
...
% (/)

...
% (/)

The low averaged CUR of think in (b) is due to a disproportionately large number of uses from Abe
...

 The strong frequency–CUR correlation in the class [-ø & Backing ablaut] might
not be taken at face value
...
See also n
...
This unequivocally points to the conclusion that irregular verbs are organized in (rule-defined) classes. In fact, the frequency–overregularization correlation is also considered by Marcus et al. ()—significant, but far from perfect. The frequency–performance correlation almost completely breaks down when verbs from different classes are considered.


3
...
4 The free-rider effect
Recall that the RC model predicts:

() For two verbs x1 and x2 such that x1 ∈ S1, x2 ∈ S2, and fx1 = fx2, C(x1) > C(x2) if fS1 > fS2.

This ‘free ride’ is made possible by the rule shared by all members of a class.
(We postpone the discussion of bite and shoot to section .) We have also included blew, grew, flew, and drew, which appeared , , , and  times respectively, and belong to the [-ø & Rime → u] class that is problematic for all four children.

a. [-ø & No Change]: hurt, cut
b. [-ø & Rime → u]: draw, blow, grow, fly

This is mysterious under the WR model.
Verb class
Verb (frequency)
[-ø & No Change]
hurt (), cut ()
b
...
% (/)

...
Again, it is not clear how the WR model accounts for this. The first rule applies to the verbs hurt and cut, which do not change in past tense forms. Every occurrence of such verbs increases the weight of the class rule. In contrast, verbs in (b) belong to the [-ø & Rime → u] class (blow, grow, know, throw, draw, and fly), which totals only  occurrences in the input sample.
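The asymmetry can be seen in a toy calculation: P(x ∈ S) is driven by the verb’s own tokens, while PR is driven by every token of the class. The frequencies and growth curves below are invented for illustration:

```python
# Toy free-rider calculation: a verb's predicted performance is
# P(x in S) * PR, where the first factor grows with the verb's own
# frequency and the second with the whole class's frequency.
def performance(f_verb, f_class, rate=0.01):
    p_class_assoc = 1 - (1 - rate) ** f_verb    # driven by the verb itself
    p_rule        = 1 - (1 - rate) ** f_class   # driven by all class members
    return p_class_assoc * p_rule               # predicted C(x)

# Same verb frequency (20 tokens), classes of very different total size:
print(round(performance(20, 500), 2))  # rare verb, frequent class
print(round(performance(20, 40), 2))   # rare verb, rare class
```

Hurt and cut ride on the large [-ø & No Change] class in just this way, while blow and grow cannot, given the rarity of the [-ø & Rime → u] class.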

A closer look at Abe’s performance, which is markedly poor
across all verb classes, reveals an even more troubling pattern for
the WR model
...
 (/)

...
However, for
the low-frequency verbs in (a), Abe has an average CUR of
0
...
: table A): in fact better than went
and came
...
Despite their relatively high frequencies, go-went
and come-came nevertheless ‘act alone’, for they are in trivial
classes
...
Come-came
belongs to the heterogeneous class [-ø & umlaut], which in fact
consists of three subclasses with distinct sound changes: fall and
befall, hold and behold, and come and become
...


3
...
5 The effect of phonological regularity: Vowel
Shortening
Consider the following two low-frequency verbs: shoot and bite,
whose past tense forms appeared only  and  times respectively
in more than , adult sentences
...
% (/)—again in sharp contrast with
the performance (
...

Past tense formation for both shoot and bite falls under the rule [-ø & Vowel Shortening]. Vowel Shortening is a pervasive feature of the English language.


perspectives, that Vowel Shortening is essentially free: vowels in closed syllables are automatically shortened under suffixation, resulting from the interaction between universal phonological constraints and language-specific syllabification properties. And children are very good at learning suffixes, as we saw when reviewing their agreement morphology acquisition in section .
() a. [-t]: lose-lost (), leave-left ()
    b. [-ø]: shoot-shot (), bite-bit ()

All verbs in () are used very well, almost irrespective of their individual frequencies, ranging from very frequent ones (say-said) to very rare ones (shoot-shot, bite-bit).


3
...
4
...
has identified a major problem with the WR model
...


Perhaps the notion of analogy, built on phonological similarity
(of some sort), may duplicate the effect of rules without explicitly
assuming them
...

Consider Pinker’s discussion on analogy:
Analogy plays a clear role in language
...
They also
find it easier to memorize irregular verbs when they are similar to other irregular verbs
...
(Pinker : )

As an example, Pinker goes on to suggest that rhyme may play
a role in pattern association and memorization
...
The bite-bote type
error results from the occasional misuse of the rhyme analogy
...

In sections 
...
 we have compared children’s performance on several low-frequency verbs
...

However, note that the only irregular verb that bite-bit rhymes
with is light-lit, which appeared only once in the more than
, adult sentences sampled
...
If Pinker were correct
in suggesting that rhyme helps irregular verb memorization, one
would expect that drew, grew, threw, and knew, which rhyme with
each other and thus help each other in memorization, would have
higher retrieval success than shot and bit, which get help from no
one
...

Could some different forms of analogy (other than rhyme)
work so that the WR model can be salvaged? One cannot answer
this question unless a precise proposal is made
...
g
...
Here
there is a methodological point to be made
...

Furthermore, the goal of modern cognitive science is to understand and model mental functions in precise terms
...
 will simply
escape attention: they are revealed only under scrutiny of the
empirical data guided by a concrete theoretical model proposed
here
...
Rather, it simply reflects the probabilistic associations
between words and rules, and the probabilistic competitions
among rules, as the RC model demonstrates
...
But analogy works only when the sound similarities among verbs under
identical rules/classes are strong enough and the sound similarities among verbs under different rules/classes are weak enough
...
For example, verbs in the [-ø & No Change] class,
such as hit, slit, split, quit, and bid, are very similar to those in the
[=ø & Lowering ablaut] class, such as sit and spit, yet the two
groups are distinct
...


Or, consider the free-rider effect discussed in section 
...
In order for the WR model to capture the
free-rider effect with analogy, the ‘family resemblance’ among
verbs of all frequencies must be very strong
...

For example, think may be analogized to sing and ring to yield
thank or thunk
...
% in all verb
uses are analogical errors (Xu & Pinker )
...
To take an example from Marcus et al
...
The authors convincingly argue, using a sort of Wug-test with novel German nouns, that despite its low frequency, the -s is the default plural suffix. However, it is hard to imagine that German speakers memorize all four classes of irregular plurals—the majority of nouns in the language—on a word-by-word basis, as if each were entirely different from the others.

Furthermore, it is the partial similarity among English irregular
verbs that led Pinker and his colleagues to look for family
resemblance: four irregular classes of German noun plurals do
not show any systematic similarity
...
It seems that German learners must sort each irregular noun into its proper class, as suggested by the traditional
rule-based view
...

 Which seems no more than a historical accident: see section 
...
These languages typically have
very long ‘words’ built out of many morphemes, each of which
expresses an individual meaning and all of which are glued
together by both the morphophonological and the syntactic
systems of the language
...

This is not to say that analogy plays no role in learning
...
 However, the
role analogy plays in learning must be highly marginal—precisely
as marginal as the rarity of analogy errors, 
...
As for an
overall theory of past tense, it is important to realize, as Pinker &
Prince (: , italics original) remark, that ‘a theory that can
only account for errorful or immature performance, with no
account of why the errors are errors or how children mature into
adults, is of limited value’
...


3
...
2 Partial regularity and history
Before moving on, let us consider a major objection of proponents of the WR model to the rule-based approach
...
In addition, brang is even acceptable to
some speakers
...
g
...

This again suggests that analogy is a very weak influence
...
g
...
Pinker’s
explanation is again based on family resemblance, the sort of fuzzy
associations borrowed from connectionist networks
...
But this reasoning seems circular: why are these
verbs pulled into similarity-based families? As far as one can tell,
because they sound similar
...
Nowhere does
the WR model specify how fuzzy family resemblance actually
works to prevent thunk and blunk from being formed
...

In the RC model, verb classes are defined by rules such as (),
repeated below:
() Rule R for verb class S:
    x → y where x ∈ S = {x1, x2, x3, ...}

One can imagine another kind of rule that is defined in terms of input, where the past tense of the verb is entirely predictable from the stem:

() Rule R for verb class S:
    x → y where x has property S

In present-day English, rules like () are full of exceptions, at
least in the domain of the past tense
...
Even the suppletive verbs,

which may seem arbitrary synchronically, are not necessarily accidents diachronically
...
However, go did retain the past
tense form, went, which belongs to the more regular class that also
includes bend and send
...

How did such (partial) regularities get lost in history? There are
two main factors; see Pinker (: ch
...

One is purely frequency-based
...
We will return to this in section 
...
See
Pinker (: ) for the history of the now archaic wrought
...
The reader is referred to Yang
() for a formal model of sound change based on the RC
model of learning, and for a detailed discussion of these issues
...
5 Some purported evidence for the WR model
Pinker () summarizes previous work on the WR model and
gives ten arguments in its support
...


3
...
1 Error rate
How low is it?
Pinker claims that the rate of past tense errors is quite low: the
mean rate across twenty-five children is 
...
He suggests that this low rate indicates that overregularization is ‘the exception, not the rule, representing the occasional
breakdown of a system that is built to suppress the error’, as in the
WR model
...
In (), we saw that the error rate averaged over four
children is 
...
Also, as is clear from
Marcus et al
...
 He even made
a considerable number of errors (/=%) in go-goed, while
all other children used went perfectly throughout
...
) are lost
...

Longitudinal trends
Pinker claims that the rate of overregularization, 
...
He concludes that the steady error rate is due to the
occasional malfunction of memory retrieval—the exception, not
the rule
...
First, it seems
that Adam is the exception, rather than the rule
...
Second, as already noted in section

...
g
...



See Maratsos () for a discussion of Abe, in particular why the large set of data
from Abe must be taken as seriously as those from other children
...
To study Abe’s longitudinal development, we have
grouped every consecutive fifteen recordings into a period
...
We have examined verbs that Abe was particularly bad at: go, eat, fall, think, come, catch, run, and the members of the problematic [-ø & Rime → u] class: throw, grow, know, draw, blow, and fly.

With the exception of period , in which Abe only had eighteen
opportunities to overregularize (and there was thus a likely
sampling effect), his error rate is gradually declining
...


3
...
2 The role of input frequency
Pinker notes that the more frequently an irregular verb is heard,
the better the memory retrieval for that verb gets, and the lower
the overregularization rate
...
Abe’s longitudinal overregularization for problematic verbs
[Table: Period / No. used / Error rate]

within a class (section 
...
The performance of an irregular verb is determined by two factors: the
correct identification of class membership, and the weight of the
irregular rule (see sections 
...


3
...
3 The postulation of the -d rule
In the stage which Pinker calls phase  (from ; to shortly before
;), Adam left many regular verbs unmarked: instead of saying
Yesterday John walked, the child would say Yesterday John walk
...
Pinker suggests that the two
phases are separated by the postulation of the -d rule
...

First, individual variations
...
However, on Marcus et al
...
Eve’s use
of regular verbs was basically in a steady climb from the outset
(;)
...
Abe,
whose irregular verbs were marked poorly, nevertheless showed
the highest rate of regular verb marking: he started out with about
% of regular verb marking at ;, rising to % around ;
...
Children learning some but not all languages (including English) go through a stage in which they produce a large number of nonfinite as well as finite verbs in matrix sentences.

Consider an alternative explanation of the rapid increase
Pinker noted in the use of inflected verbs
...
However, during the OI stage, the -d
rule, which applies to past tense verbs, simply does not apply to
the extensively used nonfinite verbs that are allowed by an OI
stage competence system
...

A good test that may distinguish this position from Pinker’s is
to turn to a language for which the OI stage does not exist, so that
OI is not a confounding factor
...
If the alternative view, that the -d rule is available from early on, is correct, we predict that in the acquisition of
Italian and Spanish, irregular verbs ought to be overregularized
from early on
...
So far we have not checked this
prediction
...
5
...
He cites Adam’s performance for support
...
 during phase ,
and 0
...
There appears to be no ‘real regression,
backsliding, or radical reorganization’ (: ) in Adam’s irregular verb use
...

Gradual improvement is also predicted by the RC model, as
weights for class membership and irregular rules can only
increase
...



The gradual improvement in Adam’s performance seems to contradict Pinker’s
earlier claim that Adam’s error rate is stable (section 
...
5
...
Children are found to call
overregularized verbs silly at above chance level
...

Pinker correctly points out some caveats with such experiments: a child’s response might be affected by many factors, and
thus is not very reliable
...
In
fact, such findings are compatible with any model in which children produce more correct forms than overregularizations at the
time when judgements were elicited
...
5
...
The children are not amused
...

Parent: Mommy goed to the store?
Child: NO! (annoyed) Daddy, I say it that way, not you
...

Whether anecdotal evidence should be taken seriously is of
course a concern
...
In
any case, the RC model gives a more direct explanation for
observed reactions
...
Now if an overregularized form such as goed is repeated several times, the chance of a mismatch (i.e. the child generating went) is consequently enhanced—the probability of generating went at least once in several consecutive tries—much to children’s annoyance, it appears.
...
Pinker claims that the rarity
entails that adult overregularization is the result of performance,
not the result of a grammatical system
...
Under the RC
model, for an irregular verb (e
...
smite-smote) that appears very
sparsely, the learner may not be sure which class it belongs to, i
...

the probability of class membership association is considerably
below 
...

Pinker also notes that since memory fades when people get
older, more overregularization patterns have been observed
during experiments with older people (Ullman et al
...


3
...
8 Indecisive verbs
Adults are unsure about the past tense of certain verbs that they
hear infrequently
...
As noted in section

...


Pinker links input frequency to the success of irregular past
tense (memory imprint)
...


3
...
9 Irregulars over time
Pinker cites Joan Bybee’s work showing that, of the  irregular
verbs during the time of Old English,  are still irregular in
Modern English, with the other  lost to the +ed rule
...
The more frequently used
irregulars are retained
...
Suppose
that for generation n, all  irregular verbs had irregular past tense
forms, but some of them are very infrequently used
...
, and
will regularize them sometimes
...

Eventually, when the irregular forms drop into nonexistence, such
verbs will have lost their irregular past tense forever
...
See Yang () for a model that formalizes
this process
...
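This drift can be illustrated with a toy iterated-learning loop. The verbs, per-generation frequencies, and acquisition curve below are all invented; see Yang () for the actual model:

```python
import random

random.seed(3)

# Tokens of each irregular past heard per generation (invented figures).
freqs = {"went": 200, "sang": 100, "throve": 2}
irregular = set(freqs)

for _ in range(10):                      # ten generations
    for verb, f in freqs.items():
        if verb in irregular:
            # Chance that this generation gets enough evidence to keep
            # the irregular form; rare forms often fail and regularize.
            acquired = 1 - (1 - 0.05) ** f
            if random.random() > acquired:
                irregular.discard(verb)  # lost for all later generations

print(sorted(irregular))  # low-frequency irregulars tend to drop out
```

Once a form is lost, no later generation can recover it from the input, which is why the losses accumulate in one direction over time.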
He reasons that this pattern is predicted, since the survival of irregular verbs against children's and adults' overregularization is only ensured by high frequency of use.
3.6 Conclusion
We have proposed a rule competition model for the acquisition of past tense in English. The learning of an irregular verb is determined by the probability with which the verb is associated with the corresponding irregular rule, as well as the probability of that rule applying over the default -d rule.
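The two probabilities combine multiplicatively, which can be sketched directly (the values below are illustrative, not the book's estimates):

```python
# An irregular form is produced only if the verb is associated with its
# irregular class AND that rule applies over the default -d rule.

def p_irregular_form(p_class: float, p_rule: float) -> float:
    return p_class * p_rule

def p_overregularization(p_class: float, p_rule: float) -> float:
    # the remaining probability mass surfaces with "-d", e.g. "goed"
    return 1 - p_irregular_form(p_class, p_rule)

print(round(p_overregularization(0.9, 0.8), 2))  # -> 0.28
```

Either a weak class association or a weak rule weight is enough to produce overregularization, since the error rate is one minus the product.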

The RC model is completely general, and applicable to the acquisition of phonology in other languages. Such quantitative predictions are strongly confirmed by the acquisition data. Scrutiny of past tense 'errors' has revealed much about the organization and learning of phonology.


Appendix B: The rule system for English past tense
This list is loosely based on Halle & Mohanan (: appendix) and Pinker & Prince (: appendix).

Suppletion
go, be

-t suffixation
• No Change
burn, learn, dwell, spell, smell, spill, spoil
• Deletion
bend, send, spend, lend, build
• Vowel Shortening
lose, deal, feel, kneel, mean, dream, keep, leap, sleep, leave
• Rime → ɔ
buy, bring, catch, seek, teach, think

-ø suffixation
• No Change
hit, slit, split, quit, spit, bid, rid, forbid, spread, wed, let, set, upset, wet, cut, shut, put, burst, cast, cost, thrust, hurt
• Vowel Shortening
bleed, breed, feed, lead, read, plead, meet

-d suffixation
• Vowel Shortening
flee, say
• Consonant Deletion
have, make
• Ablaut
sell, tell
• No Change (default)
regular verbs

Appendix C: Overregularization errors in children
Irregular verbs are listed by classes; in the text, only verbs with 25 or more occurrences are listed. All raw counts are from Marcus et al. ().
The connection between them is not of an external or superficial, but of a profound, intrinsic, and causal nature.

If language is delimited in the finite space of Universal Grammar, its ontogeny might well recapitulate its scope and variations as the child gradually settles on one out of the many possibilities.

The variational model also serves another important purpose. As far as we know, there is presently no formal model that directly explains developmental findings, nor any rigorous proposal of how the child attains and traverses the 'stages' described in the developmental literature.

The variational model makes two general predictions about child language development:

() a. Development is gradual, as the weight of the target grammar rises only gradually with exposure to the input.
b. Before convergence, the child has simultaneous access to multiple grammars, whose patterns coexist in production.


Competing Grammars

What follows is a preliminary investigation of () through several case studies in children's syntactic development. First, they are based on a large body of carefully documented quantitative data. Nevertheless, we will show that some interesting and important patterns in the data have never been noticed; in addition, an explanation of them may not be possible unless a variational approach is assumed.

Section 4.2 establishes the baseline input statistics for verb raising and subject drop; the statistics established there will be used in section 4.2.2. Section 4.3 turns to the acquisition of subject use. Based on the children's null subject Wh questions and null object sentences, we show that English children have simultaneous access both to an obligatory subject grammar (the target) and to an optional subject grammar, supporting prediction (b).


4.2 Testing the variational model
Following the discussion of parameter learning in section , we will test the variational model against two cases: the use of subjects in English (Valian ), acquired relatively late, and that of V2 in Dutch (Haegeman ), also acquired late. As reviewed in section (), it has been claimed that many parameters are set correctly very early. While we believe that the setting of the verb-raising parameter, and indeed of many parameters, is genuinely early, the claim that the subject and the V2 parameters are also set early is unconvincing.

Hence a large debt is due to the researchers who collected the data used here.


4.2.1 Verb raising and subject drop: the baselines
Consider first the verb-to-Tense raising parameter, for which the [+] value is expressed by signatures of the type VFIN Neg/Adv. Based on the CHILDES corpus, we estimate that such sentences constitute % of all French sentences heard by children. We then have a direct explanation of the well-known observation that word order errors are 'triflingly few' (Brown : ) in children acquiring fixed word order languages.

Observe that virtually all English sentences display rigid word order: e.g. the verb almost always (immediately) precedes the object. These patterns give a very high (far greater than %) rate of unambiguous signatures, which suffices to drive out other word orders very early on.

Consider then the acquisition of subject use in English. Expletive subjects are unambiguous signatures for the obligatory-subject value:

() a. There is a man in the room.
b. Are there toys on the floor?

Optional subject languages do not have to fill the subject position, and therefore do not need placeholder items such as there. Expletive there sentences constitute roughly % of all adult sentences to children, based on the CHILDES database. Such a low rate of unambiguous evidence ought to result in a late acquisition. We may take this figure of % as a baseline for late acquisition: if a parameter is expressed by no more than % of the input, then its target value should be set relatively late; more specifically, as late as the consistent use of subjects in child English.

This does not mean that we are committed to a particular parameter, [± pro-drop] or [± Null Subject], which is in any case too crude to capture the distributional differences between two representative classes of optional subject grammars, the Italian type and the Chinese type. We are, however, committed to what seems to be a correct generalization: the use of expletive subjects and the obligatoriness of subjects are correlated—hence, something in UG must be responsible for this.
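The link between signature frequency and acquisition time can be illustrated with a reward-only learner (a sketch under assumed parameter values, not the chapter's actual simulation):

```python
import random

# The weight p of the target parameter value is rewarded only on
# unambiguous signatures, which arrive at rate f; other input is
# treated as ambiguous and leaves p unchanged in this sketch.

def trials_to_criterion(f: float, gamma: float = 0.01,
                        criterion: float = 0.98, seed: int = 1) -> int:
    rng = random.Random(seed)
    p, trials = 0.5, 0
    while p < criterion:
        trials += 1
        if rng.random() < f:        # unambiguous signature: reward
            p += gamma * (1 - p)
    return trials

# A parameter expressed by ~25% of input reaches criterion far sooner
# than one expressed by ~1% (the late-acquisition baseline above):
print(trials_to_criterion(0.25) < trials_to_criterion(0.01))  # -> True
```

Since only the signatures move the weight, expected acquisition time scales inversely with their frequency in the input, which is the logic behind the baseline comparison in the text.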
Consider now the V2 parameter in Dutch. As noted in (), there appears to be no direct signature for the V2 parameter: the four competitor grammars together provide a complete covering of the V2 expressions. Three of the four competitors are punished at substantial rates (%, %, and %), and as a result these grammars are eliminated quite early on; see Fig. . The remaining competitor, the Hebrew-type grammar, by virtue of allowing SVO and XVSO alternations (Fassi-Fehri , Shlonsky ), is compatible with an overwhelming majority of V2 patterns (%). However, it is not compatible with OVS sentences, which therefore are in effect unambiguous signatures for the target V2 parameter after the other three competitors have been eliminated very rapidly. The low frequency of OVS sentences (%) implies that the V2 grammar is a relatively late acquisition, with a Hebrew-type non-V2 grammar in coexistence with the target V2 grammar for an extended period of time. This prediction is confirmed by the statistics from a Dutch child, Hein.
As remarked earlier, Valian nevertheless claims that the subject parameter is set
correctly, and attributes the missing subjects to performance limitations; we will return
to this in section 
...
The data concern the position of the finite verb in matrix sentences, and are reported in Haegeman's tables  and , which we combine in Table . The number of V1 sentences is the number of postverbal-subject sentences minus those with overt material to the left of the verb; that is, column  minus column  in Table . The results are shown in Table .
):
() a
...

know I not
b
...

see I yet not
c
...

shines the sun
d
...

can I not run

Now we have to be sure the V patterns in () are ‘real’, i
...
are
indeed due to the presence of a competing Semitic-type grammar
...

Another compounding factor is the precise location of the (finite) verb. Thus, if the V1 patterns are genuinely Hebrew-like, the finite verb must reside in a position higher than Tense. Stromswold & Zimmerman's () large quantitative study shows, contrary to the earlier claims of Deprez & Pierce (), that the subject is consistently placed above Negation.

I should point out that Haegeman's paper does not directly deal with the V2 phenomenon, but with the nature of Optional Infinitives instead; it happens to contain a large body of quantitative data needed by our study.
Table . Subjects and non-subject topics in Hein's finite clauses, by age: preverbal subjects, postverbal subjects, and overt material to the left of the finite verb.
Hence, the verbs in () are higher than Tense, consistent with the Hebrew-type grammar. From Table . we see that before ; the child used V1 patterns in close to % of all sentences; see Wijnen () for similar findings. With half of the data showing V1 patterns, to say that children have learned V2, or have adult-like grammatical competence, is no different from saying that children use V2 randomly.

But this only begs the question: how do Dutch children figure out that topic drop is not used in their language? And there would still be half of the data to explain away.
Poeppel & Wexler found that in child German, while nonfinite verbs overwhelmingly appear in the final (and not second) position, finite verbs overwhelmingly appear in the second (and not final) position. But a finite verb in the second position does not mean it has moved to the 'V2' position, particularly if the preverbal position is filled with a subject, as some of the examples taken from Poeppel & Wexler (: –) illustrate below (in English gloss):

() a. I have a big ball
b. I do that not

If second position alone were the criterion, then an English utterance like Russell loves Mummy would be classified as a V2 sentence.

As shown in Table ., the child's V1 usage declines with age. This can be interpreted as the target V2 grammar gradually wiping out the Hebrew-type grammar. Note also that the frequency (%) of Dutch OVS sentences is comparable to the frequency (%) of English there-expletive sentences. If we use Brown's criterion that % correct usage signals successful acquisition, we may conclude that the Dutch child studied by Haegeman has mastered V2 at ;, or has come very close (cf. Clahsen ).


4.2.2 A quantitative evaluation of the Poverty of Stimulus
Using the statistics established in section 4.2.1, we can give a quantitative evaluation of the Argument from the Poverty of Stimulus (APS). Recall that at the heart of the APS lies the question: why do human children unequivocally settle on the correct (structure-dependent) rules for question formation, while the input evidence does not rule out incorrect, structure-independent, inductive generalizations such as these?

() a. Front the first auxiliary verb.
d. Front the auxiliary verb that most closely follows a noun.
e. Front the auxiliary verb whose position in the sentence is a prime number.

for which the relevant evidence is in many ways ambiguous:

() a. Robin has finished reading.
b. Has Robin e finished reading?

Recently, the argument for innate knowledge based on structure dependency has been challenged by Sampson (), Pullum (), and Cowie (), among others. Here we will focus on Pullum's objections and show that they are not valid. Pullum assumes that the learner merely has to choose between the structure-dependent rule and the first-auxiliary rule (a). This assumption is incorrect: the learner in fact has to rule out all, in principle infinitely many, hypotheses compatible with (). But for the sake of argument, suppose it were the case that the learner had only a binary choice to make, while keeping in mind that if the learner did not have prior knowledge of structure dependency, the effort it takes to rule out all possible hypotheses can only be harder than that to rule out just (a).
Pullum examined a corpus of Wall Street Journal text for questions of the two relevant disambiguating types. He discovered that in the first  sentences he examined, , or %, are of these two types, for example:

() a. How fundamental are the changes these events portend?
b. Is a young professional who lives in a bachelor condo as much a part of the middle class as a family in the suburbs?

This argument commits a logical error: a mere demonstration that critical evidence exists does not mean that such evidence is sufficient. It then forces us to the problem of how to quantify the 'sufficiency' of critical evidence that serves to disambiguate alternative hypotheses.

But there is another, equally suggestive, way of evaluating Pullum's claim: we situate the case of structure dependency in a comparative setting of language acquisition. First and foremost, we must take an independent case in acquisition for which we have good knowledge of children's developmental time course, and for which we can also obtain a corpus frequency of the relevant evidence. As reviewed earlier, English children's subject use reaches adult level at around ; (Valian ). In both cases, the learners make a binary choice: Valian's children have to determine whether the language uses overt subjects, and Crain & Nakayama's children would, if Pullum were correct, have to rule out the structure-independent (linear) hypothesis for question formation. If English subject use is gradually learned on the basis of there-expletive sentences, which represent roughly % in the input data, then critical evidence of comparable frequency in speech to children should be required for structure dependency. The Wall Street Journal hardly fits the bill, a point that Pullum himself acknowledges.

One may reject models that do not predict such frequency–development correlations, on the ground of the comparable time courses of subject acquisition and V2 acquisition (section .).

For example, based on fifty-six files in the Nina corpus, we found:

() , sentences, of which , are questions, of which:
a. …
b. (i) Where's the little red duck that Nonna sent you? (NINA)
(ii) Where are the kitty cats that Frank sent you? (NINA)
(iii) What is the animal that says cockadoodledoo? (NINA)
(iv) Where's the little blue crib that was in the house before? (NINA)
(v) Where's the other dolly that was in here? (NINA)
(vi) What's this one up here that's jumping? (NINA)
(vii) Where's the other doll that goes in there? (NINA)
(viii) What's the name of the man you were yesterday with? (NINA)
(ix) What color was the other little kitty cat that came to visit? (NINA)
(x) Where's the big card that Nonna brought you? (NINA)
(xi) And what was the little girl that came who also had whiskers? (NINA)
(xii) Where's the card that Maggie gave you for Halloween? (NINA)
(xiii) Nina # where are the pants that daddy sent you? (NINA)
(xiv) Where are the toys that Mrs Wood told you you could bring home? (NINA)

Such questions amount to roughly %: that is forty times lower than the amount of evidence needed to settle on one of two binary choices by around the third birthday.

However, this line of reasoning would not work if children know where sentence boundaries are.
In an earlier paper, Legate () finds the following:

() In a total of , sentences, , were questions, of which:
a. …
b. (i) Where's the part that goes in between? (ADAM)
(ii) What is the music it's playing? (ADAM)
(iii) What's that you're drawing? (ADAM)
(iv) What was that game you were playing that I heard downstairs? (ADAM)

Furthermore, crucial evidence appears at a frequency of only around %. Interestingly, the canonical type of critical evidence, [aux [NP aux]], appeared not even once in all , adult sentences.

That is, the punctuation between two clauses signals a fresh start. In any case, we only found  such sentences in the Nina corpus,  of which contain the special symbol #, which encodes a significant pause separating two clauses.



Of these, it is not even clear whether the equative sentences (b-iii) and (b-iv) necessarily count as evidence against the first-auxiliary hypothesis. The Nina sentences in (b-iii), (b-vi), and (b-viii) are of this type as well, e.g. Mother: What is the funniest bird you ever saw?

In any case, the claim that children entertain the first-auxiliary hypothesis for question formation is false. UG, on this view, contains the instruction: construct a structure-dependent rule, ignoring all structure-independent rules.


4.3 The acquisition of subject use
We begin with a typology of subject use across languages, which serves to establish the nature of the candidate grammars that compete during acquisition. In one group, which includes languages like Italian and Spanish, a null subject is identified via unambiguous agreement (number, person, gender) morphology on the verb. Note that there is no reason that unambiguous agreement would force a language to be pro-drop. That is:

() a. unambiguous agreement is a necessary condition for pro-drop;
b. it is not a sufficient condition.

In the group of languages that includes Chinese, a null subject is identified via linking to a discourse topic, which serves as its antecedent.

No proof exists that the structure-independent hypothesis is ever entertained; no such proof can exist, given the normal ethics of human subject experimentation. (Recall the baseline of % for relatively late 'learnings'.)
In contrast, Chinese does freely allow null objects (NO) (Huang ), which, like null subjects, can be recovered by linking the empty pronominal to a discourse topic:

() TOPIC [Zhangsan kanjian-le e]
'Zhangsan saw him.'

When a topic phrase (Top) is fronted, subject drop in Chinese is grammatical only if Top is not a possible antecedent for the null subject, for otherwise the linking to discourse topic is disrupted:

() a. In park-LOC, [e t beat-ASP people]. (e = John; adjunct topicalization, subject drop allowed)
b. Sue, [e likes t]. (e = John; argument topicalization, subject drop excluded)


Italian identifies null subjects through agreement morphology, and does not have the restrictions on subject drop seen above in Chinese:

() a. Chi e ha baciato t?
who has(3SG.M) e kissed t
'Who has he kissed?'
b. Dove hai e visto Maria t?
where have(2SG) e seen Maria t
'Where have you seen Maria?'

Since Chinese cannot front Wh phrases (in questions or any other constructions), only topicalization data can be given in ().

The differences between Chinese, English, and Italian subject use are summarized below:

() a. Chinese: null subjects identified by discourse topic; null objects allowed; subject drop blocked under argument topicalization.
b. English: subjects obligatory; expletive subjects (there) required.
c. Italian: null subjects identified by agreement morphology; no topic-based restrictions.

We shall see how such differences play out their roles in child language acquisition, disambiguating these grammars from one another. We will again stress that the learner does not actively search for the patterns in () to identify the target grammar, as in a cue-based learning model; rather, grammars succeed or fail on whole input sentences, and are rewarded or punished accordingly. For example, both English and Italian grammars will be punished in a Chinese environment when a null object sentence is encountered.
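A toy run of this competition can be sketched as follows (the grammars and input patterns are stylized stand-ins of my own, not the book's corpora or its exact update rule):

```python
import random

# In a Chinese-type environment, a null-object sentence punishes both
# the English-type and the Italian-type grammar.

ANALYZES = {
    "chinese": {"SVO", "null-subject", "null-object"},
    "italian": {"SVO", "null-subject"},
    "english": {"SVO", "expletive-there"},
}

def compete(env, weights, gamma=0.02, steps=5000, seed=7):
    rng = random.Random(seed)
    w = dict(weights)
    for _ in range(steps):
        s = rng.choice(env)
        g = rng.choices(list(w), weights=list(w.values()))[0]
        others = [h for h in w if h != g]
        if s in ANALYZES[g]:              # reward the chosen grammar
            w[g] += gamma * (1 - w[g])
            for h in others:
                w[h] *= 1 - gamma
        else:                             # punish the chosen grammar
            w[g] *= 1 - gamma
            for h in others:
                w[h] = gamma / len(others) + (1 - gamma) * w[h]
    return w

chinese_env = ["SVO"] * 7 + ["null-subject"] * 2 + ["null-object"]
final = compete(chinese_env, {"chinese": 1/3, "italian": 1/3, "english": 1/3})
```

Because the Chinese-type grammar is never punished in this environment, its weight drifts toward 1, while the grammars that fail on null arguments are driven out—the selectionist dynamic the passage describes.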


4.3.1 The early acquisition of Chinese and Italian subject drop
Here we study the acquisition of subject use in Chinese and Italian children; we turn to English children in section 4.3.2. So, when we say a 'Chinese grammar', we mean the type of grammar that employs discourse-based argument drop. Here, null object sentences like () are unambiguous evidence for a Chinese-like grammar.

() shows that Chinese adults use a fair amount of object drop sentences in speech to children (%). As discussed in section 4.2.1, unambiguous evidence at such rates leads to very early acquisition. We thus predict that from very early on, Chinese children have eliminated English and Italian grammars, and converged on the remaining grammar, the target.

Wang et al. () report that Chinese children drop subjects at a rate of % and objects at %, against adult rates of % and %. The discrepancy is probably due to the fact that the statistics from Wang et al. () were obtained under a different counting procedure; our own study of production data yields the figure of %. Chinese children also leave expletive-like subjects null, as in the examples reported in the same study (: –):

() a. [e] Xiàyu-le
(It) rain-ASP
b. [e] seems (it) going to rain-ASP

In general, Chinese children in all age groups leave the subject position null. (All sentences counted contained a verb or a predicate, and thus an opportunity for subject drop.)
Let us now turn to Italian children. Italian, unlike Chinese, allows null subjects in argument (object) Wh questions. This means that every subjectless argument (object) Wh question punishes a Chinese grammar, and of course an English grammar as well. We also know that Wh questions are one of the most frequent constructions children are exposed to, so the Italian child should converge on the target grammar very quickly. This prediction is confirmed by Valian's findings (): at both of the developmental stages investigated (;–; and ;–;), Italian children drop subjects in about % of sentences, roughly the same as the figures in adult speech reported in the references cited above.
4.3.2 Subject drop in child English
We first claim that the Italian grammar can very rapidly be eliminated by English children on the basis of their knowledge of agreement morphology. Phillips (: ), reviewing a number of crosslinguistic studies, observes that 'in languages with overt agreement morphology, children almost always use the agreement morphemes appropriate to the argument being agreed with'.

Children's near-perfect knowledge of agreement morphology plays an important role in grammar competition. Hence, one must understand the variational model and the evaluation of grammar–sentence compatibility in the sense of strong generative capacity (cf. ). We remarked earlier in () that unambiguous agreement is a necessary condition for pro-drop. Specifically, if an Italian grammar is chosen to analyze English input, the lack of unambiguous agreement in English causes the Italian grammar to fail and be punished as a result. The Chinese grammar is a different matter: it employs discourse linking as the mechanism for null subject identification, and morphology provides no useful information. The Chinese grammar is thus punished only by expletive sentences, roughly % of all input sentences.

The claim of grammar coexistence attributes English child NS to the presence of the Chinese grammar, which is probabilistically accessed; accordingly, English children's null subject rate is well below that of Chinese children (%) (Wang et al. ).

Note that an Italian-type grammar selected by an Icelandic learner will not be contradicted for agreement reasons. It is well known that in pro-drop languages such as Italian and Spanish, the use of overt pronouns is unnatural in normal discourse, and is reserved only for focus, contrast, stress, etc. I would like to thank Norbert Hornstein for bringing this problem to my attention.

We also predict that child English ought to contain a certain amount of null objects (NO), grammatically acceptable in Chinese. Furthermore, the presence of the Chinese grammar entails that the distributional patterns of English child NS ought to show characteristics of a Chinese grammar.

First, recall that in a Chinese-type grammar, NS is only possible in adjunct topicalizations (a), but not in argument topicalizations (b). This prediction is strongly confirmed: % (/) of Wh questions with NS are adjunct (how, where) questions, while % (/) of object questions (who, what) contain subjects.

Second, since both NS and NO are attributed to the Chinese grammar, we predict that the relative ratio of NO/NS will hold fairly constant across English and Chinese children in the same age group. Suppose that for Chinese children the NS ratio is s and the NO ratio is o, and that for English children the NS ratio is s′ and the NO ratio is o′. If English children access the Chinese grammar with probability p, then s′ = ps and o′ = po, and hence o′/s′ = o/s. Recall that Chinese children learn their grammar very early; and if we scale p up to %—that is, if English children were to use the Chinese grammar monolingually—we expect their NS and NO ratios to be identical to those for Chinese children.

The fronting of the Wh word in question formation, of course, is an early acquisition, as noted in section . Note also that p is a variable that diminishes over time as the Chinese grammar is on its way out.
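The arithmetic behind this invariance is worth spelling out (the rates below are made-up values for illustration):

```python
# If English children access the Chinese-type grammar with probability p,
# and that grammar yields null subjects at rate s and null objects at
# rate o, their rates are p*s and p*o, so NO/NS is o/s regardless of p.

def child_rates(p: float, s: float, o: float):
    return p * s, p * o        # (null-subject rate, null-object rate)

s, o = 0.5, 0.2                # assumed Chinese-child rates
ratios = []
for p in (1.0, 0.5, 0.25):     # p = 1.0 corresponds to a Chinese child
    ns, no = child_rates(p, s, o)
    ratios.append(no / ns)
print(ratios)                  # -> [0.4, 0.4, 0.4]
```

The access probability p cancels out of the ratio, which is why the NO/NS slope is predicted to match across the two groups of children even though their absolute drop rates differ.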

The confirmation of this prediction is shown in Fig. ., which plots NO against NS rates for both Chinese and American children; the slopes of NO/NS are virtually indistinguishable: the raw statistics are % and % versus % and %.

Finally, we may add that the expletive subject elicitation study of Wang et al. () points to the same conclusion:

() It is raining.
It's rain.
They can't come out.
No snow.
Snow. (DS: ;)

This again demonstrates the coexistence of both types of grammars. The model explicitly appeals to the syntactic properties of competing UG grammars given by theories of adult linguistic competence.

Hence it is important to use statistics from a single study: the experimental design and counting procedure would then be consistent for both American and Chinese children.


Figure . Null object rate plotted against null subject rate for American and Chinese children.
In addition, performance-based theories seem to be self-contradictory.

The recent optional infinitive (OI) based approach to null subjects (e.g. Rizzi , Sano & Hyams , Hyams , Wexler ), which holds that null subjects are licensed by non-finite root verbs, also says nothing about the quantitative findings reported here; nor do the NS and OI stages end at roughly the same time. For example, the OI stage for the Dutch child Hein (Haegeman : table ) essentially ended at ; and ;, when his OI usage dropped to % and %.


4.4 Conclusion
Language acquisition can be modeled as a selectionist process in which variant grammars compete to match linguistic evidence. Under the condition of explanatory continuity, the irregularity in child language and the gradualness of language development can be attributed to a probabilistic combination of multiple grammars, rather than to the imperfect exercise of a single grammar. Formal sufficiency and developmental compatibility can be simultaneously met in the variational model, for which the course of acquisition is determined by the relative compatibilities of the grammars with the input data; such compatibilities, expressed in penalty probabilities, are quantifiable and empirically testable.

The first step is the observation of non-uniformity in children's language: the deviation from the adult grammar they are acquiring. Second, we identify the competing grammars implicated in that deviation. Third, we associate each of the competing grammars with its corresponding disconfirming evidence in the linguistic environment, i.e. input patterns with which they are incompatible. Finally, we estimate the frequencies of such evidence from corpora; quantitative predictions are then possible. In future work, this procedure will be systematically applied to a wider range of topics in children's syntax.

See Phillips () for additional discussion suggesting that the correlation between OI and NS is weak.

Noam Chomsky and Morris Halle, The Sound Pattern of English (), p. 

If our (linguistic) ancestors had brains like ours, then UG, as we understand it through our languages, would have governed their languages as well. Thus, UG must be placed at a central position in the explanation of language change. Ultimately, language changes because learners acquire grammars that are different from those of their parents.

Language Change

Following Battye & Roberts () and others, this iterative process can be stated in the familiar distinction between E- and I-languages (Chomsky ); see Fig. .. The E-language of one generation formed the learning experience of the next: it determined the languages they acquired, and the linguistic evidence they provided for later generations. Fig. ., extrapolated over time, specifies the dynamics of a formal model of language change.

A similar and more familiar stance has been taken by J. Fodor and L.

When one gives descriptions of a certain historical change—for example, the change of a parameter from one value to another—one must give an account, from a language-learning perspective, of how that change took place. Of these, two aspects deserve particular attention.

First, the model should be quantitative: one would like to make claims that when such and such patterns are found in certain distributions, linguistic change is bound to occur. Second, it is common to find in the literature appeals to social, political, and cultural factors to explain language change; such appeals require an explicit and independently motivated model which details how such factors affect language acquisition. Again, these claims can be substantiated only when supporting evidence is found in synchronic child language development.

Figure . The dynamics of language acquisition and language change

The variational model of acquisition provides such a framework: it characterizes the dynamic interaction between the internal Universal Grammar and the external linguistic evidence, as mediated by language acquisition.

Section 5.1 presents a variational model of language change. In sections 5.2 and 5.3 we apply the model to explain the loss of V2 in Old French and the erosion of V2 in Old English.

5.1 Grammar competition and language change

5.1.1 The role of linguistic evidence
Given the dynamics of language change in Fig. ., a change in the linguistic evidence between generations is required to result in generation n + 1 learning a language different from that of generation n. Two mutually incompatible grammars constitute a heterogeneous linguistic environment.

This conclusion is warranted only when another logical possibility is rejected: that learners may internalize grammars different from their parents' even under homogeneous linguistic evidence. We have three arguments against this possibility. First, language acquisition is remarkably robust: stylistics aside, all of us with similar experience attain core grammars that are very similar to each other. Second, it appears that even here individual variations are ineffective; whole groups of speakers must, for some reason unknown to us, coincide in a deviation, if it is to result in a linguistic change. And finally, while one might attempt to invoke the idea of individual mislearning to explain historical change in some languages, it leaves mysterious the relative stability of other languages, say, the rigidity of word order in Western Germanic languages.
A question immediately arises: what makes the linguistic evidence for generation n + 1 different from that of the previous generation? There are many possibilities: migration, contact, innovation, and so on. These are interesting and important topics of research, but they are not relevant for a formal model of language change, much as the precise manner in which new genes arise—which could be mutation, migration, etc.—is separate from the population dynamics that follow. After all, the world would have looked very different if the comet that led to the demise of the dinosaurs had been off target. Hence, we are chiefly concerned with the predictable consequences of such changes: what happens to language learners after the linguistic evidence is altered, and how does it affect the composition of the linguistic population as a result?

5
...
2 A variational model of language change
Suppose that, as a result of migration, genuine innovation, and
other sociological and historical factors, a linguistic environment is
 A desirable feature of a competence theory but by no means a necessary one: see
Yang () for discussion in relation to the issue of ‘psychological reality’
...

The expressions used in such an environment—call it EG, G—
can formally be viewed as a mixture of expressions generated by
two independent sources: the two grammars G and G
...
Call  () the advantage of G (G)
...
 illustrates
...
Recall from Chapter  that
the fitness of individual grammars is defined in terms of their
penalty probabilities:
() The penalty probability of a grammar Gi in a linguistic environment E is
ci = Pr(Gi → s | s ∈ E)
/

The penalty probabilities ultimately determine the outcome of
language acquisition:
c


() limt → ∞ p(t) = ———
c + c

c
limt → ∞ p(t) = ———
c + c

Suppose that at generation n, the linguistic environment EG, G
= pG + qG, where p + q = 
...
The penalty
probabilities of G and G, c and c, are thus q and p
...
Based on (), we have:
()

p′ p/(p + q)
— = ——— —
——
q′ q/(p + q)
p
=—

q

In order for G to overtake G, the weight of G (q) internalized in
speakers must increase in successive generations and eventually
drive the weight of G (p) to 
...
Thus, we
obtain a sufficient and necessary condition for grammar competition in a linguistic population:
()

The fundamental theorem of language change
G overtakes G if  > : the advantage of G is greater than that of G
...

Note that we may not be able to estimate  and  directly from
historical context
...
e
...
However, () says that if q′ > q (G is on the rise), it
must be the case that  > , and, if  > , G will necessarily
replace G
...


Obviously, () and () are very strong claims, and should be
closely scrutinized in future research
...
) that has frequently
been observed in language change (Weinreich et al
...
9
0
...
7
0
...
5
0
...
3
0
...
1
0
0

5

10

15

20

FI G U R E 
...

The present model shares an important feature with the work
of Clark & Roberts (), which extends the use of Genetic
Algorithms in acquisition (Clark )
...
However, they identify the final state of
acquisition with a single grammar (cf
...

Therefore, when the linguistic evidence does not unambiguously identify a single grammar, as a realistic, inherently variable
environment, they posit some general constraints on the
learner, e
...
the elegance condition, which requires the learner to
select the simplest among conflicting grammars
...
g
...
, Kroch )
...
For example, Santorini () demonstrates that in early Yiddish subordinate clauses, individual
speakers allowed both INFL-medial and INFL-final options
...
It is significant that both examples are from a
single source (Prossnitz ; , ):
()

a
...
d[a]z der mensh git erst oyf in di hikh
that the human goes first up in the height
‘the people first grow in height’

For the purpose of this study, we assume that all speakers in a linguistic
community are exposed to identical linguistic experience, and that a
speaker's linguistic knowledge remains stable after the period of language
acquisition. Both assumptions can of course be relaxed; we leave these
options for further research. We conclude that heterogeneity in the
linguistic evidence, however introduced, is a prerequisite for language
change. Exposed to such heterogeneous evidence, learners internalize a
mixture of coexisting grammars, and the propagation of such grammars in
successive generations of individual learners defines the dynamics of
language change.


 Language Change

5.2 The loss of V2 in French

The following examples are taken from Clark & Roberts ():

() Loss of null subjects
   a. [...] (ModF)
      thus [they] had fun that night
   b. Si firent grant joie la nuit. (OF)

() Loss of V2
   a. [...] (ModF)
      then heard-they a clap of thunder
   b. Lors oïrent ils venir un escoiz de tonoire. (OF)

In this section, we will provide an analysis of the loss of V2 under the
variational model.

Recall that in order for a ModF SVO grammar to overtake a V2 grammar, the SVO
grammar must have the greater 'advantage'. () shows the advantage patterns of
V2 over SVO, and vice versa:

() a. Advantage of V2 grammar over SVO grammar:
      V2 → s but SVO ↛ s: XVSO
   b. Advantage of SVO grammar over V2 grammar:
      SVO → s but V2 ↛ s: V > 2 (SXVO, XSVO)

If the distribution patterns in modern V2 languages are indicative of those
of ancient times, we can see that the V2 constraint is in general very
resilient to erosion.

In modern V2 languages, the V2 constraint is very strongly manifested:

() denn Johann hat gestern das Buch gelesen
   so Johann had yesterday the book read

Our own counts, based on a Dutch sample of adult-to-child speech reported in
section , show % SVO, % XVSO, and % V > 2 patterns. In contrast, based on the
Penn Treebank (Marcus et al.), V > 2 patterns such as () are robustly
attested in English:

() a. He always reads newspapers in the morning.
   b. Every evening Charles and Emma Darwin played backgammon.
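The 'advantage' comparison can be made concrete by classifying surface patterns according to which grammar can analyze them. The sketch below is my own illustration: the pattern classes are simplified, and the frequencies are invented placeholders, not the corpus counts cited above.

```python
# Surface patterns each grammar can analyze (a simplification for
# illustration: V2 requires the finite verb in second position; SVO
# allows V > 2 orders but no subject-verb inversion).
V2_OK = {"SVO", "XVSO"}
SVO_OK = {"SVO", "SXVO", "XSVO"}

def advantages(freq):
    """alpha: share of input only the SVO grammar parses (penalizes V2);
    beta: share of input only the V2 grammar parses (penalizes SVO)."""
    alpha = sum(f for pat, f in freq.items()
                if pat in SVO_OK and pat not in V2_OK)
    beta = sum(f for pat, f in freq.items()
               if pat in V2_OK and pat not in SVO_OK)
    return alpha, beta

# Invented distribution for a conservative V2 language: inversion is
# common, V > 2 orders are marginal.
alpha, beta = advantages({"SVO": 0.65, "XVSO": 0.30,
                          "SXVO": 0.03, "XSVO": 0.02})
# beta (0.30) far exceeds alpha (0.05), so the V2 grammar is stable here.
```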

If the V constraint is so resilient, why did V succumb to SVO
in French? The reason, in our view, is that OF was also a null
subject language
...
However, this advantage would be
considerably diminished if the subject were dropped to yield [X V
pro] patterns: a null subject SVO grammar (like modern Italian)
can analyze such patterns as [X (pro) V]
...
)
 Joyes (esme Joye) (c
...



VS
%




NullS
%


...
As the same time, V > 
patterns have gone from fairly sparse (about < %) in OF (R: )
to –% in early MidFr, as the class of sentence-initial XPs that
do not trigger SV inversion was expanded (Vance )
...
() a. Lors la royne fist Santré appeller
      'Then the queen had Saintré called.'
   b. Et a ce parolles le roy demanda quelz prieres ilz faisonient
      'And at these words the king asked what requests they made.'
   c. Apres disner le chevalier me dist ...
      'After dinner the knight said to me ...'
      (15 Joyes (15esme Joye), c. )

Thus, following the corollary in (), it must be the case that an SVO grammar
(plus pro-drop) has an advantage over an OF V2 grammar (plus pro-drop).
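The diminishing effect of pro-drop on V2's advantage can be quantified. In the sketch below (my illustration, with invented rates), inversion strings lose their overt subjects at the pro-drop rate and thereby become parsable by a null-subject SVO grammar; for simplicity, the SVO grammar's own advantage is left untouched.

```python
def effective_advantages(alpha, beta, drop_rate):
    """alpha: share of strings only the SVO grammar parses (V > 2 orders).
    beta: share of strings only the V2 grammar parses (overt inversion).
    With pro-drop, an [X V S] string surfaces as [X V] at drop_rate, and
    a null-subject SVO grammar parses it as [X (pro) V], so only the
    overt-subject inversions still penalize SVO."""
    return alpha, beta * (1 - drop_rate)

# Invented rates: inversion 30%, V > 2 orders 8% (inflated in early
# MidFr), subjects dropped 80% of the time.
alpha_eff, beta_eff = effective_advantages(0.08, 0.30, 0.8)
# beta_eff = 0.06 < alpha_eff = 0.08: the SVO grammar (plus pro-drop)
# now has the greater advantage and, by the fundamental theorem of
# language change, overtakes the V2 grammar.
```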

Our analysis of the loss of V in French crucially relies on the
fact that null subject was lost after V was lost
...
In late fifteenth century and early sixteenth
centuries, when SVO orders had already become ‘favored’, there
was still significant use of null subjects, as the statistics in ()
demonstrate:

Language Change 
() The lasting effect of pro-drop in MidFr
SV
%
Anon
...

Anon
...
We see
that in the sixteenth century, when V almost completely evaporated, there was still a considerable amount of subject drop
...

We believe that the present analysis may be extended to other
western European Romance languages, which, as is well known,
all had V in medieval times
...
It appears that the combination of pro-drop
and V are intrinsically unstable, and will necessarily give away to
an SVO (plus pro-drop) grammar
...
It is reported (Bates , cited in Caselli et al
...
Now this is a figure already lower than the approximately
% of V >  sentences in which an SVO grammar has an advantage over a V grammar
...


5.3 The erosion of V2 in Middle English

Unless specified otherwise, all our examples and statistics are taken from
Kroch & Taylor (, henceforth K&T).

5.3.1 Word order in Old English

K&T show that Old English (OE) is, generally speaking, a Germanic language
similar to Yiddish and Icelandic.

In OE, when the subject is an NP, the finite verb is in the second position:

() V2 with NP subjects in OE
   a. [...]
   b. þær wearþ se cyning Bagsecg ofslægen
      there was the king Bagsecg slain

In contrast, a pronominal subject precedes the verb, creating superficially
V3 patterns with a non-subject topic phrase:

() V3 with pronoun subjects in OE
   a. ælc yfel he mæg don
      each evil he can do
   b. scortlice ic hæbbe nu gesæd ymb þa þrie dælas
      shortly I have now said about the three parts
   c. æfter his gebede he ahof þæt cild up
      after his prayer he lifted the child up

Furthermore, there are genuine V > 2 patterns when the topic position is
occupied by a certain class of temporal adverbs and adjuncts:

() Her Oswald se eadiga arcebisceop forlet þis lif
   in-this-year Oswald the blessed archbishop forsook this life

whereas questions and negation trigger inversion even with pronominal
subjects:

() a. hwi sceole we oþres mannes niman?
      why should we another man's take
   b. ne mihton hi nænigne fultum æt him begitan
      not could they not-any help from him get
5.3.2 The southern dialect

The southern dialect essentially preserved the V2 of Old English: preposed
XPs, with the exception of a certain class of adverbs and adjuncts noted
earlier, generally trigger subject-verb inversion with full NP subjects but
rarely with pronoun subjects (see Table ). The loss of subject cliticization
(and that of word-order freedom in general) can further be linked to the
impoverishment of the morphological case system of pronouns; see Kiparsky ()
for a possible theoretical formulation of this traditional idea.
V in southern early Middle English
Preposed XP

NP subjects
% inverted

NP complements
PP complements
Adj
...
(: table )

There were, nevertheless, V > 2 patterns in the southern dialect of early ME,
manifested in sentences with pronominal subjects () and certain adverb and
adjunct topics (), schematically shown in ():

() XP subject-pronoun VFIN

Therefore, patterns such as () were no longer compatible with an OE-type V2
grammar. Examining Table , we see that such patterns constituted a
significant portion of the evidence. When subject pronouns could no longer be
analyzed as clitics but only as NPs, the SVO grammar would have had an
advantage over the V2 grammar, and eventually rose to dominance.

Notice that we immediately have an account for the so-called 'residual V2' in
modern English questions, certain negations, etc.: the linguistic evidence
for those constructions has been homogeneous with respect to a V2 grammar
throughout the history of English.


5.3.3 The northern dialect and language contact

In contrast to the southern dialect, K&T show that the northern dialect,
under heavy Scandinavian influence, was very much like modern Germanic
languages.

As noted earlier, the V2 constraint exhibited in West Germanic languages is
difficult to overthrow. In discussing the loss of V2 in Old French, we argued
that subject drop in Old French considerably diminished V2's advantage, to a
point where an SVO grammar, aided by an increase in V > 2 patterns,
eventually won out. K&T insightfully attribute the erosion of V2 in English
to the competition of grammars in learners during language contact. The
northern V2 dialect, when mixed with the southern (essentially OE) language,
constituted a heterogeneous linguistic environment for later generations of
learners, who, instead of converging to a single grammar, attained a mixture
of coexisting grammars. Table  shows the consequences of language contact in
the northern dialect.

TABLE . Subject-verb inversion in the northern dialect after language contact

Preposed XP        NP subjects (% inverted)    Pronoun subjects (% inverted)
NP complements     (/)                          (/)
then               (/)                          (/)
now                (/)                          (/)
Adverbs            (/)                          (/)

Source: Kroch et al. (: table )
Recall that prior to contact the northern dialect was much like Germanic
languages, in which V2 is strongly enforced: Kroch et al. () report
subject-verb inversion in % of all sentences containing subjects. After
contact (Table ), while inversion with NP subjects remains general, the
overall subject-verb inversion rate has dropped to %. When the V2 constraint
was sufficiently weakened, and once the morphological case system of the
mixed language was lost, an SVO grammar would gradually have taken over, in
the manner described earlier for the loss of V2 in OE. That is, a West
Germanic V2 language similar to the northern dialect would not lose V2
without language contact, even if its morphological case system was lost.
Once language contact was made, the homogeneity of linguistic evidence was
broken, and two distinct grammars were formed by the learners. K&T's thesis
that language contact was the prerequisite for the loss of V2 in the
northern dialect dovetails with our theoretical model rather nicely.
5.4 Limitations of the Model

The variational model of language change formalizes historical linguists'
intuition of grammar competition, and directly relates the statistical
properties of historical texts to the direction of language change.

At the same time, we would like to stress the scope and limitations of this
model. The model is constructed for syntactic change; principles of
phonology and phonological learning are very likely to be different from
those of syntax.

The most severe limitation is that the present model operates in a vacuum:
it does not consider sociological forces that may affect language learning,
use, and ultimately, change (cf. Labov). This is obviously an idealization
that must be justified when applied to actual case studies. We propose that
this model be used as some sort of null hypothesis, again drawing a parallel
with biology, where formal models of evolutionary dynamics serve as the null
hypothesis precisely because of their predictiveness.

In any case, we hope that this work will contribute to a formal framework in
which problems in language change can be studied with precision.


6
Summary

A nativism of domain specific information needn't, of course, be
incompatible with a nativism of domain specific acquisition mechanisms.
                                              Jerry A. Fodor ()

To end this preliminary study of the variational approach to language, let
us return to the abstract formulation of language acquisition to situate the
variational model in a broader context of cognitive studies. The connection
between S, the hypothesis space, and L, the learning function, is made
possible by the variational and probabilistic thinking central to Darwinian
evolutionary theory. The present approach, if correct, shows that a
synthesis of Universal Grammar and learning is not only possible but
desirable.
6.1 Knowledge and learning

We stress again that the admission of general learning/growth principles
into language acquisition in no way diminishes the importance of UG in the
understanding of natural language. Recall, as discussed in Chapter , the
presence of Chinese-type topic drop during English children's Null Subject
stage, as demonstrated by the almost categorical asymmetry in argument vs.
adjunct wh-questions. It is inconceivable that such patterns can be
explained without appealing to extremely domain-specific properties of
grammars.

On the other hand, the variational model complements linguistic theories in
two novel and interesting ways. First, it allows grammars attested in adult
languages to play a direct role in explaining patterns of child language. In
Legate & Yang (), we pursue this line of thinking by exploiting the
parallels between children's violation of Binding Condition B and the
apparent violation of the same condition in languages such as Old English.

Second, the acquisition model provides an independent and theory-neutral
tool for assessing the psychological status of linguistic theories, much as
sentence-processing evidence has been used by Frazier, among others, as an
evaluation criterion for linguistic theories.

Just as there are infinitely many ways to slice up a cake, each a potential
controversy at a party, disagreement arises when one linguist's insight is
not shared by another. Suppose that two theories of grammar, T1 and T2,
characterize the same phenomenon differently. Suppose further that each
gives a descriptively adequate account of some range of linguistic data.
Their developmental predictions can then be compared directly:

()

T → L → D
T → L → D

() can be carried out straightforwardly: the identification of
the relevant evidence to set target values, followed by the estimation of their frequencies in naturalistic corpora
...
 and 
...
), and, when available, diachronic trends through time
(Chapter )
...

 A desirable feature of a competence theory but by no means a necessary one: see
Yang () for discussion in relation to the issue of ‘psychological reality’
...
A PCFG, if a sufficiently large number of rules is allowed, can indeed
approximate the distribution of sentences in all languages, and can be
useful for many engineering applications. Consider a pair of rules
expressing the subject option, each with an associated probability: for
English, the probability of the overt-subject rule would be 1; in Italian,
it would be the rate of overt subjects. The algorithm for updating these
probabilities to fit the data may even be similar to that used in the
variational model, and thus has the virtue of explaining the gradualness in
child language. Consider, however, another aspect of grammar, that of Wh
movement, which roughly breaks into the overt Wh movement languages such as
English and in-situ ones such as Chinese:

() S → ...  [competing expansions for fronted vs. in-situ Wh questions, each
   with an associated probability]

As we know, Wh movement in English is acquired very early, with virtually no
Wh words left behind (Stromswold); hence the probability of the
overt-movement rule must be driven to 1 very quickly. Obligatory subject
use, by contrast, is acquired late; i.e. the subject rule's rise to 1 in ()
appears to be a much slower process.
But my suspicion is that when the cross-linguistic facts of language
acquisition and the nature of the input evidence are considered (e.g.
section ), the issue can be settled empirically. That would be something we
can all agree on.
The hypothesis space of language, S, above all,
must be studied with respect to adult language typologies and
child language development, and all evidence points to a domainspecific body of knowledge
...
Fodor suggested in the quote at the
beginning of this chapter
...
For example, subject uses in Italian, Chinese,
and English children (section 
...
g
...
Alternative formulation of subject use,
which may be descriptively perfectly adequate, may lead to incorrect developmental predictions
...
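The fast/slow asymmetry discussed above follows from how often the input contains signature evidence for the target option, under any reward-based updating scheme. The sketch below is my own; the signature-evidence rates are invented, with Wh signatures far more frequent than decisive subject evidence.

```python
import random

def steps_to_converge(signature_rate, gamma=0.02, threshold=0.99,
                      max_steps=100_000, seed=3):
    """Number of input sentences needed to push the target option's
    probability past `threshold`, when only a fraction `signature_rate`
    of sentences unambiguously rewards that option."""
    rng = random.Random(seed)
    p = 0.5
    for step in range(1, max_steps + 1):
        if rng.random() < signature_rate:
            p += gamma * (1 - p)   # signature input: reward the target
        if p >= threshold:
            return step
    return max_steps

fast = steps_to_converge(0.25)    # frequent signature evidence (cf. Wh fronting)
slow = steps_to_converge(0.012)   # rare signature evidence (cf. obligatory subjects)
# fast is far smaller than slow: frequent signature evidence drives its
# probability to 1 much sooner, mirroring early Wh movement vs. late
# obligatory subject use.
```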


6.2

No two speakers' linguistic experiences, and hence grammars, are exactly
alike. By the use of variational thinking and statistical modeling, the
approach developed here may provide a principled way of bridging a similar
gap, which lies between linguistic competence and linguistic performance, or
between theoretical linguists' idealized and discrete grammars and the
variability and gradience in language acquisition and use. While these
differences may be attributed to different values of parameters, as Kayne
suggests, it is also possible that speakers may have acquired different
parameter-value weights (section ). Yet not all these differences in
individual speakers are interesting to theoretical linguists, just as not
all differences in individual organisms (we know that no two organisms are
exactly the same) are interesting to theoretical biologists.

Chomsky () remarks that the learning of a language is
much like the development and growth of physical organs
...
Competition and selection in the
learning model immediately recall Hubel & Wiesel’s () classic study on the development of pattern-specific visual pathways
...
There seem to be neural
groups, available at birth, that correspond to specific aspects
(read: parameters) of stimulus—for example, orientations,
shades, and colors of visual scenes
...

Selectional growth at the behavioral level in other species has
also been documented (Marler )
...
In the beginning there
are many different forms of songs, characterized by a wide range
of pitch, rhythm, and duration
...
In the end,
sparrows acquire their distinctive local ‘dialects’
...

In a biologically continuous view, a human child in a specific
linguistic environment, much like a swamp sparrow in New York
or Minnesota, develops an ambient grammar out of many undifferentiated blueprints
...

I have aimed to show that the variational perspective resolves
some puzzles in the UG approach to language acquisition, and


I would like to thank Marc Hauser for emphasizing this relevance to me
...
The investigations reported here are no doubt
preliminary; I only hope that they have convinced the reader that
this line of research is worth pursuing
...


References

ALLEN, S. (). Amsterdam: John Benjamins.
ANDERSEN, H. (). Abductive and Deductive Change.
ANDERSON, S. (). A-Morphous Morphology.
ATKINSON, R., BOWER, G., & CROTHERS, E. (). New York: Wiley.
BAILEY, C.-J. (). Washington, DC: Center for Applied Linguistics.
BAKER, M. (). The Polysynthesis Parameter.
—— (). New York: Basic Books.
BARTO, A., & SUTTON, R. (). Cambridge, Mass.: MIT Press.
BATES, E. (). New York: Academic Press.
—— & ELMAN, J. (). Learning Rediscovered: A Perspective on Saffran, Aslin, and Newport.
BATTYE, A., & ROBERTS, I. (). Introduction. In A. Battye & I. Roberts (eds.), Clause Structure and Language Change.
BEHRENS, H. (). Doctoral dissertation, University of Amsterdam.
BERKO, J. (). The Child's Learning of English Morphology.
BERTOLO, S., GIBSON, E., et al. (). Characterizing Learnability Conditions for Cue-Based Learners in Parametric Language Systems.
BERWICK, R. (). Cambridge, Mass.: MIT Press.
—— & NIYOGI, P. (). Linguistic Inquiry: –.
—— & WEINBERG, A. (). The Grammatical Basis of Linguistic Performance. Cambridge, Mass.: MIT Press.
BLOOM, L. (). Language Development: Form and Function in Emerging Grammars. Cambridge, Mass.: MIT Press.
BLOOM, P. (). Subjectless Sentences in Child Language.
—— (). Linguistic Inquiry: –.
——, BARSS, A., NICOL, J., & CONWAY, L. (). Children's Knowledge of Binding and Coreference: Evidence from Spontaneous Speech.
BLOOMFIELD, L. (). Review of O. Jespersen, The Philosophy of Grammar.
BORER, H., & WEXLER, K. (). The Maturation of Syntax. In T. Roeper & E. Williams (eds.), Parameter Setting.
BOSER, K. (). Master's thesis, Cornell University.
BOYSSON-BARDIES, B. DE (). How Language Comes to Children: From Birth to Two Years. Cambridge, Mass.: MIT Press.
BROMBERG, H., & WEXLER, K. (). In MIT Working Papers in Linguistics: –.
BROWN, R. (). A First Language. Cambridge, Mass.: Harvard University Press.
—— & HANLON, C. (). Derivational Complexity and Order of Acquisition in Child Speech. In J. R. Hayes (ed.), Cognition and the Development of Language.
BURZIO, L. (). Journal of Linguistics: –.
BUSH, R., & MOSTELLER, F. (). Psychological Review: –.
—— —— (). Stochastic Models for Learning.
BYBEE, J. (). Morphological Classes as Natural Categories.
—— & SLOBIN, D. (). Language: –.
CASELLI, M. C., BATES, E., FENSON, J., SANDERL, L., et al. (). A Cross-Linguistic Study of Early Lexical Development.
CHANGEUX, J.-P. (). The Neuronal Man.
CHARNIAK, E. (). Cambridge, Mass.: MIT Press.

CHOMSKY, N. (). MS, Harvard University and Massachusetts Institute of Technology.
—— (). Cambridge, Mass.: MIT Press.
—— (). New York: Pantheon.
—— (). On Wh-Movement. In P. Culicover, T. Wasow, & A. Akmajian (eds.), Formal Syntax. New York: Academic Press, –.
—— (). Rules and Representations.
—— (). Dordrecht: Foris.
—— (). Knowledge of Language: Its Nature, Origin, and Use.
—— (a). Mind: –.
—— (). The Minimalist Program. Cambridge, Mass.: MIT Press.
—— (). Linguistics and Brain Sciences. In A. Marantz, Y. Miyashita, & W. O'Neil (eds.). Cambridge, Mass.: MIT Press.
—— & HALLE, M. (). Cambridge, Mass.: MIT Press.
CHURCH, K. (). In R. Levine (ed.), Formal Grammar: Theory and Implementation.
CLAHSEN, H. (). Linguistics: –.
—— & PENKE, M. (). The Acquisition of Agreement Morphology and its Syntactic Consequences: New Evidence on German Child Language from the Simone Corpus. In J. Meisel (ed.). Dordrecht: Kluwer, –.
—— et al. (). Inflectional Rules in Children's Grammars: Evidence from the Development of Participles in German.
CLARK, E. (). Cambridge: Cambridge University Press.
CLARK, R. (). The Selection of Syntactic Knowledge.
—— & ROBERTS, I. (). Linguistic Inquiry: –.
COWIE, F. (). What's Within: Nativism Reconsidered.

CRAIN, S. (). Behavioral and Brain Sciences: –.
—— & McKEE, C. (). Acquisition of Structural Restrictions on Anaphora.
—— & NAKAYAMA, M. (). Language: –.
DEMUTH, K. (). Maturation and the Acquisition of Sesotho Passive.
DEPREZ, V., & PIERCE, A. (). Negation and Functional Projections in Early Grammar.
DRESHER, E. (). Linguistic Inquiry: –.
—— & KAYE, J. (). A Computational Learning Model for Metrical Phonology.
EDELMAN, G. (). New York: Basic Books.
ELMAN, J. (). Finding Structure in Time.
—— (). Machine Learning: –.
——, BATES, E., JOHNSON, M., PARISI, D., et al. (). Cambridge, Mass.: MIT Press.
EMBICK, D., et al. (in press). Brain and Language.
EMONDS, J. (). The Verbal Complex V′–V in French.
FASSI-FEHRI, A. (). Boston, Mass.: Kluwer.
FELIX, S. (). Cognition and Language Growth.
FISHER, C., & TOKURA, H. (). Acoustic Cues to Grammatical Structure in Infant-Directed Speech: Cross-Linguistic Evidence.
FODOR, J. A. (). Concepts.
—— (). Mind: –.
—— & PYLYSHYN, Z. (). Connectionism and Cognitive Architecture: A Critical Analysis.
FODOR, J. D. (). Unambiguous Triggers.
—— & CROWTHER, C. (). Understanding Stimulus Poverty Arguments.
FOX, D., & GRODZINSKY, Y. (). Children's Passive: A View from the By-Phrase.
FRANCIS, N., & KUČERA, H. (). Frequency Analysis of English Usage: Lexicon and Grammar. Boston, Mass.: Houghton Mifflin.
FRANK, R., & KAPUR, S. (). Linguistic Inquiry: –.
FREIDIN, R. (). Linguistic Theory and Language Acquisition: A Note on Structure-Dependence.

GALLISTEL, C. R. (). The Organization of Learning. Cambridge, Mass.: MIT Press.
GERKEN, L. A. (). Journal of Memory and Language: –.
GIBSON, E., & WEXLER, K. (). Linguistic Inquiry: –.
GILLIS, S., DURIEUX, G., & DAELEMANS, W. (). A Computational Model of P&P: Dresher & Kaye () Revisited. In M. Verrips & F. Wijnen (eds.), Approaches to Parameter Setting.
GLEITMAN, L. (). Cognition: –.
GOLD, E. M. (). Information and Control: –.
(). Cambridge, Mass.
GREENBERG, J. (). In J. Greenberg (ed.), Universals of Language. Cambridge, Mass.: MIT Press, –.
GUASTI, M. T. (). Geneva Generative Papers: –.
HAEGEMAN, L. (). Root Infinitives, Tense, and Truncated Structures.
HALLE, M. (). Word: –.
—— (). On Distinctive Features and Their Articulatory Implementation.
—— (). Proceedings of the North Eastern Linguistic Society: –.
—— (a). Some Consequences of the Representation of Words in Memory.
—— (b). In PF: Papers at the Interface. Cambridge, Mass.: MIT Working Papers in Linguistics, –.
—— (). The Stress of English Words –.
—— (). Trends in Cognitive Science: .
—— & MARANTZ, A. (). Distributed Morphology. In K. Hale & S. J. Keyser (eds.). Cambridge, Mass.: MIT Press.
—— & MOHANAN, K. P. (). Segmental Phonology of Modern English.
HALLE, M., & … (). MS, Massachusetts Institute of Technology and Yale University.
HOCKETT, C. (). The State of the Art.

HUANG, C.-T. J. (). Linguistic Inquiry: –.
HUBEL, D., & WIESEL, T. (). Journal of Physiology: –.
HYAMS, N. (). Language Acquisition and the Theory of Parameters.
—— (). In J. Weissenborn, H. Goodluck, & T. Roeper (eds.), Theoretical Issues in Language Acquisition: Continuity and Change in Development.
—— (). In H. Clahsen (ed.), Generative Perspectives on Language Acquisition.
—— & WEXLER, K. (). Linguistic Inquiry: –.
JAEGGLI, O., & SAFIR, K. (). In O. Jaeggli & K. Safir (eds.). Dordrecht: Kluwer, –.
JAKOBSON, R. (). Child Language, Aphasia and Phonological Universals.
JENKINS, L. (). Cambridge: Cambridge University Press.
KAYNE, R. (). French Syntax: The Transformational Cycle. Cambridge, Mass.: MIT Press.
—— (). Romance Clitics, Verb Movement, and PRO.
—— (). Cambridge, Mass.: MIT Press.
—— (). Oxford: Oxford University Press, –.
KEENAN, E. (). The Historical Creation of Reflexive Pronouns in English.
VAN KEMENADE, A. (). Dordrecht: Foris.
KIPARSKY, P. (). 'Elsewhere' in Phonology. In S. Anderson & P. Kiparsky (eds.), A Festschrift for Morris Halle.
—— (). In A. van Kemenade & N. Vincent (eds.). Cambridge: Cambridge University Press, –.
KOHL, K. (). An Analysis of Finite Parameter Learning in Linguistic Spaces. Master's thesis, Massachusetts Institute of Technology.
KROCH, A. (). Reflexes of Grammar in Patterns of Language Change.
—— (). In M. Baltin & C. Collins (eds.). Oxford: Blackwell, –.
—— & TAYLOR, A. (). Verb Movement in Old and Middle English: Dialect Variation and Language Contact. In A. van Kemenade & N. Vincent (eds.), Parameters of Morphosyntactic Change.
—— —— & RINGE, D. (). In S. Herring, P. van Reenen, & L. Schøsler (eds.), Textual Parameters in Older Languages. Amsterdam: Benjamins, –.
KUHL, P., WILLIAMS, K., STEVENS, K., et al. (). Linguistic Experience Alters Phonetic Perception in Infants by 6 Months of Age.

LABOV, W. (). Language: –.
LACHTER, J., & BEVER, T. (). The Relation between Linguistic Structure and Theories of Language Learning: A Constructive Critique of Some Connectionist Learning Models.
LEGATE, J. D. (). Was the Argument That Was Made Empirical? MS, Massachusetts Institute of Technology.
—— & YANG, C. D. (). Empirical Re-Assessments of Stimulus Poverty Arguments.
LEGATE, J. D., & YANG, C. D. (). Condition B is Elsewhere.
LEVINSON, S. (). Cambridge, Mass.: MIT Press.
LEVY, Y. (). The Development of a Mixed Null Subject System: A Cross-Linguistic Perspective with Data on the Acquisition of Hebrew.
LEWONTIN, R. (). The Organism as the Subject and Object of Evolution.
—— (). San Francisco, Calif.: W. H. Freeman.
LIGHTFOOT, D. (). Cambridge: Cambridge University Press.
—— (). How to Set Parameters. Cambridge, Mass.: MIT Press.
—— (). Shifting Triggers and Diachronic Reanalysis. In A. van Kemenade & N. Vincent (eds.), Parameters of Morphosyntactic Change.
—— (). Mind and Language: –.
—— (). The Development of Language: Acquisition, Change, and Evolution.
LING, C., & MARINOV, M. (). Answering the Connectionist Challenge: A Symbolic Model of Learning the Past Tense of English Verbs.
MACKEN, M. (). Journal of Linguistics: –.
—— (). Phonological Acquisition. In J. Goldsmith (ed.). Oxford: Blackwell, –.
MACNAMARA, J. (). Names for Things: A Study of Human Learning. Cambridge, Mass.: MIT Press.
MACWHINNEY, B., & LEINBACH, J. (). Cognition: –.
—— & SNOW, C. (). The Child Language Data Exchange System.

MARANTZ, A. (). Cambridge, Mass.: MIT Press.
MARATSOS, M. (). Journal of Child Language: –.
MARCUS, G. (). Negative Evidence in Language Acquisition.
—— ().
——, BRINKMANN, U., WIESE, R., et al. (). German Inflection: The Exception that Proves the Rule.
——, PINKER, S., HOLLANDER, M., & XU, F. (). Stanford, Calif.
——, SANTORINI, B., & MARCINKIEWICZ, M. (). Building a Large Annotated Corpus of English: The Penn Treebank.
MARLER, P. (). In S. Carey & R. Gelman (eds.). Hillsdale, NJ: Erlbaum, –.
MAYNARD SMITH, J. (). Evolutionary Genetics.
MAYR, E. (). Cambridge, Mass.
—— (). Cambridge, Mass.
—— (). Cambridge, Mass.
—— & PROVINE, W. (). Cambridge, Mass.
MOLNAR, R. (). Master's thesis, Massachusetts Institute of Technology.
MOONEY, R., & CALIFF, M. (). Journal of Artificial Intelligence Research: –.
MUFWENE, S. (). The Ecology of Language Evolution.
MYERS, S. (). Natural Language and Linguistic Theory: –.
NARENDRA, K., & THATHACHAR, M. (). Englewood Cliffs, NJ: Prentice-Hall.
NEWPORT, E., GLEITMAN, H., & GLEITMAN, L. (). Mother, I'd Rather Do It Myself. In C. E. Snow & C. A. Ferguson (eds.). Cambridge: Cambridge University Press, –.
NIYOGI, P., & BERWICK, R. (). MIT Artificial Intelligence Laboratory Memo No. .
—— —— (). Cognition: –.
NORMAN, M. F. (). Markov Processes and Learning Models.

OSHERSON, D., WEINSTEIN, S., & STOB, M. (). Cognition: –.
—— —— —— (). Systems that Learn. Cambridge, Mass.: MIT Press.
PAUL, H. (). Principles of the History of Language. McGrath.
PESETSKY, D. (). Zero Syntax: Experiencers and Cascades. Cambridge, Mass.: MIT Press.
PHILLIPS, C. (). Syntax at Age : Cross-Linguistic Differences. Cambridge, Mass.
PIATTELLI-PALMARINI, M. (). Cognition: –.
PIERCE, A. (). On the Emergence of Syntax: A Crosslinguistic Study.
PINKER, S. (). Cambridge, Mass.: Harvard University Press.
—— (). In L. Gleitman & M. Liberman (eds.). Cambridge, Mass.: MIT Press.
—— (). New York: Basic Books.
——, LEBEAUX, D., & FROST, L. A. (). Cognition: –.
—— & PRINCE, A. (). Cognition: –.
—— —— (). Regular and Irregular Morphology and the Psychological Status of Rules of Grammar. In S. Lima, R. Corrigan, & G. Iverson (eds.). Amsterdam: John Benjamins, –.
PINTZUK, S. (). Phrase Structure in Competition: Variation and Change in Old English Word Order. Ph.D. dissertation, University of Pennsylvania.
POEPPEL, D., & WEXLER, K. (). The Full Competence Hypothesis.
POLLOCK, J.-Y. (). Verb Movement, Universal Grammar, and the Structure of IP.
PRASADA, S., & PINKER, S. (). Generalization of Regular and Irregular Morphology.
PROSSNITZ, I. (). Cracow.
PULLUM, G. (). Learnability, Hyperlearning, and the Poverty of the Stimulus.
QUANG, PHUC DONG (). In A. Zwicky, P. Salus, R. Binnick, & A. Vanek (eds.), Studies Out in Left Field: Defamatory Essays Presented to James D. McCawley.
RANDALL, J. (). Linguistics: –.
(). Review of Elman et al. (). Journal of Child Language: –.
RIZZI, L. (). Null Objects in Italian and the Theory of pro.
—— (). Language Acquisition: –.
ROBERTS, I. (). Verbs and Diachronic Syntax: A Comparative History of English and French.
ROEPER, T. (). Bilingualism: Language and Cognition: –.
—— & ROHRBACHER, B. (). Null Subjects in Early Child English and the Theory of Economy of Projection. Philadelphia: Institute for Research in Cognitive Science, University of Pennsylvania.
ROSCH, E. (). Principles of Categorization: A Historical View. In E. Rosch & B. Lloyd (eds.), Cognition and Categorization.
RUMELHART, D., & McCLELLAND, J. (). On Learning the Past Tenses of English Verbs: Implicit Rules or Parallel Distributed Processing? In J. McClelland, D. Rumelhart, & PDP Research Group. Cambridge, Mass.: MIT Press.

SAFFRAN, J., ASLIN, R., & NEWPORT, E. (). Science: –.
SAKAS, W., & FODOR, J. D. (). The Structural Trigger Learner. In S. Bertolo (ed.). Cambridge: Cambridge University Press, –.
(). Language Acquisition: Growth or Learning? Philosophical Papers: –.
(ed.) (). New York: Academic Press.
SANO, T., & HYAMS, N. (). Proceedings of the North Eastern Linguistic Society: –.
SANTORINI, B. (). Variation and Change in Yiddish Subordinate Clause Word Order.
SAYE, T. (). Words, Rules, and Stems in the Italian Mental Lexicon. In S. Nooteboom, F. Weerman, & F. Wijnen (eds.). Boston, Mass.: Kluwer.
SEIDENBERG, M. (). Science: –.
SHLONSKY, U. (). Clause Structure and Word Order in Hebrew and Arabic.
SMITH, N. (). Cambridge: Cambridge University Press.
STAMPE, D. (). A Dissertation on Natural Phonology.
STROMSWOLD, K. (). PhD dissertation, Massachusetts Institute of Technology.
—— & ZIMMERMANN, K. (). Acquisition of Nein and Nicht and the VP-Internal Subject Stage in German.
TESAR, B., & SMOLENSKY, P. (). Learnability in Optimality Theory. Cambridge, Mass.: MIT Press.
THORNTON, R., & WEXLER, K. (). Cambridge, Mass.: MIT Press.
TORRENS, V. (). In MIT Working Papers in Linguistics . Cambridge, Mass.: MITWPL, –.
TRAVIS, L. (). Parameters and Effects of Word Order Variation.
ULLMAN, M., PINKER, S., LOCASCIO, J., et al. (). Abstract presented at the Annual Meeting of the Society for Neuroscience, Washington, DC.
VALIAN, V. (). Null Subjects: A Problem for Parameter-Setting Models of Language Acquisition.
—— (). Cognition: –.
(). Against a Metrical Basis for Subject Drop in Child Language.
WANG, Q., LILLO-MARTIN, D., BEST, C., & LEVITT, A. (). Null Subject vs. Null Object: Some Evidence from the Acquisition of Chinese and English. Language Acquisition: –.
(). Markedness vs. … Language Acquisition: –.
WEINREICH, U., LABOV, W., & HERZOG, M. (). Empirical Foundations for a Theory of Language Change. In W. Lehman & Y. Malkiel (eds.), Directions for Historical Linguistics: A Symposium.
WEVERINK, M. (). MA thesis, University of Utrecht.
WEXLER, K. (). Optional Infinitives, Head Movement, and the Economy of Derivation in Child Language. In D. Lightfoot & N. Hornstein (eds.), Verb Movement.
—— (). Lingua: –.
—— & CULICOVER, P. (). Formal Principles of Language Acquisition. Cambridge, Mass.: MIT Press.
WIJNEN, F. (). Verb Placement in Dutch Child Language: A Longitudinal Analysis.
XU, F., & PINKER, S. (). Weird Past Tense Forms.
YANG, C. D. (). Psychological Reality, Type Transparency, and Minimalist Sentence Processing.
—— (). Bilingualism: Language and Cognition: –.
—— (). Dig-Dug, Think-Thunk. London Review of Books ().
—— (). Panda's Thumbs and Irregular Verbs.
—— (in press). Parametric Learning and Parsing. In R. Berwick (ed.). Boston, Mass.: Kluwer.
—— & GUTMANN, S. (). Paper presented at the Conference of the Mathematics of Language, Orlando, Fla.
YIP, K., & SUSSMAN, G. (). Cambridge, Mass.: MIT Press.
—— —— (). Paper presented at the National Conference on Artificial Intelligence, Orlando, Fla.