Search for notes by fellow students, in your own course and all over the country.
Browse our notes for titles which look like what you need; you can preview any of the notes via a sample of the contents. Once you're happy these are the notes you're after, simply pop them into your shopping cart.
Title: Speech Science (Production and Perception)
Description: Notes from a university module in speech science. Divided into two subsections: Speech production (Topics 1-5) and speech perception (Topics 6-10). Ideally aimed at those taking a 1st or 2nd year module in speech science. Notes are comprehensive and extensive, covering major topics in the field of study.
Document Preview
Extracts from the notes are below, to see the PDF you'll receive please use the links above
PALS2002 – Speech Science
Lecture 1 – Acoustic-Phonetic Characteristics of Speech Sounds
Course content:
- Lectures
  o Speech Production
    Review: Acoustic-Phonetic characteristics of speech sounds
    Effect of talker sex and gender on speech production
    Effect of talker age on speech production
    Instrumentation for the acoustic and articulatory measurement of speech production
    Speaker-Listener interaction
  o Speech Perception
    The challenge of speech perception
    Perception of multimodal phonetic cues
    Perceptual warping and linguistic effects
    Development of speech perception
    Second language speech perception
- Labs
  o Speech Production
    Segmentation and annotation of consonants
    Acoustic analysis of consonants
    Analysis of child speech
    Analysis of spontaneous speech
  o Speech Perception
    VCV test
    Neighbourhood activation model
    Analysis of child-directed speech
Relation of PALS1004 to PALS2002:
- PALS1004: The phonetic description of speech production, the quantitative analysis of speech sounds, and sources of variation and variety in speech
- PALS2002: The basic concepts in speech acoustics and speech perception
...
Complex resonators have more than one resonance
Putting a signal through a complex resonator:
The same model applies for speech production! The human voice system can be viewed as a complex resonator excited by a sound source…
The vocal tract is a complex resonator…
- Cavity sizes are changed via movement of articulators, such as the tongue
- The source signal is vocal fold vibration, which produces the sound source for many speech sounds (vowels and some consonants)
- Women's and children's vocal folds are shorter and typically vibrate faster than men's
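The source-filter idea above can be sketched in code: a pulse train at F0 stands in for vocal fold vibration (the source), and a cascade of two-pole resonators stands in for the vocal tract cavities (the filter). All numbers below (formant frequencies, bandwidths, F0) are illustrative assumptions, not values from the notes.

```python
import math

def resonator(signal, freq, bw, fs):
    """One two-pole resonance: y[n] = a*x[n] + b1*y[n-1] + b2*y[n-2]."""
    r = math.exp(-math.pi * bw / fs)            # pole radius sets the bandwidth
    b1 = 2 * r * math.cos(2 * math.pi * freq / fs)
    b2 = -r * r
    a = 1 - b1 - b2                             # rough gain normalisation
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = a * x + b1 * y1 + b2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

fs = 16000
f0 = 120                                        # male-range fundamental (Hz)
n = fs // 10                                    # 100 ms of signal
# Source: glottal pulses at F0 (vocal fold vibration, crudely modelled)
source = [1.0 if i % (fs // f0) == 0 else 0.0 for i in range(n)]
# Filter: cavity resonances; these /ɑ/-like formant values are assumptions
vowel = source
for formant, bw in [(700, 90), (1200, 110), (2600, 160)]:
    vowel = resonator(vowel, formant, bw, fs)
```

Moving the articulators corresponds to changing the (formant, bandwidth) list: the same source passed through different resonances gives a different vowel quality.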
...
Speech patterns in spectrograms:
- Spectrograms are produced using a set of band-pass filters that approximate how the ear analyses sounds
- Many of the speech patterns seen on spectrograms are relevant for how we perceive speech sounds in terms of their voicing, place and manner of articulation
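The band-pass-filter idea can be sketched minimally: measure the energy in a handful of frequency bands for each short frame of signal. The band centres and frame sizes below are arbitrary choices; a real spectrogram uses many more, auditorily spaced bands.

```python
import math

def band_energy(frame, freq, fs):
    """Goertzel-style estimate of the energy near one frequency in a frame."""
    w = 2 * math.pi * freq / fs
    re = sum(x * math.cos(w * i) for i, x in enumerate(frame))
    im = sum(x * math.sin(w * i) for i, x in enumerate(frame))
    return re * re + im * im

def spectrogram(signal, fs, frame_len=256, hop=128,
                bands=(250, 500, 1000, 2000, 4000)):
    """Energy in a few frequency bands for each successive frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return [[band_energy(fr, b, fs) for b in bands] for fr in frames]

# A pure 1 kHz tone should light up only the 1000 Hz band
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(fs // 2)]
gram = spectrogram(tone, fs)
```

Each row of `gram` is one time slice; plotting the rows as columns of grey levels gives the familiar spectrogram picture.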
Characteristic speech patterns for different classes of sounds (brief review):
Vowels
- Formants as seen in spectrograms are bands of high energy
...
Therefore, there is no change in timbre
In diphthongs, articulators move during the production of the vowel, so formants change in frequency, which results in changes in timbre
Plosives
- Articulation
  o Involves a complete obstruction (manner of articulation) at a given place along the vocal tract (place of articulation)
  o Can be produced with or without voicing
  o Air from the lungs builds up behind the obstruction ('closure' period) and is suddenly released, causing a burst
- Acoustics
  o An important marker of whether the plosive is voiced or unvoiced: Voice Onset Time (VOT), i.e. the time between plosive burst and onset of vocal fold vibration
  o Voiced consonants
    Voicing starts less than about 30ms after the burst (short VOT)
    Little to no aspiration
    Voicing can occur during closure
    F1 transition at vowel onset
  o Unvoiced consonants
    Voicing starts more than about 50ms after the burst (long VOT)
    Aspiration visible during VOT
    Voicing does not occur during closure
    F1 cutback (F1 transition not visible)
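The VOT boundaries quoted above translate directly into a toy decision rule; the 30-50ms gap is left as an ambiguous region, as in the notes.

```python
def classify_vot(vot_ms):
    """Toy voicing decision from VOT alone, using the rough boundaries
    in the notes: voiced under ~30 ms, unvoiced over ~50 ms."""
    if vot_ms < 30:
        return "voiced"
    if vot_ms > 50:
        return "unvoiced"
    return "ambiguous"
```

Real listeners would resolve the ambiguous region using the other cues listed above (aspiration, voicing during closure, F1 transition).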
Fricatives
- Articulation
  o Involves a constriction (manner of articulation) at a given place along the vocal tract (place of articulation)
  o Can be produced with or without voicing
  o Air from the lungs becomes turbulent when passing through the constriction, and the noise produced is then shaped by the resonances in front of and behind the constriction
- Acoustics
  o Voiced fricatives are shorter and less intense than voiceless fricatives and can have voicing produced throughout the fricative (forming a 'voice bar')
Nasals
- Articulation
  o Involves an obstruction (manner of articulation) at a given place along the vocal tract (place of articulation), together with a lowered velum to couple in the nasal tract
  o Is always produced with voicing in English
  o Sound produced by vocal fold vibration is shaped by the nasal and oral resonances behind the constriction
- Acoustics
  o Nasal formants are also called the nasal murmur
Approximants
- Similar formant frequencies expected as in the respective vowels
- Similar transitions as in diphthongs, but faster/shorter
- No sudden spectral changes
Reminder! Connected discourse…
- Acoustic patterns in connected discourse are affected by:
  o Coarticulation
  o Speaking rate
  o Speaking style
  o Speaker characteristics
Lecture 2 – Effect of talker sex and gender on speech production
Why this distinction…
- Effect of biological sex on speech production
  o Acoustic-phonetic characteristics determined by the biological differences between male and female speakers
- Effect of gender on speech production
  o Acoustic-phonetic characteristics that are acquired and which reflect gender identity
Vocal folds: Biological differences
- Effect of differences in size of vocal folds and vocal tract
  o Vocal fold length: between 17mm and 25mm in adult males, and from about 12.5mm in adult females
  o Male vocal folds are longer and thicker, and vibrate more slowly as a result
    Mean F0 for adult males: 100-120 Hz
    Mean F0 for adult females: 200-220 Hz
When does this sex difference in F0 appear?
- Pre-puberty, very similar F0 characteristics in boys and girls
- Hormone-induced laryngeal changes at puberty
  o Testosterone causes many significant changes to the male voice
    Faster growth of the larynx than in women
    Thyroid cartilage changes shape and grows in size in boys (Adam's apple)
    Increases in the size and thickness of the vocal folds: vocal folds grow twice as fast in boys as in girls
But even for mean F0…
- There is evidence of learnt behaviours
  o Male and female means can differ across languages and cultures
- Pitch can be consciously manipulated to project certain traits
  o A lower-pitched voice is perceived as more dominant and authoritative
  o A lower pitch register is typically used by female newscasters
  o An Australian study of young women recorded in 1945 and 1993: significant lowering of mean F0 by 23 Hz over 50 years (Klofstad et al
...
a 15% difference in average vocal tract length
- But notice also the overlap in the distribution of vocal tract length…
  o Vocal tract length is only partially determined by speaker sex
- Men and women also show differences in vowel spacing (imagine a vowel quadrilateral)
But also factors other than anatomical factors…
- Cross-linguistic variation in how male and female talkers differ from each other
- This might indicate that cultural factors are involved in defining and shaping male or female speech
- And thus that anatomy does not completely determine the vowel formant frequencies
Differences in articulation rate between male and female speakers
- See: Tuomainen and Hazan (2016)… #UCL
Articulation rate in conversational speech in interaction:
Socio-phonetic factors can also cause gender differences
- Differences in acoustic-phonetic characteristics of speech for female and male speakers can be caused not by physiological differences but by the use of learnt 'gender markers'
  o e.g. Girls learn to produce a much more sibilant /s/ sound than boys, or than would be predicted purely from differences in vocal tract size (Romeo, Hazan and Pettinato, 2013)
- Women can adopt certain phonetic variants to a greater degree than men
Another example of sociophonetic effect…
- A comparison of boys with and without Gender Identity Disorder (aged 5-13, n=30)
- Boys with typical gender development rated as sounding more masculine, likely due to acquired gender-specific speech traits (Munson et al
...
By 7 months, rhythmic jaw oscillations emerge, so reduplicated canonical syllables become possible
... , 2008)
- Predominance of 'modal' phonation
- Many 'non-modal' types of infant phonation
  o Vocal fry
  o 'Loft' register
  o Bi-phonation
- Three early categories of vocalisation
  o Vowels (mid-pitch, full vowels, or quasivowels)
  o Squeals (high pitch)
  o Growls (either low in pitch or mid-pitch, with harsh vocal quality)
- First contrastive vocal categories, so an important basis for future speech development
Five stages in vocal development
- Reflexive phonation (0-2m)
  o Vegetative or reflexive sounds… crying, coughing, sneezing
- Cooing (1-4m)
  o Quasivocalic sounds
- Expansion (3-8m)
  o Clear vowels, wider variety of sounds
- Canonical babbling (5-10m)
  o Reduplicated CV syllables
- Meaningful speech (10-18m)
Babble: Early stages
- Early babble subject to biological constraints
  o Earliest stable consonants: stops and nasals (Locke, 1983; McCune and Vihman, 2001), involving simple raising and lowering of the jaw
  o Frame/Content theory of early speech organisation (Davis and MacNeilage, 1995): early speech dominated by cycles of mandibular oscillation (the 'frames'); starting tongue position determines both consonant and vowel
- Early stages of babbling fairly 'universal'
  o The 12 most frequent consonants of the world's languages comprise 95% of all consonants produced in early babbling (Fromkin, 2001)
  o These are: /p, b, t, m, d, n, k, g, s, h, w, j/ (Locke, 1983)
Next stage of babble development
- Gaining greater voluntary control over consonant articulation
  o Concept of 'vocal motor schemes' (VMS)
  o Definition (McCune and Vihman, 2001) of onset of VMS: 'production of ten or more occurrences of a given consonant in each of three out of four successive 30-minute observational sessions'
  o Consistency and stability in production
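McCune and Vihman's quoted criterion can be made concrete: given per-session token counts for one consonant, check whether any window of four successive sessions contains at least three sessions with ten or more tokens. The counts below are hypothetical.

```python
def has_vms(session_counts, tokens=10, sessions_needed=3, window=4):
    """Vocal motor scheme criterion: >= `tokens` occurrences of a consonant
    in at least `sessions_needed` of `window` successive sessions."""
    return any(
        sum(c >= tokens for c in session_counts[i:i + window]) >= sessions_needed
        for i in range(len(session_counts) - window + 1)
    )

# Hypothetical per-session counts for /b/ across four 30-minute sessions
print(has_vms([12, 11, 2, 10]))   # True: 3 of the 4 sessions reach 10 tokens
print(has_vms([12, 2, 3, 9]))     # False
```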
Importance of consonant development
- Development in production marked by an increase in the proportion of vocalisations including a consonant
  o Consonant use in pre-linguistic vocalisations is the most useful predictor of:
    Onset of speech (Menyuk et al
...
(2003)
- Observed vocalisations in free play between mothers and 8-month-old infants
- In the 'yoked social response group,' the mother's response was not contingent on the infant's vocal behaviour
- In the 'contingent social response group,' the mother directly responded to the baby's vocalisations
- 'Contingent social interactions increased the proportions of vocalizations that had more mature voicing, syllable structure, and faster (canonical) consonant–vowel transitions
...
, 2010):
o 'Massive scale recording'
  1,500 all-day soundtracks from over 200 children, aged 10 months to 4 years, with recorders worn in a chest pocket of the children's clothing
o Move away:
  From short and infrequent samples to ongoing recordings of infant vocalisations
  From time-consuming manual classification to automatic classification of vocalisations
o Significant differences in voice-related parameters (such as voicing, canonical syllables, pitch, speech islands etc
...
, 2006): Neuroplasticity!
o Executive function
o Social cognition
  Social perception
  Perspective taking (ToM)
Evidence of greater variability in speech production in children
- Can be examined by looking at speech motor control directly (e.g. Walsh and Smith, 2002) or at the acoustic-phonetic realisations of speech productions (e.g. Lee et al
...
(2014): Development of clear speech strategies in 9 to 14 year olds
- Acoustic-phonetic measures show different maturational patterns
  o Children up to 15 years have higher median pitch than adults, mostly due to ongoing physiological changes
  o Higher mid-frequency energy than adults is seen up to approx
...
Study of variance in acoustic-phonetic measures
Lee et al
...
'Thus, we predict that speech motor control processes follow a protracted developmental time course and are influenced by sex-related differences in somatic growth'
...
e.g. Impact of regional (and social) mobility – change in accent and vowel production of students from Northern England going to Cambridge University…
What physiological effects occur at the other end of the age range (elderly speakers)?
Xue and Hao (2003)
- Source
  o Vocal folds become thinner and weaker
  o Cartilages in the larynx become less mobile
  o In women, vocal folds become thicker after the menopause, causing a sharp decrease in pitch
  o Increased mucus can cause lowering of pitch and an increase in irregular vibration
  o Greater irregularity in vocal fold vibration
- Filter
  o Increase in oral cavity length and volume in elderly speakers
  o Decrease in articulation rate
Some acoustic-phonetic consequences of ageing
Leeper and Culatta (1995)
Acoustic effects on filter
- Lowering of formant frequencies
- Less expanded vowel space due to reduced articulation
- Slower speech rate
- Greater proportion of disfluencies
Can we see these effects if looking at the same speakers over time?
- Yes
... , 2010)
Reduction in motor flexibility for complex words
- Older speakers can show reduced motor flexibility
- 65-73 year olds make more errors when repeating complex nonwords than young adults, although there is no difference for shorter words
What may be the impact of poor speech production on speech communication in older speakers?
- Given that most communication may be within the same age group, weak voicing makes the speech even harder to understand
- Most older speakers also have age-related hearing loss
- Speaker-listener interaction is likely to be problematic in many cases
Lecture 4: Speaker-Listener Interaction
This lecture:
- Different approaches to speech research
- Lindblom (1990): Hyper-Hypo model of speech production
- Use of different speaking styles in speech communication
- Notion of phonetic accommodation
- An example of a 'hyper-speech' speaking style: infant-directed speech – speaker-listener interaction in speech development
Traditional approach to speech research
Separate studies of speech perception and production
- Speech Perception
  o Word/sentence intelligibility tests
  o Categorical perception tests
- Speech Production
  o Acoustic-phonetic analyses of minimal pairs, words, sentences…
Speaker-Listener Interaction
- In interactions, interlocutors attempt to achieve a shared understanding of the dialogue
  o To ensure mutual understanding, speakers may need to take into account their interlocutor's needs
- Speech production may be affected by:
  o Listener's listening environment
  o Listener's characteristics
  o Feedback from the interlocutor
- e.g. Aubanel and Cooke (2013), Pickering and Garrod (2004, 2013)
Model of speech production that considers speaker-listener interaction – Lindblom (1990): Hyper-Hypo ('Adaptive Variability') Model of speech production
- Variation in speech production is viewed as continuous adaptation of speech production to the demands of the communicative situation
- Guiding principle: minimising articulatory effort without sacrificing communication efficiency
Hypo vs Hyper Speech…
- Hypo Speech
  o Situations requiring a low degree of clarity
  o Interlocutor shares the same language knowledge
  o High contextual information
  o Poor acoustic distinctiveness
  o Little communicative effort
- Hyper Speech
  o Situations requiring a high degree of clarity
  o 'Language-impoverished' interlocutor
  o Degraded listening environment
  o Low contextual information
  o Good acoustic distinctiveness
  o Greater communicative effort
- Speech is produced somewhere along the 'hyper-hypo' continuum
...
(2002)
- Acoustic-phonetic characteristics
  o Lower speech rate, enhanced segmental contrasts (vowel space, VOT, etc
...
(2014)
- Acoustic/environmental barriers: finely-tuned adaptations to the type of noise/degradation
  o Reduction in temporal overlap when talking in the presence of modulated maskers
  o Shifts in spectral energy according to the frequency of background noise
- Linguistic barriers
  o Infant-, child-, foreigner- and pet-directed speech show different combinations of acoustic-phonetic and linguistic characteristics
Real communications in difficult conditions: Granlund (2015)
Do normal hearing (NH) children make adaptations when speaking to hearing impaired (HI) children?
Are HI children able to make adaptations too when speaking to HI peers, despite their delayed speech development?
- Speech produced with communicative intent (dialogue)
- 'Ecologically-valid' situation: problem-solving task imposing a certain cognitive load
- Involves speakers who are used to regularly communicating with HI peers
- Direct comparison of NH and HI children on the same task
Grid task (Granlund, 2015)
- To elicit several repetitions of keywords in a communicative situation
  o Production and perception of target sounds
- To elicit misunderstandings
  o Problems will be more explicit
  o Problems will need to be resolved
Global acoustic-phonetic strategies:
- HI speakers:
  o Speak more slowly
  o Speak more loudly
  o Have a wider F0 range than NH speakers
- In HI-directed speech, both NH and HI speakers:
  o Slightly decrease speech rate
  o Increase the intensity of their speech (more so for NH speakers)
  o Increase F0 range
Notion of ‘phonetic accommodation’ (also called alignment, imitation, convergence)
...
'Model speech' approach:
- Record a set of 'model speakers'
- Choose participants who vary, e.g. in their regional accent, relative to the models
- Pre-test recordings: participants record a set of sentences
- Exposure: participants listen to sentences produced by the 'model speakers'
- Post-test recordings: participants re-record the same set of sentences as in the pre-test
- Key analysis: is the speech of the participants more similar to the model speakers in the post-test?
  o Behavioural tests using AXB discrimination: is A or B more similar to X (the model)?
  o Acoustic analyses of vowel formants, vowel duration, F0, etc
...
, 2009; 2015)
- Recording of approx
...
' Gives the best quality recording
- 'Clipping' as a result of the signal being too loud… results in the signal being distorted
Tips for making a high quality recording…
- If possible, use an external microphone rather than the internal microphone of the PC, tablet etc
...
102,457 words, 139,751 syllables
...
Measuring lingual articulation
- Ultrasound is used to trace the position and movement of the tongue during speech production
- Applications? Investigations of:
  o Co-articulation phenomena
  o Development of articulatory control in children
  o Pathological speech
Measuring tongue contact
- Electropalatography (EPG): measures the position and duration of tongue contact with the hard palate during speech production
Electromagnetic articulography (EMA)
- Induction coils are used to create a magnetic field
- Sensor coils are placed on the tongue, jaw and other articulators
- Measurement of the movement of the sensor coils in space
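As a sketch of what EMA data analysis looks like: given a sequence of (x, y) coil positions, the frame-to-frame displacement (and hence articulator speed, given the sampling interval) is simple geometry. The positions below are invented for illustration.

```python
import math

def displacements(positions):
    """Euclidean distance moved by a sensor coil between successive samples."""
    return [math.dist(p, q) for p, q in zip(positions, positions[1:])]

# Invented 2D positions (mm) for a tongue-tip coil over four samples
tongue_tip = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0), (6.0, 8.0)]
steps = displacements(tongue_tip)   # [5.0, 0.0, 5.0]
```

Real EMA systems track several coils in three dimensions at hundreds of samples per second, but the derived measures (displacement, velocity, movement duration) follow the same logic.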
SUMMARY
- Importance of high quality recordings for speech production research
- Important factors for high-quality recordings (sampling rate, SNR)
- Equipment for high quality recordings
- Tasks for speech elicitation
- Analytic studies vs large corpora ('big data')
- Why articulatory measurements may be useful
- Some of the instruments used for articulatory measurements in the clinic and for research
Lecture 6 – The Challenge of Speech Perception
A simple model of speech recognition:
The following lectures:
- 6: Why speech perception is a challenge, and more about acoustics
- 7: Low-level solutions
- 8: Lip reading, plus high-level solutions (i.e. linguistic processing)
- 9: Development of speech perception
- 10: Second-language speech perception
Ways in which speech perception is challenging
- "Phonemes are not like beads on a string" – Hockett
- The segmentation issue: no clear markers for the beginning or end of words or phonemes
- "We speak to be understood, and only to be understood" – Passy
- Between-talker variability
- Noise and channel effects
- No necessary or sufficient features
Co-articulation
- The acoustic realization of phonemes is affected by neighbouring sounds
- The production of phonemes interacts with the production of neighbouring phonemes
- The acoustic cues are broadly distributed in time
- Makes segmentation hard, because there are no clear dividing lines where the cues for one phoneme end and the next begin
- Listeners must somehow learn that two acoustically different phonemes are meant to sound the same
Hyper- and hypo-articulation of vowels
- Hyperarticulation: vowels produced in more extreme locations when we need to speak clearly
- Hypoarticulation: vowels produced in more central locations when we do not need to speak clearly
Between-talker variability
Female and male talker variability: vowel spaces and vocal tract size
Kent and Read (2002)
- Formant frequencies are higher and the vowel space is larger for shorter vocal tracts
- The same formant frequencies will be different vowels for different talkers
Another view on vowel spaces and vocal tract size: these are formant frequency differences, not differences in pitch or fundamental frequency (Peterson and Barney, 1952)
Also, differences in vowel spacing as a function of accent (Iverson and Evans, 2003)
Some sources of talker variability:
- The acoustic form of speech varies a great deal because of:
  o Anatomy (e.g. vocal tract size)
  o Accents
  o Chosen speaker gestures
No necessary or sufficient features
Sinewave speech:
- Formants can be replaced with sinewaves:
  o Completely unnatural source
  o No 'normal' acoustic cues
Cochlear implant simulations:
- Speech can be replaced by bands of noise:
  o Completely unnatural source
  o Poor spectral information
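The cochlear-implant-simulation idea (bands of noise modulated by the speech envelope) can be caricatured in a few lines: measure each band's amplitude frame by frame, then drive noise with those amplitudes. Band centres and frame length are assumptions, and a real simulation would band-limit the noise carriers, which this toy sketch omits.

```python
import math
import random

def band_amp(frame, freq, fs):
    """Amplitude of `frame` near `freq` (single-frequency DFT probe)."""
    w = 2 * math.pi * freq / fs
    re = sum(x * math.cos(w * i) for i, x in enumerate(frame))
    im = sum(x * math.sin(w * i) for i, x in enumerate(frame))
    return math.sqrt(re * re + im * im) / len(frame)

def noise_vocode(signal, fs, centres=(300, 800, 1800, 3400), frame_len=160):
    """Replace the fine structure with noise scaled by per-band envelopes."""
    random.seed(0)
    out = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        amps = [band_amp(frame, f, fs) for f in centres]
        out.extend(sum(a * random.uniform(-1.0, 1.0) for a in amps)
                   for _ in range(frame_len))
    return out

fs = 8000
vowel_like = [math.sin(2 * math.pi * 300 * i / fs) for i in range(800)]
vocoded = noise_vocode(vowel_like, fs)
```

Despite the destroyed fine structure, listeners can often still recognise such speech, which is exactly the point made above: no single acoustic cue is necessary.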
Summary of the challenges
- The cues for phonemes are mixed, not like "beads on a string"
- Large within-talker variability; talkers can say the same thing in different ways
- Large between-talker variability
- Noise can obscure acoustic cues
- No acoustic cues are necessary or sufficient
Despite all this variability
- High speed of processing
  o 25-30 phones a second
- Amazing ability to 'normalise' across speakers
  o Example of a sentence with every sound spoken by a different speaker
How do we do it?
- No one knows the full picture
- The rest of the lectures will be about the parts that we do know:
  o Phonetic perception
  o Linguistic effects
  o Development and 2nd language acquisition
Lecture 7 – The Perception of Multimodal Phonetic Cues
Summary of how speech perception is challenging:
- The cues for phonemes are mixed, not like "beads on a string"
- Large within-talker variability: talkers can say the same thing in different ways
- Large between-talker variability
- Noise can obscure acoustic cues
- No acoustic cues are necessary or sufficient
Next two lectures: how we meet the challenge:
Phonetic Information > Perception > Lexicon
- Perception (low level)
  o Auditory perception (today and next week)
  o Visual perception (today)
- Language Structure (high level)
  o Linguistic processes (next week)
How do we meet the challenge of speech perception?
Part 1: Using Multiple Acoustic Cues
Example of multiple acoustic cues: production of voiced plosives
- Step 1: Complete closure of the vocal tract
  o Blocks the flow of air through the oral cavity (impeding vocal fold vibration) and absorbs acoustic energy
  o Produces the stop gap and voice bar
- Step 2: Release of the closure
  o Air rushes out through the opening
  o When only partially open, frication energy (i.e. noise due to turbulence) is produced
    Seen as a burst on a spectrogram
- Step 3: Onset of voicing
  o Air pressure is released, so the vocal folds can vibrate again
  o Articulators move into position for the next phoneme
    These articulator movements change the resonant frequencies, which are seen as formant transitions
A single articulation thus has many acoustic consequences
- Stop gap: plosive manner (wide frequency range)
- Voice bar: voicing (low frequencies)
- Burst: place of articulation (variable frequency)
- F2 and F3 formant transitions: place of articulation
- Voice Onset Time: voicing (wide frequency range)
- F1 formant transition: voicing and manner
Listeners pay attention to multiple cues (pieces of evidence about the articulation), rather than relying on one single cue
- Multiple cues mean that some information is redundant (i.e. several cues mean the same thing)
- This is a big advantage for speech perception: if we cannot hear all of the cues, we may still be able to recognise speech based on what we can hear
  o Cues can be knocked out by noise or hearing impairment
  o Talkers do not always produce all of the cues clearly (i.e. they speak to be understood)
  o Talkers with different accents may produce a different set of cues than you expect to hear
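The redundancy point can be illustrated with a toy cue-combination rule: each cue contributes an assumed weight of evidence for "voiced", so the decision can survive one cue being masked by noise. The weights and the 0.5 threshold are invented for illustration.

```python
# Assumed evidence weights for three voicing cues (illustrative only)
CUE_WEIGHTS = {"voice_bar": 0.4, "short_vot": 0.4, "f1_transition": 0.2}

def sounds_voiced(heard_cues, threshold=0.5):
    """Sum the weights of whichever cues survived; missing cues add nothing."""
    return sum(CUE_WEIGHTS.get(c, 0.0) for c in heard_cues) > threshold

all_cues = {"voice_bar", "short_vot", "f1_transition"}
noisy = {"short_vot", "f1_transition"}   # voice bar masked by noise
```

With all cues present the evidence is strong; with the voice bar masked, the remaining cues still carry the decision, which is the sense in which redundant cues make perception robust.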
Just because a cue is available to listeners, it does not mean that they use it for speech recognition
- Listeners weight cues differently (i.e. depend on them to greater or lesser extents)
- Cue weightings develop during infancy and childhood, as the individual works out what combination of cues gives the 'right' answer
- Acoustic analyses tell us what cues are in the signal, but not how someone uses them to recognise speech
  o For that, we need to run perceptual tests
How do you determine what acoustic pattern information is important for perception?
- Why not just use natural speech?
  o Good for measuring real-world performance, but does not give us much control over the acoustic variation
- Use of synthetic speech
  o Controls acoustic cues
- Use of 'controlled' tests
  o Evaluate the perceptual effect of one or more speech patterns
  o Use a 'speech continuum'
How do you determine how people use an acoustic cue?
- Construct a 'speech continuum' that varies a particular acoustic cue
- Then test whether the cue affects identification
  o Is this /ra/ or /la/?
Labelling graph
- Cues that are most important to listeners (primary cues) will have categorical labelling functions
- Cues that are less important to listeners (secondary cues) will have more progressive labelling functions
- Cues that are unimportant to listeners will have random or flat labelling functions
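The three labelling-function shapes can be sketched with a logistic curve whose slope stands in for cue importance: a steep slope gives the categorical function of a primary cue, a shallow slope the progressive function of a secondary cue, and a zero slope a flat function. The boundary position and slope values are arbitrary.

```python
import math

def p_label(step, boundary, slope):
    """Probability of responding (say) /la/ at one step along a cue continuum."""
    return 1.0 / (1.0 + math.exp(-slope * (step - boundary)))

steps = range(1, 10)
primary   = [p_label(s, 5, 4.0) for s in steps]   # categorical: abrupt switch
secondary = [p_label(s, 5, 0.8) for s in steps]   # progressive: gradual change
unused    = [p_label(s, 5, 0.0) for s in steps]   # flat: cue ignored
```

In a real experiment the curve would be fitted to listeners' identification responses at each continuum step, and the fitted slope compared across cues.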
Why is it important to know how people perceive acoustic cues?
- Scientific reasons
  o Understand how speech perception works
  o Understand how language experience shapes perception
- Clinical applications
  o Enhancement of acoustic cues for hearing-impaired children
  o Evaluation of the cause of peripheral difficulties (e.g. children with SLI)
Part 2: Using Lip-reading Cues
Information provided by visual cues
- Global (non-phonetic) information
  o WHO is speaking
  o WHERE the person is speaking
  o WHEN the person is speaking
  o HOW (facial expression)
- Segmental information
  o Phonetic cues
Segmental information
- We are good at perceiving place information through lip-reading
  o Harder to perceive place information for articulations inside the oral cavity
- Voicing information is invisible
  o Cannot see the vocal folds vibrating
- Manner information is partially visible
  o Can see some differences in timing and lip rounding
  o Cannot see the lowered velum (nasals)
- Because place and manner are hard to see, there are groups of phonemes that essentially look the same
  o /b/ and /p/
  o /d/, /t/, /s/, /z/, /n/, /l/
- Visemes: groups of phonemes that look the same during lipreading
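The viseme groupings listed above can be expressed as a mapping, under which phonetically different strings become indistinguishable to a lipreader. Only the two groups from the notes are included; a real viseme inventory covers the whole phoneme set.

```python
# Viseme classes from the notes, labelled by one representative member
VISEME = {}
for group in (["b", "p"], ["d", "t", "s", "z", "n", "l"]):
    for phoneme in group:
        VISEME[phoneme] = group[0]

def looks_like(phonemes):
    """Collapse a phoneme sequence to its viseme sequence (vowels left as-is)."""
    return [VISEME.get(p, p) for p in phonemes]

# "bat" and "pad" collapse to the same viseme string
same = looks_like(list("bat")) == looks_like(list("pad"))
```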
Audio/Visual integration
- Complementarity between auditory and visual cues
- Speech perception is multimodal
  o e.g. Audition + Vision, Audition + Touch
- Notion of 'superadditivity': speech perception with 2 sources of information is often better than predicted on the basis of intelligibility for each source alone
Visual/Auditory cue integration
- Issue: is visual information used as a backup, or are both audio and visual cues evaluated before a label is given to the sound?
- Test? McGurk experiments!
The McGurk Effect
- Visual /g/ + Auditory /b/ = Perceived /d/
  o Visual cues
    Tell us the consonant could not be /b/ (because /b/ is so visible)
    But it could be either /d/ or /g/ (hard to distinguish articulators in the mouth)
  o Auditory cues
    Sounds like /b/
    But /d/ is the next closest consonant
  o We perceive the combination as /d/
    It is the only phoneme that is possible given this combination of auditory and visual cues
    We perceive this combination even though the auditory information is not ambiguous
- The brain does not expect differing information from different sensory modalities, hence this effect
...
, 1957)
- Categorical Perception: more accurate at discriminating stimuli at category boundaries than within categories
  o Almost as if we perceive the stimuli in terms of their category labels
- Opposite of continuous perception
  o Same sensitivity for all acoustic differences
A two-dimensional view of perceptual warping: /r/ and /l/
Iverson et al
...
We do not need to hear all phonemes perfectly
...
- We activate "neighbourhoods" of phonetically similar words during speech perception
  o e.g. Words that differ by one phoneme
- We make a guess about which word in the neighbourhood matched the input
  o We use lexical frequency in our guess
- Easy words are high frequency and have few neighbours
...
They need a lot of phonetic information to be recognised
  o e.g. "Hack" or "Vat"
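A toy version of the neighbourhood computation: find lexicon entries that differ from the input by one substituted segment. The full Neighbourhood Activation Model also counts additions and deletions and weights words by frequency; the mini-lexicon below is invented.

```python
def neighbours(word, lexicon):
    """Words differing from `word` by exactly one substituted segment."""
    def one_substitution(a, b):
        return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1
    return sorted(w for w in lexicon if one_substitution(word, w))

LEXICON = ["hat", "cat", "hit", "hot", "ham", "dog", "hatch"]
hat_neighbours = neighbours("hat", LEXICON)   # a dense, hence "hard", word
```

A word with many neighbours ("hat") needs more phonetic detail to pin down than one with few ("dog"), which is the easy/hard word distinction above.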
Effects of grammatical structure
- Grammatical sentences
  o Gadgets simplify work around the house
- Semantically anomalous sentences
  o Gadgets kill passengers from the eyes
- Ungrammatical sentences
  o Gadgets accidents country honey the shoot
Implications for the design of speech audiometry materials
- Not all words and sentences are equally easy to recognise
  o Need to compare people using standardised tests
  o Bad idea to test people repeatedly on the same test, because they learn the materials and learn to guess
- Different tests use context to different degrees, so it is important to know what level you want to test
  o Use of phonetic information?
  o Use of semantic context?
  o "Real-world" speech recognition skills?
VCV Test
- Excellent for measuring what phonetic information we perceive
- Does not measure any kind of higher-level processing (e.g. lexical, semantic, syntactic effects)
SPIN Test
- Tells us how well people can make use of semantic information, but tells us less about the use of phonetic information
"Global" Tests
- Text comprehension
  o Presentation of paragraph-level material followed by a set of open or closed questions
- Connected Discourse Tracking
  o How many words in a passage can you "transmit" to a listener per minute?
- Sentence verification task
  o Reaction time for "true/false" responses to sentences such as "Mud is dirty" and "Rockets move slowly"
Levels of assessment (from analytic to global)
- Analytic
  o Far from "normal communication"
  o Word level
  o Provides reliable information about the use of acoustic information
- Global
  o Close to "normal communication"
  o Sentence/paragraph level
  o Cannot reliably be used to evaluate the use of acoustic information
Summary of linguistic effects
- We use our knowledge of language to help guess what was said
  o e.g. Lexical neighbourhoods, lexical frequency, semantic and grammatical probabilities
- The guesses "constrain" the amount of phonetic information that we need to perceive
  o This helps us solve the challenge of speech perception because we do not need to hear every cue perfectly
- We can use different clinical tests to examine speech perception at global or analytic levels
Lecture 9: Development of Speech Perception
Theoretical perspective: Nativism vs Empiricism
- B.F. Skinner (1957)
  o People learn language through 'operant' conditioning
  o Stimulus-Response-Reward shapes behaviour
  o e.g. A rat learning to press a bar
  o Empiricism: all knowledge is a result of experience
- Chomsky (1957)
  o Innate abilities to process the world's languages
  o Development is a process of learning the characteristics of one's native language
  o Poverty of the stimulus: the language input alone is not systematic enough to learn from
The current view… somewhere in between
- Eimas et al. (1971) found that babies could categorically perceive consonants
- Kuhl and Miller (1978): monkeys, gerbils, starlings, budgies etc. also show categorical perception
o This may suggest that the abilities Eimas et al. observed are not specific to human speech
The abilities of infants change over the first year of life…
- Kuhl et al.: Loss/gain of ability to hear non-native contrasts around 6-12 months of age
Infant EEG Conclusions
- Technique allows for the collection of about 3000 stimulus trials per infant (enough for 30 pairs) in 15 minutes of testing
- Response seems acoustically driven at the youngest ages
- At later ages, pairwise increases in sensitivity, particularly for the closest pairs
- Suggests that categories may not be learned as single entities
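As a sanity check on the figures above, the implied per-pair trial count and presentation rate work out as follows (a quick sketch; the even split of trials across pairs is my assumption, not stated in the lecture):

```python
trials = 3000    # stimulus trials collected per infant
pairs = 30       # stimulus pairs covered
minutes = 15     # total testing time

trials_per_pair = trials // pairs             # 100 trials for each pair
trials_per_second = trials / (minutes * 60)   # ~3.33 stimuli per second

print(trials_per_pair, round(trials_per_second, 2))  # 100 3.33
```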
CONCLUSIONS
- Infants tune into the sounds of their language during the first year of life, and this process continues into childhood and beyond
- This specialization for language occurs at multiple levels of processing (e.g. perceptual, phonological, lexical)
- Broad implications for understanding speech in noise etc.
- Infants tend to imitate adults
- As they get older, their imitation of adult vowels gets better
- Likely helps reinforce auditory perceptual learning
Kuhl & Meltzoff (1996)
How do we help with statistical learning and imitation? Child-directed speech … (1997)
Doupe & Kuhl (1999)
- Perception and production of speech develop together
Issue 2: Is there a loss of plasticity for learning over time?
- Informal observations…
o Children learn new languages with little effort and can learn to speak without a strong accent
o Adults must work hard to learn new languages and speak with a stronger native accent
o Possible link to changes in the brain around puberty?
- However, science tells us that this conclusion is not entirely true
Sentence rating for L2 speakers with different ages of arrival - Flege (1998)
- We do not simply lose the ability to learn a second language at puberty
o Loss of plasticity is gradual
- Adults can continue to learn
o Long-term experience can help
o Computer-based training works
- Possible social factors
- Some (?) changes in normal plasticity
And L2 phonological systems
Do all second language phonemes get more difficult with age? Best et al.
- In English, the voicing contrast is cued by longer voice onset times than in French
Categorical Perception
- Categorical perception: more accurate at discriminating stimuli at category boundaries than within categories
o Almost as if we perceive the stimuli in terms of their category labels
- The opposite, continuous perception:
o Same sensitivity for all acoustic differences
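The boundary effect can be sketched numerically: if listeners label a continuum (e.g. voice onset time) through a sigmoidal identification function, and discrimination is driven by the difference in labels rather than raw acoustics, then pairs straddling the boundary are predicted to be far easier to discriminate. This is an illustrative toy model; the 30 ms boundary, logistic slope, and function names are my assumptions, not from the lecture:

```python
import math

def p_voiceless(vot_ms: float, boundary: float = 30.0, slope: float = 0.5) -> float:
    """Probability of labelling a stimulus 'voiceless' given its VOT:
    a logistic (sigmoid) identification function."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary)))

def predicted_discrimination(vot_a: float, vot_b: float) -> float:
    """Categorical-perception prediction: discriminability of a pair is
    driven by the difference in category labels, not raw acoustics."""
    return abs(p_voiceless(vot_a) - p_voiceless(vot_b))

# Pairs 10 ms apart along the continuum: only the pair straddling
# the 30 ms boundary is predicted to be easy to discriminate.
for a in (0, 10, 20, 25, 40, 50):
    print(a, a + 10, round(predicted_discrimination(a, a + 10), 3))
```

Within-category pairs (e.g. 40 vs 50 ms) come out near zero while the boundary-straddling pair (25 vs 35 ms) approaches 1, matching the boundary advantage described above; continuous perception would instead give every 10 ms pair the same score.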
Assimilation types and predicted discrimination:
- Single Category: Poor
- Two Category: Excellent
- Category Goodness: Moderate to very good
- Non-Categorised: Poor to very good
Interaction between the native and second language phonetic subsystems – Flege et al.
See: Lively et al. (1996)
- Naturalistic variability and ID training improve…
o Identification of stimuli from new talkers and words in multiple phonetic positions
o Production
o And improvements are retained over time
CONCLUSIONS
- Infants tune into the sounds of their language during the first year of life, and this process continues into childhood and beyond
- This specialization for language occurs at multiple levels of processing (e.g. perceptual, phonological, lexical)
- First language specialization actively interferes with second-language phoneme learning
o Categories existing in the same space
o Perceptual interference
o Social factors
- Broad implications for understanding speech in noise etc.