The range of hearing
Being able to hear is unremarkable: powerful sounds shake the body and can be detected even by single-celled organisms. But being able to hear as well as we do is little short of miraculous: we can quite easily detect a sound which delivers a power of 10⁻¹⁵ watts to the eardrums, despite the fact that it moves them only a fraction of the width of a hydrogen atom.
Almost as impressive is the range of sound powers we can hear. The gap between the quietest audible sound level (the threshold of hearing, 0 dB) and the threshold of pain (around 130 dB) is huge: 130 dB corresponds to a power ratio of 10¹³, which is the number of pence in a hundred billion pounds.
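As a rough sketch (not from the original text), the decibel-to-ratio arithmetic behind that figure can be checked in a couple of lines of Python; only the 130 dB span is taken from the text, and the formula is the standard definition of the decibel:

```python
# A minimal sketch of the decibel-to-power-ratio conversion mentioned above.

def db_to_power_ratio(db: float) -> float:
    """Convert a level difference in decibels to a ratio of powers."""
    return 10 ** (db / 10)

span_db = 130                      # threshold of pain relative to threshold of hearing
print(db_to_power_ratio(span_db))  # 1e13, i.e. ten trillion to one
```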
We can also hear a fairly wide range of frequencies: about ten octaves, a couple more than a piano keyboard. We can detect, though not truly hear, frequencies well below and above this too, as will be explained in Chapter 6. And our frequency discrimination is excellent: most of us can detect differences of about a quarter of a semitone; with practice and in ideal conditions, a difference of about one-twentieth of a semitone is just distinguishable. Our judgement of directionality, by contrast, is mediocre; even in favourable conditions we can only determine the direction of a sound’s source within about 10° horizontally or 20° vertically; many other animals can do very much better.
14. The ear (the middle and inner ears are greatly enlarged).
Perhaps the most impressive of all our hearing abilities is that we can understand words whose level is less than 10 per cent of the background noise level (if that background is a broad spread of frequencies): this far surpasses any machine.
Our ears have two functions, hearing and balance, and balance is taken care of entirely by the semicircular canals (see Figure 14). The rest of the ear has thus been free to evolve the best possible system for our needs. Not so our vocal apparatus; as a relative latecomer in our evolution it had to fit itself in cheek by jowl, sharing structures previously earmarked for breathing and eating, licking and sucking, kissing and fighting. Yet, in a trained actor or accomplished singer, the speech system functions just as perfectly as a Stradivarius violin.
15. Frequency dependence of hearing.
What do ears hear?
As Figure 15 shows, our hearing systems have not evolved to measure the physical powers of sounds: a piccolo, for example, has a maximum power of about 0.08 watt, while a trombone can manage about 6 watts, yet it sounds the quieter instrument. This means that trombone players must work a lot harder than piccolo players (who in fact never play at full blast in any ordinary piece of music). But all instrumental powers are low. A loud orchestra playing at full stretch might manage 60 watts; if it could keep it up for two minutes, that would be enough to boil a tablespoon of water.
Moreover, this power spreads from the source in many directions, so, if you were sitting 10 metres from such an orchestra, less than 0.01 per cent would reach your eardrums. And what they actually detect is nothing more than a series of rapid prods of air, which carry no information other than how hard they hit and how rapidly they arrive. That we can experience a whole world of sound, charged with emotion and packed with meaning, is due to the precise coordination of highly evolved anatomical, electrochemical, and neurological processing systems.
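A back-of-envelope check of the two claims above, as a hedged sketch: the orchestra power, duration, and listening distance come from the text, but the tablespoon volume, starting temperature, and eardrum area are assumptions made here for illustration.

```python
# Rough checks of the 'tablespoon of water' and 'less than 0.01 per cent' claims.
import math

# 1. Can 60 W sustained for two minutes bring a tablespoon of water to the boil?
energy_j = 60 * 120                                # 60 watts for 120 seconds
tablespoon_g = 15                                  # assume ~15 ml of water
heat_needed_j = tablespoon_g * 4.19 * (100 - 20)   # heat from 20 C to 100 C
print(energy_j, heat_needed_j)                     # ~7200 J vs ~5000 J: just enough

# 2. What fraction of the sound reaches two eardrums 10 m away?
eardrum_area_m2 = 2 * 0.5e-4               # assume ~0.5 cm^2 per eardrum
sphere_area_m2 = 4 * math.pi * 10**2       # power spread evenly over a sphere
fraction = eardrum_area_m2 / sphere_area_m2
print(f"{fraction:.1e}")                   # ~8e-08, far below 0.01 per cent
```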
The outer ear: capturing the sounds
The pinna, the visible part of the ear, acts mainly as a funnel to collect sounds; its front-to-back asymmetry also provides some information about their direction. The moveable pinnae that some mammals possess greatly assist in direction finding; though some of us retain this ability, it is of no benefit beyond its entertainment value.
Since the auditory canal is a cylinder around 30 mm long, closed at one end by the eardrum, it has a resonance at a wavelength of about four times its length, corresponding to a frequency around 3 kHz. The gain in energy at this frequency results in a loss at others, so the canal acts as a partial band-pass filter.
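The 3 kHz figure follows from treating the canal as a quarter-wave resonator; here is a sketch of the arithmetic, with the speed of sound taken as an assumed round value (only the 30 mm length is from the text):

```python
# Quarter-wave resonance of the auditory canal, modelled as a tube closed at
# the eardrum end.

speed_of_sound = 343          # m/s, in air at roughly room temperature (assumed)
canal_length = 0.030          # m, from the text

wavelength = 4 * canal_length            # closed tube: quarter-wave resonance
frequency = speed_of_sound / wavelength
print(f"{frequency:.0f} Hz")             # ~2860 Hz, i.e. around 3 kHz
```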
At the end of the canal is the eardrum (tympanic membrane), a roughly circular disc of stretched skin about 1 cm across. It is inclined at an angle to the canal, to maximize its area and so capture as much force as possible. It is also slightly conical, which allows it to transfer more energy than it could if it were flat. The drum has no resonances: for frequencies above ~3 kHz, its surface moves in a chaotic way, and for lower frequencies it all moves as one piece, since the waves are larger than it is. As a result it transfers the widest possible range of frequencies with minimal filtering. This near-flat frequency response is achieved in part through its asymmetric shape and in part through an internal scaffold of collagen fibres.
It is important that the drum is taut but not stiff, and this is made possible by the equalization of internal and external pressure through the Eustachian tube (always drawn as open although it is in fact closed, except for an instant when there is a significant pressure change, when it makes a distinctive crackling sound).
The middle ear: sharpening the blow
The eardrum is connected to a series of three tiny (the tiniest, in fact) bones called ossicles, which occupy the air-filled middle ear. Their main job is to convert the wide, shallow motion of the eardrum to a higher-pressure ‘tap’ on a second drum-like membrane called the oval window, which is the gateway to the inner ear. By working as levers, the ossicles increase the force by about 1.5 times. However, the main way in which the force is enhanced is simply through the ratio of the areas of the eardrum and the oval window, which increases the pressure about twenty times by concentrating the force over a smaller area. The ossicles also provide some protection for the inner ear, through the acoustic reflex (see Chapter 8).
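Putting the two factors together gives a rough overall pressure gain; the 1.5 and 20 come from the text, while expressing the result in decibels is an added illustration:

```python
# Combined pressure gain of the middle ear, from the two effects described above.
import math

lever_gain = 1.5        # force increase from the ossicles acting as levers
area_gain = 20          # pressure increase from the eardrum-to-oval-window area ratio

pressure_gain = lever_gain * area_gain
gain_db = 20 * math.log10(pressure_gain)   # pressure ratios use 20*log10
print(pressure_gain, f"{gain_db:.0f} dB")  # ~30 times, i.e. roughly 30 dB
```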
The inner ear: sound to electricity
The inner ear is full of liquid, and, just as the eardrum converts airborne sound to bone-borne, so the oval window converts the latter to a fluid-borne version. This passes along the cochlea, a coiled tube about 2 cm long. At its extremity is a hole (the helicotrema), and the wave passes through this and then travels back along a second tube which is joined along its length to the first. When the wave has completed its doubled-back journey, it must be eliminated, otherwise it would reflect back up the cochlea and interfere with newly arrived waves. So, the second tube terminates in yet another membrane, the round window. This bulges outwards when the waves reach it, dissipating them as heat.
Running between the tubes like the filling in a baguette is the basilar membrane, which converts the waves to nerve impulses. The side-by-side tubes would be about 5 cm long, were they not coiled up like a snail (cochlea is Latin for snail). Because the length of the cochlea is related to the wavelength of sound waves, it is similar in all mammals (only about 50 per cent longer in an elephant than in a person), so the coiling is probably simply a space-saving measure, rather than having any acoustic function. Mice and other tiny mammals cannot accommodate a full-size cochlea. Theirs are about 1 cm long, and consequently they can only hear about three or four octaves, compared to the eight to ten octave range which we and most other large animals can detect. On the membrane is mounted the organ of Corti, on which grow around nine rows of short hairs called stereocilia (around 400 per row). These rows run the length of the membrane, and have nerve fibres attached to them. These fibres bundle together to form the auditory (cochlear) nerve, which transmits impulses to the brain.
The basilar membrane moves in response to the sound waves that impinge on it. It is stiffer and narrower at its root than at its tip, which means that lower frequency sounds cause oscillations closer to the latter. These oscillations cause the stereocilia to move, and the hair cells to which the stereocilia are attached then send electrochemical impulses to the brain. Since the brain knows which cells are where, it can determine the frequency of sounds by this means (this is known as place theory). However, when presented with sounds below about 1 kHz, the whole membrane oscillates. In this case, a different mechanism becomes important, in which the hair cells fire in time with the pulses that make up the sound wave, firing one hundred times a second in response to a 100 Hz tone, for instance.
However, the cells are incapable of firing more rapidly than about 500 times a second. To respond to higher frequencies they therefore face a similar problem to that of a squad of soldiers armed with flintlocks which take, say, a minute to reload: how can the squad produce sustained gunfire at intervals of 10 seconds? The answer is to divide the squad into six groups. The first group fires and starts to reload. Ten seconds later, the second group fires, and so on. Ten seconds after the sixth group has fired, the first group will have completed its reloading and will fire again. Hair cells work in groups in just this way: the first group might signal the brain when a cycle of the sound wave is at its peak, the second when that cycle has fallen halfway to its minimum, the third at minimum, and the fourth when it is halfway back up to maximum. In this way, such a quartet of cell-groups could respond to a tone four times higher in frequency than their individual maximum firing rates.
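One way of picturing the flintlock scheme in code, as a toy illustration rather than a physiological model: the 500-per-second limit is from the text, while the 2 kHz tone and the simple turn-taking rule are assumptions made here.

```python
# A minimal sketch of the turn-taking ('relay') idea described above: groups of
# hair cells, each limited to ~500 firings per second, take turns so that
# together they mark every cycle of a higher-frequency tone.

MAX_FIRING_RATE = 500        # assumed per-group limit, firings per second
TONE_FREQUENCY = 2000        # Hz, higher than any single group could follow
GROUPS = TONE_FREQUENCY // MAX_FIRING_RATE   # 4 groups, like the text's quartet

period = 1.0 / TONE_FREQUENCY
for cycle in range(8):                       # eight cycles of the tone
    group = cycle % GROUPS                   # groups take turns, one per cycle
    t_ms = cycle * period * 1000
    print(f"t = {t_ms:.2f} ms  ->  group {group} fires")

# Each group fires only once every GROUPS cycles (i.e. at 500 Hz), yet some
# group fires on every cycle, so the 2 kHz timing is preserved overall.
```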
The hair cells of the basilar membrane also work together in a very different way: those in the eight or so outer rows respond to incoming sound waves by changing their lengths in time with them. This motion amplifies the vibrations of the stereocilia on the inner row of hair cells (the only ones that send signals to the brain) and hence provides significant amplification, allowing us to hear sounds which are 40 dB (one ten-thousandth) less powerful than we otherwise could. This activity generates its own faint sounds, called otoacoustic emissions. These emissions are of great value in determining the functionality of the hearing system in infants who are too young to report whether or what they can hear. Also, they often fade when there is any damage to the inner ear, so they are a useful check for audiologists (hearing specialists) too. They are (fortunately) far too quiet to be heard, so sensitive in-ear microphones are used to measure them.
Nerves and brain: objective to subjective
The nerve signals that emerge from the basilar membrane are not mimics of sound waves, but coded messages which contain three pieces of information: (a) how many nerve fibres are signalling at once, (b) how far along the basilar membrane those fibres are, and (c) how long the interval is between bursts of fibre signals. The brain extracts loudness information from a combination of (a) and (c), and pitch information from (b) and (c).
16. Locations of hearing, language, and speech activities in the brain. The primary auditory area recognizes sounds. Broca’s area both analyses and produces semantic elements. Wernicke’s area deals with the sequences of speech sounds (heard, made, and remembered). The supramarginal gyrus may deal with articulation, and the angular gyrus may assist with semantic processing.
What happens in the brain’s hearing and language centres (see Figure 16) is not entirely clear, but the first stage in processing is to extract salient features from the stream of input data from the auditory nerve. These features are then used to continuously update, amend, and refine a mental model of the thing being listened to: perhaps a tune, a spoken phrase, or a worrying engine noise. The model’s accuracy is tested by predicting what the sound will do next. What the brain is trying to do is establish the degree to which each component of a sound contributes to the meaning, a process called hierarchical encoding. Our incredible prowess in performing this function is demonstrated by our ability to, for example, recognize that someone is trying to hum ‘Somewhere Over the Rainbow’ even if they get most of it wrong. The speed with which the mammalian brain interprets the sounds that reach the ears is remarkable too: say ‘squirrel’ to a dog who knows the word, and the response is almost instantaneous.
Homing in
Following a piece of music is a rather unusual and newfangled (in evolutionary terms) thing for a brain to do: usually the focus of interest is an object in the external world, and consequently an important function of the hearing system is to locate that object. For high frequencies, the brain relies partly on the fact that a sound coming from the left will arrive first at the left ear, and partly on the blocking effect of the head, which means that the level at the right ear will be lower. For a sound wave longer than the distance between the ears, the brain compares the changing levels of sound at the ears: if a long wave arrives from the left, each of its antinodes will reach the left ear first, so the pressure at that ear will initially be highest. As the wave progresses, the left ear pressure falls while the right rises until the antinode passes it, when it will begin to fall again. However, with a wave more than about 4 metres long, there is very little change in level over the inter-ear distance, so its direction cannot be judged.
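Two of the numbers behind these cues can be estimated directly; in the sketch below, the 4 metre wavelength limit is from the text, while the ear spacing and speed of sound are assumptions made for illustration.

```python
# Rough estimates of the two binaural cues described above.

speed_of_sound = 343      # m/s (assumed)
ear_spacing = 0.20        # m, assumed spacing between the ears

# Maximum arrival-time difference for a sound coming from directly to one side:
max_delay_ms = ear_spacing / speed_of_sound * 1000
print(f"{max_delay_ms:.2f} ms")          # ~0.6 ms

# Frequency at which the wavelength reaches the 4 m limit quoted in the text:
limit_frequency = speed_of_sound / 4.0
print(f"{limit_frequency:.0f} Hz")       # ~86 Hz: below this, direction is lost
```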
To determine whether the sound source is above or below the ears, the brain relies on the effects of the shapes of the head and shoulders on the level of the sound. Direction finding is not the only advantage of having two ears: the auditory nerves sometimes fire even when no sound is present, but the brain will reject such signals if they come only from one side.
The brain’s processing system has evolved to make reasonable assumptions about the sounds it receives, leading to such phenomena as the precedence effect (also known as the law of the first wave front or the Haas effect). The brain’s assumption is that a sound that arrives in the first fraction of a millisecond indicates the direction of the sound source. So, subsequent sounds are regarded as coming from the same direction as the first. This allows us to locate a sound source in dark spaces without being confused by echoes arriving from many directions. Such assumptions can mislead, especially in situations that would not occur naturally: for instance, if one listens to a sound from a loudspeaker about a metre away and at 45° to the left, and this sound is gradually replaced by an identical one from a loudspeaker at 45° to the right, the sound will still seem to come from the left.
A very useful feature for people trying to communicate in crowds is the cocktail party effect, in which particular phrases (such as one’s name) stand out from the hubbub. This works for non-vocal sounds too: conductors are often highly attuned to particular instruments or musical phrases. This effect works because the brain is constantly model-building whether we are actively listening or not, and because it preferentially seeks matches with sounds that it has classified as having a significant meaning, like its owner’s name.
The role of hearing is a very rich and complex one: as linguist Roland Barthes points out, sounds act on our minds in three ways: as ‘indices’ (the alarming sound of an explosion), ‘signs’ (the literal meaning of a word), and ‘signifiers’ (unconscious associations triggered by a word like ‘end’). And hearing is very much a social activity too; according to Labelle:
the rich undulations of auditory material do much to unfix delineations between the private and the public. Sound operates by forming links, groupings and conjunctions that accentuate individual identity as a relational project . . . [and] weave an individual into a larger social fabric . . . contributing to the meaning of shared spaces.
Labelle points out, too, that whether we like it or not, sounds bring us into intimate contact with others: crying babies, noisy neighbours, or cheering fellow football supporters. As he puts it: ‘Sound creates a relational geography that is most often emotional, contentious, fluid.’
Hearing bones
The eardrums are wonders of evolutionary engineering, but we can actually hear fairly well without them, since sound waves also reach the inner ear by travelling through the bones of the head, specifically the mastoid bone behind the ear. Submerging one’s ears in the bath largely switches off the airborne route, so the sounds that remain arrive mainly by bone conduction. This system is rather insensitive, however: using air conduction, we can hear sounds about 40 dB less powerful than the weakest detectable through our mastoids. On the other hand, bone conduction allows us to detect sounds with frequencies up to 30 kHz, which is well beyond the upper frequency limit of the airborne route; but, presumably because such sounds are of little value to us, they are all encoded in the same way, so they give rise to the same pitch sensation as 20 kHz sounds.
Bottlenose dolphins (Tursiops truncatus) take bone-based hearing much further: their jaws bear teeth that are spaced at regular intervals and set at the same angle, are all of a very similar shape, and have heights that depend upon their location. This adds up to a focussing array, in which sound waves of particular wavelengths are significantly amplified, but only if the source is directly ahead. Thus, the dolphins can hear very quiet sounds, and can determine their directions simply by moving their heads until the loudness is maximized.
Deafness
The hearing system is a delicate one, and severe damage to the eardrums or ossicles is not uncommon. When it occurs we must rely instead on bone conduction and artificial aids: Edison used his teeth to transfer the sounds from his phonograph to his mastoid, as the marks on the device still testify.
This condition is called conductive hearing loss. If damage to the inner ear or auditory nerve occurs, the result is sensorineural or ‘nerve’ hearing loss. It mostly affects higher frequencies and quieter sounds; in mild forms, it gives rise to a condition called recruitment, in which there is a sudden jump in the ‘hearability’ of sounds. A person suffering from recruitment and exposed to a sound of gradually increasing level can at first detect nothing and then suddenly hears the sound, which seems particularly loud. Hence the ‘there’s no need to shout’ protest in response to those who raise their voices just a little to make themselves heard on a second attempt.
Sensorineural hearing loss is the commonest type, and its commonest cause is physical damage inflicted on the hair cells. With very high levels of sound, the eardrums can be ruptured (and they can also be damaged by blows to the head or by infection). Remarkably, however, burst eardrums can not only heal, they usually then perform almost as well as before.

About 360 million people worldwide (over 5 per cent of the global population) have ‘disabling’ hearing loss, that is, hearing loss greater than 40 dB in the better-hearing ear in adults and greater than 30 dB in the better-hearing ear in children (who make up about 10 per cent of the total). About one in three people over the age of sixty-five suffer from such hearing loss.
In terms of alleviating deafness, what can be done varies greatly. For conductive hearing loss, if some hearing function remains, hearing aids tailor-made to fit the wearer’s particular pattern of loss are highly effective. They may also be equipped with noise-cancelling functions and often use directional microphones, so that the user can focus on sources that they are looking at. They can also include a vibrating element to stimulate the mastoid. Hearing aids are less successful in treating sensorineural hearing loss, because signals are distorted by recruitment.
With complete deafness the challenge is much greater, but in the last few years the introduction of fairly reliable cochlear implants has given new hope. And the future looks promising: in 2012, researchers restored hearing to deafened gerbils by encouraging stem cells to grow into hair cells; hearing was typically restored by 45 per cent, and by 90 per cent in a few cases. This approach may one day be applicable to those people (around 10 per cent) whose hearing loss is caused by damage to nerves called spiral ganglion neurons. There are also some animals, including owls, in which lost stereocilia simply grow back, and it may be that this capability could be genetically induced in humans (without adopting the other peculiar feature of owl hearing, which is that it only works at its best in spring, presumably because owls need to catch extra prey for their chicks then).
There are a great many hearing problems which are not simply characterized by loss of function. The commonest is tinnitus: ringing in the ears. Its causes are largely mysterious, and the level, type, and duration of the sounds vary greatly from person to person. It is often associated with past infection, drugs (especially certain antibiotics), or trauma, and it frequently accompanies hearing loss.
Structures for speech
Around a million years ago, the hearing systems of our ancestors underwent subtle changes to fine-tune them (literally) to detect speech. We know little about the evolution of our vocal systems; although the ability to make sounds in a controlled way is common to most animal species, speech is immeasurably more complex. So, unlike the evolution of, say, the leg, we cannot look back at a long chain of ancestral forms and watch the system adapt to the changing requirements of those users and the shifting demands of their environments.
17. Vocal apparatus.
In its most basic form, the making of sounds is simple: Figure 17 shows the structures involved. Air exits from the lungs through a tube (the windpipe or trachea) which is equipped with flap-like vocal folds (vocal cords) that restrict the air flow and vibrate when they are tensed by muscles attached to them. Increasing this tension increases the vibration frequency, but the length of the folds sets its lower limit, resulting in fundamental frequencies around 125 Hz for men, 200 Hz for women, and 300 Hz or above for children. In boys, there is a sudden increase in length during puberty, which causes the ‘breaking’ of the voice.
After emerging from the end of the trachea, this low-frequency sound enters the back area of the vocal tract, roofed by the soft palate. In front of this is the hard palate. Together these structures form a cavity in which the sound forms resonances called formants. The characteristic wavelengths of vowels are set here, and varied by raising the tongue to change the volume of the tract or to divide it into two linked cavities.
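A common classroom approximation, not used in the text itself, treats the vocal tract as a uniform tube closed at the vocal folds and open at the lips, which gives resonances at odd multiples of a quarter wavelength; the sketch below uses an assumed tract length of 17 cm and an assumed speed of sound.

```python
# Formant frequencies of an idealized vocal tract, modelled as a uniform tube
# closed at the vocal folds and open at the lips. The model and the 17 cm
# length are simplifying assumptions, not figures from the text.

speed_of_sound = 343       # m/s (assumed)
tract_length = 0.17        # m, a typical adult vocal tract (assumed)

# Closed-open tube: resonances at odd multiples of the quarter-wave frequency.
formants = [(2 * n - 1) * speed_of_sound / (4 * tract_length) for n in (1, 2, 3)]
print([f"{f:.0f} Hz" for f in formants])   # roughly 500, 1500, 2500 Hz
```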
Consonants involve more parts of the vocal apparatus than vowels, are usually shorter in duration, and, in many cases, change while they are being made. There are four main types, defined by their manner of articulation.
Plosives are made by the sudden stopping of airflow (hence their alternative name: stops). Fricatives and liquids require partial stopping, with or without turbulence respectively. Nasals deflect the airstream to the nasal cavity. Glides (semivowels) involve a rapid transition from one vowel sound to another. The unusual helpfulness of this naming system is shared by the subdivision of the consonants according to their place of articulation, as Table 4 shows. Also shown is whether or not the consonant is voiced, that is, whether the vocal cords are involved in making it. (Ventriloquists attempting to produce labial or labiodental sounds are stymied by the need to keep their lips separated and motionless. Skilled exponents of the art circumvent this by speaking very fast indeed at such points.)
Since the wavelengths of the resonances in the vocal tract depend only on its structure, changing the velocity of sound alters the frequencies at which those resonances occur. Hence the ‘Donald Duck’ voice produced by breathing helium (14 per cent of the density of air), and the more rarely heard gravelly voice produced after breathing xenon, which is 4.6 times denser than air.
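A rough estimate of the size of the shift, assuming the speed of sound scales with the inverse square root of gas density; this is a simplification (it ignores differences in the gases’ heat capacities and the fact that breathed gas mixes with air), and only the two density figures come from the text.

```python
# Approximate formant-shift factors for helium and xenon relative to air,
# assuming speed of sound ~ 1/sqrt(density). A hedged back-of-envelope sketch.
import math

helium_relative_density = 0.14      # 14 per cent of the density of air (from text)
xenon_relative_density = 4.6        # 4.6 times denser than air (from text)

helium_shift = 1 / math.sqrt(helium_relative_density)
xenon_shift = 1 / math.sqrt(xenon_relative_density)

print(f"helium: formants ~{helium_shift:.1f}x higher")   # ~2.7x: 'Donald Duck'
print(f"xenon:  formants ~{xenon_shift:.2f}x as high")   # ~0.47x: gravelly
```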
Table 4. English consonants
Hearing sound
Life would be a dull thing, however, if all we did with our vocal apparatus was speak. Singing is, physiologically, no different from speaking except that every aspect of the sound made is more finely controlled, and pitch is often keyed to an externally defined value. Whistling does not involve the vocal cords: it requires the production of turbulence around the lips, which transfers energy to the vocal cavity, which acts as a Helmholtz resonator. Shouting simply requires greater air force from the lungs. In whispering, the vocal apparatus works as it does when producing normal speech, except that the vocal folds are neither vibrated nor fully relaxed, so that when air passes through them it produces turbulence (this is called adduction). Since much more air can pass between the folds without exciting sound waves, whispering is necessarily relatively quiet.
Our hearing systems are far more sophisticated than our most advanced machines, and have evolved to suit us admirably. But what nature has given us is limited in range. In all but the tiniest groups of people, communication, which we prize so highly, must spread far beyond the reach of voice or of hearing, and it was to answer that need that first electricity and then electronics were pressed into service, to augment and to replace our flesh and our nerves. How they do this is the subject of Chapter 5.