Post by Gautam Shah

The tonal quality of spoken language or sound is determined by many factors, such as social history and ethnic affinity. It is also shaped by dominant building forms, materials, and physical features of the environment such as terrain and topography (plains, coasts, valleys, lake fronts, forests, deserts). The quality of speech sound is acutely affected by the environment one dwells in or aspires to be part of.


There is a saying in Gujarat, India, that speech varies every 20 to 25 km. Such sociolinguistic variation can be syntactic, lexical, or phonological. Phonology relates to the system of phonemes, that is, the organization of sounds in a language.

“Sound is a spatial event, a material phenomenon and an auditive experience rolled into one. It can be described using the vectors of distance, direction and location. Within architecture, every built space can modify, position, reflect or reverberate the sounds that occur there. Sound embraces and transcends the spaces in which it occurs, opening up a consummate context for the listener: the acoustic source and its surroundings unite into a unique auditory experience.” -– OASE (



People who live close to the sea coast, such as fishermen, are affected by the continuous splashing sound of waves. Similarly, villagers living in a valley often experience the echoing effect of the mountain range, whereas in flat desert land there is a complete absence of reflected sound. People living on a busy, noisy street have to talk louder, and that habit stays with them for a very long time.



The true colour of human speech comes from the pauses between vowels and consonants, the lengths of vowel and consonant utterances, preferred frequency combinations, pauses between words, phrasing, and so on. When a language is spoken in different terrains, each creates its own variants. Human speech variants develop according to the environment one lives in, and specifically according to how one hears one's own speech sounds. This is perhaps why children with impaired hearing often have poor speech formation.


It is also true that people tend to accept the speech sound they can produce as the perfect one, which may not be the case. Teachers tend to have better speech quality, as they have more opportunities to improvise. Similarly, an American child, or for that matter any child of a well-to-do family raised in a media culture, is better attuned to a style of talking that projects well. The next generation of children is likely to be more articulate than their parents or than children without media exposure.

Speech intelligibility is a function of space. Space defines not only how speech will be heard, but also how the speaker or musician will improvise the output.



In Indian classical music concerts (vocal and instrumental), we have seen masters tuning their instruments and drums on stage, in front of the audience. Many find this irritating, but in reality the musician is attuning the sound to that space and environment (moisture, temperature, and air currents). The Alap in Indian music, the opening rendering performed without drum beats, is likewise a way of attuning to the space and environment. Most Western concert performers and pop singers spend hours at the venue testing the position of the speakers, the pitch of the sound, and so on, to attune to the site conditions.



Most experienced speakers and stage actors can instantly modulate their output according to the quality of the space. For example, if the background noise is high, the speaker will raise the voice and change its tonal quality (shifting the frequency range to overcome masking), and if the reverberation is long, the speaker will lengthen the pauses between words. Speakers also face the section of the crowd with whom they want the message to sink in. In group discussions, an experienced person automatically shifts to a position that is advantageous for sound. Seasoned actors use rehearsals to pick up the nuances of stage position and body posture needed to deliver dialogue effectively.


Effective sound delivery is closely related to how the speaker is perceived. For example, in a non-visual medium such as radio or telephone, speaking straight into the mike creates a steady delivery of sound, whereas a moving speaker (or mike) gives the impression of an insincere person. Most TV anchors are taught to speak without moving their head or body. There was a time when the surroundings or space greatly affected the quality of the sound being carried; today, however, microphones can eliminate background noise and also do some degree of micro-balancing to even out the differences caused by a shifting speaker or singer.