Enveloped by Sound

Originally published in Revue Musicale Suisse — March 2020
Interview by Pia Schwab with Jürgen Strauss

As hearing human beings, we are constantly surrounded by “clouds” — diffuse acoustic fields. Their composition determines how voices and music affect us. Jürgen Strauss speaks about sound images that are like sugar, and others that leave us cold.

What phenomenon in acoustics can be described with the term “cloud”?

In any acoustic situation where one person speaks or makes music and another listens, the listener first perceives the direct sound. If the event takes place in a room, then, depending on how the room is constructed, the floor reflection follows first, then the side reflections, then the ceiling reflection, and finally reflections that arrive from all directions and overlap with one another. This is what is referred to as the acoustic response of the room.

Every listening position or listening situation is characterized by a very specific relationship between direct sound from the source and the acoustic response of the room. Through this superposition, the reflections shape what we technically call a diffuse field. This diffuse field can be described metaphorically as a cloud.
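As an editorial illustration of the arrival sequence described above, the following minimal sketch estimates when the direct sound and the first reflections reach a listener in a simple rectangular room. It mirrors the source in each surface (first-order image sources) and converts path lengths into delays. The room dimensions, positions, surface names, and the function name `arrival_times` are assumptions chosen for illustration; they do not come from the interview, and real rooms are rarely this simple.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature


def arrival_times(room, source, listener):
    """Return (path name, delay in ms) pairs sorted by arrival time.

    room     -- (width, length, height) of a rectangular room in metres
    source   -- (x, y, z) position of the speaker or instrument
    listener -- (x, y, z) position of the listener
    """
    lx, ly, lz = room
    sx, sy, sz = source
    # Mirror the source in each of the six surfaces (first-order images only).
    images = {
        "direct": (sx, sy, sz),
        "floor": (sx, sy, -sz),
        "ceiling": (sx, sy, 2 * lz - sz),
        "left wall": (-sx, sy, sz),
        "right wall": (2 * lx - sx, sy, sz),
        "front wall": (sx, -sy, sz),
        "rear wall": (sx, 2 * ly - sy, sz),
    }
    delays = [
        (name, 1000.0 * math.dist(pos, listener) / SPEED_OF_SOUND)
        for name, pos in images.items()
    ]
    return sorted(delays, key=lambda item: item[1])


# Hypothetical hall: 12 m wide, 20 m long, 9 m high.
arrivals = arrival_times(
    room=(12.0, 20.0, 9.0),
    source=(5.0, 4.0, 1.5),
    listener=(6.0, 14.0, 1.2),
)
for name, ms in arrivals:
    print(f"{name:<11}{ms:6.1f} ms")
```

In this assumed geometry the direct sound arrives first, followed by the floor reflection, and the side walls arrive before the ceiling. Later, denser reflections of higher order, which make up the diffuse field, are not modeled here.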

Wouldn’t “cloudless” listening be far more desirable?

In our homes, workplaces, or on a train, we are accustomed to the room providing a specific response. When the world no longer answers acoustically, many people feel exposed, almost as if they were in outer space. It can quickly become uncanny. One loses orientation.

People who work in low-reflection spaces such as recording studios often notice after a while that they lose their sense of time. Normally, we move through time by moving through a sequence of spaces, each with its own acoustic response. The second thing that disappears is the sense of space.

So the “cloud” anchors us in space and time?

Exactly. Our perception — and here specifically auditory perception — constitutes both our sensation of space and our sensation of time through reflections.

This becomes immediately understandable if we consider how blind people orient themselves. Through the evaluation of reflections in a space, they achieve astonishing localization abilities and impressions of spatiality. They can move around tables, recognize staircases, and create remarkably vivid impressions of objects — even down to the size of teacups.

We can all do this to some degree; it is simply a matter of practice. But we generally do not rely on it because we orient ourselves visually. Nevertheless, through reflections we too gain orientation and an impression of space. We always know whether we are in a small room, a long corridor, and so on.

Are there parallels to visual perception?

The weaker the direct sound becomes in relation to the diffuse field, the more the sound seems soft and loses its contours, and the more its brilliance is damped. This is completely analogous to light.

It becomes clear when comparing camera lenses, for example portrait lenses from the 1950s. The direct light entering the lens and directly reaching the film corresponds to direct sound. The light scattered within the lens and reflected at the edges creates diffuse light.

Leica produced lenses with a particular grind that were exceptionally suited to portrait photography because they slightly softened contours and colors.

“…wrinkles…”

“…exactly…”

The opposite approach was also used for portraits, for example by Helmut Newton in his photographs of nude warrior heroines, shot with Nikon lenses as part of that visual aesthetic. These lenses attempt to capture pure direct light. That is why one can see every pore in those photographs, whereas the Leica image appears soft and slightly blurred.

Are there similar opposing tendencies in music recording?

At the same time that these Leica portrait lenses were popular, stereophonic recording emerged; it truly began around 1954. The English label Decca made some of the first stereo recordings in Geneva’s Victoria Hall. They were characterized by good direct sound, beautiful clarity, and clearly audible contours of the sound image, but also by a strong acoustic room response. One could hear the unity between the orchestra and the performance space.

In later developments, symphonic music was no longer recorded only from a certain distance. Engineers increasingly moved microphones much closer to the musicians. So-called support microphones, often called spot microphones in English, were placed inside the orchestra, perhaps half a meter or one meter from the instruments. This became the so-called multi-microphone recording technique. EMI and later Deutsche Grammophon became famous for this approach.

A kind of “group image” of the orchestra was created from five, six, or seven meters away, including part of the room reflections. Into this image were mixed the signals captured directly from inside the orchestra through the support microphones.

The result was a sound image that, translated into a concert situation, would mean that with one ear we are sitting in the musicians’ laps, while with the other we remain at the back of the hall.

To homogenize this image, the mixture was metaphorically covered with a sauce — an artificial reverberation that did not originate from the room itself. Only through this process did the orchestral sound emerge that we now normally hear on recordings.

Karajan and the Berlin Philharmonic became famous for precisely this sound. One hears precision, detail, and instrumental tone colors with extraordinary clarity. But it is a sound image that can never actually be heard in any concert hall on Earth.

So we like something unnatural?

The richness of detail and the impression of intimacy apparently possess enormous appeal. This can already be observed with the introduction of sound recording itself.

In photographs of Caruso singing into an acoustic horn, he stands perhaps thirty centimeters from it, and the accompanying musicians stand as close as possible as well. The horn effectively functions as the microphone, capturing an almost purely direct sound.

Caruso sings almost directly into our ears.

Exactly! He is extremely close. That creates a form of intimacy. When it comes to speech, we can follow effortlessly. The possibility of metaphorically sitting in the front row, or even on stage, was embraced enthusiastically by audiences from the very beginning.

Perhaps because from birth, and even before birth, we are experts in relation to voices. It is like sugar; we can never get enough of it.
