July 2, 2020

The Sound of Places - Visualizing audio with R

Using R and soundgen to visualize the spectrogram and loudness of places I've visited.

At the risk of sounding like a broken record (that's an audio pun right there), I have to start this piece by saying that for the last year I've been backpacking. During this adventure, I've seen wonderful places, tasted extravagant flavors, and heard many sounds.

To give you a quick anecdote: one night I was at a camp in New Zealand, and the only thing I could hear was the smooth, serene sound of a nearby stream. It got me thinking, "what would this sound look like?" So I recorded a short sample with the intent of visualizing it alongside samples taken at other sites. Those other sites are the center of Auckland and a hiking trail at Cathedral Cove.

In this short article, I'll present two kinds of visualization to compare the audio of those places. To create them, I used R and the package soundgen. You can find the complete code at https://github.com/juandes/wanderdata-scripts/tree/master/sound-places.

The audio samples

To give you a better sense of the data, I uploaded the audio samples to YouTube. Below is the "camp" scene.

In it, you can hear the stream and some background noise. The following video is from the Cathedral Cove trail. Here you mostly hear small critters in the background.

Last, we have sounds from Auckland. As expected from a city, you can hear cars, buses, chatter, and more.

With that out of the way, let's move on to the analysis.

Figure 1. The Cathedral Cove. Photo by me.

Spectrogram and oscillogram

The first plots I want to show are the spectrogram and oscillogram of the sound clips. If I understood correctly, a spectrogram is a visual representation of a sound's frequency content and amplitude over time. Frequency relates to the pitch of the sound and amplitude to how loud it is. In the first set of graphs we are about to see, the spectrogram is the upper part of the visualization. The x-axis is the time within the sample (in milliseconds), the y-axis is the frequency (in kHz), and the intensity of the heatmap is the amplitude: the darker the shade, the stronger that frequency at that moment. Below the spectrogram is the oscillogram, which shows the amplitude of the waveform over time.
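To make the idea concrete, here is a minimal sketch of how a spectrogram can be computed by hand in base R. It is not the code behind the figures in this article (those come from soundgen); the synthetic two-tone signal, the 8 kHz sampling rate, and the helper name simple_spectrogram are all assumptions made up for the illustration.

```r
# Minimal short-time Fourier transform (STFT) in base R.
# Assumption: a synthetic signal stands in for a real recording.
sampling_rate <- 8000                        # samples per second (made up for the demo)
t <- (0:(sampling_rate - 1)) / sampling_rate # one second of timestamps
# 440 Hz for the first half second, 2000 Hz for the second half.
signal <- ifelse(t < 0.5, sin(2 * pi * 440 * t), sin(2 * pi * 2000 * t))

# Hypothetical helper: slice the signal into overlapping frames and FFT each one.
simple_spectrogram <- function(x, sampling_rate, window_length = 256, step = 128) {
  starts <- seq(1, length(x) - window_length + 1, by = step)
  # Hann window to reduce spectral leakage at the frame edges.
  hann <- 0.5 - 0.5 * cos(2 * pi * (0:(window_length - 1)) / (window_length - 1))
  n_bins <- window_length / 2
  amplitude <- sapply(starts, function(s) {
    frame <- x[s:(s + window_length - 1)] * hann
    Mod(fft(frame))[1:n_bins]                # keep the non-redundant half
  })
  list(
    time_ms = (starts - 1 + window_length / 2) / sampling_rate * 1000,
    freq_khz = (0:(n_bins - 1)) * sampling_rate / window_length / 1000,
    amplitude = amplitude                    # rows = frequency, columns = time
  )
}

spec <- simple_spectrogram(signal, sampling_rate)
# Frequency (kHz) with the largest amplitude in each time frame.
peak_khz <- spec$freq_khz[apply(spec$amplitude, 2, which.max)]

# Time on x, frequency on y, amplitude as shade (darker = stronger).
image(spec$time_ms, spec$freq_khz, t(spec$amplitude),
      xlab = "Time (ms)", ylab = "Frequency (kHz)",
      col = gray(seq(1, 0, length.out = 64)))
```

In the resulting image, the dark band sits low for the first half of the clip and jumps up at the 500 ms mark, which is the kind of pattern to look for in the plots of the real samples.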

Figures 2, 3, and 4 present the spectrogram and oscillogram of the three samples. In order, they come from the camp, the trail, and Auckland.

Figure 2. Camp's spectrogram.

The camp's spectrogram shows mostly white noise. These low values portray the quiet sound I described above. At the beginning and around the 20,000 ms mark of the plot, there's a darker vertical "line" caused by me touching the phone and stepping on a small twig. Because the sound is so quiet, the oscillogram is almost flat.

Just by looking at this image and its values, it's a bit hard to grasp exactly what's happening here. Sure, it is quiet, but it would be nice to compare it to something we are familiar with. To help with that, consider that the typical adult male voice has a fundamental frequency of around 85 to 180 Hz, and the typical adult female voice around 165 to 255 Hz.

The second image is the spectrogram of the sound taken at the trail. Unlike the previous one, this one shows more activity in the upper region of the graph. If you listened to the video, you might have noticed the "crittery" sound of insects. Which insects in particular? I don't know, but I assumed they were crickets. To compare this data with an external source, I looked up the frequency of cricket songs and found a website with several samples whose dominant frequency starts at 2.9 kHz (with many of them above 4.0 kHz). Maybe my crickets sing at the same frequency? I'm aware this whole conclusion is a bit of a long shot, but it was enough to convince me :). Regarding the oscillogram, note that this one has visible waves.
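For a less eyeball-based comparison, one could estimate a clip's dominant frequency and check it against that 2.9 kHz mark. The following is only a sketch under assumptions: the 4.2 kHz tone with added noise is synthetic (it is not the trail recording), and dominant_frequency_hz is a hypothetical helper written in base R.

```r
# Estimate the dominant frequency of a clip with a single FFT (base R only).
# Assumption: a synthetic 4.2 kHz tone plus faint noise stands in for a recording.
set.seed(42)
sampling_rate <- 32000                       # same rate used for the article's samples
n <- sampling_rate                           # one second of audio
t <- (0:(n - 1)) / sampling_rate
clip <- sin(2 * pi * 4200 * t) + rnorm(n, sd = 0.1)

# Hypothetical helper: frequency (Hz) of the strongest FFT bin, ignoring 0 Hz.
dominant_frequency_hz <- function(x, sampling_rate) {
  n <- length(x)
  half <- floor(n / 2)
  amplitude <- Mod(fft(x))[1:half]           # bins 0 .. n/2 - 1
  freqs <- (0:(half - 1)) * sampling_rate / n
  amplitude[1] <- 0                          # ignore the DC offset (0 Hz)
  freqs[which.max(amplitude)]
}

dom_hz <- dominant_frequency_hz(clip, sampling_rate)
# 2.9 kHz is the lowest dominant frequency among the cricket samples mentioned above.
in_cricket_range <- dom_hz >= 2900
cat(sprintf("Dominant frequency: %.1f kHz (cricket-like: %s)\n",
            dom_hz / 1000, in_cricket_range))
```

A real recording mixes many sources, so a single dominant frequency is a crude summary; it only says where most of the energy is.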

Figure 3. Cathedral Cove's spectrogram.

Next, we have the chart from the Auckland sample. This one is more varied than the others, as it has all sorts of shades of gray. This color diversity is the result of the mix of sounds you can experience in the city: traffic, buses, people, construction, and more. Aren't cities amazing?

Figure 4. Auckland's spectrogram.

The following code snippet shows the function I used to produce the plots. I'm assigning the returned value to the variable analysis to prevent the output from being printed.


library(soundgen)

plotSpectrogram <- function(filename, area) {
  # The audio files live in the data/ directory.
  filepath <- paste0("data/", filename)
  # I already knew the sampling rate; this is not a random number.
  # pitchMethods = NULL skips pitch tracking; ylim is the frequency range in kHz.
  analysis <- analyze(filepath, samplingRate = 32000, pitchMethods = NULL,
                      plot = TRUE, ylim = c(0, 10),
                      main = paste0("Spectrogram of a sound sample from ", area))
}

Figure 5. Sky at night; taken from a camp (not THAT one). Pic by me.

Loudness


Unlike amplitude, loudness is the subjective perception of sound; what's loud for me might not be loud to you. The library I used for the analysis provides a way to estimate loudness relative to a "base" reference level that it assumes. Figures 6, 7, and 8 present the loudness scores (drawn on top of the spectrogram) for the three samples.
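To illustrate the "amplitude" half of that contrast, here is a small base R sketch that measures the level of a waveform directly from its samples. The tones are synthetic, the helper name rms_dbfs is made up, and the samples are assumed to be scaled to the range [-1, 1].

```r
# RMS level of a waveform in dB relative to full scale (dBFS), base R only.
# Assumption: samples are scaled to [-1, 1]; the tones below are synthetic.
rms_dbfs <- function(x) {
  20 * log10(sqrt(mean(x^2)))
}

sampling_rate <- 32000
t <- (0:(sampling_rate - 1)) / sampling_rate
full <- sin(2 * pi * 1000 * t)               # 1 kHz tone at full amplitude
half <- 0.5 * full                           # same tone at half the amplitude

level_full <- rms_dbfs(full)
level_half <- rms_dbfs(half)
# Halving the amplitude lowers the level by 20 * log10(2), about 6 dB,
# no matter who is listening. Perceived loudness also depends on the
# frequency of the sound and on the listener.
level_drop <- level_full - level_half
cat(sprintf("Full: %.2f dBFS, half: %.2f dBFS, drop: %.2f dB\n",
            level_full, level_half, level_drop))
```

The point of the sketch is that amplitude is a property of the signal alone, while a loudness score needs a model of human hearing on top of it.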

Figure 6. Camp's loudness.

Figure 7. Cathedral Cove's loudness.

Figure 8. Auckland's loudness.

The right y-axis shows the loudness score. Similar to what we saw before, the camp audio sample has the lowest loudness, with values around the 1.0 mark (mean 1.10). Next is the trail (mean 4.05), whose graph shows an increase in loudness towards the end of the audio clip. Then we have Auckland. As expected, this one has the highest scores of the three (mean 14.21).
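A mean like the ones quoted above can be derived from the per-frame values that getLoudness returns. The sketch below is an illustration under assumptions, not the code from the repo: it uses synthetic white noise instead of the recordings, and it assumes the returned list has a loudness element with one value per analysis frame, which is worth confirming with ?getLoudness for the installed soundgen version.

```r
library(soundgen)

# Assumption: getLoudness() returns a list whose `loudness` element holds one
# value per analysis frame; check ?getLoudness for your installed version.
set.seed(1)
sampling_rate <- 16000
# Synthetic stand-in for a recording: one second of white noise.
noise <- runif(sampling_rate, min = -1, max = 1)

# plot = FALSE skips the figure and only returns the numbers.
result <- getLoudness(noise, samplingRate = sampling_rate, plot = FALSE)
frame_loudness <- result$loudness

mean_loudness <- mean(frame_loudness, na.rm = TRUE)
cat(sprintf("Frames: %d, mean loudness: %.2f\n",
            length(frame_loudness), mean_loudness))
```

With real files, passing the file path instead of the numeric vector should work the same way, as in the function shown in this article.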

To create the graphs, I used the following function:

plotLoudness <- function(filename, area) {
  filepath <- paste0("data/", filename)
  # osc = FALSE hides the oscillogram, leaving the spectrogram and loudness curve.
  getLoudness(filepath, samplingRate = 32000, osc = FALSE, plot = TRUE,
              main = paste0("Loudness value of a sound sample from ", area))
}

And that's it!

Figure 9. Victoria Park @ Auckland (not that loud tho). Pic by me.

Recap and conclusion

Have you ever stopped and listened to your surroundings? I'm sure you have. In my case, I wanted to take this experience a bit further and add an extra layer of "perception" to some of the sounds I heard during my adventures. In this article, I showed how I used R and the package soundgen to plot the spectrogram, oscillogram, and loudness of three audio samples recorded in New Zealand. From the visualizations, I was able to "see" the silence of the camp, the busy environment of the city, and what I think were crickets.

For more information about soundgen, check out its intro guide, "Acoustic analysis with soundgen". For the complete source (which includes more visualizations), visit the GitHub repo at https://github.com/juandes/wanderdata-scripts/tree/master/sound-places.