How to “draw” and “read” sound
How can audio and visual information be linked? Scientists and amateurs all over the world keep returning to this question. That is why, in February 2006, the news that scientists had managed to reproduce sounds from a clay pot more than 6,500 years old spread so quickly across the Internet.
The potter had allegedly impressed a musical rhythm into the pot while shaping it. Unfortunately, the story turned out to be an unsuccessful April Fools' joke staged by Belgian television.
However, Patrick Feaster really has managed to process recordings more than 1,000 years old. In May 2011 he presented this discovery, which he calls "paleospectrophony", at a conference of the Association for Recorded Sound Collections (ARSC).
Immersion in history: deciphering the records of the past
Patrick uses modern technology (in this case not especially modern, since the spectrogram was invented long ago) to convert visual objects into sound. Humanity, however, has not always moved in this direction; on the contrary, people have long tried to "capture" sound in images.
For a long time (before Thomas Edison created the phonograph), people puzzled over how to devise a way of fixing music that would let anyone looking at the record reproduce the melody in their head as easily as professional musicians do when reading a score. Unfortunately, according to Dr. Feaster, such a task is unattainable in principle: in most cases our brains are simply not good enough at converting visual information into sound.
Past attempts to solve this problem may not have succeeded, but history has left plenty of evidence of how people in different eras tried to create such sound-recording systems. The most famous of them formed the basis of the phonautograph, the predecessor of the phonograph, invented by the Frenchman Édouard-Léon Scott de Martinville. The phonautograph was a device in which sound passed through a cone and caused a membrane connected to a needle to vibrate; the needle, in turn, traced wavy lines on a glass cylinder covered with smoke-blackened paper.
A phonautograph could capture sound, but there was no way to play it back. Feaster solved this problem: in 2008, he, his colleagues, and audio expert David Giovannoni gathered at Lawrence Berkeley National Laboratory to transcribe one of the best-preserved of Scott de Martinville's phonautograms.
The Berkeley lab had developed technologies for extracting sound from high-quality photographs of fragile wax media and broken discs. Using these technologies, the scientists recovered from the phonautogram a recording of the song "Au Clair de la Lune" ("By the Light of the Moon"), made in 1860. It is believed to be the earliest recording on which a human voice can be made out.
Solving this problem, however, was not enough for Feaster: he went on not only to recover sound from more than 50 phonautograms, but also to investigate even earlier attempts to "write down sound". Oddly enough, the Google Books service helped him here: with it, Feaster tracked down notations in old books that had long been dismissed as historical curiosities.
The oldest undulating line he found was in a book from 1806. Using other techniques, he deciphered a melody from 1677 that had been recorded as a pattern of dots, and another notation in manuscripts from the 10th century, where lines indicated the pitch at which to sing. Examples of such recordings can be found on his Phonozoic website.
Researchers from MIT, Microsoft, and Adobe are taking a different path: they reconstruct sound from a moving (or rather, vibrating) picture, having developed an algorithm that recovers an audio signal from vibrations recorded on video.
In one such experiment, they managed to extract intelligible speech from an empty bag of chips. In other experiments, the same was done with the surface of aluminum foil, a glass of water, and even the leaves of a house plant. In 2014, the team presented their achievements at the annual SIGGRAPH conference.
Video from a TED talk by one of the researchers working on the project
The point is that when sound hits an object, it makes the object vibrate. The motions these vibrations create are too small and fleeting for a person to see, but a camera can "see" them: to extract the audio signal from video, the scientists recorded at a frame rate higher than the frequency of the audio signal.
Initially the experiments used cameras shooting at 2,000 and 6,000 frames per second, but the researchers also tried cheaper, more ordinary cameras. From video shot at 60 frames per second they could not, of course, extract intelligible speech, but it was still possible to determine how many people were in the room, their gender, and even features of their pronunciation.
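The frame-rate requirement is essentially the Nyquist sampling condition: to capture a tone faithfully, the camera must sample at more than twice the tone's frequency; otherwise the tone "folds" down to a false, lower frequency. A minimal sketch of this folding (our own illustration, not the researchers' code):

```python
def apparent_frequency(signal_hz, frame_rate):
    """Frequency that a camera sampling at `frame_rate` frames per second
    would observe for a pure tone at `signal_hz`.

    If frame_rate > 2 * signal_hz (the Nyquist condition), the tone is
    seen at its true frequency; otherwise it aliases, folding down into
    the band [0, frame_rate / 2].
    """
    return abs(signal_hz - frame_rate * round(signal_hz / frame_rate))

# A 440 Hz tone filmed at 2000 fps keeps its true frequency...
print(apparent_frequency(440, 2000))  # 440
# ...but filmed at 60 fps it masquerades as a 20 Hz flutter.
print(apparent_frequency(440, 60))    # 20
```

This is why a 60 fps camera cannot deliver intelligible speech, yet still picks up slow, low-frequency traces of what is happening in the room.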
Of course, "spy stories" come to mind when thinking about applications of such work, but the researchers themselves describe their project as a chance to discover new facets of objects and study their previously unexplored properties. And if hundreds of years ago people struggled to invent a way to "record sound", today such "recording" happens as a side effect, one that, in turn, reveals new properties of familiar objects.
Do it yourself
As already mentioned, the first phonautogram was deciphered using a technology for reproducing sound from photographs of old records (we have written about this technology in one of our earlier materials, which also contains links to transcribed audio recordings). Patrick Feaster, however, insists that anyone can handle this task, provided they know what to do.
The process is described in detail in this material. We would add that, to pull it off, you will need a high-quality photograph, basic Photoshop skills (the wave traced on the vinyl has to be digitized and "straightened", since the groove winds in a spiral, and then cleared of all kinds of noise and distortion), and a reasonably powerful computer with plenty of RAM.
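The "despiralization" step can also be sketched programmatically: walk outward along an Archimedean spiral that matches the groove's pitch and sample the photograph at each point, turning each revolution into one straight row of pixels. A toy sketch (the function name, the spiral model, and nearest-pixel sampling are our assumptions, not Feaster's actual workflow):

```python
import math

def unwrap_spiral(image, cx, cy, r0, pitch, turns, samples_per_turn):
    """Sample `image` (a 2D list of pixel values) along an Archimedean
    spiral centred at (cx, cy), starting at radius `r0` and growing by
    `pitch` pixels per revolution. Returns one straightened row of
    nearest-pixel samples per revolution of the groove."""
    rows = []
    for turn in range(turns):
        row = []
        for i in range(samples_per_turn):
            frac = i / samples_per_turn      # position within this turn
            theta = 2 * math.pi * frac       # angle along the spiral
            r = r0 + pitch * (turn + frac)   # radius grows steadily
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy + r * math.sin(theta)))
            row.append(image[y][x])
        rows.append(row)
    return rows
```

On a real photograph you would also need to locate the groove centre precisely and interpolate between pixels rather than taking the nearest one; that is exactly the fiddly manual work described above.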
To convert the resulting image into a WAV file, Patrick uses a rather exotic piece of software: the ImageToSound program. It is free, yet quite difficult to find on the net (Patrick shared a source).
The program sequentially converts each block of the image (each block one pixel wide) into an audio sample. Unfortunately, this software does not run even on Windows 7 (the author keeps a separate computer with Windows 98 for this work). As an alternative, Feaster suggests the AEO-Light program, but warns that he himself is not fully familiar with the details of working with it.
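The per-column conversion described above can be sketched in a few lines of Python: average each 1-pixel-wide column's brightness, map it to a signed 16-bit sample, and write the result with the standard wave module. The averaging and scaling details here are our assumptions; ImageToSound's exact algorithm may differ.

```python
import struct
import wave

def columns_to_samples(image):
    """Convert a grayscale image (2D list, brightness 0-255) into 16-bit
    audio samples: each 1-pixel-wide column becomes one sample whose
    amplitude reflects the column's mean brightness."""
    height = len(image)
    samples = []
    for x in range(len(image[0])):
        mean = sum(row[x] for row in image) / height          # 0.0 .. 255.0
        samples.append(int((mean / 255.0 * 2 - 1) * 32767))   # -32767 .. 32767
    return samples

def write_wav(path, samples, sample_rate=44100):
    """Write mono 16-bit PCM samples to a WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"".join(struct.pack("<h", s) for s in samples))
```

A black column maps to the minimum sample value, a white one to the maximum, so the brightness profile along the image becomes the waveform.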
The last step is adjusting the playback speed, and here simple arithmetic comes to the rescue. You first need to know the playback speed of the original disc, the length of one revolution of the digitized wave (after "despiralization") in pixels, and the sampling rate of the final file.
If the image was converted into an audio file with a sampling rate of 44.1 kHz, then one second of the audio file corresponds to 44,100 image pixels. If, say, the record spun at 50 rpm and, after digitizing and despiralizing, one revolution took 30,000 pixels, we get 1,500,000 pixels per minute (50 × 30,000).
Dividing this number by 60 gives the number of pixels per second (1,500,000 / 60 = 25,000). Dividing the sampling rate by the number of pixels per second gives the speed factor (44,100 / 25,000 = 1.764). Multiplying this factor by the length of the audio file (its playback time) gives the duration at which the recording was originally made. If the playback speed of the original recording is not known, Patrick advises choosing the final speed by ear.
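The arithmetic above can be packed into a couple of helper functions (the names are ours; the formula is exactly the one just described):

```python
def speed_factor(rpm, pixels_per_revolution, sample_rate=44100):
    """How many times too fast the digitized file plays back: the ratio
    of the file's sampling rate to the image's pixels-per-second rate."""
    pixels_per_second = rpm * pixels_per_revolution / 60
    return sample_rate / pixels_per_second

def original_duration(file_seconds, rpm, pixels_per_revolution, sample_rate=44100):
    """Duration of the recording when played at its original speed."""
    return file_seconds * speed_factor(rpm, pixels_per_revolution, sample_rate)

# The article's example: 50 rpm, 30,000 pixels per revolution.
print(speed_factor(50, 30000))  # 1.764
```

Slowing the digitized file down by this factor (or, equivalently, resampling it) restores the original pitch and tempo.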
Patrick Feaster warns that this is painstaking work that takes time and patience, but it can yield astonishing results, especially when it comes to voices of the past that, it would seem, were lost forever.