This post is a collection of ideas and conclusions that I have come to regarding the topic of procedural audio. The report refers to practitioners of procedural audio and attempts to propose possible next steps in the procedural audio movement for interactive games. I decided to write this post after reading a recent interview with Andy Farnell which can be accessed here. I thought to myself ‘Having these tools would be awesome!!’ But one question remained unanswered: ‘How can we get them?’ So this is my stab at trying to define, as a sound designer, what we might need to do to get these tools.
If you are familiar with procedural content generation for audio, skip to the section titled ‘Top Down’.
This section of the report will define procedural audio and then outline two possible approaches to procedural audio content generation (the ‘top down’ and ‘bottom up’ approaches).
Procedural content generation can be defined as follows:
“Procedural refers to the process that computes a particular function, procedural content generation [is the process of] generating content by computing functions.” (Nicolas Fournel, 2010). Procedural audio can therefore be seen as the process of producing audio by computing functions.
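To make the idea of ‘generating content by computing functions’ concrete, here is a minimal sketch in Python. It is only an illustration of the principle, not a method taken from Fournel's talk: the sample rate, the function name `sine_tone` and its parameters are all my own assumptions. Each sample is computed from a formula instead of being read from a stored recording.

```python
import math

SAMPLE_RATE = 44100  # samples per second (assumed; a common audio rate)


def sine_tone(frequency_hz, duration_s, amplitude=0.5):
    """Compute a sine tone sample by sample instead of loading it from storage.

    The name and parameters are hypothetical, chosen for illustration only.
    """
    n_samples = round(SAMPLE_RATE * duration_s)
    return [
        amplitude * math.sin(2.0 * math.pi * frequency_hz * n / SAMPLE_RATE)
        for n in range(n_samples)
    ]


# A tenth of a second of A440, produced entirely by computation.
tone = sine_tone(440.0, 0.1)
```

Changing `frequency_hz` or `duration_s` produces a different sound without any new audio data being authored or stored.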
Currently most audio in interactive games relies on some form of sample playback. This method requires audio data to be stored on disk, and any audio that must play back instantly (i.e. sooner than it can be streamed from disk), such as footsteps or gunshots, must be held in RAM. The Game Audio Tutorial makes the following point:
“Some of your sounds will play off disk, where they still have to compete for space with all the graphics assets, but many will need to be loaded up into RAM to be able to play instantaneously when needed. Ram costs money. Ram is limited.” (R. Stevens, D. Raybould, 2011).
If the audio were produced procedurally, there would be little or no requirement for audio data to be stored in RAM.
The other problem with the ‘data driven’ approach to audio is that sounds can only be as reactive as the processing and layering allowed by the audio data a sound designer has produced. Procedural audio, however, can be generated according to the exact conditions within a game.
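As a rough back-of-the-envelope illustration of the RAM point, the sketch below compares the size of a small bank of uncompressed samples with the size of a handful of synthesis parameters. Every number in it is an assumption of mine (eight half-second footstep variations, 16-bit mono at 44.1 kHz, six 32-bit parameters for a hypothetical model). Real games usually compress their audio, and a procedural model also costs CPU time and code size, so treat this as an order-of-magnitude comparison only.

```python
def pcm_bytes(duration_s, sample_rate=44100, bit_depth=16, channels=1):
    """Size in bytes of uncompressed PCM audio with the given format."""
    return int(duration_s * sample_rate * (bit_depth // 8) * channels)


# Assumed sample bank: 8 footstep variations of 0.5 s each.
footstep_variations = 8
sample_bank_bytes = footstep_variations * pcm_bytes(0.5)

# Assumed procedural model: 6 parameters stored as 32-bit floats.
parameter_bytes = 6 * 4

print(sample_bank_bytes)  # 352800
print(parameter_bytes)    # 24
```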
Nicolas Fournel summarizes when procedural content generation is useful:
“Procedural content generation is used due to memory constraints or other technological limitations.” He continues and states it can also be “used when there is too much content to create, when we need variations of the same asset and when the asset changes depending on the game context.” (Nicolas Fournel, 2010).
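Two of the cases Fournel lists, variations of the same asset and assets that change with the game context, can be sketched with one small function. This is a hypothetical example of my own, not something from his talk: `impact_burst`, the `intensity` parameter and the mapping from intensity to decay rate are all invented for illustration. A seed gives a repeatable variation, and the intensity value stands in for whatever the game knows about the event.

```python
import math
import random

SAMPLE_RATE = 44100  # assumed sample rate


def impact_burst(intensity, seed, duration_s=0.2):
    """Decaying noise burst driven by a hypothetical game parameter.

    intensity: 0.0 to 1.0, e.g. how hard something hit (an assumption).
    seed: selects one repeatable variation of the same asset.
    """
    rng = random.Random(seed)
    # Assumed mapping: harder impacts are louder and ring for longer.
    decay_rate = 40.0 - 20.0 * intensity
    samples = []
    for n in range(round(SAMPLE_RATE * duration_s)):
        t = n / SAMPLE_RATE
        envelope = intensity * math.exp(-decay_rate * t)
        samples.append(envelope * rng.uniform(-1.0, 1.0))
    return samples


soft_step = impact_burst(intensity=0.2, seed=1)
hard_step = impact_burst(intensity=0.9, seed=1)
another_hard_step = impact_burst(intensity=0.9, seed=2)  # same asset, new variation
```

With sample playback, each of these three sounds would be a separate recording. Here they are three calls to the same function.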
Top Down (analysis and re-synthesis)
The ‘top down’ approach involves analysis of sounds and then various methods of synthesis to either partially or fully reconstruct that sound. This method could also be considered as ‘data driven procedural audio’.
Nicolas Fournel describes this method:
“Top down, you analyze [the] example of the sound you want to create and you find the adequate synthesis system to emulate them.” (Nicolas Fournel, 2010).
This method allows a sound designer to create an asset such as a gunshot sound in the normal way. The sound is then analyzed and re-created using an appropriate method of synthesis. The advantage of this is that artistic control over a sound is maintained whilst many of the benefits of procedural content generation are still gained.
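The analyze-then-resynthesize loop can be sketched in a few lines. This is a deliberately tiny illustration under my own assumptions, not the system Fournel describes: the ‘recorded’ sound is a stand-in built in code, the analysis is a naive discrete Fourier transform that finds only the single strongest partial, and the synthesis is one sine wave. Phase is discarded, and all function names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 8000  # assumed; kept low so the naive DFT stays fast
N = 400             # analysis length, giving 20 Hz per frequency bin


def dft_magnitudes(signal):
    """Amplitude of each frequency bin, via a naive O(n^2) DFT."""
    n = len(signal)
    mags = []
    for k in range(n // 2):
        acc = sum(signal[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        mags.append(abs(acc) * 2.0 / n)
    return mags


def analyze(signal, sample_rate):
    """Return (frequency_hz, amplitude) of the strongest partial, skipping DC."""
    mags = dft_magnitudes(signal)
    peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
    return peak_bin * sample_rate / len(signal), mags[peak_bin]


def resynthesize(frequency_hz, amplitude, n_samples, sample_rate):
    """Rebuild the sound from the analysis data with a single sine oscillator."""
    return [
        amplitude * math.sin(2.0 * math.pi * frequency_hz * i / sample_rate)
        for i in range(n_samples)
    ]


# Stand-in for a designer's recording: a strong 440 Hz partial plus a weak 1200 Hz one.
recorded = [
    0.8 * math.sin(2.0 * math.pi * 440.0 * i / SAMPLE_RATE)
    + 0.1 * math.sin(2.0 * math.pi * 1200.0 * i / SAMPLE_RATE)
    for i in range(N)
]

estimated_freq, estimated_amp = analyze(recorded, SAMPLE_RATE)
rebuilt = resynthesize(estimated_freq, estimated_amp, N, SAMPLE_RATE)
```

The rebuilt sound keeps the dominant partial and drops the weaker one, which is the ‘partially reconstruct’ case. A fuller system would track many partials over time, but the workflow is the same: the designer's sound goes in, and a small set of synthesis parameters comes out.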
Bottom Up (Physical Modeling)
The ‘bottom up’ approach uses physical modeling to synthesize a sound from scratch. Below is a discussion of the advantages and disadvantages of this approach.
“Sound is, in the end a physical phenomena. So you could go and simulate everything on a machine. This gives you lots of automation.” (Nikunj Raghuvanshi, 2011). Essentially, the sound is created according to the physical model in the game, which allows it to react to the exact conditions in the game at any given moment. Raghuvanshi goes on to state the disadvantages of this approach:
“It takes a lot of computing, you could not [currently] do this at run-time.” He also states another disadvantage: “There is no space for any artistic control over sounds that are produced or propagate.” (Nikunj Raghuvanshi, 2011).
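To give a flavour of what ‘simulating everything’ means at the smallest possible scale, here is a sketch of a single struck object modeled as a mass on a damped spring. This is my own toy example and not the technique Raghuvanshi presents: the function name, the physical values and the choice of semi-implicit Euler integration are all assumptions. The waveform is never described directly; it falls out of the physics.

```python
import math

SAMPLE_RATE = 44100  # assumed; also used as the simulation step rate


def struck_resonator(mass_kg, stiffness, damping, strike_velocity, duration_s):
    """Simulate a mass on a damped spring after it has been struck.

    Returns the displacement at each audio sample. All parameters are
    physical quantities that a game's physics system could in principle supply.
    """
    dt = 1.0 / SAMPLE_RATE
    position, velocity = 0.0, strike_velocity
    samples = []
    for _ in range(round(SAMPLE_RATE * duration_s)):
        acceleration = (-stiffness * position - damping * velocity) / mass_kg
        velocity += acceleration * dt  # semi-implicit Euler: velocity first,
        position += velocity * dt      # then position from the new velocity
        samples.append(position)
    return samples


# Assumed values: a 1 g object tuned so that it rings at roughly 440 Hz.
mass = 0.001
stiffness = mass * (2.0 * math.pi * 440.0) ** 2
ring = struck_resonator(mass, stiffness, damping=0.01, strike_velocity=1.0, duration_s=0.5)
```

A harder strike (a larger `strike_velocity`) or a heavier object changes the sound automatically, which is the reactivity described above. The raw displacement values are tiny and would need scaling before playback. Even this single resonator runs one physics step per audio sample, so a full scene of interacting objects shows why the computing cost is raised as a disadvantage.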
Referring back to Nicolas Fournel’s list of uses for procedural content generation stated earlier, this approach best serves the case where the “asset changes depending on the game context.”
Andy Farnell provides another reason why this method could be very useful: the “problem is how to provide the colossal amounts of content required to populate virtual worlds for modern video games.” (Andy Farnell, 2007). Farnell also describes a further advantage of the ‘bottom up’ approach:
“One of the great advantages is that it gives 90% of your assets for free. You just put your objects in the world and you get default sounds” (Andy Farnell, 2012).
Comparison and Conclusion
In this section the two methods are compared, in an attempt to define how each can be used and what the possible ‘next steps’ for procedural audio in interactive games may be.
As stated earlier, Nikunj Raghuvanshi argues that the bottom up approach means a loss of “artistic control” (Nikunj Raghuvanshi, 2011). However, Andy Farnell argues that this does not have to be the case with a physical modeling based method: “Every Procedural Audio team would need a good sound designer. I wouldn’t leave it to the programmers, I want somebody who has a great set of ears and I would actually put them in a higher position and get them to direct the programmers and say, ‘No its more like this, listen to these examples. I want to get this emotion across’, and they can direct it aesthetically.” (Andy Farnell, 2012).
As mentioned in the ‘Bottom Up’ section of this report, Farnell believes that you could get a lot of sounds without having to create the content from scratch. Sound designers would then be free to fine tune the sounds, thus maintaining creative control.
The implication of this method is that it requires a complete restructuring of how audio is created for games. In other words, the day-to-day work a sound designer does to design sounds would change completely.
The top down approach can provide at least a temporary solution to this quandary, acting as a step in which current sound design methods meet procedural techniques. This allows sound designers to create the hyper-real, artistically controlled sounds game developers are used to, with some of the benefits of procedural audio. This, I believe, is an important step towards getting game developers to accept these approaches. As Farnell states: “it dawned on me that there weren’t any fundamental obstacles to radical technical progress. We could do this. The obstacles were structural and political. How do you introduce a new technology?” (Andy Farnell, 2012). Currently the game industry has a highly developed process of audio content creation. If the bottom up approach to procedural audio is introduced hastily, the sound designers responsible for the quality of the audio will not be able to match the potential and perceived quality of the audio currently in games. This is due to a lack of education regarding procedural audio. That education must therefore be a gradual process, and the top down approach could feasibly begin providing sound designers with it.
Before physical model based procedural audio can be practiced in games, sound designers must be able to reach and surpass the perceived quality of current game audio using those techniques. As the Game Audio Tutorial states: “You will have to constantly convince people of the importance of investing in sound and music.” (R. Stevens, D. Raybould, 2011). In order to convince them, procedural audio must sound at least as perceptually good as current sound design, with the additional benefits of procedural content generation being the selling point. If procedural techniques are proposed to developers before they are up to scratch, it could damage future efforts to introduce such systems.
To allow movement towards the physical model based approach, programmers with an intimate knowledge of sound design and the physical creation of sound must begin contributing synthesis models to the community, allowing sound designers to tweak and get the best from such systems.
· The top down approach fits in well with current sound design practice.
· Procedural approaches should only be proposed (as commercially viable solutions) when they sound as perceptually good as the current data driven approach, with the added benefits of procedural content generation.
· Synthesis models should be shared, allowing sound designers and audio professionals to learn and make the most out of them.
Thank you for taking the time to read this collection of thoughts and information. I’d love to discuss this with people. You can email me (firstname.lastname@example.org) or use the comment box on this post (linked top left of post).
For more information on procedural audio Fournel has put together the wonderful:
Andy Farnell. (2007). Synthetic game audio with Puredata.
Andy Farnell. (2012). Procedural Audio: Interview with Andy Farnell. Available: http://designingsound.org/2012/01/procedural-audio-interview-with-andy-farnell/. Last accessed 21/1/2012.
Nicolas Fournel. (2010). What is Procedural Audio?. Available: http://www.gdcvault.com/play/1012704/Procedural-Audio-for-Video-Games. Last accessed 20/1/2012.
Nikunj Raghuvanshi. (2011). Sound Synthesis in CRACKDOWN 2 and Wave Acoustics for Games. Available: http://www.gdcvault.com/play/1014416/Sound-Synthesis-in-CRACKDOWN-2. Last accessed 19/1/2012.
R. Stevens, D. Raybould (2011). The Game Audio Tutorial. UK: Focal Press. p33, pxvii.