Part 1: A Hodgepodge of Walled Gardens
Audio has a strange place in video game development. Everyone knows the importance of sound in games - both in setting the tone (the old adage from film: visuals tell you what's happening, sound tells you how to feel about it) and in giving you information about what's happening around you.
Our brains process sound faster than visuals, and hearing isn't restricted by field of view or focus, which makes clear sound a key channel for delivering gameplay information to players. For something like cars, which typically have little animation, sound completely drives immersion.
How is Sound Made in Games?
Ask your average game developer what the sound team does, and you'll usually get a blank stare. The sound guys might as well be in a back room performing arcane rituals. The reason is simple - there is no straightforward pipeline for sound like there is for graphics.
Let's be honest - audio today is a glorified jukebox. The techniques for sound design are adapted from film and other linear media: see a sound, record a sound. The problem is that in nonlinear media, anything can happen. It's like making foley for all possible parallel universes at once.
The audio pipeline is, to put it charitably, evolved rather than designed. We have a collection of different tools that we arrange together on a case-by-case basis. Despite some amazing tools, there's an underlying problem: each of these tools performs only PART of the pipeline, and none of them talk to each other.
For example, if I'm doing sound design in my DAW (options: Nuendo, Pro Tools, Ableton Live, Reaper, and more), I can't hear what it sounds like in the game with environmental effects applied (reverb, Doppler, distance low-passing). These effects often have resonant peaks that matter a lot to the final sound, so playing these sounds in the game has to be part of the iteration loop. Hundreds of hours are lost switching between environments over the course of a medium-sized game.
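To make the "distance low-passing" part concrete, here is a minimal sketch of the kind of effect a sound designer can't hear from inside the DAW: a one-pole low-pass whose cutoff falls with listener distance, a common stand-in for air absorption. The mapping constants (`refDist`, `minCutoff`, `maxCutoff`) are illustrative assumptions, not values from any particular engine or middleware.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Sketch of a distance-driven low-pass filter (assumed constants).
struct DistanceLowPass {
    float state = 0.0f; // one-pole filter memory

    // Map listener distance (meters) to a cutoff frequency (Hz).
    static float cutoffForDistance(float meters) {
        const float maxCutoff = 20000.0f; // full bandwidth up close
        const float minCutoff = 800.0f;   // heavily muffled far away
        const float refDist   = 10.0f;    // distance where rolloff begins
        float t = std::min(1.0f, std::max(0.0f, (meters - refDist) / 100.0f));
        return maxCutoff + t * (minCutoff - maxCutoff);
    }

    // Process one sample through a one-pole low-pass at the given cutoff.
    float process(float input, float cutoffHz, float sampleRate) {
        float a = 1.0f - std::exp(-2.0f * 3.14159265f * cutoffHz / sampleRate);
        state += a * (input - state);
        return state;
    }
};
```

The point is not the filter itself - it's that this code lives in the engine or middleware, so the designer auditioning a sound in the DAW never hears it until the sound is exported, imported, and triggered in-game.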
But this hodgepodge of different walled garden environments causes a deeper, more fundamental problem, one that affects the future of audio.
Part 2: Why Generative Audio Seems Doomed
Everyone I talk to about sound in games agrees the future is generative audio. The only problem is that I've heard that for the last 13 years, and not much progress has been made. I'm going to argue that the hodgepodge of audio environments laid out in Part 1 is what's impeding progress. To make that argument, let's take a case study.
Meet the video game audio developer's arch-nemesis: the car sound.
Making a great car sound is hard. There are so many factors that come into play - the speed, acceleration, size of the engine, load on the engine as it goes up slopes or whines downhill, loss of traction on turns, sudden stops as it impacts and the engine winds down, materials under the wheels, damage on the engine, and more.
All of these parameters go into the car sound system, and what comes out is a single stream of audio. Depending on the game, only a subset matters, but most of the time players want to know the state of their own vehicle while their eyes are on the road, and what kind of vehicle is coming up behind them so they know how to react.
And even when you can see the vehicle, you can't rely on visuals - there aren't many animations on a solid hunk of metal.
So: there are a lot of vehicles in games; this should be a solved problem, right? Well, options exist - shout out to Crankcase Audio's REV plugin - but only for a specific subset of the problem. The truth is, I've lost count of how many times I've implemented vehicle sounds in games from scratch.
What keeps us from being able to reuse our previous work? That's right, the hodgepodge.
To implement a car sound, we need to work across every part of the audio pipeline: designing sounds in the DAW, setting up DSP chains in the audio middleware, and writing code in C++ / C# / Blueprints / Lua / whatever to handle the complex logic of translating many parameters into a single sound efficiently.
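The "many parameters in, one stream out" logic can be sketched in miniature. A common approach is to record the engine at a handful of RPMs and crossfade between those loops, pitch-shifting each to bridge the gaps. The names here (`EngineLayer`, `LayerMix`, `mixForRpm`) and the linear crossfade are illustrative assumptions, not any specific middleware's API - a real system would also fold in load, traction, damage, and the rest.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct EngineLayer {
    float nativeRpm; // RPM at which this loop was recorded
    // ...a handle to the actual loop asset would live here
};

struct LayerMix {
    float pitchRatio; // playback rate so the loop matches current RPM
    float gain;       // crossfade weight
};

// Given current RPM, compute pitch and gain for each layer so that
// adjacent layers crossfade linearly and each plays near its native pitch.
// Assumes layers are sorted by nativeRpm and non-empty.
std::vector<LayerMix> mixForRpm(const std::vector<EngineLayer>& layers, float rpm) {
    std::vector<LayerMix> out(layers.size(), LayerMix{1.0f, 0.0f});
    for (size_t i = 0; i < layers.size(); ++i) {
        out[i].pitchRatio = rpm / layers[i].nativeRpm;
        if (i + 1 < layers.size()) {
            float lo = layers[i].nativeRpm, hi = layers[i + 1].nativeRpm;
            if (rpm >= lo && rpm < hi) {
                float t = (rpm - lo) / (hi - lo);
                out[i].gain     = 1.0f - t;
                out[i + 1].gain = t;
            }
        }
    }
    // Below the first loop or above the last, play that loop solo.
    if (rpm < layers.front().nativeRpm) out.front().gain = 1.0f;
    if (rpm >= layers.back().nativeRpm) out.back().gain = 1.0f;
    return out;
}
```

Even this toy version shows the split the hodgepodge forces: the loops come from the DAW, the crossfade curves usually live in middleware, and code like this sits in the engine - three environments for one sound.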
When each game has a slightly different pipeline of tools, and anything interesting with sound requires implementation spread across multiple environments, it becomes hard to reuse your own work, let alone build on techniques developed by others.
This problem is the reason behind the big paradox of video game audio: how can there be so many great people with amazing ideas implementing innovative sound, yet the field still seems to suffer from a lack of innovation?
The good news is, audio is not alone in this problem. We can look at history to find a solution.
Part 3: The Graphics Wars
It's hard to remember what game development was like in the mid-90s. Until then, almost all games were written from scratch, with graphics, game logic, and audio implemented on a case-by-case basis. But a shift was occurring - companies put extra money into making their code reusable and leveraged that to make game development faster and more predictable. A handful of these companies made their internal tools slick enough to offer to others as well.
But even with these changes, graphics suffered from the same problems that audio suffers from today. People working in graphics were either traditionally trained artists who had moved to computers or pure engineers picking apart the mechanics of light (in the latter category, the king himself, John Carmack). These creators, split into artists and engineers, paired up to build worlds from scratch. But if one moved to another company, they had to relearn everything from scratch. Outsourcing anything beyond flat images was impossible (sound familiar?)
Parallel to this, video game magazines were very popular. The high gloss covers featuring the latest games made them fly off the shelves. So when Nvidia saw the opportunity to sell GPUs to consumers, they found a fast friend in magazine creators.
As these magazines filled with jaw-dropping visuals made possible by GPUs, GPUs in turn flew off the shelves. And as these GPUs flew off the shelves, video game developers found an easy audience by making games that took advantage of this raw power.
Something far deeper was at play here, a standardization triggered almost by accident by a confluence of market forces. It started when game developers suddenly had to support the fixed-function GPU pipeline. They couldn't just use their own internal graphics pipeline anymore. They had to write a layer of abstraction over their graphics code to support multiple renderers.
Graphics became decoupled from the game.
And as a result, it became easier to reuse graphics work, easier to reimplement other people's techniques, easier to outsource as third-party tools implemented the growing standards, and easier to transfer knowledge to new companies, as now everyone spoke the same language.
And thus began the great graphics wars, a period of unbroken innovation that lasts to this day. Innovation, after all, is a collaborative process, and collaboration needs a common language.
So how can we apply this to audio?