An Overview of Core Audio
Core Audio is the engine behind any sound played on a Mac or iPhone OS. Its procedural API is exposed in C, which makes it directly available in Objective-C and C++, and usable from any other language that can call C functions, such as Java with the Java Native Interface, or Ruby via RubyInline. From an audio standpoint, Core Audio is high level because it is highly agnostic. It abstracts away both the implementation details of the hardware and the details of individual audio formats.
To an application developer, Core Audio is suspiciously low level. If you’re coding in C, you’re doing something wrong, or so the saying goes. The problem is, very little sits above Core Audio. Audio turns out to be a difficult problem, and all but the most trivial use cases require more decision making than even the gnarliest Objective-C framework. The good news is, the times you don’t need Core Audio are easy enough to spot, and the tasks you can do without Core Audio are pretty simple (see sidebar “When Not to Use Core Audio”).
When you use Core Audio, you’ll likely find it a far different experience from nearly anything else you’ve used in your Cocoa programming career. Even if you’ve called into other C-based Apple frameworks, such as Quartz or Core Foundation, you’ll likely be surprised by Core Audio’s style and conventions.
This chapter looks at what’s in Core Audio and where to find it. Then it broadly surveys some of its most distinctive conventions, which you’ll get a taste of by writing a simple application that exercises Core Audio’s ability to read audio metadata from files. This will be your first encounter with properties, which enable much of the work you’ll do throughout the book.
The Core Audio Frameworks
Core Audio is a collection of frameworks for working with digital audio. Broadly speaking, you can split these frameworks into two groups: audio engines, which process streams of audio, and helper APIs, which facilitate getting audio data into or out of these engines or working with them in other ways.
Both the Mac and the iPhone have three audio engine APIs:
- Audio Units. Core Audio does most of its work in this low-level API. Each unit receives a buffer of audio data from somewhere (the input hardware, another audio unit, a callback to your code, and so on), performs some work on it (such as applying an effect), and passes it on to another unit. A unit can potentially have many inputs and outputs, which makes it possible to mix multiple audio streams into one output. Chapter 7, “Audio Units: Generators, Effects, and Rendering,” talks more about Audio Units.
- Audio Queues. This is an abstraction atop audio units that makes it easier to play or record audio without having to worry about some of the threading challenges of working directly with the time-constrained I/O audio unit. With an audio queue, you record by setting up a callback function that repeatedly receives buffers of freshly captured data from the input device; you play back by filling buffers with audio data and handing them to the audio queue. You will do both of these in Chapter 4, “Recording.”
- OpenAL. This API is an industry standard for creating positional, 3D audio (in other words, surround sound) and is designed to resemble the OpenGL graphics standard. As a result, it’s ideally suited for game development. On the Mac and the iPhone, its actual implementation sits atop audio units, but working exclusively with the OpenAL API gets you surprisingly far. Chapter 9, “Positional Sound,” covers this in more detail.
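To make the audio queue model above concrete, here is the shape of a playback callback, as a sketch only: the function name and the refill logic are placeholders, and the code compiles only against the AudioToolbox framework on the Mac or iPhone.

```c
#include <AudioToolbox/AudioToolbox.h>

// Sketch of an audio queue playback callback. The queue hands you a
// buffer it has finished playing; you refill it and enqueue it again.
static void MyAQOutputCallback(void *inUserData,
                               AudioQueueRef inAQ,
                               AudioQueueBufferRef inCompleteAQBuffer) {
    // Fill inCompleteAQBuffer->mAudioData with up to
    // inCompleteAQBuffer->mAudioDataBytesCapacity bytes of samples,
    // set inCompleteAQBuffer->mAudioDataByteSize accordingly,
    // then hand the buffer back to the queue:
    AudioQueueEnqueueBuffer(inAQ, inCompleteAQBuffer, 0, NULL);
}
```

The queue calls this function on its own thread whenever it drains a buffer, which is why the queue, rather than your code, dictates the timing.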
To get data into and out of these engines, Core Audio provides various helper APIs, which are used throughout the book:
- Audio File Services. This framework abstracts away the details of various container formats for audio files. As a result, you don’t have to write code that specifically addresses the idiosyncrasies of AIFFs, WAVs, MP3s, or any other format. It enables your program to open an audio file, get or set the format of the audio data it contains, and start reading or writing.
- Audio File Stream Services. If your audio is coming from the network, this framework can help you figure out the format of the audio in the network stream. This enables you to provide it to one of the playback engines or process it in other interesting ways.
- Audio Converter Services. Audio can exist in many formats. By the time it reaches the audio engines, it needs to be in an uncompressed playable format (LPCM, discussed in Chapter 2, “The Story of Sound”). Audio Converter Services helps you convert between encoded formats such as AAC or MP3 and the uncompressed raw samples that actually go through the audio units.
- Extended Audio File Services. A combination of Audio File Services and Audio Converter Services, the Extended Audio File API enables you to read from or write to audio files and do a conversion at the same time. For example, instead of reading AAC data from a file and then converting it to uncompressed PCM in memory, you can do both in one call by using Extended Audio File Services.
- Core MIDI. Most of the Core Audio frameworks are involved with processing sampled audio that you’ve received from other sources or captured from an input device. With the Mac-only Core MIDI framework, you synthesize audio on the fly by describing musical notes and how they are to be played out—for example, whether they should sound like they’re coming from a grand piano or a ukulele. You’ll try out MIDI in Chapter 11, “Core MIDI.”
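As a taste of the property idiom these helper APIs share, the following sketch opens a file with Audio File Services and fetches its metadata dictionary. The file path is a placeholder, error checking is elided, and the code compiles only against AudioToolbox on the Mac or iPhone; the two-step get-the-size-then-get-the-value pattern is the part to notice.

```c
#include <AudioToolbox/AudioToolbox.h>

int main(void) {
    // Placeholder path; substitute any audio file on your system.
    CFURLRef url = CFURLCreateWithFileSystemPath(kCFAllocatorDefault,
                       CFSTR("/tmp/example.mp3"),
                       kCFURLPOSIXPathStyle, false);
    AudioFileID audioFile;
    AudioFileOpenURL(url, kAudioFileReadPermission, 0, &audioFile);

    // The property idiom: ask how big the value is first...
    UInt32 size = 0;
    AudioFileGetPropertyInfo(audioFile, kAudioFilePropertyInfoDictionary,
                             &size, NULL);
    // ...then fetch the value itself.
    CFDictionaryRef info = NULL;
    size = sizeof(info);
    AudioFileGetProperty(audioFile, kAudioFilePropertyInfoDictionary,
                         &size, &info);
    CFShow(info);          // dump the metadata dictionary to the console

    CFRelease(info);
    AudioFileClose(audioFile);
    CFRelease(url);
    return 0;
}
```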
A few Core Audio frameworks are platform specific:
- Audio Session Services. This iOS-only framework enables your app to coordinate its use of audio resources with the rest of the system. For example, you use this API to declare an audio “category,” which determines whether iPod audio can continue to play while your app plays and whether the ring/silent switch should silence your app. You’ll use this more in Chapter 10, “Core Audio on iOS.”
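A hedged sketch of the category declaration described above, using the C Audio Session API of this era (iOS only; the media-playback category keeps your audio going even when the ring/silent switch is flipped):

```c
#include <AudioToolbox/AudioToolbox.h>

// Sketch: initialize the audio session, declare a category, and
// activate the session. Real code would check each OSStatus result.
void SetUpAudioSession(void) {
    AudioSessionInitialize(NULL, NULL, NULL, NULL);
    UInt32 category = kAudioSessionCategory_MediaPlayback;
    AudioSessionSetProperty(kAudioSessionProperty_AudioCategory,
                            sizeof(category), &category);
    AudioSessionSetActive(true);
}
```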
As you develop your application, you’ll combine these APIs in interesting ways. For example, you could use Audio File Stream Services to get the audio data from a net radio stream and then use OpenAL to put that audio in a specific location in a 3D environment.