An Interview with the Authors of Computer Graphics
Andrew Glassner: What topics in computer graphics excite you personally right now, and why? How has that list changed over the years?
John “Spike” Hughes: Making geometric models is still amazingly difficult. As we get better and better at rendering, the models have to keep up, so making a model entails choosing reflectance characteristics, applying textures, applying things like displacement maps for fine detail, and building level-of-detail representations for use when the model is seen at a distance. The problem is that each of these steps involves a representational shift. We think of “geometry” and “reflectance” as different, but when you look closely, reflectance characteristics turn out to be partly due to fine-scale geometry: our reflectance models are just ways to avoid the work of generating all that fine geometry, especially when the place-to-place variation in geometry results in no net difference in appearance. Any two parts of a stick of chalk look about the same, but if you look closely at a piece of chalk, the variation does matter. Displacement maps are a way to bridge the representations between the “geometry” scale and the “microgeometry” scale. Texture mapping is a way to push information from one scale into another in a way that seems appropriate: a texture-mapped apple looks good in a picture of a basketful of apples, but close up, the lack of further detail in the texture makes the apple look unreal. I’m intrigued by the problem of letting people rapidly make models that behave pretty well at various scales.
Something similar happened in typesetting in the 1970s and 1980s. Some of the first tools let you make amazingly beautiful documents, but they required a lot of skill and some expensive machinery. Outside book publishing, the world was divided into those with typewriters and those with high-end computers running phototypesetting software. But pretty soon, software arose that anyone could use to make pretty decent-looking documents: you could use multiple fonts, change font sizes, use italic and bold, and so on.
On the downside, that ability just about killed the high-end stuff: when you could get nice-looking results cheaply, and beautiful results for lots of money, “cheap” won. And for a while, cheap-and-ugly won: people got the ability to use varying fonts, and they made a mess! But within ten years, elementary-school kids were producing documents that looked better than some of the textbooks I used in college. That needs to happen with modeling: it has to get so easy that everyone can do it, without having to think about all the subtleties of scale and level-of-detail and representation.
At the other extreme, I’m interested in fundamentals. Are polygon meshes really the right way to represent geometry? Are linear transformations really the right way to think about altering shape? Polygon meshes have the problem that they contain discontinuities at edges and vertices, and those discontinuities let you construct situations in which our models produce bad results. For instance, if you focus a whole lot of light so that it hits exactly on an edge, and the surface is pure specular, the standard models of graphics don’t say what happens to the light. Numerically, it probably reflects from one of the two adjacent faces, but relying on numerical representations to address a lack of clarity in the definitions is generally a bad idea: the moment your surface or light source starts to move, the numerical choices are likely to vary because of roundoff error, and you’re going to get something that looks very bad. And all this is happening in a situation that’s perfectly nice and simple: a light shining through a magnifying glass onto a shiny sphere that just happens to be represented as an icosahedron.
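A minimal sketch of the edge problem described above, assuming two flat mirror facets that meet along the line x = 0 and are tilted 10 degrees to either side (hypothetical stand-ins for neighbouring faces of the icosahedral "sphere"). Which face a hit point belongs to is decided by a comparison in `pick_face`, whose tie-breaking rule is an arbitrary choice made for this illustration; at the edge itself, that choice and any roundoff in the hit point decide where the light goes.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reflect(d, n):
    """Mirror reflection of direction d about unit normal n: r = d - 2 (d . n) n."""
    k = 2.0 * dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))

# Two facets sharing the edge x = 0, each tilted 10 degrees away from +y (hypothetical).
tilt = math.radians(10.0)
n_left = (-math.sin(tilt), math.cos(tilt), 0.0)
n_right = (math.sin(tilt), math.cos(tilt), 0.0)

def pick_face(hit_x):
    # Assumed rule: x < 0 belongs to the left face. Exactly on the edge the
    # answer is arbitrary, and in a real renderer roundoff decides it.
    return n_left if hit_x < 0.0 else n_right

incoming = (0.0, -1.0, 0.0)  # light travelling straight down onto the edge

for hit_x in (-1e-12, 0.0, 1e-12):
    r = reflect(incoming, pick_face(hit_x))
    print(hit_x, tuple(round(c, 3) for c in r))
```

The two faces send the same incoming ray off in directions 40 degrees apart, so a shift of 1e-12 in the hit point is enough to flip the outcome.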
At an intermediate level, I’m very interested in the problem of creating objects or images or diagrams with semantic content. Trying to make figures for this edition of the book was amazingly frustrating to me because the tools I had were all adapted to different levels of representation. I wanted to make things like intersecting 3D planes, but I wanted to draw lines on them that were labeled with text that was readable in the final 2D figure, with arrows that made sense as “labeling arrows” rather than physical arrows in 3-space. I could do all that, but the moment I changed my view direction, the labels would all be messed up. The problem was that the 2D and 3D objects needed to communicate at a semantic level, but the interface for that was not really defined.
Morgan McGuire: Let's start with how the list has changed. 3D graphics is now ubiquitous. Just reaching that point would have been on my list in previous decades. Many people in industrialized nations have smart phones with 3D graphics processors in them. Modern operating systems all base their rendering on 3D APIs like DirectX and OpenGL. Millions of people spend time every day in 3D virtual worlds through video games. 3D printing has become affordable for consumer goods. Nearly every television show and film seamlessly incorporates computer generated imagery, and the design of nearly all products is assisted by 3D modeling. These are the results of decades of research and engineering on exciting topics.
Two new topics that I'm personally excited about in graphics for the future are expressive rendering and (wearable) augmented reality. Both have long been science fiction dreams, and I think that recent systems and algorithmic advances make this the right time to revisit these ideas. (Seamless and ubiquitous solutions for consumers will probably take decades.)
Expressive rendering seeks to create more abstracted images and animations, in the same ways that many traditional artists intentionally diverge from realistic imagery. This is important for visual communication: abstraction and interpretation are how we convey information clearly and with emotion. The history of natural media did not end when artists achieved photorealism, around the time of the Renaissance. Instead, that achievement was the *beginning* of the more abstract styles that now dominate commercial art, fine arts, and graphic design. In computer graphics, I think we're approaching the point where we understand the fundamentals of photorealism. With mastery of the fundamentals, we are now ready to begin a more in-depth exploration of expressive rendering. Except in animated films, expressive rendering was not previously mainstream in the graphics community, and it remains a barely explored area of visual communication through computation.
Augmented reality integrates virtual images with the real world, in real time. Wearing a see-through head-mounted display, like eyeglasses with an embedded transparent computer display, is one way to merge the real and virtual images. Augmented reality will probably require thousands of times as much computation per pixel as classic 3D rendering and will depend on technologies that are only now appearing in labs, such as lightweight eye trackers and head-mounted displays. The rapid adoption of 3D rendering, feature tracking, face detection, and accelerometers in mobile devices shows how quickly exotic graphics technologies can enter the consumer space when driven by appropriate demand and applications. It also shows that many of the underlying systems required for augmented reality are already affordable in a wearable context. Now we have to do the same for eye tracking, low-latency displays, low-latency graphics pipelines, and robust real-time 3D scanning and relighting. This should also drive work on no-contact interfaces such as gesture recognition and speech recognition.
Andy Van Dam: Over the decades there have been many “quests” running in parallel. One of the most obvious ones is “photorealistic rendering,” where we now have phenomenally realistic imagery being produced in real time even on commodity computers. It still astonishes me how far we've come, and how fast, and that my grandson can put together a game computer that is far more powerful than the most expensive high-end graphics workstations, and indeed the supercomputers, of a decade ago. But what has interested me most since I wrote my first paper, “Computer Driven Displays and their Use in Man/Machine Interaction,” nearly 50 years ago, is that we still have a long way to go in having “natural” user experiences where more of our human sensory apparatus is engaged in a transparent, fluid way, assisted by computer intelligence that knows about our context, preferences and intent in performing tasks. So to put it bluntly, I see realistic (and even non-realistic) rendering as a largely solved problem, compared to what remains to be done to turn computers into intelligent partners that I can control with ease (and pleasure). My group and I are largely focused on pen- and touch-interaction and the creative use of gesture recognition. Much more work can happen there before we have a paradigm shift as profound and universal as that from the command line interface to the GUI in the seventies (PARC) and eighties (Apple, then Microsoft), all building on much earlier work by the recently departed grand visionary Doug Engelbart and his SRI team.
Andrew: What interesting topics in computer graphics are under-explored?
Spike: There are a ton of places in graphics where we have operations that don’t commute, i.e., doing first A and then B is different than doing first B and then A. For example, we can represent the spectral distribution of light leaving the light sources in a scene, and the per-wavelength reflectivity of various surfaces, and by taking a kind of per-wavelength product, compute the light reflected from objects in the scene, and eventually compute the spectral distributions of the light arriving at a sensor, and then say how “red”, “green”, and “blue” it looks, and store these three values as a pixel of an image. Alternatively, we can represent the light and the reflectivity by storing “red”, “green”, and “blue” amounts of light and reflectivity, and the “per wavelength product” becomes a “product of red values, product of green values, product of blue values” operation. The end results will generally be somewhat different. The operations of “compute reflected light” and “condense the representation to tristimulus values” don’t commute. In fact, when you choose a three-sample representation for light, you’ve implicitly chosen to ignore metamers (different spectral distributions that look the same). We do this a lot in graphics: we observe two things that don’t commute, pretend that they do because it makes things faster or simpler or cheaper, and move on. But in many cases we’re not entirely aware of the consequences of these choices, and in some cases, we’re not even aware that we’ve made the choices. I’d like to see someone understand these things, and put bounds on the amount of error that results as a consequence of “false commutation” in various situations.
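The non-commutation described above can be checked numerically. The sketch below uses six coarse wavelength bins and made-up sensor, light, and reflectance curves (all hypothetical numbers, not measured data). It compares "multiply per wavelength, then reduce to RGB" with "reduce both to RGB, then multiply per channel". The helper `to_rgb` normalizes each channel so that a flat spectrum of 1.0 maps to (1, 1, 1); that is one common convention, assumed here, not the only possible one.

```python
# Six coarse wavelength bins, short to long. All curves are hypothetical.
SENS_R = [0.0, 0.0, 0.1, 0.4, 1.0, 0.6]
SENS_G = [0.1, 0.5, 1.0, 0.5, 0.1, 0.0]
SENS_B = [1.0, 0.6, 0.1, 0.0, 0.0, 0.0]

light = [0.2, 0.4, 1.0, 1.0, 0.6, 0.3]        # spectral power of the light source
reflectance = [0.9, 0.1, 0.8, 0.1, 0.9, 0.1]  # per-wavelength reflectivity of a surface

def to_rgb(spectrum):
    """Condense a spectrum to three numbers, normalized so a flat 1.0 spectrum gives (1, 1, 1)."""
    return tuple(
        sum(s * x for s, x in zip(sens, spectrum)) / sum(sens)
        for sens in (SENS_R, SENS_G, SENS_B)
    )

# Order 1: reflect per wavelength, then condense to tristimulus values.
spectral_first = to_rgb([l * r for l, r in zip(light, reflectance)])

# Order 2: condense light and reflectance to RGB first, then multiply per channel.
rgb_first = tuple(l * r for l, r in zip(to_rgb(light), to_rgb(reflectance)))

print("spectral then RGB:", [round(c, 3) for c in spectral_first])
print("RGB then product: ", [round(c, 3) for c in rgb_first])
```

With these particular numbers the two orders disagree in every channel, by roughly 0.01 in red and 0.03 in green and blue; bounding such gaps in general is the open question raised above.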
I’d also like to see us develop richer models of the consumers of images. We all know that you can make a crowd of people that look kind of realistic by making many copies of just a few basic characters and mixing them up. The viewer isn’t likely to notice that this guy in the green shirt is taking steps that look exactly like the ones taken by the guy in the red shirt 0.3 seconds ago, so there’s an illusion of complexity. On the other hand, you need some variation, or people say “Hey, those are all copies of the same person!” We don’t really have a very good model of how our imagery is comprehended by the visual system, although there’s been some good progress. Once we get a richer notion of what’s going on in the visual system, there’s an opportunity to spend resources where they matter. As a trivial example, we don’t render the distribution of UV light in a scene, because our eyes can’t perceive that. But are there other things we spend lots of resources on without good reason? I suspect so.
Andy: As my colleagues have said, modeling and simulation pose endlessly many problems, and there is, of course, a fluid and ever-shifting boundary between what is domain-specific and what is properly the province of graphics. I've been in the field so long that I've seen graphics move into essentially every domain of knowledge, and often it seems like a branch of applied mathematics, physics, or engineering.
Andrew: Manipulated images and videos are everywhere. When browsing the web, how do you personally decide if an image or video has been doctored?
Morgan: There is no such thing as an objective image. By framing a shot, selecting the depth of field, choosing the subject and orientation, and then later exposing it, even a film photographer creates a highly manipulated rendition of reality. Digital images only make the process easier, so this is not a new issue but instead one that has steadily grown more complicated.
Architects will often show faux-watercolor renderings of proposed buildings instead of photorealistic ones. This is because we're eager to accept realistic imagery as objective reality, but if we see a "painterly" rendition, then we understand both that the building is not real and that some of the details are not final. I try to view any photorealistic image that I encounter as if it were a watercolor. That is, I take such images as hearsay and not fact.
Andy: There is no such thing as visual truth, and I learned to doctor photographs in the darkroom in the era of old-fashioned film. Computer graphics has simply made it easier, faster and more universal. Visual information, in short, is no more reliable than written or oral narratives.
Andrew: Many people are learning new material from online media such as videos, MOOCs, and interactive demos. Describe why someone should use your book instead of (or in addition to) these resources.
Morgan: Interactive, animated, and online educational resources are fantastic. Andy has long insisted that the power of computers and graphics should enhance all education beyond the capabilities of the printed page and lecture hall. I could not agree with him more!
I think that a modern education is best when it comprises interactive demos, video, online human interaction, research papers, academic text, and direct face-to-face interaction with a mentor or lecturer. We wrote this book for considered study of advanced mathematical and computer science topics within graphics, study that proceeds at the student's own pace and in the teacher's own order of topics. That "considered study" is an essential part of learning for advanced, technical topics. It should always be one piece of a comprehensive educational approach, and should integrate well with the other components.
We were conscious of this during the writing. This book serves equally well as the primary text in a MOOC or traditional university course, or for the independent reader. The introduction explains several paths that the reader might wish to take through the book. We were careful to minimize dependencies between chapters to support this. We include extensive code examples and reference online resources to help integrate the book with other forms of learning.
Andy: Our book, even though it is not meant to be nor can be encyclopedic, can serve as a compendium/reference work with many possible trails through it. As such it can augment any other form of knowledge acquisition/learning.
Andrew: What fundamentals do today's students need to learn that have been under-represented in the past?
Andy: In general, I feel that the CS students I encounter, and especially those taking graphics, could use more background in mathematics, physics and engineering than they typically have been exposed to in their education. I also think that it would be great if students majoring or even just minoring in CS knew much more about the human component in the user-computer-interface integrated system. Our students lack an understanding of the basics of human perception, cognition, and social dynamics, and consequently are inadequately prepared to design systems and apps that impedance-match human and societal capabilities (and limitations) well. Design and analysis of experiments would also be useful background in a field that all too often believes that the marketplace is where new ideas first get tested out.
Andrew: Are you happy with the overall mathematical frameworks of computer graphics? If not, what would you change?
Spike: My answers to other questions suggest that I’ve got doubts in several areas, but overall, I think that the frameworks we use are pretty decent; they’ll be around for a while. There’s one problem that arises, I think, from the teaching of linear algebra in many mathematics departments, and that’s a lack of understanding of dual spaces, covectors, and adjoints. We tend to treat anything that can be drawn as an arrow as being a “vector,” but the normal vector n to a surface isn’t really a very natural object. Much more natural is the linear function v ↦ v · n, which is zero on vectors perpendicular to n. In fact, the vector n appears almost always in expressions like this; the important thing isn’t the vector but the operation “dot product with this vector.” We take the dot-product with the eye-vector and with light-rays, but we seldom have reason to add the vector n to a point. When we transform a surface by some linear mapping like a nonuniform scale, tangent vectors transform in the same way, but the normal vector does not, because it’s really a covector.
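In practice, treating the normal as a covector gives the familiar inverse-transpose rule: if points and tangent vectors transform by a linear map M, normals transform by (M^-1)^T. A minimal pure-Python check follows, using the nonuniform scale diag(2, 1, 1) and a hypothetical 45-degree plane; the helper names are illustrative, not from any library.

```python
import math

def mat_vec(m, v):
    """Multiply a 3x3 matrix (tuple of rows) by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def inverse_transpose(m):
    """Return (M^-1)^T, computed as the cofactor matrix of M divided by det(M)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    cof = (
        (e * i - f * h, -(d * i - f * g), d * h - e * g),
        (-(b * i - c * h), a * i - c * g, -(a * h - b * g)),
        (b * f - c * e, -(a * f - c * d), a * e - b * d),
    )
    det = a * cof[0][0] + b * cof[0][1] + c * cof[0][2]
    return tuple(tuple(x / det for x in row) for row in cof)

M = ((2.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))  # nonuniform scale in x

tangent = (1.0, -1.0, 0.0)                               # lies in the plane x + y = 0
normal = (1 / math.sqrt(2), 1 / math.sqrt(2), 0.0)       # perpendicular to that plane

new_tangent = mat_vec(M, tangent)
naive_normal = mat_vec(M, normal)                        # treating n like a tangent: wrong
covector_normal = mat_vec(inverse_transpose(M), normal)  # inverse-transpose rule

print(dot(new_tangent, naive_normal))     # nonzero: no longer perpendicular
print(dot(new_tangent, covector_normal))  # zero: still perpendicular
```

The defining property is that (M t) · ((M^-1)^T n) = t · n for any tangent t, which is exactly the statement that "dot product with n" is preserved.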
The BRDF is similar: there’s not much occasion to work with the BRDF itself; instead, we tend to integrate its product with some other function like the light field. That integration is a lot like the dot-product-with-the-normal; it tells us that “integrate against the BRDF” is a covector for the vector space of light fields. Once we realize that, it’s possible to make sense of things like “BRDFs that are delta functions,” for while delta functions don’t really make sense, the covectors that they suggest actually do make sense. This is probably just a case of a mathematician wanting to put things in his own language, but doing so eliminates some surprises and confusions, and once you’re in the habit, it’s much easier to see covectors everywhere.
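To see "integrate against the BRDF" as a covector, here is a sketch in flatland, a 2D simplification chosen only to keep it short: incoming directions are angles theta in (-pi/2, pi/2) from the normal, a light field is any function of theta, and reflection toward one fixed outgoing direction is a linear functional on such functions. The discretization, the flatland Lambertian constant 1/2, and all names are assumptions of this sketch. The mirror case is written directly as the functional "evaluate the incoming radiance at the mirror direction," which is perfectly well defined even though no ordinary BRDF function produces it.

```python
import math

N = 720                   # number of incoming-angle bins (arbitrary choice)
D_THETA = math.pi / N
THETAS = [-math.pi / 2 + (k + 0.5) * D_THETA for k in range(N)]  # bin midpoints

def integrate_against(brdf):
    """Turn a BRDF (a function of incoming angle) into a linear functional on radiance."""
    def functional(radiance):
        return sum(brdf(t) * radiance(t) * math.cos(t) * D_THETA for t in THETAS)
    return functional

def mirror(mirror_theta):
    """The 'delta-function BRDF' of a perfect mirror, given directly as a functional."""
    return lambda radiance: radiance(mirror_theta)

# In flatland the integral of cos(theta) over (-pi/2, pi/2) is 2, so f = 1/2 conserves energy.
lambertian = integrate_against(lambda t: 0.5)
shiny = mirror(math.radians(30.0))

sky = lambda t: 1.0 + 0.5 * math.sin(t)   # two hypothetical incoming light fields
lamp = lambda t: math.exp(-20.0 * (t - 0.5) ** 2)
both = lambda t: sky(t) + 2.0 * lamp(t)

for name, f in (("lambertian", lambertian), ("mirror", shiny)):
    # Linearity: f(sky + 2 lamp) equals f(sky) + 2 f(lamp), up to roundoff.
    print(name, f(both), f(sky) + 2.0 * f(lamp))
```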
Closely related to this is the notion of invariance or equivariance: some things (e.g., the dot-product of two vectors) are invariant under some classes of transformations (in this case, rotations). That lets you swap the order of operations, which can save computations. Others are equivariant under transformations, i.e., there’s an easy way to compute the after-transformation result. An example is spline curves: you can interpolate among a collection of control points and then apply an affine transformation, or you can apply the affine transformation to each control point, and then interpolate the results; either way you’ll end up with the same spline curve. It’s a good idea, each time an idea is introduced, to ask, “What’s the largest set of transformations under which the associated computations are invariant or equivariant?” since the answer to this question tells you how you might optimize computations involving the new idea.
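Both properties are easy to test numerically. The sketch below, with an arbitrarily chosen (hypothetical) affine map and cubic Bézier control points, checks that rotating two vectors leaves their dot product unchanged, and that evaluating a Bézier curve with de Casteljau's algorithm commutes with an affine map. Bézier curves are our choice of spline here; the same argument applies to any spline whose points are affine combinations (weights summing to 1) of its control points, because affine maps preserve such combinations.

```python
import math

def lerp(p, q, t):
    """Affine combination (1 - t) p + t q."""
    return tuple((1.0 - t) * a + t * b for a, b in zip(p, q))

def de_casteljau(points, t):
    """Evaluate a Bezier curve at t by repeated linear interpolation of control points."""
    pts = list(points)
    while len(pts) > 1:
        pts = [lerp(pts[k], pts[k + 1], t) for k in range(len(pts) - 1)]
    return pts[0]

def affine(p, m=((2.0, 0.5), (-0.3, 1.5)), offset=(5.0, -1.0)):
    """A fixed, arbitrary 2D affine map p -> M p + offset (values are hypothetical)."""
    return (m[0][0] * p[0] + m[0][1] * p[1] + offset[0],
            m[1][0] * p[0] + m[1][1] * p[1] + offset[1])

def rotate(v, angle):
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# Invariance: the dot product does not change under rotation.
u, v = (1.0, 2.0), (3.0, -1.0)
print(dot(u, v), dot(rotate(u, 0.8), rotate(v, 0.8)))

# Equivariance: interpolate-then-transform equals transform-then-interpolate.
control = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 0.0)]
for t in (0.0, 0.37, 1.0):
    a = affine(de_casteljau(control, t))
    b = de_casteljau([affine(p) for p in control], t)
    print(t, a, b)
```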
Morgan: Graphics depends on diverse mathematical tools, and any additional math one learns invariably helps. Because of this, we sought to write one of the most mathematically deep books in the field, and readers can choose whether they want the applied-math approach from the main text or the deeper digressions marked with integral-shaped road signs (we also have a sign for more theoretical computer science topics).
I think any graphics student or practitioner will benefit from the mathematical frameworks in the text, but it is necessary to acknowledge that this is only a beginning. We show how mathematical and computer science principles support graphics applications. Anyone with a career in graphics must use that as an introduction to studying more math in detail. We were not concerned with filling the book with content. We focused on ideas and approaches, because there is great content already in other books.
However, I'm never going to be satisfied with the mathematical frameworks in pure mathematics, let alone those applied in graphics. Dissatisfaction is what drives research to expand the frontiers of knowledge. I hope that the book will answer many questions for readers...and lead them to ask many, many more.
Andrew: What advice would you offer to someone who wants to become a skilled practitioner of computer graphics?
Morgan: We provide a set of principles that distill some of our hard-won wisdom. This "Tao of graphics" is new in the third edition. We hope that it will serve as concrete advice to new practitioners. Of course, some of the principles likely can only be appreciated after working with the topics for a while.
Andy: “Practice, practice, practice” (the old joke about how to get to Carnegie Hall).
Andrew: What advice would you offer to someone who wants to pursue new research in computer graphics?
Spike: Read our book, but at the same time, read the SIGGRAPH proceedings, especially the highly-cited papers. At first, they’ll seem impenetrable, but soon you’ll get a sense of what ideas are being used over and over, and what constitutes a step forward in the field. As you read, you’ll start to see what things you need to know to pursue the area that interests you most. If you’re looking at animation, you’ll realize that you should understand some physics and something about numerical integration of differential equations. If you’re interested in rendering, you’ll need some probability theory and real analysis. If real-time performance is what excites you, then an understanding of the differing constants in the exponential growth of various resources—computation, bandwidth, memory—and the consequences of those differences, as well as GPU implementations of various data structures, will be essential. The other advice I’d give is, “Implement early and often!” The visual feedback of graphics is a great way to confirm your understanding of a topic, or to help you debug misunderstandings.
Morgan: Modern computer graphics is a very large field. It includes, for example, animation, rendering, modeling, human-computer interaction, scientific and medical visualization, and computational photography. We address many of these, but focus on realistic rendering in this book because that is the core on which most other aspects rely. To succeed in rendering, it is necessary to integrate principles and practice. Few areas of computer science simultaneously benefit as much from the study of theoretical continuous math (e.g., calculus, differential equations, probability, geometry, topology) and engineering (e.g., hardware architecture, software design, input devices). I recommend that everyone who studies rendering seek to understand the entire system, from logic gates to the rendering equation. This comprehensive knowledge often is more effective in a fast-moving field than deep knowledge of one aspect at the expense of ignorance elsewhere. As anecdotal evidence, many of the researchers who've won SIGGRAPH awards have demonstrated amazing breadth in their work.
Andrew: Do you think that there will be a Grand Unified Theory for all of computer graphics, uniting everything from modeling and texturing to animation and rendering, or will the field forever be a collection of tricks and approximations? If such a theory is out there, what do you think it might look like?
Morgan: I think that leading researchers agree that photorealistic rendering is a sampling and reconstruction problem, so numerical integration of some abstraction of Maxwell's equations is a good first cut at a GUT. Modern GPUs have sufficient computational power that real-time rendering work now takes this approach as well. So, even performance constraints are no longer a good reason to abandon the mathematical framework.
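A minimal sketch of that sampling view, reduced to its simplest piece: estimating the irradiance E = ∫ L(ω) cos θ dω arriving at a surface point by averaging random samples over the hemisphere. It assumes uniform hemisphere sampling (probability density 1/(2π)) and hypothetical sky radiance functions; real renderers add importance sampling and much more, and the reconstruction step (filtering samples into pixels) is not shown. For a uniform sky of radiance 1 the exact answer is π, which gives a check.

```python
import math
import random

def sample_uniform_hemisphere(rng):
    """Uniformly sample a unit direction on the hemisphere z >= 0 (pdf = 1 / (2*pi))."""
    z = rng.random()                      # cos(theta) is uniform for uniform solid angle
    phi = 2.0 * math.pi * rng.random()
    r = math.sqrt(max(0.0, 1.0 - z * z))
    return (r * math.cos(phi), r * math.sin(phi), z)

def estimate_irradiance(radiance, n_samples, seed=1):
    """Monte Carlo estimate of E = integral over the hemisphere of L(w) cos(theta) dw."""
    rng = random.Random(seed)
    pdf = 1.0 / (2.0 * math.pi)
    total = 0.0
    for _ in range(n_samples):
        w = sample_uniform_hemisphere(rng)
        total += radiance(w) * w[2] / pdf   # w[2] is cos(theta) for the normal (0, 0, 1)
    return total / n_samples

uniform_sky = lambda w: 1.0
print(estimate_irradiance(uniform_sky, 100_000), math.pi)
```

The noise in such an estimate falls off as 1/sqrt(N), so halving the error takes about four times as many samples; that cost is why performance once seemed a reason to abandon this framework.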
However, the challenge in unifying all of graphics is that the error function in our computation is not energy. It is human perception, and the neuroscience and psychology models of perception today have limited predictive power compared to Maxwell's equations for energy transport. So, we can minimize radiometric error, but cannot minimize perceptual error. The "hacks" are the (black) art of graphics. We have many heuristics for how to design systems and many perceptually acceptable approximations, but we need much more input from the other sciences to begin to formalize why these work so well. Note that modeling, lighting, and animation interact even more strongly with human perception than rendering does. We can't make today's applications wait on tomorrow's perceptual research, so I think that many hacks will remain with us for a long time. This is glorious: human artists, directors, and visually savvy programmers should have a central role in the production of imagery.
Andrew: Most computer graphics work has focused on imagery perceptible to humans. Are there other frequency ranges or audiences you'd like to explore?
Spike: I like to garden, and there’s some evidence that insects see plants rather differently than we do, because their visual receptors are responsive to different bands. It’d be fun to play with rendering a bee’s-eye view of the world. In a related note, there’s probably an opportunity to generate a kind of synthetic synesthesia in which sounds are displayed as colors, or vice versa. I don’t know of any practical use for that, but it might be an interesting experience.
Synthetic graphics also gets used as input to computer vision algorithms, and I think there is potential for major problems there. We’re pretty good, in graphics, at making images that fool the human eye. But those images may have characteristics that the human eye cannot detect but that a vision algorithm can learn as a significant “feature.” (I’m thinking, for instance, of JPEG artifacts.) You’d hate to think that your vision algorithm had learned to understand 3D scenes when it had really learned to understand artifacts of our standard approaches to graphics.
Andrew: What is the most surprising or exciting application of computer graphics that you have ever seen?
Spike: The spreadsheet. Sure, it’s 2D graphics. But it completely changed the world. It was the first great scientific visualization tool. It had a brilliant interface. It was simple, and the simplicity of its model gave it enormous power. It’s a pity that many students today outside of computer science don’t seem to understand that a spreadsheet is more than a way to organize data in tabular form and compute a few row- and column-sums. The best spreadsheets are the ones where the value is in the formulas rather than the data—the formulas give you a way to ask the “what if?” questions that are really important in many applications.
I’d like to believe that as 3D graphics gets more and more sophisticated, we’ll start sharing not just 3D models but whole scenes, and that in these scenes, the description of relationships among objects will be the major content rather than the objects themselves. Of course, some CAD systems already have this characteristic, but I’m thinking about things like models of meeting rooms in which the relationship of people to furniture and lighting and other people is modeled, so that as you change the physical characteristics of the room, you can see the social consequences and decide whether a certain living room layout works well both for family use and for large parties, for instance. When we get to that state, we’ll have something akin to the spreadsheet: a situation in which the relationships between things are as central as the things themselves.
Morgan: This question demands a personal response, because excitement in graphics is almost always generated by the whole experience, including the graphics, and not by the images in isolation.
The first time that I saw real-time, first-person graphics on a consumer PC (in Wolfenstein 3D in 1992) was incredibly exciting and surprising. Suddenly, 3D graphics belonged not to the labs equipped with SGI workstations but to the people. It was delivering the virtual reality future that we had wished for in science fiction like Tron, Neuromancer, and Snow Crash.
I felt the same way the first time that I used an iPhone, which made powerful computer graphics personal and integral to my lifestyle. Prior to that, graphics was a destination activity, where you'd specifically plan to use an application and then go to the room with the computer in it.
Andrew: A popular goal of the graphics community has been to create simple, powerful tools that anyone can use to create marvelous imagery. But most of today's tools are big, complicated, expensive systems. Those programs (and their free counterparts) require significant training for even the most talented people. Why do you think this has happened? Should it change? If so, what should change, and how?
Spike: I think that this is a consequence of a kind of “proportionality rule”: the power of a tool and the investment required to learn it have to be roughly proportional, so that learning the tool produces a return that makes it worthwhile. If you’re going to make drawings all day every day, and you’re going to have a wonderful digitizing tablet to work with, then it’s worth the huge investment to get good at using Adobe Illustrator. If you just need to draw a quick diagram of your vegetable garden, it’s the wrong tool unless you’re already an expert. On the other hand, if you need to write a grocery list that’ll print neatly, a tool like Notepad is perfect: the time spent learning it is almost nothing. A few tools manage to hit a sweet spot: instant learning combined with great empowerment and the potential for additional empowerment through use. I think that the pinch-zoom manipulation of images and objects is a good example, as is Google search. But pinch-zoom manipulation of images wouldn’t be a great example without digital cameras. If, as in 1979, there were only a few dozen digital images widely available, who would care about manipulating them? I think that intuitive and instantly powerful tools for manipulating 3D shapes will become common when 3D models are as easy to generate as digital photos; we’re getting there rapidly. Of course, doing subtle manipulations (think of the hundreds of filters in Photoshop) will probably still require big and complex tools, but doing basic things easily ought to be within reach. I suppose that when vision techniques can make good guesses about what you’re manipulating, and can then feed advertisement-generation mechanisms (you’re messing with a model of a camera, and an ad for telephoto lenses appears), there’ll be a strong incentive to develop such tools.
Morgan: My understanding is that this occurred in part because content creation tools experienced significant aggregation due to business acquisitions and mergers. Adobe and Autodesk dominate their fields today, and historically focused on professional artists. Now, they've saturated those markets and solved many of the technical integration problems. So, those companies are beginning to innovate more substantially again and to expand to consumer modeling with tools like Autodesk's 123D Catch. That application allows anyone to create 3D models by taking photographs of real objects and works remarkably well.
I think that applications like Minecraft, Sketchup, Sketchbook Pro, and iPhoto are examples of relatively powerful 3D-modeling, 2D-art, and photo-editing tools available to consumers today. I hope that we'll see more applications like these soon, especially for tablets and touch-screens, where touch and pen user interfaces naturally lend themselves to content creation.