Using OpenACC to Parallelize Real Code

By Sunita Chandrasekaran and Guido Juckeland
Nov 22, 2017

This chapter is from the book OpenACC for Programmers: Concepts and Strategies.

4.5 Summary

Here are all the OpenACC advantages you have used in this chapter.

Incremental optimization. You focused on only the loop of interest here. You have not had to deal with whatever is going on in track_progress() or any other section of the code. We have not misled you with this approach. It will usually remain true for an 80,000-line program with 1,200 subroutines. You may be able to focus on a single computationally intense section of the code to great effect. That might be 120 lines of code instead of our 20, but it sure beats the need to understand the dusty corners of large chunks of legacy code.
Single source. This code is still entirely valid serial code. If your colleagues down the hall are oblivious to OpenACC, they can still understand the program results by simply ignoring the funny-looking comments (your OpenACC directives). So can a compiler that knows nothing about OpenACC, and the code still runs on a compute platform without accelerators. This isn’t guaranteed to be true: if you use the OpenACC API instead of directives, or rearrange your code to make better use of parallel regions, those changes will likely break the pure serial version. But it can be true for many nontrivial cases.
High level. We have managed to avoid any discussion of the hardware specifics of our accelerator. Beyond the acknowledgment that the host-device connection is much slower than the local memory connection on either device, we have not concerned ourselves with the fascinating topic of GPU architecture at all.
Efficient. Without an uber-optimized low-level implementation of this problem using CUDA or OpenCL, you have to take our word on this, but you could not do much better even with those much more tedious approaches. You can exploit the few remaining optimizations using some advanced OpenACC statements. In any event, the gains will be small compared with what you have already achieved.
Portable. This code should run efficiently on any accelerated device. You haven’t had to embed any platform-specific information. This won’t always be true for all algorithms, and you will read more about this later in Chapter 7, “OpenACC and Performance Portability.”
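
To make the single-source and incremental points concrete, here is a minimal sketch in C. It is not the chapter's code: the function name smooth and its averaging loop are hypothetical, chosen only to show one directive placed on one loop of interest. Compiled without OpenACC support, the pragma is ignored and the loop runs serially; compiled with OpenACC enabled, the same source may be offloaded to an accelerator.

```c
#include <stddef.h>

/* Hypothetical example (not from the chapter): one averaging sweep over the
   interior points of a 1-D array. The boundary values out[0] and out[n-1]
   are left untouched. */
void smooth(const double *restrict in, double *restrict out, int n)
{
    if (n < 3) {
        return; /* no interior points to update */
    }

    /* The only OpenACC-specific line. A compiler without OpenACC support
       skips this pragma and builds an ordinary serial loop.
       copyin: 'in' is only read on the device.
       copy:   'out' is copied in and back out so the boundary values
               the caller stored in it survive the round trip. */
    #pragma acc parallel loop copyin(in[0:n]) copy(out[0:n])
    for (int i = 1; i < n - 1; ++i) {
        out[i] = 0.5 * (in[i - 1] + in[i + 1]);
    }
}
```

The rest of a program calling smooth() needs no changes. With GCC the OpenACC build is typically requested with -fopenacc, and with the NVIDIA HPC compilers with -acc; leaving the flag off gives the plain serial version from the same file.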

With these advantages in mind, we hope your enthusiasm for OpenACC is growing. At least you can see how easy it is to take a stab at accelerating a code. The low risk should encourage you to attempt this with your applications.
