Using OpenACC to Parallelize Real Code

Nov 22, 2017

␡

⎙ Print

Page 1 of 6 Next >

Walk through building a real-world OpenACC, a high-level, directive-based programming model for C/C++ and Fortran, program.

Save 35% off the list price* of the related book or multi-format eBook (EPUB + MOBI + PDF) with discount code ARTICLE.
* See informit.com/terms

This chapter is from the book 

OpenACC for Programmers: Concepts and Strategies

Learn More Buy

In this chapter, you’ll parallelize real code. You will start with code that does something useful. Then you’ll consider how you might use OpenACC to speed it up. You will see that reducing data movement is key to achieving significant speedup, and that OpenACC gives you the tools to do so. By the end of the chapter you will be able to call yourself an OpenACC programmer—a fledgling one, perhaps, but on your way. Let’s jump right into it.

4.1 Case Study

You are reading a book about OpenACC programming, so it’s a safe bet the authors are fans of this approach to parallel programming. Although that’s a perfectly sensible thing, it has its dangers. It is tempting for enthusiasts to cherry-pick examples that make it seem as if their favored technology is perfect for everything. Anyone with experience in parallel programming has seen this before. We are determined not to do that here.

Our example is so generically useful that it has many applications, and it is often used to demonstrate programming with other parallel techniques as well, such as the somewhat related OpenMP and the very different MPI. So, rest assured, we haven’t rigged the game.

Another reason we prefer this example is that both the “science” and the numerical method are intuitive. Although we will solve the Laplace equation for steady-state temperature distribution using Jacobi iteration, we don’t expect that you immediately know what that means.

Let’s look at the physical problem. You have a square metal plate. It is initially at zero degrees. This is termed, unsurprisingly, the initial conditions. You will heat two of the edges in an interesting pattern where you heat the lower-right corner (as pictured in Figure 4.1A) to 100 degrees. You control the two heating elements that lead from this corner such that they go steadily to zero degrees at their farthest edge. The other two edges you will hold at zero degrees. These four edges constitute the boundary conditions.

For the metal plate, you would probably guess the ultimate solution should look something like Figure 4.1B.

Figure 4.1. A heated metal plate

You have a very hot corner, a very cold corner, and some kind of gradient in between. This is what the ultimate, numerically solved solution should look like.

If you are wondering whether this is degrees centigrade or Fahrenheit, or maybe Kelvin, you are overthinking the problem. If you have a mathematical method or numerical background, you should be interested to know that the equation that governs heat distribution is the Laplace equation:

∇²T = 0

Although this equation has many interesting applications, including electrostatics and fluid flow, and many fascinating mathematical properties, it also has a straightforward and intuitive meaning in this context. It simply means that the value of interest (in our case, temperature) at any point is the average of the neighbor’s values. This makes sense for temperature; if you have a pebble and you put a cold stone on one side and a hot stone on the other, you’d probably guess that the pebble would come to the average of the two. And in general, you would be right.

4.1.1 Serial Code

Let’s represent the metal plate using a grid, which becomes a typical two-dimensional array in code. The Laplace equation says that every point in the grid should be the average of the neighbors. This is the state you will solve for.

The simulation starting point—the set of initial conditions—is far from this. You have zero everywhere except some big jumps along the edges where the heating elements are. You want to end up with something that resembles the desired solution.

There are many ways you can find this solution, but let’s pick a particularly straightforward one: Jacobi iteration. This method simply says that if you go over your grid and set each element equal to the average of the neighbors, and keep doing this, you will eventually converge on the correct answer. You will know when you have reached the right answer because when you make your averaging pass, the values will already be averaged (the Laplace condition) and so nothing will happen. Of course, these are floating-point numbers, so you will pick some small error, which defines “nothing happening.” In this case, we will say that when no element changes more than one-hundredth of a degree, we are done. If that isn’t good enough for you, you can easily change it and continue to a smaller error.

Your serial algorithm looks like this at the core.

for(i = 1; i <= HEIGHT; i++) {
    for(j = 1; j <= WIDTH; j++) {
        Temperature[i][j] = 0.25 * (Temperature_previous[i+1][j]
                                 + Temperature_previous[i-1][j]
                                 + Temperature_previous[i][j+1]
                                 + Temperature_previous[i][j-1]);
    }
}

Here it is in Fortran:

do j=1,width
    do i=1,height
        temperature(i,j) =0.25*(temperature_previous(i+1,j)&
                              + temperature_previous(i-1,j)&
                              + temperature_previous(i,j+1)&
                              + temperature_previous(i,j-1))
    enddo
enddo

Note that the C and Fortran code snippets are virtually identical in construction. This will remain true for the entire program.

This nested loop is the guts of the method and in some sense contains all the science of the simulation. You are iterating over your metal plate in both dimensions and setting every interior point equal to the average of the neighbors (i.e., adding together and dividing by 4). You don’t change the very outside elements; those are the heating elements (or boundary conditions). There are a few other items in the main iteration loop as it repeats until convergence. Listing 4.1 shows the C code, and Listing 4.2 shows the Fortran code.

Listing 4.1. C Laplace code main loop

while ( worst_dt > TEMP_TOLERANCE ) {
    for(i = 1; i <= HEIGHT; i++) {
        for(j = 1; j <= WIDTH; j++) {
            Temperature[i][j] = 0.25 * (Temperature_previous[i+1][j]
                                 + Temperature_previous[i-1][j]
                                 + Temperature_previous[i][j+1]
                                 + Temperature_previous[i][j-1]);
        }
    }

    worst_dt = 0.0;

    for(i = 1; i <= HEIGHT; i++){
        for(j = 1; j <= WIDTH; j++){
            worst_dt = fmax( fabs(Temperature[i][j]-
                                  Temperature_previous[i][j]),
                             worst_dt);
            Temperature_previous[i][j] = Temperature[i][j];
        }
    }

    if((iteration % 100) == 0) {
        track_progress(iteration);
    }

    iteration++;
}

Listing 4.2. Fortran Laplace code main loop

do while ( worst_dt > temp_tolerance )

    do j=1,width
       do i=1,height
           temperature(i,j) =0.25*(temperature_previous(i+1,j)&
                                 + temperature_previous(i-1,j)&
                                 + temperature_previous(i,j+1)&
                                 + temperature_previous(i,j-1))
        enddo
    enddo

    worst_dt=0.0

    do j=1,width
       do i=1,height
          worst_dt = max( abs(temperature(i,j) – &
                              temperature_previous(i,j)),&
                          worst_dt )
          temperature_previous(i,j) = temperature(i,j)
       enddo
    enddo

    if( mod(iteration,100).eq.0 ) then
        call track_progress(temperature, iteration)
    endif

    iteration = iteration+1

enddo

The important addition is that you have a second array that keeps the temperature data from the last iteration. If you tried to use one array, you would find yourself using some updated neighboring elements and some old neighboring elements from the previous iteration as you were updating points in the grid. You need to make sure you use only elements from the last iteration.

While you are doing this nested loop copy to your backup array (and moving all this data around in memory), it’s a good time to look for the worst (most changing) element in the simulation. When the worst element changes only by 0.01 degree, you know you are finished.

It might also be nice to track your progress as you go; it’s much better than staring at a blank screen for the duration. So, every 100 iterations, let’s call a modest output routine.

That is all there is to it for your serial Laplace Solver. Even with the initialization and output code, the full program clocks in at fewer than 100 lines. (See Listing 4.3 for the C code, and Listing 4.4 for Fortran.)

Listing 4.3. Serial Laplace Solver in C

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <sys/time.h>

#define WIDTH      1000
#define HEIGHT     1000
#define TEMP_TOLERANCE 0.01

double Temperature[HEIGHT+2][WIDTH+2];
double Temperature_previous[HEIGHT+2][WIDTH+2];

void initialize();
void track_progress(int iter);

int main(int argc, char *argv[]) {

  int i, j;
  int iteration=1;
  double worst_dt=100;
  struct timeval start_time, stop_time, elapsed_time;

  gettimeofday(&start_time,NULL);

  initialize();

  while ( worst_dt > TEMP_TOLERANCE ) {

    for(i = 1; i <= HEIGHT; i++) {
      for(j = 1; j <= WIDTH; j++) {
        Temperature[i][j] = 0.25 * (Temperature_previous[i+1][j]
                                 + Temperature_previous[i-1][j]
                                 + Temperature_previous[i][j+1]
                                 + Temperature_previous[i][j-1]);
      }
    }

    worst_dt = 0.0;

    for(i = 1; i <= HEIGHT; i++){
      for(j = 1; j <= WIDTH; j++){
         worst_dt = fmax( fabs(Temperature[i][j]-
                               Temperature_previous[i][j]),
                          worst_dt);
         Temperature_previous[i][j] = Temperature[i][j];
      }
    }

    if((iteration % 100) == 0) {
      track_progress(iteration);
    }

    iteration++;
  }

  gettimeofday(&stop_time,NULL);
  timersub(&stop_time, &start_time, &elapsed_time);

  printf("\nMax error at iteration %d was %f\n",
          iteration-1, worst_dt);
  printf("Total time was %f seconds.\n",
          elapsed_time.tv_sec+elapsed_time.tv_usec/1000000.0);
}

void initialize(){

  int i,j;

  for(i = 0; i <= HEIGHT+1; i++){
    for (j = 0; j <= WIDTH+1; j++){
      Temperature_previous[i][j] = 0.0;
    }
  }

  for(i = 0; i <= HEIGHT+1; i++) {
    Temperature_previous[i][0] = 0.0;
    Temperature_previous[i][WIDTH+1] = (100.0/HEIGHT)*i;
  }

  for(j = 0; j <= WIDTH+1; j++) {
    Temperature_previous[0][j] = 0.0;
    Temperature_previous[HEIGHT+1][j] = (100.0/WIDTH)*j;
  }
}

void track_progress(int iteration) {

  int i;

  printf("---------- Iteration number: %d ------------\n",
          iteration);
  for(i = HEIGHT-5; i <= HEIGHT; i++) {
    printf("[%d,%d]: %5.2f  ", i, i, Temperature[i][i]);
  }
  printf("\n");
}

Listing 4.4. Fortran version of serial Laplace Solver

program serial
  implicit none

  integer, parameter             :: width=1000
  integer, parameter             :: height=1000
  double precision, parameter    :: temp_tolerance=0.01

  integer                        :: i, j, iteration=1
  double precision               :: worst_dt=100.0
  real                           :: start_time, stop_time

  double precision, dimension(0:height+1,0:width+1) :: &
                              temperature, temperature_previous

  call cpu_time(start_time)

  call initialize(temperature_previous)

  do while ( worst_dt > temp_tolerance )

    do j=1,width
      do i=1,height
        temperature(i,j) = 0.25* (temperature_previous(i+1,j)&
                               + temperature_previous(i-1,j)&
                               + temperature_previous(i,j+1)&
                               + temperature_previous(i,j-1))
      enddo
    enddo

    worst_dt=0.0

    do j=1,width
      do i=1,height
        worst_dt = max( abs(temperature(i,j) – &
                            temperature_previous(i,j)),&
                        worst_dt )
        temperature_previous(i,j) = temperature(i,j)
      enddo
    enddo

    if( mod(iteration,100).eq.0 ) then
      call track_progress(temperature, iteration)
    endif

    iteration = iteration+1

  enddo

  call cpu_time(stop_time)

  print*, 'Max error at iteration ', iteration-1, ' was ', &
          worst_dt
  print*, 'Total time was ',stop_time-start_time, ' seconds.'
end program serial

subroutine initialize( temperature_previous )
  implicit none
  integer, parameter             :: width=1000
  integer, parameter             :: height=1000
  integer                        :: i,j
  double precision, dimension(0:height+1,0:width+1) :: &
                    temperature_previous

  temperature_previous = 0.0

  do i=0,height+1
    temperature_previous(i,0) = 0.0
    temperature_previous(i,width+1) = (100.0/height) * i
  enddo

  do j=0,width+1
    temperature_previous(0,j) = 0.0
    temperature_previous(height+1,j) = ((100.0)/width) * j
  enddo
end subroutine initialize

subroutine track_progress(temperature, iteration)
  implicit none
  integer, parameter             :: width=1000
  integer, parameter             :: height=1000
  integer                        :: i,iteration

  double precision, dimension(0:height+1,0:width+1) :: &
        temperature

  print *, '------- Iteration number: ', iteration, ' ------'
  do i=5,0,-1
    write (*,'("("i4,",",i4,"):",f6.2,"  ")',advance='no') &
              height-i,width-i,temperature(height-i,width-i)
  enddo
  print *
end subroutine track_progress

4.1.2 Compiling the Code

Take a few minutes to make sure you understand the code fully. In addition to the main loop, you have a small bit of initialization, a timer to aid in optimizing, and a basic output routine. This code compiles as simply as

pgcc laplace.c

Here it is for the PGI compiler:

pgcc laplace.f90

We use PGI for performance consistency in this chapter. Any other standard compiler would work the same. If you run the resulting executable, you will see something like this:

. . .
. . .
---------- Iteration number: 3200 ------------
. . . [998,998]: 99.18  [999,999]: 99.56  [1000,1000]: 99.86
---------- Iteration number: 3300 ------------
. . . [998,998]: 99.19  [999,999]: 99.56  [1000,1000]: 99.87

Max error at iteration 3372 was 0.009995
Total time was 21.344162 seconds.

The output shows that the simulation looped 3,372 times before all the elements stabilized (to within our 0.01 degree tolerance). If you examine the full output, you can see the elements converge from their zero-degree starting point.

The times for both the C and the Fortran version will be very close here and as you progress throughout optimization. Of course, the time will vary depending on the CPU you are using. In this case, we are using an Intel Broadwell running at 3.0 GHz. At the time of this writing, it is a very good processor, so our eventual speedups won’t be compared against a poor serial baseline.

This is the last time you will look at any code outside the main loop. You will henceforth exploit the wonderful ability of OpenACC to allow you to focus on a small portion of your code—be it a single routine, or even a single loop—and ignore the rest. You will return to this point when you are finished.

Page 1 of 6 Next >

🔖 Save To Your Account

InformIT Promotional Mailings & Special Offers

I would like to receive exclusive offers and hear about products from InformIT and its family of brands. I can unsubscribe at any time.

Privacy Notice

Overview

Pearson Education, Inc., 221 River Street, Hoboken, New Jersey 07030, (Pearson) presents this site to provide information about products and services that can be purchased through this site.

This privacy notice provides an overview of our commitment to privacy and describes how we collect, protect, use and share personal information collected through this site. Please note that other Pearson websites and online products and services have their own separate privacy policies.

Collection and Use of Information

To conduct business and deliver products and services, Pearson collects and uses personal information in several ways in connection with this site, including:

Questions and Inquiries

For inquiries and questions, we collect the inquiry or question, together with name, contact details (email address, phone number and mailing address) and any other additional information voluntarily submitted to us through a Contact Us form or an email. We use this information to address the inquiry and respond to the question.

Online Store

For orders and purchases placed through our online store on this site, we collect order details, name, institution name and address (if applicable), email address, phone number, shipping and billing addresses, credit/debit card information, shipping options and any instructions. We use this information to complete transactions, fulfill orders, communicate with individuals placing orders or visiting the online store, and for related purposes.

Surveys

Pearson may offer opportunities to provide feedback or participate in surveys, including surveys evaluating Pearson products, services or sites. Participation is voluntary. Pearson collects information requested in the survey questions and uses the information to evaluate, support, maintain and improve products, services or sites, develop new products and services, conduct educational research and for other purposes specified in the survey.

Contests and Drawings

Occasionally, we may sponsor a contest or drawing. Participation is optional. Pearson collects name, contact information and other information specified on the entry form for the contest or drawing to conduct the contest or drawing. Pearson may collect additional personal information from the winners of a contest or drawing in order to award the prize and for tax reporting purposes, as required by law.

Newsletters

If you have elected to receive email newsletters or promotional mailings and special offers but want to unsubscribe, simply email information@informit.com.

Service Announcements

On rare occasions it is necessary to send out a strictly service related announcement. For instance, if our service is temporarily suspended for maintenance we might send users an email. Generally, users may not opt-out of these communications, though they can deactivate their account information. However, these communications are not promotional in nature.

Customer Service

We communicate with users on a regular basis to provide requested services and in regard to issues relating to their account we reply via email or phone in accordance with the users' wishes when a user submits their information through our Contact Us form.

Other Collection and Use of Information

Application and System Logs

Pearson automatically collects log data to help ensure the delivery, availability and security of this site. Log data may include technical information about how a user or visitor connected to this site, such as browser type, type of computer/device, operating system, internet service provider and IP address. We use this information for support purposes and to monitor the health of the site, identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents and appropriately scale computing resources.

Web Analytics

Pearson may use third party web trend analytical services, including Google Analytics, to collect visitor information, such as IP addresses, browser types, referring pages, pages visited and time spent on a particular site. While these analytical services collect and report information on an anonymous basis, they may use cookies to gather web trend information. The information gathered may enable Pearson (but not the third party web trend services) to link information with application and system log data. Pearson uses this information for system administration and to identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents, appropriately scale computing resources and otherwise support and deliver this site and its services.

Cookies and Related Technologies

This site uses cookies and similar technologies to personalize content, measure traffic patterns, control security, track use and access of information on this site, and provide interest-based messages and advertising. Users can manage and block the use of cookies through their browser. Disabling or blocking certain cookies may limit the functionality of this site.

Do Not Track

This site currently does not respond to Do Not Track signals.

Security

Pearson uses appropriate physical, administrative and technical security measures to protect personal information from unauthorized access, use and disclosure.

Children

This site is not directed to children under the age of 13.

Marketing

Pearson may send or direct marketing communications to users, provided that

Pearson will not use personal information collected or processed as a K-12 school service provider for the purpose of directed or targeted advertising.
Such marketing is consistent with applicable law and Pearson's legal obligations.
Pearson will not knowingly direct or send marketing communications to an individual who has expressed a preference not to receive marketing.
Where required by applicable law, express or implied consent to marketing exists and has not been withdrawn.

Pearson may provide personal information to a third party service provider on a restricted basis to provide marketing solely on behalf of Pearson or an affiliate or customer for whom Pearson is a service provider. Marketing preferences may be changed at any time.

Correcting/Updating Personal Information

If a user's personally identifiable information changes (such as your postal address or email address), we provide a way to correct or update that user's personal data provided to us. This can be done on the Account page. If a user no longer desires our service and desires to delete his or her account, please contact us at customer-service@informit.com and we will process the deletion of a user's account.

Choice/Opt-out

Users can always make an informed choice as to whether they should proceed with certain services offered by InformIT. If you choose to remove yourself from our mailing list(s) simply visit the following page and uncheck any communication you no longer want to receive: www.informit.com/u.aspx.

Sale of Personal Information

Pearson does not rent or sell personal information in exchange for any payment of money.

While Pearson does not sell personal information, as defined in Nevada law, Nevada residents may email a request for no sale of their personal information to NevadaDesignatedRequest@pearson.com.

Supplemental Privacy Statement for California Residents

California residents should read our Supplemental privacy statement for California residents in conjunction with this Privacy Notice. The Supplemental privacy statement for California residents explains Pearson's commitment to comply with California law and applies to personal information of California residents collected in connection with this site and the Services.

Sharing and Disclosure

Pearson may disclose personal information, as follows:

As required by law.
With the consent of the individual (or their parent, if the individual is a minor)
In response to a subpoena, court order or legal process, to the extent permitted or required by law
To protect the security and safety of individuals, data, assets and systems, consistent with applicable law
In connection the sale, joint venture or other transfer of some or all of its company or assets, subject to the provisions of this Privacy Notice
To investigate or address actual or suspected fraud or other illegal activities
To exercise its legal rights, including enforcement of the Terms of Use for this site or another contract
To affiliated Pearson companies and other companies and organizations who perform work for Pearson and are obligated to protect the privacy of personal information consistent with this Privacy Notice
To a school, organization, company or government agency, where Pearson collects or processes the personal information in a school setting or on behalf of such organization, company or government agency.

Links

This web site contains links to other sites. Please be aware that we are not responsible for the privacy practices of such other sites. We encourage our users to be aware when they leave our site and to read the privacy statements of each and every web site that collects Personal Information. This privacy statement applies solely to information collected by this web site.

Requests and Contact

Please contact us about this Privacy Notice or if you have any requests or questions relating to the privacy of your personal information.

Changes to this Privacy Notice

We may revise this Privacy Notice through an updated posting. We will identify the effective date of the revision in the posting. Often, updates are made to provide greater clarity or to comply with changes in regulatory requirements. If the updates involve material changes to the collection, protection, use or disclosure of Personal Information, Pearson will provide notice of the change through a conspicuous notice on this site or other appropriate way. Continued use of the site after the effective date of a posted revision evidences acceptance. Please contact us if you have questions or concerns about the Privacy Notice or any objection to any revisions.

Last Update: November 17, 2020

Email Address