Integration with Vector Intrinsics

Sometimes you may want to perform a vector operation that’s not supported directly by GCC. An example would be a max operation, giving the maximum of each pair of values. We first attempt to create a scalar version of this. Since our vector type can be cast to an array easily, this is quite straightforward:

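/* MAX is not part of standard C; define it unless a system header
   (for example <sys/param.h>) has already done so. */
#ifndef MAX
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#endif

v4si v_max(v4si v1, v4si v2)
{
  /* View each four-element vector as a plain array of ints. */
  int * s1 = (int*)&v1;
  int * s2 = (int*)&v2;
  v4si max;
  int * smax = (int*)&max;
  /* Compute the maximum one element at a time. */
  smax[0] = MAX(s1[0], s2[0]);
  smax[1] = MAX(s1[1], s2[1]);
  smax[2] = MAX(s1[2], s2[2]);
  smax[3] = MAX(s1[3], s2[3]);
  return max;
}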

The overhead of this function call is likely to be quite high, so ideally we should declare it as inline; and, since we’re already writing GCC-specific code, we can add an attribute to ensure that the compiler always inlines it. This macro can be used for functions put in headers:

#define INLINE inline extern __attribute__((always_inline))

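If a function is declared as inline extern under GCC's traditional (GNU89) inline rules, the compiler won’t generate a non-inline version of it. You can’t create a pointer to the function, but you also don’t get linker errors from having multiply-defined symbols. Be aware that ISO C99 gives inline extern the opposite meaning (an out-of-line definition is emitted), and that GCC 4.3 and later follow the C99 rules in C99 mode unless -fgnu89-inline or the gnu_inline attribute is used. Unfortunately, the C standard states that inline is advisory, and GCC doesn’t inline anything when optimization is turned off, which makes debugging easier. The always_inline attribute tells it to inline the function even if it wouldn’t do so normally. This approach makes the function equivalent to a macro in terms of performance, but with function-style scoping.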
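As a point of comparison, here is a minimal sketch of the same always-inline idea written with static inline, which means the same thing under both the GNU89 and the C99 inline rules. This is a swapped-in alternative, not the macro used in this article. The names ALWAYS_INLINE and max_int are made up for illustration, and the v4si typedef is assumed to match the one defined earlier in the article.

```c
/* Assumed to match the vector type defined earlier in the article. */
typedef int v4si __attribute__((vector_size(16)));

/* Illustrative name, not from the article: static inline has the same
   meaning under the GNU89 and C99 inline rules. */
#define ALWAYS_INLINE static inline __attribute__((always_inline))

/* Hypothetical helper standing in for the MAX macro. */
ALWAYS_INLINE int max_int(int a, int b)
{
  return a > b ? a : b;
}

/* Scalar element-wise maximum; a union is used to reach the elements. */
ALWAYS_INLINE v4si v_max(v4si v1, v4si v2)
{
  union { v4si v; int s[4]; } a, b, r;
  int i;
  a.v = v1;
  b.v = v2;
  for (i = 0; i < 4; i++)
    r.s[i] = max_int(a.s[i], b.s[i]);
  return r.v;
}
```

The trade-off is that every translation unit including such a header gets its own private copy of the function if an out-of-line version is ever needed.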

Now that we’ve got the scalar implementation, we can optimize it. As previously mentioned, each architecture with a vector unit exposes a set of vector intrinsics, which map directly to vector instructions. If you’re targeting recent PowerPC chips, the same function would be implemented as follows:

INLINE v4si v_max(v4si v1, v4si v2)
{
  return vec_max((vector int)v1, (vector int)v2);
}

This is a lot more readable but, of course, not portable. Notice that we need to cast our type to a vector int. This is because AltiVec intrinsics were added to GCC by Apple before the generic vector framework was defined, and so they defined their own types. Fortunately, the two kinds of type share the same representation, so we can cast between the new and old types easily.

What happens when we try to compile it? Let’s see:

$ gcc -std=c99 max.c
max.c: In function ’v_max’:
max.c:13: warning: implicit declaration of function ’vec_max’
max.c:13: error: ’vector’ undeclared (first use in this function)
max.c:13: error: (Each undeclared identifier is reported only once
max.c:13: error: for each function it appears in.)
max.c:13: error: parse error before ’int’

Not quite the outcome we were hoping for. The reason is that, by default, GCC doesn’t emit vector instructions, and therefore doesn’t support the vector built-ins. We need to use the -faltivec switch to enable the use of AltiVec intrinsics. (This is the spelling used by Apple’s GCC; FSF GCC uses -maltivec together with the <altivec.h> header.) The -faltivec switch also defines the preprocessor macro __ALTIVEC__. This is particularly useful, since we can use it to decide at compile time whether to use a vector or scalar code path. The following program illustrates this principle:

{max.c}
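The listing for max.c is not reproduced here. As a rough guide, the following is a minimal sketch of how such a program might select between the two code paths using __ALTIVEC__. It is not the original listing: the input values, the CODE_PATH macro, and the run_demo function are assumptions chosen so that the output resembles the runs shown in this article. The vector branch simply repeats the article's AltiVec code and has not been adapted for compilers that need <altivec.h>.

```c
#include <stdio.h>

/* Assumed to match the vector type defined earlier in the article. */
typedef int v4si __attribute__((vector_size(16)));

#define INLINE inline extern __attribute__((always_inline))

#ifdef __ALTIVEC__
/* Vector path: only compiled when AltiVec support is enabled. */
INLINE v4si v_max(v4si v1, v4si v2)
{
  return vec_max((vector int)v1, (vector int)v2);
}
#define CODE_PATH "Altivec!"   /* hypothetical macro name */
#else
/* Scalar path: portable fallback, one element at a time. */
INLINE v4si v_max(v4si v1, v4si v2)
{
  union { v4si v; int s[4]; } a, b, r;
  int i;
  a.v = v1;
  b.v = v2;
  for (i = 0; i < 4; i++)
    r.s[i] = a.s[i] > b.s[i] ? a.s[i] : b.s[i];
  return r.v;
}
#define CODE_PATH "Scalar!"    /* hypothetical macro name */
#endif

/* Hypothetical driver: prints which path was compiled in, then the result. */
void run_demo(void)
{
  v4si foo = {7, 2, 7, 1};     /* assumed inputs */
  v4si baz = {3, 6, 4, 9};
  union { v4si v; int s[4]; } bar;
  int i;
  bar.v = v_max(foo, baz);
  puts(CODE_PATH);
  for (i = 0; i < 4; i++)
    printf("bar: %d\n", bar.s[i]);
}
```

Adding int main(void) { run_demo(); return 0; } turns the sketch into a complete program.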

We can see the two code paths being used quite easily:

$ gcc -faltivec -std=c99 max.c && ./a.out
Altivec!
bar: 7
bar: 6
bar: 7
bar: 9

$ gcc -std=c99 max.c && ./a.out
Scalar!
bar: 7
bar: 6
bar: 7
bar: 9

Similar extensions are available on other platforms.
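For instance, on x86 the same function can be sketched with an SSE4.1 intrinsic. This is an illustration rather than part of the original article. It assumes the v4si typedef from earlier and a compiler invoked with -msse4.1 (which defines __SSE4_1__), and it falls back to scalar code otherwise.

```c
/* Assumed to match the vector type defined earlier in the article. */
typedef int v4si __attribute__((vector_size(16)));

#ifdef __SSE4_1__
#include <smmintrin.h>  /* SSE4.1 intrinsics, including _mm_max_epi32 */

/* Vector path: one instruction computes all four signed 32-bit maxima. */
static inline v4si v_max(v4si v1, v4si v2)
{
  /* __m128i and v4si are both 16-byte vectors, so the casts are allowed. */
  return (v4si)_mm_max_epi32((__m128i)v1, (__m128i)v2);
}
#else
/* Scalar path: used when SSE4.1 is not enabled at compile time. */
static inline v4si v_max(v4si v1, v4si v2)
{
  union { v4si v; int s[4]; } a, b, r;
  int i;
  a.v = v1;
  b.v = v2;
  for (i = 0; i < 4; i++)
    r.s[i] = a.s[i] > b.s[i] ? a.s[i] : b.s[i];
  return r.v;
}
#endif
```

As with __ALTIVEC__, the choice is made at compile time, so a binary built with -msse4.1 will only run on processors that support those instructions.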
