Notes


[1] For a very small problem, the overhead of delegating and then waiting on the work might dominate the runtime.

[2] Because CUDA cannot make any guarantees as to the threads' execution order, the output histogram must be zero-initialized in host code before invoking this kernel.

[3] Improved hardware support for global atomics is only a partial explanation for these performance increases, of course.

[4] The INTDIVIDE_CEILING macro, defined in chUtil.h, computes the smallest integer that is greater than or equal to the result of dividing the two input operands.
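
As an illustration of note [4], ceiling division on non-negative integers is commonly written by adding the divisor minus one before dividing. The sketch below is an assumption about how such a macro could look, not the definition from chUtil.h; the names SKETCH_INTDIVIDE_CEILING and blocks_to_cover are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for illustration only; the actual
 * INTDIVIDE_CEILING in chUtil.h may be defined differently.
 * Valid for n >= 0 and d > 0, provided n + d - 1 does not overflow. */
#define SKETCH_INTDIVIDE_CEILING(n, d) (((n) + (d) - 1) / (d))

/* Number of fixed-size blocks needed to cover n elements,
 * e.g. thread blocks needed to cover an input array. */
static size_t blocks_to_cover(size_t n, size_t block_size)
{
    assert(block_size > 0);  /* division by zero is undefined */
    return SKETCH_INTDIVIDE_CEILING(n, block_size);
}
```

With these definitions, covering 1000 elements with blocks of 256 requires 4 blocks, whereas plain integer division would give 3 and leave the last 232 elements unprocessed.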
