VolkAddingProtoKernels

From GNU Radio
Revision as of 01:36, 8 March 2017 by Mbr0wn (talk | contribs) (Imported from Redmine)

Adding New Volk Proto Kernels

Adding new proto-kernels (implementations of Volk kernels for specific architectures) is relatively easy.
In the relevant kernel header, volk/include/volk/volk_<input-fingerprint_function-name_output-fingerprint>.h, add a new #ifdef/#endif block for the LV_HAVE_<arch> macro corresponding to the <arch> you are working on (e.g. SSE, AVX, NEON, etc.).

For example, for volk_32f_s32f_multiply_32f_u_neon:

#ifdef LV_HAVE_NEON
#include <arm_neon.h>
/*!
  \brief Scalar float multiply
  \param cVector The vector where the results will be stored
  \param aVector The input vector to be multiplied by the scalar
  \param scalar the scalar value
  \param num_points The number of values in aVector to be multiplied by scalar and stored into cVector
*/
static inline void volk_32f_s32f_multiply_32f_u_neon(float* cVector, const float* aVector, const float scalar, unsigned int num_points){
  unsigned int number = 0;
  const float* inputPtr = aVector;
  float* outputPtr = cVector;
  const unsigned int quarterPoints = num_points / 4;

  float32x4_t aVal, cVal;

  for(number = 0; number < quarterPoints; number++){
    aVal = vld1q_f32(inputPtr); // Load 4 floats into a NEON register
    cVal = vmulq_n_f32(aVal, scalar); // Multiply all 4 lanes by the scalar
    vst1q_f32(outputPtr, cVal); // Store the 4 results back to output
    inputPtr += 4;
    outputPtr += 4;
  }
  }
  for(number = quarterPoints * 4; number < num_points; number++){
      *outputPtr++ = (*inputPtr++) * scalar;
  }
}
#endif /* LV_HAVE_NEON */
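
A quick way to gain confidence in a new proto-kernel is to compare its output against a plain C loop for several lengths, including lengths that are not a multiple of 4 so the tail loop is exercised. The following is only a sketch under stated assumptions: multiply_generic, multiply_under_test, count_mismatches and MAX_POINTS are hypothetical names made up for this example and are not part of Volk, and the NEON path is only compiled when the compiler defines __ARM_NEON (on other machines the same loop structure runs in scalar code).

```c
#include <assert.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#endif

#define MAX_POINTS 64 /* hypothetical upper bound for this sketch */

/* Plain C reference: one multiply per point. */
static void multiply_generic(float* c, const float* a, float scalar, unsigned int n)
{
  unsigned int i;
  for (i = 0; i < n; i++) {
    c[i] = a[i] * scalar;
  }
}

/* Kernel under test: blocks of 4 points, then a scalar tail loop. */
static void multiply_under_test(float* c, const float* a, float scalar, unsigned int n)
{
  const unsigned int quarterPoints = n / 4;
  unsigned int number;
#ifdef SKETCH_HAVE_NEON
  for (number = 0; number < quarterPoints; number++) {
    float32x4_t aVal = vld1q_f32(a);         /* load 4 floats */
    vst1q_f32(c, vmulq_n_f32(aVal, scalar)); /* multiply and store 4 floats */
    a += 4;
    c += 4;
  }
#else
  /* No NEON available: walk the same 4-point blocks in scalar code. */
  for (number = 0; number < quarterPoints * 4; number++) {
    *c++ = (*a++) * scalar;
  }
#endif
  /* Tail: the remaining 0 to 3 points. */
  for (number = quarterPoints * 4; number < n; number++) {
    *c++ = (*a++) * scalar;
  }
}

/* Returns how many output points differ between the two implementations. */
static unsigned int count_mismatches(unsigned int num_points, float scalar)
{
  float in[MAX_POINTS], expected[MAX_POINTS], got[MAX_POINTS];
  unsigned int i, bad = 0;

  assert(num_points <= MAX_POINTS);
  for (i = 0; i < num_points; i++) {
    in[i] = 0.5f * (float)i - 3.0f; /* arbitrary test data */
  }
  multiply_generic(expected, in, scalar, num_points);
  multiply_under_test(got, in, scalar, num_points);
  for (i = 0; i < num_points; i++) {
    float diff = expected[i] - got[i];
    if (diff < 0.0f) diff = -diff;
    if (diff > 1e-6f) bad++;
  }
  return bad;
}
```

Calling count_mismatches with lengths such as 3, 4 and 19 should return 0 in each case; a non-zero result for lengths that are not a multiple of 4 usually points at the loop bound or the pointer increments.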

So you want to write a NEON kernel:

It is relatively straightforward to translate an SSE/AVX/etc. kernel to NEON intrinsics, so we'll start with that:

First, change the #ifdef from LV_HAVE_<x> to LV_HAVE_NEON.
You need to #include <arm_neon.h>.
Then, the two main things are translating the data types and then the actual intrinsic call names.

e.g.:

__m128 can become float32x4_t (or int16x8_t, or some other combination) - the actual type will depend on the kernel's signature
_mm_load_ps(<x>) will become something like vld1q_f32(<x>)
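
To make the type and intrinsic mapping concrete, here is a minimal side-by-side sketch that scales 4 floats by a scalar. The function name scale4_sketch is hypothetical, and the sketch selects a path with the compiler's own __SSE__ / __ARM_NEON macros rather than Volk's LV_HAVE_<arch> macros so that it can be compiled on its own. The SSE path uses the unaligned load/store (_mm_loadu_ps / _mm_storeu_ps) so it works on any float pointer.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Multiply exactly 4 floats from `in` by `scalar` and write them to `out`. */
static void scale4_sketch(float* out, const float* in, float scalar)
{
  assert(out != NULL && in != NULL);
#if defined(__SSE__)
  __m128 aVal = _mm_loadu_ps(in);     /* SSE: unaligned load of 4 floats   */
  __m128 sVal = _mm_set1_ps(scalar);  /* SSE: broadcast scalar to 4 lanes  */
  __m128 cVal = _mm_mul_ps(aVal, sVal);
  _mm_storeu_ps(out, cVal);           /* SSE: unaligned store of 4 floats  */
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
  float32x4_t aVal = vld1q_f32(in);   /* NEON: load 4 floats               */
  float32x4_t cVal = vmulq_n_f32(aVal, scalar); /* NEON: multiply by scalar */
  vst1q_f32(out, cVal);               /* NEON: store 4 floats              */
#else
  /* Neither instruction set available: plain C fallback. */
  int i;
  for (i = 0; i < 4; i++) {
    out[i] = in[i] * scalar;
  }
#endif
}
```

Note that NEON's vmulq_n_f32 takes the scalar directly, whereas the SSE version first broadcasts it into a register with _mm_set1_ps, so a translation is not always one intrinsic for one intrinsic.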

You will want to search http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html for the particular SIMD instruction (intrinsic) you are looking for.

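REMEMBER: There are both aligned and unaligned proto-kernels for each kernel. The _u_ in volk_32f_s32f_multiply_32f_u_neon marks the unaligned variant; aligned variants use _a_ instead.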
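
As an illustration of what "aligned" means here, the sketch below checks whether buffers start on a 16-byte boundary (the width of one 128-bit SIMD register) and reports which variant would be safe to call. The names is_aligned_sketch, pick_variant_sketch and demo_buf are hypothetical and exist only for this example; this is not how Volk itself selects a proto-kernel, and the 16-byte figure is an assumption that fits 128-bit registers only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical demo buffer, forced onto a 16-byte boundary (C11 _Alignas). */
static _Alignas(16) float demo_buf[8];

/* Returns 1 when ptr is a multiple of `alignment` bytes, 0 otherwise. */
static int is_aligned_sketch(const void* ptr, size_t alignment)
{
  assert(alignment != 0);
  return ((uintptr_t)ptr % alignment) == 0;
}

/* Returns 'a' when both buffers are 16-byte aligned, so an aligned (_a_)
   proto-kernel could be used, and 'u' when the unaligned (_u_) one is needed. */
static char pick_variant_sketch(const float* in, const float* out)
{
  if (is_aligned_sketch(in, 16) && is_aligned_sketch(out, 16)) {
    return 'a';
  }
  return 'u';
}
```

For example, pick_variant_sketch(demo_buf, demo_buf) yields 'a', while passing demo_buf + 1 as the input (4 bytes past the boundary) yields 'u'.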