
= Adding New Volk Proto Kernels =

Adding new proto-kernels (implementations of Volk kernels for specific architectures) is relatively easy.

In the relevant .h file (volk/include/volk/volk_.h), add a new #ifdef / #endif block for the LV_HAVE_ macro corresponding to the architecture you are working on (e.g. SSE, AVX, NEON, etc.).

For example, for NEON:

    #ifdef LV_HAVE_NEON
    #include <arm_neon.h>
    /*!
      \brief Scalar float multiply
      \param cVector The vector where the results will be stored
      \param aVector The vector to be multiplied
      \param scalar the scalar value
      \param num_points The number of values in aVector to be multiplied by scalar and stored into cVector
    */
    static inline void volk_32f_s32f_multiply_32f_u_neon(float* cVector, const float* aVector, const float scalar, unsigned int num_points){
      unsigned int number = 0;
      const float* inputPtr = aVector;
      float* outputPtr = cVector;
      const unsigned int quarterPoints = num_points / 4;

      float32x4_t aVal, cVal;

      // Process four floats per iteration
      for(number = 0; number < quarterPoints; number++){
        aVal = vld1q_f32(inputPtr); // Load into NEON regs
        cVal = vmulq_n_f32 (aVal, scalar); // Do the multiply
        vst1q_f32(outputPtr, cVal); // Store results back to output
        inputPtr += 4;
        outputPtr += 4;
      }

      // Handle the remaining (num_points % 4) values one at a time
      for(number = quarterPoints * 4; number < num_points; number++){
        *outputPtr++ = (*inputPtr++) * scalar;
      }
    }
    #endif /* LV_HAVE_NEON */

So you want to write a NEON kernel:

It is relatively straightforward to translate an existing SSE/AVX/etc. kernel to NEON intrinsics, so we'll start with that:

First, change the #ifdef from the LV_HAVE_ macro of the architecture you are porting from (e.g. LV_HAVE_SSE) to LV_HAVE_NEON.

You need to #include <arm_neon.h>, the header that declares the NEON types and intrinsics.

Then, the two main tasks are translating the data types and translating the intrinsic call names.

e.g.:

__m128 can become float32x4_t (or int16x8_t, or some other combination); the actual type will depend on the kernel's signature

_mm_load_ps() will become something like: vld1q_f32()
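To make the mapping above concrete, here is a minimal side-by-side sketch of the same scalar-multiply loop written once with SSE intrinsics and once with NEON intrinsics. The function names (example_scale_u_sse, example_scale_u_neon, example_scale_generic, example_scale) are hypothetical and not part of Volk. Because this is a standalone file, the LV_HAVE_ macros are derived from compiler macros here; inside Volk the build system is expected to define them.

```c
/* Assumption: Volk's build system normally defines the LV_HAVE_* macros.
 * For this standalone sketch they are derived from compiler macros. */
#if defined(__SSE__) && !defined(LV_HAVE_SSE)
#define LV_HAVE_SSE
#endif
#if (defined(__ARM_NEON) || defined(__ARM_NEON__)) && !defined(LV_HAVE_NEON)
#define LV_HAVE_NEON
#endif

#ifdef LV_HAVE_SSE
#include <xmmintrin.h>
/* Hypothetical SSE proto-kernel used as the starting point. */
static inline void example_scale_u_sse(float* out, const float* in,
                                       float scalar, unsigned int num_points)
{
    const unsigned int quarterPoints = num_points / 4;
    const __m128 sVal = _mm_set1_ps(scalar);     /* broadcast the scalar */
    unsigned int number;
    for (number = 0; number < quarterPoints; number++) {
        __m128 aVal = _mm_loadu_ps(in);          /* load 4 floats */
        __m128 cVal = _mm_mul_ps(aVal, sVal);    /* multiply */
        _mm_storeu_ps(out, cVal);                /* store 4 floats */
        in += 4;
        out += 4;
    }
    for (number = quarterPoints * 4; number < num_points; number++) {
        *out++ = (*in++) * scalar;               /* scalar tail */
    }
}
#endif /* LV_HAVE_SSE */

#ifdef LV_HAVE_NEON
#include <arm_neon.h>
/* The same loop after translation: __m128 -> float32x4_t,
 * _mm_loadu_ps -> vld1q_f32, _mm_mul_ps -> vmulq_n_f32,
 * _mm_storeu_ps -> vst1q_f32. */
static inline void example_scale_u_neon(float* out, const float* in,
                                        float scalar, unsigned int num_points)
{
    const unsigned int quarterPoints = num_points / 4;
    unsigned int number;
    for (number = 0; number < quarterPoints; number++) {
        float32x4_t aVal = vld1q_f32(in);               /* load 4 floats */
        float32x4_t cVal = vmulq_n_f32(aVal, scalar);   /* multiply by scalar */
        vst1q_f32(out, cVal);                           /* store 4 floats */
        in += 4;
        out += 4;
    }
    for (number = quarterPoints * 4; number < num_points; number++) {
        *out++ = (*in++) * scalar;                      /* scalar tail */
    }
}
#endif /* LV_HAVE_NEON */

/* Plain C reference, useful for checking the SIMD versions. */
static inline void example_scale_generic(float* out, const float* in,
                                         float scalar, unsigned int num_points)
{
    unsigned int number;
    for (number = 0; number < num_points; number++) {
        out[number] = in[number] * scalar;
    }
}

/* Hypothetical wrapper: picks whichever version was compiled in. */
static inline void example_scale(float* out, const float* in,
                                 float scalar, unsigned int num_points)
{
#if defined(LV_HAVE_NEON)
    example_scale_u_neon(out, in, scalar, num_points);
#elif defined(LV_HAVE_SSE)
    example_scale_u_sse(out, in, scalar, num_points);
#else
    example_scale_generic(out, in, scalar, num_points);
#endif
}
```

Note that the NEON version needs no separate broadcast step, since vmulq_n_f32 takes the scalar directly. Calling example_scale(out, in, 2.0f, 7) on a seven-element buffer exercises both the four-wide loop and the scalar tail.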

You will want to search http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html for the particular SIMD instruction (intrinsic) you are looking for.

REMEMBER: There are both aligned and unaligned proto-kernels for each kernel.
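As a rough illustration of the aligned/unaligned split, the sketch below writes both variants of a scalar multiply and a small wrapper that picks one based on pointer alignment. All names (example_mul_a, example_mul_u, example_is_aligned16, example_mul) are hypothetical, and the wrapper is only an assumption about how a caller might choose between the two; it is not Volk's dispatcher. On SSE the two variants differ in the load/store intrinsics. On NEON, vld1q_f32 and vst1q_f32 should not need 16-byte alignment, so this sketch simply reuses the unaligned body for the aligned variant.

```c
#include <stdint.h>

#if defined(__SSE__)
#include <xmmintrin.h>
/* Aligned variant: _mm_load_ps / _mm_store_ps need 16-byte aligned pointers. */
static inline void example_mul_a(float* out, const float* in,
                                 float scalar, unsigned int num_points)
{
    const unsigned int quarterPoints = num_points / 4;
    const __m128 sVal = _mm_set1_ps(scalar);
    unsigned int number;
    for (number = 0; number < quarterPoints; number++, in += 4, out += 4) {
        _mm_store_ps(out, _mm_mul_ps(_mm_load_ps(in), sVal));
    }
    for (number = quarterPoints * 4; number < num_points; number++) {
        *out++ = (*in++) * scalar;
    }
}
/* Unaligned variant: same loop with the "u" load/store intrinsics. */
static inline void example_mul_u(float* out, const float* in,
                                 float scalar, unsigned int num_points)
{
    const unsigned int quarterPoints = num_points / 4;
    const __m128 sVal = _mm_set1_ps(scalar);
    unsigned int number;
    for (number = 0; number < quarterPoints; number++, in += 4, out += 4) {
        _mm_storeu_ps(out, _mm_mul_ps(_mm_loadu_ps(in), sVal));
    }
    for (number = quarterPoints * 4; number < num_points; number++) {
        *out++ = (*in++) * scalar;
    }
}
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
static inline void example_mul_u(float* out, const float* in,
                                 float scalar, unsigned int num_points)
{
    const unsigned int quarterPoints = num_points / 4;
    unsigned int number;
    for (number = 0; number < quarterPoints; number++, in += 4, out += 4) {
        vst1q_f32(out, vmulq_n_f32(vld1q_f32(in), scalar));
    }
    for (number = quarterPoints * 4; number < num_points; number++) {
        *out++ = (*in++) * scalar;
    }
}
/* Assumption: the aligned variant can share the unaligned body on NEON. */
static inline void example_mul_a(float* out, const float* in,
                                 float scalar, unsigned int num_points)
{
    example_mul_u(out, in, scalar, num_points);
}
#else
/* No SIMD available: both variants fall back to plain C. */
static inline void example_mul_u(float* out, const float* in,
                                 float scalar, unsigned int num_points)
{
    unsigned int number;
    for (number = 0; number < num_points; number++) {
        out[number] = in[number] * scalar;
    }
}
static inline void example_mul_a(float* out, const float* in,
                                 float scalar, unsigned int num_points)
{
    example_mul_u(out, in, scalar, num_points);
}
#endif

/* Returns nonzero when p sits on a 16-byte boundary. */
static inline int example_is_aligned16(const void* p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical wrapper: use the aligned variant only when both buffers qualify. */
static inline void example_mul(float* out, const float* in,
                               float scalar, unsigned int num_points)
{
    if (example_is_aligned16(out) && example_is_aligned16(in)) {
        example_mul_a(out, in, scalar, num_points);
    } else {
        example_mul_u(out, in, scalar, num_points);
    }
}
```

Passing buf and buf + 1 from the same float array is a simple way to exercise the unaligned path, since the two addresses are 4 bytes apart and cannot both be 16-byte aligned.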