Adding New Volk Proto Kernels
Adding new proto-kernels (implementations of Volk kernels for specific architectures) is relatively easy.
In the relevant kernel header, volk/include/volk/volk_<input-fingerprint_function-name_output-fingerprint>.h, add a new #ifdef/#endif block guarded by the LV_HAVE_<arch> macro for the <arch> you are working on (e.g. SSE, AVX, NEON, etc.).
For example, for volk_32f_s32f_multiply_32f_u_neon:
#ifdef LV_HAVE_NEON
#include <arm_neon.h>
/*!
  \brief Scalar float multiply
  \param cVector The vector where the results will be stored
  \param aVector The vector to be multiplied by the scalar
  \param scalar the scalar value
  \param num_points The number of values in aVector to be multiplied by scalar and stored into cVector
*/
static inline void volk_32f_s32f_multiply_32f_u_neon(float* cVector, const float* aVector, const float scalar, unsigned int num_points){
  unsigned int number = 0;
  const float* inputPtr = aVector;
  float* outputPtr = cVector;
  const unsigned int quarterPoints = num_points / 4;
  float32x4_t aVal, cVal;

  // Process four floats per iteration
  for(number = 0; number < quarterPoints; number++){
    aVal = vld1q_f32(inputPtr); // Load into NEON regs
    cVal = vmulq_n_f32 (aVal, scalar); // Do the multiply
    vst1q_f32(outputPtr, cVal); // Store results back to output
    inputPtr += 4;
    outputPtr += 4;
  }

  // Handle the remaining (num_points % 4) values one at a time
  for(number = quarterPoints * 4; number < num_points; number++){
    *outputPtr++ = (*inputPtr++) * scalar;
  }
}
#endif /* LV_HAVE_NEON */
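When checking a new proto-kernel, it can help to compare its output against a plain C loop. The following is a minimal sketch of such a reference, with the same argument order as the NEON example. The name scalar_multiply_reference is hypothetical and chosen for illustration; it is not taken from the Volk sources.

```c
/* Hypothetical reference implementation (illustration only, not part of Volk).
 * Multiplies each of the num_points values in aVector by scalar and stores
 * the result in cVector. Makes no assumptions about alignment or SIMD support. */
static inline void scalar_multiply_reference(float* cVector,
                                             const float* aVector,
                                             const float scalar,
                                             unsigned int num_points)
{
    unsigned int number;
    for (number = 0; number < num_points; number++) {
        cVector[number] = aVector[number] * scalar;
    }
}
```

Running both versions on a length that is not a multiple of four (6 or 7 points, for instance) exercises the SIMD loop and the scalar tail loop together.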
So you want to write a NEON kernel:
It is relatively straightforward to translate an existing SSE/AVX/etc. kernel to NEON intrinsics, so we'll start with that:
First, change the #ifdef from LV_HAVE_<x> to LV_HAVE_NEON
You need to #include <arm_neon.h>
Then, the two main tasks are translating the data types and translating the intrinsic call names
e.g.:
__m128 can become float32x4_t (or int16x8_t, or some other combination); the actual type will depend on the kernel's signature
_mm_load_ps(<x>) will become something like: vld1q_f32(<x>)
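The type and intrinsic mapping above can be sketched side by side. This example is an illustration only: multiply4_sketch is a hypothetical helper, and it is guarded by the compiler-defined macros __ARM_NEON and __SSE__ (an assumption made so the block compiles standalone) rather than by Volk's LV_HAVE_<arch> macros. It uses the unaligned SSE load and store so that both branches accept any address.

```c
#if defined(__ARM_NEON)
#include <arm_neon.h>
#elif defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper (not part of Volk): multiplies exactly four floats by
 * a scalar using whichever SIMD extension the compiler is targeting. */
static inline void multiply4_sketch(float* out, const float* in, float scalar)
{
#if defined(__ARM_NEON)
    float32x4_t aVal = vld1q_f32(in);              /* SSE: __m128 + _mm_loadu_ps */
    float32x4_t cVal = vmulq_n_f32(aVal, scalar);  /* SSE: _mm_set1_ps + _mm_mul_ps */
    vst1q_f32(out, cVal);                          /* SSE: _mm_storeu_ps */
#elif defined(__SSE__)
    __m128 aVal = _mm_loadu_ps(in);       /* load four floats, any alignment */
    __m128 bVal = _mm_set1_ps(scalar);    /* broadcast the scalar to all lanes */
    __m128 cVal = _mm_mul_ps(aVal, bVal); /* lane-wise multiply */
    _mm_storeu_ps(out, cVal);             /* store four floats, any alignment */
#else
    /* Portable fallback when neither extension is available */
    int i;
    for (i = 0; i < 4; i++) {
        out[i] = in[i] * scalar;
    }
#endif
}
```

One difference visible here: NEON offers vmulq_n_f32 to multiply a vector by a scalar directly, while the SSE branch first broadcasts the scalar into a register with _mm_set1_ps.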
You will want to search http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html for the particular SIMD instruction (intrinsic) you are looking for.
REMEMBER: There are both aligned and unaligned proto-kernels for each kernel (the _u_ in volk_32f_s32f_multiply_32f_u_neon marks the unaligned variant).
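To illustrate the distinction, the sketch below checks whether buffers sit on a given byte boundary, which is the property an aligned proto-kernel would be free to assume. Both function names are hypothetical, and the 16-byte boundary is an assumption chosen to match 128-bit registers; this is not Volk's own dispatch code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Volk): returns nonzero when ptr lies on
 * a multiple of alignment bytes. */
static inline int is_aligned_sketch(const void* ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}

/* Hypothetical helper: returns 'a' when both buffers could be handed to an
 * aligned proto-kernel, and 'u' when the unaligned one is needed.
 * The 16-byte boundary is an assumption for 128-bit SIMD registers. */
static inline char pick_variant_sketch(const void* out, const void* in)
{
    return (is_aligned_sketch(out, 16) && is_aligned_sketch(in, 16)) ? 'a' : 'u';
}
```

An unaligned proto-kernel must produce correct results for any address, so it is the safe choice whenever either buffer fails the check.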