VOLK
VOLK stands for Vector-Optimized Library of Kernels. It's a library that was introduced into GNU Radio in December 2010. You can read more about it here: http://www.trondeau.com/blog/2010/12/11/volk-vector-optimized-library-of-kernels.html. The official website for VOLK is http://libvolk.org
Other details on integrating VOLK into GNU Radio can be found here:
http://www.trondeau.com/blog/2012/2/13/volk-integration-to-gnu-radio.html
Benchmarking of VOLK in GNU Radio is described here:
http://www.trondeau.com/blog/2012/2/17/volk-benchmarking.html
Paper on VOLK at the WinnForum's SDR conference in January, 2013:
attachment:volk.pdf
Using VOLK
VOLK comes with a profiler that builds a config file recording the best SIMD architecture for your processor. Run volk_profile, which is installed into $PREFIX/bin. This program tests all known VOLK kernels for each architecture supported by the processor. When finished, it writes the best architecture for each VOLK function to $HOME/.volk/volk_config. This file is read when a function is used, so that VOLK knows which version of the function to execute.
Alignment Issues
SIMD code can be very sensitive to the alignment of the vectors, which generally means a 16-byte or 32-byte alignment requirement. The VOLK dispatcher functions, which are what we normally call as users of VOLK, make sure that the correct aligned or unaligned version is called depending on the state of the vectors passed to them. However, things typically work faster and more efficiently when the vectors are aligned. As such, VOLK (as of v3.7.3) has memory allocate and free methods that provide us with properly aligned vectors. We can also ask VOLK for the current machine's alignment requirement, which makes our job even easier when porting code.
To get the machine's alignment, simply call size_t volk_get_alignment().
Allocate memory using void* volk_malloc(size_t size, size_t alignment).
Make sure that any memory allocated by VOLK is also freed by VOLK with volk_free(void *p).
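The alignment test that decides between the aligned and unaligned paths can be pictured with a small standard-library sketch. The names is_aligned and can_use_aligned_kernel are hypothetical and are not part of VOLK. The sketch only illustrates the idea that the aligned path is safe when every buffer address is a multiple of the required alignment, and it assumes that alignment is a power of two (such as 16 or 32).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a VOLK function): true when the address p is a
// multiple of `alignment` bytes. `alignment` is assumed to be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical dispatcher decision for a two-input, one-output kernel:
// the aligned version may only be used when every buffer is aligned.
inline bool can_use_aligned_kernel(const void* out, const void* in0,
                                   const void* in1, std::size_t alignment)
{
    return is_aligned(out, alignment) && is_aligned(in0, alignment) &&
           is_aligned(in1, alignment);
}
```

In real code the alignment argument would come from volk_get_alignment(), and buffers obtained from volk_malloc with that alignment would pass this check.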
Hand-Tuning Performance
If you know a particular architecture works best for your processor, you can specify the particular architecture to use in the VOLK preferences file: $HOME/.volk/volk_config
The file looks like:
volk_<FUNCTION_NAME> <ALIGNED ARCHITECTURE> <UNALIGNED ARCHITECTURE>
Here, "FUNCTION_NAME" is the particular function whose default value you want to override. We then specify two architectures, one for aligned and one for unaligned calls to the kernel. "ALIGNED ARCHITECTURE" is the VOLK SIMD architecture to use when all vectors are memory aligned (a_sse, a_sse2, a_sse3, a_avx, etc.). "UNALIGNED ARCHITECTURE" is the architecture to use when any of the vectors are not memory aligned (u_sse, u_avx, etc.). For example, the following config file tells VOLK to use SSE3 for the aligned kernel calls and the generic version for the unaligned kernel calls of a function that multiplies two complex streams together.
volk_32fc_x2_multiply_32fc a_sse3 generic
NOTE: This is a change in version 3.7 of GNU Radio and VOLK. Previously, we defined each function as either an aligned or unaligned kernel and set a single architecture for each. We now use a dispatcher function that uses the two versions of the architecture calls here depending on the alignment state of the vectors.
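To make the three-field layout concrete, here is a minimal sketch of splitting one preference line into its parts. It assumes only what is described above, namely three whitespace-separated fields per line. KernelPreference and parse_preference are hypothetical names, and this is not how VOLK itself reads the file.

```cpp
#include <sstream>
#include <string>

// Hypothetical holder for one line of the preferences file.
struct KernelPreference {
    std::string function;   // e.g. "volk_32fc_x2_multiply_32fc"
    std::string aligned;    // architecture for aligned calls, e.g. "a_sse3"
    std::string unaligned;  // architecture for unaligned calls, e.g. "generic"
};

// Splits a line into its three whitespace-separated fields.
// Returns false when fewer than three fields are present.
bool parse_preference(const std::string& line, KernelPreference& pref)
{
    std::istringstream fields(line);
    return static_cast<bool>(fields >> pref.function >> pref.aligned >> pref.unaligned);
}
```

Applied to the example line above, the sketch yields the function name volk_32fc_x2_multiply_32fc, the aligned architecture a_sse3, and the unaligned architecture generic.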
Writing Volk kernels
Developing with Volk in GNU Radio
To use Volk kernels in GNU Radio, you have to be aware of the buffer alignment. We have the ability to check the buffer alignment by calling the is_unaligned() function from the gr_block. If this returns True, then there is an alignment issue and the aligned kernels cannot be called. If this call returns False, then the buffers are aligned and the aligned Volk kernel may be used.
Every kernel should have an aligned and an unaligned version. The generic C implementation of the math can be used for the unaligned call if necessary. Generally speaking, making an unaligned kernel is as simple as copying the aligned kernel and changing any load calls to loadu and any store calls to storeu (true for Intel SSE instructions, at least).
The following is an example using the gr_multiply_cc block, which uses the volk_32fc_x2_multiply_32fc kernel to multiply two streams together. We call the VOLK dispatcher, which will select the proper aligned or unaligned kernel based on the state of the vectors.
    int gr_multiply_cc::work(int noutput_items,
                             gr_vector_const_void_star &input_items,
                             gr_vector_void_star &output_items)
    {
        gr_complex *out = (gr_complex *) output_items[0];
        int noi = d_vlen * noutput_items;

        // Start with the first input stream in the output buffer.
        memcpy(out, input_items[0], noi * sizeof(gr_complex));

        // Multiply each remaining input stream into the output in place.
        for (size_t i = 1; i < input_items.size(); i++)
            volk_32fc_x2_multiply_32fc(out, out, (gr_complex *) input_items[i], noi);

        return noutput_items;
    }
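For reference, the math that such a kernel performs can be written as a plain loop. The following is a sketch of what a generic, non-SIMD version might look like. multiply_generic is a hypothetical name and is not the code shipped with VOLK. The argument order (output, two inputs, number of points) mirrors the dispatcher call in the block above.

```cpp
#include <complex>
#include <cstddef>

// Hypothetical stand-in for a generic (non-SIMD) complex multiply kernel.
// It has no alignment requirement, so it is safe for any buffers.
// `out` may be the same buffer as `a`, as in the in-place call above.
void multiply_generic(std::complex<float>* out,
                      const std::complex<float>* a,
                      const std::complex<float>* b,
                      std::size_t num_points)
{
    for (std::size_t i = 0; i < num_points; ++i)
        out[i] = a[i] * b[i];
}
```

An architecture-specific kernel would compute the same result several points at a time with SIMD loads and stores.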
Using ORC with VOLK
VOLK can take advantage of the Oil Runtime Compiler (ORC) to create cross-platform kernels relatively quickly. ORC provides a higher-level language for writing SIMD code that targets different SIMD architectures. The ease of writing an ORC function can be offset by a result that is less well tuned than an architecture-specific kernel (generality versus speed). ORC is often a good place to start when writing VOLK kernels, which can then be optimized as necessary.
To download ORC, go to:
http://code.entropywave.com/download/orc/
Or use their git repo:
git://code.entropywave.com/git/orc.git
As of GNU Radio 3.5.2, VOLK depends on ORC version 0.4.12 or higher.
VOLK Naming Scheme
There is discussion about standardizing the naming scheme for VOLK. We want standard naming to make sure that all functions are explicitly clear as to what they do, what their inputs and output types are, and that new functions do not have naming conflicts.
The basic naming scheme will look something like this:
> volk_(inputs params)_[name]_(output params)_[alignment]
These are a few questions that must be addressed when creating the names:
1. Different and multiple inputs and/or outputs
2. Different types, also with different/multiple inputs/outputs
3. Constants (scalars) versus vectors
4. Mappings or other control information (I'm thinking of things like masks for permutation operators)
5. Memorable (as in, users should be able to "guess" the names from their purpose)
6. Unique (prevent duplication)
The current scheme follows this formula:
volk_(input_type_0)_x(input_num_0)_(input_type_1)_x(input_num_1)_... _[name]_(output_type_0)_x(output_num_0)_(output_type_1)_x(output_num_1)_..._[alignment]
Any function may have M inputs and N outputs. Each input/output has a type that is explicitly named. We specify the types in blocks if there are multiple types in a row. For each block, the type of that block of inputs/outputs is followed by the number of items in that block. The types of data can be:
> 8i, 8u, 16i, 16u, 32i, 32u, 32f, 64i, 64u, 64f
The number of parameters with that type is specified following the type and prefixed with an "x". If there is only a single argument of the type, the multiplier may be omitted.
Any input/output type can be made complex by adding a "c" to the type (for example, 32-bit floating-point complex is 32fc). By default, all inputs and outputs are vectors, but some of the VOLK kernels may take a scalar, such as when multiplying by a constant. These types are specified by prefixing an "s" to the type (e.g., s32fc).
The alignment property in the name specifies the memory alignment required by the inputs and outputs. Many SIMD architectures require a specific byte alignment, most commonly 16 bytes. The underlying VOLK machinery knows this, so the kernel only needs to be marked as aligned with an "a" suffix. An unaligned kernel is marked with a "u" suffix.
Note that only one alignment is specified for the function. Mostly, any imposed alignment on the input will be the same restriction on the output alignment, and vice-versa. However, some functions may not have the same requirements on all inputs or outputs, and scalars usually do not require a specific alignment. In these cases, the alignment should be the strictest alignment required by any of the inputs or outputs. Differences should be made clear in the function documentation.
Some examples include:
Multiply two complex float vectors together (aligned and unaligned versions, and the dispatcher):
> volk_32fc_x2_multiply_32fc_a
> volk_32fc_x2_multiply_32fc_u
> volk_32fc_x2_multiply_32fc
Add four unsigned short vectors together:
> volk_16u_x4_add_16u_a
> volk_16u_x4_add_16u_u
> volk_16u_x4_add_16u
Multiply a complex float vector by a short integer:
> volk_32fc_s16i_multiply_32fc_a
> volk_32fc_s16i_multiply_32fc_u
> volk_32fc_s16i_multiply_32fc
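The naming formula can also be expressed as a small helper that assembles a name from its parts. This is only an illustrative sketch of the scheme described above. ParamBlock, join_blocks, and kernel_name are hypothetical names and are not part of VOLK. A scalar is passed with its "s" prefix already in the type string, and an empty alignment gives the dispatcher name.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one block of same-typed parameters.
struct ParamBlock {
    std::string type;  // e.g. "32fc", "16u", or "s16i" for a scalar
    unsigned count;    // number of consecutive parameters of this type
};

// Appends "_<type>" for each block, plus "_x<count>" when count > 1.
std::string join_blocks(const std::vector<ParamBlock>& blocks)
{
    std::string s;
    for (const ParamBlock& b : blocks) {
        s += "_" + b.type;
        if (b.count > 1)
            s += "_x" + std::to_string(b.count);
    }
    return s;
}

// Builds volk_(inputs)_[name]_(outputs)_[alignment]; the alignment suffix
// ("a" or "u") is left off when `alignment` is empty.
std::string kernel_name(const std::vector<ParamBlock>& inputs,
                        const std::string& name,
                        const std::vector<ParamBlock>& outputs,
                        const std::string& alignment = "")
{
    std::string s = "volk" + join_blocks(inputs) + "_" + name + join_blocks(outputs);
    if (!alignment.empty())
        s += "_" + alignment;
    return s;
}
```

For instance, two 32fc inputs, the name multiply, one 32fc output, and alignment "a" produce volk_32fc_x2_multiply_32fc_a, matching the first example above.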