- 1 GSoC 2014: Using Hardware Based Co-processors in GNU Radio
- 1.1 High Level Details
- 1.2 Keystone2 Details
- 1.2.1 Setting Up the Development Environment
- 1.2.2 Co-Processors
- 1.2.3 Programming the Keystone
- 1.3 GNU Radio Buffers with Zero Copy
GSoC 2014: Using Hardware Based Co-processors in GNU Radio
Student: Alfredo Muniz (email@example.com)
Mentor: Philip Balister
Abstract: GNU Radio, as a digital signal processing program, requires numerous mathematical operations to be executed quickly and repeatedly. A faster way to process signals enables projects with tight timing constraints, such as channel sounding. Past co-processor projects, including GPUs, FPGAs, and DSPs, do not scale well to new devices. This summer we would like to implement a purely open source three-step approach that would improve the state of GNU Radio co-processing for years to come.
High Level Details
Essentially a summary of the project and previous work. We will be using the XTCIEVMK2H.
- This is a special board by TI currently only available to academics
- The TCI6630K2L also contains the coprocessors and should work the same way
Overview of Deliverables
The steps needed to accomplish the project are summarized below (not necessarily in order):
- Modify gnuradio-runtime to allow blocks to use the write_pointer without a need for a kernel copy
- Install GNU Radio on the Keystone2 and run a test on a co-processor
- Test the implementation of gnuradio-runtime on a co-processor
Previous work at VT with the Keystone2 platform and GNU Radio focused on making Keystone2 development entirely open source. It was helpful for setting up the board for netboot and getting the U-Boot parameters correct.
The XTCIEVMK2H Keystone2 has lots of documentation available. Additional information on the coprocessor drivers and API can be found by installing CCSv5 and MCSDK v3.0.4 and looking in $(TI_PDK_INSTALL_DIR)\packages\ti\drv\$(COPROC)\docs\
Setting Up the Development Environment
Development on the Keystone2 is done by booting from a TFTP server and running the root file system (rootfs) from a network file system (NFS). We have to configure both our host machine and the board to allow netboot and NFS.
The most similar setup I have seen on the wiki is on this page
Gathering the Files
Philip did magic with OE (OpenEmbedded) and made the images for GNU Radio and for UHD.
We need 5 files in our boot folder, which is served by our TFTP server and for these tutorials is /tftpboot:
Boot Monitor = mon.bin
Boot Loader = u-boot.bin
??? = u-boot.img
Kernel Image = uImage-k2hk-evm.bin
Device Tree = uImage-k2hk-evm.dtb
We need gnuradio-dev-image-k2hk-evm.tar.gz to go in our rootfs folder, which will be exported over NFS. I created /var/www/share as my NFS export and extracted the package there.
For Ubuntu 12.04 Precise there are lots of ways to get TFTP, but the only one I found to work was ATFTP. This setup requires that our host computer be connected through Ethernet to a network that the board is also connected to.
Below is a copy of my configuration file located at /etc/default/atftpd
USE_INETD=false
OPTIONS="--tftpd-timeout 300 --retry-timeout 5 --port=69 --bind-address=10.16.32.74 --maxthread 100 --verbose=7 /tftpboot"
Notice that bind-address is the address of the host we are using, which can be found by running ifconfig and looking at the inet addr field.
Once the configuration is set up, we need to create the folder that holds the files we wish to share with the board. In this case, I chose /tftpboot. We need to set the permissions on /tftpboot so that the board can access and copy the files.
We can then start the TFTP server using the command
sudo service atftpd restart
We can then create a file called test in /tftpboot and connect to our TFTP server from our host machine:
tftp 10.16.32.74
get test
The file test should appear in the folder we ran tftp from.
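For reference, the get test step boils down to a TFTP read request (RRQ) sent over UDP to port 69. The sketch below builds that packet by hand following RFC 1350. The helper name build_rrq is made up for illustration, and this is not how the tftp client or atftpd are implemented internally; it only shows what goes on the wire.

```python
import struct

TFTP_OPCODE_RRQ = 1  # read-request opcode defined in RFC 1350


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: 2-byte opcode, filename, NUL, mode, NUL.

    build_rrq is a hypothetical helper name used only for this example.
    """
    return (
        struct.pack("!H", TFTP_OPCODE_RRQ)   # opcode in network byte order
        + filename.encode("ascii") + b"\x00"  # NUL-terminated file name
        + mode.encode("ascii") + b"\x00"      # NUL-terminated transfer mode
    )


# The request that asks the server for the file called "test".
packet = build_rrq("test")
print(packet)
```

Sending this packet with a UDP socket to 10.16.32.74 port 69 should make the server answer with the first data block, which is a quick way to tell a server problem from a client problem.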
Setting up the Network File System
NFS can be set up using nfs-kernel-server. We have to install that package and add a line to the /etc/exports file that exports our rootfs folder.
Then we can run these commands to ensure it works:
sudo exportfs -a
sudo exportfs -v
I chose to put my rootfs in /var/www/share and ran these commands to give the board access to the files
sudo chown -R nobody:nogroup /var/www/share
sudo chmod 777 -R /var/www/share
Lastly we can restart the NFS server
sudo service nfs-kernel-server restart
Now we can log in to the Keystone2 by connecting the USB cables, Ethernet cable, and power cable.
Configuring the Keystone Boot Environment
Now that we have a TFTP server, the board can fetch the files it needs to boot. There are a couple of settings we need to change. Again, 10.16.32.74 is the IP of my host machine. The IP of the Keystone should be assigned automatically unless there is a problem with the network or firewall, in which case we want to configure it following these steps
Once we connect to the serial port of our board, we want to interrupt the countdown by hitting any key. We can then set the U-Boot environment:
env default -f -a
setenv serverip 10.16.32.74
setenv tftp_root /tftpboot
setenv bootargs console=ttyS0,115200n8 rootwait=1 earlyprintk root=/dev/nfs nfsroot=10.16.32.74:/var/www/share,v3,tcp rw ip=dhcp
setenv _1 tftpboot 0c5f0000 mon.bin
setenv _2 mon_install 0x0c5f0000
setenv _3 tftpboot 87000000 uImage-k2hk-evm.dtb
setenv _4 tftpboot 88000000 uImage-k2hk-evm.bin
setenv _5 bootm 88000000 - 87000000
setenv _ run _1 _2 _3 _4 _5
saveenv
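The bootargs line is long and easy to mistype on a serial console. As a small convenience, here is a sketch that assembles it from the host IP and rootfs path so it can be pasted in one piece. The function name nfs_bootargs and its parameters are hypothetical; the option values are the ones used in the environment above.

```python
def nfs_bootargs(server_ip: str, rootfs: str,
                 console: str = "ttyS0,115200n8") -> str:
    """Assemble the kernel command line for an NFS root.

    nfs_bootargs is a made-up helper; it only joins the options
    already shown in the U-Boot environment on this page.
    """
    parts = [
        f"console={console}",                     # serial console settings
        "rootwait=1",
        "earlyprintk",
        "root=/dev/nfs",                          # mount the rootfs over NFS
        f"nfsroot={server_ip}:{rootfs},v3,tcp",   # NFS server and export path
        "rw",
        "ip=dhcp",                                # let the board get its IP via DHCP
    ]
    return " ".join(parts)


print("setenv bootargs " + nfs_bootargs("10.16.32.74", "/var/www/share"))
```

Changing the host IP or the export path then only needs to happen in one place before pasting the result into U-Boot.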
From now on, we can boot from our TFTP server and our NFS by running the sequence saved in the _ variable above.
That should be enough to get us up and running. We should be able to log in as root and run the test for GNU Radio.
Co-Processors
There are six different co-processors on the board that we have access to:
- RAC - Receive Accelerator Coprocessor
- TAC - Transmit Accelerator Coprocessor
- VCP2 - Viterbi-Decoder Coprocessor
  - example for the VCP2.
- TCP3d - Turbo Decoder Coprocessor
- FFTC - Fast Fourier Transform Coprocessor
- BCP - Bit Rate Coprocessor
For now I think the FFTC is a good choice, as it comes with examples, is well documented, and should be straightforward to test in GNU Radio. Success with the FFTC means we can then test others, such as the TCP3d, which could be very beneficial to GNU Radio.
I'll describe the steps of using the co-processor briefly and will update with further details once I get it working. We first need to go through an initialization sequence that is described on page 26 of the FFTC SDS. Then we need to send a TX request using the fftc_txgetrequestbuffer() function, which outputs a pointer to where our input data should be stored. Once we fill in the buffer with our data from GNU Radio, we can go through the RX stage by calling fftc_rxgetresult(), which returns a pointer to the raw result and a pointer to the length of the buffer. The interface should work well with the get_user_pages approach described below.
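Once the FFTC returns a result, we will want something to compare it against. Below is a minimal software reference, a plain DFT in Python, that could serve as the "known good" answer for small test vectors. The function names are my own, and the sketch says nothing about the FFTC's number format or scaling, which would have to be accounted for before comparing.

```python
import cmath


def reference_dft(samples):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-2*pi*j*k*n/N)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
            for i, x in enumerate(samples))
        for k in range(n)
    ]


def max_abs_error(a, b):
    """Largest element-wise magnitude difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


# An impulse has a flat spectrum, which makes a handy first test vector.
impulse = [1, 0, 0, 0, 0, 0, 0, 0]
print(reference_dft(impulse))
```

A test could feed the same vector to the co-processor, convert its raw output to complex values, and check that max_abs_error stays below a tolerance chosen for the hardware's precision.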
We decided to figure out how the system works on the register level so that we can use the co-processors properly.
TeraNet - System Interconnect
The interconnect on the TCI6638K2K is called the TeraNet or Eagle's Nest. It is divided into a Data Space, for the transfer of data, and a Configuration Space, for access to peripheral configuration registers. The Data Space and Configuration Space are connected through bridge_12, bridge_13, bridge_14 and the Tnet_msmc_sys. The TeraNet is shown as TeraNet_3_x for data and TeraNet_3P_x for configuration.
To get data across the board, we need to learn about the different systems and see how their registers are configured. Then we can try to make sense of the TI software.
The MSMC (pronounced mizmick).
AXI - ARM CorePac
We need to figure out how to get data from the ARM to the MSMC.
All the registers available.
Programming the Keystone
Developing on the Keystone2 can be divided into two major sections: ARM (TI calls it Linux) and DSP (TI calls it DSP/BIOS). For this project we want to develop on the ARM side, since that is where GNU Radio will be running. Unfortunately, most of the documentation and examples focus on developing applications for the DSP side while the ARM side is turned off using TI's debugger. However, TI's software is multilayered and the DSP compiler is somewhat similar to the ARM compiler, so porting code from the DSP to the ARM shouldn't be too difficult. This section describes the software in more detail and explains a method of writing code for the ARM side using the libraries provided by TI.
Multicore Software Development Kit (MCSDK)
First we need to download the MCSDK, which contains the libraries, drivers, documentation, examples, tests, and toolchains needed for using the device. It is possible to work without the MCSDK by programming the registers manually with the Linaro toolchain, but that gets complex really fast, as systems depend on each other and need to be set up in a certain order. We then install the appropriate MCSDK for our device (mine is TCI6638K2K) on our host machine. This is the only software we need to install in order to use TI's libraries and documentation. The MCSDK comes with many different parts/folders; a few notable ones for this project are the PDK and MCSDK_LINUX.
The first thing we need to do is crosscompile the linux-devkit. This can be done in MCSDK_LINUX/linux-devkit by running the script:
This produces the supporting binaries we need for the peripherals. We can simply copy the contents of sysroots/cortexa15hf---- into the rootfs of the Keystone2, which for me is /var/www/share/.
Programmers Development Kit (PDK)
We need to set up the PDK
The PDK comes with the Low Level Drivers (LLDs) needed to run the many peripherals. It will be our working directory, as it contains the code we need to compile. If we go into the PA folder PDK/packages/ti/drv/pa/, we can see the following folders:
src - Files to generate the libraries
test - Files to generate tests
example - Files to generate examples
We will first generate the libraries for the ARM. We do this by modifying the makefile at PDK/packages/makefile.
Once we point to the makefile_armv7 in our peripheral directory, we can edit that makefile to build the libs. The makefile for the libs is in PDK/packages/ti/drv/PERIPHERAL/build/armv7/
We need to modify that makefile to include the appropriate include directories and to build the appropriate files in the src folder.
Once we have that, we can run the following in the PDK/packages folder:
If all is successful, we should receive no errors and find our new libraries in the ../bin folder.
Creating a Test
To create the test we need to first create our makefile in the drv/tests/k2k/armv7/linux/build folder. Here we will specify our test files and our include directories for the test to build properly.
The resource manager (RM) can be found in the PDK/ti/drv/rm directory. It allows communication between ARM and DSP cores as well as between DSP cores. Because all of the examples for the coprocessors are written for the DSP, this lets us run the DSP programs as they are while still communicating with the ARM.
There are a number of ways to get data to and from the ARM and DSP: Msgcom, MessageQ, mpm_mailbox, and CMEM. The current plan is to use CMEM to allocate contiguous memory for holding the log-likelihood ratios and the resulting hard decisions. CMEM is able to translate the virtual addresses of the buffers into physical pointers that the DSP can use, since the DSP doesn't have a memory management unit (MMU). We can pass the physical pointers to the DSP through the MessageQ system. Msgcom is being deprecated and mpm_mailbox isn't as well documented. We'll then run a test to verify that it functions and move on to getting data out of GNU Radio.
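The reason a contiguous buffer makes life easy is that one base translation covers the whole block: any pointer inside it maps to the physical base plus the same offset. The sketch below models only that arithmetic. It does not call the CMEM API, and the function name and the example addresses are invented for illustration.

```python
def virt_to_phys(virt_addr: int, virt_base: int,
                 phys_base: int, size: int) -> int:
    """Translate an address inside one physically contiguous block.

    Hypothetical model: in a contiguous block the offset from the start
    is the same in the virtual and the physical view.
    """
    offset = virt_addr - virt_base
    if not 0 <= offset < size:
        raise ValueError("address is outside the contiguous block")
    return phys_base + offset


# Example values (made up): a 4 KiB block mapped at 0xB6F00000 on the ARM
# that lives at physical address 0x822000000.
phys = virt_to_phys(0xB6F00010, 0xB6F00000, 0x822000000, 0x1000)
print(hex(phys))
```

With scattered pages this shortcut no longer holds, and every page needs its own translation.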
We can look at the files in filetestdemo, specifically options 2 and 3 for cmem_cached_test and cmem_uncached_test, for examples of how to use the CMEM API. When allocating a buffer we need to remember that physical addresses are 36 bits wide instead of 32.
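A 36-bit address does not fit in a single 32-bit word, so it has to travel as a 64-bit value or as two 32-bit halves. Here is a small sketch of the split and the round trip. The helper names and the little-endian low-word-first message layout are assumptions for this example, not something taken from TI's headers.

```python
import struct

ADDR_BITS = 36  # width of a physical address on this device


def split_address(phys_addr: int):
    """Return (high, low) 32-bit words of a 36-bit physical address."""
    if not 0 <= phys_addr < (1 << ADDR_BITS):
        raise ValueError("address does not fit in 36 bits")
    return phys_addr >> 32, phys_addr & 0xFFFFFFFF


def join_address(high: int, low: int) -> int:
    """Rebuild the full address from its two 32-bit words."""
    return (high << 32) | low


def pack_address(phys_addr: int) -> bytes:
    """Pack as two little-endian 32-bit words, low word first (assumed layout)."""
    high, low = split_address(phys_addr)
    return struct.pack("<II", low, high)


addr = 0x822000010  # example value, needs more than 32 bits
print([hex(word) for word in split_address(addr)])
```

Storing such an address in a plain 32-bit field silently drops the top four bits, which is the mistake to watch out for when passing pointers to the DSP.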
The MessageQ exchange only needs to happen once, because the DSP then knows where the memory is. However, the ARM doesn't know when the result from the Turbo Decoder is ready. Sending another MessageQ message would slow things down (20us), so interrupts are the way to go. The TCP3d generates an interrupt to the DSP that tells it when the data is ready. Perhaps we can use this to interrupt the ARM so that we can get the result faster!
It may be possible to use the request_irq function in the Linux kernel, since we can see the interrupt in /proc/interrupts, but that requires writing a kernel module and spending time in kernel space. It is much easier to use a busy-wait algorithm for a quick test in the meantime. We noted that the ARM doesn't notice when the DSP writes to the address, possibly because the ARM is caching the memory. We need to figure out how to keep the ARM from caching the memory region we choose.
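The busy-wait idea can be sketched as a loop that polls a "done" flag in the shared buffer until it changes or a timeout expires. The version below uses an ordinary Python buffer as a stand-in for the shared memory; the function name and the convention that a non-zero byte means "ready" are assumptions. It does not address the caching problem, which on the real hardware decides whether the ARM ever sees the write.

```python
import time


def wait_for_flag(buf, index: int = 0, timeout_s: float = 1.0) -> bool:
    """Spin until buf[index] becomes non-zero or the timeout expires.

    Returns True if the flag was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if buf[index] != 0:              # flag set by the other side
            return True
        if time.monotonic() >= deadline:
            return False                 # give up instead of spinning forever


# Stand-in for the shared region: one flag byte, initially clear.
shared = bytearray(1)
print(wait_for_flag(shared, timeout_s=0.01))   # nobody sets the flag here
shared[0] = 1
print(wait_for_flag(shared, timeout_s=0.01))   # flag already set
```

The timeout matters for testing: without it, a stale cached copy of the flag would hang the program rather than report a failure.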
GNU Radio Buffers with Zero Copy
Details on this can be found on the runtime page
The goal here is to avoid copying data from user space to kernel space, which is typically done to separate the user from the device, both to make programming easier and for security reasons. Direct IO allows us to take large amounts of data and operate on them from user space. The disadvantage is the time it takes to set up the direct IO, which involves faulting in and pinning the pages. The advantage is that we can tell the accelerators where the data from GNU Radio is without the need for an extra copy (time) into a kernel buffer. There are two methods that we will explore: contiguous buffers and scatter-gather lists. The contiguous buffer method is easy on the Keystone2 since there is a module that supports it. The scatter-gather list method is a little more complex and will be explored further down the line once I figure out more about the Linux kernel DMA API.
- Create a block in GNU Radio that prints the page numbers of a GNU Radio buffer using get_user_pages
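Before going into the kernel, it helps to see how many pages a buffer touches. The sketch below computes the virtual page numbers spanned by a user-space buffer from its address and length. This is only the user-space arithmetic, with function names of my own choosing; get_user_pages itself runs in the kernel and returns the pinned physical pages, which this example does not do.

```python
import ctypes
import mmap

PAGE_SIZE = mmap.PAGESIZE  # page size of the machine running this sketch


def pages_spanned(addr: int, length: int, page_size: int = PAGE_SIZE):
    """List the virtual page numbers covered by [addr, addr + length)."""
    if length <= 0:
        return []
    first = addr // page_size
    last = (addr + length - 1) // page_size   # page holding the final byte
    return list(range(first, last + 1))


# Stand-in for a GNU Radio buffer: three pages' worth of bytes.
buf = ctypes.create_string_buffer(3 * PAGE_SIZE)
pages = pages_spanned(ctypes.addressof(buf), ctypes.sizeof(buf))
print(len(pages), "pages, starting at page", pages[0])
```

A buffer that does not start on a page boundary touches one more page than its size suggests, which is worth remembering when sizing the page array passed to get_user_pages.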
Work in Progress
- An example of how to make loadable kernel modules and an example of get_user_pages are now available on GitHub with the proper instructions. The next step before proceeding further is to pass the write buffer to the block constructor, which is an animal on its own.
In order to make efficient use of co-processors in GNU Radio, we want to be able to perform direct IO as explained above with the GNU Radio write buffers. This requires us to modify a couple of files in gnuradio-runtime so that we can pass the write buffer pointer to the block's constructors. The goal is to then create co-processor blocks that perform get_user_pages on the write buffers so that we can perform direct IO to and from the userspace pages without an extra copy to the kernel. This technique should be portable to the majority of co-processors.
We call these buffers that are to be modified outside of the circbuf factory in-place buffers. Essentially they are history=1, relative_rate=1, single input, and single output.
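The in-place conditions are simple enough to write down as a check. The sketch below uses a made-up BlockSpec record rather than the real gnuradio-runtime classes, so the field names are assumptions; it only encodes the four conditions listed above.

```python
from dataclasses import dataclass


@dataclass
class BlockSpec:
    """Hypothetical summary of a block's buffer-related properties."""
    history: int
    relative_rate: float
    num_inputs: int
    num_outputs: int


def qualifies_for_in_place(spec: BlockSpec) -> bool:
    """True if the block meets all four in-place buffer conditions."""
    return (
        spec.history == 1            # no look-back into earlier samples
        and spec.relative_rate == 1  # one output item per input item
        and spec.num_inputs == 1
        and spec.num_outputs == 1
    )


print(qualifies_for_in_place(BlockSpec(1, 1.0, 1, 1)))   # simple sync block
print(qualifies_for_in_place(BlockSpec(32, 1.0, 1, 1)))  # FIR-like, has history
```

A filter with history or a decimator changes how input and output items line up, which is why those blocks fall outside this definition.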
See work on dissecting GNU Radio Runtime