GRCon13Coprocessor

= GRCon13 CoProc WG =

AKA: Putting Blocks Somewhere Besides the General Purpose Processor on Which GNU Radio is Running


 * Diverse hardware platforms, each with unique attributes and challenges
 * Not practical to make GR a replacement for existing development tools (Xilinx ISE, TI Code Composer, etc.)
 * Dynamically scheduling what to run, where, and when is hard
 * Goal: enable hardware accelerator users, developers, and researchers to adopt GR as a framework for applications


 * Moving data
 * Creating buffers in desired memory region
 * Facilitating command/control and parameter loading
 * Permitting “chains of operations” and “superblocks”
 * Allowing configuration of the accelerated portion at start-up (or not)
 * Need a unified accelerator API
 * Wrap the necessary parts of the driver interface
 * Present the desired functional interface to the flowgraph
 * Provide accelerator developers an easy, effective, and efficient way to use GR

Initial Goals


 * C++ Class API for GR buffer interface
   * Allow for multiple types of buffer allocation and usage, all of which must provide the same data guarantees to the scheduler
   * Buffer types: VM circular; non-circular; non-host-based via DMA (circular or not); others
   * Specifics defined by the actual interface, inherited from a parent class
   * Move current GR buffers to use this, or make this use the generic GR buffer interface if that is already in place
   * Arbitrary size, depending on the usage and needs of the block, but defaulting to a specific value for each buffer type
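
The buffer goals above can be sketched roughly as follows. This is a minimal illustration only, not GNU Radio's actual buffer API: the names `buffer_base`, `host_circular_buffer`, `host_linear_buffer`, and the `default_size` values are all hypothetical, and plain host memory stands in for VM-mapped or DMA-backed storage. The point is that each buffer type inherits one parent interface, gives the same FIFO guarantee, and carries its own default size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parent class: every buffer type offers the same contract
// (FIFO order, unread items are never overwritten).
class buffer_base {
public:
    virtual ~buffer_base() = default;
    virtual std::size_t capacity() const = 0;
    virtual std::size_t items_available() const = 0;  // readable now
    virtual std::size_t space_available() const = 0;  // writable now
    virtual std::size_t write(const float* src, std::size_t n) = 0;
    virtual std::size_t read(float* dst, std::size_t n) = 0;
};

// Circular buffer in host memory (stand-in for a VM circular buffer).
class host_circular_buffer : public buffer_base {
public:
    static constexpr std::size_t default_size = 4096;  // assumed default
    explicit host_circular_buffer(std::size_t n = default_size)
        : d_data(n), d_read(0), d_count(0) { assert(n > 0); }
    std::size_t capacity() const override { return d_data.size(); }
    std::size_t items_available() const override { return d_count; }
    std::size_t space_available() const override { return d_data.size() - d_count; }
    std::size_t write(const float* src, std::size_t n) override {
        n = std::min(n, space_available());
        for (std::size_t i = 0; i < n; ++i)
            d_data[(d_read + d_count + i) % d_data.size()] = src[i];
        d_count += n;
        return n;  // number of items actually accepted
    }
    std::size_t read(float* dst, std::size_t n) override {
        n = std::min(n, d_count);
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = d_data[(d_read + i) % d_data.size()];
        d_read = (d_read + n) % d_data.size();
        d_count -= n;
        return n;  // number of items actually delivered
    }
private:
    std::vector<float> d_data;
    std::size_t d_read, d_count;
};

// Non-circular buffer: fills front to back, rewinds once fully drained.
class host_linear_buffer : public buffer_base {
public:
    static constexpr std::size_t default_size = 1024;  // assumed default
    explicit host_linear_buffer(std::size_t n = default_size)
        : d_data(n), d_read(0), d_write(0) { assert(n > 0); }
    std::size_t capacity() const override { return d_data.size(); }
    std::size_t items_available() const override { return d_write - d_read; }
    std::size_t space_available() const override { return d_data.size() - d_write; }
    std::size_t write(const float* src, std::size_t n) override {
        n = std::min(n, space_available());
        std::copy(src, src + n, d_data.data() + d_write);
        d_write += n;
        return n;
    }
    std::size_t read(float* dst, std::size_t n) override {
        n = std::min(n, items_available());
        std::copy(d_data.data() + d_read, d_data.data() + d_read + n, dst);
        d_read += n;
        if (d_read == d_write) d_read = d_write = 0;  // drained: rewind
        return n;
    }
private:
    std::vector<float> d_data;
    std::size_t d_read, d_write;
};
```

Code that only sees a `buffer_base&` cannot tell the two types apart, which is the property the scheduler would rely on. A DMA-backed type would presumably override the same five methods.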


 * C++ Class API for coprocessor interface
   * Support a means of creating buffers for data transport between a specific coprocessor and main CPU memory (via the new buffer API)
   * Separate data transport from kernel execution where possible, to minimize latency to coprocessor work and maximize data throughput while processing on the coprocessor
   * Support a means of executing a single kernel on the coprocessor
   * No support for multiple-kernel scheduling yet; multiple kernels are combined into a single kernel initially
   * Single-threaded; asynchronous / non-blocking (use internal state to keep tabs on processing)
   * Workflow: push data to coprocessor, execute kernel, pull data from coprocessor
   * Ideally, data push and pull can be made asynchronous to kernel execution
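
The push / execute / pull workflow with non-blocking, state-tracked progress might look something like the sketch below. Everything here is an assumption made for illustration: `coprocessor_base`, `simulated_coprocessor`, `coproc_state`, and `run_once` are hypothetical names, and the "device" is a host-side vector running a trivial scale-by-two kernel. A real implementation would wrap a vendor driver instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Internal state used to keep tabs on processing without blocking.
enum class coproc_state { idle, pushing, executing, pulling, done };

// Hypothetical coprocessor interface: single kernel, single thread.
class coprocessor_base {
public:
    virtual ~coprocessor_base() = default;
    // Begin a push/execute/pull cycle; returns false if one is in flight.
    virtual bool submit(const std::vector<float>& input) = 0;
    // Advance by at most one stage and return immediately.
    virtual coproc_state poll() = 0;
    // Collect results once the cycle is done; returns false otherwise.
    virtual bool fetch(std::vector<float>& output) = 0;
};

// Simulated device: each poll() completes exactly one stage.
class simulated_coprocessor : public coprocessor_base {
public:
    bool submit(const std::vector<float>& input) override {
        if (d_state != coproc_state::idle) return false;
        d_host_in = input;
        d_state = coproc_state::pushing;
        return true;
    }
    coproc_state poll() override {
        switch (d_state) {
        case coproc_state::pushing:    // data transport: host -> device
            d_device = d_host_in;
            d_state = coproc_state::executing;
            break;
        case coproc_state::executing:  // the single kernel: scale by two
            for (float& x : d_device) x *= 2.0f;
            d_state = coproc_state::pulling;
            break;
        case coproc_state::pulling:    // data transport: device -> host
            d_host_out = d_device;
            d_state = coproc_state::done;
            break;
        default:                       // idle or done: nothing to advance
            break;
        }
        return d_state;
    }
    bool fetch(std::vector<float>& output) override {
        if (d_state != coproc_state::done) return false;
        output = d_host_out;
        d_state = coproc_state::idle;
        return true;
    }
private:
    coproc_state d_state = coproc_state::idle;
    std::vector<float> d_host_in, d_device, d_host_out;
};

// Example driver. It spins on poll() for brevity; a block would more likely
// return to the scheduler between polls rather than loop.
std::vector<float> run_once(coprocessor_base& cp, const std::vector<float>& in) {
    bool accepted = cp.submit(in);
    assert(accepted && "coprocessor busy");
    (void)accepted;
    while (cp.poll() != coproc_state::done) {
        // the host thread is free to do other work here
    }
    std::vector<float> out;
    cp.fetch(out);
    return out;
}
```

Keeping transport and execution as separate stages is what would later allow the push of one batch to overlap the kernel run of the previous one. This sketch processes one batch at a time and does not attempt that overlap.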

Future Goals


 * Allow one kernel per block/thread, with multi-kernel control via the current host CPU-based scheduler, while keeping data on the coprocessor between relevant blocks
 * Dynamic block allocation on host CPU or coprocessor at flowgraph start time
 * Dynamic selection of where block work runs (host CPU or coprocessor) during runtime
 * Support a means of creating buffers for data transport directly between coprocessors, to avoid having to return data to the host CPU
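
To illustrate why keeping data on the coprocessor between blocks matters, the sketch below counts host/coprocessor transfers for a linear chain of blocks under two models. It is a toy cost model under stated assumptions: one coprocessor, a source and sink on the host, and hypothetical names (`location`, `count_transfers_resident`, `count_transfers_round_trip`) that do not come from GNU Radio.

```cpp
#include <cstddef>
#include <vector>

// Where a block's work runs (assumes a single coprocessor).
enum class location { host, coproc };

// Transfers needed when data stays resident on the coprocessor between
// adjacent coprocessor blocks. Data starts and ends on the host.
std::size_t count_transfers_resident(const std::vector<location>& chain) {
    std::size_t transfers = 0;
    location prev = location::host;
    for (location loc : chain) {
        if (loc != prev) ++transfers;  // crossing the host/device boundary
        prev = loc;
    }
    if (prev != location::host) ++transfers;  // final pull back to the host
    return transfers;
}

// Transfers needed when every coprocessor block pushes its input and
// pulls its output, returning data to the host each time.
std::size_t count_transfers_round_trip(const std::vector<location>& chain) {
    std::size_t transfers = 0;
    for (location loc : chain)
        if (loc == location::coproc) transfers += 2;  // one push, one pull
    return transfers;
}
```

For a chain of host, coproc, coproc, coproc, host, the resident model needs 2 transfers while the round-trip model needs 6. Placement decisions made at flowgraph start or during runtime could use a cost of this kind, although a realistic model would also weigh transfer size and kernel time.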