GRCon13Coprocessor
GRCon13 CoProc WG
AKA: Putting Blocks Somewhere Besides the General Purpose Processor on Which GNU Radio is Running
- Diverse hardware platforms each with unique attributes and challenges
- Not practical to make GR a replacement for existing development tools (Xilinx ISE, TI Code Composer, etc.)
- Dynamically scheduling what runs where, and when, is hard
- Goal: enable hardware accelerator users, developers, and researchers to adopt GR as a framework for applications
- Moving data
- Creating buffers in desired memory region
- Facilitating command/control and parameter loading
- Permitting “chains of operations” and “superblocks”
- Allowing configuration of the accelerated portion at start-up (or not)
- Need a unified accelerator API
- Wrap the necessary parts of the driver interface
- Present the desired functional interface to the flowgraph
- Provide accelerator developers an easy, effective, and efficient way to use GR
Initial Goals
- C++ Class API for GR buffer interface
- Allow for multiple types of buffer allocation and usage, all of which must provide the same data guarantees to the scheduler
- VM circular; non-circular; non-host-based via DMA (circular or not); others
- Specifics defined by actual interface, inherited from parent class
- Move current GR buffers onto this API, or build this API on the generic GR buffer interface if one is already in place
- Arbitrary size, depending on the usage and needs of the block, but defaulting to a specific value for each buffer type
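The buffer goals above can be sketched as a small class hierarchy. This is only an illustration under stated assumptions: `buffer_base`, `circular_buffer`, `linear_buffer`, `round_trip`, and the default sizes are hypothetical names and values, not part of GNU Radio. Both subclasses live in host memory; a DMA-backed buffer would implement the same parent interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical parent class; these names are illustrative, not GNU Radio API.
// Contract every subclass must honor: bytes come out in the order they went
// in, and write()/read() never transfer more than the space/data available.
class buffer_base {
public:
    virtual ~buffer_base() = default;
    virtual std::size_t capacity() const = 0;
    virtual std::size_t readable() const = 0;
    std::size_t writable() const { return capacity() - readable(); }
    virtual std::size_t write(const void* src, std::size_t n) = 0;
    virtual std::size_t read(void* dst, std::size_t n) = 0;
};

// Circular buffer in host memory using index wrap-around (a VM-mapped
// circular buffer would avoid the split copies below).
class circular_buffer : public buffer_base {
public:
    explicit circular_buffer(std::size_t size = 4096) // assumed per-type default
        : d_data(size), d_head(0), d_count(0) { assert(size > 0); }
    std::size_t capacity() const override { return d_data.size(); }
    std::size_t readable() const override { return d_count; }
    std::size_t write(const void* src, std::size_t n) override {
        n = std::min(n, writable());
        const auto* in = static_cast<const unsigned char*>(src);
        const std::size_t tail = (d_head + d_count) % d_data.size();
        const std::size_t first = std::min(n, d_data.size() - tail);
        std::memcpy(d_data.data() + tail, in, first);
        std::memcpy(d_data.data(), in + first, n - first); // wrapped part
        d_count += n;
        return n;
    }
    std::size_t read(void* dst, std::size_t n) override {
        n = std::min(n, d_count);
        auto* out = static_cast<unsigned char*>(dst);
        const std::size_t first = std::min(n, d_data.size() - d_head);
        std::memcpy(out, d_data.data() + d_head, first);
        std::memcpy(out + first, d_data.data(), n - first); // wrapped part
        d_head = (d_head + n) % d_data.size();
        d_count -= n;
        return n;
    }
private:
    std::vector<unsigned char> d_data;
    std::size_t d_head, d_count;
};

// Non-circular buffer: never wraps; unread bytes are compacted to the front
// before each write.
class linear_buffer : public buffer_base {
public:
    explicit linear_buffer(std::size_t size = 8192) // assumed per-type default
        : d_data(size), d_begin(0), d_end(0) { assert(size > 0); }
    std::size_t capacity() const override { return d_data.size(); }
    std::size_t readable() const override { return d_end - d_begin; }
    std::size_t write(const void* src, std::size_t n) override {
        const std::size_t live = readable();
        std::memmove(d_data.data(), d_data.data() + d_begin, live);
        d_begin = 0;
        d_end = live;
        n = std::min(n, d_data.size() - d_end);
        std::memcpy(d_data.data() + d_end, src, n);
        d_end += n;
        return n;
    }
    std::size_t read(void* dst, std::size_t n) override {
        n = std::min(n, readable());
        std::memcpy(dst, d_data.data() + d_begin, n);
        d_begin += n;
        return n;
    }
private:
    std::vector<unsigned char> d_data;
    std::size_t d_begin, d_end;
};

// Push a message through any buffer type and return what comes back out.
std::string round_trip(buffer_base& buf, const std::string& msg) {
    const std::size_t written = buf.write(msg.data(), msg.size());
    std::string out(written, '\0');
    buf.read(&out[0], written);
    return out;
}
```

Because `round_trip` only sees `buffer_base`, a scheduler written the same way would not need to know which allocation strategy sits behind a given buffer.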
- C++ Class API for coprocessor interface
- Support a means of creating buffers for data transport between a specific coprocessor and main CPU memory (via the new buffer API)
- Separate data transport from kernel execution where possible, to minimize the latency of coprocessor work and to maximize data throughput during processing on the coprocessor
- Support a means of executing a single kernel on the coprocessor
- No support for multiple-kernel scheduling yet; multiple kernels are combined into a single kernel initially
- Single-threaded; asynchronous and non-blocking (use internal state to keep track of processing)
- Workflow: push data to coprocessor, kernel execution, pull data from coprocessor
- Hopefully data push and pull can be made asynchronous to kernel execution
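The coprocessor goals above (single kernel, single-threaded, non-blocking, push/execute/pull) can be sketched as a polled state machine. Everything here is an assumption made for illustration: `coprocessor`, `simulated_coprocessor`, `kernel`, and `fuse` are hypothetical names, and the "device" is simulated with a host-side vector rather than a real driver.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// All names here are hypothetical; this is not an existing GNU Radio API.
using kernel = std::function<void(std::vector<float>&)>;

// Combine several kernels into one, since only single-kernel execution is
// supported initially.
kernel fuse(std::vector<kernel> stages) {
    return [stages](std::vector<float>& data) {
        for (const auto& stage : stages) stage(data);
    };
}

// Abstract interface: submit() starts a job, poll() advances it without
// blocking, fetch() hands back the result once the job is done.
class coprocessor {
public:
    enum class state { idle, pushing, executing, pulling, done };
    virtual ~coprocessor() = default;
    virtual bool submit(const std::vector<float>& input, kernel k) = 0;
    virtual state poll() = 0;
    virtual bool fetch(std::vector<float>& out) = 0;
};

// Host-side stand-in for a device. Each poll() completes exactly one stage;
// a real driver would instead check a DMA or kernel completion flag here.
class simulated_coprocessor : public coprocessor {
public:
    bool submit(const std::vector<float>& input, kernel k) override {
        if (d_state != state::idle) return false; // one job in flight at a time
        assert(k);
        d_host_in = input;
        d_kernel = std::move(k);
        d_state = state::pushing;
        return true;
    }
    state poll() override {
        switch (d_state) {
        case state::pushing:   // host -> device transfer
            d_device = d_host_in;
            d_state = state::executing;
            break;
        case state::executing: // run the single (possibly fused) kernel
            d_kernel(d_device);
            d_state = state::pulling;
            break;
        case state::pulling:   // device -> host transfer
            d_host_out = d_device;
            d_state = state::done;
            break;
        case state::idle:
        case state::done:
            break;             // nothing to advance
        }
        return d_state;
    }
    bool fetch(std::vector<float>& out) override {
        if (d_state != state::done) return false;
        out = std::move(d_host_out);
        d_host_out.clear();
        d_state = state::idle;
        return true;
    }
private:
    state d_state = state::idle;
    kernel d_kernel;
    std::vector<float> d_host_in, d_device, d_host_out;
};
```

In this sketch a block's work function would call `poll()` once per invocation and produce no output until `fetch()` succeeds, which keeps the calling thread from blocking. Making the push and pull stages overlap with kernel execution would need separate state per stage, which is left out here.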
Future Goals
- Allow kernel-per-block/thread, multi-kernel control via the current host CPU-based scheduler, while keeping data stored on the coprocessor between relevant blocks
- Dynamic block allocation on host CPU or coprocessor at flow graph start time
- Dynamic block work location selection on host CPU or coprocessor during runtime
- Support a means of creating buffers for data transport directly between specific coprocessors, to avoid having to return data to the host CPU
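As a rough illustration of start-time placement and of why keeping data on the coprocessor between blocks matters, the sketch below assigns each block in a linear chain to the host or the coprocessor and counts the resulting host/coprocessor transfers. The types and functions (`placement`, `block_desc`, `place_blocks`, `count_transfers`) are hypothetical, and the policy (use the coprocessor whenever a block has a kernel for it) is an assumption, not something the notes specify.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration only.
enum class placement { host, coprocessor };

struct block_desc {
    std::string name;
    bool has_coproc_impl; // true if the block ships a coprocessor kernel
};

// Decide, once at flow graph start, where each block in a linear chain runs.
// Assumed policy: prefer the coprocessor whenever it is present and the
// block has an implementation for it.
std::vector<placement> place_blocks(const std::vector<block_desc>& chain,
                                    bool coproc_present) {
    std::vector<placement> result;
    result.reserve(chain.size());
    for (const auto& b : chain) {
        const bool offload = coproc_present && b.has_coproc_impl;
        result.push_back(offload ? placement::coprocessor : placement::host);
    }
    assert(result.size() == chain.size());
    return result;
}

// Count host<->coprocessor transfers for a chain, assuming data starts and
// ends on the host and stays in coprocessor memory between adjacent
// coprocessor blocks.
std::size_t count_transfers(const std::vector<placement>& chain) {
    std::size_t transfers = 0;
    placement where = placement::host; // data starts on the host
    for (placement p : chain) {
        if (p != where) {
            ++transfers;
            where = p;
        }
    }
    if (where != placement::host) ++transfers; // results return to the host
    return transfers;
}
```

For a chain of source, two adjacent coprocessor-capable blocks, and sink, this counts two transfers; pushing and pulling around each accelerated block separately would cost four.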