GRCon13Coprocessor


GRCon13 CoProc WG

AKA: Putting Blocks Somewhere Besides the General Purpose Processor on Which GNU Radio is Running

  • Diverse hardware platforms, each with unique attributes and challenges
  • Not practical to make GR a replacement for existing development tools (Xilinx ISE, TI Code Composer, etc.)
  • Dynamically scheduling when to do what, and where, is hard
  • Goal: enable hardware accelerator users, developers, and researchers to adopt GR as a framework for applications
  • Moving data
    • Creating buffers in desired memory region
    • Facilitating command/control and parameter loading
  • Permit “chains of operations” and “superblocks”
    • Allows configuration of accelerated portion at start-up (or not)
  • Need a unified accelerator API
    • Wrap the necessary parts of the driver interface
    • Present the desired functional interface to the flowgraph
    • Provide accelerator developers an easy, effective, and efficient way to use GR
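
As a rough illustration of the "wrap the driver, present the flowgraph interface" idea above, here is a minimal C++ sketch. The accel_* driver calls are invented stand-ins for whatever vendor SDK (FPGA, DSP, GPU) sits underneath; only the gr::sync_block scaffolding is real GNU Radio API.

 #include <gnuradio/io_signature.h>
 #include <gnuradio/sync_block.h>
 #include <cstddef>
 #include <vector>
 
 // ---- placeholder driver API: hypothetical, stands in for a vendor SDK ----
 typedef int accel_handle_t;
 static accel_handle_t accel_open() { return 0; }
 static void accel_close(accel_handle_t) {}
 static void accel_load_taps(accel_handle_t, const float*, size_t) {}
 static void accel_run(accel_handle_t, const void*, void*, int) {}
 
 // A block that hides the accelerator behind the normal GR block interface:
 // the flowgraph connects to it like any other block.
 class accel_fir : public gr::sync_block
 {
 public:
     accel_fir(const std::vector<float>& taps)
         : gr::sync_block("accel_fir",
                          gr::io_signature::make(1, 1, sizeof(float)),
                          gr::io_signature::make(1, 1, sizeof(float)))
     {
         d_dev = accel_open();                       // wrap driver setup
         accel_load_taps(d_dev, taps.data(), taps.size());
     }
 
     ~accel_fir() override { accel_close(d_dev); }   // wrap driver teardown
 
     int work(int noutput_items,
              gr_vector_const_void_star& input_items,
              gr_vector_void_star& output_items) override
     {
         // Offload the actual signal processing to the accelerator; the
         // flowgraph only ever sees an ordinary block.
         accel_run(d_dev, input_items[0], output_items[0], noutput_items);
         return noutput_items;
     }
 
 private:
     accel_handle_t d_dev;
 };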

Initial Goals

  • C++ Class API for GR buffer interface (see the combined sketch after this list)
    • Allow for multiple types of buffer allocation and usage, each of which must provide the same data guarantees to the scheduler
      • VM Circular; non-circular; non-host based via DMA (circular or not); others
      • Specifics defined by actual interface, inherited from parent class
    • Migrate the current GR buffers to this interface, or build this on top of a generic GR buffer interface if one is already in place
    • Arbitrary size, depending on the usage and needs of the block, but defaulting to a specific value per buffer type
  • C++ Class API for coprocessor interface (also covered in the sketch below)
    • Supports creating buffers for data transport between a specific coprocessor and main CPU memory (via the new buffer API)
    • Separate data transport from kernel execution if/where possible, to minimize the latency of starting coprocessor work and to maximize data throughput while processing runs on the coprocessor
    • Supports executing a single kernel on the coprocessor
    • No support for multiple-kernel scheduling yet; multiple kernels are initially combined into a single kernel
    • Single threaded; asynchronous / no blocking (use internal state to keep tabs on processing)
    • Work flow: push data to coprocessor, kernel execution, pull data from coprocessor
    • Hopefully data push and pull can be made asynchronous to kernel execution
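
A minimal sketch of how the two class APIs above might fit together. The names accel_buffer and coprocessor, and all of their methods, are invented here for illustration; nothing below is existing GNU Radio API.

 #include <cstddef>
 #include <memory>
 
 // Base buffer interface: every allocation strategy (VM circular, plain
 // non-circular, device-side DMA, ...) must honor the same guarantees the
 // scheduler relies on; the specifics live in derived classes.
 class accel_buffer
 {
 public:
     virtual ~accel_buffer() = default;
     virtual void* write_ptr() = 0;                  // producer's write pointer
     virtual const void* read_ptr() const = 0;       // consumer's read pointer
     virtual size_t size() const = 0;                // capacity in bytes
     virtual bool is_circular() const = 0;           // VM-circular vs. linear
     virtual bool on_device() const = 0;             // host vs. coprocessor memory
 };
 
 // Coprocessor interface: allocates buffers the device can reach, and runs a
 // single kernel at a time. Single threaded and non-blocking: callers poll
 // kernel_done() instead of waiting on the device.
 class coprocessor
 {
 public:
     virtual ~coprocessor() = default;
 
     // Create a buffer in a memory region usable for transfers between this
     // coprocessor and main CPU memory (via the buffer API above).
     virtual std::shared_ptr<accel_buffer> make_buffer(size_t nbytes,
                                                       bool circular) = 0;
 
     // Work flow from the notes: push data to the coprocessor, execute the
     // kernel, pull results back. Transport is kept separate from execution
     // so pushes/pulls can later overlap with kernel runs.
     virtual void push(const accel_buffer& host_src, accel_buffer& dev_dst) = 0;
     virtual void execute_kernel() = 0;              // start the single kernel
     virtual bool kernel_done() const = 0;           // internal state, no blocking
     virtual void pull(const accel_buffer& dev_src, accel_buffer& host_dst) = 0;
 };

There is deliberately no multi-kernel scheduling in this sketch: per the notes, multiple kernels are first fused into one, and keeping push/execute/pull as separate calls leaves room to make the transfers asynchronous to kernel execution later.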

Future Goals

  • Allow kernel-per-block/thread and multi-kernel control via the current host-CPU-based scheduler, while keeping data resident on the coprocessor between relevant blocks
  • Dynamic block allocation on host CPU or coprocessor at flow graph start time
  • Dynamic block work location selection on host CPU or coprocessor during runtime
  • Support creating buffers for data transport directly between any two coprocessors, to avoid having to return data to the host CPU
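
One possible shape for that last goal, reusing the invented accel_buffer and coprocessor types from the sketch above (again, hypothetical, not real GNU Radio API):

 // Allocate one buffer visible to both devices, so block A's output on
 // `producer` can feed block B on `consumer` without a host round trip.
 std::shared_ptr<accel_buffer> make_peer_buffer(coprocessor& producer,
                                                coprocessor& consumer,
                                                size_t nbytes);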