GNU Radio 4.0 Summary of Proposed Features

High Level Design Goals
GNU Radio 4.0 seeks to make major changes to the core GNU Radio code in order to achieve the following goals:
 * Modular Runtime Components
 * Improved Support of Heterogeneous Architectures
 * Support for Distributed Architectures

In addition, there are many things we are able to improve "while we're at it" that aren't related to performance, but instead improve the developer and user experience. These include:


 * Separating the Block API from the runtime
 * YAML-based block design methodology []

Modularity
GNU Radio 3.x uses a fixed runtime that is intended to support operation on GPP-only platforms. The scheduler, which uses one thread per block (TPB), has been generally effective, but is not suitable for all applications. Rather than solve the problem for every potential user, GR 4.0 will provide a modular architecture for the major runtime components so that application-specific versions can be used when appropriate.

The currently proposed modular components are:
 * Scheduler
 * Runtime
 * Custom Buffers
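
As an illustration of what a swappable scheduler could look like, here is a minimal Python sketch. All class and method names (Block, Scheduler, SingleThreadScheduler, ThreadPerBlockScheduler) are hypothetical and are not the GR 4.0 API. The sketch only shows the idea of one flowgraph running unchanged under two interchangeable scheduler implementations.

```python
import queue
import threading
from abc import ABC, abstractmethod

_DONE = object()  # sentinel that marks the end of the stream


class Block:
    """Hypothetical stand-in for a block: work() transforms one item."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def work(self, item):
        return self.func(item)


class Scheduler(ABC):
    """Hypothetical interface that a modular runtime could swap out."""

    @abstractmethod
    def run(self, blocks, items):
        """Push every item through the chain of blocks; return the outputs."""


class SingleThreadScheduler(Scheduler):
    """Runs the whole chain in the calling thread."""

    def run(self, blocks, items):
        out = []
        for item in items:
            for block in blocks:
                item = block.work(item)
            out.append(item)
        return out


class ThreadPerBlockScheduler(Scheduler):
    """Runs each block in its own thread, connected by FIFO queues."""

    def run(self, blocks, items):
        # One queue per edge, including the source and sink edges.
        edges = [queue.Queue() for _ in range(len(blocks) + 1)]

        def worker(block, q_in, q_out):
            while True:
                item = q_in.get()
                if item is _DONE:
                    q_out.put(_DONE)  # forward end-of-stream downstream
                    return
                q_out.put(block.work(item))

        threads = [
            threading.Thread(target=worker, args=(b, edges[i], edges[i + 1]))
            for i, b in enumerate(blocks)
        ]
        for t in threads:
            t.start()
        for item in items:
            edges[0].put(item)
        edges[0].put(_DONE)

        out = []
        while True:
            item = edges[-1].get()
            if item is _DONE:
                break
            out.append(item)
        for t in threads:
            t.join()
        return out


chain = [Block("scale", lambda x: 2 * x), Block("offset", lambda x: x + 1)]
for scheduler in (SingleThreadScheduler(), ThreadPerBlockScheduler()):
    print(type(scheduler).__name__, scheduler.run(chain, [1, 2, 3]))
```

Both schedulers produce the same output for the same chain; only the execution strategy differs, which is the property a modular scheduler interface is meant to preserve.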

Heterogeneous Architectures
GNU Radio 3.10 introduced a Custom Buffers feature for streamlined data movement to and from hardware accelerators. GR 4.0 seeks to extend this capability by dropping the constraints of the GR 3.x API, which allows more flexible custom buffers to be specified instead of locking the buffer type to the block. For instance, a block might have a CUDA implementation that assumes the data passed to the work method is already in GPU memory. Depending on the platform, that data might be handled most effectively in device memory, pinned memory, or managed memory. By separating the buffer abstraction from the block, one block implementation can be used on different platforms.
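
The separation of buffer and block can be sketched in a few lines of Python. Everything here is hypothetical and simulated: the buffer classes only carry a label and hold plain lists, no GPU is involved, and the platform names in the lookup table are invented for illustration. The point is that the work function is written once against a generic read/write interface, and the buffer type is chosen outside the block.

```python
class HostBuffer:
    """Hypothetical buffer in ordinary host memory (simulated with a list)."""

    location = "host"

    def __init__(self):
        self._data = []

    def write(self, samples):
        self._data.extend(samples)

    def read(self):
        # Hand out everything written so far and empty the buffer.
        data, self._data = self._data, []
        return data


class DeviceBuffer(HostBuffer):
    """Simulated stand-in for a buffer living in device memory."""

    location = "device"


class PinnedBuffer(HostBuffer):
    """Simulated stand-in for a buffer in pinned host memory."""

    location = "pinned"


class ManagedBuffer(HostBuffer):
    """Simulated stand-in for a buffer in managed (unified) memory."""

    location = "managed"


# Assumed platform-to-buffer mapping; the platform names are made up.
BUFFER_FOR_PLATFORM = {
    "cpu_only": HostBuffer,
    "discrete_gpu": PinnedBuffer,
    "gpu_resident": DeviceBuffer,
    "integrated_gpu": ManagedBuffer,
}


def multiply_const_work(in_buf, out_buf, k):
    """One block implementation; it never asks where the memory lives."""
    out_buf.write([k * x for x in in_buf.read()])


def run_on(platform, samples, k):
    """The runtime, not the block, picks the buffer type for the platform."""
    buffer_type = BUFFER_FOR_PLATFORM[platform]
    in_buf, out_buf = buffer_type(), buffer_type()
    in_buf.write(samples)
    multiply_const_work(in_buf, out_buf, k)
    return out_buf.location, out_buf.read()


for platform in BUFFER_FOR_PLATFORM:
    print(platform, run_on(platform, [1, 2, 3], 2))
```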

Scheduler and runtime modularity is also intended to be useful for heterogeneous architectures. For instance, consider a multi-GPU server. The current CPU scheduler with GPU custom buffers can handle a single GPU effectively, but probably cannot make full use of multiple GPUs without a custom scheduling component.

Distributed Architectures
Sometimes it is useful to run a flowgraph across multiple host processors. One example could be a distributed DSP problem where channels of filtered data are sent to different machines for computationally intensive signal processing. This can currently be done manually in GR 3.x by using ZMQ or Networking blocks and setting up orchestration scripts to control the flow between flowgraphs running on different machines.

The goal for 4.0 is to integrate this behavior through a modular runtime that can automatically handle the serialization and configuration of graph edges that cross host boundaries.

There are a few main components to this feature:
 * 1) Serialization of stream and message data
 * 2) RPC control of the runtime
 * 3) Custom Runtime to be able to integrate things like Kubernetes

Streamlined Developer Experience
See [] for more details

The goal of this feature is to make the process of creating and maintaining blocks less painful by:
 * Getting rid of boilerplate through code generation
 * Organizing the code files in one folder
 * Getting as much "for free" as possible when making a block

For instance, as of GR3.10, if you want to add a parameter to the constructor of a block, you have to
 * Add it to the public header
 * Update the impl header
 * Update the impl.cc file
 * Update the grc
 * Update the python bindings
 * Update the documentation (either wiki or doxygen or both)

This is a lot of effort for a minimal change, so the idea here is to have a top-level .yml file that drives the generation of all the boilerplate. As a developer, all you should generally need to worry about is the work function.
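
The generation step can be sketched as follows. This is a minimal Python illustration under stated assumptions: the dictionary stands in for the parsed contents of a block's .yml file, and its keys, the generated C++ shape, and the function names are all hypothetical rather than the actual GR 4.0 schema or generator.

```python
# Stand-in for a parsed block .yml file; the keys are assumptions.
BLOCK_SPEC = {
    "module": "math",
    "block": "multiply_const",
    "parameters": [
        {"name": "k", "type": "float", "doc": "Constant to multiply by"},
        {"name": "vlen", "type": "size_t", "doc": "Vector length"},
    ],
}


def generate_header(spec):
    """Emit a (hypothetical) public C++ declaration for the block."""
    name = spec["block"]
    args = ", ".join(f"{p['type']} {p['name']}" for p in spec["parameters"])
    docs = [f" * @param {p['name']} {p['doc']}" for p in spec["parameters"]]
    lines = [
        "/**",
        *docs,
        " */",
        f"class {name} {{",
        "public:",
        f"    static std::shared_ptr<{name}> make({args});",
        "};",
    ]
    return "\n".join(lines)


def generate_docs(spec):
    """Emit a plain-text documentation stub from the same description."""
    names = ", ".join(p["name"] for p in spec["parameters"])
    lines = [f"{spec['block']}({names})"]
    lines += [f" * {p['name']}: {p['doc']}" for p in spec["parameters"]]
    return "\n".join(lines)


print(generate_header(BLOCK_SPEC))
print(generate_docs(BLOCK_SPEC))
```

Adding a constructor parameter then means adding one entry to the parameter list; the header and the documentation both pick it up on the next generation run.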

Improved PMT library
A new PMT effort is underway (led by John Sallay) that seeks to modernize the PMT API and improve its performance []. This will have many benefits, including faster processing of message/PDU-based flowgraphs.

GR 4.0 will not use the legacy PMT API, but will use the new "PMTF" library.

meson/ninja
Meson is a powerful and user-friendly build system that uses a Python-like syntax.

Originally intended as a placeholder build system (replacing CMake) because it is easier to get things up and running quickly, it has turned out to be quite powerful and less mind-boggling. We should consider sticking with it.

yaml-cpp
Use YAML for preferences and for the configuration of plugin components that expose a public factory method.
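
The factory idea can be sketched in Python as below. The registry, the component id, the option name, and the config layout are all hypothetical. To keep the sketch free of third-party dependencies, the configuration text is written as JSON and parsed with the standard library; in the real system this role would be played by a YAML file read through yaml-cpp.

```python
import json

# Registry mapping a component id to its public factory (hypothetical).
FACTORIES = {}


def register(component_id):
    """Decorator that records a class's make() factory under an id."""

    def decorator(cls):
        FACTORIES[component_id] = cls.make
        return cls

    return decorator


@register("simple")
class SimpleScheduler:
    """Hypothetical plugin component configured from a preferences file."""

    def __init__(self, buffer_size=1024):
        self.buffer_size = buffer_size

    @classmethod
    def make(cls, options):
        # Public factory method: builds the component from config options.
        return cls(**options)


# Assumed config layout, written as JSON to avoid a YAML dependency.
CONFIG_TEXT = '{"scheduler": {"id": "simple", "options": {"buffer_size": 4096}}}'


def make_from_config(text):
    """Look up the configured component id and call its factory."""
    entry = json.loads(text)["scheduler"]
    factory = FACTORIES[entry["id"]]
    return factory(entry.get("options", {}))


scheduler = make_from_config(CONFIG_TEXT)
print(type(scheduler).__name__, scheduler.buffer_size)
```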

gtest
Replaces Boost.Test for C++ unit tests.

Removed Dependencies

 * Boost (having no Boost dependency is a hard requirement)

Vendorized Dependencies
The following dependencies are added as submodules (actually using Meson's wrap functionality):
 * CLI11 (replaces Boost program_options)
 * cppzmq
 * nlohmann-json
 * moodycamel
 * pmtf