MPSchedulerPerformance

From GNU Radio
Latest revision as of 07:02, 19 March 2017

Each benchmark data point is generated by running a particular flowgraph.
The flowgraphs have a rectangular format, described by the number of pipelines and the number of stages in each pipeline. For the purposes of the benchmark, only the fir_filter_fff blocks count as stages; the null_source, head and null_sinks get folded into the overhead.
E.g.,

<table width="100%" border="0">
<tr>
<td align=center><strong>1 pipeline x 3 stages</strong></td>
<td align=center><strong>3 pipelines x 1 stage</strong></td>
<td align=center><strong>3 pipelines x 3 stages</strong></td>
</tr>
<tr>
<td align=center valign=top><img src="http://gnuradio.org/images/perf-data-images/1x3.png"></td>
<td align=center valign=top><img src="http://gnuradio.org/images/perf-data-images/3x1.png"></td>
<td align=center valign=top><img src="http://gnuradio.org/images/perf-data-images/3x3.png"></td>
</tr>
</table>
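The rectangular topology above can be sketched in plain Python. This is a hypothetical illustration, not the benchmark script itself: it just enumerates the block connections for an R-pipeline x S-stage graph, with per-pipeline `null_source`, `head` and `null_sink` blocks around the FIR stages (the block names are made up for clarity).

```python
def pipeline_connections(npipelines, nstages):
    """Enumerate the edges of an npipelines x nstages rectangular flowgraph.

    Only the fir_filter_fff blocks count as stages; the null_source, head
    and null_sink in each pipeline are the per-pipeline overhead.
    """
    edges = []
    for p in range(npipelines):
        chain = [f"null_source_{p}", f"head_{p}"]
        chain += [f"fir_{p}_{s}" for s in range(nstages)]
        chain.append(f"null_sink_{p}")
        edges += list(zip(chain, chain[1:]))
    return edges

# 3 pipelines x 1 stage: each pipeline is source -> head -> fir -> sink.
for src, dst in pipeline_connections(3, 1):
    print(src, "->", dst)
```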

Each FIR has 256 taps. Since these are implemented with a dot-product, we count 2 floating point operations (FLOP) per tap, per sample. For each run of the benchmark, we measure the user, system and real time. In addition we know the number of samples processed by the graph and the topology, thus we can compute the total number of floating point operations. We compute GFLOPS as the total FLOPs / real time / 1e9.
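The GFLOPS bookkeeping described above can be written out as a short sketch. The sample count and real time in the example call are made-up placeholders, not measured benchmark data:

```python
def gflops(nsamples, npipelines, nstages, ntaps, real_time_s):
    """GFLOPS as defined above: total FLOPs / real time / 1e9."""
    # Each FIR stage does 2 FLOPs (multiply + add) per tap, per sample,
    # and every sample passes through nstages FIRs in each pipeline.
    total_flop = nsamples * npipelines * nstages * ntaps * 2
    return total_flop / real_time_s / 1e9

# e.g. 100M samples through a 3x3 graph of 256-tap FIRs in 60 s of real time:
print(gflops(100_000_000, 3, 3, 256, 60.0))  # 7.68
```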

== The benchmark code ==

The benchmark code and raw data are in
[source:gnuradio/branches/features/mp-sched/gnuradio-examples/python/mp-sched gnuradio-examples/python/mp-sched].

You can plot the raw data and fly it around in 3D using
[source:gnuradio/branches/features/mp-sched/gnuradio-examples/python/mp-sched/plot_flops.py plot_flops.py].

== x86 and x86_64 Processors ==

On the x86 machines the kernel of gr.fir_filter_fff is implemented with SSE.
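The kernel's structure can be illustrated with a plain-Python stand-in for `gr.fir_filter_fff` (the real implementation is C++ with SSE, and this simplified version ignores edge handling and tap ordering details): each output sample is the dot product of the taps with a window of the input, i.e. one multiply and one add per tap, which is where the 2-FLOP-per-tap count comes from.

```python
def fir_filter_fff(taps, samples):
    """Toy float FIR: dot product of taps against a sliding input window."""
    ntaps = len(taps)
    out = []
    for i in range(len(samples) - ntaps + 1):
        acc = 0.0
        for t, x in zip(taps, samples[i:i + ntaps]):
            acc += t * x  # 2 FLOPs per tap: multiply, then accumulate
        out.append(acc)
    return out

# A 2-tap moving-sum filter:
print(fir_filter_fff([1.0, 1.0], [1.0, 2.0, 3.0, 4.0]))  # [3.0, 5.0, 7.0]
```

The SSE kernel performs the same dot product, but four floats at a time, which is why the x86 results sit well above the scalar PowerPC numbers further down.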

http://gnuradio.org/images/perf-data-images/dual-quad-core.png

http://gnuradio.org/images/perf-data-images/dual-quad-core-2.33-clovertown.png

http://gnuradio.org/images/perf-data-images/core2-duo.png

http://gnuradio.org/images/perf-data-images/core-duo.png

== PowerPC Processors (using Altivec) ==

Note the differences in scaling between the same machines depending on whether we're using AltiVec or not. The difference is especially pronounced on the PS3.

http://gnuradio.org/images/perf-data-images/js21-altivec.png

In the next two graphs, which run on Cell processors, we're not yet using the SPEs; everything runs on the PPE.

http://gnuradio.org/images/perf-data-images/qs21-altivec.png

http://gnuradio.org/images/perf-data-images/ps3-altivec.png

== PowerPC Processors (without Altivec) ==

Please note that the benchmark does not contain any AltiVec code, so these machines are at a disadvantage relative to the SSE-enabled benchmark code on x86 and x86_64. Within a single graph, however, the scaling comparison is still valid.

http://gnuradio.org/images/perf-data-images/js21.png

In the next two graphs, which run on Cell processors, we're not yet using the SPEs; everything runs on the PPE.

http://gnuradio.org/images/perf-data-images/qs21.png

http://gnuradio.org/images/perf-data-images/ps3.png