Analysis: Why EABI matters

Published in LinuxDevices. (Mar. 14, 2007)

Foreword

The new ARM EABI comes up a lot these days. There are many reasons to start using it, but there is one we especially like: floating point operations are much faster.

Many ARM cores lack a Floating Point Unit, so any speed-up in software is more than welcome.

Switching to EABI can be hard, though. In the Debian distribution, for instance, EABI is actually considered a new port.

Without EABI

The ARM EABI (embedded application binary interface) improves floating point performance. That is not surprising once you read how many cycles your processor is wasting right now. From the Debian ARM-EABI wiki:

The current Debian port creates hardfloat FPA instructions. FPA comes from "Floating Point Accelerator". Since the FPA floating point unit was implemented only in very few ARM cores, these days FPA instructions are emulated in kernel via Illegal instruction faults. This is of course very inefficient: about 10 times slower than -msoftfloat for a FIR test program. The FPA unit also has the peculiarity of having mixed-endian doubles, which is usually the biggest grief for ARM porters, along with structure packing issues.

So, what does it mean? It means that compilers usually generate instructions for a piece of hardware, namely a Floating Point Unit, that is not actually there!

So, when your program performs a floating point operation, such as 3.58*x, the CPU runs into an illegal instruction and raises an exception.

The kernel catches this specific exception, performs the intended floating point operation in software, and then resumes executing the program. This is slow because every such operation implies a context switch.
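The trap-and-emulate cycle described above can be sketched as a toy model. This is only an illustration: the opcode names (`add`, `fmul`, `fadd`), the tuple instruction format and the function names are made up for the example, and a real kernel decodes actual FPA instructions inside its undefined-instruction handler.

```python
# Toy model of trap-and-emulate. All names and the instruction format
# are hypothetical; this is not how the Linux kernel is structured.

class IllegalInstruction(Exception):
    """Raised by the toy CPU when it meets an opcode it has no hardware for."""

    def __init__(self, instr):
        super().__init__(instr)
        self.instr = instr


def cpu_execute(instr, regs):
    """Execute one instruction on a core that only implements integer ops."""
    op, dst, a, b = instr
    if op == "add":
        regs[dst] = regs[a] + regs[b]
    else:
        # No FPU present: floating point opcodes are illegal instructions.
        raise IllegalInstruction(instr)


def kernel_emulate(instr, regs):
    """Stand-in for the kernel handler: do the float operation in software."""
    op, dst, a, b = instr
    if op == "fmul":
        regs[dst] = regs[a] * regs[b]
    elif op == "fadd":
        regs[dst] = regs[a] + regs[b]
    else:
        raise ValueError(f"unknown opcode: {op}")


def run(program, regs):
    """Run the program and return how many instructions had to trap."""
    traps = 0
    for instr in program:
        try:
            cpu_execute(instr, regs)
        except IllegalInstruction as fault:
            traps += 1  # one round trip into the kernel per float operation
            kernel_emulate(fault.instr, regs)
    return traps


regs = {"x": 2.0, "k": 3.58, "y": 0.0, "i": 1, "j": 2, "n": 0}
program = [
    ("fmul", "y", "k", "x"),  # y = 3.58 * x, traps
    ("add", "n", "i", "j"),   # integer add, runs directly
    ("fadd", "y", "y", "x"),  # traps again
]
traps = run(program, regs)
print(traps, regs["y"], regs["n"])
```

The point of the model is the trap counter: every floating point instruction pays for a detour through the handler, while the integer instruction does not.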

The benchmark

We ran a simple benchmark on our Open Hardware Free ECB_AT91 ARM (ARMv4T) development board. It is one we have used before: the dot product of two given vectors, the Euclidean distance between the vectors, and the FFT algorithm (complex valued, Cooley and Tukey radix-2). The source code we used is available (GPL).
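For readers who want to see what these three kernels compute, here is a minimal sketch. It is not the benchmark source mentioned above; the function names are chosen for this example, and the FFT is a plain recursive radix-2 variant written for clarity rather than speed.

```python
import cmath
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor times the odd half, then the butterfly.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
print(fft([1.0, 1.0, 1.0, 1.0]))
```

All three are dominated by floating point multiplies and adds, which is why they are sensitive to how the ABI handles floating point.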

It's usual to use the number of Floating Point Operations performed by a given program for benchmarking purposes. This can be misleading, because some operations (division) take more time than others (addition). That's why we run the same program in both setups, with similar compiler flags.
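As a rough sketch of how such a figure is obtained, the snippet below times a dot product and converts the result to MFLOPS, then compares two setups as a ratio. The helper names are hypothetical, the count of 2 operations per element (one multiply, one add) is an assumption of this example, and the numbers Python reports on a desktop say nothing about the board.

```python
import time


def mflops(flop_count, seconds):
    """Millions of floating point operations per second."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return flop_count / seconds / 1e6


def speedup(mflops_new, mflops_old):
    """How many times faster the new setup ran the same program."""
    return mflops_new / mflops_old


def time_dot(n, repeats=10):
    """Time a dot product of two n-element vectors and return MFLOPS."""
    a = [1.0] * n
    b = [2.0] * n
    start = time.perf_counter()
    for _ in range(repeats):
        acc = 0.0
        for x, y in zip(a, b):
            acc += x * y  # counted as 2 operations: multiply and add
    elapsed = time.perf_counter() - start
    return mflops(2 * n * repeats, elapsed)


print(time_dot(4096))
```

Because the same operation count is divided by the time of each run, the ratio between two setups depends only on their running times, which is what makes the comparison fair even when individual operations differ in cost.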

For the Old ABI test we used the Debian distribution (Debian Sid), with an image that we bootstrapped ourselves.

For the EABI test we used OpenEmbedded (by using the Ångström Distribution).

Results

MFLOPS (Millions of floating point operations per second)

[Graph: bench-eabi-oabi.png]

Speed-Up EABI over OABI

[Graph: bench-speedup.png]

In each context switch both the data and instruction caches are flushed, and this hurts the Old ABI's performance. You will notice it in the graphs: with the Old ABI the performance does not depend on the size (N) of the input data, whereas with EABI the impact of the cache on performance is clearly visible. The dot-product performance only goes down when N > 4096 (when we use more than 16 KB of memory). The processor we're using (AT91RM9200) has a 16-Kbyte data cache.
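The 16 KB threshold can be checked with a little arithmetic. The sketch below assumes 4-byte single-precision elements and counts a single N-element array; the text does not state the element type or whether both vectors are counted, so both are left as parameters, and the function names are made up for the example.

```python
CACHE_BYTES = 16 * 1024  # AT91RM9200 data cache size, as stated in the text
ELEMENT_BYTES = 4        # assumption: single-precision floats


def footprint_bytes(n, vectors=1, element_bytes=ELEMENT_BYTES):
    """Memory touched by `vectors` arrays of n elements each."""
    return n * vectors * element_bytes


def fits_in_cache(n, vectors=1, element_bytes=ELEMENT_BYTES,
                  cache_bytes=CACHE_BYTES):
    """True if the working set is no larger than the data cache."""
    return footprint_bytes(n, vectors, element_bytes) <= cache_bytes


for n in (1024, 4096, 8192):
    print(n, footprint_bytes(n), fits_in_cache(n))
```

Under these assumptions N = 4096 lands exactly on the cache size, and anything larger no longer fits.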

Closing Remarks

In order to benefit from the new EABI you need a distribution that supports it. Thanks to the work of the OpenEmbedded crew we got a working toolchain based on GCC. We didn't test EABI with Debian, but we expect the results to be quite similar.

About the Authors

Andrés Calderón and Nelson Castillo are co-founders of emQbit. Both are long-time GNU/Linux users.


Andrés has experience in High Performance computing and in Embedded Systems Design, DSP and FPGA programming.


Nelson has experience in Unix Network Programming, Linux driver programming and Embedded System programming.

The Free ECB_AT91 V1 was used in the tests. The design (open hardware) of this board is available online.



Last update: 2007-03-15 (Rev 189)
