Faster Than the Fastest FFT for Arduino

An effective approach for computing a signal’s discrete Fourier transform is the Fast Fourier Transform (FFT). Searching for algorithm implementations will lead you to this fantastic Instructable. Although it offers a useful method for implementing the FFT, it is still possible to increase efficiency while maintaining high precision. In this instructable, I suggest using the Julia implementation I created here to outperform ApproxFFT (the English version is available on my blog). Enter the fray!

You only need an Arduino card (I’ll use my old Arduino Uno) and a USB cord to upload the code to test the implementation.

Fast fourier transform (FFT) spectrum of the vibration signal (using a... | Download Scientific Diagram

First Step : What Distinguishes ApproxFFT From Other Methods?

To calculate the FFT, ApproxFFT employs a key technique: it uses a lookup table to approximatively calculate the sines and cosines required for calculation. On the other hand, using a trigonometric identity, the code provided in this instructable will calculate an FFT with a close to zero approximation.

The fact that ApproxFFT does not carry out the calculation immediately is another problem! Look at lines 66 and 67 in the original file to see that this is true. You’ll see that my Arduino Uno won’t support arrays larger than 256 cells due to two array allocations that triple the program’s memory footprint!

I’ll offer two fresh FFT implementations for Arduino in this instructable. Due to some tricks with floating point arithmetic, one will use the float type. As in the first Instructable, the second and third will both employ fixed point arithmetic, and the storage type will be int. The trigonometric identity, a custom saturating addition routine, and a fixed point multiplication based on the fmuls instruction given by the AVR assembler will all help to speed up the calculations. Each and every code is kept in its own GitHub repository. Additionally, a blog entry detailing the implementation’s inner workings may be found on my website. The benchmarks and how to obtain them are the main topics of this Instructable.

The second step is to compare the two programs:

We have to set up a relatively structured test method in order to compare the two implementations. I’ll utilize the setup and loop in the Arduino sketch that follows:

int data[256]={}; int N = 128; float frequency = 44000; // Actually useless for our test. // our timer unsigned long timestart; unsigned long timestop; unsigned long totaltime; void setup() { Serial.begin(250000); while (!Serial) { ; // wait for serial port to connect. Needed for native USB port only } } void loop() { Serial.println(“Ready”); // First, we read the length of the test. An int is two bytes on an Arduino Uno. Serial.readBytes((char*)(&N), sizeof(int)); Serial.println(N, DEC); // Then, we read the input data Serial.readBytes((char*)(data), N*sizeof(int)); Serial.println(“Received data. Starting FFT calculation.”); // We can now start the FFT calculation ! timestart = micros(); // Here we perform the FFT calculation. timestop = micros(); totaltime = timestop-timestart; Serial.println(“FFT calculated, I will now send the result”); // Then we send back the data to the computer. Serial.write((char*)(data), N*sizeof(int)); // And finally we write the execution time. Serial.write((char*)(&totaltime), sizeof(int)); delay(1); Serial.println(“And we’re done !”); }

Using the Serial connection in Julia, it is simple to retrieve the data and time benchmarks. Visit the blog post or examine the program tester.jl file on GitHub to learn more if you’re interested in how this is accomplished! I also benchmarked the old ApproxFFT code for comparison’s sake, so I had to modify the loop function and the FFT code’s conclusion to be able to extract the FFT and transmit it back to the computer.

Third Step is the creation of an FFT for floats:

The trigonometry method is clearly what has made the biggest speed increase in this case. However, I also included some entertaining IEEE-754 techniques for computing the modules, halving, and multiplications quickly. All of these methods rely on the same idea: approximating a number’s base two logarithms using its IEEE-754 representation. In part because it was entertaining, I also multiplied fractional values using the specific AVR instruction.The graphic that is attached demonstrates that the computed FFT is accurate and has a tolerable mean squared error.

Forth Step Going Faster, Using a Fixed-Point FFT:

Fixed-point arithmetic is used in the two additional implementations. The first one employs 8 bits numbers, while the second one makes use of 16 bits (ApproxFFT-equivalent). So, hypothetically, you could apply this code to an array with 512 elements on the first and 1024 elements on the second. Theoretically, for trigonometry to continue working, you would need to create 32-bit fixed-point arithmetic because you would need to make little changes to the buffers and, more critically, for 1024 bytes-long arrays. The blog post goes into detail on all of this. Nevertheless, Fixed16FFT and Fixed8FFT are nearly as accurate as ApproxFFT.

Fifth Step Executing All Benchmarks:

A good feature of this project is the program tester.jl script should allow you to replicate the benchmarks. You must first download Julia. Although I personally don’t use it, I’m aware that Julia for Visual Studio Code is popular. Use this editor if you don’t already have one. Also required is Arduino-CLI.

Launch Julia next, then install any required dependencies. It might take some time!

]add CairoMakie, DataFrames, FFTW, FileIO, FixedPointNumbers, LibSerialPort, LinearAlgebra, Sixel, Statistics,

Keep in mind that you can disregard it if your platform does not support Sixel. Don’t start the script’s lines 3 through 10. You don’t require it, particularly in VS-Code.

The script can then be executed. If it doesn’t work the first time, try again. The various error comparison plots should be displayed, followed by the benchmarks.

Sixth Step The FFT’s speed and accuracy limitations:

Hurray! Compared to the Fastest FFT, we have two FFT implementations that are quicker. As can be seen, Fixed16FFT runs a 256-point FFT in 30ms as opposed to ApproxFFT’s 50ms. Even better, Fixed8FFT completes the same length input array in just 11ms! You can see that FloatFFT is not far behind either.

When compared to the reference implementation that operates on my PC, all of these implementations have nearly the same mean squared error. ExactFFT, my final version, computes the FFT using floats without the use of trigonometry or other advanced floating-point techniques. As you can see, it is billions of times more accurate even if it is far slower than the others.

Conclusion

This project was enjoyable. Ultimately, we achieve the same precision with a 5-fold reduction in execution time compared to ApproxFFT. Additionally, it utilizes space much better. On the blog, there are many small implementation pitfalls that can be of interest to you. For instance, did you know that a chainsaw can be used to calculate complex modules?

If you have any questions, concerns, or feedback, don’t hesitate to get in touch. And most of all, have fun with your present endeavors!

WATCH DIFFERENT KINDS OF LEARNING VIDEOS IN OUR YouTube Channel