FFT-Based FIR Filtering Using Overlap-Add Method: a Matlab Implementation

You have a very long signal, and you want to filter it with an FIR filter. The signal can be a stream of radio or audio, and you are going to implement it on a resource-constrained device, an FPGA or an MCU. This looks like a simple task for newbies, what could be wrong?

The Memory is Limited

A common mistake beginners make is loading the entire signal to the memory and then filtering it. This could work in Matlab, but there are many issues when implementing it on real hardware. First, the signal can be very long, and the memory (or registers in an FPGA) is limited. The memory is precious, and you don’t want to eat it all up and crash your device.

Second, it is a waste of time if you start the filtering until the end of the signal stream. The normal way to do it is once a block of the signal is ready, you filter it, and then you wait for the next block. It can be considered a producer-consumer problem, and there are standard implementations in each language.

Overlap-Add Method

We know that the DFT provides an efficient way to compute the convolution (or linear filtering) of two signals in the frequency domain (go back to any DSP textbook if you don’t know about it). To filter a very long sequence, two DFT-based approaches are the overlap-add method and the overlap-save method. Both of them break the signal into blocks and then compute the convolution of each block independently, and then combine the results in a certain way. We will only discuss the overlap-add method here and focus on the implementation in Matlab.

The theory and how it accelerates the filtering can be found in this blog post. The general steps are

find out the optimal NFFT and signal block size based on the FIR filter length.
convert the FIR filter to the frequency domain. and initialize the output signal.
loop each block of the signal, convert it to the frequency domain, and compute the dot product of the frequency domain of the FIR filter and the frequency domain of the signal block.
convert back the result to the time domain, and add it to the output signal with an overlap.

The Matlab Implementation

Note that this code is modified from the Matlab function fftfilt in the Matlab Signal Processing Toolbox. I made it more readable and easy to understand, and I also added some comments to explain the code.

  
function y = fft_filter(b,x)
    %   Overlap-add method for FIR filtering using FFT.
    %   y = fft_filter(b,x) filters x, with the FIR filter specified by the 
    %   coefficients b, using the overlap/add method, and internal
    %   parameters (FFT size and block length) that guarantee efficient
    %   execution.
    
    
    % Check validity of input
    % I found it extremely useful to check the validity of the input
    % and to make the code more readable. no need to write a group of 
    % if-else statements anymore.
    % here I require the input to be a column vector.
    validateattributes(b,{'double'},{'column'},'fft_filter','B',1);
    validateattributes(x,{'double'},{'column'},'fft_filter','X',2);

    % count the number of element for x and b
        nx = size(x,1);
        nb = size(b,1);

    % determine the most efficient NFFT
    if nb >= nx || nb > 2^20              % take a single FFT in this case
        nfft = 2^nextpow2(nb + nx -1);    % 2^ceil(log2(nb + nx -1))
        blockLen = nx;
    else
        % this is the computation cost of the FFT for each NFFT
        fftflops = [ 18 59 138 303 660 1441 3150 6875 14952 32373 69762 ...
            149647 319644 680105 1441974 3047619 6422736 13500637 28311786 ...
            59244791 59244791*2.09];
        nfftList = 2.^(1:21);
        nfftValid = nfftList(nfftList > nb-1);
        fftflopsValid = fftflops(nfftList > nb-1);
        % NFFT = blockLen + nb - 1, to avoid circular convolution
        blockLenList = nfftValid - (nb - 1);
        % minimize (number of blocks) * (number of flops per fft)
        [~,ind] = min(ceil(nx./blockLenList).*fftflopsValid);
        nfft = nfftValid(ind);               % must have nfft > (nb-1)
        blockLen = blockLenList(ind);
    end
    
    B = fft(b,nfft,1);
    % init output y
    y = zeros(size(x),'like',1+1i);
    istart = 1;
    % main loop for the overlap-add method
    % this is a good example how to take care of the edge case
    while istart <= nx
        iend = min(istart+blockLen-1,nx);
        X = fft(x(istart:iend,:),nfft,1);
        yblock = ifft(X.*B,[],1);
        yend = min(nx,istart+nfft-1);
        y(istart:yend,:) = y(istart:yend,:) + yblock(1:(yend-istart+1),:);
        istart = istart + blockLen;
    end

    % let the output be real if the input is real
    if isreal(b) && isreal(x)
        y = real(y);
    end

end

In this post, we not only reviewed a divide-and-conquer way to deal with long signals but also implemented the overlap-add method in Matlab. I hope you can learn one or two coding tricks from this post too.

FFT-Based FIR Filtering Using Overlap-Add Method: a Matlab Implementation

The Memory is Limited

Overlap-Add Method

The Matlab Implementation

Further Reading

Power Spectrum: something I wish I could understand early (2)

Power Spectrum: something I wish I could understand early (3)

Polyphase FIR Filter Downsampling