7 years, 10 months ago.

LPC4088 Nearest Neighbor Image Resize with

Hi everyone,

I'm on the LPC4088 QuickStart Board.

I'm using nearest neighbor to resize an 80x60 image to 640x480. The algorithm is extremely slow, however. I was wondering if this is just a problem with the LPC4088 or if there is something I can do with the algorithm to speed it up. The input and output image sizes are constant, but the algorithm is written more generally:

void ImProcess::nearestNeighborResize(uint8_t *data, uint32_t *newData, 
                                    uint16_t srcWidth, uint16_t srcHeight,
                                    uint16_t destWidth, uint16_t destHeight)
{
    double scaleWidth =  (double)destWidth / (double)srcWidth;
    double scaleHeight = (double)destHeight / (double)srcHeight;
    uint16_t cy, cx;
    uint32_t v_index, h_index, nearestMatch, pixel;
    
    for(cy = 0; cy < destHeight; cy++)
    {
        for(cx = 0; cx < destWidth; cx++)
        {
            pixel=cy*destWidth+cx;
            v_index = (uint32_t)(cy/scaleHeight);
            h_index = (uint32_t)(cx/scaleWidth);
            nearestMatch =  (v_index*(srcWidth*3) + h_index*3);
            
            newData[pixel]= (0xFF << 24) | (data[nearestMatch] << 16) | (data[nearestMatch+1] << 8) | data[nearestMatch+2];
        }
    }
}

Knowing the sizes of the input and output images already, the problem is one of copying each pixel in the original image to an 8x8 block of pixels in the new image. Is there a faster way than the above to do this? I would think looping through the original image pixels and then looping twice from 0 through 7 for the 8x8 block in the new image would do it, but that just makes the code perform the same number of iterations as above:

640x480 = 80*60*8*8

1 Answer

7 years, 10 months ago.

The first obvious speed-up is to change the h_index and v_index calculations from doubles to integers. You know your scale factor is going to be 8, so each double-precision divide can be replaced with a shift 3 bits to the right (cy >> 3 and cx >> 3), which is a lot faster (roughly 1 clock vs. 400 clocks). The FPU on the LPC4088's Cortex-M4 only handles single precision, so double arithmetic is done in software. You're doing that twice per pixel in the new image, so this is a big saving.
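As a minimal sketch of that first change, assuming the scale factor is fixed at 8 in both directions: the function name resizeBy8Integer and the constant kShift are made up for illustration and are not part of the original ImProcess class. The loop structure is the same as in the question; only the index calculation changes.

```cpp
#include <cstdint>

// Assumed name: log2 of the fixed scale factor (8 == 1 << 3).
constexpr unsigned kShift = 3;

// Hypothetical integer-only variant of the loop in the question.
// data:    source image, 3 bytes per pixel (same channel order as the original code)
// newData: destination image, one 32-bit word per pixel with 0xFF in the top byte
void resizeBy8Integer(const uint8_t *data, uint32_t *newData,
                      uint16_t srcWidth, uint16_t destWidth, uint16_t destHeight)
{
    for (uint32_t cy = 0; cy < destHeight; cy++)
    {
        // The row offsets only change once per line, so compute them outside the inner loop.
        const uint32_t srcRow  = (cy >> kShift) * srcWidth * 3u;
        const uint32_t destRow = cy * destWidth;

        for (uint32_t cx = 0; cx < destWidth; cx++)
        {
            // cx >> 3 replaces (uint32_t)(cx / scaleWidth) when the scale is exactly 8.
            const uint8_t *p = data + srcRow + (cx >> kShift) * 3u;
            newData[destRow + cx] = (0xFFu << 24) | (uint32_t(p[0]) << 16)
                                  | (uint32_t(p[1]) << 8) | p[2];
        }
    }
}
```

For the image in the question this would be called as resizeBy8Integer(data, newData, 80, 640, 480). It still visits every output pixel, so it only removes the floating-point cost, not the redundant work.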

If that's not enough then you can make some structural changes. You are currently calculating the value for each pixel in the new image. Since you know that 64 pixels will have the same value you don't need to work out the value 63 of those times.

You could nest a couple of for loops to write the whole block of 64 values, but it's probably more efficient to calculate one full line of the new image and then duplicate that line 7 times.

// memcpy needs <string.h> (or <cstring>)
for(cy = 0; cy < srcHeight; cy++)
{
    int lineStart = cy*8*destWidth; // index of the first output pixel for this source row
    for(cx = 0; cx < srcWidth; cx++)
    {
        int newPixel = lineStart+cx*8;
        int oldPixel = cy*srcWidth*3+cx*3;
        uint32_t newValue = (0xFFu << 24) | (data[oldPixel] << 16) | (data[oldPixel+1] << 8) | data[oldPixel+2];

        for (int i=0;i<8;i++) { // create 8 pixels of the same value
            newData[newPixel+i] = newValue;
        }
    }
    for (int i=1;i<8;i++) // next 7 rows are the same so do a block copy of the row.
        memcpy(newData+lineStart+i*destWidth, newData+lineStart, destWidth*sizeof(uint32_t));
}
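
For anyone who wants to compile and try it, here is the same idea wrapped into a complete function. This is only a sketch under the assumption that the scale is always exactly 8; the name resizeBy8Rows is hypothetical, and the destination width is derived from the source width rather than passed in.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-alone version of the row-duplication approach.
// data:    source image, 3 bytes per pixel
// newData: destination buffer, must hold (srcWidth*8) * (srcHeight*8) 32-bit pixels
void resizeBy8Rows(const uint8_t *data, uint32_t *newData,
                   uint16_t srcWidth, uint16_t srcHeight)
{
    const uint32_t destWidth = srcWidth * 8u;

    for (uint32_t cy = 0; cy < srcHeight; cy++)
    {
        uint32_t *firstRow = newData + cy * 8u * destWidth; // first of the 8 output rows
        const uint8_t *src = data + cy * srcWidth * 3u;     // start of the source row
        uint32_t *out = firstRow;

        for (uint32_t cx = 0; cx < srcWidth; cx++, src += 3)
        {
            // Build the 32-bit pixel once, then write it 8 times across the row.
            const uint32_t value = (0xFFu << 24) | (uint32_t(src[0]) << 16)
                                 | (uint32_t(src[1]) << 8) | src[2];
            for (int i = 0; i < 8; i++)
                *out++ = value;
        }

        // The next 7 output rows are identical, so block-copy the first one.
        for (uint32_t i = 1; i < 8; i++)
            memcpy(firstRow + i * destWidth, firstRow, destWidth * sizeof(uint32_t));
    }
}
```

With the 80x60 source from the question the call would be resizeBy8Rows(data, newData, 80, 60), which computes 4,800 pixel values instead of 307,200 and leaves the rest to memcpy.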