Wednesday, July 31, 2024

Etnaviv NPU update 20: Fast object detection on the NXP i.MX 8M Plus SoC

I'm happy to announce that my first project regarding support for the NPU in NXP's i.MX 8M Plus SoC has reached the feature complete stage.

CC BY-NC 4.0 Henrik Boye

For the last several weeks I have been working full-time on adding support for the NPU to the existing Etnaviv driver. Most of the existing code that supports the NPU in the Amlogic A311D was reused, but NXP used a much more recent version of the NPU IP so some advancements required new code, and this in turn required reverse engineering.

This work has been kindly sponsored by the Open Source consultancy Ideas On Board, for which I am very grateful. I hope this will be useful to those companies that need full mainline support in their products, even if it is just the start.

This company is unique in working on both NPU and camera drivers in Linux mainline, so they have the best experience for products that require long term support and vision processing.

Since the last update I have fixed the last bugs in the compression of the weights tensor and implemented support for a new hardware-assisted way of executing depthwise convolutions. Some improvements on how the tensor addition operation is lowered to convolutions was needed as well.

Performance is pretty good already, allowing for detecting objects in video streams at 30 frames per second, so at a similar performance level as the NPU in the Amlogic A311D. Some performance features are left to be implemented, so I think there is still substantial room for improvement.

At current the code is at a very much proof-of-concept state. The next step is cleaning it all up and submitting for review to Mesa3D. In the meantime, you can find the draft code at https://gitlab.freedesktop.org/tomeu/mesa/-/tree/etnaviv-imx8mp.

A big thanks to Philipp Zabel who reverse engineered the bitstream format of the weight encoding and added some patches to the kernel that were required for the NPU to work reliably.