Thursday, June 13, 2024

Rockchip NPU update 4: Kernel driver for the RK3588 NPU submitted to mainline

In the past few weeks I have been working on, among other things, a new kernel driver for the NPU in the Rockchip RK3588 SoC, written from the ground up.

It is now fully working, and after a good amount of polishing I sent it to the kernel mailing lists yesterday for review. Those interested can see the code and follow the review process at this link.

The kernel driver is able to fully use the three cores in the NPU, giving us the possibility of running 4 simultaneous object detection inferences such as the one below on a stream, at almost 30 frames per second.

 

The userspace driver is in a less polished state, but it is fully featured at this stage. I will be working on it in the next few days so it can be properly submitted for review.

This is the first accelerator-only driver for an edge NPU submitted to the mainline kernel, and hopefully it can serve as a template for the next ones to come, as the differences among NPUs of different vendors are relatively superficial.

6 comments:

Liam said...

Did you happen to modify the HDMI_RX input for better support for realtime raw video for computer vision?

Tomeu Vizoso said...

Hi Liam, so far I have worked strictly on the NPU of the RK3588. For my demo I just used a USB webcam.

Anonymous said...

Hi, thanks for sharing your updates and progress. Sorry for the newbie questions, I am starting to learn about NPU and related stuff.

May I ask which board you are using for the RK3588 development?
Also, how do you know the NPU itself has 3 cores? I searched online and in the datasheet and did not find that reference.
Finally, why is it that you can run 4 streams simultaneously on 3 cores? Is it thanks to 3 NPU cores + 1 GPU?

I am planning to buy the Orange Pi 5 Pro to try out your ideas, so I really appreciate your updates and sharing your hard work. Thanks!

Tomeu Vizoso said...

> May I ask which board you are using for the RK3588 development?

I am using a QuartzPro64 that Pine64 sent me.

> Also, how do you know the NPU itself has 3 cores? I searched online and in the datasheet and did not find that reference.

I think I first saw it in the source code of their kernel driver.

> Finally, why is it that you can run 4 streams simultaneously on 3 cores? Is it thanks to 3 NPU cores + 1 GPU?

The kernel contains a job queue and dispatches jobs from it to the 3 cores. Because part of running the model happens outside of the NPU, running one thread above the number of cores gives us more total throughput without a degradation in inferences per second.
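
To make the queueing argument concrete, here is a toy simulation in Python. It is only a sketch and not the driver's code: the function name `simulate`, the 10 ms of CPU-side work and the 30 ms of NPU time per inference are made-up numbers chosen for illustration, and it assumes each thread's CPU work runs in parallel with the others.

```python
import heapq
from collections import deque


def simulate(num_threads, num_cores=3, cpu_ms=10, npu_ms=30, duration_ms=10_000):
    """Count inferences finished within duration_ms (hypothetical timings).

    Each thread loops forever: cpu_ms of work outside the NPU, then one
    NPU job of npu_ms. Jobs wait in a FIFO queue until a core is free.
    """
    # One pending event per thread: (time, thread, kind).
    events = [(cpu_ms, t, "submit") for t in range(num_threads)]
    heapq.heapify(events)
    free_cores = num_cores
    waiting = deque()  # jobs submitted but not yet running on a core
    done = 0

    while events:
        now, thread, kind = heapq.heappop(events)
        if now > duration_ms:
            break
        if kind == "submit":
            waiting.append(thread)
        else:  # "done": the core is released, the thread goes back to CPU work
            free_cores += 1
            done += 1
            heapq.heappush(events, (now + cpu_ms, thread, "submit"))
        # Dispatch queued jobs to any idle cores.
        while free_cores and waiting:
            free_cores -= 1
            heapq.heappush(events, (now + npu_ms, waiting.popleft(), "done"))
    return done


if __name__ == "__main__":
    for n in (3, 4):
        print(n, "threads:", simulate(n), "inferences in 10 s")
```

With 3 threads, every core sits idle while its thread does the CPU-side work. With a fourth thread there is usually a job already queued when a core frees up, so in this model the total number of completed inferences goes up.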

> I am planning to buy the Orange Pi 5 Pro to try out your ideas, so I really appreciate your updates and sharing your hard work. Thanks!

You are welcome, hope you have lots of fun.

Anonymous said...

Is your open source kernel driver compatible with the proprietary rknn and rkllm SDKs?

Tomeu Vizoso said...

> Is your open source kernel driver compatible with the proprietary rknn and rkllm SDKs?

No, it's not and it couldn't be, as the UABI that Rockchip chose wouldn't be acceptable in the mainline Linux kernel.