Monday, October 23, 2023

Etnaviv NPU update 9: We got there!


Since the last update I finally got the whole of MobileNetv1 running at full-accuracy on the NPU with Mesa: 
tomeu@arm-64:~/mesa$ python3.10 -i grace_hopper.bmp -m mobilenet_v1_1.0_224_quant.tflite -l labels_mobilenet_quant_v1_224.txt -e
Loading external delegate from with args: {}
Processing the input took 18 ms.
Running the NN job took 13 ms.
Processing the output took 1 ms.
0.866667: military uniform
0.031373: Windsor tie
0.015686: mortarboard
0.007843: bow tie
0.007843: academic gown
time: 33.094ms
That takes us to a performance level around 3 times faster than running the same inference on the CPUs on the A311D SoC.

Most of the time (18 ms.) is spent in my naive manipulation of the input tensor, transposing and reshuffling it to match what the HW expects. Once we learn to do these operations on the 4 tensor manipulation cores, this time should be brought close to zero.

The 13 ms. that the convolutions take in the NPU is still sensibly higher than the 8 ms. that the blob achieves, but the optimizations mentioned in previous updates in this blog should bring us pretty close.

Next steps

Now that we have something that people can use in their products, I will switch to upstreaming mode.

I want to do a few cleanups to the Mesa code and then I will ask for people to review and ack so it can be merged. In the meantime, the draft merge request can be found here.

I would also like to have a CI job running to make sure it doesn't regress. But given that we don't use NIR as of yet and the dependencies with the rest of Mesa are minimal, there is probably little need as long as I'm the only person contributing to the code.

3 comments: said...

Good work! I am gonna move my HAOS from rpi4 to Vim3Pro in a near future. I have a coral USB accelerator too add in the future aswell. I would be awesome to be able to use both.

Anonymous said...

Do you have a tutorial of some sorts?

Anonymous said...

Have you i elementet the New kennel?