With new releases of the Linux kernel and Mesa drivers poised to be packaged by Linux distributions, the TensorFlow Lite driver for the NPU in the Amlogic A311D SoC will be available to users with minimal effort.
With that work bearing fruit, I have been looking at how this driver could be of use with other hardware.
Philipp Zabel of Pengutronix has been looking at adding support for the NPU in the NXP i.MX 8M Plus SoC, and he has made great progress on reverse engineering the in-memory format of the weights tensor, which is different from that used in the A311D.
I started by probing what supporting the NPU in the S905D3 SoC from Amlogic would entail, and I found it not that different from what is currently supported, apart from it also using a new format for the weights tensor.
After a couple of weeks staring at memory dumps and writing a Python tool to decode them, I realized that the run-length and Huffman encodings were the same, with only a few differences such as where and how the bias values were stored.
With a few changes to Philipp's work-in-progress branch I got my first tests passing on the Libre Computer Solitude SBC.
Next I will look at supporting more weights tensor dimensions and fixing bugs in how the weights and other values are encoded.
The command stream programming seems to be very similar to that of the A311D, so I don't expect much work to be needed there.
Once everything is working at the same level as with the A311D, I will move on to determining the optimal values for the zero run-length and Huffman symbol maps, to maximize compression and thus performance (NPUs are so fast at arithmetic that they tend to be starved for memory bandwidth).
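To give a rough idea of why the zero run-length setting matters, here is a minimal Python sketch of a generic zero run-length scheme. It is an illustration only and not the format the hardware uses: the `(zero_run, value)` pair layout, the `run_bits` parameter, the 8-bit value width and all function names are assumptions made up for this example.

```python
def zrl_encode(values, run_bits):
    """Encode values as (zero_run, value) pairs.

    Hypothetical layout, for illustration only: each pair means
    "zero_run zeros followed by value", and zero_run must fit in
    run_bits bits.
    """
    max_run = (1 << run_bits) - 1
    pairs = []
    run = 0
    for v in values:
        if v == 0 and run < max_run:
            run += 1              # extend the current run of zeros
        else:
            pairs.append((run, v))  # v may be 0 if the run field is full
            run = 0
    if run:
        # Trailing zeros: emit the last zero as the explicit value.
        pairs.append((run - 1, 0))
    return pairs


def zrl_decode(pairs):
    """Inverse of zrl_encode."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        out.append(v)
    return out


def encoded_bits(values, run_bits, value_bits=8):
    """Size in bits of the encoded stream under this toy layout."""
    return len(zrl_encode(values, run_bits)) * (run_bits + value_bits)


# A made-up, sparse "weights" vector.
weights = [0] * 20 + [3] + [0] * 5 + [-2] + [0] * 40 + [1]

# Sweep the width of the run field and keep the one that compresses best.
best_run_bits = min(range(8), key=lambda b: encoded_bits(weights, b))
print(best_run_bits, encoded_bits(weights, best_run_bits))
```

With a run field that is too narrow, long stretches of zeros are split into many pairs; with one that is too wide, every pair pays for bits it rarely uses. The best width depends on how sparse the weights are, which is why it has to be determined from the data rather than fixed once.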
Big thanks to Pengutronix for supporting Philipp's work, and to Libre Computer for having supported the development of the driver so far.