Wednesday, August 13, 2014

Dynamic scaling of the memory bus

The problem

These days there's quite good support for CPU scaling in the mainline kernel, and many ARM SoCs are making use of it already. But in modern hardware with lots of very fast external memory, running the memory bus at its maximum frequency drastically reduces the amount of time that the device can run when on battery.

A problem that many teams are finding when trying to upstream their power management code is that there's currently no way for several clock consumers to influence the frequency of the memory bus. There has been a few tries to upstream the solutions currently in vendor trees, but so far no acceptable solution has been found.

I'm helping to upstream some of the stuff in the ChromeOS tree, and this issue is currently blocking very interesting work from reaching mainline.

The past

In the vendor tree for Tegra this is addressed by creating virtual clocks that are child of the clock that wants to be influenced. Depending on the type of the virtual clock, setting its rate will influence the rate of its parent clock by setting a floor or ceiling value.

In Qualcomm's vendor tree for the Snapdragon family of SoCs, the concept of a voter clock is introduced. Drivers can vote on the rate of a given clock by "voting" through a child clock, so not that different to how Tegra does it.

Both approaches have the critical disadvantage of adding clk instances for things that aren't real clocks, thus making the API considerably more confusing for relatively little gain.

Both vendor trees have additional API for registering bandwidth needs: tegra_isomgr and msm_bus_scale. They bear quite some resemblance with each other and with pm_qos_interface, but both are tightly tied to specificities of their platforms.

The discussion was brought back to life a couple of months ago when a patch was posted for allowing the tegra-drm driver to set the frequency rate of the external memory controller based on the amount of bandwidth that was needed by the display controller for refreshing the display. Of course, that patch was rejected because there are other components that need to have a say in the frequency rate of the memory bus.

But in that discussion some kind of plan took form and I have been working on making something from it that can be merged upstream.

A possible future

There's so far two main additions to existing frameworks, with the rationale being explained further below:
  • Add per-user floor and ceiling constraints to the Common Clock Framework, so drivers can set maximum and minimum frequency rates that the clock should respect. Patchset here.
  • Add a PM_QOS_MEMORY_BANDWIDTH class to pm_qos, for drivers to register their expected bandwidth needs. Patchset here.
The idea is for the following agents to be able to influence the current frequency of the memory bus:
  • Thermal: a cooling device would call clk_set_ceiling_rate to cap the memory bus to a frequency based on the current temperature.
  • Power: a battery driver would set a ceiling in the same way, based on the remaining capacity.
  • Devfreq: a devfreq driver wrapping a power management unit such as the ACTMON on Tegra or the PPMU on Exynos would set a floor frequency based on the current load stats.
  • Cpufreq: a cpufreq driver would set a floor frequency based on the current CPU frequency.
  • Devices that can anticipate how much memory bandwidth will need (such as the display controller, the camera, multimedia codecs, an ISP, USB, etc) would register their requirements in the PM_QOS_MEMORY_BANDWIDTH class. The EMC driver would be listening for notifications and setting a floor frequency based on the aggregated bandwidth that is needed.
The impression so far is that this approach matches the needs of the Tegra and Exynos SoCs, and people working on Rockchip upstreaming are evaluating it. Others working on other SoCs are very welcome to look at it and comment, so the result is also useful to them and they can improve their power management in mainline without having to refactor things later.


mupuf said...

Fun project!

However, beware, slowing down the memory bus could actually force the CPU to be active for a longer period of time and result in a greater power usage than if memory requests were answered at full speed before drastically downclocking memory.

That is something that performance counters may not be able to capture if polled at a period slower than these burst. Good thing SoCs tend to have PMUs nowadays!

Thanks for working on this project :)

Tomeu Vizoso said...

Yeah, but that's true of any other resource in the critical path (including the CPU itself).

I think the API proposed is sufficiently low-level to be useful no matter what concrete strategy is followed, which is something that I hope will be tuned with the help of the vendor's QA teams.