Thursday, December 22, 2016

Slides on the Chamelium board

Yesterday I gave a short talk about the Chamelium board from the ChromeOS team, and thought that the slides could be useful for others as this board gets used more and more outside of Google.

If you are interested in how this board can help you automate the testing of your display (and not only!) code and hardware, a new mailing list has been created to discuss its uses. We at Collabora will be happy to help you integrate this board in your CI lab as well.

Thanks go to Intel for sponsoring the preparation of these slides and for allowing me to share them under an open license.

And of course, thanks to Google's ChromeOS team for releasing the hardware design with an open hardware license along with the code they are running on it and with it.

Tuesday, November 8, 2016

How continuous integration can help you keep pace with the Linux kernel

Almost all of Collabora's customers use the Linux kernel on their products. Often they will use the exact code as delivered by the SBC vendors and we'll work with them in other pars of their software stack. But it's becoming increasingly common for our customers to adapt the kernel sources to the specific needs of their particular products.

A very big problem most of them have is that the kernel version they based on isn't getting security updates any more because it's already several years old. And the reason why companies are shipping kernels so old is that they have been so heavily modified compared to the upstream versions, that rebasing their trees on top of newer mainline releases is so expensive that is very hard to budget and plan for it.

To avoid that, we always recommend our customers to stay close to their upstreams, which implies rebasing often on top of new releases (typically LTS releases, with long term support). For the budgeting of that work to become possible, the size of the delta between mainline and downstream sources needs to be manageable, which is why we recommend contributing back any changes that aren't strictly specific to their products.

But even for those few companies that already have processes in place for upstreaming their changes and are rebasing regularly on top of new LTS releases, keeping up with mainline can be a substantial disruption of their production schedules. This is in part because new bugs will be in the new mainline release, and new bugs will be in the downstream changes as they get applied to the new version.

Those companies that are already keeping close to their upstreams typically have advanced QA infrastructure that will detect those bugs long before production, but a long stabilization phase after every rebase can significantly slow product development.

To improve this situation and encourage more companies to keep their efforts close to upstream we at Collabora have been working for a few years already in continuous integration of FOSS components across a diverse array of hardware. The initial work was sponsored by Bosch for one of their automotive projects, and since the start of 2016 Google has been sponsoring work on continuous integration of the mainline kernel.

One of the major efforts to continuously integrate the mainline Linux kernel codebase is, which builds several configurations of different trees and submits boot jobs to several labs around the world, collating the results. This is being of great help already in detecting at a very early stage any changes that either break the builds, or prevent a specific piece of hardware from completing the boot stage.

Though can easily detect when an update to a source code repository has introduced a bug, such updates can have several dozens of new commits, and without knowing which specific commit introduced the bug, we cannot identify culprits to notify of the problem. This means that either someone needs to monitor the dashboard for problems, or email notifications are sent to the owners of the repositories who then have to manually look for suspicious commits before getting in contact with their author.

To address this limitation, Google has asked us to look into improving the existing code for automatic bisection so it can be used right away when a regression is detected, so the possible culprits are notified right away without any manual intervention.

Another area in which is currently lacking is in the coverage of the testing. Build and boot regressions are very annoying for developers because they impact negatively everybody who work in the affected configurations and hardware, but the consequences of regressions in peripheral support or other subsystems that aren't involved critically during boot can still make rebases much costlier.

At Collabora we have had a strong interest in having the DRM subsystem under continuous integration and some time ago started a R&D project for making the test suite in IGT generically useful for all the DRM drivers. IGT started out being i915-specific, but as most of the tests exercise the generic DRM ABI, they could as well test other drivers with a moderate amount of effort. Early in 2016 Google started sponsoring this work and as of today submitters of new drivers are using it to validate their code.

Another related effort has been the addition to DRM of a generic ABI for retrieving CRCs of frames from different components in the graphics pipeline, so two frames can be compared when we know that they should match. And another one is adding support to IGT for the Chamelium board, which can simulate several display connections and hotplug events.

A side-effect of having continuous integration of changes in mainline is that when downstreams are sending back changes to reduce their delta, the risk of introducing regressions is much smaller and their contributions can be accepted faster and with less effort.

We believe that improved QA of FOSS components will expand the base of companies that can benefit from involvement in development upstream and are very excited by the changes that this will bring to the industry. If you are an engineer who cares about QA and FOSS, and would like to work with us on projects such as, LAVA, IGT and Chamelium, get in touch!

Thursday, April 21, 2016

Validating changes to KMS drivers with IGT

New DRM drivers are being added to almost each new kernel release, and because the mode setting API is so rich and complex, bugs do slip in that translate to differences in behaviour between drivers.

There have been previous attempts at writing test suites for validating changes and preventing regressions, but they have typically happened downstream and focused on the specific needs of specific products and limited to one or at most a few of different hardware platforms.

Writing these tests from scratch would have been an enormous amount of work, and gathering previous efforts and joining them wouldn't be much worth it because they were written using different test frameworks and in different programming languages. Also, there would be great overlap on the basic tests, and little would remain of the trickier stuff.

Of the existing test suites, the one with most coverage is intel-gpu-tools, used by the Intel graphics team. Though a big part is specific to the i915 driver, what uses the generic APIs is pretty much driver-independent and can be made to work with the other drivers without much effort. Also, Broadcom's Eric Anholt has already started adding tests for IOCTLs specific to the VideoCore-IV driver.

Collabora's Micah Fedke and Daniel Stone had added a facility for selecting DRM device files other than i915's and I improved the abstraction for creating buffers so it works for drivers without GEM buffers. Next I removed a bunch of superfluous dependencies on i915-only stuff and got a useful subset of tests to run on a Radxa Rock2 board (with the Rockchip 3288 SoC). Around half of these patches have been merged already and the other half are awaiting review. Meanwhile, Collabora's Robert Foss is running the ported tests on a Raspberry Pi 2 and has started sending patches to account for its peculiarities.

The next two big chunks of work are abstracting CRC checksums of frames (on drivers other than i915 this could be done with Google's Chamelium or with a board similar to Numato Opsis), and the buffer management API from libdrm that is currently i915-only (bufmgr). Something that will have to be dealt with in the future is abstracting the submittal of specific loads on the GPU as that's currently very much driver-specific.

Additionally, I will be scheduling jobs in our LAVA instance to run these tests on the boards we have in there.

Thanks to Google for sponsoring my time, to the Intel OTC folks for their support and reviews, and to Collabora for sponsoring Robert's, Micah's and Daniel's time.