tests: perf: UNHALTED_REFERENCE_CYCLES might not be actionable on a host
Observed issue
==============
The system_tests jobs for master hang on the perf event test suites.
Cause
=====
The hang is caused by a cleanup problem (reference counting of the trace
chunk on session destroy/rotation) when the activation of a context
fails on a ust app channel.
This patch does NOT address the root problem and only addresses the
validation of the context to be added during the test suite. In all
cases we need to handle a context activation failure, but for this test
we need to validate that the context can be added and skip the tests as
necessary based on the host.
The perf tests depend on the presence and accessibility of the
UNHALTED_REFERENCE_CYCLES PMU counter. This test suite was previously
run "manually" since it required the presence of, and access to, that
PMU counter. Now that the perf test suite is run on `make check` when
libpfm is present, we need to automate the discovery of
UNHALTED_REFERENCE_CYCLES and validate that we can access it.
There are three major scenarios where we want to skip the tests.
1) UNHALTED_REFERENCE_CYCLES is simply not present in the PMU sets for
that host.
2) UNHALTED_REFERENCE_CYCLES is present in the PMU sets but not
actionable. This can happen on qemu guests.
3) UNHALTED_REFERENCE_CYCLES is present but not accessible. This can
happen if the `/proc/sys/kernel/perf_event_paranoid` setting prevents
the usage of the PMU.
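The three scenarios can be told apart with a small classification step.
The sketch below is illustrative only and is not the code of
`find_event.c`: the enum, the function name and the errno mapping are
all assumptions. It treats a failed name lookup as "not present",
EACCES/EPERM from perf_event_open as "not accessible", and any other
open failure as "not actionable".

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical result codes, one per scenario described above. */
enum probe_result {
	PROBE_USABLE,          /* the counter can be used: run the tests */
	PROBE_NOT_PRESENT,     /* 1) not in any PMU set of this host */
	PROBE_NOT_ACTIONABLE,  /* 2) listed, but cannot be opened */
	PROBE_NOT_ACCESSIBLE,  /* 3) listed, but permission is denied */
};

/*
 * found_by_lookup: whether the event name was resolved by the lookup.
 * open_errno: 0 if the event could be opened, else the errno reported.
 *
 * Assumption: permission problems surface as EACCES or EPERM; every
 * other failure is treated as "not actionable".
 */
enum probe_result classify_probe(bool found_by_lookup, int open_errno)
{
	if (!found_by_lookup) {
		return PROBE_NOT_PRESENT;
	}

	switch (open_errno) {
	case 0:
		return PROBE_USABLE;
	case EACCES:
	case EPERM:
		return PROBE_NOT_ACCESSIBLE;
	default:
		return PROBE_NOT_ACTIONABLE;
	}
}
```

A test script could then skip the suite for every result other than
PROBE_USABLE and report the reason to the user.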
Solution
========
Two problems were found with `find_event.c`.
1) It took the first event matching the passed name even if it was in a
PMU not supported by the host. In our use case it worked since the only
platform that does not use `r300` is not currently in our testing set.
  -> = PMU set currently chosen
  -* = The correct PMU set
  e.g.:
  -> Intel Core
       r300
     Intel Atom
       r300
     Intel Nehalem
       r300
     Intel Nehalem EX
       r300
     Intel X86 architectural PMU
       r13c
     ...
  -* Intel Skylake
       r300
On my system only the following PMU sets are "detected" by the libpfm
example found at [1].
[18, ix86arch, "Intel X86 architectural PMU"]
[51, perf, "perf_events generic PMU"]
[110, rapl, "Intel RAPL"]
[114, perf_raw, "perf_events raw PMU"]
[200, skl, "Intel Skylake"]
Hence the `skl` PMU set should be used.
2) libpfm does not perform any validation as to whether the event is
actually usable.
To fix those two problems, we use pfm_get_os_event_encoding and
perf_event_open.
pfm_get_os_event_encoding [2] is responsible for performing the query
across valid PMU sets and encoding the event to the perf struct format.
perf_event_open [3] is used to validate that the event can be used. It
tests the availability on the running host and the accessibility of the
PMU.
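The validation step can be sketched as follows. This is a minimal
illustration and not the actual `find_event.c` code: the function name
is hypothetical, and the event type and config are passed in directly
so that the example does not need libpfm (in the patch, the attribute
struct is filled by pfm_get_os_event_encoding). glibc provides no
wrapper for perf_event_open, hence the raw syscall.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open the event on the calling thread.
 * Returns 0 if the event can be opened, or a negative errno value.
 */
int probe_perf_event(uint32_t type, uint64_t config)
{
	struct perf_event_attr attr;
	long fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = type;
	attr.config = config;
	/* We only want to know if opening works, not to count anything. */
	attr.disabled = 1;

	/* pid = 0 and cpu = -1: the calling thread, on any CPU. */
	fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
	if (fd < 0) {
		return -errno;
	}

	close((int) fd);
	return 0;
}
```

A caller would typically pass PERF_TYPE_RAW together with the raw event
code, and treat any negative return value as a reason to skip the tests
on that host.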
Based on the result of `find_event` the tests are skipped or failed as
necessary.
Known drawbacks
===============
The only drawback is that the tests, albeit having libpfm as a
dependency, are not guaranteed to run on all hosts. There is not much we
can do here. We can only validate that they are indeed run on our CI,
most probably using lava hardware-based workers.
References
==========
[1] https://sourceforge.net/p/perfmon2/libpfm4/ci/288483932c3eb83202b0d8762aa0ed8534982c3f/tree/examples/check_events.c
[2] https://man7.org/linux/man-pages/man3/pfm_get_os_event_encoding.3.html
[3] https://man7.org/linux/man-pages/man2/perf_event_open.2.html
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Iea7794dc28d019953930992a2237a1b606368d1f