git.lttng.org Git - lttng-tools.git/commit

tests: perf: UNHALTED_REFERENCE_CYCLES might not be actionable on a host

This patch does NOT address the root problem and only addresses the
validation of the context to be added during the test suite.

Observed issue
==============

The system_tests jobs for master hang on the perf event test suites.

Cause
=====

The hang is caused by a cleanup problem (reference counting of the trace
chunk on session destroy/rotation) when the activation of a context
fails on a ust app channel.

Fixing that root problem is out of scope for this patch. A failure to
add a context must be handled correctly in all cases, but for this test
we only need to validate that the context can be added and skip the
tests as necessary based on the host.

The perf tests depend on the presence and accessibility of the
UNHALTED_REFERENCE_CYCLES PMU counter. This test suite was previously
run "manually" since it required the presence of, and access to, that
PMU. Now that the perf test suite is run on `make check` when libpfm is
present, we need to automate the discovery of UNHALTED_REFERENCE_CYCLES
and validate that we can access it.

There are three major scenarios where we want to skip the tests.

1) UNHALTED_REFERENCE_CYCLES is simply not present in the PMU sets for
that host.

2) UNHALTED_REFERENCE_CYCLES is present in the PMU sets but not
actionable. This can happen on qemu guests.

3) UNHALTED_REFERENCE_CYCLES is present but not accessible. This can
happen if the `/proc/sys/kernel/perf_event_paranoid` setting prevents
the usage of the PMU.
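
For the third scenario, the current restriction level can be read
directly from procfs. The following is a minimal sketch, assuming a
Linux host; the helper names `parse_paranoid_level` and
`read_paranoid_level` are hypothetical and are not part of the test
suite. How a given level restricts unprivileged users depends on the
kernel and distribution, so the sketch only reports the value and does
not interpret it.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the textual content of perf_event_paranoid (e.g. "2\n").
 * Returns 0 and stores the level in *level, or -1 on malformed input.
 * (Hypothetical helper, for illustration only.)
 */
int parse_paranoid_level(const char *text, int *level)
{
	char *end;
	long value;

	errno = 0;
	value = strtol(text, &end, 10);
	if (errno != 0 || end == text || value < INT_MIN || value > INT_MAX) {
		return -1;
	}

	*level = (int) value;
	return 0;
}

/*
 * Read the level from procfs.
 * Returns 0 on success, -1 if the file is missing or unreadable.
 */
int read_paranoid_level(int *level)
{
	char buf[32];
	FILE *fp = fopen("/proc/sys/kernel/perf_event_paranoid", "r");
	int ret = -1;

	if (!fp) {
		return -1;
	}

	if (fgets(buf, sizeof(buf), fp)) {
		ret = parse_paranoid_level(buf, level);
	}

	fclose(fp);
	return ret;
}
```

Reading the sysctl only hints at why access may be refused; actually
trying to open the counter remains the authoritative check.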

Solution
========

Two problems were found with `find_event.c`.

1) It took the first event matching the passed name even if it was in a
PMU not supported by the host. In our use case it worked since the only
platform that does not use `r300` is not currently in our testing set.

-> = PMU set currently chosen
-* = The correct PMU set

e.g.:
->  Intel Core
      r300
    Intel Atom
      r300
    Intel Nehalem
      r300
    Intel Nehalem EX
      r300
    Intel X86 architectural PMU
      r13c
    ...
-*  Intel Skylake
      r300

On my system, only the following PMUs are "detected" by the libpfm
example found at [1].

   [18, ix86arch, "Intel X86 architectural PMU"]
   [51, perf, "perf_events generic PMU"]
   [110, rapl, "Intel RAPL"]
   [114, perf_raw, "perf_events raw PMU"]
   [200, skl, "Intel Skylake"]

Hence the `skl` PMU set should be used.

2) libpfm does not validate whether the event is actually usable on the
host.

To fix those two problems, we use pfm_get_os_event_encoding and
perf_event_open.

pfm_get_os_event_encoding [2] is responsible for performing the query
across valid PMU sets and encoding it to the perf struct format.

perf_event_open [3] is used to validate that the event can be used. It
tests the availability on the running host and the accessibility of the
PMU.

Based on the result of `find_event` the tests are skipped or failed as
necessary.
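
The validation step described above can be sketched as follows. This is
an illustration under stated assumptions, not the code of
`find_event.c`: in the patch the `perf_event_attr` is filled by
pfm_get_os_event_encoding, whereas here a raw event code is passed in
directly (e.g. 0x300, from the `r300` entries listed earlier) so that
the sketch does not depend on libpfm. The names `probe_result`,
`classify_open_errno`, `init_raw_attr` and `probe_event` are
hypothetical, and the errno grouping is a simplification.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

enum probe_result {
	PROBE_USABLE,         /* the counter could be opened */
	PROBE_NOT_ACCESSIBLE, /* refused for permission reasons */
	PROBE_NOT_USABLE,     /* any other failure: treat as not actionable */
};

/*
 * Map the errno of a failed perf_event_open() call to a probe result.
 * Assumption: permission errors are reported as EACCES or EPERM; every
 * other error is lumped together as "not usable on this host".
 */
enum probe_result classify_open_errno(int err)
{
	switch (err) {
	case EACCES:
	case EPERM:
		return PROBE_NOT_ACCESSIBLE;
	default:
		return PROBE_NOT_USABLE;
	}
}

/* Describe a raw hardware event, counting user space only. */
void init_raw_attr(struct perf_event_attr *attr, uint64_t config)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_RAW;
	attr->config = config;
	attr->disabled = 1;
	attr->exclude_kernel = 1;
	attr->exclude_hv = 1;
}

/*
 * Try to open the counter for the calling thread (pid 0) on any CPU
 * (cpu -1), without a group leader. The descriptor is closed right
 * away: only the ability to open it matters here.
 */
enum probe_result probe_event(struct perf_event_attr *attr)
{
	long fd = syscall(SYS_perf_event_open, attr, 0, -1, -1, 0UL);

	if (fd < 0) {
		return classify_open_errno(errno);
	}

	close((int) fd);
	return PROBE_USABLE;
}
```

A caller would run `init_raw_attr(&attr, 0x300)` followed by
`probe_event(&attr)` and skip the tests on anything other than
`PROBE_USABLE`.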

Known drawbacks
===============

The only drawback is that the tests, despite having libpfm as a
dependency, are not guaranteed to run on all hosts. There is not much we
can do here. We can only validate that they are indeed run on our CI,
most probably using lava hardware-based workers.

References
==========

[1] https://sourceforge.net/p/perfmon2/libpfm4/ci/288483932c3eb83202b0d8762aa0ed8534982c3f/tree/examples/check_events.c
[2] https://man7.org/linux/man-pages/man3/pfm_get_os_event_encoding.3.html
[3] https://man7.org/linux/man-pages/man2/perf_event_open.2.html

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Iea7794dc28d019953930992a2237a1b606368d1f

author	Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
	Mon, 15 Mar 2021 15:25:07 +0000 (11:25 -0400)
committer	Jérémie Galarneau <jeremie.galarneau@efficios.com>
	Mon, 15 Mar 2021 19:10:52 +0000 (15:10 -0400)
commit	65702b8f172b8d2156ab1889f7e7c1b134114ec1
tree	dcfb9d92ba6b3c055c8e062fdb10dcfa0c2c7023
parent	f7b7d4fc6333c8be9a8abd9a978d3922f13bb2ab

tests/perf/find_event.c
tests/perf/test_perf_raw.in