tests: perf: UNHALTED_REFERENCE_CYCLES might not be actionable on a host
author Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Mon, 15 Mar 2021 15:25:07 +0000 (11:25 -0400)
committer Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mon, 15 Mar 2021 19:10:52 +0000 (15:10 -0400)
This patch does NOT address the root problem and only addresses the
validation of the context to be added during the test suite.

Observed issue
==============

The system_tests jobs for master hang on the perf event test suites.

Cause
=====

The hang is caused by a cleanup problem (reference counting of the trace
chunk on session destroy/rotation) when the activation of a context
fails on a ust app channel.

This patch does NOT address the root problem and only addresses the
validation of the context to be added during the test suite. In all
cases we need to handle a context that fails to be added, but for this
test we need to validate that the context can be added and skip the
tests as necessary based on the host.

The perf tests depend on the presence and accessibility of the
UNHALTED_REFERENCE_CYCLES PMU counter. This test suite was previously
run "manually" since it required the presence of, and access to, that
PMU counter. Now that the perf test suite is run on `make check` when
libpfm is present, we need to automate the discovery of
UNHALTED_REFERENCE_CYCLES and validate that we can access it.

There are three major scenarios where we want to skip the tests.

1) UNHALTED_REFERENCE_CYCLES is simply not present in the PMU sets for
that host.

2) UNHALTED_REFERENCE_CYCLES is present in the PMU sets but not
actionable. This can happen on qemu guests.

3) UNHALTED_REFERENCE_CYCLES is present but not accessible. This can
happen if the `/proc/sys/kernel/perf_event_paranoid` setting prevents
the usage of the PMU.
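
For scenario 3, the current setting can be inspected before running the
suite. The following is a minimal sketch and not part of this patch:
`perf_paranoid_level` is a hypothetical helper name, and its optional
path argument exists only to make it testable. It only reports the
value. Whether a given level actually blocks the counter depends on the
kernel and on the privileges of the user, which is why the patch relies
on perf_event_open rather than on this value.

```shell
#!/bin/bash
# Hypothetical helper (not part of the patch): print the value of
# perf_event_paranoid, or "unknown" when the file cannot be read.
perf_paranoid_level()
{
	# The path can be overridden; the default is the real sysctl file.
	local path="${1:-/proc/sys/kernel/perf_event_paranoid}"
	local level

	if [ -r "$path" ] && read -r level < "$path"; then
		echo "$level"
	else
		echo "unknown"
	fi
}
```

Calling `perf_paranoid_level` without arguments reads the host setting.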

Solution
========

Two problems were found with `find_event.c`.

 1) It took the first event matching the passed name even if it was in a
 PMU not supported by the host. In our use case it worked since the only
 platform that does not use `r300` is not currently in our testing set.

 -> = PMU set currently chosen
 -* = The correct PMU set

 e.g:
 -> Intel Core
      r300
    Intel Atom
      r300
    Intel Nehalem
      r300
    Intel Nehalem EX
      r300
    Intel X86 architectural PMU
      r13c
    ...
 -* Intel Skylake
      r300

 On my system only the following are "detected" as per the libpfm
 example found here [1].

   [18, ix86arch, "Intel X86 architectural PMU"]
   [51, perf, "perf_events generic PMU"]
   [110, rapl, "Intel RAPL"]
   [114, perf_raw, "perf_events raw PMU"]
   [200, skl, "Intel Skylake"]

 Hence the `skl` PMU set should be used.

 2) libpfm does not perform any validation as to whether the event is
 actually usable.

To fix those two problems, we use pfm_get_os_event_encoding and
perf_event_open.

pfm_get_os_event_encoding [2] is responsible for performing the query
across valid PMU sets and encoding it to the perf struct format.

perf_event_open [3] is used to validate that the event can be used. It
tests the availability on the running host and the accessibility of the
PMU.

Based on the result of `find_event` the tests are skipped or failed as
necessary.
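
The mapping between the `find_event` return code and the test outcome
described above can be sketched as follows. `find_event_outcome` is a
hypothetical helper used for illustration only. The test script itself
calls the `pass`, `skip` and `fail` helpers directly.

```shell
#!/bin/bash
# Illustrative only: translate the exit status of find_event into the
# action taken by the test suite.
find_event_outcome()
{
	case "$1" in
	0) echo "run" ;;                             # found and actionable
	1) echo "skip: PMU event not found" ;;
	2) echo "skip: PMU event not actionable" ;;
	*) echo "fail: find_event returned $1" ;;    # e.g. 255 on error
	esac
}
```

For example, `find_event_outcome 2` prints the "not actionable" skip
message, which corresponds to the qemu guest and perf_event_paranoid
scenarios.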

Known drawbacks
===============

The only drawback is that the tests, albeit having libpfm as a
dependency, are not guaranteed to run on all hosts. There is not much we
can do here. We can only validate that they are indeed run on our CI,
most probably using lava hardware-based workers.

References
==========

[1] https://sourceforge.net/p/perfmon2/libpfm4/ci/288483932c3eb83202b0d8762aa0ed8534982c3f/tree/examples/check_events.c
[2] https://man7.org/linux/man-pages/man3/pfm_get_os_event_encoding.3.html
[3] https://man7.org/linux/man-pages/man2/perf_event_open.2.html

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Iea7794dc28d019953930992a2237a1b606368d1f

tests/perf/find_event.c
tests/perf/test_perf_raw.in

index 38ac6c139c2edd6a163069e68974f0acc2cd809c..aa1c964c399180ce95dc769ce5f1cfa404bc7536 100644 (file)
  *
  */
 
+#include <errno.h>
 #include <stdio.h>
-#include <perfmon/pfmlib.h>
 #include <string.h>
 
+#include <linux/perf_event.h>
+#include <perfmon/perf_event.h>
+#include <perfmon/pfmlib_perf_event.h>
+
 int main(int argc, char **argv)
 {
-       int ret, i;
-       unsigned int j;
-       pfm_pmu_info_t pinfo;
+       int ret, fd;
+
+       /* pfm query objects */
+       pfm_perf_encode_arg_t pencoder;
+       pfm_event_info_t info;
+
+       /* Perf event object to be populated by libpfm */
+       struct perf_event_attr attr;
 
        if (argc != 2) {
                fprintf(stderr, "Usage: %s <pmu counter to find>\n"
                                "ex: %s UNHALTED_REFERENCE_CYCLES\n"
-                               "Returns the first occurence it finds with "
+                               "Returns the event raw number if found and actionable with "
                                "return code 0.\n"
-                               "If not found returns 1, on error returns -1\n",
+                               "If not found returns 1, "
+                               "if not actionable returns 2, "
+                               "on error returns 255\n",
                                argv[0], argv[0]);
                ret = -1;
                goto end;
        }
 
-       memset(&pinfo, 0, sizeof(pinfo));
-       pinfo.size = sizeof(pinfo);
+       /* Initialize perf_event_attr. */
+       memset(&attr, 0, sizeof(struct perf_event_attr));
+
+       /* Initialize libpfm encoder structure. */
+       memset(&pencoder, 0, sizeof(pencoder));
+       pencoder.size = sizeof(pfm_perf_encode_arg_t);
+
+       /* Initialize libpfm event info structure. */
+       memset(&info, 0, sizeof(info));
+       info.size = sizeof(info);
+
+       /* Prepare the encoder for query. */
+       pencoder.attr = &attr; /* Set the perf_event_attr pointer. */
+       pencoder.fstr = NULL; /* Not interested by the fully qualified event string. */
 
        ret = pfm_initialize();
        if (ret != PFM_SUCCESS) {
                fprintf(stderr, "Failed to initialise libpfm: %s",
                                pfm_strerror(ret));
-               ret = -1;
+               ret = 255;
+               goto end;
+       }
+
+       ret = pfm_get_os_event_encoding(argv[1],
+                       PFM_PLM0 | PFM_PLM1 | PFM_PLM2 | PFM_PLM3,
+                       PFM_OS_PERF_EVENT, &pencoder);
+       if (ret != PFM_SUCCESS) {
+               fprintf(stderr, "libpfm: error pfm_get_os_event_encoding: %s\n",
+                               pfm_strerror(ret));
+               ret = 1;
+               goto end;
+       }
+
+       /*
+        * Query the raw code for later use. Do it now to simplify error
+        * management.
+        */
+       ret = pfm_get_event_info(pencoder.idx, PFM_OS_NONE, &info);
+       if (ret != PFM_SUCCESS) {
+               fprintf(stderr, "libpfm: error pfm_get_event_info: %s\n", pfm_strerror(ret));
+               ret = 1;
                goto end;
        }
 
-       pfm_for_all_pmus(j) {
-               ret = pfm_get_pmu_info(j, &pinfo);
-               if (ret != PFM_SUCCESS) {
-                       continue;
-               }
-
-               for (i = pinfo.first_event; i != -1; i = pfm_get_event_next(i)) {
-                       pfm_event_info_t info =
-                                       { .size = sizeof(pfm_event_info_t) };
-
-                       ret = pfm_get_event_info(i, PFM_OS_NONE, &info);
-                       if (ret != PFM_SUCCESS) {
-                               fprintf(stderr, "Cannot get event info: %s\n",
-                                               pfm_strerror(ret));
-                               ret = -1;
-                               goto end;
-                       }
-
-                       if (info.pmu != j) {
-                               continue;
-                       }
-
-                       if (strcmp(info.name, argv[1]) == 0) {
-                               fprintf(stdout, "r%" PRIx64 "\n", info.code);
-                               ret = 0;
-                               goto end;
-                       }
-               }
+       /*
+        * Now that the event is found, try to use it to validate that
+        * the current user has access to it and that it can be used on that
+        * host.
+        */
+
+       /* Set the event to disabled to prevent unnecessary side effects. */
+       pencoder.attr->disabled = 1;
+
+       /* perf_event_open is provided by perfmon/perf_event.h. */
+       fd = perf_event_open(pencoder.attr, 0, -1, -1, 0);
+       if (fd == -1) {
+               fprintf(stderr, "perf: error perf_event_open: %d: %s\n", errno,
+                               strerror(errno));
+               ret = 2;
+               goto end;
        }
 
-       ret = 1;
+       /* We close the fd immediately since the event is actionable. */
+       close(fd);
+
+       /* Output the raw code for the event */
+       fprintf(stdout, "r%" PRIx64 "\n", info.code);
+       ret = 0;
 
 end:
        return ret;
index 550c0e9a3dc3840f9b68ae85011caebed9c1f55e..8138c25b49c5d8cec148e44a62ab3e001080f18f 100644 (file)
@@ -40,14 +40,29 @@ function have_libpfm()
 
 function test_ust_raw()
 {
-       TRACE_PATH=$(mktemp -d)
-       SESSION_NAME="ust_perf"
-       CHAN_NAME="mychan"
-       EVENT_NAME="tp:tptest"
-       PMU="UNHALTED_REFERENCE_CYCLES"
-       PERFID=$($CURDIR/find_event $PMU)
-       test $? -eq "0"
-       ok $? "Find PMU $PMU"
+       local TRACE_PATH=$(mktemp -d)
+       local SESSION_NAME="ust_perf"
+       local CHAN_NAME="mychan"
+       local EVENT_NAME="tp:tptest"
+       local PMU="UNHALTED_REFERENCE_CYCLES"
+       local tests_to_skip=9
+       local ret
+
+       # Find the raw perf id of the event.
+       PERFID=$("$CURDIR/find_event" "$PMU")
+       ret=$?
+       if [ "$ret" -eq "0" ]; then
+               pass "Find PMU $PMU"
+       elif [ "$ret" -eq "1" ]; then
+               skip 0 "PMU event not found." $tests_to_skip
+               return
+       elif [ "$ret" -eq "2" ]; then
+               skip 0 "PMU event not actionable." $tests_to_skip
+               return
+       else
+               fail "find_event returned $ret."
+               return
+       fi
 
        create_lttng_session_ok $SESSION_NAME $TRACE_PATH
 
@@ -72,14 +87,30 @@ function test_ust_raw()
 
 function test_kernel_raw()
 {
-       TRACE_PATH=$(mktemp -d)
-       SESSION_NAME="kernel_perf"
-       CHAN_NAME="mychan"
-       EVENT_NAME="lttng_test_filter_event"
-       PMU="UNHALTED_REFERENCE_CYCLES"
-       PERFID=$($CURDIR/find_event $PMU)
-       test $? -eq "0"
-       ok $? "Find PMU $PMU"
+       local TRACE_PATH=$(mktemp -d)
+       local SESSION_NAME="kernel_perf"
+       local CHAN_NAME="mychan"
+       local EVENT_NAME="lttng_test_filter_event"
+       local PMU="UNHALTED_REFERENCE_CYCLES"
+       local PERFID=""
+       local tests_to_skip=9
+       local ret
+
+       # Find the raw perf id of the event.
+       PERFID=$("$CURDIR/find_event" "$PMU")
+       ret=$?
+       if [ "$ret" -eq "0" ]; then
+               pass "Find PMU $PMU"
+       elif [ "$ret" -eq "1" ]; then
+               skip 0 "PMU event not found." $tests_to_skip
+               return
+       elif [ "$ret" -eq "2" ]; then
+               skip 0 "PMU event not actionable." $tests_to_skip
+               return
+       else
+               fail "find_event returned $ret."
+               return
+       fi
 
        create_lttng_session_ok $SESSION_NAME $TRACE_PATH
 