git.lttng.org Git - lttng-tools.git/commit - src/bin/lttng-sessiond/main.c

Fix: Multiple health monitoring fixes

* Fix modulo operation bug in
  #define HEALTH_IS_IN_CODE(x) (x % HEALTH_POLL_VALUE), which causes
  the check to think it is never within code (x % 1 always equals 0).
  Simplify this by using a plain & on the poll value, and remove
  IS_IN_CODE, using ! on IS_IN_POLL instead (which takes nothing away
  from clarity).

* Atomic operations should apply to at most "unsigned long" (32-bit on
  32-bit arch) rather than uint64_t.

* Separate the "error" condition from the counters. We clearly cannot
  use the "0" value as an error on 32-bit counters anymore, because they
  can easily wrap.

* Introduce an "exit" condition, which will be useful for state
  tracking in the future. The error and exit conditions are implemented
  as flags.

* Add an "APP_MANAGE" health check in addition to "APP_REG", so that
  the app registration thread is monitored as well. It was previously
  missing: only the app manager thread was checked, under the
  misleading name "APP_REG".

* Remove bogus usage of uatomic_xchg() in health_check_state():
  An atomic exchange is not needed to update the "last" value, since
  that value is read and written by a single thread. Moreover, this
  specific use of xchg was not exchanging anything: it was just setting
  the last value to the "current" one and ignoring the return value.
  Whatever was expected to be achieved by using uatomic_xchg() clearly
  wasn't.

* Because the health check thread could still be answering a request
  concurrently with sessiond teardown, we need to ensure that all
  threads only set the "error" condition if they reach teardown paths
  due to an actual error, not on a "normal" teardown (thread quit pipe
  being closed). Flagging threads as being in error condition upon all
  exit paths would lead to false "errors" sent to the client, which we
  want to avoid, since the client could then think it needs to kill a
  sessiond when the sessiond might be in the process of gracefully
  restarting.
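
For illustration only, here is a minimal sketch of how the points above
could fit together. It is not the code from this commit. It uses C11
atomics as a stand-in for the uatomic_*() operations, and apart from
HEALTH_POLL_VALUE, HEALTH_IS_IN_POLL and health_check_state(), every
name and the struct layout are assumptions made for the example. The
poll value of 1 is taken from the "x % 1" remark; the idea that poll
entry and exit each add 1 while code progress adds 2 is assumed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Poll value of 1 follows from the "x % 1" remark; the rest is assumed. */
#define HEALTH_POLL_VALUE	(1UL << 0)
#define HEALTH_CODE_VALUE	(1UL << 1)	/* hypothetical */

/* Bit 0 of the counter is set while the thread is blocked in poll. */
#define HEALTH_IS_IN_POLL(x)	((x) & HEALTH_POLL_VALUE)

/* Shape of the removed check: x % 1 is 0 for every x. */
#define OLD_HEALTH_IS_IN_CODE(x)	((x) % HEALTH_POLL_VALUE)

/* Hypothetical flag names: conditions live apart from the counter. */
enum health_flags {
	HEALTH_EXIT	= 1U << 0,
	HEALTH_ERROR	= 1U << 1,
};

struct health_state {
	unsigned long last;	/* only touched by the checking thread */
	atomic_ulong current;	/* unsigned long, free to wrap */
	atomic_uint flags;
};

/* Called on poll entry and on poll exit: toggles bit 0. */
static void health_poll_update(struct health_state *state)
{
	assert(state);
	atomic_fetch_add(&state->current, HEALTH_POLL_VALUE);
}

/* Called when making progress in code: bit 0 is left unchanged. */
static void health_code_update(struct health_state *state)
{
	assert(state);
	atomic_fetch_add(&state->current, HEALTH_CODE_VALUE);
}

/* Set only when a teardown path is reached because of a real error. */
static void health_error(struct health_state *state)
{
	assert(state);
	atomic_fetch_or(&state->flags, HEALTH_ERROR);
}

/* Set on a normal exit; not reported as a failure by the check. */
static void health_exit(struct health_state *state)
{
	assert(state);
	atomic_fetch_or(&state->flags, HEALTH_EXIT);
}

/* Returns 1 if the thread looks healthy, 0 otherwise. */
static int health_check_state(struct health_state *state)
{
	unsigned long current, last;
	int ret = 1;

	assert(state);
	last = state->last;
	current = atomic_load(&state->current);

	if (atomic_load(&state->flags) & HEALTH_ERROR) {
		ret = 0;	/* explicit error flag */
	} else if (!HEALTH_IS_IN_POLL(current) && current == last) {
		ret = 0;	/* in code, no progress since last check */
	}

	/* Plain store: "last" has a single reader and writer. */
	state->last = current;
	return ret;
}
```

In this sketch a counter value of 0 carries no special meaning, so a
wrapped counter cannot be mistaken for an error, and a thread that sets
only the exit flag is still reported as healthy.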

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>

author	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
	Mon, 23 Jul 2012 18:00:42 +0000 (14:00 -0400)
committer	David Goulet <dgoulet@efficios.com>
	Tue, 24 Jul 2012 16:01:39 +0000 (12:01 -0400)
commit	139ac87245fd1ca18d60a0efca32b50e4c1d8730
tree	af560cdb8d1d8c09821a272f75454f9ccdd28312
parent	6e3c5836f180eeee21271242f707f4b88a840570

include/lttng/lttng.h
src/bin/lttng-sessiond/health.c
src/bin/lttng-sessiond/health.h
src/bin/lttng-sessiond/main.c