lttng-tools.git
9 years agoFix: sessiond vs consumerd push/get metadata deadlock
Mathieu Desnoyers [Wed, 19 Aug 2015 21:44:59 +0000 (14:44 -0700)] 
Fix: sessiond vs consumerd push/get metadata deadlock

We need to unlock the registry while we push metadata to break a
circular dependency between the consumerd metadata lock and the sessiond
registry lock. Pushing metadata to the consumerd blocks until it has
been pushed all the way to relayd, and doing so requires grabbing the
metadata lock. If a concurrent metadata request is being performed by
consumerd, this can try to grab the registry lock on the sessiond while
holding the metadata lock on the consumer daemon. Those push and pull
schemes are performed on two different bidirectional communication
sockets.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: streamline ret/errno of run_as()
Mathieu Desnoyers [Wed, 19 Aug 2015 21:13:48 +0000 (14:13 -0700)] 
Fix: streamline ret/errno of run_as()

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
 Conflicts:
src/bin/lttng-sessiond/ust-registry.c
src/common/runas.c
src/common/ust-consumer/ust-consumer.c

9 years agoFix: Double unlock on error path
Mathieu Desnoyers [Thu, 3 Sep 2015 03:01:21 +0000 (23:01 -0400)] 
Fix: Double unlock on error path

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Relay daemon ownership and reference counting
Mathieu Desnoyers [Thu, 3 Sep 2015 02:57:40 +0000 (22:57 -0400)] 
Fix: Relay daemon ownership and reference counting

The ownership and reference counting of the relay daemon is unclear and
buggy in many ways. It is the cause of memory corruptions, double-free,
leaks, segmentation faults, observed in various conditions.

Fix this situation by introducing a clear ownership and reference
counting scheme for this daemon.

See doc/relayd-architecture.txt for details.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
 Conflicts:
src/bin/lttng-relayd/ctf-trace.c
src/bin/lttng-relayd/lttng-relayd.h
src/bin/lttng-relayd/main.c

9 years agoAdd run_as_unlink implementation
Jérémie Galarneau [Thu, 24 Sep 2015 01:47:13 +0000 (21:47 -0400)] 
Add run_as_unlink implementation

run_as_unlink() is used by
Fix: Relay daemon ownership and reference counting

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoCleanup: Reduce scope of relayd connections in live thread
Jérémie Galarneau [Fri, 27 Feb 2015 16:34:04 +0000 (11:34 -0500)] 
Cleanup: Reduce scope of relayd connections in live thread

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Remove structurally dead code from relayd
Jérémie Galarneau [Thu, 8 Jan 2015 23:06:20 +0000 (18:06 -0500)] 
Fix: Remove structurally dead code from relayd

CID 1262070:  Structurally dead code  (UNREACHABLE)

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoCleanup: relayd: centralize thread stopping function
Mathieu Desnoyers [Thu, 18 Dec 2014 18:02:07 +0000 (13:02 -0500)] 
Cleanup: relayd: centralize thread stopping function

Rather than relying on having main.c and live.c threads both using the
same notification pipe from different stop_thread implementations,
centralize thread stop in one central function exposed to both main.c
and live.c

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoRefactor relayd main/set_options/cleanup
Mathieu Desnoyers [Thu, 18 Dec 2014 01:45:24 +0000 (20:45 -0500)] 
Refactor relayd main/set_options/cleanup

- Enforce symmetry between allocation and teardown,
- Handle all errors,
- Return all errors as EXIT_FAILURE,
- Standardize on zero being success, nonzero being error,
  (rather than < 0 being error),
- Fix pthread PERROR: we need to store ret into errno before
  calling PERROR, since pthread API does not set errno,
- Join errors now fall through, rather than relying on the OS
  to tear down the rest.
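
The pthread point in the list above can be illustrated with a small sketch. It assumes plain `perror()` as a stand-in for the project's PERROR macro; `report_pthread_error()` and `trylock_twice()` are hypothetical helpers.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/*
 * pthread functions return the error number directly and do not set
 * errno, so copy the return value into errno before calling perror().
 */
static int report_pthread_error(const char *what, int ret)
{
	if (ret != 0) {
		errno = ret;
		perror(what);
	}
	return ret;
}

/* Returns zero on success, nonzero (a positive error number) on error. */
static int trylock_twice(void)
{
	pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
	int ret;

	pthread_mutex_lock(&m);
	/* The second attempt fails and reports EBUSY as its return value. */
	ret = report_pthread_error("pthread_mutex_trylock",
			pthread_mutex_trylock(&m));
	pthread_mutex_unlock(&m);
	return ret;
}
```

Testing for nonzero rather than `< 0` matches the convention described in the list, since pthread error numbers are positive.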

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoCleanup: spaghetti function return path
Mathieu Desnoyers [Fri, 21 Nov 2014 17:36:36 +0000 (18:36 +0100)] 
Cleanup: spaghetti function return path

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoAccept uid and gid parameters in utils_mkdir()/utils_mkdir_recursive()
Jérémie Galarneau [Thu, 3 Sep 2015 19:09:00 +0000 (15:09 -0400)] 
Accept uid and gid parameters in utils_mkdir()/utils_mkdir_recursive()

utils_mkdir* utils may now be used in immediate or "run_as" mode.

This is done since some of the code shared between daemons calls
run_as directly, which doesn't support negative uid/gid (which we use
to mean "run as current user").

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: reference counting of consumer output
Mathieu Desnoyers [Wed, 19 Aug 2015 07:29:52 +0000 (00:29 -0700)] 
Fix: reference counting of consumer output

The UST app session has a reference on the consumer output object, but
it belongs to the UST session. Implement a refcounting scheme to ensure
it is not freed before all users are done using it.
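
A refcounting scheme of this kind might look like the sketch below. It uses C11 atomics and hypothetical names (`struct consumer_output` here is a stand-in, and the get/put helpers are illustrative); the actual patch may rely on a different primitive.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the consumer output object. */
struct consumer_output {
	atomic_int refcount;
};

static struct consumer_output *consumer_output_create(void)
{
	struct consumer_output *out = calloc(1, sizeof(*out));

	if (out) {
		atomic_init(&out->refcount, 1);	/* creator's reference */
	}
	return out;
}

/* Each additional user (e.g. an app session) takes its own reference. */
static void consumer_output_get(struct consumer_output *out)
{
	atomic_fetch_add(&out->refcount, 1);
}

/* Returns 1 if this call dropped the last reference and freed the object. */
static int consumer_output_put(struct consumer_output *out)
{
	if (atomic_fetch_sub(&out->refcount, 1) == 1) {
		free(out);
		return 1;
	}
	return 0;
}
```

With this shape, the owner and every borrower each call `consumer_output_put()` exactly once, and the object is released by whichever call happens last.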

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: sessiond add missing socket close
Mathieu Desnoyers [Tue, 18 Aug 2015 01:47:53 +0000 (18:47 -0700)] 
Fix: sessiond add missing socket close

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: sessiond should not error on channel creation vs app exit
Mathieu Desnoyers [Sun, 16 Aug 2015 21:56:57 +0000 (17:56 -0400)] 
Fix: sessiond should not error on channel creation vs app exit

We should not report an error when creating a channel if the application
is exiting concurrently.

Also, remove an inappropriate assert() in ust_app_create_event_glb: it
is possible to have a channel lookup fail if channel/event creation
occurs concurrently with an application exit.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: sessiond ust-app session teardown race
Mathieu Desnoyers [Sun, 16 Aug 2015 21:10:22 +0000 (17:10 -0400)] 
Fix: sessiond ust-app session teardown race

Add a deleted flag within the ust app session which is raised (with ust
app session lock held) at delete, and checked within each RCU traversal,
again with ust app session lock held.

This takes care of races between teardown of an application (unregister)
and execution of commands which are accessing the app session
concurrently.
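
The flag-under-lock pattern described above can be sketched as follows. The struct layout and function names are assumptions for illustration, and the RCU traversal that surrounds the check in the real code is left out.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, reduced version of a UST app session. */
struct app_session {
	pthread_mutex_t lock;
	bool deleted;
	int nr_channels;
};

/* Teardown path: raise the flag with the session lock held. */
static void app_session_mark_deleted(struct app_session *s)
{
	pthread_mutex_lock(&s->lock);
	s->deleted = true;
	pthread_mutex_unlock(&s->lock);
}

/* Command path: returns 0 on success, -1 if the session is going away. */
static int app_session_add_channel(struct app_session *s)
{
	int ret = 0;

	pthread_mutex_lock(&s->lock);
	if (s->deleted) {
		/* Teardown already started: do not touch the session. */
		ret = -1;
		goto end;
	}
	s->nr_channels++;
end:
	pthread_mutex_unlock(&s->lock);
	return ret;
}
```

Because both the write and the check happen under the same lock, a command either completes before teardown raises the flag or sees the flag and backs off.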

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoInitialize default log level of events on load
Jérémie Galarneau [Mon, 31 Aug 2015 22:53:51 +0000 (18:53 -0400)] 
Initialize default log level of events on load

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Memory leak of agent
Jérémie Galarneau [Sun, 30 Aug 2015 22:50:39 +0000 (18:50 -0400)] 
Fix: Memory leak of agent

agent_destroy() has a comment which indicates that it does _not_
destroy the pointer passed to it and it seems that agents are
never released under any code path whatsoever.

There does not seem to be an instance where an agent is allocated on
the stack.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Memory leak of agent event internals
Jérémie Galarneau [Sun, 30 Aug 2015 22:32:47 +0000 (18:32 -0400)] 
Fix: Memory leak of agent event internals

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: UTF-8 characters may be stored on up to 4 bytes
Jérémie Galarneau [Sun, 30 Aug 2015 21:43:45 +0000 (17:43 -0400)] 
Fix: UTF-8 characters may be stored on up to 4 bytes

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoDon't save log level in session configuration when unneeded
Jérémie Galarneau [Fri, 28 Aug 2015 18:53:26 +0000 (14:53 -0400)] 
Don't save log level in session configuration when unneeded

Saving the log level of events in session configurations when "ALL" log
levels are enabled may confuse both users and programs working with
session configurations.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: typo in error message
Jérémie Galarneau [Wed, 26 Aug 2015 16:04:12 +0000 (12:04 -0400)] 
Fix: typo in error message

writting -> writing

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoAdd agent domains to lttng enable-event usage()
Jérémie Galarneau [Wed, 26 Aug 2015 16:03:38 +0000 (12:03 -0400)] 
Add agent domains to lttng enable-event usage()

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoReport memory allocation failure when copying filter bytecode
Jérémie Galarneau [Wed, 26 Aug 2015 16:00:30 +0000 (12:00 -0400)] 
Report memory allocation failure when copying filter bytecode

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: cmd_enable_event must return positive error codes
Jérémie Galarneau [Wed, 26 Aug 2015 16:00:05 +0000 (12:00 -0400)] 
Fix: cmd_enable_event must return positive error codes

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoSave filter expressions as part of agent events
Jérémie Galarneau [Wed, 26 Aug 2015 15:37:48 +0000 (11:37 -0400)] 
Save filter expressions as part of agent events

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoAdd agent domains to the enable-event section of LTTNG(1)
Jérémie Galarneau [Wed, 26 Aug 2015 15:31:45 +0000 (11:31 -0400)] 
Add agent domains to the enable-event section of LTTNG(1)

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: set GLOBAL buffer type for kernel domain in list
Philippe Proulx [Fri, 21 Aug 2015 15:44:28 +0000 (11:44 -0400)] 
Fix: set GLOBAL buffer type for kernel domain in list

MI is using the list command response's buffer type, even when listing
the kernel domain. Not setting .buf_type here results in MI reporting a
wrong buffer type for the kernel domain.

Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: take RCU read-side lock within hash table functions
Mathieu Desnoyers [Thu, 6 Aug 2015 22:02:21 +0000 (18:02 -0400)] 
Fix: take RCU read-side lock within hash table functions

After review, a great many call sites miss the RCU read-side lock
when using the hash table modification functions. This is a case where
a slight performance degradation might be worthwhile if we can gain
a bit more stability. So instead of playing whack-a-mole, add the RCU
read-side lock in the hash table modification functions to ensure
protection from ABA.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: TimeoutExpired in Python tests not defined globally
Antoine Busque [Tue, 11 Aug 2015 20:29:30 +0000 (16:29 -0400)] 
Fix: TimeoutExpired in Python tests not defined globally

The `TimeoutExpired` exception is used in multiple locations
throughout the Python tests. However, it needs to be used as
`subprocess.TimeoutExpired` given that it is only defined in that
module.

Signed-off-by: Antoine Busque <abusque@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: initialization of ust_metadata_poll_pipe to garbage value
Jérémie Galarneau [Fri, 7 Aug 2015 21:01:37 +0000 (17:01 -0400)] 
Fix: initialization of ust_metadata_poll_pipe to garbage value

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix "allocator sizeof operand mismatch" warning
Jérémie Galarneau [Fri, 7 Aug 2015 20:17:02 +0000 (16:17 -0400)] 
Fix "allocator sizeof operand mismatch" warning

Addresses benign scan-build warning:
Result of 'realloc' is converted to a pointer of type 'char *',
which is incompatible with sizeof operand type 'char **'
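
One common way to avoid this class of warning is to derive the element size from the destination pointer itself. The helper below is a hypothetical sketch of that idiom, not the code touched by this commit.

```c
#include <stdlib.h>

/*
 * Appends str to a NULL-or-heap-allocated array of strings.
 * The array does not take ownership of str. Returns 0 or -1.
 */
static int append_string(char ***list, size_t *count, char *str)
{
	/* sizeof(*tmp) is sizeof(char *), the element type of the array. */
	char **tmp = realloc(*list, (*count + 1) * sizeof(*tmp));

	if (!tmp) {
		return -1;	/* *list is still valid and unchanged */
	}
	tmp[*count] = str;
	*list = tmp;
	(*count)++;
	return 0;
}
```

Writing `sizeof(*tmp)` instead of spelling out a type keeps the size expression correct even if the element type changes later.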

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Discard disable event command filter payload
Jérémie Galarneau [Thu, 6 Aug 2015 02:03:29 +0000 (22:03 -0400)] 
Fix: Discard disable event command filter payload

liblttng-ctl sends both the filter expression and filter bytecode
whenever lttng_disable_event_ext() is used _or_ when it is used
implicitly by lttng_disable_event() on an Agent domain (Log4j,
JUL or Python).

As of now, the session daemon ignores this filter payload.
However, on some rare occasions (the frequency of which depends
on the system's configuration and load), the second call to
sendmsg() done by liblttng-ctl could block and return an error
when the session daemon closed the socket (EPIPE).

This fix ensures the payload is received and discarded by the
session daemon, which in turn allows the client to handle the
session daemon's reply to the command.
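
Receiving and discarding a payload of known length could look like the sketch below. `discard_payload()` is a hypothetical helper and the fixed 256-byte scratch buffer is an arbitrary choice; the session daemon's real receive path is not shown in this log.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Reads and throws away exactly len bytes from sock. Returns 0 or -1. */
static int discard_payload(int sock, size_t len)
{
	char buf[256];

	while (len > 0) {
		size_t chunk = len < sizeof(buf) ? len : sizeof(buf);
		ssize_t ret = recv(sock, buf, chunk, 0);

		if (ret < 0) {
			if (errno == EINTR) {
				continue;	/* interrupted: retry */
			}
			return -1;
		}
		if (ret == 0) {
			return -1;	/* peer closed before sending it all */
		}
		len -= (size_t) ret;
	}
	return 0;
}
```

Draining the bytes keeps the stream aligned on the next message, so the sender's pending write can complete before the reply is read.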

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Use MSG_NOSIGNAL when calling sendmsg()
Jérémie Galarneau [Thu, 6 Aug 2015 01:26:34 +0000 (21:26 -0400)] 
Fix: Use MSG_NOSIGNAL when calling sendmsg()

Applications using the liblttng-ctl library are most probably
not expecting the SIGPIPE signal which can be triggered by
sendmsg() on a closed socket. Use the MSG_NOSIGNAL flag to
handle such cases gracefully.
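
A minimal sketch of a `sendmsg()` wrapper using this flag is shown below. `send_nosignal()` is a hypothetical name, and `MSG_NOSIGNAL` is assumed to be available (it is on Linux).

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Sends buf on sock; a closed peer yields -1/EPIPE instead of SIGPIPE. */
static ssize_t send_nosignal(int sock, const void *buf, size_t len)
{
	struct iovec iov;
	struct msghdr msg;
	ssize_t ret;

	memset(&msg, 0, sizeof(msg));
	iov.iov_base = (void *) buf;
	iov.iov_len = len;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	do {
		ret = sendmsg(sock, &msg, MSG_NOSIGNAL);
	} while (ret < 0 && errno == EINTR);
	return ret;
}
```

The caller then checks for `-1` with `errno == EPIPE` and handles the closed connection as an ordinary error.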

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: test_mi test
Mathieu Desnoyers [Thu, 6 Aug 2015 19:52:37 +0000 (15:52 -0400)] 
Fix: test_mi test

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Don't send agent disable event command twice
Jérémie Galarneau [Mon, 3 Aug 2015 20:45:00 +0000 (16:45 -0400)] 
Fix: Don't send agent disable event command twice

The session daemon sends a "disable event" command to agents for each
event, enabled or not, on session destroy. This had no adverse effect
on the Java agent since it suffered from an unrelated bug which ignored
any refcount decrement.

This fix bumps the command version to "1" to indicate that this behavior
is fixed on the session daemon's end.

Fixes #884

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: incorrect variable being checked in libc-wrapper test
Antoine Busque [Sat, 1 Aug 2015 19:20:50 +0000 (15:20 -0400)] 
Fix: incorrect variable being checked in libc-wrapper test

Signed-off-by: Antoine Busque <abusque@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Initialize global agent_apps_ht_by_sock on session daemon launch
Jérémie Galarneau [Thu, 30 Jul 2015 16:46:56 +0000 (12:46 -0400)] 
Fix: Initialize global agent_apps_ht_by_sock on session daemon launch

Reported-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: clean-up agent app hash table from the main sessiond thread
Jérémie Galarneau [Sat, 25 Jul 2015 21:48:12 +0000 (17:48 -0400)] 
Fix: clean-up agent app hash table from the main sessiond thread

The agent application hash table, which is allocated by the session
daemon's main thread, is free'd from the agent application registration
thread.

This leads to a number of interesting scenarios under which the agent
app registration thread may encounter an error, thus tearing itself down
and freeing the agent_apps_ht_by_sock hash table. Of course, nothing then
prevents the client processing thread from accessing this invalidated hash
table to list, enable or disable agent events which leads to crashes or
assertions hitting in ht_match_reg_uid().

However, it is not necessary for the agent app registration thread to
encounter an error for this to prove problematic. As shown in bug #893,
the session daemon's teardown will assert on a NULL key in
ht_match_reg_uid() whenever it is performed while a JUL, Log4J or Python
event is still enabled in a session. This happens because the session
daemon's clean-up triggers the destruction of all sessions. The destruction
of those sessions would access the free'd agent_apps_ht_by_sock to disable
the registered agent events.

Fixes #893

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: RCU read-side lock released too early in destroy_agent_app
Jérémie Galarneau [Sat, 25 Jul 2015 20:24:05 +0000 (16:24 -0400)] 
Fix: RCU read-side lock released too early in destroy_agent_app

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: misleading logging statement in agent_find_event
Jérémie Galarneau [Sat, 25 Jul 2015 19:54:19 +0000 (15:54 -0400)] 
Fix: misleading logging statement in agent_find_event

An _event_ is not found, not an agent.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Unhandled domain option condition in list_agent_events
Jérémie Galarneau [Fri, 24 Jul 2015 22:01:02 +0000 (18:01 -0400)] 
Fix: Unhandled domain option condition in list_agent_events

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Crash on lttng list -j/-l/-p when no events are present
Jérémie Galarneau [Fri, 24 Jul 2015 21:54:47 +0000 (17:54 -0400)] 
Fix: Crash on lttng list -j/-l/-p when no events are present

The lttng client will free an uninitialized pointer whenever the
lttng list command is invoked on a domain which involves the log4j,
JUL or Python agents.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Unbalanced rcu_read_unlock() on stream file creation failure
Jérémie Galarneau [Thu, 16 Jul 2015 17:04:05 +0000 (13:04 -0400)] 
Fix: Unbalanced rcu_read_unlock() on stream file creation failure

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Unbalanced rcu_read_unlock() on directory creation failure
Jérémie Galarneau [Thu, 16 Jul 2015 17:02:47 +0000 (13:02 -0400)] 
Fix: Unbalanced rcu_read_unlock() on directory creation failure

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Memory leak in relay_add_stream error path
Jérémie Galarneau [Thu, 16 Jul 2015 16:58:44 +0000 (12:58 -0400)] 
Fix: Memory leak in relay_add_stream error path

Failing to allocate a struct ctf_trace results in the leak of a stream's
path and channel name.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: set session should not set non-existent session
Partha Pratim Mukherjee [Sat, 25 Jul 2015 07:55:19 +0000 (13:25 +0530)] 
Fix: set session should not set non-existent session

set-session does not check the existence of a session before setting
it as the current session. Fix it so that it reports an error for a
non-existent session.

Fixes #885

Signed-off-by: Partha Pratim Mukherjee <ppm.floss@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Follow struct dirent allocation guidelines of READDIR(3)
Jérémie Galarneau [Tue, 14 Jul 2015 15:33:41 +0000 (11:33 -0400)] 
Fix: Follow struct dirent allocation guidelines of READDIR(3)

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoUpdate comments regarding the power of 2 constraint on sub-buffer sizes
Jonathan Rajotte [Fri, 10 Jul 2015 21:50:02 +0000 (17:50 -0400)] 
Update comments regarding the power of 2 constraint on sub-buffer sizes

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoBuild: add Flex version check
Jonathan Rajotte [Fri, 10 Jul 2015 21:03:18 +0000 (17:03 -0400)] 
Build: add Flex version check

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoBuild: add Bison version check
Jonathan Rajotte [Fri, 10 Jul 2015 21:03:17 +0000 (17:03 -0400)] 
Build: add Bison version check

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: handle sys_futex() FUTEX_WAIT interrupted by signal
Mathieu Desnoyers [Mon, 6 Jul 2015 21:28:34 +0000 (17:28 -0400)] 
Fix: handle sys_futex() FUTEX_WAIT interrupted by signal

We need to handle EINTR returned by sys_futex() FUTEX_WAIT, otherwise a
signal interrupting this system call could make sys_futex return too
early, and therefore cause a synchronization issue.
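
A retry loop of this kind might be sketched as follows, assuming Linux and the raw `futex` system call. `futex_wait_while()` is a hypothetical helper, not the function modified by this commit, and `__atomic_load_n` is a GCC/Clang builtin.

```c
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Blocks for as long as *futex == expected.
 * Returns 0 once the value differs, -1 on an unexpected error.
 */
static int futex_wait_while(int32_t *futex, int32_t expected)
{
	while (__atomic_load_n(futex, __ATOMIC_SEQ_CST) == expected) {
		if (syscall(SYS_futex, futex, FUTEX_WAIT, expected,
				NULL, NULL, 0) == 0) {
			continue;	/* woken up: re-check the value */
		}
		switch (errno) {
		case EINTR:	/* interrupted by a signal: wait again */
		case EAGAIN:	/* value changed before sleeping: re-check */
			continue;
		default:
			return -1;
		}
	}
	return 0;
}
```

The key point is that `EINTR` loops back to the wait instead of being treated as a wakeup, so a stray signal cannot release the waiter early.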

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: metadata push -EPIPE should be recoverable
Mathieu Desnoyers [Mon, 6 Jul 2015 16:21:06 +0000 (12:21 -0400)] 
Fix: metadata push -EPIPE should be recoverable

This return value can be caused by application terminating concurrently
(when using per-PID buffers), so it should not make the consumer
management thread exit.

CC: Aravind HT <aravind.ht@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: destroy session removes the default config file
Partha Pratim Mukherjee [Sun, 5 Jul 2015 19:31:15 +0000 (15:31 -0400)] 
Fix: destroy session removes the default config file

Destroy session command by default removes the default config file
without checking the current session. As a result when we call any
other command which expects a default session by calling
get_session_name() function, it fails.

This patch will fix this by checking that the default config file gets
removed only when destroy session is called with the current session.

Fixes: #887
Signed-off-by: Partha Pratim Mukherjee <ppm.floss@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoBuild: bump autoconf version requirement to 2.64
Jonathan Rajotte [Thu, 2 Jul 2015 22:55:32 +0000 (18:55 -0400)] 
Build: bump autoconf version requirement to 2.64

AC_INIT with package_url was introduced in AC 2.64

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Memory leak in setup of relayd_path
Jérémie Galarneau [Thu, 2 Jul 2015 22:55:17 +0000 (18:55 -0400)] 
Fix: Memory leak in setup of relayd_path

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: update liburcu URL
Jérémie Galarneau [Thu, 2 Jul 2015 22:25:28 +0000 (18:25 -0400)] 
Fix: update liburcu URL

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Memory allocated by xmlNodeGetContent() must be freed by xmlFree()
Jérémie Galarneau [Thu, 25 Jun 2015 16:42:48 +0000 (12:42 -0400)] 
Fix: Memory allocated by xmlNodeGetContent() must be freed by xmlFree()

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: get_cmdline_by_pid path length assumes a max pid of 65535
Jérémie Galarneau [Tue, 23 Jun 2015 21:27:31 +0000 (23:27 +0200)] 
Fix: get_cmdline_by_pid path length assumes a max pid of 65535

PROC(5) mentions that "On 64-bit systems, pid_max can be set to any
value up to 2^22 (PID_MAX_LIMIT, approximately 4 million)."

We use 32 bits for simplicity's sake.

Reported-by: Zhenyu Ren <zhenyu.ren@aliyun.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Mark MI and Config string declarations as extern
Jérémie Galarneau [Mon, 25 May 2015 16:10:37 +0000 (12:10 -0400)] 
Fix: Mark MI and Config string declarations as extern

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: modprobe.c: fix tmp_list memory leak
Philippe Proulx [Tue, 28 Apr 2015 21:09:16 +0000 (17:09 -0400)] 
Fix: modprobe.c: fix tmp_list memory leak

Reported-by: Hannes Weisbach <hannes.weisbach@mailbox.tu-dresden.de>
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: append_list_to_probes(): increment index
Philippe Proulx [Tue, 28 Apr 2015 21:08:58 +0000 (17:08 -0400)] 
Fix: append_list_to_probes(): increment index

Reported-by: Hannes Weisbach <hannes.weisbach@mailbox.tu-dresden.de>
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: live_test regression on large number of cpus
Mathieu Desnoyers [Tue, 28 Apr 2015 21:23:34 +0000 (17:23 -0400)] 
Fix: live_test regression on large number of cpus

Merge fixes from Babeltrace lttng-live plugin, especially about
incorrect use of send() and recv().

Can be triggered with 32 virtual processors visible on the system with
the root_regression test suite.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: set UST register timeout to -1 as test default
Mathieu Desnoyers [Tue, 28 Apr 2015 14:16:37 +0000 (10:16 -0400)] 
Fix: set UST register timeout to -1 as test default

On busy systems, it's possible to spuriously hit the default 3 seconds
timeout for UST registration to sessiond, thus causing tests to be flaky
on those systems.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoTests: Don't rely on implicit scalar expression dereference
Jérémie Galarneau [Thu, 23 Apr 2015 23:41:35 +0000 (19:41 -0400)] 
Tests: Don't rely on implicit scalar expression dereference

This silences an "experimental feature" warning when using Perl 5.20.2.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix python bindings' Makefile for out-of-tree builds
Simon Marchi [Tue, 14 Apr 2015 20:45:27 +0000 (16:45 -0400)] 
Fix python bindings' Makefile for out-of-tree builds

The references to the built archives should use top_builddir and not
top_srcdir, because that's where they are.

And new in V2, I got a new error:

  lttng_wrap.c:2970:25: fatal error: lttng/lttng.h: No such file or directory
   #include <lttng/lttng.h>

I think we are missing the -I$(top_srcdir)/include. I had not noticed this
previously, probably because I had an lttng/lttng.h in
/usr/local/include or /usr/include. Also, the other includes seem
unnecessary. This is not really related to out-of-tree builds though.

Signed-off-by: Simon Marchi <simon.marchi@polymtl.ca>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: zero memory passed to create channel kernel ioctl
Mathieu Desnoyers [Mon, 6 Apr 2015 16:16:11 +0000 (12:16 -0400)] 
Fix: zero memory passed to create channel kernel ioctl

Valgrind complains about uninitialized memory passed to ioctl.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: possible evaluation of garbage values in fini_validation_ctx()
Jérémie Galarneau [Thu, 26 Mar 2015 19:02:40 +0000 (15:02 -0400)] 
Fix: possible evaluation of garbage values in fini_validation_ctx()

Zero-out struct validation_ctx on creation.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Possible call to execvp with NULL argument on allocation failure
Jérémie Galarneau [Thu, 26 Mar 2015 18:57:50 +0000 (14:57 -0400)] 
Fix: Possible call to execvp with NULL argument on allocation failure

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoDocs: connection_find_by_sock() must be called with rcu_read_lock
Jérémie Galarneau [Fri, 27 Feb 2015 04:09:35 +0000 (23:09 -0500)] 
Docs: connection_find_by_sock() must be called with rcu_read_lock

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: test: log4j: missing static test files for dist and out of tree build
Jonathan Rajotte [Fri, 30 Jan 2015 22:04:57 +0000 (17:04 -0500)] 
Fix: test: log4j: missing static test files for dist and out of tree build

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: test: java-jul: missing static test files for dist and out of tree build
Jonathan Rajotte [Fri, 30 Jan 2015 22:04:56 +0000 (17:04 -0500)] 
Fix: test: java-jul: missing static test files for dist and out of tree build

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: out-of-tree build: missing xsd file for mi test execution
Jonathan Rajotte [Fri, 30 Jan 2015 18:30:11 +0000 (13:30 -0500)] 
Fix: out-of-tree build: missing xsd file for mi test execution

This needs to be backported to stable 2.6.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: out-of-tree build: missing xsd file for save-load test execution
Jonathan Rajotte [Fri, 30 Jan 2015 18:30:10 +0000 (13:30 -0500)] 
Fix: out-of-tree build: missing xsd file for save-load test execution

This needs to be backported to stable 2.6 and stable 2.5.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoUpdate version to v2.6.0 v2.6.0
Jérémie Galarneau [Mon, 26 Jan 2015 17:18:26 +0000 (12:18 -0500)] 
Update version to v2.6.0

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoTests: Run health check test_thread_ok as part of root_regression
Jérémie Galarneau [Mon, 26 Jan 2015 17:14:13 +0000 (12:14 -0500)] 
Tests: Run health check test_thread_ok as part of root_regression

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: deadlock between UST registry lock and consumer lock
Mathieu Desnoyers [Fri, 23 Jan 2015 16:29:00 +0000 (11:29 -0500)] 
Fix: deadlock between UST registry lock and consumer lock

Reorganize locking of ust registry and consumer socket communication.

commit ce34fcd0 "Fix: per-uid flush and ust registry locking" attempted
to fix locking related to the UST registry, but doing so introduced a
deadlock. The actual solution is to reverse the order in which the UST
registry and the consumer lock nest: the UST registry will now be
responsible for serializing the registry content, and the consumer lock
will only protect communication with the consumer, as it should. This
deals with a TODO in the code.

The reason why this was not done from the beginning is that there was
originally an intent to make sure the ust registry lock is not held for
a long time, thus not while communicating with the consumer daemon.
However, when live was implemented, it required communication with
the consumer daemon while the ust registry lock is held anyway. Therefore,
there is not much point anymore in trying to make sure this lock is not
held across the communication with consumerd in push_metadata. This
allows us to greatly simplify locking of the UST registry.
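
The nesting described above can be sketched as follows. This is a
minimal illustration, not the lttng-tools code: struct registry,
struct consumer_socket, push_metadata(), add_metadata() and run_demo()
are hypothetical names, and the "socket" is just a byte counter. The
point it shows is that every path takes the registry lock first and the
consumer socket lock second, so the two locks can never be acquired in
opposite orders by concurrent threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the UST registry and a consumer socket. */
struct registry {
	pthread_mutex_t lock;	/* outer lock: serializes registry content */
	size_t metadata_len;	/* bytes of metadata generated so far */
	size_t metadata_sent;	/* bytes already pushed to the consumer */
};

struct consumer_socket {
	pthread_mutex_t lock;	/* inner lock: protects communication only */
	size_t bytes_received;	/* stands in for data written to the socket */
};

struct worker_arg {
	struct registry *reg;
	struct consumer_socket *sock;
};

/* Appends metadata to the registry under the registry lock. */
static void add_metadata(struct registry *reg, size_t len)
{
	pthread_mutex_lock(&reg->lock);
	reg->metadata_len += len;
	pthread_mutex_unlock(&reg->lock);
}

/* Always nests the locks the same way: registry first, socket second. */
static void push_metadata(struct registry *reg, struct consumer_socket *sock)
{
	size_t pending;

	pthread_mutex_lock(&reg->lock);
	pending = reg->metadata_len - reg->metadata_sent;
	if (pending > 0) {
		pthread_mutex_lock(&sock->lock);
		sock->bytes_received += pending;	/* "send" the data */
		pthread_mutex_unlock(&sock->lock);
		reg->metadata_sent += pending;
	}
	pthread_mutex_unlock(&reg->lock);
}

static void *worker(void *p)
{
	struct worker_arg *arg = p;

	for (int i = 0; i < 1000; i++) {
		add_metadata(arg->reg, 16);
		push_metadata(arg->reg, arg->sock);
	}
	return NULL;
}

/* Runs two concurrent pushers; returns the bytes the "consumer" received. */
size_t run_demo(void)
{
	struct registry reg = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct consumer_socket sock = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct worker_arg arg = { &reg, &sock };
	pthread_t threads[2];

	for (int i = 0; i < 2; i++) {
		if (pthread_create(&threads[i], NULL, worker, &arg) != 0)
			return 0;
	}
	for (int i = 0; i < 2; i++)
		pthread_join(threads[i], NULL);
	assert(reg.metadata_sent == reg.metadata_len);
	return sock.bytes_received;
}
```

In this sketch the registry lock is held across the simulated send,
which is the trade-off the message above accepts in exchange for
simpler locking.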

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: uninitialized return value
Mathieu Desnoyers [Fri, 23 Jan 2015 16:28:59 +0000 (11:28 -0500)] 
Fix: uninitialized return value

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: build failure using disable-lttng-ust configure option
Jérémie Galarneau [Thu, 22 Jan 2015 20:17:34 +0000 (15:17 -0500)] 
Fix: build failure using disable-lttng-ust configure option

A stub for ust_app_get_size_one_more_packet_per_stream() is missing
which causes the build to fail when using the --disable-lttng-ust
configuration option.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: grab more than one packet for snapshots
Mathieu Desnoyers [Thu, 15 Jan 2015 22:24:27 +0000 (17:24 -0500)] 
Fix: grab more than one packet for snapshots

There are a few issues with snapshot size: when taking a snapshot
without specifying any "max size" (should be unlimited), only a single
packet from each stream is saved. We expect all available stream content
to be saved. There is a similar issue when a max size is specified.

Also, trying to make all streams save as much data as possible has
unexpected corner-cases. For instance, this configuration:
- kernel channels: 2 subbuffers of 1MB x 8 CPUs
- per-PID UST channels: 16 subbuffers of 4kB x 8 CPUs x 100 apps

would require the user to have a very large max size, since it would try
to fit (8 + (100 * 8)) * 1MB = 808MB of sub-buffers, else it would fail.
The issue here is using the largest subbuffer size as the criterion
applied to all channels.

We fix those issues by simplifying the algorithm used to calculate how
much data to grab. Rather than calculating the size to grab from each
stream, we calculate a number of packets to grab. It fails if we cannot
grab at least one packet from each stream in the session. It then checks
whether it can grab 2 packets from each stream, and so on, until there
is no more space available (based on max size). This is not a perfect
solution, but has the merit of being simple to understand, and has no
(or few) unexpected corner-cases.
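
The packet-count approach can be sketched as follows. This is an
illustration under stated assumptions, not the code from this commit:
struct channel_desc, size_one_more_packet_per_stream() and
nr_packets_per_stream() are hypothetical, a max_size of 0 is assumed to
mean "unlimited", and a stream is assumed to hold at most nr_subbuf
packets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one channel in the session. */
struct channel_desc {
	uint64_t subbuf_size;	/* size of one packet (sub-buffer), in bytes */
	uint64_t nr_subbuf;	/* sub-buffers per stream */
	uint64_t nr_streams;	/* streams in this channel */
};

/* Bytes needed to take one more packet from every stream that has one left. */
static uint64_t size_one_more_packet_per_stream(
		const struct channel_desc *chans, size_t nr_chans,
		uint64_t cur_nr_packets)
{
	uint64_t total = 0;

	for (size_t i = 0; i < nr_chans; i++) {
		if (cur_nr_packets < chans[i].nr_subbuf)
			total += chans[i].subbuf_size * chans[i].nr_streams;
	}
	return total;
}

/*
 * Returns the number of packets to grab from each stream, or -1 if
 * max_size cannot hold even one packet per stream.
 * Assumption of this sketch: max_size == 0 means "unlimited".
 */
int64_t nr_packets_per_stream(const struct channel_desc *chans,
		size_t nr_chans, uint64_t max_size)
{
	uint64_t remaining = max_size ? max_size : UINT64_MAX;
	int64_t nr_packets = 0;

	assert(chans != NULL || nr_chans == 0);
	for (;;) {
		uint64_t one_more = size_one_more_packet_per_stream(chans,
				nr_chans, (uint64_t) nr_packets);

		/* Stop when every stream is exhausted or space runs out. */
		if (one_more == 0 || one_more > remaining)
			break;
		remaining -= one_more;
		nr_packets++;
	}
	return nr_packets > 0 ? nr_packets : -1;
}
```

With the configuration above (8 kernel streams of 1MB packets and 800
UST streams of 4kB packets), one packet per stream needs 8 * 1MB +
800 * 4kB, about 11.1MB, rather than 808MB. In this sketch a 16MB max
size yields 1 packet per stream, and a 10MB max size fails.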

Fixes #860

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: per-uid flush and ust registry locking
Mathieu Desnoyers [Thu, 15 Jan 2015 22:24:26 +0000 (17:24 -0500)] 
Fix: per-uid flush and ust registry locking

Commit c4b88406 "Fix: ust-app: per-PID app unregister vs tracing stop
races" introduces a regression for per-UID flush. It can be triggered by
the test_high_throughput_limits (root regression) test. For per-UID
tracing, we need to use the registry channel ID, not the per-application
channel ID, when asking the consumer daemon to flush.

When doing this fix, we noticed that the locking rules of push_metadata()
are weird. A per-ust app session lock is protecting registry data, which
makes it impossible to call push_metadata from a ust session level (for
the entire session) in the case of per-UID tracing. Moreover, it's
unclear how holding a per-application lock can protect a registry shared
across applications in per-UID tracing. Therefore, we move all accesses
to the registry metadata_key and metadata_closed fields into the
registry lock critical section. We now only rely on RCU to ensure
existence of registry across push_metadata(), rather than relying on the
per-application session lock.

It also takes care of a documentation vs code mismatch: push_metadata()
documents that "The session lock MUST be acquired here before calling
this.", but in reality, it's the application session lock which is held
across those calls. Removing this requirement, and relying on RCU
instead, fixes this mismatch.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: add missing synchronization point for before app test case
Mathieu Desnoyers [Thu, 20 Nov 2014 06:40:41 +0000 (07:40 +0100)] 
Fix: add missing synchronization point for before app test case

Fixes a race where the application could generate all its events before
trace start.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Conflicts:
tests/utils/testapp/gen-ust-events/gen-ust-events.c

9 years agoFix: tests: wait output hide Terminate errors
Mathieu Desnoyers [Wed, 19 Nov 2014 21:40:31 +0000 (22:40 +0100)] 
Fix: tests: wait output hide Terminate errors

Also: Don't hide kill errors.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: tests: remove killall, add missing SIGTERM handlers
Mathieu Desnoyers [Wed, 19 Nov 2014 21:40:30 +0000 (22:40 +0100)] 
Fix: tests: remove killall, add missing SIGTERM handlers

Applications may change name and, thus, be missed by using
killall.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: high throughput test: reset bw limit on sigterm
Mathieu Desnoyers [Wed, 19 Nov 2014 21:40:29 +0000 (22:40 +0100)] 
Fix: high throughput test: reset bw limit on sigterm

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: tests: add missing wait, document missing synchro
Mathieu Desnoyers [Wed, 19 Nov 2014 21:40:28 +0000 (22:40 +0100)] 
Fix: tests: add missing wait, document missing synchro

Move all wait ${!} that target a single process to "wait", to minimize
the chances of forgetting some background process in the future.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoDocument test anti-patterns
Mathieu Desnoyers [Wed, 19 Nov 2014 21:40:27 +0000 (22:40 +0100)] 
Document test anti-patterns

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: test flaky sleep and wait patterns
Mathieu Desnoyers [Wed, 19 Nov 2014 21:40:26 +0000 (22:40 +0100)] 
Fix: test flaky sleep and wait patterns

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Conflicts:
tests/regression/ust/python-logging/test_python_logging

9 years agoFix: tests: don't use pidof to wait for test apps
Mathieu Desnoyers [Wed, 19 Nov 2014 21:40:25 +0000 (22:40 +0100)] 
Fix: tests: don't use pidof to wait for test apps

Use the bash shell "wait" to wait for all background tasks rather than
the racy "pidof". Indeed, it's possible that applications have been
forked, but not executed yet, when pidof is done, which would therefore
miss applications. Using "wait" from the shell solves this.

If we want to be really strict, we should have sessiond, consumerd, and
relayd export a file containing their own PID, and wait for this instead
of using pidof. But this will be for another fix.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoDocs: Grammar fixes in the lttng manpage
Jérémie Galarneau [Wed, 14 Jan 2015 23:41:10 +0000 (18:41 -0500)] 
Docs: Grammar fixes in the lttng manpage

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: add missing UST perf counter support check
Mathieu Desnoyers [Mon, 12 Jan 2015 22:14:52 +0000 (17:14 -0500)] 
Fix: add missing UST perf counter support check

Report whether performance counters are supported by UST on the
architecture as soon as the user tries to enable a perf counter context.

Fixes #851

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: tests: integer too large for long type
Mathieu Desnoyers [Tue, 2 Dec 2014 22:21:12 +0000 (17:21 -0500)] 
Fix: tests: integer too large for long type

Compiler warns on 32-bit builds.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: undefined operation on last_relay_viewer_session_id
Mathieu Desnoyers [Tue, 2 Dec 2014 22:21:11 +0000 (17:21 -0500)] 
Fix: undefined operation on last_relay_viewer_session_id

Triggers compiler warning on 32-bit build.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: print format type mismatch
Mathieu Desnoyers [Tue, 2 Dec 2014 22:21:10 +0000 (17:21 -0500)] 
Fix: print format type mismatch

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: print format type mismatch
Mathieu Desnoyers [Tue, 2 Dec 2014 22:21:09 +0000 (17:21 -0500)] 
Fix: print format type mismatch

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: Remove unused argument in debug statement
Jérémie Galarneau [Thu, 8 Jan 2015 20:43:24 +0000 (15:43 -0500)] 
Fix: Remove unused argument in debug statement

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoCleanup: Remove unused label
Jérémie Galarneau [Thu, 8 Jan 2015 21:02:13 +0000 (16:02 -0500)] 
Cleanup: Remove unused label

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: exit threads not only on goto restart
Mathieu Desnoyers [Mon, 5 Jan 2015 21:43:08 +0000 (16:43 -0500)] 
Fix: exit threads not only on goto restart

Exit threads as soon as number of FD is 0, on every loop (no need for
goto restart special case). Number of FD being 0 is a sufficient
condition for exiting the thread: it means the quit pipe has been
removed from the poll set.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 years agoFix: poll: show the correct number of fds
Mathieu Desnoyers [Mon, 5 Jan 2015 21:43:07 +0000 (16:43 -0500)] 
Fix: poll: show the correct number of fds

LTTNG_POLL_GETNB() uses wait nb_fd, which is only updated after
lttng_poll_wait returns.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Conflicts:
src/bin/lttng-sessiond/ht-cleanup.c

9 years agoFix: compat poll: add missing empty revents checks
Mathieu Desnoyers [Mon, 5 Jan 2015 21:43:05 +0000 (16:43 -0500)] 
Fix: compat poll: add missing empty revents checks

Poll returns the entire array, including entries that have no activity.
We need to check them explicitly.
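
A minimal sketch of that check, using plain poll(2) rather than the
lttng-tools compat layer. The helpers handle_ready() and demo_poll() are
hypothetical names introduced for illustration; the sketch only shows
that the loop must walk the whole array and skip entries whose revents
field is zero.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Walks the whole array and counts only entries that reported activity. */
static int handle_ready(const struct pollfd *fds, nfds_t nfds)
{
	int handled = 0;

	assert(fds != NULL);
	for (nfds_t i = 0; i < nfds; i++) {
		if (!fds[i].revents)
			continue;	/* no activity on this fd: skip it */
		handled++;
	}
	return handled;
}

/*
 * Polls two pipes, only one of which has data to read. Returns the
 * number of entries handled, or -1 on error.
 */
int demo_poll(void)
{
	int idle[2], busy[2];
	struct pollfd fds[2];
	int handled = -1;

	if (pipe(idle) < 0)
		return -1;
	if (pipe(busy) < 0) {
		close(idle[0]);
		close(idle[1]);
		return -1;
	}
	if (write(busy[1], "x", 1) == 1) {
		fds[0].fd = idle[0];
		fds[0].events = POLLIN;
		fds[0].revents = 0;
		fds[1].fd = busy[0];
		fds[1].events = POLLIN;
		fds[1].revents = 0;
		if (poll(fds, 2, 1000) > 0)
			handled = handle_ready(fds, 2);
	}
	close(idle[0]);
	close(idle[1]);
	close(busy[0]);
	close(busy[1]);
	return handled;
}
```

Without the revents check, the idle pipe would be treated as ready even
though poll() reported nothing for it.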

Fixes #747

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Conflicts:
src/bin/lttng-sessiond/ht-cleanup.c
