Jonathan Rajotte [Mon, 5 Feb 2018 23:19:00 +0000 (18:19 -0500)]
Fix: add-context cannot be performed after a session has been started
The following scenario leads to a corrupted trace, where the metadata no longer matches the data layout:
- lttng create test
- lttng enable-channel -u test
- lttng enable-event -u -a -c test
- lttng start
- ./instrumented-application
- lttng stop
- lttng add-context -u -t procname -c test
- lttng start
- ./instrumented-application
- lttng stop
- lttng view
Babeltrace 1.5.x will fail with:
[error] Unexpected end of packet. Either the trace data stream is corrupted or metadata description does not match data layout.
[error] Reading event failed.
Error printing trace.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Fri, 16 Feb 2018 19:48:49 +0000 (14:48 -0500)]
Fix: duplicated kernel consumer socket locking
Commit 9d1103e introduced a bug causing a deadlock on snapshot record.
The function consumer_snapshot_channel is called with the lock held,
causing the pthread_mutex_lock call inside it to hang forever.
Since consumer_snapshot_channel now acquires the lock itself before
using the socket, there is no need to acquire the lock before calling
the function.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Wed, 19 Apr 2017 16:19:56 +0000 (12:19 -0400)]
Tests: Change syscall tests to use `gen-syscall-events` testapp
Use the `gen-syscall-events` testapp in conjunction with the LTTng PID
tracker to improve the reliability of the syscall tracing tests by only
tracing the test app's activity.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Wed, 5 Apr 2017 15:53:51 +0000 (11:53 -0400)]
Tests: Add test app to generate syscalls
Once launched, this app waits for the creation of a file specified in
its arguments before executing syscalls. This can be used with the PID
tracker feature of LTTng to only trace the syscalls made by the test
process, thus making the tests more reliable.
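The file-based synchronization described above can be sketched in C as follows. This is a minimal illustration assuming a simple polling loop; `wait_for_file()` and its `max_polls` parameter are hypothetical and not taken from the testapp's sources.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper: block until `path` exists, checking every 10 ms.
 * Returns 0 once the file exists, -1 if it is still missing after
 * `max_polls` checks.
 */
static int wait_for_file(const char *path, unsigned int max_polls)
{
	const struct timespec delay = { .tv_sec = 0, .tv_nsec = 10000000L };
	struct stat st;

	for (unsigned int i = 0; i < max_polls; i++) {
		if (stat(path, &st) == 0) {
			return 0; /* The controller created the file: go ahead. */
		}
		nanosleep(&delay, NULL);
	}
	return -1;
}
```

A controller script would start the app, add its PID to the tracker, and only then create the file, so that every syscall issued after the wait comes from a tracked process.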
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Thu, 8 Jun 2017 21:14:46 +0000 (17:14 -0400)]
Tests: Move script synchronization functions to utils library
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Wed, 13 Dec 2017 17:08:34 +0000 (12:08 -0500)]
Fix: remove unused event types in MI XML schema
KPROBE and KRETPROBE event types are never produced by the MI output;
PROBE and FUNCTION are rightfully used instead. Using KPROBE and
KRETPROBE would expose inner workings of the kernel tracer that should
be abstracted away from the user.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Wed, 17 Jan 2018 21:58:13 +0000 (16:58 -0500)]
Updating lttng-ust-ctl header file
This file is a near-complete duplicate of the include/lttng/ust-ctl.h
file in the lttng-ust repository.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 15 Feb 2018 16:53:17 +0000 (11:53 -0500)]
Tests: cleanly exit from test apps on reception of SIGTERM
There is a known lttng-ust limitation that can cause a buffer
to become unreadable if an application is killed or preempted
indefinitely between the reserve and commit operations
while trying to record to a subbuffer.
A buffer being unreadable will cause some tests to fail since
events that are expected to be visible in a given stream
may not be shown by the trace viewers, as the consumer was
unable to "get" that subbuffer.
It was fairly easy to reproduce this failure scenario using
the test_ust_fast snapshot test, in the "post_mortem" case.
This test case performs the following sequence of operations:
* set up a tracing session in snapshot mode
* launch an app
* kill(1) it after one event is known to have been produced
* record a snapshot
* try to read the resulting snapshot
Adding logging confirmed that the "get" operation was indeed
failing on the subbuffer to which the application had been
writing. This resulted in an empty stream (file size == 0)
being produced by the snapshot record operation.
The test was then failing because babeltrace reported that no
events were contained in the resulting trace.
Since there is no concrete solution to this limitation yet,
the test suite must ensure that the applications exit cleanly
on reception of a signal.
This patch introduces a SIGTERM signal handler in the test
applications which sets a "should_quit" flag to 1; the flag is
tested between every iteration of their event production loop.
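A minimal sketch of this pattern, assuming a loop shaped like the test apps' (the names `setup_sigterm_handler()` and `produce_events()` are hypothetical). The handler only stores to a `volatile sig_atomic_t` flag, so the process is no longer terminated by the signal's default action in the middle of a tracepoint; the loop notices the flag at the next iteration boundary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Written by the handler; sig_atomic_t stores are async-signal-safe. */
static volatile sig_atomic_t should_quit;

static void sigterm_handler(int sig)
{
	(void) sig;
	should_quit = 1;
}

/* Returns 0 on success, -1 (with errno set) on failure. */
static int setup_sigterm_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigterm_handler;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGTERM, &sa, NULL);
}

/*
 * The flag is only checked between iterations: a SIGTERM arriving during
 * an iteration lets that iteration (and its tracepoint) complete.
 */
static unsigned int produce_events(unsigned int nr_iter)
{
	unsigned int produced = 0;

	for (unsigned int i = 0; i < nr_iter && !should_quit; i++) {
		/* A real test app would hit its tracepoint here. */
		produced++;
	}
	return produced;
}
```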
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 14 Feb 2018 22:44:05 +0000 (17:44 -0500)]
Document consumer socket locking assumptions
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 14 Feb 2018 21:13:51 +0000 (16:13 -0500)]
Fix: consumer socket lock not held during snapshot record
This missing lock was identified while stress-testing the
snapshot tracing mode.
The "post_mortem" test case would sometimes hang on a
push_metadata() call waiting for a status reply from the
consumer daemon.
This test demonstrated a race that consists of killing an
application and taking a snapshot near-simultaneously.
This causes the app management thread to issue a "push metadata"
command to the consumerd while the lttng client is issuing
a snapshot record command.
Since the snapshot record does not acquire the consumer socket lock,
the "push metadata" and "snapshot" commands end up interleaved on
the socket, which ultimately causes the "apps management" thread
to wait forever for a reply while holding the socket's lock.
This prevents the client, invoked by the test script, from
completing the "stop" operation on the session.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 14 Feb 2018 21:05:18 +0000 (16:05 -0500)]
Fix: set_relayd_for_snapshot does not acquire the consumer socket lock
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 14 Feb 2018 20:24:40 +0000 (15:24 -0500)]
Fix: send_channel_monitor_pipe does not take the consumer socket lock
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 14 Feb 2018 21:04:33 +0000 (16:04 -0500)]
Document the locking assumptions of consumerd-relayd socket passing
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 14 Feb 2018 21:14:21 +0000 (16:14 -0500)]
Assert that the consumer lock is held while sending FDs to consumerd
The consumer_data lock must be held during the communications
between the consumerd and sessiond.
The consumer_data lock is referred to by each consumer_socket
instance; they point to their consumer's global data lock.
The lock can't be taken in consumer_send_msg() or consumer_send_fds()
since we want to protect a complete "transaction". Some commands
require both functions to be called and we want to hold the lock
over the duration of both calls to protect against other
threads initiating a communication between the two calls.
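The locking rule can be illustrated with a simplified, hypothetical model of `consumer_socket` (the struct layouts and helper names below are not the session daemon's). The lock is taken once around both sends, and the helpers only assert that it is held; note that `pthread_mutex_trylock()` returning `EBUSY` shows that some thread holds a default mutex, not necessarily the caller.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical, simplified stand-ins for sessiond's structures. */
struct consumer_data {
	pthread_mutex_t lock;
};

struct consumer_socket {
	int fd;
	pthread_mutex_t *lock; /* Points to the consumer's global data lock. */
};

static int msgs_sent, fds_sent;

/*
 * Neither helper takes the lock: callers must hold it. trylock returning
 * EBUSY only proves that some thread holds this (default-type) mutex.
 */
static int send_msg(struct consumer_socket *sock)
{
	assert(pthread_mutex_trylock(sock->lock) == EBUSY);
	msgs_sent++;
	return 0;
}

static int send_fds(struct consumer_socket *sock)
{
	assert(pthread_mutex_trylock(sock->lock) == EBUSY);
	fds_sent++;
	return 0;
}

/* One "transaction": hold the lock across both calls. */
static int send_msg_and_fds(struct consumer_socket *sock)
{
	int ret;

	pthread_mutex_lock(sock->lock);
	ret = send_msg(sock);
	if (ret == 0) {
		ret = send_fds(sock);
	}
	pthread_mutex_unlock(sock->lock);
	return ret;
}
```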
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 14 Feb 2018 19:59:35 +0000 (14:59 -0500)]
Assert that the consumer socket lock is taken during communication
The consumer_data lock must be acquired during any communication
between the session and consumer daemons.
Stress tests have shown a number of deadlocks that have been
traced back to this type of error.
Individual fixes follow this commit.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 9 Feb 2018 21:40:39 +0000 (16:40 -0500)]
Tests: refuse to run test suite if lttng processes are present
The test suite often fails because of unclean environments where
stale LTTng processes are left running. Since the test suite
assumes that no LTTng process (daemon or test application) is
running, it makes sense to force the user to kill all those
processes before running the test suite.
The warn_processes.sh script now prints an error and returns 1
to indicate an early failure to the test harness.
It is possible to circumvent this check by invoking the tests
manually or by removing the "exit 1" from the warn_processes.sh
script if there is a need to have persistent processes across
the execution of the test suite.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 8 Feb 2018 23:25:55 +0000 (18:25 -0500)]
Fix: metadata channel leak when using the snapshot tracing mode
While running stress tests involving the snapshot mode, it
became apparent that lttng-consumerd was leaking a number of file
descriptors.
To isolate the problem, the test was narrowed down to
* Create a session in snapshot mode
* Enable a userspace channel
* Enable all userspace events
* Start tracing
* Run a traced application
* Stop tracing
* Destroy session
This showed that 5 file descriptors were leaked on each
iteration of the above.
As the comments in this change indicate, the ownership and
lifetime of metadata channels varies depending on the tracing
mode being used.
In non-snapshot tracing modes, metadata channels are owned by
their respective streams. On destruction of a metadata stream,
consumer_del_channel() is invoked since the stream releases its
ownership of the metadata channel.
However, this relationship between metadata streams and channels
does not exist in snapshot mode; streams are created and
destroyed on every snapshot record. Hence, the
LTTNG_CONSUMER_CLOSE_METADATA command must immediately clean up the
metadata channel.
The channel's "monitor" flag is used to determine whether or not
the metadata channel is in "snapshot" mode.
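The ownership rule can be modeled as follows. The struct and functions are hypothetical and far simpler than the consumerd's; they only show which path releases the channel depending on the `monitor` flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified consumerd metadata channel. */
struct channel {
	bool monitor; /* false in snapshot mode */
};

static int channels_destroyed;

static void del_channel(struct channel *chan)
{
	free(chan);
	channels_destroyed++;
}

/* Monitor mode: the metadata stream owns its channel and releases it. */
static void del_metadata_stream(struct channel *chan)
{
	if (chan->monitor) {
		del_channel(chan);
	}
}

/* Handling of a CLOSE_METADATA-like command. */
static void close_metadata(struct channel *chan)
{
	if (!chan->monitor) {
		/* Snapshot mode: no stream owns the channel, release it now. */
		del_channel(chan);
	}
	/* Monitor mode: del_metadata_stream() will release the channel. */
}
```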
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 25 Jan 2018 23:57:27 +0000 (18:57 -0500)]
Fix: do not flag consumer as disabled on relayd comm failure
A relay daemon may be temporarily unavailable (e.g. not launched yet,
or unreachable because of a network error). In such a case, it is not necessary to
mark the consumer as bad since the error is not related to the
consumer daemon itself.
This change lets the user try to create a channel later without
having to restart the session and consumer daemons.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Thu, 1 Feb 2018 19:24:10 +0000 (14:24 -0500)]
Fix: cleanup inactive FDs in the consumer polling thread
Users have reported an assert() being hit on consumerd shutdown because
of a non-empty data stream hash table.
Relevant stack trace:
[...] in lttng_ht_destroy (ht=0x6) at hashtable.c:162
[...] in lttng_consumer_cleanup () at consumer.c:1207
[...] in main ([...]) at lttng-consumerd.c:625
This is reproducible when a consumerd is shutting down at the same
time as one of its relay daemon peers.
On failure to reach a relay daemon, all of that relay daemon's
associated streams are marked as having an inactive endpoint (see
cleanup_relayd(), consumer.c:467). The data polling thread is notified
of the change through an empty message on the "data" pipe.
Before blocking on the next poll(), the data polling thread checks if
it needs to update its poll set using the "need_update" flag. This
flag is set any time a stream is added or deleted.
While building a new poll set, streams that are now marked as inactive
or as having an inactive endpoint are not included in the new poll
set. Those inactive streams are in a transitional state, awaiting
a clean-up.
After updating the poll set, the data polling thread checks if it
should quit (via the consumer_quit flag). Assuming this flag is set,
the thread cannot simply exit; it must clean up any remaining data
streams.
The thread currently performs this check at consumer.c:2532. This
check is erroneous as it assumes that the number of FDs in the poll set is
indicative of the number of FDs the thread has ownership of.
If all streams are inactive, the poll set will contain no FDs to
monitor and the thread will assume that it can exit. This will leave
streams in "data_ht", causing an assertion to hit in the main thread
during the clean-up.
This patch adds an inactive FD count which must also reach zero before
the data polling thread can exit.
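A sketch of the corrected exit condition, assuming a simplified stream model (all names are hypothetical): inactive streams are left out of the poll set but are still counted, since the thread still owns them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stream state as seen by the data polling thread. */
struct stream {
	bool endpoint_active;
	bool closed;
};

/*
 * Rebuild the poll set: count streams that can be polled and those that
 * are inactive but still awaiting clean-up by this thread.
 */
static void update_poll_set(const struct stream *streams, size_t n,
		size_t *nb_fd, size_t *nb_inactive_fd)
{
	*nb_fd = 0;
	*nb_inactive_fd = 0;
	for (size_t i = 0; i < n; i++) {
		if (streams[i].closed) {
			continue;
		}
		if (streams[i].endpoint_active) {
			(*nb_fd)++;
		} else {
			(*nb_inactive_fd)++;
		}
	}
}

/* Exit only once no stream, polled or not, is owned by the thread. */
static bool can_exit(bool consumer_quit, size_t nb_fd, size_t nb_inactive_fd)
{
	return consumer_quit && nb_fd == 0 && nb_inactive_fd == 0;
}
```

Dropping `nb_inactive_fd == 0` from `can_exit()` reproduces the erroneous check: with every endpoint inactive, the poll set is empty and the thread would exit while streams remain in "data_ht".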
The clean-up of the inactive streams occurs as the data polling thread
wakes up on its "data" pipe. Upon being woken up on the "data" pipe,
the data polling thread will validate the endpoint status of every
data stream and close those that have been marked as inactive
(see consumer_del_stream(), consumer.c:525).
This occurs as often as necessary to allow the thread to clean up all
of its inactive streams and exit cleanly.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Mon, 22 Jan 2018 20:43:35 +0000 (15:43 -0500)]
man: document dead-peer detection for lttng-relayd
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Mon, 22 Jan 2018 20:43:34 +0000 (15:43 -0500)]
lttng-relayd: use TCP keep-alive mechanism to detect dead-peer
Allow relayd to clean up objects related to a dead connection
for which the FIN packet was not emitted (unexpected shutdown,
blocked Ethernet link). Note that an idle peer is not considered dead
given that it responds to the keep-alive query after the idle time
has elapsed.
Per RFC 1122, section 4.2.3.6, implementations must default to no less
than two hours for the idle period. On Linux, the default value is
indeed 2 hours. This could be problematic if relayd needs to detect
dead peers aggressively. Hence, it is important to provide tuning knobs
for the TCP keep-alive mechanism.
The following environment variables can be used to enable and
fine-tune it:
LTTNG_RELAYD_TCP_KEEP_ALIVE_ENABLE
Set to 1 to enable the use of TCP keep-alive, allowing the detection
of dead peers.
LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME
See tcp(7) tcp_keepalive_time or tcp_keepalive_interval on
Solaris 11.
A value of -1 lets the operating system manage this parameter
(default).
LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES
See tcp(7) tcp_keepalive_probes.
A value of -1 lets the operating system manage this
parameter (default).
No effect on Solaris.
LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL
See tcp(7) tcp_keepalive_intvl.
A value of -1 lets the operating system manage
this parameter (default).
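The knobs above map naturally onto the Linux socket options documented in tcp(7). The sketch below assumes a Linux host (the Solaris variant is not covered) and that each variable holds a plain integer; `get_env_int()`, `set_tcp_knob()` and `apply_tcp_keep_alive()` are hypothetical helpers, not relayd's actual parsing code. On Linux, TCP_KEEPIDLE and TCP_KEEPINTVL are expressed in seconds.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdlib.h>
#include <sys/socket.h>

/*
 * Hypothetical parser: returns the variable's integer value, or -1
 * ("let the OS decide") when it is unset, empty, or not an integer >= -1.
 */
static int get_env_int(const char *name)
{
	const char *value = getenv(name);
	char *end;
	long v;

	if (!value || *value == '\0') {
		return -1;
	}
	errno = 0;
	v = strtol(value, &end, 10);
	if (errno != 0 || *end != '\0' || v < -1 || v > INT_MAX) {
		return -1;
	}
	return (int) v;
}

/* Sets one TCP-level option unless the knob is left to the OS (-1). */
static int set_tcp_knob(int fd, int option, const char *env_name)
{
	int value = get_env_int(env_name);

	if (value == -1) {
		return 0;
	}
	return setsockopt(fd, IPPROTO_TCP, option, &value, sizeof(value));
}

/* Hypothetical: apply the keep-alive knobs to a TCP socket (Linux only). */
static int apply_tcp_keep_alive(int fd)
{
	int one = 1;

	if (get_env_int("LTTNG_RELAYD_TCP_KEEP_ALIVE_ENABLE") != 1) {
		return 0; /* Keep-alive not requested. */
	}
	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &one, sizeof(one)) < 0 ||
			set_tcp_knob(fd, TCP_KEEPIDLE, "LTTNG_RELAYD_TCP_KEEP_ALIVE_TIME") < 0 ||
			set_tcp_knob(fd, TCP_KEEPCNT, "LTTNG_RELAYD_TCP_KEEP_ALIVE_PROBES") < 0 ||
			set_tcp_knob(fd, TCP_KEEPINTVL, "LTTNG_RELAYD_TCP_KEEP_ALIVE_INTVL") < 0) {
		return -1;
	}
	return 0;
}
```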
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 14 Dec 2017 20:13:18 +0000 (15:13 -0500)]
Tests: add kernel notification tests to the root regression list
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 13 Dec 2017 23:25:32 +0000 (18:25 -0500)]
Docs: clarify which socket serves as the ust_app_ht_by_sock's key
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 13 Dec 2017 23:24:50 +0000 (18:24 -0500)]
Docs: refer to apps_notify_thread instead of 'the other thread'
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 13 Dec 2017 23:24:10 +0000 (18:24 -0500)]
Docs: describe the apps_thread's working in function header
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 9 Dec 2017 17:51:46 +0000 (12:51 -0500)]
Tests: race between consumer pause and trace start/stop
This fixes two problems with the current test.
1. Starting the tracing before pausing the consumption can result
in an arbitrary number of buffer usage conditions being sent to
the client as the buffers can be filled and emptied a number of
times.
2. Resuming the consumption before stopping tracing can, similarly
to '1', result in an arbitrary number of notifications
being sent to the client.
Note that the non-blocking stop is used since the blocking
variant would wait forever for pending data to be flushed, as
the consumption is paused. Hence, we stop the tracing, resume
the consumption, and wait for the buffers to be flushed explicitly
using the lttng_data_pending() call. No sleeps are performed in
that loop since those could hide races triggered by this test.
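The flush-wait loop can be sketched as below. It assumes the return convention of liblttng-ctl's `lttng_data_pending()` (1: data pending, 0: no pending data, negative: error); `wait_for_flush()` and the `fake_data_pending()` test double are hypothetical, the latter only letting the loop run without a session daemon.

```c
#include <assert.h>

/*
 * Busy-wait until no data is pending for the session. `data_pending`
 * follows lttng_data_pending()'s convention: 1 = pending, 0 = none,
 * negative = error. No sleep is done between calls, on purpose.
 */
static int wait_for_flush(int (*data_pending)(const char *),
		const char *session_name)
{
	int ret;

	do {
		ret = data_pending(session_name);
	} while (ret == 1);
	return ret; /* 0 once flushed, negative error code otherwise */
}

/* Test double: reports pending data on its first three calls. */
static int fake_calls;

static int fake_data_pending(const char *session_name)
{
	(void) session_name;
	return ++fake_calls <= 3 ? 1 : 0;
}
```

Against a real session, a test would call `lttng_stop_tracing_no_wait()`, resume consumption, then `wait_for_flush(lttng_data_pending, session_name)`.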
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 5 Dec 2017 20:22:11 +0000 (15:22 -0500)]
Clean-up: remove unneeded rcu_read_lock acquisition
create_channel_per_uid() must already be called while the
RCU reader lock is held since the buffer registry is being
accessed.
The only caller of create_channel_per_uid() is do_create_channel()
which, itself, documents that it must be called while holding the
RCU read lock.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 5 Dec 2017 20:21:49 +0000 (15:21 -0500)]
Docs: document locking assumption of function
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 5 Dec 2017 20:25:28 +0000 (15:25 -0500)]
Fix: notification thread not notified of channel creation on app error
The multi-app notification test is failing (more often on ARM64)
since the notification thread appears to not be notified of a
channel's creation under some circumstances.
Investigating this failure pointed to create_channel_per_uid()
which provides the "hook" the notification system needs to
be informed of a channel's creation.
The first time this function is invoked for a given channel, the
lookup in the buffer registry will fail, prompting the lazy creation
of the channel. Then, that channel is sent to the application
being registered.
The error in the current code is that the channel's creation
is not communicated to the notification subsystem whenever the
session daemon fails to communicate with the application.
Failing to communicate with the application is not a channel
creation error (in per-uid mode). In this specific case, the
test is launching many short-lived applications and it is
expected for the session daemon to encounter closing or dead
applications as it handles their registration.
Note that the diff of this commit is misleading. The important part
is that notification_thread_command_add_channel() has to be
performed regardless of the result of send_channel_uid_to_ust().
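The important ordering can be reduced to the following hypothetical sketch; the functions stand in for send_channel_uid_to_ust() and notification_thread_command_add_channel() and take none of their real parameters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real functions take many more parameters. */
static bool app_is_dead;
static int channels_announced;

/* Stands in for send_channel_uid_to_ust(). */
static int send_channel_to_app(void)
{
	return app_is_dead ? -1 : 0; /* e.g. the app exited while registering */
}

/* Stands in for notification_thread_command_add_channel(). */
static void announce_channel(void)
{
	channels_announced++;
}

/*
 * `created` is true when the channel was just lazily created in the
 * buffer registry. The announcement must not depend on the send result:
 * the channel exists even if this particular app is gone.
 */
static int create_channel_per_uid_sketch(bool created)
{
	int ret = send_channel_to_app();

	if (created) {
		announce_channel();
	}
	return ret;
}
```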
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 28 Sep 2017 20:37:22 +0000 (16:37 -0400)]
Clean-up: consumer_add_metadata_stream always returns 0
Since c869f647b0c4476645ab9ee01e362401fb8c1e42, the return value of
consumer_add_metadata_stream is always zero. Hence, we can remove some
dead error handling code.
consumer_add_metadata_stream success is checked using asserts.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 28 Sep 2017 20:37:21 +0000 (16:37 -0400)]
Fix: scope ownership of a stream for ust-consumer
A failure on lttng_pipe_write() during send_stream_to_thread() leads to
a null-pointer dereference of the stream handle during
consumer_del_channel(). The chain of events leading to the problem
is:
- Failure during lttng_pipe_write() inside send_stream_to_thread().
- Call to consumer_stream_destroy() via consumer_del_stream_for_data()
or consumer_del_stream_for_metadata().
- The stream is monitored and globally visible at this point, leading
to a call to destroy_close_stream(), which performs the first
cleanup of the stream.
Note: At this point the stream is still in the channel local stream
list (stream.send_node).
- The call to unref_channel() returns a reference to a channel for which
a cleanup call must be done.
- The cleanup call for the channel is performed using
consumer_del_channel().
- At this point the stream is still in the channel's local stream list.
This results in a second call to consumer_stream_destroy() via
clean_channel_stream_list(), which, in turn, accesses freed memory.
The fix consists in:
- Using cds_list_del() inside send_stream_to_thread() after the stream
is made public, to ensure that the stream ownership/visibility
is clear. A stream cannot be globally visible and local
(stream.send_node) to a channel at the same time.
- Modifying error paths to acknowledge the ownership transfer to
send_stream_to_thread().
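The ownership transfer can be illustrated with a minimal intrusive list in the spirit of liburcu's `cds_list_head`, reimplemented here so the sketch is self-contained (`list_del_init()` also re-initializes the node, which `cds_list_del()` does not). `struct stream`, `send_stream_to_thread()` and `clean_channel_stream_list()` are simplified, hypothetical versions of the consumer's code.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list, modeled on liburcu's cds_list_head. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *) ((char *) (ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add(struct list_head *node, struct list_head *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/* Unlink `node` and make it point to itself (cds_list_del() does not). */
static void list_del_init(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	list_init(node);
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical stream: on its channel's local list until published. */
struct stream {
	struct list_head send_node;
};

static int streams_destroyed;

static void stream_destroy(struct stream *stream)
{
	(void) stream;
	streams_destroyed++;
}

/*
 * Publishing makes the stream globally visible, so it first leaves the
 * channel's local list. On failure, this function owns the stream and
 * destroys it; the caller must not touch it again.
 */
static int send_stream_to_thread(struct stream *stream, int pipe_write_fails)
{
	list_del_init(&stream->send_node);
	if (pipe_write_fails) {
		stream_destroy(stream);
		return -1;
	}
	return 0;
}

/* Channel teardown: only destroys streams still on the local list. */
static void clean_channel_stream_list(struct list_head *streams)
{
	while (!list_empty(streams)) {
		struct list_head *node = streams->next;

		list_del_init(node);
		stream_destroy(container_of(node, struct stream, send_node));
	}
}
```

Each stream is destroyed exactly once: either by the failing publication path or by the channel's teardown, never both.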
Reported-by: Liguang Li <liguang.li@windriver.com>
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 30 Nov 2017 23:23:44 +0000 (00:23 +0100)]
Clean-up: reduce scope of dynamically-allocated string
tmpnew is only useful within the scope of the libdir check.
It can be allocated and free()'d within that scope.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 30 Nov 2017 23:18:03 +0000 (00:18 +0100)]
Fix: using putenv() and free()-ing the value is invalid
putenv() does not copy the string passed as the parameter. Hence,
free()-ing the string results in an invalid environment. In the
"good" case, we don't care since we execl().
However, on error, our process now has an invalid environment
which can cause breakage further down the line.
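One way to avoid the problem is setenv(), which copies its arguments, so the temporary buffer may be freed immediately. The sketch assumes, for illustration, that the variable being extended is LD_LIBRARY_PATH; `prepend_library_path()` is a hypothetical helper, not necessarily how the fix was written.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Prepend `dir` to LD_LIBRARY_PATH. setenv() stores its own copy of the
 * value, unlike putenv(), which places the caller's pointer itself in
 * the environment.
 */
static int prepend_library_path(const char *dir)
{
	const char *old = getenv("LD_LIBRARY_PATH");
	size_t len;
	char *value;
	int ret;

	len = strlen(dir) + (old && *old ? strlen(old) + 1 : 0) + 1;
	value = malloc(len);
	if (!value) {
		return -1;
	}
	if (old && *old) {
		snprintf(value, len, "%s:%s", dir, old);
	} else {
		snprintf(value, len, "%s", dir);
	}
	ret = setenv("LD_LIBRARY_PATH", value, 1);
	free(value); /* Safe: the environment holds its own copy. */
	return ret;
}
```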
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 30 Nov 2017 22:50:16 +0000 (23:50 +0100)]
Clean-up: unnecessary duplicated call to exit()
exit(EXIT_FAILURE) is already called after the switch statement.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 30 Nov 2017 22:45:30 +0000 (23:45 +0100)]
Fix: unknown consumer type considered a libc error
The PERROR() macro uses the errno variable to print an error
message. However, the consumer type being invalid is an internal
error. The value of errno, at that point, is unrelated to the
error.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 29 Nov 2017 21:42:29 +0000 (16:42 -0500)]
Fix: consumerd(64/32)_lib_dir can be NULL
Reproducer:
lttng-sessiond \
--consumerd32-path=/usr/local/lib/lttng/libexec/lttng-consumerd \
--consumerd64-path=/usr/local/lib/lttng/libexec/lttng-consumerd
lttng create
lttng enable-event -u -a
On a 64-bit machine, the invocation of the 64-bit consumerd will not
fail since its libdir is populated by sessiond_config_init, but the
session daemon will segfault when spawning the 32-bit consumerd, while
checking its libdir value.
On a 32-bit machine, the opposite will happen.
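The corrected check amounts to treating a NULL libdir like an empty one before dereferencing it. A minimal, hypothetical helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A NULL libdir (e.g. for the other-bitness consumerd) means "unset". */
static bool libdir_is_set(const char *libdir)
{
	return libdir != NULL && libdir[0] != '\0';
}
```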
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 23 Nov 2017 02:16:38 +0000 (21:16 -0500)]
Fix: evaluate trigger condition on registration
Since there is nothing preventing clients from subscribing to a
condition before the corresponding trigger is registered, we have
to evaluate this new condition right away.
The current implementation waits for the next "evaluation" of
conditions (e.g. on reception of a channel sample) to evaluate the
newly registered trigger's condition, but this is broken.
The reason it is broken is that waiting for the next sample
does not allow us to properly handle transitions for edge-triggered
conditions.
Consider this example: when we handle a new channel sample, we
evaluate each condition twice: once with the previous state, and
again with the newest state. We then use those two results to
determine whether a state change happened: a condition was false and
became true. If a state change happened, we have to notify clients.
Now, if a client subscribes to a given notification and registers
a trigger *after* that subscription, we have to make sure the
condition is evaluated at this point while considering only the
current state. Otherwise, the next evaluation cycle may only see
that the evaluations remain the same (true for samples n-1 and n) and
the client will never know that the condition has been met.
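The two evaluation paths can be sketched with a hypothetical buffer-usage condition ("usage >= threshold"); the struct and function names are illustrative, not the notification thread's API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buffer-usage condition: "usage >= threshold". */
struct condition {
	double threshold;
};

static bool evaluate(const struct condition *cond, double usage)
{
	return usage >= cond->threshold;
}

/* New sample: notify only on a false -> true transition. */
static bool notify_on_sample(const struct condition *cond,
		double prev_usage, double new_usage)
{
	return !evaluate(cond, prev_usage) && evaluate(cond, new_usage);
}

/* Registration: no previous result exists, use the current state alone. */
static bool notify_on_registration(const struct condition *cond,
		double current_usage)
{
	return evaluate(cond, current_usage);
}
```

With a usage of 0.80 before and 0.90 after a sample and a threshold of 0.75, the sample path never fires, which is the missed notification described above; evaluating once at registration catches it.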
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 14 Nov 2017 02:16:18 +0000 (21:16 -0500)]
Fix: nonsensical message printed by lttng track/untrack
The lttng track/untrack command, when used to track/untrack all
PIDs, prints a message of the following form:
"PID -1 untracked in session auto-20171113-210309"
This is because -1 is taken to mean "all" by the API and is used
as-is to print the message on the CLI.
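A sketch of the fix's intent is to special-case the "all" value before formatting. The exact wording the CLI prints after the fix is not stated here, so the "All PIDs" message and `format_untrack_msg()` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* -1 is the API's "all PIDs" value: never print it as a PID. */
static int format_untrack_msg(char *buf, size_t len, int pid,
		const char *session_name)
{
	if (pid == -1) {
		return snprintf(buf, len, "All PIDs untracked in session %s",
				session_name);
	}
	return snprintf(buf, len, "PID %d untracked in session %s", pid,
			session_name);
}
```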
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Mon, 13 Nov 2017 23:15:54 +0000 (18:15 -0500)]
Fix: O_CLOEXEC is erroneously used on pipe creation
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Mon, 13 Nov 2017 23:14:49 +0000 (18:14 -0500)]
Fix: wrong parameter to fcntl in pipe_set_flag
Depending on the flag passed, fcntl must be called with F_SETFD or
F_SETFL. This fix checks the flag passed, ensures it is valid, and
calls fcntl with the right parameter.
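The distinction is that FD_CLOEXEC is a file descriptor flag (F_GETFD/F_SETFD), while O_NONBLOCK is a file status flag (F_GETFL/F_SETFL); O_CLOEXEC is an open()-style flag (also accepted by pipe2()), not an fcntl() flag. A hypothetical `set_fd_flag()` sketch, which preserves the flags already set:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Route each flag to the fcntl() command pair it belongs to and reject
 * anything else (e.g. O_CLOEXEC, which fcntl() does not understand).
 */
static int set_fd_flag(int fd, int flag)
{
	int cur;

	switch (flag) {
	case FD_CLOEXEC: /* descriptor flag */
		cur = fcntl(fd, F_GETFD);
		return cur < 0 ? -1 : fcntl(fd, F_SETFD, cur | FD_CLOEXEC);
	case O_NONBLOCK: /* file status flag */
		cur = fcntl(fd, F_GETFL);
		return cur < 0 ? -1 : fcntl(fd, F_SETFL, cur | O_NONBLOCK);
	default:
		return -1;
	}
}
```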
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 18 Oct 2017 15:39:06 +0000 (11:39 -0400)]
Fix: use lttng_clock_gettime instead of clock_gettime
It appears that commit 389fbf04b41e2002be44a1e3392bfade2f1deeef
missed it.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 12 Oct 2017 15:19:39 +0000 (11:19 -0400)]
Fix: close channel monitor pipe after killing the metadata_timer_thread
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Wed, 30 Aug 2017 18:06:42 +0000 (14:06 -0400)]
Fix: path of snapshots with a relay and default URI
When recording a snapshot to a relay without a custom URI (e.g.
net://localhost as opposed to net://localhost/custom), the snapshots end up being
stored in ~/lttng-traces/<hostname>/snapshot-XXX instead of being inside
the <session-name> folder like on local snapshots. We would expect the
path to be: ~/lttng-traces/<hostname>/<session-name>/snapshot-XXX
So there is a discrepancy between the local and remote behaviour. This
behaviour has been there since at least v2.6, maybe earlier.
Moreover, there is nothing that informs the user about the default
snapshot name, so it is not possible to know where a snapshot has been
stored.
After parsing the URI provided by the user, we now check whether a custom
name was provided and, if not, copy the session name there. This is the
same operation performed in _lttng_create_session_ext.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Mon, 28 Aug 2017 21:50:04 +0000 (17:50 -0400)]
Fix: use file based synchronization for python logging test
Without synchronization, the test yields unstable results on a stressed system.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Mon, 28 Aug 2017 21:50:03 +0000 (17:50 -0400)]
Test: add file based synchronization point for python test app
test.py is responsible for the cleanup of the "ready" file while the
cleanup of the "go" file is left to the external controller.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Wed, 23 Aug 2017 20:48:53 +0000 (16:48 -0400)]
Fix: wrong use of the relay_streams_sent in snapshot
The relay_streams_sent message is only useful in live sessions and
should only be sent after all the streams of a channel have been sent.
Here we were sending this message every time we sent a stream to the
relay during a snapshot, which makes no sense.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Wed, 23 Aug 2017 20:43:22 +0000 (16:43 -0400)]
Fix: the return code of lttcomm_send_unix_sock is signed
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 13 Nov 2017 15:31:29 +0000 (10:31 -0500)]
Fix warning: src/bin/lttng/utils.c: cast incompatible pointer
Reported-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Philippe Proulx [Wed, 8 Nov 2017 20:19:24 +0000 (15:19 -0500)]
Fix: src/common/pipe.h: include <sys/types.h> for ssize_t and mode_t
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Philippe Proulx [Mon, 6 Nov 2017 23:46:41 +0000 (18:46 -0500)]
Fix: detect dlmopen() and disable corresponding tests if not available
musl and uClibc-ng are known not to support dlmopen(). LTTng-UST already
performs this dlmopen() detection.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Sun, 12 Nov 2017 21:15:53 +0000 (16:15 -0500)]
Fix: Use tmpdir for intermediary files
Launching root and non-root tests would result in conflicts over the
intermediary files.
Remove the unused TRACE_PATH assignment.
Clear the pipe list variable before each pipe collection.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Sun, 12 Nov 2017 20:36:52 +0000 (15:36 -0500)]
Fix: include scripts for distribution
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sun, 12 Nov 2017 20:19:56 +0000 (15:19 -0500)]
Fix: typo in lttng-consumerd file default
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sun, 12 Nov 2017 20:19:35 +0000 (15:19 -0500)]
Fix: missing NULL checks in logging statements
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sun, 12 Nov 2017 19:59:28 +0000 (14:59 -0500)]
Fix: kernel consumerd sock paths need rundir substitution
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Fri, 28 Jul 2017 17:40:41 +0000 (13:40 -0400)]
Test: kernel testing for notification
Perform notification tests on both domains.
Scenarios where low notifications are wanted need further synchronization
since multiple low notifications can be sent between resume_consumer
and the lttng stop command. This problem can be addressed by suspending
the generation of events. This is achieved through signal-aware
background shells that use the lttng-test kernel module or
gen-ust-events as event generators.
These background shells are controlled by the SIGUSR1 signal and
report their state via a state file. If the file is present, the
application is suspended and does not generate events; otherwise, events
are generated.
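The suspend/resume protocol can be sketched in C; the actual implementation uses background shells, and `on_sigusr1()`, `install_sigusr1()` and `apply_toggle()` below are hypothetical. The handler only records the request; the generator's main loop applies it and creates or removes the state file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t toggle_requested;
static int suspended;

static void on_sigusr1(int sig)
{
	(void) sig;
	toggle_requested = 1; /* Defer the real work to the main loop. */
}

static int install_sigusr1(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigusr1;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Called from the generator's main loop. The state file's presence tells
 * the controller that the generator is suspended (no events produced).
 */
static int apply_toggle(const char *state_file)
{
	if (!toggle_requested) {
		return 0;
	}
	toggle_requested = 0;
	suspended = !suspended;
	if (suspended) {
		int fd = open(state_file, O_CREAT | O_WRONLY, 0600);

		if (fd < 0) {
			return -1;
		}
		close(fd);
	} else if (unlink(state_file) < 0) {
		return -1;
	}
	return 0;
}
```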
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sun, 12 Nov 2017 19:19:01 +0000 (14:19 -0500)]
Fix: create lttng run dir regardless of user privilege
The changes made when centralizing the configuration handling
introduced a regression which caused the rundir to be created only
when running as the root user.
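A minimal sketch of creating the run directory for every user, not only root. The paths assume the usual LTTng defaults (`/var/run/lttng` for root, `$HOME/.lttng` otherwise), and `compute_rundir()`/`create_rundir()` are hypothetical helpers rather than the session daemon's configuration code; the directory mode is chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Choose the run directory from the effective UID. The paths mirror the
 * usual LTTng defaults but are assumptions in this sketch.
 */
static int compute_rundir(char *buf, size_t len, uid_t euid, const char *home)
{
	int ret;

	if (euid == 0) {
		ret = snprintf(buf, len, "/var/run/lttng");
	} else {
		if (!home) {
			return -1;
		}
		ret = snprintf(buf, len, "%s/.lttng", home);
	}
	/* Reject encoding errors and truncated paths. */
	return (ret < 0 || (size_t) ret >= len) ? -1 : 0;
}

/* Create the run directory regardless of privilege; an existing one is fine. */
static int create_rundir(const char *rundir)
{
	const mode_t mode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;

	if (mkdir(rundir, mode) == -1 && errno != EEXIST) {
		return -1;
	}
	return 0;
}
```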
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Olivier Blin [Fri, 27 Oct 2017 09:46:19 +0000 (11:46 +0200)]
Fix: Make version.h generation work with dash
version.h generation failed when using dash as shell:
Generating version.h... /bin/sh: 24: Syntax error: Missing '))'
dash does not handle the following construct:
git_describe="$((cd /path/to/lttng-tools/.; git describe) 2>/dev/null)"
Use backquotes instead.
The fix has been tested with dash and bash.
Signed-off-by: Olivier Blin <olivier.blin@softathome.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Philippe Proulx [Fri, 8 Sep 2017 02:52:48 +0000 (22:52 -0400)]
lttng-enable-event(1): filtering: specify that `$ctx.cpu_id` is available
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 9 Nov 2017 22:46:54 +0000 (17:46 -0500)]
centralize sessiond config option handling
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sun, 12 Nov 2017 16:41:47 +0000 (11:41 -0500)]
Fix: buffer overflow warning in python bindings
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 8 Nov 2017 19:02:07 +0000 (14:02 -0500)]
Tests fix: BT2 does not output the metadata of a trace collection
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 7 Aug 2017 20:07:53 +0000 (16:07 -0400)]
Update version to 2.11.0-pre
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Wed, 2 Aug 2017 18:26:18 +0000 (14:26 -0400)]
Typo: occured -> occurred
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 2 Aug 2017 20:49:44 +0000 (16:49 -0400)]
Fix: ensure kernel context is in a list before trying to delete it
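The guard this subject describes can be sketched with a minimal intrusive list: a node that was never added keeps NULL links, so deletion checks for that first and re-marks the node as detached afterwards. The list type and `list_del_if_linked()` are hypothetical stand-ins, not the list API lttng-tools actually uses.

```c
#include <stddef.h>

/* Minimal intrusive doubly-linked list; a stand-in for illustration only. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init_head(struct list_node *head)
{
	head->prev = head;
	head->next = head;
}

static void list_add_tail(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* A node that was never added (or was already removed) has NULL links. */
static int list_node_is_linked(const struct list_node *node)
{
	return node->next != NULL && node->prev != NULL;
}

/*
 * Destroy-path helper: unlink the node only if it is actually part of a
 * list, then mark it detached so a second call is harmless.
 */
static void list_del_if_linked(struct list_node *node)
{
	if (!list_node_is_linked(node)) {
		return;
	}
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = NULL;
	node->prev = NULL;
}
```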
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 2 Aug 2017 19:24:00 +0000 (15:24 -0400)]
Harmonize return code conventions in context handling
Mathieu Desnoyers [Wed, 2 Aug 2017 15:34:43 +0000 (11:34 -0400)]
Fix: uninitialized return value on error path
Found by Coverity:
*** CID 1378810: Uninitialized variables (UNINIT)
/src/bin/lttng-sessiond/context.c: 73 in add_kctx_all_channels()
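A Coverity UNINIT finding generally means some path reaches the return (often through an error label) without `ret` having been assigned. The sketch below shows the usual remedy, initializing `ret` at its declaration; the channel type and helpers are hypothetical and do not reproduce the code in context.c.

```c
#include <stddef.h>

/* Hypothetical channel list node; `fail` simulates a failing add. */
struct channel {
	struct channel *next;
	int fail;
};

/* Hypothetical per-channel operation: 0 on success, negative on error. */
static int add_ctx_to_channel(const struct channel *chan)
{
	return chan->fail ? -1 : 0;
}

/*
 * `ret` is initialized up front so that every path reaching `end`,
 * including an empty channel list, returns a defined value.
 */
static int add_ctx_all_channels(const struct channel *head)
{
	int ret = 0;
	const struct channel *chan;

	for (chan = head; chan != NULL; chan = chan->next) {
		ret = add_ctx_to_channel(chan);
		if (ret < 0) {
			goto end;
		}
	}
end:
	return ret;
}
```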
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Philippe Proulx [Fri, 28 Jul 2017 20:36:48 +0000 (16:36 -0400)]
lttng enable-channel: disallow --overwrite and --blocking-timeout
The overwrite mode has no impact on LTTng-UST when a blocking
timeout is set.
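The rule can be sketched as an up-front validation of the parsed options. `struct channel_opts` and `validate_channel_opts()` are hypothetical names, and the error text is illustrative, not the exact message printed by `lttng enable-channel`.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical subset of parsed `lttng enable-channel` options. */
struct channel_opts {
	bool overwrite;            /* --overwrite was given */
	bool blocking_timeout_set; /* --blocking-timeout was given */
};

/*
 * Reject the combination before contacting the session daemon: per the
 * commit message, overwrite mode has no effect on LTTng-UST once a
 * blocking timeout is set, so accepting both would be misleading.
 */
static int validate_channel_opts(const struct channel_opts *opts)
{
	if (opts->overwrite && opts->blocking_timeout_set) {
		fprintf(stderr, "--overwrite and --blocking-timeout are mutually exclusive\n");
		return -1;
	}
	return 0;
}
```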
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Philippe Proulx [Fri, 28 Jul 2017 20:30:42 +0000 (16:30 -0400)]
lttng-enable-channel(1): reword --blocking-timeout, document in description
Also change the synopsis so that you can specify either --blocking-timeout
or --overwrite (or neither), but not both.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Philippe Proulx [Fri, 28 Jul 2017 19:48:32 +0000 (15:48 -0400)]
lttng enable-channel: --blocking-timeout opt.: use `inf` instead of -1
It might be -1 for the API, but for a command-line interface used by
humans, `inf` is more meaningful than -1.
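A minimal sketch of how a command-line parser can map `inf` to the API's -1 while still rejecting a literal negative value. `parse_blocking_timeout()` is a hypothetical helper, and the value is assumed to be in microseconds.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Map the human-friendly `inf` to -1 ("block forever" at the API level)
 * and otherwise accept only a non-negative decimal integer, assumed to
 * be a timeout in microseconds. Returns 0 on success, -1 on bad input.
 */
static int parse_blocking_timeout(const char *arg, int64_t *timeout_us)
{
	char *end;
	long long value;

	if (strcmp(arg, "inf") == 0) {
		*timeout_us = -1;
		return 0;
	}
	errno = 0;
	value = strtoll(arg, &end, 10);
	if (errno != 0 || end == arg || *end != '\0' || value < 0) {
		return -1;
	}
	*timeout_us = (int64_t) value;
	return 0;
}
```

Rejecting negative numbers keeps `inf` as the only spelling of "no timeout" on the command line, while the API keeps its -1 convention internally.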
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Fri, 28 Jul 2017 18:15:40 +0000 (14:15 -0400)]
Cleanup: remove unused internal structure
There has been a TODO to add content to lttcomm_relayd_update_sync_info
since 2012 (commit 173af62f4804133d4a7f45e34b6f72126f3eca5f).
The intent is unclear and this is never going to happen, so
remove it.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 1 Aug 2017 18:53:25 +0000 (14:53 -0400)]
Cleanup: remove unnecessary extern qualifier
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 1 Aug 2017 18:52:57 +0000 (14:52 -0400)]
Docs: document the trigger API
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 1 Aug 2017 18:29:02 +0000 (14:29 -0400)]
Docs: document the notification API
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 1 Aug 2017 18:23:43 +0000 (14:23 -0400)]
Docs: document the notification channel API
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 1 Aug 2017 17:57:39 +0000 (13:57 -0400)]
Docs: document the evaluation API
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 1 Aug 2017 16:25:57 +0000 (12:25 -0400)]
Docs: document the lttng_condition API
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 31 Jul 2017 18:58:18 +0000 (14:58 -0400)]
Docs: document the lttng_buffer_usage condition API
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 31 Jul 2017 18:08:19 +0000 (14:08 -0400)]
Docs: document the lttng_action_notify action type
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 31 Jul 2017 18:07:59 +0000 (14:07 -0400)]
Docs: document the lttng_action API
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 31 Jul 2017 21:51:35 +0000 (17:51 -0400)]
Fix: ambiguous ownership of kernel context by multiple channels
A kernel context, when added to multiple channels, must be copied
before being added to individual channels. The current code
adds the same ltt_kernel_context structure to multiple kernel
channels which introduces a conceptual ambiguity in the ownership
of the context object.
Concretely, creating multiple kernel channels and adding a context
to all of them (by not specifying a channel name) causes the context
to be added to each channel's list of contexts, overwriting the
context's list node, and causing the channel context lists to become
corrupted. This results in crashes being observed during the
destruction of the session.
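The ownership rule can be sketched with a minimal intrusive list: a context embeds exactly one list node, so linking the same object into a second channel overwrites the `prev`/`next` pointers set by the first. Giving each channel its own copy avoids that. The list, `struct kctx`, `struct kchan` and the helpers are hypothetical stand-ins for the lttng-tools types.

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal intrusive doubly-linked list; a stand-in for illustration only. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init_head(struct list_node *head)
{
	head->prev = head;
	head->next = head;
}

static void list_add_tail(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Hypothetical kernel context and channel types. */
struct kctx {
	int type;              /* which context field, e.g. procname */
	struct list_node node; /* can link this object into ONE list only */
};

struct kchan {
	struct list_node ctx_list;
};

#define KCTX_FROM_NODE(n) \
	((struct kctx *) ((char *) (n) - offsetof(struct kctx, node)))

static struct kctx *kctx_copy(const struct kctx *src)
{
	struct kctx *copy = malloc(sizeof(*copy));

	if (!copy) {
		return NULL;
	}
	copy->type = src->type;
	copy->node.prev = NULL; /* not in any list yet */
	copy->node.next = NULL;
	return copy;
}

/*
 * Add `ctx` to every channel, each channel receiving (and owning) its
 * own copy. On allocation failure, copies already added stay owned by
 * their channels.
 */
static int add_kctx_to_all_channels(struct kchan *chans, size_t count,
		const struct kctx *ctx)
{
	size_t i;

	for (i = 0; i < count; i++) {
		struct kctx *copy = kctx_copy(ctx);

		if (!copy) {
			return -1;
		}
		list_add_tail(&copy->node, &chans[i].ctx_list);
	}
	return 0;
}
```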
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Philippe Proulx [Mon, 31 Jul 2017 18:09:44 +0000 (14:09 -0400)]
lttng-enable-channel(1): move --output description to maintain A-Z ordering
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Philippe Proulx [Mon, 31 Jul 2017 18:08:28 +0000 (14:08 -0400)]
lttng-enable-channel(1): document --monitor-timer
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 28 Jul 2017 21:00:07 +0000 (17:00 -0400)]
Prettify channel listing
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 27 Jul 2017 22:39:45 +0000 (18:39 -0400)]
Use pipe instead of eventfd() for notification command queue
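The commit message gives no rationale. One plausible motivation is portability, since eventfd() is Linux-specific while pipe() is POSIX, but that is an assumption. The sketch shows the common self-pipe pattern for waking a poll()-based thread when a command is queued; `struct cmd_wakeup` and its functions are hypothetical, not the session daemon's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical wakeup mechanism for a command queue. */
struct cmd_wakeup {
	int fds[2]; /* fds[0]: read end polled by the consumer; fds[1]: write end */
};

static int cmd_wakeup_init(struct cmd_wakeup *w)
{
	int flags;

	if (pipe(w->fds) < 0) {
		return -1;
	}
	/* Non-blocking read end: reading an empty pipe fails with EAGAIN instead of hanging. */
	flags = fcntl(w->fds[0], F_GETFL);
	if (flags < 0 || fcntl(w->fds[0], F_SETFL, flags | O_NONBLOCK) < 0) {
		close(w->fds[0]);
		close(w->fds[1]);
		return -1;
	}
	return 0;
}

static void cmd_wakeup_fini(struct cmd_wakeup *w)
{
	close(w->fds[0]);
	close(w->fds[1]);
}

/* Producer side: write one token per command pushed onto the queue. */
static int cmd_wakeup_signal(struct cmd_wakeup *w)
{
	const char token = 'c';
	ssize_t n;

	do {
		n = write(w->fds[1], &token, 1);
	} while (n < 0 && errno == EINTR);
	return n == 1 ? 0 : -1;
}

/* Consumer side: read one token per command popped from the queue. */
static int cmd_wakeup_consume(struct cmd_wakeup *w)
{
	char token;
	ssize_t n;

	do {
		n = read(w->fds[0], &token, 1);
	} while (n < 0 && errno == EINTR);
	return n == 1 ? 0 : -1;
}
```

Unlike an eventfd counter, which a single read() returns and resets as a whole, the pipe keeps one byte per signal, so the consumer can pair each token with one queued command.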
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 26 Jul 2017 14:53:53 +0000 (10:53 -0400)]
Cleanup: useless reset of ret to zero
ret is overwritten in the normal code flow.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 26 Jul 2017 14:52:15 +0000 (10:52 -0400)]
Fix: ret is never used on error_open code path
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 26 Jul 2017 14:29:17 +0000 (10:29 -0400)]
Fix: use error code path instead of break when errors happen before execl
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 25 Jul 2017 21:56:58 +0000 (17:56 -0400)]
Cleanup: ignore useless check of execl() return value
execl() only returns if there is an error. errno is handled, hence there is
no need to consider the return value, which would always be -1.
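A minimal sketch of the pattern: since execl() returns only on failure, any statement after it is the error path and needs no check of the (always -1) return value. `spawn_and_wait()` is a hypothetical helper, and exiting the child with status 127 follows the shell convention for "command could not be executed"; it is a choice made for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run `path` in a child process and return its exit status, or -1 if
 * fork()/waitpid() fails or the child did not exit normally.
 */
static int spawn_and_wait(const char *path)
{
	pid_t pid;
	int status;

	pid = fork();
	if (pid < 0) {
		return -1;
	}
	if (pid == 0) {
		execl(path, path, (char *) NULL);
		/* Only reached if execl() failed (errno is set); no return value check needed. */
		_exit(127);
	}
	if (waitpid(pid, &status, 0) < 0) {
		return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```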
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 25 Jul 2017 21:46:47 +0000 (17:46 -0400)]
Fix: wrong variable assignment on error
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 25 Jul 2017 21:43:58 +0000 (17:43 -0400)]
Cleanup: remove dead increment of pointer
No further memcpy is performed, so there is no need to increment the pointer.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 25 Jul 2017 21:20:45 +0000 (17:20 -0400)]
Fix: missing error handling in use of print_tabs()
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 25 Jul 2017 21:12:31 +0000 (17:12 -0400)]
Cleanup: functions shall have a single exit point
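The single-exit-point convention can be sketched as the familiar `goto end` pattern: every failure sets `ret` and jumps to one label where cleanup happens exactly once. `read_first_line()` is a hypothetical example function, not code from this commit.

```c
#include <stdio.h>
#include <stdlib.h>

#define LINE_MAX_LEN 256

/*
 * Read the first line of `path` into a newly allocated buffer owned by
 * the caller. Returns 0 on success, -1 on error; all paths exit via `end`.
 */
static int read_first_line(const char *path, char **line_out)
{
	int ret = 0;
	FILE *f = NULL;
	char *buf = NULL;

	f = fopen(path, "r");
	if (!f) {
		ret = -1;
		goto end;
	}
	buf = malloc(LINE_MAX_LEN);
	if (!buf) {
		ret = -1;
		goto end;
	}
	if (!fgets(buf, LINE_MAX_LEN, f)) {
		ret = -1;
		goto end;
	}
	*line_out = buf;
	buf = NULL; /* ownership transferred to the caller */
end:
	free(buf);
	if (f) {
		fclose(f);
	}
	return ret;
}
```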
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 25 Jul 2017 20:57:50 +0000 (16:57 -0400)]
Cleanup: remove dead assignment
Only handle cases where the returned error is not EEXIST. ret is
overwritten anyway.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 25 Jul 2017 20:29:53 +0000 (16:29 -0400)]
Cleanup: remove dead assignment
ret is not used when jumping to error_no_alloc.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 25 Jul 2017 20:26:25 +0000 (16:26 -0400)]
Cleanup: remove dead assignment
Neither calling site uses the return value, and errors are already
managed inside the called function.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 25 Jul 2017 19:55:31 +0000 (15:55 -0400)]
Cleanup: remove dead assignment
Artifact of the refactor done in commit
f8f3885cc52af9d3c951da78989d6f4a25270411.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 25 Jul 2017 19:51:34 +0000 (15:51 -0400)]
Cleanup: remove dead assignment
ret is not used for further error propagation.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 25 Jul 2017 18:11:02 +0000 (14:11 -0400)]
Cleanup: remove dead assignment
ret is overwritten, so there is no need to reset it.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 25 Jul 2017 18:07:30 +0000 (14:07 -0400)]
Cleanup: remove dead assignment
ret is always overwritten, hence assigning a value here is not necessary.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>