Jérémie Galarneau [Tue, 8 May 2018 16:00:41 +0000 (12:00 -0400)]
Cleanup: send_fds functions are not const-correct
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 3 May 2018 20:06:11 +0000 (16:06 -0400)]
Remove unused ltt_session look-up result
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 3 May 2018 18:57:07 +0000 (14:57 -0400)]
Clean-up: reduce indentation level of create_channel_per_uid()
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 3 May 2018 18:35:24 +0000 (14:35 -0400)]
Enforce locking assumptions during channel creation
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 2 May 2018 21:43:48 +0000 (17:43 -0400)]
Cleanup: misleading create_ust_app_session() name
create_ust_app_session() does not necessarily create an
ust_app_session; it will look for an existing one and return it
and only create one if it fails to do so.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 2 May 2018 19:27:37 +0000 (15:27 -0400)]
Rename rotate_count to current_archive_id
The ltt_session's rotate count will no longer be used only to
count the number of rotations. It will be used to tag streams
with a "trace archive chunk id" that indicates the epoch of
their creation.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 2 May 2018 18:39:14 +0000 (14:39 -0400)]
Cleanup: name of send_sessiond_channel() is misleading
This function sends a channel to the sessiond _and_ to the
relay daemon (if applicable).
Comments are updated to reflect this change and the publication
of streams towards the relay daemon is now logged.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Fri, 27 Apr 2018 21:27:29 +0000 (17:27 -0400)]
Print the git version used to build from a distribution tarball
Currently, the git version is omitted when building from a distribution
tarball. With this change, 'lttng version' and 'lttng --version'
print the state of the git tree which produced the tarball.
git describe is used to produce the description of the tree's
state, along with the "dirty" state (whether or not local
changes were present in the tree).
Note that the 'git version' will not be printed when the
distribution tarball was produced at a release tag (a tag
starting with v[0-9]).
This patch simplifies the generation of the version.h file by
generating a file that is merely included by version.h.
It also ensures that version.tmpl is no longer installed on the
system by the install target.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 1 May 2018 19:22:57 +0000 (15:22 -0400)]
Docs: lttng-version uses the intransitive form of "broke"
To indicate that something is divided, the transitive form
"broken down" is preferred.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Sat, 28 Apr 2018 00:06:08 +0000 (20:06 -0400)]
Fix: relayd streams can be leaked on connection error
There are cases where a connection error can cause streams to be
leaked.
For instance, the control connection could receive an index and
close. Since a packet is in-flight, the stream corresponding to
that index will not close. However, nothing guarantees that
the data connection will be able to receive the packet's data.
If the protocol is respected, this is not a problem. However,
a buggy consumerd or network errors can cause the streams to
remain in the "data in-flight" state and never close.
To mitigate a case observed in the field where a consumerd
would be forcibly closed (network interface brought down) and
cause leaks on the relay daemon, the session is aborted whenever
the control or data connection encounters an error. Aborting
a session causes the streams to be closed regardless of the
fact that data is in-flight.
Currently, only the control connection holds ownership of
the session object. This can cause the following scenario to leak
streams:
1) Control connection receives an index
- Stream is put in "in-flight data" mode
2) Control connection is closed/shutdown cleanly
- try_stream_close refuses to close the stream as data is in-flight,
but it puts the stream in "closed" mode. When the data is
received, the stream will be closed as soon as possible.
3) Data connection closes cleanly or due to an error
- The stream "closing" condition will never be re-evaluated.
Since the data connection has no ownership of the session, it can
never clean up the streams that are waiting for "in-flight" data to
arrive before closing.
This patch lazily associates the data connection to its session
so that the session can be aborted whenever an error happens on
either the data or control connection.
Note that this leaves the relayd vulnerable to a case which will
still leak. If the control connection receives an index and closes
cleanly, the data connection may never have been established
with the consumer daemon, resulting in a leak.
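The close-versus-abort distinction described above can be modelled in a few lines. This is a simplified sketch, not the relayd code: struct stream, try_stream_close(), stream_data_received() and session_abort() are hypothetical names chosen for illustration, and real streams also carry reference counts and locks that are omitted here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a relayd stream. */
struct stream {
	bool closed;         /* a close was requested */
	bool data_in_flight; /* an index arrived but its packet data has not */
	bool destroyed;      /* resources released */
};

/* Close only if no data is in flight; otherwise just mark as closed. */
static void try_stream_close(struct stream *s)
{
	s->closed = true;
	if (!s->data_in_flight) {
		s->destroyed = true;
	}
}

/* Called when the data connection delivers the pending packet. */
static void stream_data_received(struct stream *s)
{
	s->data_in_flight = false;
	if (s->closed) {
		/* Re-evaluate the deferred close. */
		s->destroyed = true;
	}
}

/* Aborting ignores in-flight data and releases every stream. */
static void session_abort(struct stream *streams, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		streams[i].closed = true;
		streams[i].destroyed = true;
	}
}
```

In this model, a stream whose data connection dies after step 2 stays "closed but not destroyed" forever unless something calls session_abort(), which is why the data connection needs its own link to the session.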
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 1 May 2018 15:58:15 +0000 (11:58 -0400)]
Cleanup: fix typo in relayd comment
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Mon, 30 Apr 2018 18:27:35 +0000 (14:27 -0400)]
Fix: ret may be used uninitialized in sample_channel_positions()
sample_channel_positions() returns garbage if
cds_lfht_is_node_deleted(&stream->node.node) is true on the first (and
possibly only) iteration over the consumer_data.stream_per_chan_id_ht hash table.
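The bug pattern is a status variable that is only assigned inside a loop body that may be skipped for every element. A minimal sketch of the fix, with a hypothetical array of nodes standing in for the hash-table iteration; returning -1 when nothing was sampled is an assumption made for illustration, not necessarily the value chosen by the actual fix.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a stream node in the hash table. */
struct node {
	bool deleted;
	unsigned long position;
};

/*
 * Sum the positions of live nodes. Returns 0 if at least one live node was
 * sampled, -1 otherwise.
 */
static int sample_positions(const struct node *nodes, size_t count,
		unsigned long *total)
{
	int ret = -1; /* Initialized: every node may be skipped below. */

	*total = 0;
	for (size_t i = 0; i < count; i++) {
		if (nodes[i].deleted) {
			continue; /* Skipped nodes leave ret untouched. */
		}
		*total += nodes[i].position;
		ret = 0;
	}
	return ret;
}
```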
Found by scan-build.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Fri, 27 Apr 2018 21:20:21 +0000 (17:20 -0400)]
Cleanup: ret is unused in relay_process_data_receive_header()
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 27 Apr 2018 22:23:26 +0000 (18:23 -0400)]
Fix build: in_git_repo is used before being set
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 27 Apr 2018 21:30:50 +0000 (17:30 -0400)]
Fix: partial writes of padding are not checked
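write() may write fewer bytes than requested, so padding has to be written in a loop until the full amount is out. A minimal sketch, assuming zero-filled padding written from a fixed-size buffer; write_padding() is a hypothetical helper, not the LTTng function.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `len` zero bytes to `fd`, retrying on partial writes and EINTR. */
static int write_padding(int fd, size_t len)
{
	char zeroes[4096];

	memset(zeroes, 0, sizeof(zeroes));
	while (len > 0) {
		size_t chunk = len < sizeof(zeroes) ? len : sizeof(zeroes);
		ssize_t ret = write(fd, zeroes, chunk);

		if (ret < 0) {
			if (errno == EINTR) {
				continue;
			}
			return -1;
		}
		/* A short write is not an error: account for what was written. */
		len -= (size_t) ret;
	}
	return 0;
}
```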
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 27 Apr 2018 20:47:04 +0000 (16:47 -0400)]
Propagate whether a connection was closed cleanly or after an error
This allows a follow-up fix that requires this distinction to
decide whether a session must be closed or aborted.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Fri, 27 Apr 2018 19:44:19 +0000 (15:44 -0400)]
Fix: relayd protocol field present from minor 8 is not checked
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Mon, 9 Apr 2018 14:23:33 +0000 (10:23 -0400)]
Add DBG statement for TCP keep-alive options
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 25 Apr 2018 18:57:29 +0000 (14:57 -0400)]
Fix: relay_recv_metadata does not check for partial write
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 21 Feb 2018 05:57:26 +0000 (00:57 -0500)]
Use non-blocking recvmsg() for data/ctrl connections of lttng-relayd
The relay daemon's use of blocking network I/O can cause severe
performance degradation when interacting with unresponsive peers.
This patch changes the recvmsg() calls to use the MSG_DONTWAIT flag
which makes the call non-blocking. The connection classes are modified
to handle the partial reception of buffers.
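With MSG_DONTWAIT, a receive call returns whatever is available (or fails with EAGAIN/EWOULDBLOCK), so each connection must remember how much of the expected buffer has been filled so far. A minimal sketch of such per-connection state; struct partial_recv and partial_recv_step() are hypothetical names, and recv() is used here as the single-buffer equivalent of recvmsg().

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical per-connection reception state. */
struct partial_recv {
	char *buf;
	size_t expected; /* total size of the message being received */
	size_t received; /* bytes received so far */
};

/*
 * Receive as much as is available without blocking.
 * Returns 1 when the buffer is complete, 0 if more data is needed,
 * -1 on error or orderly shutdown by the peer.
 */
static int partial_recv_step(int sock, struct partial_recv *state)
{
	while (state->received < state->expected) {
		ssize_t ret = recv(sock, state->buf + state->received,
				state->expected - state->received, MSG_DONTWAIT);

		if (ret > 0) {
			state->received += (size_t) ret;
		} else if (ret == 0) {
			return -1; /* Peer closed the connection. */
		} else if (errno == EAGAIN || errno == EWOULDBLOCK) {
			return 0; /* Nothing more for now; poll again later. */
		} else if (errno != EINTR) {
			return -1;
		}
	}
	return 1;
}
```

The caller keeps the state between poll() wake-ups and only processes the message once the step function reports completion.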
The sendmsg() calls are still blocking, but this is assumed to
represent a fairly minimal risk of actually blocking given that
the control protocol's replies consist of 4-byte status codes.
A similar approach could be used to make the live connections
non-blocking as that side may also suffer from the same resiliency
problems. So far, no users have reported this problem so it is
not prioritised.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 24 Apr 2018 19:58:41 +0000 (15:58 -0400)]
Fix: unprivileged sessiond agent port clashes with root sessiond
This fix addresses the same problem as reported in
f28f9e44.
The session daemon now tries to bind the agent TCP socket to a
port within a range (10 ports by default). The session daemon
will use the first available TCP port within that range when
binding to "localhost". It is still possible to restrict the
session daemon to the broken behaviour by specifying an agent
port using the --agent-tcp-port PORT option. If that option is used,
the session daemon will attempt to bind to that port. If it
fails, agent tracing will be marked as disabled.
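Binding to the first free port of a range amounts to trying each port in turn and stopping at the first bind() that does not fail with EADDRINUSE. A minimal sketch, assuming an IPv4 socket bound to 127.0.0.1; bind_first_available() is a hypothetical helper, not the session daemon's function, and the range size is passed in rather than read from the configuration.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Try ports [base, base + count) on 127.0.0.1 and return a listening socket
 * bound to the first free one, storing that port in *bound_port.
 * Returns -1 if no port in the range could be bound.
 */
static int bind_first_available(uint16_t base, unsigned int count,
		uint16_t *bound_port)
{
	for (unsigned int i = 0; i < count; i++) {
		uint16_t port = (uint16_t) (base + i);
		struct sockaddr_in addr;
		int fd = socket(AF_INET, SOCK_STREAM, 0);

		if (fd < 0) {
			return -1;
		}
		memset(&addr, 0, sizeof(addr));
		addr.sin_family = AF_INET;
		addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
		addr.sin_port = htons(port);
		if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0 &&
				listen(fd, 1) == 0) {
			*bound_port = port;
			return fd;
		}
		int saved_errno = errno;
		close(fd);
		if (saved_errno != EADDRINUSE) {
			return -1; /* Unexpected error: stop scanning. */
		}
	}
	return -1;
}
```

Socket options that the real implementation may set (SO_REUSEADDR, for instance) are not reflected in this sketch.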
This fix is backported since the current logic of binding to a
set port means that the default configuration on Ubuntu, Debian,
and other distributions that launch an lttng-sessiond on boot does
not allow the tracing of agent domains (Java Util Logging, log4j,
and Python logging back-ends).
By default, users are not part of the tracing group and it is
not reasonable to expect users to be part of that group for
userspace tracing.
The behaviour of the "system" lttng-sessiond does not change
as it will bind to the first available port within the range.
The non-privileged session daemons launched afterwards
will be able to bind to other ports available within the range.
Reported-by: Deborah Barnard <starfallprojects@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 24 Apr 2018 15:21:37 +0000 (11:21 -0400)]
Fix: erroneous use of extern keyword
The extern keyword is erroneously (or at least, uselessly) used
for an internal API where LTTNG_HIDDEN is meant to be used.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 23 Apr 2018 23:03:16 +0000 (19:03 -0400)]
Fix: failure to launch agent thread is not reported
A session daemon may fail to launch its agent thread. In such
a case, the tracing of agent domains fails silently as events
never get enabled through the agent.
The problem that was reported was caused by a second session
daemon being already bound on the agent TCP socket port, which
prevented the launch of the agent thread.
While in this situation tracing is still not possible, the user
will at least get an error indicating as such when enabling
an event in those domains.
Reported-by: Deborah Barnard <starfallprojects@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 23 Apr 2018 20:36:25 +0000 (16:36 -0400)]
Fix: agent may not be ready on launch
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 23 Apr 2018 19:45:12 +0000 (15:45 -0400)]
Cleanup: misleading variable name
Using "running" implies that the thread is guaranteed to be
functional/ready. The intention of those "running" flags is only
to indicate that the underlying pthread was created. The thread
may not be running anymore and these flags should not be used
to check if the thread is "ready" to process anything.
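One way to tell "the pthread was created" apart from "the thread is ready" is to have the thread itself signal readiness once its initialization is done, and have the launcher wait for that signal. A minimal sketch using a mutex and a condition variable; the names are hypothetical and the session daemon may use a different mechanism.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical readiness barrier between a launcher and a worker thread. */
struct thread_ready {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool ready;  /* set by the worker once it can process requests */
	bool failed; /* set by the worker if its initialization failed */
};

static void thread_ready_signal(struct thread_ready *tr, bool success)
{
	pthread_mutex_lock(&tr->lock);
	tr->ready = success;
	tr->failed = !success;
	pthread_cond_broadcast(&tr->cond);
	pthread_mutex_unlock(&tr->lock);
}

/* Block until the worker reports success or failure; true if ready. */
static bool thread_ready_wait(struct thread_ready *tr)
{
	bool ready;

	pthread_mutex_lock(&tr->lock);
	while (!tr->ready && !tr->failed) {
		pthread_cond_wait(&tr->cond, &tr->lock);
	}
	ready = tr->ready;
	pthread_mutex_unlock(&tr->lock);
	return ready;
}

/* Example worker: pretend initialization (e.g. binding a socket) succeeds. */
static void *worker(void *arg)
{
	thread_ready_signal(arg, true);
	/* ... process requests ... */
	return NULL;
}
```

A worker whose initialization fails would call thread_ready_signal(tr, false), letting the launcher report the failure instead of assuming that a created thread is a working one.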
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 23 Apr 2018 19:29:39 +0000 (15:29 -0400)]
Fix: checking for existing session daemon is done after daemonizing
The session daemon checks that no other session daemons are
running only after daemonizing. This means that launching the
daemon in background or daemon modes will appear to succeed even
if the launch failed due to an already present daemon.
The check is performed using both the client socket and the lock
file. This fix also addresses another problem that would cause
the pid file to be overwritten and deleted even if the session daemon
failed to launch.
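The essence of the fix is ordering: take the exclusive lock file (and check the client socket) in the foreground process, and only daemonize once that succeeds, so that a failure is still reported through the exit status. A minimal sketch of the lock-file half, assuming an flock()-based lock; acquire_lock_file() is a hypothetical helper and the daemonize step is only indicated in a comment.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Open (creating if needed) and exclusively lock `path` without blocking.
 * Returns the locked fd, or -1 if another holder (e.g. a running session
 * daemon) already has the lock. The fd must stay open while running.
 */
static int acquire_lock_file(const char *path)
{
	int fd = open(path, O_CREAT | O_WRONLY, 0600);

	if (fd < 0) {
		return -1;
	}
	if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Intended call order (sketch):
 *   lock_fd = acquire_lock_file(path);
 *   if (lock_fd < 0): print an error and exit with a failure status
 *   daemonize();      only now detach from the terminal
 *   write the pid file, which is never touched if the launch failed
 */
```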
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 10 Apr 2018 20:33:32 +0000 (16:33 -0400)]
Fix: null pointer dereference in lttng_rotation_handle_destroy
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Geneviève Bastien [Thu, 1 Mar 2018 22:11:41 +0000 (17:11 -0500)]
sessiond: rename syscall.h so it does not conflict with system
Signed-off-by: Geneviève Bastien <gbastien+lttng@versatic.net>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 4 Apr 2018 19:32:27 +0000 (15:32 -0400)]
Tests: Handle rotations happening on two separate days during testing
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 3 Apr 2018 20:45:23 +0000 (16:45 -0400)]
Tests: Clean trace_path after each subtest
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 3 Apr 2018 20:36:42 +0000 (16:36 -0400)]
Tests: Use for loop for identical validation
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 3 Apr 2018 20:01:18 +0000 (16:01 -0400)]
Tests: Count number of chunks using ls
Instead of removing each validated chunk and checking that the directory is
empty, count the number of chunks present at the beginning and check
that the count is equal to three.
Let the caller take care of cleaning up the generated files.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 10 Apr 2018 18:40:32 +0000 (14:40 -0400)]
Fix: quiet option is not set in sessiond-config
The quiet option is currently set directly while parsing the
command line options of the lttng-sessiond. Since it is not
set in the sessiond configuration object, its default value
(false) overwrites the lttng_opt_quiet option when the
configuration is applied.
Reported-by: Stanislav Vovk <stanislav.vovk@windriver.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 10 Apr 2018 17:56:47 +0000 (13:56 -0400)]
Fix: hold consumer socket lock for consumer_send_msg
The lock is held and released around the recv() section, but it is not
held during the send section of the PID registry look-up failure path.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 9 Apr 2018 19:15:35 +0000 (15:15 -0400)]
Fix: use signed member to transport enum value
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Gregory LEOCADIE [Thu, 29 Mar 2018 10:52:30 +0000 (12:52 +0200)]
Fix: use off_t type for lseek function return value to avoid overflow
Context: LTTng is configured in live mode with only one channel, getting
traces for a long-running application (days of uptime).
The trace file grows large (many GBs), so the offset exceeds INT_MAX.
When getting a packet at such an offset, lseek returns a value bigger
than INT_MAX. This value is stored in a variable "ret" of type int,
which overflows and leads to an error being sent to the viewer
(babeltrace), which then stops.
[error] get_data_packet: error.
[error] get_data_packet failed
[error] Unknown return code 0
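With a 64-bit off_t, lseek() can return offsets past INT_MAX, which an int cannot represent. A minimal sketch that keeps the result in an off_t; seek_to() is a hypothetical helper. It seeks past the end of an empty temporary file, which is allowed and does not allocate disk space. The 3 GiB offset is arbitrary, and on 32-bit systems this assumes a build with _FILE_OFFSET_BITS=64.

```c
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Seek `fd` to `offset` and return the resulting position, or -1 on error. */
static off_t seek_to(int fd, off_t offset)
{
	/* Keep the result in an off_t: an int cannot hold values past INT_MAX. */
	off_t pos = lseek(fd, offset, SEEK_SET);

	if (pos == (off_t) -1) {
		perror("lseek");
	}
	return pos;
}
```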
Signed-off-by: Gregory LEOCADIE <g.leocadie@criteo.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 7 Apr 2018 20:07:56 +0000 (16:07 -0400)]
Extend the rotation API to provide network trace archive locations
The current lttng-ctl rotation API does not allow a user to
differentiate between a network or local trace archive location.
The API currently only provides a "path" which is absolute when
a local rotation is completed, and relative (to an unknown location)
when the trace is streamed to a relay daemon.
This change introduces the lttng_trace_archive_location interface
to express these locations unambiguously. It is currently only
used by the rotation control API, but the intention is to also use
it for future interfaces which need to express a location description.
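The point of the new interface is that a location carries its own type, so a client no longer has to guess whether a path is absolute and local, or relative to a relay daemon. The tagged-union sketch below only illustrates that idea: the type, field, and function names are hypothetical and are not the lttng-ctl declarations, and the output strings are arbitrary.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model of an unambiguous trace archive location. */
enum archive_location_type {
	ARCHIVE_LOCATION_LOCAL,
	ARCHIVE_LOCATION_RELAY,
};

struct archive_location {
	enum archive_location_type type;
	union {
		struct {
			const char *absolute_path;
		} local;
		struct {
			const char *host;
			unsigned short control_port;
			unsigned short data_port;
			const char *relative_path; /* relative to the relayd output */
		} relay;
	} u;
};

/* Format a human-readable description; returns snprintf()'s result. */
static int describe_location(const struct archive_location *loc,
		char *buf, size_t len)
{
	switch (loc->type) {
	case ARCHIVE_LOCATION_LOCAL:
		return snprintf(buf, len, "file://%s", loc->u.local.absolute_path);
	case ARCHIVE_LOCATION_RELAY:
		return snprintf(buf, len, "net://%s:%hu:%hu/%s", loc->u.relay.host,
				loc->u.relay.control_port, loc->u.relay.data_port,
				loc->u.relay.relative_path);
	}
	return -1;
}
```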
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 7 Apr 2018 20:04:51 +0000 (16:04 -0400)]
Increase LTTNG_HOST_NAME_MAX from 64 to 255
POSIX guarantees that a host name will not exceed 255 characters.
Moreover, RFC 1035 limits the length of a fully qualified domain name (FQDN)
to 255 characters.
This limit will be used as part of the lttngctl communication
protocol.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 5 Apr 2018 03:21:02 +0000 (23:21 -0400)]
Add lttng_trace_archive_location lttng-ctl API
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 4 Apr 2018 18:29:46 +0000 (14:29 -0400)]
Clarify notification channel info ht destruction error log
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 4 Apr 2018 16:20:30 +0000 (12:20 -0400)]
Fix: goto end after end label
This check was most probably meant for the previous call and was put after
the end label by mistake. The check is not needed since the end label
directly follows the call.
CID 1388094 (#1 of 1): Double free (USE_AFTER_FREE)
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Jonathan Rajotte [Wed, 4 Apr 2018 16:20:28 +0000 (12:20 -0400)]
Check return value of cds_lfht_destroy
Bubbling up the error is not an option here. Print an error and move on.
CID 1388096: Error handling issues (CHECKED_RETURN)
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Jonathan Rajotte [Wed, 4 Apr 2018 16:20:29 +0000 (12:20 -0400)]
Fix: destroy schedule attr
CID 1388095 (#9-14 of 14): Resource leak (RESOURCE_LEAK)
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Jonathan Rajotte [Tue, 3 Apr 2018 23:09:40 +0000 (19:09 -0400)]
Tests: fix oot and dist for rotation tests
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 3 Apr 2018 22:21:28 +0000 (18:21 -0400)]
Tests: add rotation tests scripts to noinst_SCRIPTS and EXTRA_DIST
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 28 Mar 2018 19:53:56 +0000 (15:53 -0400)]
Tests: SESSION_NAME defined on each iteration of kernel rotation test
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 28 Mar 2018 19:47:31 +0000 (15:47 -0400)]
Tests: Reduce scope of TRACE_PATH to a function
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 28 Mar 2018 19:30:25 +0000 (15:30 -0400)]
Tests: PID_RELAYD is never used
Irrelevant since before 1c362dc78cf2e28c8935efcb5d4a85ef5d5967ba.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 28 Mar 2018 16:05:11 +0000 (12:05 -0400)]
Tests: use functions from utils.sh in rotation tests
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 28 Mar 2018 17:56:27 +0000 (13:56 -0400)]
Tests: consolidate session creation with a uri parameter in utils.sh
Introduce a new create_lttng_session_uri test helper.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 28 Mar 2018 19:21:26 +0000 (15:21 -0400)]
Tests: use modprobe to test for the presence of lttng-modules
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 28 Mar 2018 19:21:00 +0000 (15:21 -0400)]
Tests: missing license header in rotation utils
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 22 Mar 2018 20:15:25 +0000 (16:15 -0400)]
Tests: missing parenthesis in userspace rotation test
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 22 Mar 2018 20:14:53 +0000 (16:14 -0400)]
Tests: use enable_ust_lttng_channel_ok instead of a custom lttng invocation
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 22 Mar 2018 20:13:35 +0000 (16:13 -0400)]
Tests: remove TRACE_PATH at the end of the rotation test only
Clean the contents of the TRACE_PATH directory between tests but do not
delete the directory itself.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 21 Mar 2018 18:29:58 +0000 (14:29 -0400)]
Tests: exit $out gets overridden by EXIT trap from tap/tap.sh
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 29 Mar 2018 20:19:39 +0000 (16:19 -0400)]
Tests: Use SIGTERM instead of SIGKILL
The use of SIGKILL does not guarantee the immediate termination of
background subtasks. Using SIGTERM allows the generator to finish cleanly.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 29 Mar 2018 20:16:26 +0000 (16:16 -0400)]
Add --post-script to tap-driver.sh
Allow warn_process.sh to be run between each test to validate that a
test does not leave ghost processes.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 3 Apr 2018 16:01:33 +0000 (12:01 -0400)]
Tests: add rotation tests to the "check" target
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 28 Feb 2018 21:32:26 +0000 (16:32 -0500)]
Fix: fail on truncation of kernel channel path
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 28 Feb 2018 21:26:11 +0000 (16:26 -0500)]
Fix: fail on truncation of snapshot path
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Fri, 26 Jan 2018 21:56:52 +0000 (16:56 -0500)]
Dedicated error message when relay does not support rotations
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Fri, 19 Jan 2018 16:14:35 +0000 (11:14 -0500)]
Fix: add missing includes for embedded help
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Philippe Proulx [Thu, 18 Jan 2018 22:12:02 +0000 (17:12 -0500)]
Document tracing session rotation features
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 10 Jan 2018 21:23:45 +0000 (16:23 -0500)]
Check for pending notification on notification channel activity
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 3 Apr 2018 14:16:16 +0000 (10:16 -0400)]
Clarify error logging statement of rotation thread
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 3 Apr 2018 14:14:17 +0000 (10:14 -0400)]
Fix: rotation state marked as completed before relayd has completed
The session rotation state is updated to COMPLETED before the
relay daemon has signaled that its rotation has been completed.
This causes users of the "rotation get_info" API to receive
this status before the session archive is readable on the
relay daemon's end.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 3 Apr 2018 14:13:26 +0000 (10:13 -0400)]
Fix: cmd_rotate_set_schedule returns positive error codes
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 3 Apr 2018 14:11:34 +0000 (10:11 -0400)]
Fix: unchecked return value of domain_mkdir()
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 3 Apr 2018 14:07:32 +0000 (10:07 -0400)]
Add initial "no rotation" state to session rotation states
ltt_session structures are initialized in the
LTTNG_ROTATION_STATE_COMPLETED state which is unexpected for
users of the get_info API.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 29 Mar 2018 20:43:14 +0000 (16:43 -0400)]
Fix: erroneous use of kernel consumer error codes
Errors related to the kernel consumer are returned in a code path
that is only used by the userspace tracer, probably as a result
of copy-pasting code.
This patch changes the codes to the corresponding CONSUMER32/64
ones and makes them negative to honor the convention indicated
in the function's header.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 29 Mar 2018 19:13:34 +0000 (15:13 -0400)]
Fix: unhandled prev_seq initial value
The previous sequence number of a stream is initialized to -1ULL
and comparing the current sequence number against it to perform
a rotation will yield unexpected results.
The assumption that the previous sequence number is less than
the current one is assert()'ed on elsewhere.
This case triggers whenever a rotation is performed before the
relay daemon has received a packet for a given stream.
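Because -1ULL is the largest unsigned 64-bit value, a plain "prev_seq < rotation point" comparison is never true for a stream that has not received any packet yet. A minimal sketch that handles the sentinel explicitly; SEQ_NUM_UNSET and crosses_rotation_point() are hypothetical names, and the relay daemon's actual rotation condition may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "no packet received yet" (-1ULL). */
#define SEQ_NUM_UNSET UINT64_MAX

/*
 * Return true if receiving packet `current_seq` moves the stream past the
 * rotation point, i.e. prev_seq < rotate_at_seq <= current_seq. Without the
 * sentinel check, prev_seq == UINT64_MAX is never less than anything, so a
 * stream with no packet before the rotation would never rotate.
 */
static bool crosses_rotation_point(uint64_t prev_seq, uint64_t current_seq,
		uint64_t rotate_at_seq)
{
	bool prev_before = prev_seq == SEQ_NUM_UNSET || prev_seq < rotate_at_seq;

	return prev_before && current_seq >= rotate_at_seq;
}
```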
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Fri, 9 Feb 2018 19:53:32 +0000 (14:53 -0500)]
Size-based rotation
The user can now configure the desired size of each chunk; every time a
chunk grows bigger than the specified size, a rotation is automatically
started. The size of a chunk is measured by polling from the monitoring
thread, so the accuracy depends on the monitoring sampling rate.
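Because the size is sampled rather than checked on every write, a chunk can overshoot the configured size by up to one sampling period's worth of data. A minimal sketch of the per-sample check, assuming the monitoring thread reports the session's total consumed size and that the threshold is re-armed relative to the size at the last rotation; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-session state for size-based rotation. */
struct size_rotation {
	uint64_t chunk_size;            /* configured size of each chunk, bytes */
	uint64_t size_at_last_rotation; /* total consumed size at last rotation */
};

/*
 * Called on each monitoring sample with the session's total consumed size.
 * Returns true (and re-arms the threshold) when the current chunk is bigger
 * than chunk_size and a rotation should start.
 */
static bool size_rotation_sample(struct size_rotation *sr, uint64_t consumed)
{
	if (consumed - sr->size_at_last_rotation <= sr->chunk_size) {
		return false;
	}
	/* The chunk may already exceed chunk_size: sampling is not exact. */
	sr->size_at_last_rotation = consumed;
	return true;
}
```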
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 10 Jan 2018 19:12:35 +0000 (14:12 -0500)]
Add lttng_notification_channel_has_pending_notification()
This new API allows notification channel users to check for
pending notifications without necessarily blocking until
a new notification is ready. Moreover, the pending notification
is not consumed by this new API.
lttng_notification_channel_get_next_notification() must still
be called to consume the new notification.
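The distinction is the one between peeking at and popping from a queue: the new call reports whether something is pending without removing it, while the existing call removes it. A minimal sketch with a hypothetical fixed-size ring buffer standing in for the channel's list of pending notifications; reading new notifications from the channel's socket, which the real API also has to do, is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 8

/* Hypothetical queue of pending notifications (represented as ints). */
struct pending_queue {
	int items[QUEUE_CAPACITY];
	size_t head;  /* index of the oldest pending item */
	size_t count; /* number of pending items */
};

static bool queue_push(struct pending_queue *q, int item)
{
	if (q->count == QUEUE_CAPACITY) {
		return false;
	}
	q->items[(q->head + q->count) % QUEUE_CAPACITY] = item;
	q->count++;
	return true;
}

/* Analogous to the "has pending" check: non-blocking, does not consume. */
static bool queue_has_pending(const struct pending_queue *q)
{
	return q->count > 0;
}

/* Analogous to "get next": consumes the oldest item. */
static bool queue_pop(struct pending_queue *q, int *item)
{
	if (q->count == 0) {
		return false; /* The real API would block here instead. */
	}
	*item = q->items[q->head];
	q->head = (q->head + 1) % QUEUE_CAPACITY;
	q->count--;
	return true;
}
```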
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 9 Jan 2018 22:00:41 +0000 (17:00 -0500)]
Fix: channel lock must be taken to check for pending notifications
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 9 Jan 2018 22:00:27 +0000 (17:00 -0500)]
Docs: typo in notification channel header
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 20 Dec 2017 19:52:33 +0000 (14:52 -0500)]
Fix: circular inclusion of lttng.h results in warning
The circular inclusion of lttng.h, which includes all
public headers, from condition.h results in the following
warning for users of the API:
warning: ‘struct lttng_evaluation’ declared inside
parameter list will not be visible outside of this definition
or declaration
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 15 Dec 2017 21:03:52 +0000 (16:03 -0500)]
Remove unneeded domain.h include
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 29 Nov 2017 16:21:53 +0000 (17:21 +0100)]
Docs: wrong enum value used in evaluation API description
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 9 Feb 2018 20:07:56 +0000 (15:07 -0500)]
Remove unneeded forward declaration in condition headers
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Thu, 11 Jan 2018 18:44:56 +0000 (13:44 -0500)]
Add the GMT offset in the rotated chunk path
The path of a rotated chunk is composed of the start and end timestamp
of the trace inside that chunk. In order to support distributed
environments, we now specify the GMT offset in this path as well. The
date is now formatted in ISO 8601. Here is an example:
~/lttng-traces/<session-name>/20180118T144610-0500-20180118T144611-0500-1
Chunk start: 2018-01-18 14:46:10
Chunk end: 2018-01-18 14:46:11
GMT offset: GMT-5 on both timestamps
Chunk ID: 1 (number of rotations that occurred in this session so far)
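A name in this format can be produced with strftime(): "%Y%m%dT%H%M%S" gives the ISO 8601 basic date and time, and "%z" appends the numeric offset from UTC. A minimal sketch, assuming local time is used and the chunk ID follows a final dash as in the example above; format_chunk_name() is a hypothetical helper, not the session daemon's code.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Build "<start>-<end>-<id>", each timestamp in local time using ISO 8601
 * basic format with a UTC offset, e.g. 20180118T144610-0500.
 * Returns 0 on success, -1 on error or truncation.
 */
static int format_chunk_name(time_t start, time_t end, uint64_t chunk_id,
		char *buf, size_t len)
{
	char start_str[32], end_str[32];
	struct tm tm;
	int ret;

	if (!localtime_r(&start, &tm) ||
			!strftime(start_str, sizeof(start_str), "%Y%m%dT%H%M%S%z", &tm)) {
		return -1;
	}
	if (!localtime_r(&end, &tm) ||
			!strftime(end_str, sizeof(end_str), "%Y%m%dT%H%M%S%z", &tm)) {
		return -1;
	}
	ret = snprintf(buf, len, "%s-%s-%llu", start_str, end_str,
			(unsigned long long) chunk_id);
	return (ret < 0 || (size_t) ret >= len) ? -1 : 0;
}
```

With TZ=EST5 and the Unix timestamps 1516304770 and 1516304771, this produces the chunk name shown in the example above.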
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Thu, 21 Dec 2017 20:32:16 +0000 (15:32 -0500)]
Tests for the session rotation feature
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Thu, 21 Dec 2017 20:28:56 +0000 (15:28 -0500)]
Fix validate_trace_empty test check
Since the output of babeltrace was directly piped into wc, the return
code was never an error even if the trace was invalid. We now split the
commands in two parts: process the trace with babeltrace and check the
error code, and then count the number of lines.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Thu, 21 Dec 2017 19:57:49 +0000 (14:57 -0500)]
Example client to use the session rotation API
This client creates a session with all system calls enabled, offers
the option to periodically rotate the session (indefinitely or only a
set number of times), and calls a script on each rotated chunk. The
provided script compresses the chunk and deletes the original.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Thu, 21 Dec 2017 19:32:32 +0000 (14:32 -0500)]
Save, restore and list the rotation parameters
Add support for saving, restoring and listing the automatic rotation
parameters.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Fri, 9 Feb 2018 19:50:28 +0000 (14:50 -0500)]
Session consumed size notification
Add support for notifications about the total amount of trace data
consumed by a session. Users can register to be notified when a
session has consumed more than a given threshold. The consumed size is
the sum of the data of all channels in the session.
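Summing per-channel totals and notifying when the sum exceeds a threshold can be sketched as below. The edge-triggered behaviour (notify on the sample where the threshold is first crossed, not on every sample above it) and all names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical session state tracked by a notification thread. */
struct session_usage {
	uint64_t threshold;   /* bytes */
	bool above_threshold; /* state at the previous evaluation */
};

/* Sum the consumed bytes of every channel in the session. */
static uint64_t session_consumed_size(const uint64_t *channel_totals,
		size_t channel_count)
{
	uint64_t total = 0;

	for (size_t i = 0; i < channel_count; i++) {
		total += channel_totals[i];
	}
	return total;
}

/* Returns true only on the sample where the threshold is first exceeded. */
static bool session_usage_evaluate(struct session_usage *usage,
		const uint64_t *channel_totals, size_t channel_count)
{
	bool above = session_consumed_size(channel_totals, channel_count) >
			usage->threshold;
	bool notify = above && !usage->above_threshold;

	usage->above_threshold = above;
	return notify;
}
```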
For the review: part of this code was written by Jérémie, but it was on
top of my development branch with major updates on my early work with
notifications, so I had to squash it because it made no sense to keep
Jérémie's code separate.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 21 Dec 2017 16:06:11 +0000 (11:06 -0500)]
Fix: previous channel total is not updated
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 21 Dec 2017 16:03:33 +0000 (11:03 -0500)]
Add likely/unlikely annotations on channel sample handling path
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 20 Dec 2017 21:19:57 +0000 (16:19 -0500)]
Separate session info from channel info in notification thread
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Wed, 20 Dec 2017 20:43:22 +0000 (15:43 -0500)]
Rotate timer
Allow the user to configure a timer to rotate a session periodically.
The user can configure this setting with the API or the new
enable-rotation/disable-rotation commands:
lttng enable-rotation --timer 10s
lttng disable-rotation --timer
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 3 Apr 2018 16:11:24 +0000 (12:11 -0400)]
Simplify lock handling in enqueue_timer_rotate_job()
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 17 Jun 2015 18:39:51 +0000 (14:39 -0400)]
Use utils_parse_time_suffix in create and enable-channel commands
Signed-off-by: Simon Marchi <simon.marchi@polymtl.ca>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 17 Jun 2015 18:07:58 +0000 (14:07 -0400)]
Introduce utils_parse_time_suffix
This function is based on utils_parse_size_suffix, but parses
(relatively short) time suffixes. It returns the time in microseconds.
So far, it supports:
- u/us: microseconds, x1, same as no suffix
- m/ms: milliseconds, x1 000
- s: seconds, x1 000 000
For example:
- 32u becomes 32
- 32us becomes 32
- 32m becomes 32 000
- 32ms becomes 32 000
- 32s becomes 32 000 000
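The suffix rules above can be sketched in C as follows. `parse_time_suffix_sketch()` is a hypothetical stand-in written for illustration, not the actual `utils_parse_time_suffix()` implementation; rejecting values that overflow 64 bits is an assumption about desirable behavior.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "<digits>[u|us|m|ms|s]" into microseconds.
 * Returns 0 and stores the result in *out, or -1 on invalid input.
 */
static int parse_time_suffix_sketch(const char *str, uint64_t *out)
{
	char *end;
	unsigned long long value;
	uint64_t mult;

	assert(str && out);
	if (!isdigit((unsigned char) str[0])) {
		return -1; /* rejects empty strings and signs */
	}
	errno = 0;
	value = strtoull(str, &end, 10);
	if (errno == ERANGE) {
		return -1;
	}

	if (*end == '\0' || strcmp(end, "u") == 0 || strcmp(end, "us") == 0) {
		mult = 1;          /* microseconds, same as no suffix */
	} else if (strcmp(end, "m") == 0 || strcmp(end, "ms") == 0) {
		mult = 1000;       /* milliseconds */
	} else if (strcmp(end, "s") == 0) {
		mult = 1000000;    /* seconds */
	} else {
		return -1;         /* unknown suffix */
	}

	if (value > UINT64_MAX / mult) {
		return -1;         /* result would overflow */
	}
	*out = (uint64_t) value * mult;
	return 0;
}
```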
Signed-off-by: Simon Marchi <simon.marchi@polymtl.ca>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 27 Mar 2018 18:37:32 +0000 (14:37 -0400)]
Fix: use metadata key instead of fd for consumer rotation command
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 14 Mar 2018 21:35:05 +0000 (17:35 -0400)]
Fix: double similar condition
Based on the pattern of the function, threshold_bytes should be used
inside the "else if" condition.
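To illustrate the pattern being fixed, here is a hypothetical equality check for a condition holding either a ratio threshold or a byte threshold. The struct and function names are illustrative only; the point is that the `else if` branch must test the byte threshold instead of repeating the ratio test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical optional thresholds: one of the two is set. */
struct threshold {
	bool ratio_set;
	double ratio;
	bool bytes_set;
	uint64_t bytes;
};

static bool thresholds_equal(const struct threshold *a,
		const struct threshold *b)
{
	assert(a && b);
	if (a->ratio_set && b->ratio_set) {
		return a->ratio == b->ratio;
	} else if (a->bytes_set && b->bytes_set) {
		/* Test the byte threshold here, not the ratio a second time. */
		return a->bytes == b->bytes;
	}
	/* One uses a ratio and the other a byte count (or neither is set). */
	return false;
}
```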
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 23 Mar 2018 20:08:53 +0000 (16:08 -0400)]
Fix: missing type definitions in mi-lttng-3.0.xsd
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 23 Mar 2018 19:58:13 +0000 (15:58 -0400)]
Fix: out of tree build fails on missing header
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Mon, 18 Dec 2017 21:51:41 +0000 (16:51 -0500)]
lttng rotate command
The command-line and API interfaces for the lttng rotate command.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Mon, 18 Dec 2017 21:04:44 +0000 (16:04 -0500)]
Relay rotate pending command
When a session rotation completes and the session is configured to send
its traces to a relay, we have to poll the relay to know when all of the
chunk's data has been written to its disk. To do that, we define a timer
in the sessiond and arm it when the rotation is complete. When the
rotation is complete on the relay, we clear the "rotate_pending" flag in
the session, and the client can then safely access the chunk.
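A sketch of the polling step, assuming the timer handler calls a function like the one below on each expiry. `struct session_sketch`, `relay_chunk_done_fn` and `demo_chunk_done()` are hypothetical; the actual query sent to the relay is abstracted behind the callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical session state, for illustration only. */
struct session_sketch {
	bool rotate_pending;
};

/* Stand-in for the relay query: true once the chunk is fully on disk. */
typedef bool (*relay_chunk_done_fn)(void *ctx);

/* Demonstration stub: reports completion after a number of polls. */
static bool demo_chunk_done(void *ctx)
{
	int *remaining_polls = ctx;

	return --(*remaining_polls) <= 0;
}

/*
 * Called on each timer expiry. Returns true once the rotation is complete
 * on the relay (and the flag is cleared), false to keep polling.
 */
static bool relay_rotate_pending_check(struct session_sketch *session,
		relay_chunk_done_fn chunk_done, void *ctx)
{
	assert(session && chunk_done);
	if (!session->rotate_pending) {
		return true;
	}
	if (chunk_done(ctx)) {
		session->rotate_pending = false;
		return true;
	}
	return false; /* keep the timer armed and poll again later */
}
```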
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Julien Desfossez [Mon, 18 Dec 2017 19:45:56 +0000 (14:45 -0500)]
Rotate command
This is the core of the session rotation command in the session daemon;
there is no client interface for now. For each channel in the session, we
send a request to the related consumer to rotate the channel and add that
channel's key and domain to the channel_pending_rotate_ht hash table.
When the consumer has finished rotating all the streams in the channel,
it sends back a notification. The rotation thread in the session daemon
looks up the channel information in the hash table and finds the
corresponding session. When all channels of a session have finished, the
rotation thread asks the consumer to rename the chunk folder to append
the timestamp of the end of the rotation.
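The bookkeeping described above can be sketched as follows. For brevity, this hypothetical version stores pending (key, domain) pairs in a fixed-size array rather than in the channel_pending_rotate_ht hash table, and all names are illustrative. As in the description, a channel is identified by its key and domain together.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING_CHANNELS 16

enum domain_sketch { DOMAIN_KERNEL, DOMAIN_UST };

struct pending_channel {
	uint64_t key;
	enum domain_sketch domain;
	bool done;
};

/* Hypothetical per-session rotation state. */
struct rotating_session {
	struct pending_channel channels[MAX_PENDING_CHANNELS];
	size_t count;
	size_t remaining;
};

/* Record a channel for which a rotation request was sent to a consumer. */
static void rotation_add_channel(struct rotating_session *s, uint64_t key,
		enum domain_sketch domain)
{
	assert(s->count < MAX_PENDING_CHANNELS);
	s->channels[s->count++] = (struct pending_channel) { key, domain, false };
	s->remaining++;
}

/*
 * Handle a consumer's "channel rotated" notification. Returns true when
 * it completes the session's rotation, i.e. the chunk folder can be renamed.
 */
static bool rotation_channel_done(struct rotating_session *s, uint64_t key,
		enum domain_sketch domain)
{
	size_t i;

	for (i = 0; i < s->count; i++) {
		struct pending_channel *c = &s->channels[i];

		if (c->key == key && c->domain == domain && !c->done) {
			c->done = true;
			return --s->remaining == 0;
		}
	}
	return false; /* unknown or duplicate notification */
}
```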
On the first rotation, we have an extra step to change the session
directory layout from "<session-name>/<domain>" to
"<session-name>/<session-start-time>-<rotate-end-time>-1/<domain>".
When the rotation starts, the new chunk folder is created immediately
in "<session-name>/<previous-rotate-start-time>-2/<domain>", so we won't
have to move the domain folder(s) after the next rotation has finished;
we only need to rename the chunk folder.
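A small helper that formats the chunk directory names described above. The function name is hypothetical, and the timestamps are passed in as pre-formatted strings since their exact format is not given here; a NULL end timestamp yields the name of a chunk that is still being written.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<session-name>/<start>-<end>-<id>/<domain>" for a completed chunk,
 * or "<session-name>/<start>-<id>/<domain>" when `end` is NULL.
 * Returns 0 on success, -1 on error or truncation.
 */
static int format_chunk_path(char *buf, size_t len, const char *session_name,
		const char *start, const char *end, unsigned int chunk_id,
		const char *domain)
{
	int ret;

	if (end) {
		ret = snprintf(buf, len, "%s/%s-%s-%u/%s", session_name, start,
				end, chunk_id, domain);
	} else {
		/* Chunk still being written: no end timestamp yet. */
		ret = snprintf(buf, len, "%s/%s-%u/%s", session_name, start,
				chunk_id, domain);
	}
	return (ret < 0 || (size_t) ret >= len) ? -1 : 0;
}
```

Creating the in-progress chunk under its final parent means that finishing a rotation only requires renaming that one folder.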
The "mkdir" and "rename" commands are all propagated to the relay if
needed; only the rotate_pending check on the relay is not part of this
patch.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>