From 7a7472507b5cfdde1e205678c0c66c23eb8bb2ad Mon Sep 17 00:00:00 2001
From: compudj
Date: Wed, 11 Jan 2006 21:45:01 +0000
Subject: [PATCH] userspace tracing arch

git-svn-id: http://ltt.polymtl.ca/svn@1469 04897980-b3bd-0310-b5e0-8ef037075253
---
 .../doc/developer/lttng-userspace-tracing.txt | 194 ++++++++++++++++++
 1 file changed, 194 insertions(+)
 create mode 100644 ltt/branches/poly/doc/developer/lttng-userspace-tracing.txt

diff --git a/ltt/branches/poly/doc/developer/lttng-userspace-tracing.txt b/ltt/branches/poly/doc/developer/lttng-userspace-tracing.txt
new file mode 100644
index 00000000..85a31b3d
--- /dev/null
+++ b/ltt/branches/poly/doc/developer/lttng-userspace-tracing.txt
@@ -0,0 +1,194 @@
+
+Some thoughts about userspace tracing
+
+Mathieu Desnoyers January 2006
+
+
+
+* Goals
+
+Fast and secure user space tracing.
+
+Fast :
+
+- 5000ns for a system call is too long. Writing an event directly to memory
+  takes 220ns.
+- Still, we can afford a system call for buffer switch, which occurs less often.
+- No locking, no signal disabling. Disabling signals requires 2 system calls.
+  Mutexes are implemented with a short spin lock, followed by a yield. Yet
+  another system call. In addition, we have no way to know on which CPU we are
+  running when in user mode. We can be preempted anywhere.
+- No contention.
+- No interrupt disabling : it doesn't exist in user mode.
+
+Secure :
+
+- A process shouldn't be able to corrupt the system's trace or another
+  process's trace. It should be limited to its own memory space.
+
+
+
+* Solution
+
+- Signal handler concurrency
+
+Using atomic space reservation in the buffer(s) will remove the requirement for
+locking. This is the fast and safe way to deal with concurrency coming from
+signal handlers.
+
+- Start/stop tracing
+
+Two possible solutions :
+
+Either we export a read-only memory page from kernel to user space. That would
+be seen as something of a hack, as I have never seen such an interface anywhere
+else. It may lead to problems related to exported types. The proper, but slow,
+way to do it would be to have a system call that would return the tracing
+status.
+
+My suggestion is to go for a system call, but only call it :
+
+- when the process starts
+- when receiving a SIG_UPDTRACING
+
+Two possibilities :
+
+- one system call per piece of information, or one system call for all of it.
+- one signal per piece of information, or one signal for "update" tracing info.
+
+I would tend to adopt :
+
+- One signal for "general tracing update"
+  One signal handler would clearly be enough, more would be unnecessary
+  overhead/pollution.
+- One system call for all updates.
+  We will need to have multiple parameters though. We have up to 6 parameters.
+
+syscall get_tracing_info
+
+first parameter : active traces mask (32 bits : 32 traces).
+
+
+Concurrency
+
+We must have per thread buffers. Then, no memory can be written by two threads
+at once. It removes the need for locks (ok, atomic reservation was already doing
+that) and removes false sharing.
+
+
+Multiple traces
+
+By having the number of active traces, we can allocate as many buffers as we
+need. The only thing is that the buffers will only be allocated when receiving
+the signal/starting the process and getting the number of active traces.
+
+It means that we must make sure to only update the data structures used by
+tracing functions once the buffers are created.
+
+When adding a new buffer, we should call the set_tracing_info syscall and give
+the new buffers array to the kernel. It's an array of 32 pointers to user pages.
+They will be used by the kernel to get the last pages when the thread dies.
+
+If we remove a trace, the kernel should stop the tracing, and then get the last
+buffer for this trace. What is important is to make sure no writers are still
+trying to write in a memory region that gets deallocated.
+
+For that, we will keep an atomic variable "tracing_level", which tells how many
+times we are nested in tracing code (program code/signal handlers) for a
+specific trace.
+
+We could do that trace removal in two operations :
+
+- Send an update tracing signal to the process
+  - the signal handler gets the new tracing status, which tells that tracing is
+    disabled for the specific trace. It writes this status in the tracing
+    control structure of the process.
+  - If tracing_level is 0, well, it's fine : there are no potential writers in
+    the removed trace. It's up to us to buffer switch the removed trace, and,
+    after the control returns to us, set_tracing_info this page to NULL and
+    delete this memory area.
+  - Else (tracing_level > 0), flag the removed trace for later switch/delete.
+
+  It then returns control to the process.
+
+- If the tracing_level was > 0, there were one or more writers potentially
+  accessing this memory area. When the control comes back to the writer, at the
+  end of the write in a trace, if the trace is marked for switch/delete and the
+  tracing_level is 0 (after the decrement of the writer itself), then the
+  writer must buffer switch, set_tracing_info to NULL and then delete the
+  memory area.
+
+
+Filter
+
+The update tracing info signal will make the thread get the new filter
+information. Getting this information will also happen upon process creation.
+
+parameter 2 of get_tracing_info : array of 32 ints (32 bits).
+Each integer is the filter mask for a trace. As there are up to 32 active
+traces, we have 32 integers for the filters.
+
+
+Buffer switch
+
+There could be a tracing_buffer_switch system call, that would give the page
+start address as parameter. The job of the kernel is to steal this page,
+possibly replacing it with a zeroed page (we don't care about the content of the
+page after the syscall).
+
+Process dying
+
+The kernel should be aware of the current pages used for tracing in each thread.
+If a thread dies unexpectedly, we want the kernel to get the last bits of
+information before the thread crashes.
+
+syscall set_tracing_info
+
+parameter 1 : array of 32 user space pointers to current pages or NULL.
+
+
+Memory protection
+
+We want each process to be unable to make a trace unreadable, and each process
+to have its own memory space.
+
+Two possibilities :
+
+Either we create one channel per process, or we have per cpu tracefiles for all
+the processes, with the specification that data is written in a monotonically
+increasing time order and that no process shares a 4k page with another process.
+
+The problem with having only one tracefile per cpu is that we cannot safely
+steal a process's buffer upon a schedule change because it may be currently
+writing to it.
+
+It leaves the one tracefile per thread as the only solution.
+
+Another argument in favor of this solution is the possibility to have mixed
+32-bit and 64-bit processes on the same machine. Dealing with types will be easier.
+
+
+Corrupted trace
+
+A corrupted tracefile will only affect one thread. The rest of the trace will
+still be readable.
+
+
+Facilities
+
+Upon process creation or when receiving the signal of trace info update, when a
+new trace appears, the thread should write the facility information into it. It
+must then have a list of registered facilities, all done at the thread level.
+
+We must decide if we allow a facility channel for each thread. The advantage is
+that we have a readable channel in flight recorder mode, while the disadvantage
+is that it doubles the number of channels, which may become quite high. To follow
+the general design of a high throughput channel and a low throughput channel for
+vital information, I suggest having a separate channel for facilities, per
+trace, per process.
+
+
+
+
+
+
-- 
2.34.1
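
The note describes two mechanisms that fit together: lock-free space
reservation in a per-thread buffer, and a "tracing_level" nesting counter that
defers the switch/delete of a removed trace to the last writer. A minimal C11
sketch of both follows. It is only an illustration under stated assumptions:
the struct layout, the 4096-byte buffer size and the names trace_init,
trace_enter, trace_exit, trace_write, trace_remove and release_buf are all
hypothetical, and release_buf merely stands in for the buffer switch, the
set_tracing_info(NULL) call and the deallocation, none of which exist here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 4096

/* Hypothetical per-thread, per-trace control structure (not from the patch). */
struct trace_buf {
    atomic_size_t offset;        /* next free byte in data[] */
    atomic_int tracing_level;    /* nesting depth inside tracing code */
    atomic_bool active;          /* cleared when the trace is removed */
    atomic_bool pending_delete;  /* removal deferred to the last writer */
    bool released;               /* set once release_buf() has run */
    char data[BUF_SIZE];
};

static void trace_init(struct trace_buf *b)
{
    atomic_init(&b->offset, 0);
    atomic_init(&b->tracing_level, 0);
    atomic_init(&b->active, true);
    atomic_init(&b->pending_delete, false);
    b->released = false;
}

/* Stand-in for: buffer switch, set_tracing_info(NULL), free the memory area. */
static void release_buf(struct trace_buf *b)
{
    assert(!b->released);        /* the protocol must release exactly once */
    b->released = true;
}

/* Reserve len bytes without locking. Returns the start offset of the
 * reserved slot, or (size_t)-1 if the buffer cannot hold len more bytes. */
static size_t reserve(struct trace_buf *b, size_t len)
{
    size_t old = atomic_load(&b->offset);
    do {
        if (len > BUF_SIZE - old)
            return (size_t)-1;
        /* On failure, old is reloaded with the current offset and we retry. */
    } while (!atomic_compare_exchange_weak(&b->offset, &old, old + len));
    return old;
}

static void trace_enter(struct trace_buf *b)
{
    atomic_fetch_add(&b->tracing_level, 1);
}

static void trace_exit(struct trace_buf *b)
{
    /* fetch_sub returns the previous value: 1 means we were the last writer. */
    if (atomic_fetch_sub(&b->tracing_level, 1) == 1 &&
        atomic_exchange(&b->pending_delete, false))
        release_buf(b);
}

/* Writer path. It may be interrupted at any point by a signal handler that
 * itself calls trace_write() or trace_remove() on the same buffer. */
static bool trace_write(struct trace_buf *b, const void *ev, size_t len)
{
    bool written = false;

    trace_enter(b);
    if (atomic_load(&b->active)) {
        size_t off = reserve(b, len);
        if (off != (size_t)-1) {
            memcpy(b->data + off, ev, len);
            written = true;
        }
    }
    trace_exit(b);
    return written;
}

/* What the update-tracing signal handler would do when a trace is removed. */
static void trace_remove(struct trace_buf *b)
{
    atomic_store(&b->active, false);
    if (atomic_load(&b->tracing_level) == 0)
        release_buf(b);                           /* no writer in flight */
    else
        atomic_store(&b->pending_delete, true);   /* last writer cleans up */
}
```

Because the handler runs in the interrupted thread and runs to completion, the
value of tracing_level it observes tells it exactly whether a writer is in
flight. A real implementation would keep the control fields outside the memory
area being freed; they share one struct here only to keep the sketch short.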