update compat

diff --git a/ltt/branches/poly/doc/developer/lttng-userspace-tracing.txt b/ltt/branches/poly/doc/developer/lttng-userspace-tracing.txt

index 85a31b3da9da30389be67010c34a7b5eb7026c4d..d61953f58d4d2e0521d593317b2b66e0a499e818 100644
--- a/ltt/branches/poly/doc/developer/lttng-userspace-tracing.txt
+++ b/ltt/branches/poly/doc/developer/lttng-userspace-tracing.txt
@@ -48,8 +48,13 @@ status.
  
  My suggestion is to go for a system call, but only call it :
  
-- when the process starts
-- when receiving a SIG_UPDTRACING
+- when the thread starts
+- when receiving a SIGRTMIN+3 (multithread ?)
+
+Note : save the thread ID (process ID) in the logging function and the update
+handler. Use it as a comparison to check if we are a forked child thread.
+Start a brand new buffer list in that case.
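
The note above can be sketched as follows. This is only an illustration under
stated assumptions: struct trace_state and both function names are hypothetical
and not part of LTTng, and getpid() stands in for whichever ID is saved (a
per-thread design would compare a thread ID instead).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical tracing state; these names are illustrative only. */
struct trace_state {
    pid_t owner;    /* ID saved when the buffer list was set up */
    void *buffers;  /* head of the buffer list (opaque in this sketch) */
    int   resets;   /* number of times the list was started afresh */
};

/* Save the current ID as the owner of a fresh, empty buffer list. */
static void trace_state_init(struct trace_state *st)
{
    assert(st != NULL);
    st->owner = getpid();
    st->buffers = NULL;
    st->resets = 0;
}

/*
 * Meant to be called at the top of the logging function and of the update
 * handler. Returns 1 if the saved ID differs from the current one (we are a
 * forked child) and the buffer list was restarted, 0 otherwise.
 */
static int trace_state_check_fork(struct trace_state *st)
{
    pid_t now;

    assert(st != NULL);
    now = getpid();
    if (now == st->owner)
        return 0;

    /* Forked child: forget the inherited list, start a brand new one. */
    st->owner = now;
    st->buffers = NULL;
    st->resets++;
    return 1;
}
```

The comparison is a single integer test, so it adds little to the logging path.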
+
  
  Two possibilities :
  
@@ -66,7 +71,9 @@ I would tend to adopt :
  
  syscall get_tracing_info
  
-first parameter : active traces mask (32 bits : 32 traces).
+parameter 1 : trace buffer map address. (id)
+
+parameter 2 : active ? (int)
  
  
  Concurrency
@@ -79,15 +86,15 @@ that) and removes false sharing.
  Multiple traces
  
  By having the number of active traces, we can allocate as many buffers as we
-need. The only thing is that the buffers will only be allocated when receiving
-the signal/starting the process and getting the number of traces actives.
+need. Allocation is done in the kernel with relay_open. User space mapping is
+done when receiving the signal/starting the process and getting the number of
+active traces.
  
  It means that we must make sure to only update the data structures used by
  tracing functions once the buffers are created.
  
-When adding a new buffer, we should call the set_tracing_info syscall and give
-the new buffers array to the kernel. It's an array of 32 pointers to user pages.
-They will be used by the kernel to get the last pages when the thread dies.
+We could have a syscall "get_next_buffer" that would basically mmap the next
+unmapped buffer, or return NULL if all buffers are mapped.
  
  If we remove a trace, the kernel should stop the tracing, and then get the last
  buffer for this trace. What is important is to make sure no writers are still
@@ -115,8 +122,7 @@ We could do that trace removal in two operations :
         accessing this memory area. When the control comes back to the writer, at the
         end of the write in a trace, if the trace is marked for switch/delete and the
         tracing_level is 0 (after the decrement of the writer itself), then the
-       writer must buffer switch, set_tracing_info to NULL and then delete the
-       memory area.
+       writer must buffer switch, and then delete the memory area.
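
The end-of-write rule above could look roughly like the sketch below. It
assumes a per-thread nesting counter, so no atomics are shown. struct
trace_area, writer_enter and writer_exit are hypothetical names, and the
"switched" counter merely stands in for a real buffer switch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-trace state for one thread. */
struct trace_area {
    int   tracing_level; /* writers currently nested in this area */
    int   marked;        /* non-zero once marked for switch/delete */
    void *mem;           /* the memory area being written to */
    int   switched;      /* counts buffer switches (illustration only) */
};

/* A writer starts writing in the trace. */
static void writer_enter(struct trace_area *t)
{
    t->tracing_level++;
}

/*
 * A writer finishes writing in the trace. If the trace is marked and this
 * was the last writer, buffer switch and delete the memory area.
 * Returns 1 if the area was deleted by this call, 0 otherwise.
 */
static int writer_exit(struct trace_area *t)
{
    assert(t->tracing_level > 0);
    t->tracing_level--;
    if (t->marked && t->tracing_level == 0) {
        t->switched++;   /* placeholder for the buffer switch */
        free(t->mem);
        t->mem = NULL;
        return 1;
    }
    return 0;
}
```

With two nested writers, only the outermost exit deletes the area, so the inner
writer never sees its memory vanish while it is still in use.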
  
  
  Filter
@@ -124,9 +130,7 @@ Filter
  The update tracing info signal will make the thread get the new filter
  information. Getting this information will also happen upon process creation.
  
-parameter 2 for the get tracing info : array of 32 ints (32 bits).
-Each integer is the filter mask for a trace. As there are up to 32 active
-traces, we have 32 integers for filter.
+parameter 3 for the get tracing info : an integer containing the 32-bit mask.
  
  
  Buffer switch
@@ -142,15 +146,10 @@ The kernel should be aware of the current pages used for tracing in each thread.
  If a thread dies unexpectedly, we want the kernel to get the last bits of
  information before the thread crashes.
  
-syscall set_tracing_info
-
-parameter 1 : array of 32 user space pointers to current pages or NULL.
-
-
  Memory protection
  
-We want each process to be usable to make a trace unreadable, and each process
-to have its own memory space.
+If a process corrupts its own mmapped buffers, the rest of the trace will
+remain readable, and each process has its own memory space.
  
  Two possibilities :
  
@@ -189,6 +188,127 @@ trace, per process.
  
  
  
+API :
+
+syscall 1 :
+
+in :
+buffer : NULL means get new traces
+         non NULL means to get the information for the specified buffer
+out :
+buffer : returns the address of the trace buffer
+active : is the trace active ?
+filter : 32 bits filter mask
+
+return : 0 on success, 1 on error.
+
+int ltt_update(void **buffer, int *active, int *filter);
+
+syscall 2 :
+
+in :
+buffer : Switch the specified buffer.
+return : 0 on success, 1 on error.
+
+int ltt_switch(void *buffer);
+
+
+Signal :
+
+SIGRTMIN+3
+(like hardware fault and expiring timer : to the thread, see p. 413 of Advanced
+Programming in the UNIX Environment)
+
+Signal is sent on tracing create/destroy, start/stop and filter change.
+
+Each thread updates its own tracing information only : this removes unnecessary
+concurrency.
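
A minimal sketch of the signal side, assuming the handler only records that an
update is pending and the thread calls the update syscall later from normal
context. The names ltt_update_pending, ltt_update_handler and
ltt_install_update_handler are hypothetical. sigaction() and SIGRTMIN are the
standard POSIX interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler, read and cleared by the traced thread. */
static volatile sig_atomic_t ltt_update_pending;

/* Keep the handler async-signal-safe: only set a flag. */
static void ltt_update_handler(int signo)
{
    (void)signo;
    ltt_update_pending = 1;
}

/* Install the handler for SIGRTMIN+3. Returns 0 on success, -1 on error. */
static int ltt_install_update_handler(void)
{
    struct sigaction sa;

    assert(SIGRTMIN + 3 <= SIGRTMAX);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = ltt_update_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART; /* do not make interrupted syscalls fail */
    return sigaction(SIGRTMIN + 3, &sa, NULL);
}
```

Deferring the real work keeps the handler trivial. Whether the update is done
inside the handler or afterwards is a design choice this note leaves open.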
+
+
+
+Notes :
+
+It doesn't matter "when" the process receives the update signal after a trace
+start : the signal is handled first, before the process executes anything else,
+the next time it is scheduled in.
+
+
+
+Major enhancement :
+
+* Buffer pool *
+
+The problem with the design, up to now, is that a heavily threaded application
+may launch many threads that have a short lifetime : memory is allocated for
+each traced thread, which takes time, and an incredibly high number of files
+is created in the trace (or per thread).
+
+(thanks to Matthew Khouzam)
+The solution is to use a buffer pool : we typically create a
+buffer pool of a specified size (say, 10 buffers by default, alterable by the
+user), each 8k in size (4k for normal trace, 4k for facility channel), for a
+total of 80kB of memory. It has to be tuned to the maximum number of
+threads expected to run at once, or it will have to grow dynamically (thus
+impacting the trace).
+
+A typical approach to dynamic growth is to double the number of allocated
+buffers each time a threshold near the limit is reached.
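
The sizing arithmetic and the doubling rule can be sketched as below. The
function names and the "margin" parameter (how close to the limit counts as
"near") are assumptions made for illustration. The 8k per-buffer size is the
one given above.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BUFFER_BYTES 8192u /* 4k normal trace + 4k facility channel */

/* Total memory used by a pool of "capacity" buffers. */
static size_t pool_bytes(size_t capacity)
{
    return capacity * POOL_BUFFER_BYTES;
}

/*
 * Return the capacity the pool should have: doubled once the number of
 * buffers in use comes within "margin" of the current capacity,
 * unchanged otherwise.
 */
static size_t pool_next_capacity(size_t capacity, size_t in_use, size_t margin)
{
    assert(capacity > 0);
    if (in_use + margin >= capacity)
        return capacity * 2;
    return capacity;
}
```

With the default of 10 buffers, pool_bytes(10) is 81920 bytes, the 80kB
mentioned above.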
+
+Each channel would be found as :
+
+trace_name/user/facilities_0
+trace_name/user/cpu_0
+trace_name/user/facilities_1
+trace_name/user/cpu_1
+...
+
+When a thread asks to be traced, it gets a buffer from the free buffer pool. If
+the number of available buffers falls below a threshold, the pool is marked for
+expansion and the thread gets its buffer quickly. The expansion will be executed
+a little bit later by a worker thread. If, however, the number of available
+buffers is 0, then an "emergency" reservation will be done, allocating only one
+buffer. The goal of this is to affect the thread fork time as little as possible.
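
The acquisition path described above could be sketched like this. Only the
bookkeeping is shown: struct buf_pool, its fields and pool_acquire are
hypothetical names. Locking and the worker thread that performs the expansion
are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool bookkeeping (no locking in this sketch). */
struct buf_pool {
    size_t free_count;       /* buffers currently available */
    size_t low_watermark;    /* threshold that triggers expansion */
    int    expand_pending;   /* set here, acted on later by a worker */
    size_t emergency_allocs; /* one-buffer emergency reservations done */
};

enum acquire_result { ACQ_FROM_POOL, ACQ_EMERGENCY };

/* Give a buffer to a thread that asks to be traced. */
static enum acquire_result pool_acquire(struct buf_pool *p)
{
    assert(p != NULL);
    if (p->free_count == 0) {
        /* Pool exhausted: reserve exactly one buffer right away. */
        p->emergency_allocs++;
        return ACQ_EMERGENCY;
    }

    p->free_count--;
    if (p->free_count < p->low_watermark)
        p->expand_pending = 1; /* expansion happens later, off this path */
    return ACQ_FROM_POOL;
}
```

The caller never waits for the expansion itself, which is what keeps the cost
added to thread creation small.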
+
+When a thread releases a buffer (the thread terminates), a buffer switch is
+performed, so the data can be flushed to disk and no other thread will mess
+with it or render the buffer unreadable.
+
+Upon trace creation, the pre-allocated pool is allocated. Upon trace
+destruction, the threads are first informed of the trace destruction, any
+pending worker thread (for pool allocation) is cancelled and then the pool is
+released. Buffers used by threads at this moment but not mapped for reading
+will be simply destroyed (as their refcount will fall to 0). It means that
+between the "trace stop" and "trace destroy", there should be enough time to let
+the lttd daemon open the newly created channels or they will be lost.
+
+Upon buffer switch, the reader can read directly from the buffer. Note that when
+the reader finishes reading a buffer, if the associated thread writer has
+exited, it must fill the buffer with zeroes and put it back into the free pool.
+In the case where the trace is destroyed, it must just decrement its refcount (as
+it would do otherwise) and the buffer will be destroyed.
+
+This pool will reduce the number of trace files created to the order of the
+number of threads present in the system at a given time.
+
+A worst case scenario is 32768 processes traced at the same time, for a total
+amount of 256MB of buffers. If a machine has so many threads, it probably has
+enough memory to handle this.
+
+In flight recorder mode, it would be interesting to use an LRU algorithm to
+choose which buffer from the pool we must take for a newly forked thread. A
+simple queue would do it.
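
One way to read "a simple queue would do it" : released buffers are appended at
the tail and a new thread takes from the head, so the buffer that has been idle
the longest is reused first and recent data survives longer. The sketch below
is a fixed-size ring of buffer IDs. POOL_SLOTS, struct lru_queue and both
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 8 /* illustrative pool size */

/* FIFO of free buffer IDs: head is the least recently released. */
struct lru_queue {
    int    ids[POOL_SLOTS];
    size_t head;  /* index of the oldest entry */
    size_t count; /* number of queued entries */
};

/* Put a released buffer at the tail. Returns 0, or -1 if the queue is full. */
static int lru_release(struct lru_queue *q, int id)
{
    assert(q != NULL);
    if (q->count == POOL_SLOTS)
        return -1;
    q->ids[(q->head + q->count) % POOL_SLOTS] = id;
    q->count++;
    return 0;
}

/* Take the least recently released buffer. Returns its ID, or -1 if empty. */
static int lru_take(struct lru_queue *q)
{
    int id;

    assert(q != NULL);
    if (q->count == 0)
        return -1;
    id = q->ids[q->head];
    q->head = (q->head + 1) % POOL_SLOTS;
    q->count--;
    return id;
}
```

Both operations are constant time, which suits the thread creation path.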
+
+SMP : per cpu pools ? -> no, L1 and L2 caches are typically too small to be
+impacted by the fact that a reused buffer is on a different or the same CPU.
+
+
+
+
+
+
+
+
+
+