| 1 | |
| 2 | Tracepoint proposal |
| 3 | |
| 4 | - Tracepoint infrastructure |
| 5 | - In-kernel users |
| 6 | - Complete typing, verified by the compiler |
| 7 | - Dynamically linked and activated |
| 8 | |
| 9 | - Marker infrastructure |
| 10 | - Exported API to userland |
| 11 | - Basic types only |
| 12 | |
| 13 | - Dynamic vs static |
| 14 | - In-kernel probes are dynamically linked, dynamically activated and connected |
| 15 | to tracepoints. Type verification is done at compile time. Such an in-kernel |
| 16 | probe can either extract the information and put it in a marker, or be a |
| 17 | specific in-kernel tracer such as ftrace. |
| 18 | - Information sinks (LTTng, SystemTap) are dynamically connected to the |
| 19 | markers inserted in the probes and are dynamically activated. |
| 20 | |
| 21 | - Near instrumentation site vs in a separate tracer module |
| 22 | |
| 23 | Only a probe module provided with the kernel tree could connect to internal |
| 24 | tracing sites. This argues for keeping the tracepoint probes near the |
| 25 | instrumentation site code. However, if a tracer is general purpose and exports |
| 26 | typing information to userspace through some mechanism, it should only export |
| 27 | the "basic type" information and could therefore be shipped outside of the |
| 28 | kernel tree. |
| 29 | |
| 30 | In-kernel probes should be integrated into the kernel tree. They would be close to |
| 31 | the instrumented kernel code and would translate between the in-kernel |
| 32 | instrumentation and the "basic type" exports. Other in-kernel probes could |
| 33 | provide a different output (statistics available through debugfs for instance). |
| 34 | ftrace falls into this category. |
| 35 | |
| 36 | Generic or specialized information "sinks" (LTTng, SystemTap) could be connected |
| 37 | to the markers put in tracepoint probes to extract the information to userspace. |
| 38 | They would extract both typing information and the per-tracepoint execution |
| 39 | information to userspace. |
| 40 | |
| 41 | Therefore, the code would look like: |
| 42 | |
| 43 | kernel/sched.c: |
| 44 | |
| 45 | #include "sched-trace.h" |
| 46 | |
| 47 | schedule() |
| 48 | { |
| 49 | ... |
| 50 | trace_sched_switch(prev, next); |
| 51 | ... |
| 52 | } |
| 53 | |
| 54 | |
| 55 | kernel/sched-trace.h: |
| 56 | |
| 57 | DEFINE_TRACE(sched_switch, struct task_struct *prev, struct task_struct *next); |
| 58 | |
| 59 | |
| 60 | kernel/sched-trace.c: |
| 61 | |
| 62 | #include "sched-trace.h" |
| 63 | |
| 64 | static void probe_sched_switch(struct task_struct *prev, struct task_struct |
| 65 | *next) |
| 66 | { |
| 67 | trace_mark(kernel_sched_switch, "prev_pid %d next_pid %d prev_state %ld", |
| 68 | prev->pid, next->pid, prev->state); |
| 69 | } |
| 70 | |
| 71 | int __init init(void) |
| 72 | { |
| 73 | return register_trace_sched_switch(probe_sched_switch); |
| 74 | } |
| 75 | |
| 76 | void __exit exit(void) |
| 77 | { |
| 78 | unregister_trace_sched_switch(probe_sched_switch); |
| 79 | } |
| 80 | |
| 81 | |
| 82 | Where the DEFINE_TRACE internals declare a structure, a trace_* inline function, |
| 83 | and register_trace_* and unregister_trace_* inline functions: |
| 84 | |
| 85 | A static instrumentation site structure, containing function pointers to |
| 86 | deactivated functions and an activation boolean. It also contains the |
| 87 | "sched_switch" string. This structure is placed in a special section to create |
| 88 | an array of these structures. |
| 89 | |
| 90 | static inline void trace_sched_switch(struct task_struct *prev, |
| 91 | struct task_struct *next) |
| 92 | { |
| 93 | if (sched_switch tracing is activated) |
| 94 | marshall_probes(&instrumentation_site_structure, prev, next); |
| 95 | } |
| 96 | |
| 97 | static inline int register_trace_sched_switch( |
| 98 | void (*probe)(struct task_struct *prev, struct task_struct *next)) |
| 99 | { |
| 100 | return do_register_probe("sched_switch", (void *)probe); |
| 101 | } |
| 102 | |
| 103 | static inline void unregister_trace_sched_switch( |
| 104 | void (*probe)(struct task_struct *prev, struct task_struct *next)) |
| 105 | { |
| 106 | do_unregister_probe("sched_switch", (void *)probe); |
| 107 | } |
| 108 | |
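To make the macro expansion above concrete, here is a rough userspace sketch of what a DEFINE_TRACE-like macro could generate. It is not the proposed kernel implementation: DEFINE_TRACE_SKETCH, TPPROTO, TPARGS, struct trace_site, struct task and the single probe slot are all assumptions made for illustration, and the site structure is an ordinary static variable rather than an entry in a special section. What it does show is the point of the proposal: the register/unregister functions take a fully typed function pointer, so a probe with the wrong prototype is rejected by the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an instrumentation site (field names are illustrative). */
struct trace_site {
	const char *name;      /* e.g. "sched_switch" */
	int active;            /* activation boolean */
	void (*probe)(void);   /* single probe slot, stored type-erased */
};

/* Helpers that let a comma-separated prototype travel as one macro argument. */
#define TPPROTO(...) __VA_ARGS__
#define TPARGS(...)  __VA_ARGS__

/*
 * Generates the site structure plus trace_<name>(), register_trace_<name>()
 * and unregister_trace_<name>(). The typed "probe" parameter is what gives
 * compile-time verification of the probe prototype.
 */
#define DEFINE_TRACE_SKETCH(name, proto, args)                            \
	static struct trace_site site_##name = { #name, 0, NULL };        \
	static inline void trace_##name(proto)                            \
	{                                                                 \
		if (site_##name.active)                                   \
			((void (*)(proto))site_##name.probe)(args);       \
	}                                                                 \
	static inline int register_trace_##name(void (*probe)(proto))     \
	{                                                                 \
		if (site_##name.probe)                                    \
			return -1; /* the only slot is already taken */   \
		site_##name.probe = (void (*)(void))probe;                \
		site_##name.active = 1;                                   \
		return 0;                                                 \
	}                                                                 \
	static inline void unregister_trace_##name(void (*probe)(proto))  \
	{                                                                 \
		if (site_##name.probe == (void (*)(void))probe) {         \
			site_##name.active = 0;                           \
			site_##name.probe = NULL;                         \
		}                                                         \
	}

/* Stand-in for struct task_struct, reduced to the fields used here. */
struct task {
	int pid;
	long state;
};

DEFINE_TRACE_SKETCH(sched_switch,
	TPPROTO(struct task *prev, struct task *next),
	TPARGS(prev, next))

/* State recorded by the example probe so its effect can be observed. */
static int hits;
static int last_prev_pid = -1;
static int last_next_pid = -1;

static void probe_sched_switch(struct task *prev, struct task *next)
{
	assert(prev && next);
	last_prev_pid = prev->pid;
	last_next_pid = next->pid;
	hits++;
}
```

With this sketch, calling trace_sched_switch() before any probe is registered only tests the activation flag, and passing a probe with a different prototype to register_trace_sched_switch() produces an incompatible-pointer diagnostic at compile time.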
| 109 | |
| 110 | We need a new kernel probe API: |
| 111 | |
| 112 | do_register_probe / do_unregister_probe |
| 113 | - Connects the in-kernel probe to the site |
| 114 | - Activates the site tracing (probe reference counting) |
| 115 | |
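The two responsibilities listed above (connecting a probe to a site looked up by name, and activating the site through reference counting) could be modelled in userspace roughly as follows. This is only a sketch under several assumptions: the sites[] array stands in for the special section, MAX_PROBES, find_site(), generic_probe_t, the "sched_wakeup" site, the dummy probes and the integer return codes are invented for the example, and do_unregister_probe() returns a status here although the proposal uses it as a void call. Locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROBES 4   /* arbitrary per-site limit for this sketch */

/* Type-erased probe pointer; the typed wrappers would cast to and from it. */
typedef void (*generic_probe_t)(void);

struct trace_site {
	const char *name;
	int active;                          /* nonzero while refcount > 0 */
	int refcount;                        /* number of connected probes */
	generic_probe_t probes[MAX_PROBES];  /* NULL marks a free slot */
};

/* Stand-in for the array of site structures built from the special section. */
static struct trace_site sites[] = {
	{ .name = "sched_switch" },
	{ .name = "sched_wakeup" },          /* hypothetical second site */
};

static struct trace_site *find_site(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < sizeof(sites) / sizeof(sites[0]); i++)
		if (strcmp(sites[i].name, name) == 0)
			return &sites[i];
	return NULL;
}

/* Connects a probe to the named site and activates it on first use. */
static int do_register_probe(const char *name, generic_probe_t probe)
{
	struct trace_site *site = find_site(name);

	if (!site)
		return -1;                   /* unknown site */
	for (int i = 0; i < MAX_PROBES; i++) {
		if (!site->probes[i]) {
			site->probes[i] = probe;
			site->refcount++;
			site->active = 1;
			return 0;
		}
	}
	return -2;                           /* no free slot */
}

/* Disconnects a probe; the site is deactivated when the last one leaves. */
static int do_unregister_probe(const char *name, generic_probe_t probe)
{
	struct trace_site *site = find_site(name);

	if (!site)
		return -1;                   /* unknown site */
	for (int i = 0; i < MAX_PROBES; i++) {
		if (site->probes[i] == probe) {
			site->probes[i] = NULL;
			if (--site->refcount == 0)
				site->active = 0;
			return 0;
		}
	}
	return -2;                           /* probe was not connected */
}

/* Two dummy probes, present only so the API can be exercised. */
static void probe_a(void) { }
static void probe_b(void) { }
```

Registering probe_a and probe_b on "sched_switch" leaves the site active until both have been unregistered, which is the behaviour the reference count is meant to provide.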
| 116 | |