LTTng synthetic TSC MSB

Mathieu Desnoyers, March 1, 2006

A problem found on some architectures is that the TSC is limited to 32 bits,
which makes it wrap around after 2^32 cycles, roughly every 8 seconds.
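The wrap period is 2^32 cycles divided by the counter frequency. The figure of roughly 8 seconds corresponds to a counter ticking at about 500 MHz. The helper below has a hypothetical name and is not part of LTTng. It only makes that arithmetic explicit.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct values of a 32-bit counter (2^32), as a double. */
#define TSC32_RANGE ((double)UINT32_MAX + 1.0)

/*
 * Hypothetical helper: seconds between two wrap-arounds of a 32-bit
 * cycle counter that advances tsc_hz times per second.
 */
static double tsc32_wrap_period_s(double tsc_hz)
{
	assert(tsc_hz > 0.0);
	return TSC32_RANGE / tsc_hz;
}
```

For instance, a 500 MHz counter gives 4294967296 / 500000000, about 8.59 s. A 1 GHz counter wraps about every 4.29 s.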

Wrap-arounds can be detected with a heartbeat timer, which generates an event
in each trace at a periodic interval. As long as consecutive events in a trace
are less than one wrap-around apart, this makes it possible to read the trace
sequentially.
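A sequential reader can extend each 32-bit event timestamp to 64 bits. It does this by counting how many times the counter goes backwards between consecutive events. The sketch below makes two assumptions: events of one CPU are read in order, and no two consecutive events are 2^32 cycles or more apart. The struct and function names are hypothetical, not LTTng's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU state of a sequential trace reader. */
struct seq_clock {
	uint32_t last_lsb;	/* 32-bit TSC of the previous event */
	uint32_t msb;		/* number of wrap-arounds seen so far */
};

/*
 * Extend one 32-bit event timestamp to 64 bits. Only correct if the
 * previous event is less than 2^32 cycles older, which the periodic
 * heartbeat event is there to guarantee.
 */
static uint64_t seq_clock_extend(struct seq_clock *c, uint32_t tsc)
{
	assert(c != NULL);
	if (tsc < c->last_lsb)	/* counter went backwards: it wrapped */
		c->msb++;
	c->last_lsb = tsc;
	return ((uint64_t)c->msb << 32) | tsc;
}
```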

The problem lies with fast time seeking in the trace: it uses the 64-bit
buffer boundary timestamps to seek to the right block in O(log(n)). It does
not, however, read the trace sequentially, so it cannot count the
wrap-arounds that happened before the block it lands on.
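The seek itself is a binary search over the 64-bit timestamps recorded at buffer boundaries. The sketch below uses hypothetical names. It assumes the begin timestamps of the blocks are available in a sorted array. It shows that the search never looks inside a block, which is why those boundary timestamps must already be full 64-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical seek helper: given the 64-bit begin timestamps of n
 * trace blocks, sorted in increasing order, return the index of the
 * last block that begins at or before t (0 if t precedes all blocks).
 * Runs in O(log(n)) and never reads the events inside the blocks.
 */
static size_t seek_block(const uint64_t *begin_ts, size_t n, uint64_t t)
{
	size_t lo = 0, hi = n;	/* the answer lies in [lo, hi) */

	assert(begin_ts != NULL && n > 0);
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (begin_ts[mid] <= t)
			lo = mid;	/* block mid starts early enough */
		else
			hi = mid;	/* block mid and later start after t */
	}
	return lo;
}
```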

The problem is therefore the following: we want to generate a per-CPU 64-bit
TSC from the available 32 bits, with the 32 MSBs generated synthetically. This
value must be readable by the buffer switch event.

The idea is the following: we keep a 32-bit previous_tsc value per CPU, which
is used to detect the wrap-around. Each time a heartbeat fires or a buffer
switch happens, previous_tsc is read and then overwritten with the current TSC
value. If a wrap-around is detected (the current TSC is lower than
previous_tsc), the msb_tsc of that CPU is atomically incremented.
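Below is a minimal single-CPU model of this first idea. The struct and function names are hypothetical, and C11 atomic_fetch_add stands in for the kernel's atomic increment. It only models the wrap detection. It does not address the concurrency issues discussed in the following paragraphs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU state for the scheme described above. */
struct cpu_synth_state {
	uint32_t previous_tsc;		/* last 32-bit TSC sampled */
	_Atomic uint32_t msb_tsc;	/* synthetic upper 32 bits */
};

/*
 * Called on each heartbeat or buffer switch with the current 32-bit
 * TSC. Assumes successive calls are less than one wrap-around apart.
 * Note that the read/write pair on previous_tsc is not atomic.
 */
static void synth_sample(struct cpu_synth_state *s, uint32_t tsc)
{
	uint32_t prev;

	assert(s != NULL);
	prev = s->previous_tsc;		/* read ... */
	s->previous_tsc = tsc;		/* ... then write the new value */
	if (tsc < prev)			/* the counter wrapped around */
		atomic_fetch_add(&s->msb_tsc, 1);
}
```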

We are sure that only one heartbeat runs at a given time, because heartbeats
fire at a fixed interval: typically 10 times per 32-bit TSC wrap-around. Even
better, as the heartbeat is launched by a worker thread, it can only be queued
once in the worker queue.
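The "queued only once" property comes from a pending flag on the work item. In the Linux workqueue, queue_work() returns false when the work item is already pending. The pending bit is cleared when the worker starts running the item. The userspace model below uses hypothetical names and a C11 atomic_flag. It shows why at most one heartbeat can be waiting in the queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a work item: a pending flag that is set when
 * the item is queued and cleared when the worker starts running it.
 */
struct work_item {
	atomic_flag pending;
};

/* Returns true if the item was queued, false if it was already pending. */
static bool work_queue(struct work_item *w)
{
	assert(w != NULL);
	return !atomic_flag_test_and_set(&w->pending);
}

/* Called by the worker thread just before running the item. */
static void work_begin(struct work_item *w)
{
	assert(w != NULL);
	atomic_flag_clear(&w->pending);
}
```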

Now consider concurrency between the buffer switch and the heartbeat. In the
worst case, a heartbeat is in progress: one CPU is in process context (the
worker thread) and the other CPUs are in interrupt context (IPI). On one of the
CPUs handling the IPI, an NMI is triggered and generates a buffer switch.

What is certain is that the heartbeat needs to read and write previous_tsc, and
that it also needs to increment msb_tsc atomically. The buffer switch, however,
only needs to read previous_tsc, compare it to the current TSC and read
msb_tsc.
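The read-only path of the buffer switch can be sketched as follows. Suppose the current TSC is lower than previous_tsc. Then the counter has wrapped since the last heartbeat, which has not yet incremented msb_tsc, so the reader adds one on its own side without writing shared state. The struct and function names are hypothetical. The two fields are passed in as a snapshot. In the real setting they are read separately, which is where the races described in the text come from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the per-CPU values a buffer switch reads. */
struct synth_snapshot {
	uint32_t previous_tsc;	/* TSC at the last heartbeat */
	uint32_t msb_tsc;	/* synthetic MSB at the last heartbeat */
};

/* Build a 64-bit TSC from the snapshot and the current 32-bit TSC. */
static uint64_t synth_read(const struct synth_snapshot *s, uint32_t tsc)
{
	uint32_t msb;

	assert(s != NULL);
	msb = s->msb_tsc;
	if (tsc < s->previous_tsc)	/* wrapped since the last heartbeat */
		msb++;
	return ((uint64_t)msb << 32) | tsc;
}
```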

Another race is that the buffer switch itself can be interrupted by the
heartbeat.

So what we need is an atomic update of the synthetic 64-bit TSC. As the
architecture does not support a 64-bit cmpxchg, we use the following small
data structure to overcome this problem:
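
An array of two 64-bit elements, plus the index of the current element. The
writer fills the element that is not current, which takes two memory writes,
and then switches the current element by changing the index atomically.
Readers only ever read the current element, so they never see a half-written
value. As there is only one writer, no locking is needed.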
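Below is a userspace sketch of this two-element structure. It assumes C11 atomics for the index and uses hypothetical names throughout. It is not the LTTng code, and it leaves out the per-CPU and preemption handling. The writer fills the slot that readers are not using, then publishes it with a single atomic store of the index. A reader loads the index once and reads only the slot it designates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical synthetic TSC: two 64-bit slots plus the index of the
 * current one. Only the index is accessed atomically; a slot may be
 * written with two 32-bit stores on architectures without 64-bit
 * atomics.
 */
struct synth_tsc {
	uint64_t slot[2];		/* (msb << 32) | last sampled lsb */
	_Atomic unsigned int index;	/* slot that readers should use */
};

/* Single writer (heartbeat): sample the 32-bit TSC and publish it. */
static void synth_tsc_update(struct synth_tsc *s, uint32_t tsc)
{
	unsigned int cur, next;
	uint64_t old, msb;

	assert(s != NULL);
	cur = atomic_load_explicit(&s->index, memory_order_relaxed);
	next = 1u - cur;		/* the slot readers are not using */
	old = s->slot[cur];
	msb = old >> 32;
	if (tsc < (uint32_t)old)	/* wrapped since the last update */
		msb++;
	s->slot[next] = (msb << 32) | tsc;	/* may be two stores */
	/* Atomic switch: readers now see the fully written slot. */
	atomic_store_explicit(&s->index, next, memory_order_release);
}

/* Lock-free reader (e.g. buffer switch): never writes shared state. */
static uint64_t synth_tsc_read(struct synth_tsc *s, uint32_t tsc)
{
	unsigned int cur;
	uint64_t snap, msb;

	assert(s != NULL);
	cur = atomic_load_explicit(&s->index, memory_order_acquire);
	snap = s->slot[cur];
	msb = snap >> 32;
	if (tsc < (uint32_t)snap)	/* wrapped since the writer's sample */
		msb++;
	return (msb << 32) | tsc;
}
```

The scheme assumes a reader never takes long enough for the writer to update twice, since the second update would overwrite the slot being read. With the heartbeat firing only ten or so times per wrap-around, that window is far longer than any read.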

We make sure the synthetic TSC reader is not preempted, and therefore stays on
the CPU whose data it is reading, by disabling preemption. We do the same for
the writer.