[lug] More wood on the fire (High resolution timers)
Bear Giles
bgiles at coyotesong.com
Sun Feb 23 20:54:31 MST 2003
At the risk of starting a riot, I have to ask a really dumb question.
Why do you care to know the time to a resolution of 1 usec?
If you just need the time - a single measurement - you can
certainly use gettimeofday() and a combination of NTP and local
clock slewing to get the actual time to within 10 msec or so.
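For that single-measurement case, something along these lines is all
it takes (a minimal sketch; the microsecond field is only as
trustworthy as NTP keeps your clock):

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval now;

    /* one call, one timestamp: no interval arithmetic, so a context
     * switch can't land "inside" the measurement */
    if (gettimeofday(&now, NULL) != 0) {
        perror("gettimeofday");
        return 1;
    }
    printf("%ld.%06ld seconds since the epoch\n",
           (long) now.tv_sec, (long) now.tv_usec);
    return 0;
}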
But if you're trying to create a high-resolution interval timer,
you'll need to make multiple calls to gettimeofday(), and the
context-switch latency that can occur between those calls becomes
a huge issue.
Consider the following code. (And run it on your system, although
you may need to tweak some of the numbers.)
#include <stdio.h>
#include <sys/time.h>

long process (long count, const char *buffer)
{
    struct timeval start, stop;
    long i, j, delay;
    volatile long x;    /* volatile so the work loop can't be optimized away */

    gettimeofday(&start, NULL);

    /* do something that can't be optimized out of existence. */
    x = 0;
    for (i = 0; i < count; i++) {
        for (j = 0; j < 1<<13; j++) {
            x = (x << 1) ^ buffer[j % 20];
        }
    }

    gettimeofday(&stop, NULL);

    delay = 1000000 * (stop.tv_sec - start.tv_sec);
    delay = delay + stop.tv_usec - start.tv_usec;
    return delay;
}
int main()
{
    int i;
    long count;
    long baseline = 0;
    long delay[20];

    for (i = 0; i < 20; i++) delay[i] = 0;

    /* accumulate 20 runs of each count to average out some of the noise */
    for (i = 0; i < 20; i++) {
        for (count = 1; count < 20; count++) {
            delay[count] += process(count,
                "the quick brown fox jump over the lazy dog");
        }
    }

    /* report the excess over count * baseline as a percentage of the baseline */
    baseline = delay[1];
    for (i = 1; i < 20; i++) {
        printf("%2d: %6.2f\n", i,
               100.0*(float)(delay[i]-(i*baseline))/
               (float)baseline);
    }
    return 0;
}
The time consumed by process() should scale linearly with the
count, so the percentages printed below should hover around 0%.
But they don't. Even with 20 runs averaged there's a lot of
variability, but this run is representative:
1: 0.00
2: 3.14
3: 160.48
4: 17.94
5: 18.10
6: 22.68
7: 34.80
8: 44.18
9: 43.51
10: 85.85
11: 58.61
12: 63.25
13: 66.86
14: 120.00
15: 89.56
16: 77.52
17: 92.06
18: 94.81
19: 104.92
The numbers are a bit weird: the second column is the "excess"
time beyond n times the baseline, expressed as a percentage of the
single-count baseline (so one unit of work has arbitrary units of
100). For n=19, the time should have been 1900 but was actually
1900 + 104.92 = 2004.92.
For n=3, the time should have been 300 but was actually 460.48.
The huge hit at n=3 comes because that run is close to the length of
the system time slice: shorter runs often avoid a context switch
entirely, while longer runs can amortize the cost over more slices
spent entirely in the delay loop.
The point of this exercise is that the latency in the time slices
adds an anomalous, and highly unpredictable, overhead to any time
measurement of a software task that spans multiple time slices.
If you can stay within a single time slice (perhaps tossing any
anomalously high numbers) you can get good results on a stock
kernel, but if you're trying to measure delays that are within an
order of magnitude of the time slice you're hosed.
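A minimal sketch of that "toss the anomalously high numbers" idea,
assuming the task under test fits comfortably inside one time slice
(the task() body here is just a placeholder): time it repeatedly and
keep the minimum, on the theory that the smallest sample is the one
that didn't eat a context switch.

#include <stdio.h>
#include <sys/time.h>

/* placeholder task: something short enough to fit in one time slice */
static volatile long sink;
static void task(void)
{
    long j;
    for (j = 0; j < 1 << 13; j++)
        sink = (sink << 1) ^ j;
}

/* time one run of fn() in microseconds with a pair of gettimeofday() calls */
static long time_once(void (*fn)(void))
{
    struct timeval start, stop;

    gettimeofday(&start, NULL);
    fn();
    gettimeofday(&stop, NULL);
    return 1000000 * (stop.tv_sec - start.tv_sec)
           + (stop.tv_usec - start.tv_usec);
}

int main(void)
{
    long best = -1, t;
    int i;

    /* repeat the measurement and keep the minimum: samples that straddled
     * a time slice come out anomalously high and are effectively tossed */
    for (i = 0; i < 100; i++) {
        t = time_once(task);
        if (best < 0 || t < best)
            best = t;
    }
    printf("best of 100 runs: %ld usec\n", best);
    return 0;
}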
Bear