[lug] q: multithreading on smp weirdness
D. Stimits
stimits at comcast.net
Thu Dec 9 14:01:45 MST 2004
...
> i've profiled the stuff & have attached them to the mail (thrddbg.zip).
...
> so far it seems that on kernels before or equal
> "2.4.21-4.EL #1 SMP Fri Oct 3 17:29:39 EDT 2003 ia64 ia64 ia64 GNU/Linux"
> , test app wouldn't multithread properly....
> & on kernels after or equal
> "2.4.21-15.EL #1 SMP Thu Apr 22 00:13:07 EDT 2004 ia64 ia64 ia64 GNU/Linux"
> , test app multithreads correctly.
...
I know a lot of kernel thread issues were dealt with in going from
kernel 2.4.x to 2.6.x, and at some point some of that design will have
been backported to 2.4.x, so I guess the first recommendation would be
to go with a newer 2.4.x or a 2.6.x kernel, especially since you have
found the kernel version to be directly related to performance. I have
yet to test it out
(other than running SMP machines for a workstation), but rumor is that
2.6 kernels are more efficient with SMP.
On the profile data, I wasn't sure which one was for the bad-performance
case and which for the good-performance case (it would be good to
compare profile data for a given configuration on a working machine
against the exact same configuration on a failing machine).
One thing I did see in the 6000 case was almost no time spent in, or
calls made to, most functions, but huge numbers of calls to get_busy and
push/pop. It seems that the push/pop is in some way a means of being
reentrant, though I'm not sure (how efficient are your push/pop
functions?). Now if threads were being spawned thousands of times, and
pushing/popping thousands of times, why were the other functions not
being called more? From this I can think of two things to add to the
trail of clues.
First, it might be that the profiler just is not profiling the actual
content within a thread (a coarse-grained profile that sees only a whole
thread and not the breakdown within it), so the real culprit goes
unmeasured inside a thread.
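
If the profiler cannot see inside threads, one fallback is to time each
thread by hand. The following is only a rough sketch in Python, not your
code: get_busy here is a made-up stand-in for the real function, and
timed_worker, run and the timings table are hypothetical names. It
records wall-clock time and per-thread CPU time for each worker, so a
thread that burns CPU shows up even when the profiler reports nothing.

```python
import threading
import time

# Hypothetical stand-in for the real per-thread work (the original
# get_busy was not shown in the thread).
def get_busy(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

timings = {}                      # thread name -> (wall seconds, CPU seconds)
timings_lock = threading.Lock()   # protects the shared timings table

def timed_worker(n):
    wall_start = time.perf_counter()
    cpu_start = time.thread_time()    # CPU time used by this thread only
    get_busy(n)
    wall = time.perf_counter() - wall_start
    cpu = time.thread_time() - cpu_start
    with timings_lock:
        timings[threading.current_thread().name] = (wall, cpu)

def run(num_threads=4, n=200_000):
    timings.clear()
    threads = [
        threading.Thread(target=timed_worker, args=(n,), name=f"worker-{i}")
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(timings)

if __name__ == "__main__":
    for name, (wall, cpu) in sorted(run().items()):
        print(f"{name}: wall={wall:.4f}s cpu={cpu:.4f}s")
```

A large gap between wall time and CPU time for a worker would suggest it
spends most of its life waiting rather than working.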
Second, perhaps it is spawning and re-joining thousands of threads even
when there is no work for the thread to do. If so, a huge amount of
overhead from spawning and re-joining threads could be removed by
spawning/re-joining only when there is work to do. Since get_busy is
something you built for testing and likely always has something to do,
that overhead might be unavoidable and valid. I'm not sure how well a
profiler can follow individual threads and track profile data within a
single thread call, though; it's something I've never had to do.
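
To make the spawn/re-join overhead concrete, here is a small sketch in
Python contrasting a thread spawned and joined per work item with a
fixed set of long-lived workers fed from a queue. The worker pool is my
substitution for "spawn only when there is work", not something taken
from your code; get_busy, spawn_per_item and worker_pool are
hypothetical names.

```python
import queue
import threading

# Hypothetical stand-in for the real unit of work.
def get_busy(item):
    return item * item

def spawn_per_item(items):
    """Create and join one thread per item: pays spawn/join cost each time."""
    results = [None] * len(items)

    def work(i, item):
        results[i] = get_busy(item)

    for i, item in enumerate(items):
        t = threading.Thread(target=work, args=(i, item))
        t.start()
        t.join()
    return results

def worker_pool(items, num_workers=4):
    """Start a few workers once; they sleep on the queue until work arrives."""
    tasks = queue.Queue()
    results = [None] * len(items)

    def worker():
        while True:
            task = tasks.get()        # blocks while there is nothing to do
            if task is None:          # sentinel: no more work, exit
                break
            i, item = task
            results[i] = get_busy(item)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for task in enumerate(items):
        tasks.put(task)
    for _ in workers:
        tasks.put(None)               # one sentinel per worker
    for w in workers:
        w.join()
    return results

if __name__ == "__main__":
    print(spawn_per_item([1, 2, 3]))
    print(worker_pool([1, 2, 3]))
```

With the pool, the number of thread creations stays fixed no matter how
many items are processed, so any remaining cost is in the work itself.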
Regardless of which possibility applies, you will want to verify the
push/pop efficiency. I'm wondering if this is somehow related to cache
hits/misses or just something more basic. I didn't see the actual
push/pop methods in the earlier code, and would be especially suspicious
if a push/pop reserves or releases memory (or copies memory).
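
One way to check push/pop on its own is to time it outside the threaded
program. The sketch below, in Python, is purely illustrative: both stack
classes are invented for the comparison (your real push/pop were not
shown), and time_push_pop is a hypothetical helper. CopyingStack
allocates and copies on every call, which is the pattern I would be
suspicious of; PreallocatedStack reserves its storage once.

```python
import timeit

class CopyingStack:
    """Deliberately wasteful: every push/pop builds a new list (alloc + copy)."""
    def __init__(self):
        self._items = []

    def push(self, x):
        self._items = self._items + [x]     # copies the whole stack

    def pop(self):
        x = self._items[-1]
        self._items = self._items[:-1]      # copies again
        return x

class PreallocatedStack:
    """Storage reserved once up front; push/pop only move an index."""
    def __init__(self, capacity):
        self._items = [None] * capacity
        self._top = 0

    def push(self, x):
        self._items[self._top] = x          # IndexError if the stack is full
        self._top += 1

    def pop(self):
        if self._top == 0:
            raise IndexError("pop from empty stack")
        self._top -= 1
        return self._items[self._top]

def time_push_pop(stack, depth=500, repeats=20):
    """Seconds taken for `repeats` rounds of `depth` pushes then `depth` pops."""
    def cycle():
        for i in range(depth):
            stack.push(i)
        for _ in range(depth):
            stack.pop()
    return timeit.timeit(cycle, number=repeats)

if __name__ == "__main__":
    print("copying:     ", time_push_pop(CopyingStack()))
    print("preallocated:", time_push_pop(PreallocatedStack(500)))
```

If the real push/pop behaves more like the copying version, its cost
grows with stack depth, and that alone could account for a profile
dominated by push/pop calls.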
D. Stimits, stimits AT comcast DOT com