[lug] q: multithreading on smp weirdness

Thu Dec 9 14:01:45 MST 2004

...
> i've profiled the stuff & have attached them to the mail (thrddbg.zip).
...
> so far it seems that on kernels before or equal
> "2.4.21-4.EL #1 SMP Fri Oct 3 17:29:39 EDT 2003 ia64 ia64 ia64 GNU/Linux"
> , test app wouldn't multithread properly....
> & on kernels after or equal
> "2.4.21-15.EL #1 SMP Thu Apr 22 00:13:07 EDT 2004 ia64 ia64 ia64 GNU/Linux"
> , test app multithreads correctly.
...

I know a lot of kernel threading issues were dealt with in going 
from kernel 2.4.x to 2.6.x, and at some point some of that design will 
have been backported to 2.4.x, so I guess the first recommendation would 
be to go with a newer 2.4.x or a 2.6.x kernel, especially since you have 
found the kernel version to be directly related to performance. I have 
yet to test it out (other than running SMP machines as workstations), 
but rumor is that 2.6 kernels are more efficient with SMP.

On the profile data, I wasn't sure which one was for the bad-performance 
case and which for the good-performance case (it would be good to 
compare profile data for one case configuration on a working machine 
against the exact same case configuration on a failing machine).

One thing I did see in the 6000 case was almost no time or function 
calls to most functions, but huge numbers of calls to get_busy and 
push/pop. It seems that the push/pop is in some way a means of being 
reentrant, though I'm not sure (how efficient are your push/pop 
functions?). Now if threads were being spawned thousands of times, and 
pushing/popping thousands of times, why were the other functions not 
being called more? From this I can think of two things to add to the 
trail of clues.

First, it might be that the profiler just is not profiling the actual 
work within a thread (a coarse-grained profile seeing only a whole 
thread and not the breakdown within the thread), and the real culprit 
is unmeasured within a thread.
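
One way to check the first possibility without relying on the profiler
is to time the work inside each thread by hand. A minimal sketch, in
Python only for brevity (the test app's language is not shown in this
thread); timed, busy_work, and run_threads are hypothetical names and
not anything from the profiled code:

```python
import functools
import threading
import time
from collections import defaultdict

# (thread name, function name) -> [call count, total seconds]
_stats = defaultdict(lambda: [0, 0.0])
_stats_lock = threading.Lock()


def timed(func):
    """Record per-thread call counts and wall time for func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            key = (threading.current_thread().name, func.__name__)
            with _stats_lock:
                entry = _stats[key]
                entry[0] += 1
                entry[1] += elapsed
    return wrapper


@timed
def busy_work(n):
    # Hypothetical stand-in for whatever each thread really does.
    return sum(range(n))


def run_threads(n_threads, calls_each):
    """Run busy_work in several named threads, return a stats snapshot."""
    with _stats_lock:
        _stats.clear()

    def body():
        for _ in range(calls_each):
            busy_work(1000)

    threads = [threading.Thread(target=body, name=f"worker-{i}")
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with _stats_lock:
        return {key: tuple(value) for key, value in _stats.items()}
```

If the per-thread totals gathered this way account for far more time
than the profile shows, that would suggest the profile is only seeing
the threads from the outside.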

Second, perhaps it is spawning and re-joining thousands of threads even 
when there is no work for the thread to do...if so, then a huge amount 
of overhead from spawning threads and re-joining them could be removed 
by making them spawn/re-join only when there is work to do. Since 
get_busy is something you built for testing and likely has real work to 
do, the overhead might not be avoidable and might be valid. I'm not 
sure how well the profiler can follow individual threads and track 
profile data within a single thread call, though; it's something I've 
never had to do.
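
To make the spawn/re-join overhead concrete, here is a rough sketch
contrasting one thread per task with a small set of threads that are
spawned once and fed from a queue. It is in Python only for brevity and
assumes nothing about how the test app is really structured; the
get_busy below is a made-up stand-in that only shares its name with
the function in the profile, and run_spawn_per_task and
run_worker_pool are hypothetical names:

```python
import queue
import threading


def get_busy(n):
    # Made-up stand-in for the real get_busy in the test app.
    return sum(range(n))


def run_spawn_per_task(tasks, work):
    """Spawn and join one thread per task, however small the task."""
    results = [None] * len(tasks)

    def runner(i, item):
        results[i] = work(item)

    threads = [threading.Thread(target=runner, args=(i, item))
               for i, item in enumerate(tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, len(threads)   # threads spawned == number of tasks


def run_worker_pool(tasks, work, n_workers=4):
    """Spawn n_workers threads once; each pulls jobs until a sentinel."""
    results = [None] * len(tasks)
    todo = queue.Queue()

    def worker():
        while True:
            job = todo.get()
            if job is None:        # sentinel: no more work
                break
            i, item = job
            results[i] = work(item)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for job in enumerate(tasks):   # each job is an (index, item) pair
        todo.put(job)
    for _ in threads:
        todo.put(None)             # one sentinel per worker
    for t in threads:
        t.join()
    return results, len(threads)   # threads spawned == n_workers
```

Both functions return the same results; the second number returned is
how many threads had to be created and joined to get them.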

Regardless of first or second possibility, you will want to verify the 
push/pop efficiency. I'm wondering if this is somehow related to a cache 
hit/miss or just something more basic. I didn't see the actual push/pop 
methods in earlier code, and would be especially suspicious if a 
push/pop reserves or releases memory (or copies memory).
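
As an illustration of why a push/pop that reserves or copies memory
would be suspicious, the sketch below counts how many elements get
copied by a stack that builds a new buffer on every call, next to one
that reserves its storage once. This is only a guess at the kind of
difference that could matter; the real push/pop code was not posted,
and CopyingStack, PreallocatedStack, and exercise are hypothetical
names (Python is used only for brevity):

```python
class CopyingStack:
    """Builds a new buffer, copying every element, on each push/pop."""

    def __init__(self):
        self.items = ()
        self.copied = 0                      # elements copied so far

    def push(self, value):
        self.copied += len(self.items)       # old contents are copied
        self.items = self.items + (value,)   # new tuple each time

    def pop(self):
        value = self.items[-1]
        self.copied += len(self.items) - 1   # remaining contents copied
        self.items = self.items[:-1]         # new tuple each time
        return value


class PreallocatedStack:
    """Reserves storage once; push/pop only move an index."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.top = 0
        self.copied = 0                      # stays 0: nothing is copied

    def push(self, value):
        if self.top == len(self.buf):
            raise OverflowError("stack full")
        self.buf[self.top] = value
        self.top += 1

    def pop(self):
        if self.top == 0:
            raise IndexError("pop from empty stack")
        self.top -= 1
        return self.buf[self.top]


def exercise(stack, n):
    """Push 0..n-1, then pop everything; return the popped values."""
    for i in range(n):
        stack.push(i)
    return [stack.pop() for _ in range(n)]
```

The copying version does work proportional to the stack depth on every
call, so thousands of push/pop calls would add up quickly, while the
preallocated version does a constant amount of work per call.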

D. Stimits, stimits AT comcast DOT com