[lug] q: multithreading on smp weirdness

Wed Dec 1 10:25:48 MST 2004

karheng at softhome.net wrote:
> that's an interesting guess! i've gotta watch out for
> that.. i don't think it's the reason though, at least
> not for the test app.
> for the test app, i've replaced the chunk fetching
> function with a routine that simply does some computation
> over a small buffer. that buffer doesn't grow or
> shrink, & all i do is toggle the number of iterations
> over that buffer. so it should exclude issues caused by
> the cpu cache line...
> actually, the test app i have is part of a bigger app.
> thread 1 of the bigger app fetches data & thread 2
> processes them.
> i've extracted & simplified that code out and made both
> threads only do computation on a small buffer to
> reproduce the problem from the bigger app.
> anyway, i've also tested the threads in the bigger app,
> & i've found that data returned from thread 1 is
> consistently sufficient to keep thread 2 busy.
> (thread1:thread2 elapsed time ratio is approx 6:4; in my
> test app, it's made to be 1:1).
> in theory, i should be getting about 30% to 40%
> performance improvement in my bigger app, & always 40%
> to 50% performance improvement in my test case.
> i've attached source code from my test app, for better
> clarity. it can't be compiled because i've excluded some
> custom classes. i can produce a completely isolated
> version of the code if required.
> i've been trying to look up multithreading topics/faqs
> all over the net... so far still no clue.
> will also try it on another dual CPU itanium box soon
> after it's set up.

I suspect the threading wrappers themselves hold a good chunk of the 
clues. You could compile with profiling enabled (see man gprof, and 
search for "profile" or -pg under man g++), which would give you a lot 
of information.
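
Alongside gprof, a cheaper check is to time each thread against both the 
wall clock and its own CPU clock. This is a different technique from 
gprof, and only a minimal sketch: it assumes Linux/POSIX clock_gettime() 
with CLOCK_THREAD_CPUTIME_ID, and timedWork(), runPair() and WorkerTiming 
are hypothetical names, with timedWork() standing in for the test app's 
computation over a small buffer. If each thread reports CPU time close to 
its wall time, the two really ran in parallel; if wall time is much larger 
than CPU time, the thread spent the difference waiting instead of computing.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <time.h>
#include <vector>

// Timing of one worker run (hypothetical structure for illustration).
struct WorkerTiming {
    double wall_s = 0.0;         // elapsed wall-clock seconds
    double cpu_s = 0.0;          // CPU seconds consumed by this thread only
    std::uint64_t checksum = 0;  // result of the computation
};

// Read a POSIX clock as seconds.
static double clockSeconds(clockid_t id) {
    timespec ts{};
    clock_gettime(id, &ts);
    return static_cast<double>(ts.tv_sec) + static_cast<double>(ts.tv_nsec) * 1e-9;
}

// Hypothetical stand-in for the test app's worker: repeated computation
// over a small fixed buffer, timed by the wall clock and this thread's CPU clock.
WorkerTiming timedWork(const std::vector<int>& buf, int iterations) {
    assert(iterations >= 0);
    WorkerTiming t;
    const double wall0 = clockSeconds(CLOCK_MONOTONIC);
    const double cpu0 = clockSeconds(CLOCK_THREAD_CPUTIME_ID);
    std::uint64_t acc = 0;
    for (int it = 0; it < iterations; ++it)
        for (int v : buf)
            acc = acc * 31 + static_cast<std::uint64_t>(v);  // each step depends on the previous one
    t.checksum = acc;
    t.cpu_s = clockSeconds(CLOCK_THREAD_CPUTIME_ID) - cpu0;
    t.wall_s = clockSeconds(CLOCK_MONOTONIC) - wall0;
    return t;
}

// Run two workers concurrently; each writes only its own result object.
void runPair(const std::vector<int>& buf, int iterations,
             WorkerTiming& a, WorkerTiming& b) {
    std::thread t1([&] { a = timedWork(buf, iterations); });
    std::thread t2([&] { b = timedWork(buf, iterations); });
    t1.join();
    t2.join();
}
```

Build with something like g++ -O2 -pthread. Comparing one thread run 
alone against runPair() shows whether the second CPU is actually doing 
any of the work.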

Based on the code below, though, it sounds like hard drive access is 
also going on. If so, does /proc/interrupts show all CPUs handling 
interrupts? If not, the SMP APIC is not enabled (and it may or may not 
be safe to enable it, depending on the chipset). On SMP machines only 
CPU0 handles hardware IRQs *unless* the APIC is enabled. Disk access and 
most other hardware require hardware IRQs, so with the SMP APIC enabled 
any CPU can service the hardware, but without it only one CPU can, and 
that one CPU can easily become IRQ-starved under heavy hardware IRQ 
activity (e.g., a highly active Ethernet card plus disk activity). On 
many of the newer Intel chipsets the APIC is not activated by default, 
simply because Intel doesn't provide enough public chipset information. 
Profiling would probably also tell you whether disk I/O is consuming too 
much time.
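
To check the interrupt spread without eyeballing the columns, something 
like the following sums the per-CPU counts of the numbered IRQ lines. It 
is a sketch that assumes the usual /proc/interrupts layout (a header row 
of CPUn columns, then one row per IRQ with the counts first); the names 
hardwareIrqTotals() and allCpusServeIrqs() are made up for illustration. 
A CPU whose total stays at zero is not servicing device interrupts.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Sum per-CPU counts over the numbered (hardware) IRQ rows of
// /proc/interrupts-style text. Rows such as NMI:, LOC: or ERR: are skipped.
std::vector<std::uint64_t> hardwareIrqTotals(std::istream& in) {
    std::string line;
    if (!std::getline(in, line)) return {};

    // Header row: one "CPUn" column per CPU.
    std::istringstream header(line);
    std::size_t ncpu = 0;
    for (std::string tok; header >> tok;)
        if (tok.rfind("CPU", 0) == 0) ++ncpu;

    std::vector<std::uint64_t> totals(ncpu, 0);
    while (std::getline(in, line)) {
        std::istringstream row(line);
        std::string label;
        if (!(row >> label) || label.size() < 2 || label.back() != ':') continue;
        bool numeric = true;  // "14:" is a device IRQ, "NMI:" is not
        for (std::size_t i = 0; i + 1 < label.size(); ++i)
            if (!std::isdigit(static_cast<unsigned char>(label[i]))) numeric = false;
        if (!numeric) continue;
        for (std::size_t cpu = 0; cpu < ncpu; ++cpu) {
            std::uint64_t n = 0;
            if (!(row >> n)) break;  // reached the controller/device-name columns
            totals[cpu] += n;
        }
    }
    return totals;
}

// True only if every CPU has serviced at least one device interrupt.
bool allCpusServeIrqs(const std::vector<std::uint64_t>& totals) {
    if (totals.empty()) return false;
    for (std::uint64_t n : totals)
        if (n == 0) return false;
    return true;
}
```

On the real box, open /proc/interrupts with an std::ifstream and pass the 
stream in. IRQ affinity settings can also pin interrupts to one CPU, so a 
zero column is a hint to look at the APIC setup rather than proof.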

Are these disk reads/writes that follow? If so, you can be almost 
certain that this is your bottleneck, especially since the disk is 
serialized no matter how many threads you have.
...
> /*
>   Wrapper for OS file map functions.
>   Provides void * map(filename,mapmode,size) & unmap(void *).
>   mapmode is 'i' for create+read+write, 'w' for read+write,
>   & 'r' for read only.
> */
> #include        <Filemapper.h>
...
>   buf1=(int *)map("buf1",'i',sizeof(int)*bufcnt);
>   buf2=(int *)map("buf2",'i',sizeof(int)*bufcnt);
...
>   unmap(buf1);
>   unmap(buf2);
...
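
One way to take the disk out of the picture is to swap the file-backed 
map() calls for anonymous memory and see whether the scaling problem 
goes away. Filemapper isn't shown, so this assumes (unconfirmed) that 
map(...,'i',...) ends up as a file mapping the kernel writes back to 
disk. mapAnonymous() and unmapAnonymous() are hypothetical replacements, 
sketched on top of POSIX mmap() with the Linux MAP_ANONYMOUS flag.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical replacement for map(name,'i',size): anonymous memory with no
// backing file, so page write-back to disk cannot affect the timings.
// All threads in the process still see the same pages.
void* mapAnonymous(std::size_t size) {
    assert(size > 0);  // mmap() rejects zero-length mappings
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Counterpart to unmap(void *); unlike that wrapper, munmap() needs the length.
void unmapAnonymous(void* p, std::size_t size) {
    if (p != nullptr) munmap(p, size);
}
```

In the test app that would mean calling mapAnonymous(sizeof(int)*bufcnt) 
and unmapAnonymous(buf1, sizeof(int)*bufcnt) in place of the map()/unmap() 
pairs. If the two-thread run then speeds up as expected, disk write-back 
was the bottleneck.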

D. Stimits, stimits AT comcast DOT net