[lug] q: multithreading on smp weirdness
Chan Kar Heng
karheng at softhome.net
Sun Dec 5 09:46:56 MST 2004
>On Tue, 2004-11-30 at 21:14 -0700, karheng at softhome.net wrote:
>> that's an interesting guess! i've gotta watch out for
>> that. i don't think it's the reason though, at least
>> not for the test app.
>>
>> for the test app, i've replaced the chunk fetching
>> function with a routine that simply does some computation
>> over a small buffer. that buffer doesn't grow or
>> shrink, & all i do is vary the number of iterations
>> over that buffer. so it should rule out issues caused by
>> cpu cache lines...
>>
>> actually, the test app i have is part of a bigger app.
>> thread 1 of the bigger app fetches data & thread 2
>> processes them.
>> i've extracted & simplified that code out and made both
>> threads only do computation on a small buffer to
>> reproduce the problem from the bigger app.
>>
>> anyway, i've also tested the threads in the bigger app,
>> & i've found that data returned from thread 1 is
>> consistently sufficient to keep thread 2 busy.
>> (thread1:thread2 elapsed time ratio is approx 6:4; in my
>> test app, it's made to be 1:1).
>> in theory, i should be getting about 30% to 40%
>> performance improvement in my bigger app, & always 40%
>> to 50% performance improvement in my test case.
>
>Something else that I just thought of that could help. CPU performance
>counters. Intel's Vtune software uses these to profile software in all
>sorts of interesting ways. I don't know if Itanium has the same
>performance registers that the P4 has, but if they were good I don't see
>why Intel would leave them out.
>
>I also don't know if Intel has anything like Vtune for Itanium. *goes
>to check Google* Ah, yes they do. And there's another thing called
>Thread Checker (http://www.intel.com/software/products/threading/tcwin/)
>
>I don't know if Intel offers an evaluation copy or if you'd have to buy
>it. Geez. I just looked. They want $1,198 for the Vtune package (that
>includes Thread Checker).
>
>There may be other tools to do this sort of thing out there.
>--
>Zan Lynx <zlynx at acm.org>
thanks. will keep that in mind.
we've actually evaluated the intel compiler but didn't
get to vtune for a reason i can't remember offhand.
(i think i got an older vtune & its readme claimed it
didn't support the itanium under linux, though the
web site said it did)
(switching to the compiler alone gave us approx 15%
performance improvement, & that's on already heavily
optimized code that we haven't even tuned for that
compiler yet.)
anyway, just an update...
i managed to get it tested on a new dual itanium box.
it gave me the expected 50% performance improvement
no matter how many thread syncs i put in.
(thread sync overhead was visible, but it only added
a few milliseconds as the number of syncs went up.)
i'm gonna try it with 2 other dual itanium boxes from
a nearby friendly dept soon to verify this.
thanks.
rgds,
kh