[lug] q: multithreading on smp weirdness

karheng at softhome.net karheng at softhome.net
Tue Nov 30 21:13:36 MST 2004


hmmm 

well, actually, the test case i have is part of a bigger app.
thread 1 of the bigger app fetches data & thread 2 processes it.

i've extracted & simplified that code, and made both threads
do only computation on a small buffer, to reproduce the problem
from the bigger app.
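
for readers following along, the shape of such a two-thread test can be sketched roughly as below. this is only an illustration of the handoff structure, not the actual test app: `stage`, `run_pipeline` and the queue-based handoff are hypothetical names and choices. also note that in CPython the global interpreter lock keeps pure-python computation from running in parallel, so this shows the structure only and would not reproduce the timings.

```python
import queue
import threading


def stage(buf):
    # stand-in for "computation on a small buffer" (hypothetical workload)
    return [x + 1 for x in buf]


def run_pipeline(chunks):
    handoff = queue.Queue()   # thread 1 -> thread 2 relay
    results = []

    def thread1():
        for chunk in chunks:
            handoff.put(stage(chunk))   # process a chunk, then relay it
        handoff.put(None)               # sentinel: no more chunks

    def thread2():
        while True:
            chunk = handoff.get()       # blocks until thread 1 relays a chunk
            if chunk is None:
                break
            results.append(stage(chunk))

    workers = [threading.Thread(target=thread1),
               threading.Thread(target=thread2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

every chunk costs one put/get pair on the queue, so the number of handoffs grows as the chunk size shrinks.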

anyway, i've also tested the threads in the bigger app, & i've
found that the data returned from thread 1 is consistently enough
to keep thread 2 busy.
(the thread1:thread2 elapsed time ratio is approx 6:4 there; in my
test app it's made to be 1:1)
in theory, i should be getting about a 30% to 40% performance
improvement in my bigger app, & a consistent 40% to 50% performance
improvement in my test case.
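
those figures can be checked with a small model. the sketch below assumes an ideal two-stage pipeline with zero handoff cost and an unbounded buffer between the threads; `pipeline_improvement` and its parameters are illustrative names, not anything from the test app.

```python
def pipeline_improvement(t1, t2, chunks):
    """Ideal fractional time saved by a 2-stage pipeline vs. running both stages serially.

    t1, t2: per-chunk time of thread 1 and thread 2 (any consistent unit)
    chunks: number of chunks pushed through the pipeline
    """
    serial = chunks * (t1 + t2)
    # the first chunk must pass through both stages; after that the slower
    # stage sets the pace for every remaining chunk
    pipelined = t1 + t2 + (chunks - 1) * max(t1, t2)
    return 1.0 - pipelined / serial


for label, t1, t2 in (("bigger app, 6:4", 6.0, 4.0), ("test app, 1:1", 1.0, 1.0)):
    print(label, pipeline_improvement(t1, t2, 1000))
```

with many chunks the saving approaches 1 - max(t1, t2) / (t1 + t2), i.e. about 40% for a 6:4 split and about 50% for 1:1. real handoff overhead can only pull the numbers below that.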

so it's not that.. :) 

i'm gonna include source code for the test app in reply to another
message of this thread.. 

rgds, 

kh 


>
>This is a guess, but I would say thread 2 is starved waiting for chunks. 
>
>When you say thread 1 fetches a chunk, is that I/O? I/O in bigger blocks,
>(up to a point) usually does better than small chunks. 
>
>
>George Sexton
>MH Software, Inc.
>http://www.mhsoftware.com/
>Voice: 303 438 9585
>   
>
>> -----Original Message-----
>> From: lug-bounces at lug.boulder.co.us 
>> [mailto:lug-bounces at lug.boulder.co.us] On Behalf Of 
>> karheng at softhome.net
>> Sent: Monday, November 29, 2004 7:45 PM
>> To: lug at lug.boulder.co.us
>> Subject: [lug] q: multithreading on smp weirdness 
>> 
>> 
>> greetings.  
>> 
>> i've got a test app with 2 threads processing
>> several chunks of data. thread 1 fetches chunk 1
>> & processes it. once done, it's relayed to thread 2.
>> then thread 1 fetches & processes chunk 2, while
>> thread 2 receives & processes chunk 1. this goes
>> on until all chunks are processed by both threads.  
>> 
>> my problem is that if the chunk size is made small
>> enough, the elapse time suddenly almost doubles...
>> and i need these chunks to be small eventually.  
>> 
>> more specifically:
>> (chunk size * iteration count always == 6000)
>> single threaded:
>> chunk size of 6000 unit, 1 iteration, 12+ secs elapse time
>> multi-threaded:
>> chunk size of 12 unit, 500 iterations, 6+ secs elapse time
>> chunk size of 10 unit, 600 iterations, 6+ secs elapse time
>> chunk size of 8 unit, 750 iterations, 6++ secs elapse time
>> chunk size of 6 unit, 1000 iterations, 6++ secs elapse time most of the time
>> chunk size of 5 unit, 1200 iterations, 6+++ secs elapse time most of the time, 11+ secs at other times.
>> chunk size of 4 unit, 1500 iterations, 11+ secs elapse time
>> chunk size of 3 unit, 2000 iterations, 11++ secs elapse time
>> chunk size of 2 unit, 3000 iterations, 11+++ secs elapse time
>> chunk size of 1 unit, 6000 iterations, 12+ secs elapse time
>> (note that in above, elapse times of 7, 8, 9 & 10 secs are
>> mostly not present, just 1 or 2 sporadic ones).  
>> 
>> i ran this on a 4 CPU itanium linux server.  
>> 
>> this problem doesn't appear when i ran a port
>> of this app on a windows hyperthreading machine.  
>> 
>> anyone might have any idea what's wrong?
>> thanks in advance.  
>> 
>> rgds,  
>> 
>> kh 
 


