[lug] Re: q: multithreading on smp weirdness

karheng at softhome.net karheng at softhome.net
Thu Dec 9 19:38:42 MST 2004


> I know a lot of kernel devel thread issues have been dealt with in going 
> from kernel 2.4.x to 2.6.x, and at some point some of the design will have 
> come back to 2.4.x, so I guess the first recommendation would be to go 
> with a newer 2.4.x or 2.6.x kernel, especially since you have found this 
> to be directly related to performance. I have yet to test it out (other 
> than running SMP machines for a workstation), but rumor is that 2.6 
> kernels are more efficient with SMP. 
> 
okay.. thanks... i guess i'll stick with that until i
run into problems even with the newer kernels... 

> On the profile data, I wasn't sure which one was for bad-performance case 
> and which for good-performance case (it would be good to sit profile data 
> for a bad-performance case configuration on one machine to the exact same 
> case configuration on a failing machine). 
> 
well, as you know, there were 2 sets of profile data
attached. 

the first set profiles the test app doing only 1 thread
sync... it still runs on 2 threads, but only syncs once.
it's also the one that multithreads properly (elapsed time
of 6 secs).
but when i ran it with the profiler, it went really bad
(elapsed time of 17 secs). 

the 2nd set profiles the test app doing 6000 thread syncs.
it still runs on 2 threads, but syncs 6000 times.
it's the one that didn't multithread properly (elapsed time
of 11+ secs).
but when i ran it with the profiler, its elapsed time
wasn't too bad (elapsed time of 12+ secs). 

strange. 

> One thing I did see in the 6000 case was almost no time or function calls 
> to most functions, but huge numbers of calls to get_busy and push/pop. It 
> seems that the push/pop is in some way a means of being reentrant, though 
> I'm not sure (how efficient are your push/pop functions?). Now if threads 
> were being spawned thousands of times, and pushing/popping thousands of 
> times, why were the other functions not being called more? From this I can 
> think of two things to add to the trail of clues. 
> 
threads weren't spawned 6000 times... sorry if
i didn't mention that earlier... there are always
only 2 threads. the threads were synchronized
6000 times. 

the push & pop calls push & pop wait
events into threads for synchronization purposes,
so it only makes sense that the number of these
calls goes way up when i sync 6000 times. 

the push & pop functions are not reentrant but
are thread safe. 

i've attached an extract of the Push() function
for you to get an idea of its efficiency (or
rather its inefficiency actually.. hehe). 


rgds, 

kh 

-------------- next part --------------
// ::Start resets waitcounter to 0.
// Public method to allow other threads to increment waitcounter of this thread.
// If called by others, thread doesn't block.
// Blocks without consuming CPU time.
APIFUNCATTRH void APIFUNCATTRB
NXThreadSync::Push(unsigned cnt,unsigned long timeoutms)
APIFUNCATTRT
{
#if             !defined(MS_WINDOWS_NATIVE)
   // Ensure the mutex is released even if this thread is cancelled.
   pthread_cleanup_push((void (*)(void *))pthread_mutex_unlock,(void *)&mutex);
   if(pthread_mutex_lock(&mutex)!=0)
      throwerror(0,"NXThreadSync::Push()","Unexpected error!");
   try
   {
      waitcounter+=cnt;
      if(waitcounter<1)
      {
         // Everyone accounted for: wake all waiters.
         if(pthread_cond_broadcast(&condition)!=0)
            throwerror(0,"NXThreadSync::Push()","Unexpected error!");
      }
      else
         while(waitcounter>0)
         {
            if(timeoutms==0)
            {
               //if(pthread_cond_wait(&condition,&mutex)!=0)
               //   throwerror(0,"NXThreadSync::Push()","Unexpected error!");
               //continue;
               timeoutms=FINALTIMEOUTMS;
            };
            struct timespec timeout;
            int rslt;

            // Build the absolute CLOCK_REALTIME deadline that
            // pthread_cond_timedwait() expects; note time(NULL)
            // truncates the current time to whole seconds.
            timeout.tv_nsec=(timeoutms%1000)*1000000;
            timeout.tv_sec=(timeoutms/1000)+time(NULL);
            rslt=pthread_cond_timedwait(&condition,&mutex,&timeout);
            if(rslt==ETIMEDOUT)
               throwerror(0,"NXThreadSync::Push()","Timed out!");
            if(rslt!=0)
               throwerror(0,"NXThreadSync::Push()","Unexpected error!");
         };
   }
   catch(...)
   {
      pthread_mutex_unlock(&mutex);
      throw;
   };
   if(pthread_mutex_unlock(&mutex)!=0)
      throwerror(0,"NXThreadSync::Push()","Unexpected error!");
   pthread_cleanup_pop(0);
#else        /* !defined(MS_WINDOWS_NATIVE) */
   if(WaitForSingleObject(hMutex,FINALTIMEOUTMS)!=WAIT_OBJECT_0)
      throwerror(0,"NXThreadSync::Push()","Unexpected error!");
   try
   {
      waitcounter+=cnt;
      if(waitcounter<1)
      {
         // Wake all waiters, then poll until they have all consumed the
         // semaphore. NB: Win32 requires a release count greater than
         // zero; ReleaseSemaphore() with a count of 0 fails and does not
         // update lThrdCnt, so this polling idiom is suspect.
         ReleaseSemaphore(hSmphr,lThrdCnt,(LPLONG)&lThrdCnt);
         for(;lThrdCnt!=0;SleepEx(0,TRUE),ReleaseSemaphore(hSmphr,0,(LPLONG)&lThrdCnt));
      }
      else
      {
         // Register as a waiter, drop the mutex, wait on the semaphore,
         // then re-acquire the mutex before checking the wait result.
         lThrdCnt++;
         if(ReleaseMutex(hMutex)==0)
            throwerror(0,"NXThreadSync::Push()","Unexpected error!");
         DWORD waitrslt=WaitForSingleObject(hSmphr,timeoutms==0?FINALTIMEOUTMS:timeoutms);
         if(WaitForSingleObject(hMutex,FINALTIMEOUTMS)!=WAIT_OBJECT_0)
            throwerror(0,"NXThreadSync::Push()","Unexpected error!");
         if(waitrslt==WAIT_TIMEOUT)
            throwerror(0,"NXThreadSync::Push()","Timed out!");
         if(waitrslt!=WAIT_OBJECT_0)
            throwerror(0,"NXThreadSync::Push()","Unexpected error!");
      };
   }
   }
   catch(...)
   {
      ReleaseMutex(hMutex);
      throw;
   };
   if(ReleaseMutex(hMutex)==0)
      throwerror(0,"NXThreadSync::Push()","Unexpected error!");
#endif       /* !defined(MS_WINDOWS_NATIVE) */
}




More information about the LUG mailing list