[lug] [OT] Threads & Socket Programming

Fri Oct 25 09:25:15 MDT 2002

On Fri, 2002-10-25 at 09:49, Scott Herod wrote:
> > Someone else may be able to correct me, but I thought sockets in C 
> > (C++) were non-blocking. 
> 
> I believe that it depends on how the file ( pipe, socket, etc. ) is 
> opened.  Typically a flag specified in the open or socket function call 
> sets the behavior.

That's correct.  You can use an ioctl() with FIONBIO to set the socket
as nonblocking.  You can also use fcntl() with O_NONBLOCK (FNDELAY on
older BSD systems) to do the same thing.  They're very similar, but
slightly different.  See Stevens' text on network programming for the
details.

> >> Everything works fine, except that if I write too many
> >> packets to the socket too fast, I lose data packets (never
> >> received by the client side).  I can solve this problem by
> >> putting in a sleep(1) statement, but that kind of defeats
> >> the purpose of having a thread to handle writing data to the
> >> socket from the queue.

The way I handle this is to set the sending socket to be blocking and
use write() or send() to send the entire packet.  This lets the OS
handle the details of getting the entire packet out correctly.  On the
receiving end I set the reading socket to be nonblocking and loop using
read()s until I get all of the user-level packet (i.e. the data structure
I'm trying to send) or I time out waiting for more data.  The hard part
isn't getting the data, it's catching when the opposite end closes the
connection (a read() that returns 0, or fails with ECONNRESET; EPIPE
only shows up on the writing side).

By making the read() socket nonblocking I can make sure my server (which
does most of the reading from lots of clients) doesn't get hung on one
client when bad packets make it through.  If that happens, the server
closes the connection to the client - punts and lets the client
restart.  If a client hangs sending, then you just kill it and try
again.  Not optimal for all situations, but useful for many.

> BTW, in some code that I am looking at to recall this I have 
> the following few lines.
> 
>     // select seems to work better with gdb than usleep
>     // usleep( timeoutInMS*1000 );
>     struct timeval tv;
>     tv.tv_sec = timeoutInMS/1000; tv.tv_usec = (timeoutInMS%1000)*1000;
>     select( 1, NULL, NULL, NULL, &tv );

select() can be used as a fine-granularity timer.  Stevens' texts even
suggest it.  With network programming, though, it's used to poll the set
of open sockets to see if any have data that needs to be read or are
ready for sending.  So it fits in well with the rest of this
conversation.  :-)

> >> Anyone know why I would be having this problem?  Do I need
> >> to call some function to poll the socket and find out if
> >> data can be sent out yet, or if I have to wait (thought such
> >> blocking was handled automatically).

You can use select() to poll the socket to see if it's ready for
sending.  That's what select() is for.  I've never used it for sending,
however - only for receiving sockets.  In the code I'm currently working
on the problem is never that the sender can't send, but that the receiver
is blocked waiting on data from some other source, so the send/receive
network queues fill up and both sides come to a halt.  In my situation,
I have lots of clients (hundreds in some cases) trying to send lots of 2k
packets to the server.  If the server hangs on one, it doesn't take long
for all of the rest of the clients to lock up too.  Bad doobie.  That's
why I use nonblocking reads on the server and just punt any badly
behaved clients.

How can you tell this has happened?  Look at the TCP connections in
netstat (netstat -t on Linux, netstat -p tcp on BSD) on the server side
(receive side, that is) and watch for the receive queues to fill up.
Also look at "top" and see what state the server is in.  If it's in
sbwait, chances are it's stuck waiting for incoming data that it never
gets.  In turn, this eventually causes the client side to back up its
send queue, and it just stops functioning.  You won't get stuck in
sbwait if you use nonblocking reads, though.  It's just trickier to know
when you have all the data you want - you need to write a loop that
keeps reading until the full data structure you sent has arrived.

> My impression, David, is that this is the reading code's problem.  While 
> reads can be set to block or not, I almost always wrap a select call 
> around one as well so that I can, in a sense, poll.  Check to see how 
> the reading code is opening the socket and whether it is doing any 
> polling.  If you want, reply off-line.  I've got a pretty good 
> collection of sample code for reading/writing using various connections.

Scott's probably right.  The problems are usually on the receiving side
and the effect is becoming visible to you on the client side.

-- 
Michael J. Hammel                               The Graphics Muse 
mjhammel at graphics-muse.org                     
http://www.graphics-muse.com
------------------------------------------------------------------------------
Got a full 6-pack, but lacks the plastic thing to hold it all together.
-- From a real employee performance evaluation.