[lug] Troubleshooting Transient DNS failures

George Sexton georges at mhsoftware.com
Wed Jan 16 10:41:27 MST 2019


On 2019-01-16 09:37, Rob Nagler wrote:
> Not exactly the answer you are looking for, but...
> 
> Over the years I've found my colo/ISPs had issues with DNS so I
> switched to Google and now CloudFlare. I find them to be extremely
> reliable, and moved to CloudFlare, since they aren't Google. Google
> 
> Your failure is not likely to be local so spending time debugging it
> may be a total waste of time. Pop and swap. :)

Unfortunately, this is in a corporate setting and that would go over 
like the proverbial lead zeppelin.


> One time we were having issues with TCP DNS. One of their caching
> servers was misconfigured. Another time, certain sites (e.g.
> github.com [1]) were just failing randomly, which would mess up builds

Yes. I've got Jenkins jobs that are randomly failing and it makes the 
devs unhappy.

I understand I have no control, but a) they ask me why their job failed 
and b) I'm concerned about the perception that the CI/CD pipeline is 
unreliable, and it's my job to make it reliable...
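
One way to answer the "why did my job fail" question with evidence is to log timestamped lookups from the build agent and see when resolution actually breaks. The following is only a minimal sketch, assuming Python 3 is available on the agent. The function name `probe` and its `resolver` and `sleep` parameters are hypothetical, introduced here for illustration, and are not part of Jenkins or any existing tool.

```python
import socket
import time


def probe(host, attempts=5, delay=1.0,
          resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve `host` several times; return (timestamp, ok, detail) tuples.

    `resolver` and `sleep` are injectable (an assumption of this sketch)
    so the function can be exercised without touching the network.
    """
    results = []
    for i in range(attempts):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        try:
            infos = resolver(host, None)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # the address string is the first element of sockaddr.
            addrs = sorted({info[4][0] for info in infos})
            results.append((stamp, True, ", ".join(addrs)))
        except socket.gaierror as exc:
            # gaierror is what a failed name lookup raises.
            results.append((stamp, False, str(exc)))
        if i < attempts - 1:
            sleep(delay)
    return results
```

Calling `probe("github.com")` from a cron job or a pre-build step and appending the tuples to a log would give a record of when lookups failed, which can then be compared against the times of the failed builds.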

> randomly. As you say, you don't have control of the DNS servers, so
> you're going to be debugging your ISP/colo (read: maybe the same one
> :).
> 
> Rob
> 
> 
> 
> Links:
> ------
> [1] http://github.com

-- 
George Sexton
MH Software, Inc. - Home of connectDaily Web Calendar
https://www.mhsoftware.com/

