[lug] Testing

Jeffrey Haemer jeffrey.haemer at gmail.com
Wed Oct 21 09:13:42 MDT 2009


Davide,

Good questions.  Let me elaborate a little.

First, there are several ways to hunt up a reasonable test sample.  Second,
these approaches can fail in some interesting ways.

I'll start with ways to get tests.  The suites have common features: they're
big, they're broad, and they're generated by someone (or something) else.

Each type tests what it tests.  Your point is that statistics are only
useful when you think about what they actually say.  Absolutely right.

(1) Comprehensive testing.

Quality Logic sells two classes of printer-language tests: "Application
Tests" -- sample real-world print jobs that exercise a language,
end-to-end -- and "Evaluation Tests" that test individual operators of the
language interpreter.

A single PostScript application test might, for example, be the PostScript
generated by a Windows driver for a giant Microsoft Word document that
invokes a gallon of PostScript features, simple to fancy.  A single
evaluation test might test all the variants and behaviors of PostScript's
"lineto" operator; the PostScript evaluation test suite exhaustively tests
all the operators in the Red Book, Adobe's 912-page PostScript Language
Reference Manual.

Each suite has thousands of tests (or tens of thousands, depending on how
you count).  They're used by all major printer manufacturers.
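
(For a flavor of how suites like these get driven, here's a Python
sketch.  Ghostscript stands in as the interpreter under test, and the
tests/ and golden/ directory layout is something I made up for
illustration.)

    import filecmp
    import glob
    import subprocess

    failures = []
    for test in sorted(glob.glob("tests/*.ps")):
        out = "/tmp/out.ppm"
        # Render the test page to a raster image.
        subprocess.run(
            ["gs", "-dBATCH", "-dNOPAUSE", "-r72",
             "-sDEVICE=ppmraw", f"-sOutputFile={out}", test],
            check=True)
        # Compare byte-for-byte against a known-good rendering.
        golden = test.replace("tests/", "golden/").replace(".ps", ".ppm")
        if not filecmp.cmp(out, golden, shallow=False):
            failures.append(test)

    print(f"{len(failures)} tests differ from their golden output")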

(2) Sampling from nature.

There's no easy way to write evaluation tests for PDF.  The design makes it
impossible to write files that exercise only one operator.

The example we used in the article -- thousands of PDF tests scraped from
the web -- gathers a large array of application tests, not consciously
selected for what they test.

There are statistical biases here, but you can tailor coverage with a little
care.

For example, when I built the suite, PDF on the web was still rare.  If I'd
grabbed the first thousand PDFs I found, most would have been IRS forms,
created with one version of Adobe Acrobat, probably by a handful of people
in one government office.  That would have been an accurate random sample
by raw count, but a terribly narrow one, so I limited myself to one PDF per
host instead.
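
Enforcing that rule is trivial.  Here's a Python sketch, assuming the
candidate URLs have already been scraped into a file, urls.txt:

    from urllib.parse import urlparse

    seen_hosts = set()
    sample = []
    for line in open("urls.txt"):
        url = line.strip()
        host = urlparse(url).netloc
        if host and host not in seen_hosts:  # first PDF from this host wins
            seen_hosts.add(host)
            sample.append(url)

    print(f"kept {len(sample)} URLs, one per host")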

(3) Monkey testing.

I haven't tried this yet, but lots of literature claims you find tons of
bugs with random input.  The two commonly discussed variants are "dumb
monkey testing" and "smart monkey testing."  A dumb monkey is line noise --
byte streams from /dev/random.  A smart monkey is sort-of-reasonable random
input, such as good input with random changes.
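
Here's a Python sketch of both monkeys.  The target program, pdftotext,
and the seed file, good.pdf, are just stand-ins for whatever you're
banging on:

    import os
    import random
    import subprocess

    def dumb_monkey(size=4096):
        """Pure line noise: a random byte stream."""
        return os.urandom(size)

    def smart_monkey(good_input, flips=10):
        """Good input with a handful of random byte changes."""
        data = bytearray(good_input)
        for _ in range(flips):
            data[random.randrange(len(data))] = random.randrange(256)
        return bytes(data)

    seed = open("good.pdf", "rb").read()
    for i in range(1000):
        case = smart_monkey(seed)  # or dumb_monkey() for pure noise
        open("/tmp/case.pdf", "wb").write(case)
        result = subprocess.run(["pdftotext", "/tmp/case.pdf", "/dev/null"])
        if result.returncode < 0:  # killed by a signal -- probably a crash
            open(f"crash-{i}.pdf", "wb").write(case)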

There's everything in between, too.  Expect will let you generate user-like
input -- input with statistical characteristics "like" typing -- that folks
use to do cool things.
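
pexpect, Expect's Python cousin, gives a compact sketch of the idea.
bc is just a convenient victim, and the timing model is invented:

    import random
    import time

    import pexpect

    def type_like_a_human(child, text, wpm=60):
        """Send text one keystroke at a time, with rough inter-key delays."""
        mean = 60.0 / (wpm * 5)  # roughly five characters per word
        for ch in text:
            time.sleep(max(0.0, random.gauss(mean, mean / 3)))
            child.send(ch)

    child = pexpect.spawn("bc", encoding="utf-8")
    type_like_a_human(child, "2 + 2\n")
    child.expect("4")
    child.sendline("quit")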


These three approaches each yield a different kind of big, useful test
suite to pick random subsets from: some to run, some to sequester.  They
aren't the only ways to build such suites, but they illustrate the range.
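
Picking the subsets is the easy part.  Here's a sketch; the 80/20 split
is arbitrary:

    import glob
    import random

    tests = sorted(glob.glob("suite/*.pdf"))
    random.shuffle(tests)
    cut = int(0.8 * len(tests))
    run_now, sequestered = tests[:cut], tests[cut:]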

I'll put some interesting failure modes in another message.  This one's long
enough already.  :-)

-- 
Click to Call Me Now! --
http://seejeffrun.blogspot.com/2009/09/call-me-now.html

Jeffrey Haemer <jeffrey.haemer at gmail.com>
720-837-8908 [cell],  @goyishekop [twitter]
http://www.youtube.com/user/goyishekop [vlog]