dieharder - A testing and benchmarking tool for random number generators
dieharder [-a] [-d dieharder test number] [-f filename] [-B]
[-D output flag [-D output flag] ... ] [-F] [-c separator]
[-g generator number or -1] [-h] [-k ks_flag] [-l]
[-L overlap] [-m multiply_p] [-n ntuple]
[-p number of p samples] [-P Xoff]
[-o filename] [-s seed strategy] [-S random number seed]
[-t number of test samples] [-v verbose flag]
[-W weak] [-X fail] [-Y Xtrategy]
[-x xvalue] [-y yvalue] [-z zvalue]
- -a runs all the tests with standard/default options to create a
user-controllable report. To control the formatting of the report, see -D
below. To control the power of the test (which uses default values for
tsamples that cannot generally be varied, and psamples which generally can)
see -m below, a "multiplier" of the default number of psamples (used only in
an -a run).
- -d test number - selects a specific dieharder test.
- -f filename - generators 201 or 202 permit either raw binary or formatted
ASCII numbers to be read in from a file for testing; generator 200 reads raw
binary numbers from stdin. Note well: many tests with default parameters
require a lot of rands! To see a sample of the (required) header for ASCII
formatted input, run
dieharder -o -f example.input -t 10
and then examine the contents of example.input. Raw binary input reads 32
bit increments of the specified data stream. stdin_input_raw accepts a
pipe from a raw binary stream.
- -B binary mode (used with -o below) causes output rands to
be written in raw binary, not formatted ascii.
- -D output flag - permits fields to be selected for dieharder output. Each
flag can be entered as a binary number that turns on a specific output field
or header, or by flag name; flags are aggregated. To see all currently known
flags, use the -F flag.
- -F - lists all known flags by name and number.
- -c table separator - where separator is e.g. ',' (CSV) or ' ' (whitespace).
- -g generator number - selects a specific generator for testing. Using
-g -1 causes all known generators to be printed out to the display.
- -h prints context-sensitive help -- usually Usage (this message) or a
test synopsis if entered as e.g. dieharder -d 3 -h.
- -k ks_flag - where ks_flag is one of:
0 is fast but slightly sloppy for psamples > 4999 (default).
1 is MUCH slower but more accurate for larger numbers of psamples.
2 is slower still, but (we hope) accurate to machine precision for any
number of psamples up to some as yet unknown numerical upper limit (it has
been tested out to at least hundreds of thousands).
3 is kuiper ks, fast, quite inaccurate for small samples, deprecated.
- -l list all known tests.
- -L overlap
1 (use overlap, default)
0 (don't use overlap)
in operm5 or other tests that support overlapping and non-overlapping sample
modes.
- -m multiply_p - multiply the default number of psamples in -a(ll) runs to
crank up the resolution of failure.
- -n ntuple - set the ntuple length for tests on short bit strings that
permit the length to be varied (e.g. rgb_bitdist).
- -o filename - output -t count random numbers from current
generator to file.
- -p count - sets the number of p-value samples per test (default 100).
- -P Xoff - sets the number of psamples that will cumulate before deciding
that a generator is "good" and really, truly passes even a -Y 2 T2D run.
Currently the default is 100000; eventually it will be set from AES-derived
T2D test failure thresholds for fully automated reliable operation, but for
now it is more a "boredom" threshold set by how long one might reasonably
want to wait on any given test run.
- -S seed - where seed is a uint. Overrides the default random seed
selection. Ignored for file or stdin input.
- -s strategy - if strategy is the (default) 0, dieharder reseeds (or
rewinds) once at the beginning when the random number
generator is selected and then never again. If strategy is nonzero, the
generator is reseeded or rewound at the beginning of EACH TEST. If -S seed
was specified, or a file is used, this means every test is applied to the
same sequence (which is useful for validation and testing of dieharder,
but not a good way to test rngs). Otherwise a new random seed is selected
for each test.
- -t count - sets the number of random entities used in each test, where
possible. Be warned -- some tests have fixed sample sizes; others are
variable but have practical minimum sizes. It is suggested you begin with
the values used in -a and experiment carefully on a test by test basis.
- -W weak - sets the "weak" threshold to make the test(s) more or less
forgiving during e.g. a test-to-destruction run. Default is currently 0.005.
- -X fail - sets the "fail" threshold to make the test(s) more or less
forgiving during e.g. a test-to-destruction run. Default is currently
0.000001, which is basically "certain failure of the null hypothesis", the
desired mode of reproducible generator failure.
- -Y Xtrategy - the Xtrategy flag controls the new "test to failure" (T2F)
modes. These flags and their modes act as follows:
0 - just run dieharder with the specified number of tsamples and psamples,
do not dynamically modify a run based on results. This is the way it has
always run, and is the default.
1 - "resolve ambiguity" (RA) mode. If a test returns
"weak", this is an undesired result. What does that mean, after
all? If you run a long test series, you will see occasional weak returns
for a perfect generator because p is uniformly distributed and will
appear in any finite interval from time to time. Even if a test run
returns more than one weak result, you cannot be certain that the
generator is failing. RA mode adds psamples (usually in blocks of 100)
until the test result ends up solidly not weak or proceeds to unambiguous
failure. This is morally equivalent to running the test several times to
see if a weak result is reproducible, but eliminates the bias of personal
judgement in the process since the default failure threshold is very small
and very unlikely to be reached by random chance even in many runs.
This option should only be used with -k 2.
2 - "test to destruction" mode. Sometimes you just want to know
where or if a generator will .I ever fail a test (or test series). -Y 2
causes psamples to be added 100 at a time until a test returns an overall
pvalue lower than the failure threshold or a specified maximum number of
psamples (see -P) is reached.
Note well! In this mode one may well fail due to the alternate null
hypothesis -- the test itself is a bad test and fails! Many dieharder
tests, despite our best efforts, are numerically unstable or have only
approximately known target statistics or are straight up asymptotic
results, and will eventually return a failing result even for a
gold-standard generator (such as AES), or for the hypercautious the XOR
generator with AES, threefish, kiss, all loaded at once and xor'd
together. It is therefore safest to use this mode comparatively,
executing a T2D run on AES to get an idea of the test failure threshold(s)
(something I will eventually do and publish on the web so everybody
doesn't have to do it independently) and then running it on your target
generator. Failure with numbers of psamples within an order of magnitude
of the AES thresholds should probably be considered possible test
failures, not generator failures. Failures at levels significantly less
than the known gold standard generator failure thresholds are, of course,
probably failures of the generator.
This option should only be used with -k 2.
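For example (an illustrative invocation built only from flags documented on
this page; the choice of test and generator is arbitrary), a
test-to-destruction run of the diehard birthdays test against the
gold-standard AES generator might look like:
dieharder -g 205 -d diehard_birthdays -k 2 -Y 2 -P 100000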
- -v verbose flag -- controls the verbosity of the output, for debugging
only. Probably of little use to non-developers; developers can read the
enum(s) in dieharder.h and the test sources to see which flag values turn on
output in which routines. A value of 1 results in a highly detailed trace of
program activity.
- -x,-y,-z number - Some tests have parameters that can safely be varied
from their default value. For example, in the diehard birthdays test, one
can vary the number of "birthdays" drawn and the number of bits in the words
from which they are drawn. -x 2048 -y 30 alters these two values but should
still run fine. These parameters should be documented internally (where they
exist) in the notes visible with e.g. -d 0 -h.
NOTE WELL: The assessment(s) for the rngs may, in fact, be completely
incorrect or misleading. There are still "bad tests" in
dieharder, although we are working to fix and improve them (and try to
document them in the test descriptions visible with -d testnumber -h). In
particular, 'Weak' pvalues should occur one test in two hundred, and
'Failed' pvalues should occur one test in a million with the default
thresholds - that's what p MEANS. Use them at your Own Risk! Be Warned!
Or better yet, use the new -Y 1 and -Y 2 resolve ambiguity or test to
destruction modes above, comparing to similar runs on one of the
as-good-as-it-gets cryptographic generators, AES or threefish.
Welcome to the current snapshot of the dieharder random number tester. It
encapsulates all of the Gnu Scientific Library (GSL) random number generators
(rngs) as well as a number of generators from the R statistical library,
hardware sources such as /dev/*random, "gold standard" cryptographic
quality generators (useful for testing dieharder and for purposes of
comparison to new generators) as well as generators contributed by users or
found in the literature into a single harness
that can time them and
subject them to various tests for randomness. These tests are variously drawn
from George Marsaglia's "Diehard battery of random number tests",
the NIST Statistical Test Suite, and again from other sources such as personal
invention, user contribution, other (open source) test suites, or the
literature.
The primary point of dieharder is to make it easy to time and test
(pseudo)random number generators, including both software and hardware rngs,
with a fully open source tool. In addition to providing "instant"
access to testing of all built-in generators, users can choose one of three
ways to test their own random number generators or sources: a unix pipe of a
raw binary (presumed random) bitstream; a file containing a (presumed random)
raw binary bitstream or formatted ascii uints or floats; and embedding your
generator in dieharder's GSL-compatible rng harness and adding it to the list
of built-in generators. The stdin and file input methods are described below
in their own section, as is suggested "best practice" for newbies to
random number generator testing.
An important motivation for using dieharder is that the entire test suite is
fully Gnu Public License (GPL) open source code and hence rather than being
prohibited from "looking underneath the hood" all users are openly
encouraged to critically examine the dieharder code for errors, add new tests
or generators or user interfaces, or use it freely as is to test their own
favorite candidate rngs subject only to the constraints of the GPL. As a
result of its openness, literally hundreds of improvements and bug fixes have
been contributed by users to date, resulting in a far stronger and more
reliable test suite than would have been possible with closed and locked down
sources or even open sources (such as STS) that lack the dynamical feedback
mechanism permitting corrections to be shared.
Even small errors in test statistics permit the alternative (and usually
unstated) null hypothesis to become an important factor in rng testing -- the
unwelcome possibility that your generator is just fine but it is the test
that is failing. One extremely useful feature of dieharder is that
it is at least moderately self validating.
Using the "gold
standard" aes and threefish cryptographic generators, you can observe how
these generators perform on dieharder runs to the same general degree of
accuracy that you wish to use on the generators you are testing. In general,
dieharder tests that consistently fail at any given level of precision
(selected with e.g. -a -m 10) on both of the gold standard rngs (and/or the
better GSL generators, mt19937, gfsr4, taus) are probably unreliable at that
precision and it would hardly be surprising if they failed your generator as
well.
Experts in statistics are encouraged to give the suite a try, perhaps using any
of the example calls below at first and then using it freely on their own
generators or as a harness for adding their own tests. Novices (to either
statistics or random number generator testing) are strongly encouraged
to read the next section on p-values and the null hypothesis and to run the
test suite a few times with a more verbose output report to learn how the
whole thing works.
Examples for how to set up pipe or file input are given below. However, it is
recommended that a user play with some of the built in generators to gain
familiarity with dieharder reports and tests before tackling their own
favorite generator or file full of possibly random numbers.
To see dieharder's default standard test report for its default generator
(mt19937) simply run:
dieharder -a
To increase the resolution of possible failures of the standard -a(ll) test,
use the -m "multiplier" for the tests' default numbers of pvalues (which
are selected more to make a full test run take an hour or so instead of days
than because they form a truly exhaustive test sequence):
dieharder -a -m 10
To test a different generator (say the gold standard AES_OFB) simply specify the
generator on the command line with a flag:
dieharder -g 205 -a -m 10
Arguments can be in any order. The generator can also be selected by name:
dieharder -g AES_OFB -a
To apply only
the diehard opso test to the AES_OFB generator, specify the
test by name or number:
dieharder -g 205 -d 5
dieharder -g 205 -d diehard_opso
Nearly every aspect or field in dieharder's output report format is
user-selectable by means of display option flags. In addition, the field
separator character can be selected by the user to make the output
particularly easy for them to parse (-c ' ') or import into a spreadsheet
(-c ','). Try:
dieharder -g 205 -d diehard_opso -c ',' -D test_name -D pvalues
to see an extremely terse, easy to import report or
dieharder -g 205 -d diehard_opso -c ' ' -D default -D histogram -D description
to see a verbose report good for a "beginner" that includes a full
description of each test itself.
Finally, the dieharder binary is remarkably autodocumenting even if the man page
is not available. All users should try the following commands to see what
they do:
dieharder -h
(prints the command synopsis like the one above).
dieharder -a -h
dieharder -d 6 -h
(prints the test descriptions for all the -a(ll) tests or for the specific
test given by -d).
dieharder -l
(lists all known tests, including how reliable rgb thinks that they are as
tests).
dieharder -g -1
(lists all known rngs).
dieharder -F
(lists all the currently known display/output control flags used with -D).
Both beginners and experts should be aware that the assessment provided by
dieharder in its standard report should be regarded with great suspicion. It
is entirely possible for a generator to "pass" all tests as far as
their individual p-values are concerned and yet to fail utterly when
considering them all together. Similarly, it is probable
that a rng
will at the very least show up as "weak" on 0, 1 or 2 tests in a
typical -a(ll) run, and may even "fail" 1 test in one such run out of 10 or
so. To understand why this is so, it is necessary to understand something of
rng testing, p-values, and the null hypothesis!
dieharder returns "p-values". To understand what a p-value is and how
to use it, it is essential to understand the null hypothesis, referred to
below as H0.
The null hypothesis for random number generator testing is "This generator
is a perfect random number generator, and for any choice of seed produces an
infinitely long, unique sequence of numbers that have all the expected
statistical properties of random numbers, to all orders". Note well that
this hypothesis is technically false for all software generators, as they
are periodic and do not have the correct entropy content for this statement
to ever be true. However, many hardware generators fail a priori as well, as
they contain subtle bias or correlations due to the deterministic physics
that underlies them. Nature is often unpredictable, but it is rarely random,
and the two words don't (quite) mean the same thing!
The null hypothesis can be practically true, however. Both software and
hardware generators can be "random" enough that their sequences cannot be
distinguished from random ones, at least not easily or with the available
tools (including dieharder!). Hence the null hypothesis is a practical, not
a theoretically pure, statement.
To test H0, one uses the rng in question to generate a sequence of
presumably random numbers. Using these numbers one can generate any one of a
wide range of test statistics -- empirically computed numbers that are
considered random samples drawn from a known distribution, and that may or
may not be covariant under H0, depending on whether overlapping sequences of
random numbers are used to generate successive samples while generating the
statistic(s). From a knowledge of the target distribution of the
statistic(s), the associated cumulative distribution function (CDF), and the
value of the randomly generated statistic(s), one can read off the
probability of obtaining the empirical result if the sequence was truly
random, that is, if the null hypothesis is true and the generator in
question is a "good" random number generator! This probability is
the "p-value" for the particular test run.
For example, to test a coin (or a sequence of bits) we might simply count the
number of heads and tails in a very long string of flips. If we assume that
the coin is a "perfect coin", we expect the number of heads and
tails to be binomially distributed
and can easily compute the
probability of getting any particular number of heads and tails. If we compare
our recorded number of heads and tails from the test series to this
distribution and find that the probability of getting the count we obtained
is very small, with (say) way more heads than tails, we'd suspect the coin
wasn't a perfect coin. dieharder applies this very test (made mathematically
precise) and many others that operate on this same principle to the string of
random bits produced by the rng being tested to provide a picture of how
"random" the rng is.
Note that the usual dogma is that if the p-value is low -- typically less than
0.05 -- one "rejects" the null hypothesis. In a word, it is
improbable that one would get the result obtained if the generator is a good
one. If it is any other value, one does not "accept" the generator
as good, one "fails to reject" the generator as bad for this
particular test. A "good random number generator" is hence one that
we haven't been able to make fail yet!
This criterion is, of course, naive in the extreme and cannot be used
uncritically. It makes just as much sense to reject a generator that has
p-values of 0.95 or more! Both of these p-value ranges are equally likely
on any given test run, and each should be returned for (on average) 5% of
all test runs by a perfect random number generator. A generator that fails
to produce p-values less than 0.05 about 5% of the time it is tested with
different seeds is a bad random number generator, one that fails the test of
the null hypothesis. Since dieharder returns over 100 pvalues by default in
an -a(ll) run, one would expect any perfectly good rng to "fail" such a
naive test around five times by this criterion in a single dieharder run!
The p-values themselves, as it turns out, are test statistics! By their nature,
p-values should be uniformly distributed on the range 0-1. In 100+ test runs
with independent seeds, one should not be surprised to obtain 0, 1, 2, or even
(rarely) 3 p-values less than 0.01. On the other hand obtaining 7 p-values in
the range 0.24-0.25, or seeing that 70 of the p-values are greater than 0.5
should make the generator highly suspect! How can a user determine when a test
is producing "too many" of any particular value range for p? Or too few?
Dieharder does it for you, automatically. One can in fact convert a set
of p-values into a p-value by comparing their distribution to the expected
one, using a Kolmogorov-Smirnov test against the expected uniform
distribution. The p-values obtained from looking at the distribution of
p-values
should in turn be uniformly distributed and could in principle be subjected to
still more KS tests in aggregate. The distribution of p-values for a
generator should be idempotent,
even across different test
statistics and multiple runs.
A failure of the distribution of p-values at any level of aggregation signals
trouble. In fact, if the p-values of any given test are subjected to a KS
test, and those p-values are then subjected to a KS test, as we add more
p-values to either level we will either observe idempotence of the resulting
distribution of p to uniformity, or
we will observe idempotence to a
single p-value of zero!
That is, a good generator will produce a
roughly uniform distribution of p-values, in the specific sense that the
p-values of the distributions of p-values are themselves roughly uniform and
so on ad infinitum, while a bad generator will produce a non-uniform
distribution of p-values, and as more p-values drawn from the non-uniform
distribution are added to its KS test, at some point the failure will be
absolutely unmistakeable as the resulting p-value approaches 0 in the limit.
The question is, trouble with what? Random number tests are themselves complex
computational objects, and there is a probability that their code is
incorrectly framed or that roundoff or other numerical -- not methodical --
errors are contributing to a distortion of the distribution of some of the
p-values obtained. This is not an idle observation; when one works on writing
random number generator testing programs, one is always testing the
tests themselves with "good" (we hope) random number generators so
that egregious failures of the null hypothesis signal not a bad generator but
an error in the test code. The null hypothesis above is correctly framed
from a theoretical point of view, but from a real and practical point
of view it should read: "This generator is a perfect random number
generator, and for any choice of seed produces an infinitely long, unique
sequence of numbers that have all the expected statistical properties of
random numbers, to all orders and
this test is a perfect test and
returns precisely correct p-values from the test computation." Observed
"failure" of this joint null hypothesis H0'
can come from
failure of either or both of these disjoint components, and comes from the
second as often or more often than the first during the test
development process. When one cranks up the "resolution" of the test
(discussed next) to where a generator starts to fail some test one realizes,
or should realize, that development never ends and that new test regimes will
always reveal new failures not only of the generators but of the code.
With that said, one of dieharder's most significant advantages is the control
that it gives you over a critical test parameter. From the remarks above, we
can see that we should feel very uncomfortable
"failing" any given random number generator on the basis of a 5%, or
even a 1%, criterion, especially when we apply a test suite like
dieharder that returns over 100 (and climbing) distinct test p-values as of
the last snapshot. We want failure to be unambiguous and reproducible!
To accomplish this, one can simply crank up its resolution. If we ran any given
test against a random number generator and it returned a p-value of (say)
0.007328, we'd be perfectly justified in wondering if it is really a good
generator. However, the probability of getting this result isn't really all
that small -- when one uses dieharder for hours at a time numbers like this
will definitely happen quite frequently and mean nothing. If one runs the
test again (with a different seed or part of the random sequence)
and gets a p-value of 0.009122, and a third time and gets 0.002669 -- well,
that's three 1% (or less) shots in a row and that
should happen only
one in a million times. One way to clearly resolve failures, then, is to
increase the number of p-values
generated in a test run. If the actual
distribution of p being returned by the test is not uniform, a KS test will
return a p-value that is not some ambiguous 0.035517 but is
instead 0.000000, with the latter produced time after time as we rerun.
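For example (an illustrative follow-up; the test choice and psample counts
are arbitrary), one might rerun a suspicious test at ten times the
resolution, switching to the more accurate KS test:
dieharder -g mt19937 -d diehard_runs -p 100
dieharder -g mt19937 -d diehard_runs -p 1000 -k 2
A genuinely bad generator's overall p-value will collapse toward zero as
psamples grows; a good one's will continue to bounce around uniformly.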
For this reason, dieharder is extremely conservative
about announcing rng
"weakness" or "failure" relative to any given test. It's
internal criterion for these things are currently p < 0.5% or p > 99.5%
weakness (at the 1% level total) and a considerably
criterion for failure: p < 0.05% or p > 99.95%. Note well that the
ranges are symmetric -- too high a value of p is just as bad (and unlikely) as
too low, and it is critical
to flag it, because it is quite possible
for a rng to be too good,
on average, and not to produce enough
low p-values on the full spectrum of dieharder tests. This is where the final
kstest is of paramount importance, and where the "histogram" option
can be very useful to help you visualize the failure in the distribution of p
-- run e.g.:
dieharder [whatever] -D default -D histogram
and you will see a crude ascii histogram of the pvalues that failed (or passed)
any given level of test.
Scattered reports of weakness or marginal failure in a preliminary -a(ll) run
should therefore not be immediate cause for alarm. Rather, they are tests to
repeat, to watch out for, and to push the rng harder on, using the -m option
to -a
or simply increasing -p for a specific test. Dieharder permits one to increase
the number of p-values generated for any
test, subject only to the
availability of enough random numbers (for file based tests) and time, to make
failures unambiguous. A test that is truly
weak at -p 100 will almost
always fail egregiously at some larger value of psamples, be it -p 1000 or -p
100000. However, because dieharder is a research tool and is under perpetual
development and testing, it is strongly suggested
that one always
consider the alternative null hypothesis -- that the failure is a failure of
the test code in dieharder itself in some limit of large numbers -- and take
at least some steps (such as running the same test at the same resolution on a
"gold standard" generator) to ensure that the failure is indeed
probably in the rng and not the dieharder code.
Lacking a source of perfect
random numbers to use as a reference,
validating the tests themselves is not easy and always leaves one with some
ambiguity (even aes or threefish). During development the best one can usually
do is to rely heavily on these "presumed good" random number
generators. There are a number of generators that we have theoretical reasons
to expect to be extraordinarily good and to lack correlations out to some
known underlying dimensionality, and that also test out extremely well quite
consistently. By using several such generators and not just one, one can hope
that those generators have (at the very least) different weaknesses, if any,
and should not all uniformly fail a test in the same way and with the same
number of p-values. When all of these generators consistently fail a
test at a given level, I tend to suspect that the problem is in the test code,
not the generators, although it is very difficult to be certain, as
many errors in dieharder's code have been discovered and ultimately fixed in
just this way by myself or others.
One advantage of dieharder is that it has a number of these "good
generators" immediately available for comparison runs, courtesy of the
Gnu Scientific Library and user contribution (notably David Bauer, who kindly
encapsulated aes and threefish). I use AES_OFB, Threefish_OFB, mt19937_1999,
gfsr4, ranlxd2 and taus2 (as well as "true random" numbers from
random.org) for this purpose, and I try to ensure that dieharder will
"pass" in particular the -g 205 -S 1 -s 1 generator at any
reasonable p-value resolution out to -p 1000 or farther.
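For example (an illustrative validation check using only flags documented
above), one such comparison run on a single test looks like:
dieharder -g 205 -S 1 -s 1 -d diehard_runs -p 1000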
Tests (such as the diehard operm5 and sums tests) that consistently fail
at these high resolutions are flagged as being "suspect" -- possible
failures of the alternative null hypothesis -- and they are strongly
deprecated. Their results should not be used to test random number
generators pending agreement in the statistics and random number community
that those tests are in fact valid and correct, so that observed failures
can indeed safely be attributed to a failure of the intended null hypothesis
(the generator, not the test).
As I keep emphasizing (for good reason!) dieharder is community supported. I
therefore openly ask that users of dieharder who are expert in statistics
help me fix the code or algorithms being implemented. I would like to see
this test suite ultimately be validated
by the general statistics
community in hard use in an open environment, where every possible failure of
the testing mechanism itself is subject to scrutiny and eventual correction.
In this way we will eventually achieve a very powerful suite of tools indeed,
ones that may well give us very specific information not just about failure
but about the mode of failure as well, that is, just how the sequence tested
deviates from randomness.
Thus far, dieharder has benefitted tremendously from the community. Individuals
have openly contributed tests, new generators to be tested, and fixes for
existing tests that were revealed by their own work with the testing
instrument. Efforts are underway to make dieharder more portable so that it
will build on more platforms and faster so that more thorough testing can be
done. Please feel free to participate.
The simplest way to use dieharder with an external generator that produces raw
binary (presumed random) bits is to pipe the raw binary output from this
generator (presumed to be a binary stream of 32 bit unsigned integers)
directly into dieharder, e.g.:
cat /dev/urandom | ./dieharder -a -g 200
Go ahead and try this example. It will run the entire dieharder suite of tests
on the stream produced by the linux built-in generator /dev/urandom (using
/dev/random is not recommended as it is too slow to test in a reasonable
amount of time).
Alternatively, dieharder can be used to test files of numbers produced by a
candidate random number generator:
dieharder -a -g 201 -f random.org_bin
for raw binary input or
dieharder -a -g 202 -f random.org.txt
for formatted ascii input.
A formatted ascii input file can accept either uints (integers in the range 0 to
2^31-1, one per line) or decimal uniform deviates with at least ten
significant digits (that can be multiplied by UINT_MAX = 2^32 - 1 to produce
a uint without dropping precision), also one per line. Floats with fewer
digits
will almost certainly fail bitlevel tests, although they may pass some of the
tests that act on uniform deviates.
Finally, one can fairly easily wrap any generator in the same (GSL) random
number harness used internally by dieharder and simply test it the same way
one would any other internal generator recognized by dieharder. This is
strongly recommended where it is possible, because dieharder needs to use a
lot of random numbers to thoroughly test a generator. A built in
generator can simply let dieharder determine how many it needs and generate
them on demand, where a file that is too small will "rewind" and
render the test results where a rewind occurs suspect.
Note well that file input rands are delivered to the tests on demand, but if the
test needs more than are available it simply rewinds the file and cycles
through it again, and again, and again as needed. Obviously this significantly
reduces the sample space and can lead to completely incorrect results for the
p-value histograms unless there are enough rands to run EACH test without
repetition (it is harmless to reuse the sequence for different tests). Let
the user beware!
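For example (illustrative only; the count is arbitrary and may still be too
small for some tests), one can generate a reasonably large raw binary file
from a built-in generator and then test from it:
dieharder -B -o -f testrands.bin -t 10000000
dieharder -g 201 -f testrands.bin -d diehard_birthdays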
A frequently asked question from new users wishing to test a generator they are
working on for fun or profit (or both) is "How should I get its output
into dieharder?" This is a nontrivial question, as dieharder consumes
very large numbers of random numbers in a full test cycle, and then there
are features like -m 10 or -m 100 that let one effortlessly demand 10 or 100
times as many to stress a new generator even more.
Even with large file support
in dieharder, it is difficult to provide
enough random numbers in a file to really make dieharder happy. It is
therefore strongly suggested that you either:
a) Edit the output stage of your random number generator and get it to write its
production to stdout as a random bit stream
-- basically create 32 bit
unsigned random integers and write them directly to stdout as e.g. char data
or raw binary. Note that this is not
the same as writing raw floating
point numbers (that will not be random at all as a bitstream) and that
"endianness" of the uints should not matter for the null hypothesis
of a "good" generator, as random bytes are random in any order.
Crank the generator and feed this stream to dieharder in a pipe as described
above.
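For instance, assuming a hypothetical generator binary ./mygen that writes
an endless stream of raw 32 bit unsigned integers to stdout:
./mygen | dieharder -g 200 -a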
b) Use the examples of GSL-wrapped dieharder rngs to similarly wrap your
generator (or calls to your generator's hardware interface). Follow the
examples in the ./dieharder source directory to add it as a "user"
generator in the command line interface, rebuild, and invoke the generator as
a "native" dieharder generator (it should appear in the list
produced by -g -1 when done correctly). The advantage of doing it this way is
that you can then (if your new generator is highly successful) contribute it
back to the dieharder project if you wish! Not to mention the fact that it
makes testing it very easy.
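Once it builds, a quick sanity check (assuming the hypothetical generator
name mygen) is to confirm it appears in the generator list and then run the
suite against it by name:
dieharder -g -1 | grep mygen
dieharder -g mygen -a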
Most users will probably go with option a) at least initially, but be aware that
b) is probably easier than you think. The dieharder maintainers may be able
to give you a hand with it if you get into trouble, but no promises.
A warning for those who are testing files of random numbers. dieharder is a tool
that tests random number generators, not files of random numbers! It is
extremely inappropriate to try to "certify" a file of random numbers
as being random just because it fails to "fail" any of the dieharder
tests in e.g. a dieharder -a run. To put it bluntly, if one rejects all such
files that fail any test at the 0.05 level (or any other), the one thing one
can be certain of is that the files in question are not
random, as a
truly random sequence would fail any given test at the 0.05 level 5% of the
time. To put it another way, any file of numbers produced by a generator
that "fails to fail" the dieharder suite should be considered
"random", even if it contains sequences that might well
"fail" any given test at some specific cutoff. One has to presume
that in passing the broader tests of the generator itself, it was determined
that the p-values for the test involved were globally uniformly distributed,
so that e.g. failure at the 0.01 level occurs neither more nor less than 1% of
the time, on average, over many many tests. If one particular file generates a
failure at this level, one can therefore safely presume that it is a
perfectly reasonable file pulled from many thousands of similar files the
generator
might create that have the correct distribution of p-values at all levels of
testing and aggregation.
To sum up, use dieharder to validate your generator (via input from files or an
embedded stream). Then by all means use your generator to produce files or
streams of random numbers. Do not use dieharder as an accept/reject tool to
validate the files themselves!
To demonstrate all tests, run on the default GSL rng, enter:
dieharder -a
To demonstrate a test of an external generator of a raw binary stream of bits,
use the stdin (raw) interface:
cat /dev/urandom | dieharder -g 200 -a
To use it with an ascii formatted file:
dieharder -g 202 -f testrands.txt -a
(testrands.txt should consist of a header such as:
# generator mt19937_1999 seed = 1274511046
followed by the random numbers, one per line.)
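The safest template is one generated by dieharder itself (dieharder -o -f
example.input -t 10); a plausible sketch of its layout, to be checked
against your build's actual output, is:
# generator mt19937_1999 seed = 1274511046
type: d
count: 10
numbit: 32
with exactly count integers, one per line, after the header.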
To use it with a binary file:
dieharder -g 201 -f testrands.bin -a
or
cat testrands.bin | dieharder -g 200 -a
An example that demonstrates the use of "prefixes" on the output lines
that make it relatively easy to filter off the different parts of the output
report and chop them up into numbers that can be used in other programs or
in spreadsheets:
dieharder -a -c ',' -D default -D prefix
As of version 3.x.x, dieharder has a single output interface that produces
tabular data per test, with common information in headers. The display control
options and flags can be used to customize the output to your individual
needs.
The options are controlled by binary flags. The flags, and their text versions,
are displayed if you enter:
dieharder -F
by itself on a line.
The flags can be entered all at once by adding up all the desired option flags.
For example, a very sparse output could be selected by adding the flags for
the test_name (8) and the associated pvalues (128) to get 136:
dieharder -a -D 136
Since the flags are cumulated from zero (unless no flag is entered and the
default is used) you could accomplish the same display via:
dieharder -a -D 8 -D pvalues
Note that you can enter flags by value or by name, in any combination. Because
people use dieharder to obtain values and then wish to export them into
spreadsheets (comma separated values) or into filter scripts, you can change
the field separator character. For example:
dieharder -a -c ',' -D default -D -1 -D -2
produces output that is ideal for importing into a spreadsheet (note that one
can subtract field values from the base set of fields provided by the default
option as long as it is given first).
An interesting option is the -D prefix flag, which turns on a field identifier
prefix to make it easy to filter out particular kinds of data. However, it is
equally easy to turn on any particular kind of output to the exclusion of
others directly by means of the flags.
Two other flags of interest to novices to random number generator testing are
the -D histogram (turns on a histogram of the underlying pvalues, per test)
and -D description (turns on a complete test description, per test). These
flags turn the output table into more of a series of "reports" on each test.
Dieharder is entirely original code and can be modified and used at will
by any user, provided that:
a) The original copyright notices are maintained and that the source, including
all modifications, is made publicly available at the time of any derived
publication. This is open source software according to the precepts and spirit
of the Gnu Public License. See the accompanying file COPYING, which also must
accompany any redistribution.
b) The primary author of the code (Robert G. Brown) is appropriately
acknowledged and referenced in any derived publication. It is strongly
suggested that George Marsaglia and the Diehard suite and the various authors
of the Statistical Test Suite be similarly acknowledged, although this suite
shares no actual code with these random number test suites.
c) Full responsibility for the accuracy, suitability, and effectiveness of the
program rests with the users and/or modifiers. As is clearly stated in the
license:
THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT
SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR
CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
OF THIS SOFTWARE.
The author of this suite gratefully acknowledges George Marsaglia (the author of
the diehard test suite) and the various authors of NIST Special Publication
800-22 (which describes the Statistical Test Suite for testing pseudorandom
number generators for cryptographic applications), for excellent descriptions
of the tests therein. These descriptions enabled this suite to be developed
independently and licensed with a GPL.
The author also wishes to reiterate that the academic correctness and accuracy
of the implementation of these tests is his sole responsibility and not that
of the authors of the Diehard or STS suites. This is especially true where he
has seen fit to modify those tests from their strict original descriptions.
GPL 2b; see the file COPYING that accompanies the source of this program. This
is the "standard Gnu General Public License version 2 or any later
version", with the one minor (humorous) "Beverage" modification
listed below. Note that this modification is probably not legally defensible
and can be followed really pretty much according to the honor rule.
As to my personal preferences in beverages, red wine is great, beer is
delightful, and Coca Cola or coffee or tea or even milk are acceptable to
those who for religious or personal reasons wish to avoid stressing my
liver.
The Beverage Modification to the GPL:
Any satisfied user of this software shall, upon meeting the primary author(s) of
this software for the first time under the appropriate circumstances, offer to
buy him or her or them a beverage. This beverage may or may not be alcoholic,
depending on the personal ethical and moral views of the offerer. The beverage
cost need not exceed one U.S. dollar (although it certainly may at the whim of
the offerer:-) and may be accepted or declined with no further obligation on
the part of the offerer. It is not necessary to repeat the offer after the
first meeting, but it can't hurt...