Testing RNG Uniformity: Visual Checks, Statistical Tests, and Sample Sizes
Uniformity is the cornerstone of quality random number generation. A uniform random number generator produces each possible outcome with equal probability—whether generating integers, floating-point numbers, or selecting from discrete options. Testing for uniformity ensures your random number generator behaves correctly, which is critical for fair simulations, unbiased sampling, and reliable statistical analysis.
When a random number generator fails the uniformity test, bias creeps into your results. This bias can be subtle—perhaps certain numbers appear 0.1% more often than others—but over millions of trials, even small biases compound into significant errors. Whether you're building a game, running scientific simulations, or generating random samples, verifying uniformity protects against these insidious failures.
Generate random numbers and test their properties using our Random Number Generator tool, then apply these testing methods to verify uniformity.
Understanding Uniformity
Uniformity means that every possible outcome has an equal probability of occurring. For a uniform distribution over integers 1 through 10, each number should appear about 10% of the time in a large sample. For a continuous uniform distribution over [0, 1), intervals of equal width should have equal probability.
Discrete Uniformity
For discrete outcomes (like rolling a die or selecting from a list):
Expected frequency: If generating N numbers from k possible outcomes, each outcome should appear approximately N/k times.
Example: Generating 10,000 random integers from 1 to 10:
- Expected frequency: 10,000 / 10 = 1,000 occurrences per number
- Observed frequencies should cluster around 1,000
Continuous Uniformity
For continuous distributions (like floating-point numbers in [0, 1)):
Equal probability intervals: Any interval [a, b] within [0, 1) should contain proportion (b - a) of all generated values.
Example: For uniformly distributed values in [0, 1):
- Interval [0, 0.5) should contain ~50% of values
- Interval [0.5, 1.0) should contain ~50% of values
- Interval [0, 0.1) should contain ~10% of values
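These interval checks can be sketched in a few lines of Python, using the stdlib random module as a stand-in for whatever generator you are testing:

```python
import random

random.seed(42)
values = [random.random() for _ in range(100_000)]

def interval_proportion(values, lo, hi):
    """Fraction of values falling in [lo, hi)."""
    return sum(lo <= v < hi for v in values) / len(values)

# The proportion in each interval should be close to its width.
for lo, hi in [(0.0, 0.5), (0.5, 1.0), (0.0, 0.1)]:
    print(f"[{lo}, {hi}) -> {interval_proportion(values, lo, hi):.3f}"
          f" (expected {hi - lo:.2f})")
```

With 100,000 values the sampling error of each proportion is well under 1%, so deviations much larger than that are a warning sign.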
Visual Inspection Methods
Before diving into statistical tests, visual inspection provides quick insights into RNG behavior and can reveal obvious problems.
Histograms
Histograms display the frequency distribution of generated values, making it easy to spot imbalances.
Procedure:
- Generate a large sample (10,000+ values)
- Divide the range into bins (10-50 bins typically)
- Count occurrences in each bin
- Plot histogram with expected frequency line
Interpreting Histograms:
- Uniform distribution: Bars should have similar heights with random variation
- Bias indicators: Systematic patterns, repeated spikes, or gaps
- Smoothness: Uniform distributions should appear smooth, not jagged
Example: Testing a Die Simulator. Generate 60,000 rolls and create a histogram with 6 bins. Each bin should contain approximately 10,000 occurrences. If one bin consistently shows 11,000+ occurrences while another shows fewer than 9,000, you've detected bias.
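A minimal text-based version of this die check, again using Python's stdlib random module as the generator under test:

```python
import random
from collections import Counter

random.seed(1)
rolls = [random.randint(1, 6) for _ in range(60_000)]
counts = Counter(rolls)

# Each face should land near 60,000 / 6 = 10,000 times.
for face in range(1, 7):
    bar = "#" * (counts[face] // 250)  # crude text histogram
    print(f"{face}: {counts[face]:5d} {bar}")
```

For a fair simulator, the binomial standard deviation per face is about 91 here, so counts more than a few hundred away from 10,000 merit investigation.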
Quantile-Quantile (Q-Q) Plots
Q-Q plots compare the distribution of your sample against the expected uniform distribution.
Procedure:
- Generate sample values
- Sort sample values
- Compare sorted sample quantiles against theoretical uniform quantiles
- Plot points—should form a straight diagonal line
Interpreting Q-Q Plots:
- Straight diagonal line: Indicates uniform distribution
- Curved line: Suggests skewness or bias
- S-shaped curve: Indicates systematic deviation from uniformity
Practical Tip: Use statistical software (R's qqplot, or Python with scipy/statsmodels plus matplotlib) to generate Q-Q plots from your samples.
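Without a plotting library, the Q-Q comparison reduces to sorting the sample and pairing it with the theoretical uniform quantiles (i - 0.5)/n; a sketch:

```python
import random

random.seed(7)
n = 10_000
sample = sorted(random.random() for _ in range(n))

# Theoretical uniform quantile for the i-th order statistic.
theoretical = [(i - 0.5) / n for i in range(1, n + 1)]

# On a Q-Q plot these pairs should lie near the diagonal y = x;
# here we just measure the worst vertical distance from it.
max_dev = max(abs(s - t) for s, t in zip(sample, theoretical))
print(f"max deviation from diagonal: {max_dev:.4f}")
```

Plotting `sample` against `theoretical` gives the Q-Q plot itself; the scalar `max_dev` is a quick numeric proxy for how far the points stray from the diagonal.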
Runs Plots
Runs plots display consecutive values, revealing patterns that histograms might miss.
Procedure:
- Generate sequence of values
- Plot values in order (or pairs of consecutive values)
- Look for clustering, cycles, or patterns
What to Look For:
- Random scatter: Good uniformity
- Clustering: Values grouping together suggests correlation
- Cycles: Repeating patterns indicate periodicity or poor seeding
- Gaps: Missing regions suggest range problems
Example: Testing 2D Uniformity. Plot pairs of consecutive values (x_i, x_{i+1}) as points in 2D space, where i is the index position. A uniform generator should scatter points evenly across the unit square; clusters or bands indicate correlation between consecutive values.
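A plot-free version of this check computes the lag-1 serial correlation and the 2D cell counts directly; a sketch using Python's stdlib (the 4x4 grid size is an arbitrary choice for illustration):

```python
import random
from math import sqrt

random.seed(3)
xs = [random.random() for _ in range(50_000)]
a, b = xs[:-1], xs[1:]  # consecutive pairs (x_i, x_{i+1})

# Lag-1 serial correlation; a good uniform generator gives ~0.
n = len(a)
ma, mb = sum(a) / n, sum(b) / n
cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
var_a = sum((x - ma) ** 2 for x in a) / n
var_b = sum((y - mb) ** 2 for y in b) / n
r = cov / sqrt(var_a * var_b)
print(f"lag-1 correlation: {r:+.4f}")

# 2D coverage check: split the unit square into a 4x4 grid and count
# pairs per cell; each cell should hold ~1/16 of all pairs.
grid = [[0] * 4 for _ in range(4)]
for x, y in zip(a, b):
    grid[min(int(x * 4), 3)][min(int(y * 4), 3)] += 1
```

Wildly uneven cell counts, or a correlation far from zero, are the numeric equivalents of the clusters and bands you would spot on the plot.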
Statistical Tests for Uniformity
Visual inspection reveals obvious problems, but statistical tests provide quantitative measures of uniformity with defined significance levels.
Chi-Square Goodness-of-Fit Test
The chi-square test is the most common method for testing discrete uniformity.
Procedure:
- Generate N values from k possible outcomes
- Count observed frequency O_i for each outcome i
- Calculate expected frequency E_i = N / k
- Compute chi-square statistic: χ² = Σ((O_i - E_i)² / E_i)
- Compare to chi-square distribution with (k-1) degrees of freedom
Example Calculation: Testing 10,000 integers from 1 to 10:
- Expected per number: 1,000
- Observed: [998, 1002, 995, 1008, 1001, 997, 1003, 999, 1004, 993]
- χ² = (998-1000)²/1000 + (1002-1000)²/1000 + ... = 0.182
- Critical value (α=0.05, df=9): 16.92
- Since 0.182 < 16.92, we fail to reject uniformity (good result)
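The full calculation is easy to script; this sketch hard-codes the df = 9, α = 0.05 critical value (16.92) from a standard chi-square table rather than computing a p-value:

```python
import random
from collections import Counter

def chi_square_uniform(values, k):
    """Chi-square statistic against a uniform distribution over 1..k."""
    n = len(values)
    expected = n / k
    counts = Counter(values)
    return sum((counts[i] - expected) ** 2 / expected for i in range(1, k + 1))

random.seed(0)
sample = [random.randint(1, 10) for _ in range(10_000)]
chi2 = chi_square_uniform(sample, 10)

CRITICAL_9DF = 16.92  # alpha = 0.05, df = k - 1 = 9
print(f"chi-square = {chi2:.3f}, pass = {chi2 < CRITICAL_9DF}")
```

In practice you would use a library routine (e.g. scipy.stats.chisquare) to get an exact p-value instead of a tabulated critical value.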
Requirements:
- Expected frequency ≥ 5 per bin (preferably ≥ 10)
- Independent observations
- Large sample size (typically 1,000+)
Limitations:
- Less sensitive to local deviations than global tests
- Requires sufficient sample size per bin
- Can miss subtle patterns
Kolmogorov-Smirnov (KS) Test
The KS test compares the empirical cumulative distribution function (CDF) to the theoretical uniform CDF.
Procedure:
- Generate N values and sort them
- Compute empirical CDF F_n(x)
- Compare to theoretical uniform CDF F(x) = x
- Calculate maximum difference: D = max|F_n(x) - F(x)|
- Compare to KS distribution critical values
Advantages:
- Works for continuous distributions
- No binning required
- Sensitive to deviations across the entire range
Limitations:
- Less powerful for detecting tail deviations
- Requires knowledge of the theoretical distribution
Example: Testing [0, 1) Uniformity
Generate 10,000 values, sort them, and compute the maximum difference between empirical and theoretical CDFs. Large differences indicate non-uniformity.
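A hand-rolled version of the KS statistic; this sketch uses the asymptotic α = 0.05 critical value ≈ 1.358/√n, an approximation that is reasonable for large n:

```python
import math
import random

def ks_statistic(values):
    """One-sample KS statistic against the uniform CDF F(x) = x on [0, 1)."""
    xs = sorted(values)
    n = len(xs)
    # At each sorted value, check the gap to the empirical CDF just
    # before and just after the step at that point.
    return max(max(x - i / n, (i + 1) / n - x) for i, x in enumerate(xs))

random.seed(11)
n = 10_000
d = ks_statistic([random.random() for _ in range(n)])

critical = 1.358 / math.sqrt(n)  # asymptotic alpha = 0.05 cutoff
print(f"D = {d:.4f}, critical = {critical:.4f}, pass = {d < critical}")
```

For exact p-values, scipy.stats.kstest(sample, "uniform") does the same comparison without the asymptotic approximation.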
Anderson-Darling Test
The Anderson-Darling test is similar to KS but gives more weight to tail deviations.
Advantages:
- More sensitive to tail behavior than KS
- Better for detecting deviations at distribution extremes
- Commonly used in statistical quality control
When to Use:
- When tail behavior is critical
- When you suspect problems at distribution boundaries
- When you need higher sensitivity than KS test
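scipy.stats.anderson covers only a few named distributions (normal, exponential, and others), not the uniform, but the A² statistic for a fully specified distribution is short enough to write by hand. This sketch uses the asymptotic α = 0.05 critical value 2.492 for the fully specified case:

```python
import math
import random

def anderson_darling_uniform(values):
    """A-squared statistic against the uniform distribution on (0, 1).

    Assumes all values lie strictly inside (0, 1) so the logs are defined.
    """
    u = sorted(values)
    n = len(u)
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n

random.seed(5)
a2 = anderson_darling_uniform([random.random() for _ in range(10_000)])

# Values above ~2.492 reject uniformity at alpha = 0.05.
print(f"A^2 = {a2:.3f}, pass = {a2 < 2.492}")
```

The (2i-1) weighting is what concentrates sensitivity near 0 and 1, which is why this test catches boundary problems that KS misses.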
Practical Testing Procedure
A systematic approach ensures thorough testing:
Step 1: Generate Large Sample
Sample Size Guidelines:
- Coarse checks: 1,000-10,000 values
- Thorough testing: 100,000-1,000,000 values
- Research-grade: 1,000,000+ values
Rule of Thumb: For chi-square with k bins, ensure expected frequency ≥ 5 per bin. If testing 10 bins, generate at least 50 values, but 1,000+ provides better power.
Step 2: Visual Inspection
Start with visual methods:
- Create histogram with appropriate binning
- Generate Q-Q plot
- Create runs plot for sequence analysis
Action: If visual inspection reveals obvious problems, fix the RNG before proceeding to statistical tests.
Step 3: Statistical Testing
Apply multiple tests:
- Chi-square for discrete uniformity
- KS or Anderson-Darling for continuous uniformity
- Run tests on multiple samples with different seeds
Best Practice: Use multiple seeds to ensure test results aren't seed-specific. A good RNG should pass tests across various seeds.
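The multi-seed loop can be sketched by rerunning the chi-square check with different seeds, again using a fixed critical value (df = 9, α = 0.05):

```python
import random
from collections import Counter

def chi2_passes(seed, n=10_000, k=10, critical=16.92):
    """One chi-square uniformity check with a given seed (df = k - 1 = 9)."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, k) for _ in range(n))
    expected = n / k
    chi2 = sum((counts[i] - expected) ** 2 / expected
               for i in range(1, k + 1))
    return chi2 < critical

results = [chi2_passes(seed) for seed in range(10)]
print(f"{sum(results)}/10 seeds passed")
# At alpha = 0.05, expect roughly 1 in 20 runs to fail even for a good RNG.
```

A single failing seed is not alarming; consistent failures across seeds are.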
Step 4: Interpret Results
P-values:
- p < 0.05: Reject uniformity (evidence of bias)
- p ≥ 0.05: Fail to reject uniformity (consistent with uniform distribution)
Important: Failing to reject doesn't prove uniformity—it means no evidence of non-uniformity was found. Multiple tests with consistent results increase confidence.
Step 5: Document and Report
Record:
- Sample size
- Number of bins (if applicable)
- Test statistics and p-values
- Seeds used
- Any visual observations
Common Pitfalls
Pitfall 1: Insufficient Sample Size Small samples lack statistical power to detect subtle biases. Use thousands to hundreds of thousands of values for meaningful tests.
Pitfall 2: Poor Binning Choices Uneven bin sizes or too few bins distort chi-square results. Use equal-sized bins and ensure adequate expected frequencies.
Pitfall 3: Testing Single Seed One seed might produce a "lucky" sequence that passes tests. Test multiple seeds to ensure consistency.
Pitfall 4: Over-Interpreting Visual Results Visual patterns can be misleading. Small random variations look like patterns. Always supplement visual inspection with statistical tests.
Pitfall 5: Ignoring Multiple Testing Running many tests increases the chance of false positives. Use Bonferroni correction or focus on a few key tests.
Worked Example: Testing a Custom RNG
Scenario: Testing a custom RNG that generates integers from 1 to 20.
Step 1: Generate 100,000 values.
Step 2: Visual inspection - histogram shows roughly equal bars (good sign).
Step 3: Chi-square test:
- Expected per number: 5,000
- Observed frequencies range from roughly 4,860 to 5,140 (deviations consistent with the binomial standard deviation of ≈ 69 per bin)
- χ² = 18.4
- Critical value (α=0.05, df=19): 30.14
- p-value ≈ 0.50 (fail to reject uniformity)
Step 4: Repeat with 10 different seeds - all pass chi-square test.
Conclusion: RNG appears uniform for this use case.
Conclusion
Testing RNG uniformity is essential for ensuring reliable random number generation. Visual methods provide quick insights, while statistical tests offer quantitative assessments. A combination of both approaches, applied to large samples across multiple seeds, provides confidence in RNG quality.
Remember that no test proves perfection—they only detect deviations from uniformity. Multiple tests with consistent results increase confidence, but good RNGs should pass tests across various conditions and seeds.
For practical random number generation, use our Random Number Generator, which employs high-quality algorithms designed to pass standard uniformity tests. Then apply these testing methods to verify the results meet your specific requirements.
For more on RNG quality, explore our articles on seeding and repeatability, common RNG mistakes, and true vs pseudo-randomness.
FAQs
How many samples do I need for reliable testing?
For chi-square tests, aim for at least 5 expected observations per bin (preferably 10+). For 10 bins, that means 50-100+ samples minimum, but 1,000+ provides better statistical power. For thorough testing, use 100,000+ samples.
What if my RNG fails uniformity tests?
First, verify your test implementation is correct. If tests are valid, the RNG likely has bias. Consider using a different algorithm (modern PRNGs like PCG or xoshiro), check seeding practices, or investigate range conversion issues (modulo bias).
Can visual inspection replace statistical tests?
No. Visual inspection can reveal obvious problems but lacks the rigor of statistical tests. Use visual methods for initial screening, then follow up with statistical tests for quantitative assessment.
How often should I test my RNG?
Test during development and when changing RNG implementations. For production systems, periodic testing (monthly or quarterly) helps catch degradation or implementation issues. Test after system updates that might affect RNG behavior.
Is perfect uniformity achievable?
Theoretical perfect uniformity requires infinite samples. In practice, good RNGs produce distributions that are statistically indistinguishable from uniform given appropriate sample sizes. Focus on RNGs that pass standard statistical tests rather than seeking perfect uniformity.
