**Picking the most precise ammo, probably.**

Jeroen Hogema

May, 2006.

Minor update April 2019.


Suppose you have two batches of pellets, or .22 cartridges, or two sets of handloaded ammo, and you want to decide which one is the best for your competition shooting. You can compare them by testing, i.e. fire some shots and observe which one is best. But what do you have to measure from the resulting holes in your target, and how do you use these measurements to get the ‘best’ answer? Furthermore, how sure should you feel that you are indeed picking the best batch? After all, there is some randomness in the groups that you shoot. So when you repeat the test several times, the batch that is ‘really’ the best one will not always win. Using Monte Carlo simulations of shot groups as a basis, this article covers three issues.

- Of several known measures of shot group tightness, which one most often correctly identifies the best batch (no matter how small the difference)?
- What is the best way to spend a given number of shots: all in 1 group, or spread over several sub-groups?
- How can a statistical test be used to find out whether the difference between two batches is statistically significant?

Formally, there is a difference between *accuracy* and *precision* when quantifying shot groups. Johnson (2001) defined *accuracy* as the proximity of an array of shots to the centre of mass of a target, and marksmanship *precision* as the dispersion of an array of shots around their own centre of impact. In these terms, I will focus only on precision.

This chapter is mostly a replication of the work by John E. (Jack) Leslie III (1994). He focussed on “the really big question”: *Which statistics should be used to measure ammunition/firearm accuracy?*

Leslie investigated several measures by means of computer simulations. These were:

- radial standard deviation (RSD)
- mean radius (MR)
- diagonal
- figure of merit (FOM)
- shot group diameter (SGD), a.k.a. extreme spread.

For the definition of these measures, see Appendix A.

This is what I did to replicate Leslie’s work.

- Simulate 2 groups of shots, with a difference of 10, 20, 30 or 40% in the parameter that determines the dispersion, so that we know which one is ‘really’ the best one.
- Calculate all metrics for both shot groups.
- Use these metrics in trying to identify the best group. (Due to the randomness that is inherent to the shot groups, sometimes you might pick the wrong one, especially when the number of shots is low).
- Repeat many times.
- Finally, use the results to determine the % of times the right group was picked.
- Repeat this entire process for various conditions, e.g.:
    - for various numbers of shots in the group;
    - for various differences between the parameters that determine the dispersion.
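The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the figures: it assumes that shots follow a circular bivariate normal distribution whose standard deviation `sigma` is "the parameter that determines the dispersion", and the function names (`simulate_group`, `pct_correct`) are made up for this example. Only RSD and SGD are shown; the other metrics plug in the same way.

```python
import math
import random
from itertools import combinations
from statistics import variance


def simulate_group(n_shots, sigma, rng):
    """One shot group: (x, y) hole centres, assumed circular normal scatter."""
    return [(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(n_shots)]


def rsd(shots):
    """Radial standard deviation: sqrt(var(x) + var(y))."""
    return math.sqrt(variance([x for x, _ in shots]) +
                     variance([y for _, y in shots]))


def sgd(shots):
    """Shot group diameter: largest centre-to-centre distance."""
    return max(math.dist(p, q) for p, q in combinations(shots, 2))


def pct_correct(metric, n_shots, diff, trials=2000, seed=1):
    """Percentage of trials in which the metric picks the truly better batch.

    diff is the relative difference in sigma, e.g. 0.2 for 20%.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        good = simulate_group(n_shots, 1.0, rng)
        bad = simulate_group(n_shots, 1.0 + diff, rng)
        if metric(good) < metric(bad):  # smaller metric = tighter group
            hits += 1
    return 100.0 * hits / trials


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, pct_correct(rsd, n, 0.2), pct_correct(sgd, n, 0.2))
```

Looping over the number of shots and the percentage difference, as in the last two bullets, produces curves of the kind shown in the figures.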

The results are presented in the following figures.

Figure 1 Percentage correctly picking the best grouping ammo as a function of the number of shots, for various metrics and for various % difference between the two sets.

These figures show the following results.

- In line with Leslie’s findings, RSD is the superior metric, followed closely by the mean radius. Shot group diameter is the worst. Figure of merit and diagonal are almost identical and somewhat better than shot group diameter.
- If the 2 batches are different enough (>40%), about 10 shots is enough, even when using the shot group diameter as the metric, to be 90% sure of correctly picking the best batch.
- If the 2 batches are close together (10% difference), MANY shots are needed if you want to be sure enough of picking the correct winner. E.g., if you accept a 15% chance of picking the wrong batch (i.e., 85% probability of picking the right one), you need 60 shots when using RSD as the metric. The same number of shots will only give you 73% probability of picking the right one when using shot group diameter.

Now let’s turn to the question of how best to use a given number of shots when you are trying to select the better of two batches. Should you use all in one group, or distribute them over several groups? Suppose you are willing to fire 20 shots: should you keep them all in one group, or would you get better results from 2 groups of 10 shots? Or 4 groups of 5 shots?

This was first investigated for the RSD, in the following manner.

- Simulate groups from 2 sets of ammo, with 20% difference in the parameter that determines the dispersion.
- Then determine the RSD (Radial Standard Deviation) for both shot groups.
- The group with the highest RSD is assumed to be from the worst set.
- Repeat the steps above many times, and then determine the % of times the right group was picked for various conditions:
    - as a function of the number of shots being used in the group, and
    - with all shots in 1 group, but also with the shots distributed over 2 or 3 groups. Here, the mean of the resulting 2 or 3 RSDs was used to pick the best group.
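The sub-group variant can be sketched as follows. Again this is only an illustration under the same assumption as before (circular normal scatter with standard deviation `sigma`); the name `pct_correct_subgroups` is invented for this example. The total number of shots is split evenly over the sub-groups, and the mean of the per-group RSDs is used to pick the winner.

```python
import math
import random
from statistics import mean, variance


def simulate_group(n_shots, sigma, rng):
    """One shot group: (x, y) hole centres, assumed circular normal scatter."""
    return [(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(n_shots)]


def rsd(shots):
    """Radial standard deviation: sqrt(var(x) + var(y))."""
    return math.sqrt(variance([x for x, _ in shots]) +
                     variance([y for _, y in shots]))


def pct_correct_subgroups(metric, total_shots, n_groups, diff=0.2,
                          trials=2000, seed=1):
    """Percentage correct when total_shots are split over n_groups sub-groups.

    Each batch is scored by the mean of the metric over its sub-groups.
    Needs at least 2 shots per sub-group.
    """
    rng = random.Random(seed)
    per_group = total_shots // n_groups

    def score(sigma):
        return mean(metric(simulate_group(per_group, sigma, rng))
                    for _ in range(n_groups))

    hits = 0
    for _ in range(trials):
        if score(1.0) < score(1.0 + diff):  # truly better batch scored lower
            hits += 1
    return 100.0 * hits / trials


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, "group(s):", pct_correct_subgroups(rsd, 24, k))
```

The differences between 1, 2 and 3 groups are small, so a large number of trials is needed before the curves separate clearly from the simulation noise.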

Results are shown in Figure 2.

Figure 2 Percentage correctly picking the best grouping ammo as a function of the number of shots, based on Radial Standard Deviation: all shots in 1 group, or in 2 groups, or in three groups.

The results show that distributing the shots over more groups reduces the percentage of correct decisions. Thus, when using the RSD, it is better to use all shots in a single group.

When using the same approach with the Shot Group Diameter as the metric, the results shown in Figure 3 are found.

Figure 3 Percentage correctly picking the best grouping ammo as a function of the number of shots, based on Shot Group Diameter: all shots in 1 group, or in 2, 3 or 4 groups.

Here, the results are different. Putting all shots in 1 group is not always the best approach. The trick is that the diameter of a single group is determined by only two of its shots, so by using more groups you are using more data from the total number of shots in the estimation (which is also why diagonal and FOM are better than SGD).

From Figure 3, the following can be seen.

- For less than 10 shots, 1 group is the best.
- For 10 shots or more, 2 groups is better than 1 (i.e., 2 groups of 5 is the turning point).
- For 18 shots or more, 3 groups are better than 2 (i.e., 3 groups of 6 is the turning point).
- For 24 shots or more, 4 groups is better than 3 (i.e., 4 groups of 6 is the turning point).

Further pieces of the puzzle:

- 3 groups of 4 shots are better than 1 group of 12.
- 4 groups of 4 shots are better than 1 group of 16.
- 5 groups of 4 shots are better than 1 group of 20.
- 3 groups of 6 are better than 2 groups of 9.
- 4 groups of 6 are better than 3 groups of 8.
- 4 groups of 5 are better than 5 groups of 4 and better than 2 groups of 10.
- 5 groups of 6 are better than 3 groups of 10.

I’m still looking for the overall pattern, but it seems that using multiples of 4, 5 or 6 shots per group yields the best results.

In conclusion, when using the RSD, you are better off when you use all shots in a single group. When using the shot group diameter, the best results are obtained by using an average over several sub-groups. The optimal number of sub-groups depends on the total number of shots you are willing to fire for your test.

Until now, the approach has been to fire 2 sets of shots, obtain the metric of your choice, pick the best one, and proclaim that one the winner, no matter how small the difference. A more refined approach is to try to quantify how sure you are that the two batches are ‘really’ different. If the difference between the two is ‘small’, it might well be that the other one wins when you repeat the test. A formal way to include this in your ammo testing is to carry out a statistical test on the difference. Such a test yields a so-called p-value. Only when p is below a chosen threshold alpha (traditionally 0.05) is the difference said to be ‘statistically significant’. Loosely speaking, this means that a difference this large would rarely arise from randomness alone, so when you repeat the test you can be reasonably confident of finding the same batch to be the best one.

The mean radius is a good candidate to use as a starting point. First, it’s a good metric according to Chapter 2. Second, you get a radius for each individual shot in the group, and that is needed as input for a statistical test. When looking for a statement about the means of the radii being equal or not, a t-test is a suitable candidate to start with.

This brings us to a test situation with three possible results: batch nr 1 is the best one, batch nr 2 is the best one, or ‘Not Significant’, i.e., the difference in means is too small to be sure which one is best.
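As an illustration of this three-way outcome, here is a minimal sketch using only the Python standard library. It computes the Welch form of the t statistic from the two lists of radii. Note the simplification: the standard library has no Student t distribution, so the p-value is taken from a normal approximation, which makes p slightly too small for small groups. The function name `compare_batches` and the returned labels are invented for this example.

```python
import math
from statistics import NormalDist, mean, variance


def compare_batches(radii_1, radii_2, alpha=0.05):
    """Two-sided test on the difference in mean radius.

    Returns (verdict, p) where verdict is 'batch 1', 'batch 2' or
    'not significant'. ASSUMPTION: p comes from a normal approximation
    to the t distribution, which is reasonable only for largish groups.
    """
    n1, n2 = len(radii_1), len(radii_2)
    # Welch standard error: no assumption of equal variances.
    se = math.sqrt(variance(radii_1) / n1 + variance(radii_2) / n2)
    t = (mean(radii_1) - mean(radii_2)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    if p >= alpha:
        return "not significant", p
    # A smaller mean radius means a tighter group.
    return ("batch 1" if t < 0 else "batch 2"), p
```

If SciPy is available, `scipy.stats.ttest_ind(radii_1, radii_2, equal_var=False)` gives the p-value from the proper t distribution.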

The immediate next question is about the *statistical power* of the test: what is the probability that the test will identify a winner? After all, a test that ends up with *don’t really know* most of the time is not too helpful. The statistical power depends on three factors:

- the selected level of alpha,
- the effect size, i.e., the magnitude of the difference that you want to be able to detect as being significant, and
- the number of shots in each group.

I ran further simulations to find out the relationship between the statistical power on the one hand, and these three factors on the other. Results are shown in Figure 4 and Figure 5.

Figure 4 Statistical power as a function of the effect size and level of significance (t-test based on 30 shots).

Figure 5 Statistical power as a function of the effect size and the number of shots in each group (t-test, alpha = 0.05).

These figures illustrate that power can be improved by (1) increasing your alpha, (2) increasing your number of shots per group, or (3) considering only larger effect sizes.
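A power simulation of this kind can be sketched as follows. It is an illustration under stated assumptions rather than the code behind the figures: shots are assumed to follow a circular normal distribution, the effect size is taken as the relative difference in its standard deviation, and the p-value uses a normal approximation to the Welch t statistic (the standard library has no t distribution). The names `radii`, `p_value` and `power` are made up for this example.

```python
import math
import random
from statistics import NormalDist, mean, variance


def radii(n_shots, sigma, rng):
    """Radii of one simulated group, measured from the group's own centre."""
    shots = [(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(n_shots)]
    cx = mean(x for x, _ in shots)
    cy = mean(y for _, y in shots)
    return [math.hypot(x - cx, y - cy) for x, y in shots]


def p_value(r1, r2):
    """Two-sided p for the difference in mean radius (normal approximation)."""
    se = math.sqrt(variance(r1) / len(r1) + variance(r2) / len(r2))
    z = (mean(r1) - mean(r2)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def power(effect, n_shots, alpha=0.05, trials=2000, seed=1):
    """Fraction of simulated tests that come out significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        r1 = radii(n_shots, 1.0, rng)
        r2 = radii(n_shots, 1.0 + effect, rng)
        if p_value(r1, r2) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (10, 30, 60):
        print(n, power(0.4, n), power(0.2, n))
```

With an effect size of 0, the function should return a value close to alpha, which is a quick sanity check on the simulation.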

Some statistical considerations:

- OK, the radii follow a Rayleigh distribution rather than a normal distribution. I am assuming that the number of shots is large enough to make a normal distribution a close enough approximation.
- OK, since we are dealing with a Rayleigh distribution, the mean and standard deviation of the radii are proportional to each other. This means that when the means are different, so are the standard deviations, so the t-test assumption of equal variances is violated. Welch's t-test is more appropriate. I ran that as well, and the results are almost identical to the ordinary t-test results.

Going back to the situation where you are trying to pick the best from 2 sets of ammo (no matter how small the difference), the Chapter 2 results show that e.g.

- when using 30 shots per group,
- and when using the mean radius (MR) to select the best one,
- and when the 'real' difference between the 2 sets is 40%,
- then you have about 97% probability of picking the right one.

When introducing the significance testing to this situation, using an alpha level of 0.05, you have about 65% probability of obtaining a significant effect. This 65% is not very impressive, I would say. And this is still for a fairly large actual difference between the two batches!

If you are interested in more subtle effects, let’s say 20% effect size, then Figure 6 shows the relationship between the statistical power on the one hand, and the number of shots required and the alpha level on the other.

Figure 6 Statistical power as a function of the number of shots in each group and the level of significance (t-test; effect size 20%).

With the classical value of alpha of 0.05, you need two groups of 130 shots to reach 80% statistical power. Obviously this is not very realistic. A lower number of shots can suffice if you are willing to use a higher alpha. But even when using an alpha of 0.2, you need to fire 75 shots per group if you want to reach 80% statistical power. That’s still 150 shots in total that need to be measured and typed into your spreadsheet.
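These shot counts can be cross-checked with the textbook normal-approximation formula for the sample size of a two-sample comparison of means. The sketch below is not how the figures were produced; it assumes Rayleigh-distributed radii (mean σ√(π/2), variance σ²(2 − π/2)), takes the effect size as the relative difference in σ, and uses a two-sided test. The function name `shots_needed` is invented for this example.

```python
import math
from statistics import NormalDist


def shots_needed(effect, alpha, power=0.8):
    """Approximate shots per group for a two-sided test on mean radius.

    ASSUMPTIONS: Rayleigh radii with sigma = 1 for the better batch and
    sigma = 1 + effect for the worse one; normal approximation throughout.
    """
    mean_r = math.sqrt(math.pi / 2)   # Rayleigh mean for sigma = 1
    var_r = 2 - math.pi / 2           # Rayleigh variance for sigma = 1
    delta = effect * mean_r           # difference in mean radius
    var_sum = var_r * (1 + (1 + effect) ** 2)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z ** 2 * var_sum / delta ** 2


if __name__ == "__main__":
    print(shots_needed(0.2, 0.05))  # classical alpha
    print(shots_needed(0.2, 0.20))  # more lenient alpha
```

For a 20% effect size and 80% power this gives roughly 131 shots per group at alpha = 0.05 and roughly 75 at alpha = 0.2, in line with the simulation results.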

This shows the practical disadvantage of introducing statistical tests in trying to pick the most precise ammo. If you want to have a decent chance of finding significant differences, even when the differences are fairly small, then you need to fire and measure more test shots than you would like. Perhaps the Chapter 2 approach of picking the batch with the smallest MR or RSD, and forgetting about statistical significance, is more practical after all.

Grubbs, F.E. (1991). *Statistical Measures of Accuracy for Riflemen and Missile Engineers*. Available from the author at 4109 Webster Road, Havre De Grace, Md. 21078.

Johnson, R.F. (2001). *Statistical measures of marksmanship* (Report TN-01/02). Natick, MA: U.S. Army Research Institute of Environmental Medicine.

Leslie, J.E. III (1994). Is "Group Size" the Best Measure of Accuracy? *The Canadian Marksman*, 129, no. 1 (Autumn 1994), pp. 46-48. (April 2006: http://www.shootersjournal.com/Features/WHICHONE.pdf ).

**Appendix A** Calculating the metrics

**Shot Group Diameter** (SGD), or Extreme Spread: distance between the (centres of) the two holes that are furthest apart.

**Figure Of Merit**: The figure of merit (FOM) is the average of the maximum horizontal group spread and the maximum vertical group spread.

**Diagonal**: calculated by taking the square root of the sum of the maximum horizontal spread squared and the maximum vertical spread squared.

For the remaining metrics, you need to measure the location of the centre of each shot, yielding an x and y coordinate for each shot: x_{i}, y_{i} (for i = 1 to n, with n the number of shots involved).

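**Mean radius (MR)**: first calculate the radius r_{i} of each shot with respect to the group centre (x_{COG}, y_{COG}). MR is the average over all shots.

$$x_{\mathrm{COG}} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad y_{\mathrm{COG}} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

$$r_i = \sqrt{(x_i - x_{\mathrm{COG}})^2 + (y_i - y_{\mathrm{COG}})^2}$$

$$\mathrm{MR} = \frac{1}{n}\sum_{i=1}^{n} r_i$$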

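**Radial Standard Deviation (RSD)**: calculated as the square root of the sum of the variance of all x_{i} plus the variance of all y_{i}.

$$\mathrm{RSD} = \sqrt{\sigma_x^2 + \sigma_y^2}$$

where, using the general equation for (sample) standard deviations:

$$\sigma_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - x_{\mathrm{COG}})^2}{n-1}}$$

$$\sigma_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - y_{\mathrm{COG}})^2}{n-1}}$$

Note that the RSD can be re-written as follows:

$$\mathrm{RSD} = \sqrt{\frac{\sum_{i=1}^{n} \left[ (x_i - x_{\mathrm{COG}})^2 + (y_i - y_{\mathrm{COG}})^2 \right]}{n-1}}$$

and since $r_i^2 = (x_i - x_{\mathrm{COG}})^2 + (y_i - y_{\mathrm{COG}})^2$,

$$\mathrm{RSD} = \sqrt{\frac{\sum_{i=1}^{n} r_i^2}{n-1}}$$

This shows that the RSD is (indeed) the standard deviation of the radii of the shots with respect to the shot group centre: the deviations are measured from the centre itself, not from the mean radius.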
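For readers who keep their measurements in a script rather than a spreadsheet, the five metrics can be computed as in the sketch below. It assumes the shots are given as a list of (x, y) hole centres and that the variances use the sample form (n − 1 in the denominator); the function name `group_metrics` is invented for this example.

```python
import math
from itertools import combinations
from statistics import mean, variance


def group_metrics(shots):
    """Return SGD, FOM, diagonal, MR and RSD for a list of (x, y) centres."""
    xs = [x for x, _ in shots]
    ys = [y for _, y in shots]
    width = max(xs) - min(xs)    # maximum horizontal spread
    height = max(ys) - min(ys)   # maximum vertical spread
    x_cog, y_cog = mean(xs), mean(ys)
    radii = [math.hypot(x - x_cog, y - y_cog) for x, y in shots]
    return {
        # largest centre-to-centre distance over all pairs of shots
        "SGD": max(math.dist(p, q) for p, q in combinations(shots, 2)),
        "FOM": (width + height) / 2,
        "diagonal": math.hypot(width, height),
        "MR": mean(radii),
        # sample variances (n - 1), an assumption of this sketch
        "RSD": math.sqrt(variance(xs) + variance(ys)),
    }


if __name__ == "__main__":
    # Four shots on the corners of a 3 x 4 rectangle.
    print(group_metrics([(0, 0), (3, 0), (0, 4), (3, 4)]))
```

For the four-shot example, every shot lies 2.5 from the group centre, so MR is 2.5, while SGD and diagonal are both 5 and FOM is 3.5.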

For further details and background, see Leslie (1994) or e.g. Johnson (2001) or Grubbs (1991).