Picking the most precise ammo, probably.


Jeroen Hogema

May, 2006.

Minor update April 2019.






1       Introduction

Suppose you have two batches of pellets, or .22 cartridges, or two sets of handloaded ammo, and you want to decide which one is best for your competition shooting. You can compare them by testing, i.e. firing some shots and observing which one is best. But what do you have to measure from the resulting holes in your target, and how do you use these measurements to get the ‘best’ answer? Furthermore, how sure should you be that you are indeed picking the best batch? After all, there is some randomness in the groups that you shoot, so when you repeat the test several times, it is not always the batch that is ‘really’ the best one that wins. Using Monte Carlo simulations of shot groups as a basis, this article covers three issues.


Formally, there is a difference between accuracy and precision when quantifying shot groups. Johnson (2001) defined accuracy as the proximity of an array of shots to the centre of mass of a target, and marksmanship precision as the dispersion of an array of shots around their own centre of impact. In these terms, this article focuses on precision only.



2       Measures of precision

This chapter is mostly a replication of the work by Jack / John E. Leslie III (1994). He focussed on “the really big question”: which statistic should be used to measure ammunition/firearm accuracy?


Leslie investigated several measures by means of computer simulations. These were:

- Shot Group Diameter (SGD), or Extreme Spread
- Figure of Merit (FOM)
- Diagonal
- Mean Radius (MR)
- Radial Standard Deviation (RSD)

For the definition of these measures, see Appendix A.




This is what I did to replicate Leslie’s work.
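As an illustration of what such a replication can look like, here is a minimal Monte Carlo sketch in Python. The 20% dispersion difference, the group size and the use of extreme spread as the metric are assumptions for illustration, not Leslie's exact setup:

```python
# Monte Carlo sketch: two batches, one truly 20% worse, compared by a metric.
import math
import random

def shoot_group(n_shots, sigma, rng):
    """Simulate one group: each shot has independent normal x and y errors."""
    return [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n_shots)]

def extreme_spread(group):
    """Shot Group Diameter: largest centre-to-centre distance in the group."""
    return max(math.dist(a, b) for a in group for b in group)

def fraction_correct(n_shots, sigma_good, sigma_bad, n_trials=2000, seed=1):
    """Fraction of trials in which the truly better batch shows the smaller spread."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        good = extreme_spread(shoot_group(n_shots, sigma_good, rng))
        bad = extreme_spread(shoot_group(n_shots, sigma_bad, rng))
        if good < bad:
            correct += 1
    return correct / n_trials

# Example: batch 2 disperses 20% more than batch 1, 10-shot groups.
print(fraction_correct(n_shots=10, sigma_good=1.0, sigma_bad=1.2))
```

Repeating this for each metric and for several dispersion differences gives curves of the kind shown in Figure 1.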


The results are presented in the following figures.





Figure 1 Percentage correctly picking the best grouping ammo as a function of the number of shots, for various metrics and for various % difference between the two sets.


These figures show the following results.




3       How many groups of how many shots?

Now let’s turn to the question of how best to use a given number of shots when you are trying to select the better of two batches. Should you fire them all in one group, or distribute them over several groups? Suppose you are willing to fire 20 shots: should you keep them all in one group, or would you get better results from 2 groups of 10 shots? Or 4 groups of 5 shots?


This was first investigated for the RSD, in the following manner.
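The exact procedure is not reproduced here, but the idea can be sketched as follows: spend the same total number of shots either as one large group or as several sub-groups, average the RSD over the sub-groups, and count how often the truly better batch wins. The 20% dispersion difference and the trial count are assumed values:

```python
# Sketch: same shot budget spent as 1 big group or as several sub-groups (RSD).
import math
import random

def rsd(group):
    """Radial Standard Deviation: sqrt(var(x) + var(y)), with n-1 denominators."""
    n = len(group)
    mx = sum(x for x, _ in group) / n
    my = sum(y for _, y in group) / n
    var_x = sum((x - mx) ** 2 for x, _ in group) / (n - 1)
    var_y = sum((y - my) ** 2 for _, y in group) / (n - 1)
    return math.sqrt(var_x + var_y)

def mean_rsd(total_shots, n_groups, sigma, rng):
    """Average RSD over n_groups sub-groups that together use total_shots shots."""
    per_group = total_shots // n_groups
    groups = [[(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(per_group)]
              for _ in range(n_groups)]
    return sum(rsd(g) for g in groups) / n_groups

def pct_correct(total_shots, n_groups, n_trials=2000, seed=1):
    """% of trials where the 20%-better batch wins on the averaged RSD."""
    rng = random.Random(seed)
    wins = sum(
        mean_rsd(total_shots, n_groups, 1.0, rng) < mean_rsd(total_shots, n_groups, 1.2, rng)
        for _ in range(n_trials))
    return 100.0 * wins / n_trials

for k in (1, 2, 4):  # 20 shots as 1x20, 2x10 or 4x5
    print(k, pct_correct(20, k))
```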





Results are shown in Figure 2.


Figure 2 Percentage correctly picking the best grouping ammo as a function of the number of shots, based on Radial Standard Deviation: all shots in 1 group, in 2 groups, or in 3 groups.


The results show that distributing the shots over more groups reduces the percentage of correct decisions. Thus, when using the RSD, it is better to use all shots in a single group.


When using the same approach with the Shot Group Diameter as the metric, the results shown in Figure 3 are found.




Figure 3 Percentage correctly picking the best grouping ammo as a function of the number of shots, based on Shot Group Diameter: all shots in 1 group, or in 2, 3 or 4 groups.


Here, the results are different. Putting all shots in 1 group is not always the best approach. The trick is obviously that by averaging over more groups, more of the information in the total number of shots enters the estimate (which is also why the diagonal and FOM perform better than the SGD).


From Figure 3, the following can be seen.


Further pieces of the puzzle:


I’m still looking for the overall pattern, but it seems that using multiples of 4, 5 or 6 shots per group yields the best results.


In conclusion, when using the RSD, you are better off using all shots in a single group. When using the shot group diameter, the best results are obtained by averaging over several sub-groups. The optimal number of sub-groups depends on the total number of shots you are willing to fire for your test.



4       Is the difference statistically significant?

Until now, the approach has been to fire 2 sets of shots, obtain the metric of your choice, pick the best one, and proclaim that one the winner, no matter how small the difference. A more refined approach is to quantify how sure you are that the two are ‘really’ different. If the difference between the two is ‘small’, it might well be that when you repeat the test, the other one wins. A formal way to include this in your ammo testing is to carry out a statistical test on the difference. Such a statistical test yields a so-called p-value. Only when p is below a threshold value alpha, e.g. the traditional level of 0.05, is the difference said to be ‘statistically significant’, meaning that when you repeat the test, you can be pretty sure you will again find the same batch to be the best one.


The mean radius is a good candidate to use as a starting point. First, it’s a good metric according to Chapter 2. Second, you get a radius for each individual shot in the group, and that is needed as input for a statistical test. When looking for a statement about whether the means of the radii are equal or not, a t-test is a suitable candidate to start with.
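A minimal sketch of such a test, assuming scipy is available; the function names, the made-up coordinates and the decision rule (smaller mean radius wins when p < alpha) are my own illustration:

```python
# Two-sample t-test on per-shot radii about each group's own centre of impact.
import math
from scipy.stats import ttest_ind  # assumes scipy is installed

def radii(group):
    """Radius of each shot with respect to the group's centre of impact."""
    n = len(group)
    cx = sum(x for x, _ in group) / n
    cy = sum(y for _, y in group) / n
    return [math.hypot(x - cx, y - cy) for x, y in group]

def compare_batches(group_a, group_b, alpha=0.05):
    """Return 'A', 'B' or 'Not Significant' based on mean radius and a t-test."""
    ra, rb = radii(group_a), radii(group_b)
    _, p = ttest_ind(ra, rb)
    if p >= alpha:
        return "Not Significant"
    return "A" if sum(ra) / len(ra) < sum(rb) / len(rb) else "B"
```

With two lists of (x, y) hole centres as input, this returns one of the three possible outcomes described below.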


This brings us to a test situation with three possible results: batch nr 1 is the best one, batch nr 2 is the best one, or ‘Not Significant’, i.e., the difference in means is too small to be sure which one is best.


The immediate next question is about the statistical power of the test: what is the probability that the test will identify a winner? After all, a test that ends up with ‘don’t know’ most of the time is not very helpful. The statistical power depends on three factors:


I ran further simulations to find the relationship between the statistical power on the one hand, and these three factors on the other. Results are shown in Figure 4 and Figure 5.



Figure 4 Statistical power as a function of the effect size and level of significance (t-test based on 30 shots).



Figure 5 Statistical power as a function of the effect size and the number of shots in each group (t-test, alpha = 0.05).


These figures illustrate that power can be improved by (1) increasing your alpha, (2) increasing your number of shots per group, or (3) considering only larger effect sizes.
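The power estimation itself can be sketched by brute force: repeat the simulated experiment many times and count how often the t-test comes out significant. scipy is assumed to be available, and the parameter values are illustrative assumptions:

```python
# Brute-force power estimate: fraction of simulated experiments with p < alpha.
import math
import random
from scipy.stats import ttest_ind  # assumes scipy is installed

def sim_power(n_shots, effect, alpha, n_trials=1000, seed=1):
    """Estimated power for two groups whose dispersions differ by 'effect' (e.g. 0.5 = 50%)."""
    rng = random.Random(seed)

    def radii(sigma):
        """Radii of one simulated group about its own centre of impact."""
        pts = [(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(n_shots)]
        cx = sum(x for x, _ in pts) / n_shots
        cy = sum(y for _, y in pts) / n_shots
        return [math.hypot(x - cx, y - cy) for x, y in pts]

    hits = 0
    for _ in range(n_trials):
        _, p = ttest_ind(radii(1.0), radii(1.0 + effect))
        if p < alpha:
            hits += 1
    return hits / n_trials

# Example: 30 shots per group, 50% dispersion difference, alpha = 0.05.
print(sim_power(30, 0.5, 0.05))
```

Sweeping this over effect sizes, alpha levels and group sizes produces curves of the kind shown in Figures 4 and 5.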


Some statistical considerations:


Going back to the situation where you are trying to pick the best of 2 sets of ammo (no matter how small the difference), the Chapter 2 results show that e.g. 


When introducing significance testing to this situation, using an alpha level of 0.05, you have about a 65% probability of obtaining a significant effect. This 65% is not very impressive, I would say. And this is still for a fairly large actual difference between the two batches!


If you are interested in more subtle effects, let’s say a 20% effect size, then Figure 6 shows the relationship between the statistical power on the one hand, and the number of shots required and the alpha level on the other.



Figure 6 Statistical power as a function of the number of shots in each group and the level of significance (t-test; effect size 20%).



With the classical value of alpha of 0.05, you need two groups of 130 shots to reach 80% statistical power. Obviously this is not very realistic. A lower number of shots can suffice if you are willing to use a higher alpha. But even when using an alpha of 0.2, you need to fire 75 shots per group if you want to reach 80% statistical power. That’s still 150 shots in total that need to be measured and typed into your spreadsheet.


This shows the practical disadvantage of introducing statistical tests in trying to pick the most precise ammo. If you want to have a decent chance of finding significant differences, even when the differences are fairly small, then you need to fire and measure more test shots than you would like. Perhaps the Chapter 2 approach of picking the batch with the smallest MR or RSD and forgetting about statistical significance is more practical after all.


At the end of the day, you’ve got to make your choice anyway.

(Table: small, medium and large effect size, Power = 0.80; from Cohen, 1992.)




5       References

Cohen, J. (1992). A power primer. Psychological Bulletin, 112 (1), 155-159.


Grubbs, F.E. (1991). Statistical Measures of Accuracy for Riflemen and Missile Engineers (available from the author at 4109 Webster Road, Havre De Grace, Md. 21078.).


Johnson, R.F. (2001). Statistical measures of marksmanship (Report TN-01/02). Natick, MA: U.S. Army Research Institute of Environmental Medicine.


Leslie, J.E. III (1994). Is "Group Size" the Best Measure of Accuracy? The Canadian Marksman, 129 (1), 46-48. (April 2006: http://www.shootersjournal.com/Features/WHICHONE.pdf ).



Appendix A           Calculating the metrics

Shot Group Diameter (SGD), or Extreme Spread: distance between the (centres of) the two holes that are furthest apart.


Figure of Merit (FOM): the average of the maximum horizontal group spread and the maximum vertical group spread.


Diagonal: calculated by taking the square root of the sum of the maximum horizontal spread squared and the maximum vertical spread squared.


For the remaining metrics, you need to measure the location of the centre of each shot, yielding an x and y coordinate for each shot: xi, yi (for i = 1 to n, with n the number of shots involved).


Mean radius (MR): first calculate the radius r_i of each shot with respect to the group centre (x_COG, y_COG), where x_COG and y_COG are the averages of all x_i and y_i:

    r_i = \sqrt{(x_i - x_{COG})^2 + (y_i - y_{COG})^2}

MR is the average over all shots:

    MR = \frac{1}{n} \sum_{i=1}^{n} r_i
Radial Standard Deviation (RSD): calculated as the square root of the sum of the variance of all x_i plus the variance of all y_i:

    RSD = \sqrt{s_x^2 + s_y^2}

where, using the general equation for standard deviations:

    s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - x_{COG})^2}, \qquad s_y = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (y_i - y_{COG})^2}


Note that the RSD can be re-written as follows:

    RSD = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left[ (x_i - x_{COG})^2 + (y_i - y_{COG})^2 \right]}

and since r_i^2 = (x_i - x_{COG})^2 + (y_i - y_{COG})^2,

    RSD = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} r_i^2}
Showing that the RSD is (indeed) the standard deviation of the radii of the shots with respect to the shot group centre.



For further details and background, see Leslie (1994), Johnson (2001) or Grubbs (1991).
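All of the Appendix A metrics can be computed directly from a list of (x, y) hole centres. A compact sketch (the function and variable names are my own):

```python
# Compute the Appendix A precision metrics for a list of (x, y) shot centres.
import math

def metrics(shots):
    n = len(shots)
    xs = [x for x, _ in shots]
    ys = [y for _, y in shots]
    sgd = max(math.dist(a, b) for a in shots for b in shots)  # extreme spread
    h = max(xs) - min(xs)                  # maximum horizontal spread
    v = max(ys) - min(ys)                  # maximum vertical spread
    fom = (h + v) / 2                      # Figure of Merit
    diagonal = math.hypot(h, v)            # Diagonal
    cx, cy = sum(xs) / n, sum(ys) / n      # group centre (x_COG, y_COG)
    r = [math.hypot(x - cx, y - cy) for x, y in shots]
    mr = sum(r) / n                        # Mean Radius
    rsd = math.sqrt(sum(ri ** 2 for ri in r) / (n - 1))  # Radial Standard Deviation
    return {"SGD": sgd, "FOM": fom, "Diagonal": diagonal, "MR": mr, "RSD": rsd}

# Example with four made-up shots on the corners of a unit square:
print(metrics([(0, 0), (1, 0), (0, 1), (1, 1)]))
```

Note that the RSD line uses the re-written form, the square root of the sum of the squared radii divided by n - 1, which equals sqrt(var(x) + var(y)).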