Selecting for a simply-inherited trait requires knowing just three things: the number of loci involved (often just one), the number of alleles at each locus (usually a small number), and the genotypes or possible genotypes of the parents-to-be (again typically a small number).
In the case of a simply-inherited trait that is partially dominant, such as Andalusian chicken colour, all three pieces of information are known. There is just one locus (B), two alleles (‘B’ and ‘b’), and three genotypes easily identifiable by eye (‘BB’, black; ‘Bb’, slate blue; and ‘bb’, white).
Selection of a partially dominant trait is quite straightforward. Should you wish all white chickens, remove the blacks and blues from your flock and breed only the whites. All subsequent chicks will be white. Should you want only black chickens, remove the other two colours and breed only the blacks to produce generations of black-only birds. Keeping only slate-blue birds will produce all three genotypes in the next generation — not ideal if you don’t want black and white chickens, but at least you can still know their genotypes by looking at them. And to breed for a guaranteed all-blue next generation, simply cross black birds with white ones, and repeat.
Selecting for or against a genotype of a partially dominant trait is easy, simply because the genotype is visibly expressed.
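To make the crosses described above concrete, here is a small Python sketch of a Punnett square for the B locus. It assumes simple Mendelian inheritance (each parent passes on either of its two alleles with equal probability); the `cross` function and `COLOUR` table are hypothetical helpers written only for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical lookup: phenotype of each genotype at the B locus (partial dominance).
COLOUR = {"BB": "black", "Bb": "slate blue", "bb": "white"}

def cross(mother: str, father: str) -> dict[str, Fraction]:
    """Hypothetical helper: expected offspring genotype proportions from a
    Punnett square, assuming each allele is passed on with equal probability."""
    # Sorting puts the upper-case allele first, so 'bB' is counted as 'Bb'.
    counts = Counter("".join(sorted(m + f)) for m, f in product(mother, father))
    total = sum(counts.values())
    return {g: Fraction(c, total) for g, c in counts.items()}

for parents in [("bb", "bb"), ("BB", "BB"), ("Bb", "Bb"), ("BB", "bb")]:
    outcome = cross(*parents)
    print(parents, {COLOUR[g]: str(p) for g, p in sorted(outcome.items())})
```

White × white and black × black give only their own colour, slate blue × slate blue gives one in four black, one in two blue and one in four white, and black × white gives all blue.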
What about a completely dominant trait, like the suri/huacaya genotype in alpacas mentioned last week?
Assume a herd of heterozygous suris, homozygous suris and huacayas. The huacaya genotype is recessive, so keeping only huacayas and removing all suris will give you an all-huacaya herd that breeds true to type in no time. But should you want only suris, removing all huacayas won’t be enough, as the heterozygous suris carry the huacaya allele and may well produce future huacayas. They won’t always breed true to type.
It’s quite easy to select for or against a partially dominant genotype, as with the Andalusian chickens. It’s also quite easy to select for a recessive genotype, as with the huacaya alpacas. This is because you know without question the genotypes of the parents-to-be, and can predict the outcomes definitively.
It is more difficult to select against a recessive trait, ie to completely remove the recessive allele from a population. (Unless random genetic drift does it for you, but that is hardly a reliable breeding tool!) And likewise for the opposite side of the same coin: selecting for a homozygous dominant trait to again completely remove the recessive allele.
This is because the genotypes of the parents-to-be aren’t always known definitively. The recessive allele could be carried by any number of animals without being expressed. Barring the existence of a genetic test, those carriers can’t be identified unless and until an offspring with the recessive phenotype is born.
Traditionally, a test mating was done only for males, partly because females aren’t usually valuable enough to justify the test (just let the progeny ‘chips’ fall as they may, essentially), and partly because males can produce more progeny, and more quickly, than females. Apart from these practicalities, there is no intrinsic difference in testing males versus females. And the advent of embryo transfer (ET) technology now allows greater numbers of offspring from a female to be tested, if the cost can be justified.
Revisiting last week’s test-cross scenarios of suri (dominant allele) over huacaya (recessive allele): a suri need produce just one huacaya to show conclusively that it is heterozygous. A homozygous suri never will.
But after how many matings can we be sure an animal really is homozygous? It could be entirely due to chance that the recessive allele was never passed on to the progeny, despite numerous matings. Of course, the more matings that don’t produce the recessive phenotype, the higher the probability of homozygosity. If that probability is high enough, we can conclude that an animal is indeed not a carrier.
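One way to get a feel for this question is a quick simulation. This Python sketch assumes a heterozygous suri (‘Aa’) is test-mated only to huacayas (‘aa’), with one offspring per mating, so each offspring is a huacaya with probability 0.5. It estimates how often such a carrier produces no huacayas at all in n matings and compares that with 0.5^n; `escape_rate` is a hypothetical helper, not an established tool.

```python
import random

def escape_rate(n_matings: int, trials: int = 100_000, seed: int = 1) -> float:
    """Hypothetical helper: estimate how often an 'Aa' sire mated n times to
    'aa' dams (one offspring each) produces no 'aa' offspring at all.

    An offspring is 'aa' only if the sire passes on 'a' (probability 0.5).
    """
    rng = random.Random(seed)
    escaped = 0
    for _ in range(trials):
        # rng.random() >= 0.5 stands for the sire passing on 'A' in one mating.
        if all(rng.random() >= 0.5 for _ in range(n_matings)):
            escaped += 1
    return escaped / trials

for n in (1, 3, 5, 10):
    print(n, round(escape_rate(n), 4), 0.5 ** n)
```

The chance of a carrier slipping through halves with every extra mating but never reaches zero, which is why the question becomes one of confidence rather than certainty.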
Alpacas, like horses and cattle, typically have single births. Species in which a mating often produces several offspring (twins, litters or clutches), such as sheep, dogs, pigs and poultry, wouldn’t require as many matings for a homozygous recessive to appear.
The probabilities of different mating scenarios can be calculated, and we shall cover these in subsequent posts!
Consider any species that typically gives birth to one offspring per mating, and that within that species is any simply-inherited trait with a dominant ‘A’ allele and a recessive ‘a’ allele.
Now consider four individuals of that species, all with known genotypes: a heterozygous dominant male (‘Aa’), a homozygous dominant female (‘AA’), a heterozygous dominant female (‘Aa’), and a homozygous recessive female (‘aa’). Assume a genetic test doesn’t exist. The mating outcomes of the male over each of these females can be summarised as:
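The three Punnett squares for the male over the ‘AA’, ‘Aa’ and ‘aa’ females can be reproduced with a short Python sketch. It assumes simple Mendelian inheritance with each of the four gamete pairings equally likely; `punnett` is a hypothetical helper.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def punnett(sire: str, dam: str) -> dict[str, Fraction]:
    """Hypothetical helper: offspring genotype probabilities for two known
    single-locus genotypes (each of the 2 x 2 gamete pairings equally likely)."""
    # Sorting puts 'A' before 'a', so 'aA' is counted as 'Aa'.
    counts = Counter("".join(sorted(s + d)) for s, d in product(sire, dam))
    return {g: Fraction(c, 4) for g, c in sorted(counts.items())}

male = "Aa"
for female in ("AA", "Aa", "aa"):
    print(f"{male} x {female}:", {g: str(p) for g, p in punnett(male, female).items()})
```

This prints 1/2 ‘AA’ and 1/2 ‘Aa’ for the first cross, 1/4 ‘AA’, 1/2 ‘Aa’ and 1/4 ‘aa’ for the second, and 1/2 ‘Aa’ and 1/2 ‘aa’ for the third.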
We know with certainty which of the progeny are recessive (‘aa’) just by looking at them. We also know with certainty that any offspring of the homozygous recessive female that aren’t themselves recessive must be heterozygous dominant (‘Aa’). But how do we know which of the others are homozygous (‘AA’) or heterozygous (‘Aa’) dominant?
All we can know for sure is that the probability of an ‘AA’ genotype resulting when our male is crossed with a known ‘AA’ female is 1 in 2, or 0.5. Likewise the probability of an ‘Aa’ genotype from the same mating is 0.5. The probability of an ‘AA’ genotype from a mating with a known ‘Aa’ female is 1 in 4, or 0.25, and of an ‘Aa’ genotype is 2 in 4, or 0.5.
We can calculate these probabilities as we already know the genotypes of the parents. But if we don’t know the genotypes of the parents, we must then rely on what we know about population genetics, and work with the probabilities of certain genotypes being in the population as a whole.
We can use Punnett squares in the same way to calculate the probabilities of the genotypes likely to be in the offspring, given the known probabilities of those genotypes being in the starting population.
Instead of p and q representing gene frequencies, let them now represent the probabilities that an animal will contribute a dominant allele or a recessive allele.
Here, p is the probability that ‘A’ is passed to the progeny, and q is the probability that ‘a’ is passed to the progeny.
If we know a male is ‘Aa’ then both p and q are 0.5: any one of his gametes picked at random has a 50% chance of carrying ‘A’ and a 50% chance of carrying ‘a’.
If we don’t know for sure whether any particular female is ‘AA’ or ‘Aa’ we must work with the probabilities of these alleles being in the population of females as a whole.
Revisiting the first Punnett square above, of an ‘Aa’ male and ‘AA’ female mating, you can see that half the progeny are expected to be ‘AA’ and half ‘Aa’. Extrapolating this outcome to a larger population of such progeny females, half will similarly be expected to be ‘AA’ and the other half ‘Aa’.
From this, three of every four alleles in that wider group are expected to be ‘A’ (each ‘AA’ female carries two and each ‘Aa’ female one, against the single ‘a’ carried by each ‘Aa’ female). Thus the probability that a female from this group passes on ‘A’ is p = 0.75, and from this we know that the probability that she passes on ‘a’ is q = 0.25.
On crossing our known ‘Aa’ male with any female from a known ‘AA’ dam:
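The same Punnett-square logic can be written directly in terms of p and q. In this Python sketch (the helpers `gamete_probs` and `cross` are hypothetical), the females are assumed to be half ‘AA’ and half ‘Aa’, as described above, which gives p = 0.75 and q = 0.25 for them against p = q = 0.5 for the male.

```python
from fractions import Fraction as F

def gamete_probs(p_AA, p_Aa, p_aa):
    """Hypothetical helper: return (p, q), the probability that an animal drawn
    from a group with these genotype probabilities passes on 'A' or 'a'.

    'AA' always passes 'A', 'Aa' does so half the time, 'aa' never does.
    """
    if F(p_AA) + F(p_Aa) + F(p_aa) != 1:
        raise ValueError("genotype probabilities must sum to 1")
    p = F(p_AA) + F(p_Aa) / 2
    return p, 1 - p

def cross(sire, dam):
    """Hypothetical helper: offspring genotype probabilities from two (p, q) pairs."""
    (p1, q1), (p2, q2) = sire, dam
    # 'Aa' arises two ways: 'A' from the sire and 'a' from the dam, or vice versa.
    return {"AA": p1 * p2, "Aa": p1 * q2 + q1 * p2, "aa": q1 * q2}

male = gamete_probs(0, 1, 0)                  # known 'Aa' carrier: p = q = 1/2
females = gamete_probs(F(1, 2), F(1, 2), 0)   # half 'AA', half 'Aa': p = 3/4, q = 1/4
print({g: str(x) for g, x in cross(male, females).items()})
```

It prints 3/8 ‘AA’, 1/2 ‘Aa’ and 1/8 ‘aa’.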
You’ll see that the probability of a homozygous recessive genotype (‘aa’) from mating a known heterozygous dominant male to females from known ‘AA’ dams is 0.125, or one in eight. This of course means the probability of producing a dominant phenotype (whether ‘AA’ or ‘Aa’) is seven in eight (0.375 + 0.375 + 0.125 = 0.875 = 7/8). Of every eight offspring, three are expected to be homozygous dominant (‘AA’) and four heterozygous dominant (‘Aa’).
This same method can be used to work out the probabilities of any genotype from any starting genotypes.
For example, continuing with the middle Punnett square at top, the probabilities of the mating outcomes from crossing the progeny of known ‘Aa’ dams with a known ‘Aa’ animal are:
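Another way to reach the same answer is to weight each possible dam genotype by how likely it is and add up the separate Punnett squares (the law of total probability). This Python sketch assumes the group of mates includes all progeny of a known ‘Aa’ × ‘Aa’ mating, ‘aa’ ones included; `punnett` and `cross_with_group` are hypothetical helpers.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def punnett(sire: str, dam: str) -> dict[str, Fraction]:
    """Hypothetical helper: offspring genotype probabilities for two known genotypes."""
    counts = Counter("".join(sorted(s + d)) for s, d in product(sire, dam))
    return {g: Fraction(c, 4) for g, c in counts.items()}

def cross_with_group(sire: str, dam_group: dict[str, Fraction]) -> dict[str, Fraction]:
    """Hypothetical helper: weight the sire's Punnett square with each possible
    dam genotype by that genotype's probability in the group, then sum."""
    total: Counter = Counter()
    for dam, weight in dam_group.items():
        for genotype, p in punnett(sire, dam).items():
            total[genotype] += weight * p
    return dict(total)

# Mates: all progeny of a known 'Aa' x 'Aa' mating (1/4 'AA', 1/2 'Aa', 1/4 'aa').
progeny_of_Aa_dams = punnett("Aa", "Aa")
result = cross_with_group("Aa", progeny_of_Aa_dams)
print({g: str(p) for g, p in sorted(result.items())})
```

It prints 1/4 ‘AA’, 1/2 ‘Aa’ and 1/4 ‘aa’.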
(This is the same result as simply crossing two known ‘Aa’ animals.)
From the third (rightmost) Punnett square at top we can see that crossing a known ‘Aa’ male with any ‘aa’ homozygous recessive female would result in half ‘Aa’ and half ‘aa’ progeny on average. Let’s step through this one mathematically, even though the progeny genotypes will be known by sight:
We already know the probability of an ‘Aa’ animal contributing ‘A’ or ‘a’ is p = q = 0.5. The probability of an ‘aa’ animal contributing an ‘A’ gamete is p = 0, and the probability of contributing an ‘a’ allele is q = 1.
Thus the probability of any ‘AA’ genotypes from an ‘Aa’ × ‘aa’ cross is the probability of an ‘A’ allele from the ‘Aa’ genotype and the probability of an ‘A’ allele from the ‘aa’ genotype, or 0.5 × 0 = 0.
Similarly, the probability of any ‘aa’ genotypes from an ‘Aa’ × ‘aa’ cross is the probability of an ‘a’ allele from the ‘Aa’ genotype and the probability of an ‘a’ allele from the ‘aa’ genotype, or 0.5 × 1 = 0.5.
And lastly, the probability of any ‘Aa’ genotypes is the probability of an ‘A’ allele from the ‘Aa’ genotype and an ‘a’ allele from the ‘aa’ genotype (0.5 × 1 = 0.5), plus the probability of an ‘a’ allele from the ‘Aa’ genotype and an ‘A’ allele from the ‘aa’ genotype (0.5 × 0 = 0), or 0.5 + 0 = 0.5.
It’s all well and good to be able to calculate these probabilities, but there’s one more number we need, and that is the number of births required to be sure a tested animal is not a carrier of a recessive gene. This number will differ depending on the type of mating, and we will go over that in more detail later.
But first a small diversion to discuss the practicalities and limitations of test matings — our topic for next week!
We can easily identify homozygous recessive genotypes and partially-dominant traits simply by looking at the phenotypes of the progeny. Identifying carriers of recessive alleles isn’t as simple, as recessive alleles are hidden — the phenotype of an ‘AA’ animal is indistinguishable from that of an ‘Aa’ animal.
The purpose of test matings is to identify carriers of recessive alleles by forcing any such alleles that may be present to appear in progeny. It takes just one such progeny to be born to show without doubt that the tested parent is indeed a carrier. But as there is no guarantee of such a birth, it is more a matter of knowing how many offspring must be born to be confident that the tested animal is not a carrier.
Before going further, let’s first summarise the calculations from last week:
The probability of homozygous recessive (‘aa’) offspring when a heterozygous carrier (‘Aa’) is mated to:

| Genotype of mate | Probability of ‘aa’ offspring |
|---|---|
| homozygous dominant (‘AA’) | 0 (never) |
| progeny (either ‘AA’ or ‘Aa’) of a known ‘AA’ parent | 0.125 (one in eight) |
| heterozygous dominant (‘Aa’), a known carrier | 0.25 (one in four) |
| homozygous recessive (‘aa’) | 0.5 (one in two) |
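Each row of the table is the carrier’s 1/2 chance of passing on ‘a’ multiplied by the mate’s chance of passing on ‘a’. In this Python sketch (`p_recessive_offspring` is a hypothetical helper), the row for progeny of a known ‘AA’ parent assumes the other parent was an ‘Aa’ carrier, so those mates are half ‘AA’ and half ‘Aa’.

```python
from fractions import Fraction as F

def p_recessive_offspring(p_AA, p_Aa, p_aa):
    """Hypothetical helper: P('aa' offspring) when a known 'Aa' carrier is mated
    to a mate drawn from a group with the given genotype probabilities."""
    if F(p_AA) + F(p_Aa) + F(p_aa) != 1:
        raise ValueError("genotype probabilities must sum to 1")
    q_mate = F(p_Aa) / 2 + F(p_aa)   # chance the mate passes on 'a'
    return F(1, 2) * q_mate          # the carrier passes on 'a' half the time

table = {
    "homozygous dominant ('AA')": p_recessive_offspring(1, 0, 0),
    "progeny of a known 'AA' parent": p_recessive_offspring(F(1, 2), F(1, 2), 0),
    "heterozygous dominant ('Aa')": p_recessive_offspring(0, 1, 0),
    "homozygous recessive ('aa')": p_recessive_offspring(0, 0, 1),
}
for mate, prob in table.items():
    print(f"{mate:32} {prob} ({float(prob)})")
```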
All this table does, really, is confirm what we already know from Mendelian genetics, that the greatest likelihood of a homozygous recessive genotype will come by joining a heterozygous dominant to a homozygous recessive. This is because the homozygous recessive parent has only recessive alleles to pass on to its progeny, and the outcome is solely determined by which allele the heterozygous parent contributes. This mating scenario also requires the smallest number of matings to be sure a tested animal is not a carrier. (We’ll go over this later.)
All other probabilities will be between 0.5 and 0, depending on how prevalent the recessive allele is in the greater population. (And the number of matings required is much higher too.)
Thus it makes most sense to mate all test animals to homozygous recessive mates. This is indeed the case with alpacas, with their simply-inherited fleece type trait. Suri is dominant, huacaya is recessive, huacayas are in plentiful supply, and huacayas are indeed used to test suri genotypes.
But unfortunately it isn’t always as straightforward as that! It just so happens that huacayas are popular, and in many cases preferred to suris. Their recessive genotype is desired, but this is actually not the norm for most recessive traits.
Many recessive traits are actually not wanted (eg horned cattle), or cause infertility, or are lethal. Animals homozygous for them either are never born, or are quickly culled if they are. Breeding-age adults can be hard to find or non-existent.
The table above shows that the next-highest probability of homozygous recessive genotypes comes from matings to known carriers. But this too presents challenges, as known carriers of deleterious alleles are also culled.
One easily obtained source of potential carriers, however, is a sire’s own daughters. There are several downsides to this, even apart from the time needed for the test sire to reach breeding age to produce those very daughters. Many daughters must first be born and raised to breeding age themselves, with time for gestation added to that. And a sire × daughter mating produces inbred progeny of low value.
However, the big advantage of such matings is that all the recessive alleles the test sire may carry can be tested at once, as at least one copy of each is likely to be present across the whole daughter group.
Test matings with random animals in a herd are likely to require the highest number of matings to ensure an animal isn’t a carrier, especially if an allele is already rare in that population. (The probabilities are 0.05 or 0.0125, depending on which assumptions are used. We’ll be going over the maths that calculates those probabilities over the next few weeks.)
Yet testing with random animals is still the easiest approach of all, despite those odds. While many more females would be needed, that sire would be mated anyway if he’s being considered for testing at all. And as with mating to daughters, this method tests for all recessive alleles likely to be in the population. Sires in artificial insemination studs are routinely tested this way.
Over the next few weeks we’ll step through the maths of matings, and the number of required matings to be confident a tested animal is not a carrier — slowly!
But first, next week, an introduction to the concept of levels of confidence in statistics. This is the measure of confidence we have that a particular outcome is reliable and reproducible, and with it we can determine the number of matings required for different test mating scenarios.
Recently we calculated the probabilities of several mating scenarios producing a homozygous recessive, assuming the test animal (invariably a male) is a carrier. As most recessive alleles are not wanted, the sooner a carrier can be identified, the sooner it can be culled from a breeding programme.
But knowing the odds isn’t quite enough: odds are an indicator, not a guarantee of outcome. By sheer luck a carrier may produce a homozygous recessive after just one mating. Or it could take three, or ten, or more matings; it’s all down to the random assortment of alleles and which sperm fertilises which egg.
As well as knowing the odds, breeders also need to know how many matings are required to be confident that a tested animal isn’t a carrier. We need to know the level of confidence.
The level of confidence (or confidence level) is a mathematical concept in statistics. A 0% confidence level is one where you have no confidence whatsoever that you’ll get the same outcome should you repeat the procedure. A 100% confidence level is one where you have no doubt at all that repeating a procedure will produce the same outcome. This level of confidence is only possible if you are able to use literally every single individual in existence at time of sampling — it is the only way to ensure you catch every ‘outlier’ that may alter results should you repeat the sampling. Thus this level isn’t truly possible in statistics and is regarded as a theoretical concept.
Statistical analyses commonly use a 95% confidence level. This isn’t so much a measure of accuracy as of confidence in repeatability. If we take a random sample of a population and repeat the same procedure over and over (with a different random sample each time), we can expect our results to capture the true value for the entire population 95% of the time.
You may be familiar with a “bell curve”, or normal (Gaussian) distribution curve, where a population is distributed symmetrically about some average value (the mean). A Gaussian distribution has the mean at the 50th percentile, with half the population above it and half below.
This distribution is assumed to apply in certain circumstances (like ours), so any samples taken randomly from a population are expected to fall evenly around the (unknown) mean. To be 95% confident is to expect that the ranges calculated from 95% of all samples will capture that unknown mean.
The confidence level is shown graphically and mathematically below. As the distribution is assumed to be symmetrical, a 95% confidence level leaves the remaining 5% split equally between the two ends of the curve, so the range begins at the 2.5th percentile and ends at the 97.5th.
The confidence level when expressed as a proportion rather than a percentage is called the confidence coefficient. A confidence level of 95% has a confidence coefficient of 0.95 for example.
In the graph above, 1 − α (α is the Greek letter alpha) is the unshaded central area under the curve. It spans the confidence interval, the range of results, and equals the confidence coefficient.
The remaining, shaded parts combined equal α. As this is a normal distribution, the two are of equal size and each must equal α/2.
The whole area under the curve is 1 (a 100% confidence level has a coefficient of 1), therefore the confidence interval begins at the α/2 percentile and ends at the (1 − α/2) percentile.
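As a check on these percentiles, Python’s standard-library `statistics.NormalDist` can turn them into cut-offs on a standard normal curve. This is a sketch; `interval_percentiles` is a hypothetical helper.

```python
from statistics import NormalDist

def interval_percentiles(confidence: float) -> tuple[float, float]:
    """Hypothetical helper: lower and upper percentiles (as proportions)
    bounding a two-sided confidence interval."""
    alpha = 1 - confidence
    return alpha / 2, 1 - alpha / 2

lo, hi = interval_percentiles(0.95)
z = NormalDist().inv_cdf(hi)   # standard normal cut-off for the upper end
print(round(lo, 3), round(hi, 3), round(z, 2))
```

For 95% confidence this gives the 2.5th and 97.5th percentiles, and the familiar cut-off of about 1.96 standard deviations either side of the mean.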
Now you have a better grasp of confidence levels it will be easier to follow the next few weeks as we go over calculations of the number of required matings for various confidence levels — see you then!
Having gone over confidence levels, it’s time to apply that and step through some maths!
Let’s now calculate confidence levels and the required number of test matings to be statistically confident that a tested animal is not a carrier of a recessive allele.
Everything below assumes that the tested animal is a sire, that there is one offspring from one mating, and that all mates (dams) are of the same type for the allele of interest. That is, they are either all known carriers, or are all daughters of the tested sire, or are all randomly selected from the same population.
n = number of matings that produce an offspring
$P[D_n]$ = probability ($P$) of detection ($D$), ie that at least one homozygous recessive offspring is born, for n matings. This is our level of confidence in the test.
$P_{AA}$ = probability that a mate is homozygous dominant (‘AA’) at the locus being tested.
$P_{Aa}$ = probability that a mate is heterozygous (‘Aa’) at the locus being tested.
$P_{aa}$ = probability that a mate is homozygous recessive (‘aa’) at the locus being tested.
From all this:

$$P[D_n] = 1 - \left(P_{AA} + \tfrac{3}{4}P_{Aa} + \tfrac{1}{2}P_{aa}\right)^n$$
OK, so where did this come from, you may well be asking?! Let’s break this down bit by bit.
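$P[D_n]$, as stated above, is our level of confidence. From last week, our level of confidence can also be written as 1 − α.
Thus $P[D_n] = 1 - \alpha$.
Substitute α (alpha) with $\left(P_{AA} + \tfrac{3}{4}P_{Aa} + \tfrac{1}{2}P_{aa}\right)^n$, and you have $1 - \left(P_{AA} + \tfrac{3}{4}P_{Aa} + \tfrac{1}{2}P_{aa}\right)^n$.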
But what is that sum inside the parentheses?
Firstly, does it make sense to you that the (probability of homozygous recessive born) + (probability of no homozygous recessive born) must equal 1, or 100%? In other words, that it is 100% certain that all offspring born must have some combination of alleles, regardless of what that combination is?
And from that, does it also make sense that you could calculate the probability of at least one homozygous recessive offspring being born by calculating the probability of no homozygous recessive offspring being born, and subtracting that from 1 (100%)? We could write that as: (probability of homozygous recessive born) = 1 − (probability of no homozygous recessive born)
From this, can you see how $P_{AA} + \tfrac{3}{4}P_{Aa} + \tfrac{1}{2}P_{aa}$ is the probability of no homozygous recessive being born from a single mating?
$P_{AA}$ is the probability that a dam is homozygous dominant (‘AA’) at the locus being tested. Such a dam can only contribute ‘A’ alleles, so the probability that she bears a non-recessive offspring is 1 (100%), whether or not the sire is a carrier. Her term is therefore written simply as $P_{AA}$, which is the same as $1 \times P_{AA}$.
$P_{Aa}$ is the probability that a dam is heterozygous (‘Aa’) at the locus being tested. If both the sire and dam are carriers, there are three chances in four that an offspring with the dominant phenotype will be born, hence the term $\tfrac{3}{4}P_{Aa}$.
$P_{aa}$ is the probability that a dam is homozygous recessive (‘aa’) at the locus being tested. If the sire is a carrier and the dam is homozygous recessive, there are two chances in four that an offspring with the dominant phenotype will be born, hence the term $\tfrac{1}{2}P_{aa}$.
To calculate the number of matings to give us enough confidence that a sire is not a carrier, we need to know (mating outcome 1) and (mating outcome 2), and so on for some number n such that the confidence level is high.
(mating outcome 1) and (mating outcome 2) and (mating outcome 3) and so on can be rewritten (mating outcome 1) × (mating outcome 2) × (mating outcome 3) × (…), as in probability, when we say ‘and’ about independent events we write ‘×’.
And (mating outcome 1) × (mating outcome 2) × (mating outcome 3) × (…) is the same as writing $(\text{mating outcome})^n$, where n is the number of matings.
Hopefully $P[D_n] = 1 - \left(P_{AA} + \tfrac{3}{4}P_{Aa} + \tfrac{1}{2}P_{aa}\right)^n$ isn’t so scary-looking now?
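For anyone who prefers to experiment with numbers, the formula P[D_n] = 1 − (P_AA + ¾P_Aa + ½P_aa)^n translates directly into a short Python function. This is a sketch assuming, as above, one offspring per mating and all dams of the same type; `p_detect` is a hypothetical name.

```python
def p_detect(n: int, p_AA: float, p_Aa: float, p_aa: float) -> float:
    """Hypothetical helper: P[D_n], the probability that a carrier sire produces
    at least one 'aa' offspring in n single-offspring matings, with every dam
    drawn from the same group of genotype probabilities."""
    if abs(p_AA + p_Aa + p_aa - 1) > 1e-9:
        raise ValueError("mate genotype probabilities must sum to 1")
    # Probability that one mating does NOT produce an 'aa' offspring.
    p_miss = p_AA + 0.75 * p_Aa + 0.5 * p_aa
    # Matings are independent, so n misses in a row has probability p_miss ** n.
    return 1 - p_miss ** n

print(round(p_detect(5, 0, 1, 0), 3))    # five known carrier dams
print(round(p_detect(30, 0, 1, 0), 4))   # thirty known carrier dams
```

With known carrier dams this prints about 0.763 for five matings and 0.9998 for thirty.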
Time to use it and work with some numbers!
Assume we have five known carriers to test a sire with. This means $P_{AA}$ and $P_{aa}$ are both 0, and $P_{Aa}$ is 1.
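Substituting these values for n = 5 matings, and assuming one live offspring per mating, gives:

$$P[D_5] = 1 - \left(0 + \tfrac{3}{4} \times 1 + \tfrac{1}{2} \times 0\right)^5 = 1 - 0.75^5 \approx 1 - 0.237 = 0.763$$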
That isn’t a very high level of confidence, but let’s take an extreme and assume 30 known carriers:
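Repeating the same calculation for n = 30 matings:

$$P[D_{30}] = 1 - 0.75^{30} \approx 1 - 0.00018 = 0.9998$$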
Now that is a very high level of confidence that the sire is not a carrier! But it would appear we could use a smaller number of females for almost as good a result, which would certainly be a better use of dams. Rather than just punching in numbers at random to find the most favourable values for n and $P[D_n]$, the most practical approach is to solve for n outright.
How are you with logs?! (I don’t want to dwell on logs too much here but I have written a little supplementary post here to explain how the formula was rearranged below.)
Let’s rearrange to solve for n:
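One way to write the rearrangement: move the bracketed term to one side, take logs of both sides (any base works, as long as the same base is used top and bottom), and divide:

$$1 - P[D_n] = \left(P_{AA} + \tfrac{3}{4}P_{Aa} + \tfrac{1}{2}P_{aa}\right)^n$$

$$\log\left(1 - P[D_n]\right) = n \log\left(P_{AA} + \tfrac{3}{4}P_{Aa} + \tfrac{1}{2}P_{aa}\right)$$

$$n = \frac{\log\left(1 - P[D_n]\right)}{\log\left(P_{AA} + \tfrac{3}{4}P_{Aa} + \tfrac{1}{2}P_{aa}\right)}$$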
The number of matings required to give us a 95% level of confidence that no homozygous recessives will be born, and that the sire is not a carrier, using known carrier dams is:
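A minimal Python sketch of the same calculation, solving for n and rounding up to a whole number of matings. `matings_required` is a hypothetical helper; it assumes one live offspring per mating and all dams of the same type, as above.

```python
import math

def matings_required(confidence: float, p_AA: float, p_Aa: float, p_aa: float) -> int:
    """Hypothetical helper: smallest whole number of single-offspring matings n
    for which P[D_n] reaches the given confidence (all dams of the same type)."""
    p_miss = p_AA + 0.75 * p_Aa + 0.5 * p_aa   # P(no 'aa' offspring) from one mating
    if p_miss >= 1:
        raise ValueError("mates that are all 'AA' can never reveal a carrier")
    n_exact = math.log(1 - confidence) / math.log(p_miss)
    # Round UP: rounding down would leave the confidence just below the target.
    return math.ceil(n_exact)

print(matings_required(0.95, 0, 1, 0))      # known carrier dams
print(matings_required(0.95, 0, 0, 1))      # homozygous recessive dams
print(matings_required(0.95, 0.5, 0.5, 0))  # half 'AA', half 'Aa' dams
```

This prints 11, 5 and 23: known carrier dams need about twice as many matings as homozygous recessive ones, and mates that are only half carriers need about twice as many again.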
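This is an interesting example to make a point with. The solution for n, to two decimal places, is actually 10.41, and rounding to the nearest whole number gives 10. But 10 matings fall just short of the target: $1 - 0.75^{10} \approx 0.944$, or about 94.4%. Because n must be at least 10.41 to reach 95%, it should always be rounded up, here to 11 ($1 - 0.75^{11} \approx 0.958$).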
Indeed, you should always factor in additional matings anyway, to account for matings that don’t take or aborted foetuses. The calculated number of matings assumes a live birth for every mating, ie it assumes n number of successful matings.
The above assumed one offspring per mating and that all mates were of the same type. We’ll stop there as that was quite a lot of information to absorb! Next week we’ll go over the maths for other scenarios involving multiple groups of mates and multiple births per mate.