Aaron Sams, principal scientist at Embark, joins host Laura Reeves to take us through the algorithms and data sets that create the genetic codes for each purebred dog breed.
“Dog population and dog evolutionary history is really complex,” Sams noted. “There's a lot of dogs in the world and most of them are what we call village dogs. They are just free-breeding, free-ranging dogs that live with human populations … then you have some of these ancient breeds … salukis and other what I think of as landrace breeds, these are dogs that are living with humans, they are purpose bred, they’re adapted to the region in which they live… Then you’ve got the purebred dog breeds that we are most familiar with in Europe and America.
“A registration organization decides these are the standards, this is what this dog has to conform to in order to be a registered purebred of this breed. But genetic variation is a lot more complex. Within each of those breeds, there is genetic variation across those dogs. They're not clones of each other. They're not genetically identical. A lot of breeds are related to each other.
“So it's all pretty complex. How do you do ancestry testing and decide, given some random dog, what ancestry is the best fit for that dog? What's our best estimate of the ancestry of that dog? So rather than thinking of it from a single reference genome, we have to think about it as a population. We have a reference database with large numbers of individuals that serve as references for that given population.
“What we're trying to do is capture as much of that genetic diversity that represents that population as we can, so that anytime we take a new dog and we compare it to those reference individuals we can see that this dog shares DNA that's very, very similar to those reference individuals.
“There's different levels of genetic diversity in each of these populations. If you have a very inbred breed that doesn't have very much genetic diversity, you might need a smaller number of individuals to capture the genetic diversity for that entire population. The more individuals you add to your reference, the more accurate you're going to get, the more you can match those individuals that you're testing identically.
“Let’s say you send one of your dogs in and we say it's around 90% German wirehaired pointer and 10% English pointer … that happens. Those are very closely related breeds. Sometimes you have lineages of that breed that we don't have in our reference database. So sometimes you'll see that kind of thing happen. But a year later, you come back and we've added more registered German wirehaired pointers. Now we have dogs in our reference data set that are actually better matches for that DNA … Over time these things are going to change.
“If you take a frozen semen sample from several generations ago, say 5-6 generations in the past, then you bring that forward and you breed a dog with that frozen semen sample. If there was more outcrossing or the breed was still kind of in formation (when that dog was alive), there's definitely a good chance that if we now have a very well-established reference data set for that breed, that you bring that genetic diversity forward, it may not actually be present in most dogs in that breed today. Now you've got ancestry or DNA from earlier in the population that's maybe been lost over time and you're reproducing it … if we don't have that in our reference data set for that breed it's going to be called as something else….”
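For readers who want to see the reference-panel idea in concrete terms, here is a minimal Python sketch. It is not Embark's algorithm, which the episode does not detail. It assumes a deliberately simplified model in which a dog's DNA is split into segments and each segment is assigned to the breed whose reference dogs contain the closest match. The function names (`hamming`, `estimate_ancestry`) and all marker strings are invented for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Count the positions at which two equal-length marker strings differ."""
    return sum(x != y for x, y in zip(a, b))


def estimate_ancestry(segments, panel):
    """Assign each segment to the breed with the closest reference match.

    segments: list of marker strings for the dog being tested.
    panel: dict mapping breed name -> list of reference dogs, where each
           reference dog is a list of marker strings aligned with `segments`.
    Returns a dict mapping breed name -> fraction of segments assigned to it.
    """
    calls = Counter()
    for i, seg in enumerate(segments):
        best_breed = min(
            panel,
            key=lambda breed: min(hamming(seg, ref[i]) for ref in panel[breed]),
        )
        calls[best_breed] += 1
    return {breed: calls[breed] / len(segments) for breed in panel}


# Toy data (invented): a dog whose last segment comes from a lineage
# that the first version of the reference panel does not contain.
query = ["AAGTCA", "CCGTTA", "GGATCC", "TTAACG"]

panel = {
    "German wirehaired pointer": [["AAGTCA", "CCGTTA", "GGATCC", "TGCACA"]],
    "English pointer": [["ACGTGA", "CAGTTT", "GGTTCA", "TTAACA"]],
}
before = estimate_ancestry(query, panel)

# A year later: one more registered German wirehaired pointer is added
# to the reference panel, and it carries the missing lineage.
new_reference = ["AAGTCC", "CCGTTA", "GGATCC", "TTAACG"]
panel_v2 = {
    **panel,
    "German wirehaired pointer": panel["German wirehaired pointer"] + [new_reference],
}
after = estimate_ancestry(query, panel_v2)

print(before)
print(after)
```

In this toy setup the last segment is first assigned to English pointer only because no German wirehaired pointer reference carries anything similar. Once a better-matching reference dog is added, the same segment is assigned to German wirehaired pointer, which mirrors the kind of change over time that Sams describes.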
For more on this absolutely fascinating deep dive on genetic ancestry DNA testing, listen to the podcast episode above.