
Is there a known minimal stretch of DNA that can distinguish any two people in the world?


I imagine this could be used as a universal identifier.


Here is what the data says. The UK government must have had some scientific evidence when it settled on 10 variable-length sections of the genome for its database, SGM+. In one such variable section, some people have 10 repeats of CTTT, others have 11, others 12, and so on. The largest of those fragments, at its maximum length, is about 350 base pairs. The US government uses 13 such variable-length sequences for its FBI database, called CODIS. On both sides of the Atlantic, roughly 3-4 thousand base pairs were judged to be enough, with some safety margin, by the most advanced geneticists in the world, working with public funding.

The catch is that these variable-length sequences are scattered across the genome rather than contiguous, so a single continuous stretch containing them would have to be far longer than 4,000 bp. If you need these markers on the same DNA strand (chromosome), you should space them about 50 centimorgans apart, the distance at which they recombine independently. But according to http://www.sciencemag.org/site/feature/data/genomes/265-5181-2094.pdf , the longest chromosome is only 350-400 centimorgans long. This means you cannot fit 10 independent DNA sequences on any human chromosome, not even the longest one.
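The spacing argument reduces to simple arithmetic. Below is a minimal Python sketch, assuming markers may sit at both ends of the chromosome and using the 50 cM spacing and 350-400 cM lengths quoted above; `max_spaced_markers` is an illustrative name, not part of any tool.

```python
import math


def max_spaced_markers(chromosome_cm: float, spacing_cm: float = 50.0) -> int:
    """Most markers that fit on a chromosome of genetic length chromosome_cm
    when neighbouring markers must be at least spacing_cm apart.

    n markers leave n - 1 gaps, so n - 1 <= chromosome_cm / spacing_cm.
    """
    return math.floor(chromosome_cm / spacing_cm) + 1


for length_cm in (350, 400):
    print(f"{length_cm} cM chromosome: at most {max_spaced_markers(length_cm)} markers")
```

Even at the upper figure of 400 cM this gives 9 markers, one short of the 10 loci mentioned for SGM+.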

I predict there is no such continuous DNA stretch that would tell apart any two non-twins. Since it is much better to use multiple chromosomes in DNA fingerprinting, I doubt you will find more relevant experimental data.


When designing PCR primers we typically use a minimum length of 20 bases, because the probability of a given sequence of $N$ bases appearing at random is $\frac{1}{4^N}$, and $\frac{1}{4^{20}}$ is about $9 \times 10^{-13}$, or about 1 in a trillion. Since the human genome is a little over 3 billion bases long, a 20-base sequence should appear only once. However, most of an individual's DNA isn't random: it's inherited from their parents, who got theirs from their parents, and so on. Long story short, there isn't a lot of unique DNA in any given person. The uniqueness is only visible on a large scale: you have half of your DNA from your father and half from your mother. So do any of your siblings, but the specific DNA they got from each parent would be different from yours.
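The primer-length reasoning can be checked numerically. This is a minimal sketch under the answer's own idealization that every base is equally likely and independent; the genome size of 3.1 billion bases and the choice to scan both strands are assumptions, and the function names are illustrative.

```python
GENOME_LENGTH = 3_100_000_000  # assumed haploid genome size ("a little over 3 billion bases")


def random_match_probability(n_bases: int) -> float:
    """Chance that one position matches a given n-base sequence, if all
    four bases were equally likely and independent (the answer's model)."""
    return 1 / 4 ** n_bases


def expected_random_hits(n_bases: int, genome_length: int = GENOME_LENGTH) -> float:
    """Expected number of chance occurrences of the sequence, scanning both strands."""
    return 2 * genome_length * random_match_probability(n_bases)


for n in (15, 20):
    print(n, random_match_probability(n), expected_random_hits(n))
```

Under these assumptions a 15-base primer would be expected to match several places by chance, while a 20-base primer almost certainly matches only its intended target.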

I think we can calculate the odds of getting a specific set of chromosomes from a parent. If there are 23 pairs of chromosomes, then the probability of getting any given set should be $\frac{1}{2^{23}}$; since we have 2 parents, the odds of getting your specific set of DNA is $\left(\frac{1}{2^{23}}\right)^2$, which is about 1 in 70 trillion. So the odds of you having a non-identical twin that has the same DNA as you is very low. But any given 20-base sequence should have a 50% probability of appearing in both you and your sibling.

But if we look at two 20-base sequences of DNA in you and your sibling, and each had a 50% chance of appearing, then the probability of both sequences appearing in both of you is 25%. If we add a third, it goes to 12.5%, and so on. If there are 7 billion humans on earth, we need a probability of less than 1 in 7 billion, and $2^{33}$ is about 8.6 billion, so if you looked at 33 different sites on the genome it should be able to differentiate you from every human on the planet.
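To check the arithmetic in the last two paragraphs, here is a small sketch. It inherits the answer's simplifications (no crossing over, a 50% chance of sharing each site, sites independent of one another), so it reproduces the stated numbers rather than modelling real inheritance; the function names are illustrative.

```python
def same_parental_sets_probability(chromosome_pairs: int = 23) -> float:
    """Chance that a sibling receives exactly your chromosome sets from both
    parents, ignoring crossing over (as the answer does)."""
    per_parent = 1 / 2 ** chromosome_pairs
    return per_parent ** 2


def min_sites_needed(population: int, p_shared: float = 0.5) -> int:
    """Smallest number of independent sites k with p_shared**k < 1/population."""
    k = 0
    while p_shared ** k >= 1 / population:
        k += 1
    return k


print(round(1 / same_parental_sets_probability()))  # 2**46, about 70 trillion
print(min_sites_needed(7_000_000_000))  # prints 33
```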

By the way, did I do the math right? I'm not a mathmagician and I could have made a mistake in my probability calculations. I have also assumed that the spontaneous mutation rate and crossing over rates are low enough to ignore, but both would serve to make your DNA a little more unique.



How does a paternity test work?

-A curious adult from California

Did I find my real dad? Is that really my son? Did Michael Jackson really father Billie Jean's kid? Questions like these used to be very hard to answer. In the past, people used a blood test. This might rule out that you were the father but couldn't prove that you were.

Nowadays, DNA technology is used to figure out who is the father of a child. DNA paternity testing makes it possible to determine a child's biological father to a very high degree of certainty.

Everyone, except identical twins, has a unique set of DNA. DNA is made up of 4 bases or letters, A, C, G, and T. These 4 letters form the written code that makes up the DNA sequence.

Now when someone says that everyone's DNA is unique, what they mean is that occasionally one of these letters is different for different people. On average, two people at random have a different base every thousand bases or so. This is where the statistic that says that everyone's DNA is 99.9% the same comes from.

Since you get half your DNA from your father and half from your mother, your DNA is more than 99.9% the same as your parents'. Your DNA is also more similar to that of your grandparents or cousins than to that of a random stranger. Paternity tests use this greater similarity to figure out who the parents are.

So how do you figure out someone's DNA is more similar to another's? There are lots of ways but we'll focus on the simplest, DNA restriction analysis or DNA fingerprinting. DNA fingerprinting uses special proteins called restriction enzymes. Restriction enzymes cut DNA but only at a certain combination of A, G, T, and C. Different restriction enzymes cut DNA at different places -- each has a unique sequence it recognizes. For example, the restriction enzyme EcoRI cuts DNA at the sequence GAATTC and will cut only at that sequence. It will not, for example, cut at GACTTC.

OK, so what DNA fingerprinting does is it looks for differences in the DNA that change where these restriction enzymes can cut DNA. The pattern of DNA fragments is then compared and if the child's DNA looks like a combination of the two parents' DNA, then the child is theirs.

Let's look at an example of how this might be done. Suppose we have three people: Bob, Larry, and Mary. If we take the same stretch of DNA from the three of them, small differences might mean that EcoRI will cut them differently (see Figure 1). In Bob, the sequence GAATTC occurs once in this stretch of DNA. That is, in this stretch of DNA, Bob has one EcoRI site. Now suppose Mary has no EcoRI sites and Larry has two EcoRI sites in this stretch of DNA. You can see that EcoRI will cut this stretch of Bob's DNA into two fragments and Larry's into three fragments, and will not cut Mary's at all.

When we cut the DNA with EcoRI and separate the cut fragments on an agarose gel, the gel might look something like Figure 2. In an agarose gel, smaller fragments run faster, so you get separation based on size: the bigger fragments are near the top, the smaller near the bottom.
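The digest and the gel ordering described above can be simulated in a few lines. This is a minimal sketch that treats one DNA strand as a plain string and ignores sticky ends. The three sequences are hypothetical, built only so that Bob has one EcoRI site, Larry two, and Mary none (she has GACTTC instead). The cut offset reflects EcoRI cutting between the G and the first A of GAATTC. Because GAATTC is its own reverse complement, scanning one strand finds every site.

```python
from __future__ import annotations

ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts between G and A: G^AATTC


def cut_positions(dna: str, site: str = ECORI_SITE, offset: int = ECORI_CUT_OFFSET) -> list[int]:
    """0-based positions at which the enzyme cuts this strand."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + offset)
        start = dna.find(site, start + 1)
    return positions


def digest(dna: str, site: str = ECORI_SITE, offset: int = ECORI_CUT_OFFSET) -> list[int]:
    """Fragment lengths after cutting a linear stretch of DNA at every site."""
    bounds = [0] + cut_positions(dna, site, offset) + [len(dna)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def gel_lanes(fragments: list[int]) -> list[int]:
    """Order fragments as they appear on a gel: largest (slowest) at the top."""
    return sorted(fragments, reverse=True)


# Hypothetical 20-base stretches for the three people in the example.
bob = "CCCC" + "GAATTC" + "C" * 10                 # one EcoRI site
larry = "CC" + "GAATTC" + "CCCC" + "GAATTC" + "CC"  # two EcoRI sites
mary = "CCCC" + "GACTTC" + "C" * 10                # no EcoRI site

for name, dna in (("Bob", bob), ("Larry", larry), ("Mary", mary)):
    print(name, gel_lanes(digest(dna)))
```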

Now, suppose Mary has a child and she wants to determine which of two men, Bob or Larry, is the biological father of her child. She consults a paternity testing expert. The expert collects a certain stretch of DNA from Mary, Bob, Larry, and the child, and cuts the DNA with EcoRI. When the expert separates the cut DNA fragments on an agarose gel, the pattern looks like the one in Figure 3. The child's DNA must be a combination of Mary's DNA plus one of the men's DNA. The agarose gel indicates that the child's DNA is a combination of Mary's DNA (top band) plus Larry's DNA (bottom three bands). Thus, Larry is the biological father of the child.

What happens if Larry and Bob have the identical sequence in this stretch of DNA? The answer is that you wouldn't be able to distinguish, based on looking at differences in this stretch of DNA, between the two men. So what do you do? You simply search for other stretches of DNA in which there is a difference between these two men. This is why in real life, multiple stretches of DNA must be examined to ensure that the results are statistically significant.
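The band-matching logic of the Bob/Larry/Mary example, extended to several stretches of DNA, can be sketched as follows. Each person's profile maps a stretch name to the set of band sizes seen on the gel. The stretch names and band sizes are hypothetical. Stretch "A" is one where Bob and Larry happen to be identical, which is why a second stretch is needed. This only models the simplified gels in the example; real paternity testing weighs probabilities across many loci rather than returning a plain yes/no.

```python
from __future__ import annotations


def father_explains_child(child: set[int], mother: set[int], father: set[int]) -> bool:
    """At one stretch, every band in the child must come from the mother or this man."""
    return child <= (mother | father)


def consistent_fathers(child: dict, mother: dict, candidates: dict, loci=None) -> list[str]:
    """Candidates whose bands explain the child's bands at every examined stretch."""
    loci = list(child) if loci is None else list(loci)
    return [
        name
        for name, profile in candidates.items()
        if all(father_explains_child(child[locus], mother[locus], profile[locus]) for locus in loci)
    ]


# Hypothetical band sizes (base pairs) at two stretches of DNA.
child = {"A": {12, 20}, "B": {20, 10, 7, 3}}
mother = {"A": {20}, "B": {20}}
candidates = {"Bob": {"A": {12}, "B": {15, 5}}, "Larry": {"A": {12}, "B": {10, 7, 3}}}

print(consistent_fathers(child, mother, candidates, loci=["A"]))  # ['Bob', 'Larry']
print(consistent_fathers(child, mother, candidates))              # ['Larry']
```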


9 Resistance To HIV


All sorts of things could wipe out the human race: asteroid strikes, nuclear annihilation, and extreme climate change, just to name a few. Perhaps the scariest threat is some type of super-virulent virus. If a disease ravages the population, only the rare few who are immune would have a chance of survival. Fortunately, we know that certain people are indeed resistant to particular diseases.

Take HIV, for example. Some people have a genetic mutation that disables their copy of the CCR5 protein. HIV uses that protein as a doorway into human cells. So, if a person lacks CCR5, HIV can't enter their cells, and they're extremely unlikely to become infected with the virus.

That being said, scientists say that people with this mutation are resistant rather than immune to HIV. A few individuals without this protein have contracted HIV and even died from AIDS. Apparently, some unusual types of HIV have figured out how to use proteins other than CCR5 to invade cells. This type of resourcefulness is why viruses are so scary.

Folks with two copies of the defective gene are most resistant to HIV. Currently, that includes only about 1 percent of Caucasians, and it is even rarer in other ethnicities.


Parabon® Snapshot®

Snapshot is a cutting-edge forensic DNA analysis service that provides a variety of tools for solving hard cases quickly.

Snapshot is ideal for generating investigative leads, narrowing suspect lists, and solving human remains cases, without wasting time and money chasing false leads.


Snapshot Genetic Genealogy

Genetic Genealogy (GG) is the combination of genetic analysis with traditional historical and genealogical research to study family history. For forensic investigations, it can be used to identify remains by tying the DNA to a family with a missing person or to point to the likely identity of a perpetrator.

By comparing a DNA sample to a database of DNA from volunteer participants, it is possible to determine whether there are any relatives of the DNA sample in the database and how closely related they are (see Snapshot Kinship Inference for more details). This information can then be cross-referenced with other data sources used in traditional genealogical research, such as census records, vital records, obituaries and newspaper archives.

Why Use Genetic Genealogy?

Genetic genealogy gives you a powerful new tool to generate leads on unknown subjects. When a genetic genealogy search yields useful related matches to an unknown DNA sample, it can narrow down a suspect list to a region, a family, or even an individual. Paired with Snapshot DNA Phenotyping to further reduce the list of possible matches, genetic genealogy is the most powerful identification method short of a direct DNA comparison. Identity can then be confirmed using traditional STR analysis.

How Does This Technique Differ From Familial Searches in the CODIS Database?

Our genetic genealogy service is somewhat like familial search, but it differs in three very important ways: (1) we only search public genetic genealogy databases, not government-owned criminal (STR profile) databases, such as CODIS; (2) because the DNA SNP profiles we generate contain vastly more information than traditional STR profiles, genetic relatedness can be detected at a far greater distance (see Snapshot Kinship Inference); and (3) because genetic genealogy matches can be cross-referenced by name with traditional genealogy sources, such as Ancestry.com, existing family trees can be used to expedite tree-building and case-solving. This technology and our innovative techniques combine to create a groundbreaking system for forensic human identification.

How Genetic Genealogy Works

Genetic genealogy uses autosomal DNA (atDNA) single nucleotide polymorphisms (SNPs) to determine how closely related two individuals are. Unlike other genetic markers, such as mitochondrial DNA or Y chromosome DNA, atDNA is inherited from all ancestral lines and passed on by both males and females and thus can be used to compare any two individuals, regardless of how they are related. However, atDNA SNPs are more difficult to obtain from forensic samples, which is why Parabon has created an optimized laboratory protocol to ensure high-quality results even from small, degraded DNA samples.

The standard atDNA metric used by genetic genealogists is the amount of DNA that two people are likely to have inherited from a recent common ancestor. This can be estimated by looking for long stretches of identical DNA. While alleles can easily be shared by chance at one or a few SNPs, it is highly unlikely for two unrelated people to share a long stretch of DNA. Therefore, only segments above a certain length are counted. The length of these shared segments is measured in centimorgans (cM), a measure of genetic distance, and the total number of cM shared across all chromosomes can be used to determine approximately how closely related two people are. The figure below shows how shared segments of DNA on a single chromosome are broken up with each generation, leading to shorter shared segments for more distant relatives. Using a public genetic genealogy database, DNA from an unknown person can be compared to roughly 1 million other people to see whether any of them are related.
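The counting rule described above (ignore short segments, sum the rest) can be sketched as follows. The text says only that segments "above a certain length" are counted, so the 7 cM cutoff is a placeholder assumption, and the match names and segment lengths are hypothetical.

```python
from __future__ import annotations

MIN_SEGMENT_CM = 7.0  # hypothetical cutoff; the text only says "a certain length"


def total_shared_cm(segment_lengths_cm: list[float], min_segment_cm: float = MIN_SEGMENT_CM) -> float:
    """Sum the shared segments that are long enough to be unlikely to be shared by chance."""
    return sum(length for length in segment_lengths_cm if length >= min_segment_cm)


def rank_matches(matches: dict[str, list[float]], min_segment_cm: float = MIN_SEGMENT_CM):
    """Database matches sorted from most to least total shared cM."""
    totals = {name: total_shared_cm(segs, min_segment_cm) for name, segs in matches.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical shared-segment lengths (cM) between an unknown sample and two database users.
matches = {"user_a": [8.0, 3.0, 2.5], "user_b": [30.0, 9.0, 4.0]}
print(rank_matches(matches))
```

A larger total suggests a closer relationship; translating totals into specific relationships requires reference distributions that this sketch does not include.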

DNA database matches serve as clues on which traditional genealogy methods can build, starting with building the matches' family trees using a wide variety of information sources. During the tree-building process, the genetic genealogist searches for common ancestors who appear across multiple family trees of the matches. Ideally, marriages between the descendants of the identified common ancestors are discovered. Then descendancy research is employed to search for descendants at the intersection of these common ancestors who were born at a time that is consistent with the subject's estimated age range. The goal of this search is to narrow down the possible individuals to a set of names, a family, or even an individual.
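The tree-intersection step above can be expressed as two set operations. This is only a sketch of the bookkeeping, assuming trees have already been reduced to sets of ancestor identifiers and each candidate person to a birth year plus a set of known ancestors. All names, years, and the two-tree threshold are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def shared_ancestors(match_trees: dict[str, set[str]], min_trees: int = 2) -> set[str]:
    """Ancestors that appear in at least min_trees different matches' family trees."""
    counts = Counter(ancestor for tree in match_trees.values() for ancestor in tree)
    return {ancestor for ancestor, n in counts.items() if n >= min_trees}


def candidate_descendants(
    people: dict[str, tuple[int, set[str]]],
    required_ancestors: set[str],
    birth_years: tuple[int, int],
) -> list[str]:
    """People descended from all required ancestors and born in the estimated range."""
    earliest, latest = birth_years
    return sorted(
        name
        for name, (born, ancestors) in people.items()
        if required_ancestors <= ancestors and earliest <= born <= latest
    )


# Hypothetical trees: two matches descend from the Smiths, two from the Browns.
match_trees = {
    "m1": {"Smith couple", "Jones couple"},
    "m2": {"Smith couple"},
    "m3": {"Brown couple"},
    "m4": {"Brown couple", "Green couple"},
}
people = {
    "P1": (1975, {"Smith couple", "Brown couple"}),
    "P2": (1948, {"Smith couple", "Brown couple"}),
    "P3": (1979, {"Smith couple"}),
}
common = shared_ancestors(match_trees)
print(candidate_descendants(people, common, (1965, 1985)))  # ['P1']
```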

Depending on the amount of information available from the matches, genetic genealogy can produce a wide range of leads. In all cases that proceed to analysis, genetic genealogy will significantly narrow the scope of possible identities for the person-of-interest. In some cases, the identity will be narrowed to descendants of a particular ancestor or from a particular region. In others, our analysts can produce the name and address of the person-of-interest. In all cases, identity must be confirmed through traditional forensic DNA matching.

Genetic Genealogy Use Cases

Genetic genealogy has traditionally been used to discover new relatives and build a full family tree. However, it can also be used to discover the identity of an unknown individual by using DNA to identify relatives and then using genealogy research to build family trees and deduce who the unknown individual could be. These techniques have primarily been used to discover the family history of adopted individuals, but they apply equally well to forensic applications. Genetic genealogy has been used to identify victims' remains, as well as suspects, in a number of high-profile cases.

Because genetic genealogy uses the same type of data generated for Snapshot DNA Phenotyping and Snapshot Kinship, the analysis can quickly be performed on existing cases, and new cases have a wide array of options for generating new leads from a single DNA sample.

Snapshot Featured in
National Geographic
Magazine Cover Story
[UPDATE: Solved]

Watch NBC Nightly News
Put Snapshot To The Test

Watch Snapshot
Workflow Video

The Snapshot DNA Phenotyping Service

DNA Phenotyping is the prediction of physical appearance from DNA. It can be used to generate leads in cases where there are no suspects or database hits, to narrow suspect lists, and to help solve human remains cases.

DNA carries the genetic instruction set for an individual's physical characteristics, producing the wide range of appearances among people. By determining how genetic information translates into physical appearance, it is possible to "reverse-engineer" DNA into a physical profile. Snapshot reads tens of thousands of genetic variants ("genotypes") from a DNA sample and uses this information to predict what an unknown person looks like.

Over the past four years, using deep data mining and advanced machine learning algorithms in a specialized bioinformatics pipeline, Parabon, with funding support from the US Department of Defense (DoD), developed the Snapshot Forensic DNA Phenotyping System, which accurately predicts genetic ancestry, eye color, hair color, skin color, freckling, and face shape in individuals from any ethnic background, even individuals with mixed ancestry.

Because some traits are partially determined by environmental factors and not DNA alone, Snapshot trait predictions are presented with a corresponding measure of confidence, which reflects the degree to which such factors influence each particular trait. Traits such as eye color that are highly heritable (i.e., are not greatly affected by environmental factors) are predicted with higher accuracy and confidence than those that have lower heritability; these differences are shown in the confidence metrics that accompany each Snapshot trait prediction.

How DNA Phenotyping Works

Whereas traditional DNA forensics matches STRs from a sample to a known suspect or a database, DNA phenotyping can generate new leads about an individual, even if they have not previously been identified in a database. DNA phenotyping takes advantage of modern SNP technology to read the parts of the genome that actually code for the differences between people.

The Snapshot DNA Phenotyping System translates SNP information from an unknown individual's DNA sample into predictions of ancestry and physical appearance traits, such as skin color, hair color, eye color, freckling, and even face morphology. Each phenotype prediction is accompanied by a measure of confidence, and some phenotypes can be excluded with high confidence.

SNP Technology

Recent advances in genomic technology have made it practical and affordable to read the sequence of millions of pieces of DNA from a small quantity of sample. This data captures a large proportion of the genomic variation between people and thus contains much of the genetic blueprint that differentiates people's appearance. These SNP genotypes can then be paired with phenotypes from thousands of subjects to create a genotype-and-phenotype (GaP) dataset for analysis.

Using genomic data from large populations of subjects with known phenotypes, Parabon's bioinformatics scientists have built statistical models for forensic traits, which can be used to predict the physical appearance of unknown individuals from DNA.

Data Mining

Beginning with large GaP datasets containing genetic information and measures of phenotype for thousands of subjects, Parabon's bioinformatics team performs large-scale statistical analysis on hundreds of thousands of individual SNPs and billions of SNP combinations to identify genetic markers that are associated with a trait. This mining process can take weeks of compute time running on hundreds, sometimes thousands, of computers. In the end, those SNPs with the greatest likelihood of contributing biologically to the trait's variation are selected for potential use in predictive models.
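Parabon does not describe its statistics here, and its pipeline also examines SNP combinations. As a much simpler stand-in, a single-SNP scan correlates each SNP's allele dosage (0, 1, or 2 copies) with a quantitative trait and keeps the strongest hits. The SNP names, dosages, and trait values below are hypothetical.

```python
from __future__ import annotations

import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_snps(genotypes: dict[str, list[int]], trait: list[float], k: int = 10) -> list[str]:
    """Rank SNPs by |correlation| between allele dosage and the trait; keep the top k."""
    scores = {snp: abs(pearson_r(dosages, trait)) for snp, dosages in genotypes.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data for six subjects.
genotypes = {"snp_a": [0, 1, 2, 0, 1, 2], "snp_b": [0, 2, 1, 1, 0, 2]}
trait = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
print(top_snps(genotypes, trait, k=1))  # ['snp_a']
```

At genome scale such a scan is usually paired with a multiple-testing correction, since hundreds of thousands of SNPs are tested at once.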

Data Modeling

In the modeling phase, Parabon's scientists use machine learning algorithms to combine the selected set of SNPs into a complex mathematical equation for the genetic architecture of the trait. A new, unknown individual's SNP data can then be plugged into this equation to produce a prediction of the trait in that individual.
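The source does not specify the form of Snapshot's equations. A logistic model is one common way to turn selected SNPs into a trait prediction, so here is a sketch of the "plug in a new individual's SNP data" step under that assumption. The SNP names, weights, and intercept are made up for illustration.

```python
import math


def predict_trait_probability(dosages: dict, weights: dict, intercept: float) -> float:
    """Plug one person's allele dosages into a logistic model.

    Every SNP in weights must be present in dosages (0, 1 or 2 copies of the
    effect allele); a missing SNP raises KeyError rather than being guessed.
    """
    score = intercept + sum(w * dosages[snp] for snp, w in weights.items())
    return 1 / (1 + math.exp(-score))


# Hypothetical model for a binary trait (e.g. "blue eyes"); the weights are made up.
weights = {"snp_1": 1.2, "snp_2": -0.8}
print(predict_trait_probability({"snp_1": 2, "snp_2": 1}, weights, intercept=-0.4))
```

Raising an error for a missing SNP is a deliberate choice in this sketch. Silently treating it as zero copies would bias the prediction.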

Model accuracy is assessed by making predictions on new subjects with known phenotypes ("out-of-sample predictions"). By comparing predicted versus actual phenotypes, Parabon scientists are able to calculate confidence statements about new predictions and, more importantly, exclude highly unlikely traits. For example, if 99% of brown-eyed people have an eye color prediction value greater than 2, then we can have very high confidence that a prediction of 1.5 most likely did not come from a brown-eyed person.
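The brown-eye example can be turned into a small exclusion rule. This sketch assumes you hold out-of-sample prediction scores for people known to have the trait, and it uses a simple nearest-rank lower quantile. The scores below are hypothetical and chosen so that 99% of them are at least 2.01.

```python
from __future__ import annotations


def exclusion_threshold(reference_scores: list[float], coverage: float = 0.99) -> float:
    """Score that at least `coverage` of the reference group meets or exceeds.

    reference_scores are out-of-sample predictions for people known to have the
    trait (e.g. brown eyes); uses a simple nearest-rank lower quantile.
    """
    ordered = sorted(reference_scores)
    index = min(int(len(ordered) * (1 - coverage)), len(ordered) - 1)
    return ordered[index]


def can_exclude(prediction: float, reference_scores: list[float], coverage: float = 0.99) -> bool:
    """True if the prediction falls below the score that `coverage` of the group reaches."""
    return prediction < exclusion_threshold(reference_scores, coverage)


# Hypothetical out-of-sample scores for 100 brown-eyed people.
brown_eyed_scores = [2.0 + i * 0.01 for i in range(100)]
print(can_exclude(1.5, brown_eyed_scores))  # True
print(can_exclude(2.5, brown_eyed_scores))  # False
```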

The final models are calibrated with all available data before being installed into the Snapshot production service that is used to generate phenotype predictions for investigators.

Snapshot Success Stories

Snapshot has been used by hundreds of law enforcement agencies around the world to help generate leads, narrow their suspect pools, and solve human remains cases, in both active and decades-old investigations.

Featured Case Summaries: Read detailed case descriptions, including how Snapshot helped solve the following cases:

Case Summary
Albuquerque, NM
2008 Aggravated Assault

Just before noon on 11 September 2008, Diane Marcell returned to her home in Albuquerque, NM, to meet her daughter, Brittani Marcell, for lunch. Brittani, then 17 years old, had driven home from her nearby high school. Upon entering her home, Diane found Brittani lying on the floor, covered in blood. A male subject, unknown to Diane, was standing near Brittani holding a shovel.

Startled, he dropped the shovel, ran into...

Case Summary
Lake Brownwood, TX
2016 Sexual Assault and Murder

On Friday 13 May 2016, the Brown County Texas Sheriff's Office (BCSO) received a missing person report for 25-year-old Rhonda Chantay Blankinship. Family members reported Blankinship had last been seen late Friday evening, walking near her home in the Tamarack Mountain/Thunderbird Bay area of Lake Brownwood. Friends, family and volunteers began a search for her while deputies followed up on possible leads into her disappearance.

Blankinship's body was found...

Case Summary
Tacoma, WA
1986 Rape and Murder of 12-Year-Old Girl

On Wednesday 26 March 1986, Michella Welch, a petite 12-year-old girl with long blond hair and glasses, went missing. She had taken her two younger sisters to Puget Park in Tacoma, Washington at about 10 a.m. and then rode her bicycle home about 11 a.m. to make lunch for them. When she returned, she chained her bike next to one of her sister's bikes, set the lunches on the table and went looking for her siblings, who had gone to a nearby business to use the restroom.

A 13-year-old classmate later told detectives he saw a man in the park that day under the Proctor Bridge who...

Case Summary
Rockingham County, NC
2012 Double Homicide

In the early hours of 4 Feb 2012, Troy and LaDonna French were gunned down in their home in Reidsville, NC. The couple awoke to screams from their 19-year-old daughter, Whitley, who had detected the presence of a male intruder in her second floor room. As they rushed from their downstairs bedroom to aid their daughter, the intruder attempted to quiet the girl with threats at knifepoint. Failing this, he released Whitley and raced down the stairs.

After swapping his knife for the handgun in his pocket...

Case Summary
Anne Arundel County, MD
2017 Unidentified Remains

On Wednesday 14 June 2017, members of the Anne Arundel County Police Department responded to a call reporting that a body had been found in the area of East Ordnance Road and East Avenue in Glen Burnie, MD. Upon arrival, officers located badly decomposed human skeletal remains that had been covered up by a tarp. The Office of the Chief Medical Examiner later determined that the decedent was a female approximately 20 years of age and that foul play was suspected in her death.

In the fall of 2017, after initial investigative efforts failed to reveal the victim's identity...


Blind Evaluations: Snapshot was built by Parabon NanoLabs for the defense, security, justice, and intelligence communities with funding from the United States Defense Threat Reduction Agency. As part of the development and validation process, Snapshot was tested on thousands of out-of-sample genotypes and was shown to be extremely accurate.



Predicting Genetic Ancestry With Snapshot

Scientific analysis of human genomes from different parts of the world has shown that, on a global scale, modern humans divide genetically into seven continental populations: African, Middle Eastern, European, Central/South Asian, East Asian, Oceanian, and Native American [1]. These genetic divisions stem simply from the fact that these groups were isolated from one another for many generations, and thus each group has a unique genetic signature that can be used for identification. In order to determine a new subject's genetic ancestry, Parabon Snapshot analyzes tens of thousands of SNPs from a DNA sample to determine a person's percent membership in each of these global populations. Other forensic ancestry approaches assume that every individual comes from only a single population, so they can easily be confounded by admixed individuals, but Snapshot allows for contributions from multiple populations, so it can detect even low levels of admixture (<5%).

Global ancestry map showing mostly East Asian and Native/South American ancestry, with some European ancestry as well.

After global ancestry is determined, Snapshot's ancestry algorithm investigates which subpopulations (e.g., Northwest vs. Northeast Europe) an individual comes from. This analysis is robust to admixture, such that each piece of continental ancestry can be precisely localized within that continent. For example, the admixed East Asian and Latino example from the global map above was determined to have specifically Japanese, Central American, and Southwest European ancestry, as shown in the map below.

Regional ancestry map showing mostly Japanese, Southwest European, and Central American ancestry.

Using all of this information, Snapshot builds a precise profile of an individual's ethnic ancestry using only his or her DNA.

How Genetic Ancestry Determination Works

Parabon has built a powerful system for determining ethnic ancestry from DNA. Most other forensic ancestry systems use only a small number of SNPs and thus are limited to very coarse populations and cannot detect admixture between populations. Snapshot uses tens of thousands of SNPs across the genome to obtain very precise estimates of ancestry, even for admixed individuals. Parabon's scientists have collected data from many published scientific articles, totalling more than 9,000 individuals with clearly defined ancestry from more than 150 populations around the world, as shown in the map below.

Each point represents a population from which we have obtained ancestry background data. Efforts are ongoing to increase the representation of Native American populations.

Academic research using hundreds of thousands of SNPs from across the genome has shown that human groups generally divide into seven continental populations, which have been established over the past 50,000 years during the migration out of Africa. The 150 populations collected as the ancestry background can thus be divided into these seven continental groups according to their origin.

Snapshot builds on this research by mapping a new person's genome onto these established populations. Our algorithm calculates how similar the new individual's DNA is to each of the background populations, determining which population(s) the person comes from. This allows for contributions from multiple groups, so even small amounts of admixture (<5%) can be detected.
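Parabon does not disclose its algorithm, so here is a swapped-in textbook technique for the same idea. It is a maximum-likelihood estimate of one person's ancestry proportions, with each reference population's allele frequencies held fixed, computed with the EM algorithm. The sketch assumes SNPs are independent (no linkage) and that all frequencies lie strictly between 0 and 1. The two populations and their frequencies are hypothetical.

```python
from __future__ import annotations


def estimate_admixture(
    genotypes: list[int],
    ref_freqs: dict[str, list[float]],
    iterations: int = 200,
) -> dict[str, float]:
    """EM estimate of one person's ancestry proportions.

    genotypes: count (0, 1 or 2) of a chosen allele at each SNP.
    ref_freqs: population -> frequency of that allele at each SNP, same order,
    strictly between 0 and 1. SNPs are treated as independent.
    """
    pops = list(ref_freqs)
    n_snps = len(genotypes)
    q = {pop: 1 / len(pops) for pop in pops}  # start from equal proportions
    for _ in range(iterations):
        expected_copies = {pop: 0.0 for pop in pops}
        for j, g in enumerate(genotypes):
            # chance that one allele copy in the admixed person is the counted allele
            p_allele = sum(q[pop] * ref_freqs[pop][j] for pop in pops)
            for pop in pops:
                f = ref_freqs[pop][j]
                # E-step: expected number of this SNP's two copies that came from pop
                if g > 0:
                    expected_copies[pop] += g * q[pop] * f / p_allele
                if g < 2:
                    expected_copies[pop] += (2 - g) * q[pop] * (1 - f) / (1 - p_allele)
        # M-step: new proportion = share of all 2 * n_snps allele copies
        q = {pop: expected_copies[pop] / (2 * n_snps) for pop in pops}
    return q


# Hypothetical reference panel: 10 SNPs, two made-up populations.
ref = {"pop_A": [0.9] * 5 + [0.1] * 5, "pop_B": [0.1] * 5 + [0.9] * 5}
print(estimate_admixture([2] * 5 + [0] * 5, ref))  # mostly pop_A
print(estimate_admixture([1] * 10, ref))           # about half each
```

Because each person is allowed a mixture of populations rather than a single label, the same estimate naturally reports admixture.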

Snapshot takes a similar approach to identifying within-continental (regional) ancestry, although the local populations were identified through empirical analysis performed by our bioinformatics team. Each piece of continental ancestry is partitioned according to its regional ancestry (e.g., if an individual is 50% European and 50% East Asian, the precise origin of each of those pieces will be determined). The person's genome is also plotted against all of the known individuals in each region to show visually where he or she falls.

Below is an example plot for an individual who was determined to be 50% East Asian and 50% Latino. Latino ancestry is a mixture of European and Native American ancestry, so these groups are shown as well.

Ancestry clustering diagram: this individual is half Japanese and half Latino.

Ancestry Determination Use Cases

Ethnic ancestry is one of the most informative traits that can be predicted from DNA. In an ancestry analysis, Snapshot will determine an individual's precise genetic origins, as well as whether there is any evidence of admixture (contribution from multiple populations). This information can be used to help identify remains or to significantly focus an investigation by excluding a wide range of possible suspects or even pointing to a very small group.


Snapshot Kinship Inference™

Snapshot Kinship Inference provides highly accurate inferences about the familial relationship between two people based on their DNA, even if they are distantly related. Unlike traditional forensic DNA methods, which are extremely limited in their ability to determine kinship (see tan region in the figure below), Snapshot can detect relatedness out to 9th-degree relatives (fourth cousins). This powerful forensic analysis tool gives investigators valuable, previously unobtainable information about the DNA samples found at a crime scene: information that can save time and money and lead to more solved cases.

Thanks to the massive amount of information contained in genome-wide SNP data, using DNA extracted from two biological samples, it is possible to precisely calculate the degree of relatedness between the contributors, even if the relationship is very distant.

Built with advanced machine learning algorithms, the Snapshot kinship model can distinguish up to 9th-degree relatives (fourth cousins) from unrelated pairs.

Traditional STR-based kinship analysis is limited to distinguishing parent/offspring relationships, often yielding inconclusive results for siblings and second-degree relatives. Snapshot's kinship model, on the other hand, uses hundreds of thousands of SNPs to detect relatedness out to 9th-degree relationships (e.g., fourth cousins). Moreover, the precise degree of the relationship can be determined out to 6th-degree relatives (second cousins once removed) while minimizing false positives (i.e., unrelated pairs mistakenly inferred to be related).

How Snapshot Kinship Inference Works

Traditional autosomal kinship analysis uses fewer than 20 short tandem repeat (STR) loci, which lack the resolution to establish relatedness beyond parent-offspring or full siblings, and is easily confounded by mutation or mistaken testing of a close relative of the true parent [1]. Other forensic analyses use pieces of DNA that are directly transmitted through the maternal (mitochondrial DNA) or paternal (Y-chromosome) lines; however, these approaches are limited to a small subset of relationships and have very low resolution. For example, approximately 7% of unrelated Europeans share the same mitochondrial haplotype, meaning that they cannot be assigned to a specific family. Mitochondrial DNA (mtDNA) and Y-STRs can only suggest that two individuals may be related but cannot say whether that relationship is close or very distant.

Dissatisfied with these limitations, Parabon's scientists set out to develop a novel algorithm that takes advantage of the massive amount of autosomal data made available by genome-wide SNP typing to compare two genomes and determine the precise degree of relatedness between the two individuals. The result is a revolutionary new test that redefines the state of the art in kinship analysis.

Parabon's kinship algorithm analyzes the similarity between two genomes and uses a machine learning model to predict the degree of relatedness of the two individuals. In thousands of out-of-sample predictions, this method has proven to be highly accurate while maintaining a very low false-positive rate (i.e., unrelated pairs are almost never mistakenly inferred to be related). This is true across subjects from a range of ethnic backgrounds, including related pairs with different ethnic backgrounds. Absolute accuracy is >90% out to 3rd-degree relatives (first cousins), and Snapshot can distinguish 6th-degree relatives (e.g., second cousins once removed) from unrelated pairs with greater than 98% accuracy.

Snapshot Kinship Accuracy, measured as the frequency of correct predictions of the exact degree of relatedness (absolute accuracy) and the frequency of predictions within one degree of actual relatedness (n = 3,654 relationships).

As shown in the figure above, even when Snapshot incorrectly infers the degree of relatedness between two individuals, it is almost always correct within one degree. For example, Snapshot may occasionally incorrectly predict a 4th-degree relationship to be a 5th-degree relationship, but it rarely makes the mistake of predicting a 4th-degree relationship to be a 6th-degree relationship. With this level of accuracy, you can be confident that the inferences provided by Snapshot are reliable and actionable.
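The two measures defined in the figure caption (exact degree, and within one degree of the true relationship) are straightforward to compute once each prediction is paired with a known relationship. The following is a minimal Python sketch; the function name and the example pairs are hypothetical and are not Parabon's code or data.

```python
def kinship_accuracy(pairs):
    """Return (absolute accuracy, within-one-degree accuracy).

    pairs: list of (true_degree, predicted_degree) integers, e.g. (3, 3) for
    first cousins predicted correctly. Hypothetical helper for illustration.
    """
    if not pairs:
        raise ValueError("need at least one relationship")
    n = len(pairs)
    exact = sum(true == pred for true, pred in pairs)
    within_one = sum(abs(true - pred) <= 1 for true, pred in pairs)
    return exact / n, within_one / n


# Invented example: a 4th-degree pair predicted as 5th (off by one) and
# a 5th-degree pair predicted as 7th (off by two).
example = [(1, 1), (3, 3), (4, 5), (5, 7), (6, 6)]
absolute, within_one = kinship_accuracy(example)
print(f"absolute: {absolute:.0%}, within one degree: {within_one:.0%}")
```

An off-by-one error lowers only the absolute score, while an off-by-two error lowers both, which is why the within-one figure is always at least as high as absolute accuracy.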

[1] Chakraborty, R., et al. (1999). The utility of short tandem repeat loci beyond human identification: implications for development of new DNA typing systems. Electrophoresis, 1682-1696.

How Snapshot Kinship Inference is Used

Snapshot Kinship Inference can be used to establish familial relationships between a DNA sample and previously collected DNA samples or among a set of new samples, e.g.:

  • If there is a chance that the perpetrator of a crime is related to the victim, Snapshot can compare the victim's DNA to a crime scene DNA sample to determine whether they are related. With just one test, investigators can include or exclude the entire extended biological family of the victim.
  • If DNA from a suspect cannot be obtained, but a consenting family member is willing to contribute a sample, Snapshot can establish whether that family member is related to a crime scene DNA sample.
  • If the identity of unidentified remains is suspected, but only distant relatives are available, Snapshot can compare DNA from the remains (even bone) to that of a relative to determine whether they are related.

According to the U.S. Department of Justice (DOJ) Bureau of Justice Statistics, over 60% of all violent crimes in 2016 [the latest period for which data is available] were committed by persons known to the victim.[1]

Knowledge of these relationships can be used to validate claims of distant kinship, establish relationship networks within groups of interest, or identify remains when close relatives are not available, such as cold cases, mass disasters, or casualties of past conflicts.

[1] Morgan R. and Kena G., Criminal Victimization, 2016, US Department of Justice, Office of Justice Programs, Bureau of Justice Statistics, NCJ 251150, Dec 2017. https://www.bjs.gov/content/pub/pdf/cv16.pdf. Retrieved: 19 Feb 2018.


Forensic Art Enhancement

While DNA can reveal much about the appearance of a subject, information about features such as age, body mass index (BMI) or the presence of facial hair are not available within an individual's genetic code. Snapshot forensic art services provide a means of incorporating such information into a Snapshot composite when it is available from non-DNA sources.

Examples of age progression and accessorization with Snapshot Forensic Art Services. By default, Snapshot produces composites from DNA at 25 years of age (A). Composite (A) is shown after age progression to 50 years (B), with the addition of a light beard (C), after further age progression to 75 years with reading glasses (D), and with a full beard (E).

Our Forensic Art Department, directed by Thom Shaw (certified by the International Association for Identification, IAI, in the discipline of forensic art), offers age progression, BMI alteration, and accessorization services, which may include the addition of facial hair, eyeglasses, piercings, etc. We can also create composite sketches from eyewitness accounts and combine them with traditional Snapshot composites, thereby corroborating the witness account or adding objective phenotype information to help produce the most accurate composite possible.

Composite (A) shown after age progression to 50 years old, including a beard (B) as compared to the actual subject (C)

In cases involving unidentified remains where a skull or partial skull is available, our forensic artists are also trained to perform digital facial reconstruction, using bone structure to enhance or give nuance to a Snapshot composite.

Snapshot predictions for Yolanda McClary, investigator for TV's "Cold Justice", shown at age 25 and age progressed to 49 years old

Collectively, these forensic art services perfectly complement what Snapshot can provide from DNA alone and together they represent a revolution in how DNA can be used in an investigation.

How Forensic Art Enhancement Works

Forensic artists are artists with special training to address forensic challenges. They have an expert understanding of the human face and how the effects of aging and body mass index (BMI) change appearance. Those trained in facial reconstruction learn how to infer the most likely distribution of muscle and soft tissue from a skull. Forensic artists who create composite sketches from eyewitness accounts are trained to conduct cognitive interviews, so as to get the most accurate portrayal from a witness' memory.

As in many other fields, forensic artists are beginning to rely heavily on modern software applications to facilitate their work. Sketches formerly drawn with pencil and pad can now be drawn digitally. Likewise, facial reconstructions once performed with clay sculpture can now be sculpted digitally. In the right hands, graphics software can ease the task of adding or subtracting hair, scars, and other accessories. In all cases, great skill and specialized training are still required, but the work can be more efficient and realistic thanks to these tools.

Forensic Art Enhancement Use Cases

Age Progression or Regression

Because age is not genetically encoded, Snapshot predicts subjects at 25 years of age by default. When investigators have reason to believe a person of interest is younger or older, our artists can adjust a composite accordingly, based on standard aging principles.

Examples of age progression with Snapshot Forensic Art Services: the predicted composite at 25 years old (A) shown after age progression to age 50 years (B) and after further age progression to 75 years of age

Composites Based on Eyewitness Account

Our forensic artists are trained to conduct cognitive interviews and produce composites solely from an eyewitness account. The interview and composite production are conducted online with screen-sharing technology, so eyewitnesses do not have to travel. When DNA is available for the same person of interest as seen by the eyewitness, Snapshot can provide a corresponding composite from "the genetic witness" perspective. Our artists can combine a composite from an eyewitness account with one produced by Snapshot to produce a single, highly accurate rendering that contains the best that both sources of information can offer.

Accessorization

In some instances, descriptive information about a subject's accessories or distinguishing features is available that can be used to enhance a Snapshot composite. For example, a surveillance camera image may be too grainy for identification, but nevertheless suggestive that a suspect has facial hair. Similarly, an eyewitness may recall a tattoo or scar, even though they were too traumatized to remember much else. In such cases, our forensic artists can accessorize a Snapshot composite to include all available descriptive information about a subject.

Examples of accessorization with Snapshot Forensic Art Services: the predicted composite at 25 years old (A) shown after age progression to age 50 years, with the addition of a light beard (B) and after further age progression to age 75 years with reading glasses and a full beard (C)

Body Mass Index (BMI) Alteration

Besides the effects of aging, changes in BMI have among the largest effects on appearance. By default, Snapshot produces composites assuming the subject has a BMI of 22, which is considered average. When information is available that suggests a subject has a lower or higher than average BMI, forensic artists can appropriately alter the BMI of a Snapshot composite.

Extreme examples of body mass index (BMI) alteration: the original prediction (A) shown with significantly less body mass (B) and again with a significantly larger amount of body mass (C)

Unidentified Remains

When unidentified human remains include a skull, our forensic artists can perform facial reconstruction, literally building up the corresponding face using knowledge of facial musculature and soft tissues. Although facial features cannot be perfectly inferred from a skull, bone structure can be immensely informative about the shape of an individual's face. Snapshot predicts exterior face morphology, but when a skull is available, a forensic artist can use it to confirm or enhance a Snapshot composite based on facial reconstruction.






"Mitochondrial DNA and Human Evolution," Nature, 325 (1987), 31-6.

Department of Biochemistry, University of California, Berkeley, California 94720, USA

Mitochondrial DNAs from 147 people, drawn from five geographic populations, have been analysed by restriction mapping. All these mitochondrial DNAs stem from one woman who is postulated to have lived about 200,000 years ago, probably in Africa. All the populations examined except the African population have multiple origins, implying that each area was colonised repeatedly.

MOLECULAR biology is now a major source of quantitative and objective information about the evolutionary history of the human species. It has provided new insights into our genetic divergence from apes 1-8 and into the way in which humans are related to one another genetically 9-14 . Our picture of genetic evolution within the human species is clouded, however, because it is based mainly on comparisons of genes in the nucleus. Mutations accumulate slowly in nuclear genes. In addition, nuclear genes are inherited from both parents and mix in every generation. This mixing obscures the history of individuals and allows recombination to occur. Recombination makes it hard to trace the history of particular segments of DNA unless tightly linked sites within them are considered.

Our world-wide survey of mitochondrial DNA (mtDNA) adds to knowledge of the history of the human gene pool in three ways. First, mtDNA gives a magnified view of the diversity present in the human gene pool, because mutations accumulate in this DNA several times faster than in the nucleus 15. Second, because mtDNA is inherited maternally and does not recombine 16, it is a tool for relating individuals to one another. Third, there are about $10^{16}$ mtDNA molecules within a typical human and they are usually identical to one another 17-19. Typical mammalian females consequently behave as haploids, owing to a bottleneck in the genetically effective size of the population of mtDNA molecules within each oocyte 20. This maternal and haploid inheritance means that mtDNA is more sensitive than nuclear DNA to severe reductions in the number of individuals in a population of organisms. A pair of breeding individuals can transmit only one type of mtDNA but carries four haploid sets of nuclear genes, all of which are transmissible to offspring. The fast evolution and peculiar mode of inheritance of mtDNA provide new perspectives on how, where and when the human gene pool arose and grew.

MtDNA was highly purified from 145 placentas and two cell lines, HeLa and GM 3043, derived from a Black American and an aboriginal South African (!Kung), respectively. Most placentas (98) were obtained from US hospitals, the remainder coming from Australia and New Guinea. In the sample, there were representatives of 5 geographic regions: 20 Africans (representing the sub-Saharan region), 34 Asians (originating from China, Vietnam, Laos, the Philippines, Indonesia and Tonga), 46 Caucasians (originating from Europe, North Africa, and the Middle East), 21 aboriginal Australians, and 26 aboriginal New Guineans. Only two of the 20 Africans in our sample, those bearing mtDNA types 1 and 81 (see below), were born in sub-Saharan Africa. The other 18 people in this sample are Black Americans, who bear many non-African nuclear genes probably contributed mainly by Caucasian mates. Those males would not be expected to have introduced any mtDNA to the Black American population. Consistent with our view that most of these 18 people are a reliable source of African mtDNA, we found that 12 of them bear restriction site markers known 21 to occur exclusively or predominantly in native sub-Saharan Africans (but not in Europeans, Asians or American Indians nor, indeed, in all such Africans). The mtDNA types in these 12 people are 2-7, 37-41 and 82 (see below). Methods used to purify mtDNA and more detailed ethnographic information on the first four groups are as described 17,22; the New Guineans are mainly from the Eastern Highlands of Papua New Guinea.

Each purified mtDNA was subjected to high-resolution mapping 22-24 with 12 restriction enzymes (HpaI, AvaII, FnuDII, HhaI, HpaII, MboI, TaqI, RsaI, HinfI, HaeIII, AluI and DdeI). Restriction sites were mapped by comparing observed fragment patterns to those expected from the known human mtDNA sequence 25. In this way, we identified 467 independent sites, of which 195 were polymorphic (that is, absent in at least one individual). An average of 370 restriction sites per individual were surveyed, representing about 9% of the 16,569 base-pair human mtDNA genome.
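The "about 9%" figure can be checked roughly with one multiplication. This sketch assumes each surveyed site covers about 4 bp on average (an assumption: most of the 12 enzymes recognise four specified bases, a few recognise five or six), so it is a back-of-the-envelope estimate, not the authors' calculation.

```python
MTDNA_LENGTH_BP = 16_569   # human mtDNA genome length quoted in the text
sites_surveyed = 370       # average restriction sites surveyed per individual
avg_site_bp = 4            # assumption: typical recognition-site length in bp

fraction_surveyed = sites_surveyed * avg_site_bp / MTDNA_LENGTH_BP
print(f"{fraction_surveyed:.1%} of the mtDNA genome")
```

About 1,480 surveyed bases out of 16,569 comes to just under 9%, consistent with the text.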

The 147 mtDNAs mapped were divisible into 133 distinct types. Seven of these types were found in more than one individual; no individual contained more than one type. None of the seven shared types occurred in more than one of the five geographic regions. One type, for example, was found in two Australians. Among Caucasians, another type occurred three times and two more types occurred twice. In New Guinea, two additional types were found three times and the seventh case involved a type found in six individuals.

A histogram showing the number of restriction site differences between pairs of individuals is given in Fig. 1; the average number of differences observed between any two humans is 9.5. The distribution is approximately normal, with an excess of pairwise comparisons involving large numbers of differences. From the number of restriction site differences, we estimated the extent of nucleotide sequence divergence 26 for each pair of individuals. These estimates ranged from zero to 1.3 substitutions per 100 base pairs, with an average sequence divergence of 0.32%, which agrees with that of Brown 17, who examined only 21 humans.
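Two steps sit behind these numbers: averaging restriction-site differences over all pairs of individuals, and converting site differences into sequence divergence. The sketch below makes assumptions on both counts; the paper's own estimator is its reference 26 and is not reproduced in this excerpt. It uses the first-order approximation S ~ exp(-r*d), where S is the fraction of sites two individuals share, r is the recognition-site length (assumed 4 bp), and d is substitutions per base pair. The function names and toy profiles are hypothetical.

```python
import math
from itertools import combinations


def mean_pairwise_differences(profiles):
    """Average number of sites present in one profile but absent in the other.

    profiles: equal-length tuples of 0/1 values (restriction site absent/present).
    """
    pairs = list(combinations(profiles, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def divergence_from_sites(n_x, n_y, differences, site_bp=4):
    """Approximate substitutions per base pair, assuming S ~ exp(-site_bp * d)."""
    shared = (n_x + n_y - differences) / 2      # sites present in both individuals
    s = 2 * shared / (n_x + n_y)                # fraction of sites shared
    return -math.log(s) / site_bp


# Toy presence/absence profiles for three individuals at four sites (invented data).
toy = [(1, 1, 0, 1), (1, 0, 0, 1), (0, 1, 1, 1)]
print(mean_pairwise_differences(toy))            # pairs differ at 1, 2 and 3 sites

# Averages quoted in the text: ~370 sites per individual, 9.5 differences per pair.
d = divergence_from_sites(370, 370, 9.5)
print(f"{d:.2%} sequence divergence")
```

With the text's averages this gives roughly 0.32%, in line with the reported mean divergence, although that agreement depends on the assumed site length.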

Table 1 gives three measures of sequence divergence within and between each of the five populations examined. These measures are related to one another by equation (1):

$\delta_{XY} - \tfrac{1}{2}(\delta_X + \delta_Y) = \delta \qquad (1)$

where $\delta_X$ is the mean pairwise divergence (in percent) between individuals within a single population (X), $\delta_Y$ is the corresponding value for another population (Y), $\delta_{XY}$ is the mean pairwise divergence between individuals belonging to two different populations (X and Y), and $\delta$ is a measure of the interpopulation divergence corrected for intrapopulation divergence. Africans as a group are more variable ($\delta_X$ = 0.47) than other groups. Indeed, the variation within the African population is as great as that between Africans and any other group ($\delta_{XY}$ = 0.40-0.45). The within-group variation of Asians ($\delta_X$ = 0.35) is also comparable to that which exists between groups. For Australians, Caucasians, and New Guineans, who show nearly identical amounts of within-group variation ($\delta_X$ = 0.23-0.25), the variation between groups slightly exceeds that within groups.

When the interpopulational distances ($\delta_{XY}$) are corrected for intrapopulation variation (Table 1), they become very small ($\delta$ = 0.01-0.06). The mean value of the corrected distance among populations ($\delta$ = 0.04) is less than one-seventh of the mean distance between individuals within a population (0.30). Most of the mtDNA variation in the human species is therefore shared between populations. A more detailed analysis supports this view 27.
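The correction in equation (1) subtracts the average within-population divergence from the between-population divergence: corrected = d_XY - (d_X + d_Y) / 2. In this sketch the within-population values for Africans (0.47) and Asians (0.35) come from the text, but pairing them with a between-population value of 0.45 (the top of the quoted 0.40-0.45 range) is an illustrative assumption rather than an entry from Table 1. The function name is hypothetical.

```python
def net_divergence(d_xy, d_x, d_y):
    """Between-population divergence minus the mean within-population divergence."""
    return d_xy - 0.5 * (d_x + d_y)


# Within-population values from the text; d_xy = 0.45 for this pair is an assumption.
africans, asians = 0.47, 0.35
corrected = net_divergence(0.45, africans, asians)
print(f"corrected distance: {corrected:.2f}")
```

A raw between-group distance of 0.45 shrinks to about 0.04 once within-group variation is removed, which illustrates why the corrected distances in the text are so small.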

Figure 2 shows the sequence divergence ($\delta_X$) calculated for each population across seven functionally distinct regions of the mtDNA genome. As has been found before, the most variable region is the displacement loop ($\delta_X$ = 1.3), the major noncoding portion of the mtDNA molecule, and the least variable region is the 16S ribosomal RNA gene ($\delta_X$ = 0.2). In general, Africans are the most diverse and Asians the next most, across all functional regions.

A tree relating the 133 types of human mtDNA and the reference sequence (Fig. 3) was built by the parsimony method. To interpret this tree, we make two assumptions, both of which have extensive empirical support: (1) a strictly maternal mode of mtDNA transmission (so that any variant appearing in a group of lineages must be due to a mutation occurring in the ancestral lineage and not recombination between maternal and paternal genomes) and (2) each individual is homogeneous for its multiple mtDNA genomes. We can therefore view the tree as a genealogy linking maternal lineages in modern human populations to a common ancestral female (bearing mtDNA type a).

Many trees of minimal or near-minimal length can be made from the data; all trees that we have examined share the following features with Fig. 3: (1) two primary branches, one composed entirely of Africans, the other including all 5 of the populations studied; and (2) each population stems from multiple lineages connected to the tree at widely dispersed positions. Since submission of this manuscript, Horai et al. 29 built a tree for our samples of African and Caucasian populations and their sample of a Japanese population by another method; their tree shares these two features.

Among the trees investigated was one consisting of five primary branches with each branch leading exclusively to one of the five populations. This tree, which we call the population-specific tree, requires 51 more point mutations than does the tree of minimum length in Fig. 3. The minimum-length tree requires fewer changes at 22 of the 93 phylogenetically informative restriction sites than does the population-specific tree, while the latter tree requires fewer changes at four sites; both trees require the same number of changes at the remaining 67 sites. The minimum-length tree is thus favoured by a score of 22 to 4. The hypothesis that the two trees are equally compatible with the data is statistically rejected, since 22:4 is significantly different from the expected 13:13. The minimum-length tree is thus significantly more parsimonious than the population-specific tree.
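The 22-to-4 comparison has the shape of a sign test: if the two trees fit equally well, each of the 26 sites that favour one tree would favour either with probability 1/2. This excerpt does not say which test the authors used, so the sketch below assumes an exact two-sided binomial (sign) test built on Python's `math.comb`; the function name is hypothetical.

```python
from math import comb


def sign_test_two_sided(wins, losses):
    """Exact two-sided sign test: chance of a split at least this lopsided
    if each informative site were equally likely to favour either tree."""
    n = wins + losses
    k = max(wins, losses)
    one_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


p = sign_test_two_sided(22, 4)   # 22 sites favour the minimum-length tree, 4 the other
print(f"p = {p:.5f}")
```

The result is well below 0.001, consistent with the statement that a 22:4 split differs significantly from the 13:13 expected under equal fit.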

We infer from the tree of minimum length (Fig. 3) that Africa is a likely source of the human mitochondrial gene pool. This inference comes from the observation that one of the two primary branches leads exclusively to African mtDNAs (types 1-7, Fig. 3) while the second primary branch also leads to African mtDNAs (types 37-41, 45, 46, 70, 72, 81, 82, 111 and 113). By postulating that the common ancestral mtDNA (type a in Fig. 3) was African, we minimize the number of intercontinental migrations needed to account for the geographic distribution of mtDNA types. It follows that b is a likely common ancestor of all non-African and many African mtDNAs (types 8-134 in Fig. 3).

Multiple lineages per race

The second implication of the tree (Fig. 3), that each non-African population has multiple origins, can be illustrated most simply with the New Guineans. Take, as an example, mtDNA type 49, a lineage whose nearest relative is not in New Guinea, but in Asia (type 50). Asian lineage 50 is closer genealogically to this New Guinea lineage than to other Asian mtDNA lineages. Six other lineages lead exclusively to New Guinean mtDNAs, each originating at a different place in the tree (types 12, 13, 26-29, 65, 95 and 127-134 in Fig. 3). This small region of New Guinea (mainly the Eastern Highlands Province) thus seems to have been colonised by at least seven maternal lineages (Tables 2 and 3).

In the same way, we calculate the minimum numbers of female lineages that colonised Australia, Asia and Europe (Tables 2 and 3). Each estimate is based on the number of region-specific clusters in the tree (Fig. 3, Tables 2 and 3). These numbers, ranging from 15 to 36 (Tables 2 and 3), will probably rise as more types of human mtDNA are discovered.

A time scale can be affixed to the tree in Fig. 3 by assuming that mtDNA sequence divergence accumulates at a constant rate in humans. One way of estimating this rate is to consider the extent of differentiation within clusters specific to New Guinea (Table 2; see also refs 23 and 30), Australia 30 and the New World 31. People colonised these regions relatively recently: a minimum of 30,000 years ago for New Guinea 32, 40,000 years ago for Australia 33, and 12,000 years ago for the New World 34. These times enable us to calculate that the mean rate of mtDNA divergence within humans lies between two and four percent per million years; a detailed account of this calculation appears elsewhere 30. This rate is similar to previous estimates from animals as disparate as apes, monkeys, horses, rhinoceroses, mice, rats, birds and fishes. We therefore consider the above estimate of 2%-4% to be reasonable for humans, although additional comparative work is needed to obtain a more exact calibration.

As Fig. 3 shows, the common ancestral mtDNA (type a) links mtDNA types that have diverged by an average of nearly 0.57%. Assuming a rate of 2%-4% per million years, this implies that the common ancestor of all surviving mtDNA types existed 140,000-290,000 years ago. Similarly, ancestral types b-j may have existed 62,000-225,000 years ago (Table 3).
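Under the constant-clock assumption, dating an ancestor is a single division: age = divergence / rate, with the rate expressed as pairwise divergence (percent per million years) as in the text. A minimal sketch using the numbers above; the function name is hypothetical.

```python
def age_years(divergence_pct, rate_pct_per_myr):
    """Time since a common ancestor, assuming divergence grows linearly with time."""
    return divergence_pct / rate_pct_per_myr * 1_000_000


# Average divergence among types descending from ancestor a: ~0.57%.
youngest = age_years(0.57, 4.0)   # faster clock -> younger ancestor
oldest = age_years(0.57, 2.0)     # slower clock -> older ancestor
print(f"{youngest:,.0f} to {oldest:,.0f} years ago")
```

This gives about 142,500 to 285,000 years, which rounds to the 140,000-290,000 range stated in the text.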

When did the migrations from Africa take place? The oldest of the clusters of mtDNA types to contain no African members stems from ancestor c and included types 11-29 (Fig. 3). The apparent age of this cluster (calculated in Table 3) is 90,000-180,000 years. Its founders may have left Africa at about that time. However, it is equally possible that the exodus occurred as recently as 23-105 thousand years ago (Table 2). The mtDNA results cannot tell us exactly when these migrations took place.

Two previous studies of human mtDNA have included African individuals 21,28; both support an African origin for the human mtDNA gene pool. Johnson et al. 21 surveyed 40 restriction sites in each of 200 mtDNAs from Africa, Asia, Europe and the New World, and found 35 mtDNA types. This much smaller number of mtDNA types probably reflects the inability of their methods to distinguish between mtDNAs that differ by less than 0.3% and may account for the greater clustering of mtDNA types by geographic origin that they observed. (By contrast, our methods distinguish between mtDNAs that differ by 0.03%.) Although Johnson et al. favoured an Asian origin, they too found that Africans possess the greatest amount of mtDNA variability and that a midpoint rooting of their tree leads to an African origin.

Greenberg et al. 28 sequenced the large noncoding region, which includes the displacement loop (D loop), from four Caucasians and three Black Americans. A parsimony tree for these seven D loop sequences, rooted by the midpoint method, appears in Fig. 4. This tree indicates (1) a high evolutionary rate for the D loop (at least five times faster than other mtDNA regions), (2) a greater diversity among Black American D loop sequences, and (3) that the common ancestor was African.

Estimates of genetic distance based on comparative studies of nuclear genes and their products differ in kind from mtDNA estimates. The latter are based on the actual number of mutational differences between mtDNA genomes, while the former rely on differences in the frequencies of molecular variants measured between and within populations. Gene frequencies can be influenced by recombination, genetic drift, selection, and migration, so the direct relationship found between time and mutational distance for mtDNA would not be expected for genetic distances based on nuclear DNA. But studies based on polymorphic blood groups, red cell enzymes, and serum proteins show that (1) differences between racial groups are smaller than those within such groups and (2) the largest gene frequency differences are between Africans and other populations, suggesting an African origin for the human nuclear gene pool 11,12,35. More recent studies of restriction site polymorphisms in nuclear DNA 14,36-42 support these conclusions.

Relation to fossil record

Our tentative interpretation of the tree (Fig. 3) and the associated time scale (Table 3) fits with one view of the fossil record: that the transformation of archaic to anatomically modern forms of Homo sapiens occurred first in Africa 43-45 , about 100,000-140,000 years ago, and that all present-day humans are descendants of that African population. Archaeologists have observed that blades were in common use in Africa 80-90 thousand years ago, long before they replaced flake tools in Asia or Europe 46,47 .

But the agreement between our molecular view and the evidence from palaeoanthropology and archaeology should be treated cautiously for two reasons. First, there is much uncertainty about the ages of these remains. Second, our placement of the common ancestor of all human mtDNA diversity in Africa 140,000-280,000 years ago need not imply that the transformation to anatomically modern Homo sapiens occurred in Africa at this time. The mtDNA data tell us nothing of the contributions to this transformation by the genetic and cultural traits of males and females whose mtDNA became extinct.

An alternative view of human evolution rests on evidence that Homo has been present in Asia as well as in Africa for at least one million years 48 and holds that the transformation of archaic to anatomically modern humans occurred in parallel in different parts of the Old World 33,49. This hypothesis leads us to expect genetic differences of great antiquity within widely separated parts of the modern pool of mtDNAs. It is hard to reconcile the mtDNA results with this hypothesis. The greatest divergences within clusters specific to non-African parts of the world correspond to times of only 90,000-180,000 years. This might imply that the early Asian Homo (such as Java man and Peking man) contributed no surviving mtDNA lineages to the gene pool of our species. Consistent with this implication are features, found recently in the skeletons of the ancient Asian forms, that make it unlikely that Asian erectus was ancestral to Homo sapiens 50-52. Perhaps the non-African erectus population was replaced by sapiens migrants from Africa; incomplete fossils indicating the possible presence of early modern humans in western Asia at Zuttiyeh (75,000-150,000 years ago) and Qafzeh (50,000-70,000 years ago) might reflect these first migrations 45,53.

If there was hybridization between the resident archaic forms in Asia and anatomically modern forms emerging from Africa, we should expect to find extremely divergent types of mtDNA in present-day Asians, more divergent than any mtDNA found in Africa. There is no evidence for these types of mtDNA among the Asians studied 21,54-56. Although such archaic types of mtDNA could have been lost from the hybridizing population, the probability of mtDNA lineages becoming extinct in an expanding population is low 57. Thus we propose that Homo erectus in Asia was replaced without much mixing with the invading Homo sapiens from Africa.

Conclusions and prospects

Studies of mtDNA suggest a view of how, where and when modern humans arose that fits with one interpretation of evidence from ancient human bones and tools. More extensive molecular comparisons are needed to improve our rooting of the mtDNA tree and the calibration of the rate of mtDNA divergence within the human species. This may provide a more reliable time scale for the spread of human populations and better estimates of the number of maternal lineages involved in founding the non-African populations.

It is also important to obtain more quantitative estimates of the overall extent of nuclear DNA diversity in both human and African ape populations. By comparing the nuclear and mitochondrial DNA diversities, it may be possible to find out whether a transient or prolonged bottleneck in population size accompanied the origin of our species 15 . Then a fuller interaction between palaeoanthropology, archaeology and molecular biology will allow a deeper analysis of how our species arose.

We thank the Foundation for Research into the Origin of Man, the National Science Foundation and the NIH for support. We also thank P. Andrews, K. Bhatia, F. C. Howell, W. W. Howells, R. L. Kirk, E. Mayr, E. M. Prager, V. M. Sarich, C. Stringer and T. White for discussion and help in obtaining placentas.


Craig Venter’s Synthetic Genome 3.0 Evokes Classic Experiments

J. Craig Venter and his colleagues at Synthetic Genomics Inc update their efforts to create a "hypothetical minimal genome" in this week's Science.

"JCVI-syn3.0," or syn3.0 for short, is about 531,000 base pairs organized into 473 genes, serially transplanted into cells of the tiny, fast-replicating Mycoplasma mycoides and M. capricolum. The first synthetic genome in the series, JCVI-syn1.0, has just over a million base pairs, and JCVI-syn2.0 has 576,000. DNA Science covered them in an earlier post.

Creation of syn3.0 will inspire future synthetic biology efforts, but reminds me of two of my all-time favorite experiments, from more than half a century ago.

Ingram and the Sickle Cell Mutation

The researchers created syn3.0 in steps that they call “design-build-test.” First they synthesized 8 genome pieces based on an extensive literature review of genes thought to be essential, and mass-produced the pieces in yeast cells. Then they placed the pieces, one type at a time, into M. mycoides carrying the other seven-eighths of syn1.0. They transferred that reconstituted genome into M. capricolum, a step that jettisons the host genome, then deployed naturally occurring mobile genetic elements (transposons) to insert into, and thereby destroy, just one gene at a time.

If a cell can’t survive with a specific gene harpooned, then that gene is deemed essential, at least initially. Genes are classified as essential, non-essential, or quasi-essential. Quasi-essential genes are required for the organism to grow well enough to study, but not to live.
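The three-way classification above can be sketched as a simple rule applied to each knocked-out gene. This is a toy, not the researchers’ actual scoring. It assumes a hypothetical `relative_growth` value per knockout (1.0 = normal growth, 0.0 = no surviving cells), and the `slow_cutoff` of 0.5 that separates “quasi-essential” from “non-essential” is likewise an invented illustration.

```python
from enum import Enum


class Essentiality(Enum):
    ESSENTIAL = "essential"
    QUASI_ESSENTIAL = "quasi-essential"
    NON_ESSENTIAL = "non-essential"


def classify(relative_growth: float, slow_cutoff: float = 0.5) -> Essentiality:
    """Classify one knocked-out gene from the growth of the mutant cells.

    relative_growth: hypothetical growth of the knockout relative to the intact
    strain (1.0 = normal, 0.0 = no surviving cells).
    slow_cutoff: hypothetical threshold below which growth is "too poor to study".
    """
    if relative_growth <= 0.0:
        return Essentiality.ESSENTIAL        # cells cannot survive without the gene
    if relative_growth < slow_cutoff:
        return Essentiality.QUASI_ESSENTIAL  # alive, but grows too poorly to work with
    return Essentiality.NON_ESSENTIAL        # loss of the gene is tolerated


# Hypothetical knockout results for three unnamed genes.
screen = {"gene_1": 0.0, "gene_2": 0.2, "gene_3": 0.95}
for gene, growth in screen.items():
    print(gene, classify(growth).value)
```

Any real screen would set its cutoffs from measured growth data. The point here is only that “quasi-essential” is a middle band between death and normal growth.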

The “design-build-test” strategy reminds me of Vernon Ingram’s famous shortcut to discover the molecular basis of sickle cell disease. Rather than determining the sequence of all 146 amino acids that make up beta globin, a painstaking task in 1958, he cut the protein into peptides, then ran the pieces in an electrical field (electrophoresis). One peptide moved to a different position in people with sickle cell disease; carriers had corresponding fragments that migrated to the disease position as well as to the normal position. Ingram then deduced which piece included the single-base mutation that causes the disease, so he had only to decipher the 8 amino acids of that fragment. It was a little like searching within a sentence for an error rather than checking an entire document.
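Ingram’s shortcut can be mimicked in a few lines: digest the normal and sickle proteins the same way, then compare the pieces pairwise. This is a minimal sketch under stated assumptions. It uses a simplified trypsin rule (cut after lysine K or arginine R, but not before proline P), which is the enzyme usually used for this kind of peptide mapping. It also uses only the first 30 residues of beta globin rather than all 146. The Glu6Val substitution removes a negatively charged glutamate, which is why the mutant peptide migrates differently in an electric field.

```python
from __future__ import annotations

import re

# Residues 1-30 of normal human beta globin (one-letter code); the full chain has 146.
NORMAL = "VHLTPEEKSAVTALWGKVNVDEVGGEALGR"
# Sickle-cell beta globin: glutamate 6 (E, index 5 here) is replaced by valine (V).
SICKLE = NORMAL[:5] + "V" + NORMAL[6:]


def trypsin_digest(protein: str) -> list[str]:
    """Cut after K or R unless the next residue is P (simplified trypsin rule)."""
    # Zero-width split points; empty strings (e.g. after a final R) are dropped.
    return [piece for piece in re.split(r"(?<=[KR])(?!P)", protein) if piece]


def changed_peptides(a: str, b: str) -> list[tuple[str, str]]:
    """Pair up the peptides of two digests and keep only the pairs that differ."""
    return [(x, y) for x, y in zip(trypsin_digest(a), trypsin_digest(b)) if x != y]


print(trypsin_digest(NORMAL))
print(changed_peptides(NORMAL, SICKLE))
```

Only one 8-residue peptide (VHLTPEEK versus VHLTPVEK) differs, so only that fragment needs sequencing.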

Getting Rid of Redundancies

Syn3.0 is essentially syn2.0 minus 42 genes. Part of the reduction came from sorting out redundant genes: a gene may be dispensable as long as a partner gene is present, and that dependency isn’t obvious until the partner is deleted too. Dr. Venter, in a teleconference yesterday, attributed an analogy to his colleague Hamilton Smith: “If you know nothing about airplanes and look at a 757 and find the functions of parts by removing an engine from the right wing, the plane can still fly and land, so you say the second engine is non-essential and don’t discover essentiality until you remove the second one.” Looking at genomes piece-by-piece can miss redundancy and dual dependency, and may have contributed to initial overestimates of essential genes in various genome projects.
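The airplane analogy can be turned into a toy genome to show why one-at-a-time knockouts miss redundancy. In this sketch the gene and function names are hypothetical, and “viable” simply means every function keeps at least one working gene. A single-knockout screen flags only the single-copy gene. Checking pairs reveals the redundant pair that is lethal only when both members are lost.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical toy genome: each function works if ANY gene assigned to it survives.
FUNCTIONS = {
    "thrust": {"engine_left", "engine_right"},  # redundant pair, like the 757's engines
    "lift": {"wing"},                           # single-copy function
}
GENES = sorted(set().union(*FUNCTIONS.values()))


def viable(deleted: set[str]) -> bool:
    """The organism survives if every function keeps at least one gene."""
    return all(genes - deleted for genes in FUNCTIONS.values())


# One-at-a-time screen: only genes whose single loss is lethal look essential.
single_essential = [g for g in GENES if not viable({g})]

# Pairwise screen: genes that are each dispensable alone but lethal together.
redundant_pairs = [
    (a, b)
    for a, b in combinations(GENES, 2)
    if viable({a}) and viable({b}) and not viable({a, b})
]

print(single_essential)
print(redundant_pairs)
```

Real genomes have far more genes and messier dependencies. Even so, the number of pairs grows quickly, which is one reason such redundancies surfaced only through iterative design-build-test cycles.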

The result of the whittling, syn3.0, is “a working approximation to a minimal cell.” But that cell with its adopted genome may be somewhat coddled, given all the small molecules it could possibly require in the laboratory, compared to its natural niche as a goat pathogen. In the real world, with environmental constraints and challenges, genomes are larger than the hypothetical minimum. In fact the researchers chose Mycoplasma because their hosts supply nearly all nutrients, enabling them to naturally survive with minimal genomes. Explained Dr. Venter, “Every genome is context-specific. It depends on what is in the environment available to it. There is no true minimal genome without defining context and phenotype.”

So which genes does syn3.0 use? Nearly half take care of gene expression (transcription and translation) or “preservation of genome information” (DNA replication, topology, repair, metabolism and cell division). The rest are involved in cell membrane structure and function and controlling the composition of the cytoplasm. Most important was that the functions of 31.5% of the genes (149 of 473) aren’t known. Many, however, are highly conserved (found in other organisms), suggesting that they are essential. The usually unflappable Dr. Venter sounded astonished. “Knowing we’re missing a third of our fundamental knowledge is a key finding even if syn3.0 has no other uses,” he said.

Why design-build-test synthetic genomes? The strategy may be used someday to design drugs or industrial chemicals. But I don’t think the usual “why the heck is it useful?” journalistic requirement need apply here, for the entire concept of synthetic biology tackles the question that draws many of us to biology: what, exactly, is a living organism?

Thanks, Mom, for fostering a young biologist.

For as long as I can remember, I’ve wondered what life is and how it differs from, say, a rock. This fascination led me to bring home all sorts of creatures, dead and alive, from my adventures in the wilds of Brooklyn. I’ll never forget my mother bursting into the office of a paleontologist at the American Museum of Natural History, asking him to identify the precious fossils that I kept in a shoebox (and still have). That I would become a biologist seemed evident from earliest childhood, a trajectory that narrowed to genetics once I learned that DNA directs everything.

In college, my interest went from collecting, comparing, and observing to experimenting. In cell biology lab, we recreated the classic 1953 Miller experiment, and I was hooked.

Stanley Miller was a 23-year-old graduate student in biochemistry when he combined components of a possible prebiotic atmosphere (methane, ammonia, hydrogen, and water) in a glass vessel and added a spark. The result, after many variations on the primordial soup theme: amino acids. He’d brewed organic molecules of life from simple chemical compounds in the environment, given a jolt of energy. Miller’s adviser Harold Urey took his name off the paper they submitted to Science so that the young man would receive the credit.

Headlines proclaimed that Mr. Miller had created life, as I suspect they might about syn3.0. “People made jokes. They suggested that I’d grown a rat or a mouse in there!” he told me. (My husband in fact stuck a frog into a friend’s Miller experiment re-enactment in a lab class.) Dr. Miller died in 2007; I interviewed him for a little-known book published in 2001 called Discovery: Windows on the Life Sciences. (Amazon even has the title wrong, and I think I own all 8 copies that sold.) It’s a curious coincidence that Miller’s work was done in the very year that Watson and Crick described the structure of DNA.

Shortly after graduate school, my career path veered unexpectedly to writing textbooks. My favorite, on human genetics, has chronicled Dr. Venter’s 20-year quest to recreate an early genome. So while new websites, blogs, and online magazines devoted to genomics seem to be debuting daily, with DNA news reverberating around the Internet and dozens of versions of the same stories appearing simultaneously everywhere, I’ve got history sitting right here on my archaic shelf of textbooks. For JCVI-syn3 didn’t come out of thin air. I was thrilled when at the teleconference Dr. Venter began with the same story, of the 20-year effort, that I’ve chronicled in my textbook editions, even noting his “time off” to sequence the first human genome!

Mycoplasma genitalium, the free-living organism with the smallest known genome.

When the first edition of my textbook was published in 1994, sequencing “the” human genome was just getting underway. A table listed genome sizes, the smallest, E. coli, at about 4.8 million base pairs. By edition 2 in 1997, the table included Haemophilus influenzae at the low end, with 1.8 million base pairs. Edition 3, from 1999, began with Mycoplasma genitalium, “the smallest known genome of any free-living organism.” Its streamlined 582,970 base pair genome would inspire Dr. Venter to begin deriving the first synthetic genome a decade later.

Edition 4, published in 2001, arrived shortly after the official debut of the first draft human genome sequence, announced with much fanfare at the White House. I’d learned the carefully orchestrated news months earlier, while doing the final edits. But because having a human genome sequence or two gracing the covers of Science and Nature didn’t mean understanding anything about what’s actually in those genomes, my textbook still had room for non-humans. I expanded the table of genomes.

I wrote a new section called “The Minimum Set of Genes Required for Life,” giving Mycoplasma genitalium its own table. Of the 480 genes, Dr. Venter estimated that 265 to 350 were essential to life. Then in edition 5 I gave the minimal genome its own boxed reading treatment. In the midst of all the human genome news, I remained more interested in the simplest sets of genetic instructions, for that is where understanding lies.

As more human genome sequences rolled in (James Watson, Craig Venter, then a series of ethnic “firsts” followed by celebrities, curious researchers, and journalists), my textbook, because it has “human” in the title, shrank coverage of other species. Like natural selection, instructors dictate what stays and goes in a textbook. But I kept the littlest genome, adding in edition 6 (2005): “Taking cues from the tiny Mycoplasma genome, Venter’s current research group is attempting to build a synthetic genome …”

Finally in 2010, edition 10, came Mycoplasma mycoides JCVI-syn1.0: “Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome.” The researchers even stitched their names like watermarks into the reconstituted 1,077,947 base pair genome, using a lexicon of DNA triplets corresponding to letters of the alphabet, to distinguish synthetic life from the old kind.
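The idea of a triplet lexicon for watermarks can be sketched as a lookup table between letters and three-base “words.” The table below is a hypothetical stand-in, not the code the JCVI team actually used. It simply assigns the 26 letters plus a space, in order, to the first 27 of the 64 possible DNA triplets.

```python
from itertools import product

# Hypothetical lexicon (NOT the code JCVI actually used): the 26 letters plus a space
# are assigned, in order, to the first 27 of the 64 DNA triplets (AAA, AAC, AAG, ...).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
TRIPLETS = ["".join(bases) for bases in product("ACGT", repeat=3)]
LEXICON = dict(zip(ALPHABET, TRIPLETS))
REVERSE = {triplet: letter for letter, triplet in LEXICON.items()}


def encode(text: str) -> str:
    """Spell text as a run of DNA triplets, one triplet per character."""
    return "".join(LEXICON[ch] for ch in text.upper())


def decode(dna: str) -> str:
    """Read the DNA three bases at a time and map each triplet back to its character."""
    return "".join(REVERSE[dna[i:i + 3]] for i in range(0, len(dna), 3))


watermark = encode("J Craig Venter")
print(watermark)
print(decode(watermark))  # the name comes back in upper case
```

A real watermark would also have to be placed so that it does not disrupt the genes around it. This sketch ignores that and shows only the encoding step.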

Here is Dr. Venter’s take on the gestation of his namesake. “My co-author Ham Smith, who is 85 and has been involved in this work 20 years after most people would have retired, and Clyde Hutchison III (first author) and I were discussing philosophically the differences among genomes, and the only way to answer basic questions about life is to get to a minimal genome, and the only way to get to that would be to synthesize a minimal genome. That started our 20 year quest. We were briefly interrupted by Ham and I taking time off to sequence the first human genome, but we came back in earnest in 2002.”

I’m writing the twelfth edition of my textbook now, and am happy to include JCVI-syn3.0 (and whatever else follows before my deadline). Of course without a time machine we can’t ever truly know how life on Earth began, and that’s why Dr. Venter stressed the importance of the word “a” before “minimal bacterial genome” in the paper’s title.

Life may have started in many ways, at many times, as complex collections of self-replicating and changeable chemicals coalesced and polymerized, perhaps using clays or minerals as templates, then knitting themselves fatty protective coverings. Somewhere between the prebiotic simulations and creation of synthetic genomes lie the answers. And that’s why glimpsing even one possible scenario for a first genome remains, to me, the most exciting type of experiment in biology.


Difference Between Human and Neanderthal

Human vs Neanderthal

Humans and Neanderthals differ in height, build and morphological features. Neanderthals were, on average, shorter than modern humans but more heavily built, and the two differ significantly in form and structure, especially in their skulls and teeth.

Another significant difference between humans and Neanderthals lies in their DNA. Fossil and archaeological evidence indicates a clear separation between Neanderthals and modern Homo sapiens, and Neanderthals are often treated as a separate species. It has been proposed that Neanderthals had a higher larynx than modern humans, and their brains were, on average, slightly larger than those of Homo sapiens.

There are notable physical differences between humans and Neanderthals: Neanderthals had thicker bones, shorter limbs, an asymmetrical humerus, a barrel chest and thicker metacarpals.

Neanderthals also differed from humans in development, notably in craniodental development: differences in the face and teeth are present even before birth. The two species also occupied different, though overlapping, spans of time. Neanderthals were probably much stronger than modern humans, and they lived in the cold climates of Europe and western Asia.

Neanderthals were a genetically homogeneous group and are not usually regarded as direct ancestors of modern humans, although the differences between humans and Neanderthals are small compared with the differences between either of them and the apes. Neanderthals had a small population in the relatively recent past and limited genetic diversity. The lack of clear hybrids in the fossil record, and of obvious Neanderthal features in modern humans, was once taken to mean that they went extinct leaving no descendants. However, genome sequencing has shown that they interbred with modern humans: present-day people of non-African ancestry carry roughly 1 to 2% Neanderthal-derived DNA. Neanderthal development was in some respects more similar to that of earlier Homo and the apes than to that of modern humans, and Neanderthal children appear to have grown from infancy to adulthood faster than human children do.

Humans share anatomical, physiological and biochemical similarities with other animals. As humans are made from pre-existing material, as the Bible says, they share much of their basic body plan, the way it works, and the underlying chemical pathways and machinery of the body with other mammals, especially Neanderthals and other primates. The traits usually cited as setting humans apart from other primates, such as a large brain, bipedalism, smaller back teeth and advanced culture, were largely shared by Neanderthals, whose brains were on average slightly larger than ours.

1. Humans and Neanderthals differ markedly in body structure, including height and build.

2. Neanderthals are not usually regarded as direct ancestors of humans; they were a genetically homogeneous group, though they did interbreed with modern humans.

3. It has been suggested that humans and Neanderthals differed in eyesight, hearing or smell, based on differences in their skeletons.

4. Neanderthals and humans have many differences in their DNA.

5. Humans and Neanderthals do not seem to have differed greatly in behavior or cultural abilities, but Neanderthal fossil braincases differ in shape from those of modern humans.


DNA Template Could Explain Evolutionary Shifts

Rearrangements of all sizes in genomes, genes and exons can result from a glitch in DNA copying that occurs when the process stalls at a critical point and then shifts to a different genetic template, duplicating and even triplicating genes or just shuffling or deleting part of the code within them, said researchers from Baylor College of Medicine in a recent report in the journal Nature Genetics.

The report further elucidated the effect of the fork stalling and template switching mechanism involved in some forms of copy number variation.

"I think this is going to make people think very hard about copy number variation with respect to genome evolution, gene evolution and exon shuffling," said Dr. James R. Lupski, vice chair of molecular and human genetics at BCM and senior author of the report.

The mechanism not only represents a newly discovered way in which the genome generates copy number variation among genes, but also shows that copy number variation can arise during mitosis, the ordinary cell division in which one cell becomes two. The DNA is copied just before such a division, and it is during that copying that the glitch occurs.

Copy number variation involves structural changes in the human genome that result in the deletion of genes or parts of them or extra copies of genes. Often, this process is associated with disease or with evolution of the genome itself.

DNA (deoxyribonucleic acid) exists as two complementary strands held together by hydrogen bonds between their paired bases. A, or adenine, always pairs with T, or thymine. C, or cytosine, always pairs with G, or guanine.
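The pairing rule above is enough to write out the partner of any strand. This is a minimal sketch. One detail not spelled out above is that the two strands run in opposite directions (they are antiparallel). For that reason the partner strand, read in its own conventional 5'-to-3' direction, is the reverse of the base-by-base complement.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base partner: A pairs with T, C pairs with G."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """The partner strand read in its own 5'->3' direction (strands are antiparallel)."""
    return complement(strand)[::-1]


print(complement("ATGC"))
print(reverse_complement("ATGC"))
```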

When a cell divides, each of the two resulting cells must receive the same genetic code, so the cell must first replicate its DNA. During this process, an enzyme called a helicase separates the two strands, breaking the hydrogen bonds between the A-T and G-C base pairs. The point where the strands separate is called the replication fork. On one strand (the leading strand), an enzyme called DNA polymerase reads the genetic material as a template and makes a continuous complementary strand to pair with it, again following the A-to-T and C-to-G rule. On the other (lagging) strand, the complementary strand is made in short, separate segments by a process that involves short RNA primers and a series of enzymes.

Until the 1990s, researchers studying the causes of genetic mutations or changes looked at molecular "typos" in this process: tiny changes in the As, Ts, Cs or Gs, such as single nucleotide polymorphisms (SNPs), which can change the message of a gene. However, in the early 1990s, Lupski was one of the early champions of a newly discovered mechanism in which large stretches of DNA are duplicated or deleted, changing the number of copies of a gene in the genome. This "copy number variation" wrote a new chapter in the understanding of human genetic variation.
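The contrast between the two kinds of variation can be made concrete with toy sequences. An SNP changes a letter but leaves the length alone. A copy number variant changes how many times a whole stretch appears. This is only an illustration. The gene sequence `GATTACA` and the flanking bases are hypothetical, and real copy number calls are made from read depth or array data, not from simple string counting.

```python
from __future__ import annotations


def snp_positions(reference: str, sample: str) -> list[int]:
    """Positions where two aligned, equal-length sequences carry different bases."""
    if len(reference) != len(sample):
        raise ValueError("this toy SNP comparison needs equal-length sequences")
    return [i for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


def copy_number(genome: str, segment: str) -> int:
    """Count non-overlapping occurrences of a segment (a crude copy-number call)."""
    return genome.count(segment)


GENE = "GATTACA"                           # hypothetical gene sequence
reference = "CCC" + GENE + "CCC"
snp_sample = "CCCGATTTCACCC"               # one base changed inside the gene
cnv_sample = "CCC" + GENE + GENE + "CCC"   # the whole gene duplicated

print(snp_positions(reference, snp_sample))
print(copy_number(reference, GENE), copy_number(cnv_sample, GENE))
```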

In a previous report, Lupski and colleagues described how the process that copies DNA during cell division stalls when there is a problem with the genetic material. In some cases, the process seeks a different template, often copying another similar but significantly different stretch of DNA before it switches back to the appropriate area.

In this newer report, Lupski and colleagues describe how this process (called fork stalling and template switching, or FoSTeS, in humans, and microhomology-mediated break-induced replication, or MMBIR, in simpler models) generated genomic rearrangements ranging in size from several megabases to a few hundred base pairs during normal cell division. The result was duplications or even triplications of individual genes, or rearrangements of single exons (the coding regions of genes).
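A toy string model shows how template switching can yield both duplications and deletions. It is a deliberate simplification. Strand direction and the enzymology are ignored, a single template molecule stands in for the “different template,” and the microhomology is modeled as an exact match of the last few copied bases. After stalling, copying resumes just past another occurrence of those bases. Jumping back produces a duplication, and jumping forward produces a deletion. The function name and sequences are hypothetical.

```python
def switch_template(template: str, stall: int, k: int, start: int, end: int) -> str:
    """Toy fork stalling and template switching.

    Copy template[:stall], then treat the last k copied bases as a microhomology:
    look for them in template[start:end] and resume copying just after that match.
    """
    copied = template[:stall]
    micro = copied[-k:]
    hit = template.find(micro, start, end)
    if hit == -1:
        raise ValueError("no microhomology found in the search window")
    return copied + template[hit + k:]


# Two copies of the hypothetical microhomology CGT, separated by TTTT.
TEMPLATE = "AAAA" + "CGT" + "TTTT" + "CGT" + "GGGG"

# Stall after the second CGT and switch back to the first: TTTTCGT is copied twice.
duplication = switch_template(TEMPLATE, stall=14, k=3, start=0, end=11)
# Stall after the first CGT and switch ahead to the second: TTTTCGT is skipped.
deletion = switch_template(TEMPLATE, stall=7, k=3, start=7, end=len(TEMPLATE))

print(duplication)
print(deletion)
```

Repeating the backward jump would give a triplication. Choosing a switch point inside a gene would shuffle or delete part of it, which mirrors the range of outcomes described in the report.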

"This phenomenon occurs throughout the genome," said Dr. Feng Zhang, a postdoctoral associate in Lupski's laboratory and the first author of the report.

In studies of subjects with abnormalities in PMP22, the gene associated with Charcot-Marie-Tooth disease type 1A, the researchers found that fork stalling and template switching explained the changes, from those that involved triplication of a gene to others that resulted from shuffling within an exon.

Studies of one family (two children and their mother) showed that the event occurred during mitosis, or cell division, a finding that underscores its significance.

The researchers noted that finding this mitotic rearrangement in the unaffected mother of two children with a neuropathy suggests that the mechanism might be considered in genetic counseling about the risk of having another affected child.

The scientists wrote, "We propose that FoSTeS/MMBIR may be a key mechanism for generating structural variation, particularly nonrecurrent CNV (copy number variation), of the human genome."

The observation of mosaicism for an apparent mitotically generated, FoSTeS/MMBIR-mediated complex PMP22 rearrangement in the unaffected mother of two children with neuropathy suggests this mechanism can have implications for genetic counseling regarding recurrence risk.

Others who took part in this research include Mehrdad Khajavi of BCM, Anne M Connolly of the Washington University School of Medicine in St. Louis, Mo., and Charles F Towne and Sat Dev Batish of Athena Diagnostics in Worcester, Mass.

Funding for this work came from the Charcot-Marie-Tooth Association and the National Institute of Neurological Disorders and Stroke.

Story Source:

Materials provided by Baylor College of Medicine. Note: Content may be edited for style and length.



Next lecture, we shall be talking about speciation. So here we need to cover the nature of species, and discuss whether and how speciation differs from microevolution (evolution within species):

- What are species?
- How do species differ from each other?
- How many species are there? We will briefly cover species-level biodiversity.


Species "concepts" - What are species?

Darwin in 1859 proved to the world (the reasonable part of it, anyway!) that species had evolved rather than been created. But this made for a difficulty. All of a sudden species weren’t created kinds with an Aristotelian essence, as previously thought. It then became unclear how species differed, if at all, from other categories. Species evolve from non-species, so where is the dividing line? Darwin had a hard time with this one, because if species didn’t exist, he could hardly write a book on their origin, could he?!

Darwin’s resolution of this conundrum was to use a pragmatic definition of species - sometimes dismissively called the morphological species concept, in which species were distinguished from races and polymorphic forms by drawing a suitable dividing line in the actual continuum between species and races or forms.

It would be nice to say that there the story ends. Unfortunately, it doesn’t. Species concepts have been for the last 10-20 years a major battleground for systematists, philosophers of biology, and evolutionists. My own view is that Darwin was thinking more clearly than many of the modern contestants, even though his theory of genetics was patently wrong. But not many agree with me (yet!). I will therefore attempt to give you a fairly balanced assessment.

So here are just some of the leading "species concepts", and their strengths and weaknesses.

1) The morphological species concept (phenetic species concept also included)

According to Darwin, species can simply be diagnosed by morphological gaps in the variation between individuals (see diagram above, where the line separates two morphological clusters of individuals). For instance, Darwin regarded Primula veris (the primrose) and Primula elatior (the cowslip) as varieties of the same species because many intermediates or hybrids are found between them. He argued in the same way that the many races of humans were members of the same species. In these cases, it is not easy to find a sensible place to put a dividing line, even though there are clear differences between the forms. Darwin’s ideas were revived by numerical taxonomists in the 1960s, who introduced a multivariate statistical version of the idea, known today as the phenetic species concept.
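Darwin’s criterion of a morphological gap can be illustrated with a one-trait simplification of the multivariate phenetic approach. The sketch sorts measurements and draws the dividing line at the widest gap. The measurements are hypothetical. The method can find a gap but cannot say whether it separates species or merely morphs within one species, which is exactly problem (a) below.

```python
from __future__ import annotations


def split_at_largest_gap(values: list[float]) -> tuple[list[float], list[float]]:
    """Sort one trait's measurements and cut the sample at the widest gap."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    xs = sorted(values)
    # (gap size, index of the value just below the gap) for each neighbouring pair
    gaps = [(xs[i + 1] - xs[i], i) for i in range(len(xs) - 1)]
    _, cut = max(gaps)
    return xs[: cut + 1], xs[cut + 1:]


# Hypothetical measurements (arbitrary units) from two forms with no intermediates.
sample = [10.1, 10.4, 9.8, 10.0, 14.9, 15.2, 15.0]
lower, upper = split_at_largest_gap(sample)
print(lower, upper)
```

With many intermediates, as in Darwin’s primroses, no gap stands out and any cut is arbitrary.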

However, Darwin’s ideas do lead to some problems:

a) Variation within species sometimes leads to morphological gaps. For instance, we have seen that races, subspecies, populations and even morphs within populations are often discrete (i.e. the variation is discontinuous, there are gaps). Nowadays, we would certainly not classify the melanic form of the peppered moth as a different species just because the variation is not continuous.

b) Lack of differences between species: There are often sibling species which (a) are morphologically more or less identical, although genetically different, (b) evolve more or less separately, (c) have little or no hybridization or gene flow between them. Some examples are:

  • willow warbler and chiffchaff in the UK, which sing different songs
  • Drosophila fruitflies: D. pseudoobscura and D. persimilis, which differ chromosomally
  • Anopheles mosquitoes, which differ in habitat, biting propensity, and whether they carry malaria

2) The biological species concept

Difficulties with Darwin's concept tempted a number of people to try to redefine species by means of interbreeding. These ideas were first put forward clearly by an entomologist, E.B. Poulton, in 1903. Later, Dobzhansky (1937) and, most famously, Mayr (1940, 1942, 1954, 1963, 1970 etc. etc.) carried on and popularized this tradition; it was Mayr who named the idea the "biological species concept", thereby unfairly trying to take the high moral ground, because anyone else’s species concept was thereafter, of course, "non-biological"!

The biological species concept allows for abundant gene flow within each species, but a lack of hybridization or gene flow between species. The lack of gene flow is caused by isolating mechanisms, a term invented by Dobzhansky, but again popularized by Mayr. Because they are not necessarily "mechanisms" in any sense, I prefer the term "reproductive isolation":

Types of reproductive isolation
A) Pre-mating isolation (or pre-zygotic isolation)
a) Ecological or seasonal isolation - mates do not meet
b) Behavioural (biochemical) isolation - individuals meet but do not attempt mating
c) Mechanical isolation - attempts at mating do not work!

B) Post-mating (or post-zygotic) isolation
d) Gametic incompatibility - gametes die before fertilization (note: this is post-mating but pre-zygotic)
e) Hybrid inviability - hybrids have reduced viability as zygotes or later in development. This may be caused by internal (genomic) factors, or because hybrids are not suited to survival for ecological reasons. Hybrids may also have reduced mating propensity, or be disfavoured as mates.
f) Hybrid sterility - hybrids survive and mate as normal, but are partially or completely sterile.
g) Sexual selection against hybrids (studied by Russ Naisbit, a PhD student in my laboratory) - hybrids are healthy and fertile, but disfavoured during mating.
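Because the barriers listed above act in sequence, each can only block the gene flow that earlier barriers let through. One way to see how they add up is a multiplicative bookkeeping sometimes used in the speciation literature: each barrier of strength s blocks a fraction s of whatever remains. This is an illustration under assumed numbers. The barrier strengths below are hypothetical, and the scheme is one convention among several.

```python
from __future__ import annotations


def total_isolation(
    barriers: list[tuple[str, float]],
) -> tuple[float, list[tuple[str, float]]]:
    """Combine sequential barriers multiplicatively.

    Each barrier strength s (0..1) blocks a fraction s of the potential gene
    flow that got past all earlier barriers. Returns the total isolation and
    each barrier's absolute contribution, in life-cycle order.
    """
    remaining = 1.0  # fraction of potential gene flow still possible
    contributions = []
    for name, strength in barriers:
        blocked = strength * remaining
        contributions.append((name, blocked))
        remaining -= blocked
    return 1.0 - remaining, contributions


# Hypothetical strengths, listed in the order the barriers act.
barriers = [("seasonal", 0.5), ("behavioural", 0.6), ("hybrid sterility", 0.9)]
total, parts = total_isolation(barriers)
print(round(total, 3), [(name, round(c, 3)) for name, c in parts])
```

Note how a strong late barrier (hybrid sterility here) contributes less in absolute terms than its strength suggests, because earlier barriers have already removed most opportunities for gene flow.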

Problems with the biological species concept
a) Does not apply in allopatry. Strictly, the biological species concept only works in sympatry and parapatry, because how can we tell whether two species would intercross if they are allopatric? We can put them together to see if they interbreed, but many sympatric species of ducks, Drosophila, even tigers and lions will interbreed in captivity, though they rarely, if ever, do in the wild.

So when two species are allopatric, we have to guess from their traits (morphology, behaviour, genetics) and from their behaviour in captivity, if possible, whether they would interbreed if they were in sympatry. Not very scientific? This is a problem with all definitions of species that propose a single fundamental essence of species. Given that species originate by evolution, species identity is bound to be more dubious the more time that they have been diverging. Thus species are bound to become less real and more difficult to classify with increasing spans of space (in geography) or time (in the fossil record).

b) Natural hybridization and gene flow between species exists.

Around 10% of bird and butterfly species produce hybrids in the wild, although each species usually does so very rarely (maybe 1/1000 or less). Ducks and birds of paradise seem particularly prone to hybridization in the wild (>50% of species), even though most of the time they seem like "good" species. Fewer mammals probably hybridize: in Europe, only about 6% of species are known to form hybrids in the wild. However, one of these is the world’s biggest animal ever (it beats the dinosaurs hands down): the blue whale has been recorded hybridizing with its near relative, the fin whale. Not only that, a female hybrid between these two species has been found with a healthy foetus, genetically a backcross. Plants are especially well known for their tendency to hybridize (probably well over 20% of species), and hybridization is even a major source of speciation by allopolyploidy in this group (see below). For this reason, the biological species concept has never really caught on with botanists.

Hybridization would not matter if genes did not pass between species via hybridization. But we now know that genes DO pass between species, and many species have received genes, or whole mitochondrial genomes, from other species. In some cases, flowering plants have even adopted genes from symbiotic bacteria. DNA sequencing has now revealed many, many examples of this kind of horizontal gene transfer between species. Hybridization and gene transfer are today very important topics in conservation and economic biology.

Although the biological species concept has long been accepted by many evolutionary biologists (especially zoologists) as the best species concept, these kinds of problems have led to increasing attacks. Several possible solutions have been proposed.

3) Ecological species concept

Leigh Van Valen, in the 1970s, suggested that species were better defined by the types of selection they underwent, or by their ecological niche. Real species, argued Van Valen, are ecologically different. However, there are problems with this idea:

a) It is at least theoretically possible that some kinds of sibling species might have exactly the same niches. Eventually, this would lead to a probable loss of one of the species through competition, so this problem is perhaps more theoretical than actual.

b) The worst problem for this idea is that species often contain distinct ecological morphs. The cichlid fish Cichlasoma from Cuatro Ciénegas, Mexico, has multiple morphs that do different things:

  • one is bottom living, has grinding molariform teeth, and feeds on molluscs
  • another is pelagic, has sharp teeth, and feeds on fish
  • a third has rounded teeth and feeds on algae and detritus

4) Cladistic and phylogenetic species concepts

Recently, most systematists have favoured phylogenetic systematics, in which classifications are based on cladistic (branching) relationships. The cladistic movement was founded by Willi Hennig in the 1950s. If higher taxa are defined by means of phylogeny, then so should species be, reasoned cladists. This has led to a plethora of cladistic and phylogenetic species concepts. One idea, based on Willi Hennig’s own, and supported by Ridley among others, is a cladistic species concept:

According to Hennig and Ridley, species are branches in a lineage. When the lineage branches, two new species arise out of the old one, as above, where 5 species result from a phylogeny with two branching (speciation) events. Although there is a morphological discontinuity within the history of species 2, this does not mean the upper and lower portion of species 2 are different species, unless a new branch (in grey) originates at that point. The virtue of this idea to its proponents is that it should apply in history, to fossils, as well as to modern species.

Unfortunately, there are problems:

a) In practice, phylogenies are unstable hypotheses rather than facts. The branching pattern must be known in order to define species, so cladistic species may be somewhat arbitrary. Suppose the grey branch was unknown, then was suddenly discovered in a small modern population. Now, suddenly, fossil species 2 must be reclassified into two separate species, even though a continuous record of those forms was previously well known. Worse, if the grey lineage has no fossil record, we don’t know where species 2 must be divided.
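The Hennigian rule (every lineage segment between branch points is one species) and the reclassification problem in (a) can be sketched with a small tree structure. The labels loosely follow the prose (five species from two branching events, then a newly discovered grey branch off species 2), but the exact topology is an assumption, since the original diagram is not reproduced here. `Lineage`, `species_names` and `add_side_branch` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Lineage:
    """One lineage segment; under the cladistic convention it is one species."""
    name: str
    daughters: list[Lineage] = field(default_factory=list)


def species_names(root: Lineage) -> list[str]:
    """List every lineage segment (= cladistic species) in the tree, depth first."""
    names = [root.name]
    for daughter in root.daughters:
        names.extend(species_names(daughter))
    return names


def add_side_branch(lineage: Lineage, branch_name: str) -> None:
    """A newly discovered side branch splits `lineage` into a lower (ancestral)
    species and two daughters: its upper part and the new branch."""
    upper = Lineage(lineage.name + " (upper)", lineage.daughters)
    lineage.name = lineage.name + " (lower)"
    lineage.daughters = [upper, Lineage(branch_name)]


# Assumed topology: species 1 splits into 2 and 3; species 3 later splits into 4 and 5.
species_2 = Lineage("2")
tree = Lineage("1", [species_2, Lineage("3", [Lineage("4"), Lineage("5")])])
before = species_names(tree)   # five species from two branching events

add_side_branch(species_2, "grey")
after = species_names(tree)    # species 2 is now two species, plus the grey branch
print(before)
print(after)
```

Nothing about the fossils of species 2 changed between the two printouts. Only the discovery of a side branch did, which is the arbitrariness the text objects to.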

b) Many island populations may be cladistic side-branches of mainland species yet their establishment does not usually alter the mainland species in any way whatsoever. In fact, this may be true for any population that has been geographically isolated for a few generations. Cladistic species concepts could lead to a lot of new species that are only faintly recognizable.

c) Hybridization, if it occurs between branches, will tend to blur the branching between related pairs of species at some genes. The phylogeny of species may be meaningless under such conditions; instead, the phylogeny becomes a mass of "genealogies" at sometimes contradictory genes. Of course one could use some sort of average phylogeny (sometimes called a "consensus" phylogeny) as the "true" species phylogeny, but this kind of averaging is certainly very different from the notion that the species we are looking at have a single true phylogeny.

There are many alternative evolutionary and phylogenetic species concepts which attempt to answer these problems. For example, various kinds of phylogenetic concept have attempted to incorporate the possibility of gene flow between species. Cracraft, for instance, suggests that species have fixed differences at (morphological) "characters", but critics have argued that this would lead to the recognition of many local populations with trivial genetic differences as separate species. It is also a little unclear what one means by "fixed" differences when gene flow will prevent complete fixation. We don’t have time to go through all the species concepts of this sort here, but you can find them in many books (some of my own efforts, encyclopaedia entries with general references, are available from my home page).

5) Rank-free taxonomy, and giving up on species altogether!

Recently, a number of leading phylogenetic systematists have proposed "rank-free" taxonomy, in which species no longer hold a unique position in the taxonomic hierarchy. Proponents of this view argue that the difficulty of assigning a species rank exists because species lack reality as special taxa. Instead, they argue, we should develop a completely new taxonomy based purely on phylogenetic principles, and do away with the Linnaean binomial (i.e. two Latin names: genus + species) tradition. The first revision of a taxonomic group without species designations has recently been published in the journal "Systematic Biology" (2000). Whether this idea will catch on is hard to say. If it does, it could cause chaos in biological nomenclature at a time when we badly need taxonomists for studies in biodiversity and conservation. There is very strong resistance to this idea among traditional taxonomists, and even among some phylogenetic systematists.

My own view is that hybridization and gene flow will wreck the idea of the perfectly hierarchical rank-free taxonomy, especially near the (current) species level, and that species will remain a convenient naming device to classify animals and plants. There must be a certain validity to species, or your bird or plant guides wouldn't be very useful. In some asexual taxa, like brambles and dandelions, it may be somewhat difficult to distinguish "species" from "varieties", but mostly even asexual taxa are easy to divide along species lines. On the other hand, I rather agree that the supposed "reality" of species over and above other higher (genera, families) or lower (subspecies, varieties) taxa has been greatly overemphasised.

Why are there so many species concepts?

What should practising evolutionary geneticists like you do, faced with such a diversity of opinion?

  • Many evolutionary biologists, provided they do NOT work on plants, think the biological species concept is best.
  • Many taxonomists and systematists think that some form of phylogenetic species concept is best, while some propose getting rid of species altogether on the grounds that true phylogenetic taxonomy should be purely hierarchical and rank-free.
  • Ecologists assume and often use the ecological species concept.

I have my own way of making sense of this debate, with which you may or may not agree. I argue that you can update Darwin’s idea of species without too much difficulty by taking account of modern knowledge of genetics, and thereby solve some of the problems inherent in the other species concepts at the same time.

Species within a region are genetically differentiated populations potentially connected by gene flow. This gene flow may be very low (as in the biological species concept), but it doesn’t have to be negligible. The important thing is that the gene flow is low enough, and the disruptive selection keeping the populations apart is strong enough so that genetic differences between the species are maintained. If the two populations collapse together, because the gene flow outweighs selection, then there will only be a single species.
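The balance between gene flow and disruptive selection can be illustrated with a deliberately simple model, which is an assumption added here rather than part of the argument above: one haploid locus, two demes, allele A favoured in one deme and allele a in the other (selection coefficient s), and symmetric migration at rate m each generation. The function name and starting frequencies are hypothetical.

```python
def differentiation(s, m, generations=5000, p1=0.9, p2=0.1):
    """Difference in the frequency of allele A between two demes after
    repeated rounds of selection followed by symmetric migration.

    Assumed model: A has fitness 1 + s in deme 1 and 1 in deme 2;
    a has fitness 1 in deme 1 and 1 + s in deme 2.
    """
    for _ in range(generations):
        # Disruptive selection within each deme.
        p1 = p1 * (1 + s) / (1 + s * p1)
        p2 = p2 / (1 + s * (1 - p2))
        # Gene flow: a fraction m of each deme comes from the other deme.
        p1, p2 = (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1
    return p1 - p2

for m in (0.001, 0.01, 0.1, 0.4):
    print(f"m = {m:<5}  difference = {differentiation(s=0.1, m=m):.3f}")
```

With s = 0.1, the demes stay strongly differentiated while m is much smaller than s, and the difference falls sharply once m becomes comparable to or larger than s: a miniature version of two populations collapsing into one.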

Species are then clusters of genotypes with discontinuities or gaps between them (a genetic version of Darwin’s morphological concept). Low levels of gene flow (a lack of Mayr’s pre-mating isolation) could break up the genotypic and phenotypic differences. However, this gene flow, if it exists, may be balanced by disruptive selection, which may be intrinsic (due to interactions between genes within the hybrids, as in Mayr’s post-mating isolating mechanisms) or extrinsic (due to the environment, as in Van Valen’s ecological concept). Darwin’s morphological concept can thus be related to the ecological and biological concepts: the biological and ecological concepts are explanations of the morphological/genotypic situation of two clusters separated by gaps. Phylogenies obviously have something to do with the whole process. As species diverge more and more, hybridization will be reduced, and a separate branch in the phylogeny emerges from the cross-linking caused by hybridization, and becomes progressively better defined.

Phew! Now that’s over, let’s get on with discussing the interesting things about species.

Genetic differences between species

To study speciation, we need to know how species differ from one another genetically. In general, wherever we look, species differ in the same ways as populations or geographic races (see EVOLUTION IN SPACE AND TIME), only more so. Here are some of the ways in which species differ:

a) Morphological differences (see Darwin’s definition, above). Morphology differs between races and populations as well, of course, as already mentioned.

b) Enzyme and molecular differences. Francisco Ayala did detailed surveys with allozymes on Drosophila [SEE OVERHEAD]. Species differ at multiple allozyme loci, subspecies at slightly fewer loci, and so on down to populations. We have seen that many hybrid zones also separate subspecific forms that differ at multiple genetic loci; this and Ayala’s work show clearly that races and species differ genetically in degree rather than kind. This is as true for mtDNA and other DNA markers as it is for allozymes (see also (g) below). Because multilocus differences are common even between populations and races that no one would want to call species, it is almost certain that speciation also involves multilocus evolution, and indeed more of it!

c) Chromosomal differences. We have already mentioned human/chimp differences (see Chromosomal Evolution), and how common such differences are in other species that have been studied. Again, we can point to subspecies and races that also differ chromosomally, only less so. Chromosomally, species are continuous with races, but usually differ more.

Polyploidy is, however, one exception to this gradual differentiation. Polyploidy is a very common feature of plant species differences, and can only rarely be considered a polymorphism within species, because the offspring of diploid × tetraploid crosses are triploid and almost universally sterile.

d) Signals used in mating. Sexually selected colours, tail length in birds, and pheromones in moths, other insects, and even mammals are all involved in species recognition as well. In many crickets and grasshoppers, as well as frogs, species-specific sounds are required; in fireflies, species recognize each other by means of coded flashes [SEE OVERHEAD].

Again, these kinds of differences are quite easily derived from mate choice differences within species, perhaps caused by sexual selection or for ecological reasons of efficiency. Differences between races and species are again in degree rather than kind. There is a controversy as to whether mate choice itself may evolve to "protect species" from gene flow. This would be a true isolating "mechanism". See next lecture.

e) Hybrid inviability and sterility: genomic incompatibility. Sterility and inviability are very common in hybrids. We have already mentioned examples produced by chromosomal differences. Mules (donkey × horse hybrids, which are sterile) are another example.

We know from studies of clines and hybrid zones that multilocus hybrid inviability can occur within species as well as between them. On the other hand, some species almost never mate together, but if they do, the hybrids seem not only viable but fertile. Related species of Darwin’s finches and ducks are an example. Once again, species differ from races only in the degree of hybrid inviability and sterility, not absolutely in kind.

A particularly well-known kind of difference is Haldane's Rule, named after its discoverer, J.B.S. Haldane. Haldane's Rule states that when only one sex of the F1 hybrid between species is affected by inviability or sterility, that sex is usually the heterogametic (XY) sex rather than the homogametic (XX) sex. The rule works in mammals and Diptera (flies), in which sex determination is usually male XY, female XX, as well as in birds and butterflies, in which the females are heterogametic (ZW, analogous to XY) and the males homogametic (ZZ). The main reason is probably that genes causing incompatibility on the X chromosome have recessive effects. These genes must be epistatic; can you see why?
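One way to check the answer is a hypothetical two-locus model (the allele names xA, xB, aA and aB are illustrative assumptions): an X-linked allele from species A that is recessive, and harmful only in combination with an autosomal allele from species B. In birds and butterflies the same logic applies with the sexes swapped.

```python
def incompatible(x_alleles, autosomal_alleles):
    """Hypothetical two-locus incompatibility.

    The X-linked allele "xA" (from species A) is recessive: it is expressed
    only when no "xB" allele is present. It causes sterility or inviability
    only together with at least one autosomal "aB" allele from species B
    (the epistatic interaction).
    """
    x_expressed = "xA" in x_alleles and "xB" not in x_alleles
    return x_expressed and "aB" in autosomal_alleles

# F1 from a species-A mother crossed with a species-B father.
f1_daughter = incompatible(["xA", "xB"], ["aA", "aB"])  # XX: xB masks xA
f1_son = incompatible(["xA"], ["aA", "aB"])             # XY: xA is hemizygous
# Males of species A carry the same X chromosome but no aB allele.
species_a_male = incompatible(["xA"], ["aA", "aA"])

print(f1_daughter, f1_son, species_a_male)  # prints False True False
```

If the X-linked allele were harmful on its own, males of species A itself would suffer; requiring the interaction with aB confines the damage to hybrids, and recessivity confines it to the hemizygous sex.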

In other cases, the F1 hybrid between two species may be all right, but backcrosses or F2 crosses produce inviability or sterility. This is known as hybrid breakdown, and may be caused by recessive incompatibility genes (also epistatic) becoming homozygous during these later crosses.
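A worked count shows why the effect only appears after the F1. Assume (hypothetically) two unlinked loci, with species 1 fixed for AAbb and species 2 for aaBB, and an incompatibility between the derived alleles A and B that is expressed only when both are homozygous. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def gametes(genotype):
    """Gametes of a two-locus genotype with unlinked loci, e.g. ("Aa", "Bb")."""
    return [a + b for a, b in product(genotype[0], genotype[1])]

def affected(zygote):
    """A zygote is two joined gametes, e.g. "Ab" + "aB" -> "AbaB"."""
    locus1 = zygote[0] + zygote[2]
    locus2 = zygote[1] + zygote[3]
    return locus1 == "AA" and locus2 == "BB"

f1 = ("Aa", "Bb")              # every F1 is a double heterozygote
print(affected("Ab" + "aB"))   # the F1 itself: prints False

# F1 x F1: all 4 x 4 gamete combinations are equally likely.
offspring = [g1 + g2 for g1, g2 in product(gametes(f1), gametes(f1))]
fraction_f2_affected = Fraction(sum(map(affected, offspring)), len(offspring))
print(fraction_f2_affected)    # prints 1/16
```

Every F1 escapes, but the 1 in 16 F2 offspring that are AABB are affected; with more loci, or with backcrosses, the proportions change but the principle is the same.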

f) Ecological differences. Perhaps the best examples we have of ecological differences between closely related species are adaptive radiations on islands. Darwin’s finches are well known. The Hawaiian honeycreepers [SEE OVERHEAD] are even more extraordinary. From finch-like ancestors, they have produced nectarivorous, insectivorous, frugivorous, as well as seed-eating forms.

But we have already discussed under the ecological species concept how ecological differences are found across clines that are under extrinsic selection across an environmental gradient. Once again there is no clear dividing line between races and species in the degree of ecological differentiation.

g) Genealogical differences. As we have seen, when species diverge, their DNA, such as mitochondrial DNA, will also diverge. When a genealogy (the phylogeny of a single gene or stretch of DNA) is estimated, one usually finds that species, and sometimes even races, fall on different branches of the genealogy. An example is given by the Heliconius butterflies, on which our own group works, in the figure below.

Heliconius cydno and H. melpomene are closely related species which also occasionally hybridize. They clearly fall on separate branches of this genealogy of the genes CO1 and CO2 of mtDNA. However, it is also true that the melpomene from French Guiana falls on a separate branch of the genealogy from the members of the same species from Panama, and there is a similar deep branching pattern even within Panamanian cydno. Thus, a separate genealogy is not a good guide to separate species status. Other geographic races of Heliconius melpomene have mtDNA genealogies that intermingle with the Panamanian H. melpomene, so not all geographic populations have separate genealogies.

However, in some cases, as in the Drosophila (melanogaster, simulans, sechellia, and mauritiana) genealogies [OVERHEAD], gene genealogies of well-recognized separate species intermingle. In this case, two possibilities exist: (1) ancestral polymorphism: speciation occurred recently enough that polymorphisms present in the common ancestor are still retained within each species; (2) interspecific gene transfer: horizontal gene transfer since the origin of the species has led to a more recent intermingling of the genealogies. These two are difficult to tell apart.
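How recent is "recently enough"? A standard result from coalescent theory (an addition here, not from the lecture) gives the probability that a single-locus genealogy of three species matches the species tree ((A, B), C). It assumes neutrality, no gene flow and a constant diploid effective size N; the function name and example values are hypothetical.

```python
import math

def prob_gene_tree_matches(t_generations, n_e):
    """P(genealogy at one neutral locus matches species tree ((A, B), C)).

    t_generations: time between the two speciation events.
    n_e: assumed constant diploid effective population size of the
    ancestor of A and B.
    """
    T = t_generations / (2 * n_e)  # internal branch in coalescent units
    return 1 - (2 / 3) * math.exp(-T)

for t in (1_000, 10_000, 100_000, 1_000_000):
    print(t, round(prob_gene_tree_matches(t, n_e=100_000), 3))
```

When the interval between speciation events is short relative to 2N generations, the match probability approaches 1/3, no better than picking one of the three possible topologies at random, so intermingled genealogies are expected even without any gene flow.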

In any case, even at the genealogical level we see intermingling above the species level as well as below. Separate genealogical branches have evolved within some species, as well as between many, perhaps most species. Genealogies of species may be more separate than those of races and populations within species, but there is a lot of overlap.

Genetic differences between species, then, are usually inherited at multiple loci, and are on average greater than, and involve more genes than (though they overlap and blend into), the kinds of differences we see between geographic races, or even morphs in polymorphic populations. There is nothing magic about the species level in terms of genetics, and therefore it would seem most logical and parsimonious (simplest) to use the same microevolutionary forces (selection, drift, mutation), coupled with more time, to explain the evolution of species, as well as the other kinds of subspecific evolution we have already discussed.

Strictly, biodiversity means the sum total of diversity at all levels of the evolutionary hierarchy, from genetic diversity within populations, between populations, between races, species, genera, and so on, up to ecosystems and biomes. In practice, the species is traditionally viewed as one of the most important levels of biodiversity. In view of the difficulty of defining species (above), perhaps this view isn't valid?


