April 23, 2014

New page: Early Kurgan expansion maps

Triggered by a mail discussion on linguistic prehistory, I finally did something that had been on my agenda for a long time: create and share some Kurgan expansion maps → dedicated page in this blog.

The maps do not cover the whole Indoeuropean expansion (which continues to this very day, for all I know) but just the core expansion of the Chalcolithic period, which provides the skeleton for understanding later periods. For ad-hoc reasons all maps show Europe and nearby parts of West Asia only, although admittedly that leaves the proto-Tocharian Afanasevo culture off the map. As I did not go into the Bronze and Iron Ages, which would have to include the Indo-Iranian expansion, this works all right.

Some eye candy to entice you (map #2 out of 4, arguably the most central one, outlining at least five different branches of Indoeuropean very early on, early 4th millennium BCE):


Loschbour's IBD in modern Europeans is greatest among Danes but most direct among French

This is a most interesting issue I forgot to discuss when previously addressing the massively interesting Lazaridis et al. study on European ancestry based on ancient autosomal DNA (see here and here). 

Identity by descent (IBD) data in section 18 of the Supplementary Information shows interesting differences between populations. While Stuttgart's (early farmer) ancestry is more or less the same by both measures (Sardinians first, followed by Slovakians and some other Balkan and Central European populations), there are important differences in the ancestry of Loschbour (Epipaleolithic hunter-gatherer from Luxembourg). While the Danes score highest in overall IBD block number (rough relatedness to Loschbour), it is the French who score highest in IBD length (indicating a more direct relatedness, even if in smaller amounts).
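To make the two measures concrete, here is a minimal Python sketch of how block number and block length can rank populations differently. It assumes that "IBD length" means the average length of the shared blocks (the post does not specify this), and the segment values and population names below are invented for illustration only; they are not taken from the study.

```python
from collections import defaultdict

# Hypothetical toy segments: (population, IBD block length in centimorgans).
# These numbers are invented for illustration; they are NOT from the study.
toy_blocks = [
    ("PopA", 1.0), ("PopA", 1.2), ("PopA", 0.8), ("PopA", 1.0),
    ("PopB", 3.0), ("PopB", 2.0),
]

def summarize_ibd(blocks):
    """Return {population: (block_count, mean_block_length)}."""
    lengths = defaultdict(list)
    for population, length_cm in blocks:
        lengths[population].append(length_cm)
    return {
        pop: (len(vals), sum(vals) / len(vals))
        for pop, vals in lengths.items()
    }

summary = summarize_ibd(toy_blocks)
# Population sharing the largest number of blocks with the reference sample.
most_blocks = max(summary, key=lambda pop: summary[pop][0])
# Population sharing the longest blocks on average.
longest_blocks = max(summary, key=lambda pop: summary[pop][1])
print(most_blocks, longest_blocks)
```

In this toy case one population shares more blocks while the other shares longer ones, which is the kind of split described above for Danes and French. Longer shared blocks generally point to more recent common ancestry, because recombination keeps breaking blocks up over the generations.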



The difference between the French and the Danes is quite significant, I believe, and seems to suggest that Loschbour's relatives had a direct impact on modern French genetics, while the impact of Loschbour as such on other populations should be considered more indirect (i.e. via other hunter-gatherer populations).

This implies that there was some important diversity among the hunter-gatherer groups that influenced modern European genetics and that Loschbour must be considered a mere generic proxy. Possibly, if Motala or La Braña had been used as reference instead, we would get some important differences in the results, as would no doubt be the case if Balkan or Eastern European hunter-gatherers were thrown into the equation.

You may have noticed that some notable samples are unmarked in the graphs: that is because they are colonial populations, such as Zimbabwean or North American Whites, whose exact ancestry is not easy to track. The green and red texts are my illustrative additions.

While not marked, I also find it notable and rather perplexing that Lebanon shows up as the fourth most related non-colonial population to Loschbour by IBD length, after Scotland but before Ukraine, the Netherlands and Sweden.

In any case you can parse the data for the 10 most notable samples of each measure in the supplemental material, chapter 18.

Referenced study:

Iosif Lazaridis, Nick Patterson, Alissa Mittnik, et al., Ancient human genomes suggest three ancestral populations for present-day Europeans. bioRxiv 2013 (preprint). Freely accessible → LINK [doi:10.1101/001552]

April 17, 2014

British Chalcolithic? Indeed!

As you may know, on the continent we don't even bother anymore about copper or basic metallurgy to define the Chalcolithic (Copper and Stone Age). Much as happened with the concept of Neolithic, which initially meant polished stone tools but ended up being all about farming and herding, the term Chalcolithic has evolved to mean an advanced and rather sophisticated form of Neolithic with long-distance trade and growing social stratification, occasionally even the first civilizations. Copper is not always present and nobody really cares.

But Britain is different: there they call the Chalcolithic "Late Neolithic". Why? Apparently because no copper artifacts have been found and they see no reason to establish a correlation with continental Europe. So what on the mainland is Copper Age (with or without metallurgy) is just more "Neolithic" in the islands, even though it is about the same thing.

However, thanks to the effort of an archaeological team led by Dr. George Nash, Britain may finally end up having to conform to the continental way, because there is at least some copper in the "Late Neolithic" of the big island.

This tiny piece of copper may change the name of an era

It is tiny, it is of unknown function (part of a bead?) but it is from a British (specifically Welsh) Chalcolithic ("Late Neolithic") site. And crucially it is copper, wow!

The site was described by Dr. Nash as the least known Neolithic chambered tomb, maybe just until now, and goes by the name of Perthi Duon (Anglesey).

The British confusion with all these categories is astonishing. Our source "explains" to their readers:
The Copper Age followed the Neolithic Era and is considered a part of the Bronze Age. The period is defined as a phase of the Bronze Age in which metallurgists had not yet discovered that bronze could be made by adding tin to copper.

Well, no: the Chalcolithic or Copper Age is not part of the Bronze Age. The Bronze Age begins with bronze, which implies a much greater mastery of metallurgical techniques, techniques that for the first time allow stone tools and weapons to be replaced with something else. The Copper Age is about fashion (shiny but rather useless things) and not just about copper but also gold and silver metallurgy, among other much more interesting things, like the first international trade networks, megaliths, fortified towns and almost certainly the first states worthy of that name.

Copper Age is about the beginnings of civilization. The Incas for example were in the Copper Age when Pizarro ruined their day with steel, gunpowder and treachery. 

Bronze is about swords, spears and axes, much like Iron after it. 

Not that there was no conflict in the Chalcolithic, but it was fought with stone weapons, even though they made shiny copper (or semi-precious stone) imitations for some burials.

That clarified, they continue with more meaningful stuff:
While a Copper Age has long been recognised in Europe, the question of whether Britain experienced such a period is still debated by archaeologists.

Dr Nash said: “The big question in archaeology at the moment is whether there was a Copper Age in Britain.

“Did copper come to Britain before bronze?

“This discovery helps to suggest that we did have a Copper Age.”

The question, as I see it, is not whether there was a Copper Age but whether there was copper in that Copper and Stone Age, which is not really that important in itself.

For me there is certainly a British Chalcolithic, largely defined by the erection of Stonehenge and many other similar monuments and all that they imply in the wider context of European Megalithism and later the Bell Beaker phenomenon. Of course it is not a clear-cut definition, but nor is it dependent on the mere presence of copper. Similarly, for contextual and continuity reasons, it is not likely that the Balkan Chalcolithic will be redefined as Bronze Age any time soon just because widespread bronze metallurgy (tin bronze, not that arsenic ersatz thing!) has recently been discovered.

Context does matter and Britain is not outside the wider European context at all. 

Source: Daily Post.

Basque-Iberian numerals

The Basque-Iberian language family theory is back, and very strongly so. Even such a staunchly pan-indoeuropeanist¹ fanatic as Villar has to admit it, even if reluctantly and while trying to dismiss its relevance.

The main argument is the recognition of Basque and Iberian numerals as very similar, including even the once suspect IE loan sei (six), which may in the end be Vasconic after all. 

Euskararen Jatorria has these days a whole series on the matter (in Basque and Spanish):


Ferrer 2009 scheme of Iberian numerals
Orduña 2011: Basque numerals (standard and variants)


So, totally dismissing that self-complacent and power-mongering cultural terrorist of Lakarra, and looking at the obvious facts, we have the following series of correspondences (Iberian - Basque). I have transcribed Iberian transcriptional "s" as "z" and "ś" as regular "s", as that seems to correspond with modern and historical Basque spelling:
  1. ban - bat [bana: each, bana-tu: divide]
  2. bi(n) - bi²
  3. irur - (h)iru(r)
  4. lau(r) - lau(r)
  5. borz(te) - bost, borz, bortz³
  6. sei - sei⁴
  7. zizbi - zazpi
  8. zorze - zortzi, zorzi
  9. unknown - bederatzi
  10. abaŕ - (h)amaŕ⁵
Additionally we also have the correspondence (20) oŕkei - (h)ogei. With those numbers you can count up to 99 using the vigesimal system common to both languages⁶.
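As a rough illustration of why the numerals 1-10 plus 20 are enough to reach 99 in a vigesimal system, the following Python sketch splits a number into twenties, an optional ten and a unit. The function name is my own and purely illustrative, and the sketch only shows the arithmetic, not the actual word forms of either language.

```python
def vigesimal_parts(n):
    """Split 1 <= n <= 99 into (twenties, tens, units),
    so that n == 20 * twenties + 10 * tens + units."""
    if not 1 <= n <= 99:
        raise ValueError("this sketch only covers 1..99")
    twenties, remainder = divmod(n, 20)   # number of twenties, remainder in 0..19
    tens, units = divmod(remainder, 10)   # the remainder is at most one ten plus a unit
    return twenties, tens, units

for n in (11, 30, 99):
    print(n, vigesimal_parts(n))
```

So 30 decomposes as one twenty plus ten (compare modern Basque hogeita hamar, literally "twenty and ten") and 99 as four twenties plus ten plus nine.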

Less clear is whether there is a Basque-Iberian correspondence regarding the number 100 (ehun in Basque). Orduña argued for it but Ferrer rejected the claim. 

It is interesting that the form for 11 in Basque is irregular: amaika (the regular form would be hamabat, but it does not in fact exist). Considering the abaŕ-ke-# form in Iberian, we can now quite safely trace its origin to a shortening of abaŕkeban⁷. Amaika is also used to mean "a lot" in Basque, possibly because most peasants were not too much into numbers in the past.

________

Notes:

¹ Pan-indoeuropeanism: wild hypothesis that rejects that the Indoeuropean family expanded from any single origin and claims instead that it was preceded only by itself from the beginning of times. Oddly enough there are people (and I mean linguists) who believe in it, even if it makes no sense whatsoever. It is a convenient ideological way to deny any respect to other languages which may have been in the area longer, as may be the case of Basque or Dravidian. Winners write history and now it seems that also linguistic theories of the worst kind.

² Notice the obvious Vasconic influence in Latin particle bi-, which is used in many languages nowadays: binary, bilateral, bifocal, etc. 

³ The Iberian syllabary would force borzt or even bortz to be written borzte or similar, as there is no lone "t" sign, only "ta", "te", "ti", "to" and "tu", which can also be "da", "de", etc. Only in the scarce Ibero-Ionian (Greek) script does transcription become much easier.

⁴ It was typically believed that the numeral sei (6) was an Indoeuropean borrowing (compare with Sp. seis, Lat. sex), possibly under the influence of Christian doctrine (the very word "sex" comes from the Latin numeral, in reference to the Hebrew chastity commandment usually listed under that number), but judging by the identical Iberian form this must be revised. However it is true that there is an independent and maybe related IE pattern of similar cognates for this number, whose PIE reconstruction is *swéḱs.

⁵ "ŕ" indicates strong r (alveolar trill /r/, as in Sp. "perro"), "r" indicates a soft r (liquid /ɾ/, as in Sp. "pero"). Hence you decline hamar as hamarra but laur as laura, even if both roots are written similarly in standard Basque, which has long abandoned the "ŕ" character for pragmatic reasons (still used but very rarely). 

⁶ This vigesimal system, unknown to most Indoeuropean languages (staunchly decimal), was adopted by Celtic and later by French, which still retains it for the numbers 70 and 90 and their derivatives. Other languages with a vigesimal system are Danish, Albanian and a dialect of Slovenian spoken in Italy (all of them IE languages); Georgian and Nakh also have it in the Caucasus region; some traditional numerical expressions in English ("score" and other usages) also seem to retain the memory of a vigesimal system. It is an important piece of evidence supporting the existence of a Vasconic substrate in much of Europe.

Micro-update: Gascon and other old romances of present-day France also retained at least some expression of this vigesimalism. 

⁷ If -ke- meant "and" in ancient Iberian (Basque eta has been claimed to come from Latin et), can it be argued that the Latin particle -que (also meaning "and") has Vasconic origins? In standard theory it comes from PIE *-kʷe, but the evidence for its existence seems a bit feeble to my eyes, with almost no alleged derivative being even remotely similar to its alleged ancestor.

April 12, 2014

Sense of justice is not emotional but fully driven by reason

This may be highly counter-intuitive because we culturally associate this sense of justice with idealism or, as Che Guevara put it, with being driven by superior feelings of love for fellow humans.

Actually, brain scans show that it has nothing to do with that, but rather with greater rational function.

Keith J. Yoder & Jean Decety, The Good, the Bad, and the Just: Justice Sensitivity Predicts Neural Response during Moral Evaluation of Actions Performed by Others. Journal of Neuroscience 2014. Pay per view → LINK [doi:10.1523/JNEUROSCI.4648-13.2014]

A copy of the study has been uploaded by co-author J. Decety to academia.edu.

Complementary source: press release.

From the latter:
During the behavior-evaluation exercise, people with high justice sensitivity showed more activity than average participants in parts of the brain associated with higher-order cognition. Brain areas commonly linked with emotional processing were not affected.

The conclusion was clear, Decety said: “Individuals who are sensitive to justice and fairness do not seem to be emotionally driven. Rather, they are cognitively driven.”
According to Decety, one implication is that the search for justice and the moral missions of human rights organizations and others do not come primarily from sentimental motivations, as they are often portrayed. Instead, that drive may have more to do with sophisticated analysis and mental calculation.

This may explain why there is a well known positive correlation between intelligence and leftism: fairness is simply more rational.

Genetic paleohistory of domestic cows shows major differentiation in Africa

A new study reveals that African Bos taurus breeds are quite deeply diverged from the Eurasian branch, showing an early differentiation of both continental populations, admixture with African wild auroch and livestock export from Europe to Asia. 


Jared E. Decker et al., Worldwide Patterns of Ancestry, Divergence, and Admixture in Domesticated Cattle. PLoS Genetics 2014. Open access → LINK [doi:10.1371/journal.pgen.1004254]

Abstract

The domestication and development of cattle has considerably impacted human societies, but the histories of cattle breeds and populations have been poorly understood especially for African, Asian, and American breeds. Using genotypes from 43,043 autosomal single nucleotide polymorphism markers scored in 1,543 animals, we evaluate the population structure of 134 domesticated bovid breeds. Regardless of the analytical method or sample subset, the three major groups of Asian indicine, Eurasian taurine, and African taurine were consistently observed. Patterns of geographic dispersal resulting from co-migration with humans and exportation are recognizable in phylogenetic networks. All analytical methods reveal patterns of hybridization which occurred after divergence. Using 19 breeds, we map the cline of indicine introgression into Africa. We infer that African taurine possess a large portion of wild African auroch ancestry, causing their divergence from Eurasian taurine. We detect exportation patterns in Asia and identify a cline of Eurasian taurine/indicine hybridization in Asia. We also identify the influence of species other than Bos taurus taurus and B. t. indicus in the formation of Asian breeds. We detect the pronounced influence of Shorthorn cattle in the formation of European breeds. Iberian and Italian cattle possess introgression from African taurine. American Criollo cattle originate from Iberia, and not directly from Africa with African ancestry inherited via Iberian ancestors. Indicine introgression into American cattle occurred in the Americas, and not Europe. We argue that cattle migration, movement and trading followed by admixture have been important forces in shaping modern bovine genomic variation.

The triple (indicine or zebuine, Eurasian taurine, African taurine) division is apparent even in the limited scope of Principal Component Analysis, with African taurine breeds standing out in the intra-taurine distinctiveness in PC2 while PC1 shows the pre-Neolithic taurine-indicine distinction:

Figure 1. Principal component analysis of 1,543 animals genotyped with 43,043 SNPs.
Points were colored according to geographic origin of breed; black: Africa, green: Asia, red: North and South America, orange: Australia, and blue: Europe.

An admixture-enabled phylogeny shows more clearly the deep divergence of the African branch of taurine cows:

Figure 4. Phylogenetic network of the inferred relationships between 74 cattle breeds.
Breeds were colored according to their geographic origin; black: Africa, green: Asia, red: North and South America, orange: Australia, and blue: Europe. Scale bar shows 10 times the average standard error of the estimated entries in the sample covariance matrix. Common ancestor of domesticated taurines is indicated by an asterisk. Migration edges were colored according to percent ancestry received from the donor population. Migration edge a is hypothesized to be from wild African auroch into domesticates from the Fertile Crescent. Migration edge b is hypothesized to be introgression from hybrid African cattle. Migration edge c is hypothesized to be introgression from Bali/indicine hybrids into other Indonesian cattle. Migration edge d signals introgression of African taurine into Iberia. Migration edges e and f represent introgression from Brahman into American Criollo.

Admixture K=3 is also consistent with this triple pattern:

Figure 6. Ancestry models with 3 ancestral populations (K = 3).
Blue represents Eurasian Bos t. taurus ancestry, green represents Bos javanicus and Bos t. indicus ancestry, and dark grey represents African Bos. t. taurus ancestry. See Supplementary Figures S5, S6, S7, S8, S9, S10 for other values of K.

The authors find that modern Anatolian breeds are not representative of early Neolithic cows:
Anatolian breeds (AB, EAR, TG, ASY, and SAR) are admixed between blue Fertile Crescent, grey African-like, and green indicine-like cattle (Figures 5 and 6), and we infer that they do not represent the taurine populations originally domesticated in this region due to a history of admixture. Zavot (ZVT), a crossbred breed [25], has a different history with a large portion of ancestry similar to Holsteins (Figures 2 and S8, S9, S10). The placement of Anatolian breeds along principal components 1 and 2 in Figure 1 [23], the ancestry estimates in Figure 6, their extremely short branch lengths in Figures 2–4, and significant f3 statistics confirm that modern Anatolian breeds are admixed (see Methods for explanation of f-statistics).
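For readers unfamiliar with the f3 statistic mentioned in the quote, here is a minimal Python sketch of the idea. It is a naive version, assuming plain population allele frequencies and omitting the finite-sample bias correction and the significance testing that real analyses use, so it should not be read as the authors' actual procedure. The allele frequencies are invented toy values.

```python
def f3(target, source_a, source_b):
    """Naive f3(target; source_a, source_b): the mean over SNPs of
    (c - a) * (c - b), where c, a, b are allele frequencies.
    A clearly negative value suggests the target is admixed between
    populations related to the two sources."""
    if not (len(target) == len(source_a) == len(source_b)) or not target:
        raise ValueError("need equal-length, non-empty frequency lists")
    products = [(c - a) * (c - b) for c, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)

# Invented toy allele frequencies (NOT from the cattle study).
pop_a = [0.1, 0.9, 0.2]
pop_b = [0.9, 0.1, 0.8]
# A hypothetical 50/50 mixture of the two source populations.
mixed = [0.5 * a + 0.5 * b for a, b in zip(pop_a, pop_b)]

print(f3(mixed, pop_a, pop_b))
```

Because the mixed population's frequencies lie between those of its two sources at every marker, each product is negative and so is the average. An unadmixed population that has drifted on its own would normally give a positive value.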

As mentioned above, they also find that African taurines are much more deeply diverged from Eurasian taurines than would be expected if they had all diverged in a simple model from early Neolithic cows. According to the study, this is partly due to a later history of back-migration (or export) of European cows to Asia, including the Far East:
We conclude that there were two waves of European introgression into Far East Asian cattle, first with Mediterranean cattle (which carried African taurine and indicine alleles) brought along the Silk Road [29] and later from 1868 to 1918 when Japanese cattle were crossed with British and Northwest European cattle [25].

However there is more: African breeds also appear to have important levels of admixture (~26%) with native African wild auroch:
The second factor that we believe underlies the divergence of African taurine is a high level of wild African auroch [30], [31] introgression. Principal component (Figure 1), phylogenetic trees (Figures 2 and 3), and admixture (Figure 6) analyses all reveal the African taurines as being the most diverged of the taurine populations. Because of this divergence, it has been hypothesized that there was a third domestication of cattle in Africa [32]–[36]. If there was a third domestication, African taurine would be sister to the European and Asian clade. When no migration events were fit in the TreeMix analyses, African cattle were the most diverged of the taurine populations (Figures 2 and 3), but when admixture was modeled to include 17 migrations, all African cattle, except for East African Shorthorn Zebu and Zebu from Madagascar which have high indicine ancestry, were sister to European cattle and were less diverged than Asian or Anatolian cattle (Figure 4), thus ruling out a separate domestication. Our phylogenetic network (Figure 4) shows that there was not a third domestication process, rather there was a single origin of domesticated taurine (Asian, African, and European all share a recent common ancestor denoted by an asterisk in Figure 4, with Asian cattle sister to the rest of the taurine lineage), followed by admixture with an ancestral population in Africa (migration edge a in Figure 4, which is consistent across 6 separate TreeMix runs, Figure S4). This ancestral population (origin of migration edge a in Figure 4) was approximately halfway between the common ancestor of indicine and the common ancestor of taurine. We conclude that African taurines received as much as 26% (estimated as 0.263 in the network, p-value<2.2e-308) of their ancestry from admixture with wild African auroch, with the rest being Fertile Crescent domesticate in origin.

As is well known, African breeds also show variable frequencies of indicine (zebu) ancestry, which is c. 0-20% in West Africa and as much as 74% in some East African breeds, owing to greater exchanges with Asia in historical times.
... we revealed two clusters of indicine ancestry possibly resulting from the previously suggested two waves of indicine importation into Africa, the first occurring in the second millennium BC and the second during and after the Islamic conquests [25], [34], [48].

However the study notes that, after controlling for the effect of African wild auroch admixture, the apparent indicine admixture in some breeds collapses to zero (and is reduced in other cases):
Thus, we conclude that contrary to the assumptions and conclusions of [55] cattle with pure taurine ancestry do exist in Africa.

Other results are a confirmation of SE origin of European cows, a specific founder effect in Europe for shorthorn breeds and significant (8-23%) African admixture in Iberian breeds. Some American breeds are indeed a colonial mix of taurine and indicine. 

Figure 5. Worldwide map with country averages of ancestry proportions with 3 ancestral populations (K = 3).
Blue represents Eurasian Bos t. taurus ancestry, green represents Bos javanicus and Bos t. indicus ancestry, and dark grey represents African Bos. t. taurus ancestry. Please note, averages do not represent the entire populations of each country, as we do not have a geographically random sample.

April 11, 2014

Scotland's first inhabitation pushed back to c. 14,000 years ago

New archaeological evidence from South Lanarkshire pushes back the first colonization of Scotland to c. 14,000 years ago, considerably earlier than previously known for the same area.

Like the previous findings, these older tools appear to belong to the Hamburgian culture which spanned the North Sea, being probably established in Doggerland, and is best known from its Low German and Danish sites.

According to The Courier:
Previously, the oldest evidence of human occupation [in Scotland] could be dated to around 13,000 years ago at a now-destroyed cave site in Argyll.

It is thought the hunters who left behind the flint remains came into Scotland in pursuit of game, possibly wild horses and reindeer, at a time when the climate improved following severe glacial conditions. 

These glacial conditions returned around 13,000 years ago and Scotland was once again depopulated, probably for another 1,000 years, after which new groups of people with different types of flint tools made an appearance.



Polished axes in Australia are 35-30,000 years old

There was a time when "Neolithic" meant age of the polished stone. Not anymore thankfully. Otherwise we would have to write here that "Neolithic" began in Australia 25,000 years before anywhere else. But of course what began so early was the art of making polished stone tools, not farming.

Windjana polished axe fragment



From Science Network (excerpts):
Purposely sharpened or ‘retouched’ stone axes evolved in Australia thousands of years before they appeared in Europe according to researchers studying the south-east Asian archaeological record.

They found 30,000-year-old flakes from ground-edged axes at a site near Windjana Gorge in the central Kimberley.

“The suggestion that all innovation has to come from the Old World is not true because clearly ground-stone axes were created here,” Prof Balme says.

She notes that they were also made in Japan at a slightly later date, by people who would have had no contact with either Australian Aborigines or people in Africa and Europe.

Actually, as David at Prehistoria al Día[es] explains, the Japanese dates are not really more recent, ranging between 34,000 and 38,500 years BP.

Semi-polished edges on a tool (scraper?) from Arnhem Land


Japanese polished axes/adzes

He also includes a most interesting documentary in two videos on how modern Papuans make and use their polished axes (narration in French):


April 8, 2014

Lions also migrated out of Africa

A quick excursion from the humano-centric focus of this blog, on this occasion to the paleohistory of that fascinating social predator: the lion.

Ross Barnett et al., Revealing the maternal demographic history of Panthera leo using ancient DNA and a spatially explicit genealogical analysis. BMC Evolutionary Biology, 2014. Open access → LINK [doi:10.1186/1471-2148-14-70]

Abstract

Background

Understanding the demographic history of a population is critical to conservation and to our broader understanding of evolutionary processes. For many tropical large mammals, however, this aim is confounded by the absence of fossil material and by the misleading signal obtained from genetic data of recently fragmented and isolated populations. This is particularly true for the lion which as a consequence of millennia of human persecution, has large gaps in its natural distribution and several recently extinct populations.
Results

We sequenced mitochondrial DNA from museum-preserved individuals, including the extinct Barbary lion (Panthera leo leo) and Iranian lion (P. l. persica), as well as lions from West and Central Africa. We added these to a broader sample of lion sequences, resulting in a data set spanning the historical range of lions. Our Bayesian phylogeographical analyses provide evidence for highly supported, reciprocally monophyletic lion clades. Using a molecular clock, we estimated that recent lion lineages began to diverge in the Late Pleistocene. Expanding equatorial rainforest probably separated lions in South and East Africa from other populations. West African lions then expanded into Central Africa during periods of rainforest contraction. Lastly, we found evidence of two separate incursions into Asia from North Africa, first into India and later into the Middle East.
Conclusions

We have identified deep, well-supported splits within the mitochondrial phylogeny of African lions, arguing for recognition of some regional populations as worthy of independent conservation. More morphological and nuclear DNA data are now needed to test these subdivisions.  
 
 
Modern lions originated somewhere in Africa, possibly towards the East or South of the continent, and spread from there. Asian lions originated in North Africa and migrated Eastwards more or less like humans did. However, according to the study's molecular clock estimates, they did so only in the Mousterian Pluvial and not in the Abbassia Pluvial, as we did. 

The cave lion is a different (sub-)species, used in this study to root the phylogenetic tree.

Phylogenetic analyses of lion sequence data. A) Median network of 1051 bp of cytb for all 88 lion individuals identified from GenBank plus those generated in this study. Panthera leo spelaea was used as an outgroup. Circles are proportional to haplotype frequencies and black circles represent hypothesized intermediate haplotypes. The number of links represent the number of mutations between haplotypes. Haplotypes are labelled from A to S and correspond to sequences labelled in Table 1 and Additional file 3: Table S1. B) Phylogenetic tree from a Bayesian analysis of combined cytb and control region data for all lion taxa where available (n = 54). Posterior probabilities of supported clades are shown at nodes. Estimates of divergence times: (a) 124,200 years (95% credibility: 81,800-183,500); (b) 61,500 years (32,700-97,300); (c) 51,000 years (26,600-83,100); (d) 81,900 years (45,700-122,200); (e) 57,800 years (26,800-96,600); (f) 21,100 years (8300–38,800). Branch colours correspond to reconstructed ancestral geographic states (Purple, South Africa; Yellow, East Africa; Orange, West Africa; Red, Central Africa; Teal, North Africa; Blue, South Asia; Green, Near-East). Tip colours correspond to origins of samples.




Reconstructed distribution of the modern lion at different times. Estimates of spatial diffusion pathways at Marine Isotope Stage (MIS) time points: A. MIS5 B. MIS4-MIS3 C. MIS2-MIS1 D. Estimated natural distribution prior to anthropogenic disturbance. Black arrows show estimated spatial diffusions, with thicknesses proportional to Bayes factors. Movement from East Africa to South Africa (4.83), from South Africa to East Africa (4.66), from West Africa to Central Africa (3.00), from North Africa to South Asia (4.37), from South Asia to North Africa (4.50), from North Africa to Middle East (21.03). Tropical rainforest is shown in light grey (present distribution), maximal extent during humid periods (black dashed line), and minimal extent during arid periods (white dashed line). The Great Rift Valley is shown in dark grey. African rivers are shown in blue. Co, Congo; Ng, Niger; Ni, Nile; Se, Senegal.

April 6, 2014

Revised Lazaridis study on ancient ancestry of Europeans

The already famous Lazaridis et al. study on the contribution of various ancient populations to modern European genetics has gone through a revision which does not alter the fundamental conclusions reached in the past but does add some interesting nuances, new graphs and some new data.

Iosif Lazaridis, Nick Patterson, Alissa Mittnik, et al., Ancient human genomes suggest three ancestral populations for present-day Europeans. bioRxiv 2013 (preprint). Freely accessible → LINK (last version) [doi:10.1101/001552]

Most up to date supplementary info → LINK

For background see this previous entry.


Scandinavian hunter-gatherers deviate towards Siberia

Among the new data (or maybe data I skipped in the first read?) is the fact that the ancient Epipaleolithic individuals from Motala (Sweden) deviate towards Mal'ta-1 (Siberia), something that neither Loschbour (representing Western hunter-gatherers) nor Stuttgart (representing early farmers) does.

This implies that there were already some differences in the Epipaleolithic era among European hunter-gatherers, with those of Magdalenian background lacking the Siberian (ANE) component, which is however found in Scandinavian ones (of Ahrensburgian background?). This may help explain the extra ANE affinity in Northwest Europe, which is otherwise hard to understand.

It also suggests that Eastern European hunter-gatherers were already in the Epipaleolithic more akin to Siberian ones than those from the West and South of the subcontinent, as well as those from West Asia (otherwise Stuttgart, which has partial West Asian ancestry, would show increased affinity). Of course this can only be confirmed by direct analysis of Eastern European Paleolithic remains but it seems quite likely in any case.


Principal component analyses (lots of them!)

Ancient samples projected on the global PCA:

Fig S1-10: projection of ancient samples onto global PCA dimensions 1 & 2

In this global PCA, EEFs overlap well with the reduced modern European sample (Basques and Sardinians only) and the West Asian one (Georgian, Palestinian Bedouins). However projected hunter-gatherers from Europe and Siberia show a clear "other Asian" deviation. Why? For the very same reason that South Asians and Melanesians do, even if they are clearly distinct populations: because the frame gives them no other choice: they are not quite like modern Europeans and they do not have any African tendency either, so the other populations that are somewhat akin to them are other Asians and there they go. 

That's why PCAs must always be taken with a preventive dose of salt: they are very nice visualization tools but they depend too much on the sampling strategy and on their intrinsic two-dimensional limitations.


Modern West Eurasians projected to a PCA of three ancient samples:

Fig. S10-3 Projection of West Eurasian populations onto the first two principal components
inferred using Loschbour, Stuttgart, and MA1 (full version).

Quite curious: all West Eurasians cluster tightly in comparison to their ancient "ancestors". It is likely that dimension 2 should be scaled down because the second component is always smaller than the first one (often around half). However I could not find a clear datum to proceed so I retained the original equal scale even if it can be a bit misleading. 

Although not completely, Loschbour and Stuttgart explain the bulk of European (and West Eurasian!) ancestry, at least in comparison to the quite outlying Mal'ta-1 sample, representing ancient Siberians.

Detail (zoom in) of this PCA:

Fig. S10-4 (I annotated the three ancestral tendencies with arrows for ease of viewing)

As expected, Eastern/SE populations deviate more towards Stuttgart, Western/NW ones towards Loschbour and, in general, Northern populations deviate slightly towards Mal'ta, also in West Asia (i.e. Iranians, Turks).

Modern European PCA with ancient samples projected on it:

Figure S10.5: Projection of ancient samples onto the “European” PCA (annotations in gray are mine)

Something the authors notice is that their PCA does not approximate a map of Europe, as happens in some cases. They dedicate some time to evaluating this discrepancy, comparing with the PCA of Novembre et al. 2008. The differences arise because the latter used many more NW and Central European samples and had way too few Eastern European ones.

This highlights that sampling strategy is of utmost importance when analyzing autosomal DNA, not just in PCAs, because oversampled populations tend to hog the axes (or components). This should be obvious but is way too often ignored, which may result in spectacular magic hat tricks but hampers serious science.
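
A minimal sketch of how oversampling can capture the first component, using made-up data: four hypothetical populations (A, B, C, D) placed on two perpendicular axes of equal true separation. None of the names or coordinates come from the study; they only illustrate the principle.

```python
import numpy as np

def first_axis(points):
    """Return the unit vector of the first principal component."""
    data = np.asarray(points, dtype=float)
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

# Hypothetical populations: A/B differ along x, C/D differ along y,
# with exactly the same true separation on both axes.
A, B, C, D = (-1.0, 0.0), (1.0, 0.0), (0.0, -1.0), (0.0, 1.0)

def sample(n_ab, n_cd):
    """Build a data set with n_ab copies of A and B, n_cd copies of C and D."""
    return [A] * n_ab + [B] * n_ab + [C] * n_cd + [D] * n_cd

pc1_ab_heavy = first_axis(sample(10, 2))  # A and B oversampled
pc1_cd_heavy = first_axis(sample(2, 10))  # C and D oversampled

print("PC1 when A/B are oversampled:", np.round(np.abs(pc1_ab_heavy), 3))
print("PC1 when C/D are oversampled:", np.round(np.abs(pc1_cd_heavy), 3))
```

The underlying differences are identical in both runs; only the sample counts change, and the first component follows whichever pair is oversampled.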

This is one of the reasons why I do not trust autosomal DNA analysis too much: it often lacks fundamental rigor. This is of course not a defect in this particular study but it happens often in many others, scholarly and amateur alike.

The authors believe that the main axis of differentiation in Europe when the subcontinent is considered as a whole may tend to Northeastern Europe rather than SSE/NNW⁸, something that is consistent with their ancient admixture findings elsewhere in the study. 


FineStructure PCA output

A key point in this study is that only Sardinians and Basques can be modeled as simple EEF-WHG admixture, all other Europeans needing the MA1 component to be explained. This is much easier to visualize in the following graph.
We also processed the ChromoPainter/ChromoCombine output with fineSTRUCTURE1 using 250,000 burnin and 2,500,000 runtime MCMC iterations. Fig. S19.2 shows a Principal Components Analysis by fineSTRUCTURE which strongly resembles that of Fig. 1B.
Fig. S19-2 (annotated in gray by me)
Here we can see that only Sardinians (and quite insistently Canarians), Basques and some populations related to these (North Iberians, South French) are actually close to the Stuttgart-Loschbour axis. All the rest need a third ancestry for explanation, which is approximated by MA1 (not plotted but whose tendency I annotated).

Notice that in this case the two ancient samples are not projected, as in the previous graphs but actually computed as part of the wider West Eurasian population.

This does not deny that other NE European populations have greater affinity to Loschbour (fig S19-3) but it seems clear that this affinity must be mediated by another branch of ancient European hunter-gatherers, one that existed in Eastern Europe, rather than in the West, and that had more ancient Siberian (MA1) affinity. It is also likely that this Eastern European aboriginal population was the one which brought the extra MA1 affinity to the rest of Europe, most likely in the context of Indoeuropean (Kurgan) migrations. Along with it they probably also brought extra WHG-like admixture (but actually from an Eastern European source).

The extra MA1 tendency is also present in West Asia. This may have two alternative or complementary explanations:
  1. There have been also significant Siberian-like intrusions in West Asia after the Neolithic.
  2. This extra ancient Siberian affinity is in fact (largely?) pre-Neolithic but the founder population of West Asian roots which triggered the European Neolithic in Thessaly was particularly removed from this admixture and more akin to Palestinians or peninsular Arabs than to other West Asians.


Sicilians, Maltese and Ashkenazi Jews are different

These three are the only European populations which have a poor fit with the triple admixture model (EEF+WHG+ANE), suggesting that they have inputs from a fourth source, most likely extra admixture from West Asia.

This is apparent in the previous graph too (among several).


Tree modeling for the origins of the ancestral populations

From fig. S16-2 (allowing for five admixture edges)

From fig. 16-4 (full-genome coverage, allowing for 5 admixture edges and using Dai instead of Onge)

The basic topology of the tree is consistent (except for the partial change of the location of Karitiana Native Americans, which depends on the greater affinity of the East Asian sample used and is essentially neutralized by the admixture edge with Onge or MA1 respectively). The main admixture events are:
  1. The Karitiana (and Amerindians by extension) are clearly a mix of East Asians plus Ancient Siberians of Western affinity (MA1), which is represented differently in both trees.
  2. Early European Farmers (Stuttgart, Iceman) have clear "Basal Eurasian" admixture (which can be interpreted as North or East African input and/or a residual ancient Arabian element, probably both)
  3. La Braña also has "Basal Eurasian" admixture (surely from NW Africa, which implies that the North African component in Western Iberia is pre-Neolithic)
  4. Motala has Ancient Siberian (MA1-like) admixture
  5. Mal'ta 1 probably has some East Asian admixture
  6. Ötzi the Iceman might have some Western Hunter-Gatherer admixture (~3%)

Of the three Western branches, MA1 is the most distant one. That implies that West Asia and Europe were also exchanging genetics in the Upper Paleolithic, while Siberia remained more isolated in comparison. That stands even when East Asian admixture into MA1 is accounted for.


A cousin of Loschbour in West Asia

The authors compare and analyze many models of possible admixture leading to the known ancient and modern populations. They seem to favor this one in the end:

Figure S14.20: A model for Near Eastern populations with Ancient North Eurasian admixture.
Stuttgart is a mixture of Near_East and a sister group of Loschbour (UHG: Unknown Hunter-
Gatherers); A Test population (shown here) is a mixture of Near_East and a sister group of MA1.

This scheme suggests that ancient West Asian ("Near East") populations were closer to ancient Europeans (Loschbour) than to ancient Siberians (MA1). It also suggests an unknown population related to Loschbour (UHG) as partial ancestor of early European farmers (Stuttgart). This population is speculated to have lived in the Balkans.

An issue here is that most modern West Asians and all Caucasian peoples actually have too much MA1 affinity to be a good fit for the (ancient) Near East proto-population concept: about 12-13% among West Asians (Cypriot, Druze) and as much as 29% among Caucasians. They are actually more like "Test" than like "Near East".

The authors conclude that they don't really know if this extra MA1-like ancestry is old or recent. If old, it would imply either two different populations of West Asians or, as they say, the expansion of a West Asian population with extra "Basal Eurasian" ancestry. 

This brings us to a key question: what is actually "Basal Eurasian"?


What is "Basal Eurasian"?

Notice that "Basal Eurasian" is defined as phylogenetically intermediate between the Mbuti and Eurasian-plus populations. Quite misleadingly the node is described as "Non-Africans" but that does not need to be true at all. It is just downstream of one of the most ancient African sub-branches, that of Pygmies, so it can still represent African populations which are or were closer to the out-of-Africa branch. 

There is no formal ascertainment whatsoever of what "Basal Eurasian" is, no comparison with other African populations and not even a formal consideration of the (very likely) possibility of various isolated ancient populations existing in NW Africa or Arabia. This is clearly a flaw.

A key piece of information here is that La Braña (ancient NW Iberian hunter-gatherer) consistently shows "Basal Eurasian" admixture. This admixture is much more likely to have arrived from North Africa than anywhere else. NW African genetic markers are still apparent in Western Iberia in fact and there is strong archaeological support for Iberia-NW Africa interaction in Solutrean/Oranian times. 

In fact we can only consider this "Basal Eurasian" idea as a mere indicator of African-like affinity, even if it's not Mbuti but something else. This something else can in fact be several things:
  • NW African Aterian residual in the case of La Braña
  • Arabian OoA residual influence in the case of EEF
  • Direct NE African admixture in the proto-EEF West Asian population, strongly indicated by the NE African E1b-M78 (notably its subclade E1b-V13) in ancient Neolithic and modern European Y-DNA.
I am particularly inclined to suspect an almost direct migration from Palestine to Thessaly at the origins of European Neolithic. After all both non-European lineages found in early European farmers (E1b-V13 and G) are common in that area. But of course an Anatolian intermediate station cannot be excluded.

In any case I'd suggest replacing the terms "Basal Eurasian" and "non-Africans" with something more neutral, maybe "Ultra-Mediterranean" and "Proto-Eurasian" respectively, where both concepts are allowed to be African, at least partly so.


Update (Apr 23): see also here for some curious aspects of Loschbour's IBD ancestry.

March 29, 2014

Y-DNA R1a spread from Iran

While this conclusion was something more or less reachable with previous data (see HERE for example), a new study adds some fine detail for us to reconstruct the paleohistory of this major Eurasian lineage.

Peter A. Underhill et al., The phylogenetic and geographic structure of Y-chromosome haplogroup R1a. EJHG 2014. Pay per view → LINK [doi:10.1038/ejhg.2014.50]

Important: supplemental materials are freely available.

Abstract

R1a-M420 is one of the most widely spread Y-chromosome haplogroups; however, its substructure within Europe and Asia has remained poorly characterized. Using a panel of 16 244 male subjects from 126 populations sampled across Eurasia, we identified 2923 R1a-M420 Y-chromosomes and analyzed them to a highly granular phylogeographic resolution. Whole Y-chromosome sequence analysis of eight R1a and five R1b individuals suggests a divergence time of ~25 000 (95% CI: 21 300–29 000) years ago and a coalescence time within R1a-M417 of ~5800 (95% CI: 4800–6800) years. The spatial frequency distributions of R1a sub-haplogroups conclusively indicate two major groups, one found primarily in Europe and the other confined to Central and South Asia. Beyond the major European versus Asian dichotomy, we describe several younger sub-haplogroups. Based on spatial distributions and diversity patterns within the R1a-M420 clade, particularly rare basal branches detected primarily within Iran and eastern Turkey, we conclude that the initial episodes of haplogroup R1a diversification likely occurred in the vicinity of present-day Iran.

This case, as well as many others, including that of its close relatives R1b and Q, illustrates why frequency is not the same as origin, which can only be inferred (if at all) by studying the hierarchical diversity of the lineage. These three lineages, for example, must have spread from West Asia but they are relatively less important in numbers in that region today, overshadowed by other lineages, notably J. Instead their derived branches had major impacts in other regions (Europe, South and Central Asia, Siberia and America).



Frequencies of the main lineages

There are two main sub-lineages of R1a, which according to the current ISOGG tree version (maybe to be refitted after this study?) are known as R1a1a1b2 (Z93) and R1a1a1b1a (Z282). The first one is essentially Asian (with greatest frequencies in South and Central Asia, where it includes >98% of all R1a individuals) while the latter is almost exclusively European (notably Eastern European but with a distinct branch in Scandinavia, encompassing together >96% of R1a individuals in Europe).




These maps give us a quite decent glimpse of the main scatter patterns of R1a but alone they can't inform us of its origins. For that we have to look at the detailed tree and the relationship of its samples with geography. 


Origins and distribution of R1a

As mentioned above, the authors conclude that R1a and R1a1 must come from Iran, where the greatest basal diversity is:
To infer the geographic origin of hg R1a-M420, we identified populations harboring at least one of the two most basal haplogroups and possessing high haplogroup diversity. Among the 120 populations with sample sizes of at least 50 individuals and with at least 10% occurrence of R1a, just 6 met these criteria, and 5 of these 6 populations reside in modern-day Iran. Haplogroup diversities among the six populations ranged from 0.78 to 0.86 (Supplementary Table 4). Of the 24 R1a-M420*(xSRY10831.2) chromosomes in our data set, 18 were sampled in Iran and 3 were from eastern Turkey. Similarly, five of the six observed R1a1-SRY10831.2*(xM417/Page7) chromosomes were also from Iran, with the sixth occurring in a Kabardin individual from the Caucasus. Owing to the prevalence of basal lineages and the high levels of haplogroup diversities in the region, we find a compelling case for the Middle East, possibly near present-day Iran, as the geographic origin of hg R1a.
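
The quoted passage does not say which estimator is behind the "haplogroup diversities" of 0.78 to 0.86. A minimal sketch, assuming the common Nei unbiased estimator h = n/(n-1) · (1 - Σ pᵢ²) and entirely hypothetical sample counts, shows what such a number measures:

```python
from collections import Counter

def haplogroup_diversity(labels):
    """Nei's unbiased diversity: n/(n-1) * (1 - sum of squared frequencies).

    Assumption: this is a standard estimator, not necessarily the exact
    one used in the paper.
    """
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    counts = Counter(labels)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)

# Hypothetical populations of 50 chromosomes each (made-up counts).
uniform = ["Z93"] * 50
mixed = ["R1a*"] * 10 + ["R1a1*"] * 10 + ["Z93"] * 15 + ["Z282"] * 15

print(round(haplogroup_diversity(uniform), 3))
print(round(haplogroup_diversity(mixed), 3))
```

A population carrying a single sub-haplogroup scores zero, while one that retains several branches, including basal ones, scores high, which is why diversity rather than raw frequency is used to look for the origin.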

Between these top tier nodes (R1a and R1a1) and the two most common sublineages described above, this study only found one paragroup represented: R1a1a1* (M417). This should be an important step in the analysis but the researchers prefer to remain silent on it. Why? I guess that the reason is that it is complicated to analyze and to reach sound conclusions about.

I spent some time today looking at the haplotypes of this paragroup mentioned in the study and I could not reach a conclusion either: the majority of the sequences are from Europe and all of them (excepting a highly derived Norwegian line and including a low derived Iranian one) seem to derive from a North German haplotype. I call this group "branch A".

However there is at least one West Asian sequence (from Turkey) which seems independent ("branch B"), while an Indian and the already mentioned Norwegian sequence could derive from either one. So my impression is that there is a specifically North European "branch A" but also some other stuff with West Asian centrality ("branch B") within this key paragroup.

I guess that I could say a lot more about not being able to say much more on this key intermediate step but, in short, there are two options between which I can't decide:
  • Branch A went back to West Asia from where it spread again to Eastern Europe and Central South Asia.
  • Branch B is actually at the origin of the two derived and highly spread subhaplogroups.
Whatever the case, I understand that there are good reasons to think that these spread first from West Asia, at the very least Z93 and very likely also Z282.


R1a1a1b2 (Z93)

There is nothing European in this lineage: only some lesser terminal branches at the Southern Urals, roughly where the Kurgan phenomenon began some 6000 years ago. 

This detail is indeed remarkable because, if, as often argued, R1a or some of its subclades spread from there, we should expect at least some basal diversity being retained. Instead all we see are some highly derived branches. So the main conclusion must be that the expansion of R1a does not seem related to the Kurgan phenomenon, except maybe in some secondary instances. 

As mentioned before, this lineage is Central and South Asian and comprises the vast majority of R1a in those two regions. 

The detailed haplotype network can be seen in Supp. Info fig. 2.

In essence we can say that:
  • Z93* has three apparent distinct branches stemming from West Asia (incl. Caucasus) and another one from South Asia/Altai (1). 
  • Z95* has two apparent distinct branches:
    • A small one with presence in West Asia and Southern Europe
    • Another one (pre-M780?) stemming from South or West Asia
  • M780 has clear origins in South Asia (incl. most Roma lineages)
  • Z2125 also appears to originate in South Asia, even if it has a greater spread outside it, notably to Central Asia
  • M580 and M582 appear related and surely originated in West Asia
Weighting them:
  • Z95:
    • West Asia: 2
    • South Asia: 2
    • West/South Asia: 1
Therefore the origin of Z95 should be thought of as West-South Asian but undecided between either region. Say Afghanistan for example.
  • Z93:
    • West Asia: 3
    • West/South Asia: 1 (Z95)
    • South Asia/Altai: 1 
In this case I would say that West Asia is almost certainly the origin, although tending to Central/South Asia. For example: Iran again. 

So, regardless of whether the previous stage (M417) represents a stay in West Asia or a back-migration from Europe into West Asia, West Asia is clearly at the origin of Z93. It does not represent any Kurgan migration but an Asian phenomenon with origins towards the West (around Iran).


R1a1a1b1a (Z282)

At first sight this European sublineage seemed much simpler: it is obvious that the bulk of it spread from Eastern Europe. However, when we look at the haplotype network, we cannot confirm this pattern for the Norwegian or Scandinavian haplogroup Z284, which is only linked to the rest via some South European and West Asian samples.

So my conclusion must be that Z282 experienced a main expansion from Eastern Europe but only into Eastern and Central Europe and that the Scandinavian variant almost certainly represents another flow within this haplogroup, with the knot being in West Asia. 

Anyhow the main East and Central European expansion seems true. For some reason it is not centered on any obvious prehistoric locality, such as the Volga or maybe Ukraine; instead its center is further North, around Smolensk.


Overall reconstruction of the spread of R1a

With all the previous analysis I made this map, which also shows in discrete gray color the general pattern of expansion of haplogroup R:


We have an expansion of R into South Asia and Western Eurasia (incl. Central Asia) and even into parts of Africa (R1b-V88) from apparent South Asian (R, R1 and R2) and West Asian (R1a, R1b) origins. Related lineages Q and P* could also be integrated into this pattern of expansion but I did not want to overload the map with too many details. 

There is some uncertainty regarding the North European branches of R1a but otherwise the pattern seems quite clear. 

On these North European branches, I must say that they remind me of other odd lineages with similar geography: R1b-U106, I1-M253 and I2a2-M223. With the likely exception of R1b-U106, none of them appears to have experienced any significant re-expansion since their arrival in that corner of the World; however they do seem to survive pretty well in it.


Time frame?

Finally we seem to be entering the age of full Y chromosome sequencing and a more serious molecular clock based on it. As I have explained on other occasions (for example), the human Y chromosome is large enough to experience mutations almost every single generation, which should provide a decent molecular clock, unlike the very rough approximations used in the past.

However the issue of correct calibration remains open. As you surely know, the academy is slow to incorporate the most recent evidence, especially from fields distinct from its own specialties. Hence I do not expect them to calibrate based on the obvious fact that age(CF) or at least age(F)=100,000 years. They are probably still stuck in old concepts of a "recent" out-of-Africa migration c. 60 or at most 80 Ka ago, as well as the usual Pan-Homo split underestimates.

I must admit in any case that I have not had enough time to study this matter in depth yet, so the previous observation is rather my idea of what to expect.

In any case in this study the authors resorted to full Y chromosome sequences to calculate their age estimates and I applaud them for doing so. As apparent in fig. 5, all R1 derived sequences have approximately the same number of accumulated SNPs, which in principle allows for a much improved molecular clock, assuming it is well calibrated.

Their estimate is as follows:
A consensus has not yet been reached on the rate at which Y-chromosome SNPs accumulate within this 9.99Mb sequence. Recent estimates include one SNP per: ~100 years,⁵⁸ 122 years,⁴ 151 years⁵ (deep sequencing reanalysis rate), and 162 years.⁵⁹ Using a rate of one SNP per 122 years, and based on an average branch length of 206 SNPs from the common ancestor of the 13 sequences, we estimate the bifurcation of R1 into R1a and R1b to have occurred ~25,100 ago (95% CI: 21,300–29,000). Using the 8 R1a lineages, with an average length of 48 SNPs accumulated since the common ancestor, we estimate the splintering of R1a-M417 to have occurred rather recently, ~5800 years ago (95% CI: 4800–6800). The slowest mutation rate estimate would inflate these time estimates by one third, and the fastest would deflate them by 17%.
The references correspond to (4) Poznik 2013, (5) Francalacci 2013, (58) Xue 2009 and (59) Méndez 2013. This last is the Anzick study, of which at the very least we can say that they had a real calibration point in the ancient Amerindian DNA. It is also the one which provides the slowest mutation rate (most years per SNP).

Considering that Xue 2009 is "old" (for this avant-garde aspect of this pretty young science), I find their choice of the Poznik rate rather conservative. The Francalacci rate is the intermediate one of the three "recent" papers referenced and it is also quite close to the calibrated Méndez rate.

Personally I would choose the latter without a second thought. As long as CF ends up being younger than 100 Ka, it is positively too conservative anyhow.

Using the Méndez (Anzick-calibrated) rate of 162 years per SNP, I get the following corrected estimates:
  • R1a/R1b split (R1 node): 33,000 years ago (CI: 26.0-42.5 Ka)
  • R1a-M417 node: 7,700 years ago (CI: 6.4-9.0 Ka)
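
The point estimates follow from simple multiplication of branch length by years per SNP. A minimal sketch of that arithmetic, using only the SNP counts and rates quoted from the paper above and assuming a strict linear clock (confidence intervals are left out, since their derivation is not described here):

```python
# Years per SNP over the 9.99 Mb sequence, as quoted from the paper.
YEARS_PER_SNP = (100, 122, 151, 162)

# Average branch lengths (SNPs since the common ancestor), also from the quote.
BRANCH_SNPS = {"R1 (R1a/R1b split)": 206, "R1a-M417": 48}

def node_age(snps, years_per_snp):
    """Age in years under a strict clock: SNP count times years per SNP."""
    return snps * years_per_snp

for node, snps in BRANCH_SNPS.items():
    ages = ", ".join(
        f"{rate} y/SNP: {node_age(snps, rate):,} y" for rate in YEARS_PER_SNP
    )
    print(f"{node}: {ages}")
```

At 122 years per SNP this gives 25,132 and 5,856 years, matching the paper's rounded ~25,100 and ~5800; at 162 years per SNP it gives 33,372 and 7,776 years, which are the point estimates listed above after rounding.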

These seem fair enough to me, judging by the fact that the core R1a expansion seems to originate in West Asia (at the very least for the South/Central Asian branch), which fits much better with a Neolithic frame than with the Kurgan one.

It also fits better with my previous estimates after due re-calibration of Terry D. Robb's full sequence Y-DNA tree, although my estimates are even older, especially after a second recalibration to adjust to the recent discovery of widespread H. sapiens evidence in South and East Asia c. 100 Ka ago.

In my understanding the R1 node is actually c. 48 Ka old (R1b: c. 34 Ka), which, apportioning, yields a date of c. 11.2 Ka for the R1a-M417 node.
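
The 11.2 Ka figure is consistent with scaling the R1 age by the ratio of SNP counts quoted from the paper (48 SNPs for R1a-M417 out of 206 for R1). A minimal sketch, assuming that this proportional rule is indeed what "apportioning" means here:

```python
def apportion(parent_age_ka, node_snps, parent_snps):
    """Scale a parent node's age by the share of SNPs on the child branch."""
    return parent_age_ka * node_snps / parent_snps

# 48 Ka for R1, 48 SNPs (R1a-M417) out of 206 SNPs (R1), per the text above.
m417_ka = apportion(48.0, 48, 206)
print(round(m417_ka, 1))
```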



Update (Mar 31): best possible molecular clock estimates for R1:

Follows fig. 5 of Underhill et al. 2014, annotated by me in red and purple colors:


If I'm correct, then the expansion of R1b in Europe still corresponds in rough terms to the Magdalenian period or, more generally, the late Upper Paleolithic. This does not mean that it remained that way forever (it may well have been reshuffled later on: in the Epipaleolithic, Neolithic and Chalcolithic) but it seems to be the time-frame of its main expansion when the main lineages got established, whatever happened to them later on.

I know well that so far ancient DNA for this lineage remains to be found and that the dominant haplogroup among known Epipaleolithic hunter-gatherers was (for all we know) I2a. However this is what the refined full Y chromosome sequence molecular clock, properly calibrated according to the archaeological evidence for the settling of Asia by H. sapiens, has to say. If you wish to dismiss this and use another estimate instead, that's always up to you. I just hope that you know what you're doing.

Anyhow, if I am correct, then the expansion of R1a is neither Chalcolithic nor Neolithic but clearly Epipaleolithic. Does it make any sense? I can't say for sure because this period is not so well understood. Whatever the case, is it possible to integrate the key pre-Neolithic Zarzian culture of the Zagros (map) in this scheme of things? What about all the other question marks that fill the gaps of our mediocre knowledge of the Mesolithic of West Asia? Or is the Balkan Epigravettian to be blamed instead? Or both?

I really can't say with any certainty at this stage. But I am intrigued indeed.


Update (Mar 31): frequency pie charts of Underhill's data available at Kurdish DNA.