March 17, 2018

Oldest known Iberian R1b-S116 (and DF27) is NOT at all Indoeuropean

This study is very interesting but it is very wrongly argued, maybe in an attempt to fit their findings with what has sadly become the mainstream current of "explanation" about the origins of Y-DNA haplogroup R1b-S116 (also P312, etc.).

Cristina Valdiosera, Thorsten Günther et al. Four millennia of Iberian biomolecular prehistory illustrate the impact of prehistoric migrations at the far end of Eurasia. PNAS 2018. DOI:10.1073/pnas.1717762115

The issue is that they found the very first known carrier of R1b-S116 (and R1b-DF27, the main Iberian haplogroup) in an individual of the Bronze Age of Lower Rioja (Cueva de Los Lagos, Alhama de Cervera), belonging very clearly to the Central Iberian culture of Cogotas I, even if it is at its very northeast margin.

What is wrong? Well, the very title is wrong. It is nothing but an artifact produced by forced (supervised) results of Admixture within the simplistic 3-population model. Even then their result is in fact so weak that it immediately cried out to me as "artifact" (noise or whatever you want to call it) and it is effectively nothing but that.

Demonstrating it is as simple as digging into the supplementary materials and looking at the unsupervised Admixture run (dataset S03), whose optimal columns (lowest CV scores) are K=16-19 (all four are optimal, which is fine with me but makes explanation and understanding a bit more dense).
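For readers who want to check the "lowest CV scores" step themselves, here is a minimal sketch of how one could pick the near-optimal K values from cross-validation output. It assumes log lines in the form that ADMIXTURE prints when run with cross-validation (`CV error (K=16): ...`). The numbers in `LOG_TEXT`, the function names and the `tolerance` threshold are all invented for illustration; they are not the values from dataset S03.

```python
import re

# Hypothetical log excerpt, one line per K.
# These numbers are made up for illustration; they are NOT from the paper.
LOG_TEXT = """\
CV error (K=15): 0.41230
CV error (K=16): 0.41102
CV error (K=17): 0.41100
CV error (K=18): 0.41101
CV error (K=19): 0.41103
CV error (K=20): 0.41250
"""

CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def parse_cv_errors(text):
    """Return a {K: cv_error} mapping parsed from the log text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(text)}


def near_optimal_ks(cv_errors, tolerance=1e-4):
    """Return every K whose CV error is within `tolerance` of the minimum.

    `tolerance` is an arbitrary illustrative threshold, not a standard value.
    """
    best = min(cv_errors.values())
    return sorted(k for k, err in cv_errors.items() if err - best <= tolerance)


print(near_optimal_ks(parse_cv_errors(LOG_TEXT)))  # prints [16, 17, 18, 19]
```

With several K values this close together, none can be singled out as "the" best one, which is why all four columns are treated as equally valid here.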

As that unsupervised admixture is massive, with lots of global populations ancient and modern, I made a selection using only the four optimal K-values (K=16 to K=19, from left to right):

[Figure: my selection from the unsupervised Admixture run, K=16 to K=19; labels at bottom are mine]

And it is absolutely clear from K=16 to K=18 that there is not a speck of the Caucasus component, which is absolutely universal in all the true Indoeuropean samples. There is a tiny speck of it in the K=19 column, but there even Sardinians and some Anatolian Neolithic individuals have it at much greater values, so it can no longer be automatically interpreted as an Indoeuropean marker, but just as extra Caucasus affinity present in some Neolithic-derived populations or individuals more than in others since the very beginning of the mainline (Vasconic) European Neolithic at the Aegean.

And this is it. Quod erat demonstrandum (Q.E.D): R1b-S116, at least in Iberia, has nothing to do with Indoeuropean expansion, nothing at all: it is absolutely clear that it is a pre-Indoeuropean thing. And it has been present in Lower Rioja since at least the Bronze Age.

Furthermore, when we look at the Central European Bell Beaker (Central BB) samples and compare them with their immediate chronological precursors of (definitely Indoeuropean) Corded Ware culture, we must admit that there is a decrease of the Caucasus component and an increase of the Vasconic Neolithic (light blue) element. This also speaks against the Indoeuropean "explanation" for the expansion of R1b-S116 into Central Europe, because the first known such ancient carriers are from the Bell Beaker period and not a moment earlier, and these clearly express an anti-Indoeuropean tendency in their autosomal genomes.

There is however a sizable Indoeuropean component in modern non-Basque Iberians, smaller than in most other European populations but very clear nevertheless. This must have arrived at later times: (1) with the Celts, who arrived in Catalonia at the end of the Bronze Age, later expanding into Central and Western Iberia, (2) with the Romans, (3) maybe also to some extent with the Germanic invaders of the late Roman period. None of these expansions seems particularly associated with R1b-S116; however, the c. 1% R1a and the c. 8% J2 (with plausible Italo-Roman origin) should be related to it, along with an assortment of other haplogroups.

For those willing to dig in the details, there is also a small treasure trove of other ancient Y-DNA, mostly I (which underlines the Paleoeuropean influence in Neolithic Iberia, regardless of whether this is local or was carried on from further East by the Neolithic settlers), as well as one instance of unspecific R1b, another of G and another of H.

Someone may ask, which is then the origin and means of expansion of R1b-S116, if not Indoeuropean? Good question to which I don't have yet a well defined answer. But my tentative explanation is that it should be related to two ultimately related processes within Western European "Neolithic" (Late Neolithic and Chalcolithic): 
  1. The well documented phenomenon of increase, in most areas at least, of the Paleoeuropean component as time passes. This may be to some extent because of simple absorption of local subneolithic "hunter-gatherers", but it probably also produced different subpopulations within the Western Neolithic, and in some cases we do see these peripheral "Second Neolithic" groups expanding at the expense of the "First Neolithic" peoples. This is most clear in Central Europe with the expansion of Funnelbeaker cultures from, probably, Denmark and nearby areas of Low Germany. In fact Michelsberg culture and its close relative in France, Seine-Oise-Marne, basically wiped out the first farmers of LBK (Linear Pottery) at what I usually describe as the Chalcolithic but is often described as Middle or Late Neolithic in other sources.
  2. Clearly Bell Beaker had something to do with it: we see their impact in Germany, Britain and Ireland, and one could argue that Cogotas I is somehow derived from the Bell Beaker of Ciempozuelos, although on this I'm going to remain neutral and a bit skeptical until more evidence shows up.
But what seems very apparent to me is that R1b-S116 should have expanded from somewhere in France, probably towards the South. And we do need better genetic studies, including archaeogenetic ones, on the Hexagon before we can jump to conclusions. France is not the area most affected by Bell Beaker, so I am cautious about attributing too much weight to Bell Beaker alone and I would rather think of a complex succession of expansions associated with various cultures.

Of great interest here should be the little-known but fascinating Artenacian culture, which expanded across all of West France and Belgium from a core at Dordogne before the BB period, coincident with the Corded Ware expansion in Central Europe. Like the Bell Beaker folk, they were adept at bowmanship, but their area is not densely affected by Bell Beaker later on (although there is indeed a scatter of findings). I do wonder if somehow Bell Beaker is derived from Artenac, even if it is clearly not the same thing. Food for thought.

Update (March 18): small steppe-like noise appears in diverse Iberian samples since the Late Neolithic/Chalcolithic.

This has arisen in the discussion below (h/t to MZ): when the supervised analysis (forced assignment to three rigid populations) is used, "steppe" ancestry appears here and there also before the Bronze Age. As we see above, this is not real: it does not happen in the unsupervised model at all and is mere "noise" or "artifact" produced by the excessive simplicity of the three populations model.

This does not make the three populations model "wrong": it is still approximately right, but "evidence" produced ONLY from rigidly applying this model is not evidence of anything, just a hint to be confirmed or rejected via wider analysis at best.
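The general idea that forcing a sample into too few source populations can produce a small spurious component can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the "allele frequency" vectors are invented, the fit is a plain least-squares grid search rather than what Admixture actually does, and the population names are only labels. It shows the mechanism, not anything about the real data.

```python
# Toy illustration only: invented "allele frequency" vectors, not real data.
SOURCES = {
    "hunter_gatherer": (1.0, 0.0, 0.0),
    "farmer": (0.0, 1.0, 0.0),
    "steppe": (0.0, 0.0, 1.0),
}
# A fourth, unmodelled population that happens to resemble "steppe" at one marker.
UNMODELLED = (0.2, 0.2, 0.6)


def mix(weights, populations):
    """Weighted average of the population vectors."""
    n = len(populations[0])
    return tuple(sum(w * p[i] for w, p in zip(weights, populations)) for i in range(n))


def forced_fit(sample, sources, steps=100):
    """Grid search over three weights summing to 1, minimising squared error."""
    pops = list(sources.values())
    best, best_err = None, float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = (i / steps, j / steps, (steps - i - j) / steps)
            fitted = mix(w, pops)
            err = sum((a - b) ** 2 for a, b in zip(fitted, sample))
            if err < best_err:
                best, best_err = w, err
    return dict(zip(sources, best))


# The sample truly has NO "steppe" ancestry: 50% HG, 40% farmer, 10% unmodelled.
sample = mix((0.5, 0.4, 0.1), [SOURCES["hunter_gatherer"], SOURCES["farmer"], UNMODELLED])
result = forced_fit(sample, SOURCES)
print(result)  # the "steppe" weight comes out as 0.06, not 0
```

In this toy, the forced three-way fit has nowhere else to put the unmodelled 10%, so a slice of it lands on the "steppe" label.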


  1. "Quod erat demonstratum (Q.E.D)": I'd say: "Quod erat demonstrandum" (What had to be demonstrated).

    1. You seem to be right, so I'm going to correct my error right away. Thank you and my apologies for my poor provincial Latin.

    2. It seems that you didn't accept Latin as a superstratum, but also we Etruscans had, but we learned it so well that Italian was born here and not at Rome.
      Anyway I've been a teacher of Italian, Latin, History and Geography in the "Liceo", thus I play at home. Your post is very interesting and I'll study that very well.
      In the past I expressed the hypothesis that Italy had at least three languages: a Basque one in Sardinia, an Indo-European centum in the North and Etruscan was intermediate between Caucasian and IE (of course this theory isn't mine but of Alfredo Trombetti), even though it may have come from the Aegean Sea with the first agriculturalists through central Europe and Southward (see Rhaetians and Camuns)...
      Yamnaya very likely expanded only the satem languages... and to link a language to only one haplogroup is simply stupid...

  2. In the detailed results of the supervised 3-population ADMIXTURE run the "steppe" component is already present in the late neolithic samples. Since those predate Yamnaya it must be some type of HG ancestry. This makes it possible to tease out inflated steppe components in descended populations like Basques in models based on f-statistics, while SNP-sorting algorithms like ADMIXTURE and more recently Dystruct fail to pick up the steppe signal.

    1. It's an artifact. In general, a well done unsupervised run is always better than a supervised run.

      And what do we see in the unsupervised run? That there is some sort of confusion between, most obviously, Caucasus affinity due to Mainstream (Anatolia) Neolithic present irregularly in the EEF and derived populations. Another source of confusion may be on the side of the HGs but it's not that obvious.

      As I told someone on Facebook yesterday: this graph, my selection from Valdiosera's unsupervised run, is very good for training the eye in what we can call the "quantum uncertainty" of autosomal genetics, because all four columns are equally good, judging by cross-validation scores, which AFAIK is the relevant control, but they produce clearly different results for almost all populations.

      Why? Because in the left columns (K=16 and K=17 are almost the same for our selection), the Early Neolithic is not as well defined and shows instead more WHG, which is indeed correct for all we know: early West Anatolians carried some WHG along with West Asian specific Neolithic ancestry. In K=19 however (K=18 is transitional but closest to those to its left) Anatolia Neolithic and LBK are almost monolithically light blue, which is similar to what we get in so many other papers focused on Europe and European demographic genesis. So the former are rather relative to West Asian Neolithic, increasing the overall WHG scores, while the latter is rather relative to early European mainline Neolithic, thus producing lower WHG scores.

      We only get the Caucasus component in Sardinia (and negligibly in Bronze Age Iberia) in the last column, but then we also get it in Early West Anatolia Neolithic. So in this column it is not always an Indoeuropean or Steppe marker but also a marker of some internal Early Neolithic variability; this can become a source of confusion and may well be related to the artifacts in the supervised runs.

      But mostly it is a question of forcing itself: the genetic complexity is always richer than any oversimplified model and thus it tends to produce such artifacts or "noise" when forced to fit a pre-determined model which is seldom good enough.

  3. The detailed results are on page 28 in the supplementary information btw. In particular some of the Iberian LNCA individuals are inferred to have as much or more of the steppe component than almost all of the BA individuals.

    Sorry for commenting twice.

    1. Your comment was smitten by God. No bickering, no negative gossiping (much less with names), thank you.

      I'm interested in FACTS and healthy discussions. Not in human misery.

  6. I like your analysis. It’s very sophisticated and absolutely unbiased. And I as well believe that major European subclades of R1b located below R1b-L151 expanded from somewhere in the South of France.

    1. Thank you Dmitry. I did almost nothing: just grabbing Valdiosera's own data and putting it in a more VISIBLE form. I would not dare to say my analysis is "sophisticated" but at least it's not a shallow acceptance of what is force-fed in the most visible parts of the study.

      Maybe more "sophisticated" but still quite simple, as it is directly derived from available modern Y-DNA, is the idea that R1b-S116 expanded from Southern France. I've been behind that model for eight years and I have yet to see a single piece of evidence to the contrary. Of course it'd be nice if there were clearer and more detailed evidence in favor, notably regionalized French Y-DNA, both modern and ancient. That would help a lot.

  7. Maju
    Can you point me to where I can identify "...Iberian LNCA individuals are inferred to have as much or more of the steppe component than almost all of the BA individuals..."?

    It's interesting how a southwest_Iberia_LNCA added to Olalde can show up as so important in Central European Bell Beakers in some models.

    1. Unsure, that's something that MZ said, not me.

    2. It's on page 28 here:

      The steppe component - if you want to call it that - already exists all over in LNCA Iberia, albeit with a more irregular distribution than in the Bronze Age. The peaks, however, are comparable. In particular one LNCA seems to have a bit more of the steppe component than even the individual with the highest steppe component from the Bronze Age. Judging by the PCA, I'd say it's the LN individual from San Quilez with a position just north of present day Basques.

    3. Now that you mention it, I already made a cut of that image for the FB discussion. I'm going to add it as an update, because it is very clear and relevant: supervised (forced) analysis causes some individuals to appear as slightly steppary, both in the Chalcolithic and in the Bronze Age samples, both in Northern and Southern Iberia. The reason? Unsure, but surely it is that the 3-pops model is not good enough to properly represent the actual genetic diversity (it gets close but it's not "evidence" in its own right).


  8. I think I got it. This paper is following Martiniano sequence. The Portuguese last sample that is showing “steppe” is MC337A.
    Every time we find an individual in these recently published papers with Iberian samples who was not a socially irrelevant individual thrown into a cave, we need to look carefully. This sample was not categorized as Bell Beaker, and neither was the I6601 added at the last minute to Olalde. They might not be, and probably are not.
    But both were high-status individuals buried with care. Both lived in the exact regions at the exact time of Bell Beaker arising: I6601 in Zambujal, and this MC337A in Alcalar.
    This MC337A woman was from the Alcalar Bell Beaker centre. Alcalar, at the farthest point in Europe, beyond the mountains, in the Algarve, was a big Bell Beaker center. It started in 3200 BC and by 2800 BC was big and powerful and started to have Bell Beaker pottery. She was a 60-year-old buried with a blade, two pins and beads in 2800 BC.

    Would love to see more modelling with her, as Alberto did with I6601. See I6601 as southwest Iberia CA here:

  9. The calls show that PIR001 belonged to Y haplogroup R1b1a1a2a~L23.

    Ahahahaah the first R-L23* found in Iberia [PIR001, Bronze Age, 2200–1550] and not in Eastern Europe
    I’ll study the question later, after bath and breakfast.

  10. The calls show that PIR001 belonged to Y haplogroup R1b1a1a2a~L23.

    Private SNPs

  11. The R-DF27 sample is at this level: R1b1a2-BY15964

    R-Y24895 BY15964/Y32225 * Y24894 * Y24895+3 SNPs formed 4500 ybp, TMRCA 4000 ybp
    o id:YF08869PRT [PT-20]
    o id:YF07103GBR
    o id:YF06232MEX [MX-NLE]

  12. I'm afraid I have to disagree with your labels.
    Paleo-European in red is the same thing as Vasconic.
    What you labeled Neolithic in light blue is obviously Indo-European.
    I seriously disagree with the idea that Basque would have anything to do with the expansion of Neolithic.

    1. You're going totally against the tide, a tide of massively growing evidence in support of the Kurgan model of Indoeuropean expansion, which was always the only robust model anyhow. But whatever...

    2. And also massively growing evidence in support of mainline Neolithic = Vasconic. The strongest evidence seems to me the discovery of eteo-Sardinian (also called "paleo-Sardinian") being clearly Vasconic, which is nearly impossible to explain unless we accept that the mainline early European farmers, or people of the Neolithic of Aegean roots, were Vasconic or proto-Vasconic speakers, because modern Sardinians are almost identical to them.

    3. My opinion is twofold:
      1. PIE cannot be as late as the Kurgan theory. Besides, there's no indication that PIE was of Neolithic dating in the first place. Most of the "evidence" is highly debatable.
      2. It's quite possible that the Paleo-European substrates that existed in Northern Europe (Pre-Germanic), in the British Isles (Pre-Irish), in Spain and in Greece belong to a single linguistic family. This is not very easy to prove, because it requires quite a lot of work. But there are quite interesting indications in that direction.
      I think Basque and its ancestors have been there for millennia, maybe as early as modern man reached there.
      And, yes, Pre-Sardinian also belongs to that Paleo-European (Pre-PIE) and Pre-Neolithic group.

    4. It's very difficult to follow your terminology, France-LGC, let alone why you would think many of the ideas you sustain (for instance the Kurgan model and the linguistic "consensus" on PIE's antiquity are very much coincident), but what I most clearly have to disagree with is this:

      "Pre-Sardinian also belongs to that Paleo-European (Pre-PIE) and Pre-Neolithic group."

      If so, how is it that all early NEOLITHIC peoples systematically come out closely aligned with them in genetic terms? It's like saying that US-Americans have nothing to do with Europe, when most of their genetics do. Sorry, no way!

      Some time ago I used to be in the Paleolithic continuity camp too, but that cannot be sustained anymore in light of the archaeogenetic evidence. So if the facts don't fit your theory, you have two options: going fanatic and denying the facts, or being humble and changing your theories according to what the facts say.

    5. Sorry, but I cannot see that claim about early Neolithic people.
      Anatolian, according to your graph, is mostly light blue = Neolithic
      Sardinian is half blue and half red = Paleo-European with admixture of Neolithic.
      Basque and Spanish have even more red = Paleo-European.
      I see nothing in your graph that disproves my approach.

    6. In the right column (K=19), where Anatolia Neolithic is 100% cyan (light blue), Sardinia is 80% in the very same color. In other columns it is different and Anatolia Neolithic also appears partly as red (WHG, which also appears in some studies specifically about that original group). And you should not only look at Anatolia but also at the other, slightly more HG-admixed group of Central Europe, and also Iberia EN/MN, Ötzi (in other studies), etc.

      Basque and Spanish (modern or even ancient but at later times) have indeed more red. And it is an open issue how that process of admixture happened and what role Atlantic Europe (and IMO what is now Western France, maybe critically) played in it. But that does not erase the cyan component: it just weakens it a bit.

      All four columns are equally true internally, at least as far as statistical analysis can tell and for a global analysis, so there is indeed some uncertainty. But when faced with such a "fuzzy logic" a good idea is usually to take the middle path, so let's take K=18 (because K=16 and K=17 are almost identical, the difference seems to be in other parts of the World). In that column and gauging by eye, I get:

      Anatolian Neol. 90% cyan, 10% red

      LBK 85% cyan, 15% red (so basically they advanced by the Morava-Danube route with nearly no admixture)

      Iberia North EN, 80% cyan, 20% red (so a bit more admixture but still not much)

      Iberia North LNCA, 60% cyan, 40% red (this is probably when the "Atlantic Shift" begins, but it's still 2/3 Aegean Neolithic-like)

      Iberia North BA 50%-50%

      Modern Basques: 30% cyan, 70% red.

      Modern Sardinians: 60% cyan, 40% red.


      Modern Mexicans are like 50% Spanish and 50% Native, what do most of them speak: Spanish or a Native language? Sure there is admixture and I do see it and also argue for it as originating and expanding from Atlantic Europe (it seems but we need more evidence from places like France, Denmark, etc.) But still Sardinians are at best like LNCA Iberians... with some luck.

      In any case you should read many other studies. This is just one.

    7. It could also be interesting in this context to consider the latest work done by Aikio, as well as others before him, on northern substrate languages in Sami, which seem to have had no discernible relationship with either Indo-European or Vasconic. It would be consistent with what we know of the population genetics of modern Sami (i.e. hugely inflated HG ancestry compared to other Europeans) to consider these Paleo-Laplandic languages the languages that were spoken by at least some of the 'Paleoeuropean' HGs before the arrival of the Uralics. And yeah, at least those northern HGs definitely didn't speak Vasconic, which definitely strengthens the Vasconic = Neolithic association.

    8. I was not aware: looks interesting. Do you have a link?

    9. This paper is a good summary - the case for at least two distinct substrata looks very strong (here dubbed Proto-Laplandic & Proto-Lakelandic):

      There's an interesting sketch of a linguistic map of the early iron age in northern Europe, with Lakelandic/Laplandic covering most of Fennoscandia based on toponymy before the arrival of Uralic from the south-east and Germanic from the south.

      Come to think of it, perhaps it was a bit speculative of me to say that those would have been the languages of the HGs since some very old off-shoots of CWC presumably also survived in Finland and parts of Lapland. But I think there's at least the possibility that those mysterious groups are responsible for the inflated HG ancestry of modern Sami.

  13. The calls show that ESP005 belonged to Y haplogroup R1b1a1a2a1a2a-BY15964.

    Private SNPs:
    Even though these SNPs (only downstream P312) won’t all be reliable, because they come from Ytree rather than the more rigorous YFull tree, who can believe that hg. R-L27 is only 4500 years old, as YFull has it?

  14. R1b-P312-Z290-L21-DF13-Z39589-DF41-Z43690-Y8426-A40-BY281-BY283

  15. This sample is ancestral to the survived haplogroups so far, and present only in Iberia and Southern America, because it is positive only for three of the five SNPs at this level:


