The scientific community is abuzz with the publication of 30 articles in Nature and other journals (Genome Research and Genome Biology) resulting from the Encode project. As Gina Kolata reports in the New York Times:
The human genome is packed with at least four million gene switches that reside in bits of DNA that once were dismissed as “junk” but that turn out to play critical roles in controlling how cells, organs and other tissues behave. The discovery, considered a major medical and scientific breakthrough, has enormous implications for human health because many complex diseases appear to be caused by tiny changes in hundreds of gene switches.
The findings are the fruit of an immense federal project, involving 440 scientists from 32 labs around the world. As they delved into the “junk” — parts of the DNA that are not actual genes containing instructions for proteins — they discovered it is not junk at all. At least 80 percent of it is active and needed.
The result is an annotated road map of much of this DNA, noting what it is doing and how. It includes the system of switches that, acting like dimmer switches for lights, control which genes are used in a cell and when they are used, and determine, for instance, whether a cell becomes a liver cell or a neuron.
There are only about 21,000 genes that code for proteins, and together they make up a tiny proportion of human DNA. Previously it was thought that 99 percent of the DNA in the human genome didn’t do anything; it was ‘junk.’ Now it appears that most of it is actually functional, but in a very strange way.
The human genome resembles an army that has 21,000 privates and millions of generals that tell privates (and each other) what to do.
Visualizations of networked linkages between genetic components across the human genome (right) and in a smaller subset (left). (Image: Gerstein et al. in Nature)
What I find striking is that much of the action, both in selection responses and in responses to environmental influences or aging, is in the vast network of regulatory genes. My experience with such massively and nonlinearly connected systems is that it is very difficult to manipulate them to achieve a desired outcome, and very easy to get unintended (and undesirable) consequences via nonlinear feedback loops.
This reminds me of a conversation I had with the evolutionary biologist Michael Rose in April at the Consilience conference in St. Louis. As I wrote in my blog post on why I decided to switch to the (so-called) Paleo diet:
The reason is that the one gene, one action model is wrong; it’s not how our bodies work. Most functions are regulated not by a single gene, but by whole networks of them. As we age, some genes come on and others go off, and the network changes, often in very subtle and nonlinear ways.
The new results from the Encode project appear to be a dramatic confirmation of this view.
This is very interesting stuff!
Michael Eisen, at his “it is NOT junk” blog (http://www.michaeleisen.org/blog/?p=1167), takes some issue with the study’s press release and the subsequent media accounts. He says that many of the sequences identified in the paper will turn out not to be involved in gene regulation at all. He doesn’t give an estimate of the fraction, but the network may turn out to be considerably less massively connected once it is pruned.
Eisen’s point that the “junk DNA” idea has been debunked (over and over) for at least a decade is also worth noting. Even before the ENCODE project, there was no shortage of projects and headlines declaring that most DNA is not junk.
Matt, thanks for the link to Michael Eisen’s blog. It is quite interesting, and not only because of his coverage of genomics. I found his viewpoint on the Encode brouhaha very illuminating (naturally, all my information on this topic comes from the media; I don’t read the primary literature in this field). However, his criticism of the scientists for ‘allowing’ themselves to be misquoted is not fair. Having gone through the process on numerous occasions (most recently last month), I can attest that even direct quotes often garble what I actually said. Additionally, most publications have policies against letting interviewees check how their words will actually be reported. So I decided not to worry about it. I will talk to reporters and check their drafts if they choose to ask me, but otherwise I am not responsible for what they write. I am only responsible for what I write myself.
Pete, it was interesting to see how your recent article was described by various media outlets, etc.
Ed Yong at “Not Exactly Rocket Science” has a long post on ENCODE – with updates where he felt he got things wrong the first time around. This is potentially one of the advantages of web-based journalism – if you discover you got it wrong, you can easily and transparently update the post for all time. (http://blogs.discovermagazine.com/notrocketscience/2012/09/05/encode-the-rough-guide-to-the-human-genome/)
He also recommends this round-up on the Nature news blog (http://blogs.nature.com/news/2012/09/fighting-about-encode-and-junk.html).
Matt, good point. The same principle applies to an e-book, as opposed to a paper book (I am thinking about these issues a lot as I gear up to write my next popular book, which will be an e-book). An e-book can literally evolve, and there are no thousands of copies of old editions to pulp.
By the way, to differentiate between Peter Turchin and Peter Richerson, we typically refer to the first one as ‘Peter’ and the second one as ‘Pete’.
Evolvability seems to me to be a big problem with the emerging genetic picture, as Peter suggests. Gunter Wagner studied this problem many years ago. His analysis found that organs should be well modularized so that selection can act on each one more or less independently of the others. Random pleiotropy was deadly for evolvability: if a gene is selected for its action in one organ, but the same allele has negative effects in another organ, selection will have difficulty making progress. Recent findings suggest that there ought to be massive pleiotropy in the genome. For example, the regulatory gene FOXP2 is expressed during development in many organs. The common human version of the FOXP2 protein differs from the chimpanzee version by two amino acids. There is some poorish evidence that the human sequence was selected for a role in the capacity for speech. But why didn’t that change perturb its function in at least some of the many other developmental circuits it participates in? Since organisms clearly are quite evolvable, something must be going on in the genome that we, or at least I, don’t understand.
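Wagner’s own analysis is not reproduced here, but the intuition that random pleiotropy hampers selection can be illustrated with a different, classic textbook model: Fisher’s geometric model. The sketch below is a minimal Monte Carlo version under stated assumptions: a phenotype sits at distance `distance` from a single optimum, and every mutation shifts all `n_traits` traits at once by a random vector of fixed length `mutation_size`. The function name and parameters are illustrative, not taken from any published code. As the number of traits a mutation touches grows, the fraction of mutations that bring the phenotype closer to the optimum shrinks (Fisher’s large-n approximation is 1 − Φ(r√n / 2d) for mutation size r, n traits and distance d).

```python
import math
import random


def fraction_beneficial(n_traits, mutation_size, distance, samples=20000, seed=1):
    """Monte Carlo estimate for Fisher's geometric model (a stand-in, not
    Wagner's own model). The optimum is at the origin; the phenotype starts at
    (distance, 0, ..., 0). Each mutation moves all n_traits traits at once
    (universal pleiotropy) by a random vector of length mutation_size.
    Returns the fraction of mutations that move the phenotype closer."""
    rng = random.Random(seed)
    beneficial = 0
    for _ in range(samples):
        # Random direction: normalize a vector of independent Gaussians.
        step = [rng.gauss(0.0, 1.0) for _ in range(n_traits)]
        norm = math.sqrt(sum(s * s for s in step))
        step = [mutation_size * s / norm for s in step]
        # New position after the mutation; fitness improves if it is nearer 0.
        new_pos = [distance + step[0]] + step[1:]
        if sum(p * p for p in new_pos) < distance * distance:
            beneficial += 1
    return beneficial / samples


if __name__ == "__main__":
    # Same mutation size and distance; only the number of affected traits varies.
    for n in (1, 5, 25):
        print(n, fraction_beneficial(n, mutation_size=0.5, distance=1.0))
```

With `mutation_size=0.5` and `distance=1.0`, a mutation affecting a single trait is beneficial about half the time, while one that touches 25 traits at once should be beneficial only roughly one time in ten, in line with Fisher’s approximation.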
Pete, you raise a very important point. As I was writing my blog post about massive interconnections, I was thinking along these lines, too. In population ecology models, if you put in too many nonlinear feedbacks (and too many could be just 3 or 4), you quickly destabilize the model to the point where populations start crashing. Modularizing, that is, insulating different parts of the system from wild oscillations in other parts, is one way to stabilize it. So I wouldn’t be surprised if evolution followed this route (non-modularized, massively interconnected systems would go extinct). Well, we will have to wait for more progress in genomics to learn how the human genome avoids this problem.
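The stabilizing effect of having fewer, or more insulated, connections can be illustrated with a linear, random-matrix argument in the style of Robert May’s 1972 paper “Will a large complex system be stable?”. This is a different tool from the nonlinear population models mentioned above. In May’s setting, a system of n self-damped components, each pair linked with probability C (the connectance) by an interaction of typical strength σ, is expected to be stable only while σ√(nC) < 1. The sketch below is a minimal, standard-library-only version under those assumptions; the function names, the Euler integration, and the specific numbers (n = 60, σ = 0.2, C = 0.1 versus 1.0) are illustrative choices, not anything from the post. A modular (block-diagonal) wiring helps in the same way, because each module then behaves like a smaller system with its own, smaller n.

```python
import math
import random


def random_interaction_matrix(n, connectance, sigma, rng):
    """Sparse random 'community matrix' in the spirit of May (1972): each
    component damps itself (diagonal = -1), and each off-diagonal link is
    present with probability `connectance`, with a Gaussian strength of
    standard deviation `sigma`. Stored row by row as (column, value) pairs."""
    rows = []
    for i in range(n):
        row = [(i, -1.0)]
        for j in range(n):
            if j != i and rng.random() < connectance:
                row.append((j, rng.gauss(0.0, sigma)))
        rows.append(row)
    return rows


def may_complexity(n, connectance, sigma):
    """May's criterion: the system is expected to be stable when this is below 1."""
    return sigma * math.sqrt(n * connectance)


def perturbation_growth(rows, dt=0.05, steps=400, seed=0):
    """Integrate dx/dt = A x with explicit Euler steps, starting from a random
    perturbation, and return ||x(end)|| / ||x(0)||. Values below 1 mean the
    perturbation died out (stable); values above 1 mean it grew (unstable)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in rows]
    norm0 = math.sqrt(sum(v * v for v in x))
    for _ in range(steps):
        dx = [sum(a * x[j] for j, a in row) for row in rows]
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return math.sqrt(sum(v * v for v in x)) / norm0


if __name__ == "__main__":
    rng = random.Random(42)
    n, sigma = 60, 0.2
    sparse = random_interaction_matrix(n, 0.1, sigma, rng)  # sigma*sqrt(nC) ~ 0.49
    dense = random_interaction_matrix(n, 1.0, sigma, rng)   # sigma*sqrt(nC) ~ 1.55
    print("sparse:", may_complexity(n, 0.1, sigma), perturbation_growth(sparse))
    print("dense: ", may_complexity(n, 1.0, sigma), perturbation_growth(dense))
```

Keeping the number of components and the strength of each link fixed, the sparsely connected system lets the initial perturbation die away, while the fully connected one amplifies it, even though every individual interaction is equally weak in both.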
Modularizing and isolating seem to me like strategies that human engineers, not nature, would choose, so that they could make a priori predictions about small interventions and thereby extract useful knowledge for further interventions. But as far as I know, nature has never used this approach. A priori knowledge is a human approach to improving technology, but it is not the way nature seems to work. Nature doesn’t make hypotheses and test them under controlled, isolated conditions. It doesn’t need to. Predictability is of no interest; nature is only concerned with outcomes. It just explores blindly and without prejudice, keeping whatever is successful and discarding whatever fails. Interconnectivity need not be a problem at all for natural evolution; it may only be a problem for artificial evolution and for human engineers.
This has been of key importance during the last 13,000 years in the domestication of edible plants through artificial selection. Plants whose traits are governed by single genes have been domesticated easily, whereas those whose traits are governed by many genes have been more resistant to domestication. This suggests only that interconnectivity and pleiotropy in genes are bad for human engineers, not necessarily for natural evolution.
I think I read something about optimal degrees of interconnectivity in Robert Wright’s “Nonzero” or Dorion Sagan’s “The thermodynamics of life”. Too little interconnectivity is bad, and too much is bad as well. Natural selection is the perfect sculptor of these equilibria, which could account for the evolvability issue.
Perhaps we humans can only aspire to find a posteriori explanations.
I liked your analogy between the genetic code and the army.
At first glance, a ratio of privates to generals of 1 to 20,000 looks like organizational nonsense. However, if we look, for instance, at the ratio of the number of workers who are directly involved in the production of goods to the rest of the population, we can see that this ratio (approximately 2 million to 360 million for the U.S.) is fairly close to the ratio of coding to non-coding DNA (1 to 20,000).
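Taking the rough figures quoted in the post (about 21,000 protein-coding genes and at least four million switches) and in this comment (about 2 million production workers out of 360 million people) at face value, each ratio is a single line of arithmetic. The helper name `one_per` is hypothetical, used only for illustration.

```python
def one_per(few, many):
    """How many members of the larger group there are per member of the smaller one."""
    return many / few


# Rough figures as quoted in the post and in this comment (not independent data).
genes, switches = 21_000, 4_000_000
producers, population = 2_000_000, 360_000_000

print(f"switches per protein-coding gene: {one_per(genes, switches):.0f}")          # → 190
print(f"people per production worker:     {one_per(producers, population):.0f}")    # → 180
```

Both come out at roughly one to two hundred.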
Approximately the same ratio can be found between the number of moving (“working”) and stationary (“non-working”) parts in a car.
In light of these analogies, it is rather strange to see the hypothesis that all non-coding DNA in some way takes part in regulating coding DNA. In a sense, that is the same as saying that all people who are not directly involved in the production of goods are managers!
In any case, it would be very surprising if such massive interconnection actually occurred. It would not only be useless but would also strongly interfere with genome stability.
Alexander, now that the brouhaha is winding down and the dust is settling, it looks like the ‘80 percent’ is a very misleading number. Still, even if the breakdown between protein-coding genes and ‘switches’ is 1.5% to 10 or 20%, we still have more ‘generals’ than ‘privates.’ So the problem that possible massive interconnectedness poses to the stability of the system remains. We will just have to wait and see how this conundrum is resolved.