Exploring chronic disease
Excuse me, my what?!
Duh, Kineosphaeram, one of the over 600 bacterial species that may be living in your mouth or other areas of your body. If you don’t harbor Kineosphaeram, then perhaps your mouth is home to Bergeriella, Buttiauxella, Cedecea, Derxia, Faecalibacterium, Hallella, Mannheimia, Paludibacter, Ruminococcus, Thermovirga, or Wolinella. The list goes on….
If these bacterial species sound new to you, it’s because many of them are. Several of the species were just recently named after researchers led by Dr. Mark Stoneking of the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany conducted the first in-depth study of global diversity in the human mouth. The team sequenced and analyzed variations in the bacterial gene encoding 16S rRNA, a component of the bacterial ribosome, in the salivary metagenome (bacterial population) of 120 healthy subjects from six geographic areas. The researchers proceeded to compare the sequences they found with a database of previously categorized 16S rRNA sequences to categorize the types of bacteria present.
These sequences could be assigned to 101 known bacterial genera, of which 39 were not previously reported from the human oral cavity; phylogenetic analysis suggests that an additional 64 unknown genera are present. The results suggest great diversity in the salivary microbiome within and between individuals that until this point had never been realized.
“The healthy human mouth is home to a tremendous variety of microbes including viruses, fungi, protozoa and bacteria,” said Professor William Wade from King’s College London Dental Institute. “The bacteria are the most numerous: there are 100 million in every millilitre of saliva and more than 600 different species in the mouth. Around half of these have yet to be named and we are trying to describe and name the new species.”
Are these bacteria helpful or up to no good? While some may not impact dental health, disease-causing bacteria in the mouth are rampant – ranging from species that cause the dental plaque that leads to cavities to forms that weaken the gums or cause bad breath. For decades scientists have advised patients to brush their teeth (don’t forget to scrub for a full three minutes!), floss, and often use a variety of mouthwashes to rid the mouth of as many bacteria as possible. There is little worry that such procedures might kill commensal or “helpful” bacteria in the mouth, probably because most dental bacteria are seen as a menace to salivary and dental health.
Three of the bacteria identified in the “healthy” subjects in Stoneking’s study are certainly not bacteria anyone wants to be carrying around – Neisseria, Treponema and Yersinia. Species of Neisseria and Treponema can cause gonorrhea and syphilis, respectively. Infection with Yersinia leads to a variety of symptoms including fever, abdominal pain, and diarrhea, which is often bloody. It has also been implicated in reactive arthritis.
At my last appointment, my dentist showed me an awesome video of biofilm (bacterial colonies) on the surface of normal teeth. The images were so cool that I asked him for permission to put the video up on this site, but, alas, it is copyrighted. My dentist proceeded to laud the virtues of regular flossing, a practice which I do regularly. (I swear!) In his opinion, flossing helps break up these biofilms and is critical to preventing tooth decay.
Interestingly, the Marshall Protocol does just that – although it uses pulsed, low-dose antibiotics – which have been shown to effectively destroy biofilms – and Benicar to get the job done with more vigor than a flossing addict could ever achieve. Take a certain MP patient (to protect her anonymity, I will call her “Mom”), who has been seeing the dentist for years due to tooth decay. I am told there is a ski home somewhere in Vail funded in large part by “Mom’s” regular dental work. She started the MP two years ago and now her dentist is more than a little surprised. At her last appointment, he said that he simply could not fathom the lack of plaque or tooth decay in any area of her mouth. Boy, does “Mom” wish she had started the MP earlier!
What intrigues me about bacteria in the mouth is that scientists regard most of them as harmful to our health and have no problem with procedures that would seek to sterilize the mouth. But when one mentions other parts of the body – let’s say the gut – and points out that perhaps the majority of bacteria in that area are also causing inflammation and disease, the same researchers often strongly disagree. Currently bacteria in the gut are largely assumed to be “helpful”, although in many cases such thinking is based only on speculation. Perhaps some gut bacteria may help with metabolic breakdown, but it is quite possible that the environment in the gut more closely resembles that of the mouth – an environment that can easily be overtaken by pathogens. Under such circumstances, a treatment like the MP that kills bacteria in the gut is therapeutic against inflammatory diseases such as Crohn’s, colitis and myriad other bowel ailments. This is especially true since patients on the MP are reporting improvement and recovery from bowel diseases that have never previously been reversed.
Also interesting is that many bacteria in the mouth seem able to migrate down the esophagus and reach the interior organs of the body. For example, Porphyromonas gingivalis and A. actinomycetemcomitans, both of which cause decay in the mouth, have been repeatedly identified in atherosclerotic plaque. This strongly suggests that these bacteria may be wreaking havoc on the blood vessels and contributing to heart disease. In fact, biomedical researcher Trevor Marshall believes that arterial plaque is a result of chronic bacterial infection. Indeed, where arterial plaque was once thought to be made of cholesterol and lipids, it is now known that it is largely composed of dead macrophages. Since bacteria can infect and kill macrophages, the death of such cells and their accumulation in patients with heart conditions seems logically tied to bacterial infection. With the above in mind, it’s not surprising that patients on the MP have reported improvement and recovery from various cardiac conditions. Some have tests showing that after years on the MP the plaque in their arteries is greatly reduced.
So keep on brushing people, but I recommend doing the MP too. In a theoretical sense the MP “brushes” our insides – the places we can’t reach to kill pathogenic bacteria physically. For most MP patients doing so is proving to be quite rewarding.
Though the human genome was fully sequenced in 2001, the most promising work in genomics has just begun and not even in the study of human DNA. Human cells are outnumbered by bacterial cells by a factor of ten to one, and, as the rest of this site alludes to ad nauseam, there is strong reason to believe that bacteria are to blame for many of the chronic diseases from which humans suffer. Genetically speaking, we know relatively little about bacteria that persist in humans. The field is ripe for advances.
You may wonder how a researcher can view and understand a particular bacterial genome. On their own, they cannot. Progress in genetics is a group effort, and requires partnering with one of the handful of heavyweight institutions in the world that have developed resources allowing for genome interpretation. Several such institutions exist in the US. The NIH has bacterial protein sequencing tools at its disposal. The Broad Institute at MIT as well as the Washington University Genome Sequencing Center have also developed tools that allow for genome sequencing.
Many would argue, though, that the institution most on the bleeding edge when it comes to genome sequencing technology is the J. Craig Venter Institute, formerly known as TIGR. Headed by transformative iconoclast and entrepreneur J. Craig Venter, the Institute is a non-profit research center that was founded in 2006. It has facilities in Rockville, Maryland and La Jolla, California and employs over 400 people, including Nobel laureate Hamilton Smith.
You can imagine how happy I was to get an email from my former advisor at Georgetown (where I have an undergraduate degree), asking if I wanted to attend a training session in bacterial sequencing technology at the Rockville branch of the J. Craig Venter Institute (JCVI). He was keenly aware of my thirst to gain hands on experience with sequencing technology.
The training was called the “Prokaryotic Annotation and Analysis Workshop.” (As some may know, “prokaryote” is just another name for single-celled bacteria.) This experience marked my first exposure to sequencing technology, and I had little idea what to expect. Would I be able to follow the procedures used to identify protein sequences? Three days isn’t much time, but I was cautiously optimistic.
Last Monday, I boarded a train to Washington DC, took a quick cab over to Georgetown to say hello to some of my old professors, and proceeded to take the Metro up to Rockville. After a solid night’s sleep at the “Sleep Inn,” I took the hotel’s shuttle to the door of the Venter Institute.
The entrance of JCVI has an aura of science and progress. The walls of the lobby and hallways are covered with neatly framed images of sequenced genomes. The individual proteins in such pictures are illuminated in different colors, evoking modern art. The head of educational outreach programs gave us a tour of the grounds, which concluded at the space right in front of Venter’s office (he was traveling at the time and unfortunately not in his office!). Employees of JCVI refer to the space as “the museum” as several objects deeply rooted in scientific history are on display. A glass case on the left side of the room stores letters exchanged between Watson and Crick. A very early model of a sequencing machine used by Rosalind Franklin is on the right. A large statue of a tiger seemingly prowls in the middle of the room – one of several tiger statues that used to be at the building’s entrance when the Institute bore its previous name. If the tiger is the unofficial mascot of JCVI, it’s certainly an appropriate one. This is not the place for the ambivalent.
It’s clear that the staff at JCVI take great pride in their accomplishments, and with good reason! Copies of Science and other prestigious medical journals containing studies published by JCVI or reports of efforts led by Venter are displayed on tables in several locations. The walls of the hallway leading to Venter’s office are covered with framed newspaper and magazine articles featuring Venter – articles in Wired, People Magazine, and the New York Times. Venter has been named one of the top 100 most influential people in the world by Time Magazine for the last two years.
Before the training began, I had the opportunity to chat with some of the twelve other people in my class. I had already met Dr. Anne Rosenwald, a professor at Georgetown whose research focuses on understanding the genetics of various yeast forms. She also teaches biochemistry. Dr. Rosenwald was attending the session in the hopes of working out a deal with JCVI in which the Center could provide her with genomes that have been analyzed by computers but are still in need of human annotation. Annotation refers to the process of using clues in a DNA sequence in order to name and identify protein coding regions. If such an exchange of information is possible, it would allow her undergraduate students to map a bacterial genome as their thesis project. I hope the partnership works out because I think that while challenging, using JCVI’s annotation technology would provide any undergraduate with excellent preparation for microbiology and molecular biology graduate programs. I certainly wish I could have learned how to sequence a genome as an undergraduate!
Two members of our group had travelled to JCVI all the way from South Africa. Researchers at the University of the Free State, they were already using several of JCVI’s programs to sequence and thus better understand the genomes of bacteria isolated from several African caves – bacteria that have never before been classified. I spoke with them about the challenges of mapping completely new genomes. Soon enough, I aspire to study new genomes myself, especially those pertaining to the great mass of unclassified species of bacteria in the human body. I figured their feedback could clue me in to the challenges particular to dealing with unknown organisms.
The South African duo were pleased with what they have been able to learn thus far about their cave-dwelling species. When it comes to JCVI’s sequencing technology, they were old pros and suggested improvements to the software throughout the class. Why had they come to JCVI? I sensed an eagerness on their part to see the hub of progress in person and personally get to know some of the people working with and developing the technology they are using. At the end of the session they kindly invited me to visit South Africa and spend time in their lab. Who knows, I might take them up on the offer at some point as South Africa is one of my top travel destinations.
Our classroom provided a comfortable atmosphere in which to learn with shiny new laptops for each of us. Access to the laptop allowed us all to get a chance to navigate our way through a program as the instructor described its features in a lecture. Snacks, coffee, hot chocolate and tea were available at all times and during the class we would break every hour or so to refill our cups and chat.
The first day was spent learning about the process by which JCVI’s technology allows unknown proteins to be named and characterized (annotated). Our teacher, Ramana Madupu, is a full-time employee at JCVI who uses the technology discussed in the lecture in the course of her job.
Let’s say you are picking your nose. First of all, shame on you. But, let’s say that in spite of your flagrant disregard for common decency, you nobly want to contribute to human progress by determining what kind of bacteria are in your booger. After conducting several basic experiments on the bacterial DNA in your lab, you decide that a bacterial species may be new and unique, so you decide to contact JCVI.
JCVI has you send them a sample of the bacteria in question. A non-profit institution, JCVI will run your genome through its sequencing machine at no charge to you. This service is largely automated and it’s becoming cheaper and cheaper. JCVI’s mandate is to sequence as many genomes as possible and freely share that data with researchers. However, JCVI’s offer to freely sequence and interpret your genome comes with an expectation, namely that upon receiving your results, you will review and manually correct any of the sequence errors. At last check, JCVI’s computer annotation programs claim a 95% accuracy rate.
As we know from high school biology, DNA consists of four nucleotides: adenine, thymine, cytosine, and guanine. A gene is nothing more than a sequence of those As, Ts, Cs, and Gs, one which codes for a particular protein. Genes are the blueprints for making proteins, and in fact a new gene is often referred to, at JCVI at least, as a “putative protein.” (The word putative is used until one has sufficiently conclusive evidence to remove that label.)
At the risk of gross oversimplification or misstatement, let me dare to explain the technical process of how a genome is sequenced and interpreted. You start with a genome. The DNA is processed with fluorescent dye. Each base pair (aka a nucleotide) emits a different color. Those colors are read by a machine and interpreted as a sequence of nucleotides. The result is an exceedingly long sequence of As, Ts, Cs, and Gs.
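The read-out step described above can be pictured as a simple lookup from dye color to base. This is only a toy sketch: the color assignments and the function name `call_bases` are assumptions made up for illustration, and a real sequencing instrument does far more signal processing than this.

```python
# Hypothetical dye-to-base mapping; real instruments use their own
# dye sets and much more elaborate signal processing.
DYE_TO_BASE = {"green": "A", "red": "T", "blue": "C", "yellow": "G"}


def call_bases(colors):
    """Translate a list of detected dye colors into a DNA string.

    Unrecognized signals become 'N', the usual placeholder for an
    ambiguous base.
    """
    return "".join(DYE_TO_BASE.get(color, "N") for color in colors)


if __name__ == "__main__":
    print(call_bases(["green", "red", "yellow", "blue", "purple"]))
```

Run on the five colors in the example, this prints `ATGCN`: four recognized bases and one ambiguous call.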
Now the fun really begins. The goal is to take this enormous sequence and begin to determine which base pair sequence codes for which protein and in which biological category. The Prokaryotic Annotation Pipeline aka “the pipeline” to the rescue!
The pipeline is an algorithm-based workflow which automatically predicts, to a good but certainly not perfect degree of reliability, the name, location, and function of a gene. The pipeline does this by comparing your base pair sequences to sequences of previously identified proteins that exist in a variety of databases.
But how does the pipeline know which segments of your base pair sequence to check against known protein sequences in the hopes of finding a match? There are apparently a lot of fancy statistical algorithms at work here, but one way is to look at the codons, or pieces of genetic code which mark the start and end of a protein sequence. The base pair sequences ATG, GTG, and TTG almost always code for the start of a protein, while the sequences TAA, TAG and TGA almost always indicate the end of a protein. Of course there are always exceptions to the rule, which is why every sequence to come through the pipeline should be checked by a human being. That’s the ideal anyway. By identifying these start and stop codons, the pipeline has a pretty good idea of where one protein coding sequence ends and another begins. At this point, each potential protein coding sequence is referred to as an ORF or “open reading frame.”
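To make the start/stop idea concrete, here is a minimal Python sketch of an ORF scan using the codons listed above. It is not JCVI's algorithm: it only looks at the forward strand, ignores the statistical modeling mentioned earlier, and the function name `find_orfs` and the `min_codons` cutoff are assumptions for illustration.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for candidate ORFs on the forward strand.

    Each of the three forward reading frames is scanned codon by codon.
    An ORF runs from the first start codon seen to the next in-frame
    stop codon, inclusive. `end` is an exclusive index into `dna`.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, dna[start:end]))
                start = None  # look for the next start codon
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGCC"))
```

For the short test string, `find_orfs("CCATGAAATAGCC")` returns `[(2, 11, 'ATGAAATAG')]`: the scan finds ATG in the third reading frame and closes the ORF at the in-frame TAG. Exceptions to the start/stop rules are exactly what a scan this naive would miss, which is where the human check comes in.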
The algorithm matching a gene to an existing sequence applies greater weight to matches derived by certain databases. For example, one database frequently used for comparison is Swiss-Prot. Swiss-Prot relies exclusively on manual annotation by humans. At present, humans make fewer annotation errors and are, therefore, more reliable than software. For this reason (and perhaps due to the fact that, at least according to stereotype, the Swiss are highly precise), Swiss-Prot is arguably the gold standard. If a sequence from your bacterial genome matches a Swiss-Prot sequence, the confidence level is high that the match is correct.
During the pipeline comparison process, the software will also run your protein sequences against what are known as “Hidden Markov Models” or HMMs. HMMs are essentially statistical models of the patterns of amino acids in a multiple alignment of proteins which share sequence and functional similarity. Proteins run against HMMs receive a score as to how well they match the model. If the score is high enough you can reasonably expect your protein to have the same function that the HMM represents. For example, if your protein has a high-scoring match to an HMM for a protein involved in sugar transport, you can be pretty sure that the matching protein from your genome has the same role.
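The scoring idea can be sketched with a much simpler stand-in for a real HMM: a position-by-position profile built from an ungapped alignment, scored as log-odds against a uniform background. A true profile HMM also models insertions and deletions, which this sketch leaves out entirely. The toy alignment, the pseudocount, and the function names are all assumptions for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Per-column amino-acid probabilities from equal-length aligned proteins.

    The pseudocount keeps unseen amino acids from getting probability zero.
    """
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def log_odds_score(profile, protein):
    """Sum of log2(model probability / background) over positions.

    Assumes `protein` has the same length as the profile. Higher scores
    mean the protein looks more like the family the profile was built from.
    """
    background = 1.0 / len(AMINO_ACIDS)
    return sum(
        math.log2(column[aa] / background)
        for column, aa in zip(profile, protein)
    )


if __name__ == "__main__":
    family = ["MKVL", "MKIL", "MRVL"]  # made-up aligned family members
    profile = build_profile(family)
    print(log_odds_score(profile, "MKVL"))  # resembles the family
    print(log_odds_score(profile, "WWWW"))  # does not
```

A sequence that resembles the family scores above zero and an unrelated one scores below it. The "high enough" cutoff mentioned above would be a threshold on a score of this kind.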
After a particular protein sequence is run against HMM models, the software assigns it a putative name and role, based on how much information it believes it has to support such a label. The process of comparing sections of your base pair sequence to as many existing protein databases as possible is also referred to as BLASTING. Depending on the level of evidence at hand, the protein is also given a gene symbol, role information, and sometimes numbers that pertain to its classification.
For example, after one of your proteins is run through the pipeline the pipeline might come up with the following result:
Name: biotin synthase
Gene symbol: bioB
TIGR role: 77 biotin synthesis
Now, what does this mean? Let’s break this down. The fact that the gene has what is called a TIGRFAM ID (TIGR0043) refers to the fact that it had a high scoring match to a protein previously annotated at JCVI. Since JCVI obviously believes their genomes have been well annotated, a TIGRFAM match that exceeds the minimum threshold for reliability is generally regarded as a sign that the computer has made the correct match. The name and protein role associated with the highest HMM and other database matches for your protein are also displayed, along with the symbol for the putative gene. In this example, it appears that it is the software’s best guess that your protein is involved in the synthesis of biotin (also known as vitamin B7).
JCVI repeats this process for every Open Reading Frame sequence it detects, and the number of sequences often ranges in the thousands.
How then does one access the proposed annotations generated by the pipeline? After each of your protein sequences has been run through the pipeline, JCVI software condenses them into a digital file that is sent to you. At this point you need to use a web-based program that allows you to manually modify the results. The program is called Manatee, Manual Annotation Tool Etc. Etc. An open source project, it was also created by JCVI software programmers. A bit intimidating for the uninitiated, Manatee is a powerful and exquisite program, which allows a person to assign each putative protein the correct name and function.
Your goal in using Manatee is to make sure that the protein matches made by the pipeline are grounded by supporting evidence. For example, you can check for “gene model curation” which provides information necessary to ensure that your genes have the correct coordinates and that your set of predicted genes is complete. Other features allow you to look at the raw base pair sequence of your genome in order to identify rare start and stop codons that the computer may have missed, or screenshots that allow you to note if the software accidentally annotated overlapping genes.
As the human annotators using Manatee use their good old-fashioned brain power to identify where the JCVI computers may have made mistakes, they alter the names of certain proteins in accordance with such findings. When naming a protein, the goal is always to err on the side of conservatism.
Let’s say, for example, that based on a strong HMM hit, the computer has decided that one of your proteins is a ribose ABC transporter protein (ribose is a sugar). But after further examining the protein using Manatee’s tools you decide that there really isn’t enough evidence to support the conclusion that the sugar transported by your protein is ribose. You then manually change the protein’s name in Manatee so that it is less definitive by calling it only a “sugar ABC transporter”. Then, after using even more of Manatee’s features, you decide that you can’t put together sufficient evidence that the gene in question really transports any form of sugar. Under such circumstances, you make its name even less specific, calling it simply an “ABC transporter.”
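The "err on the side of conservatism" rule amounts to walking down a ladder of names from most to least specific and stopping at the first one the evidence supports. A minimal sketch, assuming the annotator's judgment can be boiled down to a yes/no per name (in practice it is far more nuanced); `NAME_LADDER` and the function name are hypothetical, and the last-resort label is the generic “hypothetical protein” name used for sequences with no supportable match.

```python
# Names ordered from most to least specific, as in the ribose example.
NAME_LADDER = [
    "ribose ABC transporter",
    "sugar ABC transporter",
    "ABC transporter",
]


def most_specific_supported_name(supported):
    """Return the most specific name whose claim the evidence supports.

    `supported` maps each candidate name to True or False, a stand-in for
    the annotator's judgment after reviewing the evidence in Manatee.
    """
    for name in NAME_LADDER:
        if supported.get(name, False):
            return name
    # Nothing on the ladder is supported: use the generic fallback label.
    return "hypothetical protein"


if __name__ == "__main__":
    evidence = {
        "ribose ABC transporter": False,
        "sugar ABC transporter": True,
        "ABC transporter": True,
    }
    print(most_specific_supported_name(evidence))
```

With the evidence shown, the function settles on `sugar ABC transporter`. Flip that entry to `False` and it falls back to `ABC transporter`.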
As you can see, Manatee is a tool which enables researchers to better make judgments about the role and function of genes by assigning characteristics to those genes. Often enough, evidence to make these determinations is insufficient and attributes are characterized by how reliable the best evidence is. One expands and contracts the attributed qualities as the evidence warrants. When the evidence is equivocal, you say that a protein is “putative.” This apparently is the nature of genetic research, one which requires scientists to pick up on indeterminacy and do their best to fill in the gaps as they go.
Every DNA sequence which emerges from the pipeline is putative. Certain sequences remain so because they fall short of the reliability threshold that would allow the software to give them an existing name. Under such circumstances, no name is assigned and no role is attributed. The protein is simply named “hypothetical protein.” If a hypothetical protein from one species matches a hypothetical protein from another, each is given the name “conserved hypothetical protein.” Since the hypothetical protein has been found in two different species, the corresponding sequence clearly exists. But the sequence’s series of base pairs is so different from known sequences that, at this point in time, neither the software nor a human annotator is able to give it a name or role. Ramana (my instructor) commented that as the Human Microbiome Project presses on ahead, she expects to see many more “hypothetical proteins” show up in genomes. In fact, she had just recently finished sequencing about eight Microbiome genomes and was surprised at how few of their DNA sequences matched known proteins. This suggests that the majority of the yet unknown bacteria that inhabit the human body are quite different from the species we have already become familiar with, such as E. coli or Mycobacterium tuberculosis.
One might ask, “Isn’t the human annotation process open to error and bias?” The answer is yes. It’s up to the human annotators to decide if they can find enough information to support a software derived match and every human has different tendencies when it comes to such decisions. Annotators like Ramana say that after working with a sufficient number of genomes they usually learn to trust their gut feelings and standardize the process by which they make naming decisions. Even the best human annotators would admit that 100% consistency, from one day to the next, between one annotator and the next, is unattainable.
So what happens when an annotator has finished going over all the proteins in a particular genome? Genomes in which all protein sequences have been given a name and function are considered “closed” and ultimately made available through GenBank. Any genome with loose ends is considered “open,” with the hope that future researchers will be able to confidently determine the names and roles of current hypothetical proteins. One way to determine the role of a “hypothetical protein” is to study the protein coding sequence in the laboratory using in vitro techniques such as the creation of gene knockouts. Given this, it should be clear to my readers that sequencing technology does not obviate the need for laboratory research. There remains a lot of work to be done in this field.
JCVI enters as many genomes as possible into a database called the Comprehensive Microbial Resource (CMR). CMR is a free, open-source website that allows access to the sequence and annotation of all completed prokaryotic genomes. CMR is a seemingly invaluable resource. Before genomes are entered into the database they are standardized in a manner that makes them much easier to compare. Researchers from over 200 sequencing centers currently put sequenced genomes into a database called GenBank. GenBank contains about 600 complete prokaryotic genomes with about 10 new genomes released each month. One of the significant problems with GenBank is that the annotation process at each center that submits genomes to GenBank is done so differently that many of the genomes in GenBank have been named using different conventions. Often, they have also been assigned gene symbols and role names that differ depending on where they were sequenced.
The goal of the CMR is to take the genomes from GenBank and create common datatypes with the same nomenclature, sequence elements, and annotation methodology. When this has been done individual genomes can be compared much more easily and accurately. There are currently about 400 organisms in CMR but the project’s leaders have ambitiously committed themselves to adding several hundred more genomes to the database in the coming months. One reason that the CMR contains fewer genomes than GenBank is that the project is, thus far, unfunded. Apparently, JCVI has been working on the CMR without grant money for the last two years. The program is so well-designed and useful that it’s hard to believe it could go unfunded. I was told that JCVI has just applied for a new grant that might allow the project to be funded and project leaders should hear back about the decision in a week or two. Fingers crossed!
Tanja Davidson is one of the main directors of the CMR and was our teacher on day three. In fact, our entire third day was spent learning about the CMR, which at first glance contains a daunting but well-organized number of features. CMR allows the researcher to compare multiple genomes using what are called “cross genome analysis pages.” These tools allow two or more genomes to be compared so that the elements they have in common (or the elements that make them different) can easily be analyzed.
Imagine that doctors report an outbreak of a stomach disease and a bacterial species is isolated from people with the illness. The genome of the disease-causing pathogen is put through Glimmer and the pipeline, annotated by humans, and found to be part of the E. coli family. By using CMR tools, researchers can compare the genome of the new E. coli variant to the genomes of other E. coli species that have not been tied to stomach disease. Most of the genes between the different forms of E. coli should be the same because they are of the same family. But those genes that differ between the recently isolated species and those already in the database can be assumed to be those coding for the proteins that endow the new variant with the ability to cause disease. In this case, the CMR comparison tools greatly narrow down what would otherwise have been a veritable “needle in a haystack” situation.
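Why shared nomenclature matters for comparison can be shown with a tiny sketch. The synonym table below is invented for illustration and is not the CMR's actual vocabulary or method. It simply shows two centers labeling the same gene differently, and the overlap between their genomes appearing only after the names are mapped to one convention.

```python
# Hypothetical synonym table mapping center-specific labels to one shared name.
SYNONYMS = {
    "biotin synthetase": "biotin synthase",
    "bioB protein": "biotin synthase",
}


def standardize(gene_names):
    """Map each name to its shared form; unknown names pass through unchanged."""
    return {SYNONYMS.get(name, name) for name in gene_names}


# Made-up annotations of the same two genes from two sequencing centers.
center_a = {"biotin synthetase", "ABC transporter"}
center_b = {"bioB protein", "ABC transporter"}

if __name__ == "__main__":
    print(center_a & center_b)                            # raw overlap
    print(standardize(center_a) & standardize(center_b))  # overlap after mapping
```

Before standardizing, the two gene sets appear to share only `ABC transporter`. Afterwards they share both genes.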
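Stripped to its core, the comparison in this outbreak scenario is a set difference: keep the genes found in the new variant and in none of the harmless relatives. A minimal sketch, assuming each genome has already been reduced to a set of consistently named genes; the gene names (including the invented `toxX`) and the function name are hypothetical, and the result is a list of candidates worth studying, not proof of what causes disease.

```python
def unique_genes(new_variant, reference_genomes):
    """Genes present in the new variant and absent from every reference genome."""
    seen_in_references = set().union(*reference_genomes)
    return set(new_variant) - seen_in_references


if __name__ == "__main__":
    outbreak_strain = {"bioB", "rbsA", "toxX"}  # toxX is a made-up gene name
    harmless_strains = [{"bioB", "rbsA"}, {"bioB", "lacZ"}]
    print(unique_genes(outbreak_strain, harmless_strains))
```

Here the only gene left standing is `toxX`, which is the kind of short list a researcher would then take into the lab.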
Other nice features of the CMV include the ability to access a “Role Category Graph” which displays the different roles of all the proteins in a genome in a colorful pie chart. A tool called “Restriction Digest” allows users to splice genes of interest with various enzymes – a procedure that takes a long time to complete in the lab but only minutes to complete using the CMV. A “Pseudo 2-D Gel” allows users to get an idea of what a genome of interest looks like in another dimension. Each dot of a 2D gel represents a single protein whose location can be compared to others. The comparative tools even allow for the creation of a scatter plot in which two genomes are compared on a two-dimensional plane.MUMmer or (Maximum Unique Match) compares genomes at the nucleotide level, allowing scientists to detect just single nucleotide differences between DNA sequences.
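An in-silico restriction digest is quick because it is essentially a string search. The sketch below is not the CMR tool itself; it cuts a linear DNA string at every EcoRI site (EcoRI recognizes GAATTC and cuts after the G) and returns the fragments. The function name, the `cut_offset` parameter, and the test sequence are assumptions for illustration, and only one strand is considered.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut a linear DNA string at every occurrence of `site`.

    `cut_offset` is how many bases into the site the cut falls; the
    default models EcoRI, which cuts G^AATTC.
    """
    dna = dna.upper()
    fragments = []
    previous_cut = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[previous_cut:cut])
        previous_cut = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[previous_cut:])  # whatever follows the last cut
    return fragments


if __name__ == "__main__":
    print(digest("AAGAATTCTTGAATTCGG"))
```

For the made-up sequence with two EcoRI sites, `digest("AAGAATTCTTGAATTCGG")` returns `['AAG', 'AATTCTTG', 'AATTCGG']`, and joining the fragments gives back the original string.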
When it comes to the CMR, Tanja and other JCVI employees welcome feedback from scientists other than those at JCVI. In fact, while we were doing some practice CMV tutorials in class, the pair from South African and Dr. Rosenwald came across a few minor glitches in the system. Tanja was quick to write them down and most of them were already fixed by the time we got back from lunch. Rosenwald and others also offered feedback about new features they might like to see in the CMV and Tanja was again quick to record their suggestion and insights. I could tell she was definitely not just humoring people but actually planning to pass every suggestion by her development team.
The reality is that Manatee and the Annotation Engine project are part of the Institute’s open source initiative, the goal of which is to provide high quality software and services to the genomic community. External involvement and feedback is strongly encouraged because it’s such feedback that drives development and continual improvement of the software. In fact, JCVI doesn’t actually have employees who test their software, so they fully depend on user feedback. Some of us joked that because we were testing the CMV as part of our class exercises we should have been paid to attend the training session rather than vice versa.
Human annotation is a lengthy and laborious process. One of the foremost goals at JCVI is to perfect the pipeline and the computer annotation process such that human annotation is no longer necessary. One idea currently being tossed around at JCVI when it comes to perfecting the output of the pipeline is a concept referred to as something like “humanitization.” (I’ve searched my notes but can’t find the exact name!) The annotators at JCVI are currently being asked to report exactly how they go about using Manatee in order to annotate a genome. As previously discussed, since there are so many databases to compare and analyze in Manatee, each employee using the program has settled into a pattern of evaluating database information in a certain methodical fashion. The hope is that if some of the best human annotation regimens are recorded and analyzed, they can be translated into logic, which software could duplicate.
If these extra steps do indeed increase the accuracy of the protein matches made by the pipeline, there may no longer be a need for humans to check Manatee’s output. So it’s possible that in the coming years genome sequencing may be a completely automated process. At the current moment, the pipeline’s protein matches are accurate about 95% of the time. The stated goal is to get that level of accuracy into the 99-100% range. So, as Tanja commented, the human annotators at JCVI who are currently helping programmers understand how they navigate Manatee may, by doing so, actually be putting themselves out of a job.
But at least for now, human employees are still an integral part of the annotation system. Four recently hired JCVI employees were attending the teaching session. During a discussion about perfecting the pipeline, our instructor confided that one of our classmates had just been hired with the expectations that he would create the technology to make the pipeline more accurate. What a daunting job! The rest of us regarded him with a certain level of awe over the next two days. Every so often our practice sets would reveal a flaw in pipeline output and the instructor would turn to this particular employee and say something like, “Of course now, you’ll be fixing this problem.” Such comments reflect what seems to be the prevailing attitude at JCVI. Most of their projects are extremely ambitious and half the time I’m not sure if they even know if success is possible when a task is initiated. But the mindset is “No matter how hard this goal seems we will simply have to find a way to get it done!” This type of determined thinking does seem to generate results as there is little doubt that such an attitude was the driving force behind the Institute’s ability to sequence the human genome in record time.
As implied by the above paragraph, there are a lot of situations at JCVI that end up pitting humans against computers. As Ramana described, it would be ideal if every genome sent to JCVI could be manually annotated from the onset. At least for now, a well-trained human is able to pick up on subtleties of database comparisons that the computer can miss. But such a scenario, at least over the long term, simply isn’t sustainable. As genome mapping continues to grow in popularity over the coming years, humans alone will not be able to keep up with the number of genomes requiring mapping. Although using computers to annotate genomes slightly compromises accuracy, the technology must be used in order to keep up with demand. Ideally genomes are manually checked with Manatee but there are definitely JCVI/TIGR annotations that are never checked by a human annotator at all.
In recent years, mapping genomes has grown in popularity. Scientists working on efforts related to the Human Microbiome Project currently want to map the genomes of every single bacterial species capable of inhabiting the human body, and such pathogens may number in the thousands. But large groups of other scientists are set on better understanding the massive number of bacteria that inhabit our oceans. Since little is known about many regions of the ocean, who knows how many microbes these efforts may turn up? Then, like the two scientists in our group, other research teams seek to map the genome of bacteria that live in obscure land locations such as caves, volcanoes, mines etc. So, the JCVI computers and those at other sequencing centers are relentlessly accumulating DNA data.
Perhaps because they have each personally annotated so many hypothetical proteins about which we currently know nothing, the staff at JCVI are very open to the idea that we are only on the brink, if that, of truly understanding the bacteria capable of making us ill. This correlates with the Marshall Pathogenesis in which essentially all inflammatory diseases are attributed to infection with chronic intraphagocytic metagenomic bacteria that, for the most part, have yet to be clearly named and sequenced. One study I often invoke was conducted by Dempsey and team. This Glasgow-based group found human tissue taken from prosthetic hip joints contained protein sequences corresponding to those of hydrothermal heat vent bacteria. Most of the time when I discuss the study, other scientists are skeptical of the results. The average response is that they would like to see the results repeated or that the sample was contaminated. Ramana had no such reaction. In her opinion, there can definitely be hydrothermal heat bacteria in the human body and she’s confident the sample was not tainted. When we discussed the findings she suggested that the bacteria are probably not killed at high temperatures which, interestingly, was one of Dr. Marshall’s first inferences when analyzing the data.
The organizers made an admirable effort to serve us savory lunches which we ate in one of JCVI’s cafeterias. All our teachers attended lunch and sat among us, meaning that I was able to easily pepper them with questions. Alex Richter, one of the program heads, was great about answering my questions in detail. Thanks to his anecdotes, I got a much better impression of what microbiology labs will be doing in the coming years and the tools I will likely need to master as a potential microbiology PhD student. Before attending the training I had wondered if I would be able to understand JCVI’s sequencing technology without a background in computer science. But Richter didn’t seem to think that my lack of computer training is an issue and it’s true that I certainly seemed able to follow the discussions in class. I was encouraged by Richter’s comment that someone good at scientific reasoning (such as, ahem, myself) is also likely to be good at working systematically with computer programs. I’m sure he’s right, but even so I won’t be contributing to the Linux codebase any time soon.
It felt pretty darn good to be in a place where I personally believe that government funding is going towards research that is really going to have an impact on our ability to better understand chronic disease. As the Marshall Pathogenesis continues to spread, it’s clear that bacteria will eventually receive all the scrutiny they are due. At that point, scientists, doctors and patients alike are going to demand a more thorough understanding of the bacteria implicated in chronic disease, right down to the level of the genome.
It’s great that JCVI is already starting to collect data on never before sequenced bacteria. It’s also good that the Institute is striving to perfect bacterial sequencing technology now, so that by the time the Marshall Pathogenesis gains hold, sequencing results should not only be more accurate but also easier to use. Just as our ability to sequence genes has improved exponentially, so, I believe, will our ability to interpret the data. The tools are just getting better and better. As someone who has the inside scoop about the fact that bacteria are headed for the big time, I feel we’re closer than ever to characterizing the genomes of the pathogens that are capable of making us so ill.
Note: Much of the information included in this piece was derived from two articles published in the May 28th edition of Nature News, a resource published by the medical journal Nature.
Even those of us who live under rocks have heard of the Human Genome Project, a massive international scientific research project the aim of which was to understand the genetic makeup of the human species. Its primary goal was to determine the sequence of chemical base pairs which make up DNA and to identify the approximately 25,000 genes of the human genome from both a physical and functional standpoint.
A working draft of the genome was released in 2000 and a complete one in 2003, with further analyses yet to be completed and published. Meanwhile, a parallel project was conducted by the private company Celera Genomics. Most of the sequencing was performed in universities and research centers from the United States, Canada and Great Britain.
Most researchers would agree that the Human Genome Project was launched in order to answer the long-standing question, “Who am I?” The goal was to identify and sequence every single human gene. By doing so, many researchers were certain they would uncover causes for most of the chronic diseases that plague humankind. At the project’s start, scientists were faced with a multitude of unknown sequences to decipher and understand. Surely such sequences would offer up answers to disease, and specific genes would be found that would correlate with specific illnesses. In a Gattaca-like environment, people would then be informed early in life that they had “the gene” for MS or “the gene” for breast cancer. Scientists would work fervently to identify and change the expression of such disease-causing genes, finally developing enough gene therapies to eradicate human disease. The above scenario has an abiding appeal, largely because the idea that our genes dictate our health is so temptingly simplistic.
Yet, while striving to answer the question “Who am I?”, those researchers searching for a purely genetic cause for disease have failed to recognize that the question “Who am I?” can only be answered after the question “Who are we?” has been clarified and understood.
As Asher Mullard of Nature News describes, ‘we’ refers to the wild profusion of bacteria, fungi and viruses that are able to colonize the human body. Such pathogens, and bacteria in particular, number in the trillions. According to one common estimate, the human gut alone contains at least a kilogram of bacteria.
The fact is, at the present moment, human beings serve as communities in which prodigious numbers of bacteria can thrive. Since the average human is currently outnumbered by the pathogens they harbor, the genes produced by these bacteria outnumber human genes as well. According to Mullard, “Between them [the pathogens in our bodies], they harbour millions of genes, compared with the paltry 20,000 estimated in the human genome. To say that you are outnumbered is a massive understatement.”
So by sequencing only human genes, the Human Genome Project has failed to take into account a vast number of bacterial genes that also have the potential to affect the progression of human disease. The fact that many researchers are interpreting genetic data while leaving bacterial genomes and bacteria in general out of the picture is a serious issue.
This is because many of the chronic bacteria we harbor are intraphagocytic – meaning they have developed the ability to live inside our cells, including the phagocytes of the immune system. Such pathogens thrive in the cytoplasm, the fluid that surrounds the nucleus and the other cellular organelles that allow for DNA replication and repair.
Our DNA sequences are replicated on a regular basis. The process of transcription allows for the synthesis of RNA under the direction of DNA. Since both RNA and DNA use the same “language”, information is simply transcribed, or copied, from one molecule to the other. The result is messenger RNA (mRNA) that carries a genetic message from the DNA to the protein-synthesizing machinery of the cell. In translation, messenger RNA (mRNA) is decoded to produce a specific protein according to the rules specified by the genetic code.
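For readers who like to see the idea in code, here is a minimal Python sketch of the two steps just described. It is only a toy: the DNA string is made up, the codon table holds just a handful of entries from the standard genetic code, and the function names `transcribe` and `translate` are my own labels rather than anything from a real sequencing tool.

```python
# Toy illustration of transcription and translation.
# Only a few codons from the standard genetic code are included here.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "GCU": "A",  # alanine
    "AAA": "K",  # lysine
    "UGG": "W",  # tryptophan
    "GAA": "E",  # glutamate
    "UAA": "*",  # stop
    "UAG": "*",  # stop
    "UGA": "*",  # stop
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand (T is replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Decode mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCTAAATGGGAATAA"  # made-up sequence, for illustration only
mrna = transcribe(dna)
print(mrna)             # AUGGCUAAAUGGGAAUAA
print(translate(mrna))  # MAKWE
```

The point of the sketch is simply that a change to any base along the way can change the protein that comes out at the end.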
Unfortunately, pathogens in the cytoplasm can likely interfere with any number of the many precise steps involved in the transcription and translation processes. Such interference results in genetic mutations, meaning that our DNA is almost certainly altered, over time, by the intracellular pathogens we harbor. The more pathogens a person accumulates, the more his genome is potentially altered.
It is also quite likely that intracellular pathogens disrupt DNA repair mechanisms. Since environmental factors such as UV light result in as many as 1 million individual molecular lesions per cell per day, the potential of intracellular bacteria to interfere with DNA repair mechanisms also greatly interferes with the integrity of the genome and its normal functioning. If the rate of DNA damage exceeds the capacity of the cell to repair it, the accumulation of errors can overwhelm the cell and result in early senescence, apoptosis or cancer. Problems associated with faulty DNA repair functioning result in premature aging, increased sensitivity to carcinogens, and correspondingly increased cancer risk.
“Lifelong persistent symbiosis between the human genome and the microbiota [the large community of chronic pathogens that inhabit the human body] must necessarily result in modification of individual genomes,” states biomedical researcher Trevor Marshall. “It must necessarily result in the accumulation of ‘junk’ in the cytosol, it must necessarily cause interactions between DNA repair and DNA transcription activity,” he continues.
So there is increasing evidence that many of the genetic mutations identified by Human Genome Project researchers are largely induced by bacteria and other pathogens. Rather than serving as markers of particular diseases, such mutations generally mark the presence of those pathogens capable of affecting DNA transcription and translation inside the cell. This is why, in most cases, the “one gene, one disease” hypothesis has failed to hold water. Instead, geneticists are now stuck examining a perplexing number of different mutations, most of which differ so greatly between individuals that no correlations can be made between their presence and any particular illness. The mutations are nothing but genetic “noise,” induced either by random chance or by the pathogens that such researchers fail to factor into the picture.
As Marshall describes, researchers sequencing human DNA samples often make the assumption that only one genome (the human genome) is present, when in reality, their tests are also likely picking up on the loose bits of multiple genomes (bacterial genomes). So if a person’s genome is sequenced once and then sequenced again, will the same DNA sequences be obtained? Probably not, because each individual sequencing will randomly pull various pathogenic genes into the human mix. Thus, what are currently viewed as “disease-causing” mutations are essentially statistical anomalies that vary depending on when and how a person’s genome is sequenced.
If sequenced genomes are currently just a sum of several random parts, then it’s inevitable that an individual’s genome will change throughout life. Because pathogen-induced mutations, random mutations, and mutations that result from faulty DNA repair accumulate over decades, the genome map of a child will be very different from the genome map produced when the same individual is an adult, with differences increasing as people reach their elderly years. Goodbye the world of Gattaca. Right now, if a child’s genome is sequenced at birth, his or her genetic sequences predict nothing about the common chronic diseases they will encounter, and mutations accumulated later in life largely serve to signal the presence of infection. This means that using people’s genomes to define their identity – as envisioned by futuristic movies in which a person’s genome might serve as their passport and destiny – is not feasible at the moment.
This understanding marks a yet to be fully realized paradigm shift in the way the genome is interpreted. It’s hard not to feel sympathy for those individuals paying hefty sums of money to have their genomes sequenced by certain companies that now offer such a service. Customers are provided with a map of their genome in which the majority of observed mutations serve only to inform them that they harbor numerous intracellular pathogens. As Mullard describes, “Observers have started to question whether the human genome can deliver on its once-hyped promises to tackle disease. To take just one example, anyone so inclined can now pay genetic-testing companies for a preliminary rundown of the genetic variations associated with his or her risk of developing cancer, obesity and other conditions. But the risks identified are often so low or unclear that people are questioning whether the information will actually prompt the changes in health behaviour, such as losing weight, that could make them valuable.”
The paradigm shift described above, in which genetic mutations are viewed in a new light, has been largely fueled by a new movement in which scientists are now beginning to use molecular technology to detect and sequence bacteria in lieu of simply trying to grow them in the lab. These tools will allow researchers to bypass the need to culture bacteria, exploring the human microbiome by studying genes en masse, rather than studying the organisms themselves.
Recent studies that have used powerful molecular tools rather than standard cultivation techniques have left scientists slack-jawed at the number of bacterial DNA sequences that correspond to bacteria yet to be named or sequenced. A great number of sequences also correspond to bacteria never thought to have the capability of living on or within the human body. It has recently become all too clear that only a fraction of the bacteria capable of infecting humans grow in the lab, and that we have been oblivious to the presence of the majority of pathogens capable of entering our bodies. This realization that we harbor myriad unnamed and unidentified microbes comes at a time when the Human Genome Project is failing to capitalize on its promise to identify root causes for human disease.
As Mullard admits, “The microbes that swarm in and on the human body have always held a certain fascination for researchers. Since so few of them grow in the lab, it has been difficult to work out exactly who these microbial passengers are and how they interact with one another.” Whereas over the past century, standard laboratory culturing techniques have failed to detect the vast number of pathogens capable of infecting human beings, recent advances in molecular technology that allow for the sequencing of bacterial DNA mean that, at long last, we may be able to successfully identify and sequence the bacteria that cause disease.
The reality is that the plethora of unknown pathogens that colonize the human body are the previously unrecognized puzzle piece behind chronic inflammatory disease. Enter metagenomics, a relatively new field of research that, thanks to advanced molecular techniques, enables researchers to study organisms not easily cultured in a laboratory as well as organisms in their natural environment.
This year marks the tenth birthday of metagenomics. It was a decade ago that Jo Handelsman and her colleagues at the University of Wisconsin in Madison successfully cloned and functionally analyzed the collective genomes of previously unculturable soil microorganisms in an attempt to reconstruct and characterize individual community inhabitants. The team coined the word “metagenomics” to explain their techniques and goals. Since the Handelsman work, the scope of metagenomics has expanded greatly. Teams of researchers across the world have made efforts to describe the bacteria in environments as diverse as the human gut, the air over New York, the Sargasso Sea and honeybee colonies.
“We can look at the metagenomic analysis so much more deeply, at such a better cost,” says Jane Peterson, associate director of the Division of Extramural Research of the National Human Genome Research Institute in Bethesda, Maryland, which recently launched a five-year initiative to explore the human microbiome.
The five-year initiative is one of several massive projects striving to characterize what is referred to as the human microbiome, the name given to the collection of microorganisms living in and on the human body. The goal of the project is as ambitious as it is exciting – to detect and name every type of bacterial species that is currently capable of inhabiting the human body.
The project is perhaps the best example of a new, and long overdue, shift in thinking among many medical researchers. At long last, microbiome scientists are more interested in studying and identifying the pathogens that inhabit the human body than in simply examining human genes.
Late last year, the US National Institutes of Health (NIH) pledged US$115 million to identify and characterize the human microbiome. Also last year, the European Commission and various research institutes committed €20 million (US$31 million) for similar research. Smaller research teams with similar goals are also pledging lesser sums of money towards research that hopes to contribute to a greater understanding of the microbiome. These teams are situated in countries as diverse as China, Canada, Japan, Singapore and Australia.
Since the human microbiome is so diverse, it’s not surprising that an array of different research teams are needed to tackle divergent areas of the project. The NIH’s five-year Human Microbiome Project will spend much of its money identifying where certain bacteria in the body are located. They also plan to compile a reference set of genetic sequences that correspond to each bacterial species.
Although one-quarter of the project’s money has been earmarked to examine the role of the microbiome in health and disease, the Human Microbiome Project will do little to assess the function of microbes during its first year, although it may focus on the topic later. It’s serendipitous that the “health and disease” aspect of the project has been put off, since it’s only a matter of time before the medical community realizes that biomedical researcher Trevor Marshall has already largely elucidated how the intraphagocytic, metagenomic, microbiota of bacteria that cause chronic inflammatory disease are able to survive in the body and evade the immune system. Ideally, the money now dedicated towards examining the role of the human microbiome in disease could be used to pursue research projects related to Marshall’s discoveries.
Since the vast number of bacteria and other pathogens that cause human disease have yet to even be discovered and documented, the primary goal of the Human Microbiome Project is to build up a research community and generate a sequence resource, akin to that developed during the Human Genome Project, so that questions about bacteria and specific disease-causing mechanisms can be tackled at a later date.
This year, researchers will collect samples of feces plus swabs from the vagina, mouth, nose and skin of 250 volunteers. 250 people may seem like a small number of subjects for such a massive project, but when one understands that the DNA of every single one of the trillions of pathogens harbored by each subject will be analyzed, it’s easy to see that such an undertaking is actually a monumental task.
How do you effectively study such a vast and unknown community? The ultimate goal is to sequence the complete genomes of hundreds of bacterial species and deposit them in a shared database. Most of the research teams involved in the project will be sequencing short, variable stretches of DNA that code for components of bacterial ribosomes in order to roughly identify which bacteria are present in each person and how many bacterial species volunteers have in common. Once an estimate of diversity has been attained, the researchers plan to mine deeper by using shotgun sequencing – a molecular technique that will allow them to analyze many short pieces of DNA from all over the microbes’ genomes and reveal which genes are present.
In shotgun sequencing, DNA is broken up randomly into numerous small segments. The DNA sequence of each fragment is subsequently identified. The process is then repeated in order to create multiple overlapping sequences of DNA. When enough overlapping sequences have been generated, a computer program is able to use the overlapping ends to assemble the fragments into a contiguous sequence.
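To make the assembly step a little more concrete, here is a rough Python sketch of a greedy overlap merge. This is my own simplified illustration, not the software used by any of the sequencing centers: the function names, the `min_len` cutoff and the three short reads are all invented for the example, and real assemblers also have to cope with sequencing errors, repeats and reads from both DNA strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left; remaining reads stay separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return max(reads, key=len)  # longest contiguous sequence found


# Three made-up fragments of the made-up sequence TTGACCATGCGAAGT
fragments = ["GCGAAGT", "TTGACCAT", "CCATGCGA"]
print(greedy_assemble(fragments))  # TTGACCATGCGAAGT
```

Even in this toy version the order in which the fragments arrive doesn't matter; the overlaps alone are enough to put them back in sequence.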
Microbiome researchers will initially use shotgun sequencing data from a few bacterial species that can already be grown, and piece together their whole genomes by putting overlapping fragments in order. The Human Microbiome Project plans to provide 600 “reference genomes.” The European project will do another 100, and other sequencing efforts by the NIH and elsewhere will make additional contributions. The hope is that enough research teams are able to set up a broad enough reference database. Then, researchers will be able to predict the genetic capabilities of many currently unculturable species (many of which are in an L-form and/or biofilm-like state) solely on the basis of similarities with known genes.
Creating the database will not be a simple task. According to Peer Bork, a biochemist who heads the European project’s computational center at the European Molecular Biology Laboratory in Heidelberg, Germany, even if many reference sequences are created, fitting together DNA fragments in order to identify unknown species, “is pretty hairy from a computational biology analysis point of view. Even with the immense power of supercomputers to process the sequencing data, it will take some clever analysis to compare the millions of sequence reads that span thousands of species between hundreds of ‘healthy’ and unhealthy people.”
Yet, the scientists involved in the project appear intent on capitalizing on their promise to sequence the microbiome. Furthermore, each research team will still be allowed to pursue their own pet projects. “Talented people are doing what they think is the most important research to do, rather than being forced to do what somebody else has decided would be the best,” says Dusko Ehrlich, coordinator of the European project.
As touched on above, one of the main scientists pushing forward the metagenomics movement is Marshall. Although not directly involved in the Microbiome Project at the moment, decades of in silico and clinical research have allowed the biomedical researcher to create a treatment regimen that effectively kills the intraphagocytic, metagenomic bacteria that microbiome researchers will be identifying in greater detail.
While at first glance it may seem counterintuitive that Marshall’s work has demonstrated how to kill the microbiome before the bacteria that comprise it have even been fully sequenced, one must keep in mind that all bacteria share certain characteristics. Every bacterial species has a 70S ribosome – the molecular machine that builds its proteins, and one that must be functioning if the pathogen is to survive. Whether or not a species has been named, identified, cultured, or sequenced, if its 70S ribosome is blocked, as it is by the pulsed, low-dose antibiotics championed by Marshall, it will be weakened so greatly that it cannot survive in the presence of an activated immune system.
So Marshall’s treatment protocol – dubbed the Marshall Protocol – already exists, and can kill the pathogens that Microbiome researchers will be identifying. As it permeates the mainstream, Marshall’s research (when fully appreciated) will represent a quantum leap forward for microbiome researchers. After all, the microbiome community should be quite pleasantly surprised to find out that the disease-causing bacteria they sequence can already be killed.
However, at the moment, patients on the Marshall Protocol have little knowledge of exactly what chronic pathogens they are killing. In a sense, such knowledge isn’t of utmost importance, as specific names are not needed to induce recovery. Yet it would certainly be of great interest for researchers working with various aspects of the Marshall Pathogenesis to possess a greater understanding of the bacterial species any one patient is killing, and what species of bacteria can generally be associated with specific symptoms.
Thus, the database that the Microbiome project intends to provide promises to be uniquely helpful for researchers working on MP-related projects. Such researchers will be able to use the database to get a much clearer idea of exactly which chronic pathogens cause inflammatory disease, the substances created by these pathogens that may lead to receptor blockage, and exactly which bacteria are killed by different antibiotic combinations. As more knowledge is built about the specific pathogens that cause inflammatory disease, other drugs besides the MP antibiotics may be developed that also target them effectively, adding to an already powerful arsenal to render them dead.
Of course, using the Microbiome database to identify the presence and species of bacteria targeted by the Marshall Protocol will require numerous researchers to perform shotgun sequencing of the bacteria in the tissues of patients with many forms of chronic disease. Sequences derived from such patients can be compared with the database in order to match DNA sequences with those of specific bacterial species.
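As a very rough illustration of what “comparing with the database” might look like, the Python sketch below scores a sequenced fragment against a tiny reference set by counting shared short subsequences (k-mers). Everything in it is an assumption made for the example: the species names and sequences are invented, `best_match` is a hypothetical helper, and real projects would use proper alignment tools rather than this simple k-mer overlap.

```python
def kmers(seq: str, k: int = 4) -> set[str]:
    """All overlapping substrings of length k found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_match(read: str, references: dict[str, str], k: int = 4):
    """Return (name, score) for the reference sharing the largest
    fraction of the read's k-mers, or (None, 0.0) if nothing matches."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None, 0.0
    best_name, best_score = None, 0.0
    for name, ref_seq in references.items():
        score = len(read_kmers & kmers(ref_seq, k)) / len(read_kmers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Invented reference "genomes" standing in for database entries
references = {
    "species_A": "ATGGCTAAATGGGAATAACCGT",
    "species_B": "TTGACCATGCGAAGTCCTAGGA",
}

print(best_match("CATGCGAAGT", references))  # ('species_B', 1.0)
print(best_match("GGGGGGGG", references))    # (None, 0.0)
```

A fragment that matches nothing in the reference set, like the second one here, is exactly the kind of result that would point to a species not yet in the database.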
Even better, shotgun sequencing can be used to convince skeptics of the MP’s validity. If periodic shotgun sequencing studies are performed as a patient progresses through the MP, they will undoubtedly reveal that MP patients have high bacterial loads at the onset of treatment and greatly reduced bacterial loads after several years of therapy. Absence of bacteria would, of course, correlate with disease resolution and cure. Such data would provide even the greatest skeptic with the proof necessary to confirm that the MP does indeed reverse inflammatory disease and successfully kill chronic idiopathic bacteria.
The few shotgun sequencing studies performed to date have already helped Marshall flesh out certain aspects of the Marshall Pathogenesis. A recent shotgun sequencing study that detected the species of bacteria present on prosthetic hip joints allowed him to identify (using molecular software) that the chronic pathogen Flavobacter creates a lipid capable of dysregulating the Vitamin D Receptor (VDR). The discovery finally provided proof of concept for the fact that many of the chronic pathogens we harbor almost certainly increase their survival by creating similar ligands that block the ability of the VDR to activate components of the innate immune response.
The fact that the same study also found hydrothermal heat vent bacteria (which clearly cannot be killed by boiling) on the joints reinforces just how much we have yet to discover about the pathogens capable of inhabiting our bodies. Other pathogens detected by the research team include species such as Lysobacter, Methylobacterium, and Eubacteria. “None of these species were previously expected to exist in man,” states Marshall. “These are species nobody is looking for, they are not picked up by PCR testing and nobody is culturing them.”
Consider that each species of bacteria detected in the above study has about 1,000 – 4,000 genes. So together they create about 100,000 genes that are active in the body, yet are not even contemplated by the vast majority of mainstream researchers. And those are only the pathogens detected by one molecular analysis.
As previously discussed, the European Commission has launched a four-year initiative, called Metagenomics of the Human Intestinal Tract (MetaHIT). The project, which contains many different initiatives, overlaps somewhat with the American initiative in the sense that the European team is required to sequence bacterial genomes for a database. American and European results will be put in the same database, one which is freely available for anyone interested in sequencing and identifying bacterial DNA. And who isn’t?
But MetaHIT has a different goal than the American Human Microbiome Project. It will focus on better understanding the bacteria that inhabit the gut and how they contribute to obesity and inflammatory bowel disease. And, according to Mullard, whereas the Human Microbiome Project is initially comparing people’s microbiota on a species level, MetaHIT aims to find differences in microbial genes and the proteins they express without necessarily worrying about which species they came from.
“We don’t care if the name of the bacteria is Enterobacter or Salmonella. We want to know if there is an enzyme producing carbohydrates, an enzyme producing gas or an enzyme degrading proteins,” explains Francisco Guarner, a gastroenterologist at Vall D’Hebron University Hospital in Barcelona, Spain. We want to “examine associations between bacterial genes and human phenotypes,” says Dusko Ehrlich, coordinator of MetaHIT and head of microbial genetics at INRA, the French agricultural research agency in Jouy-en-Josas.
This is similar to the approach currently taken by Marshall, who is more interested in the observable characteristics of the bacteria, including how they respond to different antibiotics and what substances they produce, than in identifying individual species.
A handful of projects have already tried to characterize the bacteria that cause bowel disease and obesity, including research by Jeff Gordon at Washington University in St. Louis, who found different compositions of bacteria in obese and lean subjects (for details, see my article on obesity). Then, there was the 2006 project by Steven Gill and colleagues at the Institute for Genomic Research in Rockville, Maryland, who threw around some then-hefty numbers when they carried out such a metagenomic analysis of the microbes in two people’s intestines. After 2,062 polymerase chain reactions and 78 million base pairs, the team provided only the briefest of glimpses into the genetic underpinnings of the human gut’s microbes.
According to Mullard, these first surveys involved too few individuals and sampled too few microbes, usually from only the gut or the mouth, to provide an adequate description of the microbiome. But things have changed in the past few years. A few million foreign genes no longer sound so daunting in the face of advanced genetic-sequencing methods that are capable of crunching monumental amounts of numbers. As with the American Microbiome project, thanks to certain cutting-edge technologies, researchers can assess hundreds of millions of base pairs in just a few hours.
An in-depth analysis by MetaHIT researchers of exactly what microbes inhabit the gut and what substances they produce will only enhance the knowledge already derived from Marshall’s work. That work shows that chronic bacteria in the gut or elsewhere survive largely because they have evolved mechanisms to block the activity of certain receptors, which would otherwise activate pathways that inhibit their survival. A better understanding of what substances gut bacteria create may help Marshall and other scientists identify other ligands that dysregulate the VDR or other receptors.
Combining data derived from MetaHIT with that derived from Marshall’s work will also provide an opportunity to better understand exactly what the mass of microbes in our gut is up to. Historically, researchers have understood that a great number of bacteria thrive in the gut. However, in the absence of enough data showing how pathogens in the gut survive, or how gut bacteria contribute to disease, they have only been able to guess at the role of the gut microbial community.
Such researchers have proven to be optimists. Over the past decades, the vast majority of them have concluded that, if bacteria are present in large numbers in the gut, they must be doing something helpful. That, or gut bacteria have been assumed to be commensal – helping the human gut in some way and in turn obtaining nutrients from the host. Yet nature shows that commensal relationships are not necessarily the rule. Sure, you’ll find mites on certain birds or orchids growing on trees. But in the majority of cases where two species interact, one usually takes advantage of the other. Furthermore, considering what we already know about bacteria, they are almost always guilty of exploiting the resources of their host. So it may be wishful thinking to assume that the bacteria in our guts are largely friendly helpers.
This isn’t to say that there aren’t species of gut bacteria that can provide a benefit to their host. Yet increasing evidence points to the fact that the vast majority of gut bacteria are actually responsible for causing many bowel diseases previously considered to be of “unknown” cause. When faced with the large number of different inflammatory bowel diseases and the fact that a tremendous number of uncharacterized bacteria inhabit the gut, it’s logical that there’s a connection between the two phenomena. Of course, Marshall’s in silico work, as well as data derived from the MP study site, shows that patients who kill large numbers of gut bacteria end up recovering from a number of bowel diseases, providing a good deal of support for the above hypothesis.
This all raises the rather controversial question, “Do humans really need gut bacteria?” Those patients who spend long periods of time on the MP have killed a great deal of their gut bacteria, yet seem to have GI tracts that function properly. Marshall has conceded that “good” gut bacteria could potentially exist, but as of yet, he has simply seen no evidence of a species that offers humans a benefit.
Then again, as Marshall describes, whether or not a certain species of gut bacteria may be considered “helpful” may depend on a person’s set of circumstances. It’s widely accepted that some gut bacteria help metabolize carbohydrates, causing people to absorb about 15-20% more of the energy from the carbohydrates they ingest. In a country like the United States, where the majority of people are well-fed, or in many cases over-fed, the presence of such bacteria in the gut might provide a distinct disadvantage. People who have access to enough food are usually seeking to lose weight and, in such cases, the presence of bacteria that glean more energy from carbohydrates would contribute to weight gain. The average American would probably be better off without such species in the gut.
But what about people in developing countries who face food shortages and are often limited to eating just the amount of food they need to survive? Under such circumstances, the presence of a bacterial species in the gut that gleans more energy from carbohydrates would be seen as a great advantage, allowing people to acquire more energy from a smaller portion of food. In a world where even the developed world may face food shortages in the future, one can never tell if someday such bacteria would provide a benefit to the entire population.
Yet possibilities like the one discussed above still don’t answer the question of whether humans actually need gut bacteria. Bacteria and humans (or our ape-like ancestors) have evolved in tandem for millennia. Is the ability of pathogens to inhabit our bodies an evolutionary adaptation that serves to benefit both humans and bacteria, or is it possible that the ability of microbes to persist in the human body is largely an evolutionary victory for bacteria won at our expense? As more and more diseases are linked to bacteria previously considered innocuous, the latter is becoming increasingly plausible.
Compare the human body to planet Earth. Creatures including human beings have evolved to live on our planet, yet does the Earth need the presence of such animals to survive? Most people would agree that the Earth would manage just fine without human beings. Although a handful of humans may strive to enhance certain aspects of our natural surroundings, the vast majority of mankind is depleting the Earth’s resources, leading to massive problems such as climate change, pollution, and an accumulation of synthetic chemicals in our water and food supply. So if we compare the bacteria that inhabit our guts and bodies to the people that inhabit our planet, it’s plausible that both might be better off without alien inhabitants.
Of course, some animals may be seen as beneficial to Earth. The earthworm restores the resiliency of soil, and the honey bee carries pollen from flower to flower. Yet even under these potentially beneficial circumstances, one can still question whether the Earth could maintain a state of homeostasis on its own without such help, or quickly evolve different ways to manage without such aid.
Furthermore, there is no question that any bacterium, whether friend or foe, places a burden on the innate immune system. With trillions of bacteria to keep track of, the innate immune system is constantly at work, prioritizing which bacteria to attack and determining where immune system cells should be located. In fact, researchers estimate that 70 percent of the immune system is located in and around the gut. Imagine if gut bacterial load were reduced to the point where much of this burden was lifted. The innate immune system would certainly be able to divert much more strength towards killing pathogens in other tissues as well as new pathogens attempting to enter the body. Of course, as of now, such a scenario would only be possible if a person were to remain on low, pulsed antibiotics for a lifetime. Without the help of antibiotics, it seems reasonable to conclude the innate immune system would be over-burdened by the task of keeping the body bacteria-free.
Some may argue that probiotics are beneficial bacteria. Yet, as described in this article, an alternate hypothesis about how they provide palliation must also be factored into the picture.
All this means that MetaHIT researchers, MP researchers, and scientists studying gut bacteria in the light of new molecular technology may be facing a paradigm shift in the way gut microbes are perceived. Rather than viewing the majority of them as “friends,” we may unfortunately have to face the fact that many of them are enemies, or at least not necessary for our well-being. It still remains unclear if humans would want to be completely bacteria-free if the option existed, but the idea that a person would be in better health without bacteria is nevertheless an intriguing possibility. Or perhaps in the future, humans will be able to pick and choose the bacteria that will inhabit their guts, in order to harbor certain species that fit their specific needs.
The fact that the Human Microbiome Project and MetaHIT plan to generate so much new information on bacteria means that collaboration between the two groups and other smaller groups involved in bacterial sequencing is important. According to Bork, when all the projects are running at speed, reams of data will be generated worldwide. But because different groups are using different techniques to collect samples, extract DNA and annotate data, the data sets could be difficult to compare.
Enter the as-yet-unlaunched International Human Microbiome Consortium. Scientists from several international projects, including the Human Microbiome Project and MetaHIT, have been meeting since late 2005 to figure out how to collaborate on a range of issues such as the compatibility of data and which bacteria to sequence for the reference database. The group is already setting up infrastructure and “beginning to address the tough questions,” says Weinstock. But according to Nathan Blow of Nature News, it is too early to say how well the Consortium will foster collaboration. Its official launch, scheduled for this past April, was postponed for six months to allow the NIH and the European Commission to overcome bureaucratic difficulties.
Another issue being addressed by the Consortium is that of intellectual property. As with other genomic projects, members of the Consortium will be expected to release sequence data into the public domain as soon as they are generated. But according to Blow, this doesn’t necessarily preclude disputes over intellectual property if, for instance, a particular bacterial gene proves to be a useful diagnostic marker for a disease. Another unresolved question is whether a laboratory can have one project that abides by the Consortium’s regulations, and another that doesn’t. “There are grey areas, and I feel that until we have a test case, they will have to be watched very carefully,” says Bhagirath Singh of the Canadian Institute of Health Research, who is helping to develop the Canadian Microbiome Initiative.
Intellectual privacy and patent issues aside, optimism for the collaboration still runs high, and having a database of bacterial sequences that is available to other research teams and perhaps even the public would be a great step forward in a medical movement that many believe needs a pick-me-up. It’s increasingly important for researchers and doctors to start pulling information out of individual laboratories and individual clinic records so that we may compile it. Patterns and associations can be detected much more easily when large groups of data are gathered simultaneously and made accessible to as many people as possible. The computer open-source movement, which has spread to many other fields, has seen incredible success in areas where research and data are openly shared. Access to open-source databases will almost certainly augment the pace of major medical discoveries – a pace that, MP aside, can often seem as if it’s at a current standstill.
Participants from microbiome projects around the world have expressed the desire to join and attend the Consortium. Like bulls ready to race down the streets of Pamplona, such research teams will be competing with each other as the search to sequence the microbiome moves forward. After all, each group wants credit for identifying as many new species of bacteria as possible.
“The intention is to work together,” says George Weinstock, a geneticist at Washington University in St Louis, who is helping to organize the Human Microbiome Project, “but for the moment it is more about working in parallel until we can understand how to work together”. Apparently some European researchers feel at a disadvantage because MetaHIT’s operating budget is only a quarter the size of the Human Microbiome Project’s. “This is giving a huge advantage to the Americans,” Guarner says. “They are going to be quicker and they have more equipment.”
Then again, other members of MetaHIT feel that they actually have an edge because money for their project has already been distributed and data collection is under way, whereas the Human Microbiome Project will not announce many of its funding decisions until later this year. “We have an advantage already, we have a show on the road,” says Willem de Vos, a microbiologist at Wageningen University in the Netherlands and a member of MetaHIT.
For example, in Denmark, a team led by Oluf Pedersen at the Steno Diabetes Centre in Copenhagen is collecting fecal samples from 120 obese volunteers and 60 controls to tease out specific microbial genes that might contribute to obesity. A similar-sized study in Spain, led by Guarner, will compare the microbiotas of patients with inflammatory bowel disease with those of genetically matched controls and examine the effect of drugs.
Others feel that the sharing of data will simply allow the most ingenious teams to get ahead. “If it is an international consortium, it doesn’t matter where the data are generated,” Bork adds. “For example, we can be the pirates here, sitting at the end in Europe, and use American data to make the discoveries.”
As Blow describes, given the number of separate projects, all at such an early stage, it’s almost impossible to make out where the starting line lies or who exactly is edging ahead. But for many of us, the potentially intense competition among microbiome researchers is a welcome change to the increasing number of “consensus conferences” in which researchers with the same opinions fail to consider alternate lines of thinking and generate novel hypotheses. Competition has the potential to speed up output, allowing for a medical community that may stall less and deliver more. Furthermore, when faced with talented competitors, researchers are more likely to consider new hypotheses and break from the norm in order to gain an edge over an opposing team.
Then again, the fact that trillions of bacterial genomes must be sequenced means that at the current moment there is plenty of work for each research team and multiple ways for every research team to excel. With trillions of microbes to sift through, most researchers feel that there is more than enough of the microbiome to go around. “There’s so much to learn, so much we don’t know and so many adventures,” Gordon says. “There’s enough room for everyone.”
How many different bacterial species the microbiome project will uncover remains anyone’s guess. But according to Blow, many of the researchers involved with the project have one pressing question.
“One of the things that is obsessing microbiologists is: ‘What is the size of the core microbiome?’” says Jeremy Nicholson, a biological chemist who studies microbes and metabolism at Imperial College in London.
By core microbiome, researchers like Nicholson are referring to a hypothesized set of bacteria that every person might harbor. For example, if some bacteria are shown to have a beneficial effect on human health, then perhaps everybody needs a certain number of these pathogens to survive. Then again, if all people harbor certain bacterial species, such pathogens may be seen in a different light. If a “core microbiome” is established, then perhaps the bacteria that comprise it contribute to a process that happens to every human being. That process is aging.
As Marshall and colleagues discussed at the recent “Understanding Aging: Biomedical and Bioengineering Approaches” conference at UCLA, it’s entirely possible that the bacteria we harbor are able to infect our stem cells – cells found in all adult tissues that act as a repair system for the body by replenishing other more specialized cells. But as people age, stem cells often lose their ability to repair and heal. If bacteria infect the stem cells, it has been hypothesized that the infection may speed up the rate at which the cells lose their resiliency, thus accelerating the aging process.
Remember the above discussion about how certain microbes can allow people to glean 15% more energy from the carbohydrates they consume? While such a bacterial species may be beneficial under some circumstances, Marshall warns that if it can infect nearby stem cells, it will contribute to the aging processes in the gut.
Several studies support the possibility that chronic bacteria can infect the stem cells. A team of German researchers recently showed that patients who had suffered a heart attack (an event most likely caused by chronic bacterial forms in the heart and blood vessels) had stem cells which were only about half as effective at repairing the heart tissue as stem cells transplanted from healthy 20-year-old males. This supports the view that infected stem cells lack many of the healing properties maintained by their healthy counterparts.
Dr. Emil Wirostko, one of the foremost experts on L-form bacteria, died before he could publish on the subject. But according to his colleagues, Wirostko believed L-form bacteria are able to infect stem cells.
Then there are telomeres – DNA sequences on the ends of chromosomes that are gradually lost as cells replicate. As they shorten, a cell can no longer divide and becomes inactive or dies – meaning that the length of a person’s telomeres plays a role in how quickly they will age. The fact that people with heart disease, Alzheimer’s, cancer, and other illnesses have been shown to lose telomere sequences at a faster rate than their healthy counterparts suggests that the bacteria involved in causing such diseases may also have an effect on telomere length. As Marshall describes, if pathogens do directly alter our DNA, then the weakened DNA at the ends of telomeres provides some of the easiest genetic material for them to mutate.
Once again then, the question is posed: What might occur if humans were to become largely bacteria-free? Might they age at a slower rate? The possibility is tantalizing. Data from people on the Marshall Protocol, who are gradually reducing their bacterial loads, will prove to be increasingly insightful in this regard as time wears on.
As previously discussed, the sequencing of the human genome alone does not allow for the Gattaca-like world described earlier in which humans could be identified and catalogued by their unique DNA sequences. Ironically, the Human Microbiome Project and Marshall’s work might make that world more of a reality. If it turns out that the bacteria we harbor are a source of disease and a burden on the innate immune system, then the population will seek (like those people on the MP) to eliminate at least the majority of them.
If sequencing procedures then no longer detect bacterial genes along with human genes, it may be possible to sequence a more fully human genome. One must still factor in DNA mutated by other environmental factors or by chance, but nevertheless, we would be closest to actually answering the question “Who am I?”
Perhaps then, after people have eliminated much of their bacterial load, genetic information will prove to be a more valuable human fingerprint, ethical issues aside.
In the meantime, an optimal environment to better the health of humankind will be one in which controversial hypotheses such as that described above are at least put on the table, and new ideas that challenge current paradigms are embraced rather than rejected. Under such conditions scientists can fully live out Mullard’s advice to, “Celebrate their quest to map, catalogue, and understand the human microbiome for the inspiring saga it is.”
Amy Proal graduated from Georgetown University in 2005 with a degree in biology. While at Georgetown, she wrote her senior thesis on Chronic Fatigue Syndrome and the Marshall Protocol.