Biennial and Annual Report on the Rare Diseases Research Activities at the National Institutes of Health FY 2004

National Library of Medicine (NLM)

Overview of NLM Rare Disease Research Activities

The National Library of Medicine (NLM) provides information resources useful to rare disease research and to those seeking information about conditions that affect them or their families.

Database Resources

  • Citations to articles on rare diseases have long been available in the MEDLINE database, now accessible to researchers, health professionals, and the public through NLM's free Web-based PubMed system, and also in the TOXNET system.
  • MedlinePlus, NLM’s consumer health information service, has a general rare diseases page that has been effective in referring members of the public to the NIH Office of Rare Diseases; the page was accessed 15,655 times during FY 2004. MedlinePlus health topic pages also link to Genetics Home Reference, a new NLM database that includes many rare diseases. There are currently 125 links from MedlinePlus to Genetics Home Reference topics such as amyotrophic lateral sclerosis, Gaucher disease, and Marfan syndrome. In addition, MedlinePlus has continued to add topics on specific rare diseases requested by consumers; topics added during FY 2004 include pulmonary hypertension, ataxia telangiectasia, anal cancer, premature ovarian failure, and rickets.
  • Online Multiple Congenital Anomaly/Mental Retardation (MCA/MR) Syndromes is a database of structured descriptions of congenital abnormalities (many of them rare) associated with mental retardation. The database was searched nearly half a million times in 2003.
  • The Genetics Home Reference (GHR) is NLM’s Web site for consumer information about genetic conditions and the genes and chromosomes responsible for those conditions. GHR’s integrated Web-based approach provides brief, consumer-friendly summaries of genetic conditions and related genes and chromosomes. Understanding is enhanced by direct links to glossary definitions and to a handbook called Help Me Understand Genetics, which explains fundamental genetic concepts. Additional links are provided to consumer information from MedlinePlus, applicable clinical trials, and relevant patient support groups. Each summary also links to advanced information from NLM and other authoritative sources. GHR currently offers summaries of more than 119 genetic conditions, including numerous rare diseases and disorders, as well as summaries of 186 genes and each of the 23 pairs of human chromosomes. New content is added and existing content updated on a regular basis, and all content is reviewed by experts in human genetics.
  • ClinicalTrials.gov, NLM’s consumer health information system for linking patients to medical research, currently includes approximately 12,200 studies. Of these, 5,714 cover approximately 850 rare disease conditions. Since FY 2003, study records in ClinicalTrials.gov, including those investigating rare diseases, have been linked to relevant genetic condition summaries in GHR. These links give consumers seeking information about clinical trials additional background about the conditions and the genes responsible for them.

Research Support

  • NLM, working in partnership with organizations in Africa, the United States, the United Kingdom, and Europe, has created MIMCom.Net, the first electronic malaria research network in the world. Using satellite technology, the network provides full access to the Internet and the resources of the World Wide Web as well as access to current medical literature for scientists working in Africa. The African research sites are of recognized high quality, require improved communications to accomplish ongoing research, and have the necessary resources to purchase equipment and sustain the system.
  • NLM is assisting the National Center for Research Resources (NCRR) in establishing the NIH-funded Rare Disease Clinical Research Network as an early test bed for using standard clinical vocabularies to improve the efficiency of clinical research.
  • A Multicenter Clinical Trial Using Next Generation Internet (NGI) Technology. NGI technology was applied to provide the infrastructure for a multicenter clinical trial of new therapies for adrenoleukodystrophy (ALD), a fatal genetic neurologic disorder. The project formed a worldwide imaging network of clinical institutions to evaluate ALD therapies; because ALD is rare, such a network was needed to enroll enough patients to evaluate the therapies. The approach can serve as a model for many other disorders. Three centers collaborated on the project: the Imaging Science and Information Systems (ISIS) Center at Georgetown University Medical Center, the Kennedy Krieger Institute, and the Department of Radiology at Johns Hopkins University. NGI technology was used to speed the transmission and evaluation of high-quality MRI images. Another important goal of the project was to gain insight into procedures that ensure medical data privacy and security.


  • Kawasaki disease (KD) is an acute, self-limited illness of infancy and early childhood that has now replaced rheumatic fever as the leading cause of acquired heart disease in children in the United States and Japan. Although the acute illness resolves spontaneously, permanent damage to the coronary arteries occurs in 20–25 percent of untreated children. The cause of KD remains unknown, and there is no specific laboratory test to identify affected children. Nonetheless, an effective treatment exists that significantly reduces the risk of coronary artery damage. KD thus presents a unique dilemma: the disease may be difficult to recognize and cannot be confirmed by a laboratory test, yet an extremely effective therapy exists, and a child who does not receive it faces up to a 25 percent chance of serious cardiovascular damage or death. NLM is funding a publications grant to the Kawasaki Disease Foundation to support the continued collaboration of an unusual multidisciplinary team with expertise in documentary filmmaking, parent advocacy, pediatric medicine, anthropology, and the history of medicine. The team will produce a Web-based archive of interviews and a television documentary to increase public awareness of KD and to support scholarly research on the origins of this emerging pediatric disease. Funds from this grant will support three major interviewing sessions, in Japan, Hawaii, and San Diego. The film will focus on (1) the importance of informed parents in establishing a timely diagnosis of KD, which permits effective treatment and prevention of complications, and (2) the history of KD, showing that the ways in which it emerged as an internationally recognized disease mirror the ways in which it is now diagnosed or misdiagnosed in our contemporary health care system. In the case of KD, informed parent advocacy can mean the difference between life and death for an affected child.
  • Hepatitis B is the world's most common serious liver infection and is associated with more than 80 percent of liver cancer cases worldwide. People who are affected urgently need information to help them manage their disease and maintain a high quality of life. The long-range goal of this information system grant to the Hepatitis Foundation is to provide information and support to the millions of individuals worldwide who are affected by hepatitis B. In particular, the project is developing a comprehensive hepatitis B Web site to bring high-quality health information to consumers and health care professionals. This "virtual community" includes features such as an e-newsletter and expert forums, delivered in a user-friendly package. Web site content and features will also be personalized to the specific needs and preferences of the target audiences. The project further aims to connect information seekers with National Library of Medicine databases, including MEDLINE, MedlinePlus, and ClinicalTrials.gov. The Web site will be evaluated throughout the 3-year project to incorporate user feedback into the development process, to track usage statistics, and to determine the site's impact on the targeted users. Ultimately, the completed Web site can serve as a model for the interactive dissemination of high-quality health information over the Internet.

Rare Disease Research Initiatives

The National Center for Biotechnology Information (NCBI), a division of NLM, serves as a national public resource for molecular biology information. In this capacity, NCBI establishes and maintains various genomic databases and develops software tools for mining and analyzing these data, all of which are freely available to the biomedical community to foster a better understanding of the processes affecting human health and disease.

The Human Genetic Map

NCBI is responsible for collecting, managing, and analyzing the growing body of data generated by the sequencing and mapping initiatives of the Human Genome Project. At present, NCBI makes the sequence of the entire human genome, with its complement of over 28,000 known and predicted genes, available to the public without restriction. This unrestricted access has helped the scientific community decode important human genes more quickly, and, as a result, scientists are beginning to understand the causes of many rare diseases. Access to the complete human genome and related genetic data at NCBI helps scientists determine the organization of the genes on a chromosome, study how these genes produce their protein products, investigate how changes in a gene’s DNA sequence give rise to disease-causing mutations, and study how chromosomes are duplicated and inherited. Scientists have used these strategies to study gene defects on chromosomes 21 and 22 that lead to a variety of rare diseases, including Down syndrome, Usher syndrome, DiGeorge syndrome, and Ewing’s sarcoma. NCBI investigators have also played an instrumental role in identifying and analyzing other disease genes and genetic loci, and their analyses of genetic data have advanced the understanding of several rare diseases and disorders, for example through the identification and analysis of the genes for Kallmann syndrome and neurofibromatosis (NF1). Other conditions currently being studied by NCBI investigators include rare diseases such as ataxia telangiectasia, hyper-IgE syndrome, and nemaline myopathy, as well as breast cancer and obesity.

Genetic Analysis Software

NCBI investigators are working to develop, implement, and disseminate high-performance computational tools and application software packages for the analysis of genetic data and its linkage to disease. Several of these software packages are described below.

FASTLINK is a computer program designed by NCBI investigators for "genetic linkage analysis," which tests whether genes and genetic markers that lie near each other on a chromosome tend to be inherited together. Because linked genes and markers are often inherited together, markers can be used to map the location of a disease gene. NCBI investigators have used FASTLINK to study hyper-IgE syndrome, a rare immunodeficiency characterized by recurrent skin abscesses, pneumonia, and highly elevated levels of serum IgE. Using FASTLINK, researchers found evidence linking this syndrome to chromosome 4. FASTLINK has also been cited in over 400 other published genetic studies, including studies of macular dystrophy, type 1 hereditary sensory neuropathy, and Alström syndrome.
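Linkage programs such as FASTLINK compute likelihoods over entire pedigrees; the sketch below shows only the simplest textbook case of the statistic they report, the LOD score, for phase-known meioses that can each be scored as recombinant or non-recombinant. The function names and the example counts are hypothetical and are not part of FASTLINK. By convention, a LOD score of 3 or more is taken as significant evidence of linkage.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    Compares, on a log10 scale, the likelihood of the observed meioses if the
    marker and disease locus are linked at recombination fraction theta with
    the likelihood under free recombination (theta = 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n = recombinants + nonrecombinants
    log_unlinked = n * math.log10(0.5)
    if theta == 0.0:
        # Under complete linkage a single recombinant makes the data impossible.
        if recombinants > 0:
            return float("-inf")
        return -log_unlinked
    log_linked = (recombinants * math.log10(theta)
                  + nonrecombinants * math.log10(1.0 - theta))
    return log_linked - log_unlinked


def max_lod(recombinants: int, nonrecombinants: int, steps: int = 50):
    """Evaluate the LOD score on an even grid of theta values in [0, 0.5].

    Returns (best_theta, best_lod).
    """
    grid = [0.5 * i / steps for i in range(steps + 1)]
    scores = [(t, lod_score(recombinants, nonrecombinants, t)) for t in grid]
    return max(scores, key=lambda pair: pair[1])


# Hypothetical example: 2 recombinants among 20 scored meioses.
best_theta, best = max_lod(2, 18)
print(f"max LOD {best:.2f} at theta = {best_theta:.2f}")  # max LOD 3.20 at theta = 0.10
```

Real pedigrees include unknown phase, missing genotypes, and incomplete penetrance, which is why programs like FASTLINK evaluate full pedigree likelihoods instead of simple recombinant counts.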

CASPAR (Computerized Affected Sibling Pair Analyzer and Reporter) is a computer program designed by NCBI investigators to study the genetics of complex diseases, that is, diseases involving the interaction of multiple genes. It allows a scientist to explore various hypotheses about how different factors may contribute to disease susceptibility. NCBI investigators have used CASPAR to perform linkage analysis in patients with a form of diabetes.
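Affected-sibling-pair methods rest on a simple expectation: with no linkage, two siblings share 0, 1, or 2 alleles identical by descent (IBD) at a marker with probabilities 1/4, 1/2, and 1/4, so affected siblings who share more than half their alleles near a locus suggest linkage. The sketch below implements the textbook "mean test" under the assumption that IBD counts are known exactly; CASPAR's own statistics and inputs may differ, and the function name and data are hypothetical.

```python
import math


def asp_mean_test(ibd_counts):
    """Mean IBD-sharing test for affected sibling pairs at one marker.

    ibd_counts holds, for each affected sib pair, the number of alleles
    (0, 1, or 2) the siblings share identical by descent. These counts are
    assumed known exactly here; real data require estimating them.
    Returns (mean sharing proportion, z statistic, one-sided p-value).
    """
    n = len(ibd_counts)
    if n == 0:
        raise ValueError("need at least one sib pair")
    if any(c not in (0, 1, 2) for c in ibd_counts):
        raise ValueError("IBD counts must be 0, 1, or 2")
    # Proportion of alleles shared IBD per pair is 0, 0.5, or 1.
    mean_share = sum(c / 2 for c in ibd_counts) / n
    # With no linkage, sharing 0/1/2 alleles has probability 1/4, 1/2, 1/4,
    # so the per-pair proportion has mean 0.5 and variance 0.125.
    z = (mean_share - 0.5) / math.sqrt(0.125 / n)
    # Upper-tail normal probability: only excess sharing suggests linkage.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return mean_share, z, p_value


# Hypothetical data: 100 affected sib pairs sharing 0, 1, or 2 alleles IBD.
pairs = [0] * 15 + [1] * 45 + [2] * 40
share, z, p = asp_mean_test(pairs)
print(f"mean sharing {share:.3f}, z = {z:.2f}, p = {p:.1e}")
```

Because the test needs only affected relatives, it avoids specifying a mode of inheritance, which suits complex diseases where several genes contribute.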

The PedHunter computer program was developed to query genealogical databases to uncover connections between relatives who are affected by the same disease and to construct a pedigree suitable for genetic linkage analysis. NCBI investigators are using PedHunter to query the Amish genealogy database to collect information on various genetic diseases, including nemaline myopathy, a rare genetic neuromuscular disorder that is usually apparent at birth and is characterized by extreme muscle weakness. Using PedHunter in combination with other genetic analysis software, NCBI investigators have demonstrated that, in the Amish, this disorder is caused by a mutation in the gene for the sarcomeric thin-filament protein slow skeletal muscle troponin T (TNNT1). TNNT1 maps to chromosome 19 and had previously been sequenced. Further analysis identified a premature stop codon in TNNT1 that segregated with the disease. The researchers concluded that Amish nemaline myopathy is a distinct, heritable myopathic disorder caused by a mutation in TNNT1.
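The core step described above, finding ancestors shared by affected relatives so that a connecting pedigree can be drawn, can be sketched as a breadth-first search up each person's family tree. PedHunter itself queries a relational genealogy database; the in-memory dictionary of parent tuples, the function names, and the example family below are hypothetical and chosen only to illustrate the idea.

```python
from collections import deque


def ancestor_depths(person, parents):
    """Map every ancestor of `person` (and `person` itself, at 0) to the
    fewest generations separating them.

    `parents` maps an individual's ID to a tuple of their recorded parents'
    IDs; individuals with no entry are treated as founders.
    """
    depths = {person: 0}
    queue = deque([person])
    while queue:  # breadth-first, so each ancestor is first reached by a shortest path
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in depths:
                depths[parent] = depths[current] + 1
                queue.append(parent)
    return depths


def closest_common_ancestors(affected, parents):
    """Return (ancestor, total generations) for every ancestor shared by all
    affected individuals, closest connections first."""
    if not affected:
        return []
    depth_maps = [ancestor_depths(person, parents) for person in affected]
    shared = set(depth_maps[0]).intersection(*depth_maps[1:])
    totals = {a: sum(d[a] for d in depth_maps) for a in shared}
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))


# Hypothetical family: C1 and C2 are affected first cousins.
parents = {
    "C1": ("P1", "P2"),
    "C2": ("P3", "P4"),
    "P1": ("G1", "G2"),
    "P3": ("G1", "G2"),  # P1 and P3 are siblings
    "G1": ("GG1", "GG2"),
}
print(closest_common_ancestors(["C1", "C2"], parents))
# [('G1', 4), ('G2', 4), ('GG1', 6), ('GG2', 6)]
```

The closest shared ancestors (here the grandparents G1 and G2) identify the smallest pedigree joining the affected individuals, which is the structure a linkage program then analyzes.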

The Comparative Genomic Hybridization (CGH) analysis software package is being used by NCBI investigators to model the process of tumor formation in various forms of cancer. The software builds models that relate genetic aberrations to tumor progression. Investigators have used CGH as part of a larger project to search for and identify possible susceptibility loci involved in both breast and bladder cancer. They have also published the results of a case study in which CGH was used to analyze chromosomal abnormalities in a large collection of ovarian cancer samples.

Three-dimensional Structure Database

NCBI’s Structure Research Group maintains a database of experimentally determined three-dimensional biomolecular structures, as well as tools for visualizing and analyzing these structures. Three-dimensional structures provide a wealth of information on the biological function of a molecule, on the mechanisms underlying that function, and on the evolutionary history of, and relationships among, macromolecules. All of these are valuable clues to a better understanding of rare diseases.

For example, in 1995, NCBI investigators used the structure database to predict the structure of leptin, the protein encoded by a gene linked to obesity and diabetes. After leptin was discovered, researchers analyzed the protein’s sequence and found that it showed no similarity to other known proteins. NCBI investigators hypothesized that leptin was ancestrally related to at least one other protein whose sequence had diverged so far that only a comparison of three-dimensional structures might detect the relationship. They searched the database to determine whether leptin might adopt a fold pattern, or structure, similar to that of a protein already stored in the database. They found that leptin’s sequence was compatible with the structure of a family of known proteins and built a structural model based on these results. This early prediction was later confirmed by the cloning of the leptin receptor and, more recently, by x-ray structure determination. Now that the structure of leptin has been confirmed, future studies of leptin and of other leptin-regulated genes may reveal the mechanisms by which leptin exerts its effects on the body.

Malaria Genetics and Genomics

Human malaria is caused by four Plasmodium parasites: P. falciparum, P. vivax, P. ovale, and P. malariae. Although P. vivax is widespread, P. falciparum causes the most severe and lethal form of the disease, leading to an estimated 1–2 million deaths each year, mostly among children in Africa.

The resurgence of malaria in recent years is mainly attributed to the emergence and spread of multiple-drug-resistant parasites and insecticide-resistant mosquito vectors, presenting a serious problem for travelers and the military in malaria-endemic regions as well as for the resident populations. Accordingly, much research at the NIH focuses on the treatment and prevention of malaria, which is a curable disease if promptly diagnosed and adequately treated.

NCBI, in collaboration with NIAID, has supported efforts to sequence and analyze the complete genomes of P. falciparum and related parasites, giving researchers access to information on all of the genes found in these parasites. Analyses of these genomic data by NCBI researchers are contributing to a better understanding of this complex disease and to efforts to develop improved antimalarial drugs, vaccines, and other control strategies. Moreover, a collaborative team of NIH investigators, including researchers from NCBI, has constructed a genome-wide, high-resolution genetic linkage map of P. falciparum. Computer analyses based on this map and its markers have facilitated genome sequence assembly and are currently helping to identify the genes involved in parasite resistance to multiple drugs and to trace the evolution and spread of these genes in parasite field populations in Africa, Asia, and the Americas.

NCBI's Malaria Genetics and Genomics Web page serves as an information and data resource covering Plasmodium and related parasites, including rodent malarias and the Anopheles gambiae mosquito vector. This resource includes links to the sequence and genomic data in Entrez Genomes and other NCBI databases together with unique information on genome maps, linkage markers, and genetic studies. Links are provided to various malaria research projects being conducted at NIH, to NIAID's Malaria Research and Reference Reagent Resource Center, and to other malaria-related sites.

NCBI, in collaboration with the WHO's Special Programme for Research and Training in Tropical Diseases (TDR) and other international partners, has continued to support international outreach efforts to train scientists in developing countries to use current bioinformatics tools and genomic data, such as the mosquito and malaria genomes, in their own research. NCBI staff have provided coordination and instruction at several international bioinformatics training courses and centers in Africa, Asia, and South America, including the WHO/TDR-sponsored Regional Training Course in Bioinformatics Applied to Tropical Diseases.

Additional Human Genome Resources

NCBI makes available a number of other resources to facilitate the widespread use of human sequence data. The Human Genome Resources Web page serves as a focal point for biomedical researchers around the world, enabling them to use these data in their research. From the Human Genome Resources Web page, researchers can access the NCBI Map Viewer, which presents a graphical view of the available human sequence data in conjunction with cytogenetic, genetic, and physical maps. Researchers can quickly search for a gene or gene marker of interest by querying the entire human genome. Query results link to a graphical display of the gene or gene marker in the context of additional data. Coupling the human genome sequence with genetic and physical maps bearing markers associated with disease allows researchers to identify candidate genes for further research. The NCBI Map Viewer also allows the maps and genomic sequences of organisms used as models of human disease, such as the mouse and rat, to be viewed alongside the human maps. Comparing genomes in this way allows researchers to apply results obtained in their laboratories with these model organisms to better understand the roots of human disease.

NCBI’s Genes and Disease Web page is designed to introduce visitors to the relationship between genetic factors and human disease.

Genes and Disease provides information for more than 80 genetic diseases, including many rare diseases. The Online Mendelian Inheritance in Man database, or OMIM, is a continuously updated catalog of inherited human disorders and associated sequence mutations, authored and edited by Dr. Victor A. McKusick and colleagues and developed for the Web by NCBI. OMIM now contains over 15,000 entries for diseases linked to over 9,000 locations on the human genome.

One of the primary reasons for sequencing the human genome was to gain an understanding of the role of genes in human disease. By studying the gene sequences associated with a human or model organism disease, researchers can gain important insights into the genetic and environmental basis of disease. The advances outlined here demonstrate the importance and utility of NCBI’s computer databases, data analysis tools, and software algorithms in identifying and understanding human disease genes and pave the way for the development of novel strategies to diagnose, treat, and ultimately, prevent disease.

Severe Acute Respiratory Syndrome Coronavirus Resource

The severe acute respiratory syndrome (SARS) virus was responsible for an outbreak of severe, atypical pneumonia in Guangdong Province, China, in late 2002. The disease had an extremely high mortality rate (up to 19 percent) and spread rapidly to other countries. In April 2003, a previously unknown coronavirus was isolated from patients and subsequently shown, in experiments on monkeys, to be the causative agent. The first complete sequence of the SARS coronavirus (SARS-CoV) was produced by the BCCA Genome Sciences Centre, Canada, about 2 weeks after the virus was detected in SARS patients. It was submitted to NCBI’s GenBank sequence database, and the NCBI Viral Genomes Group annotated the sequence and released it the next day. The availability of the sequence data and the functional dissection of the SARS-CoV genome at NCBI have been necessary prerequisites for developing diagnostic tests, antiviral agents, and vaccines. NCBI's SARS-CoV Web resource includes the complete sequence, links to the latest sequence data and publications, and the results of precomputed genomic, protein, and structural sequence analyses.

Database of the Major Histocompatibility Complex

The NCBI dbMHC database provides an open, publicly accessible platform for DNA and clinical data related to the human major histocompatibility complex (MHC). MHC research and clinical data generated at meetings such as the International HLA Workshop and Congress have proven valuable to the international research community. NCBI makes these data available along with tools for submitting and analyzing research data linked to the MHC. dbMHC contains reagent data used in DNA typing and a section with anonymous clinical data from MHC-related research projects on diseases such as celiac disease, narcolepsy, ankylosing spondylitis, and hemochromatosis.




Last Reviewed: July 22, 2005