ABSTRACT
The low-density lipoprotein receptor (LDLr) plays a pivotal role in cholesterol
homeostasis. Mutations in the LDLr gene (LDLR), which is located on chromosome
19, cause familial hypercholesterolemia (FH), an autosomal dominant disorder
characterized by severe hypercholesterolemia associated with premature coronary
atherosclerosis. To date, more than 300 mutations have been identified in the
LDLR gene. To facilitate the mutational analysis of the LDLR gene and to promote
the analysis of the relationship between genotype and phenotype, a software
package along with a computerized database (currently listing 210 entries) has
been created.
The low-density lipoprotein receptor (LDLr) is a ubiquitous transmembrane
glycoprotein of 839 amino acids that mediates the transport of LDL into cells
via receptor-mediated endocytosis and is pivotal in cholesterol homeostasis (1).
Defects in the LDLr result in familial hypercholesterolemia (FH). FH is
characterized clinically by raised plasma LDL-cholesterol concentrations,
xanthomas and early coronary heart disease (2). FH is an autosomal dominant
trait, homozygotes being more severely affected than heterozygotes. FH is also
one of the most common inherited disorders, with the frequency of heterozygotes
estimated at 1:500 and that of homozygotes at ~1:10^6 (about one in a million)
in most populations. FH frequency is higher in certain communities in which a
small number of mutations predominate because of founder effects. These
communities include French Canadians (3), Christian Lebanese (4), Druze (5),
Finns (6), Afrikaners (7) and Ashkenazi Jews of Lithuanian descent (8).
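The two population frequencies quoted above are linked by Hardy-Weinberg proportions. The following is a minimal sketch, assuming random mating and Hardy-Weinberg equilibrium (assumptions that the founder populations listed above do not meet), that derives the expected homozygote frequency from the heterozygote frequency. The function name is hypothetical.

```python
import math


def homozygote_frequency(het_freq: float) -> float:
    """Expected homozygote frequency under Hardy-Weinberg equilibrium.

    Solves 2q(1 - q) = het_freq for the mutant-allele frequency q (smaller
    root) and returns q**2. Assumes random mating; all FH alleles are pooled.
    """
    if not 0.0 <= het_freq <= 0.5:
        raise ValueError("heterozygote frequency must lie in [0, 0.5]")
    # Smaller root of 2q^2 - 2q + h = 0, written as h / (1 + sqrt(1 - 2h))
    # to avoid cancellation when h is small.
    q = het_freq / (1.0 + math.sqrt(1.0 - 2.0 * het_freq))
    return q * q


freq = homozygote_frequency(1 / 500)
print(f"q^2 = {freq:.3e}, i.e. about 1 in {round(1 / freq):,}")
```

With a heterozygote frequency of 1:500, the allele frequency is close to 1:1,000, so homozygotes are expected at roughly one in a million. This matches the figure given in the text.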
The gene for the LDL receptor (LDLR), located on chromosome 19p13.1-13.3 (9),
spans 45 kb and is divided into 18 exons (10). The correspondence between gene
exons and functional domains of the mature protein is well established (10).
Exon 1 encodes the 21 amino acids of the signal sequence, which is cleaved from
the protein during translocation into the endoplasmic reticulum (ER).
Exons 2-6 encode the ligand-binding domain, which is made up of seven repeats of
40 amino acids each. These repeats are homologous to sequences in protein C9 of
the complement cascade (10). Each repeat contains six cysteine residues that
form three disulfide bonds. The C-terminal end of each repeat contains a
negatively charged triplet, SDE, that is important for ligand binding.
Exons 7-14 encode a 400-amino-acid sequence that is 33% identical to a portion
of the human epidermal growth factor (EGF) precursor. This region includes three
growth factor repeats, which are 40-amino-acid cysteine-rich sequences that
differ from the cysteine-rich repeats of the ligand-binding domain. The first
two repeats are contiguous and are separated from the third by a 280-amino-acid
sequence that contains five copies of a conserved motif (YWTD), one every 40-60
amino acids (10). This domain is required for the dissociation of lipoproteins
from the receptor in the endosome during receptor recycling. It also positions
the ligand-binding domain so that it can bind LDL on the cell surface (11).
Exon 15 encodes 58 amino acids that are enriched in serine and threonine
residues, which serve as attachment sites for O-linked sugar chains. Absence of
this exon has no significant functional consequence in cultured hamster
fibroblasts (12).
The 3' end of exon 16 and the 5' end of exon 17 encode the 22 hydrophobic amino
acids of the membrane-spanning domain.
The remainder of exon 17 and the 5' end of exon 18 encode the 50 amino acids
that make up the cytoplasmic domain. This domain is important for the
localization of the receptor in coated pits on the cell surface (12-14).
The remainder of exon 18 specifies the 2.6 kb 3' untranslated region of the mRNA.
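The exon-to-domain correspondence described above can be captured in a small lookup table. Such a table is useful, for example, when tallying mutations by protein domain. The following is a minimal sketch. The dictionary layout, the domain labels and the function names are hypothetical. Exon 16 is recorded only through its 3' part, because the text does not say what its 5' part encodes.

```python
from typing import Dict, List


def build_exon_domain_map() -> Dict[int, List[str]]:
    """Map each of the 18 LDLR exons to the protein regions it contributes to."""
    exon_map: Dict[int, List[str]] = {1: ["signal sequence"]}
    for exon in range(2, 7):  # exons 2-6: seven 40-amino-acid repeats
        exon_map[exon] = ["ligand-binding"]
    for exon in range(7, 15):  # exons 7-14: region homologous to the EGF precursor
        exon_map[exon] = ["EGF precursor homology"]
    exon_map[15] = ["O-linked sugars"]
    exon_map[16] = ["membrane-spanning"]  # 3' end of the exon only
    exon_map[17] = ["membrane-spanning", "cytoplasmic"]
    exon_map[18] = ["cytoplasmic", "3' untranslated region"]
    return exon_map


EXON_DOMAINS = build_exon_domain_map()


def domains_for_exon(exon: int) -> List[str]:
    """Return the regions encoded by an exon; raises KeyError outside 1-18."""
    return EXON_DOMAINS[exon]


print(domains_for_exon(17))  # ['membrane-spanning', 'cytoplasmic']
```

Storing a list per exon keeps exons that straddle two regions (17 and 18) explicit instead of forcing a single label.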
Table 1
In normal fibroblasts, the LDLr is synthesized in the ER as a 120 kDa precursor.
Within 30 min the protein is transported to the Golgi complex, where it is
converted to its mature 160 kDa form. From the Golgi complex, the receptor is
then transported to the cell surface, where it binds its ligand, LDL, and is
internalized by endocytosis (1). Mutations in the LDLR gene have been classified
into five functional classes based on the characteristics of the LDLr produced
(15).
Class 1 mutations disrupt the receptor's synthesis in the ER. Class 2 mutations
block transport to the Golgi complex: class 2A mutations completely block
receptor transport, while class 2B mutations produce proteins that are
transported at a detectable but markedly reduced rate. Class 3 mutations produce
proteins that are transported to the cell surface but fail to bind LDL normally.
Class 4 mutations affect the cytoplasmic domain alone (4A) or also the
membrane-spanning region (4B); they produce proteins that cannot internalize
bound LDL into the cell. Finally, class 5 mutations block the acid-dependent
dissociation of receptor and ligand in the endosome, an essential event for
receptor recycling (16).
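Each class corresponds to the first step of the receptor pathway that fails, so the scheme can be read as an ordered decision procedure. The following is a minimal sketch. It assumes a hypothetical per-mutation summary of experimental observations. The field names and the `classify` function are illustrative and are not part of the published classification. The sketch does not separate subclasses 4A and 4B, because those depend on which domains are affected rather than on the observed phenotype.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LdlrClass(Enum):
    CLASS_1 = "1"    # no receptor synthesized
    CLASS_2A = "2A"  # transport out of the ER completely blocked
    CLASS_2B = "2B"  # transport detectable but markedly slowed
    CLASS_3 = "3"    # reaches the surface but binds LDL abnormally
    CLASS_4 = "4"    # binds LDL but fails to internalize it
    CLASS_5 = "5"    # fails to release LDL in the endosome, so no recycling


@dataclass
class Observation:
    """Hypothetical summary of one mutant receptor's behaviour in cultured cells."""
    synthesized: bool
    transport: str  # "blocked", "slow" or "normal"
    binds_ldl: bool
    internalizes: bool
    recycles: bool


def classify(obs: Observation) -> Optional[LdlrClass]:
    """Return the class of the earliest defective step, or None if none is found."""
    if obs.transport not in {"blocked", "slow", "normal"}:
        raise ValueError(f"unknown transport value: {obs.transport!r}")
    if not obs.synthesized:
        return LdlrClass.CLASS_1
    if obs.transport == "blocked":
        return LdlrClass.CLASS_2A
    if obs.transport == "slow":
        return LdlrClass.CLASS_2B
    if not obs.binds_ldl:
        return LdlrClass.CLASS_3
    if not obs.internalizes:
        return LdlrClass.CLASS_4
    if not obs.recycles:
        return LdlrClass.CLASS_5
    return None


print(classify(Observation(True, "normal", True, False, False)))  # LdlrClass.CLASS_4
```

The checks follow the order of the pathway (synthesis, transport, binding, internalization, recycling). A receptor that neither internalizes nor recycles is therefore assigned to class 4, since recycling cannot occur without prior internalization.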
Through an extensive survey of the literature, we found that 302 mutations have
been reported in the LDLR gene. Of these, only 72 (~24%) are major
rearrangements; the majority of FH-causing mutations are therefore either small
deletions, insertions or duplications, or point mutations. With the exception of
'founder' gene mutations, many mutations are extremely rare and have been
identified in single families only. Indeed, true recurrence has been
conclusively demonstrated in only a few cases. While much effort has been put
into the identification of molecular defects in the LDLR gene, few teams other
than Hobbs and coworkers (16) have explored their functional implications, and
hardly any effort has been made to investigate genotype/phenotype
relationships. With this in mind, and to facilitate the mutational analysis of
the LDLR gene, we have compiled a database and created a software package for
its analysis.
In an effort to standardize the information regarding LDLr mutations, we have
created a computerized database that currently contains information about the
published mutations of the LDLR gene. The current version of the database
contains 210 entries (3-6, 8, 15-71). Major rearrangements, as well as the six
mutations in the promoter sequence (16, 72-74), the 14 splice mutations (16, 44,
46, 47, 75-78) and polymorphisms, were omitted because they cannot be
accommodated in the present version of the software. For each mutation,
information is provided at several levels: at the gene level (exon and codon
number, wild-type and mutant codon, mutational event, mutation name), at the
mRNA level (size, processing), at the protein level (wild-type and mutant amino
acid, affected domain, activity, mutation class) and at the personal level
(ethnic background, age, sex, body mass index and family history of coronary
heart disease). Table 1 gives part of the database in Excel spreadsheet format.
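The four levels of information can be modelled as one flat record per mutation, and the inclusion rule as a filter on the kind of mutation. The following is a minimal sketch. The field names, the `kind` labels and the helper function are hypothetical and are not taken from the actual database. The final lines check that the counts quoted in the text are mutually consistent. This check assumes that the 302 reported mutations include the 72 rearrangements, the promoter mutations and the splice mutations.

```python
from dataclasses import dataclass
from typing import List, Optional

# Kinds of mutation left out of the current version of the database.
EXCLUDED_KINDS = {"rearrangement", "promoter", "splice", "polymorphism"}


@dataclass
class MutationRecord:
    # Gene level
    name: str
    kind: str  # hypothetical labels, e.g. "point", "small_indel", "splice"
    exon: Optional[int] = None
    codon: Optional[int] = None
    wild_type_codon: Optional[str] = None
    mutant_codon: Optional[str] = None
    mutational_event: Optional[str] = None
    # mRNA level
    mrna_size: Optional[str] = None
    mrna_processing: Optional[str] = None
    # Protein level
    wild_type_aa: Optional[str] = None
    mutant_aa: Optional[str] = None
    domain: Optional[str] = None
    activity: Optional[str] = None
    mutation_class: Optional[str] = None
    # Personal level
    ethnic_background: Optional[str] = None
    age: Optional[int] = None
    sex: Optional[str] = None
    body_mass_index: Optional[float] = None
    family_history_chd: Optional[bool] = None


def database_entries(records: List[MutationRecord]) -> List[MutationRecord]:
    """Keep only the mutation kinds the current software can accommodate."""
    return [r for r in records if r.kind not in EXCLUDED_KINDS]


# Consistency check on the counts given in the text
# (polymorphisms are assumed not to be counted among the 302 mutations).
reported = 302
excluded_counts = {"rearrangement": 72, "promoter": 6, "splice": 14}
print(reported - sum(excluded_counts.values()))  # 210
```

Under that assumption, 302 − 72 − 6 − 14 = 210, which equals the number of database entries.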
The software package contains routines for the analysis of the LDLR database
that were developed with the 4th Dimension® (4D) package from ACI. The 4D
database management system provides optimized multicriteria search and sorting
tools to select records on any field. In addition, six routines were
specifically developed: (i) 'Position' studies the distribution of mutations at
the nucleotide level to identify preferential mutation sites; (ii) 'Statistical
evaluation of mutational events' is comparable with (i) but also indicates the
type of mutational event, and its result can be displayed either as a table or
as a graph; (iii) 'Frequency of mutation' studies the relative distribution of
mutations at all sites and sorts them according to their frequency, with a
graphic display of the cumulative distribution of mutations; (iv) 'Stat exons'
studies the distribution of mutations among the exons and detects statistically
significant differences between observed and expected numbers of mutations;
(v) 'Protein' studies the distribution of mutational events among protein
domains (ligand-binding and EGF-precursor-like motifs) and aligns the amino
acids of the consensus sequence for each domain type; (vi) 'Insertions and
deletions analysis' searches for repeated sequences that surround a mutation
and may be involved in the mutational mechanism.
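Several of these routines reduce to simple counting and testing operations. The sketch below illustrates routines (i)-(iv) and (vi) under stated assumptions; it is not the 4D implementation. All function names are hypothetical, and the sketch makes the following assumptions:

- Mutations are given as cDNA positions, and single-base substitutions as (wild-type, mutant) base pairs.
- In (iv), the "expected" number of mutations per exon is taken to be proportional to exon length. This is an illustrative choice; the original expectation model is not described.
- In (vi), a deletion is scored by how far it can slide along identical flanking bases. This is a common signature of slippage at direct repeats.

```python
from collections import Counter
from itertools import accumulate
from typing import Dict, Iterable, List, Tuple

PURINES = {"A", "G"}


def hotspots(positions: Iterable[int], min_count: int = 2) -> List[int]:
    """(i) Nucleotide positions mutated at least `min_count` times."""
    counts = Counter(positions)
    return sorted(p for p, n in counts.items() if n >= min_count)


def substitution_type(wild_type: str, mutant: str) -> str:
    """(ii) Classify a single-base substitution as a transition or transversion."""
    if wild_type == mutant:
        raise ValueError("not a substitution")
    same_ring_class = (wild_type in PURINES) == (mutant in PURINES)
    return "transition" if same_ring_class else "transversion"


def cumulative_site_frequencies(positions: Iterable[int]) -> Tuple[List[int], List[int]]:
    """(iii) Per-site mutation counts, most frequent first, and their running total."""
    ranked = sorted(Counter(positions).values(), reverse=True)
    return ranked, list(accumulate(ranked))


def exon_chi_square(observed: Dict[int, int], exon_lengths: Dict[int, int]) -> Tuple[float, int]:
    """(iv) Pearson chi-square statistic and degrees of freedom for observed
    mutations per exon against counts proportional to exon length."""
    total = sum(observed.get(e, 0) for e in exon_lengths)
    if total == 0:
        raise ValueError("no mutations in the listed exons")
    length_sum = sum(exon_lengths.values())
    stat = 0.0
    for exon, length in exon_lengths.items():
        expected = total * length / length_sum
        stat += (observed.get(exon, 0) - expected) ** 2 / expected
    return stat, len(exon_lengths) - 1


def deletion_shift(seq: str, start: int, length: int) -> int:
    """(vi) Number of positions a deletion of seq[start:start+length] can slide
    along identical flanking bases; > 0 means it lies within a direct repeat."""
    right = 0
    while start + length + right < len(seq) and seq[start + right] == seq[start + length + right]:
        right += 1
    left = 0
    while start - 1 - left >= 0 and seq[start - 1 - left] == seq[start + length - 1 - left]:
        left += 1
    return left + right


print(hotspots([100, 250, 250, 310, 310, 310]))         # [250, 310]
print(substitution_type("C", "T"))                       # transition
print(cumulative_site_frequencies([5, 5, 5, 9, 9, 12]))  # ([3, 2, 1], [3, 5, 6])
print(exon_chi_square({1: 2, 2: 8}, {1: 100, 2: 100}))   # (3.6, 1)
print(deletion_shift("CTAGAGTC", 2, 2))                  # 2
```

The statistic returned by `exon_chi_square` would then be compared against a chi-square distribution with the returned degrees of freedom, for example with `scipy.stats.chi2.sf(stat, df)`. In the last example, deleting "AG" from "CTAGAGTC" gives the same product whether the first or the second AG copy is removed. The deletion can therefore slide two positions, the signature routine (vi) looks for.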
The present version of the database contains no clinical data, because these
data are incompletely reported in almost all published mutation reports.
However, as the purpose of this database is to promote not only the molecular
analysis of mutational events within the LDLR gene but also the study of
genotype/phenotype relationships, the database will be expanded in the future to
include clinical data (symptomatic coronary artery disease, xanthomas) and
biological data (total plasma cholesterol and LDL-cholesterol before or without
treatment), as well as the ages at which they were assessed and, when
appropriate, the age at death. Data concerning therapy should also be included.
Finally, the software will be expanded as the database grows and according to
the requirements of its users; new functions comparable with those already
available in the APC gene mutation database (79) could be implemented.
The current database and subsequent updated versions are (or will be) available
on request from M.V. and C. Bo. on floppy disk in Apple format as Microsoft
Excel® files. Notification of omissions and errors in the current version, as
well as specific phenotypic data, would be gratefully received by the
corresponding authors. The software package is available on a collaborative
basis.
This work was supported by grants from GREG (Groupe de Recherche et d'Etude du Génome), Fondation de France, AFM (Association Française contre les Myopathies), Université René Descartes Paris V, Ministère de l'Education Nationale, de
l'Enseignement Supérieur, de la Recherche et de l'Insertion Professionnelle (ACC-SV2), and Faculté de Médecine Necker. M.V. is supported by a grant from
Ministère de l'Education Nationale, de l'Enseignement Supérieur, de la Recherche et de l'Insertion Professionnelle.
REFERENCES
