A corrigendum has been published for this article.

Nucleic Acids Research, 2002, Vol. 30, No. 24 5579-5592
© 2002 Oxford University Press

Exploring root symbiotic programs in the model legume Medicago truncatula using EST analysis

Etienne-Pascal Journet*, Diederik van Tuinen², Jérome Gouzy, Hervé Crespeau³, Véronique Carreau, Mary-Jo Farmer², Andreas Niebel, Thomas Schiex¹, Olivier Jaillon³, Odile Chatagnier², Laurence Godiard, Fabienne Micheli, Daniel Kahn, Vivienne Gianinazzi-Pearson² and Pascal Gamas

Laboratoire de Biologie Moléculaire des Relations Plantes-Microorganismes, CNRS-INRA, ¹Laboratoire de Biométrie et Intelligence Artificielle, INRA, 31326 Castanet-Tolosan Cedex, France, ²UMR 1088 Biochimie, Biologie Cellulaire et Ecologie des Interactions Plantes-Microorganismes, INRA-Université de Bourgogne, 21065 Dijon, France and ³Genoscope and CNRS UMR 8030, 91057 Evry Cedex, France

*To whom correspondence should be addressed. Tel: +33 561 285 324; Fax: +33 561 285 061; Email: journet{at}toulouse.inra.fr

Received July 15, 2002; Revised and Accepted October 18, 2002

DDBJ/EMBL/GenBank accession nos AL365523–AL389869.


    ABSTRACT
 
We report on a large-scale expressed sequence tag (EST) sequencing and analysis program aimed at characterizing the sets of genes expressed in roots of the model legume Medicago truncatula during interactions with either of two microsymbionts, the nitrogen-fixing bacterium Sinorhizobium meliloti or the arbuscular mycorrhizal fungus Glomus intraradices. We have designed specific tools for in silico analysis of EST data, in relation to chimeric cDNA detection, EST clustering, encoded protein prediction, and detection of differential expression. Our 21 473 5'- and 3'-ESTs could be grouped into 6359 EST clusters, corresponding to distinct virtual genes, along with 52 498 other M.truncatula ESTs available in the dbEST (NCBI) database that were recruited in the process. These clusters were manually annotated, using a specifically developed annotation interface. Analysis of EST cluster distribution in various M.truncatula cDNA libraries, supported by a refined R test to evaluate statistical significance and by ‘electronic northern’ representation, enabled us to identify a large number of novel genes predicted to be up- or down-regulated during either symbiotic root interaction. These in silico analyses provide a first global view of the genetic programs for root symbioses in M.truncatula. A searchable database has been built and can be accessed through a public interface.


    INTRODUCTION
 
The manner in which a number of biological questions can be addressed has profoundly evolved in the last few years with the advent of genomics. Indeed, the possibility to conduct large-scale analyses in functional genomics now opens the way to the identification of large sets of co-regulated genes involved in biological processes. This is interesting, not only to identify novel and possibly important molecular events, but also to investigate biological processes at the level of gene networks rather than individual genes. This type of approach is attractive for a better global understanding of complex developmental programs such as those activated during interactions between plants and microorganisms. In this analysis, we are particularly interested in the plant genetic programs involved in rhizobial and arbuscular mycorrhizal (AM) associations, the two main plant root endosymbioses (1).

The rhizobial and AM symbioses are of ecological and agronomical importance. In both associations, the microorganism provides the host plant with growth-limiting mineral nutrients (reduced nitrogen from rhizobia; phosphorus, other nutrients and water from the mycorrhizal fungus) and benefits in return from a privileged source of photosynthates. The comparative study of symbiotic interactions is also of major interest for many basic biological issues, such as signaling, morphogenetic and organogenetic processes in plants, or differential plant response to symbiotic versus pathogenic microorganisms (for recent reviews, see 2–5). The mechanisms involved in the establishment of the two associations appear to differ in a number of aspects. First, the extent of the host range is essentially restricted to legumes for rhizobia, with a generally narrow host-specificity for each Rhizobium species, whilst it is much broader for AM fungi, which can colonize most terrestrial plants and taxa, with the notable exception of the model plant Arabidopsis thaliana (6). Secondly, interactions of legume roots with rhizobia lead to the development of very specific organs, root nodules. In this process, the coordinated differentiation of the two partners creates a micro-aerobic environment allowing intracellular bacteroids to fix atmospheric nitrogen. In contrast, no specialized organ differentiates during root interactions with AM fungi but specific exchange structures (arbuscules) develop which provide an expanded membrane area between the two partners within the host cells (7,8). Finally, rhizobial colonization of roots necessitates specific structures of plant origin, the infection threads, which are tubular invaginations down which bacteria migrate through the root to the nodule primordium tissues. These structures appear within a few days after inoculation, by a process which is highly controlled by the plant in terms of timing, number and location (9). AM fungal infection proceeds by hyphal growth between and within epidermal and cortical root cells.

Despite these clear differences, genetic and molecular studies have shown that several common features characterize various stages in the establishment of these two root endosymbioses. A substantial proportion of plant mutations abolishing early symbiotic responses to rhizobia and nodulation (Nod⁻ phenotype) also block mycorrhiza development (Myc⁻ phenotype) (reviewed in 3,10). Recently, map-based cloning strategies applied to such Nod⁻/Myc⁻ mutations in the two model legumes (see below) have led to the molecular identification of orthologous genes encoding a receptor-like protein kinase necessary for both symbioses (11,12). In addition, several nodulin genes, i.e. genes induced during nodulation, appear to be activated in AM interactions (13 and references therein) and common antigenic motifs are present in exchange interfaces located around arbuscules or bacteroids (7). These observations, combined with fossil evidence, have led to the proposition that the two endosymbioses are evolutionarily related. Finally, the host plant allows symbiotic fungal or bacterial infection only when the soil concentration of the mineral element provided by the microsymbiont (nitrogen for rhizobia and phosphorus for AM fungi) is limiting, suggesting similar regulatory mechanisms.

Two plant species have emerged as model legumes, Medicago truncatula Gaertn (14) and Lotus japonicus (10). Current research on M.truncatula addresses questions that can be only or better investigated in a legume species, such as the rhizobial and mycorrhizal symbioses, specific pathogenic and pest (aphids, nematodes) interactions, aspects of plant architecture, and proteinaceous seed development. Medicago truncatula and L.japonicus functional, structural and comparative genomics are being actively developed worldwide (10,15). The first two systematic M.truncatula expressed sequence tag (EST) sequencing projects were conducted on root hairs/root tips and nodules, respectively (16,17). Since then, a large number of M.truncatula ESTs have been deposited in databases, originating from programs sponsored by the Noble Foundation (18) and the NSF (http://www.medicago.org) in the USA and by a CNRS-INRA-Genoscope collaboration in France. In parallel, EST sequencing and analysis are progressing in L.japonicus (19–21).

In this paper, we report on the French program for large-scale M.truncatula EST sequencing and analysis. The main goal was to obtain a first global and comparative view of the two root symbiotic programs, by characterizing transcript populations expressed in roots of M.truncatula in the presence or absence of interactions with either of the two microsymbionts, the bacterium Sinorhizobium meliloti or the AM fungus Glomus intraradices. The strategy implemented in this EST program has three important characteristics: (i) sequencing of both 5' and 3' cDNA ends; (ii) a clustering procedure that processes sequence information gradually, starting with the most reliable clusters; (iii) an annotation procedure conducted in a semi-automated fashion. Subsequently, in silico comparisons of EST distribution in the various sequenced M.truncatula cDNA libraries could be conducted using an improved statistical model to identify genes predicted to be differentially regulated. These EST clustering data, protein predictions, cluster annotations and tools for in silico analysis are available at http://medicago.toulouse.inra.fr/EST.


    MATERIALS AND METHODS
 
Plant material and microsymbiont strains
A nitrogen-starved root tip library (MtBA) was constructed from M.truncatula Jemalong (line A17) plants grown in aeroponic culture for 14 days with a nitrogen-rich medium followed by 3 days with a N-free medium as in Journet et al. (13). Regions of root tips competent for early rhizobial interactions (~1 or 3 cm-long fragments, for short laterals or main vertical roots, respectively) were harvested and stored in liquid nitrogen. For the root nodule library (MtBB), plants (line J5, the genotype of which is probably identical to that of line A17) (22) were grown aeroponically in a nitrogen-rich medium for 21 days followed by 3 days in a N-free medium. Root systems were inoculated with the S.meliloti wild-type strain RCR 2011 and root nodules, along with short adjacent root segments, were harvested 4 days post-inoculation. To obtain an arbuscular mycorrhiza library (MtBC), seedlings (line J5) were transplanted into a neutral clay soil: calcined Terragreen® mix (1:2) in the presence of onion root fragments colonized by the AM fungus G.intraradices (Schenck and Smith, isolate BEG 141), and previously tested for the absence of S.meliloti on a hypernodulating M.truncatula mutant (TR122) (23). Plants were watered daily, and twice a week with a modified nutrient Long Ashton solution (24) without phosphate but with 15 mM nitrate to ensure nodulation-free conditions. Whole mycorrhizal root systems were harvested 21 days after inoculation. At this stage, the extent of mycorrhizal colonization was 70% of the root system as determined by trypan blue staining (25) (http://www.dijon.inra.fr/bbceipm/Mychintec/Mycocalc-prg/download.html), which reflects a well established symbiosis characterized by a high frequency of arbuscule-containing cells.

cDNA libraries
For each library, total RNA was extracted from frozen tissue by the procedure of Jackson and Larkins (26). cDNA was prepared from poly(A)+ enriched RNA using the synthesis and cloning kit of Stratagene (La Jolla, CA). cDNA longer than ~400 bp was directionally ligated into the Uni-ZAP XR (λ ZAP II) vector from Stratagene (site 1, EcoRI; site 2, XhoI) and packaged using Gigapack Gold packaging extracts. Plasmids (pBluescript SK) containing cDNA inserts were mass-excised from phage stocks using ExAssist helper phage and propagated in SOLR cells, according to the manufacturer’s instructions.

DNA sequencing and sequence processing
At least 5000 clones were randomly selected from each library and sequenced from 5' and 3' ends by the BigDye Terminator (Applied Biosystems) or ET terminator (Amersham Pharmacia Bioscience) method, using 5'-ATT AAC CCT CAC TAA AGG GA-3' as forward primer and 5'-TAA TAC GAC TCA CTA TAG GG-3' as reverse primer. Approximately half of the sequences were produced by capillary sequencers (ABI 3700, MegaBACE), and the rest on ABI 377 slab gel sequencers. A total of 28 412 raw ESTs (95% of the runs) passed a test for high confidence base-call (PHRED software) (27,28), with an average valid read-length of 530 nt. These raw sequences were then edited to mask vector, cloning adapter, poly(dA/dT), sequence ends with more than 3% of ‘N’, as well as di- and tri-nucleotide repeats >20 bp to avoid possible interference with the EST clustering procedure. Trimmed ESTs with valid sequences shorter than 100 nt were eliminated. In the case of multiple runs for the same end (2808 redundant reads), we selected the longest sequence flanked with canonical vector sequence when available.
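The repeat-masking and length-filtering steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline actually used: a repeat "longer than 20 bp" is read as at least 11 copies of a 2-nt unit or at least 7 copies of a 3-nt unit, and the mask symbol `X` and all function names are hypothetical.

```python
import re

# Assumed reading of ">20 bp": 11+ copies of a dinucleotide (22+ bp) or
# 7+ copies of a trinucleotide (21+ bp). Homopolymer runs also match.
DINUC_REPEAT = re.compile(r"([ACGT]{2})\1{10,}")
TRINUC_REPEAT = re.compile(r"([ACGT]{3})\1{6,}")
MIN_VALID_LENGTH = 100  # nt, threshold quoted in the text
MASK_CHAR = "X"         # hypothetical masking symbol


def mask_repeats(seq):
    """Replace long di- and tri-nucleotide repeats with the mask symbol."""
    def blank(match):
        return MASK_CHAR * len(match.group(0))

    seq = DINUC_REPEAT.sub(blank, seq.upper())
    return TRINUC_REPEAT.sub(blank, seq)


def valid_length(masked_seq):
    """Count the bases that remain unmasked and unambiguous."""
    return sum(base in "ACGT" for base in masked_seq)


def keep_est(seq):
    """Keep an EST only if enough valid sequence survives masking."""
    return valid_length(mask_repeats(seq)) >= MIN_VALID_LENGTH
```

Masking rather than deleting the repeats keeps read coordinates unchanged, which is convenient if the masked reads are later aligned.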

We also excluded 297 cDNAs (633 ESTs) found as chimeric by using a method based on pattern detection (see Fig. SA, Supplementary Material). A first type of cDNA chimer probably results from a contamination of λ ZAP arms with the pBluescript vector stuffer fragment EcoRI–XhoI, and can be detected by automatic search for that sequence. A second type is caused by cDNAs carrying the EcoRI adapter at both ends, possibly following exonuclease degradation, during reverse transcription, of the XhoI site provided by the cDNA primer. The signature that we used to screen for such chimers was the palindromic EcoRI adapter found at the junction between the two cDNAs. A third type of chimer is due to cDNAs ending with two XhoI sites, possibly because of recognition of internal XhoI sites in the cDNA during EcoRI adapter elimination from the 3' end. Such chimers were searched for by looking for the presence of two poly(dA) stretches in a single clone, or of one poly(dA) and one contiguous poly(dT) stretch, separated by an XhoI site (depending on the cloning orientation of the XhoI–XhoI cDNA). At this stage, our EST data set (‘MtB’) was submitted in July 2000 to the EMBL data bank (accession nos AL365523–AL389869) to release 24 347 ESTs (average read-length 437 nt; see Table 1 for distribution of deposited cDNAs in libraries).
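A pattern search for the third chimer type (XhoI–XhoI cDNAs) might look like the sketch below. It assumes the clone sequence is available as a single string in one orientation. The minimum homopolymer length and the function name are hypothetical, since the text does not give the thresholds actually used; only the XhoI recognition sequence (CTCGAG) is standard.

```python
import re
from typing import Optional

XHOI_SITE = "CTCGAG"  # XhoI recognition sequence
MIN_RUN = 15          # hypothetical minimum poly(dA)/poly(dT) length

POLY_A = re.compile(r"A{%d,}" % MIN_RUN)
POLY_A_XHOI_POLY_T = re.compile(
    r"A{%d,}%sT{%d,}" % (MIN_RUN, XHOI_SITE, MIN_RUN)
)


def xhoi_chimer_signature(clone_seq: str) -> Optional[str]:
    """Return the name of the matched signature, or None if none is found."""
    seq = clone_seq.upper()
    # poly(dA) and poly(dT) stretches joined head to head through an XhoI site
    if POLY_A_XHOI_POLY_T.search(seq):
        return "poly(dA)-XhoI-poly(dT)"
    # two separate poly(dA) stretches within the same clone
    if len(POLY_A.findall(seq)) >= 2:
        return "two poly(dA) stretches"
    return None
```

A real screen would probably tolerate a few mismatches within the homopolymer runs; exact matching is used here only to keep the example short.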


Table 1. Summary of validated MtB cDNA clones and resulting MtC EST clusters
 
EST clustering, assembly and annotation
We first defined highly reliable core clusters (‘cluster seeds’) by starting with pairs of cDNAs homologous to each other on both 5' and 3' ends. These cluster seeds were compared to the M.truncatula EST database by using WU-BLASTN [W.Gish, 1996–2002, http://blast.wustl.edu] and gradually enlarged and assembled with CAP3 (29) by successive rounds of EST aggregation, to generate the ‘MtC’ and ‘MtD’ cluster subsets (see Results). Thresholds chosen for the aggregation steps were 95% identity over more than 30 bp and a length of unmatched adjacent region shorter than 25 nt. We also added the constraint that in the case of cDNAs with both available and non-overlapping 3' and 5' ends, either both ends or none must be integrated into the same cluster, thereby further limiting the risk of aggregation of potentially chimeric clone sequences. Therefore, assembly by CAP3 of cluster EST members could generate in some cases several distinct and non-overlapping consensus sequences. The quality of our clustering/assembly procedure has been assessed by comparing Medicago gene or cDNA sequences available in public databases, with the consensus sequence of their putative MtC cluster homologs (see our web site http://medicago.toulouse.inra.fr/EST).
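The aggregation criteria can be expressed as a simple predicate. The sketch below is illustrative only: the `Hit` record and function names are hypothetical, the 95% threshold is assumed to be inclusive, and the actual procedure operates on WU-BLASTN output and CAP3 assemblies rather than on pre-digested values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hit:
    """Hypothetical summary of one EST-versus-cluster alignment."""
    identity: float          # percent identity over the aligned region
    aligned_length: int      # bp
    unmatched_adjacent: int  # nt of adjacent sequence that fails to align


def passes_thresholds(hit: Hit) -> bool:
    """95% identity over more than 30 bp, unmatched adjacent region < 25 nt."""
    return (
        hit.identity >= 95.0
        and hit.aligned_length > 30
        and hit.unmatched_adjacent < 25
    )


def clone_may_join(hit_5p: Optional[Hit], hit_3p: Optional[Hit]) -> bool:
    """Both-or-none rule for a clone with non-overlapping 5' and 3' ESTs.

    The clone is added to a cluster only if both ends match that cluster.
    """
    if hit_5p is None or hit_3p is None:
        return False
    return passes_thresholds(hit_5p) and passes_thresholds(hit_3p)
```

Requiring both ends to agree is what limits the incorporation of chimeric clones, whose two ends would typically match different clusters.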

Coding sequences were predicted using FrameD (J.Gouzy, T.Schiex, E.-P.Journet, P.Thébault, F.Servant, V.Carreau, D.Kahn and P.Gamas, submitted for publication). This program was originally developed by Schiex et al. (30) for the analysis of prokaryotic genomes and allows for the detection of potential frame-shifts. It has been extended to handle incomplete sequences and trained with a set of probable M.truncatula coding sequences (showing BLASTX hits with more than 50% identity over 300 amino acids). EST clusters and associated predicted proteins were manually inspected and annotated using a specifically developed annotation interface, and the annotation was transferred to individual ESTs.

Our database differs from the M.truncatula Gene Index at The Institute for Genomic Research (http://www.tigr.org/tdb/tgi/mtgi/) in several significant respects, both in the way the two databases were built and in the type of information accessible on-line. First, our clustering process used the same assembly program (CAP3), but ESTs were grouped on the basis of more than one sequence relationship (the link between 5' and 3' ends was used to group non-overlapping contigs). In addition, M.truncatula gene or cDNA sequences available in public databases were not incorporated in this process but were used as a form of quality control at the end of the clustering phase, and ESTs corresponding to suspected chimeric cDNAs were eliminated. Secondly, all the clusters in our database were annotated manually. Thirdly, our web site provides on-line a number of specific pre-computed analyses and tools, such as predicted protein sequences, tools to compare cluster sequences and in silico expression profiles, and tools to search DNA or peptide patterns.

In silico analyses of gene expression
We developed a specific statistical model in order to evaluate whether differences in EST counts for a given gene between libraries reflect actual differences in gene expression level. Note that when two ESTs were available for the same cDNA clone (5' and 3' ends), these were counted as one.
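As an illustration of this counting rule, the sketch below counts distinct clones rather than ESTs for each cluster and library. The record layout and the identifiers in the usage example are hypothetical and not taken from the MtB database.

```python
from collections import defaultdict


def clone_counts(est_records):
    """Count cDNA clones per (cluster, library), one count per clone.

    est_records: iterable of (cluster_id, library, clone_id, end) tuples,
    where end is "5'" or "3'". Two ESTs from the same clone count once.
    """
    clones = defaultdict(set)
    for cluster_id, library, clone_id, _end in est_records:
        clones[(cluster_id, library)].add(clone_id)
    return {key: len(ids) for key, ids in clones.items()}


records = [
    ("cluster_A", "MtBB", "clone_001", "5'"),
    ("cluster_A", "MtBB", "clone_001", "3'"),  # same clone, counted once
    ("cluster_A", "MtBB", "clone_002", "5'"),
    ("cluster_A", "MtBC", "clone_003", "3'"),
]
counts = clone_counts(records)
```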

Following Stekel et al. (31), we calculate the likelihood ratio between two hypotheses. In the null hypothesis $H_0$, the frequency $f$ of ESTs corresponding to a given gene is constant, whereas in the alternative $H_1$ hypothesis the frequency $f_i$ differs in the various libraries, i.e. the gene is differentially expressed. Let $x_i$ be the number of ESTs corresponding to a given gene in library $i$ containing $N_i$ clones. Under the $H_0$ hypothesis, the distribution $X_i$ of $x_i$ is well approximated by a single Poisson distribution with mean $f N_i$:

$$P(X_i = k) = \frac{e^{-f N_i} (f N_i)^k}{k!} \tag{1}$$

Note however that $f$ is unknown. Let $N = \sum_i N_i$ be the total number of clones in the libraries and $x = \sum_i x_i$ the total number of clones corresponding to the gene of interest. Following Audic and Claverie (32), assuming a flat non-informative prior on $f$, we note that the likelihood $p(f \mid x)$ is proportional to $P(x \mid f) = e^{-fN}(fN)^x / x!$ and therefore:

$$p(f \mid x) = \frac{e^{-fN}(fN)^x}{\int_0^\infty e^{-\varphi N}(\varphi N)^x \, d\varphi} = \frac{e^{-fN} f^x N^{x+1}}{x!} \tag{2}$$

from which we derive:

$$\begin{aligned}
P(X_i = k \mid x) &= \int_0^\infty \frac{e^{-f N_i}(f N_i)^k}{k!} \, p(f \mid x) \, df \\
&= \left(\frac{N_i}{N}\right)^k \frac{1}{k! \, x!} \int_0^\infty e^{-\mu (1 + N_i/N)} \mu^{k+x} \, d\mu \qquad (\mu = fN) \\
&= \left(\frac{N_i}{N}\right)^k \frac{(k+x)!}{k! \, x! \, (1 + N_i/N)^{k+x+1}}
\end{aligned} \tag{3}$$

This relationship corresponds to equation 2 of Audic and Claverie (32). The likelihood $L_0$ of the observed data $\{x_i\}$ under the $H_0$ hypothesis follows:

$$L_0 = \prod_i P(X_i = x_i \mid x) = \prod_i \left(\frac{N_i}{N}\right)^{x_i} \frac{(x_i + x)!}{x_i! \, x! \, (1 + N_i/N)^{x_i + x + 1}} \tag{4}$$

In the alternative $H_1$ hypothesis, $X_i$ follows a Poisson distribution with mean $f_i N_i$:

$$P(X_i = k) = \frac{e^{-f_i N_i} (f_i N_i)^k}{k!} \tag{5}$$

with $f_i$ unknown and likelihood:

$$p(f_i \mid x_i) = \frac{e^{-f_i N_i} f_i^{x_i} N_i^{x_i + 1}}{x_i!} \tag{6}$$

from which we obtain:

$$\begin{aligned}
P(X_i = k \mid x_i) &= \int_0^\infty \frac{e^{-f_i N_i}(f_i N_i)^k}{k!} \, p(f_i \mid x_i) \, df_i \\
&= \frac{1}{k! \, x_i!} \int_0^\infty e^{-2\mu} \mu^{k + x_i} \, d\mu \qquad (\mu = f_i N_i) \\
&= \frac{(k + x_i)!}{k! \, x_i! \, 2^{k + x_i + 1}}
\end{aligned} \tag{7}$$

which is identical to equation 1 from Audic and Claverie (32). The likelihood $L_1$ of the observed data under the $H_1$ hypothesis follows:

$$L_1 = \prod_i P(X_i = x_i \mid x_i) = \prod_i \frac{(2 x_i)!}{(x_i!)^2 \, 2^{2 x_i + 1}} \tag{8}$$

The likelihood ratio $R = L_1 / L_0$ is an indicator of whether a gene is differentially expressed. Note that these expressions differ significantly from those derived by Stekel et al. (31). These authors used the empirical frequencies $x_i / N_i$ and $x / N$ as estimates for $f_i$ and $f$, respectively, whereas our likelihood estimates are marginalized over $f_i$ and $f$, which makes them more accurate for small $x_i$. This is important practically because low EST counts are a common occurrence and because overestimates multiply when several EST libraries are considered (equation 8). In order to evaluate the significance of the levels of $R$, numerical simulation of the behavior of $R$ under the $H_0$ hypothesis was performed. For each pair of cDNA libraries, we generated $10^5$ pairs of random data sets of the same cardinalities by simulating sampling from a multinomial distribution where each gene has a relative frequency equal to its empirical frequency in the pair of cDNA libraries pooled together (which corresponds to the null hypothesis). In each simulated data set pair, we then measured the number of genes achieving each level of $R$, and estimated the expected number of genes achieving such a level by computing mean values across the $10^5$ repeats.
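The likelihood ratio can be evaluated numerically from equations 4 and 8. The sketch below is a minimal illustration rather than the authors' implementation; the function names are hypothetical, and logarithms of factorials (via `math.lgamma`) are used to avoid overflow with large counts.

```python
from math import exp, lgamma, log


def log_factorial(n):
    """log(n!) computed through the log-gamma function."""
    return lgamma(n + 1)


def log_l0(counts, sizes):
    """log L0 (equation 4): shared frequency f across all libraries."""
    x, n = sum(counts), sum(sizes)
    total = 0.0
    for x_i, n_i in zip(counts, sizes):
        ratio = n_i / n
        total += (
            x_i * log(ratio)
            + log_factorial(x_i + x)
            - log_factorial(x_i)
            - log_factorial(x)
            - (x_i + x + 1) * log(1 + ratio)
        )
    return total


def log_l1(counts):
    """log L1 (equation 8): one independent frequency f_i per library."""
    return sum(
        log_factorial(2 * x_i) - 2 * log_factorial(x_i) - (2 * x_i + 1) * log(2)
        for x_i in counts
    )


def likelihood_ratio(counts, sizes):
    """R = L1 / L0 for one gene; counts[i] clones out of sizes[i] in library i.

    exp() can overflow for extreme count differences; compare log values
    directly in that case.
    """
    return exp(log_l1(counts) - log_l0(counts, sizes))
```

With two libraries of equal size, a gene counted 10 times in one library and never in the other yields a much larger R than a gene counted 5 times in each, as expected from the text.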


    RESULTS AND DISCUSSION
 
Generation of a large data set of high-quality ESTs from three M.truncatula root cDNA libraries
Three cDNA libraries were used for EST sequencing. The first one, designated MtBA, was generated using regions of the roots competent for nodulation, i.e. 1–3-cm-long root tips harvested from nitrogen-starved plants. As these plants were not inoculated with a microsymbiont, MtBA can be considered as a control root library. The second library, MtBB, was made from young (4-day-old) nodules and has previously been screened for clones representing genes up-regulated during nodulation (33). MtBC, the third library, was obtained from roots harvested 3 weeks after inoculation with the AM fungus G.intraradices. At least 5000 clones were randomly selected from each library and sequenced from 5' and 3' ends.

After sequence editing (see Materials and Methods), an automatic search was carried out for potential chimeric clones, i.e. clones containing more than one cDNA insert, which can lead to serious problems for EST clustering and the generation of cDNA arrays at a later stage. We knew from past experience that different types of chimers could be found (Materials and Methods) and automatic searches were conducted based on the corresponding signatures. In this way, 297 cDNAs (633 ESTs) were eliminated. Finally, manual inspection of BLASTX results during the annotation process (see below) revealed 1023 cDNA clones (1559 ESTs), in which each end showed convincing homologies to totally unrelated proteins (generally including a highly expressed protein). This often correlated with dubious cluster organization (for an example see Fig. SB). Although chimer formation was generally not strictly demonstrated, it was considered safer to withdraw these clones from the final clustering (the full list of putative chimeric cDNA clones is available on our web site). Table 1 shows the number and distribution of MtB cDNA clones finally used.

EST clustering
In order to interpret EST data, it is useful to cluster all ESTs representing a unique transcript and to assemble them, thereby generating a consensus sequence. In many cases such consensus sequences are substantially longer than individual ESTs, allowing nearly full length cDNAs to be reconstituted. A strategy was elaborated for EST clustering that takes advantage of the availability of both 5'- and 3'-end sequences for the majority of MtB cDNAs. Highly reliable core clusters were first built from clones with both ends homologous (thus making chimer incorporation unlikely), and these clusters were gradually enlarged by successive rounds of EST incorporation (J.Gouzy, T.Schiex, E.-P.Journet, P.Thébault, F.Servant, V.Carreau, D.Kahn and P.Gamas, unpublished results). This method makes it easier to distinguish members of multigene families which may share a closely related 5' end (coding region), in contrast to their 3' untranslated regions. However, in some cases (~6% of the total number of ESTs used), ESTs corresponding to highly conserved coding regions could not be unambiguously classified into one cluster or another. In such cases, the clustering program used allowed more than one solution to be defined for the rebuilding of the biological transcripts. For example, the cluster family MtC00218 composed of two clusters MtC00218.1 and MtC00218.2 likely represents one case of alternative splicing. Another characteristic of this clustering procedure is that 5' and 3' segments of the same transcript are grouped into the same cluster, even though they do not necessarily overlap. As a consequence, 1031 MtC clusters (see below) contained more than one consensus sequence. This procedure generates fewer clusters and leads to a better view of gene representation in the case of long cDNA molecules, which is particularly important when estimating the level of gene expression from EST counts.

141 390 M.truncatula ESTs available as of September 2001 were used for clustering. Thus, 6359 clusters, designated ‘MtC’ clusters, were generated from the 21 473 MtB ESTs and 52 498 ESTs from other libraries that were recruited in the process. MtC clusters were subsequently manually annotated (see below). 24 645 additional clusters (‘MtD’ subset) were produced from the remaining 67 419 ESTs but were not annotated. The level of redundancy among the MtB cDNA clones appears to be moderate (12 362 / 6359 = 1.9), probably reflecting the diversity of cell types and situations represented in the root/symbiotic tissue used for the three MtB libraries. Not surprisingly, MtC cluster size varied substantially from singletons representing one cDNA (350, 419 and 845 found, respectively, in MtBA, MtBB and MtBC) to large clusters containing several hundreds of ESTs. Interestingly, despite the numerous M.truncatula libraries studied and the large number of M.truncatula ESTs already deposited, 1790 of the MtC clusters (28% of the total) remained specific to one or several of the three MtB libraries. This result emphasizes the complementarity of the various M.truncatula EST sequencing programs. However, it must be borne in mind that an unknown proportion of the 913 clusters specific to the MtBC mycorrhizal library may correspond to AM fungal cDNAs.

Cluster annotation and derived analyses
Merely indicating a list of the best hit(s) found by automated BLASTX searches cannot be considered a satisfactory annotation procedure since it suffers from serious limitations, especially in terms of error propagation (discussed in 34). In order to reach a high quality of annotation, we decided to systematically annotate EST clusters using a semi-automated approach, in which a functional annotation and a confidence index are assigned after human examination of the results of various automated analyses: BLASTX searches against protein databases [SWISS-PROT, TrEMBL (35) and ProDom (36)], coding sequence prediction using FrameD (30), search for functional and structural protein sites using various programs [TMpred (37), TopPred (38) and Interproscan (39)]. We have developed a graphical interface for our database (designated MENS for Medicago EST Navigation System) that displays all the information obtained from these analyses and provides search tools (J.Gouzy, T.Schiex, E.-P.Journet, P.Thébault, F.Servant, V.Carreau, D.Kahn and P.Gamas, submitted for publication: http://medicago.toulouse.inra.fr/EST).

In the process of functional annotation of the putative encoded proteins, clusters were classified into a limited set of 16 broad functional categories, similar to those used in previous M.truncatula EST analyses (16,17). Figure 1A shows the distribution of the 6359 annotated MtC clusters reflecting the number of predicted genes into these categories. The most abundant categories correspond to primary metabolism and protein synthesis/processing, and above all (approximately one-third of the total) to sequences showing homology to hypothetical proteins with unknown function. No S.meliloti sequence was detected, but contaminating Lambda phage sequences are represented (four clusters), as well as clones corresponding to 18S and 26S ribosomal RNA (six clusters). 836 clusters show no homology in the SP/TrEMBL protein database.




Figure 1. (A) Distribution of the 6359 MtC clusters into functional classes. The 16 broad categories that were used for classification during manual annotation are indicated, as well as the number of corresponding clusters. Only one class was assigned to each cluster. (B) Functional classes of candidate up-regulated genes in the nitrogen-fixing and mycorrhizal root symbiotic programs (R >= 4). The distribution into functional classes is shown for MtC clusters found as up-regulated (R >= 4) in young nodules (library MtBB, 4380 cDNAs, black bars), mature nodules (pooled libraries GVN, GVSN, NF-NR, 12 409 cDNAs in total, red bars), or mycorrhiza (library MtBC, 4878 cDNAs, blue bars) by in silico comparison with roots (pooled libraries MtBA, KV0, RP-, MHRP-, Mt04, RHE, 15 875 cDNAs in total). See http://medicago.toulouse.inra.fr/EST for description of libraries.

 
In order to initiate a search for potential legume-specific sequences, we made use of predicted protein sequences in order to avoid non-translated regions and spurious out-of-frame hits. A TBLASTN search was carried out against all available DNA sequences (ESTs, GSS, NT; database releases of 27/01/2002) from a set of legumes (Medicago sativa, Pisum sativum, Lotus japonicus, Glycine max, Trifolium purpureum, Cicer arietinum) and from non-legume plants (A.thaliana, Lycopersicon esculentum, Oryza sativa, Zea mays). 141 clusters gave hits with legume sequence(s) only, using as threshold values 60% conserved amino acids and an expect value cut-off of 10⁻⁶ (Table SA). Another criterion that can help validate genes as legume-specific is their expression pattern. It is worth mentioning in this respect that 18 out of these 141 genes display nodule-specific or -induced expression (as determined by in silico analyses, see below), including genes for three nodulins of unknown function (MtN14, MtN22, MtN26) (33) and two leginsulins.
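The selection of clusters with legume-only hits can be sketched as a set comparison. The hit record fields, function name and example data below are hypothetical; only the two thresholds (60% conserved amino acids, expect value of 1e-6) come from the text, and they are assumed here to be inclusive.

```python
def legume_only_clusters(hits_by_cluster, legumes,
                         min_conserved=60.0, max_evalue=1e-6):
    """Return clusters whose significant TBLASTN hits are all from legumes.

    hits_by_cluster: {cluster_id: [{"species", "pct_conserved", "evalue"}]}
    legumes: set of species names regarded as legumes
    """
    selected = []
    for cluster_id, hits in hits_by_cluster.items():
        significant = {
            hit["species"]
            for hit in hits
            if hit["pct_conserved"] >= min_conserved
            and hit["evalue"] <= max_evalue
        }
        # at least one significant hit, and none outside the legume set
        if significant and significant <= legumes:
            selected.append(cluster_id)
    return sorted(selected)


example_hits = {
    "cluster_A": [{"species": "Glycine max", "pct_conserved": 72.0, "evalue": 1e-20}],
    "cluster_B": [
        {"species": "Pisum sativum", "pct_conserved": 80.0, "evalue": 1e-30},
        {"species": "Oryza sativa", "pct_conserved": 65.0, "evalue": 1e-10},
    ],
    "cluster_C": [{"species": "Glycine max", "pct_conserved": 40.0, "evalue": 1e-3}],
}
result = legume_only_clusters(example_hits, {"Glycine max", "Pisum sativum"})
```

Clusters without any significant hit are excluded, since the absence of hits says nothing about legume specificity.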

A global view of the root symbiotic program, as defined by in silico analyses
Medicago truncatula is one of the few plant species for which a large number of ESTs have been generated by random sampling from many cDNA libraries, constructed using comparable plant material and cloning procedures, and representing a variety of tissues and physiological situations (see http://www.medicago.org). Therefore, it is reasonable to infer patterns of gene expression from the analysis of EST distribution, even though various biases may occur in the construction of cDNA libraries (see below). ESTs from 32 libraries available as of March 2002 were grouped into 12 broad classes and 22 sub-classes. Following this classification, we designed a graphical representation (histogram) showing normalized EST frequencies for a given cluster in the various libraries (see ‘electronic northern’, http://medicago.toulouse.inra.fr/EST; J.Gouzy, T.Schiex, E.-P.Journet, P.Thébault, F.Servant, V.Carreau, D.Kahn and P.Gamas, submitted for publication). We were able to verify that such electronic northerns gave a pattern consistent with results obtained by classical northern blots for 37 genes (see Table 2).
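The normalization underlying an electronic northern can be sketched as below. The scaling factor (counts per 10 000 clones), the function name and the example numbers are assumptions chosen for illustration; the text does not specify the normalization used on the web site.

```python
def electronic_northern(cluster_counts, library_sizes, per=10_000):
    """Normalized frequency of a cluster in each library.

    cluster_counts: {library: number of clones of this cluster}
    library_sizes:  {library: total number of clones sequenced}
    Returns counts scaled to `per` clones, so libraries of different
    sizes can be compared on one histogram.
    """
    return {
        library: per * cluster_counts.get(library, 0) / size
        for library, size in library_sizes.items()
    }


# Hypothetical cluster seen 12 times in a 4000-clone library and never
# in a 5000-clone library.
profile = electronic_northern({"MtBB": 12}, {"MtBA": 5000, "MtBB": 4000})
```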


Table 2. Clusters differentially represented in various tissues with confirmation by molecular data
 
A more systematic way to exploit EST distribution is to identify all clusters with ESTs showing differential abundance between libraries (‘in silico screening’). The main difficulty in such analyses is to distinguish between random fluctuations due to cDNA sampling and those actually reflecting differential gene expression. For this purpose, a specific statistical model was developed, adapted from Stekel et al. (31), and based on the calculation of a likelihood ratio R (see Materials and Methods). For example, an R value of 10 implies that the observed data are 10 times more likely under the hypothesis that the gene is differentially expressed than under the hypothesis that it is not. To evaluate the proportion of false positives in each pairwise library comparison, i.e. genes passing a given R value due to stochastic effects alone, random numerical simulation was performed (see Materials and Methods).
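
The general idea can be illustrated with a simplified Python sketch. It is not the adapted model used here (which is described in Materials and Methods); it assumes a plain Poisson sampling model in which the log-likelihood ratio for one cluster compares library-specific EST frequencies with a single frequency common to all libraries. The simulation then redistributes the ESTs of a cluster at random between two libraries to estimate how often a given threshold is passed by chance. All function names and numerical values are hypothetical.

```python
import math
import random


def log_likelihood_ratio(counts, sizes):
    """Poisson log-likelihood ratio for one cluster across libraries.

    counts: ESTs of the cluster in each library
    sizes:  total ESTs sequenced in each library
    Returns sum_i x_i * ln(x_i / (N_i * f)), where f is the pooled frequency.
    """
    total_count = sum(counts)
    if total_count == 0:
        return 0.0
    pooled_freq = total_count / sum(sizes)
    return sum(
        x * math.log(x / (n * pooled_freq))
        for x, n in zip(counts, sizes)
        if x > 0  # a zero count contributes nothing to the sum
    )


def simulated_false_positive_rate(total_count, sizes, threshold,
                                  trials=2000, seed=0):
    """Fraction of random two-library splits whose ratio passes `threshold`.

    Under the null hypothesis, each EST of the cluster falls in the first
    library with probability N_1 / (N_1 + N_2).
    """
    rng = random.Random(seed)
    p_first = sizes[0] / (sizes[0] + sizes[1])
    passed = 0
    for _ in range(trials):
        x_first = sum(rng.random() < p_first for _ in range(total_count))
        split = [x_first, total_count - x_first]
        if log_likelihood_ratio(split, sizes) >= threshold:
            passed += 1
    return passed / trials


# Hypothetical example: 20 ESTs of a cluster, all found in one of two
# equally sized libraries; a ratio of 10 corresponds to ln(10) on a log scale.
sizes = [1000, 1000]
print(log_likelihood_ratio([20, 0], sizes))
print(simulated_false_positive_rate(20, sizes, threshold=math.log(10)))
```

Small clusters can pass a threshold by sampling noise alone, which is why the simulated false-positive rate has to be considered alongside the ratio itself.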

Systematic pairwise comparisons were conducted between libraries (or groups of libraries). Table 3 shows examples for root libraries. The proportion of true positives (% credibility) estimated from the simulations varies between 67 and 86% at R >= 4 and 86 and 95% at R >= 10. It can be noted that the MtBA and MtKV0 libraries obtained in different laboratories, which are supposed to represent similar physiological conditions (nitrogen-starved, non-inoculated roots), show 57 and 26 differentially represented clusters at R >= 4 and R >= 10, respectively, versus 15 and 2 false positives expected by numerical simulation. Hence, actual differences in biological materials and cDNA cloning efficiencies (see Conclusion) probably explain the majority of the differential gene expression detected between MtBA and MtKV0 libraries. In contrast, twice as many genes are differentially represented between the MtBB young nodule library and the MtBA non-symbiotic root library. Because the R value depends upon the size of the cDNA populations, it can be useful to group libraries, provided they are homogeneous enough, in order to increase the sensitivity and robustness of the results. Thus, when comparing MtBB to the six pooled non-inoculated root libraries, 92 genes are predicted as up-regulated in MtBB at R >= 10 and 260 at R >= 4 (see our web site for description of the libraries). These clusters represent candidate genes for the early root nodule symbiotic program.
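
As a worked example, the credibility of the MtBA versus MtKV0 comparison can be recomputed from the numbers quoted above. The sketch below assumes that credibility is the share of observed clusters left after subtracting the false positives expected from simulation; this formula is an assumption made for illustration, as the exact definition is given in Materials and Methods rather than in this section.

```python
def credibility(observed, expected_false_positives):
    """Estimated percentage of true positives among the observed clusters.

    Assumed definition: 100 * (observed - expected false positives) / observed.
    """
    if observed <= 0:
        raise ValueError("observed must be a positive number of clusters")
    true_positives = max(observed - expected_false_positives, 0)
    return 100.0 * true_positives / observed


# MtBA versus MtKV0, numbers quoted in the text.
print(round(credibility(57, 15), 1))  # R >= 4
print(round(credibility(26, 2), 1))   # R >= 10
```

Under this assumed definition the two values are about 74% and 92%, which fall within the ranges reported for R >= 4 and R >= 10.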


Table 3. Numbers of differentially regulated genes as predicted by pairwise comparisons of root cDNA libraries
 
The distribution of the potential members of the early nodulation symbiotic program among functional classes is shown in Figure 1B, and those with the 50 best R scores are listed in Table 4 (see Table SB for complete set). Most of these sequences fall into only a few broad categories. First, it can be noted that the major functional class corresponds to M.truncatula genes involved in protein synthesis, with an extensive set of ribosomal proteins (r-proteins; 53 different genes), translation initiation and elongation factors. A series of genes involved in protein sorting, processing and degradation is also predicted to be induced in young nodules, which is consistent with a vast increase in protein metabolism and probably reflects the high rate of cell division and cell growth in these developing organs. The importance of r-proteins in cell division, growth and development has been documented in plants and animals through studies of various mutants affected in r-proteins and of r-protein transcript accumulation (see 40 and references therein).


Table 4. Genes predicted to be up-regulated in young nodules (library MtBB) versus roots (50 highest R scores)
 
Genes involved in carbon sink functions are mainly induced in late stages of nodulation, but some are already expressed in young nodules, as seen with electronic northerns coupled with in silico screening (for example beta-amylase, hexose transporter, invertase genes). The same is true for the leghemoglobin Mtlb1 gene. Other up-regulated genes code for cell wall enzymes and structural proteins, potentially associated with the infection process and with cell growth and division, and for enzymes involved in secondary metabolism (flavonoid pathway, terpenoid/sterol biosynthetic pathway). Interestingly, several stress response genes are also predicted to be induced, which could reflect an altered hormonal balance and/or stress induced by the organogenetic process or by the bacterial infection. As far as signaling and gene regulation are concerned, several protein kinases, zinc finger proteins, and DNA binding proteins are present among the predicted up-regulated M.truncatula genes, some of them being symbiosis-specific. Finally, 68 genes are of unknown function, including many previously described nodulin genes, a novel lectin gene, and 15 genes without any homology in protein databases (Table SB).

In order to gain a wider view of the nodulation symbiotic program, five EST libraries representing not only young nodules but also nitrogen-fixing and senescent nodules were examined. A total of 331 clusters were found to correspond to predicted up-regulated genes (R >= 4) when comparing a combination of the nodule libraries with the six non-inoculated root libraries; 80 were symbiosis-specific, and 32 corresponded to genes already known to be transcriptionally activated during nodulation (33,41–45) (data not shown).

Medicago truncatula genes predicted to be down-regulated in nodules appear far less frequent than up-regulated genes when comparing the six pooled libraries for non-inoculated roots to the nodule library MtBB (53 versus 260 at R >= 4, see Tables SB and SC). Closer examination reveals that a number of these genes belong to multigene families, and that in several cases alternative members of the same family are induced during nodulation. An example is provided by the PR10 gene family, where several members are apparently repressed during nodulation whereas the closely related MtN13 nodulin gene is up-regulated (46). This is also observed for several other gene families, such as those encoding cytochromes P450, lipoxygenases, glutamine synthetases, cell wall proteins or membrane intrinsic proteins (see our web site). Thus, it appears that there are nodulation-specific genes taking over a similar biochemical or cellular function.

It is more difficult to propose an interpretation for the set of genes predicted to be differentially regulated in mycorrhiza, in particular because we lack unambiguous internal controls [known as ‘endomycorrhizins’ (47); see (8) for a review]. When comparing MtBC to the six pooled libraries for non-inoculated roots, 51 clusters at R >= 10 and 209 at R >= 4 were predicted as up-regulated in MtBC (Table 5; see Table SD for complete set). These clusters represent candidate genes for the mycorrhizal symbiotic program. Combining MtBC with MtMHAM (the only other mycorrhiza-specific library, representing different stages of mycorrhiza development) gave fewer candidate up-regulated genes than MtBC alone (Table 3), which suggests that these two libraries are not very similar. Indeed there are many differences in cluster representation between the two libraries that cannot be explained by random cDNA sampling only (see comparison MtBC versus MtMHAM, Table 3), and could be due to differences in the fungal species inoculated, plant growth conditions or symbiotic stages of harvested material. This underlines the complementarity of cDNA libraries generated by different laboratories for gene discovery. However, it also emphasizes the necessity of molecular validation of the predicted differential regulation before speculating on the biological function of candidate genes identified in silico.


Table 5. Genes predicted to be up-regulated in mycorrhiza (library MtBC) versus roots (50 highest R scores)
 
The largest functional groups represented in the predicted up-regulated genes from the MtBC library (Fig. 1B) are related to protein synthesis and processing, primary metabolism, abiotic stimuli and development, or correspond to proteins with unknown functions. Up-regulation, at least transiently, of certain defence and cell rescue related genes (metallothionein, glutathione S-transferase, protease inhibitors) is not unexpected since this is a well reported phenomenon for mycorrhiza interactions in a number of plant species (8,48). The predicted induction of a glutathione S-transferase (R > 10) is also consistent with experimental observations on mycorrhizal potato roots and M.truncatula roots (49,50; L.Brechenmacher, S.Weidmann, D.van Tuinen, O.Chatagnier, S.Gianinazzi, P.Franken and V.Gianinazzi-Pearson, submitted for publication). It is noteworthy that the differential representation (R > 4, Table 2) of a MtBC-specific cluster, corresponding to the recently characterized phosphate transporter gene MtPT4, correlates with the specific expression of MtPT4 in AM roots at the arbuscule interface (51; see also 52 for a similar AM-specific phosphate transporter in potato). Two other phosphate transporter genes appear to be down-regulated (albeit at R ~ 2.5), consistent with experimental observations by Chiou et al. (53). The predicted up-regulation in the MtBC library of a gene encoding a plasma membrane intrinsic protein (PIP1 aquaporin) agrees with reported transcriptional activation of aquaporin genes in the mycorrhizal symbiosis (54,55). Finally, various cell wall protein-encoding genes appear to be differentially regulated in mycorrhiza (up- or down-regulation; see Tables SD and SE). This is consistent with previous molecular observations (13,55,56; L.Brechenmacher, S.Weidmann, D.van Tuinen, O.Chatagnier, S.Gianinazzi, P.Franken and V.Gianinazzi-Pearson, submitted for publication), and can be linked to the cell wall modifications accompanying the formation of the new apoplastic compartment surrounding the arbuscule (7).

An interesting question is how many genes are shared between the symbiotic programs with bacterial and fungal microsymbionts. It appears that a few clusters (19 at R >= 4) are predicted to be up-regulated both in MtBC and in nodules versus roots (Table 6), but none corresponds to the nodulin genes already described to be induced in mycorrhiza (ENOD2, -11, -12, -40) (13; for a review see 8). This may be due to signal dilution, because the mycorrhiza biological material consists of whole root systems where arbuscular structures are ephemeral and diluted by surrounding root tissues, in contrast to root nodules, which are new organs highly enriched in nodulin transcripts. Seven of these co-induced clusters correspond to genes involved in protein synthesis and processing, and may indicate a general increase in root cortical cell activity accompanying arbuscule development despite the absence of cell division (7).


Table 6. Genes predicted to be up-regulated both in nodules and mycorrhiza versus roots (R >= 4)
 

    CONCLUSION
 
Our in silico analyses provide a first picture of the numerous M.truncatula genes potentially involved in the symbiotic programs, on the grounds of statistical significance of differential EST distribution. It is clear, however, that various factors can lead to misrepresentation of transcript abundance, in particular potential biases in cDNA cloning (due, for example, to RNA secondary structures or to cDNA sizing), efficiency of in vivo mass excision, or colony viability. Moreover, the sampling process generates random fluctuations by itself, which is a problem with small EST clusters. The EST clustering procedures may also misincorporate or separate some ESTs. We have tried to overcome some of these potential problems by 3'-end sequencing and by developing specific methods for chimera detection, EST clustering, and statistical analysis.

Considering these potential biases, experimental studies are now needed to validate the in silico predictions presented here. In this context, it is encouraging to note that predictions for 37 documented M.truncatula genes are consistent with the corresponding molecular data (Table 2). Beyond this limited gene sample, a large-scale comparison is necessary for a thorough evaluation of the validity of these in silico analyses. This is now in progress using macro/microarrays generated with ~5500 MtC cluster representatives, in the framework of the European Medicago program (collaboration with H. Küster, A. Becker and A. Pühler, Bielefeld, Germany). Moreover, gene expression analyses based on macro/microarrays should provide valuable information for some of the numerous genes represented by a single cDNA, whose expression profiles cannot be analyzed in silico for statistical reasons, provided that similar sensitivity problems are not encountered with arrays. A recent report on a transcript profiling study in L.japonicus illustrates both the potential and the limitations of cDNA arrays in identifying changes in gene expression accompanying nodulation (57). Note, however, that cross-hybridization on DNA arrays can be a problem for gene families, and in silico data should help to anticipate and solve such issues.

This study, like others, shows that it is fruitful to characterize a number of different cDNA libraries, and thus the coordinated international effort on M.truncatula (15) is providing the community with valuable data for gene identification, protein prediction and DNA array production. One important contribution is the construction of a M.truncatula Gene Index (MtGI) from all available M.truncatula ESTs by The Institute for Genomic Research (58,59). These results are presented in a publicly accessible and searchable database in the form of tentative consensus sequences (TCs) and ‘electronic northerns’ (http://www.tigr.org/tdb/tgi/mtgi/). This database has been exploited in a recent study leading to the identification of numerous nodule-specific M.truncatula transcripts (60). Several major differences can be underlined in the way the MtGI and our database were built and in the information accessible on-line (see Materials and Methods). However, further work is required to extensively compare the two databases at the level of reconstructed transcript sequences.

Finally, even though relatively large numbers of ESTs are now available for inoculated roots, nodules and mycorrhiza, several genes previously identified in these tissues by other methods are still missing, and it can be anticipated that many weakly expressed genes, including key regulators, are not represented. Moreover, it can be noted that approximately one-third of the ~600 M.truncatula/sativa genes and cDNAs found in the databases are not represented in the whole M.truncatula EST set (see our web site). Therefore, developing complementary strategies such as those based upon cDNA-AFLP or SSH should help to uncover novel genes.


    SUPPLEMENTARY MATERIAL
 
Supplementary Material is available at NAR Online.


    ACKNOWLEDGEMENTS
 
The authors thank Julie Cullimore, David Barker and Thomas Faraut for critical reading of the manuscript and other colleagues for helpful discussions. Financial support was provided by INRA/CNRS for the Medicago truncatula Genome Project (1998–2000). V.C. was funded by the Toulouse-Genopole program and F.M. by the EU contract ‘Integrated structural, functional and comparative genomics of the model legume Medicago truncatula’, no. QLG2-CT-2000-00676 (2001–2003).


    REFERENCES
 

  1. Gianinazzi-Pearson,V. and Dénarié,J. (1997) Red carpet genetic programmes for root endosymbioses. Trends Plant Sci., 2, 371–372.

  2. Geurts,R. and Bisseling,T. (2002) Rhizobium Nod factor perception and signalling. Plant Cell, 14, S239–S249.

  3. Marsh,J.F. and Schultze,M. (2001) Analysis of arbuscular mycorrhizas using symbiosis-defective plant mutants. New Phytol., 150, 525–532.

  4. Parniske,M. (2000) Intracellular accommodation of microbes by plants: a common developmental program for symbiosis and disease? Curr. Opin. Plant Biol., 3, 320–328.

  5. Stougaard,J. (2000) Regulators and regulation of legume root nodule development. Plant Physiol., 124, 531–540.

  6. Smith,S.E. and Read,D.J. (1997) Mycorrhizal Symbiosis, 2nd Edn. Academic Press, London.

  7. Gianinazzi-Pearson,V. (1996) Plant cell responses to arbuscular mycorrhizal fungi: getting to the roots of the symbiosis. Plant Cell, 8, 1871–1883.

  8. Harrison,M.J. (1999) Molecular and cellular aspects of arbuscular mycorrhizal symbiosis. Annu. Rev. Plant Physiol. Plant Mol. Biol., 50, 361–389.

  9. Brewin,N.J. (1998) Tissue and cell invasion by Rhizobium. In Spaink,H.P., Kondorosi,A. and Hooykaas,P.J.J. (eds), The Rhizobiaceae. Kluwer Academic Publishers, Dordrecht, The Netherlands, pp. 417–429.

  10. Stougaard,J. (2001) Genetics and genomics of root symbiosis. Curr. Opin. Plant Biol., 4, 328–335.

  11. Stracke,S., Kistner,C., Yoshida,S., Mulder,L., Sato,S., Kaneko,T., Tabata,S., Sandal,N., Stougaard,J., Szczyglowski,K. et al. (2002) A plant receptor-like kinase required for both bacterial and fungal symbiosis. Nature, 417, 959–962.

  12. Endre,G., Kereszt,A., Kevei,Z., Mihacea,S., Kalo,P. and Kiss,G.B. (2002) A receptor kinase gene regulating symbiotic nodule development. Nature, 417, 962–966.

  13. Journet,E.P., El-Gachtouli,N., Vernoud,V., de Billy,F., Pichon,M., Dedieu,A., Morandi,D., Barker,D.G. and Gianinazzi-Pearson,V. (2001) Medicago truncatula ENOD11: a novel RPRP-encoding early nodulin gene expressed during mycorrhization in arbuscular-containing cells. Mol. Plant Microbe Interact., 14, 737–748.

  14. Cook,D.R. (1999) Medicago truncatula—a model in the making. Curr. Opin. Plant Biol., 2, 301–304.

  15. Frugoli,J. and Harris,J. (2001) Medicago truncatula on the move! Plant Cell, 13, 458–463.

  16. Covitz,P.A., Smith,L.S. and Long,S.R. (1998) Expressed sequence tags from a root-hair-enriched Medicago truncatula cDNA library. Plant Physiol., 117, 1325–1332.

  17. Gyorgyey,J., Vaubert,D., Jimenez-Zurdo,J.I., Charon,C., Troussard,L., Kondorosi,A. and Kondorosi,E. (2000) Analysis of Medicago truncatula nodule expressed sequence tags. Mol. Plant Microbe Interact., 13, 62–71.

  18. Bell,C.J., Dixon,R.A., Farmer,A.D., Flores,R., Inman,J., Gonzales,R.A., Harrison,M.J., Paiva,N.L., Scott,A.D., Weller,J.W. et al. (2001) The Medicago Genome Initiative: a model legume database. Nucleic Acids Res., 29, 114–117.

  19. Szczyglowski,K., Hamburger,D., Kapranov,P. and de Bruijn,F.J. (1997) Construction of a Lotus japonicus late nodulin expressed sequence tag library and identification of novel nodule-specific genes. Plant Physiol., 114, 1335–1346.

  20. Asamizu,E., Watanabe,M. and Tabata,S. (2000) Large scale structural analysis of cDNAs in the model legume, Lotus japonicus. J. Plant Res., 113, 451–455.

  21. Poulsen,C. and Podenphant,L. (2002) Expressed sequence tags from roots and nodule primordia of Lotus japonicus infected with Mesorhizobium loti. Mol. Plant Microbe Interact., 15, 376–379.

  22. Thoquet,P., Gherardi,M., Journet,E.P., Kereszt,A., Ane,J.M., Prosperi,J.M. and Huguet,T. (2002) The molecular genetic linkage map of the model legume Medicago truncatula: an essential tool for comparative legume genomics and the isolation of agronomically important genes. BMC Plant Biol., 2, 1.

  23. Sagan,M., Morandi,D., Tarenghi,E. and Duc,G. (1995) Selection of nodulation and mycorrhizal mutants in the model plant Medicago truncatula (Gaertn.) after γ-ray mutagenesis. Plant Sci., 111, 63–71.

  24. Hewitt,E.J. (1966) Sand and water culture methods used in the study of plant nutrition. In Technical Communication, 22nd Edn. Commonwealth Agricultural Bureau, London, pp. 430–434.

  25. Trouvelot,A., Kough,J.L. and Gianinazzi-Pearson,V. (1986) Mesure du taux de mycorhization VA d’un système radiculaire. Recherche de méthodes d’estimation ayant une signification fonctionnelle. In Gianinazzi-Pearson,V. and Gianinazzi,S. (eds), Mycorrhizae: Physiology and Genetics. INRA-Press, Paris, Vol. V, pp. 217–221.

  26. Jackson,A.D. and Larkins,B.A. (1976) Influence of ionic strength, pH and chelation of divalent metals on isolation of polyribosomes from tobacco leaves. Plant Physiol., 57, 5–10.

  27. Ewing,B., Hillier,L.D., Wendl,M.C. and Green,P. (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res., 8, 175–185.

  28. Ewing,B. and Green,P. (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res., 8, 186–194.

  29. Huang,X. and Madan,A. (1999) CAP3: a DNA sequence assembly program. Genome Res., 9, 868–877.

  30. Schiex,T., Thébault,P. and Kahn,D. (2000) Recherche des gènes et des erreurs de séquençage dans les génomes bactériens GC-riches (et autres...). In Gascuel,O. and Sagot,M.-F. (eds), JOBIM Conference Proceedings. ENSA and LIRMM Editor, Montpellier, France, pp. 321–328.

  31. Stekel,D.J., Git,Y. and Falciani,F. (2000) The comparison of gene expression from multiple cDNA libraries. Genome Res., 10, 2055–2061.

  32. Audic,S. and Claverie,J.M. (1997) The significance of digital gene expression profiles. Genome Res., 7, 986–995.

  33. Gamas,P., de Carvalho-Niebel,F., Lescure,N. and Cullimore,J. (1996) Use of a subtractive hybridization approach to identify new Medicago truncatula genes induced during root nodule development. Mol. Plant Microbe Interact., 9, 233–242.

  34. Rouze,P., Pavy,N. and Rombauts,S. (1999) Genome annotation: which tools do we have for it? Curr. Opin. Plant Biol., 2, 90–95.

  35. Bairoch,A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45–48.

  36. Corpet,F., Servant,F., Gouzy,J. and Kahn,D. (2000) ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons. Nucleic Acids Res., 28, 267–269.

  37. Hofmann,K. and Stoffel,W. (1993) TMBASE—a database of membrane spanning protein segments. Biol. Chem. Hoppe Seyler, 374, 166.

  38. Claros,M.G. and von Heijne,G. (1994) TopPred II: an improved software for membrane protein structure predictions. Comput. Appl. Biosci., 10, 685–686.

  39. Apweiler,R., Attwood,T.K., Bairoch,A., Bateman,A., Birney,E., Biswas,M., Bucher,P., Cerutti,L., Corpet,F., Croning,M.D. et al. (2001) The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res., 29, 37–40.

  40. Barakat,A., Szick-Miranda,K., Chang,I.F., Guyot,R., Blanc,G., Cooke,R., Delseny,M. and Bailey-Serres,J. (2001) The organization of cytoplasmic ribosomal protein genes in the Arabidopsis genome. Plant Physiol., 127, 398–415.

  41. Vernoud,V., Journet,E.-P. and Barker,D.G. (1999) MtENOD20, a Nod factor-inducible molecular marker for root cortical cell activation. Mol. Plant Microbe Interact., 12, 604–614.

  42. Crespi,M.D., Jurkevitch,E., Poiret,M., d’Aubenton-Carafa,Y., Petrovics,G., Kondorosi,E. and Kondorosi,A. (1994) enod40, a gene expressed during nodule organogenesis, codes for a non-translatable RNA involved in plant growth. EMBO J., 13, 5099–5112.

  43. Coba de la Pena,T., Frugier,F., McKhann,H.I., Bauer,P., Brown,S., Kondorosi,A. and Crespi,M. (1997) A carbonic anhydrase gene is induced in the nodule primordium and its cell-specific expression is controlled by the presence of Rhizobium during development. Plant J., 11, 407–420.

  44. Pichon,M., Journet,E.-P., Dedieu,A., de Billy,F., Truchet,G. and Barker,D.G. (1992) Rhizobium meliloti elicits transient expression of the early nodulin gene ENOD12 in the differentiating root epidermis of transgenic alfalfa. Plant Cell, 4, 1199–1211.

  45. de Carvalho-Niebel,F., Lescure,N., Cullimore,J. and Gamas,P. (1998) The Medicago truncatula MtAnn1 gene encoding an annexin is induced by Nod factors and during the symbiotic interaction with Rhizobium meliloti. Mol. Plant Microbe Interact., 11, 504–513.

  46. Gamas,P., de Billy,F. and Truchet,G. (1998) Symbiosis-specific expression of two Medicago truncatula nodulin genes, MtN1 and MtN13, encoding products homologous to plant defense proteins. Mol. Plant Microbe Interact., 11, 393–403.

  47. Samra,A., Dumas-Gaudot,E. and Gianinazzi,S. (1997) Detection of symbiosis-related polypeptides during the early stages of the establishment of arbuscular mycorrhiza between Glomus mosseae and Pisum sativum roots. New Phytol., 135, 711–722.

  48. Gianinazzi-Pearson,V., Dumas-Gaudot,E., Gollotte,A., Tahiri-Alaoui,A. and Gianinazzi,S. (1996) Cellular and molecular defence-related root responses to invasion by arbuscular mycorrhizal fungi. New Phytol., 133, 45–57.

  49. Strittmatter,G., Gheysen,G., Gianinazzi-Pearson,V., Hahn,K., Niebel,A., Rohde,W. and Tacke,E. (1996) Infections with various types of organisms stimulate transcription from a short promoter fragment of the potato gst1 gene. Mol. Plant Microbe Interact., 9, 68–73.

  50. Bestel-Corre,G., Dumas-Gaudot,E., Poinsot,V., Dieu,M., Dierick,J.F., van Tuinen,D., Remacle,J., Gianinazzi-Pearson,V. and Gianinazzi,S. (2002) Proteome analysis and identification of symbiosis-related proteins from Medicago truncatula Gaertn. by two-dimensional electrophoresis and mass spectroscopy. Electrophoresis, 23, 122–137.

  51. Harrison,M.J., Dewbre,G.R. and Liu,J. (2002) A phosphate transporter from Medicago truncatula involved in the acquisition of phosphate released by arbuscula