The TCP family of putative transcription factors, defined by the founding members
The genes
Two subfamilies are defined on the basis of the sequence of the TCP domain and of other conserved regions
The diversity of genes homologous to
To clarify the complexity of the TCP family and the evolution of the genes of the CYC subfamily, we first inventoried the coding sequences containing a TCP domain in the complete genome of
The search for coding sequences containing a TCP domain in the complete genome of
The sequences of the genes of the CYC and PCF subfamilies were obtained from GenBank
In total, 112 sequences were analysed. The nucleotide alignment was produced with the BioEdit software (version 4.8.6)
Twenty-four coding sequences containing a TCP domain were found in the complete genome of
An analysis by
Sequences lacking the R domain do not form a monophyletic group but fall into three groups, some of which also include sequences carrying the R domain. There therefore appears to be homoplasy in the presence/absence of this domain. The most parsimonious hypothesis would be that the domain arose once at the base of the CYC subfamily and was lost several times independently (at least three times according to the tree obtained).
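The loss-counting argument above can be illustrated with a minimal sketch. It assumes a single gain at the root of the tree passed in (the "arose once at the base" hypothesis) and counts the fewest independent losses needed to explain which tips lack the character. The nested-tuple tree encoding, the function names and the tip labels are all hypothetical and are not taken from the study's data.

```python
def _scan(node, present):
    """Return (subtree_has_character, losses_inside_subtree)."""
    if isinstance(node, str):                 # a tip, identified by its label
        return (node in present, 0)
    results = [_scan(child, present) for child in node]
    if not any(has for has, _ in results):
        # The whole subtree lacks the character; the parent counts it as one loss.
        return (False, 0)
    losses = sum(n for _, n in results)
    # Each child subtree that entirely lacks the character needs one loss
    # on the branch leading to it.
    losses += sum(1 for has, _ in results if not has)
    return (True, losses)


def min_losses(tree, present):
    """Minimum number of losses, assuming one gain at the root of `tree`."""
    has_any, losses = _scan(tree, present)
    return losses if has_any else 1           # all tips lack it: one loss at the root


# Hypothetical toy tree: tips "r*" carry the domain, tips "x*" lack it.
toy_tree = ((("r1", "x1"), "r2"), (("x2", "x3"), ("r3", ("r4", "x4"))))
with_domain = {"r1", "r2", "r3", "r4"}
print(min_losses(toy_tree, with_domain))      # → 3
```

In this toy tree the tips lacking the domain fall into three separate groups, so three losses are required under the single-gain assumption. A standard unordered (Fitch-type) parsimony count could yield a different number, because it also allows repeated gains.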
Furthermore, all the sequences from Veronicaceae and Gesneriaceae form a monophyletic group (
This independent evolution in different taxa raises the question of whether the function and role of the CYC subfamily genes are conserved. This function appears to be linked to growth and development; the symmetry or asymmetry of the expression domain of these genes could determine whether symmetric or asymmetric morphological structures arise. Broader taxonomic sampling and more numerous expression studies are essential to better understand the evolution of the CYC subfamily and to examine its role in the development of symmetric or asymmetric morphological structures.
The TCP family of putative transcription factors is characterised by an original conserved basic-Helix-Loop-Helix (bHLH) domain that was initially found in Teosinte branched1 (TB1) of
Two subfamilies were defined based on the characteristics of the TCP domain and other conserved regions
Genes belonging to this family have been suggested to play a role in plant growth and development.
Within the CYC subfamily, the diversity of so-called
In the Lamiales
The availability of the full genomic sequence of
In a first step, translated sequences of Y16313 (
Eighty-eight sequences were retrieved from GenBank (see
In a first step, all
DNA alignments were constructed using ClustalW as implemented in BioEdit version 4.8.6
The choice of NJ can be criticised, because distance methods are not phylogenetic in principle and because NJ performs less well than other available methods such as maximum parsimony or maximum likelihood. In practice, however, NJ analyses efficiently recover the main sequence clusters from datasets such as ours (i.e. with many sequences and few informative positions), provided that significant nodes are distinguished from artefactual ones in the fully resolved tree produced. Branch length and bootstrap values are the two criteria we used to identify significant nodes in the NJ trees.
Maximum parsimony was also attempted, but with the complete dataset the high number of sequences relative to the number of informative positions produced a huge number of equally parsimonious topologies (a consequence of the low resolution), and even the heuristic analysis could not be completed. A common practice in such cases is to show a strict consensus tree computed from an arbitrary number of minimal trees after the search has been aborted; this neglects a number of alternative minimal topologies and thus underestimates the lack of resolution. We preferred to reduce the number of sequences on the basis of the initial NJ topology. A maximum parsimony analysis was thus performed on a subset of 21 sequences with a heuristic search strategy. Characters were unordered and equally weighted, and gaps were treated as missing data. Random addition of taxa was used with 100 replicates, followed by TBR (tree bisection-reconnection) branch swapping on the best trees. The option ‘MULTREES on’ was used. Bootstrap analyses (1000 replicates) were performed with a heuristic search and random addition of taxa generating 50 starting-tree replicates.
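The bootstrap principle used here can be sketched in a few lines: alignment columns are resampled with replacement, and a grouping is scored by the fraction of replicates in which it is recovered. The sketch below is a simplified illustration, not the procedure run for this study. It uses an uncorrected p-distance, treats gaps as missing data, and scores a "closest pair" criterion instead of rebuilding a full NJ or parsimony tree. All function names and the toy alignment are hypothetical.

```python
import math
import random

MISSING = set("-?N")  # assumption: these symbols mark missing data


def p_distance(a, b):
    """Proportion of differing sites, skipping sites missing in either sequence."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in MISSING or y in MISSING:
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else math.nan


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Return the pair of names with the smallest defined p-distance."""
    names = sorted(alignment)
    best, best_d = None, math.inf
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            d = p_distance(alignment[p], alignment[q])
            if not math.isnan(d) and d < best_d:
                best, best_d = (p, q), d
    return best


def pair_support(alignment, pair, replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    target = tuple(sorted(pair))
    hits = sum(
        closest_pair(bootstrap_alignment(alignment, rng)) == target
        for _ in range(replicates)
    )
    return hits / replicates


# Hypothetical toy alignment (not data from the study).
toy = {"A": "ACGTACGTAC", "B": "ACGTACGTAC", "C": "TGCATGCATG"}
print(pair_support(toy, ("A", "B"), replicates=100))
```

In a real analysis each replicate would be passed to the tree-building method itself (NJ or a heuristic parsimony search), and support would be read per node of the resulting trees.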
The TCP domain and several
Searches of GenBank for TCP family sequences yielded a dataset of 112 distinct members, including the 24 sequences from
The eight genes from the CYC subfamily without an R domain (including the six genes from
All sequences from Veronicaceae and Gesneriaceae constitute a monophyletic group (bootstrap value 81%), but within this clade, sequences from Veronicaceae and Gesneriaceae do not intermingle. Instead, four well-supported clades can be recognised:
The
The present study encompasses a broader gene sampling than any previously published analysis of TCP-domain-containing genes. The evolution of CYC subfamily genes appears to have occurred through multiple duplications (and possibly gene losses) that took place after the divergence of the various sampled taxonomic groups. The alternative, in which multiple ancestral sequences were homogenised through gene conversion and then diverged in the various taxa, seems to be ruled out by the mapping of TCP family genes on different chromosomes in
The question of a possible conservation of function and expression across the different gene subgroups and taxa is presently unanswered. Expression data are recorded for six genes (
At present, data on expression or function are very scarce from a taxonomic point of view, and sequence data are strongly biased since most data come from Lamiales and
We thank Professors H. Le Guyader and J. Deutsch for critical reading of the manuscript. This work was supported by a grant from the ‘Centre national de la recherche scientifique’ (France) allocated to C.D.
Neighbour-joining tree resulting from the analysis of the 834-nucleotide alignment, including 112 genes, arbitrarily rooted on the PCF subfamily of TCP sequences. Bootstrap values above 60% are indicated on the branches (a few values above 60% on terminal branches are omitted for clarity). The 24 TCP coding sequences found in the
Fig. 1. Tree obtained by the method of
Consensus maximum parsimony tree obtained from the sample of 21 sequences, using two PCF subfamily sequences as the outgroup. Bootstrap values above 60% are indicated.
Fig. 2. Consensus tree from the parsimony analysis of the sample of 21 sequences. The two PCF subfamily sequences were used as the outgroup. The values of
Name, characteristics and chromosomal assignment of TCP homologous coding sequences found in the
Table 1
Name, characteristics and chromosomal location of the coding sequences containing a TCP domain found in the complete genome of