Database-assisted promoter analysis

Reinhard Hehl and Edgar Wingender

The analysis of regulatory sequences is greatly facilitated by database-assisted bioinformatic approaches. The TRANSFAC database contains information on transcription factors and their origins, functional properties and sequence-specific binding activities. Software tools enable us to screen the database with a given DNA sequence for interacting transcription factors. If a regulatory function is already attributed to this sequence, then database-assisted identification of binding sites for proteins or protein classes, and subsequent experimental verification, might establish functionally relevant sites within this sequence. The binding transcription factors and interacting factors might already be present in the database.

Since the establishment of the central dogma of molecular biology, it has become obvious that the 'transformation' processes of information flow from DNA to RNA to protein are subject to a variety of control mechanisms. The key step that initiates this information flow, transcription, is mediated in eukaryotes by three different polymerases that transcribe distinct sets of genes. RNA polymerase II transcribes all genes that encode proteins. Many of these genes have a 'core promoter' comprising the initiator site (around +1, the position of the first transcribed nucleotide) and a TATA box at around -30. The transcription apparatus for RNA polymerase II is assembled at the core promoter and, in addition to the enzyme, this machinery includes several general transcription factors. TBP (TATA-binding protein), a subunit of transcription factor IID, binds to the TATA box of such promoters [1]. The presence of two types of TATA-binding proteins (TBPs) in plants suggests that, although transcription in eukaryotes is highly conserved, fundamental differences might exist [2,3].

The efficiency of transcription initiation complex formation is largely influenced by regulatory transcription factors that bind to short sequence elements and activate or repress genes in a manner that is specific for the tissue, the developmental stage or the stress conditions. These regulatory transcription factors interact with the general transcription factors either directly or via coactivators [4,5]. To satisfy their specific biological requirements, plants have evolved unique regulatory mechanisms involving completely new transcription factors that have yet to be found in animals. For example, the WRKY ('worky') family of transcription factors, with probably up to 100 members in Arabidopsis, regulates the expression of a variety of target genes involved in the response to pathogen infection and other stresses [6]. Another plant-specific family of transcription factors is the Dof proteins, whose actions are related to biological processes unique to plants [7]. Dof proteins might contribute to the expression of genes involved in photosynthesis, in the response to stress and hormone signals, and in carbon metabolism [8]. Other examples of plant-specific factors include the homeodomain-ZIP (HD-ZIP) and GT-box-binding factors [9].

If a particular binding site occurs within a promoter, the relevant transcription factor can bind to this site, provided that the factor is present in the nucleus and is in a competent state for binding. Such competent states can involve heterodimer formation or specific post-translational modifications. Transcription factors normally regulate more than one gene, so the presence of a particular transcription factor binding site within the regulatory regions of a set of genes can reflect their mode of regulation. Often, combinations of binding sites are responsible for regulated gene expression. The joint presence of two types of cis regulatory sequences is strong evidence for similar gene regulation, as recently shown for activated T cells [10]. In plants, combinatorial elements include those that control anthocyanin synthesis and abscisic-acid-induced gene expression [11].

Box 1. The history of the transcription factor database TRANSFAC

Data about transcriptional regulation have been collected since 1987, mainly for vertebrates, fungi and insects. The basis for transcriptional regulation is the recognition of short sequence elements by transcription factors, which translate regulatory genomic information into biological reality. Therefore, the basic structure of the first data collection comprised two tables: SITES and FACTORS [a]. This database was subsequently called TRANSFAC and, from it, a flat-file version was constructed [b]. It was later transformed into a relational database system [c,d]. At the same time, the flat-file version of the database was made available on the World-Wide Web (http://www.gene-regulation.de/)*. It was part of the concept to develop this and other databases not only for encyclopedic purposes, but also to make them operational; for instance, the database contents can be used for the characterization and identification of individual regulatory elements [b]. This led to the development of tools such as MatInspector (Ref. e) and PatSearch (Ref. f), which are now being replaced by Match and Patch. More recently, efforts have been made to increase the number of plant-specific data sets [g,h].

References
a Wingender, E. (1988) Compilation of transcription regulating proteins. Nucleic Acids Res. 16, 1879-1902
b Wingender, E. et al. (1991) Regulatory DNA sequences: predictability of their function. In Genome Analysis: From Sequence to Function (BioTech Forum: Advances in Molecular Genetics, Vol. 4) (Collins, J. and Driesel, A.J., eds), pp. 95-108, Hüthig, Heidelberg
c Knuppel, R. et al. (1994) TRANSFAC retrieval program: a network model database of eukaryotic transcription regulating sequences and proteins. J. Comput. Biol. 1, 191-198
d Wingender, E. et al. (1996) TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res. 24, 238-241
e Quandt, K. et al. (1995) MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res. 23, 4878-4884
f Wingender, E. et al. (1997) TRANSFAC database as a bridge between sequence data libraries and biological function. In Pacific Symposium on Biocomputing '97 (PSB'97) (Altman, R.B. et al., eds), pp. 477-485, World Scientific
g Wingender, E. et al. (2001) The TRANSFAC system on gene expression regulation. Nucleic Acids Res. 29, 281-283
h Wingender, E. et al. (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res. 28, 316-319

* Most of the database tools described in this contribution are freely available to users from non-profit organizations. Users from commercial organizations are kindly requested to license the professional database version(s).

Database-assisted plant promoter analysis

Currently, there are three databases that can be used to identify transcription factor binding sites or cis-acting sequences in plant promoters. PLACE (http://www.dna.affrc.go.jp/htdocs/PLACE/) is a database of 319 cis-acting regulatory DNA elements that were collected from previously published reports [12]. PlantCARE (http://sphinx.rug.ac.be:8080/PlantCARE/index.htm) is a referential database with 417 different names of plant transcription sites describing more than 159 plant promoters [13]. TRANSFAC (http://www.gene-regulation.de/), the first transcription factor database of eukaryotic cis-acting regulatory elements and trans-acting factors (Box 1), covers transcription factors from yeast to humans. Only TRANSFAC provides structural, expression and functional information about the transcription factors. The number of plant transcription factors in the database has risen from 266 to 489 within the past year, accompanied by a similar increase in the number of annotated sites [14,15]. To date, the database lists 179 sites in known plant promoters; this figure does not include artificial binding sites or binding sites derived from consensus sequences. There are 12 plant-specific MATRIX tables that represent 401 artificial binding sites. All plant species for which factors or sites are known are covered by the database.

Once a relevant functional region has been delineated in a promoter, the sequence can be screened with the Patch, SignalScan and Match programs for real and putative transcription factor binding sites in the TRANSFAC database (Fig. 1). Once binding sites are discovered, they can be mutated specifically to test the involvement of the respective transcription factors in gene regulation (Fig. 1). Reconstruction experiments shed further light on whether this site is responsible, either alone or in concert with other factors, for the observed gene expression.

However, in reality, things can be much more complex. For example, database-assisted promoter analysis can identify one factor that is common to all expressed promoters (Fig. 2a). The binding site occurs at variable distances from the transcription start site, and reconstruction experiments reveal that this site alone confers gene expression in a distance-independent mode. In the case of a distance-dependent regulatory site, the binding site is also found in genes that are not expressed (Fig. 2b); however, the genes that are expressed have the binding site at a specific distance from the transcription start site. These rather simplistic examples of promoter architecture might not represent biological reality, and composite elements might be responsible for specific expression profiles in most genes (Fig. 2c). Because the highly conserved core sequences of transcription factor binding sites are relatively short, these sequences occur at a statistically predictable frequency in any given sequence. Therefore, it is helpful to have some experimental evidence about regulatory sequences before using database-assisted analysis.

TRANSFAC and transcriptional regulation

TRANSFAC is a database of transcription factors, their genomic binding sites and their DNA-binding profiles. This information resource is maintained as a relational database comprising 100 tables, which, in the flat-file version, are condensed to six text-based files. The following description refers to the main components and contents of this flat-file system.

Fig. 1. Identification of regulatory sites within a functionally delineated promoter sequence. The black lines show a putative plant promoter region that is sufficient for gene expression when linked to a minimal promoter element (TATA box) and a reporter gene. Database-assisted analysis identified four putative transcription factor binding sites (first line). The cis sequences bound by these factors are shown with different shades of blue in the promoter region; the transcription factors are displayed in red, green or yellow above the promoter fragment. Mutations that abolish factor binding (white boxes) identified one binding site as relevant for gene expression. This binding site alone is sufficient for gene expression, as shown by a reconstruction experiment involving the factor binding site alone.

Accessing and using TRANSFAC

Users of the public version of the database need to register on the home page (http://www.gene-regulation.de/). From this home page, users have access to different areas such as databases, programs, papers, commercial offers and events. The databases area lists several databases, of which TRANSFAC is the most relevant to plant scientists. On the same page, there are options to search TRANSFAC for a particular factor, gene, matrix or site. Furthermore, one can read detailed descriptions of the latest changes and of the contents of each field in the database. There are 13 different featured programs on the home page; currently, Patch, SignalScan and Match are the most relevant for identifying cis-acting sequences and binding factors in a given DNA sequence.

FACTOR

The FACTOR table holds data about individual transcription factors. Information is given about synonyms; physicochemical, local and global structural, and functional properties; amino acid sequence; and gene expression patterns of transcription factors. Many transcription factors have several splice variants, which can differ greatly in functional properties such as transcriptional activation or DNA binding, and can even act as antagonists to each other. All these variants are included in this table as individual entries. Protein-protein interactions are an important feature controlling the activity of transcription factors. For this reason, all factors that are known to interact physically are linked with each other. In the future, these protein-protein interactions will be classified as well. For instance, homo- or heterodimerization represents the most fundamental kind of interprotein complex formation between transcription factors. At a higher level of complexity, interactions occur between distinct transcription factors that are bound to distinct genomic sites and cause synergistic or antagonistic effects, which are typical for composite elements. Protein-protein interactions that mediate the effect of an upstream transcription factor on the basal transcription initiation complex can constitute a third level of interaction.

CLASS

Most transcription factors have been hierarchically classified according to the properties of their DNA-binding domains. This classification can also be used to browse those transcription factors that have been assigned a location in the classification scheme. The CLASS table gives detailed explanations of individual classes and superclasses in this classification scheme.

SITE

The known genomic binding sites and sequences of DNA-binding transcription factors are given in the SITE table, together with the corresponding gene and the position of the binding site relative to its transcription start site (or another reference position, if appropriate). Each factor-site interaction is qualified by a number that reflects the confidence level of the underlying experimental evidence: merely suggested binding sites are rated 5, whereas a clear-cut identification of a transcription factor binding to a site, complemented by functional evidence, is rated 1. The genomic binding sites are linked to the corresponding entry in the GENE table. Artificial binding sequences are also documented in the SITE table; these might have been published in random selection studies to determine the DNA-binding properties of a certain transcription factor. Artificial binding sequences frequently serve as training sets to deduce a positional weight matrix. Finally, the SITE table also contains consensus strings using the 15-letter IUPAC alphabet; many of these have been published [16].

GENE

The GENE table contains the names and acronyms of all genes with at least one transcription factor binding site listed in TRANSFAC. The location of the individual b
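
The SITE description above notes that artificial binding sequences frequently serve as training sets to deduce a positional weight matrix. The following is a minimal Python sketch of that idea, not the procedure used by TRANSFAC or by Match: it assumes log2-odds scores against a uniform background (0.25 per base) with a pseudocount of 0.5, and the training sites, the test sequence and the score threshold of 5.0 are all hypothetical values chosen for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Build a log2-odds positional weight matrix from aligned, equal-length sites.

    The uniform background and the pseudocount are illustrative assumptions,
    not TRANSFAC's own parameters.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all training sites must have the same length")
    total = len(sites) + 4 * pseudocount
    matrix = []
    for i in range(width):
        column = [site[i] for site in sites]
        matrix.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return matrix

def score_window(matrix, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(matrix, window))

def scan(matrix, sequence, threshold):
    """Report (start, window, score) for forward-strand windows scoring >= threshold."""
    width = len(matrix)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) <= set(BASES):  # skip windows containing N or other symbols
            s = score_window(matrix, window)
            if s >= threshold:
                hits.append((start, window, round(s, 2)))
    return hits

# Hypothetical "artificial" training sites, invented for this example.
training_sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]
pwm = build_pwm(training_sites)
print(scan(pwm, "GGTGACTCAGGATTTGAGTCATT", threshold=5.0))
```

The sketch reports the two windows that resemble the training set (positions 2 and 14, 0-based). A practical scanner would also search the reverse strand and calibrate its threshold; this example does neither.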

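The SITE table also keeps consensus strings written in the 15-letter IUPAC alphabet. One straightforward way to search a sequence with such a string, sketched below in Python under the assumption that a plain regular-expression match is adequate, is to expand each ambiguity symbol into a character class and scan both strands. The consensus TGASTCA (S = C or G) and the test sequence are invented for illustration, and the code is not a description of how Patch or SignalScan work internally.

```python
import re

# The 15-letter IUPAC nucleotide alphabet: each symbol denotes a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each IUPAC symbol (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def iupac_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular-expression pattern."""
    parts = []
    for symbol in consensus.upper():
        bases = IUPAC[symbol]  # raises KeyError for symbols outside the alphabet
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)

def reverse_complement(consensus):
    """Reverse complement of an IUPAC string (works for plain DNA too)."""
    return consensus.upper().translate(COMPLEMENT)[::-1]

def find_consensus(consensus, sequence):
    """Return sorted (start, strand, matched_text) hits on both strands.

    Positions are 0-based, and matched_text is read from the forward sequence.
    """
    sequence = sequence.upper()
    hits = set()
    for strand, motif in (("+", consensus), ("-", reverse_complement(consensus))):
        # The zero-width lookahead lets finditer report overlapping matches.
        pattern = re.compile(f"(?=({iupac_to_regex(motif)}))")
        for match in pattern.finditer(sequence):
            hits.add((match.start(), strand, match.group(1)))
    return sorted(hits)

# Hypothetical consensus and test sequence, for illustration only.
print(iupac_to_regex("TGASTCA"))
print(find_consensus("TGASTCA", "GGTGACTCAGGATTTGAGTCATT"))
```

Because TGASTCA is its own reverse complement, each site in this example is reported once per strand; a non-palindromic consensus would produce distinct hits on the two strands.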

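The earlier point that short core sequences of binding sites occur at a statistically predictable frequency can be quantified with a simple null model. Assuming independent positions and equal base frequencies (an idealization, since real promoter sequence is not uniform), a consensus of length k in which position i allows n_i of the four bases is expected to match E = s * (L - k + 1) * prod(n_i / 4) times by chance in a sequence of L bp, where s is the number of strands searched. The Python sketch below computes this; the motifs and the 1-kb length are illustrative choices, not values from the text.

```python
from math import prod

# Number of bases allowed by each symbol of the 15-letter IUPAC alphabet.
DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}

def match_probability(consensus):
    """Chance that one random window matches the consensus.

    Assumes independent positions and equal base frequencies (0.25 each).
    """
    return prod(DEGENERACY[symbol] / 4 for symbol in consensus.upper())

def expected_hits(consensus, seq_length, both_strands=True):
    """Expected number of chance matches in a random sequence of seq_length bp.

    For a palindromic consensus, each chance site is counted once per strand.
    """
    windows = max(seq_length - len(consensus) + 1, 0)
    strands = 2 if both_strands else 1
    return strands * windows * match_probability(consensus)

# Illustrative numbers for a hypothetical 1-kb promoter fragment.
for motif in ("CCAAT", "TGASTCA", "TATAWAW"):
    print(motif, round(expected_hits(motif, 1000), 3))
```

For example, a fully specified 5-bp core such as CCAAT is expected about 1.9 times by chance in 1 kb when both strands are searched (2 * 996 / 1024), which illustrates why a database hit alone carries little weight without experimental evidence.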