Laboratory for Systems Biology

Identification

I. Identification of Clocks
Following the completion of genome projects for species such as mouse and human, genome-wide resources such as siRNA or cDNA libraries have undergone considerable expansion. Development of high-throughput technologies also assists in the efficient use of these resources. These genome-wide resources and technologies as well as genome-associated information currently allow us to comprehensively identify system components of interest (System Identification).
Circadian clocks of multicellular organisms consist of complexly integrated regulatory loops with positive or negative regulators known as clock genes. The transcriptional regulation network of these genes forms a circadian clock oscillator, which is known to control output genes and to affect physiological and metabolic processes. Although the transcriptional regulation of some identified clock genes has been the subject of previous studies, a system-level understanding of circadian clocks remains to be achieved. In this Identification of Clocks section, we provide the results of our system identification of circadian clocks. In the first part, we describe the identification of the mammalian clock circuit and the topological role of each component in the network (Section 3-I-1). In the second part, we present the challenges involved in finding direct target genes, which are directly controlled by the circadian clock, utilizing our novel genome-wide promoter/enhancer database (Section 3-I-2). In the final part, we show a successful example of our identification of the circadian clock system with a combination strategy in which we orchestrated three types of functional genomics approaches. The model animal used in this research was Drosophila, due to the ease of applying molecular biological technologies to it (Section 3-I-3).

I-1. Identification of the Mammalian Clock Circuit
The mammalian circadian master clock is primarily located in the suprachiasmatic nucleus (SCN). Transcript analyses have indicated that circadian clocks are not restricted to the SCN but are found in several tissues including liver and cultured fibroblast cells such as Rat-1 or NIH3T3 cells. The mechanisms underlying circadian rhythms are also known to be conserved across species. At the core of the clock lies a transcriptional/translational feedback loop, whose primary components are known as "clock genes." For example, in the mouse system, the transcription factors CLOCK and BMAL1 dimerize and directly and indirectly activate transcription of the Per and Cry genes through E-box elements (5'-CACGTG-3'). The PER and CRY proteins accumulate in the cytosol and, following phosphorylation, are translocated into the nucleus, where they inhibit the activity of CLOCK and BMAL1. The turnover of the inhibitory PER and CRY proteins leads to a new cycle of activation by CLOCK and BMAL1 via E-box elements. Despite many reports on the transcriptional regulation of individual genes, however, an overview of the circadian clock core network has yet to be put forward.
Complicated networks cannot be elucidated without both 1) comprehensive identification of network circuits and 2) accurate measurement of system dynamics. In a previous attempt to comprehensively identify the circadian clock core network (prior to the start of LSB in the CDB in 2003), we first quantitatively and comprehensively measured genome-wide gene expression using GeneChip technology and identified genes showing circadian oscillation with characteristic expression patterns through biostatistics (Fig. 3-I-1-1). The second step involved comprehensively determining the transcription start sites (TSSs) and conserved non-coding regions to construct the genome-wide promoter/enhancer database. Using these data, we predicted a relationship between the expression patterns of the identified genes and the DNA regulatory elements in their promoter/enhancer regions. We found that clock-controlled elements (CCEs), namely E-boxes (5'-CACGTG-3')/E'-boxes (5'-CACGTT-3'), RREs (5'-[A/T]A[A/T]NT[A/G]GGTCA-3'), or D-boxes (5'-TTATG[C/T]AA-3'), are distributed throughout the oscillatory genes (Ueda H.R. et al., 2002, Nature).
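The consensus sequences quoted above translate directly into pattern searches. The following is a minimal Python sketch, not the pipeline used in the cited study: it scans only the forward strand of a sequence for each CCE consensus. The function name find_cces and the example sequence are hypothetical and exist only for illustration.

```python
import re

# Consensus sequences quoted in the text, written as regular expressions.
# N (any base) becomes [ACGT]; [A/T] becomes [AT], and so on.
CCE_PATTERNS = {
    "E-box": "CACGTG",
    "E'-box": "CACGTT",
    "RRE": "[AT]A[AT][ACGT]T[AG]GGTCA",
    "D-box": "TTATG[CT]AA",
}


def find_cces(sequence):
    """Return (element, 0-based start, matched text) for every consensus hit.

    Only the forward strand is scanned; a lookahead is used so that
    overlapping hits are all reported.
    """
    sequence = sequence.upper()
    hits = []
    for name, pattern in CCE_PATTERNS.items():
        for match in re.finditer("(?=(" + pattern + "))", sequence):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical sequence, built so that it contains one of each element.
example = "GGCACGTGAATTATGCAACCAAAGTAGGTCATTCACGTTGG"
for hit in find_cces(example):
    print(hit)
```

A real search would also need to consider the reverse strand and sequence conservation, which this sketch leaves out.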


Figure 3-I-1-1. (A) Strategy for identification of clock-controlled elements (CCEs). Gene expression information was obtained by performing comprehensive expression profiling (left panel). Through statistical analysis, genes with a characteristic pattern of expression (circadian oscillation) were selected (right panel, oscillatory genes are indicated by their color). DNA regulatory elements for a specific feature (i.e., expression timing) were predicted by combining expression pattern information and transcriptional regulatory element information from promoter regions. (B) Genome-wide expression profiles in mouse central (SCN) (left panel) and peripheral (liver, right panel) clocks. Total RNA was extracted every four hours during light/dark cycles (LD) or constant darkness (DD) over two days, and used to determine genome-wide gene expression profiles with Affymetrix mouse high-density oligonucleotide probe arrays (GeneChip). Data were normalized so that the average signal intensity and standard deviation over 12-point time courses were 0.0 and 1.0, respectively. Columns represent time points, and rows represent genes that were organized by peak time. Colors in descending order from red to black to green represent the normalized data. From the obtained data, we identified a set of genes rhythmically expressed under both LD and DD. We classified 101 genes in the SCN and 393 genes in the liver as "significantly rhythmic under both LD and DD." (C) Temporal expression profiles of transcription factors in the SCN (upper panel) and liver (lower panel) under constant darkness (DD) conditions. Relative mRNA levels of the indicated genes under DD were measured with a Q-PCR assay, in which GAPDH expression was used as an internal control. Data were normalized so that the average copy number (Q-PCR) over a 12-point time course is 1.0.
Shown is the circadian expression of transcription factors that have functionally and evolutionarily conserved E-boxes (Dbp, Dec1 and Dec2), both E-boxes/E'-boxes and D-boxes (Per1, Per2, RevErbAα and RevErbAβ), D-boxes (Per3, Rorα and Rorβ), RREs (Bmal1, Clock, Npas2 and E4bp4), or both E-boxes/E'-boxes and RREs (Cry1 and Rorγ) on their non-coding regions. Clock and Rorγ were constitutively expressed in the SCN. All the data shown in this figure were published prior to the start of LSB in the CDB.
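The normalization described for panel (B) of the caption above (mean 0.0 and standard deviation 1.0 over a 12-point time course) amounts to a per-gene z-score. A minimal Python sketch follows; the function name and the signal values are hypothetical, and the use of the population standard deviation is an assumption, since the caption does not say which form was used.

```python
from statistics import mean, pstdev


def zscore_time_course(values):
    """Scale one gene's time course to mean 0.0 and standard deviation 1.0.

    The population standard deviation is used here; whether the original
    analysis used the population or sample form is not stated (assumption).
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("a flat time course cannot be scaled to unit variance")
    return [(v - mu) / sigma for v in values]


# Hypothetical 12-point time course (one sample every four hours for two days).
signal = [120, 260, 410, 300, 150, 90, 110, 250, 430, 310, 160, 80]
normalized = zscore_time_course(signal)
print([round(v, 2) for v in normalized])
```

Scaling each gene separately puts strongly and weakly expressed genes on a common scale, so that a heat map ordered by peak time shows phase rather than absolute abundance.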

Following the creation of LSB in the CDB in 2003, in order to determine the role of these elements in the circadian clock, we utilized an in vitro cell culture system with which we can monitor circadian rhythms in transcriptional dynamics using a destabilized luciferase (dLuc) reporter driven by clock-controlled promoters (Fig. 3-I-1-2 A). In this system, named the "in vitro cycling assay," we transiently transfected reporter constructs into cultured Rat-1 cells, stimulated the cells with dexamethasone, and measured their bioluminescence. Dexamethasone was administered to induce circadian oscillations in the cultured cells. Through the genome-wide search described above, we found CCEs on the promoters/enhancers of 16 clock/clock-controlled genes. Then, using the in vitro cycling assay system, we were able to reveal that functionally and evolutionarily conserved E/E'-boxes are located on non-coding regions of nine genes {Per1, Per2, Cry1, Dbp, Rorγ, RevErbAα (Nr1d1), RevErbAβ (Nr1d2), Dec1 (Bhlhb2) and Dec2 (Bhlhb3)}, D-boxes on those of seven genes {Per1, Per2, Per3, RevErbAα, RevErbAβ, Rorα and Rorβ}, and RREs on those of six genes {Bmal1 (Arntl), Clock, Npas2, Cry1, E4bp4 (Nfil3) and Rorγ}. Based on this functional and conserved transcriptional regulatory mechanism, we succeeded in drawing the transcriptional circuits underlying mammalian circadian rhythms (Fig. 3-I-1-2 B) (Ueda H.R. et al., 2005, Nature Genetics).
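The three gene lists above overlap, which is why nine, seven and six genes add up to 16 distinct genes rather than 22. The short Python sketch below tabulates the lists exactly as given in the text and derives, for each gene, the element types it carries; the variable names are chosen for illustration only.

```python
# Gene lists exactly as given in the text (official symbols omitted).
E_BOX = {"Per1", "Per2", "Cry1", "Dbp", "Rorγ", "RevErbAα", "RevErbAβ", "Dec1", "Dec2"}
D_BOX = {"Per1", "Per2", "Per3", "RevErbAα", "RevErbAβ", "Rorα", "Rorβ"}
RRE = {"Bmal1", "Clock", "Npas2", "Cry1", "E4bp4", "Rorγ"}

all_genes = E_BOX | D_BOX | RRE

# Map each gene to the element types found on its non-coding regions.
ELEMENT_SETS = (("E/E'-box", E_BOX), ("D-box", D_BOX), ("RRE", RRE))
elements_by_gene = {
    gene: [name for name, genes in ELEMENT_SETS if gene in genes]
    for gene in sorted(all_genes)
}

print(len(E_BOX), len(D_BOX), len(RRE), len(all_genes))
for gene, elements in elements_by_gene.items():
    print(gene, elements)
```

Six genes carry two element types (Per1, Per2, RevErbAα and RevErbAβ carry E/E'-boxes and D-boxes; Cry1 and Rorγ carry E/E'-boxes and RREs), which accounts for the difference between 22 and 16.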
Our analysis further suggested that regulation of E/E'-boxes is the topological vulnerability point in mammalian circadian clocks. We functionally verified this concept using the in vitro cycling assay system (Fig. 3-I-1-2 C). Overexpression of repressors of E/E'-box regulation (CRY1), RRE regulation (REVERBAα) or D-box regulation (E4BP4) affected circadian rhythmicity in Per2 or Bmal1 promoter activity. The effects differed between repressors, however, and the most severe effect was observed when E/E'-box regulation was repressed. Such different modes of effect cannot be explained by mere quantitative differences in the strength of these three repressors, indicating that there is some qualitative difference between E/E'-box, D-box, and RRE regulation in circadian rhythmicity.


Figure 3-I-1-2. (A) Schematic overview of the experiment. Cultured mammalian cells (Rat-1) were transfected with dLuc under the regulation of a CCE and the SV40 basic promoter. The circadian change in bioluminescence was monitored by a PMT detector over several days (upper panel). Representative circadian rhythms of bioluminescence from wild-type CCEs fused to the SV40 basic promoter driving a dLuc reporter are shown; the circadian bioluminescence phases of the Per2 promoter and the Bmal1 promoter are marked by yellow and purple lines, respectively (bottom panels). (B) Schematic representation of the transcriptional network of the mammalian circadian clock. Genes and CCEs are depicted as ellipsoids and rectangles, respectively. Transcriptional/translational activation and repression are depicted as gray, green and red lines, respectively. (C) Effect of repression on each CCE. The E/E'-boxes, D-box and RRE are repressed by overexpression of CRY1, E4BP4 and REVERBAα, respectively. The consequences of these repressions were monitored by Per2-dLuc (upper panel) and Bmal1-dLuc (lower panel). dLuc, destabilized luciferase; CCE, clock-controlled element; PMT, photomultiplier tube.

I-2. Identification of the Direct Targets of the Mammalian Clock
The transcription factors at the core of the circadian clock system recognize the clock-controlled elements (CCEs; E/E'-box, D-box and RRE), and control the transcriptional output of genes downstream of the CCEs. Both our laboratory and other groups have noted that ~10% of the genome is under circadian regulation. This led us to think that the output genes directly controlled by the circadian clock (direct target genes) may contain CCEs in their promoter regions, and that expression of these genes is controlled via their CCEs (Section 3-I-1). We assumed that the existence of functional CCEs in a promoter region can be used as an index to identify the clock-controlled genes (CCGs), which are directly regulated by clock genes.
To identify the CCGs, we first constructed a general "Mammalian Promoter/Enhancer Database" (http://promoter.cdb.riken.jp/) by integrating information on conserved non-coding regions, transcription start sites (TSSs) and transcription factor binding sites (TFBSs) (Fig. 3-I-2-1 A). This database is generally useful and can be applied to any aspect of mammalian transcriptional regulation. We utilized this database with computational models of CCEs to predict new direct targets of the clock, and subsequently validated these targets at the cellular and organismal levels.


Figure 3-I-2-1. Construction of the mammalian promoter/enhancer database, and prediction of CCEs using hidden Markov models (HMMs). (A) Mammalian full-length cDNA and EST sequences were initially mapped onto mammalian genome sequences. These mammalian genes were then compared in order to identify 16,268 human-mouse orthologues. The positional information of adjacent orthologues was used to determine 434 human-mouse synteny regions, which contain 750,043 human-mouse conserved genomic regions. The 862 consensus sequences for TFBSs from TRANSFAC were then mapped onto these conserved genomic regions to identify the 7,804,559 putative TFBSs conserved between human and mouse in non-coding regions. Finally, visualization of the putative promoter/enhancer and TFBS data and curation of current genes were integrated into the "Mammalian Promoter/Enhancer Database." (B) Chromosomal distributions of predicted CCEs mapped on the mouse genome. Chromosomal positions of the 100 most significant hits for E-boxes, D-boxes, and RREs are shown in red. (C) Plots of FDRs (false discovery rates) against match scores of HMM searches in three conditions: 1) searches for conserved elements within conserved non-coding regions (red, conserved element); 2) searches for mouse elements within the conserved non-coding regions, relaxing the requirement of element conservation (blue, non-coding region); and 3) searches in the entire genome relaxing both element conservation and search space (orange, whole genome). FDRs in the conserved element search are plotted against the average match score of human and mouse elements.

Hidden Markov models (HMMs), which have a statistical basis and tolerate insertions and deletions, were then built and calibrated on known functional CCEs. HMM searches of conserved non-coding regions for CCEs conserved between human and mouse revealed 1,108 E-boxes, 2,314 D-boxes, and 3,288 RREs (Fig. 3-I-2-1 B). Interestingly, putative E-box elements displayed a biased distribution of distance from TSSs, while putative D-box and RRE elements showed unbiased distributions that approximated a random distribution. These results suggest a positional preference of putative E-boxes around TSSs, which might reflect a core promoter requirement for structural genes that harbor this element. To estimate the accuracy of the prediction for each putative CCE, we calculated the false discovery rate (FDR), a statistic that reflects the proportion of false positives in a series of observations, using simulations and by performing searches against randomized genome sequences. The FDR decreases as the match score of the HMM, which represents the statistical significance of the candidate element, increases (Fig. 3-I-2-1 C). Importantly, we found that the accuracy of the HMM-based prediction as measured by the FDR also depends on search conditions. HMM searches for conserved elements within conserved non-coding regions (the original condition) had the lowest FDR, while requiring a significant response element hit in only a single species in conserved non-coding regions, or searching the genomes without respect to conservation, gave less significant results (Fig. 3-I-2-1 C). These results demonstrate the value of utilizing human/mouse conservation and a confined search space for the most accurate response element predictions.
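The idea of estimating an FDR from searches against randomized sequences can be sketched as follows. This is only an illustration under stated assumptions: the estimator (expected hits per randomized search divided by hits in the real search, at a given score threshold) is a common choice and is not confirmed to be the one used in the study, and the function name, parameters and score values are all hypothetical.

```python
def empirical_fdr(real_scores, null_scores, threshold, n_null_runs=1):
    """Estimate the FDR at a match-score threshold.

    real_scores : scores of hits found in the real sequences
    null_scores : scores of hits pooled from searches of randomized sequences
    n_null_runs : number of randomized searches pooled in null_scores
    """
    real_hits = sum(1 for s in real_scores if s >= threshold)
    if real_hits == 0:
        return 0.0  # nothing is called at this threshold
    expected_false = sum(1 for s in null_scores if s >= threshold) / n_null_runs
    return min(1.0, expected_false / real_hits)


# Hypothetical scores, for illustration only.
real = [4.1, 5.0, 6.2, 7.5, 8.3, 9.0, 9.7, 10.4]
null = [3.9, 4.4, 4.8, 5.1, 5.5, 6.0, 6.4, 7.1]  # pooled from 2 randomized runs

for t in (4.0, 6.0, 8.0):
    print(t, empirical_fdr(real, null, t, n_null_runs=2))
```

With these made-up numbers the estimated FDR falls as the threshold rises, which is the qualitative behavior described for the match score in the text.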
To validate these predictions, we used the in vitro cycling assay system to empirically test candidate elements in circadian transcriptional output assays. For each HMM search (E-box, D-box and RRE), we selected the ten most significant sequences that were located within 1 kb of the TSS. Three tandem repeats of the genomic sequences containing the predicted CCEs were fused to the SV40 basic promoter driving a destabilized luciferase (dLuc) reporter, and we monitored their bioluminescence using the in vitro cycling assay system (Fig. 3-I-2-2 A left). As a result, 40% of E-boxes, 70% of D-boxes, and 60% of RREs generated strong circadian transcriptional activity (P < 0.01 and high amplitude) in phase (peak timing of expression) with those of the Per1 E-box, Per3 D-box, and Bmal1 RRE, respectively (Fig. 3-I-2-2 A). The remaining sequences generated weak, low-amplitude circadian transcriptional activity, or were arrhythmic. To confirm whether these elements play a prominent role in gene regulation in vivo (Fig. 3-I-2-2 B), we also used Q-PCR to examine temporal expression profiles of the 17 predicted CCGs in seven mouse tissues (aorta, bone, heart, kidney, liver, lung and muscle) known to contain the clock. Evaluation of rhythmicity revealed that 13 genes (76%) showed circadian expression profiles (P < 0.03): three E-box controlled genes, four D-box controlled genes and six RRE controlled genes. Both the in vitro and in vivo experiments suggest that the majority of the predicted E-box, D-box, and RRE containing genes are bona fide circadian output genes. The manuscript of this study is currently under submission.
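The figures in this paragraph can be cross-checked with a few lines of Python. The sketch below uses only numbers stated in the text; the variable names are illustrative. The totals are consistent with the 17 predicted CCGs being the elements that scored positive in vitro, although the text does not state this explicitly.

```python
TESTED_PER_CLASS = 10  # ten top-scoring sequences per element type

in_vitro_rate = {"E-box": 0.40, "D-box": 0.70, "RRE": 0.60}
in_vivo_rhythmic = {"E-box": 3, "D-box": 4, "RRE": 6}

# Elements with strong circadian activity in the in vitro cycling assay.
in_vitro_positive = {
    name: round(rate * TESTED_PER_CLASS) for name, rate in in_vitro_rate.items()
}
n_predicted = sum(in_vitro_positive.values())
n_rhythmic = sum(in_vivo_rhythmic.values())

print(in_vitro_positive)
print(n_predicted, n_rhythmic, f"{n_rhythmic / n_predicted:.0%}")
```

The in vitro positives sum to 4 + 7 + 6 = 17, and 13 of 17 is about 76%, matching the percentages reported above.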


Figure 3-I-2-2. Experimental validation of HMM-based predictions at cellular and organismal levels. (A) Circadian rhythms of bioluminescence from the predicted CCEs fused to the SV40 basic promoter driving a dLuc reporter in NIH3T3 cells. Three known CCEs (Per1 E-box, Per3 D-box and Bmal1 RRE) are used as positive controls. The bioluminescence data were detrended in baseline and amplitude, and normalized so that their maximum, minimum, and average were set to 1, -1, and 0, respectively. The colors in descending order from magenta to black to green represent the detrended bioluminescence. Columns represent time points, and rows represent the predicted elements on the designated genes. (B) Temporal mRNA expression profiles of the predicted CCGs in mouse tissues. The colors in descending order from magenta to black to green represent the normalized data (the average and standard deviation over 12-point time courses are 0.0 and 1.0, respectively). Columns represent time points, and rows represent the predicted CCGs in the designated tissues.

I-3. Identification of the Fly Clock Circuit
The relatively simple Drosophila genome provides an excellent model for studying the elaborate network of the circadian clock system in mammals, as it is more comprehensively characterized and more amenable to experiments, and many circadian genes are known to be conserved between flies and mammals. We employed three functional genomic technologies to dissect this clock system: statistical analysis of DNA microarrays, in vivo RNA interference (RNAi) and ChIP-on-Chip. Briefly, we obtained candidates for new circadian genes by statistically analyzing DNA microarray data, checked their physiological function by in vivo RNAi, and then validated their molecular function using ChIP-on-Chip. The combination of these three technologies led us to the successful identification of a new gene named clockwork orange (cwo) in the core of the Drosophila circadian clock system, as recently reported in the journal Genes & Development (Matsumoto A., Ukai-Tadenuma M., Yamada R.G. et al., 2007, Genes & Dev), highlighting the power of such advanced methodologies. In this report, we described how cwo forms a previously unknown negative regulatory feedback loop which sustains the clear (high-amplitude) oscillation of circadian genes, including itself.
We utilized the dataset we had obtained in a previous genome-wide analysis of the circadian gene-expression profiles in Drosophila heads, using DNA microarrays. Statistically analyzing this dataset, we selected 200 genes which showed the most prominent circadian expression under both LD (light-dark) and DD (dark-dark: constant dark) conditions. We used those 200 genes as candidates in a subsequent search for new core clock genes. Core clock genes, which encode critical components of the clock system consisting of a set of coupled feedback loops, have to be distinguished from subsidiary clock-regulated "output genes."
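One way to rank genes by how prominently they cycle under both conditions is to correlate each time course with 24-hour cosine curves of varying phase and score each gene by the weaker of its LD and DD correlations. The Python sketch below illustrates that idea only; the text does not specify the statistic that was used, and the function names, the four-hour sampling and the two example profiles are all assumptions made for illustration.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


def best_cosine_fit(values, times, period=24.0):
    """Return (best correlation, peak time in hours) against 24 h cosines."""
    best = (-1.0, 0)
    for peak in range(int(period)):
        template = [math.cos(2 * math.pi * (t - peak) / period) for t in times]
        r = pearson(values, template)
        if r > best[0]:
            best = (r, peak)
    return best


def rank_by_rhythmicity(profiles_ld, profiles_dd, times, top_n):
    """Rank genes by the weaker of their LD and DD cosine correlations."""
    scores = {
        gene: min(best_cosine_fit(profiles_ld[gene], times)[0],
                  best_cosine_fit(profiles_dd[gene], times)[0])
        for gene in profiles_ld
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical data: samples every 4 h over two days.
times = list(range(0, 48, 4))
rhythmic = [math.cos(2 * math.pi * (t - 8) / 24) for t in times]
flat = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.1, 0.9, 1.2, 1.0, 1.1, 0.9]
ld = {"geneA": rhythmic, "geneB": flat}
dd = {"geneA": rhythmic, "geneB": flat}

print(best_cosine_fit(rhythmic, times))
print(rank_by_rhythmicity(ld, dd, times, top_n=1))
```

Taking the minimum of the two correlations means a gene must oscillate under both LD and DD to rank highly, which mirrors the selection criterion described in the text.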
To effectively distinguish core clock genes from the output genes, we applied genome-wide functional screening using an RNAi system in vivo and observed the circadian phenotypes of flies in which these genes were knocked down. This in vivo RNAi is a new functional genomic strategy that can overcome the disadvantages of traditional mutant screening strategies. In this new strategy, utilizing genome sequence information, we raised flies carrying a genomic insertion of a designed IR (inverted repeat) sequence, which is transcribed as a double-stranded RNA of the target gene when induced, in our case, by the Gal4-UAS binary system. By mating these UAS-IR transgenic flies to flies expressing Gal4 almost exclusively in pacemaker neurons, we were able to knock down target genes with minimal undesired side effects (Fig. 3-I-3-1 A).
Applying the in vivo RNAi strategy to well-known core clock genes confirmed the validity of the strategy. Namely, the knockdown of two well-known clock genes, per and tim, resulted in phenotypes consistent with previous reports (Fig. 3-I-3-1 B left). Using this strategy, we successfully raised fly lines for 137 gene knockdowns, and isolated five genes as novel core clock gene candidates (Fig. 3-I-3-1 B right). Among the five candidates, we focused on the CG17100 gene since its locomotor phenotype was the strongest and most stable.


Figure 3-I-3-1. A functional genomics strategy, in vivo RNAi, revealed cwo as a clock component. (A) Genome-wide tissue-specific knockdown analysis of clock-controlled genes in Drosophila. This involved establishing UAS-IR transgenic lines to express dsRNA for the target gene under the control of UAS. Each of the UAS-IR lines was mated to driver lines to induce the expression of dsRNA specifically within clock cells. The locomotor activity of RNAi transgenic flies for 137 candidates among 200 clock-controlled genes was recorded under DD conditions. The UAS-TATA sequence or UAS sequence (yellow rectangle), a ~500 bp fragment of a target gene (red arrow), the clock-specific promoter (tim promoter) region (green rectangle) and the gal4 gene (pink arrow) are shown. (B) Typical locomotor activity in wild-type flies, knockdown flies of well-known core clock genes (left) and five new candidates (right). The names of the knocked-down genes are indicated on each actogram.

We named this gene clockwork orange (cwo) as it encodes a transcriptional repressor, 685 amino acids in length, belonging to the basic helix-loop-helix (bHLH)-ORANGE family. Our Q-PCR measurements revealed that expression of cwo changed rhythmically in LD and DD in wild-type flies, peaking very closely in phase with well-known clock genes such as per and tim. As per and tim are known to be regulated by the E-box sequence (CA[C/A]GTG) that exists in their promoter regions, we searched for the E-box sequence in the cwo promoter region and found a statistically significant number of E-boxes. We therefore examined whether the E-box activator, the CLOCK-CYCLE (CLK-CYC) heterodimer, could induce gene expression through these E-boxes by performing reporter assays in Drosophila S2 cells with a luciferase reporter gene, and confirmed that cwo could be directly regulated by the CLK-CYC heterodimer through the E-boxes in the cwo promoter region.
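To give a sense of what a "statistically significant number of E-boxes" could mean, the Python sketch below counts CA[C/A]GTG matches on the forward strand and compares the count with a simple Poisson null model. The null model (equal base frequencies, independent positions), the function names and the example fragment are assumptions made for illustration; the text does not describe how significance was actually assessed.

```python
import math
import re

E_BOX = re.compile("(?=CA[CA]GTG)")  # CA[C/A]GTG from the text, forward strand


def count_e_boxes(sequence):
    """Count (possibly overlapping) CA[C/A]GTG matches on the forward strand."""
    return sum(1 for _ in E_BOX.finditer(sequence.upper()))


def poisson_tail(observed, length):
    """P(X >= observed) for a Poisson null with equal base frequencies.

    Each start position matches with probability 2 / 4**6 (two allowed
    hexamers), so the expected count is (length - 5) * 2 / 4096.
    """
    lam = max(length - 5, 0) * 2 / 4 ** 6
    below = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(observed))
    return 1.0 - below


# Hypothetical short fragment containing three E-boxes (not the cwo promoter).
promoter = "ATCACGTGTTGACAAGTGCCTTAGGCACGTGATTCGGATTACCGGTTAACCGGATTACGA"
n = count_e_boxes(promoter)
print(n, poisson_tail(n, len(promoter)))
```

A small tail probability indicates more E-boxes than expected by chance under this simplified null; a real analysis would also need to account for base composition and both strands.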
To further investigate how cwo contributes to the clock system in vivo, we used Q-PCR to measure the temporal expression profiles of well-known core clock genes that are regulated through the E-box (per, tim, vri and Pdp1) in cwo RNAi transgenic flies. The expression amplitude of these genes decreased drastically, to half the level found in wild-type flies (Fig. 3-I-3-2 A), suggesting that cwo functions to produce high-amplitude oscillation of clock gene expression. Furthermore, we also found that cwo strongly suppresses its own oscillation (Fig. 3-I-3-2 A).
We further observed in reporter assays in S2 cells that the promoters of per, tim, vri and Pdp1 were strongly suppressed by CWO. These results indicated that the CWO protein targets genes with E-boxes in their promoters (Fig. 3-I-3-2 B), potentially including clock output genes. To identify the potential targets of the CWO protein at the genome-wide level, we performed a chromatin immunoprecipitation (ChIP) assay using a Drosophila genome tiling array (Fig. 3-I-3-2 C). Among the 1,512 sites detected, we confirmed that the CWO protein binds to the promoters of the known clock genes vri and Pdp1, both of which have the E-box sequence. Interestingly, we also found that the CWO protein strongly binds to its own promoter region. The significance of these results was verified by Q-PCR on each promoter region using the ChIP product as a template. Our subsequent bioinformatic search for the consensus DNA sequence recognized by the CWO protein identified a sequence containing the canonical E-box (CACGTG, Fig. 3-I-3-2 C), strongly supporting the idea that CWO directly targets the E-box.


Fig. 3-I-3-2. The CWO protein directly targets known clock genes through the E-box. (A) Temporal expression profiles of per, tim, vri and Pdp1 mRNA in wild-type (black circle) and cwo RNAi transgenic (red rectangle) flies under LD conditions. Relative mRNA levels of each gene were measured with Q-PCR. GAPDH2 was used as an internal control. Error bars represent SEM (n=2). (B) Transcriptional circuit underlying the Drosophila circadian clock. Ellipsoids represent clock proteins, and rectangles represent time-of-day specific DNA elements. Activators and repressors are represented in green and red, respectively. The CWO protein directly binds to E-boxes and functions as a repressor. (C) The ChIP-on-Chip experiment was performed by overexpressing the CWO protein and then immunoprecipitating it together with its target genomic DNA sequences. The precipitated DNA sequences were amplified and labeled for subsequent hybridization to a Drosophila genome tiling array (left). Potential binding sites (black vertical bars on each chromosome) were identified and schematically displayed on the Drosophila genome. The locations of the vri, Pdp1, and cwo genes are indicated by green vertical bars (top right). The DNA sequence overrepresented in these potential CWO-binding sites was identified as the canonical E-box by bioinformatics (bottom right).

The isolation of cwo as a new clock gene and the subsequent identification of a new negative feedback loop in the Drosophila circadian clock in this study revealed that circadian transcriptional regulation through E-boxes is more complex than previously thought. The indirect autoregulatory negative feedback mechanism by PER and TIM through E-boxes, which is one of the key factors in circadian oscillation, has been the subject of extensive study. A direct suppression mechanism through the E-box, however, had not been elucidated prior to our study. Indeed, it has long been a mystery how constitutive expression of per and tim in the double mutant can rescue rhythmicity at the behavioral and molecular levels. Our finding that cwo, one of the CLK-CYC target genes, can suppress the expression of a group of clock genes by directly binding to E-boxes suggests a new pathway for negative feedback regulation in the Drosophila clock, one which sustains high-amplitude circadian expression and assures molecular rhythmicity even when a functional disorder occurs in other feedback loops.
This achievement also highlights the power of functional genomic approaches such as DNA microarrays, in vivo RNAi and ChIP-on-Chip in addressing complex biological networks. Although the elucidation of the clock system is still far from complete, the discovery of cwo (clockwork orange), which also has a homolog in the human genome, represents an important step in deciphering biological clocks at the systems level.
