Introduction

1. Data summary
2. Data collection
3. Data analysis
4. Data update

1. Data summary

1.1 BD genetic factors

BDgene contains multiple types of BD-related genetic factors extracted from 918 studies. Category statistics of these genetic factors are shown in Table 1.

Table 1 Data content and statistics of BD-related genetic factors in BDgene (by March 31, 2016)

1.2 BD shared genetic factors with SZ and MDD

Genetic studies that investigated the association of genetic factors with more than one disease within a single study, under identical or similar methodological conditions, were regarded as 'cross-disorder studies'. Genetic factors shared between BD and schizophrenia (SZ) or major depressive disorder (MDD) were obtained from cross-disorder studies on two diseases (BD-SZ or BD-MDD) or three diseases (BD-SZ-MDD). Potential cross-disorder genes from the intersection analysis of multi-disease candidate genes are also presented. Category statistics of the genetic factors that BD shares with SZ/MDD are shown in Table 2.

Table 2 Statistics of shared genetic factors of BD-SZ, BD-MDD and BD-SZ-MDD (by March 31, 2016)

2. Data collection

2.1 Literature search

BDgene is built on an integrated, high-quality dataset derived from in-depth literature reading and manual curation. Publications on genetic susceptibility studies of BD were retrieved through an extensive PubMed search covering genome-wide association studies, candidate-gene association studies, linkage studies, mutational studies, copy number variation analyses and meta-analyses of association or linkage studies. The search terms were as follows:

("bipolar" [Title/Abstract] OR "manic depressive" [Title/Abstract] OR "manic depression" [Title/Abstract]) AND (polymorphism [Title/Abstract] OR SNP [Title/Abstract] OR haplotype [Title/Abstract] OR interaction [Title/Abstract] OR variant [Title/Abstract] OR variation [Title/Abstract] OR mutation [Title/Abstract] OR CNV [Title/Abstract] OR "copy number variation" [Title/Abstract] OR repeats [Title/Abstract] OR deletion [Title/Abstract] OR duplication [Title/Abstract] OR ((gene [Title/Abstract] OR locus [Title/Abstract] OR chromosome [Title/Abstract] OR genetic [Title/Abstract] OR genome [Title/Abstract] OR genomic [Title/Abstract]) AND (linkage [Title/Abstract] OR associat* [Title/Abstract])))
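
The search string above has a regular structure: three disease terms, twelve variant terms, and a nested clause pairing six locus terms with two study-type terms, all restricted to the title/abstract field. A minimal sketch of assembling it from term lists follows. The function and variable names are illustrative assumptions and not part of BDgene; only the terms themselves come from the search formula above.

```python
FIELD = "[Title/Abstract]"

# Term groups copied from the search formula above.
DISEASE_TERMS = ['"bipolar"', '"manic depressive"', '"manic depression"']
VARIANT_TERMS = [
    "polymorphism", "SNP", "haplotype", "interaction", "variant", "variation",
    "mutation", "CNV", '"copy number variation"', "repeats", "deletion",
    "duplication",
]
LOCUS_TERMS = ["gene", "locus", "chromosome", "genetic", "genome", "genomic"]
STUDY_TERMS = ["linkage", "associat*"]


def tagged(term):
    """Restrict a single term to the title/abstract field."""
    return f"{term} {FIELD}"


def any_of(terms):
    """Join tagged terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(tagged(t) for t in terms) + ")"


def build_query():
    """Assemble the full search string from the four term groups."""
    disease = any_of(DISEASE_TERMS)
    # Locus terms only count when paired with a linkage/association term.
    locus_and_study = f"({any_of(LOCUS_TERMS)} AND {any_of(STUDY_TERMS)})"
    genetic = "(" + " OR ".join(
        [tagged(t) for t in VARIANT_TERMS] + [locus_and_study]
    ) + ")"
    return f"{disease} AND {genetic}"


query = build_query()
print(query)
```

Keeping the terms in lists makes it easier to review or extend a group without disturbing the parenthesization of the rest of the query.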

2.2 Data extraction and categorization

Multiple types of genetic factors were collected from the literature: not only disease-related SNPs, genes and regions, but also haplotypes, gene-gene interactions, pathways and other variants. To present a comprehensive picture of the relation between genetic factors and disease, statistical results (e.g. P-values, ORs, LODs) were evaluated, and detailed evidence from the original publications (e.g. study design, sample population, analytical method) is presented. To better illustrate the association between genetic candidates and disease, results were categorized as 'Positive', 'Negative' or 'Trend' according to the criteria described in Table 3. The authors' original comments are also presented as an additional reference for evaluating the results.

3. Data analysis

3.1 Gene prioritization
Gene prioritization helps researchers evaluate the genetic significance of candidate genes and identify the most promising genes for follow-up studies. It was conducted on all BD candidate genes stored in BDgene using five multi-source gene prioritization tools, Endeavour, DIR, ToppGene, ToppNet and TargetMine, following the approach we previously applied to ADHD (2).
The first four tools are based on the 'guilt-by-association' principle: genes with solid evidence of involvement in disease susceptibility serve as training genes, and the candidate genes most closely related to the training genes are considered the most promising. The input to these tools therefore included both the full list of BD candidate genes and a list of training genes. Our training set contained 20 genes. Eighteen were taken from an extensively cited review of BD susceptibility genes (3), subject to a threshold of at least three positive results in BDgene. The other two, ANK3 and CACNA1C, were reported and validated by several recent GWAS and meta-analyses (4-6).
Given the training genes and the BD candidate genes, each tool outputs a ranked gene list. For Endeavour, DIR, ToppGene and ToppNet, the top 50 genes were taken for comparison. TargetMine instead identifies biological terms enriched in the input candidate genes and reports enriched KEGG pathways, GO terms and disease ontology terms separately; the top seven pathways from each of the three resources were taken, and the intersecting genes involved in these pathways were regarded as the TargetMine result (7). Finally, the results of the five tools were combined, and genes predicted by at least three of the five tools were regarded as the most promising candidates, called 'prioritized genes'.
Both training genes and prioritized genes were defined as BD 'core genes'.
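
The final consensus step described above (a gene is 'prioritized' if at least three of the five tools predict it, and 'core genes' are the union of training and prioritized genes) can be sketched as a simple vote count. This is an illustrative sketch only, not the code used to build BDgene. The gene symbols G1 to G5 are hypothetical placeholders, and the tool outputs are toy lists rather than real rankings.

```python
from collections import Counter


def prioritized_genes(tool_results, min_tools=3):
    """Return genes predicted by at least `min_tools` of the tools.

    `tool_results` maps a tool name to the gene list that tool reported.
    """
    votes = Counter()
    for genes in tool_results.values():
        votes.update(set(genes))  # each tool votes once per gene
    return sorted(g for g, n in votes.items() if n >= min_tools)


def core_genes(training_genes, prioritized):
    """Core genes are the union of training genes and prioritized genes."""
    return sorted(set(training_genes) | set(prioritized))


# Toy input: placeholder symbols, not real tool output.
tool_results = {
    "Endeavour": ["G1", "G2", "G3"],
    "DIR": ["G1", "G2"],
    "ToppGene": ["G1", "G4"],
    "ToppNet": ["G2", "G4"],
    "TargetMine": ["G5"],
}
prioritized = prioritized_genes(tool_results)
core = core_genes(["ANK3", "CACNA1C"], prioritized)
print(prioritized)
print(core)
```

Converting each tool's list to a set before counting ensures that a gene listed twice by one tool still counts as a single vote.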

3.2 Pathway-based analysis (PBA) of GWAS data
To detect novel candidate pathways/gene sets for follow-up studies, we performed PBA on 12 published independent BD GWAS data sets (Table 4) using i-GSEA (improved gene set enrichment analysis) (8), a PBA method developed by our team to identify pathways/gene sets associated with traits. Annotated GO terms downloaded from MSigDB v3.0 and related pathways from KEGG and BioCarta were used as reference gene sets. SNPs were mapped to genes with the "5kb upstream and downstream of gene" option, and all other parameters were left at the i-GSEA4GWAS defaults (8). Pathways/gene sets with FDR < 0.05 were included in the BDgene database.
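
Two rules in this analysis are easy to state concretely: a SNP maps to a gene when it lies within 5 kb upstream or downstream of the gene, and a pathway/gene set is kept when its FDR is below 0.05. The sketch below illustrates both under stated assumptions; it does not reproduce the internals of i-GSEA4GWAS. The `Gene` record, the function names, the gene symbols and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int  # first base of the gene
    end: int    # last base of the gene


def map_snp_to_genes(chrom, pos, genes, flank=5000):
    """Return symbols of genes whose span, extended by `flank` bp on
    both sides, contains the SNP position."""
    return [
        g.symbol
        for g in genes
        if g.chrom == chrom and g.start - flank <= pos <= g.end + flank
    ]


def significant_gene_sets(fdr_by_set, threshold=0.05):
    """Keep gene sets whose FDR is strictly below the threshold."""
    return sorted(name for name, fdr in fdr_by_set.items() if fdr < threshold)


# Toy data: hypothetical genes and FDR values.
genes = [
    Gene("GENE_A", "1", 10_000, 20_000),
    Gene("GENE_B", "1", 30_000, 40_000),
]
print(map_snp_to_genes("1", 25_000, genes))
print(significant_gene_sets({"SET_X": 0.01, "SET_Y": 0.05, "SET_Z": 0.20}))
```

Note that with a flanking window, one SNP can map to more than one gene when neighboring genes lie close together, as the first call shows.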

3.3 Intersection analysis of multi-disease candidate genes
Candidate-gene intersection analysis was performed on three data sets: BD-related genes from BDgene, SZ-related genes from SZGene (9) and MDD-related genes from MK4MDD, a multi-level knowledge base for major depressive disorder developed by our group (10). Genes overlapping between BDgene and SZGene, and between BDgene and MK4MDD, were considered potential shared genes for BD-SZ and BD-MDD respectively. For BD-SZ-MDD, we took all genes present in both the BD-SZ shared genes and the BD-MDD shared genes (each including genes from cross-disorder studies and from intersection analysis) and removed the genes already reported by cross-disorder studies investigating all three diseases; the remainder were considered BD-SZ-MDD potential shared genes from intersection analysis. The numbers of potential cross-disorder candidate genes are summarized in Table 2.
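
The set logic described above can be written out directly. The following is a minimal sketch with hypothetical function and argument names; the gene symbols A to F are placeholders, not entries from BDgene, SZGene or MK4MDD.

```python
def potential_shared_genes(bd, sz, mdd, cross_bd_sz, cross_bd_mdd, cross_bd_sz_mdd):
    """Compute potential shared genes from intersection analysis.

    bd, sz, mdd        -- candidate gene sets from the three databases
    cross_bd_sz        -- BD-SZ shared genes from cross-disorder studies
    cross_bd_mdd       -- BD-MDD shared genes from cross-disorder studies
    cross_bd_sz_mdd    -- genes from cross-disorder studies on all three diseases
    """
    bd_sz = bd & sz    # potential BD-SZ shared genes
    bd_mdd = bd & mdd  # potential BD-MDD shared genes

    # Pool both evidence sources before intersecting the two pairs.
    bd_sz_all = cross_bd_sz | bd_sz
    bd_mdd_all = cross_bd_mdd | bd_mdd

    # Drop genes already reported by three-disease cross-disorder studies.
    bd_sz_mdd = (bd_sz_all & bd_mdd_all) - cross_bd_sz_mdd

    return {"BD-SZ": bd_sz, "BD-MDD": bd_mdd, "BD-SZ-MDD": bd_sz_mdd}


# Toy input: placeholder gene symbols.
result = potential_shared_genes(
    bd={"A", "B", "C", "D"},
    sz={"A", "B", "E"},
    mdd={"B", "C", "F"},
    cross_bd_sz={"D"},
    cross_bd_mdd={"A", "D"},
    cross_bd_sz_mdd={"B"},
)
print({k: sorted(v) for k, v in result.items()})
```

In the toy input, gene D reaches the BD-SZ-MDD set purely through two-disease cross-disorder studies, and gene B is excluded because a three-disease study already covers it.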

3.4 Functional annotation and pathway enrichment analysis
To aid interpretation of the reported candidate SNPs and genes, all of them were fully annotated. SNP annotations were obtained from Ensembl, and the latest annotation is based on GRCh38/hg38. To better illustrate the function of genetic factors associated with BD, we also conducted linkage disequilibrium (LD) analysis to identify LD-proxies of the reported SNPs as novel candidate SNPs, which may include the causal variants. The LD data were downloaded from the HapMap FTP site (ftp.ncbi.nlm.nih.gov/hapmap/ld_data/2009-04_rel27/). They were compiled from merged genotype data from phases I+II+III (HapMap rel #27, NCBI B36) for markers up to 200 kb apart, submitted by the HapMap genotyping centers to the Data Coordination Center. The population used for LD analysis was matched to that of the original publication. SNPs meeting an r² threshold of 0.8 with a reported SNP were regarded as LD-proxies. Gene annotation covers gene-related GO terms, pathways from KEGG and BioCarta, and protein-protein interactions from HPRD. Finally, to provide clues to the functional pathways of these genes and to explore shared etiological mechanisms, pathway enrichment analyses were performed with DAVID 6.7 on the BD core genes from gene prioritization and on literature-reported shared genes with at least one positive result, highlighting the pathways most relevant to each gene list.
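
The LD-proxy rule can be illustrated with a short filter over pairwise LD records. This sketch assumes the LD data have already been parsed into (SNP, SNP, r²) tuples for the relevant population, and it assumes the threshold is inclusive (r² >= 0.8), which the text does not state explicitly. The function name and the SNP identifiers are hypothetical placeholders.

```python
def ld_proxies(reported_snps, ld_pairs, r2_threshold=0.8):
    """Map each reported SNP to the set of SNPs in strong LD with it.

    `ld_pairs` is an iterable of (snp_a, snp_b, r2) tuples; a pair is
    considered in either direction. The inclusive threshold is an assumption.
    """
    reported = set(reported_snps)
    proxies = {}
    for snp_a, snp_b, r2 in ld_pairs:
        if r2 < r2_threshold:
            continue
        for query, partner in ((snp_a, snp_b), (snp_b, snp_a)):
            if query in reported and partner != query:
                proxies.setdefault(query, set()).add(partner)
    return proxies


# Toy LD records with placeholder SNP identifiers.
pairs = [
    ("snp1", "snp2", 0.95),
    ("snp3", "snp1", 0.80),
    ("snp1", "snp4", 0.79),
    ("snp5", "snp6", 1.00),
]
result = ld_proxies(["snp1"], pairs)
print({k: sorted(v) for k, v in result.items()})
```

Checking both orientations of each pair matters because pairwise LD files generally list each marker pair once, so a reported SNP may appear in either column.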

4. Data update

Before its formal release, BDgene was updated three times: Beta V1.0 (data up to June 30, 2012), Beta V2.0 (up to October 31, 2012) and Beta V3.0 (up to March 1, 2013). From 2013, BDgene was updated quarterly, with versions named by the formula V+year+q+number (e.g. V2013q1). After several quarterly updates, however, we found that each update added only about 10 new papers. We therefore decided to update BDgene every half year, starting from the second half of 2014.
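
The quarterly version formula (V+year+q+number) can be expressed as a pair of small helpers. This is a sketch with hypothetical function names. It covers only the quarterly labels, since the text does not state how half-yearly releases are labelled.

```python
import re

# Matches labels such as "V2013q1": a four-digit year and a quarter 1-4.
_VERSION_RE = re.compile(r"^V(\d{4})q([1-4])$")


def version_label(year, quarter):
    """Build a quarterly version label, e.g. (2013, 1) -> 'V2013q1'."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    return f"V{year}q{quarter}"


def parse_version(label):
    """Split a quarterly version label into (year, quarter)."""
    match = _VERSION_RE.match(label)
    if match is None:
        raise ValueError(f"not a quarterly version label: {label!r}")
    return int(match.group(1)), int(match.group(2))


print(version_label(2013, 1))
print(parse_version("V2013q1"))
```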

To show the latest publications on BD genetics, studies published in the most recent month are downloaded automatically from PubMed every month using the search formula shown above. Irrelevant papers are then removed by manual filtering. The remaining papers are shown in the rolling "New Papers" window on the home page, and their data extraction and analysis are carried out in the next update.

References:
1. Lander, E. and Kruglyak, L. (1995) Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat Genet, 11, 241-247.
2. Chang S, Zhang W, Gao L, Wang J (2012): Prioritization of candidate genes for attention deficit hyperactivity disorder by computational analysis of multiple data sources. Protein & Cell. 3:526-534.
3. Serretti, A. and Mandelli, L. (2008) The genetics of bipolar disorder: genome 'hot regions,' genes, new potential candidates and future directions. Mol Psychiatry, 13, 742-771.
4. Ferreira MA, O'Donovan MC, Meng YA, Jones IR, Ruderfer DM, Jones L, et al. (2008): Collaborative genome-wide association analysis supports a role for ANK3 and CACNA1C in bipolar disorder. Nature genetics. 40:1056-1058.
5. Liu Y, Blackwood DH, Caesar S, de Geus EJ, Farmer A, Ferreira MA, et al. (2011): Meta-analysis of genome-wide association data of bipolar disorder and major depressive disorder. Mol Psychiatry. 16:2-4.
6. Green EK, Hamshere M, Forty L, Gordon-Smith K, Fraser C, Russell E, et al. (2012): Replication of bipolar disorder susceptibility alleles and identification of two novel genome-wide significant associations in a new bipolar disorder case-control sample. Mol Psychiatry.
7. Chen YA, Tripathi LP, Mizuguchi K (2011): TargetMine, an integrated data warehouse for candidate gene prioritisation and target discovery. PLoS ONE. 6:e17844.
8. Zhang, K., Cui, S., Chang, S., Zhang, L. and Wang, J. (2010) i-GSEA4GWAS: a web server for identification of pathways/gene sets associated with traits by applying an improved gene set enrichment analysis to genome-wide association study. Nucleic acids research, 38, W90-95.
9. Allen NC, Bagade S, McQueen MB, Ioannidis JPA, Kavvoura FK, Khoury MJ, Tanzi RE, Bertram L (2008) Systematic Meta-Analyses and Field Synopsis of Genetic Association Studies in Schizophrenia: The SzGene Database. Nat Genet 40(7): 827-34.
10. Guo, L., Zhang, W., Chang, S., Zhang, L., Ott, J. and Wang, J. (2012) MK4MDD: A Multi-Level Knowledge Base and Analysis Platform for Major Depressive Disorder. PLoS ONE, 7, e46335.