Guidelines for genetic association studies in Diabetologia
There is a widely accepted need to improve the robustness of published genetic association findings. We also need to provide the readership of the journal with information that allows a more complete assessment of the biological significance of the findings reported in these kinds of manuscripts. Submissions to Diabetologia should, therefore, pay careful attention to the following fundamental issues of study design. It is not intended that these represent absolute criteria for publication in Diabetologia (we don’t want to block otherwise interesting studies that fail to meet one or other of these). However, these guidelines set out the main factors that we expect our reviewers and Associate Editors to use in evaluating the quality of the manuscripts we receive.
Use of publicly available data from genome-wide association studies (GWAS)
In 2007, two of the type 2 diabetes GWAS (the Diabetes Genetics Initiative [DGI] http://www.broad.mit.edu/diabetes/, and WTCCC http://www.wtccc.org.uk) and one of the type 1 diabetes GWAS (http://www.wtccc.org.uk) made their case–control data available to bona fide researchers. Furthermore, results on ~2.2M directly genotyped and imputed SNPs from a meta-analysis of three high-density GWAS (FUSION, DGI and WTCCC: effective sample size ~9500), which incorporates data from a fourth study (deCODE), were also made publicly available (http://www.well.ox.ac.uk/DIAGRAM/). One study (the Diabetes Genetics Initiative http://www.broad.mit.edu/diabetes/) also made public its data from multiple diabetes-related quantitative traits. However, in August 2008 a report by Homer et al. (PLoS Genetics 4(8):e1000167. http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1000167; doi:10.1371/journal.pgen.1000167) showed that if a knowledgeable investigator had access to a specific DNA sample, s/he could determine with a high degree of certainty whether the owner of that DNA sample had participated in a given GWAS and infer case–control assignment, based solely on anonymised genotype counts across ~25,000 SNPs. This realisation motivated the withdrawal of such datasets from public websites while appropriate privacy protection measures are implemented. Nevertheless, investigators from the aforementioned GWAS presumably remain committed to dissemination of their results and validation of related findings by bona fide researchers. Therefore, subsequent studies now need to consider their findings in light of these data and any similar results that become available in the future.
We encourage authors of genetic association studies submitted to Diabetologia (whether on candidate genes, regions of previous linkage or the whole genome) to meta-analyse these datasets with their own, to provide a more definitive answer on whether or not the variation under consideration alters diabetes risk or related quantitative traits. This can be done in datasets downloaded prior to their withdrawal, by requesting the relevant data from the same GWAS investigators, or by formal collaboration; an attempt to pursue this kind of analysis should be documented in submitted manuscripts. If a variant under consideration has not been directly typed in the publicly available data then we encourage the use of close proxies (r² > 0.8 based on HapMap data for the appropriate population) instead, as long as the authors specify that a proxy was used and state its rs number. Authors should state the evidence for association with and without the previously reported data so that it is clear how much the new result is adding. Studies of >1000 individuals are likely to be needed to alter the evidence appreciably, given the approximately 3500 cases and 4500 controls for type 2 diabetes, 2000 cases and 3000 controls for type 1 diabetes, and 1500 controls and 1500 type 2 diabetes cases with quantitative trait data that are available. When results are inconsistent with the published evidence, authors should provide a potential rationale for the discrepancy.
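As an illustration of reporting the evidence with and without the previously reported data, the following is a minimal sketch of a fixed-effect, inverse-variance meta-analysis of log odds ratios. The choice of a fixed-effect model, the function name `fixed_effect_meta` and all numerical inputs are assumptions made for the example; they are not taken from any of the GWAS mentioned above, and a random-effects model may be more appropriate where heterogeneity is expected.

```python
import math


def fixed_effect_meta(estimates):
    """Pool (log odds ratio, standard error) pairs by inverse-variance weighting.

    Returns the pooled log odds ratio, its standard error and a two-sided
    p-value from the normal approximation.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return pooled_beta, pooled_se, p_value


# Hypothetical inputs: one summary estimate from previously reported data
# and one from a new, smaller study (values invented for illustration).
previous = [(math.log(1.15), 0.04)]
new_study = (math.log(1.12), 0.09)

for label, data in (
    ("previous data only", previous),
    ("previous + new study", previous + [new_study]),
):
    beta, se, p = fixed_effect_meta(data)
    print(f"{label}: OR = {math.exp(beta):.3f}, SE(log OR) = {se:.3f}, p = {p:.2e}")
```

Printing both rows side by side makes it visible how little a small new study changes a pooled estimate that is dominated by the larger existing datasets.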
Studies should include sufficient samples to have power to detect effect sizes that are reasonable given current understanding of the genetic architecture of complex traits. Power calculations should be included that make explicit the effect sizes that the study was powered to detect: such power calculations should guide the interpretation of the data. Wherever possible, all available samples should be typed: results based on only a portion of a larger sample are of limited interest.
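A power calculation of the kind requested here can be sketched as follows. This is a rough normal-approximation calculation for a two-sided allelic (allele-count) test in a case–control design, assuming that the control allele frequency is a fair proxy for the population frequency and that the effect is expressed as an allelic odds ratio. The function name `allelic_power` and the example parameters are hypothetical; dedicated power software should be preferred for a submitted manuscript.

```python
from statistics import NormalDist


def allelic_power(n_cases, n_controls, control_allele_freq, odds_ratio, alpha=0.05):
    """Approximate power of a two-sided test comparing allele frequencies.

    Each individual contributes two alleles, so the binomial variances use
    2 * n_cases and 2 * n_controls chromosomes.
    """
    std_normal = NormalDist()
    p0 = control_allele_freq
    # Case allele frequency implied by the allelic odds ratio.
    p1 = odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)
    se = (p1 * (1.0 - p1) / (2 * n_cases) + p0 * (1.0 - p0) / (2 * n_controls)) ** 0.5
    shift = abs(p1 - p0) / se
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Probability of exceeding the critical value in either tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical scenario: allele frequency 0.30, allelic odds ratio 1.2.
for n in (500, 1000, 2000, 5000):
    power = allelic_power(n, n, control_allele_freq=0.30, odds_ratio=1.2)
    print(f"{n} cases / {n} controls: power = {power:.2f}")
```

Repeating the calculation with a stricter `alpha` (for example a study-wide threshold) shows how quickly power falls once multiple testing is taken into account.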
Genetic association studies often involve testing of a large number of hypotheses (for example, multiple SNPs or haplotypes; multiple phenotypes; multiple analytical models; testing of multiple strata such as male/female, lean/obese). Manuscripts should feature explicit discussion of the consequences of multiple hypothesis-testing for the interpretation of the findings. Assessments of the significance of the findings should be related to the study-wide (or genome-wide) significance.
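One simple way to make the study-wide burden of testing explicit is a Bonferroni correction over every combination of SNP, phenotype, analytical model and stratum tested. The sketch below assumes this approach purely for illustration: the function name `bonferroni_threshold` and the counts are hypothetical, and Bonferroni is conservative when tests are correlated (for example, SNPs in LD), so permutation-based or other methods may be preferable.

```python
def bonferroni_threshold(alpha, n_snps, n_phenotypes=1, n_models=1, n_strata=1):
    """Return the per-test threshold and the total number of tests performed.

    The total is the product of every dimension along which hypotheses
    were multiplied; tests are treated as independent.
    """
    n_tests = n_snps * n_phenotypes * n_models * n_strata
    return alpha / n_tests, n_tests


# Hypothetical study: 20 SNPs, 3 phenotypes, 2 genetic models, 2 strata.
threshold, n_tests = bonferroni_threshold(0.05, n_snps=20, n_phenotypes=3,
                                          n_models=2, n_strata=2)
print(f"{n_tests} tests; per-test significance threshold = {threshold:.2e}")
```

Counting the tests this way also documents, for reviewers, which analyses were actually performed rather than only those reported as significant.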
Functional data (e.g. demonstration that a SNP alters expression) can strengthen association findings but the functional assays must have demonstrable relevance to the phenotype showing the association. Good functional data do not compensate for a poor association study.
Whole gene studies
Single SNP studies are acceptable where the SNPs typed have strong prior claims for involvement in the trait of interest. However, where feasible, studies should attempt to examine genome sequence variation across a gene.
Replication is highly desirable for all association studies, particularly for studies where extensive multiple testing means that study-wide significance is not clear. However, replication should only be claimed when it addresses the same variant, phenotype and genetic model (all too often other phenotypes or variants within a gene are offered as evidence of replication).
Authors should explicitly justify why the samples typed are well-suited to addressing the particular hypothesis posed. Care needs to be taken in the definition of cases using standardised criteria, and in the selection of appropriate control samples.
Well-performed association studies that represent “significant negative” findings are welcome provided the gene examined has clear relevance to disease pathogenesis (or has been implicated on the basis of prior association data).
The information provided by a manuscript can be improved if certain technical requirements are observed. Where relevant, we ask that authors:
- Provide rs numbers for all variants reported (these are quite easy to obtain for novel variants). Where these are provided, details of the assay (primer sequences, PCR conditions) can be kept brief;
- Provide explicit details of the measures taken to ensure genotyping accuracy (including for example % successful genotype calls, number of duplicated genotypes, % correspondence);
- Provide approved HUGO gene names in the appropriate case and italics;
- Use standard terminology for variants (see http://www.hgvs.org/mutnomen/);
- Describe LD relationships between typed variants;
- Provide information on departures from Hardy–Weinberg equilibrium (HWE), not only as a check for possible genotyping errors, but also because methods assuming HWE may be employed in the downstream association analyses (e.g. haplotype inference using the EM algorithm, or single-point analyses testing the multiplicative model);
- Provide raw genotype frequencies (that is, allele frequencies alone are not sufficient);
- Provide the criteria they have used to select tagSNPs. Authors should also carry out association analyses consistent with the tagging method employed, e.g. if an aggressive multimarker tagging approach has been followed, appropriate analyses are required to retrieve all the information captured;
- Denote the boundaries they have considered when studying a gene of interest (e.g. 5 kb upstream of transcription initiation), and indicate which portions of the gene have been examined (e.g. exons and exon/intron boundaries).
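The request above for raw genotype data and for information on departures from Hardy–Weinberg equilibrium can be illustrated with a minimal sketch of the one-degree-of-freedom chi-square test on genotype counts. The function name `hwe_chi_square` and the counts are hypothetical; for small samples or rare alleles an exact test is generally preferred to this approximation.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for departure from Hardy-Weinberg equilibrium.

    Takes raw genotype counts for a biallelic variant and returns the
    chi-square statistic and its p-value.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p <= 0.0 or q <= 0.0:
        raise ValueError("variant is monomorphic; HWE test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts in controls (AA, AB, BB).
chi2, p_value = hwe_chi_square(298, 489, 213)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

The test needs the three genotype counts as input, which is one practical reason why allele frequencies alone are not sufficient.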