Association studies

As outlined above, linkage studies have been less successful than originally hoped in identifying the genes underlying complex diseases. Reasons for this include the low heritability of most common diseases and the heterogeneity of the phenotypes investigated. This is particularly true of many rheumatic diseases, which show wide variation in clinical presentation, progression, severity and associated autoantibody profiles. However, the major reason for the limited success of linkage-based approaches is that they are inherently underpowered to detect genes of modest effect size when the disease alleles are common. Most effect sizes important in common disease are likely to be odds ratios of <1.5, and disease-associated alleles are likely to be common (>10%). By way of example, both linkage and association studies have consistently identified the chromosome 6p region as a susceptibility locus for RA. Here the HLA-DRB1 gene is estimated to have an effect size of ~3, and carriage of the shared epitope is common. However, where disease alleles are rare but have large effect sizes (e.g., CARD15 variants conferring susceptibility to Crohn's disease), linkage studies are more likely than association studies to identify the locus. By contrast, in the situation expected for most complex diseases, where effect sizes are small but causal alleles occur at a reasonable frequency in the population, association studies have much greater power than linkage studies to detect a disease locus (Fig. 2). Secondly, association relies on linkage disequilibrium (LD) between the disease and marker alleles, and LD extends only over short distances. Association-based methods therefore also have greater power to localise a disease gene.
Finally, studies on the association between genetic variants and disease can suggest pathogenic mechanisms and have the potential for direct clinical application by providing markers of risk, diagnosis, prognosis, and, possibly, therapeutic targets [43].
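In a case-control study, the "effect size" of a risk allele is usually expressed as an odds ratio (OR). The Python sketch below computes an allelic OR and its 95% confidence interval (Woolf's log method) from a 2 × 2 table of allele counts. The function name and the counts are hypothetical and are not taken from any study cited here.

```python
import math
from statistics import NormalDist


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, level=0.95):
    """Odds ratio for a risk allele from a 2x2 table of allele counts,
    with a Woolf (log-scale) confidence interval.

    Assumes all four counts are > 0; no continuity correction is applied.
    """
    or_ = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) under Woolf's method
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Hypothetical counts: 500 cases and 500 controls give 1000 alleles per group
or_, lo, hi = allelic_odds_ratio(300, 700, 250, 750)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An OR of about 1.3, as in this made-up example, falls in the modest range expected for most common-disease alleles.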

A number of genes underlying common complex rheumatic disorders have been identified (Tab. 2). These studies all adopted a candidate gene approach. In this approach, genes or regions are chosen for investigation on the basis of prior knowledge of biological pathways, information gleaned from animal models or similar diseases, or because they map within a region of linkage. Most association studies have investigated single nucleotide polymorphism (SNP) markers because these are abundant in the genome, amenable to high-throughput genotyping and may affect gene function. To detect association with a disease gene, the markers tested must either be causal or be strongly correlated with the causal variant. It is estimated that up to 80% of the genome falls into segments of high LD. Within these segments, variants are strongly correlated with each other and most chromosomes carry one of only a few common haplotypes. It is, therefore, inefficient to genotype all the SNPs within a gene. Cost efficiency is gained by genotyping a subset of markers, tagging SNPs, which capture most of the allelic variation in a region [44]. Information on LD patterns across the genome, with which to select tagging SNPs, has been made available by initiatives such as the HapMap project, which has created a genome-wide catalogue of common haplotype blocks in multiple human populations [45].

Figure 2

[Plot of effect size (odds ratio) against allele frequency, from 1% to >50%; CTLA4 and PPARG are labelled on the plot.]

Whether linkage or association methods are better suited to detecting a gene depends on allele frequency and effect size, assuming typical sample sizes of several hundred families or case-control pairs. Using currently available methods, it is unlikely that rare alleles (<3%) conferring small risks (<1.2) will be detected in any study.
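The Python sketch below illustrates tag SNP selection with a simple greedy scheme. It rests on several assumptions. The haplotypes are already phased and coded as strings of '0'/'1' alleles. LD is measured by pairwise r². The r² ≥ 0.8 cut-off is a common but arbitrary choice. The function names are hypothetical, and published tag-selection tools are more sophisticated, so treat this as illustrative only.

```python
from itertools import combinations


def r_squared(haplotypes, i, j):
    """Pairwise LD (r^2) between biallelic SNPs i and j.

    `haplotypes` is a list of phased haplotypes, each a string of '0'/'1'
    alleles with one character per SNP. Monomorphic SNPs give r^2 = 0.
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n
    p_b = sum(h[j] == "1" for h in haplotypes) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Choose tagging SNPs so that every SNP has r^2 >= threshold with a tag."""
    m = len(haplotypes[0])
    # For each SNP, the set of SNPs it "captures", including itself
    captures = {k: {k} for k in range(m)}
    for i, j in combinations(range(m), 2):
        if r_squared(haplotypes, i, j) >= threshold:
            captures[i].add(j)
            captures[j].add(i)
    untagged, tags = set(range(m)), []
    while untagged:
        # Greedily pick the SNP that covers the most still-untagged SNPs
        best = max(range(m), key=lambda k: len(captures[k] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return sorted(tags)


# Toy data: SNPs 0 and 1 are in perfect LD, as are SNPs 2 and 3
haps = ["0000", "0011", "1100", "1111", "0000", "1111"]
print(greedy_tag_snps(haps))
```

In the toy data, two of the four SNPs are enough to capture all of them. This is the redundancy that makes genotyping every SNP in a high-LD region inefficient.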

An alternative approach would be to genotype only those SNPs encoding missense mutations. The argument is that these are the most likely to be disease causal, and that the number of SNPs requiring genotyping would be markedly reduced [46]. This approach has been successful, for example, in identifying the PTPN22 gene, which has been associated with a number of autoimmune diseases (Chapter 8). However, for diseases with a late age at onset, including many common diseases, non-coding regulatory variants are likely to play an important role. Such variants are less likely to have dramatic effects on disease risk and less likely to be subject to strong negative selection. Examples supporting this view are emerging in the literature. Regulatory variants in the CTLA-4 gene, for instance, have been associated with susceptibility to autoimmune thyroid disease and type 1 diabetes [47].

Many gene-disease association studies have been performed and a large number of positive associations have been reported. Few, however, have been widely replicated and led to the identification of disease-causal mutations.

Table 2 - Genes widely replicated with functional support in rheumatology

Gene     Disease                  Population        Polymorphism         Refs
PTPN22   RA, SLE, T1D, AIT, JIA   Europeans/US      missense mutation    [64-69]
PDCD1    SLE                      Europeans/US      intron 1             [70]
PADI4    RA                       Japanese/Korean   haplotype            [42]
FCRL3    RA, SLE, AITD            Japanese          promoter variant     [71]
MIF      JIA                      European          promoter haplotype   [72, 73]

Failure to exclude the possibility that results have arisen by chance is probably the major reason why many apparent associations are not replicated [48]. As information about SNPs becomes increasingly available and genotyping costs continue to fall, more tests are performed. More results are therefore likely to show association by chance alone (false positives) if significance thresholds are not appropriately adjusted. Applying a Bonferroni correction to SNPs that are tightly linked is overly conservative, because the tests are not independent. Permutation testing has been proposed as an alternative that tests empirically the probability of having observed an association by chance [49]. Thresholds for declaring significant association also need to be reviewed in light of the increasing number of hypotheses being tested in candidate gene studies. It has been suggested that the p-value for declaring statistical significance in candidate gene association studies be lowered from 5 × 10^-2 to 5 × 10^-5 [48, 49]. This will require at least a three-fold increase in sample size to retain the same power [48]. Increasing the prior probability that a tested association is genuine should also reduce the false-positive rate [50]. This can be achieved by selecting for investigation genes that either map to regions of linkage or show evidence of association in a different population.
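The three-fold figure can be checked with a standard normal-approximation argument. For a fixed effect size, the required sample size scales with (z_{1-α/2} + z_{1-β})², where α is the two-sided significance threshold and 1-β is the power. The Python sketch below compares thresholds of 5 × 10^-2 and 5 × 10^-5. The function name and the 80% power default are assumptions for illustration.

```python
from statistics import NormalDist


def sample_size_ratio(alpha_old, alpha_new, power=0.8):
    """Factor by which sample size must grow to keep the same power when the
    two-sided threshold is tightened from alpha_old to alpha_new.

    Normal approximation: required n is proportional to
    (z_{1-alpha/2} + z_{power})^2 for a fixed effect size.
    """
    z = NormalDist().inv_cdf
    z_power = z(power)
    old = (z(1 - alpha_old / 2) + z_power) ** 2
    new = (z(1 - alpha_new / 2) + z_power) ** 2
    return new / old


print(round(sample_size_ratio(5e-2, 5e-5), 2))
```

At 80% power the ratio comes out at roughly three, in line with the estimate quoted in the text.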

There is a wealth of data available from published whole-genome linkage scans for a number of rheumatic diseases, including osteoarthritis, systemic lupus erythematosus and, as outlined above, RA. Furthermore, when selecting genes for disease association studies, some consideration should be given to biological plausibility. For example, the gene should be expressed in a relevant tissue or be involved in the disease pathway. Large, well-powered studies are required to assess the significance of results robustly using stringent criteria. A small sample size in the first reported association has previously been shown to predict inconsistent replication in subsequent studies [51], because such a study has low power to detect anything other than major effects. Indeed, of seven studies reporting a positive association with a sample size of less than 150, only two were subsequently replicated [52].

Replication of findings in an independent cohort provides compelling evidence that the original association was real. Replication studies should, however, be powered to detect a smaller effect size than that reported in the original study. This is because of the 'winner's curse', an upward bias in the effect sizes reported by original studies [53]. The bias was demonstrated in an investigation of 55 meta-analyses. Subsequent research suggested weaker or no association compared with the strong associations suggested by the first studies [51].
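The winner's curse arises when power is modest. Under those conditions, the studies that cross the significance threshold tend to be those whose estimates happen to overshoot the true effect. The small Python simulation below illustrates this. The true effect (log OR 0.15, an OR of about 1.16), the per-study standard error and the number of simulated studies are arbitrary assumptions.

```python
import random
import statistics


def winners_curse(true_log_or=0.15, se=0.1, z_crit=1.96, n_studies=20000, seed=1):
    """Simulate many equally sized studies of one true effect and return the
    mean estimated log(OR) over all studies and over 'significant' ones.

    Each study's estimate is drawn from Normal(true_log_or, se). A study counts
    as a discovery only if it is significant in the risk direction (z > z_crit).
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_log_or, se) for _ in range(n_studies)]
    significant = [b for b in estimates if b / se > z_crit]
    return statistics.fmean(estimates), statistics.fmean(significant)


all_mean, sig_mean = winners_curse()
print(f"mean log OR, all studies: {all_mean:.3f}; significant only: {sig_mean:.3f}")
```

The mean over all studies stays close to the true value. The mean over "discoveries" is clearly inflated, which is why replication studies should be powered for a smaller effect than first reported.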

Population stratification has been proposed as another reason why false-positive associations may occur. It arises when the population studied actually comprises subpopulations that differ both in allele frequency and in the prevalence of the disease under study. As more information on SNP frequencies in different populations becomes publicly available, it is increasingly recognised that such differences exist. Very little difference is seen between individuals of Northern European descent and white US groups. Individuals of Hispanic, African and Asian descent, however, can have very different allele frequencies [54]. This can have important consequences for association. For example, the PTPN22 missense mutation has been widely associated with a variety of autoimmune diseases in US and European populations, but the polymorphism does not exist in Asian populations. Despite these concerns, several studies have now demonstrated that major population stratification is unlikely to be a problem in well-matched case-control cohorts [55]. Methods also exist to correct for stratification, if present, by genotyping a limited number of markers in unlinked regions of the genome [56]. An alternative is to use family-based methods such as the transmission disequilibrium test, which eliminate the possibility of stratification. However, these methods are less powerful than case-control methods, are inefficient in their use of the available information and are more prone to technical artefacts [49]. Furthermore, when investigating gene-environment interactions, family-based methods may lead to overmatching of environmental exposures [43]. Hence, there has been a resurgence of interest in using large, well-powered case-control designs to investigate associations.
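The transmission disequilibrium test avoids stratification because all of its comparisons are made within families. It compares, for heterozygous parents, the allele transmitted to an affected child with the allele not transmitted. The Python sketch below computes the standard TDT statistic, (b - c)²/(b + c), where b and c are the two transmission counts. The counts in the example are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test for one biallelic marker.

    transmitted:   heterozygous parents who passed the candidate allele to an
                   affected child (b)
    untransmitted: heterozygous parents who passed the other allele (c)
    Homozygous parents are uninformative and are not counted.
    Returns the chi-square statistic (1 df) and its p-value.
    """
    b, c = transmitted, untransmitted
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


chi2, p = tdt(transmitted=130, untransmitted=90)
print(f"TDT chi-square = {chi2:.2f}, p = {p:.4f}")
```

Only transmissions from heterozygous parents carry information. This is one reason family-based designs use the available data less efficiently than case-control designs.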

The power of a study to detect an association with disease depends on both the sample size and the frequency of the alleles that predispose to disease. If a susceptibility allele has a minor allele frequency of less than 10% in the population and a small effect size, many thousands of individuals will be required to detect association with disease. As noted above, most effect sizes important in common disease are likely to be odds ratios of <1.5. The 'common disease, common variant' hypothesis starts from the observation that common diseases generally arise later in life and thus have little effect on reproductive ability. It follows that the variants responsible are unlikely to be subject to negative selection and will have reached a reasonably high frequency in the population. However, even if this is true and common diseases are caused by alleles with frequencies >10%, large sample sizes are still crucial to detect modest effect sizes. In practice, some diseases are likely to have associated genes with low minor allele frequencies (which may have large effect sizes), while others will be due to common variants with small effect sizes [57, 58].
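The dependence of sample size on allele frequency can be sketched with the usual normal approximation for comparing two allele frequencies. The Python sketch below makes three simplifying assumptions: a per-allele (multiplicative) model with two independent alleles per person, a genotyped SNP that is itself the causal variant, and equal numbers of cases and controls. The function name is hypothetical. Imperfect LD between marker and causal variant would raise the numbers further. For an odds ratio of 1.2, the estimate comes to several thousand cases at a 5% risk-allele frequency and about a thousand at 30%.

```python
import math
from statistics import NormalDist


def cases_needed(control_freq, odds_ratio, alpha=0.05, power=0.8):
    """Approximate number of cases (plus an equal number of controls) needed
    for a two-sided allelic test to detect a risk allele.

    Assumes two independent alleles per person (multiplicative model in
    Hardy-Weinberg equilibrium), that the genotyped SNP is the causal
    variant, and the normal approximation for two proportions.
    """
    z = NormalDist().inv_cdf
    p0 = control_freq
    # Risk-allele frequency in cases implied by the allelic odds ratio
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    p_bar = (p0 + p1) / 2
    term_alpha = z(1 - alpha / 2) * math.sqrt(2 * p_bar * (1 - p_bar))
    term_power = z(power) * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    alleles_per_group = (term_alpha + term_power) ** 2 / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2)  # two alleles per person


for freq in (0.05, 0.10, 0.30):
    print(f"risk-allele frequency {freq:.2f}: ~{cases_needed(freq, 1.2)} cases")
```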

Once a genetic association has been robustly demonstrated, it is important to determine whether the associated variant is itself disease causal or is highly correlated with a disease-causal polymorphism. New statistical, bioinformatic and molecular biology approaches for assessing the functional significance of associated variants are being developed. These should aid the interpretation of associations between genotype and disease [59].

It is clear that well-designed candidate gene association studies show great promise in contributing to the identification of the genetic causes of common diseases. Indeed, there have already been a number of successes using this approach, but also many reported associations that have not been replicated. A number of recommendations on study design, and guidelines to help referees judge the quality of reports, are now available [43, 48, 60]. In general terms, these highlight the importance of four things:
- good laboratory practice, including the use of negative controls and duplicates and the blinding of laboratory personnel;
- rigorous statistical analysis, including assessment of whether genotype frequencies conform to Hardy-Weinberg expectations, power and effect size considerations, and correction for multiple testing;
- replication in independent data sets;
- assessment of biological plausibility.
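One check listed above is whether genotype frequencies conform to Hardy-Weinberg expectations. A marked departure in controls can flag genotyping error. A common form of the check is a chi-square goodness-of-fit test with one degree of freedom, sketched below in Python. The function name and counts are hypothetical, both alleles are assumed to be observed, and for small counts or rare alleles an exact test is generally preferred.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of genotype counts against Hardy-Weinberg
    proportions for a biallelic SNP (1 degree of freedom).

    n_aa, n_ab, n_bb: observed counts of genotypes AA, AB and BB.
    Assumes both alleles are present (otherwise an expected count is zero).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip((n_aa, n_ab, n_bb), expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
    return chi2, p_value


# A marked deficit of heterozygotes, as genotyping error might produce
print(hwe_chi_square(40, 20, 40))
```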
