Original Article
12(5); 35-43

Applying filtration steps to interpret the results of whole-exome sequencing in a consanguineous population to achieve a high detection rate

Department of Pediatrics, College of Medicine, Qassim University, Qassim, Saudi Arabia

Address for correspondence: Ahmed A. Alfares, Department of Pediatrics, College of Medicine, Qassim University, Qassim, Saudi Arabia. Tel.: 00966163800050. E-mail: fars@qu.edu.sa

Licence
This is an open-access article distributed under the terms of the Creative Commons Attribution-Non Commercial-Share Alike 4.0 License, which allows others to remix, transform, and build upon the work non-commercially, as long as the author is credited and the new creations are licensed under the identical terms.
Disclaimer:
This article was originally published by Qassim University and was migrated to Scientific Scholar after the change of Publisher.

Abstract

Objective:

Interpreting whole-exome sequencing (WES) data is challenging and requires extensive time and effort to review all the variants in the variant call format (vcf) file. Here, we examined the application of custom filters to narrow the number of candidate variants requiring further analysis in a consanguineous population.

Methods:

In 100 cases undergoing WES, we applied a custom filtration process to look primarily for homozygous variants in autosomal recessive (AR) disorders and secondarily for variants in autosomal dominant or X-linked disorders.

Results:

Most identified disease-causing variants were homozygous in AR disorders. By applying our custom filtration process, we narrowed the number of candidate variants requiring further analysis to 5–15 per case while maintaining a high detection rate and completing analysis in around 45 min.

Conclusion:

A custom filtration process and strategy targeting a specific population provide excellent detection rates in less time and should be considered as a first-tier laboratory workflow for analysis.

Keywords

Consanguinity
filtration steps
variant interpretation
whole-exome sequencing

Introduction

Interpreting whole-exome sequencing (WES) data is challenging: an estimated six hundred thousand rare or novel variants per person are spread across the genome, and many are nonsynonymous variants at conserved loci. Furthermore, a high number of these variants are predicted to be deleterious,[1] and genome variation annotation databases are highly inaccurate.[2] Together, these factors create significant challenges for testing laboratories when implementing WES and interpreting its data.

WES is a method that sequences and analyzes the whole coding region of the genome. The reported diagnostic yield of WES ranges from 15% to 20%.[3,4] However, in consanguineous populations, this yield could reach up to 49%,[5,6] a higher detection rate that could be related to the simpler interpretation of variants at the genotype level.[5]

Here, we review the implementation of custom filtration steps during analysis of raw data from WES samples of a consanguineous population. Using this method, we narrowed down the number of candidate variants requiring classification and thus the time required to complete the analysis of a single case.

Methods

Molecular sequencing was done at a commercial CAP-accredited laboratory. Raw data, including fastq, BAM, and variant call format (vcf) files, were provided for the analysis. We applied custom-designed filters to assess and analyze the vcf files. The design comprises two parts. The first, automated using a bioinformatics pipeline, uses three steps to create a processed vcf file: (1) alignment, (2) variant calling, and (3) variant classification. These steps could vary based on the sequencing system and the type of capture kit [Figure 1]. The initial filters checked sequencing quality controls and population allele frequency, taken either from open databases or from a local database. The second part is a manual filtration process using the commercial VarSeq software from GoldenHelix (http://www.goldenhelix.com/). Using this software, we built custom steps and filter chains [Figure 1] and annotated all identified variants against ClinVar; any variant previously reported in ClinVar was set aside for further evaluation. However, to test our filtration steps, in this work variants in ClinVar were considered at a later stage, after completing the filtration process. The subsequent step involves filtration based on the mode of inheritance as follows:

Figure 1: The automated bioinformatics pipeline workflow to generate processed variant call format files followed by manual filtration

Autosomal recessive (AR), autosomal dominant (AD), or X-linked (XL). Next, we considered the allele state: homozygous, heterozygous, or, in cases of XL inheritance, hemizygous. We did not consider other modes of inheritance in this study, such as mitochondrial or digenic. Furthermore, as a limitation of WES, we could not assess variants in noncoding regions of the genome.
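The allele-state step can be illustrated with a minimal Python sketch. It is not the VarSeq implementation used in this work. It assumes the genotype is available as a standard vcf GT string (for example "0/1" or "1|1"), and the function name, the sex argument, and the rule that any non-reference call on the X chromosome in a male sample counts as hemizygous are illustrative assumptions.

```python
def allele_state(genotype: str, chrom: str, sex: str) -> str:
    """Classify a vcf GT string as homozygous, heterozygous, or hemizygous.

    Assumption: a non-reference call on chrX in a male sample is treated as
    hemizygous; pseudoautosomal regions are ignored in this sketch.
    """
    # Treat phased ("|") and unphased ("/") genotypes the same way.
    alleles = genotype.replace("|", "/").split("/")
    called = [a for a in alleles if a != "."]   # drop missing alleles
    alt = [a for a in called if a != "0"]       # keep non-reference alleles
    if not alt:
        return "reference_or_missing"
    on_x = chrom.replace("chr", "").upper() == "X"
    if on_x and sex == "male":
        return "hemizygous"
    if len(called) == 2 and called[0] == called[1]:
        return "homozygous"
    return "heterozygous"
```

A call such as allele_state("1/1", "chr7", "female") falls into the homozygous branch, which is the state the first filter in this study targets.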

Finally, we looked for the impact of the variant at the transcript level. During this assessment, we evaluated the most severe impacts (loss-of-function [LOF], missense, silent, and intronic). We classified variants as pathogenic or likely pathogenic based on the American College of Medical Genetics and Genomics (ACMG) guidelines.[7] Variant assessments included clinical information with physical examination, further laboratory testing or imaging, segregation analysis, genotype and phenotype correlation, previous publication, or de novo assessment of the variant. Results were considered “positive” when a likely disease-causing variant was identified based on the ACMG guidelines. As a proof of concept, we used 100 vcf files from consanguineous cases that underwent solo WES.
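The first filter chain described above (quality control, allele frequency, AR inheritance, homozygous state, and transcript-level impact) can be sketched in Python as follows. This is an illustration under stated assumptions, not the VarSeq filter chain itself: the Variant record, its field names, the gene labels, and the max_af cutoff of 0.01 are hypothetical, since the text does not give a data model or a frequency threshold.

```python
from dataclasses import dataclass

# Severity order follows the impacts named in the text, most severe first.
IMPACT_RANK = {"LOF": 0, "missense": 1, "silent": 2, "intronic": 3}


@dataclass
class Variant:
    gene: str
    inheritance: str    # mode of inheritance of the gene: "AR", "AD", or "XL"
    allele_state: str   # "homozygous", "heterozygous", or "hemizygous"
    impact: str         # one of the keys of IMPACT_RANK
    allele_freq: float  # population allele frequency
    passed_qc: bool     # result of the sequencing quality checks


def first_tier(variants, max_af=0.01):
    """Keep homozygous LOF or missense variants in AR genes, most severe first.

    max_af is a hypothetical cutoff chosen for illustration only.
    """
    kept = [
        v for v in variants
        if v.passed_qc
        and v.allele_freq <= max_af
        and v.inheritance == "AR"
        and v.allele_state == "homozygous"
        and v.impact in ("LOF", "missense")
    ]
    return sorted(kept, key=lambda v: IMPACT_RANK[v.impact])


# Hypothetical input: only GENE_B and GENE_A survive the chain.
example = [
    Variant("GENE_A", "AR", "homozygous", "missense", 0.0005, True),
    Variant("GENE_B", "AR", "homozygous", "LOF", 0.0, True),
    Variant("GENE_C", "AD", "heterozygous", "LOF", 0.0, True),
    Variant("GENE_D", "AR", "homozygous", "silent", 0.0, True),
    Variant("GENE_E", "AR", "homozygous", "LOF", 0.2, True),
]
candidates = first_tier(example)
```

Sorting by impact places LOF candidates ahead of missense ones, so the variants reviewed first are those with the most severe predicted effect.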

Results

After applying the filtration steps [Figure 1], we could narrow the candidate variants to only around 5–10 LOF variants and 10–15 missense variants found in the homozygous state in genes with the AR mode of inheritance.

Table 1: Average number of identified variants in processed vcf files with hit rates for the whole cohort and separated by mode of inheritance and allele state

Many of these variants can be eliminated easily just by looking at the phenotype; in most cases, the gene’s clinical phenotype reported in the literature and the patient phenotype have no relation. Further variants can be eliminated because they have high allele frequencies in our population or because our laboratory has previously observed them in the homozygous state in unaffected individuals. For each case, the time required to assess the remaining 5–10 variants (LOF and missense) averages 45 min, with an 82% probability of identifying disease-causing variants using this filter chain. For cases in which no disease-causing variant was identified after the first filter, we proceeded to the filter with the second-highest hit rate. For AD disorders and variants in the heterozygous or homozygous state, the number of identified variants was higher (130–150 variants), both missense and LOF. On average, 50–70 variants were identified in the other possible modes of inheritance in different scenarios, such as compound heterozygous or XL. However, the diagnostic yield is low for these variants, accounting for around 7% of all positive cases [Table 1]. Our overall hit rate for the full sample of 100 cases with WES is 45% [Table 2], similar to the existing literature.[5]
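The tiered order of analysis described above, in which the highest-yield filter is reviewed first and later filters are used only when no disease-causing variant is found, can be sketched in Python. This is a minimal illustration, not the laboratory workflow itself. The dictionary keys, the tier definitions (in particular, treating heterozygous variants in AR genes as the compound heterozygous candidates), and the is_disease_causing callback, which stands in for the manual ACMG-based review, are all assumptions.

```python
def analyse_case(variants, tiers, is_disease_causing):
    """Return (tier name, hits) for the first tier containing a causal variant.

    is_disease_causing is a hypothetical callback standing in for the manual
    review of each candidate; tiers are tried in the order given.
    """
    for name, predicate in tiers:
        hits = [v for v in variants if predicate(v) and is_disease_causing(v)]
        if hits:
            return name, hits
    return None, []


# Tiers ordered by the hit rates reported in the text (highest first).
TIERS = [
    ("AR homozygous",
     lambda v: v["inheritance"] == "AR" and v["state"] == "homozygous"),
    ("AD heterozygous or homozygous",
     lambda v: v["inheritance"] == "AD"
     and v["state"] in ("heterozygous", "homozygous")),
    ("other (compound heterozygous, XL)",
     lambda v: v["inheritance"] == "XL"
     or (v["inheritance"] == "AR" and v["state"] == "heterozygous")),
]

# Hypothetical case: the AR homozygous candidate is benign on review,
# so the analysis falls through to the AD tier.
case = [
    {"gene": "GENE_A", "inheritance": "AR", "state": "homozygous", "causal": False},
    {"gene": "GENE_B", "inheritance": "AD", "state": "heterozygous", "causal": True},
]
tier, hits = analyse_case(case, TIERS, lambda v: v["causal"])
```

Ordering the tiers by hit rate means most positive cases are resolved after reviewing only the short first-tier list, and the larger AD and XL lists are opened only when needed.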

Table 2: List of all identified disease-causing variants in processed vcf files from 100 consanguineous samples that underwent whole-exome sequencing

Discussion

WES has become a valuable tool in clinical settings for obtaining molecular diagnoses. Designing methods and tools that improve the diagnostic accuracy of WES will improve healthcare by identifying the molecular defects underlying rare disorders. Consanguinity impacts disorder incidence since deleterious and disease allele variations are known to occur as a result of long runs of homozygosity[8] or missense substitutions in a homozygous state.[9] In general, consanguineous marriages are expected to result in a high incidence of AR genetic disorders. The high rate of consanguinity in Saudi Arabia leads to possible founder effects for many genetic disorders and to population-specific AR genetic disorders.[10,11] It is critical to design a custom workflow focusing on the target population, starting from the bioinformatics pipeline and proceeding to variant analysis and classification. For example, in our population, extensive effort during pipeline and workflow design focused on homozygous variants, which present higher chances of identifying disease-causing variants due to the high rates of consanguinity. Similar approaches have been applied before, looking at autozygosity regions in the genome.[12] However, with advances in technology, we can achieve better resolution and examine the data at the variant level. Furthermore, a consanguineous population has fewer AD variants, which therefore require less attention.

Furthermore, while specific populations already have custom databases, population-specific bioinformatics and filtration steps may enable better and faster interpretation of the results. By applying our custom filters to identify only homozygous variants in AR disorders, we could substantially narrow the number of candidate variants while still achieving a high hit rate: about 82% of the cases with an identifiable disease-causing variant (positive cases), or around 36% of the whole cohort. Given the manageable number of variants requiring additional analysis, we achieved this in around 45 min, compared to 5 h without the filtration, and our hits account for a large percentage of positive cases.
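The relation between these percentages can be checked with a short back-of-the-envelope calculation in Python. It uses only figures stated in the text (100 cases, a 45% overall hit rate, an 82% share for the first filter, and 45 min versus 5 h); the variable names are illustrative, and the exact case counts are those listed in Table 2, not the rounded values computed here.

```python
cohort = 100                      # cases that underwent WES
overall_hit_rate = 0.45           # positive cases / cohort
first_tier_share = 0.82           # share of positive cases found by the AR homozygous filter

positive_cases = round(overall_hit_rate * cohort)      # 45 positive cases
first_tier_cases = first_tier_share * positive_cases   # about 36.9 cases
share_of_cohort = first_tier_cases / cohort            # about 0.37 of the cohort

minutes_filtered = 45             # analysis time with the filter chain
minutes_unfiltered = 5 * 60       # analysis time without filtration
speedup = minutes_unfiltered / minutes_filtered        # between 6x and 7x
```

The product of the two rates gives roughly 36 to 37 cases out of 100, which agrees with the figure of around 36% of the whole cohort, and the filtered analysis is between six and seven times faster than the unfiltered one.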

In cases with different modes of inheritance (AD, XL), the number of identified variants is still high and would still require additional time to complete the analysis. This is unsurprising since consanguinity has little to no impact on these disorders’ underlying genotype. However, given the high rate of consanguinity, analysis of these variants should follow the full and complete analysis of AR disorder variants.

In conclusion, WES is a very useful tool to identify disease-causing variants, particularly in a consanguineous population, where higher detection rates are achieved. In this report, we verified that custom filtration steps and analysis that look primarily for homozygous variants in AR disorders achieve the highest possible detection rates in less time, and testing laboratories are encouraged to consider this process for the first-tier analysis of WES raw data.

References

  1. A global reference for human genetic variation. Nature. 2015;526:68-74.
  2. A variant by any name: Quantifying annotation discordance across tools and clinical databases. Genome Med. 2017;9:7.
  3. Clinical exome sequencing for genetic identification of rare Mendelian disorders. JAMA. 2014;312:1880-7.
  4. Whole exome sequencing of suspected mitochondrial patients in clinical practice. J Inherit Metab Dis. 2015;38:437-43.
  5. A multicenter clinical exome study in unselected cohorts from a consanguineous population of Saudi Arabia demonstrated a high diagnostic yield. Mol Genet Metab. 2017;121:91-5.
  6. The landscape of genetic diseases in Saudi Arabia based on the first 1000 diagnostic panels and exomes. Hum Genet. 2017;136:921-39.
  7. Standards and guidelines for the interpretation of sequence variants: A joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17:405-24.
  8. Long runs of homozygosity are enriched for deleterious variation. Am J Hum Genet. 2013;93:90-102.
  9. Deleterious- and disease-allele prevalence in healthy individuals: Insights from current predictions, mutation databases, and population-scale resequencing. Am J Hum Genet. 2012;91:1022-32.
  10. Consanguineous marriage in an urban area of Saudi Arabia: Rates and adverse health effects on the offspring. J Community Health. 1998;23:75-83.
  11. Map of autosomal recessive genetic disorders in Saudi Arabia: Concepts and future directions. Am J Med Genet A. 2012;158A:2629-40.
  12. Impact of new genomic tools on the practice of clinical genetics in consanguineous populations: The Saudi experience. Clin Genet. 2013;84:203-8.
