
Nanopore DNA Sequencing

Nanopore NGS Panel for Examination of Key Genes on Cannabinoid and Terpenoid Synthetic Pathways

Jerian Reynolds*, May Cui*, Manmeet Kaur*, Kelly Sveinson Ŧ and John Brunstein*; Segra International Corporation, Genotyping and *Molecular Lab Services Division, Richmond, B.C. Canada V6W 1M2; ŦLangara College, Vancouver, B.C. Canada

Presented by Dr. John Brunstein: https://www.youtube.com/watch?v=kwx2gezErEg&t=136s


Cannabis varieties are commonly bred or selected for their expressed cannabinoid and/or terpenoid profiles. Allelic variation in key genes of the cannabinoid and/or terpenoid pathways would be expected to have phenotypic impact. An understanding of significant variation in these genes could therefore help underpin marker-assisted selection strategies for breeding novel varieties. To investigate, we have applied the Oxford Nanopore MinION sequencer and the PCR Barcoding coverage. Comparison of these genomic sequences against extant cDNA sequences allows the predicted amino acid sequence of each target to be determined. Unlike common short read NGS technologies, where allelic phasing can be unclear, Nanopore long read sequencing may allow easier resolution of the individual parentally derived allelic sequences.

Derived amino acid sequence(s) for each examined gene target were aligned and amino acid substitutions were examined in comparison to paired chemotypic data where available (the majority of samples tested). This approach promises to allow association of particular amino acid substitutions in some target genes with chemotypic effects. Where possible, we have mapped these substitutions back onto available protein crystallographic structures to attempt to assess whether these observed changes are merely linked or are more likely directly mechanistically relevant to chemotype. Overall, we find this method represents a relatively low cost, high yield approach to uncovering markers of utility in directed cannabis breeding programs.
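As an illustration of how a predicted amino acid sequence can be derived from a genomic consensus, the following minimal Python sketch splices out introns and translates the result with the standard genetic code. It is not the CLC Genomics Workbench workflow used in this study; the function names, the exon coordinate format (0-based, end-exclusive), and the toy sequence are assumptions made for the example, with exon boundaries presumed to come from comparison against a cDNA.

```python
# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def splice(genomic, exons):
    """Join exon segments; exons are (start, end) pairs, 0-based, end-exclusive."""
    return "".join(genomic[start:end] for start, end in exons)


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon.

    Codons containing ambiguous bases (e.g. N) are reported as 'X'.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3].upper(), "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy example (hypothetical sequence): two exons separated by a 6 bp intron.
genomic = "ATGGCC" + "GTAAGT" + "AAATGA"
exons = [(0, 6), (12, 18)]
protein = translate(splice(genomic, exons))
print(protein)
```

For intronless targets the exon list is simply the full coding span.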

Samples and DNA Extraction

Dried cannabis flower samples were purchased from the BC Liquor Distribution Branch and extracted using a KingFisher™ Duo Prime System (magnetic particle processor) and the MagMAX Plant DNA Kit in accordance with the manufacturer’s instructions.

PCR Amplification

Samples were amplified for individual target genes of interest (see Table 1) using gene-specific primers, developed in house. PCR conditions varied according to expected amplicon size.

Barcoding, Library Preparation and NGS

For each cannabis sample tested, PCR products for all amplified targets were pooled and barcoded. Up to 12 cannabis samples were then pooled into a single library for NGS. Following Oxford Nanopore Technologies (ONT) protocols, the PCR Barcoding Expansion Kit (EXP-PBC001) was used in combination with the Ligation Sequencing Kit (SQK-LSK109) to prepare libraries, which were loaded onto an R9.4.1 flow cell and sequenced on the MinION sequencer.


Base calling was performed with Guppy on an ONT MinIT. The resulting sequences were demultiplexed with Epi2Me, and subsequent analysis steps were performed in CLC Genomics Workbench 10.0.3 (Qiagen).

Chemotyping Data

Chemotypic data (THC(A) and CBD(A) only) were as provided with the product lot. For some samples, an expanded panel of analytes, comprising THC(A), CBD(A), and multiple other cannabinoids and terpenoids, was measured by validated in-house methods (GC/MS and HPLC).

Table 1: Target Genes; expected size and presence/ absence of introns; gene reference used for alignment.

Target Sequence Recovery

Barcoded bulk sequence data for each sample was interrogated for target gene sequences using a ‘Map reads to Reference’ function against bait sequences in Table 1. Where matching reads were present, this generated a consensus sequence for the target.


Where more than one allele of a target is present, as is commonly the case in diploid organisms, a target consensus sequence derived as above may in fact represent an artificial mixture of the two true sequences (see Figure 1). This is a problem in both Sanger and widely used short read NGS methods, where allelic variation cannot be properly assigned phase (i.e. whether any two heterozygous nucleotide positions are in cis or in trans).
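The problem can be made concrete with a small Python sketch. It assumes reads that are already aligned and of equal length, uses a simple majority-rule consensus, and flags a column as heterozygous when the second most common base reaches a hypothetical threshold (`min_minor_fraction`). The alleles and reads are invented for illustration, and this is not the consensus or variant caller used in CLC Genomics Workbench.

```python
from collections import Counter


def column_counts(reads):
    """Count the bases seen in each column of equal-length, pre-aligned reads."""
    return [Counter(column) for column in zip(*reads)]


def consensus(reads):
    """Majority-rule consensus: the most common base in every column."""
    return "".join(c.most_common(1)[0][0] for c in column_counts(reads))


def heterozygous_positions(reads, min_minor_fraction=0.3):
    """Return 0-based columns whose second most common base is frequent."""
    positions = []
    for i, counts in enumerate(column_counts(reads)):
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[1][1] / len(reads) >= min_minor_fraction:
            positions.append(i)
    return positions


# Two invented alleles that differ at columns 1 and 6.
allele_a = "ACGTACGT"
allele_b = "ATGTACAT"
# Three reads per allele plus one read carrying a single sequencing error.
reads = [allele_a] * 3 + [allele_b] * 3 + ["ACGTACAT"]

mixed = consensus(reads)
print(mixed, mixed in (allele_a, allele_b))
print(heterozygous_positions(reads))
```

In this toy data the majority base at column 1 comes from one allele and the majority base at column 6 from the other, so the consensus matches neither true sequence.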

To address this, we have attempted to leverage ONT's long read technology to dephase alleles by:

  • running a variant finder algorithm on the target consensus to identify heterozygous positions;
  • choosing one variant position as an anchor and, from the reads making up the target consensus, selecting a statistically large pool of reads that all carry the same variant form at that anchor position; and
  • interrogating the resulting sub-pool of sequences for nucleotide identity at each of the other consensus variant positions.

This yields two dephased alleles because, with long read technology, variants belonging to the same allele (i.e. in cis) will statistically associate together in reads grouped this way. We observed that heterozygous positions in consensus reads (with near 50:50 ratios) resolved to 90% or better single nucleotide identity in the individual dephased alleles obtained by this method, in line with expected Nanopore accuracy limits.
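The anchor-based grouping described above can be sketched in a few lines of Python. This is a simplified illustration rather than the workflow run in CLC Genomics Workbench: it assumes equal-length, pre-aligned reads and a list of heterozygous columns already identified by a variant finder, and the function name, the `min_pool` cutoff, and the toy reads are all hypothetical.

```python
from collections import Counter


def phase_by_anchor(reads, het_positions, anchor, min_pool=3):
    """Group reads by their base at the anchor column, then call the majority
    base (and its supporting fraction) at every other heterozygous column.

    Pools smaller than min_pool are skipped, since they more likely reflect
    sequencing errors at the anchor than a real allele.
    """
    pools = {}
    for read in reads:
        pools.setdefault(read[anchor], []).append(read)

    phased = {}
    for anchor_base, pool in pools.items():
        if len(pool) < min_pool:
            continue
        calls = {}
        for pos in het_positions:
            if pos == anchor:
                continue
            base, count = Counter(r[pos] for r in pool).most_common(1)[0]
            calls[pos] = (base, count / len(pool))
        phased[anchor_base] = calls
    return phased


# Invented alleles differing at columns 1 and 6, plus one erroneous read.
allele_a = "ACGTACGT"
allele_b = "ATGTACAT"
reads = [allele_a] * 5 + [allele_b] * 5 + ["ACGTACAT"]

phased = phase_by_anchor(reads, het_positions=[1, 6], anchor=1)
for anchor_base, calls in phased.items():
    print(anchor_base, calls)
```

Each pool reports a dominant base at the non-anchor position along with the fraction of reads supporting it. That fraction stays below 100% where reads carry errors, which is the behaviour reflected in the 90% or better identity noted above.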

Where more than two alleles of a locus are present, such as with multiple gene copies, sequential iterative rounds of this approach, applied to data sets of sufficient read depth, should effectively resolve both the true sequence of every allele present and the copy number. A variant finding algorithm is run on each sequence derived by this method to determine whether a unique allele has been obtained or a mixed consensus remains that requires further rounds.
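One way to picture the iterative rounds is as a recursive split. A pool is divided at its first heterozygous column, and each sub-pool is re-examined until no heterozygous columns remain. The Python sketch below makes that assumption explicit. It works on equal-length, pre-aligned toy reads, and the thresholds (`min_minor_fraction`, `min_pool`) and function names are hypothetical. The read counts it returns are only a rough indication of relative abundance, not a validated copy number estimate.

```python
from collections import Counter


def het_positions(reads, min_minor_fraction=0.2):
    """0-based columns where the second most common base is frequent."""
    positions = []
    for i, column in enumerate(zip(*reads)):
        ranked = Counter(column).most_common(2)
        if len(ranked) == 2 and ranked[1][1] / len(reads) >= min_minor_fraction:
            positions.append(i)
    return positions


def resolve_alleles(reads, min_minor_fraction=0.2, min_pool=3):
    """Recursively split a read pool until each sub-pool looks like one allele.

    Returns a list of (consensus_sequence, supporting_read_count) tuples.
    """
    hets = het_positions(reads, min_minor_fraction)
    if not hets:
        # No heterozygous columns left: treat the pool as a unique allele.
        cons = "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))
        return [(cons, len(reads))]

    anchor = hets[0]
    pools = {}
    for read in reads:
        pools.setdefault(read[anchor], []).append(read)

    alleles = []
    for pool in pools.values():
        if len(pool) >= min_pool:  # drop tiny pools that likely reflect errors
            alleles.extend(resolve_alleles(pool, min_minor_fraction, min_pool))
    return alleles


# Three invented alleles, six error-free reads each.
reads = ["AAAAAA"] * 6 + ["ACAAAA"] * 6 + ["ACAAGA"] * 6
alleles = resolve_alleles(reads)
for sequence, depth in alleles:
    print(sequence, depth)
```

The first round separates the allele with A at column 1 from the other two, and a second round on the remaining pool separates those at column 4.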

For simplicity, in this study we selected samples for which this approach gave evidence of only one or two resolved alleles, as noted.

Figure 1: Consensus may not represent true sequence of either allele, and resolution of phasing.

Example: Application to CBG oxidocyclase (CBGO) loci from selected samples

  • We identified two cultivars (C0209, C0361) with strongly divergent CBD(A)/THC(A) ratios in our sample pool.
  • Following dephasing, each was observed to have a single full length CBGO allele, based on recovery against both canonical THCAS and CBDAS forms of CBGO.
  • We aligned these sequences along with the 3VTE CBGO (“THCA synthase”) crystal structure and identified three variations between C0209 and C0361 which mapped to the active site: residues 257, 259, and 363 relative to 3VTE (see Table 2). The 14 other variations all mapped to the external protein surface.
  • Residue 259 in particular projects directly into the central pocket of the active site; alterations of this residue would a priori be expected to influence activity.
  • We further identified three more cultivars (C0357, C0358, C0362), each with an intermediate chemotype and with evidence for only two full length CBGO alleles present following dephasing.
  • These cultivars were confirmed not to be closely related based on VNTR fingerprinting and thus represent truly distinct samples.
  • We evaluated CBGO sequences in these cultivars and found each carried one allele similar to C0209 and one similar to C0361 (see Table 2).
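Comparing dephased alleles at the protein level amounts to listing the positions at which two aligned amino acid sequences differ. A minimal Python sketch follows. It assumes gap-free, equal-length aligned sequences, and the function name and `first_residue` offset are hypothetical. The toy peptides are invented and do not represent actual CBGO residues.

```python
def substitutions(reference, other, first_residue=1):
    """List substitutions between two gap-free, equal-length aligned protein
    sequences in the usual 'K2R' style (reference residue, position, new residue).

    first_residue sets the number assigned to the first aligned residue, so
    that positions can follow a chosen reference numbering.
    """
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref_aa}{position}{other_aa}"
        for position, (ref_aa, other_aa) in enumerate(
            zip(reference, other), start=first_residue
        )
        if ref_aa != other_aa
    ]


# Invented five-residue peptides, for illustration only.
diffs = substitutions("MKTAY", "MRTAF")
print(diffs)
```

Substitutions listed this way can then be looked up against structure numbering to separate active site residues from surface residues.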

Table 2: Chemotypes and dephased CBGO alleles in three chemogroups (High, Low, Intermediate) along with their positions superimposed on 3VTE crystal structure. FMN and residues thought involved in catalysis highlighted.


As we cannot rule out undetected alleles or gene loci as a source of the cannabinoids present, we are limited to a negative argument:

  • The CBGO allele seen in C0361 does not produce THC;
  • The CBGO allele seen in C0209 does not produce CBD; and
  • Intermediate chemotype varieties C0357, C0358, and C0362 each carry one copy of each of the CBGO allele forms seen in C0361 and C0209.

The simplest hypothesis is that:

  • The allele form in C0361 produces CBD almost exclusively;
  • The allele form in C0209 produces THC almost exclusively; and
  • Heterozygotes with one allele form each show a near balance between CBD and THC production from the common CBGA precursor.

All of these are caveated by the assumption that the observed genes are equally expressed.
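The arithmetic behind this hypothesis can be written out as a short Python sketch. It is an idealised model, not a fitted one. It assumes every allele copy is equally expressed and equally active, and that each allele form makes only one product. The labels `C0209-like` and `C0361-like` and the function name are hypothetical shorthand for the two residue patterns.

```python
# Hypothetical labels for the two allele forms and the product each is
# hypothesised to make almost exclusively.
PRODUCT_BY_ALLELE_FORM = {"C0209-like": "THC", "C0361-like": "CBD"}


def expected_product_shares(allele_forms):
    """Expected share of each product from the common CBGA precursor, assuming
    each allele copy contributes equally (equal expression and activity)."""
    shares = {"THC": 0.0, "CBD": 0.0}
    for form in allele_forms:
        shares[PRODUCT_BY_ALLELE_FORM[form]] += 1 / len(allele_forms)
    return shares


heterozygote = expected_product_shares(["C0209-like", "C0361-like"])
single_form = expected_product_shares(["C0209-like"])
print(heterozygote)
print(single_form)
```

Under these assumptions a heterozygote is expected to split production evenly, while a plant carrying only one allele form is expected to make a single product.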


  • We have demonstrated feasibility of a targeted multigene panel by ONT Nanopore on Cannabis;
  • We have developed a means for allele dephasing by taking advantage of the long read data format;
  • Using the CBGO locus and a limited data set, we demonstrate application of this method to uncover evidence suggestive of allele/phenotype relationships useful for marker-assisted selection;
  • We have expanded this analysis to other samples available and to date have not observed significant amounts of THC(A) in absence of an allele with a C0209-like residue pattern, or CBD(A) in absence of a C0361-like residue pattern.
