Advanced User Guide - SangerContig (FASTA)

SangerContig is the intermediate level of the sangeranalyseR class hierarchy (Figure_1), and each SangerContig instance corresponds to a contig in a Sanger sequencing experiment. Among its slots are two lists, the forward read list and the reverse read list, which store the SangerRead instances in the corresponding direction.

In this section, we go through a reproducible SangerContig analysis example with FASTA file input in sangeranalyseR. By running the following example code, you will get an end-to-end SangerContig analysis result.

../_images/SangerContig_hierarchy.png

Figure 1. Hierarchy of classes in sangeranalyseR, SangerContig level.


Preparing SangerContig FASTA input

In Advanced User Guide - SangerContig (AB1), we demonstrated how to use AB1 input files to create a SangerContig instance. Here, we explain another input format - the FASTA input. Before starting the analysis, users need to prepare one FASTA file containing the sequences of all reads; its file name must end with .fa or .fasta. In this example, the FASTA file is included in the sangeranalyseR package, and you can get its path by running the following code:

rawDataDir <- system.file("extdata", package = "sangeranalyseR")
fastaFN <- file.path(rawDataDir, "fasta", "SangerContig", "Achl_ACHLO006-09.fa")

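The value of fastaFN is where the FASTA file is placed. If your operating system is macOS, its value should look like this:

/Library/Frameworks/R.framework/Versions/4.0/Resources/library/sangeranalyseR/extdata/fasta/SangerContig/Achl_ACHLO006-09.fa

The reads in fastaFN are shown in Figure_2 (example FASTA file):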

../_images/SangerContig_fasta_input.png

Figure 2. SangerContig FASTA input file.

Inside the FASTA file (Figure_2; Achl_ACHLO006-09.fa), the strings starting with “>” before each read are the read names. There are two ways of grouping reads, “regular expression matching” and “CSV file matching”. The sections below explain how to prepare the FASTA input file for each of them.
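To check which read names a FASTA file contains before grouping them, the header lines can be listed in base R. The following is a minimal sketch: it writes a small stand-in file that reuses the two read names from Figure_2 with placeholder sequences (not the real example data), so it runs without the package. The variable names fastaPath, hasValidExtension, and readNames are only illustrative.

```r
# Stand-in FASTA file: read names as in Figure_2, placeholder sequences
# (assumption: these are NOT the sequences of the real example file).
fastaPath <- file.path(tempdir(), "Achl_ACHLO006-09.fa")
writeLines(c(">Achl_ACHLO006-09_1_F", "ACGTACGTAC",
             ">Achl_ACHLO006-09_2_R", "TTGACCGTAA"),
           fastaPath)

# The file name has to end with .fa or .fasta
hasValidExtension <- grepl("\\.(fa|fasta)$", fastaPath)

# Header lines start with ">"; removing that character leaves the read name
fastaLines <- readLines(fastaPath)
readNames  <- sub("^>", "", fastaLines[grepl("^>", fastaLines)])
print(readNames)
```

With the packaged example, fastaPath would simply be replaced by fastaFN.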

(1) “regular expression matching” SangerContig inputs (FASTA)

For the regular expression matching method, sangeranalyseR automatically groups reads based on the contig name and read direction in their names; therefore, users have to follow the read-naming regulations below:

Note

  • All reads in the same contig group must include the same contig name in their read names.
  • Forward or reverse direction also has to be specified in their read names.

There are four parameters, FASTA_File, contigName, REGEX_SuffixForward and REGEX_SuffixReverse, that define the grouping rule. They let sangeranalyseR automatically match the correct reads in the FASTA file and divide them into forward and reverse directions.

Note

  • FASTA_File: the path to the FASTA file that contains the sequences of all reads. It can be either an absolute or a relative path. We suggest that users include only the target reads in this FASTA file and leave out any unrelated reads.
  • contigName: a regular expression that matches the names of the reads to be included in the SangerContig analysis. The grepl function in R is used for matching.
  • REGEX_SuffixForward: a regular expression that matches all read names in the forward direction. The grepl function in R is used for matching.
  • REGEX_SuffixReverse: a regular expression that matches all read names in the reverse direction. The grepl function in R is used for matching.

If you don’t know what a regular expression is, don’t panic - it’s just a way of recognising text. Please refer to What is a regular expression? for more details. Here is how it works in sangeranalyseR:

sangeranalyseR first matches contigName to exclude unrelated reads and then separates the forward and reverse reads by matching REGEX_SuffixForward and REGEX_SuffixReverse. Therefore, it is important to make sure that all target reads in the FASTA file share the same contigName and to choose REGEX_SuffixForward and REGEX_SuffixReverse carefully. Inconsistent read naming or a wrong regular expression might accidentally put reverse reads into the forward read list or vice versa, which makes the program generate wrong results. So, how should we systematically name the reads? We suggest that users follow the read-naming regulation in Figure_3.
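The risk of a wrong regular expression can be seen with grepl directly. The sketch below uses two illustrative read names chosen for this purpose (they are not in the example FASTA file) whose contig name happens to contain the letter "R". An unanchored pattern then matches both reads, while a pattern anchored at the end of the name matches only the reverse read.

```r
# Illustrative read names (assumption: chosen for this sketch only).
# Note that the contig name itself contains the letter "R".
readNames <- c("Achl_RBNII384-13_1_F", "Achl_RBNII384-13_2_R")

# Unanchored pattern: "R" occurs somewhere in both names, so both match
looseReverse <- readNames[grepl("R", readNames)]

# Anchored pattern: only names that END with _<digits>_R match
strictReverse <- readNames[grepl("_[0-9]*_R$", readNames)]

print(looseReverse)
print(strictReverse)
```

Anchoring the direction suffix with $ keeps the match independent of whatever characters appear earlier in the read name.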

../_images/sangeranalyseR_filename_convention_fasta.png

Figure 3. Suggested read naming regulation in FASTA file - SangerContig.

As you can see, the first part of the regulation is a consensus read name (or contig name), which narrows down the scope of reads to those we are going to examine. The second part is an index. Since there might be more than one read in the forward or reverse direction, we recommend numbering the reads in the same contig group. The last part is the direction, which is either ‘F’ (forward) or ‘R’ (reverse).
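As a small illustration of this convention, read names can be assembled from the three parts with sprintf. This is only a sketch of the naming pattern; the variable names are illustrative, and the underscore separator is taken from the example read names used in this guide.

```r
# Three parts of the suggested read name: contig name, index, direction
contigName <- "Achl_ACHLO006-09"
readIndex  <- 1:2
direction  <- c("F", "R")

# Join the parts with underscores: <contig name>_<index>_<direction>
readNames <- sprintf("%s_%d_%s", contigName, readIndex, direction)

# FASTA header lines are the read names prefixed with ">"
fastaHeaders <- paste0(">", readNames)
print(fastaHeaders)
```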

To make it more specific, let’s go back to the real example. In Figure_2, there are two reads in the FASTA file (fastaFN). First, we set contigName to "Achl_ACHLO006-09" to confirm that both of them, Achl_ACHLO006-09_1_F and Achl_ACHLO006-09_2_R, contain our target contigName and should be included. Then, we set REGEX_SuffixForward to "_[0-9]*_F$" and REGEX_SuffixReverse to "_[0-9]*_R$" to let sangeranalyseR match and group forward and reverse reads automatically. By the regular expression rule, Achl_ACHLO006-09_1_F and Achl_ACHLO006-09_2_R are categorized into the “forward read list” and the “reverse read list” respectively. We strongly recommend following this read-naming regulation because it lets you directly adopt the example regular expressions, "_[0-9]*_F$" and "_[0-9]*_R$", to group reads and reduces the chance of error.
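The two matching steps can be imitated with grepl on a plain character vector. The sketch below is not the package's internal code; it only mirrors the behaviour described above, using the parameter values from this example plus one hypothetical unrelated read name to show the filtering.

```r
# Read names from the example, plus one hypothetical unrelated read
readNames <- c("Achl_ACHLO006-09_1_F",
               "Achl_ACHLO006-09_2_R",
               "Achl_RBNII384-13_1_F")   # assumption: not in the example file

contigName          <- "Achl_ACHLO006-09"
REGEX_SuffixForward <- "_[0-9]*_F$"
REGEX_SuffixReverse <- "_[0-9]*_R$"

# Step 1: keep only reads whose names match the contig name
contigReads <- readNames[grepl(contigName, readNames)]

# Step 2: split the remaining reads by direction
forwardReads <- contigReads[grepl(REGEX_SuffixForward, contigReads)]
reverseReads <- contigReads[grepl(REGEX_SuffixReverse, contigReads)]

print(forwardReads)
print(reverseReads)
```

Running such a check on your own read names before creating the instance is a cheap way to confirm that every read ends up in exactly one list.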

After understanding how the parameters work, please refer to Creating SangerContig instance from FASTA below to see how sangeranalyseR creates the ‘Achl_ACHLO006-09’ SangerContig instance.

(2) “CSV file matching” SangerContig inputs (FASTA)

It is quite possible that the read names in your original FASTA file do not follow the naming regulation, and you may not want to change the original FASTA file. We therefore provide a second grouping approach, the CSV file matching method. sangeranalyseR automatically groups the reads in the FASTA file based on the information in a CSV file, so users do not need to alter the read names in the FASTA file; instead, the CSV file has to follow the regulations below:

Note

Here is an example CSV file (Figure_4)

../_images/sangeranalyseR_csv_file_sangercontig_fasta.png

Figure 4. Example CSV file for SangerContig instance creation.

  • There must be three columns, “reads”, “direction”, and “contig”, in the CSV file.
  • The “reads” column stores the read names in the FASTA file that are going to be included in the analysis.
  • The “direction” column stores the direction of the reads. It must be “F” (forward) or “R” (reverse).
  • The “contig” column stores the name of the contig that each read belongs to. Reads in the same contig have to have the same contig name, and they will be grouped into the same SangerContig instance.
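A CSV file with these three columns can be written from a data frame in base R. The sketch below is an assumption-laden illustration: the output file name is hypothetical, and the two rows reuse the read and contig names of this guide's example. The final read-back uses colClasses = "character" so that a "direction" column containing only "F" is not converted to logical values by read.csv.

```r
# Two rows, one per read, with the three required columns
namesConversion <- data.frame(
  reads     = c("Achl_ACHLO006-09_1_F", "Achl_ACHLO006-09_2_R"),
  direction = c("F", "R"),
  contig    = c("Achl_ACHLO006-09", "Achl_ACHLO006-09"),
  stringsAsFactors = FALSE
)

# Hypothetical output location; use a permanent path for real analyses
csvPath <- file.path(tempdir(), "names_conversion_sketch.csv")
write.csv(namesConversion, csvPath, row.names = FALSE)

# Read the file back as text and check the rules listed above
checkCsv <- read.csv(csvPath, colClasses = "character")
requiredColumns <- c("reads", "direction", "contig")
hasColumns      <- all(requiredColumns %in% colnames(checkCsv))
validDirections <- all(checkCsv$direction %in% c("F", "R"))
print(c(hasColumns = hasColumns, validDirections = validDirections))
```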

There are three parameters, FASTA_File, contigName, and CSV_NamesConversion, that define the grouping rule. They help sangeranalyseR automatically match the correct reads in a FASTA file and divide them into forward and reverse directions.

Note

  • FASTA_File: the path to the FASTA file that contains the sequences of all reads. It can be either an absolute or a relative path. We suggest that users include only the target reads in this FASTA file and leave out any unrelated reads.
  • contigName: a regular expression that matches the names of the reads to be included in the SangerContig analysis. The grepl function in R is used for matching.
  • CSV_NamesConversion: the path to the CSV file. It can be either an absolute or a relative path.

The main difference between “CSV file matching” and “regular expression matching” is where the grouping rule is written. For “regular expression matching”, the rules are written in the read names, so the read names have to meet more requirements. In contrast, the rules of “CSV file matching” are written in an additional CSV file, so it is more flexible about how reads are named.

sangeranalyseR first reads in the CSV file (with the “reads”, “direction”, and “contig” columns), filters out rows whose “contig” is not the value of the contigName parameter, finds the read names listed in “reads” in the FASTA file, and assigns directions to them based on “direction”.

To make it more specific, let’s go back to the real example. First, we prepare a CSV file (CSV_NamesConversion) and a FASTA file (FASTA_File). In the CSV file, both rows have the contig name "Achl_ACHLO006-09", which is the value we assign to the contigName parameter. sangeranalyseR then checks and matches the “reads” of these two rows, "Achl_ACHLO006-09_1_F" and "Achl_ACHLO006-09_2_R". Last, these two reads are assigned to the “forward read list” and the “reverse read list” respectively according to the “direction” column.
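The CSV-based grouping can be imitated on plain R objects as well. The sketch below is not the package's internal code; it follows the steps described above on a hand-built table that contains the two example rows plus one hypothetical row from another contig, which the contig filter removes.

```r
# Hand-built CSV content: the two example rows plus one hypothetical row
csvTable <- data.frame(
  reads     = c("Achl_ACHLO006-09_1_F", "Achl_ACHLO006-09_2_R",
                "Achl_RBNII384-13_1_F"),          # assumption: extra row
  direction = c("F", "R", "F"),
  contig    = c("Achl_ACHLO006-09", "Achl_ACHLO006-09",
                "Achl_RBNII384-13"),
  stringsAsFactors = FALSE
)

# Read names found in the FASTA file
fastaReadNames <- c("Achl_ACHLO006-09_1_F", "Achl_ACHLO006-09_2_R")
contigName     <- "Achl_ACHLO006-09"

# Step 1: keep rows of the target contig
contigRows <- csvTable[csvTable$contig == contigName, ]
# Step 2: keep rows whose read name is present in the FASTA file
contigRows <- contigRows[contigRows$reads %in% fastaReadNames, ]
# Step 3: assign directions from the "direction" column
forwardReads <- contigRows$reads[contigRows$direction == "F"]
reverseReads <- contigRows$reads[contigRows$direction == "R"]

print(forwardReads)
print(reverseReads)
```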

After understanding how the parameters work, please refer to Creating SangerContig instance from FASTA below to see how sangeranalyseR creates the ‘Achl_ACHLO006-09’ SangerContig instance.


Creating SangerContig instance from FASTA

After preparing the input FASTA file (and, for CSV file matching, the CSV file), we can create a SangerContig instance by running the SangerContig constructor function or the new method. The constructor function is a wrapper for the new method and makes instance creation more intuitive. Their input parameters are the same, and all of them have default values. For more details about SangerContig inputs and slot definitions, please refer to the sangeranalyseR reference manual. We will explain two SangerContig instance creation methods, “regular expression matching” and “CSV file matching”.

(1) “regular expression matching” SangerContig creation (FASTA)

The constructor function and new method below take the four parameters, FASTA_File, contigName, REGEX_SuffixForward, and REGEX_SuffixReverse, that we mentioned in the previous section. In contrast to the AB1 input method, they do not include quality trimming and chromatogram visualization parameters. Run the following code to create the my_sangerContigFa instance.

# using `constructor` function to create SangerContig instance
my_sangerContigFa <- SangerContig(inputSource           = "FASTA",
                                  processMethod         = "REGEX",
                                  FASTA_File            = fastaFN,
                                  contigName            = "Achl_ACHLO006-09",
                                  REGEX_SuffixForward   = "_[0-9]*_F$",
                                  REGEX_SuffixReverse   = "_[0-9]*_R$",
                                  refAminoAcidSeq       = "SRQWLFSTNHKDIGTLYFIFGAWAGMVGTSLSILIRAELGHPGALIGDDQIYNVIVTAHAFIMIFFMVMPIMIGGFGNWLVPLMLGAPDMAFPRMNNMSFWLLPPALSLLLVSSMVENGAGTGWTVYPPLSAGIAHGGASVDLAIFSLHLAGISSILGAVNFITTVINMRSTGISLDRMPLFVWSVVITALLLLLSLPVLAGAITMLLTDRNLNTSFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIISQESGKKETFGSLGMIYAMLAIGLLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAVPTGIKIFSWLATLHGTQLSYSPAILWALGFVFLFTVGGLTGVVLANSSVDIILHDTYYVVAHFHYVLSMGAVFAIMAGFIHWYPLFTGLTLNNKWLKSHFIIMFIGVNLTFFPQHFLGLAGMPRRYSDYPDAYTTWNIVSTIGSTISLLGILFFFFIIWESLVSQRQVIYPIQLNSSIEWYQNTPPAEHSYSELPLLTN",
                                  minReadsNum           = 2,
                                  minReadLength         = 20,
                                  minFractionCall       = 0.5,
                                  maxFractionLost       = 0.5,
                                  geneticCode           = GENETIC_CODE,
                                  acceptStopCodons      = TRUE,
                                  readingFrame          = 1,
                                  processorsNum         = 1)

# using `new` method to create SangerContig instance
my_sangerContigFa <- new("SangerContig",
                         inputSource           = "FASTA",
                         processMethod         = "REGEX",
                         FASTA_File            = fastaFN,
                         contigName            = "Achl_ACHLO006-09",
                         REGEX_SuffixForward   = "_[0-9]*_F$",
                         REGEX_SuffixReverse   = "_[0-9]*_R$",
                         refAminoAcidSeq       = "SRQWLFSTNHKDIGTLYFIFGAWAGMVGTSLSILIRAELGHPGALIGDDQIYNVIVTAHAFIMIFFMVMPIMIGGFGNWLVPLMLGAPDMAFPRMNNMSFWLLPPALSLLLVSSMVENGAGTGWTVYPPLSAGIAHGGASVDLAIFSLHLAGISSILGAVNFITTVINMRSTGISLDRMPLFVWSVVITALLLLLSLPVLAGAITMLLTDRNLNTSFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIISQESGKKETFGSLGMIYAMLAIGLLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAVPTGIKIFSWLATLHGTQLSYSPAILWALGFVFLFTVGGLTGVVLANSSVDIILHDTYYVVAHFHYVLSMGAVFAIMAGFIHWYPLFTGLTLNNKWLKSHFIIMFIGVNLTFFPQHFLGLAGMPRRYSDYPDAYTTWNIVSTIGSTISLLGILFFFFIIWESLVSQRQVIYPIQLNSSIEWYQNTPPAEHSYSELPLLTN",
                         minReadsNum           = 2,
                         minReadLength         = 20,
                         minFractionCall       = 0.5,
                         maxFractionLost       = 0.5,
                         geneticCode           = GENETIC_CODE,
                         acceptStopCodons      = TRUE,
                         readingFrame          = 1,
                         processorsNum         = 1)

In this example, contigName is set to Achl_ACHLO006-09, so Achl_ACHLO006-09_1_F and Achl_ACHLO006-09_2_R are matched and selected. Moreover, by regular expression pattern matching, Achl_ACHLO006-09_1_F is categorized into the forward read list, and Achl_ACHLO006-09_2_R is categorized into the reverse read list. Both reads are aligned into a contig, my_sangerContigFa, which will be used as the input for the following functions.

After my_sangerContigFa is successfully created, you can run my_sangerContigFa inside the R shell to get basic information about the instance, or run my_sangerContigFa@objectResults@readResultTable to check the creation result of every Sanger read.

Here is the output of my_sangerContigFa:

SangerContig S4 instance
         Input Source :  FASTA
         Process Method :  REGEX
      Fasta File Name :  /Library/Frameworks/R.framework/Versions/4.0/Resources/library/sangeranalyseR/extdata/fasta/SangerContig/Achl_ACHLO006-09.fa
   REGEX Suffix Forward :  _[0-9]*_F$
   REGEX Suffix Reverse :  _[0-9]*_R$
            Contig Name :  Achl_ACHLO006-09
         'minReadsNum' :  2
      'minReadLength' :  20
      'minFractionCall' :  0.5
      'maxFractionLost' :  0.5
   'acceptStopCodons' :  TRUE
         'readingFrame' :  1
      Contig Sequence :  TTATATTTTATTCTGGGCGTCTGAGCAGGAATGGTTGGAGCCGGTATAAGACTTCTAATTCGAATCGAGCTAAGACAACCAGGAGCGTTCCTGGGCAGAGACCAACTATACAATACTATCGTTACTGCACACGCATTTGTAATAATCTTCTTTCTAGTAATGCCTGTATTCATCGGGGGATTCGGAAACTGGCTTTTACCTTTAATACTTGGAGCCCCCGATATAGCATTCCCTCGACTCAACAACATGAGATTCTGACTACTTCCCCCATCACTGATCCTTTTAGTGTCCTCTGCGGCGGTAGAAAAAGGCGCTGGTACGGGGTGAACTGTTTATCCGCCTCTAGCAAGAAATCTTGCCCACGCAGGCCCGTCTGTAGATTTAGCCATCTTTTCCCTTCATTTAGCGGGTGCGTCTTCTATTCTAGGGGCTATTAATTTTATCACCACAGTTATTAATATGCGTTGAAGAGGATTACGTCTTGAACGAATTCCCCTGTTTGTCTGAGCTGTGCTAATTACAGTTGTTCTTCTACTTCTATCTTTACCAGTGCTAGCAGGTGCCATTACCATACTTCTTACCGACCGAAACCTCAATACTTCATTCTTTGATCCTGCCGGTGGTGGAGACCCCATCCTC
Forward reads in the contig >>  1
Reverse reads in the contig >>  1
SUCCESS [2021-13-07 11:52:40] 'Achl_ACHLO006-09' is successfully created!

Here is the output of my_sangerContigFa@objectResults@readResultTable:

            readName creationResult errorType errorMessage inputSource    direction
1 Achl_ACHLO006-09_1_F           TRUE      None         None       FASTA Forward Read
2 Achl_ACHLO006-09_2_R           TRUE      None         None       FASTA Reverse Read
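When a contig contains many reads, scanning this table by eye becomes tedious, so it can help to summarise it. The sketch below works on a hand-built copy of a few columns of the table printed above so that it runs without sangeranalyseR; it assumes that creationResult is stored as a logical column, which should be verified on a real instance.

```r
# Hand-built copy of selected columns of the table shown above.
# With a real instance, use instead:
#   readResultTable <- my_sangerContigFa@objectResults@readResultTable
readResultTable <- data.frame(
  readName       = c("Achl_ACHLO006-09_1_F", "Achl_ACHLO006-09_2_R"),
  creationResult = c(TRUE, TRUE),   # assumption: logical column
  direction      = c("Forward Read", "Reverse Read"),
  stringsAsFactors = FALSE
)

# Names of reads that failed to be created (empty if all succeeded)
failedReads <- readResultTable$readName[!readResultTable$creationResult]
if (length(failedReads) == 0) {
  message("All reads were created successfully.")
} else {
  message("Failed reads: ", paste(failedReads, collapse = ", "))
}

# Number of reads per direction
directionCounts <- table(readResultTable$direction)
print(directionCounts)
```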

(2) “CSV file matching” SangerContig creation (FASTA)

The constructor function and new method below take the three parameters, FASTA_File, contigName, and CSV_NamesConversion, that we mentioned in the previous section. Run the following code to create the my_sangerContigFa instance.

csv_namesConversion <- file.path(rawDataDir, "fasta", "SangerContig", "names_conversion_1.csv")

# using `constructor` function to create SangerContig instance
my_sangerContigFa <- SangerContig(inputSource           = "FASTA",
                                  processMethod         = "CSV",
                                  FASTA_File            = fastaFN,
                                  contigName            = "Achl_ACHLO006-09",
                                  CSV_NamesConversion   = csv_namesConversion,
                                  refAminoAcidSeq       = "SRQWLFSTNHKDIGTLYFIFGAWAGMVGTSLSILIRAELGHPGALIGDDQIYNVIVTAHAFIMIFFMVMPIMIGGFGNWLVPLMLGAPDMAFPRMNNMSFWLLPPALSLLLVSSMVENGAGTGWTVYPPLSAGIAHGGASVDLAIFSLHLAGISSILGAVNFITTVINMRSTGISLDRMPLFVWSVVITALLLLLSLPVLAGAITMLLTDRNLNTSFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIISQESGKKETFGSLGMIYAMLAIGLLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAVPTGIKIFSWLATLHGTQLSYSPAILWALGFVFLFTVGGLTGVVLANSSVDIILHDTYYVVAHFHYVLSMGAVFAIMAGFIHWYPLFTGLTLNNKWLKSHFIIMFIGVNLTFFPQHFLGLAGMPRRYSDYPDAYTTWNIVSTIGSTISLLGILFFFFIIWESLVSQRQVIYPIQLNSSIEWYQNTPPAEHSYSELPLLTN",
                                  minReadsNum           = 2,
                                  minReadLength         = 20,
                                  minFractionCall       = 0.5,
                                  maxFractionLost       = 0.5,
                                  geneticCode           = GENETIC_CODE,
                                  acceptStopCodons      = TRUE,
                                  readingFrame          = 1,
                                  processorsNum         = 1)

# using `new` method to create SangerContig instance
my_sangerContigFa <- new("SangerContig",
                         inputSource           = "FASTA",
                         processMethod         = "CSV",
                         FASTA_File            = fastaFN,
                         contigName            = "Achl_ACHLO006-09",
                         CSV_NamesConversion   = csv_namesConversion,
                         refAminoAcidSeq       = "SRQWLFSTNHKDIGTLYFIFGAWAGMVGTSLSILIRAELGHPGALIGDDQIYNVIVTAHAFIMIFFMVMPIMIGGFGNWLVPLMLGAPDMAFPRMNNMSFWLLPPALSLLLVSSMVENGAGTGWTVYPPLSAGIAHGGASVDLAIFSLHLAGISSILGAVNFITTVINMRSTGISLDRMPLFVWSVVITALLLLLSLPVLAGAITMLLTDRNLNTSFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIISQESGKKETFGSLGMIYAMLAIGLLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAVPTGIKIFSWLATLHGTQLSYSPAILWALGFVFLFTVGGLTGVVLANSSVDIILHDTYYVVAHFHYVLSMGAVFAIMAGFIHWYPLFTGLTLNNKWLKSHFIIMFIGVNLTFFPQHFLGLAGMPRRYSDYPDAYTTWNIVSTIGSTISLLGILFFFFIIWESLVSQRQVIYPIQLNSSIEWYQNTPPAEHSYSELPLLTN",
                         minReadsNum           = 2,
                         minReadLength         = 20,
                         minFractionCall       = 0.5,
                         maxFractionLost       = 0.5,
                         geneticCode           = GENETIC_CODE,
                         acceptStopCodons      = TRUE,
                         readingFrame          = 1,
                         processorsNum         = 1)

First, you need to define the path to the CSV file (csv_namesConversion). If you still don’t know how to prepare the file, please check (2) “CSV file matching” SangerContig inputs (FASTA). sangeranalyseR then follows the rules in the CSV file and creates my_sangerContigFa. After it is successfully created, you can run my_sangerContigFa inside the R shell to get basic information about the instance, or run my_sangerContigFa@objectResults@readResultTable to check the creation result of every Sanger read.

Here is the output of my_sangerContigFa:

SangerContig S4 instance
         Input Source :  FASTA
         Process Method :  CSV
      Fasta File Name :  /Library/Frameworks/R.framework/Versions/4.0/Resources/library/sangeranalyseR/extdata/fasta/SangerContig/Achl_ACHLO006-09.fa
   CSV Names Conversion :  /Library/Frameworks/R.framework/Versions/4.0/Resources/library/sangeranalyseR/extdata/fasta/SangerContig/names_conversion_1.csv
            Contig Name :  Achl_ACHLO006-09
         'minReadsNum' :  2
      'minReadLength' :  20
      'minFractionCall' :  0.5
      'maxFractionLost' :  0.5
   'acceptStopCodons' :  TRUE
         'readingFrame' :  1
      Contig Sequence :  TTATATTTTATTCTGGGCGTCTGAGCAGGAATGGTTGGAGCCGGTATAAGACTTCTAATTCGAATCGAGCTAAGACAACCAGGAGCGTTCCTGGGCAGAGACCAACTATACAATACTATCGTTACTGCACACGCATTTGTAATAATCTTCTTTCTAGTAATGCCTGTATTCATCGGGGGATTCGGAAACTGGCTTTTACCTTTAATACTTGGAGCCCCCGATATAGCATTCCCTCGACTCAACAACATGAGATTCTGACTACTTCCCCCATCACTGATCCTTTTAGTGTCCTCTGCGGCGGTAGAAAAAGGCGCTGGTACGGGGTGAACTGTTTATCCGCCTCTAGCAAGAAATCTTGCCCACGCAGGCCCGTCTGTAGATTTAGCCATCTTTTCCCTTCATTTAGCGGGTGCGTCTTCTATTCTAGGGGCTATTAATTTTATCACCACAGTTATTAATATGCGTTGAAGAGGATTACGTCTTGAACGAATTCCCCTGTTTGTCTGAGCTGTGCTAATTACAGTTGTTCTTCTACTTCTATCTTTACCAGTGCTAGCAGGTGCCATTACCATACTTCTTACCGACCGAAACCTCAATACTTCATTCTTTGATCCTGCCGGTGGTGGAGACCCCATCCTC
Forward reads in the contig >>  1
Reverse reads in the contig >>  1
SUCCESS [2021-13-07 12:01:57] 'Achl_ACHLO006-09' is successfully created!

Here is the output of my_sangerContigFa@objectResults@readResultTable:

            readName creationResult errorType errorMessage inputSource    direction
1 Achl_ACHLO006-09_1_F           TRUE      None         None       FASTA Forward Read
2 Achl_ACHLO006-09_2_R           TRUE      None         None       FASTA Reverse Read

Writing SangerContig FASTA files (FASTA)

Users can write the SangerContig instance, my_sangerContigFa, to FASTA files. There are four options to choose from for the selection parameter:

  • reads_unalignment: Writing reads into a single FASTA file (only trimmed without alignment).
  • reads_alignment: Writing reads alignment and contig read to a single FASTA file.
  • contig: Writing the contig to a single FASTA file.
  • all: Writing reads, reads alignment, and the contig into three different files.

Below is the one-liner for writing out FASTA files. This function mainly depends on the writeXStringSet function in the Biostrings R package. Users can control compression through the compress and compression_level parameters of the writeFasta function.

writeFasta(my_sangerContigFa,
           outputDir         = tempdir(),
           compress          = FALSE,
           compression_level = NA,
           selection         = "all")

Users can download the output FASTA file of this example through the following three links:

  1. Achl_ACHLO006-09_reads_unalignment.fa
  2. Achl_ACHLO006-09_reads_alignment.fa
  3. Achl_ACHLO006-09_contig.fa
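Files written to tempdir() are removed when the R session ends, so a dedicated output directory is often more practical. The sketch below prepares such a directory and builds the three file paths listed above. The directory name is hypothetical, the writeFasta call is left as a comment because it needs a created my_sangerContigFa instance, and the file names are assumed to follow the pattern of this example's output.

```r
# Hypothetical output directory; replace with a permanent location
outputDir <- file.path(tempdir(), "SangerContig_fasta_output")
dir.create(outputDir, recursive = TRUE, showWarnings = FALSE)

# With a created instance, the files would be written like this:
# writeFasta(my_sangerContigFa, outputDir = outputDir, selection = "all")

# Paths of the three files listed above (assumed naming pattern)
contigName    <- "Achl_ACHLO006-09"
fileSuffixes  <- c("_reads_unalignment.fa", "_reads_alignment.fa", "_contig.fa")
expectedFiles <- file.path(outputDir, paste0(contigName, fileSuffixes))

# TRUE for each file that has actually been written
print(file.exists(expectedFiles))
```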

Generating SangerContig report (FASTA)

Last but not least, users can save the SangerContig instance, my_sangerContigFa, into a report after the analysis. The report is generated in HTML by knitting Rmd files.

Users can set includeSangerRead parameter to decide to which level the SangerContig report will go. Moreover, after the reports are generated, users can easily navigate through reports in different levels within the HTML file.

One thing to pay attention to is that if users have many reads, it will take quite a long time to write out all reports. If users only want to generate the contig result, remember to set includeSangerRead to FALSE in order to save time.

generateReport(my_sangerContigFa,
               outputDir           = tempdir(),
               includeSangerRead   = TRUE)
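For a contig-level report only, the same call can be made with includeSangerRead set to FALSE. The sketch below is guarded so that it does nothing unless sangeranalyseR is installed and my_sangerContigFa exists in the session; the reportArgs list is just an illustrative way to keep the settings in one place.

```r
# Report settings: skip the per-read reports to save time
reportArgs <- list(outputDir         = tempdir(),
                   includeSangerRead = FALSE)

# Only run when the instance and the package are available
if (exists("my_sangerContigFa") &&
    requireNamespace("sangeranalyseR", quietly = TRUE)) {
  do.call(sangeranalyseR::generateReport,
          c(list(my_sangerContigFa), reportArgs))
}
```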

Here is the generated SangerContig HTML report of this example (FASTA). Users can access the ‘Basic Information’, ‘SangerContig Input Parameters’, ‘Contig Sequence’ and ‘Contig Results’ sections inside it. Furthermore, users can navigate to the HTML reports of all forward and reverse SangerRead instances from this SangerContig report.




Code summary (SangerContig, FASTA)

1. Preparing SangerContig FASTA input

rawDataDir <- system.file("extdata", package = "sangeranalyseR")
fastaFN <- file.path(rawDataDir, "fasta", "SangerContig", "Achl_ACHLO006-09.fa")

2. Creating SangerContig instance from FASTA

# using `constructor` function to create SangerContig instance
my_sangerContigFa <- SangerContig(inputSource           = "FASTA",
                                  processMethod         = "REGEX",
                                  FASTA_File            = fastaFN,
                                  contigName            = "Achl_ACHLO006-09",
                                  REGEX_SuffixForward   = "_[0-9]*_F$",
                                  REGEX_SuffixReverse   = "_[0-9]*_R$",
                                  refAminoAcidSeq       = "SRQWLFSTNHKDIGTLYFIFGAWAGMVGTSLSILIRAELGHPGALIGDDQIYNVIVTAHAFIMIFFMVMPIMIGGFGNWLVPLMLGAPDMAFPRMNNMSFWLLPPALSLLLVSSMVENGAGTGWTVYPPLSAGIAHGGASVDLAIFSLHLAGISSILGAVNFITTVINMRSTGISLDRMPLFVWSVVITALLLLLSLPVLAGAITMLLTDRNLNTSFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIISQESGKKETFGSLGMIYAMLAIGLLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAVPTGIKIFSWLATLHGTQLSYSPAILWALGFVFLFTVGGLTGVVLANSSVDIILHDTYYVVAHFHYVLSMGAVFAIMAGFIHWYPLFTGLTLNNKWLKSHFIIMFIGVNLTFFPQHFLGLAGMPRRYSDYPDAYTTWNIVSTIGSTISLLGILFFFFIIWESLVSQRQVIYPIQLNSSIEWYQNTPPAEHSYSELPLLTN")

# using `new` method to create SangerContig instance
my_sangerContigFa <- new("SangerContig",
                         inputSource           = "FASTA",
                         processMethod         = "REGEX",
                         FASTA_File            = fastaFN,
                         contigName            = "Achl_ACHLO006-09",
                         REGEX_SuffixForward   = "_[0-9]*_F$",
                         REGEX_SuffixReverse   = "_[0-9]*_R$",
                         refAminoAcidSeq       = "SRQWLFSTNHKDIGTLYFIFGAWAGMVGTSLSILIRAELGHPGALIGDDQIYNVIVTAHAFIMIFFMVMPIMIGGFGNWLVPLMLGAPDMAFPRMNNMSFWLLPPALSLLLVSSMVENGAGTGWTVYPPLSAGIAHGGASVDLAIFSLHLAGISSILGAVNFITTVINMRSTGISLDRMPLFVWSVVITALLLLLSLPVLAGAITMLLTDRNLNTSFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIISQESGKKETFGSLGMIYAMLAIGLLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAVPTGIKIFSWLATLHGTQLSYSPAILWALGFVFLFTVGGLTGVVLANSSVDIILHDTYYVVAHFHYVLSMGAVFAIMAGFIHWYPLFTGLTLNNKWLKSHFIIMFIGVNLTFFPQHFLGLAGMPRRYSDYPDAYTTWNIVSTIGSTISLLGILFFFFIIWESLVSQRQVIYPIQLNSSIEWYQNTPPAEHSYSELPLLTN")

3. Writing SangerContig FASTA files (FASTA)

writeFasta(my_sangerContigFa)

And you will get three FASTA files:

  1. Achl_ACHLO006-09_reads_unalignment.fa
  2. Achl_ACHLO006-09_reads_alignment.fa
  3. Achl_ACHLO006-09_contig.fa

4. Generating SangerContig report (FASTA)

generateReport(my_sangerContigFa)

You can check the html report of this SangerContig example (FASTA).