Advanced User Guide - SangerRead (FASTA)

SangerRead is the bottommost level of sangeranalyseR (Figure 1), and each SangerRead object corresponds to a single read in Sanger sequencing. This section walks through the sangeranalyseR data analysis steps at the SangerRead level with FASTA file input.

../_images/SangerRead_hierarchy.png

Figure 1. Hierarchy of classes in sangeranalyseR, SangerRead level.


Preparing SangerRead FASTA input

The FASTA input method is designed for users who do not need quality trimming or base calling on their Sanger sequencing data; therefore, no quality trimming or chromatogram input parameters are needed. Before starting the analysis, prepare a FASTA file. The example file used here ships with the sangeranalyseR package, so you can get its path by running the following code:

inputFilesPath <- system.file("extdata/", package = "sangeranalyseR")
A_chloroticaFFNfa <- file.path(inputFilesPath,
                               "fasta",
                               "SangerRead",
                               "Achl_ACHLO006-09_1_F.fa")

The only hard requirement on the filename (Achl_ACHLO006-09_1_F.fa in this example) is that the file extension must be .fasta or .fa.
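
The extension rule can be checked before building a SangerRead. The following is a minimal base R sketch; the helper name hasFastaExtension is hypothetical and not part of sangeranalyseR, and treating the match as case-sensitive is an assumption based on the rule as stated here.

```r
# Hypothetical helper (not part of sangeranalyseR): TRUE when the file
# extension is exactly "fa" or "fasta" (case-sensitive, by assumption).
hasFastaExtension <- function(path) {
    tools::file_ext(path) %in% c("fa", "fasta")
}

# Check three candidate filenames; only the first two should pass.
extensionCheck <- c(
    fa    = hasFastaExtension("Achl_ACHLO006-09_1_F.fa"),
    fasta = hasFastaExtension("Achl_ACHLO006-09_1_F.fasta"),
    ab1   = hasFastaExtension("Achl_ACHLO006-09_1_F.ab1")
)
print(extensionCheck)
```

The same helper can be applied to a full path, such as the one stored in A_chloroticaFFNfa.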


Creating SangerRead instance from FASTA

After preparing an input FASTA file, the next step is to create a SangerRead instance with either the SangerRead constructor function or the new method. The constructor function is a wrapper around the new method that makes instance creation more intuitive. All input parameters have default values; the important ones are shown in the two creation methods below. readFileName stores the FASTA filename (here, the path held in A_chloroticaFFNfa). Inside that file, the string after “>” on the first line is the name of the read, and users need to assign it to fastaReadName, which is used for read matching. Figure 2 shows a valid FASTA file, Achl_ACHLO006-09_1_F.fa (the example FASTA file), for which the value of fastaReadName is Achl_ACHLO006-09_1_F.

../_images/SangerRead_fasta_input_file.png

Figure 2. SangerRead FASTA input file.
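
If the read name is not known in advance, it can be read from the header line of the FASTA file. The sketch below uses base R only and a small stand-in file written to a temporary path; the stand-in sequence is a shortened placeholder, not the full example read.

```r
# Write a small stand-in FASTA file; the sequence is a shortened placeholder.
exampleFasta <- tempfile(fileext = ".fa")
writeLines(c(">Achl_ACHLO006-09_1_F", "CTGGGCGTCTGAGCAGGAATGG"), exampleFasta)

# Header lines start with ">"; the read name is the text after that character.
fastaLines  <- readLines(exampleFasta)
headerLines <- fastaLines[startsWith(fastaLines, ">")]
readNames   <- sub("^>", "", headerLines)
print(readNames)
```

To inspect the packaged example instead, replace exampleFasta with A_chloroticaFFNfa. This sketch keeps the whole header line; how headers containing spaces are handled is not covered here.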

# using `constructor` function to create SangerRead instance
sangerReadFfa <- SangerRead(inputSource        = "FASTA",
                            readFeature        = "Forward Read",
                            readFileName       = A_chloroticaFFNfa,
                            fastaReadName      = "Achl_ACHLO006-09_1_F",
                            geneticCode        = GENETIC_CODE)


# using `new` method to create SangerRead instance
sangerReadFfa <- new("SangerRead",
                     inputSource        = "FASTA",
                     readFeature        = "Forward Read",
                     readFileName       = A_chloroticaFFNfa,
                     fastaReadName      = "Achl_ACHLO006-09_1_F",
                     geneticCode        = GENETIC_CODE)

The SangerRead constructor function and the new method take the same inputs. For more details about SangerRead inputs and slot definitions, please refer to the sangeranalyseR reference manual.

Inside the R shell, you can run sangerReadFfa to print basic information about the instance, or run sangerReadFfa@objectResults@readResultTable to check the creation result of the read once sangerReadFfa has been created.

Here is the output of sangerReadFfa:

SangerRead S4 instance
         Input Source :  FASTA
         Read Feature :  Forward Read
         Read FileName :  Achl_ACHLO006-09_1_F.fa
      Fasta Read Name :  Achl_ACHLO006-09_1_F
      Primary Sequence :  CTGGGCGTCTGAGCAGGAATGGTTGGAGCCGGTATAAGACTTCTAATTCGAATCGAGCTAAGACAACCAGGAGCGTTCCTGGGCAGAGACCAACTATACAATACTATCGTTACTGCACACGCATTTGTAATAATCTTCTTTCTAGTAATGCCTGTATTCATCGGGGGATTCGGAAACTGGCTTTTACCTTTAATACTTGGAGCCCCCGATATAGCATTCCCTCGACTCAACAACATGAGATTCTGACTACTTCCCCCATCACTGATCCTTTTAGTGTCCTCTGCGGCGGTAGAAAAAGGCGCTGGTACGGGGTGAACTGTTTATCCGCCTCTAGCAAGAAATCTTGCCCACGCAGGCCCGTCTGTAGATTTAGCCATCTTTTCCCTTCATTTAGCGGGTGCGTCTTCTATTCTAGGGGCTATTAATTTTATCACCACAGTTATTAATATGCGTTGAAGAGG
SUCCESS [2021-12-07 23:37:43] 'Achl_ACHLO006-09_1_F.fa' is successfully created!

Here is the output of sangerReadFfa@objectResults@readResultTable:

            readName creationResult errorType errorMessage inputSource    direction
1 Achl_ACHLO006-09_1_F           TRUE      None         None       FASTA Forward Read

Writing SangerRead FASTA files (FASTA)

Users can write sangerReadFfa to a FASTA file. Because the FASTA input method does not support quality trimming or base calling, the sequence in the output FASTA file is the same as in the input FASTA file in this example. Users can also set the compression level through the one-line call to writeFasta, which relies mainly on the writeXStringSet function in the Biostrings R package.

writeFasta(sangerReadFfa,
           outputDir         = tempdir(),
           compress          = FALSE,
           compression_level = NA)

The Achl_ACHLO006-09_1_F.fa file of this example is available for download.
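
To confirm that a written file carries the same sequence as the input, the two files can be compared after joining their sequence lines, since line wrapping may differ between FASTA files. The following base R sketch uses two stand-in files with a shortened placeholder sequence; readFastaSequence is a hypothetical helper that assumes one record per file.

```r
# Hypothetical helper: join all non-header lines of a single-record FASTA file.
readFastaSequence <- function(path) {
    fastaLines <- readLines(path)
    paste(fastaLines[!startsWith(fastaLines, ">")], collapse = "")
}

# Stand-in files: the same record, wrapped at different line widths.
inputFasta  <- tempfile(fileext = ".fa")
outputFasta <- tempfile(fileext = ".fa")
writeLines(c(">Achl_ACHLO006-09_1_F", "CTGGGCGTCTGAGCAGGAATGG"), inputFasta)
writeLines(c(">Achl_ACHLO006-09_1_F", "CTGGGCGTCTGA", "GCAGGAATGG"), outputFasta)

# Compare the joined sequences, ignoring how the lines were wrapped.
sameSequence <- identical(readFastaSequence(inputFasta),
                          readFastaSequence(outputFasta))
print(sameSequence)
```

For a real check, point inputFasta at A_chloroticaFFNfa and outputFasta at the file written under outputDir.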


Generating SangerRead report (FASTA)

Finally, users can save sangerReadFfa as a static HTML report, which is generated by knitting Rmd files. In this example, the report is written under the R session's temporary directory, whose path is returned by the tempdir function.

generateReport(sangerReadFfa,
               outputDir = tempdir())
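
Because the temporary directory path is not obvious, it can help to list the HTML files under it after the report has been generated. The sketch below uses base R only; searching recursively is an assumption, made because the exact folder layout that generateReport creates under outputDir is not described here.

```r
# tempdir() returns the same per-session directory on every call,
# so the path passed as outputDir can be recovered later in the session.
outputDir <- tempdir()
samePath  <- identical(outputDir, tempdir())

# List any HTML files below outputDir (empty if no report has been written yet).
htmlReports <- list.files(outputDir,
                          pattern    = "\\.html$",
                          recursive  = TRUE,
                          full.names = TRUE)
print(htmlReports)
```

Run this after generateReport in the same R session; the temporary directory is removed when the session ends, so copy the report elsewhere if it needs to be kept.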

SangerRead_Report_fasta.html is the SangerRead HTML report generated in this example. The report contains the ‘Basic Information’, ‘DNA Sequence’ and ‘Amino Acids Sequence’ sections.




Code summary (SangerRead, fasta)

(1) Preparing SangerRead FASTA input

inputFilesPath <- system.file("extdata/", package = "sangeranalyseR")
A_chloroticaFFNfa <- file.path(inputFilesPath,
                               "fasta",
                               "SangerRead",
                               "Achl_ACHLO006-09_1_F.fa")

(2) Creating SangerRead instance from FASTA

# using `constructor` function to create SangerRead instance
sangerReadFfa <- SangerRead(inputSource        = "FASTA",
                            readFeature        = "Forward Read",
                            readFileName       = A_chloroticaFFNfa,
                            fastaReadName      = "Achl_ACHLO006-09_1_F")

# using `new` method to create SangerRead instance
sangerReadFfa <- new("SangerRead",
                     inputSource        = "FASTA",
                     readFeature        = "Forward Read",
                     readFileName       = A_chloroticaFFNfa,
                     fastaReadName      = "Achl_ACHLO006-09_1_F")

(3) Writing SangerRead FASTA files (FASTA)

writeFasta(sangerReadFfa)

Running this writes one FASTA file:

  1. Achl_ACHLO006-09_1_F.fa

(4) Generating SangerRead report (FASTA)

generateReport(sangerReadFfa)

You can check the html report of this SangerRead example (FASTA).