Problem 0 - Replication: DNA polymerases are enzymes that synthesize DNA
molecules from nucleotides. They are essential to DNA replication, for example
during cell division: they read existing DNA and produce new copies of
double-stranded DNA. In particular, DNA polymerase reads a template strand in
reverse (3' to 5') and synthesizes a complementary strand of DNA (along the 5'
to 3' direction of the newly synthesized strand). In short, it produces the
reverse complement of the original strand.
(5 points) Write a simple function that accepts a DNA sequence and returns its
reverse complement. Both the input and the output should be strings (this
applies throughout the project unless stated otherwise). Write it in two lines
for full credit.
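A minimal two-line sketch, assuming the input contains only uppercase A, C, G and T (lowercase letters or ambiguity codes such as N would pass through unchanged): reverse the string with a slice, then swap each base for its complement with `str.translate`.

```python
def RevComp(seq):
    # Reverse the strand, then map each base to its Watson-Crick complement.
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

print(RevComp("ATGC"))  # GCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check for any implementation.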
Problem 1 - Transcription and Translation: As we discussed in class, the
central dogma of molecular biology is the transcription of DNA into RNA,
followed by the translation of RNA into proteins. During transcription, RNA
polymerase reads the DNA template strand in reverse (3' to 5') and synthesizes
a messenger RNA (mRNA) strand that is complementary to the template strand.
The mRNA is then translated into protein by the ribosome. We will write
functions to emulate this process.
(5 points) Write a function that transcribes the complementary (template) DNA
strand of a gene into RNA. Write it in two lines for full credit.
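One possible two-line `Transcribe`, again assuming uppercase A, C, G, T input only. It mirrors the reverse complement, except that an adenine in the template pairs with uracil (U) in RNA instead of thymine.

```python
def Transcribe(template):
    # Read the template 3' to 5' (reverse it) and pair each base with its RNA partner.
    return template[::-1].translate(str.maketrans("ACGT", "UGCA"))

print(Transcribe("TACG"))  # CGUA
```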
(5 points) Note that the Transcribe and RevComp functions do almost the same
work. Write a single-line transcription function that uses RevComp.
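Because the transcript differs from the DNA reverse complement only in using U wherever DNA has T, a one-line version can post-process `RevComp` with `str.replace`. A minimal `RevComp` (uppercase ACGT assumed) is included so the block runs on its own.

```python
def RevComp(seq):
    # Reverse the strand and complement each base.
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

def Transcribe(template):
    # The RNA transcript is the reverse complement with thymine replaced by uracil.
    return RevComp(template).replace("T", "U")

print(Transcribe("TACG"))  # CGUA
```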
(10 points) Going back to the central dogma model: we have transcribed the DNA
into mRNA, and what remains is to translate the mRNA into proteins.
Just as the code you write makes a computer produce some output, DNA serves as
the code for a cell to produce mRNA. Now, mRNA serves as the code for the cell
to produce proteins. Proteins are made up of amino acids, so cells must
interpret the code in mRNA correctly to produce the amino acids that later
combine to form proteins. To do this, a cell reads the nucleotides in mRNA in
groups of three, called codons.
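Reading in groups of three amounts to stepping through the string with a stride of 3. The helper name `split_codons` below is hypothetical and only illustrates the idea; note that a sequence whose length is not a multiple of three leaves a short final chunk.

```python
def split_codons(mrna):
    # Slice the mRNA into consecutive, non-overlapping triplets.
    return [mrna[i:i + 3] for i in range(0, len(mrna), 3)]

print(split_codons("AUGAAAUAA"))  # ['AUG', 'AAA', 'UAA']
print(split_codons("AUGAA"))      # ['AUG', 'AA']
```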
The genetic code for the production of amino acids is provided below as a
dictionary. Each key is a codon, and the corresponding value is the amino acid
that codon produces. For example, the codon 'AAA' produces the amino acid
denoted by 'K' (lysine). The cell stops translation when it encounters a stop
codon; in the following dictionary, stop codons are mapped to the value 'X'.
The start codon, AUG, indicates where translation should start (we won't code
anything for this here, but it's good to know).
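To show how such a dictionary is used, here is a small hypothetical excerpt with a few entries from the standard genetic code (it is not the assignment's full dictionary). A lookup gives the amino acid, and comparing the value to 'X' detects a stop codon.

```python
# Hypothetical excerpt of a codon dictionary; stop codons map to 'X'.
genetic_code_excerpt = {
    "AAA": "K",  # lysine
    "AUG": "M",  # methionine, also the start codon
    "UUU": "F",  # phenylalanine
    "UAA": "X",  # stop
    "UAG": "X",  # stop
    "UGA": "X",  # stop
}

def is_stop(codon):
    # A codon is a stop codon if the dictionary maps it to 'X'.
    return genetic_code_excerpt[codon] == "X"

print(genetic_code_excerpt["AAA"])  # K
print(is_stop("UGA"))               # True
```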
For this problem, fill in the missing line below to make a function that
translates mRNA into protein. Note: 1) your function should return an error if
the length of the input is not a multiple of three, and 2) your function should
not produce any more amino acids once a stop codon is found.
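A sketch of such a function, under a few stated assumptions: the name `Translate` is hypothetical, `CODON_TABLE` is a stand-in for the provided dictionary (the standard genetic code, built compactly from the usual UCAG ordering with stops as 'X'), "return an error" is interpreted as raising `ValueError`, and the stop codon itself is not included in the output.

```python
BASES = "UCAG"
# Standard genetic code in UCAG order (first base outermost); stops are 'X'.
AMINO_ACIDS = "FFLLSSSSYYXXCCXWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

def Translate(mrna):
    # Note 1: reject input that cannot be split evenly into codons.
    if len(mrna) % 3 != 0:
        raise ValueError("mRNA length must be a multiple of three")
    protein = []
    for i in range(0, len(mrna), 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        # Note 2: stop at the first stop codon and emit nothing after it.
        if amino_acid == "X":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(Translate("AUGAAAUUUUAAGGG"))  # MKF
```

Collecting amino acids in a list and joining once at the end avoids building many intermediate strings; substituting the assignment's dictionary for `CODON_TABLE` should not change the logic.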
(5 points) The mRNA strand is the reverse complement of the template strand of
DNA, which is itself the reverse complement of the coding strand, so the mRNA
has the same sequence as the coding strand with U in place of T. The function
above translates mRNA into protein; below, write a function that takes a DNA
coding strand as input and returns the protein. The same notes as above apply
here. The strands below demonstrate useful terminology for understanding the
remaining problems: