23. Manipulating DNA Sequences
In this exercise, we will work with DNA sequences. This will allow us to practice the fundamental concepts covered in the introductory part of the course: integers, floats, strings, functions, conditionals, and the while loop.
The first set of questions involves the basic implementation of these concepts. The subsequent questions, when combined, will form an independent program.
Please make sure to follow the instructions carefully. Do not change the function names, parameters, or return values.
23.1. What is a DNA sequence?
A DNA sequence describes a molecule made up of basic building blocks called nucleobases. There are four different types: adenine (“A”), cytosine (“C”), guanine (“G”), and thymine (“T”). In living cells, DNA molecules are composed of two antiparallel strands twisted around each other to form a double helix. The bases pair together specifically: an “A” base always pairs with a “T”, and a “G” base always pairs with a “C”.
23.1.1. Finding the Complementary Base
Write a function complement(base) that takes a single base (i.e., a character) as input and returns the corresponding complementary base. For example, the call complement("A") should return "T".
Hint: You should define the function using def, handle the different cases using if, elif, and else, and finally return the appropriate value using return.
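As an illustration of the def / if / elif / else / return structure mentioned in the hint, here is a minimal sketch on a different problem: mapping a base letter to its full name. The function base_name is a hypothetical helper, not part of the exercise, and returning "unknown" for other characters is an assumption made for the example.

```python
def base_name(base):
    """Return the full name of a base letter (hypothetical helper)."""
    if base == "A":
        name = "adenine"
    elif base == "C":
        name = "cytosine"
    elif base == "G":
        name = "guanine"
    elif base == "T":
        name = "thymine"
    else:
        # Any other character is not one of the four bases.
        name = "unknown"
    return name


print(base_name("G"))  # guanine
```

The same skeleton, with different values in each branch, should be enough for complement.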
def complement(base):
    pass
23.2. Display
23.2.1. Vertical Display
Write a function vstring(dna) that takes a DNA sequence as a string and returns a string representing the sequence vertically, showing each base and its complementary base. For example, the call vstring("CGTAGTTTCGA") should produce the following return value:
C-G
G-C
T-A
A-T
G-C
T-A
T-A
T-A
C-G
G-C
A-T
Hint: Use the complement function you defined earlier. Be mindful of potential infinite loops when using while. Remember that you can concatenate strings using the + operator. You can access a character of a string by its index (s[i]).
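The three ingredients of the hint (a while loop over indices, s[i], and concatenation with +) can be sketched on a simpler task. The function one_per_line below is a hypothetical helper that only puts each character on its own line; ending every line with "\n" is an assumption of this sketch, not a requirement stated for vstring.

```python
def one_per_line(text):
    """Return text with each character on its own line (hypothetical helper)."""
    result = ""
    i = 0
    while i < len(text):
        # Build the result piece by piece with the + operator.
        result = result + text[i] + "\n"
        i = i + 1  # forgetting this update would make the loop infinite
    return result


print(one_per_line("CGT"))
```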
def vstring(dna):
    pass
23.2.2. Horizontal Display
Now, we want to create a horizontal display of the DNA sequence. Write a function hstring(dna) that produces the following return value:
CGTAGTTTCGA
|||||||||||
GCATCAAAGCT
Hint: You can create strings to store intermediate results. Remember that strings in Python are immutable, but you can concatenate strings together to build the final result. If you cannot find the “pipe” (|) on your keyboard, you can copy/paste it.
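To illustrate the idea of intermediate strings, the sketch below builds a second line in a separate variable and only assembles the final result at the end. The function underline is a hypothetical helper chosen for the example; it is not part of the exercise.

```python
def underline(title):
    """Return title followed by a line of '=' of the same length (hypothetical helper)."""
    line = ""  # intermediate result, built one character at a time
    i = 0
    while i < len(title):
        # Strings are immutable: this creates a new string and rebinds `line`.
        line = line + "="
        i = i + 1
    # Assemble the final result from the two pieces.
    return title + "\n" + line


print(underline("DNA"))
```

For a line made of a single repeated character, the expression "=" * len(title) would give the same string without a loop.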
def hstring(dna):
    pass
23.3. Inputting DNA Sequences
23.3.1. Testing a Sequence
Write a function is_dna(sequence) that takes a string as a parameter and returns True if the string is composed exclusively of the uppercase characters 'A', 'T', 'G', and 'C' (in any order and any number), and False otherwise.
print(is_dna("CGTAGTTTCGA")) # Output True
print(is_dna("CGTXAGTTTCGA")) # Output False
Hint: The in keyword can be used to check if a character is present within a string. Be cautious of potential infinite loops when using iteration.
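As a sketch of how in can be combined with a while loop, the hypothetical helper is_binary below checks that a string contains only the characters '0' and '1'. It is an analogous problem, not the solution to is_dna, and treating the empty string as valid is an assumption of this example.

```python
def is_binary(text):
    """Return True if every character of text is '0' or '1' (hypothetical helper)."""
    i = 0
    while i < len(text):
        # `in` tests whether the single character text[i] appears in "01".
        if text[i] not in "01":
            return False  # one bad character is enough to stop
        i = i + 1  # always move forward, otherwise the loop never ends
    return True


print(is_binary("0110"))  # True
print(is_binary("012"))   # False
```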
def is_dna(sequence):
    pass
23.3.2. User Input
Write a function input_dna() that prompts the user to enter a string representing a DNA sequence. The function should keep asking the user to re-enter the sequence until a valid DNA sequence is provided. A valid DNA sequence contains only the characters ‘A’, ‘C’, ‘G’, and ‘T’ (case insensitive). Once a valid sequence is entered, the function should return that sequence. Here’s an example of how the function might be used:
Input your DNA sequence:
ACGTCF
Your sequence is not composed of 'ACGT', enter a new sequence.
ACGTCGAAGCG
'ACGTCGAAGCG'
Hint: Think about how we handled user input in the number guessing game. Don’t forget to include a return statement in your function.
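The "ask, check, ask again" pattern from the hint can be sketched on a simpler question. The function input_yes_no is a hypothetical helper; its ask parameter (which defaults to the built-in input) is an assumption added only so that the loop can be tried without typing at the keyboard.

```python
def input_yes_no(ask=input):
    """Keep asking until the answer is 'yes' or 'no' (hypothetical helper)."""
    answer = ask("Continue? (yes/no) ")
    # Loop while the answer is invalid; each pass asks for a new value,
    # so the condition can eventually become False.
    while answer != "yes" and answer != "no":
        answer = ask("Please answer 'yes' or 'no': ")
    return answer  # without this line the function would return None
```

Calling input_yes_no() with no argument reads from the keyboard, as input_dna is expected to do.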
def input_dna():
    pass
23.4. Molar Mass
Write a function mass(dna) that takes a DNA sequence as a parameter and returns the molar mass of the sequence. Each base has a different weight:
“A”: 135 g/mol
“C”: 111 g/mol
“G”: 151 g/mol
“T”: 126 g/mol
mass("CGTAGTTTCGA") # output : 2876
Hint: To calculate the molar mass, you will need to sum the weights of each base and its complementary base. Use the complement function from previous exercises to find the complementary base and ensure you account for each base and its pair in the calculation.
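The arithmetic behind the hint can be checked by hand on a very short sequence. The sketch below uses the weights listed above and the pairing rule (A with T, C with G); the sequence "ACG" is an arbitrary choice for the example, and the variable names are illustrative.

```python
# Weights in g/mol, taken from the list above.
A = 135
C = 111
G = 151
T = 126

at_pair = A + T  # an "A" facing a "T" (or a "T" facing an "A")
cg_pair = C + G  # a "C" facing a "G" (or a "G" facing a "C")

# Hand computation for "ACG": one A-T pair and two C-G pairs.
total = at_pair + 2 * cg_pair
print(at_pair, cg_pair, total)  # 261 262 785
```

Your mass function should return the same total for "ACG" when it adds, for every position, the weight of the base and the weight of its complement.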
def mass(dna):
    pass
23.5. Occurrences
23.5.1. Contains a Pattern
Write a function contains(dna, pattern, pos) that takes two strings, dna and pattern, and an integer pos. The function should return True if dna contains pattern starting at position pos, and False otherwise. For example:
print(contains("CGTACGTGTTTCGA", "CGT", 0)) # True
print(contains("CGTCGT", "CGT", 3)) # True
print(contains("AAAACG", "CGT", 4)) # False, and raises no error
print(contains("CCCACGTGTTTCGA", "CGT", 0)) # False
print(contains("CGT", "CGTA", 0)) # False, pattern is longer than the dna sequence
Hint: First, handle the case where the length of the pattern combined with the position exceeds the length of the dna sequence.
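The length check described in the hint can be written as a single comparison. The function fits below is a hypothetical helper that only performs this check, assuming pos is not negative; it does not compare any characters.

```python
def fits(dna, pattern, pos):
    """Return True if pattern placed at pos stays inside dna (hypothetical helper)."""
    # The last index used would be pos + len(pattern) - 1,
    # which must be at most len(dna) - 1.
    return pos + len(pattern) <= len(dna)


print(fits("CGTCGT", "CGT", 3))  # True: indices 3, 4 and 5 exist
print(fits("AAAACG", "CGT", 4))  # False: index 6 would be needed
print(fits("CGT", "CGTA", 0))    # False: the pattern is longer than the sequence
```

Doing this test first means the character-by-character comparison never reads an index outside dna, which is why the third example in the list above raises no error.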
def contains(dna, pattern, pos):
    pass
23.5.2. First occurrence
Write a function first_occurence(dna, pattern, pos) that takes three parameters: a string dna (the DNA sequence), a string pattern (the pattern to search for), and an integer pos (the starting position for the search). The function should return the index of the first occurrence of pattern in dna at or after index pos. If the pattern is not found in dna from pos onward, the function should return None.
print(first_occurence("CGTCGT", "CGT", 0)) # 0
print(first_occurence("CGTCGT", "CGT", 1)) # 3
print(first_occurence("AAAAAA", "CGT", 0)) # None
print(first_occurence("AA", "A", 1)) # 1
Hint: Use the contains function defined in the previous exercise to check for the presence of the pattern at each position. This way, the two nested while loops are split between two functions, so you do not have to manage both loops simultaneously.
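To show what splitting two nested loops between two functions looks like, here is a sketch on an analogous problem: finding the first run of n identical characters. Both is_run_at and first_run are hypothetical helpers invented for this illustration; they are not part of the exercise.

```python
def is_run_at(text, i, n):
    """Inner loop: True if the n characters starting at index i are identical."""
    if i + n > len(text):
        return False  # the run would go past the end of text
    k = 1
    while k < n:
        if text[i + k] != text[i]:
            return False
        k = k + 1
    return True


def first_run(text, n, pos):
    """Outer loop: index of the first run of n identical characters at or after pos."""
    i = pos
    while i < len(text):
        if is_run_at(text, i, n):  # the inner loop is hidden in the helper
            return i
        i = i + 1
    return None  # no run found from pos onward


print(first_run("CGTTTA", 3, 0))  # 2
print(first_run("CGTTTA", 3, 3))  # None
```

Each function has a single loop with a single index, which is easier to reason about than one function with two indices.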
def first_occurence(dna, pattern, pos):
    pass
23.5.3. Number of occurrences
Write a function number_of_occurences(dna, pattern) that takes two parameters: a DNA sequence dna and a pattern pattern. The function should return the number of times pattern appears in dna.
Hint: To solve this problem, you can use a loop to repeatedly search for the pattern in the dna sequence, updating the starting position each time you find an occurrence.
print(number_of_occurences("CGTCGT", "CGT")) # 2
print(number_of_occurences("CCCCCCC", "C")) # 7
print(number_of_occurences("CCCCCCC", "A")) # 0
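If you want an independent way to check your results, the loop described in the hint can be sketched with Python's built-in str.find, which returns -1 when the pattern is absent. The function count_with_find is a hypothetical cross-check, not the expected solution, and it assumes that overlapping occurrences are counted (the examples above do not settle that point).

```python
def count_with_find(text, pattern):
    """Count occurrences of pattern in text, overlaps included (hypothetical cross-check)."""
    count = 0
    pos = text.find(pattern)  # -1 when the pattern is absent
    while pos != -1:
        count = count + 1
        # Restart the search one position after the last match.
        pos = text.find(pattern, pos + 1)
    return count


print(count_with_find("CGTCGT", "CGT"))  # 2
print(count_with_find("CCCCCCC", "C"))   # 7
print(count_with_find("CCCCCCC", "A"))   # 0
```

In your own version, first_occurence plays the role of str.find, with None instead of -1 as the "not found" value.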
def number_of_occurences(dna, pattern):
    pass
23.6. Main
We can now assemble all the functions into a main function. Remove, if not already done, ALL executable code statements. Keep only the function definitions.
Then add the following main function. If you have followed the instructions, everything should work.
def main():
    dna = input_dna()
    pattern = input_dna()
    print()
    print("-"*70)
    print("INFORMATIONS")
    print(hstring(dna))
    print(f"Molar mass of the dna sequence is {mass(dna)} g/mol")
    print(f"There are {number_of_occurences(dna, pattern)} occurence(s) of the pattern {pattern} inside the dna sequence.")
    print("-"*70)

main()
The output should look something like this:
Input your DNA sequence:
CGTCGT
Input your DNA sequence:
CGT
----------------------------------------------------------------------
INFORMATIONS
CGTCGT
||||||
GCAGCA
Molar mass of the dna sequence is 1570 g/mol
There are 2 occurence(s) of the pattern CGT inside the dna sequence.
----------------------------------------------------------------------
23.7. The rich Library
You can install the rich library using your package manager (for example, pip install rich). With this module, you can use Markdown syntax to create a more visually appealing display. The syntax elements are available on this page.
23.7.1. Testing the rich module
Test the module using the following code.
MARKDOWN = """
# This is an h1
Rich can do a pretty *decent* job of rendering markdown.
1. This is a list item
2. This is another list item
"""
from rich.console import Console
from rich.markdown import Markdown
console = Console()
md = Markdown(MARKDOWN)
console.print(md)
As you can see, it is simple and nice looking.
23.7.2. Using rich
Adapt your main function to use the module and produce a nicer display. Warning: in Markdown, indentation at the beginning of a line is significant.
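The indentation warning matters as soon as the Markdown text is written inside an indented function body: lines that start with four or more spaces are treated by Markdown as a code block instead of a heading or a list. One possible workaround, sketched below, is to remove the common indentation with textwrap.dedent from the standard library. The names build_report and show, and the layout of the report, are assumptions made for this example.

```python
import textwrap


def build_report(sequence, mass_value):
    """Return Markdown text for a small report (hypothetical helper)."""
    text = f"""
    # DNA report

    - Sequence: **{sequence}**
    - Molar mass: *{mass_value} g/mol*
    """
    # Without dedent, every line would start with four spaces and Markdown
    # would render the whole text as an indented code block.
    return textwrap.dedent(text)


def show(markdown_text):
    """Render Markdown text in the terminal with rich."""
    # Imported here so that build_report can be used without rich installed.
    from rich.console import Console
    from rich.markdown import Markdown

    Console().print(Markdown(markdown_text))


report = build_report("CGT", 785)
print(report)
# show(report)  # uncomment to render the report with rich
```

This sketch only works as written when the inserted values fit on one line; a multi-line value such as the result of hstring would need to be handled separately.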