DNA / RNA / Protein: Information Molecules and Languages
IGS 265/350/550 Computer Laboratory
M. Rice/ D. Krizanc / M. Weir
Part 1: DNA / RNA / Protein as Information Molecules
- DNA is composed of four types of bases (A, T, C, G). What would be the
implications if DNA structure had the four bases organized in clumps of four:
(ATCG)(ATCG)(ATCG)(ATCG)(ATCG)...? What is the evidence that this is not
the case? (check out CSH: DNA/proteins (click "animation"), CSH: Watson/Crick)
- Is ATGCGC equivalent to CGCGTA? Explain why/why not.
- Is ATGCGC equivalent to GCGCAT? Explain why/why not.
- Does RNA sequence match DNA sequence?
- What is cDNA?
- Illustrated is an alignment of the engrailed gene cDNA and genomic DNA sequences. [engrailed is a gene required for embryo development.]
Explain the relationship between the sequences. Consider how this
alignment of cDNA and genomic DNA sequences might have been obtained.
- Could codons be 2 nucleotides long? (check out CSH: codons)
- If you know a protein sequence, do you know its DNA sequence?
- Is the protein sequence MASS equivalent to SSAM? Explain why/why not. (check out jkimball link)
- Does amino acid sequence determine protein structure and function?
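
Several of the Part 1 questions compare ATGCGC with CGCGTA and GCGCAT, and one asks whether codons could be 2 nucleotides long. The following is a minimal Python sketch for exploring these by experiment. The function names are illustrative assumptions, not part of the course materials, and the sketch assumes sequences containing only A, T, C, and G.

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse(seq):
    """Return the sequence read in the opposite direction."""
    return seq[::-1]


def reverse_complement(seq):
    """Return the partner strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def codon_count(length, alphabet_size=4):
    """Number of distinct codons of a given length over a 4-letter alphabet."""
    return alphabet_size ** length


if __name__ == "__main__":
    print(reverse("ATGCGC"))             # CGCGTA
    print(reverse_complement("ATGCGC"))  # GCGCAT
    for n in (1, 2, 3):
        # Compare each count with the 20 amino acids that must be encoded.
        print(n, codon_count(n))
```

Comparing the two printed sequences with the ones in the questions, and the codon counts with the number of amino acids, may help frame the written explanations.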
Part 2: DNA as a Language
- What is the definition of an open reading frame (ORF)? [Think of a definition
involving the start codon ATG, and the stop codons TAA, TAG, and TGA.]
- Write pseudocode for a program that identifies all ORFs of a DNA string.
- Open Reading Frames can be determined using the ORF finder at NCBI.
Use this to determine the ORFs in the engrailed cDNA and genomic sequences.
You can click on the ORF boxes to display the ORF amino acid sequence.
- Here is the actual ORF used for the Engrailed protein. Use the codon
tables to work out by hand the first ten amino acids of Engrailed. (See jkimball
for a codon table and description of translation.)
- Why are other ORFs of the engrailed cDNA not actually used to make the
Engrailed protein? For example, there are many copies of the start codon ATG
(corresponding to methionine) within the cDNA.
Kozak (for many organisms) and Cavener (for Drosophila) [see assigned
background reference: Cavener, D.R. (1987) Comparison of the consensus
sequence flanking translational start sites in Drosophila and vertebrates.
Nucleic Acids Res. 15:1353-1361] have examined the frequencies of different
bases at positions near the known start (ATG) codons of large numbers of
proteins. In particular, Cavener collected the sequences that occur in the
10 positions upstream (5') of the ATG that initiates translation
(positions -10 to -1) for 77 fly genes.
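
The kind of tabulation described above, base frequencies at each of the 10 positions upstream of the start codon, can be sketched in Python as follows. This is an illustration only: the function name and the three toy sequences are invented for the example and are not data from Cavener or from dbCDNA.

```python
from collections import Counter


def upstream_frequencies(upstream_seqs, width=10):
    """Return one {base: frequency} dict per position, ordered -width .. -1.

    Each input string is assumed to hold exactly the `width` bases
    immediately 5' of a start codon.
    """
    if not upstream_seqs:
        raise ValueError("need at least one sequence")
    counts = [Counter() for _ in range(width)]
    for seq in upstream_seqs:
        if len(seq) != width:
            raise ValueError("expected %d bases, got %r" % (width, seq))
        for i, base in enumerate(seq.upper()):
            counts[i][base] += 1
    total = len(upstream_seqs)
    return [{b: counts[i][b] / total for b in "ACGT"} for i in range(width)]


if __name__ == "__main__":
    # Toy, made-up upstream regions (hypothetical, for illustration only).
    toy = ["ACGTACGTAA", "ACGTACGTCA", "TCGTACGTAA"]
    for offset, freqs in zip(range(-10, 0), upstream_frequencies(toy)):
        print(offset, freqs)
```

With real sequences, a position where one base is strongly over-represented would suggest that the context around an ATG matters for whether it is used as a start site.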
Based on cDNA sequences from the Berkeley Drosophila Genome Project, the
Wesleyan IGS database, dbCDNA, has this information for 10,284 Drosophila
transcripts. We will discuss the use of relational databases in
bioinformatics later in the course.
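
For the Part 2 questions on open reading frames and hand translation, here is one possible Python sketch, offered as a starting point rather than as the expected answer. It assumes a particular ORF definition: an ATG followed in the same frame by the first TAA, TAG, or TGA. It scans only the three frames of the given strand, and an ATG nested inside a longer ORF is not reported separately. The NCBI ORF finder may use different rules. The names `find_orfs` and `translate` are illustrative.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Standard genetic code, with codons enumerated in T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDDDEEEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def find_orfs(dna, min_codons=1):
    """Return (start, end, frame) tuples; end is exclusive and includes the stop."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG since the last stop
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ATG
    return orfs


def translate(dna, max_codons=None):
    """Translate from position 0 until a stop codon or max_codons is reached."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unrecognized codon
        if aa == "*":
            break
        protein.append(aa)
        if max_codons is not None and len(protein) == max_codons:
            break
    return "".join(protein)


if __name__ == "__main__":
    seq = "CATGCCCTGAATGTAA"  # short made-up sequence, not engrailed
    for start, end, frame in find_orfs(seq):
        print(start, end, frame, translate(seq[start:end], max_codons=10))
```

Running the same functions on the reverse complement of a sequence would cover the other three reading frames. Passing `max_codons=10` gives a way to check a hand translation of the first ten amino acids.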
Part 1 Questions 2, 3, 8 and 9
Part 2 Question 2
Copyright 2008 Wesleyan University