DNA / RNA / Protein: Information Molecules and Languages

IGS 265/350/550 Computer Laboratory

M. Rice / D. Krizanc / M. Weir

Part 1: DNA / RNA / Protein as Information Molecules

  1. DNA is composed of four types of bases (ATCG). What would the implications be if the four bases were organized in repeating clumps of four: (ATCG)(ATCG)(ATCG)(ATCG)(ATCG)...? What is the evidence that this is not the case? (check out CSH: DNA/proteins (click "animation"), CSH: Watson/Crick)
  2. Is ATGCGC equivalent to CGCGTA? Explain why/why not.
  3. Is ATGCGC equivalent to GCGCAT? Explain why/why not.
  4. Does RNA sequence match DNA sequence?
  5. What is cDNA?
  6. Illustrated is an alignment of the engrailed gene cDNA and genomic DNA sequences. [engrailed is a gene required for embryo development.] Explain the relationship between the sequences. Consider how this alignment of cDNA and genomic DNA sequences might have been obtained.
  7. Could codons be 2 nucleotides long? (check out CSH: codons)
  8. If you know a protein sequence, do you know its DNA sequence?
  9. Is the protein sequence MASS equivalent to SSAM? Explain why/why not. (check out jkimball link)
  10. Does amino acid sequence determine protein structure and function?
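
For experimenting with questions 2 to 4, the following is a minimal Python sketch of the basic string operations on a DNA sequence: reversing it, complementing it, taking the reverse complement, and transcribing a coding strand into RNA. The function names are illustrative assumptions and do not come from the course materials; the sketch also assumes uppercase sequences containing only A, C, G and T.

```python
# Illustrative helpers only; the names are hypothetical, not part of the course software.
# Assumes uppercase input containing only A, C, G and T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse(seq):
    """Return the same strand written backwards (3' to 5')."""
    return seq[::-1]


def complement(seq):
    """Return the base-paired partner of each base, in the same left-to-right order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return complement(seq)[::-1]


def transcribe(coding_strand):
    """Return the RNA with the same sequence as the coding strand (T replaced by U)."""
    return coding_strand.replace("T", "U")


seq = "ATGCGC"
print(reverse(seq))             # CGCGTA
print(reverse_complement(seq))  # GCGCAT
print(transcribe(seq))          # AUGCGC
```

Comparing the printed strings with the sequences in questions 2 and 3 shows which operation relates each pair; deciding whether that makes them "equivalent" is still the point of the question.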

Part 2: DNA as a Language

  1. What is the definition of an open reading frame (ORF)? [Think of a definition involving the start codon ATG, and the stop codons TAA, TAG, and TGA.]
  2. Write pseudocode for a program that identifies all ORFs of a DNA string.
  3. Open reading frames can be determined using the ORF finder at NCBI. Use it to determine the ORFs in the engrailed cDNA and genomic sequences. You can click on the ORF boxes to display the ORF amino acids.
  4. Here is the actual ORF used for the Engrailed protein. Use the codon tables to work out by hand the first ten amino acids of Engrailed. (See jkimball for a codon table and description of translation.)
  5. Why are other ORFs of the engrailed cDNA not actually used to make the Engrailed protein? For example, there are many copies of the start codon ATG (corresponding to methionine) within the long ORF.

    Kozak (for many organisms) and Cavener (for Drosophila) have examined the frequencies of different bases at positions near the known start (ATG) codons of large numbers of proteins [see assigned background reference: Cavener, D.R. (1987) Comparison of the consensus sequence flanking translational start sites in Drosophila and vertebrates. Nucleic Acids Res. 15:1353-1361]. In particular, Cavener collected the sequences that occur in the 10 positions upstream (5') of the initiating ATG (-10 to -1) for 77 fly genes. Based on cDNA sequences from the Berkeley Drosophila Genome Project, the Wesleyan IGS database, dbCDNA, has this information for 10,284 Drosophila transcripts. We will discuss the use of relational databases in bioinformatics later in the course.
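
Two ideas from Part 2 can be sketched in a few lines of Python: scanning a DNA string for ORFs (questions 1 and 2) and tabulating base frequencies at the positions upstream of annotated start codons, in the spirit of the Kozak and Cavener surveys. This is a rough sketch under stated assumptions, not the NCBI ORF finder's algorithm and not Cavener's procedure. It assumes an ORF runs from an ATG to the first in-frame stop codon, it scans only the given strand, and it reports only the first ATG before each stop. The function names, the `(sequence, atg_index)` input format and the toy sequences are all invented for illustration.

```python
from collections import Counter

START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=1):
    """Return (start, end, frame) for each ATG-to-stop stretch on the given strand.

    Assumed definition: an ORF begins at the first ATG seen in a frame and runs
    in-frame to the next stop codon. `end` is exclusive and includes the stop.
    ATGs nested inside an ORF are not reported separately.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                # Number of codons before the stop, counting the ATG itself.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


def upstream_base_counts(transcripts, width=10):
    """Count bases at positions -width..-1 relative to each annotated start codon.

    `transcripts` is an iterable of (sequence, atg_index) pairs; this input
    format is an assumption made for the sketch.
    """
    counts = {pos: Counter() for pos in range(-width, 0)}
    for seq, atg in transcripts:
        # Skip entries without a full upstream window or without ATG at the index.
        if atg < width or seq[atg:atg + 3].upper() != START:
            continue
        for pos in range(-width, 0):
            counts[pos][seq[atg + pos].upper()] += 1
    return counts


print(find_orfs("ATGAAATAG"))  # [(0, 9, 0)]

# Invented toy sequences, not real Drosophila transcripts.
toy = [("CAAAATGGCC", 4), ("GAAAATGTTT", 4)]
counts = upstream_base_counts(toy, width=4)
print(counts[-1])  # Counter({'A': 2})
```

To cover all six reading frames, the same scan could be run again on the reverse complement of the sequence. Dividing each count by the number of transcripts gives the per-position base frequencies that surveys of this kind report.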


Part 1 Questions 2, 3, 8 and 9

Part 2 Question 2

Copyright 2008 Wesleyan University