An Introduction to Drosophila Genomics.

In this lab, you will get a chance to integrate different concepts of GENE and MAP. You have considered genes variously as genetically mappable units, information residing on chromosomes, that which underlies phenotypes, scorable markers, historically discovered mutants, translatable coding regions, clonable segments of DNA, and sequenced strings of nucleotides. (You can probably add more.)

Today, you will choose one of the markers commonly scored in Drosophila, and follow it through various databases to find out:

  • its cytological map position
  • its genetic map position
  • what is known about alleles of the gene
  • what other genes are known to be nearby
  • the gene's DNA sequence, and how it would be cut by a restriction enzyme (useful if you wanted to clone it into a vector)
  • the gene's protein sequence (by conceptual translation)
  • whether there are recognizable features in the sequence, and what their likely functions are
  • whether there are similar sequences (suggesting homologous genes) in other species, and what is known about these potential homologs

At the end of this session, you will probably know more than you ever wanted to know about the gene you picked, and you will also have a feel for the differences between kinds of maps, the power of databases, and the strengths and weaknesses of some biological database search tools.

Choose a partner and one of the Drosophila genes below, and let's get started.

brown (bw)

forked (f)

vermilion (v)

white (w)

cinnabar (cn)

Stubble (Sb)

scarlet (st)

eyeless (ey)


The major resource you will use is FlyBase, the online incarnation of what used to be known as the Redbook: a big red book called "Genetic Variations of Drosophila melanogaster", which listed all known Drosophila mutants and what was known about them. FlyBase has all of the original Redbook information, plus molecular information and links to other databases. There are links directly to relevant sections of the Berkeley Drosophila Genome Project, which has genomic sequence, cloned and sequenced cDNAs, and annotation of predicted genes, and to searches of international sequence databases such as GenBank (nucleotide sequences) and SWISS-PROT (protein sequences).

Open FlyBase (www.Flybase.org) in a new window, and start by searching for your gene in the "genes" section.

Find your gene among the query results.

  1. Verify that the abbreviation matches, so you know you have the right gene.
  2. If you are on a page with several genes in a table, click on your gene's symbol to get to a brief report on the gene. You might bookmark this synopsis page, since you'll want to come back to it. Information from many sources is summarized on this page, with links to more details.

About the Gene and its Mutants.

  1. Near the top of the synopsis page, you should be able to work out the different kinds of map information. C. B. Bridges devised a scheme for naming the banding patterns when he drew detailed maps of the larval salivary gland chromosomes in the 1930s. He assigned 20 numbered divisions to each major chromosome arm: 1-20 for the X chromosome (also known as chromosome 1), 21-40 for 2L, 41-60 for 2R, 61-80 for 3L, and 81-100 for 3R, with 101 and 102 left over for the tiny 4th chromosome. L and R denote the left and right arms of the metacentric 2nd and 3rd chromosomes. Each division is split into six lettered subdivisions (A through F), each beginning with a prominent band, and the individual bands within a subdivision are numbered. There is often some uncertainty in reading fine bands, so a range may be listed. For example, 21D1-2 refers to a specific double band (bands 1 and 2 of subdivision D in division 21) near the left end of chromosome 2.
    Genetic map positions are listed in recombination map units (centimorgans), measured from the left tip of each chromosome and written as chromosome-position; for example, 1-1.5 means 1.5 map units from the left end of the X chromosome.
  2. Down the right-hand side are different sorts of "available reports." Use them to answer the next four questions.
  3. How many alleles of this gene are known?
  4. When was the oldest allele found? (Follow the links back to information about alleles - the original mutant generally has superscript "1").
  5. The supplier of mutant fly stocks for most labs in this country is the Bloomington Drosophila Stock Center at Indiana University. Under "stocks", could you order a fly with this mutation from Bloomington (if you had an account there)? What stock number would you ask for? (There may be many, if the marker is used in different combinations. Just list one.)
  6. Briefly, what can you tell about the mutant phenotype, and where is the gene expressed?
  7. On the schematic map of the gene region, you can see other genes that have been identified nearby on the same chromosome. Find the nearest genes to the left and right of your gene. Click on them to find out more about them. How were they identified? Do they have mutant phenotypes, or are they cDNAs or genes predicted from the sequencing and annotating project?
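Bridges' band names described in item 1 are regular enough to decode mechanically. The Python sketch below is illustrative only: `parse_band`, `arm_for_division`, and `CytoBand` are hypothetical names (not part of FlyBase), and it assumes a single label such as 21D1-2, using the division-to-arm table given in item 1.

```python
import re
from dataclasses import dataclass

# Division ranges from the handout: 1-20 X, 21-40 2L, 41-60 2R,
# 61-80 3L, 81-100 3R, 101-102 chromosome 4.
_ARMS = [
    (1, 20, "X"),
    (21, 40, "2L"),
    (41, 60, "2R"),
    (61, 80, "3L"),
    (81, 100, "3R"),
    (101, 102, "4"),
]

# Division number, subdivision letter A-F, then an optional band number
# or band range such as "1-2".
_BAND_RE = re.compile(r"^(\d{1,3})([A-F])(\d+(?:-\d+)?)?$")


@dataclass
class CytoBand:
    division: int
    subdivision: str
    bands: str  # empty string when only division + letter are given
    arm: str


def arm_for_division(division: int) -> str:
    """Return the chromosome arm that a numbered division lies on."""
    for low, high, arm in _ARMS:
        if low <= division <= high:
            return arm
    raise ValueError(f"division {division} is outside 1-102")


def parse_band(label: str) -> CytoBand:
    """Split a label like '21D1-2' into its parts (hypothetical helper)."""
    match = _BAND_RE.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a recognized band label: {label!r}")
    division = int(match.group(1))
    return CytoBand(
        division=division,
        subdivision=match.group(2),
        bands=match.group(3) or "",
        arm=arm_for_division(division),
    )


if __name__ == "__main__":
    # prints CytoBand(division=21, subdivision='D', bands='1-2', arm='2L')
    print(parse_band("21D1-2"))
```

Labels that span subdivisions (for example a hypothetical range like 21D1-21E2) would need to be split into two labels before parsing.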
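Genetic map positions, by contrast, are numbers on a recombination scale, so the distance between two linked genes is just the difference of their positions. A small Python sketch, assuming the chromosome-position notation from item 1 (the positions in the usage below are made-up values, and all function names are hypothetical):

```python
def parse_genetic_position(text: str) -> tuple[str, float]:
    """Split a recombination map position such as '2-48.5' into ('2', 48.5).

    Assumes 'chromosome-position' notation; the number is in map units
    (centimorgans) from the left tip of the chromosome.
    """
    chromosome, sep, units = text.strip().partition("-")
    if not sep or not chromosome or not units:
        raise ValueError(f"expected 'chromosome-position', got {text!r}")
    return chromosome, float(units)


def map_distance(position_a: str, position_b: str) -> float:
    """Map-unit distance between two loci on the same chromosome."""
    chrom_a, units_a = parse_genetic_position(position_a)
    chrom_b, units_b = parse_genetic_position(position_b)
    if chrom_a != chrom_b:
        raise ValueError("loci on different chromosomes have no map distance")
    return abs(units_a - units_b)


def map_units_from_cross(recombinants: int, total_progeny: int) -> float:
    """One map unit = 1% recombinant progeny (reliable only for short distances)."""
    if total_progeny <= 0:
        raise ValueError("total_progeny must be positive")
    return 100.0 * recombinants / total_progeny


if __name__ == "__main__":
    print(map_distance("2-48.5", "2-104.5"))  # hypothetical positions
    print(map_units_from_cross(15, 100))
```

Distances read off the map add up, but the recombinant frequency in a single cross never exceeds about 50%, because multiple crossovers between distant loci go undetected. That is why `map_units_from_cross` is only trustworthy for closely linked genes.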

About the Gene Product - the DNA and protein sequences.

  1. Start by retrieving the sequence in a usable format. You may be able to get it under "transcript". Otherwise, set the menus after "Sequence: get" to TRANSCRIPT and FASTA, then click GET. That way, you'll get the sequence uninterrupted by base-pair numbering and other non-nucleotide annotations. Again, bookmark this page for future reference. (You can also try specific known transcripts. Just make sure the sequence contains only A, T, C, and G; proteins will be dealt with later.)
  2. Open Webcutter in a new window. This is just one of many programs that will do an in silico restriction digest for you: it finds, by computer, the sites in your sequence that a given restriction enzyme would recognize and cut. Copy and paste your FASTA sequence into the box halfway down the page, and choose the enzyme settings. Start simple, with EcoRI as the only enzyme, to see how many EcoRI sites there are in your gene. Then repeat with another enzyme, or look at many enzymes at a time.
  3. Now look for open reading frames (ORFs): stretches that begin with an ATG start codon and continue through translatable codons. The reading frame closes at the first in-frame stop codon (TAA, TAG, or TGA). Paste your sequence from before into the NCBI ORF finder (http://www.ncbi.nih.gov/gorf/orfig.cgi). It will show you ORFs in all 6 possible reading frames (why 6?). You can see the DNA-to-protein translation by double-clicking on one of the shaded ORF boxes (choose the longest ORF). The translation is shown in the one-letter amino acid code. It should match the sequence you get from "polypeptides" on the synopsis page, but it may not. (Can you think of reasons why not?)
  4. Are there recognized protein domains for your protein listed on the synopsis page? Do they relate to the presumed function of your gene?
  5. The Gene Ontology (GO) section summarizes the gene's known or predicted molecular functions, biological processes, and cellular components. Some of these annotations come from experiments, and others are inferred from conserved protein domains or sequence similarity. By following some of the links, you will see the properties this gene is predicted to share with other genes.
  6. You may investigate sequence similarities between the protein product(s) of your gene and others using BLASTP (http://www.ncbi.nlm.nih.gov/BLAST). BLAST (Basic Local Alignment Search Tool) is a program that searches sequence databases for local alignments with your query sequence. Copy the amino acid sequence from FlyBase, then paste it into the appropriate section of BLAST (Standard protein-protein BLAST [blastp]). The initial output shows conserved protein domains with links. You have to click the FORMAT button to view the detailed results of the BLASTP search, and you may have to refresh a couple of times if the server is busy. Mouse over the matches found to discover their sources and to see where the similarities listed under "protein domains" above come from. (Gene names appear at the top as you mouse over the alignment bars.)
  7. "Linkout" leads you to some associated databases with information pertaining to your gene. Try exploring these links.
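Step 1 asks you to make sure the retrieved sequence contains only A, T, C, and G. A minimal Python sketch of that check, assuming the text was copied in FASTA format (a '>' header line followed by sequence lines); `read_fasta` and `non_acgt_positions` are hypothetical helper names, not FlyBase tools:

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data before the first '>' header line")
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def non_acgt_positions(seq: str) -> list[tuple[int, str]]:
    """List (1-based position, letter) for every character that is not A, C, G, or T."""
    return [(i + 1, base) for i, base in enumerate(seq) if base not in "ACGT"]


if __name__ == "__main__":
    example = ">tx1 hypothetical\nATGGAA\nttc\n\n>tx2\nACGN\n"
    for name, seq in read_fasta(example).items():
        print(name, len(seq), non_acgt_positions(seq))
```

Letters such as N (unknown base) are reported with their positions, so you can tell whether a copied sequence picked up stray characters. If two records share a header, the later one overwrites the earlier one in this sketch.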
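What Webcutter does for a single enzyme in step 2 can be approximated in a few lines. This sketch assumes a linear sequence and EcoRI's recognition site GAATTC, cut between G and A (G^AATTC) on the top strand; because that site is palindromic, scanning one strand finds every site. `find_sites` and `digest_linear` are illustrative names, and fragment lengths are counted along the top strand only, ignoring the 4-base single-stranded overhangs EcoRI leaves.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts G^AATTC, one base into the site on the top strand


def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of site (overlaps allowed)."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest_linear(seq: str, site: str = ECORI_SITE,
                  cut_offset: int = ECORI_CUT_OFFSET) -> list[str]:
    """Cut a linear sequence at every site; return top-strand fragments in order."""
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    fragments = []
    previous = 0
    for cut in cuts:
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])  # piece after the last cut (or the whole sequence)
    return fragments


if __name__ == "__main__":
    demo = "AAGAATTCTTGAATTCAA"  # made-up sequence with two EcoRI sites
    pieces = digest_linear(demo)
    print(len(pieces), [len(p) for p in pieces])
```

For an enzyme whose site is not palindromic you would also scan the reverse complement. This sketch does not handle ambiguous recognition sequences or circular molecules.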
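Step 3 describes an ORF as an ATG followed by codons up to the first in-frame stop, read in six frames. The sketch below applies that rule in Python using the standard genetic code. It is a simplified stand-in for the NCBI ORF finder, whose exact rules (alternative start codons, minimum length, treatment of ORFs without a stop) may differ, and all names in it are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Conceptual translation: '*' marks stops, 'X' marks codons with non-ACGT letters."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int, str]]:
    """Find ATG...stop ORFs in all six reading frames.

    Returns (strand, frame 1-3, start offset on that strand, protein) tuples.
    For each stop, only the most upstream in-frame ATG is reported; ORFs that
    run off the end of the sequence without a stop codon are ignored.
    """
    orfs = []
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(strand_seq[frame:])
            start = None  # index of the current ORF's Met, if one is open
            for index, residue in enumerate(protein):
                if start is None and residue == "M":
                    start = index
                elif start is not None and residue == "*":
                    if index - start >= min_codons:
                        orfs.append((strand, frame + 1, frame + 3 * start,
                                     protein[start:index]))
                    start = None
    return orfs


if __name__ == "__main__":
    demo = "CCATGGCTTGGTAACC"  # made-up sequence with one short ORF
    print(find_orfs(demo, min_codons=3))
```

Sorting the result by protein length and taking the longest entry mirrors the advice to choose the longest ORF. On the minus strand, frame and start offset are counted along the reverse complement.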
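BLAST, mentioned in step 6, reports local alignments: the best-matching stretches of two sequences, not necessarily their full lengths. BLAST itself uses fast heuristics (short word matches extended outward) and scores proteins with a substitution matrix such as BLOSUM62. The Python sketch below instead swaps in the exact Smith-Waterman local alignment with a toy match/mismatch/gap scheme, only to show what a local alignment score measures; `local_alignment_score` and its default scores are illustrative assumptions.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between sequences a and b.

    Keeps only one previous row of the dynamic-programming table. Cells are
    floored at zero, so an alignment can start and end anywhere.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = previous[j] + gap        # gap in b
            left = current[j - 1] + gap   # gap in a
            current[j] = max(0, diagonal, up, left)
            best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    # Made-up peptides sharing the core "ACDE"; the flanks do not contribute.
    print(local_alignment_score("XXACDEYY", "ZZACDEWW"))
```

Because scores are floored at zero, unrelated flanking residues cannot drag the score down; that is why a shared domain can stand out even when the rest of two proteins differs, much like the domain hits in your BLASTP output.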

[there is a report page to go with this lab]


Copyright 2006 Wesleyan University (mw/lfa)