Huntingtin and Huntington's Disease

This website was produced as an assignment for Genetics 677 at UW-Madison Spring 2009

Protein Sequence and Homology

Human Sequence of Huntingtin

The protein sequence used in this project was obtained from the NCBI site (NP_002102). The huntingtin protein is 3,144 amino acids in length. Using ExPASy, I found that, based on this sequence, huntingtin has an isoelectric point of 5.81 and a molecular weight of 347,859.56 Da.
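As a rough illustration of where a sequence-based molecular weight comes from, the sketch below sums average residue masses and adds one water molecule for the free termini. It is not ExPASy's implementation; the function name `molecular_weight` and the short test peptides are assumptions made for this example, and the mass table holds commonly tabulated average residue masses.

```python
# Average residue masses in daltons (each amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the N-terminal H and C-terminal OH


def molecular_weight(sequence):
    """Approximate average molecular weight (Da) of an unmodified peptide."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


# Hypothetical short peptide, used only to exercise the function.
print(f"{molecular_weight('MKTAYLEQLN'):.2f} Da")
```

Applied to the full NP_002102 sequence, a calculation of this kind should land close to the ExPASy figure, with small differences possible depending on the mass table used.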


To determine the closest protein homologs of human huntingtin, I first ran a general search through HomoloGene to build a list of other organisms that potentially express a similar protein. The list below contains the proteins found in other species that are homologous to the human huntingtin protein, listed by their corresponding gene name. Each protein's accession number refers to its amino acid sequence record (available in FASTA format) in Entrez Protein. After generating the list, I compared each homolog with human huntingtin using blastp, BLAST's protein-protein algorithm.

Pan troglodytes (chimpanzee)

solute carrier family 6 protein

Protein Sequence XP_517080

Score = 6014 bits (15603), Expect = 0.0
Identities = 3021/3075 (98%), Positives = 3034/3075 (98%), Gaps = 13/3075 (0%)

Nucleotide Sequence XM_517080

Score = 7788 bits (4217), Expect = 0.0
Identities = 4378/4451 (98%), Gaps = 29/4451 (0%), Strand = Plus/Plus

Canis lupus familiaris (dog)

solute carrier family 6 protein

Protein Sequence XP_536221

Score = 5700 bits (14787), Expect = 0.0
Identities = 2816/3073 (91%), Positives = 2917/3073 (94%), Gaps = 10/3073 (0%)

Nucleotide Sequence XM_536221

Score = 320 bits (173), Expect = 5e-88
Identities = 218/240 (90%), Gaps = 2/240 (0%), Strand = Plus/Plus

Bos taurus (cattle)

Protein Sequence XP_871851

Score = 5579 bits (14474), Expect = 0.0
Identities = 2736/3073 (89%), Positives = 2898/3073 (94%), Gaps = 11/3073 (0%)

Nucleotide Sequence XM_871851

Score = 342 bits (185), Expect = 1e-94
Identities = 246/276 (89%), Gaps = 2/276 (0%)

Danio rerio (zebrafish)

Protein Sequence NP_571093

Score = 4467 bits (11586), Expect = 0.0
Identities = 2266/3195 (70%), Positives = 2650/3195 (82%), Gaps = 127/3195 (3%)

Nucleotide (mRNA) Sequence NM_131018

BLAST did not return a nucleotide alignment for Danio rerio.

Mus musculus (mouse)

Protein Fasta Sequence NP_034544

Score = 5665 bits (14696), Expect = 0.0
Identities = 2793/3063 (91%), Positives = 2912/3063 (95%), Gaps = 4/3063 (0%)

Nucleotide (mRNA) Fasta Sequence NM_010414

Score = 1.008e+04 bits (5456), Expect = 0.0
Identities = 8114/9403 (86%), Gaps = 159/9403 (1%), Strand = Plus/Plus

Rattus norvegicus (rat)

Protein Sequence NP_077333

Score = 5671 bits (14712), Expect = 0.0
Identities = 2794/3064 (91%), Positives = 2913/3064 (95%), Gaps = 5/3064 (0%)

Nucleotide (mRNA) Sequence NM_024357

Score = 9912 bits (5367), Expect = 0.0
Identities = 7995/9269 (86%), Gaps = 159/9269 (1%), Strand = Plus/Plus

Gallus gallus (chicken)

Protein Sequence XP_420822

Score = 5297 bits (13741), Expect = 0.0
Identities = 2604/3154 (82%), Positives = 2842/3154 (90%), Gaps = 69/3154 (2%)

Nucleotide (mRNA) Sequence XM_420822

Score = 3912 bits (2118), Expect = 0.0
Identities = 6249/8224 (75%), Gaps = 361/8224 (4%), Strand = Plus/Plus
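The percentages in the BLAST summaries are rounded, so it can help to recompute them from the raw counts when ranking the homologs. The sketch below parses the protein summary lines copied from the results above and sorts the species by percent identity to human huntingtin. The function name `parse_blast_summary` and the dictionary layout are assumptions made for this example, not part of any BLAST tool.

```python
import re

# Matches fields such as "Identities = 3021/3075" in a BLAST summary line.
FRACTION = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def parse_blast_summary(line):
    """Return {field: (count, aligned_length, percent)} for one summary line."""
    stats = {}
    for field, count, length in FRACTION.findall(line):
        count, length = int(count), int(length)
        stats[field] = (count, length, 100.0 * count / length)
    return stats


# Protein (blastp) summary lines as reported above, one per species.
protein_hits = {
    "Pan troglodytes": "Identities = 3021/3075 (98%), Positives = 3034/3075 (98%), Gaps = 13/3075 (0%)",
    "Canis lupus familiaris": "Identities = 2816/3073 (91%), Positives = 2917/3073 (94%), Gaps = 10/3073 (0%)",
    "Bos taurus": "Identities = 2736/3073 (89%), Positives = 2898/3073 (94%), Gaps = 11/3073 (0%)",
    "Danio rerio": "Identities = 2266/3195 (70%), Positives = 2650/3195 (82%), Gaps = 127/3195 (3%)",
    "Mus musculus": "Identities = 2793/3063 (91%), Positives = 2912/3063 (95%), Gaps = 4/3063 (0%)",
    "Rattus norvegicus": "Identities = 2794/3064 (91%), Positives = 2913/3064 (95%), Gaps = 5/3064 (0%)",
    "Gallus gallus": "Identities = 2604/3154 (82%), Positives = 2842/3154 (90%), Gaps = 69/3154 (2%)",
}

# Rank species from most to least identical to the human protein.
ranked = sorted(
    protein_hits,
    key=lambda species: parse_blast_summary(protein_hits[species])["Identities"][2],
    reverse=True,
)
for species in ranked:
    count, length, percent = parse_blast_summary(protein_hits[species])["Identities"]
    print(f"{species:24s} {count}/{length} = {percent:.2f}% identity")
```

Working from the counts rather than the rounded percentages separates hits that BLAST prints with the same whole-number value, such as the dog, mouse, and rat proteins at 91%.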

Sequence Alignments

I used two programs, T-Coffee and MUSCLE, to assess the similarity among all of the homologous sequences for huntingtin. The T-Coffee alignment gives a colored key to the similarity scores for each amino acid. I also ran a MUSCLE alignment, given below. Both alignments show a relatively high level of conservation in the amino acid sequences of the homologs.
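Conservation in a multiple alignment can also be summarized numerically. The sketch below computes pairwise percent identity and lists the fully conserved columns for a small alignment. The three aligned sequences are hypothetical toy strings, not huntingtin data, and the function names are assumptions made for this example.

```python
def percent_identity(a, b):
    """Percent of aligned columns (ignoring gap-gap columns) with identical residues."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def conserved_columns(alignment):
    """Indices of columns where every sequence carries the same residue (no gaps)."""
    return [
        index
        for index, column in enumerate(zip(*alignment.values()))
        if len(set(column)) == 1 and "-" not in column
    ]


# Hypothetical toy alignment; "-" marks a gap.
toy_alignment = {
    "seq_a": "MKTAYLEQLN",
    "seq_b": "MKTAYLEQLN",
    "seq_c": "MKSAY-EQIN",
}

print(percent_identity(toy_alignment["seq_a"], toy_alignment["seq_b"]))
print(percent_identity(toy_alignment["seq_a"], toy_alignment["seq_c"]))
print(conserved_columns(toy_alignment))
```

To apply this to real output, the aligned sequences would first have to be read from the T-Coffee or MUSCLE result file.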

[Alignment files offered for download: doc (91 kb), doc (77 kb), pdf (215 kb), doc (82 kb)]

Phylogenetic Tree Analysis

I constructed a phylogenetic tree using the Treetop program, analyzing homology among the huntingtin protein homologs for the species listed above. Using the program's standard settings, I generated a tree showing that the human form of huntingtin is most similar to the chimpanzee form. The greatest divergence is between the human and the zebrafish forms.

I then performed a second phylogenetic analysis using "One-Click" settings. The results were similar to those of Treetop: the human form of huntingtin is very similar to the chimpanzee form. Scores indicate relative similarities for specific branches; thus, with a score of 1, the Homo sapiens and Pan troglodytes huntingtin homologs are quite similar, almost identical.
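To show how a distance-based tree groups the most similar sequences first, here is a minimal UPGMA (average-linkage) sketch. UPGMA is used here only as a simple stand-in and is not necessarily the method Treetop or the "One-Click" analysis runs. The human-versus-other distances are 1 minus the blastp identity fractions reported earlier, rounded to three decimals; the remaining pairwise distances are made-up placeholders, because this page does not report them.

```python
from itertools import combinations


def upgma(names, distance):
    """Build a nested-tuple tree; distance[(a, b)] is needed for each pair in names order."""
    # Each cluster id maps to (subtree, number_of_leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {(i, j): distance[(a, b)]
         for (i, a), (j, b) in combinations(enumerate(names), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Size-weighted average distance from the merged cluster to every other cluster.
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        del d[(i, j)]
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


names = ["Homo sapiens", "Pan troglodytes", "Mus musculus", "Danio rerio"]
distances = {
    ("Homo sapiens", "Pan troglodytes"): 0.018,  # 1 - 3021/3075
    ("Homo sapiens", "Mus musculus"): 0.088,     # 1 - 2793/3063
    ("Homo sapiens", "Danio rerio"): 0.291,      # 1 - 2266/3195
    ("Pan troglodytes", "Mus musculus"): 0.090,  # placeholder, not measured here
    ("Pan troglodytes", "Danio rerio"): 0.290,   # placeholder, not measured here
    ("Mus musculus", "Danio rerio"): 0.300,      # placeholder, not measured here
}

print(upgma(names, distances))
```

With these inputs the human and chimpanzee sequences are joined first and zebrafish is joined last, which is the same ordering described for the trees above.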

Overall Analysis

The BLAST protein sequence scores show a high level of similarity between the homologs in the different organisms, and they match up well with my phylogenetic analysis of the homologs. I found the BLAST program informative and simple to use, although its results do not go into great depth. The different sequence alignment programs further supported these results. All four programs I used returned the same results, but I found T-COFFEE the most useful because it was the only program that represented the data graphically (in color). The motif search programs were difficult to use with huntingtin: none of them immediately found the expected HEAT repeats, and the relative scores for the HEAT repeats matched those of low-complexity regions and other undefined motifs. Thus, I could not put much confidence in my results from the motif programs, although the literature on huntingtin confirms the presence of HEAT repeats. I believe that developing greater familiarity with these programs will let me gain better information in the future and make my analysis of huntingtin more informative.
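One way to see why weak motif hits can be hard to tell apart from low-complexity regions is to flag the low-complexity stretches directly. The sketch below scores sliding windows by Shannon entropy, a simple stand-in for the complexity filters that sequence tools apply. The window size, the threshold, and the toy sequence (with an artificial run of glutamines) are all assumptions chosen for illustration, not values from any of the programs used here.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy in bits of the residue composition of a window."""
    total = len(window)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(window).values()
    )


def low_complexity_windows(sequence, size=12, threshold=1.5):
    """Start positions of windows whose entropy falls below the threshold."""
    return [
        start
        for start in range(len(sequence) - size + 1)
        if shannon_entropy(sequence[start:start + size]) < threshold
    ]


# Hypothetical toy sequence: varied residues flanking a run of 12 glutamines.
toy_sequence = "MKTAYLEDGHRS" + "Q" * 12 + "PVWFNCITKLMA"
print(low_complexity_windows(toy_sequence))
```

Windows flagged this way could be masked before a motif search so that hits inside them are treated with extra caution.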

Created by Eric Nickels      5/8/2009      Genetics 677 Webpage