Hierarchical Clustering using Euclidean Distance

Understand the importance and usage of hierarchical clustering based on skew profiles.

Locate and process the viral cDNA genome files to calculate the skew profiles.
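A minimal sketch of this step. It assumes the genome files are in FASTA format and that "skew" means the classic G minus C skew, a running count of G minus C along the sequence. The project may use a different skew definition or file layout. The function names read_fasta and cumulative_skew are illustrative and do not come from any library.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into {header: sequence} (sequence upper-cased)."""
    records = {}
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks).upper()
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # a sequence may be wrapped over many lines
    if header is not None:
        records[header] = "".join(chunks).upper()
    return records


def cumulative_skew(seq):
    """Assumed G-minus-C skew: skew[i] = (#G - #C) in the first i bases, skew[0] = 0."""
    skew = [0]
    for base in seq:
        step = 1 if base == "G" else -1 if base == "C" else 0
        skew.append(skew[-1] + step)
    return skew


# Usage with inline toy text. For a real file (path is a placeholder),
# call read_fasta(open(path).read()) instead.
genomes = read_fasta(">virusA\nGGCA\nTC\n>virusB\nCCGG\n")
profiles = {name: cumulative_skew(seq) for name, seq in genomes.items()}
print(profiles)  # {'virusA': [0, 1, 2, 1, 1, 1, 0], 'virusB': [0, -1, -2, -1, 0]}
```

Each profile has one more value than its genome has bases, so genomes of different lengths yield profiles of different lengths.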

Understand the theory behind using the Pythagorean theorem to calculate Euclidean distance, and apply it in Python to build a linkage matrix.
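For two points in the plane, the Pythagorean theorem gives their distance as sqrt((x1 - x2)^2 + (y1 - y2)^2). The Euclidean distance applies the same sum of squared differences to any number of coordinates, so two skew profiles of equal length can be compared point by point.

The sketch below assumes that framing and builds a naive agglomerative linkage matrix. It uses the row layout SciPy uses, [cluster_a, cluster_b, merge_distance, member_count], with new clusters numbered upward from n. The helper names are hypothetical. The loop does roughly cubic work and is only meant for a small number of profiles.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Pythagorean distance generalized to len(a) coordinates."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linkage_matrix(points, method="complete"):
    """Naive agglomerative clustering; returns SciPy-style rows
    [cluster_a, cluster_b, merge_distance, member_count]."""
    n = len(points)
    # Distance between every pair of original observations, computed once.
    d = {(i, j): euclidean(points[i], points[j]) for i, j in combinations(range(n), 2)}

    def dist(i, j):
        return d[(i, j)] if i < j else d[(j, i)]

    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    rows, next_id = [], n
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            pair = [dist(i, j) for i in clusters[a] for j in clusters[b]]
            if method == "single":        # closest members
                value = min(pair)
            elif method == "complete":    # farthest members
                value = max(pair)
            elif method == "average":     # mean over all member pairs (UPGMA)
                value = sum(pair) / len(pair)
            else:
                raise ValueError(f"unknown method: {method}")
            if best is None or value < best[2]:
                best = (a, b, value)
        a, b, value = best
        members = clusters.pop(a) + clusters.pop(b)
        rows.append([a, b, value, len(members)])
        clusters[next_id] = members  # new cluster ids start at n, as in SciPy
        next_id += 1
    return rows


points = [(0, 0), (3, 4), (0, 1), (10, 10)]  # toy 2-D points, not real profiles
print(euclidean(points[0], points[1]))  # 5.0, the 3-4-5 right triangle
for row in linkage_matrix(points, method="complete"):
    print(row)
```

In practice, scipy.cluster.hierarchy.linkage(profiles, method="complete", metric="euclidean") produces the same kind of matrix far more efficiently. The hand-written version only serves to make each merge step visible.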

Understand how errors occur, how to avoid them, and how to resolve their sources.
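One plausible error source is sketched below. The project's own list of errors is not reproduced here, so treat this as an assumption. Genomes differ in length, so their skew profiles do too, and the Euclidean distance is only defined for vectors of equal length. Python's zip silently stops at the shorter sequence, so a naive distance can return a wrong value without raising any error.

The sketch raises an explicit error instead and resamples each profile to a fixed number of evenly spaced points. The helpers resample and checked_distance are hypothetical. Nearest-index sampling is just one of several reasonable ways to equalize lengths.

```python
import math


def resample(profile, k):
    """Return k evenly spaced values from profile (nearest-index sampling)."""
    if k < 2 or len(profile) < 2:
        raise ValueError("need k >= 2 and a profile with at least 2 values")
    last = len(profile) - 1
    return [profile[round(i * last / (k - 1))] for i in range(k)]


def checked_distance(a, b):
    """Euclidean distance that refuses vectors of different lengths."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch ({len(a)} vs {len(b)}): resample first")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


short, long_ = [0, 1, 2, 1], [0, 1, 2, 1, 1, 1, 0]

# Silent error: zip stops after 4 pairs, so the extra values are ignored.
naive = math.sqrt(sum((x - y) ** 2 for x, y in zip(short, long_)))
print(naive)  # 0.0, even though the profiles differ

# Fix: bring both profiles to the same number of points before comparing.
k = 4
print(checked_distance(resample(short, k), resample(long_, k)))
```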

By the end of this project, you will create a Python program in a Jupyter interface that analyzes a group of viruses and plots a dendrogram based on the similarities among them. The dendrogram you create will depend on the cumulative skew profile, which in turn depends on the nucleotide composition. You will use complete genome sequences for many viruses, including coronavirus, SARS, HIV, Zika, Dengue, enterovirus, and West Nile virus.
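The hypothetical helpers below show what a dendrogram encodes. They walk a linkage matrix in the SciPy row layout, list the leaves from left to right, and print an indented text outline with merge heights. The matrix and the labels A to D are made-up toy values, not results for any real virus. For the actual plot, scipy.cluster.hierarchy.dendrogram(Z, labels=...) with matplotlib is the usual route, though that is an assumption about this project's tooling.

```python
def leaf_order(Z, n):
    """Left-to-right leaf indices of the tree encoded by SciPy-style rows Z."""
    def walk(node):
        if node < n:
            return [node]  # an original observation
        a, b = Z[node - n][0], Z[node - n][1]
        return walk(int(a)) + walk(int(b))
    return walk(n + len(Z) - 1) if Z else list(range(n))


def outline(Z, labels):
    """Indented text view of the tree: one line per merge or leaf."""
    n = len(labels)
    lines = []

    def walk(node, depth):
        pad = "  " * depth
        if node < n:
            lines.append(pad + labels[node])
            return
        a, b, height, size = Z[node - n]
        lines.append(f"{pad}+ merge at {height:.2f} ({int(size)} members)")
        walk(int(a), depth + 1)
        walk(int(b), depth + 1)

    if Z:
        walk(n + len(Z) - 1, 0)  # the last row is the root
    else:
        lines.extend(labels)
    return lines


# Toy linkage matrix for 4 made-up profiles labelled A to D (not real data).
Z = [[0, 2, 1.0, 2], [1, 4, 5.0, 3], [3, 5, 14.14, 4]]
labels = ["A", "B", "C", "D"]
print([labels[i] for i in leaf_order(Z, len(labels))])  # ['D', 'B', 'A', 'C']
print("\n".join(outline(Z, labels)))
```

Leaves that join at a low merge height have similar skew profiles. In a plotted dendrogram, that height is the vertical position of the link connecting them.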

Python Programming, Genomics, Plotting

  1. Task 1: Getting Started with Hierarchical Clustering

  2. Task 2: Locate and Process the Data Files

  3. Task 3: Understand the Result Dataset

  4. Task 4: Hierarchical Clustering - Metric

  5. Task 5: Hierarchical Clustering - Ordering & Methods

  6. Task 6: Dendrogram Plotting

  7. Task 7: Dendrogram - Analysis

  8. Task 8: Errors to Avoid

