parse_the_human_proteome

A Python script that parses the human proteome, bins proteins by length, and visualizes the results.

Notes

To challenge myself, I wrote this project using only base Python (no external libraries). The one exception is matplotlib.pyplot, which is used to generate the figures.

Requirements

Python 3
matplotlib (used only for plotting)
Any text editor
The file "uniprot-9606-reviewed.fasta", which is included in this repository

How to Run This Program

Download this repository
Run the script "human_proteome_parser.py" from inside the folder called "run_from_this_dict"

Steps Performed

Set parameters
- User is prompted to enter the following:
  1. Desired lower bound of protein lengths: (e.g. 0 residues)
  2. Desired upper bound of protein lengths: (e.g. 1500 residues)
  3. Desired bin width: (e.g. 50 residues)
    - Note: Please enter only the integer, not the string "residues"
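A minimal sketch of the prompting step follows. It assumes the three values are read with input() and converted with int(). The function name read_parameters and its validation checks are hypothetical and are not taken from human_proteome_parser.py. Passing the reader in as a parameter only makes the function easy to test. Note that int("50 residues") raises ValueError, which is why only the integer should be entered.

```python
def read_parameters(read=input):
    """Prompt for lower bound, upper bound, and bin width (hypothetical helper).

    `read` defaults to the built-in input(); any callable that takes a prompt
    string and returns a string can be substituted.
    """
    lower = int(read("Desired lower bound of protein lengths: "))
    upper = int(read("Desired upper bound of protein lengths: "))
    width = int(read("Desired bin width: "))
    # Basic sanity checks so that the later binning step is well defined.
    if width <= 0:
        raise ValueError("bin width must be a positive integer")
    if upper <= lower:
        raise ValueError("upper bound must be greater than lower bound")
    return lower, upper, width


# Interactive use:
# lower, upper, width = read_parameters()
```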
Define helper functions
- make_dict
  - Parses the .fasta file, isolating the protein name from the molecular sequence
- identify_bin
  - Assigns each protein to a bin according to its number of residues, using the user-specified bin width and range of protein lengths
- cumilative_sum
  - Computes a running sum of the elements in a list. For example, the list [1,2,3,4,5] becomes [1,3,6,10,15].
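The two numeric helpers can be sketched as follows. Only the names identify_bin and cumilative_sum come from this README; the signatures, and the choice to label each bin by its lower edge using half-open intervals [edge, edge + width), are assumptions for illustration.

```python
def identify_bin(length, lower, upper, width):
    """Return the lower edge of the bin containing `length`, or None if out of range.

    Assumes half-open bins [edge, edge + width) starting at `lower`,
    with `upper` itself excluded.
    """
    if length < lower or length >= upper:
        return None
    return lower + ((length - lower) // width) * width


def cumilative_sum(values):
    """Return the running sum of `values`, e.g. [1, 2, 3] -> [1, 3, 6]."""
    running = 0
    result = []
    for value in values:
        running += value
        result.append(running)
    return result


print(cumilative_sum([1, 2, 3, 4, 5]))  # [1, 3, 6, 10, 15]
print(identify_bin(137, 0, 1500, 50))   # 100
```

For comparison, the standard library's itertools.accumulate produces the same running sum: list(accumulate([1, 2, 3, 4, 5])) returns [1, 3, 6, 10, 15].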
Generate a dictionary by calling make_dict
- (protein name) : (protein length)
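A possible shape for make_dict is sketched below. UniProt FASTA records start with a header line beginning with ">" and usually wrap the sequence over several lines, so residues are accumulated until the next header. Using the full header text (without ">") as the protein name is an assumption; the real script may extract a shorter identifier. The records in the example are made up.

```python
def make_dict(lines):
    """Map each FASTA header (without '>') to the length of its sequence.

    This is a hypothetical reconstruction, not the script's actual code.
    """
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:]
            lengths[name] = 0
        elif name is not None:
            # Sequence lines may be wrapped, so add each line's residues.
            lengths[name] += len(line)
    return lengths


# With the repository's file (assumed to be in the working directory):
# with open("uniprot-9606-reviewed.fasta") as handle:
#     proteome = make_dict(handle)

example = [">example_1", "MKTAYIAKQR", "QISFVKSHFS", ">example_2", "MSDNE"]
print(make_dict(example))  # {'example_1': 20, 'example_2': 5}
```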
Calculate the number of proteins and the relative frequency of proteins in each bin by calling identify_bin
Calculate the cumulative frequency of proteins by calling cumilative_sum
Generate a new dictionary based on the binned data
- (current bin) : [ (number of proteins) , (relative frequency of occurrence in the human proteome), (cumulative frequency) ]
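The counting, relative-frequency, and cumulative-frequency steps can be combined into one pass over the bins, as in the sketch below. The function name bin_proteome is hypothetical. It assumes bins are keyed by their lower edge, that proteins outside [lower, upper) are skipped, and that relative frequency is divided by the total number of proteins parsed, so the cumulative frequency reaches 1.0 only if every protein falls inside the chosen range.

```python
def bin_proteome(lengths, lower, upper, width):
    """Return {bin_lower_edge: [count, relative_frequency, cumulative_frequency]}.

    `lengths` maps protein name -> protein length (as produced by a
    make_dict-style parser). Hypothetical sketch, not the script's code.
    """
    edges = list(range(lower, upper, width))
    counts = {edge: 0 for edge in edges}
    for length in lengths.values():
        if lower <= length < upper:
            counts[lower + ((length - lower) // width) * width] += 1
    total = len(lengths)  # every parsed protein, including out-of-range ones
    table = {}
    running = 0.0
    for edge in edges:
        relative = counts[edge] / total if total else 0.0
        running += relative
        table[edge] = [counts[edge], relative, running]
    return table


demo = {"a": 120, "b": 130, "c": 260, "d": 2000}
print(bin_proteome(demo, 0, 300, 100))
# {0: [0, 0.0, 0.0], 100: [2, 0.5, 0.5], 200: [1, 0.25, 0.75]}
```

Here protein "d" (2000 residues) lies outside the chosen range, which is why the last cumulative value is 0.75 rather than 1.0.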
Display the output in the command window
Write the output to a CSV file
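One way to display and save the binned table is sketched below, assuming the {bin: [count, relative frequency, cumulative frequency]} layout described above. It uses the standard library's csv module (part of base Python); the function names, column headers, and output file name are hypothetical. The demo writes to an in-memory buffer instead of a real file.

```python
import csv
import io


def write_table(table, handle):
    """Write {bin_edge: [count, relative_freq, cumulative_freq]} as CSV rows."""
    writer = csv.writer(handle)
    writer.writerow(["bin_lower_edge", "count", "relative_frequency", "cumulative_frequency"])
    for edge in sorted(table):
        writer.writerow([edge] + table[edge])


def print_table(table):
    """Print the same table to the command window, one bin per line."""
    for edge in sorted(table):
        count, relative, cumulative = table[edge]
        print(f"{edge:>6} {count:>6} {relative:>8.4f} {cumulative:>8.4f}")


demo = {0: [0, 0.0, 0.0], 100: [2, 0.5, 0.5]}
buffer = io.StringIO()
write_table(demo, buffer)
print(buffer.getvalue())
print_table(demo)

# Writing to disk (file name is hypothetical); newline="" is what the csv docs recommend:
# with open("binned_proteome.csv", "w", newline="") as handle:
#     write_table(demo, handle)
```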
Plot and save histograms for data visualization

kwakim1/parse_the_human_proteome
