Skip to content

Kareem-Mohamed-Wardany/BLAST

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

BLAST (Basic Local Alignment Search Tool) Implementation

This project is a Python-based implementation of the Basic Local Alignment Search Tool (BLAST). It allows users to compare a query biological sequence (DNA, RNA, or protein) against a database of sequences to identify regions of similarity (local alignments). The implementation uses the Blosum62 substitution matrix to compute alignment scores for protein sequences.


Aim

The aim of this implementation is to:

  • Efficiently search large biological sequence databases.
  • Identify regions of local similarity between sequences.
  • Provide biologically relevant results while optimizing for speed.

By focusing on local alignments rather than global alignments, BLAST provides faster results suitable for large-scale bioinformatics tasks.


Environment

Programming Language

  • Python:
    Python is used for its rich libraries and tools for computational biology.

Substitution Matrix

  • Blosum62:
    • A widely used substitution matrix for scoring alignments between protein sequences.
    • Provides biologically meaningful scores by accounting for evolutionary conservation.

Features

  1. Sequence Comparison

    • Compare a query sequence against a database of sequences.
    • Identify regions of local similarity (local alignments).
  2. Alignment Scoring

    • Use the Blosum62 substitution matrix to compute scores for protein alignments.
  3. Efficient Search Algorithm

    • Implements the BLAST algorithm to optimize the search process.
  4. Output

    • Returns a list of high-scoring alignments along with their scores and positions.

Installation

  1. Clone the Repository
  2. Install Dependencies Install required Python packages using pip

How It Works

  1. Input Handling

Reads the query and database sequences from FASTA files.

  1. Seeding

Identifies short matching regions (seeds) between the query and database.

  1. Extension

Extends seeds to identify high-scoring local alignments using the Blosum62 substitution matrix.

  1. Scoring

Calculates alignment scores and ranks alignments based on their relevance.


Key Components

Blosum62 Substitution Matrix

Scores alignments based on evolutionary conservation.

Matches conserved amino acids while penalizing substitutions and gaps.

Algorithm Steps

  1. Seeding: Finds initial exact matches between sequences.

  2. Extension: Extends seeds in both directions to find local alignments.

  3. Scoring: Computes scores for each alignment using the Blosum62 matrix.


Future Enhancements

  1. Support for Multiple Substitution Matrices

Allow users to choose from matrices like PAM250 or Blosum45.

  1. Statistical Analysis

Provide e-values and bit scores for alignments.

  1. Graphical Visualization

Add support for visualizing alignments in a graphical interface.

  1. Optimization

Improve performance for large-scale sequence databases.


Contributing

Contributions are welcome!

  1. Fork the repository.

  2. Create a new branch for your feature or bugfix.

  3. Submit a pull request with detailed documentation of your changes.


License

This project is licensed under the MIT License. See the LICENSE file for more details.


Acknowledgments

NCBI BLAST: The inspiration for this project comes from the NCBI BLAST tool.

Open-source contributors for Python libraries like Biopython and NumPy.

Releases

No releases published

Packages

No packages published

Languages