Implementing an inverted index 🤘

This project implements an inverted index, using Apache Spark to build the index and a relational database (e.g. SQLite) to store it. The code is written in Python (PySpark). Storing the index in a database lets us rely on the B-tree structure the database already provides instead of building one from scratch.
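
To make the storage idea concrete, here is a minimal sketch of how postings could be kept in SQLite using Python's built-in `sqlite3` module. The table name `postings`, its columns, and the helper functions are assumptions made for illustration, not the schema used in this repository. The composite primary key makes SQLite maintain a B-tree index on `(term, doc_id)`, so a lookup by term does not need to scan the whole table.

```python
import sqlite3


def create_index_db(path=":memory:"):
    """Open a database and create a hypothetical postings table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings ("
        " term TEXT NOT NULL,"
        " doc_id INTEGER NOT NULL,"
        " tf INTEGER NOT NULL,"
        # The primary key is backed by a B-tree index on (term, doc_id).
        " PRIMARY KEY (term, doc_id))"
    )
    return conn


def add_postings(conn, rows):
    """Insert (term, doc_id, tf) rows, replacing existing entries."""
    conn.executemany(
        "INSERT OR REPLACE INTO postings (term, doc_id, tf) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def lookup(conn, term):
    """Return the posting list for a term as (doc_id, tf) pairs."""
    cur = conn.execute(
        "SELECT doc_id, tf FROM postings WHERE term = ? ORDER BY doc_id",
        (term,),
    )
    return cur.fetchall()


conn = create_index_db()
add_postings(conn, [("spark", 1, 2), ("spark", 3, 1), ("index", 1, 1)])
print(lookup(conn, "spark"))
```

Passing a file path instead of `":memory:"` would persist the index between runs.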

What we are doing ☁️

Build the index using a document collection.
Create database tables for storing the inverted index.
Implement the keyword search functionality.
Implement result ranking using the TF-IDF measure.
Implement a simple interface for giving keyword queries and showing results.
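
The steps above can be sketched in plain Python on a toy collection. This is only an illustration under stated assumptions: the project builds the index with Spark and uses NLTK for text processing, whereas the sketch below uses an in-memory dictionary and a simple regular-expression tokenizer as stand-ins. The scoring formula (raw term frequency times `log(N / df)`) is one common TF-IDF variant and may differ from the one used in the project's code. The names `tokenize`, `build_index`, and `search` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Rank documents by the sum of tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    1: "spark builds the index",
    2: "sqlite stores the index",
    3: "spark spark spark and sqlite",
}
index = build_index(docs)
results = search(index, len(docs), "spark index")
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

With this formula, a term that appears in every document gets an IDF of zero and so contributes nothing to the ranking.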

Packages and Software used 💻

Python (PySpark)
SQLite
NLTK package
Google Colab
