🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump
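As a rough sketch of the Kiwix + ZIM approach, the snippet below streams a ZIM archive to disk and serves it locally. The archive filename is a placeholder (current names are listed under https://download.kiwix.org/zim/wikipedia/), and kiwix-serve from kiwix-tools is assumed to be installed:

```python
# Sketch: download a Wikipedia ZIM archive and serve it with kiwix-serve.
# The exact filename below is hypothetical -- check download.kiwix.org
# for current dump names.
import subprocess
import urllib.request

ZIM_URL = "https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_nopic.zim"
ZIM_FILE = "wikipedia_en_all_nopic.zim"

# Stream the (very large) archive to disk in 1 MiB chunks.
with urllib.request.urlopen(ZIM_URL) as resp, open(ZIM_FILE, "wb") as out:
    while chunk := resp.read(1 << 20):
        out.write(chunk)

# Serve the offline mirror on http://localhost:8080/
# (requires the kiwix-serve binary on PATH).
subprocess.run(["kiwix-serve", "--port", "8080", ZIM_FILE])
```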
A command-line toolkit to extract text content and category data from Wikipedia dump files
Corpus creator for Chinese Wikipedia
Reading the data from OPIEC, an Open Information Extraction corpus
Wikipedia-based Explicit Semantic Analysis, as described by Gabrilovich and Markovitch
Downloads and imports Wikipedia page histories to a git repository
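For context, here is a minimal sketch of the history-to-git idea, using the standard MediaWiki action=query/prop=revisions API; the repository layout and commit scheme are illustrative, not the tool's own (continuation via rvcontinue and error handling are omitted):

```python
# Sketch: fetch a page's revision history from the MediaWiki API and
# commit each revision to a local git repository.
import json
import os
import subprocess
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
TITLE = "Alan Turing"

params = urllib.parse.urlencode({
    "action": "query", "prop": "revisions", "titles": TITLE,
    "rvprop": "timestamp|content", "rvslots": "main",
    "rvlimit": "10", "rvdir": "newer", "format": "json",
})
req = urllib.request.Request(f"{API}?{params}",
                             headers={"User-Agent": "wiki-history-sketch/0.1"})
with urllib.request.urlopen(req) as resp:
    page = next(iter(json.load(resp)["query"]["pages"].values()))

subprocess.run(["git", "init", "-q", "history"], check=True)
for rev in page["revisions"]:
    with open("history/article.wiki", "w", encoding="utf-8") as f:
        f.write(rev["slots"]["main"]["*"])
    subprocess.run(["git", "-C", "history", "add", "article.wiki"], check=True)
    # Backdate each commit to the revision's original timestamp.
    subprocess.run(
        ["git", "-C", "history", "commit", "-q", "-m", rev["timestamp"]],
        env={**os.environ, "GIT_AUTHOR_DATE": rev["timestamp"],
             "GIT_COMMITTER_DATE": rev["timestamp"]},
        check=True,
    )
```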
Extracting useful metadata from Wikipedia dumps in any language.
Python package for working with MediaWiki XML content dumps
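As background for dump-processing packages like these: MediaWiki content dumps are namespaced XML that can be streamed with the Python standard library alone. A minimal sketch (the export namespace version string varies between dumps):

```python
# Sketch: stream (title, wikitext) pairs from a bz2-compressed
# pages-articles dump without loading it into memory.
import bz2
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.10/}"  # version varies by dump

def iter_pages(path):
    title, text = None, None
    with bz2.open(path, "rb") as f:
        for _, elem in ET.iterparse(f):
            if elem.tag == NS + "title":
                title = elem.text
            elif elem.tag == NS + "text":
                text = elem.text or ""
            elif elem.tag == NS + "page":
                yield title, text
                elem.clear()  # free memory as we go

for title, text in iter_pages("enwiki-latest-pages-articles.xml.bz2"):
    print(title, len(text))
    break
```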
A simple utility to index Wikipedia dumps using Lucene.
Network visualizer for the 'Geschichten aus der Geschichte' ('Stories from History') podcast
Collects a multimodal dataset of Wikipedia articles and their images
A library that assists in traversing and downloading from Wikimedia Data Dumps and their mirrors.
A Python toolkit to generate a tokenized dump of Wikipedia for NLP
Node.js module for parsing the content of Wikipedia articles into JavaScript objects
Research for a master's degree: operation projizz-I/O
Scripts to download Wikipedia dumps (available at https://dumps.wikimedia.org/)
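For illustration, a minimal Python equivalent using the standard latest alias; exact filenames depend on the wiki and the dump flavour:

```python
# Sketch: fetch the latest English pages-articles dump (multi-GB download).
import urllib.request

WIKI = "enwiki"
NAME = f"{WIKI}-latest-pages-articles.xml.bz2"
URL = f"https://dumps.wikimedia.org/{WIKI}/latest/{NAME}"

urllib.request.urlretrieve(URL, NAME)
```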
Convert Wikipedia XML dumps (Chinese) to human-readable documents in Markdown and txt.
Convert Wikipedia XML dump files to JSON or text files
Contains code to build a search engine by creating an index and performing search over Wikipedia data.
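To illustrate the index-then-search idea behind such projects, here is a toy in-memory inverted index answering conjunctive keyword queries; real engines such as Lucene add scoring, compression, and on-disk postings:

```python
# Toy inverted index: term -> set of document ids.
from collections import defaultdict
import re

def tokenize(s):
    return re.findall(r"[a-z0-9]+", s.lower())

index = defaultdict(set)  # term -> doc ids containing it
docs = {}                 # doc id -> title

def add_document(doc_id, title, text):
    docs[doc_id] = title
    for term in tokenize(title + " " + text):
        index[term].add(doc_id)

def search(query):
    # AND semantics: intersect the posting sets of all query terms.
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index[t] for t in terms))
    return [docs[d] for d in hits]

add_document(1, "Wikipedia", "Wikipedia is a free online encyclopedia.")
add_document(2, "Lucene", "Lucene is a search engine library.")
print(search("search engine"))  # -> ['Lucene']
```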
Java tool to parse Wikimedia dumps into Java Article POJOs for test or fake data.