This repository has been archived by the owner on Aug 25, 2021. It is now read-only.
Parse and standardize unstructured columns

BNIA/tidyall

A tool for parsing and standardizing unstructured columns. TidyAll combines TidyDate, which standardizes inconsistent date columns into ISO 8601 format, and TidyBlockNLot, which structures strings that encode block and lot information.

Interface

TidyAll is a standalone Flask app with a web interface. Its minimal templates let the user upload an Excel or CSV dataset (by drag and drop or by manual selection), choose the columns to parse, and download the new dataset as a CSV file.
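
The upload, column selection, and download flow described above could look roughly like the following minimal sketch. It is not TidyAll's actual code: the `/tidy` route, the `dataset` and `columns` form fields, the `tidy.csv` file name, and the whitespace-stripping step (a stand-in for the real parsers) are all hypothetical.

```python
import io

import pandas as pd
from flask import Flask, Response, request

app = Flask(__name__)


def read_upload(storage):
    """Load an uploaded CSV or Excel file into a DataFrame, chosen by extension."""
    name = (storage.filename or "").lower()
    buffer = io.BytesIO(storage.read())
    if name.endswith((".xls", ".xlsx")):
        return pd.read_excel(buffer)
    return pd.read_csv(buffer)


@app.route("/tidy", methods=["POST"])  # hypothetical route name
def tidy():
    frame = read_upload(request.files["dataset"])  # hypothetical field name
    # Columns the user ticked in the form (hypothetical field name).
    for column in request.form.getlist("columns"):
        # Placeholder transformation: the real app would call its parsers here.
        frame[column] = frame[column].astype(str).str.strip()
    body = frame.to_csv(index=False)
    # Send the result back as a downloadable CSV attachment.
    return Response(
        body,
        mimetype="text/csv",
        headers={"Content-Disposition": "attachment; filename=tidy.csv"},
    )
```

Reading the upload into an in-memory buffer keeps the sketch independent of how the server stores temporary files; a real deployment would also want size limits and error handling for unreadable files.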

Modules

python-dateutil is used for parsing the various date formats, and pandas and NumPy are used for manipulating dataframes.
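
As an illustration of how these libraries fit together, here is a minimal sketch that rewrites one date column as ISO 8601 strings. It is not TidyDate's implementation: the helper names `to_iso8601` and `tidy_date_column`, the `permit_date` column, and the choice to turn unparseable values into missing values are assumptions made for the example.

```python
import pandas as pd
from dateutil import parser


def to_iso8601(value, dayfirst=False):
    """Return value as an ISO 8601 date string, or None if it cannot be parsed."""
    if pd.isnull(value):
        return None
    try:
        # dateutil guesses the format; dayfirst resolves ambiguous inputs like 01/02/2017.
        return parser.parse(str(value), dayfirst=dayfirst).date().isoformat()
    except (ValueError, OverflowError):
        return None


def tidy_date_column(frame, column):
    """Return a copy of frame with one column rewritten as ISO 8601 dates."""
    result = frame.copy()
    result[column] = result[column].map(to_iso8601)
    return result


# Hypothetical sample data mixing several common date spellings.
df = pd.DataFrame({"permit_date": ["3/14/2015", "March 5, 2016", "2017-01-02", "unknown"]})
tidy = tidy_date_column(df, "permit_date")
print(tidy)
```

Working on a copy leaves the uploaded data untouched, which makes it easy to compare the original and standardized columns side by side.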

Installation

Production

A standalone executable can be found in the dist directory. It was built on a Windows 10 system and has been tested on Windows 10 systems and on Windows XP, Windows 7, and Windows 8 virtual environments.

Development

If you are not familiar with Python projects on Windows machines, you might want to check out this quick guide.

Requirements

  • Python 2.7 or 3.5
  • Packages in requirements.txt

Clone the repository, create a virtual environment, and install the packages via pip: pip install -r requirements.txt
Alternatively, run the Makefile: make install

Tests

The tests run on nose. To install, run: pip install nose

  • For UNIX machines, a Makefile has been provided for convenience. Just run: make test
  • For non-UNIX machines: nosetests -v -w tests will work.

Executable

PyInstaller was used to build the Windows executable. More details on the building process can be found here