A tool for parsing and standardizing unstructured columns. TidyAll combines TidyDate, which standardizes inconsistent date columns into ISO 8601 formats, and TidyBlockNLot, which structures special strings related to block and lots.
TidyAll is a standalone Flask app with a web application interface. The minimalistic templates allow the user to drag and drop (or manually) upload the Excel or CSV dataset, choose the columns to be parsed, and download the new dataset as a CSV file.
python-dateutils is used for parsing the various date formats and pandas and numpy is used for manipulating dataframes.
A standalone executable can be found in the dist directory. The executable was created on a Windows 10 system, and has been tested against Windows 10 systems, and Windows XP, Windows 7 and Windows 8 virtual environments.
If you are not familiar with Python projects on Windows machines, you might want to check out this quick guide.
- Python 2.7, 3.5
- Packages in
requirements.txt
Clone the repository, create a virtual environment and install the packages via pip: pip install -r requirements.txt
.
Or run the Makefile: make install
The tests run on nose. To install, run: pip install nose
- For UNIX machines, a Makefile has been provided for convenience. Just run:
make test
- For non-UNIX machines:
nosetests -v -w tests
will work.
PyInstaller was used to build the Windows executable. More details on the building process can be found here