Skip to content

A multifaceted natural language tool written in Python 2.7.*. A release written in Python 3.8 has been uploaded in the GitHub project pyellytoo.

Notifications You must be signed in to change notification settings

prohippo/pyelly

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PyElly is a rule-based natural language processing tool that has existed
for over forty years in various incarnations. It is now free for download
from the Web as open source software. It is written entirely in
version 2.7 of Python and employs SQLite for data management.

PyElly is intended mainly for educational use. It allows a student to
engage natural language at a fine level of detail and to learn the issues
involved in processing text data. It can be of interest to others, though,
because of its extensive support for handling the messy aspects of
language not central to most text data problems or to their solutions.

The basic paradigm of PyElly is to rewrite natural language input into
some other text output, which might be SQL, XML, or some other form. This
falls short of full understanding, but can be quite helpful as a general
kind of preprocessing for data mining or for more precise indexing.

PyElly tools include flexible tokenization, syntax-driven parsing, English
inflectional and morphological stemming, macro substitutions, basic
and extended entity extraction, ambiguity handling, sentence recognition,
support for large external dictionaries, and a general procedural
framework for translating text from UTF-8 to UTF-8.
0
The latest versions have been completely rewritten in object-oriented
Python. It completed  beta testing in 2014 and can be found on
GitHub at https://gith$ub.com/prohippo/pyelly.git . Development and
refinement of PyElly software is ongoing.

To learn how to use PyElly, see the PyEllyManual.pdf file in the same
directory as this README.txt file. The manual has about 170 pages
of information, including an overview of basic linguistics. Documentation
of individual Python source files can be generated as needed by running
the Python pydoc utility on the source files.

At present, PyElly consists of 67 Python modules comprising about eleven
thousand lines of source code. The PyElly package also includes various
language definition files with rules implementing a broad range of
nontrivial example applications; these include

* indexing - remove stopwords and get stems for content words from raw
             text input.
* texting  - readable text compression.
* doctor   - emulation of Weizenbaum's Doctor program.
* chinese  - basic translation of English to Chinese in simplified
             or traditional characters.
* querying - rewrite English questions as SQL queries for a Soviet
             military aircraft database.
* marking  - rewrite English text from the Web with shallow XML markup
* name     - extract mostly English personal names from text
* disambig - disambiguation of phrases with WordNet concept information.
* chemic   - recognition of chemical names in text

These show just a few of the many things PyElly can do for you. They also
serve as a basis for comprehensive software integration testing. You may
use any of them as models for building your own PyElly applications.

PyElly is free software released under a BSD open-source license for
educational and other uses. Be advised that the current software and
documentation is still evolving, although releases after v1.2 should be
more stable than preceding releases.

Release Notes:

 0.1    -  25dec2013  initial beta release
 0.2    -  16mar2014  increase number of syntactic categories to 64
                      add storing and reinserting of deleted output buffer text
                      fix bugs in DELETE TO generative semantic command
                      add unit testing input to PyElly distribution
                      save integration testing script doTest properly
                      eliminate inconsistencies in integration testing keys
                      improve output of unit test for generativeProcedure.py
 0.3    -  24apr2014  extend generative semantics to support new applications
                      add UNITE, INTERSECT, COMPLEMENT, UNCAPITALIZE
                      add QUEUE, UNQUEUE, SHOW
                      replace DELETE ALL code
                      make STORE more efficient and generalize, fix bugs
                      allow for initializing of global variables in grammar
                      strengthen unit testing, add "querying" integration test
 0.4    -  04jul2014  support conceptual hierarchies in cognitive semantics
                      separate lookup tables for syntactic and semantic features
                      fix bugs in loading vocabulary tables from text input
                      fix bugs in loading conceptual hierarchies from text input
                      improve unit testing
                      add core of "disambig" application for integration testing
 0.4.1  -  13aug2014  clean up and flesh out "disambig" application
                      fix bugs in cognitive semantics
                      fix bugs in conceptual hierarchies
                      miscellaneous cleanup of Python source files
                      improve unit testing of modules, parse tree dump
 0.5    -  01sep2014  simplify doTest and make parse tree dumps easier to filter
                      add audit on usage of grammar symbols for error checking
                      add version check when loading saved binary language files
                      define ellyException to handle errors in table loading
                      add error messages when generating language tables
                      simplify semantic feature check by generative semantics
                      extend generative semantic unit tests
                      add "bad" application to test PyElly error reporting
 0.5.1  -  12sep2014  fix residual problems with error reporting and recovery
                      extend "bad" application for integration testing
 0.6    -  12oct2014  more input checking in vocabulary table compilation
                      more information in "disambig" application translations
                      better English inflectional and morphological stemming
                      English irregular stemming, update "echo" application
                      extend "chinese" application, improve classifiers
 1.0    -  24dec2014  add comprehensive error reporting in inflectional stemming
                      add WordNet exceptions to cases handled by stemmers
                      upgrade pattern table matching and clean up code
                      fix bug in ellyWildcard with $ wildcard
                      update "querying" application
                      clean up various problems in "chinese" application
                      clean up all modules with PyLint
 1.0.1  -  01jan2015  bug fixes, cleanup ahead of v1.1
 1.0.2  -  12jan2015  bug fixes, cleanup and upgrade ahead of v1.1
                      clean up token extraction and lookup
 1.0.3  -  22jan2015  bug fixes, cleanup ahead of v1.1
                      upgrade code for token extraction and lookup
                      add first iteration of "marking" application
 1.0.4  -  26jan2015  bug fixes and upgrades ahead of v1.1
                      extend "marking" rules and integration test
 1.0.5  -  31jan2015  bug fixes, cleanup ahead of v1.1
                      better handling of punctuation in parsing
                      extend "marking" rules and integration test
 1.0.6  -  07feb2015  bug fixes, cleanup ahead of v1.1
                      improve unit testing
                      add "marking" rules and extend its integration test
 1.0.6a -  12feb2015  clean up code
                      make parsing with "marking" rules more efficient
                      update "marking" integration test
 1.1    -  21mar2015  add name recognition to entity extraction capability
                      add word phonetic signatures
                      add "name" integration test
                      minor cleanup of table loading source
                      fix bug in sentence recognition and clean
 1.2    -  03apr2015  replace Berkeley Database with SQLite
                      clean up PyElly initialization logic
 1.2.1  -  15apr2015  extend rules for "marking" application
                      extend "marking" integration test
                      add more Unicode punctuation handling
                      fix input buffering for Unicode
                      fix morphological stemming problems
                      fix tokenization with new Unicode punctuation
                      fix macro table for new Unicode punctuation
                      add missing code for FIND in generative semantics
 1.2.2  -  01may2015  extend "test" and "marking" integration tests
                      extend handling of punctuation
                      add phrase limit for avoiding runaway analysis
                      fix bug in warning of unused grammar symbols
                      fix bug in token lookup
                      improve morphological stemming
                      break out pickling as separate module
 1.2.3  -  08may2015  extend "marking" integration test
                      fix bug in numerical transformations with period
                      clean up rule definition diagnostics
 1.2.4  -  15may2015  extend "marking" integration test
                      fix bug in scoring plausibility of phrases
                      fix simplified character translation in "chinese" test
                      add tracing to cognitive semantic logic
                      better checking on feature set identifiers
 1.2.5  -  25may2015  clean up "marking" rules and integration test
                      improve input code for syntactic and semantic features
                      increase upper limit on phrase count
                      fix bugs in parse tree growth restrictions
                      fix bug in inheriting syntactic features with *L, *R
                      change directions of FIND command to be more consistent
                      update "test" and "bad" grammars for PyElly changes
                      raise exception for phrase overflows
 1.2.6  -  01jun2015  clean up "marking" application rules
                      extend "marking" integration test
                      clean up logic for loading grammar and vocabulary
                      improve cognitive semantic tracing
                      add diagnostic output for parsing
 1.2.7  -  08jun2015  clean up "marking" rules and change integration test key
                      fix bug in morphological analysis match conditions
                      make punctuation syntax feature ID consistent
                      add automatic check for consistency of all feature IDs
                      fill out description of MERGE command in User's Manual
 1.2.8  -  15jun2015  better debugging for reading in sentences to process
                      fix incorrect stop exception
                      fix inconsistent feature ID in "chinese" grammar
                      fix problem in parse tree dump with big phrase IDs
                      fix bug with apostrophe as quotation mark
                      clean up "marking" application rules
 1.2.9  -  22jun2015  clean up "marking" application rules
                      fix swapping bug in reordering of ambiguous phrases
                      improve diagnostic output
 1.2.10 -  29jun2015  clean up and extend "marking" application
                      fix formatting problem in SHOW semantic command
                      clean up output for TRACE and SHOW
                      add VIEW instrumentation command
                      minor improvements in test scripts and data
 1.2.11 -  06jul2015  fix bug in computing plausibility scores for parses
                      improve reporting of rule usage in parse tree dump
                      clean up "marking" application rules
                      extend "marking" integration test
                      fix bug in handling forms of ellipsis
 1.2.12 -  13jul2015  fix bug in converting ellyBase parse tree depth arg
                      fix bug in adjusting grammar rule biases
                      clean up diagnostic output
                      extend "marking" integration test
 1.2.13 -  20jul2015  fix swapping bug in reordering of ambiguous phrases
                      define Kernel class to make phrase swapping cleaner
                      add check for multiple definition of subprocedures
                      extend "marking" integration test
                      improve default suffix removal
 1.2.14 -  30jul2015  fix minor bug in display of rules invoked for parse tree
                      fix problems in punctuation recognition, clean up code
                      fix bug in handling ` as punctuation in token extraction
                      extend "marking" application rules
 1.2.15 -  03aug2015  fix problems in tracking capitalization, clean up code
                      improve diagnostic output
                      extend "marking" integration test
 1.2.16 -  21aug2015  fix bug in pattern table method
                      improve default suffix removal
                      improve cognitive semantic diagnostics
                      add handling of em and en dashes in tokentization
                      extend default punctuation handling
                      extend "marking" rules and integration test
 1.3    -  06sep2015  add reset of inherited syntactic and semantic features
                      fix bugs in handling features and clean up code
 1.3.1  -  13sep2015  make integration testing script more flexible
                      extend basic "test" integration test
                      clean up "marking" integration test
                      fix missing cognitive semantics for leaf phrase nodes
                      improve diagnostic output
 1.3.2  -  23sep2015  add ellySurvey tool for vocabulary development
                      fix text normalization bug in handling input
                      add apostrophe wildcard
                      fix bugs in binding to text matching wildcards
                      clean up token lookup
                      clean up "marking" rules
 1.3.3  -  03oct2015  fix bugs in vocabulary lookup and tokenization
                      clean up vocabulary development tool
                      clean up char and wildcard definitions
                      improve release checking for binary tables
                      improve diagnostic output
                      extend "echo", "marking" rules and integration test
                      add "stem" application rules and integration test
 1.3.4  -  07oct2015  improve morphological stemming
                      fix stemming bugs in vocabulary table lookup, clean code
                      extend various integration tests for stemming
                      improve output of ellySurvey
                      extend "marking" vocabulary
                      clean up "marking" integration test
                      change comment format in language definition files
 1.3.5  -  11nov2015  add control character for management of parse trees
                      filter out extra ASCII control chars from text input
                      clean up "marking" rules and integration test
                      fix minor bug in generative semantic compilation
                      better error reporting in cognitive semantic compilation
                      make FIND semantic command consistent with other operations
 1.3.5.1 - 26nov2015  fix bugs in control characters for parse tree management
                      clean up affected code modules
                      clean up "marking" rules and integration test
 1.3.5.2 - 15dec2015  fix bug in null check for cognitive semantics
                      rework control characters to be no longer punctuation
                      add rendering of contral characters in rule dumps
                      adjust "chinese" and "querying" integration test
                      adjust integration test script
                      extend "marking" rules with control characters
                      extend "marking" grammar and vocabulary
                      extend "marking" integration test
 1.3.6   - 01jan2016  fix bug in pattern matching of tokens
                      more flexible use of predefined syntactic features
                      extend "marking" language definition
                      extend "marking" integration testing
 1.3.6.1 - 08jan2016  fix bug integrating inflectional stemming and macros
                      clean up English inflectional stemming
                      extend and clean up suffix test cases
                      extend "echo" integration test
                      clean up and extend "marking" rules
                      extend "marking" integration test
 1.3.7   - 18feb2016  add token count to phrase data
                      check token position in cognitive semantics
                      check token count in cognitive semantics
                      allow more spaces in cognitive semantic clauses
                      clean up parse tree building
                      extend "marking" rules and integration test
                      extend, revise, correct cognitive semantic writeup
 1.3.8   - 25feb2016  extend token extraction for nonalphabetic additions
                      clean up basic character handling
                      extend "echo" integration test
                      update documentation
 1.3.9   - 03mar2016  fix various problems with checking of capitalization
                      clean up parse tree code
                      clean up documentation
                      extend "marking" rules and integration test
                      extend "echo" rules
 1.3.10  - 17mar2016  allow fractions to be handled as single tokens
                      extend "marking" rules and integration test
 1.3.11  - 13apr2016  allow vocabulary table entries to start with ','
                      extend "marking" rules and integration test
 1.3.12  - 23apr2016  more error checking in vocabulary table entries
                      extend "bad" rules to test error checking
                      extend "marking" rules and integration test
 1.3.13  - 04jul2016  better handling of hyphens
                      improve parse tree full dump
                      clean up documentation
 1.3.14  - 14jul2016  add method to turn off individual feature bit
                      clean up handling of *L and *R syntactic features
                      fix capitalization bug in vocabulary lookup
                      recompile vocabulary only when needed
                      fix commentary bug with # at end of line
                      minor changes in reporting of table definition
                      clean up and extend documentation
 1.3.15  - 03aug2016  clean up procedure for recompiling language tables
                      clean up commentary and reporting
                      add basic cognitive semantics to pattern tables, entities
                      add feature inheritance checking
                      fix bug in disambiguation with type 0 rules
                      extend "test" integration testing for new patterns
                      extend "marking" application rules
                      clean up "doctor" rules
                      clean up and extend documentation
 1.3.16  - 21aug2016  add another recognizer for space chars
                      fix bug in pattern matching with spaces
                      extend "test" integration testing for space matching
                      clean up integration tests for space matching
                      update documentation
 1.3.17  - 07sep2016  fix bugs in handling tokenization breaks
                      define left enclosing punctuation in ellyChar
                      fix problems in ellyBase from changes in ellyChar.findBreak
                      fix ellyChar bug putting back left enclosing punctuation
                      implement alphabetic uppercase wildcard
                      clarify patternTable unit test
                      clarify macroTable unit test
                      extend "test" integration testing
                      clean up "marking" pattern and macro rules
                      clean up documentation
 1.3.18  - 16sep2016  fix integration problems in token lookup
                      improve unit testing for patternTable, substitutionBuffer
                      improve diagnostics for ellyBase, generativeProcedure
                      improve output representation of ellyBuffer, grammarRule
                      clean up "marking" rules and integration tests
                      extend "test" rules and integration test
                      clean up "doctor" and "chinese" rules
                      fix late setting of bias in leaf phrase nodes
 1.3.19  - 17oct1016  reorganize sentence extraction
                      fix problems with quotations and bracketed text
                      fix problems with English morphology rules
                      fix problem with ampersand in tokenization
                      fix problem with pattern matching on strings with brackets
                      fix problem with abbreviations and hyphenation
                      clean up and extend "marking" rules and integration tests
                      clean up documentation 
                      clean up ellyBase code and commentary
                      clean up ellySurvey code and fix dummy Tree class
                      fix problem with rule sequence numbers in parse tree dumping
                      use *x syntactic feature to identify period as punctuation
                      add check to avoid ord() error on ''
                      add missing error exit in loading vocabulary table
 1.3.20  - 01dec2016  clean up toplevel error checking and reporting
                      clean up logic for what rule files to recompile
                      fix problem with macro patterns ending in _ wildcard
                      add print statements for debugging
                      clean up PyElly table and tree dumps
                      extend "marking" rules and integration testing
 1.3.21  - 10dec2016  fix problem recognizing short bracketed tokens
                      clean up basic PyElly character handling
                      simplify output tags for "marking" example application
                      extend "marking" rules and integration testing
                      update and clarify documentation
 1.3.22  - 20dec2016  increase maximum syntactic category count to 72
                      add checks on semantic feature IDs in vocabulary rules
                      extend and clean up "marking" rules
                      extend "marking" integration testing
                      fix doTest script to make it self-complete
                      fix bug in *LEFT syntactic feature inheritance
                      fix bugs in date entity extraction
                      better checking of arguments for generative semantics
                      better error messages for cognitive semantic logic
                      fix bugs in stop punctuation exceptions
                      add nomatch logic for stop exceptions
                      update documentation
 1.3.23  - 03mar2017  increase maximum syntactic category count to 80
                      extend cases recognized by dateTransform
                      add more context to ellyCharInputStream logic
                      strengthen stopExceptions logic in nomatch()
                      update integration testing for new handling of dates
                      extend "marking" rules and integration test
                      update and clean up documentation
 1.3.24  - 15mar2017  fix bugs with buffer handling in generative semantics
                      add to cognitive semantic tracing output
                      show feature names sorted by index in grammar dump
                      clean up symbolTable error message
                      clean up commentary in parseTree
                      adjust debugging code in dateTransform
                      add extraction procedure for acronym definition
                      extend "marking" rules and integration test
                      update documentation
 1.4.0   - 20mar2017  enlarge Unicode subset recognized in input text
                      fix bugs and clean up ellyChar, add unit test
                      add vowels with diacriticals for pinyin
                      special handling of CJK in ellyCharInputStream
                      update documentation
 1.4.1   - 26mar2017  improve encapsulation of ellyCharInputStream
                      add lookahead method for matching up brackets
                      extend and clean up unit test
                      rework ellySentenceReader logic for bracketed punctuation
                      extend and clean up "marking" rules and integration testing
                      improve unit testing support output
                      add consistency checking for semantic features
                      clean up source files along with line count of code
                      update documentation
 1.4.2   - 17apr2017  add char count check to cognitive semantics
                      add buffer alignment operation to generative semantics
                      extend "bad" rules to test error detection and recovery
                      fix omission in ellyBase handling of phrase token count
                      restore macroTable error check, normalize error messages
                      fix Unicode output redirection in multiple main modules
                      warn in symbolTable of syntactic types with similar names
                      add error checks in syntaxSpecification 
                      extend, reorganize, and clean up "marking" rules
                      extemd "marking" integration test
                      revise, correct, and update documentation
 1.4.3   - 26apr2017  add lowercase letter wildcard
                      simplify stopExceptions and default rules
                      note capitalization at start of current letter sequence
                      clean up commentary in various modules
                      extend "marking" rules
                      correct and update documentation
 1.4.4   - 04may2017  fix bugs with FAIL in generative semantics
                      fix bug with mergeBuffers() method in interpretiveContext
                      clean up translation failure reporting
                      add "fail" integration test with rules to PyElly suite
                      update documentation
 1.4.5   - 22may2017  fix bug in entity extraction when no phrase type is acceptable
                      fix bug with Unicode ellipsis in token extraction
                      add limited title recognition in entity extraction repertory
                      enhance output in unit testing support
                      extend "marking" rules and integration test
                      update documentation
 1.4.6   - 29may2017  make numbers with final decimal point as sentence stop exception
                      add lowercase letters as semiwildcards in PyElly pattern matches
                      correct bug in handling of right context in stopExceptions
                      change stopExceptions to make use of semiwildcard matching
                      clean up "default" stop exceptions
                      extend "marking" rules
                      update documentation
 1.4.7   - 04jun2017  fix capitalization bugs in generative semantics
                      clean up ellyChar methods and tables, extend unit test
                      add method to check patterns for wildcards not matching 1-to-1
                      add checking for patterns with only 1-to-1 wildcard marching
                      put in missing code for stopException matching of right context
                      clean up default stopException logic
                      update documentation, make more accurate
                      extend "marking" rules and integration test
 1.4.8   - 15jun2017  put in missing code for handling nonalphanumeric wildcard
                      allow space wildcard in optional pattern components
                      clean up macro substitution pattern matching
                      update documentation for wildcards
                      extend "marking" rules and integration test
 1.4.9   - 24jun2017  add Greek small letters to PyElly char set
                      extend "marking" rules and integration test
                      update and correct documentation
 1.4.10  -  4jul2017  add Unicode thin spaces to text recognized by PyElly
                      clean handling of various spaces in ellyChar
                      fix bug in matching patterns with space wildcards
                      fix ellyWild bug in deconverting pattern string
                      fix error detection in converting syntactic features
                      correct and extend stopException unit test
                      clean up debugging statements in PyElly modules
                      minor improvements in unit testing
                      extend "marking" rules and integration test
                      update documentation
 1.4.11  - 27jul2017  clean up and extend stop exception recognition
                      improve substitutionBuffer unit test
                      extend "marking" rules
                      update documentation
 1.4.12  - 01aug2017  more rational handling of _ in vocabulary table keys
                      add handling of superscript 1, 2, 3 as digits
                      make tokenization of Unicode consistent with input coding
                      improve vocabularyTable unit test
                      extend "marking" rules and integration test
                      update all integration tests for tokenization encoding
                      update and clean up documentation
 1.4.13  - 01sep2017  increase limit on syntactic types to 96
                      extend "marking" rules and integration test
                      update and clean up documentation
 1.4.14  - 14sep2017  correct bugs in compiling cognitive semantics
                      extend "marking" rules and integration test
                      update and clean up documentation
 1.4.15  - 20sep2017  fix bugs in stop exception recognition
                      clean up stop exception code, commentary, and debugging
                      improve stop exception unit testing
                      fix and clean up default stop exception rules
                      handle ellipsis in PyElly char input stream
                      add musical ♯ and ♭ to Elly character set
                      treat ° as embedded combining
                      extend "marking" rules and integration test
                      update documentation
 1.4.16  - 05oct2017  fix bugs in macro substitution
                      store macro rules as hashable objects
                      add angle brackets 〈〉 for PyElly delimiting
                      generalized handling for all bracketing in term lookup
                      improve algorithm for setting range of pattern matching
                      clean up and extend "marking" rules and integration test
                      update documentation
 1.4.16.1  21oct2017  fix various bugs in dateTransform
                      extend "marking" rules
                      update documentation
 1.4.16.2  23nov2017  fix omissions in inflectional stemming logic
                      extend and correct "marking" rules
                      update documentation
 1.4.17  - 27nov2017  reimplement generative semantics FIND command
                      improve logic for recompiling PyElly tables
                      fix stemming problems with -n ending
                      fix punctuation problems with [ and ]
                      extend "marking" rules
                      update documentation
 1.4.18  - 31ded2017  rename vocabulary table building method to avoid conflict
                      improve handling of m dash in language definition rules
                      extend "marking" rules
                      update documentation
 1.4.18.1  01jan2018  clean up punctuation definitions
                      clean up and extend "marking" rules
                      update documentation
 1.4.19  - 06jan2018  add time period entity extraction
                      clean up and extend "marking" rules
                      update and revise documentation
 1.4.20  - 30jan2018  provide token list on parse tree overflow
                      clean up diagnostic output for parsing
                      fix bug in vocabulary table lookup of inflected entries
                      extend logic for -S inflections in English
                      extend "marking" rules
                      update and revise documentation
 1.4.21  - 05feb2018  fix bug and clean up stop exception code
                      add error check to vocabulary table definition loading
                      extend "marking" rules
                      update documentation
 1.4.22  - 08feb2018  fix bug in vocabulary table case-independent string comparison
                      fix bug in macro substitution with leading apostrophe pattern
                      better warning on macro substitution increasing text length
                      handle doubled single quotes in ellyCharInputStream
                      extend default suffix rules and unit test
                      extend "marking" rules
                      update documentation
 1.4.23  - 13feb2018  fix problems with converting Unicode to ASCII in ellyChar
                      fix problems with vocabulary table lookup in ellyBase
                      fix problems with multi-translation in vocabularyElement
                      fix problems with defining vocabularyTable search keys
                      improve vocabularyTable commentary
                      make nameRecognition compatible with new Unicode to ASCII
                      add dump of SQLite search keys to vocabularyTable
                      extend "marking" rules
                      update documentation
 1.4.24  - 18feb2018  change definitionLine to make it work for VocabularyTable
                      define Unicode hyphen in PyElly input text
                      improve macroTable unit test and add commentary
                      fix "disambig" rules and keys for new VocabularyTable
                      fix "test" rules
                      extend "marking" rules
                      update and expand documentation
 1.4.25  - 22feb2018  handle Euro symbol in PyElly input
                      allow for tokens to be split or joined by pattern match
                      allow date range with hyphen in entity extraction
                      improve English inflectional and morphological stemming
                      fix problem with ellipsis starting a sentence
                      clean up "test" application integration test key
                      extend and clean up rules for "marking" application
                      add daily Google News text data for "marking" tests
                      update and expand documentation
 1.4.26  - 21mar2018  fix problem with vocabulary lookup key ending in S
                      adjust rules for "indexing" application
                      extend rules for "marking" application
                      update documentation
 1.4.27  - 25mar2018  fix problems with tokenization with hyphens
                      add debugging option with no parse tree in ellyBase
                      clean up interpretiveContext and improve encapsulation
                      recognize Unicode hyphen as default punctuation
                      reduce amount of dumps on processing error
                      clean up commentary in code modules
                      update documentation
                      add Google News data
 1.4.28  - 11apr2018  handle special case for stop punctuation in English
                      handle comma after year in dateTransform
                      handle vocabulary entries starting with left double quote
                      improve output of FSA unit testing
                      extend "marking" rules for new data
                      extend "default" stop exception rules for Sr. and Jr.
                      extend "default" morphological stemming
                      update documentation
                      add Google News data
 1.4.29  - 07may2018  fix bug in ellyChar when checking for letter or digit
                      fix bug in dateTransform
                      add logic for special matching of hyphens in ellyWildcard
                      allow sentences to start with em dash
                      improve morphological analysis
                      extend "marking" rules for new data
                      add integration test with "marking" rules and news text
                      update documentation
                      add Google News data
 1.4.30  - 26may2018  fix bug in ellyWildcard with check for list index overflow
                      add special check in ellySentenceReader for lone ellipsis
                      fix problems with inflectional and morphological stemming
                      extend and clean up "marking" rules
                      extend "marking" integration test with news data
                      update "indexing" integration test
                      update documentation
                      add Google News data and clean up
 1.4.31  - 06jun2018  fix major bug in ellyWildcard matching algorithm
                      fix problems with inflectional and morphological stemming
                      extend and clean up "marking" rules
                      extend "marking" integration test with news data
                      update documentation
                      add Google News data and clean up
 1.4.32  - 08jul2018  fix delimiter bug in vocabularyTable and vocabularyElement
                      fix bug not recognizing .'" as a stop for sentence
                      handle soft hyphens properly in ellyCharInputStream
                      improve inflectional stemming
                      extend and clean up "marking" rules
                      clean up "marking" integration test with news data
                      update documentation
                      add Google News data and clean up
 1.5     - 30jul2018  increase maximum number of syntactic categories to 112
                      implement light inflectional stemming in deinflectedMatching
                      reorganize vocabularyTable with light inflectional stemming
                      expand PyElly language rules with new compoundTable module
                      revise integration tests
                      clean up documentation and diagnostic code
                      update documentation
 1.5.1   - 03aug2018  replace compoundTable stub with functioning code
                      integrate template matching in PyElly processing
                      fix bug in ellySurvey because of new vocabularyTable
                      fix minor bug in cognitiveDefiner
                      extend "test" application rules
                      extend "test" integration testing for templates
                      clean up "marking" rules
                      update documentation
 1.5.2   - 30aug2018  expand template elements that can be matched
                      expand "test" integration testing to include templates
                      extend "marking" integration testing with more sentences
                      update documentation
 1.5.3   - 24sep2018  make template matching more consistent for punctuation
                      update documentation
 1.5.4   - 25oct2018  fix bug in matching of $ wildcard
                      fix bug in patternTable with maximum match length
                      clean up patternTable code, add debugging statements
                      add "chemic" application for chemical names
                      add "chemic" to integration testing
                      clean up "marking" pattern rules and integration test
                      update documentation
 1.5.5   - 03nov2018  eliminate duplicate output from ellyBase
                      add data check to ellySurvey for robustness
                      clean up diagnostic output from parseTest
                      add debugging code to parseTree, clean up commentary
                      accept Unicode input in patternTable unit test
                      fix bug in patternTable handling solitary $ as pattern
                      fix bug in handling 00 in simpleTransform
                      fix bug leaving Unicode prime (u2032) undefined as text
                      extend "chemic" rules and integration test
                      update documentation
 1.5.6   - 08nov2018  allow for limited recursive prefix extractions
                      ellyBase reorganized to handle prefix tokens
                      ellyChar must let + be at end of token 
                      clean up ellyWildcard debugging and commentary
                      treeLogic needs to allow for + at start of token
                      extend "chemic" rules and integration test
                      update "marking" integration test
                      update documentation
 1.5.7   - 12nov2018  fix bug in simpleTransform in reading commas in numbers
                      fix bug in patternTable handling $ wildcard
                      rework ellyBase handling of + and - at front of tokens
                      extend "chemic" rules and integration test
                      update documentation
 1.5.8   - 20nov2018  include more format checking in treeLogic
                      make error reporting in morphologyAnalyzer more consistent
                      upgrade vocabularyTable for single Greek letter definition
                      fix bug in ellyBuffer extracting ',' as a token
                      extend "chemic", "bad" rules
                      extend "chemic" integration testing
                      update documentation
 1.5.8.1 - 29nov2018  fix problem with suffix removal after prefix removal
                      extend "chemic" rules
                      extend "chemic" integration testing
                      update documentation
 1.5.8.2 - 07dec2018  fix patternTable bug in handling Unicode prime char
                      extend "chemic" rules
                      extend "chemic" integration testing
                      update documentation
 1.5.8.3 - 10dec2018  handle Greek letters properly in ellyBuffer
                      extend "chemic" rules
                      extend "chemic" integration testing
                      update documentation
 1.5.8.4 - 21dec2018  handle Greek letters properly in patternTable
                      handle Greek letters properly in ellyWildcard matching
                      clarify ellyToken print representation, clean up code
                      extend "chemic" rules
                      extend "chemic" integration testing
                      update documentation
 1.5.8.5 - 10jul2019  extend default suffix rules
                      update documentation
 1.6     - 19oct2019  add support for Chinese Unicode input
                      fix problem with language initialization in ellyBase
                      fix problem in ellyDefinitionReader unit test
                      update documentation
 1.6.1   - 17nov2019  fix bug in recognizing Unicode control chars in input
                      clean up and correct ellyChar commentary
                      update documentation

New versions will be assigned for non-cosmetic changes in PyElly code. This will
often require regenerating any previously saved *.elly.bin files to ensure
correct operation. Changes only to PyElly example application definition files,
unit testing input or key files, and PyElly documentation will be made from time
to time, but these will leave version numbers the same, if they are no other
changes. Check Github for the latest files. The dates above are for the initial
release of a version, not the most recent update,

A website with information about PyElly is at

    https://sites.google.com/site/pyellynaturallanguage/

About

A multifaceted natural language tool written in Python 2.7.*. A release written in Python 3.8 has been uploaded in the GitHub project pyellytoo.

Resources

Stars

Watchers

Forks

Packages

No packages published