This repository has been archived by the owner on Jul 25, 2024. It is now read-only.
icab_parser.py is currently hard-coded against the ICAB corpus. Make it more generic by letting the user specify which elements to extract the inner text from.
Perhaps also let the user specify which attribute(s) of which element(s) contain the text to run NER on. Experiment a little with this last option, but not for too long. Until the exact corpus that the scripts in this repo will have to work with is decided, there are probably too many options and possibilities to come up with a properly generic solution.
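A minimal sketch of the first option (user-defined element names), not the actual icab_parser.py code. The function name `extract_element_text` and the sample tags are hypothetical; it assumes well-formed XML without namespaces and uses only the standard library:

```python
import xml.etree.ElementTree as ET


def extract_element_text(xml_string, element_names):
    """Return the inner text of every element whose tag is in element_names.

    Hypothetical helper: assumes well-formed XML and tags without namespaces.
    """
    root = ET.fromstring(xml_string)
    wanted = set(element_names)
    texts = []
    # iter() walks the whole tree (root included) in document order.
    for elem in root.iter():
        if elem.tag in wanted:
            # itertext() also collects text from nested child elements;
            # split/join collapses runs of whitespace into single spaces.
            text = " ".join("".join(elem.itertext()).split())
            if text:
                texts.append(text)
    return texts


if __name__ == "__main__":
    sample = "<doc><title>Hello</title><p>Some <b>bold</b> text.</p><note>skip</note></doc>"
    print(extract_element_text(sample, ["title", "p"]))
```

If one requested element is nested inside another requested element, its text is returned twice (once on its own and once as part of the parent), so the element list should be chosen with that in mind.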
In our meeting today, the clients told me that the corpora they are most interested in working with are already in txt files. So limit this script to two cases: extracting text from basic elements (i.e. pass the name of the element to extract text from) and the Europeana corpus (i.e. extracting the text from the attributes of certain elements).
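The attribute case could be sketched roughly as follows. This is an illustration under assumptions, not the script's real interface: `extract_attribute_text`, the element name `word`, and the attribute name `content` are all hypothetical placeholders, and the actual element and attribute names of the Europeana corpus would have to be passed in by the user.

```python
import xml.etree.ElementTree as ET


def extract_attribute_text(xml_string, attributes_by_element):
    """Collect attribute values that hold text.

    attributes_by_element maps an element name to the list of attribute
    names to read from it, e.g. {"word": ["content"]} (hypothetical names).
    Assumes well-formed XML and tags without namespaces.
    """
    root = ET.fromstring(xml_string)
    texts = []
    for elem in root.iter():
        # Elements not listed in the mapping contribute nothing.
        for attr in attributes_by_element.get(elem.tag, ()):
            value = elem.get(attr)  # None when the attribute is absent
            if value and value.strip():
                texts.append(value.strip())
    return texts


if __name__ == "__main__":
    sample = '<page><line><word content="Den" id="1"/><word content="Haag"/><sp/></line></page>'
    print(" ".join(extract_attribute_text(sample, {"word": ["content"]})))
```

Passing the mapping as an argument keeps the parser free of corpus-specific names, so the same function covers any corpus that stores its text in attributes.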