This repository has been archived by the owner on Jul 25, 2024. It is now read-only.

Implement parser.py #22

Open
alexhebing opened this issue May 24, 2019 · 1 comment
Labels
enhancement New feature or request

Comments

@alexhebing

icab_parser.py is currently hard-coded for the ICAB corpus. Make it more generic by letting the user specify which elements to extract the inner text from.

Perhaps also let the user specify which attribute(s) of which element(s) contain the text that NER should run on. Experiment with this last option, but not for too long. As long as the exact corpus that the scripts in this repo will have to work with has not been decided, there are probably too many possibilities to come up with a properly generic solution.
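
As a rough illustration of the first option (extracting inner text from user-defined elements), here is a minimal sketch using the standard library's `xml.etree.ElementTree`. The function name `extract_element_text` and the sample document are hypothetical and not taken from icab_parser.py; the sketch also assumes the corpus is well-formed XML without namespaces.

```python
import xml.etree.ElementTree as ET


def extract_element_text(xml_string, element_names):
    """Return the inner text of every element whose tag is in element_names.

    Hypothetical helper for illustration; not the repo's actual API.
    """
    root = ET.fromstring(xml_string)
    wanted = set(element_names)
    texts = []
    # iter() walks the whole tree in document order, root included.
    for element in root.iter():
        if element.tag in wanted:
            # itertext() also collects text nested inside child elements.
            text = "".join(element.itertext()).strip()
            if text:
                texts.append(text)
    return texts


# Made-up sample document, only to show the call.
sample = "<doc><title>News</title><p>Rome is <b>old</b>.</p><note>skip</note></doc>"
print(extract_element_text(sample, ["title", "p"]))
# → ['News', 'Rome is old.']
```

Note that if a requested element is nested inside another requested element, its text would be returned twice; whether that matters depends on the corpus.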

@alexhebing alexhebing added the enhancement New feature or request label May 24, 2019
@alexhebing
Author

In our meeting today, the clients told me that the corpora they are most interested in working with are already available as txt files. Make this script work for extracting text from basic elements (i.e. pass the name of the element to extract text from) and for the Europeana corpus (i.e. extract the text from the attributes of certain elements).
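
The attribute case could be sketched along the same lines. The function name `extract_attribute_text`, the element name `String`, and the attribute name `CONTENT` below are placeholders chosen for illustration; the actual element and attribute names in the Europeana files would need to be checked. If the files use XML namespaces, `ElementTree` reports tags in the qualified form `{namespace-uri}name`, so the mapping keys would have to be written that way.

```python
import xml.etree.ElementTree as ET


def extract_attribute_text(xml_string, attributes_by_element):
    """Collect attribute values that hold text.

    attributes_by_element maps an element name to the list of attribute
    names to read from it. Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_string)
    texts = []
    for element in root.iter():
        # Elements not present in the mapping yield an empty list.
        for attribute in attributes_by_element.get(element.tag, []):
            value = element.get(attribute)
            if value:
                texts.append(value)
    return texts


# Made-up sample; element and attribute names are assumptions.
sample = '<page><String CONTENT="Den" /><String CONTENT="Haag" /><SP WIDTH="10" /></page>'
words = extract_attribute_text(sample, {"String": ["CONTENT"]})
print(" ".join(words))
# → Den Haag
```

Passing the mapping as a plain dict keeps the element/attribute choice in the caller's hands, which fits the goal of not hard-coding a single corpus.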

@alexhebing alexhebing changed the title Make icab_parser.py slightly more generic Implement parser.py May 27, 2019
@alexhebing alexhebing self-assigned this Jun 3, 2019
alexhebing pushed a commit that referenced this issue Jun 3, 2019
alexhebing pushed a commit that referenced this issue Jun 3, 2019
alexhebing pushed a commit that referenced this issue Jun 3, 2019