Type: New Feature
Priority: Would be nice
Hi,
A few improvements can be made to docx parsing in docxreader.py.

In apply_paragraph_style, some bullet lists are not detected by style.startswith("List"). Checking the paragraph's numbering properties (w:numPr) as well catches them:

    try:
        numPr = paragraph._element.xpath("./w:pPr/w:numPr")[0]
        level = int(numPr.xpath("./w:ilvl/@w:val")[0])
    except Exception:
        # No numbering properties (or no level) on this paragraph
        numPr = None
        level = 0
    style = paragraph.style.name
    # Apply style
    if re.match(r"^Heading [1-9]$", style):
        n = int(style.split(" ")[-1])
        text = f"{'#' * n} {text}"
    elif style.startswith("List") or (numPr is not None):
        ...
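To illustrate what the XPath lookup above reads, here is a minimal sketch that uses only the standard library instead of python-docx. The helper names `list_level` and `render`, the sample XML fragments, and the indented `- ` bullet rendering are assumptions made for this example; the issue leaves the body of the list branch elided. Unlike the snippet above, this variant treats a `w:numPr` without a `w:ilvl` as level 0 rather than as "no numbering".

```python
import re
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx paragraph XML
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def list_level(p_xml):
    """Return the list level from w:pPr/w:numPr/w:ilvl, or None without w:numPr."""
    p = ET.fromstring(p_xml)
    num_pr = p.find("./w:pPr/w:numPr", NS)
    if num_pr is None:
        return None
    ilvl = num_pr.find("./w:ilvl", NS)
    if ilvl is None:
        return 0  # assumption: numbering without an explicit level is top level
    return int(ilvl.get(f"{{{W}}}val", "0"))


def render(text, style, level):
    """Hypothetical stand-in for apply_paragraph_style, producing Markdown-like text."""
    if re.match(r"^Heading [1-9]$", style):
        n = int(style.split(" ")[-1])
        return f"{'#' * n} {text}"
    if style.startswith("List") or level is not None:
        # Hypothetical rendering: two spaces of indent per nesting level
        return f"{'  ' * (level or 0)}- {text}"
    return text


# A bullet whose style is "Normal": only w:numPr marks it as a list item
BULLET = (
    f'<w:p xmlns:w="{W}"><w:pPr><w:pStyle w:val="Normal"/>'
    '<w:numPr><w:ilvl w:val="1"/><w:numId w:val="3"/></w:numPr>'
    "</w:pPr></w:p>"
)
PLAIN = f'<w:p xmlns:w="{W}"><w:pPr/></w:p>'

bullet_line = render("item", "Normal", list_level(BULLET))
plain_line = render("just text", "Normal", list_level(PLAIN))
```

The `BULLET` fragment is the case the style-name check alone would miss: its style does not start with "List", yet it carries numbering properties.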
When collecting table images, flatten one more level of nesting with

    from itertools import chain
    ...
    table_images = [cell_images for row in rows for _, cell_images in row]
    table_images = list(chain(*chain(*table_images)))

instead of

    table_images = [image for row in rows for _, images in row for image in images]
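The difference between the two versions is easiest to see on toy data. The sketch below assumes that each row is a list of `(text, cell_images)` pairs and that `cell_images` is itself a list of lists (for example, one inner list per paragraph in the cell); that shape is inferred from the double `chain` and is not confirmed by the issue. The string values are placeholders for image objects.

```python
from itertools import chain

# Assumed shape: rows -> cells -> (text, cell_images), cell_images is a list of lists
rows = [
    [("a", [["img1"], ["img2", "img3"]]), ("b", [[]])],
    [("c", [["img4"]])],
]

# Existing comprehension: removes the row and cell levels only
one_level = [image for row in rows for _, images in row for image in images]

# Proposed version: collect per-cell lists, then flatten two more levels
table_images = [cell_images for row in rows for _, cell_images in row]
flat = list(chain(*chain(*table_images)))

# Equivalent spelling that avoids unpacking into call arguments
flat_alt = list(chain.from_iterable(chain.from_iterable(table_images)))
```

With this input, `one_level` still contains inner lists, while `flat` and `flat_alt` are flat lists of the four placeholder images.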
To include section headers and footers in the extracted text, you can replace line 219

    for c in paragraph.iter_inner_content():
        ...

by

    for section in self.document.sections:
        # Header paragraphs first
        self.text += "".join([self.format_paragraph(p)[0] + "\n" for p in section.header.paragraphs])
        for c in section.iter_inner_content():
            ...
        # Footer paragraphs last
        self.text += "".join([self.format_paragraph(p)[0] + "\n" for p in section.footer.paragraphs])
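The resulting order of the output (header, then body, then footer, per section) can be checked without a real .docx file. The sketch below uses `SimpleNamespace` stand-ins that mimic only the attributes used in the snippet above; `extract_text`, `para`, and the simplified `format_paragraph` are hypothetical names for this example, and the body loop handles paragraphs only because the issue elides what the real loop does.

```python
from types import SimpleNamespace


def format_paragraph(p):
    # Stand-in: the snippet above only uses element [0] of the result, the text
    return p.text, []


def extract_text(document):
    """Concatenate header, body, and footer text for each section, in that order."""
    text = ""
    for section in document.sections:
        text += "".join(format_paragraph(p)[0] + "\n" for p in section.header.paragraphs)
        for c in section.iter_inner_content():
            # Assumption: body items are paragraphs; the real loop body is elided
            text += format_paragraph(c)[0] + "\n"
        text += "".join(format_paragraph(p)[0] + "\n" for p in section.footer.paragraphs)
    return text


def para(text):
    return SimpleNamespace(text=text)


section = SimpleNamespace(
    header=SimpleNamespace(paragraphs=[para("Report title")]),
    footer=SimpleNamespace(paragraphs=[para("Confidential")]),
    iter_inner_content=lambda: iter([para("Body 1"), para("Body 2")]),
)
document = SimpleNamespace(sections=[section])

result = extract_text(document)
```

One thing worth checking against real documents: as far as I know, python-docx returns the inherited header or footer when a section is linked to the previous one, so the same header text may be emitted once per section.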