bug/error on HTML table generation #369

Open
pawel-kmiecik opened this issue Jul 24, 2024 · 1 comment
@pawel-kmiecik
Contributor

When processing a PDF file with the hi_res strategy in unstructured-api, an error occurs during HTML table generation (raised from unstructured-inference):

2024-07-24T08:49:18.887448624Z   File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured/partition/pdf_image/ocr.py", line 284, in supplement_element_with_table_extraction
2024-07-24T08:49:18.887488006Z     text_as_html = "" if tatr_cells == "" else cells_to_html(tatr_cells)
2024-07-24T08:49:18.887503751Z                                                ^^^^^^^^^^^^^^^^^^^^^^^^^
2024-07-24T08:49:18.887511928Z   File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured_inference/models/tables.py", line 704, in cells_to_html
2024-07-24T08:49:18.887519618Z     cells = sorted(fill_cells(cells), key=lambda k: (min(k["row_nums"]), min(k["column_nums"])))
2024-07-24T08:49:18.887527508Z                    ^^^^^^^^^^^^^^^^^
2024-07-24T08:49:18.887534601Z   File "/home/notebook-user/.local/lib/python3.11/site-packages/unstructured_inference/models/tables.py", line 667, in fill_cells
2024-07-24T08:49:18.887542331Z     table_rows_no = max({row for cell in cells for row in cell["row_nums"]})
2024-07-24T08:49:18.887549813Z                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-07-24T08:49:18.887557089Z ValueError: max() arg is an empty sequence
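
The last frame can be reproduced in isolation: `max()` raises this `ValueError` whenever the set comprehension yields no row indices, for example when the list of cells is empty or every cell has an empty `row_nums`. The snippet below is only a minimal sketch of that failure mode, not the library code. `reproduce_error` and `count_table_rows` are hypothetical helpers, and the `default=-1` guard is an assumption about one way to avoid the crash, not necessarily what the upstream fix does.

```python
def reproduce_error(cells):
    """Evaluate the expression from the traceback; return the error text, or None."""
    try:
        max({row for cell in cells for row in cell["row_nums"]})
    except ValueError as exc:
        return str(exc)
    return None


def count_table_rows(cells):
    """Hypothetical guard: highest row index in `cells`, or -1 when there is none."""
    rows = {row for cell in cells for row in cell["row_nums"]}
    # max() on an empty set raises ValueError unless `default` is supplied.
    return max(rows, default=-1)


if __name__ == "__main__":
    print(reproduce_error([]))    # a ValueError message (wording varies by Python version)
    print(count_table_rows([]))   # no rows detected, no exception
    print(count_table_rows([{"row_nums": [0, 1], "column_nums": [0]}]))
```

Whether an empty cell list should produce an empty table or be skipped entirely is a design decision for the caller; the sketch only shows that the crash itself comes from calling `max()` on an empty collection.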

Environment:

- Unstructured API 0.0.72 deployed on a remote machine
- Local deployment lib versions:
  - unstructured==0.14.6
  - unstructured-client==0.18.0
- OS: Ubuntu 22.04.02 LTS

@christinestraub
Contributor

@pawel-kmiecik The issue you're encountering appears to be related to the following:

This problem has been addressed in PR #359. To resolve it, please update your unstructured version to at least 0.14.8 or the latest available version.
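
To confirm that an environment meets the suggested minimum, something like the following could be used. It is a rough sketch: `parse_release` and `installed_is_new_enough` are hypothetical helpers written for this comment, the comparison looks only at the leading numeric parts of the version string (pre-release suffixes are ignored), and the 0.14.8 threshold is taken from the reply above.

```python
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (0, 14, 8)  # minimum unstructured version suggested in this thread


def parse_release(text):
    """Parse the leading numeric 'X.Y.Z' part of a version string into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. "dev1"
        parts.append(int(digits))
    return tuple(parts)


def installed_is_new_enough(package="unstructured", minimum=MINIMUM):
    """Return True if `package` is installed with a release number >= `minimum`."""
    try:
        return parse_release(version(package)) >= minimum
    except PackageNotFoundError:
        return False


if __name__ == "__main__":
    print(installed_is_new_enough())
```

With the versions listed in the report, `parse_release("0.14.6")` compares below `MINIMUM`, which matches the advice to upgrade.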
