UnexpectedSignatureError: b'<htm' #174

Open
hinas-source opened this issue Mar 27, 2024 · 1 comment

`---------------------------------------------------------------------------
UnexpectedSignatureError Traceback (most recent call last)
Cell In[5], line 9
5 url = f"https://download.companieshouse.gov.uk/Accounts_Bulk_Data-2024-01-20.zip"
6 with \
7 httpx.stream('GET', url) as r, \
8 stream_read_xbrl_zip(r.iter_bytes(chunk_size=65536)) as (columns, rows):
----> 9 df = pd.DataFrame(rows, columns=columns)
10 if isinstance(df, pd.DataFrame):
11 df1 = df

File c:\Users\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\frame.py:832, in DataFrame.__init__(self, data, index, columns, dtype, copy)
830 data = np.asarray(data)
831 else:
--> 832 data = list(data)
833 if len(data) > 0:
834 if is_dataclass(data[0]):

File c:\Users\AppData\Local\Programs\Python\Python312\Lib\site-packages\stream_read_xbrl.py:556, in <genexpr>(.0)
553 yield queue.popleft().result()
555 with ProcessPoolExecutor(max_workers=num_workers) as executor:
--> 556 yield _COLUMNS, (
557 row + (zip_url,)
558 for results in imap(executor, _xbrl_to_rows, ((name.decode(), b''.join(chunks)) for name, _, chunks in stream_unzip(zip_bytes_iter)))
559 for row in results
560 )

File c:\Users\AppData\Local\Programs\Python\Python312\Lib\site-packages\stream_read_xbrl.py:546, in stream_read_xbrl_zip.<locals>.imap(executor, func, param_iterables)
545 def imap(executor, func, param_iterables):
--> 546 for params in param_iterables:
547 if len(queue) == num_workers:
548 yield queue.popleft().result()

File c:\Users\AppData\Local\Programs\Python\Python312\Lib\site-packages\stream_read_xbrl.py:558, in <genexpr>(.0)
553 yield queue.popleft().result()
555 with ProcessPoolExecutor(max_workers=num_workers) as executor:
556 yield _COLUMNS, (
557 row + (zip_url,)
--> 558 for results in imap(executor, _xbrl_to_rows, ((name.decode(), b''.join(chunks)) for name, _, chunks in stream_unzip(zip_bytes_iter)))
559 for row in results
560 )

File c:\Users\AppData\Local\Programs\Python\Python312\Lib\site-packages\stream_unzip.py:460, in stream_unzip(zipfile_chunks, password, chunk_size, allow_zip64)
457 else:
458 raise UnexpectedSignatureError(signature)
--> 460 for file_name, file_size, unzipped_chunks in all():
461 yield file_name, file_size, unzipped_chunks
462 for _ in unzipped_chunks:

File c:\Users\AppData\Local\Programs\Python\Python312\Lib\site-packages\stream_unzip.py:458, in stream_unzip.<locals>.all()
456 break
457 else:
--> 458 raise UnexpectedSignatureError(signature)

UnexpectedSignatureError: b'<htm'`

I am getting this error

@michalc
Member

michalc commented Mar 27, 2024

Hi @hinas-source,

Can you post the exact code you're running? For example, I'm running this in a Python file:

import httpx
from stream_read_xbrl import stream_read_xbrl_zip

if __name__ == '__main__':
    url = 'https://download.companieshouse.gov.uk/Accounts_Bulk_Data-2024-01-20.zip'
    with httpx.stream('GET', url) as r:
        r.raise_for_status()
        with stream_read_xbrl_zip(r.iter_bytes(chunk_size=65536)) as (columns, rows):
            for row in rows:
                print(row)

And it seemingly works fine, printing all the results without that error.

(If you can format the code so it appears as code, properly indented, that would be helpful. See https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-code)
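One possible reading of the error, offered as a guess rather than a confirmed diagnosis: the `b'<htm'` in the message looks like the first four bytes of an HTML page, which would mean the server replied with something other than the ZIP file (an error page, for instance). The `r.raise_for_status()` call in the code above would catch that case when the status code signals an error. As a further check, the sketch below inspects the first bytes of the stream before they are unzipped. `require_zip` is a hypothetical helper written for this illustration; it is not part of stream-read-xbrl or stream-unzip.

```python
import itertools

# Signature of a ZIP local file header, which is how a non-empty ZIP file starts.
# (An empty archive starts with a different signature and would be rejected here.)
ZIP_LOCAL_FILE_SIGNATURE = b'PK\x03\x04'


def require_zip(chunks):
    """Yield the chunks unchanged, but fail early if they do not start like a ZIP.

    Hypothetical helper: because this is a generator, the check runs when
    iteration starts, not when the function is called.
    """
    chunks = iter(chunks)
    buffered = []
    head = b''
    # Pull chunks until there are at least 4 bytes to compare against the signature.
    for chunk in chunks:
        buffered.append(chunk)
        head += chunk
        if len(head) >= len(ZIP_LOCAL_FILE_SIGNATURE):
            break
    if not head.startswith(ZIP_LOCAL_FILE_SIGNATURE):
        raise ValueError(
            f'Response does not look like a ZIP file; it starts with {head[:20]!r}'
        )
    # Replay the buffered chunks, then continue with the rest of the stream.
    yield from itertools.chain(buffered, chunks)


# Intended use with the code from this thread (not executed here):
#   stream_read_xbrl_zip(require_zip(r.iter_bytes(chunk_size=65536)))
```

With a check like this, an HTML response would surface as a `ValueError` showing the start of the body, which may be easier to interpret than `UnexpectedSignatureError: b'<htm'`.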
