-
-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Add a basic TextFileStream and tests #23
base: main
Are you sure you want to change the base?
Conversation
Tests seem to be failing due to composer dependencies not being updated. |
Hey First of all, thanks so much for your contribution, it's greatly appreciated. I looked at your PR, and the first thing I noticed is the change to Alternatives from the top of my head might include
I'll be looking into this in more detail in the coming week. Thanks for your patience! |
hey @turanct - thanks for the update and getting time to review this PR. It is a bit tricky to resolve the proper way to deal with this type of situation. Mainly because with a string blob and a file stream the concept of So I'm curious what type of workflow could cause your concern here:
If that issue were possible then seeing an example or even a failing test for it would be great. I've accounted for both ASCII and UTF8 strings in my tests to ensure file based streams track this consistently so the parser doesn't get bytes confused and provide incorrect results. So absent any 3rd party interactions with the file in question, once the parser starts working with a TextFileStream there shouldn't be concern about the values drifting from what they should be.
This is essentially what is being done -the canonical default being byte position 0 and line/character 1- then bytes are calculated here as data is parsed. Given that The library I created to manage the concept of "immutable read file" handlers then keeps track of opening and closing file pointers as needed, as well as providing an immutable instance of "the current file position". So this allows a single file pointer to be reused for many unique So far this seems successful for file pointers and text blobs as the stream. I'm curious what other contexts we may need to imagine covering support for? I have a hard time imagining a method of parsing (and rewinding) the parsing of an actual "live stream" of text. I.e. for parsing an HTTP response for example, the response body would likely need to be cached no matter what in order to rewind it during parsing. So at that point the buffer should either be a |
...and include editorconfig changes to prevent that affecting tests
...might be worth doing a special class to wrap the immutable resource?
6b16e5b
to
535ad36
Compare
hey @turanct - just wondering if you might be able to take a look at this again? maybe if you start a "review" and make notes on the areas of concern that could be a good step? As mentioned I understand the confusion around BytePosition being tracked, but I believe I've taken steps to address those concerns. Effectively this PR expands the Position object to operate consistently while providing two different contexts for tracking position. This matters as working with an arbitrary blob of text the current Column+Line context makes sense. However, when working with a stream of text from a socket - AFAIK - only the byte position can be used. Thanks! |
Description
This PR provides an out of the box implementation of a
TextFileStream
which works like theStringStream
. The major difference is that it's using a file pointer for the stream source.How is this accomplished and why?
While this is a crucial functionality to add IMO, it's not without challenges. Normally PHP file pointers are very mutable and track their own current position. This would mean that file handles need to be managed carefully to either: a) pass them around for reuse, or b) properly close and reopen them.
For how this library works that could get hard to manage very fast. In order to make this more compatible with how the StringStream works we'd need effectively an immutable version of PHP file handlers. So I created something to do exactly that - an API to wrap the default PHP file handlers that provides Immutable objects with idempotence.
See here: https://github.com/mallardduck/immutable-read-file
Effectively the package is just a thin wrapper that uses a
FileHandlerManager
for opening, sharing, tracking and freeing open file handles.Necessary changes in parsica?
Due to file pointers tracking position based on current bytes I've had to extend
Position
to track bytes as well.This seems to not add any issues for
StringStream
and will help cover more use-cases like this TextFileStream. This newbytePosition
property should remain consistent with the existingPosition
logic as this value is tracked and incremented the same way thatline
andcolumn
are tracked.Internally nothing about parsica has changed in this context, however all methods of file handle manipulation speak in terms of
bytePosition
rather thanline
/column
. So really it's a new property to track and pass around - however it's only necessary when setting the files position like here:https://github.com/parsica-php/parsica/pull/23/files#diff-8c40f483a7f4a801d83fbe9ac70eb75e538c6216d8eef5789c730aa4145825a4R39
Notes about changes to tests
In order to reuse some test logic I've also split off a test file to be used with
require_once
that way Excel tests can be done on multiple streams.