Make DataFrame & Pipelines truly immutable #846

Open
norberttech opened this issue Nov 28, 2023 · 0 comments
Labels
core, developer experience (Resolving this issue should improve development experience for the library users.)

Comments

@norberttech
Member

Currently, the run method clones the DataFrame internally. The idea was to always duplicate the DataFrame so that it stays reusable. However, PHP's clone is shallow, and when I tried to use a deep copy library, readonly properties did not allow it.
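To illustrate the shallow clone problem, here is a minimal sketch. Steps and Frame are hypothetical stand-ins, not Flow classes; the point is only that clone copies the outer object while nested objects stay shared.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-ins for illustration; these are not Flow classes.
final class Steps
{
    /** @var list<string> */
    public array $names = [];
}

final class Frame
{
    public function __construct(public readonly Steps $steps)
    {
    }
}

$original = new Frame(new Steps());
$copy = clone $original;

// The clone is a distinct Frame object...
var_dump($copy !== $original);               // bool(true)
// ...but both frames still share the very same Steps instance.
var_dump($copy->steps === $original->steps); // bool(true)

// Adding a step through the copy is therefore visible on the original too.
$copy->steps->names[] = 'filter';
var_dump($original->steps->names);
```

The readonly modifier only blocks reassigning the property itself; it does not stop the shared nested object from being mutated, which is why the clone alone does not make the frame reusable.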
The solution to this problem is to make DataFrame and Pipelines truly immutable, meaning that adding any new element should create an entirely new DataFrame.
One thing to remember is that extractors should always be recreated as well; otherwise all DataFrame instances will keep a reference to the same extractor, and only one instance will be able to use it.
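A rough sketch of what this could look like, under stated assumptions: ImmutableFrame and ArrayExtractor are hypothetical names, not Flow's actual classes, and the extractor factory closure is just one possible way to recreate the extractor for every run.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-ins for illustration; none of these are Flow classes.
final class ArrayExtractor
{
    /** @param list<array<string, mixed>> $rows */
    public function __construct(private readonly array $rows)
    {
    }

    public function extract(): \Generator
    {
        yield from $this->rows;
    }
}

final class ImmutableFrame
{
    /**
     * @param \Closure $extractorFactory builds a fresh extractor on every call
     * @param list<\Closure> $filters row predicates applied on fetch
     */
    public function __construct(
        private readonly \Closure $extractorFactory,
        private readonly array $filters = [],
    ) {
    }

    public function filter(\Closure $predicate): self
    {
        // Return a new frame; $this keeps its original filter list.
        return new self($this->extractorFactory, [...$this->filters, $predicate]);
    }

    /** @return list<array<string, mixed>> */
    public function fetch(): array
    {
        $rows = [];
        // A fresh extractor per run, so frames never share a consumed one.
        foreach (($this->extractorFactory)()->extract() as $row) {
            foreach ($this->filters as $filter) {
                if (!$filter($row)) {
                    continue 2; // skip rows rejected by any filter
                }
            }
            $rows[] = $row;
        }

        return $rows;
    }
}

$data = [
    ['user' => 'user_01', 'active' => false],
    ['user' => 'user_02', 'active' => true],
];

$all = new ImmutableFrame(static fn (): ArrayExtractor => new ArrayExtractor($data));
$active = $all->filter(static fn (array $row): bool => $row['active'] === true);

var_dump(count($active->fetch())); // int(1)
var_dump(count($all->fetch()));    // int(2), the original frame is unaffected
```

Because every method that adds an element returns a new object and the extractor is built on demand, both frames can be fetched in any order and any number of times.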

This test should confirm that the DataFrame is truly immutable:

<?php

        $df = (new Flow())
            ->read(
                From::array([
                    ['date' => '2023-01-01', 'user' => 'user_01', 'commits' => 1, 'active' => false],
                    ['date' => '2023-01-01', 'user' => 'user_02', 'commits' => 2, 'active' => true],
                    ['date' => '2023-01-01', 'user' => 'user_03', 'commits' => 3, 'active' => true],
                    ['date' => '2023-01-02', 'user' => 'user_01', 'commits' => 4, 'active' => true],
                    ['date' => '2023-01-02', 'user' => 'user_02', 'commits' => 5, 'active' => true],
                    ['date' => '2023-01-02', 'user' => 'user_03', 'commits' => 6, 'active' => true],
                ])
            )
            ->filter(ref('active')->isTrue());


        $this->assertSame(
            [
                ['date' => '2023-01-01', 'commits_sum' => 5],
                ['date' => '2023-01-02', 'commits_sum' => 15],
            ],
            $df->groupBy(ref("date"))
                ->aggregate(sum(ref('commits')))
                ->fetch()
                ->toArray()
        );

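        // With an immutable DataFrame, the groupBy/aggregate call above must not
        // leak into $df, so this fetch should return all five active rows.
        $this->assertSame(
            [
                ['date' => '2023-01-01', 'user' => 'user_02', 'commits' => 2, 'active' => true],
                ['date' => '2023-01-01', 'user' => 'user_03', 'commits' => 3, 'active' => true],
                ['date' => '2023-01-02', 'user' => 'user_01', 'commits' => 4, 'active' => true],
                ['date' => '2023-01-02', 'user' => 'user_02', 'commits' => 5, 'active' => true],
                ['date' => '2023-01-02', 'user' => 'user_03', 'commits' => 6, 'active' => true],
            ],
            $df->fetch()->toArray()
        );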
norberttech converted this from a draft issue Nov 28, 2023
norberttech added the core and developer experience labels Nov 28, 2023
norberttech added this to the 0.6.0 milestone Nov 28, 2023
norberttech modified the milestones: 0.6.0, 0.7.0 Jan 27, 2024
norberttech removed this from the 0.7.0 milestone Feb 13, 2024
Projects
Status: Todo