Added support for partitioning in parquet loader #712

Merged


norberttech

Change Log

Added

  • Added support for partitioning in parquet loader

Fixed

Changed

Removed

Deprecated

Security


Description

Ref: #575


github-actions bot commented Nov 2, 2023

Flow PHP - Benchmarks

Benchmark results from this PR are compared with the results from the 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+------------------+-------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak         | mode              | rstdev          |
+-----------------------+-------------------+------+-----+------------------+-------------------+-----------------+
| AvroExtractorBench    | bench_extract_10k | 1    | 3   | 44.091mb +0.00%  | 352.848ms -19.51% | ±0.90% +445.92% |
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 13.976mb +0.00%  | 262.707ms -25.54% | ±0.10% -83.81%  |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 18.638mb +0.00%  | 548.201ms -24.81% | ±1.07% +143.93% |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 242.972mb +0.00% | 749.120ms -29.69% | ±0.42% -31.39%  |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 7.246mb +0.01%   | 14.508ms -32.05%  | ±0.98% -41.96%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 7.593mb +0.00%   | 438.189ms -33.38% | ±1.05% +132.20% |
+-----------------------+-------------------+------+-----+------------------+-------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+-----------------+------------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak        | mode             | rstdev         |
+-----------------------------+--------------------------+------+-----+-----------------+------------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 87.032mb +0.00% | 49.304ms -33.84% | ±2.58% +66.26% |
+-----------------------------+--------------------------+------+-----+-----------------+------------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+-------------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode              | rstdev          |
+--------------------+----------------+------+-----+------------------+-------------------+-----------------+
| AvroLoaderBench    | bench_load_10k | 1    | 3   | 93.195mb +0.00%  | 576.542ms -20.46% | ±0.55% +19.31%  |
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 47.109mb +0.00%  | 67.920ms -4.50%   | ±1.08% +20.67%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 88.543mb +0.00%  | 56.277ms -30.21%  | ±1.62% +812.81% |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 286.871mb +0.00% | 1.179s -27.58%    | ±1.10% +182.79% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 16.531mb +0.00%  | 40.924ms +9.50%   | ±0.28% -39.92%  |
+--------------------+----------------+------+-----+------------------+-------------------+-----------------+
Building Blocks
+-------------------------+----------------------------+------+-----+-----------------+-------------------+-----------------+
| benchmark               | subject                    | revs | its | mem_peak        | mode              | rstdev          |
+-------------------------+----------------------------+------+-----+-----------------+-------------------+-----------------+
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 60.656mb +0.00% | 2.247ms -70.69%   | ±1.01% +51.22%  |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 80.448mb +0.00% | 150.063ms -16.25% | ±0.28% +67.57%  |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 58.974mb +0.00% | 14.961ms -17.84%  | ±0.47% -36.39%  |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 59.795mb +0.00% | 1.764ms -64.80%   | ±1.76% -13.82%  |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 59.795mb +0.00% | 1.753ms -66.29%   | ±0.75% -65.69%  |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 59.008mb +0.00% | 2.653ms -58.18%   | ±0.23% +97.80%  |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 59.537mb +0.00% | 13.952ms -47.63%  | ±1.17% +12.01%  |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 59.536mb +0.00% | 14.151ms -46.01%  | ±1.14% -16.44%  |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 57.608mb +0.00% | 1.606μs -46.36%   | ±2.89% +81.63%  |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 57.608mb +0.00% | 0.400μs -20.00%   | ±0.00% +0.00%   |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 65.841mb +0.00% | 10.417ms -34.35%  | ±1.30% +2.41%   |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 91.361mb +0.00% | 47.741ms -28.86%  | ±0.98% +68.97%  |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 60.057mb +0.00% | 1.917ms -64.26%   | ±1.35% -35.60%  |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 62.327mb +0.00% | 32.621ms -40.54%  | ±0.78% +13.27%  |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 62.158mb +0.00% | 4.747ms -59.11%   | ±0.44% -84.70%  |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 57.608mb +0.00% | 37.365ms -29.73%  | ±1.42% +278.80% |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 57.608mb +0.00% | 37.490ms -29.49%  | ±0.30% -26.20%  |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 57.608mb +0.00% | 38.009ms -28.99%  | ±0.76% -38.62%  |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 59.882mb +0.00% | 7.214ms -25.83%   | ±0.65% -25.81%  |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 57.607mb +0.00% | 28.265ms -28.56%  | ±0.29% +17.44%  |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 57.608mb +0.00% | 12.979μs -44.20%  | ±0.96% -69.43%  |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 57.608mb +0.00% | 15.418μs -45.64%  | ±0.91% -44.58%  |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 80.449mb +0.00% | 157.246ms -13.99% | ±0.70% +33.20%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 91.740mb -0.01% | 123.687ms -18.93% | ±0.28% -55.05%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 47.596mb +0.01% | 60.817ms -22.10%  | ±2.86% +525.55% |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 12.388mb +0.00% | 13.963ms -27.40%  | ±1.75% +152.66% |
+-------------------------+----------------------------+------+-----+-----------------+-------------------+-----------------+

@@ -52,9 +56,14 @@ public function __unserialize(array $data) : void

public function closure(Rows $rows, FlowContext $context) : void
{
$this->writer($context)->close();
if (\count($this->writers)) {
Member

No need to count, foreach handles that already.
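
To illustrate the point, here is a minimal sketch of why the guard is redundant. `FakeWriter` and `closeAll()` are hypothetical stand-ins invented for this example, not the actual Flow PHP loader code; the only thing it relies on is that `foreach` over an empty array runs its body zero times.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a per-partition writer; not the real Flow PHP class.
final class FakeWriter
{
    public bool $closed = false;

    public function close() : void
    {
        $this->closed = true;
    }
}

/**
 * Closes every writer without a count() guard and returns how many were closed.
 *
 * @param array<string, FakeWriter> $writers
 */
function closeAll(array $writers) : int
{
    $closed = 0;

    // For an empty array the loop body never runs, so no count() check is needed.
    foreach ($writers as $writer) {
        $writer->close();
        $closed++;
    }

    return $closed;
}
```

Calling `closeAll([])` simply returns `0`, and calling it with two writers closes both, so wrapping the loop in `if (\count(...))` would not change the behavior as long as the property is always an array.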

Member Author

Personally, I don't think this matters, but I'm fine with making it a rule that we don't call count() before a foreach over non-nullable iterable variables. It should be automated, though, so we don't waste review time on this kind of detail.
Maybe Psalm or PHPStan could help here by reporting an error?

@norberttech norberttech merged commit 69b0c81 into flow-php:1.x Nov 2, 2023
14 checks passed