Releases: pathwaycom/pathway
v0.16.3
Added
- pw.io.iceberg.write method for writing Pathway tables into Apache Iceberg.
Changed
- Values of non-deterministic UDFs are no longer stored in tables that are append_only.
- pw.Table.ix has a better runtime error message that includes the id of the missing row.
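As a rough mental model of the pw.Table.ix change, the pure-Python sketch below looks rows up by id in a dict standing in for a table and, when an id is missing, raises an error that names it. The helper ix_lookup, the dict-based table and the exact message wording are hypothetical illustrations, not Pathway's implementation or its actual error text.

```python
def ix_lookup(table: dict, ids: list) -> list:
    """Hypothetical stand-in for pw.Table.ix: fetch the row for each id.

    `table` maps row ids to row dicts. A missing id raises a KeyError
    whose message includes that id, so the failing row is easy to find.
    """
    rows = []
    for row_id in ids:
        if row_id not in table:
            raise KeyError(f"row with id {row_id!r} not found in the indexed table")
        rows.append(table[row_id])
    return rows


people = {"p1": {"name": "Alice"}, "p2": {"name": "Bob"}}
print(ix_lookup(people, ["p2", "p1"]))
```

Calling ix_lookup(people, ["p3"]) would raise a KeyError mentioning 'p3' instead of a generic lookup failure.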
Fixed
- Temporal behaviors in temporal operators (windowby, interval_join) now consume no CPU when no data passes through them.
v0.16.2
Added
- pw.xpacks.llm.prompts.RAGPromptTemplate, a set of prompt utilities that enable verifying templates and creating UDFs from prompt strings or callables.
- pw.xpacks.llm.question_answering.BaseContextProcessor streamlines the development and tuning of how retrieved context documents are represented to the LLM.
- pw.io.kafka.read now supports the with_metadata flag, which makes it possible to attach the metadata of the Kafka messages to the table entries.
- pw.io.deltalake.read can now stream tables with deletions, as long as no deletion vectors were used.
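The template verification mentioned above can be approximated with the standard library's string.Formatter, which parses a str.format-style template into its named fields. This is a minimal sketch assuming str.format-style placeholders; verify_rag_template is a hypothetical name, not part of pw.xpacks.llm.prompts.

```python
from string import Formatter

REQUIRED_FIELDS = {"context", "query"}


def verify_rag_template(template: str) -> set:
    """Hypothetical check: return the named fields of a str.format-style
    template, raising ValueError if 'context' or 'query' is missing."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so it is filtered out.
    fields = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = REQUIRED_FIELDS - fields
    if missing:
        raise ValueError(f"template is missing placeholders: {sorted(missing)}")
    return fields


template = "Answer using only this context:\n{context}\n\nQuestion: {query}"
print(sorted(verify_rag_template(template)))  # ['context', 'query']
print(template.format(context="Paris is in France.", query="Where is Paris?"))
```

A template that still uses the old 'docs' placeholder would be rejected at construction time rather than failing later during a request.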
Changed
- pw.io.sharepoint.read now explicitly terminates with an error if it fails to read the data the specified number of times per row (the default is 8).
- pw.xpacks.llm.prompts.prompt_qa and other prompts now expect 'context' and 'query' fields instead of 'docs'.
- Removed support for short_prompt_template and long_prompt_template in pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer. These prompt variants are no longer accepted during construction or in requests.
- pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer allows setting user-created prompts. Templates are verified to include 'context' and 'query' placeholders.
- pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer can take a BaseContextProcessor that represents context documents to the LLM. It defaults to pw.xpacks.llm.question_answering.SimpleContextProcessor, which filters metadata fields and joins the documents with new lines.
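To make the default context processing concrete, the sketch below filters each document's metadata down to an allowed set of fields and joins the resulting documents with new lines. The document shape (dicts with 'text' and 'metadata'), the allowed-field tuple and the rendering format are assumptions chosen for illustration; SimpleContextProcessor's actual output format may differ.

```python
import json


def simple_context(docs: list, keep_fields: tuple = ("path",)) -> str:
    """Hypothetical approximation of a simple context processor.

    Each doc is a dict with 'text' and 'metadata'. Only metadata keys listed
    in keep_fields are retained; documents are then joined with new lines.
    """
    rendered = []
    for doc in docs:
        meta = {k: v for k, v in doc.get("metadata", {}).items() if k in keep_fields}
        # Append the filtered metadata as JSON only when something survived.
        rendered.append(f"{doc['text']} {json.dumps(meta)}" if meta else doc["text"])
    return "\n".join(rendered)


docs = [
    {"text": "Pathway is a data processing framework.",
     "metadata": {"path": "intro.md", "modified_at": 1700000000}},
    {"text": "It supports streaming.", "metadata": {}},
]
print(simple_context(docs))
```

Keeping the processor as a separate, swappable component is what lets the context representation be tuned without touching the question-answering logic.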
Fixed
- The input of pw.io.fs.read and pw.io.s3.read is now correctly persisted when deletions or modifications of already processed objects take place.
v0.16.1
Changed
- pw.io.s3.read now monitors object deletions and modifications in the S3 source when run in streaming mode. When an object is deleted in S3, it is also removed from the engine. Similarly, if an object is modified in S3, the engine updates its state to reflect those changes.
- pw.io.s3.read now supports the with_metadata flag, which makes it possible to attach the metadata of the source object to the table entries.
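One way to picture how a streaming reader can detect deletions and modifications is to compare two listings of the bucket, each mapping object keys to a version marker such as an ETag. The sketch below does this in plain Python; it is an assumption-based model of the behavior described above, not the connector's code, and it does not call S3.

```python
def diff_listings(previous: dict, current: dict):
    """Compare two bucket listings (key -> ETag) and classify changes.

    Returns (added, deleted, modified) as sorted lists of keys. In a
    streaming engine, a deleted key would retract its rows, and a modified
    key would retract the old rows and insert the new ones.
    """
    added = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    modified = sorted(
        key for key in previous.keys() & current.keys()
        if previous[key] != current[key]
    )
    return added, deleted, modified


before = {"data/a.csv": "etag-1", "data/b.csv": "etag-2"}
after = {"data/a.csv": "etag-9", "data/c.csv": "etag-3"}
print(diff_listings(before, after))
# (['data/c.csv'], ['data/b.csv'], ['data/a.csv'])
```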
Fixed
- pw.xpacks.llm.document_store.DocumentStore no longer requires a _metadata column in the input table.
v0.16.0
Released 2024-11-29.
Added
- pw.xpacks.llm.document_store.SlidesDocumentStore, a subclass of pw.xpacks.llm.document_store.DocumentStore customized for retrieving slides from presentations.
- pw.temporal.inactivity_detection and pw.temporal.utc_now functions, enabling alerting and other time-dependent use cases.
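Inactivity detection raises an alert when no event has arrived for longer than a threshold. The batch sketch below computes such alerts from a list of event timestamps and a current time (playing the role of utc_now); the function name and its arguments are hypothetical, and pw.temporal.inactivity_detection works incrementally on streams rather than on a finished list.

```python
def inactivity_alerts(event_times: list, threshold: float, now: float) -> list:
    """Return the times at which an inactivity alert would fire.

    An alert fires `threshold` seconds after an event if no later event
    arrives before then; the final gap is measured against `now`.
    """
    alerts = []
    times = sorted(event_times)
    for current, following in zip(times, times[1:] + [now]):
        if following - current > threshold:
            alerts.append(current + threshold)
    return alerts


print(inactivity_alerts([0, 5, 30, 32], threshold=10, now=50))  # [15, 42]
```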
Changed
- pw.Table.concat, pw.Table.with_id and pw.Table.with_id_from no longer check whether ids are unique, which improves memory usage.
- Table operations that store values (like pw.Table.join and pw.Table.update_cells) no longer store columns that are not used downstream.
- The append_only column property is now propagated better (it can be inferred in more places).
- BREAKING: Unused arguments of the pw.xpacks.llm.question_answering.DeckRetriever constructor are no longer accepted.
Fixed
- query_as_of_now of pw.stdlib.indexing.DataIndex and pw.stdlib.indexing.HybridIndex now works in constant memory for an infinite query stream (no query-related data is kept after a query is answered).
v0.15.4
Added
- pw.io.kafka.read now supports reading entries starting from a specified timestamp.
- pw.io.nats.read and pw.io.nats.write methods for reading from NATS and writing Pathway tables to NATS.
Changed
- pw.Table.diff now supports setting the instance parameter, which allows computing differences for multiple groups.
- pw.io.postgres.write_snapshot now keeps the Postgres table fully in sync with the current state of the table in Pathway. This means that if an entry is deleted in Pathway, the same entry is also deleted from the Postgres table managed by the output connector.
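The instance parameter of pw.Table.diff computes differences separately per group. The sketch below reproduces that idea in plain Python: rows are ordered by timestamp within each instance, and each value is subtracted from the previous one in the same instance. The (instance, timestamp, value) row layout and the function name are assumptions, and giving the first row of each group a difference of None is a choice made for this sketch.

```python
from collections import defaultdict


def diff_per_instance(rows: list) -> list:
    """Compute value differences between consecutive rows of each instance.

    `rows` holds (instance, timestamp, value) tuples in any order. Rows are
    sorted by timestamp within their instance; the first row of an instance
    has no predecessor, so its difference is None.
    """
    groups = defaultdict(list)
    for instance, timestamp, value in rows:
        groups[instance].append((timestamp, value))
    result = []
    for instance, items in groups.items():
        items.sort()
        previous = None
        for timestamp, value in items:
            delta = None if previous is None else value - previous
            result.append((instance, timestamp, delta))
            previous = value
    return result


readings = [("s1", 1, 10), ("s2", 1, 100), ("s1", 2, 13), ("s2", 3, 90), ("s1", 4, 20)]
print(diff_per_instance(readings))
# [('s1', 1, None), ('s1', 2, 3), ('s1', 4, 7), ('s2', 1, None), ('s2', 3, -10)]
```

Without grouping, the readings of sensors s1 and s2 would be interleaved and subtracted from each other, which is exactly what a per-instance diff avoids.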
Fixed
- pw.PyObjectWrapper is now picklable.
v0.15.3
Added
- pw.io.mongodb.write connector for writing Pathway tables to MongoDB.
- pw.io.s3.read now supports downloading objects from an S3 bucket in parallel.
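Parallel downloading amounts to fetching several objects concurrently instead of one at a time. The sketch below uses concurrent.futures.ThreadPoolExecutor with a stand-in fetch_object function that simulates network latency; it does not contact S3, and the worker count of 8 is an arbitrary choice rather than a Pathway default.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_object(key: str) -> bytes:
    """Stand-in for an S3 GET: sleeps to simulate latency, returns fake bytes."""
    time.sleep(0.1)
    return f"contents of {key}".encode()


def download_all(keys: list, max_workers: int = 8) -> dict:
    """Download objects concurrently; executor.map preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return dict(zip(keys, executor.map(fetch_object, keys)))


keys = [f"bucket/part-{i}.csv" for i in range(8)]
start = time.perf_counter()
objects = download_all(keys)
elapsed = time.perf_counter() - start
print(len(objects), f"{elapsed:.2f}s")
```

With eight workers, the eight simulated 0.1-second fetches overlap, so the total should be close to 0.1 seconds rather than the roughly 0.8 seconds a sequential loop would take.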
Changed
- pw.io.fs.read performance has been improved for directories containing a large number of files.
v0.15.2
Added
- pw.io.deltalake.read now supports custom S3 Delta Lakes with HTTP endpoints.
- pw.io.deltalake.read now supports specifying both a custom endpoint and a custom region for Delta Lakes via pw.io.s3.AwsS3Settings.
Changed
- Indices in pathway.stdlib.indexing.nearest_neighbors can now also work on numpy arrays. Previously they only accepted list[float]. Working with numpy arrays improves memory efficiency.
- pw.io.s3.read has been optimized to minimize new object requests whenever possible.
- It is now possible to set the size limit of the cache in pw.udfs.DiskCache.
- State persistence now uses a single backend for both metadata and stream storage. The pw.persistence.Config.simple_config method is therefore deprecated. Instead, use the pw.persistence.Config constructor with the same parameters that were previously used in simple_config.
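A cache with a size limit has to evict entries once the limit is reached. The sketch below shows one common policy, least-recently-used (LRU) eviction with the limit counted in bytes, built on collections.OrderedDict. Whether pw.udfs.DiskCache uses LRU, and how it measures size, is not stated in these notes; BoundedCache is a hypothetical in-memory illustration, not the DiskCache API.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical in-memory LRU cache whose capacity is measured in bytes."""

    def __init__(self, size_limit: int):
        self.size_limit = size_limit
        self.current_size = 0
        self._data = OrderedDict()  # key -> bytes, least recently used first

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value: bytes) -> None:
        if key in self._data:
            self.current_size -= len(self._data.pop(key))
        self._data[key] = value
        self.current_size += len(value)
        # Evict least recently used entries until the total fits the limit;
        # a single value larger than the limit is therefore not retained.
        while self.current_size > self.size_limit and self._data:
            _, evicted = self._data.popitem(last=False)
            self.current_size -= len(evicted)


cache = BoundedCache(size_limit=10)
cache.put("a", b"12345")
cache.put("b", b"12345")
cache.get("a")            # "a" becomes the most recently used entry
cache.put("c", b"123")    # 13 bytes > 10, so "b" is evicted
print(cache.get("b"), cache.current_size)  # None 8
```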
Fixed
- The pw.io.bigquery.write connector now correctly handles pw.Json columns.
v0.15.1
Fixed
- pw.temporal.session and pw.temporal.asof_join now work correctly with multiple entries that have the same time.
- Fixed an issue in pw.stdlib.indexing where filters would cause runtime errors while using HybridIndexFactory.
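Session windows group events whose consecutive gaps do not exceed a maximum, and the session fix above concerns entries that share the same time. The sketch below groups sorted timestamps into sessions; because equal timestamps have a gap of zero, they always land in the same session. The max_gap rule follows the common definition of session windows and is an illustrative assumption, not a transcription of pw.temporal.session.

```python
def sessions(timestamps: list, max_gap: float) -> list:
    """Split timestamps into sessions: a new session starts whenever the gap
    to the previous timestamp exceeds max_gap. Duplicates stay together."""
    result = []
    for t in sorted(timestamps):
        if result and t - result[-1][-1] <= max_gap:
            result[-1].append(t)
        else:
            result.append([t])
    return result


print(sessions([1, 2, 2, 2, 10, 11, 30], max_gap=3))
# [[1, 2, 2, 2], [10, 11], [30]]
```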
v0.15.0
Added
- Experimental pw.xpacks.llm.document_store.DocumentStore to process and index documents.
- pw.xpacks.llm.servers.DocumentStoreServer, used to expose a REST server for retrieving documents from pw.xpacks.llm.document_store.DocumentStore.
- pw.stdlib.indexing.HybridIndex, used for querying multiple indices and combining their results.
- pw.io.airbyte.read now also supports streams that only operate in full_refresh mode.
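HybridIndex combines the results of several indices into one ranking. A widely used way to do this is reciprocal rank fusion (RRF), in which each document scores the sum of 1 / (k + rank) over every result list it appears in. The sketch below assumes RRF with the conventional k = 60; treat both the technique and the constant as assumptions rather than a description of HybridIndex internals.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of document ids into one list.

    Each document gets sum(1 / (k + rank)) over the lists containing it,
    with ranks starting at 1; higher scores come first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_c", "doc_a", "doc_d"]
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Because RRF uses only ranks, it can merge indices whose raw scores live on incompatible scales, such as vector similarity and keyword relevance.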
Changed
- Running servers for answering queries is extracted from pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer into pw.xpacks.llm.servers.QARestServer and pw.xpacks.llm.servers.QASummaryRestServer.
- BREAKING: query and query_as_of_now of pathway.stdlib.indexing.data_index.DataIndex now produce an empty list instead of None if no match is found.
v0.14.3
Fixed
- pw.io.deltalake.read and pw.io.deltalake.write now work correctly with lakes hosted in S3 over min.io, Wasabi and Digital Ocean.
Added
- The Pathway CLI command spawn can now execute code directly from a specified GitHub repository.
- A new CLI command, spawn-from-env, has been added. This command runs the Pathway CLI spawn command using arguments provided in the PATHWAY_SPAWN_ARGS environment variable.
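spawn-from-env reads its arguments from the PATHWAY_SPAWN_ARGS environment variable instead of the command line. The sketch below shows how a launcher of this kind could turn such a variable into an argument list with shlex, so that quoted values survive intact. The example value and the function name are hypothetical; the flags accepted by spawn should be checked in the Pathway CLI help.

```python
import os
import shlex


def spawn_args_from_env(var: str = "PATHWAY_SPAWN_ARGS") -> list:
    """Split a shell-style argument string from the environment into a list.

    Raises KeyError if the variable is not set; returns [] if it is empty.
    """
    return shlex.split(os.environ[var])


# Hypothetical value: the flags shown are for illustration only.
os.environ["PATHWAY_SPAWN_ARGS"] = "--threads 2 python main.py --config 'my config.yaml'"
print(spawn_args_from_env())
# ['--threads', '2', 'python', 'main.py', '--config', 'my config.yaml']
```

Passing arguments through an environment variable is convenient in container platforms where the entrypoint command is fixed but environment variables are easy to configure.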