#27 Fix obj serialization for saving #29

Merged: 9 commits, Oct 29, 2024
1 change: 1 addition & 0 deletions README.md
@@ -70,6 +70,7 @@ Extras available:
- `hf` to install Hugging Face Transformers dependencies
- `vllm` to install vLLM dependencies
- `llamafile` to install Llamafile dependencies
- `baseten` to install Baseten dependencies

## Quick Start

2 changes: 1 addition & 1 deletion flow_judge/flow_judge.py
@@ -166,7 +166,7 @@ async def async_batch_evaluate(
f"Number of Baseten API errors: {len(batch_result.errors)}"
f" of {batch_result.total_requests}."
f" Success rate is {batch_result.success_rate}"
"List of errors: "
" List of errors: "
)
for error in batch_result.errors:
logger.warning(f"{error.error_type}: {error.error_message}")
43 changes: 0 additions & 43 deletions flow_judge/models/adapters/baseten/todos.md

This file was deleted.

245 changes: 208 additions & 37 deletions flow_judge/utils/result_writer.py
@@ -1,8 +1,12 @@
import json
import logging
import os
import re
from datetime import datetime, timezone
from enum import Enum
from pathlib import Path
from typing import Any

from pydantic import BaseModel

import flow_judge
from flow_judge.eval_data_types import EvalInput, EvalOutput
@@ -13,55 +17,222 @@
def write_results_to_disk(
eval_inputs: list[EvalInput],
eval_outputs: list[EvalOutput],
model_metadata: dict,
model_metadata: dict[str, Any],
metric_name: str,
output_dir: str,
):
output_dir: str | Path,
) -> None:
"""Write evaluation results, inputs, and metadata to separate JSONL files.

Warning:
The `eval_inputs` and `eval_outputs` lists must have the same length.
If they don't, a ValueError will be raised during the writing process.
This function processes evaluation data and writes it to disk in a structured format.
It creates separate files for metadata and results, organizing them in directories
based on the metric name and model ID.

Args:
eval_inputs: List of evaluation inputs.
eval_outputs: List of evaluation outputs.
model_metadata: Dictionary containing model metadata.
metric_name: Name of the metric being evaluated.
output_dir: Directory to write output files.

Raises:
ValueError: If inputs are invalid, empty, or lists have different lengths.
KeyError: If required keys are missing from model_metadata.
OSError: If there are file system related errors during writing.

Note:
- Ensures eval_inputs and eval_outputs have the same length.
- Creates necessary directories if they don't exist.
- Handles special characters in metric_name and model_id for file naming.
- Overwrites existing files with the same name without warning.
"""
fmt_metric_name = re.sub(r"\s", "_", re.sub(r"\(|\)", "", metric_name.lower()))
fmt_model_id = model_metadata["model_id"].replace("/", "__")
_validate_inputs(eval_inputs, eval_outputs, model_metadata, metric_name)

fmt_metric_name = _format_name(metric_name)
fmt_model_id = _format_name(model_metadata["model_id"])

timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%S.%f")[:-3]

base_filename = f"{fmt_metric_name}_{fmt_model_id}_{model_metadata['model_type']}_{timestamp}"
paths = _prepare_file_paths(output_dir, fmt_metric_name, fmt_model_id, base_filename)

metadata = _prepare_metadata(model_metadata, timestamp)

try:
_write_json_file(paths["metadata"], metadata)
_write_results_file(paths["results"], eval_inputs, eval_outputs)
except OSError as e:
logger.error(f"Error writing files: {e}")
raise

logger.info(f"Results saved to {paths['results']}")


def _validate_inputs(
eval_inputs: list[EvalInput],
eval_outputs: list[EvalOutput],
model_metadata: dict[str, Any],
metric_name: str,
) -> None:
"""Validate input parameters for the write_results_to_disk function.

Args:
eval_inputs: List of evaluation inputs.
eval_outputs: List of evaluation outputs.
model_metadata: Dictionary containing model metadata.
metric_name: Name of the metric being evaluated.

Raises:
ValueError: If eval_inputs or eval_outputs are empty, have different lengths,
or if metric_name is empty or only whitespace.
KeyError: If required keys ('model_id', 'model_type') are missing from
model_metadata.

Note:
This function does not validate the content of eval_inputs or eval_outputs,
only their presence and length.
"""
if not eval_inputs or not eval_outputs:
raise ValueError("eval_inputs and eval_outputs cannot be empty")
if len(eval_inputs) != len(eval_outputs):
raise ValueError("eval_inputs and eval_outputs must have the same length")
if not metric_name or not metric_name.strip():
raise ValueError("metric_name cannot be empty or only whitespace")
required_keys = {"model_id", "model_type"}
missing_keys = required_keys - set(model_metadata.keys())
if missing_keys:
raise KeyError(f"model_metadata missing required keys: {missing_keys}")


def _format_name(name: str) -> str:
"""Format a name for use in file paths by removing special characters.

Args:
name: The name to format.

Returns:
A formatted string safe for use in file paths.

Note:
This function replaces spaces with underscores, removes non-alphanumeric
characters (except underscore and hyphen), and replaces non-ASCII
characters with underscores.
"""
# Replace spaces with underscores
name = name.replace(" ", "_")
# Remove any character that is not alphanumeric, underscore, or hyphen
name = re.sub(r"[^\w\-]", "", name)
# Replace any non-ASCII character with underscore
name = re.sub(r"[^\x00-\x7F]", "_", name)
return name


def _prepare_file_paths(
output_dir: str | Path,
fmt_metric_name: str,
fmt_model_id: str,
base_filename: str,
) -> dict[str, Path]:
"""Prepare file paths for metadata and results files.

Args:
output_dir: Base output directory.
fmt_metric_name: Formatted metric name.
fmt_model_id: Formatted model ID.
base_filename: Base filename for output files.

Returns:
A dictionary containing paths for metadata and results files.

Note:
This function creates the necessary directories if they don't exist.
It does not check if the resulting file paths already exist.
"""
output_dir = Path(output_dir)
metric_folder = output_dir / fmt_metric_name
metadata_folder = metric_folder / f"metadata_{fmt_metric_name}_{fmt_model_id}"
metadata_folder.mkdir(parents=True, exist_ok=True)

return {
"metadata": metadata_folder / f"metadata_{base_filename}.json",
"results": metric_folder / f"results_{base_filename}.jsonl",
}


def _prepare_metadata(model_metadata: dict[str, Any], timestamp: str) -> dict[str, Any]:
"""Prepare metadata dictionary for writing.

Args:
model_metadata: Dictionary containing model metadata.
timestamp: Timestamp string.

Returns:
A dictionary containing prepared metadata.

Note:
- Adds 'library_version' and 'timestamp' to the metadata.
- Converts Pydantic BaseModel instances to dictionaries.
- Converts Enum instances to their values.
- Does not deep copy the input model_metadata.
"""
metadata = {
"library_version": f"{flow_judge.__version__}",
"timestamp": timestamp,
**model_metadata,
}
for key, item in metadata.items():
if isinstance(item, BaseModel):
metadata[key] = item.model_dump()
elif isinstance(item, Enum):
metadata[key] = item.value
return metadata

metric_folder = os.path.join(output_dir, fmt_metric_name)
metadata_folder = os.path.join(metric_folder, f"metadata_{fmt_metric_name}_{fmt_model_id}")

# Create all necessary directories
os.makedirs(metadata_folder, exist_ok=True)

base_filename = f"{fmt_metric_name}_{fmt_model_id}_{model_metadata['model_type']}_{timestamp}"
metadata_path = os.path.join(metadata_folder, f"metadata_{base_filename}.json")
results_path = os.path.join(metric_folder, f"results_{base_filename}.jsonl")

# Write metadata file
try:
with open(metadata_path, "w", encoding="utf-8") as f:
f.write(json.dumps(metadata) + "\n")
except OSError as e:
logger.error(f"Error writing metadata file: {e}")
raise

# Write results file
try:
with open(results_path, "w", encoding="utf-8") as f:
for input_data, eval_output in zip(eval_inputs, eval_outputs, strict=True):
result = {
"sample": input_data.model_dump(),
"feedback": eval_output.feedback,
"score": eval_output.score,
}
f.write(json.dumps(result) + "\n")
except OSError as e:
logger.error(f"Error writing results file: {e}")
raise

logger.info(f"Results saved to {results_path}")


def _write_json_file(path: Path, data: dict[str, Any]) -> None:
"""Write data to a JSON file.

Args:
path: Path to the output file.
data: Data to write to the file.

Raises:
OSError: If there's an error writing to the file.

Note:
- Uses UTF-8 encoding.
- Overwrites the file if it already exists.
- Ensures non-ASCII characters are preserved in the output.
"""
with path.open("w", encoding="utf-8") as f:
json.dump(data, f, ensure_ascii=False, indent=2)


def _write_results_file(
path: Path, eval_inputs: list[EvalInput], eval_outputs: list[EvalOutput]
) -> None:
"""Write results to a JSONL file.

Args:
path: Path to the output file.
eval_inputs: List of evaluation inputs.
eval_outputs: List of evaluation outputs.

Raises:
OSError: If there's an error writing to the file.
ValueError: If eval_inputs and eval_outputs have different lengths.

Note:
- Uses UTF-8 encoding.
- Overwrites the file if it already exists.
- Each line in the file is a JSON object representing one result.
- Ensures non-ASCII characters are preserved in the output.
"""
if len(eval_inputs) != len(eval_outputs):
raise ValueError("eval_inputs and eval_outputs must have the same length")

Collaborator (author) commented:

This will likely raise an error if there have been downstream errors with outputs. Eval outputs can be <= eval_inputs.

Collaborator commented:

Thanks Minaam, it's the zip function below that needs them to be the same length. This check is just there to throw the right error type.

I'll see what I can do.
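
For context, a minimal standalone sketch of the behaviour under discussion (generic names, not code from this PR): zip(..., strict=True) raises ValueError on a length mismatch, so the explicit length check above mainly controls which exception type and message the caller sees. If shorter eval_outputs ever need to be tolerated, pairing only the completed items with a non-strict zip is one possible relaxation.

# Standalone sketch; the variable names are illustrative, not from flow_judge.
eval_inputs = ["input-1", "input-2", "input-3"]
eval_outputs = ["output-1", "output-2"]  # e.g. one downstream evaluation failed

try:
    list(zip(eval_inputs, eval_outputs, strict=True))
except ValueError as exc:
    # strict=True (Python 3.10+) raises as soon as one iterable runs out early
    print(f"strict zip failed: {exc}")

# Possible relaxation: pair only the inputs that actually have outputs.
pairs = list(zip(eval_inputs, eval_outputs))  # non-strict zip stops at the shorter list
print(pairs)  # [('input-1', 'output-1'), ('input-2', 'output-2')]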


with path.open("w", encoding="utf-8") as f:
for input_data, eval_output in zip(eval_inputs, eval_outputs, strict=True):
result = {
"sample": input_data.model_dump(),
"feedback": eval_output.feedback,
"score": eval_output.score,
}
f.write(json.dumps(result, ensure_ascii=False) + "\n")
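
As background on the serialization fix the PR title refers to: pydantic model instances and Enum members are not JSON-serializable by default, which is why the new _prepare_metadata converts them (model_dump() and .value) before the metadata is written. A minimal standalone sketch of that issue, using hypothetical stand-in classes rather than anything from the flow_judge codebase:

import json
from enum import Enum

from pydantic import BaseModel


class GenerationParams(BaseModel):  # hypothetical stand-in for a model config object
    temperature: float = 0.1
    max_tokens: int = 512


class ModelKind(Enum):  # hypothetical stand-in for an enum kept in model metadata
    TRANSFORMERS = "transformers"


metadata = {
    "model_id": "org/judge-model",
    "model_type": ModelKind.TRANSFORMERS,
    "generation_params": GenerationParams(),
}

# json.dumps(metadata) would raise TypeError at this point: neither the BaseModel
# instance nor the Enum member is JSON-serializable as-is.
for key, item in metadata.items():
    if isinstance(item, BaseModel):
        metadata[key] = item.model_dump()  # pydantic v2 API, as used in result_writer.py
    elif isinstance(item, Enum):
        metadata[key] = item.value

print(json.dumps(metadata, ensure_ascii=False))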