Docs for v0.4.37 (#1297)
elenasamuylova authored Sep 11, 2024
1 parent cdd39cf commit dc562bc
Showing 13 changed files with 259 additions and 94 deletions.
5 changes: 3 additions & 2 deletions docs/book/SUMMARY.md
@@ -80,11 +80,12 @@
* [Feature importance in data drift](customization/feature-importance.md)
* [Text evals with LLM-as-judge](customization/llm_as_a_judge.md)
* [Text evals with HuggingFace](customization/huggingface_descriptor.md)
* [Add a custom text descriptor](customization/add-custom-descriptor.md)
* [Add a custom drift method](customization/add-custom-drift-method.md)
* [Add a custom Metric or Test](customization/add-custom-metric-or-test.md)
* [Customize JSON output](customization/json-dict-output.md)
* [Show raw data in Reports](customization/report-data-aggregation.md)
* [Add text comments to Reports](customization/text-comments.md)
* [Add a custom drift method](customization/add-custom-drift-method.md)
* [Add a custom Metric or Test](customization/add-custom-metric-or-test.md)
* [Change color schema](customization/options-for-color-schema.md)
* [How-to guides](how-to-guides/README.md)

110 changes: 110 additions & 0 deletions docs/book/customization/add-custom-descriptor.md
@@ -0,0 +1,110 @@
---
description: How to add custom text descriptors.
---

You can implement custom row-level evaluations for text data and then use them just like any other descriptor across Metrics and Tests. A custom descriptor can work over a single column or a pair of columns.

Note that if you want to use LLM-based evaluations, you can write custom prompts using [LLM judge templates](llm_as_a_judge.md).

# Code example

Refer to a How-to example:

{% embed url="https://github.com/evidentlyai/evidently/blob/main/examples/how_to_questions/how_to_use_llm_judge_template.ipynb" %}

# Custom descriptors

Imports:

```python
import pandas as pd

from evidently.descriptors import CustomColumnEval, CustomPairColumnEval
```

## Single column descriptor

You can create a custom descriptor that will take a single column from your dataset and run a certain evaluation for each row.

**Implement your evaluation as a Python function**. It will take a pandas Series as input and return a transformed Series.

Here, the `is_empty_string_callable` function takes a column of strings and returns an "EMPTY" or "NON EMPTY" outcome for each.

```python
def is_empty_string_callable(val1):
    return pd.Series(["EMPTY" if val == "" else "NON EMPTY" for val in val1], index=val1.index)
```

**Create a custom descriptor**. Create an instance of the `CustomColumnEval` class to wrap the evaluation logic into an object that you can later use to process specific dataset input.

```python
empty_string = CustomColumnEval(
    func=is_empty_string_callable,
    feature_type="cat",
    display_name="Empty response"
)
```

Where:
* `func: Callable[[pd.Series], pd.Series]` is a function that returns a transformed pandas Series.
* `display_name: str` is the new descriptor's name that will appear in Reports and Test Suites.
* `feature_type` is the type of descriptor that the function returns (`cat` for categorical, `num` for numerical).

**Apply the new descriptor**. To create a Report with a new Descriptor, pass it as a `column_name` to the `ColumnSummaryMetric`. This will compute the new descriptor for all rows in the specified column and summarize its distribution:

```python
report = Report(metrics=[
    ColumnSummaryMetric(column_name=empty_string.on("response")),
])
```

Run the Report on your `df` dataframe as usual:

```python
report.run(reference_data=None,
           current_data=df)
```
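
For reference, here is how the pieces above fit together end to end. This is a minimal sketch: it assumes the usual `Report` and `ColumnSummaryMetric` imports from Evidently 0.4.x, and the toy `df` with a `response` column is made up for illustration.

```python
import pandas as pd

from evidently.descriptors import CustomColumnEval
from evidently.metrics import ColumnSummaryMetric
from evidently.report import Report

def is_empty_string_callable(val1):
    # Label each value as "EMPTY" or "NON EMPTY", keeping the original index.
    return pd.Series(["EMPTY" if val == "" else "NON EMPTY" for val in val1], index=val1.index)

empty_string = CustomColumnEval(
    func=is_empty_string_callable,
    feature_type="cat",
    display_name="Empty response",
)

# Toy data: a single text column named "response" (hypothetical example).
df = pd.DataFrame({"response": ["Hello!", "", "Thanks for reaching out.", ""]})

report = Report(metrics=[
    ColumnSummaryMetric(column_name=empty_string.on("response")),
])
report.run(reference_data=None, current_data=df)
report.save_html("empty_response_report.html")  # or report.show() in a notebook
```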

## Double column descriptor

You can create a custom descriptor that will take two columns from your dataset and run a certain evaluation for each row (for example, for pairwise evaluators).

**Implement your evaluation as a Python function**. Here, the `exact_match_callable` function takes two columns and checks whether each pair of values is the same, returning "MATCH" if they are equal and "MISMATCH" if they are not.

```python
def exact_match_callable(val1, val2):
    return pd.Series(["MATCH" if val else "MISMATCH" for val in val1 == val2])
```

**Create a custom descriptor**. Create an instance of the `CustomPairColumnEval` class to wrap the evaluation logic into an object that you can later use to process two named columns in a dataset.

```python
exact_match = CustomPairColumnEval(
    func=exact_match_callable,
    first_column="response",
    second_column="question",
    feature_type="cat",
    display_name="Exact match between response and question"
)
```

Where:

* `func: Callable[[pd.Series, pd.Series], pd.Series]` is a function that returns a transformed pandas Series after evaluating two columns.
* `first_column: str` is the name of the first column to be passed into the function.
* `second_column: str` is the name of the second column to be passed into the function.
* `display_name: str` is the new descriptor's name that will appear in Reports and Test Suites.
* `feature_type` is the type of descriptor that the function returns (`cat` for categorical, `num` for numerical).

**Apply the new descriptor**. To create a Report with a new Descriptor, pass it as a `column_name` to the `ColumnSummaryMetric`. This will compute the new descriptor for all rows in the dataset and summarize its distribution:

```python
report = Report(metrics=[
    ColumnSummaryMetric(column_name=exact_match.as_column())
])
```

Run the Report on your `df` dataframe as usual:

```python
report.run(reference_data=None,
           current_data=df)
```
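
As a usage sketch, the pair descriptor expects both named columns to be present in the DataFrame you pass to `report.run()`. The toy data below is made up for illustration:

```python
import pandas as pd

# Hypothetical example data with the two columns referenced above.
df = pd.DataFrame({
    "question": ["What is your name?", "How do I reset my password?"],
    "response": ["What is your name?", "Follow the reset link in the email."],
})

report.run(reference_data=None, current_data=df)
report.save_html("exact_match_report.html")  # first row is a MATCH, second a MISMATCH
```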
13 changes: 10 additions & 3 deletions docs/book/customization/add-custom-metric-or-test.md
@@ -1,4 +1,13 @@
There are two ways to add a custom Metric to Evidently.
There are two ways to add a custom Metric or Test to Evidently:
* Add it as a Python function (Recommended).
* Implement a custom Metric with a custom Plotly render.

Implementing a new Metric or Test means creating a completely custom column- or dataset-level evaluation.

There are other ways to customize your evaluations that do not require creating Metrics or Tests from scratch:
* Add a custom descriptor for row-level evaluations. Read more on [adding custom text descriptors](add-custom-descriptor.md).
* Write a custom LLM-based evaluator using templates. Read more on [designing LLM judges](llm_as_a_judge.md).
* Add a custom data drift detection method, re-using the existing Data Drift metric render. Read more on the [drift method customization](add-custom-drift-method.md) option.

# 1. Add a new Metric or Test as a Python function (Recommended).

@@ -9,8 +18,6 @@ This is a recommended path to add custom Metrics. Using this method, you can sen
Example notebook:
{% embed url="https://github.com/evidentlyai/evidently/blob/main/examples/how_to_questions/how_to_build_metric_over_python_function.ipynb" %}
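
For orientation, a rough sketch of this path is below. It is modeled on the linked notebook: the `CustomValueMetric` class, its import path, and the `func`/`title` parameters are assumptions taken from that example and may differ in your Evidently version, so treat the snippet as illustrative rather than definitive.

```python
from evidently.base_metric import InputData
from evidently.metrics.custom_metric import CustomValueMetric  # assumed import path, see the notebook
from evidently.report import Report

# A dataset-level evaluation written as a plain Python function.
def share_of_empty_responses(data: InputData) -> float:
    return float((data.current_data["response"] == "").mean())

report = Report(metrics=[
    CustomValueMetric(func=share_of_empty_responses, title="Share of empty responses"),
])
```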

**Note**: if you want to add a custom data drift method, there is a separate [drift method customization](add-custom-drift-method.md) option. In this case, you will re-use the existing render.

# 2. Implement a new Metric and Test from scratch.

You can also implement a new Metric or Test from scratch, defining both the calculation method and the optional visualization.
12 changes: 11 additions & 1 deletion docs/book/customization/llm_as_a_judge.md
@@ -31,7 +31,7 @@ You can use built-in evaluators that include pre-written prompts for specific cr
**Imports**. Import the `LLMEval` and built-in evaluators you want to use:

```python
from evidently.descriptors import LLMEval, NegativityLLMEval, PIILLMEval, DeclineLLMEval
from evidently.descriptors import LLMEval, NegativityLLMEval, PIILLMEval, DeclineLLMEval, BiasLLMEval, ToxicityLLMEval, ContextQualityLLMEval
```

**Get a Report**. To create a Report, simply list them like any other descriptor:
@@ -58,6 +58,16 @@ report = Report(metrics=[
])
```
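
For context, listing several built-in evaluators in one Report typically looks like the sketch below. This assumes a `response` column, the `TextEvals` preset import, and an OpenAI API key available in the environment for the LLM-based descriptors.

```python
from evidently.descriptors import DeclineLLMEval, NegativityLLMEval, PIILLMEval
from evidently.metric_preset import TextEvals
from evidently.report import Report

report = Report(metrics=[
    TextEvals(column_name="response", descriptors=[
        NegativityLLMEval(),
        PIILLMEval(),
        DeclineLLMEval(),
    ])
])
report.run(reference_data=None, current_data=df)
```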

**Run descriptors over two columns**. An evaluator that assesses whether the context contains enough information to answer the question requires both columns. Run the evaluation over the `context` column and pass the name of the column containing the `question` as a parameter.

```python
report = Report(metrics=[
    TextEvals(column_name="context", descriptors=[
        ContextQualityLLMEval(question="question"),
    ])
])
```

{% hint style="info" %}
**Which descriptors are there?** See the list of available built-in descriptors in the [All Metrics](../reference/all-metrics.md) page.
{% endhint %}
14 changes: 9 additions & 5 deletions docs/book/evaluations/no_code_evals.md
Expand Up @@ -6,11 +6,11 @@ The platform supports several evaluations directly from the user interface.

| Name | Type | Description |
|-------------------------|------------|------------------------------------------------------------------------------------------------------------------------------------------|
| Text Evals | Report | Analyze texts using methods from regular expressions to LLM judges. |
| Data Quality | Report | Get descriptive statistics and distribution overviews for all columns. |
| Text Evals | Report | Analyze text data, from regular expressions to LLM judges. |
| Data Quality | Report | Get descriptive statistics and distributions for all columns. |
| Classification Quality | Report | Evaluate the quality of a classification model. |
| Regression Quality | Report | Evaluate the quality of a regression model. |
| Data Quality Tests | Test Suite | Automatically check for issues like missing values, duplicates, etc. |
| Data Quality Tests | Test Suite | Automatically check for missing values, duplicates, etc. |

Before you start, pick a dataset to evaluate. For example, this could be a CSV file containing inputs and outputs of your AI system, like chatbot logs.

@@ -77,6 +77,10 @@ Select specific checks one by one:

Each evaluation result is called a **Descriptor**. No matter the method, you’ll get a label or score for every evaluated text. Some, like “Sentiment,” work instantly, while others may need setup.

{% hint style="info" %}
**What other evaluators are there?** Check the list of Descriptors on the [All Metrics](../reference/all-metrics.md) page.
{% endhint %}

Here are a few examples of Descriptors and how to configure them:

## Words presence
@@ -111,8 +115,8 @@ For a binary classification template, you can configure:
* **Target/Non-target Category**: labels you want to use.
* **Uncertain Category**: how the model should respond when it can’t decide.
* **Reasoning**: choose to include explanation (Recommended).
* **Category** and/or **Score**: have the LLM respond with the category (Recommended) or also return a score.
* **Visualize as**: when both Category and Score are computed, choose which to display in the report.
* **Category** and/or **Score**: have the LLM respond with the category (Recommended) or score.
* **Visualize as**: when both Category and Score are computed, choose which to display in the Report.

{% hint style="info" %}
**What other evaluators are there?** Check the list of Descriptors on the [All Metrics](../reference/all-metrics.md) page.
1 change: 1 addition & 0 deletions docs/book/examples/cookbook_llm_judge.md
@@ -365,6 +365,7 @@ verbosity_report.datasets().current
```

Preview:

![](../.gitbook/assets/cookbook/llmjudge_verbosity_examples.png)

Don't fully agree with the results? Use these labels as a starting point, and correct the decision where you see fit - now you've got your golden dataset! Next, iterate on your judge prompt.