Merge with main
WerLaj committed Jan 29, 2024
2 parents 3ead65f + e293e1a commit 9ab4cbe
Showing 40 changed files with 1,207 additions and 55 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
Original file line number Diff line number Diff line change
@@ -201,4 +201,4 @@ jobs:
Current Branch | Main Branch |
| ------ | ------ |
![Coverage Badge](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/NoB0/8446f35dc373966dc971fb9237483cce/raw/coverage.${{ env.REPO_NAME }}.${{ github.event.number }}.json) | ![Coverage Badge](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/NoB0/8446f35dc373966dc971fb9237483cce/raw/coverage.${{ env.REPO_NAME }}.main.json) |
3 changes: 2 additions & 1 deletion data/README.md
@@ -1,3 +1,4 @@
# Data

This folder should contain the description of data that was used in the project, including datasets, queries, ground truth, run files, etc. Files under 10MB can be stored on GitHub, larger files should be stored on a server (e.g., gustav1). This README should provide a comprehensive overview of all the data that is used and where it originates from (e.g., part of an official test collection, generated using code in this repo or a third-party tool, etc.).
* `llm_prompts`: Contains the LLM prompts.
* `nl_annotations`: Contains the evaluation dataset.
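
The evaluation dataset in `nl_annotations` is a CSV file whose fields are quoted and separated by a comma followed by a space. A minimal sketch of how it could be read with the standard `csv` module is given here; `load_annotations` is a hypothetical helper that is not part of this commit, and the sample rows are copied inline so the snippet runs on its own.

```python
import csv
import io

# Rows copied from data/nl_annotations/test.csv, inlined for the sketch.
SAMPLE = '''"Sentence", "Intent", "Subject", "Predicate", "Object", "Preference"
"Bob lives in New York.","ADD", "Bob", "lives in", "New York",
"Diana likes sci-fi movies.","ADD", "Diana", "likes", "sci-fi movies", 1
'''


def load_annotations(handle):
    """Parses annotation rows into dicts; empty fields become None.

    Hypothetical helper, not part of the commit.
    """
    # skipinitialspace=True drops the space after each comma, so the
    # quote that follows is recognised as the start of a quoted field.
    reader = csv.DictReader(handle, skipinitialspace=True)
    return [
        {key: (value or None) for key, value in row.items()}
        for row in reader
    ]


rows = load_annotations(io.StringIO(SAMPLE))
for row in rows:
    print(row["Intent"], row["Subject"], row["Object"], row["Preference"])
```

Without `skipinitialspace=True`, the reader would keep the leading space and the quote characters as part of each value.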
3 changes: 3 additions & 0 deletions data/llm_prompts/README.md
@@ -0,0 +1,3 @@
# NL to API Prompts

This folder contains the prompts for the NL to API model. The prompts are simple text files with one prompt per file.
12 changes: 12 additions & 0 deletions data/llm_prompts/default/intent.txt
@@ -0,0 +1,12 @@
What is the user's intent with the following statement? The options are:

ADD - Statement of fact or preference,
GET - Asking for information,
DELETE - Request to delete or remove an item,
UNKNOWN - None of the above

Statement:
------------------------------
{statement}
------------------------------
Answer:
7 changes: 7 additions & 0 deletions data/llm_prompts/default/preference.txt
@@ -0,0 +1,7 @@
What is the sentiment towards "{object}" in the following statement? Answer 1 for positive, -1 for negative, or N/A when sentiment is not applicable.

Statement:
------------------------------
{statement}
------------------------------
Answer:
19 changes: 19 additions & 0 deletions data/llm_prompts/default/triple.txt
@@ -0,0 +1,19 @@
Return pipe-separated subject, predicate, and object from the following statement. If a field is not applicable, output N/A.

Example:
------------------------------
I like cats.
------------------------------
Answer: I | like | cats

Example:
------------------------------
Hello John.
------------------------------
Answer: N/A | N/A | John

Statement:
------------------------------
{statement}
------------------------------
Answer:
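
The three prompt files above use `{statement}` and `{object}` placeholders. The sketch below shows one way such a template could be filled, assuming plain `str.format` substitution. The `Prompt` class used by the commit is not shown in this diff, so `fill_prompt` is a hypothetical stand-in and may differ from the actual implementation.

```python
import tempfile
from pathlib import Path


def fill_prompt(template_path: str, **kwargs: str) -> str:
    """Reads a prompt template and substitutes its {placeholders}.

    Hypothetical helper: assumes str.format-style fields in the template.
    """
    template = Path(template_path).read_text(encoding="utf-8")
    return template.format(**kwargs)


# Demo with a throwaway template file that mimics preference.txt.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "preference.txt"
    path.write_text(
        'What is the sentiment towards "{object}"?\n'
        "Statement:\n{statement}\nAnswer:",
        encoding="utf-8",
    )
    prompt = fill_prompt(str(path), statement="I like cats.", object="cats")
    print(prompt)
```

With this approach a template must not contain literal braces other than its placeholders, since `str.format` would try to interpret them as fields.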
31 changes: 31 additions & 0 deletions data/nl_annotations/test.csv
@@ -0,0 +1,31 @@
"Sentence", "Intent", "Subject", "Predicate", "Object", "Preference"
"Bob lives in New York.","ADD", "Bob", "lives in", "New York",
"Diana is a fan of Steven Spielberg's work.", "ADD", "Diana", "fan of", "Steven Spielberg movies", 1
"Charlie doesn't prefer Steven Spielberg's movies.","ADD", "Charlie", "doesn't prefer", "Steven Spielberg movies", -1
"Alice admires Robert Downey Jr..","ADD", "Alice", "admires", "Robert Downey Jr.", 1
"Ethan is a fan of Interstellar.","ADD", "Ethan", "fan of", "Interstellar", 1
"Diana told me she loves movies directed by Quentin Tarantino.","ADD", "Diana", "loves", "movies directed by Quentin Tarantino", 1
"Bob's sister is Diana.","ADD", "Bob", "sister of", "Diana",
"Bob admires Emma Watson.","ADD", "Bob", "admires", "Emma Watson", 1
"Diana likes sci-fi movies.","ADD", "Diana", "likes", "sci-fi movies", 1
"Charlie admires Tom Hanks.","ADD", "Charlie", "admires", "Tom Hanks", 1
"Do I like Pulp Fiction?,","GET", "I", "like", "Pulp Fiction",
"Do I like movies directed by Mel Gibson?,","GET", "I", "like", "Mel Gibson movies",
"Do I prefer Pulp Fiction","GET", "I", "prefer", "Pulp Fiction", 1
"Do I prefer Rambo movies,","GET", "I", "prefer", "Rambo movies", 1
"Do I like movies featuring actors that have played Macbeth? Which of these movies do I prefer?,","UNKNOWN", , , ,
"Do I like action movies?,","GET", "I", "like", "Action movies",
"Do I prefer romantic comedies?,","GET", "I", "prefer", "Romantic comedies", 1
"Do I hate romantic comedies?,","GET", "I", "hate", "Romantic comedies", -1
"How many action movies are stored in my PKG?,","UNKNOWN", "", "", "",
"Save The Godfather as my favourite movie,","ADD", "I", "favourite movie", "The Godfather", 1
"I enjoy watching action movies with Alice,","ADD", "I", "enjoy watching", "action movies with Alice", 1
"I went to the movie theatre yesterday and loved Oppenheimer,","ADD", "I", "loved", "Oppenheimer", 1
"Note that I would never watch cheesy romcom unless with my friends,","UNKNOWN", "", "", "",
"I am married to Bob,","ADD", "I", "married to", "Bob",
"My husband doesn't like romantic comedies,","ADD", "My husband", "dislikes", "romantic comedies", -1
"Emma is my mother,","ADD", "I", "son of", "Emma",
"My husband has seen all the Christopher Nolan movies and a big fan,","ADD", "My husband", "big fan of", "Christopher Nolan movies", 1
"Remove Forrest Gump from my movie library.","DELETE","", "", "Forrest Gump",
"I am no longer married to Bob", "DELETE", "I", "married to", "Bob",
"Discard everything directed by Steven Spielberg.", "DELETE", "", "directed by", "Steven Spielberg",
22 changes: 0 additions & 22 deletions data/pkg-vocabulary.owl.ttl

This file was deleted.

42 changes: 42 additions & 0 deletions pkg_api/core/annotations.py
@@ -0,0 +1,42 @@
"""Dataclasses for the annotations used in the PKG API."""

from dataclasses import dataclass, field
from typing import List, Optional, Union

from pkg_api.pkg_types import URI


@dataclass
class Concept:
    """Class representing a SKOS concept."""

    description: str
    related_entities: List[URI] = field(default_factory=list)
    broader_entities: List[URI] = field(default_factory=list)
    narrower_entities: List[URI] = field(default_factory=list)


@dataclass
class Triple:
    """Class representing a subject, predicate, object triple."""

    subject: Union[URI, str, None] = None
    predicate: Union[URI, str, None] = None
    object: Union[URI, Concept, str, None] = None


@dataclass
class Preference:
    """Class representing a preference."""

    topic: Union[URI, Concept, str]
    weight: float


@dataclass
class PKGData:
    """Represents a statement annotated with a triple and a preference."""

    statement: str
    triple: Optional[Triple] = None
    preference: Optional[Preference] = None
12 changes: 12 additions & 0 deletions pkg_api/core/intents.py
@@ -0,0 +1,12 @@
"""Intents for the API."""

from enum import Enum, auto


class Intent(Enum):
    """Enum for intents."""

    ADD = auto()
    GET = auto()
    DELETE = auto()
    UNKNOWN = auto()
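
Because `Intent` is a plain `Enum`, a label can be mapped to a member by name. The following is a minimal sketch of such a lookup with a fallback to `UNKNOWN`; `to_intent` is a hypothetical helper that does not appear in the commit, and the enum is repeated only so the snippet is self-contained.

```python
from enum import Enum, auto


class Intent(Enum):
    """Copy of the enum above so the sketch runs on its own."""

    ADD = auto()
    GET = auto()
    DELETE = auto()
    UNKNOWN = auto()


def to_intent(label: str) -> Intent:
    """Maps a label such as "add" to an Intent, defaulting to UNKNOWN.

    Hypothetical helper, not part of the commit.
    """
    try:
        # Enum members can be looked up by name with subscript syntax.
        return Intent[label.strip().upper()]
    except KeyError:
        return Intent.UNKNOWN


print(to_intent("GET"))
print(to_intent(" delete "))
print(to_intent("maybe"))
```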
17 changes: 17 additions & 0 deletions pkg_api/nl_to_pkg/__init__.py
@@ -0,0 +1,17 @@
"""NL to PKG module."""

from .annotators.annotator import StatementAnnotator
from .annotators.three_step_annotator import ThreeStepStatementAnnotator
from .entity_linking.entity_linker import EntityLinker
from .llm.llm_connector import LLMConnector
from .llm.prompt import Prompt
from .nl_to_pkg import NLtoPKG

__all__ = [
    "StatementAnnotator",
    "ThreeStepStatementAnnotator",
    "EntityLinker",
    "LLMConnector",
    "Prompt",
    "NLtoPKG",
]
35 changes: 35 additions & 0 deletions pkg_api/nl_to_pkg/annotators/annotator.py
@@ -0,0 +1,35 @@
"""Class for annotating a natural language query.
The main purpose is to return the intent (ADD | GET | DELETE) and to
annotate the triple (subject, predicate, object) and the preference (1 |
-1) in the query.
"""


from abc import ABC, abstractmethod
from typing import Tuple

from pkg_api.core.annotations import PKGData
from pkg_api.core.intents import Intent
from pkg_api.nl_to_pkg.llm.prompt import Prompt


class StatementAnnotator(ABC):
    def __init__(self) -> None:
        """Initializes the statement annotator."""
        self._prompt = Prompt()

    @abstractmethod
    def get_annotations(self, statement: str) -> Tuple[Intent, PKGData]:
        """Returns a tuple of the intent and the annotated statement.

        Args:
            statement: The statement to be annotated.

        Raises:
            NotImplementedError: If the method is not implemented.

        Returns:
            A tuple of the intent and the annotated statement as PKGData.
        """
        raise NotImplementedError
143 changes: 143 additions & 0 deletions pkg_api/nl_to_pkg/annotators/three_step_annotator.py
@@ -0,0 +1,143 @@
"""A three-step annotator for annotating a statement.
This module contains a three-step annotator for annotating a statement
with a triple and a preference using LLM.
"""


import re
from abc import ABC
from typing import Optional, Tuple

from pkg_api.core.annotations import PKGData, Preference, Triple
from pkg_api.core.intents import Intent
from pkg_api.nl_to_pkg.llm.llm_connector import LLMConnector
from pkg_api.nl_to_pkg.llm.prompt import Prompt

_DEFAULT_PROMPT_PATHS = {
"intent": "data/llm_prompts/default/intent.txt",
"triple": "data/llm_prompts/default/triple.txt",
"preference": "data/llm_prompts/default/preference.txt",
}


def is_number(value: str) -> bool:
    """Returns True if a value is a number, False otherwise.

    Args:
        value: The value to be checked.

    Returns:
        True if the value is a number, False otherwise.
    """
    try:
        float(value)
        return True
    except ValueError:
        return False


class ThreeStepStatementAnnotator(ABC):
    def __init__(self) -> None:
        """Initializes the three-step statement annotator."""
        self._prompt_paths = _DEFAULT_PROMPT_PATHS
        self._prompt = Prompt()
        self._valid_intents = {intent.name for intent in Intent}
        self._llm_connector = LLMConnector()

    def get_annotations(self, statement: str) -> Tuple[Intent, PKGData]:
        """Returns a tuple with annotations for a statement.

        Args:
            statement: The statement to be annotated.

        Returns:
            The intent and the annotations.
        """
        intent = self._get_intent(statement)
        triple = self._get_triple(statement)
        preference = (
            self._get_preference(statement, triple.object)
            if triple and isinstance(triple.object, str)
            else None
        )
        return intent, PKGData(statement, triple, preference)

    def _get_intent(self, statement: str) -> Intent:
        """Returns the intent for a statement.

        Args:
            statement: The statement to be annotated.

        Returns:
            The intent.
        """
        prompt = self._prompt.get_prompt(
            self._prompt_paths["intent"], statement=statement
        )
        response = self._llm_connector.get_response(prompt)
        response_terms = response.split()
        if len(self._valid_intents.intersection(response_terms)) == 1:
            return next(
                intent for intent in Intent if intent.name in response_terms
            )
        return Intent.UNKNOWN

    def _get_triple(self, statement: str) -> Optional[Triple]:
        """Returns the triple for a statement.

        Args:
            statement: The statement to be annotated.

        Returns:
            The triple comprised of subject, predicate, and object or None.
        """
        prompt = self._prompt.get_prompt(
            self._prompt_paths["triple"], statement=statement
        )
        response = self._llm_connector.get_response(prompt)
        response_terms = [
            None if term.strip() == "N/A" else term.strip()
            for term in response.split("|")
        ]
        if len(response_terms) == 3:
            return Triple(*response_terms)
        return None

    def _get_preference(
        self, statement: str, triple_object: str
    ) -> Optional[Preference]:
        """Returns the preference for a statement.

        Args:
            statement: The statement to be annotated.
            triple_object: The object of the triple. It is only used in string
                form.

        Raises:
            TypeError: If the triple object is not a string.

        Returns:
            The preference.
        """
        if not isinstance(triple_object, str):
            raise TypeError(
                f"Triple object must be of type str, not {type(triple_object)}."
            )

        prompt = self._prompt.get_prompt(
            self._prompt_paths["preference"],
            statement=statement,
            object=triple_object,
        )
        response = self._llm_connector.get_response(prompt)
        response_terms = [
            term.strip() for term in re.split(r"[ .,;]+", response)
        ]
        preference = next(
            (term for term in response_terms if is_number(term)),
            None,
        )
        if preference:
            return Preference(triple_object, float(preference))
        return None
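
The response handling in the three steps above can be exercised without an LLM. The sketch below restates the parsing rules as standalone functions and feeds them canned responses. The function names are hypothetical, intents are plain strings and triples are tuples here rather than the `Intent` and `Triple` classes, and the canned responses are invented for illustration, not real model output.

```python
import re
from typing import Optional, Tuple

VALID_INTENTS = {"ADD", "GET", "DELETE", "UNKNOWN"}


def parse_intent(response: str) -> str:
    """Returns the intent label if exactly one appears, else UNKNOWN."""
    found = VALID_INTENTS.intersection(response.split())
    return found.pop() if len(found) == 1 else "UNKNOWN"


def parse_triple(
    response: str,
) -> Optional[Tuple[Optional[str], Optional[str], Optional[str]]]:
    """Splits a pipe-separated response; N/A fields become None."""
    terms = [
        None if term.strip() == "N/A" else term.strip()
        for term in response.split("|")
    ]
    # Anything other than exactly three fields is treated as unparsable.
    return tuple(terms) if len(terms) == 3 else None


def parse_preference(response: str) -> Optional[float]:
    """Returns the first numeric token in the response, if any."""
    for term in re.split(r"[ .,;]+", response):
        try:
            return float(term)
        except ValueError:
            continue
    return None


# Canned responses standing in for LLM output (illustrative only).
print(parse_intent("ADD"))
print(parse_triple("I | like | cats"))
print(parse_preference("Answer: -1"))
```

Because the preference response is split on periods, a decimal such as `0.5` would be read as `0`; this matches the splitting pattern used in `_get_preference`.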
25 changes: 25 additions & 0 deletions pkg_api/nl_to_pkg/entity_linking/entity_linker.py
@@ -0,0 +1,25 @@
"""Abstract class for entity linking."""


from abc import ABC, abstractmethod

from pkg_api.core.annotations import PKGData


class EntityLinker(ABC):
    """Entity linker for linking entities to the PKG or available KGs."""

    @abstractmethod
    def link_annotation_entities(self, pkg_data: PKGData) -> PKGData:
        """Resolves the pkg data annotations if possible.

        Args:
            pkg_data: The PKG data to be resolved.

        Raises:
            NotImplementedError: If the method is not implemented.

        Returns:
            The resolved PKG data annotations.
        """
        raise NotImplementedError