
Evaluation

class versionhq.task.evaluate.Evaluation

A Pydantic class that stores the conditions and results of a task evaluation.

Variables

| Variable | Data Type | Default | Nullable | Description |
| --- | --- | --- | --- | --- |
| items | List[InstanceOf[EvaluationItem]] | list() | - | Stores evaluation items. |
| eval_by | Any | None | True | Stores the agent that evaluated the output. |

Property

| Property | Returns | Description |
| --- | --- | --- |
| aggregate_score | float | Calculates the weighted average of the evaluation scores for the task output. |
| suggestion_summary | str | Returns a summary of the evaluator's suggestions. |
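As a back-of-the-envelope illustration of the weighted-average computation behind aggregate_score, the sketch below reproduces the example numbers at the end of this page. The weights are invented for the illustration; they are not the library's internal weights.

```python
# Hypothetical sketch of a weighted average over per-criterion scores.
# The weights are made up for illustration; the library derives its own.
scores = {"uniqueness": 0.56, "audience fit": 0.70}
weights = {"uniqueness": 0.5, "audience fit": 0.5}

aggregate = sum(scores[c] * weights[c] for c in scores) / sum(weights.values())
print(round(aggregate, 2))  # 0.63 with equal weights
```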

EvaluationItem

class versionhq.task.evaluate.EvaluationItem

Variables

| Variable | Data Type | Default | Nullable | Description |
| --- | --- | --- | --- | --- |
| criteria | str | None | False | Stores the evaluation criterion given by the client. |
| suggestion | str | None | True | Stores the evaluator agent's suggestion for improvement. |
| score | float | None | True | Stores the score on a 0 to 1 scale. |
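Since EvaluationItem is a Pydantic model, it can be instantiated directly from the documented fields. A minimal sketch (the suggestion text and score are invented for the example):

```python
from versionhq.task.evaluate import EvaluationItem

# criteria is required; suggestion and score are nullable.
item = EvaluationItem(
    criteria="audience fit",
    suggestion="Use shorter sentences and concrete objects a 6-year-old can count.",
    score=0.70,
)
assert 0.0 <= item.score <= 1.0  # scores sit on a 0 to 1 scale
```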

Usage

Evaluator agents assess the task output against the given criteria and store the results in the TaskOutput object.

```python
import versionhq as vhq
from pydantic import BaseModel

class CustomOutput(BaseModel):
    test1: str
    test2: list[str]

task = vhq.Task(
    description="Research a topic to teach a kid aged 6 about math.",
    response_schema=CustomOutput,
    should_evaluate=True,  # triggers evaluation
    eval_criteria=["uniqueness", "audience fit"],
)
res = task.execute()

assert isinstance(res.evaluation, vhq.Evaluation)
assert [item for item in res.evaluation.items if item.criteria in ("uniqueness", "audience fit")]
assert res.evaluation.aggregate_score is not None
assert res.evaluation.suggestion_summary is not None
```

An Evaluation object provides scores for the given criteria.

For example, it might indicate a uniqueness score of 0.56, an audience fit score of 0.70, and an aggregate score of 0.63.
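To read the per-criterion results back out, iterate over the stored items. A sketch that assumes the task above has already executed:

```python
# Inspect per-criterion results after task.execute().
for item in res.evaluation.items:
    print(f"{item.criteria}: score={item.score} - {item.suggestion}")

print(f"aggregate: {res.evaluation.aggregate_score:.2f}")
print(res.evaluation.suggestion_summary)
```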