Evaluation

class versionhq.task.evaluate.Evaluation

A Pydantic class that stores the conditions and results of an evaluation.

Variables

| Variable | Data Type | Default | Nullable | Description |
| --- | --- | --- | --- | --- |
| `items` | List[InstanceOf[EvaluationItem]] | list() | - | Stores evaluation items. |
| `eval_by` | Any | None | True | Stores the agent that evaluated the output. |

Property

| Property | Returns | Description |
| --- | --- | --- |
| `aggregate_score` | float | Calculates the weighted average of the evaluation scores for the task output. |
| `suggestion_summary` | str | Returns a summary of the suggestions. |

EvaluationItem

class versionhq.task.evaluate.EvaluationItem

Variables

| Variable | Data Type | Default | Nullable | Description |
| --- | --- | --- | --- | --- |
| `criteria` | str | None | False | Stores the evaluation criteria given by the client. |
| `suggestion` | str | None | True | Stores the evaluator agent's suggestion for improvement. |
| `score` | float | None | True | Stores the score on a 0 to 1 scale. |

Usage

Evaluator agents evaluate the task output against the given criteria and store the results in the TaskOutput object.

import versionhq as vhq
from pydantic import BaseModel


# Response schema for the task output
class CustomOutput(BaseModel):
    test1: str
    test2: list[str]


criteria = ["uniqueness", "audience fit"]

task = vhq.Task(
    description="Research a topic to teach a kid aged 6 about math.",
    response_schema=CustomOutput,
    should_evaluate=True,  # triggers evaluation
    eval_criteria=criteria,
)
res = task.execute()

# The evaluation is attached to the task output
assert isinstance(res.evaluation, vhq.Evaluation)
# At least one item corresponds to a requested criterion
assert [item for item in res.evaluation.items if item.criteria in criteria]
assert res.evaluation.aggregate_score is not None
assert res.evaluation.suggestion_summary is not None
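
Because `score` and `suggestion` are nullable, code that reads the evaluation items may want to guard against `None`. The following is a minimal sketch that uses a stand-in dataclass rather than the real `EvaluationItem`. `ItemStub` and `format_items` are hypothetical names introduced for illustration, and only the three documented fields are assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemStub:
    """Hypothetical stand-in mirroring the documented EvaluationItem fields."""
    criteria: str
    suggestion: Optional[str] = None
    score: Optional[float] = None


def format_items(items: list[ItemStub]) -> list[str]:
    """Render one line per item, tolerating missing scores and suggestions."""
    lines = []
    for item in items:
        score = "n/a" if item.score is None else f"{item.score:.2f}"
        suggestion = item.suggestion or "no suggestion"
        lines.append(f"{item.criteria}: {score} ({suggestion})")
    return lines


report = format_items([
    ItemStub(criteria="uniqueness", score=0.56, suggestion="Add a less common example."),
    ItemStub(criteria="audience fit"),  # score and suggestion left unset
])
```

With a real task result, the same loop could presumably be run over `res.evaluation.items`, since those items expose the same three attributes.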

An Evaluation object provides scores for the given criteria.

For example, it might indicate a uniqueness score of 0.56, an audience fit score of 0.70, and an aggregate score of 0.63.
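
The arithmetic behind those example numbers can be sketched as follows. The weighting scheme the library applies is not stated here, so this sketch assumes equal weights, which reproduces the aggregate of 0.63. `weighted_average` and its `weights` parameter are hypothetical and not part of the versionhq API.

```python
def weighted_average(scores: list[float], weights: list[float]) -> float:
    """Return sum(score * weight) / sum(weight) for paired scores and weights."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


# Assumption: both criteria carry equal weight.
# (0.56 * 1 + 0.70 * 1) / (1 + 1) is approximately 0.63
aggregate = weighted_average([0.56, 0.70], [1.0, 1.0])
```

If the criteria were weighted unequally, the aggregate would shift toward the score of the more heavily weighted criterion.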