Evaluating the quality of AI agents' responses used to be tricky. It required knowledge of eval criteria as well as third-party tools like promptfoo, ragas or prometheus. Now OpenAI makes it ridiculously easy with a new API endpoint. It can grade a completion against a reference response and assess its format and tone, and you can even prompt the eval with your own criteria.