Evaluating RAG for large scale codebases

by GavCo on 2/14/25, 8:29 AM with 10 comments
by jimminyx on 2/14/25, 8:56 AM

Conceptually, LLM-as-a-judge doesn't feel like it should work; it's like asking a student to grade their own homework. It's very unintuitive to me that it actually seems to work pretty well.

by 33a on 2/14/25, 9:50 PM

If the self-evaluation makes it better, then why not do the self-evaluation as part of the normal RAG workflow?

by namanyayg on 2/14/25, 12:16 PM

Whose data are they training on? Are they storing and using all customer data?