Comparative Retrieval Performance: Modules vs Golden Case

Goal: Compare the Cumulative Gain (CG) of the current modules with that of the golden case.

Data: Modules Kritisches Denken and Agiles Mindset (indexed xAPI), with 20 test questions for each module.

Method/Approach: Retrieval results were evaluated with a binary relevance score (1 or 0), based on manually labeled relevant chunks.

Results: The best retrieval performance for the module Kritisches Denken was 40% at k=3. For Agiles Mindset, the best was 50% at k=2.

Evaluation Metrics: Retrieval quality, measured as Cumulative Gain (CG) at k=1 to k=6.

Conclusions: The CG performance of the current modules was notably lower than that of the golden case. Further improvements are needed to close the gap between our retrieval quality and that of the ideally structured golden case dataset.
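As an illustration of the metric, CG at k is the sum of the relevance labels of the top k retrieved chunks. The following is a minimal sketch, assuming binary labels listed in ranked order; the function names and the sample labels are hypothetical and are not taken from the evaluation code. How the per-question values were aggregated into the percentages reported above is not stated here, so the sketch stops at the per-question CG curve.

```python
from typing import List, Sequence


def cumulative_gain(relevance: Sequence[int], k: int) -> int:
    """Return CG@k: the sum of relevance labels of the top-k results.

    `relevance` holds one manually assigned label (1 = relevant,
    0 = not relevant) per retrieved chunk, in ranked order.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return sum(relevance[:k])


def cg_curve(relevance: Sequence[int], max_k: int = 6) -> List[int]:
    """Return CG@k for k = 1..max_k (the range used in this evaluation)."""
    return [cumulative_gain(relevance, k) for k in range(1, max_k + 1)]


# Hypothetical labels for one test question (not real evaluation data).
labels = [0, 1, 0, 1, 0, 0]
print(cg_curve(labels))  # [0, 1, 1, 2, 2, 2]
```

Because CG ignores rank position within the top k, it only shows how many relevant chunks were retrieved, not how high they were ranked.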

Content retrieval details

After testing with our real courses, we decided to obtain a golden case of almost 5,000 questions and answers (https://eduplex.atlassian.net/browse/EDX-529) and to run further experiments (https://eduplex.atlassian.net/browse/EDX-530).

Experiments with this golden case and different LLMs show no large difference in the retrieved results.

However, there is a large gap between the results for the golden case (above 90% retrieval success) and those for our test learning content (below 40%).

We decided to test different chunking strategies to try to improve the retrieval success scores.
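One such strategy is sentence-based chunking, which keeps each chunk aligned with sentence boundaries instead of cutting text at a fixed character offset. The following is a minimal plain-Python sketch of the idea, not the splitter used in these experiments: the regular expression, the function names, and the `max_chars` limit are all assumptions chosen for illustration.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_by_sentences(text: str, max_chars: int = 200) -> List[str]:
    """Pack whole sentences into chunks of at most `max_chars` characters.

    A sentence is never cut in the middle; a single sentence longer
    than `max_chars` becomes a chunk of its own.
    """
    chunks: List[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sample text (not taken from the course modules).
print(chunk_by_sentences("A b. C d. E f.", max_chars=9))
# ['A b. C d.', 'E f.']
```

A production splitter would also need to handle abbreviations, decimal numbers, and chunk overlap, which this sketch deliberately ignores.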

With the sentence splitter strategy, the results for our test learning content improved.

