EduPLEx_API

LLM-based Evaluator for Context Relevance

Goal: Assess how well an LLM can identify relevant context chunks given question-chunk pairs, so that it can be used as a second step in our retrieval system to filter out irrelevant candidates.

Data: Modules Agiles Mindset (AM) and Kritisches Denken (KD), with 20 test questions from each module.

Method/Approach: The LLM assigns a relevance score to each question-chunk pair. The prompt is taken from TruLens Context Relevance. The LLM rates each context chunk from 1 to 10, and the score is normalized to a 0-1 scale. LLM used: gpt-4-0125-preview.

Results:
  • Recall: 100% for both AM and KD.
  • Precision: AM 71%, KD 33%.
  • Accuracy: AM 81%, KD 52%.

Evaluation Metrics:
  • Accuracy: the percentage of predictions that are correct.
  • Recall: true positives / (true positives + false negatives), i.e. the share of actually relevant chunks that are predicted as relevant. Crucial when the cost of false negatives is high (a chunk predicted as 0 when it is in fact relevant).
  • Precision: true positives / (true positives + false positives), i.e. the share of chunks predicted as relevant ('1') that are actually relevant. Crucial when the cost of false positives is high (a chunk predicted as 1 when it is not relevant).

Conclusions: The LLM-based context relevance evaluator correctly predicted all relevant chunks as relevant (100% recall). However, accuracy and precision varied considerably with the data evaluated: the evaluator was less accurate for Kritisches Denken (52% accuracy, 33% precision) than for Agiles Mindset (81% accuracy, 71% precision).
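The scoring and metric steps described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the experiment. The min-max mapping from the 1-10 rating to the 0-1 scale, the 0.5 decision threshold, the function names, and the toy ratings and labels are all assumptions made for the example; the write-up only states that the score is normalized to 0-1 and does not give a threshold.

```python
from typing import Dict, List

RAW_MIN, RAW_MAX = 1, 10  # rating range stated in the write-up


def normalize_score(raw: int) -> float:
    """Min-max scale a 1-10 rating to 0-1 (assumed mapping, not confirmed by the source)."""
    if not RAW_MIN <= raw <= RAW_MAX:
        raise ValueError(f"rating {raw} is outside {RAW_MIN}-{RAW_MAX}")
    return (raw - RAW_MIN) / (RAW_MAX - RAW_MIN)


def predict_relevant(raw_scores: List[int], threshold: float = 0.5) -> List[int]:
    """Label a chunk 1 (relevant) when its normalized score meets the hypothetical threshold."""
    return [1 if normalize_score(r) >= threshold else 0 for r in raw_scores]


def evaluate(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute accuracy, precision and recall from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Report 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


# Toy data invented for illustration only; these are not the study's ratings.
toy_labels = [1, 1, 0, 0, 0]    # 1 = chunk is actually relevant
toy_ratings = [9, 7, 8, 2, 1]   # raw 1-10 ratings from the LLM
toy_predictions = predict_relevant(toy_ratings)
print(evaluate(toy_labels, toy_predictions))
```

On this toy data every relevant chunk is kept (recall 1.0), while one irrelevant chunk with a high rating slips through, giving precision 2/3 and accuracy 0.8. It is the same pattern as in the reported results: perfect recall with lower precision.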

