LLM-augmented Retrieval and Ranking for Course Recommendations

Goal

Evaluate the effectiveness of a recommendation pipeline that leverages LLM capabilities. Specifically, test how well an LLM can generate synthetic course title recommendations from user profiles (job title, skills, and goals), and assess the quality of course retrieval through vector search and LLM-based relevancy re-ranking.

Data

WBS modules data, synthetic data.

Method/Approach

Multi-stage pipeline combining LLM-generated synthetic course recommendations, vector-based retrieval using OpenSearch, and LLM-based relevancy scoring for re-ranking. Evaluation on 100 synthetic user profiles, analyzing cosine similarity scores and LLM-based relevancy.

Results

Overall, OpenAI's text-embedding-3-large showed the best retrieval quality, with higher overall and top-1 relevance scores, followed by text-embedding-3-small. SBERT showed the weakest performance in this setup.

Evaluation Metrics

Cosine similarity scores, relevancy scores (heatmap, averages).
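
The averages behind a heatmap of this kind can be sketched as a small aggregation step. This is only an illustration: the function name, the (model, rank, score) record layout, and the example values are assumptions, not taken from the experiment code.

```python
from collections import defaultdict


def mean_score_grid(records):
    """Average scores per (model, rank) cell.

    records: iterable of (model, rank, score) tuples, where score is either
    a cosine similarity or an LLM relevancy score (hypothetical layout).
    Returns {model: {rank: mean_score}}, i.e. one heatmap row per model.
    """
    # Each cell accumulates [running_total, count].
    cells = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for model, rank, score in records:
        cell = cells[model][rank]
        cell[0] += score
        cell[1] += 1
    return {
        model: {rank: total / count for rank, (total, count) in sorted(ranks.items())}
        for model, ranks in cells.items()
    }


# Made-up example records, for illustration only.
example = [("model_a", 1, 0.4), ("model_a", 1, 0.6), ("model_a", 2, 0.2), ("model_b", 1, 1.0)]
grid = mean_score_grid(example)
```

Each inner dictionary can be read as one row of the heatmap, with ranks as columns.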

Conclusions

OpenAI text-embedding models provided the best retrieval quality. SBERT performed slightly worse, with a lower average relevance score at rank 1 (0.49, compared with 0.52 for the OpenAI models). The experiment also exposed a key limitation: we cannot tell whether low scores stem from a weak retrieval system or from a lack of relevant candidates in the database.

More details about the recommendation system

The recommendation system first creates a user profile that includes a set of user skills. Because no real user data is available, these skills are generated synthetically. Based on this synthetic profile, a large language model (LLM) produces candidate course titles. The generated titles may be hypothetical or non-existent courses; they serve as proxies for the kinds of learning opportunities that match the user's skills.
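
As a rough illustration of this first step, the sketch below turns a synthetic profile into a prompt for title generation. The class, the field names, and the prompt wording are all assumptions made for the example; the actual prompt used in the experiment is not shown in this report, and no LLM call is made here.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Synthetic user profile; field names are illustrative."""
    job_title: str
    skills: list[str] = field(default_factory=list)
    goals: list[str] = field(default_factory=list)


def build_title_prompt(profile: UserProfile, n_titles: int = 5) -> str:
    """Render a prompt asking an LLM for plausible (possibly non-existent) course titles."""
    skills = ", ".join(profile.skills) or "none listed"
    goals = ", ".join(profile.goals) or "none listed"
    return (
        f"Job title: {profile.job_title}\n"
        f"Skills: {skills}\n"
        f"Goals: {goals}\n"
        f"Suggest {n_titles} course titles that fit this learner. "
        "Return one title per line."
    )


# Hypothetical profile, for illustration only.
profile = UserProfile("Data Analyst", ["SQL", "Excel"], ["learn Python"])
prompt = build_title_prompt(profile, n_titles=3)
```

The returned string would then be sent to whichever LLM generates the synthetic titles.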

Once the synthetic course titles are generated, they are used as queries in a semantic search over real course titles. The search returns existing courses whose content or focus is similar to the synthetic titles. The retrieved matches are then re-ranked to prioritize the most relevant options. In this way the system produces personalized course recommendations by combining synthetic data, LLM text generation, and retrieval.
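
The retrieve-then-re-rank flow can be sketched in a few lines. This is a minimal in-memory stand-in under stated assumptions: the real system uses OpenSearch for retrieval and an LLM for relevancy scoring, whereas here the catalog is a plain list of (title, vector) pairs, the vectors are toy values, and relevancy_fn is a hypothetical placeholder for the LLM scorer.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, catalog, k=3):
    """Return the k catalog entries most similar to the query as (title, similarity)."""
    scored = [(title, cosine(query_vec, vec)) for title, vec in catalog]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def rerank(candidates, relevancy_fn):
    """Re-order retrieved candidates by an external relevancy score."""
    rescored = [(title, relevancy_fn(title)) for title, _ in candidates]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy catalog of real course titles with made-up 2-d embeddings.
catalog = [
    ("Intro to SQL", [1.0, 0.0]),
    ("Advanced SQL Tuning", [0.9, 0.1]),
    ("Public Speaking", [0.0, 1.0]),
]
# Embedding of one synthetic title (made up), retrieval, then re-ranking.
candidates = retrieve([1.0, 0.0], catalog, k=2)
final = rerank(candidates, relevancy_fn=lambda title: 1.0 if "Advanced" in title else 0.5)
```

Keeping retrieval and re-ranking as separate functions mirrors the two stages described above: the first is cheap and similarity-based, the second can use a slower, profile-aware scorer on a short candidate list.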

Optimizing OpenSearch queries to obtain more relevant results when searching by title

Of the 2,000 synthetic course titles, 100 were used as queries against OpenSearch.
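
For orientation, a vector query of this kind could look like the sketch below, which builds a request body in the OpenSearch k-NN query format. The field names title_embedding and title_goal_embedding are hypothetical, and so is the idea that the title + learning goal variant is stored as a separate embedding of the concatenated text; the report does not state how the index was actually laid out.

```python
def knn_query(query_vector, k=10, field="title_embedding"):
    """Build an OpenSearch k-NN query body against a single vector field.

    field is a hypothetical index field holding the course embedding.
    """
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(query_vector), "k": k}}},
    }


# Toy 3-d query vector; real embeddings have far more dimensions.
query_vec = [0.1, 0.2, 0.3]

# Variant 1: search the title embedding only.
title_only = knn_query(query_vec, k=5)

# Variant 2 (assumed layout): search an embedding of title + learning goal.
title_and_goal = knn_query(query_vec, k=5, field="title_goal_embedding")
```

With the opensearch-py client, such a body would typically be passed as client.search(index=..., body=title_only), where the index name depends on the deployment.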

The OpenAI embeddings perform much better than SBERT.

The score is slightly better when searching the title field only than when searching a combination of title + learning goal (see more details below):