Hybrid Learning to Rank with Retrieval-Augmented Generation and Large Language Models (LTR-RAG-LLM): A Framework for Context-Aware Information Retrieval

Authors

  • Akintayo Ayoade, Department of Computer and Information Science, Faculty of Natural and Applied Science, Lead City University, Ibadan, Nigeria

Keywords:

Learning to Rank (LTR); Retrieval-Augmented Generation (RAG); Large Language Models (LLMs); Information Retrieval

Abstract

This study presents the Hybrid LTR-RAG-LLM framework, a combination of learning to rank (LTR), retrieval-augmented generation (RAG), and large language model (LLM) enhancement for context-aware information retrieval. We assess the framework on two benchmark datasets: MS MARCO (1,000,000 queries, 8.8 million passages, average query length of 7 words, average passage length of 60 words) and Natural Questions (NQ) (307,373 queries, 3.6 million documents, average query length of 9 words, average document length of 120 words). Experimental results show that the proposed model reaches an NDCG@10 of 0.782 and an MRR of 0.804 on MS MARCO, a 9.8% improvement over LambdaMART (0.712 NDCG@10, 0.729 MRR), 5.0% over BERT Ranker (0.745 NDCG@10, 0.768 MRR), and 7.0% over RAG (0.731 NDCG@10, 0.751 MRR). On NQ, the framework achieves 0.765 NDCG@10 and 0.789 MRR, exceeding LambdaMART by 10.2%, BERT Ranker by 4.5%, and RAG by 6.4%. Precision@10 rises to 0.691 on MS MARCO and 0.672 on NQ, compared with 0.623 (LambdaMART), 0.652 (BERT), and 0.640 (RAG). Ablation analysis shows that LTR-only models attain 0.732 NDCG@10, RAG-only models reach 0.741, and hybrid combinations without full integration remain below 0.760; the complete LTR-RAG-LLM framework consistently surpasses these alternatives at 0.782 NDCG@10, 0.804 MRR, and 0.691 Precision@10. Regarding scalability, baseline latency of 180 ms per query drops to 150 ms with FAISS, 135 ms with knowledge distillation, and 125 ms with quantization, while a unified optimization approach achieves 120 ms without sacrificing accuracy. Error analysis demonstrates resilience across query types: accuracy on long-tail queries increases by 8-12%; on multi-hop queries the model produces fully linked answers where baselines return only partial ones; and on ambiguous queries disambiguation accuracy rises by 15-20%. Fairness evaluation shows exposure bias falling from 18.5% (BERT Ranker) to 12.1%, the demographic skew index dropping from 0.32 to 0.21, and fairness-adjusted NDCG rising from 0.710 to 0.752. The Hybrid LTR-RAG-LLM framework sets a new benchmark by combining accuracy (+9-10% improvement over traditional baselines), efficiency (120 ms latency), and fairness (34.6% reduction in bias), establishing the model as a scalable and ethically aligned option for future information retrieval systems.
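The results above are reported in standard ranking metrics. For readers unfamiliar with them, the following is a minimal illustrative sketch of NDCG@k and MRR over graded relevance judgments; the function names and inputs are examples for exposition, not code from the study.

```python
import math

def ndcg_at_k(relevances, k=10):
    """NDCG@k for one ranked list of graded relevance scores.

    DCG discounts each gain by log2(rank + 1); NDCG normalizes by the
    DCG of the ideal (descending-relevance) ordering of the same list.
    """
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def mrr(ranked_lists):
    """Mean reciprocal rank: average of 1/rank of the first relevant
    result (relevance > 0) across queries; 0 if none is relevant."""
    total = 0.0
    for rels in ranked_lists:
        for i, rel in enumerate(rels):
            if rel > 0:
                total += 1.0 / (i + 1)
                break
    return total / len(ranked_lists)
```

A list already in ideal order scores NDCG@k of 1.0, so the reported 0.782 NDCG@10 can be read as the model's top-10 ordering recovering about 78% of the ideal discounted gain on average.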

Published

2026-02-11