Latest LLM & RAG Research: October 2025 Papers
Hey guys! Check out the latest scoop on Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) research! This article dives into the freshest papers from October 28, 2025, bringing you the key insights and advancements in the field. For a smoother reading experience and even more papers, don't forget to peek at the GitHub page.
Large Language Models: The Hottest Papers
Let's dive straight into the world of LLMs! We've got a fantastic collection of papers covering everything from agent generation to safety alignment. It's a wild time for AI, so let's break down some of the most exciting developments.
One paper that really grabs attention is PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity. Imagine being able to precisely pinpoint objects in both space and time using natural language. This 22-page paper, packed with 13 figures, presents a novel framework for doing just that, with major implications for robotics, video analysis, and even human-computer interaction. Another intriguing development is Alita-G: Self-Evolving Generative Agent for Agent Generation, which explores how agents can generate other agents, potentially ushering in a new era of self-improving AI systems. The 15-page paper, with 3 figures, delves into the mechanics of creating these self-evolving agents, a concept that could revolutionize how we design and deploy AI. Unlearning in LLMs is the subject of Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models, which introduces a framework for removing specific information from LLMs, crucial for preserving privacy and mitigating biases. Accepted for presentation at the Thirty-Ninth Annual Conference on Neural Information Processing Systems, this work offers a deep dive into the mathematical underpinnings of the process.

Another standout is Multi-Agent Evolve: LLM Self-Improve through Co-evolution. This 29-page paper, submitted to ICLR 2026, demonstrates how LLMs can improve through co-evolution, with multiple agents interacting and learning from each other, an approach that opens up exciting possibilities for more robust and adaptable AI systems. The importance of preference optimization is highlighted in Lightweight Robust Direct Preference Optimization, which focuses on optimizing LLMs against human preferences so they better match real-world needs and expectations; Direct Preference Optimization remains a key lever for making models more practical and user-friendly (a minimal sketch of the standard DPO objective follows at the end of this roundup). FARMER: Flow AutoRegressive Transformer over Pixels, a ByteDance Seed Technical Report, introduces a novel transformer architecture for processing pixels directly, which could lead to significant advances in image and video generation.

Rounding out the batch, the survey A Survey of Data Agents: Emerging Paradigm or Overstated Hype? provides a comprehensive overview of data agents and helps separate the hype from the reality in this emerging field; for anyone interested in practical applications and limitations, it is a must-read. ESCA: Contextualizing Embodied Agents via Scene-Graph Generation, accepted as a Spotlight Paper at NeurIPS 2025, explains how scene graphs can give embodied agents the context they need to interact with their environment, which is crucial for robotics and virtual reality. Finally, LLM4Cell: A Survey of Large Language and Agentic Models for Single-Cell Biology, a 34-page paper with 5 figures and 7 tables, explores the application of LLMs in single-cell biology, offering a glimpse into the future of AI-driven biological research.
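Since Direct Preference Optimization keeps coming up, here is a minimal sketch of the vanilla DPO loss that this line of work builds on. It is purely illustrative: it is not the "Lightweight Robust" variant from the paper above, and the function names, beta value, and toy inputs are placeholders.

```python
# Minimal sketch of the standard DPO loss, for illustration only -- not the
# specific "Lightweight Robust DPO" method discussed above. Assumes per-token
# log-probs have already been summed into sequence-level log-likelihoods for
# the chosen (preferred) and rejected responses.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective: push the policy to prefer the chosen response
    over the rejected one, relative to a frozen reference model."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # -log(sigmoid(margin)) == softplus(-margin), averaged over the batch
    return F.softplus(-(chosen_rewards - rejected_rewards)).mean()

# Toy usage with random sequence log-likelihoods for a batch of 4 pairs
if __name__ == "__main__":
    lp = lambda: torch.randn(4)
    print(dpo_loss(lp(), lp(), lp(), lp()).item())
```

The appeal of this formulation is that the frozen reference model anchors the policy, so preference alignment happens without training a separate reward model.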
The safety side of LLMs is addressed in SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging, which proposes selectively merging layers so that models remain safe and reliable even after adaptation (a toy illustration of this kind of layer-wise merging appears below). Egocentric reasoning is the focus of EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT, accepted at NeurIPS 2025, which uses spatio-temporal chains of thought to study how AI models can reason from a first-person perspective. Action planning is tackled in ReCode: Unify Plan and Action for Universal Granularity Control, which introduces a unified approach to planning and acting that allows fine-grained control over AI behavior. The safety of telecom LLMs is examined in SafeCOMM: A Study on Safety Degradation in Fine-Tuned Telecom Large Language Models, which documents how telecom-specific fine-tuning can erode safety, underscoring the need for careful fine-tuning and monitoring. Lastly, instruction sensitivity is benchmarked in ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models, submitted to ICASSP 2026, which introduces a benchmark for evaluating how sensitive audio LLMs are to instruction phrasing, a key factor in their usability and effectiveness. Taken together, these papers represent a significant step forward for LLMs, from agent generation to safety alignment, and the advances highlighted here promise to shape AI development and its applications across many domains.
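To make the layer-wise merging idea concrete, here is a toy sketch that interpolates a fine-tuned model back toward a safety-aligned base on a per-layer basis. The drift-based selection rule and the threshold are assumptions made for illustration; they are not the criterion SafeMERGE actually uses.

```python
# Illustrative sketch of selective layer-wise model merging, assuming we have
# state dicts for a safety-aligned base model and a task fine-tuned model.
# The per-layer selection rule (merge only layers that drifted past a
# threshold) is a simplification, not the SafeMERGE criterion.
import torch

def selective_layer_merge(base_sd: dict, finetuned_sd: dict,
                          alpha: float = 0.5, drift_threshold: float = 0.05) -> dict:
    """Interpolate fine-tuned weights back toward the safety-aligned base,
    but only for layers whose parameters drifted noticeably during fine-tuning."""
    merged = {}
    for name, base_w in base_sd.items():
        ft_w = finetuned_sd[name]
        drift = (ft_w - base_w).norm() / (base_w.norm() + 1e-8)
        if drift > drift_threshold:
            merged[name] = alpha * base_w + (1 - alpha) * ft_w  # pull back toward base
        else:
            merged[name] = ft_w  # keep the fine-tuned weights as-is
    return merged

# Toy usage with two random "models" that share the same parameter names
if __name__ == "__main__":
    base = {"layer0.weight": torch.randn(4, 4), "layer1.weight": torch.randn(4, 4)}
    ft = {k: v + 0.1 * torch.randn_like(v) for k, v in base.items()}
    print({k: v.shape for k, v in selective_layer_merge(base, ft).items()})
```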
RAG: Retrieval-Augmented Generation Papers
Now, let's switch gears and explore the world of RAG! Retrieval-Augmented Generation is all about combining the power of pre-trained language models with external knowledge sources. It's like giving your AI a super-powered memory boost! We've got some exciting papers here that dig into everything from cross-lingual challenges to performance optimizations.
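If the retrieve-then-generate loop is new to you, here is a minimal, self-contained sketch of the basic pattern described above. The embed() and generate() functions are toy stand-ins, not any particular model, library, or vector database.

```python
# Minimal sketch of the basic RAG loop: embed a query, retrieve the most
# similar documents from a small in-memory store, and prepend them to the
# prompt before calling the language model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' -- a real system would use a dense encoder."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(prompt: str) -> str:
    """Stand-in for an LLM call; a real system would query a language model here."""
    return f"[model answer conditioned on a prompt of {len(prompt)} chars]"

if __name__ == "__main__":
    corpus = [
        "RAG augments a language model with retrieved documents.",
        "Vector databases store embeddings for fast similarity search.",
        "Fine-tuning updates model weights on new data.",
    ]
    query = "How does RAG use retrieved documents?"
    context = "\n".join(retrieve(query, corpus))
    print(generate(f"Context:\n{context}\n\nQuestion: {query}"))
```

Every paper in this section is, in one way or another, poking at a stage of this loop: what gets retrieved, in which language, from which index, and at what cost.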
One of the key challenges in RAG is dealing with different languages, and The Cross-Lingual Cost: Retrieval Biases in RAG over Arabic-English Corpora, accepted to ArabicNLP 2025, addresses this issue directly by examining retrieval biases when RAG systems operate over Arabic and English text, shedding light on how to build more effective multilingual RAG systems. Another angle on multilingual RAG appears in Quality-Aware Translation Tagging in Multilingual RAG system, showcased at the EMNLP 2025 MRL Workshop, which introduces a method for tagging translations in a way that improves the quality of multilingual retrieval, crucial wherever information must cross language barriers. For those interested in cybersecurity, Rethinking and Exploring String-Based Malware Family Classification in the Era of LLMs and RAG, a technical report from Lingnan University, Hong Kong, offers a fascinating look at how RAG can help classify malware families and combat cyber threats, with code available on GitHub. The robustness of RAG systems is put to the test in Worse than Zero-shot? A Fact-Checking Dataset for Evaluating the Robustness of RAG Against Misleading Retrievals. Accepted for presentation at NeurIPS 2025, this paper introduces a dataset designed to evaluate how well RAG systems handle misleading information, a critical property of reliable AI. On the security tooling side, SBASH: a Framework for Designing and Evaluating RAG vs. Prompt-Tuned LLM Honeypots, to be published at the 3rd International Conference on Foundation and Large Language Models (FLLM2025), IEEE, 2025, introduces a framework for designing and evaluating RAG-based honeypots, a valuable tool for cybersecurity researchers.

Efficiency and scale get plenty of attention too. SUBQRAG: Sub-Question Driven Dynamic Graph RAG, a 5-page paper with 1 figure, explores how breaking a question into sub-questions can sharpen the retrieval step, leading to more accurate and efficient results (a rough sketch of the idea follows below). Adapting RAG to new languages is explored further in Bridging Language Gaps with Adaptive RAG: Improving Indonesian Language Question Answering, a 12-page paper with 7 figures and 5 tables that improves Indonesian question answering with adaptive RAG, a step toward making AI systems accessible to more linguistic communities. Practical considerations for applying RAG to code are laid out in Practical Code RAG at Scale: Task-Aware Retrieval Design Choices under Compute Budgets, which weighs retrieval design choices against computational constraints and task-specific needs. Performance is the focus of HA-RAG: Hotness-Aware RAG Acceleration via Mixed Precision and Data Placement, a 13-page paper with 16 figures and 2 tables that accelerates RAG using mixed precision and hotness-aware data placement. And a fundamental challenge, isolating the effect of retrieving multiple documents at once, is tackled head-on in the preprint More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG.
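As promised, here is a rough sketch of the sub-question-driven retrieval idea. The question decomposition and the lexical retriever below are deliberately naive placeholders; SUBQRAG itself builds a dynamic graph over sub-questions, which this sketch does not attempt to reproduce.

```python
# Rough sketch of sub-question driven retrieval: split a complex question into
# sub-questions, retrieve evidence for each, and pool the results before
# generation. split_question() is a toy heuristic, not the paper's method.
def split_question(question: str) -> list[str]:
    """Toy decomposition: split on ' and ' / commas; a real system would use an LLM."""
    parts = [p.strip(" ?") for chunk in question.split(" and ") for p in chunk.split(",")]
    return [p + "?" for p in parts if p]

def retrieve(sub_question: str, corpus: list[str], k: int = 1) -> list[str]:
    """Stand-in lexical retriever: rank documents by word overlap with the sub-question."""
    words = set(sub_question.lower().split())
    return sorted(corpus, key=lambda d: len(words & set(d.lower().split())), reverse=True)[:k]

def sub_question_rag(question: str, corpus: list[str]) -> list[str]:
    evidence = []
    for sq in split_question(question):
        for doc in retrieve(sq, corpus):
            if doc not in evidence:  # de-duplicate pooled evidence
                evidence.append(doc)
    return evidence

if __name__ == "__main__":
    corpus = [
        "Paris is the capital of France.",
        "The Seine flows through Paris.",
        "Berlin is the capital of Germany.",
    ]
    print(sub_question_rag("What is the capital of France, and which river flows through it?", corpus))
```

The payoff of decomposition is that each sub-question gets its own retrieval pass, so evidence that would never rank highly for the full question still makes it into the context.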
A more holistic approach to optimizing RAG systems is presented in RAG-Stack: Co-Optimizing RAG Quality and Performance From the Vector Database Perspective, which examines how to co-optimize quality and performance through the lens of the vector database, a key component of RAG infrastructure. The balance between fine-tuning and RAG is explored in Balancing Fine-tuning and RAG: A Hybrid Strategy for Dynamic LLM Recommendation Updates, presented at the RecSys 2025 Industry Track, which introduces a hybrid strategy for keeping recommendation models current using both fine-tuning and retrieval. Finally, applications of RAG in the humanities are showcased in Automating Iconclass: LLMs and RAG for Large-Scale Classification of Religious Woodcuts, a 29-page paper with 7 figures that uses LLMs and RAG to classify religious woodcuts at scale.
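To give a feel for the fine-tuning versus RAG trade-off in a recommendation setting, here is a small, hypothetical routing sketch: fresh catalog items become visible immediately through the retrieval index, while accumulated interaction logs trigger periodic fine-tuning. The class, threshold, and scheduling are simplifications for illustration, not the strategy from the RecSys 2025 paper.

```python
# Hedged illustration of a hybrid update policy: retrieval handles fast-moving
# content, fine-tuning absorbs slow-moving preference signal in batches.
from dataclasses import dataclass, field

@dataclass
class HybridUpdater:
    finetune_batch_size: int = 1000               # interactions needed before a fine-tune
    index: list = field(default_factory=list)     # stand-in for a vector database
    pending_interactions: list = field(default_factory=list)

    def add_item(self, item: str) -> None:
        """New or fast-changing content goes straight into the retrieval index."""
        self.index.append(item)

    def log_interaction(self, interaction: str) -> None:
        """Slow-moving preference signal is batched for the next fine-tuning run."""
        self.pending_interactions.append(interaction)
        if len(self.pending_interactions) >= self.finetune_batch_size:
            self.run_finetune()

    def run_finetune(self) -> None:
        print(f"fine-tuning on {len(self.pending_interactions)} interactions")
        self.pending_interactions.clear()

if __name__ == "__main__":
    updater = HybridUpdater(finetune_batch_size=3)
    updater.add_item("new movie: 'October Skies 2025'")
    for i in range(3):
        updater.log_interaction(f"user clicked item {i}")
    print(f"index size: {len(updater.index)}")
```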