Fixing Missing Inline Citations: A Deep Dive

by SLV Team

Inline citations are crucial for establishing credibility and letting users verify information. Right now, those citations are missing from the agent's answers, and this article digs into why and proposes some fixes. Let's break down the problem and how we can fix it, guys!

Observed Behavior: Why Inline Citations Vanish

So, what's actually happening? Let's dig into the observed behavior to understand why our inline citations are MIA (Missing In Action).

First off, the FinalAnswer component uses renderWithCitations to display the agent's reply. However, that function only inserts links when the model emits [n] tokens. Right now, the data we're getting doesn't have those markers, so the markdown shows up untouched. Think of it like trying to bake a cake without the right ingredients – it just won't work!

renderWithCitations works by turning bracketed numbers into markdown links. It assumes that each number neatly corresponds to an index in the evidence array. Without those explicit markers, no inline links are created. It's like having a treasure map without the 'X' marking the spot. No markers, no links! We need to ensure these markers are present to guide our users to the correct sources.
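To make that concrete, here's a minimal sketch of the idea (not the actual implementation): a hypothetical linkifyCitations helper that assumes markers are one-based, so [1] resolves to the first entry in the evidence array.

```typescript
// A minimal sketch of the renderWithCitations idea, not the real implementation.
// Assumes [n] markers are one-based and that each evidence item carries a url.
interface EvidenceItem {
  url: string;
  documentName?: string;
}

function linkifyCitations(markdown: string, evidence: EvidenceItem[]): string {
  return markdown.replace(/\[(\d+)\]/g, (marker, num) => {
    const item = evidence[Number(num) - 1]; // one-based marker -> zero-based index
    if (!item) return marker; // leave unknown markers untouched
    return `[[${num}]](${item.url})`;
  });
}
```

If the numbering turns out to be zero-based (see recommendation 3 below), the `- 1` offset simply goes away.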

The Ragie streaming request is also missing the instructions field, so the agent gets no guidance on how to format citations. To make matters worse, our old prompts specifically told models to avoid citations, and that may still be influencing outputs. Essentially, the agent isn't being told to include citations, which is a problem we need to address directly.

When we parse and save the final answer, it's just whatever comes back in result.text. We don't post-process it to add markers, even though we do get structured evidence. It's like receiving a beautifully wrapped gift, but forgetting to put a tag on it.

Finally, streaming captures tool output for every step (including transfer_to_citation) into _streamedResponses. But nothing uses that buffer, so any richer citation payload returned by the tool is thrown away. It's like catching a fish and then throwing it back into the sea – all that effort for nothing!
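A quick, throwaway way to confirm what that buffer actually holds (this ties into recommendation 1 below) is something like the following sketch. The entry shape here is an assumption, and logCitationPayloads is just an illustrative name.

```typescript
// Throwaway debug helper (a sketch): dump whatever the citation step wrote into
// the _streamedResponses buffer so we can see if Ragie already returns an
// annotated answer. The entry shape is an assumption.
type StreamedEntry = { type: string; [key: string]: unknown };

function logCitationPayloads(streamedResponses: StreamedEntry[]): void {
  const citationEntries = streamedResponses.filter((entry) => entry.type === "citation");
  if (citationEntries.length > 0) {
    console.debug("[citation-debug]", JSON.stringify(citationEntries, null, 2));
  }
}
```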

Working Theory: The Case of the Missing Markers

The core issue is that the agent returns evidence metadata, but we never tell it to annotate the prose, nor do we reuse the dedicated citation step output. The result.text ends up citation-free, so the UI has no inline markers to convert and we're left with only the footer buttons. Basically, we're not giving the system the tools it needs to create inline citations, and the result is a less informative, less verifiable user experience.

Recommendations: How to Bring Back Inline Citations

Okay, so how do we fix this mess? Here are a few recommendations to get our inline citations back on track.

  1. Inspect the citation tool payload: First, let's add some temporary logging for _streamedResponses entries where type === "citation". This will confirm whether Ragie is already supplying an annotated answer. If it is, we should use that string over result.text when rendering and saving messages. This is about making sure we're actually using the data we already have. Let’s not reinvent the wheel if we don’t have to.
  2. Provide explicit formatting instructions: We need to populate the Ragie /responses request with an instructions string that tells the agent to insert inline markers like [1], tied to the evidence list order (see the first sketch after this list). This is the least intrusive way to leverage the existing renderWithCitations helper. Basically, we're giving the agent a clear roadmap to follow. The clearer the instructions, the better the outcome!
  3. Align numbering once markers exist: We need to verify whether Ragie numbers citations starting at 1 or 0. If it's zero-based, we need to adjust the link lookup in components/agentic-retriever/agentic-response.tsx so the markers still resolve to the right evidence items (i.e., so [0], not [1], points at the first one). This is all about making sure our numbers match up correctly. It's like ensuring the right key opens the right lock.
  4. Fallback plan: If we can't get the upstream output to include markers, we should consider a post-processing step. This step would map result.steps to answer.evidence IDs and inject [n] tokens using either deterministic heuristics (matching quoted snippets) or a small local prompt (see the second sketch after this list). This is our safety net, a plan B if all else fails. Sometimes, you just have to roll up your sleeves and do it yourself!
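For recommendation 2, the instructions string might look something like this. The surrounding request shape is an assumption; the point is the wording we'd send along with the existing /responses call.

```typescript
// Sketch of citation formatting instructions for the Ragie /responses request.
// The instructions field name comes from the recommendation above; everything
// else about the request shape is assumed.
const CITATION_INSTRUCTIONS = [
  "When a statement is supported by retrieved evidence, append an inline marker",
  "like [1] immediately after it.",
  "Number markers to match the order of the evidence list, starting at [1].",
  "Do not add markers for claims with no supporting evidence.",
].join(" ");

const responsesRequest = {
  // ...existing streaming request fields...
  instructions: CITATION_INSTRUCTIONS,
};
```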
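And for the fallback in recommendation 4, a deterministic post-processing pass could look roughly like the sketch below. It assumes each evidence item exposes the snippet text it was drawn from, which we'd need to confirm; injectCitationMarkers is a hypothetical name.

```typescript
// Fallback sketch: inject [n] markers ourselves by matching evidence snippets
// back into the answer text. Assumes each evidence item exposes a text snippet;
// skips anything it can't match literally rather than guessing.
interface Evidence {
  text: string;
}

function injectCitationMarkers(answer: string, evidence: Evidence[]): string {
  let annotated = answer;
  evidence.forEach((item, index) => {
    const snippet = item.text.trim().slice(0, 60);
    if (!snippet) return;
    const pos = annotated.indexOf(snippet);
    if (pos === -1) return; // no literal match, so don't guess
    // Place the marker right after the sentence containing the snippet.
    const sentenceEnd = annotated.indexOf(".", pos + snippet.length);
    const insertAt = sentenceEnd === -1 ? annotated.length : sentenceEnd + 1;
    annotated = annotated.slice(0, insertAt) + ` [${index + 1}]` + annotated.slice(insertAt);
  });
  return annotated;
}
```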

Open Questions / Next Steps: Untangling the Citation Web

Before we get too far ahead, here are some open questions and next steps we need to consider:

  • Does the transfer_to_citation tool return richer span metadata (e.g., offsets) that we could surface alongside inline markers?
  • Are there tenant-level instruction hooks we should respect to avoid conflicting directives when we add citation requirements?
  • Once markers exist, do we still want to deduplicate evidence chips by document_id, or should the chip list mirror the inline numbering one-to-one for clarity?

These questions will help us fine-tune our approach and ensure we're creating the best possible user experience. It's not just about fixing the problem, but about optimizing the solution for the long term. We need to think about scalability and maintainability.

In summary, the absence of inline citations is due to a combination of factors, including missing markers, lack of explicit instructions, and discarded tool output. By addressing these issues with the recommended steps, we can restore inline citations and improve the credibility and usability of our platform. It’s all about making the right connections and ensuring that the system has the information it needs to do its job properly. Keep your eyes peeled for updates as we tackle these challenges! Thanks, guys!