Where are We in the Journey to a Knowledgeable Assistant?

Discover the future of AI with Meta Chief Scientist Xin “Luna” Dong. As AI assistants move from chatbots to wearables, reliability becomes paramount. Learn how the Dual-Neural Knowledge framework targets hallucinations, ensuring your digital assistant delivers the precise, real-time accuracy needed to navigate our complex information age.

The world is in a transformative era, with AI revolutionizing industries, reshaping innovation, and unlocking opportunities once thought impossible. Its vast potential inspires optimism for a future where technology drives progress across industries, governments, NGOs, and society as a whole. However, with this promise comes significant responsibility. Concerns over bias, inequitable access, safety vulnerabilities, and ethical uncertainties highlight the urgent need for a guiding framework. RISE (Responsible, Inclusive, Safe and Ethical) AI fulfills this role, ensuring that AI technologies are developed and applied responsibly, inclusively, and ethically.

The RISE AI Conference provides a unique platform to explore how artificial intelligence can be harnessed to tackle complex contemporary societal challenges while upholding the principles of RISE. The inaugural RISE AI Conference took place from October 6-8, 2025 at the University of Notre Dame, and was hosted by the Lucy Family Institute for Data and Society.

For more information, please visit the RISE AI Conference website.

In her keynote, Xin “Luna” Dong, Chief Scientist at Meta, addressed the fundamental tension in current AI development: the struggle between the creative fluency of large language models (LLMs) and the rigid requirement for factual truth. Dong introduced the “Dual-Neural Knowledge” framework as a strategic evolution, moving beyond simple data retrieval toward a sophisticated architecture that mirrors human cognition—balancing internalized parameters with external symbolic references to ensure reliability.
The Hallucination Challenge
Dong characterized the current state of LLMs through the lens of “overconfidence,” where models frequently deliver inaccurate information with total certainty. She illustrated this using the “sister college” query: when asked about Trinity College Oxford, models often provided information about Trinity College Cambridge, an irrelevant response presented as fact. Dong humorously noted that because she is “not famous enough,” she has been able to use this same failed example for two years without the models correcting it. Beyond overconfidence, she highlighted the issue of “incompleteness” using a New York City Ballet anecdote. While the AI could identify famous past shows, it failed to list an upcoming performance she was already aware of, showing that current systems struggle with the dynamism of real-world, real-time events.
The CRAG Benchmark & Industry Performance
To quantify these failures, Dong’s team developed the Comprehensive RAG (CRAG) benchmark, featuring 4,400 question-answer pairs across five domains: Finance, Sports, Movie, Music, and Open Domain. This benchmark evaluates information across four dimensions, including dynamism and entity popularity. Crucially, it tracks “Head, Torso, and Tail” entities, noting that 95% to 99.9% of all entities reside in the “long-tail” of less-common knowledge where LLMs struggle most. The results were sobering: LLM-only solutions achieved only 30% accuracy. While industry state-of-the-art (SOTA) solutions perform better, their “perfect” accuracy rate remains below 63%, with most sitting at approximately 50%.
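Dong summarized the scoring philosophy behind this evaluation as “accuracy minus hallucination”: a confident wrong answer is penalized, while an honest “I don't know” is merely neutral. A toy scoring function in that spirit (the label names and weights here are illustrative, not CRAG's official grading code) might look like:

```python
from collections import Counter

def factuality_score(labels):
    """Factuality = accuracy - hallucination rate.

    Each graded answer is labeled one of:
      "accurate"     -- correct answer            (counts +1)
      "missing"      -- model declined to answer  (counts  0)
      "hallucinated" -- confident wrong answer    (counts -1)
    """
    counts = Counter(labels)
    n = len(labels)
    accuracy = counts["accurate"] / n
    hallucination = counts["hallucinated"] / n
    return accuracy - hallucination

# A model that answers everything but is wrong half the time scores 0.0,
# no better than a model that refuses every single question.
print(factuality_score(["accurate", "hallucinated"] * 5))    # 0.0
print(factuality_score(["accurate", "missing", "missing"]))  # ~0.333
```

Under a metric like this, guessing is strictly worse than abstaining, which is exactly the incentive a factuality benchmark is designed to create.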
The Dual-Neural Knowledge Solution
The proposed remedy is a “Dual-Neural Knowledge” framework, distinguishing between “neural” knowledge (facts internalized within the model) and “symbolic” knowledge (external data found in Knowledge Graphs). Dong posed three research questions: when to trigger a search vs. rely on internal memory, how to improve RAG robustness against retrieval “noise,” and how to internalize facts efficiently. On the final point, she introduced “Extended Memory Layers.” This breakthrough allows the model to store more data with higher accuracy than standard fine-tuning or LoRA (Low-Rank Adaptation), all while remaining significantly cheaper to implement than full fine-tuning.
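The first of Dong's research questions, when to trigger a search versus rely on internal memory, can be illustrated with a toy confidence-threshold router. Everything below (the memory contents, confidence values, and threshold) is invented for illustration; a real system would derive confidence from model logits or a learned classifier rather than a lookup table:

```python
# Toy sketch of the neural-vs-symbolic routing decision.
INTERNAL_MEMORY = {  # facts internalized in model parameters, with confidence
    "capital of france": ("Paris", 0.97),                     # head entity
    "trinity college oxford founder": ("Thomas Pope", 0.35),  # long-tail entity
}

EXTERNAL_STORE = {   # symbolic knowledge, e.g. a knowledge graph or search index
    "trinity college oxford founder": "Sir Thomas Pope (1555)",
}

def answer(query, threshold=0.8):
    fact, confidence = INTERNAL_MEMORY.get(query.lower(), (None, 0.0))
    if confidence >= threshold:
        return fact                        # neural path: trust internal memory
    retrieved = EXTERNAL_STORE.get(query.lower())
    if retrieved is not None:
        return retrieved                   # symbolic path: ground in retrieval
    return "I don't know."                 # admit uncertainty over guessing

print(answer("capital of France"))               # Paris
print(answer("Trinity College Oxford founder"))  # Sir Thomas Pope (1555)
```

The well-known fact stays on the cheap internal path, while the long-tail entity routes to external knowledge, mirroring the head/torso/tail distinction the benchmark highlights.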
The Next Frontier: Wearables
The ultimate application of this research is the transition from chatbots to multimodal wearable AI. Dong described a future where devices “see what you see.” She compared Meta’s glasses to the Humane AI Pin and Rabbit R1, noting that while generic devices might identify a “domestic cat,” Meta’s goal is “expert” assistance. In one demo, Meta’s AI successfully identified a “Great Dane puppy” named Maggie, while competitors failed to provide specific breed details. By mastering “noise-robust” summarization and real-time retrieval, these wearables can provide personalized, trustworthy assistance in daily life: from identifying a dog breed to remembering where you left your keys.
These results make one point clear: an industry-wide standard for factuality is essential before knowledgeable assistants can be trusted in daily use. Dong’s talk pointed to several practical takeaways:

  • Calibrate Model Confidence: LLMs are naturally overconfident; developers must implement “dampening factors” and instruction tuning to teach models to admit when they do not know an answer.
  • Synthesize Retrieval and Generation: Factuality often drops when models are overwhelmed by “retrieval noise.” High-performing systems must balance retrieval recall with noise-robust summarization to maintain accuracy.
  • Internalize via Extended Memory Layers: To move beyond the limitations of standard fine-tuning, “extended memory layers” provide a higher-accuracy, lower-cost method for baking long-tail facts into model parameters.
  • Leverage the Data Flywheel: Continuous improvement relies on a feedback loop where real-world interactions and reinforcement learning refine the model’s ability to provide engaging and accurate responses.
  • Segment Information Dynamism: A knowledgeable assistant must categorize data based on its rate of change, distinguishing between static facts (historical dates) and real-time data (stock prices) to adjust its retrieval strategy accordingly.
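The dynamism-segmentation idea above can be sketched as a freshness policy that maps each class of fact to a maximum acceptable cache age. The classes and age limits below are hypothetical, chosen only to make the pattern concrete:

```python
from enum import Enum

class Dynamism(Enum):
    STATIC = "static"        # historical dates, founders
    SLOW = "slow-changing"   # a team's head coach, a company's CEO
    FAST = "real-time"       # stock prices, live game scores

# Hypothetical policy table: how fresh must the evidence be?
MAX_AGE_SECONDS = {
    Dynamism.STATIC: None,         # internalized/cached answer is always fine
    Dynamism.SLOW: 7 * 24 * 3600,  # re-retrieve roughly weekly
    Dynamism.FAST: 60,             # effectively always hit a live source
}

def needs_live_retrieval(dynamism, cached_age_seconds):
    """Decide whether a cached answer is stale for this class of fact."""
    limit = MAX_AGE_SECONDS[dynamism]
    return limit is not None and cached_age_seconds > limit

print(needs_live_retrieval(Dynamism.STATIC, 10**9))  # False
print(needs_live_retrieval(Dynamism.FAST, 120))      # True
```

Static facts can be internalized or cached indefinitely and real-time data always triggers live retrieval; only the slow-changing middle band needs an explicit staleness check.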

  • “For me, I contribute to providing truthful trustworthy information to our users.” — Xin “Luna” Dong
  • “Large language models know a little bit about what they know but they tend to be overconfident.” — Xin “Luna” Dong
  • “The factuality… is accuracy minus hallucination.” — Xin “Luna” Dong
  • “We are building the next generation intelligent assistance for wearable devices… it better to be trustworthy.” — Xin “Luna” Dong
  • “The best answer for those [conflicting] information is to give the attribution… rather than make the decisions for them.” — Xin “Luna” Dong

Health and Society · Science and Technology · Digest 157 · Lucy Family Institute for Data & Society · University of Notre Dame · Artificial Intelligence
