Understanding Hallucination Rates in Language Models: Insights from Training on Knowledge Graphs and the Challenges of Detecting Them

Language models (LMs) show improved performance with increased size and training data, but the relationship between model scale and hallucinations has remained poorly understood. Defining hallucinations in LMs is challenging because they manifest in many different ways. A study by Google DeepMind focuses on hallucinations where the correct answer is present verbatim in the training data. The study finds that achieving low hallucination rates requires larger models and more computational resources than previously believed, and that detecting hallucinations becomes harder as LM size increases. Knowledge graphs (KGs) are seen as a promising way to provide structured, factual training data for LMs, potentially reducing hallucinations.

The research explores the connection between LM scale and hallucinations, focusing specifically on cases where the correct answer appears in the training data. Using a knowledge graph (KG)-based dataset, which gives precise control over what the models are exposed to, the researchers train LMs of increasing size. The results suggest that larger, longer-trained LMs hallucinate less, but that reaching low hallucination rates demands more resources than expected. An inverse relationship is also found between LM scale and how easily hallucinations can be detected.

Defining and quantifying hallucinations in natural-language settings is complex because language is ambiguous and because it is unclear what knowledge the training data actually contains. Despite advances in generative abilities, hallucinations persist as a significant issue for LMs. The study addresses the gap in understanding how hallucinations relate to model scale. Knowledge graphs provide a structured basis for LM training, offering a straightforward method of fact verification and a quantifiable measure of hallucination, as sketched below.
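
To make the idea concrete, here is a minimal sketch (not the study's actual code) of how KG triplets can serve both as fully controlled training text and as an exact fact checker; the toy graph, `serialize_triplet`, and `is_factual` are illustrative names introduced here, not taken from the paper.

```python
# Minimal sketch: knowledge-graph triplets as controlled training data.
# Each (subject, predicate, object) triplet becomes one text sample, and any
# generated statement can be checked exactly against the graph.
from typing import Dict, Set, Tuple

# Hypothetical toy knowledge graph: (subject, predicate) -> set of valid objects.
KnowledgeGraph = Dict[Tuple[str, str], Set[str]]

toy_kg: KnowledgeGraph = {
    ("Marie Curie", "field"): {"physics", "chemistry"},
    ("Marie Curie", "born_in"): {"Warsaw"},
    ("Alan Turing", "field"): {"computer science", "mathematics"},
}

def serialize_triplet(subject: str, predicate: str, obj: str) -> str:
    """Turn one triplet into a plain-text training sample."""
    return f"{subject} {predicate.replace('_', ' ')} {obj}."

def is_factual(kg: KnowledgeGraph, subject: str, predicate: str, obj: str) -> bool:
    """A completion counts as factual iff the triplet appears in the graph."""
    return obj in kg.get((subject, predicate), set())

if __name__ == "__main__":
    # Every training sentence is derived from a triplet, so the knowledge
    # content of the corpus is known exactly.
    corpus = [
        serialize_triplet(s, p, o)
        for (s, p), objects in toy_kg.items()
        for o in objects
    ]
    print(corpus)
    # "Paris" is not a valid object for ("Marie Curie", "born_in") -> hallucination.
    print(is_factual(toy_kg, "Marie Curie", "born_in", "Paris"))  # False
```

Because the corpus is generated entirely from the graph, "the correct answer is present in the training data" can be verified mechanically rather than estimated.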

The research constructs a dataset from knowledge graph triplets, allowing precise control over the training data and a measurable assessment of hallucination. LMs are trained from scratch on this dataset by optimizing the auto-regressive log-likelihood. Evaluation involves prompting a model with a subject and predicate and checking the completed object against the knowledge graph. Hallucination detection performance is evaluated with token-level tasks and detector heads, focusing on instances where the correct answer is present in the training set.
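
The evaluation loop described above can be sketched as follows; this is an assumed illustration, not the study's implementation, and `generate_object` is a hypothetical stand-in for whatever decoding the trained model uses.

```python
# Minimal sketch of the evaluation: prompt with (subject, predicate), take the
# model's completed object, and score it against the knowledge graph.
from typing import Callable, Dict, List, Set, Tuple

KnowledgeGraph = Dict[Tuple[str, str], Set[str]]

def hallucination_rate(
    kg: KnowledgeGraph,
    eval_pairs: List[Tuple[str, str]],
    generate_object: Callable[[str], str],
) -> float:
    """Fraction of prompts whose completed object is not licensed by the graph."""
    errors = 0
    for subject, predicate in eval_pairs:
        prompt = f"{subject} {predicate.replace('_', ' ')}"
        completion = generate_object(prompt).strip().rstrip(".")
        if completion not in kg.get((subject, predicate), set()):
            errors += 1
    return errors / max(len(eval_pairs), 1)

if __name__ == "__main__":
    kg: KnowledgeGraph = {("Marie Curie", "born_in"): {"Warsaw"}}
    # A stand-in "model" that always answers "Paris": every completion
    # contradicts the graph, so the rate is 1.0.
    rate = hallucination_rate(kg, [("Marie Curie", "born_in")], lambda _p: "Paris.")
    print(f"hallucination rate: {rate:.2f}")
```

A detector in this setting could likewise be imagined as a small classifier over the model's internal states, trained to flag completions that disagree with the graph, though the exact detector architecture used in the study is not specified here.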

In conclusion, larger language models with extended training exhibit reduced hallucination rates, but achieving minimal hallucinations requires significant computational resources. Increasing dataset size leads to higher hallucination rates when model size and training epochs are held constant. A balance between memorization and generalization is crucial: extended training can improve fact retention but may hinder adaptability to new data. The study underlines the complex relationship between model scale, dataset size, and hallucination rates, emphasizing the need to balance these factors to mitigate hallucinations effectively.