Most large language models (LLMs) tend to produce similar, predictable answers, limiting their creativity and usefulness in open-ended tasks. Australian startup Springboards is addressing this issue with Flint, an LLM trained to generate a wider range of responses, challenging the prevailing groupthink among AI chatbots.
- Most LLMs produce repetitive and predictable answers to open-ended prompts.
- Springboards’ Flint encourages varied and creative responses, countering groupthink.
- Researchers confirm LLM homogeneity, linking it to similar training data and methods.
What happened
Asking popular chatbots for a random number again and again reveals a notable pattern: models like ChatGPT and Claude tend to return the same or similar values, with little variation. This predictability extends beyond numbers. When asked open-ended questions, these models often generate closely related answers, which points to a narrow range of responses.
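The repeated-request test described above can be put in numbers. The following is a minimal sketch, assuming the replies have already been collected into a list; the sample values are invented for illustration and are not measurements from any real chatbot, and the function names `response_entropy` and `distinct_ratio` are hypothetical. It computes two simple diversity measures: the share of distinct answers and the Shannon entropy of the answer distribution.

```python
import math
from collections import Counter


def response_entropy(responses):
    """Shannon entropy, in bits, of the empirical distribution of responses."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    total = len(responses)
    # Each distinct answer contributes p * log2(1 / p), where p is its share.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def distinct_ratio(responses):
    """Fraction of responses that are unique values."""
    if not responses:
        raise ValueError("responses must not be empty")
    return len(set(responses)) / len(responses)


# Invented sample: a model that keeps returning the same few numbers.
repetitive_sample = [7, 7, 42, 7, 37, 7, 42, 7]
# Invented sample: eight different answers in eight requests.
varied_sample = [1, 2, 3, 4, 5, 6, 7, 8]

print(distinct_ratio(repetitive_sample), response_entropy(repetitive_sample))
print(distinct_ratio(varied_sample), response_entropy(varied_sample))
```

A lower entropy and a lower distinct ratio both indicate that the replies cluster on a few values. With only a handful of requests the estimates are noisy, so a larger sample would be needed before drawing conclusions about any particular model.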
Recognizing this limitation, Australian startup Springboards developed Flint, a language model explicitly trained to produce a broader spectrum of answers. Flint’s approach is unconventional: it intentionally embraces hallucinations, meaning responses that deviate from strict factual accuracy, to foster creativity and avoid the homogenized output seen in mainstream models.
Why it matters
The widespread conformity among LLMs, described here as groupthink, restricts their usefulness in tasks requiring originality, such as brainstorming, creative writing, and marketing. It can also mislead users, who may believe their chatbot interactions are unique when in fact they often receive responses common to millions of others.
Research presented at the NeurIPS conference in late 2024 highlighted this phenomenon by demonstrating that a range of different models produced remarkably similar metaphors and answers to open-ended questions. This finding points to systemic issues in how LLMs are trained: they typically learn from overlapping datasets using similar methodologies, which narrows the range of ideas they produce.
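One rough way to check whether different models give similar answers is to compare their replies to the same prompt pairwise. The sketch below is an assumption-laden illustration, not the method used in the research mentioned above, which is not described here. It uses Jaccard similarity over lowercase word sets, a crude lexical proxy; the function names and the example replies are invented for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between the lowercase word sets of two strings."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty replies are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def mean_pairwise_similarity(responses):
    """Average Jaccard similarity over every pair of responses."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        raise ValueError("need at least two responses to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Invented replies standing in for three different models' answers
# to the same open-ended prompt.
replies = [
    "time is a river",
    "time is a flowing river",
    "time is a thief",
]

print(mean_pairwise_similarity(replies))
```

A score near 1 means the replies share most of their words, and a score near 0 means they share almost none. Word overlap misses paraphrases, so a fuller comparison would need a measure of meaning rather than wording.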
What to watch next
As AI adoption grows, the demand for more creative and less predictable language models is likely to increase. Innovations like Springboards’ Flint could set new standards for the industry by championing diversity in AI-generated content. Observers should track how such models perform commercially and whether they influence broader AI training paradigms.
Further research is expected on balancing factual accuracy with creativity to refine model outputs without sacrificing reliability. The community will also watch how leading AI providers respond to critiques of homogeneity, whether by varying training data and architectures or by encouraging more divergent responses to enhance user experience.