You’d think the pioneers behind some of the most advanced AI systems would have a grasp on how their mind-boggling creations actually think. But as it turns out, the human architects at OpenAI are still scratching their heads over the inner workings of the language models they’ve unleashed.
At a recent AI summit in Geneva, OpenAI CEO Sam Altman was refreshingly candid about the black box that is their large language model technology. When pressed on how the company’s flagship AIs like GPT-3 arrive at their often weird and inaccurate outputs, Altman admitted point-blank: “We certainly have not solved interpretability.”
The brutal honesty might sting, but it highlights a core challenge AI labs are grappling with. As these language models grow more capable and humanlike with each iteration, understanding the reasoning behind their responses has become increasingly murky. It’s the AI equivalent of looking under the hood of a car and seeing an indecipherable tangle of wires and components.
Not that OpenAI hasn’t tried lifting the veil. The company has poured enormous resources into cracking the code of interpretability – that is, decoding the artificial neurons and connections that fuel an AI’s “thinking.” But tracing the output back to the original training data has proven to be an arduous, Sisyphean task.
Altman’s counterparts know the struggle. Rival lab Anthropic recently published an in-depth look at the core concepts learned by its latest language model, Claude Sonnet. But even Anthropic conceded that the work has “really just begun” despite the substantial investment, and that mapping a model’s full set of inner representations would be “cost-prohibitive.”
The AI interpretability problem takes on existential weight when you consider the risks of advanced AI systems going rogue or developing misaligned values that could endanger humanity. How can we trust these superintelligent entities if we can’t see how they reach their conclusions?
It’s a paradox not lost on Altman. Despite OpenAI’s public commitment to AI safety, which went so far as to form an entire “Superalignment” team before abruptly disbanding it, the company seems to be flying blind when it comes to the core mechanisms driving its breakthrough AI.
For Altman, the importance of demystifying AI can’t be overstated. “The more we can understand what’s happening in these models, the better,” he stated, framing AI interpretability as a key part of making safety “claims” in the future.
But for now, the black box remains stubbornly opaque, a humbling reminder that even as AI pulls off feats that seem to border on magic, its creators are still working to shed light on the spellbinding process happening under the surface. The journey to true AI understanding has only just begun.