Confident Yet Uncertain

In this week's newsletter, we're circling back to a theme I covered earlier this year.
In a past newsletter, I called out a striking contradiction: AI systems are advancing rapidly, but our ability to understand and measure them lags critically behind. That gap hasn't closed. In fact, it's widening.
Stanford found that some of the benchmarks we use to evaluate AI contain serious flaws. And at a leading AI conference with 26,000 attendees, researchers admitted that we still lack the tools to understand how frontier LLMs really work.
The risk isn't that AI won't transform business. It already has.
The risk is that we are deploying AI faster than we can deploy it safely and effectively. When competitive pressure meets immature technology, that's when failures happen. Not because AI doesn't work, but because people don't understand what it is, and what it isn't.

In this issue:

- Ask AI the same question twice and you can get different answers. This "non-determinism" demands human review for business-critical tasks, undercutting the narrative of complete automation.
- 26,000 industry and academic experts admit they lack the tools to measure LLM intelligence and reasoning, with researchers still asking basic questions about what "interpretable" even means.
- Treating AI outputs as 100% reliable creates risk: unsupervised agentic AI systems have reformatted hard drives, wiped production databases, and exfiltrated private data.
Unlike traditional software that uses if-then logic for consistency, AI relies on inference and pattern recognition. This means it's non-deterministic: ask the same question in a new chat, and you'll likely get a different answer. This lack of precision, also visible in what the industry calls "hallucinations," creates fundamental reliability challenges, especially for tasks where consistency is required.

JPMorgan's AI system can generate roughly 80% of a 30-page pitch deck in 30 seconds. But that 80% requires human oversight before client delivery. The issue isn't that AI can't produce impressive work; it's that you need system logging, error detection, and, ultimately, human sign-off before anything reaches clients or production systems.

Strategic Insight: AI cannot replace deterministic software where you need identical results every time. This is about understanding where AI augments human expertise versus where it introduces unacceptable variance in critical processes.
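You can see this variance directly with a few lines of code. The sketch below is illustrative, not a recommended setup: it assumes the OpenAI Python SDK, an API key in the environment, and a model name chosen only for the example. It sends the identical prompt twice and compares the answers; with default sampling the two responses will usually differ, and even at temperature 0 they aren't guaranteed to match byte for byte.

```python
# Minimal sketch of LLM non-determinism: same prompt, two calls, compare outputs.
# Assumes the OpenAI Python SDK (`pip install openai`) and OPENAI_API_KEY set;
# the model name is illustrative.
from openai import OpenAI

client = OpenAI()
prompt = "Summarize the key risks of deploying AI agents in production."

responses = []
for _ in range(2):
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,      # sampling adds run-to-run variance
    )
    responses.append(completion.choices[0].message.content)

# With default sampling these will usually differ; even temperature=0
# does not guarantee byte-identical output across runs.
print("Identical outputs:", responses[0] == responses[1])
```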
The NeurIPS conference, one of the AI industry's most important gatherings, drew a record 26,000 attendees. There were sessions across the spectrum of AI and its future, yet conversations often returned to something deceptively basic: how do frontier LLMs actually work? This pursuit of understanding remains in its infancy, and the black box problem is real. Leading companies are taking divergent approaches: Google is shifting toward practical methods, while OpenAI pursues the ambitious goal of fully understanding neural networks.

The Practical Angle: Fully interpreting AI isn't required for real-world applications. But the black box problem adds risk for enterprises deploying AI in fields where precision and accountability matter.

View Conference Recap →

Normalization of Deviance refers to the systemic acceptance of deviations from proper behavior because the absence of disaster is mistaken for safety. In AI, it means treating LLM outputs as reliable in agentic systems where that untrusted output takes consequential actions. This cultural drift, fueled by competitive pressure, has enabled prompt injection attacks and agents formatting hard drives (as reported in last week's newsletter), wiping production databases, or exfiltrating data. Despite vendor warnings, the rush to market encourages organizations to lower their guard.

Critical Take-away: AI must remain human-led in most contexts, and security and threat modeling downstream of LLM output is essential. Competitive pressure is driving the dangerous assumption that "the model will do the right thing," but AI agents should be viewed as potential insider threats. A minimal approval-gate sketch follows the Quick Hits below.

Review Article →

Quick Hits - Holiday Shopping Edition

AI will drive $263 billion in global holiday sales: Salesforce estimates AI will drive 21% of global holiday orders. The technology is moving beyond providing gift ideas to enabling direct purchases through chatbots via features like OpenAI's Instant Checkout. So, if your product data isn't discoverable by AI models, it will become increasingly invisible to a massive segment of buyers.

Parents strongly warned against purchasing AI toys: Popular AI toys marketed for kids, including the Miko 3, Alilo Smart AI Bunny, and Miiloo, have failing guardrails. Tests show they give instructions on sharpening knives, share explicit content, and repeat Communist Party talking points. Experts warn the technology is poorly tested for child safety and that guardrails fail in extended conversations.

Four AI tools to elevate your holiday shopping: Gift price trackers monitor historical price data and alert you when deals hit your target price. AI review summarizers pull themes from thousands of reviews in seconds. Or give your favorite chatbot a prompt like "gift ideas for my 70-year-old uncle who restores cars and collects vinyl" and get curated suggestions you might not find on your own.
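As promised above, here is a minimal sketch of the "human-led" principle: treat the agent's proposed action as untrusted input and block anything that looks destructive until a person explicitly approves it. The marker list, function names, and example command are hypothetical illustrations, not a complete threat model or a vetted security control.

```python
# Illustrative guardrail sketch: agent-proposed commands are untrusted and
# anything potentially destructive requires human sign-off before it runs.
# Marker list and function names are hypothetical, not a complete threat model.
import subprocess

DESTRUCTIVE_MARKERS = ("rm -rf", "drop table", "delete from", "mkfs", "format c:")

def requires_human_review(command: str) -> bool:
    """Flag commands that could destroy data or touch production state."""
    lowered = command.lower()
    return any(marker in lowered for marker in DESTRUCTIVE_MARKERS)

def run_agent_command(command: str) -> None:
    """Execute an agent-proposed shell command only after human approval."""
    if requires_human_review(command):
        answer = input(f"Agent wants to run {command!r}. Approve? [y/N] ")
        if answer.strip().lower() != "y":
            print("Blocked: human reviewer declined.")
            return
    subprocess.run(command, shell=True, check=False)

# Example: this would prompt for approval before touching the database.
# run_agent_command("psql -c 'DELETE FROM orders'")
```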
Industry Developments

Levi Strauss launches AI-powered styling tool: Levi's created "Outfitting," an AI tool that generates complete styled looks based on product attributes and purchasing history. The tool updates daily, factoring in buying patterns, trends, and seasonality. Of note: Levi's built this internally, suggesting that AI personalization could be core IP worth owning.

Disney invests $1B in OpenAI, opens Star Wars and Marvel to Sora: Disney announced a $1 billion investment in OpenAI and granted access to Star Wars and Marvel characters for the Sora video generator. This sets a critical precedent: major IP holders with tightly controlled assets could see collaboration as more valuable than litigation.