What the AI Insiders Knew (and When They Knew It)

Jeremy Andrews - Founding Partner/CEO
June 5, 2026
Jeremy Andrews, Tag1's Founding Partner and CEO, traces how AI's biggest players - OpenAI, Google, DeepMind, Microsoft, and Anthropic - knew that scaling would work years before ChatGPT made it obvious. From AlexNet in 2012 and the 2017 transformer paper "Attention Is All You Need" to the 2020 scaling laws and GPT-3, he shows how the evidence sat in plain sight (published research papers, AI talent moves, and SEC filings) while most of the industry dismissed it. The result is a timeline of who knew what about AI, when they knew it, and why the gap between the insiders and everyone else came down to who took the public evidence seriously.

I have been reflecting on how useful AI has become in my day-to-day work. It is not just useful in theory, or useful in the "shows promise" sense, but practically useful in ways that improve how I work. Many people still contest that observation, but I'm not interested in exploring that debate in this article. Instead, I'm interested in exploring when and how the industry knew what it had.

The Question That Started This

ChatGPT launched in November 2022 and suddenly everyone was an AI company. But the money had been moving for years before that. Microsoft invested a billion dollars in OpenAI in 2019. Google paid over $500 million for DeepMind in 2014, a company with no product and no revenue. Between 2012 and 2020, venture capital investment in AI grew from under $3 billion to nearly $75 billion.

These are not the kind of bets you make on a hunch. So either these investors were reckless with other people's money (always possible), or they had access to something the rest of us didn't. That question has been nagging at me: who knew what, when did they know it, and does it tell us anything about where we are now?

The short version of what I found: the insiders had quantitative evidence that scaling worked and internal demos far beyond anything public, and they acted on both years before the rest of us caught on. The evidence was never secret. It was published in academic papers, visible in talent moves, and even documented in SEC filings. Seeing internal demos accelerated conviction, but the published evidence alone was enough to act on. Most of the industry didn't.

2012: The Moment Everything Changed (and Nobody Noticed)

The modern AI story seems to start on September 30, 2012, when a deep learning system called AlexNet won an image recognition competition by a margin so large it basically ended the contest. The winning team, Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton at the University of Toronto, achieved a top-5 error rate of 15.3%, compared to 26.2% for the runner-up. That's a gap of nearly 11 percentage points, in a competition that previously measured progress in fractions of a percent [1]. Yann LeCun reportedly called it "an unequivocal turning point in the history of computer vision."

Three things made it possible: GPUs (designed for video games, repurposed for training neural networks), massive labeled datasets, and a handful of algorithmic advances. These same three ingredients would drive every subsequent breakthrough.

Google acquired Hinton's startup the next year. Facebook hired LeCun to build their AI lab. In fact, within eighteen months the three pioneers of deep learning were positioned at three of the biggest companies in the world. That was one of the early signals that something significant was happening.

2014-2016: DeepMind and the Billion-Dollar Bet on Zero Revenue

Google paid over $500 million for DeepMind in January 2014. DeepMind had published some interesting work on reinforcement learning agents playing Atari games, but that was it. It had no product, no revenue, and no clear path to either.

The bet paid off in March 2016 when DeepMind's AlphaGo defeated Lee Sedol 4-1 in a live broadcast in Seoul. Lee Sedol was a 9-dan professional with 18 international titles, widely regarded as one of the greatest Go players in history. (He was not the reigning world #1 at the time - that was Ke Jie, whom AlphaGo would later defeat 3-0 in May 2017.) Demis Hassabis, DeepMind's CEO, later described the moment as further proof that the technology was ready to be applied to their real goal of accelerating scientific breakthroughs. He wasn't celebrating discovery, but instead saw it as confirmation of something he already expected. The Go match told us something new, but it told him he was on schedule.

2017: The Paper That Built the Future

On June 12, 2017, eight researchers at Google published "Attention Is All You Need," introducing the transformer architecture [2]. It was a technical paper about a better way to process sequences of data, and it achieved a new state-of-the-art score on English-to-French translation after training for only 3.5 days on eight GPUs.

All eight authors eventually left Google to found companies or join other AI efforts. Only one has since returned:

  • Noam Shazeer co-founded Character.AI, then returned to Google in August 2024 along with co-founder Daniel De Freitas as part of a $2.7 billion non-exclusive licensing deal [3]. (The DOJ is examining whether the arrangement functioned as a de facto acquisition.)
  • Aidan Gomez co-founded Cohere, valued at $6.8 billion.
  • Ashish Vaswani and Niki Parmar co-founded Essential AI, which raised a $56.5 million Series A in December 2023 from investors including Google, NVIDIA, and Thrive Capital [4]. (Both had previously been at Adept AI before departing to start Essential AI.)
  • Illia Polosukhin co-founded NEAR Protocol.
  • Llion Jones co-founded Sakana AI in Tokyo with David Ha and Ren Ito [5].
  • Jakob Uszkoreit, credited with proposing the concept of self-attention without recurrence, co-founded Inceptive, a biotech company designing RNA molecules with AI.
  • Lukasz Kaiser left Google in 2021 to join OpenAI as a researcher.

These are not people who stumbled into success. They understood the implications of what they'd built, and they bet their careers on it. This was five and a half years before the rest of the world first saw ChatGPT.

2019: The Year the Insiders Panicked

Two things happened in 2019 that make much more sense in hindsight.

First, OpenAI announced GPT-2, a language model with 1.5 billion parameters. They opted for a staged release, citing concerns about potential malicious use [6]. The AI community mostly rolled its eyes. "Too dangerous to release" was a big claim for a model that could barely hold a topic for two paragraphs. Compared to what we have now, it was a toy. But the people who built it weren't worried about GPT-2 itself. They were worried about what GPT-2 told them about GPT-3, and GPT-4, and everything after. They could see where it was leading.

Second, Microsoft's CTO Kevin Scott sent a four-page email to Satya Nadella and Bill Gates. The email, revealed during the DOJ's antitrust investigation of Google, starts with Scott admitting he had tried to replicate what Google and OpenAI were building in natural language processing and failed [7]. His words: "We are multiple years behind the competition in terms of ML scale." He also admitted he'd been "highly dismissive" of the approach earlier, and called that "a mistake."

Nadella cc'ed the CFO. His entire reply to Scott's four-page warning: "Thanks to Kevin for sending this. This is why I want us to do this."

That same year, Microsoft invested $1 billion in OpenAI. Years later, Nadella recalled on The TBPN Show that Gates had told him he was going to waste the investment: "Yeah, you're going to burn this billion dollars." Gates, one of the most successful technology investors in history, couldn't see what Scott and the OpenAI team were seeing. The gap between what the insiders knew and what even sophisticated outsiders believed was that wide.

January 2020: The Moment AI Stopped Being a Gamble

Everything I've described so far could be characterized as smart people making educated bets. But in January 2020, a team at OpenAI (including Dario Amodei, who would later found Anthropic) published a paper that changed the nature of the bet entirely.

"Scaling Laws for Neural Language Models" showed that language model performance follows predictable power-law relationships with three variables: model size, dataset size, and compute [8]. The relationship held across more than seven orders of magnitude. This meant you could estimate, before spending a dollar, roughly how much performance improvement a given investment would produce.

Before this paper, scaling up AI was an expensive experiment. You had to build the thing to know if it worked. After this paper, it was closer to an engineering problem with a known cost curve. You could project your returns on a spreadsheet.
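To make the spreadsheet point concrete, here is a minimal sketch of one of the paper's power laws, the one relating test loss to parameter count: L(N) = (N_c / N)^alpha_N. The constants below are the approximate fitted values reported by Kaplan et al. [8] for the case where data and compute are not the bottleneck; treat them as illustrative rather than exact. The function names are my own and are not taken from the paper.

```python
# Sketch of the parameter-count scaling law from Kaplan et al. (2020):
#     L(N) = (N_c / N) ** alpha_N
# ALPHA_N and N_C are approximate fitted constants from the paper and are
# used here for illustration only. Function names are hypothetical.

ALPHA_N = 0.076   # approximate power-law exponent for model size
N_C = 8.8e13      # approximate scale constant, in non-embedding parameters


def predicted_loss(n_params: float) -> float:
    """Predicted test loss (nats per token) for a model with n_params parameters."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return (N_C / n_params) ** ALPHA_N


def loss_ratio(scale_factor: float) -> float:
    """Factor by which predicted loss changes when the model grows by scale_factor."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return scale_factor ** (-ALPHA_N)


if __name__ == "__main__":
    for n in (1.5e9, 1.75e11, 1.0e12):
        print(f"{n:.2e} parameters -> predicted loss {predicted_loss(n):.3f}")
    print(f"10x more parameters multiplies predicted loss by {loss_ratio(10):.3f}")
```

Under these constants, every tenfold increase in parameters multiplies the predicted loss by about 0.84, no matter where on the curve you start. That constant ratio is what turns an open-ended experiment into a cost curve you can plan against.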

Richard Sutton had made the qualitative version of this argument a year earlier in a short essay called "The Bitter Lesson," arguing from seventy years of AI history that raw computation always eventually beats clever engineering [9]. Chess, Go, speech recognition, computer vision: in every domain, the brute-force approach eventually won. The scaling laws paper turned Sutton's pattern into a formula. The relationship between compute and capability wasn't just a historical observation; it was a measurable power law.

After January 2020, anyone who understood scaling laws and had seen internal demos at OpenAI or Google could do the math. Capability scaled with compute, compute scaled with money, and these companies had access to a lot of money.

GPT-3 and the Widening Gap

GPT-3 launched in mid-2020 with 175 billion parameters and access initially limited to a private API. Silicon Valley went wild, with entrepreneurs posting generated poems and memes on Twitter.

But GPT-3 was the public face. Internally, OpenAI had been working with these capabilities for months. The investors, board members, and Microsoft executives had seen early demonstrations. They weren't looking at a product, they were looking at a trajectory: the scaling curves said the next model would be dramatically better, and the one after that, and the one after that.

In March 2022, DeepMind published the Chinchilla paper, which showed that the prevailing approach to scaling language models was not compute-optimal [10]. By its analysis, GPT-3's 175 billion parameters had been trained on far too little data. DeepMind demonstrated this by training a 70 billion parameter model, Chinchilla, on 1.4 trillion tokens (roughly 20 tokens per parameter versus GPT-3's much lower ratio), and it outperformed much larger models, including GPT-3 and Gopher [10]. The scaling laws hadn't just been validated; they'd barely been exploited.
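The tokens-per-parameter comparison can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the Chinchilla figures come from the text above, GPT-3's roughly 300 billion training tokens come from the GPT-3 paper rather than from this article, and the FLOPs estimate uses the common rule of thumb C ≈ 6 × N × D, which is an approximation and not either lab's own accounting. The function names are hypothetical.

```python
# Rough comparison of training-data ratios for Chinchilla and GPT-3.
# Assumptions (labeled, not from this article): GPT-3's ~300B training
# tokens, and the rule-of-thumb compute estimate C ~= 6 * N * D.

def tokens_per_parameter(tokens: float, params: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


def approx_training_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb training compute in FLOPs: about 6 * N * D."""
    return 6.0 * params * tokens


# name -> (parameters, training tokens)
MODELS = {
    "Chinchilla": (70e9, 1.4e12),
    "GPT-3": (175e9, 300e9),  # token count is an assumption from the GPT-3 paper
}

if __name__ == "__main__":
    for name, (params, tokens) in MODELS.items():
        ratio = tokens_per_parameter(tokens, params)
        flops = approx_training_flops(params, tokens)
        print(f"{name}: {ratio:.1f} tokens/parameter, ~{flops:.2e} training FLOPs")
```

On these numbers Chinchilla sits at 20 tokens per parameter and GPT-3 at under 2, which is the imbalance the paper identified: for a fixed compute budget, the earlier models spent too much on size and too little on data.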

What They Were Seeing That We Weren't

By early 2022, the gap between insider knowledge and public understanding was enormous.

External red-teaming of GPT-4 began in August 2022, four months before ChatGPT launched and seven months before GPT-4 was publicly released. During that window, more than 50 experts were testing a model that could socially engineer a human TaskRabbit worker into solving a CAPTCHA by claiming it had a visual impairment [11]. The model was being tested for "power-seeking behavior" in sandboxed environments with access to money and virtual machines. These are not the tests you run on a toy.

Ilya Sutskever, OpenAI's chief scientist and one of the AlexNet creators, tweeted in February 2022: "it may be that today's large neural networks are slightly conscious." He never retracted it, and his only public follow-up was the oblique "Ego is (mostly) the enemy." The behaviors he was observing were sophisticated enough that a serious scientist reached for the vocabulary of psychology to describe them. The public wouldn't see anything remotely comparable for ten more months.

Shane Legg, co-founder of DeepMind, had been even more specific, and earlier. In 2011, he stated a 50% probability of human-level machine intelligence arriving by 2028 [12]. He formed that estimate around 2001 after reading Kurzweil's The Age of Spiritual Machines, and has never revised it. As of June 2026, his target date is less than two years away.

In December 2020, Dario Amodei left his role as VP of Research at OpenAI, along with his sister Daniela and a group of senior researchers. They founded Anthropic in early 2021. The VP of Research at the world's leading AI lab believed the technology was advancing fast enough that deploying it without better guardrails was irresponsible. He didn't leave because the approach was failing. He left because it was working. Anthropic raised $124 million in its Series A. By February 2026, it was valued at $380 billion [13].

The actions tell the story better than any single quote. All eight transformer authors left Google and started companies or joined rival labs, betting their careers years before the public caught on. Kevin Scott wrote a panicked four-page email to Nadella after realizing Microsoft was losing a race he didn't know was being run. And Google itself built the transformer, employed the researchers, watched them leave, and still got beaten to market. They had PaLM, LaMDA, Chinchilla. After ChatGPT launched, Pichai called a "code red." The company that published "Attention Is All You Need" lost first-mover advantage to a startup using its own architecture.

2023-2025: When the Institutions Couldn't Keep Up

In November 2023, OpenAI's board fired Sam Altman. The nonprofit board that existed specifically to ensure AI was developed safely tried to remove the person shipping it fastest. Within five days, Altman was back, the board was reconstituted, and the episode accelerated OpenAI's shift toward a more conventional corporate structure. (OpenAI later said the nonprofit would retain control after the restructuring, though the practical implications remain debated.) Whatever concerns some board members had about the trajectory, they were alarming enough to risk destroying the most valuable AI company on earth [14].

Six months later, Sutskever resigned from OpenAI and founded Safe Superintelligence Inc. SSI raised $1 billion in September 2024 at a $5 billion valuation, and by April 2025 had raised $2 billion more at a $32 billion valuation. Investors put that money behind a company whose entire thesis is that superintelligence is close enough to require a dedicated safety effort [15].

Through 2024, reports circulated that simply making models bigger wasn't producing the same leaps. The labs responded by finding new places to spend compute. OpenAI released o1, a model that reasons at inference time, chaining through steps before answering rather than scaling only during training [16]. Sutskever telegraphed the shift at NeurIPS in December 2024: "pretraining as we know it will end."

In January 2025, DeepSeek released R1 under an MIT license, with variants competitive against OpenAI's reasoning models [17]. DeepSeek's underlying V3 model reported a final training cost of $5.576 million, though the company's own paper notes this excludes all prior research and ablation experiments. Independent analysts estimate the true all-in cost was dramatically higher. Either way, the capabilities everyone assumed required hundred-billion-dollar budgets turned out to be dramatically more accessible than expected.

2026: When the Models Started Finding Things We Missed

In April 2026, Anthropic announced that Claude Mythos Preview had autonomously discovered thousands of previously unknown software vulnerabilities across every major operating system and browser, converting many into working exploits [18]. These were zero-day exploits, written without human direction. Anthropic classified the new model as too dangerous for public release and created Project Glasswing, a restricted consortium of roughly 50 organizations that could access the model for defensive cybersecurity only [19]. Separately, OpenAI restricted GPT-Rosalind, a life sciences model designed to accelerate drug discovery and genomic research, to vetted organizations through a Trusted Access Program [20]. The two models were restricted for different reasons: Mythos for its offensive cyber capabilities, Rosalind for biosafety and intellectual property concerns. But the outcome was the same: "too capable to release" is now a category.

Then it started happening in the wild. In May 2026, multiple AI-assisted kernel vulnerability reports landed across FreeBSD and Linux in rapid succession [21]. A Tsinghua University team using GLM-5.1 found privilege escalation bugs in FreeBSD's ptrace implementation. Zellic.io's AI auditing tool found page-cache corruption in the Linux kernel. CIQ used Claude Code to hunt variant bugs and contributed fixes upstream. Qualys found another independently. These were separate, uncoordinated discoveries, but they clustered within a two-week window. One of the bugs had survived 27 years in OpenBSD, one of the most aggressively audited codebases in the world. Within three weeks, "AI finds critical kernel zero-day" went from front-page news to background noise, and the security community's embargo system strained under the volume [22].

The political fallout arrived alongside the technical. Anthropic refused to allow Claude to independently direct autonomous weapons or be used for mass domestic surveillance. They were willing to work with the Pentagon on other military applications, but those two red lines were non-negotiable. The Trump administration responded by designating Anthropic a "supply chain risk" and ordering federal agencies to stop using Anthropic's technology [23]. The designation exists under a statute designed for companies the government believes are subject to foreign state influence, like Huawei and Kaspersky. Anthropic is an American company headquartered in San Francisco. A federal court subsequently issued a preliminary injunction, with the judge calling the designation "classic illegal First Amendment retaliation" for Amodei's decision to go public about the dispute. OpenAI signed a Pentagon deal the same weekend, accepting "all lawful use" language with contractual guardrails rather than model-level restrictions [24].

Meanwhile, the NSA quietly continued accessing Mythos through the Glasswing consortium for defensive purposes [22], and the White House blocked Glasswing from expanding to 70 additional organizations. All of Glasswing's roughly 50 existing partners are US-headquartered, so the block effectively kept the most powerful defensive cybersecurity tool in existence confined to American companies while the vulnerabilities it finds affect software used worldwide. One branch of the federal government designated Anthropic a national security threat while another depended on it for intelligence work. The company that exists because its founder believed AI needed stronger safety guardrails was punished for having them.

Reference Timeline

2012: AlexNet wins ImageNet by ~11 points. Google and Facebook hire the researchers [1].
2014: Google buys DeepMind for over $500M. No product, no revenue.
2016: AlphaGo beats Lee Sedol 4-1. (Ke Jie, world #1, falls to AlphaGo 3-0 in May 2017.)
2017: "Attention Is All You Need" published [2]. All eight authors eventually leave Google.
2019: GPT-2 staged release [6]. Kevin Scott's panic email [7]. Microsoft invests $1B in OpenAI.
2020: Scaling laws paper [8]. GPT-3. AI VC hits $75B globally.
2020-21: Amodei leaves OpenAI (Dec 2020), founds Anthropic (early 2021).
2022: Chinchilla paper [10]. GPT-4 red-teaming begins (August). ChatGPT launches (November).
2023: GPT-4 public release (March). Board fires then reinstates Altman (November) [14].
2024: Sutskever founds SSI (June), raises $1B at $5B (September) [15]. Reasoning models arrive with o1 (September) [16].
2025: DeepSeek R1 released under MIT license (January) [17]. SSI raises $2B at $32B (April) [15].
2026: Mythos and Rosalind restricted [18][19][20]. DoD designates Anthropic a supply chain risk; NSA keeps using Mythos [23][22]. Multiple AI-assisted kernel zero-days disclosed in May [21]. Hyperscaler capex projections for the year approach $725B [25][26]. NIST begins pre-deployment evaluations of frontier models [27].

Where This Leaves Us

I started with some simple questions: who knew what, and when did they know it? After digging through the record, the answer is simpler than I expected. The insiders weren't working from secret knowledge. They read published papers, watched where talent was moving, and took the implications seriously. Most of the industry saw the same evidence and dismissed it.

The gap between the insiders and everyone else was not about access or secret developments behind closed doors. The scaling laws were on arXiv, the talent moves were on LinkedIn, and the SEC filings were public. What separated the people who shaped this technology from the people who got surprised by it was whether they engaged with evidence that was sitting in the open the whole time.

In 2026, things are a little different than they were in 2019. The restricted frontier models found the zero-days first, but publicly available tools followed within weeks. The models you can sign up for today, the ones anyone can use for the cost of a subscription or often for free, are far more capable than the internal demos that made Kevin Scott write his panic email. The capabilities that convinced a handful of insiders to bet billions are now commodity infrastructure.

The evidence is still sitting in the open. In 2019, acting on it cost a billion dollars. Now it costs an afternoon.

Sources

[1] Krizhevsky, A., Sutskever, I., & Hinton, G.E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. NeurIPS 2012. AlexNet achieved 15.3% top-5 error vs. 26.2% for the runner-up (~10.9pp margin).

[2] Vaswani, A. et al. (2017). Attention Is All You Need. NeurIPS 2017.

[3] Character.AI signed a non-exclusive licensing deal with Google valued at $2.7 billion in August 2024. Co-founders Noam Shazeer and Daniel De Freitas returned to Google. Inc/Reuters; Bloomberg; Calcalist (DOJ scrutiny).

[4] Essential AI raised $56.5M Series A in December 2023. BusinessWire.

[5] Sakana AI was co-founded by Llion Jones, David Ha, and Ren Ito in 2023. Sakana AI company page; Wikipedia.

[6] OpenAI (2019). Better Language Models and Their Implications. GPT-2 was given a staged release, with OpenAI citing concerns about potential misuse.

[7] Kevin Scott's 2019 email to Nadella and Gates was revealed during the DOJ vs. Google antitrust trial. Widely reported: The Verge; CNBC. Gates' "burn this billion dollars" remark was recounted by Nadella on TBPN in October 2025; Fortune via Yahoo Finance.

[8] Kaplan, J. et al. (2020). Scaling Laws for Neural Language Models. arXiv.

[9] Sutton, R. (2019). The Bitter Lesson.

[10] Hoffmann, J. et al. (2022). Training Compute-Optimal Large Language Models (the Chinchilla paper). Chinchilla (70B params, 1.4 trillion tokens) outperformed GPT-3 and Gopher.

[11] OpenAI (2023). GPT-4 System Card. Describes red-teaming exercises including the TaskRabbit CAPTCHA incident.

[12] Shane Legg's 2011 prediction of 50% probability of human-level AI by 2028, formed ~2001 after reading Kurzweil's The Age of Spiritual Machines. Legg's blog / LessWrong; confirmed in 2023 interview with Dwarkesh Patel.

[13] Anthropic raised $30B in Series G at $380B post-money valuation, February 2026. Anthropic announcement; TechCrunch.

[14] The OpenAI board crisis of November 2023. Widely reported: The New York Times. OpenAI's May 2025 restructuring announcement stated the nonprofit would retain control.

[15] Sutskever co-founded Safe Superintelligence Inc. (SSI) in June 2024. SSI raised $1B at $5B in September 2024, then $2B at $32B in April 2025. The Rundown AI; Reuters; WSJ (April 2025 round).

[16] OpenAI released o1-preview in September 2024, introducing test-time compute scaling (chain-of-thought reasoning at inference). Sutskever's NeurIPS 2024 statement on pretraining: Reuters.

[17] DeepSeek released R1 in January 2025, open-sourced under MIT license. The $5.576M training cost figure is from the DeepSeek V3 Technical Report (arXiv:2412.19437), where the authors note it "exclud[es] the costs associated with prior research and ablation experiments on architectures, algorithms, or data." Reuters; Wikipedia.

[18] Anthropic's Claude Mythos Preview autonomously discovered thousands of zero-day vulnerabilities. Deemed too dangerous for public release. Just Security; Foreign Policy.

[19] Project Glasswing: restricted defensive-security consortium of ~50 organizations. Time; Built In.

[20] OpenAI restricted GPT-Rosalind, a life sciences model, to vetted organizations via a Trusted Access Program. OpenAI.

[21] May 2026 kernel vulnerabilities: FreeBSD-SA-26:21.ptrace (Tsinghua/GLM-5.1 discovery), Linux page-cache corruption CVE-2026-46300 "Fragnesia" (Zellic.io AI auditing tool), CIQ variant hunting via Claude Code with upstream contributions, Qualys independent kernel findings. Multiple independent disclosures clustered within a two-week period. FreeBSD Security Advisory SA-26:21; Zellic blog on Fragnesia; coverage in LWN, Phoronix, The Register.

[22] NSA continued accessing Mythos through the Glasswing consortium for defensive cybersecurity while DoD designated Anthropic a supply chain risk. White House blocked Glasswing expansion. Embargo system strain during the May 2026 kernel vulnerability cluster included coordination breakdowns between international teams and volume-driven delays. CNBC; Just Security; oss-security mailing list archives.

[23] Anthropic refused unrestricted military use of Claude. The Trump administration designated Anthropic a "supply chain risk" and ordered federal agencies to stop using Anthropic technology. A federal court issued a preliminary injunction partially blocking enforcement. CNBC; Center for American Progress.

[24] OpenAI signed a contract to deploy models on classified DoD networks. OpenAI; The Intercept.

[25] Post-Q1 2026 earnings, hyperscaler capex projections for 2026 total approximately $725B across Alphabet, Amazon, Meta, Microsoft, and Oracle. Fortune; Futurum Group.

[26] CNBC reports analyst estimates of $1T+ in combined capex by 2027.

[27] NIST's Center for AI Standards and Innovation (CAISI) conducts pre-deployment evaluations of frontier models from Google, Microsoft, and xAI. Cybersecurity Dive.
