This Week in Startups showcased three companies from their Twist 500 list that represent radically different approaches to artificial intelligence's next frontier. While most AI companies chase incremental improvements in language models, these startups tackle fundamental problems: biological computing, model evaluation, and labor market inefficiencies.
Key Takeaways
- Cortical Labs fuses biological neurons with digital chips, creating hybrid computers that run on glucose and learn through electrical feedback
- Turing addresses the critical gap in AI evaluation by using frontier humans to break frontier models, generating data that advances capabilities
- Mercor leverages AI interviews to revolutionize talent matching, achieving 165% net revenue retention by solving the labor market's core inefficiencies
- The shift from pre-training to post-training has created massive demand for expert-generated data across specialized domains
- Biological computing offers unprecedented energy efficiency, using 0.0001 watts compared to traditional silicon-based systems
- Remote work's persistence enables global talent pools, making centralized matching platforms increasingly valuable
- Hardware-first approaches in emerging compute paradigms require patient capital willing to fund infrastructure before software applications emerge
- Sample-constrained environments favor biological systems over reinforcement learning agents that rely on accelerated simulation
- Private AI evaluations matter more than public benchmarks for companies optimizing specific user distributions
Biological Computing Breakthrough at Cortical Labs
Cortical Labs represents perhaps the most audacious computing paradigm shift since the invention of the transistor. CEO Hon Weng Chong's journey began in 2017 when he discovered a paper by Demis Hassabis arguing that AI development should reconnect with neuroscience. This led him to the University of Melbourne's neuroscience department, where researchers introduced him to multi-electrode arrays capable of interfacing with living neurons.
- The Pong experiment proved biological systems could learn through structured electrical feedback, with predictable organized bursts for correct actions and white noise for mistakes, demonstrating a practical application of Karl Friston's free energy principle (sketched in code after this list)
- Neurons communicate electrically just like computer chips, creating a shared language that enables true biological-digital hybrid systems rather than mere interfaces
- The CL1 commercial device keeps neurons alive for up to six months in sealed environments, protecting them from contamination, since they lack immune systems and would "catch a cold" like humans
- Energy consumption calculations reveal biological systems use 0.0001 watts for 800,000 to a million neurons (roughly 0.1 nanowatts per neuron), powered entirely by glucose metabolism rather than electricity
- Sample efficiency comparisons show biological systems outperform reinforcement learning agents when constrained to real-time data collection, eliminating the artificial advantage of accelerated simulation
- Manufacturing challenges include life support plumbing, neural interfacing systems with FPGAs, and domain-specific programming languages for precise neuron stimulation at individual channels
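The feedback mechanism from the Pong experiment is simple enough to sketch. The snippet below is a conceptual illustration only, not Cortical Labs' actual stimulation code; the frequencies, pulse counts, and function names are assumptions. The core idea follows Friston's free energy principle: the culture adapts to make its sensory input predictable, so a predictable stimulus rewards a hit and unpredictable noise penalizes a miss.

```python
import random

# Conceptual sketch of the Pong feedback loop described above. Not Cortical
# Labs' stimulation code: frequencies, pulse counts, and function names here
# are illustrative assumptions.

def structured_burst(n_pulses: int = 10) -> list[float]:
    """Predictable stimulus: a regular pulse train at a fixed frequency (Hz)."""
    return [75.0] * n_pulses

def white_noise_burst(n_pulses: int = 10) -> list[float]:
    """Unpredictable stimulus: random frequencies, the 'white noise' penalty."""
    return [random.uniform(1.0, 150.0) for _ in range(n_pulses)]

def feedback_for(paddle_hit: bool) -> list[float]:
    # Free energy principle: if the culture acts to make its inputs more
    # predictable, then predictability itself can serve as the reward signal.
    return structured_burst() if paddle_hit else white_noise_burst()
```

Because the reward is predictability rather than an explicit score, no gradient or loss function needs to be communicated to the culture; the stimulation pattern alone carries the training signal.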
The commercial implications extend far beyond novelty. Traditional AI systems require massive datasets accumulated over decades of internet development. Biological systems excel in domains lacking large training corpora, particularly real-time applications where data collection cannot be artificially accelerated.
Hon explains the fundamental constraint: "If you think about reinforcement learning systems, nobody really talks about this but the way they actually learn in a simulation is that they spawn millions of parallel processes and they speed up the time of the game by 200 or 300-fold." Real-world applications cannot compress time, making biological systems particularly valuable for robotics, autonomous systems, and dynamic environments requiring immediate adaptation.
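A back-of-envelope calculation, using the parallel-process count and speed-up from the quote above (the frame rate and single-robot assumption are mine), makes the gap concrete:

```python
# Sample-throughput comparison between accelerated simulation and a
# real-time physical system, using the figures quoted above.
parallel_envs = 1_000_000     # "millions of parallel processes"
time_compression = 250        # "200 or 300-fold" speed-up, taken at midpoint
frames_per_second = 60        # assumed real-time frame rate

sim_throughput = parallel_envs * time_compression * frames_per_second
real_throughput = 1 * frames_per_second   # one physical system, wall-clock time

print(f"simulated samples/sec:  {sim_throughput:,}")        # 15,000,000,000
print(f"real-world samples/sec: {real_throughput:,}")       # 60
print(f"advantage: {sim_throughput // real_throughput:,}x") # 250,000,000x
```

Strip away that 250-million-fold sampling advantage and sample efficiency, the dimension on which biological systems are claimed to excel, becomes the deciding factor.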
Demand has exceeded expectations by 100x, with over 3,000 cloud platform signups despite capacity for only 20-30 concurrent users. Hardware orders from research labs worldwide have created supply chain challenges as the company scales from proof-of-concept to commercial production. The capital intensity of pre-building units before payment creates cash flow pressures typical of hardware startups entering uncharted territory.
Turing's Frontier Human Approach to AI Evaluation
The AI evaluation landscape has evolved dramatically as models approach human-level performance across traditional benchmarks. Turing CEO Jonathan Siddharth identifies a critical inflection point: benchmark saturation combined with the need for increasingly sophisticated evaluation methods. SWE-bench coding evaluations jumped from 2% to 60% accuracy in just two years, approaching the saturation threshold where tests no longer differentiate model capabilities.
- LLM evaluation requires three dimensions according to Turing's framework: complexity for truly difficult tasks, real-world applicability for human-relevant problems, and diversity across broad test case ranges
- Public benchmarks serve recruitment and bragging rights, but private evaluations matter more for optimizing models against specific user query distributions that companies actually serve
- The shift from pre-training to post-training has moved the battleground from internet data consumption to expert-generated question-answer pairs that expose model weaknesses
- Frontier models now require frontier humans with PhDs from Stanford, Berkeley, and MIT to identify failure modes; earlier generations of models could be broken by generalist contractors
- Specialized expertise reaches granular levels, with Turing hiring experts in dark matter, black holes, and molecular biology to generate domain-specific training data
- Reinforcement learning gym environments clone real business applications like DoorDash, Uber Eats, and Salesforce, allowing agents to learn multi-step workflows through trial and error (see the sketch after this list)
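To make the gym-environment pattern concrete, here is a minimal Gymnasium-style sketch of a multi-step ordering workflow. Turing's actual environments, action schemas, and reward shaping are not public; the environment name, actions, and reward values below are all illustrative assumptions.

```python
import gymnasium as gym
from gymnasium import spaces

class FoodOrderEnv(gym.Env):
    """Toy clone of a food-delivery workflow (search -> add to cart -> checkout).

    Illustrative only: shows the gym pattern of rewarding a correct
    multi-step sequence, not any production environment.
    """
    ACTIONS = ["search", "add_to_cart", "checkout"]

    def __init__(self):
        self.action_space = spaces.Discrete(len(self.ACTIONS))
        self.observation_space = spaces.Discrete(len(self.ACTIONS) + 1)
        self.stage = 0  # which workflow step the agent has reached

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.stage = 0
        return self.stage, {}

    def step(self, action):
        if action == self.stage:                 # correct next step in the workflow
            self.stage += 1
            terminated = self.stage == len(self.ACTIONS)
            reward = 1.0 if terminated else 0.1  # shaping plus a completion bonus
        else:                                    # wrong step: penalize, stay in place
            terminated, reward = False, -0.1
        return self.stage, reward, terminated, False, {}
```

An agent learns by trial and error that only the search, add-to-cart, checkout sequence pays off, which is the same structure, at toy scale, as learning a multi-step business workflow.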
The data generation process has become increasingly sophisticated. Teams of experts work collaboratively, with physics PhDs asking theoretical questions, software engineers building simulations to test theories, and data scientists analyzing results. This "Avengers" approach scales beyond individual expertise limitations.
Siddharth emphasizes the platform nature of their solution: "You need a platform or a partner that can scale up very quickly to elite talent, manage the talent to make sure the talent is generating data part-time, the data is high quality." The part-time structure keeps experts sharp in their primary fields while they contribute specialized knowledge to AI advancement.
Turing's business model captures value across the evaluation-to-improvement pipeline. They identify model gaps through rigorous testing, then generate the precise training data needed to address those weaknesses. This closed-loop approach has driven nine-figure revenue growth while maintaining profitability, positioning them strategically as Meta's investment in Scale AI creates neutrality concerns among competing labs.
Mercor's Vision for Global Labor Market Transformation
Mercor tackles what CEO Brendan Foody calls "the largest, most inefficient market in the world" through AI-powered talent matching. The fundamental problem stems from manual processes that limit candidates to applying for dozens of jobs while companies can only evaluate tiny fractions of available talent pools.
- The core inefficiency manifests as a matching problem where candidates access limited opportunities while companies miss vast talent pools due to manual resume review and interview processes
- Mercor's AI interviewer, first built in March 2023, has evolved from hallucination-prone early versions to sophisticated systems that conduct, evaluate, and score candidate interactions at scale
- The platform facilitates matches across knowledge work domains, with average pay rates exceeding $90 per hour, positioning it in premium talent markets rather than crowdsourcing platforms
- Six of the "Magnificent Seven" tech companies use Mercor's services, demonstrating enterprise adoption at the highest levels of the technology sector
- Net revenue retention of 165% indicates customers dramatically expand usage over time, suggesting strong product-market fit and demonstrated value (the arithmetic is illustrated after this list)
- Performance data collection creates competitive moats through feedback loops measuring bonus distribution, raise allocation, and dismissal patterns across contractor engagements
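Net revenue retention is a simple cohort ratio; the dollar figures below are hypothetical, chosen only to show how a 165% figure arises:

```python
# Net revenue retention (NRR) for a single customer cohort. All dollar
# amounts are hypothetical; only the 165% result mirrors the reported metric.
starting_revenue = 1_000_000   # cohort revenue twelve months ago
expansion        =   750_000   # upsells and expanded usage since then
churn            =   100_000   # revenue lost to downgrades and departures

nrr = (starting_revenue + expansion - churn) / starting_revenue
print(f"NRR = {nrr:.0%}")      # -> NRR = 165%
```

Anything above 100% means revenue grows even with no new customers; 165% implies existing customers add 65 cents of net new spend for every dollar they spent a year earlier.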
The business strategy leverages AI companies as a wedge market due to compressed feedback cycles and urgent hiring needs. Five-week contracts provide faster iteration than five-year placements, allowing rapid algorithm improvement through performance correlation analysis.
Foody explains the strategic advantage: "When we're hiring someone for 5 years, you want to get dinner with them, build trust and a relationship. 5 weeks, you want this fast, efficient AI interviews, automated process." This focus enables optimization for efficiency over relationship-building, playing to AI's strengths.
The global talent pool thesis depends on remote work's persistence despite corporate return-to-office mandates. Foody argues that AI automation of 90% of knowledge work will increase demand for the remaining 10%, particularly in high-elasticity domains like software engineering where productivity gains translate to expanded output rather than workforce reduction.
Power law dynamics in knowledge work create disproportionate value capture opportunities. The top 10% of contributors drive majority value across most professional domains, making accurate identification and placement of exceptional talent extraordinarily valuable to client companies.
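That claim follows directly from heavy-tailed output distributions, as a quick simulation shows. The Pareto shape parameter below is an assumption (the classic 80/20 value), not measured contractor data:

```python
import numpy as np

# Simulated contributor output under a heavy-tailed (Pareto) distribution.
# The shape parameter a=1.16 is the textbook "80/20" value, an assumption
# rather than measured data.
rng = np.random.default_rng(seed=0)
output = np.sort(rng.pareto(a=1.16, size=100_000))

top_decile_share = output[-10_000:].sum() / output.sum()
print(f"Top 10% of contributors produce {top_decile_share:.0%} of total output")
# Typical run: well over half of total output comes from the top decile.
```

Under a distribution like this, marginal improvements in identifying who sits in the top decile translate almost directly into client value, which is what makes accurate matching worth paying for.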
The Infrastructure Investment Challenge
All three companies highlight a critical pattern in emerging technology markets: the need for patient capital willing to fund hardware infrastructure before software applications mature. Cortical Labs faces the classic hardware startup challenge of high capital expenditure requirements for manufacturing before revenue recognition.
- Venture capital tends toward thematic investing, currently focused on LLM transformers and AI agents, making it difficult for paradigm-shifting approaches to secure funding
- Biological computing requires building entirely new infrastructure since no existing supply chains support neuron-based systems
- Evaluation platforms need specialized talent networks and assessment tools before they can demonstrate value to model builders
- Talent matching platforms must achieve sufficient liquidity on both supply and demand sides before network effects create sustainable competitive advantages
- Geographic constraints affect funding availability, with Australian and European investors showing less appetite for deep technology risks
- The AI boom benefited from decades of gamer GPU purchases that created existing infrastructure, unlike emerging compute paradigms starting from zero
Hon Weng Chong draws parallels to historical technology adoption: "If you think about the explosion in AI, it had a very fertile ground because the groundwork had been laid by decades of gamers buying GPUs, keeping the likes of AMD and Nvidia alive until the AI systems came online."
This infrastructure development challenge separates incremental AI improvements from fundamental platform shifts. Companies building new categories often face extended development timelines and higher capital requirements than software-only approaches.
Data Scarcity and Quality Evolution
The transition from data abundance to data scarcity marks an inflection point across all three companies' markets. Internet-scale text corpora that enabled LLM pre-training represent a finite resource, shifting competitive advantage toward post-training data quality and domain-specific expertise.
- Language models benefit from 40 years of public internet content, but most domains lack comparable training datasets, creating opportunities for biological systems in sample-constrained environments
- Expert-generated training data has become exponentially more valuable as models approach human-level performance on standardized tasks
- Evaluation quality determines training data effectiveness, requiring sophisticated assessment methods that can identify subtle capability gaps
- Real-world task performance matters more than synthetic benchmark scores for commercial applications
- Multi-modal and multi-lingual capabilities expand data requirements beyond English text to include visual, audio, and cultural context
- Specialized domains like dark matter physics or molecular biology require PhD-level expertise to generate meaningful training examples
Jonathan Siddharth captures the evolution: "Frontier models need frontier data. Frontier data needs a team of frontier humans." This creates sustainable competitive advantages for companies that can attract and coordinate world-class domain experts.
The scarcity economics of high-quality training data inverts traditional technology scaling assumptions. Instead of marginal costs approaching zero, expert-generated data becomes more expensive as models improve and require increasingly sophisticated training examples.
Biological systems offer an alternative paradigm that sidesteps data scarcity through direct experience rather than dataset consumption. This approach may prove particularly valuable in robotics and real-world applications where simulation-to-reality transfer remains challenging.
The three companies featured represent fundamentally different approaches to AI's next phase. Cortical Labs reimagines computing hardware through biological integration. Turing optimizes the evaluation-improvement cycle that drives model advancement. Mercor applies AI capabilities to transform labor market efficiency. Each tackles infrastructure-level challenges that extend far beyond incremental software improvements, positioning them for significant impact as artificial intelligence reshapes economic fundamentals.