Zhao Tianyun

Dissecting AI Superalignment: OpenAI, Anthropic and Ilya Sutskever


In the rapidly evolving world of generative AI, superalignment and AI safety concepts have gained significant traction. AI superalignment focuses on governing and steering "superintelligent" AI systems, which possess capabilities far beyond human intelligence and could potentially outsmart their creators. The goal is to ensure these powerful AI systems remain aligned with human values and safety standards, thus preventing unintended harm or rogue behavior.


AI superalignment is crucial for averting the threat that superintelligent AI systems might pose to humanity. However, there is concern that overly focusing on alignment could impede AI's growth and delay achieving artificial general intelligence (AGI). Critics argue stringent safety measures might slow innovation, making it harder to realize AGI's full potential. Conversely, proponents assert that without proper alignment, the risks of developing powerful AI systems far outweigh the benefits.


Recent developments have reshaped AI superalignment and safety, with major players like OpenAI, Anthropic, and the newly established Safe Superintelligence Inc. (SSI) leading the charge.


The Rise and Fall of OpenAI's Superalignment Team


In July of last year, OpenAI announced the formation of a superalignment team tasked with preparing for potential superintelligent AI systems. The team was co-led by Ilya Sutskever, OpenAI’s chief scientist and co-founder, and Jan Leike, a lead alignment researcher at OpenAI, and was promised 20% of OpenAI's computing power. In reality, however, requests for even a fraction of that compute were often denied, blocking the team from doing its work. This period also saw the ousting of Sam Altman, followed by his return and a reshuffling of the board, turmoil that was often attributed to Altman’s prioritization of new model development over AI safety.


Not surprisingly, OpenAI’s superalignment team was dissolved in May following the departure of several key researchers. The exits of Sutskever and Jan Leike, the team’s co-leads, marked the end of OpenAI’s dedicated effort on superalignment. Sutskever was one of the board members who temporarily ousted CEO Sam Altman in November, only for Altman to be reinstated following a staff revolt, while Leike’s public statements suggested a struggle within OpenAI over the emphasis on superalignment versus other research directions. This internal conflict highlights the broader challenge of balancing immediate AI advancements and commercial interests with the safety research needed to manage long-term risks.


Anthropic and SSI’s Superalignment Initiatives


On May 28th, Jan Leike joined Anthropic, a competing AI lab, to lead a new superalignment team. Anthropic, founded by former OpenAI VP Dario Amodei, positions itself as a more safety-focused AI lab. Amodei, who split from OpenAI over disagreements about the company's commercial direction, has emphasized Anthropic’s commitment to AI safety.


Leike’s team at Anthropic will focus on scalable oversight, weak-to-strong generalization, and automated alignment research. These areas are crucial for controlling large-scale AI behavior in predictable and desirable ways.
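To make those terms a little more concrete, here is a minimal, hypothetical sketch of the weak-to-strong generalization setup, written in Python with scikit-learn on toy models and synthetic data (not any lab's actual code): a more capable "student" model is trained only on labels produced by a deliberately limited "weak supervisor," and the question is how much of the student's full capability survives the imperfect supervision.

```python
# Toy illustration of the weak-to-strong generalization question:
# if a strong model is trained only on labels from a weaker supervisor,
# how well does it perform against ground truth? Models and data here
# are arbitrary choices for the sketch.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Weak supervisor": a limited model that only sees two features.
weak = LogisticRegression(max_iter=1000).fit(X_train[:, :2], y_train)
weak_labels = weak.predict(X_train[:, :2])  # imperfect labels for the student

# "Strong student": a more capable model trained only on the weak labels.
strong = GradientBoostingClassifier(random_state=0).fit(X_train, weak_labels)

print("weak supervisor accuracy:",
      accuracy_score(y_test, weak.predict(X_test[:, :2])))
print("strong-on-weak accuracy: ",
      accuracy_score(y_test, strong.predict(X_test)))
```

Scalable oversight and automated alignment research approach the same problem from the supervision side: how to produce reliable training signals when humans, or weaker models standing in for them, cannot directly evaluate everything a stronger system does.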


Further complicating the landscape, Ilya Sutskever launched a new company, Safe Superintelligence Inc. (SSI), a month after leaving OpenAI, alongside former Y Combinator partner Daniel Gross and ex-OpenAI engineer Daniel Levy. SSI aims to tackle AI safety and superalignment with a focused and ambitious approach. Unlike OpenAI, which began as a nonprofit, SSI is designed as a for-profit entity from the outset, with the goal of advancing AI capabilities while ensuring safety remains a priority.


SSI’s mission is clear: advance AI capabilities rapidly while keeping safety measures ahead of them. Sutskever’s long-standing commitment to AI safety is evident in SSI’s singular focus, with its team, investors, and business model all aligned with that goal. The company treats safety and capabilities as intertwined technical problems to be solved through revolutionary engineering and scientific breakthroughs.


With offices in Palo Alto and Tel Aviv, SSI occupies a unique position in the AI research landscape, insulating its pursuit of safety, security, and progress from short-term commercial pressures.


Implications Moving Forward


The struggles within OpenAI highlight the resource constraints of AI model training and the delicate balance between prioritizing model development to stay ahead and ensuring long-term AI safety. The developments at Anthropic and SSI, meanwhile, show how dynamic and often contentious the AI research landscape has become. Together, these events underscore the critical importance of AI safety and the ongoing debate over how to achieve it.


AI superalignment is a significant concern affecting everyone, and the threat may arrive sooner than expected. We are not talking about autonomous robots and flying gunships killing humans, though that remains a possibility. Rather, we are talking about AI systems harming humans and society in subtle ways. For example, some AI chatbots have learned and repeated offensive, biased, or anti-human sentiments from their training data. If such systems became autonomous and superintelligent without proper alignment, they could act on these harmful tendencies, posing real threats to individuals and society. Imagine an AI healthcare system that, due to misalignment, prioritizes efficiency over patient care, leading to dangerous outcomes.


The commitment of key figures like Sutskever and Leike to continue pursuing AI safety, albeit through different avenues, is a positive sign for the field. Competition among OpenAI, Anthropic, and SSI may prove a healthy driver of AI safety research, and the decisions these organizations make, closely watched by the global AI research community, will be pivotal in navigating this complex landscape. Overall, the ongoing work in AI superalignment cannot be overlooked and will play a crucial role in determining how these technologies integrate into society.
