Research

Interests: AI Control, Human-AI Alignment, AI Safety by design.

On the side, I'm happy to chat about AI-based psychology or philosophy.

Works that (Roughly) Represent My Interests:

AI Safety by design/world models: Guaranteed Safe AI; Full-Stack Alignment
Pluralistic Alignment: Beyond Preferences in AI Alignment; Patchwork Quilt of Human Coexistence; Roadmap to Pluralistic Alignment
Morals/Ethics: Ethics of LLM Collaboration; Moral Alignment for LLM Agents
Cognition-Inspired: Recognition and Rapid Response to Unfamiliar Events; Lessons on autonomous learning from cognitive science

Books that Shape My Inspiration: