Research
[v0.3.3] My interests are generally interdisciplinary: AI control, human-AI alignment, AI safety by design, world models, and cognitive science.
[v0.3.2] Currently working on projects in probabilistic programming and cognitive science; also generally keen on topics in AI-based psychology and philosophy.
Here’s a ChatGPT-provided summary:
“… interests demonstrate a Safety-Cognition-Social triad: Looking at how AI is built (World Models), how it thinks (Cognition/Probabilistic Programming), and how it behaves in a human world (Pluralistic Alignment).”
Works that (roughly) represent my interests:
- AI Safety by design/world models: Guaranteed Safe AI; Full-Stack Alignment
- Pluralistic Alignment: Beyond Preferences in AI Alignment; Patchwork Quilt of Human Coexistence; Roadmap to Pluralistic Alignment
- Morals/Ethics: Ethics of LLM Collaboration; Moral Alignment for LLM Agents
- Cognition-Inspired: Recognition and Rapid Response to Unfamiliar Events; Lessons on autonomous learning from cognitive science