2026
-
LLM Novice Uplift on Dual-Use, In Silico Biology Tasks
Chen Bo Calvin Zhang,
Christina Q. Knight,
Nicholas Kruus,
Jason Hausenloy,
Pedro Medeiros,
Nathaniel Li,
Aiden Kim,
Yury Orlovskiy,
Coleman Breen,
Bryce Cai,
Jasper Götting,
Andrew Bo Liu,
Samira Nedungadi,
Paula Rodriguez,
Yannis Yiming He,
Mohamed Shaaban,
Zifan Wang,
Seth Donoughe and
Julian Michael
preprint
pdf
arxiv
bib
2025
-
Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models
Boyi Wei,
Zora Che,
Nathaniel Li,
Udari Madhushani Sehwag,
Jasper Götting,
Samira Nedungadi,
Julian Michael,
Summer Yue,
Dan Hendrycks,
Peter Henderson,
Zifan Wang,
Seth Donoughe and
Mantas Mazeika
preprint
pdf
arxiv
bib
-
Remote Labor Index: Measuring AI Automation of Remote Work
Mantas Mazeika,
Alice Gatti,
Cristina Menghini,
Udari Madhushani Sehwag,
Shivam Singhal,
Yury Orlovskiy,
Steven Basart,
Manasi Sharma,
Denis Peskoff,
Elaine Lau,
Jaehyuk Lim,
Lachlan Carroll,
Alice Blair,
Vinaya Sivakumar,
Sumana Basu,
Brad Kenstler,
Yuntao Ma,
Julian Michael,
Xiaoke Li,
Oliver Ingebretsen,
Aditya Mehta,
Jean Mottola,
John Teichmann,
Kevin Yu,
Zaina Shaik,
Adam Khoja,
Richard Ren,
Jason Hausenloy,
Long Phan,
Ye Htet,
Ankit Aich,
Tahseen Rabbani,
Vivswan Shah,
Andriy Novykov,
Felix Binder,
Kirill Chugunov,
Luis Ramirez,
Matias Geralnik,
Hernán Mesura,
Dean Lee,
Ed-Yeremai Hernandez Cardona,
Annette Diamond,
Summer Yue,
Alexandr Wang,
Bing Liu,
Ernesto Hernandez and
Dan Hendrycks
preprint
website
pdf
arxiv
bib
-
Inverse Scaling in Test-Time Compute
Aryo Pradipta Gema,
Alexander Hägele,
Runjin Chen,
Andy Arditi,
Jacob Goldman-Wetzler,
Kit Fraser-Taliente,
Henry Sleight,
Linda Petrini,
Julian Michael,
Beatrice Alex,
Pasquale Minervini,
Yanda Chen,
Joe Benton and
Ethan Perez
TMLR
(12/2025; Featured Certification)
pdf
arxiv
openreview
bib
-
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
Tomek Korbak,
Mikita Balesni,
Elizabeth Barnes,
Yoshua Bengio,
Joe Benton,
Joseph Bloom,
Mark Chen,
Alan Cooney,
Allan Dafoe,
Anca Dragan,
Scott Emmons,
Owain Evans,
David Farhi,
Ryan Greenblatt,
Dan Hendrycks,
Marius Hobbhahn,
Evan Hubinger,
Geoffrey Irving,
Erik Jenner,
Daniel Kokotajlo,
Victoria Krakovna,
Shane Legg,
David Lindner,
David Luan,
Aleksander Mądry,
Julian Michael,
Neel Nanda,
Dave Orr,
Jakub Pachocki,
Ethan Perez,
Mary Phuong,
Fabien Roger,
Joshua Saxe,
Buck Shlegeris,
Martín Soto,
Eric Steinberger,
Jasmine Wang,
Wojciech Zaremba,
Bowen Baker,
Rohin Shah and
Vlad Mikulik
preprint
pdf
arxiv
bib
-
The Singapore Consensus on Global AI Safety Research Priorities
Yoshua Bengio,
Tegan Maharaj,
Luke Ong,
Stuart Russell,
Dawn Song,
Max Tegmark,
Lan Xue,
Ya-Qin Zhang,
Stephen Casper,
Wan Sie Lee,
Sören Mindermann,
Vanessa Wilfred,
Vidhisha Balachandran,
Fazl Barez,
Michael Belinsky,
Imane Bello,
Malo Bourgon,
Mark Brakel,
Siméon Campos,
Duncan Cass-Beggs,
Jiahao Chen,
Rumman Chowdhury,
Kuan Chua Seah,
Jeff Clune,
Juntao Dai,
Agnes Delaborde,
Nouha Dziri,
Francisco Eiras,
Joshua Engels,
Jinyu Fan,
Adam Gleave,
Noah Goodman,
Fynn Heide,
Johannes Heidecke,
Dan Hendrycks,
Cyrus Hodes,
Bryan Low Kian Hsiang,
Minlie Huang,
Sami Jawhar,
Wang Jingyu,
Adam Tauman Kalai,
Meindert Kamphuis,
Mohan Kankanhalli,
Subhash Kantamneni,
Mathias Bonde Kirk,
Thomas Kwa,
Jeffrey Ladish,
Kwok-Yan Lam,
Wan Lee Sie,
Taewhi Lee,
Xiaojian Li,
Jiajun Liu,
Chaochao Lu,
Yifan Mai,
Richard Mallah,
Julian Michael,
Nick Moës,
Simon Möller,
Kihyuk Nam,
Kwan Yee Ng,
Mark Nitzberg,
Besmira Nushi,
Seán O hÉigeartaigh,
Alejandro Ortega,
Pierre Peigné,
James Petrie,
Benjamin Prud’Homme,
Reihaneh Rabbany,
Nayat Sanchez-Pi,
Sarah Schwettmann,
Buck Shlegeris,
Saad Siddiqui,
Aradhana Sinha,
Martín Soto,
Cheston Tan,
Dong Ting,
William Tjhi,
Robert Trager,
Brian Tse,
Anthony Tung K. H.,
John Willes,
Denise Wong,
Wei Xu,
Rongwu Xu,
Yi Zeng,
HongJiang Zhang and
Djordje Žikelić
SCAI 2025
pdf
arxiv
bib
-
AI Debate Aids Assessment of Controversial Claims
Salman Rahman,
Sheriff Issaka,
Ashima Suvarna,
Genglin Liu,
James Shiffer,
Jaeyoung Lee,
Md Rizwan Parvez,
Hamid Palangi,
Shi Feng,
Nanyun Peng,
Yejin Choi,
Julian Michael,
Liwei Jiang and
Saadia Gabriel
MAS 2025
s2
pdf
arxiv
bib
-
International AI Safety Report
Yoshua Bengio,
Sören Mindermann,
Daniel Privitera,
Tamay Besiroglu,
Rishi Bommasani,
Stephen Casper,
Yejin Choi,
Philip Fox,
Ben Garfinkel,
Danielle Goldfarb,
Hoda Heidari,
Anson Ho,
Sayash Kapoor,
Leila Khalatbari,
Shayne Longpre,
Sam Manning,
Vasilios Mavroudis,
Mantas Mazeika,
Julian Michael,
Jessica Newman,
Kwan Yee Ng,
Chinasa T. Okolo,
Deborah Raji,
Girish Sastry,
Elizabeth Seger,
Theodora Skeadas,
Tobin South,
Emma Strubell,
Florian Tramèr,
Lucia Velasco,
Nicole Wheeler,
Daron Acemoglu,
Olubayo Adekanmbi,
David Dalrymple,
Thomas G. Dietterich,
Edward W. Felten,
Pascale Fung,
Pierre-Olivier Gourinchas,
Fredrik Heintz,
Geoffrey Hinton,
Nick Jennings,
Andreas Krause,
Susan Leavy,
Percy Liang,
Teresa Ludermir,
Vidushi Marda,
Helen Margetts,
John McDermid,
Jane Munga,
Arvind Narayanan,
Alondra Nelson,
Clara Neppel,
Alice Oh,
Gopal Ramchurn,
Stuart Russell,
Marietje Schaake,
Bernhard Schölkopf,
Dawn Song,
Alvaro Soto,
Lee Tiedrich,
Gaël Varoquaux,
Andrew Yao,
Ya-Qin Zhang,
Olubunmi Ajala,
Fahad Albalawi,
Marwan Alserkal,
Guillaume Avrin,
Christian Busch,
André Carlos Ponce de Leon Ferreira de Carvalho,
Bronwyn Fox,
Amandeep Singh Gill,
Ahmet Halit Hatip,
Juha Heikkilä,
Chris Johnson,
Gill Jolly,
Ziv Katzir,
Saif M. Khan,
Hiroaki Kitano,
Antonio Krüger,
Kyoung Mu Lee,
Dominic Vincent Ligot,
José Ramón López Portillo,
Oleksii Molchanovskyi,
Andrea Monti,
Nusu Mwamanzi,
Mona Nemer,
Nuria Oliver,
Raquel Pezoa Rivera,
Balaraman Ravindran,
Hammam Riza,
Crystal Rugege,
Ciarán Seoighe,
Jerry Sheehan,
Haroon Sheikh,
Denise Wong and
Yi Zeng
UK DSIT/AISI Report
website
s2
pdf
arxiv
bib
2024
-
Alignment faking in large language models
Ryan Greenblatt,
Carson Denison,
Benjamin Wright,
Fabien Roger,
Monte MacDiarmid,
Sam Marks,
Johannes Treutlein,
Tim Belonax,
Jack Chen,
David Duvenaud,
Akbir Khan,
Julian Michael,
Sören Mindermann,
Ethan Perez,
Linda Petrini,
Jonathan Uesato,
Jared Kaplan,
Buck Shlegeris,
Samuel R. Bowman and
Evan Hubinger
preprint
website
s2
pdf
arxiv
code
video
memo
bib
-
Research Agenda for Sociotechnical Approaches to AI Safety
Samuel Curtis,
Ravi Iyer,
Cameron Domenico Kirk-Giannini,
Victoria Krakovna,
David Krueger,
Nathan Lambert,
Bruno Marnette,
Colleen McKenzie,
Julian Michael,
Evan Miyazono,
Noyuri Mima,
Aviv Ovadya,
Luke Thorburn and
Deger Turan
preprint
pdf
bib
2023
-
What Do NLP Researchers Believe? Results of the NLP Community Metasurvey
Julian Michael,
Ari Holtzman,
Alicia Parrish,
Aaron Mueller,
Alex Wang,
Angelica Chen,
Divyam Madaan,
Nikita Nangia,
Richard Yuanzhe Pang,
Jason Phang and
Samuel R. Bowman
ACL 2023
website
s2
pdf
arxiv
acl
poster
talk
bib
Media:
AI Index (2022; Ch. 8),
Data Skeptic Podcast,
IFLScience,
Tages-Anzeiger,
New Scientist,
The Times,
Yahoo,
NLP Deep Dive,
NYU CDS Blog
2022
2021
2020
2019
2018
2016
2015