View on Semantic Scholar / Google Scholar / ACL Anthology / DBLP
2025
- International AI Safety Report
Yoshua Bengio, Sören Mindermann, Daniel Privitera, Tamay Besiroglu, Rishi Bommasani, Stephen Casper, Yejin Choi, Philip Fox, Ben Garfinkel, Danielle Goldfarb, Hoda Heidari, Anson Ho, Sayash Kapoor, Leila Khalatbari, Shayne Longpre, Sam Manning, Vasilios Mavroudis, Mantas Mazeika, Julian Michael, Jessica Newman, Kwan Yee Ng, Chinasa T. Okolo, Deborah Raji, Girish Sastry, Elizabeth Seger, Theodora Skeadas, Tobin South, Emma Strubell, Florian Tramèr, Lucia Velasco, Nicole Wheeler, Daron Acemoglu, Olubayo Adekanmbi, David Dalrymple, Thomas G. Dietterich, Edward W. Felten, Pascale Fung, Pierre-Olivier Gourinchas, Fredrik Heintz, Geoffrey Hinton, Nick Jennings, Andreas Krause, Susan Leavy, Percy Liang, Teresa Ludermir, Vidushi Marda, Helen Margetts, John McDermid, Jane Munga, Arvind Narayanan, Alondra Nelson, Clara Neppel, Alice Oh, Gopal Ramchurn, Stuart Russell, Marietje Schaake, Bernhard Schölkopf, Dawn Song, Alvaro Soto, Lee Tiedrich, Gaël Varoquaux, Andrew Yao, Ya-Qin Zhang, Olubunmi Ajala, Fahad Albalawi, Marwan Alserkal, Guillaume Avrin, Christian Busch, André Carlos Ponce de Leon Ferreira de Carvalho, Bronwyn Fox, Amandeep Singh Gill, Ahmet Halit Hatip, Juha Heikkilä, Chris Johnson, Gill Jolly, Ziv Katzir, Saif M. Khan, Hiroaki Kitano, Antonio Krüger, Kyoung Mu Lee, Dominic Vincent Ligot, José Ramón López Portillo, Oleksii Molchanovskyi, Andrea Monti, Nusu Mwamanzi, Mona Nemer, Nuria Oliver, Raquel Pezoa Rivera, Balaraman Ravindran, Hammam Riza, Crystal Rugege, Ciarán Seoighe, Jerry Sheehan, Haroon Sheikh, Denise Wong and Yi Zeng
UK DSIT/AISI Report
website / s2 / pdf / arxiv / bib
2024
- Alignment faking in large language models
Ryan Greenblatt, Carson Denison, Benjamin Wright, Fabien Roger, Monte MacDiarmid, Sam Marks, Johannes Treutlein, Tim Belonax, Jack Chen, David Duvenaud, Akbir Khan, Julian Michael, Sören Mindermann, Ethan Perez, Linda Petrini, Jonathan Uesato, Jared Kaplan, Buck Shlegeris, Samuel R. Bowman and Evan Hubinger
preprint
website / s2 / pdf / arxiv / code / video / memo / bib
- Research Agenda for Sociotechnical Approaches to AI Safety
Samuel Curtis, Ravi Iyer, Cameron Domenico Kirk-Giannini, Victoria Krakovna, David Krueger, Nathan Lambert, Bruno Marnette, Colleen McKenzie, Julian Michael, Evan Miyazono, Noyuri Mima, Aviv Ovadya, Luke Thorburn and Deger Turan
preprint
pdf / bib
2023
- What Do NLP Researchers Believe? Results of the NLP Community Metasurvey
Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Wang, Angelica Chen, Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang and Samuel R. Bowman
ACL 2023
website / s2 / pdf / arxiv / acl / poster / talk / bib
Media: AI Index (2022; Ch. 8), Data Skeptic Podcast, IFLScience, Tages-Anzeiger, New Scientist, The Times, Yahoo, NLP Deep Dive, NYU CDS Blog