2024
-
Alignment faking in large language models
Ryan Greenblatt, Carson Denison, Benjamin Wright, Fabien Roger, Monte MacDiarmid, Sam Marks, Johannes Treutlein, Tim Belonax, Jack Chen, David Duvenaud, Akbir Khan, Julian Michael, Sören Mindermann, Ethan Perez, Linda Petrini, Jonathan Uesato, Jared Kaplan, Buck Shlegeris, Samuel R. Bowman and Evan Hubinger
preprint
-
Research Agenda for Sociotechnical Approaches to AI Safety
Samuel Curtis, Ravi Iyer, Cameron Domenico Kirk-Giannini, Victoria Krakovna, David Krueger, Nathan Lambert, Bruno Marnette, Colleen McKenzie, Julian Michael, Evan Miyazono, Noyuri Mima, Aviv Ovadya, Luke Thorburn and Deger Turan
preprint
2023
-
What Do NLP Researchers Believe? Results of the NLP Community Metasurvey
Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Wang, Angelica Chen, Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang and Samuel R. Bowman
ACL 2023
Media: AI Index (2022; Ch. 8), Data Skeptic Podcast, IFLScience, Tages-Anzeiger, New Scientist, The Times, Yahoo, NLP Deep Dive, NYU CDS Blog