Alignment research often has to focus on averting concerning behaviors, but I think the positive case for this kind of training is that we can give models an honest and positive vision of what AI models can be and why. I'm excited about the future of this work.