English-Chinese Dictionary 51ZiDian.com




corrigibility
n. the quality of being correctable or amendable; ease of being corrected



Related resources:


  • Corrigibility - AI Alignment Forum
    A 'corrigible' agent is one that doesn't interfere with what we would intuitively see as attempts to 'correct' the agent, or 'correct' our mistakes in building it; and permits these 'corrections' despite the apparent instrumentally convergent reasoning saying otherwise. If we try to suspend the AI to disk, or shut it down entirely, a corrigible AI will let us do so (Even though, if
  • Towards a mechanistic understanding of corrigibility — AI Alignment Forum
    Corrigibility seems to be one of the most promising candidates for such an acceptability condition, but for that to work we need a mechanistic understanding of exactly what sort of corrigibility we're shooting for and how it will ensure safety
  • Corrigibility — AI Alignment Forum
    The fact that corrigibility is a basin of attraction allows us to consider failures as discrete events rather than worrying about slight perturbations. And the fact that corrigibility eventually leads to aligned behavior means that if we could inductively establish corrigibility, then we’d be happy
  • From Barriers to Alignment to the First Formal Corrigibility
    The corrigibility paper in Part II shows that, despite those limits, at least one safety target, corrigibility with lexicographic heads, is provably achievable, even under approximation, partial observation, self-modification, and multi-step interactions. Taken together, these two papers point toward a pragmatic alignment strategy:
  • Consequentialism corrigibility — AI Alignment Forum
    Background 2: Corrigibility is a square peg, preferences-over-future-states is a round hole. A “corrigible” AI is an AI for which you can shut it off (or more generally change its goals), and it doesn’t try to stop you. It also doesn’t deactivate its own shutoff switch, and it even fixes the switch if it breaks
  • Subagent Corrigibility Is Not Anti-Natural - Alignment Forum
    In a recent blog post comparing corrigibility to deceptive alignment, I treated corrigibility simply as a lack of resistance to having goals modified, and I find it valuable to stay within that scope. Importantly, that is the aspect of corrigibility that is anti-natural, meaning that it can’t be straightforwardly captured in a ranking of end states
  • Defining Corrigible and Useful Goals — AI Alignment Forum
    Getting corrigibility to be passed on is therefore part of the broader problem of safe exploration and taking reversible actions, so that there are still people around to correct the corrigible AI. Recursive corrigibility only targets the threat caused by incorrigible AI, and it may actually be better to deal with it via a more general solution
  • 5. Open Corrigibility Questions — AI Alignment Forum
    Testing Corrigibility Understanding in Humans: One of the more exciting prospects for testing the concept of corrigibility, from my perspective, doesn’t involve AI models at all. Instead, it seems possible to me to gather data about how natural, simple, and coherent corrigibility is, as a concept, by measuring humans in game quiz settings
  • 4. Existing Writing on Corrigibility — AI Alignment Forum
    This document is an in-depth review of the primary documents discussing corrigibility that I’m aware of. In particular, I'll be focusing on the writing of Eliezer Yudkowsky and Paul Christiano, though I’ll also spend some time at the end briefly discussing other sources
  • 0. CAST: Corrigibility as Singular Target — AI Alignment Forum
    Corrigibility is the simple, underlying generator behind obedience, conservatism, willingness to be shut down and modified, transparency, and low-impact. It is a fairly simple, universal concept that is not too hard to get a rich understanding of, at least on the intuitive level





Chinese Dictionary - English Dictionary 2005-2009