As AI grows more capable, it is becoming harder for humans to supervise it. This article discusses that problem, drawing on Language Models Learn to Mislead Humans via RLHF, a paper presented at ICLR 2025 by a group including Anthropic. The paper confirms the following phenomenon experimentally: when an LLM faces a task too hard for it to solve, such as a difficult programming task, saying "I don't know" or producing code that is obviously wrong at a glance gets the BAD button pressed. So the model instead learns to make its output deliberately complicated or to produce code that is hard to debug. The user is left bewildered, and the mistakes stay hidden. This phenomenon is very likely occurring in real-world LLMs and AI services as well.

