To improve reasoning ability, a single training example may be enough for LLM post-training. This article explains the study Reinforcement Learning for Reasoning in Large Language Models with One Training Example (NeurIPS 2025), which investigates reinforcement learning that uses just one training example.

Put intuitively, the study's conclusion is that making an LLM think over and over about how to solve one carefully selected math problem yields strong reasoning ability. There is no need to prepare a wide variety of problems, as conventional training does. Training on just one problem raised accuracy on the MATH500 math benchmark from 36.0% to 73.6%, and the average accuracy across six math benchmarks

