Reinforcement learning

Japanese: 強化学習 (kyōka gakushū); English: reinforcement learning
A machine learning technique in which, rather than being given correct-answer data, the learner receives its learning cues in the form of rewards. In the human basal ganglia, dopamine is thought to serve as a reward signal, and behavior is thought to be learned by predicting and obtaining rewards. Reinforcement learning applies this principle to machine learning. An agent (→ agent system) interacts with its surroundings (the environment). It collects information while acting in the environment and learns a behavioral rule (a policy) that maximizes its own reward.

The environment is formalized as a Markov decision process, that is, a probabilistic state-transition model. When the agent takes one of the actions available in the current state of the environment, the state changes according to a certain probability, and the agent receives a corresponding reward. Here, the Markov property means that the probabilities of a state transition and of the accompanying reward are determined solely by the current state of the environment and the action the agent takes.

In reinforcement learning, the agent aims to maximize its reward through trial and error. It works out for itself which action is correct in each situation it encounters and which actions lead to which rewards. There are several ways to design the reward function that assigns rewards. One is to estimate it from a history of behavior (inverse reinforcement learning). Another is to estimate it in parallel with learning the behavioral rules (apprenticeship learning). (→ Artificial Intelligence)
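The agent/environment loop described above can be sketched with tabular Q-learning, which is one common reinforcement learning algorithm. The entry itself does not name any specific algorithm. In this sketch, the agent acts, the environment makes a probabilistic transition and returns a reward, and the agent updates its value estimates by trial and error. Several parts are hypothetical choices for illustration and do not come from the source:

- the five-state corridor environment;
- the 0.8 probability that moving right succeeds;
- the function names `step`, `q_learning` and `greedy_policy`;
- all hyperparameters.

```python
import random

# Hypothetical toy environment: states 0..4 on a line; reaching state 4 (the goal) pays reward 1.
N_STATES = 5
GOAL = 4
ACTIONS = ("left", "right")
P_SUCCESS = 0.8  # assumed probability that "right" actually moves the agent


def step(state, action, rng):
    """One Markov transition: the next state and reward depend only on
    the current state and the chosen action (the Markov property)."""
    if action == "left":
        next_state = max(state - 1, 0)  # deterministic move, clamped at the wall
    elif rng.random() < P_SUCCESS:
        next_state = state + 1
    else:
        next_state = state  # the move failed; stay in place
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=3000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: estimate action values purely from trial and error."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def greedy(state):
        best = max(q[(state, a)] for a in ACTIONS)
        # Break ties at random so an untrained agent explores both directions.
        return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap episode length
            # Epsilon-greedy: usually exploit current estimates, sometimes act randomly.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(state)
            next_state, reward, done = step(state, action, rng)
            # Bootstrap from the best action in the next state (no future value at the goal).
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """The learned policy: in each non-goal state, pick the highest-valued action."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES) if s != GOAL}


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

The agent is never told that "right" is correct; it only observes rewards. With these settings, the learned policy should nonetheless choose "right" in every non-goal state, because that is the only way to reach the reward. Q-learning was picked here for brevity. Other algorithms, such as SARSA or policy-gradient methods, fit the same agent/environment description equally well.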

Source: Encyclopaedia Britannica Concise Encyclopedia
