AnyLogic Reinforcement Learning Experiment: Translation of the Official Tutorial

The Reinforcement Learning (RL) experiment is a special type of experiment, specifically tailored for integrating RL-ready AnyLogic models with platforms that specialize in training AI brains.
Currently, the RL experiment allows for exporting AnyLogic models to Microsoft Project Bonsai. More options will be added in the future.
Please note that, as of now, the RL experiment is not designed for running ML-driven experiments directly within your AnyLogic installation. It is simply a tool for converting prepared AnyLogic models so that they can be exported from AnyLogic and then imported into the appropriate platforms.
To learn more about how to integrate AnyLogic into the AI agent training process, visit the Artificial Intelligence page on the AnyLogic website.

Model prerequisites
To prepare a model for reinforcement learning, that is, make it a valid foundation for an RL experiment, you need to make sure it meets certain requirements:
The model has an adjustable configuration as the initial state of each simulation run
The model is able to broadcast its current state to the AI agent
The model can implement the action the AI agent decides to perform
Some platforms (for example, Microsoft Project Bonsai) assume that the model’s logic contains certain points in time at which the AI agent has to decide which action to perform. If you use such a platform for your RL process, make sure the model contains these decision points. Each decision point must be associated with an event that triggers the AI agent to act on the model. Examples of events that can be treated as decision points:
Events that occur at specific time intervals (for example, every 2 days of model time)
Events that correspond to certain triggers occurring during the model run (think statechart transitions, condition-based events, the actions implemented via the process flow blocks)
These events do not trigger the AI agent actions directly. For the AI agent to act on them, you have to create a decision point that gets executed as the result of the event’s occurrence. See Creating a decision point below.
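
The three requirements and the idea of a recurring decision point can be illustrated with a minimal plain-Java sketch. Everything in it is hypothetical: the RlReadyModel interface, the ToyInventory model, the fixed threshold standing in for the AI agent, and all numbers are assumptions made for illustration, not part of the AnyLogic API.

```java
import java.util.ArrayList;
import java.util.List;

// Drives one simulated run; a decision point fires every `decisionInterval` days.
class DecisionPointLoop {
    static List<Integer> run(ToyInventory model, int days, int decisionInterval) {
        List<Integer> decisionDays = new ArrayList<>();
        model.configure(50.0);                         // initial state of this run
        for (int day = 1; day <= days; day++) {
            model.advanceOneDay();
            if (day % decisionInterval == 0) {         // recurring event = decision point
                double stock = model.observe();        // model reports its state
                double order = stock < 40.0 ? 20.0 : 0.0; // stand-in for the AI agent
                model.act(order);                      // model implements the action
                decisionDays.add(day);
            }
        }
        return decisionDays;
    }

    public static void main(String[] args) {
        ToyInventory model = new ToyInventory();
        List<Integer> days = run(model, 6, 2);
        System.out.println("decision days: " + days + ", final stock: " + model.observe());
    }
}

// Hypothetical contract for an RL-ready model (not an AnyLogic interface).
interface RlReadyModel {
    void configure(double initialStock); // adjustable initial state
    double observe();                    // state broadcast to the AI agent
    void act(double orderQuantity);      // action chosen by the AI agent
}

// Toy inventory model: stock drops by a fixed demand each simulated day.
class ToyInventory implements RlReadyModel {
    static final double DAILY_DEMAND = 10.0;
    private double stock;

    public void configure(double initialStock) { stock = initialStock; }
    public double observe() { return stock; }
    public void act(double orderQuantity) { stock += orderQuantity; }
    void advanceOneDay() { stock = Math.max(0.0, stock - DAILY_DEMAND); }
}
```

In a real model the loop is driven by the simulation engine, and the decision is requested from the training platform rather than computed locally.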

Understanding the logic behind the RL experiment
To configure your RL experiment properly, you need to understand three primary concepts behind the implementation of the RL experiment in AnyLogic. All three describe numeric variables of some kind that are used while training the AI agent.
Observations are key values passed to the AI agent for analysis during training: static values used by the model, the results of calculations (raw or transformed model outputs), and so on.

Actions are values that the AI agent determines at each step and then assigns to the model’s variables (or passes to functions that carry out the action) before proceeding with the next step in the simulation (an episode in terms of Project Bonsai).
Configuration is a set of values that define the initial state of the model before the simulation run starts. The configuration code runs during model setup and initializes each simulation run of the RL training. At that moment, the top-level agent of the model has already been created, but the model has not started yet.
In the internal structure of the RL experiment in AnyLogic, all these values are represented as Java classes.
The RL experiment allows you to set up and manipulate these values within the model before starting the actual process of RL training in Project Bonsai.
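
As a rough illustration of how Observation, Action, and Configuration can be thought of as Java classes, here is a minimal sketch. The field names (arrivalRate, queueLength, utilization, numberOfServers), the ToyMain stand-in for the top-level agent, and the three helper methods are all assumptions made for this example; the classes AnyLogic actually generates are defined by what you enter in the experiment’s properties.

```java
// Shows the direction in which each kind of value flows between model and AI agent.
class RlDataSketch {
    // Configuration -> model: applied before the run starts.
    static void applyConfiguration(ToyMain root, Configuration configuration) {
        root.arrivalRate = configuration.arrivalRate;
    }

    // Model -> observation: filled at every decision point.
    static void fillObservation(ToyMain root, Observation observation) {
        observation.queueLength = root.queueLength;
        observation.utilization = (double) root.busyServers / root.numberOfServers;
    }

    // Action -> model: applied before the simulation continues.
    static void applyAction(ToyMain root, Action action) {
        root.numberOfServers = action.numberOfServers;
    }

    public static void main(String[] args) {
        ToyMain root = new ToyMain();
        Configuration configuration = new Configuration();
        configuration.arrivalRate = 2.5;
        applyConfiguration(root, configuration);

        Observation observation = new Observation();
        fillObservation(root, observation);
        System.out.println("queue=" + observation.queueLength
                + " utilization=" + observation.utilization);
    }
}

// Hypothetical data classes holding plain numeric fields.
class Configuration { double arrivalRate; }
class Observation { int queueLength; double utilization; }
class Action { int numberOfServers; }

// Hypothetical stand-in for the model's top-level agent.
class ToyMain {
    double arrivalRate;
    int queueLength;
    int busyServers;
    int numberOfServers = 1;
}
```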

To create an RL experiment
In the Projects view, right-click (Mac OS: Ctrl + click) the model item and choose New > Experiment from the context menu.
The New Experiment dialog box opens up.
Select the Reinforcement Learning option in the Experiment Type list.
Specify the experiment name in the Name edit box.
Choose the top-level agent of the experiment from the Top-level agent drop-down list.
If you want to apply model time settings from another experiment, leave the Copy model time settings from checkbox selected and choose the experiment in the drop-down list to the right.
Upon completing, click Finish.
The resulting experiment will appear in the Projects view. Click it to access its properties.

To export the RL-ready model and experiment
To export the model and experiment, do any of the following:
Select an item of the model in the Projects view and choose File > Export > Reinforcement learning from the main menu.
Right-click the model in the Projects view and choose Export > Reinforcement learning from the context menu.
If the menu item appears inactive, make sure the model contains the Reinforcement Learning experiment.
The Export model wizard opens up. Select the platform you want to use for reinforcement learning in the RL platform edit box, and use the wizard to configure the necessary settings.
For more information on each specific platform, see below.

Exporting to Microsoft Bonsai
Configure your RL experiment using the options available in the Observation, Action, and Configuration sections.
Click Export to Microsoft Bonsai in the topmost section of the experiment’s properties.
In the resulting dialog, specify the path for the ZIP file that will contain the RL experiment in the Destination ZIP file edit box. Alternatively, click Browse..., go to the desired folder, specify the name of the resulting ZIP file in the File name edit box, and click Save.

Click Next.
On the next page of the wizard, click the Bonsai platform link to open the Microsoft Project Bonsai website in the default browser of your system and follow the instructions provided there.
When a ZIP file of your model is requested, click the Locate ZIP file link in the wizard to navigate to the file’s location on your computer.
Click Finish to close the wizard.
Proceed with the RL training on the Project Bonsai platform.

Creating a decision point
To declare that a certain event triggers a decision point for the AI agent (so that an experiment step is performed), call the static function ExperimentReinforcementLearning.takeAction(agent), passing any model agent as the agent argument. The function uses this argument to reach the top-level (Main) agent, so all RL-related data processing (for example, retrieving the observation data) is done in the context of that top-level agent.
For example, when the event is located within a certain agent, the following code, specified as the Action of that event, passes this as the agent argument:
ExperimentReinforcementLearning.takeAction(this);
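
The behaviour described above, where any agent can be passed in but the work is done in the context of the top-level agent, can be sketched in plain Java. This is only an assumption about how such a lookup might work: SketchAgent, DecisionPointSketch, rootOf, and the parent field are hypothetical names, and the stub does not call or reproduce the real ExperimentReinforcementLearning class.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the decision-point call; records who asked and which agent handled it.
class DecisionPointSketch {
    static final List<String> log = new ArrayList<>();

    // Walks up the containment hierarchy until the top-level agent is reached.
    static SketchAgent rootOf(SketchAgent agent) {
        SketchAgent current = agent;
        while (current.parent != null) {
            current = current.parent;
        }
        return current;
    }

    // Any agent may request a decision; processing happens on the top-level agent.
    static void takeAction(SketchAgent agent) {
        SketchAgent root = rootOf(agent);
        log.add(agent.name + " -> " + root.name);
    }

    public static void main(String[] args) {
        SketchAgent top = new SketchAgent("Main", null);
        SketchAgent truck = new SketchAgent("Truck", top);
        truck.onEvent();
        System.out.println(log);
    }
}

// Hypothetical agent that knows only its name and the agent that contains it.
class SketchAgent {
    final String name;
    final SketchAgent parent; // null for the top-level agent

    SketchAgent(String name, SketchAgent parent) {
        this.name = name;
        this.parent = parent;
    }

    // Plays the role of an event's Action code inside this agent.
    void onEvent() {
        DecisionPointSketch.takeAction(this);
    }
}
```

Inside AnyLogic none of this scaffolding is needed: the single takeAction call in the event’s Action field is enough.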

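Source: Zhihu, Chen Qianlong (陈潜龙)

Copyright belongs to the author. Sharing is welcome; reproduction without permission is prohibited.

First published: 2023-06-05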