Package solver¶

This package contains all the modules required to perform a parallel Monte-Carlo Tree Search with Upper Confidence bounds for Tree on multiple weather scenarios.

master¶

Module implementing a master tree that incorporates the results of a parallel MCTS-UCT search. However this object is not directly used during the search and is created afterwards, once the search in completed.

class master.MasterTree(sims, destination, nodes={}, proba=[])¶

Bases: object

Tree that stores the final result of a MCTS parallel search on multiple scenarios. Is not the direct output of the search but already incorporates some post-processing.

Variables:

nodes (dict) – dictionary containing master_node.MasterNode, the keys are their corresponding hash
probability (numpy.array) – array containing the probability of each scenario
Simulators (list) – List of the :class:`simulatorTLKT.Simulator`objects used during the search
numScenarios (int) – Number of scenarios
destination (list) – Destination state [lat, long]
best_policy (dict) – Dictionary of list of actions. Key of the dictionary is the scenario id. Key -1 is for the average policy. The list is the sequel of action.
best_global_nodes_policy (dict) – Dictionary of list of master_node.MasterNode encountered during the best policy of one scenario. The Key of the dictionary is the scenario id. Key -1 is for the average policy.

get_best_child(node, idscenario=-1, verbose=False)¶

Compares the children of a node based on their rewards and return the best one.

Parameters:	node (MasterNode) – the parent `master_node.MasterNode` idscenario (int) – id of the considered scenario. If default (-1), the method returns the best child for the global tree verbose (bool) – If True, prints the best reward and the best action.
Returns:	A tuple: (the best child, the action taken to go from the node to its best child)

get_best_policy()¶: Computes the best policy for each scenario and the global best policy. And add it to the object.

get_points(node, points, probability, coordinate=(0, 0), idscenario=-1, objective='depth')¶

Recursive function used in plot_tree() and plot_tree_colored() to compute the coordinates of a node in the plot and other node properties depending on the objective parameter

Parameters:

node – a master_node.MasterNode object
points (list) – the previous list of points
probability (np.array) – probability of each scenario
coordinate (tuple) – coordinates of the previous point
idscenario (int) – id of the corresponding worker tree to be plot. If -1 (default), the global tree is plotted.
objective (string) – “uct” to compute exploration, explotation and utility. “depth” to compute the depth of each node.

Returns:

the expanded list of points, a point being a tuple

get_utility(node, idscenario)¶

Computes the utility of an node as the sum of the exploration and exploitation term. It done either for a given scenario or as the weighted mean over all the scenario (global utility) if idscenario = -1

Parameters:	node – The `master_node.MasterNode` whose utility we want to compute idscenario – scenario’s id
Returns:	exploitation term, exploration term
Return type:	float, float

guess_reward(node, idscenario)¶

Estimates the reward of an unexplored node (for a given scenario) as the lower reward such that its father node would be expanded next.

Parameters:	node – The `master_node.MasterNode` whose reward we want to estimate idscenario (int) – scenario’s id
Returns:	The estimated reward
Return type:	float

classmethod load_tree(name)¶

Load a master tree (object) from the data Folder.

Parameters:	name – Name of the file.

plot_best_policy(idscenario=-1, number_subplots=1)¶

Plot a representation of a tree and its best policy.

Parameters:	grey (boolean) – if True, each node/branch are plot with a color (grey scale) depending of the depth of the node idscenario (int) – id of the corresponding worker tree to be plot. If -1 (default), the global tree is plotted.
Returns:	A tuple (fig, ax) of the current plot

plot_hist_best_policy(idscenario=-1, interactive=False)¶

Plot the best policy as in plot_best_policy(), with the histogram of the best action at each node (Animation)

Parameters:

idscenario (int) – id of the corresponding worker tree to be plot. If -1 (default), the global tree is plotted.
interactive (bool) – if True the plot is not an animation but can be browsed step by step

Returns:

Animation

plot_tree(idscenario=-1, number_subplots=1, gray=True)¶

Plot a 2D representation of a tree.

Parameters:	gray (boolean) – if True, each node/branch are plot with a color (grey scale) depending of the depth of the node idscenario (int) – id of the corresponding worker tree to be plot. If -1 (default), the global tree is plotted.
Returns:	A tuple (fig, ax) of the current plot

plot_tree_uct(idscenario=-1)¶

Plot the tree 3 times: for the first one the colormap represents the sum of exploitation and exploration for each node , the second one represents the exploitation and the third one the exploration.

Parameters:	idscenario (int) – id of the corresponding worker tree to be plot. If -1 (default), the global tree is plotted.
Returns:	A tuple (fig, ax) of the current plot

save_tree(name)¶

Save the master tree (object) in the data Folder.

Parameters:	name – Name of the file.

worker¶

Module implementing a worker in the parallel MCTS-UCT search. It is basically a search on a fixed weather scenario.

Created on Wed May 31 10:06:46 2017

@author: paul

class worker.Node(state=None, parent=None, origins=[], children=[], depth=0)¶

Bases: object

Node of a worker.Tree.

Variables:

state (tuple) – Only for the root Node: initial state (time, lat, lon), None for other node.
parent (worker.Node) – Reference to the parent of this node.
origins (list) – sequel of actions taken from to root node to this node.
children (list) – Child nodes of this node.
actions (list) – Remaining actions available (not expanded) from this node in random order.
Values (numpy.array) – Array of utils.Hist to save the rewards. Its size is the number of possible actions.
depth (int) – Depth of the node in the Tree.

back_up(reward)¶

Propagates the reward through the Tree, starting from this worker.Node, up to the root.

Parameters:	reward (float) – The reward corresponding to the expansion of this `worker.Node`.

get_uct(num_parent)¶

Computes the uct values of this worker.Node (combination of exploration and exploitation)

Parameters:	num_parent (int) – Number of times the parent of the `worker.Node` has been explored.
Return float:	The uct value.

is_fully_expanded()¶

Returns True if this worker.Node is fully expanded (if there is not remaining actions)

Return boolean:	True if fully expanded, False otherwise.

worker.RHO = 0.3¶: Proportion between master utility and worker utility of node utility.

class worker.Tree(workerid, nscenario, probability=[], ite=0, budget=1000, simulator=None, destination=[], TimeMin=0, buffer=[])¶

Bases: object

A tree which represents a MCTS on one weather scenario.

Variables:

id (int) – Id of the tree, corresponding to the id scenario.
ite (int) – Current number of iterations done.
budget (int) – Max. number of iterations
simulator (Simulator) – The simulator used to do the boat simulations (step, etc.).
destination (list) – Position [lat, lon] of the wanted destination.
TimeMax (float) – Time horizon of the search.
TimeMin (float) – Minimum time to arrive to the destination, computed on several boats which go straight from the initial point to the destination (default policy).
depth (int) – Maximum depth of the tree.
Nodes (list) – List of worker.Node, representing the tree.
buffer (list) – The buffer is a list of updates to be included in the master Tree. One update is a list : [scenarioId, newNodeHash, parentHash, action, reward].
numScenarios (int) – Number total of scenarios used during the MCT parallel search.
probability (numpy.array) – array containing the probability of each scenario. Scenario id as a probability probability[id] to occur.

best_child(node, Master_nodes)¶

Select the best child of a node, by comparing their uct values. The comparison is based on the value of the child in this tree, but also in the master tree, if it exists there.

Parameters:	node (worker.Node) – The parent node. Master_nodes (dict) – Manager.dict() (Dictionary of `master_node.MasterNode`) which saves the nodes of every scenario.
Returns:	The best child (`worker.Node`) of the parent node given in parameter.

default_policy(node)¶

Policy used to compute the reward of a node. First, the state of the node is estimated with get_sim_to_estimate_state(), then the default policy is applied (going straight to the destination).

Parameters:	node (worker.Node) – The node one wants to evaluate.
Return float:	The rewards of the node.

expand(node)¶

Creates a new worker.Node from a node (its parent). The new node is expanded randomly using an available actions.

Parameters:	node (worker.Node) – The parent node.
Returns:	The new `worker.Node`.

get_master_uct(node_hash, Master_nodes)¶

Compute the uct value seen by the master tree.

Parameters:	node_hash (int) – the corresponding hash node. Master_nodes (dict) – Manager.dict() (Dictionary of `master_node.MasterNode` objects) which saves the nodes of every scenario.
Return float:	The uct value of the node passed in parameter.

get_sim_to_estimate_state(node)¶

Brings the simulator to an estimate state (time, lat, lon) of a node. Since the dynamic is not deterministic, the state is an estimation.

Parameters:	node (worker.Node) – The nodes one wants to estimate.

integrate_buffer(Master_nodes)¶

Integrates the buffer of update from this scenario (the buffer is an attribute of the Tree) . The buffer is a list of updates coming from the worker. One update is a list : [scenarioId, newNodeHash, parentHash, action, reward]

Parameters:	Master_nodes (dict) – Manager.dict() (Dictionary of `master_node.MasterNode` objects) which saves the nodes of every scenario.

is_node_terminal(node)¶

Checks if the corresponding time of a node is equal to the time horizon of the simulation.

Parameters:	node (worker.Node) – The node that is checked
Returns:	True if the node is terminal, False otherwise.

static is_state_at_dest(destination, stateA, stateB)¶

Determines if the boat has gone beyond the destination. In this case, computes how much the boat has overshot.

Parameters:	destination – Destination state (goal). stateA – Previous state of the simulator. stateB – Current state of the simulator.
Returns:	[True, frac] if the boat has reached the destination (frac is the last iteration proportion corresponding to the part stateA–destination). Returns [False, None] otherwise.

static is_state_terminal(simulator, state)¶

Determines if a state is considered as terminal (time is equal or larger than the time horizon, or if the boat is out of the zone of interest).

Parameters:	simulator (Simulator) – Simulator of the scenario. state (list) – State one wants to test [time, lat, lon]
Returns:	True if the state is considered as terminal, False otherwise.

tree_policy(node, master_nodes)¶

Method implementing the tree policy phase in the MCTS-UCT search. Starts from the root nodes and iteratively selects the best child of the nodes. A node must be fully expanded before we continue down to its best child.

Parameters:	node – starting node of the tree policy, usually the root node. Master_nodes (dict) – Manager.dict() (Dictionary of `master_node.MasterNode`) which saves the nodes of every scenario.
Returns:	the newly expanded node
Return type:	`Node`

uct_search(rootState, frequency, Master_nodes)¶

Launches the MCTS for the scenario.

Parameters:	rootState (list) – State [t_index, lat, lon] of the root node. frequency (int) – Length of the buffer: number of iterations between each buffer integrations. Master_nodes (dict) – Manager.dict() (Dictionary of `master_node.MasterNode`) which saves the nodes of every scenario.

worker.UCT_COEFF = 0.1414213562373095¶: Exploration coefficient in the UCT formula.

forest¶

Module managing multiple workers to carry out a parallel MCTS-UCT search and incorporating the results into a master.

class forest.Forest(listsimulators=[], destination=[], timemin=0, budget=100)¶

Bases: object

Object coordinating a MasterTree and its WorkerTrees during a parallel MCTS search.

Variables:

listsimulators (list) – a list of simulatorTLKT.Simulator objects, each simulator should correspond to a different weather scenario
destination (list(float,float)) – [lat,lon], destination to reach
timemin (float) – reference time to define the reward model, comes from the initialize_simulators() function
budget (int) – number of nodes in each worker at the end of the search

launch_search(root_state, frequency)¶

Launches a parallel MCTS search. It uses multiprocessing class. Each worker is a multiprocessing.Process object. The master is a multiprocessing.Manager hosting a dict() (called Master_nodes) that can be read and written into by each worker.

Parameters:	root_state (list(int,float,float)) – [time index, lat, lon], state of the root node of the sea frequency (int) – number of node that are expanded in a worker before this one updates the corresponding nodes in the master.
Returns:	dictionnary of `master_node.MasterNode` the keys are the nodes’ hashes. These results must be deep copied with `master_node.deepcopy_dict()` if you want to do anything with them.
Return type:	`multiprocessing.Manager.dict`

forest.create_simulators(weathers, numberofsim, simtimestep=6, stateinit=(0, 47.5, 356.5), ndaysim=8, deltalatlon=0.5)¶

Creates and returns a list of simulatorTLKT.Simulator.

Parameters:	weathers (list()) – List of `weatherTLKT.Weather` objects, each one is the wind scenario of the `simulatorTLKT.Simulator` object to be created numberofsim (int) – number of simulator we want to create. Must be lower or equal to len(weathers). The simulators are created by chronologically browsing the weathers list simtimestep (float) – time step in hours of the simulator stateinit (list(int,float,float)) – [t_index, lat, lon], initial state of the simulators ndaysim (int) – time horizon of the search. Must be smaller or equal to the weather object time horizon. deltalatlon (float) – latitude and longitude discretization of the simulators. Just for plotting purposes
Returns:	List of `simulatorTLKT.Simulator`
Return type:	list()

forest.download_scenarios(mydate, latBound=[43, 50], lonBound=[350, 360], website='http://nomads.ncep.noaa.gov:9090/dods/', scenario_ids=range(0, 21))¶

To download the weathers scenarios for a MCTS. Scenarios are saved as ‘../data/’ + mydate + ‘_’ + s_id + ‘00z.obj’ where s_id is the scenario id. id = 0 corresponds to the mean scenario. They are stored as weatherTLKT.Weather objects.

Warning: you might need to launch it from a terminal and not your IDE

Parameters:

mydate (string) – ‘yyyymmdd’ date of the day starting the weather forecast in US format
latBound (list(float,float)) – [latmin, latmax], latitude boundaries of the domain on which the forecast is desired
lonBound (list(float,float)) – [lonmin, lonmax], longitude boundaries of the domain on which the forecast is desired
website (string) – url of the server hosting the forecasts
scenario_ids (range) – range covering the forecasts scenarios we want to download

forest.initialize_simulators(sims, ntra, stateinit, missionheading, plot=False)¶

Find the destination point and the reference time for a list of simulators. These quantities are mandatory in order to carry out a MCTS parallel search.

Parameters:	sims (list()) – List of `simulatorTLKT.Simulator` ntra (int) – Number of trajectories used to estimate the initialization stateinit (list(int,float,float)) – t_index, lat, lon], starting point of the MCTS search missionheading (float) – angle in deg wrt true North of the desired destination from stateinit plot (bool) – if True displays the initialization trajectories
Returns:	[destination, timemin], destination are the coordinates of the destination point and timemin is in days
Return type:	list(list(float,float), float)

forest.load_scenarios(mydate, scenario_ids=range(0, 21), latBound=[-90, 90], lonBound=[0, 360], timeSteps=[0, 64])¶

Loads previously downloaded scenarios and returns the corresponding list of weatherTLKT.Weather objects.

Parameters:	mydate (string) – ‘yyyymmdd’ date of the day starting the weather forecast in US format scenario_ids – range covering the forecasts scenarios we want to download latBound (list(float,float)) – [latmin, latmax], latitude boundaries of the domain on which the forecast is desired. If smaller than domain of stored object, returns a cropped object. Returns stored object otherwise lonBound (list(float,float)) – [lonmin, lonmax], longitude boundaries of the domain on which the forecast is desired. If smaller than domain of stored object, returns a cropped object. Returns stored object otherwise timeSteps (list(int,int)) – [min_t_index, max_t_index], time span of the desired forecast. If smaller than domain of stored object, returns a cropped object. Returns stored object otherwise
Returns:	List of of `weatherTLKT.Weather`
Return type:	list()

forest.play_multiple_scenarios(sims)¶

Display an interactive representation of the wind conditions of a list of simulatorTLKT.Simulator objects.

Parameters:	sims (list()) – List of the `simulatorTLKT.Simulator` objects

Basic use example of the forest class:

import forest as ft
from master_node import deepcopy_dict
from master import MasterTree

# Scenario parameters
date = '20180223'  # January 30, 2018
latBounds = [40, 50]
lonBounds = [360 - 15, 360]

## Download and saves the 5 scenarios
## run this in a terminal if you actually want to download anything
# ft.download_scenarios(date, latBound=latBounds, lonBound=lonBounds, scenario_ids=range(1, 5))

# Load the weathers scnearios
Weathers = ft.load_scenarios(date, latBound=latBounds, lonBound=lonBounds, scenario_ids=range(1, 5))

# Simulators parameters
NUMBER_OF_SIM = 4  # <=20
SIM_TIME_STEP = 6  # in hours
STATE_INIT = [0, 42.5, 360 - 11.5]  # first state
N_DAYS_SIM = 4  # time horizon in days
missionheading = 0
n_trajectories = 50

# Create the simulators
sims = ft.create_simulators(Weathers, numberofsim=NUMBER_OF_SIM, simtimestep=SIM_TIME_STEP,
                            stateinit=STATE_INIT, ndaysim=N_DAYS_SIM)

# Initialize the simulators to get common destination and individual time min
destination, timemin = ft.initialize_simulators(sims, n_trajectories, STATE_INIT, missionheading, plot=True)
print("destination : " + str(destination) + "  &  timemin : " + str(timemin) + "\n")

# Search parameters
name = "tree_exemple"
frequency = 10
budget = 100

# Initialize the forest
forest = ft.Forest(listsimulators=sims, destination=destination, timemin=timemin, budget=budget)

# Launch the search
master_nodes = forest.launch_search(STATE_INIT, frequency)

# Save the result as a tree
new_dict = deepcopy_dict(master_nodes)
forest.master = MasterTree(sims, destination, nodes=new_dict)
forest.master.save_tree(name)

# Displays some juicy results
forest.master.plot_tree_uct()
forest.master.get_best_policy()
forest.master.plot_best_policy()
forest.master.plot_hist_best_policy(interactive=True)

master_node¶

Module implementing the node class as it is represented in a master tree, a master node stores rewards from multiple scenarios. It is separate from the master module as this class is used in both the master one (the nodes in the final master tree are master node) and in the forest module. In the forest module the Manager.dict used to multiprocess is a dictionary of master modes. However as this Manager.dict represents the master tree being built it is not a MasterTree object from the master module.

class master_node.MasterNode(numscenarios, nodehash=None, parentNode=None, action=None, rewards=[])¶

Bases: object

Node of a MasterTree.

Variables:

hash (int) – hash of the node (key of the dictionary master.MasterTree.nodes)
arm (int) – Action taken to get to this node from its parent.
parentNode (MasterNode) – parent of this node
rewards (numpy.array) – Array of Hist. Its shape is (#scenario, #possible actions).
children (list) – List of children (MasterNode)
depth (int) – Depth of the node.

add_reward(idscenario, reward)¶

Includes a reward into the histogram for a random action of one scenario.

Parameters:	idscenario (int) – id of the scenario/workertree from which the update is coming. reward (float) – reward of the update.

add_reward_action(idscenario, action, reward)¶

Includes a reward into the histogram for one action of one scenario.

Parameters:	idscenario (int) – id of the scenario/workertree from which the update is coming. action (int) – Action (in degree) of the update reward (float) – reward of the update

is_expanded(idscenario)¶

Check if this node has been expanded by a scenario.

Parameters:	idscenario – id of the scenario
Return boolean:	True if the scenario has expanded this node.

master_node.deepcopy_dict(nodes)¶

Return a deep copy of a the master Manager.dict object. Add also the children, the parentNode and the depth of each node. This method is called after the search before saving the result.

Parameters:	nodes (dict) – a dictionary with `master_node.MasterNode` objects
Returns:	the deep copy of the input dictionary

utils¶

Module implementing two useful classes : the histogram that is used in each node to stored the obtained rewards. And an Player class that implements interactive plots to help with result visualisation.

class utils.Hist(init=[])¶

Bases: object

Definition of the histogram class.

MAX_REWARD = 1.5¶: Upper bound of histogram

MEANS = array([ 0.0625, 0.1875, 0.3125, 0.4375, 0.5625, 0.6875, 0.8125, 0.9375, 1.0625, 1.1875, 1.3125, 1.4375])¶: mean value of each bin (mean between upper and lower threshold of each bin)

MIN_REWARD = 0¶: Lower bound of histogram

N_BINS = 12¶: Number of bins in histogram

THRESH = array([ 0. , 0.125, 0.25 , 0.375, 0.5 , 0.625, 0.75 , 0.875, 1. , 1.125, 1.25 , 1.375, 1.5 ])¶: Values of the thresholds between bins (lower and upper included)

add(value)¶: Adds the value to the corresponding bin. If value is lower than lowest (resp. highest) threshold then the value is added to the first (resp. last) bin. :param int value: value to be added to the histogram

get_mean()¶

Computes the mean value of the histogram

Return float:	mean value

class utils.Player(fig, func, frames=None, init_func=None, fargs=None, save_count=None, mini=0, maxi=100, pos=(0.125, 0.92), **kwargs)¶

Bases: matplotlib.animation.FuncAnimation

Class implementing interactive plot, inherits from FuncAnimation and must be used accordingly.

Package solver¶

master¶

worker¶

forest¶

master_node¶

utils¶

Table Of Contents

Related Topics

This Page