Package solver

This package contains all the modules required to perform a parallel Monte-Carlo Tree Search with Upper Confidence bounds for Tree on multiple weather scenarios.

master

Module implementing a master tree that incorporates the results of a parallel MCTS-UCT search. However this object is not directly used during the search and is created afterwards, once the search in completed.

class master.MasterTree(sims, destination, nodes={}, proba=[])

Bases: object

Tree that stores the final result of a MCTS parallel search on multiple scenarios. Is not the direct output of the search but already incorporates some post-processing.

Variables:
  • nodes (dict) – dictionary containing master_node.MasterNode, the keys are their corresponding hash
  • probability (numpy.array) – array containing the probability of each scenario
  • Simulators (list) – List of the :class:`simulatorTLKT.Simulator`objects used during the search
  • numScenarios (int) – Number of scenarios
  • destination (list) – Destination state [lat, long]
  • best_policy (dict) – Dictionary of list of actions. Key of the dictionary is the scenario id. Key -1 is for the average policy. The list is the sequel of action.
  • best_global_nodes_policy (dict) – Dictionary of list of master_node.MasterNode encountered during the best policy of one scenario. The Key of the dictionary is the scenario id. Key -1 is for the average policy.
get_best_child(node, idscenario=-1, verbose=False)

Compares the children of a node based on their rewards and return the best one.

Parameters:
  • node (MasterNode) – the parent master_node.MasterNode
  • idscenario (int) – id of the considered scenario. If default (-1), the method returns the best child for the global tree
  • verbose (bool) – If True, prints the best reward and the best action.
Returns:

A tuple: (the best child, the action taken to go from the node to its best child)

get_best_policy()

Computes the best policy for each scenario and the global best policy. And add it to the object.

get_points(node, points, probability, coordinate=(0, 0), idscenario=-1, objective='depth')

Recursive function used in plot_tree() and plot_tree_colored() to compute the coordinates of a node in the plot and other node properties depending on the objective parameter

Parameters:
  • node – a master_node.MasterNode object
  • points (list) – the previous list of points
  • probability (np.array) – probability of each scenario
  • coordinate (tuple) – coordinates of the previous point
  • idscenario (int) – id of the corresponding worker tree to be plot. If -1 (default), the global tree is plotted.
  • objective (string) – “uct” to compute exploration, explotation and utility. “depth” to compute the depth of each node.
Returns:

the expanded list of points, a point being a tuple

get_utility(node, idscenario)

Computes the utility of an node as the sum of the exploration and exploitation term. It done either for a given scenario or as the weighted mean over all the scenario (global utility) if idscenario = -1

Parameters:
Returns:

exploitation term, exploration term

Return type:

float, float

guess_reward(node, idscenario)

Estimates the reward of an unexplored node (for a given scenario) as the lower reward such that its father node would be expanded next.

Parameters:
Returns:

The estimated reward

Return type:

float

classmethod load_tree(name)

Load a master tree (object) from the data Folder.

Parameters:name – Name of the file.
plot_best_policy(idscenario=-1, number_subplots=1)

Plot a representation of a tree and its best policy.

Parameters:
  • grey (boolean) – if True, each node/branch are plot with a color (grey scale) depending of the depth of the node
  • idscenario (int) – id of the corresponding worker tree to be plot. If -1 (default), the global tree is plotted.
Returns:

A tuple (fig, ax) of the current plot

plot_hist_best_policy(idscenario=-1, interactive=False)

Plot the best policy as in plot_best_policy(), with the histogram of the best action at each node (Animation)

Parameters:
  • idscenario (int) – id of the corresponding worker tree to be plot. If -1 (default), the global tree is plotted.
  • interactive (bool) – if True the plot is not an animation but can be browsed step by step
Returns:

Animation

plot_tree(idscenario=-1, number_subplots=1, gray=True)

Plot a 2D representation of a tree.

Parameters:
  • gray (boolean) – if True, each node/branch are plot with a color (grey scale) depending of the depth of the node
  • idscenario (int) – id of the corresponding worker tree to be plot. If -1 (default), the global tree is plotted.
Returns:

A tuple (fig, ax) of the current plot

plot_tree_uct(idscenario=-1)

Plot the tree 3 times: for the first one the colormap represents the sum of exploitation and exploration for each node , the second one represents the exploitation and the third one the exploration.

Parameters:idscenario (int) – id of the corresponding worker tree to be plot. If -1 (default), the global tree is plotted.
Returns:A tuple (fig, ax) of the current plot
save_tree(name)

Save the master tree (object) in the data Folder.

Parameters:name – Name of the file.

worker

Module implementing a worker in the parallel MCTS-UCT search. It is basically a search on a fixed weather scenario.

Created on Wed May 31 10:06:46 2017

@author: paul

class worker.Node(state=None, parent=None, origins=[], children=[], depth=0)

Bases: object

Node of a worker.Tree.

Variables:
  • state (tuple) – Only for the root Node: initial state (time, lat, lon), None for other node.
  • parent (worker.Node) – Reference to the parent of this node.
  • origins (list) – sequel of actions taken from to root node to this node.
  • children (list) – Child nodes of this node.
  • actions (list) – Remaining actions available (not expanded) from this node in random order.
  • Values (numpy.array) – Array of utils.Hist to save the rewards. Its size is the number of possible actions.
  • depth (int) – Depth of the node in the Tree.
back_up(reward)

Propagates the reward through the Tree, starting from this worker.Node, up to the root.

Parameters:reward (float) – The reward corresponding to the expansion of this worker.Node.
get_uct(num_parent)

Computes the uct values of this worker.Node (combination of exploration and exploitation)

Parameters:num_parent (int) – Number of times the parent of the worker.Node has been explored.
Return float:The uct value.
is_fully_expanded()

Returns True if this worker.Node is fully expanded (if there is not remaining actions)

Return boolean:True if fully expanded, False otherwise.
worker.RHO = 0.3

Proportion between master utility and worker utility of node utility.

class worker.Tree(workerid, nscenario, probability=[], ite=0, budget=1000, simulator=None, destination=[], TimeMin=0, buffer=[])

Bases: object

A tree which represents a MCTS on one weather scenario.

Variables:
  • id (int) – Id of the tree, corresponding to the id scenario.
  • ite (int) – Current number of iterations done.
  • budget (int) – Max. number of iterations
  • simulator (Simulator) – The simulator used to do the boat simulations (step, etc.).
  • destination (list) – Position [lat, lon] of the wanted destination.
  • TimeMax (float) – Time horizon of the search.
  • TimeMin (float) – Minimum time to arrive to the destination, computed on several boats which go straight from the initial point to the destination (default policy).
  • depth (int) – Maximum depth of the tree.
  • Nodes (list) – List of worker.Node, representing the tree.
  • buffer (list) – The buffer is a list of updates to be included in the master Tree. One update is a list : [scenarioId, newNodeHash, parentHash, action, reward].
  • numScenarios (int) – Number total of scenarios used during the MCT parallel search.
  • probability (numpy.array) – array containing the probability of each scenario. Scenario id as a probability probability[id] to occur.
best_child(node, Master_nodes)

Select the best child of a node, by comparing their uct values. The comparison is based on the value of the child in this tree, but also in the master tree, if it exists there.

Parameters:
Returns:

The best child (worker.Node) of the parent node given in parameter.

default_policy(node)

Policy used to compute the reward of a node. First, the state of the node is estimated with get_sim_to_estimate_state(), then the default policy is applied (going straight to the destination).

Parameters:node (worker.Node) – The node one wants to evaluate.
Return float:The rewards of the node.
expand(node)

Creates a new worker.Node from a node (its parent). The new node is expanded randomly using an available actions.

Parameters:node (worker.Node) – The parent node.
Returns:The new worker.Node.
get_master_uct(node_hash, Master_nodes)

Compute the uct value seen by the master tree.

Parameters:
Return float:

The uct value of the node passed in parameter.

get_sim_to_estimate_state(node)

Brings the simulator to an estimate state (time, lat, lon) of a node. Since the dynamic is not deterministic, the state is an estimation.

Parameters:node (worker.Node) – The nodes one wants to estimate.
integrate_buffer(Master_nodes)

Integrates the buffer of update from this scenario (the buffer is an attribute of the Tree) . The buffer is a list of updates coming from the worker. One update is a list : [scenarioId, newNodeHash, parentHash, action, reward]

Parameters:Master_nodes (dict) –

Manager.dict() (Dictionary of master_node.MasterNode objects) which saves the nodes of every scenario.

is_node_terminal(node)

Checks if the corresponding time of a node is equal to the time horizon of the simulation.

Parameters:node (worker.Node) – The node that is checked
Returns:True if the node is terminal, False otherwise.
static is_state_at_dest(destination, stateA, stateB)

Determines if the boat has gone beyond the destination. In this case, computes how much the boat has overshot.

Parameters:
  • destination – Destination state (goal).
  • stateA – Previous state of the simulator.
  • stateB – Current state of the simulator.
Returns:

[True, frac] if the boat has reached the destination (frac is the last iteration proportion corresponding to the part stateA–destination). Returns [False, None] otherwise.

static is_state_terminal(simulator, state)

Determines if a state is considered as terminal (time is equal or larger than the time horizon, or if the boat is out of the zone of interest).

Parameters:
  • simulator (Simulator) – Simulator of the scenario.
  • state (list) – State one wants to test [time, lat, lon]
Returns:

True if the state is considered as terminal, False otherwise.

tree_policy(node, master_nodes)

Method implementing the tree policy phase in the MCTS-UCT search. Starts from the root nodes and iteratively selects the best child of the nodes. A node must be fully expanded before we continue down to its best child.

Parameters:
  • node – starting node of the tree policy, usually the root node.
  • Master_nodes (dict) –

    Manager.dict() (Dictionary of master_node.MasterNode) which saves the nodes of every scenario.

Returns:

the newly expanded node

Return type:

Node

Launches the MCTS for the scenario.

Parameters:
  • rootState (list) – State [t_index, lat, lon] of the root node.
  • frequency (int) – Length of the buffer: number of iterations between each buffer integrations.
  • Master_nodes (dict) –

    Manager.dict() (Dictionary of master_node.MasterNode) which saves the nodes of every scenario.

worker.UCT_COEFF = 0.1414213562373095

Exploration coefficient in the UCT formula.

forest

Module managing multiple workers to carry out a parallel MCTS-UCT search and incorporating the results into a master.

class forest.Forest(listsimulators=[], destination=[], timemin=0, budget=100)

Bases: object

Object coordinating a MasterTree and its WorkerTrees during a parallel MCTS search.

Variables:
  • listsimulators (list) – a list of simulatorTLKT.Simulator objects, each simulator should correspond to a different weather scenario
  • destination (list(float,float)) – [lat,lon], destination to reach
  • timemin (float) – reference time to define the reward model, comes from the initialize_simulators() function
  • budget (int) – number of nodes in each worker at the end of the search

Launches a parallel MCTS search. It uses multiprocessing class. Each worker is a multiprocessing.Process object. The master is a multiprocessing.Manager hosting a dict() (called Master_nodes) that can be read and written into by each worker.

Parameters:
  • root_state (list(int,float,float)) – [time index, lat, lon], state of the root node of the sea
  • frequency (int) – number of node that are expanded in a worker before this one updates the corresponding nodes in the master.
Returns:

dictionnary of master_node.MasterNode the keys are the nodes’ hashes. These results must be deep copied with master_node.deepcopy_dict() if you want to do anything with them.

Return type:

multiprocessing.Manager.dict

forest.create_simulators(weathers, numberofsim, simtimestep=6, stateinit=(0, 47.5, 356.5), ndaysim=8, deltalatlon=0.5)

Creates and returns a list of simulatorTLKT.Simulator.

Parameters:
  • weathers (list()) – List of weatherTLKT.Weather objects, each one is the wind scenario of the simulatorTLKT.Simulator object to be created
  • numberofsim (int) – number of simulator we want to create. Must be lower or equal to len(weathers). The simulators are created by chronologically browsing the weathers list
  • simtimestep (float) – time step in hours of the simulator
  • stateinit (list(int,float,float)) – [t_index, lat, lon], initial state of the simulators
  • ndaysim (int) – time horizon of the search. Must be smaller or equal to the weather object time horizon.
  • deltalatlon (float) – latitude and longitude discretization of the simulators. Just for plotting purposes
Returns:

List of simulatorTLKT.Simulator

Return type:

list()

forest.download_scenarios(mydate, latBound=[43, 50], lonBound=[350, 360], website='http://nomads.ncep.noaa.gov:9090/dods/', scenario_ids=range(0, 21))

To download the weathers scenarios for a MCTS. Scenarios are saved as ‘../data/’ + mydate + ‘_’ + s_id + ‘00z.obj’ where s_id is the scenario id. id = 0 corresponds to the mean scenario. They are stored as weatherTLKT.Weather objects.

Warning: you might need to launch it from a terminal and not your IDE

Parameters:
  • mydate (string) – ‘yyyymmdd’ date of the day starting the weather forecast in US format
  • latBound (list(float,float)) – [latmin, latmax], latitude boundaries of the domain on which the forecast is desired
  • lonBound (list(float,float)) – [lonmin, lonmax], longitude boundaries of the domain on which the forecast is desired
  • website (string) – url of the server hosting the forecasts
  • scenario_ids (range) – range covering the forecasts scenarios we want to download
forest.initialize_simulators(sims, ntra, stateinit, missionheading, plot=False)

Find the destination point and the reference time for a list of simulators. These quantities are mandatory in order to carry out a MCTS parallel search.

Parameters:
  • sims (list()) – List of simulatorTLKT.Simulator
  • ntra (int) – Number of trajectories used to estimate the initialization
  • stateinit (list(int,float,float)) – t_index, lat, lon], starting point of the MCTS search
  • missionheading (float) – angle in deg wrt true North of the desired destination from stateinit
  • plot (bool) – if True displays the initialization trajectories
Returns:

[destination, timemin], destination are the coordinates of the destination point and timemin is in days

Return type:

list(list(float,float), float)

forest.load_scenarios(mydate, scenario_ids=range(0, 21), latBound=[-90, 90], lonBound=[0, 360], timeSteps=[0, 64])

Loads previously downloaded scenarios and returns the corresponding list of weatherTLKT.Weather objects.

Parameters:
  • mydate (string) – ‘yyyymmdd’ date of the day starting the weather forecast in US format
  • scenario_ids – range covering the forecasts scenarios we want to download
  • latBound (list(float,float)) – [latmin, latmax], latitude boundaries of the domain on which the forecast is desired. If smaller than domain of stored object, returns a cropped object. Returns stored object otherwise
  • lonBound (list(float,float)) – [lonmin, lonmax], longitude boundaries of the domain on which the forecast is desired. If smaller than domain of stored object, returns a cropped object. Returns stored object otherwise
  • timeSteps (list(int,int)) – [min_t_index, max_t_index], time span of the desired forecast. If smaller than domain of stored object, returns a cropped object. Returns stored object otherwise
Returns:

List of of weatherTLKT.Weather

Return type:

list()

forest.play_multiple_scenarios(sims)

Display an interactive representation of the wind conditions of a list of simulatorTLKT.Simulator objects.

Parameters:sims (list()) – List of the simulatorTLKT.Simulator objects

Basic use example of the forest class:

import forest as ft
from master_node import deepcopy_dict
from master import MasterTree

# Scenario parameters
date = '20180223'  # January 30, 2018
latBounds = [40, 50]
lonBounds = [360 - 15, 360]

## Download and saves the 5 scenarios
## run this in a terminal if you actually want to download anything
# ft.download_scenarios(date, latBound=latBounds, lonBound=lonBounds, scenario_ids=range(1, 5))

# Load the weathers scnearios
Weathers = ft.load_scenarios(date, latBound=latBounds, lonBound=lonBounds, scenario_ids=range(1, 5))

# Simulators parameters
NUMBER_OF_SIM = 4  # <=20
SIM_TIME_STEP = 6  # in hours
STATE_INIT = [0, 42.5, 360 - 11.5]  # first state
N_DAYS_SIM = 4  # time horizon in days
missionheading = 0
n_trajectories = 50

# Create the simulators
sims = ft.create_simulators(Weathers, numberofsim=NUMBER_OF_SIM, simtimestep=SIM_TIME_STEP,
                            stateinit=STATE_INIT, ndaysim=N_DAYS_SIM)

# Initialize the simulators to get common destination and individual time min
destination, timemin = ft.initialize_simulators(sims, n_trajectories, STATE_INIT, missionheading, plot=True)
print("destination : " + str(destination) + "  &  timemin : " + str(timemin) + "\n")

# Search parameters
name = "tree_exemple"
frequency = 10
budget = 100

# Initialize the forest
forest = ft.Forest(listsimulators=sims, destination=destination, timemin=timemin, budget=budget)

# Launch the search
master_nodes = forest.launch_search(STATE_INIT, frequency)

# Save the result as a tree
new_dict = deepcopy_dict(master_nodes)
forest.master = MasterTree(sims, destination, nodes=new_dict)
forest.master.save_tree(name)

# Displays some juicy results
forest.master.plot_tree_uct()
forest.master.get_best_policy()
forest.master.plot_best_policy()
forest.master.plot_hist_best_policy(interactive=True)

master_node

Module implementing the node class as it is represented in a master tree, a master node stores rewards from multiple scenarios. It is separate from the master module as this class is used in both the master one (the nodes in the final master tree are master node) and in the forest module. In the forest module the Manager.dict used to multiprocess is a dictionary of master modes. However as this Manager.dict represents the master tree being built it is not a MasterTree object from the master module.

class master_node.MasterNode(numscenarios, nodehash=None, parentNode=None, action=None, rewards=[])

Bases: object

Node of a MasterTree.

Variables:
  • hash (int) – hash of the node (key of the dictionary master.MasterTree.nodes)
  • arm (int) – Action taken to get to this node from its parent.
  • parentNode (MasterNode) – parent of this node
  • rewards (numpy.array) – Array of Hist. Its shape is (#scenario, #possible actions).
  • children (list) – List of children (MasterNode)
  • depth (int) – Depth of the node.
add_reward(idscenario, reward)

Includes a reward into the histogram for a random action of one scenario.

Parameters:
  • idscenario (int) – id of the scenario/workertree from which the update is coming.
  • reward (float) – reward of the update.
add_reward_action(idscenario, action, reward)

Includes a reward into the histogram for one action of one scenario.

Parameters:
  • idscenario (int) – id of the scenario/workertree from which the update is coming.
  • action (int) – Action (in degree) of the update
  • reward (float) – reward of the update
is_expanded(idscenario)

Check if this node has been expanded by a scenario.

Parameters:idscenario – id of the scenario
Return boolean:True if the scenario has expanded this node.
master_node.deepcopy_dict(nodes)

Return a deep copy of a the master Manager.dict object. Add also the children, the parentNode and the depth of each node. This method is called after the search before saving the result.

Parameters:nodes (dict) – a dictionary with master_node.MasterNode objects
Returns:the deep copy of the input dictionary

utils

Module implementing two useful classes : the histogram that is used in each node to stored the obtained rewards. And an Player class that implements interactive plots to help with result visualisation.

class utils.Hist(init=[])

Bases: object

Definition of the histogram class.

MAX_REWARD = 1.5

Upper bound of histogram

MEANS = array([ 0.0625, 0.1875, 0.3125, 0.4375, 0.5625, 0.6875, 0.8125, 0.9375, 1.0625, 1.1875, 1.3125, 1.4375])

mean value of each bin (mean between upper and lower threshold of each bin)

MIN_REWARD = 0

Lower bound of histogram

N_BINS = 12

Number of bins in histogram

THRESH = array([ 0. , 0.125, 0.25 , 0.375, 0.5 , 0.625, 0.75 , 0.875, 1. , 1.125, 1.25 , 1.375, 1.5 ])

Values of the thresholds between bins (lower and upper included)

add(value)

Adds the value to the corresponding bin. If value is lower than lowest (resp. highest) threshold then the value is added to the first (resp. last) bin. :param int value: value to be added to the histogram

get_mean()

Computes the mean value of the histogram

Return float:mean value
class utils.Player(fig, func, frames=None, init_func=None, fargs=None, save_count=None, mini=0, maxi=100, pos=(0.125, 0.92), **kwargs)

Bases: matplotlib.animation.FuncAnimation

Class implementing interactive plot, inherits from FuncAnimation and must be used accordingly.