Package solver¶
This package contains all the modules required to perform a parallel Monte-Carlo Tree Search with Upper Confidence bounds for Tree on multiple weather scenarios.
master¶
Module implementing a master tree that incorporates the results of a parallel MCTS-UCT search. However this object is not directly used during the search and is created afterwards, once the search in completed.
-
class
master.
MasterTree
(sims, destination, nodes={}, proba=[])¶ Bases:
object
Tree that stores the final result of a MCTS parallel search on multiple scenarios. Is not the direct output of the search but already incorporates some post-processing.
Variables: - nodes (dict) – dictionary containing
master_node.MasterNode
, the keys are their corresponding hash - probability (numpy.array) – array containing the probability of each scenario
- Simulators (list) – List of the :class:`simulatorTLKT.Simulator`objects used during the search
- numScenarios (int) – Number of scenarios
- destination (list) – Destination state [lat, long]
- best_policy (dict) – Dictionary of list of actions. Key of the dictionary is the scenario id. Key -1 is for the average policy. The list is the sequel of action.
- best_global_nodes_policy (dict) – Dictionary of list of
master_node.MasterNode
encountered during the best policy of one scenario. The Key of the dictionary is the scenario id. Key -1 is for the average policy.
-
get_best_child
(node, idscenario=-1, verbose=False)¶ Compares the children of a node based on their rewards and return the best one.
Parameters: - node (MasterNode) – the parent
master_node.MasterNode
- idscenario (int) – id of the considered scenario. If default (-1), the method returns the best child for the global tree
- verbose (bool) – If True, prints the best reward and the best action.
Returns: A tuple: (the best child, the action taken to go from the node to its best child)
- node (MasterNode) – the parent
-
get_best_policy
()¶ Computes the best policy for each scenario and the global best policy. And add it to the object.
-
get_points
(node, points, probability, coordinate=(0, 0), idscenario=-1, objective='depth')¶ Recursive function used in
plot_tree()
andplot_tree_colored()
to compute the coordinates of a node in the plot and other node properties depending on the objective parameterParameters: - node – a
master_node.MasterNode
object - points (list) – the previous list of points
- probability (np.array) – probability of each scenario
- coordinate (tuple) – coordinates of the previous point
- idscenario (int) – id of the corresponding worker tree to be plot. If -1 (default), the global tree is plotted.
- objective (string) – “uct” to compute exploration, explotation and utility. “depth” to compute the depth of each node.
Returns: the expanded list of points, a point being a tuple
- node – a
-
get_utility
(node, idscenario)¶ Computes the utility of an node as the sum of the exploration and exploitation term. It done either for a given scenario or as the weighted mean over all the scenario (global utility) if idscenario = -1
Parameters: - node – The
master_node.MasterNode
whose utility we want to compute - idscenario – scenario’s id
Returns: exploitation term, exploration term
Return type: float, float
- node – The
-
guess_reward
(node, idscenario)¶ Estimates the reward of an unexplored node (for a given scenario) as the lower reward such that its father node would be expanded next.
Parameters: - node – The
master_node.MasterNode
whose reward we want to estimate - idscenario (int) – scenario’s id
Returns: The estimated reward
Return type: float
- node – The
-
classmethod
load_tree
(name)¶ Load a master tree (object) from the data Folder.
Parameters: name – Name of the file.
-
plot_best_policy
(idscenario=-1, number_subplots=1)¶ Plot a representation of a tree and its best policy.
Parameters: - grey (boolean) – if True, each node/branch are plot with a color (grey scale) depending of the depth of the node
- idscenario (int) – id of the corresponding worker tree to be plot. If -1 (default), the global tree is plotted.
Returns: A tuple (fig, ax) of the current plot
-
plot_hist_best_policy
(idscenario=-1, interactive=False)¶ Plot the best policy as in
plot_best_policy()
, with the histogram of the best action at each node (Animation)Parameters: - idscenario (int) – id of the corresponding worker tree to be plot. If -1 (default), the global tree is plotted.
- interactive (bool) – if True the plot is not an animation but can be browsed step by step
Returns:
-
plot_tree
(idscenario=-1, number_subplots=1, gray=True)¶ Plot a 2D representation of a tree.
Parameters: - gray (boolean) – if True, each node/branch are plot with a color (grey scale) depending of the depth of the node
- idscenario (int) – id of the corresponding worker tree to be plot. If -1 (default), the global tree is plotted.
Returns: A tuple (fig, ax) of the current plot
-
plot_tree_uct
(idscenario=-1)¶ Plot the tree 3 times: for the first one the colormap represents the sum of exploitation and exploration for each node , the second one represents the exploitation and the third one the exploration.
Parameters: idscenario (int) – id of the corresponding worker tree to be plot. If -1 (default), the global tree is plotted. Returns: A tuple (fig, ax) of the current plot
-
save_tree
(name)¶ Save the master tree (object) in the data Folder.
Parameters: name – Name of the file.
- nodes (dict) – dictionary containing
worker¶
Module implementing a worker in the parallel MCTS-UCT search. It is basically a search on a fixed weather scenario.
Created on Wed May 31 10:06:46 2017
@author: paul
-
class
worker.
Node
(state=None, parent=None, origins=[], children=[], depth=0)¶ Bases:
object
Node of a
worker.Tree
.Variables: - state (tuple) – Only for the root Node: initial state (time, lat, lon), None for other node.
- parent (worker.Node) – Reference to the parent of this node.
- origins (list) – sequel of actions taken from to root node to this node.
- children (list) – Child nodes of this node.
- actions (list) – Remaining actions available (not expanded) from this node in random order.
- Values (numpy.array) – Array of
utils.Hist
to save the rewards. Its size is the number of possible actions. - depth (int) – Depth of the node in the Tree.
-
back_up
(reward)¶ Propagates the reward through the Tree, starting from this
worker.Node
, up to the root.Parameters: reward (float) – The reward corresponding to the expansion of this worker.Node
.
-
get_uct
(num_parent)¶ Computes the uct values of this
worker.Node
(combination of exploration and exploitation)Parameters: num_parent (int) – Number of times the parent of the worker.Node
has been explored.Return float: The uct value.
-
is_fully_expanded
()¶ Returns True if this
worker.Node
is fully expanded (if there is not remaining actions)Return boolean: True if fully expanded, False otherwise.
-
worker.
RHO
= 0.3¶ Proportion between master utility and worker utility of node utility.
-
class
worker.
Tree
(workerid, nscenario, probability=[], ite=0, budget=1000, simulator=None, destination=[], TimeMin=0, buffer=[])¶ Bases:
object
A tree which represents a MCTS on one weather scenario.
Variables: - id (int) – Id of the tree, corresponding to the id scenario.
- ite (int) – Current number of iterations done.
- budget (int) – Max. number of iterations
- simulator (Simulator) – The simulator used to do the boat simulations (step, etc.).
- destination (list) – Position [lat, lon] of the wanted destination.
- TimeMax (float) – Time horizon of the search.
- TimeMin (float) – Minimum time to arrive to the destination, computed on several boats which go straight from the initial point to the destination (default policy).
- depth (int) – Maximum depth of the tree.
- Nodes (list) – List of
worker.Node
, representing the tree. - buffer (list) – The buffer is a list of updates to be included in the master Tree. One update is a list : [scenarioId, newNodeHash, parentHash, action, reward].
- numScenarios (int) – Number total of scenarios used during the MCT parallel search.
- probability (numpy.array) – array containing the probability of each scenario. Scenario id as a probability probability[id] to occur.
-
best_child
(node, Master_nodes)¶ Select the best child of a node, by comparing their uct values. The comparison is based on the value of the child in this tree, but also in the master tree, if it exists there.
Parameters: - node (worker.Node) – The parent node.
- Master_nodes (dict) – Manager.dict() (Dictionary of
master_node.MasterNode
) which saves the nodes of every scenario.
Returns: The best child (
worker.Node
) of the parent node given in parameter.
-
default_policy
(node)¶ Policy used to compute the reward of a node. First, the state of the node is estimated with
get_sim_to_estimate_state()
, then the default policy is applied (going straight to the destination).Parameters: node (worker.Node) – The node one wants to evaluate. Return float: The rewards of the node.
-
expand
(node)¶ Creates a new
worker.Node
from a node (its parent). The new node is expanded randomly using an available actions.Parameters: node (worker.Node) – The parent node. Returns: The new worker.Node
.
-
get_master_uct
(node_hash, Master_nodes)¶ Compute the uct value seen by the master tree.
Parameters: - node_hash (int) – the corresponding hash node.
- Master_nodes (dict) –
Manager.dict() (Dictionary of
master_node.MasterNode
objects) which saves the nodes of every scenario.
Return float: The uct value of the node passed in parameter.
-
get_sim_to_estimate_state
(node)¶ Brings the simulator to an estimate state (time, lat, lon) of a node. Since the dynamic is not deterministic, the state is an estimation.
Parameters: node (worker.Node) – The nodes one wants to estimate.
-
integrate_buffer
(Master_nodes)¶ Integrates the buffer of update from this scenario (the buffer is an attribute of the
Tree
) . The buffer is a list of updates coming from the worker. One update is a list : [scenarioId, newNodeHash, parentHash, action, reward]Parameters: Master_nodes (dict) – Manager.dict() (Dictionary of
master_node.MasterNode
objects) which saves the nodes of every scenario.
-
is_node_terminal
(node)¶ Checks if the corresponding time of a node is equal to the time horizon of the simulation.
Parameters: node (worker.Node) – The node that is checked Returns: True if the node is terminal, False otherwise.
-
static
is_state_at_dest
(destination, stateA, stateB)¶ Determines if the boat has gone beyond the destination. In this case, computes how much the boat has overshot.
Parameters: - destination – Destination state (goal).
- stateA – Previous state of the simulator.
- stateB – Current state of the simulator.
Returns: [True, frac] if the boat has reached the destination (frac is the last iteration proportion corresponding to the part stateA–destination). Returns [False, None] otherwise.
-
static
is_state_terminal
(simulator, state)¶ Determines if a state is considered as terminal (time is equal or larger than the time horizon, or if the boat is out of the zone of interest).
Parameters: - simulator (Simulator) – Simulator of the scenario.
- state (list) – State one wants to test [time, lat, lon]
Returns: True if the state is considered as terminal, False otherwise.
-
tree_policy
(node, master_nodes)¶ Method implementing the tree policy phase in the MCTS-UCT search. Starts from the root nodes and iteratively selects the best child of the nodes. A node must be fully expanded before we continue down to its best child.
Parameters: - node – starting node of the tree policy, usually the root node.
- Master_nodes (dict) –
Manager.dict() (Dictionary of
master_node.MasterNode
) which saves the nodes of every scenario.
Returns: the newly expanded node
Return type:
-
uct_search
(rootState, frequency, Master_nodes)¶ Launches the MCTS for the scenario.
Parameters: - rootState (list) – State [t_index, lat, lon] of the root node.
- frequency (int) – Length of the buffer: number of iterations between each buffer integrations.
- Master_nodes (dict) –
Manager.dict() (Dictionary of
master_node.MasterNode
) which saves the nodes of every scenario.
-
worker.
UCT_COEFF
= 0.1414213562373095¶ Exploration coefficient in the UCT formula.
forest¶
Module managing multiple workers to carry out a parallel MCTS-UCT search and incorporating the results into a master.
-
class
forest.
Forest
(listsimulators=[], destination=[], timemin=0, budget=100)¶ Bases:
object
Object coordinating a MasterTree and its WorkerTrees during a parallel MCTS search.
Variables: - listsimulators (list) – a list of
simulatorTLKT.Simulator
objects, each simulator should correspond to a different weather scenario - destination (list(float,float)) – [lat,lon], destination to reach
- timemin (float) – reference time to define the reward model, comes from the
initialize_simulators()
function - budget (int) – number of nodes in each worker at the end of the search
-
launch_search
(root_state, frequency)¶ Launches a parallel MCTS search. It uses multiprocessing class. Each worker is a multiprocessing.Process object. The master is a multiprocessing.Manager hosting a dict() (called Master_nodes) that can be read and written into by each worker.
Parameters: - root_state (list(int,float,float)) – [time index, lat, lon], state of the root node of the sea
- frequency (int) – number of node that are expanded in a worker before this one updates the corresponding nodes in the master.
Returns: dictionnary of
master_node.MasterNode
the keys are the nodes’ hashes. These results must be deep copied withmaster_node.deepcopy_dict()
if you want to do anything with them.Return type: multiprocessing.Manager.dict
- listsimulators (list) – a list of
-
forest.
create_simulators
(weathers, numberofsim, simtimestep=6, stateinit=(0, 47.5, 356.5), ndaysim=8, deltalatlon=0.5)¶ Creates and returns a list of
simulatorTLKT.Simulator
.Parameters: - weathers (list()) – List of
weatherTLKT.Weather
objects, each one is the wind scenario of thesimulatorTLKT.Simulator
object to be created - numberofsim (int) – number of simulator we want to create. Must be lower or equal to len(weathers). The simulators are created by chronologically browsing the weathers list
- simtimestep (float) – time step in hours of the simulator
- stateinit (list(int,float,float)) – [t_index, lat, lon], initial state of the simulators
- ndaysim (int) – time horizon of the search. Must be smaller or equal to the weather object time horizon.
- deltalatlon (float) – latitude and longitude discretization of the simulators. Just for plotting purposes
Returns: List of
simulatorTLKT.Simulator
Return type: list()
- weathers (list()) – List of
-
forest.
download_scenarios
(mydate, latBound=[43, 50], lonBound=[350, 360], website='http://nomads.ncep.noaa.gov:9090/dods/', scenario_ids=range(0, 21))¶ To download the weathers scenarios for a MCTS. Scenarios are saved as ‘../data/’ + mydate + ‘_’ + s_id + ‘00z.obj’ where s_id is the scenario id. id = 0 corresponds to the mean scenario. They are stored as
weatherTLKT.Weather
objects.Warning: you might need to launch it from a terminal and not your IDE
Parameters: - mydate (string) – ‘yyyymmdd’ date of the day starting the weather forecast in US format
- latBound (list(float,float)) – [latmin, latmax], latitude boundaries of the domain on which the forecast is desired
- lonBound (list(float,float)) – [lonmin, lonmax], longitude boundaries of the domain on which the forecast is desired
- website (string) – url of the server hosting the forecasts
- scenario_ids (range) – range covering the forecasts scenarios we want to download
-
forest.
initialize_simulators
(sims, ntra, stateinit, missionheading, plot=False)¶ Find the destination point and the reference time for a list of simulators. These quantities are mandatory in order to carry out a MCTS parallel search.
Parameters: - sims (list()) – List of
simulatorTLKT.Simulator
- ntra (int) – Number of trajectories used to estimate the initialization
- stateinit (list(int,float,float)) – t_index, lat, lon], starting point of the MCTS search
- missionheading (float) – angle in deg wrt true North of the desired destination from stateinit
- plot (bool) – if True displays the initialization trajectories
Returns: [destination, timemin], destination are the coordinates of the destination point and timemin is in days
Return type: list(list(float,float), float)
- sims (list()) – List of
-
forest.
load_scenarios
(mydate, scenario_ids=range(0, 21), latBound=[-90, 90], lonBound=[0, 360], timeSteps=[0, 64])¶ Loads previously downloaded scenarios and returns the corresponding list of
weatherTLKT.Weather
objects.Parameters: - mydate (string) – ‘yyyymmdd’ date of the day starting the weather forecast in US format
- scenario_ids – range covering the forecasts scenarios we want to download
- latBound (list(float,float)) – [latmin, latmax], latitude boundaries of the domain on which the forecast is desired. If smaller than domain of stored object, returns a cropped object. Returns stored object otherwise
- lonBound (list(float,float)) – [lonmin, lonmax], longitude boundaries of the domain on which the forecast is desired. If smaller than domain of stored object, returns a cropped object. Returns stored object otherwise
- timeSteps (list(int,int)) – [min_t_index, max_t_index], time span of the desired forecast. If smaller than domain of stored object, returns a cropped object. Returns stored object otherwise
Returns: List of of
weatherTLKT.Weather
Return type: list()
-
forest.
play_multiple_scenarios
(sims)¶ Display an interactive representation of the wind conditions of a list of
simulatorTLKT.Simulator
objects.Parameters: sims (list()) – List of the simulatorTLKT.Simulator
objects
Basic use example of the forest class:
import forest as ft
from master_node import deepcopy_dict
from master import MasterTree
# Scenario parameters
date = '20180223' # January 30, 2018
latBounds = [40, 50]
lonBounds = [360 - 15, 360]
## Download and saves the 5 scenarios
## run this in a terminal if you actually want to download anything
# ft.download_scenarios(date, latBound=latBounds, lonBound=lonBounds, scenario_ids=range(1, 5))
# Load the weathers scnearios
Weathers = ft.load_scenarios(date, latBound=latBounds, lonBound=lonBounds, scenario_ids=range(1, 5))
# Simulators parameters
NUMBER_OF_SIM = 4 # <=20
SIM_TIME_STEP = 6 # in hours
STATE_INIT = [0, 42.5, 360 - 11.5] # first state
N_DAYS_SIM = 4 # time horizon in days
missionheading = 0
n_trajectories = 50
# Create the simulators
sims = ft.create_simulators(Weathers, numberofsim=NUMBER_OF_SIM, simtimestep=SIM_TIME_STEP,
stateinit=STATE_INIT, ndaysim=N_DAYS_SIM)
# Initialize the simulators to get common destination and individual time min
destination, timemin = ft.initialize_simulators(sims, n_trajectories, STATE_INIT, missionheading, plot=True)
print("destination : " + str(destination) + " & timemin : " + str(timemin) + "\n")
# Search parameters
name = "tree_exemple"
frequency = 10
budget = 100
# Initialize the forest
forest = ft.Forest(listsimulators=sims, destination=destination, timemin=timemin, budget=budget)
# Launch the search
master_nodes = forest.launch_search(STATE_INIT, frequency)
# Save the result as a tree
new_dict = deepcopy_dict(master_nodes)
forest.master = MasterTree(sims, destination, nodes=new_dict)
forest.master.save_tree(name)
# Displays some juicy results
forest.master.plot_tree_uct()
forest.master.get_best_policy()
forest.master.plot_best_policy()
forest.master.plot_hist_best_policy(interactive=True)
master_node¶
Module implementing the node class as it is represented in a master tree, a master node stores rewards from multiple scenarios. It is separate from the master module as this class is used in both the master one (the nodes in the final master tree are master node) and in the forest module. In the forest module the Manager.dict used to multiprocess is a dictionary of master modes. However as this Manager.dict represents the master tree being built it is not a MasterTree object from the master module.
-
class
master_node.
MasterNode
(numscenarios, nodehash=None, parentNode=None, action=None, rewards=[])¶ Bases:
object
Node of a MasterTree.
Variables: - hash (int) – hash of the node (key of the dictionary
master.MasterTree.nodes
) - arm (int) – Action taken to get to this node from its parent.
- parentNode (MasterNode) – parent of this node
- rewards (numpy.array) – Array of Hist. Its shape is (#scenario, #possible actions).
- children (list) – List of children (
MasterNode
) - depth (int) – Depth of the node.
-
add_reward
(idscenario, reward)¶ Includes a reward into the histogram for a random action of one scenario.
Parameters: - idscenario (int) – id of the scenario/workertree from which the update is coming.
- reward (float) – reward of the update.
-
add_reward_action
(idscenario, action, reward)¶ Includes a reward into the histogram for one action of one scenario.
Parameters: - idscenario (int) – id of the scenario/workertree from which the update is coming.
- action (int) – Action (in degree) of the update
- reward (float) – reward of the update
-
is_expanded
(idscenario)¶ Check if this node has been expanded by a scenario.
Parameters: idscenario – id of the scenario Return boolean: True if the scenario has expanded this node.
- hash (int) – hash of the node (key of the dictionary
-
master_node.
deepcopy_dict
(nodes)¶ Return a deep copy of a the master Manager.dict object. Add also the children, the parentNode and the depth of each node. This method is called after the search before saving the result.
Parameters: nodes (dict) – a dictionary with master_node.MasterNode
objectsReturns: the deep copy of the input dictionary
utils¶
Module implementing two useful classes : the histogram that is used in each node to stored the obtained rewards. And an Player class that implements interactive plots to help with result visualisation.
-
class
utils.
Hist
(init=[])¶ Bases:
object
Definition of the histogram class.
-
MAX_REWARD
= 1.5¶ Upper bound of histogram
-
MEANS
= array([ 0.0625, 0.1875, 0.3125, 0.4375, 0.5625, 0.6875, 0.8125, 0.9375, 1.0625, 1.1875, 1.3125, 1.4375])¶ mean value of each bin (mean between upper and lower threshold of each bin)
-
MIN_REWARD
= 0¶ Lower bound of histogram
-
N_BINS
= 12¶ Number of bins in histogram
-
THRESH
= array([ 0. , 0.125, 0.25 , 0.375, 0.5 , 0.625, 0.75 , 0.875, 1. , 1.125, 1.25 , 1.375, 1.5 ])¶ Values of the thresholds between bins (lower and upper included)
-
add
(value)¶ Adds the value to the corresponding bin. If value is lower than lowest (resp. highest) threshold then the value is added to the first (resp. last) bin. :param int value: value to be added to the histogram
-
get_mean
()¶ Computes the mean value of the histogram
Return float: mean value
-
-
class
utils.
Player
(fig, func, frames=None, init_func=None, fargs=None, save_count=None, mini=0, maxi=100, pos=(0.125, 0.92), **kwargs)¶ Bases:
matplotlib.animation.FuncAnimation
Class implementing interactive plot, inherits from FuncAnimation and must be used accordingly.