Reproducibility guide¶

Attention

Everything needed for this lab (including MAD and EnOS) is available at https://gitlab.inria.fr/Madeus/mad-openstack

Topology deployed in this lab¶

The lab makes use of EnOS to book the resources on Grid’5000, and MAD to deploy OpenStack on those resources. In particular, we will need four G5K machines for our deployment:

mad-node: a machine we will deploy ourselves to run EnOS and MAD;
control node: hosts the control modules, projects’ APIs and databases;
network node: hosts network agents;
compute node: manages the compute modules where guest VMs live.

Note that while we will deploy mad-node ourselves on G5K, the three other nodes will be deployed automatically by EnOS. The following figure depicts the status of the different components in play during the lab:

                       +---------------+
+----------------------+ g5k-frontend  +----------------------+
|                      +-------+-------+                      |
|                              |                              |
|                              v                              |
|                      +---------------+                      |
|           +----------+    mad-node   +----------+           |
|           |          +---------------+          |           |
|           |                  |                  |           |
|           v                  v                  v           |
|   +-------+-------+  +-------+-------+  +-------+-------+   |
|   | compute-node  |  | control-node  |  | network-node  |   |
|   |               |  |               |  |               |   |
|   | * container 1 |  | * container 1 |  | * container 1 |   |
|   | * container 2 |  | * container 2 |  | * container 2 |   |
|   | * ...         |  | * ...         |  | * ...         |   |
|   | * container n |  | * container n |  | * container n |   |
|   +---------------+  +---------------+  +---------------+   |
|                                                             |
+-------------------------------------------------------------+

EnOS will be in charge of provisioning the compute, control and network nodes. Afterwards, MAD will deploy Docker containers, each corresponding to an OpenStack service, inside each node. For instance, the control-node will host the nova-api and nova-scheduler containers, while the compute-node will host the nova-compute and nova-libvirt containers to provide VM hypervisor mechanisms.

Note that to deploy on G5K, we need a dedicated node to run EnOS and MAD because running experiments on the frontend is discouraged. This restriction is meant to avoid disturbing other users who are logged in to the frontend node, since it has limited resources. In a regular deployment, EnOS could be run directly from your laptop.

Provisioning of the mad-node¶

The first step is to determine on which cluster you will deploy OpenStack. To that end, you can run funk (Find yoUr Nodes on g5K) from any frontend to see the availability on G5K:

user@laptop:~$ ssh rennes.g5k
frennes:~$ funk -w 4:00:00

In this example, we check the availability of G5K’s clusters for the next four hours (adapt the duration to your situation). Note that you can change the duration of your reservation afterwards, using the oarwalltime command. Find a cluster with at least four nodes available before going further. Once this is done, connect to the cluster’s site, then get a new machine, which we will use as our Madeus node (mad-node). In this document, we target the nova cluster, located at the Lyon site:

frennes:~$ ssh lyon

We recommend working inside a tmux session, so that your work survives any failure of your ssh session. If the connection drops, connect to the frontend again and reattach to the session with tmux attach. To start the session:

flyon:~$ tmux

Then, there are two strategies to set up the Madeus node:

The first one (the easiest and fastest) is sufficient if you are not interested in collecting metrics from the mad-node. It sets up the mad-node without Kadeploy.
Otherwise, you should deploy this node with Kadeploy.

Deploy the mad-node without Kadeploy¶

flyon:~$ oarsub -I -l "nodes=1,walltime=4:00:00" -p "cluster='nova'"

In this example, we get a new machine in interactive mode (i.e. “-I”) for the next four hours from the nova cluster. If it succeeds you should be directly connected to this node (check your prompt).

Deploy the mad-node with Kadeploy¶

flyon:~$ oarsub -I -l "nodes=1,walltime=4:00:00" -p "cluster='nova'" -t deploy
flyon:~$ MAD_NODE=$(cat $OAR_NODE_FILE | uniq)
flyon:~$ kadeploy3 -f ${OAR_NODE_FILE} -e debian9-x64-big -k
...
flyon:~$ # The previous command can take some time...
...
flyon:~$ ssh root@${MAD_NODE} apt-get install dirmngr
flyon:~$ ssh ${MAD_NODE}

Compared with a plain interactive reservation, the difference in the oarsub command lies in the -t deploy option, which sets your job in “deploy” mode. Then, we ask kadeploy to provision the node with the G5K debian9-x64-big environment. Finally, we need to install dirmngr on the node.
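The MAD_NODE assignment above pipes the OAR node file through uniq because OAR usually writes one line per reserved core, so the same hostname appears several times. As a minimal sketch of the same idea in Python (the function name and the hostnames in the usage note are hypothetical, and unlike uniq this version also removes duplicates that are not adjacent):

```python
from typing import Iterable, List


def unique_nodes(lines: Iterable[str]) -> List[str]:
    """Return hostnames in first-seen order, without duplicates or blanks."""
    seen = {}
    for line in lines:
        host = line.strip()
        if host:
            # Regular dicts keep insertion order, so the first occurrence wins.
            seen.setdefault(host, None)
    return list(seen)


if __name__ == "__main__":
    # Hypothetical node file content: one line per core of a single node.
    sample = ["nova-1.lyon.grid5000.fr\n"] * 3
    print(unique_nodes(sample))
```

On a real reservation, the input could be the opened file whose path is stored in the OAR_NODE_FILE environment variable.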

Install MAD and EnOS on the mad-node¶

Download MAD in your working directory, and install it:

user@mad-node:~$ git clone https://gitlab.inria.fr/dpertin/mad_model -b expe/openstack mad
user@mad-node:~$ cd mad
user@mad-node:~/mad$ make install_deps
user@mad-node:~/mad$ source venv/bin/activate

Note that MAD is a Python project. We installed it inside a virtual environment, with virtualenv, to avoid conflicts between the versions of its dependencies and those already present on the system. Furthermore, nothing is installed outside the virtual environment, which keeps your OS clean. Remember that you have to be in the virtual environment to use MAD: if you open a new terminal on the mad-node, you need to re-enter the venv. For instance, now that MAD is installed, you can come back as follows:

user@laptop:~$ ssh lyon.g5k
user@flyon:~$ cd ~/mad
user@flyon:~/mad$ source venv/bin/activate

Please check everything is fine by running:

(venv) user@mad-node:~/mad$ ./bench.py -h
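If the command fails with missing modules, the virtual environment is probably not active in the current terminal. The following sketch, which is not part of MAD (the function name is hypothetical), shows one common way to check this from Python itself:

```python
import sys


def in_virtualenv() -> bool:
    """Tell whether the running interpreter belongs to a virtual environment."""
    # Old virtualenv releases set sys.real_prefix; the venv module and recent
    # virtualenv releases make sys.prefix differ from sys.base_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

Running it with the python found on your PATH after sourcing venv/bin/activate should report that the environment is active.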

Prepare the Benchmark¶

Set the benchmark parameters¶

The benchmark parameters are set in the bench_params.yaml file. In this file, you can define many benchmark scenarios that can be selected when you call the MAD benchmark tool. Here is a scenario example:

# This file describes different benchmark definitions. Definitions contain
# benchmark configurations and parameters given to execo:

#Here is how to describe a scenario:
#<scenario_title>:
    #params:
        #tool: <list of tools>              # ["mad", "shell"]
        #test_type: <list of tests>         # ["seq_1t", "dag_2t", "dag_nt", "dag_nt4"]
        #registry: <list of registry>       # ["cached", "local", "remote"]
    #iterations: <number of iterations>     # 10
    #monitoring: <boolean>                  # True
    #reservation_file: <path to enos file>  # "./enos/reservation_g5k_perf.yaml"

perf:
    # Here are defined the parameters related to the Execo bench engine:
    params:
        tool: ["mad"]
        test_type: ["seq_1t", "dag_2t", "dag_nt4"]
        registry: ["cached", "local", "remote"]
    # Here are defined global parameters for our benchmarks:
    iterations: 10
    monitoring: True
    reservation_file: "./enos/reservation_g5k_perf.yaml"

In a scenario, we first set the Execo parameters under the params key:

tool: the list of tools to consider (mad and/or shell);
test_type: the list of tests to benchmark;
registry: the configuration of the registry (cached, local and/or remote);
repeat (optional): the list of iteration indices to run.

Furthermore, we need global parameters:

iterations (optional): the number of iterations;
monitoring: when set to True it deploys the monitoring stack on nodes;
reservation_file: path to the reservation.yaml file required by EnOS.
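To make the expected structure concrete, here is a minimal sketch of a scenario check. It is an assumption about what a valid scenario looks like, based only on the keys listed above; bench.py may validate its input differently, and check_scenario is a hypothetical helper.

```python
REQUIRED_PARAMS = ("tool", "test_type", "registry")
REQUIRED_GLOBALS = ("monitoring", "reservation_file")


def check_scenario(scenario):
    """Return a list of problems found in a scenario; empty means it looks fine."""
    problems = []
    params = scenario.get("params", {})
    for key in REQUIRED_PARAMS:
        value = params.get(key)
        if not isinstance(value, list) or not value:
            problems.append("params.%s must be a non-empty list" % key)
    for key in REQUIRED_GLOBALS:
        if key not in scenario:
            problems.append("%s is missing" % key)
    # iterations is optional, but must be a positive integer when present.
    iterations = scenario.get("iterations")
    if iterations is not None and (not isinstance(iterations, int) or iterations < 1):
        problems.append("iterations must be a positive integer")
    return problems


# The perf scenario from bench_params.yaml, written as a Python dictionary.
perf = {
    "params": {
        "tool": ["mad"],
        "test_type": ["seq_1t", "dag_2t", "dag_nt4"],
        "registry": ["cached", "local", "remote"],
    },
    "iterations": 10,
    "monitoring": True,
    "reservation_file": "./enos/reservation_g5k_perf.yaml",
}

if __name__ == "__main__":
    print(check_scenario(perf))
```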

For instance, the perf scenario is suited to comparing multiple Madeus component definitions. Here it compares the deployment time of Ansible-like (i.e. seq_1t), Aeolus-like (i.e. dag_2t) and MAD-like (i.e. dag_nt4) definitions, under three registry configurations.
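To get a feeling for the size of such a scenario, the sketch below enumerates the parameter combinations of perf. It assumes that the benchmark engine explores the Cartesian product of the three lists and repeats each combination for every iteration; this is an illustration with the standard library, not the code used by bench.py.

```python
from itertools import product

# Values copied from the perf scenario.
params = {
    "tool": ["mad"],
    "test_type": ["seq_1t", "dag_2t", "dag_nt4"],
    "registry": ["cached", "local", "remote"],
}
iterations = 10

# One dictionary per combination of tool, test type and registry.
combinations = [
    dict(zip(params, values)) for values in product(*params.values())
]

if __name__ == "__main__":
    print(len(combinations))               # 1 * 3 * 3 = 9 combinations
    print(len(combinations) * iterations)  # 90 deployments under the assumption above
```

Under these assumptions, adding a value to any list multiplies the number of deployments, which is worth keeping in mind when choosing the walltime.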

Set the EnOS parameters¶

To book resources, EnOS reads a configuration file. This file states the OpenStack resources you want to measure, together with their topology. A configuration could say, “Deploy a basic OpenStack on a single node”, or “Put OpenStack control services on ClusterA and compute services on ClusterB”, but also “Deploy each OpenStack service on a dedicated node and add WAN network latency between them”.

The description of the configuration is done in a reservation.yaml file:

---
# ############################################### #
# Grid'5000 reservation parameters                #
# ############################################### #
provider:
  type: g5k
  name: 'mad-nova-perf'
  walltime: '4:00:00'
  # mandatory: you need to have exactly one vlan
  vlans:
     lyon: "{type='kavlan'}/vlan=1"
  # Be less strict on node distribution especially
  # when nodes are missing in the reservation
  # or not deployed
  role_distribution: debug

resources:
  nova:
    control: 1
    compute: 1
    network: 1


# ############################################### #
# Inventory to use                                #
# ############################################### #

# This will describe the topology of your services
inventory: inventory


# ############################################### #
# Docker registry parameters                      #
# ############################################### #

# A registry will be deployed and used during the deployment
registry:
  type: none


# ############################################### #
# Enos Customizations                             #
# ############################################### #
enable_monitoring: yes


# ############################################### #
# Kolla parameters                                #
# ############################################### #
# Repository
kolla_repo: "https://git.openstack.org/openstack/kolla-ansible"
kolla_ref: "stable/pike"

# Vars : globals.yml
kolla:
  kolla_base_distro: "centos"
  kolla_install_type: "source"
  docker_namespace: "beyondtheclouds"
  openstack_release: "5.0.1"
  enable_heat: "no"
  enable_horizon: "no"

Use your favorite text editor to open/create the reservation.yaml file, for instance: vim enos/reservation_g5k_perf.yaml, and edit the file to fit your situation. In the following, we will study some parts of the document in particular:

provider key: define on which testbed to deploy OpenStack;
resources key: define how many and what machines to deploy on the testbed.

First, this file provides a description of the desired testbed, under the provider key. The way you describe your topology may vary a little depending on the testbed you target. The current EnOS implementation supports Vagrant (VBox), Grid’5000, Chameleon and OpenStack itself. To that end, EnOS defines different providers, which are in charge of provisioning resources on a specific testbed. Please refer to the EnOS provider documentation to find examples of resource descriptions for each testbed. For the sake of this lab, we are going to use the Grid’5000 provider.

In particular, pay attention to 2 elements you might need to adapt here:

walltime: define the time of your reservation;
vlans: to interconnect your resources on G5K, EnOS relies on KaVLAN. A VLAN must be reserved on the site where you plan to deploy OpenStack, prior to the experiment. In our example, we reserve a VLAN on the lyon site. Adapt this value to the G5K site you plan to use.
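As an illustration of these two values, the sketch below converts a walltime string into a duration and builds the vlans entry for a given site. Both helpers are hypothetical and are not part of EnOS; the VLAN description string is simply the one used in the file above.

```python
from datetime import timedelta


def parse_walltime(walltime):
    """Convert an 'H:MM:SS' walltime string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def vlans_for(site):
    """Build the vlans mapping of the provider section for one G5K site."""
    return {site: "{type='kavlan'}/vlan=1"}


if __name__ == "__main__":
    print(parse_walltime("4:00:00").total_seconds())  # 14400.0
    print(vlans_for("lyon"))
```

Changing the site then only requires changing the argument, for instance vlans_for("rennes") when deploying on a Rennes cluster.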

Second, the desired resources are declared in the reservation_g5k_perf.yaml file, under the resources key. Here we declare the G5K cluster we target, as well as the resources we want to deploy. In the provided file, we declare 3 machines on the nova cluster (which is located in Lyon): a control node, a compute node and a network node, on which all the required OpenStack services will be deployed.
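The number of machines to look for when checking cluster availability follows directly from this section. A small sketch, with the resources section written as a Python dictionary and a hypothetical count_nodes helper:

```python
# The resources section of the reservation file, as a Python dictionary.
resources = {"nova": {"control": 1, "compute": 1, "network": 1}}


def count_nodes(resources):
    """Return the number of machines requested on each cluster."""
    return {cluster: sum(roles.values()) for cluster, roles in resources.items()}


per_cluster = count_nodes(resources)
# The mad-node is reserved separately, hence the extra machine.
total_with_mad_node = sum(per_cluster.values()) + 1

if __name__ == "__main__":
    print(per_cluster)          # {'nova': 3}
    print(total_with_mad_node)  # 4
```

This matches the four G5K machines required by the lab: three booked by EnOS plus the mad-node.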

Note

In the general case, you can generate this file by typing enos new > generated_reservation.yaml, which you can explore out of curiosity. It contains definitions for different deployment testbeds: not only G5K but also Vagrant, Chameleon, etc.

Benchmark¶

Launch the benchmark scenario with:

(venv) user@mad-node:~/mad$ ./bench.py -L -M -f bench_params.yaml -t perf

Note that at the end of your session, you can release your reservation by typing:

(venv) user@mad-node:~/mad$ cd enos
(venv) user@mad-node:~/mad/enos$ source venv/bin/activate
(venv) user@mad-node:~/mad/enos$ enos destroy --hard

This will destroy your whole deployment and delete your reservation.

Since execo keeps records of the benchmark in a directory called MadBench_<date>, you can stop the benchmark at some point, then continue the benchmark where it stopped by using the -c parameter as follows:

(venv) user@mad-node:~/mad$ ./bench.py -L -M -f bench_params.yaml -t perf -c MadBench_<date>

Please be sure to adapt the desired record directory MadBench_<date> to your situation.
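If several record directories have accumulated, picking the right one by hand can be error-prone. The sketch below is not part of MAD: it selects the most recently modified directory whose name starts with MadBench_, relying on the modification time rather than on the format of the date in the name, which is not specified here. The function name is hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_record_dir(workdir: str = ".") -> Optional[Path]:
    """Return the most recently modified MadBench_* directory, or None."""
    candidates = [p for p in Path(workdir).glob("MadBench_*") if p.is_dir()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    record = latest_record_dir()
    if record is None:
        print("no record directory found")
    else:
        print("resume with: ./bench.py -L -M -f bench_params.yaml -t perf -c", record)
```

Check that the selected directory really belongs to the scenario you want to resume before passing it to -c.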