As we’ve already seen in our previous articles, Mirantis Cloud Platform (MCP) is really flexible and can tackle lots of different use cases. Last time we looked at using Ceph as the OpenStack storage backend. Today we are reviewing different ways to leverage the Neutron Open vSwitch ML2 plugin instead of the standard OpenContrail SDN solution to offer networking as a service to our users.
Introduction
To model Open vSwitch networking in MCP, we first have to choose between different options. What kind of segmentation will we be using for our tenant networks, VxLAN or VLAN? Do we want to use distributed routers (DVR) for East-West routing, and also to access floating IPs directly on the compute nodes? Let's see how we can achieve all of this and more from our model.
VxLAN or VLAN segmentation?
Choosing which network segmentation will be used is as simple as a single configuration line.
VxLAN segmentation:
#!yaml
# vi classes/cluster/xxx/openstack/init.yml
parameters:
_param:
neutron_tenant_network_types: "flat,vxlan"
Notes:
- In the remainder of this article, replace xxx with the cluster name you chose when you generated your model.
- If you use VxLAN, make sure the data network MTU is set to at least 1550; we'll come back to this later.
VLAN segmentation:
Instead, set:
#!yaml
neutron_tenant_network_types: "flat,vlan"
If you need to auto-assign VLAN IDs to your tenant networks, you also have to specify a VLAN ID range:
#!yaml
neutron_tenant_vlan_range: "1200:1900"
MCP will then configure these VLANs in /etc/neutron/plugins/ml2/ml2_conf.ini. We'll explain bridge mappings in the next chapter, but they will be assigned to physnet2 like this:
#!yaml
network_vlan_ranges = physnet1,physnet2:1200:1900
So if a tenant then asks for a VLAN-segmented network, an ID within the 1200-1900 pool will be automatically selected, and traffic will go through the bridge associated with physnet2 (br-prv). When an external provider network is created with a VLAN outside of this range, the traffic will instead go through physnet1 (br-floating), which allows all VLANs (nothing is set). It's an easy way to differentiate the traffic paths of tenant networks and external networks.
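For example, this is roughly how a cloud admin could create such an external provider network on physnet1 with a VLAN outside the tenant range; the network name, subnet and VLAN ID 50 are made up for illustration:
# neutron net-create ext-net --router:external \
    --provider:network_type vlan \
    --provider:physical_network physnet1 \
    --provider:segmentation_id 50
# neutron subnet-create ext-net 203.0.113.0/24 --name ext-subnet --disable-dhcp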
both VxLAN and VLAN segmentation
It’s also possible to allow both VxLAN and VLAN tenant network types with:
#!yaml
neutron_tenant_network_types: "flat,vlan,vxlan"
In this situation the ordering matters: if VLAN comes first, Neutron will consume all available VLAN IDs from the allocated pool (network_vlan_ranges) before creating any VxLAN-backed network. With the opposite ordering, it would create VxLAN-backed tenant networks by default instead.
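On the neutron servers this translates, roughly, into the ML2 plugin configuration; an illustrative excerpt:
#!yaml
# /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
tenant_network_types = flat,vlan,vxlan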
about neutron VRRP HA routers
The VRRP heartbeat of Neutron HA routers is exchanged on a dedicated tenant network created using the VLAN or VxLAN type. If you allow both, as above, the first one in the list will be chosen. If you want otherwise, it's possible to specify it with l3_ha_network_type, but this isn't parametrized in the neutron salt formula yet, so the easiest way is to use a sensible ordering above instead. You also have to know that even if you remove all the HA routers of your project, the HA network will stay behind waiting for new routers to be instantiated; you won't see these networks, only the cloud admin can.
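For reference, if you were to set it by hand (or once the formula exposes it), it would be a single line in neutron.conf; a sketch forcing the HA network onto VxLAN:
#!yaml
# /etc/neutron/neutron.conf
[DEFAULT]
l3_ha_network_type = vxlan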
But let's come back to these routers later on.
Distributed Virtual Router (DVR)?
Still in openstack/init.yml, you can specify whether you want to use DVR or not, with the following parameters.
| _param | Non DVR | DVR east-west, network nodes for north-south | DVR east-west, floating IPs on compute |
|---|---|---|---|
| neutron_control_dvr | False | True | True |
| neutron_gateway_dvr | False | True | True |
| neutron_compute_dvr | False | True | True |
| neutron_gateway_agent_mode | legacy | dvr_snat | dvr_snat |
| neutron_compute_agent_mode | legacy | dvr | dvr |
| neutron_compute_external_access | False | False | True |
All DVR use cases will set router_distributed to True in neutron.conf, so all tenant routers will be DVR-based by default.
So, as an example, if you want DVR for both east-west traffic and floating IPs, with VxLAN segmentation and L3 HA routers, you should have the following in your openstack/init.yml:
#!yaml
_param:
neutron_tenant_network_types: "flat,vxlan"
neutron_control_dvr: True
neutron_gateway_dvr: True
neutron_compute_dvr: True
neutron_gateway_agent_mode: dvr_snat
neutron_compute_agent_mode: dvr
neutron_compute_external_access: True
neutron_l3_ha: True
neutron_global_physnet_mtu: 9000
neutron_external_mtu: 9000
By setting neutron_compute_external_access to True, a bridge mapping (physnet1) to br-floating will be created on the compute nodes to let them access the public network for instance floating IPs. North-South traffic of instances with floating IPs can then avoid going through the network nodes.
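On such a compute node you should then find a mapping similar to this in the OVS agent configuration; an illustrative excerpt:
#!yaml
# /etc/neutron/plugins/ml2/openvswitch_agent.ini
[ovs]
bridge_mappings = physnet1:br-floating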
If you use DVR, keep in mind that it is incompatible with advanced services (LBaaS, FWaaS, VPNaaS), IPv6 and L3 HA routers, but work is under way to add more support.
other settings
When the neutron salt state runs on our neutron-servers, the following additional settings from openstack/init.yml can also be applied to its configuration files.
| _param | meaning | /etc/neutron/… | conf param | default |
|---|---|---|---|---|
| neutron_l3_ha | use VRRP for router HA? | neutron.conf | l3_ha | False |
| neutron_global_physnet_mtu | MTU of the underlying physical network | neutron.conf | global_physnet_mtu | 1500 |
| neutron_external_mtu | MTU associated with the external network (physnet1) | plugins/ml2/ml2_conf.ini | physical_network_mtus | 1500 |
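With the 9000 MTU values from the earlier example, the rendered configuration would contain something along these lines (illustrative excerpts):
#!yaml
# /etc/neutron/neutron.conf
[DEFAULT]
global_physnet_mtu = 9000

# /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
physical_network_mtus = physnet1:9000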
Neutron Bridge mappings
A bridge mapping is a comma-separated list of <physical_network>:<bridge> pairs, telling the OVS agent which bridge to use for each physical network's tagged (VLAN) and untagged (flat) traffic.
Depending on the segmentation you've chosen, VxLAN or VLAN, different mappings will be configured by our salt-formula-neutron in /etc/neutron/plugins/ml2/openvswitch_agent.ini. As you'll see below, as soon as you use the VLAN tenant network type, a physnet2 mapping to br-prv will be configured.
| VxLAN | centralized | DVR east-west | DVR for all |
|---|---|---|---|
| Network nodes | physnet1:br-floating | physnet1:br-floating | physnet1:br-floating |
| Compute nodes | empty | empty | physnet1:br-floating |

| VLAN | centralized | DVR for all |
|---|---|---|
| Network nodes | physnet1:br-floating,physnet2:br-prv | physnet1:br-floating,physnet2:br-prv |
| Compute nodes | physnet2:br-prv | physnet1:br-floating,physnet2:br-prv |
- br-floating is a provider OVS bridge, created by the admin, connected to the external/public network and mapped to physnet1.
- br-prv is a provider OVS bridge, created by the admin, connected to the data network (VLAN segmented); it will be automatically connected to the integration bridge (br-int) where all guests are plugged. It's the tenant traffic bridge, mapped to physnet2.
But you don't have to care too much about these mappings: they are managed by the neutron salt formula based on the values of neutron_tenant_network_types and neutron_control_dvr, as described in the tables above.
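If you still want to double-check what the formula rendered once the neutron state has been applied, a quick grep does the job; on a VLAN gateway node you would expect something like:
gtw01# grep bridge_mappings /etc/neutron/plugins/ml2/openvswitch_agent.ini
bridge_mappings = physnet1:br-floating,physnet2:br-prv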
Linux Networking > Bonds and Bridges settings
We've just added mappings to Open vSwitch bridges; they need to exist on our nodes or Neutron won't be happy. So let's configure them by specifying Pillar data that salt-formula-linux will consume to create them on the network and compute nodes.
Pillar data for compute and gateway nodes needs to be specified in classes/cluster/xxx/openstack/compute.yml and classes/cluster/xxx/openstack/gateway.yml respectively.
| Network requirements | VxLAN centralized | VxLAN DVR east-west | VxLAN DVR for all | VLAN centralized | VLAN DVR for all |
|---|---|---|---|---|---|
| network nodes | br-floating, br-mesh (port on br-floating) | br-floating, br-mesh (port on br-floating) | br-floating, br-mesh (port on br-floating) | br-floating, br-prv (connected to br-floating) | br-floating, br-prv (connected to br-floating) |
| compute nodes | br-mesh (linux bridge) | br-mesh (linux bridge) | br-floating, br-mesh | br-prv | br-floating, br-prv (connected to br-floating) |
br-mgmt is also required on all nodes for OpenStack and other management traffic.
Let's decompose the different requirements in the next few sections.
br-floating
This Open vSwitch bridge is required in all of our use cases. It can easily be created from this YAML Pillar data snippet, which should be present in gateway.yml, and also in compute.yml if you want your floating IPs to be directly accessible (DVR for all):
#!yaml
parameters:
  linux:
    network:
      bridge: openvswitch
      interface:
        br-floating:
          enabled: true
          type: ovs_bridge
It would be useless without any connectivity, so we also have to associate with it a bond made of physical interfaces connected to the public network. VLAN tagging will be managed by the Neutron API.
#!yaml
primary_second_nic:
name: ${_param:primary_second_nic}
enabled: true
type: slave
mtu: 9000
master: bond0
primary_first_nic:
name: ${_param:primary_first_nic}
enabled: true
type: slave
mtu: 9000
master: bond0
bond0:
enabled: true
proto: manual
ovs_bridge: br-floating
ovs_type: OVSPort
type: bond
use_interfaces:
- ${_param:primary_second_nic}
- ${_param:primary_first_nic}
slaves: ${_param:primary_second_nic} ${_param:primary_first_nic}
mode: 4
mtu: 9000
You can replace LACP (mode 4) with active-backup if you prefer.
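Only the mode key of the bond changes; a minimal sketch, assuming the formula accepts the bonding mode name:
#!yaml
bond0:
  mode: active-backup   # instead of "mode: 4" (LACP)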
NIC names
You may wonder where primary_first_nic and primary_second_nic are defined. We've parameterized these NIC names to avoid repeating ourselves. Each node defines its NICs in classes/cluster/xxx/infra/config.yml like this:
#!yaml
# vi classes/cluster/xxx/infra/config.yml
classes:
- system.reclass.storage.system.openstack_gateway_cluster
...
parameters:
reclass:
storage:
node:
...
openstack_gateway_node01:
params:
primary_first_nic: enp3s0f0
primary_second_nic: enp3s0f1
More parameters for each of our nodes are defined in the Mirantis system repo. For example, a cluster of three gateways (gtw01, gtw02, gtw03) is already defined in system.reclass.storage.system.openstack_gateway_cluster, so you just need to define their NICs (${_param:primary_xxx_nic}) as shown above; the rest will be inherited. That's the whole purpose of Reclass: to abstract the complexity away from the cloud admin.
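Once the node classes are regenerated, you can check what a given gateway will actually receive by querying its pillar from the Salt master (minion glob shown for illustration):
cfg01# salt 'gtw01*' pillar.get linux:network:interface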
br-mesh | br-prv
Apart from br-floating, our network and compute nodes also require connectivity to our data network, using br-mesh (VxLAN) or br-prv (VLAN) for tenant traffic.
br-mesh on compute
On compute nodes, if you've selected VxLAN, you have to create a br-mesh Linux bridge to handle the encapsulated traffic, bind the tenant address to it and attach a VLAN subinterface of our bond to it:
#!yaml
# vi classes/cluster/xxx/openstack/compute.yml
parameters:
  linux:
    network:
      interface:
        ...
        br-mesh:
          enabled: true
          type: bridge
          address: ${_param:tenant_address}
          netmask: <DATA_NETWORK_NETMASK>
          mtu: 9000
          use_interfaces:
            - <BOND>.<VLAN_ID>
br-mesh on network nodes
On network nodes, br-mesh is an OVS internal port on br-floating, with a VLAN tag and an IP address:
#!yaml
br-mesh:
enabled: true
type: ovs_port
bridge: br-floating
proto: static
ovs_options: tag=<DATA_NETWORK_VLAN_ID>
address: ${_param:tenant_address}
netmask: <DATA_NETWORK_NETMASK>
br-prv on compute
For VLAN segmentation, you instead need to create a br-prv OVS bridge and connect a bond to it. VLANs will be managed by the Neutron API.
#!yaml
# vi classes/cluster/xxx/openstack/compute.yml
parameters:
  linux:
    network:
      bridge: openvswitch
      interface:
        bond0:
          enabled: true
          proto: manual
          ovs_bridge: br-prv
          ovs_type: OVSPort
          type: bond
          use_interfaces:
            - ${_param:primary_second_nic}
            - ${_param:primary_first_nic}
          slaves: ${_param:primary_second_nic} ${_param:primary_first_nic}
          mode: 4
          mtu: 9000
        br-prv:
          enabled: true
          type: ovs_bridge
br-prv on network nodes
On network nodes, br-prv is an OVS bridge connected to br-floating via a pair of patch ports:
#!yaml
# vi classes/cluster/xxx/openstack/gateway.yml
parameters:
linux:
network:
bridge: openvswitch
interface:
...
br-prv:
enabled: true
type: ovs_bridge
floating-to-prv:
enabled: true
type: ovs_port
port_type: patch
bridge: br-floating
peer: prv-to-floating
prv-to-floating:
enabled: true
type: ovs_port
port_type: patch
bridge: br-prv
peer: floating-to-prv
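Once the state has been applied, a quick way to confirm the patch ports is to list the ports on each bridge; you should see floating-to-prv and prv-to-floating respectively:
gtw01# ovs-vsctl list-ports br-floating
gtw01# ovs-vsctl list-ports br-prv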
br-mgmt
Lastly, br-mgmt is also required on all nodes for OpenStack and other management traffic. By the way, it isn't specific to our Open vSwitch ML2 plugin; it's also required for the OpenContrail plugin.
br-mgmt on compute
On compute nodes it's a Linux bridge connected to a bond subinterface:
#!yaml
# vi classes/cluster/xxx/openstack/compute.yml
parameters:
  linux:
    network:
      bridge: openvswitch
      interface:
        bond0.<MGMT_NETWORK_VLAN_ID>:
          enabled: true
          type: vlan
          proto: manual
          mtu: 9000
          use_interfaces:
            - bond0
        br-mgmt:
          enabled: true
          type: bridge
          address: ${_param:single_address}
          netmask: <MGMT_NETWORK_NETMASK>
          mtu: 9000
          use_interfaces:
            - bond0.<MGMT_NETWORK_VLAN_ID>
br-mgmt on network nodes
On network nodes, it's an OVS internal port on br-floating, with a VLAN tag and an IP address:
#!yaml
# vi classes/cluster/xxx/openstack/gateway.yml
parameters:
linux:
network:
bridge: openvswitch
interface:
...
br-mgmt:
enabled: true
type: ovs_port
bridge: br-floating
proto: static
ovs_options: tag=<MGMT_NETWORK_VLAN_ID>
address: ${_param:single_address}
netmask: <MGMT_NETWORK_NETMASK>
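With all this Pillar data in place, the bridges are created when the linux networking state runs; a minimal sanity check afterwards could look like this (assuming linux.network is the state name exposed by salt-formula-linux):
cfg01# salt 'gtw*' state.apply linux.network
gtw01# ip addr show br-mgmt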
Putting all this together!!!
At this stage you may be a bit lost; my explanation is a bit fragmented. But there is an easy way to see all the pieces together: look at a non-DVR/VxLAN model example, particularly the gateway.yml and compute.yml files. I hope you see the big picture by now.
Changing hostname
If you have a specific naming convention for your nodes (VMs or bare metal), you can update your model. For example, to change the network node hostnames:
#!yaml
# vi classes/cluster/xxx/init.yml
openstack_gateway_node01_hostname: fr-pa-gtw01
openstack_gateway_node02_hostname: fr-pa-gtw02
openstack_gateway_node03_hostname: fr-pa-gtw03
It will also be reflected in the node's salt-minion ID. Let me explain how: our salt-formula-salt configures the salt minion on our nodes and uses this template as a baseline. The template sets {{ system.name }}.{{ system.domain }} as the ID. If you look more closely at the template, you'll find this line:
{%- from "linux/map.jinja" import system with context %}
Now look at linux/map.jinja; you'll find this:
{% set system = salt['grains.filter_by']({
...
}, grain='os_family', merge=salt['pillar.get']('linux:system')) %}
This merges our linux:system Pillar data into system, which is then imported into our template. So {{ system.name }} in the template refers to the Pillar data linux:system:name. That value, in turn, is defined in the generated Reclass node configuration when the reclass.storage state runs and creates the node, based on this template and the gateway cluster node declaration in classes/system/reclass/system/storage/openstack_gateway_cluster.yml, which associates openstack_gateway_node0x_hostname with reclass:storage:node:name.
Here is an example of a generated gateway node YAML where you'll see that the linux:system:name Pillar data is set to our hostname as expected. As we said earlier, this is exactly what gets injected into our salt-minion template and used as the node ID.
#!yaml
# vi /srv/salt/reclass/nodes/_generated/fr-pa-gtw01.yet.org.yml
classes:
- cluster.int.openstack.gateway
parameters:
_param:
linux_system_codename: trusty
salt_master_host: 10.0.0.120
single_address: 192.168.1.120
tenant_address: 192.168.12.120
linux:
system:
name: fr-pa-gtw01
domain: yet.org
cluster: default
environment: prd
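Once the node is up, you can verify that the minion indeed picked up this ID directly on the node (output shown for illustration):
fr-pa-gtw01# salt-call grains.get id
local:
    fr-pa-gtw01.yet.org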
OK, I've lost you :( Two options from here: you can forget all of this and just trust me, or read my article on Salt Formulas to understand how map.jinja works, and then the next one on Using Salt with Reclass to put everything together.
Neutron CLI - cheatsheet
Neutron agent
# neutron agent-list
# neutron agent-show
Restarting agents
gw# service neutron-l3-agent restart
cmp# service neutron-dhcp-agent restart
cmp# service neutron-openvswitch-agent restart
Neutron router
Get a list of virtual routers
# neutron router-list
You'll get the router ID, name, external_gateway_info (network_id, SNAT, subnet_id, IP), and whether it is distributed and highly available.
If you need more information about a router, like which availability zone it is located in, its tenant_id, or whether it is up:
# neutron router-show mirantis-router
List all ports of a router
# neutron router-port-list mirantis-router
For an HA router, you can get the list of active/standby gateways hosting it:
# neutron l3-agent-list-hosting-router mirantis-router
List routers on an agent:
# neutron router-list-on-l3-agent
Add/Remove HA from an existing router
# neutron router-update mirantis-router-nonha --admin_state_up=False
# neutron router-update mirantis-router-nonha --ha=<False|True>
# neutron router-update mirantis-router-nonha --admin_state_up=True
Neutron DHCP
# neutron net-list-on-dhcp-agent
L3 HA neutron routers
What you have to know about Neutron router high availability using VRRP/Keepalived:
- Traffic will always go through a single l3-agent.
- It's incompatible with DVR.
- It doesn't address l3-agent failure itself; it relies on the HA network to detect failure.
- The failover process only retains the state of network connections for instances with a floating IP address.
While we are talking about virtual routers using VRRP for HA, keep in mind that a single HA network will be created per tenant. Each HA router will be assigned an 8-bit virtual router ID (VRID), so a maximum of 255 HA routers can be created per tenant. This VRID is also used to assign a virtual IP to each router in the 169.254.0.0/24 CIDR by default: if a router has a VRID of 4, it will get 169.254.0.4. Only the master holds this address, and nothing uses it as a gateway, but at least it is unique.
On top of that, each instance of the VRRP router, running on an l3 agent, listens on the HA tenant network and gets assigned an IP address within l3_ha_net_cidr, 169.254.192.0/18 by default.
All routers get assigned the same VRRP priority (50), so when an election occurs, the one with the lowest IP will always win and become master.
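You can see these VRIDs, priorities and virtual IPs for yourself in the keepalived configuration that Neutron generates on the network nodes, next to the state-change log mentioned below:
gtw01# cat /var/lib/neutron/ha_confs/<ROUTER_ID>/keepalived.conf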
reverting to automatic l3 agent failover
If you don't like these limitations, you can always revert to non-L3-HA routers by default, by setting:
#!yaml
# vi classes/cluster/int/openstack/init.yml
neutron_l3_ha: False
Then, by changing the following hardcoded line in the template configuration file of the neutron salt formula, you'll get the original failover mechanism instead; let's hope a pull request gets merged to avoid this hack in the future.
#!yaml
cfg01# vi /srv/salt/env/prd/neutron/files/mitaka/neutron-server.conf.Debian
allow_automatic_l3agent_failover = true
Lastly, even when you set neutron_l3_ha to False, you'll still be able to create L3 HA routers from the CLI:
# neutron router-create my-ha-router --ha=true
Forcing a failover
Find the active gateway and bring its HA interface down:
ctl# neutron l3-agent-list-hosting-router mirantis-router
active-gw# ip netns exec qrouter-4c8c40dc-fd02-443a-a2c9-29afd8592b61 ifconfig ha-3e051b01-80 down
You can also enter the namespace like this
# ip netns exec qrouter-4c8c40dc-fd02-443a-a2c9-29afd8592b61 /bin/bash
All subsequent commands will be run in the corresponding namespace:
# ifconfig ha-3e051b01-80 down
You can observe failover events on a network node in /var/lib/neutron/ha_confs/<ROUTER_ID>/neutron-keepalived-state-change.log.
Conclusion
The linux and neutron salt formulas, combined with MCP modelling, allow you to deploy a wide spectrum of use cases without a great deal of effort, which I find pretty nice. Open vSwitch networking is complicated enough; Mirantis Cloud Platform abstracts away the complexity of OVS networking by letting you deploy well-tested reference architectures with a few lines of YAML, so you don't spend too much time building out your own stuff or troubleshooting corner cases.