
My name is Dmitriy, and I am a software developer working on DCImanager — the equipment management panel by ISPsystem. I have spent a long time on the team that develops switch management software. Together we have been through ups and downs: from building hardware management services to an office network failure and hours-long “dates” in the server room, hoping not to lose our loved ones.
DCImanager supports different equipment types: switches, PDUs, and servers. The panel currently has four switch handlers: two work via SNMP (Cisco Catalyst and SNMP common) and two more via NETCONF (Juniper with and without ELS support).
All operations with equipment are heavily covered by tests. Using actual equipment for automated testing does not work: tests are launched on each push to the repository and run in parallel. Therefore, we try to use emulators.
We were able to cover the SNMP protocol handlers with tests using the SNMP Agent Simulator (snmpsim) library. But in the case of Juniper, we ran into problems. After looking for ready-made solutions, we picked a couple of libraries, but one of them would not start, and the other did not do what we needed; in fact, I spent more time trying to bring that little wonder to life.
So, the question was how to emulate Juniper switches. Juniper is managed via NETCONF, which runs over SSH. The idea of writing a small service that would work over SSH and emulate the switch came to mind. Accordingly, we needed the service itself as well as a Juniper "snapshot" to emulate the data.
In snmpsim, a snapshot refers to a complete copy of the switch status, with all its supported OIDs and their current values.
However, with Juniper, things are slightly more complicated: no such snapshot can be created. In this case, a snapshot will refer to a set of query-response templates.
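To make the idea concrete, here is a minimal sketch of what such a set of templates might look like; the rpc names and XML bodies below are invented for the example, not taken from a real device:

# Hypothetical "snapshot": each incoming rpc tag maps to a canned XML reply
# that the emulator will send back. The contents are made up for illustration.
JUNIPER_SNAPSHOT = {
    "get-interface-information": (
        "<rpc-reply><interface-information>"
        "<physical-interface><name>ge-0/0/1</name>"
        "<oper-status>up</oper-status></physical-interface>"
        "</interface-information></rpc-reply>"
    ),
    "get-system-information": (
        "<rpc-reply><system-information>"
        "<hardware-model>ex2200-24t-4g</hardware-model>"
        "</system-information></rpc-reply>"
    ),
}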
Part one: the architecture of planting
We are now actively developing a whole “zoo” of handlers for different switches. Soon we will have new switch handlers, and not all of them will be covered by ready-made testing solutions. However, we can try to write a base architecture for a service that will simulate different devices over different protocols.
In the simplest case, this is a factory that, depending on the protocol and the handler (some switches can run over several protocols), returns a switch object in which all of the behavior logic is already implemented. In the case of Juniper, it is a small query parser: depending on the incoming rpc query and its parameters, it performs the necessary actions.
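As a rough sketch (the registry contents are an assumption, and the JuniperSwitch object itself appears later in the article), such a factory might look like this:

# A minimal factory sketch: maps a handler name to the emulated switch object.
class SwitchFactory:
    def __init__(self):
        self._registry = {
            "juniper": JuniperSwitch,   # NETCONF-based emulation, defined below
            # "cisco_catalyst": ...,    # SNMP-based handlers could be added here
        }

    def get(self, name):
        try:
            return self._registry[name]()
        except KeyError:
            raise ValueError(f"No emulator registered for '{name}'")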
Important restriction: we will not be able to fully simulate the operation of the switch. Describing all the logic would take a long time, and whenever we add new functionality to the actual switch handler, we will also have to adjust the switch's mock server.
Part two: choosing the right soil for planting
Our eye fell on the paramiko library, which provides a convenient interface for working over SSH. To begin with, we wanted to check the basic things, such as connection and a simple query, rather than build out the whole architecture. We are doing research here, after all. So we did not concern ourselves much with authorization: a combination of a simple ServerInterface and a socket server gave us something like a working option:
import socket

import paramiko

# SSH_USER_NAME and SSH_USER_PASSWORD are defined elsewhere in the test config
class SshServer(paramiko.ServerInterface):
    def check_auth_password(self, user, password):
        if user == SSH_USER_NAME and password == SSH_USER_PASSWORD:
            return paramiko.AUTH_SUCCESSFUL
        return paramiko.AUTH_FAILED

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("127.0.0.1", 8300))
sock.listen(10)

client, address = sock.accept()
session = paramiko.Transport(client)
server = SshServer()
session.start_server(server=server)
An approximate implementation of what we wanted to see, but it already looks scary
When the client connects to the server, the latter should respond with a list of its capabilities. For example:
# Accept the client's channel and answer with the capabilities of the "switch"
channel = session.accept(20)
reply = """<hello>
  <capabilities>
    <capability>urn:ietf:params:xml:ns:netconf:base:1.0</capability>
    <capability>xml.juniper.net/netconf/junos/1.0</capability>
    <capability>xml.juniper.net/dmi/system/1.0</capability>
  </capabilities>
  <session-id>1</session-id>
</hello>
]]>]]>
"""
channel.send(reply)
Yes, this is XML, followed by the ]]>]]> end-of-message delimiter
In case you were wondering: this code is unstable. The implementation has a problem with socket closure, and I was able to find a couple of open issues in paramiko describing it. We put this option aside and decided to check the remaining one.
Part three: planting
The ace up our sleeve was Twisted, a framework for developing network applications that supports a large number of protocols. It has extensive documentation and the fantastic Cred module, which would help us.
Cred is an authentication mechanism that allows different network protocols to connect to the system depending on your requirements.
To organize the entire logic, Realm was used — the part of the application responsible for business logic and access to its objects. However, first things first.
The core of the login system is Portal. If we want to write a service on top of a network protocol, we need to set up the standard Portal. It already includes the methods:
- login (provides client access to the subsystem);
- registerChecker (registers credential verification; a minimal checker sketch follows below).
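To illustrate registerChecker, here is a minimal sketch of a custom credentials checker. The SwitchPasswordChecker name and the in-memory user dict are assumptions made for this example; further below we simply use Twisted's built-in InMemoryUsernamePasswordDatabaseDontUse.

from zope.interface import implementer
from twisted.cred import checkers, credentials, error
from twisted.internet import defer

# Hypothetical checker: verifies a username/password pair against a dict
@implementer(checkers.ICredentialsChecker)
class SwitchPasswordChecker:
    credentialInterfaces = (credentials.IUsernamePassword,)

    def __init__(self, users):
        self.users = users  # e.g. {b'admin': b'admin'}

    def requestAvatarId(self, creds):
        if self.users.get(creds.username) == creds.password:
            return defer.succeed(creds.username)
        return defer.fail(error.UnauthorizedLogin())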
A Realm object is used to connect the business logic to the authentication system. Since the client is already authorized at this point, the logic of our service over SSH starts here. The interface has only one method, requestAvatar, which is called upon successful authorization in Portal and returns the main object, SwitchProtocolAvatar:
from zope.interface import implementer
from twisted.cred.portal import IRealm

@implementer(IRealm)
class SwitchRealm(object):
    def __init__(self, switch_obj):
        self.switch_obj = switch_obj

    def requestAvatar(self, avatarId, mind, *interfaces):
        return interfaces[0], SwitchProtocolAvatar(avatarId, switch_core=self.switch_obj), lambda: None
Special objects — Avatars — are in charge of the business logic; in our case, the service over SSH starts here. When a query is sent, the data is passed to SwitchProtocolAvatar, which determines which subsystem is requested and updates the configuration:
from twisted.conch import avatar
from twisted.conch.ssh import session

class SwitchProtocolAvatar(avatar.ConchUser):
    def __init__(self, username, switch_core):
        avatar.ConchUser.__init__(self)
        self.username = username
        # The "session" channel is handled by the standard SSH session
        self.channelLookup.update({b'session': session.SSHSession})
        # Register the NETCONF subsystem if the emulated switch provides one
        netconf_protocol = switch_core.get_netconf_protocol()
        if netconf_protocol:
            self.subsystemLookup.update({b'netconf': netconf_protocol})
Speaking of protocols: bearing in mind that we are working with NETCONF, we proceed with its implementation. To write services on top of existing protocols and implement our own logic, we use Protocol. The interface of this class is simple:
- dataReceived — used to process incoming data;
- makeConnection — used to establish a connection;
- connectionMade — called once the connection has been established. Here we can define some logic before the client starts sending queries; in our case, we need to send the list of our capabilities.
from twisted.internet.protocol import Protocol

class Netconf(Protocol):
    def __init__(self, capabilities=None):
        self.session_count = 0
        self.capabilities = capabilities or []

    def __call__(self, *args, **kwargs):
        # The subsystem lookup expects a factory; the instance acts as its own
        return self

    def connectionMade(self):
        # A new NETCONF session: greet the client with our capabilities
        self.session_count += 1
        self.send_capabilities()

    def send_capabilities(self):
        rpc_capabilities_reply = "<hello><capabilities>{capabilities}</capabilities>" \
                                 "<session-id>{session_id}</session-id></hello>]]>]]>"
        rpc_capabilities = "".join(f"<capability>{cap}</capability>" for cap in self.capabilities)
        self.transport.write(rpc_capabilities_reply.format(capabilities=rpc_capabilities,
                                                           session_id=self.session_count).encode())

    def dataReceived(self, data):
        # Process the received rpc queries (see the sketch further below)
        pass
Here we start to wrap the layers of our nesting doll. Since our service runs over SSH, we need to implement the SSH server logic: in it, we specify the server keys and the handlers for the SSH services. Our interest in the implementation of this class is limited, since authorization will be password-based:
from twisted.conch.ssh import connection, factory, keys, userauth
from twisted.conch.ssh.transport import SSHServerTransport

# SERVER_RSA_PUBLIC, SERVER_RSA_PRIVATE and PRIMES are defined in the test config
class SshServerFactory(factory.SSHFactory):
    protocol = SSHServerTransport
    publicKeys = {b'ssh-rsa': keys.Key.fromFile(SERVER_RSA_PUBLIC)}
    privateKeys = {b'ssh-rsa': keys.Key.fromFile(SERVER_RSA_PRIVATE)}
    services = {
        b'ssh-userauth': userauth.SSHUserAuthServer,
        b'ssh-connection': connection.SSHConnection
    }

    def getPrimes(self):
        return PRIMES
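SERVER_RSA_PUBLIC, SERVER_RSA_PRIVATE and PRIMES have to point at real key material. As an illustration of one possible way to prepare the key pair (Twisted's ckeygen utility would do just as well), here is a sketch using the cryptography package; the file names are arbitrary:

# Generate an RSA key pair for the emulated SSH server and write it to disk.
# The file names are arbitrary; SERVER_RSA_PRIVATE/PUBLIC should point at them.
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

with open("ssh_host_rsa", "wb") as private_file:
    private_file.write(key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.TraditionalOpenSSL,
        encryption_algorithm=serialization.NoEncryption(),
    ))

with open("ssh_host_rsa.pub", "wb") as public_file:
    public_file.write(key.public_key().public_bytes(
        encoding=serialization.Encoding.OpenSSH,
        format=serialization.PublicFormat.OpenSSH,
    ))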
For the SSH server to work, we also need to define the session logic, which operates regardless of the protocol we decide to use or the interface requested:
from twisted.internet import protocol
from twisted.conch.ssh import session

class EchoProtocol(protocol.Protocol):
    """Simple shell protocol: echoes input back to the client."""
    def dataReceived(self, data):
        if data == b'\r':
            data = b'\r\n'
        elif data == b'\x03':  # Ctrl+C closes the connection
            self.transport.loseConnection()
            return
        self.transport.write(data)

class Session:
    def __init__(self, avatar):
        pass

    def getPty(self, term, windowSize, attrs):
        pass

    def execCommand(self, proto, cmd):
        pass

    def openShell(self, transport):
        # An interactive shell is not needed for NETCONF; echoing keeps it usable
        shell = EchoProtocol()
        shell.makeConnection(transport)
        transport.makeConnection(session.wrapProtocol(shell))

    def eofReceived(self):
        pass

    def closed(self):
        pass
I have nearly forgotten about the switch handler itself. After all the checks and authorizations, the logic moves on to the object that emulates the switch. Here you can define the query processing logic: retrieving or editing interfaces, device configuration, etc.
# The class name is illustrative: this is the object the factory returns for "juniper"
class JuniperSwitch:
    def __init__(self):
        self.protocol = Netconf(capabilities=self.capabilities())

    def get_netconf_protocol(self):
        return self.protocol

    @staticmethod
    def capabilities():
        return [
            "urn:ietf:params:xml:ns:netconf:capability:candidate:1.0",
            "urn:ietf:params:xml:ns:netconf:capability:confirmed-commit:1.0",
            "urn:ietf:params:xml:ns:netconf:capability:validate:1.0",
            "urn:ietf:params:xml:ns:netconf:capability:url:1.0?protocol=http,ftp,file",
            "urn:ietf:params:netconf:capability:candidate:1.0",
            "urn:ietf:params:netconf:capability:confirmed-commit:1.0",
            "urn:ietf:params:netconf:capability:validate:1.0",
            "urn:ietf:params:netconf:capability:url:1.0?scheme=http,ftp,file"
        ]
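To give an idea of how the query processing could be filled in, here is a rough sketch of a dataReceived implementation that matches the incoming rpc tag against the query-response templates described at the beginning of the article; the tag parsing and template lookup are assumptions, not our production logic.

import re

# Rough sketch of query processing inside the Netconf protocol: extract the
# rpc tag from the incoming data and answer with a canned template.
RPC_TAG_RE = re.compile(rb"<(?:\w+:)?rpc[^>]*>\s*<(?:\w+:)?([\w-]+)")

class Netconf(Protocol):
    # __init__, connectionMade and send_capabilities stay as shown above

    def dataReceived(self, data):
        match = RPC_TAG_RE.search(data)
        if not match:
            return
        tag = match.group(1).decode()
        if tag == "close-session":
            self.transport.write(b"<rpc-reply><ok/></rpc-reply>]]>]]>")
            self.transport.loseConnection()
            return
        reply = JUNIPER_SNAPSHOT.get(tag, "<rpc-reply><ok/></rpc-reply>")
        self.transport.write(reply.encode() + b"]]>]]>")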
And finally, we join it all together: the session adapter is registered (it describes the behavior upon connection), the username/password checker is defined, the Portal is configured, and our service is launched:
from twisted.cred.checkers import InMemoryUsernamePasswordDatabaseDontUse
from twisted.cred.portal import Portal
from twisted.internet import reactor
from twisted.python import components
from twisted.conch.interfaces import ISession

# The session adapter describes the behavior of an SSH session for our avatar
components.registerAdapter(Session, SwitchProtocolAvatar, ISession)

switch_factory = SwitchFactory()
switch = switch_factory.get("juniper")
ssh_portal = Portal(SwitchRealm(switch))
credential_source = InMemoryUsernamePasswordDatabaseDontUse()
credential_source.addUser(b'admin', b'admin')
ssh_portal.registerChecker(credential_source)
SshServerFactory.portal = ssh_portal
reactor.listenTCP(830, SshServerFactory())
reactor.run()
To check the result, we connect to our emulator with the ncclient library and print what the server reports:

from ncclient import manager

connection = manager.connect(host="127.0.0.1",
                             port=830,
                             username="admin",
                             password="admin",
                             timeout=60,
                             device_params={'name': 'junos'},
                             hostkey_verify=False)

for capability in connection.server_capabilities:
    print(capability)
The query result is provided below. We have successfully established the connection and the server delivered us the list of its capabilities:
urn:ietf:params:xml:ns:netconf:capability:confirmed-commit:1.0
urn:ietf:params:xml:ns:netconf:capability:validate:1.0
urn:ietf:params:xml:ns:netconf:capability:url:1.0?protocol=http,ftp,file
urn:ietf:params:netconf:capability:candidate:1.0
urn:ietf:params:netconf:capability:confirmed-commit:1.0
urn:ietf:params:netconf:capability:validate:1.0
urn:ietf:params:netconf:capability:url:1.0?scheme=http,ftp,file
Summary
This solution has its pros and cons. On the one hand, we spend a lot of time implementing and describing the whole query-processing logic. On the other hand, we gain flexible configuration and behavior emulation. The key advantage, however, is scalability: Twisted has rich functionality and supports a large number of protocols, so you can easily describe the interfaces of new switch handlers. And if you think everything through well enough, this architecture can be used not only for switches but for other equipment as well.
Feedback from readers would be greatly appreciated. Have you done anything similar? If so, what technologies did you use and how did you set up the testing process?