This post is a dummy walkthrough of neutron services code. I recommend being familiar with the following modules (not a hard requirement, but will help you avoid jumping between this post and other docs):

I created the following drawing to make it easier to track several of the files and classes mentioned in this post.

L3 Agent

My purpose in doing this was to learn more about how neutron handles services. Since I believe in keeping things down to earth, I decided to follow a real example of a service in neutron – the L3 agent.

So let’s start by looking at the L3 agent’s main function (there is actually another one before it, for the cmd, but it contains only a call to this function).

The location of this code is here. The first relevant block in this function, the one that creates the actual service, is the following lines

It uses the ‘create’ function from neutron.service. You’ll notice that ‘create’ is a classmethod: the instance of the Service class is obtained by calling ‘create’, not by instantiating the Service class directly.
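To make the classmethod-as-factory idea concrete, here is a minimal sketch. It is a simplified stand-in, not neutron's actual code: the class body, the default host and binary values, and the reduced parameter list are assumptions I made for illustration.

```python
class Service:
    """Simplified stand-in for a service class built through a factory."""

    def __init__(self, host, binary, topic, manager, report_interval=None):
        self.host = host
        self.binary = binary
        self.topic = topic
        self.manager = manager
        self.report_interval = report_interval

    @classmethod
    def create(cls, host=None, binary=None, topic=None, manager=None,
               report_interval=None):
        # Fill in defaults for anything the caller left out, then build
        # the instance. Callers never write Service(...) themselves.
        host = host or "localhost"          # hypothetical default
        binary = binary or "example-agent"  # hypothetical default
        topic = topic or binary
        return cls(host, binary, topic, manager, report_interval)


# Illustrative values only; the manager is passed as a dotted path string.
server = Service.create(
    binary="neutron-l3-agent",
    topic="l3_agent",
    manager="neutron.agent.l3.agent.L3NATAgentWithStateReport",
    report_interval=30,
)
```

The benefit of the factory is that all the "work out sensible defaults" logic lives in one place, and the constructor stays a plain attribute assignment.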

Let’s go over the parameters:

  • host – the server on which the agent runs.
  • binary – the name of the binary used for running the service. It is determined with the inspect module from Python’s standard library: inspect.stack() returns a list of frame records from the current stack.
  • topic – determines which queues the service listens to or consumes from.
  • manager – usually a class that defines how the service behaves at certain points (e.g. what to execute when initialized, what to execute after starting).
  • report_interval – the seconds between state reports to the server. We set this argument by passing ‘cfg.CONF.AGENT.report_interval’. It is set here, and the default is 30.
  • periodic_interval – the seconds between running periodic tasks (this description is taken directly from the code). The default is 40, as can be seen here.
  • periodic_fuzzy_delay – a range of seconds used to randomly delay the start of a certain loop.
  • service_obj – the last and the most important one. This is an object of the class, created by calling the class with all the above parameters.
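The inspect trick mentioned for ‘binary’ is easy to try on its own. The sketch below assumes the name is taken from the outermost frame of the stack (the script that was launched); the helper name guess_binary_name is mine, so check the neutron source for the exact expression.

```python
import inspect
import os


def guess_binary_name():
    # inspect.stack() lists frame records from the current frame outward,
    # so the last record is the outermost frame, i.e. the entry script.
    # Index 1 of a frame record is the file name of that frame.
    outermost = inspect.stack()[-1]
    return os.path.basename(outermost[1])


binary = guess_binary_name()
```

Run from a script called, say, my_agent.py, this yields the script's file name without its directory.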

Let’s see how service_obj is created (the code is located in the same file):

We can see most of the attributes are the same. Note the profiler line:

This will set up the profiler for this specific service.

You may also notice that this class inherits from n_rpc.Service, which is in neutron.common.rpc. The last line is calling this class

Let’s see how it looks:

This Service class inherits from oslo service, and it is the last step of the Service inheritance chain inside neutron. This class basically subclasses oslo_service.service.ServiceBase, which provides an interface for a service.
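If "an interface for a service" sounds abstract, the following sketch shows the idea with nothing but the standard library. The four method names (start, stop, wait, reset) are what I recall the oslo interface requiring, so treat them as an assumption; RecordingService is purely hypothetical.

```python
import abc


class ServiceBase(abc.ABC):
    """Stand-in for a service interface: what a launcher expects to call."""

    @abc.abstractmethod
    def start(self):
        """Start the service."""

    @abc.abstractmethod
    def stop(self):
        """Stop the service."""

    @abc.abstractmethod
    def wait(self):
        """Wait for the service to finish."""

    @abc.abstractmethod
    def reset(self):
        """Reset the service, e.g. after a configuration reload."""


class RecordingService(ServiceBase):
    """Hypothetical concrete service that only records what was called."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append("start")

    def stop(self):
        self.calls.append("stop")

    def wait(self):
        self.calls.append("wait")

    def reset(self):
        self.calls.append("reset")


# The interface itself cannot be instantiated, only complete subclasses can.
try:
    ServiceBase()
    abstract_is_instantiable = True
except TypeError:
    abstract_is_instantiable = False

svc = RecordingService()
svc.start()
svc.stop()
```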

Back to L3 agent main function.

We have now reached the last line, which runs the actual service. We pass the launch method two arguments:

1. the configuration (this is mostly composed by register_opts in the same file).

2. ‘server’, which is the service we just created in the previous line. We also call “wait” so that nothing is returned until the service is stopped.

Without deep diving too much into oslo service, ‘service.launch’ will eventually run the ‘start’ method of the service. You can take a look at it here.
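Here is a toy version of the launch-then-wait flow, using threads instead of what oslo service really does. The Launcher class, its methods and DemoService are simplified stand-ins I wrote for this post; only the overall shape (launch starts the service, wait blocks until it is stopped) comes from the description above.

```python
import threading


class DemoService:
    """Hypothetical service that only records lifecycle calls."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def stop(self):
        self.events.append("stop")


class Launcher:
    """Stand-in launcher: starts services and lets the caller block."""

    def __init__(self):
        self.services = []
        self._stopped = threading.Event()

    def launch_service(self, service):
        self.services.append(service)
        service.start()

    def stop(self):
        for service in self.services:
            service.stop()
        self._stopped.set()

    def wait(self, timeout=None):
        # Block the caller until stop() has been called (or timeout expires).
        return self._stopped.wait(timeout)


def launch(conf, service):
    launcher = Launcher()
    launcher.launch_service(service)
    return launcher


demo = DemoService()
launcher = launch(conf={}, service=demo)
# Simulate "someone stops the service" shortly after it was launched.
threading.Timer(0.05, launcher.stop).start()
finished = launcher.wait(timeout=2)
```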

In our case, it will run the start method in neutron.service, which we have already seen:

First, it runs the ‘init_host’ method of our manager. Reminder: our manager is “neutron.agent.l3.agent.L3NATAgentWithStateReport”  as passed to main l3 agent function.

L3NATAgentWithStateReport inherits from L3NATAgent, which inherits from multiple classes, one of which is neutron.manager.Manager, which includes the method ‘init_host’

‘init_host’ is an empty method in neutron.manager.Manager. This method is expected to be overridden by subclasses. In reality, this method is implemented only by the DHCP agent at the moment.

Next, it calls the start method of the parent class, which is in neutron.common.rpc

The first line is calling the start method of oslo service, which is abstract (an empty method; the implementation is done by the neutron.common.rpc Service).

Next, it creates self.conn by calling ‘create_connection’ which basically creates and returns a Connection object from the same file.

Then, after assigning the manager to endpoints variable, it will call ‘create_consumer’ of our Connection object (self.conn).

‘create_consumer’ will create an RPC server that will consume every message sent to the topic of the L3 agent (which is ‘l3_agent’, surprisingly). Let’s see how it works.

The first line is creating a target. A target in oslo messaging encapsulates all the information to identify what messages a service is listening for or where the messages should be sent. In our case, the topic is ‘l3_agent‘, the host is where we are running the agent and fanout is False.

Next it calls ‘get_server‘ in the same file (neutron.common.rpc) to construct an RPCServer from oslo.messaging.

The last line is adding the server from the previous line to the servers list of our connection object (self.conn).

Back to the start method in neutron.common.rpc where we left it.

So now that we have a consumer, we can move to the next step, which calls the method ‘initialize_service_hook’ of our manager, if it exists. In our case, our manager (neutron.agent.l3.agent.L3NATAgentWithStateReport) doesn’t have such an attribute (to be honest, it doesn’t exist anywhere in the neutron tree), so this part is skipped.
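The "call it only if it exists" check is a common Python idiom. A minimal sketch, assuming the check is done with getattr and callable; both manager classes and the run_optional_hook helper are hypothetical:

```python
class PlainManager:
    """A manager without the optional hook, as in this walkthrough."""


class HookedManager:
    """Hypothetical manager that does define the hook."""

    def __init__(self):
        self.hooked_services = []

    def initialize_service_hook(self, service):
        self.hooked_services.append(service)


def run_optional_hook(manager, service):
    # Look the hook up by name; a missing attribute yields None and the
    # step is silently skipped.
    hook = getattr(manager, "initialize_service_hook", None)
    if callable(hook):
        hook(service)
        return True
    return False


skipped = run_optional_hook(PlainManager(), service="rpc-service")
hooked_manager = HookedManager()
called = run_optional_hook(hooked_manager, service="rpc-service")
```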

Last line is calling ‘consume_in_threads’ of our connection object (self.conn)

It’s pretty basic. All it does is to go over the RPC servers list (in our case, one server which we created with ‘create_consumer’) and start them.
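Putting the Connection pieces together, here is a stdlib-only sketch of the create_consumer and consume_in_threads flow. Target and FakeRpcServer are stand-ins for the oslo.messaging objects, and the host name "network-node-1" is made up; the real code builds an actual RPC server through ‘get_server’.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Stand-in for a messaging target: who should receive a message."""
    topic: str
    server: str
    fanout: bool = False


class FakeRpcServer:
    """Stand-in for an RPC server bound to one target."""

    def __init__(self, target, endpoints):
        self.target = target
        self.endpoints = endpoints
        self.started = False

    def start(self):
        self.started = True


class Connection:
    def __init__(self):
        self.servers = []

    def create_consumer(self, topic, endpoints, fanout=False,
                        host="network-node-1"):  # hypothetical host name
        # 1. Describe what we listen for.
        target = Target(topic=topic, server=host, fanout=fanout)
        # 2. Build a server for that target (the real code uses get_server).
        server = FakeRpcServer(target, endpoints)
        # 3. Remember it so it can be started later.
        self.servers.append(server)

    def consume_in_threads(self):
        # Start every server that was registered with create_consumer.
        for server in self.servers:
            server.start()
        return self.servers


conn = Connection()
conn.create_consumer("l3_agent", endpoints=["manager placeholder"])
started = conn.consume_in_threads()
```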

At this point, we are done with the start method in neutron.common.rpc and we are back to neutron.service start method

It now checks whether report_interval is defined (again, the seconds between state reports to the server, which is set here). Since it’s defined, it will create a looping call using loopingcall.FixedIntervalLoopingCall from oslo.service.

It will pass it ‘self.report_state’, which is a method of the neutron.service Service that is supposed to report the state but is not implemented at the moment. Once created, it will start the loop with the interval set by ‘self.report_interval’ and add it to the timers list, which is an empty list at this point.
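To get a feel for what a fixed-interval looping call does, here is a thread-based stand-in. It is not the oslo.service class (the FixedIntervalLoop name and the use of threads are my own choices); it only mimics the start, stop and wait shape and the timers list described above.

```python
import threading


class FixedIntervalLoop:
    """Stand-in for a fixed-interval looping call (thread based)."""

    def __init__(self, func):
        self.func = func
        self._stop = threading.Event()
        self._thread = None

    def start(self, interval, initial_delay=None):
        def _run():
            # Event.wait doubles as an interruptible sleep: it returns True
            # as soon as stop() is called, which ends the loop.
            if initial_delay and self._stop.wait(initial_delay):
                return
            while not self._stop.is_set():
                self.func()
                if self._stop.wait(interval):
                    return

        self._thread = threading.Thread(target=_run, daemon=True)
        self._thread.start()
        return self

    def stop(self):
        self._stop.set()

    def wait(self):
        self._thread.join()


calls = []
three_reports = threading.Event()


def report_state():
    # Hypothetical heartbeat; it only records that it ran.
    calls.append("report")
    if len(calls) == 3:
        three_reports.set()


timers = []
heartbeat = FixedIntervalLoop(report_state)
heartbeat.start(interval=0.01)
timers.append(heartbeat)

three_reports.wait(timeout=2)
for timer in timers:
    timer.stop()
    timer.wait()
```

Keeping every loop in a timers list is what makes a clean shutdown possible: stopping the service is just a walk over that list.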

Next, it checks whether ‘periodic_interval’ (seconds between running periodic tasks) is set. Since it’s set here, it proceeds to the next check: whether ‘periodic_fuzzy_delay’ (reminder: a range of seconds used to randomly delay the start of a certain loop) is set. In our case it defaults to 5, so ‘initial_delay’ will be randomly set to a number between 0 and 5.
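The fuzzy delay logic fits in a few lines. This sketch assumes the delay is an integer drawn with random.randint, both ends included; the function name pick_initial_delay is hypothetical.

```python
import random


def pick_initial_delay(periodic_fuzzy_delay):
    # A truthy fuzzy delay means "start somewhere within this many seconds";
    # otherwise there is no initial delay at all.
    if periodic_fuzzy_delay:
        return random.randint(0, periodic_fuzzy_delay)
    return None


samples = [pick_initial_delay(5) for _ in range(200)]
no_delay = pick_initial_delay(0)
```

The point of the randomness is to keep many agents that were started at the same moment from running their periodic tasks in lockstep.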

Next we’ll create a loopingcall, which is a class in oslo service that allows us to run a specific method in a loop. In this case we’ll run the periodic_tasks method, which is defined in the same file:

First, ‘periodic_tasks’ creates a context object by calling ‘get_admin_context‘ from the neutron-lib project. ‘get_admin_context’ returns an oslo context object which we’ll use for running the periodic tasks.

In the next line, we are calling neutron’s manager periodic_tasks function which is just another call

It calls ‘run_periodic_tasks‘ of the oslo service module. Back to the start method of neutron.service
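The chain periodic_tasks → manager → run_periodic_tasks can be imitated with a tiny registry. Everything here is a stand-in: the decorator, the dict used as a context and the DemoAgent class are hypothetical, and the real oslo service implementation also tracks how often each task should run, which I leave out.

```python
def periodic_task(func):
    """Mark a method so run_periodic_tasks() picks it up."""
    func._periodic = True
    return func


class PeriodicTasks:
    def run_periodic_tasks(self, context):
        # Walk the class attributes and call every method carrying the mark.
        for name in dir(type(self)):
            member = getattr(type(self), name)
            if getattr(member, "_periodic", False):
                getattr(self, name)(context)


class Manager(PeriodicTasks):
    def periodic_tasks(self, context):
        # Just another call, exactly as described above.
        self.run_periodic_tasks(context)


class DemoAgent(Manager):
    def __init__(self):
        self.log = []

    @periodic_task
    def sync_routers(self, context):
        self.log.append(("sync_routers", context["is_admin"]))

    def not_periodic(self, context):
        self.log.append(("not_periodic", context["is_admin"]))


def get_admin_context():
    # Stand-in for an admin context object.
    return {"is_admin": True}


agent = DemoAgent()
agent.periodic_tasks(get_admin_context())
```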

We reached the point where neutron starts to execute the periodic loop from the previous line. It does so by running the start method of the ‘FixedIntervalLoopingCall’ class.

Next, we are adding ‘periodic’ from the previous line to the list of timers.

Finally, it calls the ‘after_start‘ method of our manager (reminder: our manager is L3NATAgentWithStateReport)

It starts by spawning a green thread with a call to _process_router_loop. This loop uses a green thread pool of size 8 to ensure that the maximum number of workers are either processing a router or waiting on the queue for the next update to come in.
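The "fixed number of workers, each either busy or waiting on the queue" idea can be tried with plain threads. Note the swap: the agent uses green threads, while this sketch uses the standard library's threading and queue modules, and the router names and sentinel shutdown are made up for the demo.

```python
import queue
import threading

POOL_SIZE = 8  # same size as the pool mentioned above

updates = queue.Queue()
processed = []
processed_lock = threading.Lock()


def worker():
    while True:
        # Block here until the next update arrives.
        router_id = updates.get()
        if router_id is None:  # sentinel: no more work for this worker
            updates.task_done()
            return
        with processed_lock:
            processed.append(router_id)
        updates.task_done()


workers = [threading.Thread(target=worker, daemon=True)
           for _ in range(POOL_SIZE)]
for w in workers:
    w.start()

for i in range(20):
    updates.put(f"router-{i}")
for _ in workers:
    updates.put(None)

updates.join()
for w in workers:
    w.join()
```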

Back in after_start, it next calls _report_state, which reports on the overall status of all the existing routers. If the agent was just revived from a crash, it will perform a full sync.

Next, it calls the ‘after_start‘ method of our prefix delegation object, which defines a signal handler.

At this point, our agent is up and running: handling L3, and consuming and publishing messages on the L3 topics.