Everything You Always Wanted to Know about Twisted
This paper was originally prepared as an introduction to Twisted for the Launchpad development team, and was presented in London on October 30th, 2008. The version here has been edited for HTML, and has had Launchpad-specific references and examples removed.
By Jonathan M. Lange & Michael Hudson-Doyle.
Abstract #
Twisted has a bit of a reputation of being “magic” and some people seem to be a bit scared of it. This talk will aim to explain the basic ideas of Twisted in simple language and make it clear that it is not at all magic. It will touch on the reactor, Deferreds, basic protocol structure and dispell some common misconceptions.
Introduction #
Twisted is, at its heart, an asynchronous I/O framework written in Python. It comes bundled with application servers, defunct persistence systems, protocol implementations, a testing framework and a quotes file. Its size, scope and the unfamiliarity of its concepts make it intimidating for newcomers. This paper aims to address the latter point, by explaining the concepts of asynchronous programming as implemented by Twisted.
This paper assumes that you know how to write Python code, that you know roughly what an event loop is, and that you are comfortable with network programming in general. When you are finished, you should know enough about Twisted to start investigating how you can use it to solve your specific problems.
A Result You Don’t Have Yet #
Imagine that you are writing a program where one API that you are using returns placeholders for the actual results. You could think of each placeholder as a promise that you will eventually get a result or you could think of them as a result that you don’t have yet.
Suppose one of these functions was order_food(). This function places an order for some food and then returns immediately. When we call it, it returns a placeholder, like so:
PLACEHOLDER = order_food()
We have the placeholder now and sometime later we’ll get the actual response (e.g. eggs benedict). How can we write a program that depends on the outcome of ordering food? To do this, we would need to be able to schedule actions for when we get the actual response. So, something like:
PLACEHOLDER = order_food()
# When PLACEHOLDER is ready, do ACTION
In Python, PLACEHOLDER
is going to be a Python object, and the ACTION
is
best represented as a callable, giving us::
placeholder = order_food()
placeholder.when_ready(do_action)
You can think of PLACEHOLDER as the source of a single event: “got a
value” and do_action
a handler for that event. If order_food just
returned its result immediately, the code would look like:
value = order_food()
do_action(value)
But our placeholder is not yet good enough. We need a way to use the
result of do_action
to do other things. We need an equivalent of::
value = order_food()
x = do_action(value)do_something_else(x)
That is, our placeholder needs to do something like::
placeholder = order_food()
placeholder.when_ready(do_action)
placeholder.when_ready(do_something_else)
Where the return value of do_action
is passed to do_something_else.
To illustrate, let’s extend our example to do something interesting with the food we get back.
placeholder = order_food()
placeholder.when_ready(eat_meal)
placeholder.when_ready(compliment_chef)
def eat_meal(meal):
# I don't know how to do this in Python.
was_it_good = nom_nom_nom(meal)
return was_it_good
def compliment_chef(praiseworthy):
if praiseworthy:
print "Wuu chef!"
else:
print "I sneeze in your face!"
eat_meal
and compliment_chef
are both “callbacks”. eat_meal
is run
as soon as we have a real result for order_food()
. It’s passed that
result (meal
in the example), eats it and then returns a boolean
indicating whether or not it was any good. The key here is that the
return value of a callback is passed as the first parameter to the next
callback.
compliment_chef
is run as soon as eat_meal
finishes. It is passed the
return value of eat_meal
.
If order_food
returned its result immediately, the code would look like:
meal = order_food()
was_it_good = nom_nom_nom(meal)
if was_it_good:
print "Wuu chef!"
else:
print "I sneeze in your face!"
No Magic #
So far, there is no magic involved, just pure Python abstraction: no threads, no I/O, no subprocesses, no signals, no concurrency.
In fact, here’s an implementation:
class Placeholder:
"""A value you don't yet have."""
UNFIRED = object()
def __init__(self):
self._callbacks = []
self._result = self.UNFIRED
def already_fired(self):
return not self._result is self.UNFIRED
def when_ready(self, callable, *args, **kwargs):
self._callbacks.append((callable, args, kwargs))
if self.already_fired():
self._run_callbacks()
return self
def _run_callbacks(self):
while self._callbacks:
callable, args, kwargs = self._callbacks.pop()
self._result = callable(self._result, *args, **kwargs)
def fire(self, value):
if self.already_fired():
raise AlreadyFiredError(self, value)
self._result = value
self._run_callbacks()
Nothing special, just loops, lists and first-class functions.
The reader will notice three things about this implementation:
- If a callback takes a long time to execute, the
_run_callbacks
loop will block. - There is no error handling.
- If a callback returns a placeholder itself, the callbacks of that placeholder will never be run
The first is a deliberate design decision: there is no magic here. The second is the subject of the next section. The third is left as an exercise to the reader (hint: you can cheat by looking inside twisted/internet/defer.py).
Errors #
Actually, a placeholder is the source of two events: success and failure,
which correspond to return
and raise
. Any placeholder can stand for a
successful result or for a raised error.
This means that we need the equivalents of:
# 1. Handle error then do the action anyway.
try:
value = order_food()
except:
handle_error()do_action(value)
# 2. Handle the error but do the action only if the error doesn't occur.
try:
value = order_food()
except:
handle_error()
else:
do_action(value)
# 3. Handle the error for the entire operation.
try:
value = order_food()
do_action(value)
except:
handle_error()
# 4. Do something regardless of success or failure.
try:
value = order_food()
finally:
do_cleanup()
We simply cannot use Python’s built-in exception handling structures because
the result is not known yet. We need to extend our placeholder to be able to
replicate any error handling that we can do with
try
/except
/else
/finally
. Luckily, Twisted has already implemented such
a placeholder, calling it a Deferred. The when_ready
operation described
above is called addCallback
, and is analogous to ‘do this on next success’.
The example from the previous section becomes:
deferred = order_food()
deferred.addCallback(eat_meal)
deferred.addCallback(compliment_chef)
addCallback
has a sibling, addErrback
, which is ‘do this on next failure’.
The four clauses above become:
# 1. Handle error then do the action anyway.
deferred = order_food()
deferred.addErrback(handle_error)
deferred.addCallback(do_action)
# 2. Handle the error but do the action only if the error doesn't occur.
deferred = order_food()
deferred.addCallbacks(do_action, handle_error)
# 3. Handle the error for the entire operation.
deferred = order_food()
deferred.addCallback(do_action)
deferred.addErrback(handle_error)
# 4. Do something regardless of success or failure.
deferred = order_food()
deferred.addBoth(do_cleanup)
Again, none of this requires magic. We could make the “Placeholder” example above handle all of these cases without introducing any concurrency, threading or non-blocking I/O.
We now have everything we need to write programs that use results that we don’t have yet. The only thing we lack is a reason for using such results. For that we need a way of implementing asynchronous operations.
The Reactor #
The reactor is Twisted’s event loop. You use the reactor to register sources of events and handlers for these events, and then the reactor calls those handlers.
In this sense, the simplest possible Twisted program is:
from twisted.internet import reactor
print 'Hello'
reactor.run()
This will print ‘Hello’ and then run until it is interrupted with a signal (another event).
We can make the program a little more complex by registering our own event:
from twisted.internet import reactor
def print_world():
print 'World!'
print 'Hello ',
reactor.callWhenRunning(print_world)
reactor.run()
The reactor will print ‘Hello’, and then print ‘World!’ once it starts running, and then it will loop forever as before. The next simplest form of event is based on elapsed time:
from twisted.internet import reactor
def print_world():
print 'World!'
reactor.callLater(5, reactor.stop)
print 'Hello ',
reactor.callWhenRunning(print_world)
reactor.run()
This will print “Hello, World!” as above, and then wait five seconds before shutting down the reactor and exiting cleanly.
Note that the reactor calls your code and when your code is done, control returns to the reactor.
Deferreds are used in order to turn event handlers into APIs that are independent of those events:
from twisted.internet import defer, reactor
def run_later(seconds, function, *args, **kwargs):
d = defer.Deferred()
def fire():
value = function(*args, **kwargs)
d.callback(value)
reactor.callLater(seconds, fire)
return d
def print_stuff(message):
# Because 'print' is a *statement*.
print message
d = run_later(2, print_stuff, 'Hello')
d.addCallback(lambda ignored: run_later(3, print_stuff, 'World'))
run_later(3, print_stuff, 'Beautiful')
reactor.run()
This will print “Hello” at two seconds, “Beautiful” at three seconds and “World” three seconds after “Hello”.
The deferreds here make sure that one thing happens after another. The
‘World’ callback runs only after the deferred returned by the ‘Hello’
run_later
is fired.
The “concurrency” here, such as it is, only works because everything runs very quickly. If any method called by the reactor blocks, then the entire application blocks:
import time
from twisted.internet import defer, reactor
def sleep_then_print(seconds, value):
time.sleep(seconds)
print value
d = run_later(2, print_stuff, 'Hello')
d.addCallback(lambda ignored: sleep_then_print(3, 'World'))
run_later(3, print_stuff, 'Beautiful')
reactor.run()
Although this looks like it might do the same thing as the previous program, it will actually display:
HelloWorldBeautiful
What happens is that after ‘Hello’ is printed, the sleep_then_print
callback is fired, blocking processing for three seconds. The reactor
only gets a chance to execute the ‘Beautiful’ print statement after
sleep_then_print
has exited.
This is about as much as can be said about the reactor without introducing Twisted’s abstractions for I/O.
Separate Concerns #
Twisted is fundamentally about doing asynchronous I/O — the fundamental purpose of deferreds and the reactor is to make it easier to write programs that communicate over the network asynchronously.
The first necessary part is an interface for handling very low-level events and operations: this socket just got some data; the remote end disconnected; write some data etc. In Twisted terminology, this is a transport. There are transports for TCP, UDP, UNIX sockets, processes and SSL. Each of these knows how to read and write bytes to the wire.
The second part is an object that can provide meaning to the low-level byte
operations: a protocol object. The protocol is a state machine that responds
to events that the transport sends it, events like: dataReceived
;
connectionMade
and connectionLost
.
The third is an object that can be used to create new protocol objects as new connections are required and to bind these protocol objects to their transports (not strictly true — other parts of Twisted do the binding. Still, it’s a helpful lie). Twisted calls these protocol factories.
Now, let’s combine all of these together to write a toy server and a client to go with it.
from twisted.internet import protocol, reactor
from twisted.protocols.basic import LineReceiver
class ReverseLineProtocol(LineReceiver):
"""
Line-based protocol that echoes any lines it receives in reverse.
Disconnects the client when ten lines have been received.
"""
def __init__(self):
self._lines_received = 0
def lineReceived(self, line):
self.sendLine(''.join(reversed(line)))
self._lines_received += 1
if self._lines_received >= 10:
self.transport.loseConnection()
class ReverseLineFactory(protocol.ServerFactory):
protocol = ReverseLineProtocol
reactor.listenTCP(9999, ReverseLineFactory())
reactor.run()
The protocol is a LineReceiver
, which is a simple helper built on top of the
base protocol that buffers bytes until they become lines, and then sends those
lines to lineReceived
.
The call to listenTCP
binds that instance of the factory to listen on
port 9999. When a new connection is made, Twisted constructs an instance of
the protocol and binds it to the socket opened for that connection.
Here’s the client:
from twisted.internet import defer, reactor, protocol as proto
from twisted.protocols.basic import LineReceiver
class ReverseClientProtocol(LineReceiver):
def __init__(self, deferred, string):
self._deferred = deferred
self._string = string
def connectionMade(self):
self.sendLine(self._string)
def lineReceived(self, line):
if self._deferred is None:
return
d, self._deferred = self.deferred, None
d.callback(line)
class RemoteReverser:
def __init__(self, host, port):
self._host = host
self._port = port
def reverse(self, some_string):
"""Send 'some_string' to the host to be reversed."""
d = defer.Deferred()
client_creator = proto.ClientFactory(
reactor, ReverseClientProtocol, d, some_string)
client_creator.connectTCP(self._host, self._port)
return d
def print_stuff(message):
print message
reverser = RemoteReverser('localhost', 9999)
d = reverser.reverse('Hello')
d.addCallback(print_stuff)
d.addBoth(reactor.stop)
reactor.run()
The ReverseLineProtocol
implements the details of talking to the server:
it sends a line on connection and then fires its deferred as soon as it
receives a line back.
The dance with setting self._deferred
to None
is actually quite
important. If lineReceived
naively fired the deferred every time it was
called, then it would raise an AlreadyFiredError
.
RemoteReverser
is a user-defined class with no explicit place in Twisted. It
provides an API for reversing a string. reverse
returns a deferred that will
fire with the first line returned from the server.
Again, by calling reverse
and using the deferred, callers can avoid knowing
anything about how the protocol works or what events are involved. As far as
a caller is concerned, a call to reverse
simply returns a placeholder for a
value that you don’t have yet.
Conclusion #
Twisted provides tools for building libraries and applications that are built around asynchronous I/O. It tries hard not to make decisions for you so that you can write whatever application you need.