The not-so-sorry state of SSL in Python
TLS and SSL are two critical technologies that underlie much of the secure communication on the internet. Over the past few years, spurred by increasingly effective attacks and a desire for new functionality, SSL and TLS have gained many new features as well as practical improvements.
Python is currently in a transitional period between Python 2 and Python 3. For the past few years, all new feature development has been happening on Python 3, including new features in Python's ssl module. This means that Python 3 users have had access to these improvements to TLS, but Python 2 users (still the majority of Python users) have been falling behind.
Unfortunately, missing features in an SSL/TLS stack aren't like missing features in other modules. They can prevent interoperability with the wider internet and compromise the security of connections, and there's usually no workaround. To get a sense of how bad this situation is, you can watch Hynek Schlawack's talk on "The Sorry State of SSL" from PyCon.
Current State and Requirements
This situation is unacceptable, given that Python 2 is going to continue to see production usage for many years to come. As a result, we advocated strongly for PEP 466, which provides permission to backport many of these improvements to Python 2.
A few of the important features missing from Python 2 are:
- NPN and ALPN support. NPN (Next Protocol Negotiation) is a TLS extension originally introduced by Google, which allows a connection to negotiate which application-level protocol will be used. It is currently in the process of being supplanted by ALPN (Application-Layer Protocol Negotiation), which is an IETF draft. These are necessary for protocols such as SPDY and HTTP/2. (ALPN is not yet in Python because it's not yet in an OpenSSL release; once it is, the feature will be added.)
- SNI support. SNI (Server Name Indication) is another TLS extension, which allows a server to support multiple hostnames on a single IP address. This is necessary because of the shrinking supply of IPv4 addresses and increasing desire to run distributed CDNs which serve multiple hostnames.
- Hostname verification support. Part of correctly implementing a TLS client is verifying that the certificate the server presents has a hostname which matches the one you intended to request. Without this support, TLS connections are vulnerable to Man-In-The-Middle attacks.
- Locating platform trust roots. With this, Python is able to use your operating system's provided set of Certificate Authorities for verifying the certificates of TLS servers.
- Improved cipher suites. Previously Python used OpenSSL's default cipher suite list, which is unfortunately insecure. It includes several weak ciphers, and does not correctly prioritize the strongest ciphers.
- ECDHE and DHE support. ECDHE (Elliptic Curve Diffie-Hellman Ephemeral) and DHE (Diffie-Hellman Ephemeral) provide a property called "Perfect Forward Secrecy", which means that even if the server's private key is compromised, the plaintext of sessions before the compromise is still secret.
- Any configurability. It's impossible to disable TLS compression (to prevent the CRIME attack), or configure which versions of the SSL/TLS protocol should be used.
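To make the list above concrete, here is a minimal sketch of a client that uses several of these features through the SSLContext API in Python 3's ssl module. The function names make_client_context and connect are hypothetical, and the cipher string is an illustrative assumption, not the list used in the patch.

```python
import socket
import ssl


def make_client_context():
    """Build a client-side context using the features listed above."""
    # create_default_context() loads the platform's trust roots and enables
    # certificate and hostname verification for client connections.
    context = ssl.create_default_context()

    # Disable TLS compression, which is what the CRIME attack exploits.
    context.options |= ssl.OP_NO_COMPRESSION

    # Refuse the obsolete SSLv2 and SSLv3 protocol versions.
    context.options |= ssl.OP_NO_SSLv2 | ssl.OP_NO_SSLv3

    # Illustrative cipher string (an assumption): prefer forward-secret
    # ECDHE and DHE key exchanges and exclude unauthenticated suites.
    context.set_ciphers("ECDHE+AESGCM:DHE+AESGCM:ECDHE+AES:DHE+AES:!aNULL:!eNULL:!MD5")
    return context


def connect(hostname, port=443):
    """Open a verified TLS connection to hostname (hypothetical helper)."""
    context = make_client_context()
    sock = socket.create_connection((hostname, port))
    try:
        # server_hostname is sent as the SNI value and is also the name
        # checked against the server's certificate.
        return context.wrap_socket(sock, server_hostname=hostname)
    except Exception:
        sock.close()
        raise
```

Calling connect("example.org") would return a wrapped socket only if the server's certificate chains to a trusted root and matches the requested hostname; otherwise the handshake raises an error.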
Work, Solutions and Issues
To address this, we backported each of these features from the Python 3 branch to the Python 2 branch. This was a large task, because it is a huge amount of code (the resulting diff is over 12,000 lines), written for two different programming languages, with some very fundamental differences.
Our first step was to take a diff of the files that comprise the SSL module between Python 2 and Python 3. Then we applied this diff using a graphical merge tool, throwing away changes that were incorrect or that introduced backwards incompatibilities.
Then we made sure all of the code compiled, fixing syntax errors and API differences between Python 2 and Python 3. Finally, we ran the tests and made fixes as necessary until all of them passed.
A few of the issues we ran across were:
- Missing functions in the C API, such as
- Usage of stdlib modules, such as enum, which didn't exist in Python 2.
- Differences in the handling of unicode. In particular, existing Python 3 code was very strict about text vs. bytes in a way that often wasn't possible to emulate in Python 2.
- The ssl module is deeply coupled to the socket module, and many of the internals of the socket module were changed in Python 3 to fully participate in PEP 3116.
- Most of the code for the ssl module is written in C, rather than in Python. C code is considerably more difficult to work with than Python code. While all of the Python-code differences were easy to work around, the C API differences required considerably more work.
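The text vs. bytes point can be illustrated with a small sketch. The helper below is hypothetical and is not taken from the patch; it only shows the kind of strict type check that Python 3 code can make. On Python 3, a plain string literal is text and is clearly distinct from bytes. On Python 2, a plain literal such as 'example.org' is already bytes, so a function like this cannot tell whether the caller meant text.

```python
# unicode on Python 2, str on Python 3
text_type = type(u"")


def encode_hostname(name):
    """Return name as bytes, rejecting anything that is neither text nor bytes.

    Hypothetical helper for illustration only.
    """
    if isinstance(name, text_type):
        # Text is converted to its ASCII-compatible (IDNA) wire form.
        return name.encode("idna")
    if isinstance(name, bytes):
        # Bytes are assumed to be encoded already and pass through unchanged.
        return name
    raise TypeError("expected text or bytes, got %s" % type(name).__name__)
```

Both encode_hostname(u"example.org") and encode_hostname(b"example.org") return the same bytes, while a non-string argument raises TypeError.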
This work is important to Rackspace for several reasons:
- Rackspace uses a lot of Python. Between OpenStack, other open source Python projects, and internal tooling, Rackspace is powered by Python. Keeping our own code secure is a very important priority.
- Many of Rackspace's customers use Python, including to orchestrate Rackspace's APIs. Keeping those connections secure is important.
- Features like NPN, ALPN, SNI, and Perfect Forward Secrecy are vital for the health and continued growth of the internet. Developers' access to and adoption of these technologies in Python, whether for use with Rackspace or not, is critical.
The patch with our work is currently in code review, and we're hoping it will be merged soon, for release in Python 2.7.9. One of our goals in working on this patch was to reduce the maintenance burden going forward, by minimizing the delta with Python 3. When new features like ALPN are added to Python 3, they should be much easier to backport to Python 2 than this was.