Some BadgeApp security/privacy improvements: Encrypting email addresses and restricting Gravatar
David A. Wheeler
We've recently made some security/privacy improvements to the BadgeApp, and I thought it'd be useful to mention here on the mailing list. In short, we've:
1. Started encrypting email addresses in the internal database. We already don't allow users to see other users' email addresses. The idea of this change is to store the data encrypted at rest, as an additional safeguard. They're encrypted using 'aes-256-gcm' and have a blind index using PBKDF2-HMAC-SHA256, both using strong cryptographically random 256-bit keys. These algorithms and keys should provide PLENTY of protection.
2. *Only* use Gravatar URLs when we've determined that there's an active Gravatar URL. This is so that we don't even leak the cryptographic hash of an email address for a local account unless the user's clearly approved it.
I don't think these are necessary for privacy or GDPR, but they are decent hardening measures because we try to provide as much security and privacy as we reasonably can.
Everything seems to work, but just in case, we're waiting a little bit before we delete the internal database column that stores the unencrypted email addresses. So if you see a problem relating to email address storage, please let us know ASAP before we take that last step.
Details below, which are an extract from the assurance case at https://github.com/coreinfrastructure/best-practices-badge/blob/master/doc/security.md
--- David A. Wheeler
First, encrypted email addresses. We encrypt email addresses within the database, and never send the decryption or index keys to the database. This provides protection of this data at rest, and also means that even if an attacker can view the data within the database, that attacker will not receive sensitive information. Email addresses are encrypted as described here, and almost all other data is considered public or at least not sensitive (passwords are specially encrypted, as described separately).
A little context may be useful here. We work hard to comply with various privacy-related regulations, including the European General Data Protection Regulation (GDPR). We do not believe that encrypting email addresses is strictly required by the GDPR. Still, we want to not just meet requirements, we want to exceed them. Encrypting email addresses makes it even harder for attackers to get this information, because it's encrypted at rest and not available by extracting data from the database system.
It is useful to note why we encrypt just email addresses (and passwords), and not all data. Most obviously, almost all data we manage is public anyway. In addition, the easy ways to encrypt data aren't available to us. Transparent Data Encryption (TDE) is not a capability of PostgreSQL. Whole-database encryption can be done with other tricks but it is extremely expensive on Heroku. Therefore, we encrypt data that is more sensitive, instead of encrypting everything.
We encrypt email addresses using the Rails-specific approach outlined in "Securing User Emails in Rails" by Andrew Kane (May 14, 2018). We use the gem 'attr_encrypted' to encrypt email addresses, and gem 'blind_index' to index encrypted email addresses. This approach builds on standard general-purpose approaches for encrypting data and indexing the data, e.g., see "How to Search on Securely Encrypted Database Fields" by Scott Arciszewski. The important aspect here is that we encrypt the data (so it cannot be revealed by those without the encryption key), and we also create cryptographic keyed hashes of the data (so we can search on the data if we have the hash key). The latter value is called a "blind index".
We encrypt the email addresses using AES with 256-bit keys in GCM mode ('aes-256-gcm'). AES is a well-accepted, widely-used encryption algorithm. A 256-bit key is especially strong. The GCM mode is a widely-used strong encryption mode; it also provides an integrity ("authentication") mechanism. Each separate encryption uses a separate long initialization vector (IV) created using a cryptographically-strong random number generator.
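To make the above concrete, here is a minimal sketch of AES-256-GCM with a fresh random IV per encryption, using only Ruby's standard OpenSSL bindings. The helper names (encrypt_email, decrypt_email) and the hash layout are hypothetical and are not how 'attr_encrypted' is implemented internally; they only illustrate the primitive.

```ruby
require 'openssl'
require 'securerandom'

# Hypothetical helpers for illustration only; attr_encrypted's internals differ.
def encrypt_email(plaintext, key)
  cipher = OpenSSL::Cipher.new('aes-256-gcm')
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv   # fresh random IV for every single encryption
  cipher.auth_data = ''   # no additional authenticated data in this sketch
  ciphertext = cipher.update(plaintext) + cipher.final
  { iv: iv, ciphertext: ciphertext, tag: cipher.auth_tag }
end

def decrypt_email(record, key)
  cipher = OpenSSL::Cipher.new('aes-256-gcm')
  cipher.decrypt
  cipher.key = key
  cipher.iv = record[:iv]
  cipher.auth_tag = record[:tag]  # GCM integrity check happens in #final
  cipher.auth_data = ''
  cipher.update(record[:ciphertext]) + cipher.final
end

key = SecureRandom.random_bytes(32) # 256-bit key
record = encrypt_email('user@example.org', key)
puts decrypt_email(record, key)
```

Because the IV is random, encrypting the same address twice yields different ciphertexts, and any modification of the ciphertext makes decryption raise OpenSSL::Cipher::CipherError instead of returning altered data.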
We also hash the email addresses, so they can be indexed. Indexing is necessary so that we can quickly find matching email addresses (e.g., for local user login). We hash them using the keyed hash algorithm PBKDF2-HMAC-SHA256. SHA-256 is a widely-used cryptographic hash algorithm (in the SHA-2 family), and unlike SHA-1 it is not broken. Using SHA-256 directly is vulnerable to a length extension attack, but that appears to be irrelevant in this case. In any case, we counter this problem by using HMAC and PBKDF2. HMAC is defined in RFC 2104 as the algorithm H(K XOR opad, H(K XOR ipad, text)). This enables us to use a secret key on the hash, counters length extension, and is very well-studied. We also use PBKDF2 for key stretching. This is another well-studied and widely-accepted algorithm. For our purposes we believe PBKDF2-HMAC-SHA256 is far stronger than needed, and thus is quite sufficient to protect the information. The hashes are of email addresses after they've been downcased; this supports case-insensitive searching for email addresses.
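As a rough illustration of the blind index idea, the sketch below derives a keyed PBKDF2-HMAC-SHA256 value from a downcased email address using Ruby's standard OpenSSL::KDF module. The helper name email_blind_index, the use of the index key as the PBKDF2 salt, the iteration count, and the hex output encoding are all assumptions made for this example; they are not claimed to match the parameters 'blind_index' actually uses.

```ruby
require 'openssl'
require 'securerandom'

# Hypothetical helper. Salt choice, iteration count, and hex encoding are
# assumptions for illustration, not blind_index's actual parameters.
def email_blind_index(email, index_key, iterations: 10_000)
  OpenSSL::KDF.pbkdf2_hmac(
    email.downcase,          # downcase first so lookups are case-insensitive
    salt: index_key,         # the secret index key never goes to the database
    iterations: iterations,
    length: 32,              # 256-bit output
    hash: 'sha256'
  ).unpack1('H*')            # hex-encode the raw bytes
end

index_key = SecureRandom.random_bytes(32)
stored = email_blind_index('User@Example.org', index_key)
lookup = email_blind_index('user@example.org', index_key)
puts stored == lookup
```

The same address always maps to the same index value under a given key, so an equality lookup works on the indexed column, while someone who sees only the database cannot recompute the value without the key.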
The two keys used for email encryption are EMAIL_ENCRYPTION_KEY and EMAIL_BLIND_INDEX_KEY. Both are 256 bits long (aka 64 hexadecimal digits long). The production values for both keys were independently created as cryptographically random values using "rails secret".
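For readers who want to see what a key of this size looks like, here is a small sketch using Ruby's SecureRandom. It stands in for, and is not the same as, the "rails secret" command mentioned above; the helper names are hypothetical.

```ruby
require 'securerandom'

# Hypothetical helpers; the production keys were created with "rails secret".
def generate_key_hex
  SecureRandom.hex(32) # 32 random bytes, rendered as 64 hexadecimal digits
end

def key_bytes_from_hex(hex)
  raise ArgumentError, 'expected 64 hex digits' unless hex.match?(/\A\h{64}\z/)
  [hex].pack('H*')     # back to 32 raw bytes (256 bits)
end

hex = generate_key_hex
puts hex.length                         # number of hex digits
puts key_bytes_from_hex(hex).bytesize   # number of key bytes
```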
Implementation note: the indexes created by blind_index always end in a newline. That doesn't matter for security, but it can cause debugging problems if you weren't expecting that.
Note that 'attr_encrypted' depends on the gem 'encryptor'. Encryptor version 2.0.0 had a major security bug when using AES-*-GCM algorithms. We do not use that version, but instead use a newer version that does not have that vulnerability. Some old documentation recommends using 'attr_encryptor' instead because of this vulnerability, but the vulnerability has since been fixed and 'attr_encryptor' is no longer maintained. Vulnerabilities are never a great sign, but we do take it as a good sign that the developers of encryptor were willing to make a breaking change to fix a security vulnerability.
Also: Gravatar is now restricted.
We use gravatar to provide user icons for local (custom) accounts. Many users have created gravatar icons, and those who have created those icons have clearly consented to their use.
However, accessing gravatar icons requires the MD5 cryptographic hash of downcased email addresses. Users who have created gravatar icons have already consented to this, but we want to hide even the MD5 cryptographic hashes of those who have not so consented.
Therefore, we track for each user whether or not they should use a gravatar icon, as the boolean field "use_gravatar". Currently this can only be true for local users (for GitHub users we use their GitHub icon). Whenever a new local user account is created or changed, we check if there is an active gravatar icon, and set use_gravatar accordingly. We also intend to occasionally iterate through local users to reset this (so that users won't need to remember to manipulate their BadgeApp user account). We will then only use the gravatar MD5 when there is an actual gravatar icon to refer to; otherwise, we use a bogus MD5 value. Thus, local users who do not have a gravatar account will not even have the MD5 of their email address revealed.
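The selection logic described above can be sketched as follows. This is only an illustration: the helper names, the BOGUS_MD5 placeholder value, and the exact URL parameters are assumptions, not the BadgeApp's actual code.

```ruby
require 'digest'

# Hypothetical placeholder; the real bogus value used by the BadgeApp may differ.
BOGUS_MD5 = ('0' * 32).freeze

# Only reveal the real MD5 when the user is known to have a gravatar icon.
def avatar_hash(email, use_gravatar)
  use_gravatar ? Digest::MD5.hexdigest(email.downcase) : BOGUS_MD5
end

# URL parameters (d=mp fallback image, s=size) are chosen for illustration.
def avatar_url(email, use_gravatar, size: 80)
  "https://www.gravatar.com/avatar/#{avatar_hash(email, use_gravatar)}?d=mp&s=#{size}"
end

puts avatar_url('user@example.org', true)
puts avatar_url('user@example.org', false)
```

With use_gravatar false, the generated URL carries no information derived from the email address, which is the point of the change.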
This is almost certainly not required by regulations such as the GDPR, since without this measure we would only expose MD5s of email addresses, and only in certain cases. But we want to exceed expectations, and this is one way we do that.
The current plan is to iterate through the local users once a month and check with Gravatar. That should be fine for the purpose, and easily scales to a huge number of users.
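A periodic refresh along these lines might look like the sketch below. LocalUser and refresh_use_gravatar are hypothetical names, and the actual check against Gravatar is passed in as a callable so that the sketch stays self-contained; one plausible real check is to request the avatar URL with the "d=404" option and treat an HTTP 404 response as "no icon", but that is an assumption about how it could be done, not a description of the BadgeApp's code.

```ruby
# Hypothetical stand-in for a local user record.
LocalUser = Struct.new(:email, :use_gravatar)

# Updates use_gravatar for each user and returns how many records changed.
# has_gravatar is any callable that takes an email and returns true/false.
def refresh_use_gravatar(users, has_gravatar)
  changed = 0
  users.each do |user|
    active = has_gravatar.call(user.email) ? true : false
    next if user.use_gravatar == active  # nothing to update
    user.use_gravatar = active
    changed += 1
  end
  changed
end

# Usage with a fake checker (no network access in this sketch).
users = [LocalUser.new('a@example.org', false), LocalUser.new('b@example.org', true)]
fake_checker = ->(email) { email.start_with?('a') }
puts refresh_use_gravatar(users, fake_checker)
```

Each user costs one lookup per run, so the work grows linearly with the number of local users.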