The Missing Semester Of Your CS Education
The Problem: Alice has to send information to Bob. But the transfer should achieve the following:
Checksums Checksum as a person’s full name: Eubediah Q. Horsefeathers. It’s a shortcut to uniqueness that’s fast and simple, but easy to forge, because security isn’t really the point of naming.
Hashing The core purpose of hashing is to create a fingerprint of data to assess data integrity. To qualify as a cryptographic hash function, a hash function must be pre-image resistant and collision resistant.
MD5
has since been broken and vulnerable to tampering. SHA-1
isn’t great. SHA-256
. Fast to compute, so vulnerable to brute-force, dictionary and rainbow tables. A single iteration of SHA512
is fast, which makes it inappropriate for use as a password hashing function. Brute-Force: If an attacker has gotten hold of password hashes that were hashed with something like SHA-256
, they could try to generate every password possible and hash these to find a match for the password hashes; this is called brute forcing. Since hash functions are deterministic (the same function input always results in the same hash), if a couple of users were to use the same password, their hash would be identical. If a significant amount of people are mapped to the same hash that could be an indicator that the hash represents a commonly used password and allow the attacker to significantly narrow down the number of passwords to use to break in by brute force.
Dictionary attack: This is where a file/database is previously constructed containing possible passwords that are better guesses than generating every possible password.
Solution: Use slower hashing method, making it unrealistic to find matches in our lifetime. PBKDF2
is a key derivation function where the user can set the computational cost; this aims to slow down the calculation of the key to make it more impractical to brute force. In usage terms, it takes a password, salt and a number or iterations to produce a certain key length which can also be compared to a hash as it is also a one way function. With iterations set to a large number, the algorithm takes longer to calculate the end result.
Rainbow Table Attack. An attacker can use a large database of precomputed hash chains to find the input of stolen password hashes. A hash chain is one row in a rainbow table, stored as an initial hash value and a final value obtained after many repeated operations on that initial value.
Since a rainbow table attack has to re-compute many of these operations, we can mitigate a rainbow table attack by boosting hashing with a procedure that adds unique random data to each input at the moment they are stored. This practice is known as adding salt to a hash and it produces salted password hashes. Why Salt? The salt is a randomly generated string that is joined with the password before hashing. Being randomly generated, it ensures that even hashes of equal passwords get different results.
Given a message, you run it through the checksum algorithm, which the recipient knows. The recipient also receives the checksum through some independent source. If any single bit of the message was modified, you’d know.
Checksums need not themselves contain the message, so they can be publicly displa
You don’t walk up to someone and demand their fingerprints to prove they are who they say they are. Names are just convenient disambiguators, a way of quickly determining who you’re talking to for social reasons, not absolute proof of identity.
yed, and also made to have an output of fixed length.
Given a checksum algorithm and a checksum, can we find the original message? This is not possible, as a given checksum can correspond to many different inputs; even though collission is rare.
Checksums are fast
Hash functions are one-way functions whose inverse is difficult to calculate.
Terms to understand: SHA1, SHA256, ROT13, B64 encoding, BCrypt, RSA, Diffie Helman, PGP, Public/Private keys
Digital Signatures / File Integrity Verification (Gmail, Google Drive files) Signing TLS Certificates Browser Security (Chrome, Firefox Certificates) Finding Duplicate Files in Storage Code Reposatries (Git uses SHA1) BitTorrent, rsync
Web-Apps
In standard webapps, you need to carry the authentication information with you in every request. If a request doesn’t contain any authentication information, then the thing that does the authentication will pop up and ask for it
(1) You create state (e.g., a database) and instead of passing the credentials around, you pass some kind of session identifier that maps to this state. e.g., PageSmith works. It’s not a bad approach, but it requires a lot more machinery
(2) You encrypt the sensitive parts of the authentication information e.g. base64(username + ":" + encrypt(password, secret_key))
Server Responds with 401
and header WWW-Authenticate
Client encodes in Base64 with “Basic” Prepended and sends in Authorization
Header. Only encoded, not encrypted. No login page or cookies are involved
Cookies
200
header and HTML response with no WWW-Authenticate
header, but a Set-Cookie
header (for the first time). In subsequent requests the client sends the named cookie in the headers.