The Myth That Hashing Is Encryption (And Why It Matters)

I've reviewed probably two hundred pull requests over my career, and I'd estimate one in fifteen had some version of this comment buried in the code: // encrypt password with MD5. Or its cousin: // decrypt hash to verify. Both reveal the same misunderstanding — one that isn't just pedantic wordplay. It has caused real breaches, real data losses, and real apologies to users whose passwords ended up on Pastebin.

Let's fix this once and for all.

They Are Fundamentally Different Things

Hashing and encryption both involve transforming data into something unrecognizable. That surface similarity is where the confusion is born and where it should die.

Hashing is one-way. You feed data in, you get a fixed-length digest out, and there is no path back. SHA-256 takes your string "hunter2" and returns f52fbd32b2b3b86ff88ef6c490628285f482af15ddcb29541f94bcf526a3f6c7. No key. No algorithm you can run in reverse. The digest is a fingerprint, not a cipher.
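A minimal sketch of those properties, using Python's standard hashlib module. The helper name sha256_hex is made up for this illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_digest = sha256_hex("hunter2")
long_digest = sha256_hex("hunter2" * 10_000)

# Both digests are 64 hex characters (256 bits), whatever the input size.
print(len(short_digest), len(long_digest))

# Same input, same digest: hashing is deterministic and takes no key.
print(sha256_hex("hunter2") == short_digest)

# A one-character change produces a different, unrelated-looking digest.
print(sha256_hex("hunter3") == short_digest)
```

Notice that hashlib offers nothing like an "unhash" call. The only way to learn which input produced a digest is to guess inputs and hash them.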

Encryption is two-way. It takes plaintext and a key, produces ciphertext, and with the right key, you can reconstruct the original plaintext exactly. AES-256 in GCM mode, RSA, ChaCha20 — they're all designed with reversibility as the core property. Encryption without decryption would be useless by definition.
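To make the round trip concrete, here is a deliberately toy sketch: a one-time pad built from XOR, using only the standard library. It is not AES, RSA, or ChaCha20, and it is not something to ship; it only illustrates that the same key that produces the ciphertext recovers the plaintext. The function name xor_bytes and the sample card number are assumptions for illustration.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a key of equal length (toy one-time pad, illustration only)."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"4111 1111 1111 1111"              # made-up test card number
key = secrets.token_bytes(len(plaintext))       # random secret key

ciphertext = xor_bytes(plaintext, key)          # encrypt: plaintext + key
recovered = xor_bytes(ciphertext, key)          # decrypt: ciphertext + same key

print(recovered == plaintext)
```

In real code you would reach for an authenticated cipher from a vetted library rather than anything hand-rolled. The point here is only the shape of the operation: key in, ciphertext out, and the key gets you back.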

This is not a subtle distinction. It is the entire point of each primitive.

The Security Mistake That Follows Immediately

When a developer thinks hashing is encryption, they often reach for MD5 or SHA-1 to "encrypt" passwords. The reasoning follows from the wrong premise: "I'm transforming it, so it's protected, and only someone with the right tool could get it back." No such tool exists, because a hash cannot be reversed. That still doesn't make MD5-hashed passwords safe.

Here's the real attack: nobody needs to reverse a hash. They precompute them.

Rainbow tables are enormous precomputed structures that map the MD5 or SHA-1 digests of common passwords back to the passwords themselves. An attacker dumps your user table, runs the unsalted digests against such a table, and can recover a large share of them, on the order of 60–80% in under an hour for a typical user base. The one-way nature of the hash doesn't protect you because reversibility was never the attack vector.
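The sketch below shows the idea with a plain dictionary lookup. A true rainbow table uses hash chains to trade lookup time for storage, so treat this as the simpler cousin of the same precomputation attack. The password list and the leaked table are invented for the example.

```python
import hashlib
import os

# A tiny stand-in for the attacker's list of common passwords.
COMMON_PASSWORDS = ["123456", "password", "hunter2", "qwerty", "letmein"]


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


# Precomputed once, then reused against every database dump: digest -> password.
lookup = {md5_hex(p.encode()): p for p in COMMON_PASSWORDS}

# Hypothetical leaked user table holding unsalted MD5 digests.
leaked = {
    "alice": md5_hex(b"hunter2"),
    "bob": md5_hex(b"correct horse battery staple"),
}

# No hash is reversed here; each digest is simply looked up.
cracked = {user: lookup[d] for user, d in leaked.items() if d in lookup}
print(cracked)  # {'alice': 'hunter2'}

# A random per-user salt makes the precomputed table useless for this digest.
salt = os.urandom(16)
print(md5_hex(salt + b"hunter2") in lookup)  # False
```

A salt only defeats precomputation. MD5 is still fast enough that an attacker can guess each salted password individually at very high speed, which is why the slow, purpose-built password hashes matter.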

The confusion cuts the other way too. I've seen developers "decrypt" a hash to verify a user's password, meaning they stored something they thought they could reverse. This usually means one of two things: either they're storing passwords in a format that really is reversible (bad), or they're about to get a runtime error because their mental model is broken (annoying, but better than the alternative).

What Hashing Is Actually For

Hashing shines when you need to verify integrity or identity without needing the original data.

Password storage, done correctly, uses hashing. When a user logs in, you hash what they typed, using the salt and parameters stored alongside the original hash, and compare the result to the stored hash. You never need the original password again. bcrypt, Argon2, and scrypt are the right choices here, not MD5 or raw SHA-256, because they're deliberately slow and built around a per-password salt.
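Here is a minimal sketch of that login flow using scrypt from Python's standard hashlib module (it requires a Python build linked against a recent OpenSSL). The function names and the cost parameters are assumptions chosen for illustration, not tuned recommendations.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (an assumption); tune them for your hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest). Store both; never store the password itself."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare digests."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)  # constant-time comparison


salt, stored = hash_password("hunter2")
print(verify_password("hunter2", salt, stored))
print(verify_password("hunter3", salt, stored))
```

Nothing is ever decrypted. Verification is just hashing again and comparing. Dedicated bcrypt or Argon2 libraries typically bundle the salt and parameters into a single stored string, which saves you from managing them by hand.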

File integrity checks use hashing. When you download a binary and verify its SHA-256 against the one published on the project's website, you're confirming the file wasn't tampered with in transit. You don't need to "decrypt" the checksum — you just recompute and compare.
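A sketch of the recompute-and-compare step, assuming the publisher lists a hex SHA-256 digest. The helper names and the throwaway demo file are hypothetical.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Recompute the digest and compare it to the published one."""
    return hmac.compare_digest(sha256_file(path), published_hex.strip().lower())


# Demo: a temporary file stands in for a downloaded binary.
payload = b"pretend this is a release binary"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    download_path = tmp.name

published = hashlib.sha256(payload).hexdigest()  # what the project would publish
ok = matches_published(download_path, published)
tampered = matches_published(download_path, "0" * 64)
os.remove(download_path)

print(ok, tampered)  # True False
```

This check is only as trustworthy as the channel that delivered the published digest. If an attacker can alter both the file and the checksum page, the comparison proves nothing.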

Git uses hashing for its entire object model. Every commit, tree, and blob is identified by its SHA-1 hash (newer Git versions can also create SHA-256 repositories). This makes the history tamper-evident: change one byte in an old commit and its hash changes, along with the hash of every commit built on top of it, making forgery detectable.
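For blobs in a SHA-1 repository, the object ID is the SHA-1 of a short header ("blob", the content length, and a NUL byte) followed by the file content. The sketch below reproduces that in a few lines; the function name git_blob_id is made up here.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the ID Git assigns to a blob in a SHA-1 repository."""
    header = b"blob %d\x00" % len(content)  # "blob <size>\0"
    return hashlib.sha1(header + content).hexdigest()


# Matches `echo 'hello world' | git hash-object --stdin`.
print(git_blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

# The well-known ID of the empty blob.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Trees embed the IDs of their blobs and commits embed the IDs of their tree and parents, which is why a change anywhere ripples forward through every later commit ID.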

UUIDs and unique identifiers sometimes involve hashing too — UUID v5 is a deterministic, namespaced UUID derived from SHA-1. You can regenerate the same UUID from the same input forever, which is useful for content-addressable systems. But again: no decryption involved. You're deriving, not encrypting.
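A quick sketch with Python's standard uuid module. The name "example.com" is just a placeholder input.

```python
import uuid

# The same namespace and name always derive the same UUID.
first = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
second = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
print(first == second)   # True
print(first.version)     # 5

# A different name, or a different namespace, derives a different UUID.
print(uuid.uuid5(uuid.NAMESPACE_DNS, "example.org") == first)  # False
print(uuid.uuid5(uuid.NAMESPACE_URL, "example.com") == first)  # False
```

There is no call that takes the UUID and returns "example.com". The derivation only runs forward.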

Build systems and CI pipelines use content hashes constantly. Webpack adds content hashes to filenames so browsers cache correctly. Bazel and Pants use hashes to decide whether a build step needs to rerun. Docker layers are identified by digest. In all these cases, the hash is a fingerprint for change detection — asking "is this the same as before?" rather than "what is this?"
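The change-detection idea can be sketched in a few lines. This is not how Webpack, Bazel, or Docker implement it internally; the function names, the 8-character truncation, and the choice of SHA-256 are all assumptions for illustration.

```python
import hashlib


def content_hash(content: bytes, length: int = 8) -> str:
    """Short fingerprint of some content (truncated SHA-256 hex)."""
    return hashlib.sha256(content).hexdigest()[:length]


def hashed_filename(name: str, content: bytes) -> str:
    """Insert the content fingerprint before the file extension."""
    stem, dot, ext = name.rpartition(".")
    digest = content_hash(content)
    return f"{stem}.{digest}.{ext}" if dot else f"{name}.{digest}"


def needs_rebuild(source: bytes, last_built_hash: str) -> bool:
    """Answer "is this the same as before?" without inspecting the content."""
    return content_hash(source) != last_built_hash


v1 = b"console.log('v1');"
v2 = b"console.log('v2');"

# The filename changes only when the content does, so caches stay correct.
print(hashed_filename("app.js", v1) != hashed_filename("app.js", v2))  # True

recorded = content_hash(v1)
print(needs_rebuild(v1, recorded))  # False: input unchanged, skip the step
print(needs_rebuild(v2, recorded))  # True: input changed, rerun the step
```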

What Encryption Is Actually For

Encryption is for when you need the original data back.

Storing a user's saved credit card for future purchases? Encrypt it — you'll need to retrieve it later to charge them. Storing their address for shipping? Encrypt it. Sending a message between two people who need to read it? Encrypt it.

Storing a password? Do not encrypt it. If your system can decrypt a user's password, so can an attacker who compromises your key store. And you don't need to decrypt it — you just need to verify it. Hash it instead.

API tokens are an interesting edge case. Some systems hash them (store the hash, never the raw token), which is correct for the same reason as passwords — you only need to verify, not retrieve. Others encrypt them so they can be displayed back to the user later. Both can be reasonable depending on your requirements, but you need to consciously choose based on whether retrieval is actually necessary.
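A sketch of the hash-and-verify variant, assuming tokens are generated by the server with high entropy. Because such a token cannot be guessed from a wordlist the way a human-chosen password can, a single fast hash like SHA-256 is a common choice here; that design choice and the function names are assumptions of this example.

```python
import hashlib
import hmac
import secrets


def issue_token() -> tuple[str, str]:
    """Return (token, stored_hash). Show the token once; persist only the hash."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
    stored_hash = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return token, stored_hash


def verify_token(presented: str, stored_hash: str) -> bool:
    """Hash the presented token and compare it to the stored hash."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)


token, stored = issue_token()
print(verify_token(token, stored))          # True
print(verify_token("not-the-token", stored))  # False
```

The trade-off is visible in the return value: once the raw token leaves issue_token, the system has no way to show it again. If users must be able to view a token later, this approach does not fit and encryption is the option to consider.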

The Language Your Coworkers Use Matters

This isn't purely academic. The words people use in code comments, Slack messages, and architecture documents shape how the next developer thinks about a system.

When someone writes encryptPassword() as a function name and it's actually hashing, the next developer who reads that code may assume it's reversible. They might build a "forgot password" flow that tries to decrypt rather than reset. They might expose an admin endpoint to "view encrypted passwords" not understanding that what they've built is a bug, not a feature.

Naming matters. Call your hashing functions hashPassword(), computeChecksum(), deriveFingerprint(). Call your encryption functions encryptField(), sealPayload(). The distinction in the name carries the distinction in the semantics, and it prevents a category of security bugs that stem purely from imprecision.

A Note on "Encoding" While We're Here

While we're busting myths, let's briefly address the third term that gets conflated: encoding. Base64 is not encryption. Rot13 is not encryption. URL encoding is not encryption. These are lossless, reversible transformations with no secret — anyone can decode them. I've seen "Base64 encoded passwords" treated as secure, which they absolutely are not.
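To see how little protection encoding offers, here is a short sketch using only the standard library. No key appears anywhere, because none exists.

```python
import base64
import codecs

secret = "hunter2"

# Base64: anyone can encode, and anyone can decode.
encoded = base64.b64encode(secret.encode("utf-8")).decode("ascii")
print(encoded)  # aHVudGVyMg==
print(base64.b64decode(encoded).decode("utf-8"))  # hunter2

# ROT13: a fixed letter rotation, undone by applying it again.
rotated = codecs.encode(secret, "rot13")
print(rotated)  # uhagre2
print(codecs.encode(rotated, "rot13"))  # hunter2
```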

Encoding transforms data for compatibility or transport. Encryption transforms data for confidentiality. Hashing transforms data for verification. They answer completely different questions.

Practical Checklist

Before you write another line of security-sensitive code, ask yourself:

Do I need to get the original value back? If yes: encryption. If no: hashing.
Am I storing a password? Hash it with bcrypt or Argon2. Never encrypt it. Never MD5 it.
Am I verifying a file or content hasn't changed? Hash it with SHA-256 or better.
Am I storing something sensitive I'll need to retrieve (card number, SSN, address)? Encrypt it with a well-maintained library or managed key service (libsodium, AWS KMS, or similar), not a homebrew AES implementation.
Does my function name reflect what it actually does? If it hashes, call it a hash function.

The Real Danger Isn't Ignorance, It's Confident Ignorance

Most developers who write // encrypt with MD5 aren't inexperienced — they've just internalized a sloppy mental model and never had reason to examine it. The code works. Tests pass. The feature ships. The breach happens eighteen months later when someone dumps the database and cracks 70% of the passwords in a weekend.

Understanding that hashing is irreversible, that this is a feature not a limitation, and that encryption serves a different job entirely — this is the kind of knowledge that doesn't just make you a better developer. It makes the systems you build genuinely safer for the people who trust them.

The myth is easy to hold and easy to lose. All it takes is understanding the question each primitive is actually answering.