security - Fundamental difference between Hashing and Encryption algorithms -

i see lot of confusion between hashes , encryption algorithms , hear more expert advice about:

when use hashes vs encryptions
what makes hash or encryption algorithm different (from theoretical/mathematical level) i.e. makes hashes irreversible (without aid of rainbow tree)

here similar questions didn't go detail looking for:

what difference between obfuscation, hashing, , encryption?
difference between encryption , hashing

well, in wikipedia... since want explanation, i'll best here:

hash functions

they provide mapping between arbitrary length input, , (usually) fixed length (or smaller length) output. can simple crc32, full blown cryptographic hash function such md5 or sha1/2/256/512. point there's one-way mapping going on. it's many:1 mapping (meaning there collisions) since every function produces smaller output it's capable of inputting (if feed every possible 1mb file md5, you'll ton of collisions).

the reason hard (or impossible in practicality) reverse because of how work internally. cryptographic hash functions iterate on input set many times produce output. if @ each fixed length chunk of input (which algorithm dependent), hash function call current state. iterate on state , change new 1 , use feedback (md5 64 times each 512bit chunk of data). somehow combines resultant states these iterations form resultant hash.

now, if wanted decode hash, you'd first need figure out how split given hash iterated states (1 possibility inputs smaller size of chunk of data, many larger inputs). you'd need reverse iteration each state. now, explain why hard, imagine trying deduce a , b following formula: 10 = + b. there 10 positive combinations of a , b can work. loop on bunch of times: tmp = + b; = b; b = tmp. 64 iterations, you'd have on 10^64 possibilities try. , that's simple addition state preserved iteration iteration. real hash functions lot more 1 operation (md5 15 operations on 4 state variables). , since next iteration depends on state of previous , previous destroyed in creating current state, it's impossible determine input state led given output state (for each iteration no less). combine that, large number of possibilities involved, , decoding md5 take near infinite (but not infinite) amount of resources. many resources it's cheaper brute-force hash if have idea of size of input (for smaller inputs) try decode hash.

encryption functions

they provide 1:1 mapping between arbitrary length input , output. , reversible. important thing note it's reversible using method. , it's 1:1 given key. now, there multiple input:key pairs might generate same output (in fact there are, depending on encryption function). encrypted data indistinguishable random noise. different hash output of consistent format.

use cases

use hash function when want compare value can't store plain representation (for number of reasons). passwords should fit use-case since don't want store them plain-text security reasons (and shouldn't). if wanted check filesystem pirated music files? impractical store 3 mb per music file. instead, take hash of file, , store (md5 store 16 bytes instead of 3mb). way, hash each file , compare stored database of hashes (this doesn't work in practice because of re-encoding, changing file headers, etc, it's example use-case).

use hash function when you're checking validity of input data. that's designed for. if have 2 pieces of input, , want check see if same, run both through hash function. probability of collision astronomically low small input sizes (assuming hash function). that's why it's recommended passwords. passwords 32 characters, md5 has 4 times output space. sha1 has 6 times output space (approximately). sha512 has 16 times output space. don't care password was, care if it's same 1 stored. that's why should use hashes passwords.

use encryption whenever need input data out. notice word need. if you're storing credit card numbers, need them out @ point, don't want store them plain text. instead, store encrypted version , keep key safe possible.

hash functions great signing data. example, if you're using hmac, sign piece of data taking hash of data concatenated known not transmitted value (a secret value). so, send plain-text , hmac hash. then, receiver hashes submitted data known value , checks see if matches transmitted hmac. if it's same, know wasn't tampered party without secret value. commonly used in secure cookie systems http frameworks, in message transmission of data on http want assurance of integrity in data.

a note on hashes passwords:

a key feature of cryptographic hash functions should fast create, , very difficult/slow reverse (so it's practically impossible). poses problem passwords. if store sha512(password), you're not doing thing guard against rainbow tables or brute force attacks. remember, hash function designed speed. it's trivial attacker run dictionary through hash function , test each result.

adding salt helps matters since adds bit of unknown data hash. instead of finding matches md5(foo), need find when added known salt produces md5(foo.salt) (which harder do). still doesn't solve speed problem since if know salt it's matter of running dictionary through.

so, there ways of dealing this. 1 popular method called key strengthening (or key stretching). basically, iterate on hash many times (thousands usually). 2 things. first, slows down runtime of hashing algorithm significantly. second, if implemented right (passing input , salt in on each iteration) increases entropy (available space) output, reducing chances of collisions. trivial implementation is:

var hash = password + salt; (var = 0; < 5000; i++) {     hash = sha512(hash + password + salt); }

there other, more standard implementations such pbkdf2, bcrypt. technique used quite few security related systems (such pgp, wpa, apache , openssl).

the bottom line, hash(password) not enough. hash(password + salt) better, still not enough... use stretched hash mechanism produce password hashes...

another note on trivial stretching

do not under circumstances feed output of 1 hash directly hash function:

hash = sha512(password + salt);  (i = 0; < 1000; i++) {     hash = sha512(hash); // <-- not this! }

the reason has collisions. remember hash functions have collisions because possible output space (the number of possible outputs) smaller input space. see why, let's @ happens. preface this, let's make assumption there's 0.001% chance of collision sha1() (it's much lower in reality, demonstration purposes).

hash1 = sha1(password + salt);

now, hash1 has probability of collision of 0.001%. when next hash2 = sha1(hash1);, all collisions of hash1 automatically become collisions of hash2. now, have hash1's rate @ 0.001%, , 2nd sha1() call adds that. now, hash2 has probability of collision of 0.002%. that's twice many chances! each iteration add 0.001% chance of collision result. so, 1000 iterations, chance of collision jumped trivial 0.001% 1%. now, degradation linear, , real probabilities far smaller, effect same (an estimation of chance of single collision md5 1/(2¹²⁸) or 1/(3x10³⁸). while seems small, the birthday attack it's not small seems).

instead, re-appending salt , password each time, you're re-introducing data hash function. collisions of particular round no longer collisions of next round. so:

hash = sha512(password + salt); (i = 0; < 1000; i++) {     hash = sha512(hash + password + salt); }

has same chance of collision native sha512 function. want. use instead.

Search This Blog

Aleternatvie