Hashes.org Forum
Some observations on the Myspace dump

Some observations on the Myspace dump - frekvent - 07-04-2016

Now that the Myspace dump is public, it might be interesting to add
the unsalted SHA-1 hashes (116.8 million unique) to the public "leaked
lists". The full dump is available as a torrent here:

https://myspace.thecthulhu.com (15GiB)

Additionally, I have uploaded a file with just the unique unsalted sha1 hashes here:

https://transfer.sh/lGA95/myspace-unsalted-sha1.txt.xz (2.1 GiB)

SHA1: 3c4da283e594773070404b646940fe14933668dd

Once unrared, the file is a 33 GiB text file with 360213049 rows. Here are 10 lines selected at random:


The format of each record is:
id : email : id/username : sha1(strtolower(substr($pass, 0, 10))) : sha1($id . $pass)
  • Field 1 is an integer.
  • Field 2 should be an email address but can contain any junk including unescaped newlines and colons. This has to be taken into account when parsing the data.
  • Field 3 is either a user id identical to field 1 or a username.
  • Field 4 is a sha1 hash of the password. The password was converted to lowercase and truncated to 10 characters before hashing.
  • Field 5 is a salted sha1 hash of the password. The salt is the user id from field 1, prepended to the password. Unlike field 4, the password doesn't appear to have been lowercased and truncated before hashing.
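To make the two hash columns concrete, here is a minimal sketch that computes the field 4 and field 5 values the dump would hold for a given id and password. The function names are hypothetical, it uses coreutils `sha1sum` instead of `openssl sha1`, and it assumes ASCII passwords (here `cut` and `tr` operate on bytes, and PHP's `strtolower` behaviour on non-ASCII input is not modelled).

```shell
# Hypothetical helpers (to_dump_hex, myspace_hashes). Assumes ASCII
# passwords and coreutils sha1sum; not the code Myspace actually ran.

# Read data on stdin and print its SHA-1 as 0x + uppercase hex, like the dump.
to_dump_hex() {
  sha1sum | cut -d' ' -f1 | tr 'a-f' 'A-F' | sed 's/^/0x/'
}

# myspace_hashes ID PASSWORD -> prints "field4:field5"
myspace_hashes() {
  # Field 4: first 10 characters of the password, lowercased, unsalted
  short=$(printf '%s' "$2" | cut -c1-10 | tr '[:upper:]' '[:lower:]')
  f4=$(printf '%s' "$short" | to_dump_hex)
  # Field 5: the id followed by the full, unmodified password
  f5=$(printf '%s%s' "$1" "$2" | to_dump_hex)
  printf '%s:%s\n' "$f4" "$f5"
}

# Usage: myspace_hashes 42 123456
# The first value printed is 0x7C4A8D09CA3762AF61E59520943DC26494F8941B.
```

Because field 4 only depends on a lowercased 10-character prefix, many different passwords collapse onto the same field 4 hash, while field 5 still distinguishes them.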
Counting hashes
Each record that has a hash in field 5 also has a hash in field 4, but the converse is not true. Some records have no hashes at all. In total, 359006286 records have at least one password hash.

$ tr -d '\r' <Myspace.com.txt | grep -E "'':0x[A-F0-9]{40}$" | wc -l
$ tr -d '\r' <Myspace.com.txt | grep -E ":0x[A-F0-9]{40}:''$" | wc -l
$ tr -d '\r' <Myspace.com.txt | grep "'':''$" | wc -l
$ tr -d '\r' <Myspace.com.txt | grep -E '(:0x[A-F0-9]{40}){2}$' | wc -l
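The four greps above each read the whole 33 GiB file. A single awk pass can tally all four combinations at once. This is a minimal sketch with a hypothetical function name, assuming (like the greps) that each record ends in `:<field 4>:<field 5>` with each field either `0x` plus 40 hex digits or `''`. It checks the length explicitly instead of using `{40}`, because some awk implementations (older mawk, for example) do not support interval expressions.

```shell
# Hypothetical single-pass tally (count_myspace_hashes is an illustrative
# name). Fields are counted from the right because field 2 may contain
# colons; only the last physical line of a record ends in fields 4 and 5.
count_myspace_hashes() {
  tr -d '\r' | awk -F: -v q="''" '
    # "hash" for 0x + 40 hex digits, "empty" for '\'''\'', "" otherwise
    function kind(s) {
      if (length(s) == 42 && s ~ /^0x[0-9A-F]+$/) return "hash"
      if (s == q) return "empty"
      return ""
    }
    NF >= 4 && kind($(NF-1)) != "" && kind($NF) != "" {
      count[kind($(NF-1)) "/" kind($NF)]++
    }
    END { for (k in count) print k, count[k] }'
}

# Usage: count_myspace_hashes < Myspace.com.txt
# Output lines look like "field4kind/field5kind count", in no fixed order.
```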

Recovering salted passwords
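The password behind field 4 is a lowercased and truncated version of the password behind field 5, so a cracked field 4 gives a strong head start on field 5. If the password is shorter than 10 characters, only the letter case is unknown and recovering the full password is trivial. For longer passwords, the first 10 characters are known up to case and only the remainder has to be guessed.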
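For a password shorter than 10 characters, trying every case variant of the field 4 plaintext against field 5 recovers it. In practice hashcat does this far faster; the sketch below is a slow pure-shell illustration of the same search, with hypothetical helper names (`case_variants`, `recover_case`), assuming an ASCII password and coreutils `sha1sum`.

```shell
# Hypothetical helpers; assumes an ASCII password shorter than 10
# characters, so the field 4 plaintext differs from it only in case.

# Print every upper/lower-case variant of $1 (2^k lines for k letters).
case_variants() {
  if [ -z "$1" ]; then
    echo
    return 0
  fi
  rest=${1#?}              # everything after the first character
  first=${1%"$rest"}       # the first character
  upper=$(printf '%s' "$first" | tr '[:lower:]' '[:upper:]')
  # In bash and dash the recursive call runs in a pipeline subshell,
  # so it cannot overwrite first/upper of this invocation.
  case_variants "$rest" | while IFS= read -r tail; do
    printf '%s%s\n' "$first" "$tail"
    if [ "$upper" != "$first" ]; then
      printf '%s%s\n' "$upper" "$tail"
    fi
  done
}

# recover_case ID PLAIN FIELD5: print the variant of PLAIN whose
# sha1(ID . variant) equals FIELD5 (0x-prefixed hex, any case).
recover_case() {
  target=$(printf '%s' "$3" | sed 's/^0x//' | tr '[:upper:]' '[:lower:]')
  case_variants "$2" | while IFS= read -r cand; do
    h=$(printf '%s%s' "$1" "$cand" | sha1sum | cut -d' ' -f1)
    if [ "$h" = "$target" ]; then
      printf '%s\n' "$cand"
      break
    fi
  done
}
```

A plaintext without letters, such as 123456, has exactly one variant, which is why the hashcat run on those hashes only needs a single candidate.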

$ echo -n 123456 | openssl sha1 | sed 's/.* //;y/abcdef/ABCDEF/;s/^/0x/'
0x7C4A8D09CA3762AF61E59520943DC26494F8941B
$ grep -F 0x7C4A8D09CA3762AF61E59520943DC26494F8941B:0x Myspace.com.txt |
tr -d '\r' |
awk -F: '{ print $(NF) ":" $1 }' |
sed 's/..//' |
tr A-F a-f > test.hash
$ wc -l test.hash
269356 test.hash
$ hashcat -m 120 test.hash -a3 123456 -o /dev/null
Initializing hashcat v2.00 with 4 threads and 32mb segment-size...

Added hashes from file test.hash: 269356 (269356 salts)

All hashes have been recovered

Input.Mode: Mask (123456) [6]
Index.....: 0/1 (segment), 1 (words), 0 (bytes)
Recovered.: 269356/269356 hashes, 269356/269356 salts
Speed/sec.: - plains, - words
Progress..: 1/1 (100.00%)
Running...: 00:00:00:05
Estimated.: --:--:--:--
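Hashcat mode 120 is sha1($salt.$pass), which matches field 5's sha1($id . $pass) with the id as salt. To double-check what that run tests, the `hash:salt` lines in test.hash can also be verified directly. This is a minimal sketch with a hypothetical helper name (`mode120_matches`, not a hashcat feature), using coreutils `sha1sum` instead of `openssl sha1`.

```shell
# Hypothetical helper. Reads "hash:salt" lines (lowercase hex hash, as
# written to test.hash) on stdin and prints every line for which
# sha1(salt . candidate) equals the hash.
mode120_matches() {
  cand=$1
  while IFS=: read -r hash salt; do
    h=$(printf '%s%s' "$salt" "$cand" | sha1sum | cut -d' ' -f1)
    if [ "$h" = "$hash" ]; then
      printf '%s:%s\n' "$hash" "$salt"
    fi
  done
}

# Usage: mode120_matches 123456 < test.hash | wc -l
```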

Extracting hashes from the dump
To extract the unsalted hashes in field 4 I use Awk. Because field 2 can contain colons and newlines, fields have to be counted from the end of the line, and only the last physical line of each record ends in the two hash fields.

$ tr -d '\r' <Myspace.com.txt |
awk -F: 'length($(NF-1)) == 42 && $(NF-1) ~ /^0x[A-F0-9]+$/ { print substr($(NF-1), 3) }' |
tr A-F a-f |
sort -u > myspace-unsalted-sha1.txt
$ wc -l myspace-unsalted-sha1.txt
116825318 myspace-unsalted-sha1.txt
$ du -h myspace-unsalted-sha1.txt
4.5G    myspace-unsalted-sha1.txt
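When the whole record is needed, for example to pair field 5 with the id in field 1, the lines split by newlines in field 2 can first be joined back together. This is a minimal sketch with hypothetical helper names, assuming every record ends in `:<field 4>:<field 5>` where each field is either `0x` plus 40 uppercase hex digits or `''`; the embedded newline is replaced by the two characters `\n`.

```shell
# Hypothetical helpers (reassemble_records, unsalted_hashes).

# Join physical lines until the buffer ends in the two hash fields, so a
# field 2 containing newlines stays inside a single record.
reassemble_records() {
  tr -d '\r' | awk -v q="''" '
    { buf = (buf == "" ? $0 : buf "\\n" $0) }
    buf ~ (":(0x[0-9A-F]+|" q "):(0x[0-9A-F]+|" q ")$") { print buf; buf = "" }'
}

# Print field 4 as lowercase hex, counting fields from the right so that
# colons inside field 2 do not shift it; records with '' are skipped.
unsalted_hashes() {
  reassemble_records | awk -F: '
    length($(NF-1)) == 42 && $(NF-1) ~ /^0x[0-9A-F]+$/ {
      print tolower(substr($(NF-1), 3))
    }'
}

# Usage: unsalted_hashes < Myspace.com.txt | sort -u > myspace-unsalted-sha1.txt
```

For field 4 alone, filtering on the last physical line is enough; reassembly matters when `$1` must be the real id, since on a split record the last physical line starts with part of the email instead.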

Here is the result

That's it for now. The purpose of this thread is to discuss the Myspace dump.

Re: Some observations on the Myspace dump - s3in!c - 07-07-2016

Those were some really good observations. As soon as I saw your post, we (CynoSure Prime) tried to use this information to recover the real passwords for the hashes where this is possible.
There is a more detailed write-up here: http://cynosureprime.blogspot.ch/2016/07/myspace-hashes-length-10-and-beyond.html

About your question of whether the hashes will be added to Hashes.org:
Currently the list is just too big to be handled well by the database, and storage space for it is limited. Also, since the plaintexts of the unsalted hashes are not 'real' passwords (they are lowercased and truncated to 10 characters), I think there are better hashlists to import.