Targeted Bruteforcing – Mining patterns in passwords

Hardly a day passes without a new password breach, and a lot of analysis has already been done on patterns in leaked passwords. I thought I would give it a try with a fairly recent breach, that of 000webhost. Its plain-text passwords were breached in 2015, and I could not find much analysis of them. Mining patterns in real-world data is useful to both security researchers and hackers, so in this post, let’s do it.

Targeted Bruteforcing – Mining patterns to make brute forcing easy

One thing to note before we start the analysis: all the passwords in this set have a length of 8 or more. There are a total of 12 million such passwords, which seems a good amount to perform analysis on.

The first thing to do with any password set is to get the top 50 passwords. In our case, the top 50 are not the same as those we see in every other dataset; there are variations. One reason could be that people are getting better at avoiding passwords like 12345678, but they are still very poor at choosing their passwords. The following image says it all.

[Image: top 50 passwords]

Here’s the list in the text form.
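The top-50 count described above can be sketched in a few lines of Python. This is only a minimal illustration, not the script used for this post; the function name `top_passwords` and the sample list are made up, and in practice the passwords would be read from the leaked file, one per line.

```python
from collections import Counter

def top_passwords(passwords, n=50):
    """Return the n most common passwords with their counts."""
    return Counter(passwords).most_common(n)

# Tiny made-up sample, only for illustration; not data from the leak.
sample = [
    "abc12345", "password1", "abc12345",
    "qwerty123", "abc12345", "password1",
]

for password, count in top_passwords(sample, n=2):
    print(password, count)
```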

Let’s move on to something better, i.e. pattern mining. I could not find a comprehensive pattern analysis of the passwords in this leak, which was the main reason for doing this one. The patterns identified are interesting, and they can contribute to targeted brute forcing, which is useful to security people as well as hackers. Let’s look at the top 50 patterns.

Here, A stands for an uppercase letter, a for a lowercase letter, D for a digit and S for a special character.
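To make the notation concrete, here is a rough sketch of how a password could be turned into such a pattern and how the most common patterns could be counted. The names `to_pattern` and `top_patterns` are hypothetical, and treating every character that is not an ASCII letter or digit as "special" is my assumption, not something taken from the original analysis.

```python
from collections import Counter
import string

def to_pattern(password):
    """Map each character to A (uppercase), a (lowercase), D (digit) or S (other)."""
    symbols = []
    for ch in password:
        if ch in string.ascii_uppercase:
            symbols.append("A")
        elif ch in string.ascii_lowercase:
            symbols.append("a")
        elif ch in string.digits:
            symbols.append("D")
        else:
            # Assumption: anything else counts as a special character.
            symbols.append("S")
    return "".join(symbols)

def top_patterns(passwords, n=50):
    """Return the n most common patterns with their counts."""
    return Counter(to_pattern(p) for p in passwords).most_common(n)

print(to_pattern("Fluffy123"))   # AaaaaaDDD
print(to_pattern("pass@word1"))  # aaaaSaaaaD
```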

One interesting thing to note was how much of the total the top 100 patterns cover. It was astonishing to see that almost 60% of all the passwords fall into just the top 100 patterns. This is amazing for the brute forcing guys, right? Here’s a graphical representation.

[Image: percentage of total passwords covered by the top patterns]
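The coverage figure is simply the number of passwords that match one of the top n patterns divided by the total number of passwords. A small sketch of that calculation follows; the helper names and the sample data are invented for illustration, and the character classes are the same assumption as in the notation above (anything that is not an ASCII letter or digit is S).

```python
from collections import Counter
import string

def to_pattern(password):
    """A = uppercase, a = lowercase, D = digit, S = anything else (assumed)."""
    def classify(ch):
        if ch in string.ascii_uppercase:
            return "A"
        if ch in string.ascii_lowercase:
            return "a"
        if ch in string.digits:
            return "D"
        return "S"
    return "".join(classify(ch) for ch in password)

def coverage_of_top_patterns(passwords, n=100):
    """Fraction of passwords whose pattern is among the n most common patterns."""
    counts = Counter(to_pattern(p) for p in passwords)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in counts.most_common(n))
    return covered / total

# Made-up sample: two passwords share the pattern aaaaDDDD, the others are unique.
sample = ["abcd1234", "wxyz5678", "Password1", "hello123!"]
print(coverage_of_top_patterns(sample, n=1))  # 0.5
```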

Another analysis that I think is essential is letter frequency. In ordinary English text, e is usually cited as the most frequent letter, followed by letters such as t and a, but in our case the letter ‘a’ was used most frequently, far more than the others. Here’s a list of the top 10 letters.
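A letter count like this can be sketched as follows. I am assuming here that uppercase and lowercase letters are counted together and that digits and special characters are ignored; the post does not say how the original count handled case. The function name and sample are hypothetical.

```python
from collections import Counter
import string

def letter_frequency(passwords, n=10):
    """Return the n most common ASCII letters across all passwords, ignoring case."""
    counts = Counter(
        ch
        for password in passwords
        for ch in password.lower()
        if ch in string.ascii_lowercase  # skip digits and special characters
    )
    return counts.most_common(n)

# Made-up sample, only for illustration.
sample = ["banana12", "Alabama99"]
print(letter_frequency(sample, n=1))  # [('a', 7)]
```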

 

That’s it. This was a brief but important analysis that had to be done. I hope it was informative. I believe it can help hackers in mounting such attacks and security people in preventing them.

You can find the data set on Google. Let me know what you think.

 

 


5 Comments

  1. Jack Blanchovik

    It’s actually a pretty good method if you’ve seen someone’s past breaches. I had a guy whose passwords always started with “Fluffy” and only ever used the numbers 1, 2, and 3 [never any other numbers], so I could make a hybrid mask using Fluffy as the base and DDD at the end (so Fluffy?d?d?d), with an incremented min and max of 6 to 9.
