Search for sensitive data in GitHub repositories

Developers generally like to share their code, and many of them do so by open sourcing it on GitHub.

From Wikipedia:

GitHub is a web-based Git or version control repository and Internet hosting service. It is mostly used for code. It offers all of the distributed version control and source code management (SCM) functionality of Git as well as adding its own features. It provides access control and several collaboration features such as bug tracking, feature requests, task management, and wikis for every project.

Many companies also use GitHub as a convenient place to host both private and public code repositories.

However, employees sometimes accidentally publish source files that contain sensitive information, such as API keys or database credentials.


Let’s Go Dorking!

I’ve already talked about “dorks” in the context of the well-known “Google Dorking”:

In 2002, Johnny Long began collecting interesting Google search queries that uncovered vulnerable systems or sensitive information, and called them “Google dorks”.

“Google Dorking” is the method of finding vulnerable targets with Google dorks in order to obtain usernames and passwords, email lists, sensitive documents, and website vulnerabilities.

Similar to Google Dorking, GitHub Dorking uses specific search keys to find sensitive information in public repositories.
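To make the idea concrete, here is a minimal sketch of how such a search string could be assembled and turned into a GitHub search URL. The helper names (build_dork, search_url) and the organization example-org are hypothetical, and the qualifiers org: and filename: are assumptions about GitHub's search syntax, which has changed over time, so check the current search documentation before relying on them.

```python
from urllib.parse import urlencode


def build_dork(keywords, **qualifiers):
    """Join free-text keywords and qualifier:value pairs into one search string.

    The qualifier names are passed through as-is; whether GitHub accepts
    them depends on the search syntax currently in use (assumption).
    """
    parts = list(keywords)
    for name, value in qualifiers.items():
        parts.append(f"{name}:{value}")
    return " ".join(parts)


def search_url(query):
    """Build a GitHub code-search URL for the given query string."""
    return "https://github.com/search?" + urlencode({"q": query, "type": "code"})


# Hypothetical example: look for the word "password" in .env files
# belonging to an organization called "example-org".
dork = build_dork(["password"], org="example-org", filename=".env")
print(dork)
print(search_url(dork))
```

The sketch only builds the URL and sends no request, so the result can be pasted into a browser and reviewed by hand.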

Here is a list (continuously updated):