With the rise of open source, more and more public repositories are being hosted on GitHub. In fact, back in 2018 GitHub celebrated 100 million live repositories, and the number has only grown since. However, with such easy access to version control and open source, it’s important to make sure sensitive credentials and authentication tokens aren’t exposed to the public.
Let’s say I’m writing an application that takes advantage of data from an API call. For example, I could be targeting weather data from OpenWeatherMap:
GET "https://api.openweathermap.org/data/2.5/weather?id=5128581&appid={YOUR API KEY}"
As this API call is prepared, it’s not uncommon to store your API key (your secret) in a variable in the same file. After all, it’s quick and easy during testing.
Here’s an example using Python and a fake API key, although the approach in this post applies to any language:
# app.py
api_key = "12abc3d45ef6789012345g6789h0ij12"
city_id = "5128581"
base_url = "https://api.openweathermap.org/data/2.5/weather?"
final_url = base_url + "appid=" + api_key + "&id=" + city_id
So, what’s the issue here? The problem is that when this code is pushed to a public GitHub repo, it’s now exposing the secret API key to the world. This is your private access token and should never be exposed outside of privileged users in your organization.
Some additional issues caused by this exposure:
So, as you can see, it’s very important to make sure credentials, access tokens, API keys, etc. are all secured in code before being pushed to GitHub (or any other public-facing version control or code repo).
How can we solve this problem? Well, the solution is simple. We just need a config file that stores our API keys (and other sensitive credentials), is included in other code files when necessary, and is ignored by version control (e.g., via .gitignore). This allows your application to function as expected, while preventing any sensitive credentials from being pushed to GitHub.
Revisiting our earlier example, we can create a second code file named config.py and include that in any code that needs access to our API key:
# config.py
api_key = "12abc3d45ef6789012345g6789h0ij12"

# app.py
import config
city_id = "5128581"
base_url = "https://api.openweathermap.org/data/2.5/weather?"
final_url = base_url + "appid=" + config.api_key + "&id=" + city_id
The only change to app.py is that it now imports config and reads the API key from config.api_key rather than assigning it to a variable directly in app.py.
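String concatenation works for this small example, but the query string can also be assembled with the standard library's urllib.parse.urlencode, which escapes parameter values for you. The following is only a sketch: the function name build_weather_url is hypothetical, and the key is assumed to be passed in from config.api_key as described above.

```python
from urllib.parse import urlencode

# Endpoint taken from the example above, without the trailing "?".
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_weather_url(api_key, city_id):
    """Return the request URL with the key and city ID as query parameters."""
    # urlencode escapes each value and joins the pairs with "&".
    query = urlencode({"appid": api_key, "id": city_id})
    return f"{BASE_URL}?{query}"
```

In app.py this would be called as build_weather_url(config.api_key, city_id), so the key itself still lives only in config.py.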
Now we can prevent config.py from being pushed to GitHub by adding it to our .gitignore file:
# .gitignore
config.py
When we push to GitHub, only app.py and .gitignore will be uploaded to the public repo. config.py, and all sensitive information contained in it, will not end up on GitHub. If sensitive data has been pushed to GitHub in the past, then GitHub has a guide for removing that data from a repository.
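One side effect of keeping config.py out of the repo is that anyone who clones it will not have that file, and a bare import config will fail. A minimal sketch of one way to handle that, assuming a hypothetical helper named load_api_key, is to wrap the import and raise a clearer error that tells the user to create the file locally:

```python
import importlib


def load_api_key(module_name="config"):
    """Import the untracked config module and return its api_key value."""
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Only translate the error when the config module itself is missing;
        # re-raise if config.py exists but one of its own imports failed.
        if exc.name != module_name:
            raise
        raise RuntimeError(
            f"{module_name}.py not found. Create it locally with a line such as "
            'api_key = "<your key>"; the file is excluded from version control.'
        ) from exc
    return module.api_key
```

The rest of app.py can then use the returned value exactly as it used config.api_key.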
This is a simple solution, but a powerful best practice any time you’re dealing with credentials. In fact, I like to always apply this principle and never assign secure credentials to variables unless the file is excluded from version control.
Depending on your organizational standards, you may also want to apply this to self-hosted version control. In general, it’s a good idea to know exactly where sensitive credentials are stored, and pushing them to version control is often not a secure practice regardless of where version control is hosted (local, cloud, etc.).
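To get a rough idea of where hard-coded credentials might already live in a codebase, a simple pattern check can be run over source files before they are committed. This is only an illustrative sketch: the pattern and the function name find_hardcoded_secrets are assumptions, it only flags quoted literals of 16 or more characters assigned to names containing "key", "token", "secret", or "password", and it will miss secrets stored any other way.

```python
import re

# Hypothetical heuristic: a variable whose name mentions key/token/secret/password
# is assigned a quoted string literal that is at least 16 characters long.
SECRET_PATTERN = re.compile(
    r"""\b(\w*(?:key|token|secret|password)\w*)\s*=\s*["']([^"']{16,})["']""",
    re.IGNORECASE,
)


def find_hardcoded_secrets(source):
    """Return (line_number, variable_name) pairs for suspicious assignments."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        match = SECRET_PATTERN.search(line)
        if match:
            findings.append((line_number, match.group(1)))
    return findings
```

Run against the first version of app.py, this would flag the api_key assignment; the version that reads config.api_key contains no quoted literal on that line, so it would not be flagged.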
Stealthbits’ StealthAUDIT data access governance solution includes a sensitive data component that helps organizations identify where sensitive data is located, who has access to it, how it’s being accessed, and what they’re doing with it.
StealthAUDIT includes:
Host Discovery: Identify the different platforms within the network that may contain various unstructured and structured data repositories to ensure a comprehensive view of your organization’s sensitive data.
Sensitive Data Discovery: Analyze content for patterns or keywords that match built-in or customized criteria.
Remediation Actions: Automate all or portions of the tasks you need to perform to remediate sensitive data violations.
Learn more about Stealthbits’ Data Access Governance here.
Dan Piazza is a Technical Product Manager at Stealthbits Technologies, responsible for File Systems and Sensitive Data in StealthAUDIT. He has worked in technical roles since 2013, with a passion for cybersecurity, data protection, data storage, and automation. He has a Bachelor’s degree from Bryant University, and outside of tech he enjoys running, tennis, and snowboarding.
© 2021 Stealthbits Technologies, Inc.