Web Security Innovations: How Cloudflare Bot Challenge and Turnstile Protect Web Traffic
As Internet technologies keep evolving, securing web resources has become a priority for website owners and developers. Effective protection against bots and automated threats is now a necessity.
Cloudflare's solutions (Bot Challenge and Turnstile) stand out for their innovation and balance between user friendliness and reliable protection. Let's take a closer look at their operating mechanisms.
According to the developers, the main goal of creating these technologies is to mitigate attacks by malicious bots without affecting real users.
Cloudflare does not block every bot: it maintains its own list of allowed bots. Bot traffic that is not on this list is treated as malicious and will most likely be denied access to the protected page.
What are Bot Challenge and Turnstile? With Bot Challenge, the visitor is shown a check that must be passed before the page loads. With Turnstile, the check runs automatically on the visitor's behalf, on the principle of a subway turnstile: if your pass works, you go through; if it does not, you are refused and blacklisted.
Bot Challenge example (the options may be different):
Turnstile example:
How Cloudflare Detects Bots
The service uses passive (server-side) and active (client-side) bot detection methods.
Passive methods
Botnet detection
Cloudflare maintains a catalog of devices, IP addresses, and behaviors that are associated with malicious botnets. Any device suspected of belonging to one of these networks is either blocked automatically or given additional client-side challenges to solve.
IP Reputation
The reputation of a user's IP address is based on factors such as geolocation, ISP, and reputation history. For example, IP addresses belonging to a data center or a well-known VPN provider will have a worse reputation than a home IP address. A site may also restrict access from regions outside the territory it serves, since traffic from real clients is not expected to come from there.
HTTP Request Headers
Cloudflare uses HTTP request headers for verification. If you send a non-browser User Agent, your parser can easily be identified as a bot. The service can also block a client that sends a request without headers, or with headers that do not match what the declared User Agent would normally send.
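To make the idea concrete, here is a minimal sketch of what a header consistency check could look like on the server side. The rules, the function name header_red_flags, and the list of non-browser agents are assumptions chosen for illustration; they are not Cloudflare's actual rules.

```python
# Illustrative only: these rules are assumptions, not Cloudflare's real logic.
NON_BROWSER_AGENTS = ("python-requests", "curl", "wget", "scrapy")


def header_red_flags(headers: dict[str, str]) -> list[str]:
    """Return the reasons a set of request headers looks automated."""
    if not headers:
        return ["no headers"]

    # Header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    ua = h.get("user-agent", "")
    flags = []

    if not ua:
        flags.append("missing User-Agent")
    elif any(token in ua.lower() for token in NON_BROWSER_AGENTS):
        flags.append("non-browser User-Agent")

    # Modern Chrome sends client hints; a Chrome UA without them is a mismatch.
    if "chrome/" in ua.lower() and "sec-ch-ua" not in h:
        flags.append("Chrome User-Agent without sec-ch-ua")

    # Real browsers normally send a language preference.
    if ua and "accept-language" not in h:
        flags.append("missing Accept-Language")

    return flags
```

A request that only carries a library User Agent collects two flags at once, while a complete browser-like header set returns an empty list.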
TLS fingerprint
A TLS fingerprint is created when connecting to the server. The system analyzes cipher suites, extensions, and elliptic curves to calculate the fingerprint hash.
If the User Agent header from the client request matches the User Agent associated with the saved fingerprint hash, the security system assumes that the request came from a standard browser. If this data does not match, the request will be blocked.
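The article does not say which hashing scheme Cloudflare uses, so the sketch below borrows the publicly documented JA3 format as a stand-in: the TLS version, cipher suites, extensions, curves, and point formats are joined into one string and hashed with MD5. The lookup table known and the function matches_known_browser are hypothetical.

```python
import hashlib


def ja3_style_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """Hash ClientHello parameters in the JA3 layout: fields joined by ',', values by '-'."""
    groups = (ciphers, extensions, curves, point_formats)
    fields = [str(version)] + ["-".join(str(v) for v in group) for group in groups]
    return hashlib.md5(",".join(fields).encode("ascii")).hexdigest()


def matches_known_browser(fp_hash: str, user_agent: str, known: dict[str, str]) -> bool:
    """Check that the User Agent names the browser family stored for this fingerprint.

    `known` maps fingerprint hash -> browser family (a hypothetical table).
    """
    family = known.get(fp_hash)
    return family is not None and family.lower() in user_agent.lower()
```

Because the order of cipher suites and extensions is part of the hashed string, an HTTP library that offers the same ciphers in a different order than a browser produces a different fingerprint, which is why changing only the User Agent header is not enough to look like that browser.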
HTTP/2 fingerprint
As with TLS fingerprinting, each client produces a static HTTP/2 fingerprint. To determine the legitimacy of a request, Cloudflare checks that the fingerprint and User Agent pair from the request matches a pair from the whitelist stored in its database.
HTTP/2 and TLS fingerprinting work in almost the same way. Of all the passive bot detection methods Cloudflare uses, these two are technically the hardest to control from the request side, and they are also the most important.
Active methods
Canvas fingerprint
Canvas is an HTML5 API used to draw graphics and animations on a web page using JavaScript. To create a Canvas fingerprint, the web page asks your browser's Canvas API to render an image. The resulting image is then hashed to create a fingerprint.
Canvas fingerprint depends on several layers of the computing system, such as:
- Graphics Processing Unit - GPU.
- GPU driver, operating system, fonts, rendering algorithms.
- The browser's image processing mechanism - WebGL.
Since a change in any of these layers produces a different fingerprint, this method accurately distinguishes between device classes.
Cloudflare has a large dataset of legitimate Canvas + User Agent pairs. Using machine learning, they can detect tampering with device properties (such as the User Agent, operating system, or GPU) by identifying a discrepancy between your fingerprint and the expected one.
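The hashing step can be sketched as follows. In a real browser the pixel data would come from the Canvas API; here it is passed in as raw bytes. The set known_pairs is a hypothetical stand-in for the dataset of legitimate pairs, and a plain set lookup is used instead of the machine learning the article mentions.

```python
import hashlib


def canvas_fingerprint(pixels: bytes) -> str:
    """Hash the rendered image bytes into a fixed-length fingerprint."""
    return hashlib.sha256(pixels).hexdigest()


def is_expected_pair(fingerprint: str, ua_family: str,
                     known_pairs: set[tuple[str, str]]) -> bool:
    """Check the (fingerprint, User Agent family) pair against known legitimate pairs."""
    return (fingerprint, ua_family) in known_pairs
```

Two renderings that differ in a single byte, for example because of a different GPU driver or font, hash to unrelated values, so a spoofed User Agent ends up paired with a fingerprint that does not belong to it.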
Event Listening
Cloudflare uses JavaScript to attach event listeners (via addEventListener) to web pages. They record user actions such as mouse movements, mouse clicks, or keystrokes. If no such events occur, there is reason to believe that the user is a bot.
API queries
Some APIs are browser-specific: they exist in one browser but not in another.
For example, window.chrome is a property that only exists in Chrome. If the data collected from your browser shows this property is present, but the request arrives with a Firefox User Agent, it is obvious that something is wrong.
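A server-side version of this cross-check might look like the sketch below. It assumes the client script has already reported whether window.chrome exists; the function names and the User Agent parsing are simplified assumptions.

```python
def engine_from_user_agent(user_agent: str) -> str:
    """Very rough engine guess from the User Agent string (illustrative only)."""
    ua = user_agent.lower()
    if "firefox/" in ua:
        return "firefox"
    if "chrome/" in ua:
        return "chromium"
    return "unknown"


def is_consistent(user_agent: str, has_window_chrome: bool) -> bool:
    """Compare the declared browser with the reported presence of window.chrome."""
    engine = engine_from_user_agent(user_agent)
    if engine == "firefox":
        return not has_window_chrome
    if engine == "chromium":
        return has_window_chrome
    return True  # no opinion about browsers this sketch does not model
```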
Timestamp API
The service uses timestamp APIs such as Date.now() or window.performance.timing.navigationStart to measure how fast the user acts. If the timings do not correspond to a person's normal Internet activity, the user will be blocked.
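As a rough illustration, a timing check can be reduced to comparing two timestamps. The 1500 ms threshold and the function name are assumptions picked for the example, not values taken from Cloudflare.

```python
def is_humanly_plausible(navigation_start_ms: int, action_ms: int,
                         min_elapsed_ms: int = 1500) -> bool:
    """Return True if enough time passed between page load and the user action.

    Both timestamps are milliseconds since the epoch, as Date.now() reports them.
    A negative interval (tampered clock values) is rejected as well.
    """
    elapsed = action_ms - navigation_start_ms
    return elapsed >= min_elapsed_ms
```

A form submitted 40 ms after navigation started fails this check, while one submitted a few seconds later passes.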
Automatic browser detection
Cloudflare looks for properties that only exist in automated environments. For example, the presence of window.document.__selenium_unwrapped or window.callPhantom indicates the use of Selenium or PhantomJS, respectively. For obvious reasons, you will be blocked if this is discovered.
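The check itself is a simple lookup. The sketch below assumes a client script has reported which global property names it found; only the two markers named above are included, and the mapping and function name are illustrative.

```python
# Marker property -> tool it reveals (only the two examples from the text).
AUTOMATION_MARKERS = {
    "document.__selenium_unwrapped": "Selenium",
    "callPhantom": "PhantomJS",
}


def detect_automation(present_properties: set[str]) -> list[str]:
    """Return the automation tools implied by the reported property names."""
    tools = {tool for prop, tool in AUTOMATION_MARKERS.items()
             if prop in present_properties}
    return sorted(tools)
```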
SandBox detection
There are checks that detect emulated browser environments, such as JSDOM running in NodeJS. The script can look for the process object, which only exists in NodeJS. It is also possible to determine whether built-in functions have been modified by using Function.prototype.toString.call(functionName).
Cloudflare Turnstile
Cloudflare Turnstile is a smart CAPTCHA alternative. It can be embedded on any web resource without sending traffic through Cloudflare and without showing captcha to visitors.
Advantages of Turnstile:
- Intuitive and user-friendly interface: there is no need to decipher text or images.
- Easy integration with other Cloudflare services, which makes it attractive to Cloudflare users.
- Reliable protection against spam and cyber threats, combining security with user convenience.
Considering all of the above, it may seem that nothing can get past such protection. The situation is not that hopeless: we will look at various options for solving Cloudflare, starting with the most direct method.
Solve Cloudflare CDN by Calling the Origin Server
Cloudflare can only block requests passing through its network, so it would be good if we could send the request directly to the origin server. No protection between you and the data you need!
You need to complete two steps:
1. Find the origin IP address.
On well-protected sites, the DNS records will be hidden. But this is probably not the case everywhere: some unprotected subdomains, old services, or mail services may be accessible under the same domain name and still point to the origin server.
2. Request data from the origin server.
You've got the origin IP address, great! But now what do you do with it? You can try pasting it into your browser's address bar, but this may not work: servers are commonly configured to accept connections only for a valid domain name, not a bare IP address. And since resolving the domain name through DNS leads back to Cloudflare, the DNS lookup has to be avoided.
You can try a tool like curl, which lets you send a request to a target IP address while still presenting the domain name as the host. Another option is to use the hosts file (i.e. /etc/hosts): the request will then skip DNS and use the IP address that you set there manually.
All this sounds nice, but in many cases this method does not work, since in practice Cloudflare uses protective methods such as a waiting room.
What is a waiting room? It is a page on which your browser spends some time solving tasks to prove that you are not a robot. If you are marked as a bot, you will receive an "Access Denied" error. Otherwise, you will be automatically redirected to the real web page.
You will stay in the Cloudflare waiting room for a few seconds. The exact time depends on the security level of the target and on how well your parser passes the tests. Once the task has been completed, you will be able to browse the site for a while.
How do you solve the Cloudflare waiting room? Ideally, by solving the JavaScript tasks and proving that you are human. Another viable approach is to analyze Cloudflare's JavaScript task to understand the algorithm that creates the task and verifies the response, so that you can reproduce the script's logic.
The key factors when dealing with Bot Challenge and Turnstile are high-quality residential proxies and a carefully selected User Agent.
Considering all of the above, the easiest way is to hand the Cloudflare Bot Challenge and Turnstile over to a dedicated service such as CapMonster Cloud, which offers an effective solution to these types of protection several times cheaper than others.
Using ZennoPoster resources, you can implement the following Cloudflare Turnstile solution:
Using a regular expression, we pull out the required websiteKey, which, together with our API key, we pass with a POST request to CapMonster Cloud.
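The extraction step could be sketched like this. It assumes the page embeds the widget with a data-sitekey attribute, as implicitly rendered Turnstile widgets do; pages that render the widget from JavaScript will not match. The pattern and the function name are illustrative, and the key in the usage example is made up.

```python
import re
from typing import Optional

# Matches data-sitekey="..." or data-sitekey='...' (assumed markup).
SITEKEY_PATTERN = re.compile(r"""data-sitekey=["']([^"']+)["']""")


def extract_site_key(html: str) -> Optional[str]:
    """Return the first site key found in the page source, or None."""
    match = SITEKEY_PATTERN.search(html)
    return match.group(1) if match else None
```

For example, extract_site_key('<div class="cf-turnstile" data-sitekey="0xEXAMPLE"></div>') returns the string between the quotes, and a page without the attribute yields None.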
After a 5-7 second pause, we send a second request with a confirmed TaskID. If you receive a result with a ready-made token, enter and send it. If there is no token in the response, then we send a repeat request with the same parameters after a few seconds.
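The wait-and-retry logic described above can be sketched generically. The callable get_result stands for "ask the service for the result of this TaskID" and is a hypothetical placeholder: no real CapMonster Cloud endpoint or field name is shown here, so consult the service documentation for the actual request format. The delays and the attempt limit are assumptions.

```python
import time
from typing import Callable, Optional


def poll_for_token(get_result: Callable[[], Optional[str]],
                   first_delay: float = 5.0,
                   retry_delay: float = 3.0,
                   max_attempts: int = 10,
                   sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Wait, then ask for the result until a token arrives or attempts run out."""
    sleep(first_delay)  # initial pause before the first result request
    for attempt in range(max_attempts):
        token = get_result()
        if token:
            return token
        if attempt < max_attempts - 1:
            sleep(retry_delay)  # no token yet: repeat after a few seconds
    return None
```

Capping the number of attempts keeps the loop from running forever if the task never completes, and passing sleep as a parameter makes the function easy to test without real delays.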
Detailed documentation can be found by following the link. In addition, the developers offer a way to solve these types of protection with the CapMonster Cloud browser extension.
To sum up, as technology develops, and especially as artificial intelligence is introduced, protection will improve and make developers' tasks harder. But services like CapMonster Cloud will not stand still either and will develop their own responses to such technologies.
Note: We'd like to remind you that the product is used for automating testing on your own websites and on websites to which you have legal access.