JACI - Just Another Captcha Implementation

What is it?

Jaci is a form of captcha that involves matching a series of images to other images based on their content or subject. Currently Jaci uses images sourced from Google Images. The most common captcha in use on the Internet today is the distorted-word captcha, in which users must decipher a string of around 6 to 10 characters that are warped or otherwise degraded to make them harder for OCR software to read.

Try it out

Click to try the Jaci demo. Update, 2 Sept 2009: the demo is broken at the moment, because Google changes the source code of its results pages fairly often.

How does it work?

Subjects are predefined in the Jaci source code. They can be easily customised, but care has to be taken to ensure that subjects don't overlap. Each subject is used as the query for a Google Images search. Google Images returns at most 1000 search results, with results near the front more relevant than those near the end. Knowing this, the Jaci search algorithm "prefers" results near the front by biasing the random number generator with the following function.

where c is the maximum number generated by the RNG and b is the bias factor.

Jaci currently uses c = 1000, since Google Images does not return more than 1000 image results, and a bias factor of 4. When plotted, the function clearly favours lower numbers.
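The exact biasing function is not reproduced on this page, so the following is only a sketch under an assumption: it uses a simple power-law bias, floor(c * r^b) with r drawn uniformly from [0, 1), which may differ from the function Jaci actually uses. The name `biased_index` is hypothetical.

```python
import random
from typing import Optional


def biased_index(c: int = 1000, b: float = 4.0,
                 rng: Optional[random.Random] = None) -> int:
    """Return an integer in [0, c) that favours small values.

    ASSUMPTION: a power-law bias floor(c * r**b) with r uniform on [0, 1).
    b = 1 gives a uniform pick; larger b pushes results towards 0.
    """
    if rng is None:
        rng = random.Random()
    r = rng.random()            # uniform on [0.0, 1.0)
    return int(c * r ** b)      # r**b < 1, so the result is at most c - 1


# Example: pick which of the (at most) 1000 Google Images results to use.
rng = random.Random(2009)
print([biased_index(1000, 4, rng) for _ in range(8)])
```

With this assumed form, P(index < k) = (k/c)^(1/b), so with c = 1000 and b = 4 roughly 56% of draws land in the first 100 results, while b = 1 gives a plain uniform pick.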

Jaci then fetches 2 images for each subject, pairing the two images. Currently Jaci fetches 4 sets, which means 8 separate image searches and fetches, but this can easily be changed. Users are then presented with two rows of four images, in which each image in one row relates to exactly one image in the other row. The accompanying image illustrates this.
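A minimal sketch of how the two rows and the stored answer could be produced, assuming the image pairs have already been fetched. `build_puzzle`, the placeholder file names, and the answer format (row-a position mapped to row-b position) are illustrative assumptions, not Jaci's actual code. Shuffling each row independently means an image's column position gives no hint about its partner.

```python
import random
from typing import Dict, List, Sequence, Tuple


def build_puzzle(pairs: Sequence[Tuple[str, str]], rng: random.Random
                 ) -> Tuple[List[str], List[str], Dict[int, int]]:
    """Lay out two independently shuffled rows and record the answer.

    pairs[k] holds the two images fetched for subject k.
    answer[i] = j means row_a[i] belongs with row_b[j]; the answer is kept
    on the server and never sent to the browser.
    """
    n = len(pairs)
    order_a = list(range(n))
    order_b = list(range(n))
    rng.shuffle(order_a)            # subject shown at each row-a position
    rng.shuffle(order_b)            # subject shown at each row-b position

    row_a = [pairs[k][0] for k in order_a]
    row_b = [pairs[k][1] for k in order_b]

    pos_in_b = {k: j for j, k in enumerate(order_b)}   # subject -> row-b slot
    answer = {i: pos_in_b[k] for i, k in enumerate(order_a)}
    return row_a, row_b, answer


# Example with placeholder file names for 4 subjects.
pairs = [("cat1.jpg", "cat2.jpg"), ("car1.jpg", "car2.jpg"),
         ("tree1.jpg", "tree2.jpg"), ("boat1.jpg", "boat2.jpg")]
row_a, row_b, answer = build_puzzle(pairs, random.Random(1))
print(row_a, row_b, answer, sep="\n")
```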

This mapping is saved on the web server and checked after the user has completed the puzzle. Users are required to drag each image a(x) onto its partner b(x) in the other row. The experience is quite seamless: both a(x) and b(x) disappear once a(x) has been dragged onto b(x). Once the last a(x) is dropped onto its b(x), the page automatically submits the mapping the user created and the server compares it with the stored one. If the two mappings are exactly the same, the user passes the test and is given access to the web service.
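Server-side checking could look like the following sketch. The submission format (a comma-separated list of `a-b` index pairs such as `0-2,1-0,2-3,3-1`) and the function names `parse_mapping` and `passes` are hypothetical; the point is that the submission is first validated as a one-to-one mapping and then must match the stored answer exactly.

```python
from typing import Dict, Optional


def parse_mapping(raw: str, n: int) -> Optional[Dict[int, int]]:
    """Parse a hypothetical 'a-b' list such as '0-2,1-0,2-3,3-1'.

    Returns None unless the submission is a one-to-one mapping of the
    n row-a positions onto the n row-b positions.
    """
    mapping: Dict[int, int] = {}
    for item in raw.split(","):
        try:
            a_str, b_str = item.split("-")
            a, b = int(a_str), int(b_str)
        except ValueError:          # wrong shape or non-numeric index
            return None
        if not (0 <= a < n and 0 <= b < n) or a in mapping:
            return None
        mapping[a] = b
    if len(mapping) != n or len(set(mapping.values())) != n:
        return None                 # missing pairs or a reused row-b image
    return mapping


def passes(answer: Dict[int, int], raw: str) -> bool:
    """The user passes only if the submitted mapping matches exactly."""
    submitted = parse_mapping(raw, len(answer))
    return submitted is not None and submitted == answer


answer = {0: 2, 1: 0, 2: 3, 3: 1}           # as stored on the server
print(passes(answer, "0-2,1-0,2-3,3-1"))    # True
print(passes(answer, "0-2,1-0,2-1,3-3"))    # False
```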

How easily can it be broken?

With the current state of image recognition, the test cannot be broken by any kind of machine intelligence; a brute-force attack is the only attack that could break Jaci. With 4 image sets there are 4! = 24 possible mappings, so a random guess has a 1 in 24 (about 4.2%) chance of being correct. Each extra set of images drastically reduces this chance: 5! = 120 and 6! = 720, so the number of possible mappings grows factorially, even faster than exponentially. This growth in brute-force difficulty comes at the cost of greater complexity and more time needed to solve the puzzle.
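The arithmetic above can be checked with a few lines of Python; `guess_probability` is just an illustrative helper that assumes a blind guess picks one of the n! mappings uniformly at random.

```python
from math import factorial


def guess_probability(n_sets: int) -> float:
    """Chance that one random mapping of n_sets image pairs is correct."""
    return 1 / factorial(n_sets)


for n in range(3, 8):
    print(f"{n} sets: {factorial(n):>5} mappings, "
          f"p = {guess_probability(n):.4%}")
```

For 4 sets this gives 24 mappings and p of about 4.17%; for 6 sets it falls to about 0.14%.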

It is recommended that Jaci be used with 4 image sets, together with extra logic in the site that blocks a user after, say, 3 or 4 incorrect attempts.
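A sketch of the suggested lockout, plus the chance that a blind guesser gets through before being blocked. `AttemptLimiter` and `brute_force_success` are hypothetical helpers, and the probability assumes each attempt is served a freshly shuffled puzzle. If the same puzzle were reused, an attacker could avoid repeating guesses and do slightly better, k/n! for k attempts.

```python
from math import factorial
from typing import Dict


class AttemptLimiter:
    """Hypothetical helper: block a client after max_failures wrong answers.

    Counts live in memory here; a real site would persist them and key
    them by account, session or IP address.
    """

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures: Dict[str, int] = {}

    def is_blocked(self, client_id: str) -> bool:
        return self.failures.get(client_id, 0) >= self.max_failures

    def record(self, client_id: str, passed: bool) -> None:
        if passed:
            self.failures.pop(client_id, None)      # reset after a pass
        else:
            self.failures[client_id] = self.failures.get(client_id, 0) + 1


def brute_force_success(n_sets: int, attempts: int) -> float:
    """Chance a blind guesser passes at least once within `attempts` tries,
    assuming each try gets a fresh, independently shuffled puzzle."""
    p = 1 / factorial(n_sets)
    return 1 - (1 - p) ** attempts


limiter = AttemptLimiter(max_failures=3)
for _ in range(3):
    limiter.record("203.0.113.7", passed=False)
print(limiter.is_blocked("203.0.113.7"))    # True
print(round(brute_force_success(4, 3), 3))
```

With 4 sets, a limit of 3 attempts leaves a blind guesser roughly a 12% chance of getting through, and about 16% with a limit of 4.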

How can I integrate it into my website?

At the moment you can't. There are still many performance issues to work out, because Jaci fetches the results page from Google Images every time the script runs. Ideally, there would be a cache of prefetched images that is refilled when needed, so the page could be served immediately. Also, the images sourced from Google Images are far from perfect: sometimes they are hard to distinguish or do not seem relevant to their partner image. Using Google Images as the image source unfortunately means this can and does occur, since an image may carry multiple tags.
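The prefetch cache described above might be sketched as follows. `PairCache` and `fetch_pair` are hypothetical: `fetch_pair(subject)` stands in for the slow Google Images lookup, and keeping a separate pool per subject is a design choice made here so that a single puzzle never shows two pairs from the same subject.

```python
import random
from collections import deque
from typing import Callable, Deque, Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


class PairCache:
    """Pool of pre-fetched image pairs, kept separately for each subject.

    fetch_pair(subject) is a stand-in for the slow Google Images lookup;
    ideally refill() runs from a scheduled job rather than during a request.
    """

    def __init__(self, subjects: Sequence[str],
                 fetch_pair: Callable[[str], Pair],
                 per_subject: int = 3) -> None:
        self.fetch_pair = fetch_pair
        self.per_subject = per_subject
        self.pools: Dict[str, Deque[Pair]] = {s: deque() for s in subjects}

    def refill(self) -> None:
        """Top every subject's pool back up to per_subject pairs."""
        for subject, pool in self.pools.items():
            while len(pool) < self.per_subject:
                pool.append(self.fetch_pair(subject))

    def take(self, n: int, rng: random.Random) -> List[Pair]:
        """Return one cached pair from each of n distinct random subjects.

        Raises ValueError if n exceeds the number of subjects.
        """
        stocked = [s for s, pool in self.pools.items() if pool]
        if len(stocked) < n:
            self.refill()                 # slow path: fetch synchronously
            stocked = list(self.pools)
        chosen = rng.sample(stocked, n)
        return [self.pools[s].popleft() for s in chosen]


def fake_fetch(subject: str) -> Pair:
    """Placeholder fetcher returning made-up file names."""
    return (f"{subject}-a.jpg", f"{subject}-b.jpg")


cache = PairCache(["cat", "car", "tree", "boat", "dog"], fake_fetch,
                  per_subject=2)
cache.refill()
print(cache.take(4, random.Random(7)))
```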

Software used


Page last updated: 02 September 2009