[Webtest] PdfToTextFilter and Encoding

Ulrich Mayring <ulim@denic.de>
Wed, 30 Jan 2013 10:29:09 +0100


Hi all,

The pdfToTextFilter does not appear to respect the text encoding. For example,
if I have a PDF containing German umlauts, I can do this:

<pdfVerifyText text="ü"/>

Here the encoding is dealt with correctly. However, if I do this:

<applyFilters>
	<pdfToTextFilter lineSep=" "/>
</applyFilters>
<verifyText text="ü"/>

Then the step fails, because the text extracted from the PDF is in
ISO-8859-1 encoding, while my webtest is in UTF-8.
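
To show what such a mismatch looks like at the byte level, here is a small
Python sketch (not WebTest code). It only assumes the situation described
above, with ISO-8859-1 bytes on one side and UTF-8 on the other. How the
filter actually decodes the PDF text internally is not something I have
verified.

```python
# Illustration only: the same character "ü" in the two encodings.
expected = "ü"

latin1_bytes = expected.encode("iso-8859-1")  # single byte 0xFC
utf8_bytes = expected.encode("utf-8")         # two bytes 0xC3 0xBC

# ISO-8859-1 bytes read as UTF-8: 0xFC is not valid UTF-8, so with
# errors="replace" it becomes the replacement character U+FFFD.
misread = latin1_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes read as ISO-8859-1: each byte becomes its own character.
mojibake = utf8_bytes.decode("iso-8859-1")

# In both directions a search for "ü" fails.
print(expected in misread)   # False
print(expected in mojibake)  # False
```

In either direction a plain text comparison against "ü" fails, which would
match the failing verifyText step.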

Is this a bug? Or am I doing something wrong?

Ulrich