All About the Apps

PDFs and image-based documents are a pain for test automators. Is there a remedy?

Michael-Deady 11-20-2012 08:32 AM - edited 09-09-2015 04:34 PM

Have you ever had the honor or the privilege of trying to get information or data out of a PDF file using QuickTest Professional? If you have, you will understand my pain. Many software platforms use image or PDF formats to present data as a formatted report. Over the years, this has caused a large number of headaches for automators. To an automation tool, PDFs are effectively image-based documents, and it is close to impossible to work with them and decipher the information we need.


One of my favorite steps to automate in a test case has always been “print report”. This simple statement implies that the automated testing tool can read printed documents. I always imagine a scenario where R2-D2 from Star Wars rolls over to a printer, beeping the whole way in disgust. Then he has C-3PO read the binary document for Jabba the Hutt. The whole time R2 is thinking, "I should have read the darn test case all the way through before starting the automation process."


The available options


PDFs are not impossible for automators to decipher. Optical Character Recognition (OCR) software has always been an option for dealing with PDF files when automating tests. However, because this capability was not built into QuickTest Professional, it has been more trouble than it’s worth in the past: it caused lengthy delays and was, for the most part, very unreliable.

In fact, reading PDF or image-based reports was the example I gave my customers to explain why it was virtually impossible to achieve 100 percent automation for regression testing. While 100 percent automation for regression testing is still a pipe dream today, Unified Functional Testing (UFT) 11.5 brings it one step closer to reality.


As a result of UFT 11.5, I will need to come up with a new example to describe the impossibility of achieving 100 percent automation. Or on second thought, I could tell the client the truth—that it is just too boring and really not any fun to automate everything.


Is 100 percent an attainable goal?


Before going any further, I would like to say that I don’t believe 100 percent automation is reasonable for most automated testing. But when you’re talking about ROI in the near future, I do believe that we can achieve a much higher percentage than we currently do. I think this is especially true when you’re talking about tools like UFT 11.5.


I can hear your question: how much closer can UFT 11.5 bring us to a goal of 90ish percent? I can’t really tell you this time, but this will definitely be a subject of later blogs because it is a personal interest of mine.


UFT 11.5 vs. PDF. The ultimate champion is….


But the topic of this discussion isn’t the amount of automation that a person can achieve. It’s about the ability to read, validate and even extract information from a PDF file using UFT 11.5.

I’ve included a video that demonstrates how simple it is to use a base PDF file for checkpoint validation, and even to use regular expressions to enhance this new functionality. We can now also extract information from the PDF itself, such as invoice numbers, addresses or even a list of values, to be used in the current automated script or stored in an external data file. Again, this may seem like child’s play to some people. Over the years, however, reading a PDF file accurately has sometimes been a roadblock for automators unless they were willing to spend additional time on code that, for the most part, would go unnoticed.
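
To make the extraction idea concrete, here is a minimal sketch in Python. It is not UFT code and does not use UFT's API. It assumes the report text has already been pulled out of the PDF by some earlier step, and the sample text, the invoice format and the function name are all made up for illustration.

```python
import re

# Hypothetical text, standing in for what a PDF-to-text step might return.
REPORT_TEXT = """\
Invoice Number: INV-2012-00417
Bill To: 123 Main Street, Austin, TX 78701
Invoice Number: INV-2012-00418
Bill To: 500 Congress Avenue, Austin, TX 78701
"""

# Assumed invoice format: "INV-", a four-digit year, a dash, five digits.
INVOICE_PATTERN = re.compile(r"Invoice Number:\s*(INV-\d{4}-\d{5})")


def extract_invoice_numbers(text):
    """Return every invoice number found in the text, in document order."""
    return INVOICE_PATTERN.findall(text)


if __name__ == "__main__":
    for number in extract_invoice_numbers(REPORT_TEXT):
        print(number)
```

The returned list could then feed later steps in the same script or be written out to an external data file.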


More and more companies are using image-based PDF technology to address the legal issues of editable documents. UFT 11.5’s file content interface allows users to easily add or edit the contents within the image-based document. File-content checkpoints proved very reliable and easy to troubleshoot in my testing, even when I used incorrect expressions. If you’re wondering what the performance hit is when reading a PDF file as a checkpoint, I found it minimal with files averaging 25 pages or more.
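
For readers who have not worked with this kind of checkpoint, the general idea of comparing a document line by line against a baseline of regular expressions can be sketched roughly as follows. This is a plain Python illustration of the concept, not how UFT implements it. The function name, the baseline patterns and the sample lines are all hypothetical.

```python
import re


def check_lines(expected_patterns, actual_lines):
    """Compare each actual line with its expected regular expression.

    Returns a list of (line_number, expected, actual) tuples for mismatches.
    A line-count mismatch is reported as line number 0.
    An empty list means the checkpoint passed.
    """
    failures = []
    if len(expected_patterns) != len(actual_lines):
        failures.append(
            (0, f"{len(expected_patterns)} lines", f"{len(actual_lines)} lines")
        )
    pairs = zip(expected_patterns, actual_lines)
    for number, (pattern, actual) in enumerate(pairs, start=1):
        # fullmatch requires the whole line to match, not just a prefix.
        if re.fullmatch(pattern, actual) is None:
            failures.append((number, pattern, actual))
    return failures


# Hypothetical baseline: fixed text is escaped, variable fields are patterns.
BASELINE = [
    re.escape("Monthly Sales Report"),
    r"Generated on \d{2}/\d{2}/\d{4}",
    r"Total: \$\d+\.\d{2}",
]

ACTUAL = [
    "Monthly Sales Report",
    "Generated on 11/20/2012",
    "Total: 1520.00",  # missing dollar sign, so this line fails the check
]

if __name__ == "__main__":
    for number, pattern, actual in check_lines(BASELINE, ACTUAL):
        print(f"line {number}: expected /{pattern}/, got {actual!r}")
```

Reporting the line number next to both the pattern and the actual text is what makes a failed check easy to troubleshoot, whether the document is wrong or the expression is.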

There are two things I’d like to ask you as a reader of this article.  First, how do you plan to use this new functionality? Second, as I stated earlier in this article, I have lost my primary example of why 100 percent regression automation is not feasible:


  1. Do you believe that you can reach 100 percent, not just in your environment but in others as well?
  2. What is the example that you give clients when attempting to explain why it is impossible to automate all your regression scripts?

Hopefully your answers will help me come up with a new example. Otherwise I will have to spend more time thinking about old movies, cartoons and the like—and you don’t want me doing that.

Also, I will be attending HP Discover in Frankfurt from December 4-6. I would love to meet with you and discuss your ideas over a bratwurst, or maybe a pretzel.


On the Down Low: From now on, “pretzel” is the code word for a good German beer, so I could slip it by my editor and HP’s censor team, also known as the HP Software Marketing Group. #HPSoftware #HaveAPretzel







About the Author


Michael Deady is a Pr. Consultant & Solution Architect for Teksystems, focused on quality, client satisfaction, and long-term success. Clients, peers, and supervisors regard him as a leader with a proven ability to lead development and quality assurance teams through the phases of the software development life cycle to ensure the quality of new products. He specializes in software development, testing, and security. He also loves science fiction movies and anything to do with Texas.

on 07-02-2015 05:10 PM

Can I verify logos and digital signatures in PDF files?
