Behind the scenes at Labs

Reading analytics - we’re gaining significant new insights by analyzing how people read online


Contributed by Georgia Koutrika, HP Labs Senior Research Scientist




In the online world, there are two well-known “species” of user: “searchers” and “shoppers.” Each leaves a voluminous trail of information behind. Web searchers mainly deposit search keywords and result clicks, while shoppers’ trails are made up of views, purchases, and rentals of products. Analytics algorithms have long mined these trails for insights and user preferences, and have leveraged crowd wisdom, with the ultimate aim of optimizing advertising, sales, products, services, and web sites.


The Web landscape has changed a lot over the last few years, and new types of users have emerged. One notable new class is “online readers,” those who read content online in order to be informed, entertained, trained, educated, and so forth. The proliferation of textual content in a plurality of forms (including e-news, e-books, and online courses), along with the popularity of portable devices, has shaken the foundations of traditional printed forms. But it has also opened the door to new and exciting opportunities. Why? Because online readers leave their own digital trails in the form of page scrolls, turns, and other content interactions. Now organizations in digital printing, publishing, book retail, education, and other domains, as well as authors, educators, and other individuals, can leverage these trails to answer questions that were difficult or impossible to answer before. For starters, how long do people read in one session? How long do they stay on a page? How does that time vary by topic? When or where do they stop reading? These questions are only the beginning of a new kind of analytics, called reading analytics, that could significantly influence our future interactions with, and offerings to, online readers.


Reading analytics, at its heart, is a variety of big data analytics. It entails measurement, collection, analysis, and reporting of online reading data for the purpose of uncovering hidden information about reader behavior and content consumption.


Clearly, reading analytics can help organizations optimize content, services, and products and improve outcomes. Reading data can be collected on any type of digital content, such as product pages, news pages, manuals, and e-books, residing in any format (e.g., HTML, PDF). It can offer us insights into reader behavior, such as how people read and interact with content, and how they stay engaged. These analytics can more accurately reveal navigation needs, evolving preferences, and user habits. At the same time, they can uncover patterns from which it’s possible to draw conclusions about content, such as which parts are read most often, which take more time to read, and which are skipped. They can also help optimize content that originated in print for new digital formats. For example, publishers can create better and more personalized content based on how readers actually consume it. And we can better predict a book's sales potential from the reception of similar material. Authors themselves might use insights from reading data as a source of very direct feedback to improve the readership of their titles. Educators, meanwhile, can customize their material to better fit the needs of their students, and learners can both understand and improve their reading habits and planning.


Reading analytics offer companies important insights into their products even when they aren’t in the content business per se. A company might, for example, support its existing customers and appeal to new ones through various channels: product description pages on the company web site, product forums, manuals, troubleshooting guides, and so forth. Reading analytics can shed light on (a) the value and shortcomings of the supporting content, (b) the needs and preferences of the customers, and (c) appealing or unappealing features of their products. Analyzing how customers use manuals and similar material could thus reveal the preferred reading order of the material, say, or the more popular sections versus the rarely accessed ones, and it could suggest ways to optimize the content and the organization of the manual based on how customers actually consume it. Popular troubleshooting pages may be related to a particular feature of the product, highlighting a need for improving this feature. And reading data from manuals, coupled with web site data and reading data from the company’s product forum showing which posts are read more often, could collectively uncover recurring product issues and point to useful product improvements.


A richer approach to reading analytics


Existing reading analytics are mainly descriptive. They answer simple, aggregate questions such as: how many people have opened a specific story? How many actually finished it? Where do our readers live? What language do they speak? How many readers are revisiting the content? Where and when did each reader stop reading?
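
To make the descriptive questions concrete, here is a minimal Python sketch of how a few of them might be answered from a page-view log. Everything in it is an assumption made for illustration: the event format, the names `EVENTS`, `STORY_PAGES`, and `descriptive_stats`, and the shortcut of treating the furthest page reached as the place where a reader stopped.

```python
from collections import defaultdict

# Hypothetical page-view log: (reader_id, story_id, page_number), pages start at 1.
EVENTS = [
    ("ann", "story-1", 1), ("ann", "story-1", 2), ("ann", "story-1", 3),
    ("bob", "story-1", 1), ("bob", "story-1", 2),
    ("cat", "story-1", 1),
]

# Hypothetical catalog: total number of pages per story.
STORY_PAGES = {"story-1": 3}


def descriptive_stats(events, story_pages):
    """Count openers and finishers per story and record where each reader stopped."""
    # Furthest page reached by each (reader, story) pair; an assumed proxy
    # for "where the reader stopped".
    furthest = defaultdict(int)
    for reader, story, page in events:
        furthest[(reader, story)] = max(furthest[(reader, story)], page)

    stats = {}
    for (reader, story), page in furthest.items():
        entry = stats.setdefault(
            story, {"opened": 0, "finished": 0, "stopped_at": {}}
        )
        entry["opened"] += 1
        if page >= story_pages[story]:
            entry["finished"] += 1
        entry["stopped_at"][reader] = page
    return stats


stats = descriptive_stats(EVENTS, STORY_PAGES)
```

With the sample log, `stats["story-1"]` reports three readers who opened the story and one who finished it. Aggregates like these say what happened, but not why.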


At HP Labs, we are developing reading analytics algorithms that let us go deeper than this, generating actionable insights about content consumption and reader behavior, rich user profiles, and personalized recommendations.


Reading behavior is complex, for sure. Is a reader pausing because she lost interest or because she needs more time to understand a page? Are people jumping to different pages in search of specific content or just because they are skimming through? Our algorithms “walk along with” readers as they interact with textual content, figuring out which actions correspond to actual reading and mining all possible reading patterns. In this way, we build a more realistic model of a user’s actual reading progress, and from that we derive more accurate insights about content consumption.
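
One simple way to picture the "which actions correspond to actual reading" problem is a dwell-time heuristic: a page view that is too short for the amount of text was probably skimmed, and one that is far too long probably includes idle time. The Python sketch below is only that heuristic, not the algorithms described above. The function name `classify_page_view` and the thresholds `max_wpm`, `typical_wpm`, and `idle_factor` are all illustrative assumptions.

```python
def classify_page_view(word_count, dwell_seconds,
                       max_wpm=600, typical_wpm=250, idle_factor=5.0):
    """Label a page view as 'skimmed', 'read', or 'idle'.

    All thresholds are assumed values for illustration only:
    - max_wpm: fastest plausible reading speed in words per minute
    - typical_wpm: rough typical reading speed in words per minute
    - idle_factor: multiple of the typical time beyond which the reader
      is assumed to have walked away or been distracted
    """
    fastest_seconds = word_count / max_wpm * 60
    typical_seconds = word_count / typical_wpm * 60

    if dwell_seconds < fastest_seconds:
        return "skimmed"
    if dwell_seconds > typical_seconds * idle_factor:
        return "idle"
    return "read"


labels = [
    classify_page_view(300, 10),    # too fast for 300 words
    classify_page_view(300, 70),    # plausible reading time
    classify_page_view(300, 1000),  # far longer than plausible
]
```

A fixed threshold cannot tell a reader who lost interest from one who needs more time to understand a page, which is exactly why richer models of reading behavior are needed.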


Furthermore, we’re building prediction models that tell us whether a user is on track or behind based on specific consumption goals. We can tell whether an employee, for example, is keeping up with a training course, or if a customer has found a product page worth reading to the end.
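
As a rough baseline for the "on track or behind" question, one could compare the share of content read with the share of time used. The Python sketch below does only that; it is a linear-pace rule, not the prediction models mentioned above, and the function name `pace_status` and the `tolerance` parameter are hypothetical.

```python
def pace_status(pages_read, total_pages, days_elapsed, days_allowed,
                tolerance=0.05):
    """Compare actual progress with a linear-pace expectation.

    Returns 'behind', 'on track', or 'ahead'. The tolerance band is an
    assumed value chosen for illustration.
    """
    if total_pages <= 0 or days_allowed <= 0:
        raise ValueError("total_pages and days_allowed must be positive")

    expected = min(days_elapsed / days_allowed, 1.0)  # share of time used
    actual = min(pages_read / total_pages, 1.0)       # share of content read

    if actual + tolerance < expected:
        return "behind"
    if actual > expected + tolerance:
        return "ahead"
    return "on track"


# Halfway through a 10-day training course of 100 pages:
status_slow = pace_status(20, 100, 5, 10)
status_even = pace_status(50, 100, 5, 10)
status_fast = pace_status(90, 100, 5, 10)
```

A real model would presumably also account for how reading speed varies by reader and by topic, which a straight-line expectation ignores.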


We also profile readers based on several behavioral attributes. We can measure, for example, the attention span of a reader, and what type of reader they are. Do they typically read from beginning to end, or do they jump around a lot, reading out of sequence? We then classify users based on features such as their progress, their relative speed, and peer comparisons. Grouping people with the same reading habits and the same needs can aid content customization and personalization. By adding further context (time of day, device, location, for example) we can tie our analytics both to different scenarios (reading while in a store, a classroom, the library, work) and to the saliency of the moment (e.g., morning, evening).
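
The "beginning to end versus jumping around" distinction can be illustrated with a single feature: the fraction of page transitions that move to the very next page. The Python sketch below is a toy version under that assumption. The names `sequential_ratio` and `reader_type`, the labels, and the 0.8 threshold are hypothetical, and a real profile would combine many more attributes.

```python
def sequential_ratio(pages):
    """Fraction of page-to-page transitions that go to the next page."""
    if len(pages) < 2:
        return 1.0  # nothing to contradict sequential reading
    steps = list(zip(pages, pages[1:]))
    forward = sum(1 for current, following in steps if following == current + 1)
    return forward / len(steps)


def reader_type(pages, threshold=0.8):
    """Label a reading trail as 'sequential' or 'jumper' (assumed threshold)."""
    return "sequential" if sequential_ratio(pages) >= threshold else "jumper"


linear_trail = [1, 2, 3, 4, 5]
scattered_trail = [1, 7, 2, 9, 3]

linear_label = reader_type(linear_trail)
scattered_label = reader_type(scattered_trail)
```

Features like this one could then feed a grouping step, so that readers with similar habits end up in the same segment.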


There are plenty of challenging aspects to this research: content diversity, field and topic complexity, individual variations in skills and needs, multiple user contexts, and identifying distractions, to name just a few. But that presents us with many fascinating problems to solve. And as we begin to consistently analyze our data along different dimensions (time, location, topic, product, preference, and so on), we’re very close to making “reader” as important an online designation as “shopper” and “searcher” are today.
