Digital Scholar Lab (DSLab) is a Gale product for digital humanists and scholars, which offers solutions and possibilities to the most common challenges in the digital humanities. During the summer, I focused on designing interactive visualization of sentiment analysis and document clustering results for the DSLab analysis tools.

UX Intern @ Gale, Cengage Learning

June 2018 - Aug. 2018

Mentor: Thomas Piggott (Senior UX Designer)

Tools: Axure, Sketch

Design Challenge

What is digital humanities?

Digital Humanists emerges from the field of humanities computing

What is sentiment analysis?

Sentiment analysis, also called opinion mining, aims to analyze and understand the attitude, the overall contextual polarity or the emotional reaction to the context of a speaker, writer...

There are lots of ways to apply sentiment analysis. How can Digital Scholar Lab help users with different level of coding ability to analyze materials and visualize results according to user scenarios?

Design Overview

Key Features:

Text clean setting
Overall sentiment analysis of a content set
Sentiment analysis of a document at section level
Sentiment analysis of sequenced documents

Research

Literature Review

Sentiment analysis has different outputs and visualization techniques based on different context.

After literature review, I found the outputs of sentiment analysis can be polarity values, polarity levels, sentiment intensity, keywords and detailed emotions. And color, line, bar and various elements are used to visualize the value and trend. Examples will be shown later in iteration section.

Persona

Ashley who is new to the analysis tools and methods, and experienced users like Armando both worth our attention.

User Scenario

User scenarios vary based on different types of input, analysis level and expected output.

There are several user scenarios:

Final Design

Understand sentiment of text at different levels with data visualization.

Challenge 1

Discern the sentiment of each document in a content set at section (sentence, paragraph, or other user defined chunks) level.

The visualization shows the overall sentiment distribution, key sentiment words and sentiment change analyzed by section of each document in the whole content set. User can click each card to view the detail of the document within content.

On the document/article page, a visualization summary panel is added under the existing DOC EXPLORER. The visualization contains three parts, the sentiment overlay, the overall sentiment, and the sentiment development of the document. The sentiment overlay will highlight the document with corresponding colors by chunk. The overall sentiment shows the number of chunks of the polarity levels. The sentiment development has a zoom-in view of the sentiment development of the current page and a scope controller showing the development of the whole document and a sliding window which can direct to the specific page.

Iterations

I finally decided to mark the document by section in stead of by keywords, considering user scenario and technical limitation since it's hard to get an accurate estimation of the emotion of a word and the emotion of a word can usually be different according to different time, culture and context.
During evaluation, users found that marking the document to connect analysis with the content was a pretty novel way to understand a document, and especially helpful for pedagogical purpose to help students to better understand how sentiment analysis works.

Challenge 2

Discern the overall sentiment of a content set of a topic or event.

Bar chart is used to show the number of documents fall into the five polarity levels, strongly negative, negative, neutral, positive, strongly positive. Click on the bar to view all documents within the content set that have the corresponding sentiment.

Challenge 3

Know the sentiment overtime of a topic or event by analyzing a set of sequential documents.

The line chart can most efficiently show the sentiment overtime. The top graph uses each document's polarity, and the bottom graph uses the average polarity of each decade to show the mainstream sentiment change of that topic/events.

Challenge 4

Set and adjust text clean parameters for sentiment analysis.

Since the sentiment analysis is ran on the OCRed text which might not be 100% accurate. Before sentiment analysis there should be several text cleaning on the content and format to improve the credibility of analysis result. But for most primary resources, automatic OCR and text clean can still be hard to provide high quality resources. The manual and crowd-sourcing text cleaning is another major challenge under developing.

In the meanwhile, for pedagogical purpose, what's the purpose of each text cleaning rule should be explained to students like Ashley. And samples of how these rules work and affect the analysis should be provided.

Reflection

Balancing design and development

Based on the tech stack the development team is using and development timeline, some visualization might not be able to delivery, so it's important to consider all these in the beginning. Some design might be able to realized easily based on the current tech stack, some design might need extra investment. For example, the sentiment overlay is not a common visualization and need extra effort for the development team to realize. But after evaluations with users, DH specialists, product team, development team and other related teams, it was a feature worth extra investment.

Designing and testing visualization is a little bit different.

Since for people new to visualization they may not so sure about what's the most efficient way for them. So at early stage, visualization and DH experts can provide much more valuable feedbacks based on their experience. And when a relative complete design is created, the major users who are new to the tools should be involved into evaluation to test whether it's easier to learn and understand.

If I had more time I'd like to do more user evaluations with DH students.

Since the design concept is mainly tested with specialists, the next step is to test whether text clean settings and visualization are understandable for pedagogical purpose, and how to better categories the visualization (group by analysis level or by context).

Digital Scholar Lab Visualization