A data explosion is underway, promising invaluable opportunities for scientific and technological progress. Yet realizing this vast potential depends not only on our ability to collect and access data, but also on our ability to understand it. Sensemaking is the process of building a mental model of the data; only after acquiring this intuition can we apply our mathematical tools (such as machine learning methods) to their full power and extract meaning and knowledge from the raw data. Until now, this sensemaking process has been done intuitively, usually through conventional visualization techniques (e.g., plotting the data). But the emergence of vast, high-dimensional datasets raises challenges that our current data-analytic approaches cannot address. For example, datasets are now so large that answering even the simplest questions about them may take hours or days of computation. Even when the data is accessible, standard visualization techniques may fail due to issues such as overplotting. Furthermore, datasets with hundreds or thousands of dimensions cannot be fully visualized at all. These issues motivate the emergence of a new field: Human Data Interaction (HDI).
Currently, HDI is more about asking the right questions than offering answers; questions like: Faced with a large, high-dimensional dataset, how do we even find meaningful and interesting questions to ask? How do we answer those questions without waiting for hours and losing our train of thought? How do we explore the data and navigate its large, high-dimensional space? How do we use these explorations to make sense of the data? How can the mental models we acquire be used to form new knowledge and make real scientific discoveries? How can we disseminate this knowledge? How do we communicate our mental model of such a complex data object to other people? What technical difficulties face big-data exploration, and how do we overcome them?
In this talk I will begin with examples of two successful data exploration tools, GigaPan and Time Machine. These tools experiment with a multitude of human data interaction mechanisms to support both sensemaking with massive visual datasets and communicating the resulting knowledge to others. I will then discuss the challenges facing HDI and some initial answers we can borrow from a range of fields, from database systems and human factors to developmental psychology, learning theory, and communication theory. In the final section of the talk I will present EVA (Explorable Visual Analytics), a web-based prototype for understanding and interacting with large, high-dimensional data. We will use EVA to perform some hands-on HDI experiments with Census and Twitter datasets.