Katy Börner, Indiana University
Envisioning Science, Technology, and Innovation
Recent developments in data mining, information visualization, and science
of science studies make it possible to study science, technology, and innovation
(STI) at multiple levels using a systems science approach. At the micro-level,
the impact of single individuals, specific works, or legal frameworks can
be examined. At the meso-level, the expertise profiles of institutions can
be compared or the trajectories of student cohorts can be modeled. The macro-level
provides a 10,000-foot view of the continuously evolving geospatial and topical
landscape of science and technology and the global import/export activities,
innovation diffusion, and brain circulation unfolding over both spaces.
This talk features STI visualizations including maps from the international
Places & Spaces: Mapping Science exhibit (http://scimaps.org), the Atlas
of Science, and the Atlas of Knowledge. To empower many people not only to
read but also to make data visualizations, I will then present a visualization
framework that was developed to guide the design of complex data analysis
and visualization workflows as well as the development of macroscope tools
(http://www.scivee.tv/node/27704). Both the framework and the macroscope
tools are taught in the Information Visualization MOOC (http://ivmooc.cns.iu.edu,
now in its 3rd year) that empowers students from 100+ countries to convert
data into insights.
· Börner, Katy. 2010. Atlas of Science: Visualizing What We Know.
The MIT Press.
· Scharnhorst, Andrea, Katy Börner, and Peter van den Besselaar,
eds. 2012. Models of Science Dynamics: Encounters Between Complexity Theory
and Information Science. Springer Verlag.
· Börner, Katy, and David E. Polley. 2014. Visual Insights: A
Practical Guide to Making Sense of Data. Cambridge, MA: The MIT Press.
· Börner, Katy. 2015. Atlas of Knowledge: Anyone Can Map. The
Jenny Bryan, University of British Columbia
New tools and workflows for data analysis
In the past several years, there have been exciting additions to the toolkit
for statisticians and data analysts who work in R. Examples include RStudio,
the R markdown format for dynamic reports, the Shiny web application framework,
and improved integration with Git(Hub). The downside, of course, is the potential
agony associated with mastering new tools and developing new workflows. Change
is hard. I will give an overview of these developments and describe the costs
and benefits associated with adopting new approaches to data exploration and
analysis. I will also share my very positive and illuminating experiences
from teaching these new tools in several graduate courses.
Sheelagh Carpendale, University of
Information Visualization: Exploring New Options
Much of the excitement in the early 1990s about information visualization
originated in the idea of creating new visual, spatial representations that
would allow people to see their data. Much was said about the
amount of the brain that is devoted to spatial and visual reasoning and how
visualizations might have the power to utilize these relatively untapped resources.
However, as information visualization research has progressed, a degree of
practicality has emerged, heightening a focus on usability and task enablement.
As important as this focus may be, there may still be something worth investigating
in the notion of alternate representations. In this talk, I will explore the
possible power of alternate interactive visual representations by considering
ideas around innovation.
Remco Chang, Tufts University
Big Data Visual Analytics: A User-Centric Approach
Modern visualization systems often assume that the data can fit within the
computer's memory. With such an assumption, visualizations can quickly slice
and dice the data and help the users examine and explore the data in a wide
variety of ways. However, as we enter the age of Big Data, the assumption
that data can fit within memory no longer applies. One critical challenge
in designing visual analytics systems today is therefore to allow the users
to explore large and remote datasets at an interactive rate. In this talk,
I will present our research in approaching this problem in a user-centric
manner. In the first half of the talk, I will present preliminary work with
the database group at MIT on developing a big data visualization system based
on the idea of predictive prefetching and precomputation. In the second half
of the talk, I will present mechanisms and approaches for performing prefetching
that are based on users' past interaction histories and their perceptual abilities.
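
As a rough illustration of prefetching driven by interaction history, the sketch below predicts a user's next request from the requests that followed the current one in the past. It is not the system described in the talk: the class name `PrefetchPredictor`, the tile identifiers, and the choice of a first-order Markov model are all assumptions made for this example.

```python
from collections import Counter, defaultdict


class PrefetchPredictor:
    """First-order Markov model over past requests (hypothetical, illustrative)."""

    def __init__(self):
        # transitions[a][b] counts how often request b directly followed request a.
        self.transitions = defaultdict(Counter)
        self.previous = None

    def record(self, request):
        """Log one request, for example a tile id or a query key."""
        if self.previous is not None:
            self.transitions[self.previous][request] += 1
        self.previous = request

    def predict(self, k=2):
        """Return up to k requests most likely to follow the latest one."""
        if self.previous is None:
            return []
        ranked = self.transitions[self.previous].most_common(k)
        return [request for request, _count in ranked]


# Hypothetical interaction history: the user keeps moving from tile A to tile B.
predictor = PrefetchPredictor()
for tile in ["A", "B", "C", "A", "B", "D", "A", "B", "C", "A"]:
    predictor.record(tile)

# The latest request was "A", so the tiles that followed "A" before are candidates.
to_prefetch = predictor.predict(k=1)
print(to_prefetch)
```

A system of this kind would fetch the predicted items in the background while the user inspects the current view. Accounting for perceptual abilities, as the abstract mentions, would need a richer model than this one.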
Ta Chiraphadhanakul, Facebook Inc.
Visualizing Big Data at Facebook
There are over 300 petabytes of data at Facebook. It is difficult, if possible
at all, to visualize every single data point in these gigantic data sets.
I will provide a brief overview of Facebook data infrastructure and present
the key technologies that enable us to query subsets of data quickly and perform
interactive visual analysis. Through many examples from our work, I will
discuss how data scientists and researchers transform big data into valuable
information and insightful visualizations.
Christopher Collins, UOIT
Semantics and Sentiment in Visual Text Analytics
How do people feel about my product? What are the main themes in the news
today? These are examples of the questions people ask about large scale text
data. Visual text analytics tools are being created to help address these
challenging questions. In this talk I will review recent research advances
for exploring and analyzing sentiment in text, and extracting meaning and
relationships between entities in text. For those visualization designers
who want to take advantage of semantics and sentiment, this session will also
cover natural language processing toolkits and data resources.
Dianne Cook, Iowa State University
A Kaleidoscope of Statistical Graphics Research Projects
This will be a series of short talks displaying a range of current projects
in data visualization research from Iowa State University and Purdue University.
- Adding a new geom to ggplot2, in order to extend the widely used software's
network plotting capacity. Samantha Tyner
- Exploring networks using D3 and shiny in the software gravicom. Andee Kaplan
- Creating interactive web graphics from ggplot2 using the R package animint.
- Exploring temporal data by interactively slicing and dicing, using the R
package cranvastime. Xiaoyue Cheng
- Trelliscoping big data. Barret Schloerke
Michael Friendly, York University
Big Data and Big Questions: Vignettes and lessons from the history of data
This talk traces some key developments in the history of data visualization
to important scientific and social questions of their time and the availability
of relevant data: the Big Questions and Big Data of a given era. I try to
present this rich history in terms of a few vignettes. They range from problems
of geodesy and navigation that led to the first statistical graph, to problems
of crime and other "moral variables" that led to the foundation of modern
social science, to graphs in the Golden Age of statistical graphics designed
for state planning and leading to what is arguably the most notable scientific
discovery of all time based on a purely visual analysis.
Alex Goncalves, Columbia University
New Trends in Data Visualization and Journalism
In a moment of tectonic shifts in the media industry, news media has been
looking for new strategies to captivate and engage the public. The combination
of data, visualization, and story telling is certainly one of the most promising
areas for media innovation. I will present some successful examples that suggest
the potential of data visualization for informing public opinion and promoting
Susan Holmes, Stanford University
Transformations before Visualization of Heterogeneous Data
Modern data presents many layers of heterogeneity. I will speak about how
careful data transformations can make visualization more effective. Using
examples from current research on the Human Microbiome I will show several
examples of data transformations that result in more meaningful graphics for
complex heterogeneous data that combines phylogenetic trees, OTU contingency
tables, clinical data and community networks.
I will show an implementation of these ideas in a Bioconductor package (phyloseq)
and its browser based extension (Shiny-phyloseq).
This talk contains joint work with Joey McMurdie.
Ekaterina Smirnova, University of Wyoming
Visualization of Multidimensional Data with Different Structures
Recent advances in various "omics" technologies allow for comprehensive
examination of microbial communities, together with other data such as those
obtained on lipids and cytokines, along with usual clinical (covariate) data,
all collected on the same individual. Often the goal is to relate these data
and explore their connections to various diseases. A characteristic of these
studies is that the data are often quite sparse but collected on a large number
of variables. As a start, exploring any structure present in the data is essential.
Eigendecomposition-based methods provide tools for not only representing data
in lower dimensions, but also enable systematic integration and comparison
of multi-omic data sets.
Using data from a vaginal microbiome project, we address statistical methods
for big data visualization, for exploring temporal dynamics of the vaginal
microbiome, and for integrating three types of "omics" data, namely, microbial
data obtained from 16S sequencing, data on some cytokines and some lipids
collected on the same women. In particular, correspondence analysis allows
for exploring whether women with bacterial vaginosis can be grouped based
on the taxa present in their vaginal microbiome samples obtained via 16S sequencing,
co-inertia analysis provides a tool for coupling the taxa measurements over
two subsequent clinic visits, and multiple co-inertia analysis helps visualize
connections between the three different data types.
Ramnath Vaidyanathan, McGill University
Interactive Visualizations from R
In this talk, I will discuss an approach to create, customize and share interactive
visualizations straight from R, using a consistent interface, leveraging existing
like d3.js and raphaeljs, have made it easier to create sophisticated interactive
and web development tools, making it harder to use for data scientists, who
often spend a lot of their time analyzing data using languages like R/Python/Julia.
The main motivation behind this work is to provide data scientists a seamless
workflow that allows them to execute all steps of the data visualization process,
from acquiring data to exploring it, visualizing it, and sharing their results,
without having to leave the comfort of their primary language for data analysis
As a part of this talk, I will discuss three R packages: htmlwidgets, a framework
and Kenton Russell), rCharts, a consistent plotting interface to leverage several
popular interactive visualization libraries (jointly authored with Kenton
Russell and Thomas Reinholdsson), and rMaps.
Hadley Wickham, RStudio
Pipelines for big data
When thinking about visualising big data, the visualisation challenges play
a surprisingly small role. Instead, much of the challenge is ingesting the
data, tidying it into a workable format and transforming it to the appropriate
level of aggregation (so you _can_ see the forest for the trees). In this
talk, I'll discuss my recent work that ties together R packages for working
with big data, first loading it, then tidying it, then transforming it
and finally visualising it.
Leland Wilkinson, Tableau Software
Exploring huge collections of images
Tuan Nhon Dang and Leland Wilkinson
We introduce a method for guiding interactive exploration of a huge corpus
of images. The method is based on scagnostics - nine characterizations of
the 2D distributions of orthogonal pairwise projections on a set of points
in multidimensional Euclidean space. These characterizations include measures
such as density, skewness, shape, outliers, and texture. Working directly
with these measures, we can locate anomalies for further analysis or search
for similar distributions in a large corpus of images.
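
To give a feel for scoring every pairwise projection, the sketch below computes one measure for each pair of columns in a small table. It assumes the Monotonic scagnostic, which in the published scagnostics work is the squared Spearman rank correlation. The other measures rely on geometric graphs and are not shown. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def monotonic(x, y):
    """Squared Spearman rank correlation of one 2D projection."""
    return pearson(ranks(x), ranks(y)) ** 2


def score_pairs(columns):
    """Score every pair of columns, i.e. every axis-aligned 2D projection."""
    names = sorted(columns)
    return {(a, b): monotonic(columns[a], columns[b])
            for a, b in combinations(names, 2)}


# Toy data: v increases with u, while w has little monotone relation to u.
columns = {
    "u": [1, 2, 3, 4, 5],
    "v": [1, 4, 9, 16, 25],
    "w": [2, 5, 1, 4, 3],
}
scores = score_pairs(columns)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Sorting or thresholding such scores is one way to surface unusual projections for closer inspection.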
Bowei Xi, Purdue University
Divide and Recombine (D&R) for Larger and More Complex Data
D&R is a statistical approach to the analysis of large complex data. The
goals are the following: (1) Provide the data analyst with methods and a computational
environment that enable study of large data with almost the same comprehensiveness
and detail that we can achieve with small data. (2) The analyst uses an interactive language
for data analysis that is both highly flexible and enables highly time-efficient
programming with the data.
(3) Underneath the language, a distributed database and parallel compute engine
makes computation feasible and practical, and is easily addressable from within
the language. (4) The environment provides access to the 1000s of analytic
methods of statistics, machine learning, and visualization.
(5) Get a reasonable first system going right away.
In D&R, the analyst divides the data into subsets. Computationally, each
subset is a small dataset. The analyst applies analytic methods to each of
the subsets, and the outputs of each method are recombined to form a result
for the entire data. Computations can be run in parallel with almost no communication
among them, making them nearly embarrassingly parallel, the simplest possible
One of our D&R research thrusts uses statistics to develop "best" division
and recombination methods for analytic methods. This is critical because the
division and recombination methods have an immense impact on the statistical
accuracy of the D&R result for an analytic method.
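
The divide, apply, and recombine steps can be sketched in a few lines. This is a toy stand-in, not Tessera: a thread pool plays the role of the parallel back end, the analytic method is a plain mean, and all function names are hypothetical. The two recombination functions show how the choice of recombination method changes the accuracy of the result.

```python
from concurrent.futures import ThreadPoolExecutor


def divide(data, sizes):
    """Split data into consecutive subsets with the given sizes."""
    subsets, start = [], 0
    for size in sizes:
        subsets.append(data[start:start + size])
        start += size
    return subsets


def apply_method(subset):
    """Analytic method applied to one subset: its size and its mean."""
    return len(subset), sum(subset) / len(subset)


def recombine_unweighted(outputs):
    """Average the subset means, ignoring subset sizes."""
    return sum(mean for _n, mean in outputs) / len(outputs)


def recombine_weighted(outputs):
    """Average the subset means, weighting each by its subset size."""
    total = sum(n for n, _mean in outputs)
    return sum(n * mean for n, mean in outputs) / total


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
subsets = divide(data, sizes=[8, 2])  # deliberately unequal subsets

# Each subset is processed independently, with no communication between tasks.
with ThreadPoolExecutor() as pool:
    outputs = list(pool.map(apply_method, subsets))

exact = sum(data) / len(data)
unweighted = recombine_unweighted(outputs)
weighted = recombine_weighted(outputs)
print(exact, unweighted, weighted)
```

With unequal subsets, the size-weighted recombination reproduces the all-data mean, while the unweighted one does not. For methods less simple than a mean, the recombined result is generally an approximation.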
Another thrust is the D&R computational environment Tessera (http://tessera.io/).
The front end is R. The back end is the Hadoop distributed database and
parallel compute engine. Our Tessera software manages the communication between
front and back, enabling the analyst to program D&R wholly from within
R, insulated from the complexity of distributed database management and parallel