Cockpit: "Trends and Distributions"
Analyze any number of variables for cycles, trends, anomalous patterns, and structural breaks. Relate multiple variables, summarize them with statistics, and drill down to single values.
This cockpit provides a first overview of your data - even without concrete questions in advance. You can check the plausibility of your data, inspect individual variables in detail, or get the big picture of hundreds of variables using several overviews. For example, find groups of correlating sensors, and see how events propagate over time across multiple sensors. Compare statistics and value distributions for multiple time periods or other categories. Find trends and catch outliers, cleaning them on-the-fly.
- Overview of variables: Various overviews displaying all variables, with the option of selecting one or more of them, and of filtering them by text input.
- Drill Down: View values of a selected variable aggregated according to time aspects or categories, and select time ranges for a detailed view by clicking them.
- Time Series: Time series plots of the selected variables with the possibility to select records, and to zoom to selected time ranges.
- Value Distribution: Various views showing the value distribution of the selected variables, including histograms and scatter plots.
- Table: A table-like view of single values that can be shown on demand.
Overview of variables
The diagrams in this group are primarily used to select variables to be displayed in detail in other diagrams.
- Statistics: A statistical overview that shows statistical figures per variable, considering only the records in focus. Sort the table by any statistic, and select the set of shown statistics by clicking the gray label "Statistics" on top.
- Heatmaps: An overview of similar groups of variables. It shows the mean value (or another statistic) of each variable over time (or other categories) using color, aggregated to a coarse raster (e.g. days). The variables are sorted by similarity of these statistics, to bring similar variables close together. Each variable is normalized, so that high or low values can be recognized from colors even when the value ranges are different. The aggregation raster, statistic, normalization method, and sorting method can be changed by clicking the areas highlighted in gray when hovering the view.
- Horizon graphs: An overview of time series graphs that emphasizes extreme values by color. Only available when a date/time attribute is loaded. The colors show whether values are high or low, using a color scheme of discrete steps.
- Histograms: A histogram per variable to compare value distributions. When a focus is defined (by selecting data records in another diagram), the list is reordered so that the variables whose value distribution differs most between the records in focus and the records not in focus appear at the top. For example, select an outlying population and see which variables may characterize it.
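The aggregation and normalization behind the Heatmaps overview can be sketched as follows. This is only an illustration, not Visplore's implementation: the function names `daily_means` and `min_max_normalize` are hypothetical, the raster is fixed to days, and min-max scaling is assumed as one possible normalization method.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def daily_means(timestamps, values):
    """Aggregate one variable to a daily raster by averaging all values of each day."""
    buckets = defaultdict(list)
    for ts, value in zip(timestamps, values):
        buckets[ts.date()].append(value)
    return {day: mean(vals) for day, vals in sorted(buckets.items())}

def min_max_normalize(series):
    """Rescale the aggregated values of one variable to the range [0, 1].

    Assumption: min-max scaling; the normalization actually used may differ.
    """
    lo, hi = min(series.values()), max(series.values())
    if hi == lo:
        return {key: 0.0 for key in series}
    return {key: (val - lo) / (hi - lo) for key, val in series.items()}

# Made-up sample data: five temperature readings spread over three days.
timestamps = [datetime(2024, 1, 1, 6), datetime(2024, 1, 1, 18),
              datetime(2024, 1, 2, 6), datetime(2024, 1, 2, 18),
              datetime(2024, 1, 3, 12)]
temperature = [10.0, 14.0, 20.0, 24.0, 17.0]

# One normalized cell per day; a color scale would map 0 to "low" and 1 to "high".
cells = min_max_normalize(daily_means(timestamps, temperature))
```

Because every variable is rescaled independently, variables with very different value ranges end up on the same color scale, which is what makes high and low phases comparable across rows.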
Drill Down
These diagrams initially display the mean value of one selected variable per category, using color. Each colored cell (or bar) aggregates the data that belongs to the corresponding (combination of) categories.
- Calendar: Shows one cell per day. Only available if a data attribute of date/time type is present.
- Pivot Table: Shows multiple statistics per (combination of) categories. The categories can be selected via a drop-down menu when hovering over the axis label.
- Bar Chart: A bar chart based on (combinations of) categories. The bar length encodes the number of data records per category. The categories for the bars can be selected via a drop-down menu when hovering over the axis label.
- Heatmap: A grid-based matrix display, built from (combinations of) categories on both the horizontal and the vertical axis. The size of the cells encodes the number of data records per category combination. The categories used for building the chart can be selected via a drop-down menu when hovering over the axis labels.
- Categories: Similar to 'Heatmap', for up to four categorical attributes at once.
The views can be used to perform selections of categories by clicking the category labels, which defines a focus for the other views. If a focus was already defined in other views, the Drill-Down views only consider the data records of the focus.
Change aggregation: For Bar Chart, Heatmap, and Categories, the categories can be changed by clicking the area near the axis that is highlighted in gray.
Use statistics other than the mean value for coloring: Click the label of the color legend and choose one of the statistics.
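The aggregation that these views perform can be pictured as a group-by over category combinations. The sketch below is a minimal illustration with a hypothetical `aggregate` helper and made-up records; it computes the record count (shown as bar length or cell size) and the mean (shown as color) per combination.

```python
from collections import defaultdict
from statistics import mean

def aggregate(records, category_keys, value_key):
    """Group records by their combination of categories and summarize one variable."""
    groups = defaultdict(list)
    for record in records:
        combo = tuple(record[key] for key in category_keys)
        groups[combo].append(record[value_key])
    return {combo: {"count": len(vals), "mean": mean(vals)}
            for combo, vals in groups.items()}

# Made-up sample records with two categorical attributes and one variable.
records = [
    {"weekday": "Mon", "holiday": "no", "power": 4.0},
    {"weekday": "Mon", "holiday": "no", "power": 6.0},
    {"weekday": "Mon", "holiday": "yes", "power": 1.0},
    {"weekday": "Tue", "holiday": "no", "power": 5.0},
]

# One cell per (weekday, holiday) combination.
cells = aggregate(records, ["weekday", "holiday"], "power")
```

Restricting `records` to the data records in focus before calling `aggregate` would mimic how the views behave once a focus has been defined elsewhere.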
Time Series
These plots show variables over time. If the time axis has a regular raster, the data points are connected as a line graph. If more than one variable is shown, there are two ways of distinguishing them:
- The tab Time Series plots them in the same graph, and distinguishes them by colors. Instead of displaying individual scales for each variable, you can select a common scale, or normalize the values for display, by clicking the y-axis label, and selecting a different option in the drop-down.
- The tab Time Series (stacked) shows them side by side (limited to 5 variables at a time).
- Zooming: Drag a rectangle with the right mouse button to zoom in. Alternatively, use CTRL + mouse wheel to zoom in or out. Once zoomed in, there is a button in the bottom-left corner of the diagram with arrows pointing outwards. Click it to zoom out completely again.
- Select time periods by dragging the left mouse button to put them into focus for the other views.
- Adjust axis range by clicking the top/bottom label of the axis. Note: if you want to adjust multiple axes, adjust them from left to right.
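Whether a time axis has a regular raster, which decides if the points are connected as a line graph, can be checked along the following lines. This is a sketch under the assumption that "regular" means all consecutive time steps are exactly equal; the criterion the cockpit actually applies is not documented here, and `has_regular_raster` is a hypothetical name.

```python
from datetime import datetime, timedelta

def has_regular_raster(timestamps):
    """Return True if all consecutive time steps of the sorted timestamps are identical."""
    ordered = sorted(timestamps)
    steps = {later - earlier for earlier, later in zip(ordered, ordered[1:])}
    return len(steps) <= 1

start = datetime(2024, 1, 1)

# Five measurements, exactly 15 minutes apart.
regular = [start + timedelta(minutes=15 * i) for i in range(5)]

# The last measurement arrives 20 minutes after the previous one.
irregular = regular[:3] + [start + timedelta(minutes=50)]

print(has_regular_raster(regular))
print(has_regular_raster(irregular))
```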
The diagram offers several options when clicking the view title "Time Series". The most important ones are:
- Selection mode: changes the tool for selecting data records in the view. Available alternatives to the 2D rectangle are 1D intervals, or a free-form Lasso selection.
- Trend overlays: In addition to the displayed time series, a smoothed version of the time series can be displayed to analyze long-term trends. This is done by calculating a moving average over adjacent time points.
- Automatic zooming: When enabled, the time series zooms to the focus automatically whenever a new focus is defined.
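The moving average behind the trend overlay can be illustrated with a few lines of Python. The sketch assumes a centered window of odd size that shrinks at the edges of the series; the window size and edge handling used by the cockpit are not specified here, and `moving_average` is a hypothetical name.

```python
def moving_average(values, window=3):
    """Centered moving average over adjacent points.

    Assumptions: `window` is odd, and the window shrinks at both ends of the series.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        neighbours = values[lo:hi]
        smoothed.append(sum(neighbours) / len(neighbours))
    return smoothed

# Made-up series with a single spike at the third point.
series = [1.0, 2.0, 9.0, 4.0, 5.0]
trend = moving_average(series, window=3)
```

A larger window smooths more strongly and emphasizes long-term trends, at the cost of flattening short-lived peaks such as the spike in the sample series.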
Value Distribution
- Histogram: A histogram that shows the value distribution of up to five selected variables, and allows the selection of value intervals. If data is selected in another view, the histogram is immediately recalculated for the selected data. The histogram for the entire set of data records is then displayed as a gray context in the background. The scaling is initially in percent of data records and can be changed to absolute counts.
- Duration Curves: This diagram shows the value distribution of the selected variables, similar to a distribution function. The Y axis displays the values of the variables, not as a function of time, but as a sequence of values sorted in descending order (i.e., the X axis is a percentage of the sorted data records). A slowly decreasing or even horizontal segment of the duration curve indicates many records having similar values at that Y position. This representation is suitable for an in-place comparison of value distributions for multiple variables of similar scale, and for quickly selecting percentiles of the data.
- Scatter Plot: Shows the joint distribution of two selected variables. The points in focus (= selected points in time) are displayed in full intensity. Data points out of focus are displayed as context in gray. A regression function (linear, quadratic, or cubic polynomial) provides information about a possible relationship between the two variables.
- 3D Scatter Plot: Shows the joint distribution of three selected variables. The points in focus (= selected points in time) are displayed in full intensity, others are grayed out. Rotate the view with the left mouse button, move it with the right mouse button, and zoom with the mouse wheel.
- Parallel Coordinates: In this diagram, the values of each variable are plotted on separate vertical axes. Values that belong to the same data record are connected by a polyline. Each polyline thus represents the values of all shown variables at the same time. The density of the polylines between two axes indicates frequently occurring value combinations, while the structure of the polyline bundle between two axes can indicate a relationship between variables.
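How a duration curve is constructed can be sketched as follows. The code is an illustration only: `duration_curve` is a hypothetical helper, and the convention of placing the k-th largest of n values at x = 100 * k / n percent is an assumption.

```python
def duration_curve(values):
    """Return (percent, value) points: values sorted descending against their rank in percent."""
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    return [(100.0 * (rank + 1) / n, value) for rank, value in enumerate(ordered)]

# Made-up load values; the two records at 8.0 form a short horizontal segment.
load = [3.0, 8.0, 5.0, 8.0, 1.0]
curve = duration_curve(load)

# Selecting an x interval corresponds to selecting a percentile range,
# here the highest 40 percent of the records.
top_40_percent = [value for percent, value in curve if percent <= 40.0]
```

Time order is discarded entirely: only the sorted values remain, which is why flat segments indicate many records with similar values.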
Table ('Focus data records')
A table showing the single values. Only considers the data records in focus. Click the header of a column to sort the table by it. To change the set of displayed columns, click the header "Shown: 103 of 210 data attributes". Click single rows or drag a line with the left mouse button to select records, putting them into focus.
Exporting: A key use case of the table is exporting a selected subset of the data. To export the current state of the table, click the view title "Focus data records", and then "Data export".
Roles supported by this cockpit
The following roles can be given to data attributes in this cockpit. Use the icon in the toolbar to adjust them.
- Time axis: This role can be given to a data attribute that defines a temporal ordering of the data records. The attribute can contain time stamps or other values. It is used to define the temporal context for all variables, e.g. the times of measurement, consumption, production, etc. If the role is assigned to a data attribute of date/time type, periods like 'Month', 'Hour' etc. are extracted and available for defining filters and for categorical plots like bar charts.
- Variable: Numerical data attributes with this role can be inspected in the cockpit and are considered in calculations. In case you want to exclude a variable from all considerations, simply do not assign this role to it.
- Category: Some views aggregate values by categories (e.g. per day of week, per month, etc.). When this role is assigned to a categorical data attribute, its categories are available for such aggregations. Example: Assigning this role to a 'Holiday' variable with categories 'yes' and 'no' allows comparing values like energy production for holidays vs. non-holidays. The role can also be assigned to numerical data attributes, which results in one category per distinct value of that attribute (e.g. different states encoded as 0 or 1). If the role is given to a data attribute of date/time type, periods like 'Month', 'Hour' etc. are extracted and available for defining filters, and categorical plots like bar charts.
- Upper limit (Visplore Professional only): A variable with this role defines an upper limit for another numerical variable. It is shown along with the referenced variable in time series views.
- Lower limit (Visplore Professional only): analogous to Upper limit, but for lower limits.
- Setpoint (Visplore Professional only): analogous to Upper limit, but variables with this role represent a setpoint (= desired state) for the referenced variable.
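The two ways in which categories arise from the Time axis and Category roles can be illustrated with a short sketch. The helper names `extract_periods` and `categories_of` are hypothetical, and the numeric encoding of the periods is an assumption made for this example; the cockpit may label and encode them differently.

```python
from datetime import datetime

def extract_periods(timestamp):
    """Derive categorical periods from one time stamp (encoding is illustrative)."""
    return {
        "Month": timestamp.month,                # 1..12
        "Day of week": timestamp.isoweekday(),   # 1 = Monday .. 7 = Sunday
        "Hour": timestamp.hour,                  # 0..23
    }

def categories_of(values):
    """A numerical attribute used as a category yields one category per distinct value."""
    return sorted(set(values))

# Periods extracted from a date/time attribute.
periods = extract_periods(datetime(2024, 3, 15, 14, 30))

# States encoded as 0 or 1 become two categories.
states = categories_of([0, 1, 1, 0, 1])
```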
License Statement for the Photovoltaic and Weather dataset used for Screenshots:
"Contains public sector information licensed under the Open Government Licence v3.0."
Source of Dataset (in its original form): https://data.london.gov.uk/dataset/photovoltaic--pv--solar-panel-energy-generation-data
License: UK Open Government Licence OGL 3: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
Dataset was modified (e.g. columns renamed) for easier communication of Visplore USPs.