Cockpit: "Correlations"
Discover correlations between variables and investigate correlations in detail.
Overview
- Overview of Correlations: A matrix showing pairwise correlations of variables (first view tab), and a list of all variables correlated with one chosen target variable (second view tab). In these views, you select the variables shown in detail in the rest of the cockpit.
- Scatter Plot for selected variable pair: Large detail scatter plot for variable pair selected in the overview. Possibility to select points, e.g., outliers.
- Correlation of selected pair per category (Drill-Down): The Pearson correlation of the selected variable pair compared for different categories or time periods.
- Time Series: Time series plots of the selected variables with the possibility to select records, and to zoom to selected time ranges.
- Table: A table-like view of single values that can be shown on demand.
Overview of Correlations
The "Pairwise Correlation" view is a half-matrix displaying the correlation for each pair-wise combination of variables.
- Clicking on a matrix cell (i.e., a pair of variables) shows the corresponding pair of variables in all other views.
- The color of a cell shows the Pearson correlation of the pair (red = positive, blue = negative, white = no linear correlation).
- The order of the variable pairs in the matrix is determined by their correlation value. The combination with the highest correlation is shown on top.
- The set of displayed variables can be configured by typing a text filter in the "Filter" field above the view, or by clicking the view title, "Pairwise Correlation", then "Select variables".
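As an illustration of what the half-matrix contains, the following minimal sketch computes the Pearson correlation for every pair of variables and lists the pairs with the highest correlation first. It is not Visplore's implementation; the function name, the column names and the random example data are hypothetical, and sorting by the signed correlation value is one reading of "highest correlation".

```python
import numpy as np
import pandas as pd

def pairwise_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per variable pair with its Pearson correlation, highest first."""
    corr = df.corr(method="pearson")
    cols = list(corr.columns)
    rows = []
    # Visit only the upper triangle: each pair appears once, as in a half-matrix.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            rows.append((cols[i], cols[j], corr.iloc[i, j]))
    pairs = pd.DataFrame(rows, columns=["var_a", "var_b", "pearson_r"])
    return pairs.sort_values("pearson_r", ascending=False).reset_index(drop=True)

# Hypothetical example data (not the dataset from the screenshots).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = pd.DataFrame({
    "irradiance": x,
    "power": 2.0 * x + rng.normal(scale=0.1, size=200),
    "cloud_cover": -x + rng.normal(scale=0.5, size=200),
})

pairs = pairwise_correlations(data)
print(pairs)
```

Three variables yield three pairs. The strongly positively correlated pair appears first, and the negatively correlated pairs follow.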
The "Target Correlation" view shows a list of all variables, each Pearson-correlated with one target variable.
- Clicking the name of a variable shows it, paired with the target variable, in all other views.
- The target variable can be selected by clicking the label above the color legend (e.g. "Correlation with Power_Generation_BrightCounty_PV" in the image above).
- The list of variables correlated with the target can be filtered by the "Filter" field above the view.
- Initially, the correlation with every variable is compared for different categories, like months. These are the columns in the right image above. This is useful for finding differences in correlation behaviour. To change the categories, or show only the overall correlation per variable, click the x-axis label (here: "DateTime [Month]"), then "Switch category", or "Remove".
- The variables are initially sorted by overall correlation with the target (disregarding categories), from strong positive at the top to strong negative at the bottom. Alternatively, click the y-axis label "Variables" and choose "Order by" for further options. For example, "Variance" brings to the top those variables whose correlation with the target varies strongly across the categories on the x-axis.
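The idea behind the "Target Correlation" view can be sketched as follows: correlate every variable with the target once per category and once overall, then order the rows either by overall correlation or by how much the correlation varies across categories. This is a sketch under assumptions, not Visplore's implementation; the function name, the column names ("target", "stable", "flipping", "month") and the example data are hypothetical.

```python
import numpy as np
import pandas as pd

def target_correlation_by_category(df, target, category):
    """Pearson correlation of every numeric variable with `target`, per category."""
    variables = [c for c in df.select_dtypes(include="number").columns
                 if c not in (target, category)]
    per_category = {}
    for cat, group in df.groupby(category):
        per_category[cat] = group[variables].corrwith(group[target])
    table = pd.DataFrame(per_category)  # rows: variables, columns: categories
    table["overall"] = df[variables].corrwith(df[target])
    return table

# Hypothetical data: "stable" always follows the target, "flipping" reverses in March.
rng = np.random.default_rng(1)
n = 300
month = np.repeat(["Jan", "Feb", "Mar"], 100)
target = rng.normal(size=n)
sign = np.where(month == "Mar", -1.0, 1.0)
df = pd.DataFrame({
    "month": month,
    "target": target,
    "stable": target + rng.normal(scale=0.3, size=n),
    "flipping": sign * target + rng.normal(scale=0.3, size=n),
})

table = target_correlation_by_category(df, "target", "month")

# Default order: overall correlation, strong positive first.
by_overall = table.sort_values("overall", ascending=False)

# "Variance" order: variables whose correlation differs most between categories first.
variance = table.drop(columns="overall").var(axis=1)
by_variance = table.loc[variance.sort_values(ascending=False).index]
print(by_variance)
```

In this example the variable "flipping" has only a moderate overall correlation, so it is easy to overlook in the default order. Ordering by variance moves it to the top because its correlation changes sign in one month.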
Scatter Plot for the selected variable pair
This view shows the currently selected variable pair as a detailed two-dimensional scatter plot. In addition, a regression line is displayed on top of the points. In the view-specific options of the view title menu, the order of the regression line can be set (linear, quadratic or cubic).
It is also possible to select points (data records) in the diagram by dragging the left mouse button. Selected points are brought into focus, while unselected points are shaded in gray. Two regression lines are then displayed: one line for the records in the focus, and another (gray) regression line for the records outside the focus. The selection affects the remaining diagrams. For example, the "Pairwise Correlation" diagram now uses the currently selected records as the basis for calculating the correlation for the combinations of variables.
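The effect of a selection on the regression lines can be illustrated with a minimal sketch: a rectangular selection is expressed as a boolean mask, and one polynomial is fitted to the records in focus and another to the remaining records. This is not Visplore's implementation; the helper name, the example data and the least-squares fit via numpy.polyfit are assumptions made for illustration.

```python
import numpy as np

def fit_regression(x, y, order=1):
    """Least-squares polynomial of the given order (1 = linear, 2 = quadratic, 3 = cubic)."""
    return np.poly1d(np.polyfit(x, y, deg=order))

# Hypothetical data with a roughly linear relationship.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=500)
y = 3.0 * x + 1.0 + rng.normal(scale=0.5, size=500)

# A 2D rectangle selection, expressed as a boolean mask over the records.
in_focus = (x >= 2) & (x <= 5) & (y >= 0) & (y <= 20)

focus_line = fit_regression(x[in_focus], y[in_focus], order=1)    # line for the focus
rest_line = fit_regression(x[~in_focus], y[~in_focus], order=1)   # gray line for the rest

# Correlation recomputed on the focus only, as other views would use it.
r_focus = np.corrcoef(x[in_focus], y[in_focus])[0, 1]

print("slope in focus:", focus_line.coeffs[0])
print("slope outside focus:", rest_line.coeffs[0])
print("Pearson r in focus:", r_focus)
```

Passing order=2 or order=3 to the helper corresponds to the quadratic and cubic options of the regression line.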
Further options in the view title menu, when clicking the label "Scatter Plot":
- Selection mode: Available alternatives to the 2D rectangle are 1D intervals, a straight line selecting all points on one side of it, or a free-form Lasso selection.
- Automatic zooming: enabling this feature will let the view zoom automatically to the focus, whenever a new focus is defined.
Correlation of selected pair per category (Drill-Down)
These views show the Pearson Correlation of the selected variable pair for different categories, using colors. Each colored cell (or bar) aggregates the data that falls within the corresponding (combination of) categories.
- Calendar: is only available if a Date/Time-typed data attribute is present. It shows one cell per day.
- Pivot Table: shows multiple statistics per (combination of) categories. The categories can be selected via the drop-down menu that appears when hovering over the axis label.
- Bar Chart: a bar chart based on (combinations of) categories. The bar length encodes the number of data records per category. The categories for the bars can be selected via the drop-down menu that appears when hovering over the axis label.
- Heatmap: a grid-based matrix display, built using (combinations of) categories on both the horizontal and the vertical axis. The size of the cells encodes the number of data records per category combination. The categories used for building the chart can be selected via the drop-down menu that appears when hovering over the axis labels.
- Categories: Similar to 'Heatmap', for up to four categorical attributes at once.
The views can be used to perform selections of categories by clicking the category labels, which defines a focus for the other views. If a focus was already defined in other views, the Drill-Down views only consider the data records of the focus.
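What a Drill-Down view aggregates can be sketched as follows: for each category, compute the Pearson correlation of the selected pair (shown as color) together with the number of records (shown as bar length or cell size), optionally restricted to the records in focus. This is a sketch under assumptions, not Visplore's implementation; the function name, the column names and the example data are hypothetical.

```python
import numpy as np
import pandas as pd

def pair_correlation_per_category(df, var_a, var_b, categories, focus=None):
    """Pearson correlation of (var_a, var_b) and record count per category.

    `categories` is a column name or a list of column names (for combinations).
    `focus` is an optional boolean mask; only records in focus are considered.
    """
    if focus is not None:
        df = df[focus]
    grouped = df.groupby(categories)[[var_a, var_b]]
    pearson_r = grouped.apply(lambda g: g[var_a].corr(g[var_b]))
    n_records = grouped.size()
    return pd.DataFrame({"pearson_r": pearson_r, "n_records": n_records})

# Hypothetical data: the relationship reverses on holidays.
rng = np.random.default_rng(3)
n = 400
holiday = np.where(np.arange(n) % 4 == 0, "yes", "no")
temperature = rng.normal(size=n)
load = np.where(holiday == "yes", -temperature, temperature) + rng.normal(scale=0.2, size=n)
df = pd.DataFrame({"holiday": holiday, "temperature": temperature, "load": load})

result = pair_correlation_per_category(df, "temperature", "load", "holiday")
print(result)

# With a focus defined elsewhere, only the focused records are aggregated.
focus = df["temperature"] > 0
focused = pair_correlation_per_category(df, "temperature", "load", "holiday", focus=focus)
print(focused)
```

The example shows why the per-category comparison is useful: the correlation is clearly positive for one category and clearly negative for the other, which a single overall value would hide.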
Time Series
These plots show variables over time. If the time stamps are regularly spaced, the points are connected as a line graph. If more than one variable is shown, there are two ways of distinguishing them:
- The tab Time Series plots them in the same graph, and distinguishes them by colors. Instead of displaying individual scales for each variable, you can select a common scale, or normalize the values for display, by clicking the y-axis label, and selecting a different option in the drop-down.
- The tab Time Series (stacked) shows them side by side (limited to 5 variables at a time).
Key actions:
- Zooming: Drag a rectangle with the right mouse button to zoom in. Alternatively, use CTRL + mouse wheel to zoom in or out. Once zoomed in, there is a button in the bottom-left corner of the diagram with arrows pointing outwards. Click it to zoom out completely again.
- Select time periods by dragging the left mouse button to put them into focus for the other views.
- Adjust axis range by clicking the top/bottom label of the axis. Note: if you want to adjust multiple axes, adjust them from left to right.
The diagram offers several options when clicking the view title "Time Series". The most important ones are:
- Selection mode: changes the tool for selecting data records in the view. Available alternatives to the 2D rectangle are 1D intervals, or a free-form Lasso selection.
- Trend overlays: In addition to the displayed time series, a smoothed version of the time series can be displayed to analyze long-term trends. This is done by calculating a moving average over adjacent time points.
- Automatic zooming: enabling this feature will let the time series zoom automatically to the focus, whenever a new focus is defined.
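The trend overlay described above is based on a moving average. A minimal sketch of such a smoothing step is shown below; the window size, the centered window and the handling of the series ends are assumptions for illustration and may differ from what Visplore does.

```python
import numpy as np
import pandas as pd

def trend_overlay(series: pd.Series, window: int = 24) -> pd.Series:
    """Smooth a time series with a centered moving average over `window` points."""
    # min_periods=1 keeps values at the ends, averaged over fewer points.
    return series.rolling(window=window, center=True, min_periods=1).mean()

# Hypothetical signal: slow upward trend plus a cycle of 24 time steps.
t = np.arange(240)
series = pd.Series(0.01 * t + np.sin(2 * np.pi * t / 24))

trend = trend_overlay(series, window=24)
print(trend.head())

# Tiny example that can be checked by hand.
small = trend_overlay(pd.Series([1.0, 2.0, 3.0, 4.0, 5.0]), window=3)
print(small.tolist())  # [1.5, 2.0, 3.0, 4.0, 4.5]
```

A window that matches the length of a recurring cycle (here 24 steps) averages the cycle out, so mainly the long-term trend remains.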
Table ('Focus data records')
A table showing the single values. Only considers the data records in focus. Click the header of a column to sort the table by it. To change the set of displayed columns, click the header "Shown: 103 of 210 data attributes". Click single rows or drag a line with the left mouse button to select records, putting them into focus.
Exporting: A key use case of the table is exporting a selected subset of the data. To export the current state of the table, click on "Export" in the top right of this view.
Roles supported by this cockpit
The following roles can be given to data attributes in this cockpit. Use the icon in the toolbar to adjust them.
- Time axis: This role can be given to a data attribute defining a temporal ordering of the data records. The attribute can contain time stamps or other values. It is used to define the temporal context for all variables, e.g. the times of measurement, consumption, production, etc. If the role is assigned to a data attribute of date/time type, periods like 'Month', 'Hour' etc. are extracted and are available for defining filters and categorical plots like bar charts.
- Variable: Numerical data attributes with this role can be inspected in the cockpit and are considered in calculations. In case you want to exclude a variable from all considerations, simply do not assign this role to it.
- Category: Some views aggregate values by categories (e.g. per day of week, per month, etc.). When this role is assigned to a categorical data attribute, its categories are available for such aggregations. Example: Assigning this role to a 'Holiday' variable with categories 'yes' and 'no' allows comparing values like energy production for holidays vs. non-holidays. The role can also be assigned to numerical data attributes, which results in one category per distinct value of that attribute (e.g. different states encoded as 0 or 1). If the role is given to a data attribute of date/time type, periods like 'Month', 'Hour' etc. are extracted and available for defining filters, and categorical plots like bar charts.
- Asset ID: Can be used to specify columns holding "Assets" as categories, that should be distinguished/compared in the analysis.
- Upper limit (Visplore Professional only): A variable with this role defines an upper limit for another numerical variable. It is shown along with the referenced variable in time series views.
- Lower limit (Visplore Professional only): analogous to Upper limit, but for lower limits.
- Setpoint (Visplore Professional only): analogous to Upper limit, but variables with this role represent a setpoint (= desired state) for the referenced variable.
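To illustrate what the "Time axis" and "Category" roles describe, the following sketch derives periods such as 'Month' and 'Hour' from a date/time attribute and treats the distinct values of a numerical attribute as categories. It is only an illustration with pandas; the function name, the choice of periods and the example values are hypothetical and do not reflect how Visplore extracts them internally.

```python
import pandas as pd

def extract_periods(timestamps: pd.Series) -> pd.DataFrame:
    """Derive categorical periods from a date/time attribute."""
    return pd.DataFrame(
        {
            "Month": timestamps.dt.month,
            "Day of week": timestamps.dt.day_name(),
            "Hour": timestamps.dt.hour,
        },
        index=timestamps.index,
    )

# Hypothetical time stamps.
timestamps = pd.Series(
    pd.to_datetime(["2021-01-01 08:00", "2021-01-02 14:30", "2021-02-15 23:00"])
)
periods = extract_periods(timestamps)
print(periods)

# A numerical attribute used as a category: one category per distinct value.
state = pd.Series([0, 1, 1, 0])
state_categories = list(state.astype("category").cat.categories)
print(state_categories)  # [0, 1]
```

The derived columns can then serve as grouping keys for aggregations such as a correlation per month or per hour.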
License Statement for the Photovoltaic and Weather dataset used for Screenshots:
"Contains public sector information licensed under the Open Government Licence v3.0."
Source of Dataset (in its original form): https://data.london.gov.uk/dataset/photovoltaic--pv--solar-panel-energy-generation-data
License: UK Open Government Licence OGL 3: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
Dataset was modified (e.g. columns renamed) for easier communication of Visplore USPs.