Cockpit: "Dependency Analysis"

Discover non-linear dependencies between variables and analyze them in detail.

Pro: This cockpit is only available in Visplore Professional.

Overview

This cockpit presents the dependency of chosen target variables on other, independent variables. The goal is to identify descriptive features for modeling the targets. Dependencies on pairs of independent variables are also displayed, to show the effect of parameter interactions. If there is more than one target variable, only one at a time can be chosen for detailed analysis.

In addition to visualizations of the relationships, the dependencies are also quantified by a relevance measure, where the independent variables (or pairs) are ranked according to the strength of the relationship. Especially with larger numbers of variables, this guidance towards the most relevant ones can be very helpful.

  1. Selection of the Target Variable: Here, you select the currently viewed target variable by a click. If there is only one target variable, this view is not available.
  2. Overview and Selection of 1D Dependencies: List of independent variables ordered according to relevance for the selected target. Click the independent variable to select it for detailed visualization. Use the search field to filter the list by text in the variable name.
  3. Overview and Selection of 2D Dependencies: Matrix of pairs of independent variables ordered by relevance for explaining the target variable. Click to select a pair for detailed visualization.
  4. 1D Dependency Details and Time Series: Detailed visualization of the selected relationship. Selecting a value range of the independent variable with the left mouse button makes the overviews visualize only the remaining target variance within the selection.
  5. 2D Dependency Details and Drill-Down: Detailed visualization of a selected pair of independent variables for the current target variable. Also allows data selection, such that the overviews focus on explaining the remaining variance within the selection.

Starting the cockpit: assigning semantic roles


The following roles can be assigned to data attributes in this cockpit. Use the icon in the toolbar to adjust them.


Selection of the Target Variable

This scrollable list contains all variables that were assigned the "Target" role on cockpit startup. Click a variable to make it the current target.


Overview and Selection of 1D Dependencies

This overview lists all independent variables, sorted by their relevance for the target variable. Relevance measures how much less the target varies when the value of the independent variable is known than when it is not known. Technically speaking, it is the R² (coefficient of determination) of a piecewise constant model with the independent variable as input. Because the assessment is piecewise, non-linear relationships are detected as well. The darker the background of a cell, the higher the relevance.
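The relevance measure described above can be illustrated with a minimal Python sketch. It assumes equal-width bins and a fixed bin count; the function name `piecewise_constant_r2` and the parameter `n_bins` are hypothetical, and the binning Visplore actually uses is not documented here.

```python
import numpy as np

def piecewise_constant_r2(x, y, n_bins=10):
    """R^2 of a model that predicts the mean of y within each bin of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Equal-width bin edges over the range of x (an assumption for this sketch).
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Bin index 0..n_bins-1 per point; using only the interior edges
    # places the maximum of x in the last bin.
    idx = np.digitize(x, edges[1:-1])
    total_ss = np.sum((y - y.mean()) ** 2)
    residual_ss = 0.0
    for b in range(n_bins):
        in_bin = y[idx == b]
        if in_bin.size:
            # Variation of the target that remains once the bin is known.
            residual_ss += np.sum((in_bin - in_bin.mean()) ** 2)
    return 1.0 - residual_ss / total_ss if total_ss > 0 else 0.0

# A purely non-linear dependency: the linear correlation is close to zero,
# but the piecewise constant model explains most of the target variance.
x = np.linspace(-3.0, 3.0, 601)
y = x ** 2
print(np.corrcoef(x, y)[0, 1], piecewise_constant_r2(x, y))
```

A value near 1 means the target varies little within each bin, while a value near 0 means knowing the independent variable barely reduces the variation of the target.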

The second key aspect of the overview is a visualization of the dependency of the target on each independent variable. It is represented by the conditional distribution of the target variable: The independent variable is binned into discrete intervals (or categories, if categorical) along the x-axis. In each bin, the distribution of the target is shown as a vertical boxplot.

The vertical box plots per bin show the median of the target values in black and the interquartile range in dark grey (25th to 75th percentile, i.e. the range in which the middle half of the sorted target data lies). The range between the 2.5th and the 97.5th percentile is displayed in light grey, i.e. the central 95% of the data, excluding outliers of the target variable. For a better understanding, you can click on a variable to display the plot enlarged and labeled in the "1D Dependency Details" view.
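As a rough illustration of the per-bin summary described above, the following Python sketch computes the five percentiles for each bin of a numeric independent variable. Equal-width binning, the function name `binned_box_stats`, and the dictionary keys are assumptions made for this example only.

```python
import numpy as np

def binned_box_stats(x, y, n_bins=10):
    """Per-bin percentiles of y, for equal-width bins of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.digitize(x, edges[1:-1])  # bin index 0..n_bins-1 per point
    stats = []
    for b in range(n_bins):
        in_bin = y[idx == b]
        if in_bin.size == 0:
            stats.append(None)  # empty bin: nothing to draw
            continue
        p2_5, q1, median, q3, p97_5 = np.percentile(in_bin, [2.5, 25, 50, 75, 97.5])
        stats.append({
            "p2_5": p2_5,      # lower end of the light grey range
            "q1": q1,          # lower end of the dark grey (interquartile) range
            "median": median,  # black line
            "q3": q3,          # upper end of the dark grey range
            "p97_5": p97_5,    # upper end of the light grey range
        })
    return edges, stats

edges, stats = binned_box_stats(np.arange(100.0), np.arange(100.0), n_bins=2)
print(stats[0]["median"], stats[1]["median"])
```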


Overview and Selection of 2D Dependencies

This overview shows a matrix of pairs of independent variables, sorted in descending order by relevance for the target variable. The relevance metric is the same as for the 1D Dependencies, except that the variable pair is binned into 2D regions. Pairs with high relevance, i.e. where the target varies systematically, tend to be at the top of the matrix. Pairs of less explanatory independent variables tend to be located further down and to the right.
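The 2D variant of the relevance metric can be sketched in the same spirit: each data point is assigned to a rectangular cell, and the model predicts the cell mean. This is an illustrative Python sketch under the assumption of an equal-width grid; `pairwise_r2` and `_bin_index` are hypothetical names, not part of Visplore.

```python
import numpy as np

def _bin_index(v, n_bins):
    """Equal-width bin index (0..n_bins-1) for each value of v."""
    v = np.asarray(v, dtype=float)
    edges = np.linspace(v.min(), v.max(), n_bins + 1)
    return np.digitize(v, edges[1:-1])

def pairwise_r2(x1, x2, y, n_bins=5):
    """R^2 of a model that predicts the mean of y within each 2D cell."""
    y = np.asarray(y, dtype=float)
    # Flatten the 2D cell coordinates into a single cell id per point.
    cell = _bin_index(x1, n_bins) * n_bins + _bin_index(x2, n_bins)
    counts = np.bincount(cell, minlength=n_bins * n_bins)
    sums = np.bincount(cell, weights=y, minlength=n_bins * n_bins)
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    residual_ss = np.sum((y - means[cell]) ** 2)
    total_ss = np.sum((y - y.mean()) ** 2)
    return 1.0 - residual_ss / total_ss if total_ss > 0 else 0.0

# An interaction effect: the target depends on the product of both variables.
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-1.0, 1.0, size=(2, 5000))
print(pairwise_r2(x1, x2, x1 * x2))
```

In this synthetic example the target is the product of the two variables, so the pair explains most of the variance even though neither variable would rank highly on its own.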

Apart from the order, a key aspect of this overview is the visualization of the dependency of the target on each pair. In each cell of the half-matrix, the 2D space of possible values of the variable pair is binned into 2D regions. In each region, the target values of the points in that region are averaged, and the average is shown as a color. Thus, systematic variations appear as color gradients in this 2D grid. For a better understanding, you can click on a plot to display it enlarged and labeled in the "2D Dependency Details" view.
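The averaging per 2D region can be sketched with NumPy's `histogram2d`. This is a minimal illustration assuming an equal-width grid; `mean_target_grid` is a hypothetical helper name, and the mapping from mean values to colors is left to the plotting layer.

```python
import numpy as np

def mean_target_grid(x1, x2, y, n_bins=5):
    """Mean of y in each cell of an equal-width 2D grid over (x1, x2)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y = np.asarray(y, dtype=float)
    # Number of points per cell, plus the grid edges along both axes.
    counts, x1_edges, x2_edges = np.histogram2d(x1, x2, bins=n_bins)
    # Sum of target values per cell, computed on the same grid.
    sums, _, _ = np.histogram2d(x1, x2, bins=[x1_edges, x2_edges], weights=y)
    # Empty cells stay NaN so that they can be left uncolored.
    means = np.full_like(sums, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means, counts, x1_edges, x2_edges

means, counts, _, _ = mean_target_grid(
    [0.0, 0.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0], [1.0, 2.0, 3.0, 4.0], n_bins=2
)
print(means)
```

The returned `counts` array holds the number of averaged values per cell, which is the quantity a detail view could use to scale the cell size.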


1D Dependency Details and Time Series

This diagram shows the relationship between a target variable on the Y axis and a selected independent variable on the X axis.

If a numeric independent variable has been selected, a scatter plot is used to display the relationship. Additionally, the distribution of target values is summarized by box plots (as in the overview display). A regression function also provides information about a possible relationship between the two variables. If data has been selected, the box plots and the regression refer only to the selected data (focus).


2D Dependency Details and Drill-Down

This diagram serves as a detail display for the overview diagram "2D Dependencies" and shows the currently selected plot, enlarged and labeled. The independent variables are split into regular intervals, and the mean value of the target variable in each resulting cell is color-coded. The size of each cell is scaled according to the number of averaged values.



License Statement for the Photovoltaic and Weather dataset used for Screenshots:
"Contains public sector information licensed under the Open Government Licence v3.0."
Source of Dataset (in its original form): https://data.london.gov.uk/dataset/photovoltaic--pv--solar-panel-energy-generation-data
License: UK Open Government Licence OGL 3: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
Dataset was modified (e.g. columns renamed) for easier communication of Visplore USPs.