Analyzing tables in long format


Data tables in 'long' format are increasingly common for IoT/sensor data and time series databases. 'Long' means that different time series are not (or not only) represented by different columns of the table, but additionally need categories of categorical column(s) to be uniquely identified. A simple example is a 'Device name', 'Location' or other ID in a categorical column, that refers to independent assets. Therefore, with additional assets, such a table becomes 'longer', and not 'wider' as would be the case with wide tables. Timestamps can occur multiple times (once per category), or be very different for each sensor, which is also in contrast to typical wide tables. The image below shows an example of different Wind_turbine_names as asset categories:


Long table intro
In Visplore, the distinction by assets is not made automatically. This chapter describes, which settings should be set in order to effectively explore and analyze data tables in long format, based on an open Wind Power dataset you can download from here: ENGIE La Haute Borne windfarm dataset (modified, see credits at bottom of page). Some types of analysis will only work for tables in wide format, e.g. computing a correlation coefficient between different assets. Therefore, this chapter also describes the transformation of long to wide tables, which can be achieved in Visplore using Pivot tables.



Settings to work with data in long format

When loading a table in long format in Visplore (e.g. the demo CSV from above), Visplore will treat all 'assets' as part of the same variable (e.g. a mean value across all measurements, regarding which wind turbine they are from). In most cases, this is not desired, but assets should be treated independently. Look at the image below. The timeline holds measurements from all four turbines (the 'asset categories'). Also, the timeline is showing discrete points instead of connected lines by default, because each timestamp occurs multiple times (once per wind turbine), and therefore is not assumed to be a continuous time series.
You need to make a couple of configurations in Visplore, to work with such datasets in long format appropriately.


First, we recommend you distinguish assets by color, and make the display of the Time Series view continuous.


With one continuous line per asset category, now it is easy to distinguish the values of the 4 wind turbines. You can also plot the variables of each wind turbine separately, but it requires further settings:


As next step, we suggest to adjust either the Bar Chart or the Pivot Table tab to show the different wind turbines - in order to focus on specific assets at time. In this example, we are making the necessary settings in the Bar Chart tab, which you find in the "Drill Down" block in the top right corner of Visplore.

Make sure, the "Wind_turbine_name" category is displayed as the main category of the bar chart (see here to recap details on how to do this).

Select (=click) one bar to put one of the wind turbines in focus as shown in the image on the right. Before this step, make sure the Focus bar is empty.


The last step is to hide non-selected data in the Time Series view, so you won't see the data that are not in focus (= all non-selected turbines).


Great! You have made the settings needed to work with a dataset in long format. If you want, you can save the configurations as a .visplore template, which you can apply to new data of the same structure next month again.

Basic workflow at this point:
Advanced workflow tips for datasets in long format:



Transforming long tables to wide tables in Visplore

It is possible to transform the dataset to wide format directly in Visplore. You can use the Pivot table, to build the desired table in Wide format directly in Visplore (rastered timestamps as rows, assets/variables as columns). Do this as follows:

If you have a focus or filter defined, clear them for now (press the small "x" symbols next to the words "Filter" and "Focus" at the top of Visplore Clear Filter IconClear Focus Icon)

Open the Pivot Table (in the "Drill Down" section of Visplore, in the top right) and hide every statistics except the 'Mean' value.


You need to change the category on the Y-axis to see the mean values in a minute raster for the whole period. Note: this will aggregate all samples of the same minute to one row in the Pivot Table. As we only have a 10-minute resolution in our data, there will only be 10-minute timestamps shown, despite us allowing up to minute resolution.


Note: If you want hourly mean values, choose the "Year/Month/Day/Hour" category instead.

Now you need to subdivide the 'Mean' value by the Wind_turbine_name category so every turbine is displayed.


Note: In case you have more than one asset category you need to use multiple subdivisions.

The last step before exporting to wide format is the selection of the sensor variables we want to keep in our wide table. Do this by using the checkmarks in the 'Statistics' plot. If you want to select all of them, an easy way to do this is: [1] select the first variable (Absolutewinddirection_avg) in the Statistics view, [2] scroll down to the last variable, [3] hold down the 'Shift' key on your keyboard, [4] select the last variable (Windspeed_std).


Now the Pivot Table is set, so you can export it and load it in a new Visplore instance.


Great! Now a new Visplore instance is loaded with the dataset in wide format. Using the wide format, you can compute a Pearson correlation between the different assets and use the Multivariate Regression cockpit.




ENGIE La Haute Borne windfarm dataset is used for screenshots.
Source of Dataset (in its original form): https://opendata-renewables.engie.com/
License: Open License version 2.0 published by Etalab: https://www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf
Dataset was modified (e.g. columns renamed) for easier communication of Visplore USPs. Download the modified version, to follow our demo examples, from here.