Filtering data

Data filters let you selectively hide parts of the data from all visualizations and calculations in the cockpit, so that the analysis is limited to the interesting parts. For example, you can keep only those data records that belong to certain time periods, categories, or value ranges, or hide irrelevant parts of the data with one click.



Note: Filters always apply to entire data records. This means that whole rows of your data table are filtered out, which affects all of your variables. If you only have outliers in a single variable, take a look at the chapter on data cleaning instead.
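The row-level behavior described in this note can be sketched in plain Python. This is only an illustration of the concept, not Visplore's internal implementation; the shortened column names and the `apply_filter` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical data table: each dict is one data record (one row) holding several variables.
records = [
    {"DateTime": datetime(2014, 7, 1, 12), "Peak_Wind_Speed": 3.2, "Peak_Wind_Direction": "NW"},
    {"DateTime": datetime(2014, 7, 1, 13), "Peak_Wind_Speed": 7.9, "Peak_Wind_Direction": "E"},
    {"DateTime": datetime(2014, 8, 2, 12), "Peak_Wind_Speed": 12.4, "Peak_Wind_Direction": "WSW"},
]


def apply_filter(rows, condition):
    """Keep the rows for which condition(row) is True; a rejected row disappears with all of its variables."""
    return [row for row in rows if condition(row)]


# The condition only looks at wind speed, but the timestamp and direction of the
# first record are removed together with its speed value.
strong_wind = apply_filter(records, lambda row: row["Peak_Wind_Speed"] >= 5)
print([row["Peak_Wind_Direction"] for row in strong_wind])  # ['E', 'WSW']
```

This is why a filter is the wrong tool for an outlier in a single variable: it would also discard the valid values of every other variable in that row.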



Preparation (to follow this lesson using demo data)

If you have not done so yet, load the demo dataset and start the 'Trends and Distributions' Cockpit, as described at the beginning of the first lesson of the step-by-step guide.

Select the time series "Peak_Wind_Speed_BrightCounty_Weather" by clicking its name in the statistics view.

Open the "Bar Chart" and make sure it shows "Peak_Wind_Direction_BrightCounty_Weather" (if it does not, proceed as described in the lesson on categorical data).

As a result, Visplore should look like this:






Create a filter to keep only relevant data parts

You can define filters based on any data attribute to limit your analysis to data from certain time ranges, value ranges, or categories.

Note: Any filter you define affects all visualizations and exports, including statistics, time series plots, and histograms.
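To illustrate why statistics change along with the plots, here is a minimal sketch with made-up wind speed values: every summary is computed from the rows that pass the filter, so one filter changes counts and means alike. The values and the 5 m/s threshold are illustrative only and do not come from the demo dataset.

```python
from statistics import mean

# Made-up peak wind speeds in m/s, one per data record.
wind_speeds = [2.0, 4.0, 6.0, 8.0]

# A filter keeping only records with at least 5 m/s.
filtered = [v for v in wind_speeds if v >= 5]

# Every derived figure is computed on the filtered rows, not on the full table.
print(len(wind_speeds), mean(wind_speeds))  # 4 5.0
print(len(filtered), mean(filtered))        # 2 7.0
```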

Filtering by time intervals

Let's limit our analysis to the summer months of July and August.

Create a new data filter based on DateTime [Month] and select months 7 and 8.



Note: If your data spans more than one year, DateTime [Month] puts all Julys into one category and all Augusts into another. To get one category per year/month combination (e.g. 07/2014), choose DateTime [Year/Month] instead.

Note: You can also define a custom time interval (e.g. "2.5.2014 - 7.5.2014") by selecting the DateTime attribute itself for the data filter, instead of an extracted period such as [Month].
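The difference between the three ways of filtering by time can be sketched in plain Python. This is a conceptual illustration, not Visplore code; the dates are made up, and reading "2.5.2014 - 7.5.2014" as day.month.year with both ends inclusive is an assumption.

```python
from datetime import date

# Hypothetical daily timestamps spanning two summers.
dates = [
    date(2014, 5, 5), date(2014, 6, 30), date(2014, 7, 15),
    date(2014, 8, 31), date(2014, 9, 1), date(2015, 7, 4),
]

# DateTime [Month]: every July and every August matches, whatever the year.
by_month = [d for d in dates if d.month in {7, 8}]

# DateTime [Year/Month]: one category per (year, month) pair, so 07/2014 and 07/2015 differ.
by_year_month = [d for d in dates if (d.year, d.month) in {(2014, 7), (2014, 8)}]

# Custom interval on the raw DateTime attribute: "2.5.2014 - 7.5.2014",
# read here as day.month.year with both ends inclusive (an assumption).
start, end = date(2014, 5, 2), date(2014, 5, 7)
in_interval = [d for d in dates if start <= d <= end]

print(len(by_month), len(by_year_month), len(in_interval))  # 3 2 1
```

Note how 4 July 2015 passes the [Month] filter but not the [Year/Month] filter for July and August 2014.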




Filtering by value intervals

Value interval filters are another way of limiting the analysis to relevant parts. For example, let's assume we only want to analyze times when stronger winds were measured. A filter lets you remove everything below, say, 5 m/s.

Create a new data filter based on "Peak_Wind_Speed_BrightCounty_Weather" and keep only wind speeds between 5 and 30 m/s.
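Conceptually, a value interval filter is a range test applied to each record. The sketch below assumes that both limits are inclusive, as the "[5;30]" notation shown later in the Filter bar suggests; the wind speed values and the `in_value_interval` helper are hypothetical.

```python
def in_value_interval(value, lower, upper):
    """Closed interval test; inclusive limits are assumed from the [5;30] notation."""
    return lower <= value <= upper


# Made-up peak wind speeds in m/s.
wind_speeds = [0.8, 4.9, 5.0, 12.3, 30.0, 31.5]

kept = [v for v in wind_speeds if in_value_interval(v, 5, 30)]
print(kept)  # [5.0, 12.3, 30.0]
```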





Filtering by categories

Categorical filters limit the analysis to data of certain categories. In our example dataset, we could consider only wind from certain directions (such as "East"). For manufacturing data, typical use cases are filters for particular product types, grades, machine types, etc.

Create a new data filter based on "Peak_Wind_Direction_BrightCounty_Weather". Search for "W" to select all directions containing a W, i.e. winds coming from the west, northwest, southwest, and so on.
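Searching for "W" in this step behaves like a substring match over the category labels. The sketch below assumes the direction variable uses the 16 standard compass-point labels and that the search is a case-insensitive substring match; the `search_categories` helper and the records are hypothetical, not part of Visplore.

```python
# The 16 compass points, assumed here to be the category labels of the direction variable.
DIRECTIONS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
              "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def search_categories(categories, text):
    """Return the categories whose label contains `text` (case-insensitive substring match)."""
    needle = text.lower()
    return [c for c in categories if needle in c.lower()]


selected = set(search_categories(DIRECTIONS, "W"))
print(sorted(selected))  # ['NNW', 'NW', 'SSW', 'SW', 'W', 'WNW', 'WSW']

# Hypothetical records as (peak wind speed in m/s, peak wind direction).
records = [(7.9, "E"), (12.4, "WSW"), (3.2, "NW"), (6.0, "SE")]
westerly = [r for r in records if r[1] in selected]
print(westerly)  # [(12.4, 'WSW'), (3.2, 'NW')]
```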



Note: You can also select data directly in the visualizations to define filters. See tutorial on "interactive data selection".





Modifying and deleting filters

Any filter part (or filter component) you have created can be modified or removed afterwards to include more or less data in the analysis.

Modifying existing filter parts

To modify any existing filter part, click its orange representation in the "Filter" bar in the top area of Visplore.
Here are some examples:

Click the date filter "July-August" and tick September to include it.

Click the value interval filter for "Peak_Wind_Speed_BrightCounty_Weather [5;30]" and type 1 instead of 5 as the lower limit. Note: Press Enter (Return) to confirm the change.

Click the categorical filter "Peak_Wind_Direction_BrightCounty_Weather (NNW NW, etc.)" to add "E" and "ENE".
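The effect of these three modifications can be sketched in plain Python, treating the filter as a set of named parts. We assume that a record must satisfy every part (a logical AND); this is an illustration of the idea, not Visplore's internal representation, and the records and the `passes` helper are hypothetical.

```python
from datetime import date

# Hypothetical rows: (date, peak wind speed in m/s, peak wind direction).
rows = [
    (date(2014, 7, 10), 3.0, "E"),
    (date(2014, 8, 5), 8.5, "NW"),
    (date(2014, 9, 12), 6.1, "ENE"),
    (date(2014, 9, 20), 1.5, "WSW"),
]

# The filter as named parts; a row is assumed to need to satisfy all of them.
filter_parts = {
    "months": {7, 8},
    "speed": (5.0, 30.0),
    "directions": {"SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"},
}


def passes(row, parts):
    """Return True if the row satisfies the month, speed and direction parts."""
    day, speed, direction = row
    lower, upper = parts["speed"]
    return (day.month in parts["months"]
            and lower <= speed <= upper
            and direction in parts["directions"])


before = [r for r in rows if passes(r, filter_parts)]

# Modify the parts, mirroring the three examples above.
filter_parts["months"].add(9)               # include September
filter_parts["speed"] = (1.0, 30.0)         # lower limit 5 -> 1
filter_parts["directions"] |= {"E", "ENE"}  # add E and ENE

after = [r for r in rows if passes(r, filter_parts)]
print(len(before), len(after))  # 1 4
```

Each modification widens one part, so more records pass the combined filter afterwards.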





Removing filters

Remove the wind direction part of the filter. Alternatively, remove the entire filter.
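Removing a single part versus removing the whole filter can be sketched the same way. Again, the assumption is that the remaining parts are combined with a logical AND; the predicates, records and the `apply_filter` helper are hypothetical.

```python
# Hypothetical records.
rows = [
    {"speed": 3.0, "direction": "E"},
    {"speed": 8.5, "direction": "NW"},
    {"speed": 12.0, "direction": "E"},
]

# Each filter part is a predicate on a row (assumed to be combined with AND).
filter_parts = {
    "speed": lambda r: 5.0 <= r["speed"] <= 30.0,
    "directions": lambda r: r["direction"] in {"NW", "NNW", "WSW"},
}


def apply_filter(rows, parts):
    """Keep a row only if every remaining part accepts it."""
    return [r for r in rows if all(part(r) for part in parts.values())]


with_all_parts = apply_filter(rows, filter_parts)     # only the 8.5 m/s NW row
del filter_parts["directions"]                        # remove the wind direction part
without_direction = apply_filter(rows, filter_parts)  # both rows with at least 5 m/s
filter_parts.clear()                                  # remove the entire filter
unfiltered = apply_filter(rows, filter_parts)         # all() of no parts is True: every row is kept
```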







>> Continue with Next lesson: Saving and loading the analysis



License Statement for the Photovoltaic and Weather dataset used for Screenshots:
"Contains public sector information licensed under the Open Government Licence v3.0."
Source of Dataset (in its original form): https://data.london.gov.uk/dataset/photovoltaic--pv--solar-panel-energy-generation-data
License: UK Open Government Licence OGL 3: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
Dataset was modified (e.g. columns renamed) for easier communication of Visplore USPs.