Manipulating data
How to use data imported from a csv file with spaces in the header.
When a csv file (dataSpace.csv in this example) has headers with spaces in the middle of some of the field names, those fields need to be addressed slightly differently before they can be used easily in your JavaScript.
For example the following csv data has a column named ‘Date Purchased’;
Date Purchased,close
1-May-12,58.13
30-Apr-12,53.98
27-Apr-12,67.00
26-Apr-12,89.70
25-Apr-12,99.00
This is not an uncommon occurrence, since RFC 4180 (which specifies csv content) allows for it and d3.js supports the RFC;
Within the header and each record, there may be one or more fields, separated by commas. Each line should contain the same number of fields throughout the file. Spaces are considered part of a field and should not be ignored.
When we import the data using the d3.csv function, we need to reference the ‘Date Purchased’ column in a way that makes allowances for the space. The following piece of script (with grateful thanks to Stephen Thomas for answering my Stack Overflow question) appears to be the most basic solution.
d3.csv("dataSpace.csv", function(error, data) {
  if (error) throw error;
  data.forEach(function(d) {
    d.date = parseTime(d['Date Purchased']);
  });
  ...
});
In the example above the parsed ‘Date Purchased’ column is stored in a new field named ‘date’, which makes the rest of the script much easier to work with.
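The bracket notation used above can be tried without loading a file. The following is a minimal sketch, assuming rows shaped the way d3.csv delivers them (one object per line, keyed by the header text). The rows array is hypothetical sample data, and the date is left as a string here rather than being passed through parseTime.

```javascript
// Hypothetical rows, shaped like the array d3.csv passes to its callback.
var rows = [
  { "Date Purchased": "1-May-12", close: "58.13" },
  { "Date Purchased": "30-Apr-12", close: "53.98" }
];

rows.forEach(function(d) {
  // Dot notation cannot contain a space, so the header is quoted in brackets.
  d.date = d["Date Purchased"];   // copy the value to a name without a space
  d.close = +d.close;             // convert the string to a number
  delete d["Date Purchased"];     // optional: drop the original field
});

console.log(rows[0].date, rows[0].close);
```

Deleting the original field is optional; it simply keeps each row from carrying the same value under two names.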
Extracting data from a portion of a string.
Suppose we have a set of values we want to extract from a string because they cannot be used in their original form.
For example, the following csv file contains the column ‘value’ and the values of the data in that column are prefixed with a dollar sign ($).
value,date,score
$1234,2011-03-23,99
$2234,2011-03-24,100
$3234,2011-03-25,99
$4235,2011-03-26,100
We can use the JavaScript substring() method to easily remove the leading character from the data.
The following example processes our csv file after loading it and for each ‘value’ entry on each row takes a substring of the entry that removes the first character and retains the rest.
d3.csv("dataSample.csv", function(error, data) {
  if (error) throw error;
  data.forEach(function(d) {
    d.value = +d.value.substring(1);
  });
  ...
});
The substring() method takes a ‘start’ index (as used above) and optionally a ‘stop’ index; it returns the characters from ‘start’ up to, but not including, ‘stop’. More on how these can be configured can be found on the w3schools site.
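To illustrate both forms of substring(), here is a small sketch using values taken from the sample file above. The variable names are only for illustration.

```javascript
// Start index only: drop the leading "$" and convert the rest to a number.
var raw = "$1234";
var amount = +raw.substring(1);      // 1234

// Start and stop index: characters from start up to, but not including, stop.
var dateText = "2011-03-23";
var year = dateText.substring(0, 4);   // "2011"
var month = dateText.substring(5, 7);  // "03"

console.log(amount, year, month);
```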
Grouping and summing data (d3.nest)
Often we will wish to group elements in an array into a hierarchical structure similar to the GROUP BY operator in SQL (but with the scope for multiple levels). This can be achieved using the d3.nest operator. Additionally we will sometimes wish to collapse the elements that we are grouping in a specific way (for instance to sum values). This can be achieved using the rollup function.
Our example uses the following csv file, which consists of a column of dates and corresponding values;
date,value
23-Mar-11,3
23-Mar-11,2
24-Mar-11,3
24-Mar-11,3
24-Mar-11,6
24-Mar-11,2
24-Mar-11,7
25-Mar-11,4
25-Mar-11,5
25-Mar-11,1
25-Mar-11,4
We will nest the data according to the date and sum the data for each date so that our data is in the equivalent form of;
key,values
23-Mar-11,5
24-Mar-11,21
25-Mar-11,14
We will do this with the following script;
d3.csv("source-data.csv", function(error, csv_data) {
  if (error) throw error;
  var data = d3.nest()
    .key(function(d) { return d.date; })
    .rollup(function(d) {
      return d3.sum(d, function(g) { return g.value; });
    }).entries(csv_data);
});
We are assuming the data is in a csv file and is named source-data.csv.
The first thing we do is load that file and assign the loaded array the variable name csv_data.
d3.csv("source-data.csv", function(error, csv_data) {
if (error) throw error;
Then we declare that our new array will be named data and we initiate the nest function;
var data = d3.nest()
We assign the key for our new array as date. A ‘key’ is like a way of saying “This is the thing we will be grouping on”. In other words our resultant array will have a single entry for each unique date value.
.key(function(d) { return d.date;})
Then we include the rollup function that takes all the individual value variables that are in each unique date field and sums them;
.rollup(function(d) {
return d3.sum(d, function(g) {return g.value; });
Lastly we tell the entire nest function which data array we will be using for our source of data.
}).entries(csv_data);
What if your data turns out to be unsorted? Never fear, we can easily sort on the key value by tacking on the sortKeys function like so;
.key(function(d) { return d.date;}).sortKeys(d3.ascending)
You should note that the fields in our data are no longer named date and value; as a result of the nest and rollup process they are now key and values (in D3 v4 the rolled-up result is stored under value rather than values, so check which your version produces). But never fear, it’s a simple task to re-name them if necessary using the following function (which could include a call to parse the date, but I have omitted it for clarity);
data.forEach(function(d) {
d.date = d.key;
d.value = d.values;
});
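To make clear what the nest, rollup, sort and re-name steps compute, the following is a plain JavaScript sketch of the same grouping and summing. It does not use d3.nest; sumByDate, csvRows and summed are hypothetical names, and the rows are typed in by hand in the shape d3.csv would produce from the file above.

```javascript
// Hypothetical rows matching the sample csv (values arrive as strings).
var csvRows = [
  { date: "23-Mar-11", value: "3" }, { date: "23-Mar-11", value: "2" },
  { date: "24-Mar-11", value: "3" }, { date: "24-Mar-11", value: "3" },
  { date: "24-Mar-11", value: "6" }, { date: "24-Mar-11", value: "2" },
  { date: "24-Mar-11", value: "7" }, { date: "25-Mar-11", value: "4" },
  { date: "25-Mar-11", value: "5" }, { date: "25-Mar-11", value: "1" },
  { date: "25-Mar-11", value: "4" }
];

function sumByDate(rows) {
  var totals = {};
  rows.forEach(function(d) {
    // Group on the date and add each value (converted to a number) to its total.
    totals[d.date] = (totals[d.date] || 0) + (+d.value);
  });
  // Sort the keys as strings and emit one entry per unique date.
  return Object.keys(totals).sort().map(function(key) {
    return { date: key, value: totals[key] };
  });
}

var summed = sumByDate(csvRows);
console.log(summed);
```

The sort here compares the keys as text, which happens to give date order for this sample but would not in general; parsing the dates first would be the safer choice for real data.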
Selecting a random string from an array.
What if we had a situation where we wanted to be able to select a random colour for the fill of a set of objects from a restricted set of colour options?
The colours we want are green, orange, red and blue and the solution uses an adaptation of the one presented by Jacob Relkin on Stack Overflow.
First we start by declaring the colours in an array;
var colorRange = ['green', 'orange', 'red', 'blue'];
From there we set up the function that will return one of the elements of the array at random by calculating an index number from the array of possible options based on the length of the array;
function randomColor() {
return colorRange[Math.floor(Math.random() * colorRange.length)];
}
colorRange.length returns the number of elements in the array (in this case 4). This is multiplied by a random number from 0 up to, but not including, 1 (Math.random()). Then we get the largest integer that is less than or equal to our generated number using Math.floor. This ‘flattens out’ the result to be one of 0, 1, 2 or 3.
Then when we want to find one of our random colours we simply call our randomColor function a little like the following for a fill.
...
.style("fill", randomColor)
...
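As a quick check of the index arithmetic, the sketch below generalises the function to any array and confirms that every pick comes from the list. randomElement, picks and allValid are hypothetical names used only for this illustration.

```javascript
var colorRange = ['green', 'orange', 'red', 'blue'];

// Return one element of the supplied array at random.
function randomElement(options) {
  // Math.random() is always below 1, so the index never reaches options.length.
  var index = Math.floor(Math.random() * options.length);
  return options[index];
}

// Draw many samples and confirm that each one is a member of the array.
var picks = [];
for (var i = 0; i < 1000; i++) {
  picks.push(randomElement(colorRange));
}
var allValid = picks.every(function(c) {
  return colorRange.indexOf(c) !== -1;
});

console.log(allValid); // true
```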