Claire Miller, Author of Getting Started with Data Journalism - Second Edition
A Leanpub Frontmatter Podcast Interview with Claire Miller, Author of Getting Started with Data Journalism - Second Edition
Claire Miller is the author of the Leanpub book Getting Started with Data Journalism - Second Edition. In this interview, Leanpub co-founder Len Epp talks with Claire about her background and career, data journalism broadly and the details of how it has evolved over time, the importance of mobile, her book, and at the end, they talk a little bit about her experience writing a book.
This interview was recorded on [interview-date].
The full audio for the interview is here: https://s3.amazonaws.com/leanpub_podcasts/FM201-Claire-Miller-2022-04-13.mp3. You can subscribe to the Frontmatter podcast in iTunes here https://itunes.apple.com/ca/podcast/leanpub-podcast/id517117137 or add the podcast URL directly here: https://itunes.apple.com/ca/podcast/leanpub-podcast/id517117137.
This interview has been edited for conciseness and clarity.
Transcript
Len: Hi, I’m Len Epp from Leanpub, and in this episode of the Frontmatter podcast I’ll be interviewing Claire Miller.
Based in Cardiff, Claire is an award-winning journalist who works across both national and regional newspapers in the UK, specializing in data journalism. She is currently the editor of the Reach Data Unit for Reach PLC, one of the biggest newspaper groups in the UK.
You can follow her on Twitter @clairemilleruk and check out her website at clairemiller.net, and also read her blog at clairemiller.net/blog.
Claire is the author of the Leanpub book Getting Started with Data Journalism - Second Edition.
In the book, Claire teaches the basic skills that are needed for data journalism, and provides an updated account of the relevant tools and techniques that have evolved over time.
In this interview, we’re going to talk about Claire’s background and career, professional interests, her book, and at the end we’ll talk about her experience self-publishing a book.
So, thank you Claire for being on the Leanpub Frontmatter Podcast.
Claire: Hi.
Len: I always like to start these interviews by asking people to tell us a little bit of their origin story. So, I was wondering if you could talk a little bit about where you grew up, and how you found your way into a career in journalism?
Claire: I’m from London originally. After growing up there and going off to university in Manchester, where I did a variety of history, politics, psychology - all sorts of interesting things - I decided that I really wanted to become a journalist.
I’d done some work on the student paper while I was at university. I did what’s called the NCTJ training. For me, that was an 18-week short course, where you learn writing and law and shorthand, so it’s all a very quick introduction to all the basics you need to be a journalist.
After that, I worked for a local paper in Kent. Quite a small edition of a local paper. I spent my time sitting in parish council meetings, listening to people talk about street lights, and planning decisions - that was the kind of thing I was doing, those very, very local kind of issues.
I did that for about 18 months, two years, before I decided that I wanted to go somewhere a bit bigger. That was kind of how I ended up in Wales.
Originally when I came to Wales, I was working as a general reporter for The Western Mail and South Wales Echo, WalesOnline - it was a local newspaper, and then the national newspaper for Wales, and the website that’s associated with them.
When I pitched up in Wales, I didn’t know anybody. I didn’t have any connections. I didn’t have any contacts. So I was fishing around for story ideas that I could put on my story list, the news of the day, so I had something to write.
I’ve always been interested in numbers and data. I did A-Level maths, because I just liked maths, and because I was going into an area that was to do with maths. I like numbers, and I get numbers.
When I was looking around for stories in Wales, what I discovered is that the Welsh government, and the government across the UK, publishes lots and lots of data. Lots of statistics on all kinds of topics. And these weren’t really being picked up by any other journalists, because, I suspect, quite a lot of journalists find numbers really, really boring.
But there was this sort of source of stories that I could pitch each day, that gave me something to write about, and were potentially really interesting. That’s how I got started writing about data.
And this, back in sort of 2011-ish, was around the time that the Guardian data blog had become quite big. Seeing what they were doing got me thinking that this was something we could do in local, regional news as well - that we could have a repository for these data stories. That we could do things around visualizations.
One of the nice things about data journalism is that everyone’s very willing to share their skills and their expertise, and there are lots of free tools. It’s quite easy to teach yourself techniques and things, and experiment and find ways of building graphs or making maps, or telling stories. It’s really developed over the last decade as the tools have got better and easier to use, and websites have got better. And, yeah, that’s how I got started with data journalism - and then how it built from there.
Len: Thanks very much for sharing that. One of the reasons I was really looking forward to interviewing you about this, is that you’ve been there seeing all these changes on the ground in data journalism, that the rest of us have just sort of been receiving. But you’ve been on the creating side, and that’s more than just building the stories. It’s building the best practices and understandings amongst journalists and the teams on papers and stuff like that, to make it a reality. I’m really interested in asking about that.
But before I do so, I spent a few years living in London myself. Not as a child, but as an adult. I was just wondering if you wouldn’t mind talking a little bit about what part of London you grew up in, because it can really make a difference, your experience of the city where you live.
Claire: I’m from Twickenham. Quite on the outskirts of London, South West London.
Len: Okay. I variously lived in outskirts as well. Like Beckenham Hill in Kent, and stuff like that. And Balham, Golders Green, and various different places, moving around. I always get nostalgic a little bit, when guests bring up living in London.
And so you mentioned your story list, I believe? When you were sitting in on council meetings in London, or when you moved to Wales - I’m just sort of trying to get a little bit of a sense of a day in the life of a journalist doing the work that you do. Were you very independent, in terms of, they just sort of gave you a desk, and said, “Go find stories?”
Claire: With my first job, I had an edition of the local paper. It was a weekly paper covering Sevenoaks - the surrounds, and Kent. But I had an edition which covered Westerham and the villages round it, which meant that I had to get three pages - front page, page three and page five each week - with stories from those areas. So, it was fairly up to me how I went about this. I spent a lot of time in parish council meetings, desperately trying to find out what was going on - and what might be of interest to the sort of people living in the area.
Len: I think a lot of our listeners might not be familiar with the idea of a council - it’s really important to get a sense of the getting into detail work that you do. What is a parish council?
Claire: In terms of levels of governance, obviously you have the UK government, and here we have councils, which are of various sizes - but they usually cover a city or a large rural area.
And then parish councils literally cover a village, or a couple of villages. You’re talking, maybe covering sort of 2,000 people. And they have responsibilities often around things like streetlights, or maintenance of the local toilets. They also get consulted on any planning applications that are happening in the area. If you want to build a new home, the parish council gets consulted. They’ll have a meeting and talk about the application, and whether it will affect the look of the village.
They’re quite small, but they have quite a lot of powers around things that matter on a day-to-day basis to people. They also tend to be connected into things. We had police and community meetings together, which basically is bringing people who are interested in what the police are doing, together with the police. They can talk about, usually, antisocial behavior and shed thefts - it’s those things that really matter to people locally, because people really do care about where they live. They want where they live to be nice, and to have good facilities - and they care that the grass verges get mowed, and there isn’t antisocial behavior, and there’s stuff for the kids to do.
On a power level within the UK, they are tiny. But it’s nice to cover them, because you do get to really know the lives of the people that you cover.
Len: And I imagine, people must’ve been really happy to have a local journalist around, to cover those stories, and to talk to them about their own interests, and what they’re trying to achieve in their local community?
Claire: Yes. You do make contact quite quickly with people who are very committed to their local area, who like to ring you up and tell you. They’re interesting - the best phone call I ever had was somebody who rang me up to tell me all about the plan for a new one-way system around the village green, who then finished off with a, “Did you know the local betting shop’s just had a break-in?”
Len: Oh my.
Claire: An armed robbery. And you’re like, “Could you not have told me that first? I mean, the one-way system is lovely - and I think it’ll be really, really good for tourism in the area. But armed robbery at the betting shop, that’s a proper story.”
Len: Good to have those contacts, even if they bury the lede, as it were.
I’ve got to say, it’s one thing I found - I grew up in a province called Saskatchewan in Canada. And there was just no concept really of local participation in the way that I learned about when I moved to London, and learned about local councils and things like that. The idea that there will be people who do stand on the corner, and look at that corner, and they’re like, “How could this be improved?” Or, “What’s wrong with this?” And they have a sense that there’s a place where they can go - and there’s people that they can talk to, about how to improve it. That sense of sort of local empowerment, was quite striking to me.
What I’ve been sort of building up to is, these local councils, if you go to the meetings, you’ll learn all this stuff. You’ll hear all these details. The local councils - since there’s so many people participating and communicating with them, and giving them their opinions and advice or what have you - they have a lot of information.
Have local councils in the past been very open and transparent with that information, and have you seen a change over time in how councils deal with all the information that they have, or make it available to other people to consume?
Claire: I think it really depends on the local councils, and how they feel about transparency and the relationship you have. When I was working as a local reporter in such small areas, I knew the people in the parish council so well. I knew the parish council clerks really well. If I needed information, it really did leave me hanging around the office with a cup of tea, while they dug out whatever I asked for. It’s that level of transparency that you don’t really get, unless you are very much embedded in the community.
But I think some councils are better than others. Because, if there’s somebody driving it who’s very committed, it works. But I think it’s not necessarily that they don’t want to be transparent. For a lot of them, they don’t have time. It’s another thing that’s not really a priority. They’re not answering those questions, because there’s something else that they feel like they should be doing.
Len: Yeah, there might’ve been a break-in at the local betting shop that they need to -
Claire: Yeah.
Len: I mean, I say that jokingly. But the pressing day-to-day things that you need to get done, to help people, are more important - obviously - than data management.
And so you moved to Wales. And you were working for WalesOnline, and you launched something called The Data Store, which was a repository for data and graphics, relating to stories published in the Media Wales newspapers. I was wondering if you could just talk a little bit about that project?
Claire: Yes. This is what I was talking about earlier. How the Guardian data blog had got quite big and well known. This was me going, “Maybe we could do something like this in Wales, with the data that we’ve got, and talking about stories that we do?”
I managed to rope a couple of people from the office into this project, one of whom was on the IT side, who could create a section of the website. This is back in the days when the websites were notoriously clunky. But it was mostly about me going, “Well, what can I do along these lines that -?”
From seeing what other people were doing. Seeing the graphics they were making, the tools they were using, the stories they were telling. And going, “Can we do this on a more local level? Can we do this about Cardiff or about Wales, rather than it being national or international?” Because there was all this data that we could get from the Welsh government, or from the Office for National Statistics. And this was also into 2012, when the last census came out, and there were massive amounts of data.
The census is like Christmas for a data journalist. We get all this data that we can use to build all these lovely maps with. And write about how nobody goes to church in Blaenau Gwent. Having that data and being like, “Well, what can I do? What can I build?”
It was mostly me just experimenting a lot of the time. I was also coming up with stories that I knew would work well for the paper or for online, but just to have that data element.
Len: I imagine part of what you had to do was sort of ask for resources and time, and make an argument for why the organization should devote more resources and time to data journalism - and having the Guardian data blog be there as a precedent, would’ve really helped.
Do you remember if there was a moment when you realized that this sort of technology and data that was available, could really drive just endless data journalism projects?
Claire: I think finding Tableau was probably one of the ones - a business information system, mostly.
When they were starting out, they were quite keen to sell it to journalists as something we could make graphics and visualizations with. And it was quite useful for that analysis of data, that was slightly too big to go in spreadsheets, but probably wouldn’t be called “big data.”
But it was more extensive, and I think using that, and looking at things like parking tickets, which is - and, look, this comes back to these stories that really, really matter to people locally. They’re not necessarily the big issue. But they’re perennial stories that people care about, when you talk about their local area - looking at parking tickets on an individual ticket basis, and being able to go, “These are the streets which always get ticketed.” Or the fact that you could see where the patrols for the traffic wardens were going. And then you could tap into the stories that people were telling.
You had a church saying, “Well, we always get caught out on a Sunday, because they know that we’ve got a service. And then they will come around and take it all up” - parishioners or people going, “Well, they never come to our area. We’ve got these huge problems with people parking badly outside our schools, but nobody ever comes and tickets them, because they’re always walking around the city center ticketing everybody there.”
It’s having that data, to be able to look into the complaints that people had, and being able to look in more depth, because of having something like Tableau, where you could visualize it all - look at it all on the map, and then dig down and look at different elements of it. I think that was the point at which I was going, “Actually, there’s lots we can do.”
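The kind of ticket-by-ticket analysis described here, counting which streets are ticketed most and checking a complaint such as the Sunday one, can be sketched in a few lines of Python. This is only an illustration: the records, street names and function names below are invented for the example, and the work described in the interview was done in Tableau, not in code like this.

```python
from collections import Counter

# Hypothetical ticket records as (street, weekday) pairs, invented for illustration.
tickets = [
    ("Church Road", "Sunday"),
    ("Church Road", "Sunday"),
    ("High Street", "Monday"),
    ("High Street", "Tuesday"),
    ("High Street", "Sunday"),
    ("School Lane", "Monday"),
]


def tickets_per_street(records):
    """Count how many tickets were issued on each street."""
    return Counter(street for street, _ in records)


def tickets_on_day(records, weekday):
    """Count tickets per street for one day of the week only."""
    return Counter(street for street, day in records if day == weekday)


if __name__ == "__main__":
    # Streets ranked by total tickets, then by Sunday tickets only.
    print(tickets_per_street(tickets).most_common())
    print(tickets_on_day(tickets, "Sunday").most_common())
```

Comparing the overall ranking with the single-day ranking is one simple way to test a claim like the church's against the individual tickets.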
Len: And could you use Tableau directly, to give the production team what they needed to make the graphics for, say, the website or the paper itself? Or was there some extra step that needed to happen? I mean, I’m speaking very naively about this, sorry for that.
Claire: It was mostly me that did the graphics.
Len: Okay.
Claire: This was back in the days when I was making all of the interactive maps and things by myself, and then just putting them on the website. And having to get things whitelisted, so we were allowed to use them on the site.
I think at one point, I had to build something on my website that linked out from our site. Because I couldn’t embed it in the website, because it wouldn’t fit - it would break the bounds of the page. Around that time, given the limits of my coding and visualization skills, I had to just do as much as I could.
The interesting one that’s developed with me and my team, is something called The Real Schools Guide, which is a big look into lots of data around schools.
The idea behind it was that lots of papers in this country publish league tables of schools. But they tend to only focus on exam results. It’s the same schools that do well - basically the schools that select for the pupils they know are going to do well - always come out on top.
We wanted to look in more detail around things like, how do people from poorer backgrounds do - those who are on free school meals. Or, looking at how well schools did with people with different ranges of ability.
There is a huge amount of data, particularly in England, around results and breakdowns by different groups of people. That was the idea behind that.
The first iteration was very basic. We basically cobbled together these graphs using Google Charts and Google Fusion Tables. It was basically copied and pasted code that called the individual school IDN, and showed you the graphs. It’s now a very, very swish looking page on my website.
Because when I joined the data unit, eventually we expanded to the point where we had a developer who could actually build these things and make them look good, and make them work properly. That was a really nice development. But in the beginning, it was always about, “What can we do with the skills that we have, and the tools that are free and available?”
Len: That’s really amazing. It’s really interesting to hear how independent the work was, and that it was you, and other people doing what you were doing at the time, being very resourceful - and having to take on all kinds of different technologies, and figure all kinds of things out, that I imagine when you got into journalism, you probably weren’t expecting that you were going to have to figure out.
But now we’re at the point where there’s - I mean, when the demands of mobile became real with smartphones and stuff like that - I imagine getting designers, and justifying a request for designers and coders to be working on the team with journalists, probably became a bit easier - unless I’m wrong about that?
Claire: No, I think there was definitely - because when we started with the data unit, it was me and my then boss, who was working in Manchester - we were brought together to expand data journalism across Trinity Mirror, as it was then - because it’s now Reach.
We showed that data journalism worked. You could get these great stories - that perhaps we could do it on a bigger scale, and create stories or resources for more papers.
And I think, from there, there was an idea that - well, it would be useful, when you expand the team, if you could have a developer and a designer. I think we showed quite quickly how useful it was to have a developer and designer, because it meant everything we produced looked better and worked better.
And particularly around mobile, it was always creating things that would work well on mobile, which had always been the challenge. I think when I was starting out, it didn’t really work with mobile very well. Or it had to be very simple and quite small to work on mobile.
Len: And I gather one very important feature of contemporary journalism, and particularly things with graphics and stuff like that, is that shareability is really important?
Claire: Yes. I think either being able to just share it generally, and having perhaps graphics that work on social media - now, some of the things we do are static graphics as well. We do that in print, but also potentially work on social. And it’s also a lot of the stuff around widgets and interactives, is whether we can have an element of personalization, and then share it.
One of the things that’s developed over the life of the data unit, is something called “pick my team,” which is around letting people pick their football team. A lot of these things start out quite simple, it’s just Google survey-type things.
But that’s now something where you can pick your team. You put them on the pitch, and then you can share a picture of your team on the pitch, with a link back to this. I think it’s those opportunities around - if you’re making stuff that can be personalized, it’s allowing people to then share their version of it. I think that intrigues people, because they’re like, “Well, if that’s what they think the team should be, well, I completely disagree - and I’m going to go put together my idea.”
Len: That’s a really interesting example of interactivity, which is something that all of us have seen become more popular in the news articles that we read online. Where you can click on things, and make things happen on the screen. Particularly, drilling down into different types of data, where it’s like, “Well, I want to see this demographic or that demographic on the chart, and see how it changes,” and things like that.
It wasn’t until I was researching for this interview, and reading some of the stuff that you’ve written, that I realized that this was me personalizing, in a sense, what I was seeing; I just never thought of it in those terms. And that really makes it captivating, and makes a lot more sense.
One, I guess, high level question I wanted to ask you, is - maybe I can get into it by going back to something you mentioned? Which was league tables for education data in the UK.
I’m now getting even more nostalgic - one of the other things that I found surprising about life in the UK, was that data around education was a hot topic in the news all the time. I remember, it felt like every year there’d be this moment when the results of the A-Levels would come out. And if they were too low, it’s like, “The teachers are failing the kids.” And if they were too high, it was, “The teachers are failing the kids, because they’re making school too easy now.” I always felt sorry for the poor teachers, and education administrators, who had to deal with this unwinnable war for public opinion.
But that just leads me into the question of - presenting people with arguments and words is very different from presenting people with numbers and with charts and graphs, and things like that. I mean, specifically in the sense that like - I think people are more willing to see articles written in words as things they can contest, and charts and graphs and numbers as incontestable.
And the flip side of that, is that when they see numbers and charts and graphs - they think that now they understand the reality of things, when maybe they might not.
In my own life, I’ve seen this particularly with things like financial projections and stuff like that, where if you show a group of investors, say, a chart showing some projections in the future - they might think that you’re telling them, “This is what’s going to happen,” when actually there’s a bunch of assumptions built into your data model. What you’re seeing is the result of a bunch of decisions, not a passive representation of the world.
I was just wondering, from your perspective, is there a way you’ve developed of framing the data that you present to people, in a way that lets them know, “This isn’t necessarily as straightforward as it might appear, since we’re giving you this finished product?”
Claire: I think there’s always going - explaining where the data’s come from, how you’ve possibly put it together. Sometimes it’s going, “It’s come from this source. They’ve done this.”
With a lot of things - possibly it’s a survey. Even if it’s coming from quite a credible source, like the Office for National Statistics, this isn’t telling you the exact thing. But it’s giving you an idea. They’re doing their best to try and keep up with what’s going on.
And sometimes with stories, what we’re usually also adding, is comment from people who know more about the topic than you do. Quite often you’re following up with, “This is what the numbers say. What does that mean?” - from charities or experts. Or quite often also the government. To give their side of like, “What do these numbers mean?”
I think - potentially one of the ones that we do every year, is one around rough sleeping, which is a really, really hard story to get numbers on. Because the only numbers that exist are taken as a survey on one night, and are put together in various ways. Some people in some areas go out and count the people that they can see. Others use intelligence from charities working in the area to try and work out roughly how many people they think are sleeping rough.
And the thing with that is, what you can say is, “Well, this is what those numbers are, and this is how they compare to the numbers which were put together in much the same way previous years. They maybe give an indication of whether numbers are going up or down. But they probably don’t give you a true indication of just how many people are sleeping rough. Because that number changes literally every day. And if you compare those numbers to these other numbers, that are put together in a different way, you can see that they’re not the same.”
It’s like you’re trying to give people the nuance and the details, but you also - you’re working with this data that’s useful, because at least it gives you some indication, even though it’s not ever going to be completely comprehensive.
Len: Thanks very much for sharing that particularly difficult example of getting information. Where - in the case you mentioned - you can’t really get anything comprehensive, because of the nature of the thing you’re investigating. And at the same time, it’s so important to try and do your best and present it to people specifically about what’s happening in their community with people who are homeless and sleeping outside.
With respect to data journalism and charts and graphs and numbers, something happened a couple of years ago that made that a much more important part of people’s lives than it maybe had been in the past, other than weather and sports - which of course is the pandemic. I was wondering if you could talk a little bit about how you’ve experienced this time as a data journalist? Seeing this explosion of interest in, and contestation over, data journalism. Has the way you’ve done things changed?
Claire: I don’t think necessarily the way that we’ve done things has changed, because our focus throughout has been very much around local data, and, how can we localize this? We haven’t necessarily done a lot of the big explainers. It’s always been around, “What’s happening in your local area?” Because working with the regional press, that’s kind of more our remit. We know that the nationals were covering, “This is the overall picture.” But it’s more like, “What’s happening here?”
I think what changed is around lots of spreadsheets being updated very, very often. Possibly, the experience of the pandemic as a data journalist is that I was literally updating spreadsheets every day for weeks on end. We had up-to-date figures on cases, on deaths. I think that’s probably quite a common experience: there was a lot of that underlying, “keeping the numbers going” work, that was going on.
Len: For example, were you having other journalists like banging down your door at two in the afternoon, like, “I need some information for my next article.” And then someone half an hour later saying, “I need it for my next article.” Or was it -?
Claire: In a lot of cases, they were powering interactives. If you wanted people to be able to search the latest numbers, the latest numbers needed to go into the correct spreadsheet, so they were ready.
I think it was really interesting watching the way the government’s response, and the government’s data, evolved over the pandemic. My job throughout has been keeping the hospital deaths spreadsheet updated every day. It’s now not quite every day, because they don’t update it as much. And so the first few weeks was literally the press office sending out a list of hospitals and the number of deaths at those hospitals.
I think quite quickly, they realized that was not going to be sustainable. Because it went from being one or two deaths, to being 20, 30 a day. And at that point, they were like, “Actually, we need to get a proper spreadsheet going out each day.”
It was interesting the way that, I think initially, there was a very ad hoc response, in terms of, they knew that people were asking for this data, and they needed to put it out. But they hadn’t really thought about how they were going to put it out.
And over time, particularly for the UK government, and particularly covering England, there’s now an extremely good database of cases, deaths, hospitalizations, vaccinations. It’s got an API that works really well.
You can pull the data from there directly. I think that’s the understanding of like - what do people want and need, and how can we give it to people in a way that they can -? Either if they’re a member of the public, they can literally just come on this website and look up what they want. Or if they are somebody who’s more technical and wants this information in machine-readable format, it’s also got that.
I wish the Welsh government would do something similar, because theirs just isn’t as good. But the one for the UK government and England in particular, is very useful now - in terms of keeping track of all the statistics.
Len: Thanks very much for sharing that. That’s really fascinating to hear, because, again, I and most of our listeners would be people who just saw these changes happening from the outside. But to hear about this relationship between the government and the reporters, and probably a bit of back-and-forth about how to do it, how to best communicate and how to make their practices better - it’s just really interesting.
Actually that - speaking of government - governments aren’t always so forthcoming with information. And you write in your book, which we’ll start talking about soon in the next part of the interview - about freedom of information. And there’s something, I believe, in the UK, called the Information Commissioner’s Office or ICO, and I was wondering if you could talk -? I know you write about this on Twitter and on your blog and stuff like that. I was wondering if you could talk a little bit about how the freedom of information process works in the UK?
Claire: Okay. Yeah, I’m a big fan of the Freedom of Information Act. In the UK, anybody can ask public bodies - and it’s a big list of public bodies, from individual schools, through to the government itself, and government departments - for any information that they’re interested in.
The idea is that, if they can give out that information, they have to - and they’ve got 20 working days to do it. Obviously there’s a list of exemptions and reasons why they can refuse to give out this information. It covers things like, if it’s a really big, complicated request, and it’ll just take far too long to do, it’ll take too many resources, they can refuse it. Or there are also things around personal information or health and safety. Or if it’ll have an impact on law enforcement.
There’s a number of reasons why they can refuse. But the general idea is - if they’ve got the information and somebody asks for it, and there isn’t a reason why they can’t give it out - they need to give that information to the person who’s asked for it.
Which is obviously very useful for journalists in terms of finding out about all kinds of things that public bodies haven’t necessarily chosen to publish. Because the amount of information that they do publish is often quite limited, and there’s often topics or more detail that you’d like. This is just a way to get hold of that.
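As an illustration of the 20-working-day limit Claire describes, here is a minimal Python sketch that estimates a response deadline. It assumes working days are Monday to Friday and that any bank holidays are passed in by hand; the function name `foi_deadline` and the example dates are hypothetical, and this is not an official calculator.

```python
from datetime import date, timedelta


def foi_deadline(received, holidays=frozenset(), working_days=20):
    """Return the date of the Nth working day after `received`.

    Assumption: weekends and the dates in `holidays` do not count.
    """
    current = received
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday() is 0 for Monday and 6 for Sunday, so 5 and 6 are the weekend.
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


# Hypothetical request received on Monday 4 March 2024, with the
# Good Friday and Easter Monday bank holidays supplied by hand.
easter_2024 = {date(2024, 3, 29), date(2024, 4, 1)}
print(foi_deadline(date(2024, 3, 4), holidays=easter_2024))
```

The holiday list is left to the caller because bank holidays differ between parts of the UK.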
Len: It’s really fascinating. There’s a whole section about this in the book. And if you follow Claire on Twitter, you’ll see her tweet about it. And I’m just looking at your blog here at the moment as well. You have just really good examples of the very specific kinds of information that you can get about like council investments and - I’m seeing like calls answered, but details missed - and things like that. There’s very interesting details that you can find.
I guess, another high level question I have is - if the government has all this information, and if they have to give it out if people ask - why isn’t it just public in the first place? And let’s bracket the question of resources and time and funding, or something like that. Is there some deep reason that governments are so protective of information?
Claire: I mean, sometimes it is just resources. Sometimes they just don’t think it’s interesting. It’s really interesting sometimes what journalists will think are stories, and where press officers are like, “Nobody’s interested in that.” Like, “Well, the views on the story online tell me differently.”
Sometimes they just don’t think it’s worth publishing. Other times, they don’t want to publish it because they think it’s embarrassing to them, particularly if it shows that they spent money that didn’t achieve anything, or they’ve messed up somehow - or any of those things.
Sometimes they just don’t want to publish it. Sometimes they just don’t like being transparent. I mean, some public bodies definitely have a culture around, “We don’t tell people stuff, unless it’s for our benefit.” And so just publishing lots of information online doesn’t really fit that. I think in other cases, it’s -
Len: My question was so vague, I apologize - bad podcast hosting there. But it is a really interesting thing to think about. As you’re mentioning, it can be specific to the culture of a particular department or office, or something like that.
Not too long ago, I interviewed someone for the podcast named Giles Turnbull, who worked on GOV.UK. He’s still a consultant who works with lots of government bodies, and he talked about specifically how - there was this one group he worked with within the government, that was working with farmers handling some of the consequences of Brexit.
And they discovered that by being open about their own challenges with handling the uncertainties of what was happening, they actually increased the public opinion amongst their constituency for their work. Because they were just being more open and honest about it.
But a lot of the resistance from within the department came from, “Well, if we show people our failures, they’re going to like us less.” And it seemed counterintuitive to think that people would like you more for being open about your failures. But that’s what they discovered - at least in their case, that it humanized them.
Claire: Yeah, I can believe that would be the case. And there’s also - if you’re more open about stuff, people can get involved. Or they are understanding, because they’re getting more of the information. Also, you potentially pick up help. Because if you’re being open about where things are going wrong, there’s potentially someone out there who’ll go, “This happened to us, and we did this - and then we solved it.”
Len: That’s a really great observation. That being open gives people the opportunity to help, right? People actually like being able to help. And they especially like seeing something happen because of something they’ve done in their community or in their constituency.
Just moving on to talk about your book, Getting Started with Data Journalism - Second Edition, this is the second edition of the book. I was wondering if we could maybe go back to - what was your inspiration to write the first edition of the book, and around when did that happen?
Claire: That was back in 2012, 2013. I’d been working on the Wales Online Data Store for a while. And this was possibly around the time that the data unit started. I think it was kind of - I’d been blogging for a while, and writing about things I’d done and how I’d done them. As I said earlier - the data journalism community is pretty good at sharing resources, and sharing tips - and talking about what’s worked. I’d been doing blogging around that.
I think I was thinking, “Well, if I can put it all together in a book, then that’s potentially useful for other people who are interested in data journalism - to give them a view of all the things they need to get started. From the very basics of how spreadsheets work, through how to create maps and how to clean up data.” That was the idea behind it. It was just put everything in one place, so that it was a really good beginner’s resource.
Len: And the book, I should mention, is really great at that. At talking in detail about like, “What’s a spreadsheet? How does it work? When you click into it - what does that mean, for what’s going to happen to the cell?” And things like that.
It’s the thing that is very difficult to get right for these kinds of books. And you do it very well.
And so, since the first edition came out, what’s changed is a bunch of best practices, but also tools and things like that. I was wondering if you could talk a little bit about what some of the more recent advancements in say Excel are, that are so useful to people practicing data journalism nowadays?
Claire: Well, I think the big change was the loss of Google Fusion Tables, which was the tool back when I was starting out in 2012. It was relatively new then, and we were amazed by the fact that we could make these great big maps with lots of colors on them, and all these things. And then Google shut it down. Because it was a useful tool, so it sadly didn’t survive. And so that was - one of the major problems with the book, is there’s like whole sections on this thing that no longer existed.
It was, for me, trying to find new ways of doing the same things that I’d been able to with that - particularly those colored maps of things. And particularly the adaptability that it had. Because the nice thing was that you could make maps of any area that you wanted, as long as you could find a base map. And I think - what I found, was that both Flourish and Datawrapper, both offer that functionality. We had been rewriting those sections to explain how you can use both of those to make your own maps.
The other thing with Google Fusion Tables, was you could use it to merge data together - so if you’ve got several spreadsheets that you wanted to put into one, by matching them up on codes. One of the options is to use Excel formulas to use “VLOOKUP.” But that takes forever if you’ve got massive spreadsheets with 30, 40 columns and 10,000 rows. I think with Google Fusion Tables, you literally just smoosh them together and come up with one big spreadsheet. I don’t think - totally - I’ve found anything that’s quite that level of functionality.
But again - interestingly, with Flourish - you can merge stuff together, and that replicates that functionality.
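For readers who do want to try code, the merge Claire describes (matching two spreadsheets on a shared code, as VLOOKUP does) can be sketched in a few lines of Python. This is an illustration only, not the approach taken in the book; the function name `merge_on_code`, the column names, and the sample rows are all made up.

```python
import csv
import io


def merge_on_code(left_rows, right_rows, key):
    """Add the columns of right_rows to every left row with the same key value."""
    # Build the lookup once, so each match is a dictionary access
    # rather than a fresh search through the second table.
    lookup = {row[key]: row for row in right_rows}
    extra_columns = [c for c in (right_rows[0] if right_rows else {}) if c != key]
    merged = []
    for row in left_rows:
        match = lookup.get(row[key], {})
        combined = dict(row)
        for column in extra_columns:
            # Rows with no match get a blank cell instead of an error.
            combined[column] = match.get(column, "")
        merged.append(combined)
    return merged


# Made-up sample data standing in for two exported spreadsheets.
areas_csv = "code,area\nA1,Area One\nB2,Area Two\n"
tickets_csv = "code,tickets\nA1,12\n"

areas = list(csv.DictReader(io.StringIO(areas_csv)))
tickets = list(csv.DictReader(io.StringIO(tickets_csv)))
merged = merge_on_code(areas, tickets, key="code")
print(merged)
```

Every row of the first table is kept, and rows without a match in the second table get blank cells.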
I think with the second edition, some of it has been finding replacements for tools which are no longer with us. And trying to find ways to do that.
But I think that’s pretty common in data journalism. Quite a lot of data journalism is around solving problems. Thinking, “Well, what can I do with the tools that I’ve got, to solve this problem I had?”
What I had recently was, somebody had asked me to do a story looking at where in the UK has the most Costa Coffees, which is a big chain of coffee shops. But to do this, I needed all of the data from the food hygiene ratings. Everywhere that sells food or drink has to be inspected by the council food hygiene inspectors, to check that they’re clean and safe, and serving hygienic food. Which makes it a really good resource for getting the addresses and locations for all of the food venues.
But it’s published in individual XML files, of which there’s about 400. I don’t really have coding skills. I tend to do everything in spreadsheets and other tools, trying to find a way to get all of these XML files into one. I then discovered that you can do something quite clever in Excel to drag them all in in one go, and use their import tools to make a great big spreadsheet with all of the data in it. That was quite fun. I think a lot of the time, it’s trying to find, “Well, what have I got, and how do I not do this manually one by one, and take 12 hours?”
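Claire did this with Excel’s import tools. For comparison, the same idea (many XML files with the same layout, stacked into one table) can be sketched in Python as below. The tag names `Establishment`, `BusinessName` and `PostCode` and the sample records are assumptions for illustration, not the real layout of the food hygiene files.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_records_to_rows(xml_text, record_tag):
    """Turn every <record_tag> element into a dict of child tag -> text."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(record_tag):
        rows.append({child.tag: (child.text or "").strip() for child in record})
    return rows


def combine(xml_documents, record_tag):
    """Stack the records from several XML documents into one table."""
    rows = []
    for xml_text in xml_documents:
        rows.extend(xml_records_to_rows(xml_text, record_tag))
    # Collect column names in the order they are first seen.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    return columns, rows


def to_csv(columns, rows):
    """Write the combined table as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Two made-up documents standing in for two of the roughly 400 files.
file_one = """<Establishments>
  <Establishment><BusinessName>Example Cafe</BusinessName><PostCode>AB1 2CD</PostCode></Establishment>
  <Establishment><BusinessName>Sample Coffee</BusinessName><PostCode>EF3 4GH</PostCode></Establishment>
</Establishments>"""
file_two = """<Establishments>
  <Establishment><BusinessName>Demo Diner</BusinessName></Establishment>
</Establishments>"""

columns, rows = combine([file_one, file_two], record_tag="Establishment")
print(to_csv(columns, rows))
```

With real files on disk, the list of documents could be built by reading each path from `pathlib.Path(folder).glob("*.xml")` with `read_text()`.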
Len: Actually for anyone listening who’s interested in learning about what Claire was just describing, she’s got an article about it on her blog. And I’ll make sure to link to it in the transcription.
In a former life, I was a financial analyst type, an investment banker, and I had to do a lot of - I mean, everything I did was in Excel, basically. Excel and PowerPoint was my life, basically. And when you talk about doing things manually, I was getting flashbacks to the dreaded PDF.
So, just for anyone listening, if you’ve never experienced it. If you’re analyzing - let’s say food and health safety information, to try and find out the locations of coffee shops, or if you’re looking at any industry data. Like, what are the projections for coffee consumption going forward ten years?
You’ll probably go online and you’ll find some data somewhere. And it might come in the form of a PDF. And there it will be all nicely laid out in the PDF. But when you try and copy and paste it into a spreadsheet, it’s just gobbledygook.
And then you find yourself - you might do things, like I remember - I used to get the free version of some optical character recognition software, and try and use that to like save myself the time from having to manually type things out. There were all these hacks.
That was quite some time ago that I was doing that. Has handling PDFs got any better in the last - I mean, since the first edition of the book came out in about 2014 or so - has handling PDFs got any better?
Claire: No. I think the free optical character recognition tools are better than they were.
Len: Right.
Claire: Because I think in the past, they did used to just translate it from gobbledygook to more gobbledygook. But I think there are more of them, and they are a bit better. And there are lots of tools that will take your information out of PDF and into a spreadsheet.
Generally they’re not too bad. You will spend a lot of time trying to get columns to line up, and deleting stray rows. But, yeah - on the whole, it’s slightly better than it was - in that the tools are a bit better. But there’s still a lot of cleaning. I think it’s spending a lot of time - as a data journalist particularly - you do a lot of FOI, spending a lot of time asking public bodies, “Please don’t send me a PDF. Please just send me a spreadsheet.”
Len: Yeah, definitely always getting the spreadsheet was the number one thing.
That leads me to ask another question about getting started with data journalism. You mentioned earlier on in the interview that you took a course. Is data journalism part of journalism education now, for budding journalists?
Claire: Yes. Quite a lot of journalism degrees and Master’s have either a module on it, or there actually are degrees that are more around data journalism and computational journalism. Cardiff University has a masters that is much more focused on the intersection between journalism and Computer Science, and what you can do in that. Which is really interesting.
But yeah, it’s much more widely taught. I think most journalists going through university will have done some data journalism, even if it’s just a short module going through - the filing exists, the basics. That’s where a lot of people, I think now, get their interest, is they do a bit of it, and like, “Actually, this is really interesting. We can do lots of things on these great stories.” And then they start exploring it for themselves, and they do more modules, or start looking around at what resources there are.
Len: For anyone listening who might be interested in pursuing that path, one thing that you address, is that you don’t need to go into it having already been a computer programmer, or knowing the R programming language, or something like that. Or being a maths guru. There’s actually all kinds of really fascinating ways you can get started using free tools that are out there, to do your own data journalism.
Claire: Yeah. Because I think there’s a bit of a misconception that you must learn programming to do data journalism. I’m going to be honest, my programming skills are very limited. I can just about knock up a JavaScript chart. But I’ve been able to do a lot of really interesting, in-depth stories, and quite technical data cleaning, and finding ways to scrape data off websites using a lot of these different tools that don’t require a programming language. The vast majority of the stuff I do is done in an Excel spreadsheet.
Because a lot of the data you will deal with isn’t necessarily massive data. It’s either government releases or it’s FOI responses that you’ve put together as a spreadsheet - that you can analyze in Excel, and you can get the great interesting stories out of. You don’t necessarily need to be able to use R or code, or use Python to scrape things off websites.
While those are great and useful skills, and you’ll get great stories doing that - you can get a lot of brilliant stories without those. So, if you want to get started with it, you don’t need to be like, “Oh God, I need to go and learn Python.” It can be like, “I just need to know my way around a spreadsheet, and I’ll pick the rest of the skills up as I go.”
Len: The last section of these interviews, if the guest is an author, is to talk about their experience writing a book or books. I was just wondering if you could talk a little bit about - for those interested in that, about your process for writing the book? Did you have a plan? Did you plan out every chapter in advance? Did your plans change as you went through it? Were a lot of the pieces based on blog posts that you’d already written? Things like that?
Claire: I think with the first edition, it was partly based on blog posts, and also on dividing it up into different sections. Going, “Well, there needs to be a section looking at visualizing maps. And then there needs to be a section to pick out charts. And then I probably need to cover where you get your data from. What is data?”
I think initially, it was probably planned in that, “Well, what topics do I need to cover, and where do those fit?” I mean, writing the second edition was much more about going back through the book in its entirety and going, “What sections still make sense, and work?”
A lot of the stuff around, “What is data?” hasn’t necessarily changed. But then other sections - either the tools didn’t exist, or things have changed. Or I wanted to expand things.
The previous section I felt was very basic - just explaining the Freedom of Information Act, how it worked. Whereas now, I can be like, “This is what you can do if you get refusals, and how you can challenge them and put more detail in.”
Interesting, like in the section on cleaning up data - I think in the first version, I’d written how to clean up - I’d had these particularly messy spreadsheets of data on parking tickets, that were very much designed to be looked at, rather than actually used. I’d used those as an example, to show how you can use various spreadsheet formulas to clean them up.
I think in the first book, I’d got to the point where I’d cleaned them up mostly. Then I was like, “At this point, you’ll just have to do this all manually.”
Whereas in the second version, I was like, “No, I have a formula that will solve this, and get that done in like five minutes.” That was quite nice, to be able to go back through and be like, “My skills have evolved, and also my patience for doing things manually is much lower than it obviously was back then.”
That I’ve managed to find ways to clean those up, to copy down dates, and put things in columns - and just generally get everything in a spreadsheet format that you can actually analyze, in a way that’s much better than it was back in 2013.
Yeah, that was definitely the process with the second edition, going through, expanding, adding bits. The structure didn’t really change. It was just the content, it’s much more up to date and much more detailed in places.
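One of the clean-up steps Claire mentions, copying dates down into the blank cells beneath them, can be sketched in Python as below. She does this with a spreadsheet formula; this version only illustrates the same idea, and the function name `fill_down`, the column names, and the sample rows are made up.

```python
def fill_down(rows, column):
    """Copy the last non-blank value of `column` into the blank cells below it."""
    filled = []
    last_seen = ""
    for row in rows:
        value = (row.get(column) or "").strip()
        if value:
            # A new value starts a new group, so remember it.
            last_seen = value
        new_row = dict(row)
        new_row[column] = last_seen
        filled.append(new_row)
    return filled


# Made-up rows laid out "to be looked at": the date appears only once per group.
tickets = [
    {"date": "2013-01-05", "street": "High Street"},
    {"date": "", "street": "Station Road"},
    {"date": "2013-01-06", "street": "Mill Lane"},
    {"date": "", "street": "Church Road"},
]
cleaned = fill_down(tickets, "date")
print(cleaned)
```

After this step every row carries its own date, so the table can be sorted and filtered without losing track of which group a row belonged to.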
Len: I should mention, the section on cleaning data in your book - for example - is full of incredible time-savers for people who work with these kinds of things. I really like that description you just gave of a spreadsheet designed to be looked at, rather than used. I think we all know that feeling, “Oh I’ve got the spreadsheet. Thank goodness.” And then you open it up, and it’s like, “Oh no, this wasn’t made for people to do analysis, rigorous analysis on.”
Speaking of tools and things getting better, if the guest on our podcast is a Leanpub author, the last question we save for the end, the very last question is, if there’s some terribly awful thing about Leanpub that we could fix and make better for you and other authors that you can think of, or if there was some magical feature that we could build for you, is there anything you would ask us to do - or stop doing, perhaps?
Claire: I’ve generally found it easy to use, and quite intuitive. I think I swapped from writing it in text files in Dropbox, to writing it on the browser, because for some reason, it just did not like that.
I think the only area where I struggled a bit was, because my book has a lot of pictures in it, because a lot of the examples have got pictures to go with them, I think that’s where I struggled around getting pictures of the right size and the formatting right and -
Sometimes it took several attempts to get the picture right. It was either tiny, and you couldn’t see what the heck was going on. Or it was taking up the entire page.
I think stuff around adding pictures to books, it feels like there might be like an easier way of kind of - I think at the moment, it’s like - it’s easy to put them in when you’re working with the document. But then you can’t see what it looks like, until you preview it and try to get between the two to get it to look right. It was a bit time-consuming.
Len: Thank you very much for sharing that. That’s really useful actually. Claire’s mentioning previews. What happens when you write a Leanpub book, is you write it in plain text. You’ll put your image file somewhere. And then in your manuscript, you refer to that image. And you say, “I want this image to show up here.” But you don’t know what it’s going to look like, until you click the button to generate the ebook files. And then you can see. Having to do that over and over again - to try and get each image right, can be quite frustrating and time consuming. It’s an area where we know we have a lot of work we can improve.
Also, I know that you’re writing your book in our Browser writing mode. And the way that resources, like images, are handled there, could use a lot of improvement.
It just doesn’t look very nice, and things can’t be sorted. That’s an area where we could do a lot of improvement.
One thing we could do, and that we should do, is document what are the optimal image sizes, and things like that, for the different book page sizes for the PDFs. And just a little bit of a guide on, what will images look like? What does an image look like in the PDF and in the EPUB, and in the MOBI - and if someone’s reading in an EPUB reader on the web, things like that. That’s an area where we could do a lot of improvement. I really appreciate your feedback on that.
Well, Claire - thanks very much for taking some time out of your evening to talk to me, and to talk to our audience about your experience as a data journalist. And thank you very much for using Leanpub as the platform to publish the second edition of your book.
Claire: Thank you.
Len: Thanks.
And as always, thanks to all of you for listening to this episode of the Frontmatter podcast. If you like what you heard, please rate and review it wherever you found it, and if you’d like to be a Leanpub author, please visit our website at leanpub.com.
