The Leanpub Podcast: David Diez, Co-Founder of OpenIntro

Len: Hi, I'm Len Epp from Leanpub, and in this episode of the Frontmatter podcast I'll be interviewing David Diez.

Based in San Francisco, David is a tech lead in Fan Funding at YouTube, and co-founder of OpenIntro, a non-profit organization that develops high-quality, tested educational resources, including open source textbooks, free videos, and much more.

You can follow OpenIntro on Twitter @OpenIntroOrg, check out their website at openintro.org, and you can also subscribe to their popular YouTube channel OpenIntroOrg.

OpenIntro has published a number of books on Leanpub, including OpenIntro Statistics, Advanced High School Statistics, Introduction to Modern Statistics, Introductory Statistics for the Life and Biomedical Sciences, and more.

In this interview, we’re going to talk about David's background and career, OpenIntro, some of the important issues regarding the publication and sale of college textbooks generally, and at the end we'll talk about his experience writing and publishing books and other content through a volunteer-based non-profit organization.

So, thank you David for being on the Leanpub Frontmatter Podcast.

David: Thank you for having me, I definitely appreciate being asked to join.

Len: I always like to start these interviews by asking people for their origin story. So, I was wondering if you could talk a little bit about where you grew up, and how you found yourself eventually in the field of statistics?

David: Sure. I grew up in Rochester, Minnesota. Which is, I guess, maybe best known for the Mayo Clinic. I did my undergrad at the University of Minnesota, in Mathematics, then went to UCLA for my Statistics Ph.D. I did a postdoc in Biostatistics at Harvard School of Public Health. Then, I went to YouTube in 2012, and have been working there ever since.

And in parallel to some of these things - well, at UCLA Statistics - I had started working on OpenIntro with a few other people. And out of that came - as you mentioned, a book initially, and then, over time, a collection of books - as well as other books that other people have written over the years.

Len: And just to learn a little bit more about you - when I was looking at your page on LinkedIn, I noticed that you appeared to have gone straight from a bachelor's degree to your PhD at UCLA in statistics. Is that a normal thing in statistics?

David: I think it is reasonably common. But it definitely will vary for the background that people have. Some people will be coming from industry. But I do see a lot of people who go straight from undergrad to the Ph.D. program, or maybe do a Master's degree, and then move into the Ph.D. program.

Len: And what was your Ph.D. work on?

David: It was on spatial statistics, which I have not actually done much of since my postdoc.

Len: What's that?

David: So, that is - basically, maybe the easiest example to think about would be earthquakes in California. You could think of putting them on a map. So, maybe we have the map for 1989, the map for 1990 - and so on. And the main focus of the Ph.D. was trying to take that collection of maps, and summarize what would be a typical map for a typical year for earthquakes in California. So, trying to do that on kind of a broader basis for potentially multiple dimensions. In a way, it's kind of like the spatial median of these point process maps.

Len: I can see how that work would have been compelling for people in California.

David: Yeah.

Len: And so, you did a postdoc at Harvard for a couple of years - where one of the things you did research on was the efficacy of smoking bans. And as a former smoker, who was quite skeptical about the efficacy of smoking bans when they were first introduced - I was just wondering if you could talk a little bit about your work there. What kind of work would a statistician do to determine the efficacy of smoking bans?

David: Yeah, that was a very interesting project. Maybe a little bit more background on the work that had been done before we had done our work.

Over the years, there had been a series of smoking bans in different counties and different states. And there had been different research studies on the efficacy of those, of reducing health effects - along the lines of heart attacks, as well as other cardiovascular outcomes. This is a pattern that does actually come up in a handful of contexts, where the initial study that gets published will estimate a very large effect, and it'll also have a very large confidence interval. But it will be a statistically significant result. And then a larger study comes along - and the effect gets a bit smaller, as does the confidence interval.

Basically, you see this trend where the effect tends to get smaller as the studies get larger and more powerful, for getting a more precise estimate of that effect. And when we did our study - actually, we did not find any statistically significant effect from the smoking bans on the particular health outcomes that we were focused on. Which - it's been close to a decade now - but I think it was heart attacks, if I recall correctly? And there's some nuance there as well, that we were focused on some particular health outcomes. There could be other health outcomes that were impacted. But thinking about secondhand smoke aspects, it perhaps wasn't as powerful as those - well - I'd be very surprised if it was as powerful as those earlier studies had suggested.

And, to me - that was a very powerful lesson in terms of, especially - the risk of publication bias. So, there might have been other studies as well, where the authors have done a study - but they didn't find a statistically significant effect, and so it wasn't as interesting. Maybe they either didn't even write the paper and suggest the result to journals? Or perhaps the journals were not as interested in those results.

So, it tended to be that you'd see these larger effects that were statistically significant, that dwindled over time as they got more and more precise to smaller effects - but still significant. And then eventually, not statistically significant.

And, I guess - one more nuance about that was that, the way that the modeling was done - was that if we had taken the particular approach that the more recent studies had done, we still would have found an effect. But when we did diagnostics on the models, it suggested that model was incorrect. And so, we needed to have a more flexible model that accounted for nonlinearity in the trend of these health outcomes.
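Illustration (not part of the conversation): the pattern David describes, where small early studies report large, statistically significant effects that shrink as studies get bigger, can be reproduced with a short simulation. This is a minimal sketch under made-up assumptions: a true effect of 0.2, standard errors of 1.0 for "small" studies and 0.05 for "large" ones, and a rule that only estimates with z > 1.96 get published. None of these numbers come from the smoking-ban research.

```python
import random
import statistics


def simulate_published(true_effect, std_error, n_studies, rng):
    """Simulate study estimates and keep only the 'significant' ones (z > 1.96)."""
    published = []
    for _ in range(n_studies):
        # Each study's estimate is the true effect plus sampling noise.
        estimate = rng.gauss(true_effect, std_error)
        if estimate / std_error > 1.96:
            published.append(estimate)
    return published


rng = random.Random(0)   # fixed seed so the sketch is repeatable
TRUE_EFFECT = 0.2        # hypothetical true effect

small = simulate_published(TRUE_EFFECT, 1.0, 2000, rng)    # noisy, small studies
large = simulate_published(TRUE_EFFECT, 0.05, 2000, rng)   # precise, large studies

small_mean = statistics.mean(small)
large_mean = statistics.mean(large)

print(f"small studies published: {len(small)}, mean estimate {small_mean:.2f}")
print(f"large studies published: {len(large)}, mean estimate {large_mean:.2f}")
```

Under these assumptions, every published small study must report an estimate above 1.96, far above the true 0.2, simply because that is what it takes to clear the significance bar. The large studies land close to the true value.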

Len: It's such an interesting subject. I mean - for example, like specifically if you're trying to find a - and just speaking as a layperson, right? If you're trying to find a connection between smoking bans and heart attacks, you've probably got to take into account things like - was smog going down in the areas where people were living at the same time? Were there new healthcare opportunities for people in the areas at the time? And even changing demographics, and things like that. So, you would have had to come up with some set of variables that you were taking into account, that might also have been affecting heart attacks. So in order to do it, you actually have to study many different influences on heart attacks at the same time as smoking.

David: That'd be the ideal, for sure. Yeah, we tried to kind of have those modeled as a background curve of the rate of heart attacks. So, you can imagine that before the smoking ban, we would be modeling the trend line of those health outcomes - and that trend would hopefully reflect those kind of background aspects - whether it be air pollution or whether it be healthier eating, or unhealthier eating, or exercise, etc. And what we were looking for in our particular model was to see whether there was a step change once the ban went into effect.

We had hoped to see that, as the exposure to secondhand smoke, in particular, was removed - in many cases, that we'd see a reduction in these bad health outcomes. So, I guess, that flexible modeling part beforehand, was I think where our study differed more from the other studies, and that kind of tempered the results to zero. It was interesting work, an interesting project.
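Illustration (not part of the conversation): the idea of a background trend plus a step change at the ban date can be sketched as follows. This is a deliberately simplified version that assumes a straight-line pre-ban trend and invented monthly rates. The study David describes used a more flexible trend, and the function names here are hypothetical. The second data set shows why that flexibility matters: a curved background with no step at all still produces an apparent "effect" when a straight line is assumed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_step(months, rates, ban_month):
    """Fit a linear trend to pre-ban data, project it forward, and
    return the average gap between observed and projected post-ban rates."""
    pre = [(m, r) for m, r in zip(months, rates) if m < ban_month]
    post = [(m, r) for m, r in zip(months, rates) if m >= ban_month]
    intercept, slope = fit_line([m for m, _ in pre], [r for _, r in pre])
    gaps = [r - (intercept + slope * m) for m, r in post]
    return sum(gaps) / len(gaps)


months = list(range(48))
BAN_MONTH = 24
TRUE_STEP = -3.0  # made-up drop in the rate once the ban starts

# Case 1: linear background trend with a genuine step at the ban.
linear_rates = [100.0 - 0.5 * m + (TRUE_STEP if m >= BAN_MONTH else 0.0)
                for m in months]
step = estimate_step(months, linear_rates, BAN_MONTH)

# Case 2: curved background trend and NO step at all.
curved_rates = [100.0 - 0.01 * m * m for m in months]
spurious_step = estimate_step(months, curved_rates, BAN_MONTH)

print(f"estimated step, linear background: {step:.2f}")
print(f"estimated step, curved background with no real step: {spurious_step:.2f}")
```

In the first case the straight-line model recovers the step. In the second, the straight line cannot follow the curve, so the extrapolation error shows up as a step that was never there.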

Len: It sounds fascinating. Actually, just generally speaking - statistics has become - I think, a more prominent part of even, you might say, pop culture, in the last ten years or so, than it was in the past. With figures like Nate Silver, for example, becoming very popular with his work on elections and things like that. But even with visualizations in reporting in the news and stuff like that - I think people have become a lot more aware of statistics being used, and how they can be used.

And of course, and this probably goes into a part of the OpenIntro story. But the huge explosion and interest in data science, in particular, is something that's happened in the last, I would say - just decade or so. Can you talk a little bit about how you've seen that happen, and why you think that might have happened?

David: I guess maybe I'll start with - when I was doing my math undergrad, one of the professors that I had, had steered me away from mathematics, toward statistics. He had actually foreseen this shift in trend, toward statistics being a much more relevant topic. So, I should also just acknowledge there that I didn't see this coming, but somebody who was a mentor to me did see it coming. And so I definitely appreciate that direction that he had provided.

In terms of what I've seen kind of evolve over the years - yeah, data science itself has become a very - both a pop culture, and I'd say, especially a pop culture term. But something that reflects where things are headed in terms of - more data is becoming available. We do need people who are competent at analyzing that data. Whether that be through analysis, or whether it be through machine learning systems.

There's an interesting terminology behind data science as well. In terms of - when somebody says like, "I do data science." It's also not always clear what they mean. It could be analysis, or it could be machine learning. There's some blogs that I've read that I really enjoyed around describing data scientists. There's a type A data scientist for analysts and type B for build. So, I would definitely encourage anybody who hasn't heard of that, or read those blogs - to search that out and read their thoughts on that.

But yeah, - we've definitely seen it grow as an area, even in the last ten years. I remember it being a much smaller community early on. Much more scrappy. And I think things have really developed and matured in a really healthy way.

I think some of those people that - contacts that you'd provide as well, like Nate Silver - I think, was a nice example of somebody bringing data science to the masses in a very interesting context. I know he may get some - people aren't always happy with every result that he estimates. But, I mean - for the most part, I think he's done a really phenomenal job in a very difficult context - to give predictions, based on very noisy past data, and very noisy current data, as well.

Len: And actually one really interesting thing about it is how political things can get when you're - and maybe it sounds funny to someone who might not be familiar with the whole discourse around statistics, you'd think, "Well, it's dry bean-counting, right?" But then up until one particular American election I'm thinking of - the New York Times was showing these like dials set to like a specific percentage point, like 86% chance that candidate A is going to win. I think a lot of the flak that Nate Silver - and I'm not an expert on the issue, but, that he somewhat unfairly got - was that there was - the presentation of things that he had partially been behind - was not necessarily something that he had decided upon, or even approved of.

That leads me to ask you a very general question, which has actually come up on the podcast before. Partly because it's a fascination of mine, but - one curious feature of ordinary human common sense is to believe that if something is presented in association with a number - is that it must be true, quantifiable and well understood, right?

I've had this experience in the financial world to some extent. Nothing to do with stocks or anything like that. But if you're doing high-level financial analysis of a business plan, for example, and you show people projections - let's say, take a simple line chart. And then you have some numbers like, "The IRR is going to be 12.5% over the course of the lifetime of this investment." In my experience, no matter what you say to qualify what you've presented, half of the people at least, are going to walk away thinking, "Len just said the IRR is going to be this." And they might come back in five years and go, "You were wrong." And when you reply saying, "Well, if you recall at the time - I put all this context around my number." They'd go, "What are you talking about? No, you said it was 12.5%."

In general, in your work - which maybe we'll talk about in a little bit - I mean, I'm imagining at the professional level that you operate at YouTube, people understand this as well. But when you're talking to people in the general public, or even educating future data scientists - what advice do you have for them, about how to qualify the presentation of their results?

David: I think you're highlighting something particularly important, which is that communication is definitely an underrated skill for - not just data science - but I mean, every technical field. And in terms of the advice that I would maybe give someone, would be - be your toughest critic on your work. Really try to poke holes in your own work. Because if you don't do it, it's either going to be somebody who may do so in a more public way. Or it could be that - even worse would be that those errors are not considered, and they're not communicated accurately.

So I guess, first, be your biggest critic. And make sure that you know what's not working there, and then temper the results.

As you've described, doing that can be very difficult. It's very easy for somebody to latch onto something that's very concrete, like a particular number that's reported - and forget all that context around it. I think this is probably an area where there's a lot of room for better education on, as well as maybe better best practices of how to communicate those results in an effective way, that does capture the uncertainty maybe that is associated with them.

One way we do that in statistics, is with confidence intervals. But confidence intervals, even themselves, don't always capture the true sense of the uncertainty. Usually, they're reflecting the uncertainty in the model. And maybe the model itself is something that we're not too sure is a correct model, or even a - well, no model is correct. But maybe not even that close to correct. So, I think it's worthwhile to try to run multiple variants of approaches to see how variable those results might be. And in many cases - I've done this in my past as well on projects, and sometimes those confidence intervals for a particular model will not reflect very much uncertainty. But then looking at a different model, we see a lot of uncertainty between those models.

In those cases, I actually find it useful to report all of the results and just say, "Hey, we know that there's uncertainty in all of these different approaches, and we don't know which of these is correct. These are the range of possible results." Even forgetting the confidence intervals for a moment, just reporting those initial point estimates. I found that to be a pretty helpful approach for communicating uncertainty - is to report, not just a single result - but multiple results. And if not, then at least a good confidence interval that does really reflect the uncertainty associated with that estimate.
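Illustration (not part of the conversation): reporting several point estimates side by side, rather than one number, might look like the sketch below. The data are invented (a falling monthly rate with a drop of 3 at month 24), and the three approaches are simple stand-ins chosen for this example, not the models David used.

```python
def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


months = list(range(48))
CHANGE_MONTH = 24
# Made-up data: steady decline of 0.5 per month, plus a drop of 3 at the change.
rates = [100.0 - 0.5 * m + (-3.0 if m >= CHANGE_MONTH else 0.0) for m in months]

pre, post = rates[:CHANGE_MONTH], rates[CHANGE_MONTH:]

# Approach 1: compare the full before and after averages (ignores the trend).
before_after = mean(post) - mean(pre)

# Approach 2: compare only the six months on either side of the change.
narrow_window = (mean(rates[CHANGE_MONTH:CHANGE_MONTH + 6])
                 - mean(rates[CHANGE_MONTH - 6:CHANGE_MONTH]))

# Approach 3: project the pre-change linear trend and average the gap.
intercept, slope = fit_line(months[:CHANGE_MONTH], pre)
trend_adjusted = mean([r - (intercept + slope * m)
                       for m, r in zip(months[CHANGE_MONTH:], post)])

estimates = {
    "before/after averages": before_after,
    "six-month window": narrow_window,
    "trend-adjusted": trend_adjusted,
}

# Report every estimate, then the spread between them.
for name, value in estimates.items():
    print(f"{name}: {value:.2f}")
print(f"range across approaches: {min(estimates.values()):.2f} "
      f"to {max(estimates.values()):.2f}")
```

Each approach would come with its own confidence interval in practice, but the spread between the point estimates already tells the reader how much the answer depends on the modeling choice.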

Len: Thanks very much for that really great answer.

So, as someone who's worked both on the academic side of things and from - one might say - the sort of industry side of things - do you think it might be easier to practice being your harshest critic when you're working in a company? As opposed to in academia, where you're trying to say - for example, get tenure, or something like that? You talked a little bit about publication bias earlier on. But I know from friends of mine in the sciences in academia, that there's a tremendous - of course, in the humanities too - there's tremendous pressure to publish. And at a certain point, if you're going to get ahead in your career - you have to stop being that critic, to put it crudely.

David: Yeah. I think that's a really thoughtful observation, and I think that - I don't know that there's a good answer to that in terms of what the individual can do. I think that's a worthy call-out. Especially, I think, to the folks who are more senior - to try to adjust that system, which is basically the job of folks who are more senior in these departments, to make sure that the incentives to do good science, also align with people's advancement of their careers.

I think there's been attempts to do that in terms of focusing around publication, and making sure people are producing peer reviewed work. But I don't think that that is a solution in itself.

That feels like a metric that has been created of enough publications, as well as publications that receive certain amounts of citations, or some type of reputation based on that. But once - I think there's a saying, I can't remember who said it. But, "Once a metric is created, it becomes potentially less useful as people start to game that metric - even if not intentionally, just subconsciously." And that's definitely a problem that we see in industry of - a metric is created, the team tries to start focusing on that particular metric.

But the same thing happens in academia, and it does feel like that metric has become particularly stale. I know it's much more complex than that, but I would love to see some further developments. It's a very hard problem, and I would hope that senior folks who do come up with a better solution there, would get lots of credit for improving upon that problem as well.

Len: That's super interesting. That thing can happen in finance as well, right? Where a new metric is created, or one becomes more popular than in the past. And then you start tweaking what you do to basically "study to the test." That can actually have huge consequences on how all kinds of institutions are run, and how all kinds of people's individual career paths go.

Which reminds me of the next question I wanted to ask you, which is going back to your story a little bit. Which is - so there you were in Harvard doing a postdoc. You had your Ph.D. behind you as well - and you decided not to go with the academic route, and you moved to California and started working for Google at YouTube. How did that come about?

David: I guess - even as early - maybe, as high school - I'd planned to do teaching, and that had been most of my focus through midway through my postdoc.

I think one thing I realized during my postdoc was that, "I don't actually enjoy teaching itself," but I really enjoyed the development of educational materials. And to me, the best of both worlds was then to work in industry, which happened to be at Google and at YouTube.

But then also, in my spare time - in my evenings and weekends, and with a really awesome team at OpenIntro - to work on these additional resources that are open source. And that to me, was what was right for me. It's definitely not necessarily what is right for everybody else. But I found out, really understood better, what I enjoyed doing - and ended up heading in that direction instead.

Len: I've just got one more question before we go on to talking about OpenIntro, which is - I think people would probably be really interested to know what a data scientist does working for YouTube on a day-to-day basis. I was just wondering about your current role in fan funding, if you could talk a little bit about what work you would do when you - well, go into the office - if you're even doing that?

David: Yeah. So, not quite going into the office yet. We're still working from home. But, soon - hopefully, we'll be back in the office.

In terms of what the - maybe I'll give a little bit more background on fan funding, and what that means for YouTube.

So, fan funding is primarily focused on a couple of different categories of products that are intended to - basically help fans of channels, help financially support those channels.

A couple of examples of products for that would be channel memberships. So, for example, Len - if you had a YouTube channel, and you had channel memberships on your channel - then I could join your channel and pay a subscription fee, basically to be a member of your channel. And so, I think - that's been, a very powerful feature for creators, to build an even stronger sense of community on YouTube - as well as make it more possible for them to further their work on the platform.

Another example of a product is through live video. So, folks who are running live video on YouTube, and have the features enabled for their channel. They can have what are called "Super Chats," and "Super Stickers." And so, basically - people who want to support the channel, can purchase a Super Chat or a Super Sticker in the live chat with that community. That can add a little bit more prominence to their chat message, or it can post a sticker into that chat thread. And the creator then makes some money off of that to help further their channel.

Len: Thanks a lot for sharing that. You've just reminded me - I'm not a big gamer or anything like that, but a while ago I got into Zelda: Breath of the Wild, and I started watching some Twitch streams. I was very surprised to see the person - like three times a minute - pause to sort of thank someone for what they've just contributed to them, and I imagine it was something similar to what you're describing.

And being old, I was surprised that this was just like - that giving money was actually this sort of cheerful community-building thing, and I was actually quite pleased to see that actually as a convention. That just giving a little bit of support to people, because you're enjoying what they're giving away otherwise for free - is actually a really good development in supporting creators.

David: Yeah. And I think that kind of - in a way, it connects nicely with Leanpub - since that's exactly how we run our books on Leanpub. How it's a - folks who want to give for our books, it's purely optional. And so, it feels like there's some nice harmony there between working in OpenIntro and having books along those lines - as well as working on fan funding at YouTube.

Len: Well, you just gave me a great segue into the next part of our podcast, where we talk about OpenIntro. You mentioned earlier - I didn't know this, that you started it while you were still doing your Ph.D., if I caught that correctly?

David: Yeah, that's right.

Len: Could you talk a little bit about the origin story, how did it come about?

David: I would say the earliest phases of it started in 2008, very roughly. We don't even -- I don't even know if we have an exact year that it really started. But by 2009 it was Chris Barr, Mine Çetinkaya-Rundel and myself - who were really dedicated working on OpenIntro Statistics, the textbook that would become the core product of what we offered. And now it's one product among many, but that was - I would say, us just observing the textbook industry - of our own experience as students at the time. Of seeing extremely expensive textbooks. And especially as Teaching Assistants - and seeing this same textbook being used for students that didn't change that much year over year or edition over edition, yet kept getting more expensive. That was very confusing.

We now understand why that is much better, which has nothing to do with the free market.

I guess we were just confused and naïve, and we were like, "Hey, let's do something about this." It's good to be naïve when you get into something this time-consuming, and that you start spending hours, and you just keep spending hours - because you think it's much closer to completion than it actually is.

But eventually - fortunately, if you keep with it - you get there. So, yeah - I guess, 2009 we really started working hard on the actual text of the book. By mid-2010, we had a preliminary edition. In 2011, we had a first edition - and we've had a few editions since then of that same book, as well as some more books.

Len: And for those who might be interested in or intimidated by the process of setting up a nonprofit - was that something that you guys were planning on doing from early on, and was it hard?

David: Yeah. So we had planned to do a nonprofit. Early on, we just basically made sure we didn't make any money. That way, we didn't really have to worry about any of the aspects around that.

But in terms of setting it up - we actually ended up just paying for a service to set it up for us. And that was - if I remember right, it was a few thousand dollars. People can definitely set it up on their own. But we anticipated that we would be growing over time, and just wanted to really focus on the work that we were already doing. And so we just had paid an organization to take care of that aspect for us. And now we run it, more - basically on our own, but that initial setup was helpful.

Len: And had you raised any money, before you decided to spend the money to set it up?

David: We had. I think that same year is when we started charging more for the book. For the first five years or so, we actually just didn't make a profit on anything. I think even on the individual book sales, we were just marketing it at cost. This was the paperbacks - at that time, we were just doing straight PDF. People could download it without getting it through Leanpub. I hadn't known about Leanpub at the time - but had I, then we might have considered going through there earlier.

So our paperbacks - we had just priced them at cost. I think it was 2016 that we did start increasing the margin, with the anticipation that we would become a nonprofit by the end of the year. And that was successful, so we were able to just roll that all into the nonprofit, and not have to deal with any weird tax stuff - which was nice to--

Len: Oh, I see. So you were actually operating for years, before you incorporated as a nonprofit organization.

David: Yeah.

Len: Ah, okay. Yeah, it's interesting - I actually had a recent interview with someone who was working in the open source world, who thought initially that not taking any money would make things simpler - and they got really complicated for him in short order. He eventually had to incorporate. And in his case, it was a for-profit organization. But at a certain point of maturity in any project, formalizing things and - yeah, having a sound legal basis for them - is very important for anyone out there listening, who's thinking about launching something like this. And yeah - don't think about it too much though, as you were saying.

David: Yeah.

Len: You might not start. I'm just actually just looking at a screenshot I took of a slide from a presentation you gave a few years ago. I didn't realize when I watched it how long ago it was - I think it was 2014 or 2015? But there was actually eventually an association between OpenIntro's products and Coursera.

David: Right. Mine had actually run a course on Coursera. And that must have been 2012 or 2013, I don't -? Somewhere around 2013ish that she ran a Coursera course. And that was actually a really - I think, a big boost for both her and also OpenIntro - as outreach for folks who would become aware of the project. I think that was a really nice set of work - and a lot of work, that she put into making that course happen. And since she is an author on the book - of course, she ended up using OpenIntro Statistics, which was perfect, since it was free online.

I think at that time, we - maybe that was the time where we shifted onto Leanpub, and I think we were just - she would just take the profits from that, since they were all voluntary, and she was doing all the work for getting that large audience. So it was just something that was listed on the Coursera page. But that was our first foray into Leanpub, which was exciting and encouraging for future developments in that space.

Len: We can talk a little bit about that at the end of the interview, where we go into the weeds of how to set up a project like this on Leanpub and things like that. But before we do that, actually - I feel like I've buried the lead here. I imagine it evolved over the years but - if you were to give us the mission statement for OpenIntro - what would you say its purpose is, and just a little bit of the details of how it goes about achieving that mission?

David: Yeah, so I'll actually just - I'm looking at our homepage right now, just to make sure I get it precisely right. Our mission is to make educational products that are free, transparent, and lower barriers to education. And that has evolved to also include now - really supporting other books, as well.

We've just started recently in 2020, to work with authors outside of kind of our core line of textbooks - to try to bring our services to those books as well. That's in that same mission, of trying to make education more accessible for students, no matter their financial means. As well as provide them with plenty of supplements that they can learn in their particular way, if possible at all.

Len: And the financial means question is actually quite interesting in the case - I'm thinking particularly of Coursera. And data science has become so - their MOOCs on data science - but data science generally, is actually popular all around the world. It didn't just explode in the United States or something like that.

And so, people who are producing content for students of data science, their audience is the whole world. And this might be people from countries with very different purchasing power parity and stuff like that. And there might be many people for whom any price is too high, when they're just at the beginning of their journey. And so, when you talk about the things you're trying to address, that's actually a big part of it. It's not limited to an American vision.

David: Yeah, that's definitely true. We definitely see people using the resources in many different countries and, as you said, in countries of different means as well. That has been something that has been important to us. We definitely have still focused on the US. Mainly because that's what we're most familiar with, and that's what we can get the most, I think, traction. But we do have lots of people in other countries who are using the book. We actually had some folks over in Japan, who did a translation of OpenIntro Statistics recently. So now, there's a book now in Japanese that students can learn from. Which is much better for local folks there, than reading the English version. So, yeah - I think that call-out of, "It varies across the world," is an important one.

Len: And although that's true - one of the reasons I bring it up is that there is a particularly American framing, I think - when I was doing my research for this interview, to the mission behind the organization, which is partially that tuition costs are so extraordinarily high in the United States, and this creates particular problems, even in a wealthy country, for people who are trying to get through their education.

And one particular feature of the expense of getting into university - let's just limit our scope to university degrees - is the cost of textbooks, which is fucking crazy. It's something I wanted to give you an opportunity to talk a little bit about. Because this was obviously one of the origins - you and your colleagues and friends, were Teaching Assistants, doing your graduate student work. And then seeing students being assigned books that are worth hundreds of dollars, that they're going to use for one course once - and might not even be that good.

I guess, generally - maybe as a way in - you mentioned earlier on that it's not a free market, and I was wondering if you could talk a little bit about things from that perspective? The paid university textbook market.

David: Right, yeah. So this is one thing that we were completely naïve to, when we first started our work there. And, I guess - thinking about what happens during the purchase of a textbook, or the context around the purchase of a textbook - is particularly interesting.

You can imagine that maybe, Len - if you're running a course, and I was taking your course. You would choose the book that we would purchase as students. And, in a way we have one person picking the textbook, and another person having to pay for that textbook. So it may not be the right textbook for me as the student - in terms of it might not be ideal for me, or maybe I think it's just too expensive, and there's another book that offers the same content at much more value. But as a course goes, it's hard to get around that issue. It's not really feasible to make it so that students can choose whatever book they want. So that itself, is something that we didn't really understand - even though it is quite obvious in hindsight.

But then there's even more behind it from there. Maybe students don't recognize this as much - but teachers also get their textbooks for free in most cases, or at least in many cases. So if maybe I'm teaching a course and I want to explore a textbook for whether I want to use it - I'd reach out to the publisher and I'd say, "Hey, please can you send me a copy of the textbook, and I will consider using it for my course." And the textbook company will typically send the teacher the book. Which we also do for OpenIntro, I'll readily admit. Which is, I think, an important thing. Because we have to compete with these other publishers that are doing the same thing.

But I think where things really go haywire, is that it's not always transparent to teachers what the actual cost is to students - and they may not really be internalizing the true costs that are incurred by students. That's, I think, one place where we really differentiate ourselves from other publishers. As the book is free online - students can get the book for free, just as the teacher got it for free. And we also offer paperbacks at $20 or less for our books. And I guess, that aspect of, the teacher might not know the price of the book - actually it's interesting, because by law, the publisher has to tell them the price, and has to be very clear on that. But I don't know that that's always happening.

I've actually gone to conferences and visited these introductory textbook booths, and they don't always have a price list. So even at these conferences, they don't always follow the law. And that undermines any sense of a free market, in this aspect.

So, I think it's a very complex situation - and actually, in some ways it mirrors the US healthcare system, in that the people who are choosing the healthcare, aren't necessarily the people who are paying for the healthcare directly, at least. It's usually the insurance companies, but the insurance companies don't have full sway. And so, that can lead to escalating costs. Which I think is - there's some parallels here between healthcare and the textbook industry - which is interesting.

Len: Oh yeah, definitely that's a super fascinating thing. I think I'll ask you - I've got a question I'm teeing up to ask you in a moment about that. But before we get there, just to give a sense of scale. You've talked about this in a couple of talks and things like that. But I think you give the statistic that something like 60% to 65% of first year undergraduate students don't buy a textbook that they need for one of their courses, because of the cost.

David: Yeah.

Len: That's a really large proportion of people trying to get an education, who are actually missing a key resource. And you've also got this really funny observation that, there are cases where - if you take the number of textbooks purchased, the cost of the textbook multiplied by the students in the course, that that cost might actually be higher than the expenditure on the teaching.

David: Yes. I think those two statistics that we found are quite interesting, for sure. And the first one - that 65%. I've seen that in a couple studies. We actually ran our own survey and got a similar result when we asked students for it. That's why we felt pretty comfortable in reporting that number, is that - we saw it in two spots, and we replicated it ourselves in a relatively small and informal study - but still encouraging - thinking that it's actually a real result, and actually is a real disruption to education.

Len: The parallels to healthcare are so fascinating. For example, one question, when you start thinking about this, that you might ask yourself, is - why doesn't tuition cover textbooks? It covers a lot of different things that you need to get your education, but not the textbooks. If you were inventing the university system, with a big tuition payment at the beginning, you might take that into account. But that's actually, the answer to that being the right thing to do, isn't necessarily straightforward.

Because one of the reasons - at least, in my opinion - university costs have ballooned so much, is that when you just see the top level tuition amount, the people determining that amount can keep slipping things in, and making it higher and higher. And you don't necessarily realize that now you're paying - I don't know, what's the old joke? $10,000 for a toilet seat or something like that, right? And at the same time, though - giving people a tuition amount. It's sort of common sense to think, "Well, that's what it's going to cost."

And then all of a sudden, you realize as a student - maybe your parents never went to university, so how would you know? There's thousands and thousands of dollars more that you have to spend on these textbooks. And I guess, as a way in to your thinking about this, if someone were proposing, like, "We should actually include the cost of textbooks in tuition," is that something you would approve of or disapprove of?

David: I think I'd feel better about it than the current model. But I think you're making a really good point that if a student sees a price of $25,000 or $27,000, they may not really differentiate between those two prices for their decision. And in terms of - I'm trying to think of -

Len: I mean, just to carry on with the process of thinking, the counterargument would be - well, now if the universities are becoming - just like with drugs, right? With pharmaceuticals. If now the universities are becoming bulk buyers of textbooks and then they band together - for example, in a network - then they could put pressure on the publishers of textbooks to bring the price down in bulk.

David: Yeah, that's a good point for sure. I also feel like - take where I did my Ph.D. in UCLA statistics. Our introductory statistics course - that was, I want to say - around 1,500 or 2,000 students per year, when I was there. And you can imagine at $150 for a textbook, we're talking several hundred thousand dollars a year. And so, the university at that point could actually just say, "We're going to write our own book and have our own book. We're going to put one year of textbook money into paying a couple of faculty to full time write this textbook. And now we're set for an arbitrary number of years going forward, or at least several years going forward."
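
[The arithmetic behind "several hundred thousand dollars a year" can be sketched roughly as follows. This is only an illustration: the function name is ours, and the enrollment and price figures are David's approximate recollections rather than audited numbers - Eds.]

```python
def annual_textbook_spend(students_per_year: int, price_per_book: float) -> float:
    """Total amount students collectively pay if every student buys one copy.

    Hypothetical helper for illustration; it assumes one new book per student.
    """
    return students_per_year * price_per_book


# Approximate figures quoted in the interview: 1,500 to 2,000 students, $150 per book.
low = annual_textbook_spend(1_500, 150)
high = annual_textbook_spend(2_000, 150)

print(f"${low:,.0f} to ${high:,.0f} per year")
```

[With those inputs the range works out to between $225,000 and $300,000 per year for a single introductory course - Eds.]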

So there's definitely I think a better incentive there. And this is, I think, one place where we see some differentiation between the college level and high school. We have one textbook that's targeted towards high schools, and that market seems a bit more competitive. Because there is that - the schools that are purchasing those books are also the ones who are deciding which book to use. And so, I think that trying to push the universities into trying to reduce those costs as well, would be a great step. I think your call-out there that, "It's within their motivation to minimize that cost," is a good one.

Len: And in the end - like so many other things - I imagine there's - I mean, money is a big motivator, right?

I actually wrote an angry blog post about an article I read in the New York Times not too long ago, by someone who was complaining about students ripping off their professors, by using older versions of the textbooks, or pirating them online. And I'm like, "Oh yeah, the poor prof -" I mean, I've got lots of friends who are professors. I wrote a doctorate myself. I'm sympathetic to professors. But like, "Oh no, the poor professor," compared to the like 19 year-old kid paying 60 grand tuition. I mean, I'm not going to spend my time writing New York Times op-eds about one side rather than the other, I don't think. And just to give a sense of why people can get really passionate about these things -

One thing that students are subjected to nowadays is access codes. Temporary access codes to educational material. From the outside, it looks like a real racket, right? On the part of the publishing companies. Where basically students can pay lots of money to only have access to say a DRMed ebook for the period of the term of their study, or something like that.

I guess there's lots of things you could try to do to change a system like that from within. But there are things you can do to try to change a system like that from without - which is exactly what you guys are doing with your open education resources, including textbooks and videos.

I wanted to get a little bit into the weeds around the organization of all that effort. So, OpenIntro is entirely a voluntary organization, is that correct?

David: That is correct, yeah. We do have a few authors who are making a small royalty now on the books that they've written. Previously it was just, nobody was making anything. But we have wanted to - for a long time - shift towards that model where people would make some money from their work. But given the constraints that we do put on pricing and such, it's a very modest amount. But still, we think it's an important thing to have happen.

Len: And so, currently no one's being paid to do any administrative work at all?

David: No, no one's being paid to do administrative work right now. We have talked about doing that, though. Because we do teacher verifications and desk copy fulfillment - which is relatively trivial, it doesn't take much time. But it is just an onerous step of the process. And as we hopefully scale up to more books, and supporting more books beyond our core set of textbooks, we do anticipate that we will have many more teachers signing up through OpenIntro, as well as sending out more desk copies. And we do plan to start paying some folks on staff to actually do that work, which is also a nice thing in that - the grungiest work, is work that gets paid. Whereas, I think - this other work that we do, oftentimes it's not always glamorous - but it's, I think, a bit more fun.

Len: And have you had to set up any systemic recruitment process or anything like that, to find volunteers? Or has OpenIntro just naturally found people when it needs them?

David: We had - yeah, we were super informal before. It would be - somebody would reach out, and we would get back to them over email. And maybe we'd find a project for them within the organization, or maybe they had a project idea? It was extremely informal before. It's still pretty informal. We finally put a form on our website - maybe in 2019 or 2020? For people to get involved with, and so we have a set of projects that people can sign up for, and get involved with there now.

Len: And can any teacher sign up to become a verified teacher, and then get access to restricted material? Like the answers to even numbered questions, and things like that?

David: Yeah. That's what we have set up on our website. I'd say that works probably, in - I would estimate about 90% of the cases, that works quite well. There's definitely some cases where - maybe 10% of teachers, maybe it's teachers who are at schools that don't have good infrastructure, or we can't actually get a good way to check and verify they're a teacher. So, not every teacher does make it through, but the vast majority do. And the vast majority do make it through very, very quickly as well, to access that teacher-only content.

Len: And are OpenIntro books being used as part of formal university courses?

David: Yeah. We definitely have a lot of university instructors who have signed up, and confirmed that they are teaching with the book. It's ironic - even though we're statisticians, we weren't collecting that data for a very long time. We finally started collecting that data, at least through the desk copy request program that we have, where we'll have teachers indicate whether they have already made a decision, and we now have a follow-up survey for those teachers to confirm what their decision was and such.

But we do track that now, and we definitely have a large number of courses that are using the book. We actually have - it's over 3,500, it might be now 4,000 teachers, who've registered and become verified on OpenIntro. And these are formal teachers at formal educational institutions, who've registered on the site to gain access to these materials.

Len: And are there any -? I'm interested if there are any institutional roadblocks that are put up? For example - if you're an ordinary university professor, and now you want to use this open resource ebook as your textbook for your introductory course to statistics - do you have to go through any special process to justify that, as opposed to getting one from a big conventional publisher to use for your course?

David: I don't think there's any difference between just switching to any other textbook. Our textbooks have, I'd say, even more optionality than what the typical publisher does. We do offer the paperback, we also have the free online - and the books are thoroughly vetted, and have been thoroughly tested now, over the last 10 years or so, at least for our oldest book. I think that's mainly a departmental decision of, "Is the teacher permitted to choose which textbook they want?" Less than, "Are they permitted to choose an open source textbook in particular?"

Len: And do you feel that attitudes have changed in recent years, from people who are part of these formal institutions? With respect to open access resources? Because I know in the past that typically people would voice - in my opinion - a knee-jerk and thoughtless concern about the quality of something, if it hasn't gone through the process that a publishing company would have put it through. Strikes me, the people who've never - who need to read more widely, if they think that's a guarantee of quality.

But do you think people's attitudes have changed around that? There's institutions like yours that have grown so well, all around the world. MIT offers free courses, and things like that. Do you think attitudes like that have changed within the university institution and faculty?

David: My perception is, yes. I would say, though, that we're definitely getting a biased sample - that is, within our view. The teachers who do register, we know that they're already interested.

But my general sense is that there is more acceptance of it within the community. And I think that - especially now - I think books have got a lot stronger, that are open source. It might have been true that, especially going back more than ten years ago - that the books that were available - maybe they weren't - the typical open source book wasn't quite as strong as the typical publisher book.

That doesn't mean that it was a good or a bad book - either publisher or open source education. But there probably was some differentiation there. And there might even still be some differentiation for the typical book. But when we're looking at the actual popular books, the books where people have really devoted a lot of time to making that resource available, I think now it's hard to tell the difference in terms of quality. And we've definitely had many instructors who've commented that they think our books are higher quality than many of the publisher books that they've reviewed, if not all of the publisher books that they've reviewed.

And of course, this is individual taste as well. So some teachers will feel otherwise, for sure. But yeah, we're very happy with where things have gone, and where the attitudes have gone in this direction.

Len: And it's really fascinating too, that with different business models can come different solutions to problems, right? Which is - so, price is actually a really important one. But, in a sense - sort of setting that aside, right? When you're making things available as ebooks, one of the things that you guys do which is so great, is you offer - when you get version four, you can also get version one, two and three along with it.

And as a student, that might sound strange, "What do I care about the older versions? I just want the more recent ones." But if you're a teacher and you've got students with printed versions of volumes one, two, three and four - being able to point people to the right place, is actually now something you can do. Which you couldn't do before, right? If you say, "Go to page 22," and people are on different versions, it's a classic teaching problem - being able to provide instructors with ebook versions of past versions, actually just solves this really huge problem for classroom teaching.

David: Yeah. And even as well for those students who took a course four years ago - they can come and they can get the version that they learned with. So, that's in addition to some teachers who do use that older edition, which tends to dwindle out over time. But we also wanted to make sure that students - we have this line of, if a teacher uses our book, the student will have access to that book forever. That is our intent.

And that's the reason we make sure that all past editions remain available. So students can always go back to the book that they are most familiar with. Even if we think the newer edition is better - that doesn't mean that it's better for that student, who's most familiar with the edition from five or even ten years ago.

Len: Oh, that's fascinating. I had never thought of that enormous advantage, to being able to get these multiple versions at the same time. That's really great.

Just before we move on to the last part of the interview where we talk more about process and things like that, I was wondering if you could talk about what future projects OpenIntro is working on right now?

David: Our biggest project that we're working on right now is around a partner program, where we are looking to identify a handful of textbooks that have been written, have been out for a while, have been vetted, and are well-reviewed and doing quite well - and basically trying to provide them services that are similar to what a publisher would provide to a book.

We're not intending to be the publisher of these books, but we're trying to basically help these existing open source books gain more traction, and get in front of more teachers.

We have a handful of books. We have a book by Stitz and Zeager, Precalculus. We have APEX Calculus and we have Linear Algebra available as well. And these are books that have been written and around for quite a while now, and have been doing quite well - and they've joined us.

We basically provide desk copies - and if they have solutions manuals that they want to keep restricted to teachers, then we can host those on our site, and take care of that process of reviewing teachers for those books.

This is part of something that we've been working on for a while - trying to really streamline our own process, so that we can do these things extremely efficiently. And to support these other books - it's really just a little bit more than the cost of all the desk copies that go out. So, the actual - this is even before - right now, it's actually just the cost of the desk copies. Because we just do volunteer hours for this administration time. But in the future, when we do even start paying people to do this work - the cost is going to be still mainly those desk copies, in particular. That's where we're most excited about right now.

Len: That sounds really great. I just wanted to say that we don't have any data on it either, but a particularly high proportion of our audience is probably people who might have books that are - might be appropriate for a program like that. And is there anything - is there any particular place people can go, if they think their book might be a fit for partnering up with you?

David: They can send us an email. We have admin@openintro.org, would probably be the place to first reach out. I will highlight that we are still in a very small scale for this. We're scaling up slowly, to make sure that we work out all of the kinks.

Last year in 2020, our focus was on making sure that our infrastructure was sufficient for scaling up to more books. And this year, we're trying to make it so that we're confident that it's going to be financially sustainable for us to scale to more books. That'll probably continue on into 2022. If we hit both those kinds of steps, then our hope would be that we can scale this up to dozens, and in the long term, hundreds of books that we would support and provide these services to.

Where - interestingly, actually through this program - we use Leanpub as the primary mechanism for raising funds for covering those desk copies. So this might be a little bit different for - we might have to find a different model for people who are already on Leanpub and distributing their books. But it's just trying to bring in a new revenue stream to cover these costs - so that way, those existing authors can retain their royalties for the paperbacks, which they've been doing for years now. That's the model which Leanpub has enabled us to do, which I think is very interesting.

Len: I think you've brought that up a couple of times, so this might be the right moment to talk about that. One of the things that people who are familiar with Leanpub will know, but people who aren't familiar with Leanpub won't - is that we have a variable pricing model on our books, which means that you set two prices for your book, not just one. You set a minimum price and a suggested price.

And then, when someone comes to the website, to the landing page for the book - they're presented with a "pricing slider," as we call it, that you can slide to the left to take it all the way down to free, if you want. You can leave it where it is, at the suggested price. Or you can slide it to the right, and you can pay more. This has been a really popular tool for people who are in the educational space on Leanpub, where they can offer their resource for free to people, but also generate some revenue for it to help fund their projects. Because people do have the option to pay.

David: Yeah, I feel like this has been a really phenomenal model for us to participate in. Looking back at our recent years - where we have been participating in Leanpub, and using Leanpub as a distribution platform for our PDFs - it's become actually a hefty revenue stream for us to help grow our program. And I'm sure - even as an individual author, this would be an extremely great platform to be working on.

Len: Do you remember how you found out about Leanpub? I know it was a few years ago now.

David: Yeah, so - Mine was the first person on our team to bring it to my attention. And I think Roger Peng was the person who brought it to her attention, if I remember right? And Roger has, I think, a handful of books on Leanpub and also has a very heavy distribution on Leanpub, and he's run a couple of Coursera courses - if I remember right?

Len: Yes. He and his colleagues ran what might have been the most - at the time - successful MOOC ever, on data science on Coursera. And actually, if you look in the Frontmatter podcast archives, you'll find interviews with Roger Peng and Jeff Leek and Brian Caffo, who are some of the people behind that stuff. So that's really great to hear that they found their way to you, through that work. [For a fascinating paper they wrote called "The democratization of data science education", please go here - Eds.]

And have you found it easy to coordinate amongst authors? It's hard to ask this question. But - you're running this volunteer organization where you're producing these really high-quality books that are co-authored by people, and then rigorously tested. And so I think people could probably, more or less use their imaginations to figure out - how do you propose a new book project? How do you agree on it? How do you find authors? How do you coordinate the co-authoring and things like that? How do you go about doing your testing of your textbooks?

David: We'll usually do a semester or two of testing in the classroom where we'll just pay for textbooks to go to those classes. Usually it's two or three classes, in a given semester, that we'll be testing. So, we actually have, for example, a new edition of Advanced High School Statistics that we were thinking about releasing in 2021, but we decided to do additional testing on. And so that book is going out to - I think we have three different courses that'll be using it in 2021 to 2022, with the plan of releasing that new edition in 2022.

So, that's the main mechanism that we use for actual testing. And it is - actual students are getting the book, and in these cases, we just fund those books directly, based on our past edition's royalties.

Len: So you send a bunch of books in a box to a teacher?

David: Yeah, and a PDF to all the students who want it.

Len: And is there an organized way of getting feedback? Does the teacher tell the students, "If you ever find a typo, let me know and I'll tell the authors about it?"

David: Yeah. So the students can either send us a typo report through our website, or they can send us an email. The teacher will also oftentimes be reviewing the book in particular detail, to highlight these issues.

I think some of the other aspects that we're also trying to keep in mind are, whether some of the new examples maybe resonate with students. So we might do surveys with these students, as well as with the teacher, in particular - who is really the main channel for getting that feedback.

Len: One feature we have on Leanpub actually is an "Email the author or authors" link on the landing page for a book. And you've reached many, many students through Leanpub, which we've been really pleased to see over the years. Do you receive feedback through that mechanism as well?

David: I think most people go through our website, currently. But I think that's maybe just the inherent aspect of - we have a lot of our links through to our YouTube videos and to labs and such that are on our website, which are not on Leanpub. Which probably just means that students go to our website more often, than they go to Leanpub directly. I think that's probably a special case for books that are on Leanpub.

Len: I was just particularly curious actually from a - I was concerned that maybe having this feature on Leanpub was sort of messing up your process, but I'm glad to hear it's not.

David: No, it's working great. We do occasionally get a message through there, and it's always welcome to get a message through Leanpub.

Len: The last question I always like to ask people on the podcast - if the guest has been using Leanpub, is - if there was one magical feature we could build for you, or if there was one really annoying problem we could fix for you - what would you ask us to do?

David: I think the topic that - I guess, thinking that Leanpub is online - I would also be really excited to see Leanpub offline. So, in actual physical book distribution as well, might be a very interesting space. And part of that is - I'd say that the distribution is, for paperbacks in particular - or for these open source books, particularly - is a very small number of companies that we have options for working with. And that could be, I think, an interesting space for Leanpub to get into. I think it'd be a very capital-intensive space to get into, so I don't know that it's actually a viable one. But it's one that comes to mind.

Len: Thank you very much for that suggestion. That is something that we've had people ask before, in various ways. Some of them are, "You know what would be really great, is just like have a 'make a print book' button." Because our version of that is a Print-Ready PDF button. You set some parameters, you click a button - and you'll get a PDF that you can upload to various print-on-demand services. But it's not the same thing as like a "Publish my book in paper and make it for sale" magic button.

It is, I mean - without committing ourselves to anything like that. We haven't been discussing this thing lately, or anything like that - so for anyone listening. But one of the - so, I mentioned that some of the guests on here are Leanpub authors, and some aren't. The ones who aren't are typically experts in the book publishing industry, that I interview from time to time. And one of the really big changes in the book industry in the last few years, that was accelerated by the pandemic, was basically, people specifically switching to Ingram for the printing and distribution of their books.

When I say, "people," people like you who run nonprofits. Or the sort of ordinary isolated and alone self-published author. But also big publishers have been switching to Ingram.

And the idea that at some point in the future an outfit like Leanpub might be able to just sync up, is not outside the realm of possibility - and it's something we'd find really interesting. As you say, it would be both capital-intensive - and like, as you know very well, when you start adding - I don't know? Start using molecules instead of just electrons, the logistics become just a totally different thing.

And so that's basically, yeah - you hit right on sort of the issue with why this is something that we haven't really done yet. But it is something that we've heard from people about, and we totally understand the desire for it. It's one of the reasons that we do have our Print-Ready PDF output feature.

Well, David - thank you very much for taking the time out of what I'm sure was a beautiful afternoon in San Francisco, to talk to us. And thanks very much for making Leanpub the platform. We're just so happy to see how you guys have succeeded - and that you've been able to use it to offer free, open resources to people. But also to fund your organization - to some extent, as well - so that you can grow and reach even more people in the future.

David: Thank you so much for having me. It was definitely a pleasure to join you on the podcast. And thank you also so much for running Leanpub, which has been, I'd say, a critical feature of growing our community of textbook authors and making it work.

Len: Thanks.

And as always, thanks to all of you for listening to this episode of the Frontmatter podcast. If you like what you heard, please rate and review it wherever you found it, and if you'd like to be a Leanpub author, please visit our website at leanpub.com.
