Joel Grus, Author of Ten Essays on Fizz Buzz
A Leanpub Frontmatter Podcast Interview with Joel Grus, Author of Ten Essays on Fizz Buzz: Meditations on Python, mathematics, science, engineering, and design
Joel Grus is the author of the Leanpub book Ten Essays on Fizz Buzz: Meditations on Python, mathematics, science, engineering, and design. In this interview, Leanpub co-founder Len Epp talks with Joel about his background, his career, a little bit about his work at the Allen Institute for Artificial Intelligence, his book, what Fizz Buzz is and why it's posed in computer programming interviews, how he's used it in his book to teach readers a lot of really cool things, and at the end, they talk a little bit about his experience as both a conventionally published and a self-published author.
This interview was recorded on July 14, 2020.
The full audio for the interview is here: https://s3.amazonaws.com/leanpub_podcasts/FM161-Joel-Grus-2020-07-14.mp3. You can subscribe to the Frontmatter podcast in iTunes here https://itunes.apple.com/ca/podcast/leanpub-podcast/id517117137 or add the podcast URL directly here: https://itunes.apple.com/ca/podcast/leanpub-podcast/id517117137.
This interview has been edited for conciseness and clarity.
Transcript
Len: Hi, I'm Len Epp from Leanpub, and in this episode of the Frontmatter podcast I'll be interviewing Joel Grus.
Based in Seattle, Joel is a technical leader and software engineer, as well as the author of the popular O'Reilly book Data Science from Scratch: First Principles with Python.
You can follow him on Twitter @joelgrus (https://twitter.com/joelgrus) and check out his website at joelgrus.com.
Joel also co-hosts the Adversarial Learning podcast, which you can find at adversariallearning.com and on Twitter @Adversarial_L.
Joel is the author of the Leanpub book Ten Essays on Fizz Buzz: Meditations on Python, mathematics, science, engineering, and design. In the book, Joel explores different approaches to solving the Fizz Buzz problem that is well known in the realm of computer science education and in software engineering interviews. The book is also a great resource for anyone who's interested in getting into programming for the first time, full of practical examples of how programmers think, many different categories of solution to the same problem, and a really fun and practical approach to explaining some important mathematical ideas.
In this interview, we're going to talk about Joel's background and career, professional interests, what the Fizz Buzz problem is and some of the approaches set out in his book, and at the end we'll talk about his experience as both a conventionally published and a self-published author.
So, thank you Joel for being on the Leanpub Frontmatter Podcast.
Joel: Thanks for having me.
Len: I always like to start these interviews by asking people for their origin story. I was wondering if you could talk a little bit about where you grew up, and how you found your way into a career in tech and programming?
Joel: I grew up in Atlanta, actually. I went to college in Texas and studied math, and then I moved to Seattle after college for math grad school. I dropped out of math grad school after a couple of years, and this was way before data science and things like that, so the career path for math grad school dropouts was quantitative finance. And so I spent some time working in quantitative finance. I worked at a hedge fund doing foreign exchange analysis. And then the hedge fund went out of business.
It just so happened that I knew someone who was hiring at a start-up called Farecast, which was - this was like 2006. Farecast was an online travel site that did price predictions on airfares. You'd tell it, "I want to fly from Seattle to Los Angeles on these dates," and it would say, "The lowest price is $300 and we think it's going down, so you should wait." Or, "We think it's going up, so you should buy immediately." And so there I was in - let's call it a business intelligence role, where I built spreadsheets and I wrote SQL queries, and that was the bulk of my job. I sort of muddled along doing those kinds of things for quite a while.
And then data science started to become a thing, and I thought, "I would like to get into data science." And so I kind of BS'd my way through an interview, and convinced the CEO of a tiny start-up that I was qualified to be his data scientist. And lo and behold, I became his data scientist.
I taught myself a lot of data science and a lot of coding like really quickly, so I ended up writing a book on data science.
I found that I liked the data science, but I also liked the writing code and the programming part. So I pushed my career in that direction a little bit. I ended up going to Google in Seattle, where I was a software engineer for a couple of years, working on pretty boring things actually.
But then I found that I missed the data aspect, and I left Google and joined the Allen Institute for Artificial Intelligence, which is an AI research nonprofit. There, my role was kind of half engineering, half data and machine learning. I've sort of found a nice niche for myself, playing a little bit in each space.
I was at AI2 for about three and a half years, and then I left last fall to join Capital Group, which is a large investment firm. There, I lead a small team that's focused on building machine learning and data solutions.
Len: Do you remember what your first computer was?
Joel: My first computer - so when I was a kid, my dad had like one of these RadioShack color computers. And so that was the first computer.
Len: And did you start coding on your own organically, or -?
Joel: I did. When I was a real little kid, I was really interested in the computer, and I'd write these terrible BASIC programs. I think that computer had probably like 4K of RAM. So I would write like a little text adventure game - and eventually the computer would run out of memory, just because the program got too big. So I always enjoyed coding.
In high school, I took the Computer Science class and I did well in it, and I think I liked it - and then for some reason I went to college, and it fell off my radar, and I just ended up doing math instead, and was much more into science than computers.
Then after college, after grad school, after working, like years later - I don't want to say I remembered, but I somehow fell back into it. And I had never learned Computer Science the way that you had learned it if you had a degree in it, but I kind of picked that up as I went along.
Len: That's really interesting you mention that. My next question was going to be a variant of a question that comes up often on this podcast, depending on a person's path throughout their career. But given where you ended up, do you wish you had taken Computer Science in university?
Joel: I mean, there are many things where you can look back in life and think, "What if I had chosen differently?" And that's certainly one of them. I mean, yes - if I had studied Computer Science as an undergraduate and gone into that industry immediately after college, I would probably be further ahead in my career than I am now.
At the same time, I'm doing pretty well for myself, so I shouldn't complain about it too bad. And studying math is its own reward, so can't complain about that too much either.
Len: You mentioned that you worked for the Allen Institute for Artificial Intelligence. That just sounds like a really fascinating place to work. I was wondering if you could talk a little bit about the kind of work that you did there?
Joel: AI2 is a really neat organization. In many ways it's a little bit like an academic Computer Science department, but without teaching and classes. So just purely focused on research.
I was on a team called "Allen NLP," that was focused on two things. One was fundamental research and natural language processing and natural language understanding. And the second is, we built a library called, "Allen NLP" as well, which was a deep learning library for NLP researchers, built on top of PyTorch.
My job was to be an engineer on that library, but also to be an engineer who spoke the language of research, so that I could talk with researchers about what they were trying to do, and help them figure out how to get it done in this library.
Sometimes I would even end up like suggesting ways they should structure their experiments - which was a little bit weird, because I'm not an NLP authority the way that they are.
But, yeah - so I worked on that. I added features, I fixed bugs, I supported users. I partnered with researchers to get their experiments written. I sort of badgered them to use best practices, and a little bit of everything.
Len: Just taking the opportunity to talk to you a little bit about artificial intelligence. Most of the time when people hear about it, they hear about, "Big prominent people are really afraid about how it's going to take over the world and you're going to lose your job, and we've got to be really careful in how we manage it." Then you hear other people say, "Oh people will just find other things to do." Or it's really far out from actually having the scary impact that people are thinking. Just in general, what are your thoughts about the current state of affairs or where things are going?
Joel: I'm not super worried about any AI apocalypse anytime soon. It's interesting actually - OpenAI has been releasing these language models - GPT, GPT-2 - and they just had this one called GPT-3, which is bigger and better. And someone - I just saw a tweet, right before I hopped on this podcast - where someone put something in it and they're like, "Wow this thing captured my thoughts perfectly, and the text it generated is astounding." I read it, and the text it generated seemed nonsensical and repetitive to me. So, a lot of what we see in these things is maybe a little bit of us projecting what we want to see.
I think that certainly there are some kinds of basic, repetitive jobs that we'll decide that computers are probably better at doing. I think that in most of those cases - I don't see anyone to be better off or losing their job, because they're not going to be. But I think there are probably more high-value things that most of those people can be working on.
I think that's mostly what you'll see. It's hard for me to imagine in the near term any industries where AI's going to eliminate human contributions to this or to that.
Len: You mentioned you started out your professional career as a quant. Just for those who might not know - in finance, often people who are very well educated in areas like math and things like that, get these jobs where they do - they set up very complicated algorithms and things like that. I have a friend who - his doctoral research was on getting satellites to look through clouds better. So he did atmospheric physics, and moved right into hedge funds and quant work afterwards. Is that the work that you're doing now as well?
Joel: No. So the work that I'm doing now is a little bit more focused around NLP and extracting insights from unstructured data. And so less like looking at time series or prices and trying to find patterns, and more - Capital Group has a thesis around long-term fundamental investing, so really studying companies and getting to understand them, and trying to decide which companies we think are going to do well as companies.
Because of that, it's a very research-driven investment process. What I focus on is mostly like building tools to help people make sense of all this research, and produce it more efficiently and consume it more efficiently, and find what they need inside that.
Len: That's really fascinating. So it's data science on reports, like written reports.
Joel: I would say - well, I mean - written reports or news articles or anything like that. But basically, making sense of unstructured textual data.
Len: That's really fascinating. One thing that is one of the pleasures of this podcast, is I get to interview people from all around the world and maybe talk to them a little bit about what people might have seen in the headlines where they're from. We still do that a little bit - in every episode, what we've done is taken the opportunity, for the last couple of months, to ask people a little bit about how the pandemic has affected them in their career and in the place where they live, and maybe personally as well, if they're willing to talk about that. I was wondering if you could talk a little bit about what your experience has been like in Seattle with the pandemic.
Joel: We live pretty far out in the suburbs, probably about 20 miles outside of Seattle. And so because of that - I had a pretty terrible commute, my wife had a pretty terrible commute. And on top of that, we're both homebodies. So I would say - if there are two people who are well suited to be stuck at home, and working from home, and not having to drive into the office - we are them.
That said, we also have a nine year old daughter, and she's not enjoying not getting to see her friends. So it's really tough on her. At a household level, it sort of balances out, although it's probably much worse on a kid, so I guess it doesn't balance out. It's bad on net.
Len: And around when did you start getting the sense that something was going on that you would have to adapt your life to?
Joel: Well, I follow all the weirdos on Twitter, so I heard them talking about it back in January - and probably my tweet referencing it is in January. I had a business trip - my company is based in Los Angeles, and I usually fly down there about once a month. I had a business trip down there, basically the first couple of days of March. And in, let's say in late February, I made like two Costco runs and stocked up on a bunch of supplies and toilet paper and things like that. So that was, I would say, a little bit ahead of the curve.
And then when I went down to Los Angeles in early March - I took with me some masks, I took with me some of those supplies. So it was on my radar. And then I was actually supposed to go back down for another trip the next week, and I said, "You know what? With this coronavirus, I'm not coming down for this next trip." They made fun of me, but then the meeting got cancelled because of coronavirus, so -
Len: Thanks a lot for sharing that. I probably - we maybe follow some of the same people, or at least the same bubbles of people online. I started becoming aware of it in late January myself. I remember my first reaction was to start stocking up on booze, because I really didn't want to run out of that. Being single, my responsibilities are not the same as yours.
But it was an interesting experience in just like realizing that a household - even if it's just one person, not to have a well-stocked larder, and how exposed you are to disruptions in supply chains or really anything going on, if you aren't well stocked up.
I think a lot of people had to do a little bit of psychological dancing around, "Am I becoming a hoarder or something like that?" What I realized is, "I'm just living now like my parents did when they were on the farm." Like you have a well-stocked larder so you could go there, instead of to the store to get stuff to make your dinner.
Joel: There was a period of time, and I would say April and maybe May were - if there was room in the fridge, I felt like I should buy something to take it up. I think we've gotten a little bit past that now, but -
Len: It's funny, that's exactly how I started looking at my empty cupboards. I was like, "They're empty. Fill them." I never really quite got there, because it turns out, you actually don't need all that much space to store food. Which was a good lesson to learn.
I should mention to everyone listening that if you want to hear Joel talk about this a little bit more, the last three episodes of their podcast have been rechristened, "Adversarial distancing," rather than "Adversarial Learning," and they talk quite a bit in depth about this thing. I enjoyed listening to them preparing for this interview, they were really good discussions.
So moving on to the subject of your book Fizz Buzz. Before we ask about the origins of the book and everything like that, can you talk a little bit about what "Fizz Buzz" is, and where people encounter it?
Joel: Fizz Buzz is the following problem, and it supposedly originated as a children's game: the problem is to say or print - if you're using a computer, so we'll say print - the numbers one to 100, except if the number's divisible by three, instead of printing the number, you print "Fizz." If the number is divisible by five, instead of printing the number, you print "Buzz." And if the number is divisible by 15 - instead of printing the number, you print "Fizz Buzz." So you can imagine a game where children are sitting in a circle going around saying the numbers in order, and then having to substitute "Fizz" or "Buzz" or "Fizz Buzz."
But it's a somewhat elementary computer programming exercise. And because it's elementary, sometimes it gets used as a lowest common denominator weed-out programmer interview question. And so sometimes people will ask that, to make sure that you can - if you're an experienced programmer, it's a very easy problem to solve. Sometimes they'll ask that, just to make sure that you can - quote unquote, "Code your way out of a paper bag," let's say. And so, now everyone knows that it gets used for that. So it's in people's minds as the canonical bad interview question that people get asked. And so that's Fizz Buzz. That's why most programmers are familiar with it.
Len: I just have to share a little anecdote. So my background was - my education was in English Literature, and then I went into investment banking for a few years. And I did a fair amount of pretty sophisticated - from my perspective - financial modelling in Excel. And the whole time, I didn't know that I was programming. When I started working for Leanpub with my old friend from high school, Peter Armstrong, I was the resident non-programmer. And I remember one time we were on a car trip to a business meeting or something like that, and he said, "Well, Len - these guys all think of you as like an English major or whatever, but I know that you can do some other things too. So I'm going to ask you this question that we always get asked in programming interviews, which is Fizz Buzz." And it was a bit of pressure.
But within a few seconds, I was just like, "Well, let me just think it through. What would I do if I had to do this in Excel?" And I was like, "Oh, I'd use the mod function and I'd go 'if mod 15 or if mod 5 or if mod 3.'" I was happy with the result. But it wasn't until years later that I actually learned that mod isn't just like a feature built into Excel to do certain things, and that it's actually a concept in math. So it was just an interesting experience where like - given the tools and solutions that you can use, you can actually find your way to answering problems in different ways, and not even really know what you're doing.
But what your book really does really well, is it sets out 10 different approaches - although there's multiple dimensions to the approaches - and helps you learn a lot of these concepts, a lot of mathematical concepts along the way.
I should mention that Joel is also doing a really funny video series, called, "Ten Videos on Fizz Buzz." And for any programmer watching, I really recommend watching the first one, where Joel is doing a mock interview with someone who asks him to do this. And the sort of like naive programmer that he's playing, actually writes "print" 100 times on 100 different lines. It's really funny when you realize that you're really going to do it, that's when it becomes really funny.
I was wondering if you could talk a little bit about what's wrong with the approach of just typing out "print null" or "print Fizz," or "print Buzz" for each of the 100 numbers?
Joel: So first, let's talk about what's right with it. It solves the problem. Like if someone says to you, "I need you to print the numbers one to a hundred, except, in these circumstances, print these other things." And if you write out 100 print statements, that solves the problem. So that's what's right about it. And I can imagine as an interviewer myself, being like grudgingly impressed by someone who goes to all that trouble and does it. There's something funny about it.
As to what's wrong with it, there are a few things. From an efficiency point of view, part of the reason why we use computers to solve problems is that computers are good at doing certain things. And so when we use computers to solve problems, we'd like to let the computers do the part of the problem that they're especially good at. And using that - printing is not something they're particularly good at, they're okay at it. But having the human figure out what the right output is, is definitely not the right division of labor. So that's one problem.
In an interview setting, another problem is that - usually when they ask you a question like this, they're trying to understand - can you think about this algorithmically? Can you come up with an algorithm for solving this problem? And using 100 print statements is not really coming up with an algorithm for solving the problem. It's really solving the problem on paper and then just printing it out.
And then the third reason, from a more like software engineering point of view, is that a solution with 100 print statements is actually really hard to test. Usually when we write code, we like to be able to test it, and make sure that it's doing what we think it's doing. But the only way to test the 100 print statements version is to print them out and check them one-by-one. So it's really not ideal from that point of view either.
Len: You also write about extensibility.
Joel: So that one, that one's a little bit more - so, that's true. If you have a function that computes Fizz Buzz and you loop over numbers and print out the result of that function, then it's easy to make certain kinds of changes. Like, "I'd rather have the words be lowercase." Or, "I want to have the outputs be in Spanish." And then those kinds of changes are easy to make when you just have to apply it in one place to the function call, versus having to make those changes in 100 places.
That said, there are other solutions in the book that are not particularly extensible for one reason or another, but are interesting for other reasons.
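The testing and extensibility points above can be sketched in Python, the language the book uses. This is only an illustration, not code from the book: the names `fizz_buzz` and `fizz_buzz_lines`, the `fizz` and `buzz` parameters, and the joined spelling "FizzBuzz" for multiples of 15 are all assumptions made for the example.

```python
def fizz_buzz(n, fizz="Fizz", buzz="Buzz"):
    """Return the Fizz Buzz output for a single number n."""
    if n % 15 == 0:
        return fizz + buzz
    elif n % 5 == 0:
        return buzz
    elif n % 3 == 0:
        return fizz
    else:
        return str(n)


def fizz_buzz_lines(start=1, stop=100, fizz="Fizz", buzz="Buzz"):
    """Collect the outputs for start..stop (inclusive) as a list of strings."""
    return [fizz_buzz(n, fizz, buzz) for n in range(start, stop + 1)]


# Testing: check individual cases instead of reading 100 printed lines.
assert fizz_buzz(1) == "1"
assert fizz_buzz(3) == "Fizz"
assert fizz_buzz(5) == "Buzz"
assert fizz_buzz(15) == "FizzBuzz"

# Extensibility: switch to lowercase words by changing one call,
# rather than editing 100 separate print statements.
print("\n".join(fizz_buzz_lines(fizz="fizz", buzz="buzz")))
```

Because the logic lives in one function, a change such as different words or a different range is a change to the arguments, and each case can be asserted on directly.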
Len: The second solution that you talk about is the "if else" response, which is basically the one, as I understand it, is pretty much the one I gave when I was challenged with this. Is that the most popular response, do you think?
Joel: Yeah. I would go even beyond that and say - if someone is asking you to solve Fizz Buzz, that's probably what they're expecting to see, is that or some minor variation on that.
Len: And can you talk a little bit about what the modulus is, for those who might not be so mathematically literate?
Joel: Sure. So think about when you do long division and you don't want to end up with a fraction. So I want to say, "45 divided by seven." Well, seven doesn't go evenly into 45. So there's going to be a remainder left over. So when I divide 45 by seven, I say, "Seven times six is 42, and there's a remainder of three." And so the simplest way to think about the modulus is just - if I want to say, "What is 45 mod seven?" I'm just saying, "What is the remainder when I divide 45 by seven?" And so if you take a number that like - 42, which is divisible by seven - then the remainder is zero. And so checking that the modulus of a number is zero, is an easy way of checking whether one number is divisible - well, I don't say, "An easy way." But computers usually have modulus built in as an operator. So if you're programming, it's an easy way to check that - whether one number's divisible by another.
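The arithmetic in that answer can be checked directly in Python, where `%` is the built-in modulus operator and `divmod` returns the quotient and remainder together. The helper name `is_divisible` is just an assumption for this sketch.

```python
# 45 divided by 7: seven times six is 42, with a remainder of three.
quotient, remainder = divmod(45, 7)
print(quotient, remainder)  # prints: 6 3

# The modulus operator gives only the remainder.
print(45 % 7)  # prints: 3
print(42 % 7)  # prints: 0, because 42 is divisible by 7


def is_divisible(n, divisor):
    """True when dividing n by divisor leaves no remainder."""
    return n % divisor == 0


print(is_divisible(45, 15))  # prints: True
print(is_divisible(45, 7))   # prints: False
```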
Len: It's funny - my way into learning the Excel feature of mod, was - I was working on a model for a giant French infrastructure company. And one thing they had to deal with a lot was local mayors. And so you had to build a period into your assumptions, where like every 4 years, anyone who's vying to be elected mayor of a town would want to negotiate with this company to lower the fees that they charged. And so we had to build - it's like, "Well then go, go find a way to make something change every four years." But you want to be able to tighten that number, in case you change it to five or you want to change it to three or something else.
So yeah, it's interesting the applicability of these things, that might seem a little bit arcane - but they exist for very real reasons.
One of my favorite solutions that you offer in the book is Euclid's Solution.
Joel: Yeah.
Len: If you want to talk about that for a couple of minutes? Because it's just a fascinating example of the diversity of approaches and the complexity of the solutions that you can find in the book - all explained very clearly, by the way.
Joel: Yeah, so - Euclid's Solution is interesting, because it's based on a couple of steps. One step is this notion of the greatest common divisor of two numbers. So the greatest common divisor of two numbers is just, "What is the largest number that divides them both?" The largest number that divides ten and five is five. So that's their greatest common divisor. The largest number that divides four and six is two. So that's their greatest common divisor. And it turns out that there's a solution of Fizz Buzz that involves greatest common divisors. And what is this?
Well, if you take the greatest common divisor of any number and 15 - if that number is divisible by three but not by five, the greatest common divisor is going to be three. If the number's divisible by five but not by three, the greatest common divisor will be five. If the number is divisible by 15, the greatest common divisor will be 15. And if the number's not divisible by three or five, the greatest common divisor will be one. And so if you compute the greatest common divisor, then it's going to be either one, three, five or 15 - and based on that, you can choose which is the correct output for Fizz Buzz.
But then - so Python, which all of the solutions in the book are in - has a greatest common divisor function, but how does it work? Well, it works according to what's called "Euclid's Algorithm." Euclid was a Greek mathematician, and he has an algorithm for computing the greatest common divisor of two numbers. And so Euclid's Solution basically uses Euclid's Algorithm to compute the greatest common divisor of your input and 15, and then uses that to choose the correct output.
Now, most people don't know, or at least don't recognize, Euclid's Algorithm. So when you read the solution, it looks elegant and it's short and it's very not obvious at all what it's doing. But it's a beautiful solution. That's also one of my favorites.
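As a rough illustration of the idea Joel describes, here is one way it might look in Python. It is not the solution from the book, which is described as shorter and less obvious. The names `euclid_gcd` and `fizz_buzz_gcd` and the joined spelling "FizzBuzz" are assumptions for this sketch. The standard term for the quantity involved is the greatest common divisor (GCD), sometimes loosely called the greatest common denominator, and Python's standard library exposes it as `math.gcd`.

```python
import math


def euclid_gcd(a, b):
    """Greatest common divisor via Euclid's algorithm.

    Repeatedly replace (a, b) with (b, a mod b) until b is zero;
    the remaining a is the greatest common divisor.
    """
    while b != 0:
        a, b = b, a % b
    return a


def fizz_buzz_gcd(n):
    """Choose the Fizz Buzz output from gcd(n, 15), which is 1, 3, 5 or 15."""
    outputs = {1: str(n), 3: "Fizz", 5: "Buzz", 15: "FizzBuzz"}
    return outputs[euclid_gcd(n, 15)]


# The hand-written version agrees with the standard library's math.gcd.
assert all(euclid_gcd(n, 15) == math.gcd(n, 15) for n in range(1, 101))

for n in range(1, 101):
    print(fizz_buzz_gcd(n))
```

For example, for 9 the loop visits the pairs (9, 15), (15, 9), (9, 6), (6, 3), (3, 0) and stops with 3, so the output is "Fizz".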
Len: Definitely, it's beautiful. I should also mention - by the way, in all these serious discussions - it's a very funny book as well, with a really great sense of humor. You'll definitely enjoy reading it.
There are other approaches. The fifth approach uses trigonometry. The eighth one uses random guessing. The ninth one uses matrix multiplication. But I think the tenth one, the last one - Fizz Buzz in TensorFlow, was actually the - I guess the source of where all the other essays came from?
Because you wrote this post about how to solve Fizz Buzz using machine learning, that got a lot of attention. And it's funny - because the way the post is framed, it's that there's someone up for an interview, and then they get asked the Fizz Buzz question and they take insult at it. And so then they take the opportunity to just show what a brilliant person they are to the interviewer, who is all of a sudden demonstrably out of their depth. I was wondering if you could talk a little bit about a machine learning approach to solving Fizz Buzz?
Joel: Yeah, so that was a blog post that I wrote in 2016, and I feel like I saw a discussion online about like stupid ways to solve Fizz Buzz. And I thought, "I bet I can come up with a stupider one." So I went off and thought about it and wrote that post. But Fizz Buzz as a machine learning problem, it actually turns out to be pretty interesting.
So what do we mean by machine learning? Well, if you think about how we typically program a computer to do things, most frequently you give it instructions to follow, and you say, "Take a number. If the number's divisible by five, then print 'Buzz.'" And so that's your traditional programming way of solving a problem.
With machine learning, you do something different. In one paradigm in particular, in the one that I use in the book - you give the computer a bunch of examples that are correct, and you ask it to learn a model to make the correct predictions. So the simplest example - not the simplest, but a simple example is - imagine wanting to get a computer to be able to tell you whether a picture is a picture of a dog or a picture of a cat?
Now a picture is a big array of numbers. And if you sat down and said, "I want to write a rule for when an array of numbers represents a dog and when it represents a cat," you're going to have a really hard time with that. There are things you can do, but it's going to be pretty tricky.
But the thing about machine learning is, if you specify, "Here is a model that has some weights, and it's going to apply those weights in a certain way to the input and make a judgement," and then show it a lot of pictures, it can learn what weights to apply that actually get the right answer most of the time. And so that's the machine learning solution.
So the way you approach Fizz Buzz as a machine learning problem, is to say - well, there's probably a lot of ways to approach it. But the way that I approached it is to say - for any number, there are four potential outputs. Meaning - Fizz Buzz, Fizz, Buzz - or printing the number as-is. And so it's what you would call a four class classification problem.
Similarly, if you had, "This is either a picture of a dog or a cat or a turtle or a horse, and tell me which one it is," except that the classes are something different. And so we want to take the numbers and use a bunch of labelled examples.
Here I took the correct Fizz Buzz outputs for 101 up to, say, 1000, and used those to train a model, and then tried to apply the model to get the right answers for one to a hundred - which is what we're supposed to do.
In the chapter, I explore a couple of different ways of thinking about that, a couple of different models, a couple of different ways to represent a number as a set of features for machine learning. And it mostly works.
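The data-preparation side of that setup can be sketched without any machine learning library. This is a sketch under stated assumptions rather than the chapter's code: the 10-digit binary encoding is just one possible way to turn a number into features, and the names `binary_encode`, `label`, `decode` and `NUM_DIGITS`, as well as the ordering of the four classes, are hypothetical. Training an actual model on these lists is left out.

```python
NUM_DIGITS = 10  # assumption: 10 binary digits cover every number below 1024


def binary_encode(n, num_digits=NUM_DIGITS):
    """Represent n as a list of binary digits, least significant first."""
    return [(n >> d) & 1 for d in range(num_digits)]


def label(n):
    """Correct class for n: 0 = the number as-is, 1 = Fizz, 2 = Buzz, 3 = FizzBuzz."""
    if n % 15 == 0:
        return 3
    elif n % 5 == 0:
        return 2
    elif n % 3 == 0:
        return 1
    else:
        return 0


def decode(n, class_index):
    """Turn a predicted class back into the text to print for n."""
    return [str(n), "Fizz", "Buzz", "FizzBuzz"][class_index]


# Labelled training examples come from 101 to 1000, so the model never
# sees the answers for the numbers it will be asked about.
train_x = [binary_encode(n) for n in range(101, 1001)]
train_y = [label(n) for n in range(101, 1001)]

# The features a trained model would be applied to: the numbers 1 to 100.
test_x = [binary_encode(n) for n in range(1, 101)]

print(len(train_x), len(test_x))  # prints: 900 100
```

A classifier trained on `train_x` and `train_y` would output a class index for each row of `test_x`, and `decode` would turn that index into the line to print.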
Len: Thanks a lot for sharing that. Again, to repeat the title of the book, because we're partly here to get people to buy it - Ten Essays on Fizz Buzz. And also check out Ten Videos on Fizz Buzz, which is an ongoing series. And you can also subscribe to Joel's channel on YouTube as well.
Just moving on to the last part of the interview, where we talk about your experience as an author. So I believe your first experience as a book author was with O'Reilly, your Data Science from Scratch book?
Joel: I actually - well before that, quite a long time ago - I wrote and self-published with very little success, a book on spreadsheets.
Len: Oh okay, I didn't know that.
Joel: Yeah, no one does.
Len: It's really interesting to learn - a lot of people aspire to be published by publishers like O'Reilly and things like that. How did that come about? Did they approach you? Did you approach them and pitch them an idea?
Joel: Here's how it happened. A different publishing company, whom I won't name, reached out to me out of the blue and asked me if I wanted to write a book on analytics in Python. And so I thought about it, and I talked with them, and they sent me a sample contract. It had a lot of terms in it that I considered really unacceptable. And so I told them, "No thanks." But that put the idea in my head of writing a book, and I thought, "What is the book I would like to write?" And analytics in Python was not really the book that I wanted to write anyway.
And then the second thing is that - well, two more things. One - as I said, my background is in math. And in math, there's this sense of - you're not allowed to use results unless you've already proved them. And so to give an almost comical example of this - when I was in grad school, I took a math class that was several quarters long. And the third quarter had a different professor from the first two quarters. He came in on the first day, and he said, "I am so delighted that last quarter you proved this certain theorem, because I'm going to need to use that this quarter." And in his mind, if we hadn't proved it in the previous quarter, we weren't allowed to use it in that third quarter. So that's the sensibility that I come from, that we should do the foundations before we do anything else.
And then the second thing is that - one of the first MOOCs, the online courses - was Andrew Ng's ML class, which is a machine learning class. And there, again - he went through machine learning by implementing these models in Octave and using gradient descent as this almost organizing principle for the class. Both of these things really influenced the way that I thought about how to teach and how to explain. And so I thought, "What if I could approach data science from this perspective?"
And so I wrote a full proposal, and I emailed it to O'Reilly. Why did I choose them? Because they were prominent in technical publishing. That was the first name I thought of when I thought of technical publishers.
It was actually a much more ambitious proposal than what the book ended up being. They wrote back to me pretty quickly, and they said, "This sounds interesting, but this is like two books worth of material, and we don't give two-book contracts to unknown authors. So what can you do about that?" I said, "You know what, the first half where we derive everything from scratch is the interesting part to me."
The second half, which was basically - okay, now that you've done it from scratch, here's how you go and actually do things using libraries - the conceit of the book is that we're going to learn data science, and the models used in data science: regression, decision trees, neural networks. But we're going to do that by implementing them all in bare Python, so you'll really understand how they work.
And so the second half of that book, which was the, "Now let's go in the libraries," I said, "You can give that to someone else." I don't know if this is exactly what happened, but Jake VanderPlas wrote a book called the Python Data Science Handbook which is basically what my second half would've been. So that's good, because it meant that I didn't have to write it.
Even after I pared this proposal down, they were pretty skeptical. So I wrote a sample chapter and I sent it to them. And then they're like, "Ah, I don't know." So I wrote another sample chapter and I sent it to them. This happened several times. And then eventually, Mike Loukides, who - I don't remember his title anymore, he's the VP of Content Strategy or something. He said, "Look, if I keep saying, 'I don't know,' are you eventually going to send me the entire book like one chapter at a time?" And I said, "Yeah, probably." And he said, "Okay, fine - we'll publish it." So I think that's how it happened.
Len: Thanks for sharing that really great story there. A couple of things for any aspiring authors or struggling authors out there to think about right now - the first is, read the terms of any contract that you're presented with. You can say "no," and succeed elsewhere if you don't like the terms that you're being offered.
And also, persistence. Persistence is just a necessary condition of completing a good book. Even if on the other end, you've just got people going, "Give me more, give me more, give me more." Either way, it's going to be up to you to sit at your desk, or wherever you sit when you write, and crank things out, and make them good.
And so for this latest book, you decided to self-publish it, like your very first one from years ago. Why did you decide to self-publish this book?
Joel: A couple of reasons. One, let's call it laziness. It's a lot easier to - it's a lot less work to - the publishing part, at least, is a lot less work than having to interact with editors and copywriters and so on, and back and forth, and contracts. This was sort of like a very hobby project for me. And so, the amount of energy that I wanted to put into it was correspondingly hobby-like. So that's one.
The second reason is that, it's a very strange book. What do I mean by that? I mean that most technical books are about specific topics. So here's a book you would read to learn data science. Here's a book you would read to learn Java. Here's a book you would read to learn about databases. Well, no one's going to say, "You know what? I need to read a book to learn about Fizz Buzz." Because no one really needs to learn about Fizz Buzz.
And the second thing is that - other than Fizz Buzz as a unifying theme, the book's not really about anything in particular. So I said in the introduction, "I hope that you'll learn a lot by reading the book." And I think anyone would learn a lot reading the book. But it's not a book that you would read to learn anything in particular. And so because of that, it makes it a little bit unusual.
And so when I think about it as like a book on the shelf at Barnes and Noble, I don't know how it fits. And when I think about it as like an O'Reilly book or a Manning book or something like that, it doesn't feel like one of their books to me. So that was also part of my consideration. It's just that I imagined - and maybe I'm wrong about this, but I imagined that it didn't fit in with their catalogs - or with anyone's catalog. I couldn't think of a publisher where I thought, "This book feels like it belongs with this publisher." So that's the second reason.
And the third reason was a little bit as a challenge to myself. Like I said, I self-published this Excel book that was not successful at all, and eventually I just put it online for free, and people - some people really like it and they email me, but for the most part nothing came of it.
But I saw this as a challenge for myself. When I self-published the Excel book, nobody knew who I was. Twitter was around, but I didn't have any followers. I was not prominent in data science or any other field. So there were a lot of things working against me.
And part of me wants to find out - now that I understand a lot more about how to sell things and how to market things, now that I have much more of an online presence and a reputation for writing books and doing various other things - can I actually make a success of this book when all of the marketing onus is on me, and I don't have "this is the O'Reilly book on data science" to fall back on, so -
Len: And once you'd decided to self-publish it, you had to choose a place to do that. And at least one of the places you've chosen to publish it is Leanpub. Why did you choose to publish it on our platform?
Joel: So you'll laugh at this too. I started off just using Markdown files and Pandoc. I wasted a lot of time trying to get Pandoc to format the book in a way that didn't look really awful. And I was never really that successful at it.
And so then just like, almost on a lark, I converted everything to Leanpub-Flavored Markdown and uploaded it. It looked pretty nice out of the gate. And I was like, "You know what? I like how this looks. I'm just going to keep going with it."
My one main complaint is that in Pandoc, I was able to make the code blocks have a different background color, which I liked. But that was a small complaint.
Len: I do believe we have an option for a grey or a yellow background if you choose a custom book theme for code blocks, but I'll confirm that and let you know afterwards. [Note: There is no option to do this on code blocks; Len was thinking about the feature that lets you set the background colour of Asides to White or Gray - Eds.]
Joel: I wasn't able to find that option, but that's cool, that's cool.
Len: Well that would be an effectively non-existent feature from your perspective, so that's obviously a problem that we need to solve anyway.
But actually setting that aside, which we have - which I will note - the last question I always like to ask on these interviews, when I'm interviewing someone who's published on Leanpub, is - if you could ask us to build anything you wanted for you, or if you could ask us to fix anything that really bugged you, can you think of anything that you would ask us to build or fix?
Joel: That's an interesting question. Anything you could fix for me? It's funny - there's several levels of plan, right? And so while I was writing the book, I signed up for the intermediate level. And then once I got ready to launch the book, I went up to the higher level, because it has more analytics and things like that.
I think I probably should've done the higher level earlier and made more usage of the API and stuff, because there were a lot of manual processes I kept doing that I should've automated for the extra three bucks a month. But that's not really something you should've changed, that's something I should've done differently.
Len: Well, we should probably be a little bit more clear about how - so basically what Joel's talking about, is we have three plans. A Free plan - we have a freemium model, right? So the basic plan is free. Then we have a Standard monthly plan, and then a Pro monthly plan. And that's - with the very original naming rank there.
But yeah - so the higher you go in the level, the more sophisticated the tools get. And if you figure them out, and you can use them - then the more efficient your processes get as an author. So that's a pretty important thing, and we should probably be better at advertising or talking about how - you might have to learn one or two things, but if you use the Pro thing - we built it to make it more efficient for people to use.
Well, Joel - thank you very much for taking the time out of the day to be on the Frontmatter Podcast, and thanks very much for being a Leanpub author.
Joel: Thank you for providing the service, and thanks for having me on the podcast.
And as always, thanks to you for listening to this episode of the Frontmatter podcast. If you like what you heard, please rate and review it wherever you found it, and if you'd like to be a Leanpub author, please visit our website at leanpub.com.
