
The Leanpub Podcast

General Interest Interviews With Book Authors, Hosted By Leanpub Co-Founder Len Epp


Roy Keyes, Author of Hiring Data Scientists and Machine Learning Engineers: A Practical Guide

Episode: #207 | Runtime: 01:35:01


Roy Keyes is the author of the Leanpub book Hiring Data Scientists and Machine Learning Engineers: A Practical Guide. In this interview, Leanpub co-founder Len Epp talks with Roy about his background, computation and medical physics, data science and the challenges people in the industry face both in hiring and in getting hired, and at the end, they talk a little bit about his experience as a self-published author.

This interview was recorded on July 14, 2021.

The full audio for the interview is here: https://s3.amazonaws.com/leanpub_podcasts/FM185-Roy-Keyes-2021-07-14.mp3. You can subscribe to the Frontmatter podcast in iTunes here https://itunes.apple.com/ca/podcast/leanpub-podcast/id517117137 or add the podcast URL directly here: https://itunes.apple.com/ca/podcast/leanpub-podcast/id517117137.

This interview has been edited for conciseness and clarity.

Transcript

Hiring Data Scientists and Machine Learning Engineers: A Practical Guide by Roy Keyes

Len: Hi I'm Len Epp from Leanpub, and in this episode of the Frontmatter podcast I'll be interviewing Roy Keyes.

Based in Houston, Roy is a data scientist and consultant who has built and led teams at multiple tech startups in a variety of industries.

You can follow him on Twitter @roycoding and check out his website at roycoding.com.

Roy is the author of the Leanpub book Hiring Data Scientists and Machine Learning Engineers: A Practical Guide.

In the book, Roy provides a detailed guide for people hiring data scientists into their organizations, helping you understand the various roles you'll be hiring for, and how to pick the right candidates for each position.

In this interview, we’re going to talk about Roy's background and career, professional interests, his book, and at the end we'll talk about his experience writing and self-publishing a book.

So, thank you Roy for being on the Leanpub Frontmatter Podcast.

Roy: Thanks for having me.

Len: I always like to start these interviews by asking people for their origin story. So, I was wondering if you could talk a little bit about where you grew up, and your path to a career in data science?

Roy: Sure. Well, I guess the origin story - I didn't come out of a data science egg or anything exciting like that. I'll go back to where I grew up.

I grew up in Kansas, which is right in the middle of the US. I always wanted to be a marine biologist, which makes sense when you're hundreds or thousands of miles away from the nearest shoreline.

But when I went off to college, I ended up studying physics. Eventually I ended up doing a PhD in physics. That whole time, I think I was - when I was in high school, it was kind of the early web days. And so I started getting into just basic web stuff and programming. Eventually in college, and then later in grad school, I really drifted towards computational physics stuff.

I was always kind of doing some programming things. Always big on open source and Python. And right around the time when I was finishing grad school, data science started to become a thing.

It was something that I had started seeing. I also was always peripherally into startups and stuff like that, kind of paying attention - I had a few friends that were involved in the startup world.

After graduate school, I worked briefly in a cancer clinic doing radiation physics - what's called "medical physics."

And after doing that for a couple years, I was at a point where I needed to decide what I wanted to do going forward. I decided that maybe data science would be a good choice, because I thought it was really interesting. I had been learning some of these methods to try to incorporate into some of the research I'd been doing. And so I dove in headfirst.

There were all these new - at the time - learning resources online, open free classes. People had the desire at that point, about ten years ago, to educate the whole world for free with a - quote, "Ivy League Education." And some of the early classes that came out were related to data science and machine learning and stuff.

So I did that. I started doing some consulting, mostly with tech startups. Eventually, I moved out to San Francisco, and started working directly for some companies. And then at some point, I moved back to Houston - which is where I went to college, and I've been here for the last five years or so.

In the course of that time, I started out as an individual contributor doing a lot of the hands-on stuff, and then eventually made the blind jump into management - and actually enjoyed it a lot.

I mean, I like working with all the people on my team. I feel as a manager that if you can get high quality people on your team - and I think I've been incredibly lucky, in that regard - then they can serve as a - they can certainly amplify, or multiply, whatever you could have done yourself.

Sometimes that's a great feeling. I mean, you're still somebody's boss, which doesn't always go as smoothly as you might like - but I actually enjoy that. And one of the areas that I just spent a ton of time on, and I enjoyed - was related to hiring.

Len: Thanks very much for sharing that story. That's really interesting, the path that you went down. It was funny, you reminded me of an old memory at the beginning. I grew up in Saskatchewan, in the middle of Canada. Which, like Kansas - I imagine is very flat.

Roy: Yes.

Len: And at least where I grew up, there is a connection between the prairies and the sea. Which is that, in the pre-radar days, prairie kids were good candidates for the navy, because they could judge distances so well.

Roy: Interesting.

Len: If you've never grown up in a very flat place, it might be hard to imagine being able to see the horizon when you spin around in every direction.

Roy: Yes. And no trees.

Len: And no trees, exactly. Exactly.

Roy: I suppose as long as you don't get immediately seasick.

Len: Exactly. Well, I'd imagine that'd be a big problem.

And so - you mentioned you did a PhD. It was in physics I believe? And you did that in Houston?

Roy: The PhD I did in New Mexico, at the University of New Mexico.

Len: And what was the subject of your dissertation?

Roy: I initially went up there actually to do stuff related to quantum computing. I was probably about ten years too early for that to really be a viable path. Maybe twenty years too early, actually.

But then at some point, I ended up switching over to what's called "medical physics", which is sort of a field of applied physics that mostly deals with radiation therapy for cancer treatment. And for diagnostic imaging like X-rays, CT scans, MRIs, ultrasounds - that stuff.

I did a clinical Master’s degree while I was in grad school, which is aimed at training people to be able to work in a hospital setting. And then my research itself was focused on what's called "particle therapy." That's using big accelerators to shoot beams of charged particles into people, and try to kill off tumor cells. My specific research was almost all computational-focused. It was basically around trying to figure out how you could do these calculations more quickly.

Because when you treat a cancer patient with radiation therapy, the radiation doctor or the radiation oncologist, they basically say, "Okay, here's your tumor. We want to deliver this dose to those tumor cells. But at the same time we want to make sure that the dose to some of these other healthy tissues is below certain thresholds." It depends on the organ, because different organs have different sensitivities to radiation.

So, there's a computational problem in there about how you set up the geometry of the radiation delivery, and everything to try to achieve those targets.

I wasn't working on the geometric problem myself. In grad school I had two advisors - one was in the physics department, one was in the computer science department. My advisor in the computer science department focused on the computational geometry, which was optimizing these problems.

Then on the physics side, which is more what I was working on - I was trying to get calculations of the radiation dose that were as accurate as possible.

A lot of times as a physicist, you're working on very, very simplified problems like, "Imagine an electron floating along and then it encounters a proton, what's going to happen?" With these applied problems, you've got, as we call it, a "medium." That medium is a human. It's a lot more complicated than, say, the vacuum of space with one electron and one proton, or something.

The only real way to solve these problems is to do calculations. The gold standard way to calculate this stuff is called "Monte Carlo", which is basically - you're doing thousands or millions of simulations over and over, and taking the statistical result at the end.

The big problem with that type of calculation, is that - on the one hand, it's the most accurate method. But on the other hand, it's very slow. My dissertation was around trying to figure out ways to speed up those types of calculations, so that they could actually be used in a clinical setting.
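The Monte Carlo idea Roy describes - run many random simulations and take the statistical result at the end - can be sketched in a few lines of Python. This is a toy illustration only, not a real dose calculation: the exponential-attenuation model, the attenuation coefficient, and the function name are all invented for the example.

```python
import random

def mean_penetration_depth(mu, n_particles, seed=0):
    """Estimate the mean depth at which particles interact in a medium.

    Each 'history' samples an interaction depth from the exponential
    distribution with attenuation coefficient mu (per cm); the result
    is the statistical average over many simulated histories.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_particles):
        total += rng.expovariate(mu)  # one simulated particle history
    return total / n_particles

# The analytic answer here is 1/mu, so the estimate converges to it as
# the number of histories grows - but slowly (error ~ 1/sqrt(N)),
# which is exactly why Monte Carlo dose calculations are slow.
estimate = mean_penetration_depth(mu=0.2, n_particles=100_000)
print(round(estimate, 2))  # close to 1/0.2 = 5.0
```

The trade-off Roy points to is visible in the comment: halving the statistical error requires four times as many simulated histories.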

Len: That's really fascinating. I'm familiar with Monte Carlo from the finance side of things.

Roy: Yes.

Len: It's so interesting, because from the layperson's perspective - as which I would certainly qualify - when we see on TV, somebody goes into the tube, and then like an image of their brain comes out. It's not - the machinery behind that isn't taking a picture of the brain, the picture is being -

Roy: Reconstructed.

Len: Reconstructed to you from a bunch of data.

Roy: Yes.

Len: There's calculations that are going on. It's not a straightforward, one-to-one correspondence between what's being fed into the machine, and what's being presented to you.

I think a lot of people would think, "Oh well, physics is basically shooting lasers at something." What do you need computation for, right? You point it and you shoot.

But of course there's so much variability in the human body. I mean, when you get down to microstructures and things - I just made up that word. But you get down to some very small things. There's a lot of variability. The idea that what you have to do in the end is a lot of really sophisticated guessing. Which is what it -

Roy: Yes.

Len: Sounds like. It's -

Roy: Right. One of the main reasons that you have physicists who work in these clinics, is because you are using some big machine that spits out radiation. The problem is that you can't see the radiation. You can't feel the radiation. You can't hear it, you can't smell it. You are, on the one hand, doing these very sophisticated calculations about, "Here is the shape of the beam we should use, and here is the angle and everything, and this is how long the beam should be on," and all that stuff, to achieve the desired doses. If the machine is not giving you the beam you think you have, then that's not going to work.

What a lot of what the physicists do in these radiation clinics, is they go in and they make these very precise measurements of the radiation. Then they have to be experts to interpret those measurements and deal with uncertainty, and all those things.

It's a very central role there, and it's one of those things that may be unusual in the medical setting. I'm sure a lot of people will give me flak for this - but it's a role that cannot be fulfilled, for example, by the physician.

A doctor, they're the ones on the medical side that are the top-level person in the clinic and the hospital. They have the most qualifications. At the same time, they're not experts in the physics side.

I'm sure that maybe a few pharmacists would say something similar, that they're doing some chemistry and pharmo - I don't remember all the terms off the top of my head, it's been long enough. Things where there's this level of expertise that's needed, that just doesn't exist with anyone else.

Len: Yeah, "I'm a doctor not a physicist, Jim."

Roy: Right, right.

Len: That sounds like really fascinating work to have done.

It's really interesting. I was just thinking about it - going over your bio on LinkedIn and stuff like that - what it must have been like to go from a structured world and academic degree like a PhD - I didn't know about the clinical work, that training that you did as well - to go from that into the fray of the startup world. What was that experience like?

Roy: I had a little bit of a taste of it before. Certainly there will be people that I've worked with in the last ten years, who'd be very surprised to know that I used to wear a tie to work.

While I was doing my Master's degree, I actually worked at a startup. That was a startup that was building particle accelerators - a hardware startup, except that the hardware itself was all million-dollar-plus hardware. It was a bunch of physicists from Los Alamos using technology that they had developed for physics experiments. We were trying to build these relatively small particle accelerators that accelerated protons, to use in a lot of different medical and industrial applications.

That was certainly - for the people in the startup world - the problems, they were around product-market fit, funding. The iteration cycles on these products would make many startup people just give up, probably. Because we're talking years of iteration cycles and using physical technology, that you would send off to someone to do super precision machining and manufacturing. Then you'd get it back and, "Oh, by the way it doesn't do what you expect it to do." Very different than the software that most startups are working on.

I had some sense of that, and that was a company - I guess, the most classical startup aspect to it, was that it didn't work out in the end. Which is the story of most startups.

It was good experience. It was interesting. I'm sad that it didn't work out, because they were trying to do some cool stuff related to some processing with these particle accelerators. The customers wanted to do that. Then several medical applications.

But, then - later when I decided to go into the startup world and - as I mentioned before, I had always been paying attention, reading the news sites and stuff. Probably since the early 2000s, related to startups and what was going on.

And so, when I decided to go that route, I contacted my friends that were in this world, and started asking them for advice. Then jumping in.

It is a big transition, I think, from the academic world. Or also, when you're working in a hospital. Very bureaucratic, very different setting.

Once, actually, when I was working in a clinic, and I proposed to my boss that we get access to the database that recorded all of the treatments - how the treatment worked and the equipment - that way we could monitor some of the stuff that was going on, and also investigate something that had happened. My boss said, "Great. Go talk to IT."

I went to the IT department, and they basically looked at me like I was some hacker trying to break in. "Why could you possibly want access to the database?"

Eventually the CEO of that cancer center had to tell the head of IT, like, "You need to give Roy access to the database." Because it was just very far removed from the way that they did things, at least at that time.

So, going over to the startup world where they'd say, "Oh, here are the keys to the database. Do whatever you need to do. Try not to crash it." Very different. Then, just iterating at all times.

My two main startup experiences, the startups I spent most time in - one was around frantically trying to find product-market fit. The other one was about frantically trying to get margins to the point where profitability would be possible. Also - and more immediately, that people would want to invest more money in the company.

So those are things certainly that in an academic setting or something, it's just very, very different. Academia is more like, "Try to get your grant accepted. Then try to shoehorn the things that you are doing that don't quite fit under that grant, into the grant - so that you can pay for them."

Len: That gives me a great opportunity to ask you for a specific example. When you talk about product-market fit - for those familiar with the lingo in the startup world, what that usually means in people's imagination, is - that you've got an idea for a product, you put it out there, and you try and find a market for it. What happens - you said, iterating a couple times - is that like you don't try - you can't really iterate the market, but you can iterate your product.

Roy: Yes.

Len: And so, what you do is you try to find what's called "product-market fit," where you've got a product that has a market out there that wants it, and then you can try and achieve profitability.

I think - for someone like me, for example - when I think about product market fit, I think about like UI, for example.

How does the UI work? How do I make sure people know what this product actually does, and that they can actually do it? From a data science perspective, how can data science be used to help a startup find product market fit?

Roy: Right. I think there's two sides of it. One is the analytics side of things. So, you come back to the UI stuff. It's like you're doing A/B testing, and these things. How is the market - if you will - responding to this version of the feature, versus this version of the feature?

Then on the other side of it's like, if you are building a product that is powered by data, so data-driven somehow - then there's often a question - people love data science and machine learning and artificial intelligence. Right now it's very, very popular, we'll say. But there's often a question of whether the thing that you could build using those techniques, is actually solving a problem that people have.

A lot of it is - I mean, the founder of a company - they might come into something and say, "Oh, we'll use a lot of lingo here - we're going to disrupt this market using AI." Something like that. It may be - if they're not on the technical side of this - it may be that they've seen a demo of some technique, and they were like, "Oh, if we just apply it to this domain, then that'll make all the difference, and we can build a product that's better than these other ones."

Often it's hard to know if the techniques that you're using are actually going to perform well enough, or if the data's actually going to be available, or if you're going to have to spy on people, or do something that people don't like.

Also, ultimately - are you going to actually be able to build something that is better enough than the other solutions that exist, that people will want it?

There's just such an allure to the buzz words of things like AI, that kind of - it pulls people in, and they want to do that. You often just, you have no idea at the outset if you could build something that will actually give value to the customers. Give you that product-market fit that you need. It could be that really what you need is just a reasonable system and a reasonable UI and UX experience that people want, and that the AI component is small to none in there.

Len: Yeah. I've got a lot of questions to ask you, I think, about this. I mean, for example - we're going to have to go into - you do this at the beginning of your book - you go a little bit into terminology. What's data science? What's machine learning? Engineering? What's artificial intelligence? It's really fascinating to me.

One of the themes may be just - probably because of my personal proclivity behind this podcast, but also because so many of our guests are technical people, is that when you've been on the other side of the production of things, in my experience, in a former life as an investment banker, it would have been charts and forecasts and things like that, right? Business plans.

When you've been on the other side of much more sophisticated things like machine learning and AI and data science and stuff like that, and the charts or the dashboards or whatever the reports that you produce - you know that behind the curtain, there's some guy with crumbs on his keyboard making a bunch of decisions.

Roy: Right.

Len: If you've only ever been on the receiving end, you just see the great and powerful Oz hovering over the curtain.

Roy: Or I think, from the data science side - we'd say that there was a data person, who presented some analysis with a whole bunch of caveats. Then the business person took it and threw out all the caveats and said, "Look at this magic that we've got."

Len: Exactly. I'm going to ask you a version of a question that I've asked many times on this podcast before - which is - what do you do when you're trying to overcome the throes of the caveats hurdle, right? Because - often it's not even conscious, right? People just don't - they basically don't process the caveats when you present. It's the results of an investigation, basically. Or an analysis to somebody, again, with a bunch of decisions behind it.

In my experience, people just often - it's really hard to get through to them. Then you run into the next level problem, which is, "Then what good are you?" Right? "If you can't give me the definite results that I imagine a good data scientist would be able to provide me with". How do you navigate those waters?

Roy: I think it's very difficult. I have definitely seen that first-hand.

One anecdote - my team was charged with forecasting demand once. Then the executive that was in charge of our group - whatever, was dissatisfied with our forecasting, and said, "Look, I'm going to pick these couple - other people not on your team, and we're going to do this by hand. Because we don't think that what you're doing is any good." My initial internal reaction was just rage. Because we had spent so much effort on this, and we knew the problem inside and out. We also knew it was an incredibly hard problem.

Basically at this point, we had - or we had recently, before that - spent about six months just tooling everything up, so that we could get the data flowing, and everything going and - I didn't say that. I basically said, "Okay, go for it." Because I knew that they were destined to fail. It was really a matter of them completely underestimating the difficulty of the problem.

The outcome was that, within less than a week, they gave up and they never said anything about it again.

I mean, that's a bad version of it, where we failed at really trying to convey the situation as it was, and the difficulty of the situation.

Then at the same time, they had misaligned expectations. There was this misalignment between the groups of people. I think that a lot of it, especially for the senior level person, manager, whatever - is that they need to spend a lot of time trying to educate and manage expectations around things. When you're working internally or with a customer, or if you're in some consulting situation - you need to really try to lay things out as they are. The way I put it in the consulting situation is sort of, "As gently as possible, bring people back down to earth, so they can understand things."

Maybe a more concrete anecdote that I just recently heard from someone I know - they told me that when they worked at a big media company, they would always take screenshots of the plots they made to put them in slide decks. Their boss - or maybe it was their boss's boss? - pulled them aside one day and said, "Look, I'm going to show you how to embed charts in the slides, so you're not doing screenshots." And his response was, "The reason I take screenshots is so you can't mess with the chart."

That's the thing - also that they fear - whatever, as the data person. And that's when there's - in a sense, a big power imbalance. That data person's just doing whatever they can. It is, I think, a lot about doing your absolute best to set expectations, and cut away a lot of the fluff and the hype. Trying to sit people down and say, "There is real value here, probably. There's also a lot of stuff around it that feels very attractive. You've got to understand that that may not be real."

Len: It's really fascinating when the world of messy data meets the world of messy personalities. Your boss's boss, and it's two removed.

You reminded me of something I haven't done for a long time. One time when I was a young investment banking analyst, I produced an embedded chart. One of my colleagues said, "You can't show a toothy graph like that to Martin." I was like, "What do you mean?" He goes, "He doesn't like toothy graphs." Or, "He doesn't like toothy charts." What this means is, basically - for anyone listening - it means like really big spikes in the line and the chart, right?

Roy: You want something nice and smooth.

Len: What do you do? You change the scale of the Y axis, so that it's not toothy anymore. I think, if I recall correctly, I objected and said, "No. This is the scale that's appropriate." I remember being in the meeting room putting up the slide on the wall. The guy goes, "Why is there so much variability in the line?" He didn't care about the scale, it was just the feature of the line being toothy that grabbed him.

That's not to knock him or to knock me. It's just that people - the things that often people think are actually very discrete, are not. As you say, that - really, that "coming down to earth" is really important.

Another really important theme there is - all of a sudden, with big data and things like that, there are a lot of people in the C-suite or just below it, who are suddenly exposed to the results of technology that they wouldn't have been, maybe fifty years ago. Old strategies for management perhaps even endorsed or approved the idea of "zero domain expertise," for example. Which is the idea that an executive shouldn't know anything about how things work, because they're operating at the level of management rather than the details.

I'm fumbling this presentation of this idea, basically that old idea of zero domain expertise, it's - there's a fundamental question about whether that's actually possible in business when software has eaten the world.

Just to pick a cartoonish example from just the other day - Richard Branson pays a bunch of other people to get him into near orbit, but Elon Musk actually gets people into space. That might be a good analogy for the modern world that we're in now. Where, if you don't know anything about how the computers go, you might just be doomed.

Roy: Yeah. It's - I think that the technology, all of this stuff moves so fast too - that even those of us who are steeped in it today, can easily feel like we're outstripped tomorrow, as far as understanding these things.

Len: That actually leads me to a question I wanted to ask you, which is - you started with a PhD in physics and then moved on into data science. Data science, as you mentioned, has become this thing that basically people can learn formally.

Roy: Yes.

Len: There are university courses in data science and stuff like that. Do you feel - if you were starting out now, with the intention of having a career in this rapidly evolving world of data science, would you do a physics PhD again, or would you do a formal data science course?

Roy: That's an excellent question. I think it's probably one of the ten million dollar questions. The other one is, "What should you do if you want to get into this field?" The other one - "get hired?" The other one is of course - what my book is about, which is, "How do you hire these people? How do you determine who's qualified?"

It's really hard to know. I got into this field at a time when, by definition, everyone was a career switcher, right? Nowadays you could have a candidate that did both an undergraduate degree and a graduate degree in - quote - "data science," and I'm not really sure.

I've in the past thought, "Oh, if I could ever magically go back and do undergrad again, knowing what I know now, my first choice for a major would be computer science." So that I had an even stronger background to build on. Maybe, out of my personal interests, that might still be the way to go - computer science, with a focus on these types of things.

I think there's also an argument to be made, that, if you just know these techniques but you've never really dealt with real problems to apply them to, that maybe that's hollow. Although, I'm sure people make the exact same criticism of statistics or computer science or whatever. I'm not really sure. I think that, it's not clear to me yet. I don't know who's going to be the best people.

On the other hand, you could make an argument that this stuff is all about - it is evolving very rapidly. The data science degree that you get today, BSDS, what's it going to be worth in five years from now, as far as what you learned? So much of the skill is about really just continuing to learn continuously.

There's a whole argument out there by some people that would basically be like, "college is on the way out, because you'll be able to learn everything you need to do online, and you'll need to learn new stuff all the time anyway." It is very hard to say. People ask me all the time, mostly about switching, if they want to switch. They go get a Master's degree in data science or something, or a boot camp. I'm not really sure. I think if I were 18 again, there's a good chance I would do a data science degree if they offered it where I was. Because there's just so much cool stuff going on right now.

That's a bit of a non-answer, sorry.

Len: No, no. I think it's a perfect answer. Because it gives us an opportunity to move on to your book and some of the challenges that it's addressing - the idea of hiring data scientists.

Because the flip side of the coin of choosing what to do yourself, if you want a career in data science, is choosing who to hire. Everything's in flux and moving so fast - even the institution of data science as an academic discipline is itself only a few years old, and obviously rapidly evolving. As the person doing the hiring, how do you decide who to choose for the path that you're now taking your company down?

Just moving on to the next part of the interview where we talk about your book, Hiring Data Scientists and Machine Learning Engineers. I was wondering if we could now do what I gestured towards a few moments ago, and define some terms.

Roy: Sure.

Len: You start the book by saying, "These terms are difficult to define, they can mean a lot of different things." Someone who really knows their stuff might use the term "AI" very differently from a marketing department.

Roy: Yes.

Len: The marketing department might be more successful at finding venture capital. I was wondering if you could talk a little bit about how the terms "data science" and "machine learning" and "artificial intelligence" are used?

Roy: Sure. I think that data science, when it came about, it was defined in a very broad way, basically encompassing any things that you would be doing with data in a technology and business context, especially.

In that sense, in the broadest sense, it would be answering questions by using data. Making predictions using data. Automating things using data. All of these types of things. That could be the very traditional data analytics type of things like, "What were our sales last quarter? last year?" Whatever. We'd get the data, take a look at it.

Slightly more sophisticated things like, "Can we identify groups within our customer base that seem to be similar to each other? Can we cluster those or group those, segment those customers into certain things? Then we can maybe have ads that are really going after those specific segments of our customer base, or the market."

Things like A/B testing, as I mentioned before - there's supposedly some famous Google one, where they tried out 41 different shades of blue on their search button, to see which one got the most clicks, or something like that.

Can we statistically show that this is a better idea than our last idea? A lot of that is traditional analytics. There's some more sophisticated statistical techniques.
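(An aside for the curious reader: the kind of comparison Roy describes - is one variant's click-through rate convincingly better than another's? - is often run as a two-proportion z-test. Here is a minimal sketch in Python; the click and view counts are invented purely for illustration.)

```python
from math import sqrt, erf

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Compare the click-through rates of two button variants.

    Returns the z statistic and the two-sided p-value under the
    null hypothesis that both variants have the same true rate.
    """
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled rate under the null hypothesis of no difference
    p_pool = (clicks_a + clicks_b) / (views_a + views_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Invented example: variant B's shade of blue gets more clicks
z, p = two_proportion_z_test(clicks_a=200, views_a=10_000,
                             clicks_b=260, views_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts the difference comes out statistically significant; with a smaller gap between the variants it would not.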

Then there's also machine learning stuff that came out of the computer science world, and that's building these predictive models.

You want to predict what price a house is going to sell for. You want to do search and try to rank the links - someone's doing a search, and for that search term, what are they most likely to actually want as a result? Or recommend products or music or whatever else. They're using machine learning and some other advanced numerical techniques in the background. And automating things.

A silly example would be, if you've got a big pile of pictures, and you want to know which ones are cats, and which ones are dogs. You can build an algorithm to do that for you faster than a human could do. Or facial recognition, or something along those lines.

All of that really has fallen under data science. On top of that is all the stuff you need to do to make that happen. Getting the data, cleaning up the data, transforming it. Also, putting some of that into production, building dashboards, making reports, making web apps, to some extent, depending on what it is.

Now, the machine learning part is the subset of techniques that is really a way of creating computer programs. Where, instead of the programmer manually describing the logic, "If this, then this. Else do this, else this", you are using data to essentially train the program. It's a whole process of training, to figure out what logic should be used internally in the program. And so, there's several types of machine learning.

Some of the broad categories are like supervised learning, where you take data where you know the answers. You know this is a pile of pictures of cats, you know this is a pile of pictures of dogs. Then you feed that data into your algorithm during the training phase. It makes a guess, and you can say, "Oh, you said 'cat,' but it was actually a dog." So then it goes and tries to adjust its internal parameters, so that it's able to make, on average, the best guesses.
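(A toy sketch of that training loop, for readers who like code: here the "model" is just a single threshold on one invented feature, and "training" simply searches for the threshold that gets the most labeled examples right. Real supervised learning adjusts far more parameters, but the guess-check-adjust shape is the same.)

```python
def train_threshold(features, labels):
    """Learn the threshold that best separates 'cat' from 'dog'.

    features: one made-up number per example (higher leans 'cat')
    labels: the known answers, as in supervised learning
    """
    best_threshold, best_correct = 0.0, -1
    # Try candidate thresholds; keep the one with the fewest mistakes.
    for candidate in sorted(features):
        correct = sum(
            ("cat" if x >= candidate else "dog") == y
            for x, y in zip(features, labels)
        )
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold

# Invented training data: the pile of labeled pictures
features = [0.9, 0.8, 0.75, 0.3, 0.2, 0.1]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]
t = train_threshold(features, labels)
print("learned threshold:", t)
print("prediction for 0.85:", "cat" if 0.85 >= t else "dog")
```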

Then there's also unsupervised learning - things you're doing where you don't necessarily know the answers. That would be like the customer segmentation I mentioned before. Or if you had a ton of music, and you wanted to classify it into genres. You don't know the genres; you just have, in the old days, millions of MP3 files or something. Certainly something that a human could do, but when you have to do it at scale, you want to try to build an algorithm to do that. So you're trying to extract "meaningful features," as they'd call it, from this data.

You might think of things in music, like beats per minute, and how loud it is, and if you can identify certain instruments or whatever. Metadata that might go along with it. If the data is unlabeled, for unsupervised learning, then you've still got a lot of questions. How many genres should the music break down into? All sorts of stuff like that.

Because you don't know ahead of time. You might be in the world of, let's say, roots reggae, which is probably a relatively straightforward genre. Then you come over to something like metal, and every song is in its own genre, as far as I understand. There are so many genres that just splintered and splintered and splintered. Then you have to make decisions at some point: "Do we just call that all metal? Or should we break it down into the 20,000 different sub-genres?"
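(Again in toy form: the clustering Roy describes can be illustrated with 1-D k-means on a single invented feature like beats per minute. The algorithm only decides which songs belong together; naming the groups - "roots reggae" versus "metal" - is still up to a human, as is choosing how many clusters to ask for.)

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster numbers into k groups; returns (centers, assignments)."""
    # Start the centers spread across the observed range.
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        # Assign each value to its nearest center ...
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # ... then move each center to the mean of its group.
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    assignments = [min(range(k), key=lambda i: abs(v - centers[i]))
                   for v in values]
    return centers, assignments

# Invented BPM values: a slow cluster and a fast cluster
bpms = [68, 72, 75, 140, 148, 152]
centers, assignments = kmeans_1d(bpms, k=2)
print("cluster centers:", centers)
print("assignments:", assignments)
```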

That's machine learning. AI, I would say, is a broader set of techniques than machine learning. But today, when people talk about artificial intelligence, 99% of the time they're talking about machine learning.

Historically there were probably two main schools of artificial intelligence. One was actually trying to go in and handwrite all the logic rules to make decisions. Famously in the 80s and 90s, people used what were called "expert systems."

They would go and interview the radiologist and say, "Okay, how do we tell if there's a nodule in this chest x-ray?" They'd say, "I look for something that's about a centimeter across and it's brighter than the others." Blah, blah, blah and then, in the expert systems world, or knowledge-based AI or rules-based AI, you would try to go and really codify those decision rules.

Whereas, in the machine learning world, the idea was that you would try to use the data that you had to teach the program what those rules were.

AI has a long history of these, they're called "AI Winters", and booms and busts, where people got very excited, and then it didn't work out. Then they got excited again, and then it didn't work out.

At the moment we're in this, the last decade or so, a pretty high, high for AI, all based on machine learning. We've been able to demonstrate a lot of actual value that can come out of these techniques.

And, at the same time, from the perspective of someone who works in this field, the AI term just gets abused so much. It's one of those things that - I think I put this in my book? The standard joke is that when you're out to raise money, you say "AI." When you're out to hire people, you say "machine learning," right? Because that's what they're actually doing. It's not going to rub those people the wrong way if you say "machine learning," and that's what you're doing. Whereas if you say "AI," people feel like, "Oh, this is marketing. Human resources has been infected by the marketing people." I'd say that's the layout of those three terms.

Len: I think those are really great explanations. It's funny, another AI joke that you reminded me of is, that something is called artificial intelligence until we build a machine that can do it. Then it's not an intelligence any more.

I really like the way you contrast, in your book, a programmer writing out all the logical rules - "If this, then that" - with machine learning, which you can understand as saying, "If this, then ask the machine. Then do what the machine does."

I think one of the reasons people are so often tempted to call it "artificial intelligence" is because it's like asking somebody to do something for you. You don't have to go into their head and know exactly what's going on. Something's coming out from there, where there's no straightforward relationship for you between the input and the output. It's natural to relate to something like that as a form of intelligence.

Roy: Right. One of the examples I like to give to contrast those is decision trees, or flow charts. That's something, where, if you were a doctor and you say, "Oh, well, is their heartbeat above this beats per minute? Is their temperature below this? Is their blood pressure above this?" You can think of writing down those rules, and that would be your flow chart or your decision tree. You could easily write that as a program, right?

Whereas with machine learning, typically, like in the supervised scenario, where you knew the symptoms going in and you knew what the outcome was - what you would do is put that information, that data, into the training algorithm. In the end, it would just try out all these different combinations of those decision rules.

Maybe it's, "Is the temperature over 37 degrees Celsius?" Maybe it starts off by saying, "What if it's over 45 degrees Celsius?" Which, by the way, you'd be dead probably, if you're that hot.

It's going to just keep trying a bunch of different rules, hopefully in a way that can actually find good rules. Then it might end up at decisions similar to what a human would make. Or very different ones, if those end up getting better results in the end.

You can think of it as, those are two ways where you could build the same algorithm. One is, you're just using what you as a human knew, and the other one is like just pouring in all the data to try to figure it out.

Now of course, it's only going to be as good as the data you have. I guess you could also say your algorithm that you write down to make your flow chart or decision tree, is only going to be as good as the knowledge that you have internalized.
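(The hand-written side of that contrast, sketched as code: a doctor's flow chart coded directly as if/else rules. The thresholds here are invented for illustration, not medical guidance. A machine-learning decision tree would arrive at rules of this same shape, but by searching thresholds against labeled outcomes rather than transcribing the doctor's knowledge.)

```python
def triage(heart_rate_bpm, temperature_c, systolic_bp):
    """Classify a patient by walking a fixed, hand-written decision tree.

    Every threshold below is made up - the point is the structure:
    explicit rules a human wrote down, not rules learned from data.
    """
    if temperature_c > 39.0:
        return "urgent"
    if heart_rate_bpm > 120:
        return "urgent"
    if systolic_bp > 160 or systolic_bp < 90:
        return "see doctor soon"
    return "routine"

print(triage(heart_rate_bpm=80, temperature_c=37.0, systolic_bp=120))
```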

Len: Now that we've built up this pretty good foundation of understanding what data science and machine learning and artificial intelligence are, and what the business or startup practices can be that they can be applied to - let's say you're the typical person that your book is meant for: you're working for a company, whether it's big, medium, or small, and you've been tasked with hiring a data science team. What do you do?

Roy: The way I go about it in the book is, it's really about, first, trying to figure out what you need and why. I try to start off with the fundamental questions of, "What are you trying to achieve? What are your business goals?" Then trying to be as specific as possible and break that down. To see what roles make sense to actually help you meet those goals, achieve those goals that you have for your business or organization. Then go from there.

Probably the highlight there for me is understanding what you want to do, and then trying to describe as crisply as possible what the roles are that you need.

That helps you for several reasons. One of those things is just overcoming all of the confusion that's out there. You can argue about whether, for example, job applicants will actually read the job description or not. The classic problem that you run into is, here's an ad for a data scientist, and a bunch of data scientists apply, but they may have all been doing very different stuff than what you need them to do. Your description of it was very broad - "We want you to get value from our data," something like that.

In the end, that's bad for everybody. Because you've wasted a bunch of your time, and they've wasted a bunch of their time. Because it was actually very different. Even worse, if you hire someone and you haven't conveyed what it is this role is really aimed at, and they have very different expectations.

Len: One thing you mentioned too is that there's this interesting issue with scale that you have when you're hiring in data science. Because if you're in the tech world, famously, recruiting good software engineers is hard - there aren't enough out there.

But when you put your shingle out asking for people to apply for data science jobs, you just get deluged.

Roy: Yes. I would say that that's been one of the things that has - I don't know if "surprised me" is the right word? Maybe I just wasn't really expecting, in the real world, what the overall candidate volumes would be like.

You see people are talking about data science and machine learning all the time, so obviously this is a very popular thing. Until I was faced with the actual numbers of candidates and stuff, I wasn't really prepared.

That's probably been my biggest challenge that I've been faced with. I talk a bit about this in the book - that this book is largely written from my experience, and my experience has been in small- to medium-sized tech startups. So the problems, the challenges I've been faced with, are like low resources to support the hiring effort, and then, just a very large candidate volume.

Over the course of my career - which, in absolute terms, sounds short; in the data science world, it's been pretty long - I've gone from seeing a manageable number of candidates, where I was the one doing pretty much everything, to an unmanageable number, unless you design a process to be very efficient.

So a lot of the book talks about confronting that challenge of that huge volume of applicants, which can just be overwhelming. I contrast that to - in my last company that I worked at, going over to the engineering organization, which was separate from data science, and talking to them about their hiring challenges and stuff. Their applicant pool was just so much smaller, that they could use a very different process, and still spend time with their children, and things like that.

So that's one of those things. I mean, depending on the type of organization you're in, that might mean that you need to spend time working with HR and recruiters, whoever's working on it, to help them understand what this challenge is, and also be prepared for that. There's certain aspects of your hire process that might work in one part of your organization, but won't necessarily work for your data science and machine learning hiring, if you're faced with that. I also talk about this in the book a lot.

The footnote there is that, if you're trying to hire a senior manager or a very senior technical person, you don't have that problem. It is, now, suddenly - your problem is more like you just can't get enough candidates. Because there just aren't that many people out there that have the amount of experience for those roles for you to be getting inundated. If you're looking at the more junior roles, new graduates, those things, it can be overwhelming.

Len: You've mentioned the term "process" a couple of times. I think you talk about how, if it's just you and a friend with your startup, you don't necessarily need to define processes. But if you're dealing with a bigger organization, maybe with an HR department, and if you are going to get ahead of the scale issue with the first wave of candidates, then the most important thing to do, as I understand it, is to actually have a defined process at the beginning. You're going to iterate, you're going to change it.

Roy: Yes, exactly.

Len: The importance of defining it is that that definition of the process can be shared and understood amongst people, and they can say, "Look at it," look at something and go, "Yeah, we're in stage three with this candidate."

Roy: Yes. I think that when you talk to candidates, which is basically everyone, because everybody's applied for jobs at some point - I guess there's maybe a few people who just became a founder or something in the beginning. Everyone's had bad experiences as a candidate. I think that probably a lot of people out there, at least in the tech world, would say that their overall experience trying to get a job is probably negative, on the whole.

Some of it is - I mean, I think it's a fundamentally difficult challenge, just trying to hire, and trying to do it in a way that is effective, efficient, and fair. I think that it's just very difficult.

Also, there are a lot of companies out there that just don't spend much time thinking about how you might do this well. I guess that's the goal of my book: to help people, to give them a set of questions and a framework to follow, to try to do this well.

Len: You just reminded me of a discussion I had a while ago with somebody about this very same issue. One of the things that they said was that recruitment is marketing -

Roy: Yeah.

Len: For the company, and that's something really important. I say this specifically with sympathy for people on the candidate side of the desk, right? Or the process, right? If you keep in mind, as someone working on the recruitment side, that this is actually marketing for the company, that can really help you. I mean, there's the squishy element of empathy and stuff like that, which is actually really important. But there's also, "This is actually important for our business, that we treat people well." Transparency, and a good process, and fairness - things like that are all really important features of -

Roy: The other version I've heard from people I know that worked at Amazon - they would say that one of the rules they had there - maybe bringing up Amazon is good or bad, depending on who you are - was, "We treat each applicant as a customer." Like, "Our number one value in the company is that we give outstanding customer service." I'm sure people could disagree about that too, specifically. But people I knew who had been hiring managers there would say, "We have these guidelines, rules, targets that we're trying to follow, where we want to get back to people in these very short time cycles. We want to do all this stuff. We want them to come away feeling like they got the equivalent of great customer service." I think that is a very challenging but correct goal to have in your process.

If you have hundreds or thousands, or even more applicants, there's no way that everyone can come away from that process being happy. Especially because, if you've got 1,000 candidates and you hire one person, there are 999 that didn't get hired. Certainly some of those don't care at all, but certainly some people are going to be very unhappy. You don't want them to be able to objectively point out things that were unfair or unprofessional or whatever in there. Ideally all of those people will come back and apply again the next time around.

Len: It's really interesting. I know your book is explicitly directed at people who are going to be hiring into data science, but it would probably be a good decision for any one of those thousand candidates to read the book too. Because one of the really important lessons in getting hired is to know what it's like to be on the other side, and what they're looking for.

And, so, for anyone listening here who is looking for a career in data science, and isn't going to be on the hiring side until they've been hired in the first place, if you could give them one piece of advice, or maybe a couple pieces of advice for how they should engage in the application process and the interview process, what advice would you give them?

Roy: Besides, "Buy my book."

Len: Yeah.

Roy: Or maybe the second piece of advice would be, "Buy two copies."

The main thing that I've told people along these lines is that it's important to set your expectations realistically. Part of that is that, like I've mentioned, there are just so many candidates out there that unless you are absolutely outstanding, much of the time you are going to get rejected. So that's one piece.

I mean, that's an unsatisfying answer around stoicism or something. It is the reality. I've talked to many candidates who would come talk to me after they had applied and got rejected. There may have been no weakness that I could point out that they had, but they need to understand that they're up against a very large candidate pool, and there were people that were even stronger than them, as far as the signals that are coming to us.

That's one thing. I think that's probably a baseline. Otherwise, a lot of it is about numbers - applying widely. This next one almost feels a little bit like cheating: if you know people that work at a company, and it sounds like a really good fit, then try to get a referral or whatever. You may be put on some short list. That's not actually something that I've ever done as a hiring manager - with the goal of fairness in mind, we put everyone on the same level. That's a hard thing to do. It's very enticing to see someone with a strong resume or whatever, and not do that. A lot of people wouldn't do that, but that's the general rule that I've followed: to try to treat everyone fairly.

That doesn't mean it's bad advice, I'd say, to try to get that referral. Because at a lot of places, that is how it works. I think it's probably a bad practice, but that is what places do.

I think it's very different depending on what level you're coming in at. If you're coming in at the entry level, it's like I said. Coming in at the more senior level, a lot of it is done in a more conversational way, right? You can imagine, especially the further away you get from the technical stuff, the more subjective a lot of the skills are. So if you're a manager, you want to have your firm handshake, or whatever. But, yeah, I think those are probably my two main pieces of advice.

Then the other baseline I guess, is, most of these companies are going to give people assessments, technical assessments of one kind or another, and you should do your best to prepare for those.

A lot of times they're bad assessments, they're unfair, whatever. They're testing you on inverting binary trees, which is not something that data scientists ever need to do. They may be doing that. If that's the reality, then you often have to go out there and do it.

Those are probably the things. There's no magic formula. I think that the people trying to get hired are in basically an equally difficult situation as the people trying to hire.

Len: Just one last question I have about that, which is very specific. In programming, in the world of software engineering and stuff like that, before you've had your first job, one thing that people are often advised to do is to have side projects of their own. Some little app that you've built somewhere, where people can find the source code on GitHub and look at it and stuff like that. Even if you haven't had a job, you might actually have something that you've built that you can point people to. It might even have users who can refer you and say, "Oh, I know so-and-so, they built this and I use it every day in my such-and-such task." Is that true in the data science world as well?

Roy: I think so. I think especially early on, people were putting stuff out there, and there weren't many other examples of some cool analysis or some cool app that pulled a bunch of data in and did whatever. I think that it depends a lot on who's doing the hiring, how valuable that will be.

For the process that I've typically run, I would say it is valuable, with the understanding that I typically wouldn't look at the stuff until someone's gotten into the later stages.

This is one of the hard things. Because the challenges I face are this sheer scale and volume of the candidate pool, I basically can't spend time in the early stages looking at resumes, cover letters, side projects, or anything like that. There's simply not enough time.

And, so instead, you try to focus on, how can you most efficiently and effectively filter those candidates, to narrow your funnel of candidates to the next stage?

As I mentioned, I ended up adopting a process where I wouldn't even look at resumes, for a few reasons. One is that there are so many resumes that you just can't tell them apart. The other is that oftentimes the quality of the resumes, as far as getting a signal about the candidate, can be so unreliable that you can waste your time doing that too. Certainly the technical skills assessments that we give are imperfect at best. But they're a relatively fair way - as fair as you probably could be - to do some assessment that will still allow you to deal with that volume and that scale.

So, as I mentioned, you could have a thousand candidates, and even if every one of them is Grace Hopper, or whoever - incredible - you just don't have enough time to talk to each one in your process. So that's basically the way that I look at that.

Now, as I said before, if you've made it a couple stages in, then we start looking at candidates in a more thorough way. Then I'm looking at their resume, usually not as a filter, and the projects, not as a filter, but as additional pieces of information.

It might be like, "Oh, can you tell us about this?" Or we're hiring for, let's say, time series forecasting, and we also notice that they've done work on audio, and somehow that ties into something that we want to do. We might say, "Oh, these candidates are about the same, but this person showed that they can use this technology, or they have this experience," or whatever - that's additional. I think that, compared to traditional hiring methods, that's a little bit flipped, as far as using the resume much later in the process.

Len: Thanks very much for sharing that. For anyone interested in getting more into the details of how you can build a process to deal with such a large volume of applicants in such an indeterminate industry, with maybe indeterminate demands from management, please buy the book Hiring Data Scientists and Machine Learning Engineers.

Just moving on to the last part of the interview, where we talk about the craft and process and your experience of writing and self-publishing a book. I guess my first question would be, why did you choose Leanpub as a platform for writing and publishing it?

Roy: I didn't really set out to write a book. Last year, at the beginning of the pandemic, I and most of my team were laid off from the company where we were, along with a lot of other people at that time. So I ended up doing a bunch of consulting.

Then, at the same time, I was looking for ideas to build a software startup. The idea I ended up focusing on was around hiring. Because I'd just been spending so much time on that, and I wasn't happy with the tools that existed. I was trying to figure out: was there some software that I could build that would help people be more effective and efficient and fair in their hiring process?

As part of that, I started basically interviewing a whole bunch of hiring managers, to ask them about the challenges they were facing, and what their process was, and how they went about everything.

At the same time, I'm having a bunch of conversations with clients and potential clients that are basically, "Who should I be hiring, and how do I hire them?" And, "How do I structure my team?" All these kinds of things. Strategic stuff around what they're doing.

Basically, at some point I was like, "I should just write a series of blog posts that covers a lot of this stuff, so that I can point people to that."

Then a couple things happened simultaneously, coincidentally. One was that I had been hanging out in the SaaS Twitter world or something, with people who are building these SaaS companies. I saw someone say something like, "You don't need to wait until your software is ready to release. The first thing you release could be a newsletter, or a whitepaper, or something along those lines, while you're still trying to get your product out the door, to help you build some credibility and make the market aware of you." Things like that.

Then the other thing was that my friend Joel Grus had just released his most recent book on Leanpub; he decided to self-publish it, versus going with a traditional publisher like he had with his book before that.

So I very naively thought, "Hey, maybe I should just, instead of writing those blog posts," which I had drafted a couple of them, "I should put that out as a book and sell that, and it can't possibly take more than a couple months."

Looking at Leanpub, when I went and I checked out his book, Joel's book is, I believe it's called Ten Essays on Fizz Buzz. I went and checked it out, and then I looked around and I said, "Oh, this is a platform that looks like it will allow me to get things done quickly."

I would say that was the main thing. My primary goal, we'll say, is not to be like a bestselling author, but rather, to get some stuff out there, that I could promote, and help me towards the goals that I have.

Now of course, I grossly underestimated how long it was going to take, or maybe I overestimated my abilities. Whatever it was. Maybe if I had been 100% full time and very focused, I could have done it all in two months. But my consulting work and other stuff intervened, so my pace was a little bit slower. I ended up writing this on Leanpub.

The other thing that immediately jumped out to me was, I'm a computer person. I guess? I don't know if you can just say you're "a computer person" if you work in the tech world and program and stuff.

Len: People will definitely call you that if you are.

Roy: Incorrigible, insufferable computer geek, I guess, at that point, right? With Leanpub, one of the things I immediately saw was the Markdown workflow that was available. I normally write my notes and things in Markdown, so it was a natural integration. I chose to write my book with the Markdown-via-GitHub workflow.

Then Leanpub goes and grabs that, and builds all of the ebook files. I didn't really mess around with other stuff.

I was pretty happy from the beginning. Especially that I could put my book page up on basically day one, and say, "Here it is." Then start spamming people I knew to say, "Look, I'm going to write a book. Here it is. Please sign up. Please put in a price you think would be fair, what you'd be willing to pay, and get started." I think that was really good, just to start gauging interest, and also getting people a little bit ready. Whet their appetite, maybe?

I will say that the very first thing I did, was make a book cover. Which - everyone told me that I shouldn't be wasting my time on that. But, to me, making the book cover and putting it up on the Leanpub landing page for my book, it was saying, "Oh, this is real now. I need to do this. I need to complete this." Then I started getting some pre-orders and everything.

So, then, at that point, you really start thinking, "I should probably finish this. People have already paid me money." I'm on the hook at this point.

Len: Thanks very much for sharing that. That is actually one of our most important features: the moment you create a book, we create a landing page for it, and encourage you to fill it out, and fill out your profile.

The reason is, a), it makes it seem real, and b), it gives people an opportunity to sign up and say, "Hey, I'm interested in this book." It gives you motivation to do it. Without those things, it just seems like it's just you and an idea.

Anything being out there, including having a cover, is actually there as much for you at the beginning of a project, as it is for other people.

You brought up there one of the - I mean, you can't do anything public without starting to get advice from people about what you should do. So here's my advice: the first thing you should do is create the book landing page, and the second thing you should do is make a nice book cover image. It's really important.

In particular, it's very important in the self-publishing space. Because it's a sign that you are serious about what you're doing, and you're going to try hard, right? I mean, the skill of making a good book cover is a unique skill, and it's not the same as writing a good book on any topic. But the fact that you put in the effort to do it -

Roy: Right. I have gotten a few compliments on the book cover, which I made myself, so I felt proud about that. There's no accounting for taste, so who knows?

That was a good starting experience. Like I said, I really liked the workflow that was available. I haven't tried the web version of it.

One of the other things I did that was probably unusual, was that I ended up doing some of the writing on my phone.

The way I did that was, I have an Android phone, and you can set up a thing called - I think it's called "Termux"? - which is basically, and this is where it gets really geeky, a way to have a Linux terminal command line on your phone. Then you can install all these packages. I installed Git, and then I installed a Markdown app. I also have a very tiny keyboard that I like to use. So when I was out and about, hanging out at my relatives' houses - usually about once a week for dinner or something - I might spend an hour in the corner while everyone else was doing things, just writing with my little keyboard and my phone. Then I could commit the changes and ship them off to GitHub, and they were ready to go. That was also interesting. It's not for everybody, but I enjoyed that part.
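[For the terminally curious: the commit-and-push loop Roy describes might look roughly like this inside Termux. This is a sketch, not his actual setup - the directory, file name, and identity below are made up for illustration, and the push is commented out because it needs a real remote.]

```shell
# One-time setup inside Termux (commented out so the sketch runs anywhere):
# pkg install git

# Create (or reuse) a local draft repository.
mkdir -p /tmp/book-draft && cd /tmp/book-draft
git init -q .
git config user.name "Roy"                # hypothetical identity, local to this repo
git config user.email "roy@example.com"

# Write a bit of Markdown with whatever editor you like, then commit it.
echo "# Interviewing candidates" > chapter-draft.md
git add chapter-draft.md
git commit -q -m "Draft a section from my phone"

# Later, back online, ship the changes off to GitHub:
# git push origin main

git log --oneline
```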

Len: Thanks very much for sharing that actually. It's one of the joys of having so many of our authors be people who are technically proficient and curious: we get to hear about all the Rube Goldberg machines they invent to do their writing. For some people, just the process of setting it up is fun and interesting in itself.

One other thing you did that was relatively unique, too, was that you conducted, I think, five or six relatively long-form interviews with people who work in hiring, particularly in the space you're writing about. I think there was someone from Spotify, someone from the Wikimedia Foundation, and so on.

I was wondering - I mean, we've been going on for a while now, so maybe in just a few minutes - what little bits of advice would you give to people who are going to be doing interviews for their books?

Roy: Sure. I decided to do the interviews for two reasons. One was that I was very aware that my experience was relatively narrow, as were the challenges I had faced in doing hiring, because of the specific circumstances I had been in. So I wanted to get some broader perspectives.

Then, of course, there's a practical marketing aspect to it. Which is that I'm not as famous as some of those people, or certainly not as famous as some of the names of the places they work. They're all extremely competent, extremely experienced people. So that's why I did that.

I think, honestly, that the interviews are the highlight of the book. Maybe it's because I already knew the stuff I was writing - it was already in my head - and they were saying things that I didn't necessarily know.

Basically, I found those people. Some of them are people that I already knew, and some of them came through the network, because I had been talking to other people about, "Hey, I'm going to do this book." People said, "You should really talk to this person. This is a person who has a lot to say about that topic."

As far as the specifics, I ended up using Zoom. Free Zoom - there's no limit on the time. In the settings, you can check the option that gives you a separate audio file for each speaker. It's very good for low-budget podcasting and stuff like that.

I have some ambition to turn the interviews that I did into some podcasts, or clips or whatever, that I'll use to help promote the book or something. I'm not sure yet.

I talked to those people, and I set up a list of questions. Probably half the questions that I asked all of the people in my book were the same, and then there were some more specific ones.

Then it comes down to transcriptions. I actually went through and tried out a bunch of tools to do these transcriptions for me. Before I tried any of them, I started realizing, "Oh, wow. I'm doing these interviews that are between 30 minutes and an hour, and I'm probably spending five times that much time doing the transcription." If you were watching this podcast right now, you'd see Len looking at the clock, wondering how long it's going to take to transcribe this interview.

Then I tried some tools. Also, I'm a geek and I work in machine learning, and that's what these tools are using to do transcriptions. I didn't try any of the ones that you had to pay for, because I was like, "I want to see what they're doing."

So I tried a few open source ones from Mozilla. I tried some models that had been released by Facebook - I think they have one called Wav2vec-U, or Wav2vec 2, or something? I literally went and grabbed the source code and started running it on my computer. I'd say, unfortunately, nothing worked well. Some of the transcriptions were hilarious, but they weren't very useful.

Then, on the other hand, I was doing a fair amount of editing for my book anyway. In the end, I decided to just manually transcribe them, so that I could clean stuff up, shorten things, and cut out a few things. Make me - certainly - and, to a very small extent, the interviewees, sound a little bit smarter. Chopping out the ums and whatevers, and a little bit of content that they didn't want in there, whatever it was. At the end of the day - I think I did five interviews - I felt that the best value was for me to just transcribe them myself.

Len: Thanks very much for sharing the details of that process. We actually have a professional transcriber that we pay to do it.

Roy: Wow, yeah.

Len: I do an editorial pass afterwards. By the way, for anyone listening, if you like the transcriptions of our podcast on Leanpub, I can put you in touch with our transcriber, Alys McDonough. She's fair-priced and does a good job.

Yes, I used to do them manually myself, so I know how long it takes. I also know that the auto-transcribing stuff is not there yet.

Roy: Right. It's amazing, because I do voice typing all the time on my phone, sending messages and whatever, and it works very well. The little bit of stuff that I had seen just wasn't close to that.

Then, on top of that, when you're actually having a discussion in a setting like an interview or a podcast, the way people talk is not the same as when you're doing, for example, deliberate voice typing. If you listen to yourself or someone else doing that, they'll tend to slow down and enunciate better. You're trying to structure what you're saying as a sentence, and it's not all "you knows" and "ums" and "ahs," like it is in regular free-flowing speech.

And that's what you get when you transcribe the other kind of speech: there's a lot of noise and junk, and stuff that is confusing to the auto-transcribers.

Len: I mentioned advice earlier. Before we started recording, you mentioned that you had an interaction on Twitter today. You didn't say it, but I gathered it involved someone questioning your decision to use our platform, and I was wondering if you could talk about that?

Roy: Sure. Well, it wasn't quite that.

Len: Okay.

Roy: I saw a tweet by Paul Graham, who was the founder of Y Combinator, probably the most famous startup incubator in Silicon Valley. He basically posed a question: "What platform and tools should I be using to self-publish a book? I want to do it all in the browser." I replied to that just saying, "Oh, I just released this book on Leanpub. I didn't use the browser tool, but it exists."

Then the other thing he said was that he wanted good control over the graphical elements. I said, "Leanpub might be the one you want." Someone else then asked him, "Well, why does it have to be all in the browser?" He said, "If it's not all in the browser, it's a sign that they're just incompetent."

I wasn't sure what to think there. Because I probably wouldn't have chosen Leanpub if I had been forced to do it all in the browser. So to me, it was like, "I want to use the tools I'm most comfortable with." Which in this case was my text editor, right? I think that's true for a lot of people.

Obviously Leanpub has a specific audience centered around tech people, and I find it to be very easy and nice. Also, like I said, I did something crazy, which was to use my phone to do some of the writing.

Also, you can be in one of these Zen mode editors to block things out, and stuff like that. I don't know what to think about his comment. For something like Leanpub, it's a question of product-market fit - or product-market-niche fit, whatever it is.

Len: Thanks for sharing that. That's really fascinating. When it comes to explaining why he would say, "If they don't know how to produce a book from a browser app, then they don't know what they're doing" - I mean, that just speaks to what we talked about before, which is the messiness of personalities. One thing I would say about writing is that people are often very opinionated.

Roy: Opinionated? You should come into hiring. People are even more opinionated.

Len: I imagine that. The thing is, one of the differences, I would say, between having very specific preferences and being opinionated, is whether you conflate your very specific preferences with the right way of doing things for everybody.

Roy: Right. Yes.

Len: I'm an opinionated person about lots of things myself. But I wanted to bring this up at a more meta level, which is that if you put yourself out there doing a project, like self-publishing a book, you're going to get a lot of people coming at you with very strong opinions about things.

Roy: Yes.

Len: One thing I would note is that they often change those opinions after they've had exposure to the world. Because it's actually a lot more complicated than you might think.

If I had to guess why someone like Paul Graham would make a comment like, "If they don't know how to do it in the browser, they're not competent," it would be that a lot of people's experience with book publishing, and getting a book published, is a lot of back and forth in Word.

If you're someone who's seen a million technologies tried, evolved, and proposed - because the stuff that Y Combinator approves is one small sliver of the projects that people apply with - and then you try to get into book publishing and someone says, "Send me a Word document," I can see how you would quickly come to the conclusion that, "These people are not technically competent."

Whether that's a correct conclusion to draw or not, I can see how you would be inclined to want a signal that someone's technically sophisticated before you would get -

Roy: Right. Coming out of the like physics-y, math-y world of academia, I feel the same way about journal publishers that don't accept LaTeX. Like, how could you possibly be a serious academic journal if you don't take LaTeX?

I can only speculate about Paul Graham. It's not something that I would expect, because he has published a few books. Also, I think I remember reading an essay by him where he talked about his process, and how he would actually write on a laptop that was completely offline.

Len: Yeah, you're just reminding me - I think I've interviewed more than one person who, using our in-browser mode, even though they were still online, would go somewhere like a park or a coffee shop, to get out of their home environment. They would write until their laptop ran out of power. That was a natural, built-in timer that they didn't have to pay attention to. So, these different ways of getting yourself out of your normal circumstances are pretty common.

The last question I always like to ask in these interviews, if the guest is a Leanpub author, is: if there was one thing that really annoyed you about Leanpub, or that was broken, or one magical feature that we could build for you, what would you ask us to do?

Roy: Well, I think the thing that I only realized was available at the very end was that you could customize some of the layout.

I chose, "Oh, this is a business book," and then I just went with that. The main issue I had was around tables. If you look in my book, there are a lot of tables. With the default table styling, there are no horizontal separating lines, so things were just running together.

So that was one thing I was unhappy about. But then, before I pushed out the final version, I just went through and looked at every single option that authors have for their book. I noticed, "Oh, I don't have to choose the business book, or the fiction book" - whatever the, I think, three choices were?

There was this other one that was "customize," or something like that. I went and read through all of those options too, and I realized, "Oh, I can add some extra space. I can add lines between there." I had actually searched the forum several times to see if I could find something like that, and I could never find anything that pointed me there. I wasn't specifically looking for this - I did a dragnet, just to make sure that there wasn't anything important I was leaving out of the options.

I think the other one is, I'm considering doing some actual physical books, and I'm coming into this with no experience publishing a book. So the real question to me is, "What is the absolute easiest way to go from what I've just written to a physical book?" Or maybe a better question would just be about very broad distribution. I've tried to stay away from Amazon, just out of laziness/preference. I'm looking at it now, because, thinking about physical books, I thought, "Maybe I'll just print a few as keepsakes and give them to, for example, the people I interviewed, as a, 'Thank you. Here's a physical copy of the book.'"

When I looked at a couple of places that did digital printing, they also had what looks to be a one-click way to set up a storefront, so you can sell those books print-on-demand. I've been looking into that in recent days, just to try to figure out what the best option is. I feel like, if there was a dead simple way to do that, that would be incredible.

Len: Thanks very much for sharing both of those points. On the first one, what Roy's talking about, for those listening, is that every Leanpub book has four theme options. The first is fiction, the second is nonfiction, the third is - oh wait, no. It's fiction, nonfiction, and technical now. We got rid of the business theme; we just renamed it to "nonfiction," because that's obviously more broad.

Roy: I hope my book is mostly nonfiction.

Len: Yeah. Then there's the custom option as well. If you choose the custom option and you're on a Standard or a Pro plan, then you have all these global formatting settings for your book that you can tweak.

We do, by the way, add to those from time to time. Basically, every single one of those settings exists because an author asked for it at some point. We don't do everything everybody asks us for. But if you're writing a Leanpub book, and you're getting to the point where you want to do some more customized global formatting, and there's something you need that we don't provide, please feel free to reach out and ask if there's some way to do it, or if we can build it in.

On the second point, I believe we have at least one - there might actually be two - guest posts that I can point you to, from Leanpub authors who've gone into print and who outlined their process. I'll put links to these in the transcription of the episode.

The main answer there, though, is that we have a print-ready PDF export flow. Once you've written your book using one of Leanpub's writing workflows, you can then just choose some settings - the page size and stuff like that.

Then you just click one button, and we give you the print-ready PDF file that you need, which you can then take to various print-on-demand, self-publishing sites and services. You just upload it and add some metadata and descriptive stuff as usual. So, we give you the option to do that.

More generally, there's the idea of going wide or going narrow, as they say - I don't know if they say "going narrow," but they definitely say "going wide" in the self-publishing blogosphere.

The idea is that, "You've written a book, good for you. You've got it in ebook format, good for you. You've got it in print-ready PDF format, good for you. Now put it out everywhere you can."

That's assuming, of course, that none of the services you're using have exclusivity requirements or anything like that.

Roy: Right.


Len: The other approach obviously is, "Point everybody to one place." This is just a decision that any self-published author has to make. "Do I want everything in one place, or do I want it in many places? If it's in many places, which one do I prioritize pointing people to?"

It's a choice. Not speaking for Leanpub, but the advice I always give is, "Go wide." Put your book up in as many places as you're comfortable putting it up. If you don't want to manage all the data coming from ten different places, do five. Whatever works for you.

Having your book up in more than one place, at least at the beginning, is a really good idea. Because, a), you'll reach more people, presumably, but, b), if you do narrow it down to one in the end, you'll only know which one you should choose after having tried more than one.

So, you might find, for example, that you're selling more copies on Amazon, but that you're making more money from Leanpub, because we pay a higher royalty rate.

So, balancing those things out depends on your goal. Is your goal to increase your public profile? Then you don't necessarily care about money so much. Is your goal to attract clients to your consulting business, or something like that? Then reaching as many people as you can is maybe more important than making money.

Roy: I think the question that people ask me is, "Oh, is this going to be on Amazon?" I haven't decided that yet. I still don't understand all the details related to launching this on Amazon, and whether I'll need to do anything.

Certainly one of the things about Leanpub that I've really appreciated is the very strong sense of, "We are partners with the author, and we feel that being non-exclusive is in their best interests." So, really, I have the ownership of my book, and I can do whatever I want with it. I feel like that's, minimally, very aligned with my values, and probably also just very good for authors.

Len: Oh, yeah. That's actually one of the things - in addition to letting people know that we have custom themes, we're also not very good at messaging the fact that you don't need to publish your book on Leanpub at all, if you use it to write your book and create your PDF, EPUB, and MOBI files.

A perfect Leanpub use case for a self-published author is to use Leanpub to write the book and create the ebook files, but never publish on Leanpub at all. We're here to be the best place in the world to write books, as well as, hopefully, to publish them.

If that writing part is all you use Leanpub for, then that's a perfectly good use case for us.

Well, Roy, thank you very much for taking time out of your afternoon.

Roy: Let me ask one more feature request.

Len: Sure.

Roy: Which is, for a brief time, my book was the number one, top book on Leanpub. I really want a badge on my homepage that says I was the top seller. There you go. Feature request.

Len: Thank you very much for suggesting that. That's a great feature request. I'll make a story and refer that to the team right away. People love getting on bestseller lists and stuff like that. It actually is really important that we find a way of surfacing that, because it helps people understand how good a book is, and how much reach it has.

So, thank you, Roy, very much for being on the Frontmatter podcast and for being a Leanpub author.

Roy: Thank you so much for having me.

And as always, thanks to you for listening to this episode of the Frontmatter podcast. If you like what you heard, please rate and review it wherever you found it, and if you'd like to be a Leanpub author, please visit our website at leanpub.com.