An interview with Phillip Compeau

In Leanpub's Frontmatter podcast, we interview authors and special guests about their lives & careers, their areas of expertise and the issues of the moment, and their experiences as writers. Every episode is deeply researched and covers areas that are equally of human interest, general interest, and professional interest.

  • Episode 227

Phillip Compeau, Author of Biological Modeling: A Short Tour

1 H 9 MIN
In this Episode

Phillip Compeau is the author of the Leanpub book Biological Modeling: A Short Tour. In this interview, Leanpub co-founder Len Epp talks with Phillip about his background, how he got interested in math and biology, Computer Science and Computational Biology, his book, and at the end, they talk a little bit about his experience as a self-published author.

This interview was recorded on June 7, 2022.

The full audio for the interview is here: https://s3.amazonaws.com/leanpub_podcasts/FM205-Phillip-Compeau-2022-06-07.mp3. You can subscribe to the Frontmatter podcast in iTunes here https://itunes.apple.com/ca/podcast/leanpub-podcast/id517117137 or add the podcast URL directly here: https://itunes.apple.com/ca/podcast/leanpub-podcast/id517117137.

This interview has been edited for conciseness and clarity.

Transcript

Len: Hi I’m Len Epp from Leanpub, and in this episode of the Frontmatter podcast I’ll be interviewing Phillip Compeau.

Based in Pittsburgh, Phillip is Assistant Department Head and Associate Teaching Professor in the Computational Biology Department at Carnegie Mellon University.

In addition to his academic and administrative work, Phillip is a passionate supporter of a variety of online and offline educational initiatives. He co-founded the computational biology learning platform Rosalind, and he helped lead the development of the first computational biology MOOC on Coursera, all the way back in 2013.

You can follow Phillip on Twitter @PhillipCompeau and check out his website at compeau.cbd.cmu.edu.

Phillip is the author of the book Biological Modeling: A Short Tour.

In the book, Phillip introduces the reader to what biological modelling is through fascinating examples, like how zebras get their stripes, and how algorithms can be trained to “see” biological cells.

In this interview, we’re going to talk about his background and career, professional interests, his book, and at the end we’ll talk about his experience using Leanpub to self-publish his book.

So, thank you Phillip for being on the Leanpub Frontmatter Podcast.

Phillip: Thank you Len, great to be here.

Len: I always like to start these interviews by asking people for their origin story. So, I was wondering if you could just talk a little bit about where you grew up, and how you found your way into studying mathematics, and a career in Computational Biology?

Phillip: I grew up in a small town of about 2,500 to 3,000 in North Carolina, in the foothills of the mountains. I went through the public school system there, and had some really great teachers, and some interesting teachers as well. I’ll leave it at that. I think we could fill an entire hour probably with some stories, some of which are horrifying, from that experience. But I also had some fantastic teachers along the way, and some people who really fostered a love of mathematics, and an ability in mathematics.

I was able to get connected to problem solving competitions, and things like that. I had a sense that I wanted to be a college professor, even before I went to study my undergrad. Fortunately, I think I picked an undergrad institution where that desire made a lot of sense. I went to Davidson College in North Carolina. In part, because I could play tennis in college. It was going to be difficult - it was a dream of mine, really, to play competitive tennis at a Division 1 school. I wanted to go somewhere where that was possible, but that also had really good academics.

And so, I wound up at Davidson. And Davidson - most people know it, because that’s where Steph Curry went to school. But it’s also the top liberal arts college in the south. And it also has Division 1 athletics. About a quarter of the student body are Division 1 athletes. Because it’s a small - 1,700, 1,800 student enrollment - type of place, it fit what I wanted to do, as well as being close to home.

That’s where I did my undergraduate studies. I majored in mathematics there, and got to play tennis, and make some close bonds with lifelong friends in that way.

It didn’t knock me off wanting to be a professor, because I had some absolutely fantastic professors across whatever subject I was studying, not just in mathematics, but across the liberal arts spectrum. I think it gave me a sense of what a professor is.

Now that I’m at a research institution - it was not reflective of exactly how all professors are, what they care about. Because I was at a place where teaching was at the forefront. You didn’t have graduate students. There was a high premium placed on being well-rounded, even though I knew I wanted to be a professor. Having that experience made me want to be a professor who focuses on teaching.

I spent a year after that at Cambridge University. I got a scholarship through Davidson, actually, to have a full scholarship to study mathematics at Cambridge, and to complete what’s called Part III of the Mathematical Tripos.

Undergraduates at Cambridge have a very gruelling curriculum that they go through. Then, they’re allowed to continue on and do an additional year of study if they like. They open that additional year of study to international students as well. It was an extremely international mix of people that I found myself with, 150 or so, I think - maybe even more? All of whom were very intent on studying mathematics at a place where you could take these courses that were unbelievably rigorous.

I think that it reinforced my desire to be a teacher, as much as my experience in undergrad had, but in a different way. Because a lot of the teaching was really bad. Some of it was good. But I felt like - the material was unbelievably advanced, but I essentially taught myself a huge amount of material, and scraped by on the final exams, and so on. I thought, “Well this is, in a different sense, making me want to be somebody who focuses on teaching undergrads for a living.” I knew I needed a Ph.D. to do that. I’d had a little bit of research experience in mathematics through my undergrad experience. But I wasn’t inspired by doing mathematics research in the way that I think other graduate students in mathematics are.

And so, that was a difficulty. Because I knew where point B was. I knew I wanted to be a professor, and focus on teaching. But getting there was somewhat of a struggle, in the sense that you really need to care very deeply about studying a single research project for a period of five-plus years. It may be a lot more. In some cases, people take ten, eleven-plus years to do a Ph.D. That’s a huge part of one’s life.

You’re working in a field where, to get a Ph.D. in mathematics, you really need probably only a couple of publications.

Those publications, those research papers may get cited, if you’re lucky, a couple of dozen times. It’s not a place you’re going to obtain any sort of fame. You’re going to be working on a problem, where you are working on something that a tiny community of people around the world has any real understanding of. There are a lot of mathematicians maybe who could understand it. But in terms of when you actually sit down and reach the frontier of mathematics research, it can be a pretty lonely place. I found that out firsthand, in a Ph.D. that I completed at UC San Diego.

I would say that when I went out there, I didn’t have a full picture of this. I was just set on point B. I was fortunate that I happened to select an academic advisor, a Ph.D. advisor, who happened to have funding from Howard Hughes Medical Institute, to work on education projects.

Pavel Pevzner was my Ph.D. advisor. In the field of Computational Biology, he is someone that everyone knows. He has worked on fundamental problems in how we take biological data, and build algorithms to analyze it, for decades, notably in the area of assembling genomes.

If we want to read out what is the human genome, this ultimately boils down to - you have a bunch of fragments of DNA from multiple cells that all have the same DNA. Then you want to overlap fragments of DNA, since they came from the same initial source, to produce what the underlying genome is. That is a computational problem, because you may have hundreds of millions of fragments. You can’t do it manually.
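To make the fragment-overlap idea concrete, here is a minimal sketch in Python. It is not taken from the interview or from any real assembler: it is a toy greedy approach that repeatedly merges the two fragments sharing the longest suffix-prefix overlap. The function names, the `min_len` threshold, and the example fragments are all assumptions made for illustration, and the quadratic pairwise comparison would not scale to the hundreds of millions of fragments mentioned above.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len characters exists.
    """
    max_k = min(len(a), len(b))
    for k in range(max_k, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Toy assembler: keep merging the pair with the largest overlap.

    Returns the remaining sequences (ideally a single reconstructed string).
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge; stop rather than guess an order
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three made-up fragments that overlap by five letters each.
reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]
print(greedy_assemble(reads))
```

Greedy merging can be fooled by repeated regions, which is one reason the computational problem is harder than this sketch suggests.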

He has worked in this area and a few other areas, and has really made a name for himself, in terms of research. But he also cares about teaching, and happened to have education funding, that I didn’t know about when I approached him to be a Ph.D. advisor.

Things were a little bit serendipitous there. He says that he was taken aback when he asked me what I wanted to do, and I said, “I want to be somebody who’s like an undergrad liberal arts professor, and focus full time, or as much of my time as possible, on teaching.” Because he’d never had anybody tell him that before. It’s a strange thing to tell to somebody who may have millions of dollars of research funding, and is at the forefront of their field, that you want to be a teacher for a living.

But, fortunately I found the right person there. This led to me having passion projects, that fortunately became part of what my Ph.D. thesis was. I did a fine Ph.D. thesis, that got its citations, and that I look back upon fondly. But I also, at the same time, had a chance to work on increasingly scalable online education projects with Pavel. That allowed me to have a little bit of a reputation for working in this area.

In addition to doing things like being an instructor in a pre-Calculus course, or TA’ing a bunch of courses which are required of math students, in order to get paid as a Ph.D. student - to have this portfolio, so to speak, of online education experiences, helped me out, I think a lot, when it came time to apply to be a professor.

I found a position at Carnegie Mellon, that would be in Computational Biology, that was focused on teaching. Even though Carnegie Mellon is an R1 large research institution, they have a track for people who want to focus on teaching and administration. A lot of universities have this role. I believe that was the only position in the world that would be within the field of Computational Biology focused on teaching, specifically. So, that was another point of fortune, where, in 2015, I started my position at Carnegie Mellon on the teaching track. That’s where I’ve been since then.

Len: Thanks very much for sharing that. You captured the nuances of every step, I thought, very well there. Including the particular challenges of a Ph.D. or the Tripos at Cambridge, and things like that.

You said a couple of very interesting things at the beginning there. Given your interest that you developed over time in teaching, which was, you said, “horrible teachers,” or, “horrible experiences with teachers,” you also mentioned getting into problem solving.

I wanted to ask you, if you’re willing to talk about it - I mean, because this has actually come up in discussions on the podcast before. Particularly because a lot of Leanpub authors are people out there trying to teach people things, and show people how to do things.

Sometimes it’s because they’re emulating all the positive, great experiences they had in their life. Sometimes there’s a reactionary element to it as well. I wouldn’t describe my own experiences as “horrible,” but I had some bad ones myself, at an early age, realizing that teachers aren’t necessarily heroes. I was wondering if you wouldn’t mind maybe talking - not necessarily about the details, unless you want to - but just sharing generally what those horrible experiences were?

Phillip: Well, the darker ones would be people who wound up spending time in prison for horrific crimes.

Len: Okay.

Phillip: I wouldn’t get into those. But, the example I always give, is that I had a high school biology class that I hated so much, that I vowed I would never study biology again after that course.

I often will start a talk to students about what Computational Biology is, with the question, “Who here hated their biology class in high school?” It’s about 85 to 90%. Often these are talks that we give as prospective student talks, for students who have been admitted, or who are applying to Carnegie Mellon, who want to come to study undergrad there. I have noticed - it’s almost exclusively with their parents, that all the parents hated biology. Fortunately, not all of the students hated biology. Although most did. So, maybe we’re headed in the right direction there. I find some hope that fewer people hate it.

There were a lot of issues. But largely, it was a subject that seemed unbelievably based off of memorization. We just filled out a bunch of work sheets, and memorized a lot of facts that, I don’t think I could tell you one of them. We did the standard dissection. I think we had a fetal pig and a frog, and there was no guidance given on what we were even supposed to do, or what we were looking at. It was that poorly organized.

So, I thought, “I’m just surprised that anyone would be a biologist, if this is truly what it was.” I had no context of the beauty of biology. I had no idea that while I was doing this class, that we were just filling out hundreds of pages of worksheets, and memorizing things, so that we could do well on a state-sponsored test.

Meanwhile, there was this revolution happening in biology, where biology was going from an experimental field, to one where questions are answered by analyzing data. I think it’s a cool life lesson, in terms of - don’t promise that you’re never going to do something again, because it might come back and bite you.

In my case, what I do with my entire life, is trying to find cool ways of teaching biology, or to show biologists how things are really done, in terms of computational data analysis in the field.

Len: I’m really looking forward to asking you questions about that, about biology and data science, and the big transformation, and things like that.

But it’s funny, the parallels. I grew up in southern Saskatchewan, in Canada. Your description of a terrible biology class maps quite well onto my own experience. I remember being quite angry. I mean, I never quite put it together with the lack of guidance, when it came to dissecting the frog that I had to do. But there was something just wrong about what was happening, right? I was like, “Well, what? Why am I doing this? What do you mean? How am I supposed to know what a liver is? You showed me a two-dimensional picture from a twenty-year-old textbook, and that’s not enough guidance to do surgery.”

In particular, I remember really hating being graded on how well I could draw. I’m like, “Isn’t this biology?” I mean, these were like the crude thoughts of a young teenager. But I could tell, because - it wasn’t until later that I put together in my haphazard way, that, “Oh, this might be just the echo of the 19th century gentleman scientist, who walked around with a notebook.”

It was very important for them to be able to draw things, because they might go on a two year long journey on a sailboat. When they come back from that island, they better have been able to draw things reasonably well, or there’d be no record of what they saw. But being able to do things with like test tubes and pencils, and things like that, there was still some echo of that. Even at the time that I would’ve been studying, that would’ve been outdated.

Phillip: Yes. It’s not to indict all biology teachers. I think that they just have an extremely uphill climb, often. Because the curricula are getting better, but they’re getting better at a slow rate.

Often biology teachers may have to go outside of what they’re asked to really teach as part of that curriculum, in order to show students something that’s really neat and cool. I mean, biology has been really interesting for a long time. I was amazed when I finally went back and read The Origin of Species, and how brilliant it is, cover to cover. Or how Mendel’s experiments, by modern standards, probably would’ve been fraudulent.

When you go into the 20th century, you’ve got experiments to unlock, what is DNA? What is it made up of? You have the genetic code that takes triplets of DNA nucleotides and converts them into amino acids by the system. Those amino acids are building blocks for proteins that do everything in your cells. These experiments to identify what is actually going on in the cell, and what the identity of these molecules is that you can’t see. Or maybe with an electron microscope, you could very -

You could start to elucidate this. But you wouldn’t be able to see any change, to understand that change in the cell, without ever observing it. These experiments are just downright genius. Students really don’t learn them, for the most part. It’s this extremely strange dichotomy.

I would say the only thing that I find comparable to it, is mathematics, actually. Because what mathematics is, and how mathematics is taught in schools, are so wildly different from each other. People have written about this, I wouldn’t be the first to say it. But it’s truly shocking.

Len: Yeah, that actually - I think that’s actually one thing I wanted to pick up on, that I think is related to that - you mentioned when you were younger, getting into problem solving. That’s actually a very specific term of art. I think that a lot of people might think of that as - it’s paradoxical to say they think that’s math, or they think that’s not math. But if you’re coming from the wrong perspective on both sides of what math is, and what problem solving is, any comparison is going to be confusing.

I was wondering if we could talk a little bit about what problem solving is? It’s something that actually attracts a lot of younger people, maybe if the classes that they’re in at school aren’t that attractive, but they find this problem solving thing, which has this whole culture around it. I was wondering if you could - it’s related to the things you’re talking about, when you’re talking about solving these problems about what’s going on inside a cell. If you could talk a little bit about what problem solving is, particularly, let’s say, for teenagers who get into it?

Phillip: Yes. I think I got into it really when I was maybe in the seventh grade? I had a really good teacher who got me involved in MATHCOUNTS, which I think is the middle school level version of that? There are high school versions of problem solving that wind up culminating in the US, and then the International Mathematics Olympiad, where you have mathematics questions about different sorts of things - you have a timed exam where you have to answer the questions. Yeah, it’s a world that, I think, not a lot of people are that aware of. You have these competitions, and they’re extremely challenging exams.

They’ll ask questions, like - there’s some good examples of these sorts of questions on places like 3Blue1Brown on YouTube, that have done a great job of explaining them.

One example of a question that I saw on that channel would be, if you pick four points on the outside of a sphere, then they’re going to form a tetrahedron if you connect the four points. A simpler example would be, you have three points on the outside of a circle, and they form a triangle, just connect the three points. What are the chances in each of those two cases, you have a three dimensional case with the sphere, and a two dimensional case of the circle - what’s the probability that if you just chose the points randomly, that the triangle or the tetrahedron, as it were, would contain the center of the circle or the sphere?

It has a nice clear-cut answer that you can get to. That’s a very challenging question for the 3D case. But it’s a question that you can pretty quickly explain to someone.
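For readers who want to experiment with this puzzle, the sketch below estimates both probabilities by simulation. It is an illustration, not something discussed in the interview, and a simulation is of course not the proof Phillip goes on to talk about. The known exact answers are 1/4 for the circle and 1/8 for the sphere, so the estimates should land near those values. All function names and the trial count are assumptions chosen for this example.

```python
import math
import random


def triangle_contains_center(angles):
    """Three points on a circle (given as angles) enclose the center
    exactly when no gap between neighbouring points exceeds pi."""
    a = sorted(x % (2 * math.pi) for x in angles)
    gaps = [a[1] - a[0], a[2] - a[1], 2 * math.pi - (a[2] - a[0])]
    return max(gaps) < math.pi


def det3(u, v, w):
    """Determinant of the 3x3 matrix whose columns are u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - v[0] * (u[1] * w[2] - u[2] * w[1])
            + w[0] * (u[1] * v[2] - u[2] * v[1]))


def tetrahedron_contains_center(a, b, c, d):
    """The origin lies inside tetrahedron abcd exactly when -d is a positive
    combination of a, b and c (solved here with Cramer's rule)."""
    base = det3(a, b, c)
    if base == 0:
        return False  # degenerate case; has probability zero for random points
    nd = tuple(-x for x in d)
    coeffs = (det3(nd, b, c) / base, det3(a, nd, c) / base, det3(a, b, nd) / base)
    return all(x > 0 for x in coeffs)


def random_unit_vector(rng):
    """Uniform point on the unit sphere: normalize three independent Gaussians."""
    while True:
        v = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0:
            return tuple(x / norm for x in v)


def estimate(trials=100_000, seed=0):
    """Return (circle estimate, sphere estimate) from random trials."""
    rng = random.Random(seed)
    circle_hits = sum(
        triangle_contains_center([rng.uniform(0, 2 * math.pi) for _ in range(3)])
        for _ in range(trials)
    )
    sphere_hits = sum(
        tetrahedron_contains_center(*(random_unit_vector(rng) for _ in range(4)))
        for _ in range(trials)
    )
    return circle_hits / trials, sphere_hits / trials


print(estimate())
```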

But the idea of getting an answer to that - not just having an answer, but having verifiable proof of an answer, like you know that your answer is 100% correct, and can justify it, almost like a legal argument - that’s a different matter entirely. That’s what mathematics really is. And, so, a simpler example would be, the Pythagorean theorem. Students know A squared plus B squared is equal to C squared.

We drill that into them. We do a good job of making sure that that covers the population pretty well. Even though it’s not 100% clear to me, someone with a Ph.D. in mathematics, why they need it. That’s another matter.

But we do manage to teach that fact. It has an explanation. Very few students learn about that explanation. That explanation is what mathematics really is. The explanation itself can be done in a variety of ways, and it’s beautiful. If I have somebody who has a little bit of a knowledge of high school algebra, I find I can show it to them in five minutes or so.
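As one illustration of such an explanation (a standard rearrangement argument, not necessarily the one Phillip shows people): place four copies of a right triangle with legs $a$, $b$ and hypotenuse $c$ inside a square of side $a+b$, so that the uncovered region is a tilted square of side $c$. Writing the area of the big square in two ways gives the theorem with nothing beyond high school algebra.

```latex
\begin{align*}
(a+b)^2 &= c^2 + 4\cdot\tfrac{1}{2}ab \\
a^2 + 2ab + b^2 &= c^2 + 2ab \\
a^2 + b^2 &= c^2
\end{align*}
```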

That to me is the divide between high school or middle school mathematics courses and problem solving, which is trying to get more at what mathematics really is, which is being able to not just state a fact, but to know why it’s the case, and to be able to justify it.

Len: That’s a really great description of the area. Thanks for bringing up the competitions and stuff like that, that happen as well.

You’re reminding me of when I was doing my doctorate, it was in English, but I had a friend who was doing a doctorate in math. One day, I saw him with a backpack, and I’m like, “Where are you going?” He goes, “Glasgow.” I’m like, “Why?” He’s like, “They can’t get the penguins to hatch at the zoo.” It was something along those lines. We’d talked a lot about stuff, I think he did fluid dynamics, or something like that?

But that was when it finally clicked for me, what he did, was solve problems. As you mentioned, one of the things that makes this problem solving culture so attractive to some people, is that, there’s usually not one answer. There’s many ways of approaching a problem. Just because you solve it one way, doesn’t mean you’ve exhausted the interest in that problem. Because you can then think of another way to solve it.

Then, you can often find your way into mathematical concepts yourself, without knowing that’s what you’re doing. Then that becomes part of your journey along the way, including just fascinating things, like, you can have a terrible problem, or a problem that you’re having a terrible time solving. Then, you read a paper, and it’s like, “Oh, that’s how you do it.”

Phillip: Right, yes. I think about the same thing. Like the Pythagorean theorem example - there being multiple ways of getting at it, and that there’s a couple of hundred or however many different proofs of the statement. None of them still provide any real intuition about why it is that if you draw a right triangle, and then you form squares from each of the three sides of the triangle, that the areas of the smaller squares, when you add those two areas together, would equal the area of the third. That’s an extremely strange, non-intuitive fact.

That’s what the Pythagorean theorem is, right? So, often you’ll have occasions in mathematics that make the theorem, what it is you’re trying to prove, make sense. Like, “Oh, I get that, that’s intuitive and obvious now.” Even though you can justify why this has to be the case, it doesn’t provide any intuition on why, “If there were some higher power dictating why these things have to be true, why in the world would this be true?” I think that’s a good example, too, of sometimes you don’t have the perfect, elegant, intuitive explanation of why something is true in math.

Len: Of course, giving people problems to solve is a very great way of teaching as well. That leads me to ask you about Rosalind, which you co-founded, I believe, while you were doing your Ph.D. at UC San Diego. I was wondering if you could talk just a little bit about what that very popular platform was built for, and who used it?

Phillip: It’s funny. I was a teaching assistant in mathematics courses at UC San Diego. We would get - regardless of who was TA’ing which particular course, we would get the same courses every semester. But also, they’d be the same professors, usually. They might change slightly. But the homework assignments were the exact same.

I think that really the first point at which I started thinking about Rosalind, was, I was a TA for a calculus course. I put together a solutions guide to the homework, not knowing that they were going to give the same homework in the next semester. I wrote out detailed solutions for the students in the class, after the homework was due, so that they could see, “Here’s the cleanest way of going about solving these problems.”

The students really liked it. But then the professor stepped in, and said, “You really shouldn’t do that. Because we’re going to give this exact same homework assignment next semester, and that will spread around like wildfire, and that’s what people will use to solve the homework.” Not realizing, of course, that the solutions manuals to these textbooks are very easy to get. I mean, they take five seconds of Googling to get. If you want the answers, they’re not hard to find.

But I thought of this being my job. That Ph.D. students, a lot of people - if you’re talking about misconceptions, another misconception is that Ph.D. students are students, that they pay tuition, or that they may take on debt. It’s not quite like a medical school student. Ph.D. students in the sciences are essentially apprentices. In a discipline like mathematics, because you have people who may not have research funding for students, they make their apprenticeship salary, which is very meagre, by serving as teaching assistants. I just thought of the system, where we’re in a public school system. And, yes, we go in, and for an hour a week we’ll do a recitation, where we work problems on the blackboard.

But a lot of what we do, we’re just graders. We’re just sitting here grading exams, and grading homework problems. Then I thought about how many students, especially given that this is like - over multiple universities using the same textbook, grading the same problems by hand, semester in, semester out, over and over and over again, year after year. I thought, “That’s really strange. It’s a very strange system of human labor, that’s extremely inefficient.”

I had started being connected back to Computational Biology at that time, through my research. Pavel had a textbook at that time. Having gone through that textbook, and seeing the algorithms that are fundamental to biology, in terms of data analysis, especially with respect to DNA. Going back to my example from previously, of, how do you assemble a genome from fragments? There are algorithms like some of the ones that Pavel has worked on, but across a lot of different areas of Computational Biology.

Another example would be, say I have two genes, one, our hemoglobin gene, and a guinea pig’s hemoglobin gene. That’s representable as a string of amino acids. You want to identify how they’re similar. So, how is it that you would compare them, given that, in that case, the genes are relatively short, but you might have longer genes, and don’t want to do it by hand? You can do that for two genes, or you could do it for say 100 coronavirus genomes. The coronavirus genome’s about 30,000 nucleotides long.

You could take 100 from different patients. You want to line up all the letters, and see where the differences are, because that can help you identify variants, for example. In both of those cases, you have extremely fundamental approaches that have been developed to solve these problems, that are all about, “Here’s two strings,” or, “Here’s 100 strings.” Strings being like, words made up of letters. We want to compare them, and find out where they’re similar, and where they’re different, and slide symbols around, and so on. To line them up in the way that makes the most sense. Because that’s going to infer how they evolved.
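The two-string case can be sketched with the classic dynamic-programming approach to global alignment, in the style of the Needleman–Wunsch algorithm. This is a minimal illustration rather than code from the interview: the scoring values (`match`, `mismatch`, `gap`) are arbitrary assumptions, and real protein comparisons typically use substitution matrices instead of a single match or mismatch score.

```python
def needleman_wunsch(s: str, t: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of s and t; returns (score, aligned_s, aligned_t)."""
    n, m = len(s), len(t)
    # score[i][j] holds the best score for aligning s[:i] with t[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two symbols
                score[i - 1][j] + gap,      # gap in t
                score[i][j - 1] + gap,      # gap in s
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_s, out_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_s.append(s[i - 1])
            out_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_s.append(s[i - 1])
            out_t.append("-")
            i -= 1
        else:
            out_s.append("-")
            out_t.append(t[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_s)), "".join(reversed(out_t))


best, top, bottom = needleman_wunsch("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

The "-" symbols are the slid-around positions Phillip describes: places where one string has a letter and the other has nothing lined up against it.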

I thought, it’s the same thing. You could have - a lot of students are learning about this. Especially as the discipline is growing, and you have hundreds of universities that teach a course in this type of area, fundamental bioinformatics algorithms. I was just thinking, “Does that mean that they are grading all the code by hand? Or everybody has produced all of their own auto-graders for these tasks?” It makes a whole lot more sense to just build a central repository, and say, “For the neighbor-joining algorithm,” or, “For pairwise sequence alignment by Needleman–Wunsch,” right?

Papers that people probably haven’t heard of, but that have several thousand or tens of thousands of citations - because they’re how biology gets done. Let’s just have one auto-grader, where students can write some code, and they’ll get a random data set from the website. They can plug it into their code, and then take the output and put it in the website, and it gives them a check mark if they’re done, and then they know they implemented that algorithm. It takes any human aspect out of this process, and automates something that’s going on all over the place. That’s what the conception of Rosalind was.
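The grading loop described here (hand out a random data set, accept the student's output, compare it with a trusted answer) can be sketched in a few lines. This is a hypothetical illustration and not how Rosalind is actually implemented: the task (reverse complement of a DNA string), the function names, and the seed-based data set generation are all assumptions made for the example.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reference_solution(dna: str) -> str:
    """Trusted answer for the hypothetical task: reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def make_dataset(seed: int, length: int = 30) -> str:
    """Generate a reproducible random DNA string for one attempt."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def grade(seed: int, submitted_output: str) -> bool:
    """Regenerate the data set from its seed and compare with the reference answer."""
    expected = reference_solution(make_dataset(seed))
    return submitted_output.strip().upper() == expected


# A student receives a data set, runs their own code on it, and pastes the output back.
dataset = make_dataset(seed=42)
student_output = reference_solution(dataset)  # stand-in for the student's program
print(grade(42, student_output))
```

Because every attempt gets a fresh data set, copying someone else's output does not help; only a working implementation earns the check mark.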

In parallel, Pavel had a different student - he had two labs. One in St. Petersburg, Russia - who had the same idea, but building this auto-grader more locally, in terms of his own teaching. So, it grew from there.

Our hope was maybe we could get like ten universities on board. We had no idea that it was going to wind up reaching hundreds of thousands of people. Because it wound up as an independent resource for learning. We figured it out as we went along. We started to get people from the general public using it, and it was getting posted places, so it was outgrowing its original conception, and it was now just a place where people could learn about our field in an open and free way.

Len: Thanks very much for sharing that. I was especially interested in hearing about the inspiration for building something that would auto-grade, and take the human element out of it. Because the work that was being done in the first place wasn’t very human, in any case, right? This was just rote grading of these - when you realize that actually, “This can be automated,” that’s better for everyone. I think a lot of people might hear, “Oh, we automated it. We took the human element out of it,” and think that this is detracting from it, or something like that. Not at all. This is actually making it much better for everyone.

Phillip: Yes. The idea is to free your teaching assistants’ time, to be spent on things that are more relevant, right? Like actually working with students, and that type of thing.

Len: Yeah, working directly with the students, where the human element can be at play.

Phillip: Certainly.

Len: That reminds me, actually. You’ve written about your teaching philosophy on your website, and you’ve also got Programming for Lovers, where you have a manifesto. I was wondering if you could talk a little bit about that? Your teaching philosophy and the Programming for Lovers project?

Phillip: That’s a project that’s still in progress. I’m planning a larger release of it on its own website.

The idea of this course would be that, we tend to teach programming to people that - I say, “we,” meaning the Computer Science community - computer scientists tend to teach programming to other people that are going to be computer scientists, hackers, and people who like video games. The examples that they tend to pull for teaching this, will come from mathematics or computer science. That’s great, if you’re interested in those fields. I mean, had I had an exposure to something like that in high school, I might’ve gotten into Computer Science, instead of mathematics. It wasn’t even on my radar that you could use math for computers.

But at the same time, you have a wealth of different processes that go on in science. Whether it’s some of the examples I’ve given from biology, or something like building a gravity simulator, right? To simulate the motions of celestial bodies. Great examples of how to program certain things. How to build a system and implement it in a programming language. Then visualize it, and analyze it, and so on.

I have taught a course for seven years now, that I inherited from Carl Kingsford at CMU, where I’ve added a bunch of scientific examples, where the idea of teaching the programming course, is, “Let’s just present a scientific narrative, and that will lead us to a point at which we need a computer to answer some questions.” That’s a scientific question. Then, “What skills do we need to answer that question?” Well, let’s get those skills, and then let’s return and answer the question scientifically, and with the computer.

I think that’s relatively unique. You don’t have many courses that are focused on learning Computer Science from the lens of a certain discipline, but it’s an area that’s growing.

Another part of programming education that’s growing, is in websites that are built to try and attract as many students as possible, and to lower barriers to entry, for entry to learning how to code. Those projects are really, really good in some respects, in terms of knocking down barriers, and convincing people that they can do something that maybe they thought was reserved for nerdy people at MIT, and Carnegie Mellon, and places like that, right?

Where they’re not so great, is that often they’re made with venture capital money, and they want to demonstrate to whoever, whomever, that the project is successful. There are metrics for the success, this success. Often those metrics are based off of, “How many active users did you have? Exactly how many questions were they able to answer?” You boost those metrics by watering down content. That’s what I’ve seen when I’ve looked at a lot of places.

Now, I’m not indicting every online education project. But I tend to find - I find this through my own teaching at CMU. The number of students in the last seven years, when I go into my class and ask, “Who here has experience in programming?” The number of hands that go up, it’s practically like universal now. As opposed to it being a small minority of students seven years ago. That’s fantastic.

However, you have students who have an exposure to a discipline, but they lack the rigor, and the challenge of that, that is inherent to the discipline. It is hard in a great way. When you get it, it’s a wonderful thing, right? All that is critical.

A lot of times, they get very skin-deep exposures to this field. Then they hit a point where they don’t understand something, or they realize that it’s hard. Then they often may internalize it, or they hit a dead end. There’s no bridge to take them to - truly this is a skill that people can learn. You do have to have some quantitative knowhow, but that’s also another skill, right? It’s just critical that you get it at the right point in your educational timeline.

But my idea for the course is to combine a rigorous experience, like what you would get with a classic introduction to Computer Science at a top-notch place, with some fun applications that are provided from science. To try and find a way that’s still accessible to beginners. That’s the project that I’m currently working on, that I really want to see grow in the coming months and years, that I think could have as big an impact as some of the other stuff I’ve had the privilege of working on.

Len: It’s just such an interesting challenge, even to articulate, it seems. I’ve learnt this doing research for this interview - even to articulate the number of things that are intersecting, that you’re trying to bring together.

You mentioned, for example, I think it was the first time, just in your explanation? The first time in this interview that it came up, which is the term, “Computer Science.” In my introduction, I mentioned that you were in the Department of Computational Biology at Carnegie Mellon, but you’re in the School of Computer Science. That Department of Computational Biology is in the School of Computer Science. So, I guess maybe probably, one of the best ways to work out all the things that are coming together in this - could you talk a little bit about what someone would -? What their program would be like as an undergraduate, if they majored in Computational Biology?

Phillip: Sure.

Len: What courses would they take? What people are you trying to get in through the -? I think it’s called the “Pre-college Program in Computational Biology?” If someone’s listening and they’re - let’s say they don’t know anything about all this, but it all sounds super interesting. What would you study if you did a major in Computational Biology at CMU?

Phillip: There’s two programs here. We have a high school program: our Pre-college Program in Computational Biology is open largely to rising seniors in high school. We really look only for students who are strong quantitatively, and who like science, especially biology. Because there’s a huge number of students out there who actually do like biology, want to study biology, and are just generally interested in science.

We have a research project, where we take students out on Pittsburgh’s three rivers on a boat trip, about a forty-mile boat trip. We do educational activities with them on the boat. Every so often we stop and sample water. Then they go into the lab, they isolate the DNA from that, their water samples.

Then, the purpose of the program is to analyze those water samples using algorithms that they themselves write, there being this harmony between field collection, laboratory experimental work, and computational data analysis, that is really what biology is all about. To show students this in a three week intensive, eight hour a day summer program.

I’ve been amazed at how many students we’ve had that have been interested in this. Because we didn’t know. As one of only a very small number of programs that teach Computational Biology to high school students globally, we didn’t know if the students would show up, and then they really started showing up in droves, which has been fantastic. That’s a program I love getting to work with.

I get to use my programming materials for students to get up to speed in programming. Then, by the end of the three weeks, they code all these algorithms together. They’re all extremely strong in programming, which is something to be proud of too. Then they go off to different universities, and so on. We’re hopeful that they take what they learn, and continue to study biology. But if not, that’s fine too.

Our undergrad program is a four year major. The School of Computer Science at CMU is a place that’s essentially a top three place to study Computer Science historically, alongside MIT and Berkeley. MIT being the big name, that has like the biggest household reach. But in terms of rankings and prestige and the research accomplishments of professors and the quality of the education, and so on, it’s a top three institution.

For thirty years, you could only study Computer Science. Over the past few years, we’ve seen a change, to have now four majors. Now, we have Artificial Intelligence, Human-Computer Interaction, and Computational Biology, as major options. Comp Bio was actually the first non-Computer Science major. The idea being that, Computer Science is becoming extremely broad, and influencing a lot of different parts of our lives. Thirty-years-ish ago, when they started the Computer Science major program, it was much more esoteric, right? In terms of what you could do with a Computer Science major.

Now, the demand for people working in this area is huge across every aspect of the economy, right? So, that exists, at many places at the graduate level.

But what’s neat about CMU, is we now have multiple majors at the undergrad level. If you were a Comp Bio major, for example, you’re still a Computer Science student, which means you still take what I call a “brain car wash” of mathematics and Computer Science core courses. But then you also take some biology foundational courses, where you start to realize what bio is. You have a Computational Biology core of coursework too. You have a lab course that’s based off of heavy data analysis.

I teach the first course in the major, called Great Ideas In Computational Biology, for which I actually won a teaching award this spring at CMU, which I was very happy about. That shows students, what are the big ideas that have made biology a data discipline? Then students take advanced coursework that brings them closer to what real companies and real researchers are working on, and the methods that are at the frontier of the field when they’re in their later couple of years.

Len: You mentioned biology and data. That reminds me of a line I wrote down from a video that I watched yesterday. Where you say, “Biology is now fundamentally a data science.” When you talk about introducing students to what biology really is, and it’s not - what was it? A friend of mine who was a biologist once complained about how people often had a fried egg view of a cell. It’s like a circle with a circle in it. As you’ve mentioned before, it is easy to get very sarcastic about these kinds of things. But people -

Phillip: Because it’s been flattened against a plate?

Len: Right.

Phillip: It doesn’t necessarily reflect its reality.

Len: Yeah. Because of microscope technology and the way that worked, and things like that. Then you had to put light underneath it, and stuff like that. It’s interesting. Of course, I know you brought that up on purpose, to say that the way we view things depends on the technology that we use. The images we have in our heads depends on the technology we use.

But when it comes to the complexity of what biology is, I just wanted to read - I wanted to move on now to talking about your book. Specifically, Biological Modeling: A Short Tour. There’s this great passage at the beginning, that I’m just going to read.

“You may feel like a single, coherent being, but you are just a skin-covered bag of trillions of cells — about half of which are bacteria — that act largely independently. These cells are full of proteins, complex macromolecules that perform nearly every cellular function. If a protein could move in a straight line, then it would move at 20 kph or faster, meaning that the protein would cover a distance 1 billion times its length every second (analogous to a car traveling at 20 billion kph). However, the cytoplasm filling the cell is so densely packed with water molecules that the protein ping-pongs off them, frequently changing direction.”
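The figures in this passage can be checked with a little arithmetic. The sketch below uses only the numbers quoted (20 kph, 1 billion lengths per second, 20 billion kph) and works backwards to the lengths they imply. The function name `implied_length_m` is made up for this illustration and does not come from the book.

```python
def implied_length_m(speed_kph, lengths_per_second):
    """Length in metres of an object that covers `lengths_per_second`
    of its own lengths each second while moving at `speed_kph`."""
    speed_m_per_s = speed_kph * 1000.0 / 3600.0  # convert km/h to m/s
    return speed_m_per_s / lengths_per_second


protein_m = implied_length_m(20, 1e9)  # the protein in the passage
car_m = implied_length_m(20e9, 1e9)    # the car in the analogy

print(f"implied protein length: {protein_m * 1e9:.1f} nm")
print(f"implied car length: {car_m:.1f} m")
```

The numbers are consistent with each other: they imply a protein roughly 5.6 nanometres long and a car roughly 5.6 metres long.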

I just love how it grounds things in ourselves. I mean, what it really brought home to me, reading it, was just really how biology is a data science.

Phillip: It’s a data science. I hope it’s clear that, with the quote like that - I think, I don’t know? That’s me trying to get into somebody else’s head, the wonder of biology, from the perspective of, “I thought it was X, and it’s actually Y.”

There’s a lot going on in what you just read, right? To try and densely grab somebody’s attention. Because all that to me, is just amazing, that you have a system that’s so complex, where things, just - moving unbelievably fast, with all this energy and bouncing off of each other. That somehow the whole symphony makes sense because of all these random interactions. It’s amazing. Our conversation is ultimately dependent upon all of that.
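The picture of a fast particle that keeps bouncing and changing direction is often modeled as a random walk. The sketch below is one minimal way to do that in two dimensions, assuming fixed-length steps in uniformly random directions. The helper names `random_walk` and `net_displacement`, the step length, and the seed are all choices made for this illustration, not details from the book.

```python
import math
import random


def random_walk(num_steps, step_length=1.0, seed=None):
    """Return the list of (x, y) positions visited by a 2D random walk."""
    rng = random.Random(seed)  # private generator, so runs can be repeated
    x = y = 0.0
    path = [(x, y)]
    for _ in range(num_steps):
        angle = rng.uniform(0.0, 2.0 * math.pi)  # pick a fresh direction
        x += step_length * math.cos(angle)
        y += step_length * math.sin(angle)
        path.append((x, y))
    return path


def net_displacement(path):
    """Straight-line distance between the first and last positions."""
    (x0, y0), (x1, y1) = path[0], path[-1]
    return math.hypot(x1 - x0, y1 - y0)


if __name__ == "__main__":
    steps = 10_000
    walk = random_walk(steps, seed=0)
    print("distance travelled along the path:", steps)
    print("net displacement from the start:", round(net_displacement(walk), 1))
```

For a walk like this, the typical net displacement grows only like the square root of the number of steps, which is why a particle can move very fast and still make slow progress in any one direction.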

There’s other things lurking in that. Like about half of our cells aren’t even ours, they’re bacteria. They’re just hanging out. Sometimes they are helping us symbiotically. But also, what I think is a grand challenge of biology, which is, you have reductive models. But then how do we take a holistic view of those reductive models? How do they connect? How is it that all these random interactions of particles drive what goes on in the cell? But then, the interactions of cells, often behaving independently producing large scale behaviors, that is something that is pretty much open.

In one of the chapters, we hint on trying to connect, completely understanding a system on the level of its machinery. Then, being able to infer things about that system. Because we built a model that was good enough, to be able to step back from it on a higher view, and understand it.

That has been done in the - the biggest, probably the most landmark example that I could think of, of this being done, would be taking a very simplistic bacterium and building a model of each of its processes independently. Then connecting those processes, according to what’s known biologically. Then you’ve got everything simulated on a computer. You click, “run,” and let the computer simulation work.

This is about ten years old, that they did this for a simple bacterium. They still haven’t taken that work and extended it to E. coli. What was partly cool about that model, is that they were able to look at the model, and see what happened at the end - measure the concentrations of particles over time, and look at different processes as they interact, and the system just worked.

Then, they were able to reach biological hypotheses, that weren’t actually experimentally validated, or known at that time. Because here’s what the model told us. Maybe that’s nonsense? Maybe it’s true? If the model is great, then it’s true. But it might just be a figment of the model not being perfect.

There are actually examples of, the model made predictions, that people went into a lab and validated. I think, to me, that’s one enormous frontier of biology, is doing that for increasingly complex organisms. Let’s do it for a complicated bacterium next. Let’s have some colony of bacteria that we simulate. Then, let’s get ever and ever more complex, in terms of what it is that we can model. Can we model a human cell?

We’re not there yet. We can model human cell systems. Very simplistic representations of it. But no one has built, I don’t think, a complete working model of a human cell. Then, what about an organ, right? That’s based off of cells.

My point is that, at what point are we then heading towards replicating human intelligence on a computer? It’s not going to be anytime soon, so there’s no reason to be concerned. But you can see how incremental changes to this - if you go back fifty years, no one would have thought you could completely replicate a bacterium on a computer. Yet, that’s been done.

Len: There’s so much to unpack there, that we could do probably three or four podcasts, just on that one set of things you just talked about. But I want to see if I can do my best to try and focus in on a specific example of biological modeling, based on something that you brought up.

Which is, the squishing of something between two glass slides, and then using it in a microscope. It used to be that, to observe the world, we used our eyes. We more or less didn’t have much - I mean, we used our other senses as well, and things like that.

But then, with respect specifically to biology, people developed microscopes. They’re all of a sudden, like, “Oh my God, look at all this stuff that’s going on at this smaller scale.” Now we can see things at that smaller scale.

Then, the scales at which we could do things got smaller and smaller and smaller, with things like electron microscopes, and things like that. But typically, just like you were talking about squishing, you squished that poor cell between the glass slides, and it goes flat. Often this meant changing the contents of the cell by coloring it, or just, it has to be a dead cell in order for you to view it certain ways.

But, with biology becoming a data science, and being able to do computational modeling, one thing you can do is basically train programs to look at the surface of a living cell, and infer what’s going on with all the organelles and other kinds of things inside, if I’m getting it right? I’m probably getting that wrong. But like, there’s some version of this, right? That, you don’t actually have to kill the cell, but based on millions of examples, you can infer what’s going on by observing the surface, and using basically pattern matching?

Phillip: Right. I would say that that’s a little beyond my own expertise. But, it’s the type of thing that my colleagues have worked on.

One notable example would be, a couple of our former Ph.D. students were at the Allen Cell Institute. They were working on what are called “bright field images of cells.” They’re greyscale. The cells, you don’t have to kill them, that’s nice. But they’re low resolution images of the cells.

You would think of this as a simplistic form of microscopy. But it has the benefit that the cells are living, and you can visualize them interacting. But that was in contrast to much more cutting-edge approaches. Like in fluorescent microscopy - that’s another thing that I don’t think maybe many people would appreciate. That there’s still extreme advances going on, in terms of microscopy, right?

But what these two folks did, and a few others worked on, was taking the bright field images, and saying, “Maybe the human eye can only see so much, and there actually is something that can be inferred from that on a deeper level? So let’s train a computer.” In other words, “build an algorithm.”

That sounds more AI than it really is. But build an algorithm that is able to infer a lot of patterns from that, and then predict what they think, the computer thinks, or the algorithm thinks that the fluorescent microscopy versions of those images would be.

I remember, they showed that to our students at an industry event. I couldn’t believe it. Apparently they said that is the same reaction that they’ve had anytime they show biologists. Because it’s just a mind-blowing thing to look at. That you can take one form of data that seems so primitive, and infer something that’s much, much higher resolution from that. That’s an amazing thing too.

Len: It’s reminding me, I don’t know if we want to go down this path, but one thing that was so interesting for me, looking at the book and stuff, is seeing the parallel language to physics. Concepts like coarse-graining and emergence, and things like that - which aren’t metaphors, given the way biology is being done now, if I understand it correctly? It’s like the same thing.

Phillip: Yes, exactly.

Len: Just before we go on, maybe to move onto the last part of the interview, where we talk about how you pulled off these kinds of projects and things like that - specifically, the bigger project that this book came out of.

At the very beginning of the book, you do have this great example, where you bring computation, Computer Science and biology together, with the example of Alan Turing and the zebra spots. Or stripes, pardon me. I was wondering if you could tell that story just briefly, so people can see that connection?

Phillip: Sure. Alan Turing, I mean now they’ve made a movie about him starring Benedict Cumberbatch, he’s entered the public eye. So many people have now heard of Alan Turing. But he’s famous for a couple of things. He was a famed code breaker, who broke Nazi codes, working at Bletchley Park in England during World War II. He also conceptualized what a computer is, on a very basic level. It’s the same thing that we’re talking about in terms of building models. He said, “Essentially, can I build a model of what a computer might be, and what abilities it might have, that’s as simply represented as we can come up with?”

So, his theoretical computer, there’s no implementation of that computer needed. The idea is just to theorize it. His computer - say you have a reader. The reader is like a camera of sorts, that can read out symbols on a tape. The tape is as long as you like it to be. But the tape only has symbols printed on it. Each cell of the tape has a single symbol. In a classical formulation, you have either a zero, a one, or a blank printed on a cell.

So, for example, you can have a string of zeroes and ones that might represent a longer number. Turing was able to show that you could do an enormous number of things with a computer whose only abilities are to read something off of the tape and act based off of the current state it’s in. The machine has a finite number of internal states. If I’m in state 37 and I see a zero, I’m going to change the zero to a one. I’m going to move one cell to the right, and I’m going to enter state 43. Those are always the steps it’s going to take. If it’s in state 37 and it sees that zero on the tape, it’s always going to do that. It doesn’t have any flexibility.
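The kind of rule described here (in state 37, on reading a zero, write a one, move right, enter state 43) can be written down as a small lookup table. The sketch below is a minimal illustration under some assumptions: the `run` helper, the choice to stop when no rule applies, and the bit-flipping example machine are all invented for this example and are not taken from Turing’s paper or from the book.

```python
BLANK = " "  # symbol for an empty tape cell


def run(rules, tape, state, max_steps=1000):
    """Run a one-tape machine until no rule applies; return (state, tape).

    `rules` maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is "R" (right) or "L" (left).
    """
    cells = dict(enumerate(tape))  # tape position -> symbol
    head = 0
    for _ in range(max_steps):
        symbol = cells.get(head, BLANK)
        rule = rules.get((state, symbol))
        if rule is None:
            break  # no instruction for this situation, so the machine stops
        write, move, state = rule
        cells[head] = write
        head += 1 if move == "R" else -1
    return state, "".join(cells[i] for i in sorted(cells)).strip(BLANK)


# The single rule quoted in the interview.
INTERVIEW_RULE = {(37, "0"): ("1", "R", 43)}

# A hypothetical one-state machine that flips every bit, then stops at a blank.
FLIP_BITS = {
    ("flip", "0"): ("1", "R", "flip"),
    ("flip", "1"): ("0", "R", "flip"),
}

print(run(INTERVIEW_RULE, "0", 37))    # (43, '1')
print(run(FLIP_BITS, "0110", "flip"))  # ('flip', '1001')
```

The machine has no flexibility because `rules` is a plain dictionary: the same state and symbol always produce the same write, move, and next state.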

That conception of a computer from the 1930s is as powerful as the most powerful supercomputer on the planet today. There’s a hypothesis that any computer that we could ever come up with, could be represented - anything it does, could be represented by some set of instructions on what’s called a “Turing Machine.” Turing’s famous for this, because that’s an amazing development. It’s in the 1930s, right? That he essentially lays these foundations of Computer Science, upon which no one has advanced since that time, even though we have wonderful devices.

But he’s also famous for one paper in biochemistry, where he proposed that, the reason why zebras have stripes, might be as the result of a relatively simple system, in which two different types of particles are interacting, according to some rules. Because of those rules, you see a segregation. Although it’s not perfect. If you zoom in on a zebra, you won’t see perfect black regions and perfect white regions.

It’s the same for a fish. For example, you zoom in on a fish that has stripes. They’re not perfect. You have a light blue stripe, and a dark blue stripe. Not every pigment cell is light blue or dark blue in those stripes. But you would see some segregation that, when, again, thinking from the perspective of zooming out, from a holistic perspective of seeing a photo of the fish or a photo of a zebra, you see stripes. That was his one paper in this area.

In his honor, patterns like what we see on the zebra, what we see on zebra fish or puffer fish, are called Turing patterns. Based off of this hypothesis that - it’s based off of, actually a very small, simple set of rules. The same thing that Turing worked on with respect to computers. This has been validated in zebra fish. That the pigment of zebra fish, the pigment cells form stripes, because of a simple set of rules that are similar to what Turing was talking about.

So, the purpose of that introduction to the book, is to explain what a sample Turing pattern system might be. What are the rules that dictate two types of particles interacting, almost in a predator/prey relationship? Then to be able to zoom out from that, and see whether or not patterns are indeed forming as a result.
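One common way to experiment with two interacting particle types of this kind is a reaction-diffusion simulation on a grid. The sketch below uses the Gray-Scott model, which is an assumption here: the interview does not name the model or its parameters. The diffusion rates, feed and kill rates, grid size, and the helper names `laplacian`, `step`, and `seeded_grid` are all illustrative choices. In this model the `v` particles act a bit like a predator that consumes the `u` particles.

```python
def laplacian(grid, i, j):
    """Five-point Laplacian with wrap-around (periodic) boundaries."""
    n = len(grid)
    return (grid[(i - 1) % n][j] + grid[(i + 1) % n][j]
            + grid[i][(j - 1) % n] + grid[i][(j + 1) % n]
            - 4.0 * grid[i][j])


def step(u, v, du=0.16, dv=0.08, feed=0.035, kill=0.065):
    """Advance the Gray-Scott system by one explicit Euler step (dt = 1)."""
    n = len(u)
    new_u = [[0.0] * n for _ in range(n)]
    new_v = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            reaction = u[i][j] * v[i][j] ** 2  # v consumes u and multiplies
            new_u[i][j] = (u[i][j] + du * laplacian(u, i, j)
                           - reaction + feed * (1.0 - u[i][j]))
            new_v[i][j] = (v[i][j] + dv * laplacian(v, i, j)
                           + reaction - (feed + kill) * v[i][j])
    return new_u, new_v


def seeded_grid(n=32, half_width=3):
    """Grid full of u, with a small square patch of v in the middle."""
    u = [[1.0] * n for _ in range(n)]
    v = [[0.0] * n for _ in range(n)]
    mid = n // 2
    for i in range(mid - half_width, mid + half_width):
        for j in range(mid - half_width, mid + half_width):
            u[i][j], v[i][j] = 0.5, 0.25
    return u, v


if __name__ == "__main__":
    u, v = seeded_grid()
    for _ in range(200):
        u, v = step(u, v)
    # Zoom out: print a coarse picture of where v is concentrated.
    for row in v:
        print("".join("#" if x > 0.1 else "." for x in row))
```

Whether spots, stripes, or nothing at all appears depends strongly on the feed and kill rates and on how long the simulation runs, so the values here are a starting point for experimenting and not a recipe.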

Len: You captured a really great and fascinating ambiguity there, in biology. But when you talk about the reasons zebras have stripes - because from this, from the particle-based modelling explanation, the reason is, “Because this is how these particles operate, according to these rules and their various states, under certain conditions. That’s the reason that they have stripes.”

But the evolutionary reason that zebras have stripes - we don’t really have a complete answer to that yet, right? I think you say in your book that people are pretty sure it’s because it helps keep flies off the zebras. But it captures that difference of - the totally different type of analysis that you do, given the scale at which you’re looking at the phenomenon.

Phillip: But then apparently other researchers said, “Well you don’t need stripes to keep the flies off.” As long as it were some type of black and white pattern, you could use a checkerboard. Why did they evolve the stripes, versus some other way of organizing the skin in alternating patterns of light and dark? It’s not clear.

Len: I just love that example. Because it goes like - even with all the like amazing stuff we can do nowadays, not just with the specific machines, but with the concepts - like the relatively recent in human history, the concept of a Turing machine. I mean, the things we can do at all these different scales. But there’s still these really big problems to solve and answers to find, which is what makes it such a fascinating field.

So, the book came out of a larger project called Biological Modeling if I understand it correctly? Which is a free course. Of course, everything that I’ve been talking about, there’ll be links to in the transcript for this interview. But I was wondering if you could just talk about that from the - bracketing the science of it, just from the project level. You’ve set up a number of successful projects, and worked on with other people, like Rosalind. The Coursera course, and things like that. But you started this, I believe, with a Kickstarter campaign?

Phillip: Well no, not quite, so - the project itself, the course is funded by a National Institutes of Health grant.

Len: Okay.

Phillip: We’ve worked to make a completely free and open online course for anyone to take a look at and learn from. I’ve worked on that with several students, undergraduate and Master’s students at Carnegie Mellon, in Computational Biology, who helped me build out parts of the course.

Then, the idea was, “Well, we’ve built this pretty gargantuan thing.” The course is a collection of text modules. The idea of the course is that every so often, you’ll hit a point where we need to build the model and then look at it. So, every time that happens, we have a link to a tutorial that shows you how to use modeling software to produce the model.

Then we’d come back to the main text and analyze it. The main text itself was like 100,000 words. Then the tutorials, I think were actually longer, if you add them all up. I realized we had quite a lot there.

So, I funded a textbook project coming from the course, packaging the main text of the course into a way that would be engaging in a PDF form or a printed form. We funded that by Kickstarter. Then, Indiegogo, starting in December. We’re finalizing publication of it currently.

Len: Just for those listening who are sort of, who are interested in setting up projects like this, maybe not everybody’s going to get National Institutes of Health grants to begin their projects, but there are people who are interested in Indiegogo and Kickstarter. I was very curious actually about, what happened that you used both Kickstarter and Indiegogo?

Phillip: This was just doing some due diligence on best practices for campaigns. Kickstarter has a timeline. What I would suggest to anybody, is, find these resources, and follow what they say. Because I think that they’re backed in research and results, and they make sense.

So, for example, don’t have a campaign that lasts six years. You might think you’ve maximized your revenue that way, or whatever, but it doesn’t - it’s nice to have very clearly-defined projects, and relatively short timelines in terms of funding and that type of thing.

Because Kickstarter’s timeframe ends after a set period of time, where everything is all or nothing, in terms of the funding - after that period of time, if you’re still in the process of making something, you can have the exact same project on Indiegogo. There’s a relationship between the two platforms. They will simply port over all of your information about the project, including how much you raised, and they cross-link. So, it’s essentially one project on two different platforms.

Len: Okay. I thought I saw that they actually had - maybe this happened years ago? But that they actually had established a connection with each other, rather than acting as total competitors, which I had naively viewed them as. That’s really fascinating, as you say, that they’re using their experience and their data theory, to help people actually genuinely complete their projects and set them up properly. That Kickstarter might be good for one part, then Indiegogo might be better for another part. That’s really, really fascinating.

Specifically when it came to writing the book - so, you’ve used our upload feature on Leanpub, to get it onto the site and onto the platform. It’s a very well-made book, and it has all sorts of cool features, like these laptop icons in the margins that people can click on, that take them to particular parts of the course or other places online. I was wondering if, for any authors listening, or would-be authors listening, what tool did you use, or tools did you use to create the ebook?

Phillip: Oh, so I used - because I’m a mathematician, I use LaTeX, which is typesetting software, that is very, very much used by mathematicians, often with all the defaults on. You can tell a document that a mathematician’s produced, because it’s in the same computer fonts. But LaTeX has a wonderful system, and it has a ton of packages and add-ons, and so on, where you can do quite a lot of things. Especially if, like me, you’re not an artsy person. I would have a hard time going from the HTML of our course, and converting that into Adobe Illustrator, or something like that.

But I’m technical, so I can handle the code-y aspects of LaTeX. Then there’s a package called, “Memoir,” that someone produced as an add on for LaTeX, with like a 650-page PDF documenting everything that you can do with it. It’s really nice for longer documents and customizable fonts, and things like that.

That’s something that I used for this book. It means that it’s relatively powerful, right? If you want to have calls to the margin, or hyperlinks to our website, that are these laptop icons, it really winds up just being one command that you write all the code for behind the scenes, and that type of thing.

Len: So that means you were writing in plain text files, and things like that?

Phillip: Yes. It’s essentially a .txt.

Len: Thanks very much for sharing that. It’s always fascinating. One of the interesting things about the fact that Leanpub attracts so many people who are technically minded, is that they often have very sophisticated, and somewhat bespoke, methods for making their own books, if they’re not using one of our own book writing processes. It’s always so interesting to hear about them.

The last question that I always like to ask on the podcast, if the guest is using Leanpub as a platform, is - if there was one magical feature that we could build for you, or if there was one terribly awful thing that had you shouting at the screen every time you went to our website, that we could fix for you, can you think of anything that you would ask us to do?

Phillip: I don’t know if I would. That’s a good question. I wasn’t expecting it. But I’d probably need more time to think.

Len: Okay, well, no problem. You’re probably not going to be surprised that about half the time, that’s what people say. But please get in touch anytime if you think of anything. One of the reasons, insofar as Leanpub is the platform that it is, is because we’ve asked this question of many authors over the years, getting their feedback about, “This would be great,” or, “That would be great.” Or, “This is awful.” Or, “You don’t need that,” and things like that.

But that being said, well Phillip, thank you very much for taking the time out of your day to do this interview.

I’m sure people listening can tell, there were so many things that we could’ve talked about for so much longer. But if you’re interested in learning more about all this, Phillip’s been a part of, and produced himself, so much really interesting content that’s available, much of it for free, that you can find, if you listen to what we were talking about here. Or if you go to the links in the transcript that we’ll publish on the website, where you can find the sources of all the things that we talked about. Thank you very much, Phillip, for being on the Frontmatter Podcast.

Phillip: Thank you Len, it’s great to be here.

Len: Thanks.

And as always, thanks to all of you for listening to this episode of the Frontmatter podcast. If you like what you heard, please rate and review it wherever you found it, and if you’d like to be a Leanpub author, please visit our website at leanpub.com.

Podcast info & credits
  • Published on July 25th, 2022
  • Interview by Len Epp on June 7th, 2022