An interview with Derrick Mwiti
  • March 26th, 2019

Derrick Mwiti, Author of Introductory Tutorials For Machine Learning: Kickstart your Career in Machine Learning

23 MIN
In this Episode

Derrick Mwiti is the author of the Leanpub book Introductory Tutorials For Machine Learning: Kickstart your Career in Machine Learning. In this interview, Leanpub co-founder Len Epp talks with Derrick about his background, the startup scene in Nairobi and elsewhere in Kenya, whether you need a Computer Science degree to start a career in machine learning, how he got into blogging and writing, machine learning in general and how it can be used to do things like fight everything from spam to insurance fraud and even help with writing software, his book, and at the end, they talk a little bit about his experience as a self-published author.

This interview was recorded on February 19, 2019.

The full audio for the interview is here. You can subscribe to the Frontmatter podcast in iTunes or add the podcast URL directly.

This interview has been edited for conciseness and clarity.

Transcript

Introductory Tutorials For Machine Learning: Kickstart your Career in Machine Learning by Derrick Mwiti

Len: Hi, I'm Len Epp from Leanpub, and in this Leanpub Frontmatter podcast, I'll be interviewing Derrick Mwiti.

Based in Nairobi, Derrick is a data scientist, mentor, and trainer, with particular expertise in areas such as data visualization and machine learning. He is also an avid writer who contributes to a number of data science publications, including Datacamp, Towards Data Science, KDnuggets, and Heartbeat, by the Boston-based startup, Fritz.

Derrick is the author of the Leanpub book Introductory Tutorials For Machine Learning: Kickstart your Career in Machine Learning. In the book, Derrick provides tutorials with practical examples for software developers on the important topic of how machine learning is actually changing the way we write software and other things that we do.

In this interview, we're going to talk about Derrick's background and career, professional interests, his book, and at the end we'll talk a little bit about his experience using Leanpub to self-publish.

So, thank you Derrick for being on the Frontmatter podcast.

Derrick: Thank you Len for having me.

Len: I usually like to start these interviews by asking people for their origin story. I was wondering if you could tell us a little bit about where you grew up, and how you first became interested in computer science and technology generally?

Derrick: I grew up in Eastern Province in Kenya. After finishing high school, I became a peer teacher in one of the local high schools, where I taught mathematics. That's where the interest to pursue a Bachelor of Science in Mathematics and Computer Science came from.

After that, I went to university, and I trained some of the locals, their communities - such as we have in Nairobi. That's how I got immersed into their technology scene.

Len: I can see on LinkedIn that you participated in a program called the Lapid Leaders Experience. Can you talk a little bit about that experience, and the purpose of the program?

Derrick: So Lapid Leaders Africa is an organization that is based in Nairobi that exists to unlock the potential of young people in Africa. The purpose of this organization is to help young people to build their self-awareness, sharpen their abilities and skills, and also develop the exceptional leaders that Africa needs.

After completing their program, I volunteered to lead recruiting in the organization, an experience that helped me to work on my leadership skills, as well as improve my level of self-awareness.

Len: How did you go about recruiting people?

Derrick: When I was in charge of recruiting, I was still on campus. So we came up with strategies, such as doing events on campuses. We also came up with something we used to do called "Lapid Sunday," where we'd meet young people after service in their churches. Also, we introduced something called "Lapid Coffee," where I would go to various campuses, and during break time, I'd buy coffee for the students, as we talked about leadership in the continent.

Len: That's a really good idea, buying coffee for students. You were also part of the Entrepreneur in Training program at the Meltwater Entrepreneurial School of Technology. Can you talk a little bit about that Entrepreneur in Training program?

Derrick: MEST offers graduate-level training for students in Africa - for those who have already completed university. It takes around 60 people across the continent. It's quite a difficult program to get into.

Through their one-year scholarship, you learn about business, communication, and software development. After the program you pitch, and if they like your idea, they'll put money into it.

Len: One of the fun things about this podcast, is that I get to interview authors from all over the world. One thing I always like to ask about is what the startup scene is like where they're from. Can you talk a little bit about what the startup scene is like in Nairobi? Is the tech sector a priority for the government there, like it is in some other countries?

Derrick: That's a very interesting question, Len. The tech sector is a very big priority for the government in Kenya. The technology cabinet secretary, Joe Mucheru, was previously the head of Google in Kenya. There is also the AI and blockchain task force, which has come up with strategies on how the government can implement blockchain and artificial intelligence in government operations. There are also various funds that support local startups. And Nairobi's also a very good place for a startup, because we have over 27 tech hubs picking up acquisitions from Egypt and South Africa. This industry is also very mature, because of the high levels of internet and mobile penetration.

Unlike other nations, it's very easy to set up a business in Kenya. You can establish a business by just going online and registering it. However, the market is very competitive and it's not very forgiving, like other places on the continent. In Nairobi, consumers expect you to give them ready products. The best product wins. It's a winner-takes-all market.

Len: I wasn't planning on asking you this question, but in preparing for this interview, I just did a little bit of checking out the news from Kenya, and I read about the deployment of a nationwide digital identification program. Is that related to the government's efforts in blockchain and things like that - if I read the story correctly, and I'm right about that?

Derrick: So it's called the Huduma number. It's a program by the government to register all the people in the republic. And also to get their fingerprints, take all their biometric information. It's going to help the government in giving better services to the citizens of the republic.

Len: I believe something similar has been done on a massive scale in India in the last few years as well. That was really interesting to hear.

One question I usually like to ask people who've studied computer science is, if you were starting out a career in tech now, would you still go to university? In your case, you graduated with a degree in math and computer science just a couple of years ago, so I think you're probably the most recent computer science graduate I've talked to about this.

So I wanted to ask you a slightly different question. Do you think it's necessary to study at the university level to start a career specifically in machine learning? A lot of people argue it's not required to start a career as a software developer generally, to get a computer science degree.

Derrick: While I agree with some of these sentiments, I must say that a degree is very important. I specialized in statistics. I must say that some of the concepts that we learn are very complex and require teaching by experienced professors.

That said, it's still possible to pick up some of these concepts from the learning resources that are available online. The challenge we are having is that the education system in most parts of the world is not adapting as fast as the technology is changing.

So yes, it's very possible to become a software developer without a degree. In fact, very many software developers are self-taught. Some have switched from their first degree to learn software development on their own. Due to these possibilities, the first degree does not seem to be a prerequisite for a career in machine learning or software development in general.

Len: And not everybody who gets into technology also gets into blogging and writing. How did you get into blogging and writing yourself?

Derrick: When I started learning, I started sharing what I learned on Towards Data Science. In fact, in the very beginning, they rejected some of my articles. But after some time, I got the hang of it. After doing this for a while, I was contacted by Austin Kodra, who is the community monitor at Fritz. He asked me to write for the blog, and the rest is history.

Len: This is a writer's question - how did you take the rejections when you first started getting them? Every writer has stories about how they deal with that.

Derrick: So the good thing is that they give you actionable feedback, so you can go back, look at where the problem was, and rectify it. And then you write again, get some more feedback - until someone accepts.

Len: You wrote in a recent blog post about the ways that AI may have an impact on the future of education. We've already sort of touched on this a little bit, but can you share some of your thoughts on that topic specifically? About how AI will impact education?

Derrick: I think that artificial intelligence will have a very big role to play in improving education in the future. Some nations are already using artificial intelligence in their education. For example, as the teacher/student ratio decreases, AI tutors will play a very big role in helping teachers deliver content, and also with things such as grading and predicting the best career for students. We're also seeing things such as smart schools with [?] technology. For example, the classroom is able to monitor class attendance, and even cheating. So ultimately, I think that AI will make learning easier and probably more fun.

Len: There was an article just today that I came across in Wired, by the philosopher Daniel Dennett, about some of the dangers that AI poses for us potentially in the future, and how one of the ways we might end up relating to AI is that it will give us solutions that we can test and are proven to work, but that we just can't understand where the answer came from, and that we might relate to AI as oracles in that sense. Do you have any concerns about how AI might impact perhaps our knowledge and learning, or just humanity generally?

Derrick: I think the biggest concern that has been around is the question of, will AI replace, for example, workers, teachers. But I think what [I think is going to happen] is that artificial intelligence will make work easier for all these workers, for the teachers. So by blending these two, how we work, and artificial intelligence - we will make work easier for ourselves.

Len: It's maybe not the safest thing for me to say, but one of the things is that often - and I've had some experience with this myself - not everybody has exposure to the best teachers. I would've loved to have had a lot of the tools around when I was younger and going to school, that are available to people today. Do you think that AI will help people learn on their own as well?

Derrick: I think what AI will do is enable on-demand learning. With AI tutors, for example, you don't really need teachers in class in order to learn. You can just learn in your own time, by using these AI tutors.

Len: And presumably it will help with something that's becoming, I think, a growing concern for people in the education space, which is lifelong learning.

Derrick: Yes.

Len: The pace - it changes so fast, that learning is going to be something that you don't conceive of as ending when you join the workforce.

Derrick: Sure, sure.

Len: Moving onto the next part of the interview, for those who might not know - I'm sure everyone's heard the term "machine learning," but not everyone knows what it is. Can you talk a little bit about what it is?

Derrick: In very simple terms, we can say that machine learning is a technology that enables computers to learn from data and discover patterns, and in that way automate complex tasks. So for example, Google applies this in spam protection - where if people are sending you spam messages, the Google machine learning algorithms can learn over time what spam messages look like, and just send them directly to your spam folder.

Len: I think actually Google also uses machine learning for its CAPTCHA system, that checks to see if we're a human or not?

Derrick: Oh yeah, for security.

Len: This is a bit random. But just to get a bit of the depth of the things that you write about in machine learning generally - I found an interesting post you've written on self-organizing maps. Could you talk a little bit about what SOMs are, and what competitive learning is?

Derrick: That's a very interesting question. Self-organizing maps are a class of unsupervised learning in neural networks that are used in feature detection. They're mainly used in data compression. In supervised learning, we use gradient descent and backpropagation in the training process.

However, in unsupervised learning, the data is not labeled. So the learning process is what is known as competitive learning: the nodes of the neural network compete for the right to respond to each input. Due to this competition, the neurons are forced to organize themselves, thereby forming a self-organizing map. The network is therefore able to distinguish various features, based on their similarities.

Len: And how can this be used to fight fraud?

Derrick: So what happens is that self-organizing maps can group features that look similar together. For example, with insurance claims, a map can group the claims that look fraudulent together, so you can see that the insurance claims in this cluster are fraudulent claims.
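As a rough illustration of the competitive learning described above, here is a minimal sketch of a tiny one-dimensional self-organizing map in plain Python. It is not taken from Derrick's post or book. The function names (`train_som`, `best_matching_unit`), the learning-rate schedule, and the sample points are all assumptions made for the example; a real fraud-detection map would have many more nodes and real claim features.

```python
import random


def best_matching_unit(weights, x):
    """Return the index of the node whose weights are closest to input x."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def train_som(data, n_nodes=2, epochs=50, lr=0.5, seed=0):
    """Train a 1-D self-organizing map with competitive learning.

    All hyperparameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Every node starts with random weights in [0, 1).
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius both shrink over time.
        frac = 1 - epoch / epochs
        rate = lr * frac
        radius = max(1, n_nodes // 2) * frac
        for x in data:
            # The nodes compete: the closest one wins the right to respond.
            winner = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # The winner and its map neighbours move toward the input.
                if abs(j - winner) <= radius:
                    for k in range(dim):
                        w[k] += rate * (x[k] - w[k])
    return weights


# Hypothetical, unlabeled 2-D points that form two loose groups.
points = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
          [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]]
som = train_som(points)
clusters = [best_matching_unit(som, p) for p in points]
```

After training, `clusters` holds the index of the node each point maps to, so similar points end up sharing a node without any labels being supplied. Deciding that a given node corresponds to "fraudulent" claims would still require a person to inspect the records grouped there.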

Len: That's really fascinating. I've had to do a little bit of anti-fraud work myself. When you work for a startup, you wear many different hats.

Derrick: Yeah.

Len: And looking for those patterns is always the way you solve the problem going forward when it starts hitting you.

Derrick: Yes.

Len: One thing I confess to having been ignorant of - but it seemed suddenly obvious when I read a post that you wrote about it - machine learning models do not work directly with text. They work with numerical data. I was wondering if you could talk a little bit about how machine learning can handle tasks involving the analysis of words? I know it's a big question, but you have written about it.

Derrick: When working with text data, you have to find a way to convert text data into numbers. There are various techniques out there for doing so. One such technique is creating a "bag of words" model.

Let's assume that we are working on movie reviews data. Say we want to see whether a movie review was, let's say, positive or negative. We'd create a bag of words by taking all the words in the reviews, and creating a column for each word. We would then check each review, and if the word is found in that specific review, it would be represented by a one. And if the word does not exist, it would be represented by a zero.
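The bag of words idea described above can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the function name `bag_of_words` and the three sample reviews are made up for the example, and words are split on whitespace with no handling of punctuation.

```python
def bag_of_words(reviews):
    """Build a binary bag-of-words table: one column per word, one row per review."""
    tokenized = [set(review.lower().split()) for review in reviews]
    # The vocabulary is every distinct word across all reviews, in sorted order.
    vocabulary = sorted(set().union(*tokenized))
    # 1 if the word occurs in the review, 0 otherwise.
    rows = [[1 if word in tokens else 0 for word in vocabulary]
            for tokens in tokenized]
    return vocabulary, rows


# Hypothetical movie reviews.
reviews = ["great movie", "terrible movie", "great acting great plot"]
vocabulary, rows = bag_of_words(reviews)
# vocabulary -> ['acting', 'great', 'movie', 'plot', 'terrible']
# rows[0]    -> [0, 1, 1, 0, 0]
```

Each row is now a list of numbers that a machine learning model can consume, which is the point of the conversion.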

In the process, you might find that there are words that appear many times, but don't help us in solving the task at hand. We deal with this by dividing the number of times a word appears in the reviews by the total number of words in all the reviews, to obtain what we call term frequencies. Since a word that appears more often will have a higher frequency, we reduce its weight using a technique called "Term Frequency times Inverse Document Frequency".

There are other, more complex techniques for solving text-based problems, but this is just one of them.
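To make the weighting concrete, here is a minimal sketch of one common textbook formulation of TF-IDF, written in plain Python. Note the assumptions: term frequency is computed per review (word count divided by the length of that review), which differs slightly from the corpus-wide count described above, and the inverse document frequency is the unsmoothed log(N / df). Libraries often use smoothed variants, so their numbers will differ. The function name `tf_idf` and the sample reviews are invented for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: score} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            # Term frequency times inverse document frequency.
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Hypothetical movie reviews (empty reviews are not handled in this sketch).
reviews = ["the movie was great",
           "the movie was terrible",
           "the plot was great fun"]
scores = tf_idf(reviews)
```

In this toy data, "the" and "was" occur in every review, so their inverse document frequency is log(1) = 0 and they contribute nothing, while a rarer word such as "terrible" receives the highest weight in its review.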

Len: It's just such a fascinating area. And with automated translation services - this is something I've been watching with fascination, like so many other people have in the last few years - just how good these things are getting, and that they're just going to get better over time - it's really interesting to think about where this kind of technology will be in 20 years, and the types of analysis that people will be able to do on our use of language.

Derrick: All this is going to get better, because of the availability of much more training data.

Len: On that note actually, that leads me to my next question. What is the relationship between data science and machine learning?

Derrick: That's also a very interesting one. Before diving into developing machine learning models, we first have to explore the data and understand it. Also, we need to understand the problem in depth. The data science process is important, because it ensures that we pass the right data to the machine learning models. For example, we have to first clean the data, deal with null values, and extract new features that would help the machine learning models perform better. As we all know, if you pass in garbage data, you get garbage results.
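The cleaning steps mentioned above, dealing with null values and extracting new features, might look something like the following sketch in plain Python. Everything specific here is an assumption for illustration: the helper names `clean_records` and `add_features`, the choice to fill gaps with the column median, and the made-up housing records. In practice the right strategy depends on the data and the problem.

```python
from statistics import median


def clean_records(records, numeric_fields):
    """Return copies of the records with missing numeric values filled by the median."""
    cleaned = [dict(r) for r in records]  # leave the original records untouched
    for field in numeric_fields:
        known = [r[field] for r in cleaned if r.get(field) is not None]
        fill = median(known)  # raises StatisticsError if every value is missing
        for r in cleaned:
            if r.get(field) is None:
                r[field] = fill
    return cleaned


def add_features(records):
    """Derive a hypothetical new feature: price per square metre."""
    return [dict(r, price_per_m2=r["price"] / r["area_m2"]) for r in records]


# Hypothetical raw data with one missing value.
houses = [
    {"price": 100000, "area_m2": 50},
    {"price": 180000, "area_m2": None},
    {"price": 240000, "area_m2": 100},
]
prepared = add_features(clean_records(houses, ["price", "area_m2"]))
```

Here the missing area is replaced by the median of the known areas before the derived feature is computed, so the model never sees a null value.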

Len: Garbage in, garbage out.

One of the themes of this podcast is that, as Marc Andreessen said years ago, "Software is eating the world." I mean everything, everything we do has software behind it one way or another, it seems. And so how software is written and conceived and deployed and maintained is actually something that's as basic as the questions we ask about our physical architecture and our sewage and things like that.

I wanted to ask you - moving onto the subject of your book specifically. How has machine learning already changed how people write software?

Derrick: Machine learning has actually made work easier for developers. For example, according to Jeff Dean, 500 lines of TensorFlow code have already replaced 500,000 lines of code in Google Translate. Machine learning is also helping in getting faster responses to problems such as running out of memory. For example, if your server is having any issues, there are machine learning models that can detect these problems, so you can deal with them faster.

Len: And how do you think machine learning will change how software is written in say the next 10 years, if you have any thoughts on that in the future?

Derrick: That's a very interesting one. Something that has actually been trending is the ability of machine learning to write software. It'd be very interesting to see how this plays out in the next couple of years.

The big question has been - will software be able to write itself? I think this will be something that we can look forward to happening in the near future.

Len: And, if software is writing itself - I mean, presumably there will still be people involved along the way - are programmers going to have to become more sophisticated in their understanding of things like machine learning? Or will it become like a version of Stack Overflow, where now we're just kind of cutting and pasting from answers provided to us by other sources?

Derrick: I think the software developers will now focus much more on learning machine learning - how to tune the parameters, how to get the right training data, and to clean it such that when the training happens, they get the very best results.

Len: What was the inspiration for you to write this book?

Derrick: I have had these tutorials for a very long time. But then, I was given the idea to put them into a PDF by Austin Kodra, who is one of my editors. So the idea to put together a book was never mine.

Len: In researching for this interview, I saw that you previously had a book published on Packt. Is that correct?

Derrick: That's something that is still in process. I think the book will be released in May. We're still writing the book.

Len: Congratulations on that project. I'm looking forward to seeing what comes out of it.

And so for this book, you chose to publish it on Leanpub. I was wondering if you could talk a little bit about why you chose us as your publishing platform for this project?

Derrick: When the book was ready, I consulted one of our teaching fellows at Meltwater Entrepreneurial School of Technology. His name is Andrew Berkowitz. He's the one who sent me the link to Leanpub. And once I had the link, everything was straightforward. The reason I chose Leanpub is mostly because it was very easy for me to just upload my PDF, EPUB, and MOBI files. And that's it. There aren't many complications involved.

Len: That was my next question, actually. You used our "Bring Your Own Book" feature, which lets you upload PDF, EPUB, and MOBI files that you make yourself, and put them up for sale on our bookstore. I wanted to know - I mean, all the people listening who are thinking of being writers or who are writers themselves would probably like to know - what tools you use to create your book.

Derrick: I prepared the book on Google Docs, and I downloaded the PDF and EPUB versions from there. But then I had to look for some tools online to convert it to a MOBI file.

Len: Often Leanpub authors are interested in engaging directly with people who've bought their book. Is that something that's important to you? Are you getting feedback from people that is improving your book?

Derrick: That's a very important thing, but I haven't really spoken to any of the readers. I think it's important to get valuable feedback, that I would use in writing, for example, my next book. I don't know how this would work, but probably a forum for each book would help.

Len: The last question I always like to ask people on this podcast is - if there was one feature we could build for you, or one problem we have that you noticed that you could ask us to fix, what would you ask us to do?

Derrick: Code formatting. I need a beautiful way to format code that takes into consideration rules of various languages, such as indentation in Python. For example, when someone is reading your book, you want them to be able to copy and paste the code, and it works. So if there's a way you paste the code and it's properly formatted, that would be a nice one.

Len: We do use Pygments. So if you write a book in Leanpub, for every block of code, you can set the language, and it does the syntax highlighting. In most cases, you should be able to copy and paste it and use it. But I'm not a programmer myself, so I won't claim it's beautiful.

Derrick: I think that I haven't seen that, because I used the "Bring Your Own Book" feature. Maybe the next one, I should write it online.

Len: Well, you're very welcome, please do. And any feedback you have, if you look at it and you think, "That's ugly," we would love to hear from you.

Derrick: Sure.

Len: Well thank you, Derrick, very much for taking the time out from what I assume is a beautiful morning - it's evening time here - for doing this interview. I really appreciate it and learned a lot. And thank you very much for being a Leanpub author.

Derrick: Thank you so much for having me.

Len: Thanks very much.

And, as always, thanks to all of you for listening to this episode of the Frontmatter Podcast. If you like what you heard, please subscribe and like and review in iTunes. And if you'd like to try being a Leanpub author yourself, please go to Leanpub.com and click on "Why Leanpub?" Thanks.

Podcast info & credits
  • Published on March 26th, 2019
  • Interview by Len Epp on February 19th, 2019
  • Transcribed by Alys McDonough