Shefali Nayak, Author of What Just Happened: Descriptive Statistics: An Explorer's Guide to Data
A Leanpub Frontmatter Podcast Interview with Shefali Nayak, Author of What Just Happened: Descriptive Statistics: An Explorer's Guide to Data
Shefali Nayak is the author of the Leanpub book What Just Happened: Descriptive Statistics: An Explorer's Guide to Data. In this interview, Leanpub co-founder Len Epp talks with Shefali about her background in data science, moving to the United States just before the pandemic started, mindfulness and Sahaja Yoga meditation, her books, and at the end, they talk a little bit about her experience as a self-published author.
This interview was recorded on May 4, 2021.
The full audio for the interview is here: https://s3.amazonaws.com/leanpub_podcasts/FM180-Shefali-Nayak-2021-05-04.mp3. You can subscribe to the Frontmatter podcast in iTunes here https://itunes.apple.com/ca/podcast/leanpub-podcast/id517117137 or add the podcast URL directly here: https://itunes.apple.com/ca/podcast/leanpub-podcast/id517117137.
This interview has been edited for conciseness and clarity.
Transcript
Len: Hi I'm Len Epp from Leanpub, and in this episode of the Frontmatter podcast I'll be interviewing Shefali Nayak.
Based in Wisconsin, Shefali is a data science professional who specializes in statistical and machine learning, as well as strategy development and risk management.
You can follow her on Instagram @keep_on_learning_ and check out her profile on LinkedIn.
Shefali is the author of three books for sale on Leanpub, What Just Happened: Descriptive Statistics: An Explorer's Guide to Data, Sampling Techniques: A Comprehensive Overview, and Big Data Analytics: Data Scientist's Viewpoint.
In this interview, we're going to talk about Shefali's background and career, professional interests, her books, and at the end we'll talk about her experience as a self-published author.
So, thank you Shefali for being on the Leanpub Frontmatter Podcast.
Shefali: Thanks Len, thanks for having me. I am super excited to be on this podcast.
Len: And we're very glad to have you.
Shefali: Thank you.
Len: I always like to start these interviews by asking people for their own origin story. So, I was wondering if you could talk a little bit about where you grew up, and how you found your way into a career involving data science and management consulting, and so many other interesting things?
Shefali: Yes. I'm happy to share. So I grew up in India, and education-wise, I hold a Bachelor's and a Master's in Statistics. I love numbers, that's how I got into that field.
After that, I did have about a 12-year career - and still going on - in the data science space. I have done a fair amount of projects involving around credit and fraud risk. Those are kind of super intense when it comes to data science, when it comes to machine learning algorithms. It's definitely the up-and-coming field in the field of data science.
Currently, I have been in the field, or the domain, of banking and financial industries. So that's a bit about me.
On the professional or the working front, I also have written three books in the field of data science - like you just mentioned, Descriptive Statistics, Sampling Techniques, and Big Data.
Len: And around what time did you move to the United States?
Shefali: So, I just recently moved to the United States, and it has been around the pandemic year. I came to the United States just when the pandemic hit - so I came about a year ago, into the United States.
Len: Oh, that must've been an especially interesting experience.
Shefali: Yeah. So it has been. I would say everyone has put their fair share of - it has been a fair share of difficulty for everyone, and everybody's kind of putting their emotional support, or they have been trying to get connected with their loved ones virtually, and trying to make this happen. I say it was definitely a huge problem, because we were having a lot of our friends, acquaintances, our family members who were in a situation that required to be hospitalized, and those kind of things.
But I think the pro of it, is that all of us have been there for each other, and have been trying to use the technology that is available to us, to our fullest resources. So, I think that has been a blessing, that we have that as an option available. And with Zoom and all of this coming up, I think that kind of helped us to keep connected across the globe.
Len: It must've been particularly strange to move to a new country, and then have the door closed behind you.
Shefali: Yes, absolutely. I mean, it was a little bit - I would not say it is - it has not been stressful. It has definitely been a little bit of stress when you're just coming to a new country, and the doors close, right? You just want to know when is it going to open. You always want to have those options.
Len: I don't know what it is about people who show up as authors on Leanpub, but many of them are people who've moved around from one country to another. I myself have had the experience. So, I always like to ask when I find out that someone's done that - what were things like, what was the biggest change for you? Or what kind of was the most surprising thing that you encountered, I guess, in Wisconsin?
Shefali: Well, I would say the best part about Wisconsin is that you get to experience all the four seasons. I have never experienced a winter, and I have never experienced the snow. So this was a great experience for me. And at the same time, I did get to see all the four seasons in its full glory. I would say that definitely is a plus. And I love traveling and I have, like some of the self-published authors, traveled around quite a bit. So that's definitely hitting my to-do list, saying that I did travel to this part of the country.
Len: That's really great the way you describe the impact of sort of experiencing all four seasons. I grew up in a place called Saskatchewan in Canada, where, in the south - in the summer, it's basically a semi-desert - and in the winter it goes down to like minus 40. So pretty dramatic. I assume a lot like Wisconsin.
Shefali: Yes.
Len: And now I live next to the sea, and so it's very temperate all year. There's like basically a 15 degree variation from high to low kind of.
Shefali: Yes, exactly.
Len: Exaggerating a bit. But you do miss it when you're used to it. And it's interesting to hear about someone being excited to experience it for the first time.
Shefali: Yes, absolutely. I echo that part, and I echo that feeling. Because I come from Mumbai originally, and that is definitely like you mentioned, like a beach or a coastal area. So it's very humid, it's very temperate through the year. It's pretty much like a flat season. You do have showers, and you do have a little bit of winter, but this is definitely not the winter that we are talking about. So it's definitely extremes.
Len: One thing I wanted to ask you about, is - we'll be talking about your books and your data science experience and things like that in a bit, but I know that you do work on wellness and meditation and mindfulness and things like that. I was just wondering if you could talk a little bit about that aspect of what you do with your time?
Shefali: I did realize that meditation is definitely something that's keeping me grounded, and that's something that I have been practicing for nearly three decades right now. What I do take for granted in my state of mind, is that - I am focused on the present. It's something that does require a lot of working through the years. So it's just ten minutes every day, but it does make a lot of difference over a period of time. And things that I take for granted, like my state of mind right now - where I'm completely focused on the present moment - that's something that does not come very naturally to a lot of us, and at times we do kind of fall over into the past or into the future.
So we kind of have that feeling of dwelling in the past, and saying that we could have done something better. Or being really anxious about what the future holds. I think the pandemic is a situation, where my education definitely helps. Because everybody's wondering what could have been done, and how is it that it's going to look in the future? So this is something that I have been practicing, and at the same time I have been taking some classes, where I kind of help some people, and it's kind of a volunteer activity that I do.
Len: That's really interesting. I am definitely one of those people who obsesses about some mistake I might have made 20 years ago. All of a sudden, that's in my head, and I'm completely captured by it. And then the next minute I'm totally concerned about something that might happen 20 years from now.
Shefali: Yes, absolutely.
Len: And whether I'm making the right decisions about that in the moment. So, what is the form of meditation that you practice? I was wondering if you could talk a little bit about that and how does it help?
Shefali: So I do practice the Sahaja Yoga meditation. It has been practiced in 100-plus countries. I think it would be 120-plus maybe, right now. It was founded by Her Holiness Shri Mataji Nirmala Devi in the year 1970, and is something that has been practiced all across the globe. It's a very scientific way of doing the meditation, wherein you take a couple of affirmations, you try to balance the left and the right or the past and the future. You try to come into this moment where you do not have much of thoughts. You're basically centered or focused in the present moment. And that actually helps us to become more productive.
Because at times, we do realize that when we are thinking about the past, we think about the future. We tend to lose the present moment. And we realize after two or three hours that we just didn't do anything for those two or three hours. So this kind of helps keep centered, and it is a very scientific way of doing it. There are sessions all across the globe, all across the time zones. So it's more like a volunteer activity, where we use the water or the fire element and we just kind of - it's a no-cost kind of thing, it's just ten minutes of meditation per day. And that has incredible results, because it helps in both creativity, and being focused.
Len: And when you're talking about focused, I thought I might bring up something very specific, which is - which I definitely suffer from. Which is - when you're in front of a screen all day, like we all are basically now. I mean - unless you're working in the services industry and healthcare and taking care of people and stuff, where it's face-to-face - a typical professional nowadays, even if they were formerly a courtroom lawyer - might be spending most of their time on Zoom, like we are right now. And I particularly have the problem that I think a lot of other people have, of obsessively wanting to check the news, for example. Can meditation help with things like that, by keeping you focused, and keeping your brain from leaping off into all these potential areas all the time?
Shefali: Yes, absolutely. Because I do see that the moment we do turn on the news, we kind of react. We start reacting. And we have opinions. So we have opinions on everything. This kind of helps that we integrate our opinions, our reactions - and if we can do something, if we can help - then we go full - I mean, we go full enthusiastic, and we try to help out the best we can. But we kind of got the - the actions and the opinions that we have, and we just stop discussing and discussing. And then there's no actions. So it's basically playing down the reactions, and taking some actions where we can.
Len: Oh, that's really fascinating. So in addition to trying to control - one feature of it is trying to control your reactions, right? And so if you find it - one thing you might want to do is try to prevent yourself from being uncontrollably stimulated. But if you do find yourself stimulated, at least try and turn that into something -
Shefali: Yeah.
Len: ...that's productive in some way.
Shefali: Yeah. Because Len, I think that a lot of us do tend to just put on the news and see why it is going on, what is happening? And we tend to have these reactions. But then I think it makes more sense to think of all the possible actions that we can take in our surroundings, and all the things that we can do - rather than just have these unnecessary reactions. The way I put it is, if there's a carpet that you see - the best way is to enjoy it. Rather than think about what the price is, where was it made - and all those kind of reactions. But you just enjoy the beauty of what's happening, or you just try to be alert as to what is it that you can contribute to the society.
Len: And if there's something wrong with the carpet, shouting at it won't make a difference.
Shefali: Yeah, exactly.
Len: I just use that analogy, because I shout at the screen all the time.
Shefali: Yeah.
Len: I mean, metaphorically.
Shefali: Yeah, exactly.
Len: Thank you very much for sharing all of that. I'll make sure of course - as always - so regular listeners know, we try to put links to all the kinds of things that people mention in the transcript and stuff like that. [Here are some links: https://wemeditate.com, https://sahajaonline.com, https://www.sahajayoganewyork.org/shri-mataji - Eds.] So, if you find this in text form on our website, you'll be able to look up all the things that we're talking about.
One thing I wanted to ask - just before we move on to talking about your books - you mentioned that you're now doing work in the banking and financial areas, and I was wondering if you could just maybe give us a specific example - you don't have to name a company or anything like that - of the kind of work that someone like you would do in those areas?
Shefali: There's a part of business in banking that says, "intuition." Probably at the start we would - I am not sure about how it started, but I would say that at start, when we want to provide a service to a customer, we probably say, "This is the kind of credit card that you would want for your spends." But right now with analytics, with the data which is at your tips, and the kind of data which is available to us - it is making a lot more sense to kind of aggregate this data, go through the data - and try to understand the customer, know your customer much better, so that when you are selling across a product, it does not come across as something that the bank wants to push or solicit. But also, you're trying to understand the needs of the customer.
That is, what is the kind of credit that you want? And in case there's any fraud that is happening on your credit card - how is it that as a bank - as a financial institution, I can protect my customer? Some of the examples which are there in banking, that I have experienced or I have worked on - would be credit and fraud risk.
On the credit part, it's more around underwriting a customer. That is, how much of a line should be extended, and which would be a good trade off, or a break-even between the risk that the bank is taking on this customer? At the same time, how much is the profit that is going to be generated by this customer? If they want to extend this line, what is the kind of loyalty that the customer is going to shoot to us? And what are possible avenues that we can cross-sell our product? If this customer has a home and an auto loan with us, why not give this customer a credit card?
Because this customer is really loyal to this bank, is spending and also paying back responsibly, then why is it that we cannot step ahead, and proactively touch base with the customer and say, "What is it that you need?" And, "We are going to be here for you."
Also, we do realize that during the customer lifecycle, we have our own needs as a customer. So probably - I've completed my education. I want to start a new job, or I want to start a family. So for everything, it will probably require different lines or different cards and different financial services. I think that part is where I've worked on extensively, which is credit risk. Trying to understand and know your customer.
The second part that I have also worked on is the fraud risk. So for example, if there is fraud on a customer's credit card - then you're not definitely going to be enjoying spending on that card, because you just got fleeced for a couple of hundred dollars. You would want to be protected. You would want your identity to be protected. And how is it that as a bank, we can help you?
At the same time, I would say it requires some amount of sensitivity. Because as a bank, if I find that this is potential fraud, and I decline this transaction - but if it's a genuine customer, it can create a disruption. If I'd had a meal at a fine dining restaurant and somebody declined my card - declined and said, "I thought it was fraud", and I'm not carrying cash, that's going to be super embarrassing, and I'm never going to use that card ever again. So, trying to understand where is it that we can bring that balance, is something that I have been working on.
Len: Thank you very much for sharing all of that. These are the kinds of things that we all touch on from the consumer end in our lives, and rarely get a little bit of insight from the other side. That explanation you gave about fraud protection, and what it's like to have something declined, leads me to a question. It's basically - this is all just completely anecdotal on my part. But that thing used to happen to me more regularly than it does now, and I've always had this sense, but I've never been able to speak to anybody about it directly, that, basically, particularly credit card fraud prevention as a craft - has advanced dramatically in the last, let's say, ten years or so. I'm basing this on just my day-to-day observations. At Leanpub, we run an online marketplace - and I can see how our transaction provider has been improving in things like that as well. Is that true that things have dramatically improved in the last few years?
Shefali: I would definitely say, "Yes." It has dramatically improved. Because there are a lot of fraud rules that go behind the scenes. It's just a nanosecond that somebody's swiping a card at point of sale at a large retailer, or going online and making that transaction. And I would say, a sum of five seconds is all the time that you require. And you say, "This card is approved and this transaction is approved." But there are, I would say, roughly 1,000-plus rules that are going there - depending plus/minus, depending upon the bank that you're dealing with, and the algorithm that they have put in place. You do want to see something which is in pattern for the customer.
So for example, I am living in Wisconsin, and suddenly my card starts swiping in New York. So the bank takes it upon itself to reach out to the customer and say, "Is this truly you or not?" But at the same time, they cannot make it like it's just going to be the zip code or something. So they would put something in, like - if it is me, I always shop at this retailer, I go there, and also I'm using the same card enough - ticket size, or the same spends. Then, that makes sense.
But of course, there are natural calamities. Like, I would say in a situation that I have come across - for example, if there are hurricanes or something, and people are moving out of state. It's not an out-of-pattern that you can start declining, because people will be moving out of that locality. So you have to be up to date with what's going on. So you cannot just have a fraud rule and say, "I'm done with it." You will also need to be aware of the global scenario that's happening.
Len: It's super fascinating, what you said about in-pattern and things like that. Because a lot of fraud prevention is looking for patterns. And it's not just like you can like find them all, and you're done, right? Because the fraudsters are clever, and they're constantly evolving, and they learn what pattern - they might even test a pattern, just to see if it works - and then test a new one, to see if that works. And their plan is then to use it elsewhere, or even to sell off the pattern that they've discovered to somebody who's better at exploiting - they might be good at finding the patterns, and other people might be better off at exploiting them. And they'll do exchanges like that.
It reminds me a little bit of plagiarism. There's just - and when you talk about the algorithm, it's like - you can just start to tell. Like, not that we have a big problem with it at Leanpub or anything like that. Not at all, thank goodness. But you can just kind of - there's just things that start to - the obviously bad markers, right? Like, is someone buying the same product over and over again, or something like that. You know what I mean, from the fraud side, or from the plagiarism side. It's like, does the style not match throughout the document, and things like that?
And catching people trying to pull stuff off is basically really fascinating, and it's so interesting to know that things actually have been improving. And it's partly because of the techniques that we're about to talk about, that have been developed so well in the last few years.
So, onto the subject of your books, and data. I believe the first book you published on Leanpub is called, What Just Happened: Descriptive Statistics: An Explorer's Guide to Data. I wanted to start where that book starts with the question: What's data?
Shefali: That book, I put the pen to paper and I started with that book as the first book. Because I feel that 90% of our data is processed for something which is descriptive statistics. And data as a whole has changed over a period of time. Because the data that we have been traditionally used to, is the rows and columns data. And with the entire digital age, things have moved. With social media and smart phones, we see that this data is not - we don't get it in that traditional form anymore.
In addition to that, a segment which is diminishing in its share, we are starting to get a lot of social media data. So, something that we can run databases on something which is like, in terms of graphics, MRI scans, or something which is more like any of the social media accounts that we have. A lot of data is in the free-flowing form right now.
So what I did was - in the initial book, I gave the readers a feel for what data is, and its types. Like the structured and unstructured forms. And then what is the basic processing that goes for all of this data?
90%, or maybe more than 90% of it, I would say, is processed for a basic understanding of how this data is centered around its mean, or what is the variance around it. Or, how is it like in terms of the percentages or percentiles, and the position of the data? Is all of this data looking normal, and what is a normal distribution?
All of this is something that I put as the first basic block in data science. I truly believe that simplicity is the ultimate sophistication. I also believe in the Einstein quote that says that, "If you cannot explain it simply, then you have not understood it well enough." I wanted it to be so easy that when I explain it to someone, it's like, "I got it." I wanted it to be fairly easy, and I made that attempt with these three books.
Len: I think you succeed very well. They're very clear. And it's not only a matter of how they're written and how they're structured very deliberately. But actually, the design and how they're presented, which is something we can talk about a little bit later. But they're very inviting, without being patronizing - if you know what I mean?
Shefali: Yes.
Len: So it's something that's come up. Because so many books on Leanpub are prescriptive non-fiction, which is a publisher's term for books that explain things, and tell you how to do them. It's always hard to strike a balance between - how do you talk to a grown adult who is just as smart and experienced as you are, but just in other things than the thing you're trying to explain to them? I think you strike that tone really well.
You mentioned just a moment ago, descriptive statistics. I was wondering if we could go into that in a little bit more detail? Can you give me an example of something that would be in a descriptive statistics study?
Shefali: Descriptive statistics, I would say, is broadly divided into a couple of categories. One would be the averages which is the mean, median, mode. What I did was, I went a little bit too deep into what is a mean. At times we say, "Okay, it's just an average." But then how is it that we want to compute this average, if it is like a frequency or a tally table? If it's grouped data, then how do you deal with it?
Those scenarios. Like with the grouped data, with ungrouped data, with your frequency tables - how is it that you come up with this mean? Which is the easiest, or the best estimate, I would say - the best estimate that you have when you don't have any other vehicle.
So if I say, "On any given day, I know that the temperatures over the past five days, in degrees Fahrenheit, are X," then I would just say the next day is going to be an average, with no other information that is available to me.
Then there was mode. So, again the same way. I went into it and tried to understand the grouped data. With ungrouped data, how do I come up with this? So that is one part.
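To make the averages discussion concrete, here is a minimal Python sketch of the mean computed three ways: from raw (ungrouped) values, from a frequency table, and from grouped data using class midpoints. The temperatures, class intervals, and function names are hypothetical and chosen only for illustration; they are not taken from the book.

```python
from statistics import mean, mode


def mean_from_frequency_table(table):
    """Mean of values given as a {value: count} frequency table."""
    total = sum(value * count for value, count in table.items())
    n = sum(table.values())
    return total / n


def mean_from_grouped_data(groups):
    """Approximate mean of grouped data given as (lower, upper, count)
    tuples, treating every observation as sitting at its class midpoint."""
    total = sum(((lower + upper) / 2) * count for lower, upper, count in groups)
    n = sum(count for _, _, count in groups)
    return total / n


# Hypothetical temperatures (degrees Fahrenheit) for the past five days.
past_five_days = [60, 62, 65, 63, 60]

# With no other information, the average is a simple guess for tomorrow.
naive_forecast = mean(past_five_days)
most_common = mode(past_five_days)

# The same five readings as a frequency table and as grouped classes.
frequency_table = {60: 2, 62: 1, 63: 1, 65: 1}
grouped = [(60, 62, 2), (62, 64, 2), (64, 66, 1)]

print(naive_forecast, most_common)
print(mean_from_frequency_table(frequency_table))
print(mean_from_grouped_data(grouped))
```

The grouped-data mean differs slightly from the other two because the exact readings inside each class are replaced by the class midpoint.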
The other part would be around variation. So I try to touch upon every type of variation, not just the one that we report - like standard deviation, the normal one. But also like quartile deviation. How is it around deviation or variances? What are the different metrics that are available to us?
I also looked at position of the data. So, I'm trying to understand in terms of percentiles or deciles or quartiles - if you're going to break up the data into four quadrants - I would not say "quadrants." It would be a faux pas. Where is it that this data is around? And looking at normal distribution and trying to understand whether this data is normal or not normal. So those scenarios is what I played around.
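As a rough illustration of the variation and position measures mentioned here, the following Python sketch uses the standard library's `statistics` module. The data values are made up, and "quartile deviation" is taken to mean half the interquartile range, which is the usual textbook definition rather than something stated in the interview.

```python
from statistics import pstdev, quantiles

# Hypothetical observations, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation: spread around the mean.
std_dev = pstdev(data)

# Quartiles split the sorted data into four equal parts.
q1, q2, q3 = quantiles(data, n=4)

# Quartile deviation (semi-interquartile range), assumed here to be
# half the distance between the first and third quartiles.
quartile_deviation = (q3 - q1) / 2

# Deciles split the sorted data into ten equal parts (nine cut points).
deciles = quantiles(data, n=10)

print(std_dev, (q1, q2, q3), quartile_deviation)
print(deciles)
```

Note that `quantiles` supports more than one interpolation method, so the exact cut points can differ slightly from those produced by other tools.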
I also touched upon visualization. Because I'm thinking, that is where data you're leaking off is having that visual effect - where we can relate and correlate strings. So when is it that you would probably want to use the scatterplot, to understand the correlations? When is it that you would want to show a stack chart, versus a pie graph?
So, those things. Just put pointers out there saying, "If you are using amount on amount, then you might as well be better off using a bar diagram, versus a pie chart, which does not have that feature".
Sometimes it does happen - like when we're at the start, we start with analysis and we say, "We are going to work with a pie chart." And then we realize there's a time aspect to it. So these couple of things which - maybe we are going to learn with the trial and error part of it? But I just put down in notes, "This is where you should use it." It makes sense.
Len: Data visualization in particular, is something that I think consumers of the news have noticed a big transformation in that, where prominent news sites have actually finally stopped complaining about having to be on the computer, and decided to start taking advantage of all the opportunities that are available - both for processing data, and then visually expressing it to people who are reading their sites. That's been an amazing thing to see.
You mentioned one thing. Before we go on to talk about the next book, you mentioned sentiment analysis. This is something I find really fascinating. It's basically when you hear - when you're talking to someone, when you call a company up - and you hear, "This call may be recorded," partly what that might mean is that there's going to be some machine that's going to basically listen to what you said, and then try and decide whether or not that was a good interaction for you or not. Am I right that that's what sentiment analysis is, or am I wrong?
Shefali: It is definitely. I would say it is also - I would say that is one of the aspects also. When we are looking at some of the things that are happening on social media - a good part of it is trying to catch a couple of positive words and negative words. And also, there's a lot of advancement in the algorithm. Because they also have to catch anything which is contradictory. Because people can be sarcastic. They can say, "Thanks, I got late because of the flight." And now that "thanks" is not actually a thanks, because we are just being sarcastic about it.
So we have to understand the algorithm - depending upon how you're feeding that algorithm, how well it is monitored. Are we going to get better algorithms? It is definitely advancing over a period of years, and it's basically understanding the positive and the negative sentiments that are happening, which revolves around a couple of - I would say the words that are being used in the entire conversation or the number of hashtags, the number of comments that are happening on the social media. I think a lot of that is driving the interaction globally.
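The word-counting idea described above can be sketched in a few lines of Python. The word lists and the scoring rule below are assumptions made up for illustration; real sentiment systems are far more sophisticated. The sketch also shows the sarcasm problem: counting words alone cannot tell that the "thanks" in the flight example is not sincere.

```python
import re

# Hypothetical word lists, chosen only for this illustration.
POSITIVE_WORDS = {"thanks", "great", "love", "happy"}
NEGATIVE_WORDS = {"late", "delayed", "terrible", "hate"}


def sentiment_score(text):
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    return positives - negatives


print(sentiment_score("Great service, love this airline"))
# The sarcastic sentence from the interview: "thanks" and "late" cancel
# out, so simple counting rates a clearly unhappy customer as neutral.
print(sentiment_score("Thanks, I got late because of the flight."))
```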
Len: It's so fascinating. I can only imagine the challenges with having to catch or register tone like that. Like, "Thank you very much" doesn't mean, "Thank you very much." And that must be just a really fascinating challenge for people who work in that area.
And so your next book is Sampling Techniques: A Comprehensive Overview. I was wondering if we could just start like we did with the first book. What's sampling?
Shefali: Yes. I think this is an interesting piece. Because I feel - if you get the sample right, then you have got your answers right. Different samples can have different results altogether. And a very - I would say a very layman way of putting it would be - you'll always have a biased sample, and you always have an unbiased sample. So if I'm just going to give across like five movie tickets, that's more like I'm just - for anybody who has shopped at this mall - if I'm just having some lottery and saying, "Anybody who was shopping at this mall more than X dollars, you can just put your name in." And I'm just going to pick up a name out of this lottery box or something like that, and it's going to be something that I'm giving out - movie coupons, this can be something which is more like an unbiased thing. I would say this is the way I would explain an unbiased sample.
But if you're saying, "I'm going to pick five people to present my school or my university or my college at a debate," or, "Five people to represent my country at the Olympics," then this is not going to be something which is like a sample which I can just go, "Okay, I'll just pick these five people and I'm going to send them for a debate or for the Olympics."
In this case, you have to understand that there's a bias that happened, where you would want somebody who's the best in what they do, have a full grasp of that topic or that subject, or have trained a great amount.
I would say this is the way that I put it. Saying, "This is biased, this is unbiased," and then try to move into the different sampling techniques that we have. And what I did was, I let go of some of the mathematical equations.
Then I tried to explain the concept as to what is a cluster analysis, and how is two-stage analysis going to be considered, or a snowball effect, and I made a diagram to say, "This is going to be the scenario in this sampling technique." I did try to put that flavor, because I feel after the descriptive statistics, when we actually model something which is more hard core into analysis, it all depends upon the sample that we select. And with the data that we have, it's like running into millions and millions - at times, billions of transactions, billions of records.
We cannot actually process each and every record. Because that would be fairly consuming on the technology part of it. And it would be a lot of coding and a lot of processing, and you would just have to wait for the output. But we would want to understand what is a minimum viable product thing. Something which is quick and dirty, just to understand where is it that we're landing? Benchmark, or what is the ballpark number looking like? Before we go ahead with the analysis. So I think it's important that we come up with a sample. And then we can say, "We want to replicate this on millions."
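The "quick and dirty" ballpark idea can be sketched as follows: draw a simple random sample from a large set of records and compare its average with the average over all records. Everything here is hypothetical, including the simulated transaction amounts, the population size, and the sample size. It is a sketch of simple random sampling only, not of the other techniques covered in the book.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 200,000 simulated transaction amounts.
population = [rng.expovariate(1 / 50) for _ in range(200_000)]

# Simple random sample: every record has the same chance of selection.
sample = rng.sample(population, 1_000)

population_mean = mean(population)
sample_mean = mean(sample)

print(f"Full data mean: {population_mean:.2f}")
print(f"Sample mean:    {sample_mean:.2f}")
```

In practice the full-data mean is the number you are trying to avoid computing. It is calculated here only to show how close a small random sample can get.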
Len: And sampling the right proportion of the population, to be confident that the sample is representative of general trends and things like that - is also a challenge, I gather?
I just wanted to circle back a little bit to what you said about bias. That's such a fascinating concept. Because a lot of us might think that that means you are prejudiced or something, there was something like that. But bias, and this example I think has come up on the podcast before - when you're gathering samples of information, it can be - for example - well, if you were doing a survey of how people intend to vote - let's say in a Canadian province. And the only method you used to try to contact them was afternoon calls to landlines, it -
Shefali: Absolutely.
Len: That's a biased sample. Because who's home in the afternoon who has a landline, and who answers the phone and actually answers surveys, right? 100% of the people you interviewed, those three things correspond to.
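The landline example can be put into numbers with a small Python sketch. All counts below are made up for illustration, and the grouping into "retirees" and "working-age" is an assumption, not something stated in the interview. The point is only that a poll reaching one kind of person can drift away from the whole population.

```python
# Each row: (count, reachable_by_afternoon_landline, supports_party_a).
# All numbers are invented for illustration.
groups = [
    (168, True, True), (72, True, False),      # retirees who pick up
    (42, False, True), (18, False, False),     # retirees who do not
    (28, True, True), (42, True, False),       # working-age who pick up
    (252, False, True), (378, False, False),   # working-age who do not
]

def support_share(rows):
    """Share of people in `rows` who support party A."""
    total = sum(count for count, _, _ in rows)
    in_favor = sum(count for count, _, supports in rows if supports)
    return in_favor / total

true_share = support_share(groups)
polled_share = support_share([g for g in groups if g[1]])
print(f"whole population: {true_share:.1%}, afternoon landline poll: {polled_share:.1%}")
```

With these invented counts, the afternoon landline poll reports roughly 63% support while the whole population sits at 49%, because the people who answer are mostly from one group.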
Shefali: Yeah. So that's - basically what this book is going to give a flavor about, to understand, what population are you really looking at? Are you looking for something which is very specific?
If I'm looking for some polls or I'm looking for something which is like, "Who's watching my TV or my sitcom at this particular hour?" Rather than go and see that, I'm going to look at the entire population. If you have this generic idea that you are targeting a segment, then why not filter and say, "Okay, I just want to exclude this, this, this, this. And I'm going to ask the people that I'm targeting."
Either we could use that approach, or if you want a segment that is for the general population, then it definitely does not make sense to have that time period which is just noon. Because then we'll be leaving out the entire population that is in the morning part, or the evening part. So, depending upon whom you're considering for this entire survey, it becomes very important to have those details in your questionnaire. Because if I'm targeting something for a college student, and then I have my survey that's going to be filled out by everyone who's not a college student, that just defeats the purpose.
At the same time, if this is for the generic masses, and I'm just picking up people from like 21 to 24, and asking them to fill out - that's definitely not going to give me that insight into somebody who is a teenager, or who is the middle aged, or somebody who is retired. So if I want something that is going to be catering to the entire masses, I have to be cognizant of the fact that I require to have this survey properly filled out and not have these biases come in.
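One common way to keep every age group in a general-population survey is stratified sampling, sketched below in Python. This is an illustration under stated assumptions, not the method from the book: the respondent pool, the age bands, and the function name `stratified_sample` are all hypothetical.

```python
import random
from collections import Counter

def stratified_sample(people, key, fraction, seed=0):
    """Draw the same fraction from every stratum so each group stays represented."""
    rng = random.Random(seed)
    strata = {}
    for person in people:
        strata.setdefault(key(person), []).append(person)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per group
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical respondent pool of (id, age_band) pairs.
pool = (
    [(i, "teen") for i in range(200)]
    + [(i, "21-24") for i in range(200, 300)]
    + [(i, "middle-aged") for i in range(300, 800)]
    + [(i, "retired") for i in range(800, 1000)]
)

sample = stratified_sample(pool, key=lambda p: p[1], fraction=0.1)
counts = Counter(band for _, band in sample)
print(counts)
```

Sampling 10% from each band keeps the sample's age mix in line with the pool, instead of ending up with only 21-to-24-year-olds.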
Len: Yes. And of course, alternatively - if you want to be really focused, you'd better make sure you're not hitting the general population, or something like that. You want to make sure that you've funneled your sample to the right people.
I was just thinking of a real-world example of how like this might all sound like the things that banks and big companies get to use. But one interesting thing is that this thing is being presented now as a tool to people in the products they might use online.
I'm thinking specifically of sending email newsletters, for example. The service we use has a send time optimization feature, which is basically doing all kinds of super sophisticated stuff along the lines that you were describing, to decide when emails should be sent out, right? And it's - I'm sure it has profiles of when people open emails, and whether that's linked to region and time zone, or what's in the subject line, or what's in the content of the email, and things like that.
It really is amazing. We're now entering into this age, where ordinary people like me can access cloud computing, for example. Now all of these data analysis tools are becoming available to us. And it's not just - it's not only a matter of analyzing and understanding a situation, but actually like taking advantage of that information to be able to do what you're doing better.
And so, the third book that you've published on Leanpub is Big Data Analytics: Data Scientist's Viewpoint. I think people are going to be getting a sense of the progression and structure to the project. What is big data?
Shefali: Yeah, so I put a feel into what is big data, and why are we talking about big data? And, how did this term come about? I just put a brief about, how is it that now it no longer matters about looking at a couple of customers? But trying to understand the entire customer base, and trying to remove biases as much as possible, trying to include the entire population to make that decision, and trying to understand, "How is it that my customers are reacting to a product? What are the subsegments that are looking?"
For example, if I have customers who spend exclusively on airlines or on large retailers, or are more like grocery customers - this big data is meaningful. Because we are looking at millions of transactions, and after a point, it does not make sense to look at just dockets. We have to look at the entire spectrum, to get this holistic view and try to make a data-driven decision.
And then, I did put something about the IoT, and trying to explain how it is that the ATM is the basic form of IoT in its application form.
I just gave that feel and I put that out to the readers - that everything that you read in the first paragraph, that is data itself. And now the data's no longer traditional - now we are moving across to free-flowing text. And then I put some structure around trying to understand, how is it that we can process this data? We can start with the descriptive part of it, then we can sample it - and then we can model it, we can build a storyline - and then we can present it, and then we can run campaigns on it. And then we can monitor it. So I give more like the approach that we practice on the projects. And this helps people streamline.
Because when you start with a project, it gets a little overwhelming when you're getting a lot of data. And at times, we tend to do things sequentially. And now, things are going more agile, where we can do ideal processing, wherein we can optimize the time. We can start being more productive. But in the initial part of working on a project, I would say - in my experience, it's something that we learned over a period of time.
Because at times we were doing things sequentially, and then we realized after a project, after two projects - there are some parts that we can do it in parts, and that optimizes the time that is taken or spent on a project. I think this is what I put it out there - it's saying that, "These are the things that you are required to think about." And once you just have that brief roadmap, it makes sense. Because when you're just working on a project, I have to do one, two, three, four - it becomes like a cookie-cutter structure. So it's much more streamlined.
Len: That's really fascinating. I really picked up on the point you made that ATMs - or automated teller machines, were actually the first IoT or Internet of Things things, that most of us would've encountered. I'm old enough to remember the days before they existed - when you had to go into the bank and stand in line, and you had like a little book that you wrote down your transaction in, and stuff like that. It just never occurred to me before that my first experience with an Internet of Things thing, was actually taking money out of a bank machine. But of course it was.
Shefali: Yes, yes. These are some of the things that we take for granted. Because at times, when you say, "IoT," it gets overwhelming. But then when you relate it to some concepts that you have seen in the real world and when you have experienced it, it gets very relatable.
What I did also put, or inject in this book - is a brief about supervised machine learning and unsupervised machine learning, and reinforcement learning.
These are common topics. These are something that people have been spending a lot of time in, in terms of hard core model building activities. And they are getting some trends and packets from the data, which is a little difficult for a person just to look at millions of data and come up with something. So we are using this collaboration with computers, and understanding that we can collaborate and go much further than just doing that as humans. I have just injected that part into the book.
Len: Actually I specifically wanted to ask you - if you wouldn't mind explaining the difference between a supervised and an unsupervised learning model?
Shefali: So I would put it in - a simple term is that supervised is when it is guided, when the data is labeled. For example, if it is fraud and we have got historical transactions, and we have the ability to label the fraudulent transactions as, "These are fraudulent transactions." When the algorithm already has labels, it's more like supervised. Because it's guided, they already know that this is fraud. And what they're just trying to do is, from the millions of transactions that you have fed it, it's going to come up with a trend, or come up with a pattern and say, "This is what looks like fraud." So they are going to make it generic. I would say that is supervised.
Unsupervised is where the person's not guiding it. You just put it all out there and say, "This is what my data looks like." Just come up with some groups and categories. An easy way that I put it, is saying that - if you are looking at some soaps or detergents, maybe you are looking at anything which is antiseptic, or something that is going to kill bacteria as a health soap. Versus, something which is like a luxury soap, something which is a moisturizing soap. All of them would have different groups that it's coming across. But this is something which is more like unsupervised - when you're just trying to put across, and trying to see what are the groups that you want to come up with.
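The contrast between the two can be sketched in a few lines of Python. This is only an illustration: the interview names no algorithm, so a nearest-centroid classifier and a two-cluster k-means are swapped in here as simple stand-ins, and the transaction amounts and soap prices are invented.

```python
from statistics import mean

# --- Supervised: every example comes with a label ("fraud" or "ok") ---
labeled = [(12.0, "ok"), (30.0, "ok"), (25.0, "ok"), (900.0, "fraud"), (1200.0, "fraud")]

def fit_centroids(rows):
    """Average amount per label: a nearest-centroid classifier."""
    by_label = {}
    for amount, label in rows:
        by_label.setdefault(label, []).append(amount)
    return {label: mean(values) for label, values in by_label.items()}

def predict(centroids, amount):
    """Return the label whose average amount is closest to `amount`."""
    return min(centroids, key=lambda label: abs(centroids[label] - amount))

centroids = fit_centroids(labeled)
print(predict(centroids, 1000.0))  # closest to the "fraud" average
print(predict(centroids, 20.0))    # closest to the "ok" average

# --- Unsupervised: no labels; k-means with k=2 forms groups by itself ---
def two_means(values, iterations=10):
    centers = [min(values), max(values)]  # simple starting guess
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters

soap_prices = [1.0, 1.2, 0.9, 1.1, 6.5, 7.0, 6.8]  # hypothetical feature
print(two_means(soap_prices))
```

In the first half the algorithm is told which transactions were fraud and generalizes from them. In the second half it is given only prices and comes up with two groups without being told what they mean.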
Len: That's a really great explanation. Now, particularly the use of the term "label" - that a label can be attached to a piece of data or a set, a particular coherent set of data. And often - my understanding is that - and people might think that all of this, everything is super automated. Because it's data and we're using computers, right? But actually often, labelling and things like that, there's a lot of manual work that's done, to set up data sets and make sure everything's right.
Shefali: Yeah. There is definitely a lot of work that goes in the background. Because for all of this data, you're required to label it. And you have to see that it is as accurate as possible, and there is less error in the labeling itself. Because if you have labeled that wrong at the start itself, you're going to end up with a very different analysis - and you will have a very different rule set. So that is a lot of work that is put in the background.
Len: Just moving on to the last part of the interview where we talk about your experience as a writer. So, you've written and published and designed these three books. I was just wondering if you could talk a little bit about - just for the sort of current self-published authors out there or potential self-published authors out there - if you could describe your process? What app did you use to create your ebook files, things like that?
Shefali: I always had this passion towards writing, and I always knew that I'm going to pen books. But it was just the one priority. And then I did put that high on priority, and I just kind of - it started with a bullet journal approach, where I put some bullets and said, "These are the main points," or, "These are the main key takeaways for a concept." And then I just brought that all together. And as I started it, when I put the pen to the paper, it started flowing. And then I started putting illustrations out there, because I wanted things to be more simple, and I tried to make it as simple as possible.
So it was more around trying to make a mini-series, that was my objective. Because when I started writing, I realized that it is exhausting - and this topic is so huge, that if I don't concentrate - then I'm going to exhaust myself. And at the same time, I'm going to - again - make it a little bit more heavy. I want it to be as simple as possible, so that somebody who is just picking up this book, and has got no background in statistics, no background in data science - can just pick it up and say, "Okay, I've got this concept." I got what people are talking about. Like, what are the sampling techniques that people talk about? What is big data analytics? And what is descriptive statistics?
I wanted somebody to pick it up and say, "I don't want to look at something which is too extensive. I want it to be very simple, and I want it to be out there." When I did this, I gave it a shot and I said, "Okay, I'm going to make this like 100 pages. Let's start with 100 pages and let's see whether I can put that together." The second process was - I did a little bit of research. I did, I would say - it is not extensive, but I would say it's mid-range research.
And then I landed up with Leanpub. It stated that this is a simple platform. This is an easy platform for self-publishing authors. And then I said, "Let's give this a try." So I logged into it, I created this account - and then I tried to see how easy that onboarding process is. Because it was out there saying, "These are some of the top publishing platforms for people who are rookies." I am a rookie, and I have like no knowledge on this self-publishing front. So I said, "Let's give it a try."
Then I started onboarding. And it was so simple. It was literally five minutes, and I was super amazed. I would say I was - I got more enthusiastic after that. I just logged in and I said, "This is the title of my book." And I was like, "This is created." I was like, "Okay." I put it out, then I have to put some content out there. So it gave me, I would say - it was a catalyst sort of thing.
I had an initial draft. And the onboarding process was so smooth that I said, "Okay, I have to see this to the finished stage." That's how I got started with the first book.
And because that became so easy, and it was so easy to navigate, I said, "Okay." I had something which was like a draft copy of my second book. I said, "Okay, let's try to make this book come out." So I did that.
And then I started with the third book. I was super happy and super excited about it. And it was something which was - I would say, it definitely caught my fancy - because there were so many options. Like create a bundle, do promotions, and other stuff. I was learning a lot. So it was a great learning curve for me - I would say, on this platform.
Len: Oh, well, thank you very much for the kind words. We always like hearing that. Because we don't always hear them. But, you mentioned something that's particularly - I think, interesting about Leanpub. Which is that, often people who do publish books on Leanpub, it may be their first book. It may be in a sense the first public thing that they've done. And one thing that we do, is - the moment you create a book, right - which is the form you fill out to get started, there's a landing page for that book.
Shefali: Yes, there is.
Len: That book, and people can find if they search. Your name is on it. And you're already out there. And people actually do often find that like - it's the little kick they needed.
Shefali: Yes.
Len: It's real, and I'm actually out there. And because a lot of the time - and I know this, because I had this experience myself. You're a little bit scared, you're a little bit nervous, "How is this going to come across?" It's going to come across fine. People who come across it are going to be - if they're not interested, they'll go away. If they are interested, they'll sign up and say, "Let me know when the book is published."
And then it can be exciting, right? Because while you're writing, you can actually be gathering an audience to be there for your launch - right from the start, even before you've written a single word. All you've got is a title and you've got a page, but you can start gauging interest, and - yeah, giving people that little push out the door that they might - if they're like me - that they might psychologically need, to start doing stuff like that. It can really help.
I did want to actually ask one question specifically though - so what program did you use to create your ebook files? Because I don't believe you used our Bring Your Own Book writing mode.
Shefali: Yes, I did. So what I did was, I started with "Bring Your Own Book". And I created that in a PDF form.
Len: Okay.
Shefali: And then I published that. So I was exploring, and I was trying to understand, "What is it that works best for me?"
Len: Okay, thanks very much for letting us know.
So, just wrapping it up, the last question I always like to ask in these interviews - if the guest happens to be a Leanpub author, is - if there was one thing we could build for you, one feature we could build for you - or if there was one thing about Leanpub that really bugged you that we could fix for you - can you think of anything you would ask us to do?
Shefali: Honestly, I cannot think of anything. I am a happy soul on board, I'm still exploring. I would say I'm definitely excited, and I'm definitely still exploring the opportunities to understand how these features work, and how the promotional activities work on Leanpub.
That's something that I'm trying to figure out. So I would say, it's been an easy learning experience - and I'm still learning - it's not been a year on the platform, and I am excited about using it, and learning more about it.
Len: Thank you very much for that. And if you ever do - if anything ever does really start to bug you, or if you ever do think of something that you'd like - please don't hesitate to reach out to me and let me know what you're thinking.
Well, thank you very much for a really great interview, and for really great descriptions of your books, and what you're up to. And thank you very much for being a Leanpub author.
Shefali: Thank you. Thanks Len, thanks for having me.
Len: Thanks.
Shefali: And thanks for giving me this opportunity.
Len: And as always, thanks to you for listening to this episode of the Frontmatter podcast. If you like what you heard, please rate and review it wherever you found it, and if you'd like to be a Leanpub author, please visit our website at leanpub.com.
