Leanpub: Publish Early, Publish Often

Chapter 9 - Wrapping it up

To summarize the entire data process covered in this book, we will walk through a hypothetical organization and show how it can start using its data to drive business impact.

We will follow the same sequence of content that was presented in this book, starting from Chapter 1 (What is Data) and finishing at Chapter 8 (How to Build a Data Team). While reading this chapter, feel free to pause and go back to the corresponding chapters if you don’t remember a concept that is being mentioned.

Now, let’s dive into the story. If you’re familiar with the TV mockumentary sitcom The Office ¹, you might have heard about the Dunder Mifflin Paper Company, Inc. It’s a company that resells paper from manufacturers to offices and other companies. We will examine how Dunder Mifflin could become data-driven and thrive in the paper business.

Some of The Office’s main characters.

First, what data could Dunder Mifflin have?

They have data about all of their paper suppliers: names, emails, addresses, phone numbers, last purchase dates, and the supplier status, either “active” or “inactive”. In a catalog of every type of paper, they also have the type, the price, and the number of sheets per package.

Additionally, they have information about their customers: names, emails, addresses, and date of the last purchase.

They also have data about all of their sales. Every order can have an identification number and the name of the salesman responsible for that sale. The order’s content may include the type of paper, the number of packages, and the charged price. They also have an identification of the customer, the order date, shipping date, and delivery date.

In a more modern system, they could even store the buyer’s address so that, combined with the GPS data generated by the delivery truck, the system can automatically calculate the best routes and delivery ETAs.

Although the data above seems to span many different kinds, notice that all these fields can be categorized into a few data types: Integer (a code for the type of paper, the number of packages), Floating-point (charged price), Character/String (names, emails, addresses), and Boolean (whether the supplier is currently active or inactive).
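To make these data types concrete, here is a minimal sketch in Python. The class and field names are hypothetical and chosen only for illustration, and it assumes the charged price is recorded per package, which the scenario does not specify.

```python
from dataclasses import dataclass


@dataclass
class Supplier:
    """A supplier record (hypothetical field names)."""
    name: str       # Character/String
    email: str      # Character/String
    active: bool    # Boolean: True for "active", False for "inactive"


@dataclass
class OrderLine:
    """One line of a sales order (hypothetical field names)."""
    paper_type_id: int    # Integer: assumed numeric code for the type of paper
    packages: int         # Integer: number of packages ordered
    charged_price: float  # Floating-point: assumed to be the price per package

    def total(self) -> float:
        """Total charged for this line: packages times price per package."""
        return self.packages * self.charged_price


supplier = Supplier(name="Example Paper Mill", email="mill@example.com", active=True)
line = OrderLine(paper_type_id=3, packages=10, charged_price=4.5)
print(line.total())  # 45.0
```

The sketch uses a floating-point price because that is the type named above; systems that handle money often prefer a decimal type to avoid rounding surprises.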

How can this data become information at Dunder Mifflin?

Someone looking at these fields and data types in isolation wouldn’t generate any value for Dunder Mifflin. The moment a branch manager or a salesman looks at the data combined, however, it clearly becomes Information: they know how to interpret it and act on it and, more importantly, they know how valuable it would be to Dunder Mifflin’s competitors.

Then, where else could we gather data from?

If the sales system registers the date and time of a sale and stores it, that is application-generated data. If the trucks that deliver supplies to the buyers have sensors that register the truck’s speed, how long it took to reach the destination, and at what time it arrived, that is hardware-generated data.

Suppose there is a system that registers support tickets and product reviews submitted by the buyers. In that case, this can be user-generated data and can be used to analyze customer satisfaction.

How does Dunder Mifflin store its data?

The software used to register and manage sales can have its own Relational Database Management System (RDBMS) with the following tables:

One table containing all the customers and their details with a unique identification (ID) for each customer.
One table containing all the orders and their details, including the ID of the customer who made the order, a unique identifier for each order, and the products bought.
One table containing all the possible discounts for specific price ranges.
One table with all the salesmen and their details, like region and commission range.
One table with the salesman-to-customer relationship, where every record stores the ID of a customer and the ID of the salesman who is responsible for that customer.
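The tables above could be sketched as follows, using Python’s built-in sqlite3 module as a stand-in for whatever RDBMS the sales software actually uses. All table and column names are assumptions made for illustration, and the discounts table is left out to keep the example short.

```python
import sqlite3

# In-memory database standing in for the sales system's RDBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT
);
CREATE TABLE salesmen (
    salesman_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    region      TEXT
);
-- Relationship table: which salesman is responsible for which customer.
CREATE TABLE salesman_customer (
    salesman_id INTEGER NOT NULL REFERENCES salesmen(salesman_id),
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    PRIMARY KEY (salesman_id, customer_id)
);
CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    customer_id   INTEGER NOT NULL REFERENCES customers(customer_id),
    product_id    INTEGER NOT NULL,
    packages      INTEGER NOT NULL,
    charged_price REAL NOT NULL
);
""")

# Made-up sample rows.
conn.execute("INSERT INTO customers VALUES (1, 'Example Law Firm', 'office@example.com')")
conn.execute("INSERT INTO salesmen VALUES (1, 'Example Salesman', 'Northeast')")
conn.execute("INSERT INTO salesman_customer VALUES (1, 1)")
conn.execute("INSERT INTO orders VALUES (100, 1, 3, 10, 45.0)")

# The IDs let us combine the tables: each order with its customer and salesman.
rows = conn.execute("""
    SELECT o.order_id, c.name, s.name
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    JOIN salesman_customer AS sc ON sc.customer_id = c.customer_id
    JOIN salesmen AS s ON s.salesman_id = sc.salesman_id
""").fetchall()
print(rows)  # [(100, 'Example Law Firm', 'Example Salesman')]
```

The unique IDs are what make the final query possible: each table stores its own details once, and the IDs connect an order to its customer and to the salesman responsible for that customer.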

Dunder Mifflin also uses an independent supply chain software that has its own RDBMS and contains tables like:

The suppliers, their details, and an ID for each.
The products available from the suppliers, with name, price, and the supplier’s ID that sells it.
Each order made by Dunder Mifflin, containing the ID of the supplier it was ordered from and the IDs of the products ordered.

From a data storage perspective, Dunder Mifflin, as a modern paper company, runs its customer and supplier management software as Software-as-a-Service, hosted in the cloud along with the servers that hold its relational databases. Nothing is stored on-premises, whether at headquarters or in the branches. Yes, even Michael Scott uses the cloud!

How does Dunder Mifflin analyze its data?

Using all the data from the relational databases that support the customer and supplier systems, Dunder Mifflin built a Data Warehouse on a modern column-based (OLAP) database that unifies all the company’s data for analysis. To prepare the data for analysis, data engineers clean records registered with errors, treat empty fields so they won’t generate incorrect reports, and ensure that the data is in the right format and range of values.
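The preparation steps just described (rejecting bad records, treating empty fields, and checking format and range) can be sketched in a few lines of Python. The field names, the sample rows, and the specific rules are assumptions for illustration; a real pipeline would take its rules from the business.

```python
from datetime import date
from typing import Optional

# Hypothetical raw rows as they might arrive from the sales system (all strings).
raw_orders = [
    {"order_id": "100", "packages": "10", "charged_price": "45.00", "order_date": "2021-03-01"},
    {"order_id": "101", "packages": "",   "charged_price": "12.50", "order_date": "2021-03-02"},
    {"order_id": "102", "packages": "-4", "charged_price": "9.00",  "order_date": "2021-03-03"},
    {"order_id": "103", "packages": "2",  "charged_price": "9.00",  "order_date": "03/04/2021"},
]


def clean_order(raw: dict) -> Optional[dict]:
    """Return a typed record, or None if the record cannot be trusted."""
    try:
        order_id = int(raw["order_id"])
        order_date = date.fromisoformat(raw["order_date"])  # expects YYYY-MM-DD
        price = float(raw["charged_price"])
        # Treat an empty field as "unknown" instead of silently using 0.
        packages = int(raw["packages"]) if raw["packages"].strip() else None
    except ValueError:
        return None  # wrong format: reject the record
    # Range check: no negative prices, no zero or negative quantities.
    if price < 0 or (packages is not None and packages <= 0):
        return None
    return {
        "order_id": order_id,
        "packages": packages,
        "charged_price": price,
        "order_date": order_date,
    }


cleaned = [rec for rec in map(clean_order, raw_orders) if rec is not None]
print([rec["order_id"] for rec in cleaned])  # [100, 101]
```

Order 101 is kept with its quantity marked as unknown, so a report can exclude it from quantity totals rather than counting it as zero. Orders 102 and 103 are rejected for an out-of-range quantity and a badly formatted date.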

With a Business Intelligence platform, Dunder Mifflin’s managers can continuously access and monitor the reports, which can be presented as dashboards containing charts and historical timelines for easier comprehension.

Analysts could perform a descriptive analysis to check customer satisfaction with metrics like the percentage of orders that receive complaints, to see which customers place the most expensive orders so their requests get more attention, and to list which orders are still waiting for delivery.
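As a rough illustration, the three descriptive questions above can each be answered with a few lines of Python over a list of orders. The records and field names below are made up for the example.

```python
# Hypothetical order records; in practice these would come from the Data Warehouse.
orders = [
    {"order_id": 100, "customer": "A", "total": 450.0, "complaint": False, "delivered": True},
    {"order_id": 101, "customer": "B", "total": 120.0, "complaint": True,  "delivered": True},
    {"order_id": 102, "customer": "A", "total": 300.0, "complaint": False, "delivered": False},
    {"order_id": 103, "customer": "C", "total": 80.0,  "complaint": False, "delivered": False},
]


def complaint_rate(orders):
    """Percentage of orders that received a complaint."""
    return 100 * sum(o["complaint"] for o in orders) / len(orders)


def spend_by_customer(orders):
    """Total spend per customer, highest first."""
    totals = {}
    for o in orders:
        totals[o["customer"]] = totals.get(o["customer"], 0.0) + o["total"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def pending_orders(orders):
    """IDs of orders still waiting for delivery."""
    return [o["order_id"] for o in orders if not o["delivered"]]


print(complaint_rate(orders))     # 25.0
print(spend_by_customer(orders))  # [('A', 750.0), ('B', 120.0), ('C', 80.0)]
print(pending_orders(orders))     # [102, 103]
```

A Business Intelligence platform would typically compute the same metrics with queries against the Data Warehouse and show them in a dashboard; the logic stays the same.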

Predictive analyses can be made to visualize future trends in the paper business:

Based on purchase history, which customers are likely to cancel their contracts?
Based on suppliers’ price changes, which suppliers are likely to increase their prices?
Based on orders, which types of paper are customers in each industry likely to buy?
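To give a flavor of the second question, the sketch below fits a straight-line trend to each supplier’s price history and flags the suppliers whose prices are trending upward. This is a deliberately simple stand-in for a real predictive model, and the supplier names and prices are made up.

```python
def linear_trend(values):
    """Least-squares slope and intercept for values observed at x = 0, 1, 2, ...

    Needs at least two observations.
    """
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def forecast_next(values):
    """Extend the fitted line one period past the last observation."""
    slope, intercept = linear_trend(values)
    return intercept + slope * len(values)


# Hypothetical price per package over the last four quarters.
price_history = {
    "Supplier A": [4.0, 4.25, 4.5, 4.75],
    "Supplier B": [3.0, 3.0, 3.0, 3.0],
}

likely_to_increase = [
    name for name, prices in price_history.items() if linear_trend(prices)[0] > 0
]
print(likely_to_increase)                          # ['Supplier A']
print(forecast_next(price_history["Supplier A"]))  # 5.0
```

A data scientist would normally use more history, more variables, and a proper evaluation of the model’s accuracy before anyone acted on such a forecast.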

Finally, prescriptive analysis can be performed to increase sales, answering questions like:

What kind of marketing actions will result in the most sales for bigger offices?
What kind of customer relationship channel (phone, email, in-person visits) will lead to more satisfied customers from smaller offices?

How can a Dunder Mifflin branch convince the executives it needs an infrastructure for these types of analysis?

The corporate team may be hesitant to invest in technology, as technology is the very reason the paper business is shrinking. A good pitch can improve the odds of the proposal.

Start with a known business question: With the rise of digital businesses and software that manages traditional companies’ operations, paper demand decreases every day. What markets do we have to explore in order to grow in the next quarter despite the digital evolution of offices?
Present evidence with consistent outputs: Dunder Mifflin can generate reports about its sales that show which businesses are growing, which are shrinking, and what kind of marketing or customer relationship actions can change that. In this case, it can be useful to present an action that already worked: a branch manager personally spent hours having dinner with a local client and got the order because the client saw value in supporting a local business. Knowing which kinds of clients value this trait, managers could spend their hours more efficiently with those clients.
Address possible concerns: The corporate staff might not seriously consider that technology can help a traditional company. Because of that, it can be essential to show the market trends, including the fact that the company could be out of business in a few years or months if things stay as they are. Investing in technology can drive strategic changes, and it can also make administrative work faster and cut costs there.
Create a mission statement: It is essential to convince the corporate office and the teams in the branches to embrace the change and really use the systems, so the analytical processes have input to work with. A mission statement can be: “To support Dunder Mifflin in being the paper company with the best customer relationship, we will focus our efforts on our target customers by offering exclusive support, close contact, and fast shipping at a friendly cost.”

What is the data team going to look like?

The data engineers will have several responsibilities, including:

Integrating the data sources from different systems and databases into the Data Warehouse and then into the Business Intelligence platform.
Applying transformations to this data as it is loaded into the Data Warehouse, so it is standardized for analysis.
Fixing possible errors in this data, like wrong formats, inconsistencies, and empty values.
Mapping, providing, and monitoring access to all data sources and the platforms where the data can be accessed.
Constantly monitoring the performance of the databases and the pipelines to fix errors and identify bottlenecks for improvement.
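The first two responsibilities, integrating sources and transforming data on the way into the Data Warehouse, can be sketched as a tiny pipeline. Three in-memory SQLite databases stand in for the sales system, the supply chain system, and the warehouse. Every table and column name is an assumption, as is the idea that both systems share the same product IDs.

```python
import sqlite3

# Source 1: the sales system (hypothetical schema and rows).
sales_db = sqlite3.connect(":memory:")
sales_db.execute("CREATE TABLE orders (order_id INTEGER, product_id INTEGER, packages INTEGER)")
sales_db.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(100, 3, 10), (101, 4, 2)])

# Source 2: the supply chain system (hypothetical schema and rows).
supply_db = sqlite3.connect(":memory:")
supply_db.execute("CREATE TABLE products (product_id INTEGER, name TEXT, price REAL)")
supply_db.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(3, "  A4 White ", 4.0), (4, "LETTER cream", 5.5)],
)

# Destination: the Data Warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE order_facts "
    "(order_id INTEGER, product_name TEXT, packages INTEGER, supplier_cost REAL)"
)


def run_pipeline():
    """Extract from both sources, standardize, and load into the warehouse."""
    # Extract and transform: trim and lowercase product names so they match everywhere.
    products = {
        pid: (name.strip().lower(), price)
        for pid, name, price in supply_db.execute(
            "SELECT product_id, name, price FROM products"
        )
    }
    rows = []
    for order_id, product_id, packages in sales_db.execute(
        "SELECT order_id, product_id, packages FROM orders ORDER BY order_id"
    ):
        name, price = products[product_id]
        rows.append((order_id, name, packages, packages * price))
    # Load: write the unified rows into the warehouse table.
    warehouse.executemany("INSERT INTO order_facts VALUES (?, ?, ?, ?)", rows)
    warehouse.commit()
    return len(rows)


loaded = run_pipeline()
facts = warehouse.execute("SELECT * FROM order_facts ORDER BY order_id").fetchall()
print(loaded)  # 2
print(facts)   # [(100, 'a4 white', 10, 40.0), (101, 'letter cream', 2, 11.0)]
```

Real pipelines add scheduling, error handling, and the monitoring mentioned in the list above, but the extract, transform, and load steps follow this same shape.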

The team will also need analysts: people who bridge the business questions and the data to create the required dashboards and reports. Those analysts will be in constant contact with branch management: the supplier relationship manager, the customer relationship manager, the accounting team, and the human resources representative.

These teams will need to know what the data shows about their actions and their work so they can adjust their strategies accordingly.

Analysts will also be in contact with the corporate team, reporting the branch’s results on sales, orders, marketing, and other operational issues. Not only will analysts extract these metrics, but they will also need communication skills to tell the corporate team what actions led to those results and why.

Once enough data about orders and suppliers has been registered, the branch can consider bringing in a data scientist to help with predictions. They can build models that forecast sales, customer acquisition, and other metrics that show in what direction the current strategies are taking the company.

With all of these pieces in place, Dunder Mifflin can now stop drifting in a digital economy and start taking action based on data to put its efforts into the right moves to keep itself in business!

We hope this conclusion helped you consolidate all the concepts presented in this book, and we really wish this could be the beginning of a new “data-related life” for you.

Remember that at the end of Chapter 8 (8.4 Learning Path), we proposed a few technical and business books as a learning path for the next steps if you decide to go deeper into the data world.

Finally, we present a glossary with the many terms, concepts, and jargon used within this book that you will frequently hear if you decide to start working more closely with data. There’s also a brief explanation for each term and concept that you can use as a quick reference guide.
