Building Office Add-ins using Office.js

Building Office Add-ins using Office.js
Building Office Add-ins using Office.js
Buy on Leanpub

1 The book and its structure

1.1 The “evergreen”, in-progress book

1.2 Who should read this book

This book is aimed at the professional developer who is tasked with creating an Office Add-in (a.k.a an Office Web Add-in, and formerly known as an App for Office). In particular, it is aimed at the “new” Office 2016+ wave of Office.js APIs – which, at the time of writing, is supported in Word, Excel, and OneNote1.

Office has a rich and wonderful legacy of programmability, with VBA spanning the full gamut of developer skill levels – from novice programmers tweaking simple macros, to professional developers writing complex Office customization. But for purposes of this book (and due to the more complex nature of the underlying Office Add-ins platform – for now, anyway), it is really the professional developer that is the target audience. With that definition, I mean someone who is well-versed in code, who is eager to learn something new, and who is unfazed by occasional difficulty (which you will naturally get, being on the cutting edge of a developing Office Add-ins platform). A working knowledge of JavaScript is a plus, as is power-user knowledge of the Office application that you are targeting.

Importantly, this book is not meant as a “standard” API reference manual, which might walk you through a series of tasks or APIs, focusing on the particulars. From my perspective, we have dozens of object types, and hundreds of methods and properties, in each of the Office hosts, all of these are dutifully documented in our online API reference documentation. To put these in print, in the form of a static book, would serve very little use.

Instead, this book is about the underlying principles that these APIs, however diverse, have in common. It is about the conceptual threads that bind the API surface area with the common runtime that they share. My goal is to paint the overarching picture of the API model, and to zoom in on details that transcend beyond any given API object. In this regard, this book is about going from copy-pasting online sample code, to understanding the key concepts that make these samples work. In short, this book is about writing effective, efficient, testable code, and the tools that will help you do so.

As you’ll read in the introduction to Office Add-ins, Office 2016 – and, of course, its Office Online & Mac & iOS equivalents – represents a major re-birth of the APIs, with each of the supported hosts getting a custom-tailored object model that dives far deeper than the original set of Office 2013’s “common APIs”. This new (2016+) wave of Office.js APIs is the subject of this book. You may find that a part of the content in this book – especially the more general topics like an introduction to Office Add-ins, prerequisites, debugging techniques, pro tips, and various “beyond-API” topics – will apply to both the 2013 and 2016 Office API models, but the book’s focus and raison d’etre is the new Office 2016 wave of Office.js APIs.

1.3 Book structure



1.4 A few brief notes

A few brief notes before we get started:

  • This book assumes some general programming experience, and ideally a working knowledge of JavaScript. “Chapter 4. JavaScript & Promises primer (as pertaining to our APIs)” can get you somewhat up to speed, but if you’re intending to develop add-ins professionally, you’ll likely want a more formal JS training / book.
  • While you may need various other web skills (HTML & CSS expertise, experience writing back-end services and databases, and so forth) in order to create the full-fledged Add-in, these topics are not Office Add-in specific, and are covered in countless books and web articles. Beyond the occasional reference or link – or just enough explanation to show an example – this book will focus mostly on the “Office.js” part of writing Office Add-ins.
  • You should have a reasonably working knowledge (and ideally, “Power User” knowledge) of the host application that you will be targeting. Here and elsewhere throughout the book, “host application” refers to the Office application that your add-in will be running in (Word, Excel, etc.). After all, the host application is what your end-users will use, and so you’ll need to have the expertise to test that it all works under all circumstances! An example: in Excel, worksheet names can only contain up to 31 characters, and may not include a number of reserved characters. This means that, to avoid having the application stop midway through an operation, you will want to pre-validate the user input, or handle the error through other means.
  • The concepts and tools throughout this book are applicable to all of the Office applications that have adopted the “new” Office.js model (which, at the time of writing, is Excel, Word, and OneNote). In this book, I will generally use Excel-based samples, both because I love Excel, and because – at least for VSTO – more developers were programming against Excel than against any of the other hosts. But again, if you substitute Word or OneNote in place of Excel in the sample code – and if you substitute in the analogous objects – the exact same concepts will apply. I’ll try to sprinkle in some Word examples too, for fairness’ sake.
  • Finally, while the book is written by someone who works on the Office Extensibility Platform team at Microsoft, the information and viewpoints presented in this book represent the viewpoints of the author, not of Microsoft as a company.

2 Introduction to Office Add-ins

2.1 Chapter structure & planned content

2.2 What’s “new” in the Office 2016 APIs (relative to 2013)?

As you’ll read in greater detail in Office.js: The asynchronous programming model, the Office.js API model was first introduced in Office 2013, under the name of “Apps for Office”. The model was an ambitious – if somewhat limited –- attempt to create a new kind of web technology based APIs, and have those APIs work cross platform and across multiple host applications.

Office 2016 has seen a complete overhaul of the original 2013 API model, creating new and host specific APIs. These APIs offer a rich client-side object model that accurately reflects the unique characteristics of each object type for each of the hosts. For example, the Excel APIs have classes for Worksheets, Ranges, Tables, Charts, and so forth, with each object type offering dozens of properties and methods. Similarly, the Word API now have the notion of Paragraphs, and Content Controls, and Images, and more. The team has also changed the overarching design process, building out the APIs with great transparency in the form of “open specs”, which are posted online for the community to give feedback on before implementation ever takes place. In a similar vein, several colleagues on my team – and an increasing number of folks from the external community – are active participants on StackOverflow, where we actively monitor any questions that are tagged with “office-js”.

The Office 2016 release also signifies a much deeper commitment to cross-platform consistency in the APIs. For example, all of the new APIs in Excel that are available in Office 2016 for Windows are also available in Excel Online, Excel for Mac and Excel on iOS. To keep up with developers’ demands, the new APIs are shipped continuously, usually once per quarter, with those APIs becoming immediately available to users of Office 3651 on the Desktop, on Office Online, and Mac/iOS.

Even beyond the APIs, Office 2016 offers enhancements to other parts of the overall programming framework – most notably, the ability for Office Add-ins to appear in the ribbon and to launch dialog windows. If you were previously on the fence for whether to look into Office Add-ins, I hope that the APIs described in this book will help change your mind.

2.3 Office.js: The asynchronous programming model

2.3.1 Why is Office.js async?

For those who have used VBA before, you will know that VBA code was always executed in a linear (synchronous) fashion. This is very natural for an automation task, where you’re essentially manipulating a document through a series of steps (and where, more often than not, the steps are similar to the sequential series of steps that a human would do). For example, if you needed to analyze the current selection in Excel and highlight any values that were greater than 50, you might write something like this:

VBA macro for highlighting values over 50
 1 Dim selectionRange As Range
 2 Set selectionRange = Selection
 3 Call selectionRange.ClearFormats
 4 
 5 Dim row As Integer
 6 Dim column As Integer
 7 Dim cell As Range
 8 
 9 For row = 1 To selectionRange.Rows.Count
10     For column = 1 To selectionRange.Columns.Count
11         Set cell = selectionRange.Cells(row, column)
12         If cell.Value > 50 Then
13             cell.Interior.Color = RGB(255, 255, 0)
14         End If
15     Next column
16 Next row
Screenshot of the code in action

When run, such macro would execute line by line, reading cell values and manipulating them as it went. The macro have complete access to the in-memory representation of the workbook, and would run almost at the native Excel level, blocking out any other native Excel operations (or, for that matter, any user operations, since the VBA code executes on the UI thread).

With the rise of .NET, VSTO – Visual Studio Tools for Office – was introduced. VSTO still used the same underlying APIs that VBA accessed, and it still ran those APIs in a synchronous line-by-line fashion. However, in order to isolate the VSTO add-in from Excel – so that a faulty add-in would not crash Excel, and so that add-ins could be resilient against each other in a multi-add-in environment – VSTO had each code solution run within its own fully-isolated AppDomain. So while .NET code itself ran very fast – faster than VBA by pure numbers – the Object-Model calls into Excel suddenly incurred a significant cost of cross-domain marshaling (resulting in a ~3x slowdown compared to VBA, in my experience).

Thus, the VBA code above – translated into its VB.NET or C# equivalent – would continue to work, but it would run far less efficiently, since each subsequent read and write call would have to traverse the process boundary. To ameliorate that, the VSTO incarnation of the code could be made significantly faster if all of the read operations were lumped into a single read call at the very beginning. (The write operations, of setting the background of each cell one-by-one, would still have to remain as individually-dispatched calls2).

Here is C# Version of the code that addresses the above scenario, but this time reads all of the cell values in bulk (see line #4 below).

VSTO incarnation of the code, with bulk-reading of the selection values (line #4)
 1 Range selectionRange = Globals.ThisAddIn.Application.Selection;
 2 selectionRange.ClearFormats();
 3 
 4 object[,] selectionValues = selectionRange.Value;
 5 
 6 int rowCount = selectionValues.GetLength(0);
 7 int columnCount = selectionValues.GetLength(1);
 8  
 9 for (int row = 1; row <= rowCount; row++)
10 {
11     for (int column = 1; column <= columnCount; column++)
12     {
13         if (Convert.ToInt32(selectionValues[row, column]) > 50)
14         {
15             Range cell = selectionRange.Cells[row, column];
16             cell.Interior.Color = System.Drawing.Color.Yellow;
17         }
18     }
19 }

Note that the VSTO code, when interacting with the documents, still runs on (a.k.a., “blocks”) the Excel UI thread, since – due to the way that Excel (and Word, and others) are architected – all operations that manipulate the document run on the UI thread. But fortunately for VSTO, the process-boundary cost – while an order of magnitude higher than that of VBA – is still relatively small in the grand scheme of things, and the batch-reading technique described above can alleviate a chunk of the performance issues… and so, VSTO could continue to use the same APIs in the same synchronous fashion as VBA, while letting developers use the new goodness of the .NET Framework.

With the introduction of Office Add-ins (known at the time as “Apps for Office”) in Office 2013, the interaction between the add-in and the host application had to be re-thought. Office Add-ins, based in HTML and JavaScript and CSS, needed to run inside of a browser container. On Desktop, the browser container is an embedded Internet Explorer process3, and perhaps the synchronous model could still have worked there, even if it took another performance hit. But the real death knell to synchronous Office.js programming came from the need to support the Office Online applications, such as Word Online, Excel Online, etc. In those applications, the Office Add-ins runs inside of an HTML iframe, which in turn is hosted by the parent HTML page that represents the document editor. While the iframe and its parent are at least part of the same browser window, the problem is that the bulk of the documents might not – and generally is not – loaded in the browser’s memory 4. This means that for most operations 5, the request would have to travel from the iframe to the parent HTML page, all the way to an Office 365 web server running in a remote data center. The request would then get executed on the server, which would dutifully send the response back to the waiting HTML page, which would pass it on to the iframe, which would finally invoke the Add-in code. Not surprisingly, such round-trip cost is not cheap.

To put it into perspective: imagine that the entire roundtrip described above takes 50 milliseconds, and we are running a hypothetical synchronous and JavaScript-icized version of the VSTO macro. Imagine that we have one hundred cells, of which 50 meet the criteria for needing to be highlighted. This would mean that we need to make one request to clear the formatting from the selection, another to fetch all of the data, and then 50 requests for each time that we set on individual cell’s color. This means that the operation would take (2 + 50) * 50 milliseconds, or just over 2.5 seconds. Perhaps that doesn’t sound all that terrible… but then again, we were operating on a mere 100 cells! For 1000 cells, we’d be looking at 25 seconds, and for 10,000 cells we would be at over four minutes. What would the user be doing – other than sitting back in this chair and sipping coffee – while waiting for the Add-in operation to complete?!

If synchronous programming was out, the only remaining choice was asynchrony. In the Office 2013 model, this was embodied by a whole bunch of methods that ended with the word “Async”, such as:

    Office.context.document.setSelectedDataAsync(data, callback);

In this Office 2013 design, every operation was a standalone call that was dispatched to the Office host application. The browser would then wait to be notified that the operation completed (sometimes merely waiting for notification, other times waiting for data to be returned back), before calling the callback function.

Thus, while the Office 2013 APIs solved the Async problem, the solution was very much a request-based Async solution, akin to server web requests, but not the sort of automation scenarios that VBA users were accustomed to. Moreover, the API design itself was limiting, as there were almost no backing objects to represent the richness of the Office document. The omission was no accident: a rich object model implies objects that have countless properties and methods, but making each of them an async call would have been not only cumbersome to use, but also highly inefficient. The user would still be waiting their 2.5 seconds or 25 seconds, or 4+ minutes for the operation to complete, albeit without having their browser window frozen.

The new Office 2016 API model offers a radical departure from the Office 2013 design. The object model – now under the Excel namespace for Excel, Word for Word, OneNote for OneNote, etc., – is backed by strongly-typed object-oriented classes, with similar methods and properties to what you’d see in VBA. Interaction with the properties or methods is also simple and sequential, similar in spirit to what you’d do in VBA or VSTO code.

Whoa! How is this possible? The catch is that, underneath the covers, setting properties or methods adds them to a queue of pending changes, but doesn’t dispatch them until an explicit .sync() request. That is, the .sync() call is the only asynchrony in the whole system. When this sync() method is called, any queued-up changes are dispatched to the document, and any data that was requested to be loaded is received and injected into the objects that requested it. Take a look at this incarnation of the cell-highlighting scenario, this time written using the new Office.js paradigm, in JavaScript:

Office.js incarnation of the VBA macro, with blocks of still-seemingly-synchronous code
 1 Excel.run(function (context) {
 2     var selectionRange = context.workbook.getSelectedRange();
 3     selectionRange.format.fill.clear();
 4 
 5     selectionRange.load("values");
 6 
 7     return context.sync()
 8         .then(function () {
 9             var rowCount = selectionRange.values.length;
10             var columnCount = selectionRange.values[0].length;
11             for (var row = 0; row < rowCount; row++) {
12                 for (var column = 0; column < columnCount; column ++) {
13                     if (selectionRange.values[row][column] > 50) {
14                         selectionRange.getCell(row, column)
15                             .format.fill.color = "yellow";
16                     }
17                 }
18             }
19         })
20         .then(context.sync);
21 
22 }).catch(OfficeHelpers.Utilities.log);

As you can see, the code is pretty straightforward to read. Sure, there are the unfamiliar concepts of load and sync, but if you squint over the load and sync statements and the Excel.run wrapper (i.e., only look at lines #2-3, and then #9-18), you still have seemingly-synchronous code with a familiar-looking object model.

If instead of plain JavaScript you use TypeScript (see A word on JavaScript and TypeScript), the Office.js code becomes even cleaner.

The same Office.js rendition, but this time making use of TypeScript 2.1’s async/await feature
 1 Excel.run(async function (context) {
 2     var selectionRange = context.workbook.getSelectedRange();
 3     selectionRange.format.fill.clear();
 4 
 5     selectionRange.load("values");
 6     await context.sync();
 7 
 8     var rowCount = selectionRange.values.length;
 9     var columnCount = selectionRange.values[0].length;
10     for (var row = 0; row < rowCount; row++) {
11         for (var column = 0; column < columnCount; column ++) {
12             if (selectionRange.values[row][column] > 50) {
13                 selectionRange.getCell(row, column)
14                     .format.fill.color = "yellow";
15             }
16         }
17     }
18 
19     await context.sync();
20 
21 }).catch(OfficeHelpers.Utilities.log);

In fact, if you ignore the Excel.run wrapper code (the first and last lines), and if you squint over the load and sync statements (lines #5-6 and #18), the code looks reasonably similar to what you’d expect to write in VBA or VSTO!

Still, programming using the new Office.js model – and, for VBA or VSTO users, adapting to some of the differences – has a definite learning curve. This book will guide you through getting started, understanding the API basics, learning the unconventional tricks of debugging Office.js code, and grasping the key concepts for writing performant and reliable add-ins. It will also give tips on making your development experience easier, on utilizing some of the lesser-known functions of Office.js framework, and on writing testable code. Finally, it will address some frequently asked questions, and give guidance on broader topics such as how to call external services, how to authenticate within Office add-ins, and how to publish and license your Add-ins.

2.3.2 What is meant by “the server”

Throughout the Office documentation, as well as this book, you may occasionally encounter sentences such as “after calling chart.getImage() and doing context.sync(), the server will populate the result object with the requested image”. What is meant by “the server” when you’re issuing a simple API request to Excel? Does this mean that even when you’re working on the Desktop, something is being processed by some remote Office 365 endpoint?

Don’t let the wording confuse you. In the context of an Office Add-in, by the server, what is really meant is the host application – that is, the Office application that contains your add-in, and which ultimately manipulates the document on the add-ins behalf. In the case of Office Online, this is in fact a remote Office 365 server. In case of Office on the Desktop or Mac or on iOS, however, it’s just the Excel/Word/etc. application. So, in all cases except Office Online, the operation is executed locally – no remote back-end server crunches the request. The reason it’s sometimes referred to as the server is that, from the running JavaScript’s perspective, it’s still a remote and unknown endpoint, somewhere beyond the JavaScript runtime’s application boundary… And so it’s still a “server” from that perspective, even if it’s often just a different process on the same machine.


3 Prerequisites

3.1 Chapter structure & planned content

3.2 Walkthrough: Building an Add-in using Visual Studio

If you’re just getting started and want to use Visual Studio, I highly recommend watching a walkthrough tutorial that I recorded in late 2015: https://channel9.msdn.com/series/officejs/End-to-End-Walkthrough-of-Excel-JavaScript-Add-in-Development.

In the video, I walk through the end-to-end process of building an Office Add-in for Excel: from launching Visual Studio, to writing a bit of JavaScript code that uses the new Excel 2016 APIs, to adding some basic UI tweaks, to talking through the publishing options, debugging, and more.

The video touches on some API topics that are covered in much greater detail in this book – but it also shows the process of creating a project and debugging using Visual Studio, which is crucial for getting started. If you’ve not built an Office Add-in before, I highly recommend the video.

For those looking for written instruction, just on the Visual Studio piece: there is also official documentation for creating a project on https://dev.office.com/docs/add-ins/get-started/create-and-debug-office-add-ins-in-visual-studio.

4 JavaScript & Promises primer (as pertaining to our APIs)

To those first entering JavaScript from the traditional C#/Java/C++ world, JavaScript may seem like the Wild West (and it is). While it’s beyond the scope of this book to teach you JavaScript all-up, this chapter captures some important concepts in JavaScript – and particularly JavaScript’s “Promises” pattern – that will greatly improve the quality and speed at which you write Office Add-ins.

If you are coming from VBA or VSTO, with little true JS experience, this chapter should be very helpful in establishing your footing. If you’re a JavaScript pro, you may still find some nuggets of information useful, but you can probably skim the pages, or skip ahead entirely.

4.1 Chapter structure & planned content

5 Office.js APIs: Core concepts

5.1 Chapter structure

5.2 Canonical code sample: reading data and performing actions on the document

To set the context for the sort of code you’ll be writing, let’s take a very simple but canonical example of an automation task. This particular example will use Excel, but the exact same concepts apply to any of the other applications (Word, OneNote) that have adopted the new host-specific/Office 2016+ API model.

Scenario: Imagine I have some data on the population of the top cities in the United States, taken from “Top 50 Cities in the U.S. by Population and Rank” at http://www.infoplease.com/ipa/a0763098.html. The data – headers and all, just like I found it on the website – describes the population over the last 20+ years.

Let’s say the data is imported into Excel, into a table called “PopulationData”. The table could just as easily have been a named range, or even just a selection – but having it be a table makes it possible to address columns by name rather than index. Tables are also very handy for end-users, as they can filter and sort them very easily. Here is a screenshot of a portion of the table:

The population data, imported into an Excel table

Now, suppose my task is to find the top 10 cities that have experienced the most growth since 1990. How would I do that?


The code in the next few pages shows how to perform this classic automation scenario. As you look through the code, if not everything will be immediately obvious – and it probably won’t be – don’t worry: the details of this code is what the rest of this chapter (and, to some extent, the book) is all about! But I think it’s still worth reading through the sample as is for the first time, to gain a general sense of how such task would be done via Office.js.

Note: In a more real-world scenario, this sample would be broken down into ~4 functions: one to read the data, another to calculate the top 10 changed cities, a third to write out a table, and a forth to bind a chart to the data. For purposes of this sample, though – and in order to make it easily readable from top to bottom, rather than having the reader jump back and forth – I will do it in one long function.

The canonical data-retrieval-and-reporting automation task
  1 Excel.run(function (context) {
  2     // Create proxy objects to represent the "real" workbook objects
  3     // that we'll be working with.  More information on proxy objects 
  4     // will be presented in the very next section of this chapter.
  5 
  6     var table = context.workbook.tables.getItem("PopulationTable");
  7  
  8     var nameColumn = table.columns.getItem("City");
  9     var latestPopulationColumn = table.columns.getItem(
 10         "7/1/2014 population estimate");
 11     var earliestCensusColumn = table.columns.getItem(
 12         "4/1/1990 census population");
 13  
 14     // Now, load the values for each of the three columns that we
 15     // want to read from.  Note that, to support batching operations
 16     // together (again, you'll see more in the upcoming sections
 17     // of this chapter), the load doesn't *actually* happen until
 18     // we do a "context.sync()", as below.
 19  
 20     nameColumn.load("values");
 21     latestPopulationColumn.load("values");
 22     earliestCensusColumn.load("values");
 23  
 24     return context.sync()
 25         .then(function () {
 26             // Create an in-memory array of the retrieved data,
 27             // where each object represents information about the city.
 28             var cityData = [];
 29  
 30             // Start at i = 1 (that is, 2nd row of the table --
 31             // remember the 0-indexing) in order to skip the header.
 32             for (var i = 1; i < nameColumn.values.length; i++) {
 33                 // A couple of the cities don't have data for 1990,
 34                 // so skip over those.
 35  
 36                 // Note that because the "values" is a 2D array
 37                 // (even though, in this particular case, it's just
 38                 // a single column), need to extract out the 0th
 39                 // element of each row.
 40                 var population1990 = earliestCensusColumn.values[i][0];
 41   
 42                 // Otherwise, push the data into the in-memory store
 43                 cityData.push(
 44                     {
 45                         name: nameColumn.values[i][0],
 46                         growth:
 47                             latestPopulationColumn.values[i][0] -
 48                             earliestCensusColumn.values[i][0]
 49                     }
 50                 );
 51             }
 52  
 53             var sorted = cityData.sort(function (city1, city2) {
 54                 return city2.growth - city1.growth;
 55                 // Note the opposite order from the usual 
 56                 // "first minus second" -- because want to sort in
 57                 // descending order rather than ascending.
 58             });
 59             var top10 = sorted.slice(0, 10);
 60 
 61             // Now that we've computed the data, create a new worksheet
 62             // for the output. Note that, per Excel behavior,
 63             // no two worksheets may share a name. Since we explicitly
 64             // specify the name that we'd like to give to the sheet,
 65             // this call will fail if an existing sheet with the same
 66             // name already exists. A later section in the book
 67             // describes methods for checking an object's existence.
 68       
 69             var outputSheet = context.workbook.worksheets.add(
 70                 "Top 10 Growing Cities");
 71  
 72             var sheetHeader = outputSheet.getRange("B2:D2");
 73             sheetHeader.values =
 74                 [["Top 10 Growing Cities", "", ""]];
 75             sheetHeader.merge();
 76             sheetHeader.format.font.bold = true;
 77             sheetHeader.format.font.size = 14;
 78  
 79             var tableHeader = outputSheet.getRange("B4:D4");
 80             tableHeader.values =
 81                 [["Rank", "City", "Population Growth"]];
 82             var table = outputSheet.tables.add(
 83                     "B4:D4", true /*hasHeaders*/);
 84 
 85 
 86             // Could use a standar loop:
 87 			//    "for i = 0; i < array.length; i++",
 88             // but opting for an often-more-convenient
 89 			// "array.forEach" approach instead:
 90             top10.forEach(function (item, index) {
 91                 table.rows.add(
 92                     null /* null means "add to end" */,
 93                     [
 94                       [index + 1, item.name, item.growth]
 95                     ]);
 96                 // Note: even though adding just a single row,
 97                 // the API still expects a 2D array for consistency
 98                 // and interoperability with Range.values.
 99             });
100  
101             // Auto-fit the column widths, and set uniform
102             // thousands-separator number formatting on the
103             // "Population" column of the table.
104             table.getRange().getEntireColumn().format
105                 .autofitColumns();
106             table.getDataBodyRange().getLastColumn()
107                 .numberFormat = [["#,##"]];
108  
109  
110             // Finally, with the table in place, add a chart:
111  
112             var fullTableRange = table.getRange();
113  
114             // For the chart, no need to show the "Rank", so only use
115             // the columns for the citys' names and population deltas:
116             var dataRangeForChart =
117                 fullTableRange.getColumn(1).getBoundingRect(
118                     fullTableRange.getLastColumn());
119  
120             // A note on the function call above:
121             // Range.getBoundingRect can be thought of like a 
122             // "get range between" function, creating a new range
123             // between this object (in our case, the column at
124             // index 1, which is the "City" column), and
125             // the last column of the table ("Population Growth").
126  
127             var chart = outputSheet.charts.add(
128                 Excel.ChartType.columnClustered,
129                 dataRangeForChart,
130                 Excel.ChartSeriesBy.columns);
131  
132             chart.title.text = "Population Growth: 1990 to 2014";
133 
134             var tableEndRow =
135                 3 /* row #4 -- remember that we're 0-indexed */ +
136                 1 /* the table header */ +
137                 top10.length /* presumably 10 */;
138  
139             var chartPositionStart = outputSheet.getRange("F2");
140             chart.setPosition(
141                 chartPositionStart,
142                 chartPositionStart.getOffsetRange(
143                     19 /* 19 rows down, i.e., 20 rows in total */,
144                     9 /* 9 columns to the right, so 10 in total */
145                 )
146             );
147  
148             outputSheet.activate();
149         })
150         .then(context.sync);
151 })
152 .catch(function (error) {
153     console.log(error);
154     // Log additional information, if applicable:
155     if (error instanceof OfficeExtension.Error) {
156         console.log(error.debugInfo);
157     }
158 });




If you run the above code, here is what the resulting sheet looks like:


Now, let’s dive in and see how this sample works.



5.3 Understanding proxy objects

5.3.1 Proxy objects: the building-blocks of the Office 2016 API model

At the fundamental level, the Office 2016+ API model consists of local JavaScript proxy objects, which are the local placeholders for real objects in the document. You get a proxy object by calling some property or method (or a chain of properties or methods) that originate somewhere off of the request context. For example, if you have an Excel object called workbook, you can get cells A1:B2 on Sheet1 by calling

    var myRange = workbook.worksheets.getItem("Sheet1")
        .getRange("A1:B2");

To apply values to the object, you would simply set the corresponding properties. For example:

    myRange.values = [["A", "B"], [1, 2]];
    myRange.format.fill.color = "yellow";

The same applies to calling a method – you simply invoke it. For example,

    myRange.clear();

Importantly, what happens beneath the covers is that the object – or, more accurately, the request context from which the object derives – accumulates the changes locally, much like a changelist in a version-control system. Nothing is “committed” back into the document until you recite the magical incantation:

    context.sync();

5.3.2 The processing on the JavaScript side

Let’s pause for a moment. If all of these JavaScript objects are simply proxy objects, and all you’re doing is queuing up commands, how long can you keep going? It turns out that as long as you don’t need any information back from the document (i.e., you’re not reading back some values and then multiplying or formatting or doing whatever else with them), you can create an arbitrarily-long queue of commands. In the population-data example, within the block of code that’s inside of .then(function() { ... }), I issued a bunch of commands all at once: creating a new sheet, setting some values on it, overlaying it with a table and adding a bunch of rows, formatting the table, adding and positioning a chart, and finally setting focus on the newly-created output sheet.

On the JavaScript side, whenever I called a property that involved primitive types – so, for example, setting sheetHeader.values – the action would get noted and added to the internal queue, and then the JavaScript operations would proceed as is. Likewise, whenever I called a void returning method – such as sheetHeader.merge() – this too would get internally noted and added to the queue. If I called a property or method that returns an API object, though – for example, context.workbook.worksheets.add(...), or outputSheet.getRange("B2:D2") – Office.js would go and create a new on-the-fly proxy object and return it to me, noting internally how this object was derived.

In all those cases, the overarching paradigm is that Office.js tries to create the illusion that the operations are synchronous, even though in reality it simply added them to an internal queue, and waits until they can be “flushed” as part of a context.sync().


5.3.3 The processing on the host application’s side

The previous section described what happens on the JS layer for outgoing commands. But what happens on the receiving end? On the host application’s side (a.k.a, “the server”), the long chain of commands is executed one-by-one, resolving any proxy objects as the need arises (“supply on demand”?). So for example, when a call comes in to add a sheet, the resulting object is internally stored for the duration of the batch; and then, when this object gets used in a subsequent line for retrieving a range, this range is likewise created (and internally identified as being created off of a particular parent object, using method such-and-such with parameters such-and-such). When a subsequent call comes in to set values of the range, this call – which in turn references the object that it’s operating on – is identified to mean the range that was just recently created. And so forth.

If this all seems magical – well, to some extent it down right is! But it is important to remember that until you call “context.sync()”, you’re effectively building castles in the sky. And, an interesting corollary to this is that even “erroneous” code (such as accessing a non-existing item) won’t throw until it’s part of the “context.sync()” execution – up to that point, the changes are merely queued up. So for example, you can easily write the following code:

Oh-oh. Code executing past an obvious OM error
1 context.workbook.worksheets.getItem
2     ("SheetThatDoesNotExist").getRange("A1:B2").clear();
3 console.log("I'm past the call");
4 context.workbook.getSelectedRange().format.fill.color = "yellow";
5 
6 context.sync().then(...).catch(...);

When you run the above code, execution at the JavaScript layer will happily keep going past the SheetThatDoesNotExist line to the next JavaScript call: console.log("I'm past the call)". And in fact, it will keep on going all the way through till context.sync().

This is probably not what you’d expect – after all, shouldn’t the invalid-sheet-fetching throw an error? In a synchronous VBA/VSTO-like world, it absolutely would. But remember, in this proxy-object batched-execution world of Office.js, all you’re doing is building up an array of commands, but not yet executing them! As discussed in “Why is Office.js async?”, querying the document every time to perform a document-object-model request would be impractical. And so, without that direct feedback, the JavaScript layer has no way of knowing that SheetThatDoesNotExist does not exist in the workbook. Only when the JS layer is waiting on the asynchronous completion of context.sync() – and when the host application is meanwhile processing the queue of commands – does the error become apparent. At that point, the execution halts on the host applications’ side, dropping any other calls that were queued up in the batch (in this case, setting the selected range to yellow). The host then signals back to the JavaScript engine to return a failure in the form of a rejected promise. But importantly, any non-API-object JavaScript calls that preceded context.sync() will have already run!

This is the trickiest part of the new Office.js’s async model, and the most common pitfall (especially if you accidentally let a context.sync() run loose without awaiting it, thereby executing a bunch of JavaScript before the document object-model calls came back). So remember: it’s all proxy objects – and nothing happens until you call context.sync()"!


5.4 Loading properties

5.4.1 The object.load("property-names") method call

The preceding section only talked about setting properties or calling methods on document objects. What about when you need to read data back from the document?

For reading back document data, there is a special command on each object, object.load(properties). An identically-functioning method can also be found on the context object: context.load(object, properties). The two are 100% equivalent (in fact, object.load calls context.load internally), so which one you use is purely stylistic preference. I generally prefer object.load, just because it feels like I’m dealing with the object directly, rather than going through some context middleman/overlord. But again, it’s purely stylistic.

The load command is queued up just like any of the others, but at the completion of a context.sync() – which will be covered in greater detail in a subsequent “Syncing object state” section – an internal post-processing step will automatically populate the object’s properties with the requested data. The property names that you pass in into the object.load(properties) call are the very properties that – after a sync – you will be able to access directly off of the object.

For example, if you look at the IntelliSense for the Excel Range object, you will see that it contains properties like range.values and range.numberFormat. If you need these values and number formats in order to execute your function (maybe to display them to the user, or to calculate the minimum/maximum, or to copy them to a different section of the workbook, or to use them to make a web service call, etc.), you would simply list them as a comma-separated list in the .load function, as show in the example below:

Loading and using object properties
 1 var myRange = context.workbook.worksheets
 2     .getItem("Sheet1").getRange("A1:B2");
 3 
 4 myRange.load("values, numberFormat");
 5 
 6 context.sync()
 7     .then(function () {
 8         console.log(JSON.stringify(myRange.values));
 9         console.log(JSON.stringify(myRange.numberFormat));
10     })
11     .catch(...);


Importantly, just as in the code above, the reason to call object.load is that you intend to read back the values in the code following the context.sync(), and that your operation can’t continue without having that information. This brings us to arguably the most important rule in all of Office.js:




The return value of object.load(...) is the object itself. This is done purely for convenience, saving a line or two of vertical space:

Return value of object.load(...)
    ...
    .then(function() {
        return worksheet.getRange("A1").load("values")
    })
    ...

    // Is equivalent to the more verbose rendition, of
    // fetching the object, assigning it to a variable,
    // calling "load", and finally returning the object:
    
    ...
    .then(function() {
        var range = worksheet.getRange("A1");
        range.load("values");
        return range;
    })
    ...

5.4.2 Scalar vs. navigation properties – and their impact on load behavior

A bit of terminology: simple properties like values or numberFormat off of an API object like Excel.Range are called scalar properties. These scalar properties are either primitive types (numbers, Booleans, strings), arrays of primitive types, or complex objects (i.e., plain JS objects composed of primitives or arrays of primitives; the moral equivalent of a C++/C# struct). One way to think about scalar properties is that they are the sort of simple properties you could serialize and send over the wire via JSON, XML, etc.

By contrast, any API objects are instances of API classes, and are not serializable (they are classes, with methods and all, not just pure data!). Such properties are called “navigation properties”, because they’re used to navigate you from one API object to another – but they ultimately carry no data of their own. So, whereas you need to load the scalar properties that you intend to use, you do *not* need to load navigation properties.

In case an analogy helps, you can imagine the object hierarchy as a tree structure. In this case, let’s imagine we start with the Range object.

  • Any leaf node properties – whether on the Range object directly, or on one of its descendants – are scalar properties (marked in green).
  • Any non-scalars are navigation properties (marked in blue).


When specifying properties to load, you can specify either scalar properties either directly off the object, or scalar properties that are accessible via navigation properties off the object. For example, to load both the cell address and the fill color of a Range, you would specify the following (note that the scalar property is always the “leaf-node” one – and any preceding slashes indicate navigation properties):

    myRange.load("address, format/fill/color");

Technically speaking, you are not required to include the property names. You could (though you shouldn’t!) write a load statement with no property names passed in (e.g., myRange.load() or context.load(myRange)). But if you do that, Office.js will load all scalar properties on the object – which on the Excel Range object, for example, is over a dozen! – even if you end up using only one of them! Moreover:

  1. For some properties (i.e., Range values in Excel), loading a particular property might fail, even if some of the other properties would load just fine. Concrete example: an unbounded range like columns A:C will throw an error if you try to load their .values property (that’s upwards of a million cells in each row, and so the API blocks unbounded ranges!), but would respond just fine to the address property.
  2. Even if you don’t notice the difference on Desktop, the delay (and bandwidth costs!) of the extra properties will almost certainly show in Office Online.
  3. Even if an object exposes just two properties today, and you’re using them both, that’s not to say that more won’t be exposed eventually! This means that your add-in might slow down (and start chewing through more bandwidth) over time, as more APIs are added. In short, even if you’re currently using all properties of the object, and hence don’t see a need to load the properties explicitly, you are still better off writing them out to prevent future slowdown.
  4. The effects of #2 and #3 are multiplied ten-fold or hundred-fold when loading a collection (proportional to the number of elements in the collection – which often-times is something that the user controls, but not the developer!)

5.4.3 Loading and re-loading

If you receive a PropertyNotLoaded error when accessing a property, you must have forgotten to load it! The fix is fortunately very simple: add the load, and ensure there is a sync somewhere upstream from where you’re using the property.

Note that the PropertyNotLoaded error will only be thrown when you initially forget to load the property. If you manipulate the object (i.e., set the formula property on a cell where you’ve previously loaded a value), or if it gets impacted by other external factors (i.e.: a formula dependency on another cell), it is your responsibility to remember to re-load the property:

Re-loading properties
 1 // ...
 2 // Initial loading of values:
 3 myRange.load("values");
 4 
 5 return context.sync()
 6     .then(function() {
 7         // ... Some operations that impact range values
 8 
 9         // Re-load the cell values to retrieve the latest:
10         myRange.load("values");
11     });
12 // ....

When loading or re-loading properties on a regular (non-collection2) object, note that only the specified property names will be re-loaded; the rest will be kept as is. So, for example, loading myRange.load("address") within the .then above would not have refreshed the values property!




5.5 Syncing object state

5.5.1 De-mystifying context.sync()

Throughout the preceding sections, you will have seen the term – and seen example usage of – context.sync(). The purpose of context.sync() is for the developer to declare that he/she is done with either the entire batch or a portion of it. An invocation of context.sync() returns a Promise object, which can then be .then-ed with follow-up operations (which in turn might also call to context.sync() to signal the completion of their portion of the batch). Importantly, chaining Promises together returns a neta-Promise, which resolves only when all of the child Promises have finished chaining. This Promise is what must be returned out of the batch function, so that .run-method knows that the OM operations have completed3.

As an add-in author, your job is to minimize the number of context.sync() calls. Each sync is an extra round-trip to the host application; and when that application is Office Online, the cost of each of those round-trip adds up quickly.

If you set out to write your add-in with this in principle in mind, you will find that you need a surprisingly small number of sync calls. In fact, when writing this chapter, I found that I really needed to rack my brain to come up with a scenario that did need more than two sync calls. The trick for minimizing sync calls is to arrange the application logic in such a way that you’re initially scraping the document for whatever information you need (and queuing it all up for loading), and then following up with a bunch of operations that modify the document (based on the previously-loaded data). You’ve seen several examples of this already: one in the intro chapter, when describing why Office.js is async; and more recently in the “canonical sample” section at the beginning of this chapter. For the latter, note that the scenario itself was reasonably complex: reading document data, processing it to determine which city has experienced the highest growth, and then creating a formatted table and chart out of that data. However, given the “time-travel” superpowers of proxy objects, you can still accomplish this task as one group of read operations, followed by a group of write operations.

Still, there are some scenarios where multiple loads may be required. And in fact, there may be legitimate scenarios where even doing an extra sync is the right thing to do – if it saves on loading a bunch of unneeded data. You will see an example of this very soon.


5.5.2 Returning the context.sync() promise

Before we dive into an actual example, it’s worth pointing out a common pitfall and a way to get around it. Because a batch will typically contain two or more sync calls – each of which returns a Promise – it is pivotal NOT to break the Promise chain, or else you may encounter some unexpected and often hard-to-diagnose behavior (see “Chaining Promises, the right way”). This means that:

  1. You must remember to return context.sync() out of the batch function.
  2. You must also remember that the body of each .then must either be fully synchronous, or return a Promise.

#1 is fairly straightforward. In fact, if you’re using TypeScript, you will get a compilation error if you forget the return (the batch function must return a Promise, whereas by forgetting the return you’re making the function void). Even if you’re using just plain ol’ JavaScript, as long as you’ve added a .catch statement, you will see an error during runtime.

RunMustReturnPromise: The batch function passed to the “.run” method didn’t return a promise. The function must return a promise, so that any automatically-tracked objects can be released at the completion of the batch operation. Typically, you return a promise by returning the response from “context.sync()”.

Remembering #2, on the other hand, takes more effort. There is unfortunately no error-checking that can catch the issue for you, as it’s perfectly reasonable (and very common) to have a .then-function that doesn’t return any values (i.e., is void). And moreover, when you do forget the return statement, you end up in non-deterministic territory, where the code might or might not work based on timing; and where diagnosing the issue is notoriously difficult (with a broken Promise chain, you also lose error-handling; yay!).

Having unintentionally broken the Promise chain myself, on a number of occasions – and having seen beginner Office.js developers break the Promise chain time and time again – I personally prefer to sidestep the issue altogether, using the technique covered in “Passing in functions to Promise .then-functions”. Namely, I make all functions that make OM calls be synchronous (i.e., only queue up the operations, but not dispatch the context.sync() yet), and then I follow up these synchronous “.then”s with corresponding .then(context.sync)-s. Notice how the latter does not invoke context.sync (there are no () after sync, as you’re passing in a function reference, not a return value). And you can’t be guilty of forgetting a return statement, if there is physically no return statement to begin with!

In case the above wasn’t 100% clear, let me take a moment to “derive” this transformation:

Let’s start at the beginning. Imagine you have code like:

...
return context.sync()
    .then(function() {
        // ... synchronous OM code

        return context.sync();
    })
    .then(...)

This code makes some OM calls that queue up some operations, and then it invokes context.sync() in an “invoke-style” sync. If we wanted to, we could split out the two parts of the code into two consequitive “.then”s: one for the synchronous OM code, and one for the asynchronous sync invocation. There isn’t much sense in doing this, but it’s just an intermediary process in our derivation:

...
return context.sync()
    .then(function() {
        // ... synchronous OM code
    })
    .then(function() {
        return context.sync();
    })
    .then(...)

Now, all that the second .then does is create an anonymous function whose sole purpose is to return the result of an invocation of another [named] function. This is needlessly complex. Since functions are first-class citizens in JavaScript, we can simplify the second .then to take the context.sync function directly, “reference-style”, without an invocation!

...
return context.sync()
    .then(function() {
        // ... synchronous OM code
    })
    .then(context.sync)
    .then(...)

Putting this all together, it means that instead of all the “invoke-style” calls – with the risk of forgetting a return-statement – in the illustration in the preceding section, you can now use the “reference-style” approach to dispatch the sync in a safer manner:



If you follow the transformation prescribed by this pattern, you’ll find that typically, the only return statement you’ll need is the first return context.sync() invocation – which, between the TypeScript compiler and the runtime error, should be relatively straightforward. For the rest, the problem is avoided altogether via the “reference-style” approach.


Notes

The book and its structure

1As of December 2016, Word, Excel, and OneNote have all adopted the new 2016 wave of Office.js APIs. Outlook – though steadily continuing to expand its APIs – is continuing to use the “Office 2013” style of APIs even in its Office 2016 applications (partially because many of its scenarios are more about data consumption, and less about object-by-object automation, which is where the new model would shine). PowerPoint and Access have, so far, remained unchanged in their API surface relative to 2013; and Project, though it has added a number of APIs, is also still using the “Office 2013” style.

2 Says the person who, in high school, effectively did just that on his first day at a ski resort… And so can speak with great authority on the subject.

Introduction to Office Add-ins

1For more information on the “MSI” vs. subscription-based Office installations, see “Office versions: Office 2016 vs. Office 365 (Click-to-Run vs. MSI), Deferred vs. Current channel”.

2Technically not 100% true, since you could create a multi-area range and perform a single “write” operation to it – but there is a limitation on how many cells you can group together, and the performance boost is still not nearly as good as what the VBA performance would have been. There are some APIs, like reading and writing to values or formulas, that can accept bulk input/output, but most of the rest – like formatting – must be done on a range-by-range basis.

3If you’re curious for how the interaction between Office and the embedded Internet Explorer control is done: it is through a window.external API that IE provides, which acts as the pipe between IE and the process that created it. The same technique is possible in .NET and WinForms/WPF applications as well. See https://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.objectforscripting(v=vs.110).aspx for more info on the latter.

4Imagine a 50MB Word document, complete a bunch of images. Does it make sense for the browser to receive all 50MB at once, or could it progressively load only the pages that it needs in the neighboring vicinity, and only serve up the compressed and optimized, rather than raw, copies of the images?

5The actual amount of code that can be run locally (i.e., does not require a roundtrip to the server) varies greatly depending on the host application. On one extreme end of the spectrum, Excel Online requires that pretty much all operations are executed remotely. On the opposite side, OneNote Online has a pretty good local cache of the document, and needs to call out to the server much less frequently.

Office.js APIs: Core concepts

1There is hardly a rule without an exception. In Excel in particular, the API exposes the native Excel function to JavaScript under the workbook.functions namespace (i.e., workbook.functions.vlookup(...). Within such functions, it made sense to keep the indexing consistent with the sort of native Excel formulas that a user would type (otherwise, what’s the point?!) – and so there, any index is 1-indexed. For example, if the third parameter to a “vlookup” call, “colIndexNum”, equals to 3, this refers to the third column, not the 4th one. But in terms of the object model, everywhere else (and in JavaScript itself, of course!), everything is zero-indexed.

2For re-loading on collections instead of regular API objects, see the end of section Loading collections”. Essentially, re-loading a collection blows away any existing objects and their properties, so you have to load everything from scratch – you can’t just additively load a couple of new properties while maintaining the existing items.

3See “What does .run do, and why do I need it?” regarding what .run does with this information.