Programming Episodes
Duncan McGregor

Introduction

The aim of this book is to teach software development by example. Rather than using canned examples with predetermined teaching goals and known solutions, it documents programming episodes, with real-world uncertainties, blind alleys, shortcuts and failures.

This is not a book to teach you how to program. It assumes that you have already mastered the basics, consider yourself an accomplished programmer even, but want to improve. You have a good grounding in a modern programming language, but are aware that there are techniques and mindsets that you have yet to experience. Like me, you realise that, however big a fish you are, you have so far been swimming in a small pond.

This is not a book about Kotlin. The examples are in Kotlin because I believe that the language strikes a good balance between easy readability and expressiveness. Before Kotlin I might have chosen Python, but I prefer strongly-typed languages, especially where I need to communicate with other programmers or my future self. My aim is to explain those features of the language that might not be obvious to most programmers, but, at the time of writing, I’m aware that I haven’t addressed this at all.

What this book is, I hope, is a guide to the path from journeyman to master, from programmer to all-round developer. It covers not only the act of writing, testing and refactoring code but also the process of deciding what should be written, how to strike a balance between adding functionality and improving what we have, how to assess and manage risk and costs, and how to maintain a good relationship with our customers. It is, in short, a book about engineering.

While lots of books cover this material, and I recommend some of them in the text, it’s hard to find one that covers all these topics in the optimum depth. My approach to deciding what to cover is simple: start work on non-trivial software development, see what crops up, and talk about it as if we’re pair programming and you’re interested in what I have to say. If this is a good approach, everything important will be addressed, and really important things will be covered from several angles. Otherwise I may have to invent problems to illustrate an important topic, or shoehorn a discussion into an inappropriate gap. See if you can spot these happening!

As far as I can muster the energy to do so, I have tried to document things as they actually happened. There are times when I have expurgated some tedious details in order to keep you reading, but I have not returned to an example to paint it in a better, or worse, light than live. Post-production has been limited to formatting, copy-editing and conclusion-drawing; no scenes were filmed out of order or staged for entertainment value.

To keep me honest, you can see the history of this book in its GitHub repository. That would also be a great place to submit any corrections, suggestions or feedback.

Request for Feedback

If you’ve got this far through the introduction, I hope that I have at least piqued your interest. Even if I haven’t though, can I ask you to give me some feedback?

If this section is still in the book it is still some way from being completed. Honestly, I don’t know how it is working out. I’m really enjoying writing it, some people have said that they enjoyed reading it, but whether it is adding value to the world, and whether the format and topic stand a chance of meeting their aims, are still up for grabs.

If the book does interest you, and you reach the end of Chapter One and still have any will to live left, please follow the instructions there to send me feedback.

If you’ve decided on the text so far that the book isn’t for you, but can spare me 2 more minutes of your time, then please …

… please copy the following into an email to me, start answering at the top, keep going until you don’t think you owe me any more of your precious time, and send. I’m sorry that I don’t have an embedded form or anything - yet. Maybe if there is enough encouragement the processing of the form will become a chapter!

  1. Should I continue writing this book? [y | n]
  2. Are you in the book’s target market (3 - 10+ years of programming experience)? [y | n]
  3. What do you think of the recursive nature of the material, writing about the development of the software to assist the writing? [0 (disaster) - 10 (triumph)]
  4. How likely are you to recommend the book to a friend or colleague? [0 (not at all) - 10 (extremely)]
  5. Would you pay to read the completed book? [y | n]
  6. Do you care about paper copies of books? [y | n]
  7. Is there anything else you’d like to say? [ ... ]

If you do like what I’ve written, then there is more available on Leanpub where you can read the other chapters for free. And please invite a friend.

Thank you

Duncan

Chapter 1 - Spike

I’m writing this in Lisbon airport, an hour before my flight back to London is due to leave. On this holiday I’ve decided I’m pretty committed to writing this book, and now I’m wondering what technologies I should use for its production.

What criteria should I use to make the decision? Well, the working title of the book is “Programming Episodes - learning modern development by example” so that might help. Defining modern development is something that I hope will become clear by the end, but as I think it through, I realise that deciding how to write the book is fundamentally an engineering decision. Engineering is about trade-offs, choice, optimisation. It’s about making the best use of your time or your client’s money. And very often in programming that comes down to trading the cost to make something now against the cost of fixing it later.

I know that this book will take a long time to write, and I also know that it will be full of code samples that we’d both like to actually work by the time you come to read them. So I’m looking for a way to mix code and prose that is easy to write and maintain.

And that’s why these words are being written into the IntelliJ IDE rather than Word. They are plain text now, although I’m already thinking of them as Markdown - I just haven’t needed to emphasise anything yet. I suspect that this prose will end up as comments in Kotlin source files, and that some sort of build system will put the book together for me. In fact, if you’re reading this on paper you can’t see it yet, but this file is already surrounded by a Gradle build and is in the context of a Git repository. This is the way that modern development projects in Kotlin will start, and so it feels rather fitting that the book does too.

The First Kotlin File

Risk reduction is high on the list of modern programming concerns. I want to know that the idea of embedding the content of this book in its own source code is viable. So I’m going to start with a spike - a prototype that goes deep into the heart of the problem but doesn’t attempt breadth. This way we hope to find major problems that might scupper an idea with the least effort.

I’m typing this text as comments into a Kotlin source file called 2-spike.kt. Kotlin uses the same /* block comment markers */ as Java and C. So far I’ve found that the markers don’t get in the way of my text, but that I also don’t get a nice Markdown preview of the formatting. That’s alright though, as the nature of Markdown is that I can *see* the formatting even though it’s only characters.

So text is OK. What about code?

    import org.junit.Test
    import kotlin.test.assertEquals

    class SpikeTest {

        @Test fun `2 plus 2 is 4`() {
            assertEquals(4, add(2, 2))
        }
    }

    fun add(a: Int, b: Int) = a + b

Well, I can run that directly in the IDE and from the Gradle build tool, so I guess that’s a start. Thinking ahead, it occurs to me that I’m going to want to be able to write comments in the code. These could get confused with the comments that are the book text, so I’d better start using a different marker for the latter. Taking a lead from Javadoc, which extends standard block comments but doesn’t change how the compiler parses them, I’m going to go with

    ⁠/*-
    This is book text which the compiler will ignore.
    ⁠-*/

    /* This is a standard comment block
     ⁠*/
    fun the_compiler_will_process_this() {

    }

for now and see how it goes.
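One related detail is a fact about Kotlin, not about the book’s tooling. Unlike Java and C, Kotlin block comments nest. So a code comment that appears inside a region of book text does not end the region early. Here is a minimal check. The function name stillCompiled is just for illustration.

```kotlin
/* Outer comment, standing in for a region of book text.
   /* An inner code comment: in Java its closing marker would end everything. */
   This line is still commented out, because Kotlin block comments nest.
*/
fun stillCompiled() = 42 // compiles only because the comment above nests
```

In a Java file the same listing would not compile. The first closing marker would end the outer comment and leave the prose exposed to the compiler.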

It occurs to me that this is the opposite of the normal Markdown approach, where we can embed code inside blocks, with the text as the top level. Here the code is the top level, and the Markdown is embedded in it. The advantage is that we have to do nothing to our text to compile it, the disadvantage is that something is going to have to process the files to do something with the text before we can render the Markdown. That seems reasonable to me though, as I write code for a living, and an awful lot of that code does just this sort of thing to text. In fact, now that I have a programming task to do, I can see whether or not I can describe that task in a way that makes sense to both of us.

First Attempt at Publishing

Let’s write some code to take a mixed prose and source file and write a Markdown version. I’m not entirely sure what that output should look like yet, but I’ll know it when I see it. This is the sweet spot for Approval Tests, which will allow us to make rapid progress but at the same time know when something that used to work doesn’t any more.

When you run an Approval test it stores the output from the test, and compares it to an approved version of that output. If they differ, or if there was no approved version, the test fails. You could write this logic yourself, indeed I have, so I’ll use the OkeyDoke Approval Tests library, which integrates with JUnit through a Rule.
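To make that mechanism concrete, here is a minimal hand-rolled sketch of the approval check just described. It is not OkeyDoke’s API. The function name assertApproved, its parameters and the .actual/.approved naming (borrowed from the files mentioned in this chapter) are assumptions made for illustration.

```kotlin
import java.io.File

// Hypothetical hand-rolled approval check; this is NOT the OkeyDoke API.
// Writes <testName>.actual, then compares it with <testName>.approved.
fun assertApproved(dir: File, testName: String, actual: String) {
    dir.mkdirs()
    val actualFile = File(dir, "$testName.actual")
    val approvedFile = File(dir, "$testName.approved")
    // Always record this run's output so that it can be inspected or approved.
    actualFile.writeText(actual)
    if (!approvedFile.exists())
        throw AssertionError("No approved version yet: inspect ${actualFile.name} and approve it")
    if (approvedFile.readText() != actual)
        throw AssertionError("${actualFile.name} differs from ${approvedFile.name}")
    // The output matched the approved version, so the .actual file is no longer needed.
    actualFile.delete()
}
```

With this shape, approving just means copying the .actual file to the .approved name and running the test again.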

OK, time to write some code.

    class CodeExtractorTests {

        @Rule @JvmField val approver = approvalsRule()

        @Test fun writes_a_markdown_file_from_Kotlin_file() {
            val source = """
            |/*-
            |Title
            |=====
            |This is Markdown paragraph
            |-*/
            |
            |/* This is a code comment
            |*/
            |fun aFunction() {
            |   return 42
            |}
            |/*-
            |More book text.
            |-*/
            """.trimMargin()
            approver.assertApproved(translate(source))
        }
    }

    fun translate(source: String) = source

Here I’ve written an example file content as a Kotlin here document, and then, for now, an identity translate function just to get us running. Running the test creates a file CodeExtractorTests.writes_a_markdown_file_from_Kotlin_file.actual with the contents

    /*-
    Title
    =====
    This is Markdown paragraph
    -*/

    /* This is a code comment
    */
    fun aFunction() {
       return 42
    }
    /*-
    More book text.
    -*/

which is what we assertApproved. The test fails, as there was no approved content to compare with the actual. We can make it pass by approving the content with

    cp 'CodeExtractorTests.writes_a_markdown_file_from_Kotlin_file.actual' 'CodeExtractorTests.writes_a_markdown_file_from_Kotlin_file.approved'

and running it again.

Of course we have just approved something that we know to be incorrect, but we’re taking baby steps here. Now we need to improve the translate function. I was about to start by stripping out the lines beginning with /*- and -*/, but if we do that first we’ll lose information about where the code starts. In fact, thinking it through, I realise that this page has code that we don’t want to view (the package and import statements at the top), and I’m sure that in general there will be other code that is required to compile but doesn’t contribute to the narrative. Maybe we need to explicitly mark code to be included.

    @Test fun writes_a_markdown_file_from_Kotlin_file() {
        val source = """
        |package should.not.be.shown
        |/*-
        |Title
        |=====
        |This is Markdown paragraph
        |-*/
        |import should.not.be.shown
        |//`
        |/* This is a code comment
        |*/
        |fun aFunction() {
        |   return 42
        |}
        |//`
        |/*-
        |More book text.
        |-*/
        """.trimMargin()
        approver.assertApproved(translate(source))
    }

Here I’ve used a line comment with a backtick, //`, to mark the beginning and end of the code we want to see.

Now we can first implement the code to strip out the block comments that hide our prose from the Kotlin compiler

    fun translate(source: String) = source.split("\n")
        .filterNot { it.startsWith("/*-") || it.startsWith("-*/") }
        .joinToString("\n")

The first run of the test fails as the actual file is different from the approved - inspecting it I can see that the differences are indeed the stripped out block comment markers - the file is now

    package should.not.be.shown
    Title
    =====
    This is Markdown paragraph
    import should.not.be.shown
    //`
    /* This is a code comment
    */
    fun aFunction() {
       return 42
    }
    //`
    More book text.

Let’s get to work putting the code between our special markers into a Markdown code block.

    fun translate(source: String): String {
        var inCodeBlock = false
        var inTextBlock = false
        return source.split("\n")
            .map {
                when {
                    !inCodeBlock && it.startsWith("//`") -> {
                        inCodeBlock = true
                        "```kotlin"
                    }
                    inCodeBlock && it.startsWith("//`") -> {
                        inCodeBlock = false
                        "```"
                    }
                    !inTextBlock && it.startsWith("/*-") -> {
                        inTextBlock = true
                        ""
                    }
                    inTextBlock && it.startsWith("-*/") -> {
                        inTextBlock = false
                        ""
                    }
                    inTextBlock -> it
                    inCodeBlock -> it
                    else -> ""
                }
            }
            .joinToString("\n")
    }

Now I won’t pretend that was easy to write, or that I’m proud of it, but it does work, yielding

    Title
    =====
    This is Markdown paragraph


    ```kotlin
    /* This is a code comment
    */
    fun aFunction() {
       return 42
    }
    ```

    More book text.

Note that for now I’ve chosen to leave blank lines where the markers and ignored text were, as they make the output easier to follow.

Now of course, I have to try the code on the file that I’m typing into right now, as that is the real point.

    import java.io.File

    fun main(args: Array<String>) {
        val markdown = translate(File("src/test/java/com/oneeyedmen/book/Chapter_01_Spike/01_c_Spike-First-Attempt-at-Publishing.kt").readText())
        File("build/delme").apply {
            mkdirs()
            resolve("out.md").writeText(markdown)
        }
    }

It doesn’t quite work as I expected: it doesn’t find the publish-code markers (//`) when they are indented with spaces. I suppose we should add that case to our test suite.

    @Test fun writes_a_markdown_file_from_Kotlin_file() {
        val source = """
        |package should.not.be.shown
        |/*-
        |Title
        |=====
        |This is Markdown paragraph
        |-*/
        |object HiddenContext {
        |  //`
        |  /* This is a code comment
        |  */
        |  fun aFunction() {
        |     return 42
        |  }
        |  //`
        |}
        |/*-
        |More book text.
        |-*/
        """.trimMargin()
        approver.assertApproved(translate(source))
    }

and implement it quickly and dirtily to see if it’s good.

    fun translate(source: String): String {
        var inCodeBlock = false
        var inTextBlock = false
        return source.split("\n")
            .map {
                when {
                    !inCodeBlock && it.trim().startsWith("//`") -> {
                        inCodeBlock = true
                        "```kotlin"
                    }
                    inCodeBlock && it.trim().startsWith("//`") -> {
                        inCodeBlock = false
                        "```"
                    }
                    !inTextBlock && it.trim().startsWith("/*-") -> {
                        inTextBlock = true
                        ""
                    }
                    inTextBlock && it.trim().startsWith("-*/") -> {
                        inTextBlock = false
                        ""
                    }
                    inTextBlock -> it
                    inCodeBlock -> it
                    else -> ""
                }
            }
            .joinToString("\n")
    }

This works, and, after fixing some places in this file where I had messed up the formatting, it works here too.

It feels like there is something general trying to get out of that inBlock.. code, but we’ll come back to it when I’m less tired. I’ll just make a small change to make it look less bad.

    fun translate(source: String): String {
        var inCodeBlock = false
        var inTextBlock = false
        return source.split("\n")
            .map { line ->
                when {
                    !inCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                        inCodeBlock = true
                        "```kotlin"
                    }
                    inCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                        inCodeBlock = false
                        "```"
                    }
                    !inTextBlock && line.firstNonSpaceCharsAre("/*-") -> {
                        inTextBlock = true
                        ""
                    }
                    inTextBlock && line.firstNonSpaceCharsAre("-*/") -> {
                        inTextBlock = false
                        ""
                    }
                    inTextBlock -> line
                    inCodeBlock -> line
                    else -> ""
                }
            }
            .joinToString("\n")
    }

    fun String.firstNonSpaceCharsAre(s: String) = this.trimStart().startsWith(s)
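As for the "something general" in those two booleans, here is one possible direction. It is a sketch, not the book’s code: replace the booleans with a single explicit mode. The names Mode, FENCE and translateWithMode are hypothetical. This version deliberately looks for each marker only in the mode where it can legitimately appear. As a result, a line of prose that starts with //` is passed through unchanged, whereas the two-boolean version would treat it as the start of a code block.

```kotlin
// Hypothetical refactoring sketch: one explicit mode instead of two booleans.
enum class Mode { Hidden, Text, Code }

val FENCE = "`".repeat(3) // a Markdown code fence: three backticks

fun String.firstNonSpaceCharsAre(s: String) = this.trimStart().startsWith(s)

fun translateWithMode(source: String): String {
    var mode = Mode.Hidden
    return source.split("\n")
        .map { line ->
            when {
                // The same //` marker opens and closes a code block, but not inside prose.
                mode != Mode.Text && line.firstNonSpaceCharsAre("//`") ->
                    if (mode == Mode.Code) {
                        mode = Mode.Hidden
                        FENCE
                    } else {
                        mode = Mode.Code
                        FENCE + "kotlin"
                    }
                mode == Mode.Hidden && line.firstNonSpaceCharsAre("/*-") -> {
                    mode = Mode.Text
                    ""
                }
                mode == Mode.Text && line.firstNonSpaceCharsAre("-*/") -> {
                    mode = Mode.Hidden
                    ""
                }
                mode == Mode.Hidden -> "" // code we don't want to show
                else -> line // prose, or code we do want to show
            }
        }
        .joinToString("\n")
}
```

With one variable, the illegal combination (in text and in code at once) can no longer be represented.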

Combining Files

We’re almost done in this first stint. The next thing I need to do to increase my confidence that this is working is to combine all the sections of this chapter into a whole, so that I and others can read it.

This text is being written into 01_d_Spike-Combining-Files.kt, in a directory (actually Java / Kotlin package) com.oneeyedmen.book.Chapter_01_Spike. Listing all the source in the directory we see

  • 01_a_Spike-Introduction.kt
  • 01_b_Spike-First-Kotlin-File.kt
  • 01_c_Spike-First-Attempt-at-Publishing.kt
  • 01_d_Spike-Combining-Files.kt

In the time it’s taken to write the contents, these filenames have changed several times, with the aim of allowing me to see what is in them and at the same time process them automatically. Let’s see if I succeeded in that second goal by processing all the files in sequence.

    import java.io.File

    fun main(args: Array<String>) {
        val dir = File("src/main/java/com/oneeyedmen/book/Chapter_01_Spike")
        val translatedLines: Sequence<String> = sourceFilesIn(dir)
            .flatMap { translate(it) }

        val outDir = File("build/book").apply {
            mkdirs()
        }

        outDir.resolve(dir.name + ".md").bufferedWriter(Charsets.UTF_8).use { writer ->
            translatedLines.forEach {
                writer.appendln(it)
            }
        }
    }

    fun sourceFilesIn(dir: File) = dir
        .listFiles { file -> file.isSourceFile() }
        .toList()
        .sortedBy(File::getName)
        .asSequence()

    private fun File.isSourceFile() = isFile && !isHidden && name.endsWith(".kt")

    fun translate(source: File): Sequence<String> = translate(source.readText(Charsets.UTF_8))

    fun translate(sourceLines: String): Sequence<String> {
        var inCodeBlock = false
        var inTextBlock = false
        return sourceLines.splitToSequence("\n")
            .map { line ->
                when {
                    !inCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                        inCodeBlock = true
                        "```kotlin"
                    }
                    inCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                        inCodeBlock = false
                        "```"
                    }
                    !inTextBlock && line.firstNonSpaceCharsAre("/*-") -> {
                        inTextBlock = true
                        null
                    }
                    inTextBlock && line.firstNonSpaceCharsAre("-*/") -> {
                        inTextBlock = false
                        null
                    }
                    inTextBlock -> line
                    inCodeBlock -> line
                    else -> null
                }
            }
            .filterNotNull()
    }

    fun String.firstNonSpaceCharsAre(s: String) = this.trimStart().startsWith(s)

Looking back at the last version of the translate function, you’ll see that I have changed it to return a Sequence. This will allow me to avoid having all the text of all files in memory at once. I’ve also put nulls into that Sequence where we are skipping a line, and then filtered them out with filterNotNull, so that we don’t have a blank line for each source line we aren’t outputting.
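The two Sequence ideas in that paragraph can be seen in isolation. In this small sketch (hypothetical names, not the book’s code), lines starting with # stand in for skipped lines. mapNotNull is the standard library’s fused form of map followed by filterNotNull. The counter shows that a Sequence only does as much work as its consumer asks for.

```kotlin
// map-to-null followed by filterNotNull: null marks a line to skip.
fun keepLines(text: String): Sequence<String> =
    text.splitToSequence("\n")
        .map { line -> if (line.startsWith("#")) null else line }
        .filterNotNull()

// The same thing with mapNotNull, which fuses the two steps.
fun keepLinesFused(text: String): Sequence<String> =
    text.splitToSequence("\n")
        .mapNotNull { line -> if (line.startsWith("#")) null else line }

// Sequences are lazy: asking for the first line maps only one line.
fun mappingsNeededForFirstLine(text: String): Int {
    var mapped = 0
    text.splitToSequence("\n").map { mapped++; it }.first()
    return mapped
}
```

There is one consequence for translate. Its map lambda updates inCodeBlock and inTextBlock as it goes, so the Sequence it returns is best iterated only once.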

The rest of the code is pretty standard Kotlin - not very pretty or abstracted yet but good enough. It would probably read very much the same if I wrote it in Python - which I would if I didn’t have Kotlin here as it would be just too verbose in Java.
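One detail of sourceFilesIn deserves a check. sortedBy(File::getName) compares names as strings, character by character, which is why the two-digit 01_ prefixes and the letter suffixes matter. The sketch below uses the chapter’s own file names, plus a hypothetical unpadded pair that shows what would go wrong. The function name bookOrder is made up for illustration.

```kotlin
// Plain string ordering, the same comparison sortedBy(File::getName) performs.
fun bookOrder(fileNames: List<String>): List<String> = fileNames.sorted()
```

Because the comparison is by character, an unpadded 10_ would sort before 2_, so the padding earns its keep as soon as there are ten or more sections.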

I had to update the test as well as the code to account for the change to the signature of translate. The test is now

    class CodeExtractorTests {

        @Rule @JvmField val approver = approvalsRule()

        @Test fun writes_a_markdown_file_from_Kotlin_file() {
            val source = """
            |package should.not.be.shown
            |/*-
            |Title
            |=====
            |This is Markdown paragraph
            |-*/
            |object HiddenContext {
            |  //`
            |  /* This is a code comment
            |  */
            |  fun aFunction() {
            |     return 42
            |  }
            |  //`
            |}
            |/*-
            |More book text.
            |-*/
            """.trimMargin()
            approver.assertApproved(translate(source).joinToString("\n"))
        }
    }

and the approved file shows no blank lines where we skip a line in the source

    Title
    =====
    This is Markdown paragraph
    ```kotlin
      /* This is a code comment
      */
      fun aFunction() {
         return 42
      }
    ```
    More book text.

Running the code and looking at the rendered Markdown with IntelliJ’s Markdown plugin I see one glaring problem. Where one file ends and another starts we need to separate them with a blank line if the combined Markdown isn’t to be interpreted as a contiguous paragraph. Let’s fix that by adding a blank line to the end of each file’s lines.

    val translatedLines: Sequence<String> = sourceFilesIn(dir)
        .flatMap { translate(it).plus("\n") }
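The blank line matters because Markdown treats lines with no blank line between them as a single paragraph. Here is a self-contained sketch of the idea. In-memory lists stand in for translated files, and the name joinFiles is hypothetical.

```kotlin
// Each inner list stands in for the translated lines of one source file.
// Appending "" after each file puts a blank line between files, so the last
// paragraph of one file doesn't run into the first paragraph of the next.
fun joinFiles(files: List<List<String>>): String =
    files.asSequence()
        .flatMap { lines -> lines.asSequence() + "" }
        .joinToString("\n")
```

This sketch appends an empty string, which joinToString turns into exactly one blank line. The version above appends "\n" instead. Since appendln adds its own line separator, that version writes two blank lines between files, which Markdown treats the same as one.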

There is one other issue that I can see, which has to do with trying to show in the text how I mark the prose regions. Where I wrote

    I'm going to go with

    ⁠/*-
    this is book text
    ⁠-*/

the markers were interpreted as markers and messed up the output. I add a pipe character | to the beginning of those marker lines to get things running. I don’t have a solution to this at the moment, bar the pipe, but suspect that we’ll need some way of escaping our own codes. I’m trusting that as I gain fluency with Markdown something clever will come up. If you can see the markers without |s above I guess I succeeded in the end.

Conclusions

What can we learn from this episode?

Well, I learned that this model works for me when actually explaining concepts. Unless you’re viewing the source version of this text, you won’t see that there is a subtlety around the different versions of the code existing in the same source file. When explaining code on my blog I would have to move forward and back between different source versions, cutting and pasting them into the text. Here, I can keep the different versions in the same file, which is a big productivity win.

What I learned is itself a demonstration of the big topic - risk reduction. Before I started work on this book I identified the following main risks:

  1. No-one will want to read what I write on this subject
  2. It will be too much effort to maintain a code-heavy text

These are typical of the 2 major risks facing a new software project - failure in the market and failure to be able to deliver economically (if at all). Unless we are certain that we can build something that our customers will want we had better address both risks early on. The way to address them is by gathering information as quickly and as cheaply as possible.

If you are reading this text in anything other than the finished book, then you’ll see that I have addressed the market risk by releasing the product before it is finished, in order to gauge whether there is any enthusiasm. The chapter itself has been me addressing the technical risk that it’s too hard to write about code by

  1. Hypothesising that I could write the book text within the source code rather than vice versa, and that that would be an efficient process, if I had a tool to extract it.
  2. Experimenting to see if the hypothesis is true by trying it out.

I’ve done just enough here to show that the book-in-comments approach has legs. No doubt I’ll spend days on tooling to produce the book before I’m finished - the code is rough and the inability to write about my own notation without invoking it is a shame - but right now I have confidence without too much cost.

Tooling is another take-away. Modern developers will go to great lengths to increase their own productivity by building tools to leverage their time. Sometimes these are simple scripts, sometimes full applications. I’m hoping to leverage the power of an IDE that I already know with some text processing to produce something greater than the sum of its parts.

Leverage is my final thought. Can we find ways of solving several problems at once? In this case I’ve managed to test the writing model, written some tooling to support it, and perhaps demonstrated some aspects of modern programming, all at the same time. We should obviously strive for humility, but I’m secretly a bit pleased with myself.

Request for Feedback

Well, you’ve either got this far, or skipped here having decided that the content wasn’t for you. Could you spare a couple of minutes to help me gauge whether I am wasting my time?

If so …

… please copy the following into an email to me, start answering at the top, keep going until you don’t think you owe me any more of your precious time, and send. I’m sorry that I don’t have an embedded form or anything - yet. Maybe if there is enough encouragement the processing of the form will become a chapter!

  1. Should I continue writing this book? [y | n]
  2. Are you in the book’s target market (3 - 10+ years of programming experience)? [y | n]
  3. What do you think of the recursive nature of the material, writing about the development of the software to assist the writing? [0 (disaster) - 10 (triumph)]
  4. How likely are you to recommend the book to a friend or colleague? [0 (not at all) - 10 (extremely)]
  5. Would you pay to read the completed book? [y | n]
  6. Do you care about paper copies of books? [y | n]
  7. Is there anything else you’d like to say? [ ... ]

If you do like what I’ve written, then there is more available on Leanpub where you can read the other chapters for free. And please invite a friend.

Thank you

Duncan

Chapter 2 - Entrenchment

Writing software is hard. We probably agree on that - I know that I find it hard, and you probably aren’t investing time reading a book on a subject that you have no problem mastering. One of the many, many hard problems is deciding, minute by minute, hour by hour, day by, well, you get the drift, what we should do next.

So far we have, if you’ll forgive a military metaphor, taken some ground. Our previous foray resulted in some confidence that this book might be worth writing, and that the technology and process I proposed might do the job. But in order to gain this confidence we have overextended a bit. Some things don’t work at all, and others are not implemented well.

Agile development calls this technical debt, but that term is a bit pejorative, and doesn’t allow us to distinguish between, if you’ll forgive a finance metaphor, leverage and lack of servicing. [TODO]

I was about to suggest that we review what does and doesn’t work, maybe have a little refactor to make things clearer, but then a customer arrived who wanted to use the code in real life. Granted that customer was me, wanting to use the tool to simplify publishing code-heavy articles in my blog, but a customer is a customer. We’re always having to balance immediately gratifying our customers or users against code quality - it can lead to some difficult negotiations - but in this case I’m insistent that writing a blog article outweighs the nagging doubt that the code just isn’t good enough, and I find that I agree.

Our current main method is hard-coded to the source path in this project. Finding a way to run the code from the command-line, specifying the source directory and destination file as parameters, would get me off my back, so I set to.

    import java.io.File

    fun main(args: Array<String>) {
        val srcDir = File(args[0])
        val outFile = File(args[1]).apply {
            absoluteFile.parentFile.mkdirs()
        }

        val translatedLines: Sequence<String> = sourceFilesIn(srcDir)
            .flatMap { translate(it).plus("\n") }

        outFile.bufferedWriter(Charsets.UTF_8).use { writer ->
            translatedLines.forEach {
                writer.appendln(it)
            }
        }
    }
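As written, main throws an ArrayIndexOutOfBoundsException if either argument is missing. One hypothetical tidier shape, not the book’s code, checks the arguments first and returns an exit code. The usage message and the names parseArgs and runWithArgs are made up for illustration. Only "book", the start script’s name, comes from the build script below.

```kotlin
import java.io.File

// Hypothetical usage message; "book" matches the start script name built by Gradle.
val USAGE = "usage: book <source-directory> <output-file>"

// Returns the source directory and output file, or null if the arguments are wrong.
fun parseArgs(args: Array<String>): Pair<File, File>? =
    if (args.size == 2) File(args[0]) to File(args[1]) else null

// Runs publish with the parsed arguments and returns a process exit code.
fun runWithArgs(args: Array<String>, publish: (srcDir: File, outFile: File) -> Unit): Int {
    val (srcDir, outFile) = parseArgs(args) ?: run {
        System.err.println(USAGE)
        return 1 // non-local return: run is inline
    }
    publish(srcDir, outFile)
    return 0
}
```

With this in place, main could become exitProcess(runWithArgs(args) { srcDir, outFile -> ... }), with the current body moved into the lambda. exitProcess comes from kotlin.system.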

I test this using IntelliJ’s runner and then go rummaging in the far corners of the Internet to find out how to invoke Gradle to both build and then run this class. Full disclosure: this led to over an hour of trying to work out what Gradle was building where - code can seem very simple compared to build systems. As I know that my chances of remembering what incantations are required are low, I capture the required commands in a top-level script in the project directory:

    #!/usr/bin/env bash

    ./gradlew clean installDist distZip
    echo Zip file built in build/distributions
    echo Start script built in build/install/book/bin/book

I then put on my blogger’s hat and write a couple of articles on the relationship between objects and functions in Kotlin, using the same formatting rules as this book. It goes pretty well, giving me more confidence that I could write a whole book this way, but it does reveal a minor issue in the translation.

It turns out that my blog publishing software, unlike IntelliJ’s Markdown renderer, requires a blank line before a code block. Our current translate logic removes blank lines. The fix is simple - output any blank source lines.
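The change itself isn’t shown, so this is only a sketch of one way to do it, assuming the Sequence-returning translate from Chapter 1. Blank lines outside the prose and code regions become empty output lines instead of being skipped. The names translateKeepingBlanks and FENCE are hypothetical.

```kotlin
val FENCE = "`".repeat(3) // a Markdown code fence: three backticks

fun String.firstNonSpaceCharsAre(s: String) = this.trimStart().startsWith(s)

fun translateKeepingBlanks(sourceLines: String): Sequence<String> {
    var inCodeBlock = false
    var inTextBlock = false
    return sourceLines.splitToSequence("\n")
        .map { line ->
            when {
                !inCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                    inCodeBlock = true
                    FENCE + "kotlin"
                }
                inCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                    inCodeBlock = false
                    FENCE
                }
                !inTextBlock && line.firstNonSpaceCharsAre("/*-") -> {
                    inTextBlock = true
                    null
                }
                inTextBlock && line.firstNonSpaceCharsAre("-*/") -> {
                    inTextBlock = false
                    null
                }
                inTextBlock -> line
                inCodeBlock -> line
                line.isBlank() -> "" // the change: keep blank source lines, e.g. before a code block
                else -> null
            }
        }
        .filterNotNull()
}
```

This only helps if the source already has a blank line between the end of the prose and the start of the code, which is the natural way to write it anyway.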

Returning to this project after a little time, I review the code and the prose, and find that I can just about follow my original thread. I’m still far from convinced that anyone else could make sense of it though. The best way to find out would be to share the first chapter with friends to get their feedback, but the published Markdown files aren’t yet actually readable. In particular

  1. I have no way of escaping my own formatting markers, so I can’t represent them in the output text.
  2. The contents of the approved files are not shown (as I write this they are represented by [TODO] markers).

Markdown doesn’t have any standard way of representing included text, but as we’re reading and writing every line it shouldn’t be too hard to do this ourselves. That leaves the escaping as a bigger risk, as I don’t have a clue what to do.

What was the problem again? I want to be able to express in Markdown the syntax for my ‘book text’ markers, namely /*- and -*/, but in a quoted section, so that you can read

```
⁠/*-
this is book text
⁠-*/
```
but when I type that and run it through the translation, it (correctly) interprets the markers and messes up the formatting. I had bodged around the problem by prepending | characters to the lines that cause problems, but that makes them very hard to read.

I talk the problem through with my friend Alan, and we hatch the plan of interpreting the traditional \ as an escape character, which will just be removed from the output, allowing me to write /*- and -*/ at the beginning of a line. Of course, in order to write \ itself there will need to be additional logic to replace a double escape with a single one.

Thinking through the consequences, this looks like a less good plan, as I may wish to use genuine escape sequences in the Kotlin code that is output as part of the book. I could just ignore /*- -*/ pairs in quote blocks, but then I’d have to parse Markdown, which looks like a can of pigeons.
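A minimal sketch of the backslash scheme makes the worry concrete. The function name unescape, the marker list and the exact rules are assumptions made for illustration, not code from the project.

```kotlin
// Hypothetical sketch of the backslash-escape idea: not project code, rules assumed for illustration.
val markers = listOf("/*-", "-*/", "//`")

fun unescape(line: String): String {
    val indent = line.takeWhile { it == ' ' }
    val rest = line.drop(indent.length)
    // A backslash immediately before one of our markers is dropped, so the marker is output literally
    if (markers.any { rest.startsWith("\\" + it) }) return indent + rest.drop(1)
    // Otherwise a double backslash collapses to a single one
    return line.replace("\\\\", "\\")
}
```

Escaped markers come out as intended, but the double-backslash rule also fires inside code blocks: the Kotlin line val digits = Regex("\\d+") becomes val digits = Regex("\d+"), which no longer compiles.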

So instead I guess that we should pick a character other than \ that prevents text matches but isn’t rendered in our final output. This looks like a job for Unicode, so now we have 2 problems.

A bit of Googling reveals Unicode U+2060, “WORD JOINER”, which is apparently similar to a zero-width space, but not a point at which a line can be broken. This seems perfect: I can enter it with the Mac hex input keyboard mode, and it prevents our special text being interpreted at the beginning of a line. It is rendered properly in IntelliJ, which is to say, not at all, which is great if you’re reading, but not so good when editing. I can probably use its HTML entity form &#x2060; if I want to make it more visible.
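Why this works can be sketched in a few lines. The helper startsWithMarker is a hypothetical stand-in for the project’s own marker check, whose implementation isn’t shown here; the point is that U+2060 is a format character rather than whitespace, so trimming leading spaces leaves it in place and the marker no longer starts the line.

```kotlin
// U+2060 WORD JOINER: zero width when rendered, but still a character as far as string matching is concerned
const val WORD_JOINER = '\u2060'

// Hypothetical stand-in for the project's marker check: do the first non-space characters match?
// trimStart() removes whitespace only; U+2060 is a format character (category Cf), so it survives.
fun startsWithMarker(line: String, marker: String): Boolean =
    line.trimStart().startsWith(marker)

// Prefix a line with the invisible word joiner so that a marker at its start is no longer matched
fun hideMarker(line: String): String = "$WORD_JOINER$line"
```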

Frankly this is all making my head hurt. My spidey senses tell me that this is going to cost me in the future, but for now, I need to get on with things and this change required no code, so I find all the places where I used the | hack and replace them with the invisible word joiner.

Ironically, writing about the attempt to fix the representing-our-own-codes-in-our-output problem has given us the same problem in another set of files, those for this chapter, currently Chapter 2. I’d like to be able to review the whole book so far, which means converting two chapters’ worth of Kotlin files into Markdown for reading. Luckily at the top of this chapter I worked out how to build the current software to be invoked from the command-line, so I add a top-level script to invoke this twice for the two current chapters.

```bash
#!/usr/bin/env bash

build/install/book/bin/book src/main/java/com/oneeyedmen/book/Chapter_01_Spike build/book/Chapter_01_Spike.md

build/install/book/bin/book src/main/java/com/oneeyedmen/book/Chapter_02_Entrenchment build/book/Chapter_02_Entrenchment.md
```

Running this yields two Markdown files that I can preview with IntelliJ’s Markdown plugin to check that they look sensible.

This script leaves me feeling quite dirty - it can only work with one particular working directory, it has duplication, and it will require editing every time that I add a chapter (but at least not every file within a chapter). If I were sharing this code with a team I think that pride would cause me to do a better job, but for now momentum is more important than long-term efficiency, so I suck it up and move on to the next thing keeping me from sending out a review text - including the contents of files.

To review - the issue is that currently the book text has [TODO] markers in place of the contents of the Approval Tests approved files. I could copy and paste the text from the file into the manuscript of course, but that breaks one of those rules that we ignore at our peril - the Single Point of Truth. If I do copy and paste the text then it can get out of sync with the file contents, which would be confusing for you, the reader. It doesn’t seem like automatically including files should be hard, and I can think of some other places that it might come in handy, so let’s just get on and implement it.

Our approach to special codes so far has been to enrich Kotlin comments, so that the compiler ignores our information. It’s easy to extend that approach, and I settle on

```kotlin
//#include "filename.txt"
```

as worth trying.
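Recognising the directive is the easy part. A minimal sketch, assuming the file name is always double-quoted and the directive occupies a line of its own (the function name includedFileName is hypothetical):

```kotlin
// Hypothetical sketch: recognise a line such as   //#include "filename.txt"
// allowing leading indentation, and capture the quoted file name.
private val includeDirective = Regex("""\s*//#include\s+"([^"]+)"\s*""")

// Returns the included file name, or null if the line is not an include directive
fun includedFileName(line: String): String? =
    includeDirective.matchEntire(line)?.groupValues?.get(1)
```

Because the whole line must match, an ordinary comment that merely mentions #include is left alone.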

Let’s get on and write a new test for including files.

```kotlin
class CodeExtractorTests {

    @Rule @JvmField val approver = approvalsRule()

    @Test fun writes_a_markdown_file_from_Kotlin_file() {
        val source = """
        |package should.not.be.shown
        |/*-
        |Title
        |=====
        |
        |This is Markdown paragraph
        |-*/
        |
        |object HiddenContext {
        |  //`
        |  /* This is a code comment
        |  */
        |  fun aFunction() {
        |     return 42
        |  }
        |  //`
        |}
        |
        |/*-
        |More book text.
        |-*/
        """.trimMargin()
        approver.assertApproved(translate(source).joinToString("\n"))
    }

    @Test fun includes_a_file() {
        val source = """
        |/*-
        |Book text
        |
        |//#include "included-file.txt"
        |
        |more text
        |-*/
        """.trimMargin()
        approver.assertApproved(translate(source).joinToString("\n"))
    }
}
```

Whereas previously we had a single test for all of our formatting, #include seems like an orthogonal concern, so it gets its own test.

Before we go on, I decide to remove the duplication in the test methods:

```kotlin
class CodeExtractorTests {

    @Rule @JvmField val approver = approvalsRule()

    @Test fun writes_a_markdown_file_from_Kotlin_file() {
        checkApprovedTranslation("""
        |package should.not.be.shown
        |/*-
        |Title
        |=====
        |
        |This is Markdown paragraph
        |-*/
        |
        |object HiddenContext {
        |  //`
        |  /* This is a code comment
        |  */
        |  fun aFunction() {
        |     return 42
        |  }
        |  //`
        |}
        |
        |/*-
        |More book text.
        |-*/
        """)
    }

    @Test fun includes_a_file() {
        checkApprovedTranslation("""
        |/*-
        |Book text
        |
        |//#include "included-file.txt"
        |
        |more text
        |-*/
        """)
    }

    private fun checkApprovedTranslation(source: String) {
        approver.assertApproved(translate(source.trimMargin()).joinToString("\n"))
    }
}
```

which, by removing clutter, lets us focus on what we are really trying to demonstrate, albeit the actual result is hidden in the approved files.

Speaking of which, we can’t really approve the output for includes_a_file, as we don’t yet have a file to include. In order to get a little further, let’s create a file next to this Kotlin file with some text to be included:

```
included file first line
included file last line
```

It’s worth noting at this point that what to put in this file took some thought. I originally wrote included file line 1 and included file line 2, but realised that you would have to know that there wasn’t a line 3 to know that the results were correct. I have also not put a line break at the end of the included file, because the files that we want to include do not have a trailing line break. Should the inclusion add a break? My gut feeling is yes, but I’m not going to think too hard. Instead I’ll go with that gut feeling and see if it works out when run over the book text so far.

Where to put the file also raises some questions. The files that we want to include are currently in the same directory as the file that includes them. Until I have a need for another behaviour I guess that

```kotlin
//#include "filename.txt"
```

should resolve filename.txt relative to the source file. But so far our tests have just translated strings, and have no concept of their fileyness.

We could change our tests to read the source from the filesystem, but that would mean that we had to look in a source file, an approved file, and the test to work out what is going on. Instead I’m going to try punting the problem upstream, by requiring that callers of the translate function provide a way of resolving a file path to the contents of that file, in addition to the string that they want translated. In the test, we’ll use a lambda that just returns a known string.

```kotlin
@Test fun includes_a_file() {
    checkApprovedTranslation(
        source = """
            |/*-
            |Book text
            |
            |//#include "included-file.txt"
            |
            |more text
            |-*/""",
        fileContents = """
            |included file first line
            |included file last line""")
}

private fun checkApprovedTranslation(source: String, fileContents: String) {
    // The reader ignores the requested path and always returns the (margin-trimmed) known contents
    approver.assertApproved(translate(source.trimMargin()) { fileContents.trimMargin() })
}

fun translate(source: String, fileReader: (String) -> String): String {
    TODO()
}
```

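The same idea can be sketched on its own: the include reader is just a function from a path to that file’s contents, so a test can pass a lambda while real use resolves the path next to the source file. The names IncludeReader, fixedReader and readerRelativeTo are hypothetical, chosen for illustration.

```kotlin
import java.io.File

// Hypothetical name for the idea in the text: a function from an include path to that file's contents
typealias IncludeReader = (path: String) -> String

// For tests: ignore the path and return known text, so no file needs to exist
fun fixedReader(contents: String): IncludeReader = { _ -> contents }

// For real use: resolve the path relative to the directory containing the source file being translated
fun readerRelativeTo(sourceFile: File): IncludeReader = { path ->
    File(sourceFile.absoluteFile.parentFile, path).readText()
}
```

Because the test double never touches the filesystem, a test can keep its source and its included text side by side.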
With this scheme it turns out that I needn’t have created the file, so I delete it. I’m also regretting the technical overdraft that is translate, but I really don’t have a better idea at the moment, so I’m inclined to leave the logic as-is, but at least try to make it not much worse. Perhaps the easiest way of achieving that is to treat file inclusion as a pre-processor step before our previous logic.

```kotlin
fun translate(source: String, fileReader: (String) -> String) =
    translate(preprocess(source, fileReader))

fun preprocess(source: String, fileReader: (String) -> String): String {
    TODO()
}
```

This keeps things simple, but as I think through the consequences I realise that it means that the contents of the included file will be subject to the translation process, which would result in a complete mess when our use-case is including a file which sometimes includes our un-translated codes.
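A small experiment shows the mess. The sketch below is hypothetical: keepBookText is a deliberately simplified stand-in for translate that only keeps lines between /*- and -*/, and preprocess is a naive version that splices included text in before translation.

```kotlin
// Deliberately simplified stand-in for translate (hypothetical): keep only lines between /*- and -*/
fun keepBookText(source: String): List<String> {
    var inText = false
    val output = mutableListOf<String>()
    for (line in source.lines()) {
        when {
            line.trimStart().startsWith("/*-") -> inText = true
            line.trimStart().startsWith("-*/") -> inText = false
            inText -> output.add(line)
        }
    }
    return output
}

private val includeLine = Regex("""\s*//#include\s+"([^"]+)"\s*""")

// Naive pre-processing (hypothetical): replace each include line with the file's contents
fun preprocess(source: String, readFile: (String) -> String): String =
    source.lines().joinToString("\n") { line ->
        val match = includeLine.matchEntire(line)
        if (match != null) readFile(match.groupValues[1]) else line
    }
```

If the included file contains /*-, this is book text and -*/ on separate lines, the markers are interpreted rather than shown: they vanish from the output, and the closing -*/ ends the enclosing text block early, so "more text" after the include is lost.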

So it’s back to the drawing board.

I’m mulling it over when I realise that I now have two problems caused by translating our codes when we don’t want them translated - this, and the last issue that I had solved using the Unicode Word Joiner. In both cases the only time we see the issue in practice is inside Markdown quote blocks.

I had previously rejected the idea of suspending translation inside these blocks, but now that it solves two problems it’s looking more attractive. But it also feels kind of arbitrary. Notwithstanding my desire to make progress quickly to publish Chapter 1 for review, the aim of this book is to show, largely unedited, the process of writing software. I could come back and re-write this chapter when I can make it look smooth, but the truth is that I’m having a bit of a crisis of confidence about the whole translation code. I think it’s time to step back and see if we are still on the right track.
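For reference, suspending translation inside quote blocks need not mean parsing all of Markdown. A sketch under one assumption, that the quote blocks in question are fenced with ``` lines inside book text (the function name is hypothetical), only has to track one extra flag:

```kotlin
// Hypothetical sketch: inside book text, a ``` fence switches off marker interpretation
// until the matching closing fence, so markers in quoted examples are output verbatim.
fun bookTextWithVerbatimFences(source: String): List<String> {
    var inText = false
    var inFence = false
    val output = mutableListOf<String>()
    for (line in source.lines()) {
        val trimmed = line.trimStart()
        when {
            inFence -> {
                output.add(line)
                if (trimmed.startsWith("```")) inFence = false
            }
            trimmed.startsWith("/*-") -> inText = true
            trimmed.startsWith("-*/") -> inText = false
            inText -> {
                output.add(line)
                if (trimmed.startsWith("```")) inFence = true
            }
        }
    }
    return output
}
```

The arbitrariness the text worries about is visible here: the rule only works if every fence in the prose is closed.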

This is a very familiar stage in a software project for me. We have a kinda-cool idea - embedding the book text into the source of its examples rather than vice-versa. A spike shows that it works well for the customer - in this case me. We build on that spike, but then find that a number of tactical decisions driven by a desire to maintain momentum lead to a muddied model. It’s all a bit harder than we expected, and we rightly question whether we are on the right path at all.

So let’s review my options.

I could abandon the prose-in-code model - there are other ways to embed code and make sure that it compiles and works as expected. But none that I’ve tried allow the degree of refactoring and cross-referencing that I’m enjoying here, at least not with the immediate ability to visualise the output that this solution has. On the whole I’m inclined to continue down the path, and not just because the path is giving me things to write about.

Is there some more off-the-shelf processing system that I could use? Adopting a standard templating language would bring things like escaping and file inclusion for free. Perhaps more importantly, they will have worked through the sort of issues that I’m having - is a file included before or after expansion for example. On the downside, they are usually aimed at embedding in files like HTML rather than Kotlin source, so I might find that I end up in a dead-end where I can’t solve a problem at all.

Assuming that I continue with prose-in-code, what has this experiment taught me about the translation?

  1. The source files have to be parsable by the Kotlin compiler and IntelliJ has to be able to treat them as regular source. That’s why the prose has to go into comment blocks.
  2. I want to distinguish between prose comment blocks and code comment blocks - leading to the use of /*- and -*/ markers, taking a lead from Javadoc. I don’t think that I’ve actually made use of the distinction, but its cost is low.
  3. Code from the file should be expanded into Markdown code blocks, but only for some sections of the file. We can’t use a block comment to mark those sections because they would then not be visible to the Kotlin compiler, so instead we use //` to mark the start and end of the blocks. The compiler will ignore the lines as they begin with a comment, and I associate the backtick with Markdown’s code quoting. On reflection //``` would have been even better - maybe that’s a change I can make later.
  4. So far our special syntax has been able to occupy a line of its own, although in the case of the //` markers they need to be indented to allow the code they are in to be reformatted.
  5. There has to be a way to have the markup rules written to the output rather than interpreted.
  6. I need a way of including the contents of a file in a Markdown quote block, without interpreting the contents of the file.

I can’t see any way that an off-the-shelf templating system could meet these requirements - the need to interoperate with Kotlin comments being the key issue. But I still really like this way of working, so I’m going to push on with the code that I have.

Deciding to go on is all very well, but I don’t think I’m much closer to knowing how to implement file inclusion. In these circumstances I’ve learned to just play with code until inspiration strikes. In particular - refactor until where the new feature should live is obvious.

We shouldn’t refactor code that doesn’t work though, so let’s back out our last change and remove the inclusion test to see what we have.

```kotlin
class CodeExtractorTests {

    @Rule @JvmField val approver = approvalsRule()

    @Test fun writes_a_markdown_file_from_Kotlin_file() {
        checkApprovedTranslation("""
        |package should.not.be.shown
        |/*-
        |Title
        |=====
        |
        |This is Markdown paragraph
        |-*/
        |
        |object HiddenContext {
        |  //`
        |  /* This is a code comment
        |  */
        |  fun aFunction() {
        |     return 42
        |  }
        |  //`
        |}
        |
        |/*-
        |More book text.
        |-*/
        """)
    }

    private fun checkApprovedTranslation(source: String) {
        approver.assertApproved(translate(source.trimMargin()).joinToString("\n"))
    }
}

fun translate(sourceLines: String): Sequence<String> {
    var inCodeBlock = false
    var inTextBlock = false
    return sourceLines.splitToSequence("\n")
        .map { line ->
            when {
                !inCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                    inCodeBlock = true
                    "```kotlin"
                }
                inCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                    inCodeBlock = false
                    "```"
                }
                !inTextBlock && line.firstNonSpaceCharsAre("/*-") -> {
                    inTextBlock = true
                    null
                }
                inTextBlock && line.firstNonSpaceCharsAre("-*/") -> {
                    inTextBlock = false
                    null
                }
                inTextBlock -> line
                inCodeBlock -> line
                line.isBlank() -> line
                else -> null
            }
        }
        .filterNotNull()
}
```

I never was happy with the translate method - let’s refactor to see what comes out. I don’t believe that we can be in a code block and a text block at the same time, but the code doesn’t express this. I’m going to replace the two variables with one, first by replacing inCodeBlock with an object. Note that each one of these steps keeps the tests passing.

```kotlin
object InCodeBlock

fun translate(sourceLines: String): Sequence<String> {
    var state: InCodeBlock? = null
    var inTextBlock = false
    return sourceLines.splitToSequence("\n")
        .map { line ->
            when {
                state != InCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                    state = InCodeBlock
                    "```kotlin"
                }
                state == InCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                    state = null
                    "```"
                }
                !inTextBlock && line.firstNonSpaceCharsAre("/*-") -> {
                    inTextBlock = true
                    null
                }
                inTextBlock && line.firstNonSpaceCharsAre("-*/") -> {
                    inTextBlock = false
                    null
                }
                inTextBlock -> line
                state == InCodeBlock -> line
                line.isBlank() -> line
                else -> null
            }
        }
        .filterNotNull()
}
```

Then inTextBlock. It turns out that we don’t really care if we are in a text block or not when we see its end markers, so we can also simplify the when clauses and keep the tests passing.

```kotlin
object InCodeBlock
object InTextBlock

fun translate(sourceLines: String): Sequence<String> {
    var state: Any? = null
    return sourceLines.splitToSequence("\n")
        .map { line ->
            when {
                state != InCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                    state = InCodeBlock
                    "```kotlin"
                }
                state == InCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                    state = null
                    "```"
                }
                line.firstNonSpaceCharsAre("/*-") -> {
                    state = InTextBlock
                    null
                }
                line.firstNonSpaceCharsAre("-*/") -> {
                    state = null
                    null
                }
                state == InTextBlock -> line
                state == InCodeBlock -> line
                line.isBlank() -> line
                else -> null
            }
        }
        .filterNotNull()
}
```

and then replace null with an Other object and gather the states into a sealed State class

```kotlin
sealed class State {
    object InCodeBlock : State()
    object InTextBlock : State()
    object Other : State()
}

fun translate(sourceLines: String): Sequence<String> {
    var state: State = State.Other
    return sourceLines.splitToSequence("\n")
        .map { line ->
            when {
                state != State.InCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                    state = State.InCodeBlock
                    "```kotlin"
                }
                state == State.InCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                    state = State.Other
                    "```"
                }
                line.firstNonSpaceCharsAre("/*-") -> {
                    state = State.InTextBlock
                    null
                }
                line.firstNonSpaceCharsAre("-*/") -> {
                    state = State.Other
                    null
                }
                state == State.InTextBlock -> line
                state == State.InCodeBlock -> line
                line.isBlank() -> line
                else -> null
            }
        }
        .filterNotNull()
}
```

Now I can delegate what happens to non-transition lines to the State

```kotlin
sealed class State {
    abstract fun outputFor(line: String): String?

    object InCodeBlock : State() {
        override fun outputFor(line: String) = line
    }

    object InTextBlock : State() {
        override fun outputFor(line: String) = line
    }

    object Other : State() {
        override fun outputFor(line: String) = if (line.isBlank()) line else null
    }

}

fun translate(sourceLines: String): Sequence<String> {
    var state: State = State.Other
    return sourceLines.splitToSequence("\n")
        .map { line ->
            when {
                state != State.InCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                    state = State.InCodeBlock
                    "```kotlin"
                }
                state == State.InCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                    state = State.Other
                    "```"
                }
                line.firstNonSpaceCharsAre("/*-") -> {
                    state = State.InTextBlock
                    null
                }
                line.firstNonSpaceCharsAre("-*/") -> {
                    state = State.Other
                    null
                }
                else -> state.outputFor(line)
            }
        }
        .filterNotNull()
}
```

Now some lines cause the state to change, and some don’t. I’m going to pull out a method called advance that returns the next state and the string that should be output.

```kotlin
sealed class State {
    abstract fun outputFor(line: String): String?

    object InCodeBlock : State() {
        override fun outputFor(line: String) = line
    }

    object InTextBlock : State() {
        override fun outputFor(line: String) = line
    }

    object Other : State() {
        override fun outputFor(line: String) = if (line.isBlank()) line else null
    }

}

fun translate(sourceLines: String): Sequence<String> {
    var state: State = State.Other
    return sourceLines.splitToSequence("\n")
        .map { line ->
            advance(state, line)
        }
        .onEach { state = it.first }
        .map { it.second }
        .filterNotNull()
}

private fun advance(state: State, line: String): Pair<State, String?> {
    return when {
        state != State.InCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
            State.InCodeBlock to "```kotlin"
        }
        state == State.InCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
            State.Other to "```"
        }
        line.firstNonSpaceCharsAre("/*-") -> {
            State.InTextBlock to null
        }
        line.firstNonSpaceCharsAre("-*/") -> {
            State.Other to null
        }
        else -> {
            state to state.outputFor(line)
        }
    }
}
```

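The translate function above relies on the laziness of Sequence: each line goes through map (calling advance) and then onEach (updating state) before the next line is read, so advance always sees the state left by the previous line. With an eager List, every map call would run before any onEach, and every line would see the initial state. This illustrative sketch (stageOrder is a hypothetical name, not project code) logs the order in which the stages see each element:

```kotlin
// Records the order in which each pipeline stage sees each element
fun stageOrder(lazy: Boolean): List<String> {
    val log = mutableListOf<String>()
    val items = listOf("a", "b")
    if (lazy) {
        // Sequence: each element passes through every stage before the next element starts
        items.asSequence()
            .map { log.add("map $it"); it }
            .onEach { log.add("onEach $it") }
            .toList()
    } else {
        // List: each stage processes the whole collection before the next stage runs
        items
            .map { log.add("map $it"); it }
            .onEach { log.add("onEach $it") }
    }
    return log
}
```

With lazy = true the log is map a, onEach a, map b, onEach b; with lazy = false it is map a, map b, onEach a, onEach b. Swapping splitToSequence for split would therefore quietly break translate.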
Now that our processing is nice and symmetrical, we can move advance into State

```kotlin
sealed class State {
    fun advance(line: String): Pair<State, String?> {
        return when {
            this !is InCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                InCodeBlock to "```kotlin"
            }
            this is InCodeBlock && line.firstNonSpaceCharsAre("//`") -> {
                Other to "```"
            }
            line.firstNonSpaceCharsAre("/*-") -> {
                InTextBlock to null
            }
            line.firstNonSpaceCharsAre("-*/") -> {
                Other to null
            }
            else -> {
                this to outputFor(line)
            }
        }
    }
    abstract fun outputFor(line: String): String?

    object InCodeBlock : State() {
        override fun outputFor(line: String) = line
    }

    object InTextBlock : State() {
        override fun outputFor(line: String) = line
    }

    object Other : State() {
        override fun outputFor(line: String) = if (line.isBlank()) line else null
    }

}
```

Now we can push the code-block-specific transition down into an override in InCodeBlock, to make things, if not simpler, at least explicit.

```kotlin
sealed class State {
    open fun advance(line: String): Pair<State, String?> = when {
        line.firstNonSpaceCharsAre("//`") -> InCodeBlock to "```kotlin"
        line.firstNonSpaceCharsAre("/*-") -> InTextBlock to null
        line.firstNonSpaceCharsAre("-*/") -> Other to null
        else -> this to outputFor(line)
    }
    abstract fun outputFor(line: String): String?

    object InCodeBlock : State() {
        override fun advance(line: String) = if (line.firstNonSpaceCharsAre("//`")) Other to "```"
            else super.advance(line)
        override fun outputFor(line: String) = line
    }

    object InTextBlock : State() {
        override fun outputFor(line: String) = line
    }

    object Other : State() {
        override fun outputFor(line: String) = if (line.isBlank()) line else null
    }
}
```

To be honest I’m not entirely sure that this is better than the first formulation. That had the advantage of being transparent - you could see what it was doing. Now we have polymorphism and pairs and not-very-well-named methods in the mix - they had better pull their weight if we are going to decide that this excursion was worthwhile. I wouldn’t be sorry to just revert this refactor - if nothing else I’ve realised that the code had redundant logic, and removing it would improve the old version.
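For comparison, the same state machine can be written without a captured var by threading the state through fold, which makes the data flow explicit. This is a sketch of an alternative, not what the project does: the enum Mode is hypothetical, and this definition of firstNonSpaceCharsAre is an assumption, since the project’s own implementation isn’t shown.

```kotlin
// Hypothetical alternative: the state is threaded through fold instead of living in a var
enum class Mode { CODE, TEXT, OTHER }

// Assumed definition: do the first non-whitespace characters of the line match s?
fun String.firstNonSpaceCharsAre(s: String) = trimStart().startsWith(s)

// Same transitions as the State versions above: returns the next mode and the line to output, if any
fun advance(mode: Mode, line: String): Pair<Mode, String?> = when {
    line.firstNonSpaceCharsAre("//`") ->
        if (mode == Mode.CODE) Mode.OTHER to "```" else Mode.CODE to "```kotlin"
    line.firstNonSpaceCharsAre("/*-") -> Mode.TEXT to null
    line.firstNonSpaceCharsAre("-*/") -> Mode.OTHER to null
    mode == Mode.OTHER -> mode to (if (line.isBlank()) line else null)
    else -> mode to line
}

fun translate(source: String): List<String> =
    source.lines()
        .fold(Mode.OTHER to emptyList<String>()) { (mode, output), line ->
            val (next, emitted) = advance(mode, line)
            next to (if (emitted == null) output else output + emitted)
        }
        .second
```

Note that output + emitted copies the list on every emitted line, which is fine for chapter-sized files but quadratic in general; a MutableList accumulator would avoid that.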

Luckily it’s lunchtime, so I get a natural break to allow my brain to mull over what I have done and see if it’s helpful.

I had a chance on a long bike ride to think things through, and came up with a plan, but trying that really didn’t work so I won’t bore you with it. I talk things over with Alan, who has a PhD in Computer Science, and he reminds me that he always said that I should use the right tool for the job - a proper parser. Now my education was as a physicist, so I have large gaps in my knowledge of proper computing, and I know that many other programmers I respect consider that being able to devise and parse little languages is a sign of maturity. When Alan offers to pair with me to introduce a pukka parser I figure that I’d be a fool not to take the learning opportunity, even though it will potentially delay my goal of getting a review copy of Chapter One published.

Chapter 3 - Parser

The story so far: I have an almost-functioning parser for my prose-in-code format, but it keeps on collapsing under its own weight. I’d like to ignore this and get on with publishing the first chapter, but I can’t see how to add the features that I need to make this a reality without being left with an irredeemable mess. At this point I’m offered some help to show me how to use a grown-up parser, and I leap at the chance.

Alan and I go looking for parser libraries for Java. ANTLR (ANother Tool for Language Recognition) is the standard, but it is a parser generator - meaning that it generates Java source files that you have to compile. For a large project that might be acceptable, but here it’s a build step too far.

Google’s fifth result is Parboiled. Parboiled allows us to specify our format in Java code, and make use of it without code generation, so it looks like a good bet. It also has a functioning Markdown parser as one of its examples, which feels like a good portent. We tell Gradle about its JAR file and convert a simple calculator example to Kotlin to see if it runs at all.

```kotlin
class ParserTests {

    @org.junit.Rule @JvmField val approver = approvalsRule()

    @Test fun test() {
        val parser = Parboiled.createParser(CalculatorParser::class.java)
        val result = ReportingParseRunner<Any>(parser.Expression()).run("1+2")
        approver.assertApproved(ParseTreeUtils.printNodeTree(result))
    }
}

@BuildParseTree
open class CalculatorParser : BaseParser<Any>() {

    open fun Expression(): Rule {
        return Sequence(
            Term(),
            ZeroOrMore(AnyOf("+-"), Term())
        )
    }

    open fun Term(): Rule {
        return Sequence(
            Factor(),
            ZeroOrMore(AnyOf("*/"), Factor())
        )
    }

    open fun Factor(): Rule {
        return FirstOf(
            Number(),
            Sequence('(', Expression(), ')')
        )
    }

    open fun Number(): Rule {
        return OneOrMore(CharRange('0', '9'))
    }
}
```

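The only change we had to make to IntelliJ’s conversion of the Java to Kotlin was to add the open keyword. Kotlin classes and functions are final by default, and it turns out that Parboiled makes use of bytecode generation to enhance our parser class, which means generating a subclass that overrides its rule methods.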

We used another Approval Test here so that we could record the output from printNodeTree for your benefit - it looks like this

```
[Expression] '1+2'
  [Term] '1'
    [Factor] '1'
      [Number] '1'
        [0..9] '1'
    [ZeroOrMore]
  [ZeroOrMore] '+2'
    [Sequence] '+2'
      [[+-]] '+'
      [Term] '2'
        [Factor] '2'
          [Number] '2'
            [0..9] '2'
        [ZeroOrMore]
```

which is a representation of the Rules that matched the parsed text 1+2.
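To see what those rules mean without Parboiled, each one can be hand-written as a function in a recursive-descent parser. This sketch (the class name Calculator is hypothetical) differs from the Parboiled example in two ways: it evaluates the expression rather than building a parse tree, and it reports a failure with an exception rather than backtracking.

```kotlin
// Hypothetical hand-written equivalent of CalculatorParser's grammar, evaluating as it parses:
//   Expression = Term (('+' / '-') Term)*
//   Term       = Factor (('*' / '/') Factor)*
//   Factor     = Number / '(' Expression ')'
//   Number     = [0-9]+
class Calculator(private val input: String) {
    private var pos = 0

    fun evaluate(): Int {
        val value = expression()
        require(pos == input.length) { "Unexpected '${input[pos]}' at $pos" }
        return value
    }

    private fun peek(): Char? = input.getOrNull(pos)

    private fun expression(): Int {
        var value = term()
        while (peek() == '+' || peek() == '-') {
            val op = input[pos++]
            val rhs = term()
            value = if (op == '+') value + rhs else value - rhs
        }
        return value
    }

    private fun term(): Int {
        var value = factor()
        while (peek() == '*' || peek() == '/') {
            val op = input[pos++]
            val rhs = factor()
            value = if (op == '*') value * rhs else value / rhs
        }
        return value
    }

    private fun factor(): Int =
        if (peek() == '(') {
            pos++
            val value = expression()
            require(peek() == ')') { "Expected ')' at $pos" }
            pos++
            value
        } else number()

    private fun number(): Int {
        val start = pos
        while (isDigit(peek())) pos++
        require(pos > start) { "Expected a digit at $pos" }
        return input.substring(start, pos).toInt()
    }

    // Only ASCII digits, matching CharRange('0', '9') in the Parboiled version
    private fun isDigit(c: Char?) = c != null && c in '0'..'9'
}
```

The nesting of the calls (expression calls term, which calls factor, which calls number) is the same shape as the tree printed above.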

So far so good.

Now to try a subset of the book format.

```kotlin
class ParserTests {

    val parser = Parboiled.createParser(Context3.BookParser::class.java) // TODO - remove Context3

    @Test fun test() {
        check("""
        |hidden
        |/*-
        |first shown
        |last shown
        |-*/
        |hidden
        """)
    }

    private fun check(input: String) {
        val parsingResult = ReportingParseRunner<Any>(parser.Root()).run(input.trimMargin())
        println(ParseTreeUtils.printNodeTree(parsingResult))
        if (!parsingResult.matched) {
            fail()
        }
    }
}
```

Full disclosure - this test was the end result of at least a couple of hours of iterating towards a solution as we gained understanding of how Parboiled works, and tried to find a way of having the tests tell us how well we were doing. The println is a sure sign that we’re really only using JUnit as a way of exploring an API rather than writing TDD or regression tests.

What does the parser look like?

```kotlin
@BuildParseTree
open class BookParser : BaseParser<Any>() {

    open fun NewLine() = Ch('\n')
    open fun OptionalNewLine() = Optional(NewLine())
    open fun Line() = Sequence(ZeroOrMore(NoneOf("\n")), NewLine())

    open fun TextBlock(): Rule {
        return Sequence(
            Sequence(String("/*-"), NewLine()),
            ZeroOrMore(
                Sequence(
                    TestNot(String("-*/")),
                    ZeroOrMore(NoneOf("\n")),
                    NewLine())),
            Sequence(String("-*/"), OptionalNewLine())
        )
    }

    open fun Root() = ZeroOrMore(FirstOf(TextBlock(), Line()))
}
```

I won’t take this opportunity to explain PEG parsers, largely because you already know that at the time of writing I have no more than 4 hours’ experience. It’s a selling point of Parboiled that you can probably read the code and get a good idea of how it works. That isn’t to say that we just rattled this out - it took a good 3 hours of experiment and blind alleys, even pairing with a trained computer scientist - and I’m still not convinced that it will work in all real circumstances. In order to find out we’re going to need a better way of seeing what matches.
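For readers who want a point of comparison, here is a hand-written sketch in plain Kotlin (no Parboiled) of what Root accepts, as the grammar reads: at each line, try to match a whole text block first and, only if that fails, consume a single line. That is the ordered choice FirstOf expresses, including the backtracking when an opener has no matching end marker. The function names are hypothetical, and unlike the Parboiled version this sketch drops the newline characters from the lines it returns.

```kotlin
// Plain-Kotlin sketch of what ZeroOrMore(FirstOf(TextBlock(), Line())) accepts.
// Returns the lines inside each /*- ... -*/ block, without their newlines.
fun textBlocks(source: String): List<List<String>> {
    val lines = source.split("\n")
    val blocks = mutableListOf<List<String>>()
    var i = 0
    while (i < lines.size) {
        // First alternative: a block opener with a matching end marker later on.
        val end = if (lines[i] == "/*-") indexOfEndMarker(lines, i + 1) else -1
        if (end >= 0) {
            blocks.add(lines.subList(i + 1, end).toList())
            i = end + 1
        } else {
            // Second alternative: the block failed, so consume just this one line.
            i++
        }
    }
    return blocks
}

// Index of the first line at or after `from` that starts with -*/, or -1 if none does.
private fun indexOfEndMarker(lines: List<String>, from: Int): Int {
    for (j in from until lines.size) {
        if (lines[j].startsWith("-*/")) return j
    }
    return -1
}
```

An opener without an end marker is treated as an ordinary hidden line, which is the same outcome the PEG grammar reaches by backtracking out of TextBlock.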

Reading the documentation shows that we can add Actions in Sequences to capture results, which are pushed onto a value stack that is made available in the parsingResult. Let’s define a TextBlock class to hold the lines that we’re interested in and ask Parboiled to push one onto the stack when the result is available.

data class TextBlock(val lines: MutableList<String>) {
    constructor(vararg lines: String) : this(mutableListOf(*lines))

    fun addLine(line: String) = lines.add(line)
}

open fun TextBlockRule(): Rule {
    val block = Var<TextBlock>(TextBlock())
    return Sequence(
        Sequence(String("/*-"), NewLine()),
        ZeroOrMore(
            Sequence(
                TestNot(String("-*/")),
                ZeroOrMore(NoneOf("\n")),
                NewLine()),
            block.get().addLine(match())),
        Sequence(String("-*/"), OptionalNewLine()),
        push(block.get())
    )
}

Now our test can access the result stack and actually make assertions. As TextBlock is a data class we get a working definition of equals for free.

class ParserTests {

    val parser = Parboiled.createParser(BookParser::class.java)

    @Test fun `single text block`() {
        check("""
            |hidden
            |/*-
            |first shown
            |last shown
            |-*/
            |hidden""",
            expected = listOf(
                TextBlock("first shown\n", "last shown\n"))
        )
    }

    @Test fun `two text blocks`() {
        check("""
            |/*-
            |first shown
            |-*/
            |hidden
            |/*-
            |last shown
            |-*/""",
            expected = listOf(
                TextBlock("first shown\n"),
                TextBlock("last shown\n"))
        )
    }

    private fun check(input: String, expected: List<TextBlock>) {
        val parsingResult = ReportingParseRunner<Any>(parser.Root()).run(input.trimMargin())
        if (!parsingResult.matched) {
            fail()
        }
        assertEquals(expected, parsingResult.valueStack.toList().asReversed())
    }
}
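Two details in check are worth spelling out. A data class generates equals from its constructor properties, so two TextBlocks with equal lines compare equal whichever constructor built them. And the test reverses the value stack because iterating a stack yields the most recently pushed value first; that Parboiled's ValueStack iterates this way is inferred from the asReversed call rather than checked. This sketch uses java.util.ArrayDeque as a stand-in for the value stack, and inParseOrder is a hypothetical helper.

```kotlin
import java.util.ArrayDeque

// Same shape as the book's TextBlock: the generated equals compares `lines`.
data class TextBlock(val lines: MutableList<String>) {
    constructor(vararg lines: String) : this(mutableListOf(*lines))
}

// Pushes blocks in the order they were parsed, then reads them back as the test does.
fun inParseOrder(vararg blocks: TextBlock): List<TextBlock> {
    val stack = ArrayDeque<TextBlock>()
    blocks.forEach { stack.push(it) }   // push adds at the head of the deque
    return stack.toList().asReversed()  // iteration starts at the head (newest first)
}
```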

This obviously isn’t perfect, but it is a promising start. Parboiled is evidently working very hard behind the scenes to make Rules, written as ordinary functions and then interpreted in the context of a parse, work in Java. This worries me a bit, but I’m prepared to suspend disbelief for now.

One interesting effect of the use of Parboiled is that we have introduced an intermediate form in our parsing. Previously we parsed the input and generated the output lines in one step (if you discount filtering nulls). In the new version we have built a list of TextBlock objects that we will later have to render into a Markdown file. This separation of concerns introduces complexity, but it also buys us flexibility and allows us to focus on one task at a time. Whilst there is a risk that we will have some difficulty with the rendering, I think it’s small, so I’m going to concentrate on extending the parsing for now - adding the code block markers.

First let’s add a test

@Test fun `single code block`() {
    check("""
        |hidden
        |   //`
        |   first shown
        |   last shown
        |   //`
        |hidden""",
        expected = listOf(
            CodeBlock("first shown\n", "last shown\n"))
    )
}

and now duplicate-and-edit an implementation

interface Block {
    val lines: MutableList<String>
    fun addLine(line: String) = lines.add(line)
}

data class TextBlock(override val lines: MutableList<String>) : Block {
    constructor(vararg lines: String) : this(mutableListOf(*lines))
}

data class CodeBlock(override val lines: MutableList<String>) : Block {
    constructor(vararg lines: String) : this(mutableListOf(*lines))
}

open fun TextBlockRule(): Rule {
    val block = Var<TextBlock>(TextBlock())
    return Sequence(
        Sequence(String("/*-"), NewLine()),
        ZeroOrMore(
            Sequence(
                TestNot(String("-*/")),
                ZeroOrMore(NoneOf("\n")),
                NewLine()),
            block.get().addLine(match())),
        Sequence(String("-*/"), OptionalNewLine()),
        push(block.get())
    )
}

open fun CodeBlockRule(): Rule {
    val block = Var<CodeBlock>(CodeBlock())
    return Sequence(
        Sequence(String("//`"), NewLine()),
        ZeroOrMore(
            Sequence(
                TestNot(String("//`")),
                ZeroOrMore(NoneOf("\n")),
                NewLine()),
            block.get().addLine(match())),
        Sequence(String("//`"), OptionalNewLine()),
        push(block.get())
    )
}

open fun Root() = ZeroOrMore(
    FirstOf(
        TextBlockRule(),
        CodeBlockRule(),
        Line()))

Unfortunately this fails, and a little thought reveals that it’s because our block-matching code only works for block markers at the start of a line. The previous behaviour was that blocks were defined by markers as the first non-space characters. My instinct here is to back up and extract some reusable Rule from TextBlockRule, and then use that to implement CodeBlockRule. Otherwise I’m going to have diverging Rules that I have to try to find the commonality in.
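In plain Kotlin terms, the difference is between the two predicates below. This is a minimal sketch with hypothetical names: the new Rules behave like the first, while the previous behaviour was the second, where leading spaces and tabs are skipped before looking for the marker.

```kotlin
// What the new Rules currently check: the marker must start at column 0.
fun isMarkerAtLineStart(line: String, marker: String): Boolean =
    line.startsWith(marker)

// The previous behaviour: the marker is the first non-whitespace text on the line.
fun isMarkerAfterIndent(line: String, marker: String): Boolean =
    line.trimStart(' ', '\t').startsWith(marker)
```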

So, reverting to our two tests for the text block, we refactor the parser and define CodeBlockRule in much the same way as TextBlockRule.

open fun TextBlockRule(): Rule {
    val result = Var(TextBlock())
    return BlockRule(String("/*-"), String("-*/"), result)
}

open fun CodeBlockRule(): Rule {
    val result = Var(CodeBlock())
    return BlockRule(String("//`"), String("//`"), result)
}

open fun <T: Block> BlockRule(startRule: Rule, endRule: Rule, accumulator: Var<T>) = Sequence(
    BlockEnter(startRule),
    ZeroOrMore(
        LinesNotStartingWith(endRule),
        accumulator.get().addLine(match())),
    BlockExit(endRule),
    push(accumulator.get())
)

open fun LinesNotStartingWith(rule: Rule) = Sequence(
    TestNot(rule),
    ZeroOrMore(NoneOf("\n")),
    NewLine())

open fun BlockEnter(rule: Rule) = Sequence(rule, NewLine())
open fun BlockExit(rule: Rule) = Sequence(rule, OptionalNewLine())

I test this with an unindented code block, and with mixed text and code blocks, before turning to indented blocks.

class ParserTests {

    val parser = Parboiled.createParser(BookParser::class.java)

    @Test fun `single text block`() {
        check("""
            |hidden
            |/*-
            |first shown
            |last shown
            |-*/
            |hidden""",
            expected = listOf(
                TextBlock("first shown\n", "last shown\n"))
        )
    }

    @Test fun `two text blocks`() {
        check("""
            |/*-
            |first shown
            |-*/
            |hidden
            |/*-
            |last shown
            |-*/""",
            expected = listOf(
                TextBlock("first shown\n"),
                TextBlock("last shown\n"))
        )
    }

    @Test fun `indented code block`() {
        check("""
            |hidden
            |    //`
            |    first shown
            |    last shown
            |    //`
            |hidden""",
            expected = listOf(
                CodeBlock("    first shown\n", "    last shown\n"))
        )
    }

    @Test fun `mix and match`() {
        check("""
            |hidden
            |/*-
            |first text
            |last text
            |-*/
            |    //`
            |    first code
            |    last code
            |    //`
            |hidden""",
            expected = listOf(
                TextBlock("first text\n", "last text\n"),
                CodeBlock("    first code\n", "    last code\n")
            )
        )
    }

    private fun check(input: String, expected: List<Any>) {
        assertEquals(expected, parse(input.trimMargin()))
    }

    private fun parse(input: String) =
        ReportingParseRunner<Any>(parser.Root()).run(input).valueStack.toList().asReversed()
}

interface Block {
    val lines: MutableList<String>
    fun addLine(line: String) = lines.add(line)
}

data class TextBlock(override val lines: MutableList<String>) : Block {
    constructor(vararg lines: String) : this(mutableListOf(*lines))
}

data class CodeBlock(override val lines: MutableList<String>) : Block {
    constructor(vararg lines: String) : this(mutableListOf(*lines))
}

@BuildParseTree
open class BookParser : BaseParser<Any>() {

    open fun NewLine() = Ch('\n')
    open fun OptionalNewLine() = Optional(NewLine())
    open fun WhiteSpace() = AnyOf(" \t")
    open fun OptionalWhiteSpace() = ZeroOrMore(WhiteSpace())
    open fun Line() = Sequence(ZeroOrMore(NoneOf("\n")), NewLine())

    open fun TextBlockRule(): Rule {
        val result = Var(TextBlock())
        return BlockRule(String("/*-"), String("-*/"), result)
    }

    open fun CodeBlockRule(): Rule {
        val result = Var(CodeBlock())
        return BlockRule(String("//`"), String("//`"), result)
    }

    open fun <T: Block> BlockRule(startRule: Rule, endRule: Rule, accumulator: Var<T>) = Sequence(
        BlockEnter(startRule),
        ZeroOrMore(
            LinesNotStartingWith(endRule),
            accumulator.get().addLine(match())),
        BlockExit(endRule),
        push(accumulator.get())
    )

    open fun LinesNotStartingWith(rule: Rule) = Sequence(
        OptionalWhiteSpace(),
        TestNot(rule),
        ZeroOrMore(NoneOf("\n")),
        NewLine())

    open fun BlockEnter(rule: Rule) = Sequence(OptionalWhiteSpace(), rule, NewLine())
    open fun BlockExit(rule: Rule) = Sequence(OptionalWhiteSpace(), rule, OptionalNewLine())

    open fun Root() = ZeroOrMore(
        FirstOf(
            TextBlockRule(),
            CodeBlockRule(),
            Line()))

}

Well that went surprisingly well, in that it passes the tests and is pretty understandable as a parser. The main wrinkle is that I don’t seem to be able to inline the

val result = Var(CodeBlock())

expressions - this must be something to do with the aggressive bytecode manipulation that Parboiled is performing. I expect that I will either understand this or be bitten by it again before the end of this project.

I’m a little dissatisfied that the lines in the Block objects contain the line separators and indents, but it’s easier to throw away information than to try to reconstruct it, so I reason that they should stay for now. At least the tests document the behaviour.
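To make the asymmetry concrete: while the separators are kept, the original text can be reproduced exactly, and a consumer that doesn't want them can drop them in one line. A minimal sketch with hypothetical helper names, assuming lines shaped like those in the indented code block test.

```kotlin
// Lines as the parser currently stores them: indent and "\n" preserved.
val storedLines = listOf("    first shown\n", "    last shown\n")

// Keeping the separators means the original text can be reproduced exactly.
fun originalText(lines: List<String>): String = lines.joinToString("")

// A consumer that doesn't want them can discard them cheaply.
fun withoutSeparators(lines: List<String>): List<String> = lines.map { it.removeSuffix("\n") }
```

Going the other way, from stripped lines back to the source, would mean guessing whether each line originally ended with a newline.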

Now that we have a parser that converts our source into a list of blocks, let’s see how close we can get to having our original parser tests work against the new code.

Let’s remind ourselves what our test looks like. Previously this was our only test; now that we have a separate parser, it looks like a low-level integration test.

class CodeExtractorTests {

    @Test fun writes_a_markdown_file_from_Kotlin_file() {
        checkApprovedTranslation("""
            |package should.not.be.shown
            |/*-
            |Title
            |=====
            |
            |This is Markdown paragraph
            |-*/
            |
            |object HiddenContext {
            |  //`
            |  /* This is a code comment
            |  */
            |  fun aFunction() {
            |     return 42
            |  }
            |  //`
            |}
            |
            |/*-
            |More book text.
            |-*/
            """,
            expected = """
            |Title
            |=====
            |
            |This is Markdown paragraph
            |
            |```kotlin
            |  /* This is a code comment
            |  */
            |  fun aFunction() {
            |     return 42
            |  }
            |```
            |
            |More book text.
            |""")
    }

    private fun checkApprovedTranslation(source: String, expected: String) {
        assertEquals(expected.trimMargin(), translate(source.trimMargin()).joinToString(""))
    }

    fun translate(source: String): List<String> {
        TODO()
    }
}

In order to get this to pass, I’m going to parse the source into blocks, and then render each one.

fun translate(source: String): List<String> {
    val parser = Parboiled.createParser(ContextB4.BookParser::class.java)
    val blocks: List<Block> = ReportingParseRunner<Block>(parser.Root())
        .run(source)
        .valueStack
        .toList()
        .asReversed()
    return blocks.flatMap { render(it) }
}

fun render(block: Block): List<String> = when (block) {
    is TextBlock -> block.lines
    is CodeBlock -> listOf("\n```kotlin\n") + block.lines + "```\n\n"
    else -> error("Unexpected block type")
}

The render function took a bit of tweaking with newlines in code blocks, but it works. And I don’t mind the newlines, as they allow us to express the realities of the requirements of the output Markdown. Obviously the render function should be moved to a method on Block, but I can’t resist the temptation to try the new code on the whole of the book so far. We just have to plug the latest translate into the old file-reading code and

it works!

Flushed with success I reorganise things and tidy up before bed.

Firstly, the top-level functions are:

fun main(args: Array<String>) {
    val srcDir = File(args[0])
    val outFile = File(args[1]).apply {
        absoluteFile.parentFile.mkdirs()
    }

    val translatedLines: Sequence<String> = sourceFilesIn(srcDir)
        .flatMap { translate(it).plus("\n") }

    outFile.bufferedWriter(Charsets.UTF_8).use { writer ->
        translatedLines.forEach {
            writer.appendln(it)
        }
    }
}

private fun translate(file: File): Sequence<String> = Translator.translate(file.readText())

private fun sourceFilesIn(dir: File) = dir
    .listFiles { file -> file.isSourceFile() }
    .toList()
    .sortedBy(File::getName)
    .asSequence()

private fun File.isSourceFile() = isFile && !isHidden && name.endsWith(".kt")

I then pull the parser running into a Translator object - this allows us to lazily create the parser and runner and reuse them. Kotlin gives us other ways of achieving this aim, but I’m trying this object scoping out to see how it feels.

object Translator {
    private val parser = Parboiled.createParser(BookParser::class.java)
    private val runner = ReportingParseRunner<Block>(parser.Root())

    fun translate(source: String): Sequence<String> = parse(source).flatMap { it.render() }.asSequence()

    fun parse(source: String): List<Block> = runner.run(source).valueStack.toList().asReversed()
}
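One of those other ways is a lazily initialised top-level property. A minimal sketch comparing it with object scoping, with a hypothetical ExpensiveParser class standing in for the Parboiled parser and a counter showing that each instance is created once, on first use.

```kotlin
// Hypothetical stand-in for an expensive-to-build parser; counts constructions.
class ExpensiveParser {
    init { created++ }

    fun parse(source: String): List<String> = source.lines()

    companion object {
        var created = 0
    }
}

// Option 1: object scoping, as in Translator. Initialised on first access.
object ObjectScoped {
    val parser = ExpensiveParser()
}

// Option 2: a top-level property delegated to `lazy` (synchronized by default).
val lazyParser: ExpensiveParser by lazy { ExpensiveParser() }
```

Both defer construction until first use; `by lazy` makes that explicit at the declaration, while the object version relies on JVM class initialisation.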

As promised I move the render function to a method on Block, with a default implementation in the interface

interface Block {
    val lines: MutableList<String>
    fun addLine(line: String) = lines.add(line)
    fun render(): List<String> = lines
}

data class TextBlock(override val lines: MutableList<String>) : Block {
    constructor(vararg lines: String) : this(mutableListOf(*lines))
}

data class CodeBlock(override val lines: MutableList<String>) : Block {
    constructor(vararg lines: String) : this(mutableListOf(*lines))

    override fun render() = listOf("\n```kotlin\n") + lines + "```\n\n"
}

and finally install the Kotlin All-Open compiler plugin and configure it to make @BuildParseTree-annotated classes open, so that we can remove the open modifiers from BookParser and every method in it:

@BuildParseTree
class BookParser : BaseParser<Any>() {

    fun NewLine() = Ch('\n')
    fun OptionalNewLine() = Optional(NewLine())
    fun WhiteSpace() = AnyOf(" \t")
    fun OptionalWhiteSpace() = ZeroOrMore(WhiteSpace())
    fun Line() = Sequence(ZeroOrMore(NoneOf("\n")), NewLine())

    fun TextBlockRule(): Rule {
        val result = Var(TextBlock())
        return BlockRule(String("/*-"), String("-*/"), result)
    }

    fun CodeBlockRule(): Rule {
        val result = Var(CodeBlock())
        return BlockRule(String("//`"), String("//`"), result)
    }

    fun <T : Block> BlockRule(startRule: Rule, endRule: Rule, accumulator: Var<T>) = Sequence(
        BlockEnter(startRule),
        ZeroOrMore(
            LinesNotStartingWith(endRule),
            accumulator.get().addLine(match())),
        BlockExit(endRule),
        push(accumulator.get())
    )

    fun LinesNotStartingWith(rule: Rule) = Sequence(
        OptionalWhiteSpace(),
        TestNot(rule),
        ZeroOrMore(NoneOf("\n")),
        NewLine())

    fun BlockEnter(rule: Rule) = Sequence(OptionalWhiteSpace(), rule, NewLine())
    fun BlockExit(rule: Rule) = Sequence(OptionalWhiteSpace(), rule, OptionalNewLine())

    fun Root() = ZeroOrMore(
        FirstOf(
            TextBlockRule(),
            CodeBlockRule(),
            Line()))

}

Now, a day’s work from where we decided to try a proper parser, I am in a position to add the file include directive.

Firstly, a test for the parsing.

@Test fun `include directive`() {
    check("""
        |hidden
        |//#include "filename1.txt"
        |//#include "filename2.txt"
        |hidden""",
        expected = listOf(
            IncludeDirective("filename1.txt"),
            IncludeDirective("filename2.txt")
        )
    )
}

We can punt on how IncludeDirective should behave for now

data class IncludeDirective(var path: String) : Block {
    override val lines: MutableList<String>
        get() = TODO("not implemented")
}

and just implement the parsing

fun IncludeRule(): Rule {
    return Sequence(
        OptionalWhiteSpace(),
        String("//#include "),
        OptionalWhiteSpace(),
        Ch('"'),
        ZeroOrMore(NoneOf("\"")),
        pushIncludeDirective,
        Ch('"'),
        OptionalWhiteSpace(),
        NewLine()
    )
}

private val pushIncludeDirective = Action<Any> { context ->
    context.valueStack.push(IncludeDirective(context.match))
    true
}

fun Root() = ZeroOrMore(
    FirstOf(
        IncludeRule(),
        TextBlockRule(),
        CodeBlockRule(),
        Line()))

Of course it took me another hour to get IncludeRule working, even after discovering Parboiled’s useful TracingParseRunner. I am beginning to internalise, if not actually understand, the interaction between Rules and Actions - leading to the creation of pushIncludeDirective - so maybe I’m becoming a real programmer after all these years. Not so much that I trust my intuition, though, which is why the test has two include directives, to check that I don’t end up sharing state like last time.
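When intuition about a grammar is shaky, an independent oracle can help. Here is a sketch of a regular expression that accepts the same include lines as IncludeRule appears to: optional indent, `//#include ` with its trailing space, optional whitespace, a double-quoted path and optional trailing whitespace. The name includePath is hypothetical, and the sketch works on lines that have already been split, so it does not check for the terminating newline that the Rule requires.

```kotlin
// Mirrors IncludeRule line by line; group 1 captures the quoted path.
private val includePattern = Regex("""[ \t]*//#include [ \t]*"([^"]*)"[ \t]*""")

// Returns the quoted path if the whole line is an include directive, otherwise null.
fun includePath(line: String): String? =
    includePattern.matchEntire(line)?.groupValues?.get(1)
```

Comparing its answers with the parser's on the same inputs is a cheap way to find out which one is wrong.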

Now that we know where the includes are in our source, we need to process them. In order to do this we need to know how to resolve a path to its contents. For now, I’m just going to pass a contentResolver function to the render method of Block, which, now that it is the parent of IncludeDirective, had better be restructured a bit:

interface Block {
    fun render(contentResolver: (String) -> List<String>): List<String>
}

abstract class SourceBlock : Block {
    abstract val lines: MutableList<String>
    fun addLine(line: String) = lines.add(line)
    override fun render(contentResolver: (String) -> List<String>): List<String> = lines
}

data class TextBlock(override val lines: MutableList<String>) : SourceBlock() {
    constructor(vararg lines: String) : this(mutableListOf(*lines))
}

data class CodeBlock(override val lines: MutableList<String>) : SourceBlock() {
    constructor(vararg lines: String) : this(mutableListOf(*lines))

    override fun render(contentResolver: (String) -> List<String>) =
        listOf("\n```kotlin\n") + lines + "```\n\n"
}

data class IncludeDirective(var path: String) : Block {
    override fun render(contentResolver: (String) -> List<String>) = contentResolver(path)
}

Now our Translator object can pass the contentResolver to each block’s render

object Translator {
    private val parser = Parboiled.createParser(BookParser::class.java)
    private val runner = ReportingParseRunner<Block>(parser.Root())

    fun translate(source: String, contentResolver: (String) -> List<String>): Sequence<String> = parse(source)
        .flatMap { it.render(contentResolver) }
        .asSequence()

    fun parse(source: String): List<Block> = runner.run(source).valueStack.toList().asReversed()
}

leaving the top-level translate(File) function to supply an implementation of the contentResolver

private fun translate(file: File): Sequence<String> =
    Translator.translate(file.readText(),
        contentResolver = { path ->
            file.parentFile.resolve(path).readLines().map { it + "\n" }
        }
    )

and yes it did take me a while to realise that I needed to re-add the newlines stripped by File.readLines.
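Passing the resolver in as a function also makes rendering testable without touching the file system. This sketch uses cut-down copies of the Block types (just enough to compile on their own) and a Map standing in for the files. The map-based resolver and renderAll are hypothetical helpers, not part of the book's code; the resolver re-adds newlines for the same reason the file-based one does.

```kotlin
// Cut-down copies of the Block types, just enough to compile on their own.
interface Block {
    fun render(contentResolver: (String) -> List<String>): List<String>
}

data class TextBlock(val lines: List<String>) : Block {
    override fun render(contentResolver: (String) -> List<String>) = lines
}

data class IncludeDirective(val path: String) : Block {
    override fun render(contentResolver: (String) -> List<String>) = contentResolver(path)
}

// Hypothetical test resolver: "files" live in a Map instead of on disk.
// Reader.readLines strips line terminators like File.readLines, so re-add them.
fun mapResolver(files: Map<String, String>): (String) -> List<String> = { path ->
    val text = files[path] ?: error("No such file: $path")
    text.reader().readLines().map { it + "\n" }
}

fun renderAll(blocks: List<Block>, contentResolver: (String) -> List<String>): String =
    blocks.flatMap { it.render(contentResolver) }.joinToString("")
```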

Chapter 4 - Publishing

After an unconscionable amount of time I’m finally in a position to render the first chapter well enough to be reviewed, so I insert the required //#include directives and check the rendering. There is still an issue with the Markdown renderer interpreting

```
three quotes in code blocks
```

which occurs when we include some Markdown inside files, but Googling reveals that I can mark those blocks with four backticks.
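The four-backtick trick generalises. In CommonMark a fenced block is only closed by a fence at least as long as the one that opened it, so a fence one backtick longer than any run inside the content is safe. This sketch shows how a renderer might choose the fence automatically; fenceFor and fenced are hypothetical, the book's renderer does not do this, and the lines are assumed to have no separators.

```kotlin
private val backtickRun = Regex("`+")

// Length of the longest run of consecutive backticks anywhere in the lines.
fun longestBacktickRun(lines: List<String>): Int =
    lines.maxOfOrNull { line -> backtickRun.findAll(line).maxOfOrNull { it.value.length } ?: 0 } ?: 0

// At least three backticks, and always longer than any run inside the content.
fun fenceFor(lines: List<String>): String =
    "`".repeat(maxOf(3, longestBacktickRun(lines) + 1))

// Wraps lines in a fence that the content cannot close, with an optional info string.
fun fenced(lines: List<String>, info: String = ""): List<String> {
    val fence = fenceFor(lines)
    return listOf(fence + info) + lines + fence
}
```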

Of course now I have another two chapters of material generated in my attempt to render the first, so I fix those up too and package up the three Markdown files for review.

When I review the Markdown files of the first three chapters, they still really don’t seem like a good format to unleash on other people. If you view them as plain text the code is hard to read, and you need an editor with soft-wraps to view the paragraphs. There are Markdown previewers available, but each does a slightly different job. I tell myself that what I really need is rendering to PDF.

Bitter experience has taught me to be suspicious of any advice that delays getting feedback on a delivery, especially advice from the voices in my head. Deep down I’m scared that this book isn’t going to work, but I’m enjoying writing it. My ego finds it easier to push my own doubts aside than other people’s - this wouldn’t be my first 6-month waste of time. Nevertheless it seems a shame to present the work so far in anything less than the best light…

I had tried a week ago to find a good way to render Markdown to PDF on the Mac. I got as far as installing Pandoc, the best recommendation, but it had some Unicode issue, probably to do with my WORD-JOINER hack. In the end I square the feedback / quality circle by paying Leanpub $99 to host the book. They will then render all the files into a single PDF, generate a table of contents, index etc, and I have no excuses left.

Inevitably, viewing the Leanpub-generated PDF reveals some nuances in their Markdown parsing that require minor text changes, but the process is remarkably smooth. I had only to specify the location of the GitHub repo holding this text, write the translated Markdown into a manuscript directory rather than build/book, add a file Book.txt (currently hardcoded) with the names of the chapter Markdown files in order, push the manuscript directory to GitHub, and click a button on a Leanpub webpage. 30 seconds later, PDF, epub and mobi versions are available for download. I copy the PDF link and send it to some friends for feedback.

While I wait, I figure that I have little to lose by actually publishing the work to date on Leanpub. Their book page shows how complete a book is (I set the figure to 5%) in order to set expectations, so I go ahead and publish. I know that this isn’t really the same as being a published author, but it feels pretty momentous nevertheless.

Chapter 5 - Planning v1.1

MVP

At this stage I think that I can safely say that the publishing code is a Minimal Viable Product. I recently had a disagreement with a client about the definition of MVP after they published a long document describing the features that their MVP would have. To me though, you cannot predict whether your product will be viable when it has a set of features - you can only release it with those features and see if it is. And if we wait until we have our MVP complete before release, how do we know that it wouldn’t have been viable with half of them?

It follows that you can only approach an MVP from below, and you have to be continually releasing Sub-minimal Viable Products until, at some point, you discover that one of them is viable.

That is the process that our publishing code went through. Viable in this case means good enough to generate a reviewable manuscript for this book. I really can’t think of any feature that the software has that I could take out and still clear that bar. I suppose that the code itself could be more minimal with the same features, but I’m in quite a happy place. This is version 1.0.

Is the book at the MVP stage yet? Frankly I doubt it. I believe (without any evidence yet) that my target audience could read and maybe even learn from the chapters so far, but there is a crucial difference between a book and a product. A product is generally used more than once - if it is fit for purpose the first time you use it, you will find it has more utility as it gets better. A book, on the other hand, is to a large extent used up by reading it. If I revise Chapter 2 three times it may get better, but your interest will be over after at most two readings.

Looked at like this, I suppose we might find that some number of completed chapters were minimal viable, but as soon as I start moving material around, or heavily editing existing chapters, that confuses the equation. Perhaps that is why the episodic format is attractive - it is more amenable to addition rather than revision. Perhaps the Open-Closed Principle is at work here too.

Enhancements

One of the nice things about this process that I’ve slipped into is that I don’t have to prioritise writing prose over writing software, I only have to do the latter and write about it. There are times when I could be cutting code quicker if I weren’t documenting the process, and there is a whole layer of source complexity involved in having several generations of the same code coexisting in the same file that I am (currently) hiding from you, but on the whole the self-reflection is working well for me.

I’ve had my author’s hat on for a while now. The last time I wore the developer hat was completing the parser in Chapter 3. Since then the author has tried to publish, found it (at least technically) successful, declared that we have an MVP, and has written about it. That process has inevitably raised some issues in the software, so I grab both hats and discuss what new features would be nice.

Simplify Prose Comment Block Ends

Author - “Every time I type /*- at the beginning of a line to start a prose block, IntelliJ helpfully inserts a */ on the line underneath to close the comment, but I have to prepend a - to it to end our prose block. I keep forgetting to do this, and nothing alerts me until I view the rendered Markdown mess. Can you warn me?”

Developer - “I could make it an error at the point of translation I suppose, but then I’d have to find a way to tell you where the problem was. Why don’t I just allow

/*-
This is published
 */

i.e. let the normal block-comment end close our prose blocks? Unless you need to overlap code and prose comments, that should be fine, and it would be backward compatible with the current functionality.”

Author - “If you could do that it would be great. How long do you think it would take?”

Developer - “Hmmm, 20 minutes if it works out like I think it should, otherwise all day.”

Author - “Well it will almost certainly save me 20 minutes over the course of the book. I think that we should do it if it’s done in that time, otherwise stop and we’ll drop the feature.”
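The Developer's proposal amounts to accepting either marker as the end of a prose block. Here it is sketched as a line-level predicate with the hypothetical name endsProseBlock; it is an illustration of the rule being discussed, not the change eventually made to the parser.

```kotlin
// Either the book's explicit marker or a normal block-comment end closes prose,
// as long as it is the first non-whitespace text on the line.
private val proseEndMarkers = listOf("-*/", "*/")

fun endsProseBlock(line: String): Boolean {
    val trimmed = line.trimStart(' ', '\t')
    return proseEndMarkers.any { trimmed.startsWith(it) }
}
```

Because -*/ is still accepted, existing chapters would parse as before, which is the backward compatibility the Developer mentions.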

Unindent Source

Author - “When Kotlin is rendered into code blocks it keeps whatever level of indentation it had in the source. This often results in lines being wrapped in the PDF, which is less than ideal, and will be even worse on phone screens. Can we fix that?”

Developer - “Shouldn’t be a problem - I can just remove the first line’s indent from all lines. Or the minimum indent from all lines. Which would you like?”

Author - “I don’t know - which would be quicker to try?”

Developer - “Well it’s easier to find the first line’s indent, but there really isn’t much in it. Let me do that for now, and you can let me know whether you need a better job done. Maybe an hour’s work, as I have some test cases to update.”
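The two options the Developer offers differ only in where the indent to remove comes from. This sketch of both uses hypothetical function names and works on lines without separators; blank lines are ignored when finding the minimum, and no line loses more than its own leading whitespace. Kotlin's standard library trimIndent does something similar to the second option for a whole string, though it also drops a blank first and last line.

```kotlin
// Number of leading spaces or tabs on a line.
fun indentOf(line: String): Int = line.takeWhile { it == ' ' || it == '\t' }.length

// Option 1: remove however much the first line is indented.
fun unindentByFirstLine(lines: List<String>): List<String> {
    val indent = lines.firstOrNull()?.let(::indentOf) ?: 0
    return lines.map { it.drop(minOf(indent, indentOf(it))) }
}

// Option 2: remove the smallest indent of any non-blank line.
fun unindentByMinimum(lines: List<String>): List<String> {
    val indent = lines.filter { it.isNotBlank() }.minOfOrNull(::indentOf) ?: 0
    return lines.map { it.drop(minOf(indent, indentOf(it))) }
}
```

The options only disagree when a later line is less indented than the first, as in the second pair of assertions.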

Better Code Block Inclusion

Author - “I keep on having to jump through Markdown hoops because the //#include directive isn’t processed in prose blocks, but I can’t use Markdown code block markers in Kotlin code.”

Developer - “Hmmm, I could work out how to process those directives in prose blocks, or give you a special code for including code block markers at the Kotlin level. Maybe a general purpose directive

//#literal ---

The latter would be simpler, but I’m more interested in learning how to parse more than one thing at once, and it would give you something to write about.”

Author - “Ooh, sneaky, I like that. And the longer it takes you the more material I have.”

Developer - “Yes, within reason - remember how badly that first-generation parser refactoring worked out!? Let’s try to get this one done in a day.”
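For scale, the simpler option is little more than a prefix check. This is a sketch of a hypothetical //#literal directive that emits the rest of its line verbatim, so that a Kotlin comment can carry a Markdown fence. The directive's syntax was still only an idea at this point, so both the prefix and the function names are assumptions.

```kotlin
// Hypothetical directive: `//#literal <text>` renders as `<text>`.
private const val LITERAL_PREFIX = "//#literal "

// Returns the literal text if the line is a literal directive, otherwise null.
fun literalText(line: String): String? {
    val trimmed = line.trimStart(' ', '\t')
    return if (trimmed.startsWith(LITERAL_PREFIX)) trimmed.removePrefix(LITERAL_PREFIX) else null
}

// Replaces literal directives with their text and passes other lines through.
fun renderLiterals(lines: List<String>): List<String> = lines.map { literalText(it) ?: it }
```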

Streamline New Chapters

Author - “Every time I start a new chapter, I have to update the Book.txt that Leanpub uses to stitch together the constituent Markdown files, and add a line to the shell script that invokes the translator on a directory.”

Developer - “Oh, so Leanpub is stitching together files as well as us? Why didn’t you say so? Why don’t I just gather together all our translated source files in order and generate Book.txt from those? It’s probably an hour’s work, and will remove some code that we currently don’t have any tests for.”
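The Developer's suggestion can be written as a pure function, which makes it easy to test without a file system. It takes the source file names, keeps the Kotlin ones and sorts them by name as main already does. It then writes one Markdown file name per line, since Book.txt holds the chapter Markdown file names in order. The bookTxt name and the assumption that each `.kt` file is translated to a same-named `.md` file are illustrative only.

```kotlin
// Assumes each chapter `name.kt` is translated to `name.md`.
fun bookTxt(sourceFileNames: List<String>): String =
    sourceFileNames
        .filter { it.endsWith(".kt") }
        .sorted()
        .joinToString(separator = "\n", postfix = "\n") { it.removeSuffix(".kt") + ".md" }
```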

Streamline Publication

Author - “You’ve given me a batch file to run to build the book, but every time you make changes to the publishing code I have to update that first, and I don’t necessarily know / remember when that is necessary.”

Developer - “Let me look at the Gradle build [sigh] - I can probably figure out a way to build the software, run the tests and build the whole book in one command. It’s probably a couple of hours of pain.”

Prioritization

We have a chat and decide that, while all four features would be nice, Unindent Source would make the biggest difference to the published book. The others are all focused on speeding up writing / development, so are arguably less important, but should be achievable relatively soon after that.

Chapter 6 - Unindent Source

I find that predicting the consequences of a functional change to a codebase is a good bellwether of its health. A healthy codebase is one in which changes don’t propagate far from their root. If I have to touch more than a single class to change this behaviour, I would question how well factored the code is. In addition, predicting and then testing our prediction allows us to gauge how well we understand what we have built.

As we have a clean separation between parsing and rendering, CodeBlock is the only production code that should have to change. The tests are a different story. The only tests we currently have for the render methods are the badly named CodeExtractorTests. These are effectively integration tests of parsing and rendering, and they haven’t yet been updated for the #include directive processing.

In my excitement to get an actual PDF published I’d overlooked this deficit. One of the main arguments in favour of a separate QA function in a software team is that it reins in the natural optimism of developers. In this case I’ve accumulated a little technical debt. Should we pay it down, or get on with making progress?

This is one of those situations where the answer depends on the codebase and on your relationship with your customer. If the codebase has a lot of debt, and it might be difficult to take time after shipping this feature to re-balance things, I’d definitely add some tests now. It is easier to ask forgiveness than permission, and even easier if you have to do neither because you invisibly do the right thing.

In this case though, the relationship between customer and developer is pretty good. I appreciate my desire to publish more readable code as soon as possible, and trust myself to give me permission to pay down debt when it starts affecting progress. So let’s crack on.

Let’s at least add some rendering tests first though.

```kotlin
class SourceBlockRenderingTests {

    val nullRenderingContext: (String) -> List<String> = {
        fail("Shouldn't need to use the rendering context")
    }

    @Test fun `A TextBlock renders as its lines`() {
        assertEquals(listOf(
            "line1\n",
            "line2\n"),
            TextBlock("line1\n", "line2\n").render(nullRenderingContext))
    }

    @Test fun `A CodeBlock renders as a Markdown codeblock`() {
        assertEquals(listOf(
            "\n```kotlin\n",
            "line1\n",
            "line2\n",
            "```\n\n"),
            CodeBlock("line1\n", "line2\n").render(nullRenderingContext))
    }
}
```

Oh, that’s interesting: the CodeBlock’s newlines really aren’t where I expect them to be. Let’s fix that first, because at the very least it violates the Principle of Least Surprise.

```kotlin
    @Test fun `A CodeBlock renders as a Markdown codeblock`() {
        assertEquals(listOf(
            "\n",
            "```kotlin\n",
            "line1\n",
            "line2\n",
            "```\n",
            "\n"),
            CodeBlock("line1\n", "line2\n").render(nullRenderingContext))
    }
}

data class CodeBlock(override val lines: MutableList<String>) : ContextA7.SourceBlock() {
    constructor(vararg lines: String) : this(mutableListOf(*lines))

    override fun render(contentResolver: (String) -> List<String>) =
        listOf("\n", "```kotlin\n") + lines + listOf("```\n", "\n")
}
```
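This change only moves the boundaries between list elements. Assume the rendered lists are eventually concatenated into the Markdown output. In that case the old and new renderings produce exactly the same text, as this sketch shows. The helper names are hypothetical, and the three backticks are built with `repeat()` only to avoid a literal code fence inside the listing.

```kotlin
// Three backticks, built with repeat() to avoid a literal code fence in this listing.
val fence = "`".repeat(3)

// The element layout that CodeBlock.render() produced before the change...
fun oldRendering(lines: List<String>): List<String> =
    listOf("\n${fence}kotlin\n") + lines + listOf("$fence\n\n")

// ...and after it: one element per line of output.
fun newRendering(lines: List<String>): List<String> =
    listOf("\n", "${fence}kotlin\n") + lines + listOf("$fence\n", "\n")
```

The lists differ, so the test had to change. The joined text does not, so the published book shouldn’t.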
Now for the feature itself: a test that the minimum indent is removed, and an implementation to make it pass.

```kotlin
    @Test fun `A CodeBlock renders as a Markdown codeblock with the minimum indent removed`() {
        assertEquals(listOf(
            "\n",
            "```kotlin\n",
            "    line1\n",
            "line2\n",
            "        line3\n",
            "```\n",
            "\n"),
            CodeBlock(
                "        line1\n",
                "    line2\n",
                "            line3\n"
            ).render(nullRenderingContext))
    }
}

data class CodeBlock(override val lines: MutableList<String>) : SourceBlock() {
    constructor(vararg lines: String) : this(mutableListOf(*lines))

    override fun render(contentResolver: (String) -> List<String>) =
        listOf("\n", "```kotlin\n") +
            lines.toLinesWithMinimumIndentRemoved() +
            listOf("```\n", "\n")
}

fun List<String>.toLinesWithMinimumIndentRemoved(): List<String> {
    // minOrNull() (spelled min() in older Kotlin) returns null for an empty list
    val minimumIndent = this.map { it.indentLength }.minOrNull() ?: 0
    return this.map { it.substring(minimumIndent) }
}

val String.indentLength get() = this.indexOfFirst { !it.isWhitespace() }
```

That’s reasonably straightforward, and it reads nicely because of our extension functions. Another Kotlin subtlety is hidden in the definition of minimumIndent. Iterable.minOrNull() returns null if there are no items in the list. When I wrote this code the function was called min(); since Kotlin 1.4 the nullable version is minOrNull(), and min() now throws on an empty collection. In the same situation Java’s Collections.min() throws NoSuchElementException. Kotlin insisted that I couldn’t call String.substring() with an Int? (nullable Int). Without that insistence, rendering an empty CodeBlock would have failed. That’s the sort of thing that’s embarrassing, and I can’t put my hand on my heart and say that I was planning to write a test for the empty case. As it is, I wrestle with my engineering judgement and conclude that such a test isn’t worth adding, but I’d be open to persuasion if my pair felt strongly.
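The difference in how the two libraries treat an empty list is easy to check. This sketch uses `minOrNull()`, the current name for the nullable version. It also wraps Java’s `Collections.min()` so that its exception can be observed. Both function names are hypothetical.

```kotlin
import java.util.Collections

// Kotlin: an empty list has no minimum, reported as null, so the caller
// has to decide what that means (here, an indent of 0).
fun minimumIndentOrZero(indents: List<Int>): Int = indents.minOrNull() ?: 0

// Java: Collections.min() throws NoSuchElementException for an empty collection.
// Catching it here turns that into null so the two behaviours can be compared.
fun javaMinimumOrNull(indents: List<Int>): Int? =
    try {
        Collections.min(indents)
    } catch (e: NoSuchElementException) {
        null
    }
```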

Now I run the code on the source of this book, and my smugness at the type system obviating the need for tests turns out to be a little optimistic. I get “String index out of range: -1” from substring() in toLinesWithMinimumIndentRemoved(). Hah - Kotlin’s indexOfFirst() returns -1, not null, when the predicate doesn’t match any item. I know that -1 is an established convention for “not found”, but really, Kotlin, it looks like you’re just trying to embarrass me.
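Here is the trap in isolation. The sketch uses a copy of the first indentLength under a hypothetical name. A line with no non-whitespace characters gets an indent of -1, and substring() rejects that index by throwing.

```kotlin
// The first version of indentLength, under a hypothetical name so it can stand alone.
val String.naiveIndentLength: Int
    get() = indexOfFirst { !it.isWhitespace() }

// Wraps substring() so the failure can be observed: null means it threw
// (StringIndexOutOfBoundsException is a subclass of IndexOutOfBoundsException).
fun substringOrNull(line: String, startIndex: Int): String? =
    try {
        line.substring(startIndex)
    } catch (e: IndexOutOfBoundsException) {
        null
    }
```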

The problem is lines that are only whitespace or, I suppose, have no characters at all. Let’s add some tests for those. Now that we’ve bitten the bullet of multiple tests, let’s add one for the empty code block too.

```kotlin
@Test fun `An empty CodeBlock renders OK`() {
    assertEquals(listOf(
        "\n",
        "```kotlin\n",
        "```\n",
        "\n"),
        CodeBlock().render(nullRenderingContext))
}

@Test fun `A CodeBlock with blank lines renders OK`() {
    assertEquals(listOf(
        "\n",
        "```kotlin\n",
        "\n",
        "line1\n",
        "    line2\n",
        "```\n",
        "\n"),
        CodeBlock(
            "\n",
            "    line1\n",
            "        line2\n"
        ).render(nullRenderingContext))
}
```

The indent for a blank line is obviously 0, not -1, so let’s force that.

```kotlin
val String.indentLength get() = Math.max(0, this.indexOfFirst { !it.isWhitespace() })
```

Oh good grief, my test still fails. Now that the minimum indent is 0, no indent is removed from any line. This is making me look like a fool.
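Tracing the blank-line test by hand shows why. In this sketch the clamped version is renamed so that it stands alone. The line that is just \n contributes an indent of 0, and that 0 always wins the minimum.

```kotlin
// The clamped indentLength from above, under a hypothetical name.
val String.clampedIndentLength: Int
    get() = maxOf(0, indexOfFirst { !it.isWhitespace() })

// Minimum indent across all lines, computed the same way as the first implementation.
fun minimumClampedIndent(lines: List<String>): Int =
    lines.minOfOrNull { it.clampedIndentLength } ?: 0
```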

OK, new strategy - any line that is entirely whitespace gets an indent of null. Then I’ll remove those nulls from the list before taking the min. I’d better beef up the test a bit first.

```kotlin
@Test fun `A CodeBlock with blank lines renders OK`() {
    assertEquals(listOf(
        "\n",
        "```kotlin\n",
        "\n",
        "line1\n",
        "    line2\n",
        "    \n",
        "```\n",
        "\n"),
        CodeBlock(
            "\n",
            "    line1\n",
            "        line2\n",
            "        \n"
        ).render(nullRenderingContext))
}
```

And now the implementation is

```kotlin
fun List<String>.toLinesWithMinimumIndentRemoved(): List<String> {
    // Blank lines have a null indent, so they don't take part in the minimum
    val minimumIndent = this.map { it.indentLength }.filterNotNull().minOrNull() ?: 0
    return this.map { it.substring(minimumIndent) }
}

val String.indentLength get(): Int? =
    when {
        this.isNullOrBlank() -> null
        else -> Math.max(0, this.indexOfFirst { !it.isWhitespace() })
    }
```

I run that and

java.lang.StringIndexOutOfBoundsException: String index out of range: -3

Gah. Now we’re trying to remove the first four characters from the line that is just \n. OK, last chance to look like a professional programmer.

```kotlin
fun List<String>.toLinesWithMinimumIndentRemoved(): List<String> {
    val minimumIndent = this.map { it.indentLength }.filterNotNull().minOrNull() ?: 0
    return this.map { it.substringIfPossible(minimumIndent) }
}

val String.indentLength get(): Int? =
    when {
        this.isNullOrBlank() -> null
        else -> Math.max(0, this.indexOfFirst { !it.isWhitespace() })
    }

// Leave lines that are too short to lose the indent (such as a bare "\n") alone
fun String.substringIfPossible(startIndex: Int) =
    if (startIndex > length)
        this
    else
        this.substring(startIndex)
```

Well, it’s 23:40 here, so I’m glad to say that this not only passes the tests, but also runs successfully on the entire book so far.

This has been another humbling episode. I have tried, as far as is sensible, to show the process of coding in real life, as so many books skip to “here’s one I prepared earlier.” You may also occasionally have trouble implementing a feature that looks simple at first glance. If so, take heart: I have worked with many excellent developers who would not have spotted these pitfalls until the tests failed either. The code makes a monkey out of the best of us, which is why tests and an enquiring, even sceptical, mind are so useful.

Are we done? To be honest I’m not convinced that I’ve considered all the corner cases here. It really wouldn’t surprise me to find a formatting glitch in the book and trace it back here, or to end up with another IndexOutOfBoundsException. If this code were running in an unattended system, or anywhere that might earn me a late-night call to fix it, I would want another pair of eyes to review the code and tests. As it is, the consequence of failure is just that I will have to context switch back into this code and debug it. The cost of that context switch is easy to underestimate, but this is engineering, and it’s my job to judge the trade-offs. In this case, it’s a chance I’m prepared to take.
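Here is one corner case that slips through: a whitespace-only line that is exactly minimumIndent characters long, newline included. An example is `"   \n"` when the minimum indent is 4. substringIfPossible() then removes everything, newline and all, so the blank line disappears from the output. The sketch below shows this, together with a more defensive alternative that only ever drops leading spaces or tabs, and never more of them than minimumIndent. The alternative is not the code used for the book, and its name is hypothetical.

```kotlin
// The chapter's final version, with min() updated to minOrNull() and indentLength
// written in an equivalent form (a non-blank line always has a non-whitespace char).
fun List<String>.toLinesWithMinimumIndentRemoved(): List<String> {
    val minimumIndent = this.mapNotNull { it.indentLength }.minOrNull() ?: 0
    return this.map { it.substringIfPossible(minimumIndent) }
}

val String.indentLength: Int?
    get() = if (this.isBlank()) null else this.indexOfFirst { !it.isWhitespace() }

fun String.substringIfPossible(startIndex: Int) =
    if (startIndex > length) this else this.substring(startIndex)

// Hypothetical alternative: drop at most minimumIndent leading spaces or tabs, so
// nothing else on a line (in particular its trailing newline) can be removed.
fun List<String>.toLinesWithLeadingWhitespaceTrimmed(): List<String> {
    val minimumIndent = this.mapNotNull { it.indentLength }.minOrNull() ?: 0
    return this.map { line ->
        val leading = line.takeWhile { it == ' ' || it == '\t' }.length
        line.drop(minOf(minimumIndent, leading))
    }
}
```

On the chapter’s own blank-lines test, both versions give the same result. They only diverge on lines like `"   \n"`.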