<!DOCTYPE html>
<title>doc.awk</title>
<link type="text/css" rel="stylesheet" href="style.css" />
<body>
<h1 id="doc-awk"><a href="#doc-awk" class="header">DOC AWK&nbsp;<svg width="16" height="16" xmlns="http://www.w3.org/2000/svg"><g transform="rotate(-30, 8, 8)" stroke="#000000" opacity="0.25"><rect fill="none" height="6" width="8" x="2" y="6" rx="1.5"/><rect fill="none" height="6" width="8" x="6" y="4" rx="1.5"/></g></svg></a></h1>
<p>A quick-and-dirty literate-programming-style documentation generator
inspired by <a class="normal" href="https://ashkenas.com/docco/" title="">docco</a>.</p>
<p>by Case Duckworth <a class="normal" href="&#x6D;a&#105;&#108;&#x74;&#x6F;&#58;a&#x63;&#x64;&#x77;&#x40;&#97;&#x63;&#x64;&#x77;&#x2E;&#x6E;&#x65;&#x74;">&#97;&#x63;&#x64;w&#x40;&#97;&#x63;&#100;&#119;&#x2E;&#110;&#101;t</a></p>
<p>Source available under the <a class="normal" href="https://acdw.casa/gcl" title="">Good Choices License</a>.</p>
<p>There are a lot of quick-and-dirty "literate programming tools" out there, many
of which were inspired by, and also borrowed from, docco.  I was particularly
interested in <a class="normal" href="https://rtomayko.github.io/shocco/" title="">shocco</a>, written in POSIX shell (of which I am a fan).</p>
<p>Notably missing, however, was a converter of some kind written in AWK.  Thus,
DOC AWK was born.</p>
<p>This page is the result of DOC AWK working on itself.  Not bad for &lt; 250 lines
including commentary!  You can pick up the raw source code of doc.awk <a class="normal" href="https://git.acdw.net/doc.awk" title="">in its
git repository</a> to use it yourself.</p>
<h2 id="code"><a href="#code" class="header">Code&nbsp;<svg width="16" height="16" xmlns="http://www.w3.org/2000/svg"><g transform="rotate(-30, 8, 8)" stroke="#000000" opacity="0.25"><rect fill="none" height="6" width="8" x="2" y="6" rx="1.5"/><rect fill="none" height="6" width="8" x="6" y="4" rx="1.5"/></g></svg></a></h2>
<pre><code>BEGIN {
</code></pre>
<p>All the best awk scripts start with a <code>BEGIN</code> block.  In this one, we
set a few variables from the environment, with defaults.  I use the
convenience function <code>getenv</code>, further down this script, to make it
easier.</p>
<p>First, the comment regex.  This regex detects a comment <em>line</em>, not an
inline comment.  By default, it's set up for awk, shell, and other
languages that use <code>#</code> as a comment delimiter, but you can make it
whatever you want.</p>
<pre><code>	COMMENT = getenv("DOCAWK_COMMENT", COMMENT, "^[ \t]*#+[ \t]*")
</code></pre>
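<p>As a standalone aside (this is not part of doc.awk itself), here is a minimal
sketch of how that default regex sorts lines.  <code>is_comment</code> is a
hypothetical helper, and the <code>//</code> regex is only an assumption about what
you might export as <code>DOCAWK_COMMENT</code> for C-style sources.</p>
<pre><code># Sketch only: is_comment is a hypothetical helper, not part of doc.awk.
function is_comment(line, regex)
{
	return (line ~ regex) ? 1 : 0
}

BEGIN {
	regex = "^[ \t]*#+[ \t]*"
	print is_comment("# a comment line", regex)        # 1
	print is_comment("\t## indented comment", regex)   # 1
	print is_comment("x = 1  # inline comment", regex) # 0
	# Assumed regex for languages with // line comments.
	print is_comment("// hello", "^[ \t]*//+[ \t]*")   # 1
}
</code></pre>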
<p>You can set <code>DOCAWK_TEXTPROC</code> to any text processor you want, but the
default is the vendored <code>mdown.awk</code> script in this repo.  It's from
<a class="normal" href="https://github.com/wernsey/d.awk" title="">d.awk</a>.</p>
<pre><code>	TEXTPROC = getenv("DOCAWK_TEXTPROC", TEXTPROC, "./mdown.awk")
</code></pre>
<p>You can also set the processor for code sections of the source file;
the included <code>htmlsafe.awk</code> simply escapes &lt;, &amp;, and &gt;.</p>
<pre><code>	CODEPROC = getenv("DOCAWK_CODEPROC", CODEPROC, "./htmlsafe.awk")
</code></pre>
<p>A file header and footer are enough for most documents.  The
defaults here are the included header.html and footer.html, since the
default output type is html.</p>
<p>Both files are actually <em>templates</em>: a key written as
<code>@@VARIABLE@@</code> expands to the value of that variable.  This is mostly
for title expansion.</p>
<pre><code>	HEADER = getenv("DOCAWK_HEADER", HEADER, "./header.html")
	FOOTER = getenv("DOCAWK_FOOTER", FOOTER, "./footer.html")
}
</code></pre>
<p>Because <code>FILENAME</code> is unset during <code>BEGIN</code>, template expansion that needs
the filename doesn't work there.  Thus, I need a state variable to track
whether we've started or not (so that I don't print a header with every new
file).</p>
<pre><code>! begun {
</code></pre>
<p>The template array is initialized with the document's title.</p>
<pre><code>	TV["TITLE"] = get_title()
</code></pre>
<p>Print the header here, since if multiple files are passed to DOC AWK
they'll all be concatenated anyway.</p>
<pre><code>	file_print(HEADER)
}
</code></pre>
<p><code>doc.awk</code> is multi-file aware.  It also removes the shebang line from the
script if it exists, because you probably don't want that in the output.</p>
<p>It wouldn't be a <em>bad</em> idea to make a heuristic for determining the type of
source file we're converting here.</p>
<pre><code>FNR == 1 {
	begun = 1
	if ($0 ~ COMMENT) {
		lt = "text"
	} else {
		lt = "code"
	}
	if ($0 !~ /^#!/) {
		bufadd(lt)
	}
	next
}
</code></pre>
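<p>A shebang line also matches the default comment regex, which is presumably why
it gets its own check.  The sketch below shows what would end up in the text
buffer without it; <code>strip_comment</code> is a hypothetical helper, not part
of doc.awk.</p>
<pre><code># Sketch only: strip_comment is a hypothetical helper, not part of doc.awk.
function strip_comment(line, regex)
{
	sub(regex, "", line)
	return line
}

BEGIN {
	regex = "^[ \t]*#+[ \t]*"
	shebang = "#!/usr/bin/awk -f"
	print (shebang ~ regex)               # 1: it looks like a comment line
	print strip_comment(shebang, regex)   # !/usr/bin/awk -f
	print (shebang ~ /^#!/)               # 1: so it's checked for separately
}
</code></pre>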
<p>The main logic is quite simple: if a given line is a comment as defined by
<code>DOCAWK_COMMENT</code>, it's in a text block and should be treated as such;
otherwise, it's in a code block.  Accumulate each part in a dedicated buffer,
and on a switch-over between code and text, print the buffer and reset.</p>
<pre><code>$0 !~ COMMENT {
	lt = "code"
	bufprint("text")
}

$0 ~ COMMENT {
	lt = "text"
	bufprint("code")
	sub(COMMENT, "", $0)
}

{
	bufadd(lt)
}
</code></pre>
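<p>To see the switch-over in isolation, here is a small standalone sketch that
runs the same classification over a few made-up lines.  <code>classify</code>
and the sample input are hypothetical, and it prints a marker where doc.awk would
flush a buffer.</p>
<pre><code># Sketch only: classify and the sample lines are hypothetical.
function classify(line, regex)
{
	return (line ~ regex) ? "text" : "code"
}

BEGIN {
	regex = "^[ \t]*#+[ \t]*"
	split("# Say hello.\nBEGIN {\n\tprint \"hi\"\n}\n# Done.", lines, "\n")
	last = ""
	# split() numbers the pieces from 1, so walk until the index runs out.
	for (i = 1; i in lines; i++) {
		kind = classify(lines[i], regex)
		if (kind != last) {
			# This is the point where doc.awk prints the other buffer.
			printf "-- %s block --\n", kind
			last = kind
		}
		print lines[i]
	}
}
</code></pre>
<p>The markers come out as text, code, then text again, one for each change of
line type.</p>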
<p>Of course, at the end there might be something in either buffer, so print that
out too.  I've decided to put text last for the possibility of ending commentary.</p>
<pre><code>END {
	bufprint("code")
	bufprint("text")
	file_print(FOOTER)
}
</code></pre>
<h2 id="functions"><a href="#functions" class="header">Functions&nbsp;<svg width="16" height="16" xmlns="http://www.w3.org/2000/svg"><g transform="rotate(-30, 8, 8)" stroke="#000000" opacity="0.25"><rect fill="none" height="6" width="8" x="2" y="6" rx="1.5"/><rect fill="none" height="6" width="8" x="6" y="4" rx="1.5"/></g></svg></a></h2>
<p><em>bufadd</em>: Add a STR to buffer TYPE.  STR defaults to $0, the input record.</p>
<pre><code>function bufadd(type, str)
{
	buf[type] = buf[type] (str ? str : $0) "\n"
}
</code></pre>
<p><em>bufprint</em>: Print a buffer of TYPE.  Automatically wrap the code blocks in a
little HTML code block.  I could maybe have a DOCAWK_CODE_PRE/POST and maybe
even one for text too, to make it more extensible (to other markup languages,
for example).</p>
<pre><code>function bufprint(type)
{
	buf[type] = trim(buf[type])
	if (buf[type]) {
		if (type == "code") {
			printf "&lt;pre&gt;&lt;code&gt;"
			printf("%s", buf[type]) | CODEPROC
			close(CODEPROC)
			print "&lt;/code&gt;&lt;/pre&gt;"
		} else if (type == "text") {
			print(buf[type]) | TEXTPROC
			close(TEXTPROC)
		}
		buf[type] = ""
	}
}
</code></pre>
<p><em>file_print</em>: Print FILE line-by-line.  <code>getline</code> returns 1 for each line
read, 0 at end of file, and -1 on error, so the <code>&gt; 0</code> check ensures that the
loop bails on error too.</p>
<pre><code>function file_print(file)
{
	if (file) {
		while ((getline l &lt; file) &gt; 0) {
			print template_expand(l)
		}
		close(file)
	}
}
</code></pre>
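<p>A minimal sketch of why the comparison matters: with a bare
<code>while (getline l &lt; file)</code>, the -1 from an unreadable file counts as
true and the loop never ends.  <code>count_lines</code> and the path are
hypothetical; the path is assumed not to exist.</p>
<pre><code># Sketch only: count_lines is a hypothetical helper, not part of doc.awk.
function count_lines(file,    line, n)
{
	n = 0
	# getline returns 1 per record read, 0 at end of file, -1 on error.
	while ((getline line &lt; file) &gt; 0) {
		n++
	}
	close(file)
	return n
}

BEGIN {
	# Assumed not to exist, so the loop body never runs and this prints 0.
	print count_lines("/nonexistent/header.html")
}
</code></pre>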
<p><em>get_title</em>: get the title of the current script, for the expanded document.
If variables are set, use those; otherwise try to figure out the title from
the document's basename.</p>
<pre><code>function get_title()
{
	title = getenv("DOCAWK_TITLE", TITLE)
	if (! title) {
		title = FILENAME
		sub(/.*\//, "", title)
	}
	return title
}
</code></pre>
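<p>The basename step is a greedy <code>sub</code> that deletes everything up to the
last slash.  A small sketch, with <code>basename</code> as a hypothetical helper
and made-up paths:</p>
<pre><code># Sketch only: basename mirrors the sub() call in get_title.
function basename(path)
{
	sub(/.*\//, "", path)
	return path
}

BEGIN {
	print basename("src/tools/doc.awk")   # doc.awk
	print basename("doc.awk")             # doc.awk
}
</code></pre>
<p>The extension is kept, so a file named doc.awk gets the title doc.awk.</p>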
<p><em>getenv</em>: a convenience function for pulling values out of the environment.
If an environment variable ENV isn't found, test if VAR is set (e.g., with <code>doc.awk
-v var=foo</code>) and return it if it is.  Otherwise, return the default value
DEF.</p>
<pre><code>function getenv(env, var, def)
{
	if (ENVIRON[env]) {
		return ENVIRON[env]
	} else if (var) {
		return var
	} else {
		return def
	}
}
</code></pre>
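<p>The order of precedence can be sketched as follows.  <code>DOCAWK_DEMO</code> is
a hypothetical variable name, assumed to be unset, and assigning to
<code>ENVIRON</code> here only simulates an exported variable inside this one
script.</p>
<pre><code># Sketch only: getenv is copied from above so this runs on its own.
function getenv(env, var, def)
{
	if (ENVIRON[env]) {
		return ENVIRON[env]
	} else if (var) {
		return var
	} else {
		return def
	}
}

BEGIN {
	print getenv("DOCAWK_DEMO", "", "fallback")          # fallback
	print getenv("DOCAWK_DEMO", "from -v", "fallback")   # from -v
	ENVIRON["DOCAWK_DEMO"] = "from env"   # simulate an exported variable
	print getenv("DOCAWK_DEMO", "from -v", "fallback")   # from env
}
</code></pre>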
<p><em>template_expand</em>: expand templates of the form <code>@@template@@</code> in the text.
Currently it only does variables, works by line, and expands only the first
template on each line.</p>
<p>Due to the way awk works, template variables need to live in their own special
array, <code>TV</code>.  I'd love it if awk had some kind of <code>eval</code> functionality, but at
least POSIX awk doesn't.</p>
<pre><code>function template_expand(text)
{
	if (match(text, /@@[^@]*@@/)) {
		var = substr(text, RSTART + 2, RLENGTH - 4)
		new = substr(text, 1, RSTART - 1)
		new = new TV[var]
		new = new substr(text, RSTART + RLENGTH)
	} else {
		new = text
	}
	return new
}
</code></pre>
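<p>If a template line ever needed more than one key, a loop over
<code>match</code> would do it.  This is only a sketch of one possible variant:
<code>template_expand_all</code> and the <code>AUTHOR</code> key are hypothetical
and not part of doc.awk.</p>
<pre><code># Sketch only: a hypothetical variant that expands every key on a line.
function template_expand_all(text,    out, var)
{
	out = ""
	while (match(text, /@@[^@]*@@/)) {
		var = substr(text, RSTART + 2, RLENGTH - 4)
		# Keep what precedes the key, then append its value.
		out = out substr(text, 1, RSTART - 1) TV[var]
		# Carry on with whatever follows the key.
		text = substr(text, RSTART + RLENGTH)
	}
	return out text
}

BEGIN {
	TV["TITLE"] = "doc.awk"
	TV["AUTHOR"] = "Case Duckworth"
	print template_expand_all("@@TITLE@@ by @@AUTHOR@@")   # doc.awk by Case Duckworth
}
</code></pre>
<p>As in the original, a key with no entry in <code>TV</code> expands to an empty
string.</p>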
<p><em>trim</em>: remove spaces and newlines from both ends of a string.</p>
<pre><code>function trim(str)
{
	sub(/^[ \n]*/, "", str)
	sub(/[ \n]*$/, "", str)
	return str
}
</code></pre>
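<p>A quick sketch of what that covers.  Note that tabs are not in the character
class, so a leading tab survives; presumably that keeps the indentation of the
first line of a code block, though that is an assumption on my part.</p>
<pre><code># Sketch only: trim is copied from above so this runs on its own.
function trim(str)
{
	sub(/^[ \n]*/, "", str)
	sub(/[ \n]*$/, "", str)
	return str
}

BEGIN {
	printf "[%s]\n", trim("\n\n  hello  \n")   # [hello]
	# The leading tab is kept; only the trailing newline goes.
	printf "[%s]\n", trim("\tindented\n")
}
</code></pre>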
</body>
</html>