Initial commit HEAD main

author: Case Duckworth 2022-08-02 09:25:42 -0500
committer: Case Duckworth 2022-08-02 09:26:59 -0500
commit: 0d81f5100640c7f961fe6d6e79a6b0d801b3289b (patch)
tree: d5edcc746b7612e8ebebf2ea7407c6eabd53b9f7 /README.html
1 files changed, 203 insertions, 0 deletions
diff --git a/README.html b/README.html
new file mode 100644
index 0000000..94c9c2d
--- /dev/null
+++ b/README.html
@@ -0,0 +1,203 @@
+<!DOCTYPE html>
+<title>doc.awk</title>
+<link type="text/css" rel="stylesheet" href="style.css" />
+<body>
+<h1 id="doc-awk"><a href="#doc-awk" class="header">DOC AWK&nbsp;<svg width="16" height="16" xmlns="http://www.w3.org/2000/svg"><g transform="rotate(-30, 8, 8)" stroke="#000000" opacity="0.25"><rect fill="none" height="6" width="8" x="2" y="6" rx="1.5"/><rect fill="none" height="6" width="8" x="6" y="4" rx="1.5"/></g></svg></a></h1>
+<p>A quick-and-dirty literate-programming-style documentation generator
+inspired by <a class="normal" href="https://ashkenas.com/docco/" title="">docco</a>.</p>
+<p>by Case Duckworth <a class="normal" href="&#x6D;a&#105;&#108;&#x74;&#x6F;&#58;a&#x63;&#x64;&#x77;&#x40;&#97;&#x63;&#x64;&#x77;&#x2E;&#x6E;&#x65;&#x74;">&#97;&#x63;&#x64;w&#x40;&#97;&#x63;&#100;&#119;&#x2E;&#110;&#101;t</a></p>
+<p>Source available under the <a class="normal" href="https://acdw.casa/gcl" title="">Good Choices License</a>.</p>
+<p>There are a lot of quick-and-dirty "literate programming tools" out there, many
+of which were inspired by, and also borrowed from, docco.  I was particularly
+interested in <a class="normal" href="https://rtomayko.github.io/shocco/" title="">shocco</a>, written in POSIX shell (of which I am a fan).</p>
+<p>Notably missing, however, was a converter of some kind written in AWK.  Thus,
+DOC AWK was born.</p>
+<p>This page is the result of DOC AWK working on itself.  Not bad for &lt; 250 lines
+including commentary!  You can pick up the raw source code of doc.awk <a class="normal" href="https://git.acdw.net/doc.awk" title="">in its
+git repository</a> to use it yourself.</p>
+<h2 id="code"><a href="#code" class="header">Code&nbsp;<svg width="16" height="16" xmlns="http://www.w3.org/2000/svg"><g transform="rotate(-30, 8, 8)" stroke="#000000" opacity="0.25"><rect fill="none" height="6" width="8" x="2" y="6" rx="1.5"/><rect fill="none" height="6" width="8" x="6" y="4" rx="1.5"/></g></svg></a></h2>
+<pre><code>BEGIN {
+</code></pre>
+<p>All the best awk scripts start with a <code>BEGIN</code> block.  In this one, we
+set a few variables from the environment, with defaults.  I use the
+convenience function <code>getenv</code>, further down this script, to make it
+easier.</p>
+<p>First, the comment regex.  This regex detects a comment <em>line</em>, not an
+inline comment.  By default, it's set up for awk, shell, and other
+languages that use <code>#</code> as a comment delimiter, but you can make it
+whatever you want.</p>
+<pre><code>     COMMENT = getenv("DOCAWK_COMMENT", COMMENT, "^[ \t]*#+[ \t]*")
+</code></pre>
+<p>You can set <code>DOCAWK_TEXTPROC</code> to any text processor you want, but the
+default is the vendored <code>mdown.awk</code> script in this repo.  It's from
+<a class="normal" href="https://github.com/wernsey/d.awk" title="">d.awk</a>.</p>
+<pre><code>     TEXTPROC = getenv("DOCAWK_TEXTPROC", TEXTPROC, "./mdown.awk")
+</code></pre>
+<p>You can also set the processor for code sections of the source file;
+the included <code>htmlsafe.awk</code> simply escapes &lt;, &amp;, and &gt;.</p>
+<pre><code>     CODEPROC = getenv("DOCAWK_CODEPROC", CODEPROC, "./htmlsafe.awk")
+</code></pre>
+<p>Usually, a file header and footer are enough for most documents.  The
+defaults here are the included header.html and footer.html, since the
+default output type is html.</p>
+<p>Each of these documents is actually a <em>template</em>: keys written as
+<code>@@VARIABLE@@</code> expand to the value of the matching template variable.
+This is mostly for title expansion.</p>
+<pre><code>     HEADER = getenv("DOCAWK_HEADER", HEADER, "./header.html")
+        FOOTER = getenv("DOCAWK_FOOTER", FOOTER, "./footer.html")
+}
+</code></pre>
+<p>Because <code>FILENAME</code> is unset during <code>BEGIN</code>, template expansion that attempts
+to view the filename doesn't work.  Thus, I need a state variable to track
+whether we've started or not (so that I don't print a header with every new
+file).</p>
+<pre><code>! begun {
+</code></pre>
+<p>The template array is initialized with the document's title.</p>
+<pre><code>     TV["TITLE"] = get_title()
+</code></pre>
+<p>Print the header here, since if multiple files are passed to DOC AWK
+they'll all be concatenated anyway.</p>
+<pre><code>     file_print(HEADER)
+}
+</code></pre>
+<p><code>doc.awk</code> is multi-file aware.  It also removes the shebang line from the
+script if it exists, because you probably don't want that in the output.</p>
+<p>It wouldn't be a <em>bad</em> idea to make a heuristic for determining the type of
+source file we're converting here.</p>
+<pre><code>FNR == 1 {
+        begun = 1
+        if ($0 ~ COMMENT) {
+                lt = "text"
+        } else {
+                lt = "code"
+        }
+        if ($0 !~ /^#!/) {
+                bufadd(lt)
+        }
+        next
+}
+</code></pre>
+<p>The main logic is quite simple: if a given line is a comment as defined by
+<code>DOCAWK_COMMENT</code>, it's in a text block and should be treated as such;
+otherwise, it's in a code block.  Accumulate each part in a dedicated buffer,
+and on a switch-over between code and text, print the buffer and reset.</p>
+<pre><code>$0 !~ COMMENT {
+        lt = "code"
+        bufprint("text")
+}
+$0 ~ COMMENT {
+        lt = "text"
+        bufprint("code")
+        sub(COMMENT, "", $0)
+}
+{
+        bufadd(lt)
+}
+</code></pre>
+<p>Of course, at the end there might be something in either buffer, so print that
+out too.  I've decided to put text last for the possibility of ending commentary.</p>
+<pre><code>END {
+        bufprint("code")
+        bufprint("text")
+        file_print(FOOTER)
+}
+</code></pre>
+<h2 id="functions"><a href="#functions" class="header">Functions&nbsp;<svg width="16" height="16" xmlns="http://www.w3.org/2000/svg"><g transform="rotate(-30, 8, 8)" stroke="#000000" opacity="0.25"><rect fill="none" height="6" width="8" x="2" y="6" rx="1.5"/><rect fill="none" height="6" width="8" x="6" y="4" rx="1.5"/></g></svg></a></h2>
+<p><em>bufadd</em>: Add STR to the buffer TYPE.  STR defaults to $0, the input record.</p>
+<pre><code>function bufadd(type, str)
+{
+        buf[type] = buf[type] (str ? str : $0) "\n"
+}
+</code></pre>
+<p><em>bufprint</em>: Print a buffer of TYPE.  Automatically wrap the code blocks in a
+little HTML code block.  I could maybe have a DOCAWK_CODE_PRE/POST and maybe
+even one for text too, to make it more extensible (to other markup languages,
+for example).</p>
+<pre><code>function bufprint(type)
+{
+        buf[type] = trim(buf[type])
+        if (buf[type]) {
+                if (type == "code") {
+                        printf "&lt;pre&gt;&lt;code&gt;"
+                        printf("%s", buf[type]) | CODEPROC
+                        close(CODEPROC)
+                        print "&lt;/code&gt;&lt;/pre&gt;"
+                } else if (type == "text") {
+                        print(buf[type]) | TEXTPROC
+                        close(TEXTPROC)
+                }
+                buf[type] = ""
+        }
+}
+</code></pre>
+<p><em>file_print</em>: Print FILE line-by-line.  The <code>&gt; 0</code> check ends the loop
+at end of file (0) and also bails on a read error (-1).</p>
+<pre><code>function file_print(file)
+{
+        if (file) {
+                while ((getline l &lt; file) &gt; 0) {
+                        print template_expand(l)
+                }
+                close(file)
+        }
+}
+</code></pre>
+<p><em>get_title</em>: get the title of the current script, for the expanded document.
+If <code>DOCAWK_TITLE</code> or <code>TITLE</code> is set, use it; otherwise fall back to the
+basename of the current input file.</p>
+<pre><code>function get_title()
+{
+        title = getenv("DOCAWK_TITLE", TITLE)
+        if (! title) {
+                title = FILENAME
+                sub(/.*\//, "", title)
+        }
+        return title
+}
+</code></pre>
+<p><em>getenv</em>: a convenience function for pulling values out of the environment.
+If the environment variable ENV is unset or empty, test whether VAR is set (e.g., with
+<code>doc.awk -v var=foo</code>) and return it if so.  Otherwise, return the default value
+DEF.</p>
+<pre><code>function getenv(env, var, def)
+{
+        if (ENVIRON[env]) {
+                return ENVIRON[env]
+        } else if (var) {
+                return var
+        } else {
+                return def
+        }
+}
+</code></pre>
+<p><em>template_expand</em>: expand templates of the form <code>@@template@@</code> in the text.
+Currently it only does variables, works by line, and expands only the first template on each line.</p>
+<p>Due to the way awk works, template variables need to live in their own special
+array, <code>TV</code>.  I'd love it if awk had some kind of <code>eval</code> functionality, but at
+least POSIX awk doesn't.</p>
+<pre><code>function template_expand(text)
+{
+        if (match(text, /@@[^@]*@@/)) {
+                var = substr(text, RSTART + 2, RLENGTH - 4)
+                new = substr(text, 1, RSTART - 1)
+                new = new TV[var]
+                new = new substr(text, RSTART + RLENGTH)
+        } else {
+                new = text
+        }
+        return new
+}
+</code></pre>
+<p><em>trim</em>: remove spaces and newlines from either end of a string.</p>
+<pre><code>function trim(str)
+{
+        sub(/^[ \n]*/, "", str)
+        sub(/[ \n]*$/, "", str)
+        return str
+}
+</code></pre>
+</body>
+</html>
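
The comment-line rule described in the patch above can be tried in isolation. The following is only a rough sketch, not part of doc.awk: `classify` is a hypothetical helper, and the regex is passed with `-v` here so the example is self-contained, whereas the README says doc.awk itself reads it from `DOCAWK_COMMENT`. It assumes a language that uses `//` for comment lines and shows that an inline comment still counts as code.

```shell
#!/bin/sh
# Hypothetical helper (not in doc.awk): label each input line as "text" or
# "code" using a comment-line regex in the same shape as the default COMMENT,
# adapted here for //-style comments.
classify() {
    awk -v COMMENT='^[ \t]*//+[ \t]*' '
        # A whole-line comment: strip the delimiter and treat it as text.
        $0 ~ COMMENT { sub(COMMENT, "", $0); print "text: " $0; next }
        # Anything else, including lines with an inline comment, is code.
                     { print "code: " $0 }
    '
}

printf '%s\n' \
    '// Add two numbers.' \
    'int add(int a, int b) {' \
    '    return a + b; // inline' \
    '}' | classify
```

Because the regex is anchored at the start of the line, only the first line is reported as text; the `// inline` remark stays with its code line.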
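
The `template_expand` function in the patch calls `match` once, so it replaces one `@@KEY@@` per line. If several keys per line were wanted, one possible variant is a loop like the sketch below. This is an illustration under assumptions, not the author's code: `template_expand_all` and `expand_all` are hypothetical names, and `AUTHOR` is a made-up key (the patch only sets `TV["TITLE"]`).

```shell
#!/bin/sh
# Hypothetical variant (not in doc.awk): expand every @@KEY@@ on a line by
# repeatedly matching the remaining text and appending the expansion.
expand_all() {
    awk '
        function template_expand_all(text,    out, key) {
            out = ""
            while (match(text, /@@[^@]*@@/)) {
                key = substr(text, RSTART + 2, RLENGTH - 4)
                # Keep the text before the key, then the value from TV.
                out = out substr(text, 1, RSTART - 1) TV[key]
                # Continue with whatever follows the key.
                text = substr(text, RSTART + RLENGTH)
            }
            return out text
        }
        # TITLE mirrors the patch; AUTHOR is an assumed key for illustration.
        BEGIN { TV["TITLE"] = "doc.awk"; TV["AUTHOR"] = "Case Duckworth" }
        { print template_expand_all($0) }
    '
}

printf '%s\n' '<title>@@TITLE@@ by @@AUTHOR@@</title>' | expand_all
```

Keys that are missing from `TV` expand to the empty string, the same as in the single-match version.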