From 1c8e5f1ca9bf0eb6dad8a62bc87428610d96c350 Mon Sep 17 00:00:00 2001
From: Case Duckworth
Date: Wed, 3 Aug 2022 23:36:07 -0500
Subject: Begin anew

I think this might really be something!
---
 README.txt | 176 ----------------------------
 ht.awk     | 385 ++++++++++++++++++++++++++++++-------------------------------
 ht.conf    |  24 ++++
 ht.sh      |  62 ++++++++++
 test.ht    |  45 ++++----
 test.txt   |  24 ----
 6 files changed, 298 insertions(+), 418 deletions(-)
 delete mode 100644 README.txt
 create mode 100644 ht.conf
 create mode 100755 ht.sh
 delete mode 100644 test.txt

diff --git a/README.txt b/README.txt
deleted file mode 100644
index c409335..0000000
--- a/README.txt
+++ /dev/null
@@ -1,176 +0,0 @@
-# HAT TRICK
-
-HAT TRICK is both a lightweight markup language inspired by gemtext and html,
-and this awk program to convert the markup language to gemtext, html, and
-gophermap markup. It uses a mixture of "block"-level and line-level sigils to
-extend the pure line-based markup of gemtext, while removing some of the more
-annoying points (to my mind) of writing pure html---i.e., repetitive tags and
-other boilerplate.
-
-## Syntax
-
-### Blocks
-
-In HAT TRICK, a block of text separated by a blank line is a type of "block." The
-default block is a paragraph, or <p> tag in html. (In gemini and gophermaps, no
-extra tags are added.) Other blocks defined by the syntax are as follows:
-
->>>
-# HEADING 1
-## HEADING 2
-### HEADING 3
-<<<
-
-Correspond to in html; passed through unmodified to gemtext and gophermaps.
-
->>>
-> BLOCK QUOTE
-<<<
-
-Corresponds to <blockquote>; passed through unmodified to gemtext and
-gophermaps.
-
->>>
-- UNORDERED LIST ITEM
-1. ORDERED LIST ITEM
-<<<
-
-The first list item in a block automatically opens the necessary list tag in
-html. In gemtext, the "-" is converted to "*" (which signifies a list item);
-the hyphen is passed through in gophermaps, because I think it's better syntax
-personally.
-
->>>
---- SECTION BREAK
-<<<
-
-A visual indication to break sections. Corresponds to <hr> in html
-(TODO: consider html5 tags --- this would take more logic.)
-
-HAT TRICK reflows blocks, which means that only the first line of a block
-needs to start with the sigils outlined above. However, each line of the block
-can begin with the sigil character for easier reading.
-
-### Lines
-
-Within blocks, there are certain other sigils that apply only to the line they
-prepend. They include the following:
-
->>>
-=> LINK
-<<<
-
-Links are probably the most important element in any hypertext language---since
-without them, it's hardly hypertext. HAT TRICK borrows its link syntax from
-gemtext: the line starts with a "=>", the next field is the link's URL, and the
-rest of the line is the link's display text.
-
->>>
- HTML TAG
-<<<
-
-Lines beginning with an html tag are passed on to html verbatim. The closing
-tag is automatically appended to the end of the line, before any ending
-punctuation. I've found that 99 times out of 100, I don't want formatting to
-include the ending punctuation.
-
-A backslash (\) at the end of the tag line will prevent the tag from being
-ended, which is useful for tag-included punctuation as well as nesting tags.
-However, the tag is never closed, so you'll have to close it yourself on the
-next line. In addition, text that isn't in a tag is html-escaped, so for the
-markup to properly apply, you'll need to write something like this:
-
->>>
-She sells \
-sea shells \
-
-on the sea shore.\
-
-<<<
-
-So while this markup is possible, it's discouraged through the awkwardness of
-the syntax.
-
-To translate these tags to meaningful markup in gemtext and gophermaps, a lookup
-table is used to correspond the tags to opening and closing characters around
-the line's text. This correspondence can be defined with the environment
-variable HT_TAGCHARS or HAT TRICK's second positional argument (See INVOCATION,
-below).
-
->>>
-; COMMENT
-<<<
-
-Comments in HAT TRICK aren't passed on to the output text---even in html, which
-has a comment syntax. Instead, comments are passed, including the prepending
-semicolon, to standard error for further processing.
-
-### Verbatim blocks
-
-Finally, there is a special type of block for passing raw text through to the
-next phase of processing: the verbatim block.
-
->>>
- >>> [OUTPUTS]
- VERBATIM TEXT
- <<<
-<<<
-
-In html, the verbatim text is wrapped in
 tags; in gemtext, it's
-wrapped in gemtext's own verbatim text markers ```; and the text is unwrapped in
-gophermaps for a cleaner look.
-
-The OUTPUTS can be any output specifier HAT TRICK accepts; see INVOCATION below
-for details.  If OUTPUTS is present, the verbatim text will only be passed
-through in the output formats listed; with no OUTPUTS listed it will output to
-all formats.
-
-## "Escaping" line- and block-types
-
-Each of the types listed above is anchored at the beginning of the line.
-Therefore, a simple "escaping" mechanism is available for free: simply prepend a
-space to any line you don't want processed as a line or block and you'll be
-gravy.  Astute readers will notice I did just that above, to describe the syntax
-for verbatim fencing.
-
-## Invocation
-
-An invocation of HAT TRICK will look something like this:
-
->>>
-./ht.awk [HT_FORMATS] [HT_TAGCHARS] < INPUT
-<<<
-
-It processes text from standard input and uses two positional parameters to
-customize its usage, in addition to environment variables.  In each instance,
-the parameter will override the variable, and if neither is provided, HAT TRICK
-will choose a default.
-
-### HT_FORMATS (default: "html")
-
-The format(s) to export to.  Can be one or more of "html", "gemini", and
-"gopher".  As a convenience, a format can be prepended with a "-" (i.e.,
-"-html"), in which case every other format will be exported to.  Multiple
-formats can also be specified by separating them with a comma.  The special
-keyword "all" will export to all formats (this is the default).
-
-If HAT TRICK exports to one format, it will simply print out each line
-translated into that format.  However, if more than one format is given, 
-HAT TRICK prints each line multiple times, prepending the name of the format to
-the output.  This allows for further processing to filter outputs according to
-output type with just one pass through the input.
-
-### HT_TAGCHARS (default: 'b:**,i://,code:``')
-
-The correspondence between html tag lines and other output formats.  If
-HT_FORMATS is only html, this option has no real meaning.
-
-Each correspondence is of the (exploded) form
-
->>>
-TAG : LEFT_CHAR RIGHT_CHAR
-<<<
-
-where TAG is the html tag, LEFT_CHAR is the character on the left of the
-enclosed text, and RIGHT_CHAR is the character on the right.  Rules can be
-separated by commas to pass multiple ones to HAT TRICK.
diff --git a/ht.awk b/ht.awk
index 60e042b..b9ae377 100755
--- a/ht.awk
+++ b/ht.awk
@@ -1,246 +1,235 @@
-#!/usr/bin/awk -f
-# -*- indent-tabs-mode: t; -*-
+#!/bin/awk -f
 # HAT TRICK
-# (C) 2022 C. Duckworth
-
-### Commentary:
-
-# OLDIFS=$IFS; IFS=$'\n';
-# for line in `cat testfile`; do
-# test=`echo "$line" | grep -E '[\]$'`;
-# if [ $test ]; then
-# newline=`echo $line | rev | cut -c 2- | rev`;
-# echo -n "$newline"; else echo "$line";
-# fi; done;
-# IFS=$OLDIFS
-
-### Code:
+# Copyright (C) 2022 Case Duckworth 
+#
 BEGIN {
-	width = 72
-	default_htag = "p"
-	default_gtag = ""
-	default_ftag = ""
-}
-
-### Raw formatting
-/^>>>/ {
-	getline first_raw
-	if (raw_fmt_p("html")) {
-		raw_html = 1
-		html[++hpar] = "
" html_escape(first_raw)
-	}
-	if (raw_fmt_p("gemini")) {
-		raw_gemini = 1
-		gemini[++gpar] = "```"
-		gemini[++gpar] = first_raw
-	}
-	if (raw_fmt_p("gopher")) {
-		raw_gopher = 1
-		gopher[++fpar] = first_raw
+	# Configuration
+	DEFAULT_CONFIG_MODE = "config"
+	config_initialize()
+	config_parse(ENVIRON["HT_CONFIG"] ? ENVIRON["HT_CONFIG"] : "ht.conf")
+	# State
+	DEFTAG = CONFIG["default_tag"]
+	DEFATTR = CONFIG["default_attr"]
+	TAG = DEFTAG
+	ATTR = DEFATTR
+}
+
+# Multiple-file awareness
+FNR == 1 {
+	fileflush()
+}
+
+# Handle raw sections
+$0 ~ CONFIG["raw_delim"] {
+	RAW = ! RAW
+	if (RAW) {
+		buflush()
+		bufpush(CONFIG["raw_beg"], -1)
+	} else {
+		bufpush(CONFIG["raw_end"], -1)
+		print BUFFER
+		BUFFER = ""
 	}
-	raw = 1
 	next
 }
 
-/^<<
" - } - if (raw_gemini) { - gemini[++gpar] = "```" - gemini[++gpar] = "" - } - if (raw_gopher) { - gopher[++fpar] = "" - } - raw_html = 0 - raw_gemini = 0 - raw_gopher = 0 - raw = 0 +RAW { + bufpush($0) next } -raw { - if (raw_html) { - html_empty = 0 - html[++hpar] = html_escape($0) - } - if (raw_gemini) { - gemini_empty = 0 - gemini[++gpar] = $0 - } - if (raw_gopher) { - gopher_empty = 0 - gopher[++fpar] = $0 - } +# Comments +$0 ~ ("^" COMMENT_DELIM) { next } -# Block types -/^#/ { - match($0, /#+/) - htag = "h" (RLENGTH > 6 ? 6 : RLENGTH) - gtag = substr($0, RSTART, (RLENGTH > 3 ? 3 : RLENGTH)) " " - ftag = substr($0, RSTART, RLENGTH) " " - sub(/^#+[ \t]*/, "", $0) +# HTML escape hatch +/^/ { - title = "" - for (i = 3; i <= NF; i++) { - title = title (title ? " " : "") $i - } - hbuf[++hline] = "" title "" - gbuf[++gline] = "\ngemini\t" $0 - # TODO: gopher - next +# Sure, let's do templating! This makes it less... weird. +/\$/ { + # XXX: This is probably the dumbest way to do it. + gsub(/\$\$/, "$\a", $0) + gsub(/\$[^\a]/, "\\\\&", $0) + gsub(/\$\a/, "$", $0) } -### Everything else +# Blocks of text /./ { - html_empty = 0 - gemini_empty = 0 - gopher_empty = 0 - hbuf[++hline] = $0 - gbuf[++gline] = $0 - fbuf[++fline] = $0 + # EOL escape + if (match($0, /\\$/)) { + sep = -1 + $0 = substr($0, 1, RSTART - 1) + } else { + sep = "\n" + } + # Loop through BLOCK_TYPES + for (bt in BLOCK_TYPES) { + if (match($0, "^" bt "[ \t]*")) { + $0 = substr($0, RSTART + RLENGTH) + if (match(BLOCK_TYPES[bt], "[ \t]*>[ \t]*")) { + parent = substr(BLOCK_TYPES[bt], 1, RSTART - 1) + child = substr(BLOCK_TYPES[bt], RSTART + RLENGTH) + } + if (parent) { + split(parent, pa, FS) + split(child, bl, FS) + if (! IN_PARENT) { + IN_PARENT = pa[1] + } + TAG = IN_PARENT + ATTR = "" + for (i = 2; i <= length(pa); i++) { + ATTR = ATTR (ATTR ? " " : "") pa[i] + } + bufpush("<" child ">" $0 "") + next # XXX: This is messy. 
+ } else { + split(BLOCK_TYPES[bt], bl, FS) + if (IN_PARENT) { + bufpush("") + IN_PARENT = "" + } + if (! BUFFER) { + TAG = bl[1] + for (b = 2; b <= length(bl); b++) { + ATTR = ATTR (ATTR ? " " : "") bl[b] + } + } else { + $0 = "<" BLOCK_TYPES[bt] ">" $0 "" + } + } + } + } + # Loop through LINE_TYPES + for (lt in LINE_TYPES) { + if (match($0, "^" lt "[ \t]*")) { + $0 = substr($0, RSTART + RLENGTH) + templ = LINE_TYPES[lt] + while (match(templ, /\$[0-9]+/)) { + sub(/\$[0-9]+/, $(substr(templ, RSTART + 1, RLENGTH - 1)), templ) + } + $0 = templ + } + } + # Push to buffer + bufpush($0, sep) } +# Blank lines end blocks /^$/ { - bufput() + if (HTML) { + html_end() + } + if (! RAW) { + buflush() + } } +# Clean up END { - bufput() - printarr(html, "html") - printarr(gemini, "gemini") - printarr(gopher, "gopher") -} - - -function bufput() -{ - hbufput() - gbufput() - fbufput() -} - -function clear(arr) -{ - for (x in arr) { - delete arr[x] + if (HTML) { + html_end() + } else if (RAW) { + bufpush(CONFIG["raw_end"], -1) + print BUFFER + } else { + buflush() } } -function fbufput() -{ - if (! length(fbuf)) { - next - } - for (ln in fbuf) { # XXX: gopher line types - paragraph = paragraph (paragraph ? " " : "") fbuf[ln] - } - fill(paragraph) - for (ln in fp) { - gopher[++fpar] = ((ln == 1) ? ftag : "") fp[ln] - } - gopher[++fpar] = "" - paragraph = "" - ftag = default_ftag - clear(fp) - clear(fbuf) -} -function fill(paragraph) +### Buffer-y functions +function buflush() { - char = 0 - ln = 1 - split(paragraph, words, FS) - for (word in words) { - char += length(words[word]) - if (char <= width) { - fp[ln] = fp[ln] (fp[ln] ? " " : "") words[word] - } else { - fp[++ln] = words[word] - char = length(words[word]) + buftrim() + if (BUFFER) { + if (TAG) { + TAG_BEG = "<" TAG (ATTR ? " " ATTR : "") ">" + TAG_END = "" } + print TAG_BEG BUFFER TAG_END + BUFFER = "" + TAG = DEFTAG + ATTR = DEFATTR + IN_PARENT = "" } } -function gbufput() +function bufpush(text, separator) { - if (! 
length(gbuf)) { - next + if (! separator) { + separator = "\n" } - for (ln in gbuf) { - paragraph = paragraph (paragraph ? " " : "") gbuf[ln] + if (separator == -1) { + separator = "" } - gemini[++gpar] = gtag paragraph - gemini[++gpar] = "" - gtag = default_gtag - paragraph = "" - clear(gbuf) + BUFFER = BUFFER text (separator ? separator : "") } -function gopher_line(type, display, selector, hostname, port) +function buftrim() { - return (type display "\t" selector "\t" hostname "\t" port) -} - -function hbufput() -{ - if (! length(hbuf)) { - next - } - for (ln in hbuf) { - paragraph = paragraph (paragraph ? " " : "") hbuf[ln] - } - fill(paragraph) - for (ln in fp) { - html[++hpar] = ((ln == 1) ? "<" (htag ? htag : default_htag) ">" : "") fp[ln] + if (match(BUFFER, "\n+$")) { + BUFFER = substr(BUFFER, 1, RSTART - 1) } - html[hpar] = html[hpar] (htag_end ? htag_end : "") - paragraph = "" - htag = default_htag - clear(fp) - clear(hbuf) } -function html_escape(text) +### Config functions +function config_initialize() { - gsub(/&/, "\\&", text) - gsub(//, "\\>", text) - return text -} - -function printarr(arr, prefix) + COMMENT_DELIM = ";" + CONFIG["raw_delim"] = "```" + CONFIG["raw_beg"] = "
"
+	CONFIG["raw_end"] = "
" + CONFIG["default_tag"] = "p" + CONFIG["default_attr"] = "" + LINE_TYPES["@"] = "$2" + LINE_TYPES["`"] = "$0" + BLOCK_TYPES["#"] = "h1" + BLOCK_TYPES["##"] = "h2" + BLOCK_TYPES["###"] = "h3" + BLOCK_TYPES["-"] = "ul>li" +} + +function config_parse(file) { - if (prefix) { - fmt = "%s\t%s\n" - } else { - fmt = "%s%s\n" - } - for (x in arr) { - printf fmt, prefix, arr[x] + mode = DEFAULT_CONFIG_MODE + while ((getline < file) > 0) { + if (match($0, /^#/) || ! $0) { + continue + } + if (match($0, /^\\/)) { + $0 = substr($0, 2) + } + if (match($0, /\[[^\]]+\]/)) { + mode = substr($0, RSTART + 1, RLENGTH - 2) + continue + } else { + var = $1 + val = "" + for (i = 2; i <= NF; i++) { + val = val (val ? " " : "") $i + } + if (mode == "config") { + CONFIG[var] = val + } else if (mode == "block") { + BLOCK_TYPES[var] = val + } else if (mode == "line") { + LINE_TYPES[var] = val + } + } } } -function raw_fmt_p(format) +### Other functions +function html_end() { - if (NF < 2) { - return 1 - } - if ($2 ~ /-/) { - if ($2 ~ ("-" format)) { - return 0 - } else { - return 1 - } - } - if ($2 ~ format) { - return 1 - } - return 0 + buftrim() + print BUFFER + BUFFER = "" + HTML = 0 } diff --git a/ht.conf b/ht.conf new file mode 100644 index 0000000..0634a94 --- /dev/null +++ b/ht.conf @@ -0,0 +1,24 @@ +# hat trick configuration file +[config] +raw_delim ``` +raw_begin

+raw_end 
+ +[block] +\# h1 +\## h2 +\### h3 +\#### h4 +\##### h5 +\###### h6 + +- ul>li +% ol>li + +> blockquote + +[line] +@ $2 +` $0 +/ $0 +* $0 diff --git a/ht.sh b/ht.sh new file mode 100755 index 0000000..cc0d0ba --- /dev/null +++ b/ht.sh @@ -0,0 +1,62 @@ +#!/bin/sh +# ht.sh +# *.ht -> *html + +# config +header_file=header.htm +footer_file=footer.htm +meta_file=meta.sh + +# state +HTDAT="$(date +%s)" +HT_TMPL_COUNT=0 + +print() { + printf '%s\n' "$*" +} + +htt() { # htt FILE + # Like `cat`, but with templating. + ht_end="ht_main_${HTDAT}_${HT_TMPL_COUNT}" # be extra double sure + eval "$( + print "cat <<$ht_end" + cat "$@" + print + print "$ht_end" + )" + HT_TMPL_COUNT=$((HT_TMPL_COUNT + 1)) +} + +htmeta_clear() { + # Generate metadata-clearing commands from $meta_file. + while read -r line; do + case "$line" in + *'()'*) # function + unset -f "${line%()*}" + ;; + *=*) # variable assignment + unset -v "${line%=*}" + ;; + *) # other -- XXX: Don't know what to do + ;; + esac + done <"$meta_file" +} + +htmeta() { # htmeta FILE + # Collect metadata from FILE. + # Metadata looks like this: `;;@` + sed -n 's/^;;@//p' "$1" | tee "$meta_file" +} + +main() { + # Make two passes over each input file, collecting metadata and content. + : + # Of course, this isn't safe, but you trust yourself, right? + for file; do + eval "$(htmeta_clear)" + eval "$(htmeta "$file")" + + ./ht.awk <"$file" | htt "$header_file" - "$footer_file" >"${file}ml" + done +} diff --git a/test.ht b/test.ht index 0208568..97425a9 100644 --- a/test.ht +++ b/test.ht @@ -1,27 +1,32 @@ -# a test +# ht: a bespoke document preparation system -here's a test for ht.awk. -it's got paragraphs (these bad boys), long lines and such, and also raw blocks. -=> https://example.com and links! +;; comments are like this. +;; they're a good time. 
->>> -rawblock example1: all of them, & more -## fee fi fo fum -<<< +`ht +is a quasi-line-based markup language that takes inspiration from +@https://gemini.circumlunar.space/docs/gemtext.gmi gemtext\ +, +@https://daringfireball.net/projects/markdown/ markdown\ +, and others. +Its aim is to be somewhat easy to read while being fairly easy to parse. -## just html -but over two lines +In fact, +`ht +is a simple awk script. ->>> html -rawblock example2: just html -hey adora -<<< +## Usage -### not html +- one +- two +- three ->>> -html -rawblock example3: everything /but/ html -# with a header inside, blah -<<< +ordered list: -and finally, the end of the file. +% one +% two +% three + +``` +./ht.awk source.ht +``` diff --git a/test.txt b/test.txt deleted file mode 100644 index 8c47543..0000000 --- a/test.txt +++ /dev/null @@ -1,24 +0,0 @@ -html

-html here's a test for ht.awk. it's got paragraphs (these bad boys), long lines and such, -html and also raw blocks. -html

-html
-html -html

-html and finally, the end of the file. -html

-gemini here's a test for ht.awk. it's got paragraphs (these bad boys), long lines and such, and also raw blocks. -gemini -gemini ``` -gemini rawblock example1: all of them. -gemini fee fi fo fum -gemini ``` -gemini -gemini and finally, the end of the file. -gemini -gopher here's a test for ht.awk. it's got paragraphs (these bad boys), long lines and such, -gopher and also raw blocks. -gopher -gopher -gopher and finally, the end of the file. -gopher -- cgit 1.4.1-21-gabe81