author: Case Duckworth, 2022-08-03 23:36:07 -0500
committer: Case Duckworth, 2022-08-03 23:36:07 -0500
commit: 1c8e5f1ca9bf0eb6dad8a62bc87428610d96c350
tree: a0e4c14ae7f88aca2ef777725641d95dccb4d52e
parent: Add README
download: ht-1c8e5f1ca9bf0eb6dad8a62bc87428610d96c350.tar.gz, ht-1c8e5f1ca9bf0eb6dad8a62bc87428610d96c350.zip
Begin anew
I think this might really be something!
-rw-r--r--  README.txt  176
-rwxr-xr-x  ht.awk      385
-rw-r--r--  ht.conf      24
-rwxr-xr-x  ht.sh        62
-rw-r--r--  test.ht      45
-rw-r--r--  test.txt     24
6 files changed, 298 insertions, 418 deletions
diff --git a/README.txt b/README.txt
deleted file mode 100644
index c409335..0000000
--- a/README.txt
+++ /dev/null
@@ -1,176 +0,0 @@
# HAT TRICK

HAT TRICK is both a lightweight markup language inspired by gemtext and html,
and this awk program to convert the markup language to gemtext, html, and
gophermap markup. It uses a mixture of "block"-level and line-level sigils to
extend the pure line-based markup of gemtext, while removing some of the more
annoying points (to my mind) of writing pure html---i.e., repetitive tags and
other boilerplate.

## Syntax

### Blocks

In HAT TRICK, a block of text separated by a blank line is a type of "block."
The default block is a paragraph, or <p> tag in html. (In gemini and gophermaps,
no extra tags are added.) Other blocks defined by the syntax are as follows:

>>>
# HEADING 1
## HEADING 2
### HEADING 3
<<<

Correspond to <hx> in html; passed through unmodified to gemtext and gophermaps.

>>>
> BLOCK QUOTE
<<<

Corresponds to <blockquote>; passed through unmodified to gemtext and
gophermaps.

>>>
- UNORDERED LIST ITEM
1. ORDERED LIST ITEM
<<<

The first list item in a block automatically opens the necessary list tag in
html. In gemtext, the "-" is converted to "*" (which signifies a list item);
the hyphen is passed through in gophermaps, because I think it's better syntax
personally.

>>>
--- SECTION BREAK
<<<

A visual indication to break sections. Corresponds to <hr> in html.
(TODO: consider html5 <section> tags --- this would take more logic.)

HAT TRICK reflows blocks, which means that only the first line of a block
needs to start with the sigils outlined above. However, each line of the block
can begin with the sigil character for easier reading.

### Lines

Within blocks, there are certain other sigils that apply only to the line they
prepend. They include the following:

>>>
=> LINK
<<<

Links are probably the most important element in any hypertext language---since
without them, it's hardly hypertext. HAT TRICK borrows its link syntax from
gemtext: the line starts with a "=>", the next field is the link's URL, and the
rest of the line is the link's display text.

>>>
<TAG> HTML TAG
<<<

Lines beginning with an html tag are passed on to html verbatim. The closing
tag is automatically appended to the end of the line, before any ending
punctuation. I've found that 99 times out of 100, I don't want formatting to
include the ending punctuation.

A backslash (\) at the end of the tag line will prevent the tag from being
ended, which is useful for tag-included punctuation as well as nesting tags.
However, the tag is never closed, so you'll have to close it yourself on the
next line. In addition, text that isn't in a tag is html-escaped, so for the
markup to properly apply, you'll need to write something like this:

>>>
<b>She sells \
<i>sea shells \
</i>
on the sea shore.\
</b>
<<<

So while this markup is possible, it's discouraged through the awkwardness of
the syntax.

To translate these tags to meaningful markup in gemtext and gophermaps, a lookup
table is used to correspond the tags to opening and closing characters around
the line's text. This correspondence can be defined with the environment
variable HT_TAGCHARS or HAT TRICK's second positional argument (see INVOCATION,
below).

>>>
; COMMENT
<<<

Comments in HAT TRICK aren't passed on to the output text---even in html, which
has a comment syntax. Instead, comments are passed, including the prepending
semicolon, to standard error for further processing.

### Verbatim blocks

Finally, there is a special type of block for passing raw text through to the
next phase of processing: the verbatim block.

>>>
 >>> [OUTPUTS]
 VERBATIM TEXT
 <<<
<<<

In html, the verbatim text is wrapped in <pre><code> tags; in gemtext, it's
wrapped in gemtext's own verbatim text markers ```; and the text is unwrapped in
gophermaps for a cleaner look.

The OUTPUTS can be any output specifier HAT TRICK accepts; see INVOCATION below
for details. If OUTPUTS is present, the verbatim text will only be passed
through in the output formats listed; with no OUTPUTS listed it will output to
all formats.

## "Escaping" line- and block-types

Each of the types listed above is anchored at the beginning of the line.
Therefore, a simple "escaping" mechanism is available for free: simply prepend a
space to any line you don't want processed as a line or block and you'll be
gravy. Astute readers will notice I did just that above, to describe the syntax
for verbatim fencing.

## Invocation

An invocation of HAT TRICK will look something like this:

>>>
./ht.awk [HT_FORMATS] [HT_TAGCHARS] < INPUT
<<<

It processes text from standard input and uses two positional parameters to
customize its usage, in addition to environment variables. In each instance,
the parameter will override the variable, and if neither is provided, HAT TRICK
will choose a default.

### HT_FORMATS (default: "html")

The format(s) to export to. Can be one or more of "html", "gemini", and
"gopher". As a convenience, a format can be prepended with a "-" (i.e.,
"-html"), in which case every other format will be exported to. Multiple
formats can also be specified by separating them with a comma. The special
keyword "all" will export to all formats (this is the default).

If HAT TRICK exports to one format, it will simply print out each line
translated into that format. However, if more than one format is given,
HAT TRICK prints each line multiple times, prepending the name of the format to
the output. This allows for further processing to filter outputs according to
output type with just one pass through the input.

### HT_TAGCHARS (default: 'b:**,i://,code:``')

The correspondence between html tag lines and other output formats. If
HT_FORMATS is only html, this option has no real meaning.

Each correspondence is of the (exploded) form

>>>
TAG : LEFT_CHAR RIGHT_CHAR
<<<

where TAG is the html tag, LEFT_CHAR is the character on the left of the
enclosed text, and RIGHT_CHAR is the character on the right. Rules can be
separated by commas to pass multiple ones to HAT TRICK.
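
The HT_FORMATS section of the removed README says that multi-format output is prefixed with the format name so it can be filtered in one pass. A minimal sketch of such a filter follows, assuming the prefix and the text are separated by a tab (as the `"%s\t%s\n"` format in the old `printarr` suggests). The name `split_formats` and the `out.FORMAT` file naming are hypothetical, not part of the repository.

```shell
# split_formats DIR: read "format<TAB>text" lines on stdin and write each
# text line to DIR/out.FORMAT. (Hypothetical helper and file naming.)
split_formats() {
    awk -v dir="$1" '
    {
        tab = index($0, "\t")
        if (tab == 0) next                 # no format prefix: skip the line
        file = dir "/out." substr($0, 1, tab - 1)
        print substr($0, tab + 1) > file   # one output file per format
    }'
}
```

With the old script this might be used as `./ht.awk all < test.ht | split_formats build`, where `build` is an existing directory.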
diff --git a/ht.awk b/ht.awk
index 60e042b..b9ae377 100755
--- a/ht.awk
+++ b/ht.awk
@@ -1,246 +1,235 @@
1#!/usr/bin/awk -f 1#!/bin/awk -f
2# -*- indent-tabs-mode: t; -*-
3# HAT TRICK 2# HAT TRICK
4# (C) 2022 C. Duckworth 3# Copyright (C) 2022 Case Duckworth <acdw@acdw.net>
5 4#
6### Commentary:
7
8# OLDIFS=$IFS; IFS=$'\n';
9# for line in `cat testfile`; do
10# test=`echo "$line" | grep -E '[\]$'`;
11# if [ $test ]; then
12# newline=`echo $line | rev | cut -c 2- | rev`;
13# echo -n "$newline"; else echo "$line";
14# fi; done;
15# IFS=$OLDIFS
16
17### Code:
18BEGIN { 5BEGIN {
19 width = 72 6 # Configuration
20 default_htag = "p" 7 DEFAULT_CONFIG_MODE = "config"
21 default_gtag = "" 8 config_initialize()
22 default_ftag = "" 9 config_parse(ENVIRON["HT_CONFIG"] ? ENVIRON["HT_CONFIG"] : "ht.conf")
23} 10 # State
24 11 DEFTAG = CONFIG["default_tag"]
25### Raw formatting 12 DEFATTR = CONFIG["default_attr"]
26/^>>>/ { 13 TAG = DEFTAG
27 getline first_raw 14 ATTR = DEFATTR
28 if (raw_fmt_p("html")) { 15}
29 raw_html = 1 16
30 html[++hpar] = "<pre><code>" html_escape(first_raw) 17# Multiple-file awareness
31 } 18FNR == 1 {
32 if (raw_fmt_p("gemini")) { 19 fileflush()
33 raw_gemini = 1 20}
34 gemini[++gpar] = "```" 21
35 gemini[++gpar] = first_raw 22# Handle raw sections
36 } 23$0 ~ CONFIG["raw_delim"] {
37 if (raw_fmt_p("gopher")) { 24 RAW = ! RAW
38 raw_gopher = 1 25 if (RAW) {
39 gopher[++fpar] = first_raw 26 buflush()
27 bufpush(CONFIG["raw_beg"], -1)
28 } else {
29 bufpush(CONFIG["raw_end"], -1)
30 print BUFFER
31 BUFFER = ""
40 } 32 }
41 raw = 1
42 next 33 next
43} 34}
44 35
45/^<<</ { 36RAW {
46 if (raw_html) { 37 bufpush($0)
47 html[hpar] = html[hpar] "</code></pre>"
48 }
49 if (raw_gemini) {
50 gemini[++gpar] = "```"
51 gemini[++gpar] = ""
52 }
53 if (raw_gopher) {
54 gopher[++fpar] = ""
55 }
56 raw_html = 0
57 raw_gemini = 0
58 raw_gopher = 0
59 raw = 0
60 next 38 next
61} 39}
62 40
63raw { 41# Comments
64 if (raw_html) { 42$0 ~ ("^" COMMENT_DELIM) {
65 html_empty = 0
66 html[++hpar] = html_escape($0)
67 }
68 if (raw_gemini) {
69 gemini_empty = 0
70 gemini[++gpar] = $0
71 }
72 if (raw_gopher) {
73 gopher_empty = 0
74 gopher[++fpar] = $0
75 }
76 next 43 next
77} 44}
78 45
79# Block types 46# HTML escape hatch
80/^#/ { 47/^</ {
81 match($0, /#+/) 48 HTML = 1
82 htag = "h" (RLENGTH > 6 ? 6 : RLENGTH) 49 bufpush($0)
83 gtag = substr($0, RSTART, (RLENGTH > 3 ? 3 : RLENGTH)) " " 50 next
84 ftag = substr($0, RSTART, RLENGTH) " "
85 sub(/^#+[ \t]*/, "", $0)
86} 51}
87 52
88# Line types 53# Sure, let's do templating! This makes it less... weird.
89/^=>/ { 54/\$/ {
90 title = "" 55 # XXX: This is probably the dumbest way to do it.
91 for (i = 3; i <= NF; i++) { 56 gsub(/\$\$/, "$\a", $0)
92 title = title (title ? " " : "") $i 57 gsub(/\$[^\a]/, "\\\\&", $0)
93 } 58 gsub(/\$\a/, "$", $0)
94 hbuf[++hline] = "<a href=\"" $2 "\">" title "</a>"
95 gbuf[++gline] = "\ngemini\t" $0
96 # TODO: gopher
97 next
98} 59}
99 60
100### Everything else 61# Blocks of text
101/./ { 62/./ {
102 html_empty = 0 63 # EOL escape
103 gemini_empty = 0 64 if (match($0, /\\$/)) {
104 gopher_empty = 0 65 sep = -1
105 hbuf[++hline] = $0 66 $0 = substr($0, 1, RSTART - 1)
106 gbuf[++gline] = $0 67 } else {
107 fbuf[++fline] = $0 68 sep = "\n"
69 }
70 # Loop through BLOCK_TYPES
71 for (bt in BLOCK_TYPES) {
72 if (match($0, "^" bt "[ \t]*")) {
73 $0 = substr($0, RSTART + RLENGTH)
74 if (match(BLOCK_TYPES[bt], "[ \t]*>[ \t]*")) {
75 parent = substr(BLOCK_TYPES[bt], 1, RSTART - 1)
76 child = substr(BLOCK_TYPES[bt], RSTART + RLENGTH)
77 }
78 if (parent) {
79 split(parent, pa, FS)
80 split(child, bl, FS)
81 if (! IN_PARENT) {
82 IN_PARENT = pa[1]
83 }
84 TAG = IN_PARENT
85 ATTR = ""
86 for (i = 2; i <= length(pa); i++) {
87 ATTR = ATTR (ATTR ? " " : "") pa[i]
88 }
89 bufpush("<" child ">" $0 "</" bl[1] ">")
90 next # XXX: This is messy.
91 } else {
92 split(BLOCK_TYPES[bt], bl, FS)
93 if (IN_PARENT) {
94 bufpush("</" IN_PARENT ">")
95 IN_PARENT = ""
96 }
97 if (! BUFFER) {
98 TAG = bl[1]
99 for (b = 2; b <= length(bl); b++) {
100 ATTR = ATTR (ATTR ? " " : "") bl[b]
101 }
102 } else {
103 $0 = "<" BLOCK_TYPES[bt] ">" $0 "</" bl[1] ">"
104 }
105 }
106 }
107 }
108 # Loop through LINE_TYPES
109 for (lt in LINE_TYPES) {
110 if (match($0, "^" lt "[ \t]*")) {
111 $0 = substr($0, RSTART + RLENGTH)
112 templ = LINE_TYPES[lt]
113 while (match(templ, /\$[0-9]+/)) {
114 sub(/\$[0-9]+/, $(substr(templ, RSTART + 1, RLENGTH - 1)), templ)
115 }
116 $0 = templ
117 }
118 }
119 # Push to buffer
120 bufpush($0, sep)
108} 121}
109 122
123# Blank lines end blocks
110/^$/ { 124/^$/ {
111 bufput() 125 if (HTML) {
126 html_end()
127 }
128 if (! RAW) {
129 buflush()
130 }
112} 131}
113 132
133# Clean up
114END { 134END {
115 bufput() 135 if (HTML) {
116 printarr(html, "html") 136 html_end()
117 printarr(gemini, "gemini") 137 } else if (RAW) {
118 printarr(gopher, "gopher") 138 bufpush(CONFIG["raw_end"], -1)
119} 139 print BUFFER
120 140 } else {
121 141 buflush()
122function bufput()
123{
124 hbufput()
125 gbufput()
126 fbufput()
127}
128
129function clear(arr)
130{
131 for (x in arr) {
132 delete arr[x]
133 } 142 }
134} 143}
135 144
136function fbufput()
137{
138 if (! length(fbuf)) {
139 next
140 }
141 for (ln in fbuf) { # XXX: gopher line types
142 paragraph = paragraph (paragraph ? " " : "") fbuf[ln]
143 }
144 fill(paragraph)
145 for (ln in fp) {
146 gopher[++fpar] = ((ln == 1) ? ftag : "") fp[ln]
147 }
148 gopher[++fpar] = ""
149 paragraph = ""
150 ftag = default_ftag
151 clear(fp)
152 clear(fbuf)
153}
154 145
155function fill(paragraph) 146### Buffer-y functions
147function buflush()
156{ 148{
157 char = 0 149 buftrim()
158 ln = 1 150 if (BUFFER) {
159 split(paragraph, words, FS) 151 if (TAG) {
160 for (word in words) { 152 TAG_BEG = "<" TAG (ATTR ? " " ATTR : "") ">"
161 char += length(words[word]) 153 TAG_END = "</" TAG ">"
162 if (char <= width) {
163 fp[ln] = fp[ln] (fp[ln] ? " " : "") words[word]
164 } else {
165 fp[++ln] = words[word]
166 char = length(words[word])
167 } 154 }
155 print TAG_BEG BUFFER TAG_END
156 BUFFER = ""
157 TAG = DEFTAG
158 ATTR = DEFATTR
159 IN_PARENT = ""
168 } 160 }
169} 161}
170 162
171function gbufput() 163function bufpush(text, separator)
172{ 164{
173 if (! length(gbuf)) { 165 if (! separator) {
174 next 166 separator = "\n"
175 } 167 }
176 for (ln in gbuf) { 168 if (separator == -1) {
177 paragraph = paragraph (paragraph ? " " : "") gbuf[ln] 169 separator = ""
178 } 170 }
179 gemini[++gpar] = gtag paragraph 171 BUFFER = BUFFER text (separator ? separator : "")
180 gemini[++gpar] = ""
181 gtag = default_gtag
182 paragraph = ""
183 clear(gbuf)
184} 172}
185 173
186function gopher_line(type, display, selector, hostname, port) 174function buftrim()
187{ 175{
188 return (type display "\t" selector "\t" hostname "\t" port) 176 if (match(BUFFER, "\n+$")) {
189} 177 BUFFER = substr(BUFFER, 1, RSTART - 1)
190
191function hbufput()
192{
193 if (! length(hbuf)) {
194 next
195 }
196 for (ln in hbuf) {
197 paragraph = paragraph (paragraph ? " " : "") hbuf[ln]
198 }
199 fill(paragraph)
200 for (ln in fp) {
201 html[++hpar] = ((ln == 1) ? "<" (htag ? htag : default_htag) ">" : "") fp[ln]
202 } 178 }
203 html[hpar] = html[hpar] (htag_end ? htag_end : "</" (htag ? htag : default_htag) ">")
204 paragraph = ""
205 htag = default_htag
206 clear(fp)
207 clear(hbuf)
208} 179}
209 180
210function html_escape(text) 181### Config functions
182function config_initialize()
211{ 183{
212 gsub(/&/, "\\&amp;", text) 184 COMMENT_DELIM = ";"
213 gsub(/</, "\\&lt;", text) 185 CONFIG["raw_delim"] = "```"
214 gsub(/>/, "\\&gt;", text) 186 CONFIG["raw_beg"] = "<pre><code>"
215 return text 187 CONFIG["raw_end"] = "</code></pre>"
216} 188 CONFIG["default_tag"] = "p"
217 189 CONFIG["default_attr"] = ""
218function printarr(arr, prefix) 190 LINE_TYPES["@"] = "<a href=\"$1\">$2</a>"
191 LINE_TYPES["`"] = "<code>$0</code>"
192 BLOCK_TYPES["#"] = "h1"
193 BLOCK_TYPES["##"] = "h2"
194 BLOCK_TYPES["###"] = "h3"
195 BLOCK_TYPES["-"] = "ul>li"
196}
197
198function config_parse(file)
219{ 199{
220 if (prefix) { 200 mode = DEFAULT_CONFIG_MODE
221 fmt = "%s\t%s\n" 201 while ((getline < file) > 0) {
222 } else { 202 if (match($0, /^#/) || ! $0) {
223 fmt = "%s%s\n" 203 continue
224 } 204 }
225 for (x in arr) { 205 if (match($0, /^\\/)) {
226 printf fmt, prefix, arr[x] 206 $0 = substr($0, 2)
207 }
208 if (match($0, /\[[^\]]+\]/)) {
209 mode = substr($0, RSTART + 1, RLENGTH - 2)
210 continue
211 } else {
212 var = $1
213 val = ""
214 for (i = 2; i <= NF; i++) {
215 val = val (val ? " " : "") $i
216 }
217 if (mode == "config") {
218 CONFIG[var] = val
219 } else if (mode == "block") {
220 BLOCK_TYPES[var] = val
221 } else if (mode == "line") {
222 LINE_TYPES[var] = val
223 }
224 }
227 } 225 }
228} 226}
229 227
230function raw_fmt_p(format) 228### Other functions
229function html_end()
231{ 230{
232 if (NF < 2) { 231 buftrim()
233 return 1 232 print BUFFER
234 } 233 BUFFER = ""
235 if ($2 ~ /-/) { 234 HTML = 0
236 if ($2 ~ ("-" format)) {
237 return 0
238 } else {
239 return 1
240 }
241 }
242 if ($2 ~ format) {
243 return 1
244 }
245 return 0
246} 235}
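
The `LINE_TYPES` loop in the new ht.awk fills a template such as `<a href="$1">$2</a>` with fields from the current line. A minimal standalone sketch of that idea follows, as a shell function wrapping awk. `expand_line` is a hypothetical name, not part of this commit. Unlike the committed loop, it builds the result left to right, so substituted text is never rescanned for placeholders.

```shell
# expand_line SIGIL TEMPLATE: read lines on stdin; for lines starting with
# SIGIL, fill $N placeholders in TEMPLATE with the N-th field of the line.
# (Hypothetical helper for illustration; SIGIL is used as a regex fragment.)
expand_line() {
    awk -v sigil="$1" -v templ="$2" '
    {
        if (match($0, "^" sigil "[ \t]*")) {
            # Drop the sigil; assigning $0 re-splits the fields.
            $0 = substr($0, RSTART + RLENGTH)
            rest = templ
            out = ""
            # Copy the template left to right, swapping each $N for field N.
            while (match(rest, /\$[0-9]+/)) {
                n = substr(rest, RSTART + 1, RLENGTH - 1) + 0
                out = out substr(rest, 1, RSTART - 1) $(n)
                rest = substr(rest, RSTART + RLENGTH)
            }
            $0 = out rest
        }
        print
    }'
}
```

For example, `printf '%s\n' '@https://example.com example' | expand_line '@' '<a href="$1">$2</a>'` prints `<a href="https://example.com">example</a>`. As in the committed code, `$2` is only the second field, so multi-word link text would need `$0`-style handling.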
diff --git a/ht.conf b/ht.conf
new file mode 100644
index 0000000..0634a94
--- /dev/null
+++ b/ht.conf
@@ -0,0 +1,24 @@
# hat trick configuration file
[config]
raw_delim ```
raw_begin <pre><code>
raw_end </pre></code>

[block]
\# h1
\## h2
\### h3
\#### h4
\##### h5
\###### h6

- ul>li
% ol>li

> blockquote

[line]
@ <a href="$1">$2</a>
` <code>$0</code>
/ <em>$0</em>
* <strong>$0</strong>
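
The file format above is read by `config_parse` in ht.awk: `#` starts a comment, a leading backslash escapes a literal `#`, `[name]` switches the section, and every other line is a key followed by its value. A rough sketch of that reading follows, as a standalone function that prints each entry. `conf_dump` is a hypothetical name. Unlike the committed parser, it only treats a line as a section header when the whole line is bracketed, and it keeps the value's internal spacing.

```shell
# conf_dump: read an ht.conf-style file on stdin and print one
# "section<TAB>key<TAB>value" line per entry. (Hypothetical helper.)
conf_dump() {
    awk '
    BEGIN { section = "config" }          # entries before any header
    /^#/ || /^[ \t]*$/ { next }           # comment or blank line
    {
        sub(/^\\/, "")                    # leading backslash escapes "#"
        if (substr($0, 1, 1) == "[" && substr($0, length($0)) == "]") {
            section = substr($0, 2, length($0) - 2)
            next
        }
        key = $1
        value = $0
        sub(/^[^ \t]+[ \t]*/, "", value)  # everything after the key
        printf "%s\t%s\t%s\n", section, key, value
    }'
}
```

Running `conf_dump < ht.conf` lists which sigils and settings a given configuration defines, one per line.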
diff --git a/ht.sh b/ht.sh
new file mode 100755
index 0000000..cc0d0ba
--- /dev/null
+++ b/ht.sh
@@ -0,0 +1,62 @@
#!/bin/sh
# ht.sh
# *.ht -> *html

# config
header_file=header.htm
footer_file=footer.htm
meta_file=meta.sh

# state
HTDAT="$(date +%s)"
HT_TMPL_COUNT=0

print() {
    printf '%s\n' "$*"
}

htt() { # htt FILE
    # Like `cat`, but with templating.
    ht_end="ht_main_${HTDAT}_${HT_TMPL_COUNT}" # be extra double sure
    eval "$(
        print "cat <<$ht_end"
        cat "$@"
        print
        print "$ht_end"
    )"
    HT_TMPL_COUNT=$((HT_TMPL_COUNT + 1))
}

htmeta_clear() {
    # Generate metadata-clearing commands from $meta_file.
    while read -r line; do
        case "$line" in
        *'()'*) # function
            unset -f "${line%()*}"
            ;;
        *=*) # variable assignment
            unset -v "${line%=*}"
            ;;
        *) # other -- XXX: Don't know what to do
            ;;
        esac
    done <"$meta_file"
}

htmeta() { # htmeta FILE
    # Collect metadata from FILE.
    # Metadata looks like this: `;;@<SHELL_EXPRESSION>`
    sed -n 's/^;;@//p' "$1" | tee "$meta_file"
}

main() {
    # Make two passes over each input file, collecting metadata and content.
    :
    # Of course, this isn't safe, but you trust yourself, right?
    for file; do
        eval "$(htmeta_clear)"
        eval "$(htmeta "$file")"

        ./ht.awk <"$file" | htt "$header_file" - "$footer_file" >"${file}ml"
    done
}
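
The `htt` function above does its templating by wrapping the input in an unquoted here-document and passing it to `eval`, so the shell expands any `$variable` in the text. A minimal sketch of the same trick in isolation follows. `render` and `render_end` are hypothetical names, and the delimiter is simply assumed not to occur in the template. Command substitutions and backticks in the template are executed too, so this is only suitable for trusted input.

```shell
# render FILE...: print the files like cat, but expand $variables in them
# by evaluating the text as an unquoted here-document.
# (Hypothetical name; only run it on templates you trust.)
render() {
    render_end="RENDER_EOF_$$"    # assumed not to occur in any template
    eval "$(
        printf 'cat <<%s\n' "$render_end"
        cat "$@"
        printf '\n%s\n' "$render_end"
    )"
}
```

With `title=Hello` set in the calling shell, a template file containing `<title>$title</title>` would render as `<title>Hello</title>`.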
diff --git a/test.ht b/test.ht
index 0208568..97425a9 100644
--- a/test.ht
+++ b/test.ht
@@ -1,27 +1,32 @@
1# a test 1# ht: a bespoke document preparation system
2 2
3here's a test for ht.awk. 3;; comments are like this.
4it's got paragraphs (these bad boys), long lines and such, and also raw blocks. 4;; they're a good time.
5=> https://example.com and links!
6 5
7>>> 6`ht
8rawblock example1: all of them, & more <hi!> 7is a quasi-line-based markup language that takes inspiration from
9## fee fi fo fum 8@https://gemini.circumlunar.space/docs/gemtext.gmi gemtext\
10<<< 9,
10@https://daringfireball.net/projects/markdown/ markdown\
11, and others.
12Its aim is to be somewhat easy to read while being fairly easy to parse.
11 13
12## just html 14In fact,
13but over two lines 15`ht
16is a simple awk script.
14 17
15>>> html 18## Usage
16rawblock example2: just html
17hey adora
18<<<
19 19
20### not html 20- one
21- two
22- three
21 23
22>>> -html 24ordered list:
23rawblock example3: everything /but/ html
24# with a header inside, blah
25<<<
26 25
27and finally, the end of the file. 26% one
27% two
28% three
29
30```
31./ht.awk source.ht
32```
diff --git a/test.txt b/test.txt
deleted file mode 100644
index 8c47543..0000000
--- a/test.txt
+++ /dev/null
@@ -1,24 +0,0 @@
html <p>
html here's a test for ht.awk. it's got paragraphs (these bad boys), long lines and such,
html and also raw blocks.
html </p>
html </code></pre>
html </code></pre>
html <p>
html and finally, the end of the file.
html </p>
gemini here's a test for ht.awk. it's got paragraphs (these bad boys), long lines and such, and also raw blocks.
gemini
gemini ```
gemini rawblock example1: all of them.
gemini fee fi fo fum
gemini ```
gemini
gemini and finally, the end of the file.
gemini
gopher here's a test for ht.awk. it's got paragraphs (these bad boys), long lines and such,
gopher and also raw blocks.
gopher
gopher
gopher and finally, the end of the file.
gopher