From ba7fecb5a39eae715e27c103ac1e653d1df85c9f Mon Sep 17 00:00:00 2001
From: Case Duckworth
Date: Mon, 3 Jun 2024 22:24:08 -0500
Subject: Initial commit -- postcard format

---
 protocol.txt | 134 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 134 insertions(+)
 create mode 100644 protocol.txt

diff --git a/protocol.txt b/protocol.txt
new file mode 100644
index 0000000..323a0d8
--- /dev/null
+++ b/protocol.txt
@@ -0,0 +1,134 @@
+# Postcard protocol, version 1
+
+The POSTCARD PROTOCOL is a federated asynchronous communication protocol that
+aims to simulate, online, the experience of sending postcards through the
+physical mail.  A NETWORK consists of a set of servers, known as POST OFFICES,
+which provide a limited number of user accounts (P.O. BOXES) that can send and
+receive short messages, called POSTCARDS.  The protocol is implemented on top of
+UDP and tries to be reasonably secure and resistant to abuse.
+
+The postcard protocol is in ALPHA status and may change while we're working
+toward a version 1.0.  Each part of the protocol is described below.
+
+## POSTCARD.
+
+A POSTCARD is a datagram that fits within a single UDP packet.  While the
+*actual* limit to UDP packet size is just under 2^16 bytes[1], the number I've
+seen around the internet for a *safe* UDP packet size is more like 512 bytes.
+So that's what we're going with, especially because a physical postcard is much
+closer in data size to 512 bytes than to 2^16 or even 1500 (another limit
+mentioned in the Julia Evans blog post cited above).
+
+A POSTCARD's message is further restricted (though not by much) by a short
+header with the following fields (sizes are in bytes, so "1b" is one byte):
+
+* 2b MAGIC NUMBER: The ASCII codepoints of the letters "PC" (for postcard)
+* 1b VERSION: The version of the postcard protocol being used
+* 1b ENCODING: The text encoding of the postcard's message (see Appendix A)
+* 1b TOBOX: The P.O. BOX of the message recipient
+* 1b FROMBOX: The P.O. BOX of the message sender
+
+This header is 6 bytes long, leaving 506 bytes for message text:
+
+```
+,___________________________________________________________________.
+|________________ POSTCARD PROTOCOL DATAGRAM SKETCH ________________|
+> "PC" (2 bytes) |version,encoding|to(1b)  from(1b)|message(506b)   |
+|0101000001000011|0000000100000000|ttttttttffffffff| . . . . . . . .|
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+```
+
+## P.O. BOX.
+
+## POST OFFICE.
+
+## NETWORK.
+
+## APPENDIX A. Encoding table.
+
+While UTF-8 is the answer for almost every question of encoding in the modern
+day, on a byte-restricted format like the postcard protocol I think it's unfair
+for non-English speakers to be forced to use double or even triple the bytes to
+send the same message.  Therefore, the ENCODING field of a postcard's header
+corresponds to the message encoding, which allows senders to choose a more
+storage-friendly encoding for their messages.
+
+ENCODING is stored as a 1-byte number, allowing for 256 possible encodings.  As
+of Postcard protocol v. 1, these are recognized:
+
+``` | table
+0	UTF-8
+1	ASCII
+2	EBCDIC
+3	ISO 8859-1 Western Europe
+4	ISO 8859-2 Western and Central Europe
+5	ISO 8859-3 Western and South European
+6	ISO 8859-4 Western Europe and Baltic countries
+7	ISO 8859-5 Cyrillic alphabet
+8	ISO 8859-6 Arabic
+9	ISO 8859-7 Greek
+10	ISO 8859-8 Hebrew
+11	ISO 8859-9 Western Europe with amended Turkish character set
+12	ISO 8859-10 Western Europe with rationalized Nordic character set
+13	ISO 8859-11 Thai
+14	ISO 8859-13 Baltic languages plus Polish
+15	ISO 8859-14 Celtic languages
+16	ISO 8859-15 Western Europe (ISO 8859-1 with rationalizations)
+17	ISO 8859-16 Central, Eastern and Southern European languages
+18	CP437
+19	CP720
+20	CP737
+21	CP850
+22	CP852
+23	CP855
+24	CP857
+25	CP858
+26	CP860
+27	CP861
+28	CP862
+29	CP863
+30	CP865
+31	CP866
+32	CP869
+33	CP872
+34	Windows-1250 for Central European languages that use Latin script
+35	Windows-1251 for Cyrillic alphabets
+36	Windows-1252 for Western languages
+37	Windows-1253 for Greek
+38	Windows-1254 for Turkish
+39	Windows-1255 for Hebrew
+40	Windows-1256 for Arabic
+41	Windows-1257 for Baltic languages
+42	Windows-1258 for Vietnamese
+43	Mac OS Roman
+44	KOI8-R
+45	KOI8-U
+46	KOI7
+47	MIK
+48	ISCII
+49	TSCII
+50	VISCII
+51	Shift JIS
+52	EUC-JP
+53	ISO-2022-JP
+54	JIS X 0213
+55	Shift_JIS-2004
+56	EUC-JIS-2004
+57	ISO-2022-JP-2004
+58	GB 2312
+59	GBK (Microsoft Code page 936)
+60	GB 18030
+61	Taiwan Big5 (a more famous variant is Microsoft Code page 950)
+62	Hong Kong HKSCS
+63	KS X 1001
+64	EUC-KR
+65	ISO-2022-KR
+```
+
+This list is pulled from Wikipedia's entry on common character encodings[2], so
+it may need to be revised.
+
+## END NOTES.
+
+=> https://jvns.ca/blog/2017/02/07/mtu/ [1]: J. Evans. "How big can a packet get?"
+=> https://en.wikipedia.org/wiki/Character_encoding#Common_character_encodings [2]: Wikipedia.  "Character encoding"
-- 
cgit 1.4.1-21-gabe81