From ba7fecb5a39eae715e27c103ac1e653d1df85c9f Mon Sep 17 00:00:00 2001
From: Case Duckworth
Date: Mon, 3 Jun 2024 22:24:08 -0500
Subject: Initial commit -- postcard format

---
 protocol.txt | 134 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 134 insertions(+)
 create mode 100644 protocol.txt

diff --git a/protocol.txt b/protocol.txt
new file mode 100644
index 0000000..323a0d8
--- /dev/null
+++ b/protocol.txt
@@ -0,0 +1,134 @@
+# Postcard protocol, version 1
+
+The POSTCARD PROTOCOL is a federated asynchronous communication protocol that
+aims to simulate online the experience of sending postcards through the
+physical mail. A NETWORK consists of a series of servers, known as POST
+OFFICES, which provide a limited number of user accounts (P.O. BOXES) that can
+send and receive short messages, called POSTCARDS. The protocol is implemented
+on top of UDP and tries to be reasonably secure and resistant to abuse.
+
+The postcard protocol is in ALPHA status and may change while we're working
+toward a version 1.0. Below, each part of the protocol is described.
+
+## POSTCARD.
+
+A POSTCARD is a datagram that fits within a single UDP packet. While the
+*actual* limit to UDP packet size is just under 2^16 bytes[1], the number I've
+seen around the internet for a *safe* UDP packet size is more like 512 bytes.
+So that's what we're going with, especially because a physical postcard is
+much closer in data size to 512 bytes than 2^16 or even 1500 (another limit
+mentioned in Julia Evans's blog post[1]).
+
+A POSTCARD's size is further restricted (though not by much) by a short header
+with the following fields (sizes in bytes, written "b"):
+
+* 2b MAGIC NUMBER: The ASCII codepoints of the letters "PC" (for postcard)
+* 1b VERSION: The version of the postcard protocol being used
+* 1b ENCODING: The text encoding of the postcard's message (see Appendix A)
+* 1b TOBOX: The P.O. BOX of the message recipient
+* 1b FROMBOX: The P.O. BOX of the message sender
+
+This header is 6 bytes long, leaving 506 bytes for message text:
+
+```
+,___________________________________________________________________.
+|________________ POSTCARD PROTOCOL DATAGRAM SKETCH ________________|
+> "PC" (2 bytes) |version,encoding|to(1b)  from(1b)| message(506b)  |
+|0101000001000011|0000000100000000|ttttttttffffffff| . . . . . . . .|
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+```
+
+## P.O. BOX.
+
+## POST OFFICE.
+
+## NETWORK.
+
+### APPENDIX A. Encoding table.
+
+While UTF-8 is the answer for almost every question of encoding in the modern
+day, on a byte-restricted format like the postcard protocol I think it's unfair
+for non-English speakers to be forced to use double or even triple the bytes to
+send the same message. Therefore, the ENCODING field of a postcard's header
+identifies the message encoding, which allows senders to choose a more
+storage-friendly encoding for their messages.
+
+ENCODING is stored as a 1-byte number, allowing for 256 possible encodings. As
+of postcard protocol version 1, these are recognized:
+
+``` | table
+0 UTF-8
+1 ASCII
+2 EBCDIC
+3 ISO 8859-1 Western Europe
+4 ISO 8859-2 Western and Central Europe
+5 ISO 8859-3 Western Europe and South European
+6 ISO 8859-4 Western Europe and Baltic countries
+7 ISO 8859-5 Cyrillic alphabet
+8 ISO 8859-6 Arabic
+9 ISO 8859-7 Greek
+10 ISO 8859-8 Hebrew
+11 ISO 8859-9 Western Europe with amended Turkish character set
+12 ISO 8859-10 Western Europe with rationalized Nordic character set
+13 ISO 8859-11 Thai
+14 ISO 8859-13 Baltic languages plus Polish
+15 ISO 8859-14 Celtic languages
+16 ISO 8859-15 ISO 8859-1 with rationalizations
+17 ISO 8859-16 Central, Eastern and Southern European languages
+18 CP437
+19 CP720
+20 CP737
+21 CP850
+22 CP852
+23 CP855
+24 CP857
+25 CP858
+26 CP860
+27 CP861
+28 CP862
+29 CP863
+30 CP865
+31 CP866
+32 CP869
+33 CP872
+34 Windows-1250 for Central European languages that use Latin script
+35 Windows-1251 for Cyrillic alphabets
+36 Windows-1252 for Western languages
+37 Windows-1253 for Greek
+38 Windows-1254 for Turkish
+39 Windows-1255 for Hebrew
+40 Windows-1256 for Arabic
+41 Windows-1257 for Baltic languages
+42 Windows-1258 for Vietnamese
+43 Mac OS Roman
+44 KOI8-R
+45 KOI8-U
+46 KOI7
+47 MIK
+48 ISCII
+49 TSCII
+50 VISCII
+51 Shift JIS
+52 EUC-JP
+53 ISO-2022-JP
+54 JIS X 0213
+55 Shift_JIS-2004
+56 EUC-JIS-2004
+57 ISO-2022-JP-2004
+58 GB 2312
+59 GBK (Microsoft Code page 936)
+60 GB 18030
+61 Taiwan Big5 (a more famous variant is Microsoft Code page 950)
+62 Hong Kong HKSCS
+63 Korean
+64 EUC-KR
+65 ISO-2022-KR
+```
+
+This list is pulled from Wikipedia's entry on common character encodings[2], so
+it may need to be revised.
+
+## END NOTES.
+
+=> https://jvns.ca/blog/2017/02/07/mtu/ [1]: J. Evans. "How big can a packet get?"
+=> https://en.wikipedia.org/wiki/Character_encoding#Common_character_encodings [2]: Wikipedia. "Character encoding"
-- 
cgit 1.4.1-21-gabe81
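
The 6-byte header and 506-byte message limit described in the POSTCARD section of protocol.txt could be sketched in Python roughly as follows. This is only an illustration under assumptions, not part of the commit: the names `pack_postcard`, `unpack_postcard` and `CODECS` are hypothetical, only three rows of Appendix A are mapped, and the mapping from ENCODING numbers to Python codec names is my own guess at how an implementation might do it.

```python
import struct

MAGIC = b"PC"            # 2-byte magic number
VERSION = 1              # protocol version from the spec title
MAX_DATAGRAM = 512       # "safe" UDP size chosen by the spec
# Network byte order: magic(2) version(1) encoding(1) tobox(1) frombox(1)
HEADER = struct.Struct("!2sBBBB")

# Hypothetical subset of Appendix A, mapped to Python codec names.
CODECS = {0: "utf-8", 1: "ascii", 3: "iso8859_1"}


def pack_postcard(tobox, frombox, text, encoding=0):
    """Build one postcard datagram; raise ValueError if the text is too long."""
    body = text.encode(CODECS[encoding])
    limit = MAX_DATAGRAM - HEADER.size  # 512 - 6 = 506 bytes
    if len(body) > limit:
        raise ValueError(f"message is {len(body)} bytes, limit is {limit}")
    return HEADER.pack(MAGIC, VERSION, encoding, tobox, frombox) + body


def unpack_postcard(datagram):
    """Split a datagram into (version, encoding, tobox, frombox, text)."""
    if not HEADER.size <= len(datagram) <= MAX_DATAGRAM:
        raise ValueError("datagram has an invalid size")
    magic, version, encoding, tobox, frombox = HEADER.unpack_from(datagram)
    if magic != MAGIC:
        raise ValueError("not a postcard: bad magic number")
    text = datagram[HEADER.size:].decode(CODECS[encoding])
    return version, encoding, tobox, frombox, text
```

In this sketch, 300 copies of "é" take 600 bytes in UTF-8 and are rejected, while the same text sent with ENCODING 3 (ISO 8859-1) takes 300 bytes and fits, which is the motivation Appendix A gives for the ENCODING field.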