From 9d59c162fa5f2c0d49feffb94694d76864dfccf3 Mon Sep 17 00:00:00 2001
From: Case Duckworth
Date: Tue, 4 Jun 2024 12:29:05 -0500
Subject: Move protocol.txt to readme.txt

---
 protocol.txt | 134 ----------------------------------------------------------
 readme.txt   | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 134 insertions(+), 135 deletions(-)
 delete mode 100644 protocol.txt
 mode change 120000 => 100644 readme.txt

diff --git a/protocol.txt b/protocol.txt
deleted file mode 100644
index 323a0d8..0000000
--- a/protocol.txt
+++ /dev/null
@@ -1,134 +0,0 @@
-# Postcard protocol, version 1
-
-The POSTCARD PROTOCOL is a federated asynchronous communication protocol that
-aims to simulate the experience of sending postcards through the physical mail
-online. A NETWORK consists of a series of servers, known as POST OFFICES, which
-provide a limited number of user accounts (P.O. BOXES) that can send and receive
-short messages, called POSTCARDS. The protocol is implemented on top of UDP and
-tries to be relatively secure and mitigative against abuse.
-
-The postcard protocol is in ALPHA status and may change while we're working
-toward a version 1.0. Below, each part of the protocol is described.
-
-## POSTCARD.
-
-A POSTCARD is a datagram that fits within a single UDP packet. While the
-*actual* limit to UDP packet size is 2^16 bits[1], the number I've seen around
-the internet for a *safe* UDP packet size is more like 512 bytes. So that's
-what we're going with --- especially because a physical postcard is much closer
-in data size to 512 bytes than 2^16 or even 1500 (another limit mentioned in
-Julia Evans's blog referenced above).
-
-A POSTCARD's size is further restricted (though not by much) by a short header
-with the following fields:
-
-* 2b MAGIC NUMBER: the ASCII codepoints of the letters "PC" (for postcard)
-* 1b VERSION: The version of the postcard protocol being used
-* 1b ENCODING: The text encoding of the postcard's message (see Appendix A)
-* 1b TOBOX: The P.O. BOX of the message recipient
-* 1b FROMBOX: The P.O. BOX of the message sender
-
-This header is 6 bytes long, leaving 506 bytes for message text:
-
-```
-,___________________________________________________________________.
-|________________ POSTCARD PROTOCOL DATAGRAM SKETCH ________________|
-> "PC" (2 bytes) |version,encoding|to(1b)  from(1b)|message(506b)   |
-|0101000001000011|0000000100000000|ttttttttffffffff| . . . . . . . .|
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-```
-
-## P.O. BOX.
-
-## POST OFFICE.
-
-## NETWORK.
-
-### APPENDIX A. Encoding table.
-
-While UTF-8 is the answer for almost every question of encoding in the modern
-day, on a byte-restricted format like the postcard protocol I think it's unfair
-for non-English speakers to be forced to use double or even triple the bytes to
-send the same message. Therefore, the ENCODING field of a postcard's metadata
-corresponds to the message encoding, which allows senders to choose a more
-storage-friendly encoding for their messages.
-
-ENCODING is stored as a 1-byte number, allowing for 256 possible encodings. As
-of Postcard protocol v. 1, these are recognized:
-
-``` | table
-0 UTF-8
-1 ASCII
-2 EBCDIC
-3 ISO 8859-1 Western Europe
-4 ISO 8859-2 Western and Central Europe
-5 ISO 8859-3 Western Europe and South European
-6 ISO 8859-4 Western Europe and Baltic countries
-7 ISO 8859-5 Cyrillic alphabet
-8 ISO 8859-6 Arabic
-9 ISO 8859-7 Greek
-10 ISO 8859-8 Hebrew
-11 ISO 8859-9 Western Europe with amended Turkish character set
-12 ISO 8859-10 Western Europe with rationalized Nordic character set
-13 ISO 8859-11 Thai
-14 ISO 8859-13 Baltic languages plus Polish
-15 ISO 8859-14 Celtic languages
-16 ISO 8859-15 ISO 8859-1 with rationalizations
-17 ISO 8859-16 Central, Eastern and Southern European languages
-18 CP437
-19 CP720
-20 CP737
-21 CP850
-22 CP852
-23 CP855
-24 CP857
-25 CP858
-26 CP860
-27 CP861
-28 CP862
-29 CP863
-30 CP865
-31 CP866
-32 CP869
-33 CP872
-34 Windows-1250 for Central European languages that use Latin script
-35 Windows-1251 for Cyrillic alphabets
-36 Windows-1252 for Western languages
-37 Windows-1253 for Greek
-38 Windows-1254 for Turkish
-39 Windows-1255 for Hebrew
-40 Windows-1256 for Arabic
-41 Windows-1257 for Baltic languages
-42 Windows-1258 for Vietnamese
-43 Mac OS Roman
-44 KOI8-R
-45 KOI8-U
-46 KOI7
-47 MIK
-48 ISCII
-49 TSCII
-50 VISCII
-51 Shift JIS
-52 EUC-JP
-53 ISO-2022-JP
-54 JIS X 0213
-55 Shift_JIS-2004
-56 EUC-JIS-2004
-57 ISO-2022-JP-2004
-58 GB 2312
-59 GBK (Microsoft Code page 936)
-60 GB 18030
-61 Taiwan Big5 (a more famous variant is Microsoft Code page 950)
-62 Hong Kong HKSCS
-63 Korean
-64 EUC-KR
-65 ISO-2022-KR
-```
-
-This list is pulled from Wikipedia's entry on common character encodings[2], so
-it may need to be revised.
-
-## END NOTES.
-
-=> https://jvns.ca/blog/2017/02/07/mtu/ [1]: J. Evans. "How big can a packet get?"
-=> https://en.wikipedia.org/wiki/Character_encoding#Common_character_encodings [2]: Wikipedia. "Character encoding"
diff --git a/readme.txt b/readme.txt
deleted file mode 120000
index ec0f06e..0000000
--- a/readme.txt
+++ /dev/null
@@ -1 +0,0 @@
-protocol.txt
\ No newline at end of file
diff --git a/readme.txt b/readme.txt
new file mode 100644
index 0000000..323a0d8
--- /dev/null
+++ b/readme.txt
@@ -0,0 +1,134 @@
+# Postcard protocol, version 1
+
+The POSTCARD PROTOCOL is a federated asynchronous communication protocol that
+aims to simulate the experience of sending postcards through the physical mail
+online. A NETWORK consists of a series of servers, known as POST OFFICES, which
+provide a limited number of user accounts (P.O. BOXES) that can send and receive
+short messages, called POSTCARDS. The protocol is implemented on top of UDP and
+tries to be relatively secure and mitigative against abuse.
+
+The postcard protocol is in ALPHA status and may change while we're working
+toward a version 1.0. Below, each part of the protocol is described.
+
+## POSTCARD.
+
+A POSTCARD is a datagram that fits within a single UDP packet. While the
+*actual* limit to UDP packet size is 2^16 bits[1], the number I've seen around
+the internet for a *safe* UDP packet size is more like 512 bytes. So that's
+what we're going with --- especially because a physical postcard is much closer
+in data size to 512 bytes than 2^16 or even 1500 (another limit mentioned in
+Julia Evans's blog referenced above).
+
+A POSTCARD's size is further restricted (though not by much) by a short header
+with the following fields:
+
+* 2b MAGIC NUMBER: the ASCII codepoints of the letters "PC" (for postcard)
+* 1b VERSION: The version of the postcard protocol being used
+* 1b ENCODING: The text encoding of the postcard's message (see Appendix A)
+* 1b TOBOX: The P.O. BOX of the message recipient
+* 1b FROMBOX: The P.O. BOX of the message sender
+
+This header is 6 bytes long, leaving 506 bytes for message text:
+
+```
+,___________________________________________________________________.
+|________________ POSTCARD PROTOCOL DATAGRAM SKETCH ________________|
+> "PC" (2 bytes) |version,encoding|to(1b)  from(1b)|message(506b)   |
+|0101000001000011|0000000100000000|ttttttttffffffff| . . . . . . . .|
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+```
+
+## P.O. BOX.
+
+## POST OFFICE.
+
+## NETWORK.
+
+### APPENDIX A. Encoding table.
+
+While UTF-8 is the answer for almost every question of encoding in the modern
+day, on a byte-restricted format like the postcard protocol I think it's unfair
+for non-English speakers to be forced to use double or even triple the bytes to
+send the same message. Therefore, the ENCODING field of a postcard's metadata
+corresponds to the message encoding, which allows senders to choose a more
+storage-friendly encoding for their messages.
+
+ENCODING is stored as a 1-byte number, allowing for 256 possible encodings. As
+of Postcard protocol v. 1, these are recognized:
+
+``` | table
+0 UTF-8
+1 ASCII
+2 EBCDIC
+3 ISO 8859-1 Western Europe
+4 ISO 8859-2 Western and Central Europe
+5 ISO 8859-3 Western Europe and South European
+6 ISO 8859-4 Western Europe and Baltic countries
+7 ISO 8859-5 Cyrillic alphabet
+8 ISO 8859-6 Arabic
+9 ISO 8859-7 Greek
+10 ISO 8859-8 Hebrew
+11 ISO 8859-9 Western Europe with amended Turkish character set
+12 ISO 8859-10 Western Europe with rationalized Nordic character set
+13 ISO 8859-11 Thai
+14 ISO 8859-13 Baltic languages plus Polish
+15 ISO 8859-14 Celtic languages
+16 ISO 8859-15 ISO 8859-1 with rationalizations
+17 ISO 8859-16 Central, Eastern and Southern European languages
+18 CP437
+19 CP720
+20 CP737
+21 CP850
+22 CP852
+23 CP855
+24 CP857
+25 CP858
+26 CP860
+27 CP861
+28 CP862
+29 CP863
+30 CP865
+31 CP866
+32 CP869
+33 CP872
+34 Windows-1250 for Central European languages that use Latin script
+35 Windows-1251 for Cyrillic alphabets
+36 Windows-1252 for Western languages
+37 Windows-1253 for Greek
+38 Windows-1254 for Turkish
+39 Windows-1255 for Hebrew
+40 Windows-1256 for Arabic
+41 Windows-1257 for Baltic languages
+42 Windows-1258 for Vietnamese
+43 Mac OS Roman
+44 KOI8-R
+45 KOI8-U
+46 KOI7
+47 MIK
+48 ISCII
+49 TSCII
+50 VISCII
+51 Shift JIS
+52 EUC-JP
+53 ISO-2022-JP
+54 JIS X 0213
+55 Shift_JIS-2004
+56 EUC-JIS-2004
+57 ISO-2022-JP-2004
+58 GB 2312
+59 GBK (Microsoft Code page 936)
+60 GB 18030
+61 Taiwan Big5 (a more famous variant is Microsoft Code page 950)
+62 Hong Kong HKSCS
+63 Korean
+64 EUC-KR
+65 ISO-2022-KR
+```
+
+This list is pulled from Wikipedia's entry on common character encodings[2], so
+it may need to be revised.
+
+## END NOTES.
+
+=> https://jvns.ca/blog/2017/02/07/mtu/ [1]: J. Evans. "How big can a packet get?"
+=> https://en.wikipedia.org/wiki/Character_encoding#Common_character_encodings [2]: Wikipedia. "Character encoding"
-- 
cgit 1.4.1-21-gabe81
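
Note on the header layout described in readme.txt: the six header fields and
the 506-byte message budget can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not part of the patch or a
reference implementation. The names pack_postcard, unpack_postcard and CODECS
are hypothetical, and the mapping from Appendix A numbers to Python codec
names is my own guess at a reasonable correspondence. The protocol text does
not say what to do with oversized messages, so rejecting them here is also an
assumption.

```python
import struct

MAGIC = b"PC"        # ASCII codepoints 0x50 0x43, as in the datagram sketch
VERSION = 1
MAX_DATAGRAM = 512   # whole postcard, header included

# magic (2 bytes), then version, encoding, tobox, frombox (1 byte each)
HEADER = struct.Struct("!2sBBBB")
MAX_MESSAGE = MAX_DATAGRAM - HEADER.size   # 512 - 6 = 506

# Assumption: a small subset of Appendix A mapped to Python codec names.
CODECS = {0: "utf_8", 1: "ascii", 3: "latin_1", 7: "iso8859_5", 44: "koi8_r"}


def pack_postcard(tobox, frombox, text, encoding=0):
    """Build one postcard datagram; refuse messages over 506 bytes."""
    body = text.encode(CODECS[encoding])
    if len(body) > MAX_MESSAGE:
        raise ValueError(f"message is {len(body)} bytes, limit is {MAX_MESSAGE}")
    return HEADER.pack(MAGIC, VERSION, encoding, tobox, frombox) + body


def unpack_postcard(datagram):
    """Split a datagram into (version, encoding, tobox, frombox, text)."""
    if not HEADER.size <= len(datagram) <= MAX_DATAGRAM:
        raise ValueError("datagram size out of range")
    magic, version, encoding, tobox, frombox = HEADER.unpack_from(datagram)
    if magic != MAGIC:
        raise ValueError("not a postcard: bad magic number")
    text = datagram[HEADER.size:].decode(CODECS[encoding])
    return version, encoding, tobox, frombox, text


if __name__ == "__main__":
    card = pack_postcard(7, 9, "hello")
    print(card[:HEADER.size].hex(" "))
    print(unpack_postcard(card))
    # The Appendix A motivation: a single-byte encoding halves Cyrillic text.
    print(len("Привет".encode("utf_8")), len("Привет".encode("koi8_r")))
```

The byte-oriented struct format keeps the header at exactly 6 bytes with no
padding, which is what leaves 506 bytes for the message.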