-rw-r--r-- | protocol.txt | 134 |
1 files changed, 134 insertions, 0 deletions
diff --git a/protocol.txt b/protocol.txt new file mode 100644 index 0000000..323a0d8 --- /dev/null +++ b/protocol.txt | |||
@@ -0,0 +1,134 @@ | |||
1 | # Postcard protocol, version 1 | ||
2 | |||
3 | The POSTCARD PROTOCOL is a federated asynchronous communication protocol that | ||
4 | aims to simulate the experience of sending postcards through the physical mail | ||
5 | online. A NETWORK consists of a series of servers, known as POST OFFICES, which | ||
6 | provide a limited number of user accounts (P.O. BOXES) that can send and receive | ||
7 | short messages, called POSTCARDS. The protocol is implemented on top of UDP and | ||
8 | tries to be relatively secure and resistant to abuse. | ||
9 | |||
10 | The postcard protocol is in ALPHA status and may change while we're working | ||
11 | toward version 1.0. Each part of the protocol is described below. | ||
12 | |||
13 | ## POSTCARD. | ||
14 | |||
15 | A POSTCARD is a datagram that fits within a single UDP packet. While the | ||
16 | *actual* limit to UDP packet size is about 2^16 bytes[1], the number I've seen around | ||
17 | the internet for a *safe* UDP packet size is more like 512 bytes. So that's | ||
18 | what we're going with --- especially because a physical postcard is much closer | ||
19 | in data size to 512 bytes than 2^16 or even 1500 (another limit mentioned in | ||
20 | Julia Evans's blog referenced above). | ||
21 | |||
22 | A POSTCARD's size is further restricted (though not by much) by a short header | ||
23 | with the following fields (sizes are in bytes, abbreviated "b"): | ||
24 | |||
25 | * 2b MAGIC NUMBER: The ASCII codepoints of the letters "PC" (for postcard) | ||
26 | * 1b VERSION: The version of the postcard protocol being used | ||
27 | * 1b ENCODING: The text encoding of the postcard's message (see Appendix A) | ||
28 | * 1b TOBOX: The P.O. BOX of the message recipient | ||
29 | * 1b FROMBOX: The P.O. BOX of the message sender | ||
30 | |||
31 | This header is 6 bytes long, leaving 506 bytes for message text: | ||
32 | |||
33 | ``` | ||
34 | ,___________________________________________________________________. | ||
35 | |________________ POSTCARD PROTOCOL DATAGRAM SKETCH ________________| | ||
36 | > "PC" (2 bytes) |version,encoding|to(1b) from(1b)|message(506b) | | ||
37 | |0101000001000011|0000000100000000|ttttttttffffffff| . . . . . . . .| | ||
38 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
39 | ``` | ||
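
The header layout above can be sketched with Python's `struct` module. This is
only an illustration under my own assumptions, not a reference implementation:
the names `pack_postcard` and `unpack_postcard` are hypothetical, and the
size checks are guesses at sensible behaviour that the protocol does not (yet)
spell out.

```python
import struct

MAGIC = b"PC"                      # 2-byte magic number from the field list
VERSION = 1                        # protocol version described here
MAX_DATAGRAM = 512                 # the "safe" UDP size chosen above
# Field order: magic, version, encoding, tobox, frombox (one byte each after magic).
HEADER = struct.Struct("!2sBBBB")
MAX_MESSAGE = MAX_DATAGRAM - HEADER.size   # 512 - 6 = 506 bytes of message text


def pack_postcard(tobox, frombox, message, encoding=0):
    """Build one datagram; `message` must already be encoded bytes."""
    if len(message) > MAX_MESSAGE:
        raise ValueError("message exceeds 506 bytes")
    return HEADER.pack(MAGIC, VERSION, encoding, tobox, frombox) + message


def unpack_postcard(datagram):
    """Split a datagram into its header fields and raw message bytes."""
    if not HEADER.size <= len(datagram) <= MAX_DATAGRAM:
        raise ValueError("bad datagram size")
    magic, version, encoding, tobox, frombox = HEADER.unpack_from(datagram)
    if magic != MAGIC:
        raise ValueError("not a postcard")
    return version, encoding, tobox, frombox, datagram[HEADER.size:]
```

For example, `pack_postcard(7, 9, b"hello")` begins with the six header bytes
`b"PC\x01\x00\x07\x09"`, and `unpack_postcard` on the result gives back the
same fields and message.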
40 | |||
41 | ## P.O. BOX. | ||
42 | |||
43 | ## POST OFFICE. | ||
44 | |||
45 | ## NETWORK. | ||
46 | |||
47 | ### APPENDIX A. Encoding table. | ||
48 | |||
49 | While UTF-8 is the answer for almost every question of encoding in the modern | ||
50 | day, on a byte-restricted format like the postcard protocol I think it's unfair | ||
51 | for non-English speakers to be forced to use double or even triple the bytes to | ||
52 | send the same message. Therefore, the ENCODING field of a postcard's header | ||
53 | identifies the message encoding, which allows senders to choose a more | ||
54 | compact encoding for their messages. | ||
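
To make the size difference concrete, here is a small sketch that counts the
bytes a short greeting needs under a few encodings. The sample strings and the
helper name `encoded_sizes` are my own choices for illustration; the codec
names are the ones Python's standard library uses.

```python
# Sample texts paired with encodings from the table that can represent them.
SAMPLES = {
    "Привет, мир": ["utf-8", "iso8859_5", "koi8_r"],
    "こんにちは": ["utf-8", "shift_jis", "euc_jp"],
}


def encoded_sizes(text, codecs):
    """Return {codec name: number of bytes `text` occupies in that codec}."""
    return {name: len(text.encode(name)) for name in codecs}


for text, codecs in SAMPLES.items():
    print(text, encoded_sizes(text, codecs))
```

The Russian greeting takes 20 bytes in UTF-8 but 11 in either single-byte
Cyrillic encoding, and the Japanese one takes 15 bytes in UTF-8 but 10 in
Shift JIS or EUC-JP. With only 506 bytes to spend, that adds up quickly.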
55 | |||
56 | ENCODING is stored as a 1-byte number, allowing for 256 possible encodings. As | ||
57 | of Postcard protocol v. 1, these are recognized: | ||
58 | |||
59 | ``` | table | ||
60 | 0 UTF-8 | ||
61 | 1 ASCII | ||
62 | 2 EBCDIC | ||
63 | 3 ISO 8859-1 Western Europe | ||
64 | 4 ISO 8859-2 Western and Central Europe | ||
65 | 5 ISO 8859-3 Western Europe and South European | ||
66 | 6 ISO 8859-4 Western Europe and Baltic countries | ||
67 | 7 ISO 8859-5 Cyrillic alphabet | ||
68 | 8 ISO 8859-6 Arabic | ||
69 | 9 ISO 8859-7 Greek | ||
70 | 10 ISO 8859-8 Hebrew | ||
71 | 11 ISO 8859-9 Western Europe with amended Turkish character set | ||
72 | 12 ISO 8859-10 Western Europe with rationalized Nordic character set | ||
73 | 13 ISO 8859-11 Thai | ||
74 | 14 ISO 8859-13 Baltic languages plus Polish | ||
75 | 15 ISO 8859-14 Celtic languages | ||
76 | 16 ISO 8859-15 ISO 8859-1 with rationalizations | ||
77 | 17 ISO 8859-16 Central, Eastern and Southern European languages | ||
78 | 18 CP437 | ||
79 | 19 CP720 | ||
80 | 20 CP737 | ||
81 | 21 CP850 | ||
82 | 22 CP852 | ||
83 | 23 CP855 | ||
84 | 24 CP857 | ||
85 | 25 CP858 | ||
86 | 26 CP860 | ||
87 | 27 CP861 | ||
88 | 28 CP862 | ||
89 | 29 CP863 | ||
90 | 30 CP865 | ||
91 | 31 CP866 | ||
92 | 32 CP869 | ||
93 | 33 CP872 | ||
94 | 34 Windows-1250 for Central European languages that use Latin script | ||
95 | 35 Windows-1251 for Cyrillic alphabets | ||
96 | 36 Windows-1252 for Western languages | ||
97 | 37 Windows-1253 for Greek | ||
98 | 38 Windows-1254 for Turkish | ||
99 | 39 Windows-1255 for Hebrew | ||
100 | 40 Windows-1256 for Arabic | ||
101 | 41 Windows-1257 for Baltic languages | ||
102 | 42 Windows-1258 for Vietnamese | ||
103 | 43 Mac OS Roman | ||
104 | 44 KOI8-R | ||
105 | 45 KOI8-U | ||
106 | 46 KOI7 | ||
107 | 47 MIK | ||
108 | 48 ISCII | ||
109 | 49 TSCII | ||
110 | 50 VISCII | ||
111 | 51 Shift JIS | ||
112 | 52 EUC-JP | ||
113 | 53 ISO-2022-JP | ||
114 | 54 JIS X 0213 | ||
115 | 55 Shift_JIS-2004 | ||
116 | 56 EUC-JIS-2004 | ||
117 | 57 ISO-2022-JP-2004 | ||
118 | 58 GB 2312 | ||
119 | 59 GBK (Microsoft Code page 936) | ||
120 | 60 GB 18030 | ||
121 | 61 Taiwan Big5 (a more famous variant is Microsoft Code page 950) | ||
122 | 62 Hong Kong HKSCS | ||
123 | 63 Korean | ||
124 | 64 EUC-KR | ||
125 | 65 ISO-2022-KR | ||
126 | ``` | ||
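
A receiver has to turn the ENCODING number back into a codec before it can
show the message. The sketch below assumes a Python receiver and maps only a
handful of the ids above to Python's standard codec names; the `CODECS` table
and `decode_message` are hypothetical names, and not every entry in the list
has a codec in Python's standard library, so a real implementation would need
its own tables for those.

```python
# Partial, illustrative mapping from ENCODING ids (see the table above)
# to Python standard-library codec names. Ids not listed here are rejected.
CODECS = {
    0: "utf-8",
    1: "ascii",
    3: "iso8859_1",
    4: "iso8859_2",
    7: "iso8859_5",
    18: "cp437",
    35: "cp1251",
    44: "koi8_r",
    45: "koi8_u",
    51: "shift_jis",
    52: "euc_jp",
    60: "gb18030",
    61: "big5",
    64: "euc_kr",
}


def decode_message(encoding_id, raw):
    """Decode raw message bytes using the codec named by the ENCODING id."""
    try:
        codec = CODECS[encoding_id]
    except KeyError:
        raise ValueError(f"unrecognized ENCODING id {encoding_id}") from None
    return raw.decode(codec)
```

Rejecting unknown ids outright is one possible choice; falling back to UTF-8
or showing the raw bytes would be others, and the protocol does not say which
a POST OFFICE should prefer.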
127 | |||
128 | This list is pulled from Wikipedia's entry on common character encodings[2], so | ||
129 | it may need to be revised. | ||
130 | |||
131 | ## END NOTES. | ||
132 | |||
133 | => https://jvns.ca/blog/2017/02/07/mtu/ [1]: J. Evans. "How big can a packet get?" | ||
134 | => https://en.wikipedia.org/wiki/Character_encoding#Common_character_encodings [2]: Wikipedia. "Character encoding" | ||