1 files changed, 134 insertions, 0 deletions
diff --git a/protocol.txt b/protocol.txt
new file mode 100644
index 0000000..323a0d8
--- /dev/null
+++ b/protocol.txt

@@ -0,0 +1,134 @@
+# Postcard protocol, version 1
+The POSTCARD PROTOCOL is a federated asynchronous communication protocol that
+aims to simulate, online, the experience of sending postcards through the
+physical mail.  A NETWORK consists of a set of servers, known as POST OFFICES,
+which provide a limited number of user accounts (P.O. BOXES) that can send and
+receive short messages, called POSTCARDS.  The protocol is implemented on top of
+UDP and tries to be relatively secure and resistant to abuse.
+The postcard protocol is in ALPHA status and may change while we're working
+toward a version 1.0.  Below, each part of the protocol is described.
+## POSTCARD.
+A POSTCARD is a datagram that fits within a single UDP packet.  While the
+*actual* limit to UDP packet size is 2^16 - 1 bytes[1], the number I've seen
+around the internet for a *safe* UDP packet size is more like 512 bytes.  So
+that's what we're going with --- especially because a physical postcard is much
+closer in data size to 512 bytes than to 2^16 or even 1500 bytes (another limit
+mentioned in Julia Evans's blog post cited above).
+A POSTCARD's size is further restricted (though not by much) by a short header
+with the following fields:
+* 2 bytes, MAGIC NUMBER: The ASCII codepoints of the letters "PC" (for postcard)
+* 1 byte, VERSION: The version of the postcard protocol being used
+* 1 byte, ENCODING: The text encoding of the postcard's message (see Appendix A)
+* 1 byte, TOBOX: The P.O. BOX of the message recipient
+* 1 byte, FROMBOX: The P.O. BOX of the message sender
+This header is 6 bytes long, leaving 506 bytes for message text:
+```
+,___________________________________________________________________.
+|________________ POSTCARD PROTOCOL DATAGRAM SKETCH ________________|
+> "PC" (2 bytes) |version,encoding|to(1b)  from(1b)|message(506b)   |
+|0101000001000011|0000000100000000|ttttttttffffffff| . . . . . . . .|
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+```
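The header layout sketched above can be tried out in a few lines of Python. This is a minimal sketch, not a reference implementation: the names `pack_postcard`, `unpack_postcard`, `HEADER`, and `MAX_MESSAGE` are hypothetical, and the choice to reject (rather than truncate) oversized messages is an assumption the spec does not state.

```python
import struct

MAGIC = b"PC"          # 0x50 0x43, the first two bytes of every postcard
VERSION = 1            # protocol version described in this document
MAX_DATAGRAM = 512     # the "safe" UDP size the spec settles on

# magic (2 bytes), version, encoding, tobox, frombox (1 byte each).
# "!" disables padding, so the struct is exactly 6 bytes.
HEADER = struct.Struct("!2sBBBB")
MAX_MESSAGE = MAX_DATAGRAM - HEADER.size  # 506 bytes left for the message


def pack_postcard(encoding, tobox, frombox, message):
    """Build a datagram from header fields and already-encoded message bytes."""
    if len(message) > MAX_MESSAGE:
        # Assumption: oversized messages are rejected, not truncated.
        raise ValueError(f"message is {len(message)} bytes, limit is {MAX_MESSAGE}")
    return HEADER.pack(MAGIC, VERSION, encoding, tobox, frombox) + message


def unpack_postcard(datagram):
    """Split a datagram into (version, encoding, tobox, frombox, message)."""
    if not HEADER.size <= len(datagram) <= MAX_DATAGRAM:
        raise ValueError("datagram size out of range")
    magic, version, encoding, tobox, frombox = HEADER.unpack_from(datagram)
    if magic != MAGIC:
        raise ValueError("not a postcard: bad magic number")
    return version, encoding, tobox, frombox, datagram[HEADER.size:]
```

Because TOBOX and FROMBOX are one byte each, `struct` itself refuses box numbers outside 0-255, which matches the "limited number of user accounts" idea without extra checks.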
+## P.O. BOX.
+## POST OFFICE.
+## NETWORK.
+### APPENDIX A. Encoding table.
+While UTF-8 is the answer for almost every question of encoding in the modern
+day, on a byte-restricted format like the postcard protocol I think it's unfair
+for non-English speakers to be forced to use double or even triple the bytes to
+send the same message.  Therefore, the ENCODING field of a postcard's header
+identifies the message encoding, which allows senders to choose a more
+byte-efficient encoding for their messages.
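To make the byte-cost argument concrete, here is a small sketch that measures one message under several encodings. The sample text and the helper name `encoded_sizes` are made up for illustration; the codec names are the ones Python's standard library uses.

```python
def encoded_sizes(message, codecs):
    """Return {codec name: byte length of message in that codec}."""
    return {codec: len(message.encode(codec)) for codec in codecs}


# 16 characters: 14 Cyrillic letters and 2 spaces.
sample = "Привет из Москвы"
sizes = encoded_sizes(sample, ["utf_8", "koi8_r", "iso8859_5", "cp1251"])

for codec, size in sizes.items():
    print(f"{codec:10} {size:3} bytes")
```

UTF-8 spends two bytes on each Cyrillic letter (30 bytes in total), while the single-byte encodings spend one byte per character (16 bytes), so the same 506-byte budget carries nearly twice as much Russian text.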
+ENCODING is stored as a 1-byte number, allowing for 256 possible encodings.  As
+of Postcard protocol v. 1, these are recognized:
+``` | table
+0       UTF-8
+1       ASCII
+2       EBCDIC
+3       ISO 8859-1 Western Europe
+4       ISO 8859-2 Western and Central Europe
+5       ISO 8859-3 Western Europe and South European
+6       ISO 8859-4 Western Europe and Baltic countries
+7       ISO 8859-5 Cyrillic alphabet
+8       ISO 8859-6 Arabic
+9       ISO 8859-7 Greek
+10      ISO 8859-8 Hebrew
+11      ISO 8859-9 Western Europe with amended Turkish character set
+12      ISO 8859-10 Western Europe with rationalized Nordic character set
+13      ISO 8859-11 Thai
+14      ISO 8859-13 Baltic languages plus Polish
+15      ISO 8859-14 Celtic languages
+16      ISO 8859-15 ISO 8859-1 with rationalizations
+17      ISO 8859-16 Central, Eastern and Southern European languages
+18      CP437
+19      CP720
+20      CP737
+21      CP850
+22      CP852
+23      CP855
+24      CP857
+25      CP858
+26      CP860
+27      CP861
+28      CP862
+29      CP863
+30      CP865
+31      CP866
+32      CP869
+33      CP872
+34      Windows-1250 for Central European languages that use Latin script
+35      Windows-1251 for Cyrillic alphabets
+36      Windows-1252 for Western languages
+37      Windows-1253 for Greek
+38      Windows-1254 for Turkish
+39      Windows-1255 for Hebrew
+40      Windows-1256 for Arabic
+41      Windows-1257 for Baltic languages
+42      Windows-1258 for Vietnamese
+43      Mac OS Roman
+44      KOI8-R
+45      KOI8-U
+46      KOI7
+47      MIK
+48      ISCII
+49      TSCII
+50      VISCII
+51      Shift JIS
+52      EUC-JP
+53      ISO-2022-JP
+54      JIS X 0213
+55      Shift_JIS-2004
+56      EUC-JIS-2004
+57      ISO-2022-JP-2004
+58      GB 2312
+59      GBK (Microsoft Code page 936)
+60      GB 18030
+61      Taiwan Big5 (a more famous variant is Microsoft Code page 950)
+62      Hong Kong HKSCS
+63      Korean
+64      EUC-KR
+65      ISO-2022-KR
+```
+This list is pulled from Wikipedia's entry on common character encodings[2], so
+it may need to be revised.
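A receiver needs some way to turn an ENCODING number into a decoder. The sketch below maps a handful of IDs from the table to Python codec names; it is deliberately partial, the names `CODECS` and `decode_message` are hypothetical, and treating an unknown ID as an error is an assumption, since the spec does not say what a POST OFFICE should do with one.

```python
# Partial map from ENCODING id (see table above) to a Python codec name.
# Only a few rows are filled in; the rest are left out of this sketch.
CODECS = {
    0: "utf_8",
    1: "ascii",
    3: "latin_1",      # ISO 8859-1
    7: "iso8859_5",    # ISO 8859-5
    18: "cp437",
    35: "cp1251",      # Windows-1251
    44: "koi8_r",
    45: "koi8_u",
    51: "shift_jis",
    52: "euc_jp",
    60: "gb18030",
    64: "euc_kr",
}


def decode_message(encoding_id, payload):
    """Decode message bytes using the codec registered for encoding_id."""
    try:
        codec = CODECS[encoding_id]
    except KeyError:
        # Assumption: unknown ids are rejected rather than guessed at.
        raise ValueError(f"unsupported ENCODING id {encoding_id}") from None
    return payload.decode(codec)
```

For example, `decode_message(44, "Привет".encode("koi8_r"))` gives back the original string, while an ID outside the map raises `ValueError`.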
+## END NOTES.
+=> https://jvns.ca/blog/2017/02/07/mtu/ [1]: J. Evans. "How big can a packet get?"
+=> https://en.wikipedia.org/wiki/Character_encoding#Common_character_encodings [2]: Wikipedia.  "Character encoding"
