Move protocol.txt to readme.txt

author: Case Duckworth 2024-06-04 12:29:05 -0500
committer: Case Duckworth 2024-06-04 12:29:05 -0500
commit: 9d59c162fa5f2c0d49feffb94694d76864dfccf3 (patch)
tree: b576efc07de70951e5394a4bbf492c40356295d4 /readme.txt
parent: Add link to readme.txt (diff)
1 files changed, 134 insertions, 1 deletions
diff --git a/readme.txt b/readme.txt
index ec0f06e..323a0d8 120000..100644
--- a/readme.txt
+++ b/readme.txt

@@ -1 +1,134 @@
-protocol.txt
\ No newline at end of file
+# Postcard protocol, version 1
+
+The POSTCARD PROTOCOL is a federated asynchronous communication protocol that
+aims to simulate the experience of sending postcards through the physical mail
+online.  A NETWORK consists of a series of servers, known as POST OFFICES, which
+provide a limited number of user accounts (P.O. BOXES) that can send and receive
+short messages, called POSTCARDS.  The protocol is implemented on top of UDP and
+tries to be relatively secure and resistant to abuse.
+
+The postcard protocol is in ALPHA status and may change while we're working
+toward a version 1.0.  Below, each part of the protocol is described.
+
+## POSTCARD.
+
+A POSTCARD is a datagram that fits within a single UDP packet.  While the
+*actual* limit to UDP packet size is just under 2^16 bytes[1], the number I've
+seen around the internet for a *safe* UDP packet size is more like 512 bytes.
+So that's what we're going with, especially because a physical postcard is much
+closer in data size to 512 bytes than to 2^16 or even 1500 (another limit
+mentioned in Julia Evans's blog post[1]).
+
+A POSTCARD's size is further restricted (though not by much) by a short header
+with the following fields:
+
+* 2b MAGIC NUMBER: the ASCII codepoints of the letters "PC" (for postcard)
+* 1b VERSION: The version of the postcard protocol being used
+* 1b ENCODING: The text encoding of the postcard's message (see Appendix A)
+* 1b TOBOX: The P.O. BOX of the message recipient
+* 1b FROMBOX: The P.O. BOX of the message sender
+
+This header is 6 bytes long, leaving 506 bytes for message text:
+
+```
+,___________________________________________________________________.
+|________________ POSTCARD PROTOCOL DATAGRAM SKETCH ________________|
+> "PC" (2 bytes) |version,encoding|to(1b)  from(1b)|message(506b)   |
+|0101000001000011|0000000100000000|ttttttttffffffff| . . . . . . . .|
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+```
+
+## P.O. BOX.
+
+## POST OFFICE.
+
+## NETWORK.
+
+### APPENDIX A. Encoding table.
+
+While UTF-8 is the answer for almost every question of encoding in the modern
+day, on a byte-restricted format like the postcard protocol I think it's unfair
+for non-English speakers to be forced to use double or even triple the bytes to
+send the same message.  Therefore, the ENCODING field of a postcard's metadata
+corresponds to the message encoding, which allows senders to choose a more
+storage-friendly encoding for their messages.
+
+ENCODING is stored as a 1-byte number, allowing for 256 possible encodings.  As
+of Postcard protocol v. 1, these are recognized:
+
+``` | table
+0       UTF-8
+1       ASCII
+2       EBCDIC
+3       ISO 8859-1 Western Europe
+4       ISO 8859-2 Western and Central Europe
+5       ISO 8859-3 Western Europe and South European
+6       ISO 8859-4 Western Europe and Baltic countries
+7       ISO 8859-5 Cyrillic alphabet
+8       ISO 8859-6 Arabic
+9       ISO 8859-7 Greek
+10      ISO 8859-8 Hebrew
+11      ISO 8859-9 Western Europe with amended Turkish character set
+12      ISO 8859-10 Western Europe with rationalized Nordic character set
+13      ISO 8859-11 Thai
+14      ISO 8859-13 Baltic languages plus Polish
+15      ISO 8859-14 Celtic languages
+16      ISO 8859-15 ISO 8859-1 with rationalizations
+17      ISO 8859-16 Central, Eastern and Southern European languages
+18      CP437
+19      CP720
+20      CP737
+21      CP850
+22      CP852
+23      CP855
+24      CP857
+25      CP858
+26      CP860
+27      CP861
+28      CP862
+29      CP863
+30      CP865
+31      CP866
+32      CP869
+33      CP872
+34      Windows-1250 for Central European languages that use Latin script
+35      Windows-1251 for Cyrillic alphabets
+36      Windows-1252 for Western languages
+37      Windows-1253 for Greek
+38      Windows-1254 for Turkish
+39      Windows-1255 for Hebrew
+40      Windows-1256 for Arabic
+41      Windows-1257 for Baltic languages
+42      Windows-1258 for Vietnamese
+43      Mac OS Roman
+44      KOI8-R
+45      KOI8-U
+46      KOI7
+47      MIK
+48      ISCII
+49      TSCII
+50      VISCII
+51      Shift JIS
+52      EUC-JP
+53      ISO-2022-JP
+54      JIS X 0213
+55      Shift_JIS-2004
+56      EUC-JIS-2004
+57      ISO-2022-JP-2004
+58      GB 2312
+59      GBK (Microsoft Code page 936)
+60      GB 18030
+61      Taiwan Big5 (a more famous variant is Microsoft Code page 950)
+62      Hong Kong HKSCS
+63      Korean
+64      EUC-KR
+65      ISO-2022-KR
+```
+
+This list is pulled from Wikipedia's entry on common character encodings[2], so
+it may need to be revised.
+
+## END NOTES.
+
+=> https://jvns.ca/blog/2017/02/07/mtu/ [1]: J. Evans. "How big can a packet get?"
+=> https://en.wikipedia.org/wiki/Character_encoding#Common_character_encodings [2]: Wikipedia.  "Character encoding"
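
The 6-byte header laid out in the POSTCARD section can be illustrated with a minimal Python sketch. It is not part of this commit and is not a reference implementation. The names `pack_postcard`, `parse_postcard` and `CODECS` are hypothetical. The mapping from ENCODING numbers to Python codec names is also an assumption, covering only three entries from Appendix A (0, 1 and 44).

```python
import struct

MAGIC = b"PC"        # 2-byte magic number from the header description
VERSION = 1          # protocol version 1
MAX_DATAGRAM = 512   # total postcard size chosen in the readme
# Network byte order, no padding: magic, version, encoding, tobox, frombox.
HEADER = struct.Struct("!2sBBBB")

# Assumption: a small subset of Appendix A mapped to Python codec names.
CODECS = {0: "utf-8", 1: "ascii", 44: "koi8_r"}


def pack_postcard(encoding, tobox, frombox, text):
    """Build a postcard datagram; reject messages over 506 bytes."""
    body = text.encode(CODECS[encoding])
    if len(body) > MAX_DATAGRAM - HEADER.size:
        raise ValueError("message longer than 506 bytes")
    return HEADER.pack(MAGIC, VERSION, encoding, tobox, frombox) + body


def parse_postcard(datagram):
    """Split a datagram into (version, encoding, tobox, frombox, text)."""
    if len(datagram) < HEADER.size or len(datagram) > MAX_DATAGRAM:
        raise ValueError("bad datagram size")
    magic, version, encoding, tobox, frombox = HEADER.unpack_from(datagram)
    if magic != MAGIC:
        raise ValueError("not a postcard")
    text = datagram[HEADER.size:].decode(CODECS[encoding])
    return version, encoding, tobox, frombox, text
```

Under these assumptions, `pack_postcard(44, 7, 9, "Привет")` produces a 12-byte datagram in KOI8-R, while the same message with encoding 0 (UTF-8) takes 18 bytes. That difference is the saving Appendix A is after.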