From 5334d663073165bcf8a25a5dd36667b642f9829c Mon Sep 17 00:00:00 2001
From: akwizgran
Date: Mon, 7 Dec 2015 14:26:06 +0000
Subject: [PATCH] akwizgran created page: ClientApiNotes

---
 ClientApiNotes.md | 53 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)
 create mode 100644 ClientApiNotes.md

diff --git a/ClientApiNotes.md b/ClientApiNotes.md
new file mode 100644
index 0000000..e8b6203
--- /dev/null
+++ b/ClientApiNotes.md
@@ -0,0 +1,53 @@
+[BSP](BSP) clients may need to store their own metadata as well as the data they sync via BSP. Metadata may be extracted from the data (e.g. the subject line of a message), or it may describe the state of the client (e.g. whether the user has starred a message). It may refer to a single message or the relationships between messages. We have to decide whether the metadata should be stored in the same database as the data, and if so, what the API for storing and querying metadata should look like.
+
+Issues to consider:
+* Encapsulation - if clients have low-level access to the data store they can't be insulated from each other
+* Modularity - if we want to release the protocol stack as a separate library it should have a well-defined API, not all of SQL
+* Expressiveness - if we provide an API for metadata it must be rich enough that clients don't need to use a separate database
+* Performance - if we provide an API for metadata it must have comparable performance to using a separate database
+* Transactions - do clients need to update metadata and data in a single atomic operation? Do they need atomic operations spanning multiple messages?
+* Encryption - if clients use their own metadata storage it won't benefit from our database encryption
+
+Use cases to consider:
+* Full text search - efficient queries over multiple messages
+* Attachments - simple case of relationships between messages
+* Peer moderation - users share upvotes and downvotes that refer to messages; if a message is eligible to be shared, so are all its ancestors
+* Expiry - delete discussion threads that have been inactive for a certain amount of time
+
+Considering the above issues and use cases, my current thinking is as follows:
+
+* Store metadata in the same database as data
+* Allow arbitrary key/value pairs to be associated with each message and each channel
+* Initially, support the following queries:
+  * Get the IDs of all messages with a given metadata key
+  * Ditto, also retrieving the metadata value
+  * Queries can be scoped to a single channel or all the client's channels
+* The sync layer needs to know about dependencies between messages
+  * Dependencies are included in the message body (opaque to the sync layer) rather than the header, because signatures etc. may need to cover them
+  * When the client validates a message, it parses the dependencies and informs the sync layer
+  * When a message is shared, the sync layer transitively shares its dependencies
+  * Not all references between messages have to be dependencies
+* The client can flag expired messages
+* The sync layer garbage collects expired messages that aren't transitive dependencies of any unexpired messages
+
+Sketch of how full text search would work:
+* The client parses each message and extracts search words
+* The client creates a metadata key for each search word
+* The metadata value is a list of positions where the word appears in the message
+* One metadata query finds all messages matching a word
+* Boolean operators are handled by the client (we could push this down to the metadata API later if useful)
+* To search for the phrase "foo bar", use two queries to get the metadata values for "foo" and "bar", manually combine the results to find message IDs where position("foo") + 1 == position("bar")
+* This could be done with a single join query in SQL, so it doesn't score perfectly on expressiveness or performance
+
+Sketch of how attachments would work:
+* An attachment is a dependency of the message it's attached to
+* Sharing the message automatically shares the attachment
+
+Sketch of how peer moderation would work:
+* Messages depend on their parents
+* Sharing a message automatically shares its ancestors
+* Moderation votes don't necessarily depend on the messages they refer to (we may want to share votes without sharing the messages they refer to, especially downvotes)
+
+Sketch of how expiry would work:
+* All messages older than a certain age are flagged as expired
+* The sync layer automatically garbage collects threads without any unexpired messages
\ No newline at end of file
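
The full text search sketch in the page can be made concrete with a small Python model. This is only an illustration under stated assumptions: `MetadataStore`, `index_message`, `search_phrase` and the `word:` key prefix are hypothetical names, not part of BSP or any existing API, and an in-memory dict stands in for the shared database. It shows the two queries proposed in the page (IDs by key, and IDs plus values by key) and the client-side join for a two-word phrase.

```python
from collections import defaultdict


class MetadataStore:
    """Hypothetical in-memory stand-in for the proposed metadata API."""

    def __init__(self):
        # Maps metadata key -> {message_id: metadata value}.
        self._by_key = defaultdict(dict)

    def put(self, message_id, key, value):
        """Associate a key/value pair with a message."""
        self._by_key[key][message_id] = value

    def get_message_ids(self, key):
        """Query 1: IDs of all messages that have the given key."""
        return set(self._by_key.get(key, {}))

    def get_values(self, key):
        """Query 2: the same, also returning each message's value."""
        return dict(self._by_key.get(key, {}))


def index_message(store, message_id, text):
    """Store one key per search word; the value is the word's positions."""
    positions = defaultdict(list)
    for pos, word in enumerate(text.lower().split()):
        positions[word].append(pos)
    for word, pos_list in positions.items():
        store.put(message_id, "word:" + word, pos_list)


def search_phrase(store, first, second):
    """Find messages where `second` appears directly after `first`."""
    first_hits = store.get_values("word:" + first)
    second_hits = store.get_values("word:" + second)
    matches = set()
    for message_id, first_positions in first_hits.items():
        second_positions = set(second_hits.get(message_id, ()))
        # Client-side join: position(first) + 1 == position(second).
        if any(p + 1 in second_positions for p in first_positions):
            matches.add(message_id)
    return matches


store = MetadataStore()
index_message(store, "m1", "foo bar baz")
index_message(store, "m2", "bar foo")
index_message(store, "m3", "foo baz bar")
print(search_phrase(store, "foo", "bar"))  # prints {'m1'}
```

The join happens in the client after two round trips to the store, which is the expressiveness and performance cost the page notes relative to a single SQL join.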
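
The dependency and expiry rules in the page can likewise be modelled in a few lines. The sketch below assumes the sync layer holds a plain mapping from each message ID to the IDs it depends on, as reported by the client after validation; the function names and the data layout are hypothetical and chosen only for illustration. It covers transitive sharing (attachments, ancestors) and garbage collection of expired messages that no unexpired message transitively depends on.

```python
def transitive_dependencies(message_id, deps):
    """Return every message reachable from message_id via dependency edges."""
    seen = set()
    stack = list(deps.get(message_id, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # Already visited; also guards against cycles.
        seen.add(current)
        stack.extend(deps.get(current, ()))
    return seen


def messages_to_share(message_id, deps):
    """Sharing a message also shares its transitive dependencies."""
    return {message_id} | transitive_dependencies(message_id, deps)


def collectable(all_ids, expired, deps):
    """Expired messages that no unexpired message transitively depends on."""
    keep = set()
    for message_id in all_ids:
        if message_id not in expired:
            keep.add(message_id)
            keep |= transitive_dependencies(message_id, deps)
    return set(all_ids) - keep


# Hypothetical data: one live thread and one thread that is fully expired.
deps = {"reply": ["parent"], "parent": ["root"], "old_reply": ["old_root"]}
all_ids = {"root", "parent", "reply", "old_root", "old_reply"}
expired = {"root", "parent", "old_root", "old_reply"}

print(sorted(messages_to_share("reply", deps)))      # ['parent', 'reply', 'root']
print(sorted(collectable(all_ids, expired, deps)))   # ['old_reply', 'old_root']
```

In this model `root` and `parent` survive even though they are flagged as expired, because the unexpired `reply` still depends on them; the other thread has no unexpired messages and is collected as a whole.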