akwizgran created page: ClientApiNotes

akwizgran
2015-12-07 14:26:06 +00:00
parent 47ce621045
commit 5334d66307

[BSP](BSP) clients may need to store their own metadata as well as the data they sync via BSP. Metadata may be extracted from the data (e.g. the subject line of a message), or it may describe the state of the client (e.g. whether the user has starred a message). It may refer to a single message or the relationships between messages. We have to decide whether the metadata should be stored in the same database as the data, and if so, what the API for storing and querying metadata should look like.

Issues to consider:
* Encapsulation - if clients have low-level access to the data store they can't be insulated from each other
* Modularity - if we want to release the protocol stack as a separate library it should have a well-defined API, not all of SQL
* Expressiveness - if we provide an API for metadata it must be rich enough that clients don't need to use a separate database
* Performance - if we provide an API for metadata it must have comparable performance to using a separate database
* Transactions - do clients need to update metadata and data in a single atomic operation? Do they need atomic operations spanning multiple messages?
* Encryption - if clients use their own metadata storage it won't benefit from our database encryption

Use cases to consider:
* Full text search - efficient queries over multiple messages
* Attachments - simple case of relationships between messages
* Peer moderation - users share upvotes and downvotes that refer to messages; if a message is eligible to be shared, so are all its ancestors
* Expiry - delete discussion threads that have been inactive for a certain amount of time

Considering the above issues and use cases, my current thinking is as follows:
* Store metadata in the same database as data
* Allow arbitrary key/value pairs to be associated with each message and each channel
* Initially, support the following queries:
    * Get the IDs of all messages with a given metadata key
    * Get the IDs of all messages with a given metadata key, together with the metadata values
    * Queries can be scoped to a single channel or all the client's channels
* The sync layer needs to know about dependencies between messages
    * Dependencies are included in the message body (opaque to the sync layer) rather than the header, because signatures etc may need to cover them
    * When the client validates a message, it parses the dependencies and informs the sync layer
    * When a message is shared, the sync layer transitively shares its dependencies
    * Not all references between messages have to be dependencies
* The client can flag expired messages
    * The sync layer garbage collects expired messages that aren't transitive dependencies of any unexpired messages
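To make the proposed metadata queries concrete, here is a minimal in-memory sketch in Python. It is not an existing API: the class `MetadataStore`, its method names, and the convention that `channel_id=None` means "all the client's channels" are assumptions made for illustration. It covers per-message metadata only (per-channel metadata is left out) and assumes message IDs are unique across channels.

```python
from collections import defaultdict


class MetadataStore:
    """In-memory stand-in for the shared database (hypothetical API)."""

    def __init__(self):
        # channel_id -> message_id -> {key: value}
        self._channels = defaultdict(dict)

    def put_metadata(self, channel_id, message_id, key, value):
        """Associate a key/value pair with a message in a channel."""
        self._channels[channel_id].setdefault(message_id, {})[key] = value

    def _scope(self, channel_id):
        # Assumption: channel_id=None means "all of the client's channels".
        if channel_id is None:
            return list(self._channels.values())
        return [self._channels.get(channel_id, {})]

    def get_message_ids(self, key, channel_id=None):
        """First query: IDs of all messages that have the given key."""
        return sorted(
            message_id
            for messages in self._scope(channel_id)
            for message_id, metadata in messages.items()
            if key in metadata
        )

    def get_metadata(self, key, channel_id=None):
        """Second query: the same messages, mapped to the key's value."""
        return {
            message_id: metadata[key]
            for messages in self._scope(channel_id)
            for message_id, metadata in messages.items()
            if key in metadata
        }


store = MetadataStore()
store.put_metadata("forum", "m1", "subject", "Hello")
store.put_metadata("forum", "m1", "starred", True)
store.put_metadata("forum", "m2", "subject", "Re: Hello")
store.put_metadata("blog", "m3", "subject", "First post")

print(store.get_message_ids("subject", channel_id="forum"))
print(store.get_metadata("starred"))
```

A real implementation would sit on top of the encrypted database; the point here is only that the client sees two narrow queries rather than all of SQL.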

Sketch of how full text search would work:
* The client parses each message and extracts search words
* The client creates a metadata key for each search word
* The metadata value is a list of positions where the word appears in the message
* One metadata query finds all messages matching a word
* Boolean operators are handled by the client (we could push this down to the metadata API later if useful)
* To search for the phrase "foo bar", use two queries to get the metadata values for "foo" and "bar", then combine the results in the client to find message IDs where position("foo") + 1 == position("bar")
* A single join query could do this in SQL, so the metadata API doesn't score perfectly on expressiveness or performance
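The phrase search described above can be sketched as follows. This is a rough Python illustration, not a proposed implementation: the `index` dictionary stands in for the metadata store (one key per search word, with a list of positions as the value), and the function names and the `\w+` tokenizer are assumptions.

```python
import re
from collections import defaultdict

# word -> message_id -> positions of the word in that message. This stands in
# for one metadata key per search word, with the position list as the value.
index = defaultdict(dict)


def index_message(message_id, text):
    """Client-side step: extract search words and record their positions."""
    words = re.findall(r"\w+", text.lower())
    for position, word in enumerate(words):
        index[word].setdefault(message_id, []).append(position)


def query(word):
    """Stand-in for one metadata query: message ID -> positions of the word."""
    return index.get(word.lower(), {})


def search_phrase(first, second):
    """Two queries, combined in the client to match the phrase 'first second'."""
    first_hits, second_hits = query(first), query(second)
    matches = []
    # Only messages containing both words can contain the phrase.
    for message_id in first_hits.keys() & second_hits.keys():
        second_positions = set(second_hits[message_id])
        # The phrase matches where position(first) + 1 == position(second).
        if any(p + 1 in second_positions for p in first_hits[message_id]):
            matches.append(message_id)
    return sorted(matches)


index_message("m1", "foo bar baz")
index_message("m2", "bar then foo")
index_message("m3", "a foo, a bar")

print(search_phrase("foo", "bar"))
```

All three example messages contain both words, but only `m1` has them adjacent and in order, which is the work the client has to do by hand here and that a SQL join would do in the database.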

Sketch of how attachments would work:
* An attachment is a dependency of the message it's attached to
* Sharing the message automatically shares the attachment

Sketch of how peer moderation would work:
* Messages depend on their parents
* Sharing a message automatically shares its ancestors
* Moderation votes don't necessarily depend on the messages they refer to (we may want to share votes without sharing the messages they refer to, especially downvotes)
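The attachment and peer moderation sketches both come down to the sync layer sharing a message together with everything it transitively depends on. Below is a small Python illustration of that closure; the `dependencies` table, the message IDs, and the function name `messages_to_share` are made up for the example, and the table stands in for whatever the client reports to the sync layer after validating each message.

```python
# message_id -> IDs it depends on, as reported by the client after validation.
# All IDs here are hypothetical.
dependencies = {
    "root": [],
    "reply": ["root"],             # a message depends on its parent
    "reply2": ["reply", "photo"],  # ...and on its attachment
    "photo": [],
    "downvote": [],                # refers to "reply2" but doesn't depend on it
}


def messages_to_share(message_id):
    """Return the message plus everything it transitively depends on."""
    to_share, stack = set(), [message_id]
    while stack:
        current = stack.pop()
        if current not in to_share:
            to_share.add(current)
            stack.extend(dependencies.get(current, []))
    return to_share


print(sorted(messages_to_share("reply2")))
print(sorted(messages_to_share("downvote")))
```

Because the downvote is a reference rather than a dependency, sharing it doesn't drag along the message it refers to.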

Sketch of how expiry would work:
* All messages older than a certain age are flagged as expired
* The sync layer automatically garbage collects threads without any unexpired messages
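A minimal Python sketch of the expiry rule, under assumed names (`flag_expired`, `collect_garbage`) and with integer timestamps chosen for the example: the client flags messages by age, and the sync layer deletes only those expired messages that no unexpired message transitively depends on.

```python
def flag_expired(timestamps, now, max_age):
    """Client side: flag every message older than max_age as expired."""
    return {m for m, sent in timestamps.items() if now - sent > max_age}


def collect_garbage(dependencies, expired):
    """Sync layer: expired messages that no unexpired message depends on."""
    # Start from the unexpired messages and keep everything reachable from
    # them through dependencies; whatever is left over can be deleted.
    keep = set()
    stack = [m for m in dependencies if m not in expired]
    while stack:
        current = stack.pop()
        if current not in keep:
            keep.add(current)
            stack.extend(dependencies.get(current, []))
    return set(dependencies) - keep


# Two hypothetical threads: "a" has been inactive, "b" has a recent reply.
dependencies = {"a1": [], "a2": ["a1"], "b1": [], "b2": ["b1"]}
timestamps = {"a1": 0, "a2": 10, "b1": 0, "b2": 95}

expired = flag_expired(timestamps, now=100, max_age=50)
print(sorted(collect_garbage(dependencies, expired)))
```

In this example `b1` is flagged as expired but survives, because the recent reply `b2` depends on it; thread "a" has no unexpired messages and is collected as a whole.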