jami-docs

Forked version of Jami documentation, see wrycode.com/jami-docs-demo

# Understanding Swarms

## Synopsis

The goal of this document is to describe how group chats
(a.k.a. **swarm chat**) will be implemented in Jami.

A *swarm* is a group able to discuss without any central authority in
a resilient way. Indeed, if two people don't have any connectivity
with the rest of the group (e.g. an Internet outage) but can still contact
each other (on a LAN or in a subnetwork, for example), they will be
able to send messages to each other, and will then be able to sync
with the rest of the group when it becomes possible.

So, the *swarm* is defined by:
1. Ability to split and merge following the connectivity.
2. Syncing of the history. Anyone must be able to send a message to the whole group.
3. No central authority. It can't rely on any server.
4. Non-repudiation. Devices must be able to verify the validity of old messages and to replay the whole history.
5. Perfect forward secrecy (PFS) on the transport. Storage is managed by the device.

The main idea is to get a synchronized Merkle tree with the participants.

We identified four modes for swarm chat that we want to implement:
+ **ONE_TO_ONE**: basically the case we have today, when you talk to a friend
+ **ADMIN_INVITES_ONLY**: typically a class, where the teacher can invite people but students cannot
+ **INVITES_ONLY**: a private group of friends
+ **PUBLIC**: basically an open forum

## Scenarios

### Create a Swarm

*Bob wants to create a new swarm*

1. Bob creates a local git repository.
2. Then, he creates an initial signed commit that adds the following:
	+ His public key in `/admins`
	+ His device certificate in `/devices`
	+ His CRL in `/crls`
3. The hash of the first commit becomes the **ID** of the conversation.
4. Bob announces to his other devices that he created a new conversation. This is done via an invite to join the swarm, sent through the DHT to the other devices linked to that account.
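
To illustrate step 3, here is a minimal Python sketch of how git derives a commit hash, and therefore how a conversation ID could be derived from the initial commit. It assumes git's default SHA-1 object format. The empty tree, Bob's identity string and the commit message are hypothetical values chosen for illustration: a real initial commit would reference a tree containing `/admins`, `/devices` and `/crls`, and would carry a signature.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way git does: SHA-1 of '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def initial_commit_id(tree_id: str, identity: str, message: str) -> str:
    """Build the body of a parentless (initial) commit and return its id."""
    body = (
        f"tree {tree_id}\n"
        f"author {identity}\n"
        f"committer {identity}\n"
        "\n"
        f"{message}\n"
    ).encode("utf-8")
    return git_object_id("commit", body)


# Hypothetical values, for illustration only.
EMPTY_TREE = git_object_id("tree", b"")
BOB = "Bob <bob@example.org> 1600000000 +0000"
conversation_id = initial_commit_id(EMPTY_TREE, BOB, '{"type": "initial", "mode": 0}')
print(conversation_id)
```

Because the hash covers the tree, the author and the message, two swarms created with different content or by different people get different IDs.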

### Adding someone

*Alice adds Bob*

1. Alice adds Bob to the repo:
	+ Adds the invited URI in `/invited`
	+ Adds the CRL into `/crls`
2. Alice sends a request on the DHT.

### Receiving an invite

*Alice gets the invite to join the previously created swarm*

1. She accepts the invite (if she declines, nothing happens: her URI just stays in `/invited` and Alice will never receive any message).
2. A peer-to-peer connection between Alice and Bob is established.
3. Alice pulls the git repo from Bob. **WARNING: this means that messages need a connection, and no longer go through the DHT like today.**
4. Alice validates the commits from Bob.
5. To confirm that Alice is a member, she removes the invite from the `/invited` directory, then adds her certificate to the `/members` directory.
6. Once all commits are validated and on her device, Alice discovers the other members of the group. With these peers, she will construct the **DRT** (explained below), with Bob as a bootstrap.

### Sending a message

*Alice sends a message*

Sending a message is pretty simple. Alice writes a commit message with the following format:

**TODO format unclear**

and adds her device certificate and CRL to the repository if they are missing (others must be able to verify the commit). Merge conflicts are avoided because we mostly rely on commit messages, not files (except for CRLs and certificates, but those are stored in dedicated locations). Then she announces the new commit via the **DRT** with a service message (explained later) and pings the DHT for mobile devices (they must receive a push notification).

To ping other devices, the sender sends the other members a SIP message with mimetype = "application/im-gitmessage-id", containing a JSON with the "deviceId" of the sending device, the "id" of the related conversation, and the "commit".
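
As a rough sketch of the ping described above, the notification can be modelled in Python as a map from the mimetype to a JSON string. The function names are hypothetical, and the example identifiers are placeholders; only the mimetype and the three keys come from this document.

```python
import json

MIME_TYPE = "application/im-gitmessage-id"
EXPECTED_KEYS = {"id", "commit", "deviceId"}


def build_commit_notification(conv_id: str, commit_id: str, device_id: str) -> dict:
    """Return the {mimetype: JSON string} map announcing a new commit."""
    payload = json.dumps({"id": conv_id, "commit": commit_id, "deviceId": device_id})
    return {MIME_TYPE: payload}


def parse_commit_notification(message: dict) -> dict:
    """Decode a notification and check that the three expected keys are present."""
    data = json.loads(message[MIME_TYPE])
    missing = EXPECTED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


# Placeholder identifiers, for illustration only.
ping = build_commit_notification("conv-id", "commit-id", "alice-device-id")
print(parse_commit_notification(ping))
```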

### Receiving a message

*Bob receives the message from Alice*

1. *Bob* does a git pull from *Alice*.
2. Commits MUST be verified via a hook.
3. If all commits are valid, they are stored and displayed. Then *Bob* announces the message via the DRT for other devices.
4. If any commit is invalid, the pull is cancelled. *Alice* must restore her repository to a correct state. **TODO process**

### Validating a commit

To prevent users from pushing unwanted commits (with conflicts, false messages, etc.), each commit (from the oldest to the newest) MUST be validated as follows before merging a remote branch:

Note: if the validation fails, the fetch is ignored, the branch is not merged (and the fetched data is removed), and the user should be notified.

Note 2: if a fetch is too big, it's not done (**TODO**).

+ For each commit, check that the device trying to send the commit is authorized at this moment, and that the certificates are present (in /devices for the device, and in /members or /admins for the issuer).

Then 3 cases are possible, depending on the number of parents:

+ The commit has 2 parents: it's a merge, nothing more to validate here
+ The commit has 0 parents: it's the initial commit:
	+ Check that the admin cert is added
	+ Check that the device cert is added
	+ Check that the CRLs are added
	+ Check that no other file is added
+ The commit has 1 parent: the commit message is a JSON with a type:
	+ If text (or another mimetype that doesn't change files)
		+ Check the signature against the certificate in the repo
		+ Check that no unexpected file is added (other than the device cert) or removed
	+ If vote
		+ Check that the vote is for the user who signs the commit
		+ Check that the vote is from an admin, and that the device is present and not banned
		+ Check that no unexpected file is added or removed
	+ If member
		+ If adds
			+ Check that the commit is correctly signed
			+ Check that the certificate is added in /invited
			+ Check that no unexpected file is added or removed
			+ If ONE_TO_ONE, check that we only have one admin and one member
			+ If ADMIN_INVITES_ONLY, check that the invite is from an admin
		+ If joins
			+ Check that the commit is correctly signed
			+ Check that the device is added
			+ Check that the invitation is moved to members
			+ Check that no unexpected file is added or removed
		+ If kickban
			+ Check that the vote is valid
			+ Check that the user is banned by an admin
			+ Check that the member or device certificate is moved to banned/
			+ Check that only files related to the vote are removed
			+ Check that no unexpected file is added or removed
	+ Else, fail. Notify the user that they may be running an old version, or that the peer tried to submit unwanted commits
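
The case analysis above can be sketched as a small dispatcher that only decides which set of checks applies; it does not perform the checks themselves. This is an illustrative Python sketch: the function name and the exact type strings in `KNOWN_TYPES` are assumptions, since this document does not fix the format of the commit message.

```python
import json

# Assumed type strings; the real values are not specified in this document.
KNOWN_TYPES = {"text/plain", "vote", "member"}


def validation_branch(parent_count: int, message: str) -> str:
    """Return the name of the validation branch for a commit, or 'fail'."""
    if parent_count == 2:
        return "merge"      # nothing more to validate
    if parent_count == 0:
        return "initial"    # admin cert, device cert, CRLs, no other file
    if parent_count != 1:
        return "fail"
    try:
        data = json.loads(message)
    except ValueError:
        return "fail"       # the message is not valid JSON
    commit_type = data.get("type") if isinstance(data, dict) else None
    return commit_type if commit_type in KNOWN_TYPES else "fail"


print(validation_branch(1, '{"type": "vote"}'))
```

Returning "fail" for anything unrecognized mirrors the last rule of the list: an unknown type is treated as an unwanted commit or as a sign that the local version is too old.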

### Ban a device

*Alice, Bob, Carla, Denys are in a swarm. Alice bans Denys*

This is one of the most difficult scenarios in our context. Without a central authority, several things are problematic:

1. Timestamps of generated commits can't be trusted.
2. Conflicts with banned devices. If multiple admin devices are present, and Alice can talk to Bob but not to Denys and Carla, Carla can talk to Denys, Denys bans Alice, and Alice bans Denys, what will the state be when the 4 members merge the conversations?
3. A device can be compromised or stolen, or its certificate can expire. We should be able to ban a device and prevent it from lying about its expiration or sending messages in the past (by changing its certificate or the timestamp of its commit).

There are not many similar systems (with distributed groups), but here are some examples:

+ [mpOTR doesn't define how to ban someone](https://www.cypherpunks.ca/~iang/pubs/mpotr.pdf)
+ Signal, without any central server for group chat (EDIT: they recently changed that point), doesn't give the ability to ban someone from a group.

This voting system needs a human action to ban someone, or must be based on the CRL information from the repository (because we can't trust external CRLs).

### Remove a device from a conversation

This is the only part that MUST have a consensus to avoid splitting the conversation: for example, if two members kick each other from the conversation, what will the third one see?

This is needed to detect revoked devices, or simply to keep unwanted people out of a public room. The process is pretty similar for a member and for a device:

*Alice removes Bob*

Note: Alice MUST be an admin to vote.

+ First, she votes to ban Bob. To do that, she creates the file /votes/members/uri_bob/uri_alice (members can be replaced by devices for a device) and commits.
+ Then she checks whether the vote is resolved. This means that more than 50% of the admins agree to ban Bob (if she is the only admin, this is necessarily the case).
+ If the vote is resolved, the files in /votes can be removed, all files for Bob in /members, /admins, /CRLs and /devices can be removed (or only those in /devices if a device is banned), and Bob's certificate can be placed in /banned/members/bob_uri.crt (or /banned/devices/uri.crt if a device is banned) and committed to the repo.
+ Then, Alice informs the other users (except Bob).
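
The voting rule above can be sketched in a few lines of Python. The helper names are hypothetical, and ignoring votes cast by non-admins is an assumption derived from the note that only admins can vote.

```python
import posixpath


def vote_path(kind: str, target_uri: str, voter_uri: str) -> str:
    """Return the path of the vote file, e.g. /votes/members/uri_bob/uri_alice."""
    if kind not in ("members", "devices"):
        raise ValueError("kind must be 'members' or 'devices'")
    return posixpath.join("/votes", kind, target_uri, voter_uri)


def vote_resolved(admins: set, voters: set) -> bool:
    """True when strictly more than 50% of the admins voted for the ban."""
    valid_votes = voters & admins  # assumption: votes from non-admins are ignored
    return 2 * len(valid_votes) > len(admins)


print(vote_path("members", "uri_bob", "uri_alice"))
print(vote_resolved({"uri_alice"}, {"uri_alice"}))
```

With a single admin, one vote is enough. With two admins, both must agree, since one vote out of two is exactly 50% and not more.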

### Remove a conversation

1. Save removed=time::now() in convInfos (like removeContact does in contacts) to record that the conversation is removed, and sync with the user's other devices.
2. From now on, if a new commit is received for this conversation, it's ignored.
3. From now on, if Jami starts up and the repo is still present, the conversation is not announced to clients.
4. Two cases:
	a. If there is no other member in the conversation, we can immediately remove the repository.
	b. If there are still other members, commit that we leave the conversation, then wait until at least one other device syncs this message. This prevents other members from still treating the user as a valid member and still sending new message notifications.
5. When we are sure that someone is synced, save erased=time::now() and sync with the user's other devices.
6. All devices owned by the user can now erase the repository and related files.

## How to specify a mode

Modes can't be changed over time; otherwise, it's a different conversation. So, this data is stored in the initial commit message.
The commit message will be the following:

```json
{
	"type": "initial",
	"mode": 0
}
```

For now, "mode" accepts the values 0 (ONE_TO_ONE), 1 (ADMIN_INVITES_ONLY), 2 (INVITES_ONLY), 3 (PUBLIC).
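
A minimal Python sketch of how the mode could be read back from the initial commit message. The enum values come from the list above; the class and function names are hypothetical.

```python
import json
from enum import IntEnum


class ConversationMode(IntEnum):
    ONE_TO_ONE = 0
    ADMIN_INVITES_ONLY = 1
    INVITES_ONLY = 2
    PUBLIC = 3


def parse_initial_commit(message: str) -> ConversationMode:
    """Extract the mode from an initial commit message, rejecting anything else."""
    data = json.loads(message)
    if data.get("type") != "initial":
        raise ValueError("not an initial commit message")
    return ConversationMode(data["mode"])  # raises ValueError on an unknown mode


print(parse_initial_commit('{"type": "initial", "mode": 0}').name)
```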

### Process for 1:1 swarms

The goal here is to keep the old API (addContact/removeContact, sendTrustRequest/acceptTrustRequest/discardTrustRequest) to generate a swarm between a peer and its contact. This still implies some changes that we cannot ignore.

The process is still the same: an account can add a contact via addContact, then send a TrustRequest via the DHT. But two changes are necessary:
1. The TrustRequest embeds a "conversationId" to inform the peer which conversation to clone when accepting the request.
2. TrustRequests are retried when the contact comes back online. That's not the case today (as we don't want to generate a new TrustRequest if the peer discards the first one). So, if an account receives a trust request, it will be automatically ignored if the request for the related conversation was declined (as convRequests are synced).

Then, when a contact accepts the request, a sync period is necessary, because the contact now needs to clone the conversation.

removeContact() will remove the contact and the related 1:1 conversations (with the same process as "Remove a conversation"). The only difference is that if we ban a contact, we don't wait for the sync: we just remove all related files.

#### Tricky scenarios

There are some cases where two conversations can be created. Here are at least two of those scenarios:

1. Alice adds Bob
2. Bob accepts
3. Alice removes Bob
4. Alice adds Bob

or

1. Alice adds Bob and Bob adds Alice at the same time, but they are not connected to each other

In this case, two conversations are generated. We don't want to remove messages from users or choose one conversation here. So, sometimes two 1:1 swarms between the same members will be shown. This will generate some bugs during the transition period (as we don't want to break the API, the inferred conversation will be one of the two shown conversations), but for now it's "ok-ish"; it will be fixed when clients fully handle conversationId for all APIs (calls, file transfer, etc.).

### Conversation requests specification

Conversation requests are represented by a **Map<String, String>** with the following keys:

+ id: the conversation id
+ from: URI of the sender
+ received: timestamp
+ title: (optional) name for the conversation
+ description: (optional)
+ avatar: (optional)
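
As an illustration, a request map with these keys could be checked as follows in Python. Treating `id`, `from` and `received` as mandatory (because the other keys are marked optional) and rejecting unknown keys are both assumptions of this sketch, and the function name is hypothetical.

```python
REQUIRED_KEYS = {"id", "from", "received"}
OPTIONAL_KEYS = {"title", "description", "avatar"}


def check_conversation_request(request: dict) -> list:
    """Return a list of problems; an empty list means the request looks well-formed."""
    problems = []
    for key in sorted(REQUIRED_KEYS - set(request)):
        problems.append(f"missing key: {key}")
    for key in sorted(set(request) - REQUIRED_KEYS - OPTIONAL_KEYS):
        problems.append(f"unknown key: {key}")
    for key, value in request.items():
        if not isinstance(value, str):  # the map is Map<String, String>
            problems.append(f"non-string value for: {key}")
    return problems


# Placeholder values, for illustration only.
request = {"id": "conv-id", "from": "alice-uri", "received": "1600000000", "title": "Jami"}
print(check_conversation_request(request))
```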

### Conversation's profile synchronization

To be identifiable, a conversation generally needs some metadata, like a title (e.g. Jami), a description (e.g. some links, what the project is, etc.), and an image (the logo of the project). This metadata is optional but shared across all members, so it needs to be synced and included in the requests.

#### Storage in the repository

The profile of the conversation is stored in a classic vCard file at the root (`/profile.vcf`) like:

```
BEGIN:VCARD
VERSION:2.1
FN:TITLE
DESCRIPTION:DESC
END:VCARD
```

#### Synchronization

To update the vCard, a user with enough permissions (by default: =ADMIN) needs to edit `/profile.vcf` and commit the file with the mimetype `application/update-profile`. The new message is sent via the same mechanism, and all peers will receive the **MessageReceived** signal from the daemon. The branch is dropped if the commit contains other files, is too big, or is made by a non-authorized member (by default: <ADMIN).

#### Merge conflicts management

Because two admins can change the description at the same time, a merge conflict can occur on `profile.vcf`. In this case, the commit with the higher hash (e.g. ffffff > 000000) will be chosen.
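
The tie-break rule can be sketched in Python as a comparison of the two commit hashes read as hexadecimal numbers. The function name is hypothetical.

```python
def pick_profile_commit(commit_a: str, commit_b: str) -> str:
    """Return the commit id with the higher hash, compared as hexadecimal numbers."""
    return max(commit_a, commit_b, key=lambda commit_id: int(commit_id, 16))


print(pick_profile_commit("ffffff", "000000"))
```

Since every peer sees the same two hashes, every peer picks the same winner without any extra coordination.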

#### APIs

The user has 2 methods to get and set the conversation's metadata:

```xml
<method name="updateConversationInfos" tp:name-for-bindings="updateConversationInfos">
	<tp:added version="10.0.0"/>
	<tp:docstring>
		Update conversation's infos (supported keys: title, description, avatar)
	</tp:docstring>
	<arg type="s" name="accountId" direction="in"/>
	<arg type="s" name="conversationId" direction="in"/>
	<annotation name="org.qtproject.QtDBus.QtTypeName.In2" value="VectorMapStringString"/>
	<arg type="a{ss}" name="infos" direction="in"/>
</method>

<method name="conversationInfos" tp:name-for-bindings="conversationInfos">
	<tp:added version="10.0.0"/>
	<tp:docstring>
		Get conversation's infos (mode, title, description, avatar)
	</tp:docstring>
	<annotation name="org.qtproject.QtDBus.QtTypeName.Out0" value="VectorMapStringString"/>
	<arg type="a{ss}" name="infos" direction="out"/>
	<arg type="s" name="accountId" direction="in"/>
	<arg type="s" name="conversationId" direction="in"/>
</method>
```

where `infos` is a `map<str, str>` with the following keys:

+ mode: READ-ONLY
+ title
+ description
+ avatar

#### Re-import an account (link/export)

The archive MUST contain the conversationId to be able to retrieve conversations on new commits after a re-import (because there is no invite at this point). If a commit arrives for a conversation that is not present, there are two possibilities:

+ The conversationId is there: in this case, the daemon is able to re-clone this conversation
+ The conversationId is missing: the daemon asks (via a message `{{"application/invite", conversationId}}`) for a new invite, which the user needs to (re)accept

Note: a conversation can only be retrieved if a contact or another device is available; otherwise it will be lost. There is no magic.

### Other mime types

+ `application/call-history+json` with a JSON containing the `id` and the `duration` of the call
+ `application/data-transfer+json` with a JSON containing the `tid` of the file

## Used protocols

### Git

#### Why this choice

Each conversation will be a git repository. This choice is motivated by:

1. We need to sync and order messages. The Merkle tree is the perfect structure to do that, and it can be linearized by merging branches. Moreover, because this structure is at the core of Git, it's easy to sync between devices.
2. Distributed by nature. Massively used. Lots of backends, and pluggable.
3. Commits can be verified via hooks and widely used crypto.
4. Can be stored in a database if necessary.
5. Conflicts are avoided by using commit messages, not files.

#### What we have to validate

+ Performance? `git.lock` can be slow
+ Hooks in libgit2
+ Multiple pulls at the same time?

#### Limits

History can't be deleted. To delete a conversation, the device has to leave the conversation and create another one.

However, non-permanent messages (like messages readable only for a few minutes) can be sent via a special message via the DRT (like Typing or Read notifications).

Moreover, editing messages will be possible! (`commit --fixup`)

#### Structure

```
/
 - invited
 - admins (public keys)
 - members (public keys)
 - devices (certificates of authors to verify commits)
 - banned
   - devices
   - members
 - votes
   - members
     - uri
   - devices
     - uri
 - CRLs
```

#### Attacks?

+ Avoid git bombs

#### Notes

The timestamp of a commit cannot be trusted, because it's editable. Only the user's own timestamp can be trusted.

### TLS

Git operations, control messages, files and other things will use a p2p TLS v1.3 link with only ciphers that guarantee PFS. So keys are renegotiated for each new connection.

### DHT (UDP)

Used to send messages to mobile devices (to trigger push notifications) and to initiate TCP connections.

### DRT (name will change)

The DRT is a new concept used in swarms to maintain p2p connections. Indeed, group members define a graph of nodes (identified by a hash) and must be connected.

So we need a structure that:

1. Maximizes the number of connected nodes at all times
2. Minimizes the time for message transmission
3. Minimizes links between peers
4. Needs little computation

Several solutions exist:

1. Each node has a connection to the next node. So we only need $N$ connections, but it's not efficient for transmitting a message, because the message will go through all peers, one by one.
2. Each node is connected to all other nodes ($N \times N$ connections). Efficient for transmission, but needs more resources. **WILL BE CHOSEN FOR THE FIRST VERSION**
3. *Maximizing the Coverage of Roadmap Graph for Optimal Motion Planning* (https://www.hindawi.com/journals/complexity/2018/9104720/). But it needs computation.
4. Use the algorithm of the DHT for the routing table. The 4 points are basically solved, and it's already used by Jami over UDP.
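
To make the trade-off between the first two solutions concrete, here is a small Python sketch that counts links and worst-case hops for a group of $N$ nodes. It assumes $N \geq 3$ and counts each pair of peers once, so the full mesh has $N(N-1)/2$ links; this is the same quadratic growth as the $N \times N$ mentioned above. The function names are hypothetical.

```python
def ring_links(n: int) -> int:
    """Solution 1: each node is connected to the next one, closing the loop."""
    return n


def ring_worst_case_hops(n: int) -> int:
    """In a ring where each node forwards to the next, the last peer is n - 1 hops away."""
    return n - 1


def full_mesh_links(n: int) -> int:
    """Solution 2: one link per pair of nodes."""
    return n * (n - 1) // 2


for n in (4, 8, 50):
    print(n, ring_links(n), ring_worst_case_hops(n), full_mesh_links(n))
```

In the full mesh every peer is a single hop away, which is why it is efficient for transmission, but the number of links grows quickly with the size of the group.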

Note: to optimize the number of sockets, a socket will be given by a **ConnectionManager** to get multiplexed sockets with a given hash. This means that if we need to transmit several files and chat with someone, only one socket will be used.

### File transfer (libtorrent?)

**TODO**

### Network activity

#### Process to invite someone

Alice wants to invite Bob:

1. Alice adds Bob to a conversation
2. Alice generates an invite: { "application/invite+json" : {
	"conversationId": "$id",
	"members": [{...}]
}}
3. Two possibilities for sending the message
	a. If not connected, via the DHT
	b. Else, Alice sends it on the SIP channel
4. Two possibilities for Bob
	a. He receives the invite, and a signal is emitted for the client
	b. He is not connected, so he will never receive the request, because Alice must not know whether Bob just ignored the request or blocked her. The only way is to regenerate a new invite via a new message (cf. next scenario)
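
Steps 2 and 3 can be sketched in Python as follows. The content of each entry in `members` is not specified in this document, so the sketch treats the list as opaque; the function names and the channel labels are hypothetical.

```python
import json


def build_invite(conversation_id: str, members: list) -> dict:
    """Return the invite map; the format of each member entry is left opaque."""
    return {
        "application/invite+json": {
            "conversationId": conversation_id,
            "members": members,
        }
    }


def choose_channel(connected: bool) -> str:
    """Step 3: use the SIP channel when connected to the peer, the DHT otherwise."""
    return "sip" if connected else "dht"


invite = build_invite("conv-id", [])  # placeholder id, empty member list
print(choose_channel(connected=False), json.dumps(invite))
```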

#### Process to send a message to someone

Alice wants to send a message to Bob:

1. Alice adds a message to the repo, giving an ID
2. Alice gets a message received (from herself) if successful
3. Two possibilities: Alice and Bob are connected, or not. In both cases a message is crafted: { "application/im-gitmessage-id" : "{"id":"$convId", "commit":"$commitId", "deviceId": "$alice_device_hash"}"}.
	a. If not connected, via the DHT
	b. Else, Alice sends it on the SIP channel
4. Four possibilities for Bob:
	a. Bob is not connected to Alice, so if he trusts Alice, he asks for a new connection and goes to b.
	b. If connected, he fetches from Alice and announces the new messages
	c. Bob doesn't know that conversation. He asks through the DHT for an invite first, to be able to accept that conversation ({"application/invite", conversationId})
	d. Bob is disconnected (no network, or the application is just closed). He will not receive the new message, but will try to sync when the next connection occurs
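
Bob's four possibilities can be summarized as a small decision function. This Python sketch is only an illustration: the order in which the conditions are tested, the returned labels, and the "ignore" outcome for a peer that Bob does not trust are assumptions, not part of the specification.

```python
def bob_next_action(online: bool, knows_conversation: bool,
                    connected_to_alice: bool, trusts_alice: bool) -> str:
    """Map Bob's situation to one of the cases a. to d. described above."""
    if not online:
        return "sync_on_next_connection"   # case d
    if not knows_conversation:
        return "request_invite_via_dht"    # case c
    if connected_to_alice:
        return "fetch_and_announce"        # case b
    if trusts_alice:
        return "connect_then_fetch"        # case a, then b
    return "ignore"                        # assumption: untrusted peer


print(bob_next_action(online=True, knows_conversation=True,
                      connected_to_alice=False, trusts_alice=True))
```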

### Implementation

**!! OLD DRAFT !!**

Note: the following notes are not organized yet. They are just some lines of thought.

## Crypto improvements

For a serious group chat feature, we also need serious crypto. With the current design, if a certificate is stolen along with the previous DHT values of a conversation, the conversation can be decrypted. Maybe we need to move to something like **Double Ratchet**.

Note: a library might exist to implement group conversations. TODO: investigate.

Needs ECC support in OpenDHT.

## Usage

### Add Roles?

There are two major use cases for group chats:

1. Something like Mattermost in a company, with private channels and some roles (admin/spectator/bot/etc.), or for education (where only a few members are active).
2. Horizontal conversations, like a conversation between friends.

Which one will Ring be for?

#### Implementation idea

A certificate for a group, which signs users with a flag for a role. Adding or revoking can also be done.

### Join a conversation

+ Only via a direct invite
+ Via a link/QR code/whatever
+ Via a room name? (a **hash** on the DHT)

## What we need

+ Confidentiality: people outside of the group chat should not be able to read messages in the group
+ Forward secrecy: if any key from the group is compromised, previous messages should remain confidential (as much as possible)

+ Message ordering: messages need to be in the right order
+ Synchronization: we also need to be sure to have all messages as soon as possible.
+ Persistence: currently, a message on the DHT lives only 10 minutes, because that's the best timing calculated for this kind of DHT. To persist data, the node must re-put the value on the DHT every 10 minutes. Another way to do it when the node is offline is to let other nodes re-put the data. But if, after 10 minutes, 8 nodes are still here, they will do 64 requests (and it's exponential). The current way to avoid spamming is to use queries. This will still do 64 requests, but limits the maximum redundancy to 8 nodes.

## Other distributed ways

+ IPFS: needs some investigation
+ BitMessage: needs some investigation
+ Maidsafe: needs some investigation

### Based on current work we have

Group chats can be based on the same work we already have for multi-device (but here, with a group certificate). Problems to solve:

1. History sync. This requires moving the database from the client into the daemon.
2. If nobody is connected, the synchronization can't be done, and the person will never see the conversation.

### Another dedicated DHT

Like a DHT with a super user. (Not convinced)

## File transfer

Currently the file transfer algorithm is based on a TURN connection (see https://git.ring.cx/savoirfairelinux/ring-project/wikis/tutorials/File-transfer). In the case of a big group, this will be bad. We first need a p2p implementation for file transfer: implement the RFC for p2p transfer.

Other problem: currently there is no implementation of TCP support for ICE in PJSIP. This is mandatory for this point (in PJSIP or home-made).

## Resources

+ https://eprint.iacr.org/2017/666.pdf
+ Robust distributed synchronization of networked linear systems with intermittent information (Sean Phillips and Ricardo G. Sanfelice)