2021-04-04
Write a simple Matrix bot in Scheme (or any other language) - Part 1

A while ago I rewrote a bot we used in a (now defunct) Matrix community, and in the process I took the opportunity to throw away matrix-nio, a Python library for writing Matrix clients, and learn a bit about Matrix’s client-server API. I was happy with the result but I have no use for the bot now, so I thought I’d share the knowledge with the rest of the world so that my efforts won’t go to waste.

In Part 2 I’ll be using Guile, but you can follow this guide in whatever language you like, provided it has a reasonable way to make HTTP requests and parse JSON.

Prerequisites

Set up a homeserver or choose a public one, then make an account for your bot (it will be a regular user, so no special action is needed here) and join the rooms you want to use the bot in. If you are using Guile, install guile-json. We won’t be dealing with encryption, so if you plan on using the bot inside end-to-end encrypted rooms you should also set up pantalaimon. Finally, curl and jq will come in handy for exploring the API.

How Matrix’s API works

Matrix clients talk to servers by exchanging JSON objects through HTTP requests. The APIs we’ll need are documented in the client-server API specification. Don’t get discouraged by the size of the document: the protocol makes an effort to support both full-featured clients that satisfy modern IM expectations and simple automated ones like ours. Believe it or not, we only need three endpoints: one to log in, one to read incoming events, and one to send messages.

First, make sure you have the URL of your homeserver. For instance, mine is at https://matrix.alsd.eu:8448, but you may want to use a local instance of pantalaimon (e.g. http://localhost:8009). You can check that everything works by requesting /_matrix/client/versions:

$ curl https://matrix.alsd.eu:8448/_matrix/client/versions | jq

{
  "versions": [
    "r0.0.1",
    "r0.1.0",
    "r0.2.0",
    "r0.3.0",
    "r0.4.0",
    "r0.5.0",
    "r0.6.0"
  ],
  "unstable_features": {
    "org.matrix.label_based_filtering": true,
    "org.matrix.e2e_cross_signing": true,
    "org.matrix.msc2432": true,
    "uk.half-shot.msc2666": true,
    "io.element.e2ee_forced.public": false,
    "io.element.e2ee_forced.private": false,
    "io.element.e2ee_forced.trusted_private": false
  }
}

So far so good! All the endpoints we’ll be accessing start with /_matrix/client/r0, so we’ll say:

$ base=https://matrix.alsd.eu/_matrix/client/r0
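In a program, the same prefix can live in one place. A minimal sketch in Python, where the helper name endpoint is made up for illustration and the homeserver URL is the example one used above:

```python
# Hypothetical helper (not part of any Matrix library): build client-server
# API URLs from a homeserver URL and an endpoint path.
API_PREFIX = "/_matrix/client/r0"


def endpoint(homeserver: str, path: str) -> str:
    """Join the homeserver URL, the r0 prefix and an endpoint path."""
    return homeserver.rstrip("/") + API_PREFIX + "/" + path.lstrip("/")


print(endpoint("https://matrix.alsd.eu", "login"))
```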

Logging in

We can log in by POSTing some JSON to $base/login:

$ curl -d @- $base/login <<END | jq
> {
>   "type": "m.login.password",
>   "identifier": {
>     "type": "m.id.user",
>     "user": "testbot"
>   },
>   "password": "BOT_PASSWORD",
>   "device_id": "bot"
> }
> END

{
  "user_id": "@testbot:alsd.eu",
  "access_token": "MDAxNWxvY2F0aW9uIGFsc2...",
  "home_server": "alsd.eu",
  "device_id": "bot"
}

Here you should replace testbot with the id you chose, and BOT_PASSWORD with its password. The response object contains an access token that we’ll use to authenticate further operations. The device id will be shown in the session list in the bot’s profile. Every time you log in with the same device id, the previous token associated with that id is revoked.
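The same login can be sketched with Python's standard library. This is only an illustration: the function names login_request and access_token are invented here, and the request is built but not sent, so it can be inspected without a real homeserver.

```python
import json
import urllib.request


def login_request(base: str, user: str, password: str, device_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for $base/login."""
    payload = {
        "type": "m.login.password",
        "identifier": {"type": "m.id.user", "user": user},
        "password": password,
        "device_id": device_id,
    }
    return urllib.request.Request(
        base + "/login",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def access_token(response_body: bytes) -> str:
    """Pull the access token out of the JSON login response."""
    return json.loads(response_body)["access_token"]


# Sending it would look like this (needs a real homeserver and account):
# with urllib.request.urlopen(login_request(base, "testbot", "BOT_PASSWORD", "bot")) as resp:
#     token = access_token(resp.read())
```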

From now on we’ll provide the token inside an Authorization header:

$ token="MDAxNWxvY2F0aW9uIGFsc2..."
$ curl -H "Authorization: Bearer $token" $base/...
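In code, this amounts to attaching one header to every request. A small sketch using Python's urllib, where authed_request is a hypothetical helper name and the token value is a placeholder:

```python
import urllib.request
from typing import Optional


def authed_request(url: str, token: str, method: str = "GET",
                   data: Optional[bytes] = None) -> urllib.request.Request:
    """Build a request that carries the access token as a Bearer token."""
    return urllib.request.Request(
        url,
        data=data,
        headers={"Authorization": "Bearer " + token},
        method=method,
    )


# Placeholder values; nothing is sent over the network here.
req = authed_request("https://example.org/_matrix/client/r0/sync", "TOKEN")
print(req.get_method(), req.full_url)
```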

Synchronizing state

Matrix isn’t designed to simply pass messages between clients, but to keep the state of a room synchronized across clients and servers. When a client GETs $base/sync, the response contains, for every room the user has joined, the latest events that happened in that room. It also contains two kinds of tokens: one to retrieve events sent before the first one in the response, and one to tell the server where to start reporting events the next time the client syncs.

The docs have an example to help visualize the process of retrieving events:

First, the client makes an initial sync, and receives events [E2] to [E5] from the server. The response also contains the prev_batch and next_batch tokens.

[E0]->[E1]->[E2]->[E3]->[E4]->[E5]
           ^                      ^
           |                      |
     prev_batch: '1-2-3'        next_batch: 'a-b-c'

The next time the client syncs, it will provide the next_batch token received earlier. The server replies with [E6], the only event generated since the last request.

[E0]->[E1]->[E2]->[E3]->[E4]->[E5]->[E6]
                                  ^     ^
                                  |     |
                                  |  next_batch: 'x-y-z'
                                prev_batch: 'a-b-c'

However, it may happen that many events have been sent in the meantime: in that case, the server only sends the most recent ones, and the client has a gap in knowledge of the room’s history.

                                  | gap |
                                  | <-> |
[E0]->[E1]->[E2]->[E3]->[E4]->[E5]->[E6]->[E7]->[E8]->[E9]->[E10]
                                        ^                        ^
                                        |                        |
                                   prev_batch: 'd-e-f'       next_batch: 'u-v-w'

The gap can be filled with the help of the prev_batch token.

The server also makes sure that the client always receives enough information about the room’s state (who is in the room, whether the description has changed…), even if the corresponding events fall into the aforementioned gap, by putting state events that don’t fit in the returned timeline in a separate response field.
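A client can tell whether such a gap exists: in the r0 API, the timeline object of each joined room carries a limited flag that is true when the server left events out. The following sketch (rooms_with_gaps is a made-up name, and the sample data is invented) lists the affected rooms together with the prev_batch token that would be needed to fill the gap:

```python
def rooms_with_gaps(sync_response: dict) -> dict:
    """Map room id -> prev_batch token for joined rooms with a truncated timeline."""
    gaps = {}
    joined = sync_response.get("rooms", {}).get("join", {})
    for room_id, room in joined.items():
        timeline = room.get("timeline", {})
        if timeline.get("limited"):
            gaps[room_id] = timeline.get("prev_batch")
    return gaps


# Invented sample: one room with a gap, one without.
sample = {
    "rooms": {
        "join": {
            "!a:example.org": {"timeline": {"events": [], "limited": True, "prev_batch": "d-e-f"}},
            "!b:example.org": {"timeline": {"events": [], "limited": False, "prev_batch": "a-b-c"}},
        }
    }
}
print(rooms_with_gaps(sample))
```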

For simplicity, we’ll assume that:

the bot will never receive enough messages for the gap to be a problem;
we don’t care about what happens when the bot is not running.
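Under these two assumptions, the sync logic reduces to a short loop: do an initial sync only to obtain a token, then keep passing the latest next_batch back to the server. Here is a sketch in Python. The fetch argument is a stand-in for the actual HTTP call, and both follow_sync and fake_fetch are names invented for this example.

```python
from typing import Callable, Iterator, Optional


def follow_sync(fetch: Callable[[Optional[str]], dict], rounds: int) -> Iterator[dict]:
    """Yield the sync responses that arrive after the initial sync.

    fetch(since) stands in for GET $base/sync: it takes the previous
    next_batch token (None for the initial sync) and returns parsed JSON.
    """
    # Initial sync: keep only the token and discard the backlog, since we
    # don't care about what happened while the bot was not running.
    since = fetch(None)["next_batch"]
    for _ in range(rounds):
        response = fetch(since)
        since = response["next_batch"]  # where the next request will start
        yield response


# Fake server for demonstration: hands out a fresh token on every call.
_counter = [0]


def fake_fetch(since: Optional[str]) -> dict:
    _counter[0] += 1
    return {"next_batch": "token-%d" % _counter[0], "rooms": {"join": {}}}


for response in follow_sync(fake_fetch, rounds=2):
    print(response["next_batch"])
```

A real bot would loop forever instead of a fixed number of rounds; the rounds parameter only keeps the example finite.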

Make sure your bot has joined at least one room, then try out the following:

$ curl -H "Authorization: Bearer $token" $base/sync | jq
# probably very long output

At the end of the output you should see the next_batch token. Now let’s try putting it in the request:

$ next_batch=$(curl -H "Authorization: Bearer $token" $base/sync | jq -r .next_batch)
$ curl -H "Authorization: Bearer $token" "$base/sync?since=$next_batch" | jq

{
  "account_data": {
    "events": []
  },
  "to_device": {
    "events": []
  },
  "device_lists": {
    "changed": [],
    "left": []
  },
  "presence": {
    "events": []
  },
  "rooms": {
    "join": {},
    "invite": {},
    "leave": {}
  },
  "groups": {
    "join": {},
    "invite": {},
    "leave": {}
  },
  "device_one_time_keys_count": {},
  "org.matrix.msc2732.device_unused_fallback_key_types": [],
  "next_batch": "s152848_7628336_4261_148566_26667_43_66472_186015_5"
}

As you can see, I provided the token using the since query parameter. This time the output is much shorter: in this case, nothing happened between the two syncs, so the response object is mostly empty. This gives us a chance to familiarize ourselves with its structure: what we’re interested in is the .rooms.join object. Try writing something in a room the bot’s in and syncing again:

$ curl -H "Authorization: Bearer $token" "$base/sync?since=$next_batch" | jq .rooms.join
{
  "!PXeSeufpLzIQnfleAn:alsd.eu": {
    "timeline": {
      "events": [
        {
          "type": "m.room.message",
          "sender": "@dalz:alsd.eu",
          "content": {
            "msgtype": "m.text",
            "body": "hello there"
          },
          "origin_server_ts": 1611324778904,
          "unsigned": {
            "age": 38842
          },
          "event_id": "$rwoCYM9CitktykunRqT_v2ta8aenebgOM-aHD20EKZ0"
        }
      ],
      "prev_batch": "s152848_7628433_4263_148573_26671_43_66472_186015_5",
      "limited": false
    },
    "state": {
      "events": []
    },
    "account_data": {
      "events": []
    },
    "ephemeral": {
      "events": [
        {
          "type": "m.typing",
          "content": {
            "user_ids": []
          }
        }
      ]
    },
    "unread_notifications": {
      "notification_count": 1,
      "highlight_count": 0
    },
    "summary": {},
    "org.matrix.msc2654.unread_count": 1
  }
}

Here I used jq to keep only the interesting part. .rooms.join is an object that maps room identifiers to updates on the room’s content: most importantly, a list of events sent to the room since the last sync. All events of type m.room.message must have a textual .content.body, which we’ll use later to make our bot react to incoming messages.
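To make the structure concrete, here is a sketch that walks a parsed sync response and pulls out the text of every message. The function name text_messages is made up, the sample is a trimmed copy of the response above, and skipping events that lack a body is a defensive choice of this sketch.

```python
from typing import Iterator, Tuple


def text_messages(sync_response: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (room_id, sender, body) for each m.room.message in joined rooms."""
    joined = sync_response.get("rooms", {}).get("join", {})
    for room_id, room in joined.items():
        for event in room.get("timeline", {}).get("events", []):
            if event.get("type") != "m.room.message":
                continue
            body = event.get("content", {}).get("body")
            if body is None:
                continue  # defensive: ignore message events without a body
            yield room_id, event["sender"], body


# Trimmed version of the response shown above.
sample = {
    "rooms": {
        "join": {
            "!PXeSeufpLzIQnfleAn:alsd.eu": {
                "timeline": {
                    "events": [
                        {
                            "type": "m.room.message",
                            "sender": "@dalz:alsd.eu",
                            "content": {"msgtype": "m.text", "body": "hello there"},
                        }
                    ]
                },
                "ephemeral": {"events": [{"type": "m.typing", "content": {"user_ids": []}}]},
            }
        }
    }
}

for room_id, sender, body in text_messages(sample):
    print(room_id, sender, body)
```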

Lastly, this endpoint supports long polling: you can specify a timeout in milliseconds as a query parameter (like $base/sync?since=$next_batch&timeout=30000) so that the server will wait for up to the specified interval if it has no new events to report.
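When building this URL in a program, it is safer to let a library encode the query string than to concatenate it by hand. A possible sketch with Python's urllib.parse, where sync_url is a name invented for the example:

```python
from typing import Optional
from urllib.parse import urlencode


def sync_url(base: str, since: Optional[str] = None,
             timeout_ms: Optional[int] = None) -> str:
    """Build the /sync URL, adding only the query parameters that were given."""
    params = {}
    if since is not None:
        params["since"] = since
    if timeout_ms is not None:
        params["timeout"] = timeout_ms
    query = urlencode(params)
    return base + "/sync" + ("?" + query if query else "")


# Initial sync (no parameters), then a long-polling sync.
print(sync_url("https://example.org/_matrix/client/r0"))
print(sync_url("https://example.org/_matrix/client/r0", since="s1_2_3", timeout_ms=30000))
```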

Sending messages

Let’s see some action now: we’ll send a message using the PUT endpoint $base/rooms/{roomId}/send/{eventType}/{txnId}. First you need to find out the id of the room the message will be sent to: you can copy it from the .rooms.join object we retrieved earlier, or look it up from Element (room settings > advanced > internal room id).

$ room='!PXeSeufpLzIQnfleAn:alsd.eu'
$ curl -H "Authorization: Bearer $token" "$base/rooms/$room/send/m.room.message/0" -X PUT -d @- <<END
> {
>   "msgtype": "m.text",
>   "body": "Hello, world!"
> }
> END

You should now see the message in the chat. A couple of things to note:

the 0 at the end of the URL is a transaction id that should be unique as long as you reuse the same access token. We’ll take the easiest approach and use a monotonically increasing integer;
what we’re sending is an m.room.message event (docs here) of type m.text, which also supports a formatted_body (more on this later).
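The two points above can be sketched together in Python: a counter supplies the transaction ids, and the message is a small JSON object. MessageBuilder is a hypothetical name. The room id is percent-encoded because it contains characters such as ! and : that are best escaped in a URL path.

```python
import itertools
import json
from typing import Tuple
from urllib.parse import quote


class MessageBuilder:
    """Prepare PUT requests for $base/rooms/{roomId}/send/m.room.message/{txnId}."""

    def __init__(self, base: str) -> None:
        self.base = base
        self._txn_ids = itertools.count()  # 0, 1, 2, ...

    def text(self, room_id: str, body: str) -> Tuple[str, bytes]:
        """Return (url, json_payload) for a plain-text message."""
        txn_id = next(self._txn_ids)
        url = "%s/rooms/%s/send/m.room.message/%d" % (
            self.base, quote(room_id, safe=""), txn_id)
        payload = json.dumps({"msgtype": "m.text", "body": body}).encode("utf-8")
        return url, payload


builder = MessageBuilder("https://example.org/_matrix/client/r0")
url, payload = builder.text("!PXeSeufpLzIQnfleAn:alsd.eu", "Hello, world!")
print(url)
print(payload.decode("utf-8"))
```

Note that a counter starting at 0 hands out the same ids again after a restart, so this simple scheme fits best when the program obtains a fresh token each time it starts.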

Aaand we’re done curling and jqing! Head over to Part 2 to put all this into practice.
