Previously, the problem scenario would trip one of the assertions because
we expect `to_token` <= `membership_snapshot_token` or vice versa, but the
tokens can be intertwined so that neither is ahead of the other.
Especially since the `instance_map` in `membership_snapshot_token` is made up
from the `stream_ordering` of membership events at various stream positions
and processed on different instances (not current stream positions).
We get into trouble when stream positions are lagging between workers and our
now/`to_token` doesn't cleanly compare to `membership_snapshot_token`.
What we really want to assert is that the `to_token` <= the stream positions
at the time we asked for the room membership snapshot. Since
`get_rooms_for_local_user_where_membership_is()` doesn't return that
information, the closest we can get is to get the stream positions before we
ask for the room membership snapshot and consider that good enough to compare
against.
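To illustrate how two tokens can be "intertwined", here is a minimal sketch. `SimpleToken` is a hypothetical, simplified stand-in for a multi-writer stream token (it is not Synapse's actual `RoomStreamToken`), and the worker names and positions are made up.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SimpleToken:
    """Hypothetical simplified multi-writer token (not Synapse's real class)."""

    # Position that every writer is assumed to have reached.
    stream: int
    # Per-writer positions for writers that are ahead of `stream`.
    instance_map: Dict[str, int] = field(default_factory=dict)

    def get_pos(self, instance: str) -> int:
        # Writers missing from the map are assumed to be at `stream`.
        return self.instance_map.get(instance, self.stream)

    def is_before_or_eq(self, other: "SimpleToken") -> bool:
        if self.stream > other.stream:
            return False
        instances = set(self.instance_map) | set(other.instance_map)
        return all(self.get_pos(i) <= other.get_pos(i) for i in instances)


# `to_token` is ahead on worker1, the snapshot token is ahead on worker2.
to_token = SimpleToken(stream=10, instance_map={"worker1": 12})
membership_snapshot_token = SimpleToken(stream=10, instance_map={"worker2": 11})

neither_ahead = not to_token.is_before_or_eq(
    membership_snapshot_token
) and not membership_snapshot_token.is_before_or_eq(to_token)

# Stream positions captured *before* asking for the membership snapshot
# give us something `to_token` can be cleanly compared against.
positions_before_snapshot = SimpleToken(stream=12, instance_map={"worker1": 13})
comparable = to_token.is_before_or_eq(positions_before_snapshot)
```

In this sketch neither token is before-or-equal to the other, while the comparison against the positions captured up front succeeds.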
The new implementation catches the problem with an assert
but I think it's possible to make it work as well.
```
SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_sliding_sync.GetSyncRoomIdsForUserEventShardTestCase
```
Use fully-qualified `PersistedEventPosition` (`instance_name` and `stream_ordering`) when returning `RoomsForUser` to facilitate proper comparisons and `RoomStreamToken` generation.
Spawned from https://github.com/element-hq/synapse/pull/17187, where we want to use this change.
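As a rough illustration of why the `instance_name` matters, the sketch below compares an event position against a token with and without it. `EventPosition` and `persisted_after` are hypothetical stand-ins written for this example, not the actual Synapse definitions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class EventPosition:
    """Hypothetical stand-in for a fully-qualified event position."""

    instance_name: str
    stream_ordering: int


def persisted_after(
    pos: EventPosition, token_stream: int, token_instance_map: Dict[str, int]
) -> bool:
    """Return True if the event was persisted after the token's position
    for the writer that persisted it."""
    writer_pos = token_instance_map.get(pos.instance_name, token_stream)
    return pos.stream_ordering > writer_pos


# Token: every writer has reached 10, and worker2 has reached 15.
token_stream = 10
token_instance_map = {"worker2": 15}

event = EventPosition(instance_name="worker2", stream_ordering=12)

# A bare `stream_ordering` comparison wrongly says the event is newer
# than the token...
naive_after = event.stream_ordering > token_stream

# ...whereas the fully-qualified position shows the token already covers it.
qualified_after = persisted_after(event, token_stream, token_instance_map)
```

Here the naive comparison reports the event as newer than the token, while the comparison that takes the writer into account does not.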
Before:
```
$ SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.replication.storage.test_events
[...]
Traceback (most recent call last):
File "pypoetry/virtualenvs/matrix-synapse-xCtC9ulO-py3.12/lib/python3.12/site-packages/twisted/trial/runner.py", line 711, in loadByName
return self.suiteFactory([self.findByName(name, recurse=recurse)])
File "pypoetry/virtualenvs/matrix-synapse-xCtC9ulO-py3.12/lib/python3.12/site-packages/twisted/trial/runner.py", line 474, in findByName
obj = reflect.namedModule(searchName)
File "pypoetry/virtualenvs/matrix-synapse-xCtC9ulO-py3.12/lib/python3.12/site-packages/twisted/python/reflect.py", line 156, in namedModule
topLevel = __import__(name)
File "synapse/tests/replication/storage/test_events.py", line 33, in <module>
from synapse.handlers.room import RoomEventSource
File "synapse/synapse/handlers/room.py", line 74, in <module>
from synapse.rest.admin._base import assert_user_is_admin
File "synapse/synapse/rest/__init__.py", line 24, in <module>
from synapse.rest import admin
File "synapse/synapse/rest/admin/__init__.py", line 41, in <module>
from synapse.handlers.pagination import PURGE_HISTORY_ACTION_NAME
File "synapse/synapse/handlers/pagination.py", line 30, in <module>
from synapse.handlers.room import ShutdownRoomParams, ShutdownRoomResponse
builtins.ImportError: cannot import name 'ShutdownRoomParams' from partially initialized module 'synapse.handlers.room' (most likely due to a circular import) (synapse/synapse/handlers/room.py)
```
Otherwise things will get confused.
An alternative would be to make sure that for a lagging stream we don't
return anything (and make sure the returned next_batch token doesn't go
backwards). But that is a faff.
We try and deduplicate in two places: 1) really early on, and 2) just
before we persist the event. The first case was broken due to it
occurring before the profile information was added, and so it thought the
event contents were different.
The second case did catch it and handle it correctly, however doing so
creates a redundant state group leading to bloat.
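The ordering problem in the first check can be sketched as follows. This assumes a membership-style event whose content gains `displayname` and `avatar_url` from the user's profile; the helper names and values are hypothetical and only illustrate why comparing before the profile is filled in misses the duplicate.

```python
from typing import Dict, Optional


def add_profile(content: Dict[str, str], profile: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `content` with missing profile fields filled in."""
    merged = dict(content)
    for key in ("displayname", "avatar_url"):
        if key not in merged and key in profile:
            merged[key] = profile[key]
    return merged


def is_duplicate(
    new_content: Dict[str, str], existing_content: Optional[Dict[str, str]]
) -> bool:
    """Treat the new event as a duplicate if its content matches the existing one."""
    return existing_content is not None and new_content == existing_content


profile = {"displayname": "Alice", "avatar_url": "mxc://example.org/abc"}
existing = {"membership": "join", **profile}
requested = {"membership": "join"}

# Checking before the profile is added: the contents look different.
early_check = is_duplicate(requested, existing)

# Checking after the profile is added: the duplicate is detected.
late_check = is_duplicate(add_profile(requested, profile), existing)
```

Only the check that runs after the profile fields are merged recognises the event as a duplicate.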
Fixes #3791
Fixes up #17239
We need to keep the spam check within the `try/except` block. Also makes
it so that we don't enter the top span twice.
Also ensures that we get the right thumbnail length.
There is a problem with `StreamIdGenerator` where it can go backwards
over restarts when a stream ID is requested but then not inserted into
the DB. This is problematic if we want to land #17215, and is generally
a potential cause for all sorts of nastiness.
Instead of trying to fix `StreamIdGenerator`, we may as well move to
`MultiWriterIdGenerator` that does not suffer from this problem (the
latest positions are stored in the `stream_positions` table). This involves
adding SQLite support to the class.
This only changes id generators that were already using
`MultiWriterIdGenerator` under Postgres; a separate PR will move the
rest of the uses of `StreamIdGenerator` over.
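The "going backwards over restarts" problem can be shown with a small sketch. The two classes are hypothetical toy models, not the real `StreamIdGenerator` or `MultiWriterIdGenerator`, and the `events` table and the column layout of `stream_positions` are assumptions made for the example.

```python
import sqlite3


class MaxBasedIdGenerator:
    """Toy model: on startup, derive the position from the rows in the table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        row = conn.execute(
            "SELECT COALESCE(MAX(stream_id), 0) FROM events"
        ).fetchone()
        self.current = row[0]

    def get_next(self) -> int:
        self.current += 1
        return self.current


class PersistedPositionIdGenerator:
    """Toy model: record every allocated position in a positions table."""

    def __init__(self, conn: sqlite3.Connection, stream_name: str) -> None:
        self.conn = conn
        self.stream_name = stream_name
        row = conn.execute(
            "SELECT position FROM stream_positions WHERE stream_name = ?",
            (stream_name,),
        ).fetchone()
        self.current = row[0] if row else 0

    def get_next(self) -> int:
        self.current += 1
        self.conn.execute(
            "INSERT OR REPLACE INTO stream_positions (stream_name, position) "
            "VALUES (?, ?)",
            (self.stream_name, self.current),
        )
        self.conn.commit()
        return self.current


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (stream_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE stream_positions ("
    "stream_name TEXT PRIMARY KEY, position INTEGER NOT NULL)"
)

# Allocate two IDs, but only insert a row for the first one.
gen = MaxBasedIdGenerator(conn)
first = gen.get_next()
conn.execute("INSERT INTO events (stream_id) VALUES (?)", (first,))
conn.commit()
lost = gen.get_next()  # requested, but never inserted into the DB

# After a "restart" the position is behind the ID that was handed out.
restarted = MaxBasedIdGenerator(conn)
went_backwards = restarted.current < lost

# With persisted positions, a restart resumes from the last allocated ID.
gen2 = PersistedPositionIdGenerator(conn, "events")
gen2.get_next()
last_allocated = gen2.get_next()
restarted2 = PersistedPositionIdGenerator(conn, "events")
resumed_ok = restarted2.current == last_allocated
```

The first generator forgets the ID that was handed out but never inserted, so its position regresses after a restart; the second resumes from the stored position.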