Inline URL preview documentation. (#13261)

Inline URL preview documentation near the implementation.
This commit is contained in:
Patrick Cloke 2022-07-12 15:01:58 -04:00 committed by GitHub
parent a366b75b72
commit 1381563988
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
7 changed files with 61 additions and 73 deletions

View file

@ -1,2 +1 @@
Add a link to configuration instructions in the URL preview documentation.
Move the documentation for how URL previews work to the URL preview module.

1
changelog.d/13261.doc Normal file
View file

@ -0,0 +1 @@
Move the documentation for how URL previews work to the URL preview module.

View file

@ -35,7 +35,6 @@
- [Application Services](application_services.md)
- [Server Notices](server_notices.md)
- [Consent Tracking](consent_tracking.md)
- [URL Previews](development/url_previews.md)
- [User Directory](user_directory.md)
- [Message Retention Policies](message_retention_policies.md)
- [Pluggable Modules](modules/index.md)

View file

@ -544,7 +544,7 @@ Gets a list of all local media that a specific `user_id` has created.
These are media that the user has uploaded themselves
([local media](../media_repository.md#local-media)), as well as
[URL preview images](../media_repository.md#url-previews) requested by the user if the
[feature is enabled](../development/url_previews.md).
[feature is enabled](../usage/configuration/config_documentation.md#url_preview_enabled).
By default, the response is ordered by descending creation date and ascending media ID.
The newest media is on top. You can change the order with parameters

View file

@ -1,62 +0,0 @@
URL Previews
============
For information on how to enable URL previews in synapse, please see the [config manual](../usage/configuration/config_documentation.md#url_preview_enabled).
The `GET /_matrix/media/r0/preview_url` endpoint provides a generic preview API
for URLs which outputs [Open Graph](https://ogp.me/) responses (with some Matrix
specific additions).
This does have trade-offs compared to other designs:
* Pros:
* Simple and flexible; can be used by any clients at any point
* Cons:
* If each homeserver provides one of these independently, all the HSes in a
room may needlessly DoS the target URI
* The URL metadata must be stored somewhere, rather than just using Matrix
itself to store the media.
* Matrix cannot be used to distribute the metadata between homeservers.
When Synapse is asked to preview a URL it does the following:
1. Checks against a URL blacklist (defined as `url_preview_url_blacklist` in the
config).
2. Checks the in-memory cache by URLs and returns the result if it exists. (This
is also used to de-duplicate processing of multiple in-flight requests at once.)
3. Kicks off a background process to generate a preview:
1. Checks the database cache by URL and timestamp and returns the result if it
has not expired and was successful (a 2xx return code).
2. Checks if the URL matches an [oEmbed](https://oembed.com/) pattern. If it
does, update the URL to download.
3. Downloads the URL and stores it into a file via the media storage provider
and saves the local media metadata.
4. If the media is an image:
1. Generates thumbnails.
2. Generates an Open Graph response based on image properties.
5. If the media is HTML:
1. Decodes the HTML via the stored file.
2. Generates an Open Graph response from the HTML.
3. If a JSON oEmbed URL was found in the HTML via autodiscovery:
1. Downloads the URL and stores it into a file via the media storage provider
and saves the local media metadata.
2. Convert the oEmbed response to an Open Graph response.
3. Override any Open Graph data from the HTML with data from oEmbed.
4. If an image exists in the Open Graph response:
1. Downloads the URL and stores it into a file via the media storage
provider and saves the local media metadata.
2. Generates thumbnails.
3. Updates the Open Graph response based on image properties.
6. If the media is JSON and an oEmbed URL was found:
1. Convert the oEmbed response to an Open Graph response.
2. If a thumbnail or image is in the oEmbed response:
1. Downloads the URL and stores it into a file via the media storage
provider and saves the local media metadata.
2. Generates thumbnails.
3. Updates the Open Graph response based on image properties.
7. Stores the result in the database cache.
4. Returns the result.
The in-memory cache expires after 1 hour.
Expired entries in the database cache (and their associated media files) are
deleted every 10 seconds. The default expiration time is 1 hour from download.

View file

@ -7,8 +7,7 @@ The media repository
users.
* caches avatars, attachments and their thumbnails for media uploaded by remote
users.
* caches resources and thumbnails used for
[URL previews](development/url_previews.md).
* caches resources and thumbnails used for URL previews.
All media in Matrix can be identified by a unique
[MXC URI](https://spec.matrix.org/latest/client-server-api/#matrix-content-mxc-uris),
@ -59,8 +58,6 @@ remote_thumbnail/matrix.org/aa/bb/cccccccccccccccccccc/128-96-image-jpeg
Note that `remote_thumbnail/` does not have an `s`.
## URL Previews
See [URL Previews](development/url_previews.md) for documentation on the URL preview
process.
When generating previews for URLs, Synapse may download and cache various
resources, including images. These resources are assigned temporary media IDs

View file

@ -109,10 +109,64 @@ class MediaInfo:
class PreviewUrlResource(DirectServeJsonResource):
"""
Generating URL previews is a complicated task which many potential pitfalls.
The `GET /_matrix/media/r0/preview_url` endpoint provides a generic preview API
for URLs which outputs Open Graph (https://ogp.me/) responses (with some Matrix
specific additions).
See docs/development/url_previews.md for discussion of the design and
algorithm followed in this module.
This does have trade-offs compared to other designs:
* Pros:
* Simple and flexible; can be used by any clients at any point
* Cons:
* If each homeserver provides one of these independently, all the homeservers in a
room may needlessly DoS the target URI
* The URL metadata must be stored somewhere, rather than just using Matrix
itself to store the media.
* Matrix cannot be used to distribute the metadata between homeservers.
When Synapse is asked to preview a URL it does the following:
1. Checks against a URL blacklist (defined as `url_preview_url_blacklist` in the
config).
2. Checks the URL against an in-memory cache and returns the result if it exists. (This
is also used to de-duplicate processing of multiple in-flight requests at once.)
3. Kicks off a background process to generate a preview:
1. Checks URL and timestamp against the database cache and returns the result if it
has not expired and was successful (a 2xx return code).
2. Checks if the URL matches an oEmbed (https://oembed.com/) pattern. If it
does, update the URL to download.
3. Downloads the URL and stores it into a file via the media storage provider
and saves the local media metadata.
4. If the media is an image:
1. Generates thumbnails.
2. Generates an Open Graph response based on image properties.
5. If the media is HTML:
1. Decodes the HTML via the stored file.
2. Generates an Open Graph response from the HTML.
3. If a JSON oEmbed URL was found in the HTML via autodiscovery:
1. Downloads the URL and stores it into a file via the media storage provider
and saves the local media metadata.
2. Convert the oEmbed response to an Open Graph response.
3. Override any Open Graph data from the HTML with data from oEmbed.
4. If an image exists in the Open Graph response:
1. Downloads the URL and stores it into a file via the media storage
provider and saves the local media metadata.
2. Generates thumbnails.
3. Updates the Open Graph response based on image properties.
6. If the media is JSON and an oEmbed URL was found:
1. Convert the oEmbed response to an Open Graph response.
2. If a thumbnail or image is in the oEmbed response:
1. Downloads the URL and stores it into a file via the media storage
provider and saves the local media metadata.
2. Generates thumbnails.
3. Updates the Open Graph response based on image properties.
7. Stores the result in the database cache.
4. Returns the result.
The in-memory cache expires after 1 hour.
Expired entries in the database cache (and their associated media files) are
deleted every 10 seconds. The default expiration time is 1 hour from download.
"""
isLeaf = True