Merge branch 'jamie-unsorted-changes' of github.com:spacedriveapp/spacedrive into jamie-unsorted-changes

This commit is contained in:
maxichrome 2022-05-22 22:59:00 -05:00
commit 198f52af20
No known key found for this signature in database
GPG key ID: DDC459310E98B6AB
4 changed files with 40 additions and 36 deletions


@ -19,12 +19,19 @@ model Migration {
}
model SyncEvent {
id Int @id @default(autoincrement())
id Int @id @default(autoincrement())
node_id Int
timestamp String
data String
node Node @relation(fields: [node_id], references: [id])
// individual record pub id OR compound many-to-many pub ids
record_id String
// the type of operation, i.e. CREATE, UPDATE or DELETE, stored as an enum
type Int
// the column name for atomic update operations
column String?
// the new value for create/update operations, msgpack encoded
value String
node Node @relation(fields: [node_id], references: [id])
@@map("sync_events")
}


@ -0,0 +1,13 @@
## Database backup
## Database migrations
Currently migrations are applied on app launch with no visual feedback, backup or error handling.
It doesn't appear that migrations are applied successfully.
##


@ -5,20 +5,8 @@ Synchronizing data between clients in a Spacedrive network is accomplished using
Designed for synchronizing data in realtime between [SQLite](https://www.sqlite.org/) databases potentially in the gigabytes.
```rust
mod sync {
struct SyncEngine {
pending: Vec<SyncEvent>, // events waiting to be sent
}
struct SyncEvent {
client_uuid: String, // client that created change
timestamp: uhlc::Timestamp, // unique hybrid logical clock timestamp
resource: SyncResource, // the CRDT resource
transport: SyncTransport, // method of data transport (msg or binary)
}
// we can now impl specific CRDT traits for given resources
enum SyncResource {
// we can now impl specific CRDT traits for given resources
enum SyncResource {
FilePath(dyn Replicate),
File(dyn PropertyOperation),
Tag(dyn PropertyOperation),
@ -46,32 +34,32 @@ Data is divided into several kinds, Shared and Owned.
\*_Shared data doesn't always use this method, in some cases we can create shared resources in bulk, where conflicts are handled by simply merging. More on that in [Synchronization Strategy]()_.
## Client Pool
## Node Pool
The client pool maintains a record of all clients in your network.
The node pool maintains a record of all nodes in your network.
An exact replica of the client pool is synchronized on each client. When a given client has a state change, it will notify every other client in the pool via the `connection` struct.
The `ClientConnection` is maintained in memory and is established on startup.
```rust
struct ClientPool {
struct NodePool {
clients: Vec<Client>
}
struct Client {
struct Node {
uuid: String,
last_seen: DateTime<Utc>,
last_synchronized: DateTime<Utc>,
connection: Option<ClientConnection>
connection: Option<NodeConnection>
}
```
Clients will ping-pong to ensure their connection stays alive; this logic is contained within the `ClientConnection` instance.
Nodes will ping-pong to ensure their connection stays alive; this logic is contained within the `NodeConnection` instance.
**Handling stale clients**
**Handling stale nodes**
If a client has not been seen in X amount of time, other clients will not persist pending operations for them. Clients take care of flushing the pending operation queue once all non-stale clients have received the pending operations.
If a node has not been seen in X amount of time, other nodes will not persist pending operations for them. Nodes take care of flushing the pending operation queue once all non-stale nodes have received the pending operations.
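The stale-node rule can be sketched roughly as follows. This is a minimal illustration, not the actual Spacedrive implementation: `PendingOperation`, `acked_by`, `is_stale`, `flush_pending` and the `stale_after` threshold are hypothetical names, and `std::time::Instant` stands in for the `DateTime<Utc>` fields shown above so that the example needs only the standard library.

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

// Hypothetical, simplified stand-in for the `Node` struct above.
struct Node {
    uuid: String,
    last_seen: Instant,
}

// Hypothetical pending operation that tracks which nodes have received it.
struct PendingOperation {
    id: u64,
    acked_by: HashSet<String>,
}

/// A node counts as stale once it has not been seen for longer than `stale_after`
/// (the "X amount of time" in the text; the concrete value is an assumption).
fn is_stale(node: &Node, now: Instant, stale_after: Duration) -> bool {
    now.saturating_duration_since(node.last_seen) > stale_after
}

/// Removes every pending operation that all non-stale nodes have acknowledged.
/// Stale nodes are ignored, so they never keep an operation in the queue.
fn flush_pending(
    pending: &mut Vec<PendingOperation>,
    nodes: &[Node],
    now: Instant,
    stale_after: Duration,
) {
    let live: Vec<&str> = nodes
        .iter()
        .filter(|node| !is_stale(node, now, stale_after))
        .map(|node| node.uuid.as_str())
        .collect();

    // Keep an operation only while at least one live node has not acknowledged it.
    pending.retain(|op| !live.iter().all(|uuid| op.acked_by.contains(*uuid)));
}
```

In this sketch, an operation acknowledged by every live node is dropped even if a stale node never received it, so a stale node would need a full resynchronization when it returns. Whether that matches the intended recovery path is not stated in the text.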
## Clock
@ -93,7 +81,7 @@ This allows us to entirely avoid the need to synchronize time between clients, a
Sync happens in the following order:
Owned data → Bulk shared data → Shared data → Relational data
Owned data → Bulk shared data → Shared data
### Types of CRDT:


@ -9,38 +9,34 @@ Represents a unique file across the virtual filesystem, all Spacedrive metadata
```rust
struct File {
id: i32,
partial_checksum: str,
checksum: Option<str>,
cas_id: str,
integrity_checksum: Option<str>,
kind: FileKind,
hidden: bool,
favorite: bool,
has_thumbnail: bool,
has_thumbstrip: bool,
has_video_preview: bool,
encryption: EncryptionAlgorithm,
key: Key,
ipfs_id: Option<str>,
file_paths: Vec<FilePath>,
paths: Vec<FilePath>,
tags: Vec<Tag>,
labels: Vec<Label>,
comments: Vec<Comment>,
albums: Vec<Album>,
media_data: Option<MediaData>,
date_created: DateTime<Utc>,
date_modified: DateTime<Utc>,
}
```
- `partial_checksum ` - A SHA256 checksum generated from 5 samples of 10,000 bytes throughout the file data, including the beginning and end + total byte count. This is used to identify a file as _likely_ unique in under 100µs.
- `cas_id ` - A SHA256 checksum generated from 5 samples of 10,000 bytes throughout the file data, including the beginning and end + total byte count. This is used to identify a file as _likely_ unique in under 100µs.
> ~~It is impossible to have a unique constraint at a database level for the `partial_checksum` however we can asynchronously resolve conflicts by querying for duplicates and generating full checksums at a later date.~~
>
> For synchronization of this resource we can tolerate temporary duplicates: any client can determine that two file resources are duplicates and merge them into a single resource. This in turn triggers a shared data merge operation, whereby the older record is prioritised at a property level during the merge.
- `checksum` - A full SHA256 checksum of the file data used to verify uniqueness should a `partial_checksum` conflict occur.
- `integrity_checksum` - A full SHA256 checksum of the file data used to verify uniqueness should a `cas_id` conflict occur.
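As a rough illustration of the sampling described for `cas_id`, the sketch below collects the bytes that would be fed to SHA256: the total byte count followed by 5 samples of 10,000 bytes, with the first sample at the start of the file and the last at the end. The even spacing of the middle samples, the little-endian size prefix, the whole-file fallback for small files, and the names `sample_ranges` and `cas_input` are all assumptions made for this example. The hashing step is left out so the example stays standard-library only; in practice the returned buffer would be passed to a SHA256 implementation.

```rust
use std::io::{self, Read, Seek, SeekFrom};

const SAMPLE_COUNT: u64 = 5;
const SAMPLE_SIZE: u64 = 10_000;

/// Returns `(offset, length)` pairs to read from a file of `len` bytes.
/// Files too small to hold five distinct samples are read whole (an assumption).
fn sample_ranges(len: u64) -> Vec<(u64, u64)> {
    if len <= SAMPLE_COUNT * SAMPLE_SIZE {
        return vec![(0, len)];
    }
    // The first sample starts at 0 and the last one ends exactly at the end of the file.
    let last_offset = len - SAMPLE_SIZE;
    (0..SAMPLE_COUNT)
        .map(|i| (last_offset * i / (SAMPLE_COUNT - 1), SAMPLE_SIZE))
        .collect()
}

/// Builds the hash input: the total byte count followed by the sampled bytes.
fn cas_input<R: Read + Seek>(source: &mut R) -> io::Result<Vec<u8>> {
    let len = source.seek(SeekFrom::End(0))?;
    let mut input = len.to_le_bytes().to_vec();
    for (offset, size) in sample_ranges(len) {
        let mut buf = vec![0u8; size as usize];
        source.seek(SeekFrom::Start(offset))?;
        source.read_exact(&mut buf)?;
        input.extend_from_slice(&buf);
    }
    Ok(input)
}
```

Because at most 50,008 bytes are read regardless of file size, the cost of producing the identifier stays roughly constant. The trade-off is that two large files differing only outside the sampled ranges produce the same input, which is why the text falls back to a full checksum when a conflict occurs.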
### FilePath — `Owned data`