Building a scalable document management system: Lessons from separating metadata and content



{
  "unique_document_id": "aGVsbG93b3JsZA==",
  "member_id": "123456",
  "file_name": "employment_contract.pdf",
  "document_category_id": 101,
  "document_subcategory_id": 10110,
  "document_extension": ".pdf",
  "document_size_in_bytes": 245678,
  "date_added": "2025-09-20T12:11:01Z",
  "date_updated": "2025-09-21T15:22:00Z",
  "created_by_user_id": "u-01",
  "updated_by_user_id": "u-02",
  "notes": "Signed by both parties"
}
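To show how a service might consume a record like the one above, here is a minimal Python sketch that loads it into a typed object. The DocumentMetadata class and the load_metadata and parse_utc helpers are hypothetical names. Treating notes as optional is also an assumption of this sketch, not something the original schema states.

```python
import json
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class DocumentMetadata:
    """Typed view of one metadata record; field names mirror the JSON example."""
    unique_document_id: str
    member_id: str
    file_name: str
    document_category_id: int
    document_subcategory_id: int
    document_extension: str
    document_size_in_bytes: int
    date_added: datetime
    date_updated: datetime
    created_by_user_id: str
    updated_by_user_id: str
    notes: Optional[str] = None  # assumed optional in this sketch


def parse_utc(value: str) -> datetime:
    # Before Python 3.11, fromisoformat() rejects a trailing "Z", so map it to +00:00.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def load_metadata(raw: str) -> DocumentMetadata:
    data = json.loads(raw)
    known = {f.name for f in fields(DocumentMetadata)}
    # Keep only attributes this reader knows about; unknown extras are ignored.
    kwargs = {k: v for k, v in data.items() if k in known}
    kwargs["date_added"] = parse_utc(data["date_added"])
    kwargs["date_updated"] = parse_utc(data["date_updated"])
    return DocumentMetadata(**kwargs)
```

Because unknown attributes are filtered out rather than rejected, an older reader keeps working when newer records carry extra fields.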

For query patterns, I relied heavily on secondary indexes. While the primary table uses the unique document ID as its key, a secondary index keyed on member ID and document category supports efficient queries such as "retrieve all documents of a given category for a given member" without expensive full-table scans.
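The access pattern can be illustrated with a small in-memory sketch. The MetadataStore class and its method names are hypothetical, and the two dictionaries only stand in for the primary table and the secondary index; a real NoSQL database would maintain the index itself. What the sketch shows is the shape of the lookup: the composite key (member_id, document_category_id) maps directly to matching document IDs, so the query never walks the whole table.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class MetadataStore:
    """Hypothetical in-memory stand-in for a metadata table plus one secondary index."""

    def __init__(self) -> None:
        # Primary table: unique_document_id -> metadata record.
        self._table: Dict[str, dict] = {}
        # Secondary index: (member_id, document_category_id) -> document IDs.
        self._by_member_category: Dict[Tuple[str, int], Set[str]] = defaultdict(set)

    def put(self, record: dict) -> None:
        doc_id = record["unique_document_id"]
        old = self._table.get(doc_id)
        if old is not None:
            # Drop the stale index entry in case the indexed attributes changed.
            old_key = (old["member_id"], old["document_category_id"])
            self._by_member_category[old_key].discard(doc_id)
        self._table[doc_id] = record
        new_key = (record["member_id"], record["document_category_id"])
        self._by_member_category[new_key].add(doc_id)

    def get(self, doc_id: str) -> dict:
        # Primary-key lookup by unique document ID.
        return self._table[doc_id]

    def query_by_member_and_category(self, member_id: str, category_id: int) -> List[dict]:
        # Index lookup: touches only the matching IDs, never the full table.
        ids = self._by_member_category.get((member_id, category_id), set())
        return [self._table[i] for i in sorted(ids)]
```

Keeping the index update inside put() mirrors the usual trade-off of secondary indexes: writes do a little extra work so that the common read path stays cheap.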

The schema-on-read model of NoSQL proved invaluable as the schema evolved. When we needed to add a new optional metadata field, there was no risky ALTER TABLE statement and no downtime. New documents simply started including the attribute, while existing documents continued to work without it. This flexibility let us respond to new requirements in hours instead of weeks.
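A minimal sketch of the schema-on-read pattern follows. It assumes a hypothetical optional attribute named retention_years with an illustrative default of 7; neither the field nor the default comes from the original system. Writers add the attribute only to new records, and readers supply the default when it is absent, so no migration step touches existing data.

```python
from typing import Any, Dict, Mapping

# Hypothetical optional attribute added after launch; name and default are illustrative.
RETENTION_FIELD = "retention_years"
DEFAULT_RETENTION_YEARS = 7


def new_document_record(base: Mapping[str, Any], retention_years: int) -> Dict[str, Any]:
    # New writes simply include the attribute; existing records are left untouched.
    record = dict(base)
    record[RETENTION_FIELD] = retention_years
    return record


def read_retention_years(record: Mapping[str, Any]) -> int:
    # Schema-on-read: records that predate the field fall back to the default.
    return int(record.get(RETENTION_FIELD, DEFAULT_RETENTION_YEARS))
```

The cost of this flexibility is that the default lives in application code, so every reader has to agree on it.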

Building in disaster recovery and data resiliency

A comprehensive disaster recovery strategy was essential for business continuity. I built resiliency into both the metadata layer and the content layer.