Paperless-ngx Database Lock Contention: A Deep Dive into Performance Bottlenecks

Hey guys! We're diving deep into a tricky issue affecting Paperless-ngx users: database lock contention during bulk edits, especially when you're updating custom fields. This can lead to upload timeouts and general sluggishness. Let's break down the problem, the root causes, and how we can potentially fix it. Buckle up, it's gonna be a technical ride!

The Problem: Database Locks and Slowdowns

First off, what's the deal with database lock contention? Simply put, it's when multiple processes try to modify the same data in your database at the same time, creating a traffic jam. In Paperless-ngx, this happens primarily during bulk updates to custom fields, a common operation when you're updating metadata on many documents at once. When these processes collide, they produce Lock:tuple and Lock:transactionid waits in PostgreSQL, which slow everything down dramatically. The worst part? It can make your Paperless-ngx instance unusable: uploads time out, API calls fail, and productivity grinds to a halt. And because the problem lives in the application code, it affects every Paperless-ngx installation, not just specific setups.

The Symptoms: What You'll See

What does this look like in the real world? Here's what we've observed:

  • Upload Failures: You try to upload a document, and BAM! Timeout errors. The database can't keep up.
  • Lock Wait Times: Queries get stuck, waiting for locks to be released. We're talking 5-30 seconds of waiting.
  • Database CPU Spikes: Your database server starts chugging, maxing out its resources.
  • Increased Lock Acquisitions: The database is constantly acquiring and releasing locks, as high as 400 times a minute during peak hours.

Diving into the Details: Database Evidence

Let's get a little technical and look at the evidence. We've been using PostgreSQL Performance Insights to pinpoint the problem. Here's what we've found:

1. Lock:tuple on documents_customfieldinstance

This lock occurs on the documents_customfieldinstance table, specifically when updating or creating custom field instances. The SQL query involved looks something like this:

SELECT "documents_customfieldinstance".*
FROM "documents_customfieldinstance"
WHERE "document_id" = ? AND "field_id" = ?
FOR UPDATE

Essentially, the database locks the specific row in documents_customfieldinstance identified by document_id and field_id. That's a row-level lock, so in theory only that row is affected. In practice, though, when several processes bulk-edit overlapping sets of documents at the same time, they end up requesting locks on the same rows, and whoever arrives second has to wait. That waiting is the contention.
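
Where does that SELECT ... FOR UPDATE come from? It's Django itself, not Paperless-ngx code: update_or_create opens a transaction and locks the row it looks up. Conceptually, it behaves like this simplified sketch of Django's implementation (the helper name is ours, the behavior is Django's):

from django.db import transaction

from documents.models import CustomFieldInstance


def update_or_create_sketch(document_id, field_id, defaults):
    with transaction.atomic():
        # The lookup runs as SELECT ... FOR UPDATE, taking a row-level
        # lock that is held until the transaction commits.
        obj, created = (
            CustomFieldInstance.objects.select_for_update().get_or_create(
                document_id=document_id,
                field_id=field_id,
                defaults=defaults,
            )
        )
        if not created:
            for key, value in defaults.items():
                setattr(obj, key, value)
            obj.save()  # a second query, issued while the lock is held
    return obj, created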

2. Lock:transactionid on documents_document

This lock targets the documents_document table, specifically when updating the modified timestamp. The SQL query looks like this:

UPDATE "documents_document"
SET "modified" = ?::timestamptz
WHERE "id" = ?

Each time a document's custom fields are updated, its modified timestamp is updated as well, which takes a row lock on documents_document. Combined with the custom field instance updates, this creates a second bottleneck: any other transaction touching the same document row has to wait for the first one to commit. That's the cascade of locking that ties up the documents table and makes the system feel unusable.
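
Why does this surface as Lock:transactionid rather than a plain row lock? In PostgreSQL, a transaction that tries to update a row another open transaction has already updated doesn't wait on the row itself; it waits on the other transaction's ID until that transaction commits or rolls back. A minimal illustration with two database sessions and a hypothetical document id 42:

-- Session A
BEGIN;
UPDATE "documents_document" SET "modified" = now() WHERE "id" = 42;
-- ...transaction stays open while more custom fields are processed...

-- Session B, running at the same time
BEGIN;
UPDATE "documents_document" SET "modified" = now() WHERE "id" = 42;
-- blocks here: pg_stat_activity shows wait_event_type = Lock,
-- wait_event = transactionid, until Session A commits or rolls back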

The Root Cause: Why Is This Happening?

So, what's causing all this chaos? Two main culprits:

1. Nested Loops with Individual Database Calls (The O(n×m) Problem)

Let's look at the modify_custom_fields function in src/documents/bulk_edit.py. The code uses a nested loop with update_or_create calls, which is highly inefficient:

def modify_custom_fields(
    doc_ids: list[int],
    add_custom_fields: list[int] | dict,
    remove_custom_fields: list[int],
) -> Literal["OK"]:
    # ...
    for field_id, value in add_custom_fields.items():  # m custom fields (list form normalized to a dict)
        for doc_id in affected_docs:                # n documents
            CustomFieldInstance.objects.update_or_create(  # n×m individual DB calls!
                document_id=doc_id,
                field_id=field_id,
                defaults=defaults,
            )

Here's the problem: for every custom field (m) and every document (n), the code makes an individual database call, so the work grows as O(n×m). And each of those calls acquires a row-level lock on the row behind the (document_id, field_id) unique constraint, holding it for the duration of its transaction.

Example: Updating 10 documents with 3 custom fields? That's 30 individual update_or_create calls, each issuing its own queries and holding its own locks, all executed one after another.

2. Automatic modified Timestamp Updates (The auto_now Trap)

In src/documents/models.py, the modified field on Document uses auto_now=True. That means Django stamps a fresh modified value every time the document is saved, and since the bulk-edit path touches the parent document whenever a custom field instance is created or updated, each custom field change drags a documents_document UPDATE along with it.

modified = models.DateTimeField(
    _("modified"),
    auto_now=True,  # Triggers UPDATE on every save
    editable=False,
    db_index=True,
)

Additionally, there are explicit timestamp updates in other parts of the code.

This auto_now behavior adds another UPDATE query, which means more database operations and more chances for lock contention. For every update on the documents_customfieldinstance table, there's a subsequent update on the documents_document table, which is a huge performance hit.
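
A useful property to know here: Django's QuerySet.update() goes straight to SQL without calling save(), so auto_now never fires. That makes it possible to stamp modified once per batch instead of once per save. A minimal sketch, using a hypothetical document id:

from django.utils import timezone

from documents.models import Document

doc = Document.objects.get(id=42)  # hypothetical id

doc.save()  # auto_now fires: 'modified' is re-stamped on every save

# QuerySet.update() skips save(), so auto_now does NOT fire; the
# timestamp changes only because we set it explicitly, and a whole
# batch of documents can be stamped in one UPDATE ... WHERE id IN (...):
Document.objects.filter(id=doc.id).update(modified=timezone.now())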

Combined Effect: The Perfect Storm

Imagine you're processing 10 documents with 3 custom fields. Here's what happens:

  • 30 SELECT...FOR UPDATE queries (for the custom field instances)
  • 30+ UPDATE documents_document queries (for the modified timestamps)

That's a total of 60+ sequential database operations, each holding locks, making the process incredibly slow, especially when multiple processes are trying to do the same thing at the same time.

Steps to Reproduce the Issue

Want to see this problem in action? Here's how to reproduce it:

  1. Set up Paperless-ngx with PostgreSQL. Make sure you have a working installation.
  2. Create a background job. This job should call /api/documents/bulk_edit/ every 5-10 seconds, updating custom fields on 10+ documents. You can use a tool like cron or a task scheduler within Paperless-ngx.
  3. Simultaneously trigger document uploads. Use webhooks to upload new documents. These uploads should also update custom fields. This simulates real-world usage.
  4. Target overlapping documents. Make sure that both the background job and the uploads are targeting some of the same documents. This increases the chances of lock contention.
  5. Monitor PostgreSQL. Use tools like pg_stat_activity and pg_locks to watch the database and see the locks in action; the query right after this list is a good starting point.
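
If you want to watch the blocking live, this standard PostgreSQL query (nothing Paperless-specific; pg_blocking_pids() requires PostgreSQL 9.6 or later) lists every backend currently stuck behind another one:

SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       state,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;

While the reproduction is running, you should see rows with wait_event_type = Lock and wait_event = tuple or transactionid, matching the evidence above.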

Potential Solutions and Workarounds

While a complete fix requires code changes, there are a few things you can try to mitigate the issue:

  1. Reduce Concurrent Operations: Limit the number of concurrent processes that perform bulk edits. You can do this by adjusting the number of worker threads or the frequency of background jobs.
  2. Optimize the bulk_edit Function: This requires code changes. The goal is to cut the number of individual database calls: instead of looping through each document and each custom field, use bulk operations such as bulk_create or bulk_update to touch many records in a single query. This reduces database round trips and the window in which locks are held (see the sketch after this list).
  3. Debounce or Batch Updates: Instead of updating the modified timestamp on every single change, consider batching the updates or using a debouncing mechanism to update it less frequently.
  4. Database Tuning: Make sure your PostgreSQL database is properly configured and optimized. This includes things like connection pool size, buffer cache settings, and index optimization. You can also offload read-heavy work to a replica, though replication by itself won't reduce write-lock contention.
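
To make option 2 concrete, here's a rough sketch of the batched approach, assuming Django 4.2+ so that bulk_create can upsert via update_conflicts. It's an illustration rather than a drop-in patch: the real modify_custom_fields also handles type-specific value columns, the remove_custom_fields path, and permission filtering, all of which are omitted here, and value_text is used for simplicity:

from django.utils import timezone

from documents.models import CustomFieldInstance, Document


def modify_custom_fields_batched(doc_ids: list[int], add_custom_fields: dict) -> None:
    # Build every (document, field) instance in memory first...
    instances = [
        CustomFieldInstance(document_id=doc_id, field_id=field_id, value_text=value)
        for field_id, value in add_custom_fields.items()
        for doc_id in doc_ids
    ]
    # ...then upsert them all in ONE statement (INSERT ... ON CONFLICT
    # DO UPDATE on PostgreSQL) instead of n×m update_or_create round trips.
    CustomFieldInstance.objects.bulk_create(
        instances,
        update_conflicts=True,
        unique_fields=["document", "field"],
        update_fields=["value_text"],
    )
    # One UPDATE for all modified timestamps; update() bypasses auto_now,
    # so the documents table is touched exactly once per batch.
    Document.objects.filter(id__in=doc_ids).update(modified=timezone.now())

With this shape, the earlier example of 10 documents and 3 custom fields drops from 60+ sequential queries to roughly two, and option 3's debouncing falls out for free since the timestamp is written once per batch.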

Conclusion: Facing the Challenge

Database lock contention in Paperless-ngx, especially during bulk edits of custom fields, is a real issue. It leads to slow performance, upload failures, and an overall poor user experience. The root causes are the nested loops and frequent timestamp updates. By understanding these issues, we can begin to address the problem. This is a call to action. We encourage developers to propose solutions. If you can help, please do! Your contribution can make Paperless-ngx a much more robust and user-friendly platform. We welcome community participation. Let's work together to make Paperless-ngx even better!