Knowledge

This section describes the knowledge base functionality available in the Dify client library.

Dataset Management

class KnowledgeDataset

Represents a knowledge dataset that can contain multiple documents.

__init__(id: str, client: DifyKnowledgeClient, info: dict = None): Initialize a new dataset.

property documents → list[KnowledgeDocument]: Returns all documents in the dataset.

get_document(document_id: str) → KnowledgeDocument: Retrieve a specific document by ID.

create_document_by_text(text: str, settings: KnowledgeSegmentSettings = None) → KnowledgeDocument: Create a new document from text content.

create_document_by_file(file_path: str, settings: KnowledgeSegmentSettings = None) → KnowledgeDocument: Create a new document from a file.

update_document_from_file(document_id: str, file_path: str, settings: KnowledgeSegmentSettings = None) → KnowledgeDocument: Update an existing document with new file content.

delete_document(document_id: str): Delete a document from the dataset.

delete(): Delete the entire dataset.

Document

class KnowledgeDocument

Represents a document within a knowledge dataset.

__init__(id: str, client: DifyKnowledgeClient, dataset: KnowledgeDataset, batch_id: str | None = None): Initialize a new document.

property segments → list[KnowledgeSegment]: Returns all segments in the document.

create_segments(segments: list[KnowledgeDocumentSegmentSettings]) → list[KnowledgeSegment]: Create multiple segments in the document.

get_segment(segment_id: str) → KnowledgeSegment: Retrieve a specific segment by ID.

delete_segment(segment_id: str): Delete a segment from the document.

property indexing_status → DocumentIndexingStatuses: Get the current indexing status of the document.

wait_for_indexing(timeout: int = 60) → DocumentIndexingStatuses: Wait for document indexing to complete.

property data → KnowledgeDocumentData: Get detailed document data.

delete(): Delete the document.

Segment

class KnowledgeSegment

Represents a segment within a document.

__init__(id: str, client: DifyKnowledgeClient, dataset: KnowledgeDataset, document: KnowledgeDocument): Initialize a new segment.

update(settings: KnowledgeDocumentSegmentSettings): Update segment settings.

delete(): Delete the segment.

Data Models

class DatasetPermissionEnum(str, Enum)

Enumeration of dataset permission levels.

ONLY_ME: Only the creator can access

ALL_TEAM: All team members can access

PARTIAL_TEAM: Selected team members can access

class DocumentIndexingStatuses(str, Enum)

Enumeration of document indexing states.

WAITING

PARSING

CLEANING

SPLITTING

COMPLETED

INDEXING

ERROR

PAUSED
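
Because the enumeration derives from both `str` and `Enum`, its members can be compared directly with plain strings. The sketch below uses a stand-in class, `IndexingStatus`, rather than the library's own enum, and the lowercase string values are an assumption made for illustration.

```python
from enum import Enum


class IndexingStatus(str, Enum):
    """Stand-in for DocumentIndexingStatuses; the string values are assumed."""
    WAITING = "waiting"
    INDEXING = "indexing"
    COMPLETED = "completed"
    ERROR = "error"


# A member compares equal to its underlying string value.
matches_string = IndexingStatus.COMPLETED == "completed"

# A raw string can be converted back into the corresponding member.
parsed = IndexingStatus("error")
```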

class KnowledgeToken

Represents an API token for knowledge operations.

id: str

type: Literal["dataset"]

token: str

last_used_at: Optional[int]

created_at: int

class KnowledgeSegmentSettings

Settings for knowledge segment processing.

name: Optional[str]

indexing_technique: Literal["high_quality", "economy"]

process_rule: ProcessRule

class KnowledgeDocumentSegmentSettings

Settings for document segments.

content: str
Text content, or the question text in Q&A mode

answer: Optional[str]
Answer content for Q&A mode

keywords: Optional[list[str]]
Optional keywords
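
To illustrate how the three fields relate in Q&A mode, here is a minimal sketch that turns question/answer pairs into segment settings. `SegmentSettings` is a stand-in dataclass mirroring the fields listed above, not the library's class, and the `None` defaults and the `qa_segments` helper are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SegmentSettings:
    """Stand-in mirroring the fields of KnowledgeDocumentSegmentSettings."""
    content: str                          # text content, or the question in Q&A mode
    answer: Optional[str] = None          # answer content for Q&A mode (default assumed)
    keywords: Optional[list[str]] = None  # optional keywords (default assumed)


def qa_segments(pairs):
    """Build one settings object per (question, answer) pair."""
    return [SegmentSettings(content=q, answer=a) for q, a in pairs]
```

A list built this way has the shape that `create_segments` expects, provided the library's own settings class is used in place of the stand-in.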

class KnowledgeDocumentData

Detailed document information.

id: str

name: str

data_source_type: str

indexing_status: DocumentIndexingStatuses

tokens: int

segment_count: int

average_segment_length: int

hit_count: int

display_status: Literal["queuing", "paused", "indexing", "error", "available", "disabled", "archived"]

Hit

Dataset Settings

The Knowledge Dataset Settings API allows you to manage dataset configurations including retrieval methods, reranking settings, and permissions.

Basic Usage

from dify_user_client import DifyClient
from dify_user_client.knowledge import DatasetPermissionEnum, RetrievalMethod

# Initialize client
client = DifyClient("YOUR_API_KEY")

# Get dataset
dataset = client.knowledge.get_dataset("dataset_id")

# Get current settings
settings = dataset.settings

# Update settings
dataset.update_settings(
    name="Updated Dataset",
    description="New description",
    permission=DatasetPermissionEnum.ALL_TEAM,
    retrieval_model={
        "search_method": RetrievalMethod.HYBRID_SEARCH,
        "weights": {
            "vector_setting": {"vector_weight": 0.7},
            "keyword_setting": {"keyword_weight": 0.3}
        }
    }
)

Settings Configuration

Retrieval Methods

The API supports three retrieval methods:

SEMANTIC_SEARCH: Uses embeddings for semantic similarity search
FULL_TEXT_SEARCH: Uses keyword-based search
HYBRID_SEARCH: Combines both semantic and keyword search

# Configure semantic search
dataset.update_settings(
    retrieval_model={
        "search_method": RetrievalMethod.SEMANTIC_SEARCH,
        "weights": {
            "vector_setting": {"vector_weight": 1.0},
            "keyword_setting": {"keyword_weight": 0.0}
        }
    }
)
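
In both examples above the vector and keyword weights sum to 1.0. Assuming that is the intended convention (this reference does not state it), a small helper can derive the keyword weight from the vector weight. `hybrid_weights` is a hypothetical name, not part of the library.

```python
def hybrid_weights(vector_weight):
    """Build a `weights` dict whose two weights sum to 1.0.

    Hypothetical helper; the sum-to-one rule is an assumption drawn from
    the examples in this section.
    """
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0 and 1")
    return {
        "vector_setting": {"vector_weight": vector_weight},
        # Round to avoid floating-point noise such as 0.30000000000000004.
        "keyword_setting": {"keyword_weight": round(1.0 - vector_weight, 6)},
    }
```

The result can be passed as the `"weights"` entry of the `retrieval_model` dictionary shown above.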

Permissions

Dataset access can be controlled with the following permission levels:

ONLY_ME: Only the creator can access
ALL_TEAM: All team members can access
PARTIAL_TEAM: Selected team members can access

# Update permissions
dataset.update_settings(
    permission=DatasetPermissionEnum.ALL_TEAM
)

Settings Properties

The dataset settings object includes the following properties:

id: Dataset identifier
name: Dataset name
description: Optional dataset description
permission: Access permission level
indexing_technique: "high_quality" or "economy"
retrieval_model_dict: Retrieval configuration
    search_method: Search method to use
    weights: Weight configuration for hybrid search
    top_k: Number of results to return
    score_threshold: Minimum score threshold
embedding_model: Name of the embedding model
embedding_model_provider: Provider of the embedding model