Knowledge

This section describes the knowledge base functionality available in the Dify client library.

Dataset Management

class KnowledgeDataset

Represents a knowledge dataset that can contain multiple documents.

__init__(id: str, client: DifyKnowledgeClient, info: dict = None)

Initialize a new dataset.

property documents: list[KnowledgeDocument]

Returns all documents in the dataset.

get_document(document_id: str) -> KnowledgeDocument

Retrieve a specific document by ID.

create_document_by_text(text: str, settings: KnowledgeSegmentSettings = None) -> KnowledgeDocument

Create a new document from text content.

create_document_by_file(file_path: str, settings: KnowledgeSegmentSettings = None) -> KnowledgeDocument

Create a new document from a file.

update_document_from_file(document_id: str, file_path: str, settings: KnowledgeSegmentSettings = None) -> KnowledgeDocument

Update an existing document with new file content.

delete_document(document_id: str)

Delete a document from the dataset.

delete()

Delete the entire dataset.
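
The dataset methods above can be combined into a small ingestion routine. The sketch below is illustrative only: `ingest_texts` is a hypothetical helper, not part of the library, and it assumes that a `KnowledgeDocument` exposes the `id` it was created with as an `id` attribute. It calls only the documented `create_document_by_text` and `wait_for_indexing` methods and takes the dataset duck-typed, so it can be exercised without a server.

```python
from typing import Any, Iterable


def ingest_texts(dataset: Any, texts: Iterable[str], timeout: int = 60) -> dict[str, Any]:
    """Create one document per text, then wait for each to finish indexing.

    Hypothetical helper built only on KnowledgeDataset.create_document_by_text
    and KnowledgeDocument.wait_for_indexing. Returns {document_id: final_status}.
    """
    statuses: dict[str, Any] = {}
    for text in texts:
        # settings is omitted (None), as the documented signature allows.
        document = dataset.create_document_by_text(text)
        # Assumption: the document exposes its ID as `document.id`.
        statuses[document.id] = document.wait_for_indexing(timeout=timeout)
    return statuses


# Usage, assuming `dataset` is a KnowledgeDataset:
# statuses = ingest_texts(dataset, ["First note", "Second note"], timeout=120)
```

This waits for each document before creating the next one. Creating all documents first and waiting afterwards would let indexing overlap, at the cost of a little more bookkeeping.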

Document

class KnowledgeDocument

Represents a document within a knowledge dataset.

__init__(id: str, client: DifyKnowledgeClient, dataset: KnowledgeDataset, batch_id: str | None = None)

Initialize a new document.

property segments: list[KnowledgeSegment]

Returns all segments in the document.

create_segments(segments: list[KnowledgeDocumentSegmentSettings]) -> list[KnowledgeSegment]

Create multiple segments in the document.

get_segment(segment_id: str) -> KnowledgeSegment

Retrieve a specific segment by ID.

delete_segment(segment_id: str)

Delete a segment from the document.

property indexing_status: DocumentIndexingStatuses

Get the current indexing status of the document.

wait_for_indexing(timeout: int = 60) -> DocumentIndexingStatuses

Wait for document indexing to complete.

property data: KnowledgeDocumentData

Get detailed document data.

delete()

Delete the document.
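
wait_for_indexing blocks until indexing finishes. As a rough illustration of how such a wait can be built on top of the indexing_status property, the sketch below polls a status getter until a terminal state appears or the timeout passes. It is not the library's implementation: `IndexingStatus` is a local stand-in for `DocumentIndexingStatuses` with assumed string values, `wait_until_settled` is hypothetical, and treating `COMPLETED`, `ERROR`, and `PAUSED` as terminal is an assumption.

```python
import time
from enum import Enum
from typing import Callable


class IndexingStatus(str, Enum):
    # Local stand-in for DocumentIndexingStatuses; the string values are assumptions.
    WAITING = "waiting"
    PARSING = "parsing"
    CLEANING = "cleaning"
    SPLITTING = "splitting"
    INDEXING = "indexing"
    COMPLETED = "completed"
    ERROR = "error"
    PAUSED = "paused"


# Assumption: no further progress is expected once one of these is reached.
TERMINAL_STATES = frozenset({IndexingStatus.COMPLETED, IndexingStatus.ERROR, IndexingStatus.PAUSED})


def wait_until_settled(
    get_status: Callable[[], IndexingStatus],
    timeout: float = 60,
    interval: float = 1.0,
) -> IndexingStatus:
    """Poll get_status() until it returns a terminal state.

    Raises TimeoutError if no terminal state is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"indexing still {status.value!r} after {timeout} s")
        # Sleep no longer than the time remaining before the deadline.
        time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

With the library, the getter would read `document.indexing_status` and the terminal set would hold the corresponding `DocumentIndexingStatuses` members rather than the stand-in.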

Segment

class KnowledgeSegment

Represents a segment within a document.

__init__(id: str, client: DifyKnowledgeClient, dataset: KnowledgeDataset, document: KnowledgeDocument)

Initialize a new segment.

update(settings: KnowledgeDocumentSegmentSettings)

Update segment settings.

delete()

Delete the segment.
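
create_segments and update both take KnowledgeDocumentSegmentSettings objects, whose content, answer, and keywords fields are described under Data Models. The sketch below converts a question-to-answer mapping into Q&A-mode segment settings. `SegmentSettings` is a local stand-in with the same fields and `faq_segments` is a hypothetical helper; with the library you would build `KnowledgeDocumentSegmentSettings` the same way, assuming it accepts these fields as keyword arguments, and pass the list to `document.create_segments(...)`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SegmentSettings:
    # Local stand-in with the fields documented for KnowledgeDocumentSegmentSettings.
    content: str                          # segment text; the question in Q&A mode
    answer: Optional[str] = None          # answer text, used in Q&A mode
    keywords: Optional[list[str]] = None  # optional keywords


def faq_segments(faq: dict[str, str]) -> list[SegmentSettings]:
    """Convert a question -> answer mapping into Q&A segment settings."""
    segments: list[SegmentSettings] = []
    for question, answer in faq.items():
        question, answer = question.strip(), answer.strip()
        if not question or not answer:
            # Skip incomplete pairs instead of creating empty segments.
            continue
        segments.append(SegmentSettings(content=question, answer=answer))
    return segments


# Usage with the library (assumption: KnowledgeDocumentSegmentSettings accepts
# the same keyword arguments):
# document.create_segments([
#     KnowledgeDocumentSegmentSettings(content=s.content, answer=s.answer)
#     for s in faq_segments(faq)
# ])
```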

Data Models

class DatasetPermissionEnum(str, Enum)

Enumeration of dataset permission levels.

ONLY_ME

Only the creator can access

ALL_TEAM

All team members can access

PARTIAL_TEAM

Selected team members can access
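
The three permission levels can be read as a simple access rule. The sketch below spells that rule out as an illustration of the documented semantics; it is not how Dify enforces permissions server-side. `Permission` mirrors the member names of `DatasetPermissionEnum` with arbitrary values, `can_access` is hypothetical, and letting the creator access a `PARTIAL_TEAM` dataset is an assumption.

```python
from enum import Enum, auto
from typing import Optional


class Permission(Enum):
    # Local stand-in: same member names as DatasetPermissionEnum, arbitrary values.
    ONLY_ME = auto()
    ALL_TEAM = auto()
    PARTIAL_TEAM = auto()


def can_access(
    permission: Permission,
    user_id: str,
    creator_id: str,
    team_member_ids: set[str],
    selected_member_ids: Optional[set[str]] = None,
) -> bool:
    """Illustrative reading of the three documented permission levels."""
    if permission is Permission.ONLY_ME:
        # Only the creator can access.
        return user_id == creator_id
    if permission is Permission.ALL_TEAM:
        # Every member of the team can access.
        return user_id in team_member_ids
    # PARTIAL_TEAM: selected members, plus the creator (assumption).
    return user_id == creator_id or user_id in (selected_member_ids or set())
```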

class DocumentIndexingStatuses(str, Enum)

Enumeration of document indexing states.

WAITING
PARSING
CLEANING
SPLITTING
COMPLETED
INDEXING
ERROR
PAUSED

class KnowledgeToken

Represents an API token for knowledge operations.

id: str
type: Literal["dataset"]
token: str
last_used_at: Optional[int]
created_at: int

class KnowledgeSegmentSettings

Settings for knowledge segment processing.

name: Optional[str]
indexing_technique: Literal["high_quality", "economy"]
process_rule: ProcessRule

class KnowledgeDocumentSegmentSettings

Settings for document segments.

content: str
Segment text; in Q&A mode, the question.
answer: Optional[str]
Answer text, used in Q&A mode.
keywords: Optional[list[str]]
Optional keywords for the segment.

class KnowledgeDocumentData

Detailed document information.

id: str
name: str
data_source_type: str
indexing_status: DocumentIndexingStatuses
tokens: int
segment_count: int
average_segment_length: int
hit_count: int
display_status: Literal["queuing", "paused", "indexing", "error", "available", "disabled", "archived"]
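
display_status is a quick way to spot documents that are not serving retrieval results. The sketch below scans an iterable of documents, such as `dataset.documents`, and reports those whose `data.display_status` falls in a chosen set. The helper name `documents_needing_attention` is hypothetical, and treating "error", "paused", and "disabled" as needing attention is an assumption.

```python
from typing import Any, Iterable

# display_status values treated as "needs attention" (assumption; adjust as needed).
NEEDS_ATTENTION = {"error", "paused", "disabled"}


def documents_needing_attention(documents: Iterable[Any]) -> list[tuple[str, str]]:
    """Return (name, display_status) pairs for documents in NEEDS_ATTENTION.

    Each item must expose a KnowledgeDocumentData-like `data` attribute,
    as KnowledgeDocument.data is documented to do.
    """
    flagged: list[tuple[str, str]] = []
    for document in documents:
        info = document.data  # on a real document this may fetch data from the server
        if info.display_status in NEEDS_ATTENTION:
            flagged.append((info.name, info.display_status))
    return flagged


# Usage, assuming `dataset` is a KnowledgeDataset:
# for name, status in documents_needing_attention(dataset.documents):
#     print(f"{name}: {status}")
```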

Dataset Settings

The Knowledge Dataset Settings API lets you manage a dataset's configuration, including its retrieval method, reranking settings, and access permissions.

Basic Usage

from dify_user_client import DifyClient
from dify_user_client.knowledge import DatasetPermissionEnum, RetrievalMethod

# Initialize client
client = DifyClient("YOUR_API_KEY")

# Get dataset
dataset = client.knowledge.get_dataset("dataset_id")

# Get current settings
settings = dataset.settings

# Update settings
dataset.update_settings(
    name="Updated Dataset",
    description="New description",
    permission=DatasetPermissionEnum.ALL_TEAM,
    retrieval_model={
        "search_method": RetrievalMethod.HYBRID_SEARCH,
        "weights": {
            "vector_setting": {"vector_weight": 0.7},
            "keyword_setting": {"keyword_weight": 0.3}
        }
    }
)

Settings Configuration

Retrieval Methods

The API supports three retrieval methods:

  • SEMANTIC_SEARCH: Uses embeddings for semantic similarity search

  • FULL_TEXT_SEARCH: Uses keyword-based search

  • HYBRID_SEARCH: Combines both semantic and keyword search

# Configure semantic search
dataset.update_settings(
    retrieval_model={
        "search_method": RetrievalMethod.SEMANTIC_SEARCH,
        "weights": {
            "vector_setting": {"vector_weight": 1.0},
            "keyword_setting": {"keyword_weight": 0.0}
        }
    }
)
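
In both examples the vector and keyword weights sum to 1. The sketch below builds a retrieval_model dict in the same shape and derives the keyword weight from the vector weight so the two stay consistent. `build_retrieval_model` is a hypothetical helper, and treating the sum-to-1 pattern as a requirement rather than a convention of these examples is an assumption. The search method is passed through unchanged, so a `RetrievalMethod` member can be supplied.

```python
from typing import Any


def build_retrieval_model(search_method: Any, vector_weight: float) -> dict[str, Any]:
    """Build a retrieval_model dict shaped like the examples above.

    The keyword weight is derived as 1 - vector_weight, so the two weights
    sum to 1 (rounded to avoid float noise such as 0.30000000000000004).
    """
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0 and 1")
    return {
        "search_method": search_method,
        "weights": {
            "vector_setting": {"vector_weight": vector_weight},
            "keyword_setting": {"keyword_weight": round(1.0 - vector_weight, 6)},
        },
    }


# Usage, with RetrievalMethod imported as in Basic Usage:
# dataset.update_settings(
#     retrieval_model=build_retrieval_model(RetrievalMethod.HYBRID_SEARCH, 0.7)
# )
```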

Permissions

Dataset access can be controlled with the following permission levels:

  • ONLY_ME: Only the creator can access

  • ALL_TEAM: All team members can access

  • PARTIAL_TEAM: Selected team members can access

# Update permissions
dataset.update_settings(
    permission=DatasetPermissionEnum.ALL_TEAM
)

Settings Properties

The dataset settings object includes the following properties:

  • id: Dataset identifier

  • name: Dataset name

  • description: Optional dataset description

  • permission: Access permission level

  • indexing_technique: "high_quality" or "economy"

  • retrieval_model_dict: Retrieval configuration with the following keys:
      - search_method: Search method to use
      - weights: Weight configuration for hybrid search
      - top_k: Number of results to return
      - score_threshold: Minimum score threshold

  • embedding_model: Name of the embedding model

  • embedding_model_provider: Provider of the embedding model
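
Because these properties are plain attributes, a one-line summary is easy to produce when auditing several datasets. The sketch below assumes the settings object is what `dataset.settings` returns and that `retrieval_model_dict` behaves like a `dict` that may omit `top_k` or `score_threshold` (or be empty); `summarize_settings` is hypothetical.

```python
from typing import Any


def summarize_settings(settings: Any) -> str:
    """One-line summary of a dataset settings object.

    Reads the attributes listed above; missing retrieval keys are shown as "-".
    """
    retrieval = settings.retrieval_model_dict or {}
    method = retrieval.get("search_method", "-")
    top_k = retrieval.get("top_k", "-")
    threshold = retrieval.get("score_threshold", "-")
    return (
        f"{settings.name} [{settings.id}]: {settings.indexing_technique}, "
        f"{settings.embedding_model_provider}/{settings.embedding_model}, "
        f"search={method}, top_k={top_k}, score_threshold={threshold}"
    )


# Usage, assuming `dataset` was obtained as in Basic Usage:
# print(summarize_settings(dataset.settings))
```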