Knowledge
This section describes the knowledge base functionality available in the Dify client library.
Dataset Management
- class KnowledgeDataset
Represents a knowledge dataset that can contain multiple documents.
- property documents list[KnowledgeDocument]
Returns all documents in the dataset.
- get_document(document_id: str) KnowledgeDocument
Retrieve a specific document by ID.
- create_document_by_text(text: str, settings: KnowledgeSegmentSettings = None) KnowledgeDocument
Create a new document from text content.
- create_document_by_file(file_path: str, settings: KnowledgeSegmentSettings = None) KnowledgeDocument
Create a new document from a file.
- update_document_from_file(document_id: str, file_path: str, settings: KnowledgeSegmentSettings = None) KnowledgeDocument
Update an existing document with new file content.
- delete()
Delete the entire dataset.
Document
- class KnowledgeDocument
Represents a document within a knowledge dataset.
- __init__(id: str, client: DifyKnowledgeClient, dataset: KnowledgeDataset, batch_id: str | None = None)
Initialize a new document.
- property segments list[KnowledgeSegment]
Returns all segments in the document.
- create_segments(segments: list[KnowledgeDocumentSegmentSettings]) list[KnowledgeSegment]
Create multiple segments in the document.
- get_segment(segment_id: str) KnowledgeSegment
Retrieve a specific segment by ID.
- property indexing_status DocumentIndexingStatuses
Get the current indexing status of the document.
- wait_for_indexing(timeout: int = 60) DocumentIndexingStatuses
Wait for document indexing to complete.
- property data KnowledgeDocumentData
Get detailed document data.
- delete()
Delete the document.
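The behaviour of wait_for_indexing can be pictured as a polling loop with a deadline. The following is a minimal sketch, not the library's implementation: IndexingStatus is a local stand-in for DocumentIndexingStatuses with assumed string values, and wait_for_status, get_status, poll_interval, and the choice of terminal states are hypothetical names and assumptions introduced for illustration. The timeout is assumed to be in seconds.

```python
import time
from enum import Enum
from typing import Callable


class IndexingStatus(str, Enum):
    # Stand-in for DocumentIndexingStatuses; the string values are assumed.
    WAITING = "waiting"
    INDEXING = "indexing"
    COMPLETED = "completed"
    ERROR = "error"


# States after which polling can stop (an assumption for this sketch).
TERMINAL = {IndexingStatus.COMPLETED, IndexingStatus.ERROR}


def wait_for_status(
    get_status: Callable[[], IndexingStatus],
    timeout: float = 60,
    poll_interval: float = 1.0,
) -> IndexingStatus:
    """Poll get_status until a terminal state is seen or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"indexing still {status.value} after {timeout}s")
        time.sleep(poll_interval)
```

In real use, get_status would be something like a lambda that reads the document's indexing_status property on each call.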
Segment
- class KnowledgeSegment
Represents a segment within a document.
- __init__(id: str, client: DifyKnowledgeClient, dataset: KnowledgeDataset, document: KnowledgeDocument)
Initialize a new segment.
- update(settings: KnowledgeDocumentSegmentSettings)
Update segment settings.
- delete()
Delete the segment.
Data Models
- class DatasetPermissionEnum(str, Enum)
Enumeration of dataset permission levels.
- ONLY_ME
Only the creator can access
- ALL_TEAM
All team members can access
- PARTIAL_TEAM
Selected team members can access
- class DocumentIndexingStatuses(str, Enum)
Enumeration of document indexing states.
- WAITING
- PARSING
- CLEANING
- SPLITTING
- COMPLETED
- INDEXING
- ERROR
- PAUSED
- class KnowledgeToken
Represents an API token for knowledge operations.
- id: str
- type: Literal["dataset"]
- token: str
- last_used_at: Optional[int]
- created_at: int
- class KnowledgeSegmentSettings
Settings for knowledge segment processing.
- name: Optional[str]
- indexing_technique: Literal["high_quality", "economy"]
- process_rule: ProcessRule
- class KnowledgeDocumentSegmentSettings
Settings for document segments.
- content: str
- Text content / question content
- answer: Optional[str]
- Answer content for Q&A mode
- keywords: Optional[list[str]]
- Optional keywords
- class KnowledgeDocumentData
Detailed document information.
- id: str
- name: str
- data_source_type: str
- indexing_status: DocumentIndexingStatuses
- tokens: int
- segment_count: int
- average_segment_length: int
- hit_count: int
- display_status: Literal["queuing", "paused", "indexing", "error", "available", "disabled", "archived"]
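The content, answer, and keywords fields of KnowledgeDocumentSegmentSettings suggest two shapes of segment: plain text, and question/answer pairs. The sketch below illustrates that distinction with a local dataclass. SegmentSettings, qa_segments, and to_payload are hypothetical stand-ins written for this example, not part of the library, and dropping unset fields from the payload is an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SegmentSettings:
    # Hypothetical stand-in mirroring the fields of KnowledgeDocumentSegmentSettings.
    content: str                          # text content, or the question in Q&A mode
    answer: Optional[str] = None          # answer content, Q&A mode only
    keywords: Optional[list[str]] = None  # optional keywords


def qa_segments(pairs: list[tuple[str, str]]) -> list[SegmentSettings]:
    """Turn (question, answer) pairs into Q&A-mode segment settings."""
    return [SegmentSettings(content=q, answer=a) for q, a in pairs]


def to_payload(segment: SegmentSettings) -> dict:
    """Convert to a dict, omitting fields that were left unset."""
    return {k: v for k, v in asdict(segment).items() if v is not None}
```

With the real classes, a list built this way would be the kind of argument that create_segments accepts.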
Dataset Settings
The Knowledge Dataset Settings API allows you to manage dataset configurations including retrieval methods, reranking settings, and permissions.
Basic Usage
from dify_user_client import DifyClient
from dify_user_client.knowledge import DatasetPermissionEnum, RetrievalMethod
# Initialize client
client = DifyClient("YOUR_API_KEY")
# Get dataset
dataset = client.knowledge.get_dataset("dataset_id")
# Get current settings
settings = dataset.settings
# Update settings
dataset.update_settings(
    name="Updated Dataset",
    description="New description",
    permission=DatasetPermissionEnum.ALL_TEAM,
    retrieval_model={
        "search_method": RetrievalMethod.HYBRID_SEARCH,
        "weights": {
            "vector_setting": {"vector_weight": 0.7},
            "keyword_setting": {"keyword_weight": 0.3}
        }
    }
)
Settings Configuration
Retrieval Methods
The API supports three retrieval methods:
SEMANTIC_SEARCH
: Uses embeddings for semantic similarity search
FULL_TEXT_SEARCH
: Uses keyword-based search
HYBRID_SEARCH
: Combines both semantic and keyword search
# Configure semantic search
dataset.update_settings(
    retrieval_model={
        "search_method": RetrievalMethod.SEMANTIC_SEARCH,
        "weights": {
            "vector_setting": {"vector_weight": 1.0},
            "keyword_setting": {"keyword_weight": 0.0}
        }
    }
)
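In both examples above the vector and keyword weights add up to 1. Assuming that is the intended convention (the source does not state it), a small helper can build the retrieval_model dictionary from a single number. This is a sketch only: SearchMethod is a local stand-in for RetrievalMethod with assumed string values, and build_retrieval_model is a hypothetical helper, not a library function.

```python
from enum import Enum


class SearchMethod(str, Enum):
    # Stand-in for RetrievalMethod; the string values are assumed.
    SEMANTIC_SEARCH = "semantic_search"
    FULL_TEXT_SEARCH = "full_text_search"
    HYBRID_SEARCH = "hybrid_search"


def build_retrieval_model(method: SearchMethod, vector_weight: float) -> dict:
    """Return a retrieval_model dict whose two weights sum to 1."""
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0 and 1")
    return {
        "search_method": method,
        "weights": {
            "vector_setting": {"vector_weight": vector_weight},
            # Rounded to avoid float noise such as 0.30000000000000004.
            "keyword_setting": {"keyword_weight": round(1.0 - vector_weight, 6)},
        },
    }
```

The returned dictionary has the same shape as the retrieval_model argument passed to update_settings in the examples above.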
Permissions
Dataset access can be controlled with the following permission levels:
ONLY_ME
: Only the creator can access
ALL_TEAM
: All team members can access
PARTIAL_TEAM
: Selected team members can access
# Update permissions
dataset.update_settings(
    permission=DatasetPermissionEnum.ALL_TEAM
)
Settings Properties
The dataset settings object includes the following properties:
id
: Dataset identifier
name
: Dataset name
description
: Optional dataset description
permission
: Access permission level
indexing_technique
: "high_quality" or "economy"
retrieval_model_dict
: Retrieval configuration
  - search_method: Search method to use
  - weights: Weight configuration for hybrid search
  - top_k: Number of results to return
  - score_threshold: Minimum score threshold
embedding_model
: Name of the embedding model
embedding_model_provider
: Provider of the embedding model
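To make the meaning of top_k and score_threshold concrete, the sketch below applies both limits to a list of scored hits on the client side. It illustrates the semantics only and is not how the service implements retrieval. apply_limits is a hypothetical helper, and treating the threshold as inclusive is an assumption.

```python
from typing import Optional


def apply_limits(
    hits: list[tuple[str, float]],
    top_k: int,
    score_threshold: Optional[float] = None,
) -> list[tuple[str, float]]:
    """Keep hits scoring at least score_threshold, best first, at most top_k."""
    if score_threshold is not None:
        hits = [hit for hit in hits if hit[1] >= score_threshold]
    # Sort by score, highest first, then truncate to top_k results.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:top_k]
```

For example, with top_k=2 and score_threshold=0.5, a hit scoring 0.4 is dropped by the threshold, and of the remaining hits only the two highest-scoring ones are returned.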