Knowledge
=========

This section describes the knowledge base functionality available in the Dify client library.

Dataset Management
----------------

.. py:class:: KnowledgeDataset

   Represents a knowledge dataset that can contain multiple documents.

   .. py:method:: __init__(id: str, client: DifyKnowledgeClient, info: dict = None)
      
      Initialize a new dataset.

   .. py:property:: documents() -> list[KnowledgeDocument]
      
      Returns all documents in the dataset.

   .. py:method:: get_document(document_id: str) -> KnowledgeDocument
      
      Retrieve a specific document by ID.

   .. py:method:: create_document_by_text(text: str, settings: KnowledgeSegmentSettings = None) -> KnowledgeDocument
      
      Create a new document from text content.

   .. py:method:: create_document_by_file(file_path: str, settings: KnowledgeSegmentSettings = None) -> KnowledgeDocument
      
      Create a new document from a file.

   .. py:method:: update_document_from_file(document_id: str, file_path: str, settings: KnowledgeSegmentSettings = None) -> KnowledgeDocument
      
      Update an existing document with new file content.

   .. py:method:: delete_document(document_id: str)
      
      Delete a document from the dataset.

   .. py:method:: delete()
      
      Delete the entire dataset.

Document
--------

.. py:class:: KnowledgeDocument

   Represents a document within a knowledge dataset.

   .. py:method:: __init__(id: str, client: DifyKnowledgeClient, dataset: KnowledgeDataset, batch_id: Optional[str] = None)
      
      Initialize a new document.

   .. py:property:: segments() -> list[KnowledgeSegment]
      
      Returns all segments in the document.

   .. py:method:: create_segments(segments: list[KnowledgeDocumentSegmentSettings]) -> list[KnowledgeSegment]
      
      Create multiple segments in the document.

   .. py:method:: get_segment(segment_id: str) -> KnowledgeSegment
      
      Retrieve a specific segment by ID.

   .. py:method:: delete_segment(segment_id: str)
      
      Delete a segment from the document.

   .. py:property:: indexing_status() -> DocumentIndexingStatuses
      
      Get the current indexing status of the document.

   .. py:method:: wait_for_indexing(timeout: int = 60) -> DocumentIndexingStatuses
      
      Wait for document indexing to complete.

   .. py:property:: data() -> KnowledgeDocumentData
      
      Get detailed document data.

   .. py:method:: delete()
      
      Delete the document.

Segment
-------

.. py:class:: KnowledgeSegment

   Represents a segment within a document.

   .. py:method:: __init__(id: str, client: DifyKnowledgeClient, dataset: KnowledgeDataset, document: KnowledgeDocument)
      
      Initialize a new segment.

   .. py:method:: update(settings: KnowledgeDocumentSegmentSettings)
      
      Update segment settings.

   .. py:method:: delete()
      
      Delete the segment.

Data Models
-----------

.. py:class:: DatasetPermissionEnum(str, Enum)

   Enumeration of dataset permission levels.

   .. py:attribute:: ONLY_ME
      
      Only the creator can access

   .. py:attribute:: ALL_TEAM
      
      All team members can access

   .. py:attribute:: PARTIAL_TEAM
      
      Selected team members can access

.. py:class:: DocumentIndexingStatuses(str, Enum)

   Enumeration of document indexing states.

   .. py:attribute:: WAITING
   .. py:attribute:: PARSING
   .. py:attribute:: CLEANING
   .. py:attribute:: SPLITTING
   .. py:attribute:: COMPLETED
   .. py:attribute:: INDEXING
   .. py:attribute:: ERROR
   .. py:attribute:: PAUSED

.. py:class:: KnowledgeToken

   Represents an API token for knowledge operations.

   .. py:attribute:: id: str
   .. py:attribute:: type: Literal["dataset"]
   .. py:attribute:: token: str
   .. py:attribute:: last_used_at: Optional[int]
   .. py:attribute:: created_at: int

.. py:class:: KnowledgeSegmentSettings

   Settings for knowledge segment processing.

   .. py:attribute:: name: Optional[str]
   .. py:attribute:: indexing_technique: Literal["high_quality", "economy"]
   .. py:attribute:: process_rule: ProcessRule

.. py:class:: KnowledgeDocumentSegmentSettings

   Settings for document segments.

   .. py:attribute:: content: str
      Text content / question content

   .. py:attribute:: answer: Optional[str]
      Answer content for Q&A mode

   .. py:attribute:: keywords: Optional[list[str]]
      Optional keywords

.. py:class:: KnowledgeDocumentData

   Detailed document information.

   .. py:attribute:: id: str
   .. py:attribute:: name: str
   .. py:attribute:: data_source_type: str
   .. py:attribute:: indexing_status: DocumentIndexingStatuses
   .. py:attribute:: tokens: int
   .. py:attribute:: segment_count: int
   .. py:attribute:: average_segment_length: int
   .. py:attribute:: hit_count: int
   .. py:attribute:: display_status: Literal["queuing", "paused", "indexing", "error", "available", "disabled", "archived"]

Hit
--- 

Dataset Settings
----------------

The Knowledge Dataset Settings API allows you to manage dataset configurations including retrieval methods, reranking settings, and permissions.

Basic Usage
~~~~~~~~~~~

.. code-block:: python

    from dify_user_client import DifyClient
    from dify_user_client.knowledge import DatasetPermissionEnum, RetrievalMethod

    # Initialize client
    client = DifyClient("YOUR_API_KEY")
    
    # Get dataset
    dataset = client.knowledge.get_dataset("dataset_id")
    
    # Get current settings
    settings = dataset.settings
    
    # Update settings
    dataset.update_settings(
        name="Updated Dataset",
        description="New description",
        permission=DatasetPermissionEnum.ALL_TEAM,
        retrieval_model={
            "search_method": RetrievalMethod.HYBRID_SEARCH,
            "weights": {
                "vector_setting": {"vector_weight": 0.7},
                "keyword_setting": {"keyword_weight": 0.3}
            }
        }
    )

Settings Configuration
~~~~~~~~~~~~~~~~~~~~~~

Retrieval Methods
'''''''''''''''''

The API supports three retrieval methods:

- ``SEMANTIC_SEARCH``: Uses embeddings for semantic similarity search
- ``FULL_TEXT_SEARCH``: Uses keyword-based search
- ``HYBRID_SEARCH``: Combines both semantic and keyword search

.. code-block:: python

    # Configure semantic search
    dataset.update_settings(
        retrieval_model={
            "search_method": RetrievalMethod.SEMANTIC_SEARCH,
            "weights": {
                "vector_setting": {"vector_weight": 1.0},
                "keyword_setting": {"keyword_weight": 0.0}
            }
        }
    )

Permissions
'''''''''''

Dataset access can be controlled with the following permission levels:

- ``ONLY_ME``: Only the creator can access
- ``ALL_TEAM``: All team members can access
- ``PARTIAL_TEAM``: Selected team members can access

.. code-block:: python

    # Update permissions
    dataset.update_settings(
        permission=DatasetPermissionEnum.ALL_TEAM
    )

Settings Properties
~~~~~~~~~~~~~~~~~~~

The dataset settings object includes the following properties:

- ``id``: Dataset identifier
- ``name``: Dataset name
- ``description``: Optional dataset description
- ``permission``: Access permission level
- ``indexing_technique``: "high_quality" or "economy"
- ``retrieval_model_dict``: Retrieval configuration
  - ``search_method``: Search method to use
  - ``weights``: Weight configuration for hybrid search
  - ``top_k``: Number of results to return
  - ``score_threshold``: Minimum score threshold
- ``embedding_model``: Name of the embedding model
- ``embedding_model_provider``: Provider of the embedding model