Keyword Search API - Page Level

The page-level keyword search API searches document content using the search options you specify and returns the matching pages.

POST https://backend.alli.ai/webapi/keyword_search/pages

Getting the API KEY

Please provide your API key in the request header API-KEY. Your API key can be found in your dashboard Settings menu, under the General tab.

Request

Provide your API KEY in the request header API-KEY.

Name | Type | Description
API-KEY* | string | Your cognitive search API key, found in your dashboard Settings menu under the General tab.

Request Body

Name | Type | Description
topN | int | Number of results to return (default: 10, maximum: 100).
node* | SearchNode | Search data model describing the keyword search condition.
cursor | string | Pagination cursor.
hashtags | HashtagNode | Search data model describing the hashtags to include.
excludingHashtags | HashtagNode | Search data model describing the hashtags to exclude.
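
As a rough illustration of how the header and body fields above fit together, the following Python sketch builds a request without sending it, using only the standard library. The helper name build_request and its keyword arguments are hypothetical, and the lower bound of 1 for topN is an assumption: this page only states the default and the maximum.

```python
import json
import urllib.request

API_URL = "https://backend.alli.ai/webapi/keyword_search/pages"


def build_request(api_key, node, top_n=10, cursor=None,
                  hashtags=None, excluding_hashtags=None):
    """Build (but do not send) a POST request for the page-level search.

    Hypothetical helper: only the field names come from this page.
    """
    # Upper bound is documented; the lower bound of 1 is an assumption.
    if not 1 <= top_n <= 100:
        raise ValueError("topN must be between 1 and 100")

    body = {"topN": top_n, "node": node}
    # Optional fields are only sent when the caller provides them.
    if cursor is not None:
        body["cursor"] = cursor
    if hashtags is not None:
        body["hashtags"] = hashtags
    if excluding_hashtags is not None:
        body["excludingHashtags"] = excluding_hashtags

    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "API-KEY": api_key},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request("YOUR_API_KEY",
                        {"nodeType": "wildcard", "input": "hello*"}, top_n=2)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send the request; the sketch stops before that step so the payload can be inspected offline.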

Search Node

Each entry below lists the node name, its type (JSON shape), and a description.

WildCardNode

{
  "nodeType": "wildcard",
  "input": "string"
}

Searches for pages containing the input keyword, which may include a wildcard. Returns the start page if the phrase spans multiple pages.

ExactlyPhraseNode

{
  "nodeType": "exactly_phrase",
  "input": "string"
}

Searches for pages containing the input phrase exactly. Returns the start page if the phrase spans multiple pages.

WithinXWordsNode

{
  "nodeType": "within_x_words",
  "input1": "string",
  "input2": "string",
  "distance": int
}

Searches for pages where the input1 keyword and the input2 keyword are separated by at most distance words. Returns the pages containing input1.

ExcludeKeywordNode

{
  "nodeType": "exclude_keyword",
  "input": "string"
}

Searches for pages that do not contain the input keyword. Combine this SearchNode with other SearchNodes using AND and OR nodes.

AndNode

{
  "nodeType": "and",
  "nodes": [SearchNode]
}

Searches for pages that appear in the results of all nodes.

Requirement: nodes must contain at least two nodes.

OrNode

{
  "nodeType": "or",
  "nodes": [SearchNode]
}

Searches for pages that appear in the results of any node.

Requirement: nodes must contain at least two nodes.
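
The node types above nest, so a compound query is a small tree. The Python sketch below shows one way to assemble such trees as plain dictionaries. All function names (wildcard, exactly_phrase, within_x_words, exclude_keyword, and_, or_) are hypothetical helpers for illustration; only the JSON field names and the two-node requirement come from this page.

```python
import json


def wildcard(text):
    return {"nodeType": "wildcard", "input": text}


def exactly_phrase(text):
    return {"nodeType": "exactly_phrase", "input": text}


def within_x_words(first, second, distance):
    return {"nodeType": "within_x_words",
            "input1": first, "input2": second, "distance": distance}


def exclude_keyword(text):
    return {"nodeType": "exclude_keyword", "input": text}


def _combine(node_type, nodes):
    # Both AND and OR nodes require at least two child nodes.
    if len(nodes) < 2:
        raise ValueError(f"'{node_type}' needs at least two nodes")
    return {"nodeType": node_type, "nodes": list(nodes)}


def and_(*nodes):
    return _combine("and", nodes)


def or_(*nodes):
    return _combine("or", nodes)


if __name__ == "__main__":
    # (hello* OR "good morning") AND NOT draft
    query = and_(
        or_(wildcard("hello*"), exactly_phrase("good morning")),
        exclude_keyword("draft"),
    )
    print(json.dumps(query, indent=2))
```

The resulting dictionary would go in the node field of the request body.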

Hashtag Node

Name | Type | Description
hashtags | array[string] | Hashtags to filter by. Only documents with these hashtags are searched. You can add multiple hashtags separated by a comma.
combinedHashtags | array[array[string]] | Groups of hashtags that define the search scope, based on the hashtags of the documents you uploaded to the dashboard. Note: if "combinedHashtags" is used, you must keep the "hashtags" parameter empty.
hashtagsOperator | string | Either and or or. With and, multiple hashtags in the 'hashtags' filter are combined with AND logic; with or, they are combined with OR logic.
combinedHashtagsOperator | string | Either and or or. Choose and if you want both of the combined hashtags in the results. Choose or if you want at least one of the combined hashtags in the results.
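
The constraints in this table can be checked on the client before a request is sent. The following Python sketch is one possible way to do that; the function name hashtag_node and its parameters are hypothetical. It enforces only what this page states: "hashtags" stays empty when "combinedHashtags" is used, and each operator is either and or or.

```python
def hashtag_node(hashtags=None, combined_hashtags=None,
                 hashtags_operator=None, combined_hashtags_operator=None):
    """Build a HashtagNode dictionary (hypothetical helper)."""
    # The two filters are mutually exclusive according to the note above.
    if hashtags and combined_hashtags:
        raise ValueError(
            "'hashtags' must be empty when 'combinedHashtags' is used")

    for name, op in (("hashtagsOperator", hashtags_operator),
                     ("combinedHashtagsOperator", combined_hashtags_operator)):
        if op is not None and op not in ("and", "or"):
            raise ValueError(f"{name} must be 'and' or 'or'")

    node = {}
    if hashtags:
        node["hashtags"] = list(hashtags)
    if combined_hashtags:
        node["combinedHashtags"] = [list(group) for group in combined_hashtags]
    if hashtags_operator is not None:
        node["hashtagsOperator"] = hashtags_operator
    if combined_hashtags_operator is not None:
        node["combinedHashtagsOperator"] = combined_hashtags_operator
    return node


if __name__ == "__main__":
    print(hashtag_node(hashtags=["test1", "test2"], hashtags_operator="or"))
```

The same helper can produce the value for either the hashtags or the excludingHashtags field of the request body.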

Example

Request Example

Replace YOUR_API_KEY in the example below with your own API key. See the Getting the API KEY section.

curl https://backend.alli.ai/webapi/keyword_search/pages \
-d '{"topN": 2, "node": { "nodeType": "wildcard", "input": "hello*" }}' \
-H "Content-Type: application/json" \
-H "API-KEY: YOUR_API_KEY"

Response Example

{
  "projectId": "PROJECT_ID",
  "cursor": "CURSOR_STRING",
  "results": [
    {
      "knowledgeBaseId": "KNOWLEDGE_BASE_ID",
      "title": "Allganize Document",
      "pageNo": 3
    },
    {
      "knowledgeBaseId": "KNOWLEDGE_BASE_ID",
      "title": "What is the AI",
      "pageNo": 1
    }
  ]
}
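
A response of this shape can be unpacked into typed records. The Python sketch below is a minimal illustration, assuming the response contains only the fields shown in the example; the names PageHit and parse_response are hypothetical, and the sample payload is a placeholder, not real output.

```python
import json
from typing import List, NamedTuple, Optional, Tuple


class PageHit(NamedTuple):
    knowledge_base_id: str
    title: str
    page_no: int


def parse_response(raw: str) -> Tuple[List[PageHit], Optional[str]]:
    """Return the page hits and the pagination cursor (None if absent)."""
    data = json.loads(raw)
    hits = [
        PageHit(item["knowledgeBaseId"], item["title"], item["pageNo"])
        for item in data.get("results", [])
    ]
    return hits, data.get("cursor")


if __name__ == "__main__":
    # Placeholder payload mirroring the fields of the response example.
    sample = json.dumps({
        "projectId": "PROJECT_ID",
        "cursor": "CURSOR_STRING",
        "results": [
            {"knowledgeBaseId": "KNOWLEDGE_BASE_ID",
             "title": "Allganize Document", "pageNo": 3},
        ],
    })
    hits, cursor = parse_response(sample)
    for hit in hits:
        print(hit.title, hit.page_no)
    print("next cursor:", cursor)
```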

Pagination Example

This example pages through the list of pages that contain 'hello*'.

Replace YOUR_API_KEY in the example below with your own API key. See the Getting the API KEY section.

1. Request a page-level keyword search using the wildcard node type.

curl https://backend.alli.ai/webapi/keyword_search/pages \
-d '{"topN": 2, "node": { "nodeType": "wildcard", "input": "hello*" }}' \
-H "Content-Type: application/json" \
-H "API-KEY: YOUR_API_KEY"

The response returns cursor and results.

{
  "projectId": "PROJECT_ID",
  "cursor": "CURSOR_STRING",
  "results": [
    {
      "knowledgeBaseId": "KNOWLEDGE_BASE_ID",
      "title": "Allganize Document",
      "pageNo": 3
    },
    {
      "knowledgeBaseId": "KNOWLEDGE_BASE_ID",
      "title": "What is the AI",
      "pageNo": 1
    }
  ]
}

2. If the response contains a cursor, fetch the next results by passing it in the cursor key.

curl https://backend.alli.ai/webapi/keyword_search/pages \
-d '{"topN": 2, "node": { "nodeType": "wildcard", "input": "hello*" }, "cursor": "CURSOR_STRING"}' \
-H "Content-Type: application/json" \
-H "API-KEY: YOUR_API_KEY"

If the response contains a cursor again, repeat step 2.

3. If the response contains no cursor, the search is finished.
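
The steps above can be sketched as a loop. In the Python sketch below, iter_pages and http_fetcher are hypothetical names. The loop treats a missing or empty cursor as the stop condition, which is an assumption about how "no cursor" appears in the response. The transport is passed in as a function so the loop can be exercised without network access.

```python
import json
import urllib.request

API_URL = "https://backend.alli.ai/webapi/keyword_search/pages"


def iter_pages(fetch_page, node, top_n=10):
    """Yield every result, following the cursor until none is returned.

    fetch_page takes a request body (dict) and returns the decoded
    response (dict).
    """
    cursor = None
    while True:
        body = {"topN": top_n, "node": node}
        if cursor:
            # The cursor is a top-level field of the request body.
            body["cursor"] = cursor
        response = fetch_page(body)
        yield from response.get("results", [])
        cursor = response.get("cursor")
        if not cursor:  # step 3: no cursor, so the search is finished
            break


def http_fetcher(api_key):
    """Return a fetch_page function that POSTs to the API (not run here)."""
    def fetch_page(body):
        req = urllib.request.Request(
            API_URL,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json", "API-KEY": api_key},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch_page
```

A call such as iter_pages(http_fetcher("YOUR_API_KEY"), {"nodeType": "wildcard", "input": "hello*"}, top_n=2) would then walk all result pages lazily.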

Hashtag Example

This example searches for a list of pages that contain 'hello*' with hashtag conditions.

  • Search for pages that have the hashtag test1

{
    "topN": 10,
    "node": {
        "nodeType": "wildcard",
        "input": "hello*"
    },
    "hashtags": {
        "hashtags": ["test1"]
    }
}
  • Search for pages that have the hashtag test1 or test2

{
    "topN": 10,
    "node": {
        "nodeType": "wildcard",
        "input": "hello*"
    },
    "hashtags": {
        "hashtags": ["test1", "test2"],
        "hashtagsOperator": "or"
    }
}
  • Search for pages that have the hashtags test1 and test2

{
    "topN": 10,
    "node": {
        "nodeType": "wildcard",
        "input": "hello*"
    },
    "hashtags": {
        "hashtags": ["test1", "test2"],
        "hashtagsOperator": "and"
    }
}
  • Search for pages that have the hashtags (( test1 and test2 ) or ( test3 and test4 ))

{
    "topN": 10,
    "node": {
        "nodeType": "wildcard",
        "input": "hello*"
    },
    "hashtags": {
        "combinedHashtags": [["test1", "test2"], ["test3", "test4"]],
        "hashtagsOperator": "or",
        "combinedHashtagsOperator": "and"
    }
}
  • Search for pages that have the hashtags (( test1 or test2 ) and ( test3 or test4 ))

{
    "topN": 10,
    "node": {
        "nodeType": "wildcard",
        "input": "hello*"
    },
    "hashtags": {
        "combinedHashtags": [["test1", "test2"], ["test3", "test4"]],
        "hashtagsOperator": "and",
        "combinedHashtagsOperator": "or"
    }
}
  • Search for pages without the hashtag test1

{
    "topN": 10,
    "node": {
        "nodeType": "wildcard",
        "input": "hello*"
    },
    "excludingHashtags": {
        "hashtags": ["test1"]
    }
}
  • Search for pages without the hashtag test1 or test2

{
    "topN": 10,
    "node": {
        "nodeType": "wildcard",
        "input": "hello*"
    },
    "excludingHashtags": {
        "hashtags": ["test1", "test2"],
        "hashtagsOperator": "or"
    }
}
  • Search for pages without the hashtag test1 and test2

{
    "topN": 10,
    "node": {
        "nodeType": "wildcard",
        "input": "hello*"
    },
    "excludingHashtags": {
        "hashtags": ["test1", "test2"],
        "hashtagsOperator": "and"
    }
}
  • Search for pages that have the hashtags (( test1 or test2 ) and ( test3 or test4 )) and do not have the hashtags test1 and test3:

{
    "topN": 10,
    "node": {
        "nodeType": "wildcard",
        "input": "hello*"
    },
    "hashtags": {
        "combinedHashtags": [["test1", "test2"], ["test3", "test4"]],
        "hashtagsOperator": "and",
        "combinedHashtagsOperator": "or"
    },
    "excludingHashtags": {
        "hashtags": ["test1", "test3"],
        "hashtagsOperator": "and"
    }
}

Limitation

Search Request

  • Exclude Search must be used in combination with Include Search.

  • Wildcard Search is supported for single words only.

    • Wildcards attached to phrases, such as "hello world*", are not supported.
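
These request limitations can be caught on the client before sending. The Python sketch below is one possible interpretation, not the server's validation logic. The name find_problems is hypothetical. It assumes that "phrase" means an input with more than one whitespace-separated word, and that "used in combination with Include Search" means an exclude_keyword node cannot be the root or be combined only with other exclude_keyword nodes.

```python
def find_problems(node, is_root=True):
    """Return a list of human-readable problems found in a SearchNode tree."""
    problems = []
    kind = node.get("nodeType")

    if kind == "exclude_keyword" and is_root:
        problems.append("exclude_keyword must be combined with an include node")
    elif kind == "wildcard" and len(node.get("input", "").split()) > 1:
        # Assumption: more than one word means the input is a phrase.
        problems.append("wildcard input must be a single word")
    elif kind in ("and", "or"):
        children = node.get("nodes", [])
        if len(children) < 2:
            problems.append(f"'{kind}' needs at least two nodes")
        if children and all(
                c.get("nodeType") == "exclude_keyword" for c in children):
            problems.append(f"'{kind}' contains only exclude_keyword nodes")
        for child in children:
            problems.extend(find_problems(child, is_root=False))
    return problems


if __name__ == "__main__":
    bad = {"nodeType": "wildcard", "input": "hello world*"}
    print(find_problems(bad))
```

An empty list means the sketch found nothing to flag; it does not guarantee that the API will accept the request.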

Search Result

  • Exactly Phrase and Within X Words queries can match text that spans at most two consecutive pages.

Search Quality

  • The presence of headers and footers may affect search results.

  • Unrecognized control characters in documents may cause search issues.

  • Documents with no word wrap between pages may not be properly searchable.

  • The index is managed in Elasticsearch with commonly recommended analyzers for English text, for example normalization of verb tenses (past tense, present tense). Because of these analyzers, some searches may return imprecise matches; in such cases, search using exactly_phrase instead.
