The Q&A service lets you create a Q&A Agentic Interface that uses static data sources: company website pages, product manuals, guidelines, FAQ pages, articles and so on.

You can define the following types of static corpuses in the dialog script:

  • Web corpus: retrieve information from website pages and PDF files available online
  • Text corpus: use plain text as an information source

Web corpus

To define a web corpus for your Q&A Agentic Interface, use the corpus() function.

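corpus({
    title: `HTTP corpus`,
    // Pages (or website folders) to retrieve information from
    urls: [
        `https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview`,
        `https://developer.mozilla.org/en-US/docs/Web/HTTP/Messages`,
        `https://developer.mozilla.org/en-US/docs/Web/HTTP/Session`],
    // Credentials for resources behind basic authentication
    auth: {username: 'johnsmith', password: 'password'},
    // Resources that must be indexed (URLs or RegEx rules)
    include: [/.*\.pdf/],
    // Resources to skip during indexing (URLs or RegEx rules)
    exclude: [`https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Evolution_of_HTTP`],
    query: transforms.queries,       // processes user queries
    transforms: transforms.answers,  // formats the corpus output
    depth: 1,       // crawl depth; 0 means only the page itself
    maxPages: 5,    // maximum number of pages and files to index
    priority: 0,    // higher values make the corpus more relevant
});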

Corpus parameters

| Name | Type | Required/Optional | Description |
|---|---|---|---|
| title | string | Optional | Corpus title. |
| urls | string array | Required | List of URLs from which information must be retrieved. You can define URLs of website folders and pages. |
| auth | JSON object | Optional | Credentials to access resources that require basic authentication. |
| include | string array | Optional | Resources that must be indexed. You can define an array of URLs or use RegEx to specify a rule. |
| exclude | string array | Optional | Resources to be excluded from indexing. You can define an array of URLs or use RegEx to specify a rule. |
| query | function | Optional | Transforms function used to process user queries. |
| transforms | function | Optional | Transforms function used to format the corpus output. |
| depth | integer | Optional | Crawl depth for web and PDF resources. The minimum value is 0 (crawling only the page content without linked resources). |
| maxPages | integer | Optional | Maximum number of pages and files to index. If not set, only 1 page with the defined URL will be indexed. |
| priority | integer | Optional | Priority level assigned to the corpus. Corpuses with higher priority are considered more relevant when user requests are processed. |
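
The constraints in the parameter table can be checked before a configuration is passed to corpus(). The following is a minimal sketch, not part of the Q&A service API: validateCorpusConfig is a hypothetical helper, and the rule that maxPages must be at least 1 is an assumption (the table only says "integer").

```javascript
// Hypothetical helper, not part of the Q&A service API.
// Checks a web corpus configuration against the parameter table.
function validateCorpusConfig(config) {
    const errors = [];

    // urls is the only required parameter: a non-empty array of strings.
    if (!Array.isArray(config.urls) || config.urls.length === 0) {
        errors.push('urls: required, must be a non-empty array');
    } else if (!config.urls.every((u) => typeof u === 'string')) {
        errors.push('urls: every entry must be a string');
    }

    if (config.title !== undefined && typeof config.title !== 'string') {
        errors.push('title: must be a string');
    }

    // depth: integer with a minimum value of 0.
    if (config.depth !== undefined &&
        !(Number.isInteger(config.depth) && config.depth >= 0)) {
        errors.push('depth: must be an integer >= 0');
    }

    // maxPages: assumed here to be a positive integer.
    if (config.maxPages !== undefined &&
        !(Number.isInteger(config.maxPages) && config.maxPages >= 1)) {
        errors.push('maxPages: must be an integer >= 1');
    }

    if (config.priority !== undefined && !Number.isInteger(config.priority)) {
        errors.push('priority: must be an integer');
    }

    return errors;
}

const config = {
    title: 'HTTP corpus',
    urls: ['https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview'],
    depth: 1,
    maxPages: 5,
    priority: 0,
};

const errors = validateCorpusConfig(config);
if (errors.length === 0) {
    // corpus(config); // call this inside the dialog script, where corpus() is available
    console.log('Configuration looks valid');
} else {
    console.log(errors.join('\n'));
}
```

Returning a list of messages instead of throwing on the first problem makes it easier to fix several configuration mistakes in one pass.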

Note:

  • Make sure the websites and pages you define in the corpus() function are not protected from crawling. The Q&A service cannot retrieve content from such resources.
  • The indexing process may take some time. To check the progress and results, use the Alan AI Studio logs.
  • The maximum number of indexed pages depends on your pricing plan.
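
Because indexing can take time, it may help to try include and exclude RegEx rules locally before adding them to corpus(). The sketch below uses only standard JavaScript RegExp behavior. How the Q&A service itself applies a rule to a URL (partial or full match) is not stated here, and matchingUrls, the anchored rule, and the example.com URLs are hypothetical illustrations.

```javascript
// Hypothetical helper: returns the URLs that a RegEx rule matches
// when tested with the standard RegExp.prototype.test method.
function matchingUrls(rule, urls) {
    return urls.filter((url) => rule.test(url));
}

// The include rule used in the web corpus example.
const includeRule = /.*\.pdf/;
// Assumed alternative: matches only URLs that end in ".pdf".
const anchoredRule = /\.pdf$/;

const candidates = [
    'https://example.com/manuals/setup.pdf',
    'https://example.com/manuals/setup.pdf.html',
    'https://example.com/faq',
];

// The unanchored rule matches ".pdf" anywhere in the URL.
console.log(matchingUrls(includeRule, candidates));
// The anchored rule matches only the first URL.
console.log(matchingUrls(anchoredRule, candidates));
```

With plain test(), the unanchored pattern also matches setup.pdf.html, so an anchored pattern may be closer to the intent when only PDF files should be selected.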

Text corpus

To define a text corpus for the Q&A Agentic Interface, add plain text strings or Markdown-formatted text to the corpus() function:

corpus({
    title: `async/await corpus`,
    text: `
        # Understanding **async/await** in JavaScript

        **async/await** is a feature in JavaScript that makes working with asynchronous code easier and more readable.

        ## How Does **async/await** Work?

        ### **async** Keyword:
        - The **async** keyword is used to declare a function as asynchronous.
        - An **async** function returns a **Promise**.

        ### **await** Keyword:
        - The **await** keyword can only be used inside an **async** function.
        - It pauses the execution until the **Promise** is settled.
    `,
    query: transforms.queries,
    transforms: transforms.answers,
    priority: 0,
});
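
The text in the example above is indented inside a template literal, so every line reaches corpus() with leading spaces. In CommonMark, lines indented by four or more spaces are read as an indented code block. Whether the Q&A service trims this indentation is not stated here. If you prefer to pass unindented Markdown, a small helper along these lines could be used. dedent is a hypothetical function, not part of the service.

```javascript
// Hypothetical helper: removes the indentation shared by all non-empty
// lines, then trims leading and trailing blank space.
function dedent(text) {
    const lines = text.split('\n');
    const indents = lines
        .filter((line) => line.trim().length > 0)
        .map((line) => line.match(/^[ \t]*/)[0].length);
    const common = indents.length > 0 ? Math.min(...indents) : 0;
    return lines
        .map((line) => line.slice(common))
        .join('\n')
        .trim();
}

const markdown = dedent(`
        # Understanding **async/await** in JavaScript

        - The **async** keyword is used to declare a function as asynchronous.
`);

console.log(markdown);
// The result could then be passed as the text parameter:
// corpus({ title: `async/await corpus`, text: markdown });
```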