The Q&A service lets you create a Q&A Agentic Interface that uses static data sources: company website pages, product manuals, guidelines, FAQ pages, articles and so on.
You can define the following types of static corpuses in the dialog script:
- Web corpus: retrieve information from website pages and PDF files available online
- Text corpus: use plain text as an information source
Web corpus
To define a web corpus for your Q&A Agentic Interface, use the corpus() function.
```javascript
corpus({
    title: `HTTP corpus`,
    // Pages to start indexing from
    urls: [
        `https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview`,
        `https://developer.mozilla.org/en-US/docs/Web/HTTP/Messages`,
        `https://developer.mozilla.org/en-US/docs/Web/HTTP/Session`,
    ],
    // Basic authentication credentials (placeholder values)
    auth: {username: 'johnsmith', password: 'password'},
    // Always index PDF files
    include: [/.*\.pdf/],
    // Skip this page
    exclude: [`https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Evolution_of_HTTP`],
    query: transforms.queries,
    transforms: transforms.answers,
    depth: 1,     // follow links one level deep
    maxPages: 5,  // index at most 5 pages and files
    priority: 0,
});
```

Corpus parameters
| Name | Type | Required/Optional | Description |
|---|---|---|---|
| title | string | Optional | Corpus title. |
| urls | string array | Required | List of URLs from which information must be retrieved. You can define URLs of website folders and pages. |
| auth | object | Optional | Credentials (username and password) to access resources that require basic authentication. |
| include | array of strings or RegExes | Optional | Resources that must always be indexed. You can define an array of URLs or use RegEx to specify a rule. |
| exclude | array of strings or RegExes | Optional | Resources to be excluded from indexing. You can define an array of URLs or use RegEx to specify a rule. |
| query | function | Optional | Transforms function used to process user queries. |
| transforms | function | Optional | Transforms function used to format the corpus output. |
| depth | integer | Optional | Crawl depth for web and PDF resources. The minimum value is 0 (only the page content is crawled, without linked resources). |
| maxPages | integer | Optional | Maximum number of pages and files to index. If not set, only one page, the one at the defined URL, is indexed. |
| priority | integer | Optional | Priority level assigned to the corpus. Corpuses with higher priority are considered more relevant when user requests are processed. |
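How depth, maxPages and exclude bound the crawl can be pictured with a small model. The sketch below is a hypothetical breadth-first traversal over an in-memory link graph, not Alan AI's crawler. The helper names (`matchesRule`, `planCrawl`), the example.com URLs, the default depth of 0 and treating maxPages as one cap shared by all start URLs are assumptions made for illustration. include is left out because its interaction with depth is not specified above.

```javascript
// Hypothetical model only: NOT how Alan AI crawls, just an illustration of
// how depth, maxPages and exclude could limit the set of indexed pages.

// A rule is either an exact URL string or a RegExp.
function matchesRule(url, rule) {
  return rule instanceof RegExp ? rule.test(url) : url === rule;
}

function matchesAny(url, rules) {
  return rules.some((rule) => matchesRule(url, rule));
}

// Breadth-first traversal: level 0 holds the start URLs themselves.
// Assumed defaults: depth 0, maxPages 1 (one page if maxPages is not set).
function planCrawl(links, urls, { depth = 0, maxPages = 1, exclude = [] } = {}) {
  const indexed = [];
  const seen = new Set();
  let frontier = urls.slice();
  for (let level = 0; level <= depth && frontier.length > 0; level++) {
    const next = [];
    for (const url of frontier) {
      if (indexed.length >= maxPages) return indexed; // page budget used up
      if (seen.has(url) || matchesAny(url, exclude)) continue;
      seen.add(url);
      indexed.push(url);
      next.push(...(links[url] || [])); // linked resources for the next level
    }
    frontier = next;
  }
  return indexed;
}

// Made-up link graph: page -> pages and files it links to.
const links = {
  'https://example.com/a': ['https://example.com/b', 'https://example.com/manual.pdf'],
  'https://example.com/b': ['https://example.com/c'],
};

console.log(planCrawl(links, ['https://example.com/a'], { depth: 1, maxPages: 5, exclude: [/\.pdf$/] }));
// → ['https://example.com/a', 'https://example.com/b']
```

In this model, an excluded page is never visited, so pages reachable only through it are not indexed either; whether the real service behaves the same way is not stated in this section.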
Note:
- Make sure the websites and pages you define in the corpus() function are not protected from crawling. The Q&A service cannot retrieve content from such resources.
- The indexing process may take some time. To check the progress and results, use the Alan AI Studio logs.
- The maximum number of indexed pages depends on your pricing plan.
Text corpus
To define a text corpus for the Q&A Agentic Interface, pass plain text or Markdown-formatted text in the text parameter of the corpus() function:
```javascript
corpus({
    title: `JavaScript async/await corpus`,
    // Markdown content used as the information source
    text: `
# Understanding **async/await** in JavaScript
**async/await** is a feature in JavaScript that makes working with asynchronous code easier and more readable.
## How Does **async/await** Work?
### **async** Keyword:
- The **async** keyword is used to declare a function as asynchronous.
- An **async** function returns a **Promise**.
### **await** Keyword:
- The **await** keyword can only be used inside an **async** function or at the top level of an ES module.
- It pauses the execution of the **async** function until the **Promise** is settled.
`,
    query: transforms.queries,
    transforms: transforms.answers,
    priority: 0,
});
```
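Because the text parameter takes a string, a text corpus can also be assembled from structured data, such as a list of FAQ entries. The sketch below uses a hypothetical `faqToMarkdown()` helper and made-up sample entries. The corpus() call is guarded with `typeof` on the assumption that corpus() is provided only by the Alan AI dialog script runtime, so the snippet also runs elsewhere (for example, in Node.js) without calling it.

```javascript
// Hypothetical helper: builds a Markdown document with a "#" title and
// one "##" section per FAQ entry.
function faqToMarkdown(title, entries) {
  const sections = entries.map(({ question, answer }) => `## ${question}\n${answer}`);
  return [`# ${title}`, ...sections].join('\n\n');
}

// Sample entries (made-up data).
const faq = [
  { question: 'How do I contact support?', answer: 'Use the **Contact us** form on the website.' },
  { question: 'Where can I find the user manual?', answer: 'The manual is available on the **Downloads** page.' },
];

const text = faqToMarkdown('Product FAQ', faq);

// corpus() is assumed to exist only in the Alan AI dialog script runtime;
// outside it, typeof returns 'undefined' and the call is skipped.
if (typeof corpus === 'function') {
  corpus({ title: `Product FAQ corpus`, text: text, priority: 0 });
}
```

Generating the Markdown from data keeps the corpus in sync with a single source of entries instead of a hand-edited string.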