API endpoint¶
Running API server¶
As mentioned in the tutorial, you can run the API server as follows:
from autorag.deploy import ApiRunner
import nest_asyncio
nest_asyncio.apply()
runner = ApiRunner.from_yaml('your/path/to/pipeline.yaml', project_dir='your/project/directory')
runner.run_api_server()
or
from autorag.deploy import ApiRunner
import nest_asyncio
nest_asyncio.apply()
runner = ApiRunner.from_trial_folder('/your/path/to/trial_dir')
runner.run_api_server()
or, from the command line:
autorag run_api --trial_dir /trial/dir/0 --host 0.0.0.0 --port 8000
Use NGrok Tunnel for public access¶
To access the API server from the public internet, you can use the ngrok tunnel service. It automatically creates an ngrok tunnel to your local server.
The public URL appears in the logs, as shown below:
INFO [api.py:199] >> Public API URL: https://8a31-14-52-132-205.ngrok-free.app
This URL points to your local server, so use it as the host in your requests.
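As a small illustration of using the public URL as the host, the sketch below joins a base URL with an endpoint path. The helper name `endpoint_url` is hypothetical and not part of AutoRAG; the URL is the one from the example log and will differ for every tunnel.

```python
def endpoint_url(base_url: str, path: str) -> str:
    """Join a base URL and an endpoint path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


# The ngrok URL printed in the server log is used as the host (example value).
PUBLIC_URL = "https://8a31-14-52-132-205.ngrok-free.app"

run_url = endpoint_url(PUBLIC_URL, "/v1/run")
version_url = endpoint_url(PUBLIC_URL, "/version")
print(run_url)
print(version_url)
```

The same helper works for a local address such as `http://localhost:8000`, so client code does not need to change when the tunnel is switched on or off.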
Endpoints¶
1. /v1/run (POST)¶
Summary: Run a query and get generated text with retrieved passages.
Request Body:
  Content Type: application/json
  Schema:
    Properties:
      query (string, required): The query string.
      result_column (string, optional): The result column name. Default is generated_texts.
Responses:
  200 OK:
    Content Type: application/json
    Schema:
      Properties:
        result (string or array of strings): The result text or list of texts.
        retrieved_passage (array of objects): List of retrieved passages.
          Properties:
            content (string): The content of the passage.
            doc_id (string): Document ID.
            score (string): The relevance score of the retrieved passage.
            filepath (string, nullable): File path.
            file_page (integer, nullable): File page number.
            start_idx (integer, nullable): Start index.
            end_idx (integer, nullable): End index.
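The request and response shapes of /v1/run can be sketched in Python as follows. This is a minimal illustration, not AutoRAG code: the names `RetrievedPassage`, `build_run_payload`, and `parse_run_response` are hypothetical, and the sample body is made up to match the documented schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class RetrievedPassage:
    content: str
    doc_id: str
    score: Any  # left untyped here; the schema documents it as a string
    filepath: Optional[str] = None
    file_page: Optional[int] = None
    start_idx: Optional[int] = None
    end_idx: Optional[int] = None


def build_run_payload(query: str, result_column: str = "generated_texts") -> Dict[str, str]:
    """Build the JSON body for POST /v1/run."""
    return {"query": query, "result_column": result_column}


def parse_run_response(body: Dict[str, Any]) -> Tuple[List[str], List[RetrievedPassage]]:
    """Normalize `result` to a list of texts and wrap each retrieved passage."""
    result = body["result"]
    texts = [result] if isinstance(result, str) else list(result)
    passages = [
        RetrievedPassage(
            content=p["content"],
            doc_id=p["doc_id"],
            score=p["score"],
            filepath=p.get("filepath"),
            file_page=p.get("file_page"),
            start_idx=p.get("start_idx"),
            end_idx=p.get("end_idx"),
        )
        for p in body.get("retrieved_passage", [])
    ]
    return texts, passages


# Made-up response body that follows the documented schema.
sample_body = {
    "result": "AI is transforming industries.",
    "retrieved_passage": [
        {"content": "Artificial Intelligence is transforming industries.",
         "doc_id": "doc123", "score": "0.98", "filepath": None,
         "file_page": None, "start_idx": None, "end_idx": None},
    ],
}
texts, passages = parse_run_response(sample_body)
print(texts, passages[0].doc_id)
```

Normalizing `result` to a list lets calling code treat the single-string and array cases the same way.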
2. /v1/retrieve (POST)¶
This API endpoint allows developers to retrieve documents based on a specified query. It skips the generator and prompt maker and returns only the retrieved passages.
The request must include a JSON object with the following structure:
{
  "query": "your query string here"
}
Parameters¶
query (string, required): The search string used to retrieve documents.
Example Request¶
{
  "query": "latest trends in AI"
}
Success Response¶
HTTP Status Code: 200 OK
Response Body¶
Properties¶
passages (array): An array of documents retrieved based on the query.
doc_id (string): The unique identifier for each document.
content (string): The content of the retrieved document.
score (number, float): The relevance score of the retrieved document.
filepath (string, optional): The file path of the document.
file_page (integer, optional): The page number of the document.
start_idx (integer, optional): The start index of the retrieved passage from the parsed data.
end_idx (integer, optional): The end index of the retrieved passage from the parsed data.
Example Response¶
{
  "passages": [
    {
      "doc_id": "doc123",
      "content": "Artificial Intelligence is transforming industries.",
      "score": 0.98,
      "filepath": "path/to/file",
      "file_page": 2,
      "start_idx": 100,
      "end_idx": 150
    },
    {
      "doc_id": "doc456",
      "content": "The future of AI includes advancements in machine learning.",
      "score": 0.92,
      "filepath": null,
      "file_page": null,
      "start_idx": null,
      "end_idx": null
    }
  ]
}
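A client for this endpoint could look like the sketch below. It uses the standard library `urllib` instead of `requests` so that it has no dependencies; the function names `retrieve`, `best_passage`, and `describe_source` are hypothetical helpers, not part of AutoRAG. `retrieve` needs a running server, so the example only exercises the helpers on the example response.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def retrieve(base_url: str, query: str, timeout: float = 30.0) -> List[Dict[str, Any]]:
    """POST the query to /v1/retrieve and return the list under "passages"."""
    request = urllib.request.Request(
        f"{base_url}/v1/retrieve",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["passages"]


def best_passage(passages: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the passage with the highest score, or None for an empty list."""
    return max(passages, key=lambda p: p["score"]) if passages else None


def describe_source(passage: Dict[str, Any]) -> str:
    """Format the source of a passage, tolerating null optional fields."""
    if passage.get("filepath") is None:
        return f"{passage['doc_id']} (no file information)"
    page = passage.get("file_page")
    suffix = f", page {page}" if page is not None else ""
    return f"{passage['doc_id']} ({passage['filepath']}{suffix})"


# The passages from the example response, written as Python values.
example_passages = [
    {"doc_id": "doc123", "content": "Artificial Intelligence is transforming industries.",
     "score": 0.98, "filepath": "path/to/file", "file_page": 2,
     "start_idx": 100, "end_idx": 150},
    {"doc_id": "doc456", "content": "The future of AI includes advancements in machine learning.",
     "score": 0.92, "filepath": None, "file_page": None,
     "start_idx": None, "end_idx": None},
]
for passage in example_passages:
    print(describe_source(passage))
```

Because `filepath`, `file_page`, `start_idx`, and `end_idx` may be null, client code should check them before use, as `describe_source` does.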
3. /v1/stream (POST)¶
Summary: Stream generated text with retrieved passages.
Description: This endpoint streams the generated text line by line. The retrieved_passage is sent first, followed by the result streamed incrementally.
Request Body:
  Content Type: application/json
  Schema:
    Properties:
      query (string, required): The query string.
      result_column (string, optional): The result column name. Default is generated_texts.
Responses:
  200 OK:
    Content Type: text/event-stream
    Schema:
      Properties:
        type (generated_text or retrieved_passage): If it is generated_text, you can see only the generated text. If it is retrieved_passage, you can see the retrieved passage and passage_index.
        generated_text (string): The generated text from the generator (LLM). The result of the RAG system.
        retrieved_passage (object): Retrieved passage.
          Properties:
            content (string): The content of the passage.
            doc_id (string): Document ID.
            score (string): The relevance score of the retrieved passage.
            filepath (string, nullable): File path.
            file_page (integer, nullable): File page number.
            start_idx (integer, nullable): Start index.
            end_idx (integer, nullable): End index.
        passage_index (integer): Index of the retrieved passage.
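Handling the streamed events can be sketched as below. This assumes each chunk holds one or more complete JSON objects placed back to back, and that each generated_text event carries only the newly generated piece; the wire format is an assumption, not something stated here. `split_json_objects` and `collect_events` are hypothetical names, and the splitter is a stand-in built on the standard `json.JSONDecoder.raw_decode`, not AutoRAG's own decoder.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def split_json_objects(chunk: bytes) -> List[Dict[str, Any]]:
    """Decode every JSON object found back to back in one chunk."""
    text = chunk.decode("utf-8")
    decoder = json.JSONDecoder()
    objects = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        objects.append(obj)
    return objects


def collect_events(events: Iterable[Dict[str, Any]]) -> Tuple[str, List[Dict[str, Any]]]:
    """Separate retrieved passages from generated text pieces."""
    passages = []
    text_parts = []
    for event in events:
        if event["type"] == "retrieved_passage":
            passages.append(event["retrieved_passage"])
        else:
            text_parts.append(event["generated_text"])
    return "".join(text_parts), passages


# Made-up chunk: one passage event followed by two text events.
chunk = (
    b'{"type": "retrieved_passage", "retrieved_passage": {"doc_id": "doc123"}, "passage_index": 0}'
    b'{"type": "generated_text", "generated_text": "Hel"}\n'
    b'{"type": "generated_text", "generated_text": "lo"}'
)
text, passages = collect_events(split_json_objects(chunk))
print(text, passages)
```

A JSON object split across two chunks would raise `json.JSONDecodeError` in this sketch; a more defensive client would buffer the undecoded remainder until the next chunk arrives.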
4. /version (GET)¶
Summary: Get the API version.
Description: Returns the current version of the API as a string.
Responses:
  200 OK:
    Content Type: application/json
    Schema:
      Properties:
        version (string): The version of the API.
API client usage example¶
Below are Python sample code using the requests library and a curl command for each of the API endpoints described above.
Python Sample Code¶
First, ensure you have the requests library installed. You can install it using pip if you haven't already:
pip install requests
Here’s the Python client code for each endpoint:
import requests
from autorag.utils.util import decode_multiple_json_from_bytes

# Base URL of the API
BASE_URL = "http://example.com:8000"  # Replace with the actual base URL of the API


def run_query(query, result_column="generated_texts"):
    url = f"{BASE_URL}/v1/run"
    payload = {
        "query": query,
        "result_column": result_column
    }
    response = requests.post(url, json=payload)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()


def stream_query(query, result_column="generated_texts"):
    url = f"{BASE_URL}/v1/stream"
    payload = {
        "query": query,
        "result_column": result_column
    }
    with requests.Session() as session:
        response = session.post(url, json=payload, stream=True)
        retrieved_passages = []  # This will store retrieved passages
        # Check if the request was successful
        if response.status_code == 200:
            # Process the streaming response
            for i, chunk in enumerate(response.iter_content(chunk_size=None)):
                if chunk:
                    data_list = decode_multiple_json_from_bytes(chunk)
                    for data in data_list:
                        if data["type"] == "retrieved_passage":
                            retrieved_passages.append(data["retrieved_passage"])
                        else:
                            print(data["generated_text"], end="")  # Stream the generated texts
        else:
            print(f"Request failed with status code: {response.status_code}")
            print(f"Response content: {response.text}")


def get_version():
    url = f"{BASE_URL}/version"
    response = requests.get(url)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()


# Example usage
if __name__ == "__main__":
    # Run a query
    result = run_query("example query")
    print("Run Query Result:", result)
    # Stream a query
    print("Stream Query Result:")
    stream_query("example query")
    # Get API version
    version = get_version()
    print("API Version:", version)
curl Commands¶
Here are the equivalent curl commands for each endpoint:
/v1/run (POST)¶
curl -X POST "http://example.com/v1/run" \
-H "Content-Type: application/json" \
-d '{"query": "example query", "result_column": "generated_texts"}'
/v1/stream (POST)¶
curl -X POST "http://example.com/v1/stream" \
-H "Content-Type: application/json" \
-d '{"query": "example query", "result_column": "generated_texts"}' \
--no-buffer
/v1/retrieve (POST)¶
curl -X POST "http://example.com/v1/retrieve" \
-H "Content-Type: application/json" \
-d '{"query": "example query"}'
/version (GET)¶
curl -X GET "http://example.com/version"