
Rate Limiting

Current Rate Limiting Policy

Currently, our rate limiting rules are as follows:
Each user can have only one request in progress at a time. Any additional concurrent request is rejected with HTTP status code 429 (Too Many Requests).
The service is currently free to use, but please use resources reasonably and avoid unnecessary high-concurrency requests.
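One way for a client to stay within the one-request-at-a-time limit is to serialize its own calls and back off when it does receive a 429. The Python sketch below is a minimal illustration under stated assumptions, not an official client. `send` is a hypothetical callable that performs one request and returns `(status_code, body)`. The retry count and exponential backoff delays are assumptions, because this page does not specify a retry policy. The lock only serializes requests made from a single process. Requests sent from other processes or machines under the same account still count toward the limit.

```python
import random
import threading
import time
from typing import Callable, Tuple

# Hypothetical transport: takes a request payload, returns (status_code, body).
SendFn = Callable[[str], Tuple[int, str]]


class SingleFlightClient:
    """Keeps at most one request in flight and retries on HTTP 429.

    Assumption: `send` is a placeholder for whatever HTTP call you use;
    max_retries and base_delay are illustrative, not documented values.
    """

    def __init__(self, send: SendFn, max_retries: int = 5, base_delay: float = 0.5) -> None:
        self._send = send
        self._lock = threading.Lock()  # one in-flight request per client instance
        self._max_retries = max_retries
        self._base_delay = base_delay

    def request(self, payload: str) -> Tuple[int, str]:
        with self._lock:
            for attempt in range(self._max_retries + 1):
                status, body = self._send(payload)
                if status != 429:
                    return status, body  # success or a non-rate-limit error
                if attempt < self._max_retries:
                    # Exponential backoff with jitter before the next attempt.
                    delay = self._base_delay * (2 ** attempt)
                    time.sleep(delay + random.uniform(0, delay / 2))
        raise RuntimeError("request still rate limited (429) after retries")
```

The lock is held during the backoff sleep on purpose. A 429 means the user's single slot is still busy, so letting another thread from the same client fire in the meantime would only produce another 429.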

Request type | Behavior after active cancellation
Streaming requests | Tokens are released immediately. Streaming requests are recommended for better efficiency.
Non-streaming requests | The model keeps running in the background, and tokens are released only after it finishes.

Prefer streaming requests: a cancelled streaming request releases its tokens immediately, so resources are freed without waiting for the model to finish generating.
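In a real client, actively cancelling a stream usually means stopping reading and closing the HTTP response or connection. The Python sketch below simulates that pattern with a generator. `fake_stream` and `read_until` are hypothetical names, not part of any documented SDK. The `finally` block only stands in for the server-side token release described above; it is not how the service itself is implemented.

```python
from contextlib import closing
from typing import Callable, Generator, List


def fake_stream(release_log: List[str]) -> Generator[str, None, None]:
    """Hypothetical stand-in for a streaming response.

    The `finally` block models the connection closing; per the policy
    above, that is when a streaming request's tokens are released.
    """
    try:
        for i in range(100):
            yield f"chunk-{i} "
    finally:
        release_log.append("released")


def read_until(stream: Generator[str, None, None], stop: Callable[[str], bool]) -> str:
    """Consume chunks until stop(text_so_far) is True, then cancel the stream."""
    parts: List[str] = []
    with closing(stream):  # closing() calls stream.close() when the block exits
        for chunk in stream:
            parts.append(chunk)
            if stop("".join(parts)):
                break  # active cancellation: stop reading early
    return "".join(parts)


if __name__ == "__main__":
    log: List[str] = []
    text = read_until(fake_stream(log), lambda s: "chunk-2" in s)
    print(repr(text))  # 'chunk-0 chunk-1 chunk-2 '
    print(log)         # ['released']
```

Wrapping the stream in `closing()` ensures it is closed both when the loop breaks early and when an exception interrupts reading.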