Rate Limiting
Current Rate Limiting Policy
The following rate limiting rules currently apply:
Each user may have only one request in flight at a time. Requests that exceed this limit are rejected with a 429 error code.
The service is currently free to use, but please use resources reasonably and avoid unnecessary high-concurrency requests.
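One way a client might respect the one-request-at-a-time rule is to serialize its own calls and back off when it receives a 429. The sketch below is only an illustration under stated assumptions: `send` is a hypothetical callable standing in for whatever performs the actual request and returns `(status_code, body)`, and the retry count and delays are arbitrary choices, not values documented by the service.

```python
import threading
import time
from typing import Callable, Tuple

# Client-side guard: at most one request in flight from this process.
_in_flight = threading.Lock()


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Run `send` one call at a time, retrying on 429 with exponential backoff.

    `send` is a hypothetical callable (an assumption of this sketch) that
    performs a single request and returns (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        with _in_flight:  # serialize requests so they never overlap
            status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Wait base_delay, then 2x, 4x, ... before trying again.
            sleep(base_delay * 2 ** attempt)
    return status, body  # still 429 after the final attempt
```

The lock only coordinates threads within a single process. If several processes or machines share one user account, they would need some shared coordination, which this sketch does not attempt.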
Request Type | Behavior After Cancellation |
---|---|
Streaming requests | Tokens are released immediately after active cancellation. Streaming requests are recommended for better efficiency. |
Non-streaming requests | After active cancellation, the model keeps running in the background, and tokens are released only once it completes. |
Recommended Usage
Prefer streaming requests: a cancelled streaming request releases its tokens immediately, whereas a cancelled non-streaming request holds them until the model finishes, so streaming makes more efficient use of resources.
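As a rough illustration of cancelling a streaming request, the sketch below reads chunks from a stream and closes it as soon as the caller has what it needs. Everything here is hypothetical: `fake_stream` is a local generator standing in for a real streaming response, and its `finally` block merely simulates the release that the service is described as performing on cancellation. With a real HTTP client, the equivalent step would be closing the response or connection, and the exact call depends on the client library.

```python
from contextlib import closing
from typing import Callable, Generator, List


def read_until(
    stream: Generator[str, None, None],
    should_stop: Callable[[str], bool],
) -> List[str]:
    """Collect chunks from `stream`, cancelling once `should_stop` returns True."""
    chunks: List[str] = []
    with closing(stream):  # close() is called even when we exit early
        for chunk in stream:
            chunks.append(chunk)
            if should_stop(chunk):
                break  # active cancellation: stop reading and close
    return chunks


def fake_stream(released: List[bool]) -> Generator[str, None, None]:
    """Hypothetical stand-in for a streaming response (not a real API)."""
    try:
        for word in ["one", "two", "three", "four"]:
            yield word
    finally:
        # Simulates the release that follows cancellation or completion.
        released.append(True)


released: List[bool] = []
chunks = read_until(fake_stream(released), lambda c: c == "two")
```

Wrapping the stream in `closing` ensures it is closed on every exit path, including exceptions, so a cancelled request is not left open by accident.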