To ensure acceptable performance, the model should occupy no more than half of the RAM available on the server and no more than two thirds of the available video memory on the GPU. For example, an 8 GB model requires at least 16 GB of RAM and 12 GB of GPU video memory.
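The sizing rule above reduces to simple arithmetic: RAM must be at least 2× the model size, and video memory at least 1.5× (so the model stays within ⅔ of VRAM). The sketch below illustrates this calculation; the function and field names are illustrative only and not part of any official tooling.

```python
# Minimal sizing sketch based on the rule above:
#   RAM  >= 2   x model size  (model takes at most half of RAM)
#   VRAM >= 1.5 x model size  (model takes at most 2/3 of VRAM)

def minimum_resources(model_size_gb: float) -> dict:
    """Return the minimum RAM and GPU memory (in GB) for a model of the given size."""
    return {
        "ram_gb": 2.0 * model_size_gb,
        "vram_gb": 1.5 * model_size_gb,
    }

if __name__ == "__main__":
    # Reproduces the example from the text: an 8 GB model
    # needs 16 GB of RAM and 12 GB of video memory.
    print(minimum_resources(8))  # {'ram_gb': 16.0, 'vram_gb': 12.0}
```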