# CodeLlama on Mac _2023.02_

[[Mac Studio (M2 Ultra)]]: Apple M2 Ultra with 24-core CPU, 76-core GPU, and 32-core Neural Engine, 192GB unified memory

## LM Studio

[LM Studio - Discover, download, and run local LLMs](https://lmstudio.ai/)

## Request

```sh
time curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "Always answer in rhymes." },
      { "role": "user", "content": "Write FizzBuzz in Python" }
    ],
    "temperature": 0.7,
    "max_tokens": -1,
    "stream": false
  }'
```

## Models

- TheBloke/CodeLlama-13B-GGUF/codellama-13b.Q5_K_M.gguf
- TheBloke/CodeLlama-34B-GGUF/codellama-34b.Q5_K_M.gguf

w/ Metal: `n_gpu_layers: 80`

## Performance

| Model         | Total (s) |
| ------------- | --------- |
| 13B w/o Metal | 4.739     |
| 34B w/o Metal | 47.043    |
| 13B w/ Metal  | 2.391     |
| 34B w/ Metal  | 4.224     |
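The Metal speedup implied by these timings can be computed directly from the table; a quick sketch (the ratios below are derived only from the measured totals above, nothing else is assumed):

```sh
# Speedup = (time w/o Metal) / (time w/ Metal), from the Performance table
awk 'BEGIN {
  printf "13B speedup: %.2fx\n", 4.739 / 2.391
  printf "34B speedup: %.2fx\n", 47.043 / 4.224
}'
```

Metal offload helps the 34B model far more than the 13B one (roughly 11x vs. 2x), consistent with the larger model being much more compute-bound on CPU.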