# Ryujin 3.5

Ryujin 3.5 works best with vLLM for production serving (which supports MoE expert parallelism) or with llama.cpp (built with its MoE kernels) for CPU inference; see the vLLM sketch after the comparison table below.

## Ryujin 3.5 vs. The Competition

| Feature | Ryujin 3.5 | Mixtral 8x7B | DeepSeek-V2 |
| :--- | :--- | :--- | :--- |
| Active Params | 6B | 12B | 21B |
| Total Params | 35B | 47B | 236B |
| Expert Count | 16 | 8 | 160 |
| Context Window | 256k | 32k | 128k |
| License | Apache 2.0 | Apache 2.0 | MIT |
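For the vLLM path mentioned above, here is a minimal offline-inference sketch. The checkpoint name `ryujin-3.5-35b-moe` is carried over from the quickstart below, and the two-GPU tensor-parallel setting is an assumption; expert parallelism, where your vLLM build supports it, is configured through vLLM's engine arguments rather than anything model-specific.

```python
# Sketch only: assumes the ryujin-3.5-35b-moe checkpoint is available
# locally or on the Hugging Face Hub, and that two GPUs are present.
from vllm import LLM, SamplingParams

llm = LLM(
    model="ryujin-3.5-35b-moe",
    tensor_parallel_size=2,  # assumption: split the 35B total params across 2 GPUs
    dtype="float16",
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(
    ["Explain the significance of the Dragon God in Shinto mythology."],
    params,
)
print(outputs[0].outputs[0].text)
```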

To run the model locally with Hugging Face Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "ryujin-3.5-35b-moe"
tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
    load_in_4bit=True,  # critical for MoE memory savings
)

prompt = "Explain the significance of the Dragon God in Shinto mythology."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
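Recent transformers releases deprecate the bare `load_in_4bit` argument in favor of an explicit `BitsAndBytesConfig`. A sketch of the equivalent loading call, assuming the same checkpoint name; the `nf4` quant type and fp16 compute dtype are common defaults, not settings documented for Ryujin specifically:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Equivalent 4-bit loading via bitsandbytes (assumed defaults, see note above).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "ryujin-3.5-35b-moe",
    device_map="auto",
    quantization_config=bnb_config,
)
```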