Breaking News! OpenAI Officially Open-Sources GPT-OSS, Finally Going 'Open'

OpenAI has released the GPT-OSS model weights for the first time. The models run locally on ordinary computers, work across multiple platforms, and support chain-of-thought reasoning and tool calling, with performance rivaling OpenAI's closed-source models.

On August 5, OpenAI finally stepped outside the walls of closed-source development and publicly released the new GPT-OSS model weights. This marks OpenAI's first truly "open" release of large-model weights in the six years since GPT-2 in 2019. Both ordinary developers and enterprises seeking privacy and cost optimization can now run models from the maker of ChatGPT directly on their own hardware.


Overview of GPT-OSS Models

The released gpt-oss series includes two variants:

  • gpt-oss-120b: Approximately 117 billion parameters in a Mixture-of-Experts (MoE) architecture, with strong reasoning and logic capabilities, designed for advanced inference and production deployment. Its performance closely matches OpenAI's closed o4-mini model.
  • gpt-oss-20b: 21 billion parameters, suited to local and specialized scenarios. It runs comfortably on a typical consumer-grade GPU (16 GB VRAM) or a slightly better-equipped laptop, with performance on par with o3-mini.
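As a back-of-the-envelope check on those hardware claims, the sketch below estimates VRAM from parameter count and bits per weight. The ~4.25 bits/weight figure (roughly 4-bit quantized weights plus metadata) and the 20% overhead factor are assumptions for illustration, not official numbers:

```python
# Rough VRAM estimate: params (billions) * bits_per_weight / 8, plus overhead
# for KV cache and activations. All constants here are illustrative assumptions.
def vram_gb(params_b: float, bits: float, overhead: float = 1.2) -> float:
    """Approximate VRAM needed in GB for quantized model weights."""
    return params_b * bits / 8 * overhead

print(round(vram_gb(21, 4.25), 1))   # gpt-oss-20b  -> 13.4 (fits in 16 GB)
print(round(vram_gb(117, 4.25), 1))  # gpt-oss-120b -> 74.6 (fits on one 80 GB GPU)
```

This lines up with the article's claim that the 20b model fits on a 16 GB consumer GPU; at full 16-bit precision the same model would need several times more memory.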

Both models are licensed under the Apache 2.0 license, allowing free commercial use without the need for licensing or payment; they are ready to use upon download.


Performance and Application Highlights

  • Chain-of-Thought (CoT) and Tool Calling capabilities facilitate the integration of more complex automation in various scenarios.
  • Supports local private deployment, providing full autonomy over the entire data process, thereby effectively reducing cloud dependency and privacy risks.
  • Performance assessments indicate that the 120b version has surpassed o3-mini in programming, mathematics, health Q&A, and tool calling, standing neck and neck with o4-mini. The 20b version also excels within its parameter class.
  • Supports a context window of up to 128k tokens, suitable for handling long documents and complex tasks.
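To make the tool-calling highlight concrete, here is a minimal dispatch loop for the tool calls such a model returns. The `get_weather` tool, its stand-in implementation, and the call shape are illustrative assumptions; in a real application the call would arrive from your API client rather than be hard-coded:

```python
import json

# A tool call as a model would return it: a function name plus JSON arguments.
# Hard-coded here for illustration; normally it comes from the API response.
tool_call = {"name": "get_weather", "arguments": json.dumps({"city": "Paris"})}

# Local tool registry -- get_weather is a stand-in implementation.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(call: dict) -> str:
    """Look up the named tool and invoke it with the decoded arguments."""
    fn = TOOLS[call["name"]]
    return fn(**json.loads(call["arguments"]))

print(dispatch(tool_call))  # -> Sunny in Paris
```

In practice the string returned by `dispatch` is appended to the conversation as a tool message so the model can compose its final answer.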

Quick Start: Three Methods for Local Deployment

Method 1: Ollama (suitable for most users)

  1. Download the Ollama client.
  2. Pull the model:

     ```sh
     ollama pull gpt-oss:20b   # or gpt-oss:120b
     ```

  3. Chat with the model directly:

     ```sh
     ollama run gpt-oss:20b
     ```

  4. Call it via the OpenAI-compatible API:

     ```python
     from openai import OpenAI

     # Ollama exposes an OpenAI-compatible endpoint on port 11434
     client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
     resp = client.chat.completions.create(
         model="gpt-oss:20b",
         messages=[{"role": "user", "content": "Hello!"}],  # must be a list of messages
     )
     print(resp.choices[0].message.content)
     ```

Method 2: Transformers (flexible and suitable for developers)

  1. Install dependencies:

     ```sh
     pip install transformers accelerate torch triton kernels
     ```

  2. Load and run inference:

     ```python
     from transformers import pipeline

     pipe = pipeline(
         "text-generation",
         model="openai/gpt-oss-20b",
         torch_dtype="auto",
         device_map="auto",
     )
     # The pipeline accepts a list of chat messages
     messages = [{"role": "user", "content": "Explain quantum mechanics"}]
     output = pipe(messages, max_new_tokens=200)
     print(output[0]["generated_text"])
     ```

  3. Start a local service for API access:

     ```sh
     transformers serve
     ```

     In another terminal run:

     ```sh
     transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-20b
     ```

Method 3: llama.cpp (optimal lightweight deployment, supporting non-GPU devices)

  1. Install llama.cpp and configure the Hugging Face CLI.
  2. Download the 4-bit Q4_K_S quantized model files from Hugging Face.
  3. Start the local inference server:

     ```sh
     llama-server -m "path/to/model.gguf"
     ```

  4. Open http://localhost:8080 in your browser for the full local AI chat experience.
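Beyond the browser UI, llama-server also accepts programmatic requests; recent builds expose an OpenAI-compatible chat endpoint on the same port. The sketch below only builds the request so it can be inspected offline; the endpoint path is an assumption to check against your llama.cpp build:

```python
import json
import urllib.request

# Hedged sketch: build a POST for llama-server's OpenAI-compatible endpoint
# (commonly /v1/chat/completions); adjust host/path to your build.
def build_chat_request(prompt: str, host: str = "http://localhost:8080") -> urllib.request.Request:
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }).encode()
    return urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Hello!")
print(req.full_url)  # -> http://localhost:8080/v1/chat/completions
# With the server running: urllib.request.urlopen(req) returns the completion JSON.
```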

Safety Compliance and Risk Control

OpenAI conducted a safety risk assessment on the gpt-oss models. The results indicate that the risk posed by malicious fine-tuning of this series is generally lower than that of OpenAI's closed models. Because the open-weight release places no direct supervision over chain-of-thought outputs, it broadens the space for open research, but OpenAI issues a clear warning to developers about how to present the model's "thought process": CoT content is intended for development and debugging only, and should not be shown directly to end users, to avoid inappropriate or fabricated output.
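A simple way to honor that guidance in an application is to filter reasoning content out before display. The channel names below are illustrative assumptions modeled on gpt-oss's harmony-style output, which separates reasoning from the final answer; check your serving stack for the actual field names:

```python
# Hedged sketch: keep only the user-facing answer, drop chain-of-thought.
# "analysis" and "final" channel names are assumptions for illustration.
def strip_cot(chunks: list[dict]) -> str:
    """Return only 'final'-channel text; discard 'analysis' (CoT) chunks."""
    return "".join(c["text"] for c in chunks if c.get("channel") == "final")

chunks = [
    {"channel": "analysis", "text": "User greets; reply politely."},
    {"channel": "final", "text": "Hello! How can I help?"},
]
print(strip_cot(chunks))  # -> Hello! How can I help?
```

The CoT chunks can still be logged server-side for debugging while never reaching the end user.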


Who is Best Suited to Use GPT-OSS?

  • Developers, Researchers: gpt-oss-20b offers excellent cost-performance, allowing for effortless desktop experimentation.
  • Enterprises/Organizations: gpt-oss-120b is suitable for custom development, product launches, and data sovereignty scenarios.
  • Privacy-Conscious and Localized Scenarios: Fully local, no reliance on external APIs, eliminating privacy concerns and cost burdens.

Industry Implications Behind the Move

OpenAI's genuinely "open-source" step not only gives developers new tooling options but also signals a shift in business strategy and technology direction. The landscape is changing rapidly: the open-source market has become a new battleground where model vendors compete for developers, enterprise users, and ecosystem influence. In the past, developers could only use the models remotely through OpenAI's API, with all data flowing through OpenAI's servers. Now, with the model weights openly available, developers have full local control, which significantly empowers the development process.

Open-sourcing lowers the entry barrier for startups and resource-constrained industries to adopt and integrate AI, driving technological innovation and fostering a richer application ecosystem. This also serves as an encouragement for Chinese AI firms to step up their open-source efforts—future global AI proliferation will rely not just on closed-source giants, but on collective societal cooperation and innovation.


Resources and Further Reading

Now, you can experience the freedom and power of top-tier AI models on your own computer or server. Whether you aim to develop AI applications, conduct data analysis, or integrate cutting-edge AI engines into your products, GPT-OSS is worth trying. Don't wait for "GPT-5," dive in and experience the AI that truly belongs to you!
