TL;DR

  • NVIDIA integrates DeepSeek's 671-billion-parameter reasoning model R1 into its Inference Microservices (NIM), enhancing AI reasoning capabilities.
  • The model employs complex thought processes such as chain-of-thought and consensus, necessitating powerful hardware for real-time performance.
  • Developers can now access a preview service, optimized for NVIDIA's architecture, to facilitate tailored AI solutions for enterprises.
  • NVIDIA aims to simplify AI deployments while prioritizing security and privacy through customizable microservices.

NVIDIA has unveiled the integration of DeepSeek’s reasoning model R1 into its NVIDIA Inference Microservices (NIM), leveraging the model’s capacity for sophisticated reasoning.

DeepSeek-R1 isn’t built to serve a single straightforward answer; instead, it works through a series of thought processes (chain-of-thought, consensus, and search methods) to arrive at the best response, a technique known as test-time scaling.
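
To make the consensus idea concrete, here is a minimal sketch of majority voting over repeated samples, often called self-consistency. The `sample_answer` function is a hypothetical stand-in for one model call; it is not part of any NVIDIA or DeepSeek API.

```python
# A minimal sketch of the "consensus" idea behind test-time scaling,
# often called self-consistency. sample_answer is a hypothetical
# stand-in for one stochastic chain-of-thought pass of a model.
import random
from collections import Counter

def sample_answer(question: str) -> str:
    """Simulate one model pass that is right 80% of the time."""
    return random.choices(["4", "5"], weights=[0.8, 0.2])[0]

def consensus_answer(question: str, n_samples: int = 9) -> str:
    """Sample several answers and return the majority vote."""
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(consensus_answer("What is 2 + 2?"))  # almost always prints "4"
```

Chain-of-thought and search work similarly in spirit: spend more compute at inference time to improve the final answer, which is exactly where serving hardware becomes the bottleneck.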

DeepSeek-R1, with 671 billion parameters, represents a significant advance in reasoning models. Architecturally, it is a mixture-of-experts (MoE) model: each layer routes every token to a small subset of specialized expert sub-networks rather than through the whole network, keeping per-token compute manageable despite the huge parameter count.
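
As a rough illustration of how MoE routing works, the sketch below sends each token vector to its top-k experts and mixes their outputs. The sizes are toy values chosen for readability; none of this reflects DeepSeek's actual implementation.

```python
# A toy illustration of mixture-of-experts (MoE) routing, not
# DeepSeek's actual implementation. Sizes are illustrative: R1 has
# 256 experts per layer and a far larger hidden dimension.
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 16    # hidden size per token (toy value)
N_EXPERTS = 8   # experts in this layer (toy value)
TOP_K = 2       # experts activated per token

# Each expert is a small feed-forward weight matrix; the router
# scores how well each expert suits a given token.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.1 for _ in range(N_EXPERTS)]
router = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.1

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Send each token through its top-k experts and mix the outputs."""
    logits = x @ router                            # (tokens, experts)
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]  # top-k expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top[t]]
        w = np.exp(sel - sel.max())
        w /= w.sum()                               # softmax mixing weights
        for weight, e in zip(w, top[t]):
            out[t] += weight * (x[t] @ experts[e])
    return out

tokens = rng.standard_normal((4, D_MODEL))  # a batch of 4 token vectors
print(moe_layer(tokens).shape)  # (4, 16): only 2 of 8 experts ran per token
```

In the real model, routing like this happens inside every transformer layer, which is part of why serving R1 responsively demands so much GPU horsepower.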

R1 supports a large input context of 128,000 tokens, and each of its layers contains 256 experts, which makes real-time inference demanding but achievable when many GPUs work in parallel. In the announcement, NVIDIA’s Erik Pounds notes that this is why models like DeepSeek-R1 need accelerated computing for agentic AI inference.

NVIDIA’s NIM service essentially lets users run a wide range of AI models on its GPUs.

For developers, NVIDIA is launching a preview of DeepSeek-R1 as a NIM microservice on build.nvidia.com. Running on a single NVIDIA HGX H200 system, the microservice can deliver up to 3,872 tokens per second, using the FP8 Transformer Engine of NVIDIA’s Hopper architecture for optimized performance.
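
For a sense of what using the preview might look like: NVIDIA's hosted NIM endpoints are OpenAI-compatible, so a call could resemble the sketch below. The base URL, model identifier, and NVIDIA_API_KEY variable are assumptions; confirm the exact values on build.nvidia.com.

```python
# A hedged sketch of calling the DeepSeek-R1 NIM preview. NVIDIA's
# hosted NIM endpoints are OpenAI-compatible, but the base URL, model
# identifier, and NVIDIA_API_KEY variable below are assumptions;
# confirm the exact values on build.nvidia.com.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed NIM endpoint
    api_key=os.environ["NVIDIA_API_KEY"],            # key from build.nvidia.com
)

stream = client.chat.completions.create(
    model="deepseek-ai/deepseek-r1",  # assumed model identifier
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.8?"}],
    temperature=0.6,
    max_tokens=1024,
    stream=True,  # stream so the model's reasoning appears as it is generated
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")
```

Streaming matters more for reasoning models than for ordinary chatbots, since the chain of thought can run long before the final answer arrives.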

NVIDIA believes its Blackwell architecture will boost reasoning models further, delivering up to 20 petaflops of FP4 compute performance. Through this integration, the company aims to streamline deployments for enterprises while maintaining security and privacy on their own infrastructure, with support for industry-standard APIs.

The company expects DeepSeek-R1 to soon become available as a downloadable NIM microservice under the NVIDIA AI Enterprise software platform.

Using tools like NVIDIA NeMo, enterprises can customize the R1 microservice for specialized applications, further illustrating NVIDIA’s commitment to making advanced AI deployments accessible and secure.

DeepSeek is a Chinese AI lab, backed by hedge fund High-Flyer, that builds open-source large language models (LLMs). It shot to prominence just last week as mainstream media picked up on its new reasoning model, R1, outperforming OpenAI’s similar offering, o1, on benchmarks while being trained at a fraction of the cost.

DeepSeek’s native app, which lets users talk to the model for free, climbed to the top of the iPhone App Store charts in the United States, beating ChatGPT.

The Liang Wenfeng-led company has been grappling with keeping its services running amid the sudden rise in popularity, which has also apparently brought on cyberattacks.

Meanwhile, other interfaces are rushing to integrate the R1 model. Ahead of NVIDIA, Perplexity announced it had included the R1 reasoning model in its Pro subscription, using data centers in Western countries to address the privacy and national-security concerns raised by DeepSeek’s own servers being based in China.
