Inference Engine Examples

OpenAI unveils first custom AI inference chip, Jalapeño, with Broadcom, and its development was sped up with OpenAI's own models

The companies attributed this speed to a deep software-hardware co-development process that actively used OpenAI’s own models ...

Interesting Engineering

Jalapeño, OpenAI’s first custom chip, targets gigawatt-scale data center deployment

OpenAI's new Jalapeño inference chip promises faster performance, improved efficiency, and lower computing costs.

The Next Platform

Tensordyne Converts AI Matrix Math To Logs To Crank Up Inference Oomph

Transformations are the key to such codes, and they rely on math that predates computing as we know it by centuries. There ...

eLife

SqueakPose Studio: An end-to-end platform for pose estimation and real-time edge-AI deployment

This important work introduces an integrated open-source platform for behavioral acquisition and pose estimation that substantially improves the accessibility and speed of real-time animal tracking ...

Defense One

First B-52s to get new engines this year

Two B-52 bombers will head back to their manufacturer for new engines this year, kicking off a long-awaited upgrade meant to help keep flying the Stratofortress until nearly their 100th birthday. On ...

Nasdaq

DigitalOcean Launches Inference Engine with New Capabilities for Production AI, Including Inference Router for Efficient Scaling of Agentic Workloads

Built alongside early design partners, the Inference Engine gives AI developers unified control over performance, cost, and scale — with customers reporting up to 67% lower inference costs. Inference ...

Morningstar

AI-Native Startups Are Leaving Hyperscalers for DigitalOcean's Agentic Inference Cloud

AI-native startups report 50% faster training cycles and 40% decrease in latency when running production AI on DigitalOcean. DigitalOcean (NYSE: DOCN), the Agentic Inference Cloud built for production ...

SDxCentral

Nvidia, hyperscaler-backed open standard for AI inference torch passed to Linux Foundation

An open standard for AI inference backed by Google Cloud, IBM, Red Hat, Nvidia and more was given to the Linux Foundation for stewardship in further proof training has been superseded by inference in ...

Wall Street Journal

Amazon Announces Inference Chips Deal With Cerebras

Amazon Web Services plans to deploy processors designed by Cerebras inside its data centers, the latest vote of confidence in the startup, which specializes in chips that power artificial-intelligence ...

VentureBeat

The team behind continuous batching says your idle GPUs should be running inference, not sitting dark

Every GPU cluster has dead time. Training jobs finish, workloads shift and hardware sits dark while power and cooling costs keep running. For neocloud operators, those empty cycles are lost margin.

GitHub

llminfer: A GPU-efficient LLM inference engine

This is a Python package focused on systems performance: quantized weights, KV cache reuse, dynamic batching, token streaming, and rigorous benchmarking across backends. llminfer is for engineers who ...
