Hive on MR3 is a high-performance alternative to Apache Hive and Spark SQL, running natively on Kubernetes.
Hive on MR3 is a powerful, cost-effective, and portable way to run Apache Hive workloads. It combines the familiarity of Hive with the performance and efficiency of the MR3 execution engine.
Achieve high performance without sacrificing correctness.
Run interactive and batch queries side by side in a single system.
Run in any environment with flexible compute and storage options.
Hive on MR3 runs slightly slower than Trino for sequential queries, but significantly faster under concurrent workloads on the 10TB TPC-DS benchmark. Unlike Trino, it returns correct results for all queries.
Hive on MR3 features fault-tolerant execution and built-in capacity scheduling. By leveraging capacity scheduling, interactive queries can be prioritized while batch jobs continue running reliably in the background — ensuring smooth operation within a single unified system.
Hive on MR3 can run both interactive and batch queries together, simplifying operations and reducing costs. With fast autoscaling, smart caching, and easy deployment, it offers a powerful combination of performance, resource efficiency, and portability.
Simplify operations and reduce costs with a single system for all workloads.
Maximize resource efficiency with autoscaling and smart caching.
Set up fast with automation scripts and production-ready configurations.
Many organizations deploy separate systems for interactive and batch queries, increasing complexity and costs. Hive on MR3 streamlines operations by offering a single fault-tolerant system that handles both workloads. Capacity scheduling ensures efficient resource usage without compromising performance.
With Hive on MR3, one system is all you need.
Still running Apache Hive 2 or 3 on Hadoop? Hive on MR3 makes upgrading easy, with no need to overhaul your infrastructure. Choose the path that works best for your environment.
Run on your current Hadoop cluster with a compatible build of Hive on MR3 we provide.
Deploy Hive on MR3 on Kubernetes alongside your existing Hadoop cluster.
Use Hive on MR3 without Hadoop or Kubernetes. Just deploy and start.
When it was time to expand our existing analytics Hadoop cluster, we
knew it was also time to upgrade our software. We were heavily
invested in Hive but HDP was no longer available and, as a small
company, we could not afford its then current replacement. A search
of our options led us to DataMonad's Hive/MR3.
Hive/MR3 not only met all of our requirements and expectations, it
exceeded them. We had hoped to switch from a Hadoop-only cluster to
one using Kubernetes for greater flexibility. Hive/MR3 supported
that.
We needed to support large batch and quick interactive queries.
Hive/MR3 supported them both and performed as well or better than
Hive/Tez and Trino without the extra complexity of LLAP. For
interactive queries, some of our users even used Hive/MR3 exclusively
instead of having to use Trino.
Finally, Hive/MR3 stood out in two other, very important ways.
First, it was affordable. Second, the customer support was top notch.
DataMonad went above and beyond many times tracking down and fixing
bugs in Hive or in our own processing. Using Hive/MR3 let us
concentrate on serving our customers instead of having to roll and
maintain our own Hive installation or switch to a totally different
technology.
David Engel, Intrusion Inc., USA
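Hive on MR3 delivers strong performance across both sequential and concurrent workloads. On the 10TB TPC-DS benchmark, it runs slightly slower than Trino for sequential queries but significantly faster under concurrent workloads.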
Unlike Trino, which may return incorrect results for some queries, Hive on MR3 consistently produces correct answers.
Yes. Hive on MR3 is architected from the outset to support both interactive and batch queries in a single unified system. Its fault-tolerant execution and built-in capacity scheduling allow different types of workloads to run together efficiently. Interactive queries can be prioritized for faster response times, while batch jobs continue running reliably in the background — ensuring smooth operation without the need to manage separate systems.
Yes. Hive on MR3 runs in any environment — on Hadoop, on Kubernetes, or even in standalone mode without a resource manager. It supports both HDFS and S3, enabling full separation of compute and storage. You can deploy Hive on MR3 on-premises, in the cloud, or in hybrid environments. This flexibility allows you to tailor deployment to any infrastructure.
In many organizations, interactive and batch workloads are handled by separate systems — one optimized for responsiveness, the other for throughput. This approach adds complexity, increases infrastructure costs, and requires maintaining multiple platforms.
Hive on MR3 removes this divide by supporting both types of queries in a single fault-tolerant system. With built-in capacity scheduling, interactive queries can take priority without delaying batch jobs. Running one platform instead of two simplifies operations and reduces infrastructure costs.
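As a rough illustration of the idea, the Python sketch below models capacity scheduling as a guaranteed share of task slots for each workload type, with any unused slots lent to the other side. It is a toy model only: the function name, the 75% interactive share, and the slot counts are assumptions made for this example and do not describe how MR3 implements or configures its scheduler.

```python
def allocate_slots(total_slots, interactive_demand, batch_demand,
                   interactive_share=0.75):
    """Split task slots between interactive and batch work for one round.

    Toy model (an assumption, not MR3's algorithm): each side is first
    given up to its guaranteed share, then leftover slots are lent out,
    interactive first.
    """
    guaranteed_interactive = int(total_slots * interactive_share)
    guaranteed_batch = total_slots - guaranteed_interactive

    # Step 1: serve each side up to its guaranteed capacity.
    interactive = min(interactive_demand, guaranteed_interactive)
    batch = min(batch_demand, guaranteed_batch)

    # Step 2: lend unused slots, giving interactive queries first pick.
    spare = total_slots - interactive - batch
    extra_interactive = min(spare, interactive_demand - interactive)
    interactive += extra_interactive
    spare -= extra_interactive
    batch += min(spare, batch_demand - batch)

    return interactive, batch


if __name__ == "__main__":
    # Both workloads are busy: batch keeps its guaranteed share.
    print(allocate_slots(8, 10, 10))  # (6, 2)
    # Few interactive queries: batch borrows the idle slots.
    print(allocate_slots(8, 1, 10))   # (1, 7)
```

In this model batch jobs always keep some capacity while interactive queries are busy, and they absorb idle capacity when interactive demand drops.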
Hive on MR3 improves resource efficiency through fast autoscaling and smart caching. In cloud environments, it can scale quickly based on workload demand, efficiently combining spot and on-demand instances without risking query interruption. Selective caching can reduce repeated access to storage like S3, minimizing both latency and cost.
Hive on MR3 offers multiple deployment options: shell scripts for all environments, and Helm charts and a custom TypeScript generator for Kubernetes. With quick start guides and production-ready configurations, data engineers familiar with distributed systems can typically get Hive on MR3 running in about 30 minutes, given a suitable on-premises environment. In cloud environments, setup may take longer depending on provisioning, network configuration, and cloud-specific security settings.
Hive on MR3 requires a working cluster and a database for the Hive Metastore (such as MySQL or PostgreSQL). You’ll also need writable local disks on every worker node to store intermediate query data. For shared temporary data, you can use either a PersistentVolume on Kubernetes, or distributed storage like HDFS or S3. Detailed prerequisites are available in the quick start guides.
Yes, migrating your existing Hive Metastore and workloads to Hive on MR3 is straightforward. Hive on MR3 uses exactly the same Metastore schema as Apache Hive. If your Apache Hive version matches the version of Hive on MR3 you intend to use, you can directly reuse your existing Metastore. For older versions such as Hive 2 or 3, you can follow the standard Hive upgrade procedure. Existing Hive queries and User-Defined Functions (UDFs) also work without changes, because Hive on MR3 preserves the same interface and replaces only the underlying execution layer.
For users running Hive 3.1, a compatible build of Hive 3.1 on MR3 is available upon request.
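The migration decision described above can be sketched in Python as follows. This is a hypothetical helper, not part of Hive on MR3: the function name, the rule that versions "match" when their major.minor numbers are equal, and the version strings in the usage lines are all assumptions made for illustration.

```python
def metastore_migration_path(existing_hive, target_hive):
    """Suggest how to bring an existing Metastore to the target Hive version.

    Returns "reuse" when the versions match (assumed here to mean the same
    major.minor), or "upgrade" when the existing Metastore is older and the
    standard Hive upgrade procedure (for example, Apache Hive's schematool)
    should be followed first.
    """
    def major_minor(version):
        major, minor = version.split(".")[:2]
        return int(major), int(minor)

    existing = major_minor(existing_hive)
    target = major_minor(target_hive)

    if existing == target:
        return "reuse"
    if existing < target:
        return "upgrade"
    # A newer Metastore than the target is not covered by the text above.
    raise ValueError("existing Metastore is newer than the target version")


if __name__ == "__main__":
    # Example version strings only; check the release notes for real ones.
    print(metastore_migration_path("3.1.3", "3.1.2"))  # reuse
    print(metastore_migration_path("2.3.9", "4.0.1"))  # upgrade
```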
Once you meet the requirements, the first step is to download the MR3 release from the public repository and follow the quick start guides. In most cases, users find the guides clear enough to get started without needing additional help. One team even adopted Hive on MR3 in production without ever contacting us!
If you need help, the best place to start is the MR3 Slack, where you can ask questions and get real-time help from the team. You can also post in the MR3 Google Group for longer discussions or support. If you prefer, you can contact us directly by email as well.
While the MR3 execution engine is not open source, Hive on MR3 is fully open source, with its source code publicly available on GitHub. It consists of two customizable components: a fork of Apache Hive extended to run on MR3, and a runtime library originally based on Apache Tez but significantly evolved over time. We provide shell scripts for rebuilding Hive on MR3 using these components — so you can apply patches or tailor the system to your needs without vendor involvement.
Unlike some open source products that restrict key features to paid editions, Hive on MR3 offers all features even in the Free plan. Users can also modify and rebuild the key components responsible for query compilation and execution.
The only part not open is the MR3 execution engine, which manages low-level operations like resource management, fault tolerance, and task scheduling — areas that users don’t typically need to modify. If you do need changes to the execution engine, however, you can simply reach out. We are happy to implement new features at no cost.
We understand that enterprise teams may require a deeper technical evaluation of the MR3 execution engine before adoption. While MR3 is not open source, we offer source code walkthrough sessions — live, developer-led calls that explain the architecture using detailed design documents and walk through key parts of the source code. These sessions are designed to give your team the confidence needed to evaluate MR3 for production use.
Yes. Hive on MR3 inherits its security capabilities directly from Apache Hive, which means it supports the same integrations for authentication, authorization, and encryption — including tools like Apache Ranger, LDAP, Kerberos, SAML, and more. Because Hive on MR3 builds on Apache Hive, it benefits from the full range of security features maintained by the Hive community. As Hive continues to evolve, those improvements are naturally reflected in Hive on MR3 as well.
No — there is no vendor lock-in with Hive on MR3. Since it works with the standard Hive Metastore, you can switch back to Apache Hive or move to another technology whenever you choose. This flexibility is even greater if you use an open table format like Apache Iceberg.
The development of MR3 began in July 2015, following two years of preliminary research. The first official release, MR3 0.1, was launched in March 2018 and featured Hive on MR3 as its first application. Since then, we have been actively contributing to Apache Hive and expanding MR3 with new features and improvements. MR3 reflects nearly a decade of focused engineering, hands-on experience with Hive, and a long-term commitment to performance and stability.
Yes. Hive on MR3 has been used in production by several companies, and a few continue to run it in production today. The system has matured through years of feedback from real-world deployments. With the release of MR3 2.0, we’re focused on making Hive on MR3 more broadly accessible to teams who can benefit from it.
Try Hive on MR3 today and see the difference.