Hive on MR3 — Run Hive on Kubernetes, a Spark SQL Alternative

What is Hive on MR3?

Hive on MR3 is a powerful, cost-effective, and portable way to run Apache Hive workloads. It combines the familiarity of Hive with the performance and efficiency of the MR3 execution engine.

Fast and Correct

Achieve high performance without sacrificing correctness.

Unified Processing

Run interactive and batch queries side by side in a single system.

Runs Anywhere

Run in any environment with flexible compute and storage options.

Why Hive on MR3?

Hive on MR3 can run both interactive and batch queries together, simplifying operations and reducing costs. With fast autoscaling, smart caching, and easy deployment, it offers a powerful combination of performance, resource efficiency, and portability.

Operational Efficiency

Simplify operations and reduce costs with a single system for all workloads.

Resource Efficiency

Maximize resource efficiency with autoscaling and smart caching.

Deployment Efficiency

Set up fast with automation scripts and production-ready configurations.

What Our Users Say About Hive on MR3

When it was time to expand our existing analytics Hadoop cluster, we knew it was also time to upgrade our software. We were heavily invested in Hive but HDP was no longer available and, as a small company, we could not afford its then current replacement. A search of our options led us to DataMonad's Hive/MR3.

Hive/MR3 not only met all of our requirements and expectations, it exceeded them. We had hoped to switch from a Hadoop-only cluster to one using Kubernetes for greater flexibility. Hive/MR3 supported that. We needed to support large batch and quick interactive queries. Hive/MR3 supported them both and performed as well or better than Hive/Tez and Trino without the extra complexity of LLAP. For interactive queries, some of our users even used Hive/MR3 exclusively instead of having to use Trino.

Finally, Hive/MR3 stood out in two other, very important ways. First, it was affordable. Second, the customer support was top notch. DataMonad went above and beyond many times tracking down and fixing bugs in Hive or in our own processing. Using Hive/MR3 let us concentrate on serving our customers instead of having to roll and maintain our own Hive installation or switch to a totally different technology.

David Engel, Intrusion Inc., USA

Frequently Asked Questions

Product Capabilities

How does Hive on MR3 compare with Trino and Spark in performance?

Hive on MR3 delivers strong performance across both sequential and concurrent workloads. Based on the 10 TB TPC-DS benchmark:

For sequential runs, Hive on MR3 is slightly slower than Trino but significantly faster than Spark.
For concurrent workloads, Hive on MR3 significantly outperforms both Trino and Spark.

Unlike Trino, which may return incorrect results for some queries, Hive on MR3 consistently produces correct answers.

With Hive on MR3, you don’t have to choose between performance and correctness.

Can Hive on MR3 run batch and interactive queries in the same system?

Yes. Hive on MR3 is architected from the outset to support both interactive and batch queries in a single unified system. Its fault-tolerant execution and built-in capacity scheduling allow different types of workloads to run together efficiently. Interactive queries can be prioritized for faster response times, while batch jobs continue running reliably in the background — ensuring smooth operation without the need to manage separate systems.
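To make the idea of prioritizing interactive queries concrete, here is a minimal sketch of a shared queue in which interactive tasks are picked first and batch tasks fill the remaining slots. It is a toy illustration only and not how MR3 schedules work internally; the class `ToyScheduler`, the priority constants, and the task names are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical priority levels for this sketch; a smaller number runs first.
INTERACTIVE, BATCH = 0, 1


@dataclass(order=True)
class Task:
    priority: int
    seq: int                      # tie-breaker: FIFO order within a priority
    name: str = field(compare=False)


class ToyScheduler:
    """One shared queue: interactive tasks first, batch tasks fill the rest."""

    def __init__(self):
        self._heap = []
        self._seq = count()

    def submit(self, name, priority):
        heapq.heappush(self._heap, Task(priority, next(self._seq), name))

    def run(self, slots_per_round):
        """Return task names grouped by scheduling round."""
        rounds = []
        while self._heap:
            n = min(slots_per_round, len(self._heap))
            rounds.append([heapq.heappop(self._heap).name for _ in range(n)])
        return rounds


scheduler = ToyScheduler()
scheduler.submit("batch-etl-1", BATCH)
scheduler.submit("batch-etl-2", BATCH)
scheduler.submit("dashboard-q1", INTERACTIVE)   # arrives after two batch tasks
scheduler.submit("batch-etl-3", BATCH)

schedule = scheduler.run(slots_per_round=2)
print(schedule)
```

In this sketch the interactive task runs in the first round even though it was submitted third, and the batch tasks still complete in their original order.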

What environments does Hive on MR3 support, and does it work with S3?

Hive on MR3 runs on Hadoop, on Kubernetes, or in standalone mode without a resource manager, and it works with S3. It supports both HDFS and S3, enabling full separation of compute and storage. You can deploy Hive on MR3 on-premises, in the cloud, or in hybrid environments, so you can tailor deployment to your infrastructure.

Operational Advantages

How does Hive on MR3 help simplify operations and reduce costs?

In many organizations, interactive and batch workloads are handled by separate systems — one optimized for responsiveness, the other for throughput. This approach adds complexity, increases infrastructure costs, and requires maintaining multiple platforms.

Hive on MR3 eliminates this divide by supporting both types of queries in a single fault-tolerant system. With built-in capacity scheduling, it allows interactive queries to take priority without delaying batch jobs. This unified design simplifies operations, reduces infrastructure costs, and eliminates the need to maintain multiple platforms.

With Hive on MR3, one system is all you need.

How does Hive on MR3 improve resource efficiency in the cloud?

Hive on MR3 improves resource efficiency through fast autoscaling and smart caching. In cloud environments, it can scale quickly based on workload demand, efficiently combining spot and on-demand instances without risking query interruption. Selective caching can reduce repeated access to storage like S3, minimizing both latency and cost.
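The effect of caching on storage access can be illustrated with a small sketch. It places a least-recently-used (LRU) cache in front of a stand-in for remote storage and counts how many reads actually reach the store. LRU is chosen here only for simplicity and is an assumption, not a description of the caching policy in Hive on MR3; `CountingStore`, `LruCache`, and the object keys are hypothetical names.

```python
from collections import OrderedDict


class CountingStore:
    """Stand-in for remote object storage; counts how often it is read."""

    def __init__(self, objects):
        self._objects = objects
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._objects[key]


class LruCache:
    """Small least-recently-used cache placed in front of the store."""

    def __init__(self, store, capacity):
        self._store = store
        self._capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)      # mark as most recently used
            return self._entries[key]
        value = self._store.get(key)            # cache miss: read from storage
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)   # evict least recently used
        return value


store = CountingStore({"part-0": b"a", "part-1": b"b", "part-2": b"c"})
cache = LruCache(store, capacity=2)

requests = ["part-0", "part-1", "part-0", "part-0", "part-1"]
for key in requests:
    cache.get(key)

print(len(requests), "requests,", store.reads, "storage reads")
```

Here five requests result in only two reads from the store, because repeated requests for the same objects are served from the cache.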

How easy is it to deploy Hive on MR3?

Hive on MR3 offers multiple deployment options: shell scripts for all environments, and Helm charts and a custom TypeScript generator for Kubernetes. With quick start guides and production-ready configurations, data engineers familiar with distributed systems can typically get Hive on MR3 running in about 30 minutes, given a suitable on-premises environment. In cloud environments, setup may take longer depending on provisioning, network configuration, and cloud-specific security settings.

Getting Started

What are the basic requirements for running Hive on MR3?

Hive on MR3 requires a working cluster and a database for the Hive Metastore (such as MySQL or PostgreSQL). You’ll also need writable local disks on every worker node to store intermediate query data. For shared temporary data, you can use either a PersistentVolume on Kubernetes, or distributed storage like HDFS or S3. Detailed prerequisites are available in the quick start guides.

Can I migrate my existing Hive Metastore and workloads to Hive on MR3?

Yes, and the migration is straightforward. Hive on MR3 uses the same Metastore schema as Apache Hive. If your Apache Hive version matches the version of Hive on MR3 you intend to use, you can directly reuse your existing Metastore. For older versions such as Hive 2 or 3, you can follow the standard Hive upgrade procedure. Existing Hive queries and User-Defined Functions (UDFs) also work without changes, because Hive on MR3 preserves the same interface and replaces only the underlying execution layer.

For users running Hive 3.1, a compatible build of Hive 3.1 on MR3 is available upon request.
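The decision described above (reuse the Metastore when versions match, upgrade when yours is older) can be sketched as a small helper. This is an illustration under stated assumptions, not a tool shipped with Hive on MR3: the function names are hypothetical, and treating two versions as matching when their major and minor numbers agree is an assumption you should check against the schema version your release expects.

```python
def major_minor(version):
    """Turn a version string like '3.1.3' into (3, 1) for comparison."""
    return tuple(int(part) for part in version.split(".")[:2])


def metastore_plan(existing, target):
    """Suggest how to bring an existing Metastore to the target Hive version.

    Assumption for this sketch: versions match when major.minor are equal.
    """
    old, new = major_minor(existing), major_minor(target)
    if old == new:
        return "reuse"        # point Hive on MR3 at the existing Metastore
    if old < new:
        return "upgrade"      # follow the standard Hive upgrade procedure first
    return "not covered"      # Metastore newer than target: not addressed here


print(metastore_plan("4.0.0", "4.0.1"))
print(metastore_plan("3.1.3", "4.0.0"))
```

The first call suggests reusing the Metastore and the second suggests upgrading it first.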

What is the first step after meeting the requirements?

Once you meet the requirements, the first step is to download the MR3 release from the public repository and follow the quick start guides. In most cases, users find the guides clear enough to get started without needing additional help. One team even adopted Hive on MR3 in production without ever contacting us!

Where should I go for help while testing Hive on MR3?

If you need help, the best place to start is the MR3 Slack, where you can ask questions and get real-time help from the team. You can also post in the MR3 Google Group for longer discussions or support. If you prefer, you can contact us directly by email as well.

Openness and Flexibility

Is Hive on MR3 open source?

While the MR3 execution engine is not open source, Hive on MR3 is fully open source, with its source code publicly available on GitHub. It consists of two customizable components: a fork of Apache Hive extended to run on MR3, and a runtime library originally based on Apache Tez but significantly evolved over time. We provide shell scripts for rebuilding Hive on MR3 using these components — so you can apply patches or tailor the system to your needs without vendor involvement.

How is Hive on MR3 different from other open source products in practice?

Unlike some open source products that restrict key features to paid editions, Hive on MR3 offers all features even in the Free plan. It gives users substantial and targeted control over the system, with the ability to modify and rebuild key components for query compilation and execution.

The only part not open is the MR3 execution engine, which manages low-level operations like resource management, fault tolerance, and task scheduling — areas that users don’t typically need to modify. If you do need changes to the execution engine, however, you can simply reach out. We are happy to implement new features at no cost.

In practice, Hive on MR3 is more open than most open source products.

Can enterprise customers review the MR3 source code?

We understand that enterprise teams may require a deeper technical evaluation of the MR3 execution engine before adoption. While MR3 is not open source, we offer source code walkthrough sessions — live, developer-led calls that explain the architecture using detailed design documents and walk through key parts of the source code. These sessions are designed to give your team the confidence needed to evaluate MR3 for production use.

Can I apply security policies and controls of my choice?

Yes. Hive on MR3 inherits its security capabilities directly from Apache Hive, which means it supports the same integrations for authentication, authorization, and encryption — including tools like Apache Ranger, LDAP, Kerberos, SAML, and more. Because Hive on MR3 builds on Apache Hive, it benefits from the full range of security features maintained by the Hive community. As Hive continues to evolve, those improvements are naturally reflected in Hive on MR3 as well.

Am I locked in if I use Hive on MR3?

No — there is no vendor lock-in with Hive on MR3. Since it works with the standard Hive Metastore, you can switch back to Apache Hive or move to another technology whenever you choose. This flexibility is even greater if you use an open table format like Apache Iceberg.

Project Background

What is the history behind MR3?

The development of MR3 began in July 2015, following two years of preliminary research. The first official release, MR3 0.1, was launched in March 2018 and featured Hive on MR3 as its first application. Since then, we have been actively contributing to Apache Hive and expanding MR3 with new features and improvements. MR3 reflects nearly a decade of focused engineering, hands-on experience with Hive, and a long-term commitment to performance and stability.

Is Hive on MR3 used in production by other companies?

Yes. Hive on MR3 has been used in production by several companies, and a few continue to run it in production today. The system has matured through years of feedback from real-world deployments. With the release of MR3 2.0, we’re focused on making Hive on MR3 more broadly accessible to teams who can benefit from it.

Who is behind MR3?

MR3 is actively maintained by a small, dedicated team led by its original architect, who holds a PhD in computer science from Carnegie Mellon University (CMU), USA. He has been the driving force behind MR3 since its inception in 2015 and continues to guide its development and offer hands-on support to users.

Ready to Experience the Power of Hive on MR3?

Try Hive on MR3 today and see the difference.

Supercharge Big Data Analytics with Hive on MR3