In a previous article, "Using Apache Phoenix to Access HBase via JDBC and DBVisualizer," I covered how to connect to Phoenix from a JDBC client. Now let's talk about what Phoenix is and why you should be excited about it.

Often when dealing with Big Data, you will find yourself in a situation where HBase is the obvious answer. You need scalability, low latency, consistency, and random read/write access to massive amounts of structured data. The problem comes from your non-technical business users, who would like to be able to query the data store. You can't expect these users to write API-level code in order to access the data held in HBase. Apache Phoenix solves this problem while adding features to HBase.

Apache Phoenix is, in some ways, to HBase what Hive is to HCatalog and MapReduce. It serves as a means to increase performance and add functionality while simplifying access to a complex yet efficient tool. Unfortunately, as is the case with Hive, Phoenix is still reliant on its underlying technology and, as such, is only a good solution where HBase is also a good solution.  

Apache Phoenix can run behind a standalone server, the Phoenix Query Server, or be embedded directly in a client application as a JDBC driver. Both approaches translate SQL queries into a series of HBase scans, run the scans, then return JDBC result sets. This greatly simplifies access to HBase by using a familiar language and tool set. Additionally, by using the latest features in HBase together with stored metadata, Phoenix can optimize access to the HBase tables, generally improving performance over hand-written HBase client code.
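To make the two deployment options concrete, here is a minimal Java sketch of how each one is typically addressed over JDBC. The host names are placeholders, the port numbers are the commonly used defaults rather than anything taken from a specific cluster, and the class and method names are hypothetical. It also assumes the matching Phoenix driver jar is on the classpath when you actually connect.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class PhoenixConnectSketch {

    // Embedded driver: the client talks to ZooKeeper and HBase directly.
    static String thickUrl(String zkQuorum, int zkPort, String znode) {
        return "jdbc:phoenix:" + zkQuorum + ":" + zkPort + ":" + znode;
    }

    // Thin driver: the client talks HTTP to a Phoenix Query Server.
    static String thinUrl(String host, int port) {
        return "jdbc:phoenix:thin:url=http://" + host + ":" + port
                + ";serialization=PROTOBUF";
    }

    // Plain JDBC from here on: Phoenix turns the SQL into HBase scans
    // and hands back an ordinary ResultSet.
    static List<String> tableNames(Connection conn) throws SQLException {
        List<String> names = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT DISTINCT TABLE_NAME FROM SYSTEM.CATALOG")) {
            while (rs.next()) {
                names.add(rs.getString(1));
            }
        }
        return names;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            // No cluster given: just show what the URLs look like.
            System.out.println(thickUrl("zk1.example.com", 2181, "/hbase"));
            System.out.println(thinUrl("pqs.example.com", 8765));
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0])) {
            tableNames(conn).forEach(System.out::println);
        }
    }
}
```

Either URL can be passed as the first argument; the query code is identical in both cases, which is the point of going through JDBC.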

Phoenix adds a number of powerful features to HBase. It can use stored metadata to create secondary indexes on an HBase table. Secondary indexes are a significant benefit of using Phoenix, as opposed to plain HBase, because they offer additional access patterns to a given table beyond lookups by row key. Phoenix includes a bulk loader for CSV files, providing a simple means to populate data. In the latest versions, Phoenix also offers user-defined functions, allowing more complex queries. Finally, in the current beta, Phoenix enables cross-row and cross-table transaction support with full ACID semantics.
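As a rough illustration of the secondary index feature, the sketch below creates a small table, adds an index on a non-key column, and writes one row. The table `web_stat`, its columns, and the class and method names are all made up for this example; the SQL follows Phoenix's `CREATE INDEX ... INCLUDE` and `UPSERT` syntax as I understand it, so check it against the grammar for your Phoenix version.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class PhoenixIndexSketch {

    // Builds a CREATE INDEX statement. Identifiers are concatenated as-is,
    // so only pass trusted names here.
    static String createIndexSql(String index, String table,
                                 List<String> indexed, List<String> included) {
        StringBuilder sql = new StringBuilder("CREATE INDEX IF NOT EXISTS ")
                .append(index).append(" ON ").append(table)
                .append(" (").append(String.join(", ", indexed)).append(")");
        if (!included.isEmpty()) {
            // INCLUDE copies extra columns into the index so a query can be
            // answered from the index alone.
            sql.append(" INCLUDE (").append(String.join(", ", included)).append(")");
        }
        return sql.toString();
    }

    static void run(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS web_stat ("
                    + "host VARCHAR PRIMARY KEY, domain VARCHAR, hits BIGINT)");
            stmt.execute(createIndexSql("web_stat_domain_idx", "web_stat",
                    List.of("domain"), List.of("hits")));
        }
        // Phoenix uses UPSERT rather than separate INSERT and UPDATE.
        try (PreparedStatement ps =
                     conn.prepareStatement("UPSERT INTO web_stat VALUES (?, ?, ?)")) {
            ps.setString(1, "host-1");
            ps.setString(2, "example.com");
            ps.setLong(3, 42L);
            ps.executeUpdate();
        }
        // Phoenix connections do not auto-commit by default.
        conn.commit();
    }
}
```

With the index in place, a query that filters on `domain` no longer has to scan the whole table by row key, which is the "additional access pattern" described above.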

As you can see, much as Hive simplified access for HCatalog and MapReduce, Phoenix is changing the world of HBase. It isn't always the fastest or best solution, but it is generally close, and it is almost always the simplest to create and maintain. The additional functionality and tools provided with Phoenix can shorten development timelines and remove the need for custom solutions. If you are going to implement a solution on HBase, I highly suggest you investigate whether Phoenix will work for you. At the very least, it can be used to lay views atop your existing HBase tables, giving your end users easy JDBC access through tools they are already comfortable with.
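For the "views atop your HBase tables" case, a sketch along the following lines may help. It builds a Phoenix `CREATE VIEW` statement that maps an existing HBase table and one column family to SQL columns. The table name `orders`, the family `d`, and the qualifiers are invented for illustration, and the sketch assumes every value was written to HBase as a plain string; other encodings would need matching Phoenix types.

```java
import java.util.List;

// Hypothetical helper class, for illustration only.
final class PhoenixViewSketch {

    // Maps an existing HBase table to a read-only Phoenix view.
    // Names are double-quoted because HBase names are case sensitive,
    // while Phoenix upper-cases unquoted identifiers.
    static String viewDdl(String hbaseTable, String family, List<String> qualifiers) {
        StringBuilder ddl = new StringBuilder("CREATE VIEW \"")
                .append(hbaseTable)
                // The HBase row key becomes the view's primary key column.
                .append("\" (pk VARCHAR PRIMARY KEY");
        for (String qualifier : qualifiers) {
            ddl.append(", \"").append(family).append("\".\"")
               .append(qualifier).append("\" VARCHAR");
        }
        return ddl.append(")").toString();
    }

    public static void main(String[] args) {
        System.out.println(viewDdl("orders", "d", List.of("status", "total")));
    }
}
```

Once a statement like this has been run through any Phoenix JDBC connection, end users can query the view with ordinary `SELECT` statements from their usual SQL tools.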


For More Information

For more insightful resources related to this subject, check the links below:

Moser's Technology Infrastructure

Moser's Big Data

Using Apache Phoenix to Access HBase via JDBC and DBVisualizer