Apache Arrow adds a new protocol to speed up database access and reduce the code developers must write

2022-08-06

Apache Arrow, one of Apache's top-level projects, now includes Flight SQL, a community-developed client-server protocol that combines Arrow's in-memory columnar format with the Flight RPC framework to speed up access to SQL databases. Flight SQL aims to provide much of the same functionality as existing APIs such as JDBC and ODBC, including executing queries, creating prepared statements, and retrieving metadata about the SQL dialect, the available types, and the defined tables.

By building on Apache Arrow, Flight SQL lets applications interact with Arrow-native databases without additional data conversion. Flight provides an implementation of a wire format that supports encryption and authentication out of the box, and it gives developers room to implement further optimizations such as parallel data access.

While Flight SQL can be used directly to access a database, it is not intended to replace JDBC or ODBC. Because Flight SQL serves as a concrete wire protocol and driver implementation, it can sit underneath a JDBC or ODBC driver and reduce the implementation burden on databases.

Although standards such as JDBC and ODBC have been in use for decades, they fall short for users who want to work with Apache Arrow or with columnar databases in general. Row-based APIs such as JDBC or PEP 249 force a columnar database to transpose the data twice: once to present it as rows for the API, and once to get it back into columns for the consumer. ODBC and some other APIs do provide bulk access to result buffers, but the user still has to copy the data into Arrow arrays before it can be used easily in the Arrow ecosystem. Flight SQL aims to eliminate this cumbersome intermediate step.

Flight SQL lets database servers implement a standard interface that is designed for Arrow and columnar data from the start.
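The double transposition that row-based APIs impose on a columnar database can be illustrated with a minimal sketch. It uses only plain Python lists as stand-ins for Arrow arrays; the function names `columns_to_rows` and `rows_to_columns` are hypothetical and are not part of Arrow, JDBC, or PEP 249.

```python
# Illustration only: plain Python lists stand in for Arrow arrays.

# A columnar database keeps each column's values together.
columns = {
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
}


def columns_to_rows(cols):
    """First transposition: present columnar data as rows for a row-based API."""
    names = list(cols)
    rows = list(zip(*(cols[n] for n in names)))
    return names, rows


def rows_to_columns(names, rows):
    """Second transposition: rebuild columns on the consumer side."""
    if not rows:
        return {n: [] for n in names}
    return {n: list(values) for n, values in zip(names, zip(*rows))}


# Server side: columns -> rows, because the API is row-based.
names, rows = columns_to_rows(columns)

# Client side: rows -> columns, because the consumer is columnar.
round_tripped = rows_to_columns(names, rows)
```

Both steps touch every value only to end up with the layout the data started in. A columnar protocol would hand over `columns` as-is and skip both conversions.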
Just as Arrow provides a standard in-memory format, Flight SQL frees developers from having to design and implement a new wire protocol. On the client side, Flight SQL gives users bulk access to query results without converting data from another API or format. In addition, because the Flight and Flight SQL libraries implement the wire protocol, a client or driver needs less code of its own and can simply use Flight underneath. Clients and servers can also implement optimizations such as parallel data access.

Arrow Flight is already reported to be up to 20 times faster than existing libraries such as pyodbc, and Flight SQL packages this performance advantage behind a standard interface for clients and databases. As of Apache Arrow 7.0.0, Flight SQL is still under development, but C++ and Java implementations are available.
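The parallel data access mentioned above might look roughly like the following sketch. It is a simulation built on the Python standard library only, not the Flight API: `get_flight_info` and `do_get` are hypothetical stand-ins loosely named after Flight's GetFlightInfo and DoGet RPC methods, and the in-memory `ENDPOINTS` table is an assumption that replaces a real server.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a server: each ticket identifies one partition
# of a result set, stored column by column.
ENDPOINTS = {
    "ticket-0": {"id": [1, 2], "value": [10.0, 20.0]},
    "ticket-1": {"id": [3, 4], "value": [30.0, 40.0]},
    "ticket-2": {"id": [5], "value": [50.0]},
}


def get_flight_info(query):
    """Hypothetical: return the tickets a client must fetch for a query."""
    return list(ENDPOINTS)


def do_get(ticket):
    """Hypothetical: fetch one partition (a real client would use the network)."""
    return ENDPOINTS[ticket]


def fetch_all(query, max_workers=4):
    """Fetch every partition concurrently and concatenate the columns."""
    tickets = get_flight_info(query)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in ticket order even though fetches overlap.
        batches = list(pool.map(do_get, tickets))

    # Concatenate column by column; no row/column conversion is involved.
    result = {}
    for batch in batches:
        for name, values in batch.items():
            result.setdefault(name, []).extend(values)
    return result


result = fetch_all("SELECT id, value FROM t")
```

The design point is that the client first asks where the data lives and then pulls each partition independently, so partitions can be fetched at the same time and the data stays columnar throughout.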