HPE Ezmeral Software platform

Re: get drillbit hostname or IP address when connecting via zookeepers

 
SOLVED
Go to solution
MarkRajcok
Occasional Advisor

get drillbit hostname or IP address when connecting via zookeepers

I'm using a zookeeper connection string in a Scala program to get a connection to a drill cluster. I would like to log which drillbit (hostname or IP address) I successfully connected to, but I can't find any property or method on the java.sql.Connection class to get this information.

I tried connection.getMetaData.getURL, but that just returns the zookeeper connection string I used to get the connection (scala code below):

import java.sql._
Class.forName("com.mapr.drill.jdbc41.Driver")
val con = DriverManager.getConnection("jdbc:drill:zk=<server1>:5181,<server2>:5181/drill/<cluster_name>...")
con.getMetaData().getURL()  // returns "jdbc:drill:zk=<server1>..."
val st  = con.createStatement()
val res = st.executeQuery("select *,'findme' from sys.drillbits")
while(res.next()) { println(res.getString(1)) }

If I grep the drillbit_queries.json logs on all of the drillbit servers for findme, I can find which drillbit was used to execute the query. For tracking down connection issues, I'd like to be able to log which drillbit is being used from the Scala application, rather than have to grep logfiles.

(The above is also posted to StackOverflow: https://stackoverflow.com/questions/68042577/how-to-determine-which-apache-drill-drillbit-was-selected-when-connecting-via-zo)

9 REPLIES 9
tdunning
HPE Pro

Re: get drillbit hostname or IP address when connecting via zookeepers

 
I work for HPE
tdunning
HPE Pro

Re: get drillbit hostname or IP address when connecting via zookeepers

Don’t forget that you can ask these questions directly to the Apache Drill community on the drill mailing lists or slack channels. See https://drill.apache.org/community-resources/ for more details.

Specific to your question, check out the system tables (https://drill.apache.org/docs/querying-system-tables/). The sys.profiles_json table, in particular, seems to be the one that is likely to give you the information you need.

It is a generally pretty safe bet that if you can see the information that you need in the web interface for Drill that you will also be able to find that information in the system tables. Commonly, a quick scan of the docs and some trial queries will give you the info you need. Since the web interface is open source, you can also grovel the source code to figure out how the web interface gives you the pertinent info.

In particular, this query gives roughly what you are asking for:

 

with t1 as (
    select convert_from(x.json, 'JSON') js from sys.profiles_json x
)
select t1.js.queryId id, t1.js.foreman.address foreman, t1.js.query query
from t1
limit 3

 

When I run it on my test instance, I get this (you can see me refining this query

"id","foreman","query"
"1f2c78e4-949c-8a31-4184-6f7b84b80f4e","nodea","with t1 as (select convert_from(x.json, 'JSON') js from sys.profiles_json x)\nselect t1.js.queryId, t1.js.foreman.address, t1.js.query from t1 limit 3"
"1f2c78fd-61e7-5105-49d6-2845c452fad4","nodea","with t1 as (select convert_from(x.json, 'JSON') js from sys.profiles_json x)\nselect t1.js.queryId, t1.js.foreman.address, t1.js.query from t1"
"1f2c7964-8cea-1810-6e4f-858edf344d0e","nodea","with t1 as (select convert_from(x.json, 'JSON') js from sys.profiles_json x)\nselect t1.js.queryId, t1.js.foreman.address, t1.js.query from t1"

 

I work for HPE
MarkRajcok
Occasional Advisor

Re: get drillbit hostname or IP address when connecting via zookeepers

Thanks Ted.
The ability to query sys.profiles_json via SQL will be useful for some other query investigations that I want to do, so thanks much for that.

Unfortunately, looking at the most recent "n" profiles wouldn't be a reliable way for us to determine the connected drillbit, since we have multiple applications that are querying Drill, some of which are multi-threaded and may issue multiple queries at about the same time.

I noticed that sqlline logs which drillbit it connects to.  I think this log is coming from the MapR Drill JDBC driver.  Any chance someone from HPE can look at that code and tell us how to do the same in our applications?

I'm really surprised that there doesn't seem to be an easy way to get the drillbit hostname directly from the connection object.

tdunning
HPE Pro

Re: get drillbit hostname or IP address when connecting via zookeepers

Mark,

I don't understand why the most recent n profiles wouldn't work for you if you inject a random pattern to search for and if n is large enough.

Regarding the code for sqlline and the JDBC driver, I will ask. 

I work for HPE
MarkRajcok
Occasional Advisor

Re: get drillbit hostname or IP address when connecting via zookeepers

Yeah okay.
"Wouldn't work" in the sense of not being practical for us right now.

We're issuing more than 100k queries per day.  To get the drillbit hostname (just for logging purposes) would require 1) an extra query with a random pattern 2) another extra query to ascertain the drillbit.  We're currently getting a new drill connection using the zookeepers for every query. rather than pooling connections.  (Please don't ask why we're doing that, since I know it is not efficient.  Let's just say I didn't design that part.)
So our query count would immediately balloon from 100k to 300k per day--every real query would turn into 3 queries.

MarkRajcok
Occasional Advisor
Solution

Re: get drillbit hostname or IP address when connecting via zookeepers

I actually found what I was looking for:
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java#L462

private void connect(DrillbitEndpoint endpoint) throws RpcException {
   client.connect(endpoint, properties, getUserCredentials());
   logger.info("Foreman drillbit is {}", endpoint.getAddress());
}

So now the question becomes (assuming the MapR Drill JDBC driver is similar), can our application get access to that DrillbitEndpoint object?

In the UserClient code (DillClient is a wrapper around UserClient) I see that
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserClient.java#L109
    DrillbitEndpoint endpoint;
is a private instance variable, and I don't see any public method to expose it.  So I don't think there is a way for our app to get access to that.  

Ted suggested Java reflection and/or a future pull request for Drill to expose this field.
Thanks Ted for your help with this.

tdunning
HPE Pro

Re: get drillbit hostname or IP address when connecting via zookeepers

This snippet of code illustrates how to access a private field:

        PrivateFieldHolder p = new PrivateFieldHolder();

        Field f = p.getClass().getDeclaredField("endPoint");
        f.setAccessible(true);
        Double x = (Double) f.get(p);
        System.out.printf("Private field = %.6f\n", x);

 

I think that this would work for your needs

I work for HPE
MarkRajcok
Occasional Advisor

Re: get drillbit hostname or IP address when connecting via zookeepers

Just adding this info to the thread, for future readers...

In my recent travels through the Drill docs, I found the following on https://drill.apache.org/docs/troubleshooting/#identify-the-foreman:

SELECT hostname FROM sys.drillbits WHERE `current` = true;

That will return the hostname of the drillbit (i.e., the foreman) the session is using.
That looks to be easier than using any profiles table.

tdunning
HPE Pro

Re: get drillbit hostname or IP address when connecting via zookeepers

Ouch. Sooo much better than any of my suggestions!

That is exactly the right way to do this.

I work for HPE