Posted by
Admin on
10/29/2005 9:22 AM |
Comments (10)
Tuesday I had a very interesting telephone interview with Google.
I
had sent my resume to Google a few weeks ago - not because I really
wanted to work for Google but rather I was really bored at work and was
curious to see if I would get noticed. I guess I did and that was
surprising.
The interview was scheduled via e-mail for Tuesday
at 10:00 (Google time - noon in Houston). I waited. And waited some
more. Then picked up the phone and called the HR person who setup the
call. After about an hours I got a call from a engineer named Jim that
explained that he just got into the office and forgot about the
interview. I still think it's weird to go to work at 11:00 in the
morning.
Having approached this interview cold, I really didn't
know what to expect and had not prepared in anyway. I was quite
surprised at the technical level of the interview. My interviewer was
obviously a senior software engineer that was really into it. He seemed
to be an expert at everything (or at least in the areas he was asking
about). He wasn't very interested in me, my background, why I was
interested in Google, or even where I lived. His only goal was to
measure my ability to answer 4 questions.
Here is what was thrown at me:
Suppose
you have a shell script that must write to a temporary file. How can
you ensure that this file doesn't exist and will not be overwritten by
something else?
I started by saying that you
could create a file that included an application prefix and the
date/time. This was only an okay answer. Jim would simply respond with
a what if scenario breaking my solution. I kept on adding things to the
file name and finally said to scrap it all and use a
GUID. Where that came I don't know.
The correct answer to this is to use the $$ variable to insert the pid into the filename.
0-1
The second question was
Assume
you have a database table with a list of accounts and another table
with a list of transactions. How can I find how many accounts don't
have a transaction.
I said the SQL query would look something like this:
SELECT COUNT(account)
FROM
SELECT DISTINCT Account.account
FROM Account, AccountTrans
WHERE Account.account <> AccountTrans.account
Which I don't think would work. Here is the correct answer (I think):
SELECT COUNT(account)
FROM Account OUTER JOIN AccountTrans
WHERE AccountTrans.Account = null
0-2
The third question was
Assume you have two homogeneous databases - one in the US and another one in Europe. If you want to maintain the ACID properties, how can a transfer be made between an account in the US and an account in Europe.
I
sort of got this one right. My solution was kind of long and drawn out
but I basically said that you would have to recreate the services the
database system provides inside the application performing the
transfer. So this application would have to track the progress of the
transaction. Now I said you could just write to a file as you go along.
That's where I stumbled. If you write to a file to track the progress
of the transaction you are going to break one more of the ACID
properties. You have to use the database to track your progress.
0.5 - 2
And last but not least the fourth question was
Google
get queries from around the world. Write a function that will return a
two character string representing a country code given an IP address as
its input.
I started by saying that you would have to get a
list from ICANN containing the IP address ranges and country codes. Jim
said "Okay, here is a file with 100,000 records with the starting and
ending ranges corresponding to a country code."
I can't recreate
my entire solution because it was kind of vague to start with.
Essentially, you have to create a tree structure with the country codes
as the leaves at the very bottom. I chose to split the IP addresses by
octets. So the top level would contain all of the starting and ending
numbers corresponding to the first octet on the IP address. So this
tree structure would be very wide but only 4 levels deep.
This
was a fair answer in my mind. But Jim then wanted me to estimate the
size of the tree. This was really an impossible feat since it could
vary wildly based upon the data from ICANN. He made some assumptions
for me and I tried to give an estimation but I fell flat on my face on
this task.
He must have really liked this question because then
he wanted to know how I was going to manage the memory requirements.
Could all of this fit into memory? Hell if I know. His only point of
this was to point out that Google has lots of memory and this is
something that they can do but I wouldn't be able to do on my laptop.
Point taken - move on.
He hits me again - how could this be
optimized. After 45 minutes I really didn't care. I flipped the problem
around and said that the best way to optimize this function was to
operate it in batch passing in several records at a time. If you have a
batch of IPs to look up, they can be sorted, then only the relevant
portion of the tree would have to be loaded. This would free up lots of
memory and should improve the performance (which I think is false now).
I'll give myself about 0.75 points for this answer. So the final score
Zach - 1.25
Google - 2.75
So there Google - you won.
To conclude the interview, Jim gave a nice courteous "Thank you. Have a nice day." And he was gone.
Original story