This is apparently a popular interview programming question. There are 2 CSV files with dinosaur data. We need to query them to return dinosaurs satisfying a certain condition.

Note - We cannot use additional modules like q, fsql, csvkit etc.


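Sample row from the first file (name, leg length, diet):

Tyrannosaurus Rex,2.5,carnivore

Sample row from the second file (name, stride length, stance):

Tyrannosaurus Rex,5.76,bipedal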

Speed is calculated using the formula: speed = ((STRIDE_LENGTH / LEG_LENGTH) - 1) * SQRT(LEG_LENGTH * g), where g = 9.8 m/s^2
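
As a quick sanity check of the formula, here is a minimal Python sketch. The function name `speed` and the constant `G` are illustrative names, not part of the original question; it assumes the 2.5 in the first sample row is the leg length and the 5.76 in the second is the stride length.

```python
import math

G = 9.8  # m/s^2, as given in the question


def speed(stride_length, leg_length, g=G):
    """speed = ((STRIDE_LENGTH / LEG_LENGTH) - 1) * SQRT(LEG_LENGTH * g)"""
    return (stride_length / leg_length - 1) * math.sqrt(leg_length * g)
```

With the sample values, `speed(5.76, 2.5)` comes out to roughly 6.45 m/s.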

Write a program to read csv files, and print only names of bipedal dinosaurs, sorted by speed from fastest to slowest.

In SQL, this would be simple:

select f2.name from
file1 f1 join file2 f2 on f1.name = f2.name
where f2.stance = 'bipedal'
order by (f2.stride_length/f1.leg_length - 1)*pow(f1.leg_length*9.8,0.5) desc

How can this be done in Python?

Total Answers 4

Answers 1 : of Querying csv files in python like sql

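You can do it in pandas:

import pandas as pd

df_1 = pd.read_csv('df_1.csv')   # has NAME and LEG_LENGTH
df_2 = pd.read_csv('df_2.csv')   # has NAME, STRIDE_LENGTH and STANCE

# Join the two files on NAME, then keep only the bipedal dinosaurs
df_comb = df_1.join(df_2.set_index('NAME'), on = 'NAME')
df_comb = df_comb.loc[df_comb.STANCE == 'bipedal'].copy()
df_comb['SPEED'] = (df_comb.STRIDE_LENGTH/df_comb.LEG_LENGTH - 1)*(df_comb.LEG_LENGTH*9.8)**0.5

# sort_values returns a new DataFrame, so print the names from the sorted result
print('\n'.join(df_comb.sort_values('SPEED', ascending = False).NAME))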

Not as clean as SQL!


Answers 2 : of Querying csv files in python like sql

You can write SQL in Python using the pandasql package, which runs SQL queries against pandas DataFrames.
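
Since the question rules out extra modules, the same "SQL in Python" idea can also be sketched with the standard library's sqlite3 and csv modules instead of pandasql. This is a rough sketch, not a pandasql example: the helper names `load_csv` and `bipedal_by_speed` and the table names `f1` and `f2` are made up for illustration, and it assumes both files start with a header row containing NAME, with LEG_LENGTH in the first file and STRIDE_LENGTH and STANCE in the second.

```python
import csv
import math
import sqlite3


def load_csv(conn, table, path):
    """Load a CSV file with a header row into a new SQLite table."""
    with open(path, newline="") as f:
        rows = csv.reader(f)
        header = next(rows)
        cols = ", ".join('"%s"' % c for c in header)
        marks = ", ".join("?" for _ in header)
        conn.execute('CREATE TABLE "%s" (%s)' % (table, cols))
        # Skip blank lines; every other row must match the header length
        conn.executemany(
            'INSERT INTO "%s" VALUES (%s)' % (table, marks),
            (row for row in rows if row),
        )


def bipedal_by_speed(path1, path2):
    """Return names of bipedal dinosaurs, fastest first."""
    conn = sqlite3.connect(":memory:")
    # Register sqrt ourselves: SQLite's math functions are not always compiled in
    conn.create_function("sqrt", 1, math.sqrt)
    load_csv(conn, "f1", path1)   # assumed columns: NAME, LEG_LENGTH, ...
    load_csv(conn, "f2", path2)   # assumed columns: NAME, STRIDE_LENGTH, STANCE
    query = """
        SELECT f1.NAME
        FROM f1 JOIN f2 ON f1.NAME = f2.NAME
        WHERE f2.STANCE = 'bipedal'
        ORDER BY (CAST(f2.STRIDE_LENGTH AS REAL) / CAST(f1.LEG_LENGTH AS REAL) - 1)
                 * sqrt(CAST(f1.LEG_LENGTH AS REAL) * 9.8) DESC
    """
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names
```

Calling `print('\n'.join(bipedal_by_speed('csv1.txt', 'csv2.txt')))` would then print one name per line. The values are loaded as text, which is why the query casts them to REAL before doing arithmetic.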


Answers 3 : of Querying csv files in python like sql

def csvtable(file):     # Read CSV file into 2-D dictionary
    table = {}
    with open(file) as f:
        columns = f.readline().strip().split(',')       # Read column names
        for line in f:
            if not line.strip():                        # Skip blank lines
                continue
            values = line.strip().split(',')            # Get current row
            for column, value in zip(columns, values):
                if column == 'NAME':                    # table['TREX'] = {} (NAME must be the first column)
                    key = value
                    table[key] = {}
                else:
                    table[key][column] = value          # table['TREX']['LENGTH'] = 10
    return table

# READ
try:
    table1 = csvtable('csv1.txt')
    table2 = csvtable('csv2.txt')
except Exception as e:
    raise SystemExit(e)                                 # Stop here: nothing to join if a file cannot be read

# JOIN, FILTER & COMPUTE
table3 = {}
for value in table1.keys():
    if value in table2.keys() and table2[value]['STANCE'] == 'bipedal':    # Join both tables on key (NAME) and filter (STANCE)
        leg_length = float(table1[value]['LEG_LENGTH'])
        stride_length = float(table2[value]['STRIDE_LENGTH'])
        speed = ((stride_length / leg_length) - 1) * pow((leg_length * 9.8), 0.5)    # Compute SPEED
        table3[value] = speed

result = sorted(table3, key=lambda x: table3[x], reverse=True)             # Sort descending by value

# WRITE
try:
    with open('result.txt', 'w') as f:
        for r in result:
            f.write('%s\n' % r)
except Exception as e:
    print(e)
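
The same join, filter and sort can be written more compactly with the standard library's csv.DictReader, which also copes with quoted fields that contain commas (plain split(',') does not). This is only a sketch: the function names `read_by_name` and `bipedal_names_by_speed` are hypothetical, and it assumes the first file has NAME and LEG_LENGTH columns and the second has NAME, STRIDE_LENGTH and STANCE.

```python
import csv
import math


def read_by_name(path):
    """Return {NAME: row_dict} for a CSV file that has a header row."""
    with open(path, newline="") as f:
        return {row["NAME"]: row for row in csv.DictReader(f)}


def bipedal_names_by_speed(path1, path2, g=9.8):
    """Return names of bipedal dinosaurs present in both files, fastest first."""
    legs = read_by_name(path1)      # assumed columns: NAME, LEG_LENGTH, ...
    strides = read_by_name(path2)   # assumed columns: NAME, STRIDE_LENGTH, STANCE
    speeds = {}
    for name, row in strides.items():
        # Filter on STANCE and keep only names found in both files (inner join)
        if row["STANCE"] != "bipedal" or name not in legs:
            continue
        leg = float(legs[name]["LEG_LENGTH"])
        stride = float(row["STRIDE_LENGTH"])
        speeds[name] = (stride / leg - 1) * math.sqrt(leg * g)
    # Sort the names by their computed speed, descending
    return sorted(speeds, key=speeds.get, reverse=True)
```

To print the names instead of writing them to a file, something like `print('\n'.join(bipedal_names_by_speed('csv1.txt', 'csv2.txt')))` should do.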

Answers 4 : of Querying csv files in python like sql

I've encountered the same problem at work and decided to build an offline desktop app where you can load CSVs and start writing SQL. You can join, group by, and so on.

It is backed by C and SQLite and can handle gigabytes of CSV files in ~10 seconds. It's very fast.

Here's the app: https://superintendent.app/

This is not Python, but it is a lot more convenient to use.

