Skip to content

CSV to JSON with column

The column tool is pretty neat. It allows you to format text in columns (how surprising). Less known fact is that itd can also show output in JSON format. This is useful when you need quick way to convert text or even CSV to JSON.

I've learned about it form this amazing YouTube video by Veronica Explains.

Preparation

Let's start with download of Nederlandse Spoorwagen stations open data from:

https://www.rijdendetreinen.nl/en/open-data/stations

1
curl -s https://opendata.rijdendetreinen.nl/public/stations/stations-2023-09.csv > stations-2023-09.csv

Check the first 5 rows of the file:

1
 head -n 5 stations-2023-09.csv
It should look like this:

1
2
3
4
5
id,code,uic,name_short,name_medium,name_long,slug,country,type,geo_lat,geo_lng
266,HT,8400319,"Den Bosch",'s-Hertogenbosch,'s-Hertogenbosch,s-hertogenbosch,NL,knooppuntIntercitystation,51.69048,5.29362
269,HTO,8400320,"Dn Bosch O","'s-Hertogenb. O.","'s-Hertogenbosch Oost",s-hertogenbosch-oost,NL,stoptreinstation,51.700553894043,5.3183331489563
227,HDE,8400388,"'t Harde","'t Harde","'t Harde",t-harde,NL,stoptreinstation,52.4091682,5.893611
8,AHBF,8015345,Aachen,"Aachen Hbf","Aachen Hbf",aachen-hbf,D,knooppuntIntercitystation,50.7678,6.091499

CSV to JSON conversion

The command may look like arcane spell, but it's actually quite simple:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# Skip the first line of the input file (header) with `tail -n +2`
#
# For the column options:
# -t to format the output as a table
# -s ',' to specify the delimiter in the input file
# -N to specify the column names (we take the first line of the input csv file)
# -J to get JSON output
# -n to specify the key in the JSON output: {"name": [{<row1>}, {<row2>}, ...]}
# otherwise it would be {"table": {<row1>, <row2>, ...}}
#
# To provide column with the column names, we use the first line of the input file
# with head
tail -n +2 stations-2023-09.csv | column -t -s ',' -N $(head -n 1 stations-2023-09.csv) -J -n "stations" > stations-2023-09.json

That's it! Now you have the data in JSON without any extra tools. You can check the first two rows from the output with jq:

1
jq '.stations | .[:2]' stations-2023-09.json
It should show:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
[
  {
    "id": "266",
    "code": "HT",
    "uic": "8400319",
    "name_short": "\"Den Bosch\"",
    "name_medium": "'s-Hertogenbosch",
    "name_long": "'s-Hertogenbosch",
    "slug": "s-hertogenbosch",
    "country": "NL",
    "type": "knooppuntIntercitystation",
    "geo_lat": "51.69048",
    "geo_lng": "5.29362"
  },
  {
    "id": "269",
    "code": "HTO",
    "uic": "8400320",
    "name_short": "\"Dn Bosch O\"",
    "name_medium": "\"'s-Hertogenb. O.\"",
    "name_long": "\"'s-Hertogenbosch Oost\"",
    "slug": "s-hertogenbosch-oost",
    "country": "NL",
    "type": "stoptreinstation",
    "geo_lat": "51.700553894043",
    "geo_lng": "5.3183331489563"
  }
]

Classical use cases

If you didn't heard of column before, you may find it useful for other table formatting tasks. For example show the content of the /etc/systemd directory in columns with 60 characters width:

1
ls /etc/systemd | column -c 60

1
2
3
4
5
6
7
8
9
coredump.conf           oomd.conf
homed.conf              pstore.conf
journald.conf           resolved.conf
journal-remote.conf     sleep.conf
journal-upload.conf     system
logind.conf             system.conf
logind.conf.pacnew      timesyncd.conf
network                 user
networkd.conf           user.conf

But the table formatting in the JSON conversion example is also very powerful by itself:

1
cat /etc/passwd | column -t -s ':' -N User,Pass,UID,GID,Description,Home,Shell

This will show you the content of the /etc/passwd file in a nice table format:

1
2
3
4
5
6
7
8
9
User                    Pass  UID    GID    Description                    Home              Shell
root                    x     0      0                                     /root             /usr/bin/bash
bin                     x     1      1                                     /                 /usr/bin/nologin
daemon                  x     2      2                                     /                 /usr/bin/nologin
mail                    x     8      12                                    /var/spool/mail   /usr/bin/nologin
ftp                     x     14     11                                    /srv/ftp          /usr/bin/nologin
http                    x     33     33                                    /srv/http         /usr/bin/nologin
nobody                  x     65534  65534  Kernel Overflow User           /                 /usr/bin/nologin
dbus                    x     81     81     System Message Bus             /                 /usr/bin/nologin

With named columns we can now rearrange their order with the -O option and even hide some columns with the -H option:

1
2
3
cat /etc/passwd | column -t -s ':' -N User,Pass,UID,GID,Description,Home,Shell \
-O UID,GID,User,Shell \
-H Pass,Description,Home

The output will look like this:

1
2
3
4
5
6
7
UID    GID    User                    Shell
0      0      root                    /usr/bin/bash
1      1      bin                     /usr/bin/nologin
2      2      daemon                  /usr/bin/nologin
8      12     mail                    /usr/bin/nologin
14     11     ftp                     /usr/bin/nologin
33     33     http                    /usr/bin/nologin