Linux data manipulation

Whenever I need to create or manipulate files, my first port of call is Linux (or Git Bash). Windows certainly has tools that can achieve the same results, but I find Linux handles larger files more efficiently and lets you chain much more complex tasks together on a single command line without resorting to code.

SEQ COMMAND

This command provides a method of creating number sequences. The first number is the starting number, the second is the increment value and the third the maximum value.

$ seq 1000000 10 9000000

1000000

1000010

1000020

…..

9000000
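As a quick sanity check on the sequence above, here is a minimal sketch that counts the values it produces and shows the -w option, which pads every number to the same width. The variable names are my own and only there for illustration.

```shell
# Count the values from the example above: (9000000 - 1000000) / 10 + 1 of them.
# tr strips the padding some versions of wc add around the number.
count=$(seq 1000000 10 9000000 | wc -l | tr -d ' ')
echo "$count"

# -w pads with leading zeros so every value has the same width,
# which is handy when the numbers end up in file names.
padded=$(seq -w 8 10)
echo "$padded"
```

The first echo should print 800001 and the second 08, 09 and 10 on separate lines.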

REV COMMAND

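This command reverses a string character by character. Given a file name as an argument it reverses each line of that file's contents, so to reverse a literal string pipe it in with echo.

$ echo filename01.csv | rev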

vsc.10emanelif

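Now imagine you wanted to change filename-01.csv to filename-02.csv. One way is to reverse the string, swap the 1 for a 2 with sed, and reverse it back. Because sed replaces the first match by default, working on the reversed string changes the last 1 in the original name.

$ echo filename-01.csv | rev | sed 's/1/2/' | rev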

filename-02.csv
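Another place the reverse, edit, reverse pattern helps is picking off the last field of a string when you don't know how many fields come before it. A rough sketch, assuming a made-up path that is not from any real system:

```shell
# Hypothetical path, used only for illustration.
path="data/archive/filename-01.csv"

# Reverse the string, keep the first '/'-separated field, then reverse it back.
# The first field of the reversed string is the last field of the original.
last=$(echo "$path" | rev | cut -d/ -f1 | rev)
echo "$last"
```

This should print filename-01.csv, whatever the depth of the directory part.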

FOR COMMAND

The for command lets you loop over a block of commands. In this example we use the seq command to create the sequence 1 to 10 and echo each value out.

$ for i in `seq 1 10`;

do

echo $i

done

1

2

3

…..

10
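The loop becomes more useful once the value is used for something other than echoing a number. Here is a small sketch that builds a list of file names; the function name list_names and the filename- prefix are my own choices for the example, and seq -w is used so the numbers are zero padded to the same width.

```shell
# Print filename-01.csv through filename-10.csv, one per line.
list_names() {
    for i in $(seq -w 1 10); do
        echo "filename-$i.csv"
    done
}

list_names
```

Swapping the echo for touch, cp or mv would act on each file in turn instead of just printing its name.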

SED COMMAND

This command can replace values within a string or file. In this example we add a header to an existing file called filename01.csv; the -i option edits the file in place.

FILENAME01.CSV (PRE UPDATE)

1000000,9000000

$ sed -i '1i column1,column2' filename01.csv

FILENAME01.CSV (POST UPDATE)

column1,column2

1000000,9000000
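Replacing a value looks slightly different from inserting a line. As a minimal sketch, this swaps the comma in the sample row for a semicolon using sed's s (substitute) command. It reads from a pipe and leaves -i off, so nothing on disk is changed; the variable names are only for illustration.

```shell
# Sample row taken from the file shown above.
line="1000000,9000000"

# s/,/;/g replaces every comma on the line with a semicolon.
converted=$(echo "$line" | sed 's/,/;/g')
echo "$converted"
```

This should print 1000000;9000000. Running the same expression with -i against a file would rewrite the file itself, so it is worth previewing the output like this first.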

SPLIT COMMAND

This command allows you to take a large file and break it into smaller files. For example, if you have a data file with a million rows, you could split it into 100 files of 10000 rows each. By default the pieces are named xaa, xab and so on.

$ split -l10000 onemillion.csv

xaa

xab

xac

…..

xdv
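To try this without a million-row file, here is a small sketch that splits a 25-line file into pieces of 10 lines inside a temporary directory. The part_ prefix, passed as the last argument, replaces the default x; the file and variable names are assumptions made up for the example.

```shell
# Work in a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)

# A 25-line stand-in for a large data file.
seq 1 25 > "$workdir/numbers.txt"

# 10 lines per piece, with pieces named part_aa, part_ab, part_ac.
split -l 10 "$workdir/numbers.txt" "$workdir/part_"

# Count the pieces and the lines in the last (short) piece.
pieces=$(ls "$workdir"/part_* | wc -l | tr -d ' ')
last_lines=$(wc -l < "$workdir/part_ac" | tr -d ' ')
echo "$pieces pieces, last one has $last_lines lines"

# Concatenating the pieces in name order should give back the original file.
cat "$workdir"/part_* | cmp -s - "$workdir/numbers.txt" && echo "pieces match the original"
```

The last piece simply holds whatever is left over, so the row count does not need to divide evenly.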
