Martkos IT Ltd

Linux data manipulation

Whenever I need to create or manipulate files, my first port of call is Linux (or Git Bash). Windows certainly has tools that can achieve the same results, but I find Linux manages larger files more efficiently and lets you perform much more complex tasks in a single command line without resorting to code.

SEQ COMMAND

This command creates number sequences. With three arguments, the first is the starting number, the second is the increment and the third is the maximum value.

$ seq 1000000 10 9000000

1000000

1000010

1000020

…..

9000000
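A few other forms of the same command are worth a quick sketch. The options shown here (-w for equal width and -s for a custom separator) are assumed to behave as they do in GNU seq; other implementations may differ slightly, so treat this as an illustration rather than a reference.

```shell
#!/bin/sh
# One argument: count from 1 up to the given number.
seq 3            # 1 2 3, one per line

# Two arguments: first and last, with an implied increment of 1.
seq 8 10         # 8 9 10, one per line

# -w pads every number with leading zeros to the same width.
seq -w 8 10      # 08 09 10, one per line

# -s replaces the newline between numbers with the given string (GNU seq).
seq -s, 1 5      # 1,2,3,4,5

# Count how many numbers the three-argument example above produces.
seq 1000000 10 9000000 | wc -l    # 800001
```

The last line is a handy sanity check before writing a large sequence to a file: (9000000 - 1000000) / 10 + 1 gives 800001 rows.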

REV COMMAND

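This command reverses the characters on each line of its input. Given a file name as an argument it reverses the contents of that file, so to reverse a string you pipe it in.

$ echo filename01.csv | rev

vsc.10emanelif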

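Now imagine you wanted to change filename-01.csv to filename-02.csv. By default sed replaces only the first match on a line, so reversing the string first lets it change the last 1 in the name. You could run the following command.

$ echo filename-01.csv | rev | sed 's/1/2/' | rev

filename-02.csv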
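Another common use of the same reverse, edit, reverse pattern is picking off the last field of a string, since cut can only count fields from the left. The helper name extension below is hypothetical and only there for illustration; this is a minimal sketch, not the only way to do it.

```shell
#!/bin/sh
# Hypothetical helper: print the text after the last dot in a name.
# The name is reversed, the first dot-separated field is taken,
# and the result is reversed back into reading order.
extension() {
    printf '%s\n' "$1" | rev | cut -d. -f1 | rev
}

extension filename-01.csv    # csv
extension archive.tar.gz     # gz
```

Note that if the name contains no dot at all, cut passes the whole line through, so the helper would print the name unchanged.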

FOR COMMAND

This command loops over a block of one or more commands. In this example we use the seq command to create the sequence 1 to 10 and echo each value.

$ for i in `seq 1 10`;

do

echo "$i"

done

1

2

3

…..

10
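The loop becomes more useful when the loop variable is built into something else. As a sketch, the hypothetical function list_names below combines the loop with seq -w (zero padding, assumed to behave as in GNU seq) to print a run of numbered file names in the style used in this article.

```shell
#!/bin/sh
# Hypothetical helper: print filename01.csv through filename10.csv.
# $(...) does the same job as the backticks in the example above.
list_names() {
    for i in $(seq -w 1 10)
    do
        echo "filename$i.csv"
    done
}

list_names
```

Swapping echo for another command, such as touch, would create the files instead of just printing their names.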

SED COMMAND

This command edits text in a string or file, for example by replacing or inserting values. In this example we add a header to an existing file called filename01.csv.

FILENAME01.CSV (PRE UPDATE)

1000000,9000000

$ sed -i '1i column1,column2' filename01.csv

FILENAME01.CSV (POST UPDATE)

column1,column2

1000000,9000000
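To try the header insert without touching real data, the sketch below wraps it in a hypothetical helper called add_header and runs it against a temporary file. It assumes GNU sed; the -i option takes a different form on BSD and macOS sed, so the command would need adjusting there.

```shell
#!/bin/sh
# Hypothetical helper: insert a header line at the top of a file.
# Assumes GNU sed. $1 = header text, $2 = file to edit in place.
add_header() {
    sed -i "1i $1" "$2"
}

# Try it on a throwaway file holding the data row from the example above.
tmpfile=$(mktemp)
echo '1000000,9000000' > "$tmpfile"
add_header 'column1,column2' "$tmpfile"
cat "$tmpfile"     # header line first, then the data row
rm -f "$tmpfile"
```

One caveat: the 1i address only fires when the file has a first line, so an empty file is left unchanged.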

SPLIT COMMAND

This command breaks a large file into smaller files. For example, if you have a data file with a million rows, you could split it into 100 files of 10000 rows each.

$ split -l10000 onemillion.csv

xaa

xab

xac

…..

xdv
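The default xaa style names are not very descriptive, so here is a small sketch of choosing your own. It uses a 25 row sample file in a scratch directory, the -d option for numeric suffixes (assumed to be available, as in GNU split) and an output prefix of part_, which is just a hypothetical name.

```shell
#!/bin/sh
# Work in a scratch directory so the pieces are easy to find and remove.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 25 rows of sample data standing in for a real data file.
seq 1 25 > sample.csv

# -l 10 : ten lines per piece
# -d    : numeric suffixes (00, 01, ...) instead of aa, ab, ...
# part_ : prefix for the output file names
split -l 10 -d sample.csv part_

# Three pieces result: part_00 and part_01 with 10 lines, part_02 with 5.
wc -l part_*
```

Running cat part_* > rejoined.csv afterwards puts the pieces back together in order, which is a quick way to confirm nothing was lost.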