Split file into multiple files one row at a time

I have a log file (about 50K rows) in the format:

email1@gmail.com:address0:some_details0
email2@gmail.com:address1:some_details1
email1@yahoo.com:address2:some_details2
email2@yahoo.com:address3:some_details3

I am trying to read this file and split it into two folders (gmail.com and yahoo.com), and then write each row to a file named after the local part of the email address (the part before the @). My code below works, but it is very slow. Can someone please help me make this faster and more efficient? It would be appreciated.

#!/bin/sh
grep -hv -P "[^[:ascii:]]" * |
awk -F":" '
    {
        if ($1 ~ /^[[:alnum:]_.+-]+@[[:alnum:]_.-]+\.[[:alnum:]]/ && NF>1 && $NF!="")
        {
            split($1, arr, "@")
            system("mkdir -p "tolower(arr[2]))
            print $0 >> (tolower(arr[2]) "/" tolower(arr[1]))
        }
    }'

PS: the regex is a basic check to ensure the email address is valid. I am not doing an overly heavy check. At first I thought the regex was making my code slower, but that is not really the case: even without the regex the code is very slow. I think the I/O is what makes it slow. How can I improve this?
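For illustration, here is one possible direction, offered as a sketch under assumptions rather than a measured fix. It assumes the main cost is that system("mkdir -p ...") starts a new shell process for every row, and that many output files stay open at once. The function name split_rows and the seen array are hypothetical names introduced here; the regex and the output layout are taken from the code above. The function reads rows on standard input, so the grep filter would still sit in front of it.

```shell
#!/bin/sh
# Sketch only: split_rows is a hypothetical helper, not part of the original script.
# It reads rows on stdin and appends each valid row to <domain>/<local-part>.
split_rows() {
    awk -F':' '
    $1 ~ /^[[:alnum:]_.+-]+@[[:alnum:]_.-]+\.[[:alnum:]]/ && NF > 1 && $NF != "" {
        split($1, arr, "@")
        dir = tolower(arr[2])
        # Run mkdir once per distinct domain instead of once per row.
        if (!(dir in seen)) {
            system("mkdir -p " dir)
            seen[dir] = 1
        }
        out = dir "/" tolower(arr[1])
        print $0 >> out
        # Close after each write so the count of open files stays small.
        close(out)
    }'
}

# Possible usage, mirroring the original pipeline:
#   grep -hv -P "[^[:ascii:]]" * | split_rows
```

The close() call trades an extra open and close per row for a bounded number of open file descriptors. Whether that is a net win for about 50K rows would need to be timed on the real data.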



Read more here: https://stackoverflow.com/questions/64947045/split-file-into-multiple-files-one-row-at-a-time

Content Attribution

This content was originally published by rogerwhite at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.