How to preprocess monolithic text sequences in Python?

I have a dataset whose records consist of three parts: [label, type, random-sequence]. The label is constant, type takes one of 5-10 values, and random-sequence is a random field. Example records:

HTTPTRACEb615c083-0ddf-4d69-aa8e-aeff2c1c2a62
HTTPGETb9119006-db6e-46e1-81f8-e22d3ef94d97
HTTPTRACE98fac866-003c-4555-bee3-ba9d2b1205fd

The main goal is to extract the label, type and random-sequence from each record. The data will change in the future, so the text processing method has to be universal. Is there a way to divide such a text sequence into words? The result should look like this:

field1: HTTP; field2: TRACE; field3: b615c083-0ddf-4d69-aa8e-aeff2c1c2a62;

A future dataset could look like this:

tcpestablished9sdfc866-003c-4sd5-bsfd-ba9pouigdjd 
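One possible direction, as a rough sketch and not a universal solution: peel off the random part with a regular expression, then take the longest common prefix of what remains as the label. It rests on two assumptions that the data above suggests but does not guarantee: the random part always has the shape 8-4-4-4-N alphanumeric groups, and a batch contains at least two types that start with different letters (otherwise the common prefix swallows part of the type). The names `TAIL_RE` and `split_records` are made up for this illustration.

```python
import os
import re

# Assumption: the random part is 8-4-4-4-N alphanumeric groups joined by hyphens.
TAIL_RE = re.compile(
    r"^(?P<head>.*?)"
    r"(?P<tail>[0-9A-Za-z]{8}(?:-[0-9A-Za-z]{4}){3}-[0-9A-Za-z]+)$"
)


def split_records(records):
    """Split each record into (label, type, random_sequence)."""
    parsed = []
    for rec in records:
        match = TAIL_RE.match(rec.strip())
        if match is None:
            raise ValueError(f"unexpected record shape: {rec!r}")
        parsed.append((match.group("head"), match.group("tail")))

    # The label is constant within a dataset, so it is taken to be the
    # longest common prefix of all "label + type" heads.
    label = os.path.commonprefix([head for head, _ in parsed])
    return [(label, head[len(label):], tail) for head, tail in parsed]


sample = [
    "HTTPTRACEb615c083-0ddf-4d69-aa8e-aeff2c1c2a62",
    "HTTPGETb9119006-db6e-46e1-81f8-e22d3ef94d97",
    "HTTPTRACE98fac866-003c-4555-bee3-ba9d2b1205fd",
]

for label, kind, seq in split_records(sample):
    print(f"field1: {label}; field2: {kind}; field3: {seq};")
```

For a batch with a single record, such as the tcp example alone, the common prefix is the whole "label + type" head, so the type comes back empty. In that case a list of known labels or types would be needed instead.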


Read more here: https://stackoverflow.com/questions/64892675/how-to-preprocess-monolit-text-sequences-in-python

Content Attribution

This content was originally published by leafar_giraphick at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
