XPath – Get just some part of the attribute value or text node

Well, I have the following HTML, and I want to extract the @data-coords attribute from it, with the latitude and longitude in separate variables. See the HTML below:

<div id="gmap-container">
    <div id="gmap-value" data-coords="-26.995548880319042,-48.633818457672135,16,150"></div>
</div>

If I use //div[@id='gmap-value']/@data-coords as the XPath, it returns the entire value of the @data-coords attribute.

My Python code is something like this:

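import parsel

xpaths = {
    "parser_lat": "//div[@id='gmap-value']/@data-coords",
    "parser_lon": "//div[@id='gmap-value']/@data-coords",
}

# parse_coords and html are placeholder names for the surrounding scraper code
def parse_coords(html: str):
    selector = parsel.Selector(text=html)
    # Both expressions currently return the whole attribute value
    latitude: str = selector.xpath(xpaths['parser_lat']).extract_first()
    longitude: str = selector.xpath(xpaths['parser_lon']).extract_first()
    return latitude, longitude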

I would like to get the latitude and longitude split as mentioned above. I know that I can add a regular expression to the Python code to get what I want, but that would break the pipeline for other websites. Here is an example using a regular expression, which I don't want to use:

import re

# Match signed decimal numbers; the original '^-(\d+\.\d+)' only matched once, at the start
regex_expression = r'-?\d+\.\d+'
coords = '-26.995548880319042,-48.633818457672135,16,150'

latitude = re.findall(regex_expression, coords)[0]
longitude = re.findall(regex_expression, coords)[1]

The example above is meant to give me -26.995548880319042 and -48.633818457672135 in their respective variables, but as I mentioned, it would break the pipeline for other websites.

I want to get this result I mentioned above only using XPath, like this:

parser_lat: regex('-?\d+\.\d+', //div[@id='gmap-value']/@data-coords)[0]
parser_lon: regex('-?\d+\.\d+', //div[@id='gmap-value']/@data-coords)[1]

and then use it in the first Python code example I gave.

I tried using substring, but it didn't work for me.

Read more here: https://stackoverflow.com/questions/66320610/xpath-get-just-some-part-of-the-attribute-value-or-text-node

Content Attribution

This content was originally published by João Koritar at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
