Well, I have the following HTML and I want to get the @data-coords attribute from it, but with the latitude and longitude in separate variables. See the HTML below:
<div id="gmap-container">
<div id="gmap-value" data-coords="-26.995548880319042,-48.633818457672135,16,150">
...
</div>
</div>
If I use //div[@id='gmap-value']/@data-coords as the XPath, it returns the entire value of the @data-coords attribute.
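My Python code is something like this:
import parsel

xpaths = {
    "parser_lat": "//div[@id='gmap-value']/@data-coords",
    "parser_lon": "//div[@id='gmap-value']/@data-coords",
}

def parse_coords(html: str) -> tuple:
    # Selector needs the page markup; parsel.Selector() with no arguments raises ValueError.
    selector = parsel.Selector(text=html)
    latitude: str = selector.xpath(xpaths["parser_lat"]).get()  # .get() is the newer name for extract_first()
    longitude: str = selector.xpath(xpaths["parser_lon"]).get()
    return latitude, longitude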
I would like to get the latitude and longitude split as described above. I know I could post-process the value with a regular expression in the Python code, but that would break the pipeline for other websites. Here is an example using a regular expression, which I don't want to use:
import re

coords = '-26.995548880319042,-48.633818457672135,16,150'
regex_expression = r'-?\d+\.\d+'  # optional minus sign, digits, a dot, digits
matches = re.findall(regex_expression, coords)
latitude, longitude = matches[0], matches[1]
This example gives me -26.995548880319042 and -48.633818457672135 in their respective variables, but, as I mentioned, it would break the pipeline for other websites.
I want to get the result above using only XPath, something like this (pseudo-syntax; regex() is not an XPath 1.0 function):
parser_lat: regex('-?\d+\.\d+', //div[@id='gmap-value']/@data-coords)[0]
parser_lon: regex('-?\d+\.\d+', //div[@id='gmap-value']/@data-coords)[1]
and then use these expressions in the first Python code example I gave.
I tried using substring(), but it didn't work for me.
Read more here: https://stackoverflow.com/questions/66320610/xpath-get-just-some-part-of-the-attribute-value-or-text-node
Content Attribution
This content was originally published by João Koritar at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.