I have an enormous dataframe containing sensor data harvested from a chassis with nearly 30 embedded processors while it was undergoing environmental stress testing. The vast majority of the sensors are for temperature, but there are several electrical sensors as well. Some processors report more complete data than others for their attached sensors. In particular, some of the electrical sensors provide only two of the set (volts, amps, watts), and I want to calculate the third (via the power relation P = V·I).
Each sensor value is in its own row, along with all the metadata needed to characterize that specific acquisition. After filtering to find the acquisitions with a missing value, I calculate a new row from the other two.
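To make the layout concrete, here is a minimal sketch of the long format I'm describing; the column names (`processor`, `sensor_type`, `value`) are placeholders, not my actual schema:

```python
import pandas as pd

# Hypothetical long-format data: one sensor reading per row, plus the
# metadata identifying the acquisition. Each electrical acquisition here
# reports only volts and amps; watts must be derived.
df_elec = pd.DataFrame({
    "processor":   ["cpu01", "cpu01", "cpu02", "cpu02"],
    "sensor_type": ["volts", "amps",  "volts", "amps"],
    "value":       [3.30,    0.50,    5.00,    1.20],
})
```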
This is the only calculation in my entire analysis that uses two rows to compute a third. The problem is that it is extremely slow, and I have no idea how to vectorize it (as I have for all my other Pandas dataframe operations). Here's the start of the loop:
for i, g in df_elec.groupby(np.arange(len(df_elec)) // 2):  # pair up consecutive rows
I append each new row to the end of the dataframe, then sort and reindex once all additions have been made.
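For reference, the whole pattern looks roughly like the sketch below. This is a simplified stand-in, not my actual code: it assumes each consecutive pair of rows is (volts, amps) and that watts is always the missing quantity, and the column names are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical two-rows-per-acquisition frame: volts, then amps.
df_elec = pd.DataFrame({
    "sensor_type": ["volts", "amps", "volts", "amps"],
    "value":       [3.30,    0.50,   5.00,    1.20],
})

new_rows = []
# Group consecutive row pairs; each pair describes one acquisition.
for i, g in df_elec.groupby(np.arange(len(df_elec)) // 2):
    row = g.iloc[0].copy()            # inherit the pair's metadata
    row["sensor_type"] = "watts"
    row["value"] = g["value"].prod()  # watts = volts * amps
    new_rows.append(row)

# Append the derived rows, then reindex.
df_elec = pd.concat([df_elec, pd.DataFrame(new_rows)], ignore_index=True)
```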
But it takes 3 minutes to run, whereas my other operations are nearly instantaneous.
How can I make it faster?