Tips in aggregation with join query

I have big table with about 20 000 000 records and need write query with aggregation that refer to 20 columns (according to them I aggregated). In optimal query I want to make firstly aggregation in subquery and then joins. But here is the problem, I use to join columns that i didn't use in subquery, because i don't need them in output.

I saw that we use aggregate function in subquery (array_agg) and i need singles values from this array. But when i use UNNEST() i got duplicated values, because every columns are writed to unnested values. How can remove duplicates and use them to join ?

Here is example (very slow query):

SELECT 
col1,
col2,
col3,                 <----- destination of these columns are from : col1-col10 from table1 
col4,                        col11-col20 from table2
col5,
....
col20
aggregation_fun1
aggregation_fun2
......
aggregation_fun5  -> array_agg(table1.value)
FROM table1 
LEFT JOIN table2 ON table2.value=table1.value <---this value isn't on top
GROUP BY 1,2,3,4,5,6...20

So, i wanted to write optimal:

WITH aggregation_data as (     <---- aggregation first
                            col1,
                            col2,
                            col3,
                            col4,      -----> columns from table1
                            col5,
                            ....
                            col10
                            aggregation_fun1
                            aggregation_fun2
                            ......
                            aggregation_fun5  -> array_agg(table1.value) as array_aggg
                            FROM table1 
                            GROUP BY 1,2,3,4,5,6...10
                             ),
 unnest_value as (
                       SELECT *, unnest(array_aggg) single_value FROM aggregation_data                     
                      ),
join_table as (
                  SELECT * from unnest_value a
                  LEFT JOIN  table2 b ON b.value=a.single_value
              )
 SELECT * FROM join_table

Query in this with WITH CTE is faster, but it has duplicated values. I can't use distinct, because time of query will be much higher than base query. So i need help to at least remove duplication without lose performance.



Read more here: https://stackoverflow.com/questions/67925187/tips-in-aggregation-with-join-query

Content Attribution

This content was originally published by Sebastian Sańko at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.

%d bloggers like this: