I have a big table with about 20,000,000 records and need to write an aggregation query that groups by 20 columns. For an optimal query I want to do the aggregation first in a subquery and only then do the joins. But here is the problem: the join uses columns that I didn't include in the subquery, because I don't need them in the output.
I use an aggregate function (array_agg) in the subquery and need the single values from that array for the join. But when I use unnest() I get duplicated rows, because every aggregated row is expanded into one row per array element, and all the other columns are repeated for each of them. How can I remove the duplicates and still use the values for the join?
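To illustrate what I mean, here is a minimal sketch (the table t1 and its columns grp and value are made up for this example) showing why unnest() multiplies the rows:

```sql
-- Hypothetical minimal example (t1 is not my real table).
-- Two source rows fall into the same group:
--   (grp = 'a', value = 10) and (grp = 'a', value = 10)
-- array_agg collapses them into one group row: ('a', {10,10}).
-- unnest() then expands that group row back into two rows,
-- so every other column of the group is duplicated, and a
-- later join on single_value matches once per duplicate.
SELECT grp, unnest(array_agg(value)) AS single_value
FROM t1
GROUP BY grp;
-- grp | single_value
--  a  | 10
--  a  | 10   <- duplicate of the group row
```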
Here is an example (very slow query):
SELECT
    col1,               -- col1..col10 come from table1
    col2,
    col3,
    ...
    col10,
    col11,              -- col11..col20 come from table2
    ...
    col20,
    aggregation_fun1,
    aggregation_fun2,
    ...
    aggregation_fun5,   -- this one is array_agg(table1.value)
FROM table1
LEFT JOIN table2 ON table2.value = table1.value   -- "value" is not in the SELECT list
GROUP BY 1, 2, 3, 4, 5, 6, ..., 20
So I wanted to write an optimized version:
WITH aggregation_data AS (      -- aggregation first
    SELECT
        col1,               -- col1..col10 come from table1
        col2,
        ...
        col10,
        aggregation_fun1,
        aggregation_fun2,
        ...
        aggregation_fun5    -- array_agg(table1.value) AS array_aggg
    FROM table1
    GROUP BY 1, 2, 3, 4, 5, 6, ..., 10
),
unnest_value AS (
    SELECT *, unnest(array_aggg) AS single_value
    FROM aggregation_data
),
join_table AS (
    SELECT *
    FROM unnest_value a
    LEFT JOIN table2 b ON b.value = a.single_value
)
SELECT * FROM join_table
The version with the WITH CTE is faster, but it returns duplicated rows. I can't simply use DISTINCT, because the query time then becomes much higher than that of the base query. So I need help to remove the duplicates without losing the performance gain.
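To make it concrete, this is the kind of straightforward deduplication I mean when I say DISTINCT is too slow (join_table is the CTE from the query above):

```sql
-- Too slow on ~20M rows: DISTINCT has to compare the full wide
-- row set produced by the unnest + join, which defeats the point
-- of aggregating first.
SELECT DISTINCT *
FROM join_table;
```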