Posted on 16 Aug 2022, this text provides information on Technology & Software related to General Tech. Please note that while accuracy is prioritized, the data presented might not be entirely correct or up-to-date. This information is offered for general knowledge and informational purposes only, and should not be considered as a substitute for professional advice.
Table 2 has a list of countries with different types of exports.
TABLE 2
COUNTRY EXPORT
-----------------------
GERMANY AUTO
GERMANY TECHNOLOGY
-----------------------
ITALY FOOD
ITALY GARMENT
-----------------------
CHINA FOOD
CHINA GARMENT
CHINA TECHNOLOGY
CHINA STEEL
-----------------------
FRANCE FOOD
FRANCE GARMENT
-----------------------
(more countries)...
I would like to find countries that satisfy each set of the conditions. A country satisfies a set of conditions if it meets all the conditions in a set (even if it may meet additional conditions) -- For example, CHINA satisfies SET 1 even though it also meets the additional condition STEEL.
The expected output is something like:
SET Country
--------------------------
1 CHINA
2 CHINA
3 ITALY
3 CHINA
3 FRANCE
4 GERMANY
4 CHINA
...
Here's the query I have. It does the job, but I wonder if there is a way to make it perform better.Currently, when Table 1 has 5000 rows (approx. 600 sets of conditions), and Table 2 has 6 million rows, it takes 15 minutes to get the output.
WITH TEMP AS(SELECT SETID, COUNTRY, COUNT(*)AS TOTALCOUNT
FROM#TABLE1 AS T1
JOIN#TABLE2 AS T2 ON T2.EXPORT = T1.EXPORT
GROUPBY SETID, COUNTRY
)SELECT TEMP.SETID, COUNTRY
FROM TEMP
JOIN(SELECT COUNT(*)AS TOTALCOUNT, SETID
FROM#TABLE1
GROUPBY SETID
)AS TEMP2 ON TEMP.TOTALCOUNT = TEMP2.TOTALCOUNT AND TEMP.SETID = TEMP2.SETID
ORDERBY TEMP.SETID
This is the standard logic to find subsets, but 15 minutes for two not really big tables is really a lot.
The join might return a huge result intermediate set (it's a m-n-join), you can check it using:
SELECT COUNT(*)FROM#TABLE1 AS T1
JOIN#TABLE2 AS T2 ON T2.EXPORT = T1.EXPORT
Your query can be simplified a bit using a Group Count before the join:
WITH cte AS(SELECT*,
COUNT(*)OVER(PARTITIONBY SETID)AS cnt
FROM#TABLE1
)SELECT SETID, COUNTRY
FROM cte
JOIN#TABLE2 AS T2
ON T2.EXPORT = cte.EXPORT
GROUPBY SETID, COUNTRY
HAVING COUNT(*)= MAX(cte.cnt)ORDERBY SETID
But I don't expect this to improve runtime (unless the optimizer did a very stupid plan before)
Edit:
As expected, most of the runtime is caused by the huge number (1,500,000,000) of rows in the intermediate result set. For a similar problem (exactly matching two sets) I used a solution based on joining comma-delimited strings of all values in a group (which was very efficient on a Teradata system). This logic can be modified for subsets, but I don't really know about efficiency.
This is the algorithm:
For each country create a comma-delimited list like this:
,FOOD,,GARMENT,,STEEL,,TECHNOLOGY,
For each set create a comma-delimited list like this:
%,GARMENT,%,TECHNOLOGY,%
And then simply join using a LIKE (the additional commas are needed to prevent matching partial values, e.g. 'TECHNO' matching 'TECHNOLOGY'):
WITH cte1 AS(SELECT SETID,
STUFF((SELECT',%,'+ EXPORT
FROM#TABLE1 AS t2
WHERE t.SETID = t2.SETID
ORDERBY EXPORT
FORXML PATH(''))+',%',1,1,'')AS x
FROM#TABLE1 AS t
GROUPBY setid
),cte2 AS(SELECT COUNTRY,(SELECT','+ EXPORT +','FROM#TABLE2 AS t2
WHERE t.COUNTRY = t2.COUNTRY
ORDERBY EXPORT
FORXML PATH(''))AS x
FROM#TABLE2 AS t
GROUPBY COUNTRY
)SELECT cte1.SETID, cte2.COUNTRY
FROM
No matter what stage you're at in your education or career, TuteeHUB will help you reach the next level that
you're aiming for. Simply,Choose a subject/topic and get started in self-paced practice
sessions to improve your knowledge and scores.
manpreet
Best Answer
2 years ago
I have two tables. Table 1 has sets of EXPORT conditions: e.g., SET 1 has FOOD, GARMENT, and TECHNOLOGY, and SET 2 contains GARMENT and TECHNOLOGY.
Table 2 has a list of countries with different types of exports.
I would like to find countries that satisfy each set of the conditions. A country satisfies a set of conditions if it meets all the conditions in a set (even if it may meet additional conditions) -- For example, CHINA satisfies SET 1 even though it also meets the additional condition STEEL.
The expected output is something like:
Here's the query I have. It does the job, but I wonder if there is a way to make it perform better.Currently, when Table 1 has 5000 rows (approx. 600 sets of conditions), and Table 2 has 6 million rows, it takes 15 minutes to get the output.
Code for generating sample tables: