Find items in a table that match a set of conditions in another table

General Tech Technology & Software 2 years ago

0 2 0 0 0 tuteeHUB earn credit +10 pts

5 Star Rating 1 Rating

Posted on 16 Aug 2022, this text provides information on Technology & Software related to General Tech. Please note that while accuracy is prioritized, the data presented might not be entirely correct or up-to-date. This information is offered for general knowledge and informational purposes only, and should not be considered as a substitute for professional advice.

Take Quiz To Earn Credits!

Turn Your Knowledge into Earnings.

tuteehub_quiz

Answers (2)

Post Answer
profilepic.png
manpreet Tuteehub forum best answer Best Answer 2 years ago

 

I have two tables. Table 1 has sets of EXPORT conditions: e.g., SET 1 has FOOD, GARMENT, and TECHNOLOGY, and SET 2 contains GARMENT and TECHNOLOGY.

TABLE 1
SETID       EXPORT
-----------------------
1           FOOD
1           GARMENT
1           TECHNOLOGY
-----------------------
2           GARMENT
2           TECHNOLOGY
-----------------------
3           GARMENT
3           FOOD
-----------------------
4           TECHNOLOGY
(more sets of conditions) ...

Table 2 has a list of countries with different types of exports.

TABLE 2
COUNTRY     EXPORT
-----------------------
GERMANY     AUTO
GERMANY     TECHNOLOGY
-----------------------
ITALY       FOOD              
ITALY       GARMENT 
-----------------------          
CHINA       FOOD
CHINA       GARMENT
CHINA       TECHNOLOGY
CHINA       STEEL
-----------------------
FRANCE      FOOD
FRANCE      GARMENT
-----------------------
(more countries)...

I would like to find countries that satisfy each set of the conditions. A country satisfies a set of conditions if it meets all the conditions in a set (even if it may meet additional conditions) -- For example, CHINA satisfies SET 1 even though it also meets the additional condition STEEL.

The expected output is something like:

SET          Country
--------------------------
1            CHINA
2            CHINA
3            ITALY
3            CHINA
3            FRANCE
4            GERMANY
4            CHINA
...

Here's the query I have. It does the job, but I wonder if there is a way to make it perform better.Currently, when Table 1 has 5000 rows (approx. 600 sets of conditions), and Table 2 has 6 million rows, it takes 15 minutes to get the output.

WITH TEMP AS (  SELECT   SETID, COUNTRY, COUNT(*) AS TOTALCOUNT
                  FROM   #TABLE1 AS T1
                  JOIN   #TABLE2 AS T2 ON T2.EXPORT = T1.EXPORT
              GROUP BY   SETID, COUNTRY
              )
      SELECT  TEMP.SETID, COUNTRY
        FROM  TEMP
        JOIN  (  SELECT  COUNT(*) AS TOTALCOUNT, SETID
                   FROM  #TABLE1
               GROUP BY  SETID
              ) AS TEMP2 ON  TEMP.TOTALCOUNT = TEMP2.TOTALCOUNT AND TEMP.SETID = TEMP2.SETID
    ORDER BY  TEMP.SETID

Code for generating sample tables:

CREATE TABLE #TABLE1 (SETID INT, EXPORT NVARCHAR(MAX))
INSERT INTO #TABLE1(SETID, EXPORT)
VALUES (1, 'FOOD'), (1, 'GARMENT'), (1, 'TECHNOLOGY'), (2, 'GARMENT'),
       (2, 'TECHNOLOGY'), (3, 'GARMENT'), (3, 'FOOD'), (4, 'TECHNOLOGY');

CREATE TABLE #TABLE2 (COUNTRY NVARCHAR(MAX), EXPORT NVARCHAR(MAX))
INSERT INTO #TABLE2(COUNTRY, EXPORT)
VALUES ('GERMANY', 'AUTO'), ('GERMANY', 'TECHNOLOGY'), ('ITALY', 'FOOD'),              
       ('ITALY', 'GARMENT'), ('CHINA', 'FOOD'), ('CHINA', 'GARMENT'
                                                
                                                
0 views
0 shares
profilepic.png
manpreet 2 years ago

 

This is the standard logic to find subsets, but 15 minutes for two not really big tables is really a lot.

The join might return a huge result intermediate set (it's a m-n-join), you can check it using:

SELECT COUNT(*) 
FROM   #TABLE1 AS T1
JOIN   #TABLE2 AS T2 ON T2.EXPORT = T1.EXPORT

Your query can be simplified a bit using a Group Count before the join:

WITH cte AS
 (
   SELECT *, 
      COUNT(*) OVER (PARTITION BY SETID) AS cnt
   FROM #TABLE1
 )
SELECT SETID, COUNTRY
FROM cte
JOIN #TABLE2 AS T2
  ON T2.EXPORT = cte.EXPORT
GROUP BY SETID, COUNTRY
HAVING COUNT(*) = MAX(cte.cnt)
ORDER BY SETID

But I don't expect this to improve runtime (unless the optimizer did a very stupid plan before)

Edit:

As expected, most of the runtime is caused by the huge number (1,500,000,000) of rows in the intermediate result set. For a similar problem (exactly matching two sets) I used a solution based on joining comma-delimited strings of all values in a group (which was very efficient on a Teradata system). This logic can be modified for subsets, but I don't really know about efficiency.

This is the algorithm:

For each country create a comma-delimited list like this:

,FOOD,,GARMENT,,STEEL,,TECHNOLOGY,

For each set create a comma-delimited list like this:

%,GARMENT,%,TECHNOLOGY,%

And then simply join using a LIKE (the additional commas are needed to prevent matching partial values, e.g. 'TECHNO' matching 'TECHNOLOGY'):

WITH cte1 AS
 (
   SELECT SETID, 
      STUFF((SELECT ',%,' + EXPORT 
             FROM  #TABLE1 AS t2
             WHERE t.SETID = t2.SETID 
             ORDER BY EXPORT
             FOR XML PATH('')) + ',%', 1, 1, '') AS x
   FROM #TABLE1 AS t
   GROUP BY setid
 )
,cte2 AS
 (
   SELECT COUNTRY, 
      (SELECT ',' + EXPORT  + ','
       FROM  #TABLE2 AS t2
       WHERE t.COUNTRY = t2.COUNTRY 
       ORDER BY EXPORT
       FOR XML PATH('')) AS x
   FROM #TABLE2 AS t
   GROUP BY COUNTRY
 ) 
SELECT cte1.SETID, cte2.COUNTRY
FROM
                                                    
                                                    
0 views   0 shares

No matter what stage you're at in your education or career, TuteeHub will help you reach the next level that you're aiming for. Simply,Choose a subject/topic and get started in self-paced practice sessions to improve your knowledge and scores.