Speak now
Please Wait Image Converting Into Text...
Embark on a journey of knowledge! Take the quiz and earn valuable credits.
Challenge yourself and boost your learning! Start the quiz now to earn credits.
Unlock your potential! Begin the quiz, answer questions, and accumulate credits along the way.
General Tech Technology & Software 2 years ago
Posted on 16 Aug 2022, this text provides information on Technology & Software related to General Tech. Please note that while accuracy is prioritized, the data presented might not be entirely correct or up-to-date. This information is offered for general knowledge and informational purposes only, and should not be considered as a substitute for professional advice.
Turn Your Knowledge into Earnings.
I have two tables. Table 1 has sets of EXPORT conditions: e.g., SET 1 has FOOD, GARMENT, and TECHNOLOGY, and SET 2 contains GARMENT and TECHNOLOGY.
TABLE 1 SETID EXPORT ----------------------- 1 FOOD 1 GARMENT 1 TECHNOLOGY ----------------------- 2 GARMENT 2 TECHNOLOGY ----------------------- 3 GARMENT 3 FOOD ----------------------- 4 TECHNOLOGY (more sets of conditions) ...
Table 2 has a list of countries with different types of exports.
TABLE 2 COUNTRY EXPORT ----------------------- GERMANY AUTO GERMANY TECHNOLOGY ----------------------- ITALY FOOD ITALY GARMENT ----------------------- CHINA FOOD CHINA GARMENT CHINA TECHNOLOGY CHINA STEEL ----------------------- FRANCE FOOD FRANCE GARMENT ----------------------- (more countries)...
I would like to find countries that satisfy each set of the conditions. A country satisfies a set of conditions if it meets all the conditions in a set (even if it may meet additional conditions) -- For example, CHINA satisfies SET 1 even though it also meets the additional condition STEEL.
The expected output is something like:
SET Country -------------------------- 1 CHINA 2 CHINA 3 ITALY 3 CHINA 3 FRANCE 4 GERMANY 4 CHINA ...
Here's the query I have. It does the job, but I wonder if there is a way to make it perform better.Currently, when Table 1 has 5000 rows (approx. 600 sets of conditions), and Table 2 has 6 million rows, it takes 15 minutes to get the output.
WITH TEMP AS ( SELECT SETID, COUNTRY, COUNT(*) AS TOTALCOUNT FROM #TABLE1 AS T1 JOIN #TABLE2 AS T2 ON T2.EXPORT = T1.EXPORT GROUP BY SETID, COUNTRY ) SELECT TEMP.SETID, COUNTRY FROM TEMP JOIN ( SELECT COUNT(*) AS TOTALCOUNT, SETID FROM #TABLE1 GROUP BY SETID ) AS TEMP2 ON TEMP.TOTALCOUNT = TEMP2.TOTALCOUNT AND TEMP.SETID = TEMP2.SETID ORDER BY TEMP.SETID
Code for generating sample tables:
CREATE TABLE #TABLE1 (SETID INT, EXPORT NVARCHAR(MAX)) INSERT INTO #TABLE1(SETID, EXPORT) VALUES (1, 'FOOD'), (1, 'GARMENT'), (1, 'TECHNOLOGY'), (2, 'GARMENT'), (2, 'TECHNOLOGY'), (3, 'GARMENT'), (3, 'FOOD'), (4, 'TECHNOLOGY'); CREATE TABLE #TABLE2 (COUNTRY NVARCHAR(MAX), EXPORT NVARCHAR(MAX)) INSERT INTO #TABLE2(COUNTRY, EXPORT) VALUES ('GERMANY', 'AUTO'), ('GERMANY', 'TECHNOLOGY'), ('ITALY', 'FOOD'), ('ITALY', 'GARMENT'), ('CHINA', 'FOOD'), ('CHINA', 'GARMENT' REPLY 0 views 0 likes 0 shares Facebook Twitter Linked In WhatsApp
This is the standard logic to find subsets, but 15 minutes for two not really big tables is really a lot.
The join might return a huge result intermediate set (it's a m-n-join), you can check it using:
SELECT COUNT(*) FROM #TABLE1 AS T1 JOIN #TABLE2 AS T2 ON T2.EXPORT = T1.EXPORT
Your query can be simplified a bit using a Group Count before the join:
WITH cte AS ( SELECT *, COUNT(*) OVER (PARTITION BY SETID) AS cnt FROM #TABLE1 ) SELECT SETID, COUNTRY FROM cte JOIN #TABLE2 AS T2 ON T2.EXPORT = cte.EXPORT GROUP BY SETID, COUNTRY HAVING COUNT(*) = MAX(cte.cnt) ORDER BY SETID
But I don't expect this to improve runtime (unless the optimizer did a very stupid plan before)
As expected, most of the runtime is caused by the huge number (1,500,000,000) of rows in the intermediate result set. For a similar problem (exactly matching two sets) I used a solution based on joining comma-delimited strings of all values in a group (which was very efficient on a Teradata system). This logic can be modified for subsets, but I don't really know about efficiency.
This is the algorithm:
For each country create a comma-delimited list like this:
,FOOD,,GARMENT,,STEEL,,TECHNOLOGY,
For each set create a comma-delimited list like this:
%,GARMENT,%,TECHNOLOGY,%
And then simply join using a LIKE (the additional commas are needed to prevent matching partial values, e.g. 'TECHNO' matching 'TECHNOLOGY'):
LIKE
WITH cte1 AS ( SELECT SETID, STUFF((SELECT ',%,' + EXPORT FROM #TABLE1 AS t2 WHERE t.SETID = t2.SETID ORDER BY EXPORT FOR XML PATH('')) + ',%', 1, 1, '') AS x FROM #TABLE1 AS t GROUP BY setid ) ,cte2 AS ( SELECT COUNTRY, (SELECT ',' + EXPORT + ',' FROM #TABLE2 AS t2 WHERE t.COUNTRY = t2.COUNTRY ORDER BY EXPORT FOR XML PATH('')) AS x FROM #TABLE2 AS t GROUP BY COUNTRY ) SELECT cte1.SETID, cte2.COUNTRY FROM REPLY 0 views 0 likes 0 shares Facebook Twitter Linked In WhatsApp
No matter what stage you're at in your education or career, TuteeHub will help you reach the next level that you're aiming for. Simply,Choose a subject/topic and get started in self-paced practice sessions to improve your knowledge and scores.
General Tech 10 Answers
General Tech 7 Answers
General Tech 3 Answers
General Tech 9 Answers
General Tech 2 Answers
Ready to take your education and career to the next level? Register today and join our growing community of learners and professionals.