add data/spider2_original
.gitattributes
CHANGED
@@ -37,3 +37,5 @@ embeddings/grast-0.6b-swift-merged/tokenizer.json filter=lfs diff=lfs merge=lfs
 embeddings/grast-4b-swift-merged/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 embeddings/grast-0.6b-flagembed-merged/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 data/spider2_lite_samples_hf/dev/dataset_info.json filter=lfs diff=lfs merge=lfs -text
+data/spider2_original/review.xlsx filter=lfs diff=lfs merge=lfs -text
+data/spider2_original/spider2_original_lite_samples.json filter=lfs diff=lfs merge=lfs -text
data/spider2_original/dev_samples_graph_no_evidence.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0d771d51ee92eeb2978a55bbc918de1f82562d6240d1eeb58c3badd502c783b7
size 67644624
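The `.pkl` file is committed as a Git LFS pointer: three `key value` lines giving the spec version, the object ID (hash algorithm plus digest), and the size in bytes. A minimal sketch of reading such a pointer, assuming only the three keys shown here (`parse_lfs_pointer` is a hypothetical helper, not part of this repo or of Git LFS):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid is written as "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real object, in bytes
    }


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:0d771d51ee92eeb2978a55bbc918de1f82562d6240d1eeb58c3badd502c783b7
size 67644624
"""

info = parse_lfs_pointer(pointer)
print(info["oid_algo"], info["size"])  # sha256 67644624
```

The pointer only records where the real ~64.5 MiB object lives; the pickle itself is fetched by `git lfs pull`.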
data/spider2_original/review.md
ADDED
@@ -0,0 +1,695 @@
# Spider 2.0 ORIGINAL (no-wildcards): review

**Total samples:** 233
**Samples with substitutions:** 83
**Total pattern substitutions:** 396
**Leftover `*` in graph (should be 0):** 0

## Substitution reasons

| reason | count |
|---|---|
| `q_year_fallback` | 250 |
| `hardcoded_default` | 101 |
| `sql_specific` | 13 |
| `table_suffix_range` | 10 |
| `sql_specific_year_alpha` | 7 |
| `sql_partial_wildcard` | 4 |
| `sql_8digit_literal` | 3 |
| `sql_specific_alphanumeric` | 3 |
| `declare` | 2 |
| `family_default` | 2 |
| `usa_1910_current_singleton` | 1 |

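The per-reason counts should add up to the reported total of 396 substitutions, and no gold entry should still reference a wildcard table. A small sketch of both checks; the counts are copied from the table above, while `leftover_wildcards` is a hypothetical helper that assumes gold entries are `table.column` strings as listed in this review:

```python
# Counts copied from the "Substitution reasons" table.
reason_counts = {
    "q_year_fallback": 250,
    "hardcoded_default": 101,
    "sql_specific": 13,
    "table_suffix_range": 10,
    "sql_specific_year_alpha": 7,
    "sql_partial_wildcard": 4,
    "sql_8digit_literal": 3,
    "sql_specific_alphanumeric": 3,
    "declare": 2,
    "family_default": 2,
    "usa_1910_current_singleton": 1,
}
REPORTED_TOTAL = 396


def leftover_wildcards(used_columns: list[str]) -> int:
    """Count 'table.column' entries whose table part still contains '*'."""
    return sum("*" in col.split(".", 1)[0] for col in used_columns)


total = sum(reason_counts.values())
print(total, total == REPORTED_TOTAL)  # 396 True
print(leftover_wildcards(["events_20210101.user_pseudo_id"]))  # 0
```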
## Top 20 patterns substituted

| pattern | count |
|---|---|
| `ga_sessions_*` | 11 |
| `zip_codes_*` | 10 |
| `censustract_*` | 10 |
| `puma_*` | 10 |
| `schooldistrictunified_*` | 10 |
| `state_*` | 10 |
| `congressionaldistrict_*` | 10 |
| `place_*` | 10 |
| `blockgroup_*` | 10 |
| `cbsa_*` | 10 |
| `schooldistrictsecondary_*` | 10 |
| `county_*` | 10 |
| `schooldistrictelementary_*` | 10 |
| `zcta_*` | 10 |
| `events_*` | 9 |
| `zcta5_*` | 8 |
| `icoads_core_*` | 6 |
| `storms_*` | 6 |
| `tlc_fhv_trips_*` | 6 |
| `tlc_yellow_trips_*` | 6 |

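A tally like the two tables above can be produced with `collections.Counter`. This is a sketch only: the `substitutions` records below are a hypothetical format (pattern, concrete table, reason) filled with a few examples from this review, not the script that generated it.

```python
from collections import Counter

# Hypothetical per-substitution records: (wildcard pattern, concrete table, reason).
substitutions = [
    ("events_*", "events_20210101", "table_suffix_range"),
    ("ga_sessions_*", "ga_sessions_20170701", "sql_partial_wildcard"),
    ("ga_sessions_*", "ga_sessions_20170101", "sql_partial_wildcard"),
    ("storms_*", "storms_1980", "sql_specific"),
]

pattern_counts = Counter(pattern for pattern, _, _ in substitutions)


def top_table(counts: Counter, n: int = 20) -> str:
    """Render the n most common patterns as a markdown table."""
    rows = ["| pattern | count |", "|---|---|"]
    rows += [f"| `{p}` | {c} |" for p, c in counts.most_common(n)]
    return "\n".join(rows)


print(top_table(pattern_counts))
```

`most_common` keeps first-seen order among ties, which is why several patterns with the same count can appear in an arbitrary-looking order.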
## Gold-affecting substitutions (47 suspect samples)

These are the samples where the substitution changes `used_columns` (the gold).

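A sample counts as gold-affecting when rewriting its `used_columns` produces a different list. A minimal sketch, assuming gold entries are `table.column` strings and that each sample's substitutions form a pattern-to-table mapping (`rewrite_gold` is hypothetical):

```python
def rewrite_gold(used_columns: list[str], mapping: dict[str, str]) -> list[str]:
    """Replace wildcard table names in 'table.column' entries using `mapping`."""
    rewritten = []
    for col in used_columns:
        table, sep, column = col.partition(".")
        if not sep:  # not a qualified column; leave it untouched
            rewritten.append(col)
            continue
        rewritten.append(f"{mapping.get(table, table)}.{column}")
    return rewritten


old_gold = ["events_*.user_pseudo_id", "events_*.event_timestamp"]
new_gold = rewrite_gold(old_gold, {"events_*": "events_20210101"})
is_gold_affecting = new_gold != old_gold

print(new_gold)
print(is_gold_affecting)  # True
```

Only exact table-name matches are rewritten, so a non-wildcard table such as `ZIP_CODES` stays as-is even when `ZIP_CODES_*` is substituted (see sid=93). Duplicate entries are preserved, which keeps the old and new counts equal.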
### sid=0 iid=bq011 db=ga4

**Q:** How many distinct pseudo users had positive engagement time in the 7-day period ending on January 7, 2021 at 23:59:59, but had no positive engagement time in the 2-day period ending on the same date (January 7, 2021 at 23:59:59) ?

**Old gold (2):** `['events_*.user_pseudo_id', 'events_*.event_timestamp']`

**New gold (2):** `['events_20210101.user_pseudo_id', 'events_20210101.event_timestamp']`

**Substitutions:**
- `events_*` → `events_20210101` (table_suffix_range:20210101 -> pad8)

---

### sid=1 iid=bq010 db=ga360

**Q:** Find the top-selling product among customers who bought 'Youtube Men’s Vintage Henley' in July 2017, excluding itself.

**Old gold (2):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.hits']`

**New gold (2):** `['ga_sessions_20170701.fullVisitorId', 'ga_sessions_20170701.hits']`

**Substitutions:**
- `ga_sessions_*` → `ga_sessions_20170701` (sql_partial_wildcard:201707 -> pad8)

---

### sid=2 iid=bq009 db=ga360

**Q:** Which traffic source has the highest total transaction revenue for the year 2017, and what is the difference in millions (rounded to two decimal places) between the highest and lowest monthly total transaction revenue for that traffic source?

**Old gold (3):** `['ga_sessions_*.date', 'ga_sessions_*.trafficSource', 'ga_sessions_*.totals']`

**New gold (3):** `['ga_sessions_20170101.date', 'ga_sessions_20170101.trafficSource', 'ga_sessions_20170101.totals']`

**Substitutions:**
- `ga_sessions_*` → `ga_sessions_20170101` (sql_partial_wildcard:2017 -> pad8)

---

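The `-> pad8` and `-> pad4` tags indicate that a partial literal found in the SQL is padded to the width of the shard suffix. Judging only from sid=1 (`201707` → `20170701`) and sid=2 (`2017` → `20170101`), missing month and day components appear to be filled with `01`. The sketch below assumes exactly that rule; `pad_suffix` and `resolve` are hypothetical names, not the review script's code.

```python
def pad_suffix(literal: str, width: int) -> str:
    """Pad a partial YYYY[MM[DD]] literal to `width` digits, filling with '01'.

    Assumed rule, inferred from the examples in this review:
      pad8: '2017' -> '20170101', '201707' -> '20170701'
      pad4: '2014' -> '2014'
    """
    if not literal.isdigit() or len(literal) > width or len(literal) % 2:
        raise ValueError(f"cannot pad {literal!r} to {width} digits")
    # Append '01' for each missing two-digit component (month, then day).
    return literal + "01" * ((width - len(literal)) // 2)


def resolve(pattern: str, literal: str, width: int) -> str:
    """Replace the trailing '*' of a wildcard table pattern with a padded suffix."""
    if not pattern.endswith("*"):
        raise ValueError(f"not a wildcard pattern: {pattern!r}")
    return pattern[:-1] + pad_suffix(literal, width)


print(resolve("ga_sessions_*", "201707", 8))  # ga_sessions_20170701
print(resolve("ga_sessions_*", "2017", 8))    # ga_sessions_20170101
print(resolve("storms_*", "1980", 4))         # storms_1980
```

Padding to the first day of the period picks a single representative shard, which is why multi-month questions such as sid=5 (April to July 2017) still resolve to `ga_sessions_20170101`.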
### sid=3 iid=bq001 db=ga360

**Q:** For each visitor who made at least one transaction in February 2017, how many days elapsed between the date of their first visit in February and the date of their first transaction in February, and on what type of device did they make that first transaction?

**Old gold (2):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.date']`

**New gold (2):** `['ga_sessions_20170201.fullVisitorId', 'ga_sessions_20170201.date']`

**Substitutions:**
- `ga_sessions_*` → `ga_sessions_20170201` (declare:20170201 -> pad8)

---

### sid=4 iid=bq002 db=ga360

**Q:** During the first half of 2017, focusing on hits product revenue, which traffic source generated the highest total product revenue, and what were the maximum daily, weekly, and monthly product revenues (in millions) for that top-performing source over this period?

**Old gold (2):** `['ga_sessions_*.date', 'ga_sessions_*.hits']`

**New gold (2):** `['ga_sessions_20170101.date', 'ga_sessions_20170101.hits']`

**Substitutions:**
- `ga_sessions_*` → `ga_sessions_20170101` (declare:20170101 -> pad8)

---

### sid=5 iid=bq003 db=ga360

**Q:** Between April 1 and July 31 of 2017, using the hits product revenue data along with the totals transactions to classify sessions as purchase (transactions ≥ 1 and productRevenue not null) or non-purchase (transactions null and productRevenue null), compare the average pageviews per visitor for each

**Old gold (2):** `['ga_sessions_*.date', 'ga_sessions_*.fullVisitorId']`

**New gold (2):** `['ga_sessions_20170101.date', 'ga_sessions_20170101.fullVisitorId']`

**Substitutions:**
- `ga_sessions_*` → `ga_sessions_20170101` (sql_partial_wildcard:2017 -> pad8)

---

### sid=7 iid=bq008 db=ga360

**Q:** In January 2017, among visitors whose campaign name contains 'Data Share' and who accessed any page starting with '/home', which page did they most commonly visit next, and what is the maximum time (in seconds) they spent on the '/home' page before moving on?

**Old gold (7):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.visitId', 'ga_sessions_*.trafficSource', 'ga_sessions_*.hits', 'ga_sessions_*.hits', 'ga_sessions_*.visitStartTime', 'ga_sessions_*.hits']`

**New gold (7):** `['ga_sessions_20170101.fullVisitorId', 'ga_sessions_20170101.visitId', 'ga_sessions_20170101.trafficSource', 'ga_sessions_20170101.hits', 'ga_sessions_20170101.hits', 'ga_sessions_20170101.visitStartTime', 'ga_sessions_20170101.hits']`

**Substitutions:**
- `ga_sessions_*` → `ga_sessions_20170101` (table_suffix_range:20170101 -> pad8)

---

### sid=8 iid=bq269 db=ga360

**Q:** Between June 1, 2017, and July 31, 2017, consider only sessions that have non-null pageviews. Classify each session as “purchase” if it has at least one transaction, or “non_purchase” otherwise. For each month, sum each visitor’s total pageviews under each classification, then compute the average pa

**Old gold (4):** `['ga_sessions_*.date', 'ga_sessions_*.fullVisitorId', 'ga_sessions_*.totals', 'ga_sessions_*.totals']`

**New gold (4):** `['ga_sessions_20170601.date', 'ga_sessions_20170601.fullVisitorId', 'ga_sessions_20170601.totals', 'ga_sessions_20170601.totals']`

**Substitutions:**
- `ga_sessions_*` → `ga_sessions_20170601` (table_suffix_range:20170601 -> pad8)

---

### sid=9 iid=bq268 db=ga360

**Q:** Identify the longest number of days between the first visit and the last recorded event (either the last visit or the first transaction) for a user, where the last recorded event is associated with a mobile device. The last recorded event could either be the last visit or the first transaction, and

**Old gold (4):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.date', 'ga_sessions_*.device', 'ga_sessions_*.hits']`

**New gold (4):** `['ga_sessions_20170701.fullVisitorId', 'ga_sessions_20170701.date', 'ga_sessions_20170701.device', 'ga_sessions_20170701.hits']`

**Substitutions:**
- `ga_sessions_*` → `ga_sessions_20170701` (family_default:ga_sessions=20170701)

---

### sid=10 iid=bq270 db=ga360

**Q:** What were the monthly add-to-cart and purchase conversion rates, calculated as a percentage of pageviews on product details, from January to March 2017?

**Old gold (3):** `['ga_sessions_*.date', 'ga_sessions_*.hits', 'ga_sessions_*.hits']`

**New gold (3):** `['ga_sessions_20170101.date', 'ga_sessions_20170101.hits', 'ga_sessions_20170101.hits']`

**Substitutions:**
- `ga_sessions_*` → `ga_sessions_20170101` (sql_partial_wildcard:2017 -> pad8)

---

### sid=11 iid=bq275 db=ga360

**Q:** Which visitor IDs belong to users whose first transaction occurred on a device explicitly labeled as 'mobile' on a later date than their first visit?

**Old gold (4):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.date', 'ga_sessions_*.hits', 'ga_sessions_*.device']`

**New gold (4):** `['ga_sessions_20170701.fullVisitorId', 'ga_sessions_20170701.date', 'ga_sessions_20170701.hits', 'ga_sessions_20170701.device']`

**Substitutions:**
- `ga_sessions_*` → `ga_sessions_20170701` (family_default:ga_sessions=20170701)

---

### sid=12 iid=bq374 db=ga360

**Q:** Calculates the percentage of new users who, between August 1, 2016, and April 30, 2017, both stayed on the site for more than 5 minutes during their initial visit and made a purchase on a subsequent visit at any later time, relative to the total number of new users in the same period.

**Old gold (6):** `['ga_sessions_*.fullVisitorId', 'ga_sessions_*.visitStartTime', 'ga_sessions_*.totals', 'ga_sessions_*.date', 'ga_sessions_*.totals', 'ga_sessions_*.totals']`

**New gold (6):** `['ga_sessions_20160801.fullVisitorId', 'ga_sessions_20160801.visitStartTime', 'ga_sessions_20160801.totals', 'ga_sessions_20160801.date', 'ga_sessions_20160801.totals', 'ga_sessions_20160801.totals']`

**Substitutions:**
- `ga_sessions_*` → `ga_sessions_20160801` (sql_8digit_literal:20160801)

---

### sid=61 iid=bq235 db=cms_data

**Q:** Can you tell me which healthcare provider incurs the highest combined average costs for both outpatient and inpatient services in 2014?

**Old gold (12):** `['outpatient_charges_*.provider_state', 'outpatient_charges_*.provider_city', 'outpatient_charges_*.provider_id', 'outpatient_charges_*.provider_name', 'outpatient_charges_*.outpatient_services', 'outpatient_charges_*.average_total_payments', 'inpatient_charges_*.provider_state', 'inpatient_charges_*.provider_city', 'inpatient_charges_*.provider_id', 'inpatient_charges_*.provider_name', 'inpatient_charges_*.total_discharges', 'inpatient_charges_*.average_medicare_payments']`

**New gold (12):** `['outpatient_charges_2014.provider_state', 'outpatient_charges_2014.provider_city', 'outpatient_charges_2014.provider_id', 'outpatient_charges_2014.provider_name', 'outpatient_charges_2014.outpatient_services', 'outpatient_charges_2014.average_total_payments', 'inpatient_charges_2014.provider_state', 'inpatient_charges_2014.provider_city', 'inpatient_charges_2014.provider_id', 'inpatient_charges_2014.provider_name', 'inpatient_charges_2014.total_discharges', 'inpatient_charges_2014.average_medicare_payments']`

**Substitutions:**
- `outpatient_charges_*` → `outpatient_charges_2014` (sql_specific:2014 -> pad4)
- `inpatient_charges_*` → `inpatient_charges_2014` (sql_specific:2014 -> pad4)

---

### sid=69 iid=bq419 db=noaa_data

**Q:** Which 5 states had the most storm events from 1980 to 1995, considering only the top 1000 states with the highest event counts each year? Please use state abbreviations.

**Old gold (2):** `['storms_*.state', 'storms_*.event_id']`

**New gold (2):** `['storms_1980.state', 'storms_1980.event_id']`

**Substitutions:**
- `storms_*` → `storms_1980` (sql_specific:1980 -> pad4)

---

### sid=71 iid=sf_bq236 db=NOAA_DATA_PLUS

**Q:** What are the top 5 zip codes of the areas in the United States that have experienced the most hail storm events in the past 10 years? Don't use data from hail reports table.

**Old gold (7):** `['STORMS_*.event_point', 'STORMS_*.event_type', 'STORMS_*.event_id', 'ZIP_CODES.zip_code', 'ZIP_CODES.city', 'ZIP_CODES.state_name', 'ZIP_CODES.zip_code_geom']`

**New gold (7):** `['STORMS_2014.event_point', 'STORMS_2014.event_type', 'STORMS_2014.event_id', 'ZIP_CODES.zip_code', 'ZIP_CODES.city', 'ZIP_CODES.state_name', 'ZIP_CODES.zip_code_geom']`

**Substitutions:**
- `STORMS_*` → `STORMS_2014` (sql_specific:2014 -> pad4)

---

### sid=75 iid=bq357 db=noaa_data

**Q:** What are the latitude and longitude coordinates and dates between 2005 and 2015 with the top 5 highest daily average wind speeds, excluding records with missing wind speed values? Using data from tables start with prefix "icoads_core".

**Old gold (6):** `['icoads_core_*.year', 'icoads_core_*.month', 'icoads_core_*.day', 'icoads_core_*.latitude', 'icoads_core_*.longitude', 'icoads_core_*.wind_speed']`

**New gold (6):** `['icoads_core_2005.year', 'icoads_core_2005.month', 'icoads_core_2005.day', 'icoads_core_2005.latitude', 'icoads_core_2005.longitude', 'icoads_core_2005.wind_speed']`

**Substitutions:**
- `icoads_core_*` → `icoads_core_2005` (table_suffix_range:2005 -> pad4)

---

### sid=93 iid=sf_bq429 db=CENSUS_BUREAU_ACS_2

**Q:** Which are the top five states with the greatest average difference in median income between 2015 and 2018 at the ZIP code level, and what is the corresponding average number of vulnerable employees across wholesale trade, natural resources and construction, arts and entertainment, information, and r

**Old gold (26):** `['ZCTA5_2017_5YR.employed_information', 'PLACE_*.geo_id', 'PLACE_*.employed_arts_entertainment_recreation_accommodation_food', 'PLACE_*.employed_wholesale_trade', 'ZIP_CODES_*.employed_information', 'ZCTA5_2018_5YR.geo_id', 'PLACE_*.occupation_natural_resources_construction_maintenance', 'ZCTA5_2015_5YR.geo_id', 'ZCTA5_2017_5YR.employed_arts_entertainment_recreation_accommodation_food', 'ZCTA5_2017_5YR.occupation_natural_resources_construction_maintenance', 'ZCTA5_2017_5YR.employed_retail_trade', 'PLACE_*.employed_information', 'ZCTA5_2018_5YR.median_income', 'ZIP_CODES_*.employed_wholesale_trade', 'ZIP_CODES_*.employed_retail_trade', 'ZIP_CODES_*.median_income', 'ZIP_CODES_*.geo_id', 'ZIP_CODES.state_name', 'PLACE_*.employed_retail_trade', 'PLACE_*.median_income', 'ZIP_CODES_*.occupation_natural_resources_construction_maintenance', 'ZIP_CODES_*.employed_arts_entertainment_recreation_accommodation_food', 'ZIP_CODES.zip_code', 'ZCTA5_2017_5YR.employed_wholesale_trade', 'ZCTA5_2015_5YR.median_income', 'ZCTA5_2017_5YR.geo_id']`

**New gold (26):** `['ZCTA5_2017_5YR.employed_information', 'PLACE_2015.geo_id', 'PLACE_2015.employed_arts_entertainment_recreation_accommodation_food', 'PLACE_2015.employed_wholesale_trade', 'ZIP_CODES_2018_5YR.employed_information', 'ZCTA5_2018_5YR.geo_id', 'PLACE_2015.occupation_natural_resources_construction_maintenance', 'ZCTA5_2015_5YR.geo_id', 'ZCTA5_2017_5YR.employed_arts_entertainment_recreation_accommodation_food', 'ZCTA5_2017_5YR.occupation_natural_resources_construction_maintenance', 'ZCTA5_2017_5YR.employed_retail_trade', 'PLACE_2015.employed_information', 'ZCTA5_2018_5YR.median_income', 'ZIP_CODES_2018_5YR.employed_wholesale_trade', 'ZIP_CODES_2018_5YR.employed_retail_trade', 'ZIP_CODES_2018_5YR.median_income', 'ZIP_CODES_2018_5YR.geo_id', 'ZIP_CODES.state_name', 'PLACE_2015.employed_retail_trade', 'PLACE_2015.median_income', 'ZIP_CODES_2018_5YR.occupation_natural_resources_construction_maintenance', 'ZIP_CODES_2018_5YR.employed_arts_entertainment_recreation_accommodation_food', 'ZIP_CODES.zip_code', 'ZCTA5_2017_5YR.employed_wholesale_trade', 'ZCTA5_2015_5YR.median_income', 'ZCTA5_2017_5YR.geo_id']`

**Substitutions:**
- `ZIP_CODES_*` → `ZIP_CODES_2018_5YR` (sql_specific_year_alpha:2018_5YR)
- `PLACE_*` → `PLACE_2015` (q_year_fallback:2015 -> pad4)

---

### sid=100 iid=sf_bq289 db=GEO_OPENSTREETMAP_CENSUS_PLACES

**Q:** Can you find the shortest distance between any two amenities (either a library, place of worship, or community center) located within Philadelphia, analyzed through pennsylvania table and planet features points?

**Old gold (5):** `['PLACES_*.place_name', 'PLACES_*.place_geom', 'PLANET_FEATURES_POINTS.all_tags', 'PLANET_FEATURES_POINTS.geometry', 'PLANET_FEATURES_POINTS.osm_id']`

**New gold (5):** `['PLACES_PENNSYLVANIA.place_name', 'PLACES_PENNSYLVANIA.place_geom', 'PLANET_FEATURES_POINTS.all_tags', 'PLANET_FEATURES_POINTS.geometry', 'PLANET_FEATURES_POINTS.osm_id']`

**Substitutions:**
- `PLACES_*` → `PLACES_PENNSYLVANIA` (sql_specific_alphanumeric:PENNSYLVANIA)

---

### sid=132 iid=bq021 db=new_york

**Q:** For the top 20 Citi Bike routes in 2016, which route is faster than yellow taxis and among those, which one has the longest average bike duration? Please provide the start station name of this route. The coordinates are rounded to three decimals.

**Old gold (14):** `['citibike_trips.start_station_name', 'citibike_trips.end_station_name', 'citibike_trips.start_station_latitude', 'citibike_trips.start_station_longitude', 'citibike_trips.end_station_latitude', 'citibike_trips.end_station_longitude', 'citibike_trips.tripduration', 'citibike_trips.starttime', 'tlc_yellow_trips_*.pickup_latitude', 'tlc_yellow_trips_*.pickup_longitude', 'tlc_yellow_trips_*.dropoff_latitude', 'tlc_yellow_trips_*.dropoff_longitude', 'tlc_yellow_trips_*.dropoff_datetime', 'tlc_yellow_trips_*.pickup_datetime']`

**New gold (14):** `['citibike_trips.start_station_name', 'citibike_trips.end_station_name', 'citibike_trips.start_station_latitude', 'citibike_trips.start_station_longitude', 'citibike_trips.end_station_latitude', 'citibike_trips.end_station_longitude', 'citibike_trips.tripduration', 'citibike_trips.starttime', 'tlc_yellow_trips_2016.pickup_latitude', 'tlc_yellow_trips_2016.pickup_longitude', 'tlc_yellow_trips_2016.dropoff_latitude', 'tlc_yellow_trips_2016.dropoff_longitude', 'tlc_yellow_trips_2016.dropoff_datetime', 'tlc_yellow_trips_2016.pickup_datetime']`

**Substitutions:**
- `tlc_yellow_trips_*` → `tlc_yellow_trips_2016` (sql_specific:2016 -> pad4)

---

### sid=134 iid=bq185 db=new_york_plus

**Q:** What is the average trip duration in minutes for all valid Yellow taxi trips that took place between February 1, 2016, and February 7, 2016 (inclusive), with a positive trip duration, more than three passengers, and a trip distance of at least ten miles, where both the pickup and dropoff locations a

**Old gold (8):** `['tlc_yellow_trips_*.dropoff_datetime', 'tlc_yellow_trips_*.pickup_datetime', 'tlc_yellow_trips_*.passenger_count', 'tlc_yellow_trips_*.trip_distance', 'tlc_yellow_trips_*.pickup_location_id', 'tlc_yellow_trips_*.dropoff_location_id', 'taxi_zone_geom.zone_id', 'taxi_zone_geom.borough']`

**New gold (8):** `['tlc_yellow_trips_2016.dropoff_datetime', 'tlc_yellow_trips_2016.pickup_datetime', 'tlc_yellow_trips_2016.passenger_count', 'tlc_yellow_trips_2016.trip_distance', 'tlc_yellow_trips_2016.pickup_location_id', 'tlc_yellow_trips_2016.dropoff_location_id', 'taxi_zone_geom.zone_id', 'taxi_zone_geom.borough']`

**Substitutions:**
- `tlc_yellow_trips_*` → `tlc_yellow_trips_2016` (sql_specific:2016 -> pad4)

---

### sid=136 iid=bq098 db=new_york_plus

**Q:** For NYC yellow taxi trips where both the pickup and dropoff occurred between January 1 and 7, 2016, inclusive, calculate the percentage of trips with no tip in each pickup borough, ensuring that only trips where the dropoff occurs after the pickup are included, the passenger count is greater than ze

**Old gold (12):** `['tlc_yellow_trips_*.dropoff_datetime', 'tlc_yellow_trips_*.pickup_datetime', 'tlc_yellow_trips_*.total_amount', 'tlc_yellow_trips_*.tip_amount', 'tlc_yellow_trips_*.passenger_count', 'tlc_yellow_trips_*.trip_distance', 'tlc_yellow_trips_*.tolls_amount', 'tlc_yellow_trips_*.mta_tax', 'tlc_yellow_trips_*.fare_amount', 'tlc_yellow_trips_*.pickup_location_id', 'taxi_zone_geom.zone_id', 'taxi_zone_geom.borough']`

**New gold (12):** `['tlc_yellow_trips_2016.dropoff_datetime', 'tlc_yellow_trips_2016.pickup_datetime', 'tlc_yellow_trips_2016.total_amount', 'tlc_yellow_trips_2016.tip_amount', 'tlc_yellow_trips_2016.passenger_count', 'tlc_yellow_trips_2016.trip_distance', 'tlc_yellow_trips_2016.tolls_amount', 'tlc_yellow_trips_2016.mta_tax', 'tlc_yellow_trips_2016.fare_amount', 'tlc_yellow_trips_2016.pickup_location_id', 'taxi_zone_geom.zone_id', 'taxi_zone_geom.borough']`

**Substitutions:**
- `tlc_yellow_trips_*` → `tlc_yellow_trips_2016` (sql_specific:2016 -> pad4)

---

### sid=137 iid=bq039 db=new_york_plus

**Q:** Find the top 10 taxi trips in New York City between July 1 and July 7, 2016 (ensuring both pickup and dropoff times fall within these dates) where the passenger count is greater than five, the trip distance is at least ten miles, and there are no negative fare-related amounts (including tip, tolls,

**Old gold (13):** `['tlc_yellow_trips_*.pickup_datetime', 'tlc_yellow_trips_*.dropoff_datetime', 'tlc_yellow_trips_*.trip_distance', 'tlc_yellow_trips_*.tip_amount', 'tlc_yellow_trips_*.total_amount', 'tlc_yellow_trips_*.pickup_location_id', 'tlc_yellow_trips_*.dropoff_location_id', 'tlc_yellow_trips_*.passenger_count', 'tlc_yellow_trips_*.tolls_amount', 'tlc_yellow_trips_*.mta_tax', 'tlc_yellow_trips_*.fare_amount', 'taxi_zone_geom.zone_name', 'taxi_zone_geom.zone_id']`

**New gold (13):** `['tlc_yellow_trips_2016.pickup_datetime', 'tlc_yellow_trips_2016.dropoff_datetime', 'tlc_yellow_trips_2016.trip_distance', 'tlc_yellow_trips_2016.tip_amount', 'tlc_yellow_trips_2016.total_amount', 'tlc_yellow_trips_2016.pickup_location_id', 'tlc_yellow_trips_2016.dropoff_location_id', 'tlc_yellow_trips_2016.passenger_count', 'tlc_yellow_trips_2016.tolls_amount', 'tlc_yellow_trips_2016.mta_tax', 'tlc_yellow_trips_2016.fare_amount', 'taxi_zone_geom.zone_name', 'taxi_zone_geom.zone_id']`

**Substitutions:**
- `tlc_yellow_trips_*` → `tlc_yellow_trips_2016` (sql_specific:2016 -> pad4)

---

### sid=175 iid=bq087 db=covid19_symptom_search

**Q:** Please calculate the overall percentage change in the average weekly search frequency for the symptom 'Anosmia' across the five New York City counties (Bronx County, Queens County, Kings County, New York County, and Richmond County) by comparing the combined data from January 1, 2019, through December

**Old gold (4):** `['symptom_search_*.symptom_anosmia', 'symptom_search_*.sub_region_1', 'symptom_search_*.sub_region_2', 'symptom_search_*.date']`

**New gold (4):** `['symptom_search_2019.symptom_anosmia', 'symptom_search_2019.sub_region_1', 'symptom_search_2019.sub_region_2', 'symptom_search_2019.date']`

**Substitutions:**
- `symptom_search_*` → `symptom_search_2019` (q_year_fallback:2019 -> pad4)

---

### sid=176 iid=bq088 db=covid19_symptom_search

**Q:** Please calculate the average levels of anxiety and depression symptoms from the weekly country data for the United States during the periods from January 1, 2019, to January 1, 2020, and from January 1, 2020, to January 1, 2021. Then, compute the percentage increase in these average symptom levels f

**Old gold (4):** `['symptom_search_*.symptom_anxiety', 'symptom_search_*.symptom_depression', 'symptom_search_*.country_region_code', 'symptom_search_*.date']`

**New gold (4):** `['symptom_search_2019.symptom_anxiety', 'symptom_search_2019.symptom_depression', 'symptom_search_2019.country_region_code', 'symptom_search_2019.date']`

**Substitutions:**
- `symptom_search_*` → `symptom_search_2019` (q_year_fallback:2019 -> pad4)

---

| 367 |
+
### sid=177 iid=bq089 db=covid19_usa
|
| 368 |
+
|
| 369 |
+
**Q:** Given the latest population estimates from the 2018 five-year American Community Survey, what is the number of vaccine sites per 1000 people for counties in California?
|
| 370 |
+
|
| 371 |
+
**Old gold (12):** `['state_*.geo_id', 'facility_boundary_us_all.facility_place_id', 'place_*.total_pop', 'facility_boundary_us_all.facility_sub_region_2_code', 'place_*.geo_id', 'censustract_*.total_pop', 'facility_boundary_us_all.facility_sub_region_2', 'congressionaldistrict_*.total_pop', 'state_*.total_pop', 'congressionaldistrict_*.geo_id', 'facility_boundary_us_all.facility_sub_region_1', 'censustract_*.geo_id']`
|
| 372 |
+
|
| 373 |
+
**New gold (12):** `['state_2018.geo_id', 'facility_boundary_us_all.facility_place_id', 'place_2018.total_pop', 'facility_boundary_us_all.facility_sub_region_2_code', 'place_2018.geo_id', 'censustract_2018.total_pop', 'facility_boundary_us_all.facility_sub_region_2', 'congressionaldistrict_2018.total_pop', 'state_2018.total_pop', 'congressionaldistrict_2018.geo_id', 'facility_boundary_us_all.facility_sub_region_1', 'censustract_2018.geo_id']`
|
| 374 |
+
|
| 375 |
+
**Substitutions:**
|
| 376 |
+
- `censustract_*` β `censustract_2018` (q_year_fallback:2018 -> pad4)
|
| 377 |
+
- `state_*` β `state_2018` (q_year_fallback:2018 -> pad4)
|
| 378 |
+
- `congressionaldistrict_*` β `congressionaldistrict_2018` (q_year_fallback:2018 -> pad4)
|
| 379 |
+
- `place_*` β `place_2018` (q_year_fallback:2018 -> pad4)
|
| 380 |
+
|
| 381 |
+
---
### sid=178 iid=bq407 db=covid19_usa

**Q:** Find the top three counties with populations over 50,000, using the 2020 5-year census data, that had the highest COVID-19 case fatality rates on August 27, 2020. For these counties, provide the name, state, median age, total population, number of confirmed COVID-19 cases per 100,000 people, number

**Old gold (30):** `['state_*.geo_id', 'place_*.geo_id', 'county_*.geo_id', 'schooldistrictelementary_*.geo_id', 'congressionaldistrict_*.median_age', 'schooldistrictsecondary_*.median_age', 'state_*.total_pop', 'summary.state', 'place_*.total_pop', 'summary.county_name', 'summary.date', 'schooldistrictelementary_*.median_age', 'congressionaldistrict_*.total_pop', 'congressionaldistrict_*.geo_id', 'schooldistrictsecondary_*.total_pop', 'summary.deaths', 'puma_*.total_pop', 'puma_*.median_age', 'summary.confirmed_cases', 'state_*.median_age', 'schooldistrictsecondary_*.geo_id', 'summary.county_fips_code', 'county_*.geo_id', 'schooldistrictelementary_*.total_pop', 'county_*.total_pop', 'puma_*.geo_id', 'county_*.median_age', 'county_*.median_age', 'place_*.median_age', 'county_*.total_pop']`

**New gold (30):** `['state_2020.geo_id', 'place_2020.geo_id', 'county_2020_5yr.geo_id', 'schooldistrictelementary_2020.geo_id', 'congressionaldistrict_2020.median_age', 'schooldistrictsecondary_2020.median_age', 'state_2020.total_pop', 'summary.state', 'place_2020.total_pop', 'summary.county_name', 'summary.date', 'schooldistrictelementary_2020.median_age', 'congressionaldistrict_2020.total_pop', 'congressionaldistrict_2020.geo_id', 'schooldistrictsecondary_2020.total_pop', 'summary.deaths', 'puma_2020.total_pop', 'puma_2020.median_age', 'summary.confirmed_cases', 'state_2020.median_age', 'schooldistrictsecondary_2020.geo_id', 'summary.county_fips_code', 'county_2020_5yr.geo_id', 'schooldistrictelementary_2020.total_pop', 'county_2020_5yr.total_pop', 'puma_2020.geo_id', 'county_2020_5yr.median_age', 'county_2020_5yr.median_age', 'place_2020.median_age', 'county_2020_5yr.total_pop']`

**Substitutions:**
- `puma_*` → `puma_2020` (q_year_fallback:2020 -> pad4)
- `state_*` → `state_2020` (q_year_fallback:2020 -> pad4)
- `congressionaldistrict_*` → `congressionaldistrict_2020` (q_year_fallback:2020 -> pad4)
- `place_*` → `place_2020` (q_year_fallback:2020 -> pad4)
- `schooldistrictsecondary_*` → `schooldistrictsecondary_2020` (q_year_fallback:2020 -> pad4)
- `county_*` → `county_2020_5yr` (sql_specific_year_alpha:2020_5yr)
- `schooldistrictelementary_*` → `schooldistrictelementary_2020` (q_year_fallback:2020 -> pad4)

---
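The sid=178 entry combines two rules: `county_*` takes a concrete name with a suffix (`sql_specific_year_alpha:2020_5yr`), while the other wildcards fall back to a year from the question (`q_year_fallback:2020`). The review does not document the resolver, so the following is a hypothetical reconstruction. It assumes the specific name is read from the gold SQL and that the fallback uses the earliest four-digit year in the question; that assumption agrees with the tags in this section (for sid=237 the question mentions 2018 before 2017, and the fallback is 2017). The SQL string below is invented for illustration.

```python
import re


def resolve_wildcard(wildcard, gold_sql, question):
    """Hypothetical resolver for a wildcard table name such as 'county_*'.

    1. If the gold SQL contains the same prefix followed by a 4-digit year and
       an alphanumeric suffix (e.g. 'county_2020_5yr'), return that name.
    2. Otherwise append the earliest 4-digit year (19xx/20xx) found in the
       question to the prefix.
    Returns None when neither step finds anything.
    """
    prefix = wildcard.rstrip("*")
    sql_match = re.search(re.escape(prefix) + r"(\d{4}_[A-Za-z0-9]+)\b", gold_sql)
    if sql_match:
        return prefix + sql_match.group(1)
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", question)]
    if years:
        return prefix + str(min(years))
    return None


# Illustrative inputs only; this is not the actual gold SQL for bq407.
sql = "SELECT c.median_age FROM `bigquery-public-data.census_bureau_acs.county_2020_5yr` AS c"
question = "Find the top three counties with populations over 50,000, using the 2020 5-year census data"

print(resolve_wildcard("county_*", sql, question))  # county_2020_5yr
print(resolve_wildcard("state_*", sql, question))   # state_2020
```

Taking the minimum year rather than the first one mentioned is the design choice that reproduces every `q_year_fallback` tag in this section; a real resolver may use a different rule.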
### sid=182 iid=bq061 db=census_bureau_acs_1

**Q:** Which census tract has witnessed the largest increase in median income between 2015 and 2018 in California? Tell me the tract code.

**Old gold (6):** `['state_*.geo_id', 'census_tracts_*.geo_id', 'censustract_*.median_income', 'censustract_*.geo_id', 'state_*.median_income', 'census_tracts_*.tract_ce']`

**New gold (6):** `['state_2015.geo_id', 'census_tracts_2015.geo_id', 'censustract_2018_5yr.median_income', 'censustract_2018_5yr.geo_id', 'state_2015.median_income', 'census_tracts_2015.tract_ce']`

**Substitutions:**
- `censustract_*` → `censustract_2018_5yr` (sql_specific_year_alpha:2018_5yr)
- `state_*` → `state_2015` (q_year_fallback:2015 -> pad4)
- `census_tracts_*` → `census_tracts_2015` (q_year_fallback:2015 -> pad4)

---
### sid=183 iid=bq064 db=census_bureau_acs_1

**Q:** Using the 2017 U.S. Census Tract data from the BigQuery public datasets, you need to proportionally allocate each tract's population and income to the zip codes based on the overlapping area between their geographic boundaries. Then, filter the results to include only those zip codes located within

**Old gold (18):** `['state_*.geo_id', 'census_tracts_*.geo_id', 'us_census_tracts_national.functional_status', 'us_census_tracts_national.tract_ce', 'state_*.income_per_capita', 'us_census_tracts_national.geo_id', 'census_tracts_*.functional_status', 'censustract_*.total_pop', 'censustract_*.income_per_capita', 'state_*.total_pop', 'zip_codes.zip_code', 'zip_codes.zip_code_geom', 'zip_codes.functional_status', 'census_tracts_*.tract_geom', 'zip_codes.state_code', 'censustract_*.geo_id', 'us_census_tracts_national.tract_geom', 'census_tracts_*.tract_ce']`

**New gold (18):** `['state_2017.geo_id', 'census_tracts_2017.geo_id', 'us_census_tracts_national.functional_status', 'us_census_tracts_national.tract_ce', 'state_2017.income_per_capita', 'us_census_tracts_national.geo_id', 'census_tracts_2017.functional_status', 'censustract_2017_5yr.total_pop', 'censustract_2017_5yr.income_per_capita', 'state_2017.total_pop', 'zip_codes.zip_code', 'zip_codes.zip_code_geom', 'zip_codes.functional_status', 'census_tracts_2017.tract_geom', 'zip_codes.state_code', 'censustract_2017_5yr.geo_id', 'us_census_tracts_national.tract_geom', 'census_tracts_2017.tract_ce']`

**Substitutions:**
- `censustract_*` → `censustract_2017_5yr` (sql_specific_year_alpha:2017_5yr)
- `state_*` → `state_2017` (q_year_fallback:2017 -> pad4)
- `census_tracts_*` → `census_tracts_2017` (q_year_fallback:2017 -> pad4)

---
### sid=194 iid=bq406 db=google_dei

**Q:** Please calculate the growth rates for Asians, Black people, Latinx people, Native Americans, White people, US women, US men, global women, and global men from 2014 to 2024 concerning the overall workforce.

**Old gold (11):** `['dar_non_intersectional_*.race_hispanic_latinx', 'dar_intersectional_*.race_white', 'dar_region_non_intersectional_representation.race_asian', 'dar_region_non_intersectional_representation.report_year', 'dar_non_intersectional_*.gender_us_men', 'dar_non_intersectional_*.gender_global_men', 'dar_non_intersectional_*.gender_global_women', 'dar_intersectional_*.race_black', 'dar_non_intersectional_*.gender_us_women', 'dar_intersectional_*.race_native_american', 'dar_region_non_intersectional_representation.workforce']`

**New gold (11):** `['dar_non_intersectional_2014.race_hispanic_latinx', 'dar_intersectional_2014.race_white', 'dar_region_non_intersectional_representation.race_asian', 'dar_region_non_intersectional_representation.report_year', 'dar_non_intersectional_2014.gender_us_men', 'dar_non_intersectional_2014.gender_global_men', 'dar_non_intersectional_2014.gender_global_women', 'dar_intersectional_2014.race_black', 'dar_non_intersectional_2014.gender_us_women', 'dar_intersectional_2014.race_native_american', 'dar_region_non_intersectional_representation.workforce']`

**Substitutions:**
- `dar_intersectional_*` → `dar_intersectional_2014` (q_year_fallback:2014 -> pad4)
- `dar_non_intersectional_*` → `dar_non_intersectional_2014` (q_year_fallback:2014 -> pad4)

---
### sid=228 iid=bq105 db=nhtsa_traffic_fatalities_plus

**Q:** According to the 2015 and 2016 accident and driver distraction, and excluding cases where the driver's distraction status is recorded as 'Not Distracted,' 'Unknown if Distracted,' or 'Not Reported,' how many traffic accidents per 100,000 people were caused by driver distraction in each U.S. state fo

**Old gold (12):** `[' accident_2015.consecutive_number', ' accident_2015.state_name', ' distract_2015.consecutive_number', ' distract_2015.driver_distracted_by_name', 'population_by_zip_*.population', 'population_by_zip_*.zipcode', 'zipcode_area.zipcode', 'zipcode_area.state_name', ' accident_2016.consecutive_number', ' accident_2016.state_name', ' distract_2016.consecutive_number', ' distract_2016.driver_distracted_by_name']`

**New gold (12):** `[' accident_2015.consecutive_number', ' accident_2015.state_name', ' distract_2015.consecutive_number', ' distract_2015.driver_distracted_by_name', 'population_by_zip_2010.population', 'population_by_zip_2010.zipcode', 'zipcode_area.zipcode', 'zipcode_area.state_name', ' accident_2016.consecutive_number', ' accident_2016.state_name', ' distract_2016.consecutive_number', ' distract_2016.driver_distracted_by_name']`

**Substitutions:**
- `population_by_zip_*` → `population_by_zip_2010` (sql_specific:2010 -> pad4)

---
### sid=237 iid=bq352 db=sdoh

**Q:** Please list the average number of prenatal weeks in 2018 for counties in Wisconsin where more than 5% of the employed population had commutes of 45-59 minutes in 2017.

**Old gold (25):** `['state_*.geo_id', 'place_*.geo_id', 'cbsa_*.geo_id', 'state_*.employed_pop', 'county_*.geo_id', 'county_*.commute_45_59_mins', 'schooldistrictelementary_*.geo_id', 'place_*.employed_pop', 'place_*.commute_45_59_mins', 'county_*.County_of_Residence', 'cbsa_*.employed_pop', 'county_*.employed_pop', 'censustract_*.commute_45_59_mins', 'state_*.commute_45_59_mins', 'schooldistrictelementary_*.commute_45_59_mins', 'censustract_*.geo_id', 'cbsa_*.commute_45_59_mins', 'schooldistrictelementary_*.employed_pop', 'county_*.County_of_Residence_FIPS', 'censustract_*.employed_pop', 'county_*.Year', 'puma_*.commute_45_59_mins', 'county_*.Ave_Number_of_Prenatal_Wks', 'puma_*.geo_id', 'puma_*.employed_pop']`

**New gold (25):** `['state_2017.geo_id', 'place_2017.geo_id', 'cbsa_2017.geo_id', 'state_2017.employed_pop', 'county_2017_5yr.geo_id', 'county_2017_5yr.commute_45_59_mins', 'schooldistrictelementary_2017.geo_id', 'place_2017.employed_pop', 'place_2017.commute_45_59_mins', 'county_2017_5yr.County_of_Residence', 'cbsa_2017.employed_pop', 'county_2017_5yr.employed_pop', 'censustract_2017.commute_45_59_mins', 'state_2017.commute_45_59_mins', 'schooldistrictelementary_2017.commute_45_59_mins', 'censustract_2017.geo_id', 'cbsa_2017.commute_45_59_mins', 'schooldistrictelementary_2017.employed_pop', 'county_2017_5yr.County_of_Residence_FIPS', 'censustract_2017.employed_pop', 'county_2017_5yr.Year', 'puma_2017.commute_45_59_mins', 'county_2017_5yr.Ave_Number_of_Prenatal_Wks', 'puma_2017.geo_id', 'puma_2017.employed_pop']`

**Substitutions:**
- `puma_*` → `puma_2017` (q_year_fallback:2017 -> pad4)
- `censustract_*` → `censustract_2017` (q_year_fallback:2017 -> pad4)
- `state_*` → `state_2017` (q_year_fallback:2017 -> pad4)
- `place_*` → `place_2017` (q_year_fallback:2017 -> pad4)
- `cbsa_*` → `cbsa_2017` (q_year_fallback:2017 -> pad4)
- `county_*` → `county_2017_5yr` (sql_specific_year_alpha:2017_5yr)
- `schooldistrictelementary_*` → `schooldistrictelementary_2017` (q_year_fallback:2017 -> pad4)

---
### sid=238 iid=bq074 db=sdoh

**Q:** Count the number of counties that experienced an increase in unemployment from 2015 to 2018, using 5-year ACS data, and a decrease in dual-eligible enrollee counts between December 1, 2015, and December 1, 2018.

**Old gold (16):** `['dual_eligible_enrollment_by_county_and_program.County_Name', 'state_*.geo_id', 'county_*.unemployed_pop', 'dual_eligible_enrollment_by_county_and_program.Public_Total', 'place_*.geo_id', 'congressionaldistrict_*.unemployed_pop', 'schooldistrictsecondary_*.unemployed_pop', 'puma_*.geo_id', 'dual_eligible_enrollment_by_county_and_program.FIPS', 'county_*.geo_id', 'state_*.unemployed_pop', 'place_*.unemployed_pop', 'dual_eligible_enrollment_by_county_and_program.Date', 'congressionaldistrict_*.geo_id', 'puma_*.unemployed_pop', 'schooldistrictsecondary_*.geo_id']`

**New gold (16):** `['dual_eligible_enrollment_by_county_and_program.County_Name', 'state_2015.geo_id', 'county_2018_5yr.unemployed_pop', 'dual_eligible_enrollment_by_county_and_program.Public_Total', 'place_2015.geo_id', 'congressionaldistrict_2015.unemployed_pop', 'schooldistrictsecondary_2015.unemployed_pop', 'puma_2015.geo_id', 'dual_eligible_enrollment_by_county_and_program.FIPS', 'county_2018_5yr.geo_id', 'state_2015.unemployed_pop', 'place_2015.unemployed_pop', 'dual_eligible_enrollment_by_county_and_program.Date', 'congressionaldistrict_2015.geo_id', 'puma_2015.unemployed_pop', 'schooldistrictsecondary_2015.geo_id']`

**Substitutions:**
- `puma_*` → `puma_2015` (q_year_fallback:2015 -> pad4)
- `state_*` → `state_2015` (q_year_fallback:2015 -> pad4)
- `congressionaldistrict_*` → `congressionaldistrict_2015` (q_year_fallback:2015 -> pad4)
- `place_*` → `place_2015` (q_year_fallback:2015 -> pad4)
- `schooldistrictsecondary_*` → `schooldistrictsecondary_2015` (q_year_fallback:2015 -> pad4)
- `county_*` → `county_2018_5yr` (sql_specific_year_alpha:2018_5yr)

---
### sid=239 iid=bq066 db=sdoh

**Q:** Could you assess the relationship between the poverty rates from the previous year's census data and the percentage of births without maternal morbidity for the years 2016 to 2018? Use only data for births where no maternal morbidity was reported and for each year, use the 5-year census data from th

**Old gold (28):** `['state_*.geo_id', 'county_*.geo_id', 'county_*.Maternal_Morbidity_YN', 'cbsa_*.geo_id', 'place_*.geo_id', 'county_*.geo_id', 'place_*.pop_determined_poverty_status', 'cbsa_*.poverty', 'county_*.pop_determined_poverty_status', 'cbsa_*.pop_determined_poverty_status', 'county_*.geo_id', 'county_*.poverty', 'county_*.pop_determined_poverty_status', 'county_*.geo_id', 'puma_*.poverty', 'county_*.pop_determined_poverty_status', 'state_*.poverty', 'puma_*.pop_determined_poverty_status', 'county_*.Births', 'county_*.County_of_Residence_FIPS', 'county_*.poverty', 'county_*.Year', 'county_*.poverty', 'place_*.poverty', 'puma_*.geo_id', 'state_*.pop_determined_poverty_status', 'county_*.pop_determined_poverty_status', 'county_*.poverty']`

**New gold (28):** `['state_2016.geo_id', 'county_2015_5yr.geo_id', 'county_2015_5yr.Maternal_Morbidity_YN', 'cbsa_2016.geo_id', 'place_2016.geo_id', 'county_2015_5yr.geo_id', 'place_2016.pop_determined_poverty_status', 'cbsa_2016.poverty', 'county_2015_5yr.pop_determined_poverty_status', 'cbsa_2016.pop_determined_poverty_status', 'county_2015_5yr.geo_id', 'county_2015_5yr.poverty', 'county_2015_5yr.pop_determined_poverty_status', 'county_2015_5yr.geo_id', 'puma_2016.poverty', 'county_2015_5yr.pop_determined_poverty_status', 'state_2016.poverty', 'puma_2016.pop_determined_poverty_status', 'county_2015_5yr.Births', 'county_2015_5yr.County_of_Residence_FIPS', 'county_2015_5yr.poverty', 'county_2015_5yr.Year', 'county_2015_5yr.poverty', 'place_2016.poverty', 'puma_2016.geo_id', 'state_2016.pop_determined_poverty_status', 'county_2015_5yr.pop_determined_poverty_status', 'county_2015_5yr.poverty']`

**Substitutions:**
- `puma_*` → `puma_2016` (q_year_fallback:2016 -> pad4)
- `state_*` → `state_2016` (q_year_fallback:2016 -> pad4)
- `place_*` → `place_2016` (q_year_fallback:2016 -> pad4)
- `cbsa_*` → `cbsa_2016` (q_year_fallback:2016 -> pad4)
- `county_*` → `county_2015_5yr` (sql_specific_year_alpha:2015_5yr)

---
### sid=267 iid=bq204 db=eclipse_megamovie

**Q:** Find the user with the highest total clicks across all records from all available photo collections.

**Old gold (1):** `['photos_v_0_*.user']`

**New gold (1):** `['photos_v_0_1.user']`

**Substitutions:**
- `photos_v_0_*` → `photos_v_0_1` (sql_specific:1 -> pad4)

---
### sid=279 iid=bq049 db=iowa_liquor_sales_plus

**Q:** Please show the monthly per capita Bourbon Whiskey sales during 2022 in Dubuque County for the zip code that ranks third in total Bourbon Whiskey sales, using only the population aged 21 and older.

**Old gold (8):** `['sales.category_name', 'sales.date', 'sales.zip_code', 'sales.sale_dollars', 'sales.county', 'population_by_zip_*.zipcode', 'population_by_zip_*.population', 'population_by_zip_*.minimum_age']`

**New gold (8):** `['sales.category_name', 'sales.date', 'sales.zip_code', 'sales.sale_dollars', 'sales.county', 'population_by_zip_2022.zipcode', 'population_by_zip_2022.population', 'population_by_zip_2022.minimum_age']`

**Substitutions:**
- `population_by_zip_*` → `population_by_zip_2022` (q_year_fallback:2022 -> pad4)

---
### sid=281 iid=bq286 db=usa_names

**Q:** Can you tell me the name of the most popular female baby in Wyoming for the year 2021, based on the proportion of female babies given that name compared to the total number of female babies given the same name across all states?

**Old gold (5):** `['usa_1910_*.name', 'usa_1910_*.gender', 'usa_1910_*.year', 'usa_1910_*.number', 'usa_1910_*.state']`

**New gold (5):** `['usa_1910_current.name', 'usa_1910_current.gender', 'usa_1910_current.year', 'usa_1910_current.number', 'usa_1910_current.state']`

**Substitutions:**
- `usa_1910_*` → `usa_1910_current` (usa_1910_current_singleton)

---
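The sid=281 substitution is tagged `usa_1910_current_singleton`, with no year involved. One plausible reading, assumed here rather than documented, is that exactly one table matched `usa_1910_*` in the schema used for this review, so the wildcard could be replaced without ambiguity. A minimal sketch of that check with Python's standard `fnmatch` module; `resolve_singleton` and the table lists are hypothetical:

```python
from fnmatch import fnmatchcase


def resolve_singleton(wildcard, available_tables):
    """Return the single table matching the wildcard, else None.

    Assumed reading of the 'singleton' tag: the substitution is only safe when
    exactly one table in the database matches the pattern. fnmatchcase is used
    so matching is case-sensitive on every platform.
    """
    matches = [name for name in available_tables if fnmatchcase(name, wildcard)]
    return matches[0] if len(matches) == 1 else None


# Hypothetical table listings.
print(resolve_singleton("usa_1910_*", ["usa_1910_current"]))                  # usa_1910_current
print(resolve_singleton("events_*", ["events_20201101", "events_20201201"]))  # None
```

Returning `None` for several matches, instead of picking one, keeps ambiguous wildcards such as the daily-sharded `events_*` tables for the year- or date-based rules used elsewhere in this review.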
### sid=284 iid=bq143 db=CPTAC_PDC

**Q:** Use CPTAC proteomics and RNAseq data for Clear Cell Renal Cell Carcinoma to select 'Primary Tumor' and 'Solid Tissue Normal' samples. Join the datasets on sample submitter IDs and gene symbols. Calculate the correlation between protein abundance (log2 ratio) and gene expression levels (log-transform

**Old gold (11):** `['quant_proteome_*.case_id', 'quant_proteome_*.aliquot_id', 'quant_proteome_*.gene_symbol', 'quant_proteome_*.protein_abundance_log2ratio', 'aliquot_to_case_mapping_current.sample_submitter_id', 'aliquot_to_case_mapping_current.sample_type', 'aliquot_to_case_mapping_current.case_id', 'aliquot_to_case_mapping_current.aliquot_id', 'RNAseq_hg38_gdc_current.gene_name', 'RNAseq_hg38_gdc_current.fpkm_unstranded', 'RNAseq_hg38_gdc_current.sample_barcode']`

**New gold (11):** `['quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current.case_id', 'quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current.aliquot_id', 'quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current.gene_symbol', 'quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current.protein_abundance_log2ratio', 'aliquot_to_case_mapping_current.sample_submitter_id', 'aliquot_to_case_mapping_current.sample_type', 'aliquot_to_case_mapping_current.case_id', 'aliquot_to_case_mapping_current.aliquot_id', 'RNAseq_hg38_gdc_current.gene_name', 'RNAseq_hg38_gdc_current.fpkm_unstranded', 'RNAseq_hg38_gdc_current.sample_barcode']`

**Substitutions:**
- `quant_proteome_*` → `quant_proteome_CPTAC_CCRCC_discovery_study_pdc_current` (sql_specific_alphanumeric:CPTAC_CCRCC_discovery_study_pdc_current)

---
### sid=318 iid=bq006 db=austin

**Q:** What is the date with the second highest Z-score for daily counts of 'PUBLIC INTOXICATION' incidents in Austin for the year 2016? List the date in the format of '2016-xx-xx'.

**Old gold (2):** `['incidents_*.date', 'incidents_*.descript']`

**New gold (2):** `['incidents_2016.date', 'incidents_2016.descript']`

**Substitutions:**
- `incidents_*` → `incidents_2016` (sql_specific:2016 -> pad4)

---
### sid=323 iid=bq430 db=ebi_chembl

**Q:** Find pairs of different molecules tested in the same assay and standard type, where both have 10–15 heavy atoms, fewer than 5 activities in that assay, fewer than 2 duplicate activities, non-null standard values, and pChEMBL values over 10. For each pair, report the maximum heavy atom count, the lat

**Old gold (13):** `['activities_*.assay_id', 'activities_*.standard_type', 'activities_*.activity_id', 'activities_*.standard_value', 'activities_*.standard_relation', 'activities_*.pchembl_value', 'activities_*.molregno', 'compound_structures_25.canonical_smiles', 'compound_properties_26.heavy_atoms', 'docs_*.doc_id', 'docs_*.year', 'docs_*.journal', 'docs_*.first_page']`

**New gold (13):** `['activities_29.assay_id', 'activities_29.standard_type', 'activities_29.activity_id', 'activities_29.standard_value', 'activities_29.standard_relation', 'activities_29.pchembl_value', 'activities_29.molregno', 'compound_structures_25.canonical_smiles', 'compound_properties_26.heavy_atoms', 'docs_29.doc_id', 'docs_29.year', 'docs_29.journal', 'docs_29.first_page']`

**Substitutions:**
- `activities_*` → `activities_29` (sql_specific:29 -> pad4)
- `docs_*` → `docs_29` (sql_specific:29 -> pad4)

---
### sid=369 iid=ga001 db=ga4

**Q:** I want to know the preferences of customers who purchased the Google Navy Speckled Tee in December 2020. What other product was purchased with the highest total quantity alongside this item?

**Old gold (5):** `['events_*.user_pseudo_id', 'events_*.event_name', 'events_*.items', 'events_*.items', 'events_*.items']`

**New gold (5):** `['events_20201201.user_pseudo_id', 'events_20201201.event_name', 'events_20201201.items', 'events_20201201.items', 'events_20201201.items']`

**Substitutions:**
- `events_*` → `events_20201201` (table_suffix_range:20201201 -> pad8)

---
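GA4 exports to BigQuery are sharded into one `events_YYYYMMDD` table per day, and month-level questions such as ga001 are commonly answered by querying `events_*` with a `_TABLE_SUFFIX` filter. The `table_suffix_range:20201201` tag suggests the substitution keeps the lower bound of such a range, so the New gold names a single shard out of the month. The sketch below assumes the range is written as `_TABLE_SUFFIX BETWEEN 'YYYYMMDD' AND 'YYYYMMDD'` (the query text is illustrative, not the gold SQL) and expands the filter to show which shard is kept and how many the original query spans:

```python
import re
from datetime import datetime, timedelta

SUFFIX_RANGE = re.compile(
    r"_TABLE_SUFFIX\s+BETWEEN\s+'(\d{8})'\s+AND\s+'(\d{8})'", re.IGNORECASE
)


def shards_in_suffix_range(sql, prefix="events_"):
    """List the daily shard names covered by a _TABLE_SUFFIX BETWEEN filter.

    Both bounds are inclusive, as with SQL BETWEEN. Returns [] if the SQL
    contains no such filter.
    """
    m = SUFFIX_RANGE.search(sql)
    if not m:
        return []
    start, end = (datetime.strptime(s, "%Y%m%d").date() for s in m.groups())
    return [
        prefix + (start + timedelta(days=i)).strftime("%Y%m%d")
        for i in range((end - start).days + 1)
    ]


# Illustrative query shape for a December 2020 question.
sql = (
    "SELECT user_pseudo_id FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` "
    "WHERE _TABLE_SUFFIX BETWEEN '20201201' AND '20201231'"
)
shards = shards_in_suffix_range(sql)
print(shards[0], len(shards))  # events_20201201 31
```

Under this assumption, the first element is the name recorded in the New gold, while the query itself reads all 31 December shards.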
### sid=370 iid=ga002 db=ga4

**Q:** Tell me the most purchased other products and their quantities by customers who bought the Google Red Speckled Tee each month for the three months starting from November 2020.

**Old gold (3):** `['events_*.user_pseudo_id', 'events_*.items', 'events_*.event_name']`

**New gold (3):** `['events_20201101.user_pseudo_id', 'events_20201101.items', 'events_20201101.event_name']`

**Substitutions:**
- `events_*` → `events_20201101` (sql_8digit_literal:20201101)

---
### sid=372 iid=ga004 db=ga4

**Q:** Can you figure out the average difference in pageviews between users who bought something and those who didn't in December 2020? Just label anyone who was involved in purchase events as a purchaser.

**Old gold (2):** `['events_*.user_pseudo_id', 'events_*.event_name']`

**New gold (2):** `['events_20201201.user_pseudo_id', 'events_20201201.event_name']`

**Substitutions:**
- `events_*` → `events_20201201` (table_suffix_range:20201201 -> pad8)

---
### sid=373 iid=ga008 db=ga4

**Q:** Could you provide the total number of page views for each day in November 2020 as well as the average number of page views per user on those days, restricted to users who made at least one purchase in November 2020?

**Old gold (3):** `['events_*.user_pseudo_id', 'events_*.event_date', 'events_*.event_name']`

**New gold (3):** `['events_20201101.user_pseudo_id', 'events_20201101.event_date', 'events_20201101.event_name']`

**Substitutions:**
- `events_*` → `events_20201101` (table_suffix_range:20201101 -> pad8)

---
### sid=374 iid=ga017 db=ga4

**Q:** How many distinct users viewed the most frequently visited page during January 2021?

**Old gold (4):** `['events_*.event_params', 'events_*.event_name', 'events_*.event_timestamp', 'events_*.user_pseudo_id']`

**New gold (4):** `['events_20210101.event_params', 'events_20210101.event_name', 'events_20210101.event_timestamp', 'events_20210101.user_pseudo_id']`

**Substitutions:**
- `events_*` → `events_20210101` (table_suffix_range:20210101 -> pad8)

---
### sid=377 iid=ga018 db=ga4

**Q:** On January 2nd, 2021, I want to determine the percentage of times users transition from a product list page (PLP) view to a product detail page (PDP) view within the same session, using only page_view events. Could you calculate how many PLP views eventually led to a PDP view in the same session on

**Old gold (10):** `['events_*.event_name', 'events_*.event_date', 'events_*.event_timestamp', 'events_*.user_pseudo_id', 'events_*.user_id', 'events_*.device', 'events_*.geo', 'events_*.traffic_source', 'events_*.event_params', 'events_*.user_properties']`

**New gold (10):** `['events_20210102.event_name', 'events_20210102.event_date', 'events_20210102.event_timestamp', 'events_20210102.user_pseudo_id', 'events_20210102.user_id', 'events_20210102.device', 'events_20210102.geo', 'events_20210102.traffic_source', 'events_20210102.event_params', 'events_20210102.user_properties']`

**Substitutions:**
- `events_*` → `events_20210102` (sql_8digit_literal:20210102)

---
### sid=382 iid=ga010 db=ga4

**Q:** Can you give me an overview of our website traffic for December 2020? I'm particularly interested in the channel with the fourth highest number of sessions.

**Old gold (3):** `['events_*.user_pseudo_id', 'events_*.event_params', 'events_*.event_timestamp']`

**New gold (3):** `['events_20201201.user_pseudo_id', 'events_20201201.event_params', 'events_20201201.event_timestamp']`

**Substitutions:**
- `events_*` → `events_20201201` (table_suffix_range:20201201 -> pad8)

---
### sid=385 iid=ga012 db=ga4

**Q:** On November 30, 2020, identify the item category with the highest tax rate by dividing tax value in usd by purchase revenue in usd for purchase events, and then retrieve the transaction IDs, total item quantities, and both purchase revenue in usd and purchase revenue for those purchase events in tha

**Old gold (7):** `['events_*.items', 'events_*.ecommerce', 'events_*.ecommerce', 'events_*.event_name', 'events_*.ecommerce', 'events_*.ecommerce', 'events_*.ecommerce']`

**New gold (7):** `['events_20200101.items', 'events_20200101.ecommerce', 'events_20200101.ecommerce', 'events_20200101.event_name', 'events_20200101.ecommerce', 'events_20200101.ecommerce', 'events_20200101.ecommerce']`

**Substitutions:**
- `events_*` → `events_20200101` (q_year_fallback:2020 -> pad8)

---
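The ga012 question names a specific day, November 30, 2020, yet the tag `q_year_fallback:2020 -> pad8` produced `events_20200101`. Assuming pad8 means a bare year is extended with `0101` to fill the eight-digit `YYYYMMDD` suffix, the sketch below reproduces that name. For comparison, it also shows a hypothetical date-aware variant that reads a 'Month D, YYYY' date from the question instead. Neither function is taken from the review tooling.

```python
import re
from datetime import datetime

# Matches dates written like "November 30, 2020".
LONG_DATE = re.compile(r"\b[A-Z][a-z]+ \d{1,2}, \d{4}\b")


def year_fallback_pad8(prefix, year):
    """Assumed pad8 fallback: a bare year becomes YYYY0101 (January 1)."""
    return f"{prefix}{year:04d}0101"


def date_from_question(prefix, question):
    """Hypothetical alternative: use the first 'Month D, YYYY' date in the question."""
    for match in LONG_DATE.finditer(question):
        try:
            day = datetime.strptime(match.group(0), "%B %d, %Y")
        except ValueError:  # capitalised word that is not an English month name
            continue
        return prefix + day.strftime("%Y%m%d")
    return None


question = "On November 30, 2020, identify the item category with the highest tax rate"
print(year_fallback_pad8("events_", 2020))      # events_20200101
print(date_from_question("events_", question))  # events_20201130
```

`%B` parses English month names under Python's default C locale; a question phrased differently (for example "January 2nd, 2021" in ga018) would not match this pattern and would fall through to `None`.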
data/spider2_original/review.xlsx
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ea5c49b0a59d75cf15fef63d3060338fefce5caf1de19b4ff0a3c1f60fe85be2
size 118096

data/spider2_original/spider2_original_lite_samples.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:56cd6497cf0e479d8f3d880a7c57395c670d80bedce97325054ffdc1dde95dc1
size 198491180

data/spider2_original/substitution_log.json
ADDED
The diff for this file is too large to render.