Kusto Detective Agency 2022, Case #2 – Election fraud?
The mayor of Digitown, Mrs. Gaia Budskott, has found herself in quite a pickle. The election for the city’s mascot was run online for the first time, and it was a huge success! Or was it??
Over 5 million people voted. Four candidates made it to the final round:
- – Kastor the Elephant – The darling of Digitown Zoo
- – Gaul the Octopus – A Digitown celebrity, who was a whiz at predicting who’d win the local soccer games
- – William (Willie) the Tortoise – Digitown’s oldest living creature (estimated age – 176.4 years)
- – Poppy the Goldfish – ex-Mayor Jason Guvid’s childhood pet
The polls predicted a close battle between Kastor and Gaul, but the actual results showed that the ex-mayor’s fish got a whopping 51.7% of all votes! That sure does sound fishy…
The mayor is afraid of a vote-tampering scandal that could affect all elections in Digitown! You’ve helped her out last time, and she’s counting on you to get to the bottom of this mystery.
If voting fraud happened – prove it and correct the election numbers: what percentage of the votes did each candidate get?
You have access to the elections data: IP, anonymized id, vote, date-time – and the function used for counting the votes.
Good luck, rookie. We’re counting on you.
.execute database script <| // Ingestion may take ~40sec to complete, total 5M+ records .create-merge table Votes (Timestamp:datetime, vote:string, via_ip:string, voter_hash_id:string) .ingest async into table Votes (@'https://kustodetectiveagency.blob.core.windows.net/digitown-votes/votes_1.csv.gz') .ingest async into table Votes (@'https://kustodetectiveagency.blob.core.windows.net/digitown-votes/votes_2.csv.gz') .ingest async into table Votes (@'https://kustodetectiveagency.blob.core.windows.net/digitown-votes/votes_3.csv.gz') The description gives us a query to run. This query calculates the percentages of each candidate. Lets run it!
// Query that counts the votes: Votes | summarize Count=count() by vote | as hint.materialized=true T | extend Total = toscalar(T | summarize sum(Count)) | project vote, Percentage = round(Count*100.0 / Total, 1), Count | order by Count
The key takeaways from the riddle:
- Election fraud has happened, but how — is the question
- Kastor and Gaul were supposed to be in a close battle
- Poppy won the elections! 🐟
Run Below Query to get the “What percentage of the votes did each candidate get?” Answer
The Votes
table has; Timestamp, Vote, IP Address, Voter Hash ID.
Optimistically, we can see if the same voter has voted multiple times?
Let’s count
by Voter Hash ID, and filter where
voter count more than once:
Votes
| summarize VoterCount = count() by voter_hash_id
| where VoterCount > 1
We shall utilize bin
operator to group votes counts into bins of 15 minutes.
Votes
| extend bin_timestamp = bin(Timestamp, 15m)
| summarize count() by vote, bin_timestamp
| render columnchart with (ysplit=axes, series=bin_timestamp)
Counting votes for each IP in a specific timestamp bin
That tells us how many votes each candidate gets each minute:
Votes
| extend bin_timestamp = bin(Timestamp, 1m)
| summarize votes_count_per_bin = count() by vote, bin_timestamp, via_ip
| summarize count_votes_count_per_bin = count() by votes_count_per_bin, vote
| render columnchart
Let’s remove votes of IPs where there have been more than 11 in a single minute!
The iff
function allows setting a value of 0 when the condition for fraudulent votes is detected otherwise the original values for votes from that IP:
Votes
| extend bin_timestamp = bin(Timestamp, 1m)
| summarize votes_count_per_bin = count() by vote, bin_timestamp, via_ip
| extend votes_count_per_bin = iff(votes_count_per_bin >= 11, 0, votes_count_per_bin)
| summarize Count=sum(votes_count_per_bin) by vote
| as hint.materialized=true T
| extend Total = toscalar(T | summarize sum(Count))
| project vote, Percentage = round(Count*100.0 / Total, 1), Count
| order by Count
The Answer is below.