Case Studies

Everyone doing Data Analysis, me too, but WHY ?

Data analysis dates back to early civilizations, where records of trade and taxes were manually kept. In the 18th and 19th centuries, statistics emerged, with pioneers like Gauss and Nightingale shaping modern analysis. The 20th century saw breakthroughs with relational databases and SQL, revolutionizing data management. By the 1990s, business intelligence and data warehouses enabled deeper insights. Today, data analysis is a critical tool in finance, healthcare, and beyond, shaping the future of technology and business strategies. Thus, general workforce should take actions to upskill in order be a part of this unprecedented era.

README – Project Overview
  • Excel: Watch the video walkthrough demonstrating Excel tricks and analysis methods used in DoorDash dataset, Download dataset here.
  • Tableau: Explore interactive charts embedded directly on this page—no Tableau file download needed. Download dataset here.
  • SQL: Query results and SQL scripts are fully shown on the webpage. Datasets could be retrieved from WorldBank

This project demonstrates core analytical skills using real tools on simulated business cases. Practice makes better skillsets — thanks for checking them out!

Project 1: A Excel Skill Share Project Reflection

I love the Skill Share projects, though I got lost in the thousands of ideas 😵‍💫. Skillsharing provides a platform for expressing ideas and spreading knowledge. These projects are critical in building our online personal brands while connecting with the community.

Time at American Family Insurance, I gained hands-on experience with IBM Mainframe databases, Apex CRM, Guidewire, Salesforce, and more. This journey opened my eyes to how technology is our most powerful assistant 🧠💻.

Years in school helped me develop analytical thinking and structured decision-making. It’s not just about knowledge, but understanding how technology works and how we can use it effectively.


Fast forward to today: Imagine you're working at DoorDash to uncover insights for a promo campaign. Data engineers prepare the datasets—now it’s our turn 🎯.

📊 Excel Tips and Tricks

  • Ctrl + A to highlight sheets. Then use Ctrl + T and Ctrl + Shift + N to move data into a clean new sheet with the right dimensions.
  • Use Define Name to name your tables early 🏷️—it keeps things organized when juggling multiple sheets.
  • To clean data, use the Remove Duplicates feature (search 🔍) and Filter for blanks. Blank cells at times might be considered as flags for inconsistency ⚠️.

📦 Why Pivot Tables?

Pivot Tables are essential for:

  • Data aggregation (totals, averages, modes, etc.)
  • OLAP slicing and dicing—filter by dimensions to uncover trends.

For example, seniors order online more, which I truly believes as my dad orders from Walmart online all the time. Pivot tables helped me identify campaign trends and family behavior patterns.

🎧 Listen to my project walkthrough to gain more insights that could inform your next campaign. With my previous experiences across industries, I believe some of my insights are worth considered.


Being part of the Data Career Jumpstart community has sharpened my skills. Seeing how others approach data differently constantly inspires me. I keep refining my work, because everyone’s story—and analysis—brings something new.

In Exploratory Data Analysis (EDA), one dataset can tell a hundred stories—each shaped by the perspective of the analyst 🔬📚.

My Confession Looking at my basic dashboard, you all can tell that I am not so much into arts but thanks God, we have Tableau/ Power BI to do this hard work for me.

Pivot Summary Scatterplot Graph Data Screenshot Dashboard Overview

This section was previously removed in error during a routine update. I’ve since restored the content, including interactive dashboards and visuals. This experience served as a valuable reminder of the importance of version control—moving forward, I will implement Git-based version tracking for this site to prevent accidental loss of work and to maintain transparency in project updates.

Project 2: Visualization with Tableau

This Tableau project explores how early academic performance—measured by 4th grade MCAS math scores—relates to long-term outcomes such as college enrollment across Massachusetts public and charter schools.

📌 Project Title:
Longitudinal Analysis of MCAS 4th Grade Math Performance and College Enrollment Across Massachusetts School Districts

🎯 Objective:
Examine whether 4th grade MCAS math scores correlate with college enrollment outcomes across public and charter schools.

📊 Data Overview:

  • Sample Size: ~900,000 students
  • Timeline: 4th to 12th grade (8-year cohort)
  • Focus Point: 5th grade transition as a milestone

🔍 Key Findings:

  1. Performance at 5th Grade Entry:
    Only 5 public districts had students pass 4th grade MCAS math (pass rates: 27–34%). However, internal grades showed 52–63% earning A’s — highlighting possible grade inflation.
  2. College Enrollment vs. Economic Disadvantage:
    In classes of 15–20 students, 50–60% of economically disadvantaged students enrolled in college. These schools had ~55–60% low-income populations.
  3. Outliers Identified:
    Positive: A class of 35 with 34 students attending college (33% disadvantaged).
    Negative: A class of 12 with only 2 students attending (50% disadvantaged).
  4. Cluster of High College Attendance:
    Schools with 10–30% disadvantaged students saw 80–90% college attendance — suggesting strong correlation between poverty concentration and post-secondary success.
  5. Charter School Performance:
    No charter schools had passing scores in 4th grade MCAS math — raising concerns about early academic prep in the charter model.
  6. Case Study – O’Bryant Math School:
    95% graduation rate, 53% non-native English speakers. Suggests language barriers don’t preclude academic success. Needs follow-up to verify MCAS alignment.

📈 Next Steps:

  • Verify if high-performing high schools were among early MCAS achievers
  • Conduct regression analysis (variables: class size, income, school type)
  • Investigate grading inflation in districts with high internal A grades

🏛️ Potential Impact:

  • Inform class size and funding policy
  • Guide early academic interventions (4th–5th grade)
  • Reassess charter school effectiveness in foundational years
Tableau dashboard screenshot
Tableau dashboard screenshot

This project underscores how Tableau can bring life to raw academic data, helping stakeholders uncover meaningful patterns in education policy.

Project 3: The One Coding Language Everyone in Data Must Know - SQL

Project Overview

How much have countries borrowed from the World Bank? Since childhood, I often heard about this organization and its efforts to support global prosperity by providing loans to poorer nations. Today, by exploring a dataset from their website, we can uncover some interesting insights about global debt, repayment, and financial obligations worldwide.

1. Which countries paid off their loans?

SQL Query:

Repaying Countries Result Countries pay more than 50% Countries pay more than 50%

There were 31 countries which paid more than 50 % of their debts to WorldBank, which was easily seen by scrolling down the result. For practicing purposes, I created another sub-query and result shows the same 31.

We can see that China repaid a huge amount out of their loans, and Macedonia, paid 95%, which meant they are incredibly reliable. Also the one who pays a smallest portion was Korea, at 0%. I wonder if there is any discrepancies in this dataset, but it was downloaded from WorldBank so, maybe in further studies we can dig deeper later.

Query Result:

SQL Query for Repaying Countries
Min- Repaying Countries
2. How much did Vietnam owe, how has the repayment process been?

SQL Query:

Vietnam Loan Query

Overall, Vietnam repaid to IDA $149 billion, which was 15% of the total owed $1,601 billion. We heard here and there stories like some countries loaned us, e.g., Japan $6 billion to create a highway, but the salary paid to the Japanese engineers were extremely high. So with these numbers, I would further explore data integrity from both sides of aid sources.

Query Result:

Result of Vietnam’s Loan
Repayment Result
3. Loan Transactions Across Countries

SQL Query:

Transactions by All Countries

From our query result, we can look at the case of Afghanistan, there are 5004 transactions fully disbursed and around 35% of it, 1854 transactions fully repaid. Up until the published date of this dataset, there were another 3195 transactions in progress of being disbursed. There are 667 repaying transactions. On the other hand, Africa, has around 121 repaid transactions

Query Result:

SQL Query for World Loan Transactions
4. Which Country Received the Maximum Loan?
SQL Query:

Most Owed

India has received the highest loan amount. No wonder, over the past 20 years, Indian talent has been widely recognized in computer science, mathematics, and engineering. Some cities have become unexpectedly wealthier due to increased investment in infrastructure and the growth of capitalism. However, the issue with foreign aid to Asian countries is that, despite the large sums provided, regional economic disparities have not significantly improved. In fact, in some cases, they have worsened.

Query Result:

SQL Query for Most Owed Country
5. Credit Status of Countries

SQL Query:

Loan Status Result

We can see that Afganistan paid off around 33% of their transactions. For further analysis, we can have some stacked column charts to compare the percentage of repaid versus total number of transactions for all countries.

Query Result:

SQL Query for Loan Status
6. Loan Disbursements status for Individual Projects

SQL Query:

Project Loan Disbursement

Ever curious about how these project used their money, and how did it impacted their community? We need to know the projects' names and hopefully we can retrieve some insights from their local newspapers or reports about them. We can take a look at Ho Chi Minh city, where I grew up from to see if I can spot any familiar names of the transactions, in later queries.

Query Result:

SQL Query for Project Disbursement
7. Revising currency formats

SQL Query:

Formats

I recognize our numbers was hard to read from afar so I revise the currency format. I also have some stem plot using SQL, weirdo right but funny.

Query Result:

Stem plot SQL Query for Formats

🌏 Global Loan Distribution – SQL Practice Case Study Key Findings

📌 Insight:
India received the highest number of loans globally, while my home country, Vietnam, ranked 5th.

Growing up in the 1980s–1990s, my hometown had no skyscrapers.Today, it's rising as the new dragon of Asia. Tech giants are taking notice: Google may soon open a headquarters in Vietnam. NVIDIA acquired the AI division of Vingroup. Qualcomm bought a division of VinAI.

🎖️ Historical Context
My father lived through two wars and remembers how hard it was for Vietnam to join the WTO in the early 2000s. Our government had to prove our economy was an open economy, where capital belonged to the people. And Vietnam was approved to join.Once known for corruption, Vietnam has made major progress in the past 5 years. The government’s top political priority today:

🚫 Eradicating corruption

🧮 Data Cleaning Process

The dataset had formatting issues due to CSV export:

🧠 SQL Concepts Practiced


Over the past 40 years, Vietnam has gone from the ground up— from a war-torn nation to an emerging tech economy. This case study sharpened my early SQL skills and reminded me how far we’ve come.

Project 4: Is My Hospital Doing Well Financially?

Project Details

Imagine I am the CFO of Mars Health, analyzing patient data from 1998–2008. These patients had diabetes and were admitted to the hospital. We want to understand how many were readmitted within 30 days, which increases financial strain and worsens outcomes due to comorbidities like cardiovascular or renal disease.

  • Many people don’t understand why hospital stays in the U.S. are limited, unlike in many parts of Asia where patients can request extended stays. However, in Asia, 90% of expenses are paid out of pocket, leading to tragic situations when families can’t afford treatment. In the U.S., early intervention is often possible, though rising concerns about private insurance add complexity.
  • As CFO of Mars Health, I want to identify departments admitting over 50 patients who require 2.5 or more procedures per stay. Procedures include EKGs, labs, MRIs, and CTs; vital sign monitoring is excluded as it incurs no cost. I've conducted four key SQL queries.
1. Number of Procedures per Department

SQL Query:

SQL Query

In the proctology department, which specializes in colorectal surgeries, patients often require 4 procedures per stay. This aligns with my experience interpreting for patients—colonoscopy alone may involve vitals, imaging, anesthesia, and more.

SQL Result:

SQL Result
2. Number of Patients per Department

SQL Query:

SQL Query

Despite having fewer procedures per stay, the cardiology department served 5,352 patients. I still need to confirm whether this figure represents one week or one month of data.

SQL Result:

SQL Result
3. Departments with More Than 50 Patients

SQL Query:

SQL Query

This query identifies departments with more than 50 admissions. There appear to be about 15 such departments.

SQL Result:

SQL Result
4. Filtering Departments with More Than 50 Patients and >2.5 Procedures

SQL Query:

SQL Query

SQL Result:

SQL Result

The analysis shows no major racial disparity in the number of procedures per patient.

5. Number of Procedures by Race

SQL Query:

SQL Query

SQL Result:

SQL Result

Further Queries: Subqueries, CTE, Window Functions, Materialized Views

The query in problem statement #5 is an example of a subquery. Let's now explore how we can revise the query as shown below.

Subquery

In this query, we join the two tables—`demographics` and `health`—on the common column `patient_nbr`.

We created a new table called `inner_table`, which doesn't automatically refresh when the data in the original tables is updated.

  • Key Benefits:
  • Materializing Data: The `inner_table` stores the results permanently, unlike a temporary view.
  • Performance Optimization: This approach saves time by avoiding repeated calculations when querying the average lab procedures per race.
  • Can Be Used for Further Analysis: You can run new queries on `inner_table` without needing to repeatedly join `health` and `demographics`.
CTE

Next, we examine CTE (Common Table Expressions) by rewriting the materialized view query as a CTE.

Window Function

Now, let's use a Window Function in the query to observe how long it takes for MySQL to process the request.

Feature Comparison Table
Feature Window Functions Common Table Expressions (CTEs) Materialized Views
Purpose Calculate running totals, ranks, etc., over a set of rows Simplify complex queries by breaking them into readable blocks Store results of a query physically for faster access
Volatile/Static Dynamic—recomputed with each query Dynamic—evaluated at runtime Static—must be refreshed manually or periodically
Storage No extra storage No extra storage Occupies disk space
Performance Efficient for row-by-row calculations Good for logical clarity; neutral on performance Can greatly improve performance for complex joins
Use Case Ranking, moving averages, percentiles Recursive queries, simplification Dashboards, heavy aggregations reused often

Key Business Takeaways

Based on our analysis, it is evident that cardiology and nephrology departments have the highest patient volumes and the most procedures performed. These departments represent critical areas of focus for operational improvements.

These insights are crucial for directing future strategies to improve departmental efficiency and patient care.

My Insights

These insights are not just theoretical but directly applicable to the project and could significantly contribute to improving business processes and enhancing overall performance.