Life, Love, and Lawbreaking

Investigating the Correlation Between Birth, Marriage Counts,
and Crime Rates in the Philippines

As they say, it is hard to start a family and have children.

While starting a family is often seen as a source of joy and stability, the reality in the Philippines is far more complicated. This ideal is burdened with financial, emotional, and social pressures that weigh increasingly on Filipinos today.

Declining marriage and birth rates reflect a growing hesitation among Filipinos to commit to long-term family life.

Meanwhile, communities across the country continue to face challenges from crime. Although shaped by many factors, its patterns may mirror broader societal stresses — and when regarded alongside trends in birth and marriage, they could reveal meaningful insights.

Thus, in line with the United Nations SDG 16: Peace, Justice and Strong Institutions, our group set out to explore the correlation between birth, marriage counts, and crime rates in the Philippines — not to imply any causation, but to systematically uncover patterns that could shed light on the realities of Filipino family life and public safety.

Can investigating how these figures rise and fall together reveal patterns beneath the surface?

To see a fuller picture, let us look at some statistics.

The number of registered marriages in the Philippines has steadily declined over the past five years, falling by 7.8% from 2022 to 2023 alone. [1] The COVID-19 pandemic dealt a major blow to these numbers, bringing 2020 figures to their lowest level since 1970. [2]

At the same time, the latest data indicate that the number of registered births decreased by 17.2% from 2014 to 2023. [3]

These trends signal more profound shifts in how Filipinos approach family life, shaped by worsening economic conditions.

By the end of 2024, 63% of Filipino families considered themselves poor, with half stating they could not afford a healthy diet. [4]

In these conditions, starting a family becomes not just a personal goal but a difficult economic gamble.

In parallel, crime rates continue to fluctuate, influenced by both real-world changes and modifications in reporting systems. While crime prevention efforts and shifts in data collection have affected the numbers, [5] [6] the trends nonetheless reflect the pressures that communities face.

Together, these indicators paint a picture of a country grappling with both personal and collective burdens.

Well, what now?

Let us concretize our approach in the succeeding sections.

The issue is, birth, marriage, and crime are often studied in isolation, but they are all deeply tied to the state of a society.

Without a clear grasp of how these trends may interact, we risk overlooking possible indicators of wider societal stress, especially in relation to the demands of building a family.

Recognizing this gap, what do we do?

We endeavor to leverage the power of data science to analyze statistics surrounding births, marriages, and crimes.

To move beyond intuition and surface real patterns, we framed the following research questions:

1. Is there a statistically significant correlation between monthly crime rates and monthly birth and marriage counts in the Philippines?

2. What patterns in total crime incidents emerge across Philippine regions with varying birth and marriage counts and levels of urbanization?

3. Which specific crime types show strong correlations with both birth and marriage counts?

To anchor our investigation, we hypothesized:

NULL HYPOTHESIS

There is no statistically significant correlation between crime rates and birth and marriage counts in the Philippines.

ALTERNATIVE HYPOTHESIS

There is a statistically significant correlation between crime rates and birth and marriage counts in the Philippines.

And finally, we mapped out a plan:

We collect monthly crime data by type and region, along with birth and marriage data by region, to explore trends and assess correlations between crime rates and both birth and marriage counts.

DATA COLLECTION

To carry out our analysis, we gathered four datasets that directly support our research questions.

CRIME

Crime statistics were sourced from the Philippine National Police (PNP) website, encompassing national crime data by month from January 2022 to September 2023 in PDF format.

We compiled the data from these PDFs into a unified sheet in order to preprocess it.

BIRTH AND MARRIAGE

Birth and marriage statistics were collected from the Philippine Statistics Authority (PSA) website, covering monthly data from January 2022 to October 2024 grouped by region, province, and highly urbanized city. These data, compiled from local civil registrars, are organized into separate Excel sheets per year.

URBANIZATION

Data on urbanization, expressed in percentages by region, province, and highly urbanized city, was taken from the 2020 Census of Population and Housing conducted by the PSA. The findings were officially released in Excel format and made publicly available on their website.

DATA EXPLORATION

Data Understanding

Good research begins with understanding the data. Let us provide a closer look at the datasets that shape our findings.

The Crime dataset contains 378 rows and 66 columns.

Each row is uniquely identified by:

  • Source Name
    • The filename of the PDF from which the data was extracted.
    • Note:
      • This column also denotes the month and year the crime data represents, which is ordinal in nature.
  • OFFICE
    • The name of the PNP regional office that reported the data.
    • Type of Data: Qualitative, Nominal Level

The remaining columns are aggregated from three tables present in each PDF:

  1. Index and non-index crimes
  2. Public safety-related crimes
  3. A summary table

These columns include:

  • Crime Types
    • Type of Data: Quantitative, Discrete, Ratio Level
  • Derived Totals
    • Type of Data: Quantitative, Discrete, Ratio Level
  • Derived Rates
    • Type of Data: Quantitative, Continuous, Ratio Level

Further technical details are available in the linked page.

This dataset provides a comprehensive snapshot of reported crimes across Philippine regions, capturing both crime types and derived performance indicators of law enforcement.

The Birth dataset consists of 405 rows and 16 columns, while the Marriage dataset includes 402 rows and 16 columns.

Since both datasets come from the same source, they share a nearly identical structure, allowing them to be seamlessly merged for preprocessing.

Each row is uniquely identified by:

  • Location Column
    • In the Birth dataset: PLACE OF USUAL RESIDENCE
    • In the Marriage dataset: PLACE OF OCCURRENCE
    • Both refer to the same concept: the geographic area where the event is associated.
    • Type of Data: Qualitative, Discrete, Nominal level
  • Type of Place
    • Identifies whether the row refers to a region, province, or highly urbanized city.
    • Type of Data: Qualitative, Discrete, Nominal level
  • Year
    • The year of reporting.
    • Type of Data: Quantitative, Discrete, Ordinal level

The remaining columns cover event counts:

  • Total
    • The overall total of registered events.
    • Type of Data: Quantitative, Discrete, Ratio Level
  • January to December columns
    • The monthly total of registered events.
    • Type of Data: Quantitative, Discrete, Ratio Level

These datasets offer a structured view of vital events across regions and time, helping trace patterns in how Filipinos form families over recent years.

The Urbanization dataset has 135 rows and 8 columns.

Each row is uniquely identified by:

  • REGION, PROVINCE, AND HIGHLY URBANIZED CITY
    • The geographic unit associated with the statistics.
    • Type of Data: Qualitative, Discrete, Nominal level
  • Type of Place
    • Classifies each entry as a region, province, or highly urbanized city.
    • Type of Data: Qualitative, Discrete, Nominal level

The remaining columns describe population and urbanization measures:

  • Total Population Columns
    • The total number of individuals living in the geographic area.
    • Type of Data: Quantitative, Discrete, Ratio Level
  • Urban Population Columns
    • The number of individuals living in areas classified as urban within the geographic unit.
    • Type of Data: Quantitative, Discrete, Ratio Level
  • Urbanization Percentages
    • The proportion of the total population living in urban areas, expressed as a percentage.
    • Type of Data: Quantitative, Continuous, Ratio Level

This dataset provides a static but essential snapshot of the distribution between urban and rural living spaces across the Philippines and remains the most recent and complete urbanization reference as of the time of this study.

We compiled the key details of each dataset into the below summary:

Dataset | Size | Time Covered | Identifiers | Source
Crime | 378 × 66 | 1/2022 - 9/2023 | Source Name, OFFICE | PNP PDFs
Birth | 405 × 16 | 1/2022 - 10/2024 (provisional) | Place, Year | PSA Excel Files
Marriage | 402 × 16 | 1/2022 - 10/2024 (provisional) | Place, Year | PSA Excel Files
Urbanization | 135 × 8 | 2020 (static) | Place | PSA Excel Files

Now that we are more familiar with our data, we can take a closer look at its nuances.

Data Preparation

Raw data rarely comes ready for analysis. So, we must preprocess our datasets to lay the groundwork for accurate and reliable insights.

To clean our datasets, we followed a general framework:

Format Refinement → Removal of Irrelevant Columns → Removal of Irrelevant Rows → Handling of Missing Values → Handling of Duplicate Values → Handling of Outliers → Final Adjustments

CRIME

  1. Format Refinement
    1. Renamed columns for consistent casing and clarity
    2. Renamed the Source Name column to Year-Month
    3. Renamed rows in the Year-Month column to reflect actual time periods
    4. Renamed the Office column to Place
    5. Renamed rows in the Place column to reflect actual Philippine regions
  2. Removal of Irrelevant Columns
    1. Dropped all columns pertaining to derived totals and rates, except for the Average Monthly Crime Rate column
  3. Removal of Irrelevant Rows
    1. Dropped rows where the Region column has the value TOTAL
  4. Handling of Outliers
    1. Retained all outliers to preserve possible contextual significance
  5. Final Adjustments
    1. Converted incorrectly flagged categorical columns into numerical type
    2. Ordered rows by Year-Month

BIRTH AND MARRIAGE

  1. Format Refinement
    1. Standardized the location column
    2. Reformatted the data to have a single Month column instead of twelve separate columns
    3. Merged the Birth and Marriage datasets
    4. Converted incorrectly flagged categorical columns into numerical type
    5. Combined Year and Month columns into a single Year-Month column
  2. Removal of Irrelevant Rows and Columns
    1. Dropped rows where the location column has the value TOTAL
    2. Dropped rows where the Type of Place column is not equal to Region
      • Dropped the Type of Place column
    3. Dropped rows from October 2024 onwards
  3. Handling of Missing Values
    1. Filled the missing September 2024 value for BARMM in the Number of Registered Marriages column using the rounded mean of its September 2022 and 2023 values
  4. Handling of Outliers
    1. Retained all outliers to preserve possible contextual significance
  5. Final Adjustments
    1. Replaced all instances of MIMAROPA REGION in the location column with REGION IV-B (MIMAROPA)

URBANIZATION

  1. Removal of Irrelevant Columns
    1. Removed every column except identifiers and the Percent Urban - 2020 column
  2. Removal of Irrelevant Rows
    1. Dropped rows where the Type of Place column is not equal to Region
      • Dropped the Type of Place column
  3. Final Adjustments
    1. Renamed the Percent Urban - 2020 column to Percent Urban
    2. Renamed the REGION, PROVINCE, AND HIGHLY URBANIZED CITY column to Place
    3. Replaced all instances of MIMAROPA REGION in the location column with REGION IV-B (MIMAROPA)
    4. Replaced all instances of REGION XIII (CARAGA) in the location column with REGION XIII (Caraga)
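
Two of the Birth and Marriage steps above, the reshape from twelve month columns into a single Year-Month column and the rounded-mean fill for BARMM, can be sketched in plain Python. This is only an illustration under assumptions: the column names `Place`, `Year`, and `Count`, the helper names, and all numbers are made up, and the original work may well have used a dataframe library instead (in pandas, `DataFrame.melt` performs this kind of reshape).

```python
from statistics import mean

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def wide_to_long(rows):
    """Turn one row per (place, year) with twelve month columns into
    one row per (place, year-month)."""
    long_rows = []
    for row in rows:
        for number, month in enumerate(MONTHS, start=1):
            long_rows.append({
                "Place": row["Place"],
                "Year-Month": f"{row['Year']}-{number:02d}",
                "Count": row.get(month),  # None when the value is missing
            })
    return long_rows


def fill_with_same_month_mean(long_rows, place, target, sources):
    """Fill a missing count with the rounded mean of the same place's
    counts in the given source months (hypothetical helper)."""
    lookup = {(r["Place"], r["Year-Month"]): r for r in long_rows}
    values = [lookup[(place, ym)]["Count"] for ym in sources]
    lookup[(place, target)]["Count"] = round(mean(values))


# Made-up numbers, for illustration only.
wide = [
    {"Place": "BARMM", "Year": 2022, "September": 1200},
    {"Place": "BARMM", "Year": 2023, "September": 1300},
    {"Place": "BARMM", "Year": 2024},  # September value is missing
]
long_rows = wide_to_long(wide)
fill_with_same_month_mean(long_rows, "BARMM", "2024-09", ["2022-09", "2023-09"])
```

After the call, the 2024-09 row for BARMM holds the rounded mean of the two earlier September values, mirroring the fill described in the list.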

View how each dataset was preprocessed in detail:

After individually preparing our datasets, we merged them into a consolidated dataset consisting of 357 rows × 24 columns.
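
A merge of this kind can be sketched as an inner join on the shared keys. The sketch below assumes the three cleaned tables share a `Place` key and that the crime and vital-statistics tables also share a `Year-Month` key; the function name and sample values are hypothetical. The 357 rows are consistent with 17 regions observed over the 21 months from January 2022 to September 2023, which is an inference from the figures reported here rather than something stated outright.

```python
def merge_on_keys(crime_rows, vital_rows, urban_by_place):
    """Inner-join crime and vital-statistics rows on (Place, Year-Month),
    then attach each place's static Percent Urban value."""
    vital = {(r["Place"], r["Year-Month"]): r for r in vital_rows}
    merged = []
    for crime in crime_rows:
        key = (crime["Place"], crime["Year-Month"])
        # Rows without a match on both sides are dropped (inner join).
        if key in vital and crime["Place"] in urban_by_place:
            merged.append({**crime, **vital[key],
                           "Percent Urban": urban_by_place[crime["Place"]]})
    return merged


# Made-up values, for illustration only.
crime_rows = [
    {"Place": "NCR", "Year-Month": "2022-01", "Crime Rate": 25.0},
    {"Place": "NCR", "Year-Month": "2022-02", "Crime Rate": 24.0},
    {"Place": "MIMAROPA REGION", "Year-Month": "2022-01", "Crime Rate": 9.0},
]
vital_rows = [
    {"Place": "NCR", "Year-Month": "2022-01", "Births": 14000, "Marriages": 3000},
    {"Place": "NCR", "Year-Month": "2022-02", "Births": 13000, "Marriages": 5000},
    {"Place": "REGION IV-B (MIMAROPA)", "Year-Month": "2022-01",
     "Births": 4000, "Marriages": 900},
]
urban_by_place = {"NCR": 100.0, "REGION IV-B (MIMAROPA)": 35.0}

merged = merge_on_keys(crime_rows, vital_rows, urban_by_place)

# Sanity check on the reported shape: 17 regions x 21 months.
expected_rows = 17 * 21
```

In the sample data, the MIMAROPA row is silently lost because its name differs between tables, which is why the place-name standardization steps matter before merging.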

With our merged dataset finalized, the next step is to dissect the features that power our study.

Feature Understanding

What do our features say? We zoom in on each one, interpret its quirks, and identify what insights they unlock.

UNIVARIATE ANALYSIS

Distribution

To begin, we checked the distribution of each dataset.

As can be seen, the distributions of the births, marriages, and crime datasets are right-skewed, where most data points are clustered toward the lower end.

Birth Count values are mostly concentrated around 5,000, with some rare peaks at the upper end.

Marriage Count clusters around 1,000 to 2,000. Notably, its tail is the thinnest among all three distributions.

Crime Rate commonly falls between 10% and 20%. Unlike the other distributions, however, its tail is not tapered, with outliers well above 50%.

These patterns imply that, month to month, birth counts, marriage counts, and crime rates are generally low, save for a few rare, extreme instances in their tails. The tails of birth and marriage counts may reflect seasonal or cultural events such as wedding seasons. In contrast, sudden increases in the otherwise low crime rates may relate to political tensions, economic events, or other social stressors.

Outliers

We then identify outliers to better understand any irregular patterns.

As expected from the skewed distributions, all three features show outliers on the higher end.

We will not drop these outliers as, given our research focus, these data points may be meaningful when evaluating correlations across regions and time.
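
The text does not say which rule was used to identify outliers. One common choice, shown here purely as an assumption, is the 1.5 × IQR rule that box plots use: values beyond 1.5 interquartile ranges from the quartiles are flagged. The function name and sample values are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up, right-skewed sample: most values are low, one is extreme.
sample_rates = [10, 12, 13, 15, 16, 18, 20, 55]
flagged = iqr_outliers(sample_rates)  # only the extreme value is flagged
```

Flagging rather than deleting keeps the extreme months available for the later correlation analysis, in line with the decision to retain outliers.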

TEMPORAL AND REGIONAL ANALYSIS

To understand how trends in birth, marriage, and crime vary across the country, we visualized monthly regional data using choropleth maps.

These patterns help us grasp not only where things are happening, but also how communities differ in the pressures they face.

Crime Rate Choropleth Map

The map identifies three consistent crime hotspots: NCR, Region 7 (Central Visayas), and Region 11 (Davao). These outlined areas regularly report higher crime rates month after month.

Notably, Central Visayas and Davao experienced spikes in crime rates in July and August 2022. September 2022 saw a nationwide surge in crime — a pattern not mirrored in September 2023. Meanwhile, Caraga showed a marked increase in crime in June 2023.

These fluctuations point to the uneven pressures felt by different regions, underscoring the need for localized context-setting.

Birth Choropleth Map

Region 4-A (CALABARZON) consistently records the highest number of births, followed closely by Region 3 (Central Luzon) and NCR.

The map’s color distribution and intensity are relatively stable over time, indicating minimal variation in birth count trends across months.

Marriage Choropleth Map

Similar to the birth count, Region 4-A leads in marriage count, followed by Central Luzon.

While the general pattern stays steady, we observe seasonal peaks every April in Luzon, and May in Visayas.

Urbanization Choropleth Map

The urbanization map paints a picture of a highly centralized development landscape.

NCR, Central Luzon, and Southern Luzon stand out as the most urbanized regions in the country, hosting more developed infrastructure. Much of the northern and central part of the Philippines, however, remains predominantly rural.

Interestingly, parts of lower-right Mindanao, such as the Davao region, also exhibit relatively high urbanization levels.

Understanding these differences is crucial: urbanized regions may attract more activity, but they often face distinct pressures of their own. These patterns may coexist, yet further analysis is needed to understand how, or if, they are related.

BIVARIATE ANALYSIS

To better understand the relationships between our variables, we explored the pairwise correlations between birth counts, marriage counts, and crime rates.

Based on the plot above, we observe clear positive correlations between crime rates and both birth and marriage counts. This is promising information, as we are particularly invested in how these two aspects of family life might relate to crime rates.

However, a notable relationship also emerges between birth and marriage counts themselves. Since these two features are our independent variables, their correlation raises a red flag for multicollinearity, which could affect the reliability of a multiple regression model.

Multicollinearity

To check for multicollinearity, we created a correlation matrix:

As shown, all three variables are positively correlated:

  • Birth Count and Crime Rate: 0.78
  • Marriage Count and Crime Rate: 0.78
  • Birth Count and Marriage Count: 0.80

This suggests multicollinearity in our datasets that may skew the results of our multiple regression. To investigate further, we calculated the Variance Inflation Factor (VIF) for each predictor.

Feature | VIF
const | 3.744967
Number of Registered Births | 2.516740
Number of Registered Marriages | 2.516740

Encouragingly, both VIF values fall below 4, which indicates that while multicollinearity does exist, it is not severe enough to significantly distort the model’s estimates.
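
As a side note on why the two predictors share the same VIF: with exactly two predictors, each one's VIF reduces to 1 / (1 − r²), where r is the Pearson correlation between them. The sketch below is a minimal stdlib illustration of that identity, not the code used in the study (the table layout suggests a statsmodels-style computation, but that is an assumption); function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF of either predictor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


def implied_abs_r(vif):
    """Invert the identity: the |r| between two predictors implied by a VIF."""
    return sqrt(1.0 - 1.0 / vif)


toy_vif = vif_two_predictors([1, 2, 3, 4], [1, 3, 2, 4])  # toy data, r = 0.8
implied = implied_abs_r(2.516740)  # VIF taken from the table above
```

Under this identity, the reported VIF of 2.516740 corresponds to a linear correlation of roughly 0.78 between births and marriages, in the same range as the matrix value; any small difference may come from the correlation method used for the matrix.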

With that, we move forward — cautiously but more confidently — to explore deeper statistical relationships.

Putting Everything Together

To help visualize how all these variables interact, we plotted birth and marriage counts, with crime rates represented by the darkness of each point:

This plot helps tie our initial insights together. From this, we can see that higher crime rates tend to cluster around regions and time periods with moderately high birth and marriage counts, at around 12,500 to 15,000 and 2,000 to 6,000, respectively.

Intriguingly, there’s also a standout point with low birth and marriage counts but a high crime rate, suggesting that elevated crime rate doesn’t always coincide with high birth and marriage activity; it can occur in different contexts.

So, why does this matter?

These early explorations set the stage.

They suggest that patterns do exist, but they’re not one-size-fits-all.

By digging deeper across regions, time periods, and crime types, we can move beyond general trends and ask more meaningful and targeted questions about how family dynamics and pressures intersect in the Philippines today.

Let us now take a closer look, one research question at a time.

RESEARCH QUESTION 1

To explore whether monthly crime rates correlate with monthly birth and marriage counts, we use two key visualizations:

3D Scatter Plots per Month

These highlight monthly patterns, clustering, and outliers of the given data, allowing us to observe how relationships among variables shift from one month to the next.

Line Graphs + Multi-axis Line Graph

These offer a long-view perspective on potential correlations between features, helping us trace the broader temporal trends of crime rates, birth counts, and marriage counts.

VISUALIZATION 1: 3D Scatter Plots per Month

Seen above: 21 3D scatter plots, one for each month from 2022-01 to 2023-09.

The plots show that:

  • Most months show a roughly linear clustering pattern, suggesting a possible correlation between crime rates, birth counts, and marriage counts.
  • There are four outlier months: April 2022, December 2022, June 2023, and July 2023, which show more spread out values and numerous outliers that are noticeably more extreme and distant from the main cluster.
    • These draw attention to possible context-specific disruptions due to social or political events.

The table below summarizes these outlier months and their potential significance:

Month-Year | Notable Events | Potential Impact on Data
April 2022 | Tropical Storm Agaton; COVID-19 Omicron BA.2.12 case | Plausible disruption in government and civil functions because of the tropical storm and COVID-19 Omicron BA.2.12, which could delay birth and/or marriage registrations
December 2022 | Severe flooding; Implementation of child marriage ban | Possible delays in civil registrations and crime reporting due to natural disaster; Decrease in reported underage marriages
June 2023 | Pride PH Festival; Post-pandemic adjustments and eased pandemic restrictions | Spike in petty crimes due to crowd density; Resumption of delayed civil registrations
July 2023 | Super Typhoon Doksuri (Egay); State of the Nation Address (SONA) | Disruptions to civil registration services and mobility due to typhoon; Increased security measures and public demonstrations may have influenced crime statistics

References: [7] [8] [9] [10] [11] [12] [13] [14]

These scatter plots visually support the notion that, apart from a few outliers, crime rates, birth counts, and marriage counts generally move in patterned ways across months.

VISUALIZATION 2: Line Graphs + Multi-axis Line Graph

Seen above: Individual Line Graphs Tracking Crime Rates, Birth Counts, and Marriage Counts from January 2022 to September 2023

It can be observed from the above graphs that:

  • Crime Rate
    • Crime rates show fluctuations over the course of the available time period, displaying noticeable peaks around September 2022 and June 2023.
  • Birth
    • Birth counts generally trend higher between August and November 2022.
    • There is a steep decrease in February 2023 followed by a steady rise through September 2023.
    • Significant dips appear in January to February for both years.
    • Spikes show between July and September across both years.
  • Marriage
    • Marriage counts experience sharp month-to-month fluctuations, with notable spikes seen in February 2022, December 2022, and February 2023.
      • Peaks in February may be associated with Valentine’s Day-related events and the cultural association of February with love.
      • The December spike could be linked to holiday-related events or traditions.
    • Notable dips occur in November 2022 and August 2023.

Seen above: Multi-axis line graph layering Crime Rate, Births, and Marriages

The above graph illustrates that:

  • In September 2022, both crime rates and birth counts rose simultaneously, while marriage counts began to rise.
  • In June 2023, crime rates and marriage counts both surged, while birth counts remained flat.
  • Birth counts and crime rates tended to move in tandem for an extended period between May 2022 and February 2023, with rises and falls aligning.
  • Marriage counts and crime rates appear to exhibit parallel fluctuations from April to August 2023.

The observations and possible events corresponding to these trends are summarized in the table below:

Month-Year | Notable Events | Potential Impact on Data
September 2022 | School year opening under full in-person setup; loosening of pandemic restrictions | Higher social mobility may have led to increased crimes; seasonal birth peaks; resumed civil operations affect data
June 2023 | Pride PH Festival; traditional wedding season | Large gatherings may have spiked petty crimes; June weddings influenced higher marriage registrations
May 2022 - February 2023 | Ongoing pandemic-related restrictions and limited civil service capacity; implementation of child marriage ban (December 2021 – into effect 2022) | Eased restrictions enabled more civil activities; ban likely affected underage marriage data and correlated metrics
April-August 2023 | Record-breaking heatwave; power outages in Occidental Mindoro (April); Super Typhoon Mawar (May); Pride PH Festival (June); Palarong Pambansa & Storm Dodong (July); Second Thomas Shoal standoff and national festivals (August) | Civil registration may have been delayed due to weather and outages; large gatherings may have driven petty crime; festivals and travel may have impacted marriages and local population dynamics

References: [15] [16] [17] [18] [19] [20] [21]

These overlapping patterns suggest the possibility of shared underlying drivers and interrelated behaviors among features.

While these plots and graphs do not yet confirm statistical significance, which will be explored, they reveal consistent co-movements among crime rates, birth counts, and marriage counts. These visualizations justify moving forward with our statistical analysis.

Note: The events listed in the tables are observational only. Their potential impacts are speculative, and nothing here establishes that these events caused the changes seen in our data.

Our analysis of monthly trends revealed promising signs of correlation between birth and marriage counts and crime rates.

While these patterns suggest a relationship, they do not capture where exactly these interactions are most prominent. Thus, the next question:

RESEARCH QUESTION 2

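To see what patterns in total crime incidents emerge across regions with varying birth counts, marriage counts, and urbanization levels, we visualize the data in two ways: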

Radar Charts Per Region

These charts highlight how each region compares across the four variables, revealing distinct regional patterns and imbalances.

Composite Choropleth Map

This map offers a high-level spatial overview that helps summarize earlier findings.

VISUALIZATION 1: Radar Charts for each Region

Seen above: Radar Charts for NCR and Regions 1, 2, and 3

The above graphs illustrate that:

High urbanization level often aligns with elevated features across regions

Regions with higher Percent Urban like NCR and Region IV-A report relatively high values across Crime Rate, Birth Count, and Marriage Count.

Low-urbanization regions tend to show lower values across all indicators

Regions like CAR, Region II, Region VIII, and BARMM consistently display low Percent Urban along with subdued crime, birth, and marriage figures.

Mid-urbanization regions exhibit varied patterns

Region 3 shows relatively high Birth and Marriage Counts but only moderate Crime Rate. Region 8, by contrast, reports a high Crime Rate despite only moderate Percent Urban, while Region 10 appears more balanced and subdued across all four features.

VISUALIZATION 2: Composite Choropleth Map

Each region’s color is generated by blending RGB channels as follows:

  • Red → Crime Rate
  • Green → Birth Count
  • Blue → Marriage Count
  • Opacity → Percent Urban

Color intensity reflects the relative dominance of each variable:

  • Redder → Crime-dominant
  • Greener → Birth-dominant
  • Bluer → Marriage-dominant
  • Yellow (R + G) → High Crime Rate and Birth Counts
  • Magenta (R + B) → High Crime and Marriage Counts
  • Cyan (G + B) → High Birth and Marriage Counts
  • White (R + G + B) → High across all variables
  • Black → Low across all variables (quiet or less developed regions)
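
The channel mapping above can be sketched as follows. The source does not state how each variable was scaled, so this sketch assumes min-max normalization across regions and an opacity equal to Percent Urban divided by 100; the function names and sample values are hypothetical.

```python
def min_max(values):
    """Scale values linearly to the range 0..1 (all zeros if constant)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def composite_colors(crime, births, marriages, percent_urban):
    """Blend three variables into RGB channels, with urbanization as alpha.

    Returns one (red, green, blue, alpha) tuple per region, where the
    color channels run 0..255 and alpha runs 0..1.
    """
    red, green, blue = min_max(crime), min_max(births), min_max(marriages)
    return [
        (round(red[i] * 255), round(green[i] * 255), round(blue[i] * 255),
         percent_urban[i] / 100)
        for i in range(len(crime))
    ]


# Three made-up regions: low on everything, mid crime and births, high on all.
colors = composite_colors(
    crime=[10, 20, 30],
    births=[1000, 3000, 5000],
    marriages=[200, 200, 600],
    percent_urban=[20, 50, 100],
)
```

In this toy input the first region comes out black and mostly transparent, the second yellowish (equal red and green, no blue), and the third opaque white, matching the legend above.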

Seen above: (left) map with no-alpha / urbanization, (right) map with alpha / urbanization

Augmenting our previous observations, the above maps show that:

There is a natural clustering among regions depending on their geographic proximity

  • Color similarities across neighboring regions suggest that birth, marriage, and crime rates tend to form geographically coherent groupings.
  • For instance,
    • Northern Luzon → bluer hues
    • Central Luzon and Region IV-A → greener hues
    • Region IV-B and Western Visayas → bluer hues
    • Region V and Eastern Visayas → muted hues
    • Central Visayas and Mindanao → redder hues

Percent Urban acts as a multiplier

  • We can observe that regions with more muted hues in the RGB-only map, reflecting lower all-around values across crime, birth, and marriage, also appear more transparent in the alpha-blended version.
  • Conversely, regions with more vivid colors in the RGB map tend to be more opaque in the alpha-blended graphic, suggesting that higher urbanization typically accompanies higher values in crime, birth, and marriage.

The relationship between Crime Rate, Birth, and Marriage Counts is not straightforward

  • While more urbanized areas tend to have higher crime, birth, and marriage values, the balance between them varies across regions.
  • This variation suggests different underlying dynamics like governance, infrastructure, and culture, but these fall outside the scope of this dataset and research.
    • e.g. NCR and Davao

Our regional analysis revealed that correlations between crime rates, birth counts, and marriage counts are not only present but vary across space.

Crime rate, however, is an aggregate macro-trend.

Therefore, to understand the components of our observed relationships, we ask:

RESEARCH QUESTION 3

To find which specific crime types correlate strongly with both birth and marriage counts, we visualize the data through:

Spearman Correlation Matrix of Crime Types with Birth and Marriage Counts

These show which crime types exhibit strong correlations with both birth and marriage counts.

Correlation Matrix + Bubble Graph

This visualizes the same correlations from a different angle, highlighting possible relationships between high correlation and crime type count.

VISUALIZATION 1: Spearman Correlation Matrix of Crime Types with Birth and Marriage Counts

Seen above: Correlation Matrix between Birth and Marriage counts and Crime types

The graph above illustrates that:

Several crime types show strong correlations (r ≥ 0.75) with both births and marriages, implying a notable alignment between birth and marriage counts and crime incidence.

The five crime types with the strongest dual correlations are:

  1. Violation of Special Laws (Non-Index)
    • Births: 0.884, Marriages: 0.798
    • This is the strongest overall correlation: as births and marriages rise, which may signal more household activity, special-law violations (typically including offenses against women and children, assault, and other offenses) tend to rise as well.
  2. PSI: RIR - Physical Injury
    • Births: 0.881, Marriages: 0.793
    • This strong correlation may reflect heightened interpersonal tensions or community strain during significant life transitions like childbirth and marriage. Events like welcoming a child and adjusting to married life can amplify domestic or neighborhood disputes, increasing the likelihood of physical altercations.
  3. Index: Focus - Theft
    • Births: 0.870, Marriages: 0.752
    • Theft is among the most population-sensitive index crimes, indicating that rising birth and marriage numbers may coincide with higher density in communities, presenting both more targets and potentially more motivations for opportunistic crimes like theft.
  4. Non-Index - Other Crimes
    • Births: 0.849, Marriages: 0.822
    • This catch-all category includes crimes influenced by changing household structures or financial stresses accompanying births and marriages.
  5. PSI: RIR - Damage to Property
    • Births: 0.858, Marriages: 0.775
    • This relationship indicates a strong connection between domestic life changes common around family expansion or union and property damage-related crimes, possibly due to disputes or increased mobility.

Crimes like Quasi-Homicide (r = 0.234 with births, 0.194 with marriages) show weak correlations, indicating minimal or no relationship with demographic changes.

Other notable observations:

  • Rape and Robbery both exceed 0.72 correlation with marriages and 0.78 with births, underscoring a troubling connection between population events and violent or opportunistic crimes.
  • Carnapping MC (Motorcycles) has a stronger correlation (r = 0.685 with births) than Carnapping MV (Motor Vehicles) (r = 0.419), suggesting motorcycles may be more affected by demographic growth.
  • Overall, the matrix indicates that crimes of opportunity and interpersonal harm—such as theft, physical injury, and damage to property—are more responsive to population increases than crimes like quasi-homicide or murder.
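
To make "strong with both" concrete, one option is to rank crime types by the weaker of their two correlations, so that a type only ranks high when it correlates strongly with births and with marriages alike. The source does not say how its top five were ordered, so this criterion is an assumption; the coefficients below are the ones reported above.

```python
# (correlation with births, correlation with marriages), as reported above
reported = {
    "Violation of Special Laws (Non-Index)": (0.884, 0.798),
    "PSI: RIR - Physical Injury": (0.881, 0.793),
    "Index: Focus - Theft": (0.870, 0.752),
    "Non-Index - Other Crimes": (0.849, 0.822),
    "PSI: RIR - Damage to Property": (0.858, 0.775),
    "Quasi-Homicide": (0.234, 0.194),
}


def rank_by_weaker_link(correlations):
    """Sort crime types by the smaller of their two correlations, descending."""
    return sorted(correlations, key=lambda name: min(correlations[name]),
                  reverse=True)


def strong_with_both(correlations, threshold):
    """Keep only crime types whose two correlations both reach the threshold."""
    return [name for name, pair in correlations.items()
            if min(pair) >= threshold]


ranking = rank_by_weaker_link(reported)
strong = strong_with_both(reported, threshold=0.75)
```

Under this criterion, Non-Index - Other Crimes ranks first because it is the only type above 0.80 on both measures, while ranking by the births coefficient alone would put Violation of Special Laws first.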

VISUALIZATION 2: Correlation Matrix + Bubble Graph

Seen above: Bubble Graph (hover over a bubble to see the specific crime type)

Top-right quadrant bubbles represent crime types with strong positive correlations to both birth and marriage counts. Larger bubble sizes are associated with higher average crime counts, indicating these crimes occur more frequently.

The graph shows that the most prominent crimes, both in terms of correlation and frequency, are:

  • Violation of Special Laws (Non-Index) – Births: 0.884, Marriages: 0.798
  • PSI: RIR – Physical Injury – Births: 0.881, Marriages: 0.793
  • Non-Index – Other Crimes – Births: 0.849, Marriages: 0.822
  • PSI: RIR – Damage to Property – Births: 0.858, Marriages: 0.775
  • Index: Focus – Theft – Births: 0.870, Marriages: 0.752

These crime types are both highly responsive to demographic trends and frequently committed, as shown by their position and bubble size.

Lower-left quadrant bubbles like Quasi-Homicide and Special Complex Crimes have weak correlations and low frequency, showing minimal connection to births or marriages.

The visual trends in the plot reinforce the numerical findings in the correlation matrix: crimes involving regulation, interpersonal harm, and property are most closely associated with demographic events like births and marriages.

So far, we have explored visual patterns and correlations that highlight meaningful relationships between crime rates, birth counts, and marriage counts.

But while these observations suggest associations, they do not confirm whether these relationships are statistically significant or simply coincidental.

To formally assess whether the patterns we have seen hold up under statistical scrutiny, we now turn to hypothesis testing.

Pairwise Correlation

First, let us formally check for the pairwise correlations.

Given that the distributions are not normal, we use Spearman’s rank correlation.

                          Correlation Coefficient   p-value
Births - Crime Rates      0.782                     4.953e-75
Marriages - Crime Rates   0.689                     1.311e-51

The correlation between birth counts and crime rates is 0.78, and the correlation between marriage counts and crime rates is 0.69. Both p-values are far below the significance level of alpha = 0.05.

Thus, we can say that there is a significant positive correlation between birth counts and crime rates, and between marriage counts and crime rates.
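The pairwise check above can be sketched with `scipy.stats.spearmanr`. The data below is synthetic and the variable names are hypothetical; it only illustrates the procedure, not the study's actual figures.

```python
import numpy as np
from scipy.stats import spearmanr

# Synthetic stand-in data (NOT the study's dataset): two skewed series
# that share a common driver.
rng = np.random.default_rng(0)
population = rng.lognormal(mean=10, sigma=1, size=200)
births = population * rng.uniform(0.01, 0.02, size=200)
crime_rate = np.sqrt(population) * rng.uniform(0.5, 1.5, size=200)

# Spearman's rho is computed on ranks, so it does not assume normality.
rho, p_value = spearmanr(births, crime_rate)
print(f"rho = {rho:.3f}, p-value = {p_value:.3e}")

alpha = 0.05
print("significant" if p_value < alpha else "not significant")
```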

A possible explanation for both positive pairwise correlations is that birth counts, marriage counts, and crime rates may co-move as byproducts of population growth, where more people result in more of each event.

Multiple Regression

Next, we assess how birth and marriage counts together correlate with changes in crime rates. This can be done by using a multiple regression model.

Since the data is positively skewed, we apply a log transformation to birth counts, marriage counts, and crime rates. The data is not perfectly normally distributed, but it is no longer significantly right-skewed.

   Number of Registered Births   Number of Registered Marriages   Average Monthly Crime Rate
0  9.492055                      7.945201                         3.832114
1  7.675546                      6.624065                         1.985131
2  8.617039                      7.647786                         2.545531
3  8.294300                      6.878326                         2.758109
4  9.561912                      8.176673                         3.151881
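A minimal sketch of the transformation step, assuming a natural logarithm (the text does not state the base) and using synthetic right-skewed data with hypothetical column names:

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed stand-in data (NOT the study's dataset).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "births": rng.lognormal(mean=9.0, sigma=0.8, size=300),
    "marriages": rng.lognormal(mean=7.5, sigma=0.8, size=300),
    "crime_rate": rng.lognormal(mean=3.0, sigma=0.6, size=300),
})

# Natural log of every column; all values must be strictly positive.
logged = np.log(df)

# Sample skewness before and after: values near 0 indicate symmetry.
print(df.skew().round(2))
print(logged.skew().round(2))
```

If any count could be zero, `np.log1p` would be a safer choice than `np.log`.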

Using Ordinary Least Squares regression with logged birth and marriage counts as predictors gives the following output:

                                 p-value
const                            1.398782e-19
Number of Registered Births      4.053306e-29
Number of Registered Marriages   8.114509e-05

We can see from the above table that the p-values for both predictor variables are well below the significance level of 0.05, indicating that each has a statistically significant association with crime rates.

The coefficient for births is notably larger than the coefficient for marriages, suggesting that births may explain more of the fluctuation in crime rates than marriages do.

R-squared 0.564
(Prob) F-statistic 3.86e-58

The R-squared value is 0.564, indicating that 56.4% of the variance in the log-transformed crime rates is explained by the log-transformed birth and marriage counts.

Finally, because the (Prob) F-statistic value is below alpha = 0.05, we can say that at least one of our independent variables (Number of Registered Births and Number of Registered Marriages) has a statistically significant association with crime rates.

Residual Diagnostics

To assess the validity of our model, we check whether the residuals follow a normal distribution, as assumed in linear regression.

As we can see, the residuals are not normally distributed, violating the assumption.

We can also see from the Q-Q plot that the points do not follow the normal line:

The violation of these assumptions may be caused by the outliers or structural irregularities in the data.
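One way to back up the visual check with numbers is a Shapiro-Wilk test plus the Q-Q plot coordinates from `scipy.stats.probplot`. The choice of Shapiro-Wilk is an assumption here, since the text does not name a formal test, and the residuals below are synthetic heavy-tailed values rather than the model's own.

```python
import numpy as np
from scipy import stats

# Synthetic heavy-tailed stand-in residuals (NOT the study's residuals).
rng = np.random.default_rng(3)
residuals = rng.standard_t(df=2, size=300)

# Shapiro-Wilk: the null hypothesis is that the sample is normally distributed.
w_stat, p_value = stats.shapiro(residuals)
print(f"W = {w_stat:.3f}, p-value = {p_value:.3e}")

# Q-Q plot data: theoretical quantiles, ordered residuals, and the fitted line.
(theoretical_q, ordered_resid), (slope, intercept, r) = stats.probplot(
    residuals, dist="norm"
)
print(f"Q-Q line fit: r = {r:.3f}")
```

A small p-value, or points that bend away from the fitted line at the tails, would both point to non-normal residuals.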

To address this issue, we apply a Robust Linear Regression model (RLM), which is more resilient to skew and outliers.

The resulting coefficients remain close to the OLS estimates. Hence, we treat our earlier results as valid despite the violation of the normality assumption.

            OLS         RLM
const       -3.139933   -3.248856
Births      0.598162    0.627443
Marriages   0.118368    0.094712

Consistent with the pairwise correlation results, the multiple regression models confirm that birth and marriage counts have a statistically significant correlation with crime rates.

Although the outliers were retained in the primary analysis, as they may be valuable data points, we also fitted both the OLS and RLM models on datasets with outliers removed to assess their potential influence.

The resulting coefficients are as follows:

            OLS         RLM
const       -3.179567   -3.303626
Births      0.582826    0.610660
Marriages   0.141194    0.121883

We also have the p-values from the OLS model:

                                 p-value
const                            1.159422e-19
Number of Registered Births      3.241703e-25
Number of Registered Marriages   3.610169e-04

The resulting coefficients remained consistent. Moreover, based on the p-values, birth counts continue to show a stronger association with crime rates than marriage counts.

RESULTS & CONCLUSION

Summary

From the analysis we can reject our null hypothesis and conclude that:

There is a statistically significant correlation between crime rates and birth and marriage counts in the Philippines.

Furthermore, we found that births, marriages, and crime rates, while influenced by political, social, and economic events, consistently increased and decreased together. The relationship between births, marriages, and crime rates was also found to vary by region, and by urbanization values: high urbanization levels correspond to high births, marriages, and crime rates, while low urbanization levels correspond to low birth and marriage counts and crime rates.

Crimes under Violation of Special Laws (Non-Index), Physical Injury and Damage to Property resulting from Reckless Imprudence, and Theft (a Focus Crime) were found to be highly correlated with birth counts and marriage counts.

Implications

With a positive correlation observed between birth counts, marriage counts, and crime rates, especially for interpersonal and property-related crimes, more support should be given to families, especially those living in highly urban, high-risk areas. Every family deserves to celebrate important milestones such as childbirths and marriages without fearing that their family members will become victims of crime.

Birth counts, marriage counts, and crime rates were also found to be higher in areas with higher urbanization levels. This suggests that people, in earning a living for their families, flock towards more urban areas, where jobs are more plentiful. This may reflect the unequal distribution of development and opportunities in the country. Policy-making should prioritize spreading development more evenly so that population is not concentrated in a few highly urbanized areas, and should take into account the families living in highly urban, high-risk areas.

Limitations

While we were able to produce analyses using various visualizations and models, and these arrived at consistent findings, the results of this study are limited by our datasets. The crime dataset only spans 2022-2023. Our urbanization dataset came from the 2020 census, as the 2024 census has not yet been released.

Furthermore, the data could be modeled using k-means clustering for more sophisticated results, but this method would best be done with a more granular dataset (e.g. per barangay). The current dataset is limited to by-region granularity.

Recommendations

Well, what now?

Given our limitations and what we learned over the course of this study, we recommend the following improvements.

Firstly, we recommend coordinating with the PNP and PSA for an expanded dataset that covers more years. The crime dataset in particular can be made more granular by coordinating with the PNP.

Next, we recommend further investigating urbanization as a variable, as it seems to act like a multiplier on our three primary features.

For a more direct comparison, we also recommend studying the relationship between birth rates, marriage rates, and crime rates. In line with this, we also recommend considering population as a potential confounding variable on the correlation between births, marriages, and crime rates.
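To illustrate why population matters as a potential confounder, here is a sketch that compares the correlation of raw counts with the correlation of per-1,000 rates. The data is synthetic and deliberately built so that population is the only link between the two series. It says nothing about what the real data would show.

```python
import numpy as np
from scipy.stats import spearmanr

# Synthetic stand-in data (NOT the study's dataset): births and crimes both
# scale with population but are otherwise unrelated by construction.
rng = np.random.default_rng(6)
n = 200
population = rng.lognormal(mean=13.0, sigma=1.0, size=n)
births = population * rng.uniform(0.01, 0.02, size=n)
crimes = population * rng.uniform(0.001, 0.003, size=n)

# Raw counts co-move because both grow with population.
rho_counts, _ = spearmanr(births, crimes)

# Per-1,000 rates divide out the shared population driver.
birth_rate = births / population * 1000
crime_rate = crimes / population * 1000
rho_rates, _ = spearmanr(birth_rate, crime_rate)

print(f"counts: rho = {rho_counts:.3f}")
print(f"rates:  rho = {rho_rates:.3f}")
```

Running the same comparison on regional population figures would show how much of the observed correlation survives once population size is accounted for.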

Meet the Team

Reach out to us. We'd love to hear from you!

Adrianne Paul Abyado

I am Adrianne Paul M. Abyado, a 3rd year BS Computer Science student with a longstanding love for creating visually appealing stuff. I am particularly interested in the field of UI and UX, as well as software and web development.

Jason Alcantara

I am Jason S. Alcantara, a 3rd year BS Computer Science student with a fond interest in Data Analysis and Software Development. In my spare time, I like to play games, read and watch media, and drink coffee with my buddies.

Karl Andrei Alcober

I'm a 3rd year BS Computer Science student, interested in the fields of Cybersecurity and Web Development. Outside my coding environment, I like to jog, play basketball, and watch movies.

Riana Bejarin

I am Riana D. Bejarin, a 3rd year BS Computer Science student. I enjoyed computer science subjects in high school so much that I decided to take this course. In my free time I can be found reading novels and manga, making drawings, or playing video games.