{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fb3416a0",
   "metadata": {},
   "source": [
    "# minilab: Ασκήσεις εξάσκησης - Αριθμητική Python και Προεπεξεργασία δεδομένων"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a77089e",
   "metadata": {},
   "source": [
    "## Η βιβλιοθήκη ``Pandas``"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "488b134b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the pandas module here once\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cee5953a",
   "metadata": {},
   "source": [
    "### Exercise 1.1: DataFrame Basics\n",
    "\n",
    "Do the following:\n",
    "\n",
    "Create an 3x4 (3 rows x 4 columns) Pandas DataFrame in which the columns are named ``Savvas``, ``Nikos``, ``Roman``, and ``Avinash``. Populate each of the 12 cells in the DataFrame with a **random integer** between 0 and 100 (inclusive).\n",
    "\n",
    "1. Output the entire DataFrame\n",
    "\n",
    "\n",
    "2. Output the value in the cell of row \\#1 of the Roman column\n",
    "\n",
    "\n",
    "3. Create a fifth column named ``Sums``, which is populated with the row-by-row sums of Roman and Avinash.\n",
    "\n",
    "> **Hint**: Use the [np.random.randint()](https://numpy.org/doc/stable/reference/random/generated/numpy.random.randint.html) method to get random integers in an interval."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "cf68f7bd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Savvas</th>\n",
       "      <th>Nikos</th>\n",
       "      <th>Roman</th>\n",
       "      <th>Avinash</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>63</td>\n",
       "      <td>19</td>\n",
       "      <td>95</td>\n",
       "      <td>55</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>52</td>\n",
       "      <td>77</td>\n",
       "      <td>28</td>\n",
       "      <td>26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>10</td>\n",
       "      <td>13</td>\n",
       "      <td>78</td>\n",
       "      <td>58</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Savvas  Nikos  Roman  Avinash\n",
       "0      63     19     95       55\n",
       "1      52     77     28       26\n",
       "2      10     13     78       58"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# One way to do this task is the following\n",
    "import numpy as np\n",
    "\n",
    "data = {\n",
    "    \"Savvas\" : [np.random.randint(0, 101) for _ in range(3)],\n",
    "    \"Nikos\" : [np.random.randint(0, 101) for _ in range(3)],\n",
    "    \"Roman\" : [np.random.randint(0, 101) for _ in range(3)],\n",
    "    \"Avinash\" : [np.random.randint(0, 101) for _ in range(3)]\n",
    "}\n",
    "\n",
    "df = pd.DataFrame(data)\n",
    "\n",
    "# Output the entire DataFrame\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "285cdd35",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "28\n",
      "28\n",
      "28\n"
     ]
    }
   ],
   "source": [
    "# Output the value in the cell of row #1 of the Roman column\n",
    "print(df.iloc[1, 2])\n",
    "\n",
    "# or..\n",
    "print(df.loc[1, \"Roman\"])\n",
    "\n",
    "# or...\n",
    "print(df['Roman'][1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "8d6cb48f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Savvas</th>\n",
       "      <th>Nikos</th>\n",
       "      <th>Roman</th>\n",
       "      <th>Avinash</th>\n",
       "      <th>Sums</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>63</td>\n",
       "      <td>19</td>\n",
       "      <td>95</td>\n",
       "      <td>55</td>\n",
       "      <td>150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>52</td>\n",
       "      <td>77</td>\n",
       "      <td>28</td>\n",
       "      <td>26</td>\n",
       "      <td>54</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>10</td>\n",
       "      <td>13</td>\n",
       "      <td>78</td>\n",
       "      <td>58</td>\n",
       "      <td>136</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Savvas  Nikos  Roman  Avinash  Sums\n",
       "0      63     19     95       55   150\n",
       "1      52     77     28       26    54\n",
       "2      10     13     78       58   136"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a fifth column named Sums, which is populated with the row-by-row sums of Roman and Avinash.\n",
    "df[\"Sums\"] = df[\"Roman\"] + df[\"Avinash\"]\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "a94f1733",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   Savvas  Nikos  Roman  Avinash\n",
      "0      91     77     13       66\n",
      "1      89      0     88       93\n",
      "2      78     54     16       90\n",
      "\n",
      "Second row of the Roman column: 88\n",
      "\n",
      "   Savvas  Nikos  Roman  Avinash  Sums\n",
      "0      91     77     13       66    79\n",
      "1      89      0     88       93   181\n",
      "2      78     54     16       90   106\n"
     ]
    }
   ],
   "source": [
    "# Another solution to this task is the following\n",
    "\n",
    "# Create a Python list that holds the names of the four columns.\n",
    "my_column_names = ['Savvas', 'Nikos', 'Roman', 'Avinash']\n",
    "\n",
    "# Create a 3x4 numpy array, each cell populated with a random integer.\n",
    "my_data = np.random.randint(low=0, high=101, size=(3, 4))\n",
    "\n",
    "# Create a DataFrame.\n",
    "df = pd.DataFrame(data=my_data, columns=my_column_names)\n",
    "\n",
    "# Print the entire DataFrame\n",
    "print(df)\n",
    "\n",
    "# Print the value in row #1 of the Roman column.\n",
    "print(\"\\nSecond row of the Roman column: %d\\n\" % df['Roman'][1])\n",
    "\n",
    "# Create a column named Janet whose contents are the sum\n",
    "# of two other columns.\n",
    "df['Sums'] = df['Roman'] + df['Avinash']\n",
    "\n",
    "# Print the enhanced DataFrame\n",
    "print(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f17cd61",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86f51098",
   "metadata": {},
   "source": [
    "### Exercise 1.2: DataFrame Basics (cont.)\n",
    "\n",
    "Create a Pandas DataFrame with the following data:\n",
    "\n",
    "```python\n",
    "data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],\n",
    "        'Age': [25, 30, 22, 35],\n",
    "        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}\n",
    "\n",
    "```\n",
    "\n",
    "1. Print the first 3 rows of the DataFrame.\n",
    "\n",
    "2. Calculate the mean age of the individuals."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "9f63adfa",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "      Name  Age         City\n",
      "0    Alice   25     New York\n",
      "1      Bob   30  Los Angeles\n",
      "2  Charlie   22      Chicago\n",
      "Mean Age: 28.0\n"
     ]
    }
   ],
   "source": [
    "data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],\n",
    "        'Age': [25, 30, 22, 35],\n",
    "        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}\n",
    "\n",
    "# Create the dataframe\n",
    "df = pd.DataFrame(data)\n",
    "\n",
    "# 1. Print the first 3 rows\n",
    "print(df.head(3))\n",
    "\n",
    "# 2. Calculate the mean age\n",
    "mean_age = df['Age'].mean()\n",
    "print(\"Mean Age:\", mean_age)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27dc5d8c",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1489273",
   "metadata": {},
   "source": [
    "### Exercise 2: Data Filtering\n",
    "\n",
    "Given the DataFrame from Exercise 1.2, write code to:\n",
    "\n",
    "1. Create a new DataFrame containing only individuals older than 25.\n",
    "\n",
    "2. Find and print the names of individuals from Chicago."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "3dc9f5d0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Residents of Chicago: ['Charlie']\n"
     ]
    }
   ],
   "source": [
    "# 1. Filter individuals older than 25\n",
    "filtered_df = df[df['Age'] > 25]\n",
    "\n",
    "# 2. Find names of individuals from Chicago\n",
    "chicago_residents = df[df['City'] == 'Chicago']['Name']\n",
    "print(\"Residents of Chicago:\", chicago_residents.values)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22364e06",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13616083",
   "metadata": {},
   "source": [
    "### Exercise 3: Data Sorting\n",
    "\n",
    "Given the DataFrame from Exercise 1.2, write code to:\n",
    "\n",
    "1. Sort the DataFrame by age in descending order.\n",
    "\n",
    "2. Sort the DataFrame by name in alphabetical order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "13720e39",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sorted by Age:\n",
      "       Name  Age         City\n",
      "3    David   35      Houston\n",
      "1      Bob   30  Los Angeles\n",
      "0    Alice   25     New York\n",
      "2  Charlie   22      Chicago\n",
      "Sorted by Name:\n",
      "       Name  Age         City\n",
      "0    Alice   25     New York\n",
      "1      Bob   30  Los Angeles\n",
      "2  Charlie   22      Chicago\n",
      "3    David   35      Houston\n"
     ]
    }
   ],
   "source": [
    "# 1. Sort by age in descending order\n",
    "sorted_by_age = df.sort_values(by='Age', ascending=False)\n",
    "\n",
    "# 2. Sort by name in alphabetical order\n",
    "sorted_by_name = df.sort_values(by='Name')\n",
    "\n",
    "print(\"Sorted by Age:\\n\", sorted_by_age)\n",
    "print(\"Sorted by Name:\\n\", sorted_by_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80b19c53",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "78fd5068",
   "metadata": {},
   "source": [
    "### Exercise 4: Using ``loc`` for Label-Based Indexing\n",
    "\n",
    "Given the following DataFrame df:\n",
    "\n",
    "```python\n",
    "data = {'A': [1, 2, 3, 4, 5],\n",
    "        'B': [10, 20, 30, 40, 50],\n",
    "        'C': ['apple', 'banana', 'cherry', 'date', 'elderberry']}\n",
    "\n",
    "df = pd.DataFrame(data, index=['one', 'two', 'three', 'four', 'five'])\n",
    "```\n",
    "\n",
    "1. Use the ``loc`` indexer to select and print the row with the label 'three'.\n",
    "\n",
    "2. Use the ``loc`` indexer to select and print the values from columns 'A' and 'C' for the rows with labels 'two' and 'four'."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "263e3ca8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>apple</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>two</th>\n",
       "      <td>2</td>\n",
       "      <td>20</td>\n",
       "      <td>banana</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>three</th>\n",
       "      <td>3</td>\n",
       "      <td>30</td>\n",
       "      <td>cherry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>four</th>\n",
       "      <td>4</td>\n",
       "      <td>40</td>\n",
       "      <td>date</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>five</th>\n",
       "      <td>5</td>\n",
       "      <td>50</td>\n",
       "      <td>elderberry</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       A   B           C\n",
       "one    1  10       apple\n",
       "two    2  20      banana\n",
       "three  3  30      cherry\n",
       "four   4  40        date\n",
       "five   5  50  elderberry"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {'A': [1, 2, 3, 4, 5],\n",
    "        'B': [10, 20, 30, 40, 50],\n",
    "        'C': ['apple', 'banana', 'cherry', 'date', 'elderberry']}\n",
    "\n",
    "df = pd.DataFrame(data, index=['one', 'two', 'three', 'four', 'five'])\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "c9bb14b1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Row with label 'three':\n",
      " A         3\n",
      "B        30\n",
      "C    cherry\n",
      "Name: three, dtype: object\n"
     ]
    }
   ],
   "source": [
    "# 1. Using loc to select and print the row with label 'three':\n",
    "selected_row = df.loc['three']\n",
    "\n",
    "print(\"Row with label 'three':\\n\", selected_row)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "4baf4adf",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Values from columns 'A' and 'C' for rows 'two' and 'four':\n",
      "       A       C\n",
      "two   2  banana\n",
      "four  4    date\n"
     ]
    }
   ],
   "source": [
    "# 2. Using loc to select and print values from columns 'A' and 'C' \n",
    "#    for the rows with labels 'two' and 'four':\n",
    "\n",
    "selected_values = df.loc[['two', 'four'], ['A', 'C']]\n",
    "\n",
    "print(\"Values from columns 'A' and 'C' for rows 'two' and 'four':\\n\", selected_values)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "be26d5f0",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e86e3d27",
   "metadata": {},
   "source": [
    "### Exercise 5: Using ``iloc`` for Integer-Based Indexing\n",
    "\n",
    "Given the following DataFrame df:\n",
    "\n",
    "```python\n",
    "data = {'X': [11, 22, 33, 44, 55],\n",
    "        'Y': [110, 220, 330, 440, 550],\n",
    "        'Z': ['apple', 'banana', 'cherry', 'date', 'elderberry']}\n",
    "\n",
    "df = pd.DataFrame(data)\n",
    "```\n",
    "\n",
    "1. Use the ``iloc`` indexer to select and print the first row.\n",
    "\n",
    "2. Use the ``iloc`` indexer to select and print the values from columns 'X' and 'Z' for the second and fourth rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "c20e88d2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>X</th>\n",
       "      <th>Y</th>\n",
       "      <th>Z</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>11</td>\n",
       "      <td>110</td>\n",
       "      <td>apple</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>22</td>\n",
       "      <td>220</td>\n",
       "      <td>banana</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>33</td>\n",
       "      <td>330</td>\n",
       "      <td>cherry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>44</td>\n",
       "      <td>440</td>\n",
       "      <td>date</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>55</td>\n",
       "      <td>550</td>\n",
       "      <td>elderberry</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    X    Y           Z\n",
       "0  11  110       apple\n",
       "1  22  220      banana\n",
       "2  33  330      cherry\n",
       "3  44  440        date\n",
       "4  55  550  elderberry"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {'X': [11, 22, 33, 44, 55],\n",
    "        'Y': [110, 220, 330, 440, 550],\n",
    "        'Z': ['apple', 'banana', 'cherry', 'date', 'elderberry']}\n",
    "\n",
    "df = pd.DataFrame(data)\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "4e07f685",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "First row:\n",
      " X       11\n",
      "Y      110\n",
      "Z    apple\n",
      "Name: 0, dtype: object\n"
     ]
    }
   ],
   "source": [
    "# 1. Using iloc to select and print the first row:\n",
    "first_row = df.iloc[0]\n",
    "\n",
    "print(\"First row:\\n\", first_row)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "4ee4cba4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Values from columns 'X' and 'Z' for rows 2 and 4:\n",
      "     X       Z\n",
      "1  22  banana\n",
      "3  44    date\n"
     ]
    }
   ],
   "source": [
    "# 2. Using iloc to select and print values from columns 'X' and 'Z' for the second and fourth rows:\n",
    "selected_values = df.iloc[[1, 3], [0, 2]]\n",
    "\n",
    "print(\"Values from columns 'X' and 'Z' for rows 2 and 4:\\n\", selected_values)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "89e4eeff",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "132dba49",
   "metadata": {},
   "source": [
    "### Exercise 6: Data Import and Export\n",
    "\n",
    "1. Read a CSV file named \"employees.csv\", in the data directory, into a Pandas DataFrame.\n",
    "\n",
    "2. Export the DataFrame from part 1 to a new CSV file named \"output.csv\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "c71c05d4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 1. Read CSV into DataFrame\n",
    "df = pd.read_csv(\"../data/employees.csv\")\n",
    "\n",
    "# 2. Export DataFrame to CSV\n",
    "df.to_csv(\"output.csv\", index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc198406",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6def4158",
   "metadata": {},
   "source": [
    "### Exercise 7: Missing Data Handling\n",
    "\n",
    "Below you are given a DataFrame with missing values. \n",
    "\n",
    "```python\n",
    "data_na = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],\n",
    "        'Age': [25, 30, 22, np.nan, 29],\n",
    "        'Score1': [85, 92, 78, 65, 88],\n",
    "        'Score2': [90, 87, 75, 60, np.nan]}\n",
    "```\n",
    "\n",
    "Write code to:\n",
    "\n",
    "1. Fill missing values in the \"Age\" column with the mean age.\n",
    "\n",
    "2. Remove rows with missing values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "2b05cd5e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Age</th>\n",
       "      <th>Score1</th>\n",
       "      <th>Score2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alice</td>\n",
       "      <td>25.0</td>\n",
       "      <td>85</td>\n",
       "      <td>90.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>30.0</td>\n",
       "      <td>92</td>\n",
       "      <td>87.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Charlie</td>\n",
       "      <td>22.0</td>\n",
       "      <td>78</td>\n",
       "      <td>75.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>David</td>\n",
       "      <td>NaN</td>\n",
       "      <td>65</td>\n",
       "      <td>60.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Eve</td>\n",
       "      <td>29.0</td>\n",
       "      <td>88</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Name   Age  Score1  Score2\n",
       "0    Alice  25.0      85    90.0\n",
       "1      Bob  30.0      92    87.0\n",
       "2  Charlie  22.0      78    75.0\n",
       "3    David   NaN      65    60.0\n",
       "4      Eve  29.0      88     NaN"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "data_na = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],\n",
    "        'Age': [25, 30, 22, np.nan, 29],\n",
    "        'Score1': [85, 92, 78, 65, 88],\n",
    "        'Score2': [90, 87, 75, 60, np.nan]}\n",
    "\n",
    "\n",
    "# Create the dataframe\n",
    "df = pd.DataFrame(data_na)\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "5bc93a75",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Age</th>\n",
       "      <th>Score1</th>\n",
       "      <th>Score2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alice</td>\n",
       "      <td>25.0</td>\n",
       "      <td>85</td>\n",
       "      <td>90.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>30.0</td>\n",
       "      <td>92</td>\n",
       "      <td>87.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Charlie</td>\n",
       "      <td>22.0</td>\n",
       "      <td>78</td>\n",
       "      <td>75.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>David</td>\n",
       "      <td>26.5</td>\n",
       "      <td>65</td>\n",
       "      <td>60.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Eve</td>\n",
       "      <td>29.0</td>\n",
       "      <td>88</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Name   Age  Score1  Score2\n",
       "0    Alice  25.0      85    90.0\n",
       "1      Bob  30.0      92    87.0\n",
       "2  Charlie  22.0      78    75.0\n",
       "3    David  26.5      65    60.0\n",
       "4      Eve  29.0      88     NaN"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 1. Fill missing values in the \"Age\" column with the mean age\n",
    "df['Age'].fillna(df['Age'].mean(), inplace=True)\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "7e6f78ca",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Age</th>\n",
       "      <th>Score1</th>\n",
       "      <th>Score2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alice</td>\n",
       "      <td>25.0</td>\n",
       "      <td>85</td>\n",
       "      <td>90.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bob</td>\n",
       "      <td>30.0</td>\n",
       "      <td>92</td>\n",
       "      <td>87.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Charlie</td>\n",
       "      <td>22.0</td>\n",
       "      <td>78</td>\n",
       "      <td>75.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>David</td>\n",
       "      <td>26.5</td>\n",
       "      <td>65</td>\n",
       "      <td>60.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Name   Age  Score1  Score2\n",
       "0    Alice  25.0      85    90.0\n",
       "1      Bob  30.0      92    87.0\n",
       "2  Charlie  22.0      78    75.0\n",
       "3    David  26.5      65    60.0"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 2. Remove rows with missing values\n",
    "df.dropna(inplace=True)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c3739a7",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eccbfd40",
   "metadata": {},
   "source": [
    "### Exercise 8: Adding New Columns\n",
    "\n",
    "Create a dictionary that contains the data in the following table. Then, convert this dictionary into a DataFrame. Use Hubble's law ($v = H_0 D$) to calculate the distance (in Mpc) of the galaxies and add it as a new column to the DataFrame.\n",
    "\n",
    "| Galaxy    | Velocity (km/s) |\n",
    "|-----------|-----------------|\n",
    "| NGC 123   | 1320            |\n",
    "| NGC 2342  | 5690            |\n",
    "| NGC 4442  | 8200            |\n",
    "\n",
    "Remember that $H_0 \\simeq 70$ km/s/Mpc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "6fa02b63",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Galaxy</th>\n",
       "      <th>Velocity (km/s)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NGC 123</td>\n",
       "      <td>1320</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NGC 2342</td>\n",
       "      <td>5690</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NGC 4442</td>\n",
       "      <td>8200</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Galaxy  Velocity (km/s)\n",
       "0   NGC 123             1320\n",
       "1  NGC 2342             5690\n",
       "2  NGC 4442             8200"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table_data = {\n",
    "    \"Galaxy\" : [\"NGC 123\", \"NGC 2342\", \"NGC 4442\"],\n",
    "    \"Velocity (km/s)\" : [1320, 5690, 8200]\n",
    "}\n",
    "\n",
    "df = pd.DataFrame(table_data)\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "1de14302",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Galaxy</th>\n",
       "      <th>Velocity (km/s)</th>\n",
       "      <th>Distance (Mpc)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NGC 123</td>\n",
       "      <td>1320</td>\n",
       "      <td>18.857143</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NGC 2342</td>\n",
       "      <td>5690</td>\n",
       "      <td>81.285714</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NGC 4442</td>\n",
       "      <td>8200</td>\n",
       "      <td>117.142857</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Galaxy  Velocity (km/s)  Distance (Mpc)\n",
       "0   NGC 123             1320       18.857143\n",
       "1  NGC 2342             5690       81.285714\n",
       "2  NGC 4442             8200      117.142857"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "H0 = 70 # Hubble's constant in km/s/Mpc\n",
    "\n",
    "df[\"Distance (Mpc)\"] = df[\"Velocity (km/s)\"] / H0\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "81404c2c",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "efee6df0",
   "metadata": {},
   "source": [
    "### Exercise 9: Data Pre-processing with Pandas\n",
    "\n",
    "Suppose you have a dataset containing information about student performance, but it requires some basic pre-processing before analysis. You are provided with a DataFrame named ``students_df`` with the following columns: 'StudentID', 'Name', 'MathScore', 'ScienceScore', 'EnglishScore', and 'TotalScore'. \n",
    "\n",
    "The exercise involves data pre-processing tasks to clean and prepare the data for analysis.\n",
    "\n",
    "```python\n",
    "data = {\n",
    "    'StudentID': [1, 2, 3, 4, 5],\n",
    "    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],\n",
    "    'MathScore': [85, 92, 78, 65, 88],\n",
    "    'ScienceScore': [90, 87, 75, 88, 92],\n",
    "    'EnglishScore': [80, 75, 88, 79, 95],\n",
    "    'TotalScore': [265, 152, 541, 202, 215]\n",
    "}\n",
    "\n",
    "students_df = pd.DataFrame(data)\n",
    "```\n",
    "\n",
    "**Your task**:\n",
    "\n",
    "1. The data in the 'TotalScore' column seems to be incorrect and needs to be recalculated as the sum of Math, Science, and English scores. Recalculate the 'TotalScore' column as the sum of MathScore, ScienceScore, and EnglishScore.\n",
    "\n",
    "2. Identify and handle any missing values (if present) in the dataset. Print the modified DataFrame.\n",
    "\n",
    "3. Calculate and add a new column 'AverageScore' that represents the average score of each student across subjects.\n",
    "\n",
    "4. Find and print the student with the highest 'AverageScore'.\n",
    "\n",
    "5. Identify and filter students who have scored 90 or above in at least one subject (Math, Science, or English).\n",
    "\n",
    "6. Print the DataFrame containing only the students who meet the criteria in part 5."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "b55f9f28",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>StudentID</th>\n",
       "      <th>Name</th>\n",
       "      <th>MathScore</th>\n",
       "      <th>ScienceScore</th>\n",
       "      <th>EnglishScore</th>\n",
       "      <th>TotalScore</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Alice</td>\n",
       "      <td>85</td>\n",
       "      <td>90</td>\n",
       "      <td>80</td>\n",
       "      <td>265</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Bob</td>\n",
       "      <td>92</td>\n",
       "      <td>87</td>\n",
       "      <td>75</td>\n",
       "      <td>152</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Charlie</td>\n",
       "      <td>78</td>\n",
       "      <td>75</td>\n",
       "      <td>88</td>\n",
       "      <td>541</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>David</td>\n",
       "      <td>65</td>\n",
       "      <td>88</td>\n",
       "      <td>79</td>\n",
       "      <td>202</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Eve</td>\n",
       "      <td>88</td>\n",
       "      <td>92</td>\n",
       "      <td>95</td>\n",
       "      <td>215</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   StudentID     Name  MathScore  ScienceScore  EnglishScore  TotalScore\n",
       "0          1    Alice         85            90            80         265\n",
       "1          2      Bob         92            87            75         152\n",
       "2          3  Charlie         78            75            88         541\n",
       "3          4    David         65            88            79         202\n",
       "4          5      Eve         88            92            95         215"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create the dataframe\n",
    "data = {\n",
    "    'StudentID': [1, 2, 3, 4, 5],\n",
    "    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],\n",
    "    'MathScore': [85, 92, 78, 65, 88],\n",
    "    'ScienceScore': [90, 87, 75, 88, 92],\n",
    "    'EnglishScore': [80, 75, 88, 79, 95],\n",
    "    'TotalScore': [265, 152, 541, 202, 215]\n",
    "}\n",
    "\n",
    "students_df = pd.DataFrame(data)\n",
    "students_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "87e0d96f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>StudentID</th>\n",
       "      <th>Name</th>\n",
       "      <th>MathScore</th>\n",
       "      <th>ScienceScore</th>\n",
       "      <th>EnglishScore</th>\n",
       "      <th>TotalScore</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Alice</td>\n",
       "      <td>85</td>\n",
       "      <td>90</td>\n",
       "      <td>80</td>\n",
       "      <td>255</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Bob</td>\n",
       "      <td>92</td>\n",
       "      <td>87</td>\n",
       "      <td>75</td>\n",
       "      <td>254</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Charlie</td>\n",
       "      <td>78</td>\n",
       "      <td>75</td>\n",
       "      <td>88</td>\n",
       "      <td>241</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>David</td>\n",
       "      <td>65</td>\n",
       "      <td>88</td>\n",
       "      <td>79</td>\n",
       "      <td>232</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Eve</td>\n",
       "      <td>88</td>\n",
       "      <td>92</td>\n",
       "      <td>95</td>\n",
       "      <td>275</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   StudentID     Name  MathScore  ScienceScore  EnglishScore  TotalScore\n",
       "0          1    Alice         85            90            80         255\n",
       "1          2      Bob         92            87            75         254\n",
       "2          3  Charlie         78            75            88         241\n",
       "3          4    David         65            88            79         232\n",
       "4          5      Eve         88            92            95         275"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 1. Recalculating the 'TotalScore' column\n",
    "students_df['TotalScore'] = students_df['MathScore'] + students_df['ScienceScore'] + students_df['EnglishScore']\n",
    "students_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "f3028b0b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "StudentID       0\n",
       "Name            0\n",
       "MathScore       0\n",
       "ScienceScore    0\n",
       "EnglishScore    0\n",
       "TotalScore      0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 2. Identifying and handling missing values\n",
    "students_df.isna().sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "953538e9",
   "metadata": {},
   "source": [
    "The DataFrame does not contain any missing values, so we do not have to handle this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "293f7b9e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>StudentID</th>\n",
       "      <th>Name</th>\n",
       "      <th>MathScore</th>\n",
       "      <th>ScienceScore</th>\n",
       "      <th>EnglishScore</th>\n",
       "      <th>TotalScore</th>\n",
       "      <th>AverageScore</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Alice</td>\n",
       "      <td>85</td>\n",
       "      <td>90</td>\n",
       "      <td>80</td>\n",
       "      <td>255</td>\n",
       "      <td>85.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Bob</td>\n",
       "      <td>92</td>\n",
       "      <td>87</td>\n",
       "      <td>75</td>\n",
       "      <td>254</td>\n",
       "      <td>84.666667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Charlie</td>\n",
       "      <td>78</td>\n",
       "      <td>75</td>\n",
       "      <td>88</td>\n",
       "      <td>241</td>\n",
       "      <td>80.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>David</td>\n",
       "      <td>65</td>\n",
       "      <td>88</td>\n",
       "      <td>79</td>\n",
       "      <td>232</td>\n",
       "      <td>77.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Eve</td>\n",
       "      <td>88</td>\n",
       "      <td>92</td>\n",
       "      <td>95</td>\n",
       "      <td>275</td>\n",
       "      <td>91.666667</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   StudentID     Name  MathScore  ScienceScore  EnglishScore  TotalScore  \\\n",
       "0          1    Alice         85            90            80         255   \n",
       "1          2      Bob         92            87            75         254   \n",
       "2          3  Charlie         78            75            88         241   \n",
       "3          4    David         65            88            79         232   \n",
       "4          5      Eve         88            92            95         275   \n",
       "\n",
       "   AverageScore  \n",
       "0     85.000000  \n",
       "1     84.666667  \n",
       "2     80.333333  \n",
       "3     77.333333  \n",
       "4     91.666667  "
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 3. Calculating and adding a new 'AverageScore' column\n",
    "students_df['AverageScore'] = (\n",
    "    students_df['MathScore'] + \n",
    "    students_df['ScienceScore'] + \n",
    "    students_df['EnglishScore']\n",
    ") / 3\n",
    "\n",
    "students_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "2a3ae56a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Student with the highest AverageScore: Eve\n"
     ]
    }
   ],
   "source": [
    "# 4. Finding the student with the highest 'AverageScore'\n",
    "highest_avg_score_mask = students_df['AverageScore'] == students_df['AverageScore'].max()\n",
    "\n",
    "highest_avg_score_student = students_df[highest_avg_score_mask]['Name'].values[0]\n",
    "\n",
    "print(\"Student with the highest AverageScore:\", highest_avg_score_student)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "bb66fe2e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 5. Identifying and filtering students who have scored 90 or above in at least one subject\n",
    "above_90_score_mask = (\n",
    "    (students_df['MathScore'] >= 90) | \n",
    "    (students_df['ScienceScore'] >= 90) | \n",
    "    (students_df['EnglishScore'] >= 90)\n",
    ")\n",
    "\n",
    "filtered_students = students_df[above_90_score_mask]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "5e992885",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Students who have scored 90 or above in at least one subject:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>StudentID</th>\n",
       "      <th>Name</th>\n",
       "      <th>MathScore</th>\n",
       "      <th>ScienceScore</th>\n",
       "      <th>EnglishScore</th>\n",
       "      <th>TotalScore</th>\n",
       "      <th>AverageScore</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Alice</td>\n",
       "      <td>85</td>\n",
       "      <td>90</td>\n",
       "      <td>80</td>\n",
       "      <td>255</td>\n",
       "      <td>85.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Bob</td>\n",
       "      <td>92</td>\n",
       "      <td>87</td>\n",
       "      <td>75</td>\n",
       "      <td>254</td>\n",
       "      <td>84.666667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Eve</td>\n",
       "      <td>88</td>\n",
       "      <td>92</td>\n",
       "      <td>95</td>\n",
       "      <td>275</td>\n",
       "      <td>91.666667</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   StudentID   Name  MathScore  ScienceScore  EnglishScore  TotalScore  \\\n",
       "0          1  Alice         85            90            80         255   \n",
       "1          2    Bob         92            87            75         254   \n",
       "4          5    Eve         88            92            95         275   \n",
       "\n",
       "   AverageScore  \n",
       "0     85.000000  \n",
       "1     84.666667  \n",
       "4     91.666667  "
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 6. Printing the filtered DataFrame\n",
    "print(\"Students who have scored 90 or above in at least one subject:\")\n",
    "filtered_students"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa426443",
   "metadata": {},
   "source": [
    "---"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}