e7herodata

Project: E7 Hero Data

Introduction

“E7 Hero Data” is a personal project which involves web scraping, data analysis, visualization, and the creation of an interactive web dashboard. This project is dedicated to extracting, analyzing, and presenting data related to characters from Epic Seven.

For context, Epic Seven is a turn-based strategy game developed by the Korean game company Smilegate. In a fight, heroes take turns using their abilities to deal damage, heal, or provide utility such as buffing allies and debuffing enemies. Each hero has a rarity from 1 to 5 stars. However, 1- and 2-star heroes are rarely used in fights and mostly serve as fodder to upgrade other heroes, so I have omitted them from this project. A hero's base stats are determined by its rarity together with its Class and Horoscope, with some exceptions such as Summertime Iseria, who gains a 50% Attack increase from her passive skill. This project aims to display the relationship between each of these factors and the stats, and to explore how the stats relate to one another. (Side note: from the picture, you might notice that there are pieces of “equipment” that each hero can equip. These give either a flat increase or a percentage increase based on the base stat, which is why it is important for a hero to have high base stats.)

Summertime Iseria, a 5 Star, Capricorn, Ranger
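The point about equipment bonuses can be made concrete with a little arithmetic. The sketch below assumes a simplified model (final stat = base × (1 + percentage bonus) + flat bonus); the function name and all numbers are made up for illustration and are not taken from the game data.

```python
def equipped_stat(base, percent_bonus=0.0, flat_bonus=0):
    """Simplified, assumed model: percentage bonuses scale with the base stat."""
    return base * (1 + percent_bonus) + flat_bonus


# Two hypothetical heroes wearing the same gear: +50% Attack and +100 flat Attack
low_base = equipped_stat(1000, percent_bonus=0.5, flat_bonus=100)
high_base = equipped_stat(1200, percent_bonus=0.5, flat_bonus=100)

# The 200-point gap in base Attack grows once the percentage bonus is applied
gap_after_gear = high_base - low_base
```

Under this model the flat bonus helps both heroes equally, while the percentage bonus widens the gap between them.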

 

Key Findings

  1. While flat stats (Attack, Health, Defense and Speed) and Effectiveness generally increase with rarity, the other percentage stats (Crit Chance, Crit Damage, Eff Res) do not follow this trend. Notably, Crit Chance shows the opposite relationship, decreasing as rarity increases.

Flat Stat average based on rarity

 

Average of Crit Chance based on rarity

 

  1. Different classes have their own “specialities”:
    • Mage has high Attack, Defense and Effectiveness, medium Speed, Crit Chance and Eff Res, and low Health.
    • Thief has high Attack, Crit Chance and Speed, medium Health, and low Defense, Effectiveness and Eff Res.
    • Ranger has high Attack, Speed and Effectiveness, medium Health, Defense and Crit Chance, and low Eff Res.
    • Warrior has high Attack, medium Health, Defense, Speed and Crit Chance, and low Effectiveness and Eff Res.
    • Knight has high Health and Defense, medium Attack, Crit Chance, Effectiveness and Eff Res, and low Speed.
    • Soul Weaver has high Defense and Eff Res, medium Speed and Effectiveness, and low Attack, Health and Crit Chance.
    • Crit Damage is fairly constant across all classes.

Average of stats by class

 

  1. Horoscopes can be extreme:
    • The Cancer horoscope ranks near the top for defensive stats (Health, Defense) while being near the bottom for utility stats (Speed, Effectiveness, Eff Res) and offensive stats (Attack, Crit Chance). As with class, Crit Damage does not really vary with horoscope.
    • On the other hand, the Leo horoscope ranks near the bottom for defensive and utility stats while being near the top for offensive stats.

Health, Defense, Attack and Speed by Horoscope (Arrow pointing to Cancer)

 

  1. Correlation and Conclusion
    • It should be quite clear that in Epic Seven, no class, horoscope or rarity is the best at everything. For the game to be balanced, there should always be a tradeoff between one stat and another. To confirm the general trends and the relationships between the stats, we can plot a correlation matrix using Matplotlib.
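As a rough illustration of the correlation matrix mentioned above, here is a minimal sketch using pandas and Matplotlib. The four columns and their values are made up for the example (the real data lives in e7HeroData.csv), and `plot_correlation` is a hypothetical helper, not a function from the project.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration only
df = pd.DataFrame({
    "attack":  [1000, 1200, 800, 1100, 600],
    "health":  [5000, 4500, 6500, 5200, 7000],
    "defense": [550, 500, 700, 580, 750],
    "speed":   [110, 118, 98, 112, 95],
})

corr = df.corr()  # pairwise Pearson correlation between the numeric columns


def plot_correlation(corr):
    """Draw the correlation matrix as a heatmap and return the figure."""
    fig, ax = plt.subplots(figsize=(5, 4))
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    fig.colorbar(im, ax=ax, label="Pearson r")
    fig.tight_layout()
    return fig


fig = plot_correlation(corr)
```

A negative cell, such as attack against health in this toy data, is the kind of tradeoff described above; calling `fig.savefig(...)` would write the heatmap to disk.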

 

The Process

Part 1: Data Extraction ( e7xscrape.py )

The project begins with e7xscrape.py, which serves as the foundation for gathering character data. Here’s a summary of its role in the project:

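First, I import the necessary modules, request the character list page, and extract the hero records from its ninth script tag:

import urllib.request
from bs4 import BeautifulSoup
import re
import pandas as pd

url = "https://epic7x.com/characters/"
# Send a browser-like User-Agent with the request
request_site = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
html = urllib.request.urlopen(request_site).read()
soup = BeautifulSoup(html, "html.parser")

# The hero data sits in the ninth <script> tag (index 8)
tags = soup("script")
new_tag = f"{tags[8]}"

# Trim the surrounding JavaScript (the offsets are specific to the page at scrape time),
# then collect each "{...}" record with a non-greedy match
cleaned = new_tag[152:106439].rstrip()
result = re.findall(r"\{.*?\}", cleaned)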
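To see what the non-greedy pattern does, it can be tried on a short, made-up string. Both sample strings below are hypothetical stand-ins for the real script contents, which are much longer.

```python
import re

# Made-up stand-in for the <script> contents
script_text = 'var heroes = [{"name":"Hero A","rarity":"5"},{"name":"Hero B","rarity":"4"}];'

# Non-greedy: each match runs from a "{" to the nearest following "}"
records = re.findall(r"\{.*?\}", script_text)

# With a nested object, the match stops at the first "}", so the record is cut short
nested = re.findall(r"\{.*?\}", '{"a":{"b":1}}')
```

The second call shows why records containing a nested object come back truncated: the match ends at the first closing brace, so anything after it has to be handled separately.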

Here is what the `result` list looks like:

 

dict = {}  # note: this name shadows the built-in dict type
for res in result:
    # The hero name sits between the fixed prefix and the "icon" key
    icon = res.index("icon")
    name = res[9:icon-3]

    if name == "Support Model Brinus":
        continue

    # Locate each key so the values can be sliced out by position
    link = res.index("link")
    rar = res.index("rarity")
    cla = res.index("class")
    ele = res.index("element")
    hor = res.index("horoscope")
    att = res.index("attack")
    hea = res.index("health")
    defe = res.index("defense")
    spd = res.index("speed")
    link_end = res.index('","stats')

    dict[name] = {
        # The page escapes slashes as "\/", so restore plain "/"
        "link": res[link+7:link_end].replace("\\/", "/"),
        "info": {
            "rarity": res[rar+9],
            "class": res[cla+8:ele-3],
            "horoscope": res[hor+12:link-3],
            "attack": int(res[att+9:hea-3]),
            "health": int(res[hea+9:defe-3]),
            "defense": int(res[defe+10:spd-3]),
        },
    }
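The fixed offsets above depend on the exact key order of each record. As a hedged alternative, the same values could be pulled out by key with a small regular expression. The `field` helper and the sample record are hypothetical; the record's shape is only inferred from the offsets used in the loop.

```python
import re


def field(record, key):
    """Return the quoted string stored under key, or None if the key is absent."""
    match = re.search(rf'"{re.escape(key)}":"(.*?)"', record)
    return match.group(1) if match else None


# Hypothetical record, shaped after the offsets used in the loop above
record = (
    '{"name":"Test Hero","icon":"test.png","rarity":"5","class":"Ranger",'
    '"element":"Ice","horoscope":"Capricorn",'
    '"link":"https:\\/\\/example.com\\/test-hero\\/",'
    '"stats":{"max":{"attack":"1000","health":"5000","defense":"600","speed":"115"'
)

name = field(record, "name")
link = field(record, "link").replace("\\/", "/")  # undo the escaped slashes
attack = int(field(record, "attack"))
```

Looking values up by key keeps working if the site reorders the fields, at the cost of one regex search per value.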
import ssl

# SSL context passed to urlopen below. Its definition is not shown in this write-up,
# so the default context is assumed here.
ctx = ssl.create_default_context()

# Index of the <tr> row that holds the stats, which differs by rarity
stat_row = {"5": 7, "4": 9, "3": 11}

for char in dict:
    char_link = dict[char]["link"]
    request_site = urllib.request.Request(char_link, headers={"User-Agent": "Mozilla/5.0"})

    html = urllib.request.urlopen(request_site, context=ctx).read()
    soup = BeautifulSoup(html, "html.parser")

    tags = soup("tr")
    stat_table = tags[stat_row[dict[char]["info"]["rarity"]]]

    stat = []
    try:
        for child in stat_table.children:
            try:
                for kid in child.children:
                    try:
                        for baby in kid.children:
                            if f"{baby}"[0] != " ":
                                # Plain value, possibly with a trailing "%"
                                try:
                                    new_baby = int(baby)
                                except ValueError:
                                    new_baby = int(baby[:-1])
                                stat.append(new_baby)
                            else:
                                # Bracketed bonus: add it to the previous value
                                start = baby.index("(") + 1
                                try:
                                    end = baby.index("%")
                                except ValueError:
                                    end = baby.index(")")
                                stat[prev] += int(baby[start:end])
                            prev = len(stat) - 1
                    except Exception:
                        pass  # skip nodes that cannot be parsed this way
            except Exception:
                pass
    except Exception:
        pass

    dict[char]["info"]["crit chance"] = stat[0]
    dict[char]["info"]["crit damage"] = stat[1]
    dict[char]["info"]["effectiveness"] = stat[2]
    dict[char]["info"]["effectiveness resistance"] = stat[3]
    dict[char]["info"]["speed"] = stat[4]

    # Every hero should end up with exactly 11 fields
    if len(dict[char]["info"]) != 11:
        print(f"Len Mismatch for {char}, {dict[char]['info']}")
        break

# new_dict is built from dict in a step that is not shown here
df = pd.DataFrame(new_dict)
df.to_csv("e7HeroData.csv")
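The nested loop above adds a bracketed bonus to the value that precedes it. A compact sketch of the same idea is shown below. It assumes a cell's text looks like "15% (+8%)", a format inferred from the parsing logic rather than confirmed against the site, and `parse_stat` is a hypothetical helper.

```python
import re


def parse_stat(text):
    """Sum the base value and any bracketed bonus found in the cell text."""
    numbers = re.findall(r"[+-]?\d+", text)
    return sum(int(n) for n in numbers)


crit_chance = parse_stat("15% (+8%)")  # base 15 plus bonus 8
speed = parse_stat("115")              # plain value, no bonus
```

Treating the whole cell as one string avoids walking the child nodes one by one, but it relies on the cell containing no other digits.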

Part 2: Data Visualization ( e7visualization.py )

In this section, we delve into the script e7visualization.py, which is dedicated to the visual representation of character data. Here’s an outline of its role:

for i, fil in enumerate(filters[filt]):
    temp = stats[stat][0][stats[stat][0][filt] == fil]

    perc = []
    for v in range(len(temp)):
        perc.append(v * 100 / (len(temp) - 1))

    stats[stat].append(temp)
    plt.plot(perc, stats[stat][i + 1][stat].values, colour[i], label=filters[filt][i])

plt.title(f"{stat} by {filt}")
plt.xlabel("percentile")
plt.ylabel(stat)
plt.legend()
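The x-axis in the loop above spreads the heroes of each group evenly from 0 to 100. A minimal sketch of that calculation is shown below, with a guard for groups containing a single hero, where `len(temp) - 1` would be zero. The `percentile_positions` helper and the speed values are made up, and the values are assumed to be sorted in ascending order before plotting.

```python
def percentile_positions(n):
    """Return n evenly spaced x positions running from 0 to 100."""
    if n == 1:
        return [0.0]  # a single hero has no spread, so avoid dividing by zero
    return [i * 100 / (n - 1) for i in range(n)]


speeds = sorted([102, 115, 98, 110, 121])  # made-up values
points = list(zip(percentile_positions(len(speeds)), speeds))
```

Plotting against percentile positions rather than raw counts lets groups of different sizes share one x-axis.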



![](https://cdn.discordapp.com/attachments/844184695754457122/1155136996247863296/image.png "Speed by class")

<div align="center"> Distribution of Speed by Class </div>

&nbsp;

- **Plot Storage**: The resulting plots are stored as PNG files, each showing how a specific statistic (e.g., attack) varies across a character category (e.g., rarity), using the naming convention `stat-filter.png`.

plt.savefig(f"{stat}-{filt}.png")
plt.clf()
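The naming convention can be checked by listing every stat and filter combination. The stat names below are assumed to match the keys written by the scraper, and the three filters are rarity, class and horoscope; the real lists in e7visualization.py may be spelled differently.

```python
from itertools import product

# Assumed names, taken from the keys used in the scraper
stats = [
    "attack", "health", "defense", "speed",
    "crit chance", "crit damage", "effectiveness", "effectiveness resistance",
]
filters = ["rarity", "class", "horoscope"]

# One file per (stat, filter) pair, following the stat-filter.png convention
filenames = [f"{stat}-{filt}.png" for stat, filt in product(stats, filters)]
```

Eight stats across three filters give 24 files, which matches the number of graphs produced.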

![](https://media.discordapp.net/attachments/844184695754457122/1156248987435802765/image.png?ex=65144848&is=6512f6c8&hm=b2d83d5340ef2be5437bbf44ccd64dcd98a6ea766f1c78a0a5b7d5c1c225118e&=&width=1920&height=636 "E7 Hero Data Graphs")

<div align="center"> Here are the 24 graphs plotted with Matplotlib </div>

&nbsp;

## Part 3: Additional Data Analysis ( [e7supplementary.py](https://github.com/pthanapon/e7herodata/blob/main/e7supplementary.py) )

This section introduces the script `e7supplementary.py`, which enhances the project with additional data analysis. Here's an overview of its role:

- **Data Import**: The script imports character data from the "e7HeroData.csv" file using the Pandas library, as in Part 2.

- **Mean Calculation**: It computes the mean values for specific character statistics.

mean_df = df.groupby(fil)[stat].mean().reset_index()


- **Data Grouping**: The script groups the data by filter criteria such as rarity, class, and horoscope, then sorts the group means in ascending order.

sorted_df = mean_df.sort_values(by=stat)


- **Visualization**: For each stat, it generates a series of bar plots, one for each filter, to visualize the average value of that statistic across various categories together in a single plot.

```python
for stat in stats:
    fig, axes = plt.subplots(len(filter), figsize=(8, 12))

    for i, fil in enumerate(filter):
        mean_df = df.groupby(fil)[stat].mean().reset_index()

        sorted_df = mean_df.sort_values(by=stat)

        sns.barplot(data=sorted_df, x=fil, y=stat, ax=axes[i])

        axes[i].set_title(f"Average {stat} by {fil}")
        axes[i].set_xlabel(fil)
        axes[i].set_ylabel(f"Average {stat}")
```

                                        

Mean of Attack based on Rarity, Class and Horoscope

 

While Part 2 uses Matplotlib for visualization, Part 3 employs the Seaborn library to create these visualizations. Together, they allow me to explore character statistics from different perspectives and gain insights into the game’s characters.

Part 4: Interactive Web Dashboard ( e7dash.py )

The final component of the “E7 Hero Data” project is the interactive web dashboard created using the script e7dash.py. This part ties everything together:

Initially, I wasn’t planning to make a dashboard for this project. However, after taking an AI literacy course, I started to wonder if I could use AI to make this project better. So that’s exactly what I asked ChatGPT:

Me:
how should i improve the code above and analyze the data better

ChatGPT:
[omitted]
10. Interactive Dashboard:
Create an interactive dashboard using tools like Dash (for web-based apps)
or Jupyter Widgets to allow users to explore the data interactively.
[omitted]

However, I had never made a web-based dashboard using Python before, so I prompted it further for more specific instructions. After hours of fine-tuning, this is the result:

E7 Hero Data Dashboard

 

Dropdown Menu for stat options

 

Part 5: Challenges Faced

Throughout the development of the “E7 Hero Data” project, I encountered several challenges that required problem-solving and troubleshooting. Here, I outline two notable challenges and how I addressed them:

Challenge 1: Handling Different Data Structures for 4-Star and 3-Star Heroes

One of the unexpected challenges arose from the fact that character data for 4-star and 3-star heroes on the Epic Seven website was stored differently compared to 5-star heroes. This difference in data structure led to missing stat values in the initial data extraction process. To overcome this challenge, the scraper picks the stat row according to rarity: row index 7 for 5-star heroes, 9 for 4-star heroes, and 11 for 3-star heroes (see Part 1).
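The rarity-dependent lookup can be sketched as a small mapping. The row indices 7, 9 and 11 come from the scraper in Part 1; the `pick_stat_row` helper and its fallback to `None` are hypothetical additions for illustration.

```python
# Row index of the stats table for each rarity, as used in Part 1
ROW_BY_RARITY = {"5": 7, "4": 9, "3": 11}


def pick_stat_row(rows, rarity):
    """Return the row holding the stats, or None if it cannot be located."""
    index = ROW_BY_RARITY.get(rarity)
    if index is None or index >= len(rows):
        return None
    return rows[index]


rows = [f"row{i}" for i in range(12)]  # stand-in for the page's <tr> tags
five_star_row = pick_stat_row(rows, "5")
three_star_row = pick_stat_row(rows, "3")
```

Returning `None` for an unknown rarity or a short page makes a missing row easy to spot, instead of silently reusing a row from the previous hero.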

Challenge 2: Dealing with FutureWarning in Seaborn Library

When I incorporated the Seaborn library for data visualization, I encountered a recurring FutureWarning message. This warning was related to certain aspects of data visualization in Seaborn. To resolve this issue:

FutureWarning
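The exact change made in the project is not recorded here. One common way to keep such messages out of the output, shown purely as an assumption and not as the project's actual fix, is to silence `FutureWarning` around the plotting call with the standard `warnings` module. Both functions below are hypothetical.

```python
import warnings


def call_without_future_warnings(func, *args, **kwargs):
    """Call func with FutureWarning suppressed only for the duration of the call."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        return func(*args, **kwargs)


def noisy_plot():
    # Hypothetical stand-in for a plotting call that emits a FutureWarning
    warnings.warn("this behaviour will change", FutureWarning)
    return "plotted"


result = call_without_future_warnings(noisy_plot)
```

Scoping the filter to a single call keeps other warnings visible. Updating the code to the usage the warning message recommends is usually the more durable fix.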

 

By addressing these challenges, I not only enhanced the robustness of the project but also gained valuable problem-solving skills that are essential in the field of data analysis and web scraping.

Conclusion

The “E7 Hero Data” project combines web scraping, data analysis, visualization, and web development to create a comprehensive platform for exploring and understanding character data from Epic Seven. Users can interact with the data through the web dashboard, explore character statistics, and gain valuable insights into the game’s characters. This project showcases how data extraction, analysis, and visualization can be transformed into an engaging and informative dashboard.