Set up files and notebook

Definitions

Helpers

Function library

Read and clean the original data

Definitions

Read the data

Clean up the data

Delete unimportant columns

Take interesting keys from dictionaries to new columns

Delete original columns with dictionaries
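A minimal sketch of the two steps above: copying selected keys of a dictionary column into new columns, then dropping the original column. The column name `device`, its keys, and the helper `extract_keys` are hypothetical and only illustrate the pattern.

```python
import pandas as pd

# Hypothetical sample: the "device" column holds one dictionary per row.
df = pd.DataFrame({
    "user_id": ["a", "b"],
    "device": [
        {"category": "mobile", "os": "Android"},
        {"category": "tablet", "os": "iOS"},
    ],
})


def extract_keys(frame, column, keys):
    """Copy the selected keys of a dictionary column into new columns."""
    out = frame.copy()
    for key in keys:
        # Rows that do not hold a dictionary get None instead of raising.
        out[f"{column}_{key}"] = out[column].apply(
            lambda d: d.get(key) if isinstance(d, dict) else None
        )
    return out


df = extract_keys(df, "device", ["category", "os"])
# The original dictionary column is no longer needed.
df = df.drop(columns=["device"])
```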

Choosing the countries for analysis

Specific cleaning, debugging for this campaign

Selecting users who had level 9 but not level 8
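One way to make this selection is a set difference between the users seen on each level. This is a sketch only; the column names `user_id` and `level` and the sample rows are assumptions.

```python
import pandas as pd

# Hypothetical sample: one row per level event.
events = pd.DataFrame({
    "user_id": ["a", "a", "b", "b", "c"],
    "level": [8, 9, 7, 9, 8],
})

users_l8 = set(events.loc[events["level"] == 8, "user_id"])
users_l9 = set(events.loc[events["level"] == 9, "user_id"])

# Users with at least one level 9 event and no level 8 event.
skipped_l8 = sorted(users_l9 - users_l8)
```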

Analyzing why some users had level 9 but not level 8. Reason: in this version, level 8 was placed after level 13.

Analysis of games dropping from level 4 to level 5

Unique players on levels

All users who stopped after level 4

All users who removed the app after level 4

All users who stopped after level 4 but didn't remove the app
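The three groups above can be sketched with a per-user maximum level and a set of users who removed the app. The column names and the event name `app_remove` are assumptions; "stopped after level 4" is read here as "the highest level reached is 4".

```python
import pandas as pd

# Hypothetical sample: level events plus one app removal.
events = pd.DataFrame({
    "user_id": ["a", "a", "b", "b", "b", "c", "c"],
    "event_name": ["level_start", "app_remove", "level_start", "level_start",
                   "level_start", "level_start", "level_start"],
    "level": [4, None, 3, 4, 5, 3, 4],
})

LAST_LEVEL = 4

# Highest level reached per user (missing levels are ignored by max()).
max_level = events.groupby("user_id")["level"].max()
stopped = set(max_level[max_level == LAST_LEVEL].index)

# Users with at least one app removal event.
removed = set(events.loc[events["event_name"] == "app_remove", "user_id"])

stopped_and_removed = sorted(stopped & removed)
stopped_kept_app = sorted(stopped - removed)
```

Changing `LAST_LEVEL` to 3 gives the same split for the drop from level 3 to level 4.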

Analysis of games dropping from level 3 to level 4

All users who stopped after level 3

All users who removed the app after level 3

All users who stopped after level 3 but didn't remove the app

Unpacking all data in dictionary event_params

Checking events on user

Analysis on country UK

Analysis on levels

Definitions

Defining the levels

Levels played by unique players

Levels played by all games

Time analysis on levels

Average time spent on levels

Calculate the number of players per level

Calculates the percentile of time spent on a level from the mean times of all users.

Total time spent on levels

Calculates the 25th percentile of the total time spent on a level per user.
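A possible shape for the percentile calculations: aggregate the time per user and level first (mean and total), then take the quantile across users for each level. The column names `user_id`, `level` and `seconds` are assumptions, and the sample values are made up.

```python
import pandas as pd

# Hypothetical sample: one row per played game with its duration.
plays = pd.DataFrame({
    "user_id": ["a", "a", "b", "c", "d", "e"],
    "level": [1, 1, 1, 1, 1, 1],
    "seconds": [10.0, 50.0, 40.0, 60.0, 80.0, 100.0],
})

# Mean and total time per user on each level.
per_user = plays.groupby(["level", "user_id"])["seconds"].agg(["mean", "sum"])

# 25th percentile across users, separately for each level.
mean_q25 = per_user["mean"].groupby(level="level").quantile(0.25)
total_q25 = per_user["sum"].groupby(level="level").quantile(0.25)
```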

Analysis on sessions

Definitions

Defining the session

Definition of session:

If the app was removed in a separate session, this can distort the results, so app-removal events are removed from df.

All rows whose session_id isn't a number are also removed from df.

Events that are less than 1h apart are considered the same session, and the session id is corrected accordingly.
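The session definition above could be implemented roughly as follows. This is a sketch under assumptions: the column names `event_name`, `session_id` and `event_time`, the event name `app_remove`, and the sample rows are hypothetical; `session_Nr` is taken to be a per-user running session number.

```python
import pandas as pd

# Hypothetical sample events.
events = pd.DataFrame({
    "user_id": ["a", "a", "a", "a", "b", "b"],
    "event_name": ["level_start", "level_end", "level_start",
                   "app_remove", "level_start", "level_start"],
    "session_id": ["1", "2", "3", "3", "7", "abc"],
    "event_time": pd.to_datetime([
        "2021-01-01 10:00", "2021-01-01 10:30", "2021-01-01 12:00",
        "2021-01-01 12:10", "2021-01-01 09:00", "2021-01-01 09:20",
    ]),
})

# 1. Drop app-removal events.
df = events[events["event_name"] != "app_remove"].copy()

# 2. Drop rows whose session_id is not a number.
df["session_id"] = pd.to_numeric(df["session_id"], errors="coerce")
df = df.dropna(subset=["session_id"])

# 3. Start a new session on a user's first event or after a gap of 1h or more.
df = df.sort_values(["user_id", "event_time"])
gap = df.groupby("user_id")["event_time"].diff()
new_session = (gap.isna() | (gap >= pd.Timedelta(hours=1))).astype(int)

# Running session number per user, starting at 1.
df["session_Nr"] = new_session.groupby(df["user_id"]).cumsum()
```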

Transition to next sessions

Sessions_amount is a df of all sessions grouped by session_Nr. Calling size() on it returns the number of players who have 1 session, 2 sessions, 3 sessions, and so on. The counts are cumulative: a player with, for example, 3 sessions is counted in the amounts for 1 session, 2 sessions and 3 sessions.
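A small sketch of this counting, assuming one row per player and session with hypothetical columns `user_id` and `session_Nr`. The `transition` ratio at the end is one possible way to express the transition to the next session and is not taken from the notebook.

```python
import pandas as pd

# Hypothetical sample: player "a" has 3 sessions, "b" has 2, "c" has 1.
sessions = pd.DataFrame({
    "user_id": ["a", "a", "a", "b", "b", "c"],
    "session_Nr": [1, 2, 3, 1, 2, 1],
}).drop_duplicates()

# Number of players who reached each session number.
sessions_amount = sessions.groupby("session_Nr").size()

# Share of players in session N-1 who also have session N (assumed metric).
transition = sessions_amount / sessions_amount.shift(1)
```

Player "a" appears in the counts for sessions 1, 2 and 3, which is why the counts can only stay equal or decrease as the session number grows.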

Total time spent on session per player

Analysis on days

Definitions

Defining the days

Dataframe without "app remove" events, with sorted timestamps and sessions defined by a time gap < 1h

New column "date" taken from event_timestamp
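A minimal sketch of deriving the date column, assuming `event_timestamp` holds microseconds since the Unix epoch in UTC. If the data uses another unit, the `unit` argument has to change accordingly.

```python
import pandas as pd

# Hypothetical sample: two timestamps exactly one day apart (microseconds).
df = pd.DataFrame({
    "event_timestamp": [1_600_000_000_000_000, 1_600_086_400_000_000],
})

# Convert to datetimes and keep only the calendar date.
df["date"] = pd.to_datetime(df["event_timestamp"], unit="us").dt.date
```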

Transition to next days