bundle_notifications package

Submodules

bundle_notifications.bundle_notifications module

Main module.

bundle_notifications.bundle_notifications.add_notif_counter(x)[source]

Creates a counter given a solution x

Equivalent function to:

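df_g['notification_bool'] = False
# Positional assignment on the frame itself avoids chained indexing
df_g.iloc[x, df_g.columns.get_loc('notification_bool')] = True

# Now do a cumsum counter
df_g['notification_counter'] = (
    df_g.notification_bool.cumsum().shift() + 1
)
df_g.iloc[0, df_g.columns.get_loc('notification_counter')] = 1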

Computationally, this is much faster than the pandas version (roughly 95x in the timings below):

print('With @jit:')
%time n1 = add_notif_counter_j(x,np.zeros(x[-1]+1,dtype='int'))
print('Without @jit:')
%time n2 = add_notif_counter(x)
print('With pandas:')
%time n3 = notif_counter_pandas(df_g)
# Recall that 1 ms = 1000 µs, so it is about 95x faster

np.allclose(n1,n2.astype("int") )
np.allclose(n2,n3.astype("int") )

>> With @jit:
>> CPU times: user 22 µs, sys: 0 ns, total: 22 µs
>> Wall time: 26.9 µs
>> Without @jit:
>> CPU times: user 25 µs, sys: 1 µs, total: 26 µs
>> Wall time: 29.1 µs
>> With pandas:
>> CPU times: user 2.62 ms, sys: 705 µs, total: 3.33 ms
>> Wall time: 2.73 ms
Parameters:x (np.array of int) – Array of length 4. Each element is an index indicating the timestamp at which a notification should be sent. Example: x = np.array([0,1,2,5])
Returns:Numpy array containing a counter, starting from 1 up to 4
Return type:np.array of int
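
The pandas steps above can be written directly in numpy. This is a minimal sketch that follows the "equivalent" pandas code, not the package source; the name notif_counter_sketch is hypothetical.

```python
import numpy as np

def notif_counter_sketch(x):
    """Counter that increases by one after each notification index in x."""
    # One flag per event, up to and including the last notification index
    flags = np.zeros(x[-1] + 1, dtype=int)
    flags[x] = 1
    counter = np.empty_like(flags)
    counter[0] = 1                           # first event always opens bundle 1
    counter[1:] = np.cumsum(flags)[:-1] + 1  # cumsum, shifted by one, plus one
    return counter

counter = notif_counter_sketch(np.array([0, 1, 2, 5]))
```

With x = np.array([0,1,2,5]) the events at indexes 3, 4 and 5 share the fourth counter value, since they all wait for the notification sent at index 5.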
bundle_notifications.bundle_notifications.bundle(df)[source]

Bundles the notifications given a pd.DataFrame of events

Parameters:df (pd.DataFrame) – DataFrame containing 4 columns: ['timestamp', 'user_id', 'friend_id', 'friend_name']
Returns:DataFrame containing 5 derived columns: ['notification_sent', 'timestamp_first_tour', 'tours', 'receiver_id', 'message']
Return type:pd.DataFrame
bundle_notifications.bundle_notifications.bundle_func(df_g)[source]

Bundles notifications for a user_id

This function is meant to be used after a pandas grouping (or manual filtering) of user_ids.

Parameters:df_g (pd.DataFrame) – DataFrame containing 4 columns: ['timestamp', 'user_id', 'friend_id', 'friend_name']
Returns:DataFrame containing 5 extra columns: ['notification_bool','tours','notification_counter', 'message','timestamp_first_tour']
Return type:pd.DataFrame
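
A sketch of the grouping pattern this function expects. The per_user_stub function is a hypothetical stand-in for bundle_func, and the sample rows are made up for illustration; only the column names come from the documentation.

```python
import pandas as pd

def per_user_stub(df_g):
    # Hypothetical stand-in for bundle_func: numbers the events of one user
    out = df_g.sort_values("timestamp").copy()
    out["event_number"] = range(1, len(out) + 1)
    return out

# Made-up sample with the 4 documented input columns
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2017-08-01 10:00", "2017-08-01 09:00", "2017-08-01 11:00"]
    ),
    "user_id": ["u1", "u2", "u1"],
    "friend_id": ["f1", "f2", "f3"],
    "friend_name": ["Mona", "Ana", "Leo"],
})

# Apply the per-user function to each group and stitch the results together
parts = [per_user_stub(df_g) for _, df_g in df.groupby("user_id")]
result = pd.concat(parts, ignore_index=True)
```

Iterating over the groups explicitly keeps the user_id column in every group, which makes the per-user function easy to test with a manually filtered DataFrame as well.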
bundle_notifications.bundle_notifications.count_tours_per_notif(notification_counter, friend_id, friend_name, timestamp)[source]

Count number of friends that went on a tour during a given time, indicated by a counter.

Equivalent to:

df_g['tours'] = (
    df_g.groupby('notification_counter')['friend_id'].apply(
        lambda x: (1 - x.duplicated()).cumsum()
    )
)

In pseudo-code, this is equivalent to:

  • As an input, we have a dataset filtered by user_id and day of the year
  • Each input of this function is a numpy array, corresponding to a column of the dataset. Doing basic operations in numpy is much faster, especially if we manage to use a @jit compiler (TODO)
  • Let us call the solution _tours_. For each element in friend_id we do:
    • Start tours = 1 at iteration i=0
    • We add tours += 1 if the friend_id is new.
    • We continue until i appears in notification_counter. At that point:
      • Reset tours = 1
      • Keep track of the name and timestamp of the first element
        (name_first,timestamp_first_tour)
Parameters:
  • notification_counter (np.array) – notification counter. Could be the output of an optimal_delay method. For example, np.array([1,2,3,10]) for a 10 element array
  • friend_id (np.array of str) – array containing the ids of the friends
  • friend_name (np.array of str) – array containing names of the friends
  • timestamp (np.array of datetime64[ns]) – timestamps when the notifications are generated
Returns:

  • tours (np.array of int) – Count of the number of tours done since the last notification was sent, counting each friend id only once
  • name_first (np.array of <U256) – Name of the friend who first did a tour since the last notification was sent. len(name_first) <=4
  • timestamp_first_tour (np.array of datetime64[ns]) – Timestamp of the first tour done by a friend, since the last notification was sent. len(timestamp_first_tour) <=4
  • message (np.array of np.array of <U256) – Message to be sent. len(message) <= 4
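
The counting part of the pseudo-code can be sketched as a plain loop. This follows the pandas expression given above (one running count of unique friend ids per notification_counter value) and assumes that equal counter values are contiguous; it is not the package implementation, and the function name is hypothetical.

```python
import numpy as np

def count_unique_per_counter(notification_counter, friend_id):
    """Running count of unique friend ids within each counter value."""
    tours = np.zeros(len(friend_id), dtype=int)
    current = None   # counter value of the bundle being filled
    seen = set()     # friend ids already counted in this bundle
    count = 0
    for i in range(len(friend_id)):
        if notification_counter[i] != current:
            # A new bundle starts: reset the running count
            current = notification_counter[i]
            seen = set()
            count = 0
        if friend_id[i] not in seen:
            seen.add(friend_id[i])
            count += 1
        tours[i] = count
    return tours

tours = count_unique_per_counter(
    np.array([1, 2, 3, 4, 4, 4]),
    np.array(["a", "b", "a", "c", "c", "d"]),
)
```

In the last bundle, friend c appears twice but is counted once, so the running count only reaches 2 when friend d shows up.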

bundle_notifications.bundle_notifications.create_message(tours, name_first)[source]

Returns the notification message as a numpy array

Parameters:
  • tours (np.array of int) – array of integers representing the number of tours
  • name_first (np.array) – array of names
Returns:

array with the message like “Mona and 12 others went on a tour”

Return type:

np.array

bundle_notifications.bundle_notifications.create_message_single(t, n)[source]

Creates custom message based on the number of tours and the friend name

Parameters:
  • t (int) – Number of tours
  • n (str) – Name of the friend
Returns:

Notification message

Return type:

str
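
A sketch of how the single-message and array versions could fit together. Only the "Mona and 12 others went on a tour" form appears in this documentation; the wording for one or two tours, the choice of t - 1 as the number of "others", and both function names are assumptions.

```python
import numpy as np

def message_single_sketch(t, n):
    # Assumed wording: t counts the named friend plus the "others"
    if t <= 1:
        return f"{n} went on a tour"
    if t == 2:
        return f"{n} and 1 other went on a tour"
    return f"{n} and {t - 1} others went on a tour"

def message_array_sketch(tours, name_first):
    # Build one message per (tours, name) pair
    return np.array(
        [message_single_sketch(t, n) for t, n in zip(tours, name_first)]
    )

messages = message_array_sketch(np.array([1, 2, 13]),
                                np.array(["Ana", "Leo", "Mona"]))
```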

bundle_notifications.bundle_notifications.load_data(path_csv, nrows=None)[source]

Loads the notification csv file

Parameters:
  • path_csv (str) – Path or url to the csv file containing the data. It should have 4 comma-separated columns without header.
  • nrows (int, optional) – Number of rows of file to read. Useful for reading pieces of large files or for testing this function.
Returns:

DataFrame containing the stream of data. It has 4 columns: 'timestamp','user_id','friend_id','friend_name'. The column named ‘timestamp’ is cast as a datetime64[ns] type.

Return type:

pd.DataFrame
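
A minimal sketch of a loader with this behaviour, built on pd.read_csv. The function name load_data_sketch and the sample rows are hypothetical; the column names and the header-less, comma-separated layout are taken from the description above.

```python
import io
import pandas as pd

COLUMNS = ["timestamp", "user_id", "friend_id", "friend_name"]

def load_data_sketch(path_csv, nrows=None):
    # header=None because the file has no header row;
    # parse_dates casts the timestamp column to a datetime type
    return pd.read_csv(
        path_csv,
        header=None,
        names=COLUMNS,
        nrows=nrows,
        parse_dates=["timestamp"],
    )

# Made-up sample standing in for the csv file
sample = io.StringIO(
    "2017-08-01 00:06:47,u1,f1,Mona\n"
    "2017-08-01 00:31:05,u2,f2,Leo\n"
)
df = load_data_sketch(sample, nrows=1)
```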

bundle_notifications.cli module

Console script for bundle_notifications package.

bundle_notifications.optimal_delay module

This module collects a few functions used to compute the optimal notification times.

Functions decorated with @jit are not included in the coverage report.

bundle_notifications.optimal_delay.delay[source]

Calculates delay if notifications are sent at indexes indicated by notification_idx

Parameters:t (np.array) – Array containing the timestamps of the events
Returns:Sum of total delay
Return type:datetime[ns]
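
The documentation does not spell out how the delay is measured. A minimal sketch, assuming that each event waits until the first notification sent at or after it, and that timestamps are given as integers; the function name and this delay definition are assumptions, not the package's code.

```python
import numpy as np

def total_delay_sketch(t, notification_idx):
    """Sum of waiting times between each event and its notification."""
    t = np.asarray(t)
    idx = np.asarray(notification_idx)
    # For every event i, position of the first notification index >= i
    pos = np.searchsorted(idx, np.arange(len(t)), side="left")
    # Events after the last notification are never sent in this sketch
    sent = pos < len(idx)
    return int((t[idx[pos[sent]]] - t[sent]).sum())

t = np.array([0, 10, 20, 50, 60, 100])
delay = total_delay_sketch(t, [0, 1, 2, 5])
```

Here the events at indexes 3 and 4 wait for the notification at index 5, contributing 50 and 40, while every other event is sent immediately.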

Heuristic optimization of the notification schedule

Parameters:t (np.array (int)) – Array of integer values. These could correspond to datetime64[ns] timestamps. The input needs to be an integer array, as integers are better supported by the JIT compiler (TODO: allow for datetime64 type)
Returns:Optimized notification schedule. Each item corresponds to an index of t where the notification should have been sent.
Return type:np.array
bundle_notifications.optimal_delay.local_search_negative[source]

Local search negative step

Parameters:
  • t (np.array (DateTime)) – array of timestamps
  • x (list of int) – Indicate indexes where the notification is sent
  • max_iter (int) – Maximum number of local search steps to be performed
Returns:

optimized notification schedule. Each item corresponds to an index of t

Return type:

np.array

bundle_notifications.optimal_delay.local_search_positive[source]

Local search positive step

Parameters:
  • t (np.array (DateTime)) – array of timestamps
  • x (list of int) – Indicate indexes where the notification is sent
  • max_iter (int) – Maximum number of local search steps to be performed
Returns:

optimized notification schedule. Each item corresponds to an index of t

Return type:

np.array
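
One way to read the two local search steps is as a move of a single notification index by -1 (negative) or +1 (positive), kept only if the total delay drops. The sketch below works under that reading and under a hypothetical delay definition (each event waits for the first notification at or after it); it also keeps the last notification fixed. None of this is confirmed by the package source.

```python
import numpy as np

def total_delay(t, x):
    # Hypothetical delay: each event waits for the next notification
    pos = np.searchsorted(x, np.arange(len(t)), side="left")
    sent = pos < len(x)
    return int((t[x[pos[sent]]] - t[sent]).sum())

def local_search_step(t, x, step, max_iter=100):
    """Shift indexes by `step` (+1 or -1) while the delay improves."""
    x = np.array(x)
    best = total_delay(t, x)
    for _ in range(max_iter):
        improved = False
        for k in range(len(x) - 1):  # last notification stays in place
            cand = x.copy()
            cand[k] += step
            # Skip moves that leave the array or break the ordering
            if cand[k] < 0 or np.any(np.diff(cand) <= 0):
                continue
            d = total_delay(t, cand)
            if d < best:
                x, best, improved = cand, d, True
        if not improved:
            break
    return x

t = np.array([0, 1, 2, 50, 51, 100])
x_pos = local_search_step(t, [0, 2, 3, 5], step=+1)
x_neg = local_search_step(t, x_pos, step=-1)
```

In this example the positive step moves the third notification from index 3 to index 4, which cuts the delay from 50 to 2; the negative step then finds nothing better and returns the schedule unchanged.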

bundle_notifications.optimal_delay.total_delay_brute(timestamp)[source]

Given a Series of Timestamps, calculate total delay using brute force: try out all possible combinations

This function is kept here for reference and possible future implementations.

Parameters:t (np.array) – Array containing the timestamps of the events
Returns:array of length 4. Each element indicates an index where the optimal notification should be sent.
Return type:np.array
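
A brute-force search can be sketched with itertools.combinations. The sketch assumes a hypothetical delay definition (each event waits for the first notification at or after it), fixes the last notification at the last event, and returns the delay alongside the indexes for illustration; the function names are not the package's.

```python
from itertools import combinations
import numpy as np

def total_delay(t, x):
    # Hypothetical delay: each event waits for the next notification
    pos = np.searchsorted(x, np.arange(len(t)), side="left")
    sent = pos < len(x)
    return int((t[x[pos[sent]]] - t[sent]).sum())

def brute_force_sketch(t, n_notifications=4):
    """Try every schedule; needs at least n_notifications events."""
    last = len(t) - 1
    best_x, best_delay = None, None
    # Choose the free indexes; the final notification is always at `last`
    for combo in combinations(range(last), n_notifications - 1):
        x = np.array(combo + (last,))
        d = total_delay(t, x)
        if best_delay is None or d < best_delay:
            best_x, best_delay = x, d
    return best_x, best_delay

t = np.array([0, 1, 2, 50, 51, 100])
best_x, best_delay = brute_force_sketch(t)
```

The number of candidate schedules grows as "n choose 3", which is why a heuristic search is the practical option for longer event streams.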
bundle_notifications.optimal_delay.total_delay_initial[source]

Given a Series of Timestamps, sample 4 points equally distributed index-wise

Parameters:t (np.array) – Array containing the timestamps of the events
Returns:array of length 4. Each element indicates an index where the initial optimal notification should be sent.
Return type:np.array
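
Equally spaced indexes can be obtained with np.linspace. A minimal sketch, assuming the first and last events are always included; the function name and the rounding rule are assumptions.

```python
import numpy as np

def initial_schedule_sketch(t, n_notifications=4):
    # Evenly spaced positions between the first and last index, as integers
    return np.linspace(0, len(t) - 1, num=n_notifications).round().astype(int)

x0 = initial_schedule_sketch(np.arange(10))
```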

Module contents

Top-level package for bundle_notifications.