bundle_notifications package
Submodules
bundle_notifications.bundle_notifications module
Main module.
-
bundle_notifications.bundle_notifications.add_notif_counter(x)
Creates a counter given a solution x
Equivalent function to:
    df_g['notification_bool'] = False
    df_g.notification_bool.iloc[x] = True
    # Now do a cumsum counter
    df_g['notification_counter'] = df_g.notification_bool.cumsum().shift()+1
    df_g.notification_counter.iloc[0] = 1
Computationally, it is much faster to do it this way (roughly 95x faster than pandas in the timing below):
    print('With @jit:')
    %time n1 = add_notif_counter_j(x,np.zeros(x[-1]+1,dtype='int'))
    print('Without @jit:')
    %time n2 = add_notif_counter(x)
    print('With pandas:')
    %time n3 = notif_counter_pandas(df_g)
    # Recall that 1 ms = 1000 µs, so it is about 95x faster
    np.allclose(n1,n2.astype("int") )
    np.allclose(n2,n3.astype("int") )
    >> With @jit:
    >> CPU times: user 22 µs, sys: 0 ns, total: 22 µs
    >> Wall time: 26.9 µs
    >> Without @jit:
    >> CPU times: user 25 µs, sys: 1 µs, total: 26 µs
    >> Wall time: 29.1 µs
    >> With pandas:
    >> CPU times: user 2.62 ms, sys: 705 µs, total: 3.33 ms
    >> Wall time: 2.73 ms
Parameters: x (np.array of int) – Array of length 4. Each element is an index indicating the timestamp at which a notification should be sent. Example: x = np.array([0,1,2,5])
Returns: Numpy array containing a counter, starting from 1 and going up to 4
Return type: np.array of int
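The pandas snippet above can also be written with plain NumPy. The following is only a minimal sketch of that idea, not the package's actual implementation; the name `notif_counter_sketch` is hypothetical, and the output length `x[-1] + 1` is an assumption taken from the timing example.

```python
import numpy as np

def notif_counter_sketch(x):
    """Hypothetical NumPy-only counter mirroring the pandas snippet."""
    n = x[-1] + 1                      # assumed length: last index + 1
    sent = np.zeros(n, dtype=int)
    sent[x] = 1                        # 1 where a notification is sent
    counter = np.ones(n, dtype=int)    # the first element is always 1
    # Shift the cumulative sum by one position and add 1,
    # as cumsum().shift() + 1 does in pandas.
    counter[1:] = np.cumsum(sent)[:-1] + 1
    return counter

x = np.array([0, 1, 2, 5])
counter = notif_counter_sketch(x)
print(counter)  # [1 2 3 4 4 4]
```

Each event carries the number of the notification it will be bundled into, so events 3, 4 and 5 all belong to the fourth notification.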
-
bundle_notifications.bundle_notifications.bundle(df)
Bundles the notifications given a pd.DataFrame of events
Parameters: df (pd.DataFrame) – DataFrame containing 4 columns: ['timestamp', 'user_id', 'friend_id', 'friend_name']
Returns: Contains 5 derived columns: ['notification_sent', 'timestamp_first_tour', 'tours', 'receiver_id', 'message']
Return type: pd.DataFrame
-
bundle_notifications.bundle_notifications.bundle_func(df_g)
Bundles notifications for a user_id
This function is meant to be used after a pandas grouping (or manual filtering) of user_ids.
Parameters: df_g (pd.DataFrame) – DataFrame containing 4 columns: ['timestamp', 'user_id', 'friend_id', 'friend_name']
Returns: DataFrame containing 5 extra columns: ['notification_bool','tours','notification_counter', 'message','timestamp_first_tour']
Return type: pd.DataFrame
-
bundle_notifications.bundle_notifications.count_tours_per_notif(notification_counter, friend_id, friend_name, timestamp)
Count number of friends that went on a tour during a given time, indicated by a counter.
Equivalent to:
df_g['tours'] = df_g.groupby('notification_counter')['friend_id'].apply( lambda x:(1- x.duplicated()).cumsum() )
In pseudo-code, this is equivalent to:
- As an input, we have a dataset filtered by user_id and day of the year
- Each of the inputs of this function is a numpy array, corresponding to a column of the dataset. Doing basic operations in numpy is much faster, especially if we manage to use a @jit compiler (TODO)
- Let us call the solution _tours_. For each element in friend_id we do:
- Start tours = 1 at iteration i=0
- We add tours += 1 if the friend_id is new.
- We continue until i in notification_counter
- Reset tours = 1
- Keep track of the name and timestamp of the first element
- (name_first,timestamp_first_tour)
Parameters: - notification_counter (np.array) – notification counter. Could be the output of an optimal_delay method. For example, np.array([1,2,3,10]) for a 10 element array
- friend_id (np.array of str) – array containing the ids of the friends
- friend_name (np.array of str) – array containing the names of the friends
- timestamp (np.array of datetime64[ns]) – timestamps when the notifications are generated
Returns: - tours (np.array of int) – Number of tours done since the last notification was sent, counting unique friend ids
- name_first (np.array of <U256) – Name of the friend who first did a tour since the last notification was sent. len(name_first) <= 4
- timestamp_first_tour (np.array of datetime64[ns]) – Timestamp of the first tour done by a friend since the last notification was sent. len(timestamp_first_tour) <= 4
- message (np.array of np.array of <U256) – Message to be sent. len(message) <= 4
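The pseudo-code for count_tours_per_notif can be sketched as a single loop over the arrays. This is an illustrative reading of those steps, not the package's code: the name `count_tours_sketch` is hypothetical, it returns one value per event rather than the (at most 4) entries described above, it omits the message, and the sample inputs are made up.

```python
import numpy as np

def count_tours_sketch(notification_counter, friend_id, friend_name, timestamp):
    """Hypothetical per-event version of the pseudo-code."""
    n = len(friend_id)
    tours = np.zeros(n, dtype=int)
    name_first = np.empty(n, dtype=object)
    timestamp_first_tour = np.empty(n, dtype=timestamp.dtype)
    current, seen = None, set()
    first_name, first_ts = None, None
    for i in range(n):
        if notification_counter[i] != current:
            # A new notification window starts: reset the unique-friend set
            # and remember the first name and timestamp of the window.
            current = notification_counter[i]
            seen = set()
            first_name, first_ts = friend_name[i], timestamp[i]
        seen.add(friend_id[i])       # only new friend ids grow the set
        tours[i] = len(seen)
        name_first[i] = first_name
        timestamp_first_tour[i] = first_ts
    return tours, name_first, timestamp_first_tour

# Made-up example inputs
counter = np.array([1, 2, 3, 4, 4, 4])
ids = np.array(["a", "b", "a", "c", "c", "d"])
names = np.array(["Ann", "Bob", "Ann", "Cat", "Cat", "Dan"])
ts = np.arange("2017-08-01T00:00", "2017-08-01T00:06",
               dtype="datetime64[m]").astype("datetime64[ns]")

tours, name_first, ts_first = count_tours_sketch(counter, ids, names, ts)
print(tours)  # [1 1 1 1 1 2]
```

In the last window the friend "c" appears twice but is counted once, which matches the `duplicated()`-based pandas expression.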
-
bundle_notifications.bundle_notifications.create_message(tours, name_first)
Returns the notification message as a numpy array
Parameters: - tours (np.array of int) – array of integers representing the number of tours
- name_first (np.array) – array of names
Returns: array with the message like “Mona and 12 others went on a tour”
Return type: np.array
-
bundle_notifications.bundle_notifications.create_message_single(t, n)
Creates custom message based on the number of tours and the friend name
Parameters: - t (int) – Number of tours
- n (str) – Name of the friend
Returns: Notification message
Return type: str
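As an illustration of how create_message and create_message_single might relate, here is a hedged sketch. Only the "Mona and 12 others went on a tour" form comes from the documentation; the wording for one or two tours, and the assumption that the number of "others" is `t - 1`, are guesses. Both function names are hypothetical.

```python
import numpy as np

def message_single_sketch(t, n):
    """Hypothetical message builder; singular/plural wording is assumed."""
    if t <= 1:
        return f"{n} went on a tour"
    others = t - 1                      # assumed: everyone except the first friend
    noun = "other" if others == 1 else "others"
    return f"{n} and {others} {noun} went on a tour"

def message_sketch(tours, name_first):
    """Apply the single-message builder element by element."""
    return np.array([message_single_sketch(int(t), str(n))
                     for t, n in zip(tours, name_first)])

messages = message_sketch(np.array([1, 2, 13]), np.array(["Mona", "Mona", "Mona"]))
print(messages[2])  # Mona and 12 others went on a tour
```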
-
bundle_notifications.bundle_notifications.load_data(path_csv, nrows=None)
Loads the notification csv file
Parameters: - path_csv (str) – Path or url to the csv file containing the data. It should have 4 comma-separated columns without header.
- nrows (int, optional) – Number of rows of file to read. Useful for reading pieces of large files or for testing this function.
Returns: DataFrame containing the stream of data. It has 4 columns: 'timestamp','user_id','friend_id','friend_name'. The column named 'timestamp' is cast as a datetime64[ns] type.
Return type: pd.DataFrame
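A loader with the behaviour described for load_data could look roughly like the sketch below. It is an assumption about how such a function can be written with `pandas.read_csv`, not the package's source; `load_data_sketch` is a hypothetical name and the three sample rows are made up for illustration.

```python
import io
import pandas as pd

COLUMNS = ["timestamp", "user_id", "friend_id", "friend_name"]

def load_data_sketch(path_csv, nrows=None):
    """Hypothetical loader: header-less CSV with 4 comma-separated columns."""
    df = pd.read_csv(path_csv, header=None, names=COLUMNS, nrows=nrows)
    # Cast the timestamp column to a datetime type
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    return df

# Made-up sample standing in for a file path or URL
sample = io.StringIO(
    "2017-08-01 00:06:47,u1,f1,Mona\n"
    "2017-08-01 00:31:05,u1,f2,Geir\n"
    "2017-08-01 00:40:13,u2,f1,Mona\n"
)
df = load_data_sketch(sample, nrows=2)
print(df.shape)  # (2, 4)
```

Passing `nrows` keeps only the first rows, which is convenient for quick tests on large files.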
bundle_notifications.cli module
Console script for bundle_notifications package.
bundle_notifications.optimal_delay module
This module compiles a few functions used to compute the optimal notification times.
Functions decorated with @jit are not included in the coverage report.
-
bundle_notifications.optimal_delay.delay
Calculates delay if notifications are sent at indexes indicated by notification_idx
Parameters: t (np.array) – Array containing the timestamps of the events
Returns: Sum of total delay
Return type: datetime[ns]
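One way to read "total delay" is: every event waits until the first notification sent at or after it, and the waits are summed. The sketch below implements that reading with integer timestamps. The function name, the argument order, and the requirement that the last notification index equals `len(t) - 1` are all assumptions.

```python
import numpy as np

def total_delay_sketch(t, notification_idx):
    """Hypothetical total delay: sum of waits until the next notification."""
    t = np.asarray(t)
    notification_idx = np.asarray(notification_idx)
    # For every event i, locate the first notification index >= i.
    # Assumes notification_idx is sorted and ends at len(t) - 1.
    slot = np.searchsorted(notification_idx, np.arange(len(t)))
    return int(np.sum(t[notification_idx[slot]] - t))

t = np.array([0, 10, 20, 30, 40, 50])
print(total_delay_sketch(t, [0, 1, 2, 5]))  # 30
```

Here events 3 and 4 wait 20 and 10 time units for the notification at index 5, and all other events are sent immediately.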
-
bundle_notifications.optimal_delay.local_search(timestamp)
Heuristic optimization of the notification schedule
Parameters: t (np.array (int)) – Array of integer values. These could correspond to datetime64[ns]. The input needs to be of integer type, as it is better supported by the JIT compiler (TODO: allow for datetime64 type)
Returns: Optimized notification schedule. Each item corresponds to an index of t where the notification should have been sent.
Return type: np.array
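To give a feel for what a local-search heuristic over notification indexes can look like, here is a generic sketch. It is not the package's algorithm: it simply tries moving each index one step backwards or forwards and keeps a move when the total delay drops. All names, the delay definition, the pinned last index, and the example data are assumptions.

```python
import numpy as np

def total_delay_sketch(t, x):
    """Assumed delay: each event waits for the first notification at or after it."""
    slot = np.searchsorted(x, np.arange(len(t)))
    return int(np.sum(t[x[slot]] - t))

def local_search_sketch(t, x, max_iter=100):
    """Hypothetical +/-1 local search; the last index stays at len(t) - 1."""
    t = np.asarray(t)
    x = np.array(x)
    best = total_delay_sketch(t, x)
    for _ in range(max_iter):
        improved = False
        for j in range(len(x) - 1):
            for step in (-1, 1):          # negative step, then positive step
                cand = x.copy()
                cand[j] += step
                lower = cand[j - 1] if j > 0 else -1
                # Keep the schedule strictly increasing and inside the array
                if cand[j] <= lower or cand[j] >= cand[j + 1]:
                    continue
                d = total_delay_sketch(t, cand)
                if d < best:
                    x, best, improved = cand, d, True
        if not improved:
            break                          # local optimum reached
    return x

# Made-up integer timestamps with four clusters of events
t = np.array([0, 1, 2, 3, 50, 51, 52, 100, 101, 200])
start = np.array([2, 5, 7, 9])
result = local_search_sketch(t, start)
print(total_delay_sketch(t, start), "->", total_delay_sketch(t, result))
```

Such a search only guarantees a local optimum, which is why a reasonable starting schedule matters.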
-
bundle_notifications.optimal_delay.local_search_negative
Local search negative step
Parameters: - t (np.array (DateTime)) – array of timestamps
- x (list of int) – Indicate indexes where the notification is sent
- max_iter (int) – Maximum number of local search steps to be performed
Returns: optimized notification schedule. Each item corresponds to an index of t
Return type: np.array
-
bundle_notifications.optimal_delay.local_search_positive
Local search positive step
Parameters: - t (np.array (DateTime)) – array of timestamps
- x (list of int) – Indicate indexes where the notification is sent
- max_iter (int) – Maximum number of local search steps to be performed
Returns: optimized notification schedule. Each item corresponds to an index of t
Return type: np.array
-
bundle_notifications.optimal_delay.total_delay_brute(timestamp)
Given a Series of Timestamps, calculate total delay using brute force: try out all possible combinations
This function is kept here for reference and possible future implementations.
Parameters: t (np.array) – Array containing the timestamps of the events
Returns: array of length 4. Each element indicates an index where the optimal notification should be sent.
Return type: np.array
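The brute-force idea of trying out all possible combinations can be sketched with `itertools.combinations`. This is an illustration under assumptions, not the package's code: the last event is assumed to always trigger a notification, the delay is the sum of waits until the next notification, and both function names are hypothetical.

```python
from itertools import combinations
import numpy as np

def total_delay_sketch(t, x):
    """Assumed delay: each event waits for the first notification at or after it."""
    slot = np.searchsorted(x, np.arange(len(t)))
    return int(np.sum(t[x[slot]] - t))

def brute_force_sketch(t, n_notifications=4):
    """Hypothetical exhaustive search over all index combinations."""
    t = np.asarray(t)
    last = len(t) - 1
    k = min(n_notifications - 1, last)     # free indexes besides the last one
    best_x, best_delay = None, None
    for combo in combinations(range(last), k):
        x = np.array(combo + (last,))      # last event always gets a notification
        d = total_delay_sketch(t, x)
        if best_delay is None or d < best_delay:
            best_x, best_delay = x, d
    return best_x

t = np.array([0, 10, 20, 30, 40, 50])
x_best = brute_force_sketch(t)
print(total_delay_sketch(t, x_best))  # 20
```

The number of combinations grows quickly with the number of events, so this approach is only practical as a reference for small inputs.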
-
bundle_notifications.optimal_delay.total_delay_initial
Given a Series of Timestamps, sample 4 points equally distributed index-wise
Parameters: t (np.array) – Array containing the timestamps of the events
Returns: array of length 4. Each element indicates an index where the initial optimal notification should be sent.
Return type: np.array
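Sampling 4 points equally distributed index-wise could be done with `np.linspace`, as in the sketch below. Whether the package includes the first index, how it rounds, and how it treats very short arrays are not stated, so those choices (and the name `initial_schedule_sketch`) are assumptions. The input is assumed to be non-empty.

```python
import numpy as np

def initial_schedule_sketch(t, n_notifications=4):
    """Hypothetical initial schedule: evenly spaced indexes from first to last."""
    n = len(t)
    idx = np.linspace(0, n - 1, n_notifications).astype(int)
    # Drop duplicates, which appear when there are fewer events than notifications
    return np.unique(idx)

x0 = initial_schedule_sketch(np.arange(10))
print(x0)  # [0 3 6 9]
```

A schedule like this only depends on the number of events, so it is a cheap starting point that a later optimization step can refine.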
Module contents
Top-level package for bundle_notifications.