Tables are data tables that aggregate data from user-level events (e.g. a purchase made on a certain day) and are visualized in Charts.
Tables are also a tool for optimizing the use of data. Projects may contain a large number of Event Types and an even larger number of Data Fields, but you may want to use only some of them in one Dashboard or set of Charts. Tables focus on a subset of that large mass of data and thus provide efficient access to it. Larger Tables with more Dimensions and Measures can make building Charts easier, as an existing Table may already contain all the data needed for a new Chart. However, smaller Tables are more efficient.
The columns of a Table are Data Fields of various types. Aside from Table Fields, which are virtual and only calculated when Charts query the Table, these Fields store all the data in the database.
Tables are reprocessed at midnight UTC every day, aggregating the data generated during the day and appending it to the existing table. For example, at midnight UTC on January 2nd, all the data generated on January 1st is aggregated and appended to the table, so on January 2nd the data from January 1st is available. On-Demand Jobs can be run at any time to refresh the data in a Table in a Custom Project, and can include data generated up to the current day. Standard Metrics does not allow On-Demand Jobs, so its data will always run up to yesterday. The exception is hourly reports, which have a one-hour latency.
To access Tables, navigate to Model > Tables. Here you can search for Tables, create new ones, and edit, clone, and delete existing ones.
Before creating Tables, it is strongly recommended to familiarize yourself with Data Fields. They process the data of Events to be aggregated in the Table, and they serve as the columns of the Table.
Omniata’s data visualization tool allows anyone to create charts and data aggregations to suit their needs, regardless of technical expertise. There are some unique aspects to the platform, and one of the most important distinctions is between Dimensions and Measures.
Dimensions are data fields that are not aggregated; in other words, a data field that will never have any arithmetic operations performed on it. Examples of these are dates, country names, ad network names, but numeric demographic data such as user ages would also fall into this category. A good way to think about Dimensions is to ask “would adding two values of this field together make sense?” If the answer is no, then it is most likely a Dimension.
Dimensions break out metrics, such as revenue by acquisition date or DAU by country name. Other Dimensions transform data values, much like a SQL CASE statement.
Measures are data fields that are meant to be aggregated; in other words, a data field that can have arithmetic operations performed on it. Examples of these would be counts of users, summations of revenue, or time spent using an app. As with Dimensions, if the answer to “would adding two values of this field together make sense?” is yes, then the value should probably be a Measure.
Measures aggregate values together to form metrics, such as summing the revenue generated, counting the number of users that are active, or finding the average, minimum, or maximum values, such as for payment data.
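The grouping-versus-aggregation behaviour described above can be sketched in a few lines. This is an illustrative sketch, not Omniata code; the field names and values are hypothetical:

```python
from collections import defaultdict

# Hypothetical user-level events. country and platform act as Dimensions
# (grouping keys); revenue acts as a Measure (summed per group).
events = [
    {"country": "US", "platform": "iOS", "revenue": 4.99},
    {"country": "US", "platform": "iOS", "revenue": 9.99},
    {"country": "DE", "platform": "Android", "revenue": 1.99},
]

table = defaultdict(float)
for e in events:
    key = (e["country"], e["platform"])  # one Table row per Dimension combination
    table[key] += e["revenue"]           # the Measure is aggregated into that row
```

The "would adding two values together make sense?" test maps directly onto this sketch: revenue values are summed, so revenue is a Measure, while adding two country names would be meaningless, so country is a Dimension.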
All Tables must include the following Omniata-Specific Fields:
If any of these Fields are left out of a Table, the Table cannot be processed nightly or on demand.
Unlike most off-the-shelf tools, Omniata natively offers a large number of predefined data fields that can be used to enable a vast range of analyses. Some may be unfamiliar, and as such it may not be clear whether something should be a Dimension or Measure. Below you’ll find some commonly used fields, what the field means, and whether it is a Dimension or Measure.
If you are interested in a particular Count of Unique Users, please note that the value obtained always depends on the Dimensions that are part of the Table. The value is calculated per one combination of values of all the Dimensions. For example, if the Dimensions are User Activity Date, Country Name, and Platform, the Count of Unique Users is calculated per one User Activity Date, one Country Name, and one Platform. In most cases, the values cannot be re-aggregated across a different combination of Dimensions (for example, only per User Activity Date and Country Name) without modifying and reprocessing the Table.
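The re-aggregation pitfall can be illustrated with a small sketch. The user sets here are hypothetical:

```python
# Hypothetical stored Table rows: unique-user counts per (date, platform).
# User "u2" was active on both platforms on the same day.
users_ios = {"u1", "u2"}
users_android = {"u2", "u3"}

count_ios = len(users_ios)          # stored count for the iOS row: 2
count_android = len(users_android)  # stored count for the Android row: 2

# Re-aggregating by summing the stored counts double-counts "u2":
naive_total = count_ios + count_android      # 4
true_total = len(users_ios | users_android)  # 3, the actual unique-user count
```

Because the Table stores only the counts, not the underlying user sets, the true total of 3 cannot be recovered from the per-Platform rows, which is why the Table must be reprocessed with the new Dimension combination.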
By default, Data Fields with string values are recognized as Dimensions, and Data Fields with numeric values as Measures. Please note that Tables have a limit to the number of Dimensions and Measures that can be used in a single table. Each table can have 128 bytes of storage for the Dimensions and Measures, with each Dimension (X) taking 4 bytes and each Measure (Y) taking 8 bytes, as shown in this equation:
4X + 8Y <= 128
If a Table exceeds the 128-byte limit, it will cause processing to fail. Table Fields do not count towards this total, as they are virtual.
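The constraint can be checked with a small helper. This is an illustrative sketch; the function name is our own, not part of Omniata:

```python
def table_fits(num_dimensions: int, num_measures: int) -> bool:
    """Check the storage constraint 4X + 8Y <= 128, where each Dimension
    uses 4 bytes and each Measure uses 8 bytes of the 128-byte budget.
    (Table Fields are virtual and excluded from the count.)"""
    return 4 * num_dimensions + 8 * num_measures <= 128
```

For example, 10 Dimensions and 11 Measures use 4*10 + 8*11 = 128 bytes and just fit; one more Measure would exceed the budget.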
Navigate to Model > Tables. Select 'New...'. You will have the following options:
Building an Event Based Table consists of selecting the scope of the Table, defining the structure of the Table through Dimensions and Measures, and adding configurations.
Once saved, the next step is to run an On-Demand Job that populates the Table based on the provided schema. Select a date range and press 'Run Job'. You have the option to publish the job upon completion, which makes the new Table available for Chart building. If the option is unselected, the data is not available for Charts, but it is still processed and can be accessed under the Model > Jobs section. An unpublished job can also be published at a later date.
Once the job has completed and is published, you have the option to create Charts directly in the panel.
Materialized Views are data sets that contain the results of a query. A Materialized View can be a local copy of data located remotely, a subset of the rows and/or columns of an existing table, the join of two tables, or an aggregation of an existing table's data. In short, Materialized Views are snapshots of a data set created for performance reasons, i.e. as a form of optimization. An example would be to create a UID-based table and use it as the foundation of a Materialized View (example below). If the base Table grows in size rapidly, creating a Materialized View is the best option.
Step 1. Create a UID-based table, i.e. a reporting table with UID as one of its dimensions. Include other dimensions and measures as desired.
Step 2. Create Materialized View
Step 3. Populate the Materialized View by selecting a date range of your choice and choosing the appropriate environment. Choose 'Drop All' to replace all existing tables with new ones, 'Drop if Exists' to update a specific date range, or 'None' to fill a gap in the data for the specified date range. Finally, press 'Run Job' to start the job.
Once the job has completed, you have an option to create Charts directly in the panel.
Processing a Table means processing the Events of a certain time period and aggregating the data into the Table. Processing transforms incoming raw Events into aggregated data. There are two distinct cases: the daily update, and manual data processing, also known as an On-Demand Job. If a Table is set to run daily, the Events from the previous day are automatically processed at 00:00 UTC and the data is appended to the existing Table. Nightly updates are the default way of keeping Tables up to date.
A Table On-Demand Job is a manual processing of the Table. An On-Demand Job processes all Events sent during the Processing Date Range and saves the resulting data. With default settings, data from the On-Demand Job is automatically published and overwrites all old data in the Table. A new Job is needed when creating a new Table or updating an existing one.
Note: If you only add new Table Fields, and no other Data Fields, you do not need to run a Job. This is because Table Fields do not process events but run on top of already aggregated data.
Jobs can be started from Table Actions. Navigate to Model > Tables, then to the Table for which you want to run the Job, and select Run. You can then select the processing Date Range and whether to publish the Table once processing is completed. Publishing means that the old Table is automatically overwritten by the data from the Job, so all Charts will use the data generated by this Job. If left unchecked, you can still export the resulting data as a CSV file once the Job is completed. You can also publish the data later, for example after first checking it via an export.
When creating or editing a Table, it is recommended to first run a Job covering only one day. This makes processing and potential iterations faster. A longer date range can be processed once the Table is considered ready. With default settings, data from a Job is automatically published and overwrites all old data in the Table, which is why it is very important to be careful when creating a new Job.
On-Demand Jobs can be accessed from Model > Jobs. The Jobs page shows all On-Demand Jobs of all Tables, both those processed earlier and those currently being processed. If a Job is not completed, the Properties and Actions page shows its progress.
When the Job is completed, there are other Actions available.
In certain circumstances it is desirable to limit the data used when creating a Table, such as including only data from users with certain demographic attributes, or even an individual user or group of users. To enable this, click "Advanced" at the top of the Table builder, then write Lua within the box; it will be applied to the data prior to performing the aggregations and creating the Table. For example, to limit the data to a user with UID "12345678", you would use this code:
tonumber(event_kvp['uid']) == 12345678
It is important to note the use of "tonumber()" here; as with all Lua used within Omniata, there are specific practices that must be followed. Please see our documentation on Custom Lua for more information.
A Debug Job is a way to process a Table so that the resulting table contains more information than normal processing produces. The differences between a Debug Job and a normal On-Demand Job are:
Max Rows works as follows: when processing a day, there is a counter starting from zero, and each time an event is aggregated the counter is incremented. When the counter reaches Max Rows, no further events are aggregated for that date. There is a twist: Omniata has an internal user sharding system, and there is a counter per shard (each with an internal limit of Max Rows divided by the shard count). In practice this means it is not exactly the first Max Rows events of the day that are aggregated. Currently, even a small Max Rows value does not necessarily speed up job processing, since the events of a date are still read and processed even when they are no longer aggregated (once Max Rows has been reached).
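The per-shard counter behaviour can be sketched roughly as follows. This is an illustrative model, not Omniata internals; sharding users by uid modulo the shard count is an assumption made for the example:

```python
# Illustrative model of per-shard Max Rows counting, not Omniata internals.
def aggregate_with_max_rows(events, max_rows, shard_count=4):
    per_shard_limit = max_rows // shard_count  # internal per-shard counter limit
    counters = [0] * shard_count
    aggregated = []
    for uid, payload in events:
        shard = uid % shard_count  # assumed sharding scheme for this example
        if counters[shard] < per_shard_limit:
            counters[shard] += 1
            aggregated.append((uid, payload))
        # Events past the per-shard limit are still read, just not
        # aggregated, which is why a small Max Rows may not speed up the job.
    return aggregated

# With max_rows=4 and 2 shards, each shard may aggregate only 2 events, so
# the third shard-0 event (uid 4) is skipped even though it arrived before
# the shard-1 budget was exhausted: not strictly "the first 4 of the day".
sample = aggregate_with_max_rows(
    [(0, "a"), (2, "b"), (4, "c"), (1, "d"), (3, "e")],
    max_rows=4, shard_count=2)
```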
The following are the additional dimensions a Debug Job adds:
| Column | Type | Description |
|---|---|---|
| om_meta_highway | int(11) | Omniata internal identifier of the user shard of the Event's User; value between 0 and 3 |
| om_meta_lane | int(11) | Omniata internal identifier of the user shard of the Event's User; value between 0 and 7 |
| om_meta_count | int(11) | Event counter. Unique within the table for each om_meta_highway / om_meta_lane combination. |
| om_meta_uid | varchar(255) | The UID of the Event |
| om_meta_timestamp | varchar(255) | The Unix timestamp of the Event |
| om_meta_ipaddr | varchar(255) | The source IP address of the Event |
| om_meta_api_key | varchar(255) | The API Key of the Event |
| om_meta_time_str | varchar(255) | The time of the Event in HH:mm:ss format, e.g. 23:15:10. If you need both time and date, use activity_date and a SQL query to concatenate it with this column. |
A Condition (in the context of a Debug Job) is a Lua code snippet that decides whether an Event is aggregated into a Table or not. The filter is applied separately for each event. It is evaluated after the User State and Custom User Fields have been updated, but before the event is aggregated, so an event can be conditionally included in the aggregation. If the event is not aggregated, it does not directly affect the contents of the Table, since it is dropped before aggregation; it can still affect the Table indirectly by causing modifications to the User State and Custom User Fields. If the Event is aggregated, the effect of the filter is the same as having no filter, i.e. the Event is aggregated normally.
Technically, a filter is a Lua snippet that returns a boolean and has the following input available:
The Event Fields are not available directly, but the formula of an Event Field can be copy-pasted into a filter as-is, because the filter has the same information available as an Event Field. An example use case of a Debug Job is finding all the actions of a specific user during the specified date range. To do so, enter event_kvp['uid'] == '123abc' in the Condition text box, and only events from the user with "123abc" as their UID will be included.
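The evaluation order (User State updated first, Condition applied second) can be modelled with a short sketch. This is illustrative only, not Omniata internals; the event structure and state keys are hypothetical:

```python
# Illustrative model: each event first updates the (simplified) User State,
# and only then is the Condition evaluated to decide whether the event is
# aggregated into the Table.
def process_events(events, condition):
    user_vars = {}  # per-user state, keyed by uid
    table = []      # aggregated events
    for event in events:
        state = user_vars.setdefault(event["uid"], {"count_of_purchase": 0})
        if event["type"] == "purchase":
            state["count_of_purchase"] += 1   # User State is updated first
        if condition(event, state):           # then the Condition is applied
            table.append(event)
        # A dropped event still changed the User State above, which is how
        # a filtered-out event can indirectly affect the Table.
    return table

# Analogous to a Lua Condition that keeps only the first and second purchases.
purchases = [{"uid": "u1", "type": "purchase"} for _ in range(3)]
first_two = process_events(
    purchases,
    lambda event, state: state["count_of_purchase"] in (1, 2))
```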
A Debug Job is started on the Table by clicking "Debug Job". Use the Max Rows slider and Conditions box to set the parameters for the job. Note: you cannot publish a Debug Job. Click 'Create Job' to start processing. Once the Job has finished, you can download the CSV file for further analysis or use Raw SQL access (if you have it) to examine the raw SQL table.
When analyzing the table, you can use ORDER BY om_meta_highway, om_meta_lane, om_meta_count to see the events (for each highway and lane) in the order they were processed.
When developing and debugging Tables, include only the events of a time window in order to hand-pick certain events:

An exact time: event_meta.timestamp == TIMESTAMP
A time window: event_meta.timestamp >= TIMESTAMP and event_meta.timestamp < TIMESTAMP
Events can also be included based on the User State and the Custom User Attributes. This is useful e.g. when you want to include an event in the aggregation only when the User State (or the Custom User Attributes) fulfils a certain criterion, e.g. reporting only the first and second purchases:
user_vars['count_of_purchase'] == 1 or user_vars['count_of_purchase'] == 2
If a Table does not contain the data you expected it to contain, consider the following steps:
Cloning means creating a new Table based on an existing one. The Data Setup and other settings of the existing Table are copied as the base for the new Table, but they can be edited in the process. Tables can only be cloned within a Project.
To clone a Table navigate to Model > Tables and select Clone from Table Actions.
All Tables can be edited at any time. If you add Event Fields or User Fields to the Table, you need to run an On-Demand Job to reprocess the data in the Table. If you only modify Table Fields, you do not need to run a Job, as Table Fields do not process events but function on top of already aggregated data.
To edit a Table navigate to Model > Tables.
All Tables can be deleted at any time. When a Table is deleted, the data it contains is removed. Charts using that Table will stay in place but will no longer show data.
To delete a Table navigate to Model > Tables.