Kranio Methodology: Keys to Successful Data Projects

Many companies develop initiatives around their data by implementing Data Lakes or other technologies that allow them to extract, store and organize as much data as possible for analysis and better decision-making. The objective is to better understand their customers' needs, improve the quality of services, and anticipate and prevent outcomes by extracting meaningful insight from all this information.

With the data organized and stored, we can process and analyze current and future scenarios, answering questions such as: what is happening (Descriptive), why it is happening (Diagnostic), what will happen (Predictive), and what actions to take to optimize and increase efficiency (Prescriptive).

Value Matrix - Complexity in Data Projects - © 2020 Kranio.io

The Data Lake is a living, dynamic, evolving system that receives data from different sources (structured and unstructured) in a variety of formats. The data arrives raw, not optimized or transformed for any specific purpose, so it is important to know and understand the characteristics, regulations and standards the data must meet before users can consume it.

When implementing a Data Lake, you should always define data governance. The first step is to understand what data governance is. There are many definitions; for us, it is all the processes, policies and tools that ensure the security, availability, usability and quality of data: guaranteeing that only authorized users can access, explore and exploit the data, that the data is up to date and complete, and that we avoid the risk of exposing sensitive and confidential information about individuals and organizations.

Deciding how to protect data, how to maintain its consistency, integrity and accuracy, and how to keep it permanently up to date are the points we cover in this document, and they are part of the Kranio data process.

How do we do it? - The Secret Ingredient

Methodology for Data Projects - © 2020 Kranio.io

The following summarizes the Kranio methodology applied to data processing, consisting of several stages that are not necessarily sequential:

Preparation

This is where the data project begins. Together with the business, we adopt an agile framework and lead a design sprint, organizing in advance the participants and their context, in order to define the KPIs and the iteration cycles for building the product, and to understand the business definitions and rules. We conclude with an initial backlog of activities that will be prioritized during the execution of the project.

Depending on the type of project, we use or propose a framework based on agile methodologies; our deepest experience is with Scrum and Kanban. This allows us to prioritize tasks over time and to define follow-up routines with the business that give stakeholders visibility of the product (we recommend maintaining a weekly dashboard).

With regard to the form and tools of communication, monitoring and documentation, our approach is agnostic: we adapt to what clients already have, and if they have none, we recommend and implement the minimum needed to meet expectations and make the project succeed, supporting the establishment of standards for their data projects.

The key factors we consider in the preparation phase of a data project are:

  • Get involved with the needs and expectations of the business, and understand what the problems are and what value the project will generate. What are we doing the project for? What is the real value for the business? Being clear about this allows us not only to capture the business requirements, but also to contribute our experience and make suggestions and value propositions during execution.
  • Identify the interlocutors and stakeholders, clearly establishing the role each one plays within the project.
  • Identify early the sources of information that are available, both external and internal to the Data Lake. This helps us manage expectations better, allowing us to raise early warnings and to prepare and present action plans.

Data Ingestion

As important as understanding the problem to be solved is understanding the data available. Typically, customers save and manipulate data in ways that do not necessarily store it correctly, or they may lack the platforms needed to do so. From there, we define and agree with the customer on a technology architecture aimed at meeting the needs of the current business and designed for future usability, scalability and ease of maintenance.


With the architecture defined, we identify the sources to integrate, the best way to extract the data (tool or technology) and the frequency of extraction, and we also evaluate whether the data contains structures that identify individuals (PII) or other sensitive data. This matters because such data needs appropriate treatment before being stored, in an organized and secure manner, in the format and structure defined for the Data Lake. We also analyze the information already available, so as to reuse as much of it as possible.
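As a simple illustration of that PII check, the sketch below scans a sample of an incoming CSV for values that look like emails or phone numbers before the file lands in the lake. It is a minimal sketch, not part of the Kranio toolchain: the file name, column handling and regex patterns are assumptions that would be adapted to each source.

```python
import csv
import re

# Hypothetical patterns for common PII; real projects use broader,
# locale-specific rules or a dedicated scanning service.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def scan_for_pii(path, sample_rows=100):
    """Return the (column, pattern) pairs whose sampled values look like PII."""
    flagged = set()
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        for i, row in enumerate(reader):
            if i >= sample_rows:
                break
            for column, value in row.items():
                for label, pattern in PII_PATTERNS.items():
                    if value and pattern.search(value):
                        flagged.add((column, label))
    return flagged

# Columns flagged here would be masked or encrypted before landing
# in the Data Lake's raw layer ("customers_extract.csv" is hypothetical).
print(scan_for_pii("customers_extract.csv"))
```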

Kranio's DataOps team builds digital products (code) that bring data from different sources into the Data Lake. A multitude of tools and services can support the extraction and storage process; even so, creating a data pipeline is vital because it automates the validation and data-loading processes. It also provides centralized orchestration and monitoring, incorporating execution tracking, alert generation, error logging and audit trails.
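To make that concrete, here is a minimal sketch of such a pipeline in Python: extract from a legacy CSV export, validate, and load into the Data Lake's raw layer, logging each step. The bucket, file and column names are hypothetical, and boto3 is assumed as the storage client.

```python
import csv
import logging

import boto3  # AWS SDK; assumes credentials are already configured

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")

def extract(path):
    """Read the raw extract produced by the legacy application."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def validate(rows, required=("id", "amount")):
    """Fail fast if mandatory fields are missing or empty."""
    bad = [r for r in rows if any(not r.get(c) for c in required)]
    if bad:
        raise ValueError(f"{len(bad)} rows failed validation")
    return rows

def load(path, bucket, key):
    """Store the validated file in the Data Lake's raw layer."""
    boto3.client("s3").upload_file(path, bucket, key)

def run(path="legacy_extract.csv", bucket="datalake-raw", key="sales/legacy_extract.csv"):
    log.info("extracting %s", path)
    rows = extract(path)
    log.info("validating %d rows", len(rows))
    validate(rows)
    log.info("loading to s3://%s/%s", bucket, key)
    load(path, bucket, key)
    log.info("pipeline finished OK")

if __name__ == "__main__":
    run()
```

In a real project each step would be a task in the chosen orchestrator, which is what gives you centralized scheduling, alerting and retries on top of this logic.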

Check out how to create a simple and robust data pipeline between legacy applications and the Data Lake in this video.

Checklist to ensure at this stage:

Define Standards:

  • Establish the programming languages, code repositories, libraries, cloud services, and the orchestration, monitoring, management and auditing tools.
  • Write parameterized code: never leave static values inside the programs; use configuration files or tables (see the sketch after this list).
  • Use standardized naming for buckets and stored files.
  • Define the format in which data will be stored in the transformation and consumption layers.
  • If the project requires defining a data model, ideally it should not be oriented to a single requirement; we must think about scalability and a robust model that lays the foundations for future requirements, not just the project's immediate one.
  • Products must include validation points to ensure that, in addition to finishing correctly, they generate reconciled and consistent data.
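As an illustration of the parameterized-code and naming rules above, the sketch below keeps every environment-specific value in a configuration file and derives storage keys from a single convention. The config keys and values shown are hypothetical.

```python
import json

# config.json keeps everything environment-specific out of the code, e.g.:
# {
#   "bucket": "acme-datalake-raw-prod",
#   "output_format": "parquet",
#   "required_columns": ["id", "amount", "date"]
# }
with open("config.json") as f:
    config = json.load(f)

def build_key(domain, filename):
    """Apply one standardized naming convention for all stored files."""
    return f"{domain}/{config['output_format']}/{filename}"

# Changing environments (dev/prod) means changing the file, not the code.
print(config["bucket"], build_key("sales", "2024-09-16.parquet"))
```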

Generate monitoring and auditing processes:

  • Provide traceability of all executions (successful and failed).
  • Record all actions for data captures, transformations and outputs (a minimal sketch follows this list).
  • Provide sufficient information to minimize analysis time in the event of a failure.
  • Provide a centralized, easily accessible log repository that allows us to solve problems quickly.
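One minimal way to cover these points is to emit one structured log record per pipeline step, tagged with a run identifier, so every execution is traceable and the records can be shipped to a centralized repository (CloudWatch, an ELK stack or similar). The sketch below illustrates the idea; the step names and fields are assumptions.

```python
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("audit")

RUN_ID = str(uuid.uuid4())  # ties every record to one execution

def audit(step, status, **details):
    """Emit one JSON line per step: easy to ship to a central log store."""
    log.info(json.dumps({
        "run_id": RUN_ID,
        "step": step,
        "status": status,
        "ts": time.time(),
        **details,
    }))

audit("extract", "ok", rows=15000, source="legacy_db")
audit("transform", "ok", rows_out=14890, dropped=110)
audit("load", "failed", error="S3 timeout")  # failures are recorded too
```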

Ensure the quality of the products:

  • Delivery of the products must include evidence of the quality control carried out. We guarantee not only that a process runs, but that it runs well and with the expected result.
  • Generate evidence of the reconciliation of the data produced, recording what was reconciled and under what scenario and conditions it was generated (see the sketch after this list).
  • Reliable data, with automated validations and reconciliations and high monitoring coverage, prevents any error or mismatch from undermining the credibility of the digital product. Your best ally is delivering certified, error-free, consistent, guaranteed work.
  • Design products for operational continuity, with all the resources the operations team needs to receive and control them easily.
  • Automate products end to end, avoiding manual interventions that jeopardize operational continuity.
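As an example of the automated reconciliation the checklist calls for, the sketch below compares row counts and a control total between source and destination and returns the evidence of the check. The column name and tolerance are assumptions.

```python
def reconcile(source_rows, dest_rows, amount_col="amount"):
    """Compare row counts and a control sum; return evidence of the check."""
    evidence = {
        "source_count": len(source_rows),
        "dest_count": len(dest_rows),
        "source_total": sum(float(r[amount_col]) for r in source_rows),
        "dest_total": sum(float(r[amount_col]) for r in dest_rows),
    }
    evidence["balanced"] = (
        evidence["source_count"] == evidence["dest_count"]
        and abs(evidence["source_total"] - evidence["dest_total"]) < 0.01
    )
    return evidence

result = reconcile(
    [{"amount": "10.0"}, {"amount": "5.5"}],
    [{"amount": "10.0"}, {"amount": "5.5"}],
)
print(result)  # persist this evidence alongside the run's audit trail
assert result["balanced"], "data mismatch: block the delivery"
```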

All of the above allows us not only to certify the work, but also to ensure that every implementation follows the same guidelines and way of doing things, so we can optimize construction time and improve the quality, clarity and traceability of each part of the process.

Data processing and enrichment

With the data now in the Data Lake in a secure and organized form, we establish within the framework the procedures and transformations needed to turn data stored in raw form into information usable by customers.

Since data comes from multiple sources that may be unreliable, it is vital to have a process for analyzing its quality and usability. This process can start out manual, but it must end up automated with tools. The role of architects and data engineers is vital here: they help us evaluate whether more information is required to define a reliable dataset that supports the required analyses, identify the data missing for the correct creation of the datasets essential to the project, and eliminate incorrect, duplicate or incomplete data.

In this process, specialists apply data project (Big Data) best practices, organizing information into whatever categories and classifications the data allows. Each subset is analyzed independently, and transformations are applied to structure or enrich the data: adding new columns, calculating values derived from existing data, or incorporating information from external sources. The final data can be made available to users through a relational model in a data warehouse, a data-access view or a consumption file.
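As a small illustration of this kind of enrichment, the sketch below adds calculated columns and joins an external reference source using pandas. The datasets and column names are hypothetical.

```python
import pandas as pd

# Raw sales data as landed in the Data Lake (hypothetical columns).
sales = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": ["C1", "C2", "C1"],
    "amount": [120.0, 80.0, 200.0],
    "date": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-02-10"]),
})

# External reference source incorporated to enrich the dataset.
segments = pd.DataFrame({
    "customer_id": ["C1", "C2"],
    "segment": ["premium", "standard"],
})

# Calculated columns derived from existing data.
sales["month"] = sales["date"].dt.to_period("M").astype(str)
sales["amount_with_tax"] = sales["amount"] * 1.19  # assumed tax rate

# Enrich with the external source and publish to the consumption layer.
enriched = sales.merge(segments, on="customer_id", how="left")
print(enriched)
```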

Taking advantage of the “intelligence” the data provides is the role of the data scientist, who is responsible for analyzing simulated scenarios to make predictions: applying mathematical modeling, statistical analysis and machine learning techniques, and developing predictive analyses, clustering, regression and pattern recognition that bring in new data to enrich the project and deliver more value to customers.
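For instance, a data scientist might start with a simple clustering of customers to discover behavioral segments. The sketch below applies scikit-learn's KMeans to two hypothetical features; real work would add feature engineering, a principled choice of k, and validation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-customer features: monthly spend and purchase frequency.
X = np.array([
    [120.0, 2], [80.0, 1], [200.0, 4],
    [950.0, 12], [1020.0, 15], [60.0, 1],
])

# Scale features so neither dominates the distance metric.
X_scaled = StandardScaler().fit_transform(X)

# Cluster into two segments; k would be chosen with, e.g., the elbow method.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
print(model.labels_)  # segment assigned to each customer
```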

Information security and correct user access to the Data Lake must be established in the data governance, which identifies ownership, forms of access, security rules for sensitive data, data history, sources of origin and more.

Operation, consumption and operational continuity

The last aspect is how users will use the data. Regarding the consumption and exploitation of the data generated, the focus must be on the needs of the business user, with whom we co-design a solution proposal that meets the requirements and parameters defined in the preparation phase. This work includes reviewing the requirements gathered during the preparation of the project and selecting a platform suited to the visualization needs, such as Tableau, Power BI, AWS QuickSight or another. A storyboard should be created for the different categories of users, to prepare a personalized, efficient design that makes the presented data easy to understand.

Good data discovery and exploration lays the foundation for a self-service data visualization platform, where users obtain insights to improve management and decision-making intelligently, backed by reliable information. For example, see how a company 'listens' to what its customers say in forms, contact centers and social networks and uses it to improve customer service.

The quality and reliability of the data are of vital importance: if bad data reaches the platform, the visualizations will contain errors, and the dashboards and files generated will not provide the information the customer requires. This point deserves emphasis, since analysis accounts for around 70% of the total time required to create a good dashboard.

Not everything ends with the delivery of a dashboard or file to customers; you must also think about the future, that is, consider other important aspects:

  • Operational continuity of the digital products developed. Most of the time, operation and monitoring in the production environment will be the customer's responsibility, so the focus is on making those people's lives easier: minimizing the time spent monitoring, the time it takes to find the cause of a problem, and the time needed to fix a fault.
  • Scalability of the solution in all its components (infrastructure, architecture and the tools used during development), so it can grow as the business requires.
  • Easy troubleshooting of any possible problem: being able to quickly find information, form an initial diagnosis and draw up a correction plan.
  • Minimize the complexity of processes to simplify future adjustments or improvements.
  • All projects include full documentation that facilitates collective understanding.
  • Avoid using multiple tools that, in the long run, add no value to the business.
  • Create instances of communication with the customer that enable an effective and clear handover of everything done. The clearer the client is about the good work that was done, the greater their satisfaction; always consider the profile of the interlocutor (operations, business or other).

Conclusion 

Applying a data project methodology ensures success in the design, implementation and development of a data project. In this article, we showed you the Kranio methodology, applied and refined in dozens of data projects across several countries. If you apply this methodology, you are more likely to meet expectations and to avoid the mistakes that can ruin a project.

Another fundamental aspect that you must ensure is the participation of the business user. They are the ones who consume data to improve decisions. From the start, at every stage, robust, aligned and well-connected teams deliver better projects.

Do you want to review your methodology, or implement this methodology for successful projects?

Ready to take your data projects to the next level?

At Kranio, we apply a proven methodology that ensures success at every stage of your data initiatives. Our team of experts will guide you from planning to implementation, ensuring effective results aligned with your business objectives. Contact us and discover how we can contribute to the success of your company.

JP

September 16, 2024