Copyright © 2018 DataScience.US All Rights Reserved.
12 Best Practices for Big Data and Data Science
Here are 12 best practices for big data and data science, along with a few comments about why each is important. Think of them as recommendations that can guide your organization toward successful implementations of big data and data science.
- Get your data in order.
The right data management strategy is important to big data and data science success. In the zeal to get started analyzing data, organizations often don’t pay attention to that data. Yes, it is OK to experiment on raw data; a good data scientist usually explores the data before building models—particularly models that are put into production. However, we’ve seen that access to data is a challenge for those embarking on big data and data science, which might be due to politics or data integration issues. It is important to make sure the data is in order. That will ultimately include collaboration between different parts of the business as well as governance.
- Plan on a phased approach.
As mentioned above, many respondents cited the value of a proof of concept. Design that use case so that, when it succeeds, it delivers substantial value. Success begets success. Don’t try to boil the ocean; there is too much data there. Plan and execute in phases.
- Get some training.
This can’t be stressed enough. As cited above, a big challenge to big data and data science is knowledge. This is true for those looking to deploy new data management platforms as well as those planning to analyze big data. Even if tools are supposed to be easy to use, typically they are not. It is important to understand how technology works and how advanced algorithms operate. If you’re using a machine learning algorithm, understand it first. Before deploying NLP, make sure you know how it works, as well as its strengths and weaknesses.
- Move past the data warehouse.
The data warehouse is not going anywhere anytime soon. Nevertheless, big data may necessitate moving beyond the data warehouse to platforms that can support multi-structured data and iterative analytics. That said, don’t be seduced by every new big data platform. Before adopting one, be sure it can satisfy real-world requirements with the right performance and in a cost-effective manner.
- Use disparate data types.
Although structured data is still the mainstay of modelers and analysts, disparate data types can enrich a data set and provide lift to models. Think about incorporating new kinds of data, such as text data and geospatial data, into the mix. Depending on your use case and business needs, streaming data can also be quite valuable for situational awareness and improving operational efficiencies. Of course, you’ll need the right tools for the right jobs.
- Use multiple analytics methods.
Organizations are starting to move beyond basic reporting and dashboards, and that is a good thing. Analytics, such as predictive analytics, can provide real value. However, many organizations get hung up on predictive analytics as the goal for analytics. There are other kinds of analytics that can be used in conjunction with (or separately from) predictive analytics. These include text mining, geospatial analytics, and graph analytics. All can provide value, and the most successful organizations make use of multiple kinds of analysis.
- Consider a center of excellence.
As described earlier, a CoE can be a great way to make sure that the infrastructure and analytics you implement are coherent. CoEs can help your organization disseminate information, provide training, or maintain governance.
- Consider open source technologies.
Open source technologies can provide a cost-effective way to gain access to a large community of innovators. These technologies can be worth exploring, although they require a certain skill set.
- Consider the cloud as part of the data and analytics ecosystem.
Some organizations will not move their data or analytics to the cloud (especially the public cloud) because of security concerns, yet many cloud providers (especially the large ones) have better security than that found on a company’s premises. Organizations that have moved to the cloud often reap the benefits of scalability, flexibility, and agility—especially for big data. It is worth exploring and asking questions of cloud providers about this option.
- Address cultural issues.
Change can be hard. Education is critical here, as is changing the mindset. Some people don’t get it. Some have legitimate concerns. Some are concerned about their jobs. Those driving change will need executive support, ideally from someone who acts as the champion, along with help communicating their message.
- Plan for new architectures.
Big data will necessitate new platforms and new architectures. These evolving ecosystems might include the data warehouse, Hadoop, and other platforms, both on premises and in a public cloud. Data scientists, citizen data scientists, and others will need to access data from different sources. The architecture can become complex, but can be manageable if a plan is put in place. This means reworking architecture plans to determine how platforms will integrate and operate together.
- Take action on big data analytics.
What good is all of the insight gleaned from big data unless you take action on it? Insight from a PowerPoint presentation or a visualization tool is great, but making the analytics developed through a big data effort part of a business process is where real value will occur. Think about how you might operationalize or embed analytics into a process to help drive or automate action on analytics.
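As a rough illustration of what embedding analytics into a process could look like, the Python sketch below turns a model score into a concrete action that a downstream system can act on. Everything in it is hypothetical and not taken from the original report: the `Customer` record, the `churn_score` field, the 0.7 threshold, and the action names are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    churn_score: float  # hypothetical model output, assumed to lie in [0, 1]


def next_action(customer: Customer, threshold: float = 0.7) -> str:
    """Map a model score to a business action (threshold is an assumption)."""
    if not 0.0 <= customer.churn_score <= 1.0:
        raise ValueError("churn_score must be between 0 and 1")
    if customer.churn_score >= threshold:
        return "offer_retention_discount"
    return "no_action"


def route_customers(customers, threshold: float = 0.7) -> dict:
    """Return {customer_id: action} for a downstream system to act on."""
    return {c.customer_id: next_action(c, threshold) for c in customers}
```

In a setup like this, the output of `route_customers` would feed a CRM or campaign tool directly, so the insight drives an action instead of stopping at a slide or a dashboard.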
TDWI Research provides research and advice for data professionals worldwide. TDWI Research focuses exclusively on business intelligence, data warehousing, and analytics issues and teams up with industry thought leaders and practitioners to deliver both broad and deep understanding of the business and technical challenges surrounding the deployment and use of business intelligence, data warehousing, and analytics solutions. TDWI Research offers in-depth research reports, commentary, inquiry services, and topical conferences as well as strategic planning services to user and vendor organizations.