By Jeff Burk
The CIO’s traditional role of overseeing an enterprise’s technology strategy and procurement is no longer straightforward. As the business world moves to a more decentralized technology model and shadow IT runs rampant, CIOs are no longer completely in control of technology within the business, the data it creates, or the security processes around it.
Gartner predicts that 50 percent of the data generated will soon be outside the corporate data center, generated by IoT applications and edge computing devices. As a result, CIOs must modernize their data strategy, ensuring it can seamlessly and reliably feed data insights from across the whole business into a technology strategy that will deliver digital transformation.
However, with new data privacy regulations and issues around data ownership and algorithmic bias, using data to drive insight is no simple task. Data must be of good quality, controllable, and consistent so that it can be relied upon to feed these technologies appropriately, generate business value and inform decisions – that’s where data governance comes in.
Creating Data Governance Strategies
Data is a corporate asset. In order to use it for faster decision making, users must trust it. Data governance breaks down data silos from disparate systems across the enterprise and establishes a set of processes, standards and policies to make the data consumable enterprise-wide. An automated data governance platform quickly and securely delivers trusted data to the business users who need it to perform their jobs.
Collaboration is key to successful data use across an organization. All business users need to know where to find the right data and have a common understanding of what the data means. The amount, variety and scope of business data available is growing exponentially, making it increasingly difficult to find, understand and trust, despite this being essential to deriving value from it.
Having systems in place — like a governed data catalog — enables users to understand data in business terms, while also establishing relationships and associations between data sets. To do this right, the CIO needs to collaborate with the chief data officer (CDO) to manage a data strategy underpinned by governance that delivers the right data into the right hands to drive business transformation.
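To make this concrete, a governed catalog can be pictured as a mapping from technical data sets to business terms, definitions, owners and relationships. The sketch below is purely illustrative; the data set names and fields are invented, not taken from any particular platform:

```python
# Minimal sketch of a governed data catalog: each entry maps a technical
# data set to a business definition, an owner, and related data sets.
# All names here are hypothetical, for illustration only.

catalog = {}

def register(dataset, business_term, definition, owner, related=()):
    """Add a data set to the catalog with its business context."""
    catalog[dataset] = {
        "business_term": business_term,
        "definition": definition,
        "owner": owner,
        "related": list(related),
    }

def lookup(term):
    """Find data sets by business term, so users can search in business language."""
    return [ds for ds, meta in catalog.items()
            if term.lower() in meta["business_term"].lower()]

register("crm.accounts_v2", "Customer",
         "A party with at least one signed contract", "sales-ops")
register("erp.orders", "Customer Order",
         "A confirmed purchase by a customer", "finance",
         related=["crm.accounts_v2"])

print(lookup("customer"))  # both data sets match the business term
```

The point of the sketch is the lookup by business term rather than technical name: users who know only the word "customer" still find both data sets and their relationship.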
As Gartner notes, CIOs and CDOs must support each other in delivering business value, and neither can succeed alone. For data governance strategies to succeed, every business leader must have data in their DNA. The CDO is tasked with building the organizational practices and behaviors needed to orchestrate this foundational change and become a data-driven organization.
To ensure the right data is in the right hands at the right time, organizations must work relentlessly to identify and reduce islands of bad data. That means moving rapidly to adopt agile data processes that let the business move quicker and respond to internal and external data needs. Most importantly, organizations need to ensure teams are aligned to a common vision with a pragmatic path to the desired outcomes.
Data is making its way into nearly every aspect of the business and is the foundation of digital transformation. Governance efforts should be tightly linked to digital efforts. Looking deeper at technologies like cloud, IoT, AI and machine learning, organizations must put data governance strategies in place before they can embrace the full potential of these technologies. Data is often the biggest problem when implementing new technologies, and more and more, consumers are being affected by data-driven decisions. Data must be correct and unbiased for technology to have the right impact. Flawed data only slows digital transformation.
For example, at Forrester’s inaugural Data Strategy and Insights Conference, we learned that AI spending is stagnant due to a lack of trust in the data feeding the algorithms. Data governance is a solution to this problem, providing data visibility and data quality – without sound governance, there can be no trust in AI.
Digital Transformation is More Than a Buzzword
Creating a strategy around data is pivotal to a successful digital transformation. Buzzwords aside — organizations need to understand what they are trying to achieve from digital transformation and how it makes a difference in their industry and to their competitive advantage. They must then work this back to the technologies that enable transformation and the data that fuels the technologies, and implement data governance strategies that match.
Data governance needs to emerge as a separate entity within the business that helps organizations better maintain their data inventory, facilitate the use of data, improve data quality and exercise control over processes and methods employed by all data users across the business.
About the author: Jeff Burk is the senior vice president of engineering at Collibra, where he leads the global engineering group. Prior to Collibra, Jeff was VP, Engineering & Operations for Dell Boomi, where he oversaw all product development, including design, development, and site operations for Boomi’s cloud-native integration platform as a service (iPaaS). Jeff has also held senior technology roles at EMS, NextDocs, and The Neat Company. He holds a Bachelor’s degree in Information Technology from the University of Massachusetts, Lowell.
The post Driving Business Transformation through Data Governance Strategies appeared first on Datanami.
Read more here: www.datanami.com/feed/
Remember the days before the relational database? Neither do I, so I have no idea how painful it was to keep track of data back then. But even after relational databases became more mainstream, data was still mostly about keeping track of stuff: product inventory, customer information, customer orders, etc. Then, one day, everyone wanted to use their databases to make better decisions. Before we knew it, we had data warehouses and business intelligence (BI) tools. Soon after, big data appeared, and smart people realized the relational database and data warehouse weren’t very good for that. Search engines, Apache Hadoop, NoSQL databases, Apache Kafka, et al., got more attention.
The jump from data warehouses to “big data platforms,” especially Apache Hadoop, wasn’t nearly as smooth as we all hoped. Analytical queries were slow, sometimes tolerably (as with MapReduce), but often not. People tried to bolt data warehouse BI tools onto their Hadoop deployments, and that approach didn’t work, either. People blamed Hadoop and data lakes.
No Such Thing as Big Data BI?
Then, the conversation shifted to “big data BI.” Some pundits used to say that there was no such thing as “big data.” For them, the concept of “big data BI” really had no chance.
But people are coming around to the idea of big data BI, especially when it comes to big data platforms like Hadoop and big data architectures like data lakes. These approaches let organizations load data directly from the source and analyze it without extensive up-front modeling and transformation. The ability to glean insights from unstructured data, something difficult and impractical with traditional data warehouses, was a game-changer, and data experts recognized it.
Traditional BI tools (those you currently use with your data warehouse) supposedly support Hadoop, but they still require data to be extracted and transformed from Hadoop to a dedicated BI server. That doesn’t sound like “big data BI.” On the other hand, the Forrester Research Native Hadoop BI Platforms Wave report was one of the first documented assertions that big data BI was a real market. The report was written in 2016, and the market has grown since then, but at the same time, Hadoop itself has gotten a lot of criticism. Some began to feel that maybe Hadoop isn’t right for BI-style analytics, and that the “native Hadoop BI platform” category was going to be subsumed by the broader, traditional BI market.
It turns out that, more than two years later, traditional BI platforms still can’t handle big data efficiently. Industry experts are calling this out, subtly for now, but I believe this story is going to pick up more steam in the coming months.
For example, Dresner Advisory Services recently published a research report on big data analytics, recognizing the use of BI tools specifically for a big data environment. Boris Evelson of Forrester Research discusses how new BI architectures are required for agile insights in a recent report on BI migration. In its recent 15-Step Methodology for Shortlisting BI Vendors, Forrester refers to this new architecture as “in data lake cluster BI platforms,” which it defines as a repository “where all data and applications reside inside data lake clusters, such as Amazon Web Services, Hadoop, or Microsoft Azure.” (Forrester has since updated the term to “in-data-lake BI” in a subsequent report on systems of insight).
That means BI professionals must adapt to the more advanced environments that data lakes present. We believe “in-data-lake BI” is the next generation of BI. This generation of modern BI tools has four key characteristics:
A Scale-Out Distributed Architecture
In a scale-out architecture, organizations can add servers and processors to their existing cluster in a linear fashion, which in theory provides nearly unlimited scale. Unlimited scale, and the way in which it’s achieved, offers flexibility and agile provisioning at low cost. The ability to scale out stands in stark contrast to legacy architectures, which rely on dedicated BI servers and data warehouses that require scale-up growth or massively parallel hardware. Both techniques are far more expensive and limiting than a scale-out model.
In practice, we might consider the example of a large information services company that provides marketing analytics to major global corporations. The sheer volume of data this company provides is too much for traditional BI technologies to handle, despite using modern platforms such as a Hadoop data lake. The problem here is that, while the Hadoop-based data lake offers reliable storage and processing, the interaction with traditional BI tools presents a bottleneck for delivery of analytics to end users.
This is not so much the fault of Hadoop-style data lakes as it is of traditional BI tools and processes, which do not scale to match the organization’s growth. Adding more data—to service additional customers—to the analytics process became expensive and time-consuming in a scale-up architecture and imposed too many performance restrictions on the system. For a company like this, an in-data-lake BI approach represents a huge win in cost, time, and effort, while providing the performance and user concurrency customers demand.
BI Processing Runs Natively In the Data Lake
Non-native BI tools require extract databases, which come with numerous downsides: redundancy and inconsistency with source data, data-movement effort, extra systems to manage, and processing and storage overhead. Extract tables and multidimensional cubes take a long time to create, increasing the risk that the data will be stale by the time it’s ready to use. Finally, some regulated industries must restrict duplication of production data, which makes non-native BI tools even more inflexible. Native processing, by comparison, takes advantage of the servers in a data lake cluster, in a model popularized by Hadoop, and does not require data movement.
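The contrast can be sketched in a few lines: instead of copying rows out to a BI server, the aggregation runs against each partition where the data already sits, and only small partial results are combined. The toy example below stands in for a cluster using plain Python lists, an assumption for illustration rather than how any specific platform implements it:

```python
# Sketch of native, in-place processing: push the computation to each
# partition where the data already lives, then combine partial results,
# instead of extracting all rows into a separate BI server first.
# The "partitions" here are plain lists standing in for data-lake blocks.

partitions = [
    [{"region": "east", "sales": 100}, {"region": "west", "sales": 50}],
    [{"region": "east", "sales": 25}],
    [{"region": "west", "sales": 75}],
]

def partial_sum(partition):
    """Runs on the node holding the partition; no bulk data movement."""
    totals = {}
    for row in partition:
        totals[row["region"]] = totals.get(row["region"], 0) + row["sales"]
    return totals

def combine(partials):
    """Only the small partial aggregates travel over the network."""
    merged = {}
    for p in partials:
        for region, total in p.items():
            merged[region] = merged.get(region, 0) + total
    return merged

result = combine(map(partial_sum, partitions))
print(result)  # {'east': 125, 'west': 125}
```

Note that the raw rows never leave their partitions; only per-partition totals are shipped and merged, which is the essence of the "no data movement" argument.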
We can watch this play out in an example with a company that collects telemetry data from customer-deployed storage arrays. The information the company collects might help identify issues—related to usage, warning conditions, and failure—that, if addressed, would help the company better serve its customers. Because the different customer arrays generate so much data, a data lake environment offers the only scalable analytics environment. However, all that data might be for nothing if the company isn’t able to analyze it quickly and with minimal overhead.
An in-data-lake BI platform is ideal for this use case, since the company can analyze its customer data as soon as it lands, without the additional overhead of moving the data externally to a data mart or other dedicated BI platform. As a result, the company can immediately identify when customers are ready for additional storage, when components are failing and need replacement, or what factors contribute to lower reliability.
Many industries such as financial services use data analytics in a variety of line-of-business operations: customer retention and acquisition, and fraud detection, to name a few. These firms are leveraging machine learning algorithms to analyze enormous volumes of data in order to quickly scan transactional records to make cost-reducing decisions. For all of this, a data lake architecture makes sense. But for it to work, the analytics technologies must be deeply integrated into the architecture, rather than “bolted-on” to the existing architecture. In-data-lake BI provides that deep integration and can allow firms to move quickly and react immediately in a dynamic market.
Support for Multiple Data Sources
While research suggests the majority of companies collect data from fewer than five external sources, a number of organizations still leverage five or more external data-generating resources. As the number of IoT devices continues to rise, and organizations learn to implement machine learning algorithms and other artificial-intelligence enabling tools, the number and variety of external data sources should continue to proliferate.
Sources today include Hadoop HDFS, cloud object storage like Amazon S3 and Microsoft ADLS, and distributed streaming platforms like Apache Kafka. It is absolutely crucial that today’s in-data-lake BI platform integrates with these examples, as well as other modern data platforms.
Flexible Deployment Options
In-data-lake BI platforms need to work across whatever combination of platforms customers choose, providing insights to end users while also simplifying IT work. To achieve this cross-platform functionality, organizations should look to on-premises, cloud, hybrid cloud, and multi-cloud as equally viable ways to run BI analytics systems.
The BI platform should be able to run on almost any reasonably sized computer, whether physical or virtual, as scale (and performance to some degree) is achieved through the addition of more nodes to the cluster. An important aspect of the deployment options is the ability to support object storage to enable environments where data is decoupled from the compute engines. Object storage is used by organizations today regardless of where the computing layer resides, even on-premises.
Organizations are still figuring out how to get the most out of in-data-lake BI, and so the architecture will evolve with customer demands. One thing is clear, though: BI tools must evolve with customers’ data landscapes. The winning companies in today’s business environment will be the ones that carve the shortest path to decisions. To achieve the competitive edge promised by data lakes, organizations should look for modern BI tools that align with the above four characteristics of in-data-lake BI.
About the author: Shant Hovsepian is a co-founder and CTO of Arcadia Data, where he is responsible for the company’s long-term innovation and technical direction. Previously, Shant was an early member of the engineering team at Teradata, which he joined through the acquisition of Aster Data. Shant interned at Google, where he worked on optimizing the AdWords database, and was a graduate student in computer science at UCLA. He is the co-author of publications in the areas of modular database design and high-performance storage systems.
The post In-Data-Lake BI Is the Next Frontier for Business Intelligence appeared first on Datanami.
In this monthly feature, we’ll keep you up-to-date on the latest career developments for individuals in the big data community. Whether it’s a promotion, new company hire, or even an accolade, we’ve got the details. Check in each month for an updated list and you may even come across someone you know, or better yet, yourself!
BackOffice Associates has announced that its board of directors has appointed current president of global and consulting services Kevin Campbell as chief executive officer. Prior to joining BackOffice Associates in April 2018, Campbell was COO of Oscar Insurance Corporation.
“I am often asked why I joined BackOffice Associates. My reasons are simple – the attractive data management market space, our Global 2000 customer list and our people including our outstanding leadership team and its Board of Directors,” said Campbell. “As the newly appointed CEO, I am committed to our customers, partners, and employees. Our purpose is to solve the world’s most complex data challenges, one byte at a time.”
Infoworks.io has announced that it has named former AT&T chairman and CEO David Dorman as chairman of the board. Dorman is currently chairman of the board at CVS Health Corporation and a member of the boards at Dell Technologies and PayPal Holdings, Inc.
“I am excited to work with this remarkable team on a mission to empower enterprises to organize, manage and utilize data as they strive to compete in the rapidly evolving digital economy,” said David Dorman, Chairman of Infoworks. “Digital transformation is an existential imperative for enterprises and analytics agility is a prerequisite. I have seen first-hand how Infoworks’ automation has enabled large enterprises to achieve analytics agility without hiring armies of big data specialists. With the right product at exactly the right time, Infoworks has the potential to quickly become an important industry leader in this large and growing market.”
Commvault has announced the appointment of Sanjay Mirchandani as president, CEO and member of the board. Previously, Mirchandani was the CEO of Puppet. “I’m honored to join the Commvault team with its respected reputation, industry-leading technology and services, and infectious company culture,” said Mirchandani. “Commvault’s partner-driven approach is closely aligned with my own. I look forward to being an advocate for our customer, channel and partner ecosystems to deliver complete solutions.”
“Sanjay’s accomplishments at Puppet demonstrate a deep understanding of multi-cloud and cloud native applications,” said Commvault’s incoming Chairman, Nick Adamo. “We are confident he is the ideal person to build on Commvault’s current momentum and champion the rich heritage of combining innovation with unwavering focus on customer and partner success.”
InfluxData has announced that Will Paulus is its new vice president of sales. Previously, Paulus served as head of U.S. sales at Algolia. Prior to Algolia, he led U.S. and international sales initiatives for Google’s G Suite and Mixpanel.
“We experienced tremendous growth in 2018 and are well-positioned to continue this momentum through 2019,” said Evan Kaplan, CEO of InfluxData. “Appointing these accomplished executives in such pivotal roles at the company will support our overall push to innovate, expand and increase profitability.”
InfluxData has also announced that Jim Walsh is its new senior vice president of engineering. In his new role, Walsh will focus on scaling the InfluxDB technology and introducing new features to expand DevOps and IoT functionalities.
Previously, Walsh was senior vice president of infrastructure engineering at Salesforce. Prior to Salesforce, he served in leadership roles at Microsoft and was a founding member and VP of engineering at Versive.
To read last month’s edition of Career Notes, click here.
Do you know someone that should be included in next month’s list? If so, send us an email at firstname.lastname@example.org. We look forward to hearing from you.
LOS ANGELES – 15 February 2019 – The Internet Corporation for Assigned Names and Numbers (ICANN) today announced that it is aware of several recent public reports regarding malicious activity targeting the Domain Name System (DNS). We have no indication that any ICANN organization systems have been compromised, and we are working with relevant community members to investigate reports of attacks against top-level domains (TLDs). For some reporting on this issue, please refer to these sources:
- United States Department of Homeland Security (DHS) Cybersecurity and Infrastructure Security Agency (CISA) Emergency Directive 19-01: “Mitigate DNS Infrastructure Tampering”, 22 January 2019.
- “Why CISA Issued our first Emergency Directive”, United States DHS CISA blog, 24 January 2019.
- “Global DNS Hijacking Campaign: DNS Record Manipulation at Scale”, FireEye, 9 January 2019.
- “Widespread DNS Hijacking Activity Targets Multiple Sectors”, Crowdstrike blog, 25 January 2019.
- “Statement on man-in-the-middle attack against Netnod”, Netnod statement, 5 February 2019.
- “Revisiting How Registrants Can Reduce the Threat of Domain Hijacking”, Verisign blog, 11 February 2019.
- “.nl not affected by global domain hijacking campaign”, Stichting Internet Domeinregistratie Nederland blog, 15 February 2019.
ICANN believes it is essential that members of the domain name industry, registries, registrars, resellers, and related others, take immediate proactive and precautionary measures, including implementing security best practices, to protect their systems, their customers’ systems and information reachable via the DNS.
We trust that DNS industry actors are already taking strong security precautions in their businesses. However, here is a checklist to consider:
- Ensure all system security patches have been reviewed and have been applied;
- Review log files for unauthorized access to systems, especially administrator access;
- Review internal controls over administrator (“root”) access;
- Verify integrity of every DNS record, and the change history of those records;
- Enforce sufficient password complexity, especially length of password;
- Ensure that passwords are not shared with other users;
- Ensure that passwords are never stored or transmitted in clear text;
- Enforce regular and periodic password changes;
- Enforce a password lockout policy;
- Ensure that DNS zone records are DNSSEC signed and your DNS resolvers are performing DNSSEC validation;
- Ideally ensure multi-factor authentication is enabled to all systems, especially for administrator access; and
- Ideally ensure your email domain has a DMARC policy with SPF and/or DKIM and that you enforce such policies provided by other domains on your email system.
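As an illustration of the record-integrity item above, one simple approach is to keep an out-of-band snapshot of known-good record hashes and compare the live zone against it. The sketch below uses invented records and is a complement to, not a substitute for, DNSSEC:

```python
import hashlib

# Sketch: detect tampered DNS records by comparing a zone's current
# contents against a known-good snapshot of record hashes stored out
# of band. The records below are illustrative, not a real zone.

def fingerprint(name, rtype, value):
    """Stable hash of a single DNS record."""
    return hashlib.sha256(f"{name}|{rtype}|{value}".encode()).hexdigest()

def snapshot(records):
    """Map each (name, type) to its fingerprint; store this copy out of band."""
    return {(n, t): fingerprint(n, t, v) for n, t, v in records}

def changed_records(baseline, current_records):
    """Return records whose value differs from the trusted snapshot."""
    return [(n, t) for n, t, v in current_records
            if baseline.get((n, t)) != fingerprint(n, t, v)]

known_good = snapshot([
    ("mail.example.com", "A", "192.0.2.10"),
    ("example.com", "MX", "mail.example.com"),
])

# Simulate an attacker repointing the mail server's A record:
live = [
    ("mail.example.com", "A", "203.0.113.66"),
    ("example.com", "MX", "mail.example.com"),
]

print(changed_records(known_good, live))  # [('mail.example.com', 'A')]
```

Any record whose value no longer matches the trusted snapshot is flagged for review, which also supports the checklist's call to audit the change history of records.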
The Security and Stability Advisory Committee (SSAC) has previously published advice and information on security best practices relevant to this threat.
ICANN strives to be a trusted partner in the multistakeholder community, and engage in collaborative efforts to ensure the security, stability and resiliency of the Internet’s global identifier systems. For more information on ICANN’s role in the security, stability and resiliency of the Internet’s identifier systems, visit https://www.icann.org/octo-ssr.
The ICANN community will continue the discussion on this critical topic at its upcoming ICANN64 meeting in Kobe. In addition, ICANN org is available to provide consultation on security best practices by emailing email@example.com.
ICANN’s mission is to help ensure a stable, secure, and unified global Internet. To reach another person on the Internet, you need to type an address – a name or a number – into your computer or other device. That address must be unique so computers know where to find each other. ICANN helps coordinate and support these unique identifiers across the world. ICANN was formed in 1998 as a not-for-profit public-benefit corporation with a community of participants from all over the world.
Read more here: www.icann.org/news.rss
The SWOT Guide To Blockchain Part 6
The SWOT guide to blockchain is a guide in six parts that considers both the opportunities and challenges of blockchain. Blockchain has the potential to be groundbreaking, offering opportunities and better solutions for a range of situations and industries worldwide. In this sixth part, we analyze what is, in our opinion, the most important weakness of blockchain technology: its consumption of energy. We also show how this weakness can be turned into a strength if the blockchain community directs its innovation toward sustainable and ecologically sound blockchain solutions.
By Maria Fonseca and Paula Newton
Bitcoin And The Environment
Increasingly, concerns about blockchain’s impact on the environment are rising to the fore. In 2018 it was reported that bitcoin emits as much carbon dioxide in a year as one million flights crossing the Atlantic. This cannot be ignored. The electricity used to drive bitcoin is tremendous. Further statistics have accentuated the point: it was reported, for example, that in one month alone the bitcoin network used more electricity than the whole of the Republic of Ireland. Since that time (November 2017), bitcoin’s electricity use has only grown. For anyone with even a passing interest in the environment, this is a concern.
The whole bitcoin system is built around the use of electricity. Mining, which consumes electricity, must occur for the system to operate. The faster and more effectively miners can mine – using more electricity through more powerful machines – the higher the chances that a miner will win the biggest reward. Everyone is motivated to use more electricity to gain the highest rewards, and the bigger the system gets, the more electricity is burned to support it. Estimates show that if the price of bitcoin rose to $50,000, electricity consumption would increase tenfold – which is clearly tremendous. Some believe this is not a major concern, since bitcoin may never achieve such a value (though this is arguable) and since mining technology is likely to become more energy efficient; indeed, mining computers have already grown more efficient over time. Yet it is impossible to rule out continual increases in demand for energy, making blockchain an environmental concern for many.
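The mechanics behind this incentive are visible in proof-of-work itself: mining means brute-forcing nonces until a hash meets a difficulty target, so more hashing hardware, and therefore more electricity, directly raises a miner’s odds of winning. A toy version with a deliberately easy target:

```python
import hashlib

# Toy proof-of-work: search for a nonce whose SHA-256 hash of
# (block_data + nonce) starts with a given number of zero hex digits.
# Real bitcoin difficulty is vastly higher, which is exactly why the
# network's hash rate, and hence its electricity draw, is so large.

def mine(block_data, difficulty):
    """Return (nonce, hash, attempts) for the first hash meeting the target."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest, nonce + 1
        nonce += 1

nonce, digest, attempts = mine("example block", 4)
print(attempts)  # expected work scales like 16**difficulty hashes
```

Each extra digit of difficulty multiplies the expected number of hashes by sixteen, and every hash costs energy; that exponential relationship is the root of the consumption figures quoted above.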
But it isn’t all bad news as far as the environment is concerned. Some industry analysts suggest that green cryptocurrencies could have a part to play. While buying environmentally friendly items has to date generally been equated with paying more for them, changes are afoot with the rise of green cryptocurrencies. Such cryptocurrencies are expected to reward people for making greener purchases, as well as to drive innovation. Green cryptocurrencies work by using blockchain to track and monitor the environmental performance of businesses and individuals. This record can be saved and embedded into the system, and those who perform well can be rewarded. Consumers, in turn, can have increased confidence that green really means green.
2019: And Still Waiting for Green Cryptocurrencies
It was predicted that in 2018 green cryptocurrencies would start to have their day, based on environmental data built into blockchain, and energy companies were in some cases piloting peer-to-peer energy transactions and trading platforms. However, some energy savings have been found to produce a so-called “rebound effect”, where the benefits gleaned are offset by environmentally unfriendly behaviour funded by the money that would otherwise have been spent on energy. To counteract this, green cryptocurrencies could be built so that the benefits gained could only be spent on green products and services.
While these innovations are welcome and likely to deliver real environmental benefits, they do not change the fact that cryptocurrency mining, as the system is structured at the moment, is damaging to the environment. It is also unclear to what extent green cryptocurrencies would themselves consume enough electricity to offset the benefits they bring. Overall, further innovation is required to improve mining technology and reduce the gigantic amount of electricity these processes consume.
In What Ways Could Blockchain Support the Environment?
Blockchain is more than cryptocurrencies, though. It is also a way to develop software that is safer and theoretically distributed, as we have seen in other sections of this guide. How, then, could blockchain tech support the environment? FutureThinkers have compiled the following practical examples, which give readers a better picture of the ways this technology could be used:
- Blockchain can be used to track environmental compliance and the impact of treaties, decreasing fraud and manipulation.
- Donations to charities can be tracked to ensure that they are being attributed efficiently and as planned.
- Products can be tracked from origin to point of sale. This can help reduce carbon footprints, increase ethical accountability and reduce unsustainable practices.
- Schemes such as recycling can be incentivised by offering token rewards to participants.
- Peer to peer localised energy distribution is possible, rather than the current system of a centralised hub.
- Blockchain can also be used to track the carbon footprint of products, which can then determine the amount of carbon tax to be charged.
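To picture how the incentive schemes above might counteract the rebound effect discussed earlier, consider a minimal token ledger in which verified green actions earn tokens that can only be redeemed against approved green products. The reward values and product names below are invented for illustration:

```python
# Sketch of a green-token ledger: verified eco-friendly actions earn
# tokens, and tokens can only be redeemed against approved green
# products, countering the "rebound effect" described above.
# Reward rates and the product list are made up for illustration.

REWARDS = {"recycling_kg": 2, "solar_kwh": 1}     # tokens per unit of action
GREEN_PRODUCTS = {"bus_pass": 30, "led_bulbs": 10}  # token prices

balances = {}

def credit(user, action, quantity):
    """Credit tokens for a verified green action."""
    balances[user] = balances.get(user, 0) + REWARDS[action] * quantity

def redeem(user, product):
    """Spend tokens, but only on approved green products."""
    price = GREEN_PRODUCTS[product]  # raises KeyError for non-green items
    if balances.get(user, 0) < price:
        raise ValueError("insufficient tokens")
    balances[user] -= price
    return product

credit("alice", "recycling_kg", 10)  # 10 kg recycled -> 20 tokens
credit("alice", "solar_kwh", 15)     # 15 kWh generated -> 35 tokens total
print(redeem("alice", "bus_pass"))   # spends 30 tokens, leaving 5
```

In a real deployment the credits and redemptions would be blockchain transactions, so the environmental record would be tamper-evident; the constraint that tokens buy only green goods is what closes the rebound loophole.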
Over this comprehensive SWOT analysis of blockchain, we have described the technology’s challenges, strengths, opportunities, and weaknesses. One thing is certain: blockchain tech is here to stay. We live in a world that is increasingly digitized and interconnected, and with the rise of the Internet of Things, blockchain tech – with its focus on trust and accountability and its distributed character – might be the perfect technology to structure the global digital networks of the future.
One cannot forget, though, that there are also various problems and weaknesses that need to be tackled. Nor can one take it at face value that this technology is better and fairer just because it is disruptive.
Solutions will certainly appear along the way as more people experiment with putting its theoretical concepts into practice. Our hope is that blockchain will have positive implications for the implementation of a fairer and more sustainable circular economy, one that better tackles environmental issues, inequality and other problems of our current broken system. For that to happen, all of us must bring a high dose of attention to detail and critical reasoning to examining this technology. Only by actively speaking out when it strays from its initial promises, and acting quickly to correct and prevent problems, can we build a system, eventually built on blockchain, that provides us with a more connected and beautiful world.
Read more here: www.intelligenthq.com/feed/