Data Aggregators Archives

March 1, 2019 by Miriam Carey

One of the top complaints data scientists have is the amount of time it takes to clean and label text data to prepare it for machine learning. In fact, it is the complaint. If you’re in the data cleaning business at all, you’ve seen the statistics – preparing and cleaning data can eat up almost 80 percent of a data scientist’s time, according to a recent CrowdFlower survey.[1]

This means less data is being used. One estimate published by PWC maintains that businesses use only 0.5 percent of data that’s available to them.[2]

Consider, also, the issues caused by data that’s labeled incorrectly. Poor data quality can proliferate, leading to a greater error rate, higher storage fees, and additional cleaning costs.

And all the while, the demand for data-driven decision-making increases.

What makes for good data?

Data scientists work with a wide range of text data including social media posts, product reviews, call center voice-to-text data, academic libraries, product descriptions…it’s an endless stream of text data that can produce insight and value if analyzed properly.

Normalizing this data presents the first real hurdle for data scientists. Just getting the data into a format where it can be looked at for labeling is a cumbersome task.

Once the data is normalized, there are a few approaches and options for labeling it. Depending on the size of the dataset, it could be labeled “by hand” or by matching data to a taxonomy. If data scientists are working with a specific set of data in a specific subject area, there may be a taxonomy designed for that system. Mapping to an auto parts taxonomy is a fantastic way to organize data about auto parts – but a horrible way to map customer reviews about an auto parts store.

Label Text Data with a General Taxonomy

More than ten years ago, our company launched a meta search engine called Info.com. Serving up relevant results – and ads – required a deep and thorough understanding of search terms. So, we set out to map the most-searched-for words on the internet. The result was a huge taxonomy (it took more than 1 million hours of labor to build). Once that was complete, we realized that our nifty tool had value to a lot of other people, so we launched eContext, an API that can take text data from any source and map it – in real time – to a taxonomy that is curated by humans. A general taxonomy, eContext has 500,000 nodes on topics that range from children’s toys to arthritis treatments.

eContext also sets itself apart as being a very deep taxonomy. The IABC provides an industry-standard taxonomic structure for retail, which contains 3 tiers of structure. The eContext taxonomy, which incidentally covers thousands and thousands of retail topics, offers up to 25 tiers.

For data scientists, this level of depth and such a wide range of topics in a general taxonomy means, simply, better and more accurate text labeling. And the fact that the API can take raw text data from anywhere and map it in real time opens a new door for data scientists – they can take back a big chunk of the time they used to spend normalizing and focus on refining labels and doing the work they love – analyzing data.

Give us a Try

We’re as excited as everyone else about the potential for machine learning, artificial intelligence, and neural networks – we want everyone to have clean data, so we can get on with the business of putting that data to work.

Try us out. You can see a mini-demonstration at http://www.econtext.ai/try. Simply type in a URL, a Twitter handle, or paste a page of text to see how we classify it. We think you’ll be impressed enough to give us a call.

We’re very happy to talk with you about your specific needs and walk you through a demo of eContext.

Additionally, if you’re interested in learning more about how a general taxonomy supports better machine learning initiatives, read our whitepaper, Contextual Machine Learning – It’s Classified by Seth Grimes. The paper outlines five ways that machine learning accuracy can be improved by deep text classification.

______

[1] CrowdFlower Data Report, 2017, p1, https://visit.crowdflower.com/rs/416-ZBE-142/images/CrowdFlower_DataScienceReport.pdf

[2] PWC, Data and Analysis in Financial Research, Financial Services Research, https://www.pwc.com/us/en/industries/financial-services/research-institute/top-issues/data-analytics.html

February 18, 2019 by eContext

Firms that specialize in data and analytics are in big demand these days. Even large companies that used to be able to manage data in-house are beginning to outsource to experts at data firms. Businesses are looking for help with managing and processing data, but they’re also beginning to look for more than just the basics. It seems we’re hearing the term “Data as a Service” (DaaS) being coupled with newer terms like “Insights as a Service” (IaaS) as the technology speeds us forward.

A new report released by Forrester, Insights Services Disrupt The Data And Analytics Market, addresses this trend, detailing how changes in the marketplace are driving businesses to outsource their data needs to specialists such as data aggregators and other DaaS firms, and stressing the need for these data firms to adapt quickly to support this growing market.

One of the report’s authors, Jennifer Belissent, Ph.D., writes in a blog post, Introducing the New Insights Service Provider, “According to Forrester’s Q3 2015 Global State of Strategic Planning, Enterprise Architecture, and PMO Online Survey, 73 percent of companies understand the business value of data and aspire to be data-driven but just 29 percent confirm that they are actually turning data into action.”

The Forrester team recommends a new approach for data experts to consider.

A great time to be a data aggregator

Data aggregators and other data-crunching firms that are keeping up with the trends are in high demand. They’re also attracting some of the best talent in the data sciences, and if you’ve been looking to hire data experts, you know how competitive the market is these days.

Data firms understand what Belissent refers to as the “insights services cycle”. She argues that an improvement in time to value can be achieved in four steps.

In the first step, a data firm helps its client define or refine the business problem, helping the client construct the right questions to ask in order to solve the problem. Once the right questions are in place, the data specialists are often really good at helping clients with step two, knowing where to find the right data in order to get the best answers. Belissent notes that aggregators often have an opportunity to get the right data at the right price for clients, as well.

Step three is the most important: applying advanced analytics capabilities and designing the right framework for performing those analytics.

This is where eContext is helpful. As a data expert, you have a question that needs an answer as quickly as possible, and you have a series of data sets that contain the right kind of information for analysis. What you don’t have is a month to wait for your data to be pulled from disparate sources and normalized for analysis — clients want answers and insights now.

eContext is a real-time classification engine that helps data aggregators pull text data from any source into a unified taxonomy structure for easy discovery, analysis, and strategic recommendations. eContext offers a deep classification structure — more than 500,000 total categories that reach down 21 tiers — to help you deliver greater relevance and deeper meaning on behalf of your clients.

We work with companies like DataSift that aggregate astronomic volumes of text data from a wide range of sources, including social media networks, streams of editorial content, blogs, forums, and any other data sources they have access to. DataSift relies on eContext to quickly provide an infrastructure for all their data, so that they can unify, standardize, and structure their assets into a topic hierarchy framework to enhance overall efficiency of the insights supply chain.

Pulling it all together

eContext can be key in Belissent’s final quadrant of the insights services cycle — delivering better insights. She notes the growing number of firms that are differentiating their services by stressing excellent data capabilities backed up by keen insights.

Consider the competitors in your space who are delivering insights to clients based on the shallow industry-standard three-tiers-deep classification system. Leveraging eContext to classify your data gives you the infrastructure required to help your key staff cull stronger and deeper insights, faster.

It’s a great moment to be a data expert. Your services have never been more in demand, and there’s a wide and untapped market that’s just waking up to the fact that data will drive them forward. If you can add value by giving them answers along with insight faster than your competitor, you’re in a good position to lead the pack. Step out ahead of your competition with better, deeper, richer text classification capabilities from eContext.

February 15, 2019 by Miriam Carey

Revenues for big data and analytics (BDA) commodities are projected to reach $210 billion by 2020, according to a recent forecast from IDC. Such significant investment raises several questions. What industries are pushing this growth? What is the perceived value from more sophisticated BDA practices? Perhaps most importantly, how can businesses continue to refine and update these practices as the technology matures?

According to IDC, nearly half of that forecasted $210 billion will comprise investment from banking, discrete and process manufacturing, central government, and professional services. However, the report also noted that banking, healthcare, insurance, investment services, and telecoms are expected to grow their BDA spending most rapidly over the next three years.

Data strategies

So what are organizations gaining from this investment? According to NewVenture’s 2016 survey of “senior Fortune 1000 business and technology decision-makers”, the quest for insight remains the most-cited argument for big data spending. Businesses are looking for an edge by snagging the most relevant, helpful information, understanding it more accurately than their competition, and converting on that value with greater speed.

Interestingly, a healthy variety of data sources is widely seen as more important than volume or velocity. In NewVenture’s survey, 40% of participating firms cited a high-priority need to integrate data from new and legacy sources. Of course, this emphasis leads to additional challenges: how to make sure disparate data sources can “talk to” each other? How can businesses update their data management strategies to incorporate new sources like IoT output or social data? And in an increasingly automated data landscape, how can businesses lend structure to raw or messy data in order to facilitate hands-free machine learning projects?

The challenge: Actually using the data

Data and analytics professionals are often forced to spend too much time cleaning data, even when the real value from BDA practices lies on the other end of that pipeline — interpreting and acting on the output is where all this work is converted to better KPIs. Therefore, expect to see more investment in products and services that facilitate cleaner data and more flexible integration from varied sources.

The BDA world is growing, but it’s also becoming more accessible. By partnering with firms that address the challenges of adopting a data-driven culture, businesses can keep up with more sophisticated competitors without reinventing the wheel.

August 31, 2018 by Patrick

In our last post, we discussed classification in general, arguing that structured labeling helps businesses leverage more information with fewer resources. For the next couple of entries, we’re going to focus more specifically on semantic classification, which is eContext’s core specialty. eContext labels massive volumes of content by subject matter, so our clients can better manage and learn from their data without having to manually take stock of each individual item. These Twitter posts mention NFL players; this video is about Silicon Valley; this customer survey is focused on retail jewelry, etc, etc. As previously mentioned, our clients typically access our classification by way of the eContext API. Before digging deeper into concrete use cases, we thought we would answer a few practical questions that come up when new clients use the API for the first time.

What does the eContext API actually do?

The eContext API allows subscribers to map their content to eContext’s Categories. A category could represent a thing, person, place, product, service, or an abstract concept. Categories are organized in a huge taxonomy that houses specific topics beneath broader ones; each category exists in a single location within this structure. A user submits a batch of content (such as a list of social posts or URLs) and receives, for each individual item, a list of corresponding categories that includes the following information:

Category name
Category ID
Category path
- for example, the category path for “Breaking Bad” is Arts & Entertainment::Movies & Television::Movie & TV Products::TV::Drama TV Shows::Breaking Bad
A few statistics on the category’s prevalence within recent social conversations
- the importance of the category within recent social conversations
- the percentage of conversations in the past 28 days that have been about the category
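
As a rough illustration of how a client might hold one of these results, the sketch below models a single category record in Python. The class name, the field names (`name`, `category_id`, `path`), and the ID value are assumptions made for this example, not the API’s actual response schema; only the `::` path separator and the “Breaking Bad” path come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryResult:
    """Hypothetical container for one category returned for an item."""
    name: str
    category_id: str  # placeholder; real IDs would come from the API
    path: str         # tiers joined by "::", broadest first

    def tiers(self) -> list[str]:
        """Split the path into its tiers, broadest first."""
        return self.path.split("::")

    def depth(self) -> int:
        """Number of tiers from the top of the taxonomy down to this category."""
        return len(self.tiers())

    def rolled_up(self, max_tiers: int) -> str:
        """Truncate the path to a broader ancestor for coarser reporting."""
        return "::".join(self.tiers()[:max_tiers])


breaking_bad = CategoryResult(
    name="Breaking Bad",
    category_id="example-id",  # invented for illustration
    path=(
        "Arts & Entertainment::Movies & Television::Movie & TV Products"
        "::TV::Drama TV Shows::Breaking Bad"
    ),
)
```

Keeping the full path on each record means the same result can be reported at any level of granularity: `breaking_bad.rolled_up(2)` yields the broader “Arts & Entertainment::Movies & Television” grouping without a second lookup.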

What are the different functions available through the eContext API?

Clients can make different classification calls depending on the kind of data they need to label. This is because eContext uses different language processing for different types of content. The core classification functions of the API are:

Classify/Text – The most basic functionality, designed for free-form text.

Classify/Social – Used for social media posts and other user generated content. The language processing used for Classify/Social is optimized for short-to-medium length content, and also includes functionality to consider usernames and hashtags.

Classify/Keywords – This function is optimized for very short text strings. While other calls can map content to multiple results, Classify/Keywords will give each keyword a single, best-possible category. This allows keywords to be bucketed into discrete groups without duplicates.

Classify/URL – Users submit a list of URLs and eContext labels the topics that appear on each of those web pages. Our processing method ignores advertising and other irrelevant elements, focusing only on the core content of the page.

Classify/HTML – An alternative to Classify/URL, this call allows users to hand-select which page elements are to be classified.
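
To make the “discrete groups without duplicates” point about Classify/Keywords concrete, here is a minimal Python sketch of bucketing keywords once each one has a single category. The keyword-to-category pairs are invented sample data standing in for classification output, and `bucket_keywords` is a hypothetical helper, not part of the eContext API.

```python
from collections import defaultdict


def bucket_keywords(assignments: dict[str, str]) -> dict[str, list[str]]:
    """Group keywords by their single assigned category.

    Because each keyword maps to exactly one category, every keyword
    lands in exactly one bucket.
    """
    buckets: dict[str, list[str]] = defaultdict(list)
    for keyword, category in assignments.items():
        buckets[category].append(keyword)
    return dict(buckets)


# Invented sample data: one best category per keyword.
sample = {
    "snow boots": "Boots",
    "rain boots": "Boots",
    "ski parka": "Coats & Jackets",
}
buckets = bucket_keywords(sample)
```

With multi-label results the same keyword could appear under several categories, so per-bucket counts would no longer add up to the number of keywords submitted.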

In addition, the eContext API offers a few ancillary functions, allowing users to see a list of eContext’s top-tier categories, obtain keywords that have been pre-mapped to individual categories, and check their own usage information. For a more tech-oriented guide to the eContext API, feel free to check out our documentation.

What can I do with eContext Classifications?

It depends on your goals, but broadly speaking, applications for semantic classification can be divided into two groups: utility and insight.

UTILITY APPLICATIONS: eContext classifications are used to facilitate some other (usually automated) process. Examples include:

tagging browseable content for intuitive navigation, related content suggestions, etc.
deriving user profiles that can be used to automate personalization
organizing data in CRM so an organization can quickly source information by topic
automatic content filtering for improved relevance and/or brand safety

INSIGHT APPLICATIONS: eContext classifications are used to analyze large-scale digital activity, typically for marketing and research purposes. Examples include:

discovering the distinctive interests of a target audience
identifying optimal channels for efficient media buying
content ideation to publish media that resonates with consumers

Of course, many of our clients use eContext in ways that combine these “utility” and “insight” benefits. For instance, an online retailer might classify product descriptions so they’re easier to search, but can also get the added value of topic-by-topic conversion stats. In customer service, classification can help dispatch tickets to the best representative, but it can also help the company analyze the kinds of problems their users are experiencing. Check back soon for an in-depth look at how clients are putting eContext to use.

March 21, 2018 by eContext

In a house with six kids, spring cleaning is a big deal. By the end of March, there can be dozens of pairs of shoes and boots clogging the mud room, not to mention a sea of ski parkas, winter coats, hats, scarves, and countless orphaned mittens and gloves. Each bedroom is equally littered with the spoils of winter, and everyone is eager for the “out with the old, in with the new” purge that spring cleaning promises.

One year, after a particularly harsh winter season, the powers that be in the household (the parents) went a bit nuts trying to clean it all and decided, instead, to organize a Great Purge. Everyone in the house was instructed to pull everything into a group of piles in their room: clothes that needed to go to storage, clothes they’d outgrown, clothes and boots for Goodwill, items that could be hand-me-downs, things that needed to be washed…

You get the picture. Once all the piles were arranged, boxes and laundry baskets were brought around to each room before anyone could forget which pile was which. Each kid placed their items into the correct box or laundry basket, and the boxes were carried off by the bigger kids.

As all the boxes were put neatly away in the basement, everyone was able to sweep away the winter dust and to make room for their spring wardrobe.

Using data tags to clean house

Does your data look like our house full of six messy kids? Admit it, you’ve got data strewn across every office silo, and a data mudroom so chock full of stray bits of text, you can no longer see the floor. It’s time to organize – with data tags.

Imagine being able to take all that stray data, throw it into a big pile, put it through a filter and have it come out on the other side, neatly arranged into compartments that make sense to you and to your business.

eContext is a text analytics solution that can take text data from any source — clickstream, social, customer profiles, you name it — and normalize it in real time into a topic hierarchy that you can use. Tag your data so that you can put it in the right bin to be analyzed, recycled, or thrown away, and suddenly you’ve got a very organized space to work with.

There’s another reason eContext is so good for your data purge. With some category structures, your data gets organized into very shallow categories, so you might wind up with the entire family’s boots in one box, and all the ski parkas in another. eContext extends to 21 tiers, so you can organize your data into very specific nodes. Instead of a box full of boots, you wind up with a series of specific boxes in your data basement, very neatly labeled, “boys ski boots”, “girls rain boots”, “riding boots”, “snow boots”, and “winter hiking boots”.

Ready for a new season?

It was always our parents’ dream that once the house was in order, we’d take the opportunity to “keep everything in its place from now on.” Of course, this never really happens when six kids are running around, but as a grown-up with tons of data charging through the door every day, you have the opportunity to keep your data in check constantly.

Because it can filter and tag your data in real time, you can use eContext continually to keep your data spring-cleaning-fresh. As much data as you have, from any source, in real time. Organized into 21 tiers and more than 450,000 nodes.

Now get out that broom. It’s time for a good data sweep.

August 24, 2017 by Patrick

I once heard a colleague say:

“When you think about it, every business problem is really a matter of adequate classification.”

Granted, he was doing PR for a classification software company, but broadly speaking, his point is tough to disprove. For starters, classification is undeniably crucial in areas like marketing and sales. That’s why we group people into demographics; it’s why we describe a lead according to its position in the funnel. But really, every member of your organization relies on classification to be successful, whether it’s an intern sorting through junk mail or the C-Suite framing objectives for their shareholders.

Imagine, in your email today, you were solicited by two different companies that you’d recently patronized, each asking you to fill out a customer satisfaction survey. Company A gives you a bunch of very broad, open-ended questions such as “How would you describe your experience?” and there’s a spacious free-form text field for each. Company B gives you a longer list of extremely limited questions — “Approximately how many minutes did it take for a representative to respond to your request?” — and a discrete list of options from which to choose.

Based on these two surveys, which company would you assume has a larger customer base?

Lacking any other information, Company B is the safe assumption. As an organization grows, the opinions of one individual become tougher and tougher to consider. What we need then is an effective way to group concepts together, to align and aggregate and make decisions based on a wider view of the landscape. This is the argument for effective classification; the ability to derive intelligence from larger and larger pools of information is both the burden and hallmark of successful businesses.

The Threat of Bad Classification

I say “effective classification” because, of course, we all know of examples where inadequate labeling actually inhibits understanding. It doesn’t matter if you’re talking about products or people: when the labels are too broad, or too irrelevant, or too vague, they marginalize nuance and cement a foundation of inaccuracy. Unchecked, that broken foundation will compromise decision-making. In short: bad labels are damaging.

So what we need then are criteria that help us to evaluate any classification method:

PRECISION – Are you classifying accurately? The goal here is to eliminate both false-positives, when an item is assigned an incorrect label; and false-negatives, when an item should receive a given label but doesn’t.

DEPTH – Going hand in hand with precision, the health of your classification depends on how much information your labels can communicate. “Turtle” is informative; “Serrated Hinged Terrapin” more so. Deeper granularity of labels fosters higher-fidelity understanding.

STRUCTURE – The danger of an excessively granular system is that the labels become so specific that they’re meaningless. That’s why we need structure: so that each classification is not just defined in a vacuum, but achieves meaning through its relationships to other classifications. To illustrate using the above example, “Serrated Hinged Terrapin” isn’t all that helpful if you don’t know what a turtle is to begin with.

RECALL – The amount of available information you can successfully classify. Do your labels completely cover the range of different possibilities, or are there items that don’t correspond with your criteria, and thus can’t be classified at all? It’s important to note that, in many classification systems, improving recall can adversely affect precision, and vice versa. Increasing recall often means broadening the rules, which can result in false positives. On the other hand, trying to improve precision through stricter rules means fewer items meet those tougher standards. It’s a balancing act.

FLEXIBILITY – There’s no such thing as perfect classification. Any system you can come up with will have some kind of flaw in its precision, depth, structure, or recall. Moreover, change is a constant in any organization, and any classification can become outdated eventually. (Fun fact: there once was a time when Harry Potter could be adequately characterized as a book series, as opposed to an international multimedia entertainment franchise.) If your classification system is going to maintain effectiveness over time, labels and rules must be constantly tested and easily revised.

RELEVANCE – Is the information conveyed via classification in line with what you need to know? This can be a tricky one. As we increase our capacity to record and store data on pretty much everything, many organizations are developing a mentality of “grab everything and make sense of it later.” This pack-rat idea is simple — sometimes even seemingly insignificant data can yield important insights. But if you don’t have the capacity to analyze all that data in an organized way, then you’re really just wasting resources on chasing irrelevance. It’s a good idea to approach classification with a clear goal of what you’re trying to achieve.
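
The balancing act between precision and recall can be put into numbers. The Python sketch below uses the standard information-retrieval definitions (precision is the share of assigned labels that are correct; recall is the share of deserving items that actually received the label), which are somewhat narrower than the descriptions above. The item IDs and the two rule sets are invented for illustration.

```python
def precision_recall(predicted: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one label.

    predicted: items the classifier gave the label
    relevant:  items that truly deserve the label
    """
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: five items truly deserve the label.
relevant = {"a", "b", "c", "d", "e"}

strict_rules = {"a", "b"}                     # few matches, all correct
broad_rules = {"a", "b", "c", "d", "x", "y"}  # more matches, two wrong

strict = precision_recall(strict_rules, relevant)
broad = precision_recall(broad_rules, relevant)
```

In this toy case the strict rules label only correct items but miss three of the five that deserve the label, while the broad rules find four of the five at the cost of two false positives.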

Understanding at Scale

Classification at its best is all about an economy of information: How can I consider the largest volume of data while losing as little meaning as possible in the process?

Think back to that survey question for a moment — Company A with its free-form approach, and Company B with its tightly controlled survey questions. Maybe when you got there, you thought: “Well clearly Company A is the more successful, because they’re taking the time to elicit organic responses. They seem like they’re really giving each customer individual attention.”

I would agree with that assessment to some extent — free-form answers can provide fuller overall meaning — but this kind of intel becomes harder to align and consider when you’re a large corporation operating in several markets. The ultimate goal, for a company classifying their data, is to capture all the nuance and variety that you’d get from sitting there and manually considering each data point, but in a quantifiable way that makes this meaning accessible on a huge scale and fast.

eContext specializes in a specific kind of classification — the labeling of multimedia content according to the topics discussed — to gain fast, reliable intelligence on behalf of our clients.

eContext’s Semantic Classification

eContext is a rule-based classification engine that annotates content according to topics mentioned. Our clients use semantic classification to organize and interpret all kinds of data, including:

Web content
Social posts
Customer feedback
Videos
Search queries
Messaging

Now, there’s a whole ecosystem of tools available to mine insights from digital content — and many of those solutions don’t have anything to do with that content’s subject matter. Maybe you simply need to know when your customers are most active on social media, or to discover what percentage of searches on your website result in a sale.

But semantic data provides deeper understanding and utility, because it tells you what that content’s actually about. Semantic classification supports a wide variety of applications, including:

Market research
Personalized content delivery
Query response (traditional search box, chatbot, or virtual agent)
Media planning
Customer service
Brand safety

How to tell if you need semantic classification, in three simple questions:

Does your role involve any analysis of digital content?

Does it help to know what that content’s actually about?

Do you have the time or resources to analyze that content manually?

How eContext Recognizes Topics

As mentioned in the last post, any decent classification must be sufficiently structured, accurate, and flexible. eContext meets and maintains these standards through two unique elements:

Topics are organized into a hierarchy comprising 25 verticals, 20 tiers of depth, and 450,000 individual nodes
For each topic, a list of vocabulary rules is created to identify when that topic is being mentioned.

eContext’s organizational structure is the world’s largest taxonomy of commercial and social topics, comprising over 450,000 categories across 25 verticals. These categories are arranged in a hierarchical structure; the top tier includes very broad topics like “Arts & Entertainment”, “Health”, and “Travel”, while in lower tiers, the topics become more and more granular.

To classify text into its 450,000 topic categories, eContext utilizes a database of 55 million positive and negative vocabulary rules. Positive vocabularies indicate if a text string is eligible for a certain classification. For example, in the category “David Bowie”, positive vocabularies include “ziggy stardust” and “thin white duke”. Positive vocabularies let us know when people are using different words to talk about the same thing. Negative vocabularies indicate if a text string is ineligible for a certain classification. For example, in the category “Bow Ties”, negative vocabularies include “pasta” and “noodles”. These rules greatly improve the accuracy of the classification process.
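
A toy version of this positive/negative rule idea might look like the Python sketch below. It is only an illustration under stated assumptions: the `CategoryRules` structure, the naive substring matching, and the extra positive phrases (“david bowie”, “bow tie”, “bowtie”) are invented here, and eContext’s actual engine and rule format are not described in this post.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryRules:
    """Toy rule set for one category (structure is an assumption)."""
    name: str
    positive: set[str] = field(default_factory=set)
    negative: set[str] = field(default_factory=set)

    def matches(self, text: str) -> bool:
        """Eligible if any positive phrase appears and no negative phrase does."""
        lowered = text.lower()
        if any(phrase in lowered for phrase in self.negative):
            return False
        return any(phrase in lowered for phrase in self.positive)


RULES = [
    CategoryRules(
        name="David Bowie",
        positive={"david bowie", "ziggy stardust", "thin white duke"},
    ),
    CategoryRules(
        name="Bow Ties",
        positive={"bow tie", "bowtie"},
        negative={"pasta", "noodles"},
    ),
]


def classify(text: str) -> list[str]:
    """Return the names of every category whose rules accept the text."""
    return [rule.name for rule in RULES if rule.matches(text)]
```

Here `classify("Bow tie pasta with pesto")` returns an empty list because the negative vocabulary vetoes the “Bow Ties” category, while a post mentioning “Ziggy Stardust” is still mapped to “David Bowie” even though the artist’s name never appears.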

eContext’s 55 million vocabulary rules are trained and maintained by subject experts. This instills common sense in a scalable process that classifies thousands of text inputs per second.

Accessing eContext Classification

eContext offers clients three different ways to take advantage of semantic classification: by web app, on-premise appliance, or API. Which of these options is best for a client depends on the volume of data to be enriched as well as the client’s available resources.

Classify.econtext.com — The most lightweight option, but also the least scalable. eContext’s browser-based tools allow users to review sample-size portions of web, social, or keyword classification. We recommend using the Classify site either to demo the accuracy and depth of eContext classifications, or as an easily-accessible adjunct to one of our other solutions.

On-Premise — For clients that need to classify extremely high volumes of data and have the resources to install eContext’s architecture onsite. This definitely represents the high-end of the spectrum, and is only necessary if you have the data-ingestion rates of a social media company or big data aggregator.

eContext API — The vast majority of our clients access eContext’s classification engine via API. Users have access to the eContext Taxonomy, can extract topics from data in real-time, and can retrieve keywords from the eContext dataset.

Because the API is the most commonly-used option here, subsequent posts will detail its use, including an overview of available functions and a guide to classifying content for a select variety of typical use cases.

April 5, 2017 by eContext

Clients rely on your firm to sort through their data and make recommendations based on what you see. They need you to help them boost their competitiveness, and the minute your approach gets stale, you’ll risk losing their business.

Success with data classification

Your success with clients depends on your success rate with their data. Are you using the right tools? Can your systems accomplish the following:

Analyze content from social streams, third parties, and user behavior profiles and quickly organize them into a detailed classification framework.
Process an entire firehose of social network content or clickstream content from millions of users in real time and accurately determine the contextual meaning for almost every interaction.
Provide qualitative analysis on content consumed by panelists.
Provide granular analysis on the topics that panelists care about on every channel and device type.
Classify text into 21 tiers and into 400,000 categories in 25 verticals.

eContext can brag about this kind of performance, and we work with data specialists who use our technology as the filter through which all their clients’ data is pushed for better results. With eContext, they can bring data sets in from anywhere and quickly have a single, cohesive data set to look at that’s classified into deep categories for better accuracy rates.

Categorize data for better results

With the level of accuracy that eContext helps you achieve, you’ve got a great advantage over your competition. Because eContext is universal, you can use it for all your clients — no matter which industry they’re in, no matter where the data comes from, and regardless of the data’s original format.

Being able to reference all your incoming data through the same lens brings more cohesion to your systems internally. Less confusion and a better framework to start with on every project means fewer hassles with trying to get IT to help you convert data sources and more opportunities to focus on your work and impress clients.

Check out eContext, or contact us for a demo.

January 26, 2017 by eContext

Las Vegas casinos will soon be full of the world’s leading data crunchers.

This week, the city will play host to the Big Data Innovation Summit, a two-day forum on advanced theories and applications for big data. With that much mathematical brainpower floating around, there are bound to be a few card counters working the tables during the conference off-hours.

If you skim through the Summit’s speakers and topics list, you’ll notice a few recurring themes: big data (obviously), machine learning, and IoT. But there’s another focus that emerges over and over again, one that might seem out of place if it weren’t so ubiquitous: the human element.

Technically speaking

Beena Ammanath, executive leader of data and analytics for General Electric, will be there to argue on behalf of industrial applications for big data. “It’s in the industrial area that big data is going to have the biggest impact, transforming economies, saving lives, reducing power consumption, changing the way we live,” notes her presentation overview. These are highly aspirational and human goals for a corporation to have, and the burden falls on the analytics team to achieve them.

Boeing is also looking for the human element, and will present on big data technologies they’ve developed and the important role of humans throughout the process.

Microsoft’s James McCaffrey, a senior research scientist, is presenting on neural networks. He is promising to explain how neural networks operate without using “Greek letters or annoying jargon.” It’s a telling hook: artificial intelligence used to be a purely esoteric concept, but in 2016, it’s becoming something that increasingly affects — and is informed by — the actions of ordinary people.

Speaking of ordinary people, Arijit Sengupta, CEO of BeyondCore, is talking about how his company interprets big data in a way that non-scientists can quickly understand. He’ll show how analytics is moving to a place where “users can easily overlay human intuition on top of automated analysis.”

Man vs. machine

We hear and read a lot about the threat of artificial intelligence to humanity — a matter of science fiction that geeks just love to talk about — but pop culture tends to minimize the truth that the human element will always play a critical role in the development and application of new technologies.

Everyday communication is a great example. Attendees at the Big Data Innovation Summit will probably talk about the need for better natural language processing and solid text analytics strategies, because real-world communication is such a complex, idiosyncratic system of signals. This is where human input plays a role, because in order for Big Data to deliver, machines have to be able to learn from us about what we mean when we speak and write. Machines need to know how to interpret informal spoken language and slang-filled social media text, and to tell when we’re joking or when we’re quoting Caddyshack… it’s a complicated job.

Our technology supports big data by bridging the gap between formal business English and everything else, helping to bring context to the new kind of data that businesses rely on to make better decisions. As the world’s largest text classification engine, we support any big data effort by bringing structure to the seemingly unstructurable.

For the last ten years, we’ve had humans curating a massive taxonomy that runs 21 tiers deep and contains almost half a million total categories. We can take data from any source and run it through our universal topic hierarchy in real time and deliver structured data for your applications. It’s simple and elegant, and it creates an opportunity to take a very deep dive into the human meaning of the data that you’re looking at.

As the number crunchers assemble in one of the most data-driven vacation and conference spots on the planet, we’ll be following the conversation about the importance of human language to Big Data. Enjoy the show!

January 22, 2017 by eContext

Consider an individual who posts often about the highbrow shows he’s watching. If you peeked quickly at his Netflix reviews you’d find a list of documentaries and period dramas and you might leap to some fairly quick conclusions about this individual’s demographic. Passive monitoring, however, would take it one step deeper and look at what he’s actually watching to find out that, yes, he did watch the documentaries that he reviewed, but he also watched “Dumb and Dumber” multiple times, along with the entire Three Stooges collection this month, so a different profile emerges.

Passive data collection can help marketers draw a more well-rounded picture of an individual or population.

Passive data collection

The Marketing Research Association notes that passive data collection “occurs without any overt consumer interaction and generally includes capturing user preferences and usage behavior, including location data, from personal mobile devices.”

As the market research industry adopts more mobile strategies and methodologies for data collection, passive data collection is a tactic that’s being used more often. Passive data sources include clickstream, search, and social data along with details such as app usage, location, mobile browsing behavior, any kind of search that’s going on, and depending on permissions, social media posts and activity. It’s even possible now to get information on battery life of the device being used.

The data available to marketers is becoming too big and broad for traditional strategies. For example, if you have user identification and timestamp information for search, mobile browsing, and social activity, a team of people would have to manually review each website or URL to determine which high-level categories the various activities fall into. Introducing automation is the logical next step to becoming more efficient in market research.

Text classification engines

When humans are relied on to look at and classify data, there is always inconsistency, human error, or ambiguity in judgment. A machine is consistent, predictable, fast, and reliable. It lets you be confident that each activity is classified into the proper category and that classification is consistent across all channels (search, mobile, social).
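Once every activity carries a category label, rolling the labels up into a per-user profile is straightforward to automate. The sketch below is only an illustration: the event fields, user IDs, timestamps, and category names are invented, and the `category` value stands in for whatever a classification engine would return.

```python
from collections import Counter, defaultdict

# Invented sample events; "category" is assumed to come from a classifier.
events = [
    {"user": "u1", "ts": "2017-01-20T21:04:00", "channel": "social", "category": "Documentary Films"},
    {"user": "u1", "ts": "2017-01-20T22:15:00", "channel": "mobile", "category": "Comedy Films"},
    {"user": "u1", "ts": "2017-01-21T20:30:00", "channel": "search", "category": "Comedy Films"},
    {"user": "u2", "ts": "2017-01-21T09:00:00", "channel": "search", "category": "Auto Parts"},
]

def profile_by_user(events):
    """Count how often each category appears per user, across all channels."""
    profiles = defaultdict(Counter)
    for event in events:
        profiles[event["user"]][event["category"]] += 1
    return profiles

profiles = profile_by_user(events)
# The most frequent category for u1, regardless of which channel it came from.
print(profiles["u1"].most_common(1))
```

Because the same labels are used on every channel, the counts can be combined without any per-source mapping step.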

With structured data, you can now add the human element to analyze it more quickly and make good decisions about how to act on what the data is telling you. There are some interesting applications for passive data collection. KantarHealth published a helpful infographic that illustrates the application of passive data collection techniques in the healthcare industry. By pulling data from sources as diverse as social media channels and biometric devices, Kantar can help their clients deliver a better user experience to patients and improve quality of life.

eContext is a text classification engine that allows you to take any kind of text data, from any source such as clickstream, search, or social, and wind up with data classified, in real time, in a uniform fashion. eContext features particularly deep classification, to 21 tiers and into more than 450,000 available categories, so you can draw deeper, more accurate conclusions about your data and move on to step two, which is pulling insights from the data and making smarter marketing decisions with the results.

eContext’s ability to bring a heightened level of clarity to your unstructured data means you can have faster, more effective access to your data and the marketing insights it can give you. In the face of more traditional research options, including digital research techniques, passive data collection combined with accurate data classification can give you an amazing edge and help you make valuable connections faster and with less of a financial investment.

December 8, 2016 by Patrick

This post is the fourth in a series on the eContext API and how our clients get value out of semantic classification. To learn more about our technology and its uses, feel free to get in touch.

Way back when we were in R&D here at eContext, you could find, at any given time, up to 200 language and subject-matter experts, working on a project basis, curating the vocabulary rules that became the core of eContext’s classification. When the mass development process was over, most of these guys moved on–some to new employment, others to post-grad academics, etc.

I can’t tell you how many times we’ve heard back from one of these former employees, saying something like: “Man… this place where I am right now? They could definitely use what we were building.”

Our last few posts have really been devoted to consumer-oriented applications of eContext: market research, personalization, improved navigation for publisher and ecommerce properties, etc. And yet, while increased convenience and relevance for consumers should absolutely be emphasized, we can’t overlook the benefits of semantic classification for enterprise-level knowledge management.

Imagine you work for a company with a sub-par system of organizing and retrieving documents (which, incidentally, is probably true for most of you). You’re tasked with updating all of your organization’s training and development materials, but since those documents aren’t all in one place, you have to do a little digging. To make matters worse, prior documents have used different terminology to refer to the same concepts, meaning you have to perform separate searches for things like “human resource development” and “corporate education”, among many others.

This kind of gophering isn’t just tedious. It’s expensive. According to research conducted by IDC, inefficiencies in working with documents can cost an organization $19,732 per information worker per year and result in a 21.3% decrease in an organization’s overall productivity.

No one, neither a consumer nor an employee, should ever have to conduct multiple searches to look up one common topic. It’s outdated; we have the technology. So below are a few basic steps to use eContext for enterprise document management. If you’d like to know more about the nuts and bolts of integrating classification in your organization’s existing architecture, that’s a conversation we’d love to have with you.

Much like in consumer-oriented content delivery, organizing enterprise materials with semantic classification really involves two separate phases: labeling the documents themselves, then aligning those labels with user navigation methods.

Step One: Classify Content with the eContext API

Broadly speaking, there are a few API calls you’ll want to use here, depending on the type of content to be labeled:

classify/html – for HTML content
classify/social – for most user-generated content, including long-form documents, archived enterprise messaging, emails, etc.
classify/url – for any digital document that exists on a discoverable web page (this blog post, for example)
classify/keywords – for extremely short text strings, when you want to limit classification to one best-match topic
classify/text – for any general text data not satisfied by the above

Each API call (except for classify/keywords) will generate a list of labels describing the topics of the classified document; these labels can then be appended as part of the document’s metadata.
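As a rough illustration of step one, the sketch below picks a call by content type and appends returned labels to a document’s metadata. The call names come from the list above; the `choose_call` and `attach_labels` helpers, the content-type keys, the metadata layout, and the sample labels are all assumptions made for illustration, and no request is actually sent.

```python
# Hypothetical mapping from our own content-type names to the calls listed above.
CALL_BY_CONTENT_TYPE = {
    "html": "classify/html",
    "user_generated": "classify/social",
    "url": "classify/url",
    "search_string": "classify/keywords",
}

def choose_call(content_type):
    """Return the call for a content type, falling back to classify/text."""
    return CALL_BY_CONTENT_TYPE.get(content_type, "classify/text")

def attach_labels(metadata, labels):
    """Return a copy of the metadata with topic labels appended, skipping duplicates."""
    merged = dict(metadata)
    topics = list(merged.get("topics", []))
    for label in labels:
        if label not in topics:
            topics.append(label)
    merged["topics"] = topics
    return merged

# The labels here are made up; in practice they would come from the API response.
doc = {"title": "Onboarding handbook", "topics": ["Human Resources"]}
labeled = attach_labels(doc, ["Leadership Training", "Human Resources"])
print(choose_call("user_generated"), labeled["topics"])
```

Returning a copy rather than mutating the original record keeps the labeling step safe to re-run when documents are reclassified.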

Step Two: Classify Searches for Apples-to-Apples Retrieval

eContext can be used to classify user searches in real time, using the classify/keywords API call. For example, the search strings “leadership development” and “leadership instruction” would both be classified to the eContext topic “Leadership Training”. Remember: eContext’s topics are organized hierarchically, so depending on how you opt to integrate topic classification into your existing infrastructure, you could retrieve content that matches your topic exactly, or include content that matches any of your topic’s subtypes.
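The choice between exact and subtype matching can be sketched as a prefix test on topic paths. This is a minimal example, assuming each topic is stored as a tuple running from the root of the hierarchy down to the node; the paths, file names, and the `matches_topic` and `retrieve` helpers are invented for illustration and are not taken from the eContext taxonomy.

```python
def matches_topic(doc_path, query_path, include_subtypes=True):
    """True if the document topic is the query topic or, optionally, sits beneath it."""
    if include_subtypes:
        return doc_path[:len(query_path)] == query_path
    return doc_path == query_path

def retrieve(documents, query_path, include_subtypes=True):
    """Return the sorted names of documents whose topic path matches the query."""
    return sorted(
        name for name, path in documents.items()
        if matches_topic(path, query_path, include_subtypes)
    )

# Invented document index: file name -> topic path.
documents = {
    "leadership-101.pdf": ("Business", "Human Resources", "Leadership Training"),
    "exec-coaching.docx": ("Business", "Human Resources", "Leadership Training", "Executive Coaching"),
    "ledger-guide.pdf": ("Business", "Accounting"),
}

query = ("Business", "Human Resources", "Leadership Training")
print(retrieve(documents, query, include_subtypes=False))
print(retrieve(documents, query, include_subtypes=True))
```

Storing the full path with each document means a single comparison answers both kinds of query, with no need to walk the hierarchy at search time.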

Another feature of classify/keywords is the ability to constrain classification to one or more specified verticals. This is particularly helpful in select knowledge management applications where a given subject matter can be assumed. For example, in an organization that deals entirely in the buying and selling of automotive parts and accessories, searches can automatically be classified to the most relevant automotive category.
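To show the effect of a vertical constraint, here is a small local sketch. It does not reproduce how eContext scores or filters topics: the candidate list, the scores, the vertical names, and the `best_topic` helper are all hypothetical, and the example only demonstrates why restricting the verticals changes the single best match.

```python
def best_topic(candidates, verticals=None):
    """Return the highest-scoring topic, optionally limited to the given verticals."""
    allowed = [
        c for c in candidates
        if verticals is None or c["vertical"] in verticals
    ]
    if not allowed:
        return None
    return max(allowed, key=lambda c: c["score"])["topic"]

# Invented candidates for the ambiguous search string "mustang".
candidates = [
    {"topic": "Horses", "vertical": "Animals", "score": 0.6},
    {"topic": "Ford Mustang", "vertical": "Automotive", "score": 0.4},
]

print(best_topic(candidates))
print(best_topic(candidates, verticals={"Automotive"}))
```

For an auto parts organization, fixing the vertical up front removes the ambiguity before it can reach the search results.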

While plenty of digital innovation is geared towards consumers, providing more convenience in exchange for increased sales and satisfaction, the technology of the office can often get overlooked, especially in small and medium-sized businesses. Organizing your documents by topic–a portable, intuitive, and evergreen system–means employees can save their sanity and companies can save their cash.
