We Created Document Dysfunction. It Is Time to Fix It.

It is time for some of us building software to take a hard look in the mirror.

For years, we promised technology would solve the world’s information management problems, but 85% of business information is still “dark data”: potentially useful insights lost in a rising tide of disconnected documents, emails, Slack conversations, voice-to-text messages, and myriad other forms.

As digital transformation accelerates, the sheer volume and opacity of documents make it harder to ensure quality, consistency, accountability, and regulatory compliance.

We call this problem “document dysfunction,” and it affects nearly every type of organization, from finance to health care to real estate to government, impacting millions of citizens, customers, and companies.

What does document dysfunction look like?

It’s a bank with thousands of loan documents, but zero visibility into the terms and conditions that impact the value of those loans.
A government agency with hundreds of project agreements that need to be audited and updated due to a regulatory change.
A commercial real estate firm with hundreds of contracts, but no insight into millions of dollars in underlying obligations.
A health care system with dozens of doctors spending “pajama time” every night recording and writing patient notes in a laborious and disconnected process.

Now multiply those cases by hundreds of thousands of companies and organizations around the world. That’s document dysfunction.

It is not enough to shrug our shoulders and say, “Hey, we just make the tools… we are not responsible for how people use them.” It’s time for the tech industry to step up and help solve these problems.

Right now, there are lots of smart people working to use artificial intelligence to tackle mind-boggling problems like asteroid mining or AI-enhanced humans.

We think that is great, but we are focused on using AI to solve much more mundane problems. We are a document engineering company, and we think AI can solve the information management problems that afflict businesses large and small.

If that sounds boring compared to human settlements on Mars, that is okay.

We think “Boring AI” could be a pretty big deal.

We envision a world where AI lets computers quickly and securely understand, as data, documents that were written for humans. Even better, we envision a world where AI helps people construct documents that are engineered for maximum data reuse from the start, fostering human creativity and unlocking billions of dollars in increased efficiency, improved compliance, and business insights for companies around the world.

We know we are not the only people thinking about these issues: researchers, academics, and other luminaries have been raising them for years. But we think science and technology have advanced to the point where we can finally solve these problems.

We see five principles that can lead us to more effective solutions:

First, we need to bring together multiple scientific domains in innovative and powerful ways.

Of course, we need to apply artificial intelligence in natural language processing using machine learning methods like neural networks or Bayesian techniques. But we also need other disciplines like image processing and recognition, semi-structured information, declarative markup, and even approaches inspired by natural sciences like the theories of cognition and evolution. Breaking down the walls and combining these disciplines will give us new ways to solve these very hard problems.

Second, instead of “Big Data,” we need AI that understands “Small Data”: the unique sets of business documents distinctive to individual companies.

There is a lot of this “Small Data,” and each company’s small data is different. What people call Big Data artificial intelligence these days is usually just highly supervised machine learning on massive datasets. Preparing those datasets is labor-intensive and prohibitively expensive for most individual companies. We need algorithms smart enough to make sense of the documents specific to your company, or even to your division within it, from a potentially small volume of examples and with only minimal training and guidance.

Third, this focus on company-specific “Small Data” will enable us to maintain the privacy and security of each individual customer.

As an industry, it’s fine to develop and hone algorithms using massive amounts of publicly available documents and datasets, but we should not use what is learned from one customer’s data to train algorithms for use with other customers. At a time when some are looking to combine data from multiple customers to increase their insights, raising questions about privacy and security, we believe it is better to treat each customer’s data as its own unique universe.

Fourth, past attempts to use AI to solve business data and document problems have failed because they worked at the wrong altitude: helping to complete words or sentences instead of applying AI to the document as a whole.

Algorithms need to understand the structure and strategy behind a company’s business documents, not just the co-occurrence of individual words and phrases. If we can create tools that can understand the different portions of a document, and their unique usages in an individual company, COOs will have powerful new ways to accelerate performance, monitor accountability, and ensure legal and regulatory compliance.

And fifth, to be truly effective, we need solutions that do not disrupt existing workflows or require massive investments in staff training, IT development, or armies of consultants.

From the start, AI should enrich the tools and routines that frontline workers already use to get their work done. The past 50 years have proven that you cannot force employees to adapt to straitjacket templates; you have to provide solutions that fit into how they already work and that reduce their repetitive tasks to foster their creativity. The more users accept the AI’s help, the smarter and more helpful the AI will become. It’s a virtuous cycle.

It is fashionable to say that “AI is going to take our jobs,” but we can do better. Companies that focus on AI to cut costs may do okay in the short run, but companies that use AI to empower their frontline workers and drive their strategic advantage will be the real winners.

The future isn’t about AI making human beings obsolete. The future is about AI making human beings and companies more productive, effective, and creative.

We don’t think that’s boring at all.

And we look forward to working with others across the industry to make that future a reality.

We want to start a public conversation about these issues.

Tell us your document dysfunction horror stories, or your dream for how technology could give you greater efficiency and control. Or maybe you completely disagree and have never met a dysfunctional document in your life. Or maybe you think our principles are all wrong — we’d still like to hear from you!

Join the document dysfunction conversation on Twitter.

Contact us at Docugami.com.