Companies have to process more and more unstructured data, such as documents, tables, presentations or e-mails, on a daily basis.
This leads to increasingly opaque file systems and access authorizations.
Information security is threatened by the unwanted accumulation of access rights and by orphaned user rights of former employees.
Fixing this problem requires clarity on who may access which data, who controls that access, and whether the file system has structural weaknesses that facilitate unwanted data leakage.
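As a rough illustration of what such clarity could look like in practice, the following minimal sketch checks a list of access entries against a set of active users. The `AclEntry` type, the function names, and the sample data are all hypothetical and are not taken from any particular product or file system API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclEntry:
    """One hypothetical access right: a user holds a permission on a path."""
    path: str
    user: str
    permission: str  # e.g. "read" or "write"


def find_orphaned_entries(entries, active_users):
    """Return entries whose user is not in the set of active users."""
    return [e for e in entries if e.user not in active_users]


def users_with_access(entries, path):
    """Return the sorted list of users holding any permission on a path."""
    return sorted({e.user for e in entries if e.path == path})


# Invented sample data, for illustration only.
acl = [
    AclEntry("/finance/report.xlsx", "alice", "write"),
    AclEntry("/finance/report.xlsx", "bob", "read"),
    AclEntry("/hr/contracts.docx", "carol", "read"),
]
active = {"alice", "carol"}

orphaned = find_orphaned_entries(acl, active)
```

Here `orphaned` holds the single entry for `bob`, who is assumed to have left the company. A real audit would read the entries from the file system or a directory service rather than from a hard-coded list.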
This paper presents a consolidated description of big data by integrating definitions from practitioners and academics.
The paper's primary focus is on the analytic methods used for big data.
As enterprise organizations expand to compete in a global economy, IT staff face new challenges in giving an increasingly distributed workforce efficient access to essential data.
Additionally, cloud services allow unprecedented economies of scale.
Unstructured data is a generic label for data that is not contained in a database or some other type of data structure. Textual unstructured data is generated in media like email messages, PowerPoint presentations, Word documents, collaboration software and instant messages.
Non-textual unstructured data is generated in media like JPEG images, MP3 audio files and Flash video files.
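The distinction between textual and non-textual unstructured data can be sketched as a simple lookup by file extension. This is only an approximation under stated assumptions: the extension sets below are illustrative choices based on the media named above, not an exhaustive or standard taxonomy, and the function name `classify` is hypothetical.

```python
from pathlib import PurePath

# Illustrative extension sets (assumed, not exhaustive).
TEXTUAL = {".eml", ".msg", ".ppt", ".pptx", ".doc", ".docx", ".txt"}
NON_TEXTUAL = {".jpg", ".jpeg", ".mp3", ".flv", ".swf"}


def classify(filename):
    """Label a file as textual, non-textual, or unknown by its extension."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in TEXTUAL:
        return "textual"
    if suffix in NON_TEXTUAL:
        return "non-textual"
    return "unknown"
```

For example, `classify("slides.PPTX")` yields `"textual"` and `classify("photo.jpeg")` yields `"non-textual"`. An extension is only a hint, so a production tool would likely inspect file contents as well.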
Size is the first, and at times the only, dimension that leaps out at the mention of big data.
This paper attempts to offer a broader definition of big data that captures its other unique and defining characteristics.