Materials Science at Scale is Big Data
Collaboration
Modern science is innately collaborative. Collaboration enables researchers with complementary skills to address highly constrained engineering and scientific problems. Many will claim that collaboration is a fundamental shortcoming in research science, but I contend that the real struggle is educating users on best practices in collaboration. In this course, I hope to teach the students to make their ideas more permeable to their colleagues and their community at large.
Collaborative research science is a complex process that introduces non-technical challenges. Typically, changes of theory, direction, coding, experimentation, and other individual activities can be complicated by working with others. In the past, it was easy to work with a tight-knit group of colleagues whose rapport could simply drive a collaboration. Currently, the fluidity in collaboration is complicated by information sharing. We as a community are contributing to and sitting on top of an immovable mass of legacy materials data.
What is the most annoying thing for you to share?
Big Data as a Hierarchy of Needs
In modern science, collaborations are typically driven by the transfer of data, and it’s big.
The V’s of Big Data
- Volume
- Velocity
- Variety
- Veracity
- Value
A Verbose Big Data Hierarchy of Needs
A hierarchy of needs requires that its base layers be satisfied before advancing up to higher layers. For Big Data, this means that the value of data cannot be assessed before its truth. Establishing the truth and value of the data requires fluid verbal and digital lexicons shared amongst a group of diverse specialists; does that sound similar to peer review? A DOM or API indicates a sort of digital lexicon. It follows that Big Data can be split into technical (rational) and non-technical (irrational) categories.
Variety in Big Data (IMO) is an incomplete abstraction. Big Data is big because many agents contribute to its growth and existence. Many agents implies that the Variety is not only in the Typing of Data, but also in the language in which the Data is discussed. Variety can be split into technical and non-technical categories. Addressing questions on the irrational spectrum of Big Data requires the input of many agents and their personalities. Variety in Lexicon hints at a language problem; language can pose one of the most substantial challenges in reaching agreement on peer-reviewed ideas. Language is a consideration in the variety block of Big Data. Language takes time to mature. At its earliest stages, it is a clumsy pidgin that can grow into a formalized creole. Our language for building cross-disciplinary collaborations in Big Materials Data will take time to grow. I contend that digital conversation in materials science will empower researchers to work more freely and contribute to the knowledge pool of materials science effectively.
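The "lower layers first" rule can be sketched in a few lines of Python. This is only an illustration: the ordering of the layers is my assumption, taken from the order of the V’s listed above, and the function name `highest_reachable_layer` is hypothetical.

```python
# Assumed ordering, base layer first, taken from the list of V's above.
LAYERS = ["Volume", "Velocity", "Variety", "Veracity", "Value"]


def highest_reachable_layer(satisfied):
    """Return the topmost layer whose lower layers are all satisfied.

    Returns None when the base layer itself is unmet.
    """
    reached = None
    for layer in LAYERS:
        if layer not in satisfied:
            break  # a gap blocks every layer above it
        reached = layer
    return reached


# Value is claimed, but Veracity is not, so the climb stops at Variety.
print(highest_reachable_layer({"Volume", "Velocity", "Variety", "Value"}))
```

Claiming Value without Veracity does not help: the walk stops at the first unmet layer, which mirrors the point that the value of data cannot be assessed before its truth.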
Rational versus Irrational Agents
Rational and Irrational agents are a product of Behavioral Economics, which is a thriving sector of data science. An enormous unconstrained challenge in Big Data is the intersection of its rational and irrational components. Rational data considerations are of the technical flavor. There are often optimal, or at least acceptable, solutions for dealing with the volume, velocity, and typing variety of data. Irrational complications in big data exist on a peer-to-peer level. They take into account an agent’s experience, lexicon, personality, bias, and idiosyncrasies in their research. It is not always easy to find a formidable colleague to address a research challenge. This is not for a lack of need or want, but more so because soft skills are neglected as a critical piece of collaborative research science.
On Transparency
My Thoughts: Peer-reviewed publication takes an inordinate amount of time to share results that are often uncertain. The current societal pressures on the engineering sciences will continue to stockpile until an improved model for disseminating research results exists. I promote a broad dissemination of research during the research process. Contributions from the community can drive work in high-valued research directions and prevent extra expenditure of energy when proofing and revising research. Scientific research needs a more agile bent.
Increasingly, access to research information is being demanded by the government, industry, and the public scientific community. Recent examples indicate that The Corruption of Peer Review Is Harming Scientific Credibility. The blemishes of other scientists are having a strong impact on both the public’s and scientists’ opinion of scientific research. As a result, there is a strong need for fluid information access and reproducible scientific research. Hiding our scars harms the community.
Efforts to create standards for Research Data, Digital Object Identifiers, and Persistent Uniform Resource Locators allow users to confidently share their information while receiving credit.
Mozilla Science Lab, GitHub and Figshare team up to fix the citation of code in academia