BlockChatter #2: ‘We Are Still In The Data Gathering Phase’

What is EncrypGen? Why are genetics studies so critical? How can blockchain technology help in that field? David Koepsell, the CEO of EncrypGen, answers those questions.

In the first interview, we talked with the CEO of the TIE. In the second episode of our series, we talk with the creator of one of the most exciting blockchain projects. David Koepsell and his company, EncrypGen, are set to revolutionize our approach to genetics studies. Let’s find out how they are going to do that.

Dawid Paluch: What are you trying to achieve?

David Koepsell: Our company has been focused on providing a solution for two problems in our field. The first is the need for scientists to have access to genomic data.

The second is that people who own genomic data mainly got it through consumer testing companies like 23andMe and Ancestry. They are not taking full advantage of owning the data.

To give you an example, there are about 20 million people who did Ancestry or 23andMe tests. Some scientists calculated that if they had been paid directly for the data that was sold by the companies that did the testing, they would have made an extra $130 each. 

So you are paying money to get the genetic test and then somebody else is making more money selling the data that was found during the test.

We want to solve this injustice with our platform, which is now functional. People can take their genetic data, upload it, and make it available for researchers to purchase.

It eliminates two problems by getting people paid directly instead of having the data sold on their behalf and also making more data available for the researchers. 

D.P.: You started this company with your wife, right?

D.K.: That’s right. Dr. Vanessa Gonzalez is my wife, and she is our co-founder. She doesn’t have a paid position within our company because she works for the Mexican National Genomics Institute. But she is our chief genetic science advisor providing us guidance and basically acting as our target research user. 

Explain that to me. I can understand your wife’s involvement in the project. But how did an ethics teacher end up in genetics and blockchain?

Since I met Vanessa, I’ve been interested in issues of ethics, technology, and genomics. In 2006, I had a visiting position at the Yale Center for Bioethics. My research there led to the book “Who Owns You”, which is about the ownership of genes. I got involved in that issue in a fairly public way. I helped write briefs on the Myriad case, which ended up overturning the practice of patenting genes. After that, Dr. Gonzalez and I continued to collaborate on issues in genomics, ethics, and science. We wrote a couple of chapters and articles about big data, genomic data sharing, and privacy. I am also a hobbyist in blockchain technology. I started mining a couple of years ago.

What did you mine?

I was mining Litecoin and Bitcoin for a while, and then more recently, Ethereum. That, plus my interest in the nature of money, because I’ve been writing about that for a while too, made me aware of the potential for blockchains in other fields. That’s how I became interested in this topic. I thought: maybe this time I can build something instead of just writing about something.

Why is your project not yet widespread?

We are a U.S. company, and that puts restrictions on how we can market our project, especially in the crypto community.

We don’t really focus on the token, which makes us not so enticing to people who are mostly interested in crypto.

Most of our marketing is for the genomics space. I’ve been mostly going to events that give exposure in the genomics community. We’ve been in Nature, we’ve been in The Scientist, Wired. Not typical crypto journals, but more focused on science in general.

How can blockchain help in the genetics field?

There are a couple of ways. One would be if we could put genomic data on blockchains and use the distributed nature of blockchains to help preserve that record completely, and to assert that kind of ownership.

That is something that every genomic blockchain company has considered and none has pursued seriously.


Well, genomic data is huge. Your typical genetic test is gonna be somewhere between 10 and 25 MB. Those tests are very small. They capture only 1% of your genome. So, if you want to do a whole genome sequence, you’re talking about GBs of data.

Pumping megabytes of data into a blockchain makes the blockchain unusable. There are a number of solutions that people are working on, for instance, IPFS (InterPlanetary File System). All of those are actually compromises because they don’t really put the data on the blockchain. Maybe the technology will mature to the stage where we could do that. And that would provide robust distributed proof of ownership of the data.
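The compromise described here, keeping the large file off the chain and recording only a small reference to it, can be sketched roughly as follows. This is an illustration only, not EncrypGen's implementation: the record layout, the owner ID, and the `offchain://` location are all hypothetical, and SHA-256 is an assumed choice of fingerprint.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a data file as a hex string."""
    return hashlib.sha256(data).hexdigest()


def make_ledger_record(owner_id: str, data: bytes, storage_uri: str) -> dict:
    """Build the small record that would be written to a ledger.

    Hypothetical layout: only the digest, the size, and a pointer to the
    off-chain copy are recorded, never the genomic data itself.
    """
    return {
        "owner": owner_id,
        "sha256": fingerprint(data),
        "size_bytes": len(data),
        "location": storage_uri,
    }


def matches_record(record: dict, data: bytes) -> bool:
    """Check that a file retrieved from off-chain storage is the one recorded."""
    return record["sha256"] == fingerprint(data)


# Stand-in for a consumer genetic test file (made-up content).
genome_file = b"rs4477212\t1\t82154\tAA\n" * 1000
record = make_ledger_record(
    "user-001", genome_file, "offchain://example/genome-001"
)
print(record["size_bytes"], "bytes of data,", len(record["sha256"]), "hex characters recorded")
```

The recorded digest stays the same length no matter how large the file grows, which is why this approach scales, and also why it is a compromise: the ledger proves which file was registered, but the file itself still lives somewhere else.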

Let’s talk about the biggest challenges in genetics studies. What are they?

About 20 years ago, the human genome project started to wrap up. They were beginning to complete the map of the human genome. But that is really only a picture from 30 thousand feet. It gives us the general notion of where different genes lie.

It doesn’t tell us how they function with their environment, metabolomics, how they are involved in the processing of chemicals and foods.

We really focus on huge data now. Not just big data, huge data. We’ve got to have data from a dozen different dimensions.

Over the lifetime of an individual, their genetics change, and their adaptation to the environment alters their genome. These are confusing and powerfully problematic computational issues.

We have to understand all of this to have a good picture of human health and the use of genomics in a way that can provide us with a better lifestyle, enhance life spans and health. Right now, we are still in the data gathering phase.

One of the reasons that 23andMe and Ancestry are so successful is that behind the scenes they’ve been gathering this data and making really rich pictures of individual genomics and health. Pharmaceutical companies and researchers are buying that data for millions of dollars. We need much more of that.

You and I are mostly the same, genetically speaking. But there are 600 thousand or so variants with some interesting stuff. Many scientists think that there is a lot of data in the rest of the genome that could be important. So we want to get the whole genome sequence.

That’s expensive, that’s data heavy. But to do the science right, we’d really like to move in that direction.

So that is another challenge: making that cheaper, using that data efficiently, and spending a lot of computer hours to try to understand it.

The biggest challenge is that we still don’t have enough data to work with. 

We are still in the infancy of genomics in many ways. 

Your company’s Twitter published an article about GWAS. I want to talk about it. Basically, the premise of it is that we have mostly white people’s samples. Is there anything we can do to fix that problem?

Yes, that is a large part of why we started, actually. The people who are getting tested are largely people who can afford it recreationally. 23andMe and Ancestry tests are largely recreational. They don’t give you important health data; they mostly give you information about your ancestry, with some suggestions about health habits. The people who can afford that tend to be white, middle-class Americans and Europeans. We need to make sure we get other populations to do proper big science.

One of the reasons we started our company was to make that affordable.

We’re not gonna subsidize the test but we can help to pay for the cost of the test. When that data is sold you’re gonna get reimbursed.

People who might be reluctant because they don’t have $90 to spend on a test might be more likely to do so if they’re gonna make $150 back over the next couple of years, simply by having that data and putting it on our platform. We are using the free market as a sort of incentive.

You’ve launched the world’s first genomic data marketplace. How does it work?

It is actually pretty easy to use. We want to make it easy for everybody to use. People like my mother can take a genomic test, and they are not necessarily computer savvy, but they can go to our website, make an account, and log in. There is a simple interface where you can drag and drop your data file from 23andMe or Ancestry.

Once that data is uploaded, we index it, which means we look for important genomic markers, things like rsID tags and other information that scientists need to locate specific variants. That indexed file is stripped of all ID data. You set a price for the file, and if a researcher finds you in a search, they can buy it. Then you start making money. There is nothing else for you to do.
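As a rough illustration of what such indexing might involve, the sketch below reads a raw data file and keeps only the marker rows. It is not EncrypGen's actual indexer. It assumes the tab-separated layout commonly seen in 23andMe-style exports (rsID, chromosome, position, genotype, with header lines starting with `#`), and the sample content, including the name in the header, is made up.

```python
def index_raw_file(text: str) -> dict:
    """Map each rsID in a raw data file to its chromosome, position, and genotype.

    Assumed layout: four tab-separated columns per marker row, with header
    and comment lines starting with '#'. Header lines are dropped entirely,
    so any identifying text they contain does not reach the index.
    """
    index = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header block
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip rows that do not match the assumed layout
        marker_id, chromosome, position, genotype = fields
        if not marker_id.startswith("rs"):
            continue  # keep only rsID markers, skip vendor-internal IDs
        index[marker_id] = {
            "chromosome": chromosome,
            "position": int(position),
            "genotype": genotype,
        }
    return index


# Made-up sample in the assumed layout.
sample = (
    "# This data file generated for Jane Doe\n"
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs4477212\t1\t82154\tAA\n"
    "rs3094315\t1\t752566\tAG\n"
    "i3000001\t1\t100000\tCC\n"
)

index = index_raw_file(sample)
print(sorted(index))
```

A researcher looking for a specific variant could then check `"rs3094315" in index` without ever seeing the header of the original file.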

The customers set the price for themselves.

Yes. Also, to make it more valuable, users should fill in as much profile information as possible. What makes that data valuable for researchers is being able to associate it with other things about the user.

So, you give people a platform where they can sell information about themselves.

They only sell the data file. Searches are only among variables like demographics, medical information, etc. When a researcher purchases it, they only get the genomic data file stripped of all ID information.

You have the DNA token. Why do you even need a token?

The Multichain platform that we built on tracks the transfers of assets from one place to another, and also associates one asset with another. With Multichain, creating another asset is simple. We created the DNA token as the currency of that underlying transaction.

To incentivise people to use the platform, we decided to create a mirror token on ERC-20. They can take a Multichain token, turn it to ERC-20, then sell it for whatever asset they want.

Maybe they don’t want to hold on to our cryptocurrency, maybe they want to turn it into something that they can buy a hamburger with. That is why we issued the token. To provide extra value and an extra incentive for people to put their data on the gene-chain.

You are not bothered by the price of the token, are you?

In our long-term business plan, whether it goes up or down doesn’t make or break our company. We function as the medium of a transaction anyway. We hold a lot of tokens, so obviously we look at the price, but it won’t sink us if the token is very cheap. It could certainly improve our bank accounts if it goes up, though.

What about funding? Do you work with some bigger companies?

We were self-funded originally to build the product. With friends and family money. We spent about $800,000 of that. We are looking to do a small fundraise to do a mass marketing campaign.

Our calculations have us profitable at about 50,000 files. And we are at about 750. We could continue to get the files through our current effort, which would include partnerships with other companies, with research projects, etc. But we know, based on the current metrics from our Google Ads campaign, that with a million dollars, we would have 50 thousand users in a matter of months if we just ramped up the volume of our current campaign.

Unless something happens, which brings a lot of users overnight, which is something that we are working out right now.

And you don’t want to reveal it just yet?

I can tell you a little bit.

We are part of a research consortium. That research consortium aims to bring 100,000 new users into their research. It looks like those people will become users of our product as well.

We are very hopeful about that alliance. It appears to be going in the right direction.

Another interesting option is that we have a number of companies saying: ‘what if we want to recruit a certain type of user?’ Some researchers are already paying for data. And we might have access to data. In fact, we can invite users to the platform specifically for that purpose.

We’ve just created a new page on our site for Active Studies. We can’t give you all the information if you want to be a part of it because then everybody would say that they have all of those attributes. But we will give a clue of what sort of people those studies require. We will give you enough to know that the risk is lower to upload your data and you are likely to be recruited as long as you actually meet the criteria and we can confirm it.

How many users do you have right now?

We have over 1,000 users who’ve created accounts and have profiles. We have about 750 files within the system. Most of those are indexed now. We have conducted 150 transactions. That comes in spurts. Every so often, it looks like some research company comes and purchases a bunch.

You wrote in your blog post that decentralization is the future. Why?

I want to see our product moving towards the future. Things like gene-chain become impossible to stop. Like nodes and torrents. When you get enough of them distributed around the world, nobody can stop it. That is important to science. That is important to consumers who have access to what is actually theirs.

Our current platform is far from decentralized. No similar platform is really decentralized. In fact, a lot of cryptocurrencies are poorly decentralized as well.

That is a threat to the robustness of the network. If you look at torrents, for instance, there are several thousand nodes for the most highly prized assets on the torrents.

For any of these systems to be robust, to survive the possibility of governments or corporations deciding they don’t want them around, you really need a highly distributed infrastructure. If somebody someday in the future wants to shut us down, I can open the whole thing, share nodes everywhere and let the world run it, just to ensure its survival, when it is fully distributed.

What are the plans for the next month? There is the consortium, and what next?

We are building a B2B product since we are a Microsoft startup. One of the things that Microsoft startups get as an incentive is an opportunity to have a bunch of free stuff if they build a B2B product to list on the Microsoft store. Right now, our product is marketed mostly to consumers. But we have a notion of how to create a B2B product, and we are in the midst of doing it. It’s gonna be available in September. That will open a new channel for users from the corporate world.

Where will EncrypGen be in two or three years?

In the next two years, as long as we meet the user base goals that we’ve set and the channels that we’re working on come through, we’ll be making about 1.5 to 2 million in revenue per year.

That will make us secure for the long term, and we’ll focus more on building that user base. 50 thousand is profitable for us, but I want 500 thousand, a million.

There is no reason we can’t reach 10% of users who are doing 23andMe. And that would be 2 million people. The sky’s the limit once we start to reach that profitability.

Then we will start to get noticed by companies like 23andMe and Ancestry, and they will begin to wonder whether they should do one of two things: buy us and put us out of business, or go the way we are going and incorporate our platform into theirs. I would prefer the latter because I want to see their users get paid. There is no reason they couldn’t do it. They could call me tomorrow and make me very happy too.

Dr. David Koepsell: Co-Founder and CEO of EncrypGen. David is an entrepreneur, author, philosopher, attorney (retired), and educator whose recent research focuses on the nexus of science, technology, ethics, and public policy. He has been a tenured Associate Professor of Philosophy at the Delft University of Technology, Faculty of Technology, Policy, and Management in the Netherlands, Visiting Professor at UNAM, Instituto de Filosoficas and the Unidad Posgrado, Mexico, Director of Research and Strategic Initiatives at COMISION NACIONAL DE BIOETICA in Mexico, and Asesor de Rector at UAM Xochimilco.
