Healthy APIs: Take One API and Call Me in the Morning

  January 08, 2014

Healthcare applications, both mobile and Web-based, are multiplying rapidly with the rollout of the Affordable Care Act. But before you can write health-related software, you need clinical information that is relevant and current. Fortunately, plenty of APIs, some endorsed by the U.S. government, are ready to use.

The number of healthcare and wellness applications, for both mobile devices and websites, has grown at a huge rate over the last two years. Start-ups focused on healthcare have grown almost as fast. One contributor to this surge in interest may be the imminent implementation of the Affordable Care Act’s key provisions in 2014. With so much disruption caused by this act, some expect non-traditional markets and their related services to emerge.

You don’t have to be a medical expert to take advantage of the opportunity – whether with a start-up, as an independent app developer, or as an extension to your existing company software. Several APIs help you connect your software to existing data sources or integrate medical data into yours, while maintaining privacy and adhering to government regulations.

NIH wants your queries

The U.S. National Library of Medicine maintains a list of APIs for many healthcare databases and functions. Currently, 27 are available for anyone to access.

While the NLM wants these APIs used, it does ask that the finished product carry a specific attribution, worded as follows: “This product uses publicly available data from the U.S. National Library of Medicine (NLM), National Institutes of Health, Department of Health and Human Services; NLM is not responsible for the product and does not endorse or recommend this or any other product.”

So, what kinds of APIs are on the list? Need to check the spelling of a chemical (such as a drug name) that your user has entered? There’s one for that. Browsing the list can give you a sense of what kinds of information lend themselves to an API’s query/response model. Here’s a short sample.

DIRLINE is a good starting point. This directory of online information resources is curated by the NLM itself. The DIRLINE database contains location and descriptive information about a wide variety of information resources, including organizations, research resources, projects, and databases concerned with health and biomedicine. Search results are returned as brief records and full records in the DIRLINE API response format. For example, searching on “vitamin” as a keyword returns information about the Linus Pauling Institute at Oregon State University and the National Health Federation.

MedlinePlus Connect is another National Institutes of Health project showcased by NLM. MedlinePlus Connect takes medical diagnosis (problem) codes, medication codes, and lab test codes and returns a human-readable text description, such as anemia or meningitis. It does this by encapsulating that information inside an XML envelope. This kind of XML wrapping is routine in SOAP, which uses an XML-based message format for both queries and replies. (For more on SOAP versus REST, see Understanding SOAP and REST Basics.)
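To make the idea of an XML envelope concrete, here is a minimal Perl sketch that wraps a code lookup in a generic SOAP 1.1 envelope. The lookupCode element, its child elements, and the urn:example namespace are hypothetical placeholders chosen for illustration; they are not MedlinePlus Connect's actual request format, so check the service documentation for the real message structure.

```perl
use strict;
use warnings;

# Escape the five characters that are special in XML text and attributes.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # must run first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/'/&apos;/g;
    return $text;
}

# Wrap a code lookup in a SOAP 1.1 envelope.
# "lookupCode", "codeSystem", "code" and the urn:example namespace are
# hypothetical names used only for illustration.
sub build_envelope {
    my ($code_system, $code) = @_;
    my $system_xml = xml_escape($code_system);
    my $code_xml   = xml_escape($code);
    return <<"XML";
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <lookupCode xmlns="urn:example:code-lookup">
      <codeSystem>$system_xml</codeSystem>
      <code>$code_xml</code>
    </lookupCode>
  </soap:Body>
</soap:Envelope>
XML
}

# 285.9 is the ICD-9-CM code for unspecified anemia.
print build_envelope('ICD-9-CM', '285.9');
```

The payload sits inside soap:Body, and a reply would come back wrapped the same way, which is why a SOAP client spends much of its effort building and unpacking envelopes rather than handling the data itself.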

If you need what a particular API offers, you have to use the protocols it supports. Overall, though, RESTful APIs seem to be growing faster than SOAP or other alternatives.

Several of the NLM APIs are useful for data lookup. One of them, the National Drug File-Reference Terminology (NDF-RT), supports both SOAP and REST. Two NDF-RT APIs provide developers with functions for retrieving data from the most current NDF-RT data set. Developed by the Veterans Health Administration, NDF-RT provides clinical information about medications, including therapeutic intent, mechanism of action, physiologic effect, and drug-drug interactions. Another API, TOXNET (TOXicology Data NETwork), is a cluster of databases covering toxicology, hazardous chemicals, environmental health, and related areas.

These are far from the only APIs available for healthcare development. Others that aren’t listed on the government website are worth considering, too.

ProgrammableWeb does an excellent job of organizing APIs on medical topics.

Let’s do Gene Sequencing

The most useful APIs can get to data no matter where it resides. These days the data usually lives on a server somewhere on the Internet, so a modern API typically reaches it over the network.

One gene sequencing facility with an API that you can code in Perl is the University of Texas at Austin. Its BioMart API is a way to get to some rather specialized medical databases. One is the Ensembl project, which produces genome databases for vertebrates and other eukaryotic species, and then makes that genome information freely available online.

Let’s see what this BioMart API looks like.

The first task of the Perl script is to tell Perl where the BioMart library is installed.

use lib '/home/scott/Downloads/biomart-perl/lib';
 
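Once that is done, you locate the registry configuration file, which tells the API where the BioMart registry lives. The code to do this is a little more complicated: it searches @INC for the library path named above, builds the path to the configuration file relative to it, and stops with an error message if the file does not exist.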

my $confFile = (grep { m/biomart-perl\/lib+$/ } @INC)[0] . "/../conf/apiExampleRegistry.xml";
die("Can't find configuration file $confFile\n") unless (-f $confFile);
 
$confFile is the path to your local registry file, which the code above expects in the conf directory next to biomart-perl/lib. The central registry service can be found at http://www.biomart.org/biomart/martservice?type=registry.
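To see what the grep expression does in isolation, here is a small sketch that runs the same kind of lookup against a made-up directory list instead of the real @INC. The helper name registry_path and the sample directories are assumptions for illustration only. Unlike the one-liner above, it handles the "no match" case explicitly.

```perl
use strict;
use warnings;

# Return the registry path derived from the first include directory that
# ends in "biomart-perl/lib", or undef when no such directory is listed.
sub registry_path {
    my ($registry_name, @include_dirs) = @_;
    my ($lib_dir) = grep { m/biomart-perl\/lib$/ } @include_dirs;
    return unless defined $lib_dir;
    return "$lib_dir/../conf/$registry_name";
}

# Made-up include list standing in for @INC.
my @dirs = ('/usr/share/perl5', '/opt/biomart-perl/lib');
my $path = registry_path('apiExampleRegistry.xml', @dirs);
print defined $path
    ? "$path\n"
    : "biomart-perl/lib is not in the include list\n";
```

With the sample list, the second directory matches, so the script prints /opt/biomart-perl/lib/../conf/apiExampleRegistry.xml. The ".." step climbs out of lib so that conf is resolved as its sibling directory.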

So, let’s get the gene information that is associated with a set of Ensembl gene IDs.

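use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

# 'cached' reuses a previously built configuration; 'clean' rebuilds it.
my $action = 'cached';
my $initializer = BioMart::Initializer->new('registryFile' => $confFile, 'action' => $action);
my $registry = $initializer->getRegistry;

my $query = BioMart::Query->new('registry' => $registry, 'virtualSchemaName' => 'default');

# Select the human gene dataset and filter on a list of Ensembl gene IDs.
$query->setDataset("hsapiens_gene_ensembl");
$query->addFilter("ensembl_gene_id",
    ["ENSG00000224813", "ENSG00000248149", "ENSG00000239664",
     "ENSG00000237491", "ENSG00000241768", "ENSG00000241180"]);

# Columns to return for each matching gene.
$query->addAttribute("ensembl_gene_id");
$query->addAttribute("protein_id");

# Request tab-separated output.
$query->formatter("TSV");

# Run the query and print the results to standard output.
my $query_runner = BioMart::QueryRunner->new();
$query_runner->execute($query);
$query_runner->printHeader();
$query_runner->printResults();
$query_runner->printFooter();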
 
Note the action selector (my $action) that comes right after the BioMart use statements. Set it to clean to build a fresh configuration. This code uses cached, which skips the configuration step on subsequent runs against the same registry. The query object is created in one long line of code; the dataset, the filter (a literal list of Ensembl gene IDs), and the attributes to return are then added to it. Finally, a QueryRunner object is created to run the query.
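With the TSV formatter, each result row is a line of tab-separated values. The sketch below groups such rows by gene ID, assuming the output has already been captured in a string and that the columns follow the order in which the attributes were added. The helper name group_by_gene is hypothetical, and the protein IDs in the sample are invented placeholders, not real query output.

```perl
use strict;
use warnings;

# Group tab-separated "gene_id<TAB>protein_id" rows by gene ID.
# A row with an empty protein column still registers the gene, with no proteins.
sub group_by_gene {
    my ($tsv) = @_;
    my %proteins_for;
    for my $line (split /\n/, $tsv) {
        next if $line =~ /^\s*$/;                       # skip blank lines
        my ($gene_id, $protein_id) = split /\t/, $line, -1;
        $proteins_for{$gene_id} //= [];
        push @{ $proteins_for{$gene_id} }, $protein_id
            if defined $protein_id && length $protein_id;
    }
    return \%proteins_for;
}

# Placeholder rows: the gene IDs come from the filter above,
# the protein IDs are invented for illustration.
my $sample = join "\n",
    "ENSG00000224813\tPLACEHOLDER_P1",
    "ENSG00000224813\tPLACEHOLDER_P2",
    "ENSG00000248149\t";

my $grouped = group_by_gene($sample);
for my $gene_id (sort keys %$grouped) {
    my $list = join(', ', @{ $grouped->{$gene_id} }) || '(none)';
    print "$gene_id: $list\n";
}
```

Keeping one array per gene matters because a single gene can map to several protein IDs, so the query may return more rows than the number of gene IDs in the filter.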

An API like the BioMart one shows how much informational power can be had from fairly simple code, and it is just one example of the many medical databases available for query online. Each has a purpose, and it is the developer’s role to understand which can help a project succeed.
