We are pleased to share an article contributed by Saad Sheikh, Chief Architect Consultant at Saudi Telecom Company, where he is primarily responsible for network deployment and planning for CSP digital transformation across NFV, SDN, Telco Cloud and 5G. Always interested in the disruptive technologies driving industry transformation, the author comes from a Telco CSP background and has worked in the Telco Cloud domain since 2013, including with Amazon, Huawei, Mirantis, VMware and Red Hat.
Saad Sheikh, Chief Architect – Consultant NFV, SDN, Telco Cloud, 5G at Saudi Telecom Company (STC)
CommSPs' shift towards virtualization started in 2012, but five years down the road we still do not find large workloads running on NFV/SDN. Vendors are giving clear, if indirect, signals that operators should either move to vendor-siloed NFV solutions or slow down NFV commercialization. It is also widely believed that putting more and more traffic on NFV will reduce performance. In other words, open NFV solutions are not yet meeting telecom service requirements at scale.
In this paper I list the top obstacles, covering standardization, technology, business and architecture, and I hope the suggested directions will serve as our feedback to the SDOs and the community so they can be addressed. The key point is that Telcos cannot become IT companies by mimicking them; they need to find an entirely new IT way for telecom, because we Telcos have two intricate tasks: to evolve existing networks seamlessly, and to open our networks to third parties and application providers. Let us look at this from the architect's perspective.
#1: Transition from Play Store to ONAP
Being the first mover is fine, but are we really measuring what NFV is supposed to achieve, namely service agility and cutting delivery from months to hours? Unfortunately, maturity is still low. After years of experience with NFV and the virtualization of legacy applications, and despite many vendors claiming that their VNFs are cloud ready, it seems clear that a VNF should be microservice based, should support automatic life cycle management (LCM), and should work seamlessly alongside legacy applications. Similarly, onboarding of NFV applications must follow the same concept as mobile applications, with one model for all markets.
Until now VNF certification has not been automated. Although the ETSI Plugtests have improved the situation, CommSPs still rely entirely on vendors for both VNF certification and integration into the network. Based on our experience with leading vendors, we believe vendors do not share the same understanding of open standards and APIs, as they still want to treat them like 3GPP specifications. The ETSI NFV ISG has done a lot of work to solve these issues, but this part is still not plug and play (PnP) and lacks automation. Overall, passing conformance testing of NFV solution components is not the same as achieving interoperability. Telecom services have demanding performance and availability requirements, typically five nines (99.999%) availability. Thus, when deploying services in software, monitoring and validation are important, especially in the face of failures, errors and human mistakes.
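To make the five nines requirement concrete, the short Python sketch below converts an availability target into a yearly downtime budget and shows how availability erodes when several components sit in series. The function names are illustrative only, and the series calculation assumes independent failures, which a real deployment may not satisfy.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year, a simplifying assumption


def allowed_downtime_seconds(availability: float,
                             period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Maximum downtime (in seconds) that still meets the availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_seconds


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up.

    Assumes component failures are independent of each other.
    """
    combined = 1.0
    for availability in components:
        combined *= availability
    return combined


if __name__ == "__main__":
    budget = allowed_downtime_seconds(0.99999)
    print(f"Five nines allows about {budget / 60:.2f} minutes of downtime per year")
    chain = serial_availability(0.99999, 0.99999, 0.99999)
    print(f"Three five-nines components in series: {chain:.6f}")
```

A five nines target leaves roughly 315 seconds, a little over five minutes, of downtime per year. Chaining three such components already drops the end-to-end figure below five nines, which is one reason monitoring and validation of every layer matter.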
Similarly, as more and more applications move to the cloud, a unified cloud for both Telecom and IT becomes a necessity, especially considering the management and migration aspects. However, the questions of isolation and security are still not settled. Even though all vendors claim compliance with security requirements, those claims cannot be validated in field trials.
Overall there still seems to be a gap between standardization/academia and industry, which means the technology on the ground is not the same as what is depicted in the marketing brochures.
#2: NFV Performance at Scale: the Problem Lies in the VNF
The biggest question we are still trying to answer is whether NFV will work at scale: if a vEPC handles 100 Gbps of traffic well, will it behave the same at 1 Tbps? The question is further obscured by every service provider's cautious virtualization plans. Even AT&T, a pioneer, has not really sped up its migration so far. Based on lab trials, our impression is that NFV at scale will not work optimally and will impact customer experience. Service providers pay the cost in the form of oversized infrastructure and services and ill-fitting licensing models, which according to the latest survey by Red Hat costs them 36% more. The situation is made more complex by the lack of a mature, open testing framework, so service providers again look to vendors to test and validate their own solutions. I think that, except for Telcos with strong R&D and industry presence, this is not an easy issue to solve quickly, and it will continue to hamper all efforts to commercialize NFV networks at scale. A related challenge is portability and management across NFVI realizations: there can be multiple virtualization methods, multiple NFVIs and multiple MANO systems, and supporting seamless migration across these different platforms is challenging.
Another aspect that makes NFV performance at scale difficult is that the VM layout for a given capacity varies from vendor to vendor, so deploying on a single powerful VM and deploying on multiple VMs do not yield the same capacity. Finally, to reveal the performance that will actually be experienced in the real network, we need to test with varied network traffic: not only plain dummy traffic to measure throughput, but also application-aware traffic. There are some mature test tools for non-functional requirements (NFR) but not for functional requirements (FR), and customers require both functional testing and performance validation at scale.
The other domain we need to understand is SDN as an enabler for NFV, not in the infrastructure but in the applications, to deliver NFV performance at scale. Offloading stateless VNF functions, such as the load balancer (LB) and processing tier, from the application to the switching tier through SDN can improve performance considerably, but this approach requires a strong microservice architecture, which still needs more time to mature for Telcos.
#3: Is the Programmable Network the Real Target?
The target is not the programmable network but the automated network. However, how can we program a network without modeling it properly? The problem at hand is that there is a gap between programmers and telecom experts. ONAP is doing a good job of narrowing this gap by introducing easy-to-use tools that allow service providers to automate the network without detailed programming knowledge. The ONAP Service Logic Interpreter Directed Graph (DG) Guide is one such initiative. Details can be found in that guide.
When programming the network, one more dimension we need to consider is service function chaining (SFC). Today SFC is applied on inter-VNF links or forwarding graphs. With microservice architectures, however, there can be N components or modules inside the same VNF. Chaining such flows requires aligning intra-VNF and inter-VNF flows, and standardization for this is lacking today. A complete end-to-end design of ordering and parallelism is critical to the performance and correctness of the entire service chain.
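One way to reason about ordering and parallelism across intra-VNF and inter-VNF flows is to treat the whole chain as a dependency graph. The sketch below is only an illustration, not a standardized SFC model: the component names (such as "firewall/classify") are hypothetical, and it uses Python's standard `graphlib` module (Python 3.9 or later) to group components into stages that could run in parallel.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def parallel_stages(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group chain components into ordered stages.

    `dependencies` maps each component to the components that must
    process the flow before it. Components in the same stage have no
    ordering constraint between them, so they may run in parallel.
    A cyclic chain raises graphlib.CycleError (a ValueError subclass).
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorter.get_ready()
        stages.append(sorted(ready))  # sorted only for stable output
        sorter.done(*ready)
    return stages


# Hypothetical chain: "vnf/module" names mix intra-VNF and inter-VNF links.
CHAIN = {
    "firewall/classify": set(),
    "firewall/filter": {"firewall/classify"},   # intra-VNF link
    "firewall/log": {"firewall/classify"},      # intra-VNF link
    "nat/translate": {"firewall/filter"},       # inter-VNF link
    "lb/dispatch": {"nat/translate", "firewall/log"},
}

if __name__ == "__main__":
    for index, stage in enumerate(parallel_stages(CHAIN), start=1):
        print(f"stage {index}: {', '.join(stage)}")
```

In this example the filter and log modules of the same VNF land in one stage, while the NAT and load balancer steps must wait for their predecessors. A model of this kind also rejects chains whose ordering constraints contradict each other.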
#4: NetOps Is Still Not Agile Compared to Its IT Counterpart
The networking equivalent of DevOps, so-called NetOps, is still not agile. Neither the scrum runs nor the time to market (TTM) for new services deliver on their promise. As reported in SDxCentral's latest survey, application automation (40%) is far ahead of network automation (20%). In other words, the many manual pieces make it difficult to stitch together an end-to-end solution and achieve a CI/CD pipeline. In fact, the success of DevOps is closely linked to tight integration of the tool set, which is simply not the norm in the telecom industry as of now. With multi-vendor network architectures, NetOps teams are faced with trying to force-fit a diverse set of APIs and data formats to work together seamlessly in order to deploy all the components in the deployment pipeline.
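The force-fitting problem can be illustrated with a small adapter sketch. Everything vendor-specific here is an assumption made up for illustration: the two payload layouts, their field names and the vendor keys do not correspond to any real product API. The point is only that each vendor format needs its own translation into one common model before a pipeline can treat devices uniformly.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping


@dataclass(frozen=True)
class InterfaceStatus:
    """Common model that the deployment pipeline works with."""
    device: str
    name: str
    is_up: bool
    speed_mbps: int


def from_vendor_a(payload: Mapping[str, Any]) -> InterfaceStatus:
    # Hypothetical flat format: textual status, speed reported in Gbps.
    return InterfaceStatus(
        device=payload["hostname"],
        name=payload["ifName"],
        is_up=payload["operStatus"] == "up",
        speed_mbps=int(payload["speedGbps"] * 1000),
    )


def from_vendor_b(payload: Mapping[str, Any]) -> InterfaceStatus:
    # Hypothetical nested format: numeric state, speed reported in kbps.
    port = payload["port"]
    return InterfaceStatus(
        device=payload["node"]["id"],
        name=port["label"],
        is_up=port["state"] == 1,
        speed_mbps=port["bandwidth_kbps"] // 1000,
    )


# Registry of adapters; adding a vendor means adding one function here.
ADAPTERS: Dict[str, Callable[[Mapping[str, Any]], InterfaceStatus]] = {
    "vendor_a": from_vendor_a,
    "vendor_b": from_vendor_b,
}


def normalize(vendor: str, payload: Mapping[str, Any]) -> InterfaceStatus:
    """Translate a vendor-specific payload into the common model."""
    try:
        adapter = ADAPTERS[vendor]
    except KeyError:
        raise ValueError(f"no adapter registered for vendor {vendor!r}") from None
    return adapter(payload)
```

Each new vendor adds another hand-written adapter of this kind, which is the manual effort that keeps network automation behind application automation.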
“DevOps are often developers themselves; highly skilled at coding a solution to just about any problem. NetOps, on the other hand, are highly skilled networking professionals. Integration in a network is about protocol interoperability, not plug-ins and APIs. It’s a completely foreign world to most NetOps, and the tools and frameworks available are not well-suited to the kind of integration required to build a continuous deployment pipeline.”
It’s clear from the challenges faced by NetOps in automating the network that they can’t catch up to their DevOps counterparts on their own. That means the networking industry at large must do more than just offer APIs and examples of automation. It means stepping up to meet the challenge with communities, support, and training that focuses as much on the basics of coding and continuous deployment as it does on interfacing with specific devices.
#5: MANO and Orchestration Seem Very Complex
The ultimate goal of virtualization for Telcos must be automation, and this cannot be achieved unless you truly orchestrate your network. The reality is that it is taking too long to get a good commercial MANO product. In fact, with ONAP/ECOMP, even AT&T started with an orchestrator that was completely closed because none of the open source options was ready. It's not there yet.
CableLabs' view is that you should simply place an abstraction layer between your orchestrator and your VIM and VNFs. The workaround isn't that bad for this part of the stack. APIs are becoming the de facto standard, and the lower in the stack you go, the better suited it is to open source.
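A minimal sketch of such an abstraction layer follows. It is not CableLabs' design or any real VIM API: the `VimDriver` interface, the `InMemoryVim` stand-in and the method names are all hypothetical. It only shows the idea that the orchestrator talks to one narrow interface while each VIM gets its own driver behind it.

```python
import itertools
from abc import ABC, abstractmethod
from typing import Dict


class VimDriver(ABC):
    """Narrow, hypothetical interface the orchestrator depends on."""

    @abstractmethod
    def create_instance(self, name: str, flavor: str) -> str:
        """Create a virtual instance and return its identifier."""

    @abstractmethod
    def delete_instance(self, instance_id: str) -> None:
        """Remove a previously created instance."""


class InMemoryVim(VimDriver):
    """Stand-in driver; a real one would call the VIM's own API."""

    def __init__(self, prefix: str) -> None:
        self._prefix = prefix
        self._counter = itertools.count(1)
        self.instances: Dict[str, Dict[str, str]] = {}

    def create_instance(self, name: str, flavor: str) -> str:
        instance_id = f"{self._prefix}-{next(self._counter)}"
        self.instances[instance_id] = {"name": name, "flavor": flavor}
        return instance_id

    def delete_instance(self, instance_id: str) -> None:
        del self.instances[instance_id]  # KeyError if the id is unknown


class Orchestrator:
    """Deploys VNF components without knowing which VIM is underneath."""

    def __init__(self, vim: VimDriver) -> None:
        self._vim = vim

    def deploy_vnf(self, vnf_name: str, components: Dict[str, str]) -> Dict[str, str]:
        # components maps a component name to the flavor it needs.
        return {
            component: self._vim.create_instance(f"{vnf_name}-{component}", flavor)
            for component, flavor in components.items()
        }

    def remove_vnf(self, deployed: Dict[str, str]) -> None:
        for instance_id in deployed.values():
            self._vim.delete_instance(instance_id)
```

Swapping the VIM then means writing another `VimDriver` subclass, with no change to the orchestrator logic. The same pattern could be applied on the VNF side.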
Issues in orchestration and network service modeling are still to be solved, and we need to shift from ONAP and OSM demos to live trials. The consensus is that ONAP will be the final destination for Telcos, but with Telefonica leading OSM and its claims on standardization, especially of the information model, which is the key element in operationalizing NFV, it is still not clear how the market will evolve.
Could ONAP take the NFVO piece of the pie while OSM becomes the end-to-end orchestrator, owing to its detailed work on network modeling and the information model?
A vendor-agnostic network function and service model is still an open debate, especially considering that many vendors are reluctant to freely share their VNF meta-models or artifacts with each other. Until recently, there was a big debate in the telecom world over whether YANG from the IT world should also become the standard modeling language for orchestration. Combining TOSCA and YANG is now gaining wider acceptance, as this approach seems to provide the best of both standards: TOSCA is responsible for the service lifecycle and YANG controls the network configuration of the VNFs.
#6: Hardware acceleration roadmaps
Shared memory, system on chip (SoC) and FPGA are technology trends that can make the server cost model attractive for NFV investment. However, Intel's Virtio standardization for SmartNICs is running late. As of this writing it does not support offload, making it harder to use in a consolidated NFV solution based on OVS + DPDK processing. That leaves the industry with no choice but to rely on SR-IOV for SmartNICs, which is not a scalable solution because it requires investment, customization and support from the VNF, and ultimately makes it hard to build a light VNF, which is our target.
For optimal capacity and CAPEX investment by carriers in NFV, the use of SmartNICs for data plane VNFs is very important.
#7: Architecture Limitations
Many VNFs today assume deployment in collocated data centers, which means the VNFCs of a VNF cannot be split across data centers. This makes it difficult to support content-heavy and latency-sensitive applications. The 3GPP CUPS architecture is one direction for addressing this challenge. Various distributed protocols and consistency mechanisms must be used to support a fully distributed NFV implementation. Some network functions, for example traditional 3GPP and telecom NFs, are not easy to scale since they are not modular, so it is important to reconsider their design to fit the new cloud architecture. I think that VxLAN and L2 E-VPN tunnels should be used in SDN/networking, and that cloud VNFs must support deployment across split data centers.
#8: Issues with Open Source Standardization
Moving CommSPs from an SDO-driven approach to an open source approach looks difficult, primarily because, apart from the top operators, most do not have the relevant teams and support mechanisms, such as R&D and participation in communities, to make the whole chain work. The result is that even the open source communities are being swayed by vendors who have invested large sums in them with the aim of steering them in their favor. I think service providers require persistent interoperability in large-scale, multi-domain, multi-vendor deployments of NFV/SDN, and OpenStack, OPNFV and ONAP look like the minimum set of communities that should develop easy ways to encourage participation by all operators. Cross-organization cooperation is likewise essential to build a thriving environment for operator innovation and to reduce the fragmentation that causes problems for open NFV/SDN solutions. Network modeling and service orchestration will be the real value operators can gain from this transformation, but so far API exposure and information model standardization look like the issues hampering mass-scale deployments.
#9: Services Are Key to Leading the Open Solution Market
According to ABI Research, spending on NFV infrastructure, including servers, storage devices, switches, SDN and cloud, will decline over time. At the same time, software and services will show higher growth rates of 55% and 50%, respectively. Even today the standardization and multi-vendor integration challenges remain unresolved, and the solutions are not completely ready for carrier-grade deployments at scale. The result is that NFV has not delivered on its promise to cannibalize PNF products; instead it still acts as an investment overhead on top of the existing legacy network. With services and system integrators still immature, the question is how long CommSPs can survive planning duplicate investments.
“It is strongly recommended to separate services, and especially system integration, from the products of NFV/SDN solutions in practice, and to focus on a CoE and in-house R&D capability with presence and involvement in ISGs.”
Services combined with an independent testing framework will surely support fast commercialization of NFV at scale.
#10: Why Cost is the first metric
NFV/SDN may be the best technology in the industry, but the fact of the matter is that if a technology does not support business objectives it may not be fit for use in commercial networks. The latest analysis predicts that a multi-vendor solution across NFVI, VNF and MANO can cost an operator five times as much as procuring similar capacity as PNFs. Mainstream vendors are still trying to sell licenses and services in the same manner as they sold legacy products, so cost as a KPI cannot deliver meaningful information. Instead, TTM, automation and new product offerings seem to be the right metrics for analyzing NFV/SDN ROI. But these benefits can be difficult to quantify, making investment in new technology domains somewhat difficult. Until now the bulk of CSP investment, 80-90%, has gone to CAPEX. This skews the very business model of cloud, which is primarily OPEX. This is a shift from the traditional way operators have done business, and it can distort the monetary benefits NFV can deliver, making them harder to grasp. Address this sooner rather than later. Rebalancing investment between CAPEX and OPEX is key, especially with the many S&S and subscription-based models in NFV/SDN.
#11: Why NFV/SDN Licensing is not optimal
Five years down the NFV road, VNF applications are still not licensed according to the recommended best practices for cloud applications. Licensing is still not standardized, and we have to rely solely on procurement negotiations to see whether we can move from a proprietary VNF model to something like pay-per-use, pay-per-GByte, pay-per-Gbit, pay-per-active-user (SAU), pay-by-maximum-instances, pay-per-day, pay-per-month or pay-per-minute. Add licenses for multiple VNFs to the mix, and it could get ugly, fast.
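To see why the choice of licensing model matters, the sketch below compares the monthly cost of one VNF under three of the models named above. All prices and usage figures are invented for illustration and do not come from any vendor or survey; only the comparison logic is the point.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MonthlyUsage:
    traffic_gbytes: float
    active_users: int
    peak_instances: int


@dataclass(frozen=True)
class PriceList:
    """Hypothetical unit prices in an arbitrary currency."""
    per_gbyte: float
    per_active_user: float
    per_instance: float


def monthly_costs(usage: MonthlyUsage, prices: PriceList) -> Dict[str, float]:
    """Cost of the same usage under each licensing model."""
    return {
        "pay-per-GByte": usage.traffic_gbytes * prices.per_gbyte,
        "pay-per-active-user": usage.active_users * prices.per_active_user,
        "pay-by-maximum-instances": usage.peak_instances * prices.per_instance,
    }


def cheapest_model(costs: Dict[str, float]) -> str:
    """Name of the model with the lowest cost for this usage."""
    return min(costs, key=costs.__getitem__)


if __name__ == "__main__":
    prices = PriceList(per_gbyte=0.02, per_active_user=0.01, per_instance=150.0)
    usage = MonthlyUsage(traffic_gbytes=50_000.0, active_users=200_000, peak_instances=12)
    costs = monthly_costs(usage, prices)
    for model, cost in costs.items():
        print(f"{model}: {cost:.2f}")
    print("cheapest:", cheapest_model(costs))
```

With these made-up numbers the traffic-based model is cheapest, but quadrupling the traffic while keeping users and instances constant flips the result to the instance-based model. The best model depends on the traffic profile, so it is hard to settle through procurement negotiation alone.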
A centralized license manager for NFV is also still not standardized, leaving the operator with a myriad of licenses, each of which must be managed in its own vendor-specific manner. This pushes operators to limit their choice to two or three vendors for their NFV/SDN programs, blocking the way for ISGs and small niche players.
Key recommendations for license management in NFV can be summarized as follows
Summary
Finally, I want to share my view that, just as in IT, the way applications are written and their LCM is managed is key to whether Telcos can be agile and follow the NetOps approach. Unfortunately, the industry has so far embarked on this critical transformation journey with a bottom-up approach: starting with the things they knew, vendors re-purposed existing orchestration products, which led to bolt-on architectures. In other words, they used the same old methods and expected new results. This approach needs to change, with automation at its core. This is an important issue, and I hope to speak about our unique understanding of it at the next OpenStack Summit.
1. https://events.linuxfoundation.org/wp-content/uploads/2017/11/Akraino-Technical-2.Overview-OSS-Shane-Wang.pdf